Solving a Sudoku with a Genetic Algorithm

Why would you want so solve a Sudoku with a Genetic Algorithm? That’s a fair question. Much easier would be to google on solving Sudoku’s and you’d probably have a method in 10 minutes. Like this one.

Okay, but suppose we didn’t know anything about sudoku’s (only the rules), or -more likely- we would have a slightly different problem, where googling wouldn’t help? Get it? ‘Because we can’ is also valid I think, though. Besides, I wanted to test my solving problem solving (no error) skills. If you’re relatively new to GA, it may help you understand.

Here’s a 4 star sudoku (I have not tested extensively with other sudoku’s)

(source www.sudokunet.nl)

First we have to define a Sudoku in Python. A list of integers is the easiest thing. One could argue for a 2D matrix (array), but this doesn’t add much. Besides, it makes implementing a Genetic Algoritmn complicated, because generally a genome is a 1D thing. Pyevolve has a lot more options for 1D lists that for 2D.

# define sudoku problem. 0's have to be filled
sudoku=[
0,0,6,0,0,0,0,0,0,
0,8,0,0,5,4,2,0,0,
0,4,0,0,9,0,0,7,0,
0,0,7,9,0,0,3,0,0,
0,0,0,0,8,0,4,0,0,
6,0,0,0,0,0,1,0,0,
2,0,3,0,6,7,9,8,1,
0,0,0,5,0,0,0,4,0,
4,7,8,3,1,9,5,6,2]

Now let’s implement the rules (one of each number in a row, column and block). I’ve added three functions to give me the rows, columns an blocks:

def row(sd,rijnum):
    #function that returs a list consisting of row
    row=[]
    for index in range(9):
        row.append(sd[rijnum*9+index])
    return row

def column(sd, columnnum):
    #returns column
    column=[]
    for index in range(9):
        column.append(sd[index*9+columnnum])
    return column

def block(sd,blocknum):
    #returns block numbered form left to right and then down
    block=[]
    blockstart=int(blocknum/3)*9*3+(blocknum-int(blocknum/3)*3)*3
    for indexX in range(3):
        for indexY in range(3):
            block.append(sd[blockstart+indexX*9+indexY])
    return block

def errors(sdsol):
    #this is the fitness function. No errors means the solution
    err=0
    for index in range (9):
        for element in [row(sdsol,index),column(sdsol,index),block(sdsol,index)]:
            for getal in [1,2,3,4,5,6,7,8,9]:
                # check on double numbers
                if (element.count(getal))>1:
                    err+=element.count(getal)-1

    for i,getal in enumerate(sdsol):
        # check on numbers that don't match the startnumbners
        if getal!=sudoku[i] and sudoku[i]!=0:
            err+=2
    return err

That’s all for the Sudoku. Note that the errors function, an implementation of the rules, is also the fitness function for the genetic algorithm.

The brilliant thing about Python is all the libaries. Why re-invent it all ourselves? I’m using Pyevolve here.

Now, a genome (81 numbers) represents a possible solution. A genome is a list containing inputs for the fitness function. If you don’t understand this, you might want to read some more basics on GA first. It may be full of errors (double numbers), doesn’t matter, we’ll let the error function deal with that. Keeping the numbers between 1 and 9 helps the search speed. This is all there is to it, the code pretty much explains itself (considering you know the basics of GA):

genome = G1DList.G1DList(81) #defining the genome as a list of 81 integers
genome.setParams(rangemin=1,rangemax=9)
genome.evaluator.set(errors) #sets fitness function to errors
genome.setParams(bestrawscore=0.00, rounddecimal=2)
ga = GSimpleGA.GSimpleGA(genome)
ga.setMinimax(Consts.minimaxType["minimize"]) #0 errors wanted
ga.setMutationRate(0.04)
ga.setGenerations(10000)
ga.setPopulationSize(200)
ga.setCrossoverRate(1)
ga.setElitismReplacement(5)
ga.terminationCriteria.set(GSimpleGA.RawScoreCriteria)
ga.evolve(freq_stats=10)

print ga.bestIndividual()

Were trying to “minimze” the errors of course and if it is 0, we can stop (terminationCriteria). Now, the most important lesson for me was that a GA takes tuning. What’s your population size? What’s the crossover method? Probably the most used method is single point crossover, but does this work here? The mutation rate is also important. Too high will result in slow search and to low means you might not reach the solution in 10000 generations. Try some numbers if you like.

Here’s the result.

>>>
Gen. 4270 (42.70%): Max/Min/Avg Fitness(Raw) [17.68(23.00)/9.48(0.00)/14.73(14.73)]
Total time elapsed: 99.557 seconds.
- GenomeBase
	Score:			 0.000000
	Fitness:		 9.482771
...

- G1DList
	List size:	 81
	List:		 [
6, 3, 8, 2, 5, 9, 7, 1, 4,
2, 5, 9, 7, 1, 4, 6, 3, 8,
7, 1, 4, 6, 3, 8, 2, 5, 9, 
3, 6, 2, 8, 9, 5, 1, 4, 7, 
8, 9, 5, 1, 4, 7, 3, 6, 2, 
1, 4, 7, 3, 6, 2, 8, 9, 5, 
5, 8, 6, 9, 2, 3, 4, 7, 1, 
9, 2, 3, 4, 7, 1, 5, 8, 6, 
4, 7, 1, 5, 8, 6, 9, 2, 3]

It took my laptop 100 seconds at best. Better that human, but many times worse than a proper Sudoku solving routine. I did not expect it to be fast, I was just wondering if I could make it work. And I forced myself not to Goole for it, until it worked. Of course people did it before me. But it looks like mine is doing quite well. On multiple runs, the time can differ a lot though, and sometimes it will not find it at all. I guess the algorithm runs into a local optimum and mutation is the only way to get it out again. It seems attractive¬† to raise the mutation rate, but if you raise it a bit to much, multiple genes in a single genome will change and that will set you back. I think the thing to do would be, to have multiple populations and migration between the populations. Where have I heard that before? I’ll try that some other time.