algorithm genetic-algorithm evolutionary-algorithm

Evolution Algorithm to solve Nonograms

I want to try to solve nonograms using Evolutionary Algorithm. I represent fitness as amount of constraints that satisfy the board. For example board 10X10 got 20 constraints ( 10 left, 10 top)

So my maximum Fitness is 20.

My main problem is that algorithm in most cases stuck in local maximum( with 15-18 fitness) and i don't know how i can prevent it or jump in right direction.

Still sometimes algorithm manage to solve the puzzle.

I use simple crossover( x rows from first individual + y rows from second) And mutation that change 1 random cell.

Any ideas for better mutations or other techniques that can help me with local maximum problem?

Many thanks!

Solution

(adding on Sisnett's good answer)

Any ideas for better mutations or other techniques that can help me with local maximum problem?

You can adopt the standard mutation / crossover operators, probably the problem representation and fitness function have to be improved.

Here I'm just setting out an example. More details and a possible C++ implementation (based on the Ultra framework) can be found on Github.

Individual representation

A candidate solution (i.e. an individual of the GAs population) can be encoded as a sequence of integers.

Consider the following problem:

. # # # . . . .  3
# # . # . . . .  2 1
. # # # . . # #  3 2
. . # # . . # #  2 2
. . # # # # # #  6
# . # # # # # .  1 5
# # # # # # . .  6
. . . . # . . .  1
. . . # # . . .  2
1 3 1 7 5 3 4 3
2 1 5 1

the pattern can be encoded (column-order) as:

COLUMN (zero-based indexing)
| 0  |  1  |  2  |  3  | 4| 5| 6| 7|

{1, 2, 0, 2, 0, 0, 0, 0, 4, 4, 2, 2}

 ^  ^                          ^  ^
 |  |                          |  +-- last block (size 3)
 |  |                          +----- second to last block (size 4)
 .  .                          .  .
 .  .                          .  .
 .  .                          .  .
 |  +-------------------------------- second block (size 2)
 +----------------------------------- first block (size 1)

where each number represents the distance of a block from the first allowed placing position. In total there are 12 blocks on 8 columns.

The encoding by column is an arbitrary choice. The same pattern could be represented (row-order) as:

{1, 0, 0, 1, 1, 2, 1, 2, 0, 0, 0, 4, 3}

This time there are 13 blocks on 9 rows.

Depending on the problem structure (i.e. number and type of the clues) one approach could be better than the other (from a performance point of view).

Anyway it's always possible to:

swap row and column clues
solve the resulting problem
transpose the solution matrix to get the solution of the original problem

Note that the representation allows invalid individuals. Consider the following column:

.
.
.
.
1
2

The only solution is {0,0}:

#
.
#
#
1
2

but the representation also allows {0,1}, {0,2}, {0,3}, ... {3,2}, {3,3}.

To "keep it simple" invalid integers can be mapped to a valid range (e.g. via the modulo operator) thus creating a separation between the genotype (the list of integers) and the phenotype (the real position of blocks on the board).

This is very similar to the Grammatical Evolution scheme in which the objects the search algorithm operates on and what the fitness evaluation function interprets are distinct.

Fitness function

First of all you need a decoding function that, given the genome, returns a true/false matrix.

With the proposed representation column constraints (i.e. column clues) are already satisfied. The fitness function measures the performance of an individual considering only row-clues.

E.g.

Solution:                Candidate solution:
                         {1,2,0,2,0,0,0,0,4,4,2,0}

. # # # . . . .  3       . # # # . . . #  3 1    <-- ERROR
# # . # . . . .  2 1     # # . # . . . #  2 1 1  <-- ERROR
. # # # . . # #  3 2     . # # # . . # #  3 2
. . # # . . # #  2 2     . . # # . . # .  2 1    <-- ERROR
. . # # # # # #  6       . . # # # # # .  5      <-- ERROR
# . # # # # # .  1 5     # . # # # # # .  1 5
# # # # # # . .  6       # # # # # # . .  6
. . . . # . . .  1       . . . . # . . .  1
. . . # # . . .  2       . . . # # . . .  2
1 3 1 7 5 3 4 3          1 3 1 7 5 3 4 3
2 1 5 1                  2 1 5 1

the block on the last column is misplaced:

ROW  CORRECT  WRONG     DELTA
0     [3]     [3,1]     |3-3| + |0-1| = 1
1     [2,1]   [2,1,1]   |2-2| + |1-1| + |0-1| = 1
3     [2,2]   [2,1]     |2-2| + |2-1| = 1
4     [6]     [5]       |6-5| = 1

FITNESS = -sum(delta_i) = -4

The fitness function computes the distance between the correct and wrong sequences, matching elements by order and taking their absolute difference.

Notes

GAs tend to produce good but sub-optimal solutions and this behaviour is acceptable only for some combinatorial optimization problems.

The proposed representation is good enough to detect biases in low-order schemas and, combining information, eventually converges (I've played with 30x30 nonograms).

There are further improvements but this should be a good starting point.