Monday, May 10, 2010

The Art of Solving Sudokus - Part II

ANN - Artificial Neural Network
An artificial neural network (ANN) is the analogue of the biological neural networks we encounter in the “real world”. Such a biological neural network consists of neurons and of synapses which connect the neurons to each other. Each neuron can fire signals to the other neurons to which it is connected. Depending on the width of the synapse between the firing neuron and the receiving neuron, the fired signal is weakened or strengthened. A neuron will only fire a signal when it is triggered to do so by its connecting neurons, that is, when the accumulated signal of those connecting neurons is strong enough.

An artificial neural network is defined in a similar way: it consists of nodes, as the analogue of neurons, and of edges between these nodes, as the analogue of synapses. To each edge a weight is attached, corresponding to the width of the synapse. Starting from an initial state of the ANN, we can iteratively determine values for each node by taking the weighted average of the values of all nodes connected to that node. This weighted average corresponds to the accumulated signal of all connecting nodes: it is computed from the values of the connecting nodes and the weights of the corresponding edges. The weighted average determines whether the node is about to fire or not: if it exceeds a certain threshold value, the value of the node is set to 1, the firing mode; if it is less than the threshold value, the value is set to 0, the non-firing mode. The corresponding function that determines the firing mode is called the discrete transfer function. The iteration step is called the local updating rule.

Example: Assume a network of $N$ neurons $s_i$, taking values in $\{0,1\}$, where the interaction between the neurons is represented by the weights $w_{ij}$. The network can then be updated by applying the rules:\[v_i=\sum_{j=1, j\neq i}^{N} w_{ij}\cdot s_j \mbox{, and } s_i=\Theta(v_i),\]where $\Theta$ is the discrete transfer function: $\Theta(v)=1$ if $v$ exceeds the threshold (here taken to be 0) and $\Theta(v)=0$ otherwise.
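As an illustration, here is a minimal sketch in Python of this local updating rule; the network, its weights and the threshold value 0 are hypothetical choices, just to make the sketch concrete:

    import numpy as np

    # A small hypothetical network: symmetric weights, zero diagonal.
    w = np.array([[ 0.0,  1.0, -2.0],
                  [ 1.0,  0.0,  1.0],
                  [-2.0,  1.0,  0.0]])
    s = np.array([1, 0, 1])  # initial state, neuron values in {0, 1}

    def update(s, w, i):
        """Local updating rule for neuron i."""
        v = sum(w[i, j] * s[j] for j in range(len(s)) if j != i)
        return 1 if v > 0 else 0  # discrete transfer function, threshold 0

    for i in range(len(s)):
        s[i] = update(s, w, i)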
In real life, biological neural networks behave “rationally” (at least, that is how we like to consider ourselves), suggesting some stable behavior of such networks. This stable behavior can be expressed by an equilibrium state of the network. A common approach to determine such stable states is to consider them as states of the network in which some so-called energy function is minimal. For each ANN we can likewise define an energy function which assigns a value to each possible state of the ANN. This energy function is based on the weights of the synapses and the values of the neurons connected by these synapses.

Example: For the network of the previous example one could define the energy function by:\[E=-\frac{1}{2}\sum_{i=1}^{N} \sum_{j=1, j\neq i}^{N} w_{ij}\cdot s_i \cdot s_j\]The minus sign ensures that the local updating rule above never increases the energy.
If certain conditions are satisfied (for instance, symmetric weights $w_{ij}=w_{ji}$ and asynchronous updates, one neuron at a time), applying the local updating rule iteratively makes the ANN converge to an equilibrium state, in which the energy function attains a (possibly only local) minimum.
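Continuing the sketch above (same hypothetical weights), one can watch the energy decrease while the local updating rule is applied asynchronously until nothing changes anymore:

    import numpy as np

    w = np.array([[ 0.0,  1.0, -2.0],   # same hypothetical weights as before
                  [ 1.0,  0.0,  1.0],
                  [-2.0,  1.0,  0.0]])
    s = np.array([1, 0, 1])

    def energy(s, w):
        """E = -1/2 * sum over i != j of w_ij * s_i * s_j."""
        return -0.5 * (s @ w @ s)   # diagonal of w is zero, so i = j adds nothing

    changed = True
    while changed:                  # stop at an equilibrium state
        changed = False
        for i in range(len(s)):
            v = w[i] @ s - w[i, i] * s[i]   # weighted input from the other neurons
            new = 1 if v > 0 else 0         # discrete transfer function
            if new != s[i]:
                s[i], changed = new, True
        print(s, energy(s, w))

With symmetric weights each accepted flip lowers (or at least does not raise) the energy, which is why the loop terminates.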

Optimization
Since mankind can solve any of its problems (at least, that is how we see ourselves), and the underlying machinery for this is provided by our biological neural networks, it makes sense to investigate whether ANNs can also be used to solve certain types of optimization problems. Since solutions of optimization problems are the minima of some target function, say an energy function, the underlying idea is to "program" the ANN in such a way that its energy function corresponds to the optimization problem we want to solve.

Using the local updating rule we can then try to determine the optimal solution iteratively. Important restrictions are that not every optimization problem can be programmed in this way, and that (as with all heuristics) the local updating rule will not always reach the global optimum, but can end up in a local optimum.

There are different types of ANNs, depending on the choices we make in our modeling. These types are:
  1. Discrete ANN’s
  2. Fuzzy ANN’s
  3. Quantum ANN’s
The way ANNs are defined has changed over time, based on the experience gained with them and on the quality of the methods proposed. The first ANNs were based on binary values for the nodes, using a discrete step function as transfer function, and they evolved into quantum-like ANNs, where discrete distributions are assigned as values to the nodes and a temperature-based Boltzmann function is used as transfer function.
These different types will be described in more detail below.

Discrete ANNs
In a discrete ANN the values of the neurons are restricted to 0 and 1 (sometimes the values -1 and 1 are used). In this discrete case the local updating rule is based on the discrete transfer function, and will only "jump" from one discrete state to another discrete state. Sometimes these jumps are "too big": the network lands in a state that no single jump can improve, so the local updating rule is not able to leave this state anymore. This results in a state that corresponds to only a local minimum instead of the global minimum we were looking for.

Example: In the discrete case the value 0 corresponds to the distribution (1, 0). The first entry in this vector is the probability that the value 0 occurs, the second entry the probability that the value 1 occurs. The value 1 corresponds to the distribution (0, 1).

Fuzzy ANNs
A first adjustment is to choose a different form for the transfer function. We can choose a "smooth" function instead of the discrete transfer function. This smooth function still depends on the input value, the weighted average of all connected neurons, but can now take all possible values between 0 and 1, instead of only 0 or 1. This type of ANN is called a fuzzy (or continuous) ANN.
It makes sense to consider such fuzzy ANNs: since the neurons take values between 0 and 1, each value can be interpreted as the probability that the neuron will fire a signal. The parameter that is used for "smoothing" the transfer function is called the temperature.

Example: In the fuzzy case each neuron has a distribution (p, 1-p), where p is a value between 0 and 1.
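A common choice for such a smooth transfer function is the sigmoid below; note that this particular function and its scaling are an assumption of this sketch, since the text above only requires a smooth function with values between 0 and 1:

    import numpy as np

    def fuzzy_transfer(v, T):
        """Smooth transfer function: maps the weighted input v to a firing
        probability between 0 and 1. The temperature T controls the smoothing:
        for small T it approaches the discrete transfer function."""
        return 1.0 / (1.0 + np.exp(-v / T))

    for T in (2.0, 0.5, 0.1):
        print(T, [round(fuzzy_transfer(v, T), 3) for v in (-1.0, 0.0, 1.0)])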

A disadvantage of assigning values between 0 and 1 to the neurons is that the local updating rule will not force the final state to be a discrete state, possibly leaving the ANN in a fuzzy state. We can take care of this by adding an extra term to the energy function which prefers discrete states, but in doing so we have to choose the appropriate weight factor for this extra term carefully, which is sometimes a harder job than the original optimization problem itself.
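One possible choice for such an extra term (an illustration only; the text leaves the exact form open) is\[E' = E + \gamma \sum_{i=1}^{N} s_i\,(1-s_i),\]which vanishes exactly when every $s_i$ equals 0 or 1 and is positive otherwise; choosing the weight factor $\gamma$ is the delicate part mentioned above.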

Quantum ANNs
A way to overcome this problem, one which automatically results in a feasible optimal state, is a "quantum" approach. Instead of assigning just a single value, we assign a discrete distribution to a neuron. That is, we assign to a neuron a number of possible values and, for each value, a probability that this value occurs.

Example: More generally we could assign K possible values to a neuron, with appropriate probabilities. This is called a K-state Potts spin. The natural representation for such a K-state Potts spin is a vector of K entries, where each entry has the value 0 or 1 (discrete variant) or between 0 and 1 (fuzzy variant). The vector should be a distribution, that is the sum of all entries should be 1.

The local updating rule can be reformulated in such a way that the resulting value is again a K-state Potts spin: when we apply the rule to a certain neuron, we take the weighted sum of the K-state Potts spins of all neurons connected to that neuron, and normalize the resulting K-vector in such a way that the sum of all entries becomes 1. There is an 'obvious' choice for this normalization.
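A sketch of this reformulated updating rule, taking the Boltzmann (softmax) normalization as the 'obvious' choice; this choice, the temperature and all numbers below are assumptions of the sketch:

    import numpy as np

    def potts_update(spins, w, i, T):
        """Update neuron i, where each neuron carries a K-state Potts spin
        (a probability vector of length K whose entries sum to 1)."""
        v = sum(w[i, j] * spins[j] for j in range(len(spins)) if j != i)
        p = np.exp(v / T)        # Boltzmann weights at temperature T
        return p / p.sum()       # normalize: entries positive, summing to 1

    # Tiny example: 2 neurons, K = 3 possible values (hypothetical numbers).
    spins = np.array([[0.2, 0.5, 0.3],
                      [0.6, 0.1, 0.3]])
    w = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    print(potts_update(spins, w, 0, T=0.5))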

The Art of Solving Sudokus

Introduction
In this notebook we will look at so-called artificial neural networks, and how these can be “programmed” to solve optimization problems. More specifically, we will look at a meta-heuristic using such artificial neural networks, which is inspired by physical phenomena like the cooling down of gases or the occurrence of permanent magnetization under certain conditions. The underlying physical model used to explain these phenomena can also be used to “solve” other optimization problems. To demonstrate the use of this meta-heuristic we will apply it to the popular Sudoku game. We will first show how the heuristic can be applied to solve a Sudoku, and then how the heuristic itself relates to the more intuitive approach commonly used to solve a Sudoku by hand. Finally we will discuss its weak points, and we will indicate some possible improvements to the meta-heuristic.

Meta-heuristics
In combinatorial optimization, exact solutions are sometimes hard to find due to the size of the optimization problem and the time it takes to solve it. In such cases it can be beneficial to use approximation algorithms, which do not guarantee to find the best solution, but often find a near-optimal solution in a reasonable time span. One can define “meta-heuristics” as general patterns for such algorithmic approaches to approximating optimal solutions of combinatorial optimization problems. Usually meta-heuristics are applied when exact solutions cannot be found in time; however, it is also worthwhile to apply them to easier optimization problems, where exact solutions can be found, so that one can compare the outcome of the meta-heuristic to these exact solutions.

In the following notes we will focus on the meta-heuristic Mean Field Annealing. We will describe how the meta-heuristic works and apply it to a simple optimization problem, which corresponds to solving the popular Sudoku game. In doing so we will focus on the stochastic nature of Mean Field Annealing, point out some "peculiar" behaviour of the meta-heuristic and suggest a way to measure the quality of the solutions found.

The Sudoku game
The Sudoku game consists of a 9x9 square of 81 fields. The purpose of the game is to fill in all fields using the integers 1, 2, ..., 9, in such a way that each column, each row and each of the nine smaller 3x3-squares contains all nine different integers. Since there are a lot of possibilities to fill in a blank 9x9 square in this way, some of the fields are pre-filled. For these fields the integers are fixed and given in advance, thus putting constraints on the possible solutions. The other fields have to be filled in by the Sudoku player. A good Sudoku contains enough pre-filled fields to have exactly one solution. An example of a Sudoku follows below:
[[Insert Picture]]
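The rules above translate directly into a simple check, sketched here in Python (grid is assumed to be a 9x9 list of lists of integers):

    def is_solved(grid):
        """Check the Sudoku rules: every row, every column and every
        3x3-square must contain each of the integers 1..9 exactly once."""
        full = set(range(1, 10))
        rows = all(set(row) == full for row in grid)
        cols = all({grid[r][c] for r in range(9)} == full for c in range(9))
        sqrs = all({grid[br + i][bc + j] for i in range(3) for j in range(3)} == full
                   for br in (0, 3, 6) for bc in (0, 3, 6))
        return rows and cols and sqrs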

One way to solve a Sudoku is to keep track of the possible values for each field, that is, to maintain for each field a list of all possible integers, excluding the integers that are already used, but also excluding integers that can be ruled out by more “advanced” reasoning. Such a list can be constructed easily by looking at the column, the row and the 3x3-square to which the field belongs. We can start constructing the list for an arbitrary field, do the same for a next (perhaps less arbitrary) field, and continue this way, of course using the additional information obtained from each list.
For some Sudokus these lists are sufficient to fill in all fields, because after some iterations each list uniquely determines the integer value of the corresponding field. These are the easy ones. For more difficult Sudokus it takes more effort: we have to combine the information from the lists of several fields, and sometimes even that will not be enough and we have to go through all remaining possibilities.
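A minimal sketch of this bookkeeping (0 denotes an empty field; the helper names are of course our own):

    def candidates(grid, r, c):
        """All values still possible for field (r, c): the integers 1..9 that
        do not yet occur in the field's row, column or 3x3-square."""
        if grid[r][c] != 0:
            return {grid[r][c]}
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return set(range(1, 10)) - used

    def propagate(grid):
        """Repeatedly fill in every field whose candidate list has exactly
        one entry, until no field changes anymore."""
        progress = True
        while progress:
            progress = False
            for r in range(9):
                for c in range(9):
                    if grid[r][c] == 0:
                        cand = candidates(grid, r, c)
                        if len(cand) == 1:
                            grid[r][c] = cand.pop()
                            progress = True
        return grid

For the easy Sudokus this propagation alone fills in the whole grid; for the difficult ones it stalls with empty fields left, which is exactly where the combining of lists, or searching through the remaining possibilities, comes in.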

This intuitive approach to solving Sudokus is quite similar to what one can accomplish using artificial neural networks and the Mean Field Annealing meta-heuristic. Before going into the details of those artificial neural networks, it is good to already highlight these similarities:
  1. We first pick a field at random, to investigate its possible values. The randomness can be decreased by learning from the past, that is, by using information obtained in previous steps.
  2. For each field we keep track of the possible values by maintaining a list of those values. These lists will be used in subsequent iterations to determine or update the lists for other fields.
What will be interesting to see is why some Sudokus are so hard to solve and why others are just a piece of cake. We will see that this relates directly to what artificial neural networks and Mean Field Annealing will do for us. In the end we hope this provides us with a means to determine, while solving problems with the Mean Field Annealing meta-heuristic, whether we are heading the right way or will get stuck at some "solution" that does not solve our problem.
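To foreshadow the connection with the sections above: the candidate list of a field corresponds naturally to a 9-state Potts spin. A sketch of that encoding (the uniform initialization of empty fields is an assumption; any distribution over the still-possible values would do):

    import numpy as np

    def encode(grid):
        """Encode a 9x9 Sudoku grid (0 = empty) as 81 nine-state Potts spins:
        empty fields start as the uniform distribution over 1..9, while
        pre-filled fields are clamped to their given value."""
        spins = np.full((9, 9, 9), 1.0 / 9.0)
        for r in range(9):
            for c in range(9):
                if grid[r][c] != 0:
                    spins[r, c] = 0.0
                    spins[r, c, grid[r][c] - 1] = 1.0
        return spins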