This is an animated gif of the weight space of a network trying to learn the number 7. Each time the system learned (i.e., each time the weights were updated), the weights were dumped to disk as a .ppm file. I then ran convert to turn the .ppm files into .gif files, and whirlgif to combine those .gif files into a single animated gif.
The system was run for 10 epochs on 2000 training patterns per epoch. Its error was computed before each epoch on 500 testing patterns. I didn't randomize selection from the database, but it would have been better if I had. The learning rate was set to 0.01 and the bias was set to 0. Note that, since weights were written out only when the system learned, more of the images in the file come from early in training, when errors, and hence weight updates, were more frequent.
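For concreteness, here is a minimal sketch of that training schedule, assuming a single threshold unit with one weight per MNIST pixel; the class and method names are hypothetical and the real program's structure may differ:

```java
// Sketch of a single-unit perceptron matching the setup described above.
public class SevenPerceptron {

    static final int INPUTS = 28 * 28;           // one weight per MNIST pixel
    static final double LEARNING_RATE = 0.01;
    static final double BIAS = 0.0;              // bias fixed at 0

    double[] weights = new double[INPUTS];       // starts at all zeros

    // Fire (output 1) if the weighted sum plus bias is positive.
    int predict(double[] pixels) {
        double sum = BIAS;
        for (int i = 0; i < INPUTS; i++) sum += weights[i] * pixels[i];
        return sum > 0 ? 1 : 0;
    }

    // Fraction of patterns classified incorrectly.
    double error(double[][] patterns, int[] targets) {
        int wrong = 0;
        for (int p = 0; p < patterns.length; p++)
            if (predict(patterns[p]) != targets[p]) wrong++;
        return (double) wrong / patterns.length;
    }

    // Report error on the test set before each epoch, then sweep the training
    // set, updating the weights only when a pattern is misclassified.
    void run(double[][] train, int[] trainTargets,
             double[][] test, int[] testTargets, int epochs) {
        for (int epoch = 1; epoch <= epochs; epoch++) {
            System.err.println("At epoch " + epoch + " error is "
                               + error(test, testTargets));
            for (int p = 0; p < train.length; p++) {
                int err = trainTargets[p] - predict(train[p]);  // nonzero only on a mistake
                if (err != 0) {
                    for (int i = 0; i < INPUTS; i++)
                        weights[i] += LEARNING_RATE * err * train[p][i];
                    // This is the point at which the weights would be dumped to a .ppm file.
                }
            }
        }
    }
}
```

Driving run() with 2000 training patterns, 500 testing patterns, and 10 epochs reproduces the schedule above; where the weight dump happens inside the loop is also why the animation has more frames from early in training.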
A couple of things to note: red indicates negative (inhibitory) weights and green indicates positive (excitatory) weights. Also, I normalized the weights to between -1 and 1 using a sigmoidal squashing function. This was necessary because the .ppm format requires a maximum value. Since there is no theoretical bound on the weights, I simply squashed them and then multiplied by 256 to get a nice bound. Here is my code for generating a .ppm file. It is nearly identical to a routine distributed in the MNISTTools package.
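In outline, such a routine might look like the following sketch. It assumes a 28x28 weight array, uses tanh as the sigmoidal squash, and writes the standard PPM maximum value of 255; the names and details here are illustrative, not the original routine:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Sketch of a routine that writes a weight array as an ASCII .ppm image.
// Negative weights go to the red channel, positive weights to the green channel.
public class WeightPpm {

    public static void write(double[][] weights, String filename) throws IOException {
        int rows = weights.length, cols = weights[0].length;
        try (PrintWriter out = new PrintWriter(new FileWriter(filename))) {
            out.println("P3");                  // plain (ASCII) PPM
            out.println(cols + " " + rows);     // image dimensions
            out.println("255");                 // maximum color value required by the format
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    // Squash the unbounded weight into (-1, 1), then scale to a pixel value.
                    double squashed = Math.tanh(weights[r][c]);
                    int intensity = (int) Math.min(255, Math.abs(squashed) * 256);
                    int red   = squashed < 0 ? intensity : 0;   // inhibitory
                    int green = squashed > 0 ? intensity : 0;   // excitatory
                    out.println(red + " " + green + " " + 0);
                }
            }
        }
    }
}
```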
Here are my errors (printed to stderr by my Java program):
At epoch 1 error is 0.81
At epoch 2 error is 0.12
At epoch 3 error is 0.08
At epoch 4 error is 0.04
At epoch 5 error is 0.03
At epoch 6 error is 0.036
At epoch 7 error is 0.04
At epoch 8 error is 0.042
At epoch 9 error is 0.038
At epoch 10 error is 0.046