Implement a ``vanilla'' multilayer perceptron and backpropagation to calculate the gradient of the error and gradient descent (with a control to switch it between stochastic and batch modes), and train it to do XOR. You can implement thresholds (biases) however you find most convenient. Be sure to initialize the weights to small random numbers, to break symmetry.
You should turn in: your code, an animated Hinton diagram of the evolution of the weights for a good run, and a graph of the error as a function of training time. For extra brownie points, graph the probability of learning the training set correctly as a function of the number of hidden units, or show the output as a function of the two inputs for real-valued inputs ranging between 0 and 1, using a PPM file or a 3D plot or whatever.
You're still encouraged to work in teams. This time, there is no extra work involved, but please describe how you divided up the project.