Fall 2011, NUIM CS401: Machine Learning
M 10-10:50, Arts Hall D
T 11-11:50, Arts Hall C
Instructional materials
Assignments
I encourage you to work in teams of two or even three people. I
recommend forming teams that are not "clones", i.e., teams whose
members have varying backgrounds; this diversity tends to lead to
greater success.
Assignment 1
HW1 is a supervised learning problem, but you are free to use whatever
ML methods you please to solve it: any algorithms, supervised or
unsupervised. The inputs are 40-dimensional numeric vectors. There are
two classes: A and B. There are 10000 labeled samples in the training
set, hw1-train.txt, and 1000 unlabeled samples in the test set,
hw1-test.txt.
Your goal is to train a machine learning system using the given data,
label the test data as well as you can, and send me these labels. The
format for your labels should be a 1000 line ASCII file, each line of
which is one character long: either A or B, corresponding to the label
given to the input pattern in the corresponding line of the test set
file. I will measure your error rate on the test set by comparing
your labels to the truth. Your score will be in part based on this.
You should also give me your estimate of the error rate you
expect me to see; I will give points for accuracy of that estimate,
subject to some reasonableness conditions. (E.g., if you just guess
random labels and estimate an error rate of 50%, I will not be
impressed.) Lastly, turn in a report showing what you did:
include code you wrote, graphs generated (please label the X and Y
axes) to determine parameters for algorithms, brief English
descriptions of what you did, and anything else you want to show me.
If you worked in a team, please very briefly state what each
person on the team did.
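
One possible way to arrive at the error-rate estimate and the label file described above is sketched here in Python. This is only an illustration, not a required approach: the nearest-centroid classifier is a stand-in for whatever method you choose, the function names are hypothetical, and the synthetic data generator exists only so the sketch runs on its own (it does not reflect the real contents or file format of hw1-train.txt). The error rate is estimated by k-fold cross-validation on the labeled data.

```python
import random

def make_synthetic(n, dim=40, seed=1):
    """Hypothetical stand-in data: two Gaussian classes, NOT the real HW1 data."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = "A" if i % 2 == 0 else "B"
        shift = 0.5 if label == "A" else -0.5
        X.append([rng.gauss(shift, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y

def nearest_centroid_fit(X, y):
    """Return the mean vector of each class (assumes both classes are present)."""
    centroids = {}
    for label in ("A", "B"):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def cross_val_error(X, y, k=5, seed=0):
    """Estimate the error rate by k-fold cross-validation on labeled data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        # Count mistakes only on samples the model did not see in training.
        errors += sum(nearest_centroid_predict(model, X[i]) != y[i]
                      for i in fold)
    return errors / len(X)

def format_labels(labels):
    """Return the label file contents: one character (A or B) per line."""
    if any(label not in ("A", "B") for label in labels):
        raise ValueError("every label must be 'A' or 'B'")
    return "".join(label + "\n" for label in labels)

def write_labels(labels, path):
    """Write the labels as an ASCII file, one label per line."""
    with open(path, "w", encoding="ascii") as f:
        f.write(format_labels(labels))
```

The point of holding samples out is that an error rate measured on the same samples used for training is usually optimistic; the cross-validated figure is a more defensible estimate to report. Before sending the label file, it is worth checking that it has exactly 1000 lines.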
Deadline: before class, Mon 17-Oct-2011.
Assignment 2
Draw ROC curves for detecting class A, using the training data from
Assignment 1:
- Forty ROC curves, one for each of the 40 input dimensions, using the
value in that dimension as the score to threshold for discrimination.
- One ROC curve (or more if you so desire) for the discriminator you
actually used for HW1. (You may need to do something extra for this
to make sense. E.g., a 1-nearest-neighbour classifier gives only one
point on the ROC curve; to get the remainder of the curve, you could
vary a threshold on the ratio between the distance to the nearest A
and the distance to the nearest B, weighting the decision towards As
or Bs.)
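
The two items above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function names are hypothetical, a higher score is taken to mean "more A-like", and the 1-NN score shown is one way to realise the distance ratio idea. Plotting is left out; the functions only compute the points of the curve.

```python
import math

def roc_points(scores, labels, positive="A"):
    """Return (false positive rate, true positive rate) pairs obtained by
    sweeping a threshold from the highest score down to the lowest.
    Assumes both classes occur in labels."""
    n_pos = sum(1 for t in labels if t == positive)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score so ties yield one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def one_nn_ratio_score(x, train_X, train_y):
    """Distance to the nearest B divided by distance to the nearest A.
    Large values mean x looks like an A. When scoring a training sample,
    the caller should leave that sample out of train_X and train_y."""
    d_a = min(math.dist(x, t) for t, lab in zip(train_X, train_y) if lab == "A")
    d_b = min(math.dist(x, t) for t, lab in zip(train_X, train_y) if lab == "B")
    if d_a == 0.0:
        return float("inf")
    return d_b / d_a
```

For the per-dimension curves, the score for dimension d would be the d-th component of each input, e.g. roc_points([x[d] for x in X], y) with X and y being the training vectors and labels. If the curve for some dimension falls below the diagonal, larger values in that dimension indicate B rather than A, and negating the score flips the curve.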
Deadline: before class, Tue 25-Oct-2011.
FAQ
If you ask me a question about the assignment that I think others
might be interested in, I will post the answer here.