EDIT: As another version of the ml-class course has started, I've made the repository private.
Back when I was in college, I took three different courses that dealt with subjects related to machine learning and data mining. Although I never lost interest in those subjects, my work has led me in a totally unrelated direction, so I haven’t exercised any of that knowledge in about eight years. A few weeks ago, I stumbled upon Stanford’s online class on Machine Learning and decided to enroll. I want to revive many of the things I have forgotten and try to put them into practice, as nowadays it’s very easy to access large amounts of interesting data from all kinds of online sources.
The programming exercises for this class are supposed to be done in Octave or Matlab. While I understand the advantages of these tools, my past experience (where all the exercises and projects were done in either SAS or Matlab) tells me that not using a general-purpose programming language makes it harder to turn academic exercises into real-world programs. As Professor Andrew Ng said in the introduction, one of the goals of the class is for us to put machine learning into practice on real-world problems we care about, so I decided to implement all the algorithms and exercises in F#.
I’m using Math.NET Numerics for the linear algebra and statistics, and while some operations are not as simple as in Octave, being able to use higher-order functions instead of for loops makes up for it. As an example, look at the code for feature normalization in Octave:
[Octave code listing: feature normalization (not preserved)]
And here’s the equivalent F# code:
[F# code listing: feature normalization (not preserved)]
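As a rough illustration of the higher-order-function style, here is a minimal sketch of column-wise feature normalization. It is not the code from the repository: it uses plain F# arrays rather than Math.NET Numerics types, and the names `stdDev` and `featureNormalize` are assumptions made for this example.

```fsharp
// Sample standard deviation (divides by n - 1, which is also Octave's default for std)
let stdDev (xs: float[]) =
    let μ = Array.average xs
    let sumSq = xs |> Array.sumBy (fun x -> (x - μ) ** 2.0)
    sqrt (sumSq / float (xs.Length - 1))

// Normalizes every column of a row-major matrix (assumed non-empty and rectangular).
// Returns the normalized rows together with the per-column mean and standard deviation.
let featureNormalize (rows: float[][]) =
    let nCols = rows.[0].Length
    // Extract each column so mean and deviation can be computed with a plain map
    let columns = Array.init nCols (fun j -> rows |> Array.map (fun row -> row.[j]))
    let μ = columns |> Array.map Array.average
    let σ = columns |> Array.map stdDev
    let normalized =
        rows |> Array.map (fun row -> row |> Array.mapi (fun j x -> (x - μ.[j]) / σ.[j]))
    normalized, μ, σ

// Illustrative data: two features on very different scales
let xNorm, μ, σ =
    featureNormalize [| [| 1.0; 10.0 |]; [| 2.0; 20.0 |]; [| 3.0; 30.0 |] |]
```

The mean and deviation are returned along with the normalized data because new inputs have to be scaled with the same values before making predictions.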
Even though calculating the mean and standard deviation is a little more elaborate than a single built-in function call, the actual normalization code is more straightforward, so we’re not really losing any productivity by using F#, considering that we have the same kind of development interactivity thanks to F# Interactive. Other nice touches in F# are the ability to use Greek letters in identifiers and the possibility of separating the data from the algorithm parameters by means of the pipelining operator (|>). By leaving all the data parameters at the end and aggregating them into a tuple, we can turn this code:
which I think is clearer.
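To make the idea concrete, here is a minimal sketch using a hypothetical `gradientDescent` function for one-feature linear regression. The name, signature, and data are assumptions for illustration, not the code from the repository. The data goes last, as a single tuple, so it can be piped in while the algorithm parameters stay together.

```fsharp
// Hypothetical batch gradient descent for y ≈ θ0 + θ1 * x.
// Algorithm parameters (α, iterations) come first; the data tuple comes last.
let gradientDescent (α: float) (iterations: int) (xs: float[], ys: float[]) =
    let m = float xs.Length
    let step (θ0, θ1) =
        // Prediction errors for the current parameters
        let errors = Array.map2 (fun x y -> θ0 + θ1 * x - y) xs ys
        let grad0 = Array.sum errors / m
        let grad1 = Array.sum (Array.map2 (fun e x -> e * x) errors xs) / m
        (θ0 - α * grad0, θ1 - α * grad1)
    // Apply the update step the requested number of times, starting from (0, 0)
    Seq.fold (fun θ _ -> step θ) (0.0, 0.0) (seq { 1 .. iterations })

// Illustrative data lying exactly on the line y = 1 + 2x
let xs = [| 1.0; 2.0; 3.0; 4.0 |]
let ys = [| 3.0; 5.0; 7.0; 9.0 |]

// Data mixed in with the algorithm parameters
let θDirect = gradientDescent 0.05 5000 (xs, ys)

// Data piped in, parameters kept apart
let θPiped = (xs, ys) |> gradientDescent 0.05 5000
```

Both calls compute the same result; the piped form simply reads as "take this data, then run the algorithm with these settings".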
I also like the fact that Math.NET Numerics has a separate type for vectors instead of using one-column or one-row matrices for that, allowing us not to worry about transposing the vectors when multiplying.
Mimicking Matlab’s charting capabilities was a bit trickier. For the majority of charting I’m using FSharpChart, but the Microsoft Chart Controls that it wraps support neither contour nor surface plots. For contour plots I ended up using WPF Dynamic Data Display, and for surface plots I adapted a sample from CodeProject. I’m not really happy with the surface plot, as it’s very basic, so I’ll definitely try to improve it in the future. Here’s a sample of charts from Matlab:
And here are the equivalent ones I was able to produce in F#:
I’m making the code available at https://github.com/ovatsus/MLClass. I’m only going to push the commits weekly, after the class exercise deadlines expire, so at the time of this post only the implementation of the Linear Regression exercises is available.
So far this has been a very rewarding experience: besides laying the groundwork for reusing this code in the future, I've been submitting patches to both Math.NET Numerics and FSharpChart.
As some people have pointed out in the comments, the Octave version of feature normalization could also be done without iteration:
[Octave code listing: feature normalization without iteration (not preserved)]
In this case the corresponding F# code would be:
[F# code listing: feature normalization without iteration (not preserved)]