Amazon Machine Learning Service initial tests
Today I ran a simple test against the Amazon Machine Learning service on AWS. The idea was straightforward: take the MNIST dataset available on Kaggle.com, already split into training and test sets, push it to Amazon S3, and run it through the Amazon Machine Learning service to see what happens.
I intentionally left all the defaults in place (although the service is somewhat customizable) because I wanted to find out what score I would get in the Kaggle Digit Recognizer competition.
First impression: the Amazon Machine Learning service is pretty simple to use. You need to set up some permissions on the S3 buckets you use for reading data and for writing the results of your batch predictions, but it's all rather clear and easy.
After the data source is created, the model is prepared, and the predictions are made, the result file can be found in the specified location on S3. To upload it to Kaggle, some additional transformations need to be applied to the result data set.
Below is a simple R script that loads the data and transforms it to select the most probable digit for each prediction row:
data <- read.csv("/path to your result file/result_mnist.csv", sep = ",", header = TRUE)
# name the columns
names(data) <- c("5","6","label","7","2","8","3","9","0","1","4")
# remove the label column and the first row
data <- data[2:28001, -3]
# for each row, select the column (digit) with the highest predicted score
biggests <- colnames(data)[apply(data, 1, which.max)]
# prepare the data for Kaggle
biggests_num <- as.numeric(biggests)
mnist <- as.data.frame(biggests_num)
mnist$ImageId <- seq.int(nrow(mnist))
names(mnist) <- c("Label", "ImageId")
write.csv(mnist, file = "/path to your output folder/mnist_output.csv", row.names = FALSE)
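To make the key step easier to follow, here is a minimal sketch of the same row-wise selection on a tiny made-up table. The data frame scores and its values are hypothetical stand-ins for the real batch prediction output; only the which.max idiom is taken from the script above.

```r
# Toy stand-in for the prediction output: one row per image,
# one column of class scores per digit (values are made up).
scores <- data.frame(
  "5" = c(0.10, 0.70, 0.05),
  "0" = c(0.80, 0.20, 0.15),
  "3" = c(0.10, 0.10, 0.80),
  check.names = FALSE  # keep the digit names as column names
)

# which.max gives the position of the largest score in each row;
# indexing colnames() with those positions yields the digit names.
predicted <- as.numeric(colnames(scores)[apply(scores, 1, which.max)])

# Pair each prediction with a 1-based image id.
submission <- data.frame(ImageId = seq_along(predicted), Label = predicted)
print(submission)
```

With these made-up scores the three rows resolve to the digits 0, 5 and 3. Because the column names carry the digit labels, the order of the columns in the result file does not matter as long as the names are assigned correctly.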
The score received (0.91800) is not impressive, but the simplicity of the service actually is.
Happy Saturday everyone!