Microsoft Azure Machine Learning Algorithm Cheat Sheet

Azure Machine Learning Studio comes with a large number of machine learning algorithms that you can use to build your predictive analytics solutions. These algorithms fall into the general machine learning categories of regression, classification, clustering, and anomaly detection, and each one is designed to address a different type of machine learning problem.

The question is: is there something that can help you quickly figure out which machine learning algorithm to choose for your specific solution?

The Microsoft Azure Machine Learning Algorithm Cheat Sheet is designed to help you sift through the available machine learning algorithms and choose the appropriate one to use for your predictive analytics solution. The cheat sheet asks you questions about both the nature of your data and the problem you’re working to address, and then suggests an algorithm for you to try.

Download the cheat sheet here: Microsoft Azure Machine Learning Algorithm Cheat Sheet

For a deeper discussion of the different types of machine learning algorithms and how they’re used, see How to choose an algorithm in Azure Machine Learning. For a list of all the machine learning algorithms available in Machine Learning Studio, see Initialize Model in the Machine Learning Studio Algorithm and Module Help.

  • The suggestions offered in this cheat sheet are approximate rules of thumb. Some can be bent, and some can be flagrantly violated. They are intended to suggest a starting point. Don't be afraid to run a head-to-head competition between several algorithms on your data. There is simply no substitute for understanding the principles of each algorithm and the system that generated your data.
  • Every machine learning algorithm has its own style or inductive bias. For a specific problem, several algorithms may be appropriate and one algorithm may be a better fit than others. But knowing which will be the best fit beforehand is not always possible. In cases like these, several algorithms are listed together in the cheat sheet. An appropriate strategy would be to try one algorithm, and if the results are not yet satisfactory, try the others. Here’s an example from the Azure Machine Learning Gallery of an experiment that tries several algorithms against the same data and compares the results: Compare Multi-class Classifiers: Letter recognition.
  • There are three main categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
  • In supervised learning, each data point is labeled or associated with a category or value of interest. An example of a categorical label is assigning an image as either a ‘cat’ or a ‘dog’. An example of a value label is the sale price associated with a used car. The goal of supervised learning is to study many labeled examples like these, and then to be able to make predictions about future data points – for example, to identify new photos with the correct animal or to assign accurate sale prices to other used cars. This is a popular and useful type of machine learning. All of the modules in Azure Machine Learning are supervised learning algorithms except for K-Means Clustering.
  • In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters, as K-means does, or finding different ways of looking at complex data so that it appears simpler.
  • In reinforcement learning, the algorithm gets to choose an action in response to each data point. It is a common approach in robotics, where the set of sensor readings at one point in time is a data point, and the algorithm must choose the robot’s next action. It’s also a natural fit for Internet of Things applications. The learning algorithm also receives a reward signal a short time later, indicating how good the decision was. Based on this, the algorithm modifies its strategy in order to achieve the highest reward. Currently there are no reinforcement learning algorithm modules in Azure ML.
  • Bayesian methods make the assumption of statistically independent data points. This means that the unmodeled variability in one data point is uncorrelated with others, that is, it can’t be predicted. For example, if the data being recorded is the number of minutes until the next subway train arrives, two measurements taken a day apart are statistically independent. However, two measurements taken a minute apart are not statistically independent – the value of one is highly predictive of the value of the other.
  • Boosted decision tree regression takes advantage of feature overlap or interaction among features. That means that, in any given data point, the value of one feature is somewhat predictive of the value of another. For example, in daily high/low temperature data, knowing the low temperature for the day allows you to make a reasonable guess at the high. The information contained in the two features is somewhat redundant.
  • Classifying data into more than two categories can be done by either using an inherently multi-class classifier, or by combining a set of two-class classifiers into an ensemble. In the ensemble approach, there is a separate two-class classifier for each class – each one separates the data into two categories: “this class” and “not this class.” Then these classifiers vote on the correct assignment of the data point. This is the operational principle behind One-vs-All Multiclass.
  • Several methods, including logistic regression and the Bayes point machine, assume linear class boundaries, that is, that the boundaries between classes are approximately straight lines (or hyperplanes in the more general case). Often this is a characteristic of the data that you don't know until after you've tried to separate it, though visualizing the data beforehand can often give you a good idea. If the class boundaries look very irregular, stick with decision trees, decision jungles, support vector machines, or neural networks.
  • Neural networks can be used with categorical variables by creating a dummy variable for each category and setting it to 1 in cases where the category applies, 0 where it doesn’t.
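To make the clustering point above concrete, here is a minimal sketch of the K-means idea in plain Python (a toy one-dimensional version for illustration, not the Studio K-Means Clustering module itself): repeatedly assign each point to its nearest center, then move each center to the mean of its assigned points.

```python
def kmeans_1d(points, centers, n_iters=10):
    """Toy 1-D K-means: assign points to the nearest center, then
    recompute each center as the mean of its assigned points."""
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for each point.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster went empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups around 1.0 and 9.5; the centers converge onto them.
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0],
                              centers=[0.0, 5.0])
```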
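The statistical-independence point in the subway example can be illustrated numerically (the data below is made up for the illustration): consecutive wait-time measurements are highly correlated, while a measurement from an unrelated day carries no structure.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
# Made-up wait times: within one stretch they count down minute by minute,
# so each measurement strongly predicts the next one...
day = [t + random.random() for t in range(10, 0, -1)]
lag1 = pearson(day[:-1], day[1:])      # close to 1: not independent

# ...whereas measurements taken on a different day share no structure.
other_day = [random.uniform(1.0, 10.0) for _ in range(9)]
across = pearson(day[:-1], other_day)  # no systematic relationship
```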
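The one-vs-all ensemble described above can be sketched in plain Python. The two-class learner here is deliberately trivial (a nearest-class-mean scorer, invented for illustration); the point is the wiring: one "this class vs. not this class" scorer per class, with the highest-scoring class winning the vote.

```python
def train_two_class(points, labels, positive):
    """Trivial two-class scorer for illustration only: score is the
    distance to the 'rest' mean minus the distance to the positive-class
    mean, so higher means 'more like this class'."""
    pos = [p for p, l in zip(points, labels) if l == positive]
    neg = [p for p, l in zip(points, labels) if l != positive]
    mean = lambda xs: [sum(col) / len(xs) for col in zip(*xs)]
    mpos, mneg = mean(pos), mean(neg)
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    return lambda x: dist(x, mneg) - dist(x, mpos)

def one_vs_all(points, labels):
    classes = sorted(set(labels))
    # One two-class scorer per class: "this class" vs. "not this class".
    scorers = {c: train_two_class(points, labels, c) for c in classes}
    # The scorers vote: predict the class whose scorer is most confident.
    return lambda x: max(classes, key=lambda c: scorers[c](x))

X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
predict = one_vs_all(X, y)
```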
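The dummy-variable trick for feeding categorical variables to a neural network is simple enough to show directly (plain Python; the category names are illustrative):

```python
def one_hot(values, categories=None):
    """Encode a categorical column as dummy 0/1 variables: one column per
    category, set to 1 where the category applies and 0 where it doesn't."""
    if categories is None:
        categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Columns come out in sorted category order: blue, green, red.
encoded = one_hot(["red", "green", "red", "blue"])
```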

Azure ML: NCAA Bracket Prediction – A competitor perspective

If you have seen the Day 2 BUILD keynote, you might have noticed Joseph Sirosh, CVP of Machine Learning, talking about an internal hackathon where competitors used Azure ML to predict the 2015 March Madness bracket. So how hard was it to develop an algorithm and generate a bracket that got me onto the leaderboard there? I am not a data scientist, but I have a passion for machine learning, and Azure ML has piqued my interest. I entered the contest purely as a learning experience.

Azure ML allows users to publish trained machine learning models as web services, enabling quick operationalization of their experiments. The first step in the process is creating a training experiment to train a model. The trained model is then published as a web service using a scoring experiment. The end result of this process is a "default endpoint".
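Calling that default endpoint is an ordinary HTTPS POST with a JSON body. Here is a rough sketch in Python; the endpoint URL, API key, and column names are placeholders, and the payload shape is an assumption based on the classic Studio request format rather than something shown in the post.

```python
import json
import urllib.request

# Placeholders: a real experiment supplies its own endpoint URL and API key.
ENDPOINT = "https://ussouthcentral.services.azureml.net/.../score"
API_KEY = "YOUR_API_KEY"

def build_payload(column_names, rows):
    """Build the JSON body for a scoring request (the 'Inputs'/'input1'
    shape is assumed from the classic Studio request format)."""
    return {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows}
        },
        "GlobalParameters": {},
    }

def score(column_names, rows):
    """POST the payload to the published web service (network call)."""
    body = json.dumps(build_payload(column_names, rows)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example payload only; score() would actually hit the service.
payload = build_payload(["Seed1", "Seed2"], [[1, 16]])
```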


To begin, sign in to ML Studio and watch the getting-started videos, and you will be on your way to creating your first machine learning experiment.

To generate the NCAA bracket prediction, I used a standard classification algorithm, "Two-Class Boosted Decision Tree," and fed it the historical NCAA tournament data.
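The post doesn't show code for this step, but the boosting idea behind a boosted decision tree can be sketched in plain Python using one-level trees ("stumps") with AdaBoost-style re-weighting. This is a simplification of the gradient boosting the Studio module actually uses, shown only to illustrate the technique: each round fits a weak tree, then up-weights the examples it got wrong so the next tree focuses on them.

```python
import math

def train_stumps(X, y, n_rounds=10):
    """AdaBoost-style boosting of one-level decision stumps.
    X: list of feature vectors; y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively search (feature, threshold, polarity) stumps.
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] > t else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, preds)
        err, f, t, pol, preds = best
        err = max(err, 1e-10)              # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # Re-weight: misclassified examples gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * (pol if x[f] > t else -pol) for a, f, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data: -1 = Team1 loses, +1 = Team1 wins, one made-up feature.
X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = train_stumps(X, y, n_rounds=3)
```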

When I scored this simple model and evaluated the results, I wasn't satisfied: it pretty much picked according to the rankings and predicted the higher seed to win every game. So I wanted to train a higher-accuracy model and looked into R and the standard R packages in the field. I used the well-known rpart package to solve this classification problem and created another training experiment.

Creating an R model was quite easy in ML Studio; all I had to do was include a short training script that loads the rpart package and trains a model to predict Team1Wins (or not) from the features in my dataset.

Once I ran the training experiment, I clicked "Create Scoring Experiment" and scored the model. I also included a few R scripts to calculate weighted probabilities for each year's historical data.

Once I evaluated the scored model, I looked at the probabilities in the results. To fine-tune them, I split out the data where accuracy was lower and scored it with the "Two-Class Boosted Decision Tree" model I had trained earlier. As you can see, splitting datasets and routing them to the appropriate trained models is quite easy and doesn't require complex data transformations.


Here is the bracket generated from my predicted results. It was produced by a simple program that the hackathon organizers used to call the web service provided by each participant.

Isn’t it amazing that with a few clicks I was able to create a web service that accurately predicted the bracket’s Final Four teams? Azure ML rocks!

If you are interested in learning more about Azure ML, here is our free Microsoft Press ebook by Jeff Barnes.