Microsoft Azure Machine Learning Algorithm Cheat Sheet

Azure Machine Learning Studio comes with a large number of machine learning algorithms that you can use to build your predictive analytics solutions. These algorithms fall into the general machine learning categories of regression, classification, clustering, and anomaly detection, and each one is designed to address a different type of machine learning problem.

The question is: is there something that can help me quickly figure out which machine learning algorithm to choose for my specific solution?

The Microsoft Azure Machine Learning Algorithm Cheat Sheet is designed to help you sift through the available machine learning algorithms and choose the appropriate one to use for your predictive analytics solution. The cheat sheet asks you questions about both the nature of your data and the problem you’re working to address, and then suggests an algorithm for you to try.

Download the cheat sheet here: Microsoft Azure Machine Learning Algorithm Cheat Sheet


For a deeper discussion of the different types of machine learning algorithms and how they’re used, see How to choose an algorithm in Azure Machine Learning. For a list of all the machine learning algorithms available in Machine Learning Studio, see Initialize Model in the Machine Learning Studio Algorithm and Module Help.

  • The suggestions offered in this cheat sheet are approximate rules of thumb. Some can be bent, and some can be flagrantly violated. They are intended to suggest a starting point. Don’t be afraid to run a head-to-head competition between several algorithms on your data. There is simply no substitute for understanding the principles of each algorithm and the system that generated your data.
  • Every machine learning algorithm has its own style or inductive bias. For a specific problem, several algorithms may be appropriate and one algorithm may be a better fit than others. But knowing which will be the best fit beforehand is not always possible. In cases like these, several algorithms are listed together in the cheat sheet. An appropriate strategy would be to try one algorithm, and if the results are not yet satisfactory, try the others. Here’s an example from the Azure Machine Learning Gallery of an experiment that tries several algorithms against the same data and compares the results: Compare Multi-class Classifiers: Letter recognition.
  • There are three main categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
  • In supervised learning, each data point is labeled or associated with a category or value of interest. An example of a categorical label is labeling an image as either a ‘cat’ or a ‘dog’. An example of a value label is the sale price associated with a used car. The goal of supervised learning is to study many labeled examples like these and then make predictions about future data points: for example, to label new photos with the correct animal or to assign accurate sale prices to other used cars. This is a popular and useful type of machine learning. All of the modules in Azure Machine Learning are supervised learning algorithms except for K-Means Clustering.
  • In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters, as K-means does, or finding different ways of looking at complex data so that it appears simpler.
  • In reinforcement learning, the algorithm gets to choose an action in response to each data point. A short time later, the learning algorithm receives a reward signal indicating how good the decision was, and based on this it modifies its strategy in order to achieve the highest reward. This is a common approach in robotics, where the set of sensor readings at one point in time is a data point and the algorithm must choose the robot’s next action. It’s also a natural fit for Internet of Things applications. Currently there are no reinforcement learning algorithm modules in Azure ML.
  • Bayesian methods assume statistically independent data points. This means that the unmodeled variability in one data point is uncorrelated with that in the others; that is, it can’t be predicted from them. For example, if the data being recorded is the number of minutes until the next subway train arrives, two measurements taken a day apart are statistically independent. However, two measurements taken a minute apart are not statistically independent: the value of one is highly predictive of the value of the other.
  • Boosted decision tree regression takes advantage of feature overlap or interaction among features. That means that, in any given data point, the value of one feature is somewhat predictive of the value of another. For example, in daily high/low temperature data, knowing the low temperature for the day allows you to make a reasonable guess at the high. The information contained in the two features is somewhat redundant.
  • Classifying data into more than two categories can be done either by using an inherently multi-class classifier or by combining a set of two-class classifiers into an ensemble. In the ensemble approach, there is a separate two-class classifier for each class; each one separates the data into two categories: “this class” and “not this class.” These classifiers then vote on the correct assignment of the data point. This is the operational principle behind One-vs-All Multiclass.
  • Several methods, including logistic regression and the Bayes point machine, assume linear class boundaries, that is, that the boundaries between classes are approximately straight lines (or hyperplanes in the more general case). Often this is a characteristic of the data that you don’t know until after you’ve tried to separate it, but it can often be checked beforehand by visualizing the data. If the class boundaries look very irregular, stick with decision trees, decision jungles, support vector machines, or neural networks.
  • Neural networks can be used with categorical variables by creating a dummy variable for each category and setting it to 1 in cases where the category applies, 0 where it doesn’t.
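The head-to-head competition recommended in the list above boils down to training every candidate on the same training split and scoring it on the same held-out data. The sketch below is plain Python, not Azure ML. It assumes two stand-in classifiers (a majority-class baseline and 1-nearest-neighbour) and a small hypothetical dataset of two point clouds labeled ‘cat’ and ‘dog’. All function names are illustrative.

```python
import random

def majority_class(train):
    """Baseline: always predict the most common label seen in training."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common

def nearest_neighbor(train):
    """1-nearest-neighbour: copy the label of the closest training point."""
    def predict(x):
        closest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return closest[1]
    return predict

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

def compare(candidates, data, test_fraction=0.3, seed=0):
    """Train every candidate on the same split and score it on the same held-out rows."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    return {name: accuracy(build(train), test) for name, build in candidates.items()}

# Hypothetical toy data: two well-separated clouds of 2-D points.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "cat") for _ in range(50)] + \
       [((rng.gauss(5, 1), rng.gauss(5, 1)), "dog") for _ in range(50)]

scores = compare({"majority": majority_class, "1-NN": nearest_neighbor}, data)
print(scores)
```

Fixing the split with a seed keeps the comparison fair: every algorithm sees exactly the same training and test rows, so differences in score reflect the algorithms rather than the luck of the split.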
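The used-car example of supervised learning in the list above (learn from labeled examples, then predict a value for a new one) can be illustrated with the simplest possible regression: a straight line fitted by ordinary least squares. This is a minimal sketch; the ages and prices are hypothetical numbers chosen only to make the arithmetic easy to follow, and `fit_line` is an illustrative helper, not an Azure ML module.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical labeled examples: car age in years -> sale price.
ages = [1, 2, 3, 4, 5]
prices = [18500, 17000, 15500, 14000, 12500]

slope, intercept = fit_line(ages, prices)
predicted = slope * 6 + intercept  # price estimate for an unseen 6-year-old car
print(slope, intercept, predicted)  # -1500.0 20000.0 11000.0
```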
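The unsupervised case in the list above, grouping unlabeled points into clusters “as K-means does,” can be sketched in a few lines. This is the textbook k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) written in plain Python for illustration; it is not the Azure K-Means Clustering module, and the points are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(centroids)
```

Note that no labels appear anywhere: the algorithm only describes the structure it finds in the data.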
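The choose-an-action, receive-a-reward, adjust-the-strategy loop of reinforcement learning described in the list above can be illustrated with a multi-armed bandit, a deliberately simplified setting with no state. This is an epsilon-greedy sketch under assumptions of our own: the payoff probabilities in `PAYOFF` and the `reward_fn` environment are hypothetical, and none of this corresponds to an Azure ML module (the text notes there are none).

```python
import random

def epsilon_greedy(reward_fn, n_actions, steps=1000, epsilon=0.1, seed=0):
    """Pick actions, observe rewards, and keep a running average value per action."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # estimated average reward of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit the best so far
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Incremental mean update: this is where the strategy "learns" from the reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: each action pays 1 with a fixed probability.
PAYOFF = [0.2, 0.5, 0.8]

def reward_fn(action, rng):
    return 1.0 if rng.random() < PAYOFF[action] else 0.0

values, counts = epsilon_greedy(reward_fn, len(PAYOFF))
print(values, counts)
```

The small `epsilon` trades a little reward for continued exploration, which is what lets the agent discover that the third action is the best one even after starting out on another.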
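A quick way to check the independence assumption behind Bayesian methods, discussed in the list above, is the lag-1 autocorrelation of a series: values near zero suggest consecutive measurements carry little information about each other. The sketch below assumes a hypothetical subway line with a train every 30 minutes and compares minute-by-minute readings against independent random draws, which stand in for measurements taken a day apart.

```python
import random
import statistics

def lag1_autocorrelation(xs):
    """Correlation between each value and the one that follows it."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

# Hypothetical schedule: a train every 30 minutes, wait time recorded every minute.
minute_apart = [30 - (t % 30) for t in range(300)]

# Independent draws stand in for measurements taken far apart in time.
rng = random.Random(0)
far_apart = [rng.randint(1, 30) for _ in range(300)]

print(lag1_autocorrelation(minute_apart))  # strongly positive, about 0.82
print(lag1_autocorrelation(far_apart))     # close to zero
```

Autocorrelation only captures linear dependence, so a value near zero is a hint, not a proof, of independence.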
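The feature overlap described for boosted decision tree regression in the list above can be measured directly: if knowing the daily low lets you guess the daily high, the two features will be strongly correlated. This is a minimal sketch with synthetic, hypothetical temperatures (the high is assumed to be the low plus about 8 degrees of noisy spread); it measures redundancy between features and does not implement boosting itself.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical year of daily temperatures: high = low + typical spread + noise.
rng = random.Random(0)
lows = [rng.uniform(-5, 20) for _ in range(365)]
highs = [low + 8 + rng.gauss(0, 2) for low in lows]

r = pearson(lows, highs)
print(r)  # close to 1: the two features carry largely redundant information
```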
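The one-vs-all ensemble described in the list above can be sketched by training one two-class model per class (“this class” vs. “not this class”) and letting the most confident one win, which is one common way to combine their votes. Here each two-class model is a small logistic regression fitted by plain gradient descent; the data, hyperparameters, and helper names are hypothetical, and this is not the Azure One-vs-All Multiclass module.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def score(w, x):
    """Linear score; the last weight is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def train_binary(xs, ys, epochs=200, lr=0.1):
    """Two-class logistic regression; ys are 1 for 'this class' and 0 otherwise."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = y - sigmoid(score(w, x))
            for i, xi in enumerate(x):
                w[i] += lr * error * xi
            w[-1] += lr * error
    return w

def one_vs_all(xs, labels):
    classes = sorted(set(labels))
    models = {c: train_binary(xs, [1 if y == c else 0 for y in labels]) for c in classes}
    # Each "this class vs. not this class" model scores x; the most confident one wins.
    return lambda x: max(classes, key=lambda c: score(models[c], x))

# Hypothetical data: three well-separated clusters labeled a, b, c.
rng = random.Random(0)
centers = {"a": (0, 0), "b": (6, 0), "c": (0, 6)}
xs, labels = [], []
for label, (cx, cy) in centers.items():
    for _ in range(30):
        xs.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
        labels.append(label)

predict = one_vs_all(xs, labels)
acc = sum(predict(x) == y for x, y in zip(xs, labels)) / len(xs)
print(acc)
```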
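The linear-boundary assumption in the list above can be seen with a classic toy case. A perceptron, a linear classifier used here as a stand-in for logistic regression or the Bayes point machine, learns logical AND perfectly because a single straight line separates its classes, but it cannot get XOR fully right because no straight line can. The datasets and helper names below are illustrative.

```python
def train_perceptron(samples, epochs=50, lr=1.0):
    """Classic perceptron: nudge the line whenever a point is on the wrong side (labels are +1/-1)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def accuracy(samples, w, b):
    predictions = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for (x1, _x2), _ in [] ] or None
    correct = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else -1) == y for (x1, x2), y in samples)
    return correct / len(samples)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]  # linearly separable
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]   # not linearly separable

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print(and_acc, xor_acc)  # AND reaches 1.0; XOR stays below 1.0
```

If a plot of your data looks more like XOR than AND, that is the signal to move to one of the nonlinear methods listed above.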
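The dummy-variable encoding mentioned at the end of the list above (one 0/1 column per category) is easy to write out by hand. This minimal sketch uses a hypothetical `one_hot` helper and sorts the categories so the column order is predictable.

```python
def one_hot(values):
    """Create one dummy column per category: 1 where the category applies, 0 where it doesn't."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

categories, encoded = one_hot(["cat", "dog", "cat", "bird"])
print(categories)  # ['bird', 'cat', 'dog']
print(encoded)     # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each row now contains exactly one 1, so a neural network can consume the categorical variable as ordinary numeric inputs.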
