Top 10 Machine Learning Algorithms Data Scientists Should Master

Machine learning algorithms can help people explore, analyze and find meaning in complex data sets

All practicing data scientists need a basic working knowledge of machine learning algorithms. Machine learning algorithms are pieces of code that help people explore, analyze and find meaning in complex data sets. Each algorithm is a finite set of unambiguous step-by-step instructions that a machine can follow to achieve a certain goal. In a machine learning model, the goal is to establish or discover patterns that people can use to make predictions or categorize information. Here are the top 10 machine learning algorithms that data scientists should master.

Principal Component Analysis (PCA)/SVD

PCA is an unsupervised method for understanding the global properties of a dataset made up of vectors. It analyzes the covariance matrix of the data points to find which dimensions (mostly) or data points (sometimes) matter most. The top principal components (PCs) are the eigenvectors of the covariance matrix with the largest eigenvalues. SVD also yields these components in order, but it works directly on the centered data matrix, so you never need to form the covariance matrix.
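A minimal NumPy sketch of both routes, assuming a small synthetic dataset (the mixing matrix and the helper names `pca_eig` and `pca_svd` are illustrative, not from any library). One takes the eigenvectors of the covariance matrix; the other takes the SVD of the centered data. Principal directions are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal directions from the eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order], eigvals[order]

def pca_svd(X, k):
    """Same directions from the SVD of the centered data; no covariance matrix formed."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values sorted descending
    variances = S[:k] ** 2 / (len(X) - 1)    # squared singular values = covariance eigenvalues
    return Vt[:k].T, variances

# Hypothetical correlated 3-D data with one nearly flat direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])
V_eig, var_eig = pca_eig(X, 2)
V_svd, var_svd = pca_svd(X, 2)
print(np.allclose(np.abs(V_eig), np.abs(V_svd)), np.allclose(var_eig, var_svd))
```

The SVD route is usually preferred in practice because squaring the data into a covariance matrix can lose numerical precision.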

Least Squares and Polynomial Fitting

If you studied numerical analysis in college, the least-squares and polynomial-fitting code you used there can also fit curves in machine learning, but only for very small datasets with few dimensions. (For large data or datasets with many dimensions, you might just end up terribly overfitting, so don't bother.) Ordinary least squares (OLS) has a closed-form solution, so you don't need complex optimization techniques.
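As a rough illustration of that closed form, the sketch below fits a degree-2 polynomial three ways with NumPy: the normal equations, `np.linalg.lstsq`, and `np.polyfit`. The quadratic and the noise level are made-up assumptions for the demo.

```python
import numpy as np

# Hypothetical noisy samples from y = 2x^2 - 0.5x + 1.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
y = 2.0 * x**2 - 0.5 * x + 1.0 + rng.normal(scale=0.05, size=x.size)

degree = 2
A = np.vander(x, degree + 1)          # design matrix with columns [x^2, x, 1]

# Closed-form OLS via the normal equations: w = (A^T A)^{-1} A^T y.
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Numerically safer route to the same solution.
w_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# np.polyfit solves the same problem; coefficients come highest degree first.
w_polyfit = np.polyfit(x, y, degree)

print(w_normal, w_lstsq, w_polyfit)
```

All three should agree closely; `lstsq` avoids explicitly forming `A.T @ A`, which matters when the design matrix is ill-conditioned.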

K-means Clustering

Everyone's favorite unsupervised clustering algorithm. Given a set of data points in the form of vectors, it groups them into clusters based on the distances between them. It works in an Expectation-Maximization style: it alternately assigns each point to its nearest cluster center and then moves each center to the mean of the points assigned to it. The algorithm takes as input the number of clusters to generate and the maximum number of iterations it will run while trying to converge.
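The loop described above is usually called Lloyd's algorithm. Here is a minimal NumPy sketch of it, assuming random initialization from the data points (real libraries often use smarter seeding such as k-means++); the function name and the two-blob dataset are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a center-update step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k random points as initial centers
    for _ in range(n_iters):
        # Assignment step: distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (keep the old center if a cluster ends up empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # centers stopped moving: converged
            break
        centers = new_centers
    return centers, labels

# Two hypothetical, well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
               rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```

Because the result depends on the initial centers, it is common to run k-means several times with different seeds and keep the clustering with the lowest within-cluster distance.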

Logistic Regression

Logistic Regression is Linear Regression constrained by a nonlinearity (mostly the sigmoid function, though you can use tanh too) applied after the weights, which restricts the outputs to the range between the two class labels (0 and 1 for the sigmoid, -1 and +1 for tanh). Despite its name, Logistic Regression is used for classification, not regression. It is trained by minimizing the cross-entropy loss with optimization methods like Gradient Descent or L-BFGS. You can also think of Logistic Regression as a one-layer Neural Network. In NLP it is often called the Maximum Entropy (MaxEnt) Classifier.
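A minimal sketch of the sigmoid version trained with plain batch gradient descent on the mean cross-entropy loss, assuming 0/1 labels; the learning rate, iteration count, and two-Gaussian dataset are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic_regression(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)            # linear model followed by the sigmoid
        grad_z = (p - y) / n              # gradient of the mean loss w.r.t. each logit
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b

# Hypothetical data: two overlapping Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)), rng.normal(1.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = train_logistic_regression(X, y)
probs = sigmoid(X @ w + b)
accuracy = np.mean((probs >= 0.5) == y)
final_loss = cross_entropy(probs, y)
print(accuracy, final_loss)
```

The gradient `p - y` is what makes the sigmoid and cross-entropy pair convenient: the nonlinearity's derivative cancels out, leaving a simple error term.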

SVM (Support Vector Machines)

SVMs are linear models like Linear and Logistic Regression; the difference is that they use a margin-based loss function (the hinge loss). You can optimize the loss function using optimization methods like L-BFGS or even SGD. Another innovation in SVMs is the use of kernels, which implicitly map the data into a richer feature space and so act as a form of feature engineering. If you have good domain insight, you can replace the good old RBF kernel with smarter ones and profit.
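To make the "same linear model, different loss" point concrete, here is a sketch of a linear SVM trained with full-batch subgradient descent on the L2-regularized hinge loss. This is one simple optimizer chosen for illustration, not how libraries such as LIBSVM solve the problem, and it covers only the linear case (no kernels). Labels are assumed to be -1/+1; the hyperparameters are arbitrary.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.1, n_epochs=300):
    """Subgradient descent on 0.5*||w||^2 + C * mean(max(0, 1 - y * (Xw + b))).

    y must contain -1/+1 labels.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1                # points inside the margin or misclassified
        # Only active points contribute to the hinge-loss subgradient.
        grad_w = w - C * (X[active].T @ y[active]) / n
        grad_b = -C * y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical two-class data with -1/+1 labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)), rng.normal(1.0, 1.0, size=(100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])

w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(w, b, accuracy)
```

Compared with the logistic regression loss, the hinge loss is exactly zero for points that are classified correctly with margin at least 1, so only points near the boundary (the support vectors) shape the final weights.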

Feedforward Neural Networks

These are basically multilayered Logistic Regression classifiers: many layers of weights separated by nonlinearities (sigmoid, tanh, ReLU, or the cool new SELU, with a softmax at the output for classification). Another popular name for them is Multilayer Perceptrons (MLPs). FFNNs can be used for classification and, as autoencoders, for unsupervised feature learning.
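A small sketch of such a network, assuming one ReLU hidden layer, a softmax output, and hand-written backpropagation with full-batch gradient descent. The layer sizes, learning rate, and XOR toy data are illustrative choices; real projects would use a framework with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(d_in, d_hidden, n_classes):
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def forward(params, X):
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])   # ReLU hidden layer
    probs = softmax(h @ params["W2"] + params["b2"])       # softmax output layer
    return h, probs

def train_step(params, X, y, lr=0.1):
    """One gradient-descent step on mean cross-entropy; returns the loss before the step."""
    n = len(X)
    h, probs = forward(params, X)
    loss = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n                               # gradient of the mean loss w.r.t. logits
    dh = dlogits @ params["W2"].T              # backprop into the hidden layer (old W2)
    dh[h <= 0] = 0.0                           # ReLU passes gradient only where active
    params["W2"] -= lr * (h.T @ dlogits)
    params["b2"] -= lr * dlogits.sum(axis=0)
    params["W1"] -= lr * (X.T @ dh)
    params["b1"] -= lr * dh.sum(axis=0)
    return loss

# XOR: not linearly separable, so a single logistic regression cannot fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
params = init_mlp(2, 16, 2)
losses = [train_step(params, X, y) for _ in range(2000)]
_, probs = forward(params, X)
print(losses[0], losses[-1], probs.argmax(axis=1))
```

Removing the hidden layer turns this back into (multinomial) logistic regression, which is the sense in which an MLP is "multilayered logistic regression".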

Convolutional Neural Networks (Convnets)

Almost every state-of-the-art vision-based machine learning result in the world today has been achieved using Convolutional Neural Networks. They can be used for image classification, object detection, or even image segmentation. Invented by Yann LeCun in the late 1980s and early 1990s, Convnets feature convolutional layers that act as hierarchical feature extractors. You can use them on text too (and even on graphs).
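The core operation of a convolutional layer can be sketched in a few lines of NumPy. This assumes a single-channel image, a single hand-picked kernel, and "valid" padding; like most deep learning libraries it computes cross-correlation (the kernel is not flipped). In a real Convnet the kernel weights are learned, and stacking many such layers is what produces the feature hierarchy.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation as used in CNN layers.

    Output shape: (H - kh + 1, W - kw + 1).
    """
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same weights slide over every position (weight sharing).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)

# Toy 5x5 image: dark left side, bright right side, i.e. one vertical edge.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-written vertical-edge detector (hypothetical; a trained CNN learns its own).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)
feature_map = relu(conv2d(image, kernel))
# Every row of feature_map is [3, 3, 0]: the filter fires only where its window
# straddles the edge and stays silent on the uniformly bright region.
print(feature_map)
```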

Recurrent Neural Networks (RNNs)

RNNs model sequences by applying the same set of weights recurrently to the aggregator (hidden) state and the input at each time step t. Given a sequence with inputs at times 0..t..T, the RNN keeps a hidden state at each time t, computed from the hidden state output at step t-1 and the input at step t. Pure (vanilla) RNNs are rarely used now, but gated counterparts such as LSTMs and GRUs are state of the art in most sequence modeling tasks.
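A minimal sketch of the forward pass of a vanilla (Elman) RNN, assuming the common update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and random untrained weights; the dimensions are arbitrary. LSTMs and GRUs keep this same recurrent loop but add gates inside the update.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Vanilla RNN: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h).

    The same weights are reused at every time step.
    """
    h = h0
    hidden_states = []
    for x_t in xs:                                   # xs has shape (T, input_dim)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)                   # shape (T, hidden_dim)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 3, 4
xs = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

H = rnn_forward(xs, W_xh, W_hh, b_h, h0=np.zeros(hidden_dim))
print(H.shape)
```

The last row of `H` summarizes the whole sequence and is what a sequence classifier would typically feed into an output layer.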

Conditional Random Fields (CRFs)

CRFs are probably the most frequently used models from the family of Probabilistic Graphical Models (PGMs). Like RNNs, they are used for sequence modeling, and they can be combined with RNNs too. Before neural models came in, CRFs were the state of the art in many sequence tagging tasks, and on tasks with small datasets they can still learn better than RNNs, which require more data to generalize. They can also be used in other structured prediction tasks such as image segmentation. A CRF models each element of a sequence (say, each word in a sentence) so that its neighbors affect its label, instead of treating all labels as independent of each other.
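The "neighbors affect the label" idea shows up clearly in decoding. The sketch below runs Viterbi decoding for a linear-chain CRF, assuming the emission and transition scores are already given (training them, which needs the forward algorithm, is not shown). The numbers are made up so that the best joint labeling differs from picking each label independently.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Highest-scoring label sequence under a linear-chain CRF score:

        score(y) = sum_t emissions[t, y_t] + sum_{t>0} transitions[y_{t-1}, y_t]

    emissions: (T, K) per-position label scores; transitions: (K, K).
    """
    T, K = emissions.shape
    best = emissions[0].copy()                 # best path score ending in each label at t=0
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in label i at t-1, then label j at t.
        cand = best[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    path = [int(best.argmax())]                # follow back-pointers from the best end label
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(best.max())

# Hypothetical scores: 3 positions, 2 labels; staying on a label is rewarded.
emissions = np.array([[2.0, 1.0],
                      [0.0, 0.5],
                      [2.0, 1.0]])
transitions = np.array([[1.0, -2.0],
                        [-2.0, 1.0]])
independent = emissions.argmax(axis=1).tolist()   # [0, 1, 0] if labels were independent
path, score = viterbi(emissions, transitions)     # [0, 0, 0] with score 6.0
print(independent, path, score)
```

Position 1 mildly prefers label 1 on its own, but switching labels twice costs more than it gains, so the joint decoding keeps label 0 throughout.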

Decision Trees

Earlier single-tree methods such as CART were once used for simple data, but with bigger datasets a single tree handles the bias-variance tradeoff poorly, so better algorithms are needed. The two common tree-based algorithms used nowadays are Random Forests (which build many trees, each on a random subset of the data and attributes, and combine their outputs) and Boosted Trees (which train a sequence of trees, each one correcting the mistakes of the ones before it).
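The "each tree corrects the ones before it" idea can be sketched with gradient boosting for squared error: every new tree is fit to the residuals of the current ensemble. To keep it short, this assumes a single numeric feature and uses depth-1 regression trees (stumps) with a CART-style split search; the helper names, the learning rate, and the sine-wave data are illustrative, and random forests would instead average many independently trained trees.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x for targets r (squared error)."""
    best = None
    for thr in np.unique(x)[:-1]:              # skip the max so both sides are non-empty
        left = x <= thr
        lv, rv = r[left].mean(), r[~left].mean()
        err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def gradient_boost(x, y, n_trees=50, lr=0.1):
    """Squared-loss boosting: each stump is fit to the current residuals."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_trees):
        residuals = y - pred                   # the ensemble's current mistakes
        stump = fit_stump(x, residuals)
        pred = pred + lr * stump(x)            # shrink each correction by lr
        stumps.append(stump)

    def predict(q):
        return base + lr * sum(s(q) for s in stumps)
    return predict, pred

x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)
predict, train_pred = gradient_boost(x, y)
mse_start = np.mean((y - y.mean()) ** 2)       # error of the constant starting model
mse_end = np.mean((y - train_pred) ** 2)
print(mse_start, mse_end)
```

The shrinkage factor `lr` makes each tree correct only part of the remaining error, which usually generalizes better than letting a few trees fit the residuals completely.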
