Supervised learning algorithms

Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers that function from labeled training data: a set of training examples, each a pair of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can then be used to map new, unseen examples.

These notebooks serve both as practice with some ML techniques and as snippets to build on.



A quick summary of what each algorithm is good for:

Nearest neighbors

Good for small datasets, good as a baseline, easy to explain.
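A minimal nearest-neighbors baseline, sketched with scikit-learn (the library and the iris toy dataset are assumptions, not part of the original text):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small toy dataset, split into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-NN baseline: each prediction is a majority vote of the 3 closest training points
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(f"test accuracy: {knn.score(X_test, y_test):.2f}")
```

Because the "model" is just the stored training set, it is easy to explain but slow to predict on large datasets.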

Linear models

Go-to as a first algorithm to try, good for very large datasets, good for very high-dimensional data.
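A sketch of the go-to first try, logistic regression, again assuming scikit-learn (the breast cancer dataset and `max_iter` value are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear decision boundary; the regularization strength C is the main knob.
# max_iter is raised so the solver converges on unscaled features.
logreg = LogisticRegression(max_iter=5000)
logreg.fit(X_train, y_train)
print(f"test accuracy: {logreg.score(X_test, y_test):.2f}")
```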

Naive Bayes

Only for classification. Even faster than linear models, good for very large datasets and high-dimensional data. Often less accurate than linear models.
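A Gaussian naive Bayes sketch (one of several NB variants in scikit-learn; the dataset choice is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB fits one mean and variance per feature per class,
# so training is a single fast pass over the data
nb = GaussianNB()
nb.fit(X_train, y_train)
print(f"test accuracy: {nb.score(X_test, y_test):.2f}")
```

For sparse count data such as text, `MultinomialNB` or `BernoulliNB` would be the usual pick instead.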

Decision trees

Very fast, don’t need scaling of the data, can be visualized and easily explained.
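A decision-tree sketch showing the "easily explained" part: scikit-learn (assumed) can print the learned rules as text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree: limiting depth keeps it readable and curbs overfitting.
# No feature scaling is needed -- splits only compare against thresholds.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(f"test accuracy: {tree.score(X_test, y_test):.2f}")
print(export_text(tree))  # the learned rules, readable as nested if/else
```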

Random forests

Nearly always perform better than a single decision tree, very robust and powerful. Don’t need scaling of data. Not good for very high-dimensional sparse data.
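A random-forest sketch under the same scikit-learn assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with a random subset of
# features considered at each split; averaging their votes reduces variance.
# As with single trees, no feature scaling is required.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.2f}")
```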

Gradient boosted decision trees

Often slightly more accurate than random forests. Slower to train but faster to predict than random forests, and smaller in memory. Need more parameter tuning than random forests.
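A gradient-boosting sketch; the specific `max_depth` and `learning_rate` values are illustrative of the tuning the text mentions, not recommendations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees are built sequentially, each one correcting the errors of the
# ensemble so far; learning_rate and n_estimators usually need tuning together
gbrt = GradientBoostingClassifier(max_depth=1, learning_rate=0.1, random_state=0)
gbrt.fit(X_train, y_train)
print(f"test accuracy: {gbrt.score(X_test, y_test):.2f}")
```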

Support vector machines

Powerful for medium-sized datasets of features with similar meaning. Require scaling of data, sensitive to parameters. Training time with SVMs can be high. Less effective on noisier datasets with overlapping classes.
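An SVM sketch that bakes the required scaling into a pipeline (scikit-learn assumed; `C=1.0` and `gamma="scale"` are just the defaults, made explicit):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVMs need features on a comparable scale, so the scaler goes in a pipeline
# (fit on training data only); C and gamma are the sensitive parameters
svm = make_pipeline(StandardScaler(), SVC(C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print(f"test accuracy: {svm.score(X_test, y_test):.2f}")
```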

Neural networks

Can build very complex models, particularly for large datasets. Sensitive to scaling of the data and to the choice of parameters. Large models need a long time to train.
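A small neural-network sketch using scikit-learn's `MLPClassifier` (an assumption; for the large models the text mentions, a dedicated deep learning library would be the usual choice). Like SVMs, it is sensitive to feature scaling, so the scaler is part of the pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single hidden layer of 50 units; hidden_layer_sizes, regularization (alpha),
# and max_iter are among the parameters the model is sensitive to
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0),
)
mlp.fit(X_train, y_train)
print(f"test accuracy: {mlp.score(X_test, y_test):.2f}")
```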