**Naive Bayes**
Strengths: Simple; robust to outliers; can handle missing data; recommended when there are many features and joint density estimation becomes infeasible.
Limitations: Assumes conditional independence of the input variables given the output label.
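A minimal Gaussian Naive Bayes sketch in Python, assuming scikit-learn is available; the synthetic data and parameter values are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each feature is modeled as an independent 1-D Gaussian per class,
# which is what keeps the method cheap when there are many features.
clf = GaussianNB().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Because each feature is modeled independently, fitting cost grows only linearly with the number of features.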
**Linear Discriminant Analysis**
Strengths: Works well on linearly separable data; widely used for dimensionality reduction.
Limitations: Assumes the features within each class follow a Gaussian distribution with a covariance matrix shared across classes; can be sensitive to outliers.
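A short sketch of both uses (classification and supervised dimensionality reduction), again assuming scikit-learn; the iris dataset is just a convenient example:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project the 4-D features onto the 2 most discriminative directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)   # supervised projection, unlike PCA
print(X_2d.shape)                # (150, 2)
print("training accuracy:", lda.score(X, y))
```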
**Linear regression**
Strengths: Usually the method of choice for problems with few observations (N) and/or many features (M); can be easily regularized with straightforward procedures.
Limitations: Assumes a linear relationship between the features and the output variable.
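A sketch of the regularized variants in the small-N, large-M regime, assuming scikit-learn; the penalty strengths (alpha) are arbitrary illustrative choices:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic problem with more features than samples (small N, large M).
X, y = make_regression(n_samples=50, n_features=200, noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=1.0, max_iter=5000).fit(X, y)    # L1 penalty drives some to zero
print("nonzero lasso coefficients:", (lasso.coef_ != 0).sum())
```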
**Logistic regression and softmax regression**
Strengths: Widely used for classification thanks to its simplicity as a generalized linear model; can be easily regularized with straightforward procedures; more robust to outliers than LDA.
Limitations: Works well only on linear problems; its extension to nonlinear problems is computationally expensive, and SVMs are generally preferred there.
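A minimal scikit-learn sketch; in recent scikit-learn versions the default solver fits the softmax (multinomial) form when there are more than two classes:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C is the inverse regularization strength: smaller C = stronger penalty.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```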
**Support Vector Machine**
Strengths: Works well on both linear and nonlinear problems (e.g., with a Gaussian kernel).
Limitations: Not well suited to large datasets or to problems where the number of features exceeds the number of observations.
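A sketch contrasting the linear and Gaussian (RBF) kernels on a dataset that is deliberately not linearly separable, assuming scikit-learn:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # Gaussian kernel
print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:", rbf.score(X, y))
```

On this kind of data the RBF kernel should clearly outperform the linear one, illustrating the nonlinear case.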
**Perceptron learning and ANN**
Strengths: Can solve complex, nonlinear problems.
Limitations: Difficult to tune because of the large number of hyperparameters and possible architectures.
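A sketch contrasting a single perceptron with a small one-hidden-layer network, assuming scikit-learn; the hidden-layer size is an arbitrary illustrative choice:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# A single perceptron is a linear classifier and struggles here;
# one hidden layer is already enough to capture the nonlinearity.
p = Perceptron(random_state=0).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
print("perceptron accuracy:", p.score(X, y))
print("MLP accuracy:", mlp.score(X, y))
```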
**Random Forest**
Strengths: Easy to tune; robust to overfitting; can provide an estimate of the importance of each feature in the model.
Limitations: Can be slow, since computation time grows linearly with the number of trees (inadequate for some real-time applications).
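A sketch showing the built-in feature-importance estimate, assuming scikit-learn; the number of trees (n_estimators=200) is an arbitrary choice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# n_estimators is the main knob; prediction cost grows linearly with it.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for i, imp in enumerate(rf.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")
```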
**Deep Neural Networks**
Strengths: Able to solve complex problems.
Limitations: Difficult to tune because of the large number of hyperparameters and possible architectures.
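A rough sketch with a multi-layer scikit-learn MLP standing in for a deep network; serious deep learning work would normally use a dedicated framework (e.g., PyTorch or TensorFlow), and the layer sizes here are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Several stacked hidden layers illustrate the "deep" case;
# depth, widths, and max_iter are all hyperparameters to tune.
dnn = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500,
                    random_state=0).fit(X_train, y_train)
print("test accuracy:", dnn.score(X_test, y_test))
```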