| Algorithm | Advantages | Disadvantages | Typical applications |
| --- | --- | --- | --- |
| Linear regression | Simple and easy to interpret; fast training and prediction times; works well with linear relationships between features and targets | Assumes a linear relationship; sensitive to outliers | Used when the relationship between features and target is approximately linear |
| Random Sample Consensus (RANSAC) | Robust to outliers in the data; handles noisy data sets effectively; suitable for linear and nonlinear regression problems | May require a large number of iterations to converge; sensitive to the choice of threshold parameters | Computer vision, image stitching, feature matching; used when data sets contain outliers and robust estimation of parameters is desired |
| Theil-Sen estimator | Robust to outliers and non-normality in the data; provides consistent parameter estimates even with up to 29% outliers | Computationally intensive for large data sets; may not perform well with highly skewed data sets | Used when robust estimation of parameters is critical and data sets contain outliers |
| Decision tree | Captures complex nonlinear relationships in the data; easy to interpret and visualize; robust to outliers in the data | Prone to overfitting; unstable (small changes in the data can produce different trees) | Classification, regression, feature selection, decision analysis; used when the relationship between features and target is nonlinear or when interpretability is important |
| Random forest | Robust and less prone to overfitting than individual decision trees; handles high-dimensional data sets with ease; provides feature importance scores | Computationally intensive; less interpretable than a single decision tree | Classification, regression, feature selection, anomaly detection; used when high predictive accuracy is desired and interpretability is less important |
| Support vector machine (SVM) | Effective in high-dimensional spaces; robust to overfitting, especially with appropriate kernel functions | Slower training and prediction times than decision trees; sensitive to the choice of kernel and hyperparameters; can be computationally intensive on large data sets | Classification, text categorization, image recognition; used for nonlinear regression problems when interpretability is less important |
| K-nearest neighbors (KNN) | Simple and intuitive concept; no training phase, making it suitable for online learning; effective for multiclass classification | Computationally expensive at prediction time, especially with large data sets; sensitive to the choice of distance metric and number of neighbors | Classification, regression, pattern recognition, data imputation, anomaly detection |
| AdaBoost | Helps balance the bias-variance trade-off; performs well with various weak learners | Sensitive to noisy data sets and outliers | Classification, regression, text recognition, face detection |
| Gradient boosting | High predictive accuracy; handles missing data well; helps balance the bias-variance trade-off | Prone to overfitting; computationally intensive | Classification, regression, ranking, anomaly detection |
| Extreme gradient boosting (XGBoost) | High predictive accuracy and speed; regularization techniques to prevent overfitting; handles missing data well | Can be sensitive to noisy data; requires hyperparameter tuning | Regression, classification, anomaly detection, pattern recognition; used when high predictive accuracy and computational efficiency are important |
| Artificial neural networks (ANN) | Flexible and powerful models for a wide range of problems; robust to noise; nonlinear modeling capability; exploit the parallel processing capabilities of GPUs | Prone to overfitting; require large data sets and extensive tuning | Regression, classification, anomaly detection, pattern recognition |
| Long short-term memory (LSTM) | Solves the vanishing gradient problem; good for long sequence data | Computationally intensive; complex architecture | Time series forecasting, language modeling, speech synthesis |
| Generative adversarial networks (GAN) | Generate realistic data; unsupervised learning | Difficult to train; risk of mode collapse | Image generation, style transfer, data augmentation, creative applications |
| Bayesian networks | Handle uncertainty; provide probabilistic interpretations | Computationally intensive; complex to implement | Diagnosis, forecasting, decision support systems |
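
The outlier-robustness claims in the first three rows can be checked directly. Below is a minimal sketch, assuming scikit-learn and NumPy are available (neither is prescribed by the table itself): ordinary linear regression, RANSAC, and the Theil-Sen estimator are fit to the same synthetic data with 10% gross outliers, and the recovered slopes are compared.

```python
# Minimal sketch: compare a non-robust and two robust estimators from the table
# on outlier-contaminated data (synthetic data; true slope is 3.0).
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor, TheilSenRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=200)
y[:20] += 30.0  # contaminate 10% of the targets with gross outliers

for name, model in [
    ("Linear regression", LinearRegression()),
    ("RANSAC", RANSACRegressor(random_state=0)),
    ("Theil-Sen", TheilSenRegressor(random_state=0)),
]:
    model.fit(X, y)
    # RANSACRegressor wraps a base estimator, so its coefficient sits one level down
    coef = model.estimator_.coef_[0] if name == "RANSAC" else model.coef_[0]
    print(f"{name:18s} estimated slope: {coef:.3f}")
```

In a run like this, ordinary least squares is pulled away from the true slope by the contaminated points, while RANSAC and Theil-Sen stay close to it, consistent with the table's characterization.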
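For the tree-based and kernel methods, the accuracy and overfitting trade-offs in the table are easiest to see side by side under cross-validation. The sketch below again assumes scikit-learn; the data set (`make_friedman1`, a standard synthetic nonlinear regression problem) and the default hyperparameters are illustrative choices, not recommendations, and the resulting ranking varies with the problem.

```python
# Minimal sketch: 5-fold cross-validated R^2 for several regressors from the
# table on one synthetic nonlinear problem.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)

models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gradient boosting": GradientBoostingRegressor(random_state=0),
    "SVM (RBF kernel)": SVR(),
    "K-nearest neighbors": KNeighborsRegressor(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:20s} mean CV R^2: {scores.mean():.3f}")
```

Because XGBoost's `XGBRegressor` follows the same scikit-learn estimator interface, it could be dropped into the same loop to extend the comparison to the extreme gradient boosting row.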