2022 May 13;3(2):311–322. doi: 10.1093/ehjdh/ztac025

Table 1.

Differences between traditional statistical models and machine learning algorithms

Method	Traditional regression models	Machine learning
Assumptions
  Independence	Predictors are assumed to be independent of each other.	Predictors do not need to be independent.
  Multicollinearity	No multicollinearity: predictors should not correlate with each other.	Multicollinearity is allowed.
Predictors
  Selection of predictors	Prespecified.	Does not have to be prespecified.
Data structure
  Reasoning	Inductive: derives a rule for the relationship between the input and the outcome.¹¹	Transductive: can predict outcomes using inputs from the training set without deriving a general rule.¹¹
  Dimensionality	Performs well with a low signal-to-noise ratio but poorly with high-dimensional data.	Performs well with high-dimensional data that have a high signal-to-noise ratio.¹²
  Sample size	A smaller sample size suffices; fewer events are required per predictor.	A larger sample size is needed; more events are required per predictor.
Performance
  Interactions	Can test for a limited number of prespecified interactions.¹²	Can handle a large number of non-prespecified interactions.¹²
  Effect size	The effect of individual predictors is of interest.	The effect of individual predictors is not of interest; prediction is prioritized.
  Performance	Lower accuracy.	Higher accuracy, particularly for non-linear, non-smooth relationships.
  Interpretability	Models are easier to interpret and explain.	Models are more challenging to interpret and can be a ‘black box’.
  Dichotomization	Poor calibration with dichotomized predictor and outcome variables.	Better calibration with dichotomous predictor and outcome variables.
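
The "Sample size" row above refers to events per predictor. As a rough illustration only, the arithmetic can be sketched in Python as below. The function names, the cohort counts, and the thresholds of 10 and 50 events per predictor are hypothetical choices made for this sketch; none of them are taken from the table or its cited sources.

```python
def events_per_predictor(n_events: int, n_nonevents: int, n_predictors: int) -> float:
    """Events per predictor, counting the rarer outcome class as the 'events'."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_nonevents) / n_predictors


def max_predictors(n_events: int, n_nonevents: int, min_epv: float) -> int:
    """Largest number of candidate predictors that keeps EPV at or above min_epv."""
    if min_epv <= 0:
        raise ValueError("min_epv must be positive")
    return int(min(n_events, n_nonevents) // min_epv)


# Hypothetical cohort: 120 patients with the outcome, 880 without.
epv = events_per_predictor(n_events=120, n_nonevents=880, n_predictors=8)

# Illustrative thresholds (assumptions): a lenient one for a regression model
# and a stricter one for a more flexible learner.
budget_lenient = max_predictors(120, 880, min_epv=10)
budget_strict = max_predictors(120, 880, min_epv=50)

print(f"EPV with 8 predictors: {epv:.1f}")
print(f"Predictor budget at EPV >= 10: {budget_lenient}")
print(f"Predictor budget at EPV >= 50: {budget_strict}")
```

Under these assumed numbers, raising the required events per predictor sharply reduces how many candidate predictors the same cohort can support, which is the trade-off the row describes.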