Eur Heart J Digit Health. 2022 May 13;3(2):311–322. doi: 10.1093/ehjdh/ztac025

Table 1. Differences between traditional statistical models and machine learning algorithms

| | Traditional regression models | Machine learning |
|---|---|---|
| **Assumptions** | | |
| Independence | Predictors are assumed to be independent of each other. | Predictors do not need to be independent. |
| Multicollinearity | No multicollinearity: predictors should not correlate with each other. | Multicollinearity is allowed. |
| **Predictors** | | |
| Selection of predictors | Prespecified. | Does not have to be prespecified. |
| **Data structure** | | |
| Reasoning | Inductive: derives a rule for the relationship between the input and the outcome.[11] | Transductive: can predict outcomes using inputs from the training set without deriving a general rule.[11] |
| Dimensionality | Performs well with a low signal-to-noise ratio, but poorly with high-dimensional data. | Performs well with high-dimensional data that have a high signal-to-noise ratio.[12] |
| Sample size | Smaller sample size; fewer events required per predictor. | Larger sample size; more events required per predictor. |
| **Performance** | | |
| Interactions | Can test for a limited number of prespecified interactions.[12] | Can handle a large number of non-prespecified interactions.[12] |
| Effect size | The effect of individual predictors is of interest. | The effect of individual predictors is not of interest; prediction is prioritized. |
| Performance | Lower accuracy. | Higher accuracy, particularly for non-linear, non-smooth relationships. |
| Interpretability | Models are easier to interpret and explain. | Models are more challenging to interpret and can be a 'black box'. |
| Dichotomization | Calibration is poor with dichotomized predictor and outcome variables. | Better calibration with dichotomized predictor and outcome variables. |
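As a minimal illustration of several rows in Table 1 (prespecified, interpretable effect sizes versus non-prespecified interactions and non-linear relationships), the sketch below fits a logistic regression and a random forest to the same synthetic data. It is not taken from the article: scikit-learn, the variable names, and the simulated data-generating rule are all assumptions made for illustration.

```python
# Illustrative sketch only (not the article's analysis): contrast a traditional
# logistic regression with a random forest when the true outcome depends on a
# non-linear term and an interaction that neither model is told about.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))

# Hypothetical data-generating rule: a squared term and a product term,
# i.e. structure that is not prespecified in either model below.
logit = 1.5 * X[:, 0] ** 2 + 2.0 * X[:, 1] * X[:, 2] - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Traditional model: additive and linear in the predictors; its coefficients
# are directly interpretable as log-odds effect sizes.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Logistic regression coefficients:", lr.coef_.round(2))
print("Logistic regression AUC:",
      roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]).round(3))

# Machine learning model: can capture the interaction and non-linearity
# without prespecification, but offers no simple per-predictor effect sizes.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("Random forest AUC:",
      roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]).round(3))
```

On data simulated this way, the linear model's coefficients stay near zero and its discrimination is close to chance, while the forest recovers much of the structure; on genuinely linear, low-dimensional data the gap typically narrows or disappears, which is the trade-off the table summarizes.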