Table 1.
Differences between traditional statistical models and machine learning algorithms
Method | Traditional regression models | Machine learning |
---|---|---|
Assumptions | ||
Independence | Predictors are assumed to be independent of each other. | Predictors do not need to be independent. |
Multicollinearity | No multicollinearity: predictors should not correlate with each other. | Multicollinearity allowed. |
Predictors | ||
Selection of predictors | Prespecified. | Does not have to be prespecified. |
Data structure | ||
Reasoning | Inductive: derives a rule for the relationship between the input and the outcome.11 | Transductive: can predict outcomes using inputs from the training set without deriving a general rule.11 |
Dimensionality | Performs well with a low signal-to-noise ratio, but poorly with high-dimensional data. | Performs well with high-dimensional data that has a high signal-to-noise ratio.12 |
Sample size | Smaller sample size, fewer events required per predictor. | Larger sample size, more events required per predictor. |
Performance | ||
Interactions | Can test for a limited number of prespecified interactions.12 | Can handle a large number of non-prespecified interactions.12 |
Effect size | The effect of individual predictors is of interest. | The effect of individual predictors is not of interest; prediction is prioritized. |
Performance | Lower accuracy. | Higher accuracy, particularly for non-linear, non-smooth relationships. |
Interpretability | Models are easier to interpret and explain. | Models are more challenging to interpret and can be a ‘black box’. |
Dichotomization | Poor calibration with dichotomized predictor and outcome variables. | Better calibration with dichotomous predictor and outcome variables. |
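
The multicollinearity and sample size rows of Table 1 can be made concrete with two quick checks. The sketch below is illustrative only and is not taken from the source: the function names are hypothetical, the variance inflation factor (VIF) is a diagnostic assumed here as one common way to quantify multicollinearity, and the two-predictor shortcut VIF = 1 / (1 - r^2) applies only when a model has exactly two predictors.

```python
import math


def events_per_predictor(n_events, n_predictors):
    """Outcome events available per candidate predictor (hypothetical helper)."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    Returns infinity when the predictors are (numerically) perfectly
    correlated, i.e. one is a linear function of the other.
    """
    r = pearson_r(x, y)
    denominator = 1.0 - r * r
    if denominator <= 1e-12:
        return math.inf
    return 1.0 / denominator


# Made-up numbers, for illustration only.
epv = events_per_predictor(n_events=120, n_predictors=12)
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"events per predictor: {epv:.1f}, VIF: {vif:.2f}")
```

A VIF of 1 means the two predictors are uncorrelated, and the value grows without bound as the correlation approaches 1. Which values of either quantity count as acceptable is a modelling judgment that the table does not specify.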