Author manuscript; available in PMC: 2020 Nov 7.
Published in final edited form as: Curr Psychiatry Rep. 2019 Nov 7;21(11):116. doi: 10.1007/s11920-019-1094-0

Table 1.

Key comparisons between machine learning and traditional statistics in healthcare research

| | Machine Learning | Traditional Statistics |
|---|---|---|
| Year conceptualized | 1959 | 17th century |
| Primary goal | Make the best prediction and/or recognize patterns within data (either samples of or an entire study population of interest) | Describe data (samples only) and estimate parameters of an analytic model specified for a population of interest (aka statistical inference) |
| Knowledge of potential relationships between variables | Not required | Not required for description of data, but required for statistical inference |
| Hypotheses | More often hypothesis-generating | More often hypothesis-driven |
| Analysis approach | Often learns from data and models can be difficult to interpret due to extensive use of latent variables (DL & UML black-box phenomenon) | Explicitly specified analytic models for statistical inference and easy to interpret |
| Data size | Very large and can be the size of an entire population of interest | Small to moderate and samples of a population of interest only for statistical inference |
| Number of features | Large and unspecified | Small and explicitly specified for statistical inference |
| Rigor | Minimal model assumptions | Strict model assumptions for statistical inference |
| Interpretability | Limited to data at hand (either example or population) and results | Inference of relationships for the entire population of interest |
| Methods for assessing performance | Often empirically using cross-validation, ROC AUC, % accuracy, sensitivity, and specificity | Statistical and practical significance (e.g., p values, effect sizes) |

AUC = area under the curve; DL = deep learning; ROC = receiver operating characteristic; UML = unsupervised machine learning
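As an illustration of the empirical performance measures listed in the last row of the table, the following minimal Python sketch computes sensitivity, specificity, % accuracy, and ROC AUC for a binary classifier. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for this example and do not come from the article; the AUC is computed with the pairwise-ranking definition (the proportion of positive-negative pairs in which the positive case receives the higher score), which is one of several equivalent ways to obtain it.

```python
# Toy example (hypothetical data, not from the article): empirical
# performance measures for a binary classifier.

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical true labels (1 = condition present) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Assumed decision threshold of 0.5 for turning scores into predictions.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / len(y_true)    # proportion classified correctly
auc = roc_auc(y_true, scores)         # threshold-independent ranking quality

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"accuracy={accuracy:.2%}, ROC AUC={auc:.4f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds. In practice these measures would be estimated on held-out data, for example within cross-validation folds, rather than on the data used to fit the model.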