Table 1.
Characteristic | Machine Learning | Traditional Statistics |
---|---|---|
Year conceptualized | 1959 | 17th century |
Primary goal | Make the best prediction and/or recognize patterns within data (either samples of or an entire study population of interest) | Describe data (samples only) and estimate parameters of an analytic model specified for a population of interest (aka statistical inference) |
Knowledge of potential relationships between variables | Not required | Not required for description of data, but required for statistical inference |
Hypotheses | More often hypothesis-generating | More often hypothesis-driven |
Analysis approach | Often learns from data; models can be difficult to interpret because of extensive use of latent variables (the DL and UML black-box phenomenon) | Analytic models are explicitly specified for statistical inference and are easy to interpret |
Data size | Very large; can be the size of an entire population of interest | Small to moderate; samples of a population of interest only (for statistical inference) |
Number of features | Large and unspecified | Small and explicitly specified for statistical inference |
Rigor | Minimal model assumptions | Strict model assumptions for statistical inference |
Interpretability | Limited to data at hand (either sample or population) and results | Inference of relationships for the entire population of interest |
Methods for assessing performance | Often empirically using cross-validation, ROC AUC, % accuracy, sensitivity, and specificity | Statistical and practical significance (e.g., p values, effect sizes) |
AUC=area under the curve; DL=deep learning; ROC=Receiver Operating Characteristic; UML=unsupervised machine learning
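
As an illustration of the empirical performance measures listed for machine learning in Table 1, the following minimal sketch computes % accuracy, sensitivity, specificity, and ROC AUC for a binary classifier. The labels, scores, and the 0.5 decision threshold are made-up toy values chosen for this example, not data from any study, and the function names are hypothetical helpers. The AUC here is computed with the rank-based (Mann-Whitney) formulation, which is one of several equivalent ways to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / len(y_true)    # fraction classified correctly
auc = roc_auc(y_true, scores)         # threshold-independent ranking quality

print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} AUC={auc:.4f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes how well the scores rank positive cases above negative ones across all thresholds. In practice these measures would typically be estimated on held-out data, for example within each fold of a cross-validation.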