Author manuscript; available in PMC: 2020 Nov 7.

Published in final edited form as: Curr Psychiatry Rep. 2019 Nov 7;21(11):116. doi: 10.1007/s11920-019-1094-0

Table 1.

Key comparisons between machine learning and traditional statistics in healthcare research

	Machine Learning	Traditional Statistics
Year conceptualized	1959	17th century
Primary goal	Make the best prediction and/or recognize patterns within data (either samples of or an entire study population of interest)	Describe data (samples only) and estimate parameters of an analytic model specified for a population of interest (aka statistical inference)
Knowledge of potential relationships between variables	Not required	Not required for description of data, but required for statistical inference
Hypotheses	More often hypothesis-generating	More often hypothesis-driven
Analysis approach	Often learns from data; models can be difficult to interpret because of extensive use of latent variables (the DL and UML black-box phenomenon)	Explicitly specified analytic models for statistical inference that are easy to interpret
Data size	Very large and can be the size of an entire population of interest	Small to moderate and samples of a population of interest only for statistical inference
Number of features	Large and unspecified	Small and explicitly specified for statistical inference
Rigor	Minimal model assumptions	Strict model assumptions for statistical inference
Interpretability	Limited to data at hand (either sample or population) and results	Inference of relationships for the entire population of interest
Methods for assessing performance	Often empirically using cross-validation, ROC AUC, % accuracy, sensitivity, and specificity	Statistical and practical significance (e.g., p values, effect sizes)

AUC=area under the curve; DL=deep learning; ROC=receiver operating characteristic; UML=unsupervised machine learning
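
As a rough illustration of the empirical performance measures named in the last row of Table 1, the sketch below computes sensitivity, specificity, % accuracy, and ROC AUC for a binary classifier. It is a minimal example under stated assumptions, not a method taken from the article: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and cross-validation is left out for brevity.

```python
# Illustrative sketch only: function names, threshold, and data are assumptions.

def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed score threshold.

    Assumes labels are 0/1 and that both classes are present
    (otherwise a denominator would be zero).
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: 1 = condition present, 0 = absent.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.6]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds. In practice these measures would be computed on held-out data, for example within each cross-validation fold.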