2020 Oct 8;1(8):100119. doi: 10.1016/j.patter.2020.100119

Figure 1.

Problem Statement and Schematic of Workflow

Biomedical research has long relied on regression-type analysis tools. However, different scientific communities often use the same regression tools with different objectives. In carefully designed experiments in animals and humans, studies have focused on inferring the role of preselected variables in explaining the observed outcome or experimental conditions, such as isolating cancer-related genetic variants in mouse models. Propelled by larger datasets and recent advances in machine-learning algorithms, applied clinical research has shifted toward combining heterogeneous and rich measurements from different sources for “brute-force” forecasting of practically useful endpoints, such as predicting the duration of hospitalization of new patients based on previous electronic health records. For a long time, these disparate uses of identical analytical tools have proceeded in parallel, with little crosstalk between communities. As a result, the conceptual and empirical relationship between the established agenda of statistical inference and the now expanding agenda of raw prediction performance remains largely obscure. Motivated by the growing need to clarify this relationship, our study carried out a careful comparison of modeling for inference (left) and prediction (right) on identical datasets, based on comprehensive empirical simulations and on revisiting common medical studies. OLS = ordinary least squares; Lasso = least absolute shrinkage and selection operator.