2020 Oct 8;1(8):100119. doi: 10.1016/j.patter.2020.100119

© 2020 The Authors

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Problem Statement and Schematic of Workflow

Biomedical research has long relied on regression-type analysis tools. However, different scientific communities often use the same regression tools with different objectives. In carefully designed experiments in animals and humans, studies have focused on inferring the role of preselected variables in explaining the observed outcome or experimental conditions, such as isolating cancer-related genetic variants in mouse models. Propelled by larger datasets and recent advances in machine-learning algorithms, applied clinical research has shifted toward combining rich, heterogeneous measurements from different sources to “brute-force” forecasts of practically useful endpoints, such as predicting the length of hospital stay of new patients from previous electronic health records. For a long time, these two uses of identical analytical tools have developed in parallel, with little crosstalk between communities. As a result, the conceptual and empirical relationship between the established agenda of statistical inference and the expanding agenda of raw prediction performance remains largely obscure. Motivated by this gap, our study carefully compared modeling for inference (left) and prediction (right) on identical datasets, using comprehensive empirical simulations and revisiting common medical studies. OLS = ordinary least squares; Lasso = least absolute shrinkage and selection operator.