Cell Reports Medicine. 2025 Apr 15;6(4):102081. doi: 10.1016/j.xcrm.2025.102081

Importance of cohort and target trial emulation in clinical research

Xiaohua Liang 1, Di Zhang 2, Huan Wang 3, Muhammad Fahad Tahir 1, Lanling Chen 1, Xiaodong Zhao 4,5,∗, Zhiyong Zou 3,5,∗∗
PMCID: PMC12047467  PMID: 40239630

Abstract

Advances in cohort studies and target trial emulation provide substantial evidence for policymaking to enhance global health. Moving forward, key priorities should focus on standardizing data processing protocols, advancing life-course research, establishing global health data-sharing platforms, and integrating population studies with omics research to support disease prevention and treatment development.

Main text

Introduction

In recent decades, clinical research has been crucial in addressing global public health challenges, from tackling diseases of poverty and the rise of chronic diseases to evaluating health-promoting practices. Clinical research demands rigorous design and employs various tools and methods, such as cohort studies, case-control studies, case-crossover studies, and randomized controlled trials (RCTs). A key concern in population studies is how clinical research can contribute to in-depth scientific investigation while addressing real-world problems.

Here, we discuss the utility of cohort studies and target trial emulation (TTE) in clinical research. We first explore the roles of cohort studies and TTE in advancing clinical research and highlight the importance of integrating these frameworks to produce strong evidence for medical research. Lastly, we address challenges in strengthening cohort studies and TTE and outline future directions for population research.

Applications of cohort studies and target trial emulation

Cohort: Deciphering potential causal associations

Cohort studies are widely used in public health and clinical research because they can address key public health and clinical questions. By pre-selecting defined study populations and following them prospectively over long periods, cohort studies can mitigate the selection bias often inherent in macro-level data (Table 1), which strengthens the basis for inferring potential causal associations. Well-established and world-renowned population studies, such as the UK Biobank, the Framingham Heart Study, and the National Health and Nutrition Examination Survey (NHANES), have provided a scientific foundation for advancing population health and informing health policies. As a robust prospective epidemiological methodology, cohort studies play pivotal roles in identifying disease risk factors, investigating disease progression, and evaluating therapeutic interventions. The identification of multiple ecological, behavioral, and environmental risk factors, along with their potential interactions with genes, has underscored the need for large-scale prospective cohort studies involving diverse populations across different countries and regions. For example, large-scale cohort studies such as the UK Biobank and the China Kadoorie Biobank have made significant contributions by identifying the roles of modifiable risk factors, including smoking, alcohol consumption, and physical activity, alongside genetic predispositions, in mortality and in the incidence of metabolic disorders, respiratory diseases, and cardiovascular diseases in older populations. These findings have further enhanced our understanding of disease pathogenesis.
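As a minimal illustration of how a cohort quantifies a risk factor, the sketch below computes incidence rates from events and person-years of follow-up, then their ratio with an approximate 95% confidence interval (a Wald interval on the log scale). The function names and counts are hypothetical and chosen only for illustration. Real analyses would additionally adjust for confounders, for example with the Cox models listed in Table 1.

```python
import math


def incidence_rate(events: int, person_years: float) -> float:
    """Number of events per person-year of follow-up."""
    return events / person_years


def rate_ratio_with_ci(e1: int, py1: float, e0: int, py0: float, z: float = 1.96):
    """Incidence rate ratio (exposed vs. unexposed) with a Wald CI on the log scale."""
    irr = incidence_rate(e1, py1) / incidence_rate(e0, py0)
    se_log = math.sqrt(1 / e1 + 1 / e0)  # standard error of log(IRR)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical cohort: 120 events in 40,000 person-years among exposed
# participants, 150 events in 100,000 person-years among unexposed ones.
irr, lower, upper = rate_ratio_with_ci(120, 40_000, 150, 100_000)
print(f"IRR = {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```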

Table 1.

Comparison of cohort studies, target trial emulation (TTE), and randomized controlled trials (RCTs)

Characteristic | Cohort study | Target trial emulation | Randomized controlled trial
Study type | observational | both | experimental
Sample size | large | either | small
Representativeness | high | medium | low
Confounding factors | more | some | fewer
Follow-up time | long | either | short
Risk of bias | high | medium | low
Evidence quality | low | medium | high
Proximity to the real world | near | near | far
Statistical analysis | Cox proportional-hazards model or mixed models | flexible; G methods, standard methods for observational data | not specified

Strengths
Cohort study: reliable causality; wide applications; multiple outcome analysis
Target trial emulation: cost-efficiency; reduces self-inflicted biases; informative to patient care (by describing the target trial population, treatment strategies, follow-up time, and outcome, investigators clarify when and how their results might be informative to patient care)

Limitations
Cohort study: time-consuming and costly; loss to follow-up; difficulty controlling confounding factors
Target trial emulation: lack of randomization (observational data inherently lack randomization, requiring an assumption that treatment strategies can be compared as if they were randomized); restriction to pragmatic trials (certain features of highly controlled trials, e.g., blinded treatment assignment, cannot be replicated using observational data); inability to study novel treatments (investigators can only consider treatment strategies used in practice and captured in observational data)

Key points
Cohort study: clear temporal sequence and etiological exploration
Target trial emulation: explicitly defined target trial protocol (time zero)
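Table 1 lists loss to follow-up among the limitations of cohort studies. A standard way to handle participants who leave a study early is to treat them as censored. The sketch below implements the Kaplan-Meier estimator, a simpler relative of the Cox model named in Table 1, on hypothetical follow-up data. It is an illustration under the usual assumption of non-informative censoring, not an analysis from the studies cited here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each participant
    events -- 1 if the outcome occurred at that time, 0 if the participant was
              censored (e.g., lost to follow-up or still event-free at study end)
    Returns (time, survival probability) pairs at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every participant leaves the risk set at their own time
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Participants censored at time t still count as at risk at t.
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= exits[t]
    return curve


# Hypothetical follow-up (years) for five participants; 0 = censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```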

Cohort studies also provide invaluable insights by systematically tracking the temporal changes in individuals’ health status, offering empirical evidence critical to understanding the natural history of diseases. In addition to their advantages in exploring the pathological progression of chronic disease, cohort studies have been instrumental in delineating patterns of cancer metastasis and the biological characteristics associated with various stages of malignancy. These studies provide insights into the underlying physiological and molecular mechanisms, offering a scientific basis for early detection and personalized interventions.

Moreover, cohort studies are fundamental in evaluating therapeutic effectiveness, providing a longitudinal perspective on the long-term outcomes of medical interventions, including pharmacological treatments and surgical procedures, and they help inform public health policies. For instance, cohort studies have provided evidence on the association between melatonin supplement use and the risk of type 2 diabetes and cardiovascular disease,1 as well as on the effectiveness of personalized bleeding risk estimates in increasing the use of bleeding avoidance strategies and reducing bleeding in patients undergoing percutaneous coronary intervention.2 Additionally, pharmacological interventions aimed at slowing the progression of renal disease have been evaluated through cohort studies, highlighting their utility in informing evidence-based clinical practice. These evaluations contribute to the optimization of health policies and public health strategies, helping ensure that clinical practices and interventions are grounded in rigorous scientific data.

Notably, the multiple time point repeated measures design (MTP-RMD), which shares some similarities with cohort studies, serves as a key supplementary approach for determining biomedical therapeutic effects in patients with chronic illnesses or rare diseases. Chronic disease treatments can have delayed or cumulative adverse effects, and the use of MTP-RMD to identify previously unreported negative impacts related to prolonged drug use holds significant clinical importance. Furthermore, MTP-RMD designs facilitate investigations into rare diseases by enabling patients to serve as their own controls, thereby helping to mitigate the challenge of small sample sizes. The integration of technology, such as wearable monitoring devices and remote tracking tools, can further enhance data collection efficiency while minimizing the need for frequent follow-up visits and reducing workforce demands.
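To make the idea of patients serving as their own controls concrete, the sketch below computes, for each scheduled follow-up visit, the mean within-patient change from baseline and a paired t statistic. The data are hypothetical and the function name is illustrative. A real MTP-RMD analysis would more likely use a mixed model that accounts for correlation across all time points.

```python
import math
from statistics import mean, stdev


def change_from_baseline(series):
    """Within-patient change from baseline at each later scheduled visit.

    series -- one list per patient, with values at the same visits (index 0 = baseline).
    Returns (visit index, mean change, paired t statistic) for visits 1, 2, ...
    Assumes at least two patients and non-identical changes at each visit.
    """
    results = []
    for k in range(1, len(series[0])):
        diffs = [patient[k] - patient[0] for patient in series]  # patient as own control
        d_bar = mean(diffs)
        se = stdev(diffs) / math.sqrt(len(diffs))
        results.append((k, d_bar, d_bar / se))
    return results


# Hypothetical symptom scores for five patients at baseline and two follow-up visits.
scores = [
    [10, 9, 8],
    [12, 12, 11],
    [9, 8, 7],
    [11, 11, 10],
    [13, 11, 10],
]
results = change_from_baseline(scores)
for visit, change, t_stat in results:
    print(f"visit {visit}: mean change {change:+.1f}, paired t = {t_stat:.2f}")
```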

TTE: Emulating and extending randomized controlled trials

TTE is a research methodology that uses observational data to emulate a hypothetical randomized trial (the target trial) in order to draw causal inferences.3,4 When RCTs are infeasible or difficult to implement because of ethical concerns, high cost, or design complexity, TTEs can be used to assess the effects of treatments or interventions in real-world settings (Table 1). In practice, TTEs have shown considerable potential in rare disease research, drug safety monitoring, and comparative effectiveness studies, offering a cost-effective and efficient alternative to traditional RCTs. For rare diseases, where patient numbers are limited, traditional large-scale RCTs are often impractical; TTEs can instead draw on large volumes of real-world healthcare data to study treatment effects and drug safety in these populations. For example, by aggregating data from multiple healthcare organizations, investigators can emulate target trials that evaluate the impact of specific treatments on patients with rare diseases.

The potential of TTE in drug safety monitoring has been highlighted in numerous studies. After a drug is marketed, TTE can be used to monitor its safety with real-world data, enabling timely detection of rare adverse reactions and potential long-term safety issues. By analyzing observational data from diverse populations of drug users, TTEs approximate the conditions of an RCT and can assess safety profiles across different demographic groups. For instance, a TTE using nationwide real-world data in the US examined the association between semaglutide, a glucagon-like peptide-1 receptor agonist used to treat type 2 diabetes, and first-time diagnosis of Alzheimer's disease in patients with type 2 diabetes.5 During the COVID-19 pandemic, TTEs were used to evaluate the effectiveness of treatments such as molnupiravir, nirmatrelvir-ritonavir, and azvudine,6 providing scientific evidence on their safety and efficacy against emerging viral threats.

Furthermore, when multiple treatments coexist and head-to-head RCTs are lacking, TTEs can compare their effectiveness directly. For example, a TTE has been designed to compare the effectiveness of omalizumab, mepolizumab, and dupilumab in treating asthma.7 Similarly, during the COVID-19 pandemic, TTEs were used to assess the comparative effectiveness of combination therapy (nirmatrelvir-ritonavir plus remdesivir) versus monotherapy (remdesivir or nirmatrelvir-ritonavir) in hospitalized patients, offering valuable insights to guide clinical decision-making.

Integration of cohort studies and target trial emulation

Healthcare data routinely collected from claims databases, registries, and electronic health records (EHRs) are increasingly used to investigate causal questions about the benefits and risks of medical treatments. When conducted rigorously, observational studies, especially cohort studies, enable researchers to assess populations under-represented in clinical trials, compare interventions directly instead of relying solely on placebo comparisons, and explore health outcomes beyond those examined in traditional trials. Whereas cohort studies provide long-term, large-scale observational data, TTEs require structured data to emulate a trial. This creates an opportunity for long-term follow-up data from cohort studies (e.g., interventions, outcomes, and covariates) to serve as the observational input for target trial emulations. For example, to study the effects of a drug on health outcomes within a specific population, a traditional cohort study follows groups with different exposures. Within the TTE framework, more robust results can be achieved by precisely defining the inclusion and exclusion criteria for the study population, carefully determining time zero (e.g., the time of first drug use or the start of follow-up), and analyzing the data with appropriate statistical methods (e.g., propensity score matching or inverse probability weighting) that balance measured covariates between groups and thereby emulate randomization.
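The following sketch illustrates inverse probability weighting, one of the methods named above, in the simplest possible setting: a single discrete confounder, so that the propensity score P(treatment | confounder) can be estimated as the treated proportion within each stratum. All data and names are hypothetical. The example is constructed so that the crude comparison suggests harm, while the weighted comparison recovers the benefit seen within each stratum.

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """P(A = 1 | L) estimated as the treated proportion within each stratum of L."""
    n = defaultdict(int)
    treated = defaultdict(int)
    for l, a, _ in records:
        n[l] += 1
        treated[l] += a
    return {l: treated[l] / n[l] for l in n}


def ipw_risk_difference(records):
    """Risk difference (treated minus untreated) using inverse probability weights.

    records -- (L, A, Y) tuples: discrete confounder L, treatment A in {0, 1},
               binary outcome Y in {0, 1}. Every stratum must contain both
               treated and untreated people (positivity).
    """
    ps = propensity_by_stratum(records)
    weighted_events = {0: 0.0, 1: 0.0}
    weight_totals = {0: 0.0, 1: 0.0}
    for l, a, y in records:
        w = 1 / ps[l] if a == 1 else 1 / (1 - ps[l])
        weighted_events[a] += w * y
        weight_totals[a] += w
    # Normalized weighted risks in each arm of the reweighted ("pseudo-randomized") population
    return weighted_events[1] / weight_totals[1] - weighted_events[0] / weight_totals[0]


def crude_risk_difference(records):
    """Unadjusted comparison, for contrast."""
    def risk(arm):
        ys = [y for _, a, y in records if a == arm]
        return sum(ys) / len(ys)
    return risk(1) - risk(0)


def stratum(l, n_treated, events_treated, n_untreated, events_untreated):
    """Build hypothetical (L, A, Y) records for one confounder stratum."""
    return ([(l, 1, 1)] * events_treated + [(l, 1, 0)] * (n_treated - events_treated)
            + [(l, 0, 1)] * events_untreated + [(l, 0, 0)] * (n_untreated - events_untreated))


# Hypothetical data: sicker patients (L = 1) are treated more often and have
# higher risk; within each stratum, treatment lowers risk by 0.10.
data = stratum(0, 20, 2, 80, 16) + stratum(1, 80, 32, 20, 10)
print(f"crude RD = {crude_risk_difference(data):+.2f}")
print(f"IPW RD   = {ipw_risk_difference(data):+.2f}")
```

Because the propensity model here is saturated in a single discrete confounder, the weighted estimate equals the standardized within-stratum effect; with many or continuous confounders, a parametric model such as logistic regression would usually be needed.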

To implement the TTE framework within a cohort study, researchers interested in this integration can follow the four-step guide shown in Figure 1A. First, researchers need to specify elements such as eligibility criteria, treatment strategies, assignment procedures, and follow-up duration within the limitations of the available observational data. Unlike traditional RCTs, the validity of TTE relies heavily on the quality and timeliness of observational data, highlighting the importance of addressing time-dependent confounding and establishing clear exposure definitions. Consequently, investigators must carefully balance baseline confounders and align time zero (the point at which eligibility is met, treatment strategies are assigned, and follow-up begins) in the cohort emulation. The statistical analyses and validation procedures should be clearly specified. Both internal and external validation, as depicted in Figure 1A, are crucial to ensure the robustness of the findings, and models should be updated continuously to reflect evolving clinical practice.
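Step 1 of the guide can be made explicit by writing the target trial protocol down as a data structure before any outcome data are analyzed. The sketch below is one hypothetical way to do so in Python (3.9 or later); the field names, the example "drug X", and the criteria are assumptions for illustration, not a published template. Evaluating every eligibility criterion with information available at time zero is what keeps eligibility, treatment assignment, and the start of follow-up aligned.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class TargetTrialProtocol:
    """Hypothetical written specification of the trial a cohort analysis will emulate."""
    eligibility: list[Callable[[dict], bool]]  # criteria evaluated at time zero
    treatment_strategies: tuple[str, str]
    assignment: str           # how randomization is emulated (e.g., weighting)
    time_zero_field: str      # record field that defines time zero
    follow_up_days: int
    outcome: str
    causal_contrast: str = "observational analogue of intention-to-treat"

    def time_zero(self, person: dict) -> date:
        return person[self.time_zero_field]

    def is_eligible(self, person: dict) -> bool:
        # Using only information available at time zero keeps eligibility,
        # treatment assignment, and the start of follow-up aligned.
        return all(rule(person) for rule in self.eligibility)


# Illustrative new-user protocol for a hypothetical "drug X".
protocol = TargetTrialProtocol(
    eligibility=[
        lambda p: 18 <= p["age_at_time_zero"] <= 75,
        lambda p: not p["prior_drug_x_use"],  # new users only
    ],
    treatment_strategies=("initiate drug X at time zero", "do not initiate drug X"),
    assignment="inverse probability weighting on confounders measured before time zero",
    time_zero_field="diagnosis_date",
    follow_up_days=730,
    outcome="all-cause mortality",
)

person = {"age_at_time_zero": 40, "prior_drug_x_use": False, "diagnosis_date": date(2015, 3, 1)}
print(protocol.is_eligible(person), protocol.time_zero(person))
```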

Figure 1. Importance of cohort and target trial emulation in clinical research

(A) A four-step guide toward integration of cohort and target trial emulation.

(B) Future directions of cohort studies and target trial emulation.

A notable study conducted in Sweden combined a large cohort with the TTE framework to investigate the association between attention deficit hyperactivity disorder (ADHD) medication and mortality.8 The investigators first specified the protocol of the target trial: individuals aged 6 through 64 years with a new diagnosis of ADHD from 2007 through 2018 and no prior dispensation of ADHD medication. Follow-up began at ADHD diagnosis and ended at death, emigration, 2 years after diagnosis, or December 31, 2020, whichever occurred first. To emulate this target trial, the researchers identified 148,578 eligible individuals and assigned them to the ADHD medication group (84,204 individuals) or the control group (the remaining 64,374 individuals) according to whether they initiated ADHD medication within 3 months of diagnosis. Measured confounders were adjusted for using inverse probability weighting in subsequent analyses.
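The protocol elements described for this study translate naturally into code. The sketch below encodes the eligibility window, the assignment rule (initiation within 3 months of diagnosis), and the end of follow-up as the earliest of the listed events. Two details are assumptions not stated above: 3 months is approximated as 91 days and 2 years as 730 days. This naive classification also ignores a known subtlety of grace-period designs (a person who dies before initiating within the grace period would be counted as a control), which emulations commonly address with cloning, censoring, and weighting. The sketch does not claim to reproduce the original analysis.

```python
from datetime import date, timedelta
from typing import Optional

# Protocol constants from the study description; the day counts are assumptions.
ENROLMENT_START, ENROLMENT_END = date(2007, 1, 1), date(2018, 12, 31)
ADMINISTRATIVE_END = date(2020, 12, 31)
GRACE_PERIOD = timedelta(days=91)    # "within 3 months" approximated as 91 days
MAX_FOLLOW_UP = timedelta(days=730)  # "2 years" approximated as 730 days


def is_eligible(age_at_diagnosis: int, diagnosis: date, prior_dispensation: bool) -> bool:
    return (6 <= age_at_diagnosis <= 64
            and ENROLMENT_START <= diagnosis <= ENROLMENT_END
            and not prior_dispensation)


def assign_arm(diagnosis: date, first_dispensation: Optional[date]) -> str:
    """Naive grace-period classification: medication if initiated within the grace period."""
    if first_dispensation is not None and diagnosis <= first_dispensation <= diagnosis + GRACE_PERIOD:
        return "medication"
    return "control"


def end_of_follow_up(diagnosis: date, death: Optional[date] = None,
                     emigration: Optional[date] = None) -> date:
    """Follow-up ends at whichever listed event occurs first."""
    candidates = [diagnosis + MAX_FOLLOW_UP, ADMINISTRATIVE_END]
    candidates += [d for d in (death, emigration) if d is not None]
    return min(candidates)


dx = date(2012, 5, 10)
print(is_eligible(15, dx, prior_dispensation=False))  # True
print(assign_arm(dx, date(2012, 6, 20)))               # medication (41 days after diagnosis)
print(assign_arm(dx, None))                            # control
print(end_of_follow_up(dx))                            # 2014-05-10
print(end_of_follow_up(dx, death=date(2013, 1, 1)))    # 2013-01-01
```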

This integration substantially reduces the costs and ethical constraints associated with traditional RCTs by combining the breadth of cohort data with the causal inference advantages of target trial emulation. It is particularly well suited for exploring long-term effectiveness or rare outcomes, such as the 10-year effects of cardiovascular medications. Compared with observational designs such as cross-sectional and case-control studies, TTEs integrated with cohorts can minimize the recall bias inherent in case-control studies and strengthen the rigor of causal inference. The successful integration of cohort studies and TTEs provides a promising pathway for converting observational data into causal evidence, delivering significant value for both scientific research and clinical decision-making.

Current challenges and future directions

Cohort studies often demand substantial time and financial investment and require careful attention to challenges such as loss to follow-up and confounding. Similarly, the large-scale applicability of TTEs is limited by the non-random nature of the samples drawn from observational data. Even with near-perfect integration of both approaches, persistent challenges (e.g., limited representativeness and insufficient exploration of biological mechanisms) remain. Although long-term, dedicated efforts are needed to unravel the complex interplay between exposures and health outcomes, adopting an ecological approach to clinical research, combined with rapidly advancing biotechnology, holds promise for addressing future global public health challenges. Building on these two approaches, we propose four future directions in population research that could significantly advance clinical research efforts (Figure 1B).

First, standardizing data collection and processing procedures, along with optimizing statistical analysis methods, is crucial for enhancing the causal explanatory power of population research. Comprehensive and accurate data should be collected, with detailed documentation of their sources and quality indicators. Including comprehensive information on potential confounders can improve data quality and minimize their impact on study results. Furthermore, advanced and appropriate statistical methods such as propensity score matching, inverse probability weighting, and marginal structural modeling should be employed to better adjust for confounding factors, thereby improving the accuracy and reliability of the findings. Multiple sensitivity analyses should also be conducted to evaluate the impact of different assumptions and analytic methods on study results, ultimately strengthening the robustness of the conclusions.
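Among the many possible sensitivity analyses, one widely used summary for unmeasured confounding is the E-value proposed by VanderWeele and Ding. It is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. The method is not discussed in the text above and is offered only as an example. The sketch assumes the estimate is already on the risk-ratio scale, and the numbers are hypothetical.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk-ratio point estimate (VanderWeele and Ding's formula)."""
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (1 if the CI includes 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)


# Hypothetical result: RR = 2.0 (95% CI 1.5 to 2.6)
print(f"point estimate: {e_value(2.0):.2f}")                # 3.41
print(f"confidence limit: {e_value_for_ci(1.5, 2.6):.2f}")  # 2.37
```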

Life-course studies, particularly cohort studies spanning the entire life cycle from birth to old age, have become prominent research approaches. These studies systematically reveal the long-term effects of early-life exposures on later health outcomes and help uncover the mechanisms underlying healthy aging. Emerging research has demonstrated that childhood environmental factors can significantly influence health outcomes throughout adulthood. For example, studies have investigated the link between non-high-density lipoprotein cholesterol levels from childhood to adulthood and cardiovascular disease events,9 as well as the impact of early-life exposure to air and noise pollution on mental health from adolescence to young adulthood.10 These findings highlight the critical role of birth cohorts in bridging early-life exposures and long-term health outcomes. As a result, birth cohorts, such as the US National Children's Study and the UK Life Study, will remain indispensable for advancing our understanding of population health across the life course, offering valuable insights into the early determinants of health and aging.

Moreover, global health data-sharing platforms will enhance collaboration in multinational cohort studies, fostering synergistic research on global health challenges such as climate change and infectious disease outbreaks. These platforms will also enable a more nuanced examination of population heterogeneity, facilitating in-depth investigation of how factors such as geography, gender, race, socioeconomic status, and environmental exposures11,12 influence differential disease risks and health outcomes. Such research will provide essential data for health equity studies, offering a comprehensive understanding of disparities in health outcomes across diverse populations. The Global Burden of Disease (GBD) study exemplifies a major global health data initiative, providing estimates for hundreds of diseases and injuries and dozens of risk factors across more than 200 countries and territories. It also provides subnational estimates for more than 20 countries, playing a pivotal role in enhancing the comparability of health outcomes across time, geography, and demographic groups. Through these efforts, global health data platforms will not only deepen our understanding of disease dynamics but also support evidence-based policies aimed at reducing health inequities and improving public health worldwide.
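One routine step behind such comparability is age standardization; the GBD study, for example, reports age-standardized rates. The sketch below shows direct standardization with two hypothetical populations that share the same age-specific rates but differ in age structure: their crude rates differ, whereas their age-standardized rates coincide. The two-group standard population weights are invented for illustration and are not the GBD standard.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly age-standardized rate per `per` people.

    cases, population -- counts in each age group for one population
    standard_weights  -- share of the standard population in each age group (sums to 1)
    """
    if abs(sum(standard_weights) - 1) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return per * sum(w * c / n for c, n, w in zip(cases, population, standard_weights))


def crude_rate(cases, population, per=100_000):
    """Unstandardized rate: total cases over total population."""
    return per * sum(cases) / sum(population)


# Two hypothetical populations with identical age-specific rates
# (0.1% in the younger group, 1% in the older group) but different age structures.
young_pop = {"cases": [80, 200], "population": [80_000, 20_000]}
old_pop = {"cases": [20, 800], "population": [20_000, 80_000]}
weights = [0.6, 0.4]  # hypothetical standard population, not the GBD standard

for name, p in [("younger", young_pop), ("older", old_pop)]:
    crude = crude_rate(p["cases"], p["population"])
    asr = age_standardized_rate(p["cases"], p["population"], weights)
    print(f"{name}: crude {crude:.0f}, age-standardized {asr:.0f} per 100,000")
```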

The integration of multi-omics technologies with cohort studies and TTEs marks a major advance in precision medicine. By combining genomics, transcriptomics, proteomics, and metabolomics data, researchers can uncover the molecular mechanisms underlying diseases and gain a more comprehensive understanding of the regulatory networks linking genetic variation to phenotypic expression. This integration of diverse data layers not only aids in identifying novel biomarkers but also improves the accuracy of early disease detection, classification, and prognostic evaluation. For example, proteomics-driven biomarker discovery and validation have been extensively used in clinical research on pulmonary diseases13 and inflammatory bowel diseases,14 showcasing the translational potential of this approach. Moreover, the fusion of multi-omics data has the power to transform the development and application of personalized therapeutic approaches. This shift toward individualized care holds great promise for optimizing patient outcomes and advancing precision medicine.
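The diagnostic accuracy of a candidate biomarker is commonly summarized by the area under the ROC curve (AUC). The sketch below computes the AUC directly from its probabilistic definition: the chance that a randomly chosen case has a higher biomarker value than a randomly chosen control, counting ties as one half. The protein values are hypothetical, and the pairwise loop is chosen for clarity rather than efficiency.

```python
def auc(case_values, control_values):
    """AUC as P(case value > control value), counting ties as 1/2 (Mann-Whitney form)."""
    score = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Hypothetical pre-diagnostic protein levels (arbitrary units)
cases = [3.1, 2.4, 4.0, 2.9]
controls = [1.2, 2.4, 2.0, 1.8]
print(f"AUC = {auc(cases, controls)}")
```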

Declaration of interests

The authors declare no competing interests.

Contributor Information

Xiaodong Zhao, Email: zhaoxd530@aliyun.com.

Zhiyong Zou, Email: harveyzou2002@bjmu.edu.cn.

References

1. Li Y., Huang T., Redline S., Willett W.C., Manson J.E., Schernhammer E.S., Hu F.B. Use of melatonin supplements and risk of type 2 diabetes and cardiovascular diseases in the USA: insights from three prospective cohort studies. Lancet Diabetes Endocrinol. 2024;12:404–413. doi: 10.1016/S2213-8587(24)00096-2.
2. Spertus J.A., Decker C., Gialde E., Jones P.G., McNulty E.J., Bach R., Chhatriwalla A.K. Precision medicine to improve use of bleeding avoidance strategies and reduce bleeding in patients undergoing percutaneous coronary intervention: prospective cohort study before and after implementation of personalized bleeding risks. BMJ. 2015;350:h1302. doi: 10.1136/bmj.h1302.
3. Hernán M.A., Wang W., Leaf D.E. Target Trial Emulation: A Framework for Causal Inference From Observational Data. JAMA. 2022;328:2446–2447. doi: 10.1001/jama.2022.21383.
4. Matthews A.A., Danaei G., Islam N., Kurth T. Target trial emulation: applying principles of randomised trials to observational studies. BMJ. 2022;378. doi: 10.1136/bmj-2022-071108.
5. Wang W., Wang Q., Qi X., Gurney M., Perry G., Volkow N.D., Davis P.B., Kaelber D.C., Xu R. Associations of semaglutide with first-time diagnosis of Alzheimer's disease in patients with type 2 diabetes: Target trial emulation using nationwide real-world data in the US. Alzheimers Dement. 2024;20:8661–8672. doi: 10.1002/alz.14313.
6. Bajema K.L., Berry K., Streja E., Rajeevan N., Li Y., Mutalik P., Yan L., Cunningham F., Hynes D.M., Rowneki M., et al. Effectiveness of COVID-19 Treatment With Nirmatrelvir-Ritonavir or Molnupiravir Among U.S. Veterans: Target Trial Emulation Studies With One-Month and Six-Month Outcomes. Ann. Intern. Med. 2023;176:807–816. doi: 10.7326/M22-3565.
7. Akenroye A.T., Segal J.B., Zhou G., Foer D., Li L., Alexander G.C., Keet C.A., Jackson J.W. Comparative effectiveness of omalizumab, mepolizumab, and dupilumab in asthma: A target trial emulation. J. Allergy Clin. Immunol. 2023;151:1269–1276. doi: 10.1016/j.jaci.2023.01.020.
8. Li L., Zhu N., Zhang L., Kuja-Halkola R., D'Onofrio B.M., Brikell I., Lichtenstein P., Cortese S., Larsson H., Chang Z. ADHD Pharmacotherapy and Mortality in Individuals With ADHD. JAMA. 2024;331:850–860. doi: 10.1001/jama.2024.0851.
9. Wu F., Jacobs D.R., Daniels S.R., Kähönen M., Woo J.G., Sinaiko A.R., Viikari J.S.A., Bazzano L.A., Steinberger J., Urbina E.M., et al. Non–High-Density Lipoprotein Cholesterol Levels From Childhood to Adulthood and Cardiovascular Disease Events. JAMA. 2024;331:1834–1844. doi: 10.1001/jama.2024.4819.
10. Newbury J.B., Heron J., Kirkbride J.B., Fisher H.L., Bakolis I., Boyd A., Thomas R., Zammit S. Air and Noise Pollution Exposure in Early Life and Mental Health From Adolescence to Young Adulthood. JAMA Netw. Open. 2024;7. doi: 10.1001/jamanetworkopen.2024.12169.
11. Deivanayagam T.A., English S., Hickel J., Bonifacio J., Guinto R.R., Hill K.X., Huq M., Issa R., Mulindwa H., Nagginda H.P., et al. Envisioning environmental equity: climate change, health, and racial justice. Lancet. 2023;402:64–78. doi: 10.1016/S0140-6736(23)00919-4.
12. Dwyer-Lindgren L., Kendrick P., Baumann M.M., Li Z., Schmidt C., Sylte D.O., Daoud F., La Motte-Kerr W., Aldridge R.W., Bisignano C., et al. Disparities in wellbeing in the USA by race and ethnicity, age, sex, and location, 2008–21: an analysis using the Human Development Index. Lancet. 2024;404:2261–2277. doi: 10.1016/S0140-6736(24)01757-4.
13. Wang R., Gabriel S.E., Ward M.M. Progression of Nonradiographic Axial Spondyloarthritis to Ankylosing Spondylitis: A Population-Based Cohort Study. Arthritis Rheumatol. 2016;68:1415–1421. doi: 10.1002/art.39542.
14. Grännö O., Bergemalm D., Salomon B., Lindqvist C.M., Hedin C.R.H., Carlson M., Dannenberg K., Andersson E., Keita Å.V., Magnusson M.K., et al. Preclinical Protein Signatures of Crohn's Disease and Ulcerative Colitis: A Nested Case-Control Study Within Large Population-Based Cohorts. Gastroenterology. 2024;168:741–753. doi: 10.1053/j.gastro.2024.11.006.
