Abstract
Purpose of review
During the COVID-19 pandemic, governments and public health agencies used data science tools and data sources in real time to evaluate pathogen transmissibility, disease burden, healthcare capacity, and treatment and preventive measures. The purpose of this review is to highlight the application of these data sources and methods during the COVID-19 response.
Recent findings
Advances in the development of common data models enabled multisite data networks to overcome healthcare data fragmentation, supporting national surveillance platforms and offering unprecedented statistical power to detect emerging clinical entities like MIS-C and long COVID in diverse pediatric populations. These integrated networks were also used to evaluate the effectiveness of vaccines and therapies. New surveillance approaches combining traditional clinical data with novel data sources, including wastewater detection, web-based search engines, and mobility patterns, yielded comprehensive ensemble approaches that informed public health policy.
Summary
The COVID-19 pandemic highlighted the importance of timely evidence for decision-making during outbreak responses and the benefits of using data science tools to provide real-time, actionable insights, which can help guide our public health response to infectious diseases threats in the future.
Keywords: computable phenotypes, COVID-19, data science, machine learning, pandemic surveillance
INTRODUCTION
The COVID-19 pandemic is considered one of the most severe in recent history, with long-standing social, health and economic impacts. Governments and health agencies needed to consider a number of factors simultaneously, including transmissibility, disease burden, healthcare capacity, economic impact, and behavioral factors, both on a local and global scale. This complexity required a response that utilized all available data from electronic health records (EHRs), public health departments, internet search engines, and virologic samples. Data science, which incorporates mathematics, statistics, specialized programming, advanced analytics, artificial intelligence and machine learning with subject matter expertise to guide decision making, aided in this response.
During the pandemic, several data science analytic approaches and sources were activated rapidly to produce high-quality data accurately and consistently [1▪]. Built upon a decade of advances in clinical data infrastructure, large-scale collaborative networks supported insights in real time, as data were generated. These networks formed the epicenter of the outbreak countermeasures and helped inform decision making and public awareness regarding transmission risk and disease burden. These data helped inform many facets of the COVID-19 response, including testing, variant and disease monitoring, and healthcare resource utilization, as well as the evaluation of vaccination and treatment effectiveness (Fig. 1).
FIGURE 1. Examples of data sources, infrastructure and outputs utilized during the COVID-19 pandemic. Footnote: created in BioRender. Waxse, B. (2025) https://BioRender.com/dywtcix.
In this review, we outline five ways in which data science enabled the COVID-19 response: (1) the development of multiinstitutional networks, (2) global surveillance and visualization methods, (3) treatment evaluations, (4) vaccine effectiveness monitoring, and (5) discovery of postacute sequelae.
THE ROLE OF DATA NETWORKS IN COVID-19
The COVID-19 pandemic catalyzed unprecedented collaboration in clinical data science, building upon a decade of advances in EHR research infrastructure. These collaborative data networks enabled researchers to study COVID-19 at an accelerated pace.
Modern EHR data are maintained by individual health systems, and even though only a few different EHR applications exist, standards and implementation differ dramatically. The challenge of harmonizing data from disparate sources necessitated common data models (CDMs) – standardized frameworks for collecting and querying multiinstitutional data.
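The harmonization step can be illustrated with a minimal sketch. It assumes two hypothetical sites that record the same SARS-CoV-2 PCR test under different local codes; the site names, codes, concept labels, and the `Measurement` record are all invented for illustration and are not real OMOP, PCORnet, or i2b2 identifiers.

```python
from dataclasses import dataclass

# Hypothetical local-to-standard vocabulary maps for two sites. The codes and
# concept labels are invented for illustration; they are not real CDM concepts.
SITE_MAPS = {
    "site_a": {"LAB123": "sars_cov_2_pcr", "POS": "positive", "NEG": "negative"},
    "site_b": {"covid_pcr": "sars_cov_2_pcr", "Detected": "positive",
               "Not Detected": "negative"},
}


@dataclass(frozen=True)
class Measurement:
    """One row in a toy common data model (field names are assumptions)."""
    site: str
    person_id: str
    concept: str
    value: str


def harmonize(site, raw_rows):
    """Translate site-specific rows into the shared Measurement shape."""
    mapping = SITE_MAPS[site]
    harmonized = []
    for row in raw_rows:
        test_code, result = row["test"], row["result"]
        if test_code not in mapping or result not in mapping:
            continue  # unmapped rows are skipped here; a real pipeline would log them
        harmonized.append(
            Measurement(site, row["patient"], mapping[test_code], mapping[result])
        )
    return harmonized


def count_positive(rows, concept):
    """Run one query against the pooled, harmonized rows."""
    return sum(1 for r in rows if r.concept == concept and r.value == "positive")


site_a_raw = [{"patient": "a1", "test": "LAB123", "result": "POS"},
              {"patient": "a2", "test": "LAB123", "result": "NEG"}]
site_b_raw = [{"patient": "b1", "test": "covid_pcr", "result": "Detected"},
              {"patient": "b2", "test": "flu_pcr", "result": "Detected"}]

pooled = harmonize("site_a", site_a_raw) + harmonize("site_b", site_b_raw)
positives = count_positive(pooled, "sars_cov_2_pcr")
```

Once both sites share one vocabulary and one record shape, a single query such as `count_positive` runs unchanged across the pooled data, which is the property that makes multisite analysis practical.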
Three primary CDMs were instrumental for pediatric COVID-19 research infrastructure: Observational Medical Outcomes Partnership (OMOP), National Patient-Centered Clinical Research Network (PCORnet), and Informatics for Integrating Biology and the Bedside (i2b2) data repositories integrated by the SHRINE platform [2–6]. These prepandemic developments in data standardization created an environment where international collaborative networks could quickly analyze real-world patient data.
PEDSnet, established in 2013, emerged as a critical resource for pediatric COVID-19 research [7▪]. PEDSnet, a network within PCORnet, is a national clinical research network that has standardized EHR data for millions of children and provides a novel resource for timely, efficient, and accurate surveillance of infectious diseases outbreaks in children. With data from 11 US pediatric health systems, PEDSnet’s infrastructure allowed for rapid, large-scale evaluation of COVID-19 impact in children [8]. By 2025, over a fifth of its 371 published studies were related to COVID-19 and the network had grown to over 14 million children [9].
The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) became the first multicenter consortium to harmonize EHR data in the US and Europe, connecting 96 hospitals across 5 countries within just months of the pandemic’s onset [10]. Built upon i2b2 and OMOP frameworks, 4CE demonstrated that laboratory value trajectories and clinical presentations were remarkably consistent across international boundaries, although pediatric patients were underrepresented in this cohort [10].
The U.S.-based National COVID Cohort Collaborative (N3C), funded by the National Center for Advancing Translational Sciences, represented a data science collaboration unrivaled in scale and impact for both adult and pediatric research [11▪]. Adopting the OMOP CDM, N3C harmonized data from diverse networks and CDMs into an open and secure enclave containing more than 12 billion clinical observations [11▪]. By 2025, N3C had grown to include data from 84 sites and nearly 23 million patients, supporting 4300 researchers and generating 296 publications [12▪]. Though only around 15% of N3C patients were children, its scale made it an invaluable contributor to pediatric COVID-19 research.
Just as preexisting data infrastructure supported rapid discovery during the pandemic, COVID-19 accelerated the development of clinical data infrastructure beyond scaling existing approaches. Today, lessons from these global data sharing initiatives are driving international collaboration, and the integration of EHR data with other modalities – genomic sequencing, participant surveys, social determinants of health, wearable technologies, and environmental exposures – is expanding our understanding of human health [13].
SURVEILLANCE, DATA VISUALIZATION, AND DATA DASHBOARDS
The COVID-19 pandemic revolutionized infectious diseases surveillance through the integration of traditional surveillance systems with innovative data science approaches. The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dashboard emerged as the world’s premier COVID-19 tracking resource. Launched within just three weeks of the World Health Organization’s announcement of the outbreak in Wuhan, China, it initially mirrored data from the Chinese CDC and WHO before evolving to incorporate data from over 3,500 locations across 195 countries and regions using automated web scraping complemented by manual data collection [14,15,16▪▪]. This resource visualized cases, deaths, recoveries, testing, hospitalization, and vaccinations, serving as an objective source of information and setting a new standard for global pandemic surveillance [15,16▪▪].
While dashboards tracked cases, other approaches provided critical insights into infection prevalence. The UK COVID-19 Infection Survey (CIS), coordinated by the Office for National Statistics, the University of Oxford, and other partners, exemplified a rigorous approach by randomly sampling households for PCR and antibody testing to capture both symptomatic and asymptomatic infections. Data from this survey fed age-stratified transmission models that informed COVID-19 restrictions and related policies on a weekly basis [17]. As the pandemic evolved, data scientists also developed novel surveillance approaches that were particularly valuable for monitoring community transmission. Wastewater-based epidemiology (WBE) emerged as a powerful approach that could detect viral RNA 2–4 days before clinical testing by using quantified chemical and biological markers [18,19]. A comprehensive study of 99 counties from 40 states in the US demonstrated that wastewater surveillance not only predicted hospitalizations better than models based on cases and hospitalization records, but also revealed associations between vaccination rates, vulnerability indices, and COVID-19 outcomes [20]. As public health emergency orders ended and resources decreased, these approaches became increasingly valuable.
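One simple way to quantify a lead time such as the 2–4 days reported for wastewater signals is a lagged correlation. The sketch below is an illustration under stated assumptions, not the method of the cited studies: both series are synthetic, the case series is constructed to trail the wastewater series by exactly 3 days, and the function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_lead(wastewater, cases, max_lag):
    """Return the lag (in days) at which wastewater best correlates with later cases."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair wastewater on day t with reported cases on day t + lag.
        w = wastewater[:len(wastewater) - lag]
        c = cases[lag:]
        scores[lag] = pearson(w, c)
    return max(scores, key=scores.get), scores


# Synthetic daily viral RNA signal (arbitrary units).
wastewater = [1, 2, 4, 8, 13, 9, 5, 3, 2, 1, 1, 1, 1, 1]
# Synthetic case counts built to trail the wastewater signal by 3 days.
cases = [0, 0, 0] + [10 * v for v in wastewater[:-3]]

lead_days, scores = best_lead(wastewater, cases, max_lag=6)
```

On real surveillance data the peak is rarely this sharp: sampling frequency, reporting delays, and changes in testing behavior all blur the relationship, so a lead time estimated this way is best read as approximate.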
The pandemic also supported the development of ensemble forecasting approaches that combined multiple data types to improve prediction accuracy. In Austria, researchers combined wastewater data with Google mobility data to improve the precision of a multivariate model [21]. The CDC developed forecasting systems that integrated case surveillance, hospitalization data, and wastewater monitoring to predict regional outbreaks with improved accuracy and found that median ensemble forecasts outperformed individual components, improving current modeling strategies [22,23].
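A median ensemble of the kind mentioned above can be sketched in a few lines. This is a minimal illustration that assumes each component model reports the same set of predictive quantiles; the model names and all numbers are hypothetical and do not come from the CDC systems or the cited studies.

```python
from statistics import median

# Hypothetical weekly hospital-admission forecasts from three component models,
# each expressed as predictive quantiles. All values are illustrative.
component_forecasts = {
    "case_based":       {0.1: 80, 0.5: 120, 0.9: 200},
    "hospital_based":   {0.1: 90, 0.5: 110, 0.9: 150},
    "wastewater_based": {0.1: 70, 0.5: 100, 0.9: 160},
}


def median_ensemble(forecasts):
    """Combine component forecasts by taking the median at each quantile level."""
    levels = sorted(next(iter(forecasts.values())))
    return {q: median(f[q] for f in forecasts.values()) for q in levels}


ensemble = median_ensemble(component_forecasts)
# ensemble == {0.1: 80, 0.5: 110, 0.9: 160}
```

Taking the median rather than the mean at each quantile level limits the influence of any single component that is badly off in a given week, and it needs no training data to set weights.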
RISK STRATIFICATION
The scale of the massive integrated datasets assembled during the COVID-19 pandemic enabled unmatched statistical power for identifying risk factors for COVID-19 severity and long-term complications [24]. Within months, large-scale EHR analysis from the United Kingdom’s National Health Service identified risk factors associated with severe infection, providing clinicians with early guidance [25]. In other cases, standardized electronic phenotyping, machine learning, and novel data sources like wearables converged to create an enhanced understanding of risk stratification and disease mechanisms [11▪,26▪,27,28▪,29,30].
EVALUATION OF TREATMENTS
During the early phase of the COVID-19 pandemic, there was a critical need to evaluate the effectiveness of treatments against SARS-CoV-2 in order to minimize disease burden and mortality. Observational data played a crucial role in evaluating treatment effectiveness in a timely manner, aiding clinicians as an adjunct to randomized controlled trials (RCTs) or in settings in which RCT data were unavailable, an issue particularly relevant to pediatrics [31,32]. However, such retrospective studies faced methodological challenges, including time-fixed and time-varying confounding and immortal time bias [33]. The target trial emulation framework has been proposed as a more rigorous method that overcomes several of these limitations: a hypothetical RCT (the target trial) is designed and then emulated using observational data [34▪,35▪,36]. EHR data also aided treatment studies during the COVID-19 pandemic by helping to ascertain study feasibility, optimize study design, modernize recruitment by identifying potential participants for clinical trials, and reduce trial data collection costs through automated clinical data extraction [37].
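Immortal time bias, one of the challenges named above, can be made concrete with a small worked example. The sketch below uses an invented six-patient cohort and compares a naive "ever treated" analysis with one that assigns the days before treatment started to the untreated group. It illustrates the bias only; it is not an implementation of the full target trial emulation framework, and the variable and function names are hypothetical.

```python
# Each tuple: (day treatment started, or None if never treated;
#              total follow-up in days; whether the patient died at end of follow-up).
# The cohort is invented for illustration.
cohort = [
    (None, 2, True),
    (None, 4, True),
    (None, 10, False),
    (3, 10, False),
    (5, 10, True),
    (4, 10, False),
]


def mortality_rate_ratio(patients, split_person_time):
    """Treated vs untreated mortality rate ratio (deaths per person-day).

    With split_person_time=False, all follow-up of an eventually treated
    patient counts as treated time, including the days before treatment,
    during which the patient had to survive in order to be treated.
    With split_person_time=True, those pre-treatment days count as untreated.
    """
    deaths = {"treated": 0, "untreated": 0}
    days = {"treated": 0, "untreated": 0}
    for start, follow_up, died in patients:
        if start is None:
            days["untreated"] += follow_up
            deaths["untreated"] += int(died)
            continue
        if split_person_time:
            days["untreated"] += start
            days["treated"] += follow_up - start
        else:
            days["treated"] += follow_up
        deaths["treated"] += int(died)
    treated_rate = deaths["treated"] / days["treated"]
    untreated_rate = deaths["untreated"] / days["untreated"]
    return treated_rate / untreated_rate


naive_rr = mortality_rate_ratio(cohort, split_person_time=False)  # about 0.27
split_rr = mortality_rate_ratio(cohort, split_person_time=True)   # about 0.78
```

In this toy cohort the naive analysis suggests a much larger benefit (rate ratio about 0.27) than the analysis that handles pre-treatment time correctly (about 0.78), even though the underlying data are identical. Specifying a clear time zero and treatment assignment rule, as a target trial protocol requires, is one way to avoid this misclassification.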
VACCINE EFFECTIVENESS
The development and deployment of SARS-CoV-2 vaccines signaled a turning point for the COVID-19 pandemic and provided the general population with protection from severe SARS-CoV-2 disease and mortality. Following the vaccine rollout through emergency use authorizations and conditional approvals, it became necessary to assess the real-world effectiveness of SARS-CoV-2 vaccines and to provide further evidence of their safety through surveillance. The CDC collaborated with a number of public health and academic partners throughout the US to collate EHR and observational data to provide timely assessments and to monitor changes that may arise from emerging variants as well as waning immunity [38–40]. These programs helped inform the community and clinicians and helped shape vaccine policy; they included networks such as Investigating Respiratory Viruses in the Acutely Ill (IVY) [41], Virtual SARS-CoV-2, Influenza, and Other respiratory viruses Network (VISION) [42,43], Overcoming COVID [44,45] and Increasing Community Access to Testing, Treatment and Response (ICATT) [46].
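Several of the cited network studies use a test-negative, case-control design (named in the title of reference [38]), in which vaccine effectiveness is commonly estimated as (1 − odds ratio) × 100. The sketch below shows the unadjusted arithmetic with a Wald confidence interval. The counts are invented, the function name is hypothetical, and published estimates adjust for confounders such as age and calendar time, typically with regression models.

```python
import math


def vaccine_effectiveness(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Unadjusted VE (%) with a 95% Wald interval from a 2x2 test-negative table.

    Cases are test-positive patients; controls are test-negative patients
    with similar illness. Returns (ve, lower, upper) in percent.
    """
    odds_ratio = (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)
    se_log_or = math.sqrt(
        1 / vacc_cases + 1 / unvacc_cases + 1 / vacc_controls + 1 / unvacc_controls
    )
    or_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    or_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    def to_ve(value):
        return (1 - value) * 100

    # A higher odds ratio means lower effectiveness, so the bounds swap.
    return to_ve(odds_ratio), to_ve(or_high), to_ve(or_low)


# Invented counts: 50 vaccinated and 200 unvaccinated test-positive patients,
# 400 vaccinated and 400 unvaccinated test-negative patients.
ve, ve_low, ve_high = vaccine_effectiveness(50, 200, 400, 400)
# Odds ratio = (50 * 400) / (200 * 400) = 0.25, so ve is 75.0
```

Repeating the same calculation within strata of time since vaccination is one way that waning of protection can be tracked.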
EVALUATING THE POST-ACUTE SEQUELAE OF SARS-COV-2
Soon after the COVID-19 pandemic emerged, it was recognized that a number of individuals were suffering from postacute sequelae of SARS-CoV-2, also known as long COVID. Given its heterogeneous manifestations and novelty, there was an urgent need to better understand this condition in both adults and children [47]. In response to this need, the NIH funded the REsearching COVID to Enhance Recovery (RECOVER) Initiative, which was launched in 2021 to advance research into long COVID [48]. One of the cores of RECOVER includes EHR data efforts led by PEDSnet, PCORnet, and N3C. These networks have contributed vast amounts of clinical data that have been used to study aspects related to the detection, prevention, prediction and treatment of long COVID, and have helped to provide accelerated insights into an emerging health condition. Through the harmonization of data, development of computable phenotypes, use of machine learning and other advanced statistical approaches, and collaboration between data scientists and clinicians, a rapidly growing body of literature on long COVID in children and adults has emerged [29,49–51].
CHALLENGES
The scale and speed of COVID-19 data science research also revealed challenges and limitations that will need to inform pediatric data science in the future [52▪]. Conduct of timely research during a global public health emergency must be balanced with ensuring adequate patient privacy, a central tenet of research. Networks such as N3C addressed this issue through a tiered access model with public researcher identification and comprehensive activity logging while maintaining transparency through publicly listed research abstracts [11▪]. Alternatively, the OpenSAFELY model offered an innovative solution by giving researchers dummy data on which to develop analyses, which were then executed against real patient records without granting direct access to the underlying data [25].
The fragmented nature of healthcare delivery, particularly in the US, also posed significant challenges for data capture and validity. In N3C, for example, many networks could only access mortality identified within contributing health systems, limiting the use and reliability of in-hospital mortality for studies [53▪▪]. Data missingness and inappropriate or incomplete variable mapping among sites within large data networks are other common challenges for certain healthcare variables, requiring a rigorous data quality approach and careful variable and cohort selection [12▪]. In some circumstances, missing data had to be aggregated manually, a technique adopted for the Johns Hopkins University CSSE COVID-19 dashboard when data from Arizona were locked behind business intelligence tools [16▪▪].
Early in the pandemic, two high-profile articles were retracted within 2 months of publication [54▪▪] after their data could not be independently verified, emphasizing the importance of data transparency and reproducibility [55,56]. These instances demonstrate the importance of using data that are “fit for purpose” and thoroughly explored before conducting analyses [57▪]. In one study, a thorough consideration of data missingness, EHR discontinuity, and data quality resulted in only 15 out of 67 sites remaining in the final analysis, highlighting the trade-offs between data quality and ensuring a representative cohort [53▪▪].
CONCLUSION
During a public health emergency response, leaders must make critical decisions regarding public health measures, provision of care and allocation of resources. These decisions are often reactive, occurring in a rapidly changing environment where there is little or incomplete information available [58]. Through a collaborative network of public health experts, epidemiologists, informaticians, data scientists and infectious diseases experts, the use of high-quality, integrated data for analysis and surveillance met the call for an effective outbreak response and provided guidance for the utility, scale and timing of prevention strategies. There is a growing use of AI tools for this purpose, including large-scale data abstraction, homogenization and modeling. Emerging infectious pathogens remain a significant threat to public health worldwide. The COVID-19 pandemic highlighted the importance of timely evidence for decision-making during outbreak responses and the benefits of using data science tools to provide real-time, actionable insights, which can help guide our public health response to infectious diseases threats in the future.
KEY POINTS.
The COVID-19 pandemic accelerated data infrastructure development and sharing, enabling a period of discovery that would have been impossible through traditional approaches.
Prepandemic advances in the development of common data models enabled researchers to overcome healthcare data fragmentation, offering unprecedented statistical power to detect emerging clinical entities like MIS-C and long COVID in diverse pediatric populations.
New surveillance approaches combining traditional clinical data with wastewater epidemiology and mobility patterns yielded comprehensive ensemble approaches that informed public policy.
Target trial emulation and other advanced methods have helped overcome traditional limitations of observational data, enhancing the utility of large multicenter datasets and networks.
A collaborative, open-science response to COVID-19 demonstrated that transparent, reproducible methodologies can accelerate discovery while maintaining scientific rigor and patient privacy.
Footnotes
Conflicts of interest
Dr. Rao reports prior grant support from GSK and Biofire and was formerly a consultant for Seqirus. This work was supported in part by the Division of Intramural Research of the National Institute of Allergy and Infectious Diseases.
Disclaimer: Dr Waxse contributed to this article in his personal capacity. The views expressed are his own and do not necessarily represent the views of the National Institutes of Health or the United States Government.
REFERENCES AND RECOMMENDED READING
Papers of particular interest, published within the annual period of review, have been highlighted as:
▪ of special interest
▪▪ of outstanding interest
- 1.▪Polonsky JA, Baidjoe A, Kamvar ZN, et al. Outbreak analytics: a developing data science for informing the response to emerging pathogens. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180276. This article provides an overview of common analytics components, data requirements, challenges and opportunities and role of analytics in outbreak responses.
- 2.Overhage JM, Ryan PB, Reich CG, et al. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc 2012; 19:54–60.
- 3.Hripcsak G, Duke JD, Shah NH, et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015; 216:574–578.
- 4.Fleurence RL, Curtis LH, Califf RM, et al. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014; 21:578–582.
- 5.Visweswaran S, Becich MJ, D’Itri VS, et al. Accrual to clinical trials (ACT): a clinical and translational science award consortium network. JAMIA Open 2018; 1:147–152.
- 6.Topaloglu U, Palchuk MB. Using a federated network of real-world data to optimize clinical trials operations. JCO Clin Cancer Inform 2018; 2:1–10.
- 7.▪Forrest CB, Margolis PA, Bailey LC, et al. PEDSnet: a national pediatric learning health system. J Am Med Inform Assoc 2014; 21:602–606. This article is a description of the national digital architecture to create PEDSnet, and how it can be used for learning health system research.
- 8.Bailey LC, Razzaghi H, Burrows EK, et al. Assessment of 135 794 pediatric patients tested for severe acute respiratory syndrome coronavirus 2 across the United States. JAMA Pediatr 2021; 175:176–184.
- 9.PEDSnet Research Publications. 2025. Available at: https://pedsnet.org/research/publications/.
- 10.Brat GA, Weber GM, Gehlenborg N, et al. International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. npj Digit Med 2020; 3:109.
- 11.▪Haendel MA, Chute CG, Bennett TD, et al. The National COVID Cohort Collaborative (N3C): rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2020; 28:427–443. Summary of the development of the national COVID cohort collaborative.
- 12.▪Walters KM, Clark M, Dard S, et al. National COVID Cohort Collaborative data enhancements: a path for expanding common data models. J Am Med Inform Assoc 2024; 32:391–397. This article provides the methods used for the N3C data enhancements, including understanding domains, defining scope, writing CDM specific and agnostic guidance and data harmonization.
- 13.Mayo KR, Basford MA, Carroll RJ, et al. The all of us data and research center: creating a secure, scalable, and sustainable ecosystem for biomedical research. Annu Rev Biomed Data Sci 2023; 6:443–464.
- 14.Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 2020; 20:533–534.
- 15.▪▪Wells CR, Galvani AP. Tackling the politicisation of COVID-19 data reporting through open access data sharing. Lancet Infect Dis 2022; 22:1660–1661. This article summarizes learnings from Johns Hopkins COVID-19 dashboard, details presented here help develop a framework for future, large-scale public health-related data collection and reporting.
- 16.▪▪Dong E, Ratcliff J, Goyea TD, et al. The Johns Hopkins University Center for Systems Science and Engineering COVID-19 Dashboard: data collection process, challenges faced, and lessons learned. Lancet Infect Dis 2022; 22: e370–e376. This article summarizes learnings from Johns Hopkins COVID-19 dashboard, details presented here help develop a framework for future, large-scale public health-related data collection and reporting.
- 17.Office for National Statistics: Coronavirus (COVID-19) Infection Survey: quality and methodology information (QMI). Available at: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/methodologies/coronaviruscovid19infectionsurveyqmi.
- 18.Sims N, Kasprzyk-Hordern B. Future perspectives of wastewater-based epidemiology: monitoring infectious disease spread and resistance to the community level. Environ Int 2020; 139:105689.
- 19.Nemudryi A, Nemudraia A, Wiegand T, et al. Temporal detection and phylogenetic assessment of SARS-CoV-2 in municipal wastewater. Cell Rep Med 2020; 1:100098.
- 20.Li X, Liu H, Gao L, et al. Wastewater-based epidemiology predicts COVID-19-induced weekly new hospital admissions in over 150 USA counties. Nat Commun 2023; 14:4548.
- 21.Schenk H, Arabzadeh R, Dabiri S, et al. Integrating wastewater-based epidemiology and mobility data to predict SARS-CoV-2 cases. Environments 2024; 11:100.
- 22.Centers for Disease Control and Prevention. Behind the model: wastewater-informed forecasting of COVID-19 hospital admissions. Available at: https://www.cdc.gov/cfa-behind-the-model/php/data-research/wastewater-informed-forecasting/index.html.
- 23.Ray EL, Brooks LC, Bien J, et al. Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States. Int J Forecast 2023; 39:1366–1383.
- 24.Collaborators C-EM. Estimating excess mortality due to the COVID-19 pandemic: a systematic analysis of COVID-19-related mortality. Lancet 2022; 399:1513–1536.
- 25.Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 2020; 584:430–436.
- 26.▪Bennett TD, Moffitt RA, Hajagos JG, et al. Clinical characterization and prediction of clinical severity of SARS-CoV-2 infection among US adults using data from the US National COVID Cohort Collaborative. JAMA Netw Open 2021; 4:e2116901. This first study from N3C identifies factors associated with COVID-19 severity and develops machine learning models that accurately predict adverse outcomes using only the first 24 h of hospital admission data.
- 27.Martin B, DeWitt PE, Russell S, et al. Characteristics, outcomes, and severity risk factors associated with SARS-CoV-2 infection among children in the US National COVID Cohort Collaborative. JAMA Netw Open 2022; 5:e2143151.
- 28.▪Gadaleta M, Radin JM, Baca-Motes K, et al. Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms. npj Digit Med 2021; 4:166. This article demonstrates an explainable gradient boosting model that detects COVID-19 infection using only passive wearable sensor data, eliminating the need for self-reported symptoms and enabling adaptable deployment settings.
- 29.Lorman V, Razzaghi H, Song X, et al. A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program. PLoS One 2023; 18:e0289774.
- 30.Desine S, Master H, Annis J, et al. Daily step counts before and after the COVID-19 pandemic among all of us research participants. JAMA Netw Open 2023; 6:e233526.
- 31.Gupta S, Wang W, Hayek SS, et al. Association between early treatment with tocilizumab and mortality among critically ill patients with COVID-19. JAMA Intern Med 2021; 181:41–51.
- 32.Tsuzuki S, Hayakawa K, Uemura Y, et al. Effectiveness of remdesivir in hospitalized nonsevere patients with COVID-19 in Japan: a large observational study using the COVID-19 Registry Japan. Int J Infect Dis 2022; 118:119–125.
- 33.Martinuka O, von Cube M, Wolkewitz M. Methodological evaluation of bias in observational coronavirus disease 2019 studies on drug effectiveness. Clin Microbiol Infect 2021; 27:949–957.
- 34.▪Hoffman KL, Schenck EJ, Satlin MJ, et al. Comparison of a target trial emulation framework vs Cox regression to estimate the association of corticosteroids with covid-19 mortality. JAMA Netw Open 2022; 5:e2234425. Authors demonstrate that target trial emulation using EHR data successfully reproduced benchmark findings from a meta-analysis of randomized controlled trials, while conventional Cox proportional hazards models failed to consistently achieve similar accuracy.
- 35.▪Martinuka O, Cube MV, Hazard D, et al. Target trial emulation using hospital based observational data: demonstration and application in COVID-19. Life (Basel) 2023; 13:777. This work demonstrates the practical implementation of target trial emulation to address immortal time bias, time-fixed confounding, and competing risks in a COVID-19 hospitalization study.
- 36.Hernan MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol 2016; 183:758–764.
- 37.Kalankesh LR, Monaghesh E. Utilization of EHRs for clinical trials: a systematic review. BMC Med Res Methodol 2024; 24:70.
- 38.Ferdinands JM, Rao S, Dixon BE, et al. Waning of vaccine effectiveness against moderate and severe covid-19 among adults in the US from the VISION network: test negative, case-control study. BMJ 2022; 379:e072141.
- 39.Adams K, Weber ZA, Yang DH, et al. Vaccine effectiveness against pediatric influenza-A-associated urgent care, emergency department, and hospital encounters during the 2023 season: VISION Network. Clin Infect Dis 2024; 78:746–755.
- 40.Thompson MG, Stenehjem E, Grannis S, et al. Effectiveness of COVID-19 vaccines in ambulatory and inpatient care settings. N Engl J Med 2021; 385:1355–1371.
- 41.Link-Gelles R, Chickery S, Webber A, et al. Interim estimates of 2024–2025 COVID-19 vaccine effectiveness among adults aged ≥18 years – VISION and IVY Networks, September 2024–January 2025. MMWR Morb Mortal Wkly Rep 2025; 74:73–82.
- 42.Link-Gelles R, Ciesla AA, Rowley EAK, et al. Effectiveness of monovalent and bivalent mRNA vaccines in preventing COVID-19-associated emergency department and urgent care encounters among children aged 6 months-5 years – VISION Network, United States, July 2022–June 2023. MMWR Morb Mortal Wkly Rep 2023; 72:886–892.
- 43.DeCuir J, Payne AB, Self WH, et al. Interim effectiveness of updated 2023–2024 (monovalent XBB.1.5) COVID-19 vaccines against COVID-19-associated emergency department and urgent care encounters and hospitalization among immunocompetent adults aged ≥18 years – VISION and IVY networks, September 2023–January 2024. MMWR Morb Mortal Wkly Rep 2024; 73:180–188.
- 44.Simeone RM, Zambrano LD, Halasa NB, et al. Effectiveness of maternal mRNA COVID-19 vaccination during pregnancy against COVID-19-associated hospitalizations in infants aged <6 months during SARS-CoV-2 omicron predominance – 20 states, March 9, 2022–May 31, 2023. MMWR Morb Mortal Wkly Rep 2023; 72:1057–1064.
- 45.Halasa NB, Olson SM, Staat MA, et al. Maternal vaccination and risk of hospitalization for COVID-19 among infants. N Engl J Med 2022; 387:109–119.
- 46.Centers for Disease Control and Prevention – vaccine effectiveness studies. 2025. Available at: https://www.cdc.gov/covid/php/surveillance/vaccine-effectiveness-studies.html.
- 47.Rao S, Gross RS, Mohandas S, et al. Postacute sequelae of SARS-CoV-2 in children. Pediatrics 2024; 153:e2023062570.
- 48.RECOVER COVID. 2025. Available at: https://recovercovid.org.
- 49.Rao S, Azuero-Dajud R, Lorman V, et al. Ethnic and racial differences in children and young people with respiratory and neurological postacute sequelae of SARS-CoV-2: an electronic health record-based cohort study from the RECOVER Initiative. EClinicalMedicine 2025; 80:103042.
- 50.Razzaghi H, Forrest CB, Hirabayashi K, et al. Vaccine effectiveness against long COVID in children. Pediatrics 2024; 153:e2023064446.
- 51.Rao S, Lee GM, Razzaghi H, et al. Clinical features and burden of postacute sequelae of SARS-CoV-2 infection in children and adolescents. JAMA Pediatr 2022; 176:1000–1009.
- 52.▪Mandel HL, Shah SN, Bailey LC, et al. Opportunities and challenges in using electronic health record systems to study postacute sequelae of SARS-CoV-2 infection: insights from the NIH RECOVER initiative. J Med Internet Res 2025; 27:e59217. This article examines the potential and limitations of using EHR data to study novel, emerging, and multifaceted conditions such as long COVID.
- 53.▪▪Sidky H, Young JC, Girvin AT, et al. Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C). BMC Med Res Methodol 2023; 23:46. This cohort study comprehensively addresses critical data quality challenges within the N3C cohort, including site-specific variation, data missingness, data discontinuity, and drug exposure measurement issues. The methodological approach provides a thorough framework that can serve as a model for multisite clinical research using EHR data in any domain.
- 54.▪▪Kohane IS, Aronow BJ, Avillach P, et al. What every reader should know about studies using electronic health record data but may be afraid to ask. J Med Internet Res 2021; 23:e22219. This article describes the opportunities and limitations of EHR data-driven studies, with a summary of key considerations for constructing and appraising EHR-based studies.
- 55.Mehra MR, Desai SS, Kuy S, et al. Cardiovascular disease drug therapy and mortality in COVID-19. N Engl J Med 2020; 382:e102. [Retracted]
- 56.Mehra MR, Desai SS, Ruschitzka F, Patel AN. RETRACTED: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet 2020; S0140-6736:31180-31186. [Retracted]
- 57.▪Gatto NM, Campbell UB, Rubinstein E, et al. The structured process to identify fit-for-purpose data: a data feasibility assessment framework. Clin Pharmacol Ther 2022; 111:122–134. This work presents a structured framework for identifying decision-grade, fit-for-purpose real-world data for clinical research.
- 58.Wernstedt K, Roberts PS, Arvai J, Redmond K. How emergency managers (mis?)interpret forecasts. Disasters 2019; 43:88–109.
