Author manuscript; available in PMC: 2024 Sep 15.
Published in final edited form as: Am J Cardiol. 2023 Jul 25;203:136–148. doi: 10.1016/j.amjcard.2023.06.104

Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record

Lovedeep Singh Dhingra a, Miles Shen b, Anjali Mangla c, Rohan Khera a,d,e
PMCID: PMC10865722  NIHMSID: NIHMS1915150  PMID: 37499593

Abstract

The electronic health record (EHR) represents a rich source of patient information, increasingly being leveraged for cardiovascular research. While its primary use remains the seamless delivery of healthcare, various longitudinally aggregated structured and unstructured data elements for each patient within the EHR can define the computational phenotypes of disease, and care signatures and their association with outcomes. While structured data elements, such as demographic characteristics, laboratory measurements, problem lists, and medications, are easily extracted, unstructured data are underutilized. The latter include free text in clinical narratives, documentation of procedures, and reports of imaging and pathology. Rapid scaling up of data storage, and rapid innovation in natural language processing and computer vision can power insights from unstructured data streams. However, despite an array of opportunities for research using the EHR, specific expertise is necessary to adequately address confidentiality, accuracy, completeness, and heterogeneity challenges in EHR-based research. These often require methodological innovation and best practices to design and conduct successful research studies. Our review discusses these challenges and their proposed solutions. Additionally, we highlight ongoing innovations in federated learning in the EHR through greater focus on common data models, and discuss ongoing work that defines such an approach to large-scale, multi-center, federated studies. Such parallel improvements in technology and research methodology enable innovative care and optimization of patient outcomes.

Keywords: Electronic Health Records, Federated Learning, Natural Language Processing

CURRENT STATE OF THE EHR IN CARDIOVASCULAR RESEARCH

Electronic health records (EHRs) are electronic platforms that collect, store, and present clinical data linked longitudinally for each patient. The primary purpose of the EHR is to allow for the seamless delivery of healthcare.1 However, the EHR can provide abundant sociodemographic and clinical data that can potentiate and galvanize clinical research, spanning investigations into the epidemiology of disease, healthcare resource utilization, disease phenotyping, outcome prediction, pragmatic trials, and assessments of comparative effectiveness and safety of treatments.2–6

The history of EHRs dates to the 1970s, when they were originally designed to support health insurance and billing.1,7 A series of investigations suggested additional benefits in patient safety with the use of EHRs, which led to the National Academy of Medicine (then known as the Institute of Medicine) releasing a report in 1997 promoting the adoption of EHRs in the healthcare systems across the US.8 As continued evidence emerged for improved quality and efficiency of healthcare delivery with the EHRs,9 the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 incentivized EHR adoption and propelled the proliferation of EHR in healthcare systems.10,11 Legislation continues to prioritize the ease of electronic health information exchange, as evidenced by the 21st Century Cures Act of 2016, which addresses information blocking and mandates standards to further advance healthcare interoperability.12 These challenges and their continued mitigation at both a federal and individual level are discussed in detail later in this review.

In addition to the clinical role of EHRs, the National Academy of Medicine identified research as their key secondary role.8,13 EHRs present many opportunities to analyze, predict, and improve the care of patients with cardiovascular disorders and their outcomes. However, important scientific and technical considerations for optimizing EHR-based clinical research require careful and systematic data practices. This has prompted contemporary innovation in research strategy and data interoperability.14–17

This review discusses the data infrastructure of EHRs that can help research, the challenges and opportunities for cardiovascular care innovation through data-driven discoveries in the EHR, and contemporary innovation in data interoperability and federated research strategies.

DATA ASSETS AND THE EHR

Data Streams within the EHR

The EHR aggregates a series of discrete data sources spanning structured and unstructured elements.18,19 (Figure 1) Structured data exist within pre-defined fields of standardized data classes and elements, which allow for easy extraction and analysis.19 Common structured data sources within the EHR include demographic characteristics (e.g. age, sex, race, and ethnicity), laboratory measurements (e.g. complete blood counts, chemical laboratory tests, biomarkers, etc.), vital signs (e.g. body mass index, heart rate, blood pressure, etc.), problem lists, diagnoses, procedures, medications, allergies, and healthcare utilization information (e.g. cost, hospitalization duration), among others.1,20 These data sources have become relatively standardized through the Common Clinical Data Set (CCDS), which was developed by the Centers for Medicare & Medicaid Services (CMS) Meaningful Use Program in conjunction with the Office of the National Coordinator for Health Information Technology (ONC).20,21

Figure 1. Data Streams and Assets in the EHR

However, the majority of data contained within the EHR – approximately 80% – is unstructured.22–24 Sources of unstructured data in EHRs include free text in the form of clinical narratives, procedure documentation, and diagnostic testing reports, such as those for imaging and pathology.1,19,24,25 Picture Archiving and Communication Systems also store medical imaging and videographic data such as X-rays and echocardiograms.19,23,26 Unstructured data capture the rich, multi-faceted, and complex nature of disease and provide insight into patient-provider interactions; these factors may be inadequately represented if only structured data are used. For example, a patient with a myocardial infarction requiring percutaneous coronary intervention may have corresponding diagnoses and procedural codes available in the chart that specify the territory of myocardial infarction and the culprit coronary artery that required intervention. However, signs and symptoms at presentation, intraprocedural challenges, post-procedural hospital course, and relevant complications are aspects of the disease that are often not captured within structured data. While unstructured data may be more useful in clinical concept recall than structured data,27 the process of extracting and analyzing unstructured data can be time-consuming and expensive.1,19 The emergence of specialized, flexible applications of natural language processing (NLP), machine learning, and computer vision techniques enables health professionals and researchers to leverage these essential data sources for clinical care and research.4,19,28 Other forms of both structured and unstructured data streams, such as continuous physiologic monitoring (e.g. vital signs, telemetry), data from mechanical ventilators and cardiopulmonary bypass machines, and raw videographic footage from endoscopic or laparoscopic procedures, are not commonly stored due to very large file sizes and thus pose challenges in utilization and handling.23,29,30 However, advances in data storage (e.g. data lakes) and computational methods, both discussed later in greater detail, give researchers the potential to leverage these data more effectively.31

EHR-Adjacent Data Streams

Research using EHR is increasingly utilizing additional data streams such as genomics and molecular profiling, patient-generated health data (PGHD) obtained from wearable and nearable devices, patient-reported outcomes (PRO), and community and personal social determinants of health (SDOH).20,23,24

Several initiatives have pioneered the integration of EHR and genomics data, including the National Human Genome Research Institute-funded eMERGE (Electronic Medical Records and Genomics) Network,32–34 Vanderbilt University’s BioVU biobank,35 and Kaiser Permanente’s Research Program on Genes, Environment and Health biobank,36 with broad scientific guidance on merging genomics with EHRs.37 This enables a new era of discovery arising from EHRs paired with genomics. An example is phenome-wide association studies (PheWAS) that map phenotypic information extracted from EHRs to single nucleotide polymorphisms (SNPs), allowing researchers to uncover novel associations between genetic variants and a variety of disease states. For example, PheWAS have revealed genetic variants implicated in the association between QRS duration in patients without cardiac disease and subsequent arrhythmia development, as well as the genetic component underlying the association of obesity with various disease phenotypes.38,39

The growth of PGHD in recent years owes itself in large part to the increasing ubiquity of smartphones, wearable devices, remote monitoring devices, and mobile applications, which encompass a wide spectrum of functions including the monitoring of daily activity and fitness, heart rate and rhythm, and blood pressure.20,40,41 Furthermore, EHR integration of PGHD is expected to accelerate following the 2020 ONC mandate for health information technology services to comply with interoperability standards and to implement technologies supporting application programming interfaces (API) that can better interface with EHRs.25,42 Such integration will improve the monitoring and detection of patient risk factors through automated health messaging.43 Additionally, these data can guide clinicians in decision-making for optimal patient treatment.40 PGHD has revolutionized cardiovascular medicine via notable large-scale clinical trials that used wearables such as the Apple Heart Study, which enrolled over 400,000 ambulatory participants without atrial fibrillation and demonstrated an accurate irregular pulse notification algorithm utilizing the Apple Watch.44 However, evidence supporting the use of wearables in improving health outcomes remains limited, and further research is warranted to ensure they are safely implemented and clinically meaningful.45,46

Integrating PROs with the EHR can similarly improve patient-centered health research. PROs provide significant indicators of patient health status post-intervention and are often primary or secondary outcomes within cardiovascular clinical trials as they encapsulate patient perspectives on the consequences of intervention on quality of life and symptoms.47 The Patient-Reported Outcomes Measurement Information System (PROMIS), developed by the NIH, is a standardized series of PRO surveys that measure physical, mental, and social health.48 However, remote EHR-based collection of PROs presents several logistical and technical challenges, primarily in engaging patients to report their outcomes.49,50 A recent study implemented remote EHR-integrated data collection across seven orthopedic clinics and successfully administered standardized PROMIS measures across a heterogeneous patient population.49

Lastly, SDOH have long been recognized as important social, economic, psychosocial, and environmental factors mediating morbidity and mortality due to cardiovascular diseases.51,52 Despite the significance of these determinants, the lack of systematic, standardized methods limits their capture within EHRs.20,53,54 However, SDOH can better be captured into EHR through the development of systematic measurement of patients’ environment and interactions that influence their patient-provider relationship. For example, the IOM convened a committee of social scientists, clinicians, and informatics researchers who recommended 12 brief, standard SDOH screening measures in a panel that can be adopted through EHR.53 This panel would reduce barriers to implementation, increase interoperability through widespread adoption, and reduce the need for redundant capture. Such a concise panel of standard measures increases clinical awareness of a patient’s health status and allows for integrating public health and community resources in patient care. A readily adopted standard set of measures would hopefully motivate EHR vendors to incorporate this panel into their products, health systems to adopt its use, and clinicians to incorporate this integral information. A variety of community-level SDOH data and indices are available to the public, including the American Community Survey published by the US Census Bureau, the SDOH database published by the Agency for Healthcare Research and Quality, and the Neighborhood Atlas Area Deprivation Index.55,56 A study by Bhavsar and colleagues linked EHR and ACS data and found that lower neighborhood socioeconomic status was predictive of shorter time to use of emergency department visits and inpatient encounters, as well as shorter time to hospitalizations due to myocardial infarction and stroke.54

Strategies and Infrastructures to Leverage the Data

Leveraging the heterogeneous and complex data within EHR and EHR-adjacent data streams requires the development of novel infrastructures and strategies that can address data capture, maintenance, processing, and analysis challenges. The Coronavirus Disease 2019 (COVID-19) pandemic especially highlighted the need for platforms that could support rapidly updating datasets and agile analysis of real-time data to provide timely surveillance of incident cases to inform hospital operations.57,58

Traditional data storage structures, such as enterprise data warehouses used in the analysis and reporting of structured data,24 are not well-equipped to deal with heterogeneous data due to their structural inflexibility. Scalable data storage architectures, or data lakes, provide a centralized repository of data in its original form and enable flexible analytics through the concept of ‘late binding’, which enables researchers to perform customized analysis and bypass the rigid constraints of the schema in conventional data warehouses.31,59 Data lakes are often linked to open-source computational frameworks such as Hadoop and Apache Spark, along with a suite of tools for data governance, data discovery, and extraction.31,60–62 Other comprehensively integrated institutional data ecosystems like ‘data commons’ acquire and harmonize EHR data from multiple sources such that they can be readily used for research.57,61
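
The ‘late binding’ idea can be illustrated with a minimal sketch in which records are stored unchanged and a schema is applied only when they are read. The record layout, field names, and the `read_with_schema` helper are assumptions made for illustration and do not correspond to any particular data lake product.

```python
import json

# Hypothetical raw records kept in their original form (one JSON document per line).
RAW_LAKE = [
    '{"patient_id": "p1", "type": "vital", "name": "heart_rate", "value": 72}',
    '{"patient_id": "p1", "type": "note", "text": "Chest pain on exertion."}',
    '{"patient_id": "p2", "type": "vital", "name": "heart_rate", "value": "88"}',
]


def read_with_schema(raw_lines, record_type, schema):
    """Bind a schema at read time: filter by record type, then coerce each field."""
    rows = []
    for line in raw_lines:
        doc = json.loads(line)
        if doc.get("type") != record_type:
            continue
        rows.append({field: cast(doc[field]) for field, cast in schema.items() if field in doc})
    return rows


# The analyst, not the storage layer, decides which fields and types this analysis needs.
vitals_schema = {"patient_id": str, "name": str, "value": float}
vitals = read_with_schema(RAW_LAKE, "vital", vitals_schema)
```

A different analysis could read the same stored lines with a different schema (for example, only the free-text notes) without any change to how the data were written.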

DISCOVERY SCIENCE IN EHR: OPPORTUNITIES, CHALLENGES, AND SOLUTIONS

Research Opportunities

Computational research in the EHR presents many opportunities for improving patient care and advancing medical knowledge. First, EHR data can improve the efficiency and efficacy of healthcare practice through real-world outcome assessments, ranging from retrospective record review to time-varying clinical risk prediction.4,27,63–66 Second, techniques such as deep learning-based natural language processing (NLP) and computer vision can harness unstructured elements in the EHR for computational phenotyping of multifaceted and complex clinical disease scenarios, without resorting to abstractable structured elements alone.6,29,43,67,68 Third, data-driven investigations in the EHR can help achieve the mission of the Food and Drug Administration Amendments Act of 2007 for post-marketing surveillance of medications, improving the safety and efficacy of these drugs.69–72 Fourth, clinical trials can efficiently leverage the EHR to enrich screening using computable inclusion and exclusion criteria for potential trial subjects more likely to benefit from treatment.69–74 Fifth, novel machine learning modalities can be used to derive distinct clusters of patients with clinical conditions (sub-phenotyping) by using the rich variety of data available within the EHR.75,76 Such sub-phenotyping can be used to deliver precise therapy based on clinically relevant insights derived from other patients of the same phenotype.77,78 Finally, patient-led research presents revolutionary opportunities if vendors’ EHR data are exchanged via write-back APIs, allowing patients to be directly involved in data-driven research.79

Challenges and Solutions

While EHR represents a rich source of real-world data for evaluating clinical care, the data is often large, heterogenous, incomplete, and noisy.2,80 This presents multiple challenges, and requires new methodologic approaches, significant data validation, and careful interpretation of results.57

• Data Confidentiality

EHR data require rigorous handling practices due to their highly sensitive nature and are protected by institutional and governmental confidentiality guidelines, including the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule.81–84 Patient privacy can be protected by deidentifying and pseudonymizing data prior to analytical use, and by ensuring user authentication and auditing on devices and servers where the data are being used.19,85–87
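
As a rough illustration of pseudonymization, the sketch below replaces a medical record number with a keyed hash and drops a few direct identifiers. The field names and the `DIRECT_IDENTIFIERS` set are hypothetical, and the sketch is far from a complete deidentification procedure (the HIPAA Safe Harbor method, for instance, covers 18 identifier types); it only shows the basic mechanism.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers to drop; a real list would be much longer.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone"}


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and attach a pseudonym derived from the MRN."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudo_id"] = pseudonymize(record["mrn"], secret_key)
    return cleaned


record = {"name": "Jane Doe", "mrn": "12345", "phone": "555-0100", "ldl_mg_dl": 130}
safe_record = deidentify(record, secret_key=b"example-key-kept-outside-the-dataset")
```

Because the hash is keyed, the same patient maps to the same pseudonym across extracts, while someone without the key cannot recompute the mapping from a list of known record numbers.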

• Data Accuracy

EHR data accuracy can be compromised by errors and biases in manually recorded data due to physician overload and limited time for patient care, in addition to inter-institutional variations in encounter information practices.88–90 Accuracy tends to be higher for diagnosis and procedure codes that are linked to reimbursement through billing information.5 Inconsistencies between EHR data and manual medical record abstraction can produce disparate results in performance analyses.91

When building reliable cohorts for data analysis, information should be triangulated based on multiple clinical domains such as diagnosis codes, imaging criteria, procedural intervention, medications administered, and clinical note documentation.19,75 For example, to create a cohort of patients with hypertrophic cardiomyopathy (HCM), the use of diagnosis codes alone may not be accurate inclusion criteria. A more accurate cohort could be developed by including patients with diagnosis codes for HCM who are also undergoing treatment in a specialty HCM clinic, have hypertrophic features on their echocardiogram, and have documented symptoms and clinical manifestations of HCM in their notes. Regular data quality reporting standards can be institutionally implemented to ensure higher accuracy of EHR coding.92 Streamlined and physician-friendly EHR designs can also improve the accuracy of data capture.
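
The HCM example can be sketched as a simple rule that requires agreement across several data domains. The `PatientEvidence` fields and the `min_supporting` threshold are illustrative assumptions, not a validated phenotype definition.

```python
from dataclasses import dataclass


@dataclass
class PatientEvidence:
    """Hypothetical per-patient flags, one per clinical data domain."""
    patient_id: str
    has_hcm_dx_code: bool
    seen_in_hcm_clinic: bool
    echo_shows_hypertrophy: bool
    notes_mention_hcm: bool


def in_triangulated_cohort(p: PatientEvidence, min_supporting: int = 3) -> bool:
    """Include a patient only if the diagnosis code is backed by other domains."""
    supporting = sum([p.seen_in_hcm_clinic, p.echo_shows_hypertrophy, p.notes_mention_hcm])
    return p.has_hcm_dx_code and supporting >= min_supporting


patients = [
    PatientEvidence("p1", True, True, True, True),     # code plus all supporting domains
    PatientEvidence("p2", True, False, False, False),  # code only
    PatientEvidence("p3", False, True, True, True),    # supporting evidence, no code
]
cohort = [p.patient_id for p in patients if in_triangulated_cohort(p)]
```

Lowering `min_supporting` trades specificity for sensitivity, a choice that would normally be checked against manual chart review.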

• Data Completeness

As patients seek care at multiple sites, their health information is represented across multiple health systems. Clinical endpoints in studies cannot be reliably assessed unless all relevant patient encounters can be identified.93 Moreover, the inability to capture clinical characteristics across these disparate EHR systems and data sources poses the risk of misclassifying patients. Randomly missing information can be ignored, but if the absence of captured information itself represents a clinical signal, it can be informative.94,95

For longitudinal outcome assessments, only patients who consistently seek care at the same health system should be included to ensure completeness of follow-up.96 Developing methodological standards to identify informative missingness in the EHR can improve predictive models for clinical disease outcomes.94,95 For example, whether or not the physician ordered a laboratory measurement is often clinically relevant in determining if a patient’s prognosis is favorable, irrespective of the measured value.95 Regular and frequent data quality measurements across the linked data sources can reduce the occurrence of randomly missing data.97,98
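
One common way to let a model use informative missingness is to add an indicator for whether a test was ordered alongside the measured value. The sketch below assumes a simple row-per-patient layout with hypothetical laboratory names; it is one possible encoding, not the specific method of the cited studies.

```python
def add_missingness_indicators(rows, lab_names):
    """For each lab, emit the raw value plus a 0/1 flag for whether it was recorded."""
    featurized = []
    for row in rows:
        features = {}
        for lab in lab_names:
            value = row.get(lab)  # None when the lab was never ordered or recorded
            features[f"{lab}_ordered"] = 0 if value is None else 1
            features[lab] = value
        featurized.append(features)
    return featurized


# Hypothetical encounters: the second patient never had a lactate drawn.
encounters = [
    {"troponin": 0.02, "lactate": 1.1},
    {"troponin": 0.40},
]
features = add_missingness_indicators(encounters, ["troponin", "lactate"])
```

The `*_ordered` columns carry the ordering decision itself, so a downstream model can learn from it even when the value is absent.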

• Data Heterogeneity

Institutional heterogeneity across multiple sites hinders the conduct of multi-site data-driven research and serves as a barrier to methodological reproducibility and consistency.99,100 Emerging solutions to tackle data heterogeneity and promote data interoperability include harmonizing data to standard Detailed Clinical Models (DCMs) and mapping the databases to Common Data Models (CDMs). These are discussed further in the sections below.

• Data Capture

The challenges related to data capture in EHRs arise due to the large amounts of unstructured data representing clinical information, which are difficult to extract.23,24

Methodological innovation within the realm of unstructured data, in the form of NLP on clinical notes and computer vision on medical images, can improve the amount of data captured during research.28,101,102 NLP, a branch of computer science that combines computational linguistics with AI, can convert vast swaths of natural language data, such as unstructured free text contained within clinical notes, to structured, machine-interpretable data through sentence-splitting, tokenization, lemmatization, stemming, and terminology recognition via data dictionaries.16
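
The preprocessing steps named above can be sketched with the standard library alone. The term dictionary and the crude suffix-stripping stemmer are assumptions for illustration; lemmatization is omitted because it requires a lexicon, and negation ("denies dyspnea") is deliberately not handled, so the sketch will over-report concepts.

```python
import re

# Hypothetical mini-dictionary mapping normalized terms to concept labels.
TERM_DICTIONARY = {
    "chest pain": "CHEST_PAIN",
    "dyspnea": "DYSPNEA",
    "palpitation": "PALPITATIONS",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(token):
    """Very crude stemmer: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def extract_concepts(text):
    """Return dictionary concepts found in each sentence of a note."""
    found = []
    for sentence in split_sentences(text):
        normalized = " ".join(stem(t) for t in tokenize(sentence))
        for term, concept in TERM_DICTIONARY.items():
            if re.search(rf"\b{re.escape(term)}\b", normalized):
                found.append({"sentence": sentence, "concept": concept})
    return found


note = "Patient reports chest pain and palpitations. Denies dyspnea."
concepts = extract_concepts(note)
```

The output is a list of structured rows (sentence, concept) that could be loaded into a table, which is the sense in which free text becomes machine-interpretable.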

Additionally, deep learning models that utilize structured and unstructured data performed well in predictions pertaining to key hospital metrics, including in-hospital mortality, 30-day hospital re-admission, and prolonged length of stay.103,104 NLP can also be utilized to quantify stigmatizing language in provider notes, providing valuable insight into the impact of such language on patient care.105 One such study found that stigmatizing language was more often used to describe non-Hispanic Black patients in hospital notes for treatment of patients with diabetes.106

• Information Blocking

The intentional or unintentional obstruction of timely health information exchange poses a major barrier to the advancement of interoperability within the contemporary healthcare landscape, carrying downstream implications for healthcare delivery and clinical research. Informed by a substantial body of anecdotal evidence, the Office of the National Coordinator for Health Information Technology (ONC) released a report to Congress in 2015, proposing several targeted actions to mitigate this practice.107 This eventually led to the 21st Century Cures Act of 2016, as discussed earlier, which included provisions prohibiting information blocking, followed by the Cures Act Final Rule of 2020, which implemented these provisions and outlined eight exceptions to the rule.

Although information blocking remains prevalent,108 efforts are underway to further mitigate this issue. These include implementing concrete enforcement guidelines by the Department of Health and Human Services (HHS), which may result in penalties spanning from disincentives for healthcare providers to civil monetary penalties for health IT developers and health information exchanges and networks. Currently, any party can anonymously submit information blocking complaints through the information blocking portal,109 which may lead to a formalized investigation by the Office of Inspector General (OIG) or the ONC. Moreover, federal initiatives that incentivize and reward interoperability, such as the Medicare Promoting Interoperability Program, may also ameliorate information blocking.110

• Physician Burnout

Although EHRs were intended to help clinicians improve productivity, efficiency, and work-life balance,111 they have also significantly contributed to clinician burnout.112–116 Documentation tasks, inbox workload, after-hours charting, and poor EHR design and workflow are commonly cited factors contributing to this issue, among many others.115 EHR-associated clinician burnout may thus lead to healthcare providers’ resistance to accepting ever-evolving EHR systems and limit the utilization of innovative EHR-based research modalities.

EHR-associated burnout is increasingly recognized as a major issue and has received widespread attention from professional medical societies (e.g. American College of Physicians, American Medical Informatics Association),117 standards-development organization Health Level Seven (HL7),118 as well as federal bodies including the ONC.119 Although the process of redesigning EHRs to be more clinician-friendly can be a costly and logistically challenging endeavor, continuing EHR improvements are feasible and have been successfully implemented at a multi-institutional level.117,120 Additionally, numerous major EHR vendors have spearheaded efforts focused on improving the clinician-EHR experience, ranging from Allscripts’ Human-Centered Design initiative,121 to Epic and Cerner’s continued development of artificial intelligence (AI) solutions such as smart voice assistants.122 The continued refinement of AI technologies, advancements in data storage and data models supporting interoperability, and evolution of EHR-based research methodology are poised to further enhance the clinician-EHR experience.

• Implementing AI Models in the EHR

Some limitations of AI-augmented EHR-based research stem from the fact that input data are generated within a non-stationary environment in which patient populations and clinical care practices shift over time. In addition, introducing and implementing novel prediction algorithms could itself cause a change in practice, which in turn influences the training data for algorithms. Therefore, developing methods to identify drift and update models in response to performance deterioration is critical. Careful performance quantification and evaluation with periodic retraining could potentially mitigate these challenges. Data-driven testing procedures have the potential to appropriately update models to maintain a similar level of performance over time.123
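
A very simple form of performance monitoring is sketched below: rolling accuracy over the most recent predictions is compared with a baseline, and retraining is flagged when it falls too far. The class name, window size, and tolerance are assumptions; this threshold rule is a stand-in for illustration and is not the data-driven testing procedure cited above.

```python
from collections import deque


class DriftMonitor:
    """Track rolling accuracy and flag deterioration relative to a baseline."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = incorrect

    def record(self, predicted, observed):
        """Store whether the latest prediction matched the observed outcome."""
        self.outcomes.append(1 if predicted == observed else 0)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None before any data arrive."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid reacting to a handful of cases."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance
```

In use, `record` would be called as outcomes become known, and `needs_retraining` checked on a schedule; in practice discrimination and calibration metrics would likely be tracked in addition to accuracy.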

Moreover, the economic impacts of AI-based research and developing technologies must be considered. The U.S. healthcare payment system continues to evolve to account for the added value of these services, as well as the constant support they require from technical experts. For example, as radiology AI algorithms progress, one Category III code for AI use in CT and two for non-elastographic quantitative analysis with ultrasound became active in 2022.124 The transition to value-based care may additionally lower some of the barriers to the adoption of AI tools in clinical care related to lack of reimbursement through fee-for-service paradigms. Additionally, AI models used in healthcare have enabled significant cost savings when utilized in diagnosis and treatment.125 Furthermore, the economics of AI can be improved through the incorporation of pruning, reduction in bias, explainability, and regulatory approval.

SCALING INNOVATION FROM THE EHR: DATA MODELS AND FEDERATED STRATEGIES

Data Interoperability – Standardized Vocabularies, Detailed Clinical Models (DCMs), and Common Data Models (CDMs)

Although operational interoperability is rare in EHR data infrastructures, recent policies and tools aim to enhance EHR access and exchange.126–129 Lack of interoperability is a major challenge because multiple types of EHR used across and within health systems produce disparate data infrastructure and coding terminologies, which lead to inconsistency between research studies. The United States Core Data for Interoperability (USCDI) is a standardized set of health data classes and elements intended to enable nationwide, interoperable EHR exchange.128 The use of standardized terminologies and ontologies, such as International Classification of Diseases (ICD),130 Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT),131 Current Procedural Terminology (CPT),132 RxNorm,133 and Logical Observation Identifiers Names and Codes (LOINC),134 for encoding of structured data elements in the EHR aids in accurate outcome comparisons across diverse health systems.86 Moreover, due to a lack of consistency in EHR vendor data storage, outcomes in patients seeking care at multiple sites cannot be tracked conveniently.20 Consistent data infrastructure for storing EHR data across different health systems or mapping the data to common data models can enable patient-level linking of databases, allowing better capture of information for each patient.135,136

Detailed Clinical Models (DCMs) represent an effort to harmonize clinical knowledge modeling and preserve computable meaning in EHR data during an exchange between heterogeneous data infrastructures.137–139 DCMs are small, standalone information models designed to express clinical concepts in a standardized and reusable manner, such that they can be independently maintained during the transformation of data across formats.138,140 The Clinical Informatics Modeling Initiative (CIMI) and Health Level Seven International’s (HL7) Fast Healthcare Interoperability Resources (FHIR) standard are examples of DCM initiatives that improve semantic interoperability in healthcare information systems.141,142
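
To make the idea of a small, reusable clinical model concrete, the sketch below builds a heart rate measurement shaped like a FHIR Observation resource, coded with LOINC and UCUM. The patient reference, timestamp, and the `summarize` helper are assumptions for illustration, and the resource has not been validated against a FHIR server or profile.

```python
import json

# A heart rate measurement shaped like a FHIR Observation resource.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [
            {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}
        ]
    },
    "subject": {"reference": "Patient/example"},  # hypothetical patient reference
    "effectiveDateTime": "2023-01-15T09:30:00Z",  # hypothetical timestamp
    "valueQuantity": {
        "value": 72,
        "unit": "beats/minute",
        "system": "http://unitsofmeasure.org",
        "code": "/min",
    },
}


def summarize(obs):
    """Pull out the coded concept and value that a receiving system would rely on."""
    coding = obs["code"]["coding"][0]
    quantity = obs["valueQuantity"]
    return (coding["system"], coding["code"], quantity["value"], quantity["code"])


# Round-trip through JSON, as would happen in an exchange between systems.
payload = json.dumps(observation)
summary = summarize(json.loads(payload))
```

Because the concept and unit travel as coded values rather than local labels, the receiving system can interpret the measurement without knowing how the sending EHR stored it.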

While models like CIMI and FHIR serve as a layer of standardization for clinical data elements, integration across broad data architectures and formats requires mapping of EHR data to common data models (CDMs).99,143 CDMs attempt to achieve true interoperability by defining the same tables, the same column names, and each element in a consistent way, so that a program created for one data stream in the EHR is immediately usable across all other EHR data streams in the same CDM format. Therefore, CDMs represent standard data schemas that can integrate data from multiple clinical areas and incorporate standard vocabularies. While various CDMs have been proposed, the Observational Medical Outcomes Partnership (OMOP) CDM and the National Patient-Centered Clinical Research Network (PCORnet) CDM are commonly used.143–145

The process of data harmonization and transforming raw source EHR data to a CDM is referred to as ‘Extract, Transform, Load’ (ETL).143,146,147 ETL involves extracting the EHR data from the source, transforming it by de-duplicating, linking, cleaning, conducting quality checks, and loading it into the target CDM database. While transforming data from the EHR system to the CDM, potentially large amounts of information can be misrepresented or lost in translation. Thus, systematic care must be taken to ensure data quality in terms of conformance, completeness, plausibility, and precision:148,149

  • Conformance

    Conformance refers to the representation of source EHR data against the formatting, relational, and computational definitions of the CDM. Syntactical conformance relates to the formatting and structure of datasets. This often involves matching data types or transforming the format of datasets to meet CDM standards, such as reshaping data from wide-to-long or vice versa. For example, in the case of mapping to the OMOP CDM, syntactical conformance means that the EHR data should be transformed to columns in the OMOP CDM tables. Semantic conformance measures how accurately the clinical values map to the corresponding concepts in the standard CDM vocabulary.86,150,151 This can be done via automated ontology mapping and rigorous manual review. For example, in the case of OMOP, semantic conformance refers to how accurately the diagnosis codes in the EHR (often present as ICD-10 codes) map to the standard concepts derived from the SNOMED CT vocabulary.86

  • Completeness

    Completeness refers to the frequencies of source EHR data attributes present in the CDM dataset. An adequate ETL process ensures minimal loss of EHR data in translation from the source data structure to the CDM.97,152

  • Plausibility

    Plausibility refers to the believability and accuracy of data values. Unlike conformance and completeness, which focus only on the structure and presence of values, plausibility focuses on the actual values as representations of real-world source data. To evaluate the plausibility of the mapped data, different data fields can be checked for erroneous values that are physiologically implausible by performing quality assessments on the mapped data.148,153 Assessing the distributions for numerical variables and the frequencies for categorical variables can help screen for such implausible mapping. This necessitates edits to the ETL process design, often aided by graphical and dynamic tools,144,154 with inputs from clinical experts guiding the mapping process.

  • Precision

    Due to differences in granularity between EHR data coding systems and the standard vocabularies, data transformations can combine lists of multiple elements from the source data into a manageable aggregated category. While this may be convenient and essential for mapping, the aggregation should be guided by clinical and methodologic expertise to preserve the necessary precision for assessing study variables.153 Moreover, many CDMs, including the OMOP CDM, ensure that the EHR source values are preserved in the transformed tables in different columns. Thus, research investigators who need to modify the mapping using different precision and granularity criteria can perform these processes without re-extracting the EHR data.146

Close collaboration between computational and clinical experts may be necessary to define the logic of the ETL design and the debugging process after thorough data quality assessments of the mapped datasets. Multiple graphical and dynamic tools can aid in the design and modification of the ETL approach.144,154
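The four quality dimensions described above can be illustrated with a small set of checks. The following is a minimal sketch rather than a prescribed implementation: the rows only loosely resemble an OMOP-style measurement table, and the field names, the concept identifier 1001, the expected types, and the plausibility range of 40 to 300 are all hypothetical choices made for this example. A concept identifier of 0 is used to mark an unmapped value.

```python
from datetime import date

# Hypothetical mapped rows; field names and values are illustrative assumptions.
MAPPED_ROWS = [
    {"person_id": 1, "measurement_date": date(2022, 1, 5), "concept_id": 1001,
     "value_as_number": 128.0, "source_value": "SBP"},
    {"person_id": 2, "measurement_date": date(2022, 1, 6), "concept_id": 1001,
     "value_as_number": 1280.0, "source_value": "SBP"},   # likely a unit or entry error
    {"person_id": 3, "measurement_date": "2022-01-07", "concept_id": 0,
     "value_as_number": None, "source_value": "sys bp"},  # unmapped, date left as text
]

# Assumed target types for the syntactic conformance check.
EXPECTED_TYPES = {"person_id": int, "measurement_date": date, "concept_id": int}


def conformance_errors(rows):
    """Syntactic conformance: (row index, field) pairs whose type is not the expected one."""
    return [(i, field) for i, row in enumerate(rows)
            for field, expected in EXPECTED_TYPES.items()
            if not isinstance(row.get(field), expected)]


def completeness(rows, field):
    """Completeness: fraction of rows with a non-missing value for the field."""
    return sum(row.get(field) is not None for row in rows) / len(rows)


def mapped_fraction(rows):
    """Semantic conformance proxy: fraction of rows mapped to a concept (concept_id != 0)."""
    return sum(row["concept_id"] != 0 for row in rows) / len(rows)


def implausible(rows, low, high):
    """Plausibility: indices of rows whose numeric value lies outside [low, high]."""
    return [i for i, row in enumerate(rows)
            if row["value_as_number"] is not None
            and not low <= row["value_as_number"] <= high]


def remap_from_source(rows, source_to_concept):
    """Precision: revise the mapping from preserved source values, without re-extraction."""
    return [{**row, "concept_id": source_to_concept.get(row["source_value"], row["concept_id"])}
            for row in rows]


if __name__ == "__main__":
    print("conformance errors:", conformance_errors(MAPPED_ROWS))
    print("value completeness:", completeness(MAPPED_ROWS, "value_as_number"))
    print("implausible rows:", implausible(MAPPED_ROWS, 40, 300))
    revised = remap_from_source(MAPPED_ROWS, {"sys bp": 1001})
    print("mapped after revision:", mapped_fraction(revised))
```

Because the source value is kept alongside the mapped concept, the last step can change the granularity of the mapping on the transformed table itself, which mirrors the point made under Precision.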

Federated Strategies for Multi-institutional Research

Interoperable CDM databases enable research studies to be conducted in a large-scale, multi-center network setting. EHR data from multiple health systems are mapped to a common data model, so that the same analytical strategies can be deployed at every site without sharing patient-level data across institutions. This enables multi-center studies while ensuring data confidentiality and patient privacy.87 Federated strategies make multi-center studies feasible, ensure consistency in the implementation of methods, and improve the reproducibility of results (Figure 2).155
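The federated pattern described above can be sketched as two functions: one that runs inside each site and returns only aggregate statistics, and one that runs at the coordinating center and combines them. This is an illustration under stated assumptions, not the method of any study cited here. The site values are made-up numbers, and fixed-effect inverse-variance pooling is used only as one common way of combining site-level estimates.

```python
import math
from statistics import mean, variance


def site_summary(values):
    """Runs inside a site: returns aggregate statistics only, never patient-level rows."""
    n = len(values)
    return {"n": n, "mean": mean(values), "se": math.sqrt(variance(values) / n)}


def pool_fixed_effect(summaries):
    """Runs at the coordinating center: inverse-variance weighted pooled mean and its SE."""
    weights = [1 / s["se"] ** 2 for s in summaries]
    pooled = sum(w * s["mean"] for w, s in zip(weights, summaries)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical patient-level values that stay within each site.
SITE_A = [120, 130, 140, 150]
SITE_B = [110, 120, 130]

if __name__ == "__main__":
    summaries = [site_summary(SITE_A), site_summary(SITE_B)]  # only these leave the sites
    pooled_mean, pooled_se = pool_fixed_effect(summaries)
    print(f"pooled mean = {pooled_mean:.2f}, pooled SE = {pooled_se:.2f}")
```

Because every site holds data in the same CDM, the same `site_summary` code can run unchanged at each institution, which is what makes the implementation consistent across the network.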

Figure 2. Scaling Innovation in EHR Research with Federated Network Studies

These strategies are being adopted widely for large-scale pharmacoepidemiologic and comparative assessments of medical therapies. The ‘Large-scale Evidence Generation and Evaluation across a Network of Databases (LEGEND) – Hypertension’ and ‘LEGEND – Type 2 Diabetes Mellitus (T2DM)’ studies are evaluating utilization trends and the comparative effectiveness and safety of medications for hypertension and diabetes, respectively.155–159 Figure 3 represents the specific study approach of LEGEND-T2DM and its participating health system databases.158 Similarly, a large data repository comprising EHR data from patients across 8 diverse healthcare systems, The Guideline Advantage™ (TGA) repository, has been used to evaluate whether cardiovascular health trends across large populations are progressing towards the American Heart Association’s (AHA) Impact Goals.160 Such federated strategies were also adopted during the COVID-19 pandemic to create a large repository of deidentified patient information, the National COVID Cohort Collaborative (N3C), which could be used by clinical researchers in the open-source community.151,161

Figure 3. The ‘Large-scale Evidence Generation and Evaluation across a Network of Databases for Type 2 Diabetes Mellitus’ (LEGEND-T2DM) Study

All data sources participating in such federated studies need IRB approval to contribute to a multi-site study. While federated studies enable the collaborative deployment of analytical methods without exchanging patient-level information, there are specific regulatory considerations for these studies. The IRBs need to ensure investigator compliance with the proposed research protocol. Automating IRB monitoring of the data resources that are extracted and accessed can improve compliance with the regulations for human subject protection and increase operational efficiency while minimizing the logistical burden on the IRB.162 However, although sharing analysis code is more secure than sharing patient-level data, some risk of data leakage remains in federated research.163 Malicious actors can perform property or membership inference attacks, which can reveal limited information about the patients.163,164 Such attacks can be mitigated by regulatory interventions, such as maintaining stringent data access controls across participating institutions and reviewing code before deployment. Methodological interventions, including differential privacy, the introduction of stochastic noise, and intentional rounding of values in the data, can also be used across sites.165,166
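Two of the methodological safeguards named above, stochastic noise and intentional rounding, can be sketched for a single released count. This is a minimal illustration under stated assumptions, not a vetted privacy implementation. It uses the Laplace mechanism, a standard way of achieving differential privacy for a counting query whose sensitivity is 1; the privacy parameter `epsilon`, the rounding base of 5, and the example count are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential variates."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1): noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1 / epsilon, rng)


def round_to(value, base=5):
    """Intentional rounding: coarsen a released value to the nearest multiple of base."""
    return base * round(value / base)


if __name__ == "__main__":
    rng = random.Random(2023)  # seeded only so the illustration is repeatable
    true_count = 137           # hypothetical number of patients matching a query at one site
    noisy = private_count(true_count, epsilon=0.5, rng=rng)
    print("released value:", round_to(noisy))
```

A smaller `epsilon` adds more noise and gives stronger protection at the cost of accuracy. Rounding an already-noised value is post-processing, so it does not weaken the guarantee provided by the noise.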

Beyond these conventional federated strategies, novel approaches such as swarm learning are being developed for decentralized machine learning on peer-to-peer networking technology and optimization of confidential study designs.87,167 While the disease classifier models developed using this approach have been shown to outperform models built at individual sites, adoption of this approach in clinical research remains limited.168
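The core idea of decentralized learning, in which peers exchange model parameters rather than data, can be sketched with a one-parameter model. This is a deliberately simplified illustration and not the swarm learning framework itself: it omits the peer-to-peer networking and coordination layer entirely, and the model, learning rate, and peer datasets are hypothetical.

```python
def local_step(w, data, lr=0.1):
    """One gradient-descent step for the model y = w * x on a peer's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def merge(weights, sizes):
    """Combine peer parameters by sample-size-weighted averaging; no data are exchanged."""
    return sum(w * n for w, n in zip(weights, sizes)) / sum(sizes)


def train(peers, rounds=50):
    """Alternate local updates and parameter merging for a fixed number of rounds."""
    w = 0.0
    for _ in range(rounds):
        local_weights = [local_step(w, data) for data in peers]
        w = merge(local_weights, [len(data) for data in peers])
    return w


# Hypothetical private datasets held by two peers, both generated from y = 2 * x.
PEERS = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]

if __name__ == "__main__":
    print("learned parameter:", train(PEERS))
```

In this sketch any peer could run `merge`, since it needs only the parameters and sample sizes, which is the property that removes the need for a fixed central server.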

CONCLUSIONS

The EHR provides a valuable source of structured and unstructured clinical data for research that can predict cardiovascular clinical outcomes, enhance cardiology practice and diagnosis, address healthcare disparities, and advance precision medicine. Several challenges currently exist in utilizing the EHR for data-driven research; however, recent innovations in data storage, synthesis, and analytics can ameliorate them. Moreover, emerging interoperable data infrastructure, enhanced regulation of data exchange, and patient-integrated research can enable large-scale, multi-center evidence generation with improved methodological consistency. With continued development and innovation, EHR-based research can bring novel technology to patient care and improve health outcomes.

ACKNOWLEDGEMENTS

Dr. Khera is an Associate Editor of JAMA. He receives support from the National Heart, Lung, and Blood Institute of the National Institutes of Health (under award K23HL153775) and the Doris Duke Charitable Foundation (under award 2022060). He also receives research support, through Yale, from Bristol-Myers Squibb and Novo Nordisk. He is a coinventor of U.S. Provisional Patent Applications 63/177,117, 63/428,569, 63/346,610, and 63/484,426. He is a co-founder of Evidence2Health, a precision health platform to improve evidence-based cardiovascular care. The other authors do not report any disclosures.

Footnotes

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

  • 1. Kim E, Rubinstein SM, Nead KT, Wojcieszynski AP, Gabriel PE, Warner JL. The Evolving Use of Electronic Health Records (EHR) for Research. Semin Radiat Oncol 2019;29:354–361. Available at: 10.1016/j.semradonc.2019.05.010.
  • 2. Cowie MR, Blomster JI, Curtis LH, Duclaux S, Ford I, Fritz F, Goldman S, Janmohamed S, Kreuzer J, Leenay M, Michel A, Ong S, Pell JP, Southworth MR, Stough WG, Thoenes M, Zannad F, Zalewski A. Electronic health records to facilitate clinical research. Clin Res Cardiol 2017;106:1–9. Available at: 10.1007/s00392-016-1025-6.
  • 3. Pfaff E, Lee A, Bradford R, Pae J, Potter C, Blue P, Knoepp P, Thompson K, Roumie CL, Crenshaw D, Servis R, DeWalt DA. Recruiting for a pragmatic trial using the electronic health record and patient portal: successes and lessons learned. J Am Med Inform Assoc 2019;26:44–49. Available at: https://academic.oup.com/jamia/article/26/1/44/5185594. Accessed January 11, 2023.
  • 4. Wu J, Roy J, Stewart WF. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med Care 2010;48:S106–13. Available at: 10.1097/MLR.0b013e3181de9e17.
  • 5. Yim W-W, Wheeler AJ, Curtin C, Wagner TH, Hernandez-Boussard T. Secondary use of electronic medical records for clinical research: Challenges and Opportunities. Converg Sci Phys Oncol 2018;4. Available at: 10.1088/2057-1739/aaa905.
  • 6. Khera R. Digital Cardiovascular Epidemiology-Ushering in a New Era Through Computational Phenotyping of Cardiovascular Disease. JAMA Netw Open 2021;4:e2135561. Available at: 10.1001/jamanetworkopen.2021.35561.
  • 7. Gillum RF. From papyrus to the electronic tablet: a brief history of the clinical medical record with lessons for the digital age. Am J Med 2013;126:853–857. Available at: https://www.amjmed.com/article/S0002-9343(13)00398-7/fulltext. Accessed October 22, 2022.
  • 8. Institute of Medicine (US) Committee on Improving the Patient Record, Dick RS, Steen EB, Detmer DE. The computer-based patient record. Washington, D.C.: National Academies Press; 1997. Available at: https://www.ncbi.nlm.nih.gov/books/NBK233047/. Accessed October 21, 2022.
  • 9. Blumenthal D. Stimulating the adoption of health information technology. N Engl J Med 2009;360:1477–1479. Available at: 10.1056/NEJMp0901592.
  • 10. Adler-Milstein J, Jha AK. HITECH Act Drove Large Gains In Hospital Electronic Health Record Adoption. Health Aff 2017;36:1416–1422. Available at: 10.1377/hlthaff.2016.1651.
  • 11. Berner ES, Detmer DE, Simborg D. Will the wave finally break? A brief view of the adoption of electronic medical records in the United States. J Am Med Inform Assoc 2005;12:3–7. Available at: 10.1197/jamia.M1664.
  • 12. Gabay M. 21st century cures act. Hosp Pharm 2017;52:264–265. Available at: 10.1310/hpj5204-264.
  • 13. Institute of Medicine (US) Committee on Data Standards for Patient Safety. KEY CAPABILITIES OF AN ELECTRONIC HEALTH RECORD SYSTEM Letter Report. National Academies Press (US); 2003. Available at: https://www.ncbi.nlm.nih.gov/books/NBK221800/. Accessed October 21, 2022.
  • 14. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform 2018;22:1589–1604. Available at: https://ieeexplore.ieee.org/abstract/document/8086133/.
  • 15. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary Use of EHR: Data Quality Issues and Informatics Opportunities. Summit Transl Bioinform 2010;2010:1–5. Available at: https://www.ncbi.nlm.nih.gov/pubmed/21347133.
  • 16. Juhn Y, Liu H. Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. J Allergy Clin Immunol 2020;145:463–469. Available at: 10.1016/j.jaci.2019.12.897.
  • 17. Kohane IS, Aronow BJ, Avillach P, Beaulieu-Jones BK, Bellazzi R, Bradford RL, Brat GA, Cannataro M, Cimino JJ, García-Barrio N, Gehlenborg N, Ghassemi M, Gutiérrez-Sacristán A, Hanauer DA, Holmes JH, Hong C, Klann JG, Loh NHW, Luo Y, Mandl KD, Daniar M, Moore JH, Murphy SN, Neuraz A, Ngiam KY, Omenn GS, Palmer N, Patel LP, Pedrera-Jiménez M, Sliz P, South AM, Tan ALM, Taylor DM, Taylor BW, Torti C, Vallejos AK, Wagholikar KB, Consortium For Clinical Characterization Of COVID-19 By EHR (4CE), Weber GM, Cai T. What every reader should know about studies using electronic health record data but may be afraid to ask. J Med Internet Res 2021;23:e22219. Available at: https://www.jmir.org/2021/3/e22219/. Accessed January 11, 2023.
  • 18. Hemingway H, Asselbergs FW, Danesh J, Dobson R, Maniadakis N, Maggioni A, Thiel van GJM, Cronin M, Brobert G, Vardas P, Anker SD, Grobbee DE, Denaxas S, Innovative Medicines Initiative 2nd programme, Big Data for Better Outcomes, BigData@Heart Consortium of 20 academic and industry partners including ESC. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. Eur Heart J 2018;39:1481–1495. Available at: 10.1093/eurheartj/ehx487.
  • 19. Tayefi M, Ngo P, Chomutare T, Dalianis H, Salvi E, Budrionis A, Godtliebsen F. Challenges and opportunities beyond structured data in analysis of electronic health records. Wiley Interdiscip Rev Comput Stat 2021;13. Available at: 10.1002/wics.1549.
  • 20. Ehrenstein V, Kharrazi H, Lehmann H, Taylor CO. Obtaining Data From Electronic Health Records. Agency for Healthcare Research and Quality (US); 2019. Available at: https://www.ncbi.nlm.nih.gov/books/NBK551878/. Accessed December 24, 2022.
  • 21. Henricks WH. “Meaningful use” of electronic health records and its relevance to laboratories and pathologists. J Pathol Inform 2011;2:7. Available at: 10.4103/2153-3539.76733.
  • 22. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA 2013;309:1351–1352. Available at: 10.1001/jama.2013.393.
  • 23. Kong H-J. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res 2019;25:1–2. Available at: 10.4258/hir.2019.25.1.1.
  • 24. Zhang J, Symons J, Agapow P, Teo JT, Paxton CA, Abdi J, Mattie H, Davie C, Torres AZ, Folarin A, Sood H, Celi LA, Halamka J, Eapen S, Budhdeo S. Best practices in the real-world data life cycle. PLOS Digital Health 2022;1:e0000003. Available at: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000003&type=printable. Accessed December 22, 2022.
  • 25. Lin AL, Chen WC, Hong JC. Chapter 8 - Electronic health record data mining for artificial intelligence healthcare. In: Xing L, Giger ML, Min JK, eds. Artificial Intelligence in Medicine. Academic Press; 2021:133–150. Available at: https://www.sciencedirect.com/science/article/pii/B9780128212592000089.
  • 26. Geeslin MG, Gaskin CM. Electronic Health Record-Driven Workflow for Diagnostic Radiologists. J Am Coll Radiol 2016;13:45–53. Available at: 10.1016/j.jacr.2015.08.008.
  • 27. Hernandez-Boussard T, Monda KL, Crespo BC, Riskin D. Real world evidence in cardiovascular medicine: ensuring data validity in electronic health record-based studies. J Am Med Inform Assoc 2019;26:1189–1194. Available at: 10.1093/jamia/ocz119.
  • 28. Li I, Pan J, Goldwasser J, Verma N, Wong WP, Nuzumlalı MY, Rosand B, Li Y, Zhang M, Chang D, Taylor RA, Krumholz HM, Radev D. Neural Natural Language Processing for unstructured data in electronic health records: A review. Computer Science Review 2022;46:100511. Available at: https://www.sciencedirect.com/science/article/pii/S1574013722000454.
  • 29. Mori M, Khera R, Lin Z, Ross JS, Schulz W, Krumholz HM. The Promise of Big Data and Digital Solutions in Building a Cardiovascular Learning System: Opportunities and Barriers. Methodist Debakey Cardiovasc J 2020;16:212–219. Available at: 10.14797/mdcj-16-3-212.
  • 30. Mori M, Schulz WL, Geirsson A, Krumholz HM. Tapping Into Underutilized Healthcare Data in Clinical Research. Ann Surg 2019;270:227–229. Available at: 10.1097/SLA.0000000000003329.
  • 31. Giebler C, Gröger C, Hoos E, Schwarz H, Mitschang B. Leveraging the Data Lake: Current State and Challenges. In: Big Data Analytics and Knowledge Discovery. Springer International Publishing; 2019:179–188. Available at: 10.1007/978-3-030-27520-4_13.
  • 32. Kho AN, Pacheco JA, Peissig PL, Rasmussen L, Newton KM, Weston N, Crane PK, Pathak J, Chute CG, Bielinski SJ, Kullo IJ, Li R, Manolio TA, Chisholm RL, Denny JC. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 2011;3:79re1. Available at: 10.1126/scitranslmed.3001807.
  • 33. Pathak J, Wang J, Kashyap S, Basford M, Li R, Masys DR, Chute CG. Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. J Am Med Inform Assoc 2011;18:376–386. Available at: 10.1136/amiajnl-2010-000061.
  • 34. Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, Sanderson SC, Kannry J, Zinberg R, Basford MA, Brilliant M, Carey DJ, Chisholm RL, Chute CG, Connolly JJ, Crosslin D, Denny JC, Gallego CJ, Haines JL, Hakonarson H, Harley J, Jarvik GP, Kohane I, Kullo IJ, Larson EB, McCarty C, Ritchie MD, Roden DM, Smith ME, Böttinger EP, Williams MS, eMERGE Network. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med 2013;15:761–771. Available at: 10.1038/gim.2013.72.
  • 35. Roden DM, Pulley JM, Basford MA, Bernard GR, Clayton EW, Balser JR, Masys DR. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin Pharmacol Ther 2008;84:362–369. Available at: 10.1038/clpt.2008.89.
  • 36. Schaefer C, the RPGEH GO Project Collaboration. C-A3–04: The Kaiser Permanente Research Program on Genes, Environment and Health: A Resource for Genetic Epidemiology in Adult Health and Aging. Clin Med Res 2011;9:177. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3251456/. Accessed January 4, 2023.
  • 37. Hall JL, Ryan JJ, Bray BE, Brown C, Lanfear D, Newby LK, Relling MV, Risch NJ, Roden DM, Shaw SY, Tcheng JE, Tenenbaum J, Wang TN, Weintraub WS. Merging Electronic Health Record Data and Genomics for Cardiovascular Research. Circ Cardiovasc Genet 2016;9:193–202. Available at: 10.1161/HCG.0000000000000029.
  • 38. Robinson JR, Carroll RJ, Bastarache L, Chen Q, Pirruccello J, Mou Z, Wei W-Q, Connolly J, Mentch F, Crane PK, Hebbring SJ, Crosslin DR, Gordon AS, Rosenthal EA, Stanaway IB, Hayes MG, Wei W, Petukhova L, Namjou-Khales B, Zhang G, Safarova MS, Walton NA, Still C, Bottinger EP, Loos RJF, Murphy SN, Jackson GP, Abumrad N, Kullo IJ, Jarvik GP, Larson EB, Weng C, Roden D, Khera AV, Denny JC. Quantifying the phenome-wide disease burden of obesity using electronic health records and genomics. Obesity 2022;30:2477–2488. Available at: 10.1002/oby.23561.
  • 39. Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) QRS Group, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 2013;127:1377–1385. Available at: 10.1161/CIRCULATIONAHA.112.000604.
  • 40. Pevnick JM, Birkeland K, Zimmer R, Elad Y, Kedan I. Wearable technology for cardiology: An update and framework for the future. Trends Cardiovasc Med 2018;28:144–150. Available at: 10.1016/j.tcm.2017.08.003.
  • 41. Dagher L, Shi H, Zhao Y, Marrouche NF. Wearables in cardiology: Here to stay. Heart Rhythm 2020;17:889–895. Available at: 10.1016/j.hrthm.2020.02.023.
  • 42. Tiase VL, Hull W, McFarland MM, Sward KA, Del Fiol G, Staes C, Weir C, Cummins MR. Patient-generated health data and electronic health record integration: a scoping review. JAMIA Open 2020;3:619–627. Available at: 10.1093/jamiaopen/ooaa052.
  • 43. Ross MK, Wei W, Ohno-Machado L. “Big data” and the electronic health record. Yearb Med Inform 2014;9:97–104. Available at: 10.15265/IY-2014-0003.
  • 44. Perez MV, Mahaffey KW, Hedlin H, Rumsfeld JS, Garcia A, Ferris T, Balasubramanian V, Russo AM, Rajmane A, Cheung L, Hung G, Lee J, Kowey P, Talati N, Nag D, Gummidipundi SE, Beatty A, Hills MT, Desai S, Granger CB, Desai M, Turakhia MP, Apple Heart Study Investigators. Large-Scale Assessment of a Smartwatch to Identify Atrial Fibrillation. N Engl J Med 2019;381:1909–1917. Available at: 10.1056/NEJMoa1901183.
  • 45. Mattison G, Canfell O, Forrester D, Dobbins C, Smith D, Töyräs J, Sullivan C. The Influence of Wearables on Health Care Outcomes in Chronic Disease: Systematic Review. J Med Internet Res 2022;24:e36690. Available at: 10.2196/36690.
  • 46. Burke LE, Ma J, Azar KMJ, Bennett GG, Peterson ED, Zheng Y, Riley W, Stephens J, Shah SH, Suffoletto B, Turan TN, Spring B, Steinberger J, Quinn CC, American Heart Association Publications Committee of the Council on Epidemiology and Prevention, Behavior Change Committee of the Council on Cardiometabolic Health, Council on Cardiovascular and Stroke Nursing, Council on Functional Genomics and Translational Biology, Council on Quality of Care and Outcomes Research, and Stroke Council. Current Science on Consumer Use of Mobile Health for Cardiovascular Disease Prevention: A Scientific Statement From the American Heart Association. Circulation 2015;132:1157–1213. Available at: 10.1161/CIR.0000000000000232.
  • 47. Masterson Creber R, Spadaccio C, Dimagli A, Myers A, Taylor B, Fremes S. Patient-reported outcomes in cardiovascular trials. Can J Cardiol 2021;37:1340–1352. Available at: 10.1016/j.cjca.2021.04.006.
  • 48. Bevans M, Ross A, Cella D. Patient-Reported Outcomes Measurement Information System (PROMIS): efficient, standardized tools to measure self-reported health and quality of life. Nurs Outlook 2014;62:339–345. Available at: 10.1016/j.outlook.2014.05.009.
  • 49. Horn ME, Reinke EK, Mather RC, O’Donnell JD, George SZ. Electronic health record-integrated approach for collection of patient-reported outcome measures: a retrospective evaluation. BMC Health Serv Res 2021;21:626. Available at: 10.1186/s12913-021-06626-7.
  • 50. Gensheimer SG, Wu AW, Snyder CF, PRO-EHR Users’ Guide Steering Group, PRO-EHR Users’ Guide Working Group. Oh, the places we’ll go: Patient-reported outcomes and electronic health records. Patient 2018;11:591–598. Available at: 10.1007/s40271-018-0321-9.
  • 51. White-Williams C, Rossi LP, Bittner VA, Driscoll A, Durant RW, Granger BB, Graven LJ, Kitko L, Newlin K, Shirey M, American Heart Association Council on Cardiovascular and Stroke Nursing; Council on Clinical Cardiology; and Council on Epidemiology and Prevention. Addressing Social Determinants of Health in the Care of Patients With Heart Failure: A Scientific Statement From the American Heart Association. Circulation 2020;141:e841–e863. Available at: 10.1161/CIR.0000000000000767.
  • 52. Powell-Wiley TM, Baumer Y, Baah FO, Baez AS, Farmer N, Mahlobo CT, Pita MA, Potharaju KA, Tamura K, Wallen GR. Social Determinants of Cardiovascular Disease. Circ Res 2022;130:782–799. Available at: 10.1161/CIRCRESAHA.121.319811.
  • 53. Adler NE, Stead WW. Patients in context--EHR capture of social and behavioral determinants of health. N Engl J Med 2015;372:698–701. Available at: 10.1056/NEJMp1413945.
  • 54. Bhavsar NA, Gao A, Phelan M, Pagidipati NJ, Goldstein BA. Value of Neighborhood Socioeconomic Status in Predicting Risk of Outcomes in Studies That Use Electronic Health Record Data. JAMA Netw Open 2018;1:e182716. Available at: 10.1001/jamanetworkopen.2018.2716.
  • 55. Anon. Social determinants of health (SDOH) and PLACES data. 2022. Available at: https://www.cdc.gov/places/social-determinants-of-health-and-places-data/index.html. Accessed January 11, 2023.
  • 56. Maroko AR, Doan TM, Arno PS, Hubel M, Yi S, Viola D. Integrating Social Determinants of Health With Treatment and Prevention: A New Tool to Assess Local Area Deprivation. Prev Chronic Dis 2016;13:E128. Available at: 10.5888/pcd13.160221.
  • 57. Schulz WL, Durant TJS, Torre CJ Jr, Hsiao AL, Krumholz HM. Agile health care analytics: Enabling real-time disease surveillance with a computational health platform. J Med Internet Res 2020;22:e18707. Available at: 10.2196/18707.
  • 58. Wolkewitz M, Puljak L. Methodological challenges of analysing COVID-19 data during the pandemic. BMC Med Res Methodol 2020;20:81. Available at: 10.1186/s12874-020-00972-6.
  • 59. Antman EM, Loscalzo J. Precision medicine in cardiology. Nat Rev Cardiol 2016;13:591–602. Available at: 10.1038/nrcardio.2016.101.
  • 60. Nargesian F, Zhu E, Miller RJ, Pu KQ, Arocena PC. Data lake management: challenges and opportunities. Proceedings VLDB Endowment 2019;12:1986–1989. Available at: 10.14778/3352063.3352116.
  • 61. Grossman RL. Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data. Trends Genet 2019;35:223–234. Available at: 10.1016/j.tig.2018.12.006.
  • 62. Terrizzano I, Schwarz P, Roth M, Colino JE. Data wrangling: The challenging journey from the wild to the lake. 2015. Available at: http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper2.pdf. Accessed November 17, 2022.
  • 63. Griffith SD, Tucker M, Bowser B, Calkins G, Chang C-HJ, Guardino E, Khozin S, Kraut J, You P, Schrag D, Miksad RA. Generating real-world tumor burden endpoints from electronic health record data: Comparison of RECIST, radiology-anchored, and clinician-anchored approaches for abstracting real-world progression in non-small cell lung cancer. Adv Ther 2019;36:2122–2136. Available at: 10.1007/s12325-019-00970-1.
  • 64. Zhu M, Sridhar S, Hollingsworth R, Chit A, Kimball T, Murmello K, Greenberg M, Gurunathan S, Chen J. Hybrid clinical trials to generate real-world evidence: design considerations from a sponsor’s perspective. Contemp Clin Trials 2020;94:105856. Available at: https://www.sciencedirect.com/science/article/pii/S1551714419305713.
  • 65. Sangha V, Aghajani Nargesi A, Dhingra LS, Mortazavi BJ, Ribeiro AH, Brandt C, Miller EJ, Ribeiro ALP, Velazquez E, Krumholz H, Khera R. Detection of left ventricular systolic dysfunction from electrocardiographic images. bioRxiv 2022. Available at: 10.1101/2022.06.04.22276000.
  • 66. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc 2017;24:198–208. Available at: 10.1093/jamia/ocw042.
  • 67. Chen ES, Sarkar IN. Mining the electronic health record for disease knowledge. Methods Mol Biol 2014;1159:269–286. Available at: 10.1007/978-1-4939-0709-0_15.
  • 68. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data Processing and Text Mining Technologies on Electronic Medical Records: A Review. J Healthc Eng 2018;2018:4302425. Available at: 10.1155/2018/4302425.
  • 69. Liu F, Jagannatha A, Yu H. Towards drug safety surveillance and pharmacovigilance: Current progress in detecting medication and adverse drug events from electronic health records. Drug Saf 2019;42:95–97. Available at: 10.1007/s40264-018-0766-8.
  • 70. Ball R, Robb M, Anderson SA, Dal Pan G. The FDA’s sentinel initiative--A comprehensive approach to medical product surveillance. Clin Pharmacol Ther 2016;99:265–268. Available at: 10.1002/cpt.320.
  • 71. Staffa JA, Dal Pan GJ. Regulatory innovation in postmarketing risk assessment and management. Clin Pharmacol Ther 2012;91:555–557. Available at: 10.1038/clpt.2011.289.
  • 72. Food and Drug Administration. Postmarket Surveillance Under Section 522 of the Federal Food, Drug, and Cosmetic Act; Guidance for Industry and Food and Drug Administration Staff; and Procedures for Handling Post-Approval Studies Imposed by Premarket Approval Application Order; Guidance for Industry and Food and Drug Administration Staff; Availability. Fed Regist 2022;87:61030–61032. Available at: https://www.federalregister.gov/documents/2022/10/07/2022-21832/postmarket-surveillance-under-section-522-of-the-federal-food-drug-and-cosmetic-act-guidance-for.
  • 73. Oikonomou EK, Suchard MA, Kernan WN, Young LH, Inzucchi SE, Khera R. Abstract 13661: An adaptive strategy of predictive enrichment to increase efficiency of randomized clinical trials using machine learning: A simulation of the IRIS trial. Circulation 2022;146:A13661–A13661. Available at: https://www.ahajournals.org/doi/abs/10.1161/circ.146.suppl_1.13661.
  • 74. Mehta C, Gao P, Bhatt DL, Harrington RA, Skerjanec S, Ware JH. Optimizing trial design: sequential, adaptive, and enrichment strategies. Circulation 2009;119:597–605. Available at: 10.1161/CIRCULATIONAHA.108.809707.
  • 75. Khera R, Mortazavi BJ, Sangha V, Warner F, Young HP, Ross JS, Shah ND, Theel ES, Jenkinson WG, Knepper C, Wang K, Peaper D, Martinello RA, Brandt CA, Lin Z, Ko AI, Krumholz HM, Pollock BD, Schulz WL. Accuracy of computable phenotyping approaches for SARS-CoV-2 infection and COVID-19 hospitalizations from the electronic Health Record. medRxiv 2021. Available at: 10.1101/2021.03.16.21253770.
  • 76. Mahajan S, Gu J, Lu Y, Khera R, Spatz ES, Zhang M, Sun N, Zheng X, Zhao H, Lu H, Ma ZJ, Krumholz HM. Hemodynamic phenotypes of hypertension based on cardiac output and systemic vascular resistance. Am J Med 2020;133:e127–e139. Available at: https://www.sciencedirect.com/science/article/pii/S0002934319307661.
  • 77. Oikonomou EK, Suchard MA, McGuire DK, Khera R. Phenomapping-derived tool to individualize the effect of canagliflozin on cardiovascular risk in type 2 diabetes. Diabetes Care 2022;45:965–974. Available at: https://diabetesjournals.org/care/article-abstract/45/4/965/144528. Accessed May 23, 2023.
  • 78. Oikonomou EK, Van Dijk D, Parise H, Suchard MA, Lemos de J, Antoniades C, Velazquez EJ, Miller EJ, Khera R. A phenomapping-derived tool to personalize the selection of anatomical vs. functional testing in evaluating chest pain (ASSIST). Eur Heart J 2021;42:2536–2548. Available at: https://academic.oup.com/eurheartj/article-abstract/42/26/2536/6242724. Accessed May 23, 2023.
  • 79. Gordon WJ, Gottlieb D, Kreda D, Mandel JC, Mandl KD, Kohane IS. Patient-led data sharing for clinical bioinformatics research: USCDI and beyond. J Am Med Inform Assoc 2021;28:2298–2300. Available at: 10.1093/jamia/ocab133.
  • 80. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, LaVange L, Marinac-Dabic D, Marks PW, Robb MA, Shuren J, Temple R, Woodcock J, Yue LQ, Califf RM. Real-World Evidence - What Is It and What Can It Tell Us? N Engl J Med 2016;375:2293–2297. Available at: 10.1056/NEJMsb1609216.
  • 81. Health Information Privacy Division. Individuals’ right under HIPAA to access their health information 45 CFR § 164.524. Hhs.gov 2016. Available at: https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/access/index.html. Accessed January 11, 2023.
  • 82. Cohen IG, Mello MM. HIPAA and protecting health information in the 21st century. JAMA 2018;320:231. Available at: https://jamanetwork.com/journals/jama/article-abstract/2682916. Accessed November 18, 2022.
  • 83. Glenn T, Monteith S. Privacy in the digital world: medical and health data outside of HIPAA protections. Curr Psychiatry Rep 2014;16:494. Available at: 10.1007/s11920-014-0494-4.
  • 84. Moore W, Frye S. Review of HIPAA, part 2: Limitations, rights, violations, and role for the imaging technologist. J Nucl Med Technol 2020;48:17–23. Available at: http://tech.snmjournals.org/content/48/1/17.abstract.
  • 85. Dey P, Ross JS, Ritchie JD, Desai NR, Bhavnani SP, Krumholz HM. Data Sharing and Cardiology: Platforms and Possibilities. J Am Coll Cardiol 2017;70:3018–3025. Available at: 10.1016/j.jacc.2017.10.037.
  • 86. Kent S, Burn E, Dawoud D, Jonsson P, Østby JT, Hughes N, Rijnbeek P, Bouvy JC. Common Problems, Common Data Model Solutions: Evidence Generation for Health Technology Assessment. Pharmacoeconomics 2021;39:275–285. Available at: 10.1007/s40273-020-00981-9.
  • 87.Kaissis GA, Makowski MR, Rückert D, Braren RF. Secure, privacy-preserving and federated machine learning in medical imaging. Nat Mach Intell 2020;2:305–311. Available at: https://www.nature.com/articles/s42256-020-0186-1. Accessed November 18, 2022. [Google Scholar]
  • 88.Fort D, Weng C, Bakken S, Wilcox AB. Considerations for using research data to verify clinical data accuracy. AMIA Summits Transl Sci Proc 2014;2014:211–217. Available at: https://www.ncbi.nlm.nih.gov/pubmed/25717415. [PMC free article] [PubMed] [Google Scholar]
  • 89.Kroth PJ, Morioka-Douglas N, Veres S, Pollock K, Babbott S, Poplau S, Corrigan K, Linzer M. The electronic elephant in the room: Physicians and the electronic health record. JAMIA Open 2018;1:49–56. Available at: 10.1093/jamiaopen/ooy016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Diaz-Garelli J-F, Strowd R, Ahmed T, Wells BJ, Merrill R, Laurini J, Pasche B, Topaloglu U. A tale of three subspecialties: Diagnosis recording patterns are internally consistent but specialty-dependent. JAMIA Open 2019;2:369–377. Available at: 10.1093/jamiaopen/ooz020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Homco J, Carabin H, Nagykaldi Z, Garwe T, Duffy FD, Kendrick D, Martinez S, Zhao YD, Stoner J. Validity of medical record abstraction and electronic health record-generated reports to assess performance on cardiovascular quality measures in primary care. JAMA Netw Open 2020;3:e209411. Available at: 10.1001/jamanetworkopen.2020.9411. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Shivade C, Raghavan P, Fosler-Lussier E, Embi PJ, Elhadad N, Johnson SB, Lai AM. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc 2014;21:221–230. Available at: https://academic.oup.com/jamia/article/21/2/221/2909214. Accessed October 22, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Cismondi F, Fialho AS, Vieira SM, Reti SR, Sousa JMC, Finkelstein SN. Missing data in medical databases: impute, delete or classify? Artif Intell Med 2013;58:63–72. Available at: 10.1016/j.artmed.2013.01.003. [DOI] [PubMed] [Google Scholar]
  • 94.Groenwold RHH. Informative missingness in electronic health record systems: the curse of knowing. Diagn Progn Res 2020;4:8. Available at: 10.1186/s41512-020-00077-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Groenwold RHH, White IR, Donders ART, Carpenter JR, Altman DG, Moons KGM. Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ 2012;184:1265–1269. Available at: 10.1503/cmaj.110977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Kotecha D, Asselbergs FW, Achenbach S, Anker SD, Atar D, Baigent C, Banerjee A, Beger B, Brobert G, Casadei B, Ceccarelli C, Cowie MR, Crea F, Cronin M, Denaxas S, Derix A, Fitzsimons D, Fredriksson M, Gale CP, Gkoutos GV, Goettsch W, Hemingway H, Ingvar M, Jonas A, Kazmierski R, Løgstrup S, Lumbers RT, Lüscher TF, McGreavy P, Piña IL, Roessig L, Steinbeisser C, Sundgren M, Tyl B, van Thiel G, van Bochove K, Vardas PE, Villanueva T, Vrana M, Weber W, Weidinger F, Windecker S, Wood A, Grobbee DE. CODE-EHR best-practice framework for the use of structured electronic health-care records in clinical research. The Lancet Digital Health 2022. Available at: 10.1016/S2589-7500(22)00151-0. [DOI] [PubMed] [Google Scholar]
  • 97.Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013;20:144–151. Available at: 10.1136/amiajnl-2011-000681. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Kharrazi H, Wang C, Scharfstein D. Prospective EHR-based clinical trials: the challenge of missing data. J Gen Intern Med 2014;29:976–978. Available at: 10.1007/s11606-014-2883-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong ICK, Rijnbeek PR, van der Lei J, Pratt N, Norén GN, Li Y-C, Stang PE, Madigan D, Ryan PB. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for observational researchers. Stud Health Technol Inform 2015;216:574–578. Available at: https://www.ncbi.nlm.nih.gov/pubmed/26262116. [PMC free article] [PubMed] [Google Scholar]
  • 100.Fu S, Leung LY, Raulli A-O, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inform Decis Mak 2020;20:60. Available at: 10.1186/s12911-020-1072-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Ambrosy AP, Parikh RV, Sung SH, Narayanan A, Masson R, Lam P-Q, Kheder K, Iwahashi A, Hardwick AB, Fitzpatrick JK, Avula HR, Selby VN, Shen X, Sanghera N, Cristino J, Go AS. A Natural Language Processing-Based Approach for Identifying Hospitalizations for Worsening Heart Failure Within an Integrated Health Care Delivery System. JAMA Netw Open 2021;4:e2135152. Available at: 10.1001/jamanetworkopen.2021.35152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2019;26:364–379. Available at: 10.1093/jamia/ocy173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018;1:18. Available at: 10.1038/s41746-018-0029-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020;20:280. Available at: 10.1186/s12911-020-01297-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Park J, Saha S, Chee B, Taylor J, Beach MC. Physician use of stigmatizing language in patient medical records. JAMA Netw Open 2021;4:e2117052. Available at: 10.1001/jamanetworkopen.2021.17052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Himmelstein G, Bates D, Zhou L. Examination of stigmatizing language in the electronic health record. JAMA Netw Open 2022;5:e2144967. Available at: 10.1001/jamanetworkopen.2021.44967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Office of the National Coordinator for Health Information Technology. Report on Health Information Blocking. Available at: https://www.healthit.gov/sites/default/files/reports/info_blocking_040915.pdf. Accessed May 23, 2023.
  • 108.Everson J, Patel V, Adler-Milstein J. Information blocking remains prevalent at the start of 21st Century Cures Act: results from a survey of health information exchange organizations. J Am Med Inform Assoc 2021;28:727–732. Available at: 10.1093/jamia/ocaa323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Anon. Service management. Available at: https://inquiry.healthit.gov/support/plugins/servlet/desk/portal/6. Accessed May 23, 2023.
  • 110.Anon. 2022 Medicare Promoting Interoperability Program requirements. Available at: https://www.cms.gov/regulations-guidance/promoting-interoperability/2022-medicare-promoting-interoperability-program-requirements. Accessed May 23, 2023.
  • 111.Anon. What are the advantages of electronic health records? Available at: https://www.healthit.gov/faq/what-are-advantages-electronic-health-records. Accessed May 23, 2023.
  • 112.Kruse CS, Mileski M, Dray G, Johnson Z, Shaw C, Shirodkar H. Physician burnout and the electronic health record leading up to and during the first year of COVID-19: Systematic review. J Med Internet Res 2022;24:e36200. Available at: 10.2196/36200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Johnson KB, Neuss MJ, Detmer DE. Electronic health records and clinician burnout: A story of three eras. J Am Med Inform Assoc 2021;28:967–973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Eschenroeder HC, Manzione LC, Adler-Milstein J, Bice C, Cash R, Duda C, Joseph C, Lee JS, Maneker A, Poterack KA, Rahman SB, Jeppson J, Longhurst C. Associations of physician burnout with organizational electronic health record support and after-hours charting. J Am Med Inform Assoc 2021;28:960–966. Available at: 10.1093/jamia/ocab053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Muhiyaddin R, Elfadl A, Mohamed E, Shah Z, Alam T, Abd-Alrazaq A, Househ M. Electronic Health Records and physician burnout: A scoping review. Stud Health Technol Inform 2022;289:481–484. Available at: 10.3233/SHTI210962. [DOI] [PubMed] [Google Scholar]
  • 116.Yan Q, Jiang Z, Harbin Z, Tolbert PH, Davies MG. Exploring the relationship between electronic health records and provider burnout: A systematic review. J Am Med Inform Assoc 2021;28:1009–1021. Available at: 10.1093/jamia/ocab009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Windle JR, Windle TA, Shamavu KY, Nelson QM, Clarke MA, Fruhling AL, Tcheng JE. Roadmap to a more useful and usable electronic health record. Cardiovasc Digit Health J 2021;2:301–311. Available at: 10.1016/j.cvdhj.2021.09.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Anon. Reducing clinician burden. Available at: https://wiki.hl7.org/index.php?title=Reducing_Clinician_Burden. Accessed May 23, 2023.
  • 119.Anon. Strategy on reducing regulatory and administrative burden relating to the use of health IT and EHRs. Available at: https://www.healthit.gov/sites/default/files/page/2020-02/BurdenReport_0.pdf. Accessed May 23, 2023.
  • 120.Sieja A, Markley K, Pell J, Gonzalez C, Redig B, Kneeland P, Lin C-T. Optimization sprints: Improving clinician satisfaction and teamwork by rapidly reducing electronic health record burden. Mayo Clin Proc 2019;94:793–802. Available at: 10.1016/j.mayocp.2018.08.036. [DOI] [PubMed] [Google Scholar]
  • 121.Anon. Available at: https://www.allscripts.com/wp-content/uploads/2021/10/NextNow_12_Human-Centered_Design-1.pdf. Accessed May 23, 2023.
  • 122.Pirtle C, Whyte H, Goode E, Anders S, Lehmann C, Kumah-Crystal Y. Electronic health record interactions through voice: A review. Appl Clin Inform 2018;09:541–552. Available at: 10.1055/s-0038-1666844. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195. Available at: 10.1186/s12916-019-1426-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Smetherman D, Golding L, Moy L, Rubin E. The economic impact of AI on breast imaging. J Breast Imaging 2022;4:302–308. Available at: 10.1093/jbi/wbac012. [DOI] [PubMed] [Google Scholar]
  • 125.Khanna NN, Maindarkar MA, Viswanathan V, Fernandes JFE, Paul S, Bhagawati M, Ahluwalia P, Ruzsa Z, Sharma A, Kolluri R, Singh IM, Laird JR, Fatemi M, Alizad A, Saba L, Agarwal V, Sharma A, Teji JS, Al-Maini M, Rathore V, Naidu S, Liblik K, Johri AM, Turk M, Mohanty L, Sobel DW, Miner M, Viskovic K, Tsoulfas G, Protogerou AD, Kitas GD, Fouda MM, Chaturvedi S, Kalra MK, Suri JS. Economics of artificial Intelligence in healthcare: Diagnosis vs. Treatment. Healthcare (Basel) 2022;10:2493. Available at: 10.3390/healthcare10122493. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Begoyan A. An overview of interoperability standards for electronic health records. Available at: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5ca532439868b9fac13bf5a0d6b46365280828d3. Accessed January 11, 2023.
  • 127.Garde S, Knaup P, Hovenga E, Heard S. Towards semantic interoperability for electronic health records. Methods Inf Med 2007;46:332–343. Available at: 10.1160/ME5001. [DOI] [PubMed] [Google Scholar]
  • 128.Anon. United States core data for interoperability (USCDI). Available at: https://www.healthit.gov/isa/united-states-core-data-interoperability-uscdi. Accessed February 16, 2023.
  • 129.Graham RNJ, Perriss RW, Scarsbrook AF. DICOM demystified: a review of digital file formats and their use in radiological practice. Clin Radiol 2005;60:1133–1140. Available at: https://www.sciencedirect.com/science/article/pii/S0009926005002199. [DOI] [PubMed] [Google Scholar]
  • 130.Steindel SJ. International classification of diseases, 10th edition, clinical modification and procedure coding system: descriptive overview of the next generation HIPAA code sets. J Am Med Inform Assoc 2010;17:274–282. Available at: 10.1136/jamia.2009.001230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Millar J. The need for a global language - SNOMED CT introduction. Stud Health Technol Inform 2016;225:683–685. Available at: https://www.ncbi.nlm.nih.gov/pubmed/27332304. [PubMed] [Google Scholar]
  • 132.Leslie-Mazwi TM, Bello JA, Tu R, Nicola GN, Donovan WD, Barr RM, Hirsch JA. Current Procedural Terminology: History, structure, and relationship to valuation for the neuroradiologist. AJNR Am J Neuroradiol 2016;37:1972–1976. Available at: 10.3174/ajnr.A4863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc 2011;18:441–448. Available at: 10.1136/amiajnl-2011-000116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.McDonald CJ, Huff SM, Suico JG, Hill G, Leavelle D, Aller R, Forrey A, Mercer K, DeMoor G, Hook J, Williams W, Case J, Maloney P. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin Chem 2003;49:624–633. Available at: 10.1373/49.4.624. [DOI] [PubMed] [Google Scholar]
  • 135.Schneeweiss S, Brown JS, Bate A, Trifirò G, Bartels DB. Choosing among common data models for real-world data analyses fit for making decisions about the effectiveness of medical products. Clin Pharmacol Ther 2020;107:827–833. Available at: https://ascpt.onlinelibrary.wiley.com/doi/abs/10.1002/cpt.1577. [DOI] [PubMed] [Google Scholar]
  • 136.Tan HX, Teo DCH, Lee D, Kim C, Neo JW, Sung C, Chahed H, Ang PS, Tan DSY, Park RW, Dorajoo SR. Applying the OMOP common data model to facilitate benefit-risk assessments of medicinal products using real-world data from Singapore and South Korea. Healthc Inform Res 2022;28:112–122. Available at: 10.4258/hir.2022.28.2.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Goossen WTF. Detailed clinical models: representing knowledge, data and semantics in healthcare information technology. Healthc Inform Res 2014;20:163–172. Available at: 10.4258/hir.2014.20.3.163. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Bender D, Sartipi K. HL7 FHIR: An Agile and RESTful approach to healthcare information exchange. In: Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems; 2013:326–331. Available at: 10.1109/CBMS.2013.6627810. [DOI] [Google Scholar]
  • 139.Jiang G, Kiefer RC, Sharma DK, Prud’hommeaux E, Solbrig HR. A consensus-based approach for harmonizing the OHDSI common data model with HL7 FHIR. Stud Health Technol Inform 2017;245:887–891. Available at: https://www.ncbi.nlm.nih.gov/pubmed/29295227. [PMC free article] [PubMed] [Google Scholar]
  • 140.Saripalle R, Runyan C, Russell M. Using HL7 FHIR to achieve interoperability in patient health record. J Biomed Inform 2019;94:103188. Available at: 10.1016/j.jbi.2019.103188. [DOI] [PubMed] [Google Scholar]
  • 141.Pfaff ER, Champion J, Bradford RL, Clark M, Xu H, Fecho K, Krishnamurthy A, Cox S, Chute CG, Overby Taylor C, Ahalt S. Fast Healthcare Interoperability Resources (FHIR) as a Meta Model to Integrate Common Data Models: Development of a Tool and Quantitative Validation Study. JMIR Med Inform 2019;7:e15199. Available at: 10.2196/15199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Ayaz M, Pasha MF, Alzahrani MY, Budiarto R, Stiawan D. The Fast Health Interoperability Resources (FHIR) Standard: Systematic Literature Review of Implementations, Applications, Challenges and Opportunities. JMIR Med Inform 2021;9:e21929. Available at: 10.2196/21929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Pecoraro F, Luzi D, Ricci FL. Designing ETL tools to feed a data warehouse based on Electronic Healthcare Record infrastructure. Stud Health Technol Inform 2015;210:929–933. Available at: https://ebooks.iospress.nl/doi/10.3233/978-1-61499-512-8-929. Accessed October 22, 2022. [PubMed] [Google Scholar]
  • 144.Ong TC, Kahn MG, Kwan BM, Yamashita T, Brandt E, Hosokawa P, Uhrich C, Schilling LM. Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading. BMC Med Inform Decis Mak 2017;17. Available at: 10.1186/s12911-017-0532-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.Yu Y, Zong N, Wen A, Liu S, Stone DJ, Knaack D, Chamberlain AM, Pfaff E, Gabriel D, Chute CG, Shah N, Jiang G. Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration. J Biomed Inform 2022;127:104002. Available at: https://www.sciencedirect.com/science/article/pii/S1532046422000181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Makadia R, Ryan PB. Transforming the Premier Perspective Hospital Database into the Observational Medical Outcomes Partnership (OMOP) Common Data Model. EGEMS (Wash DC) 2014;2:1110. Available at: 10.13063/2327-9214.1110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Singh K, Woodward MA. The rigorous work of evaluating consistency and accuracy in electronic health record data. JAMA Ophthalmol 2021;139:894–895. Available at: https://jamanetwork.com/journals/jamaophthalmology/article-abstract/2781701. Accessed November 18, 2022. [DOI] [PubMed] [Google Scholar]
  • 148.Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw S-T, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC) 2016;4:1244. Available at: 10.13063/2327-9214.1244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149.Callahan A, Shah NH, Chen JH. Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data. Ann Intern Med 2020;172:S79–S84. Available at: 10.7326/M19-0873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Meehan RA, Mon DT, Kelly KM, Rocca M, Dickinson G, Ritter J, Johnson CM. Increasing EHR system usability through standards: Conformance criteria in the HL7 EHR-system functional model. J Biomed Inform 2016;63:169–173. Available at: 10.1016/j.jbi.2016.08.015. [DOI] [PubMed] [Google Scholar]
  • 151.Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripcsak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, et al. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021;28:427–443. Available at: 10.1093/jamia/ocaa196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Feder SL. Data quality in electronic health records research: Quality domains and assessment methods. West J Nurs Res 2018;40:753–766. Available at: 10.1177/0193945916689084. [DOI] [PubMed] [Google Scholar]
  • 153.Souibgui M, Atigui F, Zammali S, Cherfi S, Yahia SB. Data quality in ETL process: A preliminary study. Procedia Comput Sci 2019;159:676–687. Available at: https://www.sciencedirect.com/science/article/pii/S1877050919314097. [Google Scholar]
  • 154.Quiroz JC, Chard T, Sa Z, Ritchie A, Jorm L, Gallego B. Extract, transform, load framework for the conversion of health databases to OMOP. PLoS One 2022;17:e0266911. Available at: 10.1371/journal.pone.0266911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Schuemie MJ, Ryan PB, Pratt N, Chen R, You SC, Krumholz HM, Madigan D, Hripcsak G, Suchard MA. Principles of Large-scale Evidence Generation and Evaluation across a network of databases (LEGEND). J Am Med Inform Assoc 2020;27:1331–1337. Available at: 10.1093/jamia/ocaa103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Khera R, Lu Y, Chen R, Schuemie M, Hripcsak G, Krumholz H, Suchard M. Comparative cardiovascular effectiveness and safety of individual angiotensin-converting enzyme inhibitors and angiotensin receptor blockers: A multinational participant-level assessment from LEGEND-HTN. J Am Coll Cardiol 2021;77:1474. Available at: 10.1016/S0735-1097(21)02832-1. [DOI] [Google Scholar]
  • 157.Suchard MA, Schuemie MJ, Krumholz HM, You SC, Chen R, Pratt N, Reich CG, Duke J, Madigan D, Hripcsak G, Ryan PB. Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. Lancet 2019;394:1816–1826. Available at: 10.1016/S0140-6736(19)32317-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Khera R, Dhingra LS, Aminorroaya A, Li K, Zhou JJ, Arshad F, Blacketer C, Bowring MG, Bu F, Cook M, Dorr DA, Duarte-Salles T, DuVall SL, Falconer T, French TE, Hanchrow EE, Horban S, Lau WCY, Li J, Liu Y, Lu Y, Man KKC, Matheny ME, Mathioudakis N, McLemore MF, Minty E, Morales DR, Nagy P, Nishimura A, Ostropolets A, Pistillo A, Posada JD, Pratt N, Reyes C, Ross J, Seager SL, Shah NH, Simon KR, Wan EYF, Yang J, Yin C, You SC, Schuemie MJ, Ryan PB, Hripcsak G, Krumholz HM, Suchard MA. Multinational patterns of second-line anti-hyperglycemic drug initiation across cardiovascular risk groups: A federated pharmacoepidemiologic evaluation in LEGEND-T2DM. medRxiv 2022:2022.12.27.22283968. Available at: https://www.medrxiv.org/content/10.1101/2022.12.27.22283968v1. Accessed January 11, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Khera R, Schuemie MJ, Lu Y, Ostropolets A, Chen R, Hripcsak G, Ryan PB, Krumholz HM, Suchard MA. Large-scale evidence generation and evaluation across a network of databases for type 2 diabetes mellitus (LEGEND-T2DM): a protocol for a series of multinational, real-world comparative cardiovascular effectiveness and safety studies. BMJ Open 2022;12:e057977. Available at: 10.1136/bmjopen-2021-057977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Rudy JE, Khan Y, Bower JK, Patel S, Foraker RE. Cardiovascular health trends in electronic health record data (2012–2015): A cross-sectional analysis of The Guideline Advantage™. EGEMS (Wash DC) 2019;7:30. Available at: 10.5334/egems.268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 161.Pfaff ER, Girvin AT, Gabriel DL, Kostka K, Morris M, Palchuk MB, Lehmann HP, Amor B, Bissell M, Bradwell KR, Gold S, Hong SS, Loomba J, Manna A, McMurry JA, Niehaus E, Qureshi N, Walden A, Zhang XT, Zhu RL, Moffitt RA, Haendel MA, Chute CG, N3C Consortium, Adams WG, Al-Shukri S, Anzalone A, Baghal A, Bennett TD, Bernstam EV, Bissell MM, Bush B, Campion TR, Castro V, Chang J, Chaudhari DD, Chen W, Chu S, Cimino JJ, Crandall KA, Crooks M, Davies SJD, DiPalazzo J, Dorr D, Eckrich D, Eltinge SE, Fort DG, Golovko G, Gupta S, et al. Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative. J Am Med Inform Assoc 2022;29:609–618. Available at: 10.1093/jamia/ocab217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 162.He S, Hurdle JF, Botkin JR, Narus SP. Integrating a federated healthcare data query platform with electronic IRB information systems. AMIA Annu Symp Proc 2010;2010:291–295. Available at: https://www.ncbi.nlm.nih.gov/pubmed/21346987. [PMC free article] [PubMed] [Google Scholar]
  • 163.Loftus TJ, Ruppert MM, Shickel B, Ozrazgat-Baslanti T, Balch JA, Efron PA, Upchurch GR Jr, Rashidi P, Tignanelli C, Bian J, Bihorac A. Federated learning for preserving data privacy in collaborative healthcare research. Digit Health 2022;8:20552076221134455. Available at: 10.1177/20552076221134455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Melis L, Song C, De Cristofaro E, Shmatikov V. Exploiting unintended feature leakage in collaborative learning. In: 2019 IEEE Symposium on Security and Privacy (SP). IEEE; 2019. Available at: 10.1109/sp.2019.00029. [DOI] [Google Scholar]
  • 165.Li J, Meng Y, Ma L, Du S, Zhu H, Pei Q, Shen X. A federated learning based privacy-preserving smart healthcare system. IEEE Trans Industr Inform 2022;18:2021–2031. Available at: 10.1109/tii.2021.3098010. [DOI] [Google Scholar]
  • 166.Onesimu JA, Karthikeyan J, Sei Y. An efficient clustering-based anonymization scheme for privacy-preserving data collection in IoT based healthcare services. Peer Peer Netw Appl 2021;14:1629–1649. Available at: 10.1007/s12083-021-01077-7. [DOI] [Google Scholar]
  • 167.Warnat-Herresthal S, Schultze H, Shastry KL, Manamohan S, Mukherjee S, Garg V, Sarveswara R, Händler K, Pickkers P, Aziz NA, Ktena S, Tran F, Bitzer M, Ossowski S, Casadei N, Herr C, Petersheim D, Behrends U, Kern F, Fehlmann T, Schommers P, Lehmann C, Augustin M, Rybniker J, Altmüller J, Mishra N, Bernardes JP, Krämer B, Bonaguro L, Schulte-Schrepping J, De Domenico E, Siever C, Kraut M, Desai M, Monnet B, Saridaki M, Siegel CM, Drews A, Nuesch-Germano M, Theis H, Heyckendorf J, Schreiber S, Kim-Hellmuth S, COVID-19 Aachen Study (COVAS), Nattermann J, Skowasch D, Kurth I, Keller A, Bals R, Nürnberg P, et al. Swarm Learning for decentralized and confidential clinical machine learning. Nature 2021;594:265–270. Available at: 10.1038/s41586-021-03583-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Han J, Ma Y, Han Y. Demystifying Swarm Learning: A new paradigm of blockchain-based decentralized federated learning. arXiv [csLG] 2022. Available at: http://arxiv.org/abs/2201.05286. [Google Scholar]
