Journal of the American Heart Association: Cardiovascular and Cerebrovascular Disease
. 2025 Mar 13;14(6):e036946. doi: 10.1161/JAHA.124.036946

Harnessing Electronic Health Records and Artificial Intelligence for Enhanced Cardiovascular Risk Prediction: A Comprehensive Review

Ming‐Lung Tsai 1,2,3, Kuan‐Fu Chen 4,5,, Pei‐Chun Chen 6,7,
PMCID: PMC12132661  PMID: 40079336

Abstract

Electronic health records (EHR) have revolutionized cardiovascular disease (CVD) research by enabling comprehensive, large‐scale, and dynamic data collection. Integrating EHR data with advanced analytical methods, including artificial intelligence (AI), is transforming CVD risk prediction and management methodologies. This review examines the advancements and challenges of using EHR in developing CVD prediction models, covering traditional and AI‐based approaches. While EHR‐based CVD risk prediction has greatly improved, evolving from traditional cohort‐derived models to models that integrate real‐world data on medication use and imaging, challenges persist regarding data quality, standardization across health care systems, and geographic variability. The complexity of EHR data requires sophisticated computational methods and multidisciplinary approaches for effective CVD risk modeling. Deep learning enhances prediction performance but faces limitations in interpretability and requires validation and recalibration for diverse populations. The future of CVD risk prediction and management increasingly depends on the effective use of EHR and AI technologies. Addressing data quality issues and overcoming the limitations of retrospective data analysis are critical for improving the reliability and applicability of risk prediction models. Integrating multidimensional data, including environmental, lifestyle, social, and genomic factors, could significantly enhance risk assessment. These models require continuous validation and recalibration to remain applicable to diverse populations and evolving health care environments.

Keywords: artificial intelligence, cardiovascular disease, electronic health records, risk prediction models

Subject Categories: Cardiovascular Disease, Primary Prevention


Nonstandard Abbreviations and Acronyms

PCE: pooled cohort equation

PREDICT: Predicting Risk of CVD Events

SDOH: social determinants of health

In recent years, electronic health records (EHR) have played a pivotal role in cardiovascular disease (CVD) research, offering unparalleled access to a broad range of health care data. Advancements in medical informatics have enabled the integration of diverse EHR data, including demographics, comorbidities, disease diagnoses, laboratory values, imaging, medications, vital signs, and even medical expenditures, significantly improving the quality of medical care and patient outcomes. 1 , 2 This centralized and digitized information management enhances the speed, scale, and accuracy of medical information retrieval. When properly integrated and managed, EHRs can reduce the chances of medical errors and increase the continuity and reliability of health care services. 3 , 4

EHR systems not only streamline the retrieval of medical information but also play a crucial role in collaborative care and patient engagement. By enabling seamless information sharing among health care providers, EHRs facilitate cross‐specialty or regional medical services and resource integration, essential for comprehensive CVD care. 5 , 6 Moreover, patient‐facing EHR tools empower patients to actively engage in their health management, fostering a more participatory approach to health care.

Despite their widespread adoption and transformation from mere medical records into valuable research tools, significant gaps remain in understanding the challenges of CVD risk prediction. These challenges arise partly because EHR data are generated during routine clinical care rather than collected specifically for research, and therefore contain more missing values, inconsistencies, and potential biases than carefully controlled research data. 7 Over the past 2 decades, EHRs have been increasingly used in various fields, including the development of clinical risk prediction algorithms to support clinical decision making. 2 , 3 The use of traditional statistical methods and cutting‐edge artificial intelligence (AI) techniques in developing EHR‐based risk prediction algorithms has expanded. 8 , 9 However, developing an appropriate prediction model using EHR requires overcoming multifaceted challenges and integrating various components, including data standardization, bias mitigation, and advanced analytical techniques (Figure).

Figure 1. Illustration of the framework of CVD risk prediction.


This framework encapsulates the journey from data collection through standardization, research design, and ultimately to the deployment of risk prediction models in various clinical applications. AI indicates artificial intelligence; CONSORT‐AI, Consolidated Standards of Reporting Trials–Artificial Intelligence; CVD, cardiovascular disease; EHR, electronic health records; FHIR, Fast Healthcare Interoperability Resources; HL7, Health Level Seven International; LLM, large language model; NLP, natural language processing; and SPIRIT‐AI, Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence.

This review aims to fill this gap by critically exploring 3 fundamental dimensions of EHR‐based risk prediction in CVD. First, we chart the evolution and advancements of both traditional and AI‐driven models for CVD risk prediction, emphasizing the role of EHR in their development. Second, we assess the strengths and limitations inherent in both traditional and AI methodologies in CVD risk prediction, offering insights into their comparative effectiveness. Finally, we introduce guidelines to improve the quality and reporting of clinical studies and model development, covering both conventional methods and those involving AI interventions. To achieve these aims, we followed a systematic approach to literature selection, searching PubMed, Web of Science, and Google Scholar for articles published between 2000 and 2024 using keywords including “cardiovascular disease,” “risk prediction,” “electronic health records,” and “artificial intelligence.” Studies were included if they (1) focused on CVD risk prediction, (2) used EHR data, or (3) used AI or machine learning methods. We excluded studies without clear methodology descriptions or those focusing solely on basic science without clinical applications.

TRADITIONAL COHORT‐BASED PREDICTION MODELS

Historically, models developed from the 1960s to the early 2000s, such as the Framingham general CVD model, Systematic Coronary Risk Evaluation (and Systematic Coronary Risk Evaluation 2), pooled cohort equations (PCEs), and the World Health Organization CVD risk charts, derived from prospective cohorts, have been recommended for CVD prevention and risk assessment (Table 1). 10 , 11 , 12 , 13 , 14 , 15 These traditional models depend on well‐established risk factors, making them practical in clinical settings due to their reliance on a limited number of easily obtainable CVD risk factors. Furthermore, the availability of nonlaboratory versions of some models, such as the Framingham office‐based model and World Health Organization CVD risk charts, further extends their applicability in resource‐limited settings. 12 , 14 However, these models often face limitations in accuracy or calibration when applied to different geographic regions, demographic groups, or populations from different temporal periods. 16 This is partly due to differences in genetic backgrounds, lifestyle factors, and health care systems across different populations, which can significantly influence CVD risk profiles. Variations in average CVD risk and underlying risk factors between the populations used for model development and those in which the models are applied can lead to biased risk estimates. In addition, traditional cohort‐based models often have limited medication information, lack imaging data, and cover a narrower disease spectrum. The exclusion of these factors limits the models' ability to capture the dynamic nature of CVD risk and the impact of interventions over time, potentially reducing their predictive accuracy in real‐world clinical settings.
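Most of these traditional Cox‐based models share the same functional core: the 10‐year risk is 1 − S0(10)^exp(linear predictor), where S0(10) is the baseline survival at 10 years and the linear predictor sums each coefficient multiplied by the deviation of a risk factor from its cohort mean. The sketch below illustrates only this arithmetic; the coefficients, means, and baseline survival are invented for illustration and are not the published values of any model.

```python
import math

def ten_year_risk(betas, values, means, baseline_survival):
    """10-year risk from a Cox proportional hazards model:
    risk = 1 - S0(10) ** exp(sum(beta_i * (x_i - mean_i)))."""
    lp = sum(b * (x - m) for b, x, m in zip(betas, values, means))
    return 1.0 - baseline_survival ** math.exp(lp)

# Hypothetical coefficients and cohort means, for illustration only:
betas = [0.05, 0.65, 0.30]             # effects of age, smoking, log(SBP)
means = [55.0, 0.25, math.log(125.0)]  # cohort means of the same predictors
s0 = 0.95                              # hypothetical 10-y baseline survival

# A 63-year-old smoker with systolic BP 140 mm Hg:
risk = ten_year_risk(betas, [63, 1, math.log(140)], means, s0)
print(f"{risk:.1%}")
```

Published models differ in their predictors, transformations (eg, log‐transformed blood pressure), and sex‐specific equations, but this shared arithmetic is why recalibrating S0 and the cohort means to a new population can improve performance without refitting the coefficients.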

Table 1.

Overview of Cardiovascular Disease Prediction Models

| Model | Algorithm | Published year | Primary target | Validation | C‐statistic/AUROC | Source |
|---|---|---|---|---|---|---|
| TIMI risk score 48 | Multivariable logistic regression model | 2000 | 14‐d death and ischemic event for ACS | Researchers conducted external validation | 0.65* | International cohort, n=7081 |
| SCORE risk estimation system 11 | Weibull proportional hazards model | 2003 | 10‐y risk of fatal CVD | Independent external validation | 0.71–0.84* | European cohort studies, n=205 178 |
| GRACE score 49 | Multivariable logistic regression model | 2003 | In‐hospital death for ACS | Researchers conducted subset and external validation | 0.84* | International registry cohort, n=11 389 |
| Framingham Cardiovascular Risk Model 12 | Cox proportional hazards model | 2008 | 10‐y prediction of myocardial infarction or coronary death | Independent external validation | Men: 0.763 (95% CI, 0.746–0.780); Women: 0.793 (95% CI, 0.772–0.814) | US community‐based cohort, n=8491 |
| Pooled cohort equations model 13 | Cox proportional hazards model | 2013 | 10‐y risk of atherosclerotic CVD | Split‐sample validation | 0.713–0.818* | US community‐based cohorts, n≈24 000 |
| QRISK3 23 | Cox proportional hazards model | 2017 | 10‐y risk of CVD | Split‐sample validation | Men: 0.858 (95% CI, 0.857–0.859); Women: 0.880 (95% CI, 0.878–0.881) | UK QResearch database, n≈7 890 000 |
| Motwani et al 45 | Information gain ranking and boosted ensemble algorithm | 2017 | 5‐y all‐cause death | 10‐fold stratified cross‐validation | 0.79 (95% CI, 0.77–0.81) | International multicenter registry database, n=10 030 |
| PREDICT 24 | Cox proportional hazards model | 2018 | 5‐y risk of CVD | Split‐sample validation | Men: 0.73 (95% CI, 0.72–0.73); Women: 0.73 (95% CI, 0.72–0.73) | New Zealand primary care patients, n=401 752 |
| Betancur et al 41 | Information gain ranking and boosted ensemble algorithm | 2018 | 3‐y risk of major adverse cardiac events | 10‐fold stratified cross‐validation | 0.81 (95% CI, 0.78–0.83) | Single‐center EHR, US, n=2619 |
| WHO CVD risk charts 14 | Cox proportional hazards models | 2019 | 10‐y risk of CVD | Researchers conducted internal and external validation | 0.685 (95% CI, 0.629–0.741) to 0.833 (95% CI, 0.783–0.882) | International cohorts, n=376 177 |
| Kwon et al 47 | Three‐hidden‐layer neural network | 2019 | In‐hospital death | Researchers conducted internal and external validation | 0.898* | Multicenter retrospective cohort, Korea, n=30 245 |
| Alaa et al 34 | AutoPrognosis ensemble machine learning–based model | 2019 | 5‐y risk of CVD | Cross‐validation | 0.774 (95% CI, 0.768–0.780) | UK Biobank database, n=423 604 |
| SCORE2 risk estimation system 15 | Fine and Gray competing risk–adjusted models | 2021 | 10‐y risk of CVD | Researchers conducted external validation | 0.67 (95% CI, 0.65–0.68) to 0.81 (95% CI, 0.76–0.86) | European cohorts, n=677 684 |
| BEHRT 43 | Transformer‐based model | 2022 | 5‐y risk of heart failure, stroke, coronary heart disease | Researchers conducted internal and external validation | Heart failure: 0.909 (95% CI, 0.865–0.953); Stroke: 0.932 (95% CI, 0.907–0.957); Coronary heart disease: 0.929 (95% CI, 0.907–0.951) | Clinical Practice Research Datalink EHR, UK, n=3 052 290 |
| Petrazzini et al 40 | Random forest, gradient‐boosted trees, support vector machine with polynomial kernel | 2022 | 1‐y risk of CVD | Researchers conducted external validation | 0.88 (95% CI, 0.87–0.89) | BioMe EHR, US, n=6349 |
| PREVENT 26 | Cox proportional hazards models | 2023 | 10‐ and 30‐y risk of CVD | Researchers conducted external validation | Men: 0.757 (95% CI, 0.727–0.778); Women: 0.794 (95% CI, 0.763–0.809) | US cohort studies and EHR data sets, n=3 281 919 |
| QRISK4 22 | Cause‐specific Cox models | 2024 | 10‐y risk of CVD | Researchers conducted external validation | Men: 0.814 (95% CI, 0.812–0.816); Women: 0.835 (95% CI, 0.833–0.837) | UK QResearch database, n≈9 976 306 |
| Yu et al 53 | Recurrent neural network and fully connected neural network | 2024 | 10‐y risk of CVD | Split‐sample validation | 0.815 (95% CI, 0.782–0.844) | US Cardiovascular Lifetime Risk Pooling Project, n=15 565 |
ACS indicates acute coronary syndrome; AUROC, area under the receiver operating characteristic curve; BEHRT, Transformer‐Based Model for EHR Analysis; CPRD, Clinical Practice Research Datalink; CVD, cardiovascular disease; EHR, electronic health record; GRACE, Global Registry of Acute Coronary Events; PREDICT, Predicting Risk of CVD Events; PREVENT, Predicting Risk of CVD EVENTs; SCORE, Systematic Coronary Risk Evaluation; TIMI, Thrombolysis in Myocardial Infarction; and WHO, World Health Organization.

*95% CI not available.

These limitations have led researchers to reevaluate and validate traditional models using more diverse and contemporary data sources, particularly EHRs. This approach allows for a more comprehensive assessment of these models' performance across different populations and health care settings. Recent validation studies have offered valuable insights into the performance of traditional models in diverse populations. For example, Wolfson et al validated the Framingham general CVD model and PCEs using data from >84 000 US adults, including EHRs and insurance claims. 17 The C‐statistic, a measure of the model's discriminative ability where 1.0 indicates perfect discrimination, was 0.740 for the Framingham model and 0.747 for PCEs, indicating good discrimination for both models. Conversely, a validation study using EHRs from 84 617 Ontario residents revealed that both models overestimated the absolute risk of atherosclerotic CVD. 18 This finding underscores the need to recalibrate these models, which were developed in prospective cohorts from the United States, to suit geographically distinct populations better. In a recent study, the Framingham general CVD model and PCEs were recalibrated using multiple data sources, including EHRs from 6 938 971 Ontario residents. 19 Recalibration improved risk prediction of the Framingham general CVD model but not PCEs. These studies underscore the value of combining established prediction models with contemporary EHR data to enhance CVD prevention and risk assessment in primary care settings.
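The C‐statistic cited in these validation studies can be computed directly as the proportion of all (event, non‐event) patient pairs in which the patient who experienced the event received the higher predicted risk, with ties counted as half. A self‐contained sketch on toy data:

```python
def c_statistic(risks, events):
    """Concordance of predicted risks: among all (event, non-event) pairs,
    the fraction where the event case received the higher predicted risk
    (ties count as half). 0.5 = chance, 1.0 = perfect discrimination."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    concordant = ties = 0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1
            elif rc == rn:
                ties += 1
    return (concordant + 0.5 * ties) / (len(cases) * len(controls))

# Toy example: higher predicted risks mostly coincide with observed events.
risks  = [0.05, 0.10, 0.20, 0.40, 0.70]
events = [0,    0,    1,    0,    1   ]
print(c_statistic(risks, events))
```

Note that discrimination alone is insufficient: a model can rank patients well (high C‐statistic) yet still systematically overestimate absolute risk, as the Ontario validation studies above illustrate, which is why calibration is assessed separately.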

EHR IN CVD RISK PREDICTION

Leveraging EHR Data to Enhance CVD Risk Prediction Models

The evolution and integration of EHR data into CVD research have significantly expanded the capabilities and scope of risk prediction models. Unlike traditional cohort studies, which require extensive resources and time to track a limited population, EHRs automatically accumulate data during routine medical care, eliminating the need for dedicated follow‐up. 1 , 3 As a result, EHRs provide a dynamic and comprehensive database of patient information, including longitudinal records of clinical data, laboratory results, and imaging data. 1 This infrastructure supports the real‐time application of prediction models and enables routine updates. Beyond traditional risk factors like hypertension, cholesterol levels, diabetes, and smoking status, EHRs enable the inclusion of more nuanced biomarkers, lifestyle factors, medication use, and socioeconomic conditions that were often overlooked in earlier models. 1 , 20

EHR data also offer high observation frequency, capturing information from multiple patient visits rather than just periodic follow‐ups as seen in prospective cohort studies. This increased frequency of data capture enhances the ability to predict the risk of imminent events, which is crucial for timely clinical interventions. Additionally, the patient population represented in EHRs more accurately reflects real‐world diversity, including all patients interacting with the health care system, not just selected volunteers from cohort studies. 3 This broader representation is essential for developing more generalizable and applicable models across diverse populations, potentially addressing health disparities in CVD risk prediction and management.

Model Validation in CVD Risk Prediction

Model development in research may begin with internal validation, where the model is tested within the same data set from which it was developed. This method is crucial for initial assessment but may not adequately address the model's applicability to broader populations. External validation, which involves testing the model on independent data sets from different populations, is essential for assessing generalizability and real‐world predictive performance of the model. Validation should be conducted to ensure robustness and relevance of the prediction models, with recalibration as needed on the basis of external validation results. 16 EHRs collected routinely in clinical practice provide a valuable data source for the external validation and recalibration process, allowing researchers to refine models to better meet the needs of diverse health care settings (see examples in the section Traditional Cohort‐Based Prediction Models).
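One simple recalibration strategy that external validation can motivate is "recalibration‐in‐the‐large": keep the model's ranking of patients unchanged but shift every prediction by a single offset on the logit scale until the mean predicted risk matches the event rate observed in the new population. The sketch below uses invented numbers purely for illustration and solves for the offset by bisection:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def recalibrate_intercept(pred_risks, observed_rate):
    """Shift all predictions on the logit scale by one offset so the mean
    recalibrated risk equals the observed event rate (bisection search)."""
    lo, hi = -5.0, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2
        mean_risk = sum(expit(logit(p) + mid) for p in pred_risks) / len(pred_risks)
        if mean_risk < observed_rate:
            lo = mid
        else:
            hi = mid
    offset = (lo + hi) / 2
    return [expit(logit(p) + offset) for p in pred_risks]

preds = [0.10, 0.20, 0.30, 0.40]            # toy model output: mean risk 25%
recal = recalibrate_intercept(preds, 0.15)  # but only 15% of patients had events
print([round(p, 3) for p in recal])
```

Because the shift is monotone, patient rankings (and hence the C‐statistic) are preserved; more elaborate recalibration additionally rescales the slope of the linear predictor.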

EHR‐Based CVD Prediction Models and Their Applications

Over the past decade, many studies have leveraged EHR data for CVD risk prediction, such as estimating the 10‐year risk of myocardial infarction or stroke. A notable example is the QRISK CVD risk prediction model, developed by Hippisley‐Cox and colleagues in 2007 using the UK QResearch EHR database. 21 This model has been continuously updated and has reached its latest version, QRISK4, introduced in 2024. 22 QRISK4 was developed using a derivation cohort of 9.98 million patients aged 25 to 84 years and a validation cohort of 6.79 million. Key features of the QRISK models include the development using a large, representative sample and the incorporation of risk factors readily available in the EHR, such as blood pressure variability, medication use, and a wide range of diseases (eg, chronic kidney disease, brain cancer, lung cancer). 22 , 23 In external validation, QRISK4 demonstrated better discrimination and calibration than the widely used CVD prediction models like PCEs and Systematic Coronary Risk Evaluation 2. 22 This superior performance may be attributed to QRISK4's inclusion of a broader range of risk factors and its development using a larger, more diverse data set. However, further validation is necessary across different countries and demographic groups.

While the QRISK models take a comprehensive approach to CVD risk prediction, using >20 variables that reflect patients' risk, their complexity and challenges in accessibility may limit their practical use. In contrast, the Predicting Risk of CVD Events (PREDICT) model, developed in New Zealand by Pylypchuk and colleagues, offers a more straightforward approach, with only 13 variables, while maintaining high accuracy in predicting CVD risk. By using fewer variables and integrating directly with primary care EHR systems, PREDICT addresses the complexity and accessibility issues that may limit the practical use of more comprehensive models like QRISK. Notably, the PREDICT model includes various factors, such as socioeconomic status, ethnicity, and comorbidities like atrial fibrillation. 24 Unlike the QRISK models, which are derived from broad population data from UK general practice records, PREDICT focused on individuals undergoing CVD risk assessments at primary care clinics. This assessment is facilitated by decision support software integrated with the EHRs, which automatically populates a risk factor template during patient consultations. Moreover, compared with the PCEs, which tend to overestimate cardiovascular risk, PREDICT offers a more accurate risk estimation by incorporating additional predictive factors. The PREDICT model represents a balanced approach that combines inclusivity with practicality. 24

Recognizing the greater size, diversity, and contemporaneity of the populations captured in EHRs compared with cohort studies and clinical trials, EHR data were used to develop the Predicting Risk of Cardiovascular Disease EVENTs equations, introduced by the American Heart Association in 2023. 25 , 26 , 27 The new equations notably remove race as a predictor and instead incorporate social determinants of health through a social deprivation index, which includes factors like income, education, employment, housing, and transportation. This approach represents a significant advancement in reflecting contemporary understandings of health disparities. The model provides risk assessments for both 10‐ and 30‐year periods, accommodating a wide age range and offering versatility for health care providers. Its development from a large, diverse EHR data set underscores its potential applicability and relevance to modern health care settings. While designed to be comprehensive, ongoing research and adaptation will continue to refine its predictions and enhance its utility in various clinical environments.

INTEGRATION OF AI TO EHR‐BASED MODELS

Overview of AI, Machine Learning, and Deep Learning in CVD

Rapid advancements in data science and AI have greatly enhanced CVD risk prediction by integrating complex machine learning models. These models process large volumes of diverse data from EHRs, identifying intricate risk patterns that were previously undetectable. 28 Analysis of structured and unstructured clinical information in EHRs has enabled AI applications, allowing for more accurate and dynamic risk stratification, which is crucial for early intervention and personalized patient management. 29 , 30 , 31 Traditionally, researchers have relied on structured tabular data like laboratory results, vital signs, diagnostic and procedure codes, and basic demographic features to build their prediction models. However, with the advancement of AI tools, unstructured data, including clinical notes, electronic physiological signals, and medical images, can now be used as novel features for predicting CVD.

AI learning approaches can be broadly categorized as supervised or unsupervised. In supervised learning, data undergo iterative analysis in which individual features are selected, processed, and weighted to identify the optimal combination for predicting a specific outcome. Unsupervised learning, by contrast, explores patterns within data without predefined labels, offering a different perspective on risk factors. 32 , 33 These techniques are particularly powerful for analyzing patterns within large‐scale EHR data. 34 , 35 More advanced AI models, like deep learning networks, involve multiple layers of computation to model intricate, nonlinear relationships without predefined interactions and can handle unstructured data like images and free text more effectively. 28 , 31 Deep learning leverages neural networks, such as convolutional neural networks, recurrent neural networks, and long short‐term memory networks, designed to mirror aspects of the human brain's functionality, enabling computers to interpret and analyze complex layers of data. 9 , 36 , 37 By transforming input data through successive representations, deep learning models learn high‐level features that capture complex interactions of input variables without the need for expert‐guided feature engineering. 36 , 37
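As a minimal, concrete illustration of the supervised case, the sketch below fits a logistic regression by gradient descent on two hypothetical standardized EHR features (all data invented); real pipelines would use established libraries, regularization, and far richer feature sets:

```python
import math

# Toy data: two standardized features (eg, age, systolic BP) per patient,
# with a binary CVD outcome label. Values are invented for illustration.
X = [(-1.0, -0.5), (-0.5, -1.0), (0.2, 0.1), (1.0, 0.8), (1.5, 1.2), (-1.2, -0.8)]
y = [0, 0, 0, 1, 1, 0]

w = [0.0, 0.0]  # feature weights, learned from labeled examples
b = 0.0         # intercept
lr = 0.5        # learning rate

for _ in range(2000):
    for (x1, x2), label in zip(X, y):
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        err = p - label          # gradient of the log-loss for this example
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

preds = [1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b))) for x1, x2 in X]
print([round(p, 2) for p in preds])
```

The defining feature of supervised learning is visible here: the outcome labels `y` drive every weight update. An unsupervised method (eg, clustering) would instead group the rows of `X` by similarity alone, without ever seeing `y`.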

Specific AI Algorithms and Applications in CVD Risk Prediction

Some studies have demonstrated that AI‐based prognostic models for CVD offer superior results compared with traditional methods. 38 , 39 , 40 For instance, Petrazzini et al used EHR data with machine learning to evaluate the risk of coronary artery disease. 40 The study used a robust ensemble learning technique called the random forest–based algorithm and analyzed EHR data from a multiethnic clinical care cohort and the UK Biobank population‐based cohort. They compared the predictive capabilities against PCEs and found that leveraging clinical features from EHR could enhance the 1‐year risk prediction of coronary artery disease by 12% (Table 1). Notably, there was a 20% increase in discrimination and a 34.4% improvement in reclassification for subgroups with lower coronary artery disease risk, demonstrating the potential of machine learning in improving CVD risk prediction and integrating EHR data into health care strategies.

Integrating machine learning with clinical and imaging data has also advanced the prediction of major adverse cardiac events. Betancur et al used a machine learning algorithm, the LogitBoost ensemble, to integrate clinical information with myocardial perfusion imaging data. 41 Their study showed significant improvements in predicting major adverse cardiac events compared with traditional visual or automated assessments (area under the receiver operating characteristic curve [AUROC], 0.81 versus 0.65 versus 0.73, respectively; P<0.01 for all). This synergistic approach exemplifies AI's ability to merge diverse data types for detailed risk analysis and its transformative potential in CVD prediction. 34 , 42 , 43 , 44

The advancements in AI‐driven image analysis further reshape the field of disease prognosis prediction in CVD care. Studies that combine coronary computed tomography with other parameters have demonstrated that AI can predict CVD events or death more accurately. 45 , 46 Another multicenter study by Kwon et al, involving 25 775 patients with heart disease, developed and validated an echocardiography‐based deep learning model that accurately predicted in‐hospital death. 47 This model outperformed established prediction models like the Global Registry of Acute Coronary Events score (AUROC, 0.958 versus 0.881) and Thrombolysis in Myocardial Infarction score (AUROC, 0.958 versus 0.800) for coronary heart disease, demonstrating AI's capability in precisely predicting future CVD incidents and facilitating targeted preventive measures (Table 1). 47 , 48 , 49

AI models, particularly deep learning ones, offer significant advantages in tailoring disease risk predictions across diverse regions and ethnic groups. Li et al used deep learning techniques to construct the Transformer‐Based Model for EHR Analysis and compared it with other models. 43 , 50 Internal validation demonstrated that deep learning models significantly outperformed the best statistical models by substantial margins in predicting heart failure, stroke, and coronary heart disease, achieving an AUROC of 0.954 (95% CI, 0.950–0.958), 0.957 (95% CI, 0.955–0.959), and 0.951 (95% CI, 0.949–0.953), respectively. Moreover, the study tested the robustness of these models against data distribution changes, including geographic and temporal variations. It was observed that while all models exhibited performance dips with data changes, deep learning models maintained relatively superior performance across all risk prediction tasks, with increases in AUROC of 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively. 43 This capability revolutionizes disease prognosis approaches in cardiology, making it possible to tailor predictions and interventions more closely to individual patient needs. 39 , 51 , 52

Further advancing the application of deep learning in atherosclerotic cardiovascular risk prediction, Yu et al developed an innovative approach that specifically addresses the temporal nature of risk factors. 53 Their Dynamic‐DeepHit model incorporated 8 years of longitudinal data from 4 cardiovascular disease cohorts, analyzing traditional risk factors such as blood pressure, cholesterol levels, and smoking status over time. When compared with the conventional PCEs, their model demonstrated superior discrimination (AUROC, 0.815 versus 0.792) with better overall performance (Brier score, 0.0514 versus 0.0542). Most notably, the model showed enhanced performance in specific demographic groups, including Black women and individuals aged <60 years, reinforcing deep learning's potential to address health care disparities through more nuanced risk stratification. This study suggested the potential of longitudinal data integration for future AI applications in personalized cardiovascular risk assessment.

CHALLENGES AND LIMITATIONS OF EHR IN CVD RISK PREDICTION

Data Quality and Outcome Adjudication

EHRs have revolutionized CVD research, particularly in risk prediction. However, they also present significant challenges. Outcome adjudication and detection are complex because of the nature of EHR data and the diverse patient populations involved. Inconsistencies in how outcomes are recorded across different data systems, together with the retrospective nature of many EHR data sets, often hinder outcome determination. This can make it difficult to ensure that study outcomes accurately reflect patients' true conditions. 54 , 55

Data quality poses a significant obstacle in using EHRs. These systems often contain incomplete or inconsistent information, a widely acknowledged challenge in this field. Such inaccuracies in data can impact the precision of research outcomes. 8 Errors, omissions, or missing data in EHR can lead to inaccuracies in feature extraction and modeling, as well as misinterpretation of specific diseases or risk factors. 3 , 54 Excluding patients with significant missing data can affect the validity of models, potentially overestimating the severity of diseases and leading to sample selection bias. 7 , 56 Barzi and Woodward's analysis of 28 cohort studies on serum cholesterol showed that standard statistical methods can manage missing data effectively when they are <10%. However, with missing data ranging from 10% to 60%, results vary significantly across methods, and no statistical technique can provide reliable results when >60% of data are missing. 57 More sophisticated approaches like multiple imputation can better handle larger amounts of missing data by creating multiple complete data sets, analyzing each separately, and combining the results. 58 However, these methods assume data are missing at random, which may not hold true in EHR data where missing patterns could be informative of patient health status. Recent machine learning approaches offer alternative solutions for handling missing data. Some models, such as decision trees and random forests, can naturally handle missing values without requiring explicit imputation. 59 The choice between imputation‐requiring and imputation‐free models often depends on the specific context, including the amount and pattern of missing data, the type of variables involved, and the intended use of the prediction model. Although missing data could be a problem, the frequency of data measurement might correlate with the severity of the disease, implying that missing data could indicate milder or overlooked conditions. 
This suggests that the pattern of missing data itself might contain valuable information about patient health status and health care usage patterns, which could be incorporated into more sophisticated risk prediction models.
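A minimal illustration of this idea is to pair simple imputation with an explicit missingness indicator, so a downstream model can learn from the pattern of missingness itself. The values below are invented; production pipelines would typically use multiple imputation or model‐native missing‐value handling as discussed above:

```python
def impute_with_indicator(values):
    """Mean-impute a numeric EHR feature and return a parallel missingness
    indicator, since the pattern of missingness may itself be informative
    (eg, unmeasured values may reflect milder or overlooked disease)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [v if v is not None else mean for v in values]
    missing = [0 if v is not None else 1 for v in values]
    return filled, missing

# Hypothetical LDL cholesterol values (mg/dL); None marks unmeasured visits.
ldl = [130.0, None, 110.0, None, 150.0]
filled, missing_flag = impute_with_indicator(ldl)
print(filled, missing_flag)
```

Feeding both `filled` and `missing_flag` to a prediction model lets it exploit the measurement pattern, whereas plain mean imputation silently assumes the data are missing at random.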

Data Standards and Harmonization

The vast and diverse nature of patient data within EHR systems presents significant challenges in CVD research. EHR data often include unstructured components and span diverse variables, such as medication profiles, laboratory results, imaging data, and medical histories, requiring advanced and sophisticated data integration techniques. 1 The lack of standardized data formats and harmonization across different EHR systems further compounds this complexity. Such inconsistency complicates the integration of multisystem EHR data for developing and validating robust risk models. Proper data harmonization and preprocessing are essential for success, as they ensure that data from diverse sources can be accurately combined and analyzed. Integrating this multifaceted data demands sophisticated computational methods and an in‐depth understanding of both medical informatics and cardiovascular science. Moreover, the accuracy of variable definitions is crucial. These definitions should be meticulously checked, as they significantly impact whether analysis results reflect actual clinical practices. 20 , 60

Adopting common data models is one approach to data standardization. The Observational Medical Outcomes Partnership (OMOP) common data model has emerged as a standard for transforming health care data from different sources into a common format with standardized vocabularies. 61 , 62 The OMOP common data model enables researchers to systematically analyze data from diverse sources while maintaining data quality and consistency, and its adoption is growing significantly across health care institutions worldwide.
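To make the idea of a common data model concrete, the sketch below maps a hypothetical source diagnosis row into an OMOP‐style condition_occurrence record. The concept IDs in the lookup table are placeholders, not verified OMOP vocabulary entries; only the fallback to concept 0 for unmapped codes mirrors an actual CDM convention:

```python
# Hypothetical local-code -> standard-concept lookup
# (IDs are placeholders, not verified OMOP vocabulary entries)
CONCEPT_MAP = {"ICD10:I21.9": 1001, "ICD10:I50.9": 1002}

def to_condition_occurrence(source_row, concept_map):
    """Map one source diagnosis row into an OMOP-style
    condition_occurrence dict; unmapped codes fall back to concept 0,
    mirroring the CDM convention for 'no matching concept'."""
    code = f"{source_row['vocab']}:{source_row['code']}"
    return {
        "person_id": source_row["patient_id"],
        "condition_concept_id": concept_map.get(code, 0),
        "condition_start_date": source_row["dx_date"],
        "condition_source_value": code,
    }

row = {"patient_id": 42, "vocab": "ICD10", "code": "I21.9",
       "dx_date": "2023-05-01"}
occ = to_condition_occurrence(row, CONCEPT_MAP)
```

Real OMOP pipelines use curated vocabulary tables and ETL tooling rather than a hand‐written dictionary, but the shape of the transformation, local code in, standardized concept out, is the same.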

Furthermore, adopting Health Level Seven International (HL7) and Fast Healthcare Interoperability Resources (FHIR) standards may help address data standardization challenges. 63 , 64 These frameworks streamline the integration of multisource EHR data and enhance the overall quality and interoperability of the data used in CVD research. By ensuring that diverse data sets are compatible and uniformly interpretable, HL7 and FHIR may strengthen CVD risk prediction models and facilitate broader, more effective clinical applications. Differences between clinical data and billing data within EHR systems can further complicate data quality; procedure codes used for billing may not be exhaustive, often including only billable aspects of care. 65 The challenges and limitations of data quality in EHR underscore the need for meticulous data management and reporting standards. Properly addressing these challenges is vital not only for ensuring the accuracy of disease predictions but also for maintaining the reliability of research and facilitating effective clinical decision‐making.
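As a minimal illustration of why such standards ease integration, the sketch below flattens a pared‐down FHIR R4 Observation into one analysis‐ready row. The field names follow the FHIR specification; the patient reference, LOINC code, and values are illustrative:

```python
import json

# A pared-down FHIR R4 Observation for a cholesterol measurement
# (field names follow the FHIR spec; the values are illustrative)
raw = json.dumps({
    "resourceType": "Observation",
    "subject": {"reference": "Patient/42"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "13457-7"}]},
    "valueQuantity": {"value": 131.0, "unit": "mg/dL"},
    "effectiveDateTime": "2023-05-01",
})

def flatten_observation(resource_json):
    """Flatten a FHIR Observation into one analysis-ready record."""
    obs = json.loads(resource_json)
    assert obs["resourceType"] == "Observation"
    return {
        "patient": obs["subject"]["reference"].split("/")[-1],
        "loinc": obs["code"]["coding"][0]["code"],
        "value": obs["valueQuantity"]["value"],
        "unit": obs["valueQuantity"]["unit"],
        "date": obs["effectiveDateTime"],
    }

row = flatten_observation(raw)
```

Because every FHIR‐compliant system emits Observations with this same structure, the same flattening code works across institutions, which is precisely the interoperability benefit described above.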

Geographic and Ethnic Diversity

Many prognostic studies using EHR in CVD research are constrained by their regional focus despite large sample sizes. Goldstein et al reviewed 107 studies using EHR; although 39 studies included >100 000 participants, 37% were confined to single hospitals, and 30% focused on short‐term predictions within a 90‐day time frame. 3 Single‐hospital studies may not capture the diversity of patient populations and clinical practices across different health care settings, while short‐term predictions may overlook long‐term risk factors and outcomes crucial for comprehensive CVD management. This limits these models' geographic and temporal scope, reducing their generalizability.

CVD research using EHR data is also concentrated mainly in advanced nations. This skew raises significant concerns about the generalizability of study findings to diverse ethnic groups and countries. 3 , 66 A notable finding by Damen et al highlights this concern, with 46% of CVD prediction models originating from European research. 67 This geographic focus brings into question whether the conclusions drawn from these studies can be uniformly applied across different populations. 3

Ethnic and socioeconomic factors present significant challenges in CVD risk prediction. Unmeasured variables linked to ethnicity could lead to biases in the analyses. These unmeasured variables might include cultural dietary habits, stress levels related to societal factors, or genetic predispositions to certain CVD risk factors, which could significantly influence risk profiles but are often not captured in standard EHR data. Comparative studies across ethnic groups have shown discrepancies between actual risks and model predictions, suggesting that ethnic variability may significantly affect risk prediction. 68 , 69 Similarly, social determinants of health (SDOHs) also present challenges in CVD risk prediction. While evidence links SDOH factors, such as socioeconomic status, educational level, and community environment, to CVD risk, EHR data often lack detailed information on these variables. 70 , 71 , 72 This gap highlights the need for more inclusive models that account for regional, ethnic, and social health factors to ensure the accuracy and applicability of research findings across diverse populations.

Retrospective Analysis and Bias

The predominantly retrospective nature of studies using EHR introduces specific challenges in CVD risk prediction. These studies, in which cohorts, exposures, and outcomes are determined after data collection, often encounter information bias related to the post hoc definition and adjustment of features. Moreover, the retrospective design limits the accurate capture of the temporal sequence of events, which is crucial in developing and validating prediction models for CVD. 7 , 54 Misinterpreting the temporal relationships between risk factors and CVD outcomes can lead to flawed model development. While statistical methods and balancing techniques can mitigate some of these biases, they cannot entirely eliminate the inherent limitations of retrospective data.

Selection bias is a significant challenge in EHR‐based CVD research. Typically, EHRs capture data from patients who interact more frequently with health care systems and may have higher disease burdens than the general population. 73 , 74 This overrepresentation of higher‐risk individuals can lead to models that overestimate CVD risk in the general population, potentially resulting in unnecessary interventions or anxiety among lower‐risk individuals. These biases may affect the validity and generalizability of prediction models, as EHR data may not fully reflect broader population trends. 7

A significant challenge in EHR‐based CVD research is the irregular nature of patient visits. Unlike controlled studies with predetermined follow‐up schedules, EHR data reflect real‐world health care use patterns where patients have varying visit frequencies, intervals, and lengths of follow‐up. 74 , 75 Some patients may have frequent visits with rich data collection, while others have sporadic encounters, creating challenges for developing consistent risk prediction models. This variability can introduce bias and complicate the temporal analysis necessary for accurate risk prediction. The irregular visit patterns not only affect the completeness of data collection but also may reflect underlying differences in disease severity, health care access, and patient behavior, potentially introducing systematic biases in risk prediction models.
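One common way to tame irregular visit patterns is to collapse each patient's measurements into fixed‐length time windows, carrying the last observation forward across empty windows while keeping a per‐window measurement count, so that visit frequency itself becomes a feature rather than a nuisance. The sketch below is a simplified illustration with invented numbers, not a method from the cited studies:

```python
def to_fixed_windows(measurements, n_windows, window_days=30):
    """Collapse irregularly timed (day, value) pairs into fixed windows:
    the last observed value per window (carried forward when a window
    is empty) plus a per-window measurement count, which itself encodes
    visit frequency."""
    values, counts = [], []
    last = None
    for w in range(n_windows):
        lo, hi = w * window_days, (w + 1) * window_days
        in_win = [v for d, v in measurements if lo <= d < hi]
        if in_win:
            last = in_win[-1]
        values.append(last)       # stays None until the first observation
        counts.append(len(in_win))
    return values, counts

# Toy series: a burst of early visits, then a long gap
series = [(3, 118), (10, 124), (75, 131)]
vals, cnts = to_fixed_windows(series, n_windows=4)
```

The count vector makes the sparse follow‐up visible to the model, echoing the point above that irregular patterns may reflect disease severity or access to care rather than random noise.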

Furthermore, a fundamental assumption in time‐to‐event models used for risk prediction is noninformative censoring and independent event times given observed covariates. 17 Noninformative censoring assumes that the reason for loss to follow‐up is unrelated to the outcome of interest, whereas independent event times assume that unobserved factors do not influence the timing of events. In EHR‐based studies, these assumptions may be violated if, for instance, sicker patients have more frequent follow‐ups or if certain risk factors influence both the likelihood of an event and the timing of health care visits. Although empirically unverifiable, these assumptions form the basis of many studies assessing the performance of CVD risk prediction using standard survival regression models. Such methodological considerations are particularly crucial in retrospective EHR‐based studies, where the nature of data collection and follow‐up can significantly affect the validity of these assumptions.
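To make the censoring mechanics concrete, the following is a minimal, pure‐Python Kaplan–Meier estimator (a standard nonparametric survival estimate, not a method drawn from the studies cited above). The comment marks exactly where the noninformative‐censoring assumption enters; the toy follow‐up times are invented:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate. events[i] is 1 for an observed
    event, 0 for censoring. Censored subjects simply leave the risk set
    without contributing an event -- a step that is only valid under
    noninformative censoring."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n = 0
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]  # events at time t
            n += 1                 # subjects leaving the risk set at t
            i += 1
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n  # censored subjects drop out silently here
    return curve

# Toy follow-up times in years; 0 = lost to follow-up (censored)
curve = kaplan_meier([1, 2, 2, 3, 5], [1, 0, 1, 1, 0])
```

If sicker patients were preferentially censored, the silent drop‐out step would bias the curve upward, which is the violation described in the paragraph above.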

Challenges in Model Reporting and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis Statement

Despite the vast amount of information available from the EHR for building CVD prediction models, significant gaps persist in reporting the methodologies used in model development. Damen et al found that 10% of models did not describe their modeling methods in detail, 13% did not specify the prediction time frame, and 25% did not provide the information (such as full regression equations, nomograms, or risk charts) necessary to calculate individual CVD risks. Complete regression equations are essential for external validation, yet only 46% of models provided them. 67 This lack of detailed reporting hinders the verification of these models and their application in clinical practice.

The Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement, published in 2015, addresses these issues by providing guidelines for improving the reporting of studies that develop, validate, or update prediction models for diagnostic and prognostic purposes in medicine. 76 , 77 Its goal is to ensure that the key details of developing and validating prediction models are clearly reported, allowing other researchers to replicate the process and achieve similar results. Such transparency is crucial for synthesizing and critically evaluating all relevant information to assess the risk of bias and clinical utility. In EHR‐based CVD prediction models, adhering to TRIPOD guidelines is especially important because of the complexity and variability of EHR data. Researchers should clearly report how they handle missing data, harmonize variables across different EHR systems, and account for potential biases inherent in retrospective EHR data.

CHALLENGES AND ADVANCEMENTS IN AI FOR CVD RISK PREDICTION

Disease Incidence and Data Leakage in AI Models

In CVD prediction, the accuracy of models can be significantly impacted by the incidence rates of specific diseases. Settings or limitations of studies might lead to the underrepresentation of low‐risk patients, which can distort disease models or introduce bias in AI and machine learning–based modeling. 40 This issue arises when a disease or condition is rare within the study population, leading to insufficient data for these groups. As a result, models may not adequately capture the nuances of these low‐incidence diseases, affecting the overall predictive accuracy and reliability.
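A toy illustration of how low incidence distorts naive performance metrics: with a 1% event rate, a degenerate model that never predicts the rare outcome still scores 99% accuracy while detecting no cases at all, which is why class‐sensitive metrics and techniques such as reweighting or resampling matter in this setting. All numbers below are illustrative:

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Overall accuracy and sensitivity (recall for the positive class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    return correct / len(y_true), (true_pos / pos) if pos else 0.0

# 1% event incidence: 990 non-events, 10 events
y_true = [0] * 990 + [1] * 10
always_negative = [0] * 1000  # degenerate model that ignores the rare class

acc, sens = accuracy_and_sensitivity(y_true, always_negative)
# acc is high purely because of the imbalance; sens exposes the failure
```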

Another critical challenge in the development of machine learning models is data leakage, which occurs when information from the training data appears in the testing data set, leading to overestimation of the model's performance. Data leakage can occur if multiple hospitals' records of the same patient are used or if a patient's time‐series data are not strictly confined to either the training or testing set but appear in both. Such overlaps can artificially inflate performance metrics, giving a false impression of the model's effectiveness in clinical practice. 78 , 79
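The standard safeguard against this form of leakage is to split at the patient level rather than the record level, so that every record of a given individual lands on exactly one side of the split. A minimal sketch with invented records:

```python
import random

def split_by_patient(records, test_frac=0.25, seed=0):
    """Assign ALL records of a patient to either train or test, so the
    same individual never appears on both sides of the split."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed for reproducibility
    rng.shuffle(patient_ids)
    n_test = max(1, int(len(patient_ids) * test_frac))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

# Toy data: 8 patients, 3 visits each
recs = [{"patient_id": p, "visit": v} for p in range(8) for v in range(3)]
train, test = split_by_patient(recs)
```

When records from multiple hospitals may describe the same person, the grouping key would need to be a cross‐institution patient identifier; splitting on a per‐hospital ID would reintroduce exactly the leakage described above.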

Accuracy, Reliability, and Interpretability Challenges in AI Models for Health Care

The integration of AI models in health care promises transformative potential, yet it also presents significant challenges, particularly in accuracy and reliability. 80 , 81 , 82 As AI models become increasingly sophisticated, their application in critical health care decisions necessitates thorough examination of their performance and limitations. Research has demonstrated that, when analyzing the same data set, different AI models can produce varying levels of accuracy, with each exhibiting unique strengths and weaknesses depending on the specific context of application. 29 , 51 , 83 This variability raises the question of which AI computational approach is most appropriate for a given study to achieve the highest accuracy. AI models must undergo rigorous validation and testing to ensure effective clinical application. 36 , 37

Overfitting presents another significant challenge in AI model development. This phenomenon occurs when a model becomes excessively tailored to its training data, failing to generalize effectively to new, unseen data. This results in models that perform exceptionally well on training data but poorly in practical applications, leading to overly optimistic internal validation. 36 , 80 , 84 , 85 In CVD risk prediction, overfitting could lead to models that accurately predict patient outcomes similar to those in the training data but fail to identify risks in patients with different characteristics or from other populations, potentially missing critical intervention opportunities. Techniques such as cross‐validation, regularization, and ensemble methods are commonly used to mitigate overfitting, but their effectiveness can vary depending on the specific health care application and available data.
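As a minimal sketch of one such safeguard, k‐fold cross‐validation estimates out‐of‐sample error by holding each fold out in turn. Here it scores a deliberately trivial baseline (predicting the training mean), the kind of honest benchmark any risk model should have to beat; the data and fold count are illustrative:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def cv_mse_of_mean_predictor(y, k=5):
    """Cross-validated mean squared error of the simplest possible model:
    always predict the training-fold mean. Every point is scored only
    when it is held out, so the estimate is honestly out-of-sample."""
    errs = []
    for train, val in k_fold_indices(len(y), k):
        mu = sum(y[i] for i in train) / len(train)
        errs += [(y[i] - mu) ** 2 for i in val]
    return sum(errs) / len(errs)

y = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # toy binary outcomes
score = cv_mse_of_mean_predictor(y, k=5)
```

For EHR data, the folds themselves should respect patient grouping (all of a patient's records in one fold), combining this safeguard with the leakage precautions discussed earlier.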

Algorithm interpretability is perhaps one of the most pressing issues in the field of AI in health care. 80 , 82 , 84 Many machine learning models, particularly those using complex algorithms like deep neural networks, often act as “black boxes,” obscuring the reasoning behind their decisions. This lack of transparency can be a significant barrier to the adoption and trust of AI in sensitive fields like health care, where understanding the rationale behind clinical decisions is crucial for both practitioners and patients. 80 , 82 , 84 , 86

Researchers have developed various explainable AI techniques to address these interpretability challenges. Shapley Additive Explanations (SHAP) and Local Interpretable Model–Agnostic Explanations (LIME) have been adopted for structured electronic medical record data sets to quantify the contribution of each feature to individual predictions. Class activation mapping helps visualize which regions of medical images most influenced the model's decisions in cardiovascular imaging analysis. Saliency map verbalization techniques can help explain which textual features drive the predictions of language models. 87 , 88 , 89 Recent studies have demonstrated successful applications of these explainable AI methods in cardiovascular risk prediction. For instance, SHAP values have been used to explain how different clinical variables contribute to predicted cardiovascular risks, providing clinicians with interpretable insights. 90 LIME has also been applied to generate patient‐specific explanations for predicted cardiovascular outcomes, enhancing clinicians' trust and facilitating informed decision making. 91
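The intuition behind Shapley‐based attribution is easiest to see in the linear special case: for a model f(x) = b + Σ wᵢxᵢ with features treated independently, the exact Shapley value of feature i reduces to wᵢ(xᵢ − E[xᵢ]), and the attributions sum to the gap between the patient's prediction and the average prediction. The coefficients, cohort means, and patient values below are invented for illustration; real applications use libraries that also handle nonlinear models:

```python
def linear_shap(coefs, x, background_means):
    """Exact Shapley attributions for a linear model with independent
    features: phi_i = w_i * (x_i - E[x_i]). The phi values sum to
    f(x) minus the average prediction."""
    return {i: w * (x[i] - background_means[i]) for i, w in coefs.items()}

coefs = {"age": 0.04, "sbp": 0.02, "smoker": 0.6}   # illustrative weights
means = {"age": 55.0, "sbp": 130.0, "smoker": 0.2}  # cohort averages
patient = {"age": 67.0, "sbp": 150.0, "smoker": 1.0}

phi = linear_shap(coefs, patient, means)
# Each phi[i] says how much feature i pushed this patient's score
# above or below the cohort-average score.
```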

FUTURE DEVELOPMENT DIRECTIONS IN CARDIOVASCULAR RISK PREDICTION

Enhancing Predictive Analytics

The potential of big data in CVD research is increasingly recognized, promising a revolution in the field. Comprehensive analyses of large‐scale data sets encompassing patient demographics, lifestyle factors, and environmental exposures are crucial for identifying complex patterns and previously hidden risk factors. This data‐driven approach enhances the precision of risk assessments and drives a deeper, more nuanced understanding of CVD dynamics. 2 , 4 , 8 To fully leverage this potential, the field must prioritize the development of adaptive modeling techniques that can accommodate the dynamic nature of cardiovascular health data. This includes creating frameworks for real‐time model updating as new data become available, ensuring that prediction models remain relevant in the face of evolving health trends and treatment paradigms. Refining existing models to better suit local population specifics, rather than generating entirely new ones, ensures a more consistent and reliable approach to CVD risk assessment. 67 This approach should be complemented by rigorous external validation and comparison of existing CVD risk prediction models, emphasizing the need for standardized reporting and methodological transparency. 66 , 67 , 92
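One concrete form of such local refinement is intercept‐only recalibration: keep the imported model's risk ordering but shift all predictions by a common logit offset until the mean predicted risk matches the locally observed event rate. The sketch below solves for that offset by bisection; the predicted risks and local incidence are invented for illustration:

```python
from math import log, exp

def logit(p):
    return log(p / (1 - p))

def expit(z):
    return 1 / (1 + exp(-z))

def recalibrate_intercept(pred_risks, local_incidence, tol=1e-10):
    """Find a common logit offset (by bisection) so that the mean
    shifted prediction equals the locally observed event rate --
    intercept-only recalibration, preserving the original model's
    risk ordering."""
    logits = [logit(p) for p in pred_risks]
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_risk = sum(expit(z + mid) for z in logits) / len(logits)
        if mean_risk < local_incidence:
            lo = mid  # need a larger offset
        else:
            hi = mid
    return (lo + hi) / 2

# Imported model predicts ~20% average risk; local incidence is 10%
preds = [0.10, 0.15, 0.20, 0.25, 0.30]
offset = recalibrate_intercept(preds, 0.10)
adjusted = [expit(logit(p) + offset) for p in preds]
```

Fuller recalibration schemes also refit a slope on the logit scale, but even this intercept‐only step can correct the systematic over‐ or underestimation that arises when a model is transported to a population with a different baseline risk.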

Cross‐disciplinary collaborations will significantly bolster the advancement of predictive analytics in CVD. Synergizing expertise from clinical cardiologists, epidemiologists, statisticians, and social scientists can lead to innovative solutions and holistic approaches to prevent, diagnose, and treat CVD. 1 , 7 These collaborations are crucial for developing models that predict risk accurately and provide actionable insights for personalized patient care.

Integrating Multidimensional Data: Environmental, Lifestyle, Social, and Genomic Factors

Research in CVD is increasingly oriented toward integrating a comprehensive array of data, including environmental, lifestyle, SDOHs, and genomic factors. This holistic approach underscores the multifactorial nature of heart diseases, where factors beyond traditional clinical indicators play a crucial role. 60 , 71 , 72 By delving into the nuances of environmental conditions and personal lifestyle choices, researchers can uncover a broader spectrum of triggers and risk factors for CVD. This expanded lens is pivotal for developing more effective, personalized prevention strategies. The growing recognition of SDOHs in cardiovascular health is prompting the inclusion of these elements in prediction models and treatment plans. 71 , 72 Future research could explore whether incorporating SDOHs has the potential to improve predictive accuracy and model performance.

Additionally, the integration of genomic data represents another promising direction in cardiovascular health research, especially with the application of AI technologies. This synergy could open new avenues for personalized treatment by clarifying genetic predispositions to heart disease. 8 , 44 AI algorithms, when combined with genomic data and these other diverse data sources, may significantly advance our knowledge of CVD risk prediction. This multidimensional approach has the potential to yield more targeted therapies and treatment strategies tailored to individual patient profiles.

Enhancing Quality and Transparency in CVD Prevention With AI Algorithms

To effectively leverage AI in health care, 3 key factors are essential: transparency in AI processes, robust validation of AI models, and careful consideration of data quality. 7 , 82 Given the unique attributes and rapid advancement of AI technologies, traditional clinical trial guidelines fail to fully address the specific challenges posed by trials involving AI algorithms for CVD prevention. The Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence (SPIRIT‐AI) extension has emerged as a novel reporting guideline for clinical trial protocols incorporating AI components. 93 Focused on the protocol stage, SPIRIT‐AI offers a checklist of essential elements to include when planning clinical trials involving AI, including detailed descriptions of AI interventions, algorithm design and development, data processing methods, and special considerations for handling and reporting AI interventions. SPIRIT‐AI aims to ensure that clinical trials are designed with a comprehensive understanding of AI's complexity and uniqueness, thereby enhancing the integrity and scientific rigor of the research.

Complementing this protocol‐stage guidance, the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT‐AI) extension focuses on the trial reporting phase, specifically enhancing the transparency and reliability of studies involving AI algorithms for CVD prevention. 94 This AI extension provides a framework for detailed reporting on the unique aspects of AI within clinical trials, covering the development, implementation, and evaluation of AI technologies. It mandates explicit description of AI model development, data management, algorithm training and validation, and modifications made during the study, thereby addressing AI‐specific challenges not covered by the original Consolidated Standards of Reporting Trials guidelines. The objective of CONSORT‐AI is not only to promote the transparency and completeness of clinical trial reports involving AI but also to illuminate the specific contributions of AI to trial design, methodology, and outcomes. It assists editors, peer reviewers, and general readers in understanding, interpreting, and critically evaluating the quality of clinical trial design and the risk of bias in reported results. By delineating AI's role in studies, CONSORT‐AI helps ensure that AI‐based interventions in CVD research and other medical fields are rigorously evaluated and accurately reported.

Integrating Clinical Workflow and AI in Cardiovascular Medicine

Integrating AI into cardiovascular clinical workflows represents a significant advance in patient care, offering potential improvements in diagnosis, treatment planning, and outcome prediction. Looking ahead, such efforts will increasingly focus on enhancing the integration of EHR data to improve the effectiveness of clinical decision support systems. 95 These efforts include creating more advanced prediction models that offer precise, tailored recommendations for patient care and embedding those models into clinical workflows to enhance the accuracy and efficiency of medical diagnoses and treatments.

However, this integration also presents substantial challenges that require careful consideration. 7 , 80 , 85 , 95 , 96 Paramount among these is the need for AI tools to be integrated into existing health care systems with minimal disruption, enhancing rather than impeding clinical efficiency. 95 Additional challenges include navigating regulatory approval processes for commercial AI applications and developing strategies for continuous model updating as clinical practices evolve. 7 , 80 In cardiovascular medicine, where decisions can be life‐critical, regulatory bodies must balance the potential benefits of AI tools against the need to ensure patient safety, requiring robust evidence of efficacy and safety before approval. Furthermore, demonstrating the added value of integrated AI models over existing clinical prediction tools and assessing their real‐world impact on patient outcomes and clinical efficiency remain crucial for successful integration. 80 , 95 The ongoing refinement of AI technologies and methodologies will be essential to realizing the full potential of these tools to improve patient outcomes and overall health care delivery.

Emerging Technologies of Large Language Models

The use of large language models (LLMs) to extract valuable insights from the free‐text data within EHRs represents a significant frontier in CVD research. 97 , 98 Clinical notes, often underused because of their unstructured format, contain rich, in‐depth details about patient histories, symptoms, and treatment responses. LLMs can potentially transform this textual information into structured, actionable data. By applying natural language processing techniques, these models can identify and extract relevant clinical information, enriching the longitudinal data already available for modeling. This capability augments structured data and opens new possibilities for deeper, more comprehensive risk assessments and personalized treatment plans. Integrating LLMs with traditional data analysis methods could significantly advance precision medicine, offering a more holistic view of patient health and tailoring interventions to individual needs.
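For a sense of the baseline such models improve upon, the sketch below extracts a left‐ventricular ejection fraction from free text with a hand‐written regular expression. The note text and pattern are illustrative; the appeal of language‐model pipelines is precisely that they generalize beyond brittle patterns like this one:

```python
import re

def extract_ef(note):
    """Pull a left-ventricular ejection fraction (%) from free text --
    the kind of rule-based extraction that language-model pipelines
    aim to generalize beyond."""
    m = re.search(r"(?:LVEF|ejection fraction)[^\d]{0,20}(\d{1,2})\s*%",
                  note, flags=re.IGNORECASE)
    return int(m.group(1)) if m else None

note = ("Echo today. LV mildly dilated; estimated ejection fraction "
        "of 35 %. Continue GDMT.")
ef = extract_ef(note)
```

Rules like this fail on negation ("no prior EF below 40%"), abbreviations the pattern does not anticipate, and values expressed as ranges, which is where LLM‐based extraction, with appropriate validation, offers its advantage.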

Balancing AI With Human Feedback in Cardiovascular Medicine

AI's ability to detect subtle details often overlooked by human perception can provide deeper insights for clinicians and researchers in various diagnostic and prognostic settings, potentially revolutionizing patient care in cardiovascular medicine. 52 , 85 However, this technological advancement must be balanced with human oversight and clinical judgment. While AI can provide valuable insights and recommendations, the final decision‐making process should involve human expertise to ensure the best possible patient care.

CONCLUSIONS

With the advancement and widespread use of EHRs, CVD risk prediction has entered a new era. The transition from models like the Framingham risk score and the Systematic Coronary Risk Evaluation (SCORE) to EHR‐based models such as QRISK and the Predicting Risk of Cardiovascular Disease EVENTs (PREVENT) equations demonstrates significant progress in incorporating more diverse and contemporary data, enabling more comprehensive risk assessments. The complexity of data integration and quality issues, particularly in retrospective analyses, underscores the importance of rigorous data management and adherence to standards like the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.

AI and machine learning technologies have further revolutionized this field, demonstrating superior performance in analyzing complex, multidimensional data and often outperforming traditional statistical methods. However, integrating EHR and AI in CVD risk prediction faces challenges, including data quality issues, standardization problems, and concerns about AI interpretability.

Looking to the future, the field of CVD risk prediction is poised for further advancement by integrating diverse data sources, including genomics and social determinants of health, and developing more interpretable AI models. As we continue to embrace technological innovation and interdisciplinary collaboration, the goal remains to leverage these advanced tools for more accurate, personalized, and effective cardiovascular care, ultimately improving patient outcomes and quality of life.

Sources of Funding

This study was supported by grants from the Ministry of Science and Technology (currently National Science and Technology Council), Taiwan (MOST 110‐2314‐B‐039‐030‐MY3), and Chang Gung Medical Foundation (Grant Number: CORPVVP0101).

Disclosures

None.

This manuscript was sent to Thomas S. Metkus, MD, PhD, Associate Editor, for review by expert referees, editorial decision, and final disposition.


Contributor Information

Kuan‐Fu Chen, Email: drkfchen@gmail.com.

Pei‐Chun Chen, Email: peichun.chen@nhri.edu.tw.

References

  • 1. Cowie MR, Blomster JI, Curtis LH, Duclaux S, Ford I, Fritz F, Goldman S, Janmohamed S, Kreuzer J, Leenay M, et al. Electronic health records to facilitate clinical research. Clin Res Cardiol. 2017;106:1–9. doi: 10.1007/s00392-016-1025-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Kim E, Rubinstein SM, Nead KT, Wojcieszynski AP, Gabriel PE, Warner JL. The evolving use of electronic health records (EHR) for research. Semin Radiat Oncol. 2019;29:354–361. doi: 10.1016/j.semradonc.2019.05.010 [DOI] [PubMed] [Google Scholar]
  • 3. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JP. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24:198–208. doi: 10.1093/jamia/ocw042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Hernandez‐Boussard T, Monda KL, Crespo BC, Riskin D. Real world evidence in cardiovascular medicine: ensuring data validity in electronic health record‐based studies. J Am Med Inform Assoc. 2019;26:1189–1194. doi: 10.1093/jamia/ocz119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Tapuria A, Porat T, Kalra D, Dsouza G, Xiaohui S, Curcin V. Impact of patient access to their electronic health record: systematic review. Inform Health Soc Care. 2021;46:192–204. doi: 10.1080/17538157.2021.1879810 [DOI] [PubMed] [Google Scholar]
  • 6. Pawelek J, Baca‐Motes K, Pandit JA, Berk BB, Ramos E. The power of patient engagement with electronic health records as research participants. JMIR Med Inform. 2022;10:e39145. doi: 10.2196/39145 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Sauer CM, Chen LC, Hyland SL, Girbes A, Elbers P, Celi LA. Leveraging electronic health records for data science: common pitfalls and how to avoid them. Lancet Digit Health. 2022;4:e893–e898. doi: 10.1016/S2589-7500(22)00154-6 [DOI] [PubMed] [Google Scholar]
  • 8. Hemingway H, Asselbergs FW, Danesh J, Dobson R, Maniadakis N, Maggioni A, van Thiel GJM, Cronin M, Brobert G, Vardas P, et al. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. Eur Heart J. 2018;39:1481–1495. doi: 10.1093/eurheartj/ehx487 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Romiti S, Vinciguerra M, Saade W, Anso Cortajarena I, Greco E. Artificial intelligence (AI) and cardiovascular diseases: an unexpected alliance. Cardiol Res Pract. 2020;2020:4972346. doi: 10.1155/2020/4972346 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Anderson KM, Odell PM, Wilson PW, Kannel WB. Cardiovascular disease risk profiles. Am Heart J. 1991;121:293–298. [DOI] [PubMed] [Google Scholar]
  • 11. Conroy RM, Pyorala K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetiere P, Jousilahti P, Keil U, et al. Estimation of ten‐year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J. 2003;24:987–1003. doi: 10.1016/S0195-668X(03)00114-3 [DOI] [PubMed] [Google Scholar]
  • 12. D'Agostino RB Sr, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117:743–753. [DOI] [PubMed] [Google Scholar]
  • 13. Goff DC Jr, Lloyd‐Jones DM, Bennett G, Coady S, D'Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O'Donnell CJ, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129:S49–S73. doi: 10.1161/01.cir.0000437741.48606.98 [DOI] [PubMed] [Google Scholar]
  • 14. WHO CVD Risk Chart Working Group . World Health Organization cardiovascular disease risk charts: revised models to estimate risk in 21 global regions. Lancet Glob Health. 2019;7:e1332–e1345. doi: 10.1016/S2214-109X(22)00522-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. SCORE2 Working Group and ESC Cardiovascular Risk Collaboration . SCORE2 risk prediction algorithms: new models to estimate 10‐year risk of cardiovascular disease in Europe. Eur Heart J. 2021;42:2439–2454. doi: 10.1093/eurheartj/ehab309 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. van Daalen KR, Zhang D, Kaptoge S, Paige E, Di Angelantonio E, Pennells L. Risk estimation for the primary prevention of cardiovascular disease: considerations for appropriate risk prediction model selection. Lancet Glob Health. 2024;12:e1343–e1358. doi: 10.1016/S2214-109X(24)00210-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Wolfson J, Vock DM, Bandyopadhyay S, Kottke T, Vazquez‐Benitez G, Johnson P, Adomavicius G, O'Connor PJ. Use and customization of risk scores for predicting cardiovascular events using electronic health record data. J Am Heart Assoc. 2017;6:6. doi: 10.1161/JAHA.116.003670 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Ko DT, Sivaswamy A, Sud M, Kotrri G, Azizi P, Koh M, Austin PC, Lee DS, Roifman I, Thanassoulis G, et al. Calibration and discrimination of the Framingham risk score and the pooled cohort equations. CMAJ. 2020;192:E442–E449. doi: 10.1503/cmaj.190848 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Sud M, Sivaswamy A, Chu A, Austin PC, Anderson TJ, Naimark DMJ, Farkouh ME, Lee DS, Roifman I, Thanassoulis G, et al. Population‐based recalibration of the Framingham risk score and pooled cohort equations. J Am Coll Cardiol. 2022;80:1330–1342. doi: 10.1016/j.jacc.2022.07.026 [DOI] [PubMed] [Google Scholar]
  • 20. Denaxas SC, Morley KI. Big biomedical data and cardiovascular disease research: opportunities and challenges. Eur Heart J Qual Care Clin Outcomes. 2015;1:9–16. doi: 10.1093/ehjqcco/qcv005 [DOI] [PubMed] [Google Scholar]
  • 21. Hippisley‐Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. Derivation and validation of qrisk, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ. 2007;335:136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Hippisley‐Cox J, Coupland CAC, Bafadhel M, Russell REK, Sheikh A, Brindle P, Channon KM. Development and validation of a new algorithm for improved cardiovascular risk prediction. Nat Med. 2024;30:1440–1447. doi: 10.1038/s41591-024-02905-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Hippisley‐Cox J, Coupland C, Brindle P. Development and validation of qrisk3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017;357:j2099. doi: 10.1136/bmj.j2099 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Pylypchuk R, Wells S, Kerr A, Poppe K, Riddell T, Harwood M, Exeter D, Mehta S, Grey C, Wu BP. Cardiovascular disease risk prediction equations in 400 000 primary care patients in New Zealand: a derivation and validation study. Lancet. 2018;391:1897–1907. [DOI] [PubMed] [Google Scholar]
  • 25. Khan SS, Coresh J, Pencina MJ, Ndumele CE, Rangaswami J, Chow SL, Palaniappan LP, Sperling LS, Virani SS, Ho JE, et al. Novel prediction equations for absolute risk assessment of total cardiovascular disease incorporating cardiovascular‐kidney‐metabolic health: a scientific statement from the american heart association. Circulation. 2023;148:1982–2004. doi: 10.1161/CIR.0000000000001191 [DOI] [PubMed] [Google Scholar]
  • 26. Khan SS, Matsushita K, Sang Y, Ballew SH, Grams ME, Surapaneni A, Blaha MJ, Carson AP, Chang AR, Ciemins E, et al. Development and validation of the American Heart Association's PREVENT equations. Circulation. 2024;149:430–449. doi: 10.1161/CIRCULATIONAHA.123.067626 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Larkin H. What to know about prevent, the aha's new cardiovascular disease risk calculator. JAMA. 2024;331:277–279. doi: 10.1001/jama.2023.25115 [DOI] [PubMed] [Google Scholar]
  • 28. Dey D, Slomka PJ, Leeson P, Comaniciu D, Shrestha S, Sengupta PP, Marwick TH. Artificial intelligence in cardiovascular imaging: Jacc state‐of‐the‐art review. J Am Coll Cardiol. 2019;73:1317–1335. doi: 10.1016/j.jacc.2018.12.054 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Wu J, Roy J, Stewart WF. Prediction modeling using ehr data: challenges, strategies, and a comparison of machine learning approaches. Med Care. 2010;48:S106–S113. doi: 10.1097/MLR.0b013e3181de9e17 [DOI] [PubMed] [Google Scholar]
  • 30. Dhingra LS, Shen M, Mangla A, Khera R. Cardiovascular care innovation through data‐driven discoveries in the electronic health record. Am J Cardiol. 2023;203:136–148. doi: 10.1016/j.amjcard.2023.06.104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak. 2020;20:280. doi: 10.1186/s12911-020-01297-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Alloghani M, Al‐Jumeily D, Mustafina J, Hussain A, Aljaaf AJ. A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Berry M, Mohamed A, Yap B, eds Supervised and unsupervised learning for data science. Unsupervised and semi‐supervised learning. Cham: Springer; 2020. doi: 10.1007/978-3-030-22475-2_1 [DOI] [Google Scholar]
  • 33. Sarker IH. Machine learning: algorithms, real‐world applications and research directions. SN Comput Sci. 2021;2:160. doi: 10.1007/s42979-021-00592-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK biobank participants. PLoS One. 2019;14:e0213653. doi: 10.1371/journal.pone.0213653 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Yasmin F, Shah SMI, Naeem A, Shujauddin SM, Jabeen A, Kazmi S, Siddiqui SA, Kumar P, Salman S, Hassan SA, et al. Artificial intelligence in the diagnosis and detection of heart failure: the past, present, and future. Rev Cardiovasc Med. 2021;22:1095–1113. doi: 10.31083/j.rcm2204121 [DOI] [PubMed] [Google Scholar]
  • 36. Kagiyama N, Shrestha S, Farjo PD, Sengupta PP. Artificial intelligence: practical primer for clinical research in cardiovascular disease. J Am Heart Assoc. 2019;8:e012788. doi: 10.1161/JAHA.119.012788 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Sevakula RK, Au‐Yeung WM, Singh JP, Heist EK, Isselbacher EM, Armoundas AA. State‐of‐the‐art machine learning techniques aiming to improve patient outcomes pertaining to the cardiovascular system. J Am Heart Assoc. 2020;9:e013924. doi: 10.1161/JAHA.119.013924 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Mohd Faizal AS, Thevarajah TM, Khor SM, Chang SW. A review of risk prediction models in cardiovascular disease: conventional approach vs. artificial intelligent approach. Comput Methods Prog Biomed. 2021;207:106190. doi: 10.1016/j.cmpb.2021.106190 [DOI] [PubMed] [Google Scholar]
  • 39. Liu W, Laranjo L, Klimis H, Chiang J, Yue J, Marschner S, Quiroz JC, Jorm L, Chow CK. Machine‐learning versus traditional approaches for atherosclerotic cardiovascular risk prognostication in primary prevention cohorts: a systematic review and meta‐analysis. Eur Heart J Qual Care Clin Outcomes. 2023;9:310–322. doi: 10.1093/ehjqcco/qcad017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Petrazzini BO, Chaudhary K, Marquez‐Luna C, Forrest IS, Rocheleau G, Cho J, Narula J, Nadkarni G, Do R. Coronary risk estimation based on clinical data in electronic health records. J Am Coll Cardiol. 2022;79:1155–1166. doi: 10.1016/j.jacc.2022.01.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Betancur J, Otaki Y, Motwani M, Fish MB, Lemley M, Dey D, Gransar H, Tamarappoo B, Germano G, Sharir T, et al. Prognostic value of combined clinical and myocardial perfusion imaging data using machine learning. JACC Cardiovasc Imaging. 2018;11:1000–1009. doi: 10.1016/j.jcmg.2017.07.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Gharehchopogh FS, Khalifelu ZA. Neural Network application in diagnosis of patient: A case study. International Conference on Computer Networks and Information Technology. Abbottabad, Pakistan: Institute of Electrical and Electronics Engineers (IEEE); 2011:245–249. doi: 10.1109/ICCNIT.2011.6020937 [DOI] [Google Scholar]
  • 43. Li Y, Salimi‐Khorshidi G, Rao S, Canoy D, Hassaine A, Lukasiewicz T, Rahimi K, Mamouei M. Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts. Eur Heart J Digit Health. 2022;3:535–547. doi: 10.1093/ehjdh/ztac061 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Krittanawong C, Johnson KW, Choi E, Kaplin S, Venner E, Murugan M, Wang Z, Glicksberg BS, Amos CI, Schatz MC, et al. Artificial intelligence and cardiovascular genetics. Life (Basel). 2022;12:279. doi: 10.3390/life12020279 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Motwani M, Dey D, Berman DS, Germano G, Achenbach S, Al‐Mallah MH, Andreini D, Budoff MJ, Cademartiri F, Callister TQ, et al. Machine learning for prediction of all‐cause mortality in patients with suspected coronary artery disease: a 5‐year multicentre prospective registry analysis. Eur Heart J. 2017;38:500–507. doi: 10.1093/eurheartj/ehw188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Nakanishi R, Dey D, Commandeur F, Slomka P, Betancur J, Gransar H, Dailing C, Osawa K, Berman D, Budoff M. Machine learning in predicting coronary heart disease and cardiovascular disease events: results from the multi‐ethnic study of atherosclerosis (MESA). J Am Coll Cardiol. 2018;71:A1483–A1483. doi: 10.1016/S0735-1097(18)32024-2 [DOI] [Google Scholar]
  • 47. Kwon J, Kim KH, Jeon KH, Park J. Deep learning for predicting in‐hospital mortality among heart disease patients based on echocardiography. Echocardiography. 2019;36:213–218. doi: 10.1111/echo.14220 [DOI] [PubMed] [Google Scholar]
  • 48. Antman EM, Cohen M, Bernink PJ, McCabe CH, Horacek T, Papuchis G, Mautner B, Corbalan R, Radley D, Braunwald E. The TIMI risk score for unstable angina/non‐st elevation mi: a method for prognostication and therapeutic decision making. JAMA. 2000;284:835–842. [DOI] [PubMed] [Google Scholar]
  • 49. Granger CB, Goldberg RJ, Dabbous O, Pieper KS, Eagle KA, Cannon CP, Van De Werf F, Avezum A, Goodman SG, Flather MD, et al. Predictors of hospital mortality in the global registry of acute coronary events. Arch Intern Med. 2003;163:2345–2353. [DOI] [PubMed] [Google Scholar]
  • 50. Li Y, Rao S, Solares JRA, Hassaine A, Ramakrishnan R, Canoy D, Zhu Y, Rahimi K, Salimi‐Khorshidi G. BEHRT: transformer for electronic health records. Sci Rep. 2020;10:7155. doi: 10.1038/s41598-020-62922-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Liu KH, Chiang CY, Wang HY, Tseng YJ. Temporal Phenotype Matrix Engineering for Electronic Health Records – Enhancing Coronary Artery Disease Prediction. 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). Pittsburgh, PA, USA: Institute of Electrical and Electronics Engineers (IEEE); 2023:1–4. doi: 10.1109/BHI58575.2023.10313504 [DOI] [Google Scholar]
  • 52. Sun X, Yin Y, Yang Q, Huo T. Artificial intelligence in cardiovascular diseases: diagnostic and therapeutic perspectives. Eur J Med Res. 2023;28:242. doi: 10.1186/s40001-023-01065-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Yu J, Yang X, Deng Y, Krefman AE, Pool LR, Zhao L, Mi X, Ning H, Wilkins J, Lloyd‐Jones DM, et al. Incorporating longitudinal history of risk factors into atherosclerotic cardiovascular disease risk prediction using deep learning. Sci Rep. 2024;14:2554. doi: 10.1038/s41598-024-51685-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Gianfrancesco MA, Goldstein ND. A narrative review on the validity of electronic health record‐based research in epidemiology. BMC Med Res Methodol. 2021;21:234. doi: 10.1186/s12874-021-01416-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. McGuckin T, Crick K, Myroniuk TW, Setchell B, Yeung RO, Campbell‐Scherer D. Understanding challenges of using routinely collected health data to address clinical care gaps: a case study in alberta, Canada. BMJ Open Qual. 2022;11:e001491. doi: 10.1136/bmjoq-2021-001491 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Ellenberg JH. Selection bias in observational and experimental studies. Stat Med. 1994;13:557–567. doi: 10.1002/sim.4780130518 [DOI] [PubMed] [Google Scholar]
  • 57. Barzi F, Woodward M. Imputations of missing values in practice: results from imputations of serum cholesterol in 28 cohort studies. Am J Epidemiol. 2004;160:34–45. doi: 10.1093/aje/kwh175 [DOI] [PubMed] [Google Scholar]
  • 58. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Tang F, Ishwaran H. Random forest missing data algorithms. Stat Anal Data Min. 2017;10:363–377. doi: 10.1002/sam.11348 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Roche N, Reddel H, Martin R, Brusselle G, Papi A, Thomas M, Postma D, Thomas V, Rand C, Chisholm A, et al. Quality standards for real‐world research. Focus on observational database studies of comparative effectiveness. Ann Am Thorac Soc. 2014;11:S99–S104. doi: 10.1513/AnnalsATS.201309-300RM [DOI] [PubMed] [Google Scholar]
  • 61. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012;19:54–60. doi: 10.1136/amiajnl-2011-000376 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong IC, Rijnbeek PR, et al. Observational health data sciences and informatics (ohdsi): opportunities for observational researchers. Stud Health Technol Inform. 2015;216:574–578. [PMC free article] [PubMed] [Google Scholar]
  • 63. Ayaz M, Pasha MF, Alzahrani MY, Budiarto R, Stiawan D. The fast health interoperability resources (fhir) standard: systematic literature review of implementations, applications, challenges and opportunities. JMIR Med Inform. 2021;9:e21929. doi: 10.2196/21929 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Vorisek CN, Lehne M, Klopfenstein SAI, Mayer PJ, Bartschke A, Haese T, Thun S. Fast healthcare interoperability resources (fhir) for interoperability in health research: systematic review. JMIR Med Inform. 2022;10:e35724. doi: 10.2196/35724 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Rudrapatna VA, Glicksberg BS, Avila P, Harding‐Theobald E, Wang C, Butte AJ. Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates. BMJ Open Qual. 2020;9:e000856. doi: 10.1136/bmjoq-2019-000856 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13:350–359. doi: 10.1038/nrcardio.2016.42 [DOI] [PubMed] [Google Scholar]
  • 67. Damen JA, Hooft L, Schuit E, Debray TP, Collins GS, Tzoulaki I, Lassale CM, Siontis GC, Chiocchia V, Roberts C, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416. doi: 10.1136/bmj.i2416 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Lopez‐Neyman SM, Davis K, Zohoori N, Broughton KS, Moore CE, Miketinas D. Racial disparities and prevalence of cardiovascular disease risk factors, cardiometabolic risk factors, and cardiovascular health metrics among us adults: Nhanes 2011–2018. Sci Rep. 2022;12:19475. doi: 10.1038/s41598-022-21878-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Hong C, Pencina MJ, Wojdyla DM, Hall JL, Judd SE, Cary M, Engelhard MM, Berchuck S, Xian Y, D'Agostino R Sr, et al. Predictive accuracy of stroke risk prediction models across black and white race, sex, and age groups. JAMA. 2023;329:306–317. doi: 10.1001/jama.2022.24683 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Paradies Y, Ben J, Denson N, Elias A, Priest N, Pieterse A, Gupta A, Kelaher M, Gee G. Racism as a determinant of health: a systematic review and meta‐analysis. PLoS One. 2015;10:e0138511. doi: 10.1371/journal.pone.0138511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Thomas Craig KJ, Fusco N, Gunnarsdottir T, Chamberland L, Snowdon JL, Kassler WJ. Leveraging data and digital health technologies to assess and impact social determinants of health (SDOH): a state‐of‐the‐art literature review. Online J Public Health Inform. 2021;13:E14. doi: 10.5210/ojphi.v13i3.11081 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. McNeill E, Lindenfeld Z, Mostafa L, Zein D, Silver D, Pagan J, Weeks WB, Aerts A, Des Rosiers S, Boch J, et al. Uses of social determinants of health data to address cardiovascular disease and health equity: a scoping review. J Am Heart Assoc. 2023;12:e030571. doi: 10.1161/JAHA.123.030571 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Rusanov A, Weiskopf NG, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. 2014;14:1–9. doi: 10.1186/1472-6947-14-51 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Goldstein BA, Bhavsar NA, Phelan M, Pencina MJ. Controlling for informed presence bias due to the number of health encounters in an electronic health record. Am J Epidemiol. 2016;184:847–855. doi: 10.1093/aje/kww112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record derived data. EGEMS (Wash DC). 2013;1:1035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): the tripod statement. Ann Intern Med. 2015;162:55–63. doi: 10.7326/M14-0698 [DOI] [PubMed] [Google Scholar]
  • 77. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): explanation and elaboration. Ann Intern Med. 2015;162:W1–W73. doi: 10.7326/M14-0698 [DOI] [PubMed] [Google Scholar]
  • 78. Kaufman S, Rosset S, Perlich C, Stitelman O. Leakage in data mining: formulation, detection, and avoidance. ACM Trans Knowl Discov Data (TKDD). 2012;6:1–21. doi: 10.1145/2382577.2382579 [DOI] [Google Scholar]
  • 79. Olsavszky V, Dosius M, Vladescu C, Benecke J. Time series analysis and forecasting with automated machine learning on a national icd‐10 database. Int J Environ Res Public Health. 2020;17:4979. doi: 10.3390/ijerph17144979 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, de Jaegere P, Moore JH, Denaxas S, Boulesteix AL, et al. Critical appraisal of artificial intelligence‐based prediction models for cardiovascular disease. Eur Heart J. 2022;43:2921–2930. doi: 10.1093/eurheartj/ehac238 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, Aldairem A, Alrashed M, Bin Saleh K, Badreldin HA, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689. doi: 10.1186/s12909-023-04698-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Sadeghi Z, Alizadehsani R, Cifci MA, Kausar S, Rehman R, Mahanta P, Bora PK, Almasri A, Alkhawaldeh RS, Hussain S. A review of explainable artificial intelligence in healthcare. Comput Electr Eng. 2024;118:109370. doi: 10.1016/j.compeleceng.2024.109370 [DOI] [Google Scholar]
  • 83. Hussain L, Awan IA, Aziz W, Saeed S, Ali A, Zeeshan F, Kwak KS. Detecting congestive heart failure by extracting multimodal features and employing machine learning techniques. Biomed Res Int. 2020;2020:4281243. doi: 10.1155/2020/4281243 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Wang F, Preininger A. Ai in health: state of the art, challenges, and future directions. Yearb Med Inform. 2019;28:16–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. Gala D, Behl H, Shah M, Makaryus AN. The role of artificial intelligence in improving patient outcomes and future of healthcare delivery in cardiology: a narrative review of the literature. Healthcare (Basel). 2024;12:481. doi: 10.3390/healthcare12040481 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. Shu S, Ren J, Song J. Clinical application of machine learning‐based artificial intelligence in the diagnosis, prediction, and classification of cardiovascular diseases. Circ J. 2021;85:1416–1425. doi: 10.1253/circj.CJ-20-1121 [DOI] [PubMed] [Google Scholar]
  • 87. Preechakul K, Sriswasdi S, Kijsirikul B, Chuangsuwanich E. Improved image classification explainability with high‐accuracy heatmaps. iScience. 2022;25:103933. doi: 10.1016/j.isci.2022.103933 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Feldhus N, Hennig L, Nasert MD, Ebert C, Schwarzenberg R, Moller S. Saliency map verbalization: comparing feature importance representations from model‐free and instruction‐based methods. Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)2022.
  • 89. Akbilgic O. Principles of artificial intellgence for medicine. J Am Heart Assoc. 2024;13:e035815. doi: 10.1161/JAHA.124.035815 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From local explanations to global understanding with explainable ai for trees. Nat Mach Intell. 2020;2:56–67. doi: 10.1038/s42256-019-0138-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91. Ribeiro MT, Singh S, Guestrin C. Why should I trust you? Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). New York, NY, USA: Association for Computing Machinery; 2016:1135–1144. doi: 10.1145/2939672.2939778 [DOI] [Google Scholar]
  • 92. Messenger JC, Ho KK, Young CH, Slattery LE, Draoui JC, Curtis JP, Dehmer GJ, Grover FL, Mirro MJ, Reynolds MR, et al. The national cardiovascular data registry (NCDR) data quality brief: the ncdr data quality program in 2012. J Am Coll Cardiol. 2012;60:1484–1488. doi: 10.1016/j.jacc.2012.07.020 [DOI] [PubMed] [Google Scholar]
  • 93. Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ, Spirit AI, Group C‐AW , Spirit AI, Group C‐AS , Spirit AI , et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the spirit‐ai extension. Nat Med. 2020;26:1351–1363. doi: 10.1038/s41591-020-1037-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK, Spirit AI, Group C‐AW . Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the consort‐ai extension. Nat Med. 2020;26:1364–1374. doi: 10.1038/s41591-020-1034-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3:17. doi: 10.1038/s41746-020-0221-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Kulkarni PA, Singh H. Artificial intelligence in clinical diagnosis: opportunities, challenges, and hype. JAMA. 2023;330:317–318. doi: 10.1001/jama.2023.11440 [DOI] [PubMed] [Google Scholar]
  • 97. Peng C, Yang X, Chen A, Smith KE, PourNejatian N, Costa AB, Martin C, Flores MG, Zhang Y, Magoc T, et al. A study of generative large language model for medical research and healthcare. NPJ Digit Med. 2023;6:210. doi: 10.1038/s41746-023-00958-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Nolin‐Lapalme A, Theriault‐Lauzier P, Corbin D, Tastet O, Sharma A, Hussin JG, Kadoury S, Jiang R, Krahn AD, Gallo R, et al. Maximising large language model utility in cardiovascular care: a practical guide. Can J Cardiol. 2024;40:1774–1787. doi: 10.1016/j.cjca.2024.05.024 [DOI] [PubMed] [Google Scholar]

Articles from Journal of the American Heart Association: Cardiovascular and Cerebrovascular Disease are provided here courtesy of Wiley
