Development and validation of multicentre study on novel Artificial Intelligence-based Cardiovascular Risk Score (AICVD)

Shiv Kumar Jalepalli; Prashant Gupta; Andre L A J Dekker; Inigo Bermejo; Sujoy Kar

doi:10.1136/fmch-2023-002340

2024 Jan 18;12(Suppl 1):e002340.

PMCID: PMC10806469 PMID: 38238156

Abstract

Objective

Cardiovascular diseases (CVD) are among the most prevalent diseases in India, accounting for nearly 30% of total deaths. A dearth of research on CVD risk scores in the Indian population, the limited performance of conventional risk scores and their inability to reproduce the initial accuracies in randomised clinical trials led to this study on large-scale patient data. The objective is to develop an Artificial Intelligence-based Risk Score (AICVD) to predict a CVD event (eg, acute myocardial infarction/acute coronary syndrome) in the next 10 years and to compare the model with the Framingham Heart Risk Score (FHRS) and QRisk3.

Methods

Our study included 31 599 participants aged 18–91 years from 2009 to 2018 in six Apollo Hospitals in India. A multistep risk factors selection process using Spearman correlation coefficient and propensity score matching yielded 21 risk factors. A deep learning hazards model was built on risk factors to predict event occurrence (classification) and time to event (hazards model) using multilayered neural network. Further, the model was validated with independent retrospective cohorts of participants from India and the Netherlands and compared with FHRS and QRisk3.

Results

The deep learning hazards model performed well (area under the curve (AUC) 0.853). Validation and comparative results showed AUCs between 0.84 and 0.92, with a better positive likelihood ratio (AICVD 6.16 vs FHRS 2.24 and QRisk3 1.16) and accuracy (AICVD 80.15% vs FHRS 59.71% and QRisk3 51.57%). In the Netherlands cohort, AICVD also outperformed the Framingham Heart Risk Model (AUC 0.737 vs 0.707).

Conclusions

This study concludes that the novel AI-based CVD Risk Score has a higher predictive performance for cardiac events than conventional risk scores in Indian population.

Trial registration number

CTRI/2019/07/020471.

Keywords: Cardiovascular Diseases, Hypertension, Medical Informatics, Preventive Medicine

WHAT IS ALREADY KNOWN ON THIS TOPIC

Over 18 million people die each year from heart disease, an estimated one-third of all deaths worldwide. Cardiovascular disease is an epidemic in India, with one of the highest burdens of events and comorbidities.

WHAT THIS STUDY ADDS

The deep learning algorithm measures risk by ‘high’, ‘moderate’ and ‘minimal’ and beyond clinical factors incorporates patient’s lifestyle (eg, diet, tobacco use), physical activity, mental health and other routine vital parameters. The certified (ISO 13485), multisite-validated (India, the Netherlands, others), peer-reviewed (Cardiology Society of India) algorithm also delivers insights about actions doctors and individuals can take to lower their risk.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

The Artificial Intelligence-powered Cardiovascular Disease Risk Tool has shown accuracy that is superior to the conventional risk scores. The algorithm continues to be recalibrated by data from prospective use in over half a million individuals, including wearable daily dynamic data.

Introduction

Cardiovascular diseases (CVD) are a major cause of death around the globe. CVDs have grown to epidemic proportion¹ in India, where lifestyle patterns have changed significantly due to growing urbanisation. One of the major effects of this has been that deaths related to CVDs have almost doubled in the last decade.^{1 2} As per an Indian Council of Medical Research-India Diabetes report,³ the overall age-standardised prevalence of hypertension was 26.3% and almost four-fifths of the population was unaware about it.

There have been multiple epidemiological studies in India to ascertain the risk factors for CVD (online supplemental appendix 1). The Andhra Pradesh Rural Health Initiative (APHRI)⁴ study, conducted in Andhra Pradesh state validating the WHO/ISH (International Society of Hypertension) model, was limited to conventional factors (age, systolic blood pressure (SBP)). India Heart Watch⁵ opines that there are substantial regional variations in risk factors for CVD mortality and thus a need to study the prevalence of multiple cardiovascular risk factors in different regions and correlate them with variations in CVD mortality using a uniform protocol. Kanjilal et al⁶ compared different risk scoring systems (Framingham Risk Score, Systematic Coronary Risk Evaluation (SCORE), etc) in individuals with a family history of CVD and elevated levels of lipids and proinflammatory, prothrombotic and serological markers, and concluded that these risk scores identified <5% of the population as being at ‘high-risk’.

Worldwide, outcomes of multiple studies on CVD risk factors have been effectively used in many countries for better cardiac disease management. Framingham Study⁷ used longitudinal data and various statistical hazards models to predict the risk of events like CVD, stroke, hypertension and diabetes in the next 10–30 years. SCORE⁸ 2001 model was built using data from ~2 million patients from 12 European countries. The model was evaluated in over 10 000 patients from 2002 to 2012 with area under the curve (AUC) on different models varying from 0.71 to 0.84 for different countries.⁹ Similarly, Prospective Cardiovascular Münster¹⁰ was initiated to determine the prevalence of coronary heart disease risk factors in the German population. A list of these conventional risk scores studied is provided in figure 1.

Figure 1. Conventional risk scores and their respective risk factors for prediction of cardiovascular disease (CVD). *Covariates included. ^Only among those with diabetes. +Definitions of positive family history may vary. AAF, African American females; AAM, African American males; ACC, American College of Cardiology; AHA, American Heart Association; ARIC, atherosclerosis risk in communities; ASCVD, atherosclerotic cardiovascular disease; BMI, body mass index; BP, blood pressure; CARDIA, coronary artery risk development in young adults; CHD, coronary heart disease; CHS, cardiovascular health study; Chol, cholesterol; CRP, C reactive protein; EAF, European American females; EAM, European American males; EF, European females; EM, European males; HDL, high-density lipoprotein; ISH, International Society of Hypertension; LDL, low-density lipoprotein; NHLBI, National Heart, Lung and Blood Institute; PROCAM, Prospective Cardiovascular Münster.

However, these risk models do not work well on patients from other continents or other ethnic and socioeconomic groups. The Renfrew/Paisley study¹¹ showed that the Framingham model yielded a low specificity of 0.56 for patients from Scotland. The Southall and Brent REvisited (SABRE) study¹² showed that neither the Framingham Risk Score nor QRisk2 performed consistently well across ethnic groups, calling for further validation of the QRisk Score in South Asian and African Caribbean populations.

Finally, Chow et al¹³ showed that Framingham equations recalibrated on an Indian population are unlikely to be relevant to all regions of India, demonstrating the need for local data collection to develop a relevant CVD risk tool specifically for the Indian population. Considering this, our objective was to conduct a large-scale study to find the important risk factors for CVD in the Indian population and to develop and validate an artificial intelligence (AI) model to predict the chance of CVD over the next 10 years. Cox proportional hazards and deep learning models were developed to measure the risk of a cardiovascular event with 14 predictors.

Deep learning is a representation learning method; it is used here to learn the layers that transform clinical predictor data non-linearly while capturing the hierarchical relationship and interplay between different clinical predictors.

Study design and methodology

Source of data

Clinical features, history and vitals (online supplemental appendix 2) from over 40 000 individuals’ preventive health check (electronic medical) records were collected between 2009 and 2018 from four major centres (Chennai, Hyderabad, Karnataka and Mumbai) of Apollo Hospitals. These health checks were a mixture of generic health checks, corporate-based annual health checks and specific heart health checks, with patients aged between 18 and 91 years and covering different socioeconomic groups. All the records were deidentified by removing patient health information before any further use. Apollo Hospitals uses codified data that are stored in tabular format on secure servers and can be retrieved through appropriate structured query language (SQL) queries. The schematic flow of the study design and methodology is provided in figure 2.

Figure 2. Flow chart of the development, validation and comparison process for the Artificial Intelligence-based Cardiovascular Risk Score (AICVD) study. ACS, acute coronary syndrome; AMI, acute myocardial infarction; AUC, area under the curve; CCA, complete case analysis; CVD, cardiovascular disease.

Participants

Individuals in this preventive health check database at Apollo Hospitals were followed up longitudinally between 2009 and 2018. CVD events were defined as outcome events (fatal or non-fatal) in which patients were admitted with acute myocardial infarction, acute coronary syndrome, etc. (For the complete definition of the event, refer to online supplemental appendix 3.) The health check records were mapped longitudinally to patients who had discharge summaries for CVD in the electronic medical records (coded using the International Classification of Diseases-Tenth Revision). These patients formed the set of positive instances. To obtain negative instances for model training, we identified participants without any documented CVD who attended a health check at least twice within the 10-year period, with a minimum gap of 5 years (negative span) and the last visit documented in 2018. This was done to ensure that the negative instances captured individuals with no CVD. After this filtering, the final training dataset included 21 956 males and 9643 females aged between 18 and 91, comprising 7035 individuals with one or more health checks and a CVD event and 24 564 individuals with two or more health checks but no CVD event.
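The negative-instance rule described above can be sketched as a simple filter. This is an illustrative sketch only: the function name, its parameters and the reading of the "5 years of gap" as the span between the first and last recorded health check are assumptions, not the study's actual extraction logic.

```python
from datetime import date

def is_negative_instance(visit_dates, has_cvd_record,
                         min_gap_years=5, last_visit_year=2018):
    """Return True if a participant qualifies as a negative (no-CVD) instance.

    Assumed rule: no documented CVD, at least two health checks, at least
    `min_gap_years` between the first and last check, last check in 2018.
    """
    if has_cvd_record or len(visit_dates) < 2:
        return False
    first, last = min(visit_dates), max(visit_dates)
    gap_years = (last - first).days / 365.25
    return last.year == last_visit_year and gap_years >= min_gap_years

# Hypothetical participants
eligible = is_negative_instance([date(2012, 3, 1), date(2018, 6, 15)], False)
too_short = is_negative_instance([date(2016, 1, 10), date(2018, 2, 1)], False)
print(eligible, too_short)
```

Requiring a long event-free span trades sample size for cleaner negative labels, which is the rationale the text gives for this filter.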

In the Maastricht University Medical Centre (MUMC) validation set, there were 5599 (45.38%) males and 6739 (54.61%) females. The preanaesthesia cohort from MUMC included patients from general and day care surgery who had their preanaesthesia check as outpatients, and excluded patients undergoing cardiovascular surgery, cancer surgery, brain or neurosurgery, transplant and joint replacement surgery.

The eligibility criteria included adult patients and their details are further provided in online supplemental appendix 4: inclusion and exclusion criteria. The outcome criteria for positive instance included cardiovascular event described in online supplemental appendix 3: definition of cardiac event.

All methods were carried out in accordance with relevant guidelines and regulations. Due to the retrospective nature of the study, the need for informed consent was waived by the ethics committees. It is further registered with the Clinical Trial Registry of India (CTRI/2019/07/020471).

Patient and public involvement

The study has been conducted with retrospective data from 2009 to 2018, and hence development of the research question and outcome measures were not directly communicated to the patients. Patients were not directly or indirectly involved in the design of the study.

Predictors selection

Predictors selection process began by studying risk factors used in past heart risk studies.^{7 8 12} Most commonly used parameters include age, gender, body mass index (BMI), SBP and smoking habits. Apollo Hospitals collects more than 50 clinical parameters during health check-up (online supplemental appendix 2: list of clinical parameters). As the next step for variable selection, correlation coefficient by Spearman, Pearson and Kendall¹⁴ was used to correlate risk factors to CVD event (online supplemental appendix 5), Kendall being the more intuitive due to probability of concordance and discordance between the parameter’s rankings. As per the guidelines by Mukaka¹⁵ on efficient use of correlation in medical research, we decided to retain only those parameters with the correlation >0.3 or <−0.3. This approach selected widely accepted risk factors and identified non-traditional ones.
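The correlation-based screening step can be illustrated with a small, dependency-free sketch. It computes a Spearman rank correlation between each candidate parameter and the event label and keeps parameters whose coefficient is above 0.3 or below −0.3. The feature names and values are toy data invented for illustration, and the helper functions are hypothetical, not the study's code.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_predictors(features, outcome, threshold=0.3):
    """Keep features whose |rho| with the outcome exceeds the threshold."""
    kept = {}
    for name, values in features.items():
        rho = spearman(values, outcome)
        if abs(rho) > threshold:
            kept[name] = rho
    return kept

# Toy data: 1 = CVD event, 0 = no event
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "sbp": [110, 115, 120, 135, 140, 150],
    "noise": [5, 1, 4, 2, 6, 3],
}
selected = select_predictors(features, outcome)
print(selected)
```

In this toy example only `sbp` passes the threshold; the weakly correlated `noise` column is dropped.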

Table 1 shows selected predictors with their age and gender categories. This final list of predictors was used to build the cardiac risk models. The timing for predictor measurement was determined during the health check of the individual (online supplemental appendix 6: definition of clinical parameters).

Table 1.

Age and gender-related data of significant risk predictors

	Males				Females
Age groups	<40	40–55	56–70	>70	<40	40–55	56–70	>70
Development cohort
n	3823	10 516	6741	876	2067	4939	2429	208
BMI	27.02±4.26	27.10±3.85	26.53±3.81	28.31±9.16	27.57±5.07	28.84±4.65	28.45±4.95	26.88±4.75
Systolic BP	123.48±13.29	127.33±15.02	130.85±16.15	133.075±16.40	117.81±13.89	126.10±15.52	132.67±17.76	136.14±20.21
Diastolic BP	80.35±8.80	81.61±9.23	80.51±8.93	25.74±3.38	76.47±8.96	79.86±8.81	80.06±8.91	79.50±8.75
Smoking (%)	21.84	20.87	18.67	16.32	0.24	0.18	0.20	0.14
Chewing tobacco (%)	14.56	18.14	19.50	14.85	8.27	11.07	11.65	10.06
Hypertension (%)	7.06	19.17	31.44	36.87	3.48	17.65	33.22	43.75
Diabetes mellitus (%)	5.93	20.45	30.79	31.05	4.49	16.40	27.91	31.37
Dyslipidaemia (%)	8.13	14.47	16.64	16.09	3.38	11.03	18.27	16.82
Diet (vegetarian), %	85.11	80.20	76.16	65.41	81.23	72.70	69.08	63.94
Physical activity (no), %	65.36	64.95	64.58	69.40	60.32	64.14	66.32	75
Alcohol (yes), %	37.03	36.13	25.46	18.71	0.56	0.49	1.33	1.44
Pulse rate	73.99±7.67	74.41±8.42	73.66±8.61	72.18±9.42	74.62±7.22	74.61±7.84	74.08±8.39	75.25±10.43
Validation cohort
n	156	1204	906	216	97	303	299	65
BMI	26.30±3.85	25.75±3.61	25.69±3.58	24.96±3.64	24.31±4.48	26.61±3.66	27.20±4.28	24.96±4.62
Systolic BP	119.16±12.45	121.73±13.59	125.02±14.57	129.35±15.18	119.16±9.22	118.05±14.84	124.50±17.046	130.29±16.12
Diastolic BP	77.48±8.84	79.93±8.52	79.71±8.68	77.57±8.15	71.71±8.77	75.94±8.35	77.73±8.36	80.76±7.64
Smoking (%)	58.97	26.27	20.96	15.27	6.18	3.63	3.196	3.07
Chewing tobacco (%)	67.30	63.50	69.86	40.55	51.54	56.10	43.22	21.53
Hypertension (%)	7.051	17.71	23.47	25.46	5.15	20.46	40.84	12.30
Diabetes mellitus (%)	10.25	12.77	17.14	19.91	9.27	12.21	19.06	12.31
Dyslipidaemia (%)	33.33	33.608	24.78	18.05	19.58	25.74	37.96	15.38
Diet (vegetarian), %	76.28	64.33	63.53	54.16	42.26	64.35	50.06	44.61
Physical activity (no), %	63.33	65.42	64.58	78.71	65.46	64.85	75.48	91.77
Alcohol (yes), %	41.03	38.12	28.37	12.96	3.09	4.62	3.01	1.54
Pulse rate	82.21±6.26	83.781±5.75	82.99±7.76	78.3±9.73	79.90±9.17	78.38±8.46	77.62±8.56	83.71±8.56
MUMC validation cohort
	Male				Female
	<40	40–55	56–70	>70	<40	40–55	56–70	>70
Validation cohort
n	996	1243	1996	1364	1504	1805	2078	1352
BMI	24.59±4.27	27.05±4.62	27.33±4.42	26.49±3.85	25.94±6.05	26.79±5.52	26.97±5.46	26.71±4.78
Systolic BP	128.60±12.55	133.56±15.95	139.60±19.14	142.11±21.64	121.80±13.57	131.32±18.11	139.61±20.77	146.83±22.67
Diastolic BP	74.68±10.70	81.99±11.51	81.32±11.24	77.41±11.53	72.79±10.11	79.10±11.32	79.05±11.51	77.96±11.87
Smoking (imputed), %	70.58	72.26	72.14	73.15	69.88	67.94	69.74	68.55
Chewing tobacco	–	–	–	–	–	–	–	–
Hypertension (%)	1.10	7.96	17.53	23.46	1.19	8.19	16.65	24.63
Diabetes mellitus (%)	0.40	3.45	7.41	7.84	0.79	2.32	5.10	9.54
Dyslipidaemia	–	–	–	–	–	–	–	–
Diet (vegetarian)	100	100	100	100	100	100	100	100
Physical activity (no), %	3.91	11.74	21.99	38.41	6.98	14.01	28.01	53.69
Alcohol (yes)	–	–	–	–	–	–	–	–
Pulse rate	72.74±20.25	75.44±13.31	74.83±13.74	72.95±14.09	80.14±20.93	78.31±12.78	77.25±13.12	76.72±13.46

BMI, body mass index; BP, blood pressure; MUMC, Maastricht University Medical Centre.

For statistical modelling, these predictors were further categorised into either categorical or numerical valued features. Predictors like age, BMI, diastolic blood pressure (DBP)/SBP and pulse rate were numerical. Alcohol, smoking and tobacco use were categorised into current, past or non-consumer. History of diabetes mellitus, hypertension, dyslipidaemia, gender, diet (vegetarian) and physical activity were used as binary features.
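As a rough illustration of this feature typing, the sketch below encodes one health-check record into a 14-element numeric vector and then applies feature-wise min-max scaling, since the model input is later described as normalised between 0 and 1. The field names, the ordinal codes for the three-level habits and the patient values are all assumptions made for the example; the source does not state the actual coding.

```python
NUMERICAL = ["age", "bmi", "sbp", "dbp", "pulse_rate"]
THREE_LEVEL = ["alcohol", "smoking", "chewing_tobacco"]
BINARY = ["diabetes", "hypertension", "dyslipidaemia",
          "male", "vegetarian", "physically_active"]

# Assumed ordinal codes; the source does not state how the levels are coded.
HABIT_CODES = {"non-consumer": 0.0, "past": 1.0, "current": 2.0}

def encode_record(record):
    """Turn one record (a dict) into a flat list of 14 floats."""
    vector = [float(record[name]) for name in NUMERICAL]
    vector += [HABIT_CODES[record[name]] for name in THREE_LEVEL]
    vector += [1.0 if record[name] else 0.0 for name in BINARY]
    return vector

def min_max_normalise(rows):
    """Scale each column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]

# Two hypothetical patients
patients = [
    {"age": 45, "bmi": 27.1, "sbp": 127, "dbp": 82, "pulse_rate": 74,
     "alcohol": "current", "smoking": "non-consumer", "chewing_tobacco": "past",
     "diabetes": True, "hypertension": False, "dyslipidaemia": False,
     "male": True, "vegetarian": True, "physically_active": False},
    {"age": 62, "bmi": 26.5, "sbp": 131, "dbp": 80, "pulse_rate": 73,
     "alcohol": "non-consumer", "smoking": "current", "chewing_tobacco": "non-consumer",
     "diabetes": False, "hypertension": True, "dyslipidaemia": True,
     "male": False, "vegetarian": False, "physically_active": True},
]
encoded = [encode_record(p) for p in patients]
normalised = min_max_normalise(encoded)
print(len(encoded[0]), normalised[0][:5])
```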

Sample size

A power calculation study showed our study sample size (31 599) was sufficient for building a model determining the population size (1.3 billion), given the expected margin of error (±0.71%), considering the event rate (~22%) and number of predictors (14). Sample size calculations for validation cohort at Apollo Hospitals and MUMC were also found sufficient.

Missing data and addressing selection bias

Complete case analysis was followed for the Apollo Hospitals data, that is, only those subjects for whom the data were completely available were selected. Any observation with a missing value for any variable is automatically discarded and only complete observations are analysed.¹⁶ There were no significant differences arising from the potential loss of information in discarding the more than 5000 incomplete cases across the model and validation cohorts combined, negating the potential selection bias. Propensity score matching was added to help address selection bias, unmeasured confounders and heterogeneity of comorbidities, their treatment and lifestyle features, which can vary across subjects and cohorts over the studied period.¹⁷
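Complete case analysis amounts to a simple row filter, sketched below. The record layout and field names are hypothetical; the propensity score matching step is not shown.

```python
def complete_cases(records, required_fields):
    """Keep only records in which every required field is present and not None."""
    return [r for r in records
            if all(r.get(field) is not None for field in required_fields)]

# Hypothetical records: the second one is missing a BMI value
records = [
    {"age": 51, "bmi": 26.4, "sbp": 128},
    {"age": 47, "bmi": None, "sbp": 135},
    {"age": 63, "bmi": 29.0, "sbp": 141},
]
kept = complete_cases(records, ["age", "bmi", "sbp"])
print(len(kept), "of", len(records), "records retained")
```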

In MUMC data, multiple imputation method was used.¹⁶ The dataset contained 11 of the predictors used in the core model but was missing three, including the dyslipidaemia in patients, which was imputed using a categorical variable encoding and logistic regression model derived from Maastricht Study diabetes dataset (table 1: MUMC validation cohort).

Additional predictors

Seven additional predictors (family history, medical history, previous heart diseases, heart rate, current cardiovascular symptoms, pulse rhythm and respiratory rate) are being studied, based on their feature importance and OR, to identify their effects on the models and their impact on development of the CVD risk score. These predictors have been further studied on both the development and validation cohorts at Apollo Hospitals.

Modelling

Hazards models are traditionally used for time to event prediction tasks. Even though these models predict the time to event, they have a unique advantage over regression models in that they can use the right censored data (patient not getting disease in the study span) to further improve accuracy.

We decided to create two separate models using hazards models as well as deep learning model. The logic behind creating two different models is that the hazards models can provide better inferencing of covariates while deep learning can potentially give better prediction scores.

Cox proportional hazards model

There have been a large number of studies on survival analysis using different types of survival models, including basic non-parametric estimators like Kaplan-Meier¹⁸ and Nelson-Aalen,¹⁹ fully parametric models like Weibull and semiparametric models like Aalen’s additive,²⁰ and Cox proportional hazards models.²¹ All of these different models were tried. While constructing the models, negative labelled data were used as censored data, ties between two events occurring at the same time were handled using Newton-Raphson method, as it requires fewer iterations to converge than Breslow method in adequate data size, and gender was used as the stratification parameter. Across various hazards models tried, Cox proportional hazards models yield the highest receiver operating characteristic (ROC)-AUC and concordance numbers.
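To illustrate how censored (event-free) participants enter a survival estimate, here is a minimal Kaplan-Meier sketch, the simplest of the estimators listed above. The follow-up times and event flags are toy values; this is not the study's implementation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        died = sum(1 for x, e in zip(times, events) if x == t and e)
        if died:
            survival *= 1 - died / at_risk
            curve.append((t, survival))
    return curve

# Toy cohort: follow-up in years; subjects 3 and 5 are censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Censored subjects never trigger a drop in the curve, but they stay in the at-risk denominator until their follow-up ends, which is how they still contribute information.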

Deep survival model

Neural networks have been applied to survival data for a couple of decades, including the works of Faraggi and Simon²² (who proposed neural networks with Newton-Raphson optimisation) and Biganzoli et al⁹ (where a feed forward network with partial logistic regression was used). However, these models did not improve much on hazards model benchmarks. With the recent advances in deep learning, there have been multiple recent studies applying deep neural networks to survival analysis. Yousefi et al²³ proposed the use of Cox regression as the last layer of a deep learning model, with a maximum likelihood loss function for back propagation weight updates.

Based on similar approaches, we tried various combinations of deep neural network architectures. The algorithm was developed in Python V.3.7.0. Deep learning on the entire set of health check parameters was not chosen as it may lead to inexplicable learning; it was therefore applied only to the 14 (and 21) parameters selected using the statistical methods. The best deep neural survival architecture and sequence of mathematical functions are illustrated in figure 3.

Figure 3. The architecture for the deep learning model used in development of the Artificial Intelligence-based Cardiovascular Risk Score (AICVD). AUC, area under the curve; BMI, body mass index; BP, blood pressure.

Input to the model is the normalised predictors (between 0 and 1), using feature-wise normalisation, where each neuron represents one risk factor. This is followed by four layers of the deep neural network of sizes 100, 70, 40 and 35, respectively. Four layers trained for 50 epochs were used; the corresponding AUC, area under the precision-recall curve (PRC), accuracy and loss were noted at each epoch, and this configuration was found to provide better output than other combinations (such as three or five layers with different numbers of epochs). At the final layer, we overcame the major challenge of merging the classification and regression losses. As the regression loss was a numerical value (the difference between the actual and predicted number of days) while the binary classification loss would be between 0 and 1 (no event and event), we brought both losses into the same range by adding a min-max scaling-based loss, in which the regression loss is scaled down to between 0 and 1. The final algorithm based on this calculation is used to build the deep survival model and an application programming interface (API) to calculate the risk score. The combination of deep learning and the Cox regression model is further elaborated in the Discussion section.

Input to the model is normalised risk factors (between 0 and 1), where each neuron represents each risk factor. This is followed by four layers of deep neural network of sizes 100, 70, 40 and 25, respectively.

RL1=FC₁₀₀(Input)∈ ℝ¹⁰⁰ (1)

RL2=FC₇₀(RL1)∈ ℝ⁷⁰ (2)

RL3=FC₄₀(RL2)∈ ℝ⁴⁰ (3)

RL4=FC₂₅(RL3)∈ ℝ²⁵ (4)

As the next step, a regression layer was added, where job of this layer was to find survival days (number of days before cardiac event).

CoxLayer=FC1(RL4)∈ℝ¹ (5)
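Equations 1 to 5 describe a stack of fully connected layers ending in a single-output Cox layer. The forward pass can be sketched in plain Python as follows. The ReLU activation, the random (He-style) weight initialisation and all function names are assumptions made for illustration; the source does not state the activation function or initialisation used, and the weights here are untrained.

```python
import random

def dense(inputs, weights, biases, relu=True):
    """Fully connected layer: one output per row of `weights`."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(max(0.0, z) if relu else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Random weights (assumed He-style scaling) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_network(n_inputs=14, hidden=(100, 70, 40, 25), seed=0):
    """Trunk RL1..RL4 (equations 1-4) plus a one-unit Cox layer (equation 5)."""
    rng = random.Random(seed)
    sizes = (n_inputs,) + tuple(hidden)
    trunk = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    cox_head = init_layer(hidden[-1], 1, rng)
    return trunk, cox_head

def forward(network, x):
    """Return the hidden activations [RL1, RL2, RL3, RL4] and the Cox layer output."""
    trunk, cox_head = network
    activations = [x]
    for weights, biases in trunk:
        activations.append(dense(activations[-1], weights, biases))
    risk = dense(activations[-1], *cox_head, relu=False)[0]
    return activations[1:], risk

network = build_network()
layers, risk = forward(network, [0.5] * 14)  # 14 normalised predictors
print([len(a) for a in layers], risk)
```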

We tried multiple approaches for regression loss calculation, including root mean square error loss as well as maximum likelihood loss, but the better accuracy numbers were achieved by using negative log partial likelihood and derivation based on that as suggested by Faraggi and Simon.²² The equations for the same suggested by them are as follows (loss 1):

(Equation 6, supplied as image fmch-2023-002340ilf01.jpg)

Gradients for back propagation were calculated using the following equation:

(Equation 7, supplied as image fmch-2023-002340ilf02.jpg)

where $X_{i}$ are the inputs to the output layer from $RL4$ (equation 4), β are the coefficients for the Cox parameters (described as a fully connected layer in equation 5), $U$ is the set of uncensored samples and $R_{i}$ is the set of ‘at-risk samples’ with follow-up time $Y_{j}$ > $Y_{i}$. One of the issues with existing deep survival networks was that they did not use censored data in the model. To use right censored data properly, we added a classification layer parallel to the Cox regression layer and calculated a binary loss via binary cross-entropy (loss 2).

ClassificationLayer = FC2(RL4)∈ ℝ (8)

CrossEntropyLoss = −(y log(p) + (1 − y) log(1 − p)) (9)
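The two losses can be sketched in a few lines of Python. For loss 1, a common textbook form of the negative log partial likelihood with the symbols defined above is $L(\beta, X) = -\sum_{i \in U} \left( X_{i}\beta - \log \sum_{j \in R_{i}} e^{X_{j}\beta} \right)$; this is the standard expression and is not necessarily identical to the one the authors used. The sketch also assumes the usual convention that the risk set includes subjects with follow-up time greater than or equal to $Y_{i}$ (so it is never empty), and the function names are hypothetical.

```python
import math

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Loss 1 (textbook form).

    risk_scores -- X_i * beta for each subject (Cox layer output)
    times       -- follow-up time Y_i for each subject
    events      -- 1 if uncensored (event observed), 0 if censored
    """
    loss = 0.0
    for score_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Assumed convention: risk set = subjects with Y_j >= Y_i
        at_risk = [s for s, t in zip(risk_scores, times) if t >= t_i]
        loss -= score_i - math.log(sum(math.exp(s) for s in at_risk))
    return loss

def binary_cross_entropy(y, p, eps=1e-12):
    """Loss 2 (equation 9) for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Toy check: with equal risk scores the Cox loss depends only on risk-set sizes
cox_all_events = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1])
cox_one_censored = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 0, 1])
bce = binary_cross_entropy(1, 0.5)
print(cox_all_events, cox_one_censored, bce)
```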

One of the major challenges is to merge the classification and regression losses, as the regression loss is a numerical value (the difference between the actual and predicted number of days) while the binary classification loss lies between 0 and 1. One way to handle this is to assign static weights to the two losses, but we observed that the regression loss decreases drastically compared with the classification loss after the first few iterations, so assigning predecided weights to the two loss terms was not very helpful. Changing the weights dynamically after every iteration would mean changing the error space, and the model may not converge. To solve this problem, different approaches based on Aalen et al’s²⁰ A-Mixture model as well as an uncertainty-based multitask loss model were tried. However, both of these models are based on Bayesian deep learning methodology and require learning two additional noise parameters for the regression and classification losses. These additional parameters increase or decrease in inverse proportion to their respective regression and classification losses. Based on multiple experiments, we observed that a similar objective can be achieved if the regression and classification losses are in the same range. Hence, we finally added a min-max scaling-based loss, in which the regression loss is scaled down to between 0 and 1 and the average of the regression and classification losses is used for further processing.
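A minimal sketch of the min-max scaled combination follows. It assumes that the scaling is applied to the per-sample regression losses within a batch and that the two mean losses are then averaged with equal weight; the source does not specify over which values the minimum and maximum are taken, so this is one plausible reading rather than the authors' implementation.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_loss(regression_losses, classification_losses):
    """Average of the scaled mean regression loss and the mean classification loss."""
    scaled = min_max_scale(regression_losses)
    reg = sum(scaled) / len(scaled)
    cls = sum(classification_losses) / len(classification_losses)
    return 0.5 * (reg + cls)

# Toy batch: regression losses on a much larger scale than classification losses
loss = combined_loss([10.0, 20.0, 30.0], [0.2, 0.4, 0.6])
print(loss)
```

Putting both terms on the 0 to 1 scale avoids hand-tuned or dynamically changing weights, which is the motivation given in the text.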

Parameters used to assess model performance

The parameters used to assess model performance include AUC-ROC with 95% CI, accuracy score, precision, recall and F1 score. Further for the comparative study between different risk scores, we used sensitivity, specificity, likelihood ratios and predictive values.
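These comparison measures all follow from the four cells of a confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not study results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": precision,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical counts
metrics = diagnostic_metrics(tp=80, fp=20, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```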

Development of the risk score

Based on the above calculations, an individual’s clinical parameters (all 14 parameters) can be put into the algorithm to compute the individual risk score. Further, the algorithm provides an optimal score, which is computed from the positive and negative instances and captures the cardiovascular risk explainable by age and gender while ignoring the effect of other factors. The individual score (output of the deep learning computation) is then expressed as a relative risk (individual score divided by optimal score): less than 1× is low risk, 1× to 1.5× is moderate risk and greater than 1.5× is high risk (online supplemental appendix 7: clinical algorithm). The thresholds were derived from the validation cohorts, where low risk corresponded to 3.05% (CI 1.55% to 4.45%), moderate risk to 10.03% (CI 5.65% to 14.4%) and high risk to 20.01% (CI 15.6% to 24.4%) of CVD events in 10 years. With this categorisation, the results are provided both as numerical risk scores and as a risk category (low, moderate or high). To mitigate issues around gender variation, a Cox proportional hazards model was fitted separately by gender so that the smaller amount of data for females does not bias the model.
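The relative-risk categorisation can be sketched as below. The function name and the example scores are hypothetical, and treating a relative risk of exactly 1.5× as moderate is an assumption about the boundary.

```python
def categorise_risk(individual_score, optimal_score):
    """Return (relative risk, category) using the 1x and 1.5x cut-offs."""
    relative_risk = individual_score / optimal_score
    if relative_risk < 1.0:
        category = "low"
    elif relative_risk <= 1.5:  # boundary handling is an assumption
        category = "moderate"
    else:
        category = "high"
    return relative_risk, category

# Hypothetical scores against an optimal score of 0.10
for score in (0.08, 0.12, 0.20):
    rr, category = categorise_risk(score, 0.10)
    print(f"score={score:.2f} relative risk={rr:.2f} -> {category}")
```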

Retrospective validation and comparison

This model validation was performed on separate external cohorts from two different Apollo Hospitals (Indraprastha Apollo Hospitals, Delhi and Apollo Gleneagles Hospitals, Kolkata) and Maastricht University. Data from these two Apollo Hospitals centres were not used in training the models. They were selected from a similar period between 2009 and 2018 with same inclusion and exclusion criteria, clinical parameters, definition of events, etc (table 1).

Data from 3246 individuals were analysed for the retrospectively generated cardiac risk score. The data distribution is provided in table 1 as validation set. The generated risk scores on clinical parameters in the validation set were categorised into high risk, moderate risk and low risk categories based on the risks adjusted for age and gender. The validation process detected the accuracy of the model to predict cardiac events in individuals who were designated as high or moderate risk. This was correspondingly matched to review whether they actually had a CVD event.

Further, the data used in the validation set cohorts were used to determine the risk scores based on the Framingham Heart Risk Score (FHRS) and QRisk3 Score. The rationale for using the FHRS and QRisk3 Score was that other scores like WHO/ISH and the atherosclerotic cardiovascular disease risk score (ASCVD) (pooled cohort equation) did not perform well in the Indian population.²⁴ FHRS was obtained from https://qxmd.com/calculate/calculator_252/framingham-risk-score-2008 and QRisk3 from https://qrisk.org/three/index.php.²⁵ The outputs obtained from these scoring systems were directly applied and used for corresponding risk scores and categorisation and subsequently analysed.

For the external validation, we used a dataset collected at MUMC+ between 2010 and 2018 as part of the anaesthesia screening of patients undergoing surgery of any kind. It contained data of 12 338 adult patients with a mean age of 55.7 years, 941 (7.59%) of which had a cardiovascular event or died during follow-up. The dataset contained 10 of the 14 predictors used in the core model.

Results

Inferences on the model set

The study focused on CVD risk factors in the context of an Indian population and identified potentially mediating factors which lead to cardiovascular events. The 14 predictors selected through the process of correlation coefficient methods are listed in table 1. It also reflects the details of the various clinical parameters and their incidences in different age and gender groups. The summaries of the activities are provided in figure 2.

Propensity score matching was performed for all variables on the model set, which yielded the following cumulative results: R² (McFadden) 0.259*, R² (Cox and Snell) 0.265*, R² (Nagelkerke) 0.381*, Akaike information criterion (AIC) 2913.483 (good for fit); Hosmer-Lemeshow test (event): χ² 23.895, df 8, Pr>χ² 0.002. *A value between 0.2 and 0.4 should be taken to represent a very good fit of the model^{16 17} (online supplemental appendix 8).

Overall, 22.19% of cases in the study had CVD events (7035 cases in a studied population of 31 599 individuals). The incidence of CVD events was 40.01 per 1000 person years (males 41.43/1000 person years and females 36.78/1000 person years). The accuracy (AUC) of the Cox proportional hazards model was 0.83 (95% CI 0.808 to 0.852) and that of the deep survival model was 0.853 (95% CI 0.831 to 0.875) (validation set from the models) (figure 4).

Figure 4. Comparison of ROC-AUC and precision-recall curve of the Cox proportional hazards model (left) and the deep learning model (right) for the development cohort. AUC, area under the curve; ROC, receiver operating characteristic.

The precision-recall curve suggests that the deep learning-based model is highly accurate in finding people at very high risk (average precision (AP)=0.73 for deep learning vs AP=0.67 for the Cox proportional hazards model). The calibration slope showed that while the hazards model might miss a few high-risk patients, it did not overestimate risk. Hence, the study shows that the most effective approach is to use both models together. The AUC of the deep survival model is on average 3 percentage points higher than that of the Cox proportional hazards model (figure 4). The additional predictors discussed earlier provide an increase of 0.015 (1.5%) in the AUC for the deep survival model. AUC results of other methods included the Framingham model (0.4732), Kaplan-Meier (0.6984), Aalen’s additive model (0.7738) and the Weibull hazards model (0.7989).

Table 2 lists the risk factors with their adjusted HRs, upper and lower 95% CIs and corresponding p values. Diabetes mellitus (HR: 2.342 (2.32–2.36); p<0.001), hypertension (HR: 1.543 (1.52–1.57); p<0.001), DBP (HR: 1.065 (1.04–1.09); p<0.001), chewing tobacco (HR: 2.01 (1.99–2.036); p<0.001), smoking (HR: 2.277 (2.26–2.299); p<0.0001) and dyslipidaemia (HR: 1.16 (1.14–1.18); p<0.001) emerge as the most significant cardiovascular risk parameters in the studied population.

Table 2.

HRs for model and validation cohort

	Model cohort (total subjects: 31 599; CVD events: 7035)				Validation cohort (total subjects: 3246; CVD events: 902)
Risk factor	HR	Lower 95% CI	Upper 95% CI	P value	HR	Lower 95% CI	Upper 95% CI	P value
Age	1.0411	1.02	1.06	<0.001	1.0138	0.94	1.08	0.01
Alcohol	0.8886	0.87	0.91	0.01	1.0484	0.98	1.12	0.03
BMI	1.0693	1.05	1.09	<0.001	0.9921	0.92	1.06	0.006
Diabetes mellitus	2.3421	2.32	2.36	<0.001	3.6712	3.60	3.74	<0.001
Hypertension	1.5437	1.52	1.57	<0.001	2.0279	1.96	2.10	<0.001
Physical activity	0.8863	0.86	0.91	<0.01	1.0699	1.00	1.14	0.04
Dyslipidaemia	1.1606	1.14	1.18	<0.01	1.5685	1.50	1.64	<0.001
Diet	1.0466	1.02	1.06	0.04	0.9667	0.90	1.04	0.03
Systolic BP	1.0259	1.003	1.048	<0.001	1.0007	0.93	1.07	0.1
Diastolic BP	1.0657	1.04	1.09	<0.001	0.9979	0.93	1.07	0.1
Smoking	2.2771	2.26	2.299	<0.001	1.3023	1.23	1.37	<0.003
Chewing tobacco	2.0147	1.99	2.036	<0.001	2.3109	2.24	2.38	<0.001
Pulse rate	1.0446	1.01	1.07	<0.001	1.0234	0.96	1.09	0.03

BMI, body mass index; BP, blood pressure; CVD, cardiovascular disease.

Age accounts for about a 4% increase in risk (HR: 1.04) and BMI for about 7%. Uncontrolled diabetes (HbA1c>7.5%) carries a 2.34 times higher risk, while hypertension carries an approximately 50% higher risk of CVD events over the 7-year study period. Raised DBP (6.57%) carries a slightly higher risk than raised SBP (2.5%). Smoking and chewing tobacco each carry over twice the risk in the studied population, and individuals with dyslipidaemia have a 16% higher risk.
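The percentages quoted above follow from the conventional reading of a hazard ratio as a relative change in hazard, (HR − 1) × 100. The sketch below applies that conversion to a few model-cohort HRs copied from table 2; the function name is illustrative, and the unit of change for each continuous predictor is an assumption not restated here.

```python
def hr_to_percent_change(hr):
    """Relative change in hazard, in percent, implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


# Model-cohort hazard ratios as listed in table 2.
model_cohort_hr = {
    "Age": 1.0411,
    "BMI": 1.0693,
    "Diastolic BP": 1.0657,
    "Systolic BP": 1.0259,
    "Dyslipidaemia": 1.1606,
    "Hypertension": 1.5437,
}

for factor, hr in model_cohort_hr.items():
    print(f"{factor}: {hr_to_percent_change(hr):.2f}% higher hazard")
```

For HRs well above 1, such as diabetes mellitus (2.34), the ratio itself ("2.34 times the risk") is usually the clearer way to report the effect.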

Inferences on validation and comparison

The validation AUCs for the initial model were 0.844 (95% CI 0.775 to 0.913) (Delhi) and 0.921 (95% CI 0.852 to 0.99) (Kolkata), with high precision (0.76 and 0.87), recall (0.80 and 0.84) and F1 score (0.77 and 0.85), respectively. Interestingly, the negative span (average time between two health checks or between the health check and an event) was ≥1800 days for the Delhi hospital and ≥1000 days for the Kolkata hospital; thus, the shorter the interval between health checks, the higher the accuracy of the model (figure 5).
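The F1 score quoted above is the harmonic mean of precision and recall. As a minimal sketch, the snippet below recomputes it from the rounded precision and recall values reported for the two validation cohorts; because those inputs are themselves rounded to two decimals, the recomputed value can differ from the published one in the last digit. The function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Rounded precision and recall as reported for each validation cohort.
reported = {"Delhi": (0.76, 0.80), "Kolkata": (0.87, 0.84)}

for cohort, (precision, recall) in reported.items():
    print(f"{cohort}: F1 = {f1_score(precision, recall):.3f}")
```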

Figure 5.

Retrospective results for Delhi and Kolkata validation cohorts. AUC, area under the curve.

The comparison between the developed risk score (Artificial Intelligence-based Cardiovascular Risk Score, AICVD), FHRS and QRisk3 is presented in table 3. For the confusion matrix and statistical analysis, individuals below the age range stipulated by the conventional risk scores were assigned to the low-risk category. The comparative analysis was performed at https://www.medcalc.org/calc/diagnostic_test.php. The sensitivity and specificity of AICVD (61.31% and 90.04%) are higher than those of FHRS (37.63% and 82.70%) and QRisk3 (31.03% and 76.08%), respectively. The positive likelihood ratio of the AICVD score is 6.16 (5.37–7.06) in the validation set, signifying a moderate to large increase in the probability of a cardiovascular event after a positive result, compared with 2.18 (1.92–2.46) for FHRS and 1.30 (1.16–1.45) for QRisk3. Similarly, the negative likelihood ratio is 0.43 for AICVD compared with 0.75 for Framingham and 0.91 for QRisk3. Positive and negative predictive values also show considerable differences between the models.

Table 3.

Comparison of the model with conventional risk scores

	AICVD Risk Score		Framingham Risk Score		QRisk3 Score
Source of algorithm/API	Internal API		https://qxmd.com/calculate/calculator_252/framingham-risk-score-2008		https://qrisk.org/three/index.php
Confusion matrix	High+Mid	Low	High+Mid	Low	High+Mid	Low
Positive cases (events): 902	691	211	630	272	548	354
Negative cases (no events): 2344	436	1908	1044	1300	1218	1126
Calculation details performed at: https://www.medcalc.org/calc/diagnostic_test.php
Sensitivity	61.31% (58.40% to 64.17%)		37.63% (35.31% to 40.01%)		31.03% (28.88% to 33.25%)
Specificity	90.04% (88.69% to 91.28%)		82.70% (80.73% to 84.54%)		76.08% (73.82% to 78.23%)
Positive likelihood ratio (PLR)	6.16 (5.37 to 7.06)		2.18 (1.92 to 2.46)		1.30 (1.16 to 1.45)
Negative likelihood ratio (NLR)	0.43 (0.40 to 0.46)		0.75 (0.72 to 0.79)		0.91 (0.87 to 0.95)
Positive predictive value	76.61% (74.08% to 78.96%)		69.84% (67.16% to 72.40%)		60.75% (57.99% to 63.45%)
Negative predictive value	81.40% (80.24% to 82.51%)		55.46% (54.38% to 56.53%)		48.04% (46.98% to 49.10%)
Accuracy	80.07% (78.65% to 81.43%)		59.46% (57.75% to 61.15%)		51.57% (49.84% to 53.30%)

AICVD, Artificial Intelligence-based Cardiovascular Risk Score; API, application programming interface.
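The likelihood ratios and accuracies in table 3 can be reproduced from the other entries of the same table. The sketch below takes the reported sensitivity and specificity as given and applies the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity; accuracy is taken as the share of individuals on the concordant diagonal of the confusion matrix (events classed High+Mid plus non-events classed Low). Function and variable names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def accuracy(events_high, events_low, nonevents_high, nonevents_low):
    """Share of individuals whose risk class agrees with their outcome."""
    total = events_high + events_low + nonevents_high + nonevents_low
    return (events_high + nonevents_low) / total


# (sensitivity, specificity, confusion-matrix cells) as reported in table 3.
table3 = {
    "AICVD": (0.6131, 0.9004, (691, 211, 436, 1908)),
    "Framingham": (0.3763, 0.8270, (630, 272, 1044, 1300)),
    "QRisk3": (0.3103, 0.7608, (548, 354, 1218, 1126)),
}

for score, (sens, spec, cells) in table3.items():
    plr, nlr = likelihood_ratios(sens, spec)
    print(f"{score}: PLR={plr:.2f} NLR={nlr:.2f} accuracy={accuracy(*cells):.2%}")
```

A PLR near 1, as for QRisk3 here, means a High+Mid classification barely changes the odds of an event, whereas values above roughly 5 indicate a clinically useful shift.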

In the Maastricht population, applying the AICVD Risk Score (mid risk and high risk) resulted in a precision of 0.94 and a recall of 0.62. The area under the ROC curve achieved by the model was 73.7%, compared with 70.7% using the Framingham Risk Score. The slope of the calibration plot was slightly lower than 1, implying a slight overestimation of the probability of a cardiovascular event in patients at higher risk (online supplemental appendix 9: comparison of the AUC at MUMC).
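To illustrate why a calibration slope below 1 is read as overestimation at the high-risk end, the following sketch fits the usual logistic recalibration model, outcome ~ intercept + slope × logit(predicted risk), by Newton-Raphson. This is a generic illustration under stated assumptions, not the procedure used in the study; the grouped data are synthetic and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope(pred_probs, outcomes, iterations=25):
    """Fit outcome ~ a + b * logit(pred) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in pred_probs]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def expand(groups):
    """groups: list of (predicted probability, n subjects, n events)."""
    preds, outcomes = [], []
    for p, n, events in groups:
        preds += [p] * n
        outcomes += [1] * events + [0] * (n - events)
    return preds, outcomes


# Synthetic example: predictions are more extreme than observed event rates
# (predicted 0.9 vs observed 0.8 in the high-risk group).
too_extreme = [(0.1, 10, 2), (0.5, 10, 5), (0.9, 10, 8)]
intercept, slope = calibration_slope(*expand(too_extreme))
print(round(slope, 3))  # slope below 1, approximately 0.631
```

In this toy data the highest predicted risks exceed the observed event rate, and the fitted slope falls below 1; a perfectly calibrated model would give an intercept of 0 and a slope of 1.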

The API created from the algorithm takes as input the data specified in the Study Design and Methodology section and provides four specific outputs: (a) the individual’s risk score and the optimum score for their age and gender, (b) the top 3 modifiable risk factors, (c) a trend line of risk scores over time and (d) a clinical algorithm to help the physician take the next steps based on the individual’s risk score (online supplemental appendix 7).

Discussion

Using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guidelines and checklist (online supplemental appendix 10),²⁶ we report the development and multicentre validation of a deep learning model that used clinical predictors to assess CVD risk. AICVD is significantly more accurate than the existing models on the studied populations (table 3).

Interpretation

We observed that, despite the widespread popularity of CVD risk scores, randomised controlled trials assessing their impact show limited effect on patient outcomes.²⁷ Although there is good evidence for identifying individuals with high-risk scores and modifying their risk through holistic risk factor management,²⁸ randomised controlled trials assessing the usefulness of existing risk scores show a lack of evidence.²⁷

Further, there is an unmet need for a standard cardiovascular risk score for the Indian population.²⁴ The key shortcomings of conventional CVD risk scores are threefold: (a) their effectiveness in the Indian population has not been extensively studied and hence their accuracy cannot be validated,²⁴ (b) there is an inherent bias in the risk feature selection process²⁹ and (c) they lack a prospective feedback loop to incorporate an individual’s health and CVD event data longitudinally.

The validation process of the study has been designed meticulously. In the validation process, the existing model performed differently in North Indian (Delhi) and Eastern Indian (Kolkata) cohorts with differences in accuracy, precision and recall. For the Maastricht population, the results of the external validation show a lower performance than in the original population, which could be explained partly due to some of the predictors used in the model being absent in the validation dataset and partly due to different population characteristics (eg, no tobacco chewing). The slight miscalibration could be explained by the population in the external validation being at lower risk. However, the model showed good generalisability by achieving overall good predictive performance and outperforming the Framingham Risk Score in terms of discrimination in the validation dataset.

The overall prevalence of diabetes in this study is 20.16% in Southern and Western India (model cohort) and 14.75% in Northern and Eastern India (validation cohort). In both cohorts, diabetes has a significant prevalence (28.5%) above the age of 55 years (n=11 740). The prevalence of diabetes in India is heterogeneous, ranging from 9% (rural) to 17% (urban), with the majority of those affected being of working age,³ a trend that has grown 10 times in the past four decades. The study shows that individuals with a comparatively lower BMI of 24–28 kg/m² and uncontrolled diabetes have a higher risk of CVD than those with BMI>28 and uncontrolled diabetes, which corresponds to a study in the Korean population.³⁰

Various population-based studies have shown that the prevalence of age-adjusted hypertension in India is 26.3%,³ with a majority of hypertensive individuals unaware of their status. In the current study, the prevalence of hypertension is 20.7–22% in the two cohorts (and geographical locations). However, among young adults <40 years, the prevalence of hypertension (overall 5.8%) is 7.06% in males and 3.5% in females (n=6143). In this age group, independent diastolic hypertension emerges as an important risk factor for cardiovascular events, with an HR of 1.63 (CI 1.26 to 1.99) over systemic hypertension (SBP>130 and DBP>80).³¹ The AICVD Risk Score is most appropriate when applied to the age group between 20 and 70 years. The model performs better in the population below 70 years, which is consistent with the objective of the study: to identify at-risk individuals for intervention and lifestyle management.

In the current study, the prevalence of smoking is 20.7% in males and 0.48% in females, whereas the prevalence of chewing tobacco is 22.49% in males and 13.31% in females. Chewing tobacco therefore requires a much more intensive follow-up study to understand the risk it poses to both genders and different age groups. The current HR of smoking (vs non-smokers) is 2.27 (CI 2.25 to 2.29) and 1.30 (CI 1.23 to 1.37) in the model and validation cohorts, whereas the HR of chewing tobacco (vs non-users) is 2.01 (CI 1.99 to 2.03) and 2.31 (CI 2.24 to 2.38). Smoking cessation is a key component of primary and secondary CVD prevention strategies, but chewing tobacco receives less attention despite proven evidence that its cessation improves overall CVD risk.^{32 33}

Limitation

This retrospective study was conducted in a limited population (across geography and ethnicity) using a novel approach, drawn predominantly from hospital-based wellness programmes and health check-ups of healthy individuals. To overcome this limitation, the model is being evaluated through an ongoing prospective observational study in nine institutions across India, covering all geographical, ethnic and socioeconomic diversities.

Implications

The deep learning AICVD tool can test the effectiveness of policy and health providers’ interventions on diet, tobacco, physical activity and other lifestyle-related attributes. It would also help identify the major barriers and challenges in implementing evidence-based policy measures in India.

Conclusion

Apollo Hospitals, in collaboration with Maastricht University, developed and implemented an AICVD model using deep learning. The study concludes that the novel AI-based CVD Risk Score has better predictive performance than conventional risk scores. The use of deep learning shows the interplay of multiple risk factors (predictors) and provides an accurate and precise stratification of CVD risk.

Acknowledgments

We would like to acknowledge the contribution of Dr Sangita Reddy, who inspired us to conduct this study. We would also like to acknowledge the various clinicians, data scientists and leaders at Apollo Hospitals and Microsoft who have assisted us in various ways over the years.

Footnotes

Contributors: SKJ: concept and design, data acquisition, data analysis and interpretation, drafting of manuscript, critical revision of manuscript, statistical analysis, administration support, supervision, final approval. PG: concept and design, data acquisition, data analysis and interpretation, drafting of manuscript, supervision. ALAJD: data analysis and interpretation, drafting of manuscript, critical revision of manuscript, statistical analysis, administration support, supervision. IB: data acquisition, data analysis and interpretation, drafting of manuscript, critical revision of manuscript, statistical analysis, administration support, supervision. SK: concept and design, data acquisition, data analysis and interpretation, drafting of manuscript, critical revision of manuscript, statistical analysis, administration support, supervision, overall guarantor.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: None declared.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review: Not commissioned; externally peer reviewed.

Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

Data are available upon reasonable request.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

Institutional ethics committee approvals were obtained from all Apollo Hospitals centres (including Chennai, Bangalore, Mumbai, Hyderabad, Kolkata and Delhi), Maastricht University Medical Centre (MUMC) and Microsoft AETHER for the prospective study.

References

1. Gaziano TA. Cardiovascular disease in the developing world and its cost-effective management. Circulation 2005;112:3547–53. doi:10.1161/CIRCULATIONAHA.105.591792
2. Gupta R, Misra A, Pais P, et al. Correlation of regional cardiovascular disease mortality in India with lifestyle and nutritional factors. Int J Cardiol 2006;108:291–300. doi:10.1016/j.ijcard.2005.05.044
3. Anjana RM, Pradeepa R, Deepa M, et al. Prevalence of diabetes and pre-diabetes (impaired fasting glucose and/or impaired glucose tolerance) in urban and rural India: phase I results of the Indian Council of Medical Research-India Diabetes (ICMR-INDIAB) study. Diabetologia 2011;54:3022–7. doi:10.1007/s00125-011-2291-5
4. Raghu A, Praveen D, Peiris D, et al. Implications of cardiovascular disease risk assessment using the WHO/ISH risk prediction charts in rural India. PLoS ONE 2015;10:e0133618. doi:10.1371/journal.pone.0133618
5. Gupta R, Guptha S, Sharma KK, et al. Regional variations in cardiovascular risk factors in India: India heart watch. World J Cardiol 2012;4:112–20. doi:10.4330/wjc.v4.i4.112
6. Kanjilal S, Rao VS, Mukherjee M, et al. Application of cardiovascular disease risk prediction models and the relevance of novel biomarkers to risk stratification in Asian Indians. Vasc Health Risk Manag 2008;4:199–211. doi:10.2147/vhrm.2008.04.01.199
7. D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham heart study. Circulation 2008;117:743–53. doi:10.1161/CIRCULATIONAHA.107.699579
8. Conroy RM, Pyörälä K, Fitzgerald AP, et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003;24:987–1003. doi:10.1016/s0195-668x(03)00114-3
9. Biganzoli E, Boracchi P, Mariani L, et al. Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Stat Med 1998;17:1169–86.
10. Assmann G, Schulte H. Prospective cardiovascular Münster study: prevalence and prognostic significance of hyperlipidemia in men with systemic hypertension. Am J Cardiol 1987;59:9G–17G. doi:10.1016/0002-9149(87)90152-4
11. Brindle PM, McConnachie A, Upton MN, et al. The accuracy of the Framingham risk-score in different socioeconomic groups: a prospective study. Br J Gen Pract 2005;55:838–45.
12. Tillin T, Hughes AD, Whincup P, et al. Ethnicity and prediction of cardiovascular disease: performance of Qrisk2 and Framingham scores in a UK tri-ethnic prospective cohort study (SABRE: Southall and Brent Revisited). Heart 2014;100:60–7. doi:10.1136/heartjnl-2013-304474
13. Chow CK, Joshi R, Celermajer DS, et al. Re-calibration of a Framingham risk equation for a rural population in India. Journal of Epidemiology & Community Health 2009;63:379–85. doi:10.1136/jech.2008.077057
14. Spearman C. The proof and measurement of association between two things. The American Journal of Psychology 1904;15:72. doi:10.2307/1412159
15. Mukaka MM. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 2012;24:69–71.
16. Stavseth MR, Clausen T, Røislien J. How handling missing data may impact conclusions: a comparison of six different imputation methods for categorical questionnaire data. SAGE Open Med 2019;7. doi:10.1177/2050312118822912
17. Samuel M, Batomen B, Rouette J, et al. Evaluation of propensity score used in cardiovascular research: a cross-sectional survey and guidance document. BMJ Open 2020;10:e036961. doi:10.1136/bmjopen-2020-036961
18. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958;53:457–81. doi:10.1080/01621459.1958.10501452
19. Aalen O. Nonparametric inference for a family of counting processes. Ann Statist 1978;6:701–26. doi:10.1214/aos/1176344247
20. Aalen OO. A linear regression model for the analysis of life times. Stat Med 1989;8:907–25. doi:10.1002/sim.4780080803
21. Cox DR. Regression models and life-tables (with discussion). J R Stat Soc Ser B (Methodol) 1972:187–220. doi:10.1111/j.2517-6161.1972.tb00899.x
22. Faraggi D, Simon R. A neural network model for survival data. Stat Med 1995;14:73–82. doi:10.1002/sim.4780140108
23. Yousefi S, Amrollahi F, Amgad M, et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Bioinformatics [Preprint] 2017. doi:10.1101/131367
24. Garg N, Muduli SK, Kapoor A, et al. Comparison of different cardiovascular risk score calculators for cardiovascular risk prediction and guideline recommended statin uses. Indian Heart J 2017;69:458–63. doi:10.1016/j.ihj.2017.01.015
25. Development and validation of Qrisk3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357. doi:10.1136/bmj.j2099
26. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:735–6. doi:10.7326/L15-5093-2
27. Collins DRJ, Tompson AC, Onakpoya IJ, et al. Global cardiovascular risk assessment in the primary prevention of cardiovascular disease in adults: systematic review of systematic reviews. BMJ Open 2017;7:e013650. doi:10.1136/bmjopen-2016-013650
28. Redberg RF, Benjamin EJ, Bittner V, et al. ACCF/AHA 2009 performance measures for primary prevention of cardiovascular disease in adults. Journal of the American College of Cardiology 2009;54:1364–405. doi:10.1016/j.jacc.2009.08.005
29. Labos C, Thanassoulis G. Selection bias in cardiology research: another thing to worry about (and how to correct for it). Can J Cardiol 2018;34:705–8. doi:10.1016/j.cjca.2018.03.010
30. Ma SH, Park B-Y, Yang JJ, et al. Interaction of body mass index and diabetes as modifiers of cardiovascular mortality in a cohort study. J Prev Med Public Health 2012;45:394–401. doi:10.3961/jpmph.2012.45.6.394
31. McEvoy JW, Daya N, Rahman F, et al. Association of isolated diastolic hypertension as defined by the 2017 ACC/AHA blood pressure guideline with incident cardiovascular outcomes. JAMA 2020;323:329–38. doi:10.1001/jama.2019.21402
32. Rigotti NA, Clair C. Managing tobacco use: the neglected cardiovascular disease risk factor. Eur Heart J 2013;34:3259–67. doi:10.1093/eurheartj/eht352
33. Piano MR, Benowitz NL, Fitzgerald GA, et al. Impact of smokeless tobacco products on cardiovascular disease: implications for policy, prevention, and treatment: a policy statement from the American Heart Association. Circulation 2010;122:1520–44. doi:10.1161/CIR.0b013e3181f432c3
