Abstract
Aims
Cardiovascular diseases (CVDs) are among the leading causes of death worldwide. Predictive scores providing personalized risk of developing CVD are increasingly used in clinical practice. Most scores, however, utilize a homogenous set of features and require the presence of a physician. The aim was to develop a new risk model (DiCAVA) using statistical and machine learning techniques that could be applied in a remote setting. A secondary goal was to identify new patient-centric variables that could be incorporated into CVD risk assessments.
Methods and results
Across 466 052 participants, Cox proportional hazards (CPH) and DeepSurv models were trained using 608 variables derived from the UK Biobank to investigate the 10-year risk of developing a CVD. Data-driven feature selection reduced the number of features to 47, after which reduced models were trained. Both models were compared to the Framingham score. The reduced CPH model achieved a c-index of 0.7443, whereas DeepSurv achieved a c-index of 0.7446. Both CPH and DeepSurv were superior in determining the CVD risk compared to Framingham score. Minimal difference was observed when cholesterol and blood pressure were excluded from the models (CPH: 0.741, DeepSurv: 0.739). The models show very good calibration and discrimination on the test data.
Conclusion
We developed a cardiovascular risk model that has very good predictive capacity and encompasses new variables. The score could be incorporated into clinical practice and utilized in a remote setting, without the need to include cholesterol. Future studies will focus on external validation across heterogeneous samples.
Keywords: Cardiovascular disease, Prediction, Risk modelling, Machine learning, Lifestyle
Introduction
Cardiovascular diseases (CVDs) are the most common cause of mortality globally, with 18.6 million attributed deaths in 2017.1 The increase of CVD-related mortality in low- and middle-income countries,2 together with increased life expectancy and a growing CVD morbidity worldwide,3 poses a significant challenge in managing this group of diseases. Many risk factors for acquiring CVDs are well-established: hypertension, obesity, diabetes, poor physical activity, hypercholesterolaemia, smoking, alcohol intake, among others.4 Given that up to 70% of cases and deaths from CVDs are attributed to modifiable factors,4 primary prevention of CVD, rather than treatment, has become a mainstay of public health strategies and is extensively described in leading guidelines.5 These strategies can be highly effective in reducing the number of occurrences and, thereby, the corresponding mortality.
The use of risk scores for CVDs in clinical practice is commonplace.6–10 Their primary use is to identify individuals who are at high risk of development of either a fatal and/or non-fatal CVD event in the next 10 years. Their goal is to highlight that risk, so it can be mitigated through either lifestyle adjustment or pharmacological treatment of associated conditions, such as hypertension and hypercholesterolaemia. These interventions have indeed been shown to be cost-effective,11 but the pursuit of better screening programmes and economic assessments is necessary.12
There are several limitations of the cardiovascular risk scores used in practice. In addition to overestimating the proportion of people in the high-risk category,13 recent systematic reviews show low-quality evidence that use of existing CVD risk scores may have minimal effect on incidence of CVD events.14 Moreover, the majority of existing risk scores utilize a small and homogenous set of non-malleable factors in their calculation (Supplementary material online, Table S1).
Risk factors for CVDs can be easily captured through web-based platforms, with standard practice often integrating a risk model as a function within electronic healthcare records (EHRs).15 However, more granular insight into cardiovascular health can be achieved through smartphone-based apps and integrated questionnaires, or wearable devices that allow passive and continuous assessments.16 Furthermore, the analysis of such data using modern methods, such as machine learning (ML), allows for deeper insight into the risk factors that contribute to CVDs by capturing non-linear variable interactions that classical statistical methods cannot capture.17
The primary aim of our study, therefore, is to develop a CVD risk model using both traditional statistical and ML approaches, guided by clinical intuition, which is viable in a remote setting. The secondary goal is the identification of new variables which should be considered by the wider community for incorporation into future CVD risk models to improve their utility. To evidence the value of non-traditional variables, we compare our model to the Framingham risk score that is currently used in clinical practice.
Methods
Study design and exclusion criteria
The UK Biobank (UKB) dataset is a prospective cohort study of 502 488 UK participants18 recruited between 2006 and 2010. It comprises baseline information from an initial assessment on participants who were subsequently followed up via additional assessment sessions, medical records, and other health-related records. Use of data for this study was approved by UKB (application number 55668).
The sole exclusion criterion was a pre-existing diagnosis of a CVD (defined identically to the outcome definition below, with non-limited look-back). Censoring was done at the date of first CVD diagnosis, when lost to follow-up due to death or other reasons, or at the date of last available update on the data (30 September 2020), whichever came first.
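The censoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name, its arguments, and the use of 365.25 days per year are our assumptions, and only the administrative censoring date (30 September 2020) comes from the text.

```python
from datetime import date
from typing import Optional, Tuple

ADMIN_CENSOR_DATE = date(2020, 9, 30)  # last data update stated in the text


def time_to_event(
    baseline: date,
    first_cvd: Optional[date],
    lost_to_follow_up: Optional[date],
    admin_censor: date = ADMIN_CENSOR_DATE,
) -> Tuple[float, bool]:
    """Return (follow-up in years, event indicator) for one participant.

    Follow-up ends at the earliest of: first CVD diagnosis, loss to
    follow-up (death or other reasons), or the administrative censoring date.
    """
    candidates = [d for d in (first_cvd, lost_to_follow_up, admin_censor) if d is not None]
    end = min(candidates)
    # The event counts only if the CVD diagnosis is what ended follow-up.
    event = first_cvd is not None and first_cvd == end
    years = (end - baseline).days / 365.25  # assumed conversion to years
    return years, event


# Hypothetical participant: assessed in 2008, first CVD diagnosis in 2012.
years, event = time_to_event(date(2008, 1, 1), date(2012, 1, 1), None)
```

A participant with no diagnosis and no loss to follow-up is censored at the administrative date, so the event indicator is `False` and the duration runs to 30 September 2020.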
Outcome definition
Definition of CVD in this study includes myocardial infarction, heart failure, angina pectoris, stroke, and transient ischaemic attack. A diagnosis of CVD was confirmed using several fields of UKB-defined First Occurrences (includes primary care, hospital admissions, death register data, and self-report), Algorithmically defined outcomes (hospital admissions, death register data, and self-report), and remaining ICD-10 codes from inpatient records. The full list of UKB fields used for outcome definition can be found in Supplementary material online, Table S2. While there was >75% overlap between the Algorithmically defined outcomes and the corresponding fields for first occurrences, taking into account both fields ensured a more comprehensive selection of CVD cases. Earliest date for any cardiovascular disease on record was used for each participant.
Variable selection
For the baseline set of variables, we used UKB variables available for the majority of participants. This dataset was further enriched by including first occurrences fields for all available ICD-10 (10th revision of the International Statistical Classification of Diseases and Related Health Problems) codes and hospital inpatient records for remaining ICD-10 codes (Supplementary material online, Table S3). Diagnoses with <0.2% occurrence in the dataset were excluded from the feature list. Given our use case, the key criteria for including a variable were whether it could be entered or assessed using only a smartphone and whether the findings could be applied to other countries. This led to exclusion of blood tests and other biological measurements which cannot be acquired via smartphone, as well as UK-specific variables, such as Townsend index, income, and certain qualification levels. The exceptions were total cholesterol, HDL cholesterol, and blood pressure measurements owing to their significant predictive value in CVD risk. Analysis of data completeness is provided in Supplementary material online, Table S3.
The majority of predictor variables were used as provided by the UKB, with the addition of several variables which were derived from the data (waist-to-hip circumference ratio, total cholesterol/HDL cholesterol ratio, and total alcohol intake). Imputation of missing values was performed by substitution with the mean. Binary variables for pre-existing diseases were derived from a combination of corresponding fields in first occurrences, self-reported medical conditions, medications, and in-patient hospital diagnosis, with the exclusion of cases where diagnosis occurred after the assessment date. Categorical and ordinal variables were one-hot encoded and continuous variables were scaled.
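The pre-processing steps described above (mean imputation, derived ratios, one-hot encoding, scaling) could look roughly like the following NumPy sketch. The values are toy data, not UK Biobank records. The helper names, the choice of z-score standardization as the scaling method, and the order of imputation and ratio derivation are assumptions; the text does not specify them.

```python
import numpy as np


def mean_impute(x: np.ndarray) -> np.ndarray:
    """Replace NaNs with the mean of the observed values."""
    filled = x.astype(float)  # astype returns a copy, so x is not modified
    filled[np.isnan(filled)] = np.nanmean(filled)
    return filled


def standardize(x: np.ndarray) -> np.ndarray:
    """Scale to zero mean and unit variance (one possible choice of scaling)."""
    return (x - x.mean()) / x.std()


def one_hot(labels: list) -> dict:
    """One 0/1 indicator column per distinct category."""
    return {f"is_{c}": np.array([int(v == c) for v in labels]) for c in sorted(set(labels))}


# Toy values, not UK Biobank data.
waist = np.array([80.0, 95.0, np.nan, 102.0])
hip = np.array([100.0, 100.0, 98.0, 104.0])
total_chol = np.array([5.2, np.nan, 6.1, 5.8])
hdl = np.array([1.3, 1.1, 1.6, np.nan])
pace = ["brisk", "steady", "slow", "steady"]

# Derived variables named in the text; imputing before deriving is an assumption.
waist_to_hip = mean_impute(waist) / mean_impute(hip)
chol_ratio = mean_impute(total_chol) / mean_impute(hdl)

features = {
    "waist_to_hip": standardize(waist_to_hip),
    "chol_ratio": standardize(chol_ratio),
    **one_hot(pace),
}
```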
Feature selection and prediction models
Prior to training the model, the dataset was split into two parts: train dataset (75%) and test dataset (25%), stratified on the outcome. The train dataset was further split into train (75%) and validation (25%) for the purposes of DeepSurv hyperparameter search. There were no statistical differences between the different datasets (data not shown).
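The nested split can be illustrated with a small, dependency-free Python sketch. The helper `stratified_split` and the toy outcome vector are hypothetical. Stratifying the inner train/validation split is an assumption, since the text states stratification only for the outer split. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used.

```python
import random
from collections import defaultdict


def stratified_split(indices, labels, test_fraction, seed):
    """Split indices so each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i in indices:
        by_label[labels[i]].append(i)
    first, second = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * test_fraction)
        second.extend(group[:cut])   # held-out part
        first.extend(group[cut:])    # remaining part
    return first, second


# Toy outcome vector with about 9% events, mirroring the reported incidence.
labels = [1 if i % 11 == 0 else 0 for i in range(4400)]
all_idx = list(range(len(labels)))

# Outer split: 75% train+validation, 25% test.
train_val, test = stratified_split(all_idx, labels, test_fraction=0.25, seed=0)
# Inner split of the training part: 75% train, 25% validation.
train, val = stratified_split(train_val, labels, test_fraction=0.25, seed=1)
```

With these proportions the final training set holds roughly 56% of all participants, the validation set 19%, and the test set 25%.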
Our modelling approach involved the use of Cox Proportional Hazard (CPH) models, implemented in the Python ‘lifelines’ library, and Cox proportional hazards deep neural network (DeepSurv), implemented in the ‘pycox’ package using PyTorch (Supplementary material online, Table S4). DeepSurv models,22 as opposed to CPH model, are able to capture non-linear interaction between variables but require large-scale hyperparameter optimization to achieve a good performance. Using the full set, and later the reduced set of features, optimal hyperparameters were searched among those described in Table 1 via Tree-Structured Parzen Estimator (TPE) algorithm20 from the Optuna Library (Supplementary material online, Table S4). TPE is a Bayesian hyperparameter optimization method which uses the results from past evaluations to build a probabilistic model used to identify the best candidate hyperparameter values for future searches.
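For reference, the two model families can be written in their standard textbook form as follows. The notation (baseline hazard h_0, coefficients β, network g_θ, risk set R) is ours and does not appear in the original text; ties and regularization terms are ignored.

```latex
% Cox proportional hazards: the log-risk is linear in the covariates x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% DeepSurv: the linear predictor is replaced by a neural network g_\theta
h(t \mid x) = h_0(t)\,\exp\!\left(g_{\theta}(x)\right)

% Both minimise the negative log partial likelihood, where R(t_i) is the set
% of participants still at risk at time t_i and \delta_i = 1 marks an event
% (for CPH, g_\theta(x) = \beta^{\top} x):
\ell(\theta) = -\sum_{i:\,\delta_i = 1}
  \left[ g_{\theta}(x_i) - \log \sum_{j \in R(t_i)} \exp\!\left(g_{\theta}(x_j)\right) \right]
```

Because the only difference is the form of the log-risk function, any gain of DeepSurv over CPH must come from non-linearities or interactions that the linear predictor cannot represent.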
Table 1.
Hyper-parameter | Search space |
---|---|
Activation | LeakyReLU21, ReLU22, and SELU23 |
Hidden layers topology | 8, 32, 256, 32 × 32, 64 × 64, 128 × 128, 64 × 16, 256 × 32, 32 × 32 × 32, 64 × 64 × 64 |
Drop-outᵃ,24 | [0, 0.9] |
Weight-decayᵃ,25 | [0, 20] |
Batch normalization26 | Yes/No |
Optimizer | Stochastic Gradient Descent, Adam27 |
Momentumᵃ,28 | [0, 1] |
Learning rate | Log distribution on [1e−5, 1] |
The search space consisted of 10 different neural network topologies, up to three layers deep, and a choice of three activation functions for these layers. Regularization techniques included drop-out (ignores randomly selected neurons in the network) and weight decay (L2 regularization, shrinks weights). The option to utilize batch normalization was offered to accelerate training via standardizing the inputs’ changing distribution. The choice of an optimizer for gradient descent included standard stochastic gradient descent (SGD) with or without Momentum, or Adam (adaptive moment optimization).
ᵃ Uniform distributions.
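To make the search space in Table 1 concrete, the sketch below encodes it in plain Python and draws candidate configurations from it. Note that this uses simple random sampling as a stand-in; it does not implement the TPE algorithm used in the study. The dictionary keys and the rule that momentum is sampled only for SGD are our assumptions.

```python
import random

# The 10 hidden-layer topologies listed in Table 1.
TOPOLOGIES = [
    (8,), (32,), (256,), (32, 32), (64, 64), (128, 128),
    (64, 16), (256, 32), (32, 32, 32), (64, 64, 64),
]


def sample_config(rng: random.Random) -> dict:
    """Draw one DeepSurv configuration from the Table 1 search space."""
    optimizer = rng.choice(["SGD", "Adam"])
    config = {
        "activation": rng.choice(["LeakyReLU", "ReLU", "SELU"]),
        "hidden_layers": rng.choice(TOPOLOGIES),
        "dropout": rng.uniform(0.0, 0.9),
        "weight_decay": rng.uniform(0.0, 20.0),
        "batch_norm": rng.choice([True, False]),
        "optimizer": optimizer,
        # Log-uniform on [1e-5, 1]: sample the exponent uniformly.
        "learning_rate": 10 ** rng.uniform(-5.0, 0.0),
    }
    if optimizer == "SGD":
        # Assumption: momentum is only meaningful for SGD.
        config["momentum"] = rng.uniform(0.0, 1.0)
    return config


rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(20)]
```

A Bayesian method such as TPE differs from this sketch in that each new candidate is proposed using the scores of earlier candidates instead of being drawn independently.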
Feature selection was performed using the CPH model to decrease the risk of overfitting and ensure suitability for use in a digital solution. In the first step, univariate CPH models were trained for each of the features in the baseline model and those with P-value >0.01 were excluded. The remaining features were processed through stepwise backward elimination. In brief, in each elimination round, a batch of features was removed and performance evaluated. If the concordance index (c-index) decreased by <0.001, features were eliminated, otherwise a smaller subset was removed until the final set of features whose removal would cause performance degradation was found. These highly contributing features were then used in the reduced models.
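The batched backward elimination step could be sketched as below. This covers only the second stage (after the univariate filter) and fills in details the text leaves open: the ranking of features, the initial batch size, and the halving of the batch after a rejected removal are all assumptions. The `score` callable stands for fitting a CPH model and returning its c-index; here it is replaced by a toy function.

```python
from typing import Callable, List, Sequence


def backward_eliminate(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    rank: Callable[[List[str]], List[str]],
    tolerance: float = 0.001,
    initial_batch: int = 8,
) -> List[str]:
    """Batched backward elimination.

    `score` returns the c-index of a model fitted on the given features;
    `rank` orders features from least to most important.
    """
    kept = list(features)
    batch = initial_batch
    while batch >= 1 and len(kept) > 1:
        baseline = score(kept)
        to_drop = set(rank(kept)[: min(batch, len(kept) - 1)])
        candidate = [f for f in kept if f not in to_drop]
        if baseline - score(candidate) < tolerance:
            kept = candidate   # negligible loss: accept the removal
        else:
            batch //= 2        # too costly: retry with a smaller batch
    return kept


# Toy stand-ins: each feature adds a fixed amount to the c-index.
contribution = {"age": 0.03, "sex": 0.01, "sbp": 0.005}
contribution.update({f"noise_{k}": 0.0001 for k in range(10)})


def toy_score(feats: List[str]) -> float:
    return 0.70 + sum(contribution[f] for f in feats)


def toy_rank(feats: List[str]) -> List[str]:
    return sorted(feats, key=contribution.get)


selected = backward_eliminate(list(contribution), toy_score, toy_rank)
```

In this toy run the ten low-value features are removed in two accepted batches, and the procedure stops once removing even the single least important remaining feature would cost more than the 0.001 tolerance.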
Both CPH and DeepSurv models were then trained using the final set of features, along with a model variant excluding all cholesterol variables and systolic blood pressure (substituted by heart rate for its digital feasibility), to enable calculation of risk for individuals who are not able to obtain these measurements.
Feature selection using DeepSurv was not performed as a consequence of its high computational requirements and because performance of the baseline DeepSurv model was not notably superior to the CPH model.
Comparison to other models
Details of the variable derivation used to replicate the Framingham risk score on the UKB dataset are summarized in Supplementary material online, Table S5. The Framingham score for each participant was calculated using the formulas for males and females published in the original article,8 and the c-index was calculated using the predicted and actual time-to-CVD-event. Additionally, a CPH model was re-trained using the seven Framingham variables on the UKB dataset and compared to our findings. Sex-specific variants of our final models were trained to allow for closer comparison to these two scores.
Statistical analysis
In the summaries of cohort characteristics, participant numbers and percentages of total are shown for categorical and ordinal variables, whereas median and first and third quartiles are shown for continuous variables. Statistical comparisons were performed using the χ2 test for categorical and ordinal variables and Kruskal–Wallis test for continuous variables.
Where detailed analysis of the results of CPH models is provided, hazard ratios (HR) with 95% confidence intervals (CIs), as well as the coefficients, are provided. P-values test the null hypothesis that the coefficient of each variable is equal to zero. Significance level was set to 0.05.
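The relationship between a Cox coefficient, its hazard ratio, and the associated CI and P-value can be shown with a short sketch. It assumes a Wald-type interval and test, which is the usual default in survival software; the text does not state which test was used, and the numeric inputs are illustrative only.

```python
import math


def hazard_ratio(coef: float, se: float, z: float = 1.96):
    """Hazard ratio and 95% Wald CI from a Cox coefficient and its standard error."""
    return math.exp(coef), (math.exp(coef - z * se), math.exp(coef + z * se))


def wald_p_value(coef: float, se: float) -> float:
    """Two-sided P-value for the null hypothesis that the coefficient is zero."""
    z_stat = abs(coef / se)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z_stat / math.sqrt(2.0))


hr, (ci_low, ci_high) = hazard_ratio(coef=0.05, se=0.01)  # illustrative values only
p = wald_p_value(coef=0.05, se=0.01)
```

A coefficient of zero corresponds to a hazard ratio of exactly 1, so a 95% CI for the hazard ratio that excludes 1 is equivalent to a P-value below 0.05 under this test.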
C-index was used as the metric for both models, with 95% CIs calculated using the percentile bootstrap resampling method (50 resampling rounds). Calibration was evaluated at the 10-year time point using calibration plots and the integrated calibration index (ICI), which is a mean weighted difference between observed and predicted probabilities, implemented in the Python lifelines library. Time-specific evaluation of model discrimination was done using the cumulative/dynamic area under the ROC curve (AUC^C/D), implemented in the scikit-survival package. This article was written following the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines, which are further elaborated in Supplementary material online, Table S6.
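As an illustration of the discrimination metric and its interval, the following dependency-free sketch computes Harrell's c-index and a percentile bootstrap CI with 50 resampling rounds. It is a minimal version under our own assumptions (handling of tied risks, the percentile index rule, and the simulated toy data); the study itself relied on library implementations.

```python
import math
import random
from typing import Sequence, Tuple


def c_index(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the participant with the shorter
    follow-up had an event; it is concordant when that participant also
    has the higher predicted risk. Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def bootstrap_ci(times, events, risk, rounds=50, alpha=0.05, seed=0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample participants with replacement."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(c_index([times[i] for i in idx],
                             [events[i] for i in idx],
                             [risk[i] for i in idx]))
    stats.sort()
    lower = stats[math.floor((alpha / 2) * (rounds - 1))]
    upper = stats[math.ceil((1 - alpha / 2) * (rounds - 1))]
    return lower, upper


# Simulated toy cohort: higher risk score means shorter time to event.
rng = random.Random(42)
risk = [rng.uniform(-1.0, 1.0) for _ in range(150)]
raw_times = [rng.expovariate(math.exp(2.0 * r)) for r in risk]
times = [min(t, 1.0) for t in raw_times]   # administrative censoring at t = 1
events = [t <= 1.0 for t in raw_times]

point = c_index(times, events, risk)
ci_low, ci_high = bootstrap_ci(times, events, risk)
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a toy cohort but is the reason optimized library routines are preferred for datasets of the size used here.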
Results
Population characteristics
After exclusion of participants with pre-existing CVD, the study was conducted with 466 052 UKB participants. There were 42 377 participants (9.09%) who developed CVD during the observation period. The most common CVDs were chronic ischaemic heart disease (42.1%), myocardial infarction (20.3%), and stroke (15.4%). The recruitment and data funnel is illustrated in Figure 1. The full breakdown can be found in Supplementary material online, Table S2. Median follow-up time was 11.5 years (IQR 10.7–12.3 years) and maximum follow-up time was 14.6 years.
The studied dataset includes 44.1% men and 55.9% women (sex-specific demographic data shown in Supplementary material online, Table S7), aged 37–73 at the time of recruitment (mean 56.2 ± 8.09). The participants were predominantly white, with a small proportion of other ethnic groups (2.2% Asian, 1.6% Black, and 1.5% Others). Detailed demographic analysis for the variables selected in the final model can be found in Table 2.
Table 2.
Variable | All participants | No incident CVD | Incident CVD | P-value (adjusted) |
---|---|---|---|---|
Total | 466 052 | 423 675 | 42 377 | |
Sex: female | 260 673 (55.93) | 243 390 (57.45) | 17 283 (40.78) | <0.001 |
Age, median [Q1–Q3] | 57 [50–63] | 57 [49–62] | 62 [56–66] | <0.001 |
Has college or university degree, n (%) | 153 353 (32.90) | 142 831 (33.71) | 10 522 (24.83) | <0.001 |
Waist-to-hip ratio, median [Q1–Q3] | 0.87 [0.80–0.93] | 0.86 [0.80–0.93] | 0.91 [0.85–0.97] | <0.001 |
Lost weight compared with 1 year ago, n (%) | 69 005 (14.81) | 62 106 (14.66) | 6899 (16.28) | <0.001 |
Systolic blood pressure, median [Q1–Q3] | 137.50 [125.00–148.50] | 137.00 [124.50–147.50] | 142.00 [132.00–155.50] | <0.001 |
Heart rateᵃ, median [Q1–Q3] | 69.50 [62.50–75.50] | 69.50 [62.50–75.50] | 69.54 [62.50–77.00] | <0.001 |
Total cholesterol, median [Q1–Q3] | 5.77 [5.06–6.41] | 5.77 [5.07–6.40] | 5.77 [4.97–6.45] | <0.001 |
Cholesterol ratio, median [Q1–Q3] | 4.14 [3.43–4.66] | 4.14 [3.41–4.62] | 4.14 [3.66–4.98] | <0.001 |
Currently does not smoke, n (%) | 416 519 (89.37) | 380 368 (89.78) | 36 151 (85.31) | <0.001 |
Smoked occasionally in the past, n (%) | 61 282 (13.15) | 56 096 (13.24) | 5186 (12.24) | <0.001 |
Pack years of smoking, median [Q1–Q3] | 11.88 [0.00–22.29] | 11.00 [0.00–22.29] | 20.00 [0.00–22.29] | <0.001 |
Never drinks alcohol, n (%) | 35 946 (7.71) | 31 707 (7.48) | 4239 (10.00) | <0.001 |
Always adds salt to served food, n (%) | 22 252 (4.77) | 19 582 (4.62) | 2670 (6.30) | <0.001 |
Hours spent outdoors in winter, median [Q1–Q3] | 1.93 [1.00–2.00] | 1.00 [1.00–2.00] | 1.93 [1.00–3.00] | <0.001 |
Steady average usual walking pace, n (%) | 243 706 (52.29) | 219 976 (51.92) | 23 730 (56.00) | <0.001 |
Brisk usual walking pace, n (%) | 186 297 (39.97) | 174 169 (41.11) | 12 128 (28.62) | <0.001 |
Excellent self-rated health, n (%) | 80 390 (17.25) | 75 946 (17.93) | 4444 (10.49) | <0.001 |
Good self-rated health, n (%) | 274 677 (58.94) | 252 295 (59.55) | 22 382 (52.82) | <0.001 |
Poor self-rated health, n (%) | 16 742 (3.59) | 13 477 (3.18) | 3265 (7.70) | <0.001 |
Father diagnosed with heart disease, n (%) | 128 063 (27.48) | 114 859 (27.11) | 13 204 (31.16) | <0.001 |
Mother diagnosed with heart disease, n (%) | 81 038 (17.39) | 71 723 (16.93) | 9315 (21.98) | <0.001 |
Sibling diagnosed with heart disease, n (%) | 34 085 (7.31) | 28 943 (6.83) | 5142 (12.13) | <0.001 |
Diagnosis of depressive episode (F32), n (%) | 36 410 (7.81) | 32 466 (7.66) | 3944 (9.31) | <0.001 |
Diagnosis of epilepsy (G40), n (%) | 4453 (0.96) | 3859 (0.91) | 594 (1.40) | <0.001 |
Diagnosis of atrial fibrillation and flutter (I48), n (%) | 4925 (1.06) | 3371 (0.80) | 1554 (3.67) | <0.001 |
Diagnosis of other cardiac arrhythmias (I49), n (%) | 3892 (0.84) | 3179 (0.75) | 713 (1.68) | <0.001 |
Diagnosis of urinary tract infection or incontinence (N39), n (%) | 23 218 (4.98) | 20 629 (4.87) | 2589 (6.11) | <0.001 |
Diagnosis of diabetes (E10, E11, E14), n (%) | 20 054 (4.30) | 15 919 (3.76) | 4135 (9.76) | <0.001 |
Diagnosis of haematological cancer, n (%) | 1921 (0.41) | 1528 (0.36) | 393 (0.93) | <0.001 |
Diagnosis of cellulitis (L03), n (%) | 3635 (0.78) | 3007 (0.71) | 628 (1.48) | <0.001 |
Has diabetes-related eye disease, n (%) | 2602 (0.56) | 1860 (0.44) | 742 (1.75) | <0.001 |
Fractured bones in the last 5 years, n (%) | 43 598 (9.35) | 39 197 (9.25) | 4401 (10.39) | <0.001 |
Does not have any long-standing illness, disability or infirmity, n (%) | 318 502 (68.34) | 295 823 (69.82) | 22 679 (53.52) | <0.001 |
Number of operations, median [Q1–Q3] | 1 [1–2] | 1 [1–2] | 2 [1–3] | <0.001 |
Regularly takes blood pressure medications, n (%) | 81 084 (17.40) | 67 673 (15.97) | 13 411 (31.65) | <0.001 |
Regularly takes insulin, n (%) | 4124 (0.88) | 3095 (0.73) | 1029 (2.43) | <0.001 |
Regularly takes aspirin, n (%) | 44 935 (9.64) | 37 535 (8.86) | 7400 (17.46) | <0.001 |
Regularly takes corticosteroids, n (%) | 3993 (0.86) | 3257 (0.77) | 736 (1.74) | <0.001 |
Does not regularly take mineral supplements, fish oil or glucosamine, n (%) | 261 167 (56.04) | 238 037 (56.18) | 23 130 (54.58) | <0.001 |
Does not take insulin or medications for cholesterol/blood pressure, n (%) | 328 612 (70.51) | 305 358 (72.07) | 23 254 (54.87) | <0.001 |
Number of medications taken regularly, median [Q1–Q3] | 2 [0–3] | 1 [0–3] | 3 [1–5] | <0.001 |
Experiences dyspnoea (R060), n (%) | 1729 (0.37) | 1334 (0.31) | 395 (0.93) | <0.001 |
Experiences abdominal or pelvic pain (R10), n (%) | 21 070 (4.52) | 18 496 (4.37) | 2574 (6.07) | <0.001 |
Experiences dizziness or giddiness (R42), n (%) | 1626 (0.35) | 1319 (0.31) | 307 (0.72) | <0.001 |
Experiences syncope or collapse (R55), n (%) | 3354 (0.72) | 2732 (0.64) | 622 (1.47) | <0.001 |
Has had wheeze or whistling in the chest in the last year, n (%) | 91 158 (19.56) | 79 415 (18.74) | 11 743 (27.71) | <0.001 |
Never feels any pain or discomfort in their chest, n (%) | 393 719 (84.48) | 360 851 (85.17) | 32 868 (77.56) | <0.001 |
Last column shows P-value after comparing the incident CVD group with the non-CVD group. Comparisons were performed using the χ2 test for categories and Kruskal–Wallis test for continuous variables.
ᵃ This variable (heart rate) is used only as a substitute for systolic blood pressure in the model excluding cholesterol and systolic blood pressure.
Baseline model performance and feature selection
CPH and DeepSurv models were trained using the pre-processed dataset, containing 608 features. This baseline CPH model achieved a c-index of 0.7431 (95% CI 0.7422–0.7441) on the test dataset. DeepSurv, with optimized hyperparameters (Supplementary material online, Table S8), achieved a c-index of 0.7461 (95% CI 0.7452–0.7469) on the test dataset. The CPH model was used for the subsequent feature selection from this baseline dataset.
Initially, 93 features were eliminated based on a P-value >0.01 in a univariate CPH model. The subsequent stepwise backward elimination excluded a further 465 features. The remaining 50 features were subjected to review by a clinician, further excluding three features (First Occurrence of ICD-10 codes F17 and H25, lamb/mutton intake 2–4 times a week).
Performance of the reduced prediction models
The remaining 47 features in the final reduced models include demographic measures (age, sex, holding a university degree), anthropometrics (waist-to-hip ratio), systolic blood pressure, total cholesterol, cholesterol ratio, a range of pre-existing conditions, medications, symptoms, family history of heart disease, lifestyle measures (e.g. smoking, alcohol consumption), and self-rated health. A CPH model trained using these features achieved a c-index of 0.7443 (95% CI 0.7441–0.7445) on the test dataset. A DeepSurv model showed concordance of 0.7446 (95% CI 0.7441–0.7452) on the test dataset. The cumulative/dynamic AUC (AUC^C/D) calculated at years 1 through 10 showed above-average discrimination at years 1–5 and stable AUC of ∼0.76 from year 6 onwards (Figure 3A and B). Sensitivity analysis (Supplementary material online, Figure S1) revealed that features representing symptoms (e.g. abdominal pain, wheeze in chest) contribute more to shorter term predictions (1–3 years).
The details of CPH feature coefficients and statistical analysis can be found in Figure 2 and Supplementary material online, Table S9. Based on the P-value calculated in the CPH model, the top five risk factors include age, systolic blood pressure, diagnosis of atrial fibrillation and flutter (ICD-10 code I48), cholesterol ratio, and father with heart disease. The top five protective features include being female, being a non-smoker, not experiencing any chest pain, brisk usual walking pace, and excellent self-rated health.
The CPH model showed good calibration, with slight overestimation of the higher probabilities which were more sparsely represented in the dataset at 10 years (Figure 3C). The DeepSurv model slightly overestimated lower probabilities at 10 years and underestimated higher probabilities at 5 years (Figure 3D). The integrated calibration index (ICI) was 0.00295 for the CPH model and 0.00567 for DeepSurv model (details in Supplementary material online, Table S10).
To enable calculation of risk scores for individuals without access to cholesterol and blood pressure measurements, the models were re-trained after excluding total cholesterol and cholesterol ratio. Heart rate replaced systolic blood pressure as it is measurable via mobile device. The c-index of the CPH model decreased to 0.741 and c-index of the DeepSurv model decreased to 0.739 (Table 3).
Table 3.
Measure | Before feature selection | After feature selection | Excluding cholesterol measurements + substituting BP |
---|---|---|---|
Number of features | 608 | 47 | 45 |
CPH model C-index | 0.7431 [0.7422–0.7441] | 0.7443 [0.7441–0.7445] | 0.7409 [0.7407–0.7411] |
DeepSurv model C-index | 0.7461 [0.7452–0.7469] | 0.7446 [0.7441–0.7452] | 0.7388 [0.7382–0.7393] |
A 95% confidence interval is shown in square brackets. Cholesterol measurements include total cholesterol and cholesterol ratio; systolic blood pressure was substituted with heart rate.
BP, blood pressure.
Comparison to existing risk scores
Established risk scores, such as Framingham, provide separate models for males and females. To provide closer comparison to these scores, we trained sex-specific models using our final set of variables (excluding sex). The c-index of the male-specific model (both CPH and DeepSurv) was 0.72 (detailed results in Supplementary material online, Table S11) and that of the female-specific model was 0.75 (detailed results in Supplementary material online, Table S12).
Risk scores calculated using the Framingham formula achieved a c-index of 0.68 and 0.70 for males and females, respectively. The c-indices rose slightly after re-training the CPH model using the seven Framingham variables on the UKB dataset. Comparison of all c-indices can be found in Table 4.
Table 4.
Score | Method | C-index: men | C-index: women | C-index: all participants |
---|---|---|---|---|
Framingham score | Risk scores calculated using published formula | 0.678 | 0.695 | 0.704 |
Framingham score | CPH model trained on UKB | 0.684 [0.684–0.685] | 0.714 [0.713–0.714] | 0.715 [0.715–0.715] |
Our score | CPH model | 0.716 [0.716–0.717] | 0.748 [0.747–0.748] | 0.744 [0.744–0.744] |
Our score | DeepSurv model | 0.716 [0.715–0.717] | 0.747 [0.746–0.748] | 0.745 [0.744–0.745] |
Test dataset c-indices are shown. 95% confidence intervals are shown in square brackets.
UKB, UK Biobank.
Discussion
Through our investigation of the UKB’s sizable dataset, we were able to develop a model with a c-index of 0.745, showing very good predictive ability for CVD events over a 10-year period. The main features in the model are well-established across other risk scores: age, sex, antihypertensive medication, systolic blood pressure, smoking status, and cholesterol. However, minimal differences were observed when cholesterol-related variables were excluded (0.744 vs. 0.741 using the Cox model), strengthening the case for use of our model in a remote setting and without mandatory cholesterol screening. The clinical demand for this exclusion is illustrated by the incorporation of a non-laboratory-based score along with the original Framingham CVD risk score.8 The observed modest decrease in c-index after exclusion of cholesterol may be explained by multicollinearity with other factors with high importance in our model: waist-to-hip ratio, smoking status, family history, and antihypertensive medication. Anthropometric measures, while traditionally viewed as valued parameters only of obesity and diabetes,30 are gaining traction in risk modelling across a wider range of non-communicable and communicable diseases alike.31 The expanding impact of the digital revolution on healthcare makes these features especially relevant owing to novel, accessible technologies which can capture a broad range of anthropometric information using only a smartphone camera.32
Alcohol consumption and smoking habits, while traditional factors, present in an interesting manner. For the former, as categorical UKB fields were one-hot encoded, only alcohol abstinence passed feature selection, with other degrees of consumption not significant. The detrimental predictive significance of alcohol abstinence can be attributed to the average poorer health status and higher prevalence of chronic conditions and neurological problems of never- and former-drinkers when compared to ever-drinkers.33 For tobacco, as expected, currently not smoking is protective and increasing pack years confers increased risk of CVD. However, having smoked occasionally was unexpectedly calculated as protective, likely as this group is enriched for people who have since ceased smoking, having never smoked regularly.
In addition to traditional risk factors and anthropometric variables, several other variables not present in any conventional risk model have been shown to be significant contributors to our model. Notably, the inclusion of reported symptoms in our analysis sets the resultant model apart from others. The symptoms of dyspnoea, syncope, dizziness, wheezing, and chest pain are recognized as clinical indicators of possible cardiac disease. Our model supports their potential to highlight subclinical, underlying cardiovascular pathology. Similarly, abdominal and pelvic pain may represent undiagnosed comorbidity associated with our CVD outcomes, including inflammatory bowel disease, endometriosis, or aortic aneurysm.34,35 We performed a sensitivity analysis to investigate the influence of these features on short- and long-term predictions, which revealed that removing these features mostly impairs shorter term predictions.
The extent of one’s education has also emerged as an important contributing social factor in our model, reinforcing the findings of recent studies which show education as a significant predictor of CVD acquisition and mortality.36
Furthermore, several diagnoses not traditionally associated with the incidence of CVD were found to be significant contributors to the CPH model. A diagnosis of urinary tract infection (UTI) or incontinence and a diagnosis of epilepsy are significant contributors independent of all other features. The association with UTI and incontinence may be explained by the raised cardiovascular risk of a proinflammatory state, but this warrants further investigation. Epidemiological studies have shown a raised CVD risk profile in those with epilepsy, but this is mostly attributed to concomitant risk factors.37
Self-rated health and prior diagnosis of a depressive episode, both associated with increased CVD risk,38,39 are also significant features, underscoring the importance of assessing psychological aspects of health when calculating risk for CVDs. To a smaller, but still important extent, two physical activity-related behaviours (walking pace and hours spent outdoors in winter) have contributed to the performance of the model, both known to influence the risk of CVD-related events.40,41 Lastly, a family history of heart disease showed strong predictive power, with sibling heart disease ranking significantly higher than parental. This is likely because a sibling diagnosis reflects both similar genetic and similar environmental factors, whereas parental diagnoses reflect pertinent environmental contributors to a lesser extent. Despite the well-known contribution of family history to CVD occurrence, QRISK3 is the only risk model that currently incorporates it in the calculation, and our results show that more granular coverage of family history could lead to better risk assessment.7
By using CPH and DeepSurv to develop risk scores, we compared an established statistical method with a more novel machine learning approach to survival analysis. We hypothesized that use of DeepSurv might result in superior predictions by identifying complex interactions between variables. However, performances of both models were similar, implying a minor contribution of non-linear interactions within the large feature space, or an insufficient number of training examples to give an advantage to a more data-intensive deep learning model. In the case of equal performance of both models, CPH models provide several advantages: they are less computationally intensive and easier to implement and update with new data or recalibrate for other demographics owing to a low number of pre-set hyperparameters. Importantly, CPH models also provide easily interpretable feature coefficients, making them more trustworthy for clinicians who may not be familiar with academic data science.
The minimal performance drop when excluding the less accessible features of cholesterol and blood pressure, together with the fact that many features are modifiable, supports the feasibility of feature collection through a digital application, in a remote setting, and on a regular basis. This enables more engaging risk management and eliminates the need for interaction with a healthcare provider. Our digitally-collected score slightly overperformed Framingham risk score (c-index 0.745 vs. 0.704).
Due to increasing ownership, mobile phones and smart wearables have become a priority medium for digital health interventions. The use of digitally collected features to calculate CVD risk will allow patient-facing solutions, transferring responsibility for health status and lifestyle choices to patients rather than clinical teams. Furthermore, gamification and goal setting have been shown to increase the success of lifestyle change interventions.42 Several risk variables, particularly those related to diet (salt added to food, nutritional supplement intake), smoking, and alcohol consumption, as well as activity (hours spent outdoors in winter), are adjustable in the short term, giving direct feedback to users. A patient-facing digital risk score that responds to and reports positive change within a short time span, as a consequence of such lifestyle improvements, could increase motivation in patients with high CVD risk. The risk score can also be easily accompanied by educational content from credible sources, tailored to support patients on their path to improving their cardiovascular health. The assessment of cardiovascular risk is a mainstay of clinical guidelines,5 and the ability to calculate this remotely with very limited face-to-face clinical resources could have significant beneficial cost and patient accessibility implications.
A key challenge in any digital health intervention is maintaining patient adherence over time, which can be addressed by considering key usability and design aspects early in the development process. The number of variables incorporated into this model can be challenging for patients to recall, especially those related to non-modifiable factors, and can lead to patient drop-off over time. By creating a simple, user-friendly interface that asks clear questions that are easy to understand, we strongly believe that all of the necessary information can be comfortably obtained, especially in the setting of repeated assessments.
A notable limitation of our study stems from the UKB cohort being unrepresentative of the general population across various domains.43 First, ethnic diversity is very low, with 94% of all participants identifying as White, and second, age at recruitment was restricted to 37–73. Our analysis, therefore, was not able to account for differences in CVD risk across ethnicities, which is a significant known factor for CVD incidence, nor in younger age groups; results from our model in these populations should thus be interpreted with caution. Importantly, the restricted age distribution also represents a higher risk population, implying that validation of the model in a more representative sample would result in higher performance, as seen when QRISK3 is calculated for UKB participants (representative population: 0.88; UKB: 0.76).7,44 Last, the UKB cohort is reported to be healthier and wealthier than the general population.43 This is significant given that genetic factors account for only ∼32% of coronary artery disease occurrence,45 and it may therefore result in an underappreciation of the influence of detrimental lifestyle choices. Future work will concern external validation of the model, which will allow conclusions to be drawn about its use across geographies and populations.
While our aim was to limit model bias by applying a data-driven approach to feature selection and excluding features with a large proportion of missing data, choosing CPH model-based feature selection may have biased the final features in the reduced model. It is likely that this has also limited the performance of the DeepSurv model, by selecting features that specifically enhanced the performance of the CPH model.
In conclusion, DiCAVA, our 10-year CVD risk model, has very good predictive capacity and contains significant predictors not previously described by existing risk scores in the literature. We demonstrated its feasible utility in a remote setting where cholesterol and blood pressure measurements may not always be convenient, highlighting that even the most established predictors are not always essential.
Lead author biography
Nikola Dolezalova works in the Innovation & AI team at Huma Therapeutics as a Clinical Data Scientist. In 2019, she completed her PhD at the University of Cambridge, Department of Surgery, on cryopreservation of pancreatic islets and then worked as a post-doctoral Research Associate, researching the anti-inflammatory reflex and a bioelectronic therapy for inflammatory diseases in collaboration with Galvani Bioelectronics.
Supplementary material
Supplementary material is available at European Heart Journal online.
Acknowledgements
The authors would like to thank Sam Nikbakhtian and Michele Colombo for their contributions.
Funding
This research was funded by Huma Therapeutics Ltd.
Conflict of interest: N.D., A.B.R., A.D., B.D.O., D.M., M.A., and D.P. are employees of Huma Therapeutics Ltd.
Data availability
The UK Biobank is an ‘open’ resource accessible to any researchers approved as bona fide by the Biobank Access Management Team.
References
- 1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet 2020;396:1204–1222.
- 2. Timmis A, Townsend N, Gale CP, Torbica A, Lettino M, Petersen SE, Mossialos EA, Maggioni AP, Kazakiewicz D, May HT, De Smedt D, Flather M, Zuhlke L, Beltrame JF, Huculeci R, Tavazzi L, Hindricks G, Bax J, Casadei B, Achenbach S, Wright L, Vardas P, Mimoza L, Artan G, Aurel D, Chettibi M, Hammoudi N, Sisakian H, Pepoyan S, Metzler B, Siostrzonek P, Weidinger F, Jahangirov T, Aliyev F, Rustamova Y, Manak N, Mrochak A, Lancellotti P, Pasquet A, Claeys M, Kušljugić Z, Dizdarević Hudić L, Smajić E, Tokmakova MP, Gatzov PM, Milicic D, Bergovec M, Christou C, Moustra HH, Christodoulides T, Linhart A, Taborsky M, Hansen HS, Holmvang L, Kristensen SD, Abdelhamid M, Shokry K, Kampus P, Viigimaa M, Ryödi E, Niemelä M, Rissanen TT, Le Heuzey J-Y, Gilard M, Aladashvili A, Gamkrelidze A, Kereselidze M, Zeiher A, Katus H, Bestehorn K, Tsioufis C, Goudevenos J, Csanádi Z, Becker D, Tóth K, Jóna Hrafnkelsdóttir Þ, Crowley J, Kearney P, Dalton B, Zahger D, Wolak A, Gabrielli D, Indolfi C, Urbinati S, Imantayeva G, Berkinbayev S, Bajraktari G, Ahmeti A, Berisha G, Erkin M, Saamay A, Erglis A, Bajare I, Jegere S, Mohammed M, Sarkis A, Saadeh G, Zvirblyte R, Sakalyte G, Slapikas R, Ellafi K, El Ghamari F, Banu C, Beissel J, Felice T, Buttigieg SC, Xuereb RG, Popovici M, Boskovic A, Rabrenovic M, Ztot S, Abir-Khalil S, van Rossum AC, Mulder BJM, Elsendoorn MW, Srbinovska-Kostovska E, Kostov J, Marjan B, Steigen T, Mjølstad OC, Ponikowski P, Witkowski A, Jankowski P, Gil VM, Mimoso J, Baptista S, Vinereanu D, Chioncel O, Popescu BA, Shlyakhto E, Oganov R, Foscoli M, Zavatta M, Dikic AD, Beleslin B, Radovanovic MR, Hlivák P, Hatala R, Kaliská G, Kenda M, Fras Z, Anguita M, Cequier Á, Muñiz J, James S, Johansson B, Platonov P, Zellweger MJ, Pedrazzini GB, Carballo D, Shebli HE, Kabbani S, Abid L, Addad F, Bozkurt E, Kayıkçıoğlu M, Erol MK, Kovalenko V, Nesukay E, Wragg A, Ludman P, Ray S, Kurbanov R, Boateng D, Daval G, de Benito Rubio V, Sebastiao D, de Courtelary PT, Bardinet I; 
European Society of Cardiology. European Society of Cardiology: cardiovascular disease statistics 2019. Eur Heart J 2020;41:12–85.
- 3. Roth GA, Mensah GA, Johnson CO, Addolorato G, Ammirati E, Baddour LM, Barengo NC, Beaton AZ, Benjamin EJ, Benziger CP, Bonny A, Brauer M, Brodmann M, Cahill TJ, Carapetis J, Catapano AL, Chugh SS, Cooper LT, Coresh J, Criqui M, DeCleene N, Eagle KA, Emmons-Bell S, Feigin VL, Fernández-Solà J, Fowkes G, Gakidou E, Grundy SM, He FJ, Howard G, Hu F, Inker L, Karthikeyan G, Kassebaum N, Koroshetz W, Lavie C, Lloyd-Jones D, Lu HS, Mirijello A, Temesgen AM, Mokdad A, Moran AE, Muntner P, Narula J, Neal B, Ntsekhe M, Moraes de Oliveira G, Otto C, Owolabi M, Pratt M, Rajagopalan S, Reitsma M, Ribeiro ALP, Rigotti N, Rodgers A, Sable C, Shakil S, Sliwa-Hahnle K, Stark B, Sundström J, Timpel P, Tleyjeh IM, Valgimigli M, Vos T, Whelton PK, Yacoub M, Zuhlke L, Murray C, Fuster V; GBD-NHLBI-JACC Global Burden of Cardiovascular Diseases Writing Group. Global burden of cardiovascular diseases and risk factors, 1990–2019. J Am Coll Cardiol 2020;76:2982–3021.
- 4. Yusuf S, Joseph P, Rangarajan S, Islam S, Mente A, Hystad P, Brauer M, Kutty VR, Gupta R, Wielgosz A, AlHabib KF, Dans A, Lopez-Jaramillo P, Avezum A, Lanas F, Oguz A, Kruger IM, Diaz R, Yusoff K, Mony P, Chifamba J, Yeates K, Kelishadi R, Yusufali A, Khatib R, Rahman O, Zatonska K, Iqbal R, Wei L, Bo H, Rosengren A, Kaur M, Mohan V, Lear SA, Teo KK, Leong D, O'Donnell M, McKee M, Dagenais G. Modifiable risk factors, cardiovascular disease, and mortality in 155 722 individuals from 21 high-income, middle-income, and low-income countries (PURE): a prospective cohort study. Lancet 2020;395:795–808.
- 5. Piepoli MF, Hoes AW, Agewall S, Albus C, Brotons C, Catapano AL, Cooney MT, Corrà U, Cosyns B, Deaton C, Graham I, Hall MS, Hobbs FDR, Løchen ML, Löllgen H, Marques-Vidal P, Perk J, Prescott E, Redon J, Richter DJ, Sattar N, Smulders Y, Tiberi M, van der Worp HB, van Dis I, Verschuren WMM, Binno S, ESC Scientific Document Group, et al. 2016 European Guidelines on cardiovascular disease prevention in clinical practice: The Sixth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (constituted by representatives of 10 societies and by invited experts). Developed with the special contribution of the European Association for Cardiovascular Prevention & Rehabilitation (EACPR). Eur Heart J 2016;37:2315–2381.
- 6. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’Donnell CJ, Robinson JG, Schwartz JS, Shero ST, Smith SC, Sorlie P, Stone NJ, Wilson PWF. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol 2014;63:2935–2959.
- 7. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 2017;357:j2099. doi:10.1136/bmj.j2099.
- 8. D’Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB. General cardiovascular risk profile for use in primary care. Circulation 2008;117:743–753.
- 9. Hajifathalian K, Ueda P, Lu Y, Woodward M, Ahmadvand A, Aguilar-Salinas CA, Azizi F, Cifkova R, Di Cesare M, Eriksen L, Farzadfar F, Ikeda N, Khalili D, Khang Y-H, Lanska V, León-Muñoz L, Magliano D, Msyamboza KP, Oh K, Rodríguez-Artalejo F, Rojas-Martinez R, Shaw JE, Stevens GA, Tolstrup J, Zhou B, Salomon JA, Ezzati M, Danaei G. A novel risk score to predict cardiovascular disease risk in national populations (Globorisk): a pooled analysis of prospective cohorts and health examination surveys. Lancet Diabetes Endocrinol 2015;3:339–355.
- 10. Conroy RM, Pyörälä K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetière P, Jousilahti P, Keil U, Njølstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM; SCORE project group. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003;24:987–1003.
- 11. Shroufi A, Chowdhury R, Anchala R, Stevens S, Blanco P, Han T, Niessen L, Franco OH. Cost effective interventions for the prevention of cardiovascular disease in low and middle income countries: a systematic review. BMC Public Health 2013;13:285.
- 12. Hiligsmann M, Wyers CE, Mayer S, Evers SM, Ruwaard D. A systematic review of economic evaluations of screening programmes for cardiometabolic diseases. Eur J Public Health 2017;27:621–631.
- 13. van Staa T-P, Gulliford M, Ng ES-W, Goldacre B, Smeeth L. Prediction of cardiovascular risk using Framingham, ASSIGN and QRISK2: how well do they predict individual rather than population risk? PLoS One 2014;9:e106455.
- 14. Studziński K, Tomasik T, Krzysztoń J, Jóźwiak J, Windak A. Effect of using cardiovascular risk scoring in routine risk assessment in primary prevention of cardiovascular disease: an overview of systematic reviews. BMC Cardiovasc Disord 2019;19:11.
- 15. Whelton PK, Carey RM, Aronow WS, Casey DE, Collins KJ, Dennison Himmelfarb C, DePalma SM, Gidding S, Jamerson KA, Jones DW, MacLaughlin EJ, Muntner P, Ovbiagele B, Smith SC, Spencer CC, Stafford RS, Taler SJ, Thomas RJ, Williams KA, Williamson JD, Wright JT. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults. J Am Coll Cardiol 2018;71:e127–e248.
- 16. Pevnick JM, Birkeland K, Zimmer R, Elad Y, Kedan I. Wearable technology for cardiology: an update and framework for the future. Trends Cardiovasc Med 2018;28:144–150.
- 17. Ward A, Sarraju A, Chung S, Li J, Harrington R, Heidenreich P, Palaniappan L, Scheinker D, Rodriguez F, et al. Machine learning and atherosclerotic cardiovascular disease risk prediction in a multi-ethnic population. Npj Digit Med 2020;3:1–7.
- 18. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B, Matthews P, Ong G, Pell J, Silman A, Young A, Sprosen T, Peakman T, Collins R. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 2015;12:e1001779.
- 19. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:1–12.
- 20. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), Granada, Spain, 2011.
- 21. Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28.
- 22. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 2011. JMLR: W&CP volume 15, p. 315–323.
- 23. Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-normalizing neural networks. In Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. 2017.
- 24. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. ArXiv Prepr ArXiv:1207.0580. 2012.
- 25. Moody J, Hanson S, Krogh A, Hertz JA. A simple weight decay can improve generalization. Adv Neural Inf Process Syst 1995;4:950–957.
- 26. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France. 2015. JMLR: W&CP volume 37. p. 448–456.
- 27. Kingma DP, Ba J. Adam: A method for stochastic optimization. ArXiv Prepr ArXiv:1412.6980. 2014.
- 28. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–536.
- 29. Pölsterl S. scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J Mach Learn Res 2020;21:1–6.
- 30. Janghorbani M, Momeni F, Dehghani M. Hip circumference, height and risk of type 2 diabetes: systematic review and meta-analysis. Obes Rev 2012;13:1172–1181.
- 31. Corrêa MM, Thumé E, De Oliveira ERA, Tomasi E. Performance of the waist-to-height ratio in identifying obesity and predicting non-communicable diseases in the elderly population: a systematic literature review. Arch Gerontol Geriatr 2016;65:174–182.
- 32. Medina-Inojosa JR, Chacin Suarez A, Saeidifard F, Narayana Gowda S, Robinson J, Aseged K, Lynne J, Zundel J, Bonikowske A, Lopez-Jimenez F. Abstract 16983: validation of 3D volume measurement technology to assess body fat content using biplane imaging from mobile devices. Circulation 2020;142:A16983.
- 33. Friesema IHM, Zwietering PJ, Veenstra MY, Knottnerus JA, Garretsen HFL, Lemmens PHHM. Alcohol intake and cardiovascular disease and mortality: the role of pre-existing disease. J Epidemiol Community Health 2007;61:441–446.
- 34. Tan J, Taskin O, Iews M, Lee AJ, Kan A, Rowe T, Bedaiwy MA. Atherosclerotic cardiovascular disease in women with endometriosis: a systematic review of risk factors and prospects for early surveillance. Reprod Biomed Online 2019;39:1007–1016.
- 35. Brady AR, Fowkes FGR, Thompson SG, Powell JT. Aortic aneurysm diameter and risk of cardiovascular mortality. Arterioscler Thromb Vasc Biol 2001;21:1203–1207.
- 36. Carter AR, Gill D, Davies NM, Taylor AE, Tillmann T, Vaucher J, Wootton RE, Munafò MR, Hemani G, Malik R, Seshadri S, Woo D, Burgess S, Smith GD, Holmes MV, Tzoulaki I, Howe LD, Dehghan A, et al. Understanding the consequences of education inequality on cardiovascular disease: Mendelian randomisation study. BMJ 2019;365:I1855.
- 37. Shmuely S, Lende M V D, Lamberts RJ, Sander JW, Thijs RD. The heart of epilepsy: current views and future concepts. Seizure Eur J Epilepsy 2017;44:176–183.
- 38. Osibogun O, Ogunmoroti O, Spatz ES, Burke GL, Michos ED. Is self-rated health associated with ideal cardiovascular health? The multi-ethnic study of atherosclerosis. Clin Cardiol 2018;41:1154–1163.
- 39. Hare DL, Toukhsati SR, Johansson P, Jaarsma T. Depression and cardiovascular disease: a clinical review. Eur Heart J 2014;35:1365–1372.
- 40. Stamatakis E, Kelly P, Strain T, Murtagh EM, Ding D, Murphy MH. Self-rated walking pace and all-cause, cardiovascular disease and cancer mortality: individual participant pooled analysis of 50 225 walkers from 11 population British cohorts. Br J Sports Med 2018;52:761–768.
- 41. Beyer KMM, Szabo A, Hoormann K, Stolley M. Time spent outdoors, activity levels, and chronic disease among American adults. J Behav Med 2018;41:494–503.
- 42. Heron N, Kee F, Donnelly M, Cardwell C, Tully MA, Cupples ME. Behaviour change techniques in home-based cardiac rehabilitation: a systematic review. Br J Gen Pract 2016;66:e747–e757.
- 43. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, Collins R, Allen NE. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol 2017;186:1026–1034.
- 44. Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, Dehghan A, Muller DC, Elliott P, Tzoulaki I. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA 2020;323:636–645.
- 45. Zeng L, Talukdar HA, Koplev S, Giannarelli C, Ivert T, Gan L-M, Ruusalepp A, Schadt EE, Kovacic JC, Lusis AJ, Michoel T, Schunkert H, Björkegren JLM. Contribution of gene regulatory networks to heritability of coronary artery disease. J Am Coll Cardiol 2019;73:2946–2957.
- 46. Woodward M, Brindle P, Tunstall-Pedoe H; SIGN Group on Risk Estimation. Adding social deprivation and family history to cardiovascular risk assessment: the ASSIGN score from the Scottish Heart Health Extended Cohort (SHHEC). Heart 2007;93:172–176.
- 47. Assmann G, Cullen P, Schulte H. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the prospective cardiovascular Munster (PROCAM) study. Circulation 2002;105:310–315.
- 48. Giampaoli S. CUORE: A Sustainable Cardiovascular Disease Prevention Strategy. Eur J Cardiovasc Prev Rehabil 2007;14:161–162.