Abstract
Aim:
The Framingham Risk Score (FRS) and atherosclerotic cardiovascular disease (ASCVD) Pooled Cohort Equation (PCE) for predicting risk for incident coronary heart disease (CHD) work poorly. To improve risk stratification for CHD, we developed a novel integrated genetic-epigenetic tool.
Materials & methods:
Using machine learning techniques and datasets from the Framingham Heart Study (FHS) and Intermountain Healthcare (IM), we developed and validated an integrated genetic-epigenetic model for predicting 3-year incident CHD.
Results:
Our approach was more sensitive than FRS and PCE and had high generalizability across cohorts. It performed with sensitivity/specificity of 79/75% in the FHS test set and 75/72% in the IM set. The sensitivity/specificity was 15/93% in FHS and 31/89% in IM for FRS, and sensitivity/specificity was 41/74% in FHS and 69/55% in IM for PCE.
Conclusion:
The use of our tool in a clinical setting could better identify patients at high risk for a heart attack.
Keywords: artificial intelligence, coronary heart disease, digital PCR, epigenetics, genetics, machine learning, prevention
Lay abstract
Current lipid-based methods for assessing risk for coronary heart disease (CHD) have limitations. Conceivably, incorporating epigenetic information into risk prediction algorithms may be beneficial, but underlying genetic variation obscures its effects on risk. In order to develop a better CHD risk assessment method, we used artificial intelligence to identify genome-wide genetic and epigenetic biomarkers from two independent datasets of subjects characterized for incident CHD. The resulting algorithm significantly outperformed the current assessment methods in independent test sets. We conclude that artificial intelligence-moderated genetic-epigenetic algorithms have considerable potential as clinical tools for assessing risk for CHD.
Coronary heart disease (CHD) is the most common type of heart disease and was responsible for over 360,000 deaths in the USA in 2017 [1]. In order to decrease this recurring toll, a number of primary prevention risk estimators have been developed to better identify those at risk for CHD. Beginning with the Framingham Risk Score (FRS) and, more recently, the atherosclerotic cardiovascular disease (ASCVD) Pooled Cohort Equation (PCE), these risk stratification tools capture variance in key, potentially treatable parameters, such as serum lipid levels, that are known to be associated with risk for CHD [2,3]. Despite the magnitude of these efforts, current risk scores often lack sensitivity and specificity. As a result, there is a need for alternative primary prevention risk stratification approaches for CHD.
Some of the newer risk prediction strategies take advantage of the rapid advancements in assessing genome-wide genetic or transcriptional variation [4–7]. Although each of these newer approaches has had some success, to date, their clinical impact has been limited. In particular, those algorithms relying only on genetic information have a clear ceiling in predictive capacity and are potentially confounded by ethnic stratification. Finally, because genotype is static, those approaches relying solely on genetic information cannot be used to monitor changes in disease status [8,9].
Recent advances in genome-wide epigenetic profiling techniques have raised the possibility that DNA methylation assessments of peripheral blood DNA may serve as a mechanism for improving prediction of cardiac disease or mortality [10,11]. However, prediction models that only incorporate epigenetic effects may fail to account for confounding genetic variation which affects the vast majority of the environmentally responsive methylome [12]. This failure may result in models that lack robustness with respect to generalizability.
In 2018, we published a proof-of-concept study using data from the Framingham Heart Study (FHS) Offspring cohort outlining an approach that integrates genetic and epigenetic biomarkers. The model was significantly more sensitive than the FRS and PCE risk calculators at predicting risk for incident CHD [13]. One of the limitations of this proof-of-concept study was that the biomarkers were not validated for generalizability in an external cohort. The second limitation was the reliance on genome-wide methylation arrays whose use is both costly and time consuming.
In this study, we utilize improved nonlinear machine learning techniques to identify a new panel of biomarkers that is highly predictive of incident CHD risk and incorporates genetically contextual DNA methylation signatures. Furthermore, we address both of the shortcomings from our prior proof-of-concept study. To address the first limitation, we externally validate this new integrated genetic-epigenetic biomarker panel for incident CHD risk prediction in a highly informative cohort from Intermountain Healthcare (IM). Then, we demonstrate the translation of the array-based methylation assessments into standalone methylation sensitive digital PCR (dPCR) assays as a viable alternative for DNA methylation quantification.
Materials & methods
This study features data and/or biomaterial from two sources. The first set of anonymized genome-wide genetic, genome-wide DNA methylation and clinical data is from the eighth examination cycle of the FHS Offspring cohort. The second set of anonymized clinical data and DNA is from the IM cardiovascular biorepository. The procedures and protocols used for the analysis of the FHS data were approved by the University of Iowa Institutional Review Board (IRB# 201503802). The procedures and protocols used for the analyses of the IM materials were approved by the IM Institutional Review Board (IRB# 1024811).
FHS Offspring cohort
The details on the collection and preparation of clinical and biological data of the FHS cohort have been described previously (dbGAP study accession: phs000007) [14]. Subjects included in this study provided informed consent. In brief, the demographics, risk factors and clinical information were abstracted from the eighth examination of the Offspring cohort, with additional clinical follow-up information from the ninth examination used to determine incident CHD status. Incident CHD was considered present if an individual was diagnosed with CHD within 3 years of the eighth examination cycle. Conversely, incident CHD was considered absent if an individual was not diagnosed with CHD within 3 years of the eighth examination cycle. Data from subjects with prevalent CHD at the eighth examination cycle were excluded from further consideration. Sources of clinical data in determining incident CHD events included subject report, review of medical records and death certificates. The designations and dates of CHD onset used in this study are as determined by a panel of three investigators on the Framingham End Point Review Committee.
Genome-wide DNA methylation profiles generated with the Illumina Infinium HumanMethylation450 BeadChip array (450K; Illumina, CA, USA) were available for 2567 subjects who were phlebotomized at the eighth examination cycle. We performed standard sample- and probe-level quality control as described in previous studies, which resulted in the retention of DNA methylation data from 2560 samples at 403,192 loci [13,15–18]. Genome-wide genotype data obtained using the Affymetrix GeneChip HumanMapping 500K array (Thermo Fisher Scientific, CA, USA) were available for 2406 of the remaining 2560 samples. After standard sample- and probe-level quality control procedures were performed on these array data in PLINK (Harvard University, MA, USA) as described previously, 2295 samples and 472,822 SNPs remained [13,18,19]. A challenge in conducting biological studies of community cohorts such as the FHS is the potential inter-relatedness of some subjects; therefore, the genetic data were subjected to relatedness analysis in PLINK. In total, 1919 subjects had genome-wide methylation data, genotype data and information on incident CHD status.
FHS training and test sets
Based on incident CHD status, 1280 subjects (18/542 males and 10/738 females diagnosed with clinical CHD within 3 years of the eighth examination cycle ascertainment) were part of the training set, and 639 subjects (9/271 males and 5/368 females diagnosed with clinical CHD within 3 years of the eighth examination cycle ascertainment) were part of the test set. The demographics and conventional risk factors of these individuals are summarized in Table 1.
Table 1. Summary of demographics and conventional coronary heart disease risk factors for the 1280, 639 and 159 individuals in the Framingham Heart Study Offspring cohort training set, test set and Intermountain Healthcare test set, respectively.
Characteristic | FHS training, CHD (n = 28) | FHS training, no CHD (n = 1252) | FHS test, CHD (n = 14) | FHS test, no CHD (n = 625) | IM test, CHD (n = 44) | IM test, no CHD (n = 115) |
---|---|---|---|---|---|---|
Gender (count) | ||||||
Males | 18 | 524 | 9 | 262 | 23 | 54 |
Females | 10 | 728 | 5 | 363 | 21 | 61 |
Age (years) | ||||||
Males | 70.6 ± 9.3 | 65.8 ± 8.2 | 66.1 ± 9.1 | 62.7 ± 9.0 | 62.6 ± 14.8 | 61.3 ± 16.5 |
Females | 71.2 ± 10.3 | 66.3 ± 8.5 | 66.8 ± 9.0 | 64.9 ± 9.3 | 64.4 ± 14.3 | 66.2 ± 13.6 |
Total cholesterol (mg/dl) | ||||||
Males | 171 ± 54 | 177 ± 32 | 161 ± 30 | 182 ± 32 | 157 ± 40 | 175 ± 38 |
Females | 229 ± 40 | 199 ± 36 | 185 ± 49 | 197 ± 35 | 193 ± 51 | 179 ± 36 |
HDL cholesterol (mg/dl) | ||||||
Males | 50 ± 16 | 50 ± 14 | 48 ± 12 | 51 ± 15 | 39 ± 10 | 38 ± 12 |
Females | 57 ± 16 | 65 ± 19 | 60 ± 17 | 65 ± 19 | 55 ± 13 | 56 ± 20 |
HbA1c (%) | ||||||
Males | 5.7 ± 0.4 | 5.7 ± 0.8 | 5.8 ± 0.8 | 5.6 ± 0.5 | 6.2 ± 1.0 | 6.1 ± 0.7 |
Females | 6.0 ± 1.0 | 5.7 ± 0.5 | 5.8 ± 0.8 | 5.7 ± 0.6 | 6.6 ± 2.0 | 5.8 ± 1.1 |
SBP (mmHg) | ||||||
Males | 137 ± 15 | 130 ± 17 | 136 ± 11 | 128 ± 16 | 148 ± 22 | 143 ± 24 |
Females | 140 ± 19 | 129 ± 17 | 132 ± 22 | 127 ± 18 | 147 ± 20 | 149 ± 23 |
DBP (mmHg) | ||||||
Males | 74 ± 11 | 76 ± 11 | 74 ± 8 | 77 ± 10 | 82 ± 11 | 84 ± 14 |
Females | 80 ± 11 | 73 ± 10 | 72 ± 7 | 73 ± 10 | 80 ± 10 | 80 ± 12 |
Smoker (count) | ||||||
Males | 1 (6%) | 35 (7%) | 2 (22%) | 16 (6%) | 1 (2%) | 2 (2%) |
Females | 2 (20%) | 57 (8%) | 0 (0%) | 32 (9%) | 3 (7%) | 2 (2%) |
Blood pressure treatment (count) | ||||||
Males | 12 (67%) | 265 (51%) | 4 (44%) | 112 (43%) | 8 (18%) | 17 (15%) |
Females | 3 (30%) | 294 (40%) | 4 (80%) | 157 (43%) | 6 (14%) | 26 (23%) |
CHD: Coronary heart disease; DBP: Diastolic blood pressure; FHS: Framingham Heart Study; HbA1c: Hemoglobin A1c; IM: Intermountain Healthcare; SBP: Systolic blood pressure.
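The text does not spell out how subjects were allocated to the training and test sets, but the counts (28 of 42 cases and 1252 of 1877 controls in training) are consistent with a roughly 2:1 split stratified by incident CHD status. The sketch below illustrates that kind of stratified split under this assumption; the helper name, the one-in-three rule and the random seed are hypothetical.

```python
import random


def stratified_split(labels, one_in=3, seed=0):
    """Assign about 1/one_in of each class to the test set, preserving the
    case:control ratio in both sets (hypothetical helper, assumed procedure)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = len(idx) // one_in  # floor division keeps the counts integral
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy labels with the FHS composition: 42 incident CHD cases (1) and 1877 controls (0)
labels = [1] * 42 + [0] * 1877
train, test = stratified_split(labels)
print(len(train), len(test), sum(labels[i] for i in test))
```

With these toy labels the split yields 1280 training and 639 test subjects, 14 of them cases, which matches the set sizes in Table 1.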
IM cohort
The second de-identified cohort, consisting of 159 subjects, was drawn from the IM Heart Institute INSPIRE registry, in which participants contributed biomaterial and have electronic medical records [20,21]. These subjects were patients who underwent clinically indicated coronary angiography at IM, provided consent to participate in the registry, and for whom both DNA from the time of the catheterization (i.e., index) and clinical follow-up status with respect to incident CHD were available. Incident case and control subjects were drawn from the registry based on several inclusion and exclusion criteria, with prevalent CHD excluded for both groups. An incident case subject was defined as an adult >18 years old who did not have a history of CHD or myocardial infarction (MI) prior to the index coronary angiogram, had no clinical diagnosis of CHD (<50% stenosis) at the index coronary angiography, but had a clinical diagnosis of CHD (>70% stenosis) on angiography, MI, revascularization or death due to CHD within 3 years of index coronary angiography and biomaterial collection. Of the INSPIRE subjects without prevalent CHD (<50% stenosis), 1% met this case definition. A control subject was defined as an adult >18 years old who did not have a history of CHD or MI prior to the index coronary angiogram, had no clinical diagnosis of CHD (<50% stenosis) at the index coronary angiography and had no clinical diagnosis of CHD (>70% stenosis) on angiography, MI, revascularization or death due to CHD within 3 years of index coronary angiography and biomaterial collection. The controls were selected at an approximately 3:1 ratio to the cases and were frequency matched to the case set on age, gender and race.
When available, conventional risk factor values (age, gender, systolic blood pressure [SBP], diastolic blood pressure [DBP], HDL cholesterol level, total cholesterol level, hemoglobin A1c [HbA1c] and smoking status) were also extracted. The blood pressure values were from the admission assessment for the index coronary angiogram. For cholesterol and HbA1c, the first available values within the window from 12 months before to 3 months after catheterization were used.
IM dataset
A total of 159 subjects (23/77 males and 21/82 females diagnosed with clinical CHD within 3 years of the coronary angiogram) were included in this study. Please note that, in contrast to the FHS sample, where high class imbalance is evident (i.e., ∼46:1 ratio of controls:cases), we intentionally selected incident cases for this cohort to ensure better balance between cases and controls (∼3:1 ratio of controls:cases). The demographics and conventional risk factors of these individuals are summarized in Table 1.
Genome-wide DNA methylation and genetic assessments for each of these 159 subjects were conducted by the University of Minnesota Genome Center using the Illumina Infinium MethylationEPIC BeadChip array (EPIC) and the Illumina Infinium Multi-Ethnic Global BeadChip array, respectively. These data were then subjected to the same quality control procedure described above for the FHS cohort. A total of 862,593 methylation loci and 818,046 SNP loci passed quality control. For DNA methylation, we retained loci common to both the Illumina 450K and EPIC arrays, resulting in 437,242 loci being available for further analysis. Similarly, for SNPs, we retained those common to both genotyping arrays, resulting in 80,371 loci being available for further analysis.
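Restricting the analysis to loci shared by both platforms amounts to an ordered set intersection of probe or SNP identifiers. A minimal sketch follows; the function name is hypothetical, and apart from the three probes named in the Results, the identifiers are placeholders rather than real manifest entries.

```python
def common_loci(reference, *others):
    """Return identifiers present on every platform, in the reference list's order."""
    shared = set(reference).intersection(*(set(ids) for ids in others))
    return [locus for locus in reference if locus in shared]


# Placeholder probe lists standing in for the 450K and EPIC manifests
hm450 = ["cg00300879", "cg09552548", "cg14789911", "placeholder_450k_only"]
epic = ["cg14789911", "placeholder_epic_only", "cg00300879", "cg09552548"]
print(common_loci(hm450, epic))
```

Keeping the reference order makes downstream feature matrices from both cohorts line up column by column.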
Integrated genetic-epigenetic incident CHD risk prediction model
One of the aims of this study is to translate array-based methylation loci into potentially clinically implementable dPCR assays, which have fixed constraints on precision [22]. Therefore, prior to data mining, which used data from the FHS training set exclusively, we restricted the methylation variables to loci whose Δβ (the absolute difference in β value between cases and controls) was at least 0.03. The β values of the retained loci were converted into M-values and subsequently scaled to zero mean and unit variance.
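The filtering and transformation steps can be sketched with the Python standard library as follows. Two details are assumptions rather than statements from the text: Δβ is taken as the absolute difference in mean β between cases and controls, and β values are clipped away from 0 and 1 before the M-value transform M = log2(β / (1 − β)) so that the logarithm stays finite. Scaling uses the population standard deviation, which is also what scikit-learn's StandardScaler uses. All names are hypothetical.

```python
import math
from statistics import mean, pstdev


def delta_beta(betas, labels):
    """Absolute difference in mean beta value between cases (1) and controls (0)."""
    cases = [b for b, y in zip(betas, labels) if y == 1]
    controls = [b for b, y in zip(betas, labels) if y == 0]
    return abs(mean(cases) - mean(controls))


def beta_to_m(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)); eps clipping (an assumption) avoids log(0)."""
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def preprocess(beta_matrix, labels, min_delta=0.03):
    """Keep loci with delta-beta >= min_delta; return standardized M-values per locus."""
    kept = {}
    for locus, betas in beta_matrix.items():
        if delta_beta(betas, labels) >= min_delta:
            kept[locus] = standardize([beta_to_m(b) for b in betas])
    return kept


# Toy data: two cases followed by two controls
labels = [1, 1, 0, 0]
beta_matrix = {
    "locus_a": [0.60, 0.62, 0.50, 0.52],  # delta-beta of about 0.10: kept
    "locus_b": [0.40, 0.41, 0.40, 0.40],  # delta-beta of about 0.005: dropped
}
out = preprocess(beta_matrix, labels)
print(list(out))
```

In a real pipeline, the means and standard deviations used for scaling would be estimated in the training set and reused unchanged for the test sets.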
Figure 1 illustrates the approach implemented for preprocessing, data mining, modeling and developing the scoring system after genome-wide DNA methylation and genotype quality control. All data mining, feature selection, model development and model tuning were performed exclusively on the FHS training set. We integrated the methylation loci with the SNPs to mine for integrated genetic-epigenetic biomarkers that are highly predictive of 3-year risk for incident CHD. Our data mining approach has been outlined in previous publications [13,18]. All analyses were performed in Python version 3.7 (Python Software Foundation, DE, USA) [23]. Briefly, using scikit-learn, we implemented an undersampling-based approach to account for the high class imbalance and coupled it to an ensemble of machine learning algorithms (Random Forest, Support Vector Machine and Logistic Regression) that incorporated cross-validation to uncover nonlinear methylation–SNP interactions and highly predictive biosignatures in the FHS training set [24,25]. The algorithm was parallelized using the Message Passing Interface to reduce computation time and was run on a high-performance computing system with 256 cores for 2 months [26–28]. Based on the cross-validation performance of different marker combinations in the FHS training set, we selected a marker set consisting of three DNA methylation loci, which were translated into standalone dPCR assays, and five SNPs; this set had the best combined performance with respect to area under the receiver operating characteristic curve (AUC), sensitivity and specificity [29]. The ensemble model consisting of these eight biomarkers underwent hyperparameter tuning and was finalized for testing. The final trained integrated genetic-epigenetic model was then applied to the FHS test set and the IM dataset (IM test) to determine the AUC, sensitivity and specificity in these sets.
Figure 1. Schematic of the integrated genetic-epigenetic approach.
dPCR: Digital PCR; FHS: Framingham Heart Study; IM: Intermountain Healthcare; QC: Quality control; ROC: Receiver operating characteristic.
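The undersampling-plus-ensemble idea described above can be illustrated with a dependency-free sketch. It assumes one common variant of random undersampling (every case plus an equal number of randomly drawn controls per bag) and soft voting over all fitted models; the authors' exact scheme, their cross-validation loop and their scikit-learn Random Forest, Support Vector Machine and Logistic Regression learners are not reproduced. The `centroid_learner` below is a hypothetical toy stand-in for those learners.

```python
import random
from statistics import mean


def balanced_bags(labels, n_bags, seed=0):
    """Random undersampling: each bag holds every case (1) plus an equal-sized
    random sample of controls (0)."""
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    return [cases + rng.sample(controls, len(cases)) for _ in range(n_bags)]


def fit_ensemble(X, labels, learners, n_bags=10, seed=0):
    """Fit every learner factory on every balanced bag. Each factory takes
    (X_bag, y_bag) and returns a function mapping one sample to P(case)."""
    models = []
    for bag in balanced_bags(labels, n_bags, seed):
        X_bag = [X[i] for i in bag]
        y_bag = [labels[i] for i in bag]
        models.extend(make(X_bag, y_bag) for make in learners)
    return models


def predict_proba(models, x):
    """Soft voting: average the case probabilities of all fitted models."""
    return mean(model(x) for model in models)


def centroid_learner(X_bag, y_bag):
    """Toy learner (hypothetical): nearest class centroid on the first feature."""
    c_case = mean(x[0] for x, y in zip(X_bag, y_bag) if y == 1)
    c_ctrl = mean(x[0] for x, y in zip(X_bag, y_bag) if y == 0)
    return lambda x: 1.0 if abs(x[0] - c_case) < abs(x[0] - c_ctrl) else 0.0


# Imbalanced toy data: 5 cases with high values, 45 controls with low values
X = [[5.0 + 0.1 * i] for i in range(5)] + [[float(i % 3)] for i in range(45)]
y = [1] * 5 + [0] * 45
models = fit_ensemble(X, y, learners=[centroid_learner], n_bags=7)
print(predict_proba(models, [5.1]), predict_proba(models, [0.0]))
```

Because each bag is balanced, no single model is dominated by the control class, and averaging across bags lets the ensemble still see most of the controls.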
Polygenic risk score
To compare the performance of our model with that of a polygenic risk score (PRS) for incident CHD risk prediction, we used Python to calculate PRS from the summary statistics of a genome-wide meta-analysis of CHD performed in 60,801 cases and 123,504 controls [30]. Because only 80,371 SNPs overlapped between the Affymetrix array used to profile FHS subjects and the Multi-Ethnic Global BeadChip array (Illumina, CA, USA) used to profile IM subjects, we modeled PRS using the 57,647 overlapping SNPs that also had a corresponding CHD-associated log odds ratio. For each subject, PRS was calculated by multiplying the number of risk alleles at each SNP by that SNP's log odds ratio and summing these products across all SNPs. Using undersampling-based logistic regression to account for class imbalance, a PRS model was fitted in the FHS training set and tested on the FHS and IM test sets. The AUC, sensitivity and specificity of this model were evaluated in each of these datasets.
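In symbols, the score described above is PRS_i = Σ_j g_ij × β_j, where g_ij (0, 1 or 2) is subject i's number of risk alleles at SNP j and β_j is that SNP's CHD log odds ratio from the meta-analysis. A minimal Python sketch follows; the SNP identifiers and effect sizes are hypothetical.

```python
def polygenic_risk_score(risk_allele_counts, log_odds_ratios):
    """Sum over SNPs of (number of risk alleles, 0 to 2) x (CHD log odds ratio).
    SNPs without a log odds ratio are skipped, mirroring the restriction to
    overlapping SNPs that have summary statistics."""
    return sum(
        count * log_odds_ratios[snp]
        for snp, count in risk_allele_counts.items()
        if snp in log_odds_ratios
    )


# Hypothetical summary statistics (log odds ratios) and one subject's genotypes
log_odds_ratios = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
subject = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 2}  # snp_d has no statistic
prs = polygenic_risk_score(subject, log_odds_ratios)
print(round(prs, 4))
```

The resulting per-subject scores would then serve as the single input feature of the undersampling-based logistic regression described above.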
Survival analysis & prognostic scores
Using data from the FHS and IM test sets, a Kaplan–Meier survival curve was fitted to display the time to an incident CHD event within 3 years as a function of risk group (high vs low) membership, as predicted by our integrated genetic-epigenetic model. The y-axis represents the probability of not having an incident CHD event within 3 years. The 95% CI for each distribution was calculated, and the distributions of the high- and low-risk groups were compared using the log-rank test [31].
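The paper reports using the lifelines library for the Cox analysis; to show what the Kaplan–Meier (product-limit) estimator and the two-group log-rank test compute, here is a dependency-free sketch. It returns point estimates only (no 95% CIs), keeps subjects censored at an event time in the risk set for that time, and converts the log-rank statistic to a p-value using the chi-square distribution with 1 degree of freedom. The function names and toy data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate of S(t). events[i] is 1 for an incident CHD event
    and 0 for censoring. Returns (time, survival) at each distinct event time."""
    data = list(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    for t in sorted(set(times)):
        d = sum(e for tt, e in data if tt == t)    # events at time t
        n_t = sum(1 for tt, _ in data if tt == t)  # subjects leaving the risk set at t
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(e for tt, e, _ in pooled if tt == t)
        d_a = sum(e for tt, e, g in pooled if tt == t and g == 0)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:  # hypergeometric variance contribution
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Toy high-risk group with early events versus a low-risk group censored at 3 years
high = ([0.5, 1.0, 1.5, 2.0], [1, 1, 1, 1])
low = ([3.0, 3.0, 3.0, 3.0], [0, 0, 0, 0])
chi2, p = logrank_test(*high, *low)
print(kaplan_meier(*high), round(p, 4))
```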
To generate clinical prognostic scores, we then performed Cox proportional hazards analyses of the aforementioned integrated genetic-epigenetic score using the lifelines Python library [32]. The genetic-epigenetic score satisfied the proportional hazards assumption. The hazard ratio for the score and its corresponding 95% CI were estimated. Using the 3-year incident CHD probabilities, we then derived three clinical prognostic scores (score 1 = low risk, score 2 = intermediate risk and score 3 = high risk). Specifically, the low-, intermediate- and high-risk groups corresponded to 3-year incident CHD probabilities of <0.05, 0.05–0.19 and ≥0.20, respectively. Kaplan–Meier survival curves were fitted for these prognostic scores alongside their respective 95% CIs and compared using the log-rank test.
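Under a Cox model, the probability of an incident CHD event by 3 years for a subject with score x is 1 − S0(3)^exp(βx), where S0(3) is the baseline 3-year survival at a score of zero and exp(β) is the hazard ratio per unit of score. The sketch below converts a score into that probability and then into the three-level prognostic score. The baseline survival of 0.97 is a hypothetical placeholder, β = ln(3.14) borrows the hazard ratio reported in the Results purely for illustration, and probabilities between 0.19 and 0.20 are assigned to the intermediate group, which is an assumption because the stated bands leave that interval unassigned.

```python
import math


def three_year_probability(score, beta, baseline_survival_3y):
    """Cox model: P(incident CHD by 3 years) = 1 - S0(3) ** exp(beta * score)."""
    return 1 - baseline_survival_3y ** math.exp(beta * score)


def prognostic_score(probability):
    """Map a 3-year incident CHD probability to the clinical prognostic score."""
    if probability < 0.05:
        return 1  # low risk
    if probability < 0.20:
        return 2  # intermediate risk (0.05 up to, but excluding, 0.20)
    return 3      # high risk


beta = math.log(3.14)  # illustrative: log of the reported hazard ratio
s0 = 0.97              # hypothetical baseline 3-year survival at score 0
for score in (0.0, 1.0, 2.0):
    p = three_year_probability(score, beta, s0)
    print(score, round(p, 3), prognostic_score(p))
```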
Conventional risk factors-based model
To compare the performance of our model with that of two commonly used conventional risk factor-based models, FRS and PCE, we implemented these risk calculators in both cohorts to identify those at high risk for incident CHD (≥20%). The variables used in this analysis included age, gender, total cholesterol, HDL cholesterol, SBP, DBP, diabetes status, smoking status and blood pressure treatment status. Individuals with missing values and those with values outside the allowed range (e.g., for PCE, age must be between 20 and 79 years) were excluded from this analysis.
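Applying the calculators as described reduces to three steps: drop ineligible subjects, flag those whose calculated risk is at least 20% and compare the flags with observed incident CHD. A minimal sketch follows, assuming each subject's FRS or PCE risk has already been computed (the calculators' own equations are not reproduced) and using the PCE age range as the only eligibility rule; the field names are hypothetical.

```python
def evaluate_calculator(subjects, threshold=0.20, min_age=20, max_age=79):
    """Return (sensitivity, specificity, n_evaluated) for a precomputed risk score.
    Subjects with a missing risk or age, or an age outside [min_age, max_age],
    are excluded before classification."""
    eligible = [
        s for s in subjects
        if s.get("risk") is not None
        and s.get("age") is not None
        and min_age <= s["age"] <= max_age
    ]
    tp = sum(1 for s in eligible if s["chd"] and s["risk"] >= threshold)
    fn = sum(1 for s in eligible if s["chd"] and s["risk"] < threshold)
    tn = sum(1 for s in eligible if not s["chd"] and s["risk"] < threshold)
    fp = sum(1 for s in eligible if not s["chd"] and s["risk"] >= threshold)
    return tp / (tp + fn), tn / (tn + fp), len(eligible)


subjects = [
    {"age": 65, "risk": 0.25, "chd": True},   # true positive
    {"age": 70, "risk": 0.12, "chd": True},   # false negative
    {"age": 60, "risk": 0.08, "chd": False},  # true negative
    {"age": 58, "risk": 0.22, "chd": False},  # false positive
    {"age": 55, "risk": 0.04, "chd": False},  # true negative
    {"age": 82, "risk": 0.30, "chd": True},   # excluded: outside PCE age range
    {"age": 66, "risk": None, "chd": False},  # excluded: missing value
]
print(evaluate_calculator(subjects))
```

Because exclusions happen before the counts are taken, each calculator is evaluated on its own eligible subset, so its sample can differ from the one used for the genetic-epigenetic model.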
To determine whether adding conventional CHD risk factors to our integrated genetic-epigenetic incident CHD risk prediction model (Epi+Gen CHD) could improve performance (i.e., increase AUC), we added FRS and PCE to our final trained model and evaluated its performance in the FHS and IM test sets.
Digital PCR assay development
Array-based clinical testing can be time-consuming and costly. Fortunately, easily performed, commercially available fluorescent primer probe assays (e.g., TaqMan®, Applied Biosystems, MA, USA) can be used to genotype SNPs of interest. In contrast, there are limited options for profiling methylation loci of interest for clinical tests in a timely and cost-effective manner. To demonstrate that our approach can be used in a clinical setting, we translated the array-based methylation biomarkers in our model into nested primer, fluorescent dPCR assays similar to those previously developed for assessing cigarette or alcohol consumption [33,34]. In brief, we first bisulfite converted DNA from each subject in the IM cohort using the EpiTect Bisulfite Kit (Qiagen, Hilden, Germany). Next, the bisulfite-converted DNA was subjected to 14 cycles of high-stringency PCR amplification of the target region using a set of amplicon-specific proprietary primers. An aliquot of the enriched amplicon target solution was then diluted 1:1500, mixed with our custom, proprietary primer probe sets and droplet dPCR reagents, partitioned into droplets with a Bio-Rad droplet generator (Bio-Rad, CA, USA) and PCR amplified. The methylation status of each droplet was then determined using a Bio-Rad QX-200 Reader, and the percent methylation of each sample was calculated using the QuantaSoft™ software (Bio-Rad). The relationship between the dPCR values and their respective Illumina array values was then determined using Pearson correlation.
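The two quantitative steps at the end of this workflow can be sketched as follows. The text does not describe how QuantaSoft derives percent methylation, so the first part is an assumption based on the standard droplet dPCR approach: the fraction of positive droplets in each channel is Poisson-corrected to mean copies per droplet, λ = −ln(1 − positive/total), and percent methylation is the methylated share of the total. The Pearson correlation is the usual sample coefficient. Function names and droplet counts are hypothetical.

```python
import math
from statistics import mean


def copies_per_droplet(positive, total):
    """Poisson-corrected mean target copies per droplet from the positive fraction."""
    return -math.log(1 - positive / total)


def percent_methylation(meth_positive, unmeth_positive, total):
    """Methylated share of all target copies, using one channel per allele state."""
    lam_m = copies_per_droplet(meth_positive, total)
    lam_u = copies_per_droplet(unmeth_positive, total)
    return 100 * lam_m / (lam_m + lam_u)


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical droplet counts for four samples (methylated, unmethylated, total)
droplets = [(1500, 9000, 15000), (4000, 7000, 15000), (7000, 4500, 15000), (9500, 2000, 15000)]
dpcr = [percent_methylation(m, u, t) for m, u, t in droplets]
array_beta = [0.12, 0.35, 0.60, 0.84]  # hypothetical array beta values
print([round(v, 1) for v in dpcr], round(pearson_r(dpcr, array_beta), 3))
```

Pearson correlation is unaffected by the difference in scale between percent methylation (0 to 100) and array β values (0 to 1), so no rescaling is needed before comparing them.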
Results
The clinical and demographic characteristics of the FHS and IM cohorts are outlined in Table 1. The average age of subjects in the FHS and IM cohorts was in the mid and early 60s, respectively, with the age range in both cohorts extending from at least the lower 40s to the upper 80s. All subjects in the FHS cohort were of European ancestry but at least ten of the subjects in the IM cohort were of non-European ancestry. The most notable difference between the two cohorts was with respect to gender. The FHS cohort had more females than males, while the IM subjects were intentionally selected to maintain gender balance in the cohort. Of the conventional risk factors included in this study (SBP, DBP, total cholesterol, HDL cholesterol and HbA1c), in the FHS cohort, only SBP (p = 0.003) and HDL cholesterol (p = 0.029) were statistically significantly different between cases and controls at the 0.05 significance level. None of the conventional risk factors were statistically significantly different between cases and controls in the IM cohort. The average time to event was similar between FHS and IM at 1.5 ± 0.7 and 1.1 ± 1.0 years, respectively.
Integrated genetic-epigenetic incident CHD risk prediction model
Using genome-wide SNP and methylation data from the 1280 subjects in the FHS training set, we built a prediction model, Epi+Gen CHD, to identify those at high risk of having an incident CHD event within 3 years. Epi+Gen CHD consisted of a total of eight biomarkers, three of which were DNA methylation biomarkers and the remaining five were SNPs. The three methylation loci are cg00300879 (transcription start site [TSS200] of CNKSR1), cg09552548 (intergenic) and cg14789911 (body of SPATC1L), while the five SNPs are rs11716050 (LOC105376934), rs6560711 (WDR37), rs3735222 (SCIN/LOC107986769), rs6820447 (intergenic) and rs9638144 (ESYT2). It performed with an AUC, sensitivity, and specificity of 0.90, 0.85 and 0.75, respectively, when evaluated with the same FHS training set.
This model was then evaluated in the FHS and IM test sets. The AUC, sensitivity, and specificity of the final model in these sets are summarized in Table 2. The receiver operating characteristic curves are shown in Figure 2. The AUC, sensitivity and specificity in the FHS test set are 0.84, 0.79 and 0.75, respectively. Similarly, the AUC, sensitivity and specificity in the IM test set are 0.76, 0.75 and 0.72, respectively. The average AUC, sensitivity and specificity across these test sets are 0.80, 0.77 and 0.74, respectively. These performance metrics indicated good generalizability of the trained model to the FHS test set and to the external IM cohort.
Table 2. Performance of our integrated genetic-epigenetic model in the Framingham Heart Study and Intermountain Healthcare cohorts.
Dataset | AUC | Sensitivity | Specificity |
---|---|---|---|
FHS training | |||
Overall | 0.90 | 0.85 | 0.75 |
Males | 0.90 | 0.89 | 0.76 |
Females | 0.89 | 0.80 | 0.74 |
FHS test | |||
Overall | 0.84 | 0.79 | 0.75 |
Males | 0.82 | 0.78 | 0.77 |
Females | 0.88 | 0.80 | 0.73 |
IM test | |||
Overall | 0.76 | 0.75 | 0.72 |
Males | 0.77 | 0.74 | 0.72 |
Females | 0.75 | 0.76 | 0.72 |
AUC: Area under the receiver operating characteristic curve; FHS: Framingham Heart Study; IM: Intermountain Healthcare.
Figure 2. Receiver operating characteristic curves of the integrated genetic-epigenetic model for 3-year incident coronary heart disease risk prediction in the Framingham Heart Study training, Framingham Heart Study test and Intermountain Healthcare cohorts.
FHS: Framingham Heart Study; IM: Intermountain Healthcare.
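For reference, the three metrics reported throughout can be computed as shown below: sensitivity and specificity from binary high/low-risk calls, and AUC through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half). This is a minimal standard-library sketch, not the authors' code, and the example data are made up. The cross-set averages in the text are simple arithmetic means of the two test-set values; for AUC, for example, (0.84 + 0.76) / 2 = 0.80.

```python
def sensitivity_specificity(labels, predictions):
    """labels and predictions are 1 (case / high risk) or 0 (control / low risk)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Rank-based AUC: share of case-control pairs in which the case scores higher."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1, 0.3]
predictions = [1 if s >= 0.45 else 0 for s in scores]  # hypothetical cut-off
print(auc(labels, scores), sensitivity_specificity(labels, predictions))
```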
We then evaluated the performance of Epi+Gen CHD by gender. These results are also summarized in Table 2. Performance metrics were similar for men and women and across cohorts, indicating the generalizability of our tool. For men, the average AUC, sensitivity and specificity of Epi+Gen CHD across the FHS and IM test sets are 0.80, 0.76 and 0.75, respectively. For women, the average AUC, sensitivity and specificity of Epi+Gen CHD across the FHS and IM test sets are 0.82, 0.78 and 0.73, respectively.
Survival curves & prognostic scores
The Kaplan–Meier survival curves for the high- and low-risk groups are shown in Figure 3. For those with poor prognosis (i.e., at higher risk of having an incident CHD event within 3 years), there is a clear, rapid drop in the probability of not having an incident event compared with the good prognosis group (i.e., at lower risk of having an incident CHD event within 3 years). The log-rank test p-value of 2.5e-16 between these two groups indicates a statistically significant difference between their distributions.
Figure 3. Kaplan–Meier survival curve of the high- and low-risk groups based on binary classification model using data from the Framingham Heart Study test and Intermountain Healthcare cohorts.
Using the Cox proportional hazards model, the integrated genetic-epigenetic score significantly predicted 3-year risk for CHD (hazard ratio = 3.14; 95% CI: 2.33–4.23; p < 0.005). We then derived the clinical prognostic score of 1, 2 and 3 to indicate low-, intermediate- and high-risk groups, respectively. The Kaplan–Meier survival curve for these scores is shown in Figure 4. Once again, there is a clear rapid drop in probability of not having an incident event within 3 years with a high prognostic score. The log-rank test p-value between these groups is 5.4e-20, indicating a statistically significant difference between their distributions.
Figure 4. Kaplan–Meier survival curve for high, intermediate and low prognostic scores based on Cox proportional hazards model from the Framingham Heart Study test and Intermountain Healthcare cohorts.
Polygenic risk score
To better understand the performance of a model that only incorporates SNPs (i.e., PRS) for incident CHD risk prediction compared with Epi+Gen CHD, which integrates methylation biomarkers with SNPs, we trained and tested a PRS model. Using only SNPs that overlapped between the FHS and IM cohorts, the PRS model was trained in the FHS training set and performed with an AUC, sensitivity and specificity of 0.54, 0.50 and 0.59, respectively, in the FHS test set. Similarly, it performed with an AUC, sensitivity and specificity of 0.43, 0.30 and 0.57, respectively, in the IM test set.
Conventional risk factors-based model
Because of missing values and constraints in the FRS and PCE risk calculators (e.g., with respect to age), only subjects with complete data were evaluated. The performance of these models across both cohorts and the breakdown by gender are summarized in Table 3. On average, FRS performed with a sensitivity and specificity of 0.23 and 0.91, respectively, across the FHS and IM cohorts. The PCE risk estimator, on average, performed with a sensitivity and specificity of 0.55 and 0.65, respectively, across the FHS and IM cohorts. In both cohorts, FRS had higher specificity but lower sensitivity than PCE. With respect to gender, FRS tended to perform with better specificity for both men and women, while PCE tended to perform better with respect to sensitivity. On average, across both cohorts, Epi+Gen CHD was 54 and 53 percentage points more sensitive than FRS for men and women, respectively. Similarly, compared with PCE, it was 11 and 41 percentage points more sensitive, and 11 and 7 percentage points more specific, for men and women, respectively. The average performance comparisons across cohorts, overall and by gender, are summarized in Figure 5.
Table 3. Performance of the Framingham Risk Score and atherosclerotic cardiovascular disease Pooled Cohort risk estimators in the Framingham Heart Study and Intermountain Healthcare cohorts.
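Risk estimator | Dataset | Group | Sensitivity | Specificity |
---|---|---|---|---|
Framingham Risk Score | FHS | Overall | 0.15 | 0.93 |
Framingham Risk Score | FHS | Males | 0.12 | 0.86 |
Framingham Risk Score | FHS | Females | 0.22 | 0.98 |
Framingham Risk Score | IM | Overall | 0.31 | 0.89 |
Framingham Risk Score | IM | Males | 0.33 | 0.85 |
Framingham Risk Score | IM | Females | 0.29 | 0.93 |
ASCVD Pooled Cohort Equation | FHS | Overall | 0.41 | 0.74 |
ASCVD Pooled Cohort Equation | FHS | Males | 0.52 | 0.66 |
ASCVD Pooled Cohort Equation | FHS | Females | 0.18 | 0.81 |
ASCVD Pooled Cohort Equation | IM | Overall | 0.69 | 0.55 |
ASCVD Pooled Cohort Equation | IM | Males | 0.78 | 0.61 |
ASCVD Pooled Cohort Equation | IM | Females | 0.57 | 0.50 |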
ASCVD: atherosclerotic cardiovascular disease; FHS: Framingham Heart Study; IM: Intermountain Healthcare.
Figure 5. Comparison of the average performance of the integrated genetic-epigenetic model, Framingham Risk Score and Pooled Cohort Equation models across the Framingham Heart Study and Intermountain Healthcare cohorts. (A) Sensitivity, (B) specificity.
FHS: Framingham Heart Study; IM: Intermountain Healthcare; PCE: Pooled Cohort Equation.
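The gender-specific comparisons with FRS and PCE are differences in percentage points between cross-cohort means (the mean of the FHS test and IM values in Tables 2 and 3), not relative percentages. The short script below reproduces them from the tabulated values; against FRS it gives 53.5 and 52.5 points, which the text rounds to 54 and 53.

```python
from statistics import mean

# (FHS test, IM test) values by gender, copied from Tables 2 and 3
epi_gen = {"men": {"sens": (0.78, 0.74), "spec": (0.77, 0.72)},
           "women": {"sens": (0.80, 0.76), "spec": (0.73, 0.72)}}
frs = {"men": {"sens": (0.12, 0.33)}, "women": {"sens": (0.22, 0.29)}}
pce = {"men": {"sens": (0.52, 0.78), "spec": (0.66, 0.61)},
       "women": {"sens": (0.18, 0.57), "spec": (0.81, 0.50)}}


def gain_in_points(model, reference, group, metric):
    """Difference between cross-cohort means, expressed in percentage points."""
    return 100 * (mean(model[group][metric]) - mean(reference[group][metric]))


for ref_name, ref, metric in [("FRS", frs, "sens"), ("PCE", pce, "sens"), ("PCE", pce, "spec")]:
    for group in ("men", "women"):
        gain = gain_in_points(epi_gen, ref, group, metric)
        print(f"{metric} vs {ref_name}, {group}: {gain:+.1f} points")
```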
Model comparisons
In order to determine whether adding PRS, FRS or PCE to Epi+Gen CHD could potentially improve risk stratification, we compared the average AUCs of Epi+Gen CHD with and without each of these features. These comparisons are also summarized in Figure 6. The addition of PRS or FRS did not increase the average AUC of Epi+Gen CHD, whereas the addition of PCE increased it from 0.83 to 0.84.
Figure 6. Average area under the curve across Framingham Heart Study and Intermountain Healthcare cohorts of integrated genetic-epigenetic model compared with models with the addition of polygenic risk score, Framingham Risk Score and Pooled Cohort Equation.
AUC: Area under curve; CHD: Coronary heart disease; FHS: Framingham Heart Study; FRS: Framingham Risk Score; IM: Intermountain Healthcare; PCE: Pooled Cohort Equation; PRS: Polygenic risk score.
dPCR assays
Because one of the goals of this study is to demonstrate the applicability of our tool in conventional clinical or research settings, we translated the time-consuming, labor-intensive genome-wide methylation approach into simple, quick-to-perform methylation-sensitive dPCR assays for the methylation loci included in the final model. Figure 7 shows the translation of the three methylation sites in our prediction model. The Pearson correlations between the dPCR-determined methylation values and the corresponding array values for cg00300879, cg09552548 and cg14789911 are 0.94, 0.93 and 0.93, respectively. The high correlation suggests that dPCR is a viable alternative to array-based DNA methylation assessment.
Figure 7. Pearson correlation between digital PCR DNA methylation values and corresponding array DNA methylation values for (A) cg00300879, (B) cg09552548 and (C) cg14789911 markers in the Intermountain Healthcare cohort.
dPCR: Digital PCR.
Discussion
In this study, we describe the development and external validation of an integrated SNP and DNA methylation biomarker-based risk assessment model that can identify those at risk for an incident CHD event within 3 years. Our risk estimator, Epi+Gen CHD, performed similarly in the FHS and IM cohorts, indicating good generalizability across cohorts. This approach also addresses two major limitations of the FRS and PCE risk calculators. The first is their overall lack of sensitivity, that is, the ability of a model to correctly identify those who go on to have an incident event (the true positive rate). In the IM cohort, the average sensitivity of these risk calculators suggests that they correctly identify about 50 of every 100 individuals who go on to have an incident event. In contrast, the sensitivity of Epi+Gen CHD suggests that it correctly identifies about 75 of every 100 such individuals.
The second limitation of the FRS and PCE risk calculators is their uneven performance by gender. Again, the average gender-specific sensitivity of these risk calculators in the IM cohort suggests that they correctly identify about 56 of 100 men and 43 of 100 women who go on to have an incident event. Sensitivity is low overall and especially low in women. We are not the first to highlight the low sensitivity of current risk calculators and their poorer performance in women [35,36]. In contrast, the gender-specific sensitivity of Epi+Gen CHD in the IM cohort suggests that it correctly identifies about 74 of 100 men and 76 of 100 women who go on to have an incident event, so our tool does not exhibit a gender gap. Given this marked improvement in sensitivity across both genders, we believe that using our tool to stratify patients at risk for an incident CHD event within 3 years may support better-informed and more timely clinical decisions about prevention. This tool may also be useful in clinical trials that seek to understand risk profiles in both men and women.
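The arithmetic behind these sensitivity comparisons can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code. The function names are hypothetical. The confusion counts are chosen only to mirror the reported IM sensitivity/specificity of 75/72%, and the sex-stratified records are invented toy data.

```python
from collections import defaultdict


def sensitivity(tp, fn):
    """True positive rate: share of subjects with an incident event flagged high risk."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of event-free subjects flagged low risk."""
    return tn / (tn + fp)


def sensitivity_by_group(records):
    """Sensitivity computed separately within each group (e.g., sex).

    records: iterable of (group, had_incident_event, flagged_high_risk).
    Event-free subjects do not enter sensitivity and are skipped.
    """
    cases = defaultdict(int)
    true_positives = defaultdict(int)
    for group, had_event, flagged in records:
        if had_event:
            cases[group] += 1
            true_positives[group] += int(flagged)
    return {group: true_positives[group] / cases[group] for group in cases}


# IM-cohort sensitivities (%) reported in the Abstract.
frs_im, pce_im, epigen_im = 31, 69, 75
calculator_average = (frs_im + pce_im) / 2  # "about 50 out of 100"
print(f"FRS/PCE average sensitivity: {calculator_average:.0f}%; Epi+Gen CHD: {epigen_im}%")

# Hypothetical confusion counts for 100 incident cases and 100 non-cases.
print(sensitivity(tp=75, fn=25), specificity(tn=72, fp=28))

# Hypothetical records (NOT study data): (sex, incident CHD?, flagged high risk?)
records = [
    ("male", True, True), ("male", True, True), ("male", True, False),
    ("female", True, True), ("female", True, False), ("female", True, False),
    ("female", False, True),
]
print(sensitivity_by_group(records))
```

In the toy records, the model flags two of three male cases but only one of three female cases. A single overall sensitivity would hide this kind of gap, which is why the comparison above is reported separately for men and women.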
Assessing the generalizability of the integrated genetic-epigenetic approach across cohorts was especially important. The FHS cohort was drawn from the general, mostly asymptomatic population, whereas the IM cohort was drawn from a clinical population in whom symptoms consistent with significant angina, or other cardiovascular risk factors, warranted angiography. These two patient populations pose distinct challenges for identifying and appropriately treating individuals at high and low risk for CHD. This is not a trivial challenge: nearly two-thirds of all sudden cardiac deaths caused by CHD occur either as the first manifestation of disease or in patients who have been evaluated but judged to be at low risk for an event [37]. We therefore believe that our test, which identified these asymptomatic individuals with high sensitivity, could flag those who need further testing, such as myocardial perfusion imaging, and more aggressive medical management. Ultimately, our goal is to reduce the number of individuals whose first indication of CHD is sudden cardiac death or irreversible loss of cardiac function. In asymptomatic patients who do not have severe CHD, identifying those at high risk of a future event, whether secondary to small vessel disease, unstable or ruptured plaque and/or increased stenosis, could indicate the need for more intensive medical therapy and behavioral modification. Conversely, the specificity of Epi+Gen CHD could help avoid unnecessary treatment and reduce healthcare expenditures. Figure 8 shows a schematic of how Epi+Gen CHD could be used in a clinical setting. This risk assessment could be administered entirely remotely through telemedicine and at-home sampling.
Figure 8. Schematic of the use of integrated genetic-epigenetic incident coronary heart disease risk prediction model in a clinical setting.
AI: Artificial intelligence; dPCR: Digital PCR.
We believe that our DNA-based multimarker approach can capture and help explain the complex nature of CHD from three angles: genetics (inherited, static risk), DNA methylation (acquired, dynamic risk) and the genetic confounding of methylation signatures. In our prior work using data from the FHS cohort, we showed that for approximately 90% of the methylation sites affected by smoking, those effects were moderated by cis- or trans-genetic variation [12]. Because smoking is only one of the modifiable risk factors for CHD, methylation changes secondary to other modifiable risk factors, such as cholesterol, diabetes and blood pressure, may also be moderated by genetic variation. By integrating both types of biomarkers, our approach can therefore capture genetically contextual effects, in addition to main effects, for a variety of modifiable risk factors associated with incident CHD.
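A toy example can illustrate what genetic moderation means here: the same exposure shifts methylation at a CpG by different amounts depending on genotype at a nearby SNP. The Python sketch below compares mean methylation in smokers and non-smokers within each genotype group. The data, genotype labels and function name are hypothetical. This simple stratified comparison is used only for illustration and is not the statistical model used in our prior work [12].

```python
from statistics import mean


def smoking_effect_by_genotype(samples):
    """Mean methylation difference (smokers minus non-smokers) within each genotype.

    samples: iterable of (genotype, is_smoker, methylation_beta).
    """
    groups = {}
    for genotype, is_smoker, beta in samples:
        groups.setdefault(genotype, {True: [], False: []})[bool(is_smoker)].append(beta)
    effects = {}
    for genotype, by_status in groups.items():
        if not by_status[True] or not by_status[False]:
            raise ValueError(f"genotype {genotype!r} needs both smokers and non-smokers")
        effects[genotype] = mean(by_status[True]) - mean(by_status[False])
    return effects


# Hypothetical data (NOT study data): a CpG whose response to smoking depends on a SNP.
samples = [
    ("AA", True, 0.60), ("AA", True, 0.62), ("AA", False, 0.80), ("AA", False, 0.78),
    ("GG", True, 0.79), ("GG", False, 0.80),
]
effects = smoking_effect_by_genotype(samples)
print(effects)
```

In this toy data, the smoking-associated difference is about -0.18 in AA carriers but only about -0.01 in GG carriers. A model that ignores genotype would average over these very different effects.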
It is therefore not surprising that biomarkers in our integrated Epi+Gen CHD risk score map to several complex pathways associated with the biology of CHD. For example, cg00300879 is located approximately 200 nucleotides upstream of the TSS200 of the CNKSR1 gene. CNK1 (also known as CNKSR1) is involved in activating several signaling pathways, including the AKT pathway [38]. Hypoactivation of the AKT pathway through the combined effects of diabetes mellitus and hypercholesterolemia has been shown to result in advanced coronary atherosclerosis [39]. Among the SNPs, rs3735222 is a G>A variant located within 2 kb of the transcription start site (TSS) of the SCIN gene. Variants in the SCIN gene have been associated with mean HDL diameter [40], and mean HDL size is known to be inversely linked to cardiovascular disease risk [41]. In an African American population, rs3735222 has also been shown to be part of a local European ancestry region associated with MI [42]. Still, given the rich interplay between genetic variation and tissue-specific gene expression uncovered by the GTEx Consortium, eQTLs in tight linkage disequilibrium with the SNPs in our panel may better account for the observed predictive power [43].
For the Epi+Gen CHD risk model to be clinically useful, it must rely on measurement techniques available in most modern pathology laboratories. To that end, existing TaqMan® assays can easily be obtained and used to genotype each of the five SNPs in our model. However, if methylation-informed metrics are to play a role in clinical medicine, we need a simpler, more cost-effective way to measure them. Currently, most researchers use genome-wide methylation arrays, such as the 450K and EPIC arrays used herein, because of their broad genome-wide coverage and ease of incorporation into bioinformatic analyses. But this expansive coverage comes at a cost. Although both arrays are powerful discovery tools, they are poorly suited to routine clinical assessment because of their expense, relative lack of precision and lengthy, computationally intensive processing procedures [44,45]. In fact, Dedeurwaerder and colleagues frequently found methylation differences of up to 10% between technical replicates [44]. Perhaps more troubling, although the M-values derived from these arrays are reliable for internal comparisons, converting the hybridization results into external, clinically interpretable values such as β-values or percent methylation is error-prone and is not recommended [46].
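For readers less familiar with array output, M-values and β-values (beta values) are linked by a base-2 logit transform, M = log2(β / (1 - β)). The minimal Python sketch below implements this standard relationship in both directions. It ignores the small intensity offsets usually added in practice, and the function names are our own. The transform itself is exact, so it does not address the precision and calibration concerns raised above.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Array beta value (0-1) to M-value: M = log2(beta / (1 - beta)).

    beta is clipped to [eps, 1 - eps] so fully (un)methylated probes stay finite.
    """
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


def m_to_beta(m):
    """Inverse transform: beta = 2**M / (1 + 2**M)."""
    return 2 ** m / (1 + 2 ** m)


print(beta_to_m(0.8), m_to_beta(0))
```

For example, β = 0.8 corresponds to M = 2, and M = 0 corresponds to β = 0.5.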
In contrast, methylation-sensitive PCR-based methods are commonly used for in vitro diagnostics and can produce reliable, interpretable percent methylation values [47]. The methylation-sensitive dPCR assays that we developed for the three loci in the Epi+Gen CHD risk model can be implemented at most institutions and performed relatively rapidly, and they overcome the lack of precision and interpretability associated with methylation arrays. However, alternative methods, such as pyrosequencing, may be equally useful for assessing DNA methylation.
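Digital PCR yields percent methylation by counting partitions rather than by calibrating hybridization intensities. The sketch below assumes a hypothetical two-channel assay, in which one probe detects the bisulfite-converted methylated allele and another detects the unmethylated allele. It applies the standard Poisson correction for partitions that hold more than one template copy, λ = -ln(1 - positive/total). The partition counts and function names are illustrative, and this is not the assay protocol used in this study.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total partitions")
    return -math.log(1 - positive / total)


def percent_methylation(methylated_positive, unmethylated_positive, total):
    """Percent methylation from a hypothetical two-channel dPCR readout, where each
    count is the number of partitions positive in that channel."""
    lam_meth = copies_per_partition(methylated_positive, total)
    lam_unmeth = copies_per_partition(unmethylated_positive, total)
    if lam_meth + lam_unmeth == 0:
        raise ValueError("no template detected in either channel")
    return 100 * lam_meth / (lam_meth + lam_unmeth)


# Hypothetical partition counts for one CpG (NOT study data).
print(f"{percent_methylation(4200, 2600, 15000):.1f}% methylated")
```

Without the Poisson correction, the raw ratio of positive partitions would be pulled toward 50% at high template loads. Saturated partitions hide extra copies, and they are more common in the channel with the higher occupancy.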
Refinements to the current approach may further improve prediction. Our dPCR method uses a bisulfite conversion step that does not distinguish methylated from hydroxymethylated cytosines. However, recent studies suggest that hydroxymethylation levels at key loci can also independently predict CHD status [48]. Assessment tools that measure these two cytosine modifications independently may therefore improve on the prediction algorithms described here.
Although these markers appear to be powerful assessment tools, their ability to guide treatment is not known. In recently published work, we showed that successful treatment of smoking, a known risk factor for CHD, is associated with reversion of CHD-associated methylation changes at cg00300879 [49]. These findings make intuitive sense, because the gene targeted by this assay, CNKSR1, physically interacts with SASH1, a potential link between smoking and risk for atherosclerosis [50]. However, these changes occur over months, and it is not known how these loci respond to successful treatment of other risk factors, such as statin treatment of elevated LDL levels. Further research into how treating each dimension of CHD risk affects these markers is clearly indicated.
Our study has several limitations. First, the number of incident cases used in the analysis was relatively small. Second, the FHS cohort and a majority of the IM cohort were of European ancestry. The Epi+Gen CHD risk score therefore needs to be extended and validated in larger, more ethnically diverse cohorts. However, we note that one of the SNPs in the model has been associated with CHD in African Americans. This may indicate that this locus, and perhaps Epi+Gen CHD, could generalize to additional racial groups.
Finally, in this study, DNA methylation was assessed only at study intake. As a result, we do not know how subsequent changes in methylation relate to the actual occurrence, or nonoccurrence, of clinical CHD. However, in a recent pilot study of another population, we showed that one of these three sites, cg00300879, exhibited significant reversion of risk-associated methylation after 3 months of biochemically verified smoking cessation [49]. That pilot study was short, had no cardiac outcomes and addressed only one of several established cardiac risk factors. Even so, it suggests that DNA methylation could also change in response to other preventive treatments. We therefore suggest further investigation of whether DNA methylation changes in response to other lifestyle or medication interventions, such as statins. If such studies are successful, clinicians may in the future be able to monitor the effectiveness of prevention therapy by assessing DNA methylation.
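The first limitation above, the small number of incident cases, translates directly into imprecise sensitivity estimates. As an illustration, the Python sketch below computes a Wilson score interval, one standard confidence interval for a proportion, for the same observed sensitivity at two hypothetical case counts. These counts are not those of this study, and the study's own uncertainty estimates may have been computed differently.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion such as sensitivity (z=1.96 ~ 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (75%) with hypothetical small vs. large case counts.
for cases in (20, 200):
    low, high = wilson_interval(round(0.75 * cases), cases)
    print(f"{cases:>3} cases: 95% CI {low:.2f}-{high:.2f}")
```

With 20 cases, the 95% interval spans roughly 0.53 to 0.89. With 200 cases, it narrows to roughly 0.69 to 0.80.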
Conclusion
In this communication, we report the development and external validation of Epi+Gen CHD, an integrated genetic-epigenetic tool for incident CHD risk stratification. The strong prognostic performance of Epi+Gen CHD, together with the quick turnaround of its dPCR assays, makes it a promising tool for use in a clinical setting. In ongoing work, we aim to better understand how well the Epi+Gen CHD risk score predicts CHD in a broad range of populations and how effectively it improves patient outcomes.
Future perspective
Although novel, this integrated genetic-epigenetic assay for heart disease is very unlikely to remain the only integrated genetic-epigenetic assessment for long. A recent analysis by Hou and colleagues of 22 complex traits or diseases showed that the vast majority of illnesses, including diabetes, asthma and hypertension, had SNP-heritabilities between 10 and 20% [51]. This leaves most of the variation in risk unexplained by common genetic variation alone. Some of those diseases, such as diabetes, are known to affect the DNA methylation of peripheral white blood cells. We therefore think it likely that, over the next 5 years, numerous integrated genetic-epigenetic assays using white blood cell DNA as their source biomaterial will emerge for the assessment of complex medical disorders, including all of the independent risk factors for CHD.
What about disorders that do not affect white blood cells but may affect other tissues? We note that the Cologuard test combines genetic and epigenetic assessments of DNA from desquamated colonic epithelial cells to screen for colorectal cancer [52]. It is therefore quite possible that other tissues, such as skin and exfoliated bladder cells, will be used as source material for integrated genetic-epigenetic risk assessments.
Finally, because each of the assessments described herein could conceivably be performed using saliva DNA, we believe that this assessment technique has high potential for telemedicine. Currently, almost all risk assessments require patients to present in person for phlebotomy. Translating these assays to a saliva-based format may remove the need for phlebotomy and usher in an age of painless assessment and monitoring of CHD prevention treatment.
Summary points.
Coronary heart disease (CHD) is a leading cause of death for both men and women.
Highly sensitive and specific screening tests for CHD could lead to more effective prevention of death and disability from CHD.
The Framingham Risk Score and atherosclerotic cardiovascular disease (ASCVD) Pooled Cohort Equation (PCE) are two risk estimators currently used to predict CHD risk. However, because they do not perform well, many individuals experience potentially preventable cardiac events.
Prior studies show that CHD is associated with changes at a large number of CpG sites. However, methylation at these sites does not, by itself, predict future CHD well. Genetic confounding of the CHD-associated methylation signature may partly explain this poor predictive performance.
Using a machine learning approach and two independent datasets, we developed the integrated genetic-epigenetic incident CHD risk prediction model (Epi+Gen CHD). This test uses artificial intelligence to interpret genetically contextual epigenetic signatures and predict risk for incident CHD in the near term (3 years). Epi+Gen CHD consists of five SNPs and three DNA methylation markers.
In our two study populations, Epi+Gen CHD had higher overall sensitivity than the PCE and similar specificity. Unlike the PCE, Epi+Gen CHD showed no gender bias, and the new test was 40% more sensitive for predicting incident CHD in women.
A limitation of the current findings is that the test was developed and tested in subjects predominantly of European ancestry. Further research to confirm and extend these findings in diverse populations is indicated.
Footnotes
Financial & competing interests disclosure
This work was supported by National Institutes of Health grants R01DA037648 (Philibert) and R44DA041014 (Philibert), and by Cardio Diagnostics, Inc. On behalf of MV Dogan and R Philibert, the University of Iowa has filed intellectual property claims related to the integrated genetic-epigenetic technology described in this communication (US Patent application 62,455,416: Compositions and Methods for Detecting Predisposition to Cardiovascular Disease). On behalf of MV Dogan, R Philibert and TK Dogan, Cardio Diagnostics, Inc. has filed intellectual property claims related to the integrated genetic-epigenetic technology described in this communication (US Patent application 63,074,878: Methods and Compositions for Predicting Coronary Heart Disease). They are potential royalty recipients on these intellectual property claims. MV Dogan is the chief executive officer and stockholder of Cardio Diagnostics, Inc. R Philibert is the chief medical officer and stockholder of Cardio Diagnostics, Inc. TK Dogan is an employee and stockholder of Cardio Diagnostics, Inc. (www.cardiodiagnosticsinc.com). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The National Institutes of Health had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
Ethical conduct of research
The procedures and protocols used for the analysis of the Framingham Heart Study data were approved by the University of Iowa Institutional Review Board (IRB# 201503802). The procedures and protocols used for the analyses of the Intermountain Healthcare materials were approved by the Intermountain Healthcare Institutional Review Board (IRB# 1024811). Informed consent was obtained from all participants involved in this study.
References
Papers of special note have been highlighted as: •• of considerable interest
- 1. Centers for Disease Control. Leading causes of death for 2017. https://www.cdc.gov/heartdisease/facts.htm
- 2. Wilson PWF, D'Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation 97(18), 1837–1847 (1998).
- 3. Goff DC, Lloyd-Jones DM, Bennett G et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk. Circulation 129(2 Suppl. 25), S49–S73 (2014).
- 4. Thomas GS, Voros S, McPherson JA et al. A blood-based gene expression test for obstructive coronary artery disease tested in symptomatic nondiabetic patients referred for myocardial perfusion imaging the COMPASS study. Circ. Cardiovasc. Genet. 6(2), 154–162 (2013).
- 5. Rosenberg S, Elashoff MR, Lieu HD et al. Whole blood gene expression testing for coronary artery disease in nondiabetic patients: major adverse cardiovascular events and interventions in the PREDICT trial. J. Cardiovasc. Transl. Res. 5(3), 366–374 (2012).
- 6. Siemelink MA, Zeller T. Biomarkers of coronary artery disease: the promise of the transcriptome. Curr. Cardiol. Rep. 16(8), 513 (2014).
- 7. Inouye M, Abraham G, Nelson CP et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72(16), 1883–1893 (2018).
- 8. Janssens ACJW. Validity of polygenic risk scores: are we measuring what we think we are? Hum. Mol. Genet. (2019) (Epub ahead of print). •• Reviews the limitations of polygenic risk scores and should be read by those seeking to create new polygenic risk scores.
- 9. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51(4), 584 (2019).
- 10. Zhang Y, Schöttker B, Florath I et al. Smoking-associated DNA methylation biomarkers and their predictive value for all-cause and cardiovascular mortality. Env. Health Perspect. 124(1), 67–74 (2015).
- 11. Sharma P, Garg G, Kumar A et al. Genome wide DNA methylation profiling for epigenetic alteration in coronary artery disease patients. Gene 541(1), 31–40 (2014).
- 12. Dogan MV, Beach SRH, Philibert RA. Genetically contextual effects of smoking on genome wide DNA methylation. Am. J. Med. Genet. B Neuropsychiatr. Genet. 174(6), 595–607 (2017). •• Describes the extent of gene × methylation effects in a commonly used epigenetic resource.
- 13. Dogan M, Beach S, Simons R, Lendasse A, Penaluna B, Philibert R. Blood-based biomarkers for predicting the risk for five-year incident coronary heart disease in the Framingham Heart Study via machine learning. Genes 9(12), 641 (2018).
- 14. Dawber TR, Kannel WB, Lyell LP. An approach to longitudinal studies in a community: the Framingham Study. Ann. NY Acad. Sci. 107, 539–556 (1963).
- 15. Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 14(1), 1–10 (2013).
- 16. Triche JRT. FDb.InfiniumMethylation.hg19: annotation package for Illumina Infinium DNA methylation probes. R package version 2.2.0. (2014). https://bioconductor.org/packages/release/data/annotation/html/FDb.InfiniumMethylation.hg19.html
- 17. Davis S, Du P, Bilke S, Triche JT, B M. methylumi: Handle Illumina methylation data. R package version 2.22.0. (2017). https://www.bioconductor.org/packages/release/bioc/html/methylumi.html
- 18. Dogan MV, Grumbach IM, Michaelson JJ, Philibert RA. Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham Heart Study. PLoS ONE 13(1), e0190549 (2018).
- 19. Purcell S, Neale B, Todd-Brown K et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81(3), 559–575 (2007).
- 20. Muhlestein JB, May HT, Bair TL et al. Relation of elevated plasma renin activity at baseline to cardiac events in patients with angiographically proven coronary artery disease. Am. J. Cardiol. 106(6), 764–769 (2010).
- 21. Taylor GS, Muhlestein JB, Wagner GS, Bair TL, Li P, Anderson JL. Implementation of a computerized cardiovascular information system in a private hospital setting. Am. Heart J. 136(5), 792–803 (1998).
- 22. Bizouarn F, Karlin-Neumann G. Digital PCR: Methods and Protocols. Springer, NY, USA (2018).
- 23. Van Rossum G, Drake FL. Python 3 Reference Manual. CreateSpace, CA, USA (2009).
- 24. Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques (3rd Edition). Elsevier, Amsterdam, The Netherlands (2011).
- 25. Pedregosa F, Varoquaux G, Gramfort A et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- 26. Dalcín L, Paz R, Storti M. MPI for Python. J. Parallel Distrib. Comput. 65(9), 1108–1115 (2005).
- 27. Dalcín L, Paz R, Storti M, D'Elía J. MPI for Python: performance improvements and MPI-2 extensions. J. Parallel Distrib. Comput. 68(5), 655–662 (2008).
- 28. Dalcin LD, Paz RR, Kler PA, Cosimo A. Parallel distributed computing using Python. Adv. Water Resour. 34(9), 1124–1139 (2011).
- 29. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148(3), 839–843 (1983).
- 30. Nikpay M, Goel A, Won HH et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47(10), 1121–1130 (2015).
- 31. Bland JM, Altman DG. The logrank test. BMJ 328(7447), 1073 (2004).
- 32. Davidson-Pilon C. lifelines: survival analysis in Python. J. Open Source Softw. 4(40), 1317 (2019).
- 33. Philibert R, Dogan M, Beach SRH, Mills JA, Long JD. AHRR methylation predicts smoking status and smoking intensity in both saliva and blood DNA. Am. J. Med. Genet. B Neuropsychiatr. Genet. 183(1), 51–60 (2019).
- 34. Philibert R, Miller S, Noel A et al. A four marker digital PCR toolkit for detecting heavy alcohol consumption and the effectiveness of its treatment. J. Insur. Med. 48(1), 90–102 (2019).
- 35. Rana JS, Tabada GH, Solomon MD et al. Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J. Am. Coll. Cardiol. 67(18), 2118–2130 (2016).
- 36. Hemann BA, Bimson WF, Taylor AJ. The Framingham Risk Score: an appraisal of its benefits and limitations. Am. Heart Hosp. J. 5(2), 91–96 (2007).
- 37. Myerburg RJ. Sudden cardiac death: exploring the limits of our knowledge. J. Cardiovasc. Electrophysiol. 12(3), 369–381 (2001).
- 38. Fischer A, Warscheid B, Weber W, Radziwill G. Optogenetic clustering of CNK1 reveals mechanistic insights in RAF and AKT signalling controlling cell fate decisions. Sci. Rep. 6, 38155 (2016).
- 39. Hamamdzic D, Fenning RS, Patel D et al. Akt pathway is hypoactivated by synergistic actions of diabetes mellitus and hypercholesterolemia resulting in advanced coronary artery disease. Am. J. Physiol. Heart Circ. Physiol. 299(3), H699–H706 (2010).
- 40. Frazier-Wood AC, Manichaikul A, Aslibekyan S et al. Genetic variants associated with VLDL, LDL and HDL particle size differ with race/ethnicity. Hum. Genet. 132(4), 405–413 (2013).
- 41. Kontush A. HDL particle number and size as predictors of cardiovascular disease. Front. Pharmacol. 6, 218 (2015).
- 42. Shendre A, Irvin MR, Wiener H et al. Local ancestry and clinical cardiovascular events among African Americans from the atherosclerosis risk in communities study. J. Am. Heart Assoc. 6(4), e004739 (2017).
- 43. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369(6509), 1318–1330 (2020).
- 44. Dedeurwaerder S, Defrance M, Bizet M, Calonne E, Bontempi G, Fuks F. A comprehensive overview of Infinium HumanMethylation450 data processing. Brief. Bioinform. 15(6), 929–941 (2013).
- 45. The Blueprint Consortium, Bock C, Halbritter F et al. Quantitative comparison of DNA methylation assays for biomarker development and clinical applications. Nat. Biotechnol. 34, 726 (2016). •• Summarizes the strengths and limitations of many commonly used laboratory methods in epigenetics.
- 46. Kruppa J, Sieg M, Richter G, Pohrt A. Estimands in epigenome-wide association studies. Clin. Epigenetics 13(1), 98 (2021).
- 47. Taryma-Leśniak O, Sokolowska KE, Wojdacz TK. Current status of development of methylation biomarkers for in vitro diagnostic IVD applications. Clin. Epigenetics 12(1), 100 (2020).
- 48. Jiang D, Sun M, You L et al. DNA methylation and hydroxymethylation are associated with the degree of coronary atherosclerosis in elderly patients with coronary heart disease. Life Sci. 224, 241–248 (2019).
- 49. Philibert W, Andersen AM, Hoffman EA, Philibert R, Dogan M. The reversion of DNA methylation at coronary heart disease risk loci in response to prevention therapy. Processes 9(4), 699 (2021).
- 50. Weidmann H, Touat-Hamici Z, Durand H et al. SASH1, a new potential link between smoking and atherosclerosis. Atherosclerosis 242(2), 571–579 (2015).
- 51. Hou K, Burch KS, Majumdar A et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat. Genet. 51(8), 1244–1251 (2019).
- 52. Imperiale TF, Ransohoff DF, Itzkowitz SH et al. Multitarget stool DNA testing for colorectal-cancer screening. N. Engl. J. Med. 370(14), 1287–1297 (2014).