Abstract
Myocardial infarction is a leading cause of death globally but is notoriously difficult to predict. We aimed to identify biomarkers of an imminent first myocardial infarction and design relevant prediction models. Here, we constructed a new case–cohort consortium of 2,018 persons without prior cardiovascular disease from six European cohorts, among whom 420 developed a first myocardial infarction within 6 months after the baseline blood draw. We analyzed 817 proteins and 1,025 metabolites in biobanked blood and 16 clinical variables. Forty-eight proteins, 43 metabolites, age, sex and systolic blood pressure were associated with the risk of an imminent first myocardial infarction. Brain natriuretic peptide was most consistently associated with the risk of imminent myocardial infarction. Using clinically readily available variables, we devised a prediction model for an imminent first myocardial infarction for clinical use in the general population, with good discriminatory performance and potential for motivating primary prevention efforts.
Subject terms: Prognostic markers, Myocardial infarction
Gustafsson, Lampa et al. used proteomics, metabolomics and clinical factors in a case–cohort study to identify biomarkers of imminent myocardial infarction and devise a prediction model for imminent myocardial infarction, which can be clinically used in the general population.
Main
Despite declining age-standardized rates, myocardial infarction remains the leading and increasing cause of death globally1. Prevention of myocardial infarction is highly prioritized2, but the targeting of primary preventive efforts is hampered by inefficient means of identifying individuals at the highest risk for an imminent myocardial infarction (IMI). This could be partially explained by the inability of most risk prediction models to account for the highly dynamic nature of the period leading up to a myocardial infarction. For instance, traumatic events, such as a cancer diagnosis or loss of a spouse, markedly increase the risk of myocardial infarction3,4. In addition, the degree of stenosis in the culprit lesion in the coronary artery appears to increase in the months just before the myocardial infarction5. Nonetheless, to date, most biomarkers have been investigated over several years of follow-up because of a low number of individuals with a first myocardial infarction shortly after baseline in the general population. Hence, a large population-based study focusing on identifying biomarkers of an IMI is needed.
Primary prevention for asymptomatic risk factors over a long period is costly, and motivation among patients and providers is limited even for secondary prevention6. Risk prediction in the short term based on biomarkers of IMI might tilt the scales for prevention, as the knowledge of an increased risk of a first myocardial infarction within the ensuing few months might motivate patients and doctors to consider preventive strategies.
We hypothesized that circulating biomarkers of the dynamic biological processes that operate in the months preceding a myocardial infarction could be measured and used to assess risk. We tested this in a new nested case–cohort study and devised a prediction model for an imminent first myocardial infarction.
Results
We assembled a new nested case–cohort study, the Markers of Imminent Myocardial Infarction (MIMI) study. The study includes initially cardiovascular disease-free individuals in six European general population-based cohorts who developed a myocardial infarction within the first 6 months after the baseline examination, with up to four cohort representatives per case (Fig. 1 and Supplementary Table 1). The case–cohort design allows for time-to-event analyses and derivation of accurate prediction models; it is also less prone to certain biases than the case–control design7. After exclusions, data of 2,018 individuals weighted to represent the full cohort of 169,053 persons were available for analysis (420 IMI cases and 1,598 subcohort representatives). Their characteristics at baseline are shown in Extended Data Table 1.
Extended Data Table 1.
Baseline characteristics of the cases and weighted estimates of the full cohort of 169053 individuals from which the subcohort was drawn are shown. Numbers are percentages or arithmetic means, as indicated. LDL, low-density lipoprotein.
Thereafter, we determined the levels of 817 proteins (some duplicates) and 1,025 metabolites in biobanked plasma samples from the cohort baseline examinations in a core laboratory and harmonized 16 clinical variables between the cohorts. We divided the study sample into a discovery sample (EpiHealth, Trøndelag Health Study (HUNT) and Lifelines; 70% of the sample) and an external validation sample (European Prospective Investigation into Cancer and Nutrition—Cardiovascular Disease (EPIC-CVD), Estonian Biobank study and Malmö Preventive Project (MFM); 30% of the sample). Considering the limited sample size of the study, we also performed an internal validation as an exploratory analysis by randomly splitting the study sample into a 70:30 discovery/validation sample, repeated in 100 random draws.
We investigated the associations of proteins, metabolites and clinical variables with the risk of a first myocardial infarction within 6 months after baseline using weighted, stratified Cox proportional-hazards regression models in the discovery sample. Biomarkers that passed multiple testing bounds (a Benjamini–Hochberg false discovery rate (FDR) of <0.05) were verified in the same models in the validation sample (this was done in the external and internal validation sets), with directionally consistent results at P < 0.05 considered replicated.
In one-by-one models adjusting for technical covariates (season, storage time and plate; Fig. 2), 48 proteins, 43 metabolites and 3 clinical variables (age, sex and systolic blood pressure) were found to be associated with IMI after the discovery–validation process (Fig. 3 and Supplementary Table 2).
Thereafter, we investigated promising markers in models further adjusting for age and sex. Among them, brain natriuretic peptide (BNP) was the only biomarker with a borderline significant association with IMI (HR per doubling of BNP level (95% confidence interval (95% CI)) = 1.33 (1.15, 1.55), P = 1.63 × 10−4, FDR = 0.11 in the discovery sample and 1.40 (1.00, 1.94), P = 0.049 in the validation sample; Extended Data Fig. 2). BNP was the only biomarker with a suggestive association in the internal validation, passing the formal replication criteria in 22 of 100 random splits. By comparison, stem cell factor (SCF) and interleukin-6 (IL-6), biomarkers with a weaker support of an association, replicated in only 5 or 4 of 100 random splits. The cumulative hazard of IMI by fourths of BNP is shown in Extended Data Fig. 3. The associations of BNP with IMI in sensitivity analyses excluding one cohort at a time and in a random-effects meta-analysis were similar, as shown in Extended Data Figs. 4 and 5. For some of the 94 variables, we observed substantial between-cohort heterogeneity in the estimates when they were evaluated in a random-effects meta-analysis (Supplementary Table 3). The addition of interaction terms between sex and the biomarkers did not reveal any additional associations. Associations with IMI within 3 months (185 cases) were similar to those within 6 months (Extended Data Fig. 6).
In a model investigating the total effect of the BNP–IMI association (with a priori selected confounders, not mediators, according to Extended Data Fig. 1), adjusting for age, sex, weight, height, creatinine and systolic blood pressure, the association of BNP with IMI remained similar (HR (95% CI) = 1.34 (1.14, 1.57), P = 3.12 × 10−4 in the discovery sample and 1.51 (1.05, 2.18), P = 0.028 in the validation sample; per doubling of BNP level).
We then investigated the association of the most promising marker, BNP, with the coronary artery calcium score (CACS) at a cardiac computer tomography examination in an external population-based cohort of 1,586 participants of the Swedish CArdioPulmonary bioImage Study (SCAPIS) who were free from self-reported cardiovascular disease. Here, a higher CACS was not notably associated with a higher BNP level (odds ratio (95% CI) = 1.14 (0.91, 1.42), P = 0.25; per doubling of BNP level) in an ordinal regression model adjusting for the same covariates as in the total-effects model.
Finally, we investigated the possibility of developing a clinical risk prediction algorithm for a first IMI using clinically available variables and a weighted Cox ridge regression model. The prediction model achieved an internally validated C-index of 0.78, indicating a good ability to discriminate between IMI cases and noncases. When validating the model in the UK Biobank, a C-index of 0.82 was obtained, while a calibration plot showed some overestimation of 6-month IMI risks. As a comparison, the recalibrated SCORE2 achieved C-indexes of 0.77 (MIMI cohort) and 0.81 (UK Biobank) and overestimated the IMI risks in both samples (Extended Data Fig. 7). A nomogram based on the model is shown in Fig. 4, with a worked example of its intended use displayed in Extended Data Fig. 8 and its cross-validated calibration presented in Extended Data Fig. 7. An interactive web application is presented at miscore.org. Coefficients for predicting IMI from the model are shown in Supplementary Table 4.
No biomarkers improved risk prediction in a LASSO (least absolute shrinkage and selection operator) Cox regression model; the variable selection by the LASSO was unstable, with the 95% bootstrap CI on the model size being 0–128 variables. No biomarkers improved risk prediction in a random forest model using 2,000 trees; it also ranked BNP and N-terminal pro-BNP (NT-proBNP) at the top but with very large CIs (Supplementary Table 5).
Discussion
We here set out to identify and test biomarkers and the predictability of an imminent first myocardial infarction using a new case–cohort consortium of individuals without prior cardiovascular disease and with biobanked blood samples. From more than 1,800 biomarkers, we identified 48 proteins, 43 metabolites and 3 clinical variables associated with the risk of an imminent first myocardial infarction independent of technical covariates. Further analyses revealed BNP as the only biomarker consistently associated with IMI risk. We also derived a prediction model to discriminate between subsequent cases and noncases. The IMI phenotype has rarely been studied prospectively in the general population and with a broad panel of biomarkers. The findings may have implications for both clinical primary prevention studies and further etiological studies.
In the current study, higher BNP levels in individuals without a known cardiovascular disease were linked to a higher risk of a first myocardial infarction within 6 months in several models. Cardiomyocytes produce BNP in response to strain8, and NT-proBNP measurement is a pillar of the clinical management of heart failure9 but is not used in diagnosing myocardial infarction10. Diastolic dysfunction is an early feature of myocardial ischemia, and a higher BNP level in this context is likely underpinned by diastolic dysfunction caused by subclinical ischemia11 in individuals with some degree of coronary stenosis. This is supported by the weak association of BNP and CACS observed herein, although the association should be interpreted carefully. The noncausal explanation is further supported by the noncausality suggested by Mendelian randomization studies (acknowledging that associations of genetically determined lifelong BNP levels with coronary disease may have limited relevance to a temporally boxed-in series of events): a genetic variant affecting the expression of the BNP gene (NPPB, rs198389) is not associated with cardiovascular endpoints12 or coronary artery disease13. The influence of chance on the finding is low, as NT-proBNP was also significantly associated with IMI in the discovery sample, with a borderline association in the validation sample (Extended Data Fig. 5). While BNP may hence reflect an underlying coronary artery disease, it did not add materially to a risk prediction model for IMI composed of more readily available biomarkers.
Several known mechanisms implicated in atherosclerosis and ischemia were represented among the other 94 biomarkers associated with an IMI in both the discovery and validation samples after adjusting for technical covariates, including inflammation (IL-6)14, extracellular matrix metabolism (WAP four-disulfide core domain protein 2 (WFDC2))15, hypertrophy (adhesion G-protein-coupled receptor G1 (AGRG1))16, apoptosis (triggering receptor expressed on myeloid cells 1 (TREM1), tumor necrosis factor receptor superfamily member 10B (TRAIL-R2)) and cell adhesion (AGRG1). We also observed associations with markers representing mechanisms less often implicated in coronary diseases, such as markers of kidney injury (kidney injury molecule 1 (KIM1))17, appetite regulation (growth differentiation factor 15 (GDF15))18, and an α-amino acid found in dietary supplements and associated with paracetamol use (pyroglutamine)19. While some associations may be causal, others, such as associations with levels of chitinase-3-like protein 1 (CHI3L1)20, pleiotrophin (PTN) or KIT, may more likely be responses to myocardial ischemia. These findings may accelerate further etiological studies of acute coronary events.
We here developed a prediction model for IMI in the general population. An imminent infarction is difficult to predict; the signals are weak, and we faced power limitations. The model achieved good discriminative ability, with acceptable calibration in the lower risk range. It is possible to transpose to other settings by entering the base hazards and variable means of those settings, for example, interactively at miscore.org. Given the increasing global burden of deaths from myocardial infarction, the importance of predicting them and increasing the individual motivation for preventing such deaths may be substantial; this can be tested in clinical trials.
The current study has several limitations. First, the use of multiple cohorts introduced heterogeneity. We addressed this at the sampling, biomarker analysis and statistical analysis stages, with the resulting limitation that the heterogeneity decreases statistical power. The strengths are the same as in other multicenter studies, including that only biomarkers with consistent importance in different settings are brought forward. Other study limitations are inherent to the uncertainty of ranking the top findings and the inability of one-by-one strategies to capture complex interrelationships. The instability of the variable importances from the random forest was unsurprising, as such methods are notoriously data hungry and require far larger datasets than classical modeling techniques21. While the studied markers are easily obtainable by a simple blood test or clinical assessment, a limitation is that a blood sample will not always capture tissue-specific processes. In addition, our study was limited to proteins and metabolites that remain stable in the freezer for many years. The biomarker analyses used herein are currently not available in clinical practice, and we lacked the clinically available and more precise immunoassay measurements of, for example, NT-proBNP and cardiac troponin; hence, imprecision in the proximity extension assay and ultra-high performance liquid chromatography–mass spectrometry (UPLC–MS) technologies may preclude definitive mechanistic insights and maximal clinical utility. Further, making causal assumptions is fundamentally challenging in a multimarker landscape where many causal pathways are unknown. Most markers could be potential mediators in pathways for known causes of myocardial infarction, including age and sex. Consequently, we provided models adjusted for technical covariates only and models with further biological covariate adjustment. Thus, some associations could be explained by confounding by, for example, age and sex. Notably, mediators of causal effects are also important to identify, with implications for prediction and use as treatment targets.
In conclusion, we identified biomarkers associated with the risk of an imminent first myocardial infarction, including BNP. Delineation of the distinct biological processes that operate in the months before the first myocardial infarction will be key to discovering prevention targets. We developed and validated a prediction model with a fair ability to discriminate between persons with and without risk of an imminent first myocardial infarction. Risk prediction in the short term may enhance the motivation of patients and doctors for primary prevention.
Methods
Study sample and outcome
The MIMI study sample draws biobanked blood and data from six European cohorts of the BBMRI-LPC (Biobanking and Biomolecular Research Infrastructure—Large Prospective Cohorts) collaboration22, as shown in Fig. 1 and Supplementary Table 1. After sample size determination, we supplied each cohort with a standardized protocol (in which all definitions are described in detail) and an R script for selecting cohort representatives for the subcohort (Supplementary Notes).
Cohort participants with biobanked samples (at least 250 μl of plasma or serum; eventually, only plasma was included) and no previous clinical cardiovascular disease were eligible for inclusion in the present study. The exclusion criteria were previous clinical cardiovascular disease (defined as the presence at any time before baseline of any of the following: myocardial infarction, coronary procedure, heart failure, structural heart disease, tachyarrhythmias, stroke, thromboembolic disease and peripheral vascular disease) and renal failure.
Individuals with acute myocardial infarction (International Classification of Diseases, tenth revision (ICD-10), I21; ICD-9, 410.0–410.6 and 410.8) as the primary cause of hospitalization or death within 6 months after baseline were defined as IMI cases. We included both ST-elevation and non-ST-elevation myocardial infarctions; we encouraged efforts to include only type 1 myocardial infarctions by not counting cases with any of the following ICD codes in secondary positions: anemia (for example, ICD-10, D50–D64; ICD-9, 280–285), tachyarrhythmias (for example, ICD-10, I47–I49; ICD-9, 427), heart failure (for example, ICD-10, I50; ICD-9, 428), renal failure (for example, ICD-10, N17–N19; ICD-9, 584–586), chronic obstructive pulmonary disease (for example, ICD-10, J43–J44; ICD-9, 491, 492 and 496), sepsis and other severe infections (for example, ICD-10, A40–A41; ICD-9, 038), or hypertensive crises.
Up to four cohort representatives per available IMI case were randomly drawn from the full cohort to the subcohort in 50 strata based on sex, age (above/below median) and study center in a stratified case–cohort design7. All 420 IMI cases, and 1,598 subcohort representatives, were drawn from the full cohort of 169,053 participants, as summarized in Fig. 1.
Exposures
Clinical variables (age, sex, height, weight, waist circumference, systolic and diastolic blood pressure, triglycerides, high-density lipoprotein (HDL) cholesterol, non-HDL cholesterol, low-density lipoprotein (LDL) cholesterol, total cholesterol, glucose, diabetes status, highest education, smoking status, previous smoking exposure, alcohol intake and physical activity) were harmonized between the cohorts (Supplementary Notes). Non-HDL cholesterol was calculated as total cholesterol − HDL cholesterol. LDL levels were calculated using the extended Martin–Hopkins equation23.
All blood samples were randomized into appropriate measurement plates, stratified by cohort (with a similar number from each cohort on every plate), and aliquoted into the plates. Quality controls are summarized below and described in detail in the Supplementary Notes.
Protein measurements were done using the Olink proximity extension assay (Olink), a highly specific 92-plex immunoassay. Overall, 829 proteins across nine panels (cardiometabolic, cardiovascular II, cardiovascular III, development, immune response, inflammation, metabolism, oncology II and organ damage) were analyzed, including 804 unique proteins (considering overlap between panels). Relative protein values on a log2 scale are reported, with each protein value normalized by plate by centering all plates at the same median, assuming random plate placement. Values below the assay’s lower limit of detection (LOD) were also included in the analyses.
Metabolites were analyzed using the UPLC–tandem MS (UPLC–MS/MS)-based Metabolon platform (Metabolon) by four different methods: reversed-phase UPLC–MS/MS with positive-mode electrospray ionization (early and late phase), reversed-phase UPLC–MS/MS with negative-mode electrospray ionization, and hydrophilic interaction LC/UPLC–MS/MS with negative-mode electrospray ionization. Overall, 1,135 metabolites were captured, including 925 with known identity and 210 with unknown identity. Relative metabolite levels were determined and normalized by analysis day. Metabolite levels were log2 transformed, and nondetectable levels (<LOD or metabolite not present in the sample) were constant value imputed to a value below the minimum metabolite value (minimum/sqrt(2)).
Samples that did not satisfy the quality control criteria were initially excluded; exclusion filters were applied separately for the proteomics and metabolomics analyses, and only samples passing quality control for both analyses were included in the analysis set. For the proteomics analysis, samples with more than 50% of panels failing for technical reasons were excluded (n excluded = 33). For the metabolomics analysis, samples were excluded because of low volume or detection of fewer metabolites than expected (n excluded = 4). Consequently, samples for 420 cases and 1,598 subcohort representatives remained for analysis.
Next, biomarkers with an extremely high proportion of nondetectable or below-LOD measurements were excluded, with the same exclusion filters for proteins and metabolites. Biomarkers had to be detected in all six cohorts with at least 30 detectable values across all cohorts (~1.5% of the MIMI samples) or were otherwise excluded. Consequently, 817 proteins (some duplicates) and 1,025 metabolites were retained for analysis.
Statistical analysis
All analyses were done using R (version 4.1.1)24 with the glmnet25, mice26, rms27, ranger28 and survival29 add-on packages.
One-by-one etiological analyses
In the discovery sample, the associations of all clinical variables (listed in Extended Data Table 1), proteins and metabolites with IMI were analyzed in separate weighted, stratified Cox proportional-hazards regression models adjusting for covariates, as described below. Inverse sampling probability weights (Borgan II) were applied to account for the case–cohort design in a stratified model, allowing for a different shape of the baseline hazard for each MIMI cohort (six levels) and using a robust variance estimator (Huber–White). Nonlinear relationships between continuous covariates (not including the biomarkers) and IMI were modeled using restricted cubic splines, and all factor variables were considered unordered.
Associations with an FDR (Benjamini–Hochberg) of <0.05 were taken forward to the validation sample, in which directionally consistent results with P < 0.05 were considered replicated.
Cox proportional-hazards models adjusting for technical covariates (season, storage time and plate) were initially applied. Replicating biomarkers from the model adjusting for technical covariates were investigated in a model further adjusting for age and sex. A model allowing for an interaction between the biomarker and sex was further tested. Replicating biomarkers in the model adjusted for age and sex were then subjected to causal assumptions (Extended Data Fig. 1), and a bias-minimized model for each biomarker was investigated, estimating the total effects (including the effects of mediators).
Missingness and sensitivity analyses
Clinical variables with high missingness (previous smoking exposure, alcohol intake and physical activity) were not used in the analyses. Protein values below the LOD were included in the analyses; nondetectable metabolite levels were replaced with a constant value, and a missing indicator was added, as described below. The remaining missing values in the covariates were multiple imputed (n imputations = 20) using chained equations including the outcome, clinical covariates and other variables correlated with the variable in the imputation model30. Regression results across imputed datasets were combined using Rubin’s rules31.
Interactions with sex were investigated by analyzing an interaction term for sex and each biomarker in models adjusting for technical covariates, age and sex. The interaction terms and all terms including the biomarker were tested using a multivariable chi-squared test with the same multiple-testing correction described above, requiring directionally consistent discovery and validation results.
The following secondary sensitivity analyses were included: random-effects inverse variance-weighted meta-analyses (DerSimonian–Laird) combining per-cohort results, leave-one-out analyses investigating the influence of single cohorts, complete-case analyses not imputing missing values in the clinical covariates, and analyses limiting the follow-up time to 3 months.
Simultaneous modeling and development of a prediction model
To attempt predicting this phenotype, we developed a prediction model for IMI using age, sex, anthropometric variables (height, weight and waist circumference), variables routinely collected in the laboratory (LDL cholesterol, HDL cholesterol, creatinine, glucose and triglycerides), systolic and diastolic blood pressure, smoking status (never, former or current) and education level. Regression coefficients were estimated using a weighted Cox ridge regression model, which shrinks coefficients toward zero using an L2 penalty to accommodate overfitting. The strength of the penalty (lambda) was determined using tenfold cross-validation over a grid of 250 lambda values, repeated 100 times. The lambda selection was repeated in each imputed dataset, and the coefficients associated with the lambda giving the lowest cross-validated deviance were extracted. The final coefficient set was obtained by taking the median of the coefficients from each imputed dataset. A single-imputed dataset was used for validation and calibration. The C-index, which indicates a model’s ability to rank the risks, was determined using 100 repeats of tenfold cross-validation. A calibration curve was constructed using 100 repeats of tenfold cross-validation32. All modeling steps were repeated in each fold to assess the calibration accuracy objectively. The model containing only clinical variables was then reduced by approximating the linear predictor from the full model through stepwise regression. Predictions from the full model were used as the outcome in a linear model wherein variables were dropped sequentially until R2 > 0.95. This yielded a highly parsimonious final model incorporating the main drivers of predictions. The prediction model was compared to SCORE2, a validated prediction model for the 10-year risk of cardiovascular disease developed using multiple European cohorts33. The 10-year survival probability and the covariate mean values used in the SCORE2 equations were replaced with the estimated 6-month survival probability and mean values from the current data to calculate the SCORE2-estimated 6-month cardiovascular disease risk34. Two additional external validations of the model were performed in the UK Biobank. First, all coefficients and covariate mean values in Supplementary Table 4 were used to validate the model. Second, the model was recalibrated using mean values and the estimated baseline risk from the UK Biobank cohort before validation.
To evaluate whether any biomarkers added to the clinical prediction model improve risk prediction, we used the linear predictor from the prediction model as an offset in a LASSO Cox regression model. Before model fitting, all proteins and metabolites were adjusted for technical variables. Briefly, each biomarker was used as the outcome variable in a regression model with all technical variables as covariates. The residuals from these models were used in place of the original biomarker values in the LASSO model. The LASSO model fitting was bootstrapped 250 times to investigate the stability of the variable selection.
As the biomarkers may have nonlinear associations with the outcome and interact with one another, and prior knowledge about nonlinearities and interactions among these variables is scarce, a random forest with 2,000 trees was fitted to the data as an exploratory analysis. Briefly, the random forest fits survival trees to bootstrap data samples using a random subset of the variables in each tree, handling interactions and nonlinearities naturally. A variable importance measure is associated with each variable and calculated based on the number of splits in which a variable is involved. The random forest was bootstrapped 250 times to obtain CIs for the variable importance measures.
Further analysis of relevant biomarkers
The associations of proteins detected using the Olink panels cardiovascular II and cardiovascular III with the CACS were available for testing in individuals free from cardiovascular disease (self-reported myocardial infarction, angina, coronary intervention, heart failure, atrial fibrillation, stroke and peripheral artery disease) for 1,586 participants at the Malmö or Uppsala centers of SCAPIS35. A higher CACS reflects a higher myocardial infarction risk. Proteins replicated in the primary MIMI analysis (BNP) were tested for an association with the CACS using an ordinal regression model adjusting for age, sex, body mass index, systolic blood pressure, creatinine, center, Olink plate, analysis date and season.
Consent
This study was approved by the Uppsala Ethics Authority (Dnr 2016/197). All Estonian Biobank participants signed a broad informed consent form. The study was carried out under ethical approval 258/M-21 from the research ethics committee of the University of Tartu and data release J08 from the Estonian Biobank. The Lifelines protocol was approved by the University Medical Center Groningen medical ethical committee under number 2007/152. The study was performed in accordance with the Declaration of Helsinki. The EpiHealth study was approved by the ethics committee of Uppsala University, and all participants provided informed written consent. The MFM was approved by the previous regional research committee in Lund, Sweden (2014/643), and all participants provided informed consent. Ethical review boards of the cohorts in EPIC-CVD approved the study protocol, and all participants provided written informed consent. Participation in the HUNT study was based on informed consent, and the Data Inspectorate and the Regional Ethics Committee for Medical Research in Norway approved the study.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Acknowledgements
Transnational access to the large European prospective cohorts was provided by the BBMRI-LPC project funded by the European Commission’s Seventh Framework Programme (grant no. 313010, J.S.).
The Estonian Biobank study was supported by the European Union through the European Regional Development Fund (project no. 2014-2020.4.01.15-0012) and by institutional research funding IUT (IUT20-60) of the Estonian Ministry of Education and Research. We thank M. Alver (Estonian Biobank) for help with phenotype data.
The Trøndelag Health Study (HUNT) is a collaboration between the HUNT Research Centre (Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, NTNU), Trøndelag County Council, Central Norway Regional Health Authority, and the Norwegian Institute of Public Health.
The Lifelines initiative was made possible by a subsidy from the Dutch Ministry of Health, Welfare and Sport; the Dutch Ministry of Economic Affairs; the University Medical Center Groningen (UMCG), Groningen University; and the provinces in the north of the Netherlands (Drenthe, Friesland, Groningen). Lifelines is a multidisciplinary, prospective, population-based cohort study examining the health and health-related behaviors of 167,729 persons living in the north of the Netherlands in a unique three-generation design. The study used a broad range of investigative procedures in assessing the biomedical, sociodemographic, behavioral, physical and psychological factors that contribute to the health and disease of the general population, with a particular focus on multimorbidity and complex genetics.
EPIC-CVD was supported by funding from the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), European Research Council (268834), Novartis, UK Medical Research Council (G0800270, MR/L003120/1), British Heart Foundation (SP/09/002, RG/13/13/30194, RG/18/13/33946), and National Institute for Health and Care Research (NIHR) Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. The coordination of EPIC was financially supported by the International Agency for Research on Cancer (IARC) and the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts were supported by the Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro (AIRC), Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (Netherlands); Health Research Fund (FIS)–Instituto de Salud Carlos III (ISCIII), regional governments (Andalucía, Asturias, Basque Country, Murcia and Navarra), Catalan Institute of Oncology (ICO) (Spain); Swedish Cancer Society, Swedish Research Council and county councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk, C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (United Kingdom). We thank all EPIC participants and staff for their contribution to the study, the laboratory teams at the Medical Research Council Epidemiology Unit for sample management and at Cambridge Genomic Services for genotyping, S. Spackman for data management, and the team at the EPIC-CVD Coordinating Centre for study coordination and administration.
SCAPIS was supported by Hjärt-Lungfonden, the Knut and Alice Wallenberg Foundation, the Swedish Research Council and VINNOVA.
This research was conducted using the UK Biobank resource under application no. 52678.
Grants were received from the European Research Council (no. 801965), Swedish Research Council, VR (2019-01471) and Hjärt-Lungfonden (20190505) (T.F.).
Grants were received from AFA Försäkring (160266), the Swedish Research Council (2016-01065), Hjärt-Lungfonden (20160734) and A. Wiklöf (J.S.).
The computations were enabled by resources in projects sens2019006 and sens2020005 provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.
The funding sources for the different cohorts and the sponsors of the current work did not have any part in the collection, analysis and interpretation of data or in the decision to submit the paper for publication. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article, and they do not necessarily represent the decisions, policies or views of the International Agency for Research on Cancer/World Health Organization.
Extended data
Author contributions
J.S., S.G., E.L., E.I., U.H., A.G., B. Svennblad and T.F. were responsible for the study conception and design. S.G., E.L. and K.J.E. performed the statistical analyses. J.S. wrote the final draft together with S.G. and E.L. A.S.B., S.E., G.E., K.H., M.J., A.L., L.L., K.L., G.M., A.M., C.M.-I., P.M.N., M.P., B. Simell, H.S. and B.O.Å. were responsible for data acquisition for the different cohorts. All authors contributed to manuscript preparation and provided critical comments on the final manuscript. All authors had full access to all study data and had final responsibility for the decision to submit the paper for publication.
Peer review
Peer review information
Nature Cardiovascular Research thanks Ji Chen, Nicholas Mills and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Funding
Open access funding provided by Uppsala University.
Data availability
Data supporting the findings of this study are provided in the article and related files. Raw data are not publicly available, as they contain sensitive personal information, but may be obtained from the original cohorts upon request, with varying processes, requirements and response times. For example, researchers can apply to use the Lifelines data used in this study; information on how to request access to Lifelines data and the conditions of use can be found at https://lifelines.nl/researcher/how-to-apply. Data accession codes for this study are described below.
Code availability
The prediction model equation is available at miscore.org. Code is available at https://github.com/stefan-gustafsson-work/mimi. All code used to analyze UK Biobank data is deposited at the UK Biobank repository. Questions about the analyses and code can be sent to the corresponding author.
Competing interests
The authors declare the following competing interests: A.S.B. reports grants outside this work (from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Novartis and Sanofi) and personal fees from Novartis. E.I. is now an employee at GlaxoSmithKline. S.G. is an employee of Sence Research AB. J.S. reports stock ownership in Anagram kommunikation AB and Symptoms Europe AB outside the submitted work. All other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Stefan Gustafsson, Erik Lampa.
Extended data
is available for this paper at 10.1038/s44161-024-00422-2.
Supplementary information
The online version contains supplementary material available at 10.1038/s44161-024-00422-2.
References
- 1.Lozano, R. et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet380, 2095–2128 (2012). 10.1016/S0140-6736(12)61728-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.European Society of Cardiology, European Heart Network, European Commission, World Health Organization. European heart health charter. Eur. J. Prev. Cardiol. 10.1097/01.hjr.0000266926.91914.2a (2007).
- 3.Carey, I. M. et al. Increased risk of acute cardiovascular events after partner bereavement: a matched cohort study. JAMA Intern. Med.174, 598–605 (2014). 10.1001/jamainternmed.2013.14558 [DOI] [PubMed] [Google Scholar]
- 4.Fang, F. et al. Suicide and cardiovascular death after a cancer diagnosis. N. Engl. J. Med.366, 1310–1318 (2012). 10.1056/NEJMoa1110307 [DOI] [PubMed] [Google Scholar]
- 5.Zaman, T. et al. Angiographic lesion severity and subsequent myocardial infarction. Am. J. Cardiol.110, 167–172 (2012). 10.1016/j.amjcard.2012.03.008 [DOI] [PubMed] [Google Scholar]
- 6.Kotseva, K. et al. Cardiovascular prevention guidelines in daily practice: a comparison of EUROASPIRE I, II, and III surveys in eight European countries. Lancet373, 929–940 (2009). 10.1016/S0140-6736(09)60330-5 [DOI] [PubMed] [Google Scholar]
- 7.Ganna, A. et al. Risk prediction measures for case–cohort and nested case–control designs: an application to cardiovascular disease. Am. J. Epidemiol.175, 715–724 (2012). 10.1093/aje/kwr374 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Liang, F. & Gardner, D. G. Mechanical strain activates BNP gene transcription through a p38/NF-κB-dependent mechanism. J. Clin. Invest.104, 1603–1612 (1999). 10.1172/JCI7362 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Heidenreich, P. A. et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation145, e895–e1032 (2022). [DOI] [PubMed] [Google Scholar]
- 10.Gulati, M. et al. 2021 AHA/ACC/ASE/CHEST/SAEM/SCCT/SCMR guideline for the evaluation and diagnosis of chest pain: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation144, e368–e454 (2021). [DOI] [PubMed] [Google Scholar]
- 11.Sabatine, M. S. et al. Acute changes in circulating natriuretic peptide levels in relation to myocardial ischemia. J. Am. Coll. Cardiol.44, 1988–1995 (2004). 10.1016/j.jacc.2004.07.057 [DOI] [PubMed] [Google Scholar]
- 12.Johansson, A. et al. Genome-wide association and Mendelian randomization study of NT-proBNP in patients with acute coronary syndrome. Hum. Mol. Genet.25, 1447–1456 (2016). 10.1093/hmg/ddw012 [DOI] [PubMed] [Google Scholar]
- 13.Folkersen, L. et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat. Metab.2, 1135–1148 (2020). 10.1038/s42255-020-00287-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Rai, H. et al. Association of interleukin 6 −174 G/C polymorphism with coronary artery disease and circulating IL-6 levels: a systematic review and meta-analysis. Inflamm. Res.70, 1075–1087 (2021). 10.1007/s00011-021-01505-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Yamamoto, M. et al. HE4 predicts progressive fibrosis and cardiovascular events in patients with dilated cardiomyopathy. J. Am. Heart Assoc.10, e021069 (2021). 10.1161/JAHA.120.021069 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Zhang, Y., Si, Y., Ma, N. & Mei, J. The RNA-binding protein PCBP2 inhibits Ang II-induced hypertrophy of cardiomyocytes though promoting GPR56 mRNA degeneration. Biochem. Biophys. Res. Commun.464, 679–684 (2015). 10.1016/j.bbrc.2015.06.139 [DOI] [PubMed] [Google Scholar]
- 17.McCarthy, C. P. et al. Derivation and external validation of a high-sensitivity cardiac troponin-based proteomic model to predict the presence of obstructive coronary artery disease. J. Am. Heart Assoc.9, e017221 (2020). 10.1161/JAHA.120.017221 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Yang, L. et al. GFRAL is the receptor for GDF15 and is required for the anti-obesity effects of the ligand. Nat. Med.23, 1158–1166 (2017). 10.1038/nm.4394 [DOI] [PubMed] [Google Scholar]
- 19.Liss, D. B., Paden, M. S., Schwarz, E. S. & Mullins, M. E. What is the clinical significance of 5-oxoproline (pyroglutamic acid) in high anion gap metabolic acidosis following paracetamol (acetaminophen) exposure? Clin. Toxicol. (Phila.)51, 817–827 (2013). 10.3109/15563650.2013.844822 [DOI] [PubMed] [Google Scholar]
- 20.Deng, Y. et al. Upregulated microRNA-381-5p strengthens the effect of dexmedetomidine preconditioning to protect against myocardial ischemia–reperfusion injury in mouse models by inhibiting CHI3L1. Int. Immunopharmacol.92, 107326 (2021). 10.1016/j.intimp.2020.107326 [DOI] [PubMed] [Google Scholar]
- 21.van der Ploeg, T., Austin, P. C. & Steyerberg, E. W. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med. Res. Methodol.14, 137 (2014). 10.1186/1471-2288-14-137 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Simell, B. A. et al. Transnational access to large prospective cohorts in Europe: current trends and unmet needs. N. Biotechnol.49, 98–103 (2019). 10.1016/j.nbt.2018.10.001 [DOI] [PubMed] [Google Scholar]
- 23.Sajja, A. et al. Comparison of methods to estimate low-density lipoprotein cholesterol in patients with high triglyceride levels. JAMA Netw. Open4, e2128817 (2021). 10.1001/jamanetworkopen.2021.28817 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2018).
- 25.Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw.33, 1–22 (2010). 10.18637/jss.v033.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.van Buuren, S. & Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Softw.45, 1–67 (2011). 10.18637/jss.v045.i03 [DOI] [Google Scholar]
- 27.Harrell, F. E. Jr. rms: regression modeling strategies. CRAN.R-project.org/package=rms (2020).
- 28.Wright, M. N. & Ziegler, A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw.77, 1–17 (2017). 10.18637/jss.v077.i01 [DOI] [Google Scholar]
- 29.Therneau, T. A package for survival analysis in R. CRAN.R-project.org/package=survival (2022).
- 30.White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: issues and guidance for practice. Stat. Med.30, 377–399 (2011). 10.1002/sim.4067 [DOI] [PubMed] [Google Scholar]
- 31.Rubin, D. B. Inference and missing data. Biometrika63, 581–592 (1976). 10.1093/biomet/63.3.581 [DOI] [Google Scholar]
- 32.Austin, P. C., Harrell, F. E. Jr. & van Klaveren, D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat. Med.39, 2714–2742 (2020). 10.1002/sim.8570 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.SCORE2 Working Group and ESC Cardiovascular Risk Collaboration. SCORE2 risk prediction algorithms: new models to estimate 10-year risk of cardiovascular disease in Europe. Eur. Heart J. 42, 2439–2454 (2021). [DOI] [PMC free article] [PubMed]
- 34.D’Agostino, R. B. Sr., Grundy, S., Sullivan, L. M. & Wilson, P. CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA286, 180–187 (2001). 10.1001/jama.286.2.180 [DOI] [PubMed] [Google Scholar]
- 35.Bergström, G. et al. The Swedish CArdioPulmonary BioImage Study: objectives and design. J. Intern. Med.278, 645–659 (2015). 10.1111/joim.12384 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Data supporting the findings of this study are provided in the article and related files. Raw data are not publicly available, as they contain sensitive personal information, but may be obtained from the original cohorts upon request, with varying processes, requirements and response times. For example, researchers can apply to use the Lifelines data used in this study; information on how to request access to Lifelines data and the conditions of use can be found at https://lifelines.nl/researcher/how-to-apply. Data accession codes for this study are described below.
The prediction model equation is available at miscore.org. Code is available at https://github.com/stefan-gustafsson-work/mimi. All code used to analyze UK Biobank data is deposited at the UK Biobank repository. Questions about the analyses and code can be sent to the corresponding author.