Abstract
Background:
Stroke incidence is decreasing in older ages but increasing in young adults. These divergent trends are at least partially attributable to diverging trends in stroke risk factors but may also be due to differences in the impact of stroke risk factors at different ages. To address this latter possibility, we employed Mendelian Randomization (MR) to assess differences in the association of stroke risk factors between early onset (EOS, onset 18-59 years) and late onset ischemic stroke (LOS, onset ≥ 60 years).
Methods:
We identified genetic variants from the GWAS catalog for use as instrumental variables to proxy conventional stroke risk factors and then estimated the effects of these variants on risk factors in younger and older individuals in the UK Biobank. We then used these age-specific effects to estimate the causal effects of stroke risk factors on EOS (n = 6,728 cases) and LOS (n = 9,272 cases) from SiGN and the Early Onset Stroke Consortium. Lastly, we compared odds ratios between EOS and LOS, stratified by TOAST subtypes, to determine if differences between estimates could be attributed to differences in stroke subtype distributions.
Results:
EOS was associated with higher body mass index (BMI) and blood pressure, type 2 diabetes (T2D), and lower HDL cholesterol (all p ≤ 0.002), while LOS was associated with higher systolic blood pressure (p = 0.0001).
The causal effect of BMI on stroke was significantly stronger for EOS than for LOS (OR = 1.26 vs 1.03; p = 0.008). After the subtype-stratified analysis, the difference in causal effect sizes between EOS and LOS for BMI diminished and was no longer significant.
Conclusions:
These results support causal relationships of BMI, blood pressure, T2D, and HDL levels with early onset ischemic stroke and of blood pressure with late onset ischemic stroke. Interventions that target these traits may reduce stroke risk.
Keywords: Ischemic Stroke, Mendelian Randomization, Risk Factors, Young Adults
Brief Summary
Our results support causal links of early onset stroke with BMI, HTN, T2D, and HDL, and of late onset stroke with HTN. The causal effect of BMI is significantly stronger for early than for late onset stroke.
Introduction
Stroke is one of the leading causes of death and disability, with more than 795,000 new and recurrent cases annually in the United States as of 2023.1 Despite its high incidence, stroke is largely a preventable disorder. The Global Burden of Disease Study estimated that 91% of stroke burden, measured as disability-adjusted life years, is attributable to modifiable risk factors and 72% of stroke burden is attributable to clusters of metabolic risk factors, namely hypertension, obesity, hyperglycemia, hyperlipidemia, and renal dysfunction.2,3 While most strokes occur in individuals over the age of 50 years, approximately 10% of strokes occur in younger adults aged 18-50 years.4 Since the year 2000, incidence rates of ischemic stroke in high-income countries have been declining among older individuals but have been increasing in individuals younger than age 55.5 While these diverging trends in stroke incidence can be at least partially attributed to diverging patterns of stroke risk factors between younger and older adults, it is also possible that the impact of stroke risk factors differs between younger and older individuals.
To evaluate the impact of modifiable risk factors on ischemic stroke at different ages, we employed Mendelian randomization6 (MR) to compare causal associations by age of stroke onset and across stroke subtypes. MR uses genetically predicted levels of traits as proxies for the traits themselves. Since alleles are randomly assigned at conception, these genetic proxies are generally independent of the risk factor-outcome relationship and thus less subject to the reverse causality and confounding seen in observational studies. We hypothesized that the contributions of five modifiable ischemic stroke risk factors (blood pressure, body mass index, type 2 diabetes, hyperlipidemia, and smoking) differ between early and late onset ischemic stroke (EOS and LOS) and that these differences cannot be explained by the differential distribution of stroke subtypes. Using Mendelian randomization, we estimated the causal effects of these risk factors and compared these estimates between early onset stroke (age of stroke onset < 60 yrs.) and late onset stroke (age of stroke onset ≥ 60 yrs.).
Methods
Data Availability
Individual level data from SiGN, where permitted by participant consent and institutional certification, have been deposited into dbGaP. Summary level GWAS statistics from SiGN and EOSC are available on the Cerebrovascular Disease Knowledge Portal.
Study design.
This study was performed according to the Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization (STROBE-MR) guidelines.7 We employed a 2-sample MR design, using the inverse-variance weighted (IVW) method as our primary analysis, to estimate the causal effects of conventional stroke risk factors on early and late onset ischemic stroke.
Study sample for Primary Outcome.
This study utilizes stroke cases and controls assembled from two large GWAS consortia: the Early Onset Stroke Consortium8 (EOSC) and the Stroke Genetics Network9 (SiGN). Stroke cases in these consortia underwent brain imaging at each site to exclude diagnoses other than ischemic stroke and to assist with subtype classification. Additional screening was performed in some, but not all, studies to exclude cases believed to be due to a known monogenic cause (e.g., sickle cell disease) or to a known non-genetic cause (e.g., drug use, complications of procedures). Ischemic stroke subtyping was performed using the TOAST criteria10 in most, but not all, sites.
Consistent with criteria used in the EOSC, we defined early onset stroke for these analyses as cases with stroke onset 18-59 years, and late-onset stroke as those with age at first stroke 60 years or older. Subjects included in this report are restricted to a subset of 6,728 early-onset cases (and 33,764 controls) and 9,272 late-onset stroke cases (and 25,124 controls) who are of European ancestry and for whom individual-level genotypes were available (Table S1). The genotype data from stroke cases and controls were based on hg38 and imputed using the TOPMed reference panel on the University of Michigan Imputation Server.11
Exposure Genetic instrument selection.
An important consideration for MR analysis is that the population in which the genetic instrument is developed should be as comparable as possible to the population in which the outcome is measured. To address this issue, we developed two sets of genetic instruments, one in a population < 60 years of age, and the second in a population ≥ 60 years of age. Our strategy was to use a common set of risk factor-associated variants but to weight them differently according to their population-specific effect sizes. First, we obtained summary genetic association results from large publicly available GWAS in the GWAS catalog12 (https://www.ebi.ac.uk/gwas/) for 9 stroke risk factors: body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol (TCHOL), LDL cholesterol (LDL), HDL cholesterol (HDL), triglycerides (TG), type 2 diabetes (T2D), and smoking initiation (SmkInit). Sample sizes for the genome-wide association analyses of each trait ranged from 339,224 to 1,232,091 (Table S2). We identified variants from GWAS that were associated with each risk factor at genome-wide significance (p < 5 × 10−8) and selected the most significant variant at each associated locus by removing SNPs in linkage disequilibrium with the lead SNP using the clumping procedure in PLINK with parameters clump-kb = 10,000 and clump-r2 = 0.001. We assessed the strength of each SNP by calculating its F-statistic, which is a function of the proportion of the variance explained by the genetic instrument and the sample size.11 The total number of SNPs obtained from each GWAS, the number of SNPs pruned at each filtering step, and the corresponding F-statistic are provided in Table S3.
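The instrument-strength check can be sketched as follows. This is a minimal illustration, assuming the commonly used approximation F = R²(N − k − 1) / (k(1 − R²)) and, for a single variant on a standardized trait, R² ≈ 2·MAF·(1 − MAF)·β². The text does not state the exact formula used, so the function names and the input values below are hypothetical.

```python
def variance_explained(beta: float, maf: float) -> float:
    """Approximate R^2 of one SNP on a standardized (unit-variance) trait.

    Assumption: additive model, Hardy-Weinberg equilibrium.
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """F-statistic for k instruments explaining a fraction r2 of variance in n samples."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Illustrative values only; they are not taken from the study.
r2 = variance_explained(beta=0.02, maf=0.30)
f_value = f_statistic(r2, n=400_000)
print(f"R^2 = {r2:.6f}, F = {f_value:.2f}")
```

Because F grows roughly linearly with both R² and N, even variants with small effects can be strong instruments in GWAS of several hundred thousand participants.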
Having identified the risk factor variants to include in our instruments, we then estimated age-specific effect sizes (age < 60 yrs and age ≥ 60 yrs) for each selected variant by performing an age-stratified genetic association analysis of these variants for each risk factor in the UK Biobank and extracting their effect sizes. This enabled us to obtain MR estimates of the causal effect of the risk factor on the outcome in populations of approximately the same age as those in which the genetic instrument-exposure effect sizes were calculated. We utilized data from the UK Biobank, a large ongoing prospective cohort study involving 506,682 participants recruited between 2006 and 2010. Participants were aged 40–69 at the time of recruitment. Study populations were identified using BMI, DBP, SBP, HDL, LDL, TG, TCHOL, Smoking Status, and T2D as phenotypes based on UDI and phenocodes. See Table S4 for study population size and definition of each phenotype. Details of the UK Biobank genetic association analyses, including analysis models and genetic association results, are provided in the Supplement (Supplementary Info, Methods; Table S5–13). We compared effect sizes of the risk factor PRS on the risk factor between individuals < age 60 and individuals ≥ age 60 using a paired t-test. We performed a formal comparison of effect sizes across all variants using Cochran’s Q and I².
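A heterogeneity comparison of this kind can be sketched as below. This is a minimal illustration of Cochran's Q and I² under a fixed-effect pooling of estimates, assuming independent estimates with known standard errors. The function name and the example inputs (one variant's effect in the two age strata) are hypothetical, and the study's actual implementation may differ.

```python
def cochran_q(betas, ses):
    """Return (Q, degrees of freedom, I^2) for a set of effect estimates.

    betas: effect estimates; ses: their standard errors (same order).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 is the share of variability beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Illustrative values only: one variant's effect in the < 60 and >= 60 strata.
q, df, i2 = cochran_q([0.10, 0.20], [0.05, 0.05])
print(f"Q = {q:.2f} on {df} df, I^2 = {100 * i2:.0f}%")
```

Under the null hypothesis of homogeneity, Q is compared against a chi-square distribution with the returned degrees of freedom.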
Mendelian Randomization Assumptions
To minimize the potential for bias in making causal inferences, SNPs were selected to adhere to three assumptions of valid instrumental variables (Figure 1):13 (1) the SNPs used must be strongly associated with the exposure; (2) the SNPs must not be associated with measured or unmeasured confounders; and (3) the SNPs affect the outcome only through the effects of the exposure (i.e., no horizontal pleiotropy). We used the F-statistic as a measure of the strength of the SNP-exposure association (Assumption 1) and performed sensitivity analyses (see below) to assess violations of Assumptions 2 and 3.
Figure 1:

The three MR assumptions, where Z is the IV associated with the exposure, X is the exposure, Y is the outcome, and U is a confounder. 1) Relevance assumption: the IV is strongly associated with the exposure of interest. 2) Independence assumption: there are no confounders of the association between the IV and the outcome. 3) Exclusion restriction assumption: the IV is not related to the outcome other than via the exposure.
Statistical Analyses
Following clumping and F-statistic filtering, SNP-exposure (risk factor) association statistics were extracted for the selected SNPs to create nine age-specific genetic risk score instruments for the exposures. The SNP-outcome (stroke) associations were obtained from genetic association analyses performed on the early and late onset stroke datasets from the EOSC and SiGN. The SNP-outcome associations were calculated using PLINK2 software (v2.00a3.3LM)14 for all ischemic stroke and for the TOAST subtypes using logistic regression, controlling for sex and for genetic ancestry with principal components 1 through 10.
We used the random-effects inverse-variance weighted (IVW) method15 as the primary method for computing causal estimates of the association of each exposure (risk factor) with stroke. This approach entails calculating a Wald ratio for each SNP by dividing the SNP-outcome association by the SNP-exposure association and then estimating the mean of these Wald ratios, weighting each by the inverse of its variance. We used the random-effects model to adjust for heterogeneity among Wald ratios by accounting for over-dispersion in the regression model. Odds ratios were calculated for both EOS and LOS using the final IVW estimate for the causal association of each risk factor with all stroke and with each TOAST subtype. The odds ratios for continuous risk factors correspond to the change in odds of stroke per one standard deviation change in the risk factor. To account for multiple testing across the nine risk factors, we considered a P-value < 0.0055 (P < 0.05/9) to be statistically significant.
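The IVW calculation described above can be sketched as follows. This is a minimal illustration, assuming first-order weights (SNP-exposure uncertainty ignored) and a multiplicative random-effects model in which the standard error is inflated by the over-dispersion factor when that factor exceeds 1. The analyses in the paper were run with the MendelianRandomization R package; the Python function names and input values here are hypothetical.

```python
import math


def ivw_random_effects(bx, by, se_by):
    """Random-effects IVW estimate from per-SNP summary statistics.

    bx: SNP-exposure effects; by: SNP-outcome effects (log odds);
    se_by: standard errors of by. Returns (beta, standard error).
    """
    if not (len(bx) == len(by) == len(se_by)) or len(bx) < 2:
        raise ValueError("need at least two SNPs with matching inputs")
    ratios = [y / x for x, y in zip(bx, by)]             # Wald ratios
    weights = [(x / s) ** 2 for x, s in zip(bx, se_by)]  # 1 / var(Wald ratio)
    total_w = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_w
    # Over-dispersion: Cochran's Q of the Wald ratios divided by its df.
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    phi = max(1.0, q / (len(bx) - 1))
    return beta, math.sqrt(phi / total_w)


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Illustrative values only; they are not taken from the study.
beta, se = ivw_random_effects(
    bx=[0.10, 0.20, 0.15], by=[0.02, 0.05, 0.03], se_by=[0.01, 0.01, 0.01]
)
or_est, or_lo, or_hi = odds_ratio_ci(beta, se)
print(f"OR = {or_est:.2f} (95% CI {or_lo:.2f}-{or_hi:.2f})")
```

Truncating the over-dispersion factor at 1 means the random-effects interval is never narrower than the fixed-effect one.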
To evaluate whether the IVW estimates comply with the independence and exclusion restriction assumptions, we performed several sensitivity analyses. As a measure of pleiotropy, we assessed heterogeneity among the individual Wald ratios from the initial estimates using I² and Cochran’s Q as well as the intercept taken from the MR-Egger method. We also performed the MR analysis using other methods (e.g., simple median16, weighted median16, and MR-Egger17) that are more robust than the IVW approach against deviation from the MR assumptions. Although these methods have less power, estimates from these analyses that are directionally discordant from the IVW estimates could be an indication of the presence of pleiotropy. Cook’s distance and the MR Pleiotropy Residual Sum and Outlier18 (MR-PRESSO) method were used to identify and remove pleiotropic outliers. SNPs with Cook’s distance > (4/number of SNPs) were tagged as outliers and filtered out due to their disproportionate level of influence on the MR models. MR-PRESSO uses a leave-one-out methodology to detect pleiotropic SNPs and quantifies their distortion of the causal estimate. To ensure the instrumental variables used were the same in our EOS and LOS estimates, we removed non-overlapping SNPs before recalculating the IVW estimate. All statistical analyses were done in R (4.0.3; The R Foundation for Statistical Computing) using the MendelianRandomization (0.9.0) and MR-PRESSO (1.0) packages.
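The Cook's distance filter can be sketched as below. This is a minimal illustration, assuming the distances are computed from the IVW model expressed as a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin. The text does not specify which regression model was used, so that choice, the function names, and the input values are assumptions.

```python
def cooks_distance_ivw(bx, by, se_by):
    """Cook's distance for each SNP in a weighted regression through the origin.

    Dividing both effects by se_by turns the weighted fit into ordinary
    least squares with a single parameter (p = 1).
    """
    xs = [x / s for x, s in zip(bx, se_by)]
    ys = [y / s for y, s in zip(by, se_by)]
    n, p = len(xs), 1
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx
    resid = [y - beta * x for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    if s2 == 0.0:
        return [0.0] * n  # perfect fit: no SNP is influential
    distances = []
    for x, e in zip(xs, resid):
        h = x * x / sxx  # leverage of this SNP
        distances.append((e * e / (p * s2)) * h / (1.0 - h) ** 2)
    return distances


def flag_outliers(distances):
    """Indices of SNPs whose Cook's distance exceeds 4 / (number of SNPs)."""
    cutoff = 4.0 / len(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Illustrative values only: the last SNP has a discordant Wald ratio.
bx = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
by = [0.020, 0.024, 0.016, 0.022, 0.018, 0.080]
se_by = [0.01] * 6
outliers = flag_outliers(cooks_distance_ivw(bx, by, se_by))
print("flagged SNP indices:", outliers)
```

Flagged SNPs would be dropped and the IVW estimate recomputed on the remaining variants.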
Homogeneity Test of Causal Estimates in EOS and LOS
To compare the causal estimates between the early and late onset stroke groups, we performed a t-test, with the test statistic calculated as the difference between the betas divided by the standard error of the difference. For this hypothesis, we accounted for multiple testing by adjusting for the number of significantly associated risk factors tested.
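The comparison of two causal estimates can be sketched as below. This is a minimal illustration, assuming the EOS and LOS estimates are independent, so that the variance of their difference is the sum of the two squared standard errors, and using a normal approximation for the two-sided p-value. The function name and input values are hypothetical.

```python
import math


def difference_test(beta_a, se_a, beta_b, se_b):
    """Test whether two independent log-odds estimates differ.

    Returns (z, two-sided p) using a normal approximation.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SE of the difference
    z = (beta_a - beta_b) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal tail area
    return z, p


# Illustrative values only; they are not taken from the study.
z, p = difference_test(beta_a=0.30, se_a=0.06, beta_b=0.10, se_b=0.08)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If the two groups shared controls, the estimates would be positively correlated and this independence assumption would make the test conservative.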
We also evaluated whether any differences in effect sizes of risk factors between early and late onset stroke could be driven by differences in the distribution of stroke subtypes between the two groups. For this analysis, we estimated the causal effects for each subtype and then computed the mean of the differences in effect sizes between early and late onset across the five different subtypes. We then computed the variance of the mean difference and performed a t-test to evaluate the significance of the difference in causal effects while accounting for subtype differences between EOS and LOS.
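One way to carry out the subtype-adjusted comparison is sketched below. This is a minimal illustration, assuming the subtype-specific estimates are mutually independent, so that the variance of the mean difference is the sum of the per-subtype variances divided by the squared number of subtypes, and using a normal approximation. The text does not give the exact variance formula, so this choice, the function name, and the input values are assumptions.

```python
import math


def mean_subtype_difference(eos, los):
    """Mean EOS-minus-LOS difference in causal estimates across subtypes.

    eos, los: lists of (beta, se) tuples, one per subtype, in the same order.
    Returns (mean difference, its standard error, z, two-sided p).
    """
    if len(eos) != len(los) or not eos:
        raise ValueError("need matching, non-empty subtype lists")
    k = len(eos)
    diffs = [b_e - b_l for (b_e, _), (b_l, _) in zip(eos, los)]
    var_sum = sum(s_e ** 2 + s_l ** 2 for (_, s_e), (_, s_l) in zip(eos, los))
    mean_diff = sum(diffs) / k
    se_mean = math.sqrt(var_sum) / k  # sqrt(sum of variances / k^2)
    z = mean_diff / se_mean
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return mean_diff, se_mean, z, p


# Illustrative values only, for five TOAST subtypes.
eos = [(0.30, 0.1), (0.20, 0.1), (0.25, 0.1), (0.10, 0.1), (0.15, 0.1)]
los = [(0.20, 0.1), (0.20, 0.1), (0.15, 0.1), (0.10, 0.1), (0.05, 0.1)]
mean_diff, se_mean, z_sub, p_sub = mean_subtype_difference(eos, los)
print(f"mean difference = {mean_diff:.3f}, z = {z_sub:.2f}, p = {p_sub:.2f}")
```

Averaging within-subtype differences gives each subtype equal weight, so a difference in subtype mix between EOS and LOS does not by itself produce a nonzero result.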
Ethical approval
The genetic association analyses, which involved deidentified data obtained from the UK Biobank Resource under Application Number 49852, underwent ethical oversight, including the determination by the University of Maryland, Baltimore Institutional Review Board that the study is not human research (IRB #: HF-00088022).
Results
Causal effects of stroke risk factors on EOS and LOS
From published GWAS and our filtering steps (see Table S2–3), we identified genetic instruments comprising 20–803 variants for the nine risk factors we analyzed (see Table 1). The IVW causal estimates between the stroke risk factors and ischemic stroke, and their corresponding odds ratios (scaled to a 1-standard deviation unit change for continuous risk factors), are shown in Figure 2 and Table S14 for EOS and LOS and stroke subtypes. We identified causal effects on EOS for BMI (OR = 1.26, 95% CI: 1.13-1.40), DBP (OR = 1.39, 95% CI: 1.21-1.60), SBP (OR = 1.47, 95% CI: 1.28-1.69), HDL (OR = 0.82, 95% CI: 0.73-0.93), and T2D (OR = 1.17, 95% CI: 1.06-1.29), all p ≤ 0.002. The causal effects of LDL, TCHOL, and smoking on EOS did not meet criteria for statistical significance. In contrast, only the causal effect for SBP on LOS met criteria for statistical significance (OR = 1.24, 95% CI: 1.11-1.38, p = 0.0001).
Table 1:
Genetic Variants that passed filtering criteria per risk factor and were used in the All-Stroke Analysis
| Risk Factor | Onset | F-Statistics Average [Min, Max] | # Variants |
|---|---|---|---|
| BMI | EOS | 60.89 [28.45, 1426.17] | 791 |
| BMI | LOS | 60.89 [28.45, 1426.17] | 791 |
| DBP | EOS | 68.88 [29.58, 850.69] | 772 |
| DBP | LOS | 68.88 [29.58, 850.69] | 768 |
| HDL | EOS | 127.4 [29, 4451.54] | 312 |
| HDL | LOS | 127.4 [29, 4451.54] | 312 |
| LDL | EOS | 146.96 [29.03, 2967.58] | 294 |
| LDL | LOS | 146.96 [29.03, 2967.58] | 294 |
| SBP | EOS | 68.54 [29.69, 700.65] | 800 |
| SBP | LOS | 68.54 [29.69, 700.65] | 795 |
| SmkInit | EOS | 32.43 [17.15, 105.64] | 252 |
| SmkInit | LOS | 32.43 [17.15, 105.64] | 252 |
| T2D | EOS | 69.59 [24.52, 833.23] | 52 |
| T2D | LOS | 69.59 [24.52, 833.23] | 52 |
| TCHOL | EOS | 147.77 [29, 3099.41] | 300 |
| TCHOL | LOS | 147.77 [29, 3099.41] | 300 |
| TG | EOS | 120.97 [28.59, 2173.62] | 281 |
| TG | LOS | 120.97 [28.59, 2173.62] | 281 |
Figure 2:

Odds ratio and 95% confidence interval for association of nine stroke risk factors with EOS (blue) and LOS (red) for all ischemic strokes.
** P < 0.01; * P < 0.05; for the heterogeneity test between EOS and LOS
Comparison of effect sizes between EOS and LOS
For all risk factors except LDL and total cholesterol, unfavorable levels of the risk factor were more strongly associated with EOS than LOS. Heterogeneity testing indicated that the causal effect of BMI on stroke was significantly stronger for EOS than for LOS (OR = 1.26 vs 1.03; p = 0.008). The causal effects of DBP, SBP, and HDL were also stronger for EOS than for LOS (DBP: OR = 1.39 vs 1.11, p = 0.016; SBP: OR = 1.47 vs 1.24, p = 0.051; HDL: OR = 0.82 vs 0.90, p = 0.022), although none achieved statistical significance at our threshold of P < 0.01 (0.05/5, for the 5 risk factors tested) (Table S14).
Assessment of the MR assumptions
Weighted median, simple median, and MR-Egger were used as alternative causal estimators, and their estimates remained stable relative to the IVW estimate (Figure 3). The MR-Egger intercept indicated no evidence for pleiotropy (p > 0.05; Table S15–16). There was no strong evidence of heterogeneity among the Wald ratios, with heterogeneity defined as I² > 50% and a Cochran Q test p < 0.05 (Table S17).
Figure 3:

MR Scatterplots and Causal Estimators of BMI, DBP, SBP, and HDL Association with All Stroke EOS and LOS.
Subtype-adjusted MR analyses
To evaluate whether the stronger associations of BMI, DBP, SBP, and HDL with EOS could be attributed to differences in stroke subtypes between EOS and LOS, we performed TOAST subtype-adjusted MR analyses of these risk factors. We reasoned that if the EOS vs. LOS differences in risk factor associations were attributable wholly to differences in the distribution of stroke subtypes between EOS and LOS, then there would be no difference in risk factor associations within stroke subtypes. These analyses indicated that the stronger association of BMI with EOS was diminished and no longer statistically significant after accounting for differing subtypes (p = 0.33; Table S14). However, one caveat with these analyses is that stroke subtypes were available for only 81% of the stroke cases, thus diminishing power to detect differences.
Discussion
The contribution of conventional stroke risk factors to the development of ischemic stroke has been established previously through prospective epidemiologic studies19,20 and the causal effects of these risk factors have been estimated through prior MR analyses.21 The novel contribution of our study is our use of age-specific weights to evaluate the impact of these risk factors on EOS and LOS separately, where we show that BMI, blood pressure, T2D, and HDL, but not total cholesterol, LDL, and smoking, are significantly associated with EOS, while, in contrast, only SBP is significantly associated with LOS. Our analysis of effect size differences further revealed that the causal effect of BMI on stroke was significantly stronger for EOS than for LOS. While the differences in effect sizes were not statistically significant, the causal effects of blood pressure and smoking were stronger for EOS than for LOS, while the causal effects for lipids were stronger for LOS than for EOS.
Our results are consistent with prior epidemiologic studies reporting relatively large associations of some conventional stroke risk factors with early onset stroke. For example, in a case-control study of young ischemic stroke (15-49 years old), Mitchell et al.22 found obesity to be significantly associated with an increased risk of ischemic stroke in young adults, with an odds ratio of 1.57 (1.28-1.94). In other epidemiologic studies of BMI, dominated by older onset strokes, odds ratios in the range of 1.02-1.30 have been reported.23 Similarly, observational studies have shown relatively stronger associations with smoking24,25 and hypertension in younger compared to older adults.26 The association of HDL-cholesterol levels with stroke has also been reported in at least one study to be stronger in older compared to younger individuals (age <65: OR = 0.76 (0.44-1.32), age 65-74: OR = 0.38 (0.22-0.65), and age ≥75: OR = 0.51 (0.27-0.94)).27
Our MR estimates are generally in concordance with prior Mendelian randomization studies, reviewed by Georgakis et al.,21 which found causal associations of elevated levels of blood pressure and LDL, lower levels of HDL, smoking, and type 2 diabetes with ischemic stroke. The absence of associations of some of these risk factors with LOS in our study may be related to our estimation of age-specific effects or to weak instrument bias, given that our instruments were based on genetic association analyses restricted to UK Biobank participants aged 60 and older. One notable exception is that a large MR analysis of BMI did not find evidence for a causal association with stroke.28 This partially contrasts with our own findings, where BMI was causal for EOS but was null for LOS. However, this prior study used MEGASTROKE (mean age of stroke = 67.4 yrs), whose age distribution more closely matches that of our LOS group.
The prevalence of many of the conventional stroke risk factors has steadily risen over the past decades. In a review of the National Health and Nutrition Examination Survey, Aggarwal et al. reported that between 2009 and 2020 the prevalence of hypertension among US adults aged 20-44 years rose from 9.3% to 11.5%, the prevalence of diabetes rose from 3.0% to 4.1%, and the prevalence of obesity rose from 32.7% to 40.9%.29 Concurrent with this rise in stroke risk factors among the young, stroke incidence has increased among younger adults. For example, from 1995 to 2012, the US ischemic stroke hospitalization rate increased by 41.5% and 30% for males and females, respectively, aged 35-44 years old.30 Among those hospitalized in this age group, the prevalence of traditional risk factors nearly doubled during this time period.30 Thus, the increased prevalence of stroke risk factors among the young, combined with the greater impact they have on younger adults, may partly explain the rising incidence of ischemic stroke in this age group.
As in many studies, a major limitation of our study is its restriction to individuals of European ancestry, due primarily to the relatively small contribution of non-European samples to existing genome-wide association studies of stroke risk factors and stroke. Future studies involving non-European samples are urgently needed.31 Additionally, the number of stroke cases in each subtype classification was relatively small, thus limiting the power to detect subtype-specific associations. Finally, these results should be interpreted cautiously due to the risk of survival bias. Subjects with high genetic susceptibility to elevated BP, BMI, smoking, etc. may have already died and therefore may not have been included in the outcome cohorts.32
In summary, to our knowledge, ours is the first study to assess the causal effects of conventional stroke risk factors separately on early and late onset stroke. We found BMI, blood pressure, T2D, and HDL, but not total cholesterol, LDL cholesterol, or smoking, to be causally associated with EOS, while, in contrast, only SBP was significantly associated with LOS. With the exception of total and LDL cholesterol, the causal estimates were generally stronger in EOS than LOS, although only for BMI did the difference in effect sizes achieve statistical significance. Larger studies of this issue, including non-European populations, are needed.
Supplementary Material
Acknowledgement
Please see Supplemental Materials for the acknowledgements for each contributing study.
Funding Sources
Partial funding was provided by NIH grants R01 NS100178, R01 NS105150, P30 AG028747, R01 NS114045, and U01HG011717. Kevin Nguyen was supported by a T32 AG000262 Epidemiology of Aging grant. Dr. Xu was supported by the American Heart Association (Grant 19CDA34760258).
Disclosure
Dr Xu reports grants from National Institutes of Health. Dr Worrall reports other intellectual property and compensation from American Academy of Neurology for other services. Dr Adebamowo reports grants from National Institutes of Health; and grants from American Cancer Society. Dr Kittner reports grants from National Institute of Neurological Disorders and Stroke.
Non-standard Abbreviations and Acronyms
- MR: Mendelian Randomization
- EOS: Early Onset Stroke
- LOS: Late Onset Stroke
- EOSC: Early Onset Stroke Consortium
- SiGN: Stroke Genetics Network
- TOAST: Trial of ORG 10172 in Acute Stroke Treatment
- CE: Cardioembolic Stroke
- LAA: Large Artery Atherosclerosis
- SAO: Small Artery Occlusion
- OTHER: Stroke of other determined cause
- UNDETER: Stroke of undetermined cause
- BMI: Body Mass Index
- SBP: systolic blood pressure
- DBP: diastolic blood pressure
- TCHOL: total cholesterol
- LDL: LDL cholesterol
- HDL: HDL cholesterol
- TG: triglycerides
- T2D: type 2 diabetes
- SmkInit: smoking initiation
- IVW: Inverse Variance Weighted Method
- MR-PRESSO: MR Pleiotropy Residual Sum and Outlier
Contributor Information
Kevin TK Nguyen, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States.
Huichun Xu, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States.
Brady J. Gaynor, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States
Patrick F. McArdle, Department of Medicine, School of Medicine, University of Maryland Baltimore, Baltimore, MD, United States
Timothy D. O’Connor, Department of Medicine, University of Maryland School of Medicine, Baltimore, Maryland, United States
James A. Perry, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, United States
Bradford B. Worrall, Department of Neurology, University of Virginia, Charlottesville, Virginia, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
Rainer Malik, Institute for Stroke and Dementia Research, Ludwig-Maximilians-University of Munich, Germany.
Giorgio B. Boncoraglio, Department of Neurology, Fondazione IRCCS Istituto Neurologico C. Besta, Milan, Italy
Sally N. Adebamowo, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States
Ramin Zand, Department of Neurology, College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA.
John W. Cole, Department of Neurology, University of Maryland School of Medicine, Baltimore, Maryland
Steven J. Kittner, Department of Neurology, University of Maryland School of Medicine, Baltimore, Maryland, United States
Braxton D. Mitchell, Department of Medicine, University of Maryland School of Medicine, Baltimore, Maryland, United States
References
- 1. Tsao CW, Aday AW, Almarzooq ZI, Anderson CAM, Arora P, Avery CL, Baker-Smith CM, Beaton AZ, Boehme AK, Buxton AE, et al. Heart disease and stroke statistics—2023 update: a report from the American Heart Association. Circulation. 2023;147:e93–e621.
- 2. Rohr J, Kittner S, Feeser B, Hebel JR, Whyte M-G, Weinstein A, Kanarak N, Buchholz D, Earley C, Johnson C, et al. Traditional risk factors and ischemic stroke in young adults: the Baltimore-Washington Cooperative Young Stroke Study. Arch. Neurol. 1996;53:603–607.
- 3. Aigner A, Grittner U, Rolfs A, Norrving B, Siegerink B, Busch MA. Contribution of established stroke risk factors to the burden of stroke in young adults. Stroke. 2017;48:1744–1751.
- 4. Putaala J, Metso AJ, Metso TM, Konkola N, Kraemer Y, Haapaniemi E, Kaste M, Tatlisumak T. Analysis of 1008 consecutive patients aged 15 to 49 with first-ever ischemic stroke. Stroke. 2009;40:1195–1203.
- 5. Scott CA, Li L, Rothwell PM. Diverging temporal trends in stroke incidence in younger vs older people: a systematic review and meta-analysis. JAMA Neurol. 2022;79:1036–1048.
- 6. Ebrahim S, Davey Smith G. Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology? Hum. Genet. 2008;123:15–33.
- 7. Skrivankova VW, Richmond RC, Woolf BAR, Davies NM, Swanson SA, VanderWeele TJ, Timpson NJ, Higgins JPT, Dimou N, Langenberg C, et al. Strengthening the reporting of observational studies in epidemiology using mendelian randomisation (STROBE-MR): explanation and elaboration. BMJ. 2021;375:n2233.
- 8. Jaworek T, Xu H, Gaynor BJ, Cole JW, Rannikmae K, Stanne TM, Tomppo L, Abedi V, Amouyel P, Armstrong ND, et al. Contribution of common genetic variants to risk of early-onset ischemic stroke. Neurology. 2022;99:e1738–e1754.
- 9. Pulit SL, McArdle PF, Wong Q, Malik R, Gwinn K, Achterberg S, Algra A, Amouyel P, Anderson CD, Arnett DK, et al. The NINDS Stroke Genetics Network: a genome-wide association study of ischemic stroke and its subtypes. Lancet Neurol. 2016;15:174–184.
- 10. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke. 1993;24(1):35–41.
- 11. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, et al. Next-generation genotype imputation service and methods. Nat. Genet. 2016;48:1284–1287.
- 12. Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, Groza T, Güneş O, Hall P, Hayhurst J, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2022;51:D977–D985.
- 13. Gagnon E, Daghlas I, Zagkos L, Sargurupremraj M, Georgakis MK, Anderson CD, Cronje HT, Burgess S, Arsenault BJ, Gill D. Mendelian randomization applied to neurology. Neurology. 2024;102:e209128.
- 14. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
- 15. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 2013;37:658–665.
- 16. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 2016;40:304–314.
- 17. Burgess S, Thompson SG. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur. J. Epidemiol. 2017;32:377–389.
- 18. Verbanck M, Chen C-Y, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 2018;50:693–698.
- 19. Simons LA, McCallum J, Friedlander Y, Simons J. Risk factors for ischemic stroke. Stroke. 1998;29:1341–1346.
- 20. Singer J, Gustafson D, Cummings C, Egelko A, Mlabasati J, Conigliaro A, Levine SR. Independent ischemic stroke risk factors in older Americans: a systematic review. Aging. 2019;11:3392–3407.
- 21. Georgakis MK, Gill D. Mendelian randomization studies in stroke: exploration of risk factors and drug targets with human genetic data. Stroke. 2021;52:2992–3003.
- 22. Mitchell AB, Cole JW, McArdle PF, Cheng Y-C, Ryan KA, Sparks MJ, Mitchell BD, Kittner SJ. Obesity increases risk of ischemic stroke in young adults. Stroke J. Cereb. Circ. 2015;46:1690–1692.
- 23. Horn JW, Feng T, Mørkedal B, Strand LB, Horn J, Mukamal K, Janszky I. Obesity and risk for first ischemic stroke depends on metabolic syndrome: the HUNT study. Stroke. 2021;52:3555–3561.
- 24. Markidan J, Cole JW, Cronin CA, Merino JG, Phipps MS, Wozniak MA, Kittner SJ. Smoking and risk of ischemic stroke in young men. Stroke. 2018;49:1276–1278.
- 25. Robbins AS, Manson JE, Lee I-M, Satterfield S, Hennekens CH. Cigarette smoking and stroke in a cohort of U.S. male physicians. Ann. Intern. Med. 1994;120:458–462.
- 26. Asplund K, Karvanen J, Giampaoli S, Jousilahti P, Niemelä M, Broda G, Cesana G, Dallongeville J, Ducimetriere P, Evans A, et al. Relative risks for stroke by age, sex, and population based on follow-up of 18 European populations in the MORGAM project. Stroke. 2009;40:2319–2326.
- 27.Sacco RL, Benson RT, Kargman DE, Boden-Albala B, Tuck C, Lin I-F, Cheng JF, Paik MC, Shea S, Berglund L. High-density lipoprotein cholesterol and ischemic stroke in the elderly the Northern Manhattan Stroke Study. JAMA. 2001;285:2729–2735. [DOI] [PubMed] [Google Scholar]
- 28.Marini S, Merino J, Montgomery BE, Malik R, Sudlow CL, Dichgans M, Florez JC, Rosand J, Gill D, Anderson CD. Mendelian randomization study of obesity and cerebrovascular disease. Ann. Neurol 2020;87:516–524. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Aggarwal R, Yeh RW, Joynt Maddox KE, Wadhera RK. Cardiovascular risk factor prevalence, treatment, and control in US adults aged 20 to 44 years, 2009 to March 2020. JAMA. 2023;329:899–909. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.George MG, Tong X, Bowman BA. Prevalence of cardiovascular risk factors and strokes in younger adults. JAMA Neurol. 2017;74:695–703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Adebamowo CA, Adeyemo A, Ashaye A, Akpa OM, Chikowore T, Choudhury A, Fakim YJ, Fatumo S, Hanchard N, Hauser M, et al. Polygenic risk scores for CARDINAL study. Nat. Genet 2022;54:527–530. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Smit RAJ, Trompet S, Dekkers OM, Jukema JW, le Cessie S. Survival bias in Mendelian randomization studies. Epidemiol. Camb. Mass 2019;30:813–816. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
Individual level data from SiGN, where permitted by participant consent and institutional certification, have been deposited into dbGaP. Summary level GWAS statistics from SiGN and EOSC are available on the Cerebrovascular Disease Knowledge Portal.
Study design.
This study was performed in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization (STROBE-MR) guidelines.7 We employed a two-sample MR design, with the inverse-variance weighted (IVW) method as our primary analysis, to estimate the causal effects of conventional stroke risk factors on early and late onset ischemic stroke.
Study sample for Primary Outcome.
This study uses stroke cases and controls assembled from two large GWAS consortia: the Early Onset Stroke Consortium8 (EOSC) and the Stroke Genetics Network9 (SiGN). Stroke cases in these consortia underwent brain imaging at each site to exclude diagnoses other than ischemic stroke and to assist with subtype classification. Additional screening was performed in some, but not all, studies to exclude cases believed to be due to a known monogenic cause (e.g., sickle cell disease) or to a known non-genetic cause (e.g., drug use, complications of procedures). Ischemic stroke subtyping was performed using the TOAST criteria10 in most, but not all, sites.
Consistent with criteria used in the EOSC, we defined early onset stroke for these analyses as cases with stroke onset 18-59 years, and late-onset stroke as those with age at first stroke 60 years or older. Subjects included in this report are restricted to a subset of 6,728 early-onset cases (and 33,764 controls) and 9,272 late-onset stroke cases (and 25,124 controls) who are of European ancestry and for whom individual-level genotypes were available (Table S1). The genotype data from stroke cases and controls were based on hg38 and imputed using the TOPMed reference panel on the University of Michigan Imputation Server.11
Exposure Genetic instrument selection.
An important consideration for MR analysis is that the population in which the genetic instrument is developed should be as comparable as possible to the population in which the outcome is measured. To address this issue, we developed two sets of genetic instruments, one in a population < 60 years of age and the second in a population ≥ 60 years of age. Our strategy was to use a common set of risk factor-associated variants but to weight them differently according to their population-specific effect sizes. First, we obtained summary genetic association results from large, publicly available GWAS in the GWAS Catalog12 (https://www.ebi.ac.uk/gwas/) for 9 stroke risk factors: body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol (TCHOL), LDL cholesterol (LDL), HDL cholesterol (HDL), triglycerides (TG), type 2 diabetes (T2D), and smoking initiation (SmkInit). Sample sizes for the genome-wide association analyses of each trait ranged from 339,224 to 1,232,091 (Table S2). We identified variants from each GWAS that were associated with the risk factor at genome-wide significance (p < 5 × 10⁻⁸) and selected the most significant variant at each associated locus by removing SNPs in linkage disequilibrium with the lead SNP using the clumping procedure in PLINK with parameters clump-kb = 10,000 and clump-r2 = 0.001. We assessed the strength of each SNP by calculating its F-statistic, which is a function of the proportion of variance explained by the genetic instrument and the sample size.11 The total number of SNPs obtained from each GWAS, the number of SNPs pruned at each filtering step, and the corresponding F-statistic are provided in Table S3.
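As a rough illustration of how instrument strength can be computed, the Python sketch below shows two commonly used forms of the F-statistic: one from the proportion of variance explained and the sample size, and a per-SNP approximation from summary statistics. The text does not state which form or threshold was used, so the function names and input values here are hypothetical; a conventional rule of thumb treats F > 10 as adequately strong.

```python
def f_from_r2(r2: float, n: int, k: int = 1) -> float:
    """F-statistic from variance explained (r2), sample size (n), and
    number of instruments (k): F = (r2 / k) * (n - k - 1) / (1 - r2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    return (r2 / k) * (n - k - 1) / (1.0 - r2)


def f_from_summary(beta: float, se: float) -> float:
    """Per-SNP approximation from summary statistics: F ~ (beta / se)^2."""
    return (beta / se) ** 2


# Hypothetical values, for illustration only (not taken from the study).
f_r2 = f_from_r2(r2=0.001, n=100_000)
f_sum = f_from_summary(beta=0.05, se=0.005)
print(round(f_r2, 1), round(f_sum, 1))
```

Both forms grow with sample size and with the share of exposure variance the variant explains, which is why weak variants can still pass in very large GWAS.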
Having identified the risk factor variants to include in our instruments, we then estimated age-specific effect sizes (age < 60 years and age ≥ 60 years) for each selected variant by performing an age-stratified genetic association analysis of these variants for each risk factor in the UK Biobank. This enabled us to obtain MR estimates of the causal effect of each risk factor on the outcome in populations of approximately the same age as those in which the genetic instrument-exposure effect sizes were calculated. The UK Biobank is a large ongoing prospective cohort study of 506,682 participants recruited between 2006 and 2010. Participants were aged 40–69 at the time of recruitment. Study populations were identified using BMI, DBP, SBP, HDL, LDL, TG, TCHOL, smoking status, and T2D as phenotypes based on UDI and phenocodes. See Table S4 for study population size and definition of each phenotype. Details of the UK Biobank genetic association analyses, including analysis models and genetic association results, are provided in the Supplement (Supplementary Info, Methods; Tables S5–S13). We compared effect sizes of the risk factor PRS on the risk factor between individuals < 60 years of age and individuals ≥ 60 years of age using a paired t-test. We also performed a formal comparison of effect sizes across all variants using Cochran’s Q and I².
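To make the heterogeneity statistics concrete, the sketch below computes Cochran's Q and I² for a set of effect estimates with standard errors, here applied to one variant's two age-stratified estimates. How the comparison was aggregated across variants is not described in the text, so this is an assumption-laden illustration; the function name and input values are hypothetical.

```python
def cochran_q_i2(betas, ses):
    """Return (Q, df, I2) for effect estimates with standard errors.

    Q  = sum_i w_i * (beta_i - pooled)^2, with w_i = 1 / se_i^2
    I2 = max(0, (Q - df) / Q), the share of variation beyond chance.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-variant effects in the < 60 and >= 60 strata.
q, df, i2 = cochran_q_i2(betas=[0.10, 0.20], ses=[0.01, 0.01])
print(q, df, i2)
```

Under the null of equal effects, Q is referred to a chi-square distribution with df degrees of freedom.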
Mendelian Randomization Assumptions
To minimize the potential for bias in making causal inferences, SNPs were selected to adhere to the three assumptions of valid instrumental variables (Figure 1):13 (1) the SNPs must be strongly associated with the exposure; (2) the SNPs must not be associated with measured or unmeasured confounders; and (3) the SNPs must affect the outcome only through the exposure (i.e., no horizontal pleiotropy). We used the F-statistic as a measure of the strength of the SNP-exposure association (Assumption 1) and performed sensitivity analyses (see below) to assess violations of Assumptions 2 and 3.
Figure 1:

The three MR assumptions, where Z is the IV associated with the exposure, X is the exposure, Y is the outcome, and U is a confounder. (1) Relevance assumption: the IV is strongly associated with the exposure of interest. (2) Independence assumption: there are no confounders of the association between the IV and the outcome. (3) Exclusion restriction assumption: the IV is not related to the outcome other than via the exposure.
Statistical Analyses
Following clumping and F-statistic filtering, SNP-exposure (risk factor) association statistics were extracted for the selected SNPs to create nine age-specific genetic risk score instruments for the exposures. The SNP-outcome (stroke) associations were obtained from genetic association analyses performed on the early and late onset stroke datasets from the EOSC and SiGN. The SNP-outcome associations were calculated in PLINK 2 (v2.00a3.3LM)14 for all ischemic stroke and for each TOAST subtype using logistic regression, adjusting for sex and for genetic ancestry using principal components 1 through 10.
We used the random-effects inverse-variance weighted (IVW) method15 as the primary method for computing causal estimates of the association of each exposure (risk factor) with stroke. This approach entails calculating a Wald ratio for each SNP by dividing the SNP-outcome association by the SNP-exposure association and then estimating the mean of these Wald ratios, weighting each by the inverse of its variance. We used the random-effects model to adjust for heterogeneity among Wald ratios by accounting for over-dispersion in the regression model. Odds ratios were calculated for both EOS and LOS using the final IVW estimate for the causal association of each risk factor with all ischemic stroke and with each TOAST subtype. The odds ratios for continuous risk factors correspond to the change in odds of stroke per one standard deviation change in the risk factor. To account for multiple testing across the nine risk factors, we considered a P-value < 0.0055 (P < 0.05/9) to be statistically significant.
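The IVW calculation described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the MendelianRandomization R package code used in the study: Wald ratio standard errors use a first-order approximation (SNP-outcome standard error divided by the absolute SNP-exposure effect), and the random-effects adjustment is the multiplicative over-dispersion form, in which the standard error is inflated only when the residual heterogeneity exceeds 1. All input values are hypothetical.

```python
import math


def ivw_random_effects(bx, by, se_by):
    """Inverse-variance weighted mean of per-SNP Wald ratios.

    bx:    SNP-exposure effects
    by:    SNP-outcome effects (log odds)
    se_by: standard errors of the SNP-outcome effects
    """
    ratios = [y / x for x, y in zip(bx, by)]
    ratio_ses = [s / abs(x) for x, s in zip(bx, se_by)]  # first-order SE
    weights = [1.0 / s ** 2 for s in ratio_ses]
    w_sum = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / w_sum
    # Over-dispersion: Cochran's Q of the ratios divided by its df.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    phi = q / (len(ratios) - 1)
    se = math.sqrt(1.0 / w_sum) * math.sqrt(max(1.0, phi))
    return estimate, se


# Hypothetical summary statistics for three SNPs.
est, se = ivw_random_effects(
    bx=[0.1, 0.2, 0.3], by=[0.02, 0.04, 0.06], se_by=[0.01, 0.01, 0.01]
)
odds_ratio = math.exp(est)  # odds ratio per unit (here, per SD) of exposure
ci_low, ci_high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
bonferroni_alpha = 0.05 / 9  # nine risk factors tested
print(odds_ratio, ci_low, ci_high, bonferroni_alpha)
```

Exponentiating the pooled estimate converts the log odds per unit of exposure into the odds ratio reported in the results.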
To evaluate whether the IVW estimates comply with the independence and exclusion restriction assumptions, we performed several sensitivity analyses. As a measure of pleiotropy, we assessed heterogeneity among the individual Wald ratios from the initial estimates using I² and Cochran’s Q, as well as the intercept from the MR-Egger method. We also performed the MR analysis using other methods (simple median16, weighted median16, and MR-Egger17) that are more robust than the IVW approach to deviations from the MR assumptions. Although these methods have less power, estimates from these analyses that are directionally discordant with the IVW estimates could indicate the presence of pleiotropy. Cook’s distance and the MR Pleiotropy Residual Sum and Outlier (MR-PRESSO) method18 were used to identify and remove pleiotropic outliers. SNPs with Cook’s distance > (4/number of SNPs) were tagged as outliers and filtered out because of their disproportionate influence on the MR models. MR-PRESSO uses a leave-one-out methodology to detect pleiotropic SNPs and quantifies the distortion they introduce in the causal estimate. To ensure that the same instrumental variables were used in our EOS and LOS estimates, we removed non-overlapping SNPs before recalculating the IVW estimate. All statistical analyses were done in R (4.0.3; The R Foundation for Statistical Computing) using the MendelianRandomization (0.9.0) and MR-PRESSO (1.0) packages.
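The Cook's distance filter can be illustrated with a short Python sketch. It assumes the underlying model is a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin, with weights equal to the inverse variance of the SNP-outcome effects; the text does not specify the regression model used for this step, so that choice, the function name, and the input values are all hypothetical.

```python
def cooks_distance_outliers(bx, by, se_by):
    """Flag SNPs whose Cook's distance exceeds 4 / (number of SNPs).

    Model (assumed): by = slope * bx, no intercept, weights 1 / se_by^2.
    """
    n = len(bx)
    w = [1.0 / s ** 2 for s in se_by]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    slope = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    resid = [y - slope * x for x, y in zip(bx, by)]
    # Weighted residual variance; one fitted parameter (the slope).
    s2 = sum(wi * e * e for wi, e in zip(w, resid)) / (n - 1)
    if s2 == 0.0:
        return [0.0] * n, []  # perfect fit: nothing to flag
    distances = []
    for wi, x, e in zip(w, bx, resid):
        h = wi * x * x / sxx  # leverage of this SNP
        distances.append((wi * e * e / s2) * h / (1.0 - h) ** 2)
    cutoff = 4.0 / n
    flagged = [i for i, d in enumerate(distances) if d > cutoff]
    return distances, flagged


# Hypothetical data: the last SNP has a much larger Wald ratio than the rest.
distances, flagged = cooks_distance_outliers(
    bx=[0.1, 0.2, 0.3, 0.4, 0.5],
    by=[0.02, 0.04, 0.06, 0.08, 0.40],
    se_by=[0.01] * 5,
)
print(flagged)
```

Flagged indices would then be dropped from both the EOS and LOS instruments before re-estimating, so that the two analyses use the same SNP set.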
Homogeneity Test of Causal Estimates in EOS and LOS
To compare the causal estimates between the early and late onset stroke groups, we performed a t-test, calculated as the difference between the betas divided by the standard error of the difference. For this hypothesis, we accounted for multiple testing by adjusting for the number of significantly associated risk factors tested.
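One common way to compare two estimates is sketched below in Python. It assumes the EOS and LOS estimates are independent, so the variance of their difference is the sum of the two variances, and it uses a normal reference distribution for the two-sided P-value rather than a t distribution. The function name and numbers are hypothetical and are not the study's results.

```python
import math


def compare_estimates(beta_a, se_a, beta_b, se_b):
    """Test statistic and two-sided P-value for beta_a - beta_b.

    Assumes independent estimates: se_diff = sqrt(se_a^2 + se_b^2).
    """
    diff = beta_a - beta_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


# Hypothetical log odds ratios and standard errors for two groups.
z, p = compare_estimates(beta_a=0.25, se_a=0.05, beta_b=0.05, se_b=0.06)
print(z, p)
```

Dividing by the standard error, not the variance, is what puts the statistic on the scale of a standard normal deviate.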
We also evaluated whether any differences in effect sizes of risk factors between early and late onset stroke could be driven by differences in the distribution of stroke subtypes between the two groups. For this analysis, we estimated the causal effects for each subtype and then computed the mean of the differences in effect sizes between early and late onset across the five different subtypes. We then computed the variance of the mean difference and performed a t-test to evaluate the significance of the difference in causal effects while accounting for subtype differences between EOS and LOS.
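The subtype-averaged comparison can be sketched as follows. The text does not say how the variance of the mean difference was computed, so this Python example assumes independent subtype-specific estimates, giving Var(mean) = Σ(se_EOS² + se_LOS²) / k² for k subtypes, and uses a normal reference distribution. The function name and input values are hypothetical.

```python
import math


def mean_subtype_difference(eos, los):
    """Mean EOS-minus-LOS difference in causal effect across subtypes.

    eos, los: lists of (beta, se) tuples, one per subtype, in the same order.
    Returns (mean_diff, var_mean, z, p) under an independence assumption.
    """
    k = len(eos)
    diffs = [a[0] - b[0] for a, b in zip(eos, los)]
    mean_diff = sum(diffs) / k
    var_mean = sum(a[1] ** 2 + b[1] ** 2 for a, b in zip(eos, los)) / k ** 2
    z = mean_diff / math.sqrt(var_mean)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return mean_diff, var_mean, z, p


# Hypothetical estimates for five subtypes (same values repeated for brevity).
eos = [(0.2, 0.1)] * 5
los = [(0.1, 0.1)] * 5
mean_diff, var_mean, z, p = mean_subtype_difference(eos, los)
print(mean_diff, var_mean, z, p)
```

Because each subtype contributes equally to the mean, a difference driven by one group having more cases of a particular subtype is averaged out rather than amplified.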
Ethical approval
The genetic association analyses, which involved deidentified data obtained from the UK Biobank Resource under Application Number 49852, underwent ethical oversight, including the determination by the University of Maryland, Baltimore Institutional Review Board that the study is not human research (IRB #: HF-00088022).
