Author manuscript; available in PMC: 2024 Aug 27.
Published in final edited form as: Stroke Vasc Neurol. 2023 Aug 28:svn-2023-002428. doi: 10.1136/svn-2023-002428

Associations of polygenic risk scores with risks of stroke and its subtypes in Chinese

Songchun Yang 1,2, Zhijia Sun 1, Dong Sun 1, Canqing Yu 1,3,4, Yu Guo 5, Dianjianyi Sun 1,3,4, Yuanjie Pang 1,4, Pei Pei 3, Ling Yang 6,7, Iona Y Millwood 6,7, Robin G Walters 6,7, Yiping Chen 6,7, Huaidong Du 6,7, Yan Lu 8, Sushila Burgess 7, Daniel Avery 7, Robert Clarke 7, Junshi Chen 9, Zhengming Chen 7, Liming Li 1,3,4, Jun Lv 1,3,4,10,*, on behalf of the China Kadoorie Biobank Collaborative Group
PMCID: PMC7616400  EMSID: EMS184994  PMID: 37640499

Abstract

Background and purpose

Previous studies, mostly focusing on the European population, have reported polygenic risk scores (PRSs) might achieve risk stratification of stroke. We aimed to examine the association strengths of PRSs with risks of stroke and its subtypes in the Chinese population.

Methods

Participants with genome-wide genotypic data in China Kadoorie Biobank were split into a potential training set (n=22,191) and a population-based testing set (n=72,150). Four previously developed PRSs were included, and new PRSs for stroke and its subtypes were developed. The PRSs showing the strongest association with risks of stroke or its subtypes in the training set were further evaluated in the testing set. Cox proportional hazards regression models were used to estimate the association strengths of different PRSs with risks of stroke and its subtypes (ischemic stroke [IS], intracerebral hemorrhage [ICH], and subarachnoid hemorrhage [SAH]).

Results

In the testing set, during 872,919 person-years of follow-up, 8514 incident stroke events were documented. The PRSs of any stroke (AS) and IS were both positively associated with risks of AS, IS, and ICH (P<0.05). The HR per standard deviation increment (HRSD) of PRSAS was 1.10 (95% CI: 1.07-1.12), 1.10 (1.07-1.12), and 1.13 (1.07-1.20) for AS, IS, and ICH, respectively; the corresponding HRSD of PRSIS was 1.08 (1.06-1.11), 1.08 (1.06-1.11), and 1.09 (1.03-1.15). PRSICH was positively associated with the risk of ICH (HRSD = 1.07, 95% CI: 1.01-1.14). PRSSAH was not associated with risks of stroke or its subtypes. The addition of current PRSs offered little to no improvement in stroke risk prediction and risk stratification.

Conclusions

In this Chinese population, the association strengths of current PRSs with risks of stroke and its subtypes were moderate, suggesting a limited value for improving risk prediction over traditional risk factors in the context of current GWAS underrepresenting the East Asian population.

Keywords: Stroke, Polygenic Risk Score, Chinese Population

Introduction

Stroke is one of the leading causes of death and disease burden globally.[1] Stroke includes two main subtypes, ischemic stroke (IS) and hemorrhagic stroke (HS). The latter can be further divided into intracerebral hemorrhage (ICH) and subarachnoid hemorrhage (SAH). With the accumulation of genomic data worldwide, the genetic background of stroke and its subtypes is gradually being revealed. The polygenic risk score (PRS), a method that combines small genetic effects across the whole genome, has been increasingly used in stroke research. Several studies based on European populations have developed PRSs for any stroke (AS) or IS and suggested their potential to improve risk prediction and risk stratification.[2–9] The incidence of stroke in China, especially ICH, is higher than in Western countries.[1] Recently, a PRS for AS was developed based on the Chinese population and showed similar association strength in predicting the risk of IS and HS.[10] However, IS and HS might have different etiological mechanisms.[11–13] Different stroke subtypes also have their specific genetic loci.[14] No study has specifically developed PRSs for subtypes of stroke in the Chinese population.

The present study was based on a sub-cohort with genomic data from the China Kadoorie Biobank (CKB). We aimed to examine the association strengths of PRSs with risks of stroke and its subtypes in the Chinese population.

Methods

Participants

CKB is an ongoing prospective study with 512,724 participants aged 30 to 79 enrolled from five urban and five rural regions in China between 2004 and 2008. Details of the study have been described elsewhere.[15] CKB had ethical approvals from the Ethical Review Committee of the Chinese Center for Disease Control and Prevention (Beijing, China) (approval notice: 005/2004) and the Oxford Tropical Research Ethics Committee, University of Oxford (UK) (reference: 025–04). All participants provided a written informed consent form.

Among all CKB participants, there are 100,639 participants with genome-wide genotypic data. Of them, 24,657 participants were selected based on a case-control design nested within the cohort with the primary aim of studying CVD (“case-control samples”), which formed four matched-case-control training sets (figure 1A, supplemental methods, supplemental table 1, supplemental table 2). The other 75,982 participants were randomly selected from the entire CKB cohort (“population-based samples”); after excluding participants with self-reported coronary artery disease or stroke or transient ischemic attack at baseline (n=3,832), the remaining participants were used as a “testing set” (n=72,150) (figure 1A, supplemental methods).

Figure 1. Overview of the present study.

(A) Flowchart for the study population; (B) Study design. The current study can be divided into four parts. (i) Validation of previous PRSs. (ii) Development of new PRSs. (iii) Identification of the optimal PRS for each outcome. (iv) Validation and evaluation of the optimal PRS for each outcome.

Abbreviations: AS, any stroke; CAD, coronary artery disease; CKB, China Kadoorie Biobank; C+T, clumping & thresholding; GWAS, genome-wide association study; ICH, intracerebral hemorrhage; IS, ischemic stroke; PRS, polygenic risk score; SAH, subarachnoid hemorrhage; SSF, summary statistics file; TIA, transient ischemic attack.

a Participants that had a first- or second-degree relative in the sample (kinship coefficient φ > 0.125) were removed by using PLINK 1.9.

b Please refer to supplemental methods for detailed procedures of case-control matching.

c See supplemental methods and supplemental table 3 for details.

d See supplemental methods and supplemental table 4 for details.

Study design

The current study can be divided into four parts (figure 1B). (1) Validation of previous PRSs. Four previously reported stroke-related PRSs were selected for validation.[2,4,5,10] (2) Development of new PRSs. Clumping & thresholding (“C+T”) and LDpred[16] were used to develop new PRSs for stroke and its subtypes based on two genome-wide association studies with large sample sizes.[14,17] (3) Identification of the optimal PRS for each outcome. The performances of different PRSs in predicting each outcome were compared in the corresponding training sets. (4) Validation and evaluation of the optimal PRS for each outcome. We prospectively examined the associations between optimal PRSs and risks of stroke and its subtypes. We evaluated the impact of PRSs on the risk prediction improvement by adding the optimal PRS to traditional risk prediction models in the testing set.

Assessment of traditional stroke risk factors

The baseline questionnaire collected information on sociodemographic characteristics, lifestyle behaviors, dietary habits, and personal and family medical history.[15] Traditional stroke risk factors considered in the present study included sex, age, systolic and diastolic blood pressure (SBP and DBP), smoking, body mass index (BMI), waist circumference, hypertension, diabetes, and family history of stroke. Details on the collection and definition of these variables have been described in our previous work.[18,19]

Genetic data

At baseline, a 10 mL random blood sample was collected from each participant. Genotyping and imputation in this study were centrally conducted, with details provided in our previous study.[19,20] Briefly, two custom-designed single nucleotide polymorphism (SNP) arrays (Affymetrix Axiom® CKB array) were used for genotyping. Imputation was performed based on haplotypes derived from the 1000 Genomes Project Phase 3. There were 9.54 million genetic variants with high reliability (supplemental figure 1).

Polygenic risk scores

We searched the PGS Catalog,[21] PubMed, and Embase. Four previous stroke PRSs were selected for validation analyses (supplemental methods, supplemental table 3).[2,4,5,10] Meanwhile, we ran gwasfilter to filter genome-wide association studies (GWAS) from the GWAS Catalog (https://www.ebi.ac.uk/gwas/).[22,23] Based on ethnicity, sample size, and accessibility of the summary statistics file (SSF), we finally included 1 AS SSF, 2 SAH SSFs, 2 ICH SSFs, and 2 IS SSFs from two large-scale GWAS (supplemental methods, supplemental table 4).[14,17] Similar to our latest research,[19] we developed new PRSs by using two methods: clumping and thresholding (“C+T”) and LDpred[16] (supplemental methods).
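The scoring step shared by both approaches can be illustrated with a minimal sketch. It computes a PRS as a weighted sum of effect-allele dosages and applies only the P-value thresholding part of C+T; LD clumping and the Bayesian shrinkage used by LDpred are deliberately omitted. The function names, the toy arrays, and the threshold value are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np

def threshold_weights(betas, pvalues, p_threshold):
    """Keep GWAS effect sizes for variants with P below the threshold; zero out the rest."""
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    return np.where(pvalues < p_threshold, betas, 0.0)

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (rows: participants, columns: variants)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def standardize(scores):
    """Rescale scores to zero mean and unit SD so effects can be read per SD."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std(ddof=0)

# Toy data (hypothetical): 4 participants x 3 variants, dosages coded 0/1/2.
dosages = np.array([[0, 1, 2],
                    [1, 1, 0],
                    [2, 0, 1],
                    [0, 2, 2]])
betas = np.array([0.10, -0.05, 0.20])    # log odds ratios from a GWAS summary file
pvalues = np.array([1e-8, 0.30, 1e-4])   # GWAS P-values for the same variants

weights = threshold_weights(betas, pvalues, p_threshold=0.001)
raw_prs = polygenic_score(dosages, weights)
prs_sd = standardize(raw_prs)
```

Standardizing the score is what allows an association to be reported per SD increment, as in the ORSD and HRSD estimates of this study.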

Ascertainment of stroke outcomes

All participants were followed up for morbidity and mortality since their baseline enrollment. Incident events were identified by linking with local disease and death registries and the national health insurance database and supplemented by active follow-up.[15] In the testing set, only 653 (0.91%) were lost to follow-up before censoring on December 31, 2018. Trained staff blinded to baseline information coded all events using the International Classification of Diseases, Tenth Revision (ICD-10). Incident stroke events during the follow-up were defined as I60-I64, including SAH (I60), ICH (I61), other nontraumatic intracranial hemorrhage (I62), IS (I63), and unspecified stroke (I64). In the testing set, the events coded as I62 and I64 accounted for only 0.9% (n=76) and 3.5% (n=302) of all incident stroke events, respectively.

Since 2014, medical records of incident stroke cases have been retrieved and reviewed by qualified cardiovascular specialists blinded to baseline information. According to a previous study,[24] by October 2018, the reporting accuracy was 91.7%, 90.4%, and 82.7% for IS, ICH, and SAH, respectively; the corresponding diagnostic accuracy was 93.1% (including silent lacunar infarction), 98.2%, and 98.1%.

Identification of the optimal PRS in the training set

In each training set, we used the conditional logistic regression model to measure the association of each PRS with the risk of the corresponding stroke outcome, stratified by the case-control pair, with the top 10 principal components of ancestry (PCA) and array versions as the covariates. We defined the optimal PRS as the PRS with the highest odds ratio (OR) per standard deviation (SD), as our previous study did.[19]
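For 1:1 matched pairs, the conditional likelihood depends only on the case-minus-control difference in covariates, which makes the estimation easy to sketch. The example below is an illustration under simplifying assumptions, not the study's code: it fits the model by Newton-Raphson on simulated pairs with the PRS as the only covariate, whereas the actual analysis also adjusted for the top 10 principal components and array version. The function name, the simulated data, and the assumed true OR of 1.18 per SD are all hypothetical.

```python
import numpy as np

def fit_matched_pairs(x_case, x_control, n_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs via Newton-Raphson.

    Each pair contributes 1 / (1 + exp(-d @ beta)) to the conditional likelihood,
    where d is the case-minus-control covariate difference.
    Returns the coefficient vector and its standard errors.
    """
    d = np.asarray(x_case, dtype=float) - np.asarray(x_control, dtype=float)
    if d.ndim == 1:
        d = d[:, None]
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-d @ beta))                 # per-pair conditional probability
        gradient = d.T @ (1.0 - p)                          # score vector
        information = (d * (p * (1.0 - p))[:, None]).T @ d  # observed information
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information at the final estimate for the standard errors.
    p = 1.0 / (1.0 + np.exp(-d @ beta))
    information = (d * (p * (1.0 - p))[:, None]).T @ d
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se

# Simulated matched pairs (hypothetical): within each pair, the member with the
# higher standardized PRS is more likely to be the case.
rng = np.random.default_rng(0)
n_pairs = 5000
true_beta = np.log(1.18)
x1 = rng.standard_normal(n_pairs)
x2 = rng.standard_normal(n_pairs)
first_is_case = rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-true_beta * (x1 - x2)))
prs_case = np.where(first_is_case, x1, x2)
prs_control = np.where(first_is_case, x2, x1)

beta, se = fit_matched_pairs(prs_case, prs_control)
or_per_sd = np.exp(beta[0])                                   # OR per SD of the PRS
ci = np.exp(beta[0] + np.array([-1.96, 1.96]) * se[0])        # Wald 95% CI
```

Because the PRS is standardized, the exponentiated coefficient is directly the OR per SD used to rank candidate scores.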

Validation and evaluation of the optimal PRS in the testing set

In the testing set, we used the Cox regression model to measure the association of optimal PRSs with risks of stroke and stroke subtypes. The model was stratified by sex and ten study regions, with age as the time scale and adjusting for the top 10 PCA and array versions. We further adjusted for SBP, BMI, and family history of stroke in sensitivity analyses. We evaluated the proportional hazards assumptions by examining Schoenfeld residuals. Either non-existent or minimal deviations were observed. In subgroup analyses, the tests for multiplicative interaction were performed using likelihood ratio tests by comparing models with and without cross-product terms between the stratifying variable and PRS.
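In equation form, the model described above can be written roughly as follows. The notation is ours and is only a sketch: s indexes the sex-by-region strata, a is attained age (the time scale), PRS is the standardized score, and z collects the top 10 principal components and the array version.

```latex
h_{s}\left(a \mid \mathrm{PRS}, \mathbf{z}\right)
  = h_{0s}(a)\,\exp\!\left(\beta\,\mathrm{PRS} + \boldsymbol{\gamma}^{\top}\mathbf{z}\right),
\qquad
\mathrm{HR}_{\mathrm{SD}} = e^{\beta},
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})\right)
```

Each stratum has its own baseline hazard h_{0s}(a), while the coefficient β is shared across strata; the multiplicative interaction tests add a cross-product term between the PRS and the stratifying variable to the linear predictor.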

To evaluate the impact of PRS on risk prediction improvement, we defined the “CKB-CVD models” as the traditional risk prediction models, as our previous study did.[19] The “CKB-CVD models” distinguish risks of ischemic stroke and hemorrhagic stroke and have good discrimination without relying on blood lipids.[18] We added the PRS to traditional models to get a “PRS-enhanced model”. We assessed the discrimination performance by using Harrell’s C.[25] We used the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) to evaluate model reclassification before and after the addition of PRS.[26]
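The two kinds of metric can be illustrated with a small sketch. The code below computes Harrell's C by direct pair counting (ignoring tied event times) and a categorical NRI at a single risk threshold for a binary outcome (ignoring censoring). These are simplified textbook versions, not the Stata routines used in the study, and the function names and toy risks are hypothetical.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C: share of comparable pairs in which the subject who had the
    event earlier also has the higher predicted risk (risk ties count as 0.5)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue                      # censored subjects cannot anchor a pair
        later = time > time[i]            # subjects known to outlast subject i
        comparable += later.sum()
        concordant += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return concordant / comparable

def categorical_nri(event, risk_old, risk_new, threshold=0.10):
    """Two-category NRI: net upward reclassification among events plus
    net downward reclassification among non-events."""
    event = np.asarray(event, dtype=bool)
    old_high = np.asarray(risk_old, dtype=float) >= threshold
    new_high = np.asarray(risk_new, dtype=float) >= threshold
    up = new_high & ~old_high
    down = old_high & ~new_high
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return nri_events + nri_nonevents

# Toy data (hypothetical): follow-up time in years, event indicator, and predicted
# risks from a traditional model and from a PRS-enhanced model.
time = [2, 4, 6, 8, 10]
event = [1, 1, 0, 1, 0]
risk_old = [0.30, 0.08, 0.12, 0.09, 0.05]
risk_new = [0.32, 0.12, 0.08, 0.11, 0.04]

c_old = harrell_c(time, event, risk_old)               # 0.75
c_new = harrell_c(time, event, risk_new)               # 1.0
nri = categorical_nri(event, risk_old, risk_new)       # 2/3 + 1/2
```

In this toy example the change in Harrell's C is large by construction; changes of the order of 0.001 correspond to only a tiny fraction of comparable pairs being re-ordered.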

The study adhered to the Polygenic Risk Score Reporting Standards (PRS-RS) and STROBE statement (Strengthening the reporting of observational studies in epidemiology) for cohort studies simultaneously (Supplemental file 2).[27,28] Analyses were done with Stata (V17.0, StataCorp) and R (V4.0.3). All statistical tests were two-sided with α = 0.05.

Results

Selection of the optimal PRSs in the training sets

In this study, four 1:1 matched training sets were defined to identify the optimal PRS for AS (7412 pairs), IS (3844 pairs), ICH (4296 pairs), and SAH (359 pairs) (figure 1, supplemental methods). Among the training sets, 72.7%, 61.6%, 77.9%, and 63.8% of the participants were from rural areas in China; 51.9%, 50.5%, 53.4%, and 38.4% of the participants were men, respectively. Among the cases, the median age of disease onset (25-75th percentile) was 65.3 (57.0-72.0), 64.1 (56.1-70.6), 65.9 (57.7-73.0), and 61.0 (53.8-69.2) years, respectively. In all training sets, the proportion of the control group using the first version of the SNP array was lower than that of the case group (P<0.001) (supplemental table 2). The performance of the PRSs for AS and IS developed in previous studies was not better than that of the PRSs newly developed in the present study (table 1, supplemental table 5). The optimal PRSs for AS and ICH came from the LDpred method, and the optimal PRSs for IS and SAH came from the C+T method (table 1). The ORSD (95% CI) of the optimal PRSs were 1.14 (1.10-1.18) for AS, 1.18 (1.13-1.24) for IS, 1.10 (1.05-1.15) for ICH, and 1.25 (1.06-1.47) for SAH (table 1, supplemental table 5).

Table 1. The optimal PRSs associated with risks of stroke and its subtypes in the training sets.

| Outcomes | Method | PRS source^a | Number of variants | ORSD (95% CI) | P-value | Note |
|---|---|---|---|---|---|---|
| Any stroke (N=7412 pairs) | Previous study | PGS002259 | 448 | 1.13 (1.09-1.16) | 1.44×10^-11 | |
| | C+T | GCST005838 (P=1×10^-6, r2=0) | 38 | 1.11 (1.07-1.14) | 1.90×10^-9 | |
| | LDpred | GCST005838 (ρ=0.01, Ref=1KGP-EAS) | 1,017,531 | 1.14 (1.10-1.18) | 3.38×10^-14 | Optimal |
| Ischemic stroke (N=3844 pairs) | Previous study | PGS000039 | 1,563,569 | 1.07 (1.01-1.12) | 0.012 | |
| | C+T | GCST90018864 (P=0.02, r2=0.8) | 32,158 | 1.18 (1.13-1.24) | 3.55×10^[illegible] | Optimal |
| | LDpred | GCST90018864 (ρ=0.01, Ref=1KGP-EUR) | 1,017,672 | 1.17 (1.11-1.23) | 1.46×10^-9 | |
| Intracerebral hemorrhage (N=4296 pairs) | C+T | GCST90018870 (P=0.001, r2=0.2) | 1326 | 1.09 (1.04-1.14) | 1.37×10^-4 | |
| | LDpred | GCST90018870 (ρ=0.1, Ref=1KGP-EUR) | 1,017,664 | 1.10 (1.05-1.15) | 3.09×10^[illegible] | Optimal |
| Subarachnoid hemorrhage (N=359 pairs) | C+T | GCST90018703 (P=0.4, r2=0) | 7899 | 1.25 (1.06-1.47) | 9.21×10^-3 | Optimal |
| | LDpred | GCST90018923 (ρ=0.01, Ref=1KGP-EUR) | 1,017,665 | 1.15 (0.98-1.35) | 0.096 | |

Abbreviations: 1KGP, 1000 Genomes Project (Phase 3); CI, confidence interval; C+T, clumping & thresholding; EAS, East Asian; EUR, European; OR, odds ratio; PRS, polygenic risk score; Ref, reference population; SD, standard deviation.

The current table only displays the optimal PRS obtained from different strategies (Previous study, C+T, and LDpred) for each disease outcome. The detailed results of all PRSs can be found in supplemental table 5.

a “PGS###” indicates the index in the PGS Catalog. “GCST###” indicates the index in the GWAS Catalog. The information in brackets is the parameter used for developing the PRS.

Associations of PRSs with stroke and its subtypes in the testing set

The testing set included 72,150 Chinese participants, of which 59.8% were women. The median age was 50.6 years in women and 51.9 years in men. During 872,919 person-years of follow-up (over 12 years on average), 8514 incident stroke events were documented, including 7507 IS, 1193 ICH, and 132 SAH (table 2). The correlations among the optimal PRSs were weak (all correlation coefficients < 0.2) (supplemental figure 2).

Table 2. Characteristics of the testing set.

| Characteristic | Women | Men |
|---|---|---|
| Number of participants | 43,170 | 28,980 |
| Baseline characteristics | | |
| Age, years | 50.6 (42.5-58.3) | 51.9 (43.2-60.3) |
| Rural areas | 22,449 (52.0) | 15,772 (54.4) |
| Array 1 | 5,948 (13.8) | 4,503 (15.5) |
| Primary school and below | 23,605 (54.7) | 11,882 (41.0) |
| Daily smokers | 915 (2.1) | 16,317 (56.3) |
| Body mass index, kg/m2 | 23.6 (21.4-26.0) | 23.3 (21.1-25.7) |
| Waist circumference, cm | 78.0 (72.0-84.5) | 81.5 (74.5-88.5) |
| Hypertension | 14,062 (32.6) | 10,653 (36.8) |
| Diabetes | 2,477 (5.7) | 1,553 (5.4) |
| Family history of stroke | 7,619 (17.6) | 5,075 (17.5) |
| Follow-up | | |
| Follow-up time, years | 12.6 (11.7-13.4) | 12.4 (11.4-13.3) |
| Total person-years^a | 529,498 | 343,421 |
| Incident events^b | | |
| Any stroke | 4763 (11.0) | 3751 (12.9) |
| Ischemic stroke | 4254 (9.9) | 3253 (11.2) |
| Intracerebral hemorrhage | 600 (1.4) | 593 (2.0) |
| Subarachnoid hemorrhage | 87 (0.2) | 45 (0.2) |

Data are presented as n (%) or median (25–75th percentile) unless otherwise specified.

a Person-years were calculated as the time from the baseline date to the first of the following: death, loss to follow-up, or the global censoring date (December 31, 2018).

b Only the first event was counted.

The PRSAS and PRSIS were both positively associated with risks of AS, IS, and ICH (P<0.05). The HRSD (95% CI) of PRSAS were 1.10 (1.07-1.12), 1.10 (1.07-1.12), and 1.13 (1.07-1.20) for AS, IS, and ICH, respectively. The corresponding HRSD (95% CI) of PRSIS were 1.08 (1.06-1.11), 1.08 (1.06-1.11), and 1.09 (1.03-1.15) (figure 2, supplemental table 6). PRSICH was positively associated with the risk of ICH in the whole testing set (HRSD= 1.07), though it was not statistically significant in women (P for sex interaction = 0.056) (figure 2C). PRSSAH was not associated with risks of any outcomes (figure 2). A strong association of PRSAS with the risk of SAH (HRSD = 1.38, 95% CI: 1.03-1.87) was observed in men but not in women (P for sex interaction = 0.055) (figure 2D).

Figure 2. Associations of PRSs with risks of stroke and its subtypes.

(A) Any stroke; (B) Ischemic stroke; (C) Intracerebral hemorrhage; (D) Subarachnoid hemorrhage.

Abbreviations: AS, any stroke; CI, confidence interval; HR, hazard ratio; ICH, intracerebral hemorrhage; IS, ischemic stroke; PRS, polygenic risk score; SAH, subarachnoid hemorrhage.

The PRSs reported here are the optimal PRSs for stroke and its subtypes in the training sets (see table 1), which were standardized (zero mean, unit standard deviation) in the testing set. Cox models were stratified by sex and ten study regions and adjusted for the top 10 principal components of ancestry and array versions, with age as the time scale. The number above the closed square represents the HR. The number of stroke events in women and men has been reported in table 2. The vertical lines indicate 95% CIs.

In sensitivity analyses, the associations of PRSs with risks of stroke and its subtypes did not change significantly after additional adjustment for SBP, body mass index, and family history of stroke (supplemental table 6). In subgroup analyses, there was no strong evidence supporting a different association strength across subgroups for IS and ICH after considering multiple testing (P for interaction > 0.05/8) (supplemental figure 3, supplemental figure 4).

Addition of the optimal PRS to traditional risk prediction models

Based on the traditional models defined in this study, the addition of the PRS did not improve, or only slightly improved, the discrimination performance of the models. For IS, the addition of PRSAS increased Harrell’s C by 0.0010 in men (P = 0.002). For hemorrhagic stroke, the addition of PRSs did not influence Harrell’s C significantly (P > 0.05) (figure 3). The addition of the PRS offered little to no improvement in stroke risk stratification. For example, the categorical NRIs at the 10% high-risk threshold for ischemic and hemorrhagic stroke were not significant in either sex (P > 0.05) (supplemental table 7).

Figure 3. C statistics evaluating the performance of PRS.

CIs: Confidence intervals; CKB: China Kadoorie Biobank; CVD: Cardiovascular disease; ICD: International Classification of Disease; PRS: Polygenic risk score.

The traditional risk prediction models (traditional models) were defined as sex-specific Cox models stratified by 10 study regions, with time on study as the time scale, including models for ischemic stroke (ICD-10: I63) and models for hemorrhagic stroke (ICD-10: I60-I62).[18] Predictors included in traditional models were the same as the “CKB-CVD models”, including age, systolic and diastolic blood pressure, use of anti-hypertensives, current daily smoking, self-reported diabetes, and waist circumference. Interactions between age and the other six predictors were also included. The 95% CIs of Harrell’s C and Harrell’s C changes were calculated by 100 bootstrap replications using the BCa method in Stata.

Discussion

In this study, based on the largest biobank in the Chinese population, we observed only moderate associations between PRSs and risks of stroke and its subtypes, with an HRSD of about 1.10. The addition of current PRSs offered little to no improvement in stroke risk prediction and risk stratification. We also found that the PRSs developed from GWAS summary statistics of IS were positively associated with the risk of ICH.

In the present study, the associations of PRSs with risks of stroke and its subtypes were moderate, suggesting a limited value for improving risk prediction over traditional risk factors. The HRSD for PRS was usually greater than 1.20 in previous studies of the general population. A PRS for IS (PGS000039) that was developed with the metaGRS method and combined PRSs of 5 stroke subtypes and 14 stroke-related traits had an HRSD of 1.26 (95% CI: 1.22-1.31) in the European population.[5] Another PRS for stroke (PGS002259) was also developed using the metaGRS method in a Chinese population, with the HRSD for stroke being 1.28 (95% CI: 1.21-1.36).[10] However, these two PRSs showed much weaker associations with the risk of stroke or IS in the present study than in previous studies. Since both PRSs were developed using the elastic-net logistic regression, a machine learning approach, the potential overfitting may undermine their generalization performance.

The incidence rate of ICH is much higher in Chinese than in European populations. However, non-European populations are under-represented in GWAS, which serve as the basis for PRS development. The largest GWAS for ICH included only 3400 ICH cases, with most of them from European populations.[17] The present study attempted to develop a PRS for ICH based on summary statistics from this GWAS. The weak associations observed in the present study are either explained by the difference in genetic background between ethnic groups or suggest that this GWAS may be underpowered. The stronger association estimate between PRS and HS risk reported in the previous study was likely due to the inclusion of PRSs for risk factors of HS (such as blood pressure) in the metaGRS method.[10] It is worth mentioning that, in the present study, the PRSs directly developed from GWAS summary statistics of IS were also positively associated with the risk of ICH. Although there are differences in etiology and risk factor profile between IS and ICH,[11–13] they might also have some partially shared etiological mechanisms, such as cerebral small-vessel disease.[29]

This study has the following strengths. The large sample size and a large number of stroke events (including IS and ICH) enabled us to separate powerful training sets and the testing set and to conduct subgroup analyses. The loss to follow-up rate was less than 1% at an average follow-up period of over 12 years in CKB. The main subtypes of stroke (i.e., IS, ICH, and SAH) were well-classified, and the reporting and diagnostic accuracy of stroke events were high.[24] The genotyping and imputation of genetic data in this study were centrally conducted through a standard quality control process. Genetic variants with high reliability covered the whole genome well.

However, several limitations merit consideration. Firstly, we did not further consider the subtypes of IS (e.g., large-artery atherosclerotic stroke, cardioembolic stroke, and small vessel stroke) because over 75% of the incident IS events were coded as unspecified IS (ICD-10: I63.9), which precluded us from conducting more detailed analyses. Previous studies have suggested that there are differences in genetic loci of different IS subtypes.[14,30] Subsequent studies can explore whether distinguishing IS subtypes can further improve the predictive ability of PRS for IS. Secondly, compared with IS and ICH, the number of SAH events was relatively small. Therefore, it is difficult to exclude chance as an explanation for the positive results for SAH observed in the present study. Further studies with more SAH events are warranted to examine our findings. Thirdly, strand-ambiguous SNPs (i.e., A/T, C/G) and variants that were not found in CKB or had low imputation quality scores were removed during the standard quality control process of PRSs. This might weaken the associations of previous PRSs with stroke and its subtypes. Fourthly, because information on blood lipids was not available for the current study population, we were unable to compare the impacts of blood lipids and PRS on traditional stroke risk prediction model improvement. However, the addition of blood lipids may enhance the traditional non-laboratory-based models, as previous studies have shown.[31,32] Therefore, adding PRS to a “lipid-enhanced model” might lead to an even smaller improvement than what we have observed in the present study.

Conclusions

In this Chinese population, the associations of optimal PRSs with risks of stroke and its subtypes were moderate, suggesting a limited value for improving risk prediction over traditional risk factors in the context of current GWAS underrepresenting the East Asian population. As GWAS of stroke and its subtypes progress among East Asians, further studies are warranted to assess whether new PRSs have considerable potential to translate into precision public health and population health benefits and, if so, to determine the appropriate context for their use.

Key Messages Box.

What is already known on this topic?

  • Polygenic risk scores (PRSs) might achieve risk stratification of stroke.

  • Evidence from the East Asian population (including Chinese) is lacking.

What this study adds?

  • The association strengths of current PRSs with risks of stroke and its subtypes were moderate in the Chinese population.

  • PRS for ischemic stroke was positively associated with the risk of intracerebral hemorrhage.

How this study might affect research, practice or policy?

  • In the Chinese population, current PRSs might have limited value for improving stroke risk prediction over traditional risk factors.

  • Further studies are warranted to assess whether new PRSs based on larger GWAS or other developing methods have considerable potential to translate into population health benefits.

Acknowledgments

The most important acknowledgment is to the participants in the study and the members of the survey teams in each of the 10 regional centers, as well as to the project development and management teams based in Beijing, Oxford, and the 10 regional centers.

Funding/Support

This work was supported by the National Natural Science Foundation of China (82192904, 82192901, 82192900). The CKB baseline survey and the first re-survey were supported by a grant from the Kadoorie Charitable Foundation in Hong Kong. The long-term follow-up is supported by grants from the UK Wellcome Trust (212946/Z/18/Z, 202922/Z/16/Z, 104085/Z/14/Z, 088158/Z/09/Z), grants (2016YFC0900500) from the National Key R&D Program of China, National Natural Science Foundation of China (81390540, 91846303, 81941018), and Chinese Ministry of Science and Technology (2011BAI09B01).

Role of the Funder/Sponsor

The funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report.

Non-standard Abbreviations and Acronyms

CI: confidence interval
CKB: China Kadoorie Biobank
GWAS: genome-wide association study
HR: hazard ratio
ICH: intracerebral hemorrhage
IS: ischemic stroke
LD: linkage disequilibrium
OR: odds ratio
PRS: polygenic risk score
SAH: subarachnoid hemorrhage
SD: standard deviation

Footnotes

Author Contributions

JL conceived and designed the study. LL, ZC, and JC, members of the China Kadoorie Biobank Steering Committee, designed and supervised the whole study, obtained funding, and, together with CY, YG, Dianjianyi Sun, YP, PP, LY, YC, HD, YL, SB, DA, IM, and RW acquired the data. SY, ZS, and Dong Sun analyzed the data. SY drafted the manuscript. CY, YP, Dianjianyi Sun, and RC helped to interpret the results. JL contributed to the critical revision of the manuscript for important intellectual content. All authors reviewed and approved the final manuscript. JL is the guarantor.

Conflict of Interest Disclosures

We report no disclosures relevant to the manuscript.

Data Availability Statement

Details of how to access China Kadoorie Biobank data and details of the data release schedule are available from www.ckbiobank.org/site/Data+Access.

References

  • [1] GBD 2019 Stroke Collaborators. Global, regional, and national burden of stroke and its risk factors, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 2021;20(10):795–820. doi: 10.1016/S1474-4422(21)00252-0.
  • [2] Ibrahim-Verbaas CA, Fornage M, Bis JC, et al. Predicting stroke through genetic risk functions: the CHARGE Risk Score Project. Stroke. 2014;45(2):403–412. doi: 10.1161/STROKEAHA.113.003044.
  • [3] Malik R, Bevan S, Nalls MA, et al. Multilocus genetic risk score associates with ischemic stroke in case-control and prospective cohort studies. Stroke. 2014;45(2):394–402. doi: 10.1161/STROKEAHA.113.002938.
  • [4] Rutten-Jacobs LC, Larsson SC, Malik R, et al. Genetic risk, incident stroke, and the benefits of adhering to a healthy lifestyle: cohort study of 306 473 UK Biobank participants. BMJ. 2018;363:k4168. doi: 10.1136/bmj.k4168.
  • [5] Abraham G, Malik R, Yonova-Doing E, et al. Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke. Nat Commun. 2019;10(1):5819. doi: 10.1038/s41467-019-13848-1.
  • [6] Li J, Chaudhary DP, Khan A, et al. Polygenic Risk Scores Augment Stroke Subtyping. Neurol Genet. 2021;7(2):e560. doi: 10.1212/NXG.0000000000000560.
  • [7] Marston NA, Patel PN, Kamanu FK, et al. Clinical Application of a Novel Genetic Risk Score for Ischemic Stroke in Patients With Cardiometabolic Disease. Circulation. 2021;143(5):470–478. doi: 10.1161/CIRCULATIONAHA.120.051927.
  • [8] O’Sullivan JW, Shcherbina A, Justesen JM, et al. Combining Clinical and Polygenic Risk Improves Stroke Prediction Among Individuals With Atrial Fibrillation. Circ Genom Precis Med. 2021;14(3):e003168. doi: 10.1161/CIRCGEN.120.003168.
  • [9] Sun L, Pennells L, Kaptoge S, et al. Polygenic risk scores in cardiovascular risk prediction: A cohort study and modelling analyses. PLoS Med. 2021;18(1):e1003498. doi: 10.1371/journal.pmed.1003498.
  • [10] Lu X, Niu X, Shen C, et al. Development and Validation of a Polygenic Risk Score for Stroke in the Chinese Population. Neurology. 2021;97(6):e619–e628. doi: 10.1212/WNL.0000000000012263.
  • [11] Chen Z, Iona A, Parish S, et al. Adiposity and risk of ischaemic and haemorrhagic stroke in 0.5 million Chinese men and women: a prospective cohort study. Lancet Glob Health. 2018;6(6):e630–e640. doi: 10.1016/S2214-109X(18)30216-X.
  • [12] Sun L, Clarke R, Bennett D, et al. Causal associations of blood lipids with risk of ischemic stroke and intracerebral hemorrhage in Chinese adults. Nat Med. 2019;25(4):569–574. doi: 10.1038/s41591-019-0366-x.
  • [13] Gu X, Li Y, Chen S, et al. Association of Lipids With Ischemic and Hemorrhagic Stroke: A Prospective Cohort Study Among 267 500 Chinese. Stroke. 2019;50(12):3376–3384. doi: 10.1161/STROKEAHA.119.026402.
  • [14] Malik R, Chauhan G, Traylor M, et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat Genet. 2018;50(4):524–537. doi: 10.1038/s41588-018-0058-3.
  • [15] Chen Z, Chen J, Collins R, et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int J Epidemiol. 2011;40(6):1652–1666. doi: 10.1093/ije/dyr120.
  • [16] Vilhjalmsson BJ, Yang J, Finucane HK, et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet. 2015;97(4):576–592. doi: 10.1016/j.ajhg.2015.09.001.
  • [17] Sakaue S, Kanai M, Tanigawa Y, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53(10):1415–1424. doi: 10.1038/s41588-021-00931-x.
  • [18].Yang S, Han Y, Yu C, et al. Development of a model to predict 10-year risk of ischemic and hemorrhagic stroke and ischemic heart disease using the China Kadoorie Biobank. Neurology. 2022;98(23):e2307–e2317. doi: 10.1212/WNL.0000000000200139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Yang S, Sun D, Sun Z, et al. Minimal improvement in coronary artery disease risk prediction in Chinese population using polygenic risk scores: Evidence from the China Kadoorie Biobank. Chin Med J (Engl) 2023 doi: 10.1097/CM9.0000000000002694. Online ahead of print. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Zhu Z, Li J, Si J, et al. A large-scale genome-wide association analysis of lung function in the Chinese population identifies novel loci and highlights shared genetic etiology with obesity. Eur Respir J. 2021;58(4):2100199. doi: 10.1183/13993003.00199-2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Lambert SA, Gil L, Jupp S, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53(4):420–425. doi: 10.1038/s41588-021-00783-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Yang S, Li C, Hu Y, et al. gwasfilter: an R script to filter genome-wide association study. Chin J Epidemiol. 2021;42(10):1876–1881. doi: 10.3760/cma.j.cn112338-20200731-01003. [DOI] [PubMed] [Google Scholar]
  • [23].Buniello A, MacArthur JAL, Cerezo M, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–D1012. doi: 10.1093/nar/gky1120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Turnbull I, Clarke R, Wright N, et al. Diagnostic accuracy of major stroke types in Chinese adults: A clinical adjudication study involving 40,000 stroke cases. Lancet Reg Health West Pac. 2022;21:100415. doi: 10.1016/j.lanwpc.2022.100415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Harrell FE, Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4. [DOI] [PubMed] [Google Scholar]
  • [26].Pencina MJ, D’Agostino RB, Sr, D’Agostino RB, Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157–172. doi: 10.1002/sim.2929. [DOI] [PubMed] [Google Scholar]
  • [27].Wand H, Lambert SA, Tamburro C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591(7849):211–219. doi: 10.1038/s41586-021-03243-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].von Elm E, Altman DG, Egger M, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806–808. doi: 10.1136/bmj.39335.541782.AD. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Wardlaw JM, Smith C, Dichgans M. Small vessel disease: mechanisms and clinical implications. Lancet Neurol. 2019;18(7):684–696. doi: 10.1016/S1474-4422(19)30079-1. [DOI] [PubMed] [Google Scholar]
  • [30].Linden AB, Clarke R, Hammami I, et al. Genetic associations of adult height with risk of cardioembolic and other subtypes of ischemic stroke: A mendelian randomization study in multiple ancestries. PLoS Med. 2022;19(4):e1003967. doi: 10.1371/journal.pmed.1003967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Ueda P, Woodward M, Lu Y, et al. Laboratory-based and office-based risk scores and charts to predict 10-year risk of cardiovascular disease in 182 countries: a pooled analysis of prospective cohorts and health surveys. Lancet Diabetes Endocrinol. 2017;5(3):196–213. doi: 10.1016/S2213-8587(17)30015-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].The WHO CVD Risk Chart Working Group. World Health Organization cardiovascular disease risk charts: revised models to estimate risk in 21 global regions. Lancet Glob Health. 2019;7(10):e1332–e1345. doi: 10.1016/S2214-109X(19)30318-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Checklists
Supplementary File 1

Data Availability Statement

Details of how to access China Kadoorie Biobank data and details of the data release schedule are available from www.ckbiobank.org/site/Data+Access.