Abstract
Background
Recent genome-wide association study meta-analyses have identified 28 loci associated with risk of Parkinson’s disease (PD). We sought to investigate if these genetic risk factors are associated with PD age at onset.
Methods
Genetic risk scores from these loci were calculated for 6249 cases. Linear regression tested associations between cumulative genetic risk and PD age at onset.
Results
Increasing genetic risk scores were associated with earlier age at onset (beta = −0.10, p-value = 2.92E-08, adjusted r2 = 0.27). A single standard deviation increase in genetic risk score was associated with an age at onset 37.44 days earlier.
Conclusions
Genetic risk was highest among cases with onset at 31–60 years, slightly below the average age at onset, while the youngest and oldest onset groups carried the least genetic risk derived from common variants.
INTRODUCTION
Recent genome-wide association studies (GWAS) have shown 28 distinct and replicated loci associated with Parkinson’s disease (PD) risk 1–5. Despite the success of this work, there has been limited progress in understanding how genetic variation affects PD phenotype. PD is an age-dependent neurodegenerative condition, and age at onset (AAO) is a robust phenotypic measure. AAO is associated with variation in the motor and non-motor phenotype and with the chance of carrying mutations in Mendelian genes that cause PD 6,7. To date, no GWAS have identified and independently replicated common variants associated with AAO of PD on a genome-wide scale. We have investigated the influence of cumulative loading of common genetic risk factors on the AAO of PD in a large population of PD cases. We also attempt to quantify the distribution of genetic risk derived from common genetic variation across the age range in this case population.
In this study, we calculated cumulative genetic risk scores (GRS) across these 28 loci in a large series of 6249 PD cases to estimate the overall genetic burden attributable to relatively common variants of small effect. We use this score as a predictor of AAO to show how genetic factors may modify disease expression. We also study the distribution of the GRS across varying strata of age at onset.
PATIENTS AND METHODS
Patients
In this study we used high quality genotypic and clinical data from 6249 unrelated European ancestry PD cases with age at onset of at least 18 years. These cases were collected through the combined efforts of American, French, German, Greek and UK collaborators within the International Parkinson’s Disease Genomics Consortium (IPDGC) under appropriate consent and institutional review. All samples were genotyped on the NeuroX array at the Laboratory of Neurogenetics at the National Institute on Aging (Nalls et al., In press). In brief, this is an exome-focused array designed in collaboration with Illumina Inc that also includes over 24,000 custom designed variants useful in the study of neurodegenerative diseases. Within these custom designed variants, we included the 28 relatively common PD risk variants and a number of proxies from recent GWAS. None of the participants sampled in this study were used for the discovery meta-analyses that succeeded in defining the 28 loci of interest, but a portion were included in the replication series 5. For further details on patient genotyping and quality control, please see Supplemental Text S2.
Case acquisition was carried out as previously described across all collaborative sites within the IPDGC 2. All cases had an AAO of at least 18 years and met the Queen Square Brain Bank criteria for the diagnosis of PD, modified to allow inclusion of cases with a family history 8. Prior to genotyping, samples with known Mendelian mutations were excluded.
Statistics
Based on samples and genotypes passing standard GWAS quality control, genotypes for the 28 variants of interest were extracted (see Supplementary Table S1). Genetic risk scores were then generated for each sample. Risk allele dosages were summed across all loci of interest, with dosages scaled per SNP using effect estimates from the discovery phase of the largest published meta-analysis of GWAS (effect estimates and summary statistics for these SNPs are available on the PDgene database at http://www.pdgene.org 5). The methodology for scaling risk allele dosages in the creation of GRS has been described in detail elsewhere 1,2,9,10. This scaling accounts for the fact that each variant of interest has a different risk estimate, and is more appropriate than simply counting risk alleles. The GRS is a summary of cumulative genetic risk attributable to known, relatively common genetic variation across the 22 autosomes that has been definitively shown to be associated with risk of PD. To make analyses of the GRS more easily interpretable, GRS were converted to Z scores. In its simplest form, a Z score of one is equivalent to an increase of a single standard deviation from the case population mean of the GRS.
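The scoring described above can be illustrated with a minimal Python sketch. It assumes the per-SNP scaling is a multiplication of each risk allele dosage by a log odds ratio, and that Z scores use the population standard deviation of the case series; the weights, dosages and function names are hypothetical and are not values or code from this study.

```python
from statistics import mean, pstdev

def genetic_risk_score(dosages, log_odds):
    """Weighted sum of risk-allele dosages (0 to 2) across loci."""
    if len(dosages) != len(log_odds):
        raise ValueError("one dosage per locus is required")
    return sum(d * w for d, w in zip(dosages, log_odds))

def z_scores(values):
    """Standardize values to mean 0 and SD 1 within the case series."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD of the cases
    return [(v - mu) / sd for v in values]

# Hypothetical per-locus weights (log odds ratios) for three loci.
weights = [0.25, 0.10, 0.15]

# Hypothetical risk-allele dosages for four cases, one row per case.
cases = [
    [2, 1, 0],
    [0, 0, 1],
    [1, 2, 2],
    [1, 1, 1],
]

raw = [genetic_risk_score(c, weights) for c in cases]
grs_z = z_scores(raw)
```

In this form a case with `grs_z` equal to 1 sits one standard deviation above the mean score of the cases, which is the unit used for the regression estimates.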
Concurrent with risk profile calculation, we generated 20 eigenvectors summarizing population substructure at the genetic level, to account for stochastic genetic variance within our outbred European population of cases. The eigenvectors were based on high quality variants not in linkage disequilibrium with each other and not within 500 kb of any of the 28 variants of interest (minor allele frequency > 0.05, r2 < 0.50 within 250 kb of another assayed variant, yielding 10024 variants in total). These were used as additional covariates in statistical models to account for population substructure.
Stepwise linear regression was used to test associations of AAO with each single variant comprising the GRS and with the GRS itself. The initial models included covariates of eigenvectors 1–20, country of origin as a factor variable and female gender. Stepwise modeling was employed to reduce analyses to the most parsimonious model based on default settings within the MASS package [Modern Applied Statistics with S, http://www.stats.ox.ac.uk/pub/MASS4] available using R v3.02 [R: A Language and Environment for Statistical Computing, http://www.R-project.org].
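As a rough illustration of this kind of model reduction, the sketch below fits ordinary least squares models and drops covariates one at a time while the Akaike information criterion (AIC) decreases. It is a simplified stand-in written in Python, not the MASS routine used in the study: the backward direction, the AIC form n·log(RSS/n) + 2p, the function names and the simulated data are all assumptions made for the example.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(columns, y):
    """Return (coefficients, AIC) for y ~ intercept + columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y[i] for i, r in enumerate(design)) for j in range(p)]
    coef = solve(xtx, xty)
    rss = sum((y[i] - sum(c * v for c, v in zip(coef, r))) ** 2
              for i, r in enumerate(design))
    return coef, n * math.log(rss / n) + 2 * p

def backward_select(predictors, y, keep):
    """Drop one predictor at a time for as long as doing so lowers the AIC."""
    current = dict(predictors)
    _, best = fit_ols(list(current.values()), y)
    improved = True
    while improved:
        improved = False
        for name in [k for k in current if k not in keep]:
            trial = {k: v for k, v in current.items() if k != name}
            _, aic = fit_ols(list(trial.values()), y)
            if aic < best:
                best, current, improved = aic, trial, True
                break
    return list(current), best

# Simulated (hypothetical) data: AAO depends on the GRS but not on noise_cov.
rng = random.Random(0)
grs = [rng.gauss(0, 1) for _ in range(200)]
noise_cov = [rng.gauss(0, 1) for _ in range(200)]
aao = [60 - 0.75 * g + rng.gauss(0, 1) for g in grs]

selected, final_aic = backward_select(
    {"grs": grs, "noise_cov": noise_cov}, aao, keep={"grs"})
coef, _ = fit_ols([grs], aao)  # coef[1] is the slope for the GRS
```

Keeping the predictor of interest out of the elimination, as the `keep` argument does here, is a design choice of the sketch so that the GRS estimate is always reported.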
In addition, we tested the association between the GRS and AAO within strata of AAO, roughly divided into decades of AAO (see Table 1). We also compared the mean GRS of each AAO stratum with that of the remainder of the case series, using a simple two-sided T-test, to see whether each stratum differed significantly from the others. All statistical analyses and plotting were carried out using R v3.02 and, when appropriate, the ggplot2 package [ggplot2: elegant graphics for data analysis, http://had.co.nz/ggplot2/book].
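The stratum-versus-remainder comparison can be sketched as follows in Python. The text does not state which variant of the T-test was used, so this example assumes an unequal-variance (Welch) statistic and, to stay within the standard library, approximates the two-sided p-value with the normal distribution, which is only reasonable for large groups. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_test(group, rest):
    """Welch t statistic with a large-sample normal approximation to the p-value."""
    se = sqrt(variance(group) / len(group) + variance(rest) / len(rest))
    t = (mean(group) - mean(rest)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p

# Hypothetical Z-scored GRS values for one AAO stratum and for all other cases.
stratum = [0.4, -0.2, 1.1, 0.3, 0.9, -0.5, 0.6, 0.2]
remainder = [-0.3, 0.1, -0.8, 0.5, -0.1, 0.0, -0.6, 0.4, -0.2, 0.3]

t_stat, p_value = welch_test(stratum, remainder)
```

With groups as small as these the normal approximation understates the p-value; a t distribution with the appropriate degrees of freedom would be the exact reference.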
Table 1. Descriptive statistics and regression model summaries.
Genetic risk scores across age strata in this study. The beta, standard error, p-value and r2 columns give parameter estimates for the genetic risk score after Z transformation based on pooled cases; the r2 is an adjusted r2, and all summary statistics are derived from the stepwise regression models. The beta estimate is the change in age at onset, in years, associated with a 1 standard deviation (1 Z unit) increase from the overall case mean genetic risk score. Comparisons of mean genetic risk score between each stratum and all other strata via T-test are summarized by asterisks: * denotes a two-sided mean difference in genetic risk score from other age strata at p-value < 0.05, and ** at p-value < 0.001.
Descriptive statistics by fine strata:

| Age range (years) | Age at onset in years (mean, SD) | Female (%) | Cases (N) | Genetic risk score (mean, SD) |
|---|---|---|---|---|
| 18–30 | 26.74, 3.36 | 30.13 | 73 | −0.09, 0.96 |
| 31–40 | 36.81, 2.73 | 37.28 | 322 | 0.15, 0.98 |
| 41–50 | 46.21, 2.85 | 38.41 | 932 | 0.14, 1.01 |
| 51–60 | 55.92, 2.88 | 38.86 | 1459 | 0.05, 1.00 |
| 61–70 | 65.59, 2.81 | 36.44 | 1858 | −0.04, 1.00 |
| 71–80 | 75.03, 2.84 | 30.73 | 1412 | −0.11, 0.99 |
| 81+ | 82.30, 2.07 | 26.42 | 193 | −0.22, 1.01 |

Descriptive statistics by coarse strata with regression summary statistics:

| Age range (years) | Age at onset in years (mean, SD) | Female (%) | Cases (N) | Genetic risk score (mean, SD) | Beta | Standard error | P-value | r2 |
|---|---|---|---|---|---|---|---|---|
| All samples (pooled across age ranges) | 61.15, 12.46 | 35.67 | 6249 | 0, 1 | −0.753 | 0.136 | 2.92E-08 | 0.266 |
| Samples 18–60 years only | 49.70, 8.10 | 38.30 | 2786 | 0.09, 1.00 | −0.097 | 0.149 | 0.514 | 0.060 |
| Samples 61+ years only | 70.37, 6.07 | 33.55 | 3463 | −0.07, 1.00 | −0.237 | 0.092 | 0.010 | 0.219 |
RESULTS
The cumulative burden of relatively common risk variants comprising the GRS for PD risk is strongly associated with PD AAO in a large population of typical cases. Increasing genetic risk scores were significantly associated with an overall trend for earlier AAO of disease in pooled analyses (beta = −0.10, p-value = 2.92E-08, adjusted r2 = 0.27). A single standard deviation increase from the mean GRS (i.e. a 1 Z increase) is associated with ~37 days earlier onset. As an example of the scale of this association, a case in the 99th percentile of the GRS distribution would have an onset of disease ~90 days earlier than the average case. Interestingly, the GRS was not significantly associated with AAO within any of the analyses stratified by AAO, which is likely due to power constraints and smaller sample sizes within each stratum (Table 1, Figure 1 Panels A and B). No single variant comprising the risk score was itself associated with AAO after correction for multiple testing (p-value threshold for significance 1.79E-03 after 28 tests), except rs34311866, a common coding variant in the TMEM175 gene (beta = −0.81, p-value = 1.74E-03). For further details on single variant analyses please refer to Supplemental Table S1. For further details of the distribution of the GRS by age strata, please see Supplemental Figures S1–S3 and Supplemental Table S2.
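The arithmetic behind these figures can be checked with a short Python sketch. It assumes a year of 365.25 days, a Bonferroni correction at a family-wise alpha of 0.05, and an approximately normal GRS distribution when locating the 99th percentile; the variable names are illustrative only.

```python
from statistics import NormalDist

DAYS_PER_YEAR = 365.25   # assumed conversion factor
days_per_sd = 37.44      # reported shift in onset per 1 SD increase in GRS
n_tests = 28             # one test per variant in the risk score

# Reported shift expressed in years, comparable to the beta quoted in the text.
beta_years = days_per_sd / DAYS_PER_YEAR

# Bonferroni-corrected significance threshold for the single variant tests.
bonferroni = 0.05 / n_tests

# Z score of the 99th percentile under a normal approximation,
# and the corresponding shift in onset in days.
z99 = NormalDist().inv_cdf(0.99)
shift_99_days = z99 * days_per_sd
```

Under these assumptions the per-SD shift corresponds to about 0.10 years, the threshold to about 1.79E-03, and the 99th percentile shift to slightly under 90 days.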
Figure 1. Common genetic risk factors are associated with earlier Parkinson’s disease age at onset and differ across age at onset strata.
Panel A is a plot of the regression line showing the association between genetic risk score and age at onset, with the blue line as the parameter estimate and the grey shading as the 95% confidence interval of the regression model. Panel B is the overlay of the regression model from the previous panel fitted to the actual data points. Panel C shows boxplots of cumulative genetic risk estimates from pooled samples by age at onset strata. Panel D is a three-dimensional “probability map” of age at onset by genetic risk score based on the regression model.
We note that cumulative genetic risk was highest in the 31–40, 41–50 and 51–60 year AAO strata (Table 1, Figure 1 Panels C and D), at p-values of 4.9E-03, 3.85E-08 and 18E-02 respectively when comparing the mean GRS of each of these individual strata to all other age strata via simple T-test. These are the only age strata with mean GRS above the case population mean. Interestingly, all three of these age groups fall below the overall population mean AAO of 61.15 years.
DISCUSSION
In this report, we definitively show that common genetic variants associated with PD risk also have a small but consistent effect on disease etiology by combining to expedite the onset of the disease. We identify small but consistent effects of these variants contributing to earlier onset among cases enriched for genetic risk using a large sample series and semi-targeted genotyping.
In addition, we show that genetic risk of PD, quantified using relatively common variants from recent GWAS, is enriched in earlier onset PD. This is evident across PD cases with AAO at 60 years and younger, except for the youngest PD cases (30 years and younger). This idiosyncratic GRS among the earliest onset PD cases, outside of the expected trend, is likely due either to the rarity of PD at this range of AAO and our small sample counts within it, or to the impact of unknown rare or Mendelian variants that were not detected by current GWAS methods. This research is of interest to the general PD research community as well as to studies of aging related genomics. It suggests that the canalization of genetic factors associated with aging, together with possible accumulated environmental exposures, has an impact on disease etiology in later onset. It also shows that greater genetic liability is generally concurrent with earlier age at onset, although future investigations of rare alleles are likely necessary to evaluate the earliest onset forms of many diseases, including PD.
Supplementary Material
Counts of samples as a function of the genetic risk score.
Counts of samples as a function of age at onset in years.
Counts of samples as a function of the genetic risk score stratified by decades of age at onset.
Supplemental Table S1: Summaries of single variants included in risk score calculation as possibly associated with age at onset.
Supplemental Table S2: Percent of case population at differing levels of GRS by age strata.
Supplemental Text S1: Consortium membership and funding acknowledgements.
Supplemental Text S2: Additional methods information.
Acknowledgments
Please see Supplemental Text S1.
References
- 1. Chiò A, Schymick JC, Restagno G, et al. A two-stage genome-wide association study of sporadic amyotrophic lateral sclerosis. Hum Mol Genet. 2009 Apr 15;18(8):1524–1532. doi: 10.1093/hmg/ddp059.
- 2. International Parkinson Disease Genomics Consortium. Nalls MA, Plagnol V, et al. Imputation of sequence variants for identification of genetic risks for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet. 2011 Feb 19;377(9766):641–649. doi: 10.1016/S0140-6736(10)62345-8.
- 3. Lill CM, Roehr JT, McQueen MB, et al. Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet. 2012;8(3):e1002548. doi: 10.1371/journal.pgen.1002548.
- 4. Simón-Sánchez J, Schulte C, Bras JM, et al. Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat Genet. 2009 Dec;41(12):1308–1312. doi: 10.1038/ng.487.
- 5. Nalls MA, Pankratz N, Lill CM, et al. Large Scale Meta Analysis of Genome-wide Association Data in Parkinson’s Disease Reveals 28 Distinct Risk Loci. Nature Genetics. 2014. doi: 10.1038/ng.3043. In press.
- 6. Kilarski LL, Pearson JP, Newsway V, et al. Systematic review and UK-based study of PARK2 (parkin), PINK1, PARK7 (DJ-1) and LRRK2 in early-onset Parkinson’s disease. Mov Disord Off J Mov Disord Soc. 2012 Oct;27(12):1522–1529. doi: 10.1002/mds.25132.
- 7. Wickremaratchi MM, Knipe MDW, Sastry BSD, et al. The motor phenotype of Parkinson’s disease in relation to age at onset. Mov Disord Off J Mov Disord Soc. 2011 Feb 15;26(3):457–463. doi: 10.1002/mds.23469.
- 8. Massano J, Bhatia KP. Clinical approach to Parkinson’s disease: features, diagnosis, and principles of management. Cold Spring Harb Perspect Med. 2012 Jun;2(6):a008870. doi: 10.1101/cshperspect.a008870.
- 9. Hernandez DG, Nalls MA, Ylikotila P, et al. Genome wide assessment of young onset Parkinson’s disease from Finland. PloS One. 2012;7(7):e41859. doi: 10.1371/journal.pone.0041859.
- 10. Ripatti S, Tikkanen E, Orho-Melander M, et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet. 2010 Oct 23;376(9750):1393–1400. doi: 10.1016/S0140-6736(10)61267-6.

