Author manuscript; available in PMC: 2016 Oct 1.
Published in final edited form as: Lancet Neurol. 2015 Aug 10;14(10):1002–1009. doi: 10.1016/S1474-4422(15)00178-7

Diagnosis of Parkinson’s disease on the basis of clinical–genetic classification: a population-based modelling study

Mike A Nalls 1, Cory Y McLean 2, Jacqueline Rick 3, Shirley Eberly 4, Samantha J Hutten 5, Katrina Gwinn 6, Margaret Sutherland 6, Maria Martinez 7,8, Peter Heutink 9, Nigel Williams 10, John Hardy 11, Thomas Gasser 12, Alexis Brice 13,14,15,16,17, T Ryan Price 1, Aude Nicolas 1, Margaux F Keller 18, Cliona Molony 18, J Raphael Gibbs 1, Alice Chen-Plotkin 3, Eunran Suh 19, Christopher Letson 1, Massimo S Fiandaca 20, Mark Mapstone 21, Howard J Federoff 20, Alastair J Noyce 11, Huw Morris 11, Vivianna M Van Deerlin 19, Daniel Weintraub 3,22, Cyrus Zabetian 23, Dena G Hernandez 1, Suzanne Lesage 13,14,15,16,17, Meghan Mullins 2, Emily Drabant Conley 2, Carrie Northover 2, Mark Frasier 5, Ken Marek 24, Aaron G Day-Williams 25, David J Stone 18, John P A Ioannidis 26,27,28,29, Andrew B Singleton 1,*, on behalf of the Parkinson’s Disease Biomarkers Program (PDBP) and the Parkinson’s Progression Marker Initiative (PPMI) investigators
PMCID: PMC4575273  NIHMSID: NIHMS715938  PMID: 26271532

Abstract

Background

Accurate diagnosis and early detection of complex disease have the potential to be of enormous benefit to clinical trialists, patients, and researchers alike. We sought to create a non-invasive, low-cost, and accurate classification model for diagnosing Parkinson’s disease risk to serve as a basis for future disease prediction studies in prospective longitudinal cohorts.

Methods

We developed a simple disease classifying model within 367 patients with Parkinson’s disease and phenotypically typical imaging data and 165 controls without neurological disease of the Parkinson’s Progression Marker Initiative (PPMI) study. Olfactory function, genetic risk, family history of PD, age and gender were algorithmically selected as significant contributors to our classifying model. This model was developed using the PPMI study then tested in 825 patients with Parkinson’s disease and 261 controls from five independent studies with varying recruitment strategies and designs including the Parkinson’s Disease Biomarkers Program (PDBP), Parkinson’s Associated Risk Study (PARS), 23andMe, Longitudinal and Biomarker Study in PD (LABS-PD), and Morris K. Udall Parkinson’s Disease Research Center of Excellence (Penn-Udall).

Findings

Our initial model correctly distinguished patients with Parkinson’s disease from controls at an area under the curve (AUC) of 0.923 (95% CI = 0.900 – 0.946) with high sensitivity (0.834, 95% CI = 0.711 – 0.883) and specificity (0.903, 95% CI = 0.824 – 0.946) in PPMI at its optimal AUC threshold (0.655). The model is also well calibrated: all Hosmer-Lemeshow simulations suggest that, when the data are parsed into random subgroups, the observed data mirror the expected data, demonstrating that our model is robust and fits well. Likewise, external validation shows excellent classification of PD, with AUCs of 0.894 in PDBP, 0.998 in PARS, 0.955 in 23andMe, 0.929 in LABS-PD, and 0.939 in Penn-Udall. Additionally, participants with scans without evidence of dopaminergic deficit (SWEDD) whom our model classifies as PD convert to typical PD within one year more often than would be expected by chance: 4 of 17 SWEDD participants classified as PD converted during brief follow-up, whereas only 1 of 38 SWEDD participants not classified as PD converted (test of proportions, p-value = 0.003).

Interpretation

This model may serve as a basis for future investigations into the classification, prediction and treatment of Parkinson’s disease, particularly those planning on attempting to identify prodromal or preclinical etiologically typical PD cases in prospective cohorts for efficient interventional and biomarker studies.

Funding

Please see the acknowledgements and funding section at the end of the manuscript.

RESEARCH IN CONTEXT

Evidence before this study

Prior to January 1st, 2015, we searched PubMed for articles containing possible combinations of the terms “Parkinson’s disease”, “neurodegeneration”, “biomarker” and “risk prediction”. Previous studies have sought to identify accurate biomarkers of Parkinson’s disease allowing for risk quantification and classification outside or before entering a clinic for traditional symptomatic diagnosis. To build our model, we used data from a number of disparate sources within PPMI, from clinically derived data such as olfactory function and imaging, to genetic risk estimates and demographics. None of these comparatively cheap and easily accessible classifiers alone proved to be accurate (near AUC 0.900 or higher) in classifying PD cases and controls within PPMI. More accurate classifiers (AUC above 0.900), such as imaging data, tend to be quite expensive (in the range of thousands of dollars per participant) and are not portable.

Added value of this study

This study adds value to the field of PD research by developing an algorithm using data that is of low material cost and can be remotely administered. This simple algorithm is also accurate at classifying case-control datasets based on data points outside of both study recruitment and PD diagnostic criteria (AUC > 0.89 in all independent datasets). In addition, this study has been successful in refining PD phenotypes (i.e. by not classifying SWEDD as PD), which may be useful to researchers in the clinical trial setting, particularly in concert with imaging data. In this case, the risk prediction approach was able to discriminate subjects without evidence of dopaminergic deficit typical of classic PD.

Implications of all available evidence

The implications of this study for future research are multifaceted. The development and primary validation of this classification algorithm using publicly available data shows the utility of public datasets such as Parkinson’s Progression Marker Initiative (PPMI) and Parkinson’s Disease Biomarkers Program (PDBP). Within these two larger case-control cohorts (PPMI and PDBP), this more expansive and multifaceted model significantly outperforms any single classifier itself. As the pace of PD genomics advances exponentially with added precision from sequencing studies, we will see the genetic contribution to risk prediction grow rapidly, with this study serving as a foundation. We show initial success in predicting which of the PD patients who present as SWEDDs will progress to typical PD with evidence for dopaminergic deficit. Easily identifying or precluding this group will likely be important in clinical trials. Also, this method has a low material cost compared to imaging based assessments, making this algorithmic method of classification low-cost by comparison (roughly $100 total per participant for all necessary measures).

INTRODUCTION

Accurate diagnosis or prediction of risk using simple, non-invasive measures is a goal of many researchers studying complex diseases, but one that is rarely realized. In the context of complex progressive diseases such as Parkinson’s disease, pre-clinical diagnosis and a low error rate in diagnosis are both critical needs, both in the execution of clinical trials and in the ultimate administration of disease-altering therapeutics.

Imaging is often seen as the gold standard for identifying typical Parkinson’s disease cases pre-mortem; however, the high material costs and general lack of portability make this approach somewhat limiting. Here we present a comparable yet portable method for identifying Parkinson’s disease cases exhibiting etiologically typical disease presentation (confirmed by dopamine transporter imaging data) with relatively low material costs. We have sought to develop such a method to benefit both Parkinson’s disease (PD) patients and research communities. We employed a combination of factors that vary over the life of an individual, factors that are constant and do not change with time, as well as general indicators of neurodegeneration and PD-specific measures to create our classification algorithm.

Our goal was to develop the simplest and most explanatory model to maximize classifier performance while minimizing cost. The model we developed includes what is commonly regarded as an indicator of likely neurodegeneration, hyposmia, as well as genetic and clinico-demographic data 1. This integrative modeling approach takes advantage of the trend toward a growing wealth of health data from different facets of genetics, clinical and biomarker research. It is likely this predictive model will only increase in accuracy as additional data sources are added and genetic risk estimates are refined through sequencing studies.

RESULTS

After stepwise logistic regression, we retained five PD risk factors in PPMI: the GRS, total UPSIT, family history, age, and sex. None of these risk factors were used as study recruitment criteria in PPMI or as part of the PD diagnosis. Each of these factors made significant contributions to the information content of the integrative predictive model. When comparing standardized beta coefficients within the regression model, the UPSIT score is responsible for 63.1% of the explained variance, followed by the GRS contributing 13.6%, then family history (11.4%), gender (6.0%) and age (5.9%). Additional information on parameter estimates for factors comprising the integrative model in PPMI can be found in Supporting Table 1.

For participants with PD and healthy controls within PPMI, the AUC of the integrative model was 0.923 with a 95% confidence interval (CI) 0.900 – 0.946 (Table 1, Figure 2). This model appears to be relatively accurate, with a sensitivity of 0.834 (95% CI = 0.711 – 0.883) and a specificity of 0.903 (95% CI = 0.824 – 0.946) in PPMI at the best threshold for classification of 0.655 in the ROC curve. However, because the prevalence of PD is low (about 2%) even in aged populations over 60 years, the positive predictive value (PPV) is only 0.149 despite an AUC > 0.919. While the UPSIT-only model in PPMI was individually strong (AUC = 0.901, 95% CI = 0.874–0.928), the integrative model is significantly more informative based on DeLong’s test for correlated ROC curves (|z| = 3.027, p-value = 0.002) 17. When the integrative model was used to classify SWEDD participants and controls in PPMI, there was a marked decrease in classification accuracy, with an AUC of 0.707 (95% CI = 0.630 – 0.783) as per Table 2.
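
The quantities reported above (AUC, and sensitivity and specificity at a chosen cut-off) can be illustrated with a short sketch. This is not the authors' code. It computes the AUC as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, and it selects a cut-off by maximizing Youden's J (sensitivity + specificity − 1). The text does not state which criterion produced the 0.655 threshold, so the use of Youden's J is an assumption, and the function names and toy probabilities below are hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as P(case score > control score); ties count as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def best_threshold_youden(case_scores, control_scores):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A participant is classified as a case when the score is >= threshold.
    """
    best = None
    for t in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= t for s in case_scores) / len(case_scores)
        spec = sum(s < t for s in control_scores) / len(control_scores)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted probabilities, for illustration only.
cases = [0.95, 0.90, 0.80, 0.70, 0.40]
controls = [0.60, 0.30, 0.20, 0.10, 0.05, 0.02]

threshold, sensitivity, specificity = best_threshold_youden(cases, controls)
print(f"AUC = {auc(cases, controls):.3f}")
print(f"threshold = {threshold}, sensitivity = {sensitivity:.3f}, "
      f"specificity = {specificity:.3f}")
```

The pairwise AUC is quadratic in sample size, which is acceptable for cohorts of a few hundred participants; a rank-based formula would be preferable for much larger datasets.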

Table 1. Descriptive statistics.

This table describes basic features of the participating studies.

Study | Status within study | Participants (N) | Female (%) | Age in years (mean, SD) | Family history (%) | Total UPSIT score (mean, SD) | GRS in Z units (mean, SD) | Diagnostic criteria
PPMI | Case | 367 | 33.2 | 64.256, 9.598 | 25.1 | 22.196, 8.261 | 0.595, 1.201 | UK Brain Bank criteria for PD plus DaT scanning showing dopaminergic dysfunction at baseline
PPMI | Control | 165 | 33.3 | 63.794, 10.588 | 5.4 | 34.345, 4.372 | 0, 1 | No clinical indication of PD
PPMI | SWEDD | 55 | 34.5 | 63.018, 10.117 | 30.9 | 32.181, 5.754 | 0.347, 1.065 | UK Brain Bank criteria for PD plus DaT scanning showing normal dopaminergic function at baseline
PDBP | Case | 453 | 40.4 | 64.936, 9.046 | 25.4 | 19.984, 7.959 | 0.600, 1.229 | UK Brain Bank criteria for PD
PDBP | Control | 156 | 50 | 62.301, 10.958 | 10.9 | 32.590, 5.924 | 0.094, 1.030 | No clinical indication of PD
PARS | Case | 15 | 33.3 | 67.400, 6.909 | 66.6 | 17.667, 5.827 | 0.685, 0.912 | UK Brain Bank criteria for PD
PARS | Control | 85 | 54.1 | 63.467, 10.269 | 43.5 | 35.118, 3.311 | 0.140, 1.071 | No clinical indication of PD
PARS | At risk | 146 | 52.0 | 64.290, 8.338 | 43.8 | 22.897, 6.283 | 0.263, 1.070 | No clinical indication of PD but some degree of hyposmia
23andMe | Case | 20 | 45.0 | 52.650, 9.959 | 25.0 | 20.550, 7.067 | 0.531, 1.471 | Self-report of PD diagnosis
23andMe | Control | 20 | 55.0 | 60.150, 10.148 | 25.0 | 33.050, 3.634 | −0.159, 0.936 | Self-report of neurologically normal status
LABS-PD | Case | 239 | 35.1 | 59.127, 9.075 | 26.8 | 19.422, 7.591 | 0.625, 1.114 | UK Brain Bank criteria for PD plus DaT scanning showing dopaminergic dysfunction at baseline
LABS-PD | SWEDD | 13 | 61.5 | 60.041, 12.927 | 38.5 | 28.692, 8.420 | 0.328, 0.668 | UK Brain Bank criteria for PD plus DaT scanning showing normal dopaminergic function at baseline
Penn-Udall | Case | 98 | 31.6 | 63.816, 8.813 | 29.6 | 18.316, 6.929 | 0.398, 1.189 | UK Brain Bank criteria for PD

Figure 2. Receiver operating characteristic curves.

The receiver operating characteristic curve for the integrative model as developed in PPMI. The red shading denotes the bootstrap estimated 95% confidence interval with the AUC: its corresponding 95% confidence interval estimate is highlighted in red text. The optimal threshold for classification is indicated by the cross-hairs with black text annotating this threshold value followed by specificity and sensitivity.

Table 2. Performance of classification.

The demographic AUC (95% CIs) includes estimates for a logistic model containing parameters of female gender, family history, and age. The UPSIT model includes estimates for a logistic model only containing the total UPSIT parameter. The GRS model includes estimates for a logistic model only containing the GRS parameter. The integrative model includes all parameters from the previous three models into the estimate of classification accuracy between cases and controls. For PPMI, PDBP, PARS, and 23andMe, the AUC estimates were generated comparing PD cases to controls. In PPMI, participants designated as SWEDD were compared to the same controls as the PD cases. In PARS, participants designated at risk were compared to the same controls as the PD cases. The LABS-PD and Penn-Udall studies were case-only, so instead of AUC, the proportion of correctly predicted cases was reported as a measure of classification accuracy (denoted by “*” in the table header) and includes the means and 95% CIs for this metric. At the current time, 23andMe could not provide precision estimates for this table.

Study | Status within study | Demographic model AUC | UPSIT model AUC | GRS model AUC | Integrative model AUC
PPMI | PD | 0.604 | 0.901 | 0.639 | 0.923
PPMI | SWEDD | 0.609 | 0.624 | 0.569 | 0.707
PDBP | PD | 0.602 | 0.881 | 0.619 | 0.894
PARS | PD | 0.678 | 0.994 | 0.657 | 0.998
PARS | At risk | 0.479 | 0.976 | 0.533 | 0.962
23andMe | PD | 0.385 | 0.962 | 0.620 | 0.955
LABS-PD* | PD | 0.276 | 0.925 | 0.489 | 0.929
LABS-PD* | SWEDD | 0.385 | 0.538 | 0.231 | 0.692
Penn-Udall* | PD | 0.316 | 0.959 | 0.327 | 0.939

After identifying model parameters, in silico validation of the integrative model was carried out in the PPMI dataset. We used 10,000 randomly generated training sets to train subset-specific integrative predictive models to evaluate the distribution of the AUC through resampling. These models were fitted to matched, non-overlapping validation sets for each iteration. We noted a relatively normal distribution of AUCs across all iterations (Figure 3). The mean AUC estimate was 0.918 (sd = 0.012), with a minimum AUC of 0.830 and a maximum of 0.959.
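
The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic, the 70/30 split between training and validation sets is an assumed value (the text does not give the split), a plain gradient-descent logistic regression stands in for whatever fitting routine was actually used, and only 20 iterations are run instead of 10,000 to keep the example fast. All function names are hypothetical.

```python
import math
import random
import statistics


def predict(w, xi):
    """Predicted probability of being a case for one participant."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit a logistic regression by batch gradient descent.

    Returns the weights as [intercept, b1, b2, ...].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """AUC as P(case score > control score); ties count as one half."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def resampled_aucs(X, y, iterations, train_fraction, seed):
    """Train on a random subset, score the non-overlapping remainder, repeat."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    results = []
    for _ in range(iterations):
        rng.shuffle(idx)
        cut = int(train_fraction * len(idx))
        train, valid = idx[:cut], idx[cut:]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict(w, X[i]) for i in valid]
        results.append(auc(scores, [y[i] for i in valid]))
    return results


# Synthetic two-predictor data (hypothetical): cases are shifted upward.
data_rng = random.Random(2015)
y = [i % 2 for i in range(200)]
X = [[data_rng.gauss(1.5 * label, 1.0), data_rng.gauss(0.5 * label, 1.0)]
     for label in y]

aucs = resampled_aucs(X, y, iterations=20, train_fraction=0.7, seed=1)
print(f"mean AUC = {statistics.mean(aucs):.3f}, sd = {statistics.stdev(aucs):.3f}, "
      f"min = {min(aucs):.3f}, max = {max(aucs):.3f}")
```

Summarizing the resulting list by its mean, standard deviation, minimum, and maximum mirrors the way the AUC distribution is reported in the text.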

Figure 3. Internal validation and calibration.

Plot of the densities of AUCs from in silico validation of the integrative model within the PPMI dataset. This panel includes all 5 parameters (UPSIT, GRS, family history of PD, age and gender) for 10,000 iterations as well as another 10,000 iterations of training using backwards stepwise pruning.

We also ran a subsequent resampling exercise, repeating the analysis described above but using Akaike information criterion-informed backward stepwise pruning of the integrative model in the training subsets. We then applied this version of the integrative model to the validation subsets. Throughout 10,000 iterations, the UPSIT score always remained after stepwise pruning, while the GRS remained in 98.6% of the iterations, family history in 89.6%, gender in 49.9% and age in 49.4%. Of the iterations, 49.1% contained 4 factors, 29.6% contained 3 factors, 19.9% contained 5 factors, 1.4% contained only 2 factors and only one iteration contained a single factor (the UPSIT score). Across the resampling iterations, the AUC ranged from 0.826 to 0.960 with a mean of 0.915 (sd = 0.013).

We evaluated model calibration in PPMI using the Hosmer-Lemeshow test. First, we iterated across possibilities of 5, 10, 25, 50, and 100 random subsets within our dataset for the analysis. Each grouping returned a p-value between 0.286 and 0.592 for the Hosmer-Lemeshow test, suggesting that no outlier subgroups were present and indicating good calibration. We also repeated this analysis for all possible numbers of groupings ranging from 5–100. This analysis returned only p-values > 0.05, again suggesting that the results of the integrative model are not driven by any subset of the data (Supporting Figure 1).
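
A minimal sketch of a Hosmer-Lemeshow calculation follows, assuming the conventional form of the test in which participants are sorted by predicted probability and split into g groups of near-equal size, with the statistic referred to a chi-square distribution with g − 2 degrees of freedom. The exact grouping procedure used in this study is not fully specified in the text, so this grouping is an assumption; the data are synthetic and the function names are hypothetical.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:                            # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:              # Q(a + 1, y) = Q(a, y) + y^a e^-y / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, p-value) using groups ordered by predicted probability.

    Requires groups >= 3 and predicted probabilities strictly between 0 and 1.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        lo = g * len(order) // groups
        hi = (g + 1) * len(order) // groups
        members = order[lo:hi]
        n = len(members)
        expected = sum(probs[i] for i in members)
        observed = sum(outcomes[i] for i in members)
        mean_p = expected / n
        stat += (observed - expected) ** 2 / (n * mean_p * (1.0 - mean_p))
    return stat, chi2_sf(stat, groups - 2)


# Synthetic, well-calibrated data (hypothetical): outcomes are drawn from the
# predicted probabilities themselves, for 532 simulated participants.
rng = random.Random(7)
probs = [rng.uniform(0.05, 0.95) for _ in range(532)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

for g in (5, 10, 25, 50, 100):
    stat, p_value = hosmer_lemeshow(probs, outcomes, groups=g)
    print(f"groups = {g:3d}  statistic = {stat:7.2f}  p-value = {p_value:.3f}")
```

A large p-value means the observed and expected case counts agree within each group, which is the sense in which the text describes the model as well calibrated.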

The integrative model shows high accuracy (quantified by AUC estimates) in classifying PD cases and healthy controls when parameter estimates are applied to additional case-control studies in a cross-sectional manner (Table 2, Figure 4). AUC values of 0.894 in PDBP and 0.955 in 23andMe were noted. In PDBP, the UPSIT-only model has an AUC of 0.881 (95% CI = 0.850–0.912) compared with an AUC of 0.894 (95% CI = 0.867–0.921) for the integrative model; the integrative model is significantly more powerful than the UPSIT-only model when applied to PD cases and controls in PDBP (|z| = 2.154, p-value = 0.03127). Of interest, the AUC of the integrative model is slightly lower than that of the UPSIT-only model in the 23andMe cohort. This decrease is not statistically significant (DeLong’s test, p > 0.05) and could be a function of recruitment enriched for LRRK2 risk carriers in this study subset and the relatively small sample size. PARS was used as a positive control because recruitment of cases (and at-risk participants) included UPSIT score as a consideration, which biases estimates in that regard and introduces a degree of circularity (AUCs 0.962 and 0.998 in at-risk and PD, respectively).

Figure 4. Probability of Parkinson’s disease.

Density plots of the predicted probability of Parkinson’s disease estimated from the integrative model by participant status. Participant-level data from all studies analyzed except for 23andMe were included in this figure. Densities were smoothed using a Gaussian kernel and no participants had probability estimates of zero.

To further validate the applicability of this integrative classifying model, we attempted to predict PD case status in the Penn-Udall and LABS-PD datasets using the parameters trained in the PPMI dataset. The integrative model was able to predict cases accurately, at a rate of 92.9% correct predictions in LABS-PD (N = 222/239) and 93.9% correct predictions in the Penn-Udall (N = 92/98) dataset. There is also a slight decrease in classification accuracy for the integrative model in the Penn-Udall dataset compared to the UPSIT-only model, likely due to the small sample size (Table 2).

In PPMI, applying the integrative model to SWEDD participants and shared controls does not generally classify the SWEDD participants as PD (AUC = 0.707, 95% CI = 0.630 – 0.783). In addition, only 69.2% of the SWEDD participants in the LABS-PD (N = 9/13) cohort are actually classified as PD by the integrative model, showing many SWEDD participants are actually etiologically distinct from PD cases with regard to all modeled factors, not just with regard to functional imaging.

Strikingly, our integrative model classified the SWEDD participants in a bimodal distribution, suggesting that this group represents not a distinct disease entity but rather a heterogeneous mixture of typical PD cases and controls (Figure 4). In PPMI, a second round of longitudinal DaT scanning is underway. So far, the available DaT scanning has identified five out of 55 SWEDD participants who, at 1–2 years after baseline enrollment, show evidence of dopaminergic dysfunction and would therefore no longer be classified as SWEDD cases. At baseline, the integrative model definitively classified four of these five participants as PD cases (probabilities of being a PD case 0.868, 0.907, 0.995 and 0.997), while the remaining one was on the cusp of classification as a PD case (slightly below our threshold with a predicted PD probability of 0.651). Twelve additional SWEDD participants in PPMI fall above the PD risk threshold using the same cut-off of 0.651 but show no evidence of dopaminergic dysfunction after 1–2 years of follow-up. A simple test of proportions suggests that this enrichment of latent dopaminergic deficit (referring to the process in which SWEDD becomes typical PD during follow-up imaging by showing evidence of dopaminergic dysfunction) among the SWEDD participants algorithmically classified as PD is not likely due to chance; we note significant differences when comparing the rate of dopaminergic deficit noted during follow-up in SWEDD participants above (4 out of 17) versus those below (1 out of 38) our PD classification threshold (p-value = 0.003).

DISCUSSION

In this report we have described an accurate, non-invasive, and low-cost method for classifying PD cases and controls. We acknowledge that the studies included here vary in their design, recruitment and implementation; however, the results and model validation suggest future utility. The strength and utility of this model lie in its high classification accuracy (AUC near 0.9 or higher) and ease of implementation. The application of this model will likely be useful in refining phenotypes in large-scale research studies, such as identifying SWEDD participants that overlap with PD in the spectrum of predicted risk. Follow-up in the SWEDD cohort as additional DaT scanning data becomes available will help inform the ability of the integrative classifying model to refine PD phenotyping in longitudinal research studies. This data generally suggests that within a study, misclassified cases may require additional, more detailed follow-up as they may not be an accurate representation of etiologically typical PD and their inclusion may possibly have a negative impact on future biomarker or interventional studies. One key element in the application of clinical studies and etiologic based trials likely lies in the accurate identification of a group of patients with a homogeneous disease. One major group that causes concern in this regard is the SWEDDs, who typically represent 15% of a clinically acquired cohort but do not show evidence of dopaminergic deficit when scanned. In addition, this model shows utility in facilitating a diagnostic path towards more accurate preclinical detection of PD, as here we present a model that can be adapted to disease prediction within populations, although this aspect will require follow-up studies in prospective cohorts.

While we have validated this classification model in three case-control studies (with one study, PARS, as a positive control) and two case-only studies of PD, we hope to improve the accuracy of this model by identifying more disease-specific biomarkers and genetic risk loci as well as resequencing known loci to generate more accurate estimates of genetic risk. Of the genetic risk variants used to create our GRS, 93.3% (N = 28) are from GWAS and are most likely surrogates for true functional variant(s) due to the inherent nature of imputation-based GWAS. Identifying the true functional variants within loci will only improve our algorithmic classification and prediction of PD. Resequencing studies of genetic loci underway across the world now will help to refine genetic aspects of this predictive model in the near future. In addition, we hope to expand the model in the future to increase accuracy as more data is being accumulated in our training and validation datasets, particularly within PPMI and PDBP. In this report, we have shown that in these two larger studies, our integrative model outperforms previous efforts and significantly outperforms its components when they are assessed independently. In 23andMe and PARS, where there was differential or targeted recruitment, hyposmia by itself had such a high AUC that adding other factors produced no significant change in the model.

This is a PD-specific classification model, incorporating the classifying power of the UPSIT score, a known proxy for generalized neurodegeneration, in concert with PD-specific factors of family history and the GRS. We illustrate how this model is focused towards identifying typical PD, as evidenced by its bimodal classification of SWEDD participants. It is our intention to expand this model to other neurodegenerative diseases, incorporating multiple disease-specific genetic risk profiles and family histories. This could be of particular future value in minimizing misdiagnosis in conditions such as frontotemporal dementia, multiple system atrophy, and/or dementia with Lewy bodies, as well as defining preclinical stages of Alzheimer’s disease which could be expansions of the current model.

A shortcoming of this analysis is the lack of diverse continental ancestries in sufficient numbers for predictive modeling, as this study was conducted using participants of genetically ascertained European ancestry, and some genetic heterogeneity across continental ancestries may exist with regard to PD risk factors. To address this issue, we hope to begin building cohorts of adequate size to investigate PD risk in more diverse populations. This next step will help to refine and improve our predictive models and make them more globally applicable. Another shortcoming of note is the use of age-dependent factors, particularly hyposmia, which is quite common in old age and may affect model performance in very aged populations outside of the ages in this study. The high proportion of cases with a family history seen here may be a potential source of bias not seen in some population-based studies.

Currently, this integrative model has limited application as a general screening tool for PD. At the low estimated PD prevalence of 2% in many aged populations over 60 years, even with an AUC of 0.923, the integrative model would likely falsely identify six individuals as potential PD cases for every one true case 19. At this prevalence there is also a low PPV of 0.149. However, further application of Bayes’ theorem suggests that if the prevalence were 10%, there would be one false classification of PD for every true case of PD detected (PPV 0.489). If prevalence in a population were 20%, there would be one false classification of PD for every two correct classifications (PPV 0.682). This illustrates that the integrative model we propose in this paper may have maximal utility for identifying PD in high-risk populations, e.g. among a sample of people with symptoms or other features that may suggest the onset of PD, even though PD criteria are not yet met. Conversely, the positive predictive value would be low if the model were to be used as a screening test for the general public or by a medical practitioner in a routine clinical setting.
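
The dependence of PPV on prevalence follows directly from Bayes’ theorem, and the arithmetic can be checked with a short sketch. The sensitivity (0.834) and specificity (0.903) are the values reported for PPMI at the 0.655 threshold; the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | classified as PD), from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported for the integrative model in PPMI.
SENSITIVITY = 0.834
SPECIFICITY = 0.903

for prevalence in (0.02, 0.10, 0.20):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    false_per_true = (1.0 - ppv) / ppv  # false classifications per true case
    print(f"prevalence = {prevalence:.2f}  PPV = {ppv:.3f}  "
          f"false classifications per true case = {false_per_true:.1f}")
```

Rounded to three decimals, the computed PPVs are 0.149, 0.489, and 0.682 at prevalences of 2%, 10%, and 20%, in agreement with the figures quoted above.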

The future of this research should be directed towards the evolution of predictive and classifying models based on data from prospective studies. Such data will allow us to assess, modify and reassess predictive models using temporally-developed information, rather than simulations based on retrospective statistics which may operate on too many assumptions 20. It is our expectation that models can be refined, evaluated, and tuned for varying rates of disease progression in established patients as datasets grow in size and depth of information content, then evaluated and validated further in prospective cohort studies. We acknowledge that basing this model on cross-sectional case-control data from PPMI may have influenced results to be slightly conservative, particularly with regard to the predictive power of the age and sex parameters in such a well matched study, but the strength of the imaging-confirmed diagnoses likely contributes to the model’s classification accuracy, helping to avoid misdiagnosis. Through well-phenotyped prospective studies we can hopefully refine and extend this work to identify a viable timeframe for accurate pre-diagnostic screening. Another clear area of interest is the application of this model to broader neurodegenerative diseases, including atypical PD. It is our belief that this type of integrated model will ultimately afford us the ability to diagnose the specific type of neurodegeneration in a population.

As a companion to this report for facilitating both replication and extension of our work by interested researchers, the code and training data for this predictive model as well as some validation data have been made publicly available online. Data and code from PPMI and PDBP are available for download from http://www.ppmi-info.org/ and https://pdbp.ninds.nih.gov/. Data from other studies (except 23andMe) may be made available via correspondence with the authors involved.

ONLINE METHODS

Supporting Text: Online methods and consortia members

Participating studies

Parkinson’s Progression Marker Initiative (PPMI)

The Parkinson’s Progression Marker Initiative is an observational multi-center, international study sponsored by the Michael J. Fox Foundation for Parkinson’s Research and partially funded by 17 industry sponsors. The overall goal of the study is to identify and validate biomarkers of PD progression. The study population consists of untreated, recently diagnosed PD patients (423), healthy controls of similar age and gender (196), and subjects screened as potential PD subjects but whose dopamine transporter imaging scans, measured by DaTSCAN®, were without evidence of dopaminergic deficit (SWEDD). Participants enrolled in PPMI undergo a series of longitudinal assessments, including standardized functional assessments, neuroimaging, and biofluid collection (including DNA from whole blood). In this report, the PPMI cohort is divided into three groups of participants: healthy controls (HC), PD subjects and SWEDD subjects. PD subjects in PPMI are required to demonstrate at least asymmetric resting tremor or asymmetric bradykinesia or some combination of bradykinesia, resting tremor and/or rigidity within two years of diagnosis 1. They must be untreated for PD at the time of enrollment, as well as for the prior two years. PD cases must have an abnormal DaTSCAN® indicating dopamine transporter deficit. SWEDD participants meet the same clinical criteria as PD cases, but their DaT scanning data do not show an abnormal DaTSCAN®. HC samples are clinically defined as having no known neurologic dysfunction and a Montreal Cognitive Assessment (MoCA) > 26 2. The PD phenotype in this analysis excludes known SWEDD participants. The exclusion of SWEDD participants from the PD model allows us to focus our efforts on more etiologically typical PD as defined by the clinical diagnosis and DaT scanning data. All data in PPMI is available to qualified investigators via http://www.ppmi-info.org/.

Parkinson’s Disease Biomarkers Program (PDBP)

Established in November 2012 by the National Institute of Neurological Disorders and Stroke (NINDS), the PDBP seeks to identify and develop potential PD biomarkers, ideally for use in clinical trials of neuroprotective agents. The PDBP includes four key components: 1) biomarker hypothesis testing and collection of clinical data and biospecimens, 2) studies to identify novel PD biomarkers, 3) biospecimen banking and distribution, and 4) data management through the Data Management Resource (DMR). The application of these goals has resulted in the establishment of a self-structured consortium consisting of 11 unique projects, 6 of which actively enroll participants. Consortium-wide protocols ensure standardization of data collection and biospecimen processing. A standard set of clinical assessments and biospecimen collection procedures are used for all participants and specified by the NINDS (see RFA NS-12-011). These clinical assessments were chosen based on the NINDS Common Data Elements (see http://www.commondataelements.ninds.nih.gov) as well as for overlap with assessments used in BioFIND and PPMI. The PDBP has enrolled over 1,000 participants to date including participants with PD based on clinical criteria, neurologically normal controls, and other individuals with parkinsonism not meeting typical PD criteria as defined by the UK Parkinson’s Disease Society Brain Bank Clinical Diagnostic Criteria 3. All sites use the DMR to record data; storage of biospecimens and quality control analysis are performed by NINDS Repository Laboratories. To access data and more information regarding PDBP, please refer to https://pdbp.ninds.nih.gov/.

Parkinson’s Associated Risk Study (PARS)

The PARS study is a prospective study aiming to test a sequential biomarker strategy to identify subjects at high risk for developing motor symptoms of PD. The study enrolled approximately 10,000 individuals to examine their risk profile for PD. Individuals over age 60 completed olfactory testing by mail, using the University of Pennsylvania Smell Identification Test (UPSIT), to identify a population of hyposmic and normosmic subjects for more intensive clinical and biomarker testing, including dopamine transporter imaging. The study has demonstrated that hyposmic subjects have an 11% risk of marked dopamine transporter deficit. During four-year follow-up, approximately 60% (14 subjects) of subjects identified with hyposmia and dopamine transporter deficit have developed motor PD. The PARS study enrollment and follow-up is ongoing and more information is available at http://www.parsinfosource.com/. We treated PARS as a positive control since hyposmia, defined by a low UPSIT score, and to a lesser degree family history were criteria for recruitment of at-risk individuals. We do not regard this as a true validation but more of a theoretical proof of concept for our classification model that was developed and trained on PPMI data.

23andMe

The 23andMe Parkinson’s Disease cohort was described in detail previously 4. Briefly, patients with PD were recruited through a targeted email campaign in conjunction with the Michael J. Fox Foundation, The Parkinson’s Institute and Clinical Center, and numerous other PD patient groups and clinics. As part of a Michael J. Fox Foundation-funded study of PD biomarkers focusing mainly on individuals harboring at least one LRRK2 p.G2019S allele, 20 individuals from the 23andMe Parkinson’s Disease cohort without LRRK2 p.G2019S and 20 healthy controls underwent blood draws and completed the UPSIT. Additional phenotypic information was obtained through online questionnaires and classification as cases and controls was performed as described previously. The 23andMe study protocol and consent were approved by the external Association for the Accreditation of Human Research Protection Programs, Inc. accredited Institutional Review Board, Ethical and Independent Review Services. Our consent and privacy statement preclude sharing of individual-level data without explicit consent.

The Longitudinal and Biomarker Study in PD (LABS-PD)

The Longitudinal and Biomarker Study in PD is an observational study designed to prospectively measure the evolution of motor and non-motor features of PD and identify promising biomarkers of progression from early to late stages of the disease. Study participants had previously been enrolled in a controlled clinical trial of a mixed lineage kinase inhibitor in early, untreated PD (PreCEPT); the average duration of illness was less than 1 year, and subjects did not require dopaminergic therapy. Many of the original trial participants were subsequently enrolled in a follow-up study (PostCEPT) and later participated in a longitudinal clinical assessment program for biomarker development in PD, with annual visits and remote follow-up. As part of the PreCEPT and PostCEPT studies, subjects underwent DAT imaging; SWEDD participants were identified as per PPMI.

Morris K. Udall Parkinson’s Disease Research Center of Excellence (Penn-Udall)

The NINDS-funded Penn-Udall Center was launched at the Perelman School of Medicine (Penn) at the University of Pennsylvania in 2007 (P50 NS062684 Pacific Northwest Udall Center, Zabetian PI and P50 NS053488, Trojanowski JQT-PI). The overarching goals of the Penn Udall Center are to elucidate mechanisms of disease progression and alpha-synuclein transmission through synergistic collaborations between basic and translational research.

The Clinical Core of the Penn Udall Center recruits patients with PD, PDD and DLB to participate in a longitudinal battery of neuropsychological testing and to donate plasma, whole blood for DNA, cerebrospinal fluid, structural and functional brain imaging and post-mortem brain tissue. The specific goal of the biomarker collection effort is to improve the ability to predict whether an individual patient with PD is likely to develop significant cognitive decline. To date we have enrolled over 300 patients in longitudinal neuropsychological testing, and of those over 100 have participated in complete biomarker collection, including blood for DNA.

Assignment of a cognitive diagnosis is made for each patient at baseline and at every annual or biannual visit during a consensus conference held every six months by movement disorders specialists affiliated with the Penn Udall Center. A participant is discontinued from the study if assigned a diagnosis of dementia in two consecutive years; for the purpose of these analyses, this diagnosis is carried forward if the patient is still alive at the time that a future visit would have occurred.

Factors included in the integrative model

Our integrative classifying model utilizes information from UPSIT, GRS, age, gender and family history to estimate risk. None of the factors included in our integrative classifying model are used in the clinical diagnosis of PD. Also, none of the cohorts assessed by this model were responsible for the discovery of the factors included in the classifying model. In addition, no factors in the integrative classifying model were used as part of recruitment for any cohorts except for one study (discussed below), which was used as a positive control in validating our model. This allows us to avoid circularity, overestimation, and overfitting of the classification model. Please see Table 1 for descriptive statistics of the participating studies.

The UPSIT is a commercially available test that uses smell identification to test the function of an individual’s sense of smell 5. It is the gold standard of smell identification tests for its reliability and practicality. Smell dysfunction is known to occur in several neurodegenerative disorders, including in PD, and has been suggested as a potential biomarker 6,7. Because of the rich clinical data available in the studies participating in this project, we were able to include UPSIT data in our analysis.

GRS were calculated by summing the risk allele counts for the 28 common risk loci identified and replicated in the most recent large-scale meta-analysis of PD genome-wide association study (GWAS) data, as well as including two additional relatively rare risk variants detected within PPMI known to be associated with PD (p.N370S in GBA and p.G2019S in LRRK2) 8,9,10,11,12,13. In PPMI, we see expected frequencies of G2019S and N370S variants in PD cases (1.3% and 1.9% respectively) and even find one N370S in controls, with the inclusion of these variants improving the AUC of the GRS by ~1% over previous efforts only focusing on the 28 independent common risk variants [unpublished data]. Prior to summing the risk allele counts, all allele counts per variant were scaled by their log odds ratios. The effect estimates for the 28 common variants were extracted from Nalls et al., 2014, and odds ratios of rare alleles at GBA p.N370S (3.33) and LRRK2 p.G2019S (9.620) were taken from the PDgene database and 23andMe [www.pdgene.org and www.23andme.com]14. In all cohorts except 23andMe, after the scaled risk allele counts were summed and divided by the number of loci, they were transformed into Z scores using the healthy controls in PPMI as a reference. This aids in communicating effect estimates, with Z corresponding to a single standard deviation from the control mean genetic risk of PD. This method for calculating risk scores mirrors that in the software package PLINK [http://pngu.mgh.harvard.edu/~purcell/plink/profile.shtml]15. Due to the slightly different study design and genotyping in the 23andMe cohort, imputed dosages of risk alleles were summed and divided by the number of loci and then transformed into Z scores using a reference set of 334,839 unrelated European individuals who self-reported as not having PD. 
Information regarding variants and effect estimates used in generating the GRS is included in the underlying data available for download by interested researchers. As a note, the GRS was used instead of single SNP estimates to improve power, as these associations were initially discovered in over 100,000 samples and effect estimates of single variants would not be accurate in cohorts of less than 1,000 samples as in this study.
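
The score construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the three-locus genotypes, and the first odds ratio (1.20) are hypothetical, while the two rare-variant odds ratios are the values quoted in the text. The sample standard deviation is assumed for the Z transformation.

```python
import math
from statistics import mean, stdev

def weighted_score(allele_counts, odds_ratios):
    """Sum risk-allele counts scaled by log odds ratio, divided by the number of loci."""
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per locus")
    total = sum(count * math.log(ratio) for count, ratio in zip(allele_counts, odds_ratios))
    return total / len(odds_ratios)

def to_z_scores(scores, reference_scores):
    """Standardize scores against the mean and SD of a reference (control) group."""
    mu = mean(reference_scores)
    sd = stdev(reference_scores)  # sample SD; an assumption of this sketch
    return [(score - mu) / sd for score in scores]

# Hypothetical three-locus example: one common variant (OR 1.20, invented)
# plus GBA p.N370S (OR 3.33) and LRRK2 p.G2019S (OR 9.62) as quoted in the text.
odds_ratios = [1.20, 3.33, 9.62]
control_genotypes = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [1, 0, 0]]
case_genotypes = [[2, 1, 0], [1, 0, 1]]

control_scores = [weighted_score(g, odds_ratios) for g in control_genotypes]
case_scores = [weighted_score(g, odds_ratios) for g in case_genotypes]
case_z = to_z_scores(case_scores, control_scores)
print(case_z)
```

A Z score of 1 then reads as one standard deviation above the mean genetic risk of the reference controls.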

Of the 40 total 23andMe samples used here, 31 were used to discover the loci comprising the GRS. However, the influence of these overlapping samples is trivial due to the immense size of the GWAS discovery efforts. Although LABS-PD was used in the replication phase of analyses for some loci, these comprise a negligible amount of the sample series contributing to the previous GWAS. Of note, the PPMI, PARS, Penn-Udall and PDBP cohorts did not have genetic data available at the time of this most recent GWAS effort used to identify and replicate the loci comprising the score and therefore were not included in that effort either.

Gender, age (at onset for cases and at last exam for controls) as well as family history were all self-reported information. To clarify, we regard family history as self-report of a first or second degree relative with a diagnosis of PD. When applicable, medical records were used to corroborate this information.

Genetic data

Genotypic data for all studies except Penn-Udall and 23andMe were generated at the National Institute on Aging’s Laboratory of Neurogenetics using the NeuroX genotyping array available from Illumina Inc. Penn-Udall genotypic data was generated using the NeuroX array at the Center for Applied Genomics in Philadelphia, PA. For a detailed description of the NeuroX array, its content, and genotype calling methods, please see previously published work 16. Quality control methods for the NeuroX genotyped samples are described in detail elsewhere 17. In brief, all NeuroX genotyped samples met the following inclusion criteria: per variant and per sample missingness < 5%; concordance between self-reported gender and genetically ascertained gender; no first or second degree relatives within each dataset from self-report and genetically ascertained relationships based on common polymorphisms; and European ancestry from self-report and genetic confirmation when compared to known reference populations 18.
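
As a rough illustration of the missingness criterion only, the sketch below drops variants and then samples whose proportion of missing calls is 5% or more. The data layout (a dictionary of per-sample call lists with `None` for a missing call), the function names, and the order of the two filters are assumptions of this sketch; the published quality control is described in the cited work.

```python
def missingness(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(call is None for call in calls) / len(calls)

def filter_by_missingness(genotypes, max_missing=0.05):
    """genotypes maps sample ID -> list of calls (0, 1, 2 or None), one per variant.

    Returns (kept sample IDs, kept variant indices). Variants are filtered
    first, then samples are assessed on the retained variants.
    """
    n_variants = len(next(iter(genotypes.values())))
    kept_variants = [
        j for j in range(n_variants)
        if missingness([calls[j] for calls in genotypes.values()]) < max_missing
    ]
    if not kept_variants:
        return [], []
    kept_samples = [
        sample for sample, calls in genotypes.items()
        if missingness([calls[j] for j in kept_variants]) < max_missing
    ]
    return kept_samples, kept_variants

# Hypothetical toy data: the third variant is missing in two of three samples.
toy = {"s1": [0, 1, None], "s2": [1, 2, None], "s3": [2, 0, 1]}
kept_samples, kept_variants = filter_by_missingness(toy)
print(kept_samples, kept_variants)
```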

Genotypic data for the 23andMe cohort were generated by National Genetics Institute (NGI), a Clinical Laboratory Improvements Amendments (CLIA) certified clinical laboratory and subsidiary of Laboratory Corporation of America. Samples were genotyped on either the Illumina HumanHap550+ BeadChip platform (n=10) or the Illumina HumanOmniExpress+ BeadChip platform (n=30), as described previously 19. Every sample that did not reach a 98.5% call rate for SNPs on the standard platforms was reanalyzed. Individuals whose analyses repeatedly failed were contacted by 23andMe customer service to provide additional samples.

Model generation

We selected known non-invasive risk factors for PD to train our classification framework in the PPMI cohort. In this model, we selected three time-independent factors: family history of PD from self-report in 1st or 2nd degree relatives, female gender, and a GRS; we also selected two time-dependent factors: age (onset in cases and most recent exam in controls) and total UPSIT score. The UPSIT is scored by summing all correctly-identified odors out of the 40 items within the UPSIT test booklets. The odors are culturally adjusted for familiarity. All five factors were entered into a logistic regression model to generate estimates of case probability used to classify samples in PPMI as cases or controls at baseline. In PPMI we regard baseline as the first clinic visit for controls at enrollment, or the initial exam at recruitment that is roughly concurrent with that participant’s PD diagnosis. Receiver operator characteristic (ROC) curves were used to quantify accuracy of the classification within the cohort. This probabilistic model was then applied to all other studies after training on PPMI. As a note, this model was trained on PPMI excluding known SWEDD samples. All parameters in the integrative classifying model contribute to the overall information content of the model based on Akaike information criteria (AIC) and surviving backwards and forwards stepwise modeling in PPMI20. We term this more complex model the “integrative model” in this report. Additional classifying models outside of the integrative model described above were created to estimate the accuracy of using only UPSIT, only the GRS, and only demographic factors (i.e. family history, age and female gender). Only AIC was used for model pruning and due to power concerns in small sample sizes, interactions were not incorporated into model generation. 
Standardized beta-coefficients were generated within the integrative model to compare the overall effect sizes of factors in the integrative model using PPMI data.
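
To make the classification step concrete, the following sketch shows how a fitted logistic model turns the five factors into a case probability, and how an AUC can be computed from such probabilities. The intercept, coefficients, and example participants are hypothetical placeholders, not the estimates obtained in PPMI, and the pairwise AUC shown here is a simple stand-in for the ROC software used in the analysis.

```python
import math

def case_probability(intercept, coefficients, features):
    """Logistic model: convert a weighted sum of predictors into a probability."""
    linear = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

def auc(case_probs, control_probs):
    """Probability that a random case scores above a random control (ties count half)."""
    wins = 0.0
    for c in case_probs:
        for k in control_probs:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_probs) * len(control_probs))

# Hypothetical values for illustration only; these are NOT the published estimates.
INTERCEPT = 5.0
COEFFICIENTS = {"upsit": -0.25, "grs_z": 0.5, "family_history": 0.8, "age": 0.03, "female": -0.4}

case_like = {"upsit": 18, "grs_z": 1.0, "family_history": 1, "age": 62, "female": 0}
control_like = {"upsit": 35, "grs_z": -0.2, "family_history": 0, "age": 60, "female": 1}

p_case = case_probability(INTERCEPT, COEFFICIENTS, case_like)
p_control = case_probability(INTERCEPT, COEFFICIENTS, control_like)
print(round(p_case, 3), round(p_control, 3))
```

An AUC of 0.5 corresponds to chance-level separation of cases and controls, and 1.0 to perfect separation.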

Internal validation within PPMI

After initial specification of the predictive model in PPMI, we used resampling to validate the model within the cohort in silico. We resampled the PPMI PD and control samples over 10,000 iterations, generating parameters (beta-coefficients) on a randomly assigned training subset and fitting the five predictors of the integrative model to a randomly assigned validation subset to calculate AUCs. In this analysis, cases and controls were equally split at random between training and test datasets per iteration. In addition, we also modified this workflow to run a backwards stepwise pruning of the integrative model on the training subset, then fit this model (which uses 2–5 parameters based on the subset in question) to the validation subset and calculated AUCs for an additional 10,000 iterations. In this phase of analysis, subsets were approximately equal in size and partitioned for each iteration of the resampling using a random number generator to assign a sample to either the training or validation subset. For each iteration, the AUC was calculated based on training parameters derived from a randomly-generated subset and fitted to the corresponding validation subset, with no sample overlap with the training set.
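
The resampling scheme can be sketched as follows. This is a simplified illustration under stated assumptions: a single simulated predictor stands in for the five-factor model, "training" only learns the direction of effect, and 200 iterations replace the 10,000 used in the analysis. All names and the simulated data are hypothetical.

```python
import random
from statistics import mean

def split_half(labels, rng):
    """Randomly assign samples to training or validation, halving cases and controls."""
    train, valid = [], []
    for status in (0, 1):
        indices = [i for i, y in enumerate(labels) if y == status]
        rng.shuffle(indices)
        half = len(indices) // 2
        train.extend(indices[:half])
        valid.extend(indices[half:])
    return sorted(train), sorted(valid)

def rank_auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def resample_auc(x, labels, iterations, seed=0):
    """Repeat the split, train on one half, and score the other half."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(iterations):
        train, valid = split_half(labels, rng)
        # "Training": learn only the direction of effect on the training half.
        case_mean = mean([x[i] for i in train if labels[i] == 1])
        control_mean = mean([x[i] for i in train if labels[i] == 0])
        direction = 1.0 if case_mean >= control_mean else -1.0
        # "Validation": score the held-out half, which shares no samples with training.
        aucs.append(rank_auc([direction * x[i] for i in valid], [labels[i] for i in valid]))
    return aucs

# Simulated data: 40 cases and 20 controls with a shifted predictor (hypothetical).
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 20
x = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
aucs = resample_auc(x, labels, iterations=200)
print(round(mean(aucs), 3))
```

The distribution of the resulting AUCs, rather than any single split, is what indicates how stable the classification is within the cohort.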

Calibration of the model within PPMI

To evaluate the calibration of the model in PPMI, we used the Hosmer-Lemeshow test21. The Hosmer-Lemeshow test itself has a weakness of erratic results across small sampling groups so we used a variety of sampling scenarios within PPMI. We first evaluated calibration by partitioning the data into 5, 10, 25, 50, and 100 groups and then running the calibration test. Next, we repeated tests for all possible values between 5–100 groups and evaluated the distribution of the test statistics.
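
For readers unfamiliar with the test, a minimal sketch of the Hosmer-Lemeshow statistic for a chosen number of groups is given below. It assumes predicted probabilities strictly between 0 and 1 and equal-sized groups formed after sorting by predicted probability. The function name is hypothetical, and the p-value step (a chi-square distribution with groups minus two degrees of freedom) is left out because it needs a statistics library.

```python
def hosmer_lemeshow(probabilities, outcomes, groups):
    """Return the Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(outcome for _, outcome in chunk)   # observed cases in the group
        expected = sum(prob for prob, _ in chunk)         # expected cases in the group
        # Squared deviations for cases and for non-cases, each scaled by its expectation.
        statistic += (observed - expected) ** 2 / expected
        statistic += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return statistic, groups - 2

# Hypothetical example: two groups whose observed case counts match expectation.
probs = [0.2] * 5 + [0.8] * 5
observed_outcomes = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
stat, df = hosmer_lemeshow(probs, observed_outcomes, groups=2)
print(stat, df)
```

Repeating the call for every group count from 5 to 100 would give the distribution of test statistics described above.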

Additional external validation

For all studies with available data, we fit the parameter estimates trained on the entire PPMI dataset to evaluate the applicability of the integrative model’s classification algorithm. We also compared the accuracy of the integrative model in the SWEDD subset of PPMI, which used controls shared with the training set, but was not used to do any additional training of the algorithm. Because the AUC could not be calculated for case-only studies using ROC (i.e. the LABS-PD and Penn-Udall studies), we quantified accuracy by the proportion of PD case classifications using an optimal prediction threshold taken from the ROC curve of the training set. Best prediction thresholds maximizing combined sensitivity and specificity were: 0.675 for the demographic model, 0.574 for the UPSIT model, 0.639 for the GRS model, and 0.655 for the integrative model.
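
The threshold logic can be illustrated with a short sketch. It assumes candidate cut-offs are taken from the observed probabilities themselves, which may differ from how the ROC software picks its "best" point. The function names and all probabilities are hypothetical; only the 0.655 threshold comes from the text.

```python
def best_threshold(case_probs, control_probs):
    """Pick the cut-off that maximizes sensitivity + specificity in the training set."""
    best_cut, best_sum = None, -1.0
    for cut in sorted(set(case_probs) | set(control_probs)):
        sensitivity = sum(p >= cut for p in case_probs) / len(case_probs)
        specificity = sum(p < cut for p in control_probs) / len(control_probs)
        if sensitivity + specificity > best_sum:
            best_cut, best_sum = cut, sensitivity + specificity
    return best_cut

def proportion_classified_as_cases(probabilities, threshold):
    """Share of samples whose predicted probability reaches the threshold."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)

# Hypothetical training probabilities.
train_cases = [0.9, 0.8, 0.7, 0.4]
train_controls = [0.6, 0.3, 0.2, 0.1]
cut = best_threshold(train_cases, train_controls)

# Hypothetical case-only cohort scored at the integrative-model threshold from the text.
case_only = [0.70, 0.66, 0.50, 0.90]
share = proportion_classified_as_cases(case_only, 0.655)
print(cut, share)
```

In a case-only cohort this proportion plays the role of sensitivity, since no controls are available to estimate specificity.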

Software note

Details on the generation of raw genotypes from the NeuroX array can be found elsewhere including the manual clustering of PD risk loci and the general Illumina-based pipeline. PLINK was used for management of raw genotype data. All downstream statistical analyses outside of the 23andMe dataset were carried out using R 3.0.2 on Ubuntu Linux 14.04 22,23. R packages used include ggplot2, pROC, QuantPsyc, ResourceSelection, and scales 24,25,26,27,28.

Supplementary Material

1. Supporting Figure 1.

Panel B is the distribution of Hosmer-Lemeshow p-values for the integrative model in PPMI for 5–100 randomly generated groupings to test calibration in a variety of scenarios. A p-value of < 0.05 would suggest that bias was detected in that subset of the data. The histogram of the p-value distribution from the Hosmer-Lemeshow tests has been overlaid with a line showing the Gaussian kernel smoothed density to better communicate the distribution. In this panel, p-values ranged from 0.055 to 0.876, with a mean of 0.471 (SD = 0.193) and a median of 0.480 (inter-quartile range = 0.310–0.624).

2

Supporting Table 1. Details of the stepwise regression. Logistic regression was used as the basis for the integrative predictive model trained on the PPMI dataset. Parameter estimates are sorted in descending order of Akaike information criterion.

Supporting Table 2. Variants included in the GRS calculations as they appear in the PPMI and PDBP NeuroX genotype datasets.

Figure 1. Details of methods and workflow.

Orange boxes denote steps of the workflow specific to the PPMI study while the purple box denotes validation phase analyses in additional sample series outside of PPMI. Abbreviations include: Parkinson’s Progression Marker Initiative (PPMI), Parkinson’s Disease Biomarkers Program (PDBP), Parkinson’s Associated Risk Study (PARS), 23andMe, Longitudinal and Biomarker Study in PD (LABS-PD), and Morris K. Udall Parkinson’s Disease Research Center of Excellence (Penn-Udall), University of Pennsylvania Smell Identification Test (UPSIT), genetic risk score (GRS), subjects with dopamine transporter imaging scans showing no evidence of dopaminergic deficit (SWEDD), area under the curve (AUC).

Acknowledgments

FUNDING

This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov), and DNA panels, samples and clinical data from the National Institute of Neurological Disorders and Stroke Human Genetics Resource Center DNA and Cell Line Repository, human subjects protocol 2003-077. This work was supported in part by the Intramural Research Program of the National Institute on Aging and National Institute of Neurological Disorders and Stroke (project numbers Z01-AG000949-02). PPMI is supported by the Michael J. Fox Foundation for Parkinson’s Research (MJFF) and is co-funded by MJFF, Abbvie, Avid Radiopharmaceuticals, Biogen Idec, Bristol-Myers Squibb, Covance, Eli Lilly & Co., F. Hoffman-La Roche, Ltd., GE Healthcare, Genentech, GlaxoSmithKline, Lundbeck, Merck, MesoScale, Piramal, Pfizer and UCB. LABS-PD was supported by the National Institute for Neurological Disorders and Stroke, Department of Defense Neurotoxin Exposure Treatment Parkinson’s Research Program, Michael J Fox Foundation, Parkinson’s Disease Foundation, Cephalon Inc./Teva, and Lundbeck. PDBP sample and clinical data collection is supported under the following grants by NINDS: U01NS082134, U01NS082157, U01NS082151, U01NS082137, U01NS082148, U01NS082133. The University of Pennsylvania Udall center is supported by NINDS under grant P50NS053488 and P50 NS062684. Alastair J. Noyce is funded in part by Parkinson’s UK (Career Development Award reference F-1201). The 23andMe work was supported in part by the Michael J. Fox Foundation for Parkinson’s Research (Project title “23andMe Blood Collection for LRRK2 Consortium”). 23andMe would like to thank the customers of 23andMe who answered surveys, as well as the employees of 23andMe, who together made a portion of this research possible. All authors would like to thank all study participants for making this possible.

CONSORTIA MEMBERS AND AFFILIATIONS

Parkinson’s Disease Biomarkers Program (PDBP)

Dubois Bowman, Ph.D. Emory University (Atlanta, Georgia) Alice Chen-Plotkin, M.D. University of Pennsylvania (Philadelphia, Pennsylvania) Ted Dawson, M.D., Ph.D. Johns Hopkins University (Baltimore, Maryland) Richard Dewey, M.D. UT Southwestern Medical Center (Dallas, Texas) Dwight Charles German, Ph.D. UT Southwestern Medical Center (Dallas, Texas) Xuemei Huang, M.D., Ph.D. Pennsylvania State University Hershey Medical Center (Hershey, Pennsylvania) Vladislav Petyuk, Ph.D. Battelle Pacific Northwest Laboratories, (Richmond, Washington) Clemens Scherzer, M.D. The Brigham and Women’s Hospital Inc., Harvard Medical School (Boston, Massachusetts) David Vaillancourt, Ph.D. University of Florida, (Gainesville, Florida) Andrew West, Ph.D. University of Alabama, (Birmingham, Alabama) Jing Zhang, M.D., Ph.D. University of Washington (Seattle, Washington)

Parkinson’s Progression Marker Initiative (PPMI)

Steering Committee

Kenneth Marek, MD1 (Principal Investigator); Danna Jennings, MD1 (Site Investigator, Olfactory Core, PI); Shirley Lasch, MBA1; Caroline Tanner, MD, PhD9 (Site Investigator); Tanya Simuni, MD3 (Site Investigator); Christopher Coffey, PhD4 (Statistics Core, PI); Karl Kieburtz, MD, MPH5 (Clinical Core, PI); Renee Wilson, MA5; Werner Poewe, MD7 (Site Investigator); Brit Mollenhauer, MD8 (Site Investigator); Douglas Galasko, MD28; Tatiana Foroud, PhD 16 (Genetics Coordination Core, BioRepository PI); Todd Sherer, PhD6; Sohini Chowdhury6; Mark Frasier, PhD6; Catherine Kopil, PhD6; Vanessa Arnedo6

Study Cores

Clinical Coordination Core: Cynthia Casaceli, MBA5

Imaging Core: John Seibyl, MD 1; Susan Mendick, MPH1; Norbert Schuff, PhD9

Statistics Core: Christopher Coffey, PhD4; Chelsea Caspell 4; Liz Uribe4; Eric Foster 4; Katherine Gloer PhD4; Jon Yankey MS4

Bioinformatics Core: Arthur Toga, PhD10 (Principal Investigator), Karen Crawford, MLIS10

BioRepository: Danielle Elise Smith16; Paola Casalin12; Giulia Malferrari12

Bioanalytics Core: John Trojanowski, MD, PhD13 (Principal Investigator); Les Shaw, PhD13 (Co-Principal Investigator)

Genetics Core: Andrew Singleton, PhD14 (Principal Investigator)

Genetics Coordination Core: Cheryl Halter 16

Site Investigators

David Russell, MD, PhD1 Site; Stewart Factor, DO17; Penelope Hogarth, MD18; David Standaert, MD, PhD19; Robert Hauser, MD, MBA20; Joseph Jankovic, MD21; Matthew Stern, MD13; Lama Chahine, MD13; Shu-Ching Hu, MD PhD22; Samuel Frank, MD23; Claudia Trenkwalder, MD8; Wolfgang Oertel MD35; Irene Richard, MD24; Klaus Seppi, MD7; Eva Reiter, MD7; Holly Shill, MD 25; Hubert Fernandez, MD 26; Anwar Ahmed, MD26; Daniela Berg, MD 27; Isabel Wurster MD27; Zoltan Mari, MD29; David Brooks, MD30; Nicola Pavese, MD30; Paolo Barone, MD, PhD31; Stuart Isaacson, MD32; Alberto Espay, MD, MSc 33; Dominic Rowe, MD, PhD34; Melanie Brandabur MD2; James Tetrud MD2; Grace Liang MD10; Karen Marder36; Jean-Christophe Corvol37; Jose Felix Martí Masso38; Eduardo Tolosa39; Jan O. Aasly40; Nir Giladi41; Leonidas Stefanis42;

Coordinators

Laura Leary1; Cheryl Riordan1; Linda Rees, MPH1; Barbara Sommerfeld, RN, MSN17; Cathy Wood-Siverio, MS17; Alicia Portillo18; Art Lenahan18; Karen Williams3; Stephanie Guthrie, MSN19; Ashlee Rawlins19; Sherry Harlan20; Christine Hunter, RN21; Baochan Tran13; Abigail Darin13; Carly Linder13; Gretchen Todd22; Cathi-Ann Thomas, RN, MS23; Raymond James, RN23; Cheryl Deeley, MSN24; Courtney Bishop BS24; Fabienne Sprenger, MD7; Diana Willeke8; Sanja Obradov25; Jennifer Mule26; Nancy Monahan26; Katharina Gauss27; Kathleen Comyns9; Deborah Fontaine, BSN, MS, RN, GNP, MS 28; Christina Gigliotti28; Arita McCoy29; Becky Dunlop29; Bina Shah, BSc30; Susan Ainscough31; Angela James32; Rebecca Silverstein32; Kristy Espay33; Madelaine Ranola34; Helen M. Santana36; Nelly Ngono37; Elisabet Rezola38; Delores Vilas Rolan39; Bjorg Waro40; Anat Mirlman41; Maria Stamelou42;

ISAB (Industry Scientific Advisory Board)

Thomas Comery, PhD43; Spyros Papapetropoulos, MD, PhD43; Bernard Ravina, MD, MSCE44; Igor D. Grachev, MD, PhD45; Jordan S. Dubow, MD46; Michael Ahlijanian, PhD47; Holly Soares, PhD47; Suzanne Ostrowizki, MD, PhD48; Paulo Fontoura, MD, PhD48; Alison Chalker, PhD49; David L. Hewitt, MD49; Marcel van der Brug, PhD50; Alastair D. Reith, PhD51; Peggy Taylor, ScD52; Jan Egebjerg, PhD53; Mark Minton, MD53; Andrew Siderowf, MD, MSCE54; Pierandrea Muglia, PhD55; Robert Umek, PhD56; Ana Catafau, MD, PhD57

Footnotes

1

Institute for Neurodegenerative Disorders, New Haven, CT

2

The Parkinson’s Institute, Sunnyvale, CA

3

Northwestern University, Chicago, IL

4

University of Iowa, Iowa City, IA

5

Clinical Trials Coordination Center, University of Rochester, Rochester, NY

6

The Michael J. Fox Foundation for Parkinson’s Research, New York, NY

7

Innsbruck Medical University, Innsbruck, Austria

8

Paracelsus-Elena Klinik, Kassel, Germany

9

University of California, San Francisco, CA

10

Laboratory of Neuroimaging (LONI), University of Southern California

11

Coriell Institute for Medical Research, Camden, NJ

12

BioRep, Milan, Italy

13

University of Pennsylvania, Philadelphia, PA

14

National Institute on Aging, NIH, Bethesda, MD

16

Indiana University, Indianapolis, IN

17

Emory University of Medicine, Atlanta, GA

18

Oregon Health and Science University, Portland, OR

19

University of Alabama at Birmingham, Birmingham, AL

20

University of South Florida, Tampa, FL

21

Baylor College of Medicine, Houston, TX

22

University of Washington, Seattle, WA

23

Boston University, Boston, MA

24

University of Rochester, Rochester, NY

25

Banner Research Institute, Sun City, AZ

26

Cleveland Clinic, Cleveland, OH

27

University of Tuebingen, Tuebingen, Germany

28

University of California, San Diego, CA

29

Johns Hopkins University, Baltimore, MD

30

Imperial College of London, London, UK

31

University of Salerno, Salerno, Italy

32

Parkinson’s Disease and Movement Disorders Center, Boca Raton, FL

33

University of Cincinnati, Cincinnati, OH

34

Macquarie University, Sydney Australia

35

Philipps University Marburg, Germany

36

Columbia Medical, New York, NY

37

Pitié-Salpêtrière Hospital, Paris France

38

University of Donostia-Service of Neurology Hospital, San Sebastian, Spain

39

University of Barcelona-Hospital Clinic of Barcelona, Barcelona, Spain

40

Norwegian University of Science and Technology, Trondheim, Norway

41

Tel Aviv Sourasky Medical Center, Tel Aviv, Israel

42

Foundation for Biomedical research of the Academy of Athens, Athens, Greece

43

Pfizer, Inc., Groton, CT

44

Biogen Idec, Cambridge, MA

45

GE Healthcare, Princeton, NJ

46

AbbVie, Abbott Park, IL

47

Bristol-Myers Squibb Company

48

F.Hoffmann La-Roche, Basel, Switzerland

49

Merck & Co., North Wales, PA

50

Genentech, Inc., South San Francisco, CA

51

GlaxoSmithKline, Stevenage, United Kingdom

52

Covance, Dedham, MA

53

H. Lundbeck A/S

54

Avid Radiopharmaceuticals, Philadelphia, PA

55

UCB Pharma S.A., Brussels, Belgium

56

Meso Scale Discovery

57

Piramal Life Sciences, Berlin, Germany

1

Marek, Kenneth et al. “The parkinson progression marker initiative (PPMI).” Progress in neurobiology 95.4 (2011): 629–635.

2

Lindholm, Beata et al. “Prediction of Falls and/or Near Falls in People with Mild Parkinson’s Disease.” PloS one 10 (2015): e0117018–e0117018.

3

Hughes, Andrew J, Susan E Daniel, and Andrew J Lees. “Improved accuracy of clinical diagnosis of Lewy body Parkinson’s disease.” Neurology 57.8 (2001): 1497–1499.

4

Do, Chuong B et al. “Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease.” PLoS genetics 7.6 (2011): e1002141.

5

Doty, Richard L, Richard E Frye, and Udayan Agrawal. “Internal consistency reliability of the fractionated and whole University of Pennsylvania Smell Identification Test.” Perception & Psychophysics 45.5 (1989): 381–384.

6

Lazarini, Françoise et al. “Adult Neurogenesis Restores Dopaminergic Neuronal Loss in the Olfactory Bulb.” The Journal of Neuroscience 34.43 (2014): 14430–14442.

7

Michell, AW et al. “Biomarkers and Parkinson’s disease.” Brain 127.8 (2004): 1693–1705.

8

Nalls, Mike A et al. “Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease.” Nature genetics (2014).

9

International Parkinson’s Disease Genomics Consortium (IPDGC), and Wellcome Trust Case Control Consortium 2 (WTCCC2). “A Two-Stage Meta-Analysis.” Greg Gibson. (2011).

10

Do, Chuong B et al. “Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease.” PLoS genetics 7.6 (2011): e1002141.

11

International Parkinson Disease Genomics Consortium. “Imputation of sequence variants for identification of genetic risks for Parkinson’s disease: a meta-analysis of genome-wide association studies.” The Lancet 377.9766 (2011): 641–649.

12

Sidransky, Ellen et al. “Multicenter analysis of glucocerebrosidase mutations in Parkinson’s disease.” New England Journal of Medicine 361.17 (2009): 1651–1661.

13

Paisán-Ruiz, Coro et al. “Cloning of the gene containing mutations that cause PARK8-linked Parkinson’s disease.” Neuron 44.4 (2004): 595–600.

14

Lill, Christina M et al. “Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database.” PLoS genetics 8.3 (2012): e1002548.

15

Purcell, Shaun et al. “PLINK: a tool set for whole-genome association and population-based linkage analyses.” The American Journal of Human Genetics 81.3 (2007): 559–575.

16

Nalls, Mike A et al. “NeuroX, a fast and efficient genotyping platform for investigation of neurodegenerative diseases.” Neurobiology of aging (2014).

17

Nalls, Mike A et al. “Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease.” Nature genetics (2014).

18

1000 Genomes Project Consortium. “An integrated map of genetic variation from 1,092 human genomes.” Nature 491.7422 (2012): 56–65.

19

Hinds, David A et al. “A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci.” Nature genetics 45.8 (2013): 907–911.

20

Akaike, H. “A new look at the statistical model identification.” 1974. <http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1100705>

21

Lemeshow, Stanley, and David W Hosmer. “A review of goodness of fit statistics for use in the development of logistic regression models.” American journal of epidemiology 115.1 (1982): 92–106.

22

“The R Project for Statistical Computing.” 20 Feb. 2015 <http://www.r-project.org/>

23

“Ubuntu Wiki: Home.” 2005. 20 Feb. 2015 <https://wiki.ubuntu.com/>

24

“ggplot2.” 2012. 20 Feb. 2015 <http://ggplot2.org/>

25

“CRAN - Package pROC.” 2010. 20 Feb. 2015 <http://cran.r-project.org/package=pROC>

26

“CRAN - Package QuantPsyc.” 2009. 20 Feb. 2015 <http://cran.r-project.org/package=QuantPsyc>

27

“CRAN - Package ResourceSelection.” 2011. 20 Feb. 2015 <http://cran.r-project.org/package=ResourceSelection>

28

“CRAN - Package scales.” 2011. 20 Feb. 2015 <http://cran.r-project.org/package=scales>

COMPETING FINANCIAL INTERESTS / CONFLICTS

CYM, MM, CN and EDC are employees of and own stock or stock options in 23andMe, Inc. MFK, CM and DJS are employees of Merck Pharmaceuticals, Inc and hold stocks. AGDay-W is an employee of Biogen and holds stock. MSF, MMap and HJF hold a number of provisional patents related to dementia biomarkers. SJH and MF are employees of the Michael J. Fox Foundation. HM reports grants from Medical Research Council UK, grants from Wellcome Trust, grants from Parkinson’s UK, and grants from Ipsen Fund during the conduct of the study; and grants from Motor Neuron Disease Association, grants from Welsh Assembly Government, and personal fees from Teva, Abbvie, UCB, Boehringer-Ingelheim, and GSK, all outside the scope of the submitted work. No other potential conflicts of interest or disclosures exist.

AUTHOR CONTRIBUTIONS

Concept and design - MAN and ABS; Acquisition of data, data generation and data cleaning - MAN, CYM, JR, SE, SJH, KG, MS, MMar, PH, NW, JH, TG, AB, TRP, MFK, CM, JRG, AC-P, ES, CL, MSF, MMap, HJF, AJN, VMvD, DW, CZ, DGH, SL, MMu, EDC, CN, MF, KM, AGDay-W, DJS and ABS; Analysis and interpretation of data - MAN, CYM, MSF, MMap, HJF, AJN, AGDay-W, DJS, JPAI and ABS; Drafting the article and revising it critically - MAN, CYM, JR, SE, SJH, KG, MS, MMar, PH, NW, JH, TG, AB, TRP, AN, MFK, CM, JRG, AC-P, ES, CL, MSF, MMap, HJF, AJN, HM, VMvD, DW, CZ, DGH, SL, MMu, CN, MF, KM, AGDay-W, DJS, JPAI and ABS.

References

  • 1. Sobel N, Thomason ME, Stappen I, et al. An impairment in sniffing contributes to the olfactory impairment in Parkinson’s disease. Proc Natl Acad Sci U S A. 2001;98:4154–9. doi: 10.1073/pnas.071061598.
  • 2. Michell AW, Lewis SJG, Foltynie T, Barker RA. Biomarkers and Parkinson’s disease. Brain. 2004;127:1693–705. doi: 10.1093/brain/awh198.
  • 3. Lazarini F, Gabellec M-M, Moigneu C, de Chaumont F, Olivo-Marin J-C, Lledo P-M. Adult neurogenesis restores dopaminergic neuronal loss in the olfactory bulb. J Neurosci. 2014;34:14430–42. doi: 10.1523/JNEUROSCI.5366-13.2014.
  • 4. Goldstein DS, Sewell L. Olfactory dysfunction in pure autonomic failure: Implications for the pathogenesis of Lewy body diseases. Parkinsonism Relat Disord. 2009;15:516–20. doi: 10.1016/j.parkreldis.2008.12.009.
  • 5. Adler CH, Gwinn KA, Newman S. Olfactory function in restless legs syndrome. Mov Disord. 1998;13:563–5. doi: 10.1002/mds.870130332.
  • 6. Siderowf A, Jennings D, Connolly J, Doty RL, Marek K, Stern MB. Risk factors for Parkinson’s disease and impaired olfaction in relatives of patients with Parkinson’s disease. Mov Disord. 2007;22:2249–55. doi: 10.1002/mds.21707.
  • 7. Doty RL, Reyes PF, Gregor T. Presence of both odor identification and detection deficits in Alzheimer’s disease. Brain Res Bull. 1987;18:597–600. doi: 10.1016/0361-9230(87)90129-8.
  • 8. Li S, Okonkwo O, Albert M, Wang M-C. Variation in Variables that Predict Progression from MCI to AD Dementia over Duration of Follow-up. Am J Alzheimer’s Dis. 2013;2:12–28. doi: 10.7726/ajad.2013.1002.
  • 9. Nalls MA, Pankratz N, Lill CM, et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nat Genet. 2014;46:989–93. doi: 10.1038/ng.3043.
  • 10. International Parkinson’s Disease Genomics Consortium (IPDGC), Wellcome Trust Case Control Consortium 2 (WTCCC2). A two-stage meta-analysis identifies several new loci for Parkinson’s disease. PLoS Genet. 2011;7:e1002142. doi: 10.1371/journal.pgen.1002142.
  • 11. Do CB, Tung JY, Dorfman E, et al. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genet. 2011;7:e1002141. doi: 10.1371/journal.pgen.1002141.
  • 12. International Parkinson Disease Genomics Consortium. Nalls MA, Plagnol V, et al. Imputation of sequence variants for identification of genetic risks for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet. 2011;377:641–9. doi: 10.1016/S0140-6736(10)62345-8.
  • 13. Sidransky E, Nalls MA, Aasly JO, et al. Multicenter analysis of glucocerebrosidase mutations in Parkinson’s disease. N Engl J Med. 2009;361:1651–61. doi: 10.1056/NEJMoa0901281.
  • 14. Paisán-Ruíz C, Jain S, Evans EW, et al. Cloning of the gene containing mutations that cause PARK8-linked Parkinson’s disease. Neuron. 2004;44:595–600. doi: 10.1016/j.neuron.2004.10.023.
  • 15. Zimprich A, Biskup S, Leitner P, et al. Mutations in LRRK2 cause autosomal-dominant parkinsonism with pleomorphic pathology. Neuron. 2004;44:601–7. doi: 10.1016/j.neuron.2004.11.005.
  • 16. Lill CM, Roehr JT, McQueen MB, et al. Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet. 2012;8:e1002548. doi: 10.1371/journal.pgen.1002548.
  • 17. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–45.
  • 18. Lemeshow S, Hosmer DW Jr. A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol. 1982;115:92–106. doi: 10.1093/oxfordjournals.aje.a113284.
  • 19. Gasser T. Genetics of Parkinson’s disease. Curr Opin Neurol. 2005;18:363–9. doi: 10.1097/01.wco.0000170951.08924.3d.
  • 20. Postuma RB, Aarsland D, Barone P, et al. Identifying prodromal Parkinson’s disease: pre-motor disorders in Parkinson’s disease. Mov Disord. 2012;27:617–26. doi: 10.1002/mds.24996.

Associated Data


Supplementary Materials

1. Supporting Figure 1.

Panel B shows the distribution of Hosmer-Lemeshow p-values for the integrative model in PPMI, computed for 5–100 randomly generated groupings to test calibration in a variety of scenarios. A p-value of < 0.05 would suggest that bias was detected in that subset of the data. The histogram of the p-value distribution from the Hosmer-Lemeshow tests is overlaid with a line showing the Gaussian kernel-smoothed density to better communicate the distribution. In this panel, p-values ranged from 0.055 to 0.876, with a mean of 0.471 (SD = 0.193) and a median of 0.480 (interquartile range = 0.310–0.624).
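As an illustration of the kind of calibration check described for Panel B, the following is a minimal Python sketch of a Hosmer-Lemeshow test repeated over several group counts. It is not the authors' code: the supplementary references list the R package ResourceSelection, whereas this sketch uses only the Python standard library. The function names `chi2_sf` and `hosmer_lemeshow`, the simulated outcomes, and the reading of "5–100 groupings" as group counts from 5 to 100 are all assumptions made for illustration.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def hosmer_lemeshow(y, p, groups):
    """Return (statistic, p-value) for binary outcomes y and predicted risks p."""
    if groups < 3:
        raise ValueError("need at least 3 groups (df = groups - 2)")
    # Stable sort by predicted risk, then cut into near-equal-sized groups.
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(label for _, label in chunk)
        expected = sum(prob for prob, _ in chunk)
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf(stat, groups - 2)


# Simulated, well-calibrated predictions (hypothetical data, not PPMI).
random.seed(0)
risk = [random.uniform(0.05, 0.95) for _ in range(2000)]
outcome = [1 if random.random() < r else 0 for r in risk]

# Assumed reading of "5-100 groupings": vary the number of groups.
group_counts = list(range(5, 101, 5))
p_values = [hosmer_lemeshow(outcome, risk, g)[1] for g in group_counts]
print("smallest p-value:", min(p_values))
```

Under this reading, a well-calibrated model should give p-values spread across the unit interval with few below 0.05, which is the pattern the panel summarises with its range, mean, and median.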

2

Supporting Table 1. Details of the stepwise regression. Logistic regression was used as the basis for the integrative predictive model trained on the PPMI dataset. Parameter estimates are sorted in descending order of the Akaike information criterion (AIC).
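For readers unfamiliar with the criterion used to sort the table, the AIC of a model with k estimated parameters and maximised log-likelihood ln L is 2k − 2 ln L, and lower values indicate a better trade-off between fit and complexity. The short Python sketch below computes it for two candidate logistic models. The predicted probabilities are hypothetical values chosen for illustration, not fitted estimates or figures from the paper, and the helper names are assumptions.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_lik


# Hypothetical outcomes and predicted probabilities for six subjects.
y = [1, 1, 1, 0, 0, 0]
candidates = {
    # name: (number of parameters, predicted probabilities)
    "intercept only": (1, [0.5] * 6),
    "intercept + one predictor": (2, [0.8, 0.7, 0.6, 0.4, 0.3, 0.2]),
}

# Sort candidates in descending order of AIC, as in the table caption.
ranked = sorted(
    ((aic(bernoulli_log_likelihood(y, p), k), name)
     for name, (k, p) in candidates.items()),
    reverse=True,
)
for value, name in ranked:
    print(f"{name}: AIC = {value:.3f}")
```

In a stepwise procedure, a predictor is typically kept only if adding it lowers the AIC, so the extra parameter has to buy enough improvement in log-likelihood to offset its penalty of 2.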

Supporting Table 2. Variants included in the GRS calculations as they appear in the PPMI and PDBP NeuroX genotype datasets.
