Keywords: GWAS, Meta-analysis, Heterogeneity, Parkinson’s disease, METRADISC-XL
Highlights
- The first quantitative synthesis of GWAS regarding Parkinson’s Disease.
- Fifteen Parkinson’s Disease GWASs with 191,397 available SNPs pooled.
- User friendly software (METRADISC-XL) implemented.
- Seven chromosomal regions (bins) were replicated as associated with the Parkinson’s Disease trait.
Abstract
Introduction
Parkinson’s disease is a neurodegenerative disorder with a complex etiology coming from interactions between genetic and environmental factors. Research on Parkinson’s disease genetics has been an effortful struggle, while new technologies and novel study designs served as indispensable boosters. Until now, 90 loci and 20 disease-causing gene mutations have been identified. In this study we describe a novel non-parametric approach to GWAS meta-analysis and its application in PD genetics.
Methods
A literature search was conducted to identify Genome-Wide Association Studies (GWAS) regarding Parkinson’s disease. We applied predefined inclusion criteria and extracted the reported SNPs with their respective position and statistical significance. We divided all chromosomes into segments of approximately equal genetic distance, called bins, recorded the most significant SNP from each bin in each study, and ranked the bins by p-value. Ranks for each bin were summed, averaged and entered in a heterogeneity-based analysis using the METRADISC-XL software. Both weighted and unweighted analyses were performed.
Results
Five hundred and forty-three SNPs and their respective p-values from 15 studies were matched to their corresponding bins. The METRADISC-XL analysis resulted in 7 bins with a significant p-value. A bin on chromosome 4, where the SNCA gene is located, was found to have a genome-wide significant association with Parkinson’s Disease.
Conclusion
This is the first time a non-parametric method has been applied in GWAS meta-analysis. The results add some insight into the overall understanding of Parkinson’s disease genetics and serve as a first step toward further convergent analysis with genome-wide linkage studies.
1. Introduction
Parkinson’s disease (PD) is the second most common neurodegenerative disease, affecting 1% of individuals over the age of 60 and 4% of the population older than 85 [1]. The disease has three core clinical characteristics (tremor, rigidity and bradykinesia) and numerous non-motor features that are now recognized to be present years before the manifestation of the typical parkinsonian syndrome, in a so-called prodromal phase. The disease’s neuropathological hallmark is neurodegeneration in specific brain areas, mainly the substantia nigra, due to the accumulation of α-synuclein and other proteins [2].
The pathogenesis of the disease is still not fully understood. It is considered a multifactorial disease with both genetic and environmental components: most PD cases are sporadic, and only 5–10 % of PD patients suffer from a monogenic form. To date, at least 90 loci and 20 disease-causing genes for parkinsonism have been identified [3].
Genetic epidemiology is a relatively new scientific approach to investigating the role of genetic factors in determining disease in families and populations. As new genotyping methods emerged, genetic linkage and association studies were followed by Genome-Wide Linkage Studies (GWLS) and Genome-Wide Association Studies (GWAS), resulting in a large amount of data. Meta-analysis of the available data contributes substantially to revealing the true genetic component of sporadic disease under the common disease, common variant hypothesis. Meta-analyzing GWASs demands great computational effort and purpose-built software. Furthermore, since each GWAS may use different marker sets and genotyping platforms, a classical meta-analysis approach has to rely on genotype imputation, whose effect on performance is unclear [4]. In many cases the datasets needed for a GWAS meta-analysis are either incomplete or accessible only through the collaboration of many research teams worldwide.
In this study, we sought to perform a GWAS meta-analysis using comprehensive software, METRADISC-XL (METa-analysis of Ranked DISCovery datasets), which can overcome the aforementioned limitations, and to produce data that can be combined with relevant information from other study designs (GWLS and GWAS) as part of our team’s effort to pursue a genomic convergence approach to Parkinson’s disease. Similar approaches have previously used the HEGESMA (Heterogeneity and Genome Search Meta Analysis) software, applied to genome-wide scan meta-analyses [5–9]. For a GWAS meta-analysis, however, a larger number of markers (SNPs) and more missing values were anticipated, and METRADISC-XL was chosen to overcome these barriers. METRADISC-XL (available online at http://biomath.med.uth.gr/metradisc/) is software for non-parametric meta-analysis of ranked discovery datasets [10,11], used here for the first time for this purpose.
2. Material and methods
2.1. Search strategy
A thorough literature search was conducted in online databases PubMed and EMBASE for GWAS concerning PD from its inception to the 30th of June 2020. Combinations of key words such as “Parkinson’s disease”; “Genome-wide association study”; “GWAS”; “genome-wide”; “linkage disequilibrium”; “whole genome association” were used. To strengthen the depth and validity of our search, findings were compared and cross-validated with the HuGE navigator/GWAS integrator [12] and GWAS catalog [13] entries, where “Parkinson’s disease” was selected as the trait of interest.
2.2. Inclusion criteria
Eligible for inclusion were English language studies which followed a classical GWAS approach with well-characterized sporadic Parkinson’s disease cases and available association/statistical significance, in a genotype or most-significant level. Studies which examined other forms of PD (juvenile or early onset PD) or described associations with clinical characteristics (e.g. age at onset, motor and cognitive outcomes) or interactions (e.g. gene-environment interaction, coffee consumption) were considered ineligible. In case of overlapping samples, the study with the larger sample was included.
2.3. Data extraction
From each eligible study the following data were extracted: publication details (first author, year of study, title); number of cases and controls genotyped; and all available SNPs, whether in the article, the supplemental files or publicly available databases, with their respective p-value and position. Only originally genotyped SNPs were included. Replication results and overlapping samples were discarded.
2.4. Bins
All chromosomes were divided into segments of approximately equal genetic distance, called bins. Bin length was set to approximately 30 cM, as is usual in the Genome Scan Meta-Analysis (GSMA) approach [8,14]. The bins were coded by the number of the respective chromosome and the order of the bin, in the form “chromosome.bin order”. For example, bin 1.1 is the first bin of the first chromosome (Supplementary Table S1). The physical location of every bin (starting and ending base pair) was pinpointed by integrating a Marshfield map and its respective DS markers with the UCSC Genome Browser on Human (GRCh38/hg38 Assembly).
2.5. SNPs matching
From each GWAS, the most significant SNP within each bin, in terms of reported p-value, was recorded. Because of the large number of entries, we matched the SNPs of each study to the corresponding chromosome and bin and then recorded only the most significant one per bin, in a step-by-step algorithmic approach using original Python code in a Jupyter notebook. The original code is publicly available at https://dataverse.harvard.edu/ and https://github.com/ [15].
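The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' notebook [15]: the function names are hypothetical, the example p-values are invented, and only the start of bin 4.4 (70,530,434 bp) is taken from the text; the other bin boundaries are placeholders that would in practice come from Supplementary Table S1.

```python
from bisect import bisect_right

# (bin label, start position in bp), sorted by start, for one chromosome.
# Only the start of bin 4.4 comes from the text; the rest are placeholders.
BIN_STARTS = {
    "4": [("4.1", 1), ("4.2", 10_000_000), ("4.3", 40_000_000), ("4.4", 70_530_434)],
}

def bin_for(chrom, pos):
    """Return the label of the bin whose start is the largest one <= pos."""
    bins = BIN_STARTS[chrom]
    starts = [start for _, start in bins]
    return bins[bisect_right(starts, pos) - 1][0]

def best_snp_per_bin(snps):
    """snps: iterable of (rsid, chrom, pos, p). Keep the smallest p-value per bin."""
    best = {}
    for rsid, chrom, pos, p in snps:
        label = bin_for(chrom, pos)
        if label not in best or p < best[label][1]:
            best[label] = (rsid, p)
    return best

# Illustrative records; positions follow Table 3, p-values are made up.
study = [
    ("rs356220", "4", 90_860_363, 1e-8),
    ("rs6812193", "4", 77_418_010, 1e-5),
    ("rs4698412", "4", 15_346_446, 1e-4),
]
result = best_snp_per_bin(study)
```

Here two SNPs fall in bin 4.4 and only the one with the smaller p-value is retained, while the third SNP is the sole entry for bin 4.2.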
2.6. Heterogeneity based meta-analysis
For each study, the bins were ranked (1–120) according to their p-value. The smallest p-values were assigned the highest rank (120). Bins with no corresponding p-value were treated as missing values and given the code “-99” so that the software would recognize them as such [10,11]. Equal p-values were treated as tied ranks and handled with the mid-rank method, i.e. each tied bin received the middle of the ranks the tied bins would otherwise occupy. The resulting ranks of each bin were summed and averaged across studies. The average rank of each bin (R) serves as an indication of whether the bin is associated with the trait, in this case Parkinson’s disease. To strengthen this indication, we investigated the consistency of the results for the same bin across studies, namely the between-study heterogeneity. This was assessed using the Q statistic, defined as the sum of the squared deviations of each study’s rank from the mean rank of the bin [8,16].
To implement the above-mentioned methodology, we used the METa-analysis of Ranked DISCovery datasets (METRADISC-XL) software. METRADISC-XL is a generalization of the METRADISC software, based on the same methodology as described previously and implemented in microarray meta-analysis [10,11]. In this case the biological variables of interest are the chromosomal bins. As described above, each bin in each study is ranked based on its most significant p-value. Because of missing values, a different number of bins may be ranked in each study (some bins may be ranked in all studies and others only in some), so the raw ranks are adjusted by the maximum number of tested bins (nmax) in any of the combined studies: the ranks of each study are multiplied by nmax divided by the number of ranked bins in that study.
The significance of the metrics (R and Q) is assessed using a Monte Carlo method. The ranks of each study are randomly permuted many times (in this case 100,000 times) and the software calculates the simulated metrics to create null distributions for them. Since there are missing values (not all bins have an available rank in all the studies), each bin is tested against the null distribution corresponding to the group of bins having available information (a rank) from the same studies. These groups are called information classes and they are defined by the missing data. The significance of a metric is defined as the percentage of simulated metrics that exceed or are equal to the observed metric.
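The ranking and the two metrics described above can be illustrated with a short Python sketch. It is written under stated assumptions rather than taken from METRADISC-XL: missing values are represented by `None` instead of the code −99, ranks are assigned among the available bins only, and the function names are hypothetical.

```python
def midranks(pvalues):
    """Rank p-values so that the smallest p-value gets the highest rank.

    Tied p-values share the mean of the ranks they would occupy (mid-rank).
    Missing p-values (None) stay missing.
    """
    present = sorted(
        ((p, i) for i, p in enumerate(pvalues) if p is not None),
        key=lambda t: t[0],
        reverse=True,  # largest p-value first, so it receives rank 1
    )
    ranks = [None] * len(pvalues)
    pos = 0
    while pos < len(present):
        end = pos
        while end + 1 < len(present) and present[end + 1][0] == present[pos][0]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of the tied 1-based ranks
        for _, i in present[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def mean_rank_and_q(ranks_across_studies):
    """Average rank R and heterogeneity Q of one bin across studies."""
    available = [r for r in ranks_across_studies if r is not None]
    r_mean = sum(available) / len(available)
    q = sum((r - r_mean) ** 2 for r in available)
    return r_mean, q

example_ranks = midranks([0.001, 0.05, 0.05, None])
r_mean, q = mean_rank_and_q([3, 2, None, 4])
```

In the first call the two bins with p = 0.05 share rank 1.5 and the bin with p = 0.001 receives the highest available rank; in the second, a bin ranked 3, 2 and 4 in three studies has R = 3 and Q = 2.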
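As a worked illustration of this adjustment, the scaling can be written as a one-line function. The function name is hypothetical and the sketch only restates the rule in the paragraph above; the raw rank of 115 in the second example is an assumed value chosen for illustration.

```python
def adjusted_rank(raw_rank, n_ranked_bins, n_max):
    """Scale a raw rank by n_max divided by the number of ranked bins in the study."""
    return raw_rank * n_max / n_ranked_bins

# A study with 8 ranked bins (as study 15 in Table 1), n_max = 120, raw rank 120.
top_rank_small_study = adjusted_rank(120, 8, 120)

# A study with 21 ranked bins and an assumed raw rank of 115.
example_21_bins = round(adjusted_rank(115, 21, 120), 1)

# A study in which all 120 bins are ranked is left unchanged.
unchanged = adjusted_rank(10, 120, 120)
```

With 8 ranked bins and nmax = 120, a raw rank of 120 scales to 1800, which is the largest adjusted rank appearing in Table 2.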
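The permutation logic can be sketched for the simplest case, in which every bin is ranked in every study, so that there is a single information class. This is an assumption-laden illustration, not the METRADISC-XL implementation: the function name, the small rank matrix, the fixed seed and the reduced number of permutations are all chosen for the example, and the result is returned as a proportion rather than a percentage.

```python
import random

def permutation_p_value(rank_matrix, bin_index, n_perm=10_000, seed=0):
    """Right-sided Monte Carlo p-value for the mean rank of one bin.

    rank_matrix[s][b] is the rank of bin b in study s; no missing values.
    """
    rng = random.Random(seed)
    n_studies = len(rank_matrix)
    observed = sum(study[bin_index] for study in rank_matrix) / n_studies
    hits = 0
    for _ in range(n_perm):
        total = 0.0
        for study in rank_matrix:
            shuffled = study[:]          # permute a copy of this study's ranks
            rng.shuffle(shuffled)
            total += shuffled[bin_index]
        if total / n_studies >= observed:  # simulated metric >= observed metric
            hits += 1
    return hits / n_perm

# Three studies, five bins: bin 0 is top-ranked everywhere, bin 4 is last everywhere.
ranks = [
    [5, 4, 3, 2, 1],
    [5, 3, 4, 2, 1],
    [5, 2, 3, 4, 1],
]
p_top = permutation_p_value(ranks, 0)
p_low = permutation_p_value(ranks, 4)
```

A bin that is top-ranked in all three studies gets a small p-value, since a simulated mean can only match it when all three permutations place the top rank in that position, whereas a bin that is last everywhere gets a p-value of 1. Handling missing values would require building a separate null distribution for each information class.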
The METRADISC-XL software allows for both unweighted and weighted analysis. We performed both and in the case of weighted analysis we used the weight function (n1i*n2i)/(n1i + n2i) where n1i is the number of cases and n2i the number of controls in study i.
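The weight function can be computed directly from the case and control counts in Table 1. In the sketch below, `study_weight` restates the formula above; `weighted_mean_rank` is an assumption about how such weights might enter the average rank (a standard weighted mean), since the text does not spell out that step, and both names are hypothetical.

```python
def study_weight(n_cases, n_controls):
    """Weight (n1 * n2) / (n1 + n2) for a study with n1 cases and n2 controls."""
    return n_cases * n_controls / (n_cases + n_controls)

def weighted_mean_rank(ranks, weights):
    """Weighted average of the available ranks of one bin (None marks missing).

    Assumption: a plain weighted mean; the software may combine weights differently.
    """
    pairs = [(r, w) for r, w in zip(ranks, weights) if r is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(r * w for r, w in pairs) / total_weight

# Study 9 in Table 1 has 250 cases and 250 controls.
w_balanced = study_weight(250, 250)

# Toy example: two available ranks with weights 1 and 3, one missing rank.
r_weighted = weighted_mean_rank([10, 20, None], [1, 3, 5])
```

The weight grows with total sample size but is limited by the smaller of the two groups, so a study with few cases and very many controls is weighted by roughly its number of cases.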
3. Results
The database search resulted in 1,412 entries, of which 55 studies were initially selected as relevant. The GWAS catalog search under the trait “Parkinson’s disease” returned 22 studies, which were cross-referenced with the GWAS integrator entries. After removal of duplicates and application of the selection criteria described in the Methods section, 19 GWASs with a total of 191,397 SNPs available for extraction were selected for further analysis (Table 1) [17–35]. Because of their large number of missing values (fewer than 5 available SNPs), the studies of Beecham [18], Davis [20], Satake [31] and Vacic [35] were also removed from the final analysis, leaving a total of 15 included GWASs.
Table 1.
Demographic characteristics of included studies.
| No. | Author | Year | Initial sample size (cases/controls) | Ethnicity | Extracted SNPs (n) | Matched bins (n) |
|---|---|---|---|---|---|---|
| 1 | Bandrés-Ciga | 2016 | 240/192 | Caucasian | 28 | 21 |
| 2 | Beecham* | 2013 | 484/1,145 | Caucasian | 1 | 1 |
| 3 | Davis* | 2013 | 31/767 | Amish | 3 | 3 |
| 4 | Do | 2011 | 3,426/29,624 | Caucasian | 390 | 65 |
| 5 | Edwards | 2010 | 604/619 | Caucasian | 72 | 33 |
| 6 | Foo | 2016 | 779/13,227 | East Asian (Han Chinese) | 96 | 32 |
| 7 | Fung | 2006 | 267/270 | Caucasian | 26 | 17 |
| 8 | Hamza | 2010 | 2,000/1,986 | Caucasian | 89 | 16 |
| 9 | Hu Y | 2015 | 250/250 | Chinese | 22 | 21 |
| 10 | Liu | 2011 | 268/178 | Ashkenazi | 55 | 32 |
| 11 | Pickrell | 2016 | 9,619/324,522 | Caucasian | 25 | 20 |
| 12 | Saad | 2010 | 1,039/1,984 | Caucasian | 50 | 21 |
| 13 | Satake* | 2009 | 988/2,521 | Japanese | 20 | 4 |
| 14 | Simon-Sanchez | 2009 | 1,713/3,978 | Caucasian | 345 | 87 |
| 15 | Simon-Sanchez | 2011 | 772/2,024 | Caucasian | 30 | 8 |
| 16 | Spencer | 2010 | 1,705/5,175 | Caucasian | 55 | 24 |
| 17 | Vacic* | 2014 | 1,130/2,611 | Ashkenazi | 4 | 4 |
| 18 | Maraganore | 2005 | 381/363 | Caucasian | 190,059 | 120 |
| 19 | Chang | 2017 | 6,476/302,042 | Caucasian | 27 | 26 |
| Total | | | 32,208/693,847 | | 191,397 **(191,369) | 555 **(543) |

(*) Studies removed due to the large number of missing values.
(**) The included data set.
Application of the original code matched 543 SNPs and their respective p-values to their corresponding bins (the most significant SNP in terms of p-value in each bin and in each study) (Table 1), while 1,257 bins had missing values. Based on data availability across the studies, the bins belonged to 92 information classes.
Application of the METRADISC-XL software to 15 studies, 120 bins, 100,000 permutations and 92 information classes revealed 7 bins with a statistically significant association with the PD trait (Fig. 1). Their right-sided p-values for the adjusted R in both weighted and unweighted analyses, and the corresponding p-values of the Q statistic, are shown in Table 2.
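The count of missing values follows from the design: 15 studies times 120 bins gives 1,800 study-bin cells, and 543 filled cells leave 1,257 missing. How information classes arise from the pattern of missing data can be sketched as follows; the function names and the toy matrix are hypothetical, and missing ranks are represented by `None` rather than the code −99.

```python
from collections import defaultdict

def information_classes(rank_matrix):
    """Group bins by the set of studies in which they have a rank.

    rank_matrix[s][b] is the rank of bin b in study s, or None if missing.
    Returns {frozenset of study indices: [bin indices]}.
    """
    n_bins = len(rank_matrix[0])
    classes = defaultdict(list)
    for b in range(n_bins):
        available = frozenset(
            s for s, study in enumerate(rank_matrix) if study[b] is not None
        )
        classes[available].append(b)
    return dict(classes)

def count_missing(rank_matrix):
    """Number of study-bin cells without a rank."""
    return sum(r is None for study in rank_matrix for r in study)

# Toy data: three studies, four bins.
toy = [
    [3, 2, None, 1],
    [2, 1, None, None],
    [3, 2, 1, None],
]
classes = information_classes(toy)
n_missing = count_missing(toy)

# Consistency check of the counts reported in the text.
missing_in_paper = 15 * 120 - 543
```

In the toy matrix, bins 0 and 1 are ranked in all three studies and therefore share one information class, while bins 2 and 3 each form their own class.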
Fig. 1.
Weighted (square) and unweighted (circle) significance level of the average ranks of 120 bins in size-adjusted chromosomes. Bins with a significant p-value (< 0.05) are shown above the 0.05 (solid) reference line.
Table 2.
Bins with high unweighted and/or weighted adjusted average ranks (Rmean, Rw/mean) and the corresponding significance and heterogeneity metrics.
| | Bin | 4.4 | 12.4 | 19.4 | 17.3 | 17.2 | 3.2 | 1.1 |
|---|---|---|---|---|---|---|---|---|
| Adjusted rank | Study 1 | 657.1 | −99.0 | −99.0 | 680.0 | 662.9 | −99.0 | −99.0 |
| | Study 4 | 219.7 | −99.0 | −99.0 | 216.0 | 217.8 | 186.5 | −99.0 |
| | Study 5 | 436.4 | −99.0 | −99.0 | 367.3 | −99.0 | 378.2 | −99.0 |
| | Study 6 | 450.0 | −99.0 | −99.0 | −99.0 | −99.0 | 435.0 | −99.0 |
| | Study 7 | −99.0 | −99.0 | −99.0 | −99.0 | 840.0 | −99.0 | −99.0 |
| | Study 8 | 892.5 | −99.0 | −99.0 | 847.5 | 870.0 | −99.0 | −99.0 |
| | Study 9 | 628.6 | −99.0 | −99.0 | −99.0 | −99.0 | −99.0 | 628.6 |
| | Study 10 | −99.0 | −99.0 | −99.0 | 345.0 | 356.3 | 431.3 | −99.0 |
| | Study 11 | 720.0 | −99.0 | −99.0 | 708.0 | −99.0 | −99.0 | −99.0 |
| | Study 12 | 685.7 | 680.0 | −99.0 | −99.0 | 662.9 | −99.0 | −99.0 |
| | Study 14 | 158.2 | −99.0 | 139.8 | 143.7 | 156.9 | 127.9 | −99.0 |
| | Study 15 | 1800.0 | −99.0 | −99.0 | −99.0 | 1755.0 | −99.0 | −99.0 |
| | Study 16 | 590.0 | −99.0 | −99.0 | 575.0 | 585.0 | 580.0 | −99.0 |
| | Study 18 | 10.0 | 108.0 | 115.0 | 66.0 | 73.0 | 11.0 | 116.0 |
| | Study 19 | 553.8 | −99.0 | −99.0 | 549.2 | −99.0 | −99.0 | −99.0 |
| Rmean | | 45141167 | 19968572 | 1496758 | 23183384 | 50969116 | 11496949 | 18539999 |
| Right-sided p-value for Rmean | | 0.00 | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 |
| Right-sided p-value for Rw/mean | | 0.00 | 0.05 | 0.10 | 0.01 | 0.01 | 0.04 | 0.04 |
| Right-sided p-value for Q-mean | | 0.12 | 0.14 | 0.50 | 0.26 | 0.50 | 0.12 | 0.97 |
The bins with the most significant p-values in both weighted and unweighted analyses were bins 1.1 (chr1: 1-11404933), 3.2 (chr3: 30697536-88674208), 4.4 (chr4: 70530434-98736813), 12.2 (chr12: 12486720-43878111), 17.2 (chr17: 12661253-42207671) and 17.3 (chr17: 42207672-71465399). Bins 12.4 (chr12: 73700113-103294741) and 19.4 (chr19: 50558304-58617616) were significant in the unweighted analysis only (Table 2). Heterogeneity was marginally low for bins 4.4 and 3.2 (right-sided p-value = 0.11) and rather large for the remaining bins.
4. Discussion
Exploring the heritability of Parkinson’s disease has been a long and fascinating journey with numerous successes and setbacks. Technological advances boosted this effort, while the complexity of the matter was, and still is, a hindrance. Until now, 20 disease-causing genes and 90 SNPs have been identified as associated with the risk of developing Parkinson’s disease [36–40]. GWASs and their meta-analyses have so far added insight of great value, though at considerable cost in effort.
These approaches revealed a small, yet significant portion of the heritability of the disease. in GWASs or by implementing GWASs with other clinical phenotypes [36].
In this novel approach, we sought to investigate whether a quantitative synthesis can effectively pool available data from GWASs. Our goal was to identify genomic regions in a genome-wide, hypothesis-free fashion, with a significant pooled value indicating candidate regions for further investigation.
This method is easy for clinicians to understand and is not restricted by a distributional assumption, nor by the different effect size measures or techniques used in the initial GWASs. Nonparametric approaches have been used successfully in Genome Wide Linkage Scans and microarray meta-analysis [8,41]. Furthermore, in our effort to apply convergent genomics to PD, this is the first step, to be followed by a similar meta-analysis of genome-wide linkage scans (Genome Scan Meta-Analysis, GSMA); the findings will then be combined, on the notion that “true” hits in both study designs have a better positive predictive value and serve as better candidate regions.
In this study we combined the initial, originally genotyped SNPs from each study. Combining 543 SNPs and their rankings across 15 studies and 120 bins in 92 information classes yielded one bin significant at the genome level (p-value < 0.000042, threshold adjusted for 120 bins) and six bins with a less significant association (p-value < 0.05) with the trait in question. Of the top significant SNPs as initially genotyped and reported by the studies (n = 138), forty-three are located in a significant bin (Table 3).
Table 3.
Top significant, initially genotyped, SNPs as reported from each study and their corresponding BINs. Study number corresponds to the number on Table 1. BINs in bold are the significant ones.
| SNP | Position (bp) | Study No | BIN |
|---|---|---|---|
| rs12063142 | 18813023 | 5 | 1.2 |
| rs1543467 | 86548977 | 5 | 1.5 |
| rs17344386 | 83254011 | 18 | |
| rs35749011 | 155135036 | 19 | 1.6 |
| rs2986574 | 182173237 | 7 | 1.7 |
| rs823118 | 205723572 | 1 | 1.8 |
| rs823156 | 204031263 | 4 | |
| rs823118 | 205754444 | 11 | |
| rs823118 | 205723572 | 19 | |
| rs10797576 | 232664611 | 1 | 1.9 |
| rs849898 | 228153917 | 9 | |
| rs870575 | 45356764 | 8 | 2.3 |
| rs12613026 | 42867793 | 10 | |
| rs10197606 | 41790447 | 18 | |
| rs11887431 | 42179113 | 18 | |
| rs11674789 | 41822751 | 18 | |
| rs6430538 | 135539967 | 1 | 2.6 |
| rs1955337 | 169129145 | 1 | 2.7 |
| rs1474055 | 169110394 | 19 | |
| rs11186 | 189897394 | 9 | 2.8 |
| rs1010491 | 231160521 | 6 | 2.9 |
| rs1561374 | 29458092 | 6 | 3.2 |
| rs1684524 | 21936271 | 10 | |
| rs1352135 | 21935471 | 10 | |
| rs6783485 | 59427797 | 9 | 3.3 |
| rs1879553 | 118615463 | 9 | 3.5 |
| rs1879512 | 113576590 | 10 | |
| rs7641311 | 113574386 | 10 | |
| rs10513789 | 184242767 | 4 | 3.7 |
| rs976683 | 173767581 | 5 | |
| rs9290751 | 182732230 | 6 | |
| rs12637471 | 182762437 | 19 | |
| rs6599389 | 929113 | 4 | 4.1 |
| rs356220 | 90860363 | 12 | |
| rs1564282 | 842313 | 15 | |
| rs34311866 | 951947 | 19 | |
| rs4266290 | 15735495 | 11 | 4.2 |
| rs4698412 | 15346446 | 12 | |
| rs12502586 | 15335662 | 15 | |
| rs2242330 | 68129844 | 7 | 4.3 |
| rs6826751 | 68116450 | 7 | |
| rs3775866 | 68126775 | 7 | |
| rs356181 | 90626139 | 1 | 4.4 |
| rs356220 | 90860363 | 4 | |
| rs6812193 | 77418010 | 4 | |
| rs356220 | 89720189 | 5 | |
| rs356220 | 90641340 | 6 | |
| rs8180209 | 90644454 | 6 | |
| rs3775439 | 90709741 | 6 | |
| rs6532194 | 90780902 | 6 | |
| rs356220 | 89720189 | 8 | |
| rs356220 | 90860363 | 8 | |
| rs356168 | 90893454 | 8 | |
| rs2736990 | 90897564 | 8 | |
| rs1350855 | 91413829 | 8 | |
| rs6812193 | 76277833 | 11 | |
| rs2736990 | 90897564 | 12 | |
| rs2736990 | 90897564 | 14 | |
| rs3857059 | 90894261 | 14 | |
| rs11931074 | 90858538 | 14 | |
| rs2736990 | 90897564 | 15 | |
| rs3857059 | 90894261 | 15 | |
| rs11931074 | 90858538 | 15 | |
| rs356182 | 90626111 | 19 | |
| rs4862792 | 188438344 | 7 | 4.8 |
| rs13153459 | 44515935 | 9 | 5.2 |
| rs1916642 | 72488303 | 10 | 5.3 |
| rs6879012 | 72498637 | 10 | |
| rs26990 | 112814742 | 12 | 5.4 |
| rs3129882 | 32517508 | 8 | 6.2 |
| rs3129882 | 32441753 | 8 | |
| rs4713118 | 27709015 | 11 | |
| rs276555 | 137415146 | 6 | 6.5 |
| rs6912319 | 137452537 | 6 | |
| rs10256359 | 23084258 | 11 | 7.2 |
| rs320682 | 137038092 | 6 | 7.5 |
| rs17068332 | 3820589 | 16 | 8.2 |
| rs16887478 | 38561200 | 18 | 8.3 |
| rs10815285 | 5804424 | 18 | 9.1 |
| rs10746953 | 76917840 | 9 | 9.3 |
| rs2724788 | 12490835 | 16 | 10.2 |
| rs1892302 | 12486578 | 16 | |
| rs1480597 | 44481115 | 7 | 10.3 |
| rs7097094 | 44530696 | 7 | |
| rs10999501 | 72171365 | 10 | 10.4 |
| rs188789342 | 119612816 | 11 | 10.5 |
| rs117896735 | 121536327 | 19 | 10.6 |
| rs12294719 | 36684837 | 12 | 11.2 |
| rs1533588 | 36687460 | 12 | |
| rs12419750 | 36589978 | 12 | |
| rs7128419 | 36613848 | 12 | |
| rs687432 | 57926788 | 11 | 11.3 |
| rs10501570 | 84095494 | 7 | 11.4 |
| rs329648 | 133765367 | 1 | 11.6 |
| rs34637584 | 39020469 | 4 | 12.2 |
| rs148294058 | 42655580 | 11 | |
| rs1472402 | 40549297 | 18 | |
| rs7954761 | 82691472 | 12 | 12.4 |
| rs11060180 | 123303586 | 4 | 12.5 |
| rs11060180 | 122819039 | 11 | |
| rs11060180 | 123303586 | 19 | |
| rs9513249 | 97507450 | 5 | 13.3 |
| rs12870589 | 97572967 | 5 | |
| rs9323124 | 47466177 | 9 | 14.2 |
| rs11158026 | 55348869 | 19 | |
| rs1816879 | 58318356 | 5 | 15.2 |
| rs17463995 | 46791064 | 18 | |
| rs1881335 | 5206420 | 10 | 16.1 |
| rs4888984 | 78066835 | 7 | 16.3 |
| rs11868035 | 17715101 | 1 | 17.2 |
| rs12185268 | 41279463 | 4 | |
| rs11868035 | 17655826 | 4 | |
| rs281357 | 19683106 | 7 | |
| rs199533 | 42184098 | 8 | |
| rs199528 | 42198305 | 8 | |
| rs17690703 | 41281077 | 12 | |
| rs199533 | 42184098 | 14 | |
| rs169201 | 42145386 | 14 | |
| rs393152 | 41074926 | 14 | |
| rs1981997 | 41412603 | 14 | |
| rs2532274 | 41602941 | 14 | |
| rs2532269 | 41605885 | 14 | |
| rs8070723 | 41436901 | 14 | |
| rs17563986 | 41347100 | 15 | |
| rs1981997 | 41412603 | 15 | |
| rs8070723 | 41436901 | 15 | |
| rs2532274 | 41602941 | 15 | |
| rs393152 | 41074926 | 15 | |
| rs17649553 | 43994648 | 1 | 17.3 |
| rs17649553 | 43994648 | 19 | |
| rs1362858 | 32986600 | 9 | 18.2 |
| rs12456492 | 40673380 | 1 | 18.3 |
| rs4130047 | 38932233 | 4 | |
| rs4130047 | 43098270 | 11 | |
| rs1406968 | 19649880 | 5 | 20.1 |
| rs3746736 | 23372613 | 18 | 20.2 |
| rs1984279 | 23261192 | 18 | |
| rs151358 | 57043454 | 10 | 20.4 |
| rs2823357 | 15836776 | 4 | 21.1 |
The most significant bin is 4.4 (chr4: 70530434-98736813). At least 22 SNPs from 10 different studies were reported as top-ranking SNPs in their initial genotyping within this region (Table 3). In seven studies, this bin contained the SNP with the most significant p-value and was thus assigned the maximum rank (120) in our analysis. This resulted in a right-sided p-value for Q of 0.11. Furthermore, 37 of the 67 SNPs reported as associated with the PD trait in the GWAS catalog (data downloaded on July 27, 2020) are located in bin 4.4. This region contains the SNCA gene (chr4: 89700345-89838315), a well-recognized risk gene for PD that very likely represents an actual PD gene [36,42]. Bin 3.2 also showed some consistency among studies, ranking in the top quartile in six of the seven studies where data existed, but with an average rank p-value of 0.03.
Bins 17.2 and 17.3 were significant at the 0.01 level but with substantial heterogeneity. Bin 17.3 contains the MAPT gene, which has also been nominated as associated with increased PD risk [43,44]. This bin, along with bin 4.4, may represent polymorphic risk loci where multiple common and rare risk alleles co-exist, as described earlier [45].
This is, to the best of our knowledge, the first time such an approach to GWAS meta-analysis has been tested. Despite our enthusiasm, we should mention that this effort has some limitations. The method relies on the most significant statistical value in each bin from each study, and the subsequent summation and averaging of their ranks. A great number of bins, though, remained without a rank due to missing values. GWAS datasets are reported to be publicly available but are accessible only through consortia, specific organizations, and authorized users. In our study, the METRADISC-XL software can deal with missing values by creating null distributions from the same information class, yet such a large amount of missing information leads to incomplete results. This may also have contributed to the substantial observed heterogeneity. However, this approach is unlikely to generate false positive results.
Another issue is the well-known problem of matching a genetic map with a sequence-based physical map. Problems with assembly and incorrect identification of marker positions may lead to errors in the order of markers on physical maps [46]. Finally, since this is a meta-analysis based on GWAS, it carries all the inherent disadvantages of its type.
5. Conclusions
Overall, this study is the first attempt to approach GWAS meta-analysis with a non-parametric, rank-based method. Though several drawbacks may have limited the value of our results, this study adds some insight into the overall understanding of Parkinson’s disease genetics and serves as a first step toward further convergent analysis [47], while possibly introducing a new, useful tool to the scientific community.
Author statement
Dimitrios Rikos: Conceptualization, Methodology, Formal analysis, Investigation, Writing - Original Draft. Vasileios Siokas: Formal analysis, Investigation, Writing - Original Draft. Tatyana I. Burykina: Investigation, Data Curation, Writing - Review & Editing. Nikolaos Drakoulis: Investigation, Writing - Review & Editing, Visualization. Efthimios Dardiotis: Investigation, Supervision. Elias Zintzaras: Conceptualization, Methodology, Supervision.
Funding sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Code availability
Code is publicly available at https://dataverse.harvard.edu/ and https://github.com/.
Availability of data and material
Data and material are available upon request by the corresponding author.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We thank professors Georgios Hadjigeorgiou and Georgia Xiromerisiou for their overall contribution and support. We highly appreciate Aris Andreou for his essential contribution in writing the code used as described in methods.
Handling Editor: Dr. Aristidis Tsatsakis
Footnotes
Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.toxrep.2021.10.008.
References
- 1.Tysnes O.B., Storstein A. Epidemiology of Parkinson’s disease. J. Neural Transm. (Vienna, Austria : 1996) 2017;124(8):901–905. doi: 10.1007/s00702-017-1686-y. [DOI] [PubMed] [Google Scholar]
- 2.Dickson D.W. Neuropathology of Parkinson disease. Parkinsonism Relat. Disord. 2018;46(Suppl 1) doi: 10.1016/j.parkreldis.2017.07.033. S30-s3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Deng H., Wang P., Jankovic J. The genetics of Parkinson disease. Ageing Res. Rev. 2018;42:72–85. doi: 10.1016/j.arr.2017.12.007. [DOI] [PubMed] [Google Scholar]
- 4.Li J., Y-f Guo, Pei Y., Deng H.-W. The impact of imputation on meta-analysis of genome-wide association studies. PLoS One. 2012;7(4) doi: 10.1371/journal.pone.0034486. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Cao Y., Liao M., Huang X., Mo Z., Gao F. Meta-analysis of genome-wide linkage studies of atopic dermatitis. Dermatitis : contact, atopic, occupational. Drug. 2009;20(4):193–199. [PubMed] [Google Scholar]
- 6.Trikalinos T.A., Karvouni A., Zintzaras E., Ylisaukko-oja T., Peltonen L., Järvelä I. A heterogeneity-based genome search meta-analysis for autism-spectrum disorders. Mol. Psychiatry. 2006;11(1):29–36. doi: 10.1038/sj.mp.4001750. [DOI] [PubMed] [Google Scholar]
- 7.Tziastoudi M., Stefanidis I., Stravodimos K., Zintzaras E. Identification of chromosomal regions linked to diabetic nephropathy: a meta-analysis of genome-wide linkage scans. Genet. Test. Mol. Biomarkers. 2019;23(2):105–117. doi: 10.1089/gtmb.2018.0209. [DOI] [PubMed] [Google Scholar]
- 8.Zintzaras E., Ioannidis J.P. HEGESMA: genome search meta-analysis and heterogeneity testing. Bioinformatics (Oxford, England) 2005;21(18):3672–3673. doi: 10.1093/bioinformatics/bti536. [DOI] [PubMed] [Google Scholar]
- 9.Zintzaras E., Kitsios G., Harrison G.A., Laivuori H., Kivinen K., Kere J. Heterogeneity-based genome search meta-analysis for preeclampsia. Hum. Genet. 2006;120(3):360–370. doi: 10.1007/s00439-006-0214-1. [DOI] [PubMed] [Google Scholar]
- 10.Zintzaras E., Ioannidis J.P. Meta-analysis for ranked discovery datasets: theoretical framework and empirical demonstration for microarrays. Comput. Biol. Chem. 2008;32(1):38–46. doi: 10.1016/j.compbiolchem.2007.09.003. [DOI] [PubMed] [Google Scholar]
- 11.Zintzaras E., Ioannidis J.P. METRADISC-XL: a program for meta-analysis of multidimensional ranked discovery oriented datasets including microarrays. Comput. Methods Programs Biomed. 2012;108(3):1243–1246. doi: 10.1016/j.cmpb.2012.08.001. [DOI] [PubMed] [Google Scholar]
- 12.2011. GWAS Integrator: a Bioinformatics Tool to Explore Human Genetic Associations Reported in Published Genome-wide Association Studies. [Internet]. Available from: https://phgkb.cdc.gov/PHGKB/gWAHitStartPage.action https://www.nature.com/articles/ejhg201191.pdf. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.2019. The NHGRI-EBI GWAS Catalog of Published Genome-Wide Association Studies, Targeted Arrays and Summary Statistics 2019.https://www.ebi.ac.uk/gwas/home [Internet]. Available from: [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Zintzaras E., Kitsios G., Kent D., Camp N.J., Atwood L., Hopkins P.N. Genome-wide scans meta-analysis for pulse pressure. Hypertension (Dallas, Tex : 1979) 2007;50(3):557–564. doi: 10.1161/HYPERTENSIONAHA.107.090316. [DOI] [PubMed] [Google Scholar]
- 15.Andreou A. 2019. A Jupyter Notebook Used to Match a List of Research Papers, Base Pairs and an Associated p-value to Their Corresponding Bin and Filter for the Smallest Value: andreouA./basepair_to_bin. [Google Scholar]
- 16.Zintzaras E., Ioannidis J.P. Heterogeneity testing in meta-analysis of genome searches. Genet. Epidemiol. 2005;28(2):123–137. doi: 10.1002/gepi.20048. [DOI] [PubMed] [Google Scholar]
- 17.Bandrés-Ciga S., Price T.R., Barrero F.J., Escamilla-Sevilla F., Pelegrina J., Arepalli S. Genome-wide assessment of Parkinson’s disease in a Southern Spanish population. Neurobiol. Aging. 2016;45(213):e3–e9. doi: 10.1016/j.neurobiolaging.2016.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Beecham G.W., Dickson D.W., Scott W.K., Martin E.R., Schellenberg G., Nuytemans K. PARK10 is a major locus for sporadic neuropathologically confirmed Parkinson disease. Neurology. 2015;84(10):972–980. doi: 10.1212/WNL.0000000000001332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Chang D., Nalls M.A., Hallgrímsdóttir I.B., Hunkapiller J., van der Brug M., Cai F. A meta-analysis of genome-wide association studies identifies 17 new Parkinson’s disease risk loci. Nat. Genet. 2017;49(10):1511–1516. doi: 10.1038/ng.3955.
- 20.Davis M.F., Cummings A.C., D’Aoust L.N., Jiang L., Velez Edwards D.R., Laux R. Parkinson disease loci in the mid-western Amish. Hum. Genet. 2013;132(11):1213–1221. doi: 10.1007/s00439-013-1316-1.
- 21.Do C.B., Tung J.Y., Dorfman E., Kiefer A.K., Drabant E.M., Francke U. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genet. 2011;7(6):e1002141. doi: 10.1371/journal.pgen.1002141.
- 22.Edwards T.L., Scott W.K., Almonte C., Burt A., Powell E.H., Beecham G.W. Genome-wide association study confirms SNPs in SNCA and the MAPT region as common risk factors for Parkinson disease. Ann. Hum. Genet. 2010;74(2):97–109. doi: 10.1111/j.1469-1809.2009.00560.x.
- 23.Foo J.N., Tan L.C., Irwan I.D., Au W.L., Low H.Q., Prakash K.M. Genome-wide association study of Parkinson’s disease in East Asians. Hum. Mol. Genet. 2017;26(1):226–232. doi: 10.1093/hmg/ddw379.
- 24.Fung H.C., Scholz S., Matarin M., Simón-Sánchez J., Hernandez D., Britton A. Genome-wide genotyping in Parkinson’s disease and neurologically normal controls: first stage analysis and public release of data. Lancet Neurol. 2006;5(11):911–916. doi: 10.1016/S1474-4422(06)70578-6.
- 25.Hamza T.H., Zabetian C.P., Tenesa A., Laederach A., Montimurro J., Yearout D. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nat. Genet. 2010;42(9):781–785. doi: 10.1038/ng.642.
- 26.Hu Y., Deng L., Zhang J., Fang X., Mei P., Cao X. A pooling genome-wide association study combining a pathway analysis for typical sporadic Parkinson’s disease in the Han Population of Chinese Mainland. Mol. Neurobiol. 2016;53(7):4302–4318. doi: 10.1007/s12035-015-9331-y.
- 27.Liu X., Cheng R., Verbitsky M., Kisselev S., Browne A., Mejia-Santana H. Genome-wide association study identifies candidate genes for Parkinson’s disease in an Ashkenazi Jewish population. BMC Med. Genet. 2011;12:104. doi: 10.1186/1471-2350-12-104.
- 28.Maraganore D.M., de Andrade M., Lesnick T.G., Strain K.J., Farrer M.J., Rocca W.A. High-resolution whole-genome association study of Parkinson disease. Am. J. Hum. Genet. 2005;77(5):685–693. doi: 10.1086/496902.
- 29.Pickrell J.K., Berisa T., Liu J.Z. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 2016;48(7):709–717. doi: 10.1038/ng.3570.
- 30.Saad M., Lesage S., Saint-Pierre A., Corvol J.C., Zelenika D., Lambert J.C. Genome-wide association study confirms BST1 and suggests a locus on 12q24 as the risk loci for Parkinson’s disease in the European population. Hum. Mol. Genet. 2011;20(3):615–627. doi: 10.1093/hmg/ddq497.
- 31.Satake W., Nakabayashi Y., Mizuta I., Hirota Y., Ito C., Kubo M. Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson’s disease. Nat. Genet. 2009;41(12):1303–1307. doi: 10.1038/ng.485.
- 32.Simón-Sánchez J., Schulte C., Bras J.M., Sharma M., Gibbs J.R., Berg D. Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat. Genet. 2009;41(12):1308–1312. doi: 10.1038/ng.487.
- 33.Simón-Sánchez J., van Hilten J.J., van de Warrenburg B., Post B., Berendse H.W., Arepalli S. Genome-wide association study confirms extant PD risk loci among the Dutch. Eur. J. Hum. Genet. 2011;19(6):655–661. doi: 10.1038/ejhg.2010.254.
- 34.Spencer C.C., Plagnol V., Strange A., Gardner M., Paisan-Ruiz C., Band G. Dissection of the genetics of Parkinson’s disease identifies an additional association 5’ of SNCA and multiple associated haplotypes at 17q21. Hum. Mol. Genet. 2011;20(2):345–353. doi: 10.1093/hmg/ddq469.
- 35.Vacic V., Ozelius L.J., Clark L.N., Bar-Shira A., Gana-Weisz M., Gurevich T. Genome-wide mapping of IBD segments in an Ashkenazi PD cohort identifies associated haplotypes. Hum. Mol. Genet. 2014;23(17):4693–4702. doi: 10.1093/hmg/ddu158.
- 36.Blauwendraat C., Nalls M.A., Singleton A.B. The genetic architecture of Parkinson’s disease. Lancet Neurol. 2020;19(2):170–178. doi: 10.1016/S1474-4422(19)30287-X.
- 37.Dardiotis E., Rikos D., Siokas V., Aloizou A.M., Tsouris Z., Sakalakis E. Assessment of TREM2 rs75932628 variant’s association with Parkinson’s disease in a Greek population and Meta-analysis of current data. Int. J. Neurosci. 2021;131(6):544–548. doi: 10.1080/00207454.2020.1750388.
- 38.Nalls M.A., Blauwendraat C., Vallerga C.L., Heilbron K., Bandres-Ciga S., Chang D. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 2019;18(12):1091–1102. doi: 10.1016/S1474-4422(19)30320-5.
- 39.Siokas V., Aloizou A.M. ADORA2A rs5760423 and CYP1A2 rs762551 polymorphisms as risk factors for Parkinson’s disease. J. Clin. Med. 2021;10(3):381. doi: 10.3390/jcm10030381.
- 40.Siokas V., Arseniou S., Aloizou A.M., Tsouris Z., Liampas I., Sgantzos M. CD33 rs3865444 as a risk factor for Parkinson’s disease. Neurosci. Lett. 2021;748:135709. doi: 10.1016/j.neulet.2021.135709.
- 41.Kong X., Mas V., Archer K.J. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy. BMC Genomics. 2008;9(1):98. doi: 10.1186/1471-2164-9-98.
- 42.Xiromerisiou G., Dardiotis E., Tsimourtou V., Kountra P.M., Paterakis K.N., Kapsalaki E.Z. Genetic basis of Parkinson disease. Neurosurg. Focus. 2010;28(1):E7. doi: 10.3171/2009.10.FOCUS09220.
- 43.Pascale E., Di Battista M.E., Rubino A., Purcaro C., Valente M., Fattapposta F. Genetic Architecture of MAPT Gene Region in Parkinson Disease Subtypes. Front. Cell. Neurosci. 2016;10:96. doi: 10.3389/fncel.2016.00096.
- 44.Seto-Salvia N., Clarimon J., Pagonabarraga J., Pascual-Sedano B., Campolongo A., Combarros O. Dementia risk in Parkinson disease: disentangling the role of MAPT haplotypes. Arch. Neurol. 2011;68(3):359–364. doi: 10.1001/archneurol.2011.17.
- 45.Singleton A., Hardy J. A generalizable hypothesis for the genetic architecture of disease: pleomorphic risk loci. Hum. Mol. Genet. 2011;20(R2):R158–R162. doi: 10.1093/hmg/ddr358.
- 46.DeWan A.T., Parrado A.R., Matise T.C., Leal S.M. The map problem: a comparison of genetic and sequence-based physical maps. Am. J. Hum. Genet. 2002;70(1):101–107. doi: 10.1086/324774.
- 47.Aloizou A.M., Siokas V., Sapouni E.M., Sita N., Liampas I., Brotis A.G. Parkinson’s disease and pesticides: are microRNAs the missing link? Sci. Total Environ. 2020;744:140591. doi: 10.1016/j.scitotenv.2020.140591.
Data Availability Statement
Data and materials are available upon request from the corresponding author.


