MCPerm: A Monte Carlo Permutation Method for Accurately Correcting the Multiple Testing in a Meta-Analysis of Genetic Association Studies

Yongshuai Jiang; Lanying Zhang; Fanwu Kong; Mingming Zhang; Hongchao Lv; Guiyou Liu; Mingzhi Liao; Rennan Feng; Jin Li; Ruijie Zhang

doi:10.1371/journal.pone.0089212

. 2014 Feb 21;9(2):e89212. doi: 10.1371/journal.pone.0089212

MCPerm: A Monte Carlo Permutation Method for Accurately Correcting the Multiple Testing in a Meta-Analysis of Genetic Association Studies

Yongshuai Jiang ^1,^*,^¶, Lanying Zhang ^1,^¶, Fanwu Kong ^2,^¶, Mingming Zhang ^1,^¶, Hongchao Lv ^1,^¶, Guiyou Liu ³, Mingzhi Liao ⁴, Rennan Feng ⁵, Jin Li ¹, Ruijie Zhang ^1,^*

Editor: Fabio Rapallo⁶

PMCID: PMC3931718 PMID: 24586601

Abstract

Traditional permutation (TradPerm) tests are usually considered the gold standard for multiple testing corrections. However, they can be difficult to complete for the meta-analyses of genetic association studies based on multiple single nucleotide polymorphism loci as they depend on individual-level genotype and phenotype data to perform random shuffles, which are not easy to obtain. Most meta-analyses have therefore been performed using summary statistics from previously published studies. To carry out a permutation using only genotype counts without changing the size of the TradPerm P-value, we developed a Monte Carlo permutation (MCPerm) method. First, for each study included in the meta-analysis, we used a two-step hypergeometric distribution to generate a random number of genotypes in cases and controls. We then carried out a meta-analysis using these random genotype data. Finally, we obtained the corrected permutation P-value of the meta-analysis by repeating the entire process N times. We used five real datasets and five simulation datasets to evaluate the MCPerm method and our results showed the following: (1) MCPerm requires only the summary statistics of the genotype, without the need for individual-level data; (2) Genotype counts generated by our two-step hypergeometric distributions had the same distributions as genotype counts generated by shuffling; (3) MCPerm had almost exactly the same permutation P-values as TradPerm (r = 0.999; P<2.2e-16); (4) The calculation speed of MCPerm is much faster than that of TradPerm. In summary, MCPerm appears to be a viable alternative to TradPerm, and we have developed it as a freely available R package at CRAN: http://cran.r-project.org/web/packages/MCPerm/index.html.

Introduction

Meta-analysis is an important method that can improve statistical power by combining the results of multiple previously published studies [1], [2]. Every year, thousands of meta-analyses of genetic association studies are published. Some focus on only one single nucleotide polymorphic (SNP) locus [3]–[5], while others consider multiple SNP loci [6]–[9]. For those meta-analyses based on multiple loci, multiple testing is a persistent problem. Many techniques have been devised to correct multiple hypotheses, such as the Bonferroni and Sidak corrections [10], false discovery rate [11], and permutation [12]. Although these methods are highly successful, the traditional permutation (TradPerm) method is still widely considered the gold standard for accurately correcting for multiple testing [12], [13].

TradPerm is a type of non-parametric method in which the null distribution of the test statistic is estimated by shuffling uniform phenotype labels of cases and controls [13], [14]. However, it relies on the original SNP genotype and phenotype data (individual-level data) to perform a large number of random shuffles [15]. For the meta-analyses of genetic association studies, individual-level data are difficult to obtain for several reasons including research project privacy. Compared with raw SNP genotype data, count-based summary statistics of genotype data are more readily available, so most meta-analyses of genetic associations have been performed using summary statistics from previously published studies. However, for researchers of meta-analyses, it can be challenging to perform the permutation test using only the summary statistics of the genotype when similarly high standards to TradPerm are expected.

Here, we developed a Monte Carlo permutation (MCPerm) method for use in the permutation test. MCPerm employs a two-step hypergeometric distribution to generate random genotype counts in cases and controls. We used these random genotype counts to construct the background distribution of meta-analysis P-values and to complete the permutation correction. Finally, MCPerm was evaluated using five sets of real data and five sets of simulation data.

Methods

Data

The five real data sets used to evaluate the MCPerm method were the AlzGene database (Field Synopsis of Genetic Association Studies in Alzheimer disease, www.alzgene.org), SZGene database (Field Synopsis of Genetic Association Studies in Schizophrenia, www.szgene.org), PDGene database (Field Synopsis of Genetic Association Studies in Parkinson’s Disease, www.pdgene.org), MSGene database (Field Synopsis of Genetic Association Studies in Multiple Sclerosis, www.msgene.org), and AlsGene database (Field Synopsis of Genetic Association Studies in Amyotrophic Lateral Sclerosis, www.alsgene.org) [6]–[9]. We used PERLscript to extract all records from the five databases. The following criteria were used to filter the data: (1) all records must have a case-control design; (2) all records must have SNP genotype counts for both cases and controls; (3) the minor allele frequency must be greater than 0.01 for single records; (4) the P-value of the Hardy-Weinberg equilibrium test in the control group must be greater than 0.001 for single records; (5) each SNP must be reported by at least four studies (which will be used in the meta-analysis). We also excluded family-based studies.

Each record included the following information: study ID, the first author’s name, the year of publication, ethnicity, SNP names, SNP alleles, SNP genotypes, and genotype counts for cases and controls. A total of 850 SNP loci were obtained for the meta-analysis, and each locus had at least four records. These included 287 SNP loci from the AlzGene database (a total of 2,408 records), 273 from the SZGene database (a total of 2,179 records), 182 from the PDGene database (a total of 1,625 records), 88 from the MSGene database (a total of 604 records), and 20 from the AlsGene database (a total of 161 records).

As TradPerm relies on individual-level data, we also constructed five simulation data sets. We generated the individual’s genotype (AA, Aa, or aa) for each record based on the genotype count in cases and controls. By shuffling these simulated data, we constructed the random background and completed the TradPerm correction.

Meta-analysis

For ease of understanding, we will first describe the meta-analysis method. Suppose there are two alleles, A (A is the risk allele) and a at SNP locus A. Three possible genotypes are denoted by AA, Aa, and aa. Suppose also that a meta-analysis includes Inline graphic previously published studies. Without loss of generality, we mainly considered the allele model in the present study (the effect of the A allele vs. the a allele); however, we also provided some findings about the dominant and recessive models, and used to measure the effect size. To carry out a meta-analysis, we first tested for heterogeneity between studies. The commonly used indicators are Cochran’s Q-statistics and I ² [16], [17]. In this study, we used Cochran’s Q-statistics to evaluate heterogeneity. The null hypothesis was that the n studies have the same effect, and the significance level was Inline graphic . It should be noted that the multiple testing problem ought to be considered for the heterogeneity test in the meta-analyses of multiple loci, although few researchers do so. We provided a permutation correction method for the heterogeneity test, and, after testing for heterogeneity, we combined the effect sizes and evaluated the statistical association between genotype and phenotype. If heterogeneity existed ( Inline graphic ), we used a random effects model (assuming that each study has a specific effect size) to combine the effect sizes. If heterogeneity did not exist (), we used a fixed effects model (assuming that all studies share a common effect size) to combine the effect sizes [18].

TradPerm

For multiple SNP loci, the problem of multiple testing correction should be considered, so we describe below the TradPerm method for multiple testing corrections.

Suppose that we have obtained the original SNP genotype and phenotype data (individual-level data) from n previously published studies (Figure 1a). Each study has a different sample size. For each study, we can count the number of each of the three genotypes (AA, Aa, and aa) in the cases and controls (Figure 1c). Using these genotype counts, we can perform a meta-analysis and obtain a meta-analysis P-value ( Inline graphic ), then carry out a TradPerm procedure to correct the..

(a) Real individual-level genotype. (b) Shuffled individual-level genotype. (c) Genotype count from real SNP genotype. (d) Genotype count from random SNP genotype. (e) Completing meta-analysis. (f) Calculating P-values of meta-analysis (real P-values and N random P-values). (g) Ranking all P-values. (h) Calculating the corrected P-value.

The principle of TradPerm is if there is no association between genotypes and phenotypes, then any of the individual phenotype labels may be associated with any one of the genotypes. Therefore, the background distribution of the meta-analysis P-value can be generated by randomly shuffling the phenotype labels of cases and controls. Specifically, for each study included in the meta-analysis, we shuffled the phenotype labels (Figure 1b) and re-counted the number of the three genotypes in cases and controls (Figure 1d). Using these random genotype counts, we performed a meta-analysis (Figure 1e) and obtained a random meta-analysis P-value ( Inline graphic ). The entire process was repeated N times to obtain , ,, (Figure 1f). If there is a significant association between genotypes and phenotypes, the calculated using the real data will appear small relative to the distribution of the P-values obtained under permutation (Figure 1g). In other words, if the association does not exist, the Inline graphic for real data is unlikely to be obtained. The TradPerm P-value () under the null hypothesis can be calculated as the proportion of P-values under permutation that are less than or equal to (Figure 1h). is defined as follows: [19].

where Inline graphic is the real meta-analysis P-value, and is the P-value generated by permutation. Similarly, the P-value of the heterogeneity test can be corrected by shuffling the phenotype labels of cases and controls.

MCPerm

As described above, TradPerm relies on individual-level data which are very difficult to obtain. Therefore, our MCPerm method only uses count-based summary genotype data. MCPerm employs a two-step hypergeometric distribution to generate the random genotype counts in cases and controls.

Step 1: Simulate genotype count of AA in cases

First, we will consider one study in a meta-analysis. We suppose that there are a total of Inline graphic samples, including cases and controls in this study (Table 1). Consider locus A, where the counts of the three genotypes in cases and controls were denoted as: for AA in cases, for Aa in cases, for aa in cases, for AA in controls, for Aa in controls, and for aa in controls. The total number of counts of the three genotypes in all samples was denoted as: Inline graphic for AA, for Aa, and for aa. Next, we generated the genotype counts, , by stochastic simulation.

Table 1. The genotypic table.

	AA	Aa	Aa	Total
Case
Control
Total

Open in a new tab

First, we will introduce the hypergeometric distribution in statistics. Suppose an urn contains Inline graphic balls of which are black and are white. Consider an experiment in which balls are drawn without replacement from the run. The number of black balls in the sample of obeys a hypergeometric distribution: .

Now we will consider the entire TradPerm process. There are a total of Inline graphic samples of which are cases and are controls (where ). When we randomly shuffle the phenotype labels of cases and controls, the count of genotype AA in all samples is unchanged (that is, is a constant). This is equivalent to randomly drawing samples without replacement. Therefore, the count of genotype AA in case samples obeys a hypergeometric distribution Inline graphic ; i.e.,

graphic file with name pone.0089212.e071.jpg

Finally, we used the hypergeometric random number to simulate the count of genotype AA in case samples (that is, Inline graphic ). Kachitvichyanukul’s method [20] was used to generate the hypergeometric random number. This can be achieved by the function “rhyper” in R language. Because is a constant when we randomly shuffle, we can deduce that (where is a hypergeometric random number generated by Kachitvichyanukul’s method).

Step 2: Simulate genotype count of Aa in cases

At this stage, we have fixed the counts of genotype AA in case and control samples. As randomly shuffling is equivalent to randomly sampling without replacement, the remaining count of samples is Inline graphic . When we randomly shuffle the phenotype labels, the count of genotype Aa remains unchanged (that is, is a constant). This is equivalent to randomly drawing samples from the remaining samples without replacement. Therefore, the count of genotype Aa in case samples also obeys the hypergeometric distribution Inline graphic ; i.e.,

graphic file with name pone.0089212.e081.jpg

Therefore, we still employ Kachitvichyanukul’s method [20] to generate the hypergeometric random number. Because Inline graphic is a constant, we can deduce (where is a hypergeometric random number generated by Kachitvichyanukul’s method).

Finally, we can deduce Inline graphic and . Table 2 summarizes the methods used to simulate the six genotype counts.

Table 2. Summary of simulated methods of six genotype counts.

genotypes	counts	generationmethods
AA in case		hypergeometric random number generated in step1
Aa in case		hypergeometric random number generated in step2
aa in case
AA in control
Aa in control
aa in control
total

Open in a new tab

MCPerm procedure

The MCPerm P-value was calculated in the same way as for TradPerm. Specifically, for each study, we randomly generated genotype counts in cases and controls (Figure 2b). We then carried out a meta-analysis (Figure 2c) and obtained a random meta-analysis P-value ( Inline graphic ). The entire process was repeated times, resulting in random P-values (Figures 2d and 2e). The MCPerm P-value () can also be calculated as the proportion of P-values under simulation that are less than or equal to (Figure 2f). Similarly, the P-value of the heterogeneity test can be corrected by simulating the genotype counts of cases and controls.

(a) Genotype count collected from previous studies. (b) Genotype count generated by two-step hypergeometric distribution. (c) Completing meta-analysis. (d) Calculating P-values of meta-analysis (real P-values and N random P-values). (e) Ranking all P-values. (f) Calculating the corrected P-value.

Results

As TradPerm is often treated as the gold standard for multiple testing corrections, we did not compare our MCPerm with other multiple testing corrections methods. Instead, we compared the consistency of genotype distributions and permutation P-values between TradPerm and MCPerm. Our aim was to illustrate that MCPerm does not change the size of TradPerm P-values, so is a useful alternative to TradPerm.

Comparing Six Genotype Distributions between MCPerm and TradPerm

As an example, we randomly selected one SNP, rs778294, from the SZGene database. There were a total of 12 studies about rs778294 in the SZ database, and we selected one ( [21]) to compare six genotype distributions between MCPerm and TradPerm. We generated the individual’s genotype for this study based on the genotype counts in cases and controls. First, we used TradPerm to shuffle the sample labels 1,000 times and calculated the counts of six genotypes for each shuffle. Then we used MCPerm to generate the counts of the six genotypes (again 1,000 times). Figure 3 shows the distributions of the genotype counts, and we can see that the six genotype counts under MCPerm had similar distributions to genotype counts under TradPerm. This can also be seen from the cumulative distribution curve (Figure S1).

AA, Aa, aa in cases, and AA, Aa, and aa in controls. Gray solid bars represent the distributions obtained under TradPerm, and black hollow bars represent the distributions generated by MCPerm. The distributions are the same between MCPerm and TradPerm.

We next used the Kolmogorov-Smirnov test (KS-test) to compare genotype distributions between MCPerm and TradPerm. The null hypothesis was that the genotype counts had the same distribution between MCPerm and TradPerm, and this null hypothesis was not rejected after performing the KS-test in all six tests. The six KS-test P-values were P = 0.069 for AA genotypes in cases, P = 0.148 for Aa genotypes in cases, P = 0.828 for aa genotypes in cases, P = 0.069 for AA genotypes in controls, P = 0.148 for Aa genotypes in controls, and P = 0.828 for aa genotypes in controls. Because Inline graphic , the AA genotypes in the cases had the same KS-test P-value as the AA genotypes in the controls (P = 0.999). Similarly, Aa and aa genotypes in the cases had the same KS-test P-values as the controls. This indicated that the six genotype counts under MCPerm had the same distributions as genotype counts under TradPerm.

We also used the quantile-quantile plot (QQ-plot) to evaluate the consistency of the genotype distributions between MCPerm and TradPerm. This plotted the quantiles of the genotype counts generated by MCPerm vs. the quantiles of the genotype counts calculated by TradPerm (from 1,000 permutations). Figure 4 shows the QQ-plots of the six genotypes which were all linear, indicating that the genotype distributions are the same between MCPerm and TradPerm.

AA, Aa, and aa in cases, AA, Aa, and aa in controls.

In addition to genotype distributions, we also evaluated the distributions of the allele model (A allele vs. a allele, Figure S2), the dominant model (AA+Aa vs. aa genotypes, Figure S3), and the recessive model (AA vs. Aa+aa genotypes, Figure S4), and again observed the same distributions between MCPerm and TradPerm.

Comparing Meta-analysis P-values between MCPerm and TradPerm

In this section, we will describe the consistency of meta-analysis P-values between TradPerm and MCPerm. For each of the 850 SNPs, we carried out 1,000 permutations and calculated the permutation P-values of the meta-analysis (allele model) using TradPerm and MCPerm. For SNP rs778294, the probability density (Figure 5A), cumulative distribution curve (Figure 5B), and QQ-plot (Figure 5C) showed that permutation P-values had the same distributions between MCPerm and TradPerm. The same conclusions were obtained for both the dominant model and recessive model (Figures S5 and S6), and for the other 849 SNP loci. The comparison results for all 850 loci are shown in Supporting Information S1, which can be downloaded at http://www.bioapp.org/research/MCPerm/index.html.

(a) Probability density. (b) Cumulative distribution curve. (c) QQ-plot.

We next plotted a scatter plot of 850 MCPerm meta-analysis P-values against 850 TradPerm meta-analysis P-values (Figure 6). This revealed a highly linear relationship between MCPerm P-values and TradPerm P-values. We also calculated the Pearson’s correlation coefficient of the correlation between MCPerm P-values and TradPerm P-values (r = 0.999), and the correlation test P-value was <2.2e-16. These findings indicate that the MCPerm P-values are highly consistent with TradPerm P-values. The same conclusions were obtained for both the dominant model and recessive model.

The MCPerm P-values are highly consistent with TradPerm P-values (*r =* 0.999; P<2.2e-16).

Comparing Heterogeneity Test P-values between MCPerm and TradPerm

For the 850 SNPs, we compared the consistency of heterogeneity test P-values. For SNP rs778294, the probability density, cumulative distribution curve, and QQ-plot results revealed the same distributions of heterogeneity test P-values between MCPerm and TradPerm (Figures S7, S8, and S9). The comparison results for all 850 loci are shown in Supporting Information S1. The correlation test (r = 0.999, P-value <2.2e-16) also showed a high consistency of heterogeneity test P-values between MCPerm and TradPerm (Figure S10).

Estimating the Variance of lnOR and a Permutation Box Plot

In general, for a single study, the variance of Inline graphic (in an allele model, dominant model, or recessive model) is estimated using the following formula: , where a, b, c, and d represent the observed counts in fourfold tables [22]. The variance of is used as the weight of a single study when multiple studies are combined in a meta-analysis. Obviously, different observed counts cause different estimations of the variance of Inline graphic , and over-reliance on the original observed counts will affect the stability of the results. An advantage of MCPerm is that we can simulate the background of by a two-step hypergeometric random number. This enables us to directly calculate the variance of using the simulatedvalues, thus increasing the reliability of the meta-analysis results.

For each study in a meta-analysis, researchers expect to see the position of a real Inline graphic in a random background. To represent this, we designed a permutation box plot as shown for SNP rs778294 in Figure 7a. This allowed us to readily compare the real with the random background. In addition, for each study, we plotted the distribution of the random background and the position of the real Inline graphic (Figure 7b) to deepen our understanding of the meta-analysis.

(a) Permutation box plot. (b) Permutation probability density.

Comparing Running Time between MCPerm and TradPerm

We carried out 1,000 permutations for all 850 SNP loci on a 3.2 GHz desktop PC with a 2G memory running the Windows XP system. TradPerm took about 6 h of computation time, while MCPerm only required 18 min of computation time. This suggests that, for the 850 SNPs examined, MCPerm is about 20 times faster than TradPerm.

We further simulated three genome-wide data sets (100,000, 500,000, and 1,000,000 SNPs) to compare the running time of the two methods. Each data set included 10 genome-wide association studies. We then carried out 1,000 permutations of the three data sets. For 100,000 SNPs, 500,000 SNPs, and 1,000,000 SNPs, MCPerm required approximately 1.5 days, 6–7 days, and 13.5 days of computation time, respectively. The running time was not tested for TradPerm as it would take too long (more than 40 days for 100,000 SNPs).

In addition, for MCPerm, the genotype counts were directly simulated rather than calculated from the shuffled data, so the computation time did not depend on the sample size of a single study. This is directly relevant for a meta-analysis, as the number of studies affects the computation time.

MCPerm R Package

To facilitate the use of MCPerm, we developed a freely available R package, named MCPerm. The package has detailed instructions and examples and has been uploaded to the Comprehensive R Archive Network (CRAN). Users can download the package at the following website: http://cran.r-project.org/web/packages/MCPerm/index.html. It can be run using both Linux and the Windows environment. The current version is v1.1.4. There are a total of 45 functions that facilitate the implementation of the meta-analysis, TradPerm, and MCPerm. A detailed list of functions is given in Table S1.

Discussion

Multiple testing is a challenging issue in SNP-based meta-analysis. Among the many multiple testing correction methods, TradPerm is usually considered to be the gold standard [4], [5]. It estimates the background distribution of test P-values by shuffling phenotype labels, and thus is an accurate correcting method. However, the over-reliance on raw SNP genotype data and the large amount of computation are two obstacles that limit the scope of TradPerm. To overcome these limitations, we developed a MCPerm method that only uses the summary statistics of genotypes to perform the permutation.

MCPerm simulated the random genotype counts in cases and controls using a two-step hypergeometric distribution. We used five real data sets and five simulation data sets to prove that the genotype count distributions are the same between MCPerm and TradPerm. Furthermore, we showed that MCPerm can obtain the same size of permutation P-values of meta-analysis as TradPerm, and take less computation time than TradPerm to do so.

For the meta-analysis of a single SNP locus, we suggest that a MCPerm procedure is carried out as the background distribution of Inline graphic and the position of in the background (permutation box plot) will deepen our understanding of the analytical results. In addition, the procedure will enable the variance to be directly calculated from the simulatedvalues, increasing the reliability of the meta-analysis.

With the arrival of the era of post-association studies, two-step hypergeometric random numbers in MCPerm may have broader application prospects than expected. In some gene-based and pathway-based association studies, researchers combine P-values for SNPs into an overall P-value for a gene or pathway [23]. In such studies, the correction of P-values for SNPs is essential. However, they are usually undertaken by bioinformatics researchers who are not always able to collect raw SNP genotype data because of privacy policies. The two-step hypergeometric random number in MCPerm will aid these researchers to complete their permutation corrections using only the summary statistics of the genotype data such as those provided by the Wellcome Trust Case Control Consortium [24]. In conclusion, we hope that MCPerm will be widely used in genome-wide association studies and genome-wide studies of meta-analysis.

Supporting Information

Figure S1

Cumulative distribution curve of six genotype counts.

(JPG)

Click here for additional data file.^{(2.4MB, jpg)}

Figure S2

Comparison of the distributions of the allele model ( A allele vs. a allele).

(JPG)

Click here for additional data file.^{(3.7MB, jpg)}

Figure S3

Comparison of the distributions of the dominant model ( AA+Aa vs. aa genotypes).

(JPG)

Click here for additional data file.^{(3.4MB, jpg)}

Figure S4

Comparison of the distributions of the recessive model ( AA vs. Aa+aa genotypes).

(JPG)

Click here for additional data file.^{(3.4MB, jpg)}

Figure S5

Comparison of meta-analysis P -values of the dominant model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S6

Comparison of meta-analysis P -values of the recessive model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S7

Comparison of heterogeneity test P -values of the allele model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S8

Comparison of heterogeneity test P -values of the dominant model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S9

Comparison of heterogeneity test P -values of the recessive model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S10

Scatter plots of 850 MCPerm heterogeneity test P -values against 850 TradPerm heterogeneity test P -values. The MCPerm P-values are highly consistent with TradPerm P-values (r = 0.999; P<2.2e-16)

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Table S1

The functions in MCPerm R package (v 1.1.4).

(DOC)

Click here for additional data file.^{(85.5KB, doc)}

Supporting Information S1

http://www.bioapp.org/research/MCPerm/index.html.

Click here for additional data file.^{(339B, html)}

Funding Statement

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 81172842, 31200934) and Natural Science Foundation of Heilongjiang Province (Grant No. C201206). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.Green S (2005) Systematic reviews and meta-analysis. Singapore Med J 46: 270–273; quiz 274. [PubMed]
2. Engberg S (2008) Systematic reviews and meta-analysis: studies of studies. J Wound Ostomy Continence Nurs 35: 258–265. [DOI] [PubMed] [Google Scholar]
3. Haerian BS, Haerian MS (2013) OPRM1 rs1799971 polymorphism and opioid dependence: evidence from a meta-analysis. Pharmacogenomics 14: 813–824. [DOI] [PubMed] [Google Scholar]
4.Wu Z, Wang H, Chu X, Chen J, Fang S (2013) Association between FAS-1377 G/A polymorphism and susceptibility to gastric cancer: evidence from a meta-analysis. Tumour Biol. [DOI] [PubMed]
5.Liu G, Zhang S, Cai Z, Li Y, Cui L, et al.. (2013) BIN1 gene rs744373 polymorphism contributes to Alzheimer’s disease in East Asian population. Neurosci Lett. [DOI] [PubMed]
6. Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, et al. (2008) Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet 40: 827–834. [DOI] [PubMed] [Google Scholar]
7. Lill CM, Abel O, Bertram L, Al-Chalabi A (2011) Keeping up with genetic discoveries in amyotrophic lateral sclerosis: the ALSoD and ALSGene databases. Amyotroph Lateral Scler 12: 238–249. [DOI] [PubMed] [Google Scholar]
8. Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, et al. (2012) Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet 8: e1002548. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007) Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39: 17–23. [DOI] [PubMed] [Google Scholar]
10. Sidak Z (1967) Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 62: 626–633. [Google Scholar]
11. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I (2001) Controlling the false discovery rate in behavior genetics research. Behav Brain Res 125: 279–284. [DOI] [PubMed] [Google Scholar]
12.Westfall PH, Young SS (1993) Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. New York: John Wiley & Sons, Inc.
13. Pahl R, Schafer H (2010) PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing. Bioinformatics 26: 2093–2100. [DOI] [PubMed] [Google Scholar]
14. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Gao X, Starmer J, Martin ER (2008) A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol 32: 361–369. [DOI] [PubMed] [Google Scholar]
16. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J (2006) Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 11: 193–206. [DOI] [PubMed] [Google Scholar]
17. Ioannidis JP, Patsopoulos NA, Evangelou E (2007) Heterogeneity in meta-analyses of genome-wide association investigations. PLoS One 2: e841. [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Lau J, Ioannidis JP, Schmid CH (1997) Quantitative synthesis in systematic reviews. Ann Intern Med 127: 820–826. [DOI] [PubMed] [Google Scholar]
19. Anderson MJ (2001) Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries & Aquatic Sciences 58: 626–639. [Google Scholar]
20. Kachitvichyanukul V, Schmeiser B (1985) Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation 22: 127–145. [Google Scholar]
21. Bass NJ, Datta SR, McQuillin A, Puri V, Choudhury K, et al. (2009) Evidence for the association of the DAOA (G72) gene with schizophrenia and bipolar disorder but not for the association of the DAO gene with schizophrenia. Behav Brain Funct 5: 28. [DOI] [PMC free article] [PubMed] [Google Scholar]
22. Hauck WW (1979) The large-sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics 35: 817–819. [Google Scholar]
23. Peng G, Luo L, Siu H, Zhu Y, Hu P, et al. (2010) Gene and pathway-based second-wave analysis of genome-wide association studies. Eur J Hum Genet 18: 111–117. [DOI] [PMC free article] [PubMed] [Google Scholar]
24. WTCCC (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Figure S1

Cumulative distribution curve of six genotype counts.

(JPG)

Click here for additional data file.^{(2.4MB, jpg)}

Figure S2

Comparison of the distributions of the allele model ( A allele vs. a allele).

(JPG)

Click here for additional data file.^{(3.7MB, jpg)}

Figure S3

Comparison of the distributions of the dominant model ( AA+Aa vs. aa genotypes).

(JPG)

Click here for additional data file.^{(3.4MB, jpg)}

Figure S4

Comparison of the distributions of the recessive model ( AA vs. Aa+aa genotypes).

(JPG)

Click here for additional data file.^{(3.4MB, jpg)}

Figure S5

Comparison of meta-analysis P -values of the dominant model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S6

Comparison of meta-analysis P -values of the recessive model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S7

Comparison of heterogeneity test P -values of the allele model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S8

Comparison of heterogeneity test P -values of the dominant model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S9

Comparison of heterogeneity test P -values of the recessive model.

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Figure S10

(JPG)

Click here for additional data file.^{(1.4MB, jpg)}

Table S1

The functions in MCPerm R package (v 1.1.4).

(DOC)

Click here for additional data file.^{(85.5KB, doc)}

Supporting Information S1

http://www.bioapp.org/research/MCPerm/index.html.

Click here for additional data file.^{(339B, html)}

[pone.0089212-Green1] 1.Green S (2005) Systematic reviews and meta-analysis. Singapore Med J 46: 270–273; quiz 274. [PubMed]

[pone.0089212-Engberg1] 2. Engberg S (2008) Systematic reviews and meta-analysis: studies of studies. J Wound Ostomy Continence Nurs 35: 258–265. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Haerian1] 3. Haerian BS, Haerian MS (2013) OPRM1 rs1799971 polymorphism and opioid dependence: evidence from a meta-analysis. Pharmacogenomics 14: 813–824. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Wu1] 4.Wu Z, Wang H, Chu X, Chen J, Fang S (2013) Association between FAS-1377 G/A polymorphism and susceptibility to gastric cancer: evidence from a meta-analysis. Tumour Biol. [DOI] [PubMed]

[pone.0089212-Liu1] 5.Liu G, Zhang S, Cai Z, Li Y, Cui L, et al.. (2013) BIN1 gene rs744373 polymorphism contributes to Alzheimer’s disease in East Asian population. Neurosci Lett. [DOI] [PubMed]

[pone.0089212-Allen1] 6. Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, et al. (2008) Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet 40: 827–834. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Lill1] 7. Lill CM, Abel O, Bertram L, Al-Chalabi A (2011) Keeping up with genetic discoveries in amyotrophic lateral sclerosis: the ALSoD and ALSGene databases. Amyotroph Lateral Scler 12: 238–249. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Lill2] 8. Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, et al. (2012) Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genet 8: e1002548. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0089212-Bertram1] 9. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007) Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39: 17–23. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Sidak1] 10. Sidak Z (1967) Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 62: 626–633. [Google Scholar]

[pone.0089212-Benjamini1] 11. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I (2001) Controlling the false discovery rate in behavior genetics research. Behav Brain Res 125: 279–284. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Westfall1] 12.Westfall PH, Young SS (1993) Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. New York: John Wiley & Sons, Inc.

[pone.0089212-Pahl1] 13. Pahl R, Schafer H (2010) PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing. Bioinformatics 26: 2093–2100. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Purcell1] 14. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0089212-Gao1] 15. Gao X, Starmer J, Martin ER (2008) A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol 32: 361–369. [DOI] [PubMed] [Google Scholar]

[pone.0089212-HuedoMedina1] 16. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J (2006) Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 11: 193–206. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Ioannidis1] 17. Ioannidis JP, Patsopoulos NA, Evangelou E (2007) Heterogeneity in meta-analyses of genome-wide association investigations. PLoS One 2: e841. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0089212-Lau1] 18. Lau J, Ioannidis JP, Schmid CH (1997) Quantitative synthesis in systematic reviews. Ann Intern Med 127: 820–826. [DOI] [PubMed] [Google Scholar]

[pone.0089212-Anderson1] 19. Anderson MJ (2001) Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries & Aquatic Sciences 58: 626–639. [Google Scholar]

[pone.0089212-Kachitvichyanukul1] 20. Kachitvichyanukul V, Schmeiser B (1985) Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation 22: 127–145. [Google Scholar]

[pone.0089212-Bass1] 21. Bass NJ, Datta SR, McQuillin A, Puri V, Choudhury K, et al. (2009) Evidence for the association of the DAOA (G72) gene with schizophrenia and bipolar disorder but not for the association of the DAO gene with schizophrenia. Behav Brain Funct 5: 28. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0089212-Hauck1] 22. Hauck WW (1979) The large-sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics 35: 817–819. [Google Scholar]

[pone.0089212-Peng1] 23. Peng G, Luo L, Siu H, Zhu Y, Hu P, et al. (2010) Gene and pathway-based second-wave analysis of genome-wide association studies. Eur J Hum Genet 18: 111–117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0089212-WTCCC1] 24. WTCCC (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

MCPerm: A Monte Carlo Permutation Method for Accurately Correcting the Multiple Testing in a Meta-Analysis of Genetic Association Studies

Yongshuai Jiang

Lanying Zhang

Fanwu Kong

Mingming Zhang

Hongchao Lv

Guiyou Liu

Mingzhi Liao

Rennan Feng

Jin Li

Ruijie Zhang

Roles

Abstract

Introduction

Methods

Data

Meta-analysis

TradPerm

Figure 1. TradPerm method.

MCPerm

Step 1: Simulate genotype count of AA in cases

Table 1. The genotypic table.

Step 2: Simulate genotype count of Aa in cases

Table 2. Summary of simulated methods of six genotype counts.

MCPerm procedure

Figure 2. MCPerm method.

Results

Comparing Six Genotype Distributions between MCPerm and TradPerm

Figure 3. Distributions of the six genotype counts.

Figure 4. QQ-plots of six genotype counts.

Comparing Meta-analysis P-values between MCPerm and TradPerm

Figure 5. Comparison of meta-analysis P-values between MCPerm and TradPerm.

Figure 6. Scatter plots of 850 MCPerm vs. 850 TradPerm meta-analysis P-values.

Comparing Heterogeneity Test P-values between MCPerm and TradPerm

Estimating the Variance of lnOR and a Permutation Box Plot

Figure 7. Real LnOR in a random background.

Comparing Running Time between MCPerm and TradPerm

MCPerm R Package

Discussion

Supporting Information

Funding Statement

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases