Author manuscript; available in PMC: 2018 Apr 27.
Published in final edited form as: Proc IEEE Inst Electr Electron Eng. 2016 Mar 31;105(3):496–515. doi: 10.1109/jproc.2015.2507119

DANUBE: Data-driven meta-ANalysis using UnBiased Empirical distributions—applied to biological pathway analysis

Tin Nguyen 1, Cristina Mitrea 2, Rebecca Tagett 3, Sorin Draghici 4
PMCID: PMC5919277  NIHMSID: NIHMS854489  PMID: 29706661

Abstract

Identifying the pathways and mechanisms that are significantly impacted in a given phenotype is challenging. Issues include patient heterogeneity and noise. Many experiments do not have a large enough sample size to achieve the statistical power necessary to identify significantly impacted pathways. Meta-analysis based on combining p-values from individual experiments has been used to improve power. However, all classical meta-analysis approaches work under the assumption that the p-values produced by experiment-level statistical tests follow a uniform distribution under the null hypothesis. Here we show that this assumption does not hold for three mainstream pathway analysis methods, and significant bias is likely to affect many, if not all such meta-analysis studies. We introduce DANUBE, a novel and unbiased approach to combine statistics computed from individual studies. Our framework uses control samples to construct empirical null distributions, from which empirical p-values of individual studies are calculated and combined using either a Central Limit Theorem approach or the additive method. We assess the performance of DANUBE using four different pathway analysis methods. DANUBE is compared with five meta-analysis approaches, as well as with a pathway analysis approach that employs multiple datasets (MetaPath). The 25 approaches have been tested on 16 different datasets related to two human diseases, Alzheimer’s disease (7 datasets) and acute myeloid leukemia (9 datasets). We demonstrate that DANUBE overcomes bias in order to consistently identify relevant pathways. We also show how the framework improves results in more general cases, compared to classical meta-analysis performed with common experiment-level statistical tests such as Wilcoxon and t-test.

Index Terms: meta-analysis, p-values, empirical distribution, pathway analysis, Alzheimer’s disease, acute myeloid leukemia

I. Introduction

The proliferation of high-throughput genomics technologies has resulted in an abundance of data, for many different biomedical conditions. Large public repositories such as Gene Expression Omnibus [1, 2], The Cancer Genome Atlas (cancergenome.nih.gov), ArrayExpress [3, 4], and Therapeutically Applicable Research to Generate Effective Treatments (ocg.cancer.gov/programs/target) store thousands of datasets, within which there are independent experimental series with similar patient cohorts and experiment design. Gene expression data, as measured by microarrays, are particularly prevalent in public databases, such that some disease conditions are represented by half a dozen studies or more.

Experiments comparing two phenotypes, such as disease and control, yield lists of genes that are differentially expressed (DE). However, lists of DE genes obtained from similar but independent experiments tend to have little in common, and taken alone, they usually fail to elucidate the underlying biological mechanisms. Effective meta-analysis approaches are needed to unify the biological knowledge spread out over such similar studies with apparently incongruent results.

The goal of meta-analysis is to combine the results of independent but related studies and provide increased statistical power and robustness compared to individual studies analyzed alone [5, 6]. In spite of the numerous sophisticated tools for meta-analysis, many biological applications still use only Venn diagrams (intersection/union) or vote counting for combining multiple studies [7, 8]. Such approaches are useful for demonstrating consistency when combining a few studies. However, when combining many studies, Venn diagrams are either too conservative (for intersection) or too anti-conservative (for union), while vote counting is statistically inefficient [5, 9, 10]. Regarding microarray data, meta-analysis has been used at both gene level [5, 7, 11–13] and pathway level [11, 14]. Pathway analysis [15–18] was developed to correlate differential gene expression evidence with a-priori defined functional modules, organized into biological pathway databases, such as Kyoto Encyclopedia of Genes and Genomes (KEGG) [19, 20], Reactome [21], Biocarta (www.biocarta.com), or Molecular Signatures Database (MSigDB) [22].

One straightforward and flexible way of integrating diverse studies is to combine the individual p-values provided by each study. Classical meta-analysis methods of combining p-values have been reviewed and compared in [23]. These include Fisher’s method based on the chi-squared distribution [24], the additive method [25] using the Irwin-Hall distribution [26, 27], minP [28], and maxP [29].

In an early study, Rhodes and others [13] collected multiple prostate cancer microarray datasets and combined p-values using Fisher’s method. Since then, other sophisticated approaches have been proposed including the weighted Fisher’s method [30] and the latent variable approach [31, 32].

The major drawback of the available p-value-based meta-analysis frameworks is that they work under the assumption that the p-values provided by the individual statistical tests follow a uniform distribution under the null hypothesis. Previous reports describe non-uniform distributions of p-values under the null as due to specific factors such as improper normalization, cross-hybridization, poorly characterized variance, and heteroskedasticity in microarray data analysis [33, 34], or even due to properties of some more general distributions [35]. Here we show that this assumption also does not hold in the realm of pathway analysis methods, severely compromising the reliability of the results. In addition to strong statistical assumptions, the current methods for combining p-values are sensitive to outliers. For example, using Fisher’s method, a p-value of zero in one individual case will result in a combined p-value of zero regardless of the other p-values. The same is true for the minP and maxP statistics, where outliers greatly influence the combined p-value.

Here we propose DANUBE (Data-driven meta-ANalysis using UnBiased Empirical distributions), a new meta-analysis framework that combines the p-values of multiple studies without assuming that they are uniformly distributed under the null hypothesis. Our contribution is two-fold. First, we use empirical null distributions to calculate p-values for individual studies. This approach learns from the data under the null hypothesis and compensates for any bias potentially introduced by an individual pathway analysis method. Second, we combine the individual p-values using a method based on the Central Limit Theorem. This is less sensitive to outliers and provides more reliable results. Our simulation experiments demonstrate that both type I and type II errors of DANUBE are lower than those of classical meta-analysis approaches using both parametric and non-parametric tests.

We apply DANUBE in the context of pathway analysis using 16 public gene expression datasets from two biological conditions, and 4 different pathway analysis methods. Gene Set Enrichment Analysis (GSEA) [36] and Gene Set Analysis (GSA) [37] are Functional Class Scoring methods [36–39]; Down-weighting of Overlapping Genes (PADOG) [38] is an enrichment method [40–42]; and Signaling Pathway Impact Analysis (SPIA) [43, 44] is a topology-aware method [43, 45]. These pathway analysis methods are applied on the human signaling pathways from KEGG [19, 20].

We show that, with the exception of GSEA, each of the other three methods (GSA, SPIA, and PADOG) has a different bias, leading to non-uniform distributions of p-values under the null hypothesis. Not surprisingly, when combining p-values using classical methods such as Fisher’s or the additive method, each of the three pathway analysis methods (GSA, SPIA, and PADOG) yields a very different list of significantly impacted pathways. We then apply the DANUBE framework using the empirical distributions characteristic to each of these methods. The DANUBE results yield much more consistent lists of significant pathways that are also pertinent to the phenotypes.

II. Background

We first recapitulate the classical methods of combining p-values, such as Fisher’s method [24] and the additive method [25–27]. We then demonstrate the shortcomings of existing approaches in pathway analysis.

A. Fisher’s method

Fisher’s method [24] is one of the most widely used methods for combining independent p-values. Considering a set of m independent significance tests, the resulting p-values P1, P2, …, Pm are independent and uniformly distributed on the interval [0, 1] under the null hypothesis. Denoting Xi = −2 ln Pi (i ∈ {1, 2, …, m}) as new random variables, the cumulative distribution function of Xi can be calculated as follows:

$$F_i(x) = \Pr(X_i \le x) = \Pr(-2\ln P_i \le x) = \Pr\left(P_i \ge e^{-x/2}\right) = \int_{e^{-x/2}}^{1} f(p)\,dp = 1 - e^{-x/2}$$

The above function is the cumulative distribution function of a chi-squared distribution with two degrees of freedom ($\chi^2_2$). Since the sum of independent chi-squared random variables is also a chi-squared random variable, $-2\sum_{i=1}^{m}\ln(P_i)$ follows a chi-squared distribution with 2m degrees of freedom ($\chi^2_{2m}$). In summary, minus twice the log of the product of m independent p-values follows a chi-squared distribution with 2m degrees of freedom:

$$X = -2\sum_{i=1}^{m}\ln(P_i) \sim \chi^2_{2m} \tag{1}$$

We note that if one of the individual p-values approaches zero, which is often the case for empirical p-values, then the combined p-value approaches zero as well, regardless of the other individual p-values. For example, if P1 → 0, then X → ∞ and therefore the combined p-value, i.e., the probability of a $\chi^2_{2m}$ variable being at least as large as X, approaches zero regardless of P2, P3, …, Pm. Fisher’s method is therefore sensitive to outliers.

In practice, most pathway analysis methods use some kind of permutation or bootstrap approach to construct an empirical distribution of a statistic under the null. For example, the empirical null distribution of the t statistic is ξt = {t1, t2, …, tN}. The empirical p-value calculated from such a distribution is the fraction of the statistics’ values in the N random trials performed that are more extreme than the observed one. Many times, there are no occurrences of values more extreme than the observed one, yielding an empirical p-value of zero. In this situation, the combined p-value calculated using Fisher’s method will be zero, even if all other p-values are equal to one. It is important to note that this phenomenon occurs because many methods choose to round the reported empirical p-value down to zero (when in fact, the real p-value is somewhere in the interval [0, 1/N]), and not because of the mathematical formulation of Fisher’s method.
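The behaviour described above can be checked numerically. The following is a minimal sketch, not the authors’ implementation: the function name `fisher_combine` is an illustrative choice, and the chi-squared tail probability is computed with the closed form that holds for an even number of degrees of freedom, so only the Python standard library is needed. The example contrasts an empirical p-value rounded down to zero with the same p-value set to 1/N, the upper end of the interval [0, 1/N] mentioned in the text, with N = 10,000 chosen only for illustration.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (Equation 1)."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        # X = -2 * sum(ln P_i) diverges, so the combined p-value is zero.
        return 0.0
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Upper tail of a chi-squared variable with 2m degrees of freedom:
    #   Pr(chi2_{2m} >= X) = exp(-X/2) * sum_{k=0}^{m-1} (X/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half_x / k
        total += term
    return math.exp(-half_x) * total

# One study reports an empirical p-value of zero; the other two report 1.0.
p_rounded_to_zero = fisher_combine([0.0, 1.0, 1.0])
# Same studies, but the first p-value is reported as 1/N (illustrative N).
p_with_floor = fisher_combine([1.0 / 10000, 1.0, 1.0])
```

In this sketch a single zero forces the combined p-value to zero even though the two other studies show no evidence at all, while reporting 1/N instead keeps the combined value strictly positive.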

B. Additive method

The additive method proposes an alternative approach that uses the sum of p-values instead of the log product. Consider m random variables P1, P2, …, Pm that are independent and uniformly distributed on the interval [0, 1]. Denoting $X = \sum_{i=1}^{m} P_i$ as a new random variable, then X follows the Irwin-Hall distribution [26, 27]. The cumulative distribution function of X can be calculated as follows:

$$F(x) = \frac{1}{2} + \frac{1}{2\,m!}\sum_{i=0}^{m}(-1)^i\binom{m}{i}(x-i)^m\,\mathrm{sgn}(x-i) \tag{2}$$

Using the above cumulative distribution function, we can calculate the probability of observing the sum $X = \sum_{i=1}^{m} P_i$. We note that the concept of the additive method was also presented in [25] with a slightly different formulation and proof than in [26, 27]. However, they are equivalent and can be transformed into one another.

The additive method is not as sensitive to extremely small individual p-values as Fisher’s method. However, both methods assume the uniformity of the p-values under the null hypothesis. We will show that this assumption does not hold for three mainstream pathway analysis methods. The inherent bias of these pathway analysis methods is most likely to affect the classical meta-analysis in most cases, and thus lead to systematic bias in identifying significant pathways.
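As an illustration of Equation (2), the sketch below evaluates the Irwin-Hall cumulative distribution function directly and uses it to combine p-values. The names `irwin_hall_cdf` and `additive_combine` are illustrative, not taken from any published package. Plain floating-point arithmetic is assumed, which is adequate for small m; for large m the alternating sum can lose precision.

```python
import math

def irwin_hall_cdf(x, m):
    """CDF of the sum of m independent Uniform(0, 1) variables (Equation 2)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if x <= 0.0:
        return 0.0
    if x >= m:
        return 1.0
    total = 0.0
    for i in range(m + 1):
        diff = x - i
        sign = (diff > 0) - (diff < 0)  # sgn(x - i)
        total += (-1) ** i * math.comb(m, i) * diff ** m * sign
    return 0.5 + total / (2 * math.factorial(m))

def additive_combine(p_values):
    """Combined p-value: probability that the sum of m uniform p-values is <= the observed sum."""
    return irwin_hall_cdf(sum(p_values), len(p_values))

# One p-value of zero among two p-values of one does not force the result to zero.
p_additive = additive_combine([0.0, 1.0, 1.0])
```

For the input above the observed sum is 2 with m = 3, so the combined value stays well away from zero, which reflects the lower sensitivity to a single extreme p-value described in the text.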

C. Pitfalls of the existing approaches

Null distributions are used to model populations so that statistical tests can determine whether an observation is unlikely to occur by chance. The p-values produced by a sound statistical test must be uniformly distributed in the interval [0,1] when the null hypothesis is true [33–35, 46]. For example, the p-values that result from comparing two groups using a t-test should be distributed uniformly if the data are normally distributed [35]. When the assumptions of statistical models do not hold, the resulting p-values are not uniformly distributed under the null hypothesis. We will demonstrate this fact using gene expression data and pathway analysis.

Using only the control samples from 7 publicly available Alzheimer’s datasets (N=74), we simulate 40,000 datasets as follows. We randomly label 37 as “control” samples and the remaining 37 as “disease” samples. We repeat this procedure 10,000 times to generate different groups of 37 control and 37 disease samples. To make the simulation more general, we also create 10,000 datasets consisting of 10 control and 10 disease samples, 10,000 datasets consisting of 10 control and 20 disease samples, and 10,000 datasets consisting of 20 control and 10 disease samples. We then calculate the p-values of the KEGG (version 65) human signaling pathways (extracted as graph objects by the R package ROntoTools [44], version 1.2.0) using the following methods: GSEA [36], GSA [37], SPIA [43, 44], and PADOG [38].
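The resampling scheme can be sketched as follows. This is an illustrative outline only: the sample identifiers are placeholders, the function name `simulate_null_splits` is hypothetical, and the number of repetitions is set to 5 per design instead of 10,000 so the example runs instantly. The pathway analysis step that would be applied to each simulated dataset is not shown.

```python
import random

def simulate_null_splits(pool, n_control, n_disease, k, seed=0):
    """Yield k random (control, disease) groupings drawn from pooled control samples."""
    if n_control + n_disease > len(pool):
        raise ValueError("not enough pooled samples for this design")
    rng = random.Random(seed)
    for _ in range(k):
        # Sampling without replacement keeps the two groups disjoint.
        chosen = rng.sample(pool, n_control + n_disease)
        yield chosen[:n_control], chosen[n_control:]

# Placeholder identifiers standing in for the 74 pooled control samples.
pool = [f"ctrl_{i:02d}" for i in range(74)]
# The four (control, disease) group sizes described in the text.
designs = [(37, 37), (10, 10), (10, 20), (20, 10)]
splits = [
    split
    for index, (n_c, n_d) in enumerate(designs)
    for split in simulate_null_splits(pool, n_c, n_d, k=5, seed=index)
]
```

Because every "disease" sample in a simulated dataset is in fact a control, any pathway p-value computed from such a split is obtained under the null hypothesis.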

Figure 1 displays the empirical null distributions of p-values using GSA, SPIA, and PADOG. The horizontal axes represent p-values while the vertical axes represent p-value densities. Blue panels (A0–A6) show p-value distributions from GSA, while purple (B0–B6) and green (C0–C6) panels show p-value distributions from SPIA and PADOG, respectively. For each method, the larger panel (A0, B0, and C0) shows the cumulative p-values from all KEGG signaling pathways. The small panels, 6 per method, display extreme examples of non-uniform p-value distributions for specific pathways. For each method, we show three distributions severely biased towards zero (e.g., A1–A3), and three distributions severely biased towards one (e.g., A4–A6).

Fig. 1.

The empirical null distributions of p-values using: Gene Set Analysis (GSA) - top, Signaling Pathway Impact Analysis (SPIA) - middle, and Down-weighting of Overlapping Genes (PADOG) - bottom. The distributions are generated by re-sampling from 74 control samples obtained from 7 public Alzheimer’s datasets. The horizontal axes display the p-values while the vertical axes display the p-value densities. Panels A0–A6 (blue) show the distributions of p-values from GSA; panels B0–B6 (purple) show the distribution of p-values from SPIA; panels C0–C6 (green) show the distribution of p-values from PADOG. The large panels on the left, A0, B0, and C0, display the distributions of p-values cumulated from all KEGG signaling pathways. The smaller panels on the right display the p-value distributions of selected individual pathways, which are extreme cases. For each method, the upper three distributions, for example A1–A3, are biased towards zero and the lower three distributions, for example A4–A6, are biased towards one. Since none of these p-value distributions are uniform, there will be systematic bias in identifying significant pathways using any one of the methods. Pathways that have p-values biased towards zero will often be falsely identified as significant (false positives). Likewise, pathways that have p-values biased towards one are more likely to be among false negative results even if they may be implicated in the given phenotype.

These results show that, contrary to generally accepted beliefs, the p-values are not uniformly distributed for three out of the four methods considered. Therefore one should expect a very strong and systematic bias in identifying significant pathways for each of these methods. Pathways that have p-values biased towards zero will often be falsely identified as significant (false positives). Likewise, pathways that have p-values biased towards one are likely to rarely meet the significance requirements, even when they are truly implicated in the given phenotype (false negatives). Systematic bias, due to non-uniformity of p-value distributions, results in failure of the statistical methods to correctly identify the biological pathways implicated in the condition, and also leads to inconsistent and incorrect results. For example, all three of the zero-biased GSA pathways shown in Figure 1: Prostate cancer (A1), Adherens junction (A2), and Pathways in cancer (A3), are reported as statistically significant in the results shown in Table I even though these data were collected in an experiment comparing Alzheimer’s disease patients vs. healthy subjects, an experiment that has nothing to do with cancer.

TABLE I.

The 17 top ranked pathways and FDR-corrected p-values obtained by combining the GSA p-values using 6 meta-analysis methods for Alzheimer’s disease. Stouffer’s method, the additive method, and DANUBE, identify the target pathway as significant and rank it in positions 11th, 6th, and 2nd, respectively. DANUBE yields the best ranking.

| Rank | GSA + Stouffer’s method: Pathway | pvalue.fdr | GSA + Z-method: Pathway | pvalue.fdr | GSA + Brown’s method: Pathway | pvalue.fdr |
|---|---|---|---|---|---|---|
| 1 | Vasopressin-regulated water reabsorption | $< 10^{-4}$ | Vasopressin-regulated water reabsorption | $< 10^{-4}$ | Vasopressin-regulated water reabsorption | $< 10^{-4}$ |
| 2 | Pathogenic Escherichia coli infection | $< 10^{-4}$ | Pathogenic Escherichia coli infection | $< 10^{-4}$ | Pathogenic Escherichia coli infection | $< 10^{-4}$ |
| 3 | Prostate cancer | $< 10^{-4}$ | Prostate cancer | 0.0307 | Prostate cancer | 0.0418 |
| 4 | Pathways in cancer | 0.0003 | Pathways in cancer | 0.1352 | Adherens junction | 0.1722 |
| 5 | Adherens junction | 0.0003 | Adherens junction | 0.1352 | Pathways in cancer | 0.1722 |
| 6 | Hippo signaling pathway | 0.0004 | Hippo signaling pathway | 0.1352 | Hippo signaling pathway | 0.1765 |
| 7 | Synaptic vesicle cycle | 0.0032 | Synaptic vesicle cycle | 0.2443 | Synaptic vesicle cycle | 0.2625 |
| 8 | Vibrio cholerae infection | 0.0032 | Vibrio cholerae infection | 0.2443 | Endocrine and other factor-regulated calcium reabsorption | 0.2625 |
| 9 | Endocrine and other factor-regulated calcium reabsorption | 0.0032 | Endocrine and other factor-regulated calcium reabsorption | 0.2443 | Vibrio cholerae infection | 0.2625 |
| 10 | Shigellosis | 0.0071 | Shigellosis | 0.2808 | Pancreatic cancer | 0.2625 |
| 11 | Alzheimer’s disease | 0.0073 | Alzheimer’s disease | 0.2808 | Focal adhesion | 0.2950 |
| 12 | Bacterial invasion of epithelial cells | 0.0073 | Bacterial invasion of epithelial cells | 0.2808 | Shigellosis | 0.3027 |
| 13 | Pancreatic cancer | 0.0095 | Pancreatic cancer | 0.2808 | Bacterial invasion of epithelial cells | 0.3034 |
| 14 | Focal adhesion | 0.0112 | Focal adhesion | 0.2808 | Notch signaling pathway | 0.3254 |
| 15 | Parkinson’s disease | 0.0112 | Parkinson’s disease | 0.2808 | Alzheimer’s disease | 0.3254 |
| 16 | Huntington’s disease | 0.0112 | Huntington’s disease | 0.2808 | HIF-1 signaling pathway | 0.3274 |
| 17 | Wnt signaling pathway | 0.0112 | Wnt signaling pathway | 0.2808 | SNARE interactions in vesicular transport | 0.3274 |

| Rank | GSA + Fisher’s method: Pathway | pvalue.fdr | GSA + Additive method: Pathway | pvalue.fdr | GSA + DANUBE: Pathway | pvalue.fdr |
|---|---|---|---|---|---|---|
| 1 | Vasopressin-regulated water reabsorption | $< 10^{-4}$ | Prostate cancer | $< 10^{-4}$ | Cardiac muscle contraction | 0.0014 |
| 2 | Pathogenic Escherichia coli infection | $< 10^{-4}$ | Pathways in cancer | 0.0002 | Alzheimer’s disease | 0.0014 |
| 3 | Prostate cancer | $< 10^{-4}$ | Hippo signaling pathway | 0.0005 | Huntington’s disease | 0.0014 |
| 4 | Adherens junction | 0.0019 | Adherens junction | 0.0015 | Parkinson’s disease | 0.0014 |
| 5 | Pathways in cancer | 0.0023 | Endocrine and other factor-regulated calcium reabsorption | 0.0042 | Hippo signaling pathway | 0.0025 |
| 6 | Hippo signaling pathway | 0.0030 | Alzheimer’s disease | 0.0042 | Vibrio cholerae infection | 0.0047 |
| 7 | Synaptic vesicle cycle | 0.0097 | Vibrio cholerae infection | 0.0057 | Synaptic vesicle cycle | 0.0081 |
| 8 | Vibrio cholerae infection | 0.0121 | Shigellosis | 0.0057 | Prostate cancer | 0.0112 |
| 9 | Endocrine and other factor-regulated calcium reabsorption | 0.0133 | Huntington’s disease | 0.0057 | Vasopressin-regulated water reabsorption | 0.0112 |
| 10 | Pancreatic cancer | 0.0133 | Bacterial invasion of epithelial cells | 0.0057 | Epithelial cell signaling in Helicobacter pylori infection | 0.0118 |
| 11 | Focal adhesion | 0.0190 | Parkinson’s disease | 0.0057 | Systemic lupus erythematosus | 0.0150 |
| 12 | Shigellosis | 0.0222 | Glioma | 0.0057 | Amyotrophic lateral sclerosis (ALS) | 0.0174 |
| 13 | Bacterial invasion of epithelial cells | 0.0245 | Vasopressin-regulated water reabsorption | 0.0057 | Shigellosis | 0.0193 |
| 14 | Alzheimer’s disease | 0.0334 | Cardiac muscle contraction | 0.0057 | Endocrine and other factor-regulated calcium reabsorption | 0.0193 |
| 15 | Notch signaling pathway | 0.0334 | Wnt signaling pathway | 0.0057 | Phagosome | 0.0302 |
| 16 | SNARE interactions in vesicular transport | 0.0465 | Synaptic vesicle cycle | 0.0057 | Lysosome | 0.0302 |
| 17 | Wnt signaling pathway | 0.0465 | Dorso-ventral axis formation | 0.0119 | Ribosome biogenesis in eukaryotes | 0.0302 |

The horizontal lines show the 1% significance threshold. The target pathway Alzheimer’s disease is highlighted in green. Pathways highlighted in red are examples of false positives. These pathways were expected to be reported as false positives because their null distributions are very skewed toward zero (see Figure 1 panels A1–A3 and Supplementary Figure S3). These include Adherens junction and several cancer-related pathways, which are not considered to be implicated in Alzheimer’s disease.

The effect of combining control (i.e. healthy) samples from different experiments is to uniformly distribute all sources of bias among the random groups of samples. If we compare groups of control samples based on experiments, there could be true differences due to batch effects. By pooling them together, we form a population which is considered the reference population. This approach is similar to selecting from a large group of people that may contain different sub-groups (e.g. different ethnicities, gender, race, or living conditions). When we randomly select samples (for the two random groups to be compared) from the reference population, we expect all bias (e.g. ethnic subgroups) to be represented equally in both random groups and therefore, we should see no difference between these random groups, no matter how many distinct ethnic subgroups were present in the population at large. Therefore, the p-values of a test for difference between the two randomly selected groups should be equally probable between zero and one (see Supplementary Section 4 and Figures S10–S11 for more discussion).

We apply this procedure for the popular Gene Set Enrichment Analysis (GSEA) [36] using the exact same 40,000 datasets simulated from the pool of control samples of Alzheimer’s data. The resulting p-value distributions are uniform, as displayed in Supplementary Figure S1, showing not only that our resampled data correctly models the null, but also that GSEA is an unbiased test. This supports the idea that the non-uniformity of the distributions is due to the methods rather than the data. We also plot the top 24 most biased null distributions of GSEA (Figure S2) using the exact same data and exact same random grouping of samples. In each figure, the panels are sorted by the distribution means. The distributions of GSEA (Figures S2, S6) are uniform while those of GSA (Figures S3, S7), SPIA (Figures S4, S8), and PADOG (Figures S5, S9) are biased. Therefore, the bias is indeed due to the methods and not to one specific pathway.

III. Methods

In this section we introduce the DANUBE framework and its application in the context of pathway analysis.

A. The DANUBE framework

We propose a new framework for meta-analysis that makes no assumptions about the data and is therefore expected to perform much better than any of the classical methods when the individual p-values are not distributed uniformly, as we have shown to be the case for the pathway analysis methods. Figure 2 displays a flowchart comparison between classical meta-analysis and DANUBE. Both approaches take m independent studies as input. The pipeline marked by blue arrows (I–II) shows the classical meta-analysis, and the one marked by black arrows (1–4) is DANUBE.

Fig. 2.

The DANUBE framework for meta-analysis. The blue arrows (I and II) show the classical meta-analysis pipeline while black arrows (1–4) show the pipeline of DANUBE. The first step (I) of the classical approach is to perform a parametric or non-parametric test for each study. This step provides individual p-values which are independent and identically distributed (i.i.d.), but not necessarily uniformly distributed under the null, as shown in Fig. 1. The second step (II) of the classical approach is to use a classical method, such as Fisher’s, to combine the individual p-values, relying heavily on the assumption of uniformity under the null. In step (1) of DANUBE, we choose the discriminating statistic and calculate the values of this statistic in each study (t1, t2, …, tm). In step (2), we generate the empirical distribution ξT of the discriminating statistic under the null hypothesis. In step (3), we calculate the probability of observing t1, t2, …, tm using ξT. In step (4), we combine the m empirical p-values using either the additive method or the Central Limit Theorem (CLT).

The classical approach first calculates a p-value for each study using a parametric or non-parametric test, then it combines the individual p-values into one. The main limitation of the classical approach is that it relies on the assumption of uniformity of the p-values under the null hypothesis, which often does not hold true. As shown in Figure 1, this assumption is not true for real transcriptomics data and KEGG pathways.

In the DANUBE framework, instead of modeling the data under a specific assumption, we construct empirical distributions and use them to calculate empirical p-values. Following the black arrows (1–4) in Figure 2, we initially calculate the values t1, t2, …, tm of the discriminating statistic for the m studies in step (1). For example, instead of using a statistical test to directly calculate the p-values, we could calculate the means of the data samples over the m studies. In step (2), we construct the empirical null distribution ξT for the chosen statistic. In step (3), we calculate the empirical p-values ep1, ep2, …, epm for the m studies with respect to the empirical null distribution ξT. For all i ∈ {1, 2, …, m}, epi is calculated as the number of elements in ξT more extreme than ti, divided by the total number of elements in ξT. We will prove that the resulting empirical p-values are uniformly distributed under the null hypothesis.

Lemma 1

Let T be a random variable with the empirical distribution ξT and the cumulative distribution function FT (T). We define the new random variable X as follows:

$$X = \frac{|\{x : x \in \xi_T \wedge x \le T\}|}{|\xi_T|} \tag{3}$$

where the numerator represents the number of elements of ξT that are smaller than or equal to T. If ξT consists of enough data points to be considered as continuous, then X is uniformly distributed on the interval [0,1].

Proof

Denote $F_T$ as the cumulative distribution function of T. For any value $t \in \xi_T$, $F_T(t)$ can be calculated as follows:

$$F_T(t) = \frac{|\{x : x \in \xi_T \wedge x \le t\}|}{|\xi_T|} \tag{4}$$

We can see that $X = F_T(T)$. In addition, $F_T(t)$ is a strictly increasing function for all values $t \in \xi_T$. Let $F_X$ be the cumulative distribution function of X. Writing $x = F_T(t)$, we have:

$$F_X(x) = \Pr(X \le x) = \Pr(F_T(T) \le F_T(t)) = \Pr(T \le t) = F_T(t) = x \tag{5}$$

We note that FX(x) = x is the cumulative distribution function of the continuous uniform distribution on [0,1]. Therefore, if we have enough data for FT(T) to be considered continuous, then X will be a uniformly distributed random variable.     ■
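Lemma 1 can also be checked empirically. The sketch below is an illustration under stated assumptions, not part of the original analysis: the null distribution is a synthetic, deliberately skewed one (squared uniform draws, which pile up near zero), and `empirical_p_value` is an illustrative name for the quantity in Equations (3) and (4). Fresh draws from the same skewed distribution play the role of statistics observed under the null.

```python
import bisect
import random

def empirical_p_value(sorted_null, observed):
    """Fraction of null values <= observed, as in Equations (3) and (4)."""
    return bisect.bisect_right(sorted_null, observed) / len(sorted_null)

rng = random.Random(1)
# Synthetic null statistic, biased toward zero (an assumption for illustration).
null = sorted(rng.random() ** 2 for _ in range(20000))
# New observations generated under the same null.
fresh = [rng.random() ** 2 for _ in range(5000)]
empirical = [empirical_p_value(null, t) for t in fresh]

# Share of values at or below 0.05 before and after the transformation.
frac_raw_below = sum(t <= 0.05 for t in fresh) / len(fresh)
frac_emp_below = sum(e <= 0.05 for e in empirical) / len(empirical)
```

If the raw values were used directly as p-values, far more than 5% of them would fall below 0.05; after mapping them through the empirical distribution, the share should be close to 5%, as the lemma predicts for a sufficiently large null sample.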

In step (4), we combine the empirical p-values using either the additive method or the Central Limit Theorem (CLT). According to Lemma 1, the resulting p-values after step (3) are now truly uniformly distributed under the null hypothesis and thus can be combined using the additive method as described in equation (2). However, the additive method can be computationally intensive when m is large. For this reason, we use the CLT to approximate the combined p-value [47]. The uniform distribution on [0, 1] has mean 1/2 and variance 1/12. According to the CLT, the average of m independent and identically distributed (i.i.d.) variables (with large m) follows a normal distribution with mean $\mu = 1/2$ and variance $\sigma^2 = 1/(12m)$. By default, we use this to approximate the combined p-value when m ≥ 20. We note that the additive method of combining p-values in our framework may be substituted by any other method of combining p-values.
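A minimal sketch of the CLT approximation follows. It assumes that the combined p-value is the lower-tail probability of the observed average, which mirrors the use of the cumulative distribution function in the additive method; the function name `clt_combine` is illustrative. The normal cumulative distribution function is written with `math.erfc` so that very small tail probabilities keep their precision.

```python
import math

def clt_combine(p_values):
    """Approximate combined p-value from the average of m uniform p-values."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    mean = sum(p_values) / m
    sigma = math.sqrt(1.0 / (12.0 * m))  # standard deviation of the average
    z = (mean - 0.5) / sigma
    # Lower tail of the standard normal: Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Twenty studies, matching the default threshold m >= 20 mentioned in the text.
p_no_evidence = clt_combine([0.5] * 20)
p_consistent_evidence = clt_combine([0.1] * 20)
```

An average of exactly 1/2 maps to a combined value of 1/2, while consistently small p-values across the twenty studies push the combined value toward zero.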

B. The application of DANUBE in pathway analysis

Here we present the application of DANUBE in the context of pathway analysis (Figure 3). Let us consider a method M, which can be GSEA, GSA, SPIA, or PADOG, or any other method that outputs a p-value for each pathway in the pathway database. We treat this p-value as the discriminating statistic. In step (1), we calculate the p-values of the pathways using the method M. A pathway i will have m p-values (pi1, pi2, …, pim) for the m studies. The m p-values for a pathway are independent and identically distributed (i.i.d.). However, these p-values are not necessarily uniformly distributed under the null hypothesis (see Figure 1). Therefore, combining these p-values will lead to systematic bias in identifying significant pathways as shown in Section II-C and as will be further illustrated in Section IV. Instead of combining these p-values, we treat them as observed values of the discriminating statistic.

Fig. 3.

DANUBE’s application in pathway analysis. The input is m studies (datasets), and a pathway database, such as KEGG. Each dataset has a certain number of control and disease samples. Step (1): perform pathway analysis using a method M (e.g., GSA, SPIA, or PADOG). For each pathway, the resulting m p-values are independent and identically distributed (i.i.d.). However, these p-values are not uniformly distributed under the null hypothesis (see Figure 1), and therefore combining them would result in systematic bias. Step (2): pool the control samples from the m datasets to produce a large set of control samples. Step (3): generate k simulated datasets by randomly sampling from the pool. Since the “disease” and “control” samples in each of the simulated datasets were chosen only from the control samples of the original m studies, the resulting p-values are calculated under the null hypothesis. Step (4): perform pathway analysis on the simulated data. Step (5): build an empirical distribution for each pathway, which consists of k p-values obtained under the null hypothesis. Step (6): calculate an empirical p-value for each p-value obtained from step (1). For example, using the empirical distribution ξ1, we calculate the empirical p-value ep11 as the probability of observing a p-value at least as extreme as p11, i.e., ep11 = |{sp1i : sp1i ≤ p11, i ∈ [1..k]}| / k. Step (7): combine the m empirical p-values obtained for each pathway using either the additive method or the Central Limit Theorem.

To calculate the probability of observing such values, we need to construct the empirical distribution under the null hypothesis as described in steps (2–5) above. In step (2), we take all of the control samples from the m studies to create a set of control samples as shown in (C) in Figure 3. In step (3), we generate the k synthetic datasets by random sampling from the pool of control samples. For example, for a simulation, we choose two groups of samples from the pool and label them as controls and diseases. In our case study using the Alzheimer’s datasets, as described in Section II-C, we generated 10,000 simulations of 10 control and 10 disease samples, 10,000 simulations of 10 control and 20 disease samples, 10,000 of 20 control and 10 disease samples, and 10,000 of 37 control and 37 disease samples, for a total of 40,000 simulations.

After generating k simulations from the control samples, we proceed to calculate the p-values for each pathway and each simulation using the same method M. For a pathway i, we have a set of p-values spi1, spi2, …, spik. Since all of these p-values are calculated from the real control samples (i.e. healthy people), they can be considered as p-values under the null hypothesis. These p-values will be used to construct the empirical distribution ξi in step (5). In summary, steps (2–5) produce an empirical distribution for each pathway, resulting in a total of n empirical distributions for n pathways. These distributions will be used to calculate the empirical p-values of the measurements done in step (1).

After steps (1–5), for a pathway i, we have m p-values pi1, pi2, …, pim and an empirical distribution ξi. Using the formula described in Equation (2), we calculate the empirical p-values epi1, epi2, …, epim. As we showed in the Methods section, these empirical p-values are independent and uniformly distributed under the null hypothesis. In step (7), we combine these empirical p-values using the additive method to have a single p-value pDANUBEi for pathway i.
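Steps (6) and (7) can be sketched as follows. The sketch takes the empirical p-value to be the fraction of null p-values at or below the observed one, and the additive method to be the sum of p-values evaluated against the Irwin-Hall distribution (the distribution of a sum of independent uniform variables). The normal approximation `clt_combine` is one plausible reading of the Central Limit Theorem approach, included as an assumption. All function names are illustrative.

```python
import bisect
import math
from statistics import NormalDist


def empirical_p(p_observed, sorted_null_p_values):
    """Fraction of null p-values that are <= the observed p-value."""
    k = len(sorted_null_p_values)
    return bisect.bisect_right(sorted_null_p_values, p_observed) / k


def additive_combine(p_values):
    """P(sum of m uniforms <= observed sum), via the Irwin-Hall CDF."""
    m = len(p_values)
    s = sum(p_values)
    total = 0.0
    for j in range(int(math.floor(s)) + 1):
        total += (-1) ** j * math.comb(m, j) * (s - j) ** m
    # Clamp to [0, 1] to absorb floating-point round-off.
    return min(1.0, max(0.0, total / math.factorial(m)))


def clt_combine(p_values):
    """Normal approximation: the mean of m uniforms has mean 1/2, variance 1/(12m)."""
    m = len(p_values)
    mean = sum(p_values) / m
    z = (mean - 0.5) * math.sqrt(12 * m)
    return NormalDist().cdf(z)


# Toy example: one pathway, three studies, a tiny null distribution.
null_distribution = sorted([0.01, 0.03, 0.08, 0.20, 0.35, 0.50, 0.70, 0.90])
observed = [0.02, 0.05, 0.10]
empirical = [empirical_p(p, null_distribution) for p in observed]
combined = additive_combine(empirical)
```

With a finite k, an observed p-value smaller than every simulated one gives an empirical p-value of exactly 0; whether and how to floor such values (for example at 1/k) is a design choice this sketch leaves open.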

IV. Results and Validation

In this section we illustrate the limitations of combining p-values using classical meta-analysis approaches, and show that DANUBE overcomes these limitations. Sections IV-A and IV-B compare the classical approaches with DANUBE for the specific application domain of pathway analysis. Sections IV-C and IV-D compare the classical meta-analysis approaches with DANUBE in the general case, applicable to any meta-analysis.

For the pathway analysis applications on which we focus in this paper, we compare DANUBE with 5 other classical meta-analysis methods: Stouffer’s, Z-method, Brown’s, Fisher’s, and the additive method [14, 24, 48, 49], each of them combined with each of the 4 pathway analysis methods (GSEA, GSA, SPIA, and PADOG). We also compare these methods with a stand-alone meta-analysis method, MetaPath. In total, we analyze the results of 25 approaches: 6 meta-analyses combined with 4 pathway analysis methods, plus MetaPath [11, 50]. Each of these methods is tested on two diseases: Alzheimer’s disease, with 7 datasets, and acute myeloid leukemia (AML), with 9 datasets. These conditions were selected for two reasons. First, there is a pathway in KEGG for each of the diseases. We refer to this as the target pathway, and use it to validate the methods. Second, there are multiple experiments available in the public domain for both of these diseases.
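For orientation, two of the classical methods named above can be written compactly. The sketch below gives the textbook, unweighted forms of Fisher’s and Stouffer’s methods using only the Python standard library; the exact variants and weights used in the comparison may differ, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) follows a chi-square with 2m degrees of freedom."""
    m = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    # For even degrees of freedom 2m the chi-square survival function has the
    # closed form exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combine(p_values):
    """Stouffer's method: sum of normal scores divided by sqrt(m)."""
    nd = NormalDist()
    # inv_cdf requires 0 < p < 1; p-values of exactly 0 or 1 must be handled upstream.
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)


study_p_values = [0.04, 0.20, 0.01, 0.35]
p_fisher = fisher_combine(study_p_values)
p_stouffer = stouffer_combine(study_p_values)
```

Both functions assume that the input p-values are uniform under the null hypothesis, which is precisely the assumption examined in this paper.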

A. Pathway analysis applications: Alzheimer’s disease

The Alzheimer’s datasets we use in our data analysis are GSE28146 (hippocampus) and GSE5281 (6 different tissues: entorhinal cortex (EC), hippocampus (HIP), medial temporal gyrus (MTG), posterior cingulate (PC), superior frontal gyrus (SFG), and primary visual cortex (VCX)). The 4 pathway analysis methods, GSEA, GSA, SPIA, and PADOG, were used to process the expression data in each study and output a p-value for each study and for each pathway. Details of all datasets are provided in Supplementary Section 3.

The rankings and FDR-corrected p-values of the target pathway Alzheimer’s disease for the 7 Alzheimer’s datasets are displayed in Figure 4. The graphs demonstrate that the adjusted p-values and rankings of the target pathway vary substantially between the 4 methods for a given study, and from one study to the next. Furthermore, both GSA and PADOG report the target pathway Alzheimer’s disease as not significant in all 7 studies.

Fig. 4.

Ranks (panel A) and p-values (panel B) of the KEGG target pathway, Alzheimer’s disease, for 7 Alzheimer’s datasets, using the pathway analysis methods: Gene Set Enrichment Analysis (GSEA), Gene Set Analysis (GSA), Signaling Pathway Impact Analysis (SPIA), and Down-weighting of Overlapping Genes (PADOG). The horizontal axes show the 7 Alzheimer’s datasets. The vertical axis in panel (A) shows the rankings of the target pathway for each dataset using the 4 methods. The vertical axis in panel (B) shows the FDR-corrected p-values of the target pathway. The red horizontal line in (B) shows the threshold 0.01. Note how the rankings and p-values of the target pathway vary greatly across different datasets and methods, making the interpretation of the results very difficult.

We combine the 4 pathway analysis methods with 6 meta-analyses: Stouffer’s, Z-method, Brown’s, Fisher’s, the additive method, and DANUBE. Using a pathway analysis method M, each pathway has 7 p-values – one per study. These 7 p-values are combined using each of the 6 meta-analysis methods. Therefore, each pathway analysis method produces 6 lists of pathways. Each list has 150 pathways ranked according to the combined p-values. We then adjusted the combined p-values for multiple comparisons in each list using FDR.
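The FDR adjustment of one list of combined p-values can be sketched as below. The text only says “FDR”; the sketch assumes the Benjamini-Hochberg step-up procedure, and the function name `bh_adjust` is illustrative.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


combined_p_values = [0.01, 0.04, 0.03, 0.5]
fdr_p_values = bh_adjust(combined_p_values)
```

Because of the running minimum, a step-up adjustment of this kind yields runs of identical adjusted values for neighboring pathways in a ranked list.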

In order to run DANUBE, we generated the null distributions from control samples as described in Section III-B. We took the 74 control samples from the 7 Alzheimer’s datasets, and randomly divided them into “control” and “disease” subgroups. We generated 10,000 simulations of 10 controls and 10 diseases, 10,000 simulations of 10 controls and 20 diseases, 10,000 of 20 controls and 10 diseases, and 10,000 of 37 controls and 37 diseases, for a total of 40,000 simulations. For each pathway analysis method, we constructed 150 empirical distributions for 150 KEGG signaling pathways (600 empirical distributions in total for the 4 methods GSEA, GSA, SPIA, and PADOG). We used these empirical distributions to calculate the empirical p-values before applying the additive method to combine the empirical p-values for each pathway, resulting in 150 combined p-values. We then adjusted the combined p-values for multiple comparisons using FDR. Running time is reported in Supplementary Section 5 and Tables S1–S2.

Table I displays the results using GSA combined with the 6 meta-analysis methods. The horizontal line across each list marks the 1% significance threshold. The pathway highlighted green is the target pathway Alzheimer’s disease. Pathways highlighted in red are examples of false positives. These pathways were expected to be reported as false positives because their null distribution is very skewed towards zero (see Figure 1 panels A1–A3 and Supplementary Figure S3). These include Adherens junction and several cancer-related pathways, none of which are known to be implicated in Alzheimer’s disease. Stouffer’s method, the additive method, and DANUBE identify the target pathway as significant. DANUBE yields the best ranking.

Both Stouffer’s and the additive method identify the target pathway as significant using GSA, as shown in Table I. However, the inherent bias of the null distribution brings irrelevant results into the list of significant pathways. For Stouffer’s method, pathways having p-values biased toward zero, such as Prostate cancer, Adherens junction, Pathways in cancer, and Pancreatic cancer are still among the significant pathways. For the additive method, pathways having p-values biased toward zero, such as Prostate cancer, Adherens junction and Pathways in cancer are still among the significant pathways.

Table II displays the results using PADOG combined with the 6 meta-analysis methods. Only DANUBE identifies the target pathway as significant. Z-method and Brown’s method return no significant pathways. For Stouffer’s, Fisher’s, and the additive method, the systematic bias of the pathway analysis method greatly influences the outcome of the meta-analyses. Pathways having p-values biased toward zero, such as Adherens junction and cancer related pathways (see Figure 1 panels C1–C3 and Supplementary Figure S5) are among the significant pathways.

TABLE II.

The 20 top ranked pathways and FDR-corrected p-values obtained by combining the PADOG p-values using 6 meta-analysis methods for Alzheimer’s disease. Only DANUBE identifies the target pathway Alzheimer’s disease as significant and ranks it 6th.

| Rank | PADOG + Stouffer’s method: Pathway | pvalue.fdr | PADOG + Z-method: Pathway | pvalue.fdr | PADOG + Brown’s method: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Adherens junction | < 10⁻⁴ | Adherens junction | 0.6725 | HIF-1 signaling pathway | 0.6495 |
| 2 | Shigellosis | 0.0002 | Shigellosis | 0.6725 | Adherens junction | 0.6495 |
| 3 | Renal cell carcinoma | 0.0002 | Renal cell carcinoma | 0.6725 | Gap junction | 0.6495 |
| 4 | Prostate cancer | 0.0005 | Prostate cancer | 0.6725 | Long-term potentiation | 0.6495 |
| 5 | Bacterial invasion of epithelial cells | 0.0014 | Bacterial invasion of epithelial cells | 0.6725 | Long-term depression | 0.6495 |
| 6 | Long-term depression | 0.0036 | Long-term depression | 0.6725 | Endocrine and other factor-regulated calcium reabsorption | 0.6495 |
| 7 | Pathogenic Escherichia coli infection | 0.0036 | Pathogenic Escherichia coli infection | 0.6725 | Bacterial invasion of epithelial cells | 0.6495 |
| 8 | Colorectal cancer | 0.0036 | Colorectal cancer | 0.6725 | Vibrio cholerae infection | 0.6495 |
| 9 | Gap junction | 0.0036 | Gap junction | 0.6725 | Pathogenic Escherichia coli infection | 0.6495 |
| 10 | Glioma | 0.0036 | Glioma | 0.6725 | Shigellosis | 0.6495 |
| 11 | Pancreatic cancer | 0.0036 | Pancreatic cancer | 0.6725 | Colorectal cancer | 0.6495 |
| 12 | Vibrio cholerae infection | 0.0036 | Vibrio cholerae infection | 0.6725 | Renal cell carcinoma | 0.6495 |
| 13 | Endocrine and other factor-regulated calcium reabsorption | 0.0043 | Endocrine and other factor-regulated calcium reabsorption | 0.6725 | Pancreatic cancer | 0.6495 |
| 14 | ErbB signaling pathway | 0.0053 | ErbB signaling pathway | 0.6725 | Endometrial cancer | 0.6495 |
| 15 | Endometrial cancer | 0.0063 | Endometrial cancer | 0.6725 | Glioma | 0.6495 |
| 16 | HIF-1 signaling pathway | 0.0063 | HIF-1 signaling pathway | 0.6725 | Prostate cancer | 0.6495 |
| 17 | Neurotrophin signaling pathway | 0.0067 | Neurotrophin signaling pathway | 0.6725 | ErbB signaling pathway | 0.6533 |
| 18 | Long-term potentiation | 0.0076 | Long-term potentiation | 0.6725 | Neurotrophin signaling pathway | 0.6533 |
| 19 | Synaptic vesicle cycle | 0.0160 | Synaptic vesicle cycle | 0.7324 | mRNA surveillance pathway | 0.7157 |
| 20 | VEGF signaling pathway | 0.0317 | VEGF signaling pathway | 0.7324 | MAPK signaling pathway | 0.7157 |

| Rank | PADOG + Fisher’s method: Pathway | pvalue.fdr | PADOG + Additive method: Pathway | pvalue.fdr | PADOG + DANUBE: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Adherens junction | 0.0008 | Adherens junction | < 10⁻⁴ | Vibrio cholerae infection | < 10⁻⁴ |
| 2 | Shigellosis | 0.0022 | Renal cell carcinoma | < 10⁻⁴ | Shigellosis | < 10⁻⁴ |
| 3 | Renal cell carcinoma | 0.0022 | Shigellosis | < 10⁻⁴ | Parkinson’s disease | 0.0007 |
| 4 | Prostate cancer | 0.0049 | Prostate cancer | 0.0001 | Synaptic vesicle cycle | 0.0007 |
| 5 | Bacterial invasion of epithelial cells | 0.0065 | Long-term depression | 0.0006 | Gap junction | 0.0007 |
| 6 | Pathogenic Escherichia coli infection | 0.0149 | Colorectal cancer | 0.0009 | Alzheimer’s disease | 0.0007 |
| 7 | Endocrine and other factor-regulated calcium reabsorption | 0.0199 | Gap junction | 0.0011 | Pathogenic Escherichia coli infection | 0.0007 |
| 8 | Glioma | 0.0199 | ErbB signaling pathway | 0.0013 | Cardiac muscle contraction | 0.0007 |
| 9 | Pancreatic cancer | 0.0199 | Bacterial invasion of epithelial cells | 0.0013 | Epithelial cell signaling in Helicobacter pylori infection | 0.0009 |
| 10 | Long-term depression | 0.0199 | Vibrio cholerae infection | 0.0013 | Huntington’s disease | 0.0013 |
| 11 | Gap junction | 0.0199 | Pancreatic cancer | 0.0021 | Renal cell carcinoma | 0.0024 |
| 12 | Colorectal cancer | 0.0199 | Glioma | 0.0022 | Vasopressin-regulated water reabsorption | 0.0047 |
| 13 | Vibrio cholerae infection | 0.0199 | Neurotrophin signaling pathway | 0.0028 | VEGF signaling pathway | 0.0052 |
| 14 | Long-term potentiation | 0.0226 | HIF-1 signaling pathway | 0.0037 | Endocrine and other factor-regulated calcium reabsorption | 0.0072 |
| 15 | Endometrial cancer | 0.0226 | Pathogenic Escherichia coli infection | 0.0042 | Bacterial invasion of epithelial cells | 0.0078 |
| 16 | HIF-1 signaling pathway | 0.0257 | Endometrial cancer | 0.0052 | GABAergic synapse | 0.0102 |
| 17 | ErbB signaling pathway | 0.0326 | VEGF signaling pathway | 0.0052 | Adherens junction | 0.0103 |
| 18 | Neurotrophin signaling pathway | 0.0352 | Endocrine and other factor-regulated calcium reabsorption | 0.0052 | Long-term depression | 0.0103 |
| 19 | Synaptic vesicle cycle | 0.0600 | Synaptic vesicle cycle | 0.0086 | Salmonella infection | 0.0134 |
| 20 | Dopaminergic synapse | 0.1305 | Long-term potentiation | 0.0106 | Colorectal cancer | 0.0198 |

The horizontal lines show the 1% significance threshold. The target pathway Alzheimer’s disease is highlighted in green. Pathways highlighted in red are examples of false positives (see Figure 1 panels C1–C3 and Supplementary Figure S5).

Supplementary Table S3 displays the results using SPIA combined with the 6 meta-analysis methods. The target pathway is significant and is ranked near the top for all methods. DANUBE yields the shortest list of significant pathways. All 5 significant pathways, Parkinson’s disease, Alzheimer’s disease, Synaptic vesicle cycle, Cardiac muscle contraction, and Huntington’s disease, are also significant when we combine DANUBE with GSA and PADOG.

Supplementary Table S4 displays the results using GSEA combined with the 6 meta-analysis methods. The horizontal line across each list marks the cutoff FDR = 0.01. The pathway highlighted green is the target pathway Alzheimer’s disease. The target pathway is significant for all 6 meta-analysis methods. Because GSEA is unbiased, the additive method and DANUBE have equivalent results. These two methods have a shorter list of significant pathways and rank the target pathway higher than other methods. In addition, all 4 significant pathways, Cardiac muscle contraction, Huntington’s disease, Alzheimer’s disease, and Parkinson’s disease, appear in the lists of significant pathways when we combine DANUBE with GSA, PADOG, and SPIA.

There is no gold standard for assigning true or false values to each of the results, apart from the expectation that a disease under study should impact its namesake pathway. Indeed, the target pathway Alzheimer’s disease is ranked as significant for all 4 pathway analysis methods when combined with DANUBE. The target pathway is also ranked higher when using DANUBE compared to the results of the other 5 meta-analysis methods. In addition, the pathways Parkinson’s disease, Alzheimer’s disease, Cardiac muscle contraction, and Huntington’s disease consistently appear as significant in the results of all 4 pathway analysis methods when combined with DANUBE.

Alzheimer’s, Parkinson’s, and Huntington’s diseases are three neurological disorders that have many commonalities including abnormal protein folding, endoplasmic reticulum stress, and ubiquitin mediated breakdown of proteins, leading to programmed cell death. Given that the pathway Alzheimer’s disease is influenced by the mitochondrial compartment, which is strongly implicated in the disease [51–54], it is not surprising that other pathways with strong mitochondrial components also garner high rankings. Previous studies [55] have shown the presence of a cross-talk that makes the neurological disease pathways, Alzheimer’s disease, Parkinson’s disease and Huntington’s disease, along with Cardiac muscle contraction, appear as significant simultaneously, due to their dominant mitochondrial module. Cardiac muscle contraction has a strong mitochondrial component and is highly dependent on calcium signaling, which is also prevalent in Synaptic vesicle cycle, Alzheimer’s disease, and Huntington’s disease. Ca2+ regulates mitochondrial metabolism, but calcium overload to mitochondria can result in cell damage from reactive oxygen [56].

We also use MetaPath to combine the 7 studies. MetaPath is a stand-alone meta-analysis method, which does not need an external pathway analysis tool. This method performs meta-analysis at both gene (MAPE_G) and pathway levels (MAPE_P), and then combines the results (MAPE_I) to give the final p-value and ranking of pathways. Supplementary Table S5 shows the top 7 pathways using MetaPath for the 7 Alzheimer’s datasets. The target pathway Alzheimer’s disease is not significant and is outranked by 6 other pathways.

B. Pathway analysis applications: AML

The AML datasets we use in our data analysis are GSE14924 (CD4 and CD8 T cells), GSE17054 (stem cells), GSE12662 (CD34+ cells, promyelocytes, and neutrophils and PR9 cell line), GSE57194 (CD34+ cells), GSE33223 (peripheral blood, bone marrow), GSE42140 (peripheral blood, bone marrow), GSE8023 (CD34+ cells), and GSE15061 (bone marrow). The rankings and FDR-corrected p-values of the target pathway Acute myeloid leukemia for the 9 AML datasets are displayed in Supplementary Figure S12. The graphs demonstrate that the adjusted p-values and rankings of the target pathway vary substantially between the 4 methods for a given study, and from one study to the next. Furthermore, the AML pathway was not found to be significant by any method in any dataset.

We combine the 4 pathway analysis methods with the 6 meta-analysis methods. Using a pathway analysis method M, each pathway has 9 p-values – one per study. These 9 p-values are combined using each of the 6 meta-analysis methods. Therefore, each pathway analysis method produces 6 lists of pathways. Each list has 150 pathways ranked according to the combined p-values. We then adjust the combined p-values for multiple comparisons in each list using FDR.

In order to run DANUBE, we generated the null distributions from control samples as described in Section III-B. We took the 140 control samples of the 9 AML datasets, and randomly designated “control” and “disease” subgroups. We generated 10,000 simulations of 10 controls and 10 diseases, 10,000 simulations of 30 controls and 50 diseases, 10,000 of 50 controls and 30 diseases, and 10,000 of 70 controls and 70 diseases, for a total of 40,000 simulations. For each pathway analysis method, we constructed 150 empirical distributions for 150 KEGG signaling pathways (600 empirical distributions in total for the 4 pathway analysis methods). We then used the empirical distributions to calculate the empirical p-values before applying the additive method to combine the empirical p-values for each pathway, resulting in 150 combined p-values. Finally, we adjusted the combined p-values for multiple comparisons using FDR.

Table III displays the results of GSA combined with the 6 meta-analysis methods, ordered by the FDR corrected p-values. We place a horizontal line across each list to mark our 1% cutoff. Stouffer’s method, the additive method, and DANUBE identify the target pathway as significant. DANUBE yields the best ranking (ranked 1st), followed by the additive (2nd) and Stouffer’s method (13th). In addition, the target pathway is the only significant pathway in DANUBE’s result.

TABLE III.

The 21 top ranked pathways and FDR-corrected p-values obtained by combining the GSA p-values using 6 meta-analysis methods for acute myeloid leukemia (AML). The target pathway Acute myeloid leukemia is significant for Stouffer’s, the additive method, and DANUBE with rankings 13th, 2nd, and 1st, respectively.

| Rank | GSA + Stouffer’s method: Pathway | pvalue.fdr | GSA + Z-method: Pathway | pvalue.fdr | GSA + Brown’s method: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ErbB signaling pathway | < 10⁻⁴ | ErbB signaling pathway | < 10⁻⁴ | ErbB signaling pathway | < 10⁻⁴ |
| 2 | Sulfur relay system | < 10⁻⁴ | Sulfur relay system | < 10⁻⁴ | Sulfur relay system | < 10⁻⁴ |
| 3 | Adherens junction | < 10⁻⁴ | Adherens junction | < 10⁻⁴ | Adherens junction | < 10⁻⁴ |
| 4 | Tight junction | < 10⁻⁴ | Tight junction | < 10⁻⁴ | Tight junction | < 10⁻⁴ |
| 5 | Circadian rhythm | < 10⁻⁴ | Circadian rhythm | < 10⁻⁴ | Circadian rhythm | < 10⁻⁴ |
| 6 | Alcoholism | < 10⁻⁴ | Alcoholism | < 10⁻⁴ | Alcoholism | < 10⁻⁴ |
| 7 | Shigellosis | < 10⁻⁴ | Shigellosis | < 10⁻⁴ | Shigellosis | < 10⁻⁴ |
| 8 | Transcriptional misregulation in cancer | < 10⁻⁴ | Transcriptional misregulation in cancer | < 10⁻⁴ | Transcriptional misregulation in cancer | < 10⁻⁴ |
| 9 | Renal cell carcinoma | < 10⁻⁴ | Renal cell carcinoma | < 10⁻⁴ | Renal cell carcinoma | < 10⁻⁴ |
| 10 | Glioma | < 10⁻⁴ | Glioma | < 10⁻⁴ | Glioma | < 10⁻⁴ |
| 11 | Systemic lupus erythematosus | < 10⁻⁴ | Systemic lupus erythematosus | < 10⁻⁴ | Systemic lupus erythematosus | < 10⁻⁴ |
| 12 | Non-small cell lung cancer | 0.0003 | Non-small cell lung cancer | 0.0606 | Non-small cell lung cancer | 0.1250 |
| 13 | Acute myeloid leukemia | 0.0012 | Acute myeloid leukemia | 0.1011 | mTOR signaling pathway | 0.2120 |
| 14 | VEGF signaling pathway | 0.0017 | VEGF signaling pathway | 0.1139 | VEGF signaling pathway | 0.2120 |
| 15 | Endometrial cancer | 0.0025 | Endometrial cancer | 0.1298 | Pathways in cancer | 0.2120 |
| 16 | Pathways in cancer | 0.0029 | Pathways in cancer | 0.1352 | Acute myeloid leukemia | 0.2120 |
| 17 | mTOR signaling pathway | 0.0033 | mTOR signaling pathway | 0.1386 | HIF-1 signaling pathway | 0.2252 |
| 18 | Chronic myeloid leukemia | 0.0081 | Chronic myeloid leukemia | 0.1933 | Endometrial cancer | 0.2252 |
| 19 | Prostate cancer | 0.0081 | Prostate cancer | 0.1933 | Prostate cancer | 0.2252 |
| 20 | Pancreatic cancer | 0.0097 | Pancreatic cancer | 0.2037 | Insulin signaling pathway | 0.2379 |
| 21 | HIF-1 signaling pathway | 0.0150 | HIF-1 signaling pathway | 0.2394 | Pancreatic cancer | 0.2628 |

| Rank | GSA + Fisher’s method: Pathway | pvalue.fdr | GSA + Additive method: Pathway | pvalue.fdr | GSA + DANUBE: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ErbB signaling pathway | < 10⁻⁴ | Non-small cell lung cancer | 0.0003 | Acute myeloid leukemia | 0.0065 |
| 2 | Sulfur relay system | < 10⁻⁴ | Acute myeloid leukemia | 0.0003 | Transcriptional misregulation in cancer | 0.0231 |
| 3 | Adherens junction | < 10⁻⁴ | VEGF signaling pathway | 0.0005 | VEGF signaling pathway | 0.0489 |
| 4 | Tight junction | < 10⁻⁴ | ErbB signaling pathway | 0.0005 | Alcoholism | 0.1161 |
| 5 | Circadian rhythm | < 10⁻⁴ | Endometrial cancer | 0.0008 | Non-small cell lung cancer | 0.5968 |
| 6 | Alcoholism | < 10⁻⁴ | Transcriptional misregulation in cancer | 0.0020 | Bladder cancer | 0.5968 |
| 7 | Shigellosis | < 10⁻⁴ | Chronic myeloid leukemia | 0.0038 | HIF-1 signaling pathway | 0.5968 |
| 8 | Transcriptional misregulation in cancer | < 10⁻⁴ | mTOR signaling pathway | 0.0043 | Apoptosis | 0.5968 |
| 9 | Renal cell carcinoma | < 10⁻⁴ | Pathways in cancer | 0.0043 | mTOR signaling pathway | 0.5968 |
| 10 | Glioma | < 10⁻⁴ | Colorectal cancer | 0.0084 | Cocaine addiction | 0.5968 |
| 11 | Systemic lupus erythematosus | < 10⁻⁴ | Glioma | 0.0108 | Autoimmune thyroid disease | 0.6141 |
| 12 | Non-small cell lung cancer | 0.0048 | Pancreatic cancer | 0.0108 | Amyotrophic lateral sclerosis (ALS) | 0.6458 |
| 13 | Pathways in cancer | 0.0153 | Prostate cancer | 0.0108 | Notch signaling pathway | 0.6458 |
| 14 | Acute myeloid leukemia | 0.0181 | Small cell lung cancer | 0.0177 | ErbB signaling pathway | 0.6458 |
| 15 | mTOR signaling pathway | 0.0188 | Bacterial invasion of epithelial cells | 0.0177 | HTLV-I infection | 0.6458 |
| 16 | VEGF signaling pathway | 0.0188 | Adherens junction | 0.0184 | Natural killer cell mediated cytotoxicity | 0.6458 |
| 17 | Endometrial cancer | 0.0243 | Renal cell carcinoma | 0.0239 | Chronic myeloid leukemia | 0.6458 |
| 18 | HIF-1 signaling pathway | 0.0252 | Melanoma | 0.0326 | Endocytosis | 0.6458 |
| 19 | Prostate cancer | 0.0252 | Endocytosis | 0.0403 | Small cell lung cancer | 0.6458 |
| 20 | Insulin signaling pathway | 0.0295 | HIF-1 signaling pathway | 0.0447 | Fc gamma R-mediated phagocytosis | 0.6458 |
| 21 | Pancreatic cancer | 0.0378 | Circadian rhythm | 0.0447 | African trypanosomiasis | 0.6458 |

The horizontal lines show the 1% significance threshold. The target pathway Acute myeloid leukemia is highlighted in green.

Table IV shows the results of PADOG combined with the 6 meta-analysis methods. The target pathway is significant for the 4 methods: DANUBE, Stouffer’s, Fisher’s, and the additive method. For DANUBE, Acute myeloid leukemia is ranked 1st compared to 7th using the other three meta-analysis methods. There are no significant pathways using the Z-method and Brown’s method.

TABLE IV.

The 23 top ranked pathways and FDR-corrected p-values obtained by combining the PADOG p-values using 6 meta-analysis methods for acute myeloid leukemia (AML). The target pathway Acute myeloid leukemia is significant for Stouffer’s, Fisher’s, the additive method and DANUBE. DANUBE yields the best ranking.

| Rank | PADOG + Stouffer’s method: Pathway | pvalue.fdr | PADOG + Z-method: Pathway | pvalue.fdr | PADOG + Brown’s method: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Non-small cell lung cancer | < 10⁻⁴ | Non-small cell lung cancer | 0.0705 | Chronic myeloid leukemia | 0.0412 |
| 2 | Chronic myeloid leukemia | < 10⁻⁴ | Chronic myeloid leukemia | 0.0705 | Non-small cell lung cancer | 0.0412 |
| 3 | Glioma | < 10⁻⁴ | Glioma | 0.2152 | Glioma | 0.1240 |
| 4 | ErbB signaling pathway | < 10⁻⁴ | ErbB signaling pathway | 0.2239 | ErbB signaling pathway | 0.2149 |
| 5 | Colorectal cancer | < 10⁻⁴ | Colorectal cancer | 0.2565 | VEGF signaling pathway | 0.2806 |
| 6 | Prostate cancer | < 10⁻⁴ | Prostate cancer | 0.2565 | Pathways in cancer | 0.2806 |
| 7 | Acute myeloid leukemia | < 10⁻⁴ | Acute myeloid leukemia | 0.2565 | Colorectal cancer | 0.2806 |
| 8 | VEGF signaling pathway | 0.0001 | VEGF signaling pathway | 0.2565 | Pancreatic cancer | 0.2806 |
| 9 | Endometrial cancer | 0.0001 | Endometrial cancer | 0.2565 | Prostate cancer | 0.2806 |
| 10 | Pancreatic cancer | 0.0001 | Pancreatic cancer | 0.2565 | Acute myeloid leukemia | 0.2806 |
| 11 | Pathways in cancer | 0.0001 | Pathways in cancer | 0.2565 | Endometrial cancer | 0.3398 |
| 12 | Transcriptional misregulation in cancer | 0.0005 | Transcriptional misregulation in cancer | 0.3509 | mTOR signaling pathway | 0.4198 |
| 13 | T cell receptor signaling pathway | 0.0012 | T cell receptor signaling pathway | 0.4055 | T cell receptor signaling pathway | 0.4198 |
| 14 | mTOR signaling pathway | 0.0012 | mTOR signaling pathway | 0.4055 | Circadian rhythm | 0.4198 |
| 15 | Circadian rhythm | 0.0015 | Circadian rhythm | 0.4061 | Insulin signaling pathway | 0.4198 |
| 16 | Neurotrophin signaling pathway | 0.0021 | Neurotrophin signaling pathway | 0.4184 | Transcriptional misregulation in cancer | 0.4198 |
| 17 | Small cell lung cancer | 0.0024 | Small cell lung cancer | 0.4184 | Small cell lung cancer | 0.4491 |
| 18 | Renal cell carcinoma | 0.0054 | Renal cell carcinoma | 0.4837 | Neurotrophin signaling pathway | 0.4568 |
| 19 | Insulin signaling pathway | 0.0063 | Insulin signaling pathway | 0.4837 | mRNA surveillance pathway | 0.4695 |
| 20 | Endocytosis | 0.0070 | Endocytosis | 0.4837 | MAPK signaling pathway | 0.4695 |
| 21 | Adherens junction | 0.0070 | Adherens junction | 0.4837 | HIF-1 signaling pathway | 0.4695 |
| 22 | Wnt signaling pathway | 0.0168 | Wnt signaling pathway | 0.5674 | Endocytosis | 0.4695 |
| 23 | Melanoma | 0.0195 | Melanoma | 0.5674 | Wnt signaling pathway | 0.4695 |

| Rank | PADOG + Fisher’s method: Pathway | pvalue.fdr | PADOG + Additive method: Pathway | pvalue.fdr | PADOG + DANUBE: Pathway | pvalue.fdr |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Chronic myeloid leukemia | < 10⁻⁴ | Non-small cell lung cancer | < 10⁻⁴ | Acute myeloid leukemia | < 10⁻⁴ |
| 2 | Non-small cell lung cancer | < 10⁻⁴ | Chronic myeloid leukemia | < 10⁻⁴ | VEGF signaling pathway | 0.0007 |
| 3 | Glioma | < 10⁻⁴ | ErbB signaling pathway | < 10⁻⁴ | Non-small cell lung cancer | 0.0008 |
| 4 | ErbB signaling pathway | < 10⁻⁴ | Endometrial cancer | < 10⁻⁴ | T cell receptor signaling pathway | 0.0021 |
| 5 | Colorectal cancer | 0.0003 | Glioma | < 10⁻⁴ | Colorectal cancer | 0.0023 |
| 6 | Prostate cancer | 0.0006 | Colorectal cancer | < 10⁻⁴ | Chronic myeloid leukemia | 0.0027 |
| 7 | Acute myeloid leukemia | 0.0006 | Acute myeloid leukemia | < 10⁻⁴ | Endometrial cancer | 0.0057 |
| 8 | Pancreatic cancer | 0.0007 | Prostate cancer | < 10⁻⁴ | Transcriptional misregulation in cancer | 0.0095 |
| 9 | VEGF signaling pathway | 0.0007 | Transcriptional misregulation in cancer | 0.0001 | Glioma | 0.0153 |
| 10 | Pathways in cancer | 0.0009 | VEGF signaling pathway | 0.0001 | mTOR signaling pathway | 0.0160 |
| 11 | Endometrial cancer | 0.0021 | Pathways in cancer | 0.0001 | Prostate cancer | 0.0203 |
| 12 | Transcriptional misregulation in cancer | 0.0056 | Pancreatic cancer | 0.0002 | Apoptosis | 0.0239 |
| 13 | T cell receptor signaling pathway | 0.0080 | mTOR signaling pathway | 0.0005 | ErbB signaling pathway | 0.0390 |
| 14 | mTOR signaling pathway | 0.0098 | Neurotrophin signaling pathway | 0.0005 | B cell receptor signaling pathway | 0.0464 |
| 15 | Insulin signaling pathway | 0.0098 | Renal cell carcinoma | 0.0006 | Circadian rhythm | 0.0521 |
| 16 | Circadian rhythm | 0.0098 | T cell receptor signaling pathway | 0.0006 | Thyroid cancer | 0.0844 |
| 17 | Small cell lung cancer | 0.0138 | Circadian rhythm | 0.0006 | Progesterone-mediated oocyte maturation | 0.1040 |
| 18 | Neurotrophin signaling pathway | 0.0165 | Small cell lung cancer | 0.0011 | Oocyte meiosis | 0.1040 |
| 19 | Adherens junction | 0.0318 | Endocytosis | 0.0036 | Systemic lupus erythematosus | 0.1441 |
| 20 | Endocytosis | 0.0356 | Adherens junction | 0.0052 | Neurotrophin signaling pathway | 0.1697 |
| 21 | Renal cell carcinoma | 0.0502 | Melanoma | 0.0072 | Shigellosis | 0.1697 |
| 22 | Axon guidance | 0.0564 | Bacterial invasion of epithelial cells | 0.0081 | Fc epsilon RI signaling pathway | 0.1697 |
| 23 | Wnt signaling pathway | 0.0564 | Wnt signaling pathway | 0.0128 | Pancreatic cancer | 0.2083 |

The horizontal lines show the 1% significance threshold. The target pathway Acute myeloid leukemia is highlighted in green.

Supplementary Table S6 shows the results of SPIA combined with the 6 meta-analysis methods, ordered by the FDR corrected p-value. Again, the target pathway is significant using Stouffer’s, Fisher’s, the additive method, and DANUBE. The additive method and DANUBE have the same list of significant pathways. In addition, both methods place the target pathway higher than the other two methods.

Supplementary Table S7 displays the results of GSEA combined with the 6 meta-analysis methods. The target pathway Acute myeloid leukemia is highlighted in green. For all 6 meta-analyses, the target pathway is not significant despite being ranked among the top pathways. Since GSEA has no bias, the additive method and DANUBE yield similar results. In essence, even though it is completely unbiased, GSEA lacks the power to identify the Acute myeloid leukemia pathway as significant in the AML data.

We also use MetaPath to combine the 9 acute myeloid leukemia studies. Supplementary Table S8 shows the top 5 pathways using MetaPath. The target pathway is not significant (p=0.4), and is outranked by 2 other pathways.

Table V summarizes all the results for the 25 approaches (4 pathway analysis methods each combined with one of 6 meta-analysis approaches, plus MetaPath). On average, DANUBE performs best in terms of ranking, as well as in terms of identifying the target pathway as significant at the 1% cutoff.

TABLE V.

Ranking and significance of the target pathway for Alzheimer’s disease and acute myeloid leukemia (AML). The first and second columns show the disease and the pathway analysis methods. The next 6 columns show the ranking of the target pathways for the 6 meta-analysis methods combined with the 4 pathway analysis methods. Each row shows the result of the 6 meta-analysis methods combined with the same pathway analysis method. Each cell shows the ranking of the target pathway. The Y(es) or N(o) letters next to the ranking denote whether the target pathway is significant or not. Cells highlighted in green are those that are significant and have the best rankings in their row. The last column shows the result of MetaPath. For both diseases, and for all 4 pathway analysis methods, the target pathway is significant and is ranked the highest when using DANUBE. The target pathway is not significant for AML data when the GSEA p-values are combined with any of the 6 meta-analysis methods.

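| Disease | Pathway analysis | Stouffer’s method | Z-method | Brown’s method | Fisher’s method | Additive method | DANUBE | MetaPath |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Alzheimer’s | GSEA | 4 (Y) | 4 (Y) | 4 (Y) | 4 (Y) | 3 (Y) | 3 (Y) | 7 (N) |
| Alzheimer’s | GSA | 11 (Y) | 11 (N) | 15 (N) | 14 (N) | 6 (Y) | 2 (Y) | |
| Alzheimer’s | SPIA | 2 (Y) | 2 (Y) | 3 (Y) | 3 (Y) | 2 (Y) | 2 (Y) | |
| Alzheimer’s | PADOG | 21 (N) | 21 (N) | 31 (N) | 23 (N) | 21 (N) | 6 (Y) | |
| AML | GSEA | 1 (N) | 1 (N) | 4 (N) | 4 (N) | 1 (N) | 1 (N) | 4 (N) |
| AML | GSA | 13 (Y) | 13 (N) | 16 (N) | 14 (N) | 2 (Y) | 1 (Y) | |
| AML | SPIA | 4 (Y) | 4 (N) | 6 (N) | 6 (Y) | 2 (Y) | 2 (Y) | |
| AML | PADOG | 7 (Y) | 7 (N) | 10 (N) | 7 (Y) | 7 (Y) | 1 (Y) | |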

We note that for both diseases, DANUBE and the additive methods have the same results when combined with GSEA because GSEA is an unbiased method with uniform distributions of p-values under the null. In addition, the results of the two methods for SPIA are almost equivalent because the distributions of the p-values produced by SPIA under the null are closer to the expected uniform. Notably, DANUBE is more useful in conjunction with methods that have more skewed empirical null distributions.

C. General case: t-test and Wilcoxon test

In this section we will demonstrate the generality of the problem, beyond pathway analysis applications. In order to do so, we have used the one sample t-test [57, 58] and the one sample Wilcoxon signed-rank test [59–61], as illustrative examples of parametric and non-parametric tests. Using simulated null distributions, we show that both the t-test and the Wilcoxon test have systematic bias depending on the shape and the symmetry of the null distribution. When the p-values are biased towards zero, combining multiple studies results in an increase of type I error (prevalence of false positives). When the p-values are biased towards one, the test loses power and more evidence is needed to identify true positives.
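The effect of combining p-values that are biased towards zero can be reproduced with a small simulation. The sketch below is not the simulation reported in this section: as a stand-in for a biased experiment-level test, it assumes null p-values drawn from a Beta(a, 1) distribution with a < 1 (so that P(p ≤ x) = x^a > x), and it combines them with Fisher’s method. All names and parameter values are illustrative.

```python
import math
import random


def fisher_combine(p_values):
    """Fisher's method using the closed-form chi-square tail for even df."""
    m = len(p_values)
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def type1_rate(n_studies, shape, n_trials=2000, alpha=0.05, seed=1):
    """Fraction of null meta-analyses rejected at level alpha.

    shape = 1.0 gives uniform null p-values; shape < 1.0 biases them towards 0.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        ps = [rng.betavariate(shape, 1.0) for _ in range(n_studies)]
        if fisher_combine(ps) < alpha:
            rejections += 1
    return rejections / n_trials


unbiased_rate = type1_rate(n_studies=5, shape=1.0)
biased_one_study = type1_rate(n_studies=1, shape=0.8)
biased_ten_studies = type1_rate(n_studies=10, shape=0.8)
```

Under these assumptions the rejection rate stays near the nominal level when the null p-values are uniform, and it grows with the number of combined studies when they are biased towards zero.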

In Figure 5, panel (a) displays a simulated null distribution H0 which is not symmetrical and does not follow any standard distribution. Panel (b) displays an alternative distribution H1, which has the same shape as H0, but a slightly smaller median. Panel (c) displays another alternative distribution H2 which has the same shape as H0 but a slightly larger median. Each population has 100,000 elements. The goal here is to investigate the ability of each approach to distinguish between H0 and H1, and between H0 and H2, respectively. This is attempted using both a t-test and a Wilcoxon test.

Fig. 5.

Type I and Type II errors of the classical meta-analysis using the one sample t-test and the Wilcoxon signed-rank test. Panel (a) displays the probability distribution under the null hypothesis H0. Panel (b) displays an alternative distribution H1 which has the same shape as the null distribution with a slightly smaller median. Panel (c) displays another alternative distribution H2 which has the same shape as the null distribution with a slightly larger median. Panels (d–h) display the results using left-tailed t-tests. Panel (d) displays the distribution of p-values using left-tailed t-test for samples drawn from the null distribution H0. Panel (e) displays the distribution of combined p-values using left-tailed t-test for samples drawn from the null distribution H0. The red dashed line represents the threshold (0.05) below which the null hypothesis will be rejected. The blue area to the left of the red dashed line is type I error (false positives). Panel (f) displays the distribution of combined p-values using a left-tailed t-test for samples drawn from the alternative distribution H1. The blue area to the right of the red dashed line is type II error (false negatives). Panel (g) displays the type I error with varying number of studies. Panel (h) displays the type II error with varying number of studies using a left-tailed t-test for samples drawn from the alternative distribution H1. Similarly, panels (i–m) display the results using right-tailed t-test; panels (n–r) display the results of left-tailed Wilcoxon signed-rank test; panels (s–w) display the results of right-tailed Wilcoxon signed-rank test. In this example, the left-tailed t-test and right-tailed Wilcoxon test are biased towards 0 as shown in (e, t). Therefore, an increase in the number of studies makes the combined p-values more biased towards 0, causing an increase in type I error as shown in (g, v). On the contrary, the right-tailed t-test and left-tailed Wilcoxon test are biased towards 1. This kind of bias makes the test less powerful. For example, with 10 studies, type II errors using right-tailed t-test and left-tailed Wilcoxon test are 0.51 and 0.61, respectively.

Denoting by M0 and m0 the mean and median of the null distribution H0, we use M0 as the hypothesized mean for the t-tests and m0 as the hypothesized median for the Wilcoxon tests. To make the analysis more general, the sample size is randomized between 3 and 10 every time we pick a sample. Since DANUBE uses the additive method to combine the p-values, we also use the additive method to combine the p-values of the t-test and Wilcoxon test. When the number of studies is greater than or equal to 20, the combined p-values are calculated using the Central Limit Theorem as described in Section III.
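The combination rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`additive_pvalue`, `clt_pvalue`, `combine_pvalues`) are hypothetical, the additive method is written as the cumulative distribution of a sum of independent Uniform(0, 1) variables (the Irwin-Hall distribution), and the Central Limit Theorem variant assumes that the average of n uniform p-values is approximately normal with mean 0.5 and variance 1/(12n).

```python
import math
from statistics import NormalDist


def additive_pvalue(pvalues):
    """Additive method: P(sum of n independent Uniform(0,1) <= observed sum)."""
    n = len(pvalues)
    s = sum(pvalues)
    # Irwin-Hall CDF: (1/n!) * sum_{k=0}^{floor(s)} (-1)^k * C(n, k) * (s - k)^n
    total = 0.0
    for k in range(math.floor(s) + 1):
        total += (-1) ** k * math.comb(n, k) * (s - k) ** n
    # Clamp to [0, 1] to absorb floating-point round-off.
    return min(1.0, max(0.0, total / math.factorial(n)))


def clt_pvalue(pvalues):
    """Normal approximation: the mean of n uniform p-values ~ N(0.5, 1/(12n))."""
    n = len(pvalues)
    mean_p = sum(pvalues) / n
    return NormalDist(mu=0.5, sigma=math.sqrt(1.0 / (12.0 * n))).cdf(mean_p)


def combine_pvalues(pvalues, clt_threshold=20):
    """Use the exact additive formula for few studies, the CLT for 20 or more."""
    if len(pvalues) >= clt_threshold:
        return clt_pvalue(pvalues)
    return additive_pvalue(pvalues)
```

The alternating sum in the exact formula suffers from cancellation as the number of studies grows, which is one practical reason to switch to the normal approximation for larger numbers of studies.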

Panels (d–h) show the results using the one sample left-tailed t-test for the mean; panels (i–m) show the results using the one sample right-tailed t-test for the mean; panels (n–r) show the results using the one sample left-tailed Wilcoxon test for the median; panels (s–w) show the results using one sample right-tailed Wilcoxon test for the median.

Panel (d) shows the distribution of p-values for samples drawn from the null distribution H0. To plot this panel, we randomly select 100,000 samples from H0 and then calculate the p-values using the left-tailed t-test. Since the null distribution H0 is not normal, the resulting p-values are not uniformly distributed. Panel (e) displays the distribution of combined p-values for samples drawn from the null distribution H0. To calculate a combined p-value, we randomly pick 10 samples from the null population H0 and then calculate the 10 p-values using the left-tailed t-test. From these 10 p-values, we calculate a combined p-value using the additive method. This procedure is repeated 100,000 times to generate the distribution of the combined p-values under the null hypothesis. Similarly, panel (f) displays the distribution of the combined p-values for samples drawn from the alternative distribution H1.

The red dashed lines in panels (e, f) show the 0.05 cutoff. Since the combined p-values in (e) are calculated under the null hypothesis, values smaller than the cutoff are false positives. Therefore, the blue area to the left of the red dashed line is the type I error of the classical meta-analysis using the left-tailed t-test. Similarly, combined p-values larger than the cutoff in panel (f) are false negatives. The blue area to the right of the red dashed line in panel (f) represents the type II error.

The results show that combined p-values will be biased towards zero, since p-values of the left-tailed t-test are biased towards zero. To understand the behavior of the meta-analysis, we display type I and type II error in panels (g, h) with varying numbers of studies to be combined. As the number of studies increases, the meta-analysis becomes more biased, and type I error increases. For example, when the number of studies reaches 50, the analysis has more than 60% false positives. Paradoxically, increasing the number of studies will make the meta-analysis less useful due to the increase of type I error.
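The inflation of type I error with the number of studies can be reproduced qualitatively with a small simulation. The sketch below is only an illustration under assumptions that are not in the paper: instead of running t-tests on the skewed null H0, it uses a hypothetical stand-in for biased null p-values, namely u raised to a power greater than 1 for u uniform on (0, 1), which pushes p-values towards zero. The function names and the exponent 1.5 are arbitrary choices.

```python
import math
import random


def additive_pvalue(pvalues):
    """Additive method: CDF of a sum of n independent Uniform(0,1) variables."""
    n = len(pvalues)
    s = sum(pvalues)
    total = 0.0
    for k in range(math.floor(s) + 1):
        total += (-1) ** k * math.comb(n, k) * (s - k) ** n
    return min(1.0, max(0.0, total / math.factorial(n)))


def type1_error(n_studies, exponent, n_rep=10000, alpha=0.05, seed=0):
    """Fraction of null meta-analyses rejected at level alpha.

    exponent == 1 gives uniform null p-values (unbiased test);
    exponent > 1 is an ASSUMED stand-in for a test biased towards zero.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        pvals = [rng.random() ** exponent for _ in range(n_studies)]
        if additive_pvalue(pvals) < alpha:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, type1_error(n, 1.0), type1_error(n, 1.5))
```

With uniform p-values the rejection rate stays near the nominal level, while with the biased stand-in it grows as more studies are combined, which mirrors the behavior described for panel (g).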

Panels (i–m) display the results of the right-tailed t-test. Panel (i) displays the distribution of p-values for samples drawn from the null distribution H0. Panel (j) displays the combined p-values for samples drawn from the null distribution H0. Panel (k) displays the combined p-values for samples drawn from the alternative distribution H2. Each combined p-value is calculated from 10 individual p-values. The right-tailed t-test is biased towards one, therefore more evidence is required to identify true positives. Compared to the left-tailed t-test, the right-tailed t-test has smaller type I error but larger type II error (less power). Therefore, many more studies would be required for this test to identify true positives. Panel (m) shows that for the case of combining 10 studies, the type II error of the right-tailed t-test is about 0.5 whereas the type II error of the left-tailed t-test is less than 0.2.

Panels (n–r) display the results of meta-analysis using the one sample left-tailed Wilcoxon test for the median. In this example, the left-tailed Wilcoxon test is biased towards one, so more evidence is required to identify true positives. As shown in panel (r), the expected type II error of the meta-analysis is about 0.6 when combining 10 studies. Interestingly, the behavior of the meta-analysis using the left-tailed Wilcoxon test is similar to that of the right-tailed t-test. In both cases, the meta-analysis needs a large number of studies to identify true positives. Panels (m) and (r) show that type II error converges to zero as the number of studies increases.

Panels (s–w) display the results of meta-analysis using the one sample right-tailed Wilcoxon test for the median. Similar to the left-tailed t-test, the right-tailed Wilcoxon test is biased towards zero. As shown in panels (g, v), type I error using either of the two tests increases as the number of studies increases.

D. General case: DANUBE

In this section, we analyze the performance of DANUBE using the same null and alternative distributions that were used for the t-test and Wilcoxon tests. Figure 6 displays the results using DANUBE. Panels (a, b, c) show the null distribution H0 and two alternative distributions H1 and H2. Panels (d–h) display the results using left-tailed DANUBE for the mean; panels (i–m) display the results using right-tailed DANUBE for the mean; panels (n–r) display the results using left-tailed DANUBE for the median; panels (s–w) display the results using right-tailed DANUBE for the median.

Fig. 6.

Type I and type II errors of DANUBE using mean and median as discriminative statistics. Panel (a) displays the probability distribution under the null hypothesis (H0). Panel (b) displays an alternative distribution (H1), which has the same shape as the null distribution but a slightly smaller median. Panel (c) displays an alternative distribution (H2) which has the same shape as the null distribution but a slightly larger median. Panels (d–h) display the results of the left-tailed DANUBE using mean; panels (i–m) display the results of the right-tailed DANUBE using mean; panels (n–r) display the results of left-tailed DANUBE using median; panels (s–w) display the results of right-tailed DANUBE using median. Panels (d, i, n, s) show the p-value distributions for samples drawn from the null. For all four tests, p-values are uniformly distributed under the null hypothesis. Consequently, the combined p-values (using the additive method) are also uniformly distributed under the null hypothesis as shown in (e, j, o, t). The result is that the type I error equals the threshold (0.05) regardless of the number of studies combined, as shown in (g, l, q, v). Panels (h, m, r, w) show that the type II error converges quickly to zero. Combining 10 studies, the type II errors of left and right-tailed DANUBE for the mean are both less than 0.3 compared to 0.51 for the right-tailed t-test. Similarly, using the median, the type II error of DANUBE is less than 0.2 compared to 0.61 for the left-tailed Wilcoxon test.

We randomly select 10,000 samples from the null distribution and use them to construct the empirical distribution of sample means (panels d–m) and likewise of sample medians (panels n–w). For a given empirical distribution, we calculate the probability of observing the discriminating statistic in a study. Panel (d) displays the distribution of empirical p-values for samples drawn from the null distribution H0; we see that these are uniformly distributed under the null hypothesis. Panel (e) displays the distribution of combined p-values for samples drawn from the null distribution H0. Each combined p-value is calculated from 10 individual empirical p-values. The blue area to the left of the red dashed line is type I error. Since the individual p-values are uniformly distributed, the combined p-values are also uniformly distributed. Consequently, the type I error of this test is equal to the threshold. Panel (f) displays the distribution of combined p-values for samples drawn from the alternative distribution H1. The blue area to the right of the red dashed line is the type II error.
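The construction of empirical p-values can be illustrated with a short sketch. Several details here are assumptions made for the example only: the skewed null population is taken to be log-normal (the paper's H0 is an unspecified non-standard distribution), the function names are hypothetical, and the empirical p-value is computed as the plain fraction of null statistics at least as extreme as the observed one.

```python
import bisect
import random
import statistics


def draw_sample(rng, shift=0.0):
    """One study: 3 to 10 values from an ASSUMED skewed (log-normal) population."""
    size = rng.randint(3, 10)
    return [rng.lognormvariate(0.0, 1.0) + shift for _ in range(size)]


def build_empirical_null(rng, n_null=10000, stat=statistics.mean):
    """Sorted values of the statistic over many samples drawn from the null."""
    return sorted(stat(draw_sample(rng)) for _ in range(n_null))


def empirical_pvalue(null_sorted, observed, tail="left"):
    """Fraction of null statistics at least as extreme as the observed one."""
    n = len(null_sorted)
    if tail == "left":
        # Count of null statistics <= observed.
        return bisect.bisect_right(null_sorted, observed) / n
    # Count of null statistics >= observed.
    return (n - bisect.bisect_left(null_sorted, observed)) / n


if __name__ == "__main__":
    rng = random.Random(1)
    null_means = build_empirical_null(rng)
    pvals = [empirical_pvalue(null_means, statistics.mean(draw_sample(rng)))
             for _ in range(5000)]
    # Under the null, about 5% of empirical p-values should fall below 0.05.
    print(sum(p < 0.05 for p in pvals) / len(pvals))
```

Passing `stat=statistics.median` to `build_empirical_null`, and using the sample median as the observed statistic, gives the corresponding sketch for the median. Because the p-value is read off the empirical distribution of the statistic itself, no assumption about normality or symmetry of the population is needed.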

Panels (g, h) display the type I and type II error of DANUBE with varying numbers of combined studies. The graphs show that the type I error of DANUBE consistently equals the threshold while type II error decreases when the number of studies increases. When combining 10 studies, the type I and type II errors of the left-tailed DANUBE for the mean are 0.05 and 0.27, respectively, compared to 0.24 and 0.14 for the left-tailed t-test. When the number of the studies increases over 30, one can expect DANUBE to give a 0.05 type I error and an almost zero type II error.
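The trend in type II error can be checked with a rough simulation that combines empirical p-values across studies. This sketch makes several assumptions that are not taken from the paper: the null population is log-normal, the alternative is the same population shifted down by 0.5, the statistic is the sample mean, and the Central Limit Theorem approximation is used for combination, so it is only applied with 20 or more studies. The numbers it produces are therefore not expected to match those reported for Figure 6.

```python
import bisect
import math
import random
import statistics
from statistics import NormalDist


def sample_mean(rng, shift=0.0):
    """Mean of one study of 3 to 10 values from an ASSUMED log-normal population."""
    size = rng.randint(3, 10)
    return statistics.mean(rng.lognormvariate(0.0, 1.0) + shift for _ in range(size))


def type2_error(n_studies, shift=-0.5, n_rep=500, alpha=0.05, seed=0):
    """Fraction of meta-analyses that fail to reject at level alpha (left-tailed).

    With shift == 0 the samples come from the null, so the returned value
    is 1 minus the type I error rather than a type II error.
    """
    rng = random.Random(seed)
    # Empirical null distribution of the sample mean.
    null_sorted = sorted(sample_mean(rng) for _ in range(10000))
    n_null = len(null_sorted)
    # CLT approximation for the mean of n_studies uniform p-values.
    combiner = NormalDist(mu=0.5, sigma=math.sqrt(1.0 / (12.0 * n_studies)))
    misses = 0
    for _ in range(n_rep):
        pvals = [bisect.bisect_right(null_sorted, sample_mean(rng, shift)) / n_null
                 for _ in range(n_studies)]
        combined = combiner.cdf(statistics.mean(pvals))
        if combined >= alpha:
            misses += 1
    return misses / n_rep


if __name__ == "__main__":
    for n in (20, 30, 40):
        print(n, type2_error(n))
```

Calling `type2_error(n, shift=0.0)` gives a quick check of the null behavior: the result should stay close to 0.95 whatever the number of studies, consistent with a type I error equal to the threshold.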

Similar to the left-tailed test, right-tailed DANUBE on the mean has the expected type I error and a reasonable type II error as shown in panels (l, m). With 10 studies to be combined, the right-tailed DANUBE’s type I and type II errors are 0.05 and 0.25, respectively, compared to 0.01 and 0.51 for the right-tailed t-test. The results for the mean show that both left- and right-tailed type I errors are equal to the threshold while the type II error decreases rapidly. On the contrary, the left and right-tailed t-tests have unpredictable behavior due to the skewness of the null distribution.

Panels (n–w) show the results of left- and right-tailed DANUBE for the median. As expected, the type I error for the median is also equal to the threshold, regardless of the number of studies that are combined. The test is proven to be powerful for both tails with type II error less than 0.2 for 10 studies. When compared to the left-tailed Wilcoxon test on 10 studies, the DANUBE left-tailed type II error is 0.17 as opposed to 0.61.

V. Conclusions

In this paper, we present a new framework to combine the results of multiple studies in order to gain more statistical power. Our framework first calculates the empirical p-values for each study using the empirical distribution of the discriminating statistic. It then combines the empirical p-values using either the Central Limit Theorem or the additive method. The new framework makes no statistical assumptions about the data and is therefore usable in many practical cases when no simple model is appropriate. In addition, use of the additive method makes the framework more robust to outliers.

The advantage of the new meta-analysis framework is demonstrated using both simulation and real-world data. In our simulation study, we compare the results of DANUBE to the classical additive method using the one sample t-test and Wilcoxon signed-rank test. The skewness and the non-normality of the simulated null distribution produce systematic bias in classical meta-analysis, either increasing type I error or decreasing the power of the test. In contrast, the type I error of DANUBE is equal to the threshold cutoff and type II error declines quickly when the number of studies increases.

To evaluate the proposed framework for pathway analysis applications, we examine 7 Alzheimer’s and 9 acute myeloid leukemia datasets using 25 approaches: 6 meta-analysis methods, Stouffer’s, Z-method, Brown’s, Fisher’s, the additive method and DANUBE, each of them combined with 4 representative pathway analysis methods, GSA, SPIA, PADOG, and GSEA, plus an additional independent meta-analysis method MetaPath. The results confirm the advantage of DANUBE over classical meta-analysis to identify pathways relevant to the phenotype.

This work describes an important limitation of current meta-analysis techniques, and provides a general statistical approach to increase the power of an analysis method using empirical distributions. With vast databases of biological data being made available, this framework may be powerful because it lets the data speak for itself. The proposed framework is flexible enough to be applicable to various types of studies, including gene-level analysis, pathway analysis, or clinical trials to assess the effect of a therapy in complex diseases.

Supplementary Material

DANUBE_PIEEE_Suppl

Acknowledgments

This research was supported in part by the following grants: NIH R01 DK089167, R42 GM087013 and NSF DBI-0965741, and by the Robert J. Sokol Endowment in Systems Biology. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any of the funding agencies. We thank Diana Diaz for help and useful discussion.

Biographies

Tin Nguyen received the BSc and MSc degrees in computer science from Eotvos Lorand University in Budapest, Hungary. He is currently a PhD Candidate and a member of the Intelligent Systems and Bioinformatics Laboratory (ISBL) in the Department of Computer Science at Wayne State University, Michigan. His research interests include computational and statistical methods for analyzing high-throughput data. His current focus is meta-analysis, multi-omics data integration, and disease subtyping.

Cristina Mitrea is a PhD Candidate and a member of the Intelligent Systems and Bioinformatics Laboratory (ISBL) in the Department of Computer Science at Wayne State University. In 2012, she received the Master of Science in Computer Science from Wayne State University. Her work is focused on research in data mining techniques applied to bioinformatics and computational biology. The main focus of her research is developing bioinformatics tools for cancer studies. Other interests include network discovery and meta-analysis applied to pathway analysis. She is also a student member of IEEE and ACM.

Rebecca Tagett has a Bachelor’s in Physics, a Master’s in Molecular Biology, and 10 years of R&D experience in industry as a Computational Biologist. A PhD Candidate and a member of the Intelligent Systems and Bioinformatics Laboratory (ISBL) in the Department of Computer Science at Wayne State University, her research focuses on phenotypic prediction using multi-omics. Her interests are Functional Genomics, Scientific Writing, Bioinformatics and Biostatistics. She is a member of the International Society for Computational Biology (ISCB).

Sorin Draghici is the Associate Dean for Innovation and Entrepreneurship, and Director, James and Patricia Anderson Engineering Ventures Institute in the College of Engineering at Wayne State University. He currently holds the Robert J. Sokol, MD Endowed Chair in Systems Biology, as well as appointments as full professor in the Department of Computer Science and the Department of Obstetrics and Gynecology, Wayne State University. Professor Draghici is also the head of the Intelligent Systems and Bioinformatics Laboratory (ISBL) in the Department of Computer Science. His work is focused on research in artificial intelligence, machine learning and data mining techniques applied to bioinformatics and computational biology. He has published two best-selling books on data analysis of high throughput genomics data, 8 book chapters and over 160 peer-reviewed journal and conference papers. His research laboratory has a strong track record in developing tools for data analysis of high throughput data. His laboratory has developed 8 analysis tools in this area, tools that have been made available over the web for over 10 years to over 11,000 scientists from 5 continents. He has also co-authored 3 analysis packages in Bioconductor. His top 4 papers in this area have over 2,000 total citations, while his entire body of work has gathered over 7,000 citations. During his 17 years as a faculty member, he was able to attract $8,262,283 as PI and $27,418,291 as co-PI in NIH and NSF grants.

Contributor Information

Tin Nguyen, Department of Computer Science, Wayne State University, Detroit, MI 48202.

Cristina Mitrea, Department of Computer Science, Wayne State University, Detroit, MI 48202.

Rebecca Tagett, Department of Computer Science, Wayne State University, Detroit, MI 48202.

Sorin Draghici, Department of Computer Science and the Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48202.

References

  • 1. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Research. 2013;41(D1):D991–D995. doi: 10.1093/nar/gks1193.
  • 2. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research. 2002;30(1):207–210. doi: 10.1093/nar/30.1.207. [Online]. Available: http://nar.oxfordjournals.org/cgi/content/abstract/30/1/207.
  • 3. Rustici G, Kolesnikov N, Brandizi M, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Ison J, Keays M, Kurbatova N, Malone J, Mani R, Mupo A, Pereira RP, Pilicheva E, Rung J, Sharma A, Tang YA, Ternent T, Tikhonov A, Welter D, Williams E, Brazma A, Parkinson H, Sarkans U. ArrayExpress update–trends in database growth and links to data analysis tools. Nucleic Acids Research. 2013;41(D1):D987–D990. doi: 10.1093/nar/gks1174.
  • 4. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone S-A. ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Research. 2003;31(1):68–71. doi: 10.1093/nar/gkg091.
  • 5. Tseng GC, Ghosh D, Feingold E. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Research. 2012;40(9):3785–3799. doi: 10.1093/nar/gkr1265.
  • 6. Ramasamy A, Mondry A, Holmes CC, Altman DG. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Medicine. 2008;5(9):e184. doi: 10.1371/journal.pmed.0050184.
  • 7. Manoli T, Gretz N, Gröne H-J, Kenzelmann M, Eils R, Brors B. Group testing for pathway analysis improves comparability of different microarray datasets. Bioinformatics. 2006;22(20):2500–2506. doi: 10.1093/bioinformatics/btl424.
  • 8. Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas H, Hersch S, Hogarth P, Bouzou B, Jensen R, Krainc D. Genome-wide expression profiling of human blood reveals biomarkers for Huntington’s disease. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(31):11023–11028. doi: 10.1073/pnas.0504921102.
  • 9. Friedman L. Why vote-count reviews don’t count. Biological Psychiatry. 2001;49(2):161–162.
  • 10. Hedges LV, Olkin I. Vote-counting methods in research synthesis. Psychological Bulletin. 1980;88(2):359.
  • 11. Shen K, Tseng GC. Meta-analysis for pathway enrichment analysis when combining multiple genomic studies. Bioinformatics. 2010;26(10):1316–1323. doi: 10.1093/bioinformatics/btq148.
  • 12. Setlur SR, Royce TE, Sboner A, Mosquera J-M, Demichelis F, Hofer MD, Mertz KD, Gerstein M, Rubin MA. Integrative microarray analysis of pathways dysregulated in metastatic prostate cancer. Cancer Research. 2007;67(21):10296–10303. doi: 10.1158/0008-5472.CAN-07-2173.
  • 13. Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Research. 2002;62(15):4427–4433.
  • 14. Kaever A, Landesfeind M, Feussner K, Morgenstern B, Feussner I, Meinicke P. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets. PloS One. 2014;9(2):e89297. doi: 10.1371/journal.pone.0089297.
  • 15. Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, Voichiţa C, Drăghici S. Methods and approaches in the topology-based analysis of biological pathways. Frontiers in Physiology. 2013;4:278. doi: 10.3389/fphys.2013.00278.
  • 16. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Computational Biology. 2012;8(2):e1002375. doi: 10.1371/journal.pcbi.1002375.
  • 17. Kotelnikova E, Shkrob MA, Pyatnitskiy MA, Ferlini A, Daraselia N. Novel approach to meta-analysis of microarray datasets reveals muscle remodeling-related drug targets and biomarkers in Duchenne muscular dystrophy. PLoS Computational Biology. 2012;8(2):e1002365. doi: 10.1371/journal.pcbi.1002365.
  • 18. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 2009;37(1):1–13. doi: 10.1093/nar/gkn923.
  • 19. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 2000 Jan;28(1):27–30. doi: 10.1093/nar/28.1.27.
  • 20. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 1999;27(1):29–34. doi: 10.1093/nar/27.1.29.
  • 21. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR, Jassal B, Jupe S, Matthews L, May B, Palatnik S, Rothfels K, Shamovsky V, Song H, Williams M, Birney E, Hermjakob H, Stein L, D’Eustachio P. The Reactome pathway knowledgebase. Nucleic Acids Research. 2014;42(D1):D472–D477. doi: 10.1093/nar/gkt1102.
  • 22. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–1740. doi: 10.1093/bioinformatics/btr260.
  • 23. Loughin TM. A systematic comparison of methods for combining p-values from independent tests. Computational Statistics & Data Analysis. 2004;47(3):467–485.
  • 24. Fisher RA. Statistical methods for research workers. Edinburgh: Oliver & Boyd; 1925.
  • 25. Edgington ES. An additive method for combining probability values from independent experiments. The Journal of Psychology. 1972;80(2):351–363.
  • 26. Hall P. The distribution of means for samples of size n drawn from a population in which the variate takes values between 0 and 1, all such values being equally probable. Biometrika. 1927;19(3–4):240–244.
  • 27. Irwin JO. On the frequency distribution of the means of samples from a population having any law of frequency with finite moments, with special reference to Pearson’s Type II. Biometrika. 1927;19(3–4):225–239.
  • 28. Tippett LHC. The methods of statistics. London: Williams & Norgate; 1931.
  • 29. Wilkinson B. A statistical consideration in psychological research. Psychological Bulletin. 1951;48(2):156. doi: 10.1037/h0059111.
  • 30. Li J, Tseng GC. An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. The Annals of Applied Statistics. 2011;5(2A):994–1019.
  • 31. Choi H, Shen R, Chinnaiyan AM, Ghosh D. A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments. BMC Bioinformatics. 2007;8(1):364. doi: 10.1186/1471-2105-8-364.
  • 32. Shen R, Ghosh D, Chinnaiyan AM. Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data. BMC Genomics. 2004;5(1):94. doi: 10.1186/1471-2164-5-94.
  • 33. Barton SJ, Crozier SR, Lillycrop KA, Godfrey KM, Inskip HM. Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions. BMC Genomics. 2013;14(1):161. doi: 10.1186/1471-2164-14-161.
  • 34. Fodor AA, Tickle TL, Richardson C. Towards the uniform distribution of null P values on Affymetrix microarrays. Genome Biology. 2007;8(5):R69. doi: 10.1186/gb-2007-8-5-r69.
  • 35. Bland M. Do baseline p-values follow a uniform distribution in randomised trials? PloS One. 2013;8(10):e76010. doi: 10.1371/journal.pone.0076010.
  • 36. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
  • 37. Efron B, Tibshirani R. On testing the significance of sets of genes. The Annals of Applied Statistics. 2007;1(1):107–129.
  • 38. Tarca AL, Drăghici S, Bhatti G, Romero R. Down-weighting overlapping genes improves gene set analysis. BMC Bioinformatics. 2012;13(1):136. doi: 10.1186/1471-2105-13-136.
  • 39. Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics. 2003 Jul;34(3):267–273. doi: 10.1038/ng1180.
  • 40. Khatri P, Drăghici S, Ostermeier GC, Krawetz SA. Profiling gene expression using Onto-Express. Genomics. 2002;79(2):266–270. doi: 10.1006/geno.2002.6698.
  • 41. Drăghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA. Global functional profiling of gene expression. Genomics. 2003;81(2):98–104. doi: 10.1016/s0888-7543(02)00021-6.
  • 42. Beißbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004 Jun;20:1464–1465. doi: 10.1093/bioinformatics/bth088.
  • 43. Tarca AL, Drăghici S, Khatri P, Hassan SS, Mittal P, Kim J-s, Kim CJ, Kusanovic JP, Romero R. A novel signaling pathway impact analysis. Bioinformatics. 2009;25(1):75–82. doi: 10.1093/bioinformatics/btn577.
  • 44. Voichita C, Draghici S. ROntoTools: R Onto-Tools suite. 2013 R package. [Online]. Available: http://www.bioconductor.org.
  • 45. Drăghici S, Khatri P, Tarca AL, Amin K, Done A, Voichiţa C, Georgescu C, Romero R. A systems biology approach for pathway level analysis. Genome Research. 2007;17(10):1537–1545. doi: 10.1101/gr.6202607.
  • 46. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(16):9440–9445. doi: 10.1073/pnas.1530509100.
  • 47. Edgington ES. A normal curve method for combining probability values from independent experiments. The Journal of Psychology. 1972;82(1):85–89.
  • 48. Stouffer S, Suchman E, DeVinney L, Star S, Williams JRM. The American Soldier: Adjustment during army life. Vol. 1. Princeton: Princeton University Press; 1949.
  • 49. Brown MB. A method for combining nonindependent, one-sided tests of significance. Biometrics. 1975:987–992.
  • 50. Wang X, Kang DD, Shen K, Song C, Lu S, Chang L-C, Liao SG, Huo Z, Tang S, Ding Y, Kaminski N, Sibille E, Lin Y, Li J, Tseng GC. An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics. 2012;28(19):2534–2536. doi: 10.1093/bioinformatics/bts485.
  • 51. Swerdlow RH. Brain aging, Alzheimer’s disease, and mitochondria. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease. 2011;1812(12):1630–1639. doi: 10.1016/j.bbadis.2011.08.012.
  • 52. Maruszak A, Żekanowski C. Mitochondrial dysfunction and Alzheimer’s disease. Progress in Neuro-Psychopharmacology and Biological Psychiatry. 2011;35(2):320–330. doi: 10.1016/j.pnpbp.2010.07.004.
  • 53. Zhu X, Perry G, Smith MA, Wang X. Abnormal mitochondrial dynamics in the pathogenesis of Alzheimer’s disease. Journal of Alzheimer’s Disease. 2013;33:S253–S262. doi: 10.3233/JAD-2012-129005.
  • 54. Querfurth HW, LaFerla FM. Mechanisms of disease. New England Journal of Medicine. 2010;362(4):329–344. doi: 10.1056/NEJMra0909142.
  • 55. Donato M, Xu Z, Tomoiaga A, Granneman JG, MacKenzie RG, Bao R, Than NG, Westfall PH, Romero R, Drăghici S. Analysis and correction of crosstalk effects in pathway analysis. Genome Research. 2013;23(11):1885–1893. doi: 10.1101/gr.153551.112.
  • 56. Brookes PS, Yoon Y, Robotham JL, Anders M, Sheu S-S. Calcium, ATP, and ROS: a mitochondrial love-hate triangle. American Journal of Physiology-Cell Physiology. 2004;287(4):C817–C833. doi: 10.1152/ajpcell.00139.2004.
  • 57. Gosset WS. The Probable Error of a Mean. Biometrika. 1908;6:1–25.
  • 58. Pearson E, Hartley H. Biometrika tables for statisticians. Biometrika Trust; 1976.
  • 59. Wilcoxon F. Individual comparisons by ranking methods. Biometrics. 1945;1(6):80–83.
  • 60. Wilcoxon F, Katti S, Wilcox RA. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics. 1970;1:171–259.
  • 61. Hollander M, Wolfe DA, Chicken E. Nonparametric statistical methods. Vol. 751. John Wiley & Sons; 2013.
