PLOS One. 2014 Mar 24;9(3):e91225. doi: 10.1371/journal.pone.0091225

Accuracy Evaluation of the Unified P-Value from Combining Correlated P-Values

Gelio Alves 1, Yi-Kuo Yu 1,*
Editor: Frank Emmert-Streib 2
PMCID: PMC3963868  PMID: 24663491

Abstract

Meta-analysis methods that combine P-values into a single unified P-value are frequently employed to improve confidence in hypothesis testing. An assumption made by most meta-analysis methods is that the P-values to be combined are independent, which may not always be true. To investigate the accuracy of the unified P-value from combining correlated P-values, we have evaluated a family of statistical methods that combine: independent, weighted independent, correlated, and weighted correlated P-values. Statistical accuracy evaluation by combining simulated correlated P-values showed that correlation among P-values can have a significant effect on the accuracy of the combined P-value obtained. Among the statistical methods evaluated, those that weight P-values compute more accurate combined P-values than those that do not. Also, statistical methods that utilize the correlation information have the best performance, producing significantly more accurate combined P-values. In our study we have demonstrated that statistical methods that combine P-values based on the assumption of independence can produce inaccurate P-values when combining correlated P-values, even when the P-values are only weakly correlated. Therefore, to avoid drawing false conclusions during hypothesis testing, our study advises caution when interpreting the P-value obtained from combining P-values of unknown correlation. However, when the correlation information is available, the weighting-capable statistical method, first introduced by Brown and recently modified by Hou, seems to perform the best amongst the methods investigated.

Introduction

Meta-analysis methods that combine P-values into a single unified P-value are commonly used to rank or score a list of hypotheses [1]. For each hypothesis tested, the P-values to be combined are often acquired from studying different features associated with the hypothesis or from using different data analysis methods (DAM) to analyze a chosen feature. Either approach, when used to test the same list of hypotheses, assigns an overall P-value to each hypothesis tested. These P-values are then usually sorted, with the most significant result ranking first in the list. Given that different features may not be completely independent and that different DAMs may share protocols and use similar information, it is likely that the P-values obtained for a hypothesis are correlated.

Most P-value combining methods assume that the P-values to be combined are independent or weakly correlated [2], [3]. When the unified P-value is computed by combining correlated P-values without properly taking the correlation into account, there can be notable effects on the significance assigned to the hypothesis tested. As the P-values to be combined are possibly correlated, it is important to investigate the effect that correlation has on the unified P-value. The current study is designed to evaluate the accuracy of the unified P-value computed by combining (positively) correlated P-values using some commonly applied statistical methods. By P-value accuracy, we mean how well, on average, the reported P-value agrees with the one-sided cumulative distribution function of the random variable (associated with the null hypotheses tested) at the critical region. In other words, an accurate P-value means that when one controls the type-I error rate at a level Inline graphic, the type-I error rate is really controlled at the level Inline graphic. To keep this paper focused, we will not provide a lengthy introduction. For methods that we will evaluate, more details are provided in the Methods section. For others, we will only provide the readers with appropriate references.

Several studies have been performed to evaluate methods that combine independent P-values [4]–[10]. For example, Rosenthal has evaluated nine methods for combining P-values and has summarized advantages, limitations and applications for each method [4]. Loughin [5] has also conducted a systematic comparison of methods for combining P-values and recommended that practitioners choose a method based on the structure and expectation of the problem being studied. Recently, Whitlock [6] has shown that the weighted Z-method has more power and precision than Fisher's test. In other studies, Chen [8], as well as Chen and Nadarajah [9], have shown that either the generalized Fisher method due to Lancaster or a special case of Lancaster's test outperforms the weighted Z-method, while Zaykin [10] has shown that the weighted Z-method has similar power to Lancaster's method when the weights are selected to be the square roots of sample sizes.

As for combining correlated P-values, only a few studies have been conducted to evaluate the accuracy of the unified P-value computed by existing statistical methods [11], [12]. Evidently, more comprehensive investigations that incorporate different methods, encompass a wide range of correlation strengths, and have a large number of simulations can further our understanding of the effect that correlation has on computing a unified P-value. To advance in this direction, we systematically investigate a family of statistical methods for combining P-values. Because we are interested in combining P-values obtained from right-tailed tests, we have limited our study to methods that combine P-values based on the normal distribution (e.g. Stouffer's method) and on the Chi-square distribution (e.g. Fisher's method), the general purpose method and the right-tail method recommended by Loughin [5]. The two aforementioned methods, aside from being frequently used to combine P-values, are useful and important to study for the following reason. Both have variations that weight P-values while computing the combined P-value (the Lipták, Good and Bhoj methods [13]–[15]) and variations that take into account the correlation among P-values (the Hartung and Hou methods [16], [17]). In addition, all methods mentioned above either have closed-form formulas, i.e., distribution functions, or approximation formulas that can provide the unified P-value with minimum computation cost.

In summary, our study presents an accuracy evaluation of the unified P-value obtained from statistical methods designed to combine independent, weighted independent, correlated, and weighted correlated P-values. We have evaluated the accuracy of the unified P-value from combining positively correlated P-value vectors with correlation among P-value vectors in the range Inline graphic. Our results show that methods designed to combine independent P-values but with the capability of assigning weights to P-values perform better than methods that combine independent P-values without weights. Also, methods that take into account the correlation between P-values perform significantly better than methods designed to combine independent P-values. Based on this study, the method first introduced by Brown [18] to combine correlated P-values and later adapted to include weights by Hou [17] is the best performing one amongst the methods investigated.

Methods

The main task of combining P-values is described below. Given a list of hypotheses Inline graphic, let each hypothesis have Inline graphic P-values associated with it. These Inline graphic P-values can be organized as Inline graphic P-value vectors, Inline graphic, each having Inline graphic components. Each P-value vector may result from analyzing one out of Inline graphic different features of every hypothesis or may be from analyzing a single feature using one of the Inline graphic different DAMs. The Inline graphic P-values associated with hypothesis Inline graphic are Inline graphic. Given those values, one needs to combine them to form a single unified P-value. This scenario can occur in many applications. As an example, when different studies are performed to test a set of genetic loci for allelic imbalance [19], the number of genetic regions tested will correspond to the number of hypotheses Inline graphic and each region will carry with it Inline graphic P-values, one from each of the Inline graphic studies. To fairly rank these possible Inline graphic regions, for each region one would need a unified P-value resulting from combining the Inline graphic P-values associated with it. For database search based peptide identification using mass spectrometry, it is possible to analyze the data using multiple analysis methods. Here for each experimental spectrum, the number of hypotheses tested Inline graphic equals the number of scored peptides in the database and each peptide receives a P-value from each of the Inline graphic analysis methods. To fairly rank the candidate peptides, it is again natural to combine the Inline graphic P-values associated with each scored peptide [3] to reach a unified P-value. In sequence homology detection where multiple motifs are used as a query to a sequence database, it is often necessary to combine the P-values, each from one of the Inline graphic motifs, to assign a statistical significance to a sequence in the sequence database [2]. In this case, Inline graphic is the number of sequences in the database, while Inline graphic is the number of motifs used as the query.

To make the notation uniform, we will use Inline graphic and Inline graphic to represent the cumulative distribution and inverse cumulative distribution. When the subscript Inline graphic, Inline graphic represents respectively the cumulative Normal, Chi-squared, and Gamma distributions. All the parameters of these distributions will be shown as arguments enclosed by a pair of parentheses following the symbol Inline graphic.

Combining Independent P-values

We begin this subsection with a brief introduction of Stouffer's (Z-transform test) and Fisher's (Chi-square test) methods. Generalizations of both methods to combine weighted P-values are also described.

Method 1

The combined Z-transform test was first used by Stouffer et al. [20] and later generalized to include weights by Lipták [13]. Under the null hypothesis, the P-values are uniformly distributed on [0,1]. Given a list of P-values Inline graphic associated with a given Inline graphic, one transforms the P-values to a new variable Inline graphic by a simple transformation

graphic file with name pone.0091225.e112.jpg

where Inline graphic stands for the inverse of the cumulative normal distribution. For the Z-transform test the distribution function used is the standard Normal (Gaussian) distribution with probability density function given by

graphic file with name pone.0091225.e114.jpg

with parameters Inline graphic and Inline graphic.

Stouffer's way to combine the above Inline graphic-values is by defining a new variable

graphic file with name pone.0091225.e118.jpg

which is also Gaussian distributed with P-value given by the formula

graphic file with name pone.0091225.e120.jpg (1)

A generalization of the above equation that assigns weights (Inline graphic) to the variable Inline graphic is known as the weighted Z-transform test [13]

graphic file with name pone.0091225.e123.jpg

The variable of the weighted Z-transform Inline graphic also follows a Normal distribution, and the formula for the P-value is also given by eq. (1).
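The Z-transform test described above can be sketched in a few lines. Because the display equations did not survive extraction, this is only a minimal illustration under stated assumptions: it uses the right-tail convention (a small P-value maps to a large positive z-score), it divides the weighted sum by the square root of the sum of squared weights so that the combined variable has unit variance under independence, and the function name `stouffer` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def stouffer(p_values, weights=None):
    """Combine right-tailed P-values with the (weighted) Z-transform test.

    Assumes every P-value lies strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted Stouffer test
    # Assumed convention: small P-values map to large positive z-scores.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(w * z for w, z in zip(weights, z_scores))
    # Normalize so the combined variable has unit variance under independence.
    z_combined /= sqrt(sum(w * w for w in weights))
    # Upper-tail probability of the combined statistic.
    return 1.0 - _STD_NORMAL.cdf(z_combined)
```

With equal weights the weighted form reduces to the unweighted one, and a single P-value is returned unchanged, which is a quick sanity check on the convention chosen here.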

Method 2

Fisher's method [21] is one of the most widely used methods to combine independent P-values. The combined Fisher P-value is obtained through the following variable:

graphic file with name pone.0091225.e128.jpg

which follows a Chi-squared distribution Inline graphic with 2Inline graphic degrees of freedom. Computing the unified P-value using the Chi-squared distribution is not the most efficient approach because of the significant computational cost in calculating the cumulative distribution Inline graphic. A more efficient way to obtain the unified P-value has been proposed [2], [3], where the unified P-value of Inline graphic has a closed form given by

graphic file with name pone.0091225.e136.jpg (2)

or in terms of the Inline graphic variable

graphic file with name pone.0091225.e138.jpg (3)

Note that as Inline graphic increases Inline graphic decreases and vice versa.
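As an illustration of why a closed form is cheap to evaluate, the sketch below uses the standard finite series for the upper tail of a Chi-squared distribution with an even number of degrees of freedom. The original eqs. (2) and (3) were lost in extraction, so the exact form used by the authors is assumed rather than reproduced; the function name `fisher` is hypothetical.

```python
from math import exp, log


def fisher(p_values):
    """Fisher's combined P-value via the finite series for even d.o.f.

    Assumes every P-value lies in (0, 1].
    """
    # x = -ln(product of the P-values), i.e. half of Fisher's statistic.
    x = -sum(log(p) for p in p_values)
    term = 1.0   # running value of x**k / k!, starting at k = 0
    total = 1.0
    for k in range(1, len(p_values)):
        term *= x / k
        total += term
    # exp(-x) equals the product of the P-values.
    return exp(-x) * total
```

For a single P-value the series has one term and the input is returned unchanged; for two P-values of 0.5 the result is 0.25 * (1 + ln 4).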

Fisher's method does not assign weights to the P-values to be combined. However, when information is available regarding how the P-values were obtained, it might be beneficial to weight them. Lancaster et al. [22] address this issue by replacing the random variable Inline graphic with Inline graphic, a variable following a Chi-squared distribution with Inline graphic degrees of freedom not necessarily equal to two.

In Lancaster's procedure, summarized below, one can exploit the equivalence between the Chi-squared distribution Inline graphic and the gamma distribution Inline graphic to reach a different weighting generalization. For hypothesis Inline graphic, the variable Inline graphic can now be written as

graphic file with name pone.0091225.e151.jpg

which evidently follows a Chi-squared distribution with Inline graphic degrees of freedom. In the expression above, Fisher's method is recovered by setting Inline graphic for all Inline graphic. Another way to incorporate weights is to keep Inline graphic while retaining a general Inline graphic value. Specifically, one may choose, with Inline graphic being the weight factor, to use the following new variable

graphic file with name pone.0091225.e158.jpg

The P-value for Inline graphic can be easily evaluated using the same technique as that in [3] and is given below

graphic file with name pone.0091225.e161.jpg (4)

Interestingly, with Inline graphic, eq. (4) corresponds to the unified P-value of multiplying weighted independent P-values obtained earlier by Good [14]. This can be seen by the following observation. Good defined his variable

graphic file with name pone.0091225.e165.jpg

and the corresponding P-value is given by

graphic file with name pone.0091225.e167.jpg (5)

When expressed in the variable Inline graphic, we easily see that

graphic file with name pone.0091225.e169.jpg

in agreement with eq. (4) when Inline graphic.
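Good's result for a weighted product of independent P-values can be sketched as follows. The original eq. (5) was lost in extraction, so the code uses the textbook form of Good's formula, in which the combined P-value is a sum over terms with coefficients w_i^(L-1) / prod_{j != i} (w_i - w_j); treat that form and the function name `good` as assumptions.

```python
from math import prod


def good(p_values, weights):
    """Good's combined P-value for the weighted product of P-values.

    Requires pairwise distinct, positive weights; identical weights make a
    denominator vanish and raise ZeroDivisionError.
    """
    n = len(weights)
    # Weighted product statistic: prod_i p_i ** w_i.
    q = prod(p ** w for p, w in zip(p_values, weights))
    total = 0.0
    for i, w_i in enumerate(weights):
        denom = prod(w_i - w_j for j, w_j in enumerate(weights) if j != i)
        total += w_i ** (n - 1) / denom * q ** (1.0 / w_i)
    return total
```

When two weights are close, the terms become large with opposite signs and cancel, which is the source of the numerical instability associated with nearly identical weights. With weights such as 1.001 and 0.999 the result is still close to Fisher's value for the same P-values.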

A question that arises naturally when using methods such as the weighted Z-transform test, Good's test, and Lancaster's test is how to obtain the optimal weights (Inline graphic). This difficult question has been raised before, and it was suggested that the choice of weights may vary from case to case [23]. Existing methods to assign/estimate the weights include, but are not limited to: (1) weight in proportion to the reciprocal of the variance estimated from each study [6], (2) estimate the weights from one's prior belief about a method or feature [24], (3) select weights to stabilize the variance of the combined test statistics [25], and (4) use weights that improve the testing power [26]. Because there is no universal procedure to compute the optimal weights to be used, in this study the weights, when used, were randomly generated and normalized to sum to one (see Table 1).

Table 1. Breakdown of the Methods Investigated for Combining P-values.
Method Name | Ref. number | Eq. number | Acc. weights | Nor. weights | Account for corr.
Fisher | [21] | 3 | no | none | no
Stouffer | [20] | 1 | no | none | no
Bhoj | [15] | 6 | yes | Inline graphic | no
Good | [14] | 5 | yes | Inline graphic | no
Lipták | [13] | 1 | yes | Inline graphic | no
Hartung | [16] | 9 | yes | Inline graphic | yes
Hou | [17] | 14 | yes | Inline graphic | yes

The first column of the table provides the names of the methods for combining P-values investigated in our study. The second column lists the reference number (Ref.) cited in this paper for the publication corresponding to each method. The third column provides the equation number of the distribution function used to compute the P-value. The fourth column indicates whether a method's equation can accommodate (acc.) weights when combining P-values. The fifth column gives the normalization (nor.) procedure used to normalize the weights. Finally, the last column indicates whether a method can account for correlation (corr.) between P-values.

There are also two apparent problems with Lancaster's eq. (4) and Good's eq. (5). First, the weights used cannot be identical; otherwise singularities occur [14], [15]. Second, if the differences between some of the weights are small, numerical instability can occur [15], [17], [27]. To address the problem of numerical instability associated with identical and almost identical weights, Bhoj [15] suggested an approximation using a linear combination of Inline graphic gamma density functions (with Inline graphic)

graphic file with name pone.0091225.e184.jpg (6)

where Inline graphic is the incomplete gamma function and Inline graphic is the gamma function. Although the approximation provided by Bhoj does reduce to Fisher's distribution when the weights are all equal and does not encounter singularities when weights are identical or nearly identical, this approximation does not lead to Good's distribution when the weights are all different. A recent publication [27] has provided an analytical formula that not only is numerically stable when combining P-values with nearly degenerate or identical weights but also correctly reproduces Fisher's and Good's results as limiting cases.

Combining Dependent P-values

In this subsection we summarize two statistical methods that are generalizations of Stouffer's test (Z-transform test) and Fisher's test (Chi-square test) that attempt to account for the correlation among the P-values to be combined.

Method 3

Hartung [16] incorporates the correlation among P-values by introducing into the Z-transform test (eq. (1)) the correlation matrix, with elements Inline graphic computed from the variable pairs Inline graphic, and by defining a new variable

graphic file with name pone.0091225.e192.jpg

where

graphic file with name pone.0091225.e193.jpg (7)

and

graphic file with name pone.0091225.e194.jpg (8)

where Inline graphic is the average value of Inline graphic, Inline graphic is the total number of hypotheses tested, and Inline graphic is the variance of Inline graphic.

The P-value for Inline graphic is then approximated by the standard Normal distribution

graphic file with name pone.0091225.e202.jpg (9)

which nevertheless becomes exact in the two extreme limits of Inline graphic Inline graphic and Inline graphic Inline graphic. Although in general the distribution of Inline graphic is only approximately normal, it is arguable that ignoring correlation can cause more damage to the combined P-value than the deviations from normality. Applications and extensions of Hartung's idea can also be found in more recent publications [12], [28].
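The idea behind Method 3 can be sketched by rescaling the weighted sum of z-scores with the variance it would have if every pair of z-scores shared a common correlation. The exact definition of the authors' variable was lost in extraction, so this is a simplified sketch: it takes the average correlation `rho` as an input, omits the small-sample correction term found in Hartung's original formulation, and uses the hypothetical name `hartung`.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def hartung(p_values, rho, weights=None):
    """Z-transform test with a common pairwise correlation rho (assumed known)."""
    if weights is None:
        weights = [1.0] * len(p_values)
    # Right-tail convention: small P-values give large positive z-scores.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    # Variance of the weighted sum when all pairs have correlation rho.
    variance = (1.0 - rho) * sum_w2 + rho * sum_w * sum_w
    return 1.0 - _STD_NORMAL.cdf(numerator / sqrt(variance))
```

Setting `rho` to zero recovers the weighted Z-transform test, while `rho` equal to one with identical inputs returns the common P-value, so fully redundant evidence is not counted twice.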

Method 4

Following Satterthwaite's procedure [29], there have been some attempts, when combining correlated P-values, to obtain an approximate unified P-value for Fisher's variable (no weights) [18], [30] and for Good's variable (unequal weights) [17]. The main idea of Satterthwaite's procedure is to equate the first two moments of the uncharacterized distribution to those of a Chi-squared distribution. Brown [18] and Kost et al. [30] tried to approximate the distribution of Fisher's variable

graphic file with name pone.0091225.e211.jpg

and Hou [17] the distribution of Good's variable

graphic file with name pone.0091225.e212.jpg

to that of a Chi-squared distribution Inline graphic, with Inline graphic being a scale factor to be determined.

The expectation value (Inline graphic) and the variance (Inline graphic) of Inline graphic by formal operation are given respectively by

graphic file with name pone.0091225.e218.jpg (10)
graphic file with name pone.0091225.e219.jpg (11)

On the other hand, the expectation value and variance of Inline graphic using Inline graphic yields

graphic file with name pone.0091225.e222.jpg (12)
graphic file with name pone.0091225.e223.jpg (13)

Equating (10) to (12) and (11) to (13) yields

graphic file with name pone.0091225.e224.jpg

and

graphic file with name pone.0091225.e225.jpg

The covariance (cov) term used above was first estimated by Brown [18] and recently an improved estimation (through numerically tabulating the covariance as a function of the correlation and then performing polynomial fits) was provided by Kost and McDermott [30]

graphic file with name pone.0091225.e226.jpg

where Inline graphic above is the correlation between Inline graphic and Inline graphic. The P-value for Inline graphic is then approximated by that of a Chi-squared distribution

graphic file with name pone.0091225.e232.jpg (14)

Equation (14) reduces to Fisher's formula, eq. (3), when the P-values are independent and the weights are all the same. However, the above equation does not reduce to Good's formula, eq. (5), when the P-values are independent and each carries a different weight.
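The moment-matching procedure of Method 4 can be sketched end to end. Several pieces here are assumptions, since the display equations were lost in extraction: the covariance uses a cubic polynomial in the correlation with coefficients commonly quoted for the Kost and McDermott fit, the weighted variable is taken as the sum of w_i * (-2 ln p_i), and the names `kost_covariance`, `gamma_sf` and `moment_matched_p` are hypothetical. The Chi-squared tail with non-integer degrees of freedom is evaluated with a small series / continued-fraction routine so that the block needs only the standard library.

```python
import math


def kost_covariance(rho):
    """Assumed cubic fit for cov(-2 ln p_i, -2 ln p_j) given correlation rho."""
    return 3.263 * rho + 0.710 * rho**2 + 0.027 * rho**3


def gamma_sf(a, x):
    """Regularized upper incomplete gamma function Q(a, x), for a > 0."""
    if x <= 0.0:
        return 1.0
    prefactor = math.exp(a * math.log(x) - x - math.lgamma(a))
    if x < a + 1.0:
        # Power series for the lower tail, then take the complement.
        term = total = 1.0 / a
        n = 0
        while abs(term) > 1e-16 * abs(total):
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - prefactor * total
    # Continued fraction for the upper tail (modified Lentz algorithm).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h


def moment_matched_p(p_values, rho, weights=None):
    """Approximate combined P-value for correlated P-values.

    rho[i][j] is the correlation between P-value vectors i and j (assumed known).
    """
    n = len(p_values)
    w = [1.0] * n if weights is None else list(weights)
    x = sum(-2.0 * wi * math.log(p) for wi, p in zip(w, p_values))
    mean = 2.0 * sum(w)                      # each -2 ln p has mean 2
    var = 4.0 * sum(wi * wi for wi in w)     # and variance 4
    for i in range(n):
        for j in range(i + 1, n):
            var += 2.0 * w[i] * w[j] * kost_covariance(rho[i][j])
    scale = var / (2.0 * mean)       # scale factor c in x ~ c * chi2(f)
    dof = 2.0 * mean * mean / var    # matched degrees of freedom f
    return gamma_sf(dof / 2.0, x / (2.0 * scale))
```

With a zero correlation matrix and unit weights, the scale factor is one and the degrees of freedom equal twice the number of P-values, so Fisher's result is recovered. With two perfectly correlated, identical P-values, the cubic evaluates to 4 and the common P-value is returned.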

Generating Correlated P-value Vectors

By definition, the P-values of null hypotheses should be uniformly distributed between 0 and 1, which is often assumed by methods of combining P-values. However, the uniformity of P-values, when assigned by available statistical tools to a group of null hypotheses, is often lost. This would handicap the efficacy of methods for combining P-values from the start. To eliminate the effect of nonuniform null P-values from our evaluation, we enforce the quasi-uniformity of null P-values by first constructing a starter P-value vector Inline graphic of size Inline graphic with the Inline graphicth element Inline graphic = Inline graphic, for Inline graphic. (See next paragraph for more details.) This guarantees an even sample of the P-values (in the range from Inline graphic to Inline graphic). To achieve correlations of various strengths, we have used P-value vectors, each of which is obtained by pairwise permuting the elements of a fixed vector (the starter vector with a small perturbation) a randomly chosen number of times. The basic idea is that when the number of pairwise permutations is not large, the resulting P-value vectors will be correlated with the fixed vector and with one another. It is worth pointing out that this approach does not generate correlations with a prescribed strength: even with the same number of random pairwise permutations of the vector elements, the correlation between any pair of such permuted vectors does not have a fixed strength. We believe this is closer to the real-world scenario than having a fixed correlation strength among the P-value vectors. The value of Inline graphic should not matter in terms of testing whether a method can provide an accurate combined P-value. If a small Inline graphic is used, however, the combined P-value will have a large statistical fluctuation that may reduce the resolution of the comparison. On the other hand, making Inline graphic large leads to a long computation time. We find that using Inline graphic yields enough separation among the methods tested without significantly slowing down the computation.

For each method investigated, we have performed a simulation of 500,000 realizations, each of which was conducted as follows. First, pick a random positive integer Inline graphic with Inline graphic. Second, generate the first P-value vector Inline graphic by adding a small random perturbation (Inline graphic) between 0 and Inline graphic to each vector element of Inline graphic: Inline graphic. Evidently, by increasing the upper bound for Inline graphic, one will produce P-values with larger variations from an exactly uniform distribution. In the third step, generate more size-Inline graphic vectors Inline graphic and initialize them to Inline graphic. For each vector generated, its vector elements are pairwise permuted Inline graphic (chosen at the first step) times. After that, using Inline graphic in place of Inline graphic, the pairwise correlation Inline graphic was computed using eq. (8) and the average correlation EInline graphic among vectors was computed using eq. (7). This workflow is illustrated in Figure 1 with Inline graphic for simplicity. The constructed random P-value vectors Inline graphic were then combined to obtain a unified P-value vector (Inline graphic) using the various methods listed in Table 1. Once the unified P-value vector (Inline graphic) was calculated, its elements were sorted in increasing order and it was then compared against the rank (Inline graphic) vector, whose element is obtained by dividing the rank of a Inline graphic element by Inline graphic, i.e., Inline graphic for Inline graphic ranging from 1 to Inline graphic. We shall call Inline graphic, the Inline graphicth element of the rank vector, the normalized rank of rank Inline graphic.
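The generation steps above can be sketched as a small simulation. Several details did not survive extraction, so the following are assumptions: the starter vector is taken as k / (n + 1) for k = 1..n, the perturbation is uniform on [0, max_perturbation], the first vector is left unpermuted, and the pairwise correlation is an ordinary Pearson correlation computed directly on the vector elements. All function and parameter names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlated_p_vectors(n, n_vectors, n_swaps, max_perturbation, rng):
    """Build n_vectors P-value vectors of size n that share most of their order."""
    # Assumed starter vector: evenly spaced values strictly inside (0, 1).
    starter = [k / (n + 1) for k in range(1, n + 1)]
    # Keep max_perturbation below 1 / (n + 1) so all values stay below 1.
    first = [u + rng.uniform(0.0, max_perturbation) for u in starter]
    vectors = [first]
    for _ in range(n_vectors - 1):
        v = list(first)
        for _ in range(n_swaps):
            # One pairwise permutation: swap two randomly chosen elements.
            i, j = rng.randrange(n), rng.randrange(n)
            v[i], v[j] = v[j], v[i]
        vectors.append(v)
    return vectors


def average_correlation(vectors):
    """Mean Pearson correlation over all distinct pairs of vectors."""
    pairs = [(a, b) for a in range(len(vectors))
             for b in range(a + 1, len(vectors))]
    return sum(pearson(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
```

A call such as `correlated_p_vectors(1000, 4, 200, 0.0005, random.Random(1))` yields four vectors that are permutations of one another; increasing `n_swaps` tends to lower the average correlation, but not to a prescribed value.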

Figure 1. Example workflow of generating correlated P-values and pairwise correlations.

Figure 1

In this example figure, Inline graphic is Inline graphic, the number of P-value vectors is Inline graphic, the number of pairwise permutations Inline graphic, and the perturbations Inline graphics are set to zero for clarity and simplicity. The resulting pairwise correlations by using Inline graphic in place of Inline graphic are displayed in a symmetric matrix form.

Statistical Accuracy Evaluation of the Combined P-value (Inline graphic)

If a method yields a unified P-value vector Inline graphic agreeing with Inline graphic, the scatter plot of Inline graphic versus Inline graphic should produce a straight line with slope one and intercept zero [31]. It is also important to mention that the smallest computed P-value is expected to be inversely proportional to the sample size, which for the current case is of the order of Inline graphic. An example of a logarithmic plot of Inline graphic versus Inline graphic generated from a single iteration of our simulation is shown in Figure 2. Using the textbook definition of P-value, the linear slope obtained from the logarithmic plot of Inline graphic versus Inline graphic should be approximately one for methods with accurate statistics. To quantify how well Inline graphic agrees with Inline graphic we use four measures: (1) the average weighted sum of squares error (Inline graphic), (2) the distance (Inline graphic) between Inline graphic and Inline graphic, (3) the expected rank EInline graphic, and (4) the expected error of Inline graphic. Figure 2 also illustrates what is being computed by the above four measures.

Figure 2. Log-log plot of the unified P-value vector Inline graphic versus the rank vector Inline graphic.

Figure 2

The curves in panels (A) and (B) were obtained from combining the P-values of four P-value vectors, each of size 10,000, using Stouffer's method. In panel (A), the red circles show the scatter plot of normalized rank versus computed P-value from a randomly picked iteration (realization) of very weak average correlation. Curves like the one displayed in panel (A) enable one to calculate the average sum of squares error using eq. (15) and the distance measure using eq. (16). Panel (B) shows 1000 curves, each obtained by performing the same task that leads to the curve in (A) but with a different average correlation strength. The lines that go significantly above the Inline graphic line are from cases with stronger average correlations. They yield unified P-values that are much exaggerated, perhaps because Stouffer's method does not account for correlations. Averaging the normalized rank Inline graphic along the blue line (Inline graphic) yields the value Inline graphic (see eq. (17)). Shifting the blue line to different Inline graphic values renders the entire Inline graphic versus Inline graphic curve. The red horizontal line illustrates the case when Inline graphic (or normalized rank Inline graphic). By averaging the Inline graphic values along this line, the Inline graphic value is obtained for Inline graphic by simply adding Inline graphic to the averaged value (see eq. (18)).

Average Weighted Sum of Squares Error

We define the average weighted sum of squares error as

graphic file with name pone.0091225.e346.jpg (15)

The weight factor (Inline graphic), Inline graphic, in the above equation was chosen so that each point in the transformed variable domain carries the same contribution to the Inline graphic. By construction, the P-values in the random vector Inline graphic are uniformly distributed between Inline graphic. However, once we make the logarithmic transformation, Inline graphic, we find the new variable Inline graphic to be exponentially distributed, i.e., Inline graphic. One may thus introduce Inline graphic, a weight factor making Inline graphic, to compensate for the non-uniformity in Inline graphic. This leads to Inline graphic, the weight factor used in eq. (15).

Angular Distance Between F and R

To compute the distance between Inline graphic and Inline graphic, we began by first computing the slope (Inline graphic) of the logarithmic plot of Inline graphic versus Inline graphic using a weighted least-squares regression, which aims to minimize the weighted sum of squares error (Inline graphic)

graphic file with name pone.0091225.e366.jpg

Taking the derivative of the above expression with respect to Inline graphic and Inline graphic and setting them equal to zero gives the following equations:

graphic file with name pone.0091225.e369.jpg

and

graphic file with name pone.0091225.e370.jpg

Solving the above two equations simultaneously for a and b gives

graphic file with name pone.0091225.e371.jpg

where Inline graphic and Inline graphic are the weighted average of Inline graphic and Inline graphic respectively and

graphic file with name pone.0091225.e376.jpg

From Inline graphic and Inline graphic, a normalized vector Inline graphic was computed using the points Inline graphic and Inline graphic along the regression line. Similarly, another normalized vector Inline graphic was obtained using the points Inline graphic and Inline graphic along the ideal line. Finally, the (angular) distance between the two unit vectors Inline graphic and Inline graphic was computed

graphic file with name pone.0091225.e387.jpg (16)

Methods with accurate statistics are expected to have Inline graphic and Inline graphic. Evidently, Inline graphic leads to Inline graphic (see eq. (16)). Because the angular distance Inline graphic does not depend on the intercept parameter Inline graphic, Inline graphic measures only the relative accuracy of the Inline graphic-value, not the absolute accuracy. For example, if Inline graphic, even when the positive constant Inline graphic is different from Inline graphic, Inline graphic is still zero.
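The insensitivity to the intercept can be seen in a short sketch. It assumes the angular distance is the angle between the unit direction vector of the fitted line and that of an ideal line of slope 1; the exact form of eq. (16) may differ, and `angular_distance` is a hypothetical name.

```python
import math

def angular_distance(slope):
    """Angle in radians between the unit direction of a line with the given
    slope and the unit direction of the ideal line of slope 1."""
    norm = math.hypot(1.0, slope)
    v_fit = (1.0 / norm, slope / norm)
    v_ideal = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
    cos_angle = v_fit[0] * v_ideal[0] + v_fit[1] * v_ideal[1]
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

# Only the slope enters: any two lines with the same slope but different
# intercepts give the same angular distance.
d_unit_slope = angular_distance(1.0)
d_steeper = angular_distance(2.0)
```

Because the direction vector is built from a difference of two points on the line, the intercept cancels, so a reported value that is off by a constant factor from the ideal one still gives a distance of (numerically almost) zero.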

Expected Rank E[Inline graphic]

For iteration Inline graphic, we denote by Inline graphic the largest normalized rank whose corresponding reported Inline graphic-value is less than or equal to a selected cutoff Inline graphic-value Inline graphic. The expected rank E[Inline graphic] is computed by averaging Inline graphic over all realizations and can be written as

graphic file with name pone.0091225.e408.jpg (17)

In the ideal case of absolute accuracy, Inline graphic. In practice this is rarely the case, which is why we use the expectation value of Inline graphic versus Inline graphic as the measure. For methods with accurate statistics, a plot of E[Inline graphic] versus Inline graphic should closely trace the line Inline graphic.
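A minimal sketch of this measure follows. It assumes that the normalized rank is the rank in ascending order divided by the list length, so that the largest qualifying rank equals the fraction of reported values at or below the cutoff; the function names are hypothetical.

```python
def normalized_rank_at_cutoff(p_values, cutoff):
    """Largest normalized rank whose p-value is <= cutoff (0.0 if none is).

    In ascending order, the largest rank with p <= cutoff equals the number
    of p-values that do not exceed the cutoff.
    """
    count = sum(1 for p in p_values if p <= cutoff)
    return count / len(p_values)

def expected_rank(iterations, cutoff):
    """Average of the per-iteration normalized rank over all realizations."""
    ranks = [normalized_rank_at_cutoff(p_list, cutoff) for p_list in iterations]
    return sum(ranks) / len(ranks)

# Two toy iterations of four reported p-values each.
toy_iterations = [
    [0.01, 0.20, 0.60, 0.90],  # one value <= 0.1, normalized rank 0.25
    [0.04, 0.05, 0.50, 0.70],  # two values <= 0.1, normalized rank 0.50
]
toy_expected_rank = expected_rank(toy_iterations, 0.1)
```

For accurately calibrated values, the average returned by `expected_rank` should stay close to the cutoff itself as the cutoff is varied.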

Expected Error of Inline graphic

The expected error of Inline graphic relative to Inline graphic (for Inline graphic) is defined as

graphic file with name pone.0091225.e419.jpg (18)

and the standard deviation

graphic file with name pone.0091225.e420.jpg (19)

For methods with accurate statistics, plotting Inline graphic versus Inline graphic should track the line Inline graphic well and have small standard deviations for various Inline graphic.

Results and Discussion

The four measures described in the Methods section are used to evaluate the accuracy of the unified Inline graphic-value computed. In Figures 3, 4, 5 and 6, we show the results of combining a list of Inline graphic Inline graphic-values. The layout of each of these figures is identical. For each method considered, our simulation includes a total of Inline graphic iterations. At each iteration, we generated Inline graphic lists, within which the Inline graphicth list is obtained by taking the Inline graphic entry of each of the 12 Inline graphic-value vectors, Inline graphic. By computing the pairwise correlation (see eq. (8)) among the Inline graphic-value vectors, one obtains the average pairwise correlation EInline graphic given by eq. (7). Each iteration, generating a Inline graphic-tuple of Inline graphic-value vectors, thus yields an average correlation Inline graphic.
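The per-iteration average correlation can be sketched as below. Eqs. (7) and (8) are not reproduced in this section, so the sketch assumes that the pairwise correlation is the Pearson coefficient and that the average is an unweighted mean over all distinct pairs of vectors; both choices and all names are assumptions.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def average_pairwise_correlation(vectors):
    """Unweighted mean of the correlation over all distinct pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(pearson(u, v) for u, v in pairs) / len(pairs)

# Three toy vectors: the first two are perfectly correlated and the third is
# perfectly anti-correlated with both, so the pair values are +1, -1 and -1.
toy_vectors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
toy_average = average_pairwise_correlation(toy_vectors)
```

With 12 vectors, as in the simulation described above, the mean runs over 66 distinct pairs.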

Figure 3. Methods that combine independent Inline graphic-values: Fisher, Stouffer and Bhoj.

The curves show the four measures used to evaluate the accuracy of the Inline graphic-value computed from combining the Inline graphic-values of 12 Inline graphic-value vectors. In panel C, note that the Fisher curve (red) is almost completely covered by the Bhoj curve (green). See text for more details.

Figure 4. Methods that combine weighted independent Inline graphic-values: Good, Lipták and Bhoj.

The curves show the four measures used to evaluate the accuracy of the Inline graphic-value computed from combining the Inline graphic-values of 12 Inline graphic-value vectors. See text for more details.

Figure 5. Methods that combine correlated Inline graphic-values: Hartung and Hou.

The curves show the four measures used to evaluate the accuracy of the Inline graphic-value computed from combining the Inline graphic-values of 12 Inline graphic-value vectors. See text for more details.

Figure 6. Methods that combine weighted correlated Inline graphic-values: Hartung and Hou.

The curves show the four measures used to evaluate the accuracy of the Inline graphic-value computed from combining the Inline graphic-values of 12 Inline graphic-value vectors. See text for more details.

For Figures 3, 4, 5 and 6, the data points in panels A and B display, respectively, the expected average sums of squared errors (EInline graphic) and the expected distances (EInline graphic) versus EInline graphic. More specifically, every data point plotted with Inline graphic-axis value Inline graphic represents an average over 25,000 iterations, each of which has its Inline graphic-tuple's average correlation Inline graphic falling in the range of Inline graphic. For panels C, D, E and F, each data point plotted is computed using all the Inline graphic iterations from our simulation. The curves in panel C show the expected number of events whose computed unified Inline graphic-value is less than or equal to a cutoff value Inline graphic. For methods with accurate statistics, by the definition of a Inline graphic-value, a plot of EInline graphic versus Inline graphic should follow the line Inline graphic. Panels D and E (and F for Figures 3 and 4) display the expected Inline graphic value together with its standard deviation as a function of Inline graphic. Similar plots for the combination of 4 and 8 Inline graphic-value vectors can be found in File S1.

Figure 3 displays the results for methods that assume that the Inline graphic-values to be combined are independent: Fisher's (eq. 3), Stouffer's (eq. 1) and Bhoj's (eq. 6) methods. These methods are expected to compute accurate combined Inline graphic-values for EInline graphic, corresponding to the first few data points of panels A and B. The data points in panels A and B show that as EInline graphic increases so do EInline graphic and EInline graphic, indicating the methods' inadequacy for handling correlation among Inline graphic-values. All three curves in panel C lie above the Inline graphic line, indicating that all three methods exaggerate significance when combining correlated Inline graphic-values. The curves in panels D, E and F show that the average value (red solid curve) of Inline graphic can deviate significantly from the Inline graphic axis with wild fluctuations (error bars shown in blue). Also, a comparison of the plots obtained from combining 4, 8, and 12 Inline graphic-value vectors indicates that the accuracy of the unified Inline graphic-value decreases as the number of Inline graphic-values combined increases from 4 to 12.
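The exaggeration under correlation can be illustrated with an extreme toy case: feeding Fisher's method several copies of the same value, i.e., perfectly correlated inputs. The sketch below uses the standard form of Fisher's method (the statistic -2 Σ ln p referred to a chi-square distribution with 2n degrees of freedom) and the closed-form chi-square tail for even degrees of freedom; it is an illustration with hypothetical function names, not the simulation used in this study.

```python
import math

def chi2_sf_even_dof(x, dof):
    """Survival function of a chi-square variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{j < dof/2} (x/2)**j / j!."""
    k = dof // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Fisher's combined p-value, valid when the inputs are independent."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_dof(statistic, 2 * len(p_values))

# Twelve copies of the same p-value carry no more evidence than one copy,
# yet the independence assumption treats them as twelve separate confirmations.
single = fisher_combined_p([0.05])
duplicated = fisher_combined_p([0.05] * 12)
```

Here `single` reproduces the input value, while `duplicated` comes out orders of magnitude smaller even though no new evidence was added.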

Figure 4 shows the results for methods that combine weighted independent Inline graphic-values: Good's (eq. 5), Lipták's (eq. 1) and Bhoj's (eq. 6) methods. These three methods may be viewed as extensions of the previous three methods with Inline graphic-value weighting enabled. Comparison of the panels of Figure 4 with those of Figure 3 shows a noticeable improvement in the accuracy of the combined Inline graphic-values. Although weighting the Inline graphic-values improves the accuracy, the computed Inline graphic-value still differs significantly from the expected value. The observed improvement suggests that weighting Inline graphic-values might weaken the effect of correlation by promoting one Inline graphic-value over the rest in the list of Inline graphic-values to be combined. Other studies [32], [33] have also recommended weighting Inline graphic-values to improve statistical power. Even though weighting Inline graphic-values is recommended, there is no consensus on how to determine the optimal weights [6], [24]-[26]. This is why in our simulation we have assigned random weights to the Inline graphic-values to be combined. In principle, the accuracy of the computed Inline graphic-value from the three methods above could be improved by using a different procedure to compute the weights. Such an investigation, although worth pursuing in its own right, is beyond the scope of the current study.

Figure 5 shows the results from using methods designed to combine correlated Inline graphic-values: Hartung's (eq. 9) and Hou's (eq. 14) methods. Compared with the curves of Figures 3 and 4, the curves in Figure 5 show a significant improvement in the accuracy of the combined Inline graphic-value computed. Judging from the curves of Figure 5, Hou's method seems to be the better performing one: it has a smaller expected error and standard deviation than Hartung's method. As shown in panel C of Fig. 6, Hou's EInline graphic vs Inline graphic curve also traces the line Inline graphic reasonably well, deviating from it only by a factor of about 4.0 for Inline graphic.

Finally, Figure 6 shows the evaluation results of methods that combine weighted correlated Inline graphic-values: Hartung's (eq. 9) and Hou's (eq. 14) methods. Comparing the curves of Figure 6 with those of Figure 5 shows, as before, that weighting Inline graphic-values tends to improve the accuracy of the computed Inline graphic-value. The curves also show that Hou's method gains more accuracy from weighting than Hartung's method does. As articulated earlier and supported by the observed results, the accuracy of the combined Inline graphic-value could possibly be further improved by a statistically and mathematically rigorous procedure for determining the optimal weights.

In brief, methods designed for combining independent Inline graphic-values tend to yield exaggerated Inline graphic-values when used to combine correlated Inline graphic-values. On the other hand, most methods designed to handle correlated Inline graphic-values tend to provide conservative estimates for the unified Inline graphic-values. The first case is easily understood, since one is effectively using nearly identical pieces of evidence to corroborate one another. For the latter case, however, we cannot provide an intuitive interpretation except that it might result from the heuristics those methods employ. Weighting Inline graphic-values seems to weaken the effect of correlation. This can be roughly understood as follows. When each of the Inline graphic Inline graphic-values is weighted, only the Inline graphic-values assigned the highest weights play a role. This increases the likelihood that the highest-weighted Inline graphic-values are nearly independent, thereby reducing the effect of correlations. Not only does weighting help the methods designed for combining independent Inline graphic-values, it also helps the ones for combining correlated Inline graphic-values, as most of these methods are heuristic-based and give more accurate results when the correlation is weaker. Based on these results, when the lists of the Inline graphic-value vectors are complete, it is best to calculate the corresponding pairwise correlations between any two Inline graphic-value vectors, introduce weights, and then assign the final unified statistical significance to each hypothesis.

In real applications, however, one is often faced with incomplete lists of Inline graphic-values. That is, one only has the Inline graphic-values for the highest ranking hypotheses, not for all hypotheses tested. This prevents one from computing the correlations needed for the formalism for combining correlated Inline graphic-values. In this case, i.e., when combining Inline graphic-values of unknown correlation, one should exercise caution. Absent the correlation information, a better option might be to use the smallest of the Inline graphic-values to be combined and then apply the Bonferroni correction by multiplying the smallest Inline graphic-value by Inline graphic, the number of Inline graphic-values to be combined. This guarantees a conservative statistic. However, under this approach, one might run into cases where the smallest Inline graphic-value considered is larger than Inline graphic, thereby obtaining a corrected Inline graphic-value that is larger than Inline graphic. Even if each of the Inline graphic-value lists is complete, there are still scenarios not covered in this paper. For example, it is possible that higher order correlations (such as three-body or four-body correlations) exist among the Inline graphic-value vectors. We did not consider these cases since we are not aware of any readily available methods designed to deal with such higher order correlations.
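The Bonferroni fallback described above amounts to a one-line computation, sketched here with a hypothetical function name. Capping the result at 1 is a common convention added for illustration; it is not something stated in the text, so the uncapped product is also made available.

```python
def bonferroni_min_p(p_values, cap_at_one=True):
    """Smallest p-value multiplied by the number of p-values combined.

    With cap_at_one=True (an added convention, not from the text) the result
    is truncated at 1.0; otherwise the raw product is returned and may exceed 1.
    """
    corrected = min(p_values) * len(p_values)
    return min(corrected, 1.0) if cap_at_one else corrected

# One strong p-value among three: the corrected value is three times the minimum.
strong_case = bonferroni_min_p([0.01, 0.20, 0.50])

# No small p-value: the raw product exceeds 1, the situation noted in the text.
weak_case_raw = bonferroni_min_p([0.40, 0.60, 0.90], cap_at_one=False)
weak_case_capped = bonferroni_min_p([0.40, 0.60, 0.90])
```

The correction needs no correlation information at all, which is what makes it usable when only the top-ranking values of each list are available.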

In conclusion, our study recommends that the unified Inline graphic-value obtained from combining Inline graphic-values of unknown correlation be used with caution to avoid drawing false conclusions. Results from our study agree with previous investigations [6], [8], [10], supporting the hypothesis that weighting Inline graphic-values has the potential to improve the accuracy of the combined Inline graphic-value. However, the important issues of choosing the weights to optimize a method's power and of estimating the correlation matrix elements among Inline graphic-values from small sample sizes remain challenging [34], [35]. Our results also show that when combining independent or weighted independent Inline graphic-values, Bhoj's method produces more accurate Inline graphic-values than the other methods tested. When the correlation information is available, Hou's method, which can accommodate Inline graphic-value weighting, seems to be the best performing among the methods investigated.

Supporting Information

File S1

This pdf file contains eight figures showing Inline graphic-value accuracy evaluation of methods considered in this manuscript when combining 4 and 8 Inline graphic-value vectors.

(PDF)

Acknowledgments

We thank the administrative group of the National Institutes of Health Biowulf Clusters, where all the computational tasks were carried out. We also thank the National Institutes of Health Fellows Editorial Board for editorial assistance.

Funding Statement

This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/Department of Health and Human Services. Funding for Open Access publication charges for this article was provided by the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Olkin I (1995) Statistical and theoretical considerations in meta-analysis. J Clin Epidemiol 48: 133–146.
  • 2. Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48–54.
  • 3. Alves G, Wu WW, Wang G, Shen RF, Yu YK (2008) Enhancing peptide identification confidence by combining search methods. J Proteome Res 7: 3102–3113.
  • 4. Rosenthal R (1978) Combining Results of Independent studies. Psychological Bulletin 85: 185–193.
  • 5. Loughin TM (2004) A systematic comparison of methods for combining p-values from independent tests. Computational Statistics & Data Analysis 47: 467–485.
  • 6. Whitlock MC (2005) Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. J Evol Biol 18: 1368–1373.
  • 7. Won S, Morris N, Lu Q, Elston RC (2009) Choosing an optimal method to combine P-values. Stat Med 28: 1537–1553.
  • 8. Chen Z (2011) Is the weighted z-test the best method for combining probabilities from independent tests? J Evol Biol 24: 926–930.
  • 9. Chen Z, Nadarajah S (2014) On the optimally weighted z-test for combining probabilities from independent studies. Computational Statistics & Data Analysis 70: 387–394.
  • 10. Zaykin DV (2011) Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J Evol Biol.
  • 11. Dudbridge F, Koeleman BP (2003) Rank truncated product of P-values, with application to genomewide association scans. Genet Epidemiol 25: 360–366.
  • 12. Demetrescu M, Hassler U, Tarcolea AI (2006) Combining significance of correlated statistics with application to panel data. Oxford Bulletin of Economics and Statistics 68: 647–663.
  • 13. Lipták P (1958) On the combination of independent tests. Magyar Tud Akad Nat Kutato int Kozl 3: 171–197.
  • 14. Good IJ (1955) On the weighted combination of significance tests. Journal of the Royal Statistical Society Series B (Methodological) 17: 264–265.
  • 15. Bhoj DS (1992) On the distribution of the weighted combination of independent probabilities. Statistics & Probability Letters 15: 37–40.
  • 16. Hartung J (1999) A note on combining dependent tests of significance. Biometrical Journal 41: 849–855.
  • 17. Hou CD (2005) A simple approximation for the distribution of the weighted combination of nonindependent or independent probabilities. Statistics & Probability Letters 73: 179–187.
  • 18. Brown MB (1975) A method for combining non-independent, one-sided tests of significance. Biometrics 31: 987–992.
  • 19. Vattathil S, Scheet P (2013) Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Res 23: 152–158.
  • 20. Stouffer S, Suchman E, DeVinney L, Star S, Williams RMJ (1949) The American Soldier, Vol. 1: Adjustment during Army Life. Princeton: Princeton University Press.
  • 21. Fisher RA (1932) Statistical Methods for Research Workers, vol. II. Edinburgh: Oliver and Boyd.
  • 22. Lancaster HD (1961) The combination of probabilities: an application of orthogonal functions. Austr J Statist 3: 20–33.
  • 23. Hedges L, Olkin I (1985) Statistical methods for meta-analysis. New York: Academic Press.
  • 24. Zelen M, Joel LS (1959) The weighted compounding of two independent significance tests. The Annals of Mathematical Statistics 30: 885–895.
  • 25. Pepe MS, Fleming TR (1989) Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data. Biometrics 45: 497–507.
  • 26. Loesgen S, Dempfle A, Golla A, Bickeboller H (2001) Weighting schemes in pooled linkage analysis. Genet Epidemiol 21 Suppl 1: S142–147.
  • 27. Alves G, Yu YK (2011) Combining independent, weighted p-values: Achieving computational stability by a systematic expansion with controllable accuracy. PLoS ONE 6: e22647.
  • 28. Delongchamp R, Lee T, Velasco C (2006) A method for computing the overall statistical significance of a treatment effect among a group of genes. BMC Bioinformatics 7 Suppl 2: S11.
  • 29. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biometrics Bulletin 2: 110–114.
  • 30. Kost JT, McDermott MP (2002) Combining dependent p-values. Statistics & Probability Letters 60: 183–190.
  • 31. Schweder T, Spjotvoll E (1982) Plots of p-values to evaluate many tests simultaneously. Biometrika 69: 493–502.
  • 32. Genovese CR, Roeder K, Wasserman L (2006) False discovery control with p-value weighting. Biometrika 93: 509–524.
  • 33. Hu JX, Zhao H, Zhou HH (2010) False Discovery Rate Control With Groups. J Am Stat Assoc 105: 1215–1227.
  • 34. Liechty JC, Liechty MW, Muller P (2004) Bayesian correlation estimation. Biometrika 91: 1–14.
  • 35. Peng J, Wang P, Zhou N, Zhu J (2009) Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association 104: 735–746.

Articles from PLoS ONE are provided here courtesy of PLOS
