Human Heredity. 2009 Jan 27;67(4):267–275. doi: 10.1159/000194979

Ranking Bias in Association Studies

Neal O Jeffries
PMCID: PMC2880722  PMID: 19172085

Abstract

Background

It is widely appreciated that genomewide association studies often yield overestimates of the association of a marker with disease when attention focuses upon the marker showing the strongest relationship. For example, in a case-control setting the largest (in absolute value) estimated odds ratio has been found to typically overstate the association as measured in a second, independent set of data. The most common reason given for this observation is that the choice of the most extreme test statistic is often conditional upon first observing a significant p value associated with the marker. A second, less appreciated reason is described here. Under common circumstances it is the multiple testing of many markers and subsequent focus upon those with most extreme test statistics (i.e. highly ranked results) that leads to bias in the estimated effect sizes.

Conclusions

This bias, termed ranking bias, is separate from that arising from conditioning on a significant p value and may often be a more important factor in generating bias. An analytic description of this bias, simulations demonstrating its extent, and identification of some factors leading to its exacerbation are presented.

Key Words: Estimation bias, Multiple comparisons

Introduction

As high dimensional assay technology has become more available for genomic investigations there has been a corresponding increase in methodology to limit false positive findings [1, 2]. Until recently, less attention has been given to the effect sizes of those comparisons that are determined to reach statistical significance. In many instances this may be appropriate as effect sizes may not be of great importance per se; however in other circumstances it may be important to accurately assess the magnitude of the associated effects.

As an example, there has been increased recognition that some form of replication is desirable to support a claim of association found as a single result among a large panel of genetic markers [3]. However, using estimates of effect size derived from an initial study of as many as a million markers can lead to an underpowered evaluation as the initial estimate is likely biased toward more extreme results than are true.

A common rationale given for this bias is that it results from conditioning on the initial estimate meeting p value criteria for declaring significance, often referred to as selection or truncation bias. The term ‘significance bias’ will be used here to stress the conditioning upon a significant p value. Garner [4] describes this in the context of genomewide association scans; Allison et al. [5] describe it in the context of quantitative trait loci of paired siblings. Conditioning on a significant result truncates the marker's sampling distribution and creates an associated bias. For example, given a normally distributed variable X with mean μ and standard deviation ν, one can show the expected value of X conditional upon X > c (where c may be chosen to generate a significant p value) is given by

$$E[X \mid X > c] = \mu + \nu^{2} \cdot \frac{\varphi(c;\, \mu, \nu^{2})}{\mathrm{Prob}[X > c]} \tag{1.1}$$

where φ(·) denotes the probability density function of a normally distributed variable with mean μ and standard deviation ν. The second term on the right hand side of (1.1) captures the bias from conditioning. This description of bias does not recognize the effect multiple testing may exert on creating bias beyond its role in requiring a p value threshold less than 0.05. In this paper, multiple testing combined with a focus on extreme results (i.e. highly ranked results) is shown to generate overestimation in situations where significance bias may have little effect. The terms ‘significance bias’ and ‘ranking bias’ are not in common use; the phrases ‘truncation bias’ and ‘selection bias’ are used more often but are insufficiently specific and fail to draw the distinction made in this paper.
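As a rough numerical check of (1.1), the sketch below evaluates the formula and compares it with a brute-force simulation. The function names, the seed, and the example values (μ = 0, ν = 1, c = 1) are illustrative assumptions and do not come from the paper.

```python
import random
from statistics import NormalDist


def truncated_mean(mu, nu, c):
    """E[X | X > c] for X ~ Normal(mu, nu^2), evaluated with equation (1.1)."""
    dist = NormalDist(mu, nu)
    density_at_c = dist.pdf(c)       # phi(c; mu, nu^2)
    tail_prob = 1.0 - dist.cdf(c)    # Prob[X > c]
    return mu + nu ** 2 * density_at_c / tail_prob


def simulated_truncated_mean(mu, nu, c, draws=200_000, seed=1):
    """Average of simulated draws that exceed c (hypothetical check, fixed seed)."""
    rng = random.Random(seed)
    kept = [x for x in (rng.gauss(mu, nu) for _ in range(draws)) if x > c]
    return sum(kept) / len(kept)


analytic = truncated_mean(0.0, 1.0, 1.0)
empirical = simulated_truncated_mean(0.0, 1.0, 1.0)
print(f"analytic={analytic:.4f}  simulated={empirical:.4f}")
```

The two numbers should agree up to simulation noise; the gap between either of them and μ is the truncation (significance) bias described in the text.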

Though a detailed description is given in the next section the idea of ranking bias is briefly described here. Given a number of test statistics, each one is composed of a trend term (related to true underlying effect size) and some random variation. When looking at the test statistic with the largest positive observed effect size it is likely (under many circumstances) that the random variation component is positive as well – because the large, positive observed effect is the sum of a trend and the random component. This random component captures the difference between the true and observed effect size and positive values correspond to a type of overestimation bias – called ranking bias here.

Allison et al. [5] briefly describe ranking bias but focus the work on significance bias. Göring et al. [6] discuss both significance bias and ranking bias, and draw an analogy between the ranking bias discussed here and the bias that arises in model selection procedures. Though they discuss ranking bias, their modeling and analytical results are derived solely from models of significance bias. Zöllner and Pritchard [7] use the term ‘Winner's Curse’ to describe significance bias in a single marker and present an analytical approach toward correcting this bias. Sun and Bull [8] explicitly describe significance bias and more implicitly describe ranking bias in the context of a mathematical model with a number of non-informative markers and 1 informative marker. In the present work we present more general models with a number of informative markers, and we explicitly explore the relationship between bias (both significance and ranking) and the pattern of the markers' effect sizes.

Description of Bias

The modeling proposed here is relatively simple both because it is easy to see how to apply this situation to others and because it allows for a transparent partitioning of a statistic into trend and random error terms that facilitates understanding ranking bias. Let N denote the common number of cases and controls drawn in a population-based study, G denote the number of biallelic markers under consideration, G1 the number of markers with different allele frequencies between cases and controls, and G0 the number with no true difference between cases and controls, i.e. G = G1 + G0. For simplicity's sake a recessive pattern is assumed to influence the likelihood of disease for the informative markers. For each of the G markers the log odds ratio will be computed from an associated 2 × 2 table (table 1).

Table 1. 2 × 2 table for each biallelic marker

Genotype                Case    Control    Total
2 minor alleles         a       b          a + b
0 or 1 minor alleles    c       d          c + d
Total                   N       N          2N

For the j-th marker let $\theta_j$ denote the log odds ratio and $\hat{\theta}_j$ its estimate, log(ad/bc). Then

$$Z_j = \frac{\hat{\theta}_j - \theta_j}{\widehat{se}_j} \sim N(0,1) \tag{2.1}$$

where

$$\widehat{se}_j = \sqrt{1/a + 1/c + 1/b + 1/d}$$

and ∼ N(0,1) indicates an approximate Gaussian distribution with mean 0 and variance 1. Given the observed log odds and associated test statistic

$$T_j = \frac{\hat{\theta}_j}{\widehat{se}_j} \tag{2.2}$$

one can use the relation in (2.1) to obtain p values for the hypothesis that $\theta_j = 0$ as well as to generate confidence intervals

$$\hat{\theta}_j \pm \Phi^{-1}_{1-\alpha/2} \cdot \widehat{se}_j$$

where 1 − α is the confidence associated with the interval and $\Phi^{-1}_{1-\alpha/2}$ is the associated percentile from a N(0,1) distribution.
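The quantities in (2.1) and (2.2) can be computed directly from the four cells of table 1. The following is a minimal sketch; the function name and the example counts are hypothetical, and the calculation assumes every cell count is non-zero.

```python
import math
from statistics import NormalDist


def wald_log_odds(a, b, c, d, alpha=0.05):
    """Log odds ratio, its standard error, Wald statistic T, CI, and p value.

    a, b, c, d follow the layout of table 1; all four must be > 0.
    """
    theta_hat = math.log((a * d) / (b * c))             # estimated log odds ratio
    se_hat = math.sqrt(1 / a + 1 / c + 1 / b + 1 / d)   # estimated standard error
    t_stat = theta_hat / se_hat                         # equation (2.2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # N(0,1) percentile
    ci = (theta_hat - z_crit * se_hat, theta_hat + z_crit * se_hat)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return theta_hat, se_hat, t_stat, ci, p_two_sided


# Hypothetical marker with N = 500 cases and 500 controls.
theta_hat, se_hat, t_stat, ci, p_value = wald_log_odds(a=140, b=85, c=360, d=415)
print(f"log OR={theta_hat:.3f}  se={se_hat:.3f}  T={t_stat:.2f}  p={p_value:.2g}")
print(f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval returned here is the naive one; it makes no allowance for the marker having been selected because its statistic was extreme, which is the issue the rest of the paper addresses.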

To investigate how bias arises it is useful to rewrite the relationship between (2.1) and (2.2). First, write

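$$\widehat{sd} = \sqrt{\frac{1}{\hat{p}_{2/\mathrm{Case}}} + \frac{1}{1-\hat{p}_{2/\mathrm{Case}}} + \frac{1}{\hat{p}_{2/\mathrm{Control}}} + \frac{1}{1-\hat{p}_{2/\mathrm{Control}}}} \tag{2.3}$$

where

$$\hat{p}_{2/\mathrm{Case}} = a/N \quad \text{and} \quad \hat{p}_{2/\mathrm{Control}} = b/N \tag{2.4}$$

denote the empirically observed probability of two minor alleles in the case and control populations, respectively. The term $\widehat{sd}$ is an estimate of σ, which is defined by equation (2.3) after replacing the estimated probabilities of having both minor alleles by the true unknown proportions of having both minor alleles. Then we can decompose the test statistic $T_j$ as

$$T_j = \frac{\sqrt{N}\,\hat{\theta}_j}{\widehat{sd}_j} = \frac{\sqrt{N}(\hat{\theta}_j - \theta_j)}{\widehat{sd}_j} + \frac{\sqrt{N}\,\theta_j}{\widehat{sd}_j} \tag{2.5}$$

$$= Z_j + \frac{\sqrt{N}\,\theta_j}{\widehat{sd}_j} \quad \text{where} \quad Z_j = \frac{\sqrt{N}(\hat{\theta}_j - \theta_j)}{\widehat{sd}_j}. \tag{2.6}$$

The decomposition shows the test statistic $T_j$ is composed of a trend term,

$$\frac{\sqrt{N}\,\theta_j}{\widehat{sd}_j}$$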

and a mean 0 random fluctuation term, Zj. Note that Zj and Tj differ in that Zj has an approximately normal distribution centered about 0 while for Tj this is only true when θj = 0. The Zj term captures the difference between estimated log odds and true log odds and its distribution is of primary interest in assessing overestimation bias.
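The decomposition can be verified numerically for a single marker. In this sketch the counts and the "true" log odds ratio are made-up inputs, and the function name is an assumption; the true value is of course only known in a simulation setting.

```python
import math


def decompose(a, b, n, theta_true):
    """Split T into a random part Z and a trend part for one marker.

    a and b are the counts of subjects with two minor alleles among the
    n cases and n controls, respectively (table 1 layout).
    """
    p_case, p_control = a / n, b / n
    sd_hat = math.sqrt(1 / p_case + 1 / (1 - p_case)
                       + 1 / p_control + 1 / (1 - p_control))
    # log(ad / bc) written in terms of the two observed proportions
    theta_hat = (math.log(p_case / (1 - p_case))
                 - math.log(p_control / (1 - p_control)))
    t_stat = math.sqrt(n) * theta_hat / sd_hat
    z = math.sqrt(n) * (theta_hat - theta_true) / sd_hat
    trend = math.sqrt(n) * theta_true / sd_hat
    return t_stat, z, trend, sd_hat


t_stat, z, trend, sd_hat = decompose(a=140, b=85, n=500, theta_true=math.log(1.5))
print(f"T={t_stat:.3f}  Z={z:.3f}  trend={trend:.3f}")
```

A positive Z means the observed log odds ratio exceeds the true one for that marker, which is the quantity tracked throughout the remainder of the paper.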

For each j, Zj has approximately an unconditional N(0,1) distribution. A critical question concerns the distribution of Zj when j corresponds to the most extreme observed Tj statistic. We will focus upon the case of the largest Tj; the case of the smallest, i.e. most negative, statistic is similar, and generalizations involving it and other extreme statistics are discussed below.

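Let $r_1, r_2, \ldots, r_G$ order the T test statistics where

$$T_j = \frac{\sqrt{N}\,\hat{\theta}_j}{\widehat{sd}_j} \quad \text{and} \quad T_{r_1} \le T_{r_2} \le \cdots \le T_{r_G}.$$

Typically, the distribution of

$$Z_{r_G} = \frac{\sqrt{N}(\hat{\theta}_{r_G} - \theta_{r_G})}{\widehat{sd}_{r_G}} \tag{2.7}$$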

is not that of a normal distribution with mean 0; this will be demonstrated with simulations below. The reason for the bias is that if a number of markers have similar sized trend components

$$\frac{\sqrt{N}\,\theta_j}{\widehat{sd}_j}$$

then the random components $Z_j$ are likely to play a larger role in determining which marker has the maximal observed value, $T_{r_G}$. Under these conditions of multiple markers with similar trend values the contribution of $Z_j$ for the marker with the largest $T_j$ value is likely to be positive. From the definition of $Z_{r_G}$ in (2.7) we see a positive value corresponds to overestimation of the underlying log odds ratio. The expected value of the difference between $\hat{\theta}_{r_G}$ and $\theta_{r_G}$ is what is meant by ranking bias, i.e.

$$\text{Ranking Bias} = E[\hat{\theta}_{r_G} - \theta_{r_G}].$$

As discussed further below, it is important to note that rG, the index of the marker with greatest observed effect size, is random and can change in repeated sampling from the relevant populations.

Simulation Studies Demonstrating Bias

To illustrate the ranking bias a simple set of simulations is presented. In this case-control study there are G = 500 markers and there are N = 500 cases and 500 controls. In these simulation studies the markers are generated as independent variables. Populations of cases and controls were generated with 480 of the markers having no differences in allele frequencies. For the remaining 20 markers the odds ratios for diseased relative to healthy were 1.05, 1.10, …, 1.95, and 2.0. The probability of having 2 minor alleles in the control group was randomly chosen for each marker to lie between 0.05 and 0.25 (i.e. the minor allele frequencies varied between 0.224 and 0.5).

Given a simulated sample of cases and controls the following steps were performed:

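For each of G = 500 markers find

$$T_j = \frac{\sqrt{N}\,\hat{\theta}_j}{\widehat{sd}_j}, \quad j \in \{1, \ldots, G\}.$$

Because the true log odds ratios ($\theta_j$) are known, one can calculate a realization of

$$Z_{r_G} = \frac{\sqrt{N}(\hat{\theta}_{r_G} - \theta_{r_G})}{\widehat{sd}_{r_G}}.$$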

After making many such samples (1000 simulations) the resulting distribution of ZrG is shown along with the standard normal distribution in figure 1. Of note are the facts that the ZrG distribution is not centered about 0 (indicating θˆrG typically overestimates θrG) and that the distribution appears bimodal. Also, the distribution is relatively narrow compared to the standard normal density – this is perhaps surprising given asymptotic efficiency properties of normal approximations. However, the standard normal distribution is not relevant here as it neglects to take into account that rG, the index of the maximal T statistic, is random and changes in different samples drawn from the underlying populations. For example, in the 1000 simulations approximately 34% of the time rG corresponded to that marker with the odds ratio of 2.0, 32% of the time rG was associated with the 1.95 odds ratio, and 8% of the time with that associated with 1.90, and so forth. Because different markers are chosen as the simulated datasets change a mixture distribution arises as suggested by the non-unimodal character of the distribution.
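The behaviour described above can be reproduced in outline with a much cheaper stand-in for the full genotype simulation. The sketch below is not the paper's procedure: it assumes the normal approximation $T_j = Z_j + \sqrt{N}\theta_j/\sigma_j$ with independent standard normal $Z_j$, replaces $\widehat{sd}_j$ by $\sigma_j$, and uses an arbitrary seed and number of replicates, so its output will not match the reported figures exactly.

```python
import math
import random


def simulate_ranking_bias(n=500, n_markers=500, n_sims=400, seed=7):
    """Average Z and log-odds bias of the top-ranked marker (normal approximation)."""
    rng = random.Random(seed)
    # 480 null markers plus 20 informative odds ratios 1.05, 1.10, ..., 2.0
    odds_ratios = [1.0] * (n_markers - 20) + [1.05 + 0.05 * i for i in range(20)]
    sigmas, trends = [], []
    for odds_ratio in odds_ratios:
        p_control = rng.uniform(0.05, 0.25)     # P(2 minor alleles | control)
        odds_case = odds_ratio * p_control / (1 - p_control)
        p_case = odds_case / (1 + odds_case)
        sigma = math.sqrt(1 / p_case + 1 / (1 - p_case)
                          + 1 / p_control + 1 / (1 - p_control))
        sigmas.append(sigma)
        trends.append(math.sqrt(n) * math.log(odds_ratio) / sigma)

    z_top_total, bias_total = 0.0, 0.0
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
        t_stats = [zj + trend for zj, trend in zip(z, trends)]
        r_g = max(range(n_markers), key=t_stats.__getitem__)   # index of largest T
        z_top_total += z[r_g]
        # theta_hat - theta = Z * sigma / sqrt(N) under the approximation
        bias_total += z[r_g] * sigmas[r_g] / math.sqrt(n)
    return z_top_total / n_sims, bias_total / n_sims


mean_z, mean_bias = simulate_ranking_bias()
print(f"mean Z of top-ranked marker: {mean_z:.3f}")
print(f"mean ranking bias (log odds scale): {mean_bias:.3f}")
```

Every individual $Z_j$ is drawn with mean 0, yet the Z belonging to whichever marker ends up ranked first has a clearly positive average; that selection effect is the ranking bias.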

Fig. 1. Distribution of $Z_{r_G} = \sqrt{N}(\hat{\theta}_{r_G} - \theta_{r_G})/\widehat{sd}_{r_G}$ and a standard normal distribution; maximum odds ratio of 2.0.

The observed and true log-odds values associated with the largest test statistic (i.e. $\hat{\theta}_{r_G}$ and $\theta_{r_G}$) for each of the 1000 simulations are shown in figure 2. The discrete vertical bands correspond to the 12 distinct values of $\theta_{r_G}$ that were obtained: {0, log(1.45), log(1.50), …, log(2.0) = 0.69}. The degree to which the points lie above the diagonal 45 degree line indicates the amount of bias in each simulation (fig. 2).

Fig. 2. Comparison of $\theta_{r_G}$ (X-axis) and $\hat{\theta}_{r_G}$ (Y-axis) in 1000 simulations.

Ranking Bias vs. Significance Bias

Thus far there has been no reliance upon a p value threshold as a means for creating the observed bias. Truncation bias related to conditioning on a significant p value (i.e. significance bias) is a common rationale given for overestimation bias in marker studies and is an independent contributing factor. The bias demonstrated thus far is not conditioned on the largest T statistic first exceeding some threshold, e.g. one that corresponds to a p value = 0.025/G. Hence it cannot be significance bias. To clarify the difference significance bias is now further discussed.

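Recent work involving significance bias focuses upon the effect in a single marker [4, 7]. In our setting we illustrate the idea using the marker with largest true effect size. Let $r_G^0$ designate this marker, i.e.

$$\frac{\theta_{r_G^0}}{\sigma_{r_G^0}} > \frac{\theta_j}{\sigma_j} \quad \text{for all } j \ne r_G^0,\ j \in \{1, \ldots, G\}.$$

Because $r_G^0$ is fixed (though unknown) it is the case that for large sample size N

$$Z_{r_G^0} = \frac{\sqrt{N}(\hat{\theta}_{r_G^0} - \theta_{r_G^0})}{\widehat{sd}_{r_G^0}} \sim \mathrm{Normal}(0,1) \tag{4.1}$$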

and hence $E[\hat{\theta}_{r_G^0}] = \theta_{r_G^0}$. However, a bias is introduced if one conditions on $T_{r_G^0} = \sqrt{N}\,\hat{\theta}_{r_G^0}/\widehat{sd}_{r_G^0}$ exceeding a threshold value so that the associated p value is sufficiently small. For instance, if $T_{r_G^0} > 3.90$ then the associated p value will be less than 0.025/500 and in our simulations the marker would be judged as showing significant association using a Bonferroni criterion. However, using standard derivations concerning conditional expectations of normally distributed random variables one can show that

$$E[Z_{r_G^0} \mid T_{r_G^0} > 3.90] \approx E\left[Z_{r_G^0} \mid Z_{r_G^0} > 3.90 - \sqrt{N}\,\theta_{r_G^0}/\sigma_{r_G^0}\right] \tag{4.2}$$

$$= \frac{\varphi(3.90 - \sqrt{N}\,\theta_{r_G^0}/\sigma_{r_G^0})}{1 - \Phi(3.90 - \sqrt{N}\,\theta_{r_G^0}/\sigma_{r_G^0})} \tag{4.3}$$

where φ and Φ denote the density and cumulative distribution function of a standard normal distribution. Consequently it follows from (4.1) that

$$E[\hat{\theta}_{r_G^0} \mid T_{r_G^0} > 3.90] = \theta_{r_G^0} + \text{significance bias} \tag{4.4}$$

$$\text{significance bias} \approx \frac{\sigma_{r_G^0}}{\sqrt{N}} \cdot \left( \frac{\varphi(3.90 - \sqrt{N}\,\theta_{r_G^0}/\sigma_{r_G^0})}{1 - \Phi(3.90 - \sqrt{N}\,\theta_{r_G^0}/\sigma_{r_G^0})} \right). \tag{4.5}$$

The approximation arises because $\widehat{sd}_{r_G^0}$ has been replaced by $\sigma_{r_G^0}$ and the asymptotic normal distribution was used. From the result in (4.5) one can show how changes in sample size, odds ratio, allele frequency (affecting the σ term), and p value threshold change the significance bias. Such an assessment was performed by Garner [4].
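Expression (4.5) is straightforward to evaluate. The sketch below does so for a single marker; the function names are illustrative assumptions, and the inputs (odds ratio 2, control-group probability 0.1685, threshold 3.90) are the Scenario 1 values quoted later in the text.

```python
import math
from statistics import NormalDist


def sigma_from_design(p_control, odds_ratio):
    """Population sigma from (2.3) using true proportions instead of estimates."""
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    return math.sqrt(1 / p_case + 1 / (1 - p_case)
                     + 1 / p_control + 1 / (1 - p_control))


def significance_bias(n, odds_ratio, p_control, threshold=3.90):
    """Approximate significance bias on the log odds ratio scale, equation (4.5).

    Note: for thresholds far above the trend term the tail probability in the
    denominator can underflow; this sketch does not guard against that.
    """
    std_normal = NormalDist()
    theta = math.log(odds_ratio)
    sigma = sigma_from_design(p_control, odds_ratio)
    shifted = threshold - math.sqrt(n) * theta / sigma
    ratio = std_normal.pdf(shifted) / (1 - std_normal.cdf(shifted))
    return sigma / math.sqrt(n) * ratio


print(f"sigma     = {sigma_from_design(0.1685, 2.0):.3f}")
print(f"N = 500   : {significance_bias(500, 2.0, 0.1685):.3f}")
print(f"N = 1000  : {significance_bias(1000, 2.0, 0.1685):.3f}")
```

With these inputs the computed σ is close to the 3.465 quoted in the text, and the bias for N = 500 comes out near 0.07, in line with the analytic value reported in table 2. Doubling the sample size shrinks the bias to almost nothing because the threshold is then exceeded with very high probability.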

However, in most circumstances $r_G^0$ is not known, i.e. in a study with a large number of markers and little a priori knowledge of which markers have the strongest effects only $r_G$ is observed, and the preceding development based upon $r_G^0$ is not really applicable in practice. What is of practical interest is $E[\hat{\theta}_{r_G} \mid T_{r_G} > 3.90]$. However, analytical evaluation of this expression is more complicated than assessing bias for $r_G^0$ because $r_G$ is random. Consequently, simulations may be used to explore the following types of overestimation bias: significance bias related to $\hat{\theta}_{r_G^0}$, $E[\hat{\theta}_{r_G^0} \mid T_{r_G^0} > 3.90] - \theta_{r_G^0}$; ranking bias related to $\hat{\theta}_{r_G}$, $E[\hat{\theta}_{r_G} - \theta_{r_G}]$; and the combination of ranking bias and significance bias related to $\hat{\theta}_{r_G}$, $E[\hat{\theta}_{r_G} - \theta_{r_G} \mid T_{r_G} > 3.90]$.

Sets of simulations were performed to examine the relative size of biases across varying conditions. Each design has 500 markers with a smaller number of informative markers (G1) ranging between 20 and 1; the results are shown in table 2. Scenario 1 corresponds to the simulations generating figure 1: the maximum true odds ratio is 2 (log odds ratio of 0.69) and the maximum effect size is given by $\theta_{r_G^0}/\sigma_{r_G^0}$ = 0.69/3.465, where 3.465 is the associated σ value derived from the underlying allele probabilities (the probability of 2 minor alleles is 0.1685 in the control group) and odds ratio. An estimate of σ is given by $\widehat{sd}$ in equation (2.3). From equation (4.5) an analytical estimate of the significance bias applied to this particular marker is 0.072, as shown in the first column of table 2 (all biases in this section are reported on a log odds ratio scale). In the 1000 simulations the observed bias for this marker arising from conditioning on its T statistic exceeding 3.90 was 0.075, in good agreement with the analytic result. The simulations' estimate of ranking bias related to $\hat{\theta}_{r_G}$ is about 0.191 in this case, approximately 2.5 times larger than the significance bias of $\hat{\theta}_{r_G^0}$. As an aside it is worth noting that other markers with smaller true effect sizes will have larger significance bias, but smaller likelihood of reaching the threshold. Finally we can get an estimate of the combined effects of ranking bias and significance bias for $\hat{\theta}_{r_G}$ by examining just those simulations in which $T_{r_G} > 3.90$. In these 1000 simulations $T_{r_G} > 3.90$ on 99% of the occasions, so the additional significance bias induced by conditioning on this common event is essentially negligible as it raises the bias only from 0.191 to 0.192 (table 2).

Table 2. Ranking and significance bias on a log odds ratio scale

Column (A): significance bias of $\hat{\theta}_{r_G^0}$, $E[\hat{\theta}_{r_G^0} \mid T_{r_G^0} > 3.90] - \theta_{r_G^0}$
Column (B): ranking bias of $\hat{\theta}_{r_G}$, $E[\hat{\theta}_{r_G} - \theta_{r_G}]$
Column (C): combined significance and ranking bias of $\hat{\theta}_{r_G}$, $E[\hat{\theta}_{r_G} - \theta_{r_G} \mid T_{r_G} > 3.90]$

Scenario                                                         (A)      (B)      (C)
Scenario 1: n = 500, G1 = 20 informative ORs {1.05, …, 2.0}      0.072    0.191    0.192
Scenario 2: n = 500, G1 = 40 {1.025, …, 2.0}                     0.072    0.230    0.230
Scenario 3: n = 500, G1 = 5 {1.20, …, 2.0}                       0.072    0.113    0.129
Scenario 4: n = 500, G1 = 1 {2.0}                                0.072    0.069    0.085
Scenario 5: n = 500, G1 = 1 {4.0}                                0.000    0.000    0.000
Scenario 6: n = 1000, G1 = 20 {1.05, …, 2.0}                     0.002    0.112    0.112
Scenario 7: n = 500, G1 = 10 {1.04, …, 1.40}                     0.330    0.448    0.471
Scenario 8: n = 500, G1 = 10 {1.30, …, 1.30}                     0.395    0.476    0.543

Next we investigate how results change when the number of informative markers changes from the design in Scenario 1. Scenario 2 has a similar structure but has 40 informative markers taking values 1.025, 1.05, …, 1.975, and 2.0. Because the sample size, standard deviations, and maximum odds ratio of 2 stay the same, the significance bias associated with $\hat{\theta}_{r_G^0}$ remains the same at 0.072. However, the changes in informative effect sizes induce changes in the ranking bias and combined ranking and significance bias associated with $\hat{\theta}_{r_G}$. These figures have increased to 0.230 for both bias measures. Conversely, when the number of informative markers is reduced to only 5 as in Scenario 3, taking values 1.20, …, 1.80, 2.0, the two bias measures associated with $\hat{\theta}_{r_G}$ are reduced to 0.113 and 0.129, though the significance bias associated with $\hat{\theta}_{r_G^0}$ remains unchanged. These three scenarios show how bias in $\hat{\theta}_{r_G}$ depends upon the distribution of all markers, not just the conditions associated with the single marker having the largest true effect size.

In Scenario 4 the more extreme case of a single informative marker with an odds ratio of 2 is considered. Here the ranking bias of θˆrG=0.069 is considerably less because there is a relatively low probability that any other marker will be associated with rG – in 92.7% of the simulations the marker with an odds ratio of 2 generated the largest T statistic. Therefore, the significance bias of θˆrG0 and θˆrG are quite comparable as shown in the first and third column. In Scenario 5 the effect size of the single informative marker is increased so that this marker produces the smallest p value in all the simulations so there is essentially no ranking bias. Further, the sample size is sufficiently large so that the associated T statistic will exceed the 3.90 cutoff with an extremely high probability so there is no detected significance bias. Scenario 6 changes Scenario 1 by increasing the sample size. As in Scenario 5 this essentially eliminates the significance bias associated with the marker having an odds ratio of 2. However, there is still considerable bias associated with θˆrG indicating ranking bias is a more important factor than significance bias in this case.

In Scenario 7 more modest effect sizes are considered. Here there are 10 informative ORs between 1.04 and 1.40. In this case 10,000 simulations were considered as there was low power (7.1%) for any of the 10 markers with informative ORs to exceed the Bonferroni threshold. Because the power is low, one might expect higher significance bias and this is in fact the case. The marker with OR = 1.4 had significance bias of 0.330, indicating the observed log-odds ratio was typically twice as high as the true effect (log(1.4) = 0.336) when the criterion was met. However, in this case the ranking bias was extensive as well, with values of 0.448 (unconditional) and 0.471 (conditional upon a significant statistic). Ranking bias remains high because those conditions that create large significance bias (e.g. a large Bonferroni threshold relative to true effect size) also tend to create large ranking bias (many different markers could potentially be highest ranked, with large bias for any one marker being highest ranked). In general it is difficult to find conditions in which significance bias will appreciably exceed the ranking bias, either unconditional or conditional upon the highest ranked test statistic exceeding a threshold. Indeed, as it is usually the case that $\hat{\theta}_{r_G} \ge \hat{\theta}_{r_G^0}$ and $\theta_{r_G} \le \theta_{r_G^0}$, one would not expect significance bias, $E[\hat{\theta}_{r_G^0} - \theta_{r_G^0} \mid T_{r_G^0} > 3.90]$, to exceed the conditional ranking bias, $E[\hat{\theta}_{r_G} - \theta_{r_G} \mid T_{r_G} > 3.90]$.

Finally, Scenario 8 shows the case when all the informative markers have the same effect size. As described in the Discussion and Appendix sections one should expect high ranking bias and this is in fact the case.

Among the conclusions to be drawn from these comparisons of different types of bias are (1) significance bias associated with a single, fixed marker may not be particularly relevant; (2) ranking bias is distinct from significance bias; (3) ranking bias is likely the greater of the two sources when many variables are considered. Finally, as shown by comparing the first 4 scenarios in table 2 that have many common design elements, the effect sizes of all markers play a role in determining ranking bias – efforts to understand bias associated with highly ranked markers must take this into account.

Discussion

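Appendix A shows that ranking bias (scaled by the marker's standard deviation and sample size) for independent markers is given approximately by

$$E\left[\frac{\sqrt{N}(\hat{\theta}_{r_G} - \theta_{r_G})}{\widehat{sd}_{r_G}}\right] = E[Z_{r_G}] \approx \frac{1}{\sqrt{2\pi}} \sum_{j=1}^{G} E\left[e^{-M_{-j}^{2}/2}\right] \tag{5.1}$$

where

$$M_{-j} = \max_{k \ne j} T_k - \frac{\sqrt{N}\,\theta_j}{\sigma_j} = \max_{k \ne j}\left(Z_k + \sqrt{N}\left(\frac{\theta_k}{\sigma_k} - \frac{\theta_j}{\sigma_j}\right)\right) \tag{5.2}$$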

for j in 1, …, G. In words, $M_{-j}$ represents the largest of all $T_k$ (besides $T_j$) minus the trend term corresponding to the j-th marker. As discussed further in the appendix, some understanding of the bias can be deduced from relation (5.1). First, and perhaps most importantly, it is clear the bias depends in principle on the effect sizes, $\theta_j/\sigma_j$, for all G of the markers. In practice, perhaps a few or even just one marker is relevant, but in principle one needs to take into account the true effect sizes of all.

As the sample size increases, each of the $M_{-j}$ increases toward infinity in absolute value (with probability 1 under common conditions), so that the bias tends to zero, as we expect, assuming there are some informative markers and one is more informative than the others. On the other hand, bias is considerably worse when the effect sizes, $\theta_j/\sigma_j$, are the same for all j = 1, …, G, as would be the case if there were no informative markers. If the markers are independent then in this case

$$E[Z_{r_G}] \approx \frac{G}{\sqrt{2\pi}}\, E\left[e^{-\frac{1}{2}\left(\max\{X_1, \ldots, X_{G-1}\}\right)^{2}}\right]$$
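The equal-effect-size expression can be checked by simulation. The sketch below reads $X_1, \ldots, X_{G-1}$ as independent standard normal variables, which is an interpretation suggested by (5.2) rather than something stated explicitly; G = 50, the seed, and the number of replicates are arbitrary choices made for speed.

```python
import math
import random


def check_equal_effects(g=50, reps=5000, seed=11):
    """Compare E[max of G normals] with (G / sqrt(2*pi)) * E[exp(-M^2 / 2)].

    M is the maximum of G - 1 independent standard normal draws.
    """
    rng = random.Random(seed)
    direct_total, formula_total = 0.0, 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(g)]
        # With identical trend terms, Z of the top-ranked marker is just max(z).
        direct_total += max(z)
        max_of_others = max(z[1:])     # maximum over G - 1 of the draws
        formula_total += (g / math.sqrt(2 * math.pi)
                          * math.exp(-0.5 * max_of_others ** 2))
    return direct_total / reps, formula_total / reps


direct, formula = check_equal_effects()
print(f"direct simulation of E[Z_rG]: {direct:.3f}")
print(f"right-hand side of formula  : {formula:.3f}")
```

The two estimates should agree up to Monte Carlo error. Both grow with G, which is one way to see why a panel of markers with indistinguishable effect sizes is the least favourable case for ranking bias.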

An important related question that underlies equation (5.1) is: given repeated samples of size N from the underlying populations, how many different markers could reasonably be selected as $r_G$? If one marker has an effect size that is much larger than all the rest then in repeated samples we would expect $r_G$ to almost always correspond to that marker and no appreciable bias should result; empirically this is seen in Scenario 5 of table 2. Analytically this is demonstrated in the appendix when γ → ∞. If, on the other hand, there are a number of markers that have non-trivial probabilities of producing the largest test statistic in a given sample then in these cases the corresponding $Z_j$ terms play a larger role in determining which marker is chosen and we would expect a correspondingly greater degree of bias given the relationship between bias and $Z_j$. One way of thinking about these factors is to see that the distribution of $Z_{r_G}$ is a mixture of the conditional distributions of $Z_j$ given $r_G = j$, with the weights of the mixture given by the probability that $r_G = j$. These conditional distributions are typically not centered about 0; in fact they are subject to a kind of truncation bias similar in form to that arising from conditioning on a significant p value, but instead arising from conditioning on $r_G = j$. When there are a number of markers with non-negligible Prob[$r_G = j$] this mixture leads to non-negligible bias for $Z_{r_G}$. Appendix A provides technical details.

The results thus far presented only focus upon the marker with the largest test statistic. A natural question is how much bias may be present if attention is paid to all markers that exceed a significance threshold. This corresponds to the common strategy of following up on all markers that meet a stringent significance threshold. For each simulation under Scenario 1 in table 2 the number of markers with T scores > 3.90 was recorded and the difference between the observed $\hat{\theta}$ and the true underlying θ was recorded. Table 3 provides information about all highly ranked markers that exceed the threshold. In 99% of the simulations at least one marker met the threshold requirement. The bias associated with the highest ranking marker exceeding the threshold is 0.192 when measured on a log odds scale; this is essentially the same information as reported in the first row of table 2. Again it is worth noting that this is not reporting the results for the marker with a true odds ratio of 2.0, but rather the average bias for the marker with the highest observed T statistic: in 34% of the simulations this did correspond to the marker with odds ratio of 2.0, in 32% of occasions the marker with odds ratio of 1.95, etc. The other rows of table 3 provide new information: in 92.5% of the simulations the second highest ranked marker met the Bonferroni criteria and the bias for these instances was estimated as 0.143 log-odds units. Table 3 gives information for the 3rd, 4th, 5th, 6th, and 7th most highly ranked markers (no more than 7 markers ever met the criteria) and we see the bias conditional upon exceeding the threshold remains considerable. This table indicates that the ranking bias is not just a problem for the most highly ranked marker; it can be present and substantial for all those markers meeting significance criteria.

Table 3. Bias of markers exceeding threshold under scenario 1

Marker rank                                  Prob[$T_{r_j} > 3.90$]    $E[\hat{\theta}_{r_j} - \theta_{r_j} \mid T_{r_j} > 3.90]$
1st highest ranked: $r_j = r_{500}$          0.990                     0.192
2nd highest ranked: $r_j = r_{499}$          0.925                     0.143
3rd highest ranked: $r_j = r_{498}$          0.741                     0.124
4th highest ranked: $r_j = r_{497}$          0.438                     0.128
5th highest ranked: $r_j = r_{496}$          0.193                     0.132
6th highest ranked: $r_j = r_{495}$          0.054                     0.131
7th highest ranked: $r_j = r_{494}$          0.007                     0.129

Conclusions

In this report the focus has been on measuring bias associated with the largest effect size. Ranking bias is potentially present in any study with multiple tests where attention is drawn to outcomes associated with the most extreme observed effect sizes. Here we have seen that a focus upon highly ranked results, rather than conditioning on significant p values, can be responsible for much of the overestimation bias. Some authors have put forth methods to correct for both ranking and significance bias. The bootstrap/cross-validation approach put forth by Sun and Bull [8] can accommodate both types of bias. A similar bootstrap approach by Jeffries [9, 10] has been employed to correct for ranking bias in microarray and diagnostic modeling contexts. Analytic approaches have been put forth by Zöllner and Pritchard [7], Ghosh et al. [11], and Zhong and Prentice [12]; however, these appear to address significance bias but not ranking bias. The more extensive dependence of ranking bias on potentially all effect sizes (as opposed to significance bias, which depends only upon one marker's effect size) complicates analytical solutions. It is likely that analytic approaches to biased overestimation from genomic studies that ignore the need to consider all markers' effect sizes may be missing an important aspect of the problem, i.e. the distribution of the most extreme test statistic is a mixture distribution depending, in principle, upon all markers' parameters.

The modeling in this paper was very simple: a single stage study (i.e. no two-stage or higher stage designs) with a recessive genetic model and Wald test approach to evaluating significance. The simple approach allowed for a simple analytic description of the problem – more complicated designs are likely prone to the same types of biases and work remains for examining the role of ranking bias in these circumstances.

Appendix A

Calculation of the Bias

Here an effort is made to sketch the degree of bias that may be expected and link this magnitude to some factors such as sample size, distribution of true effect sizes, and the number of markers. Simplifying assumptions will be employed as necessary. Recall that the j-th test statistic Tj may be decomposed into a trend and random fluctuation component as

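$$T_j = Z_j + \frac{\sqrt{N}\,\theta_j}{\widehat{sd}_j}$$

where $Z_j = \sqrt{N}(\hat{\theta}_j - \theta_j)/\widehat{sd}_j$ and $Z_j$ is approximated by a standard normal distribution. Then we may calculate the expected value of $Z_{r_G}$ (and hence obtain a measure of the bias of $\hat{\theta}_{r_G} - \theta_{r_G}$) as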

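$$E[Z_{r_G}] = \sum_{j=1}^{G} E[Z_{r_G} \mid r_G = j]\, P[r_G = j] \tag{A.1}$$

and

$$E[Z_{r_G} \mid r_G = j]\, P[r_G = j] = \left( \int z_j\, f_{Z_j \mid r_G = j}(z_j)\, dz_j \right) P[r_G = j] \tag{A.2}$$

$$= \frac{\int z_j\, f_{Z_j}(z_j, r_G = j)\, dz_j}{P[r_G = j]}\, P[r_G = j] \tag{A.3}$$

$$= \int z_j\, f_{Z_j}(z_j, r_G = j)\, dz_j \tag{A.4}$$

where $f_{Z_j \mid r_G = j}$ is the conditional distribution of $Z_j$ given $r_G = j$ and $f_{Z_j}(z_j, r_G = j)$ describes the joint distribution of $Z_j$ and the event $r_G = j$.

$$\text{Now } r_G = j \text{ if and only if } T_j > \max_{k \ne j} T_k \tag{A.5}$$

$$\text{if and only if } Z_j > \max_{k \ne j}\left(Z_k + \sqrt{N}\left(\frac{\theta_k}{\widehat{sd}_k} - \frac{\theta_j}{\widehat{sd}_j}\right)\right). \tag{A.6}$$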

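Here we have used the decomposition in (2.5) to show how the bias depends on the random fluctuation terms, $Z_j$ and $Z_k$. To simplify we will approximate the $\widehat{sd}_j$ and $\widehat{sd}_k$ terms by $\sigma_j$ and $\sigma_k$ (the associated true population values). Then we obtain

$$E[Z_{r_G}] = \sum_{j=1}^{G} \int z_j\, f_{Z_j}(z_j, r_G = j)\, dz_j \tag{A.7}$$

$$\approx \sum_{j=1}^{G} E\left[Z_j \cdot I[Z_j > M_{-j}]\right] \tag{A.8}$$

where

$$M_{-j} = \max_{k \ne j}\left(Z_k + \sqrt{N}\left(\frac{\theta_k}{\sigma_k} - \frac{\theta_j}{\sigma_j}\right)\right) \tag{A.9}$$

and I[A] is the indicator function

$$I[A] = \begin{cases} 1 & \text{if event } A \text{ is true,} \\ 0 & \text{otherwise.} \end{cases} \tag{A.10}$$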

To proceed further we condition the expectation on M–j and recall that Zj is marginally distributed as a standard Gaussian random variable and thus, for independent markers,

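$$E[Z_{r_G}] \approx \sum_{j=1}^{G} E\left[Z_j \cdot I[Z_j > M_{-j}]\right] \tag{A.11}$$

$$= \sum_{j=1}^{G} E\left[E\left[Z_j \cdot I[Z_j > M_{-j}] \mid M_{-j}\right]\right] \tag{A.12}$$

$$= E\left[\sum_{j=1}^{G} \int_{M_{-j}}^{\infty} z\, f_Z(z)\, dz\right] \tag{A.13}$$

Here $f_Z$ denotes the standard Gaussian density, and the expectation in (A.13) is necessary because $M_{-j}$ contains the random elements $Z_k$. The integral may be explicitly rewritten as

$$E[Z_{r_G}] \approx \frac{1}{\sqrt{2\pi}} \sum_{j=1}^{G} E\left[e^{-M_{-j}^{2}/2}\right]. \tag{A.14}$$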
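The step from (A.13) to (A.14) rests on a standard Gaussian integral. A short worked version is sketched here, with m standing for a fixed value of $M_{-j}$ (a symbol introduced only for this illustration).

```latex
% f_Z(z) = (2\pi)^{-1/2} e^{-z^2/2} is the standard normal density.
\begin{align*}
\int_{m}^{\infty} z\, f_Z(z)\, dz
  &= \frac{1}{\sqrt{2\pi}} \int_{m}^{\infty} z\, e^{-z^{2}/2}\, dz \\
  &= \frac{1}{\sqrt{2\pi}} \Bigl[ -e^{-z^{2}/2} \Bigr]_{m}^{\infty} \\
  &= \frac{1}{\sqrt{2\pi}}\, e^{-m^{2}/2}.
\end{align*}
% Setting m = M_{-j}, taking the expectation over M_{-j},
% and summing over j = 1, ..., G yields (A.14).
```

Because the result depends on m only through $m^2$, large negative and large positive values of $M_{-j}$ both make the corresponding term negligible.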

From (A.14) one sees that bias is inversely related to the absolute value of the M–j terms. Some consequences of this derivation are as follows.

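Consider the effect of increasing the sample size holding all else constant. We focus upon the case in which there exists a single true maximum effect size and designate that marker's index by $r_G^0$, i.e.

$$\frac{\theta_{r_G^0}}{\sigma_{r_G^0}} > \frac{\theta_j}{\sigma_j} \quad \text{for all } j \ne r_G^0.$$

It is worthwhile to examine $M_{-j}$ for the cases $j = r_G^0$ and $j \ne r_G^0$ separately, where we assume only one variable (with index $r_G^0$) has the most positive effect size, i.e. there are no ties. Then

$$\lim_{N \to \infty} M_{-j} = \lim_{N \to \infty} \max_{k \ne j}\left(Z_k + \sqrt{N}\left(\frac{\theta_k}{\sigma_k} - \frac{\theta_j}{\sigma_j}\right)\right) \tag{A.15}$$

$$= -\infty \ \text{with probability 1 if } j = r_G^0 \tag{A.16}$$

$$= \infty \ \text{if } j \ne r_G^0. \tag{A.17}$$

In either case we have that $M_{-j}^{2} \to \infty$, so from (A.14) one sees $\lim_{N \to \infty} E[Z_{r_G}] = 0$.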
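The limiting behaviour can be illustrated by evaluating (A.14) by Monte Carlo for increasing N. In this sketch the five effect sizes $\theta_j/\sigma_j$ are hypothetical values chosen so that several markers are close to the largest one; the function name, seed, and number of replicates are likewise assumptions.

```python
import math
import random


def approx_expected_z(effects, n, reps=4000, seed=3):
    """Monte Carlo value of (A.14): (1/sqrt(2*pi)) * sum_j E[exp(-M_{-j}^2 / 2)].

    effects holds theta_j / sigma_j for each marker; markers are independent.
    """
    rng = random.Random(seed)
    g = len(effects)
    root_n = math.sqrt(n)
    total = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(g)]
        for j in range(g):
            # M_{-j} as defined in (A.9)
            m_minus_j = max(z[k] + root_n * (effects[k] - effects[j])
                            for k in range(g) if k != j)
            total += math.exp(-0.5 * m_minus_j ** 2)
    return total / (reps * math.sqrt(2 * math.pi))


effects = [0.20, 0.19, 0.18, 0.10, 0.0]   # hypothetical theta_j / sigma_j values
results = {n: approx_expected_z(effects, n) for n in (100, 1000, 10000, 100000)}
for n, value in results.items():
    print(f"N = {n:>6}: approximate E[Z_rG] = {value:.3f}")
```

With effect sizes this close together the approximate bias decays slowly: a sample size large enough to separate the top marker from its nearest competitor is needed before the sum becomes small.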

The case for expanding the differences among effect sizes is similar – at least for the simplified example below. For a given pattern of effect sizes among the G variables (again with no ties for the largest effect size), consider a new pattern of effect sizes given by multiplying each original effect by a constant γ > 0. Then if r0G designates the marker with the most positive true effect size

$$\lim_{\gamma \to \infty} M_{-j} = \lim_{\gamma \to \infty} \max_{k \ne j}\left(Z_k + \gamma \sqrt{N}\left(\frac{\theta_k}{\sigma_k} - \frac{\theta_j}{\sigma_j}\right)\right) \tag{A.18}$$

$$= -\infty \ \text{if } j = r_G^0 \tag{A.19}$$

$$= \infty \ \text{if } j \ne r_G^0. \tag{A.20}$$

Consequently the same conclusion of no bias follows. If one reverses the limiting action of γ so that γ → 0 from above then

$$\lim_{\gamma \to 0} M_{-j} = \max_{k \ne j} Z_k \tag{A.21}$$

where the $Z_k$ are standard normal statistics and the bias is then positive. This situation corresponds to the situation of no variables showing differential expression.

The case for increasing G, the number of variables, is less clear cut as it depends upon the combination of effect sizes. Empirically it seems that adding variables with effect sizes at or near the size of the largest preexisting effect sizes exaggerates the bias effects for $\theta_{r_G}$. In terms of figuring the change of the $M_{-j}$ terms as above there is more ambiguity, as some $M_{-j}^{2}$ terms will likely increase, others decrease, and some new terms will be introduced.

References

  • 1. Benjamini Y, Hochberg Y. Controlling the false discovery rate – a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300.
  • 2. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
  • 3. Editorial. Freely associating. Nat Genet. 1999;22:1–2. doi: 10.1038/8702.
  • 4. Garner C. Upward bias in odds ratio estimates from genome-wide association studies. Genet Epidemiol. 2007;31:288–295. doi: 10.1002/gepi.20209.
  • 5. Allison D, Fernandez JR, Heo M, Zhu S, Etzel C, Beasley TM, Amos CI. Bias in estimates of quantitative-trait locus effect in genome scans: demonstration of the phenomenon and a method-of-moments procedure for reducing bias. Am J Hum Genet. 2002;70:575–585. doi: 10.1086/339273.
  • 6. Göring H, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001;69:1357–1363. doi: 10.1086/324471.
  • 7. Zöllner S, Pritchard JK. Overcoming the winner's curse: estimating penetrance parameters from case-control data. Am J Hum Genet. 2007;80:605–615. doi: 10.1086/512821.
  • 8. Sun L, Bull SB. Reduction of selection bias in genomewide studies by resampling. Genet Epidemiol. 2005;28:352–367. doi: 10.1002/gepi.20068.
  • 9. Jeffries N. Multiple comparisons distortions of parameter estimates. Biostatistics. 2007;8:500–504. doi: 10.1093/biostatistics/kxl025.
  • 10. Jeffries N. Performance of a genetic algorithm for mass spectrometry proteomics. BMC Bioinformatics. 2004;5:180. doi: 10.1186/1471-2105-5-180.
  • 11. Ghosh A, Zou F, Wright FA. Estimating odds ratios in genome scans: an approximate conditional likelihood approach. Am J Hum Genet. 2008;82:1064–1074. doi: 10.1016/j.ajhg.2008.03.002.
  • 12. Zhong H, Prentice RL. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics. 2008, in press.

Articles from Human Heredity are provided here courtesy of Karger Publishers
