An easy approach to the Robins-Breslow-Greenland variance estimator

Paul Silcocks

2005 Sep 26;2:9. doi: 10.1186/1742-5573-2-9

PMCID: PMC1270683 PMID: 16185354

Abstract

The Mantel-Haenszel estimate for the odds ratio (and its logarithm) in stratified case control studies lacked a generally acceptable variance estimate for many years. The Robins-Breslow-Greenland estimate has met this need, but standard textbooks still do not provide an explanation of how it is derived. This article provides an accessible derivation which demonstrates the link between the Robins-Breslow-Greenland estimate and the familiar Woolf estimate for the variance of the log odds ratio, and which could easily be included in Masters level courses in epidemiology. The relationships to the unconditional and conditional maximum likelihood estimates are also reviewed.

Keywords: Odds ratio, Variance, Robins-Breslow-Greenland, RBG estimate

Introduction

The Mantel-Haenszel (MH) estimate for the summary odds ratio across several 2 × 2 tables, ψ_MH, was proposed in 1959 [1]. Over twenty years later the lack of a robust estimate for its variance was still being noted [2], yet only a few years afterwards Robins, Breslow and Greenland introduced their now generally accepted variance estimator for the Mantel-Haenszel log odds ratio [3] (referred to here as the RBG estimate). This replaced confidence limits based on the unsatisfactory test-based procedure of Miettinen or on the computationally intensive Cornfield-type limits, which had hitherto been used.

While a useful review of Mantel-Haenszel methods has been published, including some aspects of the historical development towards the RBG estimator [4], the formal derivations by Robins, Breslow & Greenland [3] and Phillips & Holland [5] are not, in the view of this author, easily comprehended. The former omits steps in the argument, while the latter appeals to descending factorial powers. It is perhaps no surprise that even modern textbooks [6,7] merely state the RBG formula without deriving it.

Other variance estimators exist, but some are ad hoc (such as the application of the cohort study formula to case-control data suggested by Clayton and Hills [8]), some apply only to the case of a few large strata [9], and others are closely related to the RBG estimator [10]. One rather different exception is Sato's formula [11], but this procedure gives confidence limits directly in the odds ratio scale.

It is the intention of this article to present an informal derivation of the RBG estimator as an extension of the familiar variance formula of Woolf [12], and which could readily be included in standard textbooks of epidemiology or biostatistics. I will describe this from the perspective of a case-control study.

Analysis

How does the Mantel-Haenszel estimate arise?

Consider a stratified case-control study for which the i-th of k independent tables is:

	Cases	Controls
Exposed	a_i	b_i
Unexposed	c_i	d_i
Total	n_1i	n_0i

Neglecting constants, the unconditional likelihood for the i-th table is:

L_i = θ_i^{a_i} (1 - θ_i)^{c_i} ø_i^{b_i} (1 - ø_i)^{d_i}

where in the i-th table θ_i = probability of exposure if a case and ø_i = probability of exposure if a control.

The maximum likelihood estimate (MLE) for ø_i is given by b_i/(b_i + d_i) and if we re-parameterise θ_i as ψø_i/[ψø_i + (1 - ø_i)], where ψ is the odds ratio (assumed common to all tables), the contribution to the overall log likelihood made by terms involving ψ is:

∑ {a_i ln {ψø_i/[ψø_i + (1 - ø_i)]} + c_i ln {1/[ψø_i + (1 - ø_i)]}}.

Differentiating with respect to ψ, equating to zero and rearranging (noting that a_i + c_i = n_1i), we obtain:

∑ {a_i/ψ - n_1iø_i/[ψø_i + (1 - ø_i)]} = 0

i.e., ∑ {a_i - n_1iψø_i/[ψø_i + (1 - ø_i)]} = 0

i.e., ∑ {[ψa_iø_i + a_i - a_iø_i - n_1iψø_i]/[ψø_i + (1 - ø_i)]} = 0.

This must be solved numerically to obtain the MLE for ψ, but if the denominators do not vary too much across the tables we merely have to solve:

∑ [ψa_iø_i + a_i - a_iø_i - n_1iψø_i] = 0

i.e., ∑ [ψ(a_i - n_1i)ø_i + a_i(1 - ø_i)] = 0

or, ∑ a_i(1 - ø_i) = ∑ ψ(n_1i - a_i)ø_i

giving, ∑ a_i(1 - ø_i) = ψ ∑ (n_1i - a_i)ø_i

and since ø_i is estimated by b_i/(b_i + d_i) = b_i/n_0i, so that 1 - ø_i = d_i/n_0i, while n_1i - a_i = c_i:

ψ ≈ ∑ (a_id_i/n_0i) / ∑ (b_ic_i/n_0i)

This can be used as a first approximation to find the MLE (if there is only one table then ψ is the unconditional MLE = ad/bc). Now in stratified case-control studies with a constant ratio, r, of controls to cases, n_0i = rn_1i, so the total number of subjects in each stratum is n_i = n_0i(1 + r)/r and n_0i = rn_i/(1 + r). A constant r will be achieved by design if there is caliper matching; otherwise, as with a post-stratified analysis, this will be only approximately true. The constant factor r/(1 + r) can then be cancelled and we are left with:

ψ_MH = ∑ (a_id_i/n_i) / ∑ (b_ic_i/n_i)

The MH estimator is therefore a first approximation to the unconditional MLE in the large strata case with a constant control:case ratio across strata. However the MH estimator actually coincides with the conditional MLE for the matched pairs design, as outlined, for example on page 164 of Breslow & Day [2].
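The approximation can be checked numerically. The following is a minimal sketch rather than the author's own calculation: it takes the seven strata of Table 1, fixes each ø_i at b_i/n_0i as in the text, solves the score equation ∑ {a_i - n_1iψø_i/[ψø_i + (1 - ø_i)]} = 0 by bisection, and compares the root with the closed-form first approximation ∑ (a_id_i/n_0i) / ∑ (b_ic_i/n_0i). The function names and the bisection bounds are illustrative assumptions. Because the ø_i are held fixed rather than maximised jointly with ψ, the root need not agree exactly with a package's unconditional MLE.

```python
# Strata of Table 1 as (a, b, c, d):
# a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
TABLE_1 = [
    (36, 97, 6, 71), (41, 79, 1, 5), (2, 1, 26, 55), (19, 25, 9, 59),
    (20, 41, 8, 71), (30, 21, 4, 13), (22, 26, 6, 30),
]


def score(psi, strata):
    """Left-hand side of the score equation, with phi_i fixed at b_i/n_0i."""
    total = 0.0
    for a, b, c, d in strata:
        phi = b / (b + d)   # estimated exposure probability among controls
        n1 = a + c          # number of cases in the stratum
        total += a - n1 * psi * phi / (psi * phi + (1.0 - phi))
    return total


def solve_score(strata, lo=1e-3, hi=1e3, iterations=200):
    """Bisection; assumes score(lo) > 0 > score(hi), as holds for Table 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, strata) > 0.0:
            lo = mid    # score decreases in psi, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def first_approximation(strata):
    """Closed form obtained by ignoring the denominators of the score terms."""
    num = sum(a * d / (b + d) for a, b, c, d in strata)
    den = sum(b * c / (b + d) for a, b, c, d in strata)
    return num / den


psi_root = solve_score(TABLE_1)
psi_approx = first_approximation(TABLE_1)
print(f"root of score equation: {psi_root:.3f}")
print(f"first approximation:    {psi_approx:.3f}")
```

For these data the two values should lie close together, which illustrates why dropping the denominators does little harm when they vary only moderately across strata.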

The sensitivity to variation in the ø_i and constancy of the control:case ratio is not high, as shown by the data in Table 1. In a sense this would be expected because for the most sparse (e.g., pair-matched) data the control:case ratio will be constant, and while the ø_i then have maximum variance, being only 0 and 1, the MH estimate coincides with the conditional MLE. Conversely, for large strata the control:case ratio will vary, but the variance of the ø_i will be less and the MH estimate will then approximate the unconditional MLE.

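Table 1.

Simulated case-control data with true odds ratio = 5

Stratum	Exposure	Cases	Controls	ø	Controls:cases
1	Exposed	36	97	0.58	4
1	Unexposed	6	71
1	Total	42	168
2	Exposed	41	79	0.94	2
2	Unexposed	1	5
2	Total	42	84
3	Exposed	2	1	0.02	2
3	Unexposed	26	55
3	Total	28	56
4	Exposed	19	25	0.30	3
4	Unexposed	9	59
4	Total	28	84
5	Exposed	20	41	0.37	4
5	Unexposed	8	71
5	Total	28	112
6	Exposed	30	21	0.62	1
6	Unexposed	4	13
6	Total	34	34
7	Exposed	22	26	0.46	2
7	Unexposed	6	30
7	Total	28	56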

Odds ratio estimates (Stata v7.0):

Mantel-Haenszel 4.38 (95% CI 2.85 to 6.72)

Conditional MLE 4.36 (95% CI 2.85 to 6.67)

Unconditional MLE 4.42 (95% CI 2.88 to 6.78)

Deriving the variance of the Mantel-Haenszel estimate

Consider again the i-th 2 × 2 table, giving the frequencies in each cell:

	Cases	Controls	Total
Exposed	a_i	b_i	a_i + b_i
Unexposed	c_i	d_i	c_i + d_i
Total	n_1i	n_0i	n_i

For odds ratio ψ_i, estimated for a single table by the cross-product ratio ψ̂_i = a_id_i/b_ic_i, application of the delta method gives Woolf's logit-based formula [8]:

var[ln(ψ̂_i)] ≈ 1/a_i + 1/b_i + 1/c_i + 1/d_i, with n_i = a_i + b_i + c_i + d_i.

The delta method is a widely used procedure in statistics when an approximation is needed for the variance of a function of a variable whose variance is known. In this instance the variable with known variance is a proportion p, and the function is the logit. The basic delta method formula is: var(y) ≈ (dy/dx)²var(x), from which, if y = logit(p) = ln[p/(1 - p)],

var(y) ≈ (1/p + 1/(1 - p))²p(1 - p)/n

= (1/p + 1/(1 - p))1/n

= (1/a + 1/b)

if p = a/n and n = a + b.

Here we have two independent proportions (the proportion of cases and controls exposed) and Woolf's formula is obtained by estimating the variances of the separate logits and adding them.
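The algebra above is easy to verify numerically. The sketch below, with illustrative function names that are not part of the original article, evaluates the delta-method expression (dy/dp)² p(1 - p)/n for one logit and confirms that the sum of the two logit variances equals 1/a + 1/b + 1/c + 1/d. The counts are taken from the first stratum of Table 1 purely as an example.

```python
def logit_variance_delta(successes, failures):
    """Delta-method variance of logit(p), where p = successes / n."""
    n = successes + failures
    p = successes / n
    derivative = 1.0 / p + 1.0 / (1.0 - p)      # d logit(p) / dp
    return derivative ** 2 * p * (1.0 - p) / n  # (dy/dp)^2 * var(p)


def woolf_variance(a, b, c, d):
    """Sum of two independent logit variances: cases (a, c) and controls (b, d)."""
    return logit_variance_delta(a, c) + logit_variance_delta(b, d)


# First stratum of Table 1: a, b = exposed cases, controls;
# c, d = unexposed cases, controls.
a, b, c, d = 36, 97, 6, 71
print(woolf_variance(a, b, c, d))
print(1 / a + 1 / b + 1 / c + 1 / d)
```

The two printed values should agree up to floating-point rounding.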

For k such 2 × 2 tables, each representing a separate stratum, the Mantel-Haenszel pooled estimate of the common odds ratio ψ is given by:

ψ_MH = ∑ (a_id_i/n_i) / ∑ (b_ic_i/n_i) = ∑ w_iψ̂_i / ∑ w_i, where w_i = b_ic_i/n_i and ψ̂_i = a_id_i/b_ic_i

Hence ψ_MH is a weighted average of the stratum-specific odds ratios. The weights approximate the inverse of the variance of each ψ̂_i if the true value of ψ = 1. Note that the assumption here of a common odds ratio is not required for the Mantel-Haenszel test.
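As a check on the weighted-average interpretation, the following sketch computes ψ_MH for the Table 1 data in two ways: as the ratio of sums ∑ (a_id_i/n_i) / ∑ (b_ic_i/n_i), and as the average of the stratum-specific odds ratios a_id_i/b_ic_i with weights b_ic_i/n_i. The function names are illustrative assumptions, and the weighted-average form requires b_ic_i > 0 in every stratum.

```python
# Strata of Table 1 as (a, b, c, d):
# a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
TABLE_1 = [
    (36, 97, 6, 71), (41, 79, 1, 5), (2, 1, 26, 55), (19, 25, 9, 59),
    (20, 41, 8, 71), (30, 21, 4, 13), (22, 26, 6, 30),
]


def mh_ratio_of_sums(strata):
    """Mantel-Haenszel estimate as sum(a*d/n) / sum(b*c/n)."""
    r = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    s = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return r / s


def mh_weighted_average(strata):
    """Same estimate as a weighted mean of stratum odds ratios (needs b*c > 0)."""
    weights = [b * c / (a + b + c + d) for a, b, c, d in strata]
    odds_ratios = [a * d / (b * c) for a, b, c, d in strata]
    return sum(w * o for w, o in zip(weights, odds_ratios)) / sum(weights)


print(f"{mh_ratio_of_sums(TABLE_1):.2f}")
print(f"{mh_weighted_average(TABLE_1):.2f}")
```

Both lines should print 4.38, the Mantel-Haenszel estimate reported beneath Table 1.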

To derive the variance, in addition to the approximation involved in application of the delta rule, an assumption is also made that each stratum-specific odds ratio is close enough to the Mantel-Haenszel pooled estimate to permit terms like a_id_i/b_ic_i to be replaced by ψ_MH.

We then proceed by obtaining an approximation which avoids zeros in the formula for var[ln(ψ̂_i)]. The motivation for this can be seen by comparing the weights for ψ_MH, which are unaffected by zeros except for deleting such strata, with Woolf's variances: if the latter were used, the result would be indeterminate whenever cells with zeros were present.

Taking the weights as constant,

var(ψ_MH) ≈ ∑ w_i² var(ψ̂_i) / (∑ w_i)², with w_i = b_ic_i/n_i

Assuming a common odds ratio ψ, estimated by ψ_MH, this can be written as:

var(ψ_MH) ≈ ψ_MH² ∑ w_i² (1/a_i + 1/b_i + 1/c_i + 1/d_i) / (∑ w_i)²

Leading to a formula suggested by Hauck [9]:

var[ln(ψ_MH)] ≈ ∑ w_i² (1/a_i + 1/b_i + 1/c_i + 1/d_i) / (∑ w_i)²

As mentioned above, a problem with this formula is that it fails if cell entries are zero. However we can proceed further by re-writing the formula as:

var[ln(ψ_MH)] ≈ ∑ (b_ic_i/n_i²)[(b_ic_i/a_id_i)(a_i + d_i) + (b_i + c_i)] / (∑ b_ic_i/n_i)²

On substituting 1/ψ_MH for (b_ic_i/a_id_i):

var[ln(ψ_MH)] ≈ ∑ (b_ic_i/n_i²)[(a_i + d_i)/ψ_MH + (b_i + c_i)] / (∑ b_ic_i/n_i)²

Now if the rows of the 2 × 2 table are interchanged, the variance stays the same. But a similar argument to that above leads to:

var[ln(ψ_MH)] ≈ ∑ (a_id_i/n_i²)[(a_i + d_i) + ψ_MH(b_i + c_i)] / (∑ a_id_i/n_i)²

(Note that the new odds ratio formed by exchanging rows is just 1/ψ_MH.) "The" variance, V, of ln(ψ_MH) is therefore taken to be the mean of the two estimates [13] as follows:

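Let R = ∑ (a_id_i/n_i) and S = ∑ (b_ic_i/n_i). On substituting into the two variance formulae:

V = {R² ∑ (b_ic_i/n_i²)[(a_i + d_i)/ψ_MH + (b_i + c_i)] + S² ∑ (a_id_i/n_i²)[(a_i + d_i) + ψ_MH(b_i + c_i)]} / (2R²S²)

Next, divide the top and bottom by S² (using R/S = ψ_MH) and move the 1/ψ_MH term outside the brackets to obtain:

V = {∑ (1/n_i²)(a_id_i + ψ_MHb_ic_i)[(a_i + d_i) + ψ_MH(b_i + c_i)]} / (2R²)

which is eq. 9 in Phillips & Holland [5].

If we now put

P_i = (a_i + d_i)/n_i and Q_i = (b_i + c_i)/n_i with R_i = a_id_i/n_i and S_i = b_ic_i/n_i

then V = ∑ (R_i + ψ_MHS_i)(P_i + ψ_MHQ_i) / (2R²)

which on multiplying out the brackets, rearranging and noting that R/S = ψ_MH, gives:

V = ∑ P_iR_i/(2R²) + ∑ (P_iS_i + Q_iR_i)/(2RS) + ∑ Q_iS_i/(2S²)

This is the RBG formula!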

When there is only one stratum, this reduces to (1/a + 1/b + 1/c + 1/d) which is the familiar logit based formula of Woolf and which approaches 0 as the sample size increases, assuming a finite true odds ratio. Clearly as the RBG variance estimate is a finite sum of such estimators the RBG estimate will also approach 0, for large strata.
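The RBG formula is straightforward to compute from the quantities P_i, Q_i, R_i and S_i defined above. The sketch below is one possible implementation under stated assumptions, not code from the original article: the function names are illustrative, a normal-theory multiplier of 1.96 is assumed for the 95% limits, and the estimate is undefined if R or S is zero. It checks the single-stratum reduction to Woolf's formula and then computes limits for the Table 1 data.

```python
import math

# Strata of Table 1 as (a, b, c, d):
# a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
TABLE_1 = [
    (36, 97, 6, 71), (41, 79, 1, 5), (2, 1, 26, 55), (19, 25, 9, 59),
    (20, 41, 8, 71), (30, 21, 4, 13), (22, 26, 6, 30),
]


def mh_with_rbg_variance(strata):
    """Return (psi_MH, RBG variance of ln psi_MH); requires R > 0 and S > 0."""
    R = S = sum_PR = sum_mid = sum_QS = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        P, Q = (a + d) / n, (b + c) / n
        Ri, Si = a * d / n, b * c / n
        R += Ri
        S += Si
        sum_PR += P * Ri
        sum_mid += P * Si + Q * Ri   # the "middle term" of the formula
        sum_QS += Q * Si
    variance = (sum_PR / (2 * R ** 2)
                + sum_mid / (2 * R * S)
                + sum_QS / (2 * S ** 2))
    return R / S, variance


def confidence_limits(strata, z=1.96):
    """Approximate limits for psi_MH, computed on the log scale."""
    psi, variance = mh_with_rbg_variance(strata)
    half_width = z * math.sqrt(variance)
    return psi * math.exp(-half_width), psi * math.exp(half_width)


# Single stratum: the RBG variance should equal Woolf's 1/a + 1/b + 1/c + 1/d.
a, b, c, d = TABLE_1[0]
_, single_var = mh_with_rbg_variance([TABLE_1[0]])
print(single_var, 1 / a + 1 / b + 1 / c + 1 / d)

lower, upper = confidence_limits(TABLE_1)
print(f"95% CI {lower:.2f} to {upper:.2f}")
```

For Table 1 the limits should come out close to the 2.85 to 6.72 reported for the Mantel-Haenszel estimate.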

The RBG estimator was derived above on the assumption that the stratum-specific odds ratio estimates could be liberally replaced by the common value, in turn estimated by ψ_MH; both assumptions are reasonable with large samples per stratum. However, the success of the RBG formula derives from its being applicable also to the sparse data case.

To see this, consider a matched-pairs case control study. The capital letters denote the frequency of case-control pairs.

	Control exposed	Control unexposed
Case exposed	A	B
Case unexposed	C	D

In such a study each stratum has only two observations. The table can be decomposed into four types of "unmatched" table according to the exposure category of the case and the control, the frequency of each type being given by the frequency of the corresponding case-control pairs:

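Type of pair (frequency)	a_i	b_i	c_i	d_i
Case exposed, control exposed (A)	1	1	0	0
Case exposed, control unexposed (B)	1	0	0	1
Case unexposed, control exposed (C)	0	1	1	0
Case unexposed, control unexposed (D)	0	0	1	1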

Only the B such tables with a_i = d_i = 1 and the C such tables with b_i = c_i = 1 contribute to the estimate of the odds ratio. Note that these are disjoint sets of tables.

Under these circumstances: ψ_MH = B/C which coincides with the conditional MLE and:

a) the middle term of the RBG formula vanishes because if b_ic_i = 1 then (a_i + d_i) = 0, and if a_id_i = 1 then (b_i + c_i) = 0

b) R = ∑ a_id_i/n_i = B/2 and S = ∑ b_ic_i/n_i = C/2

c) there are B terms in which a_id_i(a_i + d_i) = 2 and C terms in which b_ic_i(b_i + c_i) = 2

giving:

V = B/B² + C/C² = 1/B + 1/C

This is not only the familiar logit based formula for the variance of the log odds ratio for matched pairs, but is also the variance of the conditional maximum likelihood estimate. This is asymptotically consistent from the general properties of a MLE (and it's easy to see that as the number of tables increases, V → 0).

In other words, the RBG formula, though derived here without assuming validity in the sparse case, does in fact possess this property.
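The matched-pairs reduction can also be confirmed directly. In the sketch below, each pair is expanded into a two-observation stratum of the appropriate type and the general RBG formula is applied to the resulting list of strata. The pair counts A, B, C and D are hypothetical values chosen only for illustration, and the function names are assumptions rather than part of the original article.

```python
def mh_with_rbg_variance(strata):
    """Return (psi_MH, RBG variance of ln psi_MH); requires R > 0 and S > 0."""
    R = S = sum_PR = sum_mid = sum_QS = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        P, Q = (a + d) / n, (b + c) / n
        Ri, Si = a * d / n, b * c / n
        R += Ri
        S += Si
        sum_PR += P * Ri
        sum_mid += P * Si + Q * Ri
        sum_QS += Q * Si
    variance = (sum_PR / (2 * R ** 2)
                + sum_mid / (2 * R * S)
                + sum_QS / (2 * S ** 2))
    return R / S, variance


def pair_tables(A, B, C, D):
    """Expand pair counts into one (a, b, c, d) stratum per matched pair."""
    return ([(1, 1, 0, 0)] * A      # case and control both exposed
            + [(1, 0, 0, 1)] * B    # case exposed, control unexposed
            + [(0, 1, 1, 0)] * C    # case unexposed, control exposed
            + [(0, 0, 1, 1)] * D)   # case and control both unexposed


# Hypothetical pair counts, for illustration only.
A, B, C, D = 15, 30, 10, 45
psi, variance = mh_with_rbg_variance(pair_tables(A, B, C, D))
print(psi, B / C)
print(variance, 1 / B + 1 / C)
```

The concordant pairs (types A and D) contribute nothing to any of the sums, so the estimate and its variance depend only on B and C.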

Table 1 shows how closely the conditional maximum likelihood estimate, unconditional maximum likelihood estimate, and MH estimate agree, despite varying ø_i and control:case ratio.

Conclusion

The Mantel-Haenszel estimate of the odds ratio approximates the unconditional maximum likelihood estimate when there are a few large strata and coincides with the conditional maximum likelihood estimate for the sparse data (matched pairs) case.

The RBG formula is the estimator of choice for the variance of the Mantel-Haenszel log odds ratio because it applies both when there are a few large strata and when there are many sparse strata (as in matched pairs analysis), in which case the RBG variance estimate actually coincides with the conditional maximum likelihood variance estimate.

Moreover the RBG formula reduces to familiar standard forms for a single stratum and for matched pairs.

Formal derivation of the RBG formula is tricky but an informal, accessible derivation is possible as outlined above, which uses nothing more advanced than the delta method for approximating a variance.

Competing interests

The author(s) declare that they have no competing interests.

References

1. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. JNCI. 1959;22:719–748.
2. Breslow NE, Day NE. Statistical methods in cancer research, Volume 1 – the analysis of case-control studies. Lyons: International Agency for Research on Cancer; 1980.
3. Robins J, Breslow N, Greenland S. Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics. 1986;42:311–323.
4. Kuritz SJ, Landis JR, Koch GG. A general overview of Mantel-Haenszel methods: applications and recent developments. Ann Rev Public Health. 1988;9:123–160. doi: 10.1146/annurev.pu.09.050188.001011.
5. Phillips A, Holland PW. Estimators of the variance of the Mantel-Haenszel log-odds-ratio estimate. Biometrics. 1987;43:425–431.
6. Armitage P, Berry G, Matthews JNS. Statistical methods in medical research. 4. Oxford: Blackwell Science; 2002.
7. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates & Proportions. Chichester: John Wiley; 2003.
8. Clayton D, Hills M. Statistical methods in epidemiology. Oxford: Oxford University Press; 1995.
9. Hauck WW. The large-sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics. 1979;35:817–819.
10. Flanders WD. A new variance estimator for the Mantel-Haenszel odds ratio. Biometrics. 1985;41:637–642.
11. Sato T. Confidence limits for the Common Odds Ratio Based on the Asymptotic Distribution of the Mantel-Haenszel Estimator. Biometrics. 1990;46:71–80.
12. Woolf B. On estimating the relationship between blood group and disease. Human Genet. 1955;19:251–253. doi: 10.1111/j.1469-1809.1955.tb01348.x.
13. Ury HK. Hauck's approximate large-sample variance of the Mantel-Haenszel estimator [letter] Biometrics. 1982;38:1094–1095.

