International Journal of Epidemiology. 2020 Sep 2;49(5):1497–1500. doi: 10.1093/ije/dyaa097

Commentary: Continuing the E-value’s post-publication peer review

Charles Poole

An association is taken seriously as possibly causal and then is found to have been the spurious spawn of one or more confounders, hitherto unknown and even unsuspected. This nightmarish scenario is possible, but how often has it actually occurred? For all the fear it engenders, one might think it has been commonplace in epidemiological history. One response to it is the E-value,1–6 a measure whose developers have lobbied for it to become ‘standard practice’ and to be ‘reported routinely’ in ‘all observational studies intended to produce evidence for causality’.2

Ordinarily, a sensitivity analysis of uncontrolled confounding requires one to specify estimated or postulated values of parameters representing a confounder’s distribution and its associations with exposure and outcome.7 The E-value relieves the analyst of those duties by building the specifications into the calculation, as more or less explicit assumptions. The simple formula, requiring but a single input, seems almost too good to be true. The challenge lies in interpreting the output.

What is an E-value?

E-values1,2,5 are for confounding the control of which would reduce an elevated risk ratio estimate, RR_obs > 1, down to the null value, RR_adj = 1. (Perhaps unnecessarily, zero net bias from other sources is assumed, so that the de-confounded estimate would equal the true value, RR_adj = RR_true.) The association of the uncontrolled confounder (U) with the exposure (E) is assumed to be positive, RR_EU > 1, as is U’s association with the outcome (D), RR_UD > 1. U’s prevalence in the exposed group, p_1, is assumed to be 100%, which implies an exposure prevalence of zero in U’s reference level.

Under these assumptions, for a given value of RR_obs, RR_UD is related to RR_EU by a function,1 RR_UD = RR_obs(1 − RR_EU)/(RR_obs − RR_EU), an example of which is shown in Figure 2 of VanderWeele and Ding.2 At one point along this function, RR_EU and RR_UD are equal. That point is the E-value = RR_obs + [RR_obs(RR_obs − 1)]^0.5. At every other point, the two associations straddle the E-value: one weaker, the other stronger.
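These relations are easy to check numerically. Below is a minimal sketch in Python (illustrative code of my own, not the developers’; it assumes nothing beyond the two formulas just given, and the function names are invented):

    import math

    def e_value(rr_obs):
        # E-value for an elevated risk ratio estimate (RR_obs > 1)
        return rr_obs + math.sqrt(rr_obs * (rr_obs - 1))

    def rr_ud_given(rr_obs, rr_eu):
        # RR_UD that, paired with RR_EU, fully accounts for RR_obs;
        # defined for RR_EU > RR_obs
        return rr_obs * (1 - rr_eu) / (rr_obs - rr_eu)

    rr_obs = 1.56
    e = e_value(rr_obs)
    print(round(e, 2))                        # 2.49, i.e. about 2.5
    print(round(rr_ud_given(rr_obs, e), 2))   # 2.49: the one point where RR_EU = RR_UD
    print(round(rr_ud_given(rr_obs, 7.8), 2)) # 1.70: when RR_EU exceeds the E-value,
                                              # RR_UD falls below it, and vice versa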

Interpretation

Unfortunately, the developers’ definition of the E-value in words as ‘the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and outcome, conditional on the measured covariates, to fully explain away a specific treatment-outcome association’2 makes it sound as though RR_EU and RR_UD must both equal or exceed the E-value under the specified conditions. So does their discussion of examples, as when they write of an E-value of 2.5 that ‘an unmeasured confounder would have to be associated with both [the exposure] and the outcome by a risk ratio of 2.5-fold each.’2 Table 1 gives a contrary example. The E-value is 2.5. RR_EU is stronger than that, 1/(6410/50 000) = 7.8, but RR_UD is weaker, 0.038/0.022 = 1.7.

Table 1. An uncontrolled confounder (U1) completely responsible for an apparent risk increase in connection with an exposure (E)

U1      E    Cases    Persons    Risk        Risk ratio
1       1    1876     50 000     0.038       1.00
1       0     241      6 410     0.038       1
0       1       0          0     (0.022)a    (1.00)a
0       0     961     43 590     0.022       1
Total   1    1876     50 000     0.038       1.56
Total   0    1202     50 000     0.024       1

a This risk and risk ratio are well-defined, by assumption, but there would be no data to estimate them.
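The arithmetic of Table 1 can be reproduced from its cell counts alone; a brief sketch (again my own, using only the numbers shown in the table):

    # Recomputing Table 1's associations from its cell counts
    risk_e1 = 1876 / 50_000                   # exposed risk: 0.038
    risk_e0 = (241 + 961) / (6_410 + 43_590)  # unexposed risk: 0.024
    print(round(risk_e1 / risk_e0, 2))        # crude RR_obs: 1.56

    rr_eu = 1 / (6_410 / 50_000)              # U1's exposure association: 7.8
    rr_ud = (241 / 6_410) / (961 / 43_590)    # U1's outcome association: 1.71
    print(round(rr_eu, 1), round(rr_ud, 1))   # 7.8 and 1.7: one above the
                                              # E-value of 2.5, one below it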

The developers’ guidance2 is to consider typical values of RREU and RRUD in the relevant research area. The E-value, however, was not designed for typical confounding. It was designed for extraordinarily exceptional confounding, the control of which would completely wipe out apparent risk increases despite researchers’ best efforts at confounder control.

Why, under the specified conditions, is the misinterpretation that both of the confounder’s associations must be at least as strong as the E-value more attractive than the correct interpretation that the two associations, if they do not coincidentally equal the E-value, must lie on either side of it? I suspect it is because a measure worthy of the former interpretation would be considerably more useful than the E-value, which deserves only the latter.

RR_EU and RR_UD can both be allowed to exceed the E-value, but only by relaxing the extreme assumption of p_1 = 1. The developers, however, recommend relaxing that assumption only when ‘it is known in advance’ to be false5: a tall order for unknown, unsuspected confounders. Nonetheless, if it is implausible for the confounder’s prevalence in the exposed group to be as high as 100%, the E-value can become ‘exceedingly conservative’ and ‘perhaps to be avoided’.5
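To see what relaxing p_1 does, one can use the classic bias factor for a binary confounder with prevalences p_1 (exposed) and p_0 (unexposed), assuming no effect modification on the risk ratio scale, in the spirit of Schlesselman.7 The sketch and its illustrative values are mine:

    def bias_factor(rr_ud, p1, p0):
        # Ratio of confounded to de-confounded risk ratios for a binary
        # confounder with outcome association rr_ud and prevalences p1, p0
        return (p1 * rr_ud + (1 - p1)) / (p0 * rr_ud + (1 - p0))

    # With p1 = 1 and p0 = 1/7.8 (so RR_EU = 7.8), RR_UD = 1.71 suffices
    # to produce the apparent RR_obs of 1.56 in Table 1:
    print(round(bias_factor(1.71, 1.0, 1 / 7.8), 2))    # 1.57

    # Halving both prevalences (same RR_EU) produces less bias, so stronger
    # associations would be needed to fully account for RR_obs = 1.56:
    print(round(bias_factor(1.71, 0.5, 0.5 / 7.8), 2))  # 1.30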

Use

Following up on some of their concerns about the E-value,8 Blum et al. report results from a systematic review of its use by early adopters.9 One concern is that researchers will calculate E-values and then say whatever they would have said anyway about uncontrolled confounding in a paper’s Discussion section, where almost anything goes. One of the great benefits of sensitivity analysis is that it can shift such concerns into the Methods and Results sections, where greater order and accountability ostensibly prevail.

Here the well-studied, multifaceted and all too human phenomenon known in cognitive psychology as overconfidence bias10 comes into play. In viewing our own research, we are all tempted to see the strengths as strong and the limitations as limited. E-values, it would seem, do little to dampen this tendency. The early adopters saw about four out of five of the E-values they calculated as reassuring, with no material tendency for smaller E-values to elicit heightened concern.9

The reviewers found no paper containing an E-value that included any other sensitivity analysis of uncontrolled confounding as well.9 Clearly, the measure is being used as it has been promoted: not merely as a supplement to more extensive analyses, but as a putatively sufficient alternative to them.

Somewhat fewer than half of the scored papers mentioned specific potential confounders, a small number of which the reviewers classified as partially controlled.9 When a potential, uncontrolled confounder has a name, one would hope that its sensitivity analysis always would make use of what is known about it, which an E-value does not do. At a bare minimum, one should be able to discern the confounding’s likely direction, especially in the case of partial control. If that direction is toward the null, an E-value is by definition uninformative. The measure’s main utility, therefore, is not for all uncontrolled confounders, but for those that are unknown and even unsuspected.

Another E-value?

A goal in developing the E-value was to see if ‘it is possible to construct scenarios’5 in which an uncontrolled confounder has associations with exposure and outcome that are equal and as weak as possible, while fully accounting for an apparent risk increase. The E-value that emerged does not always achieve that goal. In Table 2, U2 accounts for the same apparent risk increase that U1 does in Table 1. The associations of U2 with E and D, however, are both 0.48, equivalent on the away-from-the-null scale to 1/0.48 = 2.1: weaker than the developers’ E-value of 2.5.

Table 2. Another uncontrolled confounder (U2) completely responsible for the same apparent risk increase as in Table 1

U2      E    Cases    Persons    Risk        Risk ratio
1       1     577     24 020     0.024       1.00
1       0    1201     50 000     0.024       1
0       1    1299     25 980     0.050       (1.00)a
0       0       0          0     (0.050)a    1
Total   1    1876     50 000     0.038       1.56
Total   0    1202     50 000     0.024       1

a This risk and risk ratio are well-defined, by assumption, but there would be no data to estimate them.
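Table 2’s associations can be reproduced the same way (again my own sketch, using only the table’s cell counts):

    # Recomputing Table 2's associations from its cell counts
    rr_eu = (24_020 / 50_000) / (50_000 / 50_000)  # 0.48: U2 rarer among the exposed
    rr_ud = (577 / 24_020) / (1_299 / 25_980)      # 0.48: U2 linked with lower risk
    print(round(rr_eu, 2), round(rr_ud, 2))

    risk_e1 = (577 + 1_299) / 50_000               # 0.038
    risk_e0 = 1_201 / 50_000                       # 0.024
    print(round(risk_e1 / risk_e0, 2))             # crude RR_obs: 1.56
    # On the away-from-the-null scale, 1/0.48 = 2.1 < 2.5: both associations
    # are weaker than the E-value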

Apparent risk reductions

If a confounded risk ratio estimate suggests reduced risk and control of the confounding would raise that estimate to the null value, the confounding is downward. Downward confounding requires a confounder’s associations with exposure and outcome to be in opposite directions. An E-value requires them to be in the same direction. Hence, no E-value is possible for apparent risk reductions.

The proposed solution1,2,5,6 is to reverse the exposure or treatment contrast by transforming RR_obs to its reciprocal. Thus, RR_obs = 0.60 for macrovascular events, in a comparison of bariatric surgery with medical obesity treatment, becomes RR_obs = 0.60^−1 = 1.67 comparing medical treatment with surgery.6 This manoeuvre is fine, as far as it goes. The problem comes in the recommended interpretation of the resulting E-value, which is that ‘residual confounding could explain the observed association if there exists an unmeasured covariate having a relative risk association at least as large as 2.72 with both macrovascular events and with bariatric surgery’.6 To the contrary, a confounder associated with increased risk that accounts for the entirety of that association would have to be more common among persons receiving medical treatment, not among those undergoing the surgery.
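The reciprocal step itself is straightforward to implement; a minimal sketch (mine, not taken from the cited papers):

    import math

    def e_value(rr_obs):
        # For apparent risk reductions, apply the formula to the reciprocal,
        # which reverses the exposure or treatment contrast
        rr = rr_obs if rr_obs >= 1 else 1 / rr_obs
        return rr + math.sqrt(rr * (rr - 1))

    print(round(e_value(0.60), 2))  # 2.72, as in the bariatric surgery example
    # After the reversal, a risk-increasing confounder fully accounting for
    # RR_obs = 0.60 must be more common in the reference (medical) group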

The same mistake appeared in a discussion2 of examples in which breastfeeding was associated with reduced outcome risks. The E-value’s developers concluded that, to produce those associations, a confounder associated with increased risk would have to be more common among the breastfed babies. To the contrary, it would have to be more common among the non-breastfed babies. An editorial accepted this misinterpretation at face value and even displayed it on a graph, the entire left side of which is devoid of validity.11

In the data entry stage of their review, Blum et al.9 recorded the reciprocal of every RR_obs below the null value. Hence, the full extent of the damage done to the field thus far cannot be assessed without reading all the papers. In a haphazard selection of a dozen or so, I found three (reviewers’ references S16, S22 and S44) in which researchers were sent searching for potential, away-from-the-null confounders associated with increased risk that are more common in their exposed or index treatment groups, instead of where they should be looking, in their unexposed or reference treatment groups.

Conspectus

The post-publication phase of peer review is almost always the most important.12 It seems clear that the E-value will prove to be a good example. The points touched upon here will require additional attention. Others will as well, such as the interpretation of E-values calculated for confidence limits, which also may turn out not to be as straightforward as it has been portrayed thus far.1–6

It would be unfair to blame the E-value’s early adopters for the problems noted here. It is accepted in product liability that harms arising from reasonably foreseeable uses should be ascribed to products’ developers, not to their users.13 This principle is no less applicable to methodological research products than it is to unsafe automobiles and toys. They can be used safely in principle, but in reasonably foreseeable practice they are not. To hold users at fault in such circumstances would be ‘victim-blaming at its worst’.13 Considerable attention needs to be paid to human factors14 and mistake-proofing.15

It would be just as unfair to excuse the E-value by noting that some of its limitations can be overcome by more thoroughgoing sensitivity analyses.4,5 The measure is being promoted for use in lieu of such analyses. That is how the early adopters have been using it. That, consequently, is the high standard to which it needs to be held. It does not need to be perfect,4 but it does need to be better than good.

In describing part of his motivation for promoting the E-value, Tyler VanderWeele expressed dismay that in 12 years of teaching and encouraging students to conduct sensitivity analyses, he has rarely seen them do so outside the classroom.4 Having tried the same as he for more than twice as long, I have a different experience to report. I have been quite pleased, in general, by the local uptake. Our settings and expectations may differ, we have only our informal impressions to report, and there is always room for improvement. Nonetheless, I attend a great many talks in which students present their planned, ongoing and completed research. Today, I would be more surprised by the absence of sensitivity analyses than by their presence. I would have to say, however, that if a promised analysis of uncontrolled confounding turned out to consist solely of the calculation of some E-values, I am not sure I would count it.

The E-value’s developers are ‘entirely in favor of carrying out a more extensive sensitivity analysis whenever possible’.4 Here’s the good news: it always is.

Funding

This work was funded in part by National Institute on Aging grant R01 AG056479.

Conflict of interest

None declared.

References

1. Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology 2016;27:368–77.
2. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med 2017;167:268–74.
3. Mathur MB, Ding P, Riddell CA, VanderWeele TJ. Web site and R package for computing E-values. Epidemiology 2018;29:e45–7.
4. VanderWeele TJ, Mathur MB, Ding P. Correcting misinterpretations of the E-value. Ann Intern Med 2019;170:131–2.
5. VanderWeele TJ, Ding P, Mathur M. Technical considerations in the use of the E-value. J Causal Inference 2019;7:20180007. doi: 10.1515/jci-2018-0007.
6. Haneuse S, VanderWeele TJ, Arterburn D. Using the E-value to assess the potential effect of unmeasured confounding in observational studies. JAMA 2019;321:602–3.
7. Schlesselman JJ. Assessing effects of confounding variables. Am J Epidemiol 1978;108:3–8.
8. Ioannidis JPA, Tan YJ, Blum MR. Limitations and misinterpretations of E-values for sensitivity analyses of observational studies. Ann Intern Med 2019;170:108–11.
9. Blum MR, Tan YJ, Ioannidis JPA. Use of E-values for addressing confounding in observational studies—an empirical assessment of the literature. Int J Epidemiol 2020;49:1482–94.
10. Moore DA, Schatz D. The three faces of overconfidence. Soc Personal Psychol Compass 2017;11:e12331.
11. Localio AR, Stack CB, Griswold ME. Sensitivity analysis for unmeasured confounding: E-values for observational studies (editorial). Ann Intern Med 2017;167:285–6.
12. Poole C. Evolution of epidemiologic evidence on magnetic fields and childhood cancers. Am J Epidemiol 1996;143:129–32.
13. Adler RS, Popper AF. The misuse of product misuse: victim blaming at its worst. William Mary Bus Law Rev 2019;10:337–68.
14. National Research Council. The Case for Human Factors in Industry and Government: Report of a Workshop. Washington, DC: National Academies Press, 1997.
15. Grout JR. Mistake proofing: changing designs to reduce error. Qual Saf Health Care 2006;15:i44–9.
