Health Services Research. 2019 Mar 12;54(3):537-542. doi: 10.1111/1475-6773.13129

Instrumental variables: The power of wishful thinking vs the confounded reality of comparative effectiveness research

Stephen B Soumerai,1 Ross Koppel2,3
PMCID: PMC6505571  PMID: 30864150

1. INTRODUCTION

Keele and Small responded to our article on instrumental variables (IVs) published in Health Services Research in February 2017.1, 2 Here, we address their efforts to defend IVs and we present additional evidence of the unreliability of IVs in comparative effectiveness research (CER). We appreciate that some economists, statisticians, and other IV adherents are emboldened by their faith in the power of weak cross‐sectional associations to accurately reflect the world. But health outcomes research requires confronting the interrelatedness of social and medical factors—almost always a confounded reality with unmeasured and, indeed, unknown variables.3 That is, most IV studies assume life is far less confounded than it is.

Rassen et al4 define an instrumental variable (IV) as “an unconfounded proxy for a study exposure that can be used to estimate a causal effect in the presence of unmeasured confounding.” In other words, the approach assumes that a common variable (eg, differences between regions in rates of a medical treatment, such as prostatectomy) can stand in for randomization when estimating the effect on mortality—without having to worry about unknown confounding. In this wonderfully convenient but almost always unrealistic theory, IVs can randomize exposure to medical interventions. For example, IV proponents used distance from patients’ homes to the hospital as an IV that presumably “randomizes” early MI treatments to estimate effects on mortality rates.2, 5, 6, 7 In these examples, however, the IV is likely associated with other critical variables (eg, urban/rural status, socioeconomic characteristics, health status) that affect health outcomes. The IV is confounded and therefore produces a biased effect estimate.
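The logic above, and why it fails when the instrument is confounded, can be sketched in a small simulation. This is an illustrative example, not from the article: a binary instrument Z shifts treatment T, the outcome Y depends on T and an unmeasured confounder U, and the one-instrument Wald estimator (equivalent to two-stage least squares here) is computed in both a valid and a confounded scenario. All names and parameter values are assumptions for the sketch.

```python
# Hedged sketch: why the IV "randomization" assumption matters.
# A valid instrument (independent of the unmeasured confounder U)
# recovers the true effect; an instrument correlated with U
# (eg, distance correlated with rurality and health) does not.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = -0.05  # assumed causal effect of treatment on "mortality" outcome

def wald_estimate(z_confounded: bool) -> float:
    u = rng.normal(size=n)  # unmeasured confounder (eg, poor health)
    if z_confounded:
        # instrument is itself driven by U: violates the IV assumption
        z = (rng.normal(size=n) + u > 0).astype(float)
    else:
        # "as-if" randomized instrument, independent of U
        z = rng.integers(0, 2, size=n).astype(float)
    # instrument shifts treatment uptake; U also affects uptake
    t = (0.5 * z + 0.3 * u + rng.normal(size=n) > 0).astype(float)
    y = true_effect * t + 0.2 * u + rng.normal(scale=0.1, size=n)
    # Wald estimator: cov(Z, Y) / cov(Z, T)
    return np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]

print(wald_estimate(z_confounded=False))  # close to the true -0.05
print(wald_estimate(z_confounded=True))   # biased well away from -0.05
```

Under these assumed parameters the confounded instrument does not merely add noise; it shifts the estimate systematically, which is the bias the article describes.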

Even after more than a quarter century of IV research in health care, most instrumental variables still violate the assumption that the IV‐Outcome relationship is unconfounded.2, 3 Studies that use IVs also ignore, or accept inadequate measures of, countless confounders (eg, health status, clinical risk factors, geography, procedure volume, access to care, and SES—part of a longer list we discuss below).3

2. RESPONSES TO SEVERAL OF KEELE AND SMALL'S STATEMENTS ABOUT OUR ARTICLE

  1. Keele et al state that we “generally argue against [the use of IVs].” We agree that we are deeply skeptical of most uses of IVs, but we do not reject all of them. For instance, we believe that some lotteries (eg, the draft board lotteries in the 1960s)8, 9 and other randomized experiments usually serve as valid IVs. The authors, however, equate our and Cochrane's rejection of most IVs as a weak research design10, 11 with “throwing the baby out with the bathwater.” The IV mentioned above (distance to the hospital) and other IV favorites, such as regional variations in health care or variation in care by hospitals or physicians, are untrustworthy for the obvious reason that they do not “randomize” and are biased by a multitude of likely confounders.3

  2. We disagree with Keele and Small's statement that our previous article failed to provide evidence that the IV in Sanwald and Schober is invalid.2, 7 In fact, the IV they use (again, distance to a medical facility) could only distinguish hospitals with or without cardiac catheterization laboratories. It ignored other medical interventions for acute MI as well as other unmeasured differences in care quality that affect mortality.12 Absent knowledge of the causes of mortality, the IV provides no useful information.5 It also suffers from many other confounders of distance IVs, a sample of which is reproduced in Table 1 below. For example, the landmark catheterization laboratory study by McClellan et al5 showed that the small “mortality effects” of cardiac catheterization occurred before exposure to the procedure, an illustration of the reverse causality seen in many cross‐sectional IV studies. The “randomization” assumption was violated. The “effect” was due to many confounders of distance, such as rurality, SES, health, and access to other treatments, like medications.

Table 1.

Instrumental variables and categories of associated confounders (with example and variables)

Instrument categories: distance to facility, regional variation, facility variation, and physician variation. The confounder categories below violate IV assumptions (ie, they are associated with both the IV and mortality).

  • Geographic location: Urban/rural, U.S. region, and absolute distance (for relative difference) (distance instrument); urban/rural and U.S. region (regional variation instrument); urban/rural (facility variation instrument)

  • Patient: Race, education, income, age, insurance status, health status/comorbid conditions, and health behaviors (all four instruments; sex additionally for the physician variation instrument)

  • Provider supply: Number of hospital beds or nursing homes

  • Technology adoption and utilization: Invasive cardiac procedures, radical prostatectomies, prescribing behavior, and practice patterns

  • Treatment: Receipt of other treatments, time to treatment, and transfer status (distance and regional variation instruments); receipt of other treatments (facility and physician variation instruments)

  • Facility: Procedure volume, facility volume, clinical services offered, departments, teaching status, profit status, trauma designation, delivery system type, and practice type

  • Physician: Use of other treatments

  • Health system: Reimbursement policies, plus the regional and facility variation confounders, that is, geographic location, provider supply, technology adoption and utilization, and facility characteristics

There is an overwhelming bias toward undercounting confounders: many are unknown, many are unmeasured or unmeasurable, and all are potential threats to the validity of IV results.

Adapted from: Garabedian et al.3 Printed with permission from Annals of Internal Medicine.

This misuse of IVs (generally, weak cross‐sectional associations) has consequences beyond academic debates. Inappropriate research methods offer no or false guidance for policy makers and can even cause wasteful resource allocation and patient harms.13

3. COMMENTS ON KEELE AND SMALL'S DEFENSE OF IVS

  1. The authors offer the general IV justification that IVs are a “haphazard element” that “randomizes” or “as‐if” randomizes treatment (eg, distance from the hospital randomizes catheterization for acute MIs). Oddly for a defense, they admit this assumption “cannot be tested with data,” but suggest it can be examined via “falsification tests” that check whether analyses of a few possible confounding variables change the conclusion. Unfortunately, IVs are confounded by numerous known and unmeasured variables.3, 14 Many powerful confounders are routinely ignored by IV proponents and are rarely included in the study data.

    For example, in the study mentioned above, in which investigators examined the effects of distance to hospitals with catheterization laboratories on acute MI mortality,5 the Medicare datasets do not capture the use of life‐saving drug treatments in hospitals, such as aspirin and reperfusion therapy.15 But hospital medication use covaries with the IV (distance to the hospital), rurality, and mortality.16 Thus, the absence of data on life‐saving hospital medications violates a basic assumption of IV use and leaves open a strong alternative pathway to mortality.

  2. For another example of critical but unmeasured variables, consider research on rural health outcomes, which routinely highlights the hundreds of associations of health‐related (potentially confounding) variables with the IV, distance to the hospital, and mortality.17, 18, 19 These variables are absent from the IV dataset in over half of distance IV/mortality studies,3 nor can they be wished away by a limited subgroup analysis of a few variables. Thus, if a rural resident (ie, one “distant” from a hospital) is poor, less educated, cannot get an ambulance quickly, and is dying of a heart attack in transit, IV users treat the death as a consequence of lack of access to revascularization in the hospital rather than of poor SES, poor health, limited rural ER access, or deficits in the many alternative medical treatments they do not examine (see Figure 1).3

    Figure 1. The arrows to IV‐Outcome confounders show that confounders, such as SES, poor health, and rural residence, may cause hospital mortality, possibly more than the hypothesized access to catheterization laboratories. [Arrows indicate associations.] The IV is biased if the IV and outcome are related through an unadjusted third variable. Adapted from: Garabedian et al.3 Printed with permission from Annals of Internal Medicine.
  3. Keele and Small also discuss the comparative strength of IVs vs “other feasible designs.” But the leading research design texts show that only a few quasi‐experimental designs protect against threats to internal validity, such as history and selection.13, 20, 21 Cross‐sectional, correlational IV studies do not control for any of these biases.
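The limits of the “falsification test” idea in point 1 can be illustrated with a small sketch (hypothetical data and names, not from the article): checking whether a candidate instrument is associated with observed covariates. A strong association flags a violated IV assumption, but a clean result for measured covariates says nothing about the unmeasured ones, which is the heart of the critique.

```python
# Hedged sketch of an instrument "balance" / falsification check.
# Variables (distance, rural) are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
rural = rng.integers(0, 2, size=n).astype(float)      # an observed covariate
distance = 5 + 20 * rural + rng.exponential(10, n)    # instrument tied to rurality

def balance_check(instrument, covariate) -> float:
    """Absolute correlation between a candidate instrument and a covariate."""
    return abs(np.corrcoef(instrument, covariate)[0, 1])

print(balance_check(distance, rural))   # large: the IV assumption is suspect

random_z = rng.normal(size=n)           # a genuinely randomized instrument
print(balance_check(random_z, rural))   # near zero for this *measured* covariate,
                                        # yet says nothing about unmeasured confounders
```

Passing such a check on a handful of measured variables is necessary but never sufficient, since the assumption concerns all confounders, measured or not.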

4. TWO STUDIES ILLUSTRATING THE BIASES CAUSED BY CONFOUNDED IVS

It is instructive to expand on the example of the landmark JAMA study by McClellan et al.5 It is the most cited IV study extant, finding a small 5 percent reduction in mortality among acute MI patients with closer relative distance to hospitals with cardiac catheterization laboratories, and it spurred a large increase in similar IV studies over the following 25 years.3 Ironically, the study provides solid evidence for the invalidity of such methods, even before many researchers attempted to replicate the unreliable findings:

  • The more distant patients already experienced the entire reported 5 percent excess mortality relative to closer patients within the first day of hospitalization—largely before any treatment or mortality benefits could occur. This small difference, an instance of reverse causality, was due to IV‐Outcome confounding and nonequivalence of the groups.

  • The more distant patients were sicker, not randomly equivalent to closer patients. For example, distant MI patients receiving fewer cardiac procedures were almost 10 times more likely to have a rural residence,5 a predictor of MI mortality.3 Also, more distant patients were about half as likely to be admitted to a high‐volume hospital, a major predictor of better outcomes. These and other variables were associated with both mortality and distance to the hospital, a violation of IV assumptions.

  • We also found that rural residents (the majority of “distant patients”) are less likely to receive life‐saving medications for acute MI than closer patients.15

5. IV STUDY SUGGESTING THE MORTALITY CONSEQUENCES OF ADVANCED AMBULANCES

  • Perhaps the most dramatic example of heroic assumptions is Sanghavi et al's10, 14, 22, 23 use of an equally biased IV: a cross‐sectional comparison, by county, of the rate of use of advanced vs basic ambulances (the latter with less life‐saving equipment and less trained personnel) to estimate effects on survival among Medicare patients transported to the hospital. Moreover, many IV‐Outcome confounders were poorly measured in the Medicare claims, such as illness severity, which is often the real cause of increased mortality, not the dispatching of advanced ambulances.

  • Emergency medical clinicians often triage critically ill patients to advanced life support ambulances.14 The figure below, from a large study in Washington State, shows only a few of the illnesses patients had at the time they were transported by advanced vs basic ambulances. Those transported in advanced ambulances with trained personnel were already more likely to die before they reached the hospital: an example of a correlational IV study with reversed cause and effect (Figure 2).

    Figure 2. Several serious conditions of patients transported in advanced life support vs basic life support ambulances. Source: Prekker et al.27

Despite these severe limitations, the article's authors proclaimed the national policy implications of their study, even calculating that many millions of dollars could be saved if the United States abandoned the more expensive advanced ambulances. Such overconfident interpretation of a flawed correlational study is yet another example of an IV study that, if believed by policy makers, could have unintended adverse health consequences by defunding life‐saving interventions.

6. POWERFUL CONFOUNDERS OF THE MOST COMMON IV‐MORTALITY STUDIES IN THE ONLY SYSTEMATIC REVIEW THAT EXAMINED NON‐STUDY DATA

Methodological leaders in CER and IV recognized the crucial importance of closely examining other data sources and common sense/experience to identify IV‐Outcome confounders, such as concomitant medical treatments.24

  • To our knowledge, Garabedian et al3 published the only study that searched the worldwide literature to identify unmeasured confounders of the four most common IVs used in mortality studies (including distance to the hospital). They discovered hundreds of confounders; because the individual confounders are too numerous to display, Table 1 lists only the categories. Every IV study examined (n = 65) had at least one confounder.3 Of course, the number of true confounders is grossly underestimated because studies of many IV‐Outcome confounders have not been published.

7. IV STUDIES RELY ON WEAK DESIGNS THAT DO NOT CONTROL THREATS TO VALIDITY

In the majority of cases, observational IV studies (eg, the distance IVs that prompted these debates) only observe outcomes after treatment. They cannot measure pre‐ and post‐intervention changes in outcomes and sit at the bottom of the hierarchy of nonexperimental (or quasi‐experimental) designs (Table 2). This is why the most respected international body reviewing medical evidence, the Cochrane Collaboration, still excludes most IV designs from its systematic reviews of health interventions, based on the hierarchy of strong and weak study designs and the relative inability of cross‐sectional designs (most IV studies) to control for most biases.21

Table 2.

Hierarchy of strong and weak designs, based on capacity to control for biases (Cochrane Effective Practice and Organisation of Care [EPOC] 2017; Soumerai SB, Starr D, Majumdar SR 2015; Naci H, Soumerai SB 2016)11, 13, 28

Strong designs: often trustworthy effects

  • Multiple RCTs: The “gold standard” of evidence, incorporating systematic review of all RCTs of an intervention (eg, random assignment of smoking cessation treatment)

  • Single RCT: A single, strong randomized experiment, but sometimes not generalizable

  • Interrupted time series with control series: Baseline trends often make effects visible and control for biases. This design has two controls, the baseline trend and a control group, to measure sudden discontinuities in trend soon after an intervention

Intermediate designs: sometimes trustworthy effects

  • Single interrupted time series: Controls for trends, but no comparison group (see above)

  • Before and after with comparison group: Pre‐ and postchange using single observations. Comparability of baseline trends often unknown

Weak designs: rarely trustworthy effects (no controls for common biases; excluded from literature syntheses)

  • Uncontrolled before and after (pre‐post): Simple observations before and after an intervention, with no baseline trend or control group

  • Cross‐sectional designs: Simple correlation, with no baseline and no measure of change. Includes most IV studies

Source: Soumerai et al.29
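The interrupted time series design in Table 2 can be sketched as a segmented regression (simulated data; all names and values are illustrative, not from the article). The model fits a baseline level and trend, then estimates the level change and trend change after an intervention, the baseline‐trend control that cross‐sectional IV designs lack.

```python
# Hedged sketch of segmented regression for a single interrupted
# time series: 24 months pre-policy, 24 months post, simulated data.
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(48)
policy_start = 24
post = (months >= policy_start).astype(float)
time_after = post * (months - policy_start)  # months elapsed since the policy

# simulated outcome: baseline trend plus an abrupt level drop at the policy
y = 100 + 0.5 * months - 8.0 * post + 0.1 * time_after + rng.normal(0, 1, 48)

# design matrix: intercept, baseline trend, level change, trend change
X = np.column_stack([np.ones_like(months, dtype=float), months, post, time_after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated level change: {coef[2]:.1f}")  # near the simulated -8.0
print(f"estimated trend change: {coef[3]:.2f}")
```

Adding a parallel control series (a second set of these terms for a comparison group) yields the stronger “interrupted time series with control series” design at the top of the hierarchy.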

8. CONCLUSION

The evidence presented here should give scholars pause before relying on IVs in CER. The complexity of health, health care, and social factors almost ensures the presence of major confounding that can seldom, if ever, be addressed with incomplete datasets, unfounded assumptions, and complex but unrealistic statistics. While IVs are extraordinarily convenient, they can also be extraordinarily misleading.

Scholars and policy makers are eager to find efficiencies in medicine. Health care consumes almost one‐fifth (18 percent) of the gross domestic product.25 But we cannot rely on research methods that often mislead us because of unmeasured and ignored variables. Instead, we should insist on strong experimental and quasi‐experimental research designs (such as RCTs, controlled interrupted time‐series designs, and systematic reviews) in pilot tests of expensive policies. Private and taxpayer investments should be based on solid evidence of safety and efficacy.26 We should not base policy on weak research designs that may lead to perverse effects, such as unsustainable costs and policies that fail to improve medical care.

ACKNOWLEDGMENTS

Joint Acknowledgment/Disclosure Statement: Parts of this article were presented at the International Society for Pharmacoepidemiology (ISPE) 10th Asian Conference on Pharmacoepidemiology on October 29, 2017, in Brisbane, Australia. The title of the presentation was: “Instrumental Variable Analysis and Interrupted Time Series Analysis in Health Policy Research: You Can't Fix by Adjustment What You Bungled by Design.”

