Skip to main content
International Journal of Epidemiology logoLink to International Journal of Epidemiology
. 2017 May 30;46(5):1627–1632. doi: 10.1093/ije/dyx090

Power calculator for instrumental variable analysis in pharmacoepidemiology

Venexia M Walker 1,2,*, Neil M Davies 1,2, Frank Windmeijer 3,2, Stephen Burgess 2,4,5, Richard M Martin 1,2
PMCID: PMC5837396  PMID: 28575313

Abstract

Background

Instrumental variable analysis, for example with physicians’ prescribing preferences as an instrument for medications issued in primary care, is an increasingly popular method in the field of pharmacoepidemiology. Existing power calculators for studies using instrumental variable analysis, such as Mendelian randomization power calculators, do not allow for the structure of research questions in this field. This is because the analysis in pharmacoepidemiology will typically have stronger instruments and detect larger causal effects than in other fields. Consequently, there is a need for dedicated power calculators for pharmacoepidemiological research.

Methods and Results

The formula for calculating the power of a study using instrumental variable analysis in the context of pharmacoepidemiology is derived before being validated by a simulation study. The formula is applicable for studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome. An online calculator, as well as packages in both R and Stata, are provided for the implementation of the formula by others.

Conclusions

The statistical power of instrumental variable analysis in pharmacoepidemiological studies to detect a clinically meaningful treatment effect is an important consideration. Research questions in this field have distinct structures that must be accounted for when calculating power. The formula presented differs from existing instrumental variable power formulae due to its parametrization, which is designed specifically for ease of use by pharmacoepidemiologists.

Keywords: Pharmacoepidemiology, instrumental variable, power, binary exposure, continuous outcome, prescribing preference

Introduction

Pharmacoepidemiological studies risk irrelevance if they are insufficiently powered to detect clinically meaningful treatment effects. Before starting a study, the statistical power to calculate a given treatment effect can be calculated. This type of calculation is becoming increasingly important for grant and data request applications, which look to value the contribution of such studies.

The number of pharmacoepidemiology studies using instrumental variable analysis, for example with physicians’ prescribing preferences as an instrument for exposure, continues to grow.1–6 This is partly because instrumental variable analyses have the potential to overcome some of the issues associated with conventional statistical approaches, such as residual confounding and reverse causation. As the demand to provide power calculations to support applications increases, there is a more pressing need to be able to provide power calculations for this method.

There are power calculators for instrumental variable analysis in other settings, such as Mendelian randomization, which uses germline genetic variants as proxies for exposures in disease-related research.7,8 However, pharmacoepidemiological research questions have distinct structures that are not sufficiently catered for by these existing calculators. Unlike Mendelian randomization studies, which often use a case-control study design, pharmacoepidemiology studies typically use a cohort study design. Further to this, pharmacoepidemiology studies usually report a risk difference for a binary exposure using a binary instrument, whereas Mendelian randomization studies report on a continuous exposure using a discrete or continuous genetic instrument (count of alleles or allele score respectively). As a result of these differences, as well as the stronger instruments and larger causal effects seen in pharmacoepidemiology, there is a need for a dedicated power calculator for instrumental variable analysis in the context of this field.

This paper will address how to conduct power calculations for pharmacoepidemiological studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome. The formula to calculate power will be derived and then validated by a simulation study. The formula is distinct from existing instrumental variable power formulae due to its parametrization, which is designed specifically for ease of use by pharmacoepidemiologists. An online calculator, as well as packages in both R and Stata, are provided for the implementation of the formula by others.

Methods and Results

Let us consider physicians’ prescribing preferences for two different treatments–for example a treatment of interest and a control treatment–as an instrument for exposure to these treatments. Physicians’ preferences are generally not directly observable, so each physician’s prescriptions to previous patients are used as a proxy for their preferences. This results in a binary instrument that takes a value of one if the physician issued a prescription for the treatment of interest to their previous patient and a value of zero if they prescribed the control treatment. We will derive the formula for the power of studies that use this instrument to measure the causal effect of a drug exposure on a continuous outcome, for example systolic blood pressure or low-density lipoprotein cholesterol.

Formula derivation

The instrumental variable analysis we consider requires the following three variables; namely a binary instrument Z, a binary exposure X and a continuous outcome Y. The outcome for patient i, for i=1,,n, is modelled as follows:

Yi=α+βXi+Ui

where Ui is a zero-mean error term containing unobserved confounders, determining both the outcome Yi and the treatment Xi. The instrument Zi affects treatment Xi, but is not associated with the unobserved confounders and has no direct effect on the outcome.

Let Y~i=YiY¯, X~i=XiX¯ and Z~i=ZiZ¯, where Y¯, X¯ and Z¯ are sample averages. Denote by y~, x~ and z~ the n-vectors of observations on Y~i, X~i and Z~i, respectively. The two-stage least squares (2SLS) estimator of β is then given by:

β^=(z~x~)1z~y~.

The variance of the 2SLS estimator is:

Var(β^)=σ2(x~Pz~x~)1

where Pz~=z~(z~z~)1z~ and σ2=E(Ui2) is the residual variance. Note that conditional homoscedasticity holds, so the variance is constant for all values of the instrument i.e. E(Ui2)=E(Ui2|Zi)=σ2 for i=1,,n.

Consider the term x~Pz~x~:

x~Pz~x~=x~z~(z~z~)1z~x~=n(x~z~n)(z~z~n)1(z~x~n)

Let pZ=P(Z=1), pX=P(X=1) and pXZ=P(X=1|Z=1). In large samples:

(z~z~n)Var(z~)=pZ(1pZ)(x~z~n)=(xznXZ¯)pZ(pXZpX)

Hence x~Pz~x~ can be presented in the following way:

x~Pz~x~n(pZ(pXZpX))2pZ(1pZ)

Now consider the instrumental variable estimator of β. Using the asymptotic distribution β^N(β,σ2(x~Pz~x~)1), the distribution of the t-test statistic under the null hypothesis H0:β=β0 is:

t=β^β0σ(x~Pz~x~)1N(0,1)

The distribution of the test statistic under the alternative hypothesis H1:β=β0+δ is:

t=β^β0σ(x~Pz~x~)1=β^β0δσ(x~Pz~x~)1+δσ(x~Pz~x~)1N(δσ(x~Pz~x~)1,1)

The null hypothesis is rejected if |t|>cα where cα is the critical value at significance level α.

The power is the probability that the test statistic will exceed the critical value, which is:

P(t>cα)+P(t<cα)= Φ(cα+δσ(x~Pz~x~)1)+Φ(cαδσ(x~Pz~x~)1)

where Φ(s) is the cumulative standard normal distribution function evaluated at s. Power therefore increases as the value of σ decreases and/or the value of x~Pz~x~ increases. By substituting x~Pz~x~ and simplifying, we obtain the following formula for power:

Power= Φ(cα+δ(pZ(pXZpX))nσpZ(1pZ))+Φ(cαδ(pZ(pXZpX))nσpZ(1pZ))

The formula requires a total of seven parameters to be specified. This includes four parameters that must always be specified–these are the significance level, α; the size of the causal effect, δ; the residual variance, σ2=E(Ui2); and the sample size, n. Also three that can be chosen from the following four parameters–these are the frequency of the instrument, pZ=P(Z=1); the frequency of exposure, pX=P(X=1); the probability of exposure given the instrument Z=1, pXZ=P(X=1|Z=1); and the probability of exposure given the instrument Z=0, pXZ=P(X=1|Z=0). The chosen parameters must be specified so that the following holds:

P(X=1)=P(X=1|Z=0)P(Z=0)+P(X=1|Z=1)P(Z=1)

The formula for power is available for use via an online calculator [https://venexia.shinyapps.io/PharmIV/] and packages for R and Stata can be downloaded from GitHub [https://github.com/venexia/PharmIV].

Note that the frequency of exposure in an instrumental variable analysis of this type is likely to be higher than in a general population study because a drug is compared against one or more other drugs in a population of people with the indication for these treatments. General population studies, on the other hand, tend to compare a population who received the drug of interest with a population who did not receive it, and consequently the frequency of exposure is generally much lower. The effect of varying the parameters within the formula on a study’s power is best presented graphically. Figure 1 illustrates an example of the effect of the frequency of the exposure pX=P(X=1) on the power of a study to detect a causal effect of δ=0.150 using an instrument with a frequency of pZ=0.200, a residual variance of σ2=1 and a sample size of up to 30 000 participants. Both increasing the frequency of exposure up to 50% and increasing the sample size results in increased power for this study.

Figure 1.

Figure 1

Power curves for several values of the frequency of exposure pX=P(X=1) that show the effect on the power of a study to detect a causal effect of δ=0.150 using an instrument with a frequency of pZ=0.200, a residual variance of σ2=1 and a sample size of up to 30 000 participants.

Formula validation

To validate the power formula, we conducted a simulation. We simulated the data by defining the three variables necessary to conduct instrumental variable analysis with a single instrumental variable as follows:

Instrument:ZiBinomial(1,pZ)Exposure:Xi{0,ifc0+Zi(c1c0)+Vi01,ifc0+Zi(c1c0)+Vi>0Outcome:YiδXi+Ui

where pZ=P(Z=1) is the frequency of the instrument, cj=Ф1(P(X=1|Z=j)) for j=0,1 are the inverse cumulative standard normal distribution, or quantile, functions of the conditional probabilities of exposure given the instrument, δ is the causal effect, and Ui and Vi are standard normally distributed error terms with covariance ρ.

The formula uses a binary instrument, binary exposure and continuous outcome and so the above variables were simulated to recreate data of this form. The instrument Z is modelled by a binomial distribution parameterized by its frequency pZ=P(Z=1). This ensures a binary variable with the correct probability of success. The exposure X is also binary but is modelled using a threshold model. The variability in the equation for the exposure comes from the normally distributed error term Vi. The use of the model equation allows the exposure X to be associated with the instrument Z. The outcome Y is modelled by its model equation Yi=δXi+Ui. In the model, the instrument is valid as the outcome Y is only associated with the exposure X, as dictated by the causal effect δ, and is not associated with the instrument Z other than through the exposure X.

Using the generated data, we performed an instrumental variable analysis using the command IVREG2 in Stata.9 From this analysis, we recorded the coefficient of the exposure X with the 95% confidence interval. We then counted the number of simulations for which the confidence interval excluded the null, and divided this by the total number of simulations to determine the power. By running the simulation and calculating the formula using the same parameters, we are able to validate the formula against the simulation.

We present the power calculated from both the simulation and the formula for several parameter combinations in Table 1. The table contains 27 different simulations and each was repeated 10 000 times. The simulations consider each combination of three values of the frequency of exposure, pX=0.100,0.250,0.500; three values of the probability of exposure given the instrument Z=1, pXZ=0.150,0.300,0.450; and three values of the sample size, N=10000,20000,30000. We set the frequency of the instrument, pZ=0.200; the causal effect, δ=0.150; the residual variance, σ2=1; and calculated P(X=1|Z=0) according to the following equation:

P(X=1|Z=0)=P(X=1)P(X=1|Z=1)P(Z=1)1P(Z=1)=pXpXZpZ1pZ

The effect of confounding was removed as a parameter because the power was insensitive to its value in the simulation setting. Details of the simulations conducted to test this can be found in Supplementary File 1, available as Supplementary data at IJE online. The Stata code for this paper, including that used to create the simulation, is available from GitHub [https://github.com/venexia/PharmIV].

Table 1.

A comparison of the power calculated from the formula and a validation simulation for an instrumental variable analysis where the causal effect δ=0.150, the frequency of the instrument pZ=0.200 and the residual variance σ2=1

pX pXZ 10 000 patients
20 000 patients
30 000 patients
Formula Simulation Formula Simulation Formula Simulation
0.100 0.150 6.6% 6.1% 8.3% 7.9% 10.0% 9.8%
0.300 32.3% 33.3% 56.4% 55.5% 73.8% 73.9%
0.450 74.7% 75.6% 96.0% 95.9% 99.5% 99.5%
0.250 0.150 11.7% 11.4% 18.6% 18.3% 25.5% 25.5%
0.300 6.6% 5.4% 8.3% 7.9% 10.0% 9.8%
0.450 32.3% 32.8% 56.4% 56.1% 73.8% 73.7%
0.500 0.150 74.7% 74.2% 96.0% 95.9% 99.5% 99.6%
0.300 32.3% 32.5% 56.4% 57.1% 73.8% 73.7%
0.450 6.6% 5.0% 8.3% 7.2% 10.0% 10.1%

Simulation results

The formula and the simulation consistently provide similar results, with an absolute mean difference of 0.4% for the parameter combinations presented in Table 1. There is also no discernible pattern in the differences, suggesting systematic bias is not present. Further to this, the power is consistent with its behaviour in other established power calculations. For example, increasing sample size universally improves power for all parameter combinations.

Discussion

In this paper, we have derived the formula necessary to calculate power for instrumental variable analysis with a single binary instrument, binary exposure and continuous outcome in the context of pharmacoepidemiology. The formula has been shown to be valid by comparison against a simulation study, which concluded that the formula provided near true values across a range of realistic parameters.

We acknowledge that there is some overlap of this calculator with existing calculators such as that proposed by Burgess for Mendelian randomization.7 Although both calculators ultimately have a shared aim, namely to calculate the power of a study using an instrumental variable analysis, their application makes the calculators distinct. This is evident from the choice of parameterization of the power formula. The Mendelian randomization calculator opts for the coefficient of determination, R2. This is a natural choice for the application, as it summarizes the proportion of the variance expected to be explained by genetic factors. In contrast, we have opted to parameterize our calculator for use in pharmacoepidemiological studies in terms of the frequency of the instrument, the frequency of the exposure and the conditional probabilities that relate them. This is more intuitive to a pharmacoepidemiological audience who will typically use the proportion of patients exposed, i.e. the frequency of exposure. In addition to this, the instruments typically used in this framework–for example, physicians’ prescribing preference–do not necessarily fit as naturally to the notion of variance explained and are summarized much more easily by their frequency and their relationship with exposure.

A concern for any instrumental variable analysis, whether in the context of pharmacoepidemiology or not, is the strength of the instrument. Instruments are termed weak when the correlation between the instruments and the exposure is low.10 A commonly cited threshold is a partial F statistic of the association between the instrument and the exposure of less than 10.8,11 Weak instruments will result in low power to detect a causal effect.12–14 They are also known to induce bias, as such instruments may explain only a small proportion of the association between the exposure and outcome. Therefore, although pharmacoepidemiological studies are likely to have stronger instruments than other forms of instrumental variable analysis such as Mendelian randomization, researchers should remain mindful of their choice of instrument and whether it is appropriate for the research question they wish to study.

As for any power formula, the formula presented here is limited by its parameters, which simplify the dataset being considered. Power calculated from such formulae cannot account for dataset characteristics outside these parameters. For example, the formula makes no allowance for the presence of missing data–a known limiting factor in the power of a study. By allowing for missing data in the anticipated sample size, conservative estimates for the power of a study can be obtained using the formula presented. Further work is needed in order to establish the formula for power in other scenarios that use instrumental variable analysis within a pharmacoepidemiology context. This includes analyses with binary outcomes and analyses that involve multiple instrumental variables.

As the use of instrumental variable analysis in pharmacoepidemiology becomes more commonplace, there is an increasing need to provide power calculations for studies using this type of analysis. To provide such information, accessible and accurate power formulae need to be made available. By using the formula presented here, either directly or via the online tool and packages in R and Stata, it is hoped that pharmacoepidemiologists can calculate the power of instrumental variable analysis studies with a single binary instrument, binary exposure and continuous outcome with ease.

Supplementary Data

Supplementary data area available at IJE online.

Funding

This work was supported by the Perros Trust and the Integrative Epidemiology Unit. The Integrative Epidemiology Unit is supported by the Medical Research Council and the University of Bristol [grant number MC_UU_12013/9]. S.B. is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z).

Conflict of interest: None to declare.

Key Messages

  • Research questions using instrumental variable analysis in pharmacoepidemiology have distinct structures that have previously not been catered for by instrumental variable analysis power calculators.

  • Power can be calculated for studies using a single binary instrument to analyse the causal effect of a binary exposure on a continuous outcome in the context of pharmacoepidemiology using the presented formula, an online power calculator or packages available for use in both R and Stata.

  • The use of this power calculator will allow investigators to determine whether a pharmacoepidemiology study is likely to detect clinically meaningful treatment effects before the study’s commencement.

Supplementary Material

Supplementary Data

References

  • 1. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology 2006;17:360–72. [DOI] [PubMed] [Google Scholar]
  • 2. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol 2000;29:722–29. [DOI] [PubMed] [Google Scholar]
  • 3. Davies NM, Gunnell D, Thomas KH, Metcalfe C, Windmeijer F, Martin RM. Physicians’ prescribing preferences were a potential instrument for patients’ actual prescriptions of antidepressants. J Clin Epidemiol 2013;66:1386–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf 2010;19:537–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Brookhart MA, Wang P, Solomon DH, Schneeweiss S. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology 2006;17:268–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Davies NM, Davey Smith G, Windmeijer F, Martin RM. COX-2 selective nonsteroidal anti-inflammatory drugs and risk of gastrointestinal tract complications and myocardial infarction: an instrumental variable analysis. Epidemiology 2013;24:352–62. [DOI] [PubMed] [Google Scholar]
  • 7. Burgess S. Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol 2014;43:922–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Freeman G, Cowling BJ, Schooling CM. Power and sample size calculations for Mendelian randomization studies using one genetic instrument. Int J Epidemiol 2013;42:1157–63. [DOI] [PubMed] [Google Scholar]
  • 9. StataCorp. Stata Statistical Software. College Station, TX: StataCorp LP, 2015. [Google Scholar]
  • 10. Bound J, Jaeger DA, Baker RM. Problems with instrumental variables estimation when the correlation between the instruments and the endogeneous explanatory variable is weak. J Am Stat Assoc 1995;90:443–50. [Google Scholar]
  • 11. Hernán M, Robins J. Causal Inference. Boca Raton, FL: Chapman & Hall/CRC, forthcoming. [Google Scholar]
  • 12. Rassen JA, Brookhart MA, Glynn RJ, Mittleman MA, Schneeweiss S. Instrumental variables I: instrumental variables exploit natural variation in nonexperimental data to estimate causal relationships. J Clin Epidemiol 2009;62:1226–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Rassen JA, Brookhart MA, Glynn RJ, Mittleman MA, Schneeweiss S. Instrumental variables II: instrumental variable application—in 25 variations, the physician prescribing preference generally was strong and reduced covariate imbalance. J Clin Epidemiol 2009;62:1233–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Davies NM, von Hinke Kessler Scholder S, Farbmacher H, Burgess S, Windmeijer F, Davey Smith G. The many weak instruments problem and Mendelian randomization. Stat Med 2015;34:454–68. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

Articles from International Journal of Epidemiology are provided here courtesy of Oxford University Press

RESOURCES