Estimating cost-offsets of new medications: Use of new antipsychotics and mental health costs for schizophrenia

A James O'Malley; R G Frank; S-L T Normand

doi:10.1002/sim.4245

. 2011 Apr 26;30(16):1971–1988. doi: 10.1002/sim.4245

Estimating cost-offsets of new medications: Use of new antipsychotics and mental health costs for schizophrenia

A James O'Malley ^a,^*,^†, R G Frank ^a,^b, S-L T Normand ^a,^c

PMCID: PMC3601436 PMID: 21520217

Abstract

Estimation of the effect of one treatment compared to another in the absence of randomization is a common problem in biostatistics. An increasingly popular approach involves instrumental variables—variables that are predictive of who received a treatment yet not directly predictive of the outcome. When treatment is binary, many estimators have been proposed: method-of-moments estimators using a two-stage least-squares procedure, generalized-method-of-moments estimators using two-stage predictor substitution or two-stage residual inclusion procedures, and likelihood-based latent variable approaches. The critical assumptions to the consistency of two-stage procedures and of the likelihood-based procedures differ. Because neither set of assumptions can be completely tested from the observed data alone, comparing the results from the different approaches is an important sensitivity analysis. We provide a general statistical framework for estimation of the casual effect of a binary treatment on a continuous outcome using simultaneous equations to specify models. A comparison of health care costs for adults with schizophrenia treated with newer atypical antipsychotics and those treated with conventional antipsychotic medications illustrates our methods. Surprisingly large differences in the results among the methods are investigated using a simulation study. Several new findings concerning the performance in terms of precision and robustness of each approach in different situations are obtained. We illustrate that in general supplemental information is needed to determine which analysis, if any, is trustworthy and reaffirm that comparing results from different approaches is a valuable sensitivity analysis. Copyright © 2011 John Wiley & Sons, Ltd.

Keywords: bivariate likelihood, instrumental variables, medicaid cost data, method-of-moments, simultaneous equations, two-stage regression

1. Introduction

Estimation of the effect of one treatment compared to another in the absence of randomization is a common problem in biostatistics. With more emphasis placed on value in the health care setting illustrated with increased funding for comparative effectiveness research in the United States ¹, researchers are increasingly utilizing observational studies to learn about effectiveness of interventions. It is well understood that a simple comparison of average outcomes between treatment arms will potentially confound the treatment effect with various selection effects (associations of predictors with treatment). If the treatment assignment mechanism depends on unmeasured variables affecting the outcome of interest (unmeasured confounders) then regression adjustment and propensity score methods ² may fail to account for selection effects. In this case, instrumental variables methods may provide a pathway to causality. An instrumental variable (IV) is a random variable that is predictive of the treatment a patient receives but uncorrelated with the outcome conditional on treatment ³.

Despite the existence of the various estimators for IV analysis, there is little research on their comparative operating characteristics, and far less on empirical experience in real world settings. A compelling issue is that traditional instrumental variables methods are invariant to the form of the data (continuous versus binary outcome, continuous versus binary treatment) prompting the question of whether one can do better by tailoring methods to a given situation. Terza et al. ⁴ proposed a residual inclusion method for cases when the outcome or selection equation is nonlinear (e.g. as in generalized linear models). Another important consideration is that traditional IV methods do not utilize parametric assumptions, perhaps surprisingly so given the recent explosion in the adoption of latent variable models derived using parametric assumptions (e.g. IRT models, Rasch models, latent class models, latent factor models). An exception is the bivariate probit model, which is generated from assumptions on the underlying latent variables ⁵. We study the traditional IV methods, the residual inclusion method, and the latent variable approach to IV in the context of evaluating whether newer antipsychotic drugs are less costly than their predecessors. We focus on the estimation of the causal effect of a binary treatment on a continuous outcome.

Our research is motivated by the problem of comparing mental health spending between schizophrenia patients using newer atypical antipsychotic medications and those using older conventional antipsychotic medications in Florida's Medicaid population over the period 1994–2001. The older drugs, which are D2-antagonists such as chlorpromazine and haloperidol, were introduced in the 1950s to alleviate hallucinations and delusions in psychotic patients. Atypical antipsychotics, including clozapine, olanzapine (trade name zyprexa), quetiapine (trade name seroquel), and risperidone (trade name risperidal), were first marketed in the late 1980s and 1990s, and while considerably more expensive than the D2-antagonists, were associated with a different profile of side effects. While the conventional antipsychotics were associated with neurologic side effects, the newer atypicals have been linked to other side effects such as weight gain, diabetes, and lipid problems. During our study observation period, three atypicals were introduced—zyprexa, seroquel, and geodon. Some have claimed that atypical antipsychotics, while more expensive ultimately pay for themselves by leading to reductions in other types of health spending ⁶. This claim has come to be known as the offset hypothesis. The offset hypothesis asserts that the greater tolerability of the new antipsychotics will improve adherence to treatment regimens, thereby reducing relapses, resulting in declines in the use of hospital and emergency room services. However, it is disputed whether lower subsequent costs for atypicals are sufficiently large as to offset their greater upfront cost ⁷.

Study of the offsets hypothesis is complicated by the fact that patients that receive the newer atypical drugs likely differ from those getting the older drugs on a number of systematic factors that may not be fully measured. These include existing medical and mental health comorbidities, severity of illness, and treatment preferences.

We utilize variation in the availability of atypical drugs across the state of Florida that arises because the time-lag between Federal approval and local availability varies by geographic area to form instrumental variables. The instruments are indicators of whether a specific atypical was available in a patient's geographic area of residence defined as one of 11 area Medicaid offices representing geographic, cultural, social, and economic factors in a given year. Using these instruments we illustrate several different estimators that account for unmeasured selection effects to test the offsets hypothesis in the Florida Medicaid population. Our goal involves quantifying the evidence for or against the offsets hypothesis using multiple approaches encompassing different assumptions, thereby enabling one approach to act as a sensitivity analysis for another and yielding real-world experience of the extent to which methodological concerns about the various approaches matter. We also use simulations to evaluate the operating characteristics of the various methods when assumptions hold and when they are violated.

We next define notation, describe assumptions, and introduce models. General methods for estimation are detailed in Section 3 and implemented on the Florida Medicaid data in Section 4. Section 5 describes a simulation study to evaluate the operating characteristics of the methods when assumptions hold and when they are violated. We provide concluding remarks in Section 6.

2. Statistical models

2.1. Notation and definitions

We use simultaneous equations to specify models and the potential outcomes nomenclature of ⁸ to define treatment effects. Let y_i, z_i, x_i, u_i, and c_i denote the outcome, the treatment variable, a vector of exogenous covariates, a vector of instrumental variables, and an unmeasured confounding variable for the ith of n subjects.

The instrumental variables u_i are assumed to be: (1) associated with z_i conditional on x_i, (2) uncorrelated with y_i conditional on (z_i, c_i, x_i), and (3) uncorrelated with c_i conditional on x_i ⁹^,¹⁰. Assumption (2) says that there is no direct effect of u_i on y_i (the exclusion restriction), while assumption (3) says that u_i shares no common causes with y_i (i.e. u_i is uncorrelated with any unmeasured variables that predict y_i). If assumption (3) is violated then u_i may be related to y_i through an uncontrolled confounding variable ¹¹, thereby introducing bias. In models where y_i is modeled with an explicit error term, ε_{y, i}, assumptions 2 and 3 reduce to the assumption that u_i and ε_{y, i} (which includes c_i) are uncorrelated conditional on (z_i, x_i). Although x_i and c_i might predict both y_i and z_i and so structurally are equivalent, c_i is problematic because it is unobserved. Controlling for x_i generally makes assumptions (2) and (3) above more believable by controlling for variation in unmeasured confounders that is correlated with x_i ¹².

The annual mental health spending for patient i, denoted cost_i, is the sum of all payments made for services with mental health diagnoses, mental health procedures (e.g. psychotherapy), or psychotropic drugs that are primarily used for mental health treatment such as antidepressants and mood stabilizers. The distribution of cost_i is right skewed. As discussed in Section 4.1, Box–Cox transformations under various models indicated that the log-transformation traditionally used for spending data to account for right-skew would be reasonable. Accordingly, y_i = log(cost_i). Because all patients in the data set received services from a health care provider, the 29 observations with cost_i = 0 were considered impossible and excluded from the analysis.

The treatment z_i is a binary-valued indicator of whether a patient filled an atypical (z_i = 1) or a conventional antipsychotic (z_i = 0) prescription in a given year. If a patient filled both we assigned them to the drug that accounted for the greatest share of their health costs for that year. Thus, z_i is defined in the same year as cost_i. In a sensitivity analysis we restricted the data to new users (i.e. those who initiated treatment with an antipsychotic during our study period) and the first year of data on each individual, thereby obtaining the subset of subjects for whom we could reasonably assume made their initial antipsychotic choice during the study period. This allowed us to check whether it made sense to combine new users and longer term users, and those staying on a single drug from those who switched drugs, in a single analysis. The results were minimally affected suggesting that a pooled analysis that controlled for year was justified.

The predictors in x_i are race/ethnicity, female, age, receipt of Supplementary Security Income (SSI) benefits, history of substance abuse, area of residence, and year. Variables represented by c_i could include health status of the patient, access to skilled physicians, and physician prescribing habits. The vector of instrumental variables u_i consists of the products of binary indicators of whether zyprexa, seroquel, and geodon were FDA-approved at the start of each year and the 10 area-of-residence indicators; the most populous area, Miami, was the excluded category. The variables (x_i, c_i, u_i) are all defined in the same year as cost_i and z_i.

The rationale for the above choice for u_i is that the availability of antipsychotics depends on physician learning which in turn depends on local area attitudes towards innovation, information dissemination, and other conditions that varied substantially across Florida. Thus, drug approval and area of residence are related to antipsychotic use at a given time. In order for u_i to be an appropriate instrument, it cannot be directly related to health care costs or to unmeasured confounders affecting health care costs. This would not be the case if patients with higher costs lived in areas that were faster adopters or if attitudes towards innovation, information dissemination, and other conditions directly affect costs. Thus, the inclusion of area of residence indicator variables in x_i helps make drug approval interacted with region a valid IV.

2.2. Assumed underlying model

The outcome y_i depends on treatment z_i and the exogenous predictors x_i through the linear regression equation

(1)

where ε_{y, i} has mean 0 and variance σ Inline graphic . The validity of this model relies on the existence of linear relationships, homogeneous variances, independent observations, and orthogonality between (z_i, x)^T and ε_{y, i}. In the Medicaid data, z_i and ε_{y, i} are likely to be correlated as (e.g.) detailed measures of the severity of a patient's health condition were not available, and these likely affect a patient's propensity to fill an atypical prescription and their net health spending.

A second equation describes the relationship between z_i and (u_i, x_i)

(2)

where Inline graphic . In terms of the Medicaid data, represents the patient's propensity to be prescribed an atypical antipsychotic. By assumption, the predictors on the right-hand side (rhs) of (2), including the IV u_i, are independent of ε_{y, i}.

The regression parameter β₁, the difference in the outcomes when all other factors (including those influencing ε_{y, i}) are equal, is of primary interest. When z_i is exogenous (uncorrelated with ε_{y, i}), β₁ = E[y_i(1) − y_i(0)∣x_i], where y_i(z) denotes the potential outcome for subject i when z_i = z. However, if z_i is correlated with ε_{y, i} then β₁ = E[y_i(1) − y_i(0)∣x_i, ε_{y, i}]≢E[y_i(1) − y_i(0)∣x_i].

2.3. Parametric model: structural and distributional assumptions

In parametric analyses we follow the construction of the bivariate probit model. This model assumes that the error term ε_i = (ε_{y, i}, ε_{z, i}) is an additive function of c_i, an unmeasured confounder that linearly affects Inline graphic , and (δ_{y, i}, δ_{z, i}), a random disturbance. That is, ε_i = (β₃c_i + δ_{y, i}, θ₃c_i + δ_{z, i}), where c_i, δ_{y, i}, and δ_{z, i} are mutually independent random variables each with mean 0 and variance σ, τ, and τ, respectively. Hence, ε_i has mean 0 and covariance

Because we can multiply σ Inline graphic by k, and divide β₃ and θ₃ by k^1/2 without changing the model, for model identification we set σ = 1.

Derivation of the bivariate probit is completed by assuming that c_i, δ_{y, i}, and δ_{z, i} are normally distributed, implying that ε_i is bivariate normal. Thus, the model is identified through the first and second moments of the distribution of ε_i. Because z_i is binary we can only identify the standardized effects θ₁/(θ Inline graphic + τ)^1/2 and θ₂/(θ + τ)^1/2, leading to the constraint θ + τ = 1. With three parameters and two degrees of freedom in cov (ε_i) we set θ₃ = 1 (equivalently, τ = 0) to identify the model, defining β₃ and β₃/(β + τ)^1/2 as the covariance and correlation between ε_{y, i} and ε_{z, i}, respectively. In the normal case unobserved selection is thus quantified by ρ = β₃/σ_y, where σ Inline graphic = β + τ. Clearly, ρ∈[−1, 1].

3. Estimation methods

3.1. Ordinary least squares (OLS)

Linear regression fits the model y_i = β₁z_i + β Inline graphic x_i + ε_{y, i} where var (ε_{y, i}) = σ. The least-squares estimator is , where y = (y₁, …, y_n)^T and Xis the n × p matrix with ith row (z_i, x). When ε_{y, i} is mean independent of all predictors and has homoscedastic variance (as assumed here), the estimator is minimum variance unbiased among the class of linear estimators (Gauss Markov theorem). However, if any predictor is correlated with ε_{y, i}, OLS will be inconsistent ¹³, Chapter 5.

3.2. Two-stage least squares (2SLS)

The classic IV estimator of β in (1) is the minimum variance estimator among those satisfying the constraint that u_i and ε_i are orthogonal (see Appendix A for construction). This method-of-moments estimator is equivalent to the two-stage least-squares (2SLS) procedure, in which we first fit

(3)

to obtain Inline graphic , and then fit

(4)

to estimate β. In the special case where u_i is univariate-binary and there are no other covariates, the 2SLS procedure given by (3) and (4) is equivalent to the Wald estimator ¹⁴. The standard error of Inline graphic is

(5)

where U is the matrix with ith row (u Inline graphic , x) and estimates the residual variance of the outcome equation ¹³, Section 5.2.2, ¹⁵, p. 531.

The orthogonality condition enforced in (3) holds factors affecting ε_i constant, allowing β to be estimated for those subjects for whom u_i influences z_i, the ‘population on the margin’. In the offsets analysis, the population on the margin is patients whose uptake of an atypical antipsychotic medication was influenced by the availability of zyprexa, seroquel, or geodon in the city where they lived. Thus Inline graphic is a ‘structural shift’ of using an atypical.

A notable feature of 2SLS is that no presumption is made about the type (e.g. binary, ordinal, interval) of variables that y_i and z_i are or about the distribution of ε_i. The binary nature of z_i led us to consider whether more efficient results could be obtained by accounting for the form of z_i.

3.3. Alternative two-stage approaches

Although the 2SLS estimator is consistent when the IV assumptions hold ¹⁶^,¹⁷, inferences may be inefficient because the binary form of z_i is not respected. As an alternative to 2SLS, we can replace (3) with

(6)

where Φ(·) is the cdf of the standard normal distribution. The implied (nonlinear) 2SLS procedure fits (6) using nonlinear least squares (NLS) or a generalized linear model, sets Inline graphic , and then evaluates . The interpretation of β₁ is unchanged by the nonlinear first-stage equation because z_i is the main effect in the outcome equation, not .

Following Terza et al. ⁴ we term this approach ‘two-stage predictor substitution (2SPS).’ Despite the fact that orthogonality of Inline graphic and ε is no longer enforced, 2SPS has been said to yield consistent estimates of β₁ when the outcome equation is linear and (θ₁, θ₂) is estimated consistently ⁵. However, in the case when the outcome equation is nonlinear, 2SPS has been shown to perform poorly even when the first-stage equation is linear ⁴.

In the case of a linear model for z_i, point estimates of β₁ under the model

(7)

are identical to those obtained from (4). However, when z_i depends on a nonlinear model such as (6), the effect of z_i above and beyond the effect of Inline graphic (the ‘endogenous (bad) variation’ in z_i) on y_i does not equal the effect of (the ‘exogenous (good) variation’ in z_i) on y_i. The two-stage residual inclusion (2SRI) procedure of ⁴, whose origins date to a test for endogeneity in ¹⁸, is the estimation of (6) followed by (7). It has been shown that 2SRI yields consistent estimates for linear and nonlinear models ¹³, Chapter 12.

3.4. Maximum likelihood

Assuming normality and using the parameterization of Section 2.2, it follows that

(8)

In this model ρ quantifies the extent to which unobserved factors affecting Inline graphic are correlated with those affecting y_i ¹⁹. A positive value of ρindicates atypical-favorable selection because unobserved factors that make individuals more likely to take an atypical are also more likely to have higher health costs; i.e. ignoring selection would lead to over estimation of β₁.

The joint marginal density for (y_i, z_i) is obtained by integrating over Inline graphic in the model defined by (1), (2), and (8) (see Appendix B for details). The product of these densities is the observed data likelihood function for (β, θ, ρ, σ), given by

where

(9)

(10)

for ρ≠ ± 1. The presence of µ_{y, i} and thus β in (9) and (10) precludes separate maximization of the y_i and z_i∣y_i components of the likelihood function. The two components need to be fit simultaneously in order for Inline graphic to have a structural shift interpretation. Only when ρ = 0 does it suffice to fit separate linear and probit regression models.

3.5. Bayesian inference

Bayesian analysis provides a more flexible approach to inference than maximum likelihood by incorporating prior distributions containing information about the parameters. In the absence of prior information, a non-informative prior is often reasonable.

Because it is the mechanism governing selection, ρ plays a crucial role in the offsets analysis. We are interested in the sensitivity of the results to the prior for ρ and the extent to which it characterizes the other approaches. Prior distributions for ρ that cover a wide range of levels of precision will be used.

Conditional on ρ, the other model parameters are well identified by the data. Therefore, to investigate sensitivity to the prior on ρ, we specify priors diffuse in (β₁, β₂, θ₁, θ₂, σ Inline graphic ) and with varying levels of informativeness about ρ. Specifically, we assume

where (ρ + 1)/2∼Beta(ν₁, ν₂); p(ρ) has the shape of a Beta density but its support is extended from [0, 1] to [−1, 1]. Note that Inline graphic maps the correlation coefficient to (−∞, ∞). In the special case where ν₁ = ν₂ = 1, ρ∼U(−1, 1) and p(η∣σ) = σ_y{2(1 + ση²)^3/2}⁻¹, the density of a t-distribution with two degrees of freedom, mean 0, and scale parameter (2σ)^−1/2; a thick-tailed distribution.

The values considered for ν = (ν₁, ν₂) are such that E[ρ] = 0 and Inline graphic . This requires that , which places a supremum of 1 on (a bound that is only obtained in the limiting case where p(ρ) has point masses of 1/2 at ± 1). The larger ν₁ = ν₂ the smaller .

Although β₁ is the same conditional effect as for 2SLS and maximum likelihood, Bayesian interpretations are conditioned on data^obs = {y_i, z_i, x_i, u_i}_{i = 1, …, n}.

3.6. Testing the exclusion restriction

While the exclusion restriction is a necessary condition for parameter identifiability in the two-stage approaches, the specification of a parametric distribution for ε_i makes the exclusion restriction non-essential for identifiability of likelihood-based procedures. Therefore, the exclusion restriction may be tested by fitting the model

(11)

where ε_i is specified as in (8). Equation (11) is equivalent to the selection model in ²⁰ and is a special case of the structural shift model in ²¹. The model is fully identified when ε_i is bivariate normal if (x_i, u_i) contains at least one non-constant predictor ²². A small non-significant value of Inline graphic supports the exclusion restriction. However, we emphasize that this test is only valid if ε_i truly is bivariate normal, an assumption that itself cannot be fully evaluated using the observed data.

3.7. Computation

The two-stage procedures can be implemented using standard methods for fitting linear or generalized linear models. However, the computation of standard errors is complicated by the need to simultaneously account for the estimation error from both equations. Equation (5) may be used for 2SLS while asymptotic approximations, such as those outlined in ¹³, Chapter 12, are needed to obtain closed-form expressions for 2SPS and 2SRI.

Because the likelihood function depends on unobserved latent variables, specialized model-fitting routines are needed for maximum likelihood and Bayesian inferences. MLEs are obtained by directly maximizing the observed data log-likelihood function in (10) using a nonlinear optimization package in R. Standard errors are computed using the delta method to obtain closed-form expressions approximating the covariance matrix of the parameters or functions thereof that is then evaluated at the MLEs of the parameters (see Appendix C). WinBUGS ²³ is used for Bayesian inference with inferences evaluated as Monte Carlo averages over draws from the posterior distribution of the model parameters. Convergence is monitored using trace plots and the diagnostics available in CODA ²⁴.

4. Cost-offsets: atypical and conventional antipsychotic use in adults with schizophrenia in Florida

The dependent variable is the log-transformed aggregate spending for all services with mental health diagnoses, mental health procedures (e.g. psychotherapy), or psychotropic drugs that are primarily used for mental health treatment such as antidepressants and mood stabilizers for a patient in a given year. The objective is to infer β₁, the difference in the annual log-spending of treatment of using an atypical versus a conventional antipsychotic for individuals suffering from schizophrenia in Florida's Medicaid population during 1994–2001 when all other factors, including unmeasured factors influencing ε_i, are fixed.

To facilitate interpretation we transform estimates from log-spending to spending (units of $). At a given value of (z_i, x Inline graphic )^T, mean spending equals the exponential of mean log-spending multiplied by a retransformation factor. The retransformation factor may in general be estimated using the smearing estimate ²⁵, given by , where is the estimated residual in (1). Therefore, in $the savings attributed to using atypicals over conventionals is given by

(12)

Under likelihood-based approaches there are alternatives to the smearing estimate. For example, the MLE of the retransformation factor is given by Inline graphic . In Bayesian implementations, the retransformation factor from log-normal to normal, S = exp(σ/2), is incorporated in the posterior means of any quantities on the scale of the retransformed outcome and so is automatically accounted for when quantities of interest are evaluated as Monte Carlo averages over draws from the posterior distribution of the model parameters.

Another quantity of interest is the average treatment effect (ATE). The ATE evaluates the combined effect of selection and treatment on spending by evaluating the expectation with respect to f(y_i(z)), the marginal distribution of y_i(z) after integrating over Inline graphic , whereas β₁ is the pure effect of the latter. The average treatment effect over individuals with covariates values is given by

graphic file with name sim0030-1971-m13.jpg

(13)

or in terms of dollars

graphic file with name sim0030-1971-m14.jpg

(14)

where ϕ(·) and Φ(·) denote the pdf and cdf of the standard normal distribution, respectively. Equation (13) illustrates that ATE = β₁ when ρ = 0 (no selection).

A feature of parametric methods is that they are able to delineate between population effects (e.g. β₁) and (average) treatment effects specific to the subjects in the sample. The expressions for the ATE's in (13) and (14) are informative because they show their relationship to β₁. Such expressions can only be determined through the specification of a full parametric model for the data.

The local average treatment effect (LATE), the effect of treatment on those whose treatment status can be changed by u_i (the marginal population), equals β₁ when there is a single binary instrument, no covariates, the exclusion restriction holds, and the effect of u_i is monotone across i ²⁶, p. 155. However, under our model different LATEs correspond to the values of u_i defining the marginal population. Specified mathematically, the LATE is given by E[y_i(1) − y_i(0)∣z_i(u⁽¹⁾)>z_i(u⁽⁰⁾), x_i], where z_i(u^(k)) is the potential treatment of subject i when u_i = u^(k) (k = 0, 1). (See ²⁷^,²⁸ for summary measures of heterogeneous LATE.) However, β₁ (the 2SLS estimand) can be thought of as a weighted average of a LATE for the marginal subpopulations identified (one at a time) by each component of u_i. Therefore, despite not corresponding to a single LATE, our primary interest is in β₁ and so we do not report LATE for any particular subpopulations.

To gauge the sensitivity of results computed under the Bayesian model with respect to the prior for ρ, we fit this model with Inline graphic between 0.96 (prior has a U-shape) and 10⁻⁴/3 (prior is a spike at 0).

4.1. Descriptive results

The Florida Medicaid data set comprises 26 759 adults diagnosed with schizophrenia at some point during 1994–2001 yielding n = 78349 person-year observations (Table I) of health care spending. The vector x_i has 18 elements (intercept, black, other non-white (largely Latino), female, age, receipt of supplemental security income (SSI), substance abuse history, year, and 10 area dummies), whereas u_i contains 33 elements (the availability of zyprexa, seroquel, and geodon and their interactions with area).

Table I.

Florida Medicaid spending for atypical users compared to conventional users (78 378 person-year observations)

		Mental Health (MH) Spending ($) (Mean, StDev)

Variable	Mean	Atypical	Conventional
Atypical	0.450	11 713 (12 083)
Conventional	0.550		6218 (8833)
Male	0.476	12 034 (12 190)	6754 (9206)
Female	0.524	11 425 (11 980)	5727 (8445)
White	0.430	12 283 (11 998)	6365 (8759)
Black	0.266	11 656 (11 996)	6235 (8833)
Other non-white	0.304	10 994 (12 221)	5994 (8931)
Substance abuse	0.120	21 241 (14 844)	14 694 (13 070)
Non-substance abuse	0.880	10 103 (10 748)	5270 (7662)
SSI	0.964	11 872 (12 199)	6291 (8891)
non-SSI	0.036	7647 (7565)	4146 (6646)
Zyprexa available	0.730	11 420 (11 896)	5737 (8277)
Zyprexa not available	0.270	13 931 (13 207)	6953 (9573)
Seroquel available	0.589	11 315 (11 816)	5533 (7986)
Seroquel not available	0.411	13 076 (12 867)	6750 (9403)
Geodon available	0.159	11 999 (12 717)	6001 (8817)
Geodon not available	0.841	11 619 (11 867)	6239 (8834)
Pensacola	0.041	7323 (10 150)	6475 (12 660)
Tallahassee, Panama City	0.045	6991 (10 012)	5220 (10 368)
Gainesville, Ocala	0.065	6209 (9568)	4436 (9430)
Jacksonville, Daytona Beach	0.099	833 (11 172)	6664 (11 976)
St. Petersburg	0.069	8077 (10 654)	6081 (11 611)
Tampa, Lakeland, Bradenton	0.049	6861 (8855)	4365 (7268)
Orlando	0.072	7775 (11 707)	5552 (11 006)
Ft. Myers, Sarasota, Naples	0.031	6639 (9671)	4434 (8960)
West Palm Beach, Vero Beach	0.058	7517 (11 080)	5685 (12 130)
Ft. Lauderdale	0.086	9768 (13 092)	7886 (13 865)
Miami, Key West	0.384	10 154 (13 445)	6807 (12 349)

		Correlation with MH spending

Variable	Mean (SD) or range	Atypical	Conventional
Year	1994–2001	−0.0425	−0.0655
Age	42.57 (11.37)	−0.0374	−0.0791

Open in a new tab

A comparison of means based on Table I suggests that atypical antipsychotics are much more expensive than conventional drugs. However, this analysis does not account for non-random selection of patients into treatment; for example, patients with more severe conditions may have higher propensity to receive an atypical and also be more costly. It is clear from Table I that spending is higher for males, whites over blacks, blacks over other non-whites, substance abusers, and those receiving SSI. For all predictors other than male (versus female), the magnitude of the difference is greater within atypical antipsychotics than conventionals. However, the magnitude of the correlations between year and spending, and between age and spending, was greater for conventional antipsychotics.

Because the distribution of cost _i is naturally skewed to the right, we sought a transformation that induced normality. The maximum likelihood estimate of the Box–Cox transformation is 0.140 for unadjusted cost_i, 0.153 under OLS, and approximately − 0.05 under the fully parametric simultaneous equations model. Because the log-transformation is a compromise between these alternatives, corresponding to a transformation parameter of 0, we transformed the data using y_i = log(cost_i). Using the alternative Box–Cox transformations had minimal effect on the results of the analysis.

4.2. Treatment effects

We analyzed the data using each of the procedures discussed in Section 3: ordinary least squares (OLS), two-stage least squares (2SLS), two-stage predictor substitution (2SPS), two-stage residual inclusion (2SRI), maximum likelihood, and Bayesian analysis with various priors.

The large value of ρ estimated by the likelihood-based methods (MLE and Bayesian models) was verified by plotting the profile likelihood function of ρ and confirming the existence of a unique global optimum at Inline graphic , far from the edge of the parameter space (Figure 1). Under such a strong unmeasured selection effect, β₁ and ATE are destined to have very different values. The Staiger–Stock test F-statistic of 9.86 in the 2SLS analysis suggests that the instrument only accounts for a small fraction of the selection effect and would be considered borderline-weak compared to the conventional standard of 10 ²⁹.

Profile likelihood of ρ based on 78 378 observations from the Florida Medicaid data set.

Ordinary least squares (OLS) suggests that the newer atypical antipsychotics result in more spending (Table II: β₁ estimated to be near 1), the two-stage procedures give inconclusive results (β₁ estimated to be near 0), and the likelihood-based methods suggest that the newer atypicals lead to lower levels of spending (β₁ estimated to be near − 0.7). In terms of annual patient dollars, the cost of atypicals less the cost of conventionals was estimated to be $9948, range from − $263.3 to $2262, and range from − $10010 to − $9065 under OLS, the two-stage procedures, and the likelihood-based procedures, respectively. Because the left-skewness of the data inflates the treatment effect upon retransformation, the predicted mean OLS estimate is substantially larger than the raw mean difference (Table I). Inflation of the mean due to retransformation combined with the highly positive selection effect leads to the large saving found under the likelihood-based analyses.

Table II.

Point estimates of the treatment effects (and associated uncertainty) for the ordinary least squares (OLS), two-stage least squares (2SLS), two-stage predictor substitution (2SPS), two-stage residual inclusion (2SRI), maximum likelihood (MLE), and the Bayesian procedures on the Florida Medicaid population

		Two-stage				Likelihood-based

Term	Quantity	OLS	2SLS	2SPS	2SRI	MLE	Bayesian
β₁	Estimate	1.022	−0.028	0.237	0.193	−0.793	−0.773
	Standard deviation	0.010	0.169	0.144	0.145	0.031	0.031
	Estimate	9 965	−263.3	2 262	1 911	−9393	−9948
	Standard deviation	122.8	1 607	1 415	1 668	537.2	531.5
ATE	Estimate					1.049	1.013
	Standard deviation					0.010	0.010
ATE ^$	Estimate					9 576	10 190
	Standard deviation					107.1	94.3
ρ	Estimate					0.721	0.696
	Standard deviation					0.008	0.008

Open in a new tab

The ATE, ATE ^$, and ρ are only estimated for the likelihood-based procedures as estimation relies on the specification of a probability distribution for the observations.

The OLS estimate of β₁ and the likelihood-based estimate of the ATE are fairly similar, illustrating that the former is actually estimating the ATE. The SE of the MLE of the ATE is slightly smaller than the SE of the OLS estimate, consistent with the result in ³⁰ that regression parameters of terms unique to one regression equation are estimated more efficiently in a bivariate model than with the corresponding univariate model.

Despite estimating the same quantity, differences between the two-stage and likelihood-based estimates of β₁ are substantial. Because we thoroughly check the distribution of the observed variables graphically and using several diagnostics, and also explored various variable transformations, we believe that the discrepancy in these estimates is due to things we do not observed that cannot be tested fully empirically: violations of the exclusion restriction or departures of the distribution of the error terms from the bivariate normal distribution of the data. To gain further insight into possible causes of the discrepancy we conducted a simulation study (Section 5).

The 95 per cent confidence interval for β₁ under 2SLS only just overlaps Inline graphic under 2SPS and 2SRI and conversely the 95 per cent confidence intervals of β₁ under 2SPS and 2SRI only just encompass under 2SLS, illustrating that the results are sensitive to small differences in the method of estimation. The MLEs had SEs about one-fifth and one-third those for 2SLS and its nonlinear variants (2SPS and 2SRI), respectively, thus highlighting the ability of the likelihood procedures to yield more precise inferences.

Comparing the scale of the vertical axes in Figure 2, the Bayesian point and interval estimates of β₁ and Inline graphic were substantially more sensitive to p(ρ) than Bayesian estimators of ATE and ATE ^$. From to , E[β₁∣data^obs] (and thus ) move from being close to the OLS estimate to close to the MLE. However, as indicated in the plot of the selection effect (bottom-right), a very precise prior on Inline graphic is required to obtain Bayesian estimates that correspond to those of the two-stage approaches.

Relationship between Bayesian posterior means of (β₁, ATE, σ, ρ) and log₁₀ of the prior variance, , when ρ has an extended Beta density on [−1, 1]. The dotted lines are the interpolated pointwise 99 per cent credibility intervals. The dashed lines in the upper left-hand plot depict from top to bottom the OLS, 2SPS, 2SRI, 2SLS, Bayesian posterior mean under a U(−1, 1) prior for ρ, and the MLE, respectively.

The 33 elements of u_i in (11), the model for testing the exclusion restriction, had standardized effects (estimate divided by standard error or posterior standard deviation) ranging from 1.054 to 1.954, not significant at the 0.05 level. The F-statistic for the test that β₃ = 0 equals 237, well above the critical value at the 0.05-level of 47.4. Thus, there is strong evidence under the assumed bivariate normal model that the exclusion restriction is violated.

5. Simulation study

We conducted a simulation study to evaluate the sensitivity of the estimators of β₁ and the ATE, and properties of likelihood-based tests of the exclusion restriction (i.e. the condition β₃ = 0), to the distribution of ε_i. Computations were streamlined by substituting x_i and u_i with the univariate variables x Inline graphic and u, whose effects approximate the combined effects of all elements of x_i and u_i, respectively. This was achieved by making the variance of x and u equal 1 and β, θ and θ equal the empirical standard deviation of βx_i, θu_i and θx_i, respectively. To further reduce computation time while emulating the Florida Medicaid data, both n and σ Inline graphic were reduced by factors of 10. Bias, mean-squared error (MSE), and coverage were estimated by averaging over 1000 simulated data sets.

In the first group of simulations, ε_i was drawn from a bivariate normal distribution, the case where the likelihood function is correctly specified. In subsequent simulations, observations were randomly drawn from a bivariate t-distribution with seven degrees of freedom or were correlated draws from gamma distributions, allowing assessment of the robustness of the approaches to thicker-tailed and skewed distributions. Finally, we simulated data in violation of the exclusion restriction by setting β Inline graphic = θ to evaluate sensitivity with respect to the exclusion restriction. We also evaluated the normal likelihood-based test of the exclusion restriction for β ranging from 0 to θ.

The bias and root mean square error (RMSE) for each estimator and scenario are displayed in Figures 3 and 4, respectively, while Table III contains operating characteristics of the likelihood-based test of the exclusion restriction. However, in discussing these results we use a method-by-method approach; this was most helpful in describing the scenarios under which each approach works best and when it absolutely should not be used. To supplement the results, Figure 5 depicts an algorithm for determining which approach is best to use in practice.

Simulated bias of estimators as a function of ρ for different outcome distributions and status of the exclusion restriction. As per the Florida Medicaid analysis β₁ = − 0.793, θ₁ = 0.144, and if the exclusion restriction is violated then β₃ = 0.144. The vertical axis in the upper-right plot covers a wider range to accommodate the excessive bias of 2SLS under violation of the exclusion restriction.

Simulated root mean-squared error (RMSE) of estimators as a function of ρ for different outcome distributions and status of the exclusion restriction. As per the Florida Medicaid analysis β₁ = − 0.793, θ₁ = 0.144, and if the exclusion restriction is violated then β₃ = 0.144. The vertical axis in the upper-right plot covers a wider range to accommodate the excessive RMSE of 2SLS under violation of the exclusion restriction.

Table III.

Simulations results when the exclusion restriction may be violated (ρ = 0.721)

	Parameter values			Statistics

Distribution	β₁	θ₁	β₃	Bias	RMSE	Coverage	z-Value	Power
Normal	−0.793	0.144	0.000	0.000	0.018	0.968	−0.025	0.054
Normal	−0.793	0.144	0.036	0.001	0.018	0.972	2.055	0.526
Normal	−0.793	0.144	0.072	0.000	0.017	0.984	4.077	0.982
Normal	−0.793	0.144	0.108	0.000	0.017	0.972	6.060	1.000
Normal	−0.793	0.144	0.144	0.000	0.018	0.980	8.075	1.000
T	−0.793	0.144	0.000	0.001	0.015	0.974	0.054	0.038
T	−0.793	0.144	0.036	0.001	0.015	0.970	2.470	0.707
T	−0.793	0.144	0.072	0.001	0.015	0.982	4.802	0.996
T	−0.793	0.144	0.108	0.003	0.016	0.946	7.322	1.000
T	−0.793	0.144	0.144	0.002	0.015	0.972	9.651	1.000
Gamma	−0.793	0.144	0.000	−0.043	0.046	1.000	−2.767	0.802
Gamma	−0.793	0.144	0.036	−0.044	0.046	1.000	−0.498	0.066
Gamma	−0.793	0.144	0.072	−0.045	0.047	1.000	1.723	0.390
Gamma	−0.793	0.144	0.108	−0.044	0.047	1.000	4.054	0.988
Gamma	−0.793	0.144	0.144	−0.043	0.046	1.000	6.389	1.000

Open in a new tab

Algorithm for choosing the best method in practice. The decision process begins with the left-hand rectangle and at each subsequent step selects a method (ovals) or moves to the next decision (boxes). Because our results suggest that 2SPS is dominated by either 2SRI or 2SLS, it does not appear. Data transformations and other analyses that inform model specification should be performed prior invoking this algorithm.

5.1. Results for OLS

As expected the OLS estimates became increasingly biased the further ρ was from 0 (Figure 3) with RMSE is essentially equal to bias in all cases where ρ≠0. Any variations in its performance across distributions or under violation of the exclusion restriction (which is irrelevant as far as OLS is concerned) were drowned out by the impact of an unmeasured confounder. Clearly, if there are legitimate concerns about unmeasured confounders then OLS is not appropriate.

5.2. Results for 2SLS

Figure 3 shows that 2SLS performs well across all conditions other than violation of the exclusion restriction, in which case 2SLS is even more biased than OLS and thus should not be used. However, if the IV is supported by theoretical arguments or other insights indicating that the exclusion restriction holds use of 2SLS is appropriate (as shown in Figure 5).

It is clear from Figure 4, where the range of the vertical axis for 2SLS is much greater than that of the other methods, that the standard errors of 2SLS estimates exceed those of all other methods. Thus use of 2SLS typically lowers the statistical power of the analysis compared to other approaches.

5.3. Results for 2SPS and 2SRI

As shown in Figures 3 and 4 these alternative moment-based IV procedures compared favorably to 2SLS when the underlying distribution is symmetric or the exclusion restriction is violated (although they still perform consistently poorly in this scenario) but not so favorably when the underlying distribution is skewed. In general, 2SRI is more precise (smaller variance and RMSE) than 2SPS which is more precise than 2SLS while the reverse order holds for sensitivity to skewness (i.e. 2SRI is worst performed).

These results, not previously reported in the literature, may be a consequence of the nonlinearity of Equation (7) introducing bias when the distribution of ε_i is skewed. Because the theoretical results reported in ⁴ imply that 2SRI and 2SPS are consistent (irrespective of the underlying distribution), bias should approach 0 as n increases. However, additional simulations at different values of n suggested that, at best, the convergence is very slow.

Based on the above, 2SRI may be the best method to use when the evidence supporting the validity of the IV is strong (as for 2SLS) but n is such that the study is insufficiently powered under 2SLS. However, if there is evidence that ε_i has a skewed distribution, particularly ε_{y, i} (see Section 5.5), then 2SLS would be the safer (more robust) choice.

5.4. Results for likelihood-based estimators

The MLE and the Bayesian estimators of β₁ yield better results than the two-stage estimators when the underlying distribution is normal and are more robust to violations of the exclusion restriction. However, they are more sensitive to departures of the underlying distribution from normality. An interesting finding is that likelihood-based estimators of β₁ are relatively more robust when non-normality is in the form of thicker tails than skewness while the reverse is true for likelihood-based estimators of the ATE.

The robustness of the MLE of the ATE is due to the presence of ρ. The two-stage methods do not account for ρ and so, with no way to compensate for β₃≠0, yield biased results, while the MLE of β₁ is only partially affected by violations of the exclusion restriction due to the fact that the error correlation ρ partially absorbs β₃u Inline graphic ≠0.

As indicated in Figure 5, likelihood-based methods are recommended when unmeasured confounders are thought to exist but it is questionable whether the IV is valid or no IV is available. To make likelihood-based analyses as believable as possible, transformations of y_i that induce normality in Inline graphic should be considered.

5.4.1. Test of exclusion restriction

Table III shows the results of including u Inline graphic as a predictor of y_i when the true value of its coefficient, β₃, varies from 0 to 0.144 (i.e. up to the magnitude of the effect of the IV). When the underlying distribution is bivariate normal, β₃ is estimated with high precision and no bias. Furthermore, the power of the test H0: β₃ = 0 against the alternative HA: β₃≠0 increases from 0.05 when the true value is 0 (in this case power = type I error) to over 0.95 at 0.072. Therefore, if normality holds the likelihood-based methods provide a valid test of the exclusion restriction.

Inferences about β₃ are almost as reliable if the underlying distribution has t₇ as opposed to normal marginals, slightly over-covering when β₃>0. However, if the underlying distribution is skewed (as for a Gamma distribution), then estimates of β₃ are biased and the type I error of the test of the exclusion restriction is excessive. Therefore, the bivariate normal test of the exclusion restriction cannot be relied upon if the true outcome distribution is asymmetric.

5.5. Other results

We also evaluated the approaches under various other scenarios for which we do not present results. In one series of simulations one of ε_{y, i} and ε_{z, i} was normal and the other was non-normal (t or gamma) distributed. Results were more sensitivity to non-normality of ε_{y, i} than ε_{z, i}. In fact, as long as ε_{y, i} was symmetric, 2SRI appeared to be robust to the distribution of ε_{z, i} and the likelihood-based procedures were only biased by small amounts.

When θ₁ increases by a factor of 2, the MSE of the two-stage estimators decreases by a factor of 4. Although the MLE becomes more precise as θ₁ increases, the trend is nowhere as dramatic as for the two-stage approaches. This reflects the fact that the likelihood-based procedures are identified from the distribution of the data and so the involvement of u Inline graphic improves the stability and precision of the estimates. Multiplying β₁ by 2 does not alter the precision of the estimators revealing that the magnitude of β₁ is not tied to the precision with which it is estimated.

The performance of interval estimators for β₁ was highly correlated with the bias and variance of the corresponding point estimators. If the point estimator was unbiased then the coverage of the interval estimator was close to 0.95.

6. Discussion

We used data from a large state database to investigate whether newer atypical antipsychotics lowered net costs of health care relative to conventional antipsychotics. Because treatment is non-randomly assigned, instrumental variables methods were used to separate the true effect of treatment on log-cost from selection effects. To aid interpretation, we converted the total payments made under each treatment from log-spending to spending (in $). We used several approaches for the analysis with the rationale that the methods would validate one another if similar results were obtained. The methods yielded results that were surprisingly disparate; atypicals were estimated to save about $10 000 under likelihood-based procedures (the MLE and Bayesian models), in contrast to no saving or increased spending, of about $2000, under the two-stage procedures. These results bring the assumptions underpinning the methods into question.

To gain a sense of which results to believe, we used simulations that studied the properties of the two-stage procedures and the MLE. We observed the following: (1) OLS only works in the absence of confounding, (2) 2SLS works well in all scenarios other than when the exclusion restriction is violated, in which case it fails completely, (3) 2SPS performs better than 2SLS unless the underlying distribution is skewed, (4) 2SRI performs better than 2SPS unless the underlying distribution is skewed, and (5) likelihood-based estimators perform better when the underlying distribution is normal and when the exclusion restriction is violated. While results (1) and (2) are well known, (3)–(5) are new findings. The poor performance of the alternative two-stage procedures when the distribution of the data is asymmetric illustrates that the complete robustness of 2SLS to the distribution of the data is compromised by seeking to improve efficiency through the use of a nonlinear first-stage equation.

Likelihood-based estimators of β₁ and β₃ (the direct effect of the candidate IVs on the outcome) are surprisingly robust to violations of normality as long as the true distribution is symmetric, but fail when the true distribution is skewed. Thus, while likelihood-based models can be robust and, therefore, can be used to test the validity of 2SLS when the true distribution is symmetric, their findings are quickly compromised if the true distribution is skewed.

With the above in mind, where does the evidence for offsets of atypical antipsychotics point? Based on the analysis of the Florida Medicaid data, there is evidence that the assumption of normal residuals, although a reasonable approximation, does not hold exactly. Therefore, because 2SLS is robust to the distribution of the residuals and the exclusion restriction can be defended heuristically (i.e. from an economic standpoint), 2SLS might be most trustworthy. However, the findings in this paper reveal that even a small departure from the exclusion restriction makes 2SLS and the alternative two-stage procedures likely to produce results that are substantially biased. Given the highly conflicting results across the approaches we feel there is insufficient evidence to conclude the offset of atypical antipsychotics is positive or negative. This is generally consistent with the research from clinical trials such as the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) study ³¹.

The fact that different statistical procedures result in such different results is alarming. The finding that the alternative two-stage procedures are sensitive to the underlying distribution of the data is important for researchers using these methods. Similarly, the results on the sensitivity of parametric models derived from latent variable constructs, as for the likelihood-based analysis here, is an important lesson to statisticians and other users of these approaches.

The methodology we have developed is generally applicable to any observational study in which an IV is available for the treatment and outcome of interest. In light of the sensitivity results reported here, the availability of a valid IV is critical. However, finding IVs in practice can seem an art form to one not familiar with IV analysis as the arguments supporting the exclusion restriction are theoretically rather than empirically driven. Therefore, we recommend a subject matter expert with an acute sense of the outcome, treatment and unobserved confounding variables is integrally involved in the determination of candidate IVs.

A limitation of our empirical analysis is that we did not account for repeated measurement of subjects that appeared in the data set in multiple years. We subsequently fit a hierarchical Bayesian model that included a random intercept for subject. The posterior mean of β₁ was − 0.492 (sd = 0.0197), suggesting that single-level analyses might over-estimate the magnitude of β₁. The within-subject variance had a posterior mean of 1.036 (0.0125) while the between-subject variance had a posterior mean of 1.118 (0.0141), suggesting there is substantial unexplained variance between subjects.

Another direction in which the likelihood-based estimators could be extended is by assuming a more flexible family of distributions for ε_i (e.g. bivariate t-distribution) and constructing estimators under those less restrictive assumptions. However, because a more flexible model is likely to be less well identified by the data, it is not clear that it would yield an estimator that is more robust to the distribution of the data.

Acknowledgments

Research for this article was supported by NIH grants R01-MH061434, R01-MH069721 and 1RC4MH092717-01 from the National Institute of Mental Health.

Appendix A: Method-of-moments derivation of 2SLS

Classic IV fits the model in Equation (1) subject to the constraint that u_i and ε_i are orthogonal. Write ε_{y, i}(β) = y_i − β₁z_i + β Inline graphic x_i to emphasize the dependence of ε_{y, i} on β = (β₁, β₂). By definition, u_i is uncorrelated with ε_{y, i} and so E[u_iε_{y, i}(β)] = 0 for all i. As we do not observe the true expectations, we seek values of β such that

(A)1

Because x_i is exogenous, E[x_iε_{y, i}(β)] = 0.

Equation (A1) can be solved exactly if the number of IVs equals the number of endogenous predictors. However, if dim(u_i)>dim(z_i) it is generally not possible to satisfy all of the orthogonality conditions simultaneously; z_i is said to be overidentified. An ‘optimal’ solution is obtained by finding the parameter values that minimize the quadratic form

where W is a positive-definite weighting matrix quantifying the relative importance of the orthogonality conditions across the instruments. Setting W proportional to cov (Xε_y) = σ Inline graphic X^TX, where ε_y = (ε_{y, 1}, …, ε_{y, n})^T, minimizes the variance of Q(β). The IV estimator is then the value of β which minimizes T(β)^TX^TXT(β) subject to (A1), yielding

(A2)

where Inline graphic is the predicted value of X in a regression of X on U and U is the matrix with ith row (u, x) ¹⁵, p. 530. Because x_i is contained in both Xand Uit projects onto itself and so . Geometrically, is the projection of z = (z₁, …, z_n)^T onto U; is orthogonal to ε_y and thus contains only the component of variation in z that can be used to estimate β.

Appendix B: Derivation of the likelihood function

Make ε_{y, i} and ε_{z, i} the subject of Equations (1) and (2) and factor the joint density as f(ε_{y, i}, ε_{z, i}) = f(ε_{y, i})f(ε_{z, i}∣ε_{y, i}). Then integrate with respect to ε_{z, i} over those values of ε_{z, i} for which z_i = 1, and analogously for those values for which z_i = 0, to obtain f(z_i∣ε_{y, i}). Finally, substitute for ε_{y, i} to obtain the joint density of (y_i, z_i).

Appendix C: Computation of standard errors of MLEs

Let d_i = (z_i, u Inline graphic )^T, θ←θ/(1 − ρ²)^1/2, η = ρ/{σ(1 − ρ²)}^1/2, r_i = y_i − dβ and m_i = uθ + ηr_i. Then set Φ₁(m_i, z_i) = {z_i/Φ(m_i) − (1 − z_i)/(1 − Φ(m_i))}ϕ(m_i), Φ₂(m_i, z_i) = {z_i/Φ(m_i)² + (1 − z_i)/(1 − Φ(m_i)²)}ϕ(m_i)², Φ₃(m_i, z_i) = − m_iΦ₁(m_i, z_i) − Φ₂(m_i, z_i), and w_i = 1/σ Inline graphic − η²Φ₃(m_i, z_i).

The first and second derivatives of the log-likelihood function, ℒ, are

graphic file with name sim0030-1971-mu28.jpg

All other elements of the Hessian matrix, the matrix of second derivatives of the log-likelihood function with respect to (β, θ, η, σ Inline graphic ), equal 0. The covariance matrix of the estimated parameters is estimated by the negative-inverse-Hessian matrix. The approximate variance of estimators of functions of these parameters, such as ρ and the ATE, are then derived using the delta method.

References

1.VanLare JM, Conway PH, Sox HC. Five next steps for a new national program for comparative-effectiveness research. New England Journal of Medicine. 2010;362:970–973. doi: 10.1056/NEJMp1000096. [DOI] [PubMed] [Google Scholar]
2.Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. [Google Scholar]
3.Imbens G, Angrist J. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467–476. [Google Scholar]
4.Terza JV, Basu A, Rathouz P. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics. 2008;27(3):531–543. doi: 10.1016/j.jhealeco.2007.09.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Goldman DP, Bhattacharya J, McCaffrey DF, Duan N, Leibowitz AA, Joyce GF, Morton SC. Effect of insurance on mortality in an HIV-positive population in care. Journal of the American Statistical Association. 2001;96(455):883–894. [Google Scholar]
6.Lichtenberg F. Are the benefits of new drugs worth their costs? Health Affairs. 2001;20:41–51. doi: 10.1377/hlthaff.20.5.241. [DOI] [PubMed] [Google Scholar]
7.Rosenheck R, Leslie D, Sindelar J, Miller E, Lin H, Stroup T, McEvoy J, Davis S, Keefe R, Swartz M. Cost effectiveness of second generation antipsychotics and perphenazine in a randomized trial of treatment for chronic schizophrenia. American Journal of Psychiatry. 2006;163:2080–2089. doi: 10.1176/ajp.2006.163.12.2080. [DOI] [PubMed] [Google Scholar]
8.Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables (Disc: P456-472) Journal of the American Statistical Association. 1996;91:444–455. [Google Scholar]
9.Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;11:313–320. doi: 10.1097/00001648-200105000-00011. [DOI] [PubMed] [Google Scholar]
10.Joffe MM, Small D, Brunelli S, Ten Have T, Feldman HI. Extended instrumental variables estimation of overall effects. International Journal of Biostatistics. 2008;4 doi: 10.2202/1557-4679.1082. (Epub April 7, 2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Brookhart MA, Stürmer T, Gylnn RJ, Rassen J, Schneeweiss S. Confounding control in healthcare database research: Challenges and potential approaches. Medical Care. 2010;48:S114–S120. doi: 10.1097/MLR.0b013e3181dbebe3. (6(S1)) [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Small DS. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. Journal of the American Statistical Association. 2007;102(479):1049–1058. [Google Scholar]
13.Wooldridge J. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press; 2002. [Google Scholar]
14.Wald A. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics. 1940;11:284–300. [Google Scholar]
15.Campbell JY, Lo AW, Mackinlay AC. The Econometrics of Financial Markets. NJ: Princeton University Press; 1997. [Google Scholar]
16.Bhattacharya J, Goldman D, McCaffrey D. Estimating probit models with self-selected treatments. Statistics in Medicine. 2006;25(3):389–413. doi: 10.1002/sim.2226. [DOI] [PubMed] [Google Scholar]
17.Maddala G. Limited-dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press; 1978. [Google Scholar]
18.Hausman J. Specification tests in econometrics. Econometrica. 1978;46:1251–1271. [Google Scholar]
19.Zeng F, O'Leary JF, Sloss EM, Lopez MS, Dhanani N, Melnick G. The effect of medicare health maintenance organizations on hospitalization rates for ambulatory care-sensitive conditions. Medical Care. 2006;44(10):900–907. doi: 10.1097/01.mlr.0000220699.58684.68. [DOI] [PubMed] [Google Scholar]
20.Copas JB, Li HG. Inference for non-random samples (Disc: P77-95) Journal of the Royal Statistical Society, Series B: Methodological. 1997;59:55–77. [Google Scholar]
21.Heckman JJ. Dummy endogenous variables in a simultaneous equation system. Econometrica. 1978;46:931–960. [Google Scholar]
22.Rotnitzky A, Robins J. Analysis of semi-parametric regression models with non-ignorable non-response. Statistics in Medicine. 1997;16:81–102. doi: 10.1002/(sici)1097-0258(19970115)16:1<81::aid-sim473>3.0.co;2-0. [DOI] [PubMed] [Google Scholar]
23.Spiegelhalter DJ, Thomas A, Best N, Lunn D. WinBugs Version 1.4: User Manual. Medical Research Council Biostatistics Unit. http://www.mrc-bsu.cam.ac.uk/bugs, 2003.
24.Best N, Cowles M, Vines K. Convergence Diagnosis and Output Analysis Software for Gibbs Sampling Output. Robinson Way, Cambridge, U.K: MRC Biostatistics Unit, Institute of Public Health; 1995. [Google Scholar]
25.Duan N. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association. 1983;78:605–610. [Google Scholar]
26.Angrist J, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press; 2009. [Google Scholar]
27.Vytlacil E. Independence, monotonicity, and latent index models: an equivalence result. Econometrica. 2002;70(1):331–341. [Google Scholar]
28.Heckman JJ, Vytlacil E. Structural equations, treatment effects and econometric policy evaluation. Econometrica. 2005;73(3):669–738. [Google Scholar]
29.Staiger D, Stock JH. Instrumental variables regression with weak instruments. Econometrica. 1997;65:557–586. [Google Scholar]
30.Teixeira-Pinto A, Normand SLT. Correlated bivariate continuous and binary outcomes: issues and applications. Statistics in Medicine. 2008;28:1753–1773. doi: 10.1002/sim.3588. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Lieberman JA, Stroup TS, McEvoy J, Swartz MS, Rosenheck RA, Perkins DO, Keefe RS, Davis SM, Davis CE, Lebowitz BD. Effectiveness of antipsychotic drugs in patients with schizophrenia. New England Journal of Medicine. 2005;353:1209–1223. doi: 10.1056/NEJMoa051688. [DOI] [PubMed] [Google Scholar]

[b1] 1.VanLare JM, Conway PH, Sox HC. Five next steps for a new national program for comparative-effectiveness research. New England Journal of Medicine. 2010;362:970–973. doi: 10.1056/NEJMp1000096. [DOI] [PubMed] [Google Scholar]

[b2] 2.Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. [Google Scholar]

[b3] 3.Imbens G, Angrist J. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467–476. [Google Scholar]

[b4] 4.Terza JV, Basu A, Rathouz P. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics. 2008;27(3):531–543. doi: 10.1016/j.jhealeco.2007.09.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b5] 5.Goldman DP, Bhattacharya J, McCaffrey DF, Duan N, Leibowitz AA, Joyce GF, Morton SC. Effect of insurance on mortality in an HIV-positive population in care. Journal of the American Statistical Association. 2001;96(455):883–894. [Google Scholar]

[b6] 6.Lichtenberg F. Are the benefits of new drugs worth their costs? Health Affairs. 2001;20:41–51. doi: 10.1377/hlthaff.20.5.241. [DOI] [PubMed] [Google Scholar]

[b7] 7.Rosenheck R, Leslie D, Sindelar J, Miller E, Lin H, Stroup T, McEvoy J, Davis S, Keefe R, Swartz M. Cost effectiveness of second generation antipsychotics and perphenazine in a randomized trial of treatment for chronic schizophrenia. American Journal of Psychiatry. 2006;163:2080–2089. doi: 10.1176/ajp.2006.163.12.2080. [DOI] [PubMed] [Google Scholar]

[b8] 8.Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables (Disc: P456-472) Journal of the American Statistical Association. 1996;91:444–455. [Google Scholar]

[b9] 9.Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;11:313–320. doi: 10.1097/00001648-200105000-00011. [DOI] [PubMed] [Google Scholar]

[b10] 10.Joffe MM, Small D, Brunelli S, Ten Have T, Feldman HI. Extended instrumental variables estimation of overall effects. International Journal of Biostatistics. 2008;4 doi: 10.2202/1557-4679.1082. (Epub April 7, 2008) [DOI] [PMC free article] [PubMed] [Google Scholar]

[b11] 11.Brookhart MA, Stürmer T, Gylnn RJ, Rassen J, Schneeweiss S. Confounding control in healthcare database research: Challenges and potential approaches. Medical Care. 2010;48:S114–S120. doi: 10.1097/MLR.0b013e3181dbebe3. (6(S1)) [DOI] [PMC free article] [PubMed] [Google Scholar]

[b12] 12.Small DS. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. Journal of the American Statistical Association. 2007;102(479):1049–1058. [Google Scholar]

[b13] 13.Wooldridge J. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press; 2002. [Google Scholar]

[b14] 14.Wald A. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics. 1940;11:284–300. [Google Scholar]

[b15] 15.Campbell JY, Lo AW, Mackinlay AC. The Econometrics of Financial Markets. NJ: Princeton University Press; 1997. [Google Scholar]

[b16] 16.Bhattacharya J, Goldman D, McCaffrey D. Estimating probit models with self-selected treatments. Statistics in Medicine. 2006;25(3):389–413. doi: 10.1002/sim.2226. [DOI] [PubMed] [Google Scholar]

[b17] 17.Maddala G. Limited-dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press; 1978. [Google Scholar]

[b18] 18.Hausman J. Specification tests in econometrics. Econometrica. 1978;46:1251–1271. [Google Scholar]

[b19] 19.Zeng F, O'Leary JF, Sloss EM, Lopez MS, Dhanani N, Melnick G. The effect of medicare health maintenance organizations on hospitalization rates for ambulatory care-sensitive conditions. Medical Care. 2006;44(10):900–907. doi: 10.1097/01.mlr.0000220699.58684.68. [DOI] [PubMed] [Google Scholar]

[b20] 20.Copas JB, Li HG. Inference for non-random samples (Disc: P77-95) Journal of the Royal Statistical Society, Series B: Methodological. 1997;59:55–77. [Google Scholar]

[b21] 21.Heckman JJ. Dummy endogenous variables in a simultaneous equation system. Econometrica. 1978;46:931–960. [Google Scholar]

[b22] 22.Rotnitzky A, Robins J. Analysis of semi-parametric regression models with non-ignorable non-response. Statistics in Medicine. 1997;16:81–102. doi: 10.1002/(sici)1097-0258(19970115)16:1<81::aid-sim473>3.0.co;2-0. [DOI] [PubMed] [Google Scholar]

[b23] 23.Spiegelhalter DJ, Thomas A, Best N, Lunn D. WinBugs Version 1.4: User Manual. Medical Research Council Biostatistics Unit. http://www.mrc-bsu.cam.ac.uk/bugs, 2003.

[b24] 24.Best N, Cowles M, Vines K. Convergence Diagnosis and Output Analysis Software for Gibbs Sampling Output. Robinson Way, Cambridge, U.K: MRC Biostatistics Unit, Institute of Public Health; 1995. [Google Scholar]

[b25] 25.Duan N. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association. 1983;78:605–610. [Google Scholar]

[b26] 26.Angrist J, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press; 2009. [Google Scholar]

[b27] 27.Vytlacil E. Independence, monotonicity, and latent index models: an equivalence result. Econometrica. 2002;70(1):331–341. [Google Scholar]

[b28] 28.Heckman JJ, Vytlacil E. Structural equations, treatment effects and econometric policy evaluation. Econometrica. 2005;73(3):669–738. [Google Scholar]

[b29] 29.Staiger D, Stock JH. Instrumental variables regression with weak instruments. Econometrica. 1997;65:557–586. [Google Scholar]

[b30] 30.Teixeira-Pinto A, Normand SLT. Correlated bivariate continuous and binary outcomes: issues and applications. Statistics in Medicine. 2008;28:1753–1773. doi: 10.1002/sim.3588. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b31] 31.Lieberman JA, Stroup TS, McEvoy J, Swartz MS, Rosenheck RA, Perkins DO, Keefe RS, Davis SM, Davis CE, Lebowitz BD. Effectiveness of antipsychotic drugs in patients with schizophrenia. New England Journal of Medicine. 2005;353:1209–1223. doi: 10.1056/NEJMoa051688. [DOI] [PubMed] [Google Scholar]

PERMALINK

Estimating cost-offsets of new medications: Use of new antipsychotics and mental health costs for schizophrenia

A James O'Malley

R G Frank

S-L T Normand

Abstract

1. Introduction

2. Statistical models

2.1. Notation and definitions

2.2. Assumed underlying model

2.3. Parametric model: structural and distributional assumptions

3. Estimation methods

3.1. Ordinary least squares (OLS)

3.2. Two-stage least squares (2SLS)

3.3. Alternative two-stage approaches

3.4. Maximum likelihood

3.5. Bayesian inference

3.6. Testing the exclusion restriction

3.7. Computation

4. Cost-offsets: atypical and conventional antipsychotic use in adults with schizophrenia in Florida

4.1. Descriptive results

Table I.

4.2. Treatment effects

Figure 1.

Table II.

Figure 2.

5. Simulation study

Figure 3.

Figure 4.

Table III.

Figure 5.

5.1. Results for OLS

5.2. Results for 2SLS

5.3. Results for 2SPS and 2SRI

5.4. Results for likelihood-based estimators

5.4.1. Test of exclusion restriction

5.5. Other results

6. Discussion

Acknowledgments

Appendix A: Method-of-moments derivation of 2SLS

Appendix B: Derivation of the likelihood function

Appendix C: Computation of standard errors of MLEs

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases