Author manuscript; available in PMC: 2009 May 1.
Published in final edited form as: J Health Econ. 2007 Dec 4;27(3):531–543. doi: 10.1016/j.jhealeco.2007.09.009

Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling

Joseph V Terza*, Anirban Basu, Paul J Rathouz
PMCID: PMC2494557  NIHMSID: NIHMS53772  PMID: 18192044

Abstract

The paper focuses on two estimation methods that have been widely used to address endogeneity in empirical research in health economics and health services research: two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI). 2SPS is the rote extension (to nonlinear models) of the popular linear two-stage least squares estimator. The 2SRI estimator is similar except that in the second stage regression, the endogenous variables are not replaced by first-stage predictors. Instead, first-stage residuals are included as additional regressors. In a generic parametric framework, we show that 2SRI is consistent and 2SPS is not. Results from a simulation study and an illustrative example also recommend against 2SPS and favor 2SRI. Our findings are important given that there are many prominent examples of the application of inconsistent 2SPS in the recent literature. This study can be used as a guide by future researchers in health economics who are confronted with endogeneity in their empirical work.

1. Introduction

Endogeneity of regression predictors is a common problem in many areas of applied economics, including health economics and health services research, as these fields rely heavily on observational data. Endogeneity arises owing to problems such as omitted confounder variables, simultaneity between a predictor and the outcome, and errors in regression covariates. Instrumental variables (IV) methods form a common body of approaches to handling such endogeneity. The theoretical and methodological literature guiding the use of IVs in linear regression models is large and serves as the basis for most practitioners' understanding of the assumptions and implementation of IV models. In many health economics and health services research problems, however, linear regression models are now being replaced by nonlinear regression models, including generalized linear models, as these models are often more appropriate for limited-dependent variables, count variables and skewed distributions such as healthcare costs.

Despite the growing use and appreciation of nonlinear models among empirical researchers in health economics and health services research, there appears to be some confusion surrounding the application of IV methods in the context of these models. The present paper addresses this confusion while unifying earlier results under a common nonlinear modeling framework. We carefully examine two IV-based approaches to correcting for endogeneity bias in nonlinear models, two-stage residual inclusion (2SRI) and two-stage predictor substitution (2SPS), focusing especially on a class of nonlinear models that have been widely exploited in empirical health economics and health services research. We show the consistency of the 2SRI estimator in this class of models and reemphasize the inconsistency of the alternative 2SPS approach. Our aims are to demonstrate the superiority of the 2SRI method, to guide applied researchers in carrying out 2SRI estimation when they are trying to address endogeneity in nonlinear models, and to help them understand why they should steer away from the popular 2SPS approach.

2SPS is the rote extension to nonlinear models of the popular linear two-stage least squares (2SLS) estimator. In the first stage of 2SPS, auxiliary (reduced-form) regressions are estimated, and the results are used to generate predicted values for the endogenous variables. The second-stage regression is then conducted for the outcome equation of interest after replacing the endogenous variables with their predicted values. The 2SRI estimator has the same first stage as 2SPS. In the second-stage regression, however, the endogenous variables are not replaced. Instead, the first-stage residuals are included as additional regressors in second-stage estimation. This method was first suggested by Hausman (1978) in the linear context as a means of testing for endogeneity. We focus on these two methods because both have been applied in empirical studies in health economics and health services research, and both can be easily implemented using any modern statistical software package.

We begin in the next section with detailed descriptions of the methods within a unified modeling framework. This framework extends the two-stage least squares (2SLS) linear modeling framework for instrumental variables to nonlinear outcome and/or auxiliary models, encompassing many parametric nonlinear models that are commonly used in empirical health economics and health services research. The statistical properties of the two alternative methods are more formally examined in section 3. There, we note that although the two methods produce identical results in the fully linear model (a special case of the broader class of models we consider), they do not coincide in the generic nonlinear model. Moreover, we show why 2SRI is generally statistically consistent in this broader class, but 2SPS is not. In section 4, we compare the methods using simulated data in the context of two interesting nonlinear models involving endogenous regressors. The results reflect the theoretical consistency of 2SRI and the lack thereof for 2SPS. Further comparisons are drawn between the methods in section 5, wherein we re-estimate Mullahy's (1997) exponential regression model of the effect of prenatal smoking on birthweight using a more flexible functional form. The 2SPS and 2SRI estimates differ substantially. The final section summarizes and concludes. The theoretical consistency of 2SRI, the results of the simulation analyses, and the findings from re-estimation of Mullahy's (1997) model all support the use of 2SRI over 2SPS.

2. The Modeling Framework and Estimation

2.1 The Model

We employ the following nonlinear modeling framework. The main, and minimal, assumption of the model is that the conditional mean of the outcome (y) is of the form

$$E[y \mid x_e, x_o, x_u] = M(x_e\beta_e + x_o\beta_o + x_u\beta_u) \tag{1}$$

where M(•) is a known nonlinear function, and we distinguish among three types of regressors: xe = [xe1 xe2 … xeS] denotes a 1×S vector of endogenous regressors; xo = [xo1 xo2 … xoK] is a 1×K vector of observable exogenous regressors (observable confounders); and xu = [xu1 xu2 … xuS] is a 1×S vector of unobservable confounder latent variables (omitted variables) that influence the outcome y and are correlated with the endogenous variables. Correspondingly, βe and βu are S×1 vectors, and βo is a K×1 vector of unknown regression parameters. Letting β = [βe′ βo′ βu′]′ be the corresponding column vector, the regression model corresponding to (1) is

$$y = M(x_e\beta_e + x_o\beta_o + x_u\beta_u) + e \tag{2}$$

where e is the random error, tautologically defined as e = y − M(xeβe + xoβo + xuβu), so that E[e | x] = 0 for x = [xe xo xu].

At issue here is the correlation between xe and xu – this is the essence of the endogeneity problem. To formalize the relationship between xe and xu, and thereby provide a means for dealing with endogeneity bias through the use of instrumental variables (IV), we define the following set of (possibly) nonlinear auxiliary (or reduced form) equations:

$$x_{es} = r_s(w\alpha_s) + x_{us}, \qquad s = 1, \ldots, S \tag{3}$$

where w = [xo w+], $w^+ = [w_1^+, \ldots, w_{S^+}^+]$ is a 1×S+ vector of identifying IV, and αs is a (K+S+)×1 column vector of parameters. The elements of w+ must satisfy the following three conditions: 1) they cannot be correlated with xu; 2) they must be sufficiently correlated with xe (i.e., they must not be "weak"); and 3) they can neither have a direct influence on y nor be correlated with the error term in (2). Also, there must be at least as many elements in w+ as there are endogenous regressors in (2) (i.e., S+ ≥ S).1

Theoretical and applied econometric studies that implement the nonlinear model specification (2) abound in the health economics and health services research literatures.2 In the remainder of this section, we examine two alternative ways to estimate the parameters of the model in (2)–(3): two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI).3 Both methods extend the familiar linear two-stage least squares (2SLS) method to the nonlinear model defined in (2)–(3), and both have been applied in the health economics and health services research literature.

2.2 Two-Stage Predictor Substitution

The two-stage predictor substitution (2SPS) method is straightforward and simple to implement in the context of the model defined by (2)–(3). In the first stage, obtain consistent estimates of the vectors αs (α̂s) by applying the nonlinear least squares method (NLS) or any other consistent estimation technique to auxiliary equations (3). Next, compute the “predictors” of xe1, …, xeS as

$$\hat{x}_{es} = r_s(w\hat{\alpha}_s), \qquad s = 1, \ldots, S \tag{4}$$

Correspondingly, define the “residuals” in these models as

$$\hat{x}_{us} = x_{es} - r_s(w\hat{\alpha}_s), \qquad s = 1, \ldots, S \tag{5}$$
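For the common special case of linear auxiliary equations, rs(wαs) = wαs as in (8), the first stage reduces to OLS of each endogenous regressor on w, and (4) and (5) are simply the fitted values and residuals. The NumPy sketch below assumes that linear form and a constant column in xo; the helper name `first_stage_linear` and the simulated data are illustrative, not taken from the paper. A nonlinear rs would instead require NLS (or another consistent estimator) in place of `np.linalg.lstsq`.

```python
import numpy as np

def first_stage_linear(xe, w):
    """Hypothetical helper: OLS first stage for x_es = w @ alpha_s + x_us.

    xe : (n, S) array of endogenous regressors
    w  : (n, K + S_plus) array [xo, w_plus], with a constant column in xo
    Returns alpha_hat (K + S_plus, S), predictors xe_hat (eq. 4), residuals xu_hat (eq. 5).
    """
    alpha_hat, *_ = np.linalg.lstsq(w, xe, rcond=None)
    xe_hat = w @ alpha_hat      # predictors, eq. (4)
    xu_hat = xe - xe_hat        # residuals, eq. (5)
    return alpha_hat, xe_hat, xu_hat

# Illustrative data: one endogenous regressor, one observed confounder, one instrument
rng = np.random.default_rng(0)
n = 1_000
xo = np.column_stack([np.ones(n), rng.normal(size=n)])
w_plus = rng.normal(size=(n, 1))
w = np.hstack([xo, w_plus])
xu = rng.normal(size=n)
xe = (w @ np.array([0.5, 0.3, 1.0]) + xu).reshape(-1, 1)

alpha_hat, xe_hat, xu_hat = first_stage_linear(xe, w)
print(alpha_hat.ravel())
```

By construction the OLS residuals are orthogonal to every column of w, which is the sample counterpart of the requirement that the instruments be uncorrelated with xu.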

In the second stage, to estimate γ = [γe′ γo′]′, apply NLS to

$$y = M(x_o\gamma_o + \hat{x}_e\gamma_e) + e_{2SPS} \tag{6}$$

where x̂e = [x̂e1 x̂e2 … x̂eS], γe is S×1, γo is K×1, and e2SPS denotes the regression error term. Consider, for instance, the linear case in which (1) becomes

$$E[y \mid x] = x_e\beta_e + x_o\beta_o + x_u\beta_u \tag{7}$$

In addition, assume that the auxiliary equations (3) are linear, i.e.

$$x_{es} = w\alpha_s + x_{us}, \qquad s = 1, \ldots, S \tag{8}$$

In this case, 2SPS is identical to the popular two-stage least squares (2SLS) [or linear instrumental variables (IV)] method and is, therefore, consistent (Greene, 2003). For a heuristic understanding of how 2SLS works, consider the linear regression model obtained by combining (2) and (7). The problem with naïve OLS regression of y on [xe xo] (ignoring the unobserved confounders xu) is that the error term in this regression is equal to (xuβu + e); this regression error is correlated with xe, thereby introducing bias in the OLS estimators of the coefficients of [xe xo]. 2SLS works by replacing xeβe in (2) with (x̂eβe + x̂uβe) and then relegating x̂uβe to the error term. The error thus becomes (x̂uβe + xuβu + e) which is easily shown to be uncorrelated with the regressors x̂e and xo.
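The heuristic can be checked numerically. The sketch below simulates a hypothetical fully linear design satisfying (2), (7), and (8) (all parameter values are illustrative assumptions), then compares naive OLS with 2SPS and with the textbook 2SLS formula (X′PW X)⁻¹X′PW y, where PW projects onto the columns of w and X = [xe xo].

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50_000
# Hypothetical linear DGP: beta_e = 1, beta_o = (0.5, -0.3), beta_u = 2
xo = np.column_stack([np.ones(n), rng.normal(size=n)])
w_plus = rng.normal(size=n)
w = np.column_stack([xo, w_plus])                    # w = [xo w+]
xu = rng.normal(size=n)
xe = 0.5 + 0.3 * xo[:, 1] + 1.0 * w_plus + xu        # eq. (8)
y = 1.0 * xe + xo @ np.array([0.5, -0.3]) + 2.0 * xu + rng.normal(size=n)

# Naive OLS of y on [xe xo]: xu*beta_u is left in the error and is correlated with xe
beta_naive = ols(np.column_stack([xe, xo]), y)

# 2SPS: replace xe by its first-stage predictor, then regress y on [xe_hat xo]
xe_hat = w @ ols(w, xe)
beta_2sps = ols(np.column_stack([xe_hat, xo]), y)

# Textbook 2SLS: (X' P_W X)^{-1} X' P_W y with X = [xe xo]
X = np.column_stack([xe, xo])
PX = w @ np.linalg.solve(w.T @ w, w.T @ X)           # P_W X, projection onto span(w)
beta_2sls = np.linalg.solve(PX.T @ X, PX.T @ y)

print("naive:", beta_naive[0], " 2SPS:", beta_2sps[0], " 2SLS:", beta_2sls[0])
```

Because xo is itself part of w, the second-stage regressors [x̂e xo] equal the projection PW[xe xo], which is why 2SPS and 2SLS agree to floating-point precision. Naive OLS, in contrast, should land near βe + βu·cov(xe, xu | xo)/var(xe | xo), which is 2 in this design rather than the true βe = 1.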

As we discuss formally in section 3, the consistency of 2SLS in the fully linear model does not generally extend to 2SPS in nonlinear models. The reason hinges on the fact that, in the nonlinear case, neither xuβu nor x̂uβe is additive in the model [i.e., they are "inside" the function M(•)]. Therefore, neither of these terms can simply be moved "outside" of M(•) to become part of the error term e in (2). Applications of inconsistent 2SPS methods in nonlinear health econometric contexts can be found in many papers in health economics.4,5 The 2SRI method, discussed next, addresses this limitation of 2SPS in nonlinear models through an alternative approach that is equivalent to 2SPS in the fully linear model.

2.3 Two-Stage Residual Inclusion

An alternative implementation of the two-stage IV approach in nonlinear models is the two-stage residual inclusion (2SRI) method. The first stage of the 2SRI estimator is identical to that of 2SPS. The second stage of the estimator applies NLS to the following version of (2)

$$y = M(x_e\beta_e + x_o\beta_o + \hat{x}_u\beta_u) + e_{2SRI} \tag{9}$$

where e2SRI is the regression error term, and x̂u = [x̂u1 … x̂uS] is as defined in section 2.2. Note that the actual observed values of the endogenous regressors xe are retained in the second-stage regression model, while the residuals from the auxiliary regressions are substituted for the unobserved confounders xu.6 The reason that 2SRI works is simple: if the auxiliary regression parameters (αs) were known then, by (3), the values of xu would also be known and could be included among the observable controls in NLS estimation. In short, the endogeneity of xe would cease to exist. Although we do not know the αs, we can consistently estimate them, so the resulting residuals x̂u converge to the true xu's as the sample size grows.
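A small simulation makes the contrast between (6) and (9) concrete. The sketch below assumes, purely for illustration, a single endogenous regressor, a linear auxiliary equation, a standard normal xu, a logistic mean function M(a) = 1/(1 + exp(−a)), and a Bernoulli outcome; `nls_logistic` is a hypothetical, hand-rolled Gauss-Newton NLS routine, not a library function, and all parameter values are made up. Under these assumptions the 2SRI estimate of βe should land near its true value of 1, while the 2SPS estimate should not.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def nls_logistic(X, y, max_iter=200, tol=1e-10):
    """Hypothetical helper: Gauss-Newton NLS for y = logistic(X @ b) + error."""
    b = np.zeros(X.shape[1])
    ssr = np.sum((y - logistic(X @ b)) ** 2)
    for _ in range(max_iter):
        m = logistic(X @ b)
        J = (m * (1.0 - m))[:, None] * X              # Jacobian of the mean w.r.t. b
        step, *_ = np.linalg.lstsq(J, y - m, rcond=None)
        t = 1.0
        b_new = b + step
        ssr_new = np.sum((y - logistic(X @ b_new)) ** 2)
        while ssr_new > ssr and t > 1e-8:             # halve the step (down to a floor) if SSR rises
            t /= 2.0
            b_new = b + t * step
            ssr_new = np.sum((y - logistic(X @ b_new)) ** 2)
        b, ssr = b_new, ssr_new
        if np.max(np.abs(t * step)) < tol:
            break
    return b

rng = np.random.default_rng(2)
n = 20_000
xo1 = rng.normal(size=n)                              # observed confounder
w_plus = rng.normal(size=n)                           # instrument
xu = rng.normal(size=n)                               # unobserved confounder
xe = 0.5 + 0.5 * xo1 + 1.0 * w_plus + xu              # linear auxiliary equation, as in (8)
beta_e, beta_o, beta_u = 1.0, np.array([-0.5, 0.5]), 1.5
index = beta_e * xe + beta_o[0] + beta_o[1] * xo1 + beta_u * xu
y = rng.binomial(1, logistic(index)).astype(float)

# First stage: OLS of xe on w = [1, xo1, w_plus]; residuals estimate xu
w = np.column_stack([np.ones(n), xo1, w_plus])
alpha_hat, *_ = np.linalg.lstsq(w, xe, rcond=None)
xe_hat = w @ alpha_hat
xu_hat = xe - xe_hat

# 2SPS, eq. (6): xe replaced by its predictor
g_2sps = nls_logistic(np.column_stack([xe_hat, np.ones(n), xo1]), y)
# 2SRI, eq. (9): xe kept, first-stage residual added as a regressor
b_2sri = nls_logistic(np.column_stack([xe, np.ones(n), xo1, xu_hat]), y)

print(f"true beta_e = {beta_e}, 2SPS = {g_2sps[0]:.3f}, 2SRI = {b_2sri[0]:.3f}")
```

In this design the 2SPS slope is pulled toward zero, roughly because averaging the logistic function over the omitted xu term flattens it; the size of the distortion depends on βe + βu and on the variance of xu, so the specific numbers reflect these assumed values only.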

In the linear case defined by (7) and (8), 2SRI, like 2SPS, is identical to the popular two-stage least squares (2SLS) [or linear instrumental variables (IV)] method and is, therefore, consistent for βe and βo. The 2SRI method is not new. It was first proposed by Hausman (1978) as a means of directly testing for endogeneity in the model defined in equations (7) and (8). It is also easy to show that 2SRI yields a consistent estimate of βu in the linear case.7 Consistent 2SRI methods for specific nonlinear models have been developed by Blundell and Smith (1989, 1993), Newey (1987), Rivers and Vuong (1988), and Smith and Blundell (1986). Wooldridge (1997, 2002) suggests the use of the 2SRI method for count data models. Examples of the use of the 2SRI method in health economics can be found in DeSimone (2002), Baser et al. (2004), Norton and Van Houtven (2006), Shea et al. (2006), Stuart et al. (2007), Gibson et al. (2006), Shin and Moon (2007), and Lindrooth and Weisbrod (2007).8
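The linear-case equivalence can be verified directly. In the hypothetical linear design below (parameter values are illustrative assumptions), the 2SRI coefficients on [xe xo] coincide with the 2SPS (that is, 2SLS) coefficients to floating-point precision, and the coefficient on the first-stage residual estimates βu.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 5_000
# Hypothetical linear DGP satisfying (7) and (8): beta_e = 1, beta_o = (0.5, -0.3), beta_u = 2
xo = np.column_stack([np.ones(n), rng.normal(size=n)])
w = np.column_stack([xo, rng.normal(size=n)])        # w = [xo w+]
xu = rng.normal(size=n)
xe = w @ np.array([0.5, 0.3, 1.0]) + xu
y = 1.0 * xe + xo @ np.array([0.5, -0.3]) + 2.0 * xu + rng.normal(size=n)

xe_hat = w @ ols(w, xe)     # first-stage predictor, eq. (4)
xu_hat = xe - xe_hat        # first-stage residual, eq. (5)

b_2sps = ols(np.column_stack([xe_hat, xo]), y)       # [beta_e, beta_o]
b_2sri = ols(np.column_stack([xe, xo, xu_hat]), y)   # [beta_e, beta_o, beta_u]

print(np.allclose(b_2sri[:3], b_2sps), b_2sri[3])
```

The identity is algebraic: because x̂u is orthogonal to both x̂e and xo, regressing y on [xe xo x̂u] is a reparameterization of regressing y on [x̂e xo x̂u], and the coefficients on x̂e and xo in the latter regression equal those from the 2SPS regression.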

In what follows, we formally show that 2SRI is generally consistent in the broad class of models with endogenous regressors characterized by (2)–(3).

3. Formal treatment of Consistency Properties of the Estimators

As discussed earlier, in the linear model 2SLS = 2SPS = 2SRI, so all three methods are consistent. These identities do not, however, hold in the generic nonlinear case, so the consistency of each method must be examined individually. To prove the consistency of 2SRI, we cast it as a special case of the generic two-stage optimization estimator (OE) (see Newey and McFadden, 1994; White, 1994, Chapter 6; or Wooldridge, 2002, Chapter 12). For simplicity of exposition, let us assume that xe and xu each have only one element (i.e., the model involves only one endogenous variable [S = 1]); extension to the higher-dimensional case is straightforward. In this case, there is one auxiliary regression equation and only one vector of auxiliary parameters (α) to be estimated [i.e., xe = r(wα) + xu]. In what follows, we ignore estimation of the auxiliary regression parameters (α) because, given (3), we can obtain consistent estimates of α via NLS. Abstracting from the estimation of α and couching the discussion in the OE context, it follows that the second-stage NLS 2SRI estimator is consistent for the value of λ = [λe′ λo′ λu′]′ that optimizes

$$E\left[\left(y - M\big(x_e\lambda_e + x_o\lambda_o + (x_e - r(w\alpha))\lambda_u\big)\right)^2\right] \tag{10}$$

Therefore, establishing the consistency of the 2SRI estimator for β, as defined in (2), amounts to showing that β is equal to the specific value of λ that optimizes (10). Now, given the properties of the identifying IV (w+) we can replace (1) with

$$E[y \mid x_e, w, x_u] = M(x_e\beta_e + x_o\beta_o + x_u\beta_u) \tag{11}$$

Therefore, β is the parameter value at which M(•) equals the conditional mean of y given xe, xo, xu, and w+. It then follows (Goldberger, 1991, p. 53) that β is the best-predictor parameter value under the mean squared error criterion. Because xu = xe − r(wα), this is the same as saying that β is the optimizer of (10), with λ = β. The consistency of NLS 2SRI for β is thus established.

A similar line of reasoning breaks down for the 2SPS estimator. In the OE framework, the NLS 2SPS estimator is consistent for the value of δ = [δe′ δo′]′ that optimizes

$$E\left[\left(y - M\big(r(w\alpha)\delta_e + x_o\delta_o\big)\right)^2\right] \tag{12}$$

At issue here is whether βe and βo as defined in (11) are equal to the optimizing values of δe and δo, respectively, for (12). We could in this case make the “best predictor” argument, as we did for 2SRI, if we could establish that E[y | w] = M(r(wα)βe + xoβo). But this is not possible, in general. The problem is that, although by definition E[xu | w] = 0 (one of the properties of the IV), the term xuβu cannot, in general, be eliminated from (11) by conditioning on w because it is “inside” the nonlinear function M(•).9
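To see the problem in a concrete case, suppose, as assumptions beyond the general framework, that S = 1, the auxiliary equation is linear as in (8), so that xe = wα + xu, and M(•) = exp(•). Iterating expectations over (11) then gives

$$
\begin{aligned}
E[y \mid w] &= E\big[\exp(x_e\beta_e + x_o\beta_o + x_u\beta_u) \mid w\big] \\
&= E\big[\exp\big(w\alpha\,\beta_e + x_o\beta_o + x_u(\beta_e + \beta_u)\big) \mid w\big] \\
&= \exp(w\alpha\,\beta_e + x_o\beta_o)\; E\big[\exp\{x_u(\beta_e + \beta_u)\} \mid w\big].
\end{aligned}
$$

The 2SPS target in (12) is exp(wαδe + xoδo), so the two coincide at (δe, δo) = (βe, βo) only if E[exp{xu(βe + βu)} | w] = 1. If that expectation is a constant c ≠ 1, only the intercept in βo is shifted, by ln c (assuming xo contains a constant); if it varies with w, the slope parameters are distorted as well.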

The general inconsistency of 2SPS in the context of the model defined in (2)–(3) leads one to speculate about the possibility of specifying a similar nonlinear parametric model for which 2SPS would be consistent in the presence of endogeneity. We were unable to come up with an equally general specification and, given the analyses in the present section (and the supplementary appendices available from the first author upon request), do not believe that such a specification exists.10

4. Simulation Analysis

As a follow-up to the discussion in the previous section, we explore potential biases from 2SPS relative to 2SRI estimation using simulated data in a few interesting nonlinear models involving endogenous regressors. Each of these examples is inspired by a published study in the health economics literature.

4.1 A Duration Model with Multinomial Endogenous Treatments

Gowrisankaran and Town (1999) seek to estimate the effect of hospital choice on patient mortality hazard rates. In order to deal with the potential endogeneity of the hospital choice variables, they linearize both the hazard model and the multinomial model of inpatient choices, and implement the conventional linear IV method. They justify this approach by stating that "The reason that we use a linear probability model instead of a more common Weibull or Lognormal specifications for the hazard model is that it is extremely difficult to use nonlinear models such as these with endogenous variables" (p. 754). Such a model can be specified as a special case of our modeling framework, and consistently estimated via the 2SRI method.11

To illustrate this point, we constructed a simulation design in which the data generating process for y is assumed to be Weibull distributed, conditional on xe, w, and xu; and xe = [xe1 xe2] where xej is binary with value 1 if and only if the jth multinomial logit (MNL) alternative is manifested (xej = 0 otherwise).12 Note that there are actually three possible MNL outcomes in this model (j = 0, 1, 2) but under the usual identification restriction, the model has been normalized on one of the outcomes (j = 0). Using this sampling design, we generated 1000 samples of size n = 5,000 and to each of them applied four different estimators: 1) True Model – maximum likelihood estimation (MLE) based on the actual model used to simulate the data; 2) Naïve Model -- MLE ignoring the unobservable confounders (xu); 3) 2SPS; and 4) 2SRI.13 Using the results from each of these models, we estimated the following average treatment effects as a basis for comparison of estimator performance14

$$E[y_{x_{e\omega}=1}] - E[y_{x_{e\xi}=1}], \qquad \omega, \xi = 0, 1, 2, \ \xi \neq \omega \tag{13}$$

that is, the effect on survival of exogenously imposing treatment option ω vs. treatment option ξ, where $y_{x=1}$ denotes the random variable representing the counterfactual outcome as it would be under the exogenously imposed x = 1 scenario.15
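In this design the first stage is a multinomial logit, so the conditional mean of each binary indicator xej given w is the MNL probability Pj(w), and a natural set of first-stage residuals is x̂uj = xej − P̂j(w) for j = 1, 2. The sketch below shows only that residual step, assuming raw-difference residuals; the coefficient matrix `alphas` stands in for first-stage MNL estimates and is hypothetical, as are the simulated data and the helper `mnl_probs`. Fitting the MNL itself is not shown.

```python
import numpy as np

def mnl_probs(w, alphas):
    """Hypothetical helper: MNL probabilities with alternative 0 as the base.

    w: (n, p) instruments and exogenous covariates; alphas: (p, J) coefficients
    for alternatives 1..J (alternative 0 normalized to zero). Returns (n, J + 1).
    """
    idx = np.column_stack([np.zeros(w.shape[0]), w @ alphas])
    idx -= idx.max(axis=1, keepdims=True)      # stabilize exp without changing the ratios
    e = np.exp(idx)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
n = 6
w = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
alphas = np.array([[0.2, -0.4], [0.8, 0.1], [-0.5, 0.9]])   # stand-in for first-stage MNL estimates
probs = mnl_probs(w, alphas)

# Simulated observed choices (0, 1, or 2) and the binary treatment indicators [xe1 xe2]
choice = np.array([rng.choice(3, p=p) for p in probs])
xe = np.column_stack([choice == 1, choice == 2]).astype(float)

# Residuals x_uj = x_ej - P_j(w) for j = 1, 2 (the normalized alternative 0 is dropped)
xu_hat = xe - probs[:, 1:]
print(xu_hat)
```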

The results of our simulation analysis are displayed in Table 1. For each of the True, Naïve, 2SPS, and 2SRI estimators we give the percent absolute bias of the estimated value of (13) relative to the corresponding true values across the 1000 simulated samples. The differences between 2SPS and 2SRI in the estimation of the average effects in (13) are striking. The mean overall bias for the 2SPS estimated average effects (the simple average of the first three entries in the 2SPS column of Table 1) is 36% as compared to 8% for 2SRI (as a baseline for comparison, note that this value is 8% for the True Model). To investigate the theoretical prediction that the 2SRI estimates should converge toward the true values as the sample size increases, we also simulated 1000 replicates of size 20,000 and repeated the estimations. These results, given in the bottom half of Table 1, comport with the consistency of the 2SRI estimator and the lack thereof for 2SPS. Huge differences between 2SPS and 2SRI with regard to bias in the estimation of the average effects persist despite the increase in sample size. The overall bias for 2SPS (the simple average of the last three entries in the 2SPS column of Table 1) remains high at 27% but drops to 1.3% for 2SRI (the analogous value for the True Model is 1.3%).

Table 1.

Simulation Results for Weibull Outcome with Multinomial Endogenous Variables: Average Absolute % Bias

| Average effect | True Model (% bias) | Naïve Model (% bias) | 2SPS Model (% bias) | 2SRI Model (% bias) |
|---|---|---|---|---|
| Based on 1000 replicates of size n = 5,000 | | | | |
| $E[y_{x_{e1}=1}] - E[y_{x_{e0}=1}]$ | 16% | 205% | 28% | 16% |
| $E[y_{x_{e2}=1}] - E[y_{x_{e0}=1}]$ | 2% | 99% | 38% | 2% |
| $E[y_{x_{e2}=1}] - E[y_{x_{e1}=1}]$ | 5% | 34% | 42% | 5% |
| Based on 1000 replicates of size n = 20,000 | | | | |
| $E[y_{x_{e1}=1}] - E[y_{x_{e0}=1}]$ | 2% | 205% | 2% | 2% |
| $E[y_{x_{e2}=1}] - E[y_{x_{e0}=1}]$ | 2% | 97% | 27% | 2% |
| $E[y_{x_{e2}=1}] - E[y_{x_{e1}=1}]$ | 0% | 34% | 51% | 0% |
The value in a particular cell of the table is the average percentage absolute bias, over the 1000 simulated samples, for a particular (estimator q, average effect t, sample size r) combination, and is measured as

$$\left(\frac{1}{1000}\sum_{m=1}^{1000}\frac{\left|\widehat{AE}(t)_{qrm} - AE(t)\right|}{\left|AE(t)\right|}\right)\times 100\%$$

where AE(t) denotes the true value of the tth effect and $\widehat{AE}(t)_{qrm}$ is its estimated value obtained by applying the qth method to the mth sample of the rth sample size, with q ∈ {true MLE, naïve, 2SPS, 2SRI}, t ∈ {$E[y_{x_{e1}=1}] - E[y_{x_{e0}=1}]$, $E[y_{x_{e2}=1}] - E[y_{x_{e0}=1}]$, $E[y_{x_{e2}=1}] - E[y_{x_{e1}=1}]$}, and r ∈ {5000, 20000}.
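The bias measure in the table notes is straightforward to compute from replicate-level estimates. A minimal sketch follows (the function name `avg_pct_abs_bias` and the replicate values are illustrative). Note that it averages absolute deviations, so errors of opposite sign across replicates do not cancel.

```python
import numpy as np

def avg_pct_abs_bias(estimates, true_value):
    """Average percentage absolute bias over replicates:
    (1/M) * sum_m |estimate_m - true_value| / |true_value| * 100."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean(np.abs(estimates - true_value)) / abs(true_value)

# Made-up replicate estimates of one effect whose true value is 2.0
bias = avg_pct_abs_bias([2.2, 1.8, 2.1, 1.9], 2.0)
print(bias)  # ≈ 7.5
```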

4.2 Ordered Logit with a Count-valued Endogenous Treatment

Lu and McGuire (2002) seek to estimate the effect of treatment on subsequent substance abuse. The ordinal outcome in their model can be represented by the vector y = [y1 y2 y3 y4], where

y1 = 1 if the client got worse at the time of discharge (used more drugs than at admission), 0 otherwise

y2 = 1 if the client’s drug use frequency stayed the same, 0 otherwise

y3 = 1 if the client got better but did not achieve abstinence, 0 otherwise

y4 = 1 if abstinence was achieved, 0 otherwise.

The endogenous variable in the regression is

xe = log of number of visits the client makes during the episode of treatment.

They model y as an ordinal logit regression, and estimate the parameters using 2SPS. As discussed above, this estimator is not consistent. To illustrate this point we generated 1000 replicates each of sample size n = 10,000 [in line with the Lu and McGuire (2002) sample size of 13,362] based on an ordered logit sampling design.16 Using the appropriate ordered logit MLE, for each sample we estimated the True (actual value of xu used) and Naïve (xu not included) versions of the model. We also applied the 2SPS estimator used by Lu and McGuire (2002), and the 2SRI ordered logit MLE. The results from each of these models were then used to estimate the following average probability effects attributable to specified exogenous changes in xe

$$P\big(y_j(x_e = b) = 1\big) - P\big(y_j(x_e = a) = 1\big), \qquad j = 1, \ldots, 4 \tag{14}$$

that is, the effect of an exogenous change in xe from a to b on the probability of the individual being in outcome category j, where yj(xe = c) denotes the random variable representing the counterfactual outcome as it would be under the exogenously imposed xe = c scenario.17
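For an ordered logit outcome, each term in (14) is a difference of category probabilities, P(y = j | a) = Λ(κj − a) − Λ(κj−1 − a) with κ0 = −∞ and κ4 = +∞, averaged over the sample. The sketch below assumes that standard ordered logit parameterization, with cutpoints κ and the index written as βe·xe plus all other index terms (for 2SRI, these would include the first-stage residual term). The cutpoints, βe, the helper names, and the simulated `other_index` values are hypothetical.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def ordered_logit_probs(index, cuts):
    """Category probabilities P(y = j), j = 1..J, for an ordered logit with
    latent y* = index + logistic error and increasing cutpoints `cuts` (J - 1 of them)."""
    cdf = logistic(np.asarray(cuts)[None, :] - index[:, None])   # P(y* <= cut_k)
    n = index.shape[0]
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    return np.diff(cdf, axis=1)

def avg_prob_effects(beta_e, other_index, cuts, a, b):
    """Sample analogue of (14) for every category: mean of P_j(xe = b) - P_j(xe = a)."""
    p_b = ordered_logit_probs(beta_e * b + other_index, cuts)
    p_a = ordered_logit_probs(beta_e * a + other_index, cuts)
    return (p_b - p_a).mean(axis=0)

rng = np.random.default_rng(5)
other_index = rng.normal(size=10_000)   # placeholder for the non-xe part of each index
effects = avg_prob_effects(0.8, other_index, cuts=[-1.0, 0.0, 1.5], a=np.log(2), b=np.log(4))
print(effects)
```

Because the category probabilities sum to one at any index value, the four effects always sum to zero, which is a handy check on an implementation.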

The simulation results are displayed in Table 2. For samples of size n = 10,000, 2SRI clearly dominates 2SPS in estimating the probability effects (14). Taking the simple average of the first four entries in the 2SPS column of Table 2 yields an overall mean bias of 26.5%, while the analogous measure for 2SRI is 7% (the baseline measure from the True Model is 5%). To track the behavior of the estimators as the sample size increases, we reran the analysis with n = 20,000. The size of the 2SPS bias remained virtually unchanged, while the bias in 2SRI estimation all but disappeared. Here, as in the Weibull duration analysis, the simulation results support the theoretical consistency of the 2SRI estimator.

Table 2.

Simulation Results for Ordered Categorical Outcome with Count Endogenous Variable: Average Absolute % Bias

| Average effect | True Model (% bias) | Naïve Model (% bias) | 2SPS Model (% bias) | 2SRI Model (% bias) |
|---|---|---|---|---|
| Based on 1000 replicates of size n = 10,000 | | | | |
| $P(y_1(x_e = \ln 4) = 1) - P(y_1(x_e = \ln 2) = 1)$ | 0% | 20% | 28% | 4% |
| $P(y_2(x_e = \ln 4) = 1) - P(y_2(x_e = \ln 2) = 1)$ | 1% | 67% | 29% | 1% |
| $P(y_3(x_e = \ln 4) = 1) - P(y_3(x_e = \ln 2) = 1)$ | 15% | 367% | 44% | 19% |
| $P(y_4(x_e = \ln 4) = 1) - P(y_4(x_e = \ln 2) = 1)$ | 3% | 191% | 5% | 4% |
| Based on 1000 replicates of size n = 20,000 | | | | |
| $P(y_1(x_e = \ln 4) = 1) - P(y_1(x_e = \ln 2) = 1)$ | 0% | 20% | 28% | 0% |
| $P(y_2(x_e = \ln 4) = 1) - P(y_2(x_e = \ln 2) = 1)$ | 0% | 67% | 29% | 0% |
| $P(y_3(x_e = \ln 4) = 1) - P(y_3(x_e = \ln 2) = 1)$ | 0% | 367% | 37% | 0% |
| $P(y_4(x_e = \ln 4) = 1) - P(y_4(x_e = \ln 2) = 1)$ | 0% | 191% | 8% | 0% |
The value in a particular cell of the table is the average percentage absolute bias, over the 1000 simulated samples, for a particular (estimator q, average effect t, sample size r) combination, and is measured as

$$\left(\frac{1}{1000}\sum_{m=1}^{1000}\frac{\left|\widehat{AE}(t)_{qrm} - AE(t)\right|}{\left|AE(t)\right|}\right)\times 100\%$$

where AE(t) denotes the true value of the tth effect and $\widehat{AE}(t)_{qrm}$ is its estimated value obtained by applying the qth method to the mth sample of the rth sample size, with q ∈ {true MLE, naïve, 2SPS, 2SRI}, t ∈ {$P(y_j(x_e = \ln 4) = 1) - P(y_j(x_e = \ln 2) = 1)$ : j = 1, …, 4}, and r ∈ {10000, 20000}.

5. Mullahy’s Birthweight Model Revisited

To demonstrate the potential differences that might arise in actual practice between the 2SPS and 2SRI estimates, we re-estimated Mullahy’s (1997) model of the effect of prenatal cigarette smoking on birthweight using data supplied by the author. Mullahy (1997) suspects that maternal smoking during pregnancy may be correlated with the unobservable determinants of birthweight, so he specifies a nonlinear conditional mean regression model, which can be viewed as a special case of (1). In Mullahy’s model, birthweight (y) is the following function of prenatal smoking (xe), other observable determinants (xo) and a scalar representing the unobservable birthweight determinants that are correlated with prenatal smoking (xu)18

$$y = \exp(x_e\beta_e + x_o\beta_o + x_u) + e \tag{15}$$

where x and β are defined canonically, and e is the random error term which is tautologically defined as e = y − exp(xeβe + xoβo + xu), so that E[e | x] = 0. Mullahy (1997) demonstrates that, given a vector of instrumental variables w = [xo w+], if the following conditions hold

$$E[\exp(x_u) \mid w] = 1 \quad\text{and}\quad E[y \mid x_e, w, x_u] = E[y \mid x_e, x_o, x_u] \tag{16}$$

then βe and βo can be consistently estimated via a generalized method of moments (GMM) estimator that does not require explicit specification of an auxiliary regression of xe on w.19

We implemented a flexible functional form for which GMM is not feasible (see Terza, 2006a) and 2SPS is not consistent. Specifically, we replaced (15) with the following variant of the inverse of the Box-Cox (1964) model originally suggested by Wooldridge (1992) for nonlinear models that do not involve endogeneity

$$E[y \mid x_e, x_o, x_u] = k(x_e\beta_e + x_o\beta_o + x_u\beta_u, \tau) \tag{17}$$

where

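$$k(a, \tau) = \begin{cases} \left(\left(\frac{\tau}{2}a + 1\right)^2\right)^{1/\tau} & \tau \neq 0 \\ \exp(a) & \tau = 0 \end{cases}$$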

and 0 ≤ τ ≤ 2. This version of the inverse Box-Cox (IBC) model maintains the desired positivity of the regression function (regardless of the values of τ and xeβe + xoβo + xuβu) and possesses all of the essential properties of Wooldridge's (1992) IBC formulation. In particular, k(a, τ) subsumes the linear model when τ = 2, and k(a, τ) → exp(a) as τ → 0. We estimated the parameters of (17) using both the 2SRI and 2SPS estimators. Following Mullahy (1997), who states that "… a linear reduced form for CIGARETTES may not be unreasonable," we specify the auxiliary regression for prenatal cigarette consumption (xe) as in (8).
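A direct transcription of k(a, τ) is a quick way to check the limiting cases stated above. The sketch below (the function name `ibc_mean` is illustrative) shows that k(a, 2) = |a + 1|, which is the linear function a + 1 wherever a > −1, and that k(a, τ) approaches exp(a) for small τ.

```python
import numpy as np

def ibc_mean(a, tau):
    """k(a, tau) from the text: (((tau/2)*a + 1)**2)**(1/tau) if tau != 0, else exp(a)."""
    a = np.asarray(a, dtype=float)
    if tau == 0:
        return np.exp(a)
    return (((tau / 2.0) * a + 1.0) ** 2) ** (1.0 / tau)

a = np.linspace(-0.5, 2.0, 6)
print(ibc_mean(a, 2.0))     # |a + 1|, i.e. a + 1 on this grid
print(ibc_mean(a, 1e-6))    # numerically close to exp(a)
print(np.exp(a))
```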

We used the same variables as did Mullahy: y = the newborn's weight measured in pounds; xe = number of cigarettes smoked per day during pregnancy (CIGSPREG); xo = [1 PARITY WHITE MALE]; w+ = [EDFATHER EDMOTHER FAMINCOM CIGTAX88]; PARITY = birth order; WHITE = 1 if white, 0 otherwise; MALE = 1 if male, 0 otherwise; EDFATHER = paternal schooling (yrs.); EDMOTHER = maternal schooling (yrs.); FAMINCOM = family income (× 10⁻³); CIGTAX88 = per-pack state excise tax on cigarettes. The descriptive statistics of the sample are given in Table 3.

Table 3.

Descriptive Statistics of Sample for Re-Analysis of Mullahy's Birthweight Model

| Variable | Mean | Max | Min |
|---|---|---|---|
| BIRTHWT (lb.) | 7.42 | 16.94 | 1.44 |
| CIGSPREG | 2.09 | 50.00 | 0.00 |
| PARITY | 1.63 | 6.00 | 1.00 |
| WHITE | 0.78 | 1.00 | 0.00 |
| MALE | 0.52 | 1.00 | 0.00 |
| EDFATHER | 11.32 | 18.00 | 0.00 |
| EDMOTHER | 12.93 | 18.00 | 0.00 |
| FAMINCOM | 29.03 | 65.00 | 0.50 |
| CIGTAX88 | 19.55 | 38.00 | 2.00 |

For 2SRI estimation, we applied OLS to the linear auxiliary equation, and used NLS to estimate β and τ in the following version of (9)

$$y = k(x_e\beta_e + x_o\beta_o + \hat{x}_u\beta_u, \tau) + e_{2SRI} \tag{18}$$

where x̂u =xe − wα̂ and α̂ denotes the first-stage OLS estimator of α.20 For 2SPS estimation, we implemented the same first stage estimator of α but in the second stage applied NLS to

$$y = k(\hat{x}_e\delta_e + x_o\delta_o, \tau) + e_{2SPS} \tag{19}$$

where x̂e denotes the first-stage OLS predictor of xe. The first-stage OLS estimates are given in Table 4. The 2SRI and 2SPS results are shown in Table 5. As in the simulation analyses of the previous section, the ultimate estimation objective is the causal effect of an exogenous change in prenatal smoking frequency on birthweight. For instance, consider

Table 4.

First Stage OLS Estimates of Auxiliary Regression in the Re-Analysis of Mullahy's Birthweight Model

| Variable | Estimate | t-stat |
|---|---|---|
| CONSTANT | 6.74 | 6.49 |
| PARITY | 0.30 | 1.72 |
| WHITE | 0.78 | 1.89 |
| MALE | −0.04 | −0.13 |
| EDFATHER | −0.12 | −3.14 |
| EDMOTHER | −0.33 | −4.37 |
| FAMINCOM | −0.02 | −2.01 |
| CIGTAX88 | 0.03 | 1.43 |

Second column gives estimates of the elements of α in xe = wα + xu.

$$E[y_0] - E[y_{20}] \tag{20}$$

that is, the average effect of an exogenously imposed decrease in prenatal smoking from one pack (20 cigarettes) per day to zero usage, where $y_{x_e^*}$ is the random variable representing birthweight as it would be under the exogenously imposed prenatal smoking level xe*. Terza (2006b) shows that, under general conditions, we can rewrite (20) as E[E[y | xe = 0, xo, xu] − E[y | xe = 20, xo, xu]], which, when combined with (16), yields

$$E[y_0] - E[y_{20}] = E\left[k(x_o\beta_o + x_u\beta_u, \tau) - k(20\beta_e + x_o\beta_o + x_u\beta_u, \tau)\right] \tag{21}$$

We estimated (21), using the 2SRI results, as

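$$\frac{1}{n}\sum_{i=1}^{n}\left\{k(x_{oi}\hat{\beta}_o + \hat{x}_{ui}\hat{\beta}_u, \hat{\tau}) - k(20\hat{\beta}_e + x_{oi}\hat{\beta}_o + \hat{x}_{ui}\hat{\beta}_u, \hat{\tau})\right\} \tag{22}$$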

where x̂u denotes the first-stage OLS residual, and the hats (^) indicate the 2SRI estimates. Alternatively, we estimated (21) with the 2SPS results using

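$$\frac{1}{n}\sum_{i=1}^{n}\left\{k(x_{oi}\tilde{\delta}_o, \tilde{\tau}) - k(20\tilde{\delta}_e + x_{oi}\tilde{\delta}_o, \tilde{\tau})\right\} \tag{23}$$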

where the tildes (~) indicate the 2SPS estimates. The values of (22) and (23) are given in the last row of Table 5. As shown there, the predicted effects of an exogenous reduction in smoking from a pack per day to abstinence differ substantially between the two methods. Given the consistency of the 2SRI estimates, the results imply that 2SPS overstates (in absolute terms) the effect on birthweight by approximately 7 oz. (1.85 − 1.41 = 0.44 lb.), which is about 6% of the sample mean birthweight.
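The sample analogues (22) and (23) are simple averages once the second-stage estimates are in hand. The sketch below assumes those estimates have already been obtained; the parameter values, the helper names, and the simulated xo and residual columns are placeholders, not the estimates reported in Table 5. It defines a small helper `ibc_mean` for k(a, τ) so that it runs on its own.

```python
import numpy as np

def ibc_mean(a, tau):
    """k(a, tau): (((tau/2)*a + 1)**2)**(1/tau) if tau != 0, else exp(a)."""
    a = np.asarray(a, dtype=float)
    if tau == 0:
        return np.exp(a)
    return (((tau / 2.0) * a + 1.0) ** 2) ** (1.0 / tau)

def effect_2sri(xo, xu_hat, b_e, b_o, b_u, tau):
    """Sample analogue (22): mean of k(xo b_o + xu_hat b_u) - k(20 b_e + xo b_o + xu_hat b_u)."""
    base = xo @ b_o + xu_hat * b_u
    return np.mean(ibc_mean(base, tau) - ibc_mean(20.0 * b_e + base, tau))

def effect_2sps(xo, d_e, d_o, tau):
    """Sample analogue (23): the same contrast, with no residual term in the index."""
    base = xo @ d_o
    return np.mean(ibc_mean(base, tau) - ibc_mean(20.0 * d_e + base, tau))

# Placeholder data and estimates (hypothetical; not the values reported in Table 5)
rng = np.random.default_rng(6)
n = 1_000
xo = np.column_stack([np.ones(n), rng.integers(1, 4, n), rng.integers(0, 2, n), rng.integers(0, 2, n)])
xu_hat = rng.normal(scale=5.0, size=n)        # stand-in for first-stage OLS residuals
b_o = np.array([2.0, 0.01, 0.05, 0.03])

eff_2sri = effect_2sri(xo, xu_hat, b_e=-0.01, b_o=b_o, b_u=0.005, tau=0.5)
eff_2sps = effect_2sps(xo, d_e=-0.01, d_o=b_o, tau=0.5)
print(eff_2sri, eff_2sps)
```

The 2SRI version averages over the first-stage residuals, which stand in for the unobserved xu in (21); the 2SPS version has no such term, which is one source of the gap between the two estimates.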

Table 5.

Estimation Results: IBC Version of Mullahy's Birthweight Model

| Variable | 2SPS IBC Coeff | 2SPS IBC t-stat | 2SRI IBC Coeff | 2SRI IBC t-stat |
|---|---|---|---|---|
| CONSTANT | 58.96 | 22.07 | 1.27 | 215.33 |
| CIGSPREG | −1.92 | −3.47 | −0.004 | −3.52 |
| PARITY | 3.23 | 3.26 | 0.01 | 3.39 |
| WHITE | 9.64 | 4.85 | 0.02 | 4.64 |
| MALE | 4.64 | 2.81 | 0.01 | 2.93 |
| First-stage residual (xu) | n/a | n/a | 0.024 | 2.03 |
| Box-Cox parameter τ | 5.20 | n/a | −0.95 | n/a |
| E[y0] − E[y20] | 1.85 | | 1.41 | |

Second and fourth columns, respectively, give the estimates of the parameters in equations (19) and (18). The last row shows the 2SPS and 2SRI estimated effects of an exogenously imposed one-pack-per-day reduction in smoking, respectively.

6. Discussion

We have examined two estimation methods that are commonly used in health economic applications involving nonlinear models with endogenous regressors: two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI). The discussion begins with a detailed description of the estimators in an intuitively appealing nonlinear regression framework that explicitly accounts for endogeneity (i.e., the presence of unobservable confounders). Within that framework we show that the 2SRI estimator is generally consistent while the 2SPS approach is not. To assess the potential extent of the bias in 2SPS estimation, we conducted simulation analyses based on two studies found in the recent health economics literature: one in which an inherently nonlinear duration model involving endogeneity was "linearized" in order to apply conventional linear instrumental variables (IV) methods, and another in which 2SPS was applied in order to deal with an endogenous regressor in an ordered logit model. The results reveal that potential bias from the use of 2SPS can be substantial and that such bias is not attenuated as the sample size increases. As a follow-up to the simulation analyses, and to examine the possible differences between 2SRI and 2SPS estimates in a real-world estimation setting, we revisited the study of the effect of prenatal cigarette smoking on birthweight conducted by Mullahy (1997). Using a flexible functional form, we re-estimated the model and found considerable differences between the estimates obtained via 2SPS vs. 2SRI.

Our theoretical results, combined with those from the simulation and replication studies, favor the use of 2SRI for the econometric estimation of nonlinear models with endogenous regressors. We hope that this work will serve as a guide to applied researchers in health economics and health services research.

Acknowledgments

This research was supported by the National Institute on Drug Abuse (R01 DA013968-02) and the Substance Abuse Policy Research Program of the Robert Wood Johnson Foundation (53902). The authors are grateful for the helpful comments of Libby Dismuke and David Bradford, and for the excellent research assistance provided by F. Michael Kunz. We also thank the editor and two anonymous reviewers for their many suggestions, which improved the presentation.

Footnotes

1

The model can be generalized in a number of ways. First, it can be extended to allow the outcome ($y$) to be vector-valued, multiple regression indexes (i.e., $x\beta$ is a vector), and multiple auxiliary regression indexes (i.e., $w\alpha$ is a vector). Second, this nonlinear framework constitutes the most parametrically parsimonious specification of the regression of $y$ on $x$: only the key conditional mean regressions [(2) and (3)] are specified. The model can, of course, be more fully specified by positing higher-order conditional moment restrictions (heteroskedasticity, skewness, kurtosis, etc.). Indeed, the model can be made fully parametric by specifying the joint probability density function of $(y \mid x)$ and $(x_e \mid w)$. A more general version of the model incorporating these features can be found in an appendix that will be supplied upon request. All results presented here for the simple model in (2) and (3) are valid for, and can easily be extended to, the general model.

3

Terza (2006a) shows that the GMM cannot, in general, be directly applied to the model defined in (2).

5

Lee (1979, 1981) has developed a 2SPS method that is consistent under special conditions when the outcome is binary and the endogenous variable is continuous. Bollen et al. (1995), Norton et al. (1998), and Mroz et al. (1999) apply this method.

6

The 2SRI estimator can be cast as a special case of the conventional generic two-stage optimization estimator. Therefore, its asymptotic properties (in particular correct asymptotic standard errors) follow directly from the discussions found in Newey and McFadden (1994), White (1994, Chapter 6), or Wooldridge (2002, Chapter 12). Alternatively, the standard errors of the 2SRI estimator can be obtained via bootstrapping.
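Footnote 6 mentions bootstrapping as an alternative route to 2SRI standard errors. The essential point is that each bootstrap replication must re-estimate both stages, so that first-stage sampling error propagates into the second-stage coefficients. The sketch below assumes, for illustration only, a linear first stage estimated by OLS and a Poisson (exponential mean) second stage; the helper names (`fit_2sri`, `bootstrap_se`, etc.) are hypothetical and the simulated data are made up.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def poisson_fit(X, y, max_iter=100, tol=1e-10):
    """Poisson (quasi-)maximum likelihood by Newton-Raphson; X[:, 0] must be a constant."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())        # start the intercept at the log sample mean
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_2sri(W, x_e, X_o, y):
    """OLS first stage (W = instruments plus exogenous regressors), then Poisson with the residual added."""
    x_u_hat = x_e - W @ ols_fit(W, x_e)
    return poisson_fit(np.column_stack([X_o, x_e, x_u_hat]), y)


def bootstrap_se(W, x_e, X_o, y, reps=200, seed=0):
    """Nonparametric bootstrap that re-runs BOTH stages on each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)   # resample observations with replacement
        draws.append(fit_2sri(W[idx], x_e[idx], X_o[idx], y[idx]))
    return np.std(np.array(draws), axis=0, ddof=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 3_000
    w = rng.normal(size=n)
    x_u = rng.normal(size=n)
    x_e = w + x_u
    y = rng.poisson(np.exp(0.5 + 0.3 * x_e + 0.3 * x_u)).astype(float)
    ones = np.ones(n)
    W, X_o = np.column_stack([ones, w]), ones[:, None]
    print(fit_2sri(W, x_e, X_o, y), bootstrap_se(W, x_e, X_o, y, reps=100))
```

Re-estimating only the second stage on each resample, with the first-stage residuals held fixed, would ignore first-stage estimation error and would generally understate the standard errors.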

7

Note that this feature of 2SRI also holds in the nonlinear case. Therefore, the exogeneity of $x_e$ can be tested via a conventional Wald-type statistic for $H_0: \beta_{u1} = \beta_{u2} = \dots = \beta_{uS} = 0$.
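The Wald test in footnote 7 amounts to fitting the 2SRI second stage and testing whether the coefficients on the S first-stage residuals are jointly zero. A minimal NumPy sketch follows, assuming a binary logit second stage and OLS first stages; all function names are hypothetical. Because the statistic is evaluated under the null that every residual coefficient is zero, the usual second-stage covariance can be used without a generated-regressor correction (a standard result for this kind of control-function test); away from the null, standard errors need the correction discussed in footnote 6 or a bootstrap.

```python
import math

import numpy as np


def first_stage_residuals(W, X_e):
    """OLS residuals of each endogenous column of X_e on W (instruments plus exogenous regressors)."""
    coef, *_ = np.linalg.lstsq(W, X_e, rcond=None)
    return X_e - W @ coef


def logit_fit_with_cov(X, y, max_iter=50, tol=1e-10):
    """Logit MLE by Newton-Raphson, plus the inverse-Hessian covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov


def wald_exogeneity_test(W, X_e, X_o, y):
    """Wald statistic for H0: beta_u1 = ... = beta_uS = 0; returns (stat, S)."""
    X_u_hat = first_stage_residuals(W, X_e)          # n x S residual matrix
    S = X_u_hat.shape[1]
    beta, cov = logit_fit_with_cov(np.column_stack([X_o, X_e, X_u_hat]), y)
    b_u = beta[-S:]                                  # residual coefficients are last
    V_u = cov[-S:, -S:]
    return float(b_u @ np.linalg.solve(V_u, b_u)), S


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-square(1) statistic: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))
```

`X_e` must be two-dimensional (n by S). For S > 1, the statistic is compared with a chi-square distribution with S degrees of freedom (for example via `scipy.stats.chi2.sf(stat, S)`); `chi2_1_pvalue` covers only S = 1.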

8

Other applications of 2SRI outside of health economics can be found in Burnett (1997), Alvarez and Glasgow (1999), McGarrity and Sutter (2000), and Petrin and Train (2006).

9

A formal proof of the general inconsistency of 2SPS is available from the authors upon request. In this supplementary appendix, we also discuss a special case in which 2SPS is consistent.

10

One might, for instance, consider the nonsymmetric additive model $y = H(x_e\beta_e + x_o\beta_o) + x_u\beta_u + e$ as a candidate for consistent 2SPS estimation. Kelejian (1971), however, proves that 2SPS is generally inconsistent in such models.

11

We note that Gowrisankaran and Town later teamed with Geweke and re-estimated their model in an appropriately designed Bayesian framework (see Geweke et al., 2003).

12

We refer here to the multinomial logit model introduced by McFadden (1973).

13

The complete details of the sampling design will be supplied upon request.

14

Recall that the treatment options are mutually exclusive and collectively exhaustive. Therefore, if $x_\xi = 1$ then $x_\omega = 0$ for all $\omega \neq \xi$, where $\omega, \xi = 0, 1, 2$.

15

Details of the estimators for the average treatment effects in (13) will be supplied upon request.

16

Details of the simulation design will be supplied upon request.

17

Details of the estimators for the probability effects in (14) will be supplied upon request.

18

Mullahy (1997) sets the coefficient of $x_u$ ($\beta_u$) equal to 1; it is not identified and is therefore irrelevant in his model.

19

For a discussion of the GMM see Hansen (1982).

20

We performed a line search over τ to find the value that minimizes the sum of squared residuals.
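A line search of this kind can be sketched as profile (concentrated) least squares: for each candidate τ on a grid, the remaining coefficients are estimated by OLS, and τ is chosen to minimize the resulting sum of squared residuals. Equations (18) and (19) are not reproduced in this excerpt, so the specification below, in which τ Box-Cox transforms a single positive regressor so that the inner fit is linear, is a hypothetical stand-in; the function names, grid, and simulated data are illustrative assumptions.

```python
import numpy as np


def box_cox(x, tau):
    """Box-Cox transform of a positive array; the tau -> 0 limit is log(x)."""
    if abs(tau) < 1e-12:
        return np.log(x)
    return (x ** tau - 1.0) / tau


def ssr_given_tau(y, x, tau):
    """Concentrate out the linear coefficients by OLS and return the SSR."""
    X = np.column_stack([np.ones_like(x), box_cox(x, tau)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def line_search_tau(y, x, grid):
    """Return the grid value of tau with the smallest SSR, and that SSR."""
    ssrs = np.array([ssr_given_tau(y, x, tau) for tau in grid])
    best = int(np.argmin(ssrs))
    return float(grid[best]), float(ssrs[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 5.0, size=5_000)
    y = 2.0 + 3.0 * box_cox(x, 0.5) + rng.normal(scale=0.5, size=5_000)
    print(line_search_tau(y, x, np.linspace(-2.0, 2.0, 81)))
```

Because the dependent variable itself is not transformed in this stand-in, SSRs are directly comparable across τ; a specification that transformed y would need a Jacobian adjustment before such a comparison.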

DO NOT QUOTE WITHOUT AUTHORS’ PERMISSION

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Alvarez RM, Glasgow G. Two-Stage Estimation of Nonrecursive Choice Models. Political Analysis. 1999;8:147–165.
  2. Baser O, Bradley CJ, Gardiner JC, Given C. Testing and Correcting for Non-Random Selection Bias due to Censoring: An Application to Medical Costs. Health Services and Outcomes Research Methodology. 2004;4:93–107.
  3. Blundell RW, Smith RJ. Estimation in a Class of Simultaneous Equation Limited Dependent Variable Models. Review of Economics and Statistics. 1989;56:37–58.
  4. Blundell RW, Smith RJ. Simultaneous Microeconometric Models with Censored or Qualitative Dependent Variables. In: Maddala GS, Rao CR, Vinod HD, editors. Handbook of Statistics. Vol. 2. Amsterdam: North Holland Publishers; 1993. pp. 1117–1143.
  5. Bollen KA, Guilkey DK, Mroz TA. Binary Outcomes and Endogenous Explanatory Variables: Tests and Solutions with an Application to the Demand for Contraceptive Use in Tunisia. Demography. 1995;32:111–131.
  6. Box GEP, Cox DR. An Analysis of Transformations. Journal of the Royal Statistical Society, Series B. 1964;26:211–252.
  7. Burgess S, Gregg P, Propper C, Washbrook E; ALSPAC Study Team. Maternity Rights and Mothers’ Return to Work. CMPO Working Paper 02/055; 2002.
  8. Burnett NJ. Gender Economics Courses in Liberal Arts Colleges. Journal of Economic Education. 1997;28:369–377.
  9. Cawley J. An Instrumental Variables Approach to Measuring the Effect of Body Weight on Employment Disability. Health Services Research. 2000;35:1159–1179.
  10. Coulson E, Neslusan C, Stuart B, Terza J. Estimating the Moral Hazard Effect of Supplemental Medical Insurance in the Demand for Prescription Drugs by the Elderly. American Economic Review - Papers and Proceedings. 1995;85:122–126.
  11. DeSimone J. Illegal Drug Use and Employment. Journal of Labor Economics. 2002;20:952–977.
  12. Ettner SL, Hermann RC, Tang H. Differences Between Generalists and Mental Health Specialists in the Psychiatric Treatment of Medicare Beneficiaries. Health Services Research. 1999;34:737–760.
  13. Fox M. Medical Student Indebtedness and the Propensity to Enter Academic Medicine. Health Economics. 2002;12:101–112. doi: 10.1002/hec.701.
  14. French MT, Roebuck MC, Alexandre PK. Illicit Drug Use, Employment, and Labor Force Participation. Southern Economic Journal. 2001;68:349–368.
  15. Gibson TB, Mark TL, Axelsen K, Baser O, Rublee DA, McGuigan KA. Impact of Statin Copayments on Adherence and Medical Care Utilization and Expenditures. American Journal of Managed Care. 2006;12:SP11–SP19.
  16. Goldberger AS. A Course in Econometrics. Cambridge, MA: Harvard University Press; 1991.
  17. Gowrisankaran G, Town R. Estimating the Quality of Care in Hospitals Using Instrumental Variables. Journal of Health Economics. 1999;18(6):747–767. doi: 10.1016/s0167-6296(99)00022-3.
  18. Geweke J, Gowrisankaran G, Town R. Bayesian Inference for Hospital Quality in a Selection Model. Econometrica. 2003;71:1215–1238.
  19. Gramm M. The Case for Regulatory Rent-Seeking: CRA Based Protests of Bank Mergers and Acquisitions. Public Choice. 2003;116:367–379.
  20. Greene WH. Gender Economics Courses in Liberal Arts Colleges: Further Results. Journal of Economic Education. 1998;29:291–300.
  21. Greene WH. Econometric Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall; 2003.
  22. Hansen LP. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica. 1982;50:1029–1054.
  23. Hausman JA. Specification Tests in Econometrics. Econometrica. 1978;46:1251–1271.
  24. Holmes AM, Deb P. Provider Choice and Use of Mental Health Care: Implications for Gatekeeper Models. Health Services Research. 1998;33:1263–1284.
  25. Howard D. The Impact of Waiting Time on Liver Transplant Outcomes. Health Services Research. 2000;35:117–1134.
  26. Kelejian HH. Two-Stage Least Squares and Econometric Systems Linear in the Parameters but Nonlinear in the Endogenous Variables. Journal of the American Statistical Association. 1971;66:373–374.
  27. Kenkel D, Terza J. The Effect of Physician Advice on Alcohol Consumption: Count Regression with an Endogenous Treatment Effect. Journal of Applied Econometrics. 2001;16:165–184.
  28. Lee L. Identification and Estimation in Binary Choice Models with Limited ‘Censored’ Dependent Variables. Econometrica. 1979;47:977–995.
  29. Lee L. Simultaneous Equations Models with Discrete and Censored Dependent Variables. Chapter 9. In: Manski C, McFadden D, editors. Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press; 1981. pp. 346–364.
  30. Lindrooth RC, Weisbrod BA. Do Religious Nonprofit and For-profit Organizations Respond Differently to Financial Incentives? The Hospice Industry. Journal of Health Economics. 2007;26:342–357. doi: 10.1016/j.jhealeco.2006.09.003.
  31. Lu M, McGuire TG. The Productivity of Outpatient Treatment for Substance Abuse. Journal of Human Resources. 2002;37:309–335.
  32. McFadden D. Conditional Logit Analysis of Qualitative Choice Behavior. Chapter 4. In: Zarembka P, editor. Frontiers in Econometrics. New York: Academic Press; 1973. pp. 105–142.
  33. McGarrity JP, Sutter D. A Test of the Structure of PAC Contracts: An Analysis of House Gun Control Votes in the 1980s. Southern Economic Journal. 2000;677:41–63.
  34. McGeary KA, French MT. Illicit Drug Use and Emergency Room Utilization. Health Services Research. 2000;35:153–169.
  35. Meer J, Rosen HS. Insurance and the Utilization of Medical Services. Social Science & Medicine. 2004;58:1623–1632. doi: 10.1016/S0277-9536(03)00394-0.
  36. Mroz TA, Bollen KA, Speizer IS, Mancini DJ. Quality, Accessibility, and Contraceptive Use in Rural Tanzania. Demography. 1999;36:23–40.
  37. Mullahy J. Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior. Review of Economics and Statistics. 1997;79:586–593.
  38. Neslusan CA, Hylan TR, Dunn RL, Donoghue J. Controlling for Systematic Selection in Retrospective Analyses: An Application to Fluoxetine and Sertraline Prescribing in the United Kingdom. Value in Health. 1999;2:435–445. doi: 10.1046/j.1524-4733.1999.26002.x.
  39. Newey WK. Efficient Estimation of Limited Dependent Variable Models with Endogenous Explanatory Variables. Journal of Econometrics. 1987;36:231–250.
  40. Newey WK, McFadden D. Large Sample Estimation and Hypothesis Testing. In: Engle RF, McFadden DL, editors. Handbook of Econometrics. Amsterdam: Elsevier Science B.V.; 1994. pp. 2111–2245. Chapter 36.
  41. Norton EC, Lindrooth RC, Ennett ST. Controlling for the Endogeneity of Peer Substance Use on Adolescent Alcohol and Tobacco Use. Health Economics. 1998;7:439–453. doi: 10.1002/(sici)1099-1050(199808)7:5<439::aid-hec362>3.0.co;2-9.
  42. Norton EC, Van Houtven CH. Intervivos Transfers and Exchange. Southern Economic Journal. 2006;73:157–172.
  43. Petrin A, Train K. Control Function Corrections for Unobserved Factors in Differentiated Product Models. Working paper; 2006.
  44. Pryor C, Terza J. Are Going Concern Audit Opinions a Self-Fulfilling Prophecy? Advances in Quantitative Analysis of Finance and Accounting. 2002;10:89–116.
  45. Register CA, Williams DR. Labor Market Effects of Marijuana and Cocaine Use Among Young Men. Industrial and Labor Relations Review. 1992;45:435–448.
  46. Rivers D, Vuong QH. Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics. 1988;39:347–366.
  47. Savage E, Wright DJ. Moral Hazard and Adverse Selection in Australian Private Hospitals: 1989–1990. Journal of Health Economics. 2003;22:331–359. doi: 10.1016/S0167-6296(02)00104-2.
  48. Shea D, Terza J, Stuart B, Briesacher B. Estimating the Effects of Prescription Drug Coverage for Medicare Beneficiaries. Health Services Research. 2007;43:933–949. doi: 10.1111/j.1475-6773.2006.00659.x.
  49. Shin J, Moon S. Do HMO Plans Reduce Health Care Expenditure in the Private Sector? Economic Inquiry. 2007;45:82–99.
  50. Smith RJ, Blundell RW. An Exogeneity Test for a Simultaneous Equation Tobit Model with an Application to Labor Supply. Econometrica. 1986;54:679–685.
  51. Stuart BC, Terza JV, Doshi J. Assessing the Impact of Drug Use on Hospital Costs. Working Paper, Department of Epidemiology and Health Policy Research, University of Florida; 2007.
  52. Terza J. Dummy Endogenous Variables and Endogenous Switching in Exponential Conditional Mean Regression Models. Working paper, Department of Economics, Penn State University; presented at the North American Summer Meeting of the Econometric Society, Quebec, Canada; 1994a.
  53. Terza J. An Estimator for Nonlinear Regression Models with Endogenous Switching and Sample Selection. Working paper, Department of Economics, Penn State University; 1994b.
  54. Terza J. Estimating Count Data Models with Endogenous Switching: Sample Selection and Endogenous Treatment Effects. Journal of Econometrics. 1998;84:129–154.
  55. Terza J. Estimating Endogenous Treatment Effects in Retrospective Data Analysis. Value in Health. 1999;2:429–434. doi: 10.1046/j.1524-4733.1999.26003.x.
  56. Terza J. Alcohol Abuse and Employment: A Second Look. Journal of Applied Econometrics. 2002;17:393–404.
  57. Terza J. Estimation of Policy Effects Using Parametric Nonlinear Models: A Contextual Critique of the Generalized Method of Moments. Health Services and Outcomes Research Methodology. 2006a;6:177–198.
  58. Terza J. An Econometric Framework for Analyzing Health Policy with Nonexperimental Data. Working Paper, Department of Epidemiology and Health Policy Research, University of Florida; 2006b.
  59. Terza J. Parametric Nonlinear Regression with Endogenous Switching. Working Paper, Department of Epidemiology and Health Policy Research, University of Florida; 2007.
  60. Terza J, Tsai W. Censored Probit Estimation with Correlation Near the Boundary: A Useful Reparameterization. Review of Applied Economics. 2006;2:1–12.
  61. Treglia M, Neslusan CA, Dunn RL. Fluoxetine and Dothiepin Therapy in Primary Care and Health Resource Utilization: Evidence from the United Kingdom. International Journal of Psychiatry in Clinical Practice. 1999;3:23–30. doi: 10.3109/13651509909024755.
  62. White H. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press; 1994.
  63. Wooldridge JM. Quasi-Likelihood Methods for Count Data. In: Pesaran M, Schmidt P, editors. Handbook of Applied Econometrics, Volume II: Microeconometrics. Malden, MA: Blackwell Publishers, Ltd; 1997.
  64. Wooldridge JM. Some Alternatives to the Box-Cox Regression Model. International Economic Review. 1992;33:935–955.
  65. Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press; 2002.
