Author manuscript; available in PMC: 2015 Dec 23.
Published in final edited form as: J Polit Econ. 2015 Apr;123(2):413–443. doi: 10.1086/679498

The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs*

Philipp Eisenhauer 1, James J Heckman 2, Edward Vytlacil 3
PMCID: PMC4689211  NIHMSID: NIHMS722338  PMID: 26709315

Abstract

The literature on treatment effects focuses on gross benefits from program participation. We extend this literature by developing conditions under which it is possible to identify parameters measuring the cost and net surplus from program participation. Using the generalized Roy model, we nonparametrically identify the cost, benefit, and net surplus of selection into treatment without requiring the analyst to have direct information on the cost. We apply our methodology to estimate the gross benefit and net surplus of attending college.

Keywords: Cost-Benefit Analysis, Treatment Effects, Returns and Costs to Education

1 Introduction

The traditional approach to the evaluation of public policy compares the benefits and costs of policies. Measures of net surplus are used to determine whether policies should be undertaken (see Hotelling, 1938; Tinbergen, 1956; Harberger and Jenkins, 2002; and Chetty, 2009). The recent literature on program evaluation, or “treatment effects”, focuses on gross benefits of policies and considers neither the marginal costs nor the perceived surplus associated with the programs being evaluated.1

We extend this literature using the generalized Roy model. In it, agents choose treatment if their expected surplus from doing so is positive, so the benefit outweighs the subjective cost. We present conditions under which we can use the economics of the model to identify cost and surplus parameters even without direct information on the costs of treatment. Information on revealed choices creates a simple relationship between the cost and benefit parameters: for individuals who are indifferent towards treatment participation, the benefit equals the cost and the surplus is zero. Building on existing identification results for benefit parameters, we show how to identify surplus and cost parameters by varying the margin of indifference. Our identification analysis applies traditional exclusion restrictions that separately shift costs and benefits from treatment. We use cost shifters to identify the benefit of treatment, and benefit shifters to vary the margin of indifference and thus to identify the cost of the treatment.

Our analysis complements and extends the work by Björklund and Moffitt (1987), who first noted the duality between cost and benefit parameters in the generalized Roy model. They estimate marginal gains and surpluses for policies within a parametric normal generalized Roy model. They use structural econometric methods to identify the components of the cost and benefit functions. This paper extends their analysis to a more general setting. It develops and applies a nonparametric identification analysis of benefits, costs, and surpluses without the need to identify all of the ingredients of a fully specified structural model. This approach implements Marschak’s Maxim (Heckman, 2010) by directly estimating the cost, benefit, and surplus parameters rather than constructing them from the estimates of a full structural model.

We present ex ante and ex post analyses of costs and benefits. Applying our methods to the data on ex post gross benefits analyzed by Carneiro et al. (2011), we find that heterogeneity in benefits, and not costs, is the main driver of the variability in the decision to attend college.

Our analysis is reminiscent of the Heckman (1974) model of female labor supply. In that analysis, the econometrician observes the offered wage only for the agents who choose to work. The economist does not observe the reservation wage of any agent. Yet, his analysis identifies the parameters of the offered wage equation and the reservation wage equation by using the implication of the underlying economic model that agents decide to work if the offered wage exceeds the reservation wage.2 In our analysis, we observe program outcomes for agents who select into treatment, and we observe the no treatment outcome for the agents who do not select into treatment. We do not observe the cost of treatment for any agent. Yet, using the economics of the model, we are able to identify the average benefit and average cost of treatment parameters by exploiting the agent’s decision rule of selecting into treatment if the benefit exceeds the cost.

Our analysis is very different from analyses using randomized experiments to infer treatment effects. In commonly implemented randomizations, it is not possible to identify the choice probability (Heckman, 1992; Heckman and Smith, 1995). Instead of using randomization to bypass problems of self-selection, we exploit the information that agents self-select into treatment and infer information on the cost of the treatment that cannot be recovered by standard randomized experiments.

The paper unfolds in the following way. Section 2 introduces the generalized Roy model. Section 3 reviews the average benefit of treatment parameters from Heckman and Vytlacil (1999, 2005, 2007), and develops and analyzes the dual cost parameters that match the benefit parameters. Section 4 presents our identification analysis of the cost and surplus parameters. Section 5 extends our analysis to allow agents to have imperfect foresight about future outcomes. We apply our analysis to study the decision to attend college in Section 6. Section 7 concludes.

2 The Generalized Roy Model

Suppose there are two potential outcomes (Y0, Y1), and a choice indicator D with D = 1 if the agent selects into treatment so that Y1 is observed and D = 0 if the agent does not select into treatment so that Y0 is observed. Anticipating our empirical analysis, Y1 is the annualized flow of income from college, and Y0 is the annualized flow of income from high school. The observed outcome Y can be written in switching regression form (Quandt, 1958, 1972):

$$Y = D Y_1 + (1 - D) Y_0, \tag{2.1}$$

where E(Yj | X) = μj(X) and

$$Y_j = \mu_j(X) + U_j \tag{2.2}$$

for j = 0, 1. X is a vector of regressors observed by the economist while (U0, U1) are not. Combining Equations (2.1) and (2.2),

$$Y = \mu_0(X) + \{[\mu_1(X) - \mu_0(X)] + U_1 - U_0\} D + U_0.$$

The individual gross benefit of treatment associated with moving an otherwise identical person from state “0” to “1” is B = Y1 − Y0 and is defined as the causal effect on Y of a ceteris paribus move from “0” to “1”. Defining E(C | Z) = μC(Z), the subjective cost of choosing treatment as perceived by the agent is

C=μC(Z)+UC, (2.3)

where Z is an observed random vector of cost shifters and UC is a random variable unobserved by the econometrician. Individuals choose treatment if the perceived benefit from treatment is greater than the subjective cost:

$$D = 1 \text{ if } S \geq 0; \quad D = 0 \text{ otherwise}, \tag{2.4}$$

where S is the surplus, i.e. the net benefit, from treatment:

$$S = (Y_1 - Y_0) - C = \{[\mu_1(X) - \mu_0(X)] - \mu_C(Z)\} - [U_C - (U_1 - U_0)] = \mu_S(X, Z) - V,$$

with μS(X, Z) = [μ1(X) − μ0(X)] − μC(Z) and V = UC − (U1 − U0). Our identification analysis of cost and surplus parameters does not assume particular functional forms for μ0, μ1, and μC, nor does it assume that the distributions of U0, U1, and UC are of a known parametric form.
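The structure of Equations (2.1)–(2.4) can be made concrete with a small simulation. The sketch below is only illustrative: the linear forms for μ0, μ1, and μC, the normal unobservables, and the function name `simulate_roy` are all assumptions introduced here, whereas the identification analysis itself imposes no such functional forms.

```python
import random

def simulate_roy(n, seed=0):
    """Draw n agents from an illustrative generalized Roy model.

    Every functional form and parameter value below is an assumption
    made for this sketch only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)        # outcome shifter X
        z = rng.uniform(0.0, 1.0)        # cost shifter Z
        u0 = rng.gauss(0.0, 1.0)         # unobservables (U0, U1, UC)
        u1 = rng.gauss(0.0, 1.0)
        uc = rng.gauss(0.0, 0.5)
        y0 = 1.0 + 0.5 * x + u0          # Y0 = mu0(X) + U0
        y1 = 1.5 + 1.5 * x + u1          # Y1 = mu1(X) + U1
        c = 0.2 + 1.0 * z + uc           # C  = muC(Z) + UC
        mu_s = 0.3 + x - z               # muS = [mu1 - mu0] - muC
        v = uc - (u1 - u0)               # V = UC - (U1 - U0)
        s = (y1 - y0) - c                # surplus S = B - C
        d = 1 if s >= 0 else 0           # decision rule (2.4)
        y = d * y1 + (1 - d) * y0        # observed outcome (2.1)
        rows.append({"x": x, "z": z, "y0": y0, "y1": y1, "c": c,
                     "mu_s": mu_s, "v": v, "s": s, "d": d, "y": y})
    return rows
```

In such a simulated sample the analyst would see only (x, z, d, y); the cost c and the counterfactual outcome are never observed, which is the setting the identification results address.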

The original Roy (1951) model assumes that there are no observed regressors, X, that the cost of treatment is identically zero (i.e. μC = 0, UC = 0), and that (U0, U1) ~ N(0, Σ). Heckman and Honoré (1990) present an identification analysis for a nonparametric version of the Roy model using variation in regressors and making no parametric assumption on the distribution of (U0, U1). Their version of the Roy model also imposes the condition that the cost of treatment is identically zero. In contrast, we allow for nonzero cost of treatment. In fact, for our identification analysis we require nondegenerate cost of treatment and observed cost shifters.3 From the point of view of the observing economist, (X, Z) is observed and (U1, U0, UC) is unobserved. This model assumes that agents know the gross benefit, B = Y1 − Y0, of treatment. We show in Section 5 that our results extend to a broader class of models, where agents only have imperfect foresight about the benefits of treatment. This model also supposes that there is no other aspect of the benefit of treatment than Y1 − Y0. Implicitly, any subjective benefits of the program are incorporated into the costs of treatment, i.e. the cost function includes the subjective benefits of the treatment. For example, if job training allows the individual to work in a job with preferred amenities, this is modeled as a (negative) contribution to the subjective cost of treatment. The classification of effects as either positive benefits or negative subjective costs (or vice versa) does not affect the definition of the surplus. To simplify the exposition, we suppose that Z and X do not contain any common elements. Thus, all of our analysis can be seen as implicitly conditioning on all common elements of X and Z.

We make the following technical assumptions:

(A-1) (U0, U1, UC) is independent of (X, Z).

(A-2) The distribution of μC(Z) conditional on X is absolutely continuous with respect to Lebesgue measure.

(A-3) The distribution of V = UC − (U1 − U0) is absolutely continuous with respect to Lebesgue measure and has a cumulative distribution function that is strictly increasing.

(A-4) The population means E|Y1|, E|Y0| and E|C| are finite.

Assumption (A-1) states that (U0, U1, UC) is independent of (X, Z). Thus, D is endogenous but all other regressors in both the treatment equation and the outcome equation are exogenous. We implicitly condition on any regressors that enter both the outcome equations and the cost equation. Thus, this condition should be interpreted as an independence assumption for the error terms with regard to the unique elements of X and Z conditional on the regressors that enter both equations. No independence condition is required for the common elements. We also do not impose any restrictions on the dependence among the unobservables. (A-2) requires that there exists at least one continuous component of Z conditional on X. This assumption will only be required for our identification analysis, and is not needed for our definition or analysis of the cost and surplus parameters. (A-3) is a regularity condition. It allows for the possibility that UC is degenerate (costs do not vary conditional on Z) or that U1 − U0 is degenerate (benefits do not vary conditional on X), though not both. Assumption (A-4) is required for the mean benefit and cost parameters to be well defined. An implication of our model with Assumptions (A-1) and (A-3) is that 0 < Pr(D = 1 | X, Z) < 1 w.p.1, so that there is a treated group and a control group for almost all (X, Z). Note that this restriction still allows the support of the distribution of Pr(D = 1 | X, Z) to be the full unit interval.

Let P(X, Z) denote the probability of selecting into treatment given (X, Z). Statisticians call this the “propensity score”: P(X, Z) ≡ Pr(D = 1 | X, Z) = FV(μS(X, Z)), where FV(·) denotes the cumulative distribution function of V.4 We sometimes denote P(X, Z) by P, suppressing the (X, Z) argument. We also work with US, a uniform random variable (US ~ Unif[0, 1]) defined by US = FV(V). Different values of US denote different quantiles of V. Given our previous assumptions, FV is strictly increasing, and P(X, Z) is a continuous random variable conditional on X.
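Because FV is strictly increasing, the decision rule S ≥ 0 can be restated as US ≤ P(X, Z). A minimal numerical check of this equivalence follows, assuming purely for illustration that V is normal with standard deviation 1.5; the names `F_V`, `propensity`, and `count_mismatches` are hypothetical helpers, not part of the paper.

```python
import random
from statistics import NormalDist

SIGMA_V = 1.5                     # assumed: V ~ N(0, 1.5^2)
F_V = NormalDist(0.0, SIGMA_V)    # F_V.cdf plays the role of the CDF of V

def propensity(mu_s):
    """P(x, z) = F_V(mu_S(x, z))."""
    return F_V.cdf(mu_s)

def count_mismatches(n=10_000, seed=1):
    """Count draws where 1[S >= 0] and 1[U_S <= P] disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n):
        mu_s = rng.uniform(-2.0, 2.0)        # a value of mu_S(X, Z)
        v = rng.gauss(0.0, SIGMA_V)          # unobserved V
        u_s = F_V.cdf(v)                     # U_S = F_V(V), the quantile of V
        d_from_surplus = (mu_s - v >= 0)     # D = 1[S >= 0]
        d_from_quantile = (u_s <= propensity(mu_s))
        mismatches += d_from_surplus != d_from_quantile
    return mismatches
```

The two formulations should select the same agents, which is why the analysis can index unobserved heterogeneity by the quantile uS rather than by V itself.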

The generalized Roy model presented in this paper is a special case of the model of Heckman and Vytlacil (1999, 2005). Under Assumptions (A-1)–(A-4), the model of Equations (2.1)–(2.4) implies the model and assumptions of Heckman and Vytlacil (1999, 2005). From the analysis of Vytlacil (2002), the more general model is equivalent to the conditions that justify the Local Average Treatment Effect (LATE) model of Imbens and Angrist (1994). We impose more restrictions here. In particular, we impose the generalized Roy model and the corresponding assumptions that will allow us to exploit its structure for identification of subjective cost parameters. As in the conventional Roy model (Heckman and Sedlacek, 1985), we assume additive separability in the outcome equations. Additive separability is not required in Heckman and Vytlacil (1999, 2005), but is required by our analysis in order to obtain additive separability in the latent index equation consistent with the generalized Roy model.5 Thus our assumptions are most appropriate for continuous outcome variables, and we exclude discrete outcomes from our analysis. We also assume conditions on X that are not required in Heckman and Vytlacil (1999, 2005) to identify the gross benefit parameters. Their analysis conditions on X, and thus does not need to assume that X is independent of the error vector. In contrast, in order to use the generalized Roy model to recover subjective cost parameters, we require that the unique elements of X are independent of the error vector.6

3 Benefit, Cost, and Surplus Parameters

This section defines and analyzes the benefit, cost, and surplus parameters. We maintain the model of Equations (2.1)–(2.4), and invoke Assumptions (A-1) and (A-3)–(A-4). We do not require Assumption (A-2) for the definition of the parameters, but do require it for our identification analysis.

Standard treatment effect analyses identify averaged parameters of the gross benefit of treatment, B = Y1 − Y0. The most commonly studied treatment effect parameter is the average benefit of treatment BATE(x) ≡ E(Y1 − Y0 | X = x) = μ1(x) − μ0(x). This is the effect of assigning treatment randomly to everyone of type X = x assuming full compliance, and ignoring any general equilibrium effects. Another commonly used parameter is the average benefit of treatment on persons who actually take the treatment, referred to as the benefit of treatment on the treated: BTT(x) ≡ E(Y1 − Y0 | X = x, D = 1) = μ1(x) − μ0(x) + E(U1 − U0 | X = x, D = 1). Heckman and Vytlacil (1999, 2005) unify a broad class of treatment effect parameters including the BATE(x) and BTT(x) through the marginal benefit of treatment, defined as BMTE(x, uS) ≡ E(Y1 − Y0 | X = x, US = uS) = μ1(x) − μ0(x) + E(U1 − U0 | US = uS). BMTE(x, uS) is the treatment effect parameter that conditions on the unobserved desire to select into treatment.

The conventional analysis of treatment effects does not define, identify, or estimate any aspect of the cost of the treatment. We define a set of cost parameters parallel to the benefit parameters, where cost is the subjective cost as perceived by the agent. Thus, we define the average cost of treatment, the average cost of treatment on the treated, and the marginal cost of treatment as follows:

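$$\begin{aligned}
C^{ATE}(z) &= E(C \mid Z = z) = \mu_C(z) \\
C^{TT}(z) &= E(C \mid Z = z, D = 1) = \mu_C(z) + E(U_C \mid Z = z, D = 1) \\
C^{MTE}(z, u_S) &= E(C \mid Z = z, U_S = u_S) = \mu_C(z) + E(U_C \mid U_S = u_S).
\end{aligned}$$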

Recalling that S = B − C = μS(X, Z) − V, where μS(X, Z) = [μ1(X) − μ0(X)] − μC(Z) and V = UC − (U1 − U0), we can define the corresponding surplus parameters:

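$$\begin{aligned}
S^{ATE}(x, z) &= E(S \mid X = x, Z = z) = \mu_S(x, z) \\
S^{MTE}(x, z, u_S) &= E(S \mid X = x, Z = z, U_S = u_S) = \mu_S(x, z) - E(V \mid U_S = u_S)
\end{aligned}$$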

and

$$S^{TT}(x, z) = E(S \mid X = x, Z = z, D = 1) = \mu_S(x, z) - E(V \mid X = x, Z = z, D = 1).$$

With these parameters, we can answer questions not only about the outcome change from treatment, but also about the subjective cost of treatment and the net surplus as well. As the surplus from treatment participation STT(x, z) is always positive among the treated, it follows immediately that BTT(x) > CTT(z) holds as well. Following Heckman and Vytlacil (1999, 2005), we can represent the average treatment effects and treatment on the treated as averaged versions of the marginal effects of treatment:

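$$\begin{aligned}
B^{ATE}(x) &= \int_0^1 B^{MTE}(x, u_S)\, du_S \\
B^{TT}(x) &= \int_0^1 B^{MTE}(x, u_S) \left[ \frac{1 - F_{P|X}(u_S \mid x)}{\int_0^1 \left(1 - F_{P|X}(t \mid x)\right) dt} \right] du_S.
\end{aligned} \tag{3.1}$$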

Following the same line of argument as used by Heckman and Vytlacil (1999, 2005),

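$$\begin{aligned}
C^{ATE}(z) &= \int_0^1 C^{MTE}(z, u_S)\, du_S \\
C^{TT}(z) &= \int_0^1 C^{MTE}(z, u_S) \left[ \frac{1 - F_{P|Z}(u_S \mid z)}{\int_0^1 \left(1 - F_{P|Z}(t \mid z)\right) dt} \right] du_S,
\end{aligned} \tag{3.2}$$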

and

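$$\begin{aligned}
S^{ATE}(x, z) &= \int_0^1 S^{MTE}(x, z, u_S)\, du_S \\
S^{TT}(x, z) &= \frac{1}{P(x, z)} \int_0^{P(x, z)} S^{MTE}(x, z, u_S)\, du_S.
\end{aligned} \tag{3.3}$$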
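The treatment-on-the-treated weights in Equation (3.1) can be traced back to the decision rule D = 1[US ≤ P]. The following derivation is a sketch of the standard argument, not a reproduction of the proof in Heckman and Vytlacil (1999, 2005). It uses only that US is uniform and, by (A-1), independent of P conditional on X, and that P is continuous conditional on X, so that Pr(P ≥ uS | X = x) = 1 − F_{P|X}(uS | x).

```latex
\begin{aligned}
B^{TT}(x)
 &= E(Y_1 - Y_0 \mid X = x, D = 1)
  = \frac{E\big[(Y_1 - Y_0)\,\mathbf{1}\{U_S \le P\} \mid X = x\big]}
         {\Pr(D = 1 \mid X = x)} \\
 &= \frac{\int_0^1 B^{MTE}(x, u_S)\,\Pr(P \ge u_S \mid X = x)\,du_S}
         {E(P \mid X = x)} \\
 &= \int_0^1 B^{MTE}(x, u_S)\,
    \frac{1 - F_{P|X}(u_S \mid x)}
         {\int_0^1 \big(1 - F_{P|X}(t \mid x)\big)\,dt}\,du_S .
\end{aligned}
```

The same steps with Z in place of X give the weights in Equation (3.2). When we condition on both X = x and Z = z, P is degenerate at P(x, z), so the weight reduces to 1[uS ≤ P(x, z)] / P(x, z), which is the form of STT(x, z) in Equation (3.3).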

We now establish relationships among these parameters. First, consider the marginal surplus parameter. Recall that US = FV(V) with FV strictly increasing. Thus US = uS is equivalent to V = FV⁻¹(uS), and

$$S^{MTE}(x, z, u_S) = \mu_S(x, z) - E\left(V \mid V = F_V^{-1}(u_S)\right) = \mu_S(x, z) - F_V^{-1}(u_S).$$

FV⁻¹ is strictly increasing, and thus SMTE(x, z, uS) is strictly decreasing in uS. Individuals with low uS want to enter the program the most and are those with the highest surplus from the program, while individuals with high uS want to enter the program the least and have the smallest surplus from the program. Using the fact that FV is strictly increasing and that P(X, Z) = FV(μS(X, Z)), conditioning on uS = P(x, z) is equivalent to conditioning on V = μS(x, z). Thus

$$S^{MTE}(x, z, P(x, z)) = \mu_S(x, z) - E\left(V \mid V = \mu_S(x, z)\right) = 0.$$

An individual with uS = P(x, z) is an individual who is indifferent between being treated and untreated if assigned X = x and Z = z. Since SMTE(x, z, uS) is strictly decreasing in uS, SMTE(x, z, uS) is positive for uS < P(x, z), is equal to zero at uS = P(x, z), and is negative if uS > P(x, z). If we instead fix evaluation point uS and consider how SMTE(x, z, uS) varies with (x, z), SMTE(x, z, uS) will be positive for all (x, z) such that P(x, z) > uS and will be negative for all (x, z) such that P(x, z) < uS.

We have thus far discussed only the marginal surplus function. Using the relationship SMTE(x, z, uS) = BMTE(x, uS)−CMTE(z, uS), we can translate statements about SMTE(x, z, uS) into inequalities about the marginal benefit and marginal cost functions:

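$$\begin{aligned}
B^{MTE}(x, u_S) &> C^{MTE}(z, u_S) \quad \forall (x, z, u_S) \text{ s.t. } P(x, z) > u_S \\
B^{MTE}(x, u_S) &= C^{MTE}(z, u_S) \quad \forall (x, z, u_S) \text{ s.t. } P(x, z) = u_S \\
B^{MTE}(x, u_S) &< C^{MTE}(z, u_S) \quad \forall (x, z, u_S) \text{ s.t. } P(x, z) < u_S.
\end{aligned}$$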

The benefit and cost parameters coincide when evaluated at uS = P(x, z), because at this point the marginal cost equals the marginal benefit. We exploit this equality at the margin of indifference in the next section to achieve identification of the cost parameters.

To fix ideas, in Figure 1 we display the full set of marginal effects for a numerical example. We plot the marginal effect functions for fixed values of (x, z), where it happens that P(x, z) = 0.50. Individuals at that margin, uS = 0.50, have their benefit of treatment just offset by their subjective cost and are thus indifferent between participation in treatment and nonparticipation. The benefits are positive, but so are the costs. Overall, the surplus is zero. For uS < 0.50, the marginal benefit function lies above the marginal cost function and thus the marginal surplus is strictly positive. The reverse is true for uS > 0.50.

Figure 1. Marginal Effects of Treatment

This example is constructed to have intuitive properties, with the marginal benefit of treatment BMTE(x, uS) decreasing in uS and the marginal cost of treatment CMTE(z, uS) increasing in uS. Agents with the greatest unobserved desire to select into treatment not only have higher benefits, but also have lower costs. These properties, while intuitive, need not hold in general—individuals with lower values of uS (and thus a greater unobserved desire to take treatment) must necessarily have higher net surplus than those with higher values of uS, but they need not have higher benefits and lower costs. It is possible, for example, that benefits and costs are so strongly positively correlated that those with the greatest unobserved desire to participate have either the smallest benefits and the lowest costs or the largest benefits and the highest costs. In Appendix A, we establish sufficient conditions for intuitive properties on BMTE(x, uS) and CMTE(z, uS) to hold, as well as testable implications of those conditions.
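A numerical example with the same qualitative features (marginal benefit decreasing and marginal cost increasing in uS, with indifference at uS = 0.50) can be sketched under a joint normality assumption. The parameter values below are assumptions chosen for illustration and are not the ones underlying Figure 1. Under joint normality, E(U1 − U0 | V = v) and E(UC | V = v) are linear in v, with slopes given by the respective covariances with V divided by Var(V).

```python
from statistics import NormalDist

# Assumed illustrative values at a fixed (x, z); not taken from the paper.
MU_B, MU_C = 0.6, 0.6                    # mu1(x) - mu0(x) and muC(z)
VAR_B, VAR_C, COV_BC = 1.0, 0.25, 0.0    # moments of (U1 - U0, UC)

VAR_V = VAR_C + VAR_B - 2.0 * COV_BC     # Var(V), with V = UC - (U1 - U0)
F_V = NormalDist(0.0, VAR_V ** 0.5)
SLOPE_B = (COV_BC - VAR_B) / VAR_V       # E(U1 - U0 | V = v) = SLOPE_B * v
SLOPE_C = (VAR_C - COV_BC) / VAR_V       # E(UC | V = v)      = SLOPE_C * v

def bmte(u):
    """Marginal benefit at quantile u of V."""
    return MU_B + SLOPE_B * F_V.inv_cdf(u)

def cmte(u):
    """Marginal cost at quantile u of V."""
    return MU_C + SLOPE_C * F_V.inv_cdf(u)

def smte(u):
    """Marginal surplus: benefit minus cost."""
    return bmte(u) - cmte(u)

P = F_V.cdf(MU_B - MU_C)                 # propensity score P(x, z)
```

Evaluating `smte` on a grid of uS values gives a strictly decreasing function that crosses zero at `P`, in line with the sign pattern derived above. Setting `COV_BC` to a large positive value is one way to explore the cases where the intuitive monotonicity of `bmte` or `cmte` fails.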

4 Identifying the Surplus and Benefit Functions Nonparametrically

Heckman and Vytlacil (1999, 2005) show that local instrumental variables (LIV) identify the marginal benefit of treatment:

$$\frac{\partial E(Y \mid X = x, P = p)}{\partial p} = B^{MTE}(x, p). \tag{4.1}$$

We can identify E(Y|X = x, P = p) and its derivative for all (x, p) ∈ Supp(X, P), where Supp(X, P) denotes the support of (X, P(X, Z)).7 We can thus identify BMTE(x, uS) for all values of (x, uS) ∈ Supp(X, P). For a fixed x, we can identify BMTE(x, uS) for uS ∈ Supp(P|X = x). The more variation in propensity scores P conditional on X = x, the larger the set of evaluation points uS for which we identify BMTE(x, uS). Variation in propensity scores conditional on X is driven by variation in Z, the cost shifters. Thus, if we observe regressors that produce large variations in costs, we will be able to identify BMTE(x, uS) on a larger set.
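Equation (4.1) can be illustrated by replacing the derivative with a finite difference between two nearby propensity-score values. The sketch below is a crude Monte Carlo approximation under assumed functional forms (a linear marginal benefit BMTE(u) = 2 − 2u with x held fixed, and small normal noise); it is not the estimator used in the empirical application, and `h`, `n`, and `seed` are arbitrary choices.

```python
import random

def bmte_true(u):
    """Assumed marginal benefit, BMTE(u) = 2 - 2u, with x held fixed."""
    return 2.0 - 2.0 * u

def mean_outcome_at(p, n, rng):
    """Monte Carlo estimate of E(Y | P = p) for agents with score p."""
    total = 0.0
    for _ in range(n):
        u_s = rng.random()                        # U_S ~ Unif[0, 1]
        y0 = 1.0 + rng.gauss(0.0, 0.1)            # untreated outcome
        benefit = bmte_true(u_s) + rng.gauss(0.0, 0.1)
        d = 1 if u_s <= p else 0                  # treated iff U_S <= P
        total += y0 + d * benefit                 # observed outcome Y
    return total / n

def liv_finite_difference(p, h=0.05, n=100_000, seed=0):
    """Central difference of E(Y | P = p), approximating dE(Y | P = p)/dp."""
    rng = random.Random(seed)
    upper = mean_outcome_at(p + h, n, rng)
    lower = mean_outcome_at(p - h, n, rng)
    return (upper - lower) / (2.0 * h)
```

In this design E(Y | P = p) = 1 + 2p − p², so the central difference should be close to BMTE(p) = 2 − 2p up to simulation noise. With real data the conditional expectation would have to be estimated, for example by a nonparametric regression of Y on P, before differentiating.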

We can identify BATE(x) and BTT(x) by identifying BMTE(x, uS) over the appropriate support and then integrating the latter with the appropriate weights, which are known given data on X and Z. By Equation (3.1), we identify BATE(x) if Supp(P|X = x) = [0, 1]. For fixed X = x, this requires that there be enough variation in the cost shifters Z to drive the probabilities P(x, Z) all the way to zero and to one. In other words, holding fixed the regressors that enter the outcome equation, we must observe cost shifters such that conditional on some values of those cost shifters, the cost to the agent is so low that the agent will select into treatment with probability arbitrarily close to one, and, conditional on other values of the cost shifters, the cost to the agent is so high that the agent will select into treatment with probability arbitrarily close to zero. Likewise, we identify BTT(x) if Supp(P|X = x) = [0, p_x^max], where p_x^max is the supremum of Supp(P|X = x). This support requirement in turn requires that, for fixed X = x, there be enough variation in the cost shifters Z to drive the selection probability arbitrarily close to zero.8

Using Equation (4.1) and the relationship for people on the margin of choice that BMTE(x, P(x, z)) = CMTE(z, P(x, z)), we have

$$\left. \frac{\partial E(Y \mid X = x, P = p)}{\partial p} \right|_{p = P(x, z)} = C^{MTE}(z, P(x, z)). \tag{4.2}$$

Using this relationship, we identify CMTE(z, uS) for all values of (z, uS) ∈ Supp(Z, P). We thus identify the marginal cost parameter without direct information on the cost of treatment by using the structure of the Roy model and by identifying the marginal benefit of treatment for individuals at the margin of participation. For a fixed z, we identify CMTE(z, uS) for uS ∈ Supp(P|Z = z). The greater the variation in propensity scores conditional on Z = z, the larger the set of evaluation points for which we identify CMTE(z, uS). Variation in propensity scores conditional on Z = z is driven by variation in X, the regressors that affect the potential outcomes and thus that drive the benefit of treatment. If we observe X regressors that cause large variations in benefits, we will be able to identify CMTE(z, uS) at a larger set of uS evaluation points. In contrast, if there are no X regressors, then P only depends on Z and we can only identify CMTE(z, uS) for uS = P(z).
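The way variation in X traces out the marginal cost function for a fixed z can be sketched numerically. The example assumes a normal model with independent U1 − U0 and UC and linear mean functions; all parameter values and helper names are hypothetical. For simplicity the true BMTE stands in for the LIV estimate that would be used in practice.

```python
from statistics import NormalDist

VAR_B, VAR_C = 1.0, 0.25            # assumed Var(U1 - U0) and Var(UC)
VAR_V = VAR_B + VAR_C               # Var(V) when U1 - U0 and UC are independent
F_V = NormalDist(0.0, VAR_V ** 0.5)

def mu_b(x):
    """Assumed mean benefit mu1(x) - mu0(x)."""
    return 0.2 + 1.0 * x

def mu_c(z):
    """Assumed mean cost muC(z)."""
    return 0.3 + 0.5 * z

def propensity(x, z):
    return F_V.cdf(mu_b(x) - mu_c(z))

def bmte(x, u):
    # Stand-in for the LIV estimand of Equation (4.1).
    return mu_b(x) - (VAR_B / VAR_V) * F_V.inv_cdf(u)

def cmte_true(z, u):
    # Used only to check the result; never observed by the analyst.
    return mu_c(z) + (VAR_C / VAR_V) * F_V.inv_cdf(u)

def trace_marginal_cost(z, xs):
    """Return (u_S, CMTE(z, u_S)) pairs recovered from the benefit side."""
    pairs = []
    for x in xs:
        p = propensity(x, z)            # margin of indifference at (x, z)
        pairs.append((p, bmte(x, p)))   # Equation (4.2): cost = benefit at p
    return pairs
```

Each value of x moves the margin of indifference P(x, z) to a different quantile uS, and the benefit evaluated there reveals the cost at that quantile. With no variation in x the list collapses to a single point, matching the remark that only CMTE(z, P(z)) is then identified.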

From Equation (3.2), we can identify CATE(z) if Supp(P|Z = z) = [0, 1]. This requires that, for fixed Z = z, there be enough variation in the outcome shifters X to drive the probabilities P(X, Z) all the way to zero and to one. In other words, holding fixed the regressors that enter the cost equation, we must observe outcome shifters such that conditional on some values of those outcome shifters, the benefit to the agent is so high that the agent will select into treatment with probability arbitrarily close to one; conditional on other values of the outcome shifters, the benefit to the agent is so low that the agent will select into treatment with probability arbitrarily close to zero. Likewise, we identify CTT(z) if Supp(P|Z = z) = [0, p_z^max], where p_z^max is the supremum of Supp(P|Z = z). This support requirement in turn requires that, for fixed Z = z, there be sufficient variation in the outcome shifters X to drive the probabilities arbitrarily close to zero.

Finally, consider identification of the surplus parameters. Using the fact that

$$S^{MTE}(x, z, u_S) = B^{MTE}(x, u_S) - C^{MTE}(z, u_S),$$

we can identify the marginal surplus parameter at (x, z, uS) such that (x, uS) ∈ Supp(X, P) and (z, uS) ∈ Supp(Z, P). By Equation (3.3), we can integrate SMTE(x, z, uS) using the appropriate weights (which are identified from the data on X and Z) to identify SATE(x, z) and STT(x, z) under the appropriate support conditions. For example, we identify SATE(x, z) if Supp(P|X = x) = [0, 1] and Supp(P|Z = z) = [0, 1].

Thus, for identification of the benefit parameters we need sufficient variation in the cost shifters conditional on the outcome shifters. For identification of the cost parameters, we need sufficient variation in the outcome shifters conditional on the cost shifters. For identification of the surplus parameters we need sufficient variation in both sets of regressors. We can thus identify the marginal cost, the average cost, and the cost of treatment on the treated without direct information on the cost. Consequently, we can also identify the corresponding surplus parameters. Our ability to do so is directly related to the extent of variation in observed regressors that shift the benefit of the treatment.

We summarize our discussion in the form of a theorem:

Theorem 1

Assume that Equations (2.1)–(2.4) and our Assumptions (A-1)–(A-4) hold.

  1. BMTE(x, uS) is identified for (x, uS) ∈ Supp(X, P); CMTE(z, uS) is identified for (z, uS) ∈ Supp(Z, P); and SMTE(x, z, uS) is identified for (x, z, uS) such that (x, uS) ∈ Supp(X, P) and (z, uS) ∈ Supp(Z, P).

  2. BATE(x) is identified if Supp(P|X = x) = [0, 1]; CATE(z) is identified if Supp(P|Z = z) = [0, 1]; SATE(x, z) is identified if Supp(P|X = x) = [0, 1] and Supp(P|Z = z) = [0, 1].

  3. BTT(x) is identified if Supp(P|X = x) = [0, p_x^max]; CTT(z) is identified if Supp(P|Z = z) = [0, p_z^max]; STT(x, z) is identified if Supp(P|X = x) = [0, p_x^max] and Supp(P|Z = z) = [0, p_z^max].

Our results allow for unobserved heterogeneity in costs and benefits conditional on the observed regressors. If there is no unobserved (by the economist) heterogeneity in the costs of treatment, UC = 0, then CMTE(z, uS) = CTT(z) = CATE(z) and thus we can identify the cost of treatment on the treated and average cost parameters without the additional support conditions. Likewise, if we impose that there is no unobserved heterogeneity in the benefits of treatment, U1 − U0 = 0, we have BMTE(x, uS) = BTT(x) = BATE(x) and can thus identify all of the benefit parameters without additional support conditions.

We establish identification of the marginal effect parameters within the conditional support of P. However, exploiting additive separability, we are able to extend the margin of identification to the unconditional support of P by a chaining argument. We illustrate the reasoning behind this for the BMTE(x, uS), but the analogous result applies to the marginal cost and surplus functions as well.

Recall that BMTE(x, uS) = μ1(x) − μ0(x) + E(U1 − U0 | US = uS) is identified for all (x, uS) ∈ Supp(X, P). How BMTE(x, uS) varies with x does not depend on the point of evaluation of uS, and how BMTE(x, uS) varies with uS does not depend on the point of evaluation of x. This insight is helpful in securing identification of BMTE(x, uS) for other (x, uS) pairs.

For example, consider two potential values of X, x0 and x1, and suppose that there exists some p* such that p* ∈ Supp(P|X = x0) ∩ Supp(P|X = x1) so that BMTE(x0, p*) and BMTE(x1, p*) are both identified by Theorem 1. BMTE(x, uS) is additively separable in x and uS. As a consequence of additive separability, it follows directly that

$$B^{MTE}(x_0, u_S) - B^{MTE}(x_0, p^*) = B^{MTE}(x_1, u_S) - B^{MTE}(x_1, p^*). \tag{4.3}$$

If uS ∈ Supp(P|X = x1), we identify BMTE(x1, uS) by Theorem 1. We can solve Equation (4.3) to identify BMTE(x0, uS) even if uS ∉ Supp(P|X = x0). Alternatively, if uS ∈ Supp(P|X = x0), we identify BMTE(x0, uS) by Theorem 1 and can now solve Equation (4.3) to identify BMTE(x1, uS) even if uS ∉ Supp(P|X = x1). Thus, if there exists some p* such that p* ∈ Supp(P|X = x0) ∩ Supp(P|X = x1), we can chain together identification of BMTE(x0, uS) for uS ∈ Supp(P|X = x0) and identification of BMTE(x1, uS) for uS ∈ Supp(P|X = x1) to obtain identification of BMTE(x0, uS) and BMTE(x1, uS) for all uS ∈ Supp(P|X = x0) ∪ Supp(P|X = x1). One can iterate to further increase the range of values for which BMTE(x, uS) is identified. Under an additional rank condition, we can use this strategy to identify BMTE(x, uS) for all (x, uS) ∈ Supp(X) × Supp(P). In particular, we consider the following assumption:

(A-5) X and P(X, Z) are measurably separated; i.e., any function of X that almost surely equals a function of P(X, Z) must be almost surely equal to a constant.

Measurable separability between X and P is a rank condition. A necessary condition for measurable separability between X and P(X, Z) is for P(X, Z) to be nondegenerate conditional on X, as implied by P(X, Z) = FV(μS(X, Z)) along with Assumptions (A-2) and (A-3). In Theorem 5 in Appendix A, we build on Theorem 2 of Florens et al. (2008) to provide sufficient conditions on our model for measurable separability between X and P(X, Z). As shown by that theorem, strengthened versions of Assumptions (A-2) and (A-3), along with an additional support condition, are sufficient for measurable separability between X and P(X, Z).

Using Assumption (A-5), we obtain the following identification result:

Theorem 2

Assume that Equations (2.1)–(2.4) and our Assumptions (A-1)–(A-5) hold. Then, for x ∈ Supp(X) and z ∈ Supp(Z),

  1. BMTE(x, uS), CMTE(z, uS) and SMTE(x, z, uS) are identified for uS ∈ Supp(P).

  2. BATE(x), CATE(z) and SATE(x, z) are identified if Supp(P) = [0, 1], and

  3. BTT(x), CTT(z) and STT(x, z) are identified if Supp(P) = [0, pmax].

The proof of Theorem 2 is in Appendix B. The theorem shows that, under our maintained assumptions and condition (A-5), identification of the treatment parameters depends on the marginal support of P, not on the support of P conditional on X or Z.
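The chaining step in Equation (4.3), which underlies the move from conditional to marginal support, can be sketched with a toy additively separable marginal benefit. The function `bmte_true`, the two values of X, and the conditional supports in `SUPPORT` are assumptions made only for this illustration.

```python
# Assumed conditional supports Supp(P | X = x) for x0 = 0.0 and x1 = 1.0.
SUPPORT = {0.0: (0.2, 0.5), 1.0: (0.4, 0.8)}

def bmte_true(x, u):
    """Assumed additively separable marginal benefit g(x) + k(u)."""
    return (0.2 + x) + (1.0 - 2.0 * u)

def bmte_identified(x, u):
    """Return BMTE(x, u) only where Theorem 1 identifies it."""
    lo, hi = SUPPORT[x]
    if not lo <= u <= hi:
        raise ValueError("u_S outside Supp(P | X = x): not identified directly")
    return bmte_true(x, u)

def bmte_chained(x0, x1, u, p_star):
    """Solve Equation (4.3) for BMTE(x0, u) using an overlap point p_star."""
    return (bmte_identified(x0, p_star)
            + bmte_identified(x1, u)
            - bmte_identified(x1, p_star))
```

For instance, `bmte_chained(0.0, 1.0, 0.7, 0.45)` recovers the marginal benefit at x0 = 0.0 and uS = 0.7, a point outside Supp(P | X = x0) in this toy setup, by routing through the overlap point p* = 0.45. Each term on the right-hand side is evaluated only inside its own conditional support.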

5 Extension to the Case of Limited Information by the Agent

Thus far, our analysis has assumed choice Equation (2.4), i.e., that D = 1[S ≥ 0] where S = (Y1 − Y0) − C. This implicitly assumes that agents have perfect foresight about their net benefit. In this section, we extend the choice model of Equation (2.4) to allow for limited information on the part of the agents, while maintaining the model for latent outcomes (Y0, Y1) and cost C of Equations (2.2) and (2.3). We assume that agents form valid expectations about their outcomes and costs given the information that they have at the time of their treatment choice and that they select into treatment if the expected surplus is positive. We allow agents to know only some elements of (X, Z), and possibly to have incomplete knowledge of (U0, U1, UC) and hence their own idiosyncratic benefit and cost of treatment. We now show that the preceding analysis goes through with minor modifications, though it is now important to distinguish conditioning sets: what is known to the agent at the time of treatment choice (which might include some information not known to the econometrician), what is known to the econometrician (which might include some information not known to the agent at the time of treatment choice), and what is realized ex post. The essential change in our procedure in the case of incomplete information is that the marginal benefit of treatment identified by LIV must be projected onto the agent’s information set when selecting treatment to form the expected marginal benefit of treatment conditional on the information available to the agent. This coarsened version of BMTE is used to identify the marginal cost parameter. In addition, only components of X that are known to the agent at the time of treatment choice can aid in identification of the cost parameters. The exclusion restrictions for identification of the cost parameter are variables in X that are not in Z and that are known to the agent at the time of choosing treatment.

Let (XI, Z) denote components of (X, Z) that are observed by the agent when choosing whether to select into treatment.9 Suppose that the agent’s information set is (XI, Z, UI).10 UI is the private information of the agent relevant to his or her own benefits and cost of treatment, and is not observed by the econometrician.

We revise assumption (A-1) in the following way:

(A-1′) (U0, U1, UC, UI) is independent of (X, Z), and X is independent of Z conditional on XI.

Assumption (A-1′) imposes the requirement that the private information of the agent is independent of the observed regressors. Note that, under this independence assumption, (U0, U1, UC, UI) ⫫ (XI, Z), and

$$E(V \mid X, Z, U_I) = E(V \mid X_I, Z, U_I) = E(V \mid U_I),$$

using the definition V = UC − (U1 − U0).

Assumption (A-1′) implies that (X, Z) ⫫ UI | (XI, Z), so that UI does not help the agent predict elements of (X, Z) that are not contained in (XI, Z). Thus, we allow the agents to have private information about their own idiosyncratic benefits (U1 − U0) and costs UC, though we impose the restriction that the only information known by the agent that is useful for predicting X is (XI, Z). Furthermore, Assumption (A-1′) requires that, conditional on the components of X known to the agent at the time of selecting into treatment, Z does not help to predict those elements of X not known at the time of treatment selection. This restriction is only imposed for notational convenience and can be easily relaxed.

We restate Assumption (A-3) as:

(A-3′) The distribution of Ṽ = E(V|UI) is absolutely continuous with respect to Lebesgue measure, and the cumulative distribution function of Ṽ is strictly increasing.

An implication of (A-3′) is that E(V|UI) is a nondegenerate random variable, and thus that agents have some nontrivial information about their own idiosyncratic cost or benefit from treatment when deciding whether to select into treatment. We maintain Assumptions (A-2) and (A-4) as before.

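Define $\mu_j^I(X_I) = E(Y_j \mid X_I)$ for j = 0, 1, and $\mu_C^I(Z) = E(C \mid Z)$, and note that given our independence assumptions and the law of iterated expectations, $\mu_j^I(X_I) = E(\mu_j(X) \mid X_I)$ and $\mu_C^I(Z) = E(\mu_C(Z) \mid Z)$. Define $\mu_S^I(X_I, Z) = E(S \mid X_I, Z)$. Under our assumptions,

$$E(S \mid X_I, Z, U_I) = \mu_S^I(X_I, Z) - \tilde{V} = \mu_1^I(X_I) - \mu_0^I(X_I) - \mu_C^I(Z) - \tilde{V}.$$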

The decision rule becomes

$$D = 1 \ \text{if} \ E(S \mid X_I, Z, U_I) \ge 0; \quad D = 0 \ \text{otherwise}, \tag{5.1}$$

where E(S|XI, Z, UI) is the expected surplus from treatment, with the expectation conditional on the agent’s information set. We thus have

$$D = 1[\mu_S^I(X_I, Z) - \tilde{V} \ge 0],$$

where our independence assumptions imply Ṽ ⫫ (XI, Z), and thus the selection model is of the same form as that used by Heckman and Vytlacil (1999), which allows us to use LIV to identify BMTE(x, uS). Redefining $U_S = F_{\tilde{V}}(\tilde{V})$ and $P(X_I, Z) = \Pr[D = 1 \mid X_I, Z] = F_{\tilde{V}}(\mu_S^I(X_I, Z))$, we have

$$D = 1[P(X_I, Z) - U_S \ge 0],$$

with US distributed unit uniform and independent of (X, Z) and thus independent of (XI, Z).
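The equivalence between the latent-index form and the propensity-score form of the decision rule can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes that Ṽ is normally distributed, and the values of `mu_s` and `sigma_v` are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_choices(mu_s, sigma_v=0.25, n=10_000, seed=0):
    """Compare D = 1[mu_S^I - V~ >= 0] with D = 1[P - U_S >= 0].

    mu_s    : hypothetical value of mu_S^I(x_I, z)
    sigma_v : hypothetical std. dev. of V~ (normality is an assumption)
    Returns (P, share treated, share of draws where both rules agree).
    """
    rng = random.Random(seed)
    dist = NormalDist(0.0, sigma_v)
    p = dist.cdf(mu_s)                # P(x_I, z) = F_V~(mu_S^I(x_I, z))
    agree = 0
    treated = 0
    for _ in range(n):
        v = rng.gauss(0.0, sigma_v)   # draw of V~
        u_s = dist.cdf(v)             # U_S = F_V~(V~), uniform on [0, 1]
        d_latent = mu_s - v >= 0      # latent-index form of the rule
        d_uniform = p - u_s >= 0      # propensity-score form of the rule
        agree += d_latent == d_uniform
        treated += d_latent
    return p, treated / n, agree / n


p, share_treated, share_agree = simulate_choices(0.1)
```

Because the distribution function is monotone, both rules select the same agents, and the share of treated agents approximates P(xI, z).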

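Define $B^I_{MTE}(x_I, u_S) \equiv E(Y_1 - Y_0 \mid X_I = x_I, U_S = u_S)$, $C^I_{MTE}(z, u_S) \equiv E(C \mid Z = z, U_S = u_S)$, and $S^I_{MTE}(x_I, z, u_S) \equiv B^I_{MTE}(x_I, u_S) - C^I_{MTE}(z, u_S)$, the marginal benefit, cost, and net surplus of treatment conditional on the agent’s information set, where again by the law of iterated expectations and our independence assumptions

$$\begin{aligned}
B^I_{MTE}(x_I, u_S) &= E(B_{MTE}(X, u_S) \mid X_I = x_I, U_S = u_S) = E(B_{MTE}(X, u_S) \mid X_I = x_I), \\
C^I_{MTE}(z, u_S) &= E(C_{MTE}(Z, u_S) \mid Z = z, U_S = u_S) = E(C_{MTE}(Z, u_S) \mid Z = z).
\end{aligned}$$

Evaluating $S^I_{MTE}(x_I, z, u_S)$ at uS = P(xI, z), we obtain

$$\begin{aligned}
S^I_{MTE}(x_I, z, P(x_I, z)) &= \mu_S^I(x_I, z) - E(V \mid U_S = P(x_I, z)) \\
&= \mu_S^I(x_I, z) - E(V \mid \tilde{V} = \mu_S^I(x_I, z)) \\
&= \mu_S^I(x_I, z) - E(V \mid E(V \mid U_I) = \mu_S^I(x_I, z)) \\
&= \mu_S^I(x_I, z) - E(E(V \mid U_I) \mid E(V \mid U_I) = \mu_S^I(x_I, z)) \\
&= \mu_S^I(x_I, z) - \mu_S^I(x_I, z) = 0,
\end{aligned}$$

where the second equality is obtained by plugging in the definition of US, the third equality is obtained by plugging in the definition of Ṽ, and the fourth equality is obtained using the law of iterated expectations and the fact that E(V|UI) is degenerate given UI. Since $S^I_{MTE}(x_I, z, u_S) = B^I_{MTE}(x_I, u_S) - C^I_{MTE}(z, u_S)$, we have

$$B^I_{MTE}(x_I, u_S) = C^I_{MTE}(z, u_S) \quad \text{for } u_S \text{ such that } u_S = P(x_I, z).$$

Thus, identification of $B^I_{MTE}(x_I, P(x_I, z))$ provides identification of $C^I_{MTE}(z, P(x_I, z))$.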

Since our model is a special case of Heckman and Vytlacil (1999), we can follow them in using LIV to identify BMTE(x, uS) for (x, uS) in the support of (X, P(XI, Z)). It is important to note that LIV does not identify the marginal benefit that is relevant to the agent’s decision problem. LIV identifies BMTE(x, uS) = E(Y1 − Y0|X = x, US = uS), not $B^I_{MTE}(x_I, u_S) = E(Y_1 - Y_0 \mid X_I = x_I, U_S = u_S)$. However, we can project the BMTE(x, uS) identified by LIV on the information known to the agent at the time of selecting into treatment and coarsen the set used to define and identify BMTE(x, uS), to identify the $B^I_{MTE}(x_I, u_S)$ relevant to the agent’s decision problem. It is the latter that is relevant for identifying the cost functions. By the law of iterated expectations, we obtain

$$B^I_{MTE}(x_I, u_S) = E(B_{MTE}(X, u_S) \mid X_I = x_I) = \int B_{MTE}(x, u_S)\, dF_X(x \mid X_I = x_I), \tag{5.2}$$

where FX(·|XI = xI) is the cumulative distribution function of X conditional on XI = xI. We directly identify FX(·|XI = xI), and thus, for given uS, identification of BMTE(x, uS) for all x ∈ Supp(X|XI = xI) implies identification of $B^I_{MTE}(x_I, u_S)$. Since, for a given x, we identify BMTE(x, uS) if uS ∈ Supp(P(XI, Z)|X = x), we thus identify $B^I_{MTE}(x_I, u_S)$ if

$$u_S \in \bigcap_{x \in \mathrm{Supp}(X \mid X_I = x_I)} \mathrm{Supp}(P(X_I, Z) \mid X = x).$$

In other words, to identify ex ante $B^I_{MTE}(x_I, u_S)$, we need to identify ex post BMTE(x, uS) for every value x that X can take given XI = xI, and thus we need for uS to be an element of Supp(P(XI, Z)|X = x) for each value x that X can take given XI = xI. However, using the fact that XI is a subvector of X and independence assumption (A-1′), it follows that Supp(P(XI, Z)|X) = Supp(P(XI, Z)|XI), and thus using Equation (5.2) we identify $B^I_{MTE}(x_I, u_S)$ for (xI, uS) in the support of (XI, P(XI, Z)). Using the fact that $B^I_{MTE}(x_I, P(x_I, z)) = C^I_{MTE}(z, P(x_I, z))$, we identify $C^I_{MTE}(z, u_S)$ for (z, uS) in the support of (Z, P(XI, Z)). We have thus identified the marginal cost parameter, and can integrate it to obtain other cost parameters. We can also combine it with the benefit parameters to identify net surplus parameters as before. The only elements of X that are useful for identifying the cost parameters are those elements that are in X, but not in Z, and which are known to the agent at the time of selection into treatment (i.e., are contained in XI).
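The support condition for the ex ante marginal benefit can be made concrete with a small sketch. It assumes, purely for illustration, that each conditional support Supp(P(XI, Z)|X = x) is a closed interval. The interval endpoints below are hypothetical and are not taken from the data.

```python
def intersect_supports(supports):
    """Intersect closed intervals (lo, hi); return None if they do not overlap."""
    lo = max(lo_i for lo_i, _ in supports)
    hi = min(hi_i for _, hi_i in supports)
    return (lo, hi) if lo <= hi else None


# Hypothetical supports of P(X_I, Z) given X = x, one interval per value x
# that X can take when X_I = x_I.
supports_given_x = [(0.30, 0.70), (0.35, 0.80), (0.25, 0.65)]

# Values of u_S at which the ex ante marginal benefit would be identified.
identified_range = intersect_supports(supports_given_x)
```

Under Assumption (A-1′) the conditional supports coincide for all x that share the same xI, so in the model of this section the intersection does not shrink the identified range.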

6 Estimating the Cost and Surplus from Educational Choices

We apply our methodology to an analysis of educational choice and estimate the marginal benefit, cost, and surplus from a college education. Carneiro et al. (2011) provide estimates of the marginal benefit of attending college. We extend their work by adding results for the subjective cost and surplus. Björklund and Moffitt (1987) provide fully parametric estimates of cost and surplus in the context of a manpower training program in Sweden. Application of their approach offers a useful benchmark to gauge our more flexible estimation strategy. Our nonparametric identification analysis follows Marschak (1953) who noted that for many policy analyses only combinations of structural parameters are required. We embrace Marschak’s Maxim (Heckman, 2010) and implement an estimation strategy with minimal assumptions and transparent sources of identification for the marginal effects of treatment.

We analyze a sample of 1,747 white males from the National Longitudinal Survey of Youth of 1979 (NLSY79).11 The outcome variable is the log of the mean non-missing values of the hourly wage between 1989 and 1993, which we interpret as an estimate of the log hourly wage in 1991, and an approximation to the long-run wage. Schooling is measured in 1991 when individuals are between 28 and 34 years of age. We separate individuals into two groups: persons with no college (D = 0) and persons with at least some college (D = 1). We present annualized returns to education, obtained by dividing all our estimates by four, which is the average difference in years of schooling between those with D = 1 and those with D = 0.

To identify the $C^I_{MTE}(z, u_S)$, we require variables that do not affect the cost of attending college, but that change future wages and are known to the agent at college entry (benefit shifters). We measure long-run labor market conditions by permanent local wages and compute average earnings between 1973 and 2000 for each location of residence at 17 as a proxy. Since we will also condition on current labor market conditions at the time of potential enrollment, these regressors should only affect the schooling decision through their effect on the agent’s expected future wages and thus the expected benefit of treatment. We assume that the main benefits to a higher education are through earnings. Any other subjective benefits, such as allowing access to jobs with preferred amenities, are implicitly included (as a negative contribution) in costs. The validity of our exclusion restriction would be threatened if our measure of permanent local wages affects the subjective benefit of education.

We identify BMTE(x, uS) and $B^I_{MTE}(x_I, u_S)$ using variables that do not affect future wages, but only the cost of attending college (cost shifters). We use current fluctuations in local labor market conditions such as local wages at the time of the educational decision, which shift the opportunity cost of schooling. They should not help to predict the agent’s expected future wages as we also control for permanent local labor market conditions. Effectively, we use only the innovations in local wages as cost shifters. We also include tuition cost, a dummy variable indicating urban residence at age 14, and distance to college as shifters that affect the direct cost of attending college.

Table 1 presents the covariates used in our empirical analysis. We highlight the two different types of exclusion restrictions. Variables that affect benefits as well as costs of treatment (common elements) include the Armed Forces Qualifying Test (AFQT) scores, mother’s education, number of siblings, and cohort dummies. In what follows, we keep this set of observables in the background to ease notation. X and Z continue to denote the benefit and cost shifters respectively. XI is the subvector of X which is known to the agent at the college entry decision. We include two variables in X not included in XI: years of experience and wages in the county of residence. The excluded variables are measured approximately 12 years after the agent’s college entry decision and thus not in the individual’s information set at the time of the treatment decision. We follow the analysis of Section 5 and allow agents to have imperfect foresight about the realizations of these variables. They form expectations about their future wages, but do not have perfect information. In line with our exposition, we assume that Z does not help to predict the ex post realization of X conditional on XI and denote the agent’s information about their idiosyncratic cost and benefit from treatment as Ṽ = E(V | UI).

Table 1.

Specification

X XI Z Common
Years of Experience (in 1991)
Current Local Wages (in 1991)
Permanent Local Wages
AFQT Scores
Mother’s Education
Number of Siblings
Cohort Dummies
Urban Residence
Local Presence of Public College (age 14)
Local Tuition at Public College (age 17)
Local Wages (age 17)

Notes: Our main specification includes years of experience (linear and squared), current local wages (linear), permanent local wages (linear and squared), AFQT scores (linear and squared), mother’s education (linear and squared), number of siblings (linear and squared), urban residence (linear), cohort dummies (linear), local presence of public colleges (linear), local tuition of public college (linear), and local wages (linear). All exclusions from the benefit equation are interacted with AFQT scores, mother’s education, and number of siblings.

We specify a linear version of the generalized Roy model. Define potential outcomes:

$$Y_1 = X\beta_1 + U_1 \quad \text{and} \quad Y_0 = X\beta_0 + U_0.$$

The choice equation is:

$$D = 1[X_I(\alpha_1 - \alpha_0) - Z\gamma > \tilde{V}],$$

where we assume that agents form valid expectations about their own outcomes so that E(X(β1 − β0) | XI) = XI(α1 − α0) holds. Note that XI does not only affect the returns to education directly, but also helps to predict the ex post realization of those elements of X not contained in XI.

We first implement the traditional structural approach and explicitly estimate all components of the generalized Roy model and combine them to form the marginal effect parameters (Björklund and Moffitt, 1987). We impose normality for the unobservables and fit the model by maximum likelihood. As the participation decision is based on the net surplus and X does not affect the cost of treatment, this implies a cross-equation restriction between the coefficients on X in the outcome equations and XI in the choice equation. We account for agents’ imperfect foresight and set $(\alpha_1 - \alpha_0) = (\bar{X}_I'\bar{X}_I)^{-1}\bar{X}_I'\bar{X}(\beta_1 - \beta_0)$, where (X̄, X̄I) denote the matrices with the outcome shifters of the whole sample. We estimate the whole model in one step. In a standard Probit model, the coefficients can only be identified up to a factor of proportionality. However, as the wage gain (α1 − α0)XI appears with a coefficient of one in the choice equation, we do not need to normalize the variance of Ṽ and estimate it instead. We can then construct the marginal effects of treatment based on the results:

$$\begin{aligned}
B_{MTE}(x, u_S) &= x(\beta_1 - \beta_0) + \left(\frac{\sigma_{U_1 - U_0, \tilde{V}}}{\sigma^2_{\tilde{V}}}\right)\Phi^{-1}_{\sigma_{\tilde{V}}}(u_S) \\
B^I_{MTE}(x_I, u_S) &= x_I(\alpha_1 - \alpha_0) + \left(\frac{\sigma_{U_1 - U_0, \tilde{V}}}{\sigma^2_{\tilde{V}}}\right)\Phi^{-1}_{\sigma_{\tilde{V}}}(u_S) \\
C^I_{MTE}(z, u_S) &= z\gamma + \left(\frac{\sigma_{U_C, \tilde{V}}}{\sigma^2_{\tilde{V}}}\right)\Phi^{-1}_{\sigma_{\tilde{V}}}(u_S) \\
S^I_{MTE}(x_I, z, u_S) &= x_I(\alpha_1 - \alpha_0) - z\gamma - \Phi^{-1}_{\sigma_{\tilde{V}}}(u_S),
\end{aligned}$$

where $\sigma_{U_1 - U_0, \tilde{V}}$ and $\sigma_{U_C, \tilde{V}}$ denote the covariance between (U1 − U0, Ṽ) and (UC, Ṽ) respectively. $\Phi^{-1}_{\sigma_{\tilde{V}}}$ indicates the inverse of a normal cumulative distribution function with standard deviation $\sigma_{\tilde{V}}$.

The sign of the slope of the marginal effect parameters is determined by $\sigma_{U_C, \tilde{V}}$ and $\sigma_{U_1 - U_0, \tilde{V}}$ as $\sigma^2_{\tilde{V}} > 0$. We present our results for these parameters in Table 2. The estimate for $\sigma_{U_1 - U_0, \tilde{V}}$ is negative and thus the marginal benefits of treatment decrease when moving along the margins of Ṽ. The opposite is true for $\sigma_{U_C, \tilde{V}}$ and so the marginal cost increases in uS. However, only $\sigma_{U_1 - U_0, \tilde{V}}$ is significantly different from zero at the 10% level.

Table 2.

Slope Parameters

Parameter        Estimate    90% Confi.         p-val.
σ(U1−U0),Ṽ       −0.042      −0.216 / 0.001     0.06
σUC,Ṽ            0.015       −0.020 / 0.579     0.29
σ²Ṽ              0.058       0.005 / 0.769      0.00

Notes: Confi. = Confidence Interval, p - val. = p -values.
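To see how the point estimates in Table 2 translate into slopes, the sketch below evaluates only the uS-dependent terms of the parametric marginal effect formulas. The observable index terms x(β1 − β0), xI(α1 − α0), and zγ are not reported in this section and are therefore omitted. The function name is illustrative.

```python
from statistics import NormalDist

# Point estimates from Table 2.
COV_GAIN_V = -0.042   # covariance of (U1 - U0) and V~
COV_COST_V = 0.015    # covariance of UC and V~
VAR_V = 0.058         # variance of V~


def unobserved_components(u_s):
    """Return the u_S-dependent parts of the benefit, cost, and surplus."""
    # Inverse normal CDF with standard deviation sigma_V~, evaluated at u_S.
    q = NormalDist(0.0, VAR_V ** 0.5).inv_cdf(u_s)
    benefit = COV_GAIN_V / VAR_V * q
    cost = COV_COST_V / VAR_V * q
    surplus = -q
    return benefit, cost, surplus


low_margin = unobserved_components(0.25)
high_margin = unobserved_components(0.75)
```

With these estimates the benefit term falls and the cost term rises as uS increases. The difference between the benefit and cost terms matches the surplus term up to rounding in the reported estimates, since 0.015 − (−0.042) = 0.057 is close to the estimated variance of 0.058.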

Figure 2 presents our fully parametric results for the ex post marginal benefit and ex ante cost and surplus parameters. We plot them as a function of uS and evaluate them at the sample mean of (XI, Z). As agents are assumed to form valid expectations about their future benefits, the ex ante and ex post marginal benefits are identical. Individuals with a high unobserved desire for treatment (low uS) have the highest benefit, strictly decreasing from +16% to −4%. The estimated surplus is positive for low values of uS and decreases when moving along the margins of Ṽ. The opposite holds for the marginal cost, which is always positive and slightly increasing. The cost is lowest for individuals with low values of uS and ranges from +3% to +10%. In summary, the benefit is highest and cost lowest for those most likely to pursue a higher education. However, the estimates are not precisely determined. The marginal benefit of treatment is significantly different from zero for roughly half of the individuals. Along all margins of Ṽ, the marginal cost of a college education does not significantly differ from zero. By construction, the marginal surplus is strictly positive for all those individuals who participate in the treatment and negative for those that do not. Conditional on the observables set to their sample mean, individuals are indifferent towards treatment when uS = 0.51.

Figure 2.

Marginal Effects of Treatment, Parametric

Figure 2 presents the marginal effect parameters over the full unit interval from the structural model. The distributional assumptions on (U1, U0, Ṽ) expand the margins for which we can identify the marginal effects of treatment. As we assume full independence between all observables and unobservables, we identify the marginal effects of treatment over the unconditional common support of P (XI, Z). In our sample, this support ranges between 0.03 and 0.98. Adding joint normality, we can extrapolate even further and cover the full unit interval.

However, our formal analysis demonstrates that in a fully nonparametric setting we are only able to identify the $B^I_{MTE}(x_I, u_S)$ over the support of P(XI, Z) conditional on XI = xI and the $C^I_{MTE}(z, u_S)$ over the support of P(XI, Z) conditional on Z = z. We identify the $S^I_{MTE}(x_I, z, u_S)$ over the intersection of the two supports. In Figure 3 we plot the conditional densities of P(XI, Z) in our data. As XI and Z are both multidimensional, we condition on the decile of the relevant index, i.e. on XI(α1 − α0) for the $B^I_{MTE}(x_I, u_S)$ and Zγ for the $C^I_{MTE}(z, u_S)$.12 The support is very limited and thus the results of a fully parametric implementation rely heavily on extrapolation based on the distributional assumptions.

Figure 3.

Conditional Support

We now develop a semiparametric estimation strategy that relies on fewer assumptions and provides more transparent sources of identification. We apply Marschak’s Maxim, estimating only those combinations of structural parameters needed for the marginal effect parameters. To fix ideas, consider the estimation of the BMTE(x, uS), where the conditional expectation of (U1 − U0) along the margins of Ṽ is a key element. In the fully parametric normal-theory approach, it is directly constructed from estimates of $(\sigma_{U_1, \tilde{V}}, \sigma_{U_0, \tilde{V}})$ and $\sigma^2_{\tilde{V}}$:

$$E(U_1 - U_0 \mid U_S = u_S) = \left(\frac{\sigma_{U_1 - U_0, \tilde{V}}}{\sigma^2_{\tilde{V}}}\right)\Phi^{-1}_{\sigma_{\tilde{V}}}(u_S).$$

Instead, in what follows, we directly obtain E(U1 − U0 | US = uS) without having to estimate all structural components. We will also carefully recognize the relevant conditional support of P for each parameter and thus present a data-sensitive structural analysis (Heckman, 2010).

We determine the support of P by building on an estimator of the joint support of the distribution of (X, Z):

$$\hat{S}_{X,Z} = \{(x, z) : \lVert (X_i, Z_i) - (x, z) \rVert \le \varepsilon \ \text{for some } i\},$$

where || · || corresponds to the Euclidean norm and i denotes a generic observation in our data.13 Then, letting xI(x) indicate the appropriate subvector of x, our resulting estimator for the support of (XI, Z) is:

$$\hat{S}_{X_I,Z} = \{(x_I, z) : \exists\, (x, z) \in \hat{S}_{X,Z} \ \text{such that} \ (x_I(x), z) = (x_I, z)\}.$$

We can use these estimates to construct our desired support for the marginal cost and benefit parameters:

$$\begin{aligned}
\hat{S}_{X,P} &= \{(x, p) : \exists\, (x, z) \in \hat{S}_{X,Z} \ \text{such that} \ (x, P(x_I(x), z)) = (x, p)\} \\
\hat{S}_{X_I,P} &= \{(x_I, p) : \exists\, (x_I, z) \in \hat{S}_{X_I,Z} \ \text{such that} \ (x_I, P(x_I, z)) = (x_I, p)\} \\
\hat{S}_{Z,P} &= \{(z, p) : \exists\, (x_I, z) \in \hat{S}_{X_I,Z} \ \text{such that} \ (z, P(x_I, z)) = (z, p)\}.
\end{aligned}$$

Note that the variation in p for a given x and xI(x) is the same in $\hat{S}_{X,P}$ and $\hat{S}_{X_I,P}$. Thus we can identify the BMTE(x, uS) and $B^I_{MTE}(x_I(x), u_S)$ over the same margins. Finally, for the marginal surplus parameter, we collect in $\hat{S}_{X_I,Z,P}$ all (xI, z, p) where the relevant subsets in $\hat{S}_{X_I,P}$ and $\hat{S}_{Z,P}$ overlap in p. We only report estimates for the margins within these sets and thus acknowledge the limitations of the data.
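The support estimators can be sketched in a few lines. This is an illustrative sketch only: the sample points, the tolerance `eps`, the grid of z values, and the `propensity` function are hypothetical. It works directly with points (xI, z), which gives the same set as projecting the estimated support of (X, Z), because the unobserved components of x can be set equal to those of the nearby observation.

```python
import math


def in_estimated_support(point, sample, eps):
    """True if `point` lies within Euclidean distance eps of some observation."""
    return any(math.dist(point, obs) <= eps for obs in sample)


def propensity_range(x_i, z_grid, sample, eps, propensity):
    """Range of P(x_I, z) over grid values z with (x_I, z) in the estimated support.

    x_i and each element of z_grid are tuples; `sample` holds observed
    (X_I, Z) points as flat tuples. Returns None if no grid point qualifies.
    """
    ps = [
        propensity(x_i, z)
        for z in z_grid
        if in_estimated_support((*x_i, *z), sample, eps)
    ]
    return (min(ps), max(ps)) if ps else None


# Hypothetical data: two observations of a scalar X_I and a scalar Z.
sample = [(0.0, 0.0), (1.0, 1.0)]


def toy_propensity(x_i, z):
    # Hypothetical propensity score, used only to make the sketch runnable.
    return 0.5 + 0.1 * x_i[0] + 0.2 * z[0]


p_range = propensity_range((0.0,), [(0.0,), (0.1,), (1.0,)], sample, 0.2, toy_propensity)
```

In this toy example the grid value z = 1.0 is discarded for xI = 0.0 because no observation lies within `eps` of the point (0.0, 1.0), so the reported range of p is built only from the two remaining grid values.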

We estimate the BMTE(x, uS) using the method of local instrumental variables (LIV) proposed in Heckman and Vytlacil (1999, 2001b, 2005). They show that under our conditions the BMTE(x, uS) is identified by differentiating the conditional expectation of observed outcomes:

$$\left.\frac{\partial E(Y \mid X = x, P = p)}{\partial p}\right|_{p = u_S} = B_{MTE}(x, u_S). \tag{6.1}$$

Applied to sample data, this is the LIV estimator of Heckman and Vytlacil (1999).14 As noted in Carneiro et al. (2011), it is empirically very difficult to apply the LIV estimator while conditioning on all variables in the outcome equations. Thus we proceed by invoking the stronger assumption that in addition to the variables in X, all elements common to outcome and choice equations are independent of (U1, U0, Ṽ) as well. Because our generalized Roy model is also linear, the conditional expectation of Y simplifies to:

$$E(Y \mid X = x, P = p) = E(DY_1 + (1 - D)Y_0 \mid X = x, P = p) = x\beta_0 + p\,x(\beta_1 - \beta_0) + K(p), \tag{6.2}$$

where K(p) = pE(U1 − U0 | D = 1, P = p) can be estimated nonparametrically. We determine the parameters of Equation (6.1) by a partially linear regression of Y on X and P. We proceed in two steps. The first step is the construction of P, and the second step is the estimation of β1 and β0 using the estimated P. We carry out the first step using a Probit regression of D on (XI, Z). In the second step we use Robinson (1988)’s method for estimating partially linear models as extended in Heckman et al. (1997a).15 Next, consider the estimation of K(P). Equation (6.2) implies that $E(\tilde{Y} \mid P = p) = K(p)$, where $\tilde{Y} = Y - X\beta_0 - P X(\beta_1 - \beta_0)$ is the residualized observed outcome. We thus use a local quadratic regression of Ỹ on P to estimate K(P) and its partial derivative with respect to P.16 We construct the ex post marginal benefit of treatment BMTE(x, uS) based on these estimates:

$$B_{MTE}(x, p) = x(\beta_1 - \beta_0) + \frac{\partial K(p)}{\partial p} \quad \forall\, (x, p) \in \hat{S}_{X,P}.$$
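The second estimation step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the propensity score `p` is taken as given, Gaussian kernels are used throughout, and the function names are hypothetical. The regressor matrix `x` must not contain a constant, since the intercept is absorbed in K(p). The default bandwidths follow the values reported in the footnotes (0.05 and 0.3).

```python
import numpy as np


def kernel_smooth(p, v, h):
    """Nadaraya-Watson regression of each column of v on p (Gaussian kernel)."""
    w = np.exp(-0.5 * ((p[:, None] - p[None, :]) / h) ** 2)
    return (w @ v) / w.sum(axis=1, keepdims=True)


def liv_mte(y, x, p, grid, h_robinson=0.05, h_local=0.3):
    """Partially linear (Robinson-type) fit followed by a local quadratic fit.

    Returns estimates of beta0, (beta1 - beta0), and dK/dp on `grid`.
    """
    # Step 1: residualize Y and the regressors [X, P*X] on P, then run OLS.
    regressors = np.column_stack([x, x * p[:, None]])
    res_r = regressors - kernel_smooth(p, regressors, h_robinson)
    res_y = y - kernel_smooth(p, y[:, None], h_robinson)[:, 0]
    coef, *_ = np.linalg.lstsq(res_r, res_y, rcond=None)
    k = x.shape[1]
    beta0, beta_diff = coef[:k], coef[k:]

    # Step 2: local quadratic regression of the residualized outcome on P.
    y_tilde = y - x @ beta0 - p * (x @ beta_diff)
    k_prime = np.empty(len(grid))
    for j, p0 in enumerate(grid):
        d = p - p0
        sw = np.sqrt(np.exp(-0.5 * (d / h_local) ** 2))   # sqrt of kernel weights
        design = np.column_stack([np.ones_like(d), d, d ** 2])
        b, *_ = np.linalg.lstsq(design * sw[:, None], y_tilde * sw, rcond=None)
        k_prime[j] = b[1]                                  # slope = dK/dp at p0
    return beta0, beta_diff, k_prime
```

Given these outputs, the ex post marginal benefit at a point x and grid value p is `x @ beta_diff` plus the corresponding element of `k_prime`, evaluated only where (x, p) lies in the estimated support.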

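For the ex ante marginal benefit of treatment, we account for the agents’ imperfect foresight about the future realization of components of X. As agents form valid expectations,17 we calculate $(\alpha_1 - \alpha_0) = (\bar{X}_I'\bar{X}_I)^{-1}\bar{X}_I'\bar{X}(\beta_1 - \beta_0)$ and then construct the $B^I_{MTE}(x_I, u_S)$ as follows:

$$B^I_{MTE}(x_I, p) = x_I(\alpha_1 - \alpha_0) + \frac{\partial K(p)}{\partial p} \quad \forall\, (x_I, p) \in \hat{S}_{X_I,P}.$$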
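The projection that delivers (α1 − α0) is a least-squares regression of the ex post gain index X(β1 − β0) on XI. A minimal sketch, with a hypothetical function name and toy matrices that are not taken from the data:

```python
import numpy as np


def project_coefficients(X, X_I, beta_diff):
    """Least-squares solution of X_I a = X (beta1 - beta0).

    Equivalent to (X_I' X_I)^{-1} X_I' X (beta1 - beta0) when X_I has
    full column rank.
    """
    target = X @ beta_diff                       # ex post gain index
    alpha_diff, *_ = np.linalg.lstsq(X_I, target, rcond=None)
    return alpha_diff


# Toy example: the last column of X is not observed by the agent but is
# exactly predictable from X_I (it equals twice the second column of X_I).
X_I = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
X = np.column_stack([X_I, 2.0 * X_I[:, 1]])
beta_diff = np.array([0.1, 0.2, 0.3])
alpha_diff = project_coefficients(X, X_I, beta_diff)
```

In the toy example the coefficient on the second column of XI absorbs the effect of the unobserved component, which mirrors the remark that XI both affects returns directly and helps to predict the elements of X it does not contain.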

We can identify the CIMTE(z,uS) using the equality of the marginal cost and benefit parameter at the margin of indifference:

$$C^I_{MTE}(z, p) = B^I_{MTE}(x_I, p) \quad \forall\, (z, p) \in \hat{S}_{Z,P}. \tag{6.3}$$

This step directly mirrors Equation (4.2) from our nonparametric identification analysis. We obtain an estimate for the marginal cost of treatment using only information on the marginal benefits. We do not exploit any additional distributional assumptions such as joint normality of the unobservables.
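The logic of recovering the marginal cost from the marginal benefit at the margin of indifference can be illustrated in the normal model with scalar xI and z. All parameter values below are hypothetical. They are chosen so that the covariance of UC with Ṽ minus the covariance of (U1 − U0) with Ṽ equals the variance of Ṽ, as the model implies. The normal model serves only to supply a known cost curve for comparison; the recovery step itself uses the benefit curve and the propensity score alone.

```python
from statistics import NormalDist

# Hypothetical parameter values (not estimates from the paper).
A_DIFF = 0.10        # alpha1 - alpha0
GAMMA = 0.05         # cost coefficient on z
COV_GAIN_V = -0.04   # covariance of (U1 - U0) and V~
COV_COST_V = 0.02    # covariance of UC and V~
VAR_V = 0.06         # variance of V~; equals COV_COST_V - COV_GAIN_V
V_DIST = NormalDist(0.0, VAR_V ** 0.5)


def propensity(x_i, z):
    """P(x_I, z) = F_V~(x_I (alpha1 - alpha0) - z gamma)."""
    return V_DIST.cdf(x_i * A_DIFF - z * GAMMA)


def benefit(x_i, p):
    """Ex ante marginal benefit in the normal model."""
    return x_i * A_DIFF + COV_GAIN_V / VAR_V * V_DIST.inv_cdf(p)


def true_cost(z, p):
    """Marginal cost in the normal model, used here only as a check."""
    return z * GAMMA + COV_COST_V / VAR_V * V_DIST.inv_cdf(p)


def cost_from_benefit(x_i, z):
    """Recover the marginal cost at p = P(x_I, z) from the benefit alone."""
    p = propensity(x_i, z)
    return p, benefit(x_i, p)


# Varying x_I with z held fixed traces out the cost curve for that z.
cost_curve = [cost_from_benefit(x_i, 1.0) for x_i in (0.0, 1.0, 2.0, 3.0)]
```

Each pair in `cost_curve` gives a value of p and the marginal cost at that p for the fixed z, obtained without using the cost equation.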

We finally determine the SIMTE(xI,z,uS) by taking the difference between benefits and costs:

$$S^I_{MTE}(x_I, z, p) = B^I_{MTE}(x_I, p) - C^I_{MTE}(z, p) \quad \forall\, (x_I, z, p) \in \hat{S}_{X_I,Z,P}. \tag{6.4}$$

Figure 4 presents our semiparametric results for the ex ante benefit, cost and surplus parameters as well as the ex post benefit. We calculate the marginal effects at the mean values in the sample (x̄, z̄) and at two additional points of evaluation (xA, zA) and (xB, zB). We plot them as a function of uS within the relevant conditional support and compute the 90% confidence bands using the bootstrap.18
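Confidence bands of this kind can be sketched with a generic resampling routine. The text states only that the bootstrap is used, with P re-estimated in every replication. The percentile construction below is an assumption, and `statistic` stands in for the full estimation pipeline, including the first-step estimation of P.

```python
import random


def bootstrap_band(data, statistic, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for statistic(data).

    `statistic` is called on each resampled data set, so any first-step
    estimate it contains is recomputed in every replication.
    """
    rng = random.Random(seed)
    n = len(data)
    draws = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = draws[int(tail * n_boot)]
    hi = draws[int((1.0 - tail) * n_boot) - 1]
    return lo, hi


# Toy usage: a 90% band for a sample mean.
toy_data = [float(i) for i in range(1, 11)]
band = bootstrap_band(toy_data, lambda d: sum(d) / len(d), n_boot=500)
```

For a marginal effect curve, the same routine would be applied pointwise at each value of uS in the relevant conditional support.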

Figure 4.

Marginal Effects of Treatment, Semiparametric

Our estimates show that individuals with a high unobserved desire for treatment (low uS) have high benefits as well as high costs from participation. When moving up the margins of uS the benefits fall more quickly than the costs as the surplus decreases. The $B^I_{MTE}(x_I, u_S)$ ranges from +37% within the support of xA to as low as −12% within the support of xB. The $C^I_{MTE}(z, u_S)$ varies between +32% and −6% overall, but within each margin of support the variation is limited to about 4% in absolute value. We can calculate the $S^I_{MTE}(x_I, z, u_S)$, which ranges from +5% to −5%, as the difference between ex ante benefits and costs within the overlap of the support. Note that the estimates for the marginal benefits at xB are all negative. However, costs are as well and so the surplus is still positive at the lower end of the conditional support. After conditioning on observables, it is unobservable heterogeneity in benefits and not costs that is driving the college entry decision. However, all estimates are rather imprecise; precision is highest at the mean values in the sample.

The conditional support is limited as shown in Figure 3. The location and range of the support depends on the point of evaluation. In general, we can identify BMTE(x, uS) and $B^I_{MTE}(x_I, u_S)$ over longer stretches of uS than the $C^I_{MTE}(z, u_S)$ function. In fact, for all xI, z evaluation points considered, the set of values of uS for which we identify $C^I_{MTE}(z, u_S)$ is a subset of the values of uS for which we identify $B^I_{MTE}(x_I, u_S)$. Hence, for the xI, z evaluation points considered, we can identify $S^I_{MTE}(x_I, z, u_S)$ only over the set of uS values corresponding to the smaller set of uS values for which we identify $C^I_{MTE}(z, u_S)$. The conditional variation in P is largest at x̄I, where we can identify the longest stretch for the $B^I_{MTE}(\bar{x}_I, u_S)$ with uS ∈ (0.42, 0.61), while it is smallest for $C^I_{MTE}(z_B, u_S)$ with uS ∈ (0.81, 0.89). Note that we identify all marginal effect parameters around the margin of indifference at $S^I_{MTE}(x_I, z, u_S) = 0$.

We can also assess the magnitude of the expectation errors due to the agents’ imperfect foresight about parts of their future benefits. Given our prediction model, the ex post and ex ante benefits coincide for the average individual (x̄, z̄). However, a comparison between realized and predicted benefits reveals that at xA, ex post benefits are overestimated by about 9%, while at xB the prediction is only off by 3%.

We can compare the results for the marginal effects of treatment between the two estimation approaches at (x̄, z̄) within the conditional support. The semiparametric approach indicates larger heterogeneity in benefits and costs due to the steeper slope of the marginal effect parameters. In both cases, benefits decrease considerably when moving along the margins of Ṽ while variation in costs is limited. Thus, it is heterogeneity in benefits that drives the college attendance decision. This is in line with the results by Björklund and Moffitt (1987), who also find that heterogeneity in rewards is more important than heterogeneity in costs for the participation decision in their context of a manpower training program in Sweden.

7 Summary and Conclusion

This paper extends the modern treatment effect literature by developing a framework for identifying both the marginal benefit and marginal cost of policies. The treatment effect literature focuses only on the benefit side, and does not address the question of the subjective cost of treatment as perceived by the agents attempting to take it. We build on the pioneering parametric analysis of Björklund and Moffitt (1987) by extending the nonparametric analysis of Heckman and Vytlacil (1999, 2005, 2007) to identify subjective cost and surplus functions. We provide identification results for the case of perfect foresight (as in the previous literature) as well as cases with imperfect foresight not previously considered. An analysis of college-going finds unobserved heterogeneity in the benefits as well as costs of attending college, with agents selecting into college based on both their idiosyncratic expected benefit and perceived cost of attending college. We find more heterogeneity in expected benefits than in perceived cost. Thus, the observed variability in college attendance is mainly driven by the variability in expected benefits.

Supplementary Material

Appendix

Footnotes

*

This research was supported by NIH R01-HD32058, NSF SES-024158, NSF SES-05-51089, NICHD R37HD065072, NIH R01-HD54702, The Pritzker Children’s Initiative, the American Bar Foundation, the Human Capital and Economic Opportunity Working Group—an initiative of the Becker Friedman Institute for Research and Economics—funded by the Institute for New Economic Thinking (INET), and a European Research Council grant hosted by the University College Dublin, DEVHEALTH 269874. The website for this paper is https://heckman.uchicago.edu/generalized-roy-model. The views expressed in this paper are those of the authors and not necessarily those of the funders or commentators mentioned here. We have greatly benefited from comments received from Ismael Mourifié. We thank Luke Schmerold, Edward Sung, and Jake Torcasso for their outstanding research assistance.

1

See the discussion in Heckman and Vytlacil (2007) and Heckman (2010).

2

The same methodology applies to search theory, see Flinn and Heckman (1982).

3

Because Heckman and Honoré (1990) impose a Roy model with zero cost of treatment, they are able to identify the joint distribution of (U0, U1). In contrast, because we allow for nonzero cost of treatment (and, in particular, for unobserved costs of treatment), we are unable to identify the dependence between U0 and U1 which precludes the identification of some potentially interesting economic parameters. See Heckman (1990), Heckman and Smith (1998) and Heckman et al. (1997b) for related analysis. With additional information, the joint distribution of (U1, U0, UC) can be identified. See, e.g., Carneiro et al. (2003), Aakvik et al. (2005), and Abbring and Heckman (2007). D’Haultfoeuille and Maurel (2013) identify the cost of treatment in a related Roy model in which the cost of treatment is a deterministic function of observed covariates. Their identification strategy is fundamentally different from ours, and critically relies on the restriction that the cost of treatment is constant conditional on covariates.

4

We will refer to the cumulative distribution function of a random vector A by FA(·) and to the cumulative distribution function of a random vector A conditional on random vector B by FA|B(·). We write the cumulative distribution function of A conditional on B = b by FA|B(· | b).

5

Recall again that we are implicitly conditioning on all common elements of (X, Z), so that these need not be additively separable from the error term.

6

In this respect, our analysis is broadly analogous to the identification strategies and conditions of Vytlacil and Yildiz (2007) and Shaikh and Vytlacil (2011), who also require that there be exogenous regressors in the outcome equation that are excluded from the treatment choice equation, and who exploit variation in such regressors for identification.

7

For any random vectors A and B, we will write the support of the distribution of A as Supp(A), and the support of distribution of A conditional on B = b as Supp(A|B = b).

8

Heckman and Vytlacil (2001a) show that one can identify BATE(x) and BTT(x) under slightly weaker conditions than those required to follow this strategy of first identifying BMTE(x, u) over the appropriate support. In particular, they show that the necessary and sufficient condition for identification of BATE(x) is that {0, 1} ∈ Supp(P|X = x), and for BTT(x) that {0} ∈ Supp(P|X = x).

9

We assume that agents know all components of Z, while we allow agents to be ignorant of some components of X. This assumption simplifies our notation and conforms to our empirical analysis of Section 6. The analysis can be extended (at the cost of somewhat more cumbersome notation) to allow agents to know only a subvector of Z as well as only a subvector of X at the time of selection into treatment.

10

In other words, the information set of the agent equals σ(XI, Z, UI), the sigma-algebra generated by (XI, Z, UI).

11

See Bureau of Labor Statistics (2005) for a detailed description of the NLSY79 and Appendix C for details on the construction of the variables.

12

We trace out the remaining variation in P (XI, Z) by applying a two-dimensional kernel density estimation with a bivariate normal kernel.

13

In practice, we set ε such that at most 5% of the sample are within the support for a given pair of (Xi, Zi).

14

See the Web Appendix of Heckman et al. (2006) for a detailed description of the implementation of the LIV estimator.

15

We run kernel regressions of each of the regressors on P using a bandwidth of h = 0.05. We compute the residuals of each of these regressions and then run a linear regression of Y on these residuals.

16

We choose the bandwidth that minimizes the residual square criterion proposed in Fan and Gijbels (1996), which gives us a bandwidth of h = 0.3.

17

The economics of the model imply a restriction on the coefficients (α1 − α0) in the choice equation, which depend on the estimated values of (β1 − β0). However, we only learn about the values of (β1 − β0) using an initial estimate of P. We ensure internal consistency of our estimation routine by iterating between the estimation of the BMTE(x, uS) and P with restricted (α1 − α0) until convergence.

18

We use 2,000 bootstrap replications. In each iteration of the bootstrap we re-estimate P so all standard errors account for the fact that P itself is an estimated object.

Contributor Information

Philipp Eisenhauer, The University of Chicago.

James J. Heckman, The University of Chicago, University College Dublin, American Bar Foundation

Edward Vytlacil, New York University.

References

  1. Aakvik A, Heckman JJ, Vytlacil EJ. Treatment Effects for Discrete Outcomes When Responses to Treatment Vary Among Observationally Identical Persons: An Application to Norwegian Vocational Rehabilitation Programs. Journal of Econometrics. 2005;125(1–2):15–51.
  2. Abbring J, Heckman JJ. Econometric Evaluation of Social Programs, Part III: Distributional Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy Evaluation. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Elsevier Science; Amsterdam, Netherlands: 2007. pp. 5145–5303.
  3. Björklund A, Moffitt R. The Estimation of Wage Gains and Welfare Gains in Self-Selection Models. The Review of Economics and Statistics. 1987;69(1):42–49.
  4. Bureau of Labor Statistics. NLS Handbook 2005: The National Longitudinal Surveys. U.S. Department of Labor; Washington, DC: 2005.
  5. Carneiro P, Hansen K, Heckman JJ. Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice. International Economic Review. 2003;44(2):361–422.
  6. Carneiro P, Heckman JJ, Vytlacil EJ. Estimating Marginal Returns to Education. American Economic Review. 2011;101(6):2754–2781. doi: 10.1257/aer.101.6.2754.
  7. Chetty R. Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods. Annual Review of Economics. 2009;1:451–488.
  8. D’Haultfoeuille X, Maurel A. Inference on an Extended Roy Model, With an Application to Schooling Decisions in France. Journal of Econometrics. 2013;174(2):95–106.
  9. Fan J, Gijbels I. Local Polynomial Modelling and its Applications. Chapman and Hall; New York, NY: 1996.
  10. Flinn C, Heckman JJ. New Methods for Analyzing Structural Models of Labor Force Dynamics. Journal of Econometrics. 1982;18(1):115–168.
  11. Florens JP, Heckman JJ, Meghir C, Vytlacil EJ. Identification of Treatment Effects Using Control Functions in Models with Continuous, Endogenous Treatment and Heterogeneous Effects. Econometrica. 2008;76(5):1191–1206.
  12. Harberger AC, Jenkins GP, editors. Cost-Benefit Analysis. Edward Elgar Publishers; Northampton, MA: 2002. The International Library of Critical Writings in Economics.
  13. Heckman JJ. Shadow Prices, Market Wages, and Labor Supply. Econometrica. 1974;42(4):679–694.
  14. Heckman JJ. Varieties of Selection Bias. American Economic Review. 1990;80(2):313–318.
  15. Heckman JJ. Randomization and Social Policy Evaluation. In: Manski C, Garfinkel I, editors. Evaluating Welfare and Training Programs. Harvard University Press; Cambridge, MA: 1992. pp. 201–230.
  16. Heckman JJ. Building Bridges Between Structural and Program Evaluation Approaches to Evaluating Policies. Journal of Economic Literature. 2010;48(2):356–398. doi: 10.1257/jel.48.2.356.
  17. Heckman JJ, Honoré BE. The Empirical Content of the Roy Model. Econometrica. 1990;58(5):1121–1149.
  18. Heckman JJ, Ichimura H, Todd PE. How Details Make a Difference: Semiparametric Estimation of the Partially Linear Regression Model. Unpublished Manuscript. 1997a.
  19. Heckman JJ, Sedlacek GL. Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market. Journal of Political Economy. 1985;93(6):1077–1125.
  20. Heckman JJ, Smith J. Evaluating the Welfare State. In: Strom S, editor. Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium. Cambridge University Press; New York, NY: 1998. pp. 241–318.
  21. Heckman JJ, Smith J, Clements N. Making the Most out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts. The Review of Economic Studies. 1997b;64(4):487–535.
  22. Heckman JJ, Smith JA. Assessing the Case for Social Experiments. Journal of Economic Perspectives. 1995;9(2):85–110.
  23. Heckman JJ, Urzua S, Vytlacil EJ. Understanding Instrumental Variables in Models with Essential Heterogeneity. The Review of Economics and Statistics. 2006;88(3):389–432.
  24. Heckman JJ, Vytlacil EJ. Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects. Proceedings of the National Academy of Sciences. 1999;96(8):4730–4734. doi: 10.1073/pnas.96.8.4730.
  25. Heckman JJ, Vytlacil EJ. Instrumental Variables, Selection Models, and Tight Bounds on the Average Treatment Effect. In: Lechner M, Pfeiffer F, editors. Econometric Evaluation of Labour Market Policies. Springer; New York, NY: 2001a. pp. 1–15.
  26. Heckman JJ, Vytlacil EJ. Local Instrumental Variables. In: Hsiao C, Morimune K, Powell JL, editors. Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya. Cambridge University Press; New York, NY: 2001b. pp. 1–46.
  27. Heckman JJ, Vytlacil EJ. Structural Equations, Treatment Effects, and Econometric Policy Evaluation. Econometrica. 2005;73(3):669–738.
  28. Heckman JJ, Vytlacil EJ. Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Economic Estimators to Evaluate Social Programs and to Forecast their Effects in New Environments. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Elsevier Science; Amsterdam, Netherlands: 2007. pp. 4875–5144.
  29. Hotelling H. The General Welfare in Relation to Problems of Taxation and of Railway and Utility Rates. Econometrica. 1938;6(3):242–269.
  30. Imbens GW, Angrist JD. Identification and Estimation of Local Average Treatment Effects. Econometrica. 1994;62(2):467–475.
  31. Marschak J. Economic Measurements for Policy and Prediction. In: Hood W, Koopmans T, editors. Studies in Econometric Method. Wiley; New York, NY: 1953. pp. 1–26.
  32. Quandt RE. The Estimation of the Parameters of a Linear Regression System Obeying two Separate Regimes. Journal of the American Statistical Association. 1958;53(284):873–880.
  33. Quandt RE. A New Approach to Estimating Switching Regressions. Journal of the American Statistical Association. 1972;67(338):306–310.
  34. Robinson PM. Root-N-Consistent Semiparametric Regression. Econometrica. 1988;56(4):931–954.
  35. Roy A. Some Thoughts on the Distribution of Earnings. Oxford Economic Papers. 1951;3(2):135–146.
  36. Shaikh AM, Vytlacil EJ. Partial Identification in Triangular Systems of Equations with Binary Dependent Variables. Econometrica. 2011;79(3):949–955.
  37. Tinbergen J. Economic Policy: Principles and Design. North Holland Publishing Company; Amsterdam: 1956.
  38. Vytlacil EJ. Independence, Monotonicity, and Latent Index Models: An Equivalence Result. Econometrica. 2002;70(1):331–341.
  39. Vytlacil EJ, Yildiz N. Dummy Endogenous Variables in Weakly Separable Models. Econometrica. 2007;75(3):757–779.

Supplementary Materials

Appendix
