Heckman and Vytlacil (1) synthesize and extend a recent body of research in economics and statistics on the identification and estimation of treatment effects when the subjects have heterogeneous responses. This research has demonstrated the importance of distinguishing between several different types of treatment effects and the need to establish firmly the relationship between the different types. Heckman and Vytlacil (H&V) distinguish between four different types: average treatment effects (ATE), effects of the treatment on the treated (TT), local average treatment effects (LATE), and local instrumental variable (LIV) treatment effects. Each is conceptually different, and each has a different set of conditions for identification. The research discussed by H&V represents a genuine advance and clarification of concepts in models of treatment effects.
The framework set up by H&V is heavily influenced by economic and econometric terminology but has major elements that are drawn from statistics as well. The reader may be led to suppose that the concepts and methods are applicable only to observational data and not to randomized clinical trials (RCTs), but this is not the case. All parts of their analysis are equally applicable to both. The application to RCTs is most easily seen by drawing an analogy with intent-to-treat models, as discussed by Angrist et al. (2). Although, in an RCT, the randomization creates a treatment-assignment dummy variable that economists call an “instrument,” similar variables often exist in observational data—natural experiments, quasi-experiments, or, more generally, what economists merely call exogenous identifying variables—and these can satisfy the same conditions as the treatment assignment variable in an RCT, and, hence, the same methods apply.
The major restriction in the models of H&V is the index function restriction, which is the assumption that the propensity to “participate” or “take up” the treatment, when it is offered, can be described by a single function with a single unobservable. H&V note correctly in their conclusions that most of their results can be obtained, in modified form, without an index function restriction, but their analysis in the main makes that assumption. While imposing restrictions, the index function model has great expository and intuitive value in the analysis of treatment effects models.
In this commentary, I demonstrate that, with a few additional restrictions, most of the major points discussed by H&V can be given a simple graphical interpretation that has the same virtue of expository usefulness. Denote by αi the treatment effect for subject i, often called a “random coefficient” in econometric models but equivalent to the heterogeneous treatment response in treatment effect models in statistics. Fig. 1 shows a hypothetical density of αi in the population, a density assumed to have a mean of ᾱ. This figure appears in Björklund and Moffitt (3), where αi is given the random coefficient interpretation just referred to. Björklund and Moffitt call αi the “gain” to the treatment for individual i. Now make the additional restriction that selection takes place strictly on αi and that, for illustration, selection is positive: subjects with higher values of αi are more likely to take up the treatment if offered, i.e., more likely to participate.‡ Denote the cutoff value of αi as α*, above which subjects participate and below which subjects do not.
Fig. 1 can be used to illustrate all four estimators. The parameter ᾱ is equivalent to the ATE. The treatment effect on the treated is shown in Fig. 1 as αTT and is equal to E(αi|αi > α*), the mean gain of those who are participants. For future use, Fig. 1 also shows the treatment effect on the untreated, αTU = E(αi|αi < α*), the mean gain that those who are nonparticipants would have if they received the treatment. This quantity is an unobservable. Note that ᾱ = PαTT + (1 − P)αTU, where P = Prob(αi > α*), by construction.
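The decomposition above can be checked numerically. The following is a minimal sketch, assuming purely for illustration that the density in Fig. 1 is normal; the mean, standard deviation, and cutoff values are hypothetical and do not come from H&V or from Björklund and Moffitt. It uses the closed-form means of a truncated normal distribution.

```python
from statistics import NormalDist


def gains_summary(mean, sd, cutoff):
    """Return (P, alpha_TT, alpha_TU) when alpha_i ~ Normal(mean, sd)
    (an illustrative assumption) and subjects participate exactly
    when alpha_i > cutoff."""
    std = NormalDist()                      # standard normal
    z = (cutoff - mean) / sd                # standardized cutoff alpha*
    p = 1.0 - std.cdf(z)                    # P = Prob(alpha_i > alpha*)
    alpha_tt = mean + sd * std.pdf(z) / p            # E(alpha_i | alpha_i > alpha*)
    alpha_tu = mean - sd * std.pdf(z) / (1.0 - p)    # E(alpha_i | alpha_i < alpha*)
    return p, alpha_tt, alpha_tu


# Hypothetical values: ATE of 1.0, spread of 2.0, cutoff at 1.5.
p, alpha_tt, alpha_tu = gains_summary(mean=1.0, sd=2.0, cutoff=1.5)

# The participation-weighted average of the two conditional means
# recovers the population mean gain (the ATE).
ate = p * alpha_tt + (1.0 - p) * alpha_tu
print(f"P={p:.3f}  TT={alpha_tt:.3f}  TU={alpha_tu:.3f}  ATE={ate:.3f}")
```

Under positive selection the sketch gives αTT above ᾱ and αTU below it, whatever cutoff is chosen.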
The LATE estimator is representable in Fig. 1 by a discrete change in α* moving it to α*′, which necessarily shifts αTT as well, moving it to αTT′. The LATE estimator equals αTT − αTT′ divided by the area under the curve between α* and α*′, which equals the change in the probability of participating. The estimator termed by H&V the LIV is the limit of this change as α*→α*′. This estimator is termed the “marginal” gain by Björklund and Moffitt (3) to emphasize that it is the treatment gain of the marginal subject just on the edge of participating and not participating. A small expansion or contraction of the program will bring these subjects into the program or push them out of it and will lead to a change in the mean gain of participants accordingly. In their empirical example, Björklund and Moffitt analyzed a program that had a positive mean gain for those in the program (positive αTT) but a negative marginal gain, implying that the program was “too large”—it had been overextended to include subjects who were made worse off by it. Thus the difference between the two can be important.
In general, only αTT, and changes in that quantity, are identified. This is clear from Fig. 1. Consider a treatment group that is offered the program and that divides into participants and nonparticipants in the manner described by Fig. 1, and a comparison group that is not offered the program at all. Denote the difference between the mean outcomes of these two groups as αUNADJ. Because only the participants in the treatment group actually receive the treatment, αUNADJ = PαTT, and hence αTT = αUNADJ/P, where P is the participation rate in the treatment group. Multiple treatment groups with different participation rates can be used with this formula to calculate the LATE estimator and, in the limit, the LIV estimator. Only if P = 1 in one of the treatment groups can ᾱ be identified, as is clear from Fig. 1 (“identification at infinity”).
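As a worked illustration of the formula αTT = αUNADJ/P, the sketch below uses two hypothetical treatment groups with different participation rates. The LATE is computed here in the standard Wald form, the change in the unadjusted difference divided by the change in the participation rate; that form is an assumption taken from the wider literature rather than a formula spelled out in this commentary, and all numbers and function names are invented for illustration.

```python
def tt_from_unadjusted(alpha_unadj, p):
    """alpha_TT = alpha_UNADJ / P: rescale the offered-vs-not-offered
    difference in mean outcomes by the participation rate P."""
    if not 0.0 < p <= 1.0:
        raise ValueError("participation rate must lie in (0, 1]")
    return alpha_unadj / p


def late_between(alpha_unadj_lo, p_lo, alpha_unadj_hi, p_hi):
    """Wald-type ratio (assumed form): mean gain of the subjects who are
    drawn into the program when participation rises from p_lo to p_hi."""
    if p_hi == p_lo:
        raise ValueError("the two groups need different participation rates")
    return (alpha_unadj_hi - alpha_unadj_lo) / (p_hi - p_lo)


# Hypothetical data points, e.g. from two studies of the same program.
group_a = {"p": 0.25, "alpha_unadj": 0.75}
group_b = {"p": 0.50, "alpha_unadj": 1.00}

tt_a = tt_from_unadjusted(group_a["alpha_unadj"], group_a["p"])
tt_b = tt_from_unadjusted(group_b["alpha_unadj"], group_b["p"])
late = late_between(group_a["alpha_unadj"], group_a["p"],
                    group_b["alpha_unadj"], group_b["p"])
print(tt_a, tt_b, late)
```

With these numbers αTT falls from 3.0 to 2.0 as participation doubles, and the subjects added by the expansion have a mean gain of 1.0, below that of either group of participants, as positive selection implies.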
Fig. 2 graphically illustrates this result by showing the relationship between P and αTT. By the positive and monotonic selection assumed here, the value of αTT falls and approaches ᾱ as P→1. Treatment groups with different values of P form data points along the curve. Two data points are shown in Fig. 2. With two data points, the LATE estimator can be calculated. If the full curve were estimable, what Björklund and Moffitt term the marginal gain and what H&V term the LIV could be calculated as the slope of the curve in Fig. 2.§
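The shape of the curve in Fig. 2 can be traced out under a specific distributional assumption. The sketch below again assumes, only for illustration, that αi is normally distributed with hypothetical parameters; the function name `alpha_tt_at` is likewise invented here.

```python
from statistics import NormalDist


def alpha_tt_at(p, mean, sd):
    """Mean gain of participants when a fraction p participates and
    selection is strictly on alpha_i ~ Normal(mean, sd) (assumed)."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)            # standardized cutoff implied by p
    return mean + sd * std.pdf(z) / p   # mean of alpha_i above the cutoff


MEAN, SD = 1.0, 2.0                     # hypothetical ATE and spread
rates = (0.10, 0.25, 0.50, 0.75, 0.90, 0.99, 0.999)
curve = [(p, alpha_tt_at(p, MEAN, SD)) for p in rates]

for p, tt in curve:
    print(f"P={p:<6} alpha_TT={tt:.4f}")
```

Each pair in `curve` plays the role of one data point in Fig. 2: the values decline monotonically and close in on the assumed mean of 1.0 as P nears 1.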
Fig. 2 suggests what a research agenda on a program should aim to achieve. Estimates from multiple RCTs, observational studies, and other analyses that contain treatment groups with different values of P yield data points on the curve. Nonparametric estimation and related smoothing techniques permit the estimation of the function, at least between the maximum and minimum values of P in the different studies. Extrapolation is required beyond those points.
The bounding concepts described in H&V also can be illustrated graphically. Fig. 3 shows both αTT and αTU. The latter is an unobservable, as previously noted, and, hence, the αTU line cannot be estimated. This can be viewed as the reason that ᾱ cannot be estimated as well, for ᾱ = PαTT + (1 − P)αTU is an equation with two estimable quantities, P and αTT, but two unknowns, ᾱ and αTU. But if a (say) lower bound αMIN is established for αTU, a lower bound on ᾱ can be established as well (note that it is not the bound on the individual y values that matters but only the bound on their difference for the untreated; this is implicit in the formula for the width of the bound). Substituting αMIN for αTU in the weighted average formula for ᾱ yields the desired bound for ᾱ. It is illustrated by the dotted line in Fig. 3. Upper bounds could be similarly illustrated.
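The substitution just described amounts to a one-line calculation. The sketch below is a minimal illustration with hypothetical values for P, αTT, and αMIN; the function name is invented for this example.

```python
def ate_lower_bound(p, alpha_tt, alpha_min):
    """Lower bound on the ATE obtained by replacing the unobservable
    alpha_TU with its assumed lower bound alpha_MIN in
    ATE = P * alpha_TT + (1 - P) * alpha_TU."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("participation rate must lie in [0, 1]")
    return p * alpha_tt + (1.0 - p) * alpha_min


# Hypothetical inputs: half the group participates with a mean gain of 2.0,
# and nonparticipants are assumed to gain no less than -1.0 on average.
bound = ate_lower_bound(p=0.5, alpha_tt=2.0, alpha_min=-1.0)
print(bound)
```

When P = 1 the bound collapses to αTT itself, which matches the earlier observation that ᾱ is identified only when everyone participates.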
The results in the work by H&V provide a fruitful set of concepts and methods for applications in future research in the social sciences. The different concepts have seen relatively little use to date in applications but should see more in the future.
Footnotes
The companion to this commentary begins on page 4730 in issue 8 of volume 96.
‡ This selection can arise either from voluntary actions on the part of the subjects or from decisions by the program operators to admit and deny applicants on the basis of αi. This model is neutral on this issue.
§ In a personal communication from H&V, the authors show that the ATE and LATE can be graphically obtained from the LIV by integration.
References
- 1. Heckman J J, Vytlacil E J. Proc Natl Acad Sci USA. 1999;96:4730–4734. doi: 10.1073/pnas.96.8.4730.
- 2. Angrist J, Imbens G, Rubin D. J Am Stat Assoc. 1996;91:444–472.
- 3. Björklund A, Moffitt R. Rev Econ Stat. 1987;69:42–49.