Author manuscript; available in PMC: 2019 Jan 1.
Published in final edited form as: Econometrica. 2018 Jan;86(1):35. doi: 10.3982/ECTA13777

Unordered Monotonicity

James J Heckman 1, Rodrigo Pinto 2
PMCID: PMC5822751  NIHMSID: NIHMS938978  PMID: 29479110

Abstract

This paper defines and analyzes a new monotonicity condition for the identification of counterfactuals and treatment effects in unordered discrete choice models with multiple treatments, heterogeneous agents and discrete-valued instruments. Unordered monotonicity implies and is implied by additive separability of choice of treatment equations in terms of observed and unobserved variables. These results follow from properties of binary matrices developed in this paper. We investigate conditions under which unordered monotonicity arises as a consequence of choice behavior. We characterize IV estimators of counterfactuals as solutions to discrete mixture problems.

Keywords: Instrumental Variables, Monotonicity, Revealed Preference, Generalized Roy Model, Binary Matrices, Discrete Choice, Selection Bias, Identification, Discrete Mixtures

JEL codes: I21, C93, J15, V16

1 Introduction

The evaluation of economic policies is a central goal of econometrics.1 Economists have long used instrumental variables (IV) to identify policy-relevant parameters.2 Early econometricians used IV to identify parameters in systems of linear simultaneous equations. In that framework, economists can safely be agnostic about the models generating choices in estimating a variety of interesting policy counterfactuals if their instruments satisfy rank and exogeneity conditions.

This agnostic stance is not justified in models with heterogeneous responses in which decisions to take treatment are based on unobserved3 components of those responses. Without additional assumptions, instrumental variables do not identify interpretable causal parameters. Choice mechanisms play a fundamental role in interpreting what instruments identify.

For binary and ordered versions of IV models, Imbens and Angrist (1994) show that monotonicity facilitates identification of certain instrument-defined causal parameters. This condition requires that responses to changes in instruments move all people toward or against the same choices.4 It is a condition about the uniformity of the direction of responses across persons in response to changes in instruments.5 In binary and ordered choice models, monotonicity coupled with standard IV assumptions permits economists to identify certain causal effects on outcomes of changes in the choices induced by variation in the instruments, with different instruments generally identifying different parameters.6

For a nonparametric binary choice Generalized Roy model, Vytlacil (2002) shows that monotonicity is equivalent to assuming that treatment choice equations can be characterized by an additively-separable latent-variable threshold-crossing model. Separability is defined in terms of observed and unobserved (by the economist) variables. Vytlacil (2006) extends his analysis to the case of ordered multiple choice models where the order is placed on the possible outcome variables (e.g., years of schooling).

This paper contributes to the literature by extending the analysis of instrumental variables to a general model of unordered choices. We develop a natural generalization of monotonicity—unordered monotonicity—that applies to models with multiple choices without a natural order among the choice values. For example, the choice of a pet among the set { cat, dog, bird } is only ordered by the preferences of agents across choices and not necessarily by the characteristics of the outcomes of choices. Unordered monotonicity preserves the intuitive notion of weak uniformity of responses to changes in instruments across persons without assuming any cardinalization on choice outcomes. We demonstrate how unordered monotonicity arises in choice models and examine which counterfactuals and causal parameters and weights are identified by different configurations of instruments. Like its counterpart in ordered choice models, unordered monotonicity identifies a mixture of LATEs with identifiable weights. We cannot identify the causal effect subcomponent of the mixture of LATEs, but we can identify certain counterfactuals.

Identification of causal effects in unordered choice models is studied by Heckman et al. (2006b), Heckman and Vytlacil (2007b) and Heckman et al. (2008) who identify a variety of economically interpretable treatment effects. They assume that the equations generating treatment choices are governed by additively separable threshold-crossing models. Their identification strategy relies critically on instruments that assume values on a continuum. They also invoke “identification at infinity,” as does a large literature in structural economics.7 In this paper, we show that these assumptions can be relaxed and identification of causal effects can still be secured. We rely only on discrete-valued instruments—the case commonly encountered in empirical work.8

This paper introduces economists to the identifying and interpretive power of binary matrices. We state the necessary and sufficient conditions for identification of counterfactuals in terms of conditions on binary matrices. We establish an equivalence result that connects unordered monotonicity and separability of choice equations. Separability is not imposed on the underlying choice equations.

However, unordered monotonicity implies and is implied by representations of choice equations that are additively separable in observed and unobserved variables. We show that this equivalence result stems from the properties of binary matrices that characterize choice sets. We determine the counterfactual outcomes that are identified under unordered monotonicity and present equations that facilitate estimation of the identified parameters.9

This paper proceeds in the following way. Section 2 defines a general model of multiple choices and categorical instrumental variables. Section 3 presents a general framework for studying identification of counterfactuals and causal parameters in the general model. Our framework is based on partitioning the population into strata corresponding to counterfactual treatment choices. Section 4 presents a new characterization of the IV identification problem using a finite mixture model with restrictions on admissible vectors of counterfactual choices. We state necessary and sufficient conditions for identifying causal parameters. We illustrate these conditions for a binary choice (LATE) model. We show the simplicity and power of our analytical framework by deriving Vytlacil’s equivalence result (2002) in a transparent way. Section 5 defines unordered monotonicity and illustrates how it arises in choice-theoretic models. Section 6 presents equivalence theorems that relate the properties of unordered monotonicity and the separability of choice equations. We interpret these equivalence results in light of economic theory. Section 7 applies this analysis to identify causal parameters. We establish the role of choice theory in securing identifiability. Section 8 concludes.

2 A Choice-Theoretic Model of Instrumental Variables

Our model consists of five (possibly vector-valued) random variables defined on probability space (Ω, ℱ, P), two policy-invariant (vector) equations that determine causal relationships among the variables, and an independence condition:10

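$$\text{Choice Equations: } T = f_T(Z, V) \tag{1}$$
$$\text{Outcome Equations: } Y = f_Y(T, V, \varepsilon_Y) \tag{2}$$
$$\text{Independence Condition: } V,\ Z,\ \varepsilon_Y \text{ are mutually independent,} \tag{3}$$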

Variables (Z, T, Y, εY, V) have the following properties. P1: Instrument Z is a categorical random variable with support supp(Z) = {z1, …, zNZ};11 P2: Treatment (or Choice) indicator T is a discrete-valued random variable with support supp(T) = {t1, …, tNT}; P3: Y is an observed random variable denoting outcomes arising from treatment; P4: εY is an unobserved error term;12 P5: V is a confounder, an unobserved random vector (possibly infinite dimensional) affecting both choices and outcomes. We assume that the expectation of each component of Y exists. We also assume that the distribution of T varies conditional on each value of Z, that is, P(T = t|Z = z) > 0 for all t ∈ supp(T) and z ∈ supp(Z). Vector (Zω; Tω; Yω; Vω) denotes the realization of these variables for an element ω ∈ Ω. To simplify notation, background variables unaffected by treatment are kept implicit. Our analysis is conditional on such variables.

Counterfactual outcome Y(t) is defined by fixing the argument T of the outcome Equation (2) to t ∈ supp(T), that is, Y(t) = fY(t, V, εY). The observed outcome Y (Equation (2)) is the output of a Quandt (1972) switching regression model:

$$Y = \sum_{t \in \operatorname{supp}(T)} Y(t) \cdot \mathbf{1}[T = t] \equiv Y(T), \tag{4}$$

where 1[α] is an indicator function that takes value 1 if α is true and 0 otherwise. Counterfactual choice T(z) = fT(z, V) is defined by fixing the argument Z of the choice equation (1) to z ∈ supp(Z).13 Observed choice is given by

$$T = \sum_{z \in \operatorname{supp}(Z)} T(z) \cdot \mathbf{1}[Z = z] \equiv T(Z). \tag{5}$$

Remark 2.1

The binary Generalized Roy Model (Heckman and Vytlacil, 2007a) is a special case of this model in which V is a scalar random variable V, the choice is binary T ∈ {0, 1}, and the choice equation is defined by an indicator function that is separable in Z and V, namely T = fT(Z, V) = 1[τ(Z) ≥ V]. In this paper, we analyze multiple choices and impose no restriction on the functional forms of the choice equations (1) or outcome equations (2). Instead, we make restrictions on counterfactual choices and examine how those restrictions affect the characterization of choice equations.

Independence condition (3) generates the following properties:

$$\text{Exclusion Restriction: } (V, Y(t)) \perp\!\!\!\perp Z \tag{6}$$
$$\text{Conditional Independence (Matching) Property: } Y(t) \perp\!\!\!\perp T \mid V. \tag{7}$$

Equation (6) states that instrument Z is independent of counterfactual outcome Y (t) and the confounding variable V that generates selection bias. It implies that instrument Z affects Y only through its effect on T. Equation (7) states that Y (t) is independent of treatment choice T after conditioning on V. Counterfactual outcomes can be evaluated by conditioning on V :

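$$E(Y(t) \mid V) = E(Y(t) \mid V, T = t) = E\Big(\Big(\sum_{t' \in \operatorname{supp}(T)} Y(t') \cdot \mathbf{1}[T = t']\Big) \,\Big|\, V, T = t\Big) = E(Y \mid V, T = t). \tag{8}$$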

Any solution to the problem of selection bias requires that the analyst control for, or balance, unobserved V across treatment and control states.14

We control for V by partitioning the sample space Ω so that the treatment indicator T is independent of counterfactual outcomes within each partition set. Consider a partition of Ω: $\Omega = \bigcup_{n=1}^{N} \Omega_n$; $\Omega_n \cap \Omega_{n'} = \emptyset$ for n, n′ ∈ {1, …, N}, n ≠ n′, with an associated indicator Hω that takes the value n ∈ {1, …, N} if ω ∈ Ωn, i.e., $H_\omega = \sum_{n=1}^{N} n \cdot \mathbf{1}[\omega \in \Omega_n]$. If the following relationship holds within each partition,

$$Y(t) \perp\!\!\!\perp T \mid (H = n);\quad n \in \{1, \dots, N\}, \tag{9}$$

T is effectively randomly assigned conditional on H = n. If such partitions were known, one could apply the logic underlying Equation (8) to evaluate counterfactual outcome E(Y(t)|H = n) using E(Y|T = t, H = n). If T takes the value t with strictly positive probability in all partition sets, i.e., Pr(T = t|H = n) > 0; n ∈ {1, …, N}, E(Y(t)) can be constructed from $E(Y(t)) = \sum_{n=1}^{N} E(Y \mid T = t, H = n)\, P(H = n)$. Our identification strategy uses instrumental variable Z to generate partitions $\{\Omega_n\}_{n=1}^{N}$ that satisfy Equation (9). To do so we use response vectors which we define next.
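To make the stratified formula concrete, the following sketch contrasts the stratum-weighted mean with the naive conditional mean E(Y|T = t). It assumes a made-up population with two partition sets; the names `p_h`, `p_t_given_h`, and `mean_y_given_t_h` and all numbers are illustrative assumptions, not quantities from this paper.

```python
# Hypothetical two-stratum population; all numbers are illustrative assumptions.
p_h = {1: 0.5, 2: 0.5}                 # P(H = n)
p_t_given_h = {1: 0.8, 2: 0.2}         # P(T = t | H = n), strictly positive
mean_y_given_t_h = {1: 10.0, 2: 20.0}  # E(Y | T = t, H = n) = E(Y(t) | H = n)


def stratified_mean(p_h, mean_y_given_t_h):
    """E(Y(t)) = sum_n E(Y | T = t, H = n) * P(H = n)."""
    return sum(mean_y_given_t_h[n] * p_h[n] for n in p_h)


def naive_mean(p_h, p_t_given_h, mean_y_given_t_h):
    """E(Y | T = t): weights strata by P(H = n | T = t) instead of P(H = n)."""
    p_t = sum(p_t_given_h[n] * p_h[n] for n in p_h)
    weighted = sum(mean_y_given_t_h[n] * p_t_given_h[n] * p_h[n] for n in p_h)
    return weighted / p_t


counterfactual = stratified_mean(p_h, mean_y_given_t_h)
naive = naive_mean(p_h, p_t_given_h, mean_y_given_t_h)
```

In this constructed example the two quantities differ because the stratum with the lower outcome is over-represented among agents choosing t, which is the selection problem that conditioning on H removes.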

3 Response Vectors and Identifying or Bounding of Mean Counterfactuals and Weights on Counterfactuals

Response Vector S is defined as an NZ-dimensional random vector of counterfactual treatment choices T for Z fixed at each value of its support:

$$S = [T(z_1), \dots, T(z_{N_Z})]' = [f_T(z_1, V), \dots, f_T(z_{N_Z}, V)]' \equiv f_S(V), \tag{10}$$

where T(z) denotes a counterfactual treatment choice when instrumental variable Z is fixed at z ∈ supp(Z). Let supp(S) = {s1, · · · , sNS} denote the finite support of S. The NZ-dimensional vectors s ∈ supp(S) are termed response-types or strata.15 S plays a fundamental role in our analysis. T is related to S in the following way:

$$T = \big[\mathbf{1}[Z = z_1], \dots, \mathbf{1}[Z = z_{N_Z}]\big] \cdot S \equiv g_T(S, Z).^{16} \tag{11}$$

Equation (10) uses the fact that after fixing Z = z, S is a function only of unobserved V. Conditioning on S effectively conditions on the regions of V that map into S by Equation (10).17 It is a coarse way of conditioning on V.
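The mapping from V to S can be sketched in a few lines. The example below assumes, purely for illustration, a binary threshold-crossing choice rule with hypothetical thresholds `TAU`; the paper itself imposes no functional form on fT. The function names mirror the notation of Equations (1), (10), and (11).

```python
# Assumption for illustration only: a binary threshold-crossing choice rule.
TAU = {"z0": 0.3, "z1": 0.7}   # hypothetical thresholds tau(z)
Z_SUPPORT = ["z0", "z1"]


def f_T(z, v):
    """Choice equation (1): returns 1 (treated) if tau(z) >= v, else 0."""
    return 1 if TAU[z] >= v else 0


def f_S(v):
    """Response vector (10): counterfactual choices at every instrument value."""
    return tuple(f_T(z, v) for z in Z_SUPPORT)


def g_T(s, z):
    """Equation (11): observed choice picks the entry of S matching Z = z."""
    return sum(s_i * (1 if z_i == z else 0) for s_i, z_i in zip(s, Z_SUPPORT))


# Many values of V collapse onto a few response-types (strata).
strata = sorted({f_S(v / 100) for v in range(101)})
```

Under this assumed rule a grid of 101 values of V maps into only three response-types, which illustrates why conditioning on S is a coarse way of conditioning on V.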

3.1 Properties of Response Vectors

Lemma L-1 establishes four useful properties of response vectors analogous to properties shared with V.

Lemma L-1

The following relationships for S hold for IV model (1)–(3):

(i) Y(t) ⫫ T|S, (ii) S ⫫ Z, (iii) Y ⫫ T|(S, Z), (iv) Y ⫫ Z|(S, T).

Proof

See Web Appendix A.1.

Relationship (i) states that counterfactual outcomes Y (t) for all t ∈ supp(T) are independent of treatment choices conditional on S. Thus S shares the same conditional independence (matching) properties as V in (7). Relationship (ii) states that the potential treatment choices in S are independent of the instrumental variables. Relationship (iii) states that outcomes are independent of treatment choices conditional on S and Z. Indeed, from (11), T is deterministic conditional on S and Z. Relationship (iv) is closely related to (iii). It states that outcome Y is independent of instrumental variable Z when conditioned on S and T.

Remark 3.1

Response vector S generates a partition of the sample space Ω that has independence property (9). Function fS : supp(V) → supp(S) in (10) is constructed using function fT defined by (1). Thus, for each ω ∈ Ω, there is a single value v ∈ supp(V) such that Vω = v and a single value s ∈ supp(S) such that fS(v) = s. We define a partition of the sample space Ω by:

$$\Omega_n = \{\omega \in \Omega;\ f_S(V_\omega) = s_n\} \quad \text{for each } s_n \in \operatorname{supp}(S). \tag{12}$$

In partition (12), Sω = sn and ω ∈ Ωn are equivalent. This partition satisfies (9) because Y (t) ⫫ T|(ω ∈ Ωn) holds due to item (i) of Lemma L-1. Hence treatment choice can be interpreted as being randomly assigned conditional on S. Indeed, conditional on S, treatment T only depends on Z which is statistically independent of V.

Response vector S is a balancing score for V.18 It exploits the properties of instruments Z to generate a coarse partition of unobserved variable V while maintaining the independence properties arising from conditioning on V. The matching condition Y (t) ⫫ T|S is analogous to Y (t) ⫫ T|V in (7). If S (or V ) were known, counterfactual outcomes (conditional on S (or V )) can be identified by conditioning on S or V.19 Thus, S plays the role of a control function (Heckman and Robb, 1985). From Equation (8), Y (t) ⫫ T|S implies that E(Y (t)|S = s) = E(Y |T = t,S = s). If P(T = t|S = s) > 0 for all s ∈ supp(S), counterfactual mean outcomes can be expressed as:

$$E(Y(t)) = \sum_{s \in \operatorname{supp}(S)} E(Y(t) \mid S = s)\, P(S = s) = \sum_{s \in \operatorname{supp}(S)} E(Y \mid T = t, S = s)\, P(S = s). \tag{13}$$

S acts as a coarse surrogate for V and identifies treatment effects within strata by balancing unobservables V across treatment states.

3.2 The Strata Identification Problem

The problem of identifying counterfactual mean outcomes defined for each stratum consists of identifying unobserved E(Y (t)|S = s) and P(S = s) for s ∈ supp(S) and t ∈ supp(T), from observed E(Y |T = t, Z = z) and P(T = t|Z = z) for z ∈ supp(Z) and t ∈ supp(T). Theorem T-1 uses the relationships of Lemma L-1 to express unobserved objects in terms of observed ones.

Theorem T-1

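The following equality holds for the IV model (1)–(3):

$$E(\kappa(Y) \cdot \mathbf{1}[T = t] \mid Z) = \sum_{s \in \operatorname{supp}(S)} \mathbf{1}[T = t \mid S = s, Z]\; E(\kappa(Y(t)) \mid S = s)\; P(S = s), \tag{14}$$

where κ : supp(Y) → ℝ is an arbitrary known function.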

Proof

See Web Appendix A.2.

Setting κ(Y ) to 1 generates the propensity score equality:20

$$P(T = t \mid Z = z) = \sum_{s \in \operatorname{supp}(S)} \mathbf{1}[T = t \mid S = s, Z = z]\; P(S = s). \tag{15}$$

Replacing κ(Y) by any variable X such that X ⫫ T|S, we obtain:21

$$E(X \mid T = t, Z)\, P(T = t \mid Z) = \sum_{s \in \operatorname{supp}(S)} \mathbf{1}[T = t \mid S = s, Z]\; E(X \mid S = s)\; P(S = s). \tag{16}$$

Remark 3.2

Equation (14) characterizes the problem of identifying counterfactual outcomes within strata. There are NZ observed objects on the left-hand side for each t ∈ supp(T) totalling NZ · NT. Without further restrictions, the total number of latent response-types on the right-hand side is $N_T^{N_Z}$, i.e., the number of strata. Thus, the number of observed quantities (NT · NZ) grows linearly in NZ while the number of possible response-types ($N_T^{N_Z}$) grows geometrically in NZ.22 Identification requires that constraints be placed on the number of admissible strata (S). Choice theory can produce such restrictions, as can other assumptions, such as those about functional forms.
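The counting argument in this remark is easy to tabulate. The short sketch below uses a hypothetical helper, `count_objects`, to compare the number of observed objects with the number of unrestricted response-types for three treatments and an increasing number of instrument values.

```python
def count_objects(n_t, n_z):
    """Return (number of observed objects, number of unrestricted strata)."""
    observed = n_t * n_z   # one observed object per (t, z) pair
    strata = n_t ** n_z    # every map from supp(Z) into supp(T)
    return observed, strata


# Three treatments, instrument support of size 2 through 5.
table = {n_z: count_objects(3, n_z) for n_z in range(2, 6)}
```

With three treatments the unrestricted strata already outnumber the observed objects when the instrument takes two values, and the gap widens quickly as NZ grows.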

Indicator 1[T = t|S = s, Z = z] in Equation (14) is deterministic because T is deterministic given Z and S in Equation (11). Our identification strategy develops economically interpretable restrictions on these indicators that govern the choice of treatment as Z varies. Such restrictions reduce the number of admissible response-types and characterize the indicators 1[T = t|S = s, Z = z], facilitating identification of causal parameters.

We note, for later use, that the probability of treatment choice conditional on response-types is

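$$P(T = t \mid S = s) = \sum_{z \in \operatorname{supp}(Z)} \mathbf{1}[T = t \mid S = s, Z = z]\; P(Z = z \mid S = s) = \sum_{z \in \operatorname{supp}(Z)} \mathbf{1}[T = t \mid S = s, Z = z]\; P(Z = z), \tag{17}$$

where the last equality is a consequence of S ⫫ Z (item (ii) of Lemma L-1).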

Note that Equation (14) is a discrete mixture latent class model, a feature we exploit below.23 Our paper differs from previous work on nonparametric instrumental variables. Instead of forming the usual nonparametric IV moment equations (see, e.g., Carrasco et al., 2007), we use instruments to construct strata that generate the kernels of finite mixture equations and choice theory to place restrictions on the kernels. We then use finite mixture methods to examine the identification of the individual causal parameters on the right-hand side of Equations (14)–(16).

4 Identifying Response Probabilities and Counterfactual Outcomes

We now present general conditions for identifying response probabilities, counterfactual outcomes, and pre-program variables conditioned on strata. To do so, it is useful to express Equations (14)–(15) as a system of linear equations. Define PZ(t) = [P(T = t|Z = z1), …, P(T = t|Z = zNZ)]′, the vector of observed choice probabilities (“propensity scores”). Define PZ as the vector that stacks PZ(t) across t ∈ supp(T): PZ = [PZ(t1), …, PZ(tNT)]′. QZ(t) is defined in an analogous fashion for outcomes defined for different values of T (i.e., multiplied by the treatment indicators). In a similar fashion, LZ(t) is defined for a variable X such that X ⫫ T|S, Z. The left-hand sides of Equations (14) and (16) are given respectively by: QZ(t) = [E(κ(Y) · 1[T = t]|Z = z1), …, E(κ(Y) · 1[T = t]|Z = zNZ)]′, and LZ(t) = [E(X · 1[T = t]|Z = z1), …, E(X · 1[T = t]|Z = zNZ)]′, where LZ = [LZ(t1), …, LZ(tNT)]′.

Let PS be the vector of unobserved response probabilities PS = [P(S = s1), …, P(S = sNS)]′ and LS = [E(X · 1[S = s1]), …, E(X · 1[S = sNS])]′ be the unobserved vector of X-expectations times response indicators. We denote the vector of the expected outcomes multiplied by response indicators by: QS(t) = [E(κ(Y(t)) · 1[S = s1]), …, E(κ(Y(t)) · 1[S = sNS])]′.

The following notation and concepts are used throughout the rest of this paper. Define response matrix R as an array of response-types defined over supp(S), i.e., R = [s1, …, sNS]. To avoid trivial degeneracies we delete redundant rows (where different values of Z produce the same pattern for T) and redundant columns (where the same choices are made for the same value of Z). Matrix R has dimension NZ × NS. An element in the i-th row and n-th column of R is denoted by R[i, n] = (T|Z = zi, S = sn); i ∈ {1, …, NZ}, n ∈ {1, …, NS}. We use R[i, ·] to denote the i-th row of R and R[·, n] to denote the n-th column of R.

Let Bt denote a binary matrix of the same dimension as R and whose elements take value 1 if the respective element in R is equal to t and zero otherwise. Notationally, we define an element in the i-th row and n-th column of matrix Bt by Bt[i, n] = 1[T = t|Z = zi, S = sn]; i ∈ {1, …, NZ}, n ∈ {1, …, NS}. We also use the short-hand notation Bt = 1[R = t] to denote Bt. Let BT be a binary matrix of dimension (NZ · NT) × NS generated by stacking Bt as t ranges over supp(T): $B_T = [B_{t_1}', \dots, B_{t_{N_T}}']'$.

In this notation, Equations (14), (15), and (16) can be written respectively as

$$Q_Z(t) = B_t\, Q_S(t), \tag{18}$$
$$P_Z = B_T\, P_S, \tag{19}$$
$$L_Z = B_T\, L_S. \tag{20}$$

If Bt and BT were invertible, QS(t), PS, and LS would be identified. However, such inverses do not always exist. In their place, we can use generalized inverses.
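The construction of Bt and BT from a response matrix can be sketched with NumPy. The example assumes a hypothetical response matrix with three response-types and a binary instrument, codes treatments as the integers 0 and 1, and uses assumed response-type probabilities; the helper names `indicator_matrix` and `stacked_matrix` are ours.

```python
import numpy as np

# Hypothetical response matrix R: rows index instrument values, columns index
# response-types; treatments are coded as integers 0 and 1 for illustration.
R = np.array([[1, 0, 0],
              [1, 1, 0]])
TREATMENTS = [0, 1]


def indicator_matrix(R, t):
    """B_t = 1[R = t]: same shape as R, entry 1 where the type chooses t."""
    return (R == t).astype(int)


def stacked_matrix(R, treatments):
    """B_T: stack B_t vertically over t, giving an (N_Z * N_T) x N_S matrix."""
    return np.vstack([indicator_matrix(R, t) for t in treatments])


B_T = stacked_matrix(R, TREATMENTS)
P_S = np.array([0.2, 0.5, 0.3])   # assumed response-type probabilities
P_Z = B_T @ P_S                   # Equation (19): implied propensity scores
```

The identification question runs in the opposite direction: P_Z is observed, and one asks which linear combinations of P_S can be recovered from it.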

Let $B_T^+$ and $B_t^+$ be the Moore-Penrose pseudo-inverses24 of matrices BT and Bt; t ∈ supp(T) respectively. The following expressions are useful for characterizing the identification of response probabilities and counterfactual means:

$$K_T = I_{N_S} - B_T^+ B_T \quad \text{and} \quad K_t = I_{N_S} - B_t^+ B_t;\ t \in \operatorname{supp}(T), \tag{21}$$

where $I_{N_S}$ denotes an identity matrix of dimension NS. KT and Kt are orthogonal projection matrices that depend only on binary matrices BT and Bt; t ∈ supp(T).25

Applying the Moore-Penrose inverse to (18) and (19), we obtain:

$$P_S = B_T^+ P_Z + K_T \lambda \tag{22}$$
$$Q_S(t) = B_t^+ Q_Z(t) + K_t \tilde{\lambda} \tag{23}$$

where λ and λ̃ are arbitrary NS-dimensional vectors (same dimension as PS). In this notation, Theorem T-2 states general conditions for identification of response probabilities and counterfactual means:

Theorem T-2

For IV model (1)–(3), if there exists a real-valued NS-dimensional vector ξ such that ξ′KT = 0, then ξ′PS and ξ′LS are identified. In addition, if there exists a real-valued NS-dimensional vector ζ such that ζ′Kt = 0, then ζ′QS(t) is identified.

Proof

See Web Appendix A.3.

Theorem T-2 shows the identifying properties of the response matrix. For example, suppose that BT has full column-rank. Then $B_T^+ B_T = I_{N_S}$ and KT = 0. Therefore ξ′PS is identified for any real vector ξ of dimension NS. In particular, ξ′PS is identified when ξ is set to be each column vector of the identity matrix $I_{N_S}$. In that case, the n-th column of $I_{N_S}$ identifies P(S = sn) and all the response-type probabilities are identified.26

Note that full-rank for BT does not imply full-rank for each Bt; t ∈ supp(T). Therefore, the identification of the response-type probabilities does not automatically produce identification of corresponding mean counterfactual outcomes. Corollary C-1 formalizes this discussion.
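The distinction between the rank of BT and the rank of each Bt can be checked numerically. The sketch below assumes a hypothetical setting with a binary treatment, three instrument values, and four response-types arranged in a triangular pattern; it is an illustration, not one of the paper's examples.

```python
import numpy as np

# Hypothetical binary matrices: 3 instrument values (rows), 4 response-types.
B_t1 = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0],
                 [1, 1, 1, 0]])
B_t0 = 1 - B_t1                 # binary treatment: t0 chosen whenever t1 is not
B_T = np.vstack([B_t0, B_t1])   # stacked (N_Z * N_T) x N_S matrix

N_S = B_T.shape[1]
rank_T = int(np.linalg.matrix_rank(B_T))
rank_t0 = int(np.linalg.matrix_rank(B_t0))
rank_t1 = int(np.linalg.matrix_rank(B_t1))

# Full column rank of B_T identifies all response-type probabilities, while
# each B_t has only N_Z = 3 rows and so cannot have column rank N_S = 4.
probabilities_identified = rank_T == N_S
all_means_identified = rank_t0 == N_S and rank_t1 == N_S
```

In this assumed configuration every response-type probability is identified, yet neither QS(t0) nor QS(t1) is point-identified in full.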

Corollary C-1

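The following relationships hold for the IV model (1)–(3):

$$\text{Vectors } P_S \text{ and } L_S \text{ are point-identified} \iff \operatorname{rank}(B_T) = N_S. \tag{24}$$
$$\text{Vector } Q_S(t) \text{ is point-identified} \iff \operatorname{rank}(B_t) = N_S. \tag{25}$$

Also, if (25) holds, then E(κ(Y(t))) is identified by $\iota' B_t^+ Q_Z(t)$, where ι is an NS-dimensional vector of 1s.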

Proof

See Web Appendix A.5.

Versions of Corollary C-1 are found in the literature on the identifiability of finite mixtures.27 Given binary matrices BT and Bt; t ∈ {1, …, NT}, the problem of identifying PS, LS and QS(t) is equivalent to the problem of identifying finite mixtures of distributions where the BT and Bt play the roles of kernels of mixtures. Mixture components are the corresponding counterfactual outcomes conditional on the response types and mixture probabilities are the response-type probabilities.

One approach to identifiability is to simply assume that conditions (24) and (25) apply to R. A more satisfactory approach, and the one taken here and in Pinto (2016a), investigates how alternative specifications of choice relationships generate response matrices R that satisfy the identifiability requirements of Theorem T-2 and Corollary C-1.

It is important to note that we have given conditions for identifying counterfactual means within strata, E(Y(t)|S). Treatment effects are derived from, but are distinct from, these counterfactual means. Mean treatment effects are comparisons of different counterfactuals within the same set of strata: E(Y(t) − Y(t′)|S ∈ Σ) for t ≠ t′, where Σ ⊆ supp(S) is a subset of strata that might consist of a single element. In Section 7 we discuss identification of mean treatment effects, which is a more demanding problem.

4.1 Example: Binary Choice (LATE)

To familiarize the reader with our notation and concepts, and anticipate our generalization of it, consider the binary choice model implicit in the Local Average Treatment Effect – LATE. Treatment variable T takes two values: Tω = t1 if agent ω chooses to be treated and Tω = t0 if not. Instrument Z is binary valued (supp(Z) = {z0, z1}) with the property 0 < P(T = t1|Z = z0) < P(T = t1|Z = z1) < 1. A standard example is the problem of identifying the causal effect of college education on income Y. Agent ω decides between going to college (Tω = t1) or not (Tω = t0). Instrumental variable Z represents randomly assigned college scholarships. For example, Zω = z1 if a scholarship is assigned to agent ω and Zω = z0 if agent ω does not receive a scholarship.

The response vector is S = [T(z0), T(z1)]′. Without further restrictions, S can take four possible values described by the following response matrix:

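$$R = \overset{\begin{array}{cccc} s_1 & s_2 & s_3 & s_4 \end{array}}{\begin{bmatrix} t_1 & t_0 & t_1 & t_0 \\ t_1 & t_1 & t_0 & t_0 \end{bmatrix}} \begin{array}{l} \text{values for } T(z_0) \\ \text{values for } T(z_1) \end{array}. \tag{26}$$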

In the language of LATE, the response-types s1, s2, s3, s4 are always-takers, compliers, defiers, and never-takers, respectively. Bt1 is the binary matrix that has the same dimension as R, whose elements take value 1 if the corresponding element in R is t1 and value 0 if the element in R is t0. Thus, Bt1 = 1[R = t1] and Bt0 = 1[R = t0] indicate whether elements in R are equal to t1 or t0, respectively.28

The 4 × 4 binary matrix $B_T = [B_{t_0}', B_{t_1}']'$ has rank equal to 3, which is less than the number of response-types NS = 4. Therefore, by C-1, neither response-type probabilities nor the counterfactual outcomes are point identified. To identify them, it is necessary to reduce the number of response-types.
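The rank deficiency can be verified directly. The sketch below rebuilds the stacked binary matrix from response matrix (26), coding t1 as 1 and t0 as 0, and applies the condition of Theorem T-2 to two candidate vectors ξ. The helper `is_identified` and the variable names are ours.

```python
import numpy as np

# Columns: s1 (always-taker), s2 (complier), s3 (defier), s4 (never-taker).
# Treatment coded 1 for t1 and 0 for t0; rows are T(z0), T(z1).
R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0]])
B_T = np.vstack([(R == 0).astype(int), (R == 1).astype(int)])

rank_B_T = int(np.linalg.matrix_rank(B_T))
K_T = np.eye(4) - np.linalg.pinv(B_T) @ B_T   # Equation (21)


def is_identified(xi, K, tol=1e-10):
    """Theorem T-2 check: xi' P_S is identified when xi' K = 0."""
    return bool(np.allclose(xi @ K, 0.0, atol=tol))


complier_share = np.array([0.0, 1.0, 0.0, 0.0])
complier_minus_defier = np.array([0.0, 1.0, -1.0, 0.0])
```

The check fails for the complier share alone but passes for the difference P(S = s2) − P(S = s3). This agrees with direct arithmetic on (26): P(T = t1|Z = z1) − P(T = t1|Z = z0) equals that difference.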

LATE solves this non-identification problem by assuming that each agent ω can only change his decision in one direction as the instrument varies. The monotonicity condition of Imbens and Angrist (1994) is:

Assumption A-1. Monotonicity for the Binary Choice Model

The following inequalities hold for any z, z′ ∈ supp(Z):

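$$\mathbf{1}[T_\omega(z) = t_1] \ge \mathbf{1}[T_\omega(z') = t_1]\ \ \forall\, \omega \in \Omega \quad \text{or} \quad \mathbf{1}[T_\omega(z) = t_1] \le \mathbf{1}[T_\omega(z') = t_1]\ \ \forall\, \omega \in \Omega.^{29} \tag{27}$$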

In our example, condition A-1 assumes that each agent is inclined to decide towards college if a scholarship is granted, i.e., 1[Tω(z1) = t1] ≥ 1[Tω(z0) = t1] for all ω ∈ Ω. This eliminates the response-type s3 (the defiers) in matrix (26), generating the following matrices:

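$$R = \overset{\begin{array}{ccc} s_1 & s_2 & s_4 \end{array}}{\begin{bmatrix} t_1 & t_0 & t_0 \\ t_1 & t_1 & t_0 \end{bmatrix}},\quad B_{t_1} = \overset{\begin{array}{ccc} s_1 & s_2 & s_4 \end{array}}{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix}},\quad B_{t_0} = \overset{\begin{array}{ccc} s_1 & s_2 & s_4 \end{array}}{\begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}},\quad B_T = \begin{bmatrix} B_{t_0} \\ B_{t_1} \end{bmatrix}. \tag{28}$$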

Under monotonicity condition A-1 the three response-type probabilities (P(s1), P(s2), P(s4)) and the four counterfactual outcomes (E(Y(t0)|S = s2), E(Y(t0)|S = s4), E(Y(t1)|S = s1), E(Y(t1)|S = s2)) are identified. These claims can be demonstrated by applying T-2 and C-1. For instance the rank of the binary matrix BT in (28) is 3, which is also the number of response-types. Thus, by C-1, all the response probabilities PS are identified. The identification of counterfactual outcomes depends on the properties of matrices Kt0, Kt1 that are calculated using the pseudo-inverse matrices $B_{t_0}^+$, $B_{t_1}^+$ as described in (21):

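$$B_{t_0}^+ = \begin{bmatrix} 0 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \Rightarrow K_{t_0} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad B_{t_1}^+ = \begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 0 & 0 \end{bmatrix} \Rightarrow K_{t_1} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$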
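These pseudo-inverses and projection matrices can be reproduced with NumPy's `numpy.linalg.pinv`. The sketch below takes the binary matrices of the three-type example (columns s1, s2, s4; rows z0, z1) and also checks the condition ζ′Kt = 0 for the vector that selects s2. The helper name `pinv_and_projection` is ours.

```python
import numpy as np

B_t0 = np.array([[0, 1, 1],
                 [0, 0, 1]])
B_t1 = np.array([[1, 0, 0],
                 [1, 1, 0]])


def pinv_and_projection(B):
    """Return (B^+, K) with K = I - B^+ B, as in Equation (21)."""
    B_plus = np.linalg.pinv(B)
    K = np.eye(B.shape[1]) - B_plus @ B
    return B_plus, K


B_t0_plus, K_t0 = pinv_and_projection(B_t0)
B_t1_plus, K_t1 = pinv_and_projection(B_t1)

zeta = np.array([0.0, 1.0, 0.0])   # selects the stratum s2
```

Both `zeta @ K_t0` and `zeta @ K_t1` are zero vectors, so Theorem T-2 applies to the s2 component of QS(t0) and of QS(t1).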

The observed vectors of propensity scores and conditional outcome expectations are PZ(t) = [P(T = t|Z = z0), P(T = t|Z = z1)]′ and QZ(t) = [E(Y · 1[T = t]|Z = z0), E(Y · 1[T = t]|Z = z1)]′, for t ∈ {t1, t0}. The unobserved 3 × 1 vectors of response-type probabilities and counterfactual outcomes are given by

$$P_S = [P(S = s_1),\ P(S = s_2),\ P(S = s_4)]' \tag{29}$$

and

$$Q_S(t) = [E(Y(t) \mid S = s_1) P(S = s_1),\ E(Y(t) \mid S = s_2) P(S = s_2),\ E(Y(t) \mid S = s_4) P(S = s_4)]'. \tag{30}$$

Equations (29) and (30) enable us to write the counterfactual E(Y(t0)|S = s2) as $E(Y(t_0) \mid S = s_2) = \frac{\zeta' Q_S(t_0)}{\zeta' P_S}$ where ζ = [0, 1, 0]′, so that ζ′PS = P(S = s2) is the population probability of the switchers. Note that ζ′Kt0 = 0, thus, by T-2, E(Y(t0)|S = s2) is identified. From Equations (22)–(23), we have:

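$$E(Y(t_0) \mid S = s_2) = \frac{\zeta' B_{t_0}^+ Q_Z(t_0)}{\zeta' B_{t_0}^+ P_Z(t_0)} = \frac{E(Y \cdot \mathbf{1}[T = t_0] \mid Z = z_0) - E(Y \cdot \mathbf{1}[T = t_0] \mid Z = z_1)}{P(T = t_0 \mid Z = z_0) - P(T = t_0 \mid Z = z_1)}.$$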

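By a parallel argument, the counterfactual outcome $E(Y(t_1) \mid S = s_2) = \frac{\zeta' Q_S(t_1)}{\zeta' P_S}$. Since ζ′Kt1 = 0, by T-2, E(Y(t1)|S = s2) is identified from the expression

$$E(Y(t_1) \mid S = s_2) = \frac{\zeta' B_{t_1}^+ Q_Z(t_1)}{\zeta' B_{t_1}^+ P_Z(t_1)} = \frac{E(Y \cdot \mathbf{1}[T = t_1] \mid Z = z_1) - E(Y \cdot \mathbf{1}[T = t_1] \mid Z = z_0)}{P(T = t_1 \mid Z = z_1) - P(T = t_1 \mid Z = z_0)}.$$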

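LATE is the causal effect for compliers E(Y(t1) − Y(t0)|S = s2). Since P(T = t0|Z = z) = 1 − P(T = t1|Z = z), $\zeta' B_{t_1}^+ P_Z(t_1) = \zeta' B_{t_0}^+ P_Z(t_0) = P(S = s_2)$. Putting these ingredients together,

$$E(Y(t_1) - Y(t_0) \mid S = s_2) = \frac{\zeta' \big(B_{t_1}^+ Q_Z(t_1) - B_{t_0}^+ Q_Z(t_0)\big)}{\zeta' B_{t_1}^+ P_Z(t_1)} = \frac{E(Y \mid Z = z_1) - E(Y \mid Z = z_0)}{P(T = t_1 \mid Z = z_1) - P(T = t_1 \mid Z = z_0)}.$$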
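The LATE expression can be checked on a constructed population. The sketch below assumes illustrative response-type probabilities and stratum means (none taken from the paper), generates the observed vectors through the linear systems (18) and (19), and then recovers the complier effect both from the pseudo-inverse formula and from the ratio of differences in observed means.

```python
import numpy as np

B_t0 = np.array([[0, 1, 1], [0, 0, 1]])   # columns s1, s2, s4; rows z0, z1
B_t1 = np.array([[1, 0, 0], [1, 1, 0]])

# Assumed population values (illustrative only).
P_S = np.array([0.2, 0.5, 0.3])        # P(s1), P(s2), P(s4)
mean_y1 = np.array([5.0, 4.0, 0.0])    # E(Y(t1)|S); the s4 entry is never observed
mean_y0 = np.array([0.0, 1.0, 2.0])    # E(Y(t0)|S); the s1 entry is never observed

# Observed objects implied by Equations (18) and (19).
Q_Z1 = B_t1 @ (mean_y1 * P_S)
Q_Z0 = B_t0 @ (mean_y0 * P_S)
P_Z1 = B_t1 @ P_S

zeta = np.array([0.0, 1.0, 0.0])
numerator = zeta @ (np.linalg.pinv(B_t1) @ Q_Z1 - np.linalg.pinv(B_t0) @ Q_Z0)
late = numerator / (zeta @ np.linalg.pinv(B_t1) @ P_Z1)

# Ratio form: Q_Z1 + Q_Z0 stacks E(Y | Z = z0) and E(Y | Z = z1).
e_y = Q_Z1 + Q_Z0
wald = (e_y[1] - e_y[0]) / (P_Z1[1] - P_Z1[0])
```

Both computations return the assumed complier effect of 4 − 1 = 3, and changing the unobserved entries of `mean_y1` and `mean_y0` leaves the result unchanged.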

LATE is the causal effect conditioned on the values of V associated with strata s2. It does not identify the average treatment effect E(Y(t1) − Y(t0)) because we cannot identify Y(t1) for s4 (t0-always-taker) nor Y(t0) for s1 (t1-always-taker). The counterfactual outcomes for the always-takers can be expressed in terms of QS(t) and PS by:

$$E(Y(t_0) \mid S = s_4) = \frac{\zeta_0' Q_S(t_0)}{\zeta_0' P_S};\ \zeta_0 = [0, 0, 1]' \quad \text{and} \quad E(Y(t_1) \mid S = s_1) = \frac{\zeta_1' Q_S(t_1)}{\zeta_1' P_S};\ \zeta_1 = [1, 0, 0]'.$$

Since $\zeta_0' K_{t_0} = 0$ and $\zeta_1' K_{t_1} = 0$, by Theorem T-2, E(Y(t0)|S = s4) and E(Y(t1)|S = s1) are identified. In Section 7, we use the properties of the generalized inverse to extend our analysis to a general model of multiple choices and extend the notion of compliers to the general unordered choice model.

4.2 Revisiting Vytlacil’s Equivalence Theorem

A by-product of our analysis is a simple derivation of Vytlacil’s (2002) fundamental equivalence result. He shows that monotonicity condition A-1 holds if and only if the treatment choice can be expressed as a function that is separable in Z and V, i.e., there exist deterministic functions, φ: supp(V) → ℝ and τ: supp(Z) → ℝ such that:

$$\big(\mathbf{1}[T = t_1] \mid V = v, Z = z\big) = \mathbf{1}[\tau(z) \ge \varphi(v)]. \tag{31}$$

Monotonicity A-1 generates a key property of the binary matrix Bt1 = 1[R = t1]. We can always reorder its rows and columns so that Bt1 becomes a lower-triangular matrix.30 Consider the binary choice model where T takes values in {t0, t1} and Z takes values in {z1, …, zNZ} that are indexed by increasing values of the propensity score, i.e., P(T = t1|z1) ≤ ··· ≤ P(T = t1|zNZ). Arrange the columns of binary matrix Bt1 in decreasing order of the column-sums. Under Monotonicity A-1, Bt1 has dimension NZ × (NZ + 1) and is lower triangular. An explicit expression for Bt1 is given by Equation (28) for NZ = 2.31 Under triangularity, for all i ∈ {1, ···, NZ}, n ∈ {1, ···, NZ + 1},

$$B_{t_1}[i, n] = 1 \ \text{for } i \ge n \quad \text{and} \quad B_{t_1}[i, n] = 0 \ \text{for } i < n. \tag{32}$$

Propensity score equality (15) generates the following expressions:

$$P(T = t_1 \mid Z = z_i) = \sum_{n=1}^{N_S} \mathbf{1}[T = t_1 \mid Z = z_i, S = s_n] \cdot P(S = s_n) = \sum_{n=1}^{N_Z + 1} B_{t_1}[i, n] \cdot P(S = s_n) = \sum_{n=1}^{i} P(S = s_n). \tag{33}$$

The second equality uses the definition of the element in the i-th row and n-th column of Bt1, that is, Bt1[i, n] = 1[T = t1|Z = zi, S = sn], and the fact that NS = NZ + 1 due to monotonicity A-1. The third equality uses triangularity property (32). Thus the following inequalities hold:

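$$\text{Since } P(T = t_1 \mid Z = z_i) = \sum_{n'=1}^{i} P(S = s_{n'}), \text{ then } P(T = t_1 \mid Z = z_i) \ge \sum_{n'=1}^{n} P(S = s_{n'}) \ \text{for } i \ge n \tag{34}$$
$$\text{and } P(T = t_1 \mid Z = z_i) < \sum_{n'=1}^{n} P(S = s_{n'}) \ \text{for } i < n. \tag{35}$$

We can combine Equations (32) and (34)–(35) to express the elements Bt1[i, n] as:

$$B_{t_1}[i, n] = \mathbf{1}[P(T = t_1 \mid Z = z_i) \ge \phi(s_n)], \tag{36}$$
$$\text{where } \phi(s_n) = P(S \in \{s_1, \dots, s_n\}) = \sum_{n'=1}^{n} P(S = s_{n'}). \tag{37}$$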

Vytlacil’s theorem emerges since Bt1[i, n] = 1[T = t1|Z = zi, S = sn] and S is a balancing score for V, i.e., S = fS(V). Thus, for any v ∈ supp(V) there is an s ∈ supp(S) such that s = fS(v), and

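$$\mathbf{1}[T = t_1 \mid Z = z, V = v] = \mathbf{1}[T = t_1 \mid Z = z, S = f_S(v)] = \mathbf{1}\big[\underbrace{P(T = t_1 \mid Z = z)}_{\tau(z)} \ge \underbrace{\phi(f_S(v))}_{\varphi(v)}\big]. \tag{38}$$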

This expression captures the key idea that the response variable S summarizes V. Section 6 establishes separability properties for a general unordered choice model. The triangularity property generating separability carries over to that general setting.
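The triangularity argument can be illustrated numerically. The sketch below assumes NZ = 3 and strictly positive response-type probabilities chosen as exact binary fractions (an assumption made only so that floating-point comparisons are exact), builds the lower-triangular Bt1 of Equation (32), and checks that the threshold form (36) reproduces it.

```python
import numpy as np

N_Z = 3
N_S = N_Z + 1
# Lower-triangular pattern of Equation (32): entry 1 when i >= n (1-indexed).
B_t1 = np.array([[1 if i >= n else 0 for n in range(1, N_S + 1)]
                 for i in range(1, N_Z + 1)])

P_S = np.array([0.125, 0.25, 0.125, 0.5])   # assumed, strictly positive
propensity = B_t1 @ P_S                      # tau(z_i) = P(T = t1 | Z = z_i)
phi = np.cumsum(P_S)                         # phi(s_n) = P(S in {s_1, ..., s_n})

# Equation (36): B_t1[i, n] = 1[tau(z_i) >= phi(s_n)], with a small tolerance.
threshold_form = (propensity[:, None] >= phi[None, :] - 1e-12).astype(int)
```

Comparing `threshold_form` with `B_t1` entry by entry shows that the choice indicator is a threshold-crossing rule in which the instrument enters only through τ and the response-type only through ϕ.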

5 Multiple Unordered Choices

In the published literature, when LATE is extended to analyze multiple choices, T is assumed to be a scalar index defined over an ordered finite set of natural numbers {1, …, NT} where the index is monotonically increasing (or decreasing) in the indicators of t (Angrist and Imbens, 1995). Treatment effects are defined in terms of variations in this index:

Assumption A-2. Ordered Monotonicity

The following inequalities hold for any z, z′ ∈ supp(Z), and each treatment t ∈ supp(T):

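$$T_\omega(z) \ge T_\omega(z')\ \ \forall\, \omega \in \Omega \quad \text{or} \quad T_\omega(z) \le T_\omega(z')\ \ \forall\, \omega \in \Omega. \tag{39}$$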

Under standard assumptions about IV, A-2 is equivalent to the assumption that choices are generated by an ordered choice model (Vytlacil, 2004). To extend monotonicity to the unordered case, we retain the core feature of a monotonic relationship: shifts in Z move all agents toward or against making treatment choice t in supp(T). We do not require any order among the values of T, nor do we rely on a scalar representation of T. Instead, we replace comparisons of T with inequalities that compare indicator functions of the values taken by T for each pair of values z, z′ in supp(Z). If the support of T has no natural order, Assumption A-2 is meaningless.

This section extends the literature to define a concept of monotonicity for an unordered choice model. We discuss restrictions on the response matrix R that follow from this definition. We present some examples that build intuition.

5.1 Monotonicity for Unordered Models

Assumption A-3. Unordered Monotonicity

The following inequalities hold for any z, z′ ∈ supp(Z), and each treatment t ∈ supp(T):

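$$\mathbf{1}[T_\omega(z)=t]\ge\mathbf{1}[T_\omega(z')=t]\ \ \forall\,\omega\in\Omega\quad\text{or}\quad\mathbf{1}[T_\omega(z)=t]\le\mathbf{1}[T_\omega(z')=t]\ \ \forall\,\omega\in\Omega, \tag{40}$$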

where 1[Tω(z) = t] indicates whether or not agent ω chooses treatment t ∈ supp(T) when Z is set to z.

Using indicator functions, we can make pairwise comparisons for all values of Z for each choice t ∈ supp(T) without imposing an arbitrary ordering on the values of the treatment choices T or creating a scalar index of T. Condition (40) preserves the key intuitive notion of monotonicity: a shift in an instrument moves all agents uniformly toward or against each possible choice. A-3 prohibits non-uniform movements induced by the instruments; the patterns it rules out are characterized in Theorem T-3 below.
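Assumption A-3 can be checked directly from a response matrix by comparing choice indicators row by row. The sketch below is illustrative only: the function name `is_unordered_monotone` and the two small response matrices are hypothetical, and the matrix convention (rows are instrument values, columns are response-types) is assumed from the surrounding text.

```python
from itertools import combinations

def is_unordered_monotone(R):
    """R[i][n] is the choice of response-type n when Z is set to its i-th value."""
    treatments = {t for row in R for t in row}
    for row_z, row_zp in combinations(R, 2):
        for t in treatments:
            at_z = [int(x == t) for x in row_z]
            at_zp = [int(x == t) for x in row_zp]
            # A-3: one weak inequality must hold for every response-type at once.
            weakly_up = all(a >= b for a, b in zip(at_z, at_zp))
            weakly_down = all(a <= b for a, b in zip(at_z, at_zp))
            if not (weakly_up or weakly_down):
                return False
    return True

# Hypothetical binary example: never-taker, complier, always-taker columns.
R_ok = [["t0", "t0", "t1"],
        ["t0", "t1", "t1"]]
# Hypothetical violation: one type moves toward t1 while another moves away.
R_bad = [["t0", "t1"],
         ["t1", "t0"]]
```

Here `is_unordered_monotone(R_ok)` holds while `is_unordered_monotone(R_bad)` fails, because the second matrix contains a two-way flow.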

In the case of binary treatment, Ordered Monotonicity A-2 and unordered monotonicity A-3 generate the same monotonicity restriction A-1.32 In Appendix C, we present a simple example that demonstrates the benefits of using choice indicators rather than cardinal measures of outcomes to define monotonicity.

5.2 Linking Unordered Monotonicity to Choice Theory

Under unordered monotonicity, treatment choice can be characterized as the solution to a problem in which agents maximize utility Ψ(t, z, v), the utility arising from choosing t ∈ supp(T) for agent ω whose unobserved variable V takes value v when the instrument Z is set at z. We present a formal analysis of the properties of Ψ(t, z, v) generated by unordered monotonicity in Section 6. In this section we build economic intuition of how unordered monotonicity arises. We use revealed preference arguments to restrict R and generate monotonicity conditions. We give examples where plausible restrictions on choice theory, coupled with standard instrumental variable conditions, produce identification of various strata counterfactuals and response-type probabilities. We also examine cases in which the point identification of response-type probabilities fails.

Consider a model of car purchase in which each agent buys a single car from three possible options: {a, b, c}. Let Tω = tj if agent ω buys car j in supp(T) = {ta, tb, tc}. Instruments are randomly assigned car-specific vouchers that offer price discounts to the car (or cars) specified by an offered voucher. We use za, zb, zc for vouchers that offer a discount to cars a, b and c respectively. We use zbc for the voucher whose discount can be used to buy car b or c. zno denotes no discount. If the voucher assigned to agent ω is za, then he faces a price discount if he decides to buy car a. Agent ω pays full price if he decides to buy car b or c. If the agent were assigned voucher zbc, then cars b and c would become cheaper while car a would remain at full price. We compare experimental designs that randomly assign different combinations of 3 out of the 5 voucher-types described above. Each agent ω is assumed to buy some car. In this section and in Web Appendix D, we give some examples of how choice restrictions facilitate identification and where they fail.

Our main example carried throughout the rest of this paper considers vouchers in supp(Z) = {zno, za, zbc}. The response vector S is given by the 3-dimensional vector of counterfactual choices: S = [T(zno), T(za), T(zbc)]′. Each of the three counterfactual choices T(z); z ∈ {zno, za, zbc} takes values in {ta, tb, tc}, which gives a total of 27 (= 3^3) possible response-types.33 Without restrictions on admissible strata, the model of strata-contingent counterfactuals is not identified.34 There are four intuitive monotonicity relationships arising from changes in z:

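$$\mathbf{1}[T_\omega(z_{no})=t_a]\le\mathbf{1}[T_\omega(z_a)=t_a], \tag{41}$$
$$\mathbf{1}[T_\omega(z_{bc})=t_a]\le\mathbf{1}[T_\omega(z_a)=t_a], \tag{42}$$
$$\mathbf{1}[T_\omega(z_{no})\in\{t_b,t_c\}]\le\mathbf{1}[T_\omega(z_{bc})\in\{t_b,t_c\}], \tag{43}$$
$$\mathbf{1}[T_\omega(z_a)\in\{t_b,t_c\}]\le\mathbf{1}[T_\omega(z_{bc})\in\{t_b,t_c\}]. \tag{44}$$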

Relationship (41) states that the agent is induced toward buying car a when the instrument changes from no voucher (zno) to a voucher for car a (za). Relationship (42) states that the agent is induced toward buying car a when the instrument changes from a voucher to buy b or c (zbc) to a voucher for car a (za). Relationship (43) states that the agent is induced toward buying either car b or c when the instrument changes from no voucher (zno) to a voucher for either car b or c (zbc). Relationship (44) states that the agent is induced toward buying either car b or c when the instrument changes from a voucher for car a (za) to a voucher that applies to either car b or c (zbc). Monotonicity relationships (41)–(44) eliminate 12 response-types out of the 27 possible ones, leaving the 15 admissible response-types presented in Table 1.35

Table 1.

Response Matrix Generated by Monotonicity Relationships (41)–(44)

Instrumental Variables Choices Response-types of S
s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15
No Voucher T(zno) ta ta ta tb tb tb tb tb tb tc tc tc tc tc tc
Voucher for a T(za) ta ta ta ta ta tb tb tc tc ta ta tb tb tc tc
Voucher for b or c T(zbc) ta tb tc tb tc tb tc tb tc tb tc tb tc tb tc
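The count of admissible response-types can be reproduced by brute force. This is a minimal sketch that enumerates all 27 candidate response vectors and keeps those satisfying the four relationships as they are described in the text; the helper name `satisfies_41_to_44` is hypothetical.

```python
from itertools import product

CARS = ("ta", "tb", "tc")
BC = {"tb", "tc"}

def satisfies_41_to_44(s):
    """s = (T(z_no), T(z_a), T(z_bc)); booleans compare as 0/1 indicators."""
    t_no, t_a, t_bc = s
    return (
        (t_no == "ta") <= (t_a == "ta")       # (41)
        and (t_bc == "ta") <= (t_a == "ta")   # (42)
        and (t_no in BC) <= (t_bc in BC)      # (43)
        and (t_a in BC) <= (t_bc in BC)       # (44)
    )

all_types = list(product(CARS, repeat=3))
admissible = [s for s in all_types if satisfies_41_to_44(s)]
```

With the lexicographic order produced by `product`, the admissible vectors come out in the same order as the columns s1 to s15 of Table 1.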

Table 1 leaves 15 response-types, more than the rank that the indicator matrix BT can attain with only three instrument values and three treatments. Thus, by Corollary C-1, our model for counterfactuals is not identified. In addition, some of the remaining strata are not consistent with unordered monotonicity A-3. More stringent application of revealed preference analysis can generate additional choice restrictions. Let Λω(z, t) be the consumption set of agent ω when assigned instrument z ∈ supp(Z) when treatment is set to t ∈ supp(T). Let γ ∈ Λω(z, t) represent a consumption good. Agent ω is assumed to maximize a utility function uω defined over consumption goods γ and choice t. Thus, the choice function Chω: supp(Z) → supp(T) of agent ω when the instrument is set to value z ∈ supp(Z) is:

$$Ch_\omega(z)=\arg\max_{t\in\mathrm{supp}(T)}\Big(\max_{\gamma\in\Lambda_\omega(z,t)}u_\omega(\gamma,t)\Big). \tag{45}$$

For budget set Λω(z, t) for agent ω, we assume the following relationships:

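$$\Lambda_\omega(z_{no},t_a)=\Lambda_\omega(z_{bc},t_a)\subset\Lambda_\omega(z_a,t_a), \tag{46}$$
$$\Lambda_\omega(z_{no},t_b)=\Lambda_\omega(z_a,t_b)\subset\Lambda_\omega(z_{bc},t_b), \tag{47}$$
$$\Lambda_\omega(z_{no},t_c)=\Lambda_\omega(z_a,t_c)\subset\Lambda_\omega(z_{bc},t_c). \tag{48}$$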

Relationship (46) compares the budget sets of agent ω for each possible voucher assignment given the car choice is fixed at a. The budget set of agent ω is enlarged when she has a voucher for car a (za) compared to when she does not (za is the only voucher that applies to car a). Thus, assigning consumer ω who buys car a, voucher za provides additional income. Vouchers zno and zbc offer no discount for car a and produce the same budget set for this choice. Relationship (47) examines the agent’s budget set if ω purchases car b. The budget set of agent ω is enlarged if she has a voucher that subsidizes car b when compared to vouchers that do not affect the choice set (za, zno). Relationship (48) examines the agent’s budget set when car c is assigned and is consistent with the budget analysis of relationship (47).36 For this example, the Weak Axiom of Revealed Preference (WARP) generates the following choice rule:

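$$\text{if}\ \ Ch_\omega(z)=t\ \ \text{and}\ \ \Lambda_\omega(z,t)\subseteq\Lambda_\omega(z',t)\ \ \text{and}\ \ \Lambda_\omega(z',t')\subseteq\Lambda_\omega(z,t')\ \ \Rightarrow\ \ Ch_\omega(z')\ne t'. \tag{49}$$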

37

In particular, Choice Rule (49) applied to budget set relationships (46)–(48) generates the choice restrictions 1–6 in Table 2.

Table 2.

Choice Restrictions Generated by Revealed Preference Analysis for supp(Z) = {zno, za, zbc}

Choice Restriction 1 : Chω(zno) = ta ⇒ Chω(za) = ta
Choice Restriction 2 : Chω(zno) = tb ⇒ Chω(za) ≠ tc and Chω(zbc) ≠ ta
Choice Restriction 3 : Chω(zno) = tc ⇒ Chω(za) ≠ tb and Chω(zbc) ≠ ta
Choice Restriction 4 : Chω(za) = tb ⇒ Chω(zno) = tb and Chω(zbc) ≠ ta
Choice Restriction 5 : Chω(za) = tc ⇒ Chω(zno) = tc and Chω(zbc) ≠ ta
Choice Restriction 6 : Chω(zbc) = ta ⇒ Chω(zno) = ta and Chω(za) = ta
Choice Restriction 7 : Chω(zno) ≠ ta ⇒ Chω(zbc) = Chω(zno)

Under additional assumptions about choice, we generate additional restrictions on the admissible strata. It is reasonable to assume that if an agent decides to buy a car without a discount, then the agent will not alter his choice if assigned a voucher that makes his choice of car cheaper. Specifically, consider the agent who decides between cars b and c when voucher assignment shifts from zno to zbc. There is no discount under zno, whereas zbc offers a discount for either car. If most of the income increase is spent on goods, then the agent’s car choice likely remains the same.38 Under this condition, an income increase should not decrease the agent’s consumption of a good. If the agent is already consuming one unit of car b and his income is increased, then the agent will not decrease his car consumption; hence the agent still buys car b if the voucher changes from zno to zbc.39 This restriction on choice generates Choice Restriction 7 in Table 2. The choice restrictions of Table 2 eliminate 20 out of the 27 possible response-types, generating the admissible response matrix in Table 3.40

Table 3.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {zno, za, zbc}

Instrumental Variables Choices Response-types of S
s1 s2 s3 s4 s5 s6 s7
No Voucher T(zno) ta ta ta tb tb tc tc
Voucher for car a T(za) ta ta ta ta tb ta tc
Voucher for car b or c T(zbc) ta tb tc tb tb tc tc
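The step from Table 2 to Table 3 can be replayed mechanically. The sketch below encodes each choice restriction as an implication and filters the 27 candidate response vectors; the helper names `implies` and `satisfies_table2` are hypothetical and serve only this illustration.

```python
from itertools import product

CARS = ("ta", "tb", "tc")

def implies(premise, conclusion):
    return (not premise) or conclusion

def satisfies_table2(s):
    """s = (Ch(z_no), Ch(z_a), Ch(z_bc)) for one candidate response-type."""
    no, a, bc = s
    return all([
        implies(no == "ta", a == "ta"),                  # Choice Restriction 1
        implies(no == "tb", a != "tc" and bc != "ta"),   # Choice Restriction 2
        implies(no == "tc", a != "tb" and bc != "ta"),   # Choice Restriction 3
        implies(a == "tb", no == "tb" and bc != "ta"),   # Choice Restriction 4
        implies(a == "tc", no == "tc" and bc != "ta"),   # Choice Restriction 5
        implies(bc == "ta", no == "ta" and a == "ta"),   # Choice Restriction 6
        implies(no != "ta", bc == no),                   # Choice Restriction 7
    ])

response_types = [s for s in product(CARS, repeat=3) if satisfies_table2(s)]
```

The seven surviving vectors appear in the same order as columns s1 to s7 of Table 3.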

For the response matrix of Table 3, the rank of the indicator matrix BT associated with this response matrix is equal to 7 which is also equal to the number of response-types. From Corollary C-1, response-type probabilities are identified. We can also identify mean counterfactual outcomes defined in terms of the strata in the table. The response matrix of Table 3 is generated by the nine unordered monotonicity relationships of Table 4.41 The choice restrictions generated by the revealed preference analysis in Table 2 produce unordered monotonicity A-3.
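The rank claim for Table 3 can be verified with exact arithmetic. This sketch assumes that BT is the vertical stack of the indicator matrices Bt = 1[R = t] over t ∈ supp(T) (the definition is given earlier in the paper, outside this section); the function names are hypothetical.

```python
from fractions import Fraction

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = [
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
]

def stacked_indicator(R, treatments):
    """Assumed form of B_T: stack B_t = 1[R = t] over t, one block per treatment."""
    return [[int(x == t) for x in row] for t in treatments for row in R]

def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

B_T = stacked_indicator(R, ["ta", "tb", "tc"])
```

For this matrix `rank(B_T)` equals the number of columns of `R`, which is the rank condition invoked through Corollary C-1.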

Table 4.

An Identified Pattern of Response Matrices

Monotonicity Relationships Implied Propensity Score Inequalities
Relation 1 1[Tω(zno) = ta] ≤ 1[Tω(za) = ta] P(T = ta|Z = zno) ≤ P(T = ta|Z = za)
Relation 2 1[Tω(zno) = ta] ≥ 1[Tω(zbc) = ta] P(T = ta|Z = zno) ≥ P(T = ta|Z = zbc)
Relation 3 1[Tω(za) = ta] ≥ 1[Tω(zbc) = ta] P(T = ta|Z = za) ≥ P(T = ta|Z = zbc)

Relation 4 1[Tω(zno) = tb] ≥ 1[Tω(za) = tb] P(T = tb|Z = zno) ≥ P(T = tb|Z = za)
Relation 5 1[Tω(zno) = tb] ≤ 1[Tω(zbc) = tb] P(T = tb|Z = zno) ≤ P(T = tb|Z = zbc)
Relation 6 1[Tω(za) = tb] ≤ 1[Tω(zbc) = tb] P(T = tb|Z = za) ≤ P(T = tb|Z = zbc)

Relation 7 1[Tω(zno) = tc] ≥ 1[Tω(za) = tc] P(T = tc|Z = zno) ≥ P(T = tc|Z = za)
Relation 8 1[Tω(zno) = tc] ≤ 1[Tω(zbc) = tc] P(T = tc|Z = zno) ≤ P(T = tc|Z = zbc)
Relation 9 1[Tω(za) = tc] ≤ 1[Tω(zbc) = tc] P(T = tc|Z = za) ≤ P(T = tc|Z = zbc)

Remark 5.1

The response matrix in Table 3 is uniquely generated by the unordered monotonicity relationships of Table 4. By uniquely we mean that a change in the direction of any of these inequalities produces a response matrix that differs from the one in Table 3. This property is useful for testing the model assumptions as each monotonicity relationship implies a propensity score inequality that can be tested on observed data.

Unordered monotonicity can arise under different configurations of the instrumental variable. Thus, in the previous example, consider changing the support of the instrumental variable Z from {zno, za, zbc} to {zno, zb, zbc}. We can apply the same revealed preference analysis of the first example to {zno, zb, zbc}. This analysis generates the response matrix shown in Table 5 which is also uniquely generated by nine inequalities consistent with unordered monotonicity A-3. The response matrix also identifies response-type probabilities and an associated set of counterfactual outcomes. However, three out of seven response-types in Table 5 differ from the ones in Table 3.

Table 5.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {zno, zb, zbc}.

Instrumental Variables Choices Response-types of S
s1 s2 s3 s4 s5 s6 s7
No Voucher T(zno) ta ta ta ta tb tc tc
Voucher for car b T(zb) ta ta tb tb tb tc tb
Voucher for car b or c T(zbc) ta tc tb tc tb tc tc

Choice restrictions alone do not necessarily produce identifiability. For an example, see Web Appendix D.2. We further note that unordered monotonicity A-3 is not a necessary condition for identification of model parameters. In Web Appendix D.3, we modify the example of Table 5 by assuming that Z takes values in supp(Z) = {zc, zb, zbc}. WARP alone generates the response matrix described in Table 6.42 The rank of its associated binary matrix BT is equal to 7. Thus, response-type probabilities are identified. However, the response matrix in Table 6 is not consistent with unordered monotonicity A-3. There is no sequence of monotonic relationships consistent with A-3 that generates this response matrix. For example, consider the change in voucher assignment from voucher for c (zc) to voucher for b (zb) in Table 6. This change induces those in s4 to move towards ta (from tc to ta), while it induces those in s2 to move away from ta (from ta to tb). This pattern of counterfactual choices is inconsistent with monotonicity.43 Moreover, revealed preference analysis may or may not identify the choice model, depending on the patterns of restrictions imposed on the variation in the instruments.44

Table 6.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {zc, zb, zbc}.

Instrumental Variables Count. Choices Response-types of S
s1 s2 s3 s4 s5 s6 s7
Voucher for c T(zc) ta ta tb tc tc tc tc
Voucher for b T(zb) ta tb tb ta tb tb tc
Voucher for b or c T(zbc) ta tb tb tc tb tc tc

6 Equivalent Conditions for Characterizing Unordered Monotonicity

This section presents and interprets general properties shared by all response matrices that satisfy unordered monotonicity A-3. We explore a variety of ways to express A-3 including separability of choice equations.

6.1 Properties of Binary Matrices

To establish a relationship between identifiability and the properties of response matrix R, it is helpful to use concepts from the literature on binary matrices. A binary matrix is lonesum if it is uniquely determined by its row and column sums.45 We establish that response matrix R is an unordered monotone response matrix (henceforth “monotone”) if each binary matrix derived from it, Bt = 1[R = t]; t ∈ supp(T), is lonesum. Lonesum matrices can be used to characterise monotonicity conditions in choice models. We show that identification and equivalence results arise from the properties of lonesum matrices.

Let ri,t be the i-th row sum of the binary matrix Bt: $r_{i,t}=\sum_{n=1}^{N_S}B_t[i,n]$. Let cn,t denote the sum of the n-th column of Bt, that is, $c_{n,t}=\sum_{i=1}^{N_Z}B_t[i,n]$. The maximal of matrix Bt is a matrix whose i-th row is given by ri,t elements 1 followed by 0s. Two matrices are equivalent if one can be transformed into the other by a series of row and/or column permutations.

Table 7 displays matrix Bta = 1[R = ta], where R is the response matrix of Table 3. The first column of Table 7 gives the row sums of Bta. The last row of Table 7 presents its column sums. To show that matrix Bta is lonesum, reorder its columns and rows based on decreasing values of column sums and increasing values of row sums. The maximal of Bta is obtained by a reordering of Bta based only on row and column sums. Note that there are different orderings for different t. The reordered matrix Bta is given in Table 8. It is a maximal matrix because the matrix rows are described by elements 1 followed by 0s. For example, if a maximal matrix has 7 columns and its first row sum is 1, the first row is [1, 0, 0, 0, 0, 0, 0]. Thus a maximal matrix is uniquely determined by its row sums. Therefore we conclude that Bta is a lonesum matrix. One can check that matrices Btb and Btc of Table 3 are also lonesum. Thus, following our definition, response matrix R of Table 3 is unordered monotone. In our analysis of LATE in Section 4.1, Bt1 and Bt0 are both lonesum.

Table 7.

Row and Column Sums of Matrix Bta of Response Matrix in Table 3

Row Sum Row Index Matrix Bta = 1[R = ta] of Table 3
s1 s2 s3 s4 s5 s6 s7
3 r1,ta 1 1 1 0 0 0 0
5 r2,ta 1 1 1 1 0 1 0
1 r3,ta 1 0 0 0 0 0 0

Column Index c1,ta c2,ta c3,ta c4,ta c5,ta c6,ta c7,ta
Column Sum 3 2 2 1 0 1 0

Table 8.

Reordered Matrix Bta According to Increasing Values of Row Sums and Decreasing Values of Column Sums

Row Sum Row Index Reordered Rows and Columns by Sums
s1 s2 s3 s4 s6 s5 s7
1 r3,ta 1 0 0 0 0 0 0
3 r1,ta 1 1 1 0 0 0 0
5 r2,ta 1 1 1 1 1 0 0

Column Index c1,ta c2,ta c3,ta c4,ta c6,ta c5,ta c7,ta
Column Sum 3 2 2 1 1 0 0
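The reordering argument of Tables 7 and 8 translates into a short test. The sketch below is one possible implementation, not taken from the paper: it sorts columns by decreasing column sum and checks that every row then consists of 1s followed by 0s, which is the maximal form described above. The function names are hypothetical.

```python
def is_lonesum(B):
    """True if sorting columns by decreasing column sum makes each row 1s then 0s."""
    n_cols = len(B[0])
    col_sums = [sum(row[n] for row in B) for n in range(n_cols)]
    order = sorted(range(n_cols), key=lambda n: -col_sums[n])
    for row in B:
        reordered = [row[n] for n in order]
        k = sum(reordered)
        if reordered != [1] * k + [0] * (n_cols - k):
            return False
    return True

def maximal(B):
    """Rows sorted by increasing row sum, each written as 1s followed by 0s."""
    n_cols = len(B[0])
    return [[1] * k + [0] * (n_cols - k) for k in sorted(sum(row) for row in B)]

# B_{ta} from Table 7 (rows z_no, z_a, z_bc; columns s1..s7).
B_ta = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
```

For `B_ta`, `maximal(B_ta)` reproduces the rows of Table 8, and a 2 × 2 identity matrix fails `is_lonesum` because its two rows are not nested.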

6.2 Characterizing Unordered Monotonicity

The following conditions are necessary and sufficient for characterizing unordered monotonicity A-3:

Theorem T-3

The following statements are equivalent characterizations of A-3 for the IV model (1)–(3):

  1. R is an unordered monotone response matrix, i.e., each binary matrix Bt = 1[R = t]; t ∈ supp(T) is lonesum;

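  2. For any t, t′, t″ ∈ supp(T), there are no 2 × 2 sub-matrices of R of the type:
    $$\begin{pmatrix} t & t' \\ t'' & t \end{pmatrix}\quad\text{or}\quad\begin{pmatrix} t' & t \\ t & t'' \end{pmatrix},\quad\text{where } t'\ne t \text{ and } t''\ne t. \tag{50}$$
    46
  3. Unordered monotonicity: For any z, z′ ∈ supp(Z), and for each treatment t ∈ supp(T), we have that:47
    $$\mathbf{1}[T_\omega(z)=t]\ge\mathbf{1}[T_\omega(z')=t]\ \ \forall\,\omega\in\Omega\quad\text{or}\quad\mathbf{1}[T_\omega(z)=t]\le\mathbf{1}[T_\omega(z')=t]\ \ \forall\,\omega\in\Omega.$$
  4. Unordered Separability: treatment choice can be represented by separable choice functions in V and Z, i.e., there exist functions φ : supp(V ) × supp(T) → ℝ and τ : supp(Z) × supp(T) → ℝ such that:
    $$\mathbf{1}[T=t\mid V=v,Z=z]=\mathbf{1}[\Psi(t,z,v)\ge 0]=\mathbf{1}[\varphi(v,t)+\tau(z,t)\ge 0]. \tag{51}$$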
Proof

See Web Appendix A.6.

Condition (i) states our main condition for equivalence: response matrix R is unordered monotone if and only if each indicator matrix formed from it (Bt = 1[R = t]) is lonesum. Condition (ii) states that if R is an unordered monotone response matrix, no 2 × 2 sub-matrix of R is of the form in (50). Condition (iii) states that the conditions preceding it hold if and only if unordered monotonicity A-3 holds. As previously noted, condition (iii) implies monotonicity A-1 for the binary choice model. Condition (iv) is a separability property that characterizes the choice functions. Vytlacil’s equivalence theorem (2002) is generated by the equivalence of conditions (iii) and (iv) when we specialize the model to the case of a binary treatment.48
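Condition (ii) lends itself to a direct search. The sketch below scans every pair of rows and every pair of columns for a 2 × 2 sub-matrix in which the same treatment sits on one diagonal while both entries on the other diagonal differ from it, which is how the forbidden type in (50) is read here. The function name `forbidden_submatrices` is hypothetical; the two matrices are copied from Tables 3 and 6.

```python
from itertools import combinations

def forbidden_submatrices(R):
    """Yield (row pair, column pair) whose 2 x 2 sub-matrix has the forbidden form."""
    for i, j in combinations(range(len(R)), 2):
        for n, m in combinations(range(len(R[0])), 2):
            a, b = R[i][n], R[i][m]
            c, d = R[j][n], R[j][m]
            main_diag = a == d and b != a and c != a   # (t  t'; t'' t )
            anti_diag = b == c and a != b and d != b   # (t' t ; t   t'')
            if main_diag or anti_diag:
                yield (i, j), (n, m)

table3 = [
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
]
table6 = [
    ["ta", "ta", "tb", "tc", "tc", "tc", "tc"],
    ["ta", "tb", "tb", "ta", "tb", "tb", "tc"],
    ["ta", "tb", "tb", "tc", "tb", "tc", "tc"],
]
```

The search returns nothing for Table 3. For Table 6 it flags, among others, rows zc and zb with columns s2 and s4 (zero-based indices `(0, 1)` and `(1, 3)`).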

6.3 Interpreting T-3

Condition (i) describes a key property of response matrices: the lonesum property of treatment choice indicators. Lonesum matrices are not only useful for characterizing unordered monotonicity, but they are key concepts for investigating properties of choice models.49

Condition (i) of T-3 implies that Bt is fully characterized by its column and row sums. This condition implies that the response matrix R is also characterized by its row and column sums. However, the reverse is not true. We illustrate this in Remark 6.1 :

Remark 6.1

If R is an unordered monotone response, each matrix Bt is lonesum and therefore fully characterized by its column and row sums ri,t, cn,t; t ∈ supp(T), i ∈ {1, …, NZ}, n ∈ {1, …, NS}. Since response matrix R can be written as $R=\sum_{t\in\mathrm{supp}(T)}t\cdot B_t$, R is characterized by its column and row sums ri,t, cn,t as well. However, the reverse is not true. R being characterized by its column and row sums does not imply that R is an unordered monotone response. To illustrate this claim, let response matrix R be defined by:

$$R=\begin{pmatrix} t_1 & t_2 \\ t_2 & t_3 \end{pmatrix},\quad\text{thus}\quad\begin{aligned}&\text{row sums: } r_{1,t_1}=1,\ r_{1,t_2}=1,\ r_{1,t_3}=0,\ r_{2,t_1}=0,\ r_{2,t_2}=1,\ r_{2,t_3}=1,\\&\text{column sums: } c_{1,t_1}=1,\ c_{1,t_2}=1,\ c_{1,t_3}=0,\ c_{2,t_1}=0,\ c_{2,t_2}=1,\ c_{2,t_3}=1.\end{aligned}$$

R is not unordered monotone because it violates condition (ii) of T-3. Moreover Bt2 = 1[R = t2] exhibits one of the prohibited patterns (52) and it is not lonesum. Nevertheless, R is fully characterized by its column sums and row sums: r1,t1 = 1 and c1,t1 = 1 ⇒ R[1, 1] = t1; r2,t3 = 1 and c2,t3 = 1 ⇒ R[2, 2] = t3; r1,t2 = 1 and R[1, 1] = t1 ⇒ R[1, 2] = t2; r2,t2 = 1 and R[2, 2] = t3 ⇒ R[2, 1] = t2.

All response matrices for the case of binary treatment are equivalent under monotonicity A-1. This property does not hold for the general unordered case:

Remark 6.2

Consider the binary choice model in which the instrument takes NZ values and T takes values in {0, 1}. Unordered monotonicity generates a monotonicity inequality for each pair of Z-values. Different sets of inequalities generate different response matrices. However, each of these response matrices is equivalent to the same lower triangular binary matrix with NZ rows and NZ + 1 columns (see the example in Section 4.1) and produces an identified model. However, in the case of multiple choices, unordered monotonicity does not generate response matrices that are equivalent to the same matrix. For example, the response matrices of Tables 3 and 5 are monotone responses but they are not equivalent, because one matrix cannot be transformed into another by row and/or column permutations. The response matrices in Tables 3 and 5 consist of seven response-types for NT = 3 and NZ = 3. There are 27 possible response-types for NT = 3 and NZ = 3. The combination of 7 response-types out of these 27 generates 888,030 possible response matrices, although some may not be identifiable. Among them, 66 response matrices satisfy unordered monotonicity condition (iii).50 Response matrices of Tables 3 and 5 are two examples of these matrices.

Condition (ii) of T-3 imposes a restriction on counterfactual choices that does not depend on the number of treatment choices in supp(T) or the number of values that Z takes. The condition rules out two-way flows generated by changes in instruments. Thus the response matrix of Table 6 is not unordered monotone. The forbidden type of condition (ii) is obtained using the first and second rows of response-types s2 and s4 in Table 6.51 The change from zc to zb shifts people away from a in s2 but toward a in s4.

Remark 6.3

We note that a consequence of condition (ii) in T-3 is that under A-3, no 2 × 2 sub-matrix of any Bt; t ∈ supp(T) is of the type:52

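$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\quad\text{nor}\quad\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{52}$$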

Unordered monotonicity A-3 holds if and only if no prohibited patterns (52) occur for any Bt; t ∈ supp(T). An example clarifies the equivalence between the requirements for unordered monotonicity A-3 and the absence of prohibited patterns (52). Suppose that (1[T = t]|Z = z, V = v) ≥ (1[T = t]|Z = z′, V = v) holds for all v ∈ supp(V ). Then it must be the case that:

$$(\mathbf{1}[T=t]\mid Z=z,S=s)\ \ge\ (\mathbf{1}[T=t]\mid Z=z',S=s) \tag{53}$$

holds for all s ∈ supp(S) because for each v ∈ supp(V) there is s ∈ supp(S) such that s = fS(v) (see (10)) and (T|S = s, Z = z) = (T|V = v, Z = z). Inequality (53) generates three possible sub-vectors of dimension 2 × 1 that indicate whether T is equal to t when Z takes value z and z′ for any response-type s ∈ supp(S):

$$\begin{pmatrix}(\mathbf{1}[T=t]\mid Z=z,S=s)\\(\mathbf{1}[T=t]\mid Z=z',S=s)\end{pmatrix}\in\left\{\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}1\\1\end{pmatrix}\right\}\quad\text{for all } s\in\mathrm{supp}(S). \tag{54}$$

The matrix generated by a combination of sub-vectors in (54) for any two response-types s, s′ ∈ supp(S) is:

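$$\begin{pmatrix}(\mathbf{1}[T=t]\mid Z=z,S=s) & (\mathbf{1}[T=t]\mid Z=z,S=s')\\(\mathbf{1}[T=t]\mid Z=z',S=s) & (\mathbf{1}[T=t]\mid Z=z',S=s')\end{pmatrix}.$$

It cannot be of the form:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\quad\text{or}\quad\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$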

which are prohibited patterns (52). Hence the weak inequality (1[T = t]|Z = z, V = v) ≥ (1[T = t]|Z = z′, V = v) ∀ v ∈ supp(V) implies that Bt is lonesum. On the other hand, suppose that v, v′ ∈ supp(V) are such that (1[T = t]|Z = z, V = v) > (1[T = t]|Z = z′, V = v) and (1[T = t]|Z = z, V = v′) < (1[T = t]|Z = z′, V = v′). Then there must exist s, s′ ∈ supp(S) where s = fS(v), s′ = fS(v′) that generate the prohibited pattern:

$$\begin{pmatrix}(\mathbf{1}[T=t]\mid Z=z,S=s) & (\mathbf{1}[T=t]\mid Z=z,S=s')\\(\mathbf{1}[T=t]\mid Z=z',S=s) & (\mathbf{1}[T=t]\mid Z=z',S=s')\end{pmatrix}=\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{55}$$

The equality of the first columns on the right and left sides of Equation (55) means that, for some type s, treatment t is chosen when the instrument shifts from z′ to z. The equality of the second columns of Equation (55) states the opposite: for some type s′, the instrument shift from z′ to z causes treatment t not to be chosen. This behavior violates both the intuitive notion and formal definition of monotonicity because the instrument shifts some agents to change their choice towards t while others change their choice away from t.53 Condition (iii) is implied by super (or sub) modularity of Ψ(t, z, v) in terms of v and z for all t, but that condition is stronger than what is required to produce A-3. Strictly speaking, the requirement is that component-wise, $\operatorname{sgn}\big(\Delta\Psi(t,z,v)/\Delta z\big)$ is the same for all V = v for each T = t and Z = z.

6.4 Understanding Condition (iv) of T-3

We draw on and generalize the binary-treatment model of Sections 4.1–4.2 to build the intuition underlying condition (iv). In the binary case, monotonicity implies that Bt1 is lower triangular (28).54 Triangularity generates Equation (38) which expresses treatment choice T as an indicator function that is separable in the observed propensity score P(T = t1|Z), which depends on Z, and a sum of response-type probabilities, which depends on V.

Theorem T-3 applies to choice models with multiple treatments, which include the binary case. If unordered monotonicity (condition (iii) of T-3) holds, then each binary matrix Bt; t ∈ supp(T) is characterized solely by its row and column sums so that Bt; t ∈ supp(T) are lonesum (Item (i) of T-3). This property can be understood as a generalization of the lower triangular property in the binary case, but applied to each Bt.55 Generalized triangularity generates condition (iv) which characterizes treatment choice as an indicator function that is separable in Z and V. We present a detailed discussion of this condition in Appendix G.

To interpret separability condition (iv), suppose that agent ω with Vω = v ∈ supp(V) chooses t ∈ supp(T) when an instrumental variable is set to z ∈ supp(Z), so that 1[T = t|V = v, Z = z] = 1. According to condition (iv), there exist functions φ and τ such that φ(v, t) + τ (z, t) ≥ 0.56 It is clear that expressions of this type rule out the prohibited patterns (52) and therefore generate unordered monotonicity. What is less obvious is that (iii) implies representation (iv), which is not necessarily unique.57

Note that 1[T = t|V = v, Z = z] = 1 implies that 1[T = t′|V = v, Z = z] = 0 for all t′ ∈ supp(T) \ {t}. Therefore it must be the case that φ(v, t′) + τ (z, t′) < 0 for all t′ that differs from t. In particular, condition (iv) implies that:

$$\mathbf{1}[T=t\mid V=v,Z=z]=1\ \Rightarrow\ t=\arg\max_{t'\in\mathrm{supp}(T)}\big(\Psi(t',z,v)\big)=\arg\max_{t'\in\mathrm{supp}(T)}\big(\varphi(v,t')+\tau(z,t')\big). \tag{56}$$

Condition (iv) does not claim that the functions φ and τ are unique. Indeed if t maximizes φ(v, t′) + τ (z, t′), it also maximizes m(φ(v, t′) + τ (z, t′)) where m is any strictly increasing function.
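One concrete separable representation can be built from row sums when Bt is lonesum. The construction below is an illustration chosen for this sketch and is not claimed to be the one used in the paper's proof: it sets τ(z_i, t) to the row sum r_{i,t} and φ(s_n, t) to minus the smallest row sum among the rows in which response-type s_n chooses t. Here φ is indexed by the response-type s = fS(v) rather than by v, and the function names are hypothetical.

```python
def separable_representation(B):
    """For a lonesum B_t, return (tau, phi) with B[i][n] == 1[phi[n] + tau[i] >= 0]."""
    tau = [sum(row) for row in B]  # tau(z_i, t): the row sum r_{i,t}
    phi = []
    for n in range(len(B[0])):
        sums = [tau[i] for i in range(len(B)) if B[i][n] == 1]
        # phi(s_n, t): minus the smallest row sum among rows where type n picks t;
        # minus infinity when type n never picks t.
        phi.append(-min(sums) if sums else float("-inf"))
    return tau, phi

def rebuild(tau, phi):
    """Indicator matrix implied by the threshold-crossing rule phi + tau >= 0."""
    return [[int(f + t >= 0) for f in phi] for t in tau]

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
table3 = [
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
]
indicator = {t: [[int(x == t) for x in row] for row in table3]
             for t in ("ta", "tb", "tc")}
```

Applying `rebuild(*separable_representation(B))` returns each indicator matrix of Table 3 unchanged. For a matrix with a prohibited pattern, such as the 2 × 2 identity, the same construction fails to reproduce the matrix, in line with the equivalence between separability and the lonesum property.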

Condition (iv) does not impose rationality or perfect foresight on agent decision making. Suppose that agent ω decides among t1, t2, t3 and that his treatment choice is generated by maximization of a utility function Ψ(t, z, v) where Vω = v and Zω = z. Condition (iv) states that if unordered monotonicity A-3 holds, the maximized choice value Ψ(t, z, v) can be characterized as arising from the maximization of a separable function φ(v, t)+τ (z, t). Specifically, if ω chooses t1, then t1 is the maximum among Ψ(t, z, v) for t ∈ {t1, t2, t3}. In this case, t1 also maximizes φ(v, t) + τ (z, t) for t ∈ {t1, t2, t3}:

$$t_1=\arg\max_{t\in\{t_1,t_2,t_3\}}\Psi(t,z,v)\ \Rightarrow\ t_1=\arg\max_{t\in\{t_1,t_2,t_3\}}\big(\varphi(v,t)+\tau(z,t)\big).$$

Condition (iv) does not imply that the ranking of treatment utilities generated by Ψ(t, z, v) is necessarily the same as the ranking generated by φ(v, t) + τ (z, t). For instance, if Ψ(t1, z, v) > Ψ(t2, z, v) > Ψ(t3, z, v) then ω prefers t1 to t2, and t2 to t3. This does not necessarily imply that φ(v, t1) + τ (z, t1) > φ(v, t2) + τ (z, t2) > φ(v, t3) + τ (z, t3). Indeed, φ(v, t1) + τ (z, t1) > φ(v, t3) + τ (z, t3) > φ(v, t2) + τ (z, t2) may also occur. It is the ranking of t1 relative to the next best that generates agent choices of t1. Variation in instruments only identifies preferences relative to the next best choice and not an order among the remaining elements in the choice set.

To formalize this discussion, we establish that unordered monotonicity arises if we assume that utilities of a choice compared to the next best choice can be represented as additively separable functions:58

$$u(v,t)+h(z,t)=\Psi(t,z,v)-\max_{t'\in\mathrm{supp}(T)\setminus\{t\}}\Psi(t',z,v).$$

The following theorem formalizes this point.

Theorem T-4

If there exist functions u: supp(V)×supp(T) → ℝ and h: supp(Z)× supp(T) → ℝ such that

$$u(v,t)+h(z,t)=\Big(\Psi(t,z,v)-\max_{t'\in\mathrm{supp}(T)\setminus\{t\}}\Psi(t',z,v)\Big)\quad\forall\, v\in\mathrm{supp}(V),\ z\in\mathrm{supp}(Z),$$

then the response matrix R associated with this choice model is unordered monotone.

Proof

See Web Appendix A.7.

As before, the separable representation is not necessarily unique.

Remark 6.4

T-4 imposes stronger functional form assumptions than T-3. Summarizing:

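$$\text{unordered monotonicity}\ \Rightarrow\ \Big(\arg\max_{t\in\mathrm{supp}(T)}\varphi(v,t)+\tau(z,t)\Big)=\Big(\arg\max_{t\in\mathrm{supp}(T)}\big(\Psi(t,z,v)-\max_{t'\in\mathrm{supp}(T)\setminus\{t\}}\Psi(t',z,v)\big)\Big)$$

$$\text{while}\quad u(v,t)+h(z,t)=\Big(\Psi(t,z,v)-\max_{t'\in\mathrm{supp}(T)\setminus\{t\}}\Psi(t',z,v)\Big)\ \Rightarrow\ \text{unordered monotonicity}$$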

Heckman et al. (2006b, 2008) assume separability in the underlying preference functions and show that IV estimates a LATE that compares the outcome of one choice to the outcome for the next best option. Our condition is weaker. Theorem T-4 states that unordered monotonicity only requires that the utility of a choice relative to the next best choice be separable. To clarify, the impact of instrument Z on the treatment choice is summarized by the term h(z, t). Suppose Z changes from z′ to z. If h(z, t) − h(z′, t) > 0, each agent is induced towards t. If h(z, t) − h(z′, t) < 0, agents are induced against t. This analysis applies for all pairwise values of (z, z′) ∈ supp(Z)×supp(Z) and for all t ∈ supp(T). The collection of all of these inequalities characterizes unordered monotonicity A-3.

6.5 Verifying Unordered Monotonicity Condition A-3

Verifying condition (ii) of Theorem T-3 is a daunting combinatorial task. It would require checking each 2 × 2 sub-matrix in R, which is impractical for large R. We show that a single calculation based on a simple multiplication of binary matrices suffices to check condition A-3. Our criterion is based on a binary matrix M:

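$$\text{For each } t_j\in\mathrm{supp}(T)=\{t_1,\dots,t_{N_T}\},\ \text{let}\quad M_{t_j}=\Big[\underbrace{\mathbf{1}_{N_Z,N_S},\dots,\mathbf{1}_{N_Z,N_S}}_{j-1\ \text{times}},\ B_{t_j},\ \underbrace{\mathbf{0}_{N_Z,N_S},\dots,\mathbf{0}_{N_Z,N_S}}_{N_T-j\ \text{times}}\Big],\quad\text{then}\quad M=\big[M_{t_1}',\dots,M_{t_{N_T}}'\big]', \tag{57}$$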

where 1NZ,NS is a matrix of elements 1 and 0NZ,NS is a matrix of elements 0 of same dimension. Matrix M is block diagonal with matrices Bt on the diagonal, where, again, we eliminate any redundancies. M has elements 1 below this diagonal and elements 0 above it.

Theorem T-5

For the IV model (1)–(3), the response matrix R is an unordered monotone response, that is, each binary matrix Bt = 1[R = t]; t ∈ supp(T) is lonesum, if and only if,

$$\iota_c'\Big(\big(M'(\iota_r\iota_c'-M)\big)\odot\big(M'(\iota_r\iota_c'-M)\big)'\Big)\iota_c=0, \tag{58}$$

where ιr is an NT · NZ vector of 1s, ιc is an NT · NS vector of 1s, and ⊙ denotes the element-wise (Hadamard) product. Moreover, if Equation (58) holds, then matrix M is lonesum.

Proof

See Web Appendix A.8.

Unordered monotonicity condition A-3 holds if and only if this value is equal to zero. Moreover, if equation (58) holds, then all the conditions stated in Theorem T-3 also hold.
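The criterion of Theorem T-5 can be sketched in a few lines. The code assumes a particular reading of (57) and (58): M stacks the row blocks Mtj vertically, and the left-hand side of (58) sums the element-wise product of A = M′(ιr ιc′ − M) with its transpose, where A[n][m] counts the rows with a 1 in column n and a 0 in column m. Under that reading the statistic is positive exactly when two columns of M exhibit a two-way flow. The function names are hypothetical.

```python
def build_M(R, treatments):
    """Row block j is [ones (j-1 blocks), B_tj, zeros (N_T - j blocks)], as read from (57)."""
    n_s, n_t = len(R[0]), len(treatments)
    M = []
    for j, t in enumerate(treatments):
        for row in R:
            b_row = [int(x == t) for x in row]
            M.append([1] * (j * n_s) + b_row + [0] * ((n_t - j - 1) * n_s))
    return M

def monotonicity_statistic(M):
    """Assumed left-hand side of (58): sum over column pairs of A[n][m] * A[m][n]."""
    n_cols = len(M[0])
    A = [[sum(row[n] * (1 - row[m]) for row in M) for m in range(n_cols)]
         for n in range(n_cols)]
    return sum(A[n][m] * A[m][n] for n in range(n_cols) for m in range(n_cols))

T = ["ta", "tb", "tc"]
table3 = [
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
]
table6 = [
    ["ta", "ta", "tb", "tc", "tc", "tc", "tc"],
    ["ta", "tb", "tb", "ta", "tb", "tb", "tc"],
    ["ta", "tb", "tb", "tc", "tb", "tc", "tc"],
]
```

The statistic is zero for the response matrix of Table 3 and strictly positive for that of Table 6, matching the discussion of which of the two satisfies A-3.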

7 Identification of Counterfactuals and Treatment Effects

This section applies our analysis to determine which counterfactuals and treatment effects are identified and for which strata. We build on our analysis of binary LATE presented in Section 4.1. We generalize the notions of “compliers,” “always takers” and “never-takers” to a general unordered choice model.

To this end, it is helpful to introduce some additional notation. Let Σt(i) be the set of response-types in which t appears exactly i times:

$$\Sigma_t(i)=\Big\{s_n,\ \text{such that } s_n\in\mathrm{supp}(S)\ \text{and}\ \sum_{j=1}^{N_Z}B_t[j,n]=i\Big\}\quad\text{where } i\in\{0,\dots,N_Z\}. \tag{59}$$

For example, Σta (2) for the response matrix of Table 3 consists of the response-types for which the value ta appears exactly twice. They are Σta (2) = {s2, s3} (see Table 9). Those are also the response-types whose column-sum of Bta in Table 7 is 2.

Table 9.

Partition of Response-types in Table 3 where supp(Z) = {zno, za, zbc}

| Instrumental Variables | Counterfactual Choices | s1 | s2 | s3 | s4 | s5 | s6 | s7 |
|---|---|---|---|---|---|---|---|---|
| No Voucher | T(zno) | ta | ta | ta | tb | tb | tc | tc |
| Voucher for car a | T(za) | ta | ta | ta | ta | tb | ta | tc |
| Voucher for car b or c | T(zbc) | ta | tb | tc | tb | tb | tc | tc |

| Set | Response-types |
|---|---|
| Response-types in Σta(0) | s5, s7 |
| Response-types in Σta(1) | s4, s6 |
| Response-types in Σta(2) | s2, s3 |
| Response-types in Σta(3) | s1 |
| ta-Switchers | s2, s3, s4, s6 |
| ta-Always-takers | s1 |
| ta-Never-takers | s5, s7 |
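The partition in Table 9 can be reproduced mechanically from the response matrix. The sketch below is illustrative only: the helper name `partition` is hypothetical, and the response matrix is transcribed from Table 3.

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
])
labels = [f"s{n}" for n in range(1, 8)]

def partition(R, t, labels):
    """Group response-types by the number of times t appears in their column."""
    counts = (R == t).sum(axis=0)   # column sums of the indicator matrix B_t
    n_z = R.shape[0]
    return {i: [s for s, c in zip(labels, counts) if c == i]
            for i in range(n_z + 1)}

sigma_ta = partition(R, "ta", labels)
print(sigma_ta)
```

The key i of the returned dictionary corresponds to the set Σta(i) of Equation (59).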

For each t ∈ supp(T), we can partition the set of response-types by the number of times a treatment value t appears: $\mathrm{supp}(S) = \bigcup_{i=0}^{N_Z} \Sigma_t(i)$. Table 9 displays these partitions for Σta(i); i = 0, …, 3 based on the response matrix in Table 3. Let bt(i) be the NS-dimensional binary row-vector that indicates if response-type s belongs to Σt(i), that is, bt(i)[n] = 1 if sn ∈ Σt(i) and zero otherwise. For Table 3, bta(2) = [0, 1, 1, 0, 0, 0, 0]. Using this notation, we prove the following identification theorem:

Theorem T-6

If unordered monotonicity A-3 holds for the IV model (1)–(3), then the following response-type probabilities and counterfactuals are identified:

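$$
P(S \in \Sigma_t(i)) \quad \text{and} \quad E(\kappa(Y(t)) \mid S \in \Sigma_t(i)) \quad \text{for all } t \in \mathrm{supp}(T) \text{ and } i \in \{1, \dots, N_Z\}. \tag{60}
$$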

Moreover, those parameters can be evaluated by the following equations:

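$$
P(S \in \Sigma_t(i)) = b_t(i)\, B_t^+ P_Z(t) \quad \text{and} \quad E(\kappa(Y(t)) \mid S \in \Sigma_t(i)) = \frac{b_t(i)\, B_t^+ Q_Z(t)}{b_t(i)\, B_t^+ P_Z(t)}, \tag{61}
$$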

where κ: supp(Y) → ℝ denotes an arbitrary function on the support of Y.

Proof

See Web Appendix A.9.
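A numerical illustration of Equation (61) may help. The sketch below is not part of the paper: it takes κ(Y) = Y, uses the response matrix of Table 3, and relies on the mixture relations PZ(t) = Bt PS and QZ(t) = Bt QS(t) as we read them from the earlier sections. The response-type probabilities `p_S` and the type-specific means `mu_ta` are hypothetical values chosen only for the example.

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
])
B_ta = (R == "ta").astype(float)
B_pinv = np.linalg.pinv(B_ta)        # Moore-Penrose inverse of B_ta

# Hypothetical population values (assumptions made for this illustration).
p_S = np.array([0.10, 0.20, 0.15, 0.05, 0.20, 0.10, 0.20])   # P(S = s_n)
mu_ta = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])        # E(Y(ta) | S = s_n)

# Quantities that would be observed in data under the assumed mixture relations.
P_Z = B_ta @ p_S                  # P(T = ta | Z = z)
Q_Z = B_ta @ (mu_ta * p_S)        # E(Y * 1[T = ta] | Z = z)

def identify(i):
    """Evaluate Equation (61) for the set of types in which ta appears i times."""
    b = (B_ta.sum(axis=0) == i).astype(float)   # indicator vector b_ta(i)
    prob = b @ B_pinv @ P_Z
    mean = (b @ B_pinv @ Q_Z) / prob
    return prob, mean

results = {i: identify(i) for i in (1, 2, 3)}
for i, (prob, mean) in results.items():
    print(i, round(prob, 4), round(mean, 4))
```

In this example the recovered probabilities and means coincide with the values computed directly from `p_S` and `mu_ta`, which is what identification of the parameters in (60) requires.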

Note that, in general, we cannot identify counterfactuals within every stratum. Nonetheless, it can be shown that under unordered monotonicity, at least one treatment effect can always be identified.59

Table 10 lists the mean counterfactual outcomes that are identified by applying Equation (60) of T-6 to the response matrix R in Table 9. Those are E(Y(t)|S ∈ Σt(i)) for i ∈ {1, 2, 3} and t ∈ {ta, tb, tc}.

Table 10.

Identified Counterfactual Outcomes for R in Table 3

| Response-type Sets | Counterfactual Outcome Y(ta) | Counterfactual Outcome Y(tb) | Counterfactual Outcome Y(tc) |
|---|---|---|---|
| Σt(1) | E(Y(ta) ∣ S ∈ {s4, s6}) | E(Y(tb) ∣ S = s2) | E(Y(tc) ∣ S = s3) |
| Σt(2) | E(Y(ta) ∣ S ∈ {s2, s3}) | E(Y(tb) ∣ S = s4) | E(Y(tc) ∣ S = s6) |
| Σt(3) | E(Y(ta) ∣ S = s1) | E(Y(tb) ∣ S = s5) | E(Y(tc) ∣ S = s7) |

Remark 7.1

A direct implication of Theorem T-6 is that if there exist t, t′ ∈ supp(T) and i, i′ ∈ {1, …, NZ} such that Σt(i) = Σt′(i′), then E(Y(t) − Y(t′)|S ∈ Σt(i)) is identified. This expression is the mean treatment effect of t relative to t′ for the set of strata Σt(i).

Expression (61) uses the tools for identification based on the generalized inverse developed in Section 4. For instance, we can apply Equation (61) of T-6 to generate the following identifying relationships:

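$$
E(Y(t_a) \mid S \in \{s_2, s_3\}) = E(Y(t_a) \mid S \in \Sigma_{t_a}(2)) = \frac{b_{t_a}(2)\, B_{t_a}^+ Q_Z(t_a)}{b_{t_a}(2)\, B_{t_a}^+ P_Z(t_a)} = \frac{E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_{no}) - E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_{bc})}{P(T = t_a \mid Z = z_{no}) - P(T = t_a \mid Z = z_{bc})}. \tag{62}
$$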

See Web Appendix A.10 for the derivation of Equation (62). Counterfactuals E(Y (tb)|S = s2) and E(Y (tc)|S = s3) are also identified (See Table 10). Thus we can identify the effect:

$$
\frac{E(Y(t_a) - Y(t_b) \mid S = s_2)\, P(S = s_2) + E(Y(t_a) - Y(t_c) \mid S = s_3)\, P(S = s_3)}{P(S = s_2) + P(S = s_3)}. \tag{63}
$$

Let $\bar{t}_a$ designate the choice values other than ta. We use the notation $E(Y(t_a) - Y(\bar{t}_a) \mid S \in \{s_2, s_3\})$ to designate treatment effect (63), which stands for the effect of choosing ta versus not choosing ta for response-types s2, s3. (See Web Appendix A.19 for its derivation.)
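One way to see why effect (63) is identified is to split it into pieces that T-6 delivers separately. This is a sketch based on Table 10, not the derivation of the Web Appendix. Since Σtb(1) = {s2} and Σtc(1) = {s3}, the probabilities P(S = s2) and P(S = s3) and the means E(Y(tb)|S = s2) and E(Y(tc)|S = s3) are identified, while E(Y(ta)|S ∈ {s2, s3}) is identified because Σta(2) = {s2, s3}. Effect (63) can then be written as:

$$
E(Y(t_a) \mid S \in \{s_2, s_3\}) - \frac{E(Y(t_b) \mid S = s_2)\, P(S = s_2) + E(Y(t_c) \mid S = s_3)\, P(S = s_3)}{P(S = s_2) + P(S = s_3)}.
$$

Every term in this expression is identified, even though E(Y(ta)|S = s2) and E(Y(ta)|S = s3) are not identified individually.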

We generalize the terminology of Angrist et al. (1996) to the case of multiple treatments. The appropriate generalization is t-specific. In the binary case, there is no need to specify a particular t since the specification of one value automatically implies the other possible value. The t-Never-takers are those in the set of response-types in Σt(0), for which t does not occur. Σt(NZ) consists of a single response-type whose elements are all t. It is the set of the t-Always-takers. The remaining response-types are the t-Switchers: the set $\bigcup_{i=1}^{N_Z-1} \Sigma_t(i)$ consists of all strata for which the choice of treatment t varies as Z ranges in its support.60 Those sets are formally defined as:

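$$
\begin{aligned}
t\text{-Never-takers} &= \{s \in \mathrm{supp}(S);\ P(T = t \mid S = s) = 0\} = \Sigma_t(0);\\
t\text{-Always-takers} &= \{s \in \mathrm{supp}(S);\ P(T = t \mid S = s) = 1\} = \Sigma_t(N_Z);\\
t\text{-Switchers} &= \{s \in \mathrm{supp}(S);\ 0 < P(T = t \mid S = s) < 1\} = \bigcup_{i=1}^{N_Z-1} \Sigma_t(i).
\end{aligned}
$$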

These sets for the response matrix of Table 3 are: ta-Always-takers = {s1}, ta-Never-takers = {s5, s7}, and ta-Switchers = {s2, s3, s4, s6} (see Table 9). Corollaries C-2 and C-3 present identification results for the various categories.
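The same classification can be carried out for every treatment value. The sketch below is illustrative: the helper name `classify` is hypothetical, and the response matrix is transcribed from Table 3.

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
])
labels = [f"s{n}" for n in range(1, 8)]

def classify(R, t, labels):
    """Split response-types into t-Never-takers, t-Always-takers and t-Switchers."""
    counts = (R == t).sum(axis=0)   # number of instrument values at which t is chosen
    n_z = R.shape[0]
    groups = {"never": [], "always": [], "switchers": []}
    for s, c in zip(labels, counts):
        key = "never" if c == 0 else "always" if c == n_z else "switchers"
        groups[key].append(s)
    return groups

classes = {t: classify(R, t, labels) for t in ("ta", "tb", "tc")}
for t, groups in classes.items():
    print(t, groups)
```

Because the generalization is t-specific, a given response-type can be a switcher for one treatment and a never-taker for another. For example, s2 is a ta-Switcher and a tb-Switcher but a tc-Never-taker.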

Corollary C-2

For the IV model (1)–(3) in which unordered monotonicity A-3 holds, the following probabilities are identified for each t ∈ supp(T):

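$$
\begin{aligned}
P(S \in t\text{-Always-takers}) &= P(S \in \Sigma_t(N_Z)) = b_t(N_Z)\, B_t^+ P_Z(t);\\
P(S \in t\text{-Switchers}) &= P\Big(S \in \bigcup_{i=1}^{N_Z-1} \Sigma_t(i)\Big) = \Big(\sum_{i=1}^{N_Z-1} b_t(i)\Big) B_t^+ P_Z(t);\\
P(S \in t\text{-Never-takers}) &= P(S \in \Sigma_t(0)) = 1 - P(S \in t\text{-Always-takers}) - P(S \in t\text{-Switchers}).
\end{aligned}
$$
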
Proof

See Web Appendix A.11.
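As a worked instance, offered as a sketch, take t = ta in the response matrix of Table 3. With the generalized inverse of Bta reported in Example 7.1, the weight vectors are bta(3)Bta+ = [0, 0, 1] for the ta-Always-takers and (bta(1) + bta(2))Bta+ = [0, 1, −1] for the ta-Switchers, so the expressions of C-2 reduce to:

$$
\begin{aligned}
P(S \in t_a\text{-Always-takers}) &= P(T = t_a \mid Z = z_{bc}),\\
P(S \in t_a\text{-Switchers}) &= P(T = t_a \mid Z = z_a) - P(T = t_a \mid Z = z_{bc}),\\
P(S \in t_a\text{-Never-takers}) &= 1 - P(T = t_a \mid Z = z_a).
\end{aligned}
$$

These agree with a direct reading of Table 9: only s1 chooses ta under zbc, and all response-types except s5 and s7 choose ta under za.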

Corollary C-3

Assume the IV model (1)–(3) for which unordered monotonicity A-3 holds. The mean counterfactual outcomes for the t-Always-takers and t-Switchers for each t ∈ supp(T) are generated by:

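$$
\begin{aligned}
E(Y(t) \mid t\text{-Always-takers}) &= E(Y(t) \mid S \in \Sigma_t(N_Z)) = \frac{b_t(N_Z)\, B_t^+ Q_Z(t)}{b_t(N_Z)\, B_t^+ P_Z(t)};\\
E(Y(t) \mid t\text{-Switchers}) &= \sum_{i=1}^{N_Z-1} E(Y(t) \mid S \in \Sigma_t(i)) \cdot \frac{P(S \in \Sigma_t(i))}{P(S \in t\text{-Switchers})}, \quad \text{where } \sum_{i=1}^{N_Z-1} \frac{P(S \in \Sigma_t(i))}{P(S \in t\text{-Switchers})} = 1;\\
\text{Alternatively, } E(Y(t) \mid t\text{-Switchers}) &= \frac{\Big(\sum_{i=1}^{N_Z-1} b_t(i)\Big) B_t^+ Q_Z(t)}{\Big(\sum_{i=1}^{N_Z-1} b_t(i)\Big) B_t^+ P_Z(t)}.
\end{aligned} \tag{64}
$$
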
Proof

See Web Appendix A.12.

Corollary C-2 relies on the result in T-6 that P(S ∈ Σt(i)) is identified for all i ∈ {1, …, NZ}. Corollary C-3 is obtained by setting κ(Y) = Y and using the fact that E(κ(Y(t))|S ∈ Σt(i)) is identified for all i ∈ {1, …, NZ}. To illustrate these corollaries we present the following example.

Remark 7.2

Corollary C-3 states that the mean counterfactual outcomes for response-types s ∈ supp(S) such that P(T = t|S = s) = 1 (the t-Always-takers) or 0 < P(T = t|S = s) < 1 (the t-Switchers) are identified. According to Remark 3.1, these response-types refer to the values v ∈ supp(V) such that 0 < P(T = t|V = v) ≤ 1. Therefore, C-3 implies that E(Y(t)|V ∈ {v; 0 < P(T = t|V = v) ≤ 1}) is identified. The remaining response-types are the t-Never-takers, which consist of the response-types s ∈ supp(S) such that P(T = t|S = s) = 0. This set refers to the set of values v ∈ supp(V) such that P(T = t|V = v) = 0. If the set of t-Never-takers is empty, then all response-types belong to either the t-Always-takers or the t-Switchers and E(Y(t)) is identified.

Example 7.1

According to C-3, the counterfactual outcome mean for ta-Switchers in the response matrix R of Table 3 is given by:

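$$
E(Y(t_a) \mid t_a\text{-Switchers}) = E(Y(t_a) \mid S \in \{s_2, s_3, s_4, s_6\}) = \frac{\Big(\sum_{i=1}^{2} b_{t_a}(i)\Big) B_{t_a}^+ Q_Z(t_a)}{\Big(\sum_{i=1}^{2} b_{t_a}(i)\Big) B_{t_a}^+ P_Z(t_a)} \tag{65}
$$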

The components of Equation (65) that can be estimated from observed data are:

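$$
\begin{aligned}
P_Z(t_a) &= \big[P(T = t_a \mid Z = z_{no}),\ P(T = t_a \mid Z = z_a),\ P(T = t_a \mid Z = z_{bc})\big]';\\
Q_Z(t_a) &= \big[E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_{no}),\ E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_a),\ E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_{bc})\big]'.
\end{aligned}
$$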

The components of (65) that depend on the response matrix are:

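$$
\sum_{i=1}^{2} b_{t_a}(i) = [0, 1, 1, 1, 0, 1, 0]; \qquad
B_{t_a} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 1 & 1 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\ \Rightarrow\
B_{t_a}^+ = \begin{bmatrix} 0 & 0 & 1\\ 1/2 & 0 & -1/2\\ 1/2 & 0 & -1/2\\ -1/2 & 1/2 & 0\\ 0 & 0 & 0\\ -1/2 & 1/2 & 0\\ 0 & 0 & 0 \end{bmatrix}.
$$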

Equation (65) produces the following expression:

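$$
E(Y(t_a) \mid t_a\text{-Switchers}) = \frac{E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_a) - E(Y \cdot \mathbf{1}[T = t_a] \mid Z = z_{bc})}{P(T = t_a \mid Z = z_a) - P(T = t_a \mid Z = z_{bc})}.
$$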
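The matrix algebra of this example can be checked directly. The sketch below, which is only a verification aid, compares the generalized inverse reported above with the Moore-Penrose inverse computed by NumPy and recovers the weights that the switcher indicator places on the three instrument values.

```python
import numpy as np

# B_ta for Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
B_ta = np.array([
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
], dtype=float)

# Generalized inverse as reported in Example 7.1.
B_pinv_reported = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, -0.5],
    [0.5, 0.0, -0.5],
    [-0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0],
    [-0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0],
])

# Indicator of the ta-Switchers {s2, s3, s4, s6}: the sum of b_ta(1) and b_ta(2).
b_switchers = np.array([0, 1, 1, 1, 0, 1, 0], dtype=float)

matches = np.allclose(np.linalg.pinv(B_ta), B_pinv_reported)
weights = b_switchers @ B_pinv_reported   # weights on (z_no, z_a, z_bc)
print(matches, weights)
```

The weights equal [0, 1, −1], so both the numerator and the denominator of (65) reduce to the difference between the za and zbc moments.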

Web Appendix A.17 presents additional results on identification. Web Appendix A.21 shows that $E(Y(t_a) - Y(\bar{t}_a) \mid t_a\text{-Switchers})$ is also identified. In contrast with the binary case, there is no response-type s ∈ {s1, …, s7} of response matrix R in Table 3 such that E(Y(t) − Y(t′)|S = s) is identified for any t, t′ ∈ {ta, tb, tc}.

It is important to distinguish identification of counterfactuals within sets of strata from the identification of treatment effects within sets of strata. An example is helpful. Consider the response matrix for a two-valued instrument with three treatment choices under unordered monotonicity. It has five response types:

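$$
\begin{array}{cc}
 & \begin{array}{ccccc} s_1 & s_2 & s_3 & s_4 & s_5 \end{array}\\
R = & \left[\begin{array}{ccccc} t_1 & t_1 & t_2 & t_3 & t_3\\ t_1 & t_2 & t_2 & t_2 & t_3 \end{array}\right]
\end{array}
$$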

Five counterfactual mean outcomes are identified: E(Y(t1)|S = s1), E(Y(t1)|S = s2), E(Y(t2)|S = s3), E(Y(t3)|S = s4), E(Y(t3)|S = s5). In addition, $E(Y(t_2) - Y(\bar{t}_2) \mid t_2\text{-Switchers})$ is identified. This is the treatment effect of t2 versus the next best treatment, which may vary among members of the t2-Switchers.61
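The identification of this effect can be traced step by step. The following is a sketch under the assumption that the first row of R corresponds to an instrument value labeled z1 and the second row to z2 (labels introduced here for illustration). The t2-Switchers are Σt2(1) = {s2, s4}, for whom the alternative to t2 is t1 (for s2) or t3 (for s4). Because Σt1(1) = {s2} and Σt3(1) = {s4}, T-6 identifies P(S = s2), P(S = s4), E(Y(t1)|S = s2) and E(Y(t3)|S = s4), while

$$
E(Y(t_2) \mid S \in \{s_2, s_4\}) = \frac{E(Y \cdot \mathbf{1}[T = t_2] \mid Z = z_2) - E(Y \cdot \mathbf{1}[T = t_2] \mid Z = z_1)}{P(T = t_2 \mid Z = z_2) - P(T = t_2 \mid Z = z_1)}.
$$

The treatment effect for the t2-Switchers is then

$$
E(Y(t_2) \mid S \in \{s_2, s_4\}) - \frac{E(Y(t_1) \mid S = s_2)\, P(S = s_2) + E(Y(t_3) \mid S = s_4)\, P(S = s_4)}{P(S = s_2) + P(S = s_4)},
$$

in which every term is identified.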

7.1 Maximum Number of Admissible Response-types to Secure Identification

The identification of strata probabilities P(S = s); s ∈ supp(S) can be achieved under weaker conditions than are required for identifying counterfactual outcomes. The identification of response-type probabilities depends on the column-rank of BT while the identification of counterfactual outcomes for a choice t ∈ supp(T) depends on the column-rank of Bt. The rank of BT is always at least as large as the rank of each Bt; t ∈ supp(T) because BT is generated by stacking Bt across t ∈ supp(T) (Equation (24)).

We characterize the maximum number of response-types NS in R that facilitate the identification of all response-type probabilities, that is, the maximum NS such that NS ≤ rank(BT).

Theorem T-7

Consider the IV model (1)–(3). Let R be the response matrix consisting of NS response-types. If response-type probabilities are point identified, then it must be the case that:

$$
N_S \le 1 + (N_T - 1)\, N_Z - \sum_{i=1}^{N_Z} \sum_{j=1}^{N_T} \mathbf{1}\big[P(T = t_j \mid Z = z_i) = 0\big],
$$

where NZ is the number of possible values that the instrument takes and NT is the number of possible values that the treatment choice T takes. In particular, if P(T = t|Z = z) > 0 for all z ∈ supp(Z) and t ∈ supp(T), then the maximum number of response-types NS in R for the model to be identified is:

$$
N_S = 1 + (N_T - 1)\, N_Z.
$$

Proof

See Web Appendix A.13.

To identify response-type probabilities, choice restrictions must eliminate at least $N_T^{N_Z} - [1 + (N_T - 1) N_Z]$ response-types. T-6 shows that even if we are not able to identify each probability P(S = s); s ∈ supp(S), it may still be possible to identify counterfactual means E(Y(t)|S ∈ Σt(i)) and associated probabilities P(S ∈ Σt(i)) for strata Σt(i). See Web Appendix A.14 for additional results on identification of strata probabilities.
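The counting argument can be made concrete for the car-voucher example, where NT = 3 and NZ = 3. The sketch below is illustrative: the function names are hypothetical, and BT is built by stacking the indicator matrices Bt of Table 3 as described in the text.

```python
import numpy as np

def max_identified_types(n_t, n_z, n_zero_cells=0):
    """Upper bound on N_S from Theorem T-7; n_zero_cells counts P(T=t|Z=z) = 0."""
    return 1 + (n_t - 1) * n_z - n_zero_cells

def min_types_to_eliminate(n_t, n_z):
    """Response-types that choice restrictions must rule out (no zero cells)."""
    return n_t ** n_z - max_identified_types(n_t, n_z)

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["ta", "ta", "ta", "tb", "tb", "tc", "tc"],
    ["ta", "ta", "ta", "ta", "tb", "ta", "tc"],
    ["ta", "tb", "tc", "tb", "tb", "tc", "tc"],
])
B_T = np.vstack([(R == t).astype(int) for t in ("ta", "tb", "tc")])

bound = max_identified_types(3, 3)
eliminated = min_types_to_eliminate(3, 3)
rank_B_T = np.linalg.matrix_rank(B_T)
print(bound, eliminated, rank_B_T)
```

In this example the seven response-types of Table 3 attain the bound, and BT has full column rank, so NS ≤ rank(BT) holds with equality.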

8 Summary and Conclusions

This paper extends the literature on instrumental variables to general unordered choice models with heterogenous responses which affect choice. We generalize the notion of monotonicity to unordered choice models. Using discrete instruments, we present conditions under which certain counterfactuals and treatment effects associated with general unordered multiple discrete choice models can be identified. We demonstrate how to characterize the set of identified counterfactuals and treatment effects.

We represent IV equations using discrete mixtures. Identification is achieved by imposing restrictions on the kernels of the mixtures. We do not invoke separability of preferences or identification at infinity to achieve these results. Nonetheless, separability of choice equations emerges as one representation of the underlying choice process.

Unordered monotonicity can sometimes be justified by economic choice models. It can be represented in multiple ways. Unordered monotonicity implies and is implied by a form of separability in the equations generating choices. These representations are linked to properties of binary matrices that characterize the admissible response-types generated by the available instrumental variables. We develop a variety of criteria to determine if Unordered Monotonicity is satisfied. We interpret each of these criteria and explain how they can be used in practice. We show that “principal strata” in the statistics literature are coarse versions of control functions.

This paper demonstrates the power of binary matrices in generating and interpreting identification conditions and in unifying apparently diverse approaches to identification of mean counterfactuals and mean treatment effects. The broader lesson of this paper is that, in general unordered discrete choice models, restrictions on choice behavior, encoded in the kernel of the mixture representing the IV equations, play a fundamental role in identifying counterfactuals using instrumental variables.

Supplementary Material

Online Appendix

Acknowledgments

We thank Joshua Shea for a close and perceptive read of the paper and helpful comments. We thank the participants in various seminars at the University of Chicago and members of the audience at various regional meetings of the Econometric Society for helpful comments. This research was supported in part by: the American Bar Foundation; the Pritzker Children’s Initiative; the Buffett Early Childhood Fund; NIH grants NICHD R37HD065072, NICHD R01HD54702, and NIA R24AG048081. The views expressed in this paper are solely those of the authors and do not necessarily represent those of the funders or the official views of the National Institutes of Health. The Web Appendix for this paper is posted at https://cehd.uchicago.edu/monotonicity_identifiability.

Footnotes

1

See the statement of purpose for the Econometric Society by Ragnar Frisch (1933).

2

Theil (1953, 1958) developed two stage least-squares—the leading instrumental variable estimator.

3

By the economist.

4

This concept is more accurately interpreted as “uniformity” and does not correspond to ordinary mathematical definitions of monotonicity. See Heckman and Vytlacil (2005, 2007a).

6

For the binary choice model, these are the LATE parameters of Imbens and Angrist (1994). Their extension of LATE to situations with multiple choices assumes that indicators of choice are naturally ordered (e.g., years of schooling). It assumes a meaningful scalar aggregator can be constructed that is monotonic in the ordered indicators of choice (Angrist and Imbens, 1995). They identify a mixture of LATEs where the weights, but not the LATEs, are identified individually. In general, LATE does not identify a variety of policy relevant parameters. See Heckman and Vytlacil (2007b) or Heckman (2010).

8

Continuity of instruments and full support produce identifiability in our model, but are not required. See Heckman and Pinto (2015a). In related work, Lee and Salanié (2016) use a general framework to investigate multivalued choice models defined by separable threshold-crossing rules. They show that the identification of causal effects is possible with enough variation in instrumental variables that assume values on a continuum (“identification at infinity”).

9

For an empirical application of unordered monotonicity, see Pinto (2016a), who evaluates the Moving to Opportunity Experiment.

10

By policy-invariant, we mean functions whose maps remain invariant under manipulation of the arguments. This is the notion of autonomy developed by Frisch (1938) and Haavelmo (1944). For a recent discussion of these conditions, see Heckman and Pinto (2015b).

11

The assumption that Z is a multiple-valued scalar is a convenience. We can vectorize a matrix of instruments into a scalar form. Thus, we accommodate multiple instruments defined in the usual way.

12

Such error terms are often called “shocks” in structural equation models. fT is a random function that could be written as a deterministic function if we introduced shock εT of arbitrary dimension as an argument of the function, where εT is independent of V and εY.

13

Fixing is a causal operation that captures the notion of external (ceteris paribus) manipulation. It is a central concept in the study of causality and dates back to Haavelmo (1943). See Heckman and Pinto (2015b) for a recent discussion of fixing and causality.

14

Counterfactual E(Y (t)) and conditional expectation E(Y |T = t) differ if the conditional and unconditional distributions of V are different: E(Y (t)) = ∫ E(Y (t)|V = v)dFV (v) ≠ ∫ E(Y |V = v, T = t)dFV |T=t(v) = E(Y |T = t) where FV is the CDF of V and FV |T=t is the CDF of V conditional on T = t. See Heckman and Pinto (2015b).

15

Different notions of response vectors are used in the literature. In our notation, response vectors correspond to the choices a person of type V would make when confronted by different values of Z. Robins and Greenland (1992) initiated the literature. Frangakis and Rubin (2002) use the term “principal strata.” They do not explicitly model V or use the econometric framework (1)–(3) so the relationship between strata and V and the fact that conditioning on S is equivalent to conditioning on regions of V is only implicit in their analysis. T(z) can potentially take as many as NT values for each value z ∈ supp(Z). Since Z has |supp(Z)| = NZ elements, supp(S) can have at most $N_T^{N_Z}$ elements.

16

Figure B.3 in Web Appendix B displays our IV model with response vector S as a Directed Acyclic Graph (DAG).

17

The regions are distinct because fT (·) is a function.

18

S being a balancing score means that properties of V are inherited by S. Formally, S = fS(V) is a surjective function of V that satisfies Y(t) ⫫ T|V ⇒ Y(t) ⫫ T|fS(V), and σ(S) ⊆ σ(V), where σ denotes a σ-algebra in the probability space (Ω, ℱ, P).

19

See Heckman (2008) for a survey of a wide array of methods that implement this principle.

20

If we set κ(Y ) = Y, we equate expected values of observed outcomes with expected counterfactual outcomes. Setting κ(Y) = 1[Y ≤ y], we equate the cumulative distribution function (CDF) of the observed outcome with the unobserved CDF of counterfactual outcomes.

21

Candidates for X are baseline variables caused by V. Knowledge of the X variables helps to identify the observed characteristics of persons within strata.

22

Across the two equation systems for T and scalar Y there are (2·NT −1)·NZ observed quantities and (NT)NZ+1 unknown parameters.

23

See, e.g., Clogg (1995) and Henry et al. (2014).

24

The Moore-Penrose inverse of a matrix A is denoted by A+ and is defined by the four following properties: (1) AA+A = A; (2) A+AA+ = A+; (3) A+A is symmetric; (4) AA+ is symmetric. The Moore-Penrose matrix A+ of a real matrix A is unique and always exists (Magnus and Neudecker, 1999).

25

See Appendix A.3.

26

See Section A.4 of the Web Appendix for bounds on the response-type probabilities and counterfactual outcomes.

28

We also have that $B_{t_1} = \iota_{N_Z} \iota_{N_S}' - B_{t_0}$, where ιN denotes an N-dimensional vector of elements 1.

29

Imbens and Angrist (1994) do not use indicator functions. This is one innovation of our analysis. They compare the values of the counterfactual choices directly, e.g., Tω(z) ≥ Tω(z′), assuming the T are ordered. In their analysis, the values that choice T takes must be ordered. Our approach does not require T to be ordered. The two monotonicity criteria are equivalent for the binary choice model.

30

Recall R does not have redundant rows or columns. Otherwise stated,.

31

In Section 6, we present a generalization of the triangular property for matrices called “lonesum matrices.”

32

Heckman and Pinto (2015c) show that in the general case of multi-valued treatments, Ordered Monotonicity A-2 and unordered monotonicity A-3 do not imply each other.

34

If we assume an ordered choice model, we can readily secure identification. If we only assume a partially ordered model we lose identification. Heckman and Pinto (2015c) discuss these cases.

35

See the elimination analysis in Table D.1 of Web Appendix D.

36

Under this rationale, it follows that: Λω(zb, ta) = Λω(zno, ta), Λω(zb, tb) = Λω(zbc, tb), and Λω(zb, tc) = Λω(zno, tc).

37

See Pinto (2016a).

38

This would occur if utility is quasilinear in γ.

39

A stronger assumption is homothetic preferences on consumption goods.

40

See the elimination analysis in Table D.5 of Web Appendix D.

41

See the elimination analysis in Table D.6 of Web Appendix D.

42

See Table D.13 in Web Appendix D.3 for the elimination analysis.

43

This claim is formally proved in the next section using Condition (iii) of Theorem T-3.

44

This paper does not consider issues of estimation and inference. If certain parameters are over-identified from different instrument configurations, the obvious approach is to combine estimators using efficient GMM (Hansen, 1982).

45

See Ryser (1957), Brualdi (1980), Brualdi and Ryser (1991) and Sachnov and Tarakanov (2002) for surveys of the properties of binary matrices.

47

Alternatively, this can be written as: for any z, z′ ∈ supp(Z) and t ∈ supp(T), we have that:

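$$
\begin{aligned}
&(\mathbf{1}[T = t] \mid Z = z, V = v) \ge (\mathbf{1}[T = t] \mid Z = z', V = v) \ \text{ for all } v \in \mathrm{supp}(V)\\
\text{or}\quad &(\mathbf{1}[T = t] \mid Z = z, V = v) \le (\mathbf{1}[T = t] \mid Z = z', V = v) \ \text{ for all } v \in \mathrm{supp}(V).
\end{aligned}
$$
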
48

See Web Appendix E for a derivation.

49

Heckman and Pinto (2015c) show that lonesum matrices also play a key role in equivalence results for ordered monotonicity. Pinto (2016b) develops a framework for the design of social interventions using lonesum matrices that rely on revealed preference relationships to identify causal parameters. He shows that incentive designs based on lonesum matrices generate a range of monotonicity conditions.

50

Web Appendix F presents all of the 66 response matrices that consist of distinct sets of 7 response-types generated by unordered monotonicity A-3.

51

In this case, we obtain the following forbidden sub-matrix: $\begin{pmatrix} t_a & t_c\\ t_b & t_a \end{pmatrix}$.

52

These are termed “prohibited” or “forbidden” patterns of binary matrices (see Ryser, 1957).

53

Violation of Condition (ii) is not necessarily a violation of rationality. Table 6 is based on an application of WARP, but violates Condition (ii) and unordered monotonicity.

54

Recall that we eliminate all redundancies in the rows or columns of R.

55

With the caveat that we eliminate any redundancies in the rows and columns of R.

56

Web Appendix H discusses the threshold property of condition (iv) in greater detail.

57

Consider a binary choice model: T = 1[α + V · Z ≥ 0], T ∈ {0, 1}, where (α, V) is a random vector and V, Z, and α are scalar. Suppose that we impose the requirement that V > 0 while Z is unrestricted. This is a monotone response model that is nonseparable. However, it can be represented as a separable model: T = 1[α/V + Z ≥ 0]. This highlights the non-uniqueness of the representation of Ψ(t, Z, V) and that separability is only one characterization of preferences consistent with A-3.
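The equivalence of the two representations follows from dividing the index by V > 0, which preserves the sign of the inequality. A small exact check, offered as an illustration over a hypothetical grid of values rather than a proof, is:

```python
from fractions import Fraction
from itertools import product

def nonseparable(alpha, v, z):
    """Choice rule T = 1[alpha + V * Z >= 0]."""
    return int(alpha + v * z >= 0)

def separable(alpha, v, z):
    """Equivalent rule T = 1[alpha / V + Z >= 0], valid only when V > 0."""
    return int(alpha / v + z >= 0)

# Exact rational grids (hypothetical values) avoid floating-point ties at zero.
grid_alpha = [Fraction(a, 2) for a in range(-6, 7)]
grid_v = [Fraction(1, 3), Fraction(1), Fraction(5, 2)]   # all strictly positive
grid_z = [Fraction(z, 2) for z in range(-6, 7)]

agree = all(
    nonseparable(a, v, z) == separable(a, v, z)
    for a, v, z in product(grid_alpha, grid_v, grid_z)
)
print(agree)
```

If V were allowed to be negative, the division would flip the inequality and the two rules would no longer coincide, which is why the restriction V > 0 matters.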

58

This transformation does not change an agent’s preferences towards choices in supp(T):

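$$
\Psi(t, z, v) \ge \Psi(t', z, v) \iff \Big(\Psi(t, z, v) - \max_{t'' \in \mathrm{supp}(T) \setminus \{t\}} \Psi(t'', z, v)\Big) \ge \Big(\Psi(t', z, v) - \max_{t'' \in \mathrm{supp}(T) \setminus \{t'\}} \Psi(t'', z, v)\Big).
$$
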
59

In general, a richer class of treatment effects can be identified in the unordered model than in the ordered or binary models. See Heckman and Pinto (2016).

60

An alternative terminology would be t-Compliers.

61

See Heckman et al. (2006a). This analysis readily generalizes to any columns of switchers. This treatment effect can always be identified under the assumption of unordered monotonicity. Treatment effects for other categories of switchers can also be identified under these conditions. Additional treatment effects can sometimes be identified. See Heckman and Pinto (2016).

A version of this paper was first presented as the Presidential Address to the Econometric Society delivered in Taipei, June 22, 2014, under the title “Causal Models, Structural Equations, and Identification: Stratification and Instrumental Variables.” An earlier version was presented by Heckman at the European Seminar on Bayesian Econometrics in Vienna, November 1, 2012, hosted by Sylvia Fruehwirth-Schnatter, under the title “Causal Analysis After Haavelmo: Definitions and a Unified Analysis of Identification of Causal Models.”

Contributor Information

James J. Heckman, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, Phone: 773-702-0634

Rodrigo Pinto, Department of Economics, University of California Los Angeles, 8385 Bunche Hall, Los Angeles, CA 90095, Phone: 310-825-9528.

References

  1. Angrist JD, Imbens GW. Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity. Journal of the American Statistical Association. 1995;90:431–442.
  2. Angrist JD, Imbens GW, Rubin D. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association. 1996;91:444–455.
  3. Blevins JR. Nonparametric Identification of Dynamic Decision Processes with Discrete and Continuous Choices. Quantitative Economics. 2014;5:531–554.
  4. Brualdi RA. Matrices of zeros and ones with fixed row and column sum vectors. Linear Algebra and Its Applications. 1980;33:159–231.
  5. Brualdi RA, Ryser HJ. Encyclopedia of Mathematics and its Applications. New York, NY: Cambridge University Press; 1991. Combinatorial Matrix Theory.
  6. Carrasco M, Florens J-P, Renault E. Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier; 2007. pp. 5633–5751.
  7. Clogg CC. Latent Class Models. In: Arminger G, Clogg CC, Sobel ME, editors. Handbook of Statistical Modeling for the Social and Behavioral Sciences. chap. 6. New York: Plenum Press; 1995. pp. 311–359.
  8. Frangakis CE, Rubin D. Principal stratification in causal inference. Biometrics. 2002;58:21–29. doi: 10.1111/j.0006-341x.2002.00021.x.
  9. Frisch R. Editor’s Note. Econometrica. 1933;1:1–4.
  10. Frisch R. Paper given at League of Nations. 1938. Autonomy of Economic Relations: Statistical versus Theoretical Relations in Economic Macrodynamics. Reprinted in D.F. Hendry and M.S. Morgan (1995), The Foundations of Econometric Analysis, Cambridge University Press.
  11. Haavelmo T. The Statistical Implications of a System of Simultaneous Equations. Econometrica. 1943;11:1–12.
  12. Haavelmo T. The Probability Approach in Econometrics. Econometrica. 1944;12:iii–vi. 1–115.
  13. Hansen LP. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica. 1982;50:1029–1054.
  14. Heckman JJ. The Principles Underlying Evaluation Estimators with an Application to Matching. Annales d’Economie et de Statistiques. 2008;91–92:9–73.
  15. Heckman JJ. Building Bridges Between Structural and Program Evaluation Approaches to Evaluating Policy. Journal of Economic Literature. 2010;48:356–398. doi: 10.1257/jel.48.2.356.
  16. Heckman JJ, Moon SH, Pinto R, Savelyev PA, Shaikh A, Yavitz AQ. The Perry Preschool Project: A Reanalysis. University of Chicago, Department of Economics; 2006a. Unpublished manuscript.
  17. Heckman JJ, Pinto R. Alternative Ways to Identify PS. University of Chicago; 2015a. Aug, Unpublished manuscript. 2015.
  18. Heckman JJ, Pinto R. Causal Analysis after Haavelmo. Econometric Theory. 2015b;31:115–151. doi: 10.1017/S026646661400022X.
  19. Heckman JJ, Pinto R. Comparing Ordered and Unordered Choice Models. University of Chicago; 2015c. Jul, Unpublished manuscript. 2015.
  20. Heckman JJ, Pinto R. Ordered and Unordered Monotonicity. UCLA Economics; 2016. Unpublished manuscript.
  21. Heckman JJ, Robb R. Alternative Methods for Evaluating the Impact of Interventions: An Overview. Journal of Econometrics. 1985;30:239–267.
  22. Heckman JJ, Urzúa S, Vytlacil EJ. Understanding Instrumental Variables in Models with Essential Heterogeneity. Review of Economics and Statistics. 2006b;88:389–432.
  23. Heckman JJ, Urzúa S, Vytlacil EJ. Instrumental Variables in Models with Multiple Outcomes: The General Unordered Case. Annales d’Economie et de Statistique. 2008;91–92:151–174.
  24. Heckman JJ, Vytlacil EJ. Structural Equations, Treatment Effects and Econometric Policy Evaluation. Econometrica. 2005;73:669–738.
  25. Heckman JJ, Vytlacil EJ. Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier; 2007a. pp. 4779–4874. chap. 70.
  26. Heckman JJ, Vytlacil EJ. Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Economic Estimators to Evaluate Social Programs, and to Forecast Their Effects in New Environments. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier B. V; 2007b. pp. 4875–5143. chap. 71.
  27. Henry M, Kitamura Y, Salanié B. Partial Identification of Finite Mixtures in Econometric Models. Quantitative Economics. 2014;5:123–144.
  28. Imbens GW, Angrist JD. Identification and Estimation of Local Average Treatment Effects. Econometrica. 1994;62:467–475.
  29. Lee S, Salanié B. Identifying Effects of Multivalued Treatments. 2016. Unpublished manuscript.
  30. Magnus J, Neudecker H. Matrix Differential Calculus with Applications in Statistics and Econometrics. 2nd ed. Wiley; 1999.
  31. Pinto R. Unpublished paper based on Ph.D. Thesis. University of Chicago, Department of Economics; 2016a. Learning from Noncompliance in Social Experiments: The Case of Moving to Opportunity.
  32. Pinto R. Randomized Biased-Controlled Trials: Adding Incentives to the Design of Experiments. 2016b. Unpublished paper.
  33. Prakasa Rao BLS. Identifiability in Stochastic Models: Characterization of Probability Distributions. chap. 8. Boston, MA: Academic Press; 1992. Identifiability for Mixtures of Distributions; pp. 183–228.
  34. Quandt RE. A New Approach to Estimating Switching Regressions. Journal of the American Statistical Association. 1972;67:306–310.
  35. Robins JM, Greenland S. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
  36. Ryser H. Combinatorial properties of matrices of zeros and ones. Canadian Journal of Mathematics. 1957;9:371–377.
  37. Sachnov VN, Tarakanov VE. In: Combinatorics of Nonnegative Matrices. Kolchin Valentin F., translator. Providence, RI: American Mathematical Society; 2002. no. 213 in Translations of Mathematical Monographs.
  38. Theil H. Estimation and Simultaneous Correlation in Complete Equation Systems. The Hague: Central Planning Bureau; 1953. mimeographed memorandum.
  39. Theil H. Economic Forecasts and Policy. Amsterdam: North-Holland Publishing Company; 1958. no. 15 in Contributions to Economic Analysis.
  40. Vytlacil EJ. Independence, Monotonicity, and Latent Index Models: An Equivalence Result. Econometrica. 2002;70:331–341.
  41. Vytlacil EJ. Ordered Discrete Choice Selection Models: Equivalence, Nonequivalence, and Representation Results. Stanford University, Department of Economics; 2004. Unpublished manuscript.
  42. Vytlacil EJ. Ordered Discrete-Choice Selection Models and Local Average Treatment Effect Assumptions: Equivalence, Nonequivalence, and Representation Results. Review of Economics and Statistics. 2006;88:578–581.
  43. Yakowitz SJ, Spragins JD. On the Identifiability of Finite Mixtures. Annals of Mathematical Statistics. 1968;39:209–214.
