Unordered Monotonicity

James J Heckman; Rodrigo Pinto

doi:10.3982/ECTA13777

Author manuscript; available in PMC: 2019 Jan 1.

Published in final edited form as: Econometrica. 2018 Jan;86(1):35. doi: 10.3982/ECTA13777


PMCID: PMC5822751 NIHMSID: NIHMS938978 PMID: 29479110

Abstract

This paper defines and analyzes a new monotonicity condition for the identification of counterfactuals and treatment effects in unordered discrete choice models with multiple treatments, heterogeneous agents and discrete-valued instruments. Unordered monotonicity implies and is implied by additive separability of choice of treatment equations in terms of observed and unobserved variables. These results follow from properties of binary matrices developed in this paper. We investigate conditions under which unordered monotonicity arises as a consequence of choice behavior. We characterize IV estimators of counterfactuals as solutions to discrete mixture problems.

Keywords: Instrumental Variables, Monotonicity, Revealed Preference, Generalized Roy Model, Binary Matrices, Discrete Choice, Selection Bias, Identification, Discrete Mixtures

JEL codes: I21, C93, J15, V16

1 Introduction

The evaluation of economic policies is a central goal of econometrics.¹ Economists have long used instrumental variables (IV) to identify policy-relevant parameters.² Early econometricians used IV to identify parameters in systems of linear simultaneous equations. In that framework, economists can safely be agnostic about the models generating choices in estimating a variety of interesting policy counterfactuals if their instruments satisfy rank and exogeneity conditions.

This agnostic stance is not justified in models with heterogeneous responses in which decisions to take treatment are based on unobserved³ components of those responses. Without additional assumptions, instrumental variables do not identify interpretable causal parameters. Choice mechanisms play a fundamental role in interpreting what instruments identify.

For binary and ordered versions of IV models, Imbens and Angrist (1994) show that monotonicity facilitates identification of certain instrument-defined causal parameters. This condition requires that responses to changes in instruments move all people toward or against the same choices.⁴ It is a condition about the uniformity of the direction of responses across persons in response to changes in instruments.⁵ In binary and ordered choice models, monotonicity coupled with standard IV assumptions permits economists to identify certain causal effects on outcomes of changes in the choices induced by variation in the instruments, with different instruments generally identifying different parameters.⁶

For a nonparametric binary choice Generalized Roy model, Vytlacil (2002) shows that monotonicity is equivalent to assuming that treatment choice equations can be characterized by an additively-separable latent-variable threshold-crossing model. Separability is defined in terms of observed and unobserved (by the economist) variables. Vytlacil (2006) extends his analysis to the case of ordered multiple choice models where the order is placed on the possible outcome variables (e.g., years of schooling).

This paper contributes to the literature by extending the analysis of instrumental variables to a general model of unordered choices. We develop a natural generalization of monotonicity—unordered monotonicity—that applies to models with multiple choices without a natural order among the choice values. For example, the choice of a pet among the set { cat, dog, bird } is only ordered by the preferences of agents across choices and not necessarily by the characteristics of the outcomes of choices. Unordered monotonicity preserves the intuitive notion of weak uniformity of responses to changes in instruments across persons without assuming any cardinalization on choice outcomes. We demonstrate how unordered monotonicity arises in choice models and examine which counterfactuals and causal parameters and weights are identified by different configurations of instruments. Like its counterpart in ordered choice models, unordered monotonicity identifies a mixture of LATEs with identifiable weights. We cannot identify the causal effect subcomponent of the mixture of LATEs, but we can identify certain counterfactuals.

Identification of causal effects in unordered choice models is studied by Heckman et al. (2006b), Heckman and Vytlacil (2007b) and Heckman et al. (2008) who identify a variety of economically interpretable treatment effects. They assume that the equations generating treatment choices are governed by additively separable threshold-crossing models. Their identification strategy relies critically on instruments that assume values on a continuum. They also invoke “identification at infinity,” as does a large literature in structural economics.⁷ In this paper, we show that these assumptions can be relaxed and identification of causal effects can still be secured. We rely only on discrete-valued instruments—the case commonly encountered in empirical work.⁸

This paper introduces economists to the identifying and interpretive power of binary matrices. We state the necessary and sufficient conditions for identification of counterfactuals in terms of conditions on binary matrices. We establish an equivalence result that connects unordered monotonicity and separability of choice equations. Separability is not imposed on the underlying choice equations.

However, unordered monotonicity implies and is implied by representations of choice equations that are additively separable in observed and unobserved variables. We show that this equivalence result stems from the properties of binary matrices that characterize choice sets. We determine the counterfactual outcomes that are identified under unordered monotonicity and present equations that facilitate estimation of the identified parameters.⁹

This paper proceeds in the following way. Section 2 defines a general model of multiple choices and categorical instrumental variables. Section 3 presents a general framework for studying identification of counterfactuals and causal parameters in the general model. Our framework is based on partitioning the population into strata corresponding to counterfactual treatment choices. Section 4 presents a new characterization of the IV identification problem using a finite mixture model with restrictions on admissible vectors of counterfactual choices. We state necessary and sufficient conditions for identifying causal parameters. We illustrate these conditions for a binary choice (LATE) model. We show the simplicity and power of our analytical framework by deriving Vytlacil’s equivalence result (2002) in a transparent way. Section 5 defines unordered monotonicity and illustrates how it arises in choice-theoretic models. Section 6 presents equivalence theorems that relate the properties of unordered monotonicity and the separability of choice equations. We interpret these equivalence results in light of economic theory. Section 7 applies this analysis to identify causal parameters. We establish the role of choice theory in securing identifiability. Section 8 concludes.

2 A Choice-Theoretic Model of Instrumental Variables

Our model consists of five (possibly vector-valued) random variables defined on a probability space (Ω, ℱ, P), two policy-invariant (vector) equations that determine causal relationships among the variables, and an independence condition:¹⁰

Choice Equations: T = f_{T} (Z, V)

(1)

Outcome Equations: Y = f_{Y} (T, V, ε_{Y})

(2)

Independence Condition: V, Z, ε_{Y} are mutually independent,

(3)

Variables (Z, T, Y, ε_Y,V ) have the following properties. P1: Instrument Z is a categorical random variable with support supp(Z) = {z₁, … , z_{N_Z}};¹¹ P2: Treatment (or Choice) indicator T is a discrete-valued random variable with support supp(T) = {t₁, … , t_{N_T}}; P3: Y is an observed random variable denoting outcomes arising from treatment; P4: ε_Y is an unobserved error term;¹² P5: V is a confounder—an unobserved random vector (possibly infinite dimensional) affecting both choices and outcomes. We assume that the expectation of each component of Y exists. We also assume that the distribution of T varies conditional on each value of Z, that is, P(T = t|Z = z) > 0 for all t ∈ supp(T) and z ∈ supp(Z). Vector (Z_ω; T_ω; Y_ω;V_ω) denotes the realization of these variables for an element ω ∈ Ω. To simplify notation, background variables unaffected by treatment are kept implicit. Our analysis is conditional on such variables.

Counterfactual outcome Y (t) is defined by fixing the argument T of the outcome Equation (2) to t ∈ supp(T), that is, Y (t) = f_Y (t,V, ε_Y ). The observed outcome Y (Equation (2)) is the output of a Quandt (1972) switching regression model:

Y = \sum_{t \in supp (T)} Y (t) \cdot 1 [T = t] \equiv Y (T),

(4)

where 1[α] is an indicator function that takes value 1 if α is true and 0 otherwise. Counterfactual choice T(z) = f_T (z,V ) is defined by fixing the argument Z of the choice equation (1) to z ∈ supp(Z).¹³ Observed choice is given by

T = \sum_{z \in supp (Z)} T (z) \cdot 1 [Z = z] \equiv T (Z) .

(5)
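As a minimal sketch of Equations (4) and (5), the following Python fragment builds the observed choice and outcome of a single agent from that agent's counterfactual choices and outcomes. The integer codes for instrument and treatment values, and all counterfactual values, are hypothetical and chosen only for illustration.

```python
# Hypothetical supports, coded as integers for illustration only.
supp_Z = [0, 1]        # instrument values z
supp_T = [1, 2, 3]     # treatment values t

# Hypothetical counterfactuals for one agent.
T_cf = {0: 1, 1: 3}                  # T(z): choice if Z were fixed at z
Y_cf = {1: 10.0, 2: 12.0, 3: 15.0}   # Y(t): outcome if T were fixed at t

def observed_choice(z_realized):
    # Equation (5): T = sum over z of T(z) * 1[Z = z]
    return sum(T_cf[z] * (z_realized == z) for z in supp_Z)

def observed_outcome(t_realized):
    # Equation (4): Y = sum over t of Y(t) * 1[T = t]
    return sum(Y_cf[t] * (t_realized == t) for t in supp_T)

T_obs = observed_choice(1)        # suppose the agent is assigned Z = 1
Y_obs = observed_outcome(T_obs)   # only Y(T) is observed; Y(1), Y(2) stay latent
```

Only one entry of each counterfactual dictionary is ever revealed by the data, which is the source of the identification problem studied in the rest of the paper.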

Remark 2.1

The binary Generalized Roy Model (Heckman and Vytlacil, 2007a) is a special case of this model in which V is a scalar random variable V, the choice is binary T ∈ {0, 1}, and the choice equation is defined by an indicator function that is separable in Z and V, namely T = f_T (Z, V ) ≡ 1[τ (Z) ≥ V ]. In this paper, we analyze multiple choices and impose no restriction on the functional forms of the choice equations (1) or outcome equations (2). Instead, we make restrictions on counterfactual choices and examine how those restrictions affect the characterization of choice equations.

Independence condition (3) generates the following properties:

Exclusion Restriction: (V, Y (t)) ⫫ Z

(6)

Conditional Independence (Matching) Property: Y (t) ⫫ T ∣ V .

(7)

Equation (6) states that instrument Z is independent of counterfactual outcome Y (t) and the confounding variable V that generates selection bias. It implies that instrument Z affects Y only through its effect on T. Equation (7) states that Y (t) is independent of treatment choice T after conditioning on V. Counterfactual outcomes can be evaluated by conditioning on V :

E (Y (t) ∣ V) = E (Y (t) ∣ V, T = t) = E ((\sum_{t^{'} \in supp (T)} Y (t^{'}) \cdot 1 [T = t^{'}]) ∣ V, T = t) = E (Y ∣ V, T = t) .

(8)

Any solution to the problem of selection bias requires that the analyst control for, or balance, unobserved V across treatment and control states.¹⁴

We control for V by partitioning the sample space Ω so that the treatment indicator T is independent of counterfactual outcomes within each partition set. Consider a partition of Ω: $Ω = \cup_{n = 1}^{N} Ω_{n}$; Ω_n ∩ Ω_{n′} = ∅, ∀ n, n′ ∈ {1, … , N}, n ≠ n′, with an associated indicator H_ω that takes the value n ∈ {1, … , N} if ω ∈ Ω_n, i.e., $H_{ω} = \sum_{n = 1}^{N} n \cdot 1 [ω \in Ω_{n}]$. If the following relationship holds within each partition set,

Y (t) ⫫ T ∣ (H = n); \forall n \in {1, \dots, N},

(9)

T is effectively randomly assigned conditional on H = n. If such partitions were known, one could apply the logic underlying Equation (8) to evaluate counterfactual outcome E(Y (t)|H = n) using E(Y |T = t,H = n). If T takes the value t with strictly positive probability in all partition sets, i.e., P(T = t|H = n) > 0; n ∈ {1, … , N}, E(Y (t)) can be constructed from $E (Y (t)) = \sum_{n = 1}^{N} E (Y ∣ T = t, H = n) P (H = n)$. Our identification strategy uses instrumental variable Z to generate partitions $\{Ω_{n}\}_{n = 1}^{N}$ that satisfy Equation (9). To do so we use response vectors, which we define next.

3 Response Vectors and Identifying or Bounding of Mean Counterfactuals and Weights on Counterfactuals

Response Vector S is defined as an N_Z-dimensional random vector of counterfactual treatment choices T for Z fixed at each value of its support:

S = [T (z_{1}), \dots, T (z_{N_{Z}})]^{'} = [f_{T} (z_{1}, V), \dots, f_{T} (z_{N_{Z}}, V)]^{'} \equiv f_{S} (V),

(10)

where T(z) denotes a counterfactual treatment choice when instrumental variable Z is fixed at z ∈ supp(Z). Let supp(S) = {s₁, · · · , s_{N_S}} denote the finite support of S. The N_Z-dimensional vectors s ∈ supp(S) are termed response-types or strata.¹⁵ S plays a fundamental role in our analysis. T is related to S in the following way:

T = [1 [Z = z_{1}], \dots, 1 [Z = z_{N_{Z}}]] \cdot S \equiv g_{T} (S, Z) .¹⁶

(11)

Equation (10) uses the fact that after fixing Z = z, S is a function only of unobserved V. Conditioning on S effectively conditions on the regions of V that map into S by Equation (10).¹⁷ It is a coarse way of conditioning on V.
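The sense in which S is a coarse way of conditioning on V can be illustrated with a small sketch. The scalar support of V, the cutoff values, and the choice rule `f_T` below are hypothetical assumptions made for illustration; the paper itself places no functional-form restriction on the choice equation.

```python
from collections import defaultdict

supp_Z = ["z1", "z2"]
supp_V = range(6)   # hypothetical discrete support for a scalar V

def f_T(z, v):
    # Hypothetical choice rule: a cutoff on v that shifts with the instrument.
    cutoff = {"z1": 2, "z2": 4}[z]
    return "t1" if v < cutoff else "t2"

def f_S(v):
    # Equation (10): stack the counterfactual choices over supp(Z).
    return tuple(f_T(z, v) for z in supp_Z)

def g_T(s, z):
    # Equation (11): the realized Z selects one entry of the response vector.
    return s[supp_Z.index(z)]

# Group values of V by the response-type they generate.
strata = defaultdict(list)
for v in supp_V:
    strata[f_S(v)].append(v)
```

In this toy example six values of V collapse into three response-types, and `g_T(f_S(v), z)` reproduces `f_T(z, v)` for every pair, so conditioning on S loses no information that matters for the choice of treatment.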

3.1 Properties of Response Vectors

Lemma L-1 establishes four useful properties of response vectors analogous to properties shared with V.

Lemma L-1

The following relationships for S hold for IV model (1)–(3):

(i) Y (t) ⫫ T|S, (ii) S ⫫ Z, (iii) Y ⫫ T|(S, Z), (iv) Y ⫫ Z|(S, T).

Proof

See Web Appendix A.1.

Relationship (i) states that counterfactual outcomes Y (t) for all t ∈ supp(T) are independent of treatment choices conditional on S. Thus S shares the same conditional independence (matching) properties as V in (7). Relationship (ii) states that the potential treatment choices in S are independent of the instrumental variables. Relationship (iii) states that outcomes are independent of treatment choices conditional on S and Z. Indeed, from (11), T is deterministic conditional on S and Z. Relationship (iv) is closely related to (iii). It states that outcome Y is independent of instrumental variable Z when conditioned on S and T.

Remark 3.1

Response vector S generates a partition of the sample space Ω that has independence property (9). Function f_S : supp(V ) → supp(S) in (10) is constructed using function f_T defined by (1). Thus, for each ω ∈ Ω, there is a single value v ∈ supp(V ) such that V_ω = v and a single value s ∈ supp(S) such that f_S(v) = s. We define a partition of the sample space Ω by:

Ω_{n} = {ω \in Ω; f_{S} (V_{ω}) = s_{n}} for each s_{n} \in supp (S) .

(12)

In partition (12), S_ω = s_n and ω ∈ Ω_n are equivalent. This partition satisfies (9) because Y (t) ⫫ T|(ω ∈ Ω_n) holds due to item (i) of Lemma L-1. Hence treatment choice can be interpreted as being randomly assigned conditional on S. Indeed, conditional on S, treatment T only depends on Z which is statistically independent of V.

Response vector S is a balancing score for V.¹⁸ It exploits the properties of instruments Z to generate a coarse partition of unobserved variable V while maintaining the independence properties arising from conditioning on V. The matching condition Y (t) ⫫ T|S is analogous to Y (t) ⫫ T|V in (7). If S (or V ) were known, counterfactual outcomes (conditional on S (or V )) can be identified by conditioning on S or V.¹⁹ Thus, S plays the role of a control function (Heckman and Robb, 1985). From Equation (8), Y (t) ⫫ T|S implies that E(Y (t)|S = s) = E(Y |T = t,S = s). If P(T = t|S = s) > 0 for all s ∈ supp(S), counterfactual mean outcomes can be expressed as:

E (Y (t)) = \sum_{s \in supp (S)} E (Y (t) ∣ S = s) P (S = s) = \sum_{s \in supp (S)} E (Y ∣ T = t, S = s) P (S = s) .

(13)

S acts as a coarse surrogate for V and identifies treatment effects within strata by balancing unobservables V across treatment states.

3.2 The Strata Identification Problem

The problem of identifying counterfactual mean outcomes defined for each stratum consists of identifying unobserved E(Y (t)|S = s) and P(S = s) for s ∈ supp(S) and t ∈ supp(T), from observed E(Y |T = t, Z = z) and P(T = t|Z = z) for z ∈ supp(Z) and t ∈ supp(T). Theorem T-1 uses the relationships of Lemma L-1 to express unobserved objects in terms of observed ones.

Theorem T-1

The following equality holds for the IV model (1)–(3):

E (κ (Y) \cdot 1 [T = t] ∣ Z) = \sum_{s \in supp (S)} 1 [T = t ∣ S = s, Z] E (κ (Y (t)) ∣ S = s) P (S = s),

(14)

where κ : supp(Y ) → ℝ is an arbitrary known function.

Proof

See Web Appendix A.2.

Setting κ(Y ) to 1 generates the propensity score equality:²⁰

P (T = t ∣ Z = z) = \sum_{s \in supp (S)} 1 [T = t ∣ S = s, Z = z] P (S = s) .

(15)
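Equation (15) can be evaluated mechanically once the strata and their probabilities are given. The sketch below does so for a hypothetical set of four response-types with three treatments and two instrument values; the strata and the probabilities in `P_S` are invented for illustration and are not taken from the paper.

```python
from fractions import Fraction as F

supp_T = ["t1", "t2", "t3"]

# Hypothetical strata: each tuple lists (T(z1), T(z2)).
strata = [("t1", "t1"), ("t1", "t2"), ("t3", "t2"), ("t2", "t3")]
P_S = [F(1, 5), F(2, 5), F(3, 10), F(1, 10)]   # hypothetical P(S = s)

def propensity(t, i):
    # Equation (15): sum over strata of 1[T = t | S = s, Z = z_i] * P(S = s).
    # The indicator is deterministic: it equals 1 exactly when s[i] == t.
    return sum(p for s, p in zip(strata, P_S) if s[i] == t)

# Observed propensity scores implied by the latent strata.
table = {(t, i): propensity(t, i) for t in supp_T for i in range(2)}
```

The identification problem runs in the opposite direction: the analyst observes `table` and seeks to recover `P_S`.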

Replacing κ(Y ) by any variable X such that X ⫫ T|S, we obtain:²¹

E (X ∣ T = t, Z) P (T = t ∣ Z) = \sum_{s \in supp (S)} 1 [T = t ∣ S = s, Z] E (X ∣ S = s) P (S = s) .

(16)

Remark 3.2

Equation (14) characterizes the problem of identifying counterfactual outcomes within strata. There are N_Z observed objects on the left-hand side for each t ∈ supp(T) totalling N_Z · N_T. Without further restrictions, the total number of latent response-types on the right-hand side is $N_{T}^{N_{Z}}$ , i.e., the number of strata. Thus, the number of observed quantities (N_T · N_Z) grows linearly in N_Z while the number of possible response-types ( $N_{T}^{N_{Z}}$ ) grows geometrically in N_Z.²² Identification requires that constraints be placed on the number of admissible strata (S). Choice theory can produce such restrictions, as can other assumptions, such as those about functional forms.
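The counting argument in this remark can be made concrete by enumerating unrestricted response-types. The following sketch assumes three hypothetical treatment labels and compares the number of observed quantities with the number of strata as N_Z increases.

```python
from itertools import product

supp_T = ("t1", "t2", "t3")   # hypothetical treatment labels, N_T = 3

def unrestricted_strata(supp_T, n_Z):
    # Without restrictions, every map from supp(Z) to supp(T) is a response-type.
    return list(product(supp_T, repeat=n_Z))

# For each N_Z: (number of observed quantities, number of unrestricted strata).
counts = {
    n_Z: (len(supp_T) * n_Z, len(unrestricted_strata(supp_T, n_Z)))
    for n_Z in (2, 3, 4)
}
```

With N_T = 3 the observed quantities number 6, 9, and 12, while the strata number 9, 27, and 81, which is why restrictions on admissible strata are needed.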

Indicator 1[T = t|S = s, Z = z] in Equation (14) is deterministic because T is deterministic given Z and S in Equation (11). Our identification strategy develops economically interpretable restrictions on these indicators that govern the choice of treatment as Z varies. Such restrictions reduce the number of admissible response-types and characterize the indicators 1[T = t|S = s, Z = z], facilitating identification of causal parameters.

We note, for later use, that the probability of treatment choice conditional on response-types is

P (T = t ∣ S = s) = \sum_{z \in supp (Z)} 1 [T = t ∣ S = s, Z = z] P (Z = z ∣ S = s) = \sum_{z \in supp (Z)} 1 [T = t ∣ S = s, Z = z] P (Z = z),

(17)

where the last equality is a consequence of S ⫫ Z (item (ii) of Lemma L-1).

Note that Equation (14) is a discrete mixture latent class model, a feature we exploit below.²³ Our paper differs from previous work on nonparametric instrumental variables. Instead of forming the usual nonparametric IV moment equations (see, e.g., Carrasco et al., 2007), we use instruments to construct strata that generate the kernels of finite mixture equations and choice theory to place restrictions on the kernels. We then use finite mixture methods to examine the identification of the individual causal parameters on the right-hand side of Equations (14)–(16).

4 Identifying Response Probabilities and Counterfactual Outcomes

We now present general conditions for identifying response probabilities, counterfactual outcomes, and pre-program variables conditioned on strata. To do so, it is useful to express Equations (14)–(15) as a system of linear equations. Define P_Z(t) = [P(T = t|Z = z₁), … , P (T = t|Z = z_{N_Z})]′, the vector of observed choice probabilities (“propensity scores”). Define P_Z as the vector that stacks P_Z(t) across t ∈ supp(T): P_Z = [P_Z(t₁), … ,P_Z(t_{N_T})]′. Q_Z(t) is defined in an analogous fashion for outcomes defined for different values of T (i.e., multiplied by the treatment indicators). In a similar fashion, L_Z(t) stands for vector X such that X ⫫ T|S, Z. The left-hand sides of Equations (14) and (16) are given respectively by: Q_Z(t) = [E(κ(Y ) · 1[T = t]|Z = z₁), … , E(κ(Y ) · 1[T = t]|Z = z_{N_Z})]′, and L_Z(t) = [E(X · 1[T = t]|Z = z₁), … , E(X · 1[T = t]|Z = z_{N_Z})]′, where L_Z = [L_Z(t₁), … ,L_Z(t_{N_T})]′.

Let P_S be the vector of unobserved response probabilities P_S = [P(S = s₁), … , P(S = s_{N_S})] ′ and L_S = [E(X·1[S = s₁]), … , E(X·1[S = s_{N_S}])]′ be the unobserved vector of X-expectations times response indicators. We denote the vector of the expected outcomes multiplied by response indicators by: Q_S(t) = [E(κ(Y (t)) · 1[S = s₁]), … , E(κ(Y (t)) · 1[S = s_{N_S}])]′.

The following notation and concepts are used throughout the rest of this paper. Define response matrix R as an array of response-types defined over supp(S), i.e., R = [s₁, … , s_{N_S}]. To avoid trivial degeneracies we delete redundant rows (where different values of Z produce the same pattern for T) and redundant columns (where the same choices are made for the same value of Z). Matrix R has dimension N_Z×N_S. An element in the i-th row and n-th column of R is denoted by R[i, n] = (T|Z = z_i,S = s_n); i ∈ {1, · · · ,N_Z}, n ∈ {1, … , N_S}. We use R[i, ·] to denote the i-th row of R and R[·, n] to denote the n-th column of R.

Let B_t denote a binary matrix of the same dimension as R and whose elements take value 1 if the respective element in R is equal to t and zero otherwise. Notationally, we define an element in the i-th row and n-th column of matrix B_t by B_t[i, n] = 1[T = t|Z = z_i, S = s_n]; i ∈ {1, · · · ,N_Z}, n ∈ {1, … , N_S}. We also use the short-hand notation B_t = 1[R = t] to denote B_t. Let B_T be a binary matrix of dimension (N_Z · N_T ) × N_S generated by stacking B_t as t ranges over supp(T): $B_{T} = [B_{t_{1}}^{'}, \dots, B_{t_{N_{T}}}^{'}]^{'}$.
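The construction of B_t and B_T from a response matrix is purely mechanical, as the sketch below suggests. The response matrix `R` used here is a hypothetical example with two instrument values, three treatments, and four strata; it is not one of the matrices analyzed in the paper.

```python
supp_T = ["t1", "t2", "t3"]

# Hypothetical response matrix R: rows index z, columns index strata.
R = [
    ["t1", "t1", "t3", "t2"],   # T(z1) for s1, ..., s4
    ["t1", "t2", "t2", "t3"],   # T(z2) for s1, ..., s4
]

def indicator_matrix(R, t):
    # B_t = 1[R = t], elementwise.
    return [[1 if entry == t else 0 for entry in row] for row in R]

B = {t: indicator_matrix(R, t) for t in supp_T}

# B_T stacks the B_t vertically, giving dimension (N_Z * N_T) x N_S.
B_T = [row for t in supp_T for row in B[t]]

# Each stratum picks exactly one treatment per instrument value,
# so every column of B_T sums to N_Z.
column_sums = [sum(col) for col in zip(*B_T)]
```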

In this notation, Equations (14), (15), and (16) can be written respectively as

Q_{Z} (t) = B_{t} Q_{S} (t),

(18)

P_{Z} = B_{T} P_{S}

(19)

L_{Z} = B_{T} L_{S} .

(20)

If B_t and B_T were invertible, Q_S(t), P_S, and L_S would be identified. However, such inverses do not always exist. In their place, we can use generalized inverses.

Let $B_{T}^{+}$ and $B_{t}^{+}$ be the Moore-Penrose pseudo-inverses²⁴ of matrices B_T and B_t; t ∈ supp(T) respectively. The following expressions are useful for characterizing the identification of response probabilities and counterfactual means:

K_{T} = I_{N_{S}} - B_{T}^{+} B_{T} and K_{t} = I_{N_{S}} - B_{t}^{+} B_{t}; t \in supp (T),

(21)

where I_{N_S} denotes an identity matrix of dimension N_S. K_T and K_t are orthogonal projection matrices that depend only on binary matrices B_T and B_t; t ∈ supp(T).²⁵

Applying the Moore-Penrose inverse to (18) and (19), we obtain:

P_{S} = B_{T}^{+} P_{Z} + K_{T} λ

(22)

Q_{S} (t) = B_{t}^{+} Q_{Z} (t) + K_{t} \tilde{λ}

(23)

where λ and λ̃ are arbitrary N_S-dimensional vectors (same dimension as P_S). In this notation, Theorem T-2 states general conditions for identification of response probabilities and counterfactual means:

Theorem T-2

For IV model (1)–(3), if there exists a real-valued N_S-dimensional vector ξ such that ξ′K_T = 0, then ξ′P_S and ξ′L_S are identified. In addition, if there exists a real-valued N_S-dimensional vector ζ such that ζ′K_t = 0, then ζ′Q_S(t) is identified.

Proof

See Web Appendix A.3.

Theorem T-2 shows the identifying properties of the response matrix. For example, suppose that B_T has full column-rank. Then $B_{T}^{+} B_{T} = I_{N_{S}}$ and K_T = 0. Therefore ξ′P_S is identified for any real vector ξ of dimension N_S. In particular, ξ′P_S is identified when ξ is set to be each column vector of the identity matrix I_{N_S}. In that case, each n-th column of I_{N_S} identifies P(S = s_n) and all the response-type probabilities are identified.²⁶

Note that full-rank for B_T does not imply full-rank for each B_t; t ∈ supp(T). Therefore, the identification of the response-type probabilities does not automatically produce identification of corresponding mean counterfactual outcomes. Corollary C-1 formalizes this discussion.

Corollary C-1

The following relationships hold for the IV model (1)–(3):

Vectors P_{S} and L_{S} are point-identified \Leftrightarrow rank (B_{T}) = N_{S} .

(24)

Vector Q_{S} (t) is point-identified \Leftrightarrow rank (B_{t}) = N_{S},

(25)

Also, if (25) holds, then E(κ(Y(t))) is identified by $ι^{'} B_{t}^{+} Q_{Z} (t)$, where ι is an N_S-dimensional vector of 1s.

Proof

See Web Appendix A.5.

Versions of Corollary C-1 are found in the literature on the identifiability of finite mixtures.²⁷ Given binary matrices B_T and B_t; t ∈ {t₁, …, t_{N_T}}, the problem of identifying P_S, L_S and Q_S(t) is equivalent to the problem of identifying finite mixtures of distributions where the B_T and B_t play the roles of kernels of mixtures. Mixture components are the corresponding counterfactual outcomes conditional on the response types and mixture probabilities are the response-type probabilities.

One approach to identifiability is to simply assume that conditions (24) and (25) apply to R. A more satisfactory approach, and the one taken here and in Pinto (2016a), investigates how alternative specifications of choice relationships generate response matrices R that satisfy the identifiability requirements of Theorem T-2 and Corollary C-1.

It is important to note that we have given conditions for identifying counterfactual means within strata, E(Y(t)|S). Treatment effects are derived from, but are distinct from, these counterfactual means. Mean treatment effects are comparisons of different counterfactuals within the same set of strata: E(Y(t) − Y(t′)|S ∈ Σ) for t ≠ t′, where Σ ⊆ supp(S) is a subset of strata that might consist of a single element. In Section 7 we discuss identification of mean treatment effects, which is a more demanding problem.

4.1 Example: Binary Choice (LATE)

To familiarize the reader with our notation and concepts, and anticipate our generalization of it, consider the binary choice model implicit in the Local Average Treatment Effect – LATE. Treatment variable T takes two values: T_ω = t₁ if agent ω chooses to be treated and T_ω = t₀ if not. Instrument Z is binary valued (supp(Z) = {z₀, z₁}) with the property 0 < P(T = t₁|Z = z₀) < P(T = t₁|Z = z₁) < 1. A standard example is the problem of identifying the causal effect of college education on income Y. Agent ω decides between going to college (T_ω = t₁) or not (T_ω = t₀). Instrumental variable Z represents randomly assigned college scholarships. For example, Z_ω = z₁ if a scholarship is assigned to agent ω and Z_ω = z₀ if agent ω does not receive a scholarship.

The response vector is S = [T(z₀), T(z₁)]′. Without further restrictions, S can take four possible values described by the following response matrix:

\begin{array}{ccc} & \begin{array}{cccc} s_{1} & s_{2} & s_{3} & s_{4} \end{array} & \\ R = & \left[ \begin{array}{cccc} t_{1} & t_{0} & t_{1} & t_{0} \\ t_{1} & t_{1} & t_{0} & t_{0} \end{array} \right] & \begin{array}{l} \text{values for } T (z_{0}) \\ \text{values for } T (z_{1}) \end{array} \end{array} .

(26)

In the language of LATE, the response-types s₁, s₂, s₃, s₄ are always-takers, compliers, defiers, and never-takers, respectively. B_{t₁} is the binary matrix that has the same dimension as R, whose elements take value 1 if the corresponding element in R is t₁ and value 0 if the element in R is t₀. Thus, B_{t₁} = 1[R = t₁] and B_{t₀} = 1[R = t₀] indicate whether elements in R are equal to t₁ or t₀, respectively.²⁸

The 4 × 4 binary matrix $B_{T} = {[\begin{matrix} B_{t_{0}}^{'}, & B_{t_{1}}^{'} \end{matrix}]}^{'}$ has rank equal to 3, which is less than the number of response-types N_S = 4. Therefore, by C-1, neither response-type probabilities nor the counterfactual outcomes are point identified. To identify them, it is necessary to reduce the number of response-types.
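The rank deficiency of the unrestricted binary model can be checked with exact arithmetic. The sketch below builds B_T from the response matrix in (26) and computes its rank by Gaussian elimination over rationals; the helper `rank` is written here for illustration and is not part of the paper.

```python
from fractions import Fraction

def rank(M):
    # Rank by Gauss-Jordan elimination with exact rational arithmetic.
    A = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(n_rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# Response matrix (26): columns are s1, s2, s3, s4; rows are T(z0), T(z1).
R = [["t1", "t0", "t1", "t0"],
     ["t1", "t1", "t0", "t0"]]

B_t0 = [[1 if x == "t0" else 0 for x in row] for row in R]
B_t1 = [[1 if x == "t1" else 0 for x in row] for row in R]
B_T = B_t0 + B_t1          # stack B_t0 on top of B_t1

rank_B_T = rank(B_T)       # compare with the number of response-types, 4
```

The deficiency arises because the rows of B_t0 and B_t1 that correspond to the same instrument value always sum to a vector of ones.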

LATE solves this non-identification problem by assuming that each agent ω can only change his decision in one direction as the instrument varies. The monotonicity condition of Imbens and Angrist (1994) is:

Assumption A-1. Monotonicity for the Binary Choice Model

The following inequalities hold for any z, z′ ∈ supp(Z):

1 [T_{ω} (z) = t_{1}] \geq 1 [T_{ω} (z^{'}) = t_{1}] \forall ω \in Ω or 1 [T_{ω} (z) = t_{1}] \leq 1 [T_{ω} (z^{'}) = t_{1}] \forall ω \in Ω .²⁹

(27)

In our example, condition A-1 assumes that each agent is inclined to decide towards college if a scholarship is granted, i.e., 1[T_ω(z₁) = t₁] ≥ 1[T_ω(z₀) = t₁] for all ω ∈ Ω. This eliminates the response-type s₃ (the defiers) in matrix (26), generating the following matrices:

\begin{array}{cccccccc} & \begin{array}{ccc} s_{1} & s_{2} & s_{4} \end{array} & & \begin{array}{ccc} s_{1} & s_{2} & s_{4} \end{array} & & \begin{array}{ccc} s_{1} & s_{2} & s_{4} \end{array} & & \\ R = & \left[ \begin{array}{ccc} t_{1} & t_{0} & t_{0} \\ t_{1} & t_{1} & t_{0} \end{array} \right], & B_{t_{1}} = & \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 1 & 1 & 0 \end{array} \right], & B_{t_{0}} = & \left[ \begin{array}{ccc} 0 & 1 & 1 \\ 0 & 0 & 1 \end{array} \right], & B_{T} = & \left[ \begin{array}{c} B_{t_{0}} \\ B_{t_{1}} \end{array} \right] . \end{array}

(28)

Under monotonicity condition A-1 the three response-type probabilities (P(s₁), P(s₂), P(s₄)) and the four counterfactual outcomes (E(Y(t₀)|S = s₂), E(Y(t₀)|S = s₄), E(Y(t₁)|S = s₁), E(Y(t₁)|S = s₂)) are identified. These claims can be demonstrated by applying T-2 and C-1. For instance, the rank of the binary matrix B_T in (28) is 3, which is also the number of response-types. Thus, by C-1, all the response probabilities P_S are identified. The identification of counterfactual outcomes depends on the properties of matrices K_{t₀}, K_{t₁} that are calculated using the pseudo-inverse matrices $B_{t_{0}}^{+}, B_{t_{1}}^{+}$ as described in (21):

B_{t_{0}}^{+} = [\begin{matrix} 0 & 0 \\ 1 & - 1 \\ 0 & 1 \end{matrix}] \Rightarrow K_{t_{0}} = [\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}] and B_{t_{1}}^{+} = [\begin{matrix} 1 & 0 \\ - 1 & 1 \\ 0 & 0 \end{matrix}] \Rightarrow K_{t_{1}} = [\begin{matrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{matrix}] .
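These pseudo-inverses and projection matrices can be reproduced with exact arithmetic. The sketch below relies on the standard closed form B⁺ = B′(BB′)⁻¹, which is valid only when B has full row rank, as both 2 × 3 matrices in (28) do. The helper functions are illustrative and are not taken from the paper.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_full_row_rank(B):
    # Moore-Penrose inverse of a two-row matrix with full row rank: B'(BB')^{-1}.
    Bt = transpose(B)
    return matmul(Bt, inverse_2x2(matmul(B, Bt)))

def K_matrix(B):
    # Equation (21): K = I - B^+ B.
    P = matmul(pinv_full_row_rank(B), B)
    n = len(P)
    return [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]

def row_times(zeta, K):
    # Row vector zeta' times matrix K.
    return [sum(z * K[i][j] for i, z in enumerate(zeta)) for j in range(len(K[0]))]

# Matrices from (28); columns are s1, s2, s4 and rows are z0, z1.
B_t0 = [[F(0), F(1), F(1)], [F(0), F(0), F(1)]]
B_t1 = [[F(1), F(0), F(0)], [F(1), F(1), F(0)]]

pinv_t0, pinv_t1 = pinv_full_row_rank(B_t0), pinv_full_row_rank(B_t1)
K_t0, K_t1 = K_matrix(B_t0), K_matrix(B_t1)

# Theorem T-2 check: zeta' K_t = 0 means zeta' Q_S(t) is identified.
zeta = [0, 1, 0]   # selects the compliers s2
complier_means_identified = (
    row_times(zeta, K_t0) == [0, 0, 0] and row_times(zeta, K_t1) == [0, 0, 0]
)
y0_for_s1_identified = row_times([1, 0, 0], K_t0) == [0, 0, 0]
```

The last check fails, which matches the fact that always-takers are never observed in state t₀.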

The observed vectors of propensity scores and conditional outcome expectations are P_Z(t) = [P(T = t|Z = z₀), P(T = t|Z = z₁)]′ and Q_Z(t) = [E(Y · 1[T = t]|Z = z₀), E(Y · 1[T = t]|Z = z₁)]′, for t ∈ {t₁, t₀}. The unobserved 3×1 vectors of response-type probabilities and counterfactual outcomes are given by

P_{S} = {[P (S = s_{1}), P (S = s_{2}), P (S = s_{4})]}^{'}

(29)

and

Q_{S} (t) = [E (Y (t) ∣ S = s_{1}) P (S = s_{1}), E (Y (t) ∣ S = s_{2}) P (S = s_{2}), E (Y (t) ∣ S = s_{4}) P (S = s_{4})]^{'} .

(30)

Equations (29) and (30) enable us to write the counterfactual E(Y(t₀)|S = s₂) as $E (Y (t_{0}) ∣ S = s_{2}) = \frac{ζ^{'} Q_{S} (t_{0})}{ζ^{'} P_{S}}$ where ζ = [0, 1, 0]′, so that ζ′P_S = P(S = s₂) is the population probability of the switchers. Note that ζ′K_{t₀} = 0; thus, by T-2, E(Y(t₀)|S = s₂) is identified. From Equations (22)–(23), we have:

E (Y (t_{0}) ∣ S = s_{2}) = \frac{ζ^{'} B_{t_{0}}^{+} Q_{Z} (t_{0})}{ζ^{'} B_{t_{0}}^{+} P_{Z} (t_{0})} = \frac{E (Y \cdot 1 [T = t_{0}] ∣ Z = z_{0}) - E (Y \cdot 1 [T = t_{0}] ∣ Z = z_{1})}{P (T = t_{0} ∣ Z = z_{0}) - P (T = t_{0} ∣ Z = z_{1})} .

By a parallel argument, the counterfactual outcome for the compliers in state t₁ can be written as $E (Y (t_{1}) ∣ S = s_{2}) = \frac{ζ^{'} Q_{S} (t_{1})}{ζ^{'} P_{S}}$. Since ζ′K_{t₁} = 0, by T-2, E(Y(t₁)|S = s₂) is identified from the expression

E (Y (t_{1}) ∣ S = s_{2}) = \frac{ζ^{'} B_{t_{1}}^{+} Q_{Z} (t_{1})}{ζ^{'} B_{t_{1}}^{+} P_{Z} (t_{1})} = \frac{E (Y \cdot 1 [T = t_{1}] ∣ Z = z_{1}) - E (Y \cdot 1 [T = t_{1}] ∣ Z = z_{0})}{P (T = t_{1} ∣ Z = z_{1}) - P (T = t_{1} ∣ Z = z_{0})} .

LATE is the causal effect for compliers E(Y(t₁) − Y(t₀)|S = s₂). Since P(T = t₀|Z = z) = 1 − P(T = t₁|Z = z), $ζ^{'} B_{t_{1}}^{+} P_{Z} (t_{1}) = ζ^{'} B_{t_{0}}^{+} P_{Z} (t_{0}) = P (S = s_{2})$ . Putting these ingredients together,

E (Y (t_{1}) - Y (t_{0}) ∣ S = s_{2}) = \frac{ζ^{'} (B_{t_{1}}^{+} Q_{Z} (t_{1}) - B_{t_{0}}^{+} Q_{Z} (t_{0}))}{ζ^{'} B_{t_{1}}^{+} P_{Z} (t_{1})} = \frac{E (Y ∣ Z = z_{1}) - E (Y ∣ Z = z_{0})}{P (T = t_{1} ∣ Z = z_{1}) - P (T = t_{1} ∣ Z = z_{0})} .
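A small population-level check may help fix ideas. The sketch below assumes hypothetical response-type probabilities and hypothetical stratum means, generates the observed quantities through Equations (18) and (19), and then evaluates the ratio of differences across instrument values. None of the numbers come from the paper.

```python
from fractions import Fraction as F

# Columns: s1 (always-takers), s2 (compliers), s4 (never-takers).
P_S = [F(1, 5), F(1, 2), F(3, 10)]   # hypothetical P(S = s)
mean_Y1 = [F(10), F(8), F(7)]        # hypothetical E(Y(t1) | S = s)
mean_Y0 = [F(6), F(5), F(3)]         # hypothetical E(Y(t0) | S = s)

# Binary matrices from (28); rows are z0, z1.
B_t1 = [[1, 0, 0], [1, 1, 0]]
B_t0 = [[0, 1, 1], [0, 0, 1]]

def apply(B, v):
    return [sum(b * x for b, x in zip(row, v)) for row in B]

Q_S1 = [m * p for m, p in zip(mean_Y1, P_S)]
Q_S0 = [m * p for m, p in zip(mean_Y0, P_S)]

# Observed objects implied by the latent ones.
Q_Z1 = apply(B_t1, Q_S1)   # E(Y 1[T = t1] | Z = z)
Q_Z0 = apply(B_t0, Q_S0)   # E(Y 1[T = t0] | Z = z)
P_Z1 = apply(B_t1, P_S)    # P(T = t1 | Z = z)
P_Z0 = apply(B_t0, P_S)    # P(T = t0 | Z = z)
E_Y = [a + b for a, b in zip(Q_Z1, Q_Z0)]   # E(Y | Z = z)

# Complier counterfactual means and the ratio for the complier effect.
EY1_compliers = (Q_Z1[1] - Q_Z1[0]) / (P_Z1[1] - P_Z1[0])
EY0_compliers = (Q_Z0[0] - Q_Z0[1]) / (P_Z0[0] - P_Z0[1])
wald = (E_Y[1] - E_Y[0]) / (P_Z1[1] - P_Z1[0])
```

In this example the ratio recovers the assumed complier effect exactly, while the unobservable entries `mean_Y1[2]` and `mean_Y0[0]` never enter the observed quantities.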

LATE is the causal effect conditioned on the values of V associated with stratum s₂. It does not identify the average treatment effect E(Y(t₁) − Y(t₀)) because we cannot identify Y(t₁) for s₄ (t₀-always-taker) nor Y(t₀) for s₁ (t₁-always-taker). The counterfactual outcomes for the always-takers can be expressed in terms of Q_S(t) and P_S by:

E (Y (t_{0}) ∣ S = s_{4}) = \frac{ζ_{0}^{'} Q_{S} (t_{0})}{ζ_{0}^{'} P_{S}}; ζ_{0} = {[0, 0, 1]}^{'} and E (Y (t_{1}) ∣ S = s_{1}) = \frac{ζ_{1}^{'} Q_{S} (t_{1})}{ζ_{1}^{'} P_{S}}; ζ_{1} = {[1, 0, 0]}^{'} .

Since $ζ_{0}^{'} K_{t_{0}} = 0$ and $ζ_{1}^{'} K_{t_{1}} = 0$ , by Theorem T-2, E(Y(t₀)|S = s₄) and E(Y(t₁)|S = s₁) are identified. In Section 7, we use the properties of the generalized inverse to extend our analysis to a general model of multiple choices and extend the notion of compliers to the general unordered choice model.

4.2 Revisiting Vytlacil’s Equivalence Theorem

A by-product of our analysis is a simple derivation of Vytlacil’s (2002) fundamental equivalence result. He shows that monotonicity condition A-1 holds if and only if the treatment choice can be expressed as a function that is separable in Z and V, i.e., there exist deterministic functions, φ: supp(V) → ℝ and τ: supp(Z) → ℝ such that:

(1 [T = t_{1}] ∣ V = v, Z = z) = 1 [τ (z) \geq φ (v)] .

(31)

Monotonicity A-1 generates a key property of the binary matrix B_t_₁ = 1[R = t₁]. We can always reorder its rows and columns so that B_t_₁ becomes a lower-triangular matrix.³⁰ Consider the binary choice model where T takes values in {t₀, t₁} and Z takes values in {z₁, …, z_{N_Z}} that are indexed by increasing values of the propensity score, i.e., P(T = t₁|z₁) ≤ ··· ≤ P(T = t₁|z_{N_Z}). Arrange the columns of binary matrix B_t_₁ in decreasing order of the column-sums. Under Monotonicity A-1, B_t_₁ has dimension N_Z ×(N_Z +1) and is lower triangular. An explicit expression for B_t_₁ is given by Equation (28) for N_Z = 2.³¹ Under triangularity, for all i ∈ {1 ···, N_Z}, n ∈ {1, ···, N_Z + 1},

B_{t_{1}} [i, n] = 1 for i \geq n and B_{t_{1}} [i, n] = 0 for i < n .

(32)

Propensity score equality (15) generates the following expressions:

P (T = t_{1} ∣ Z = z_{i}) = \sum_{n^{'} = 1}^{N_{S}} 1 [T = t_{1} ∣ Z = z_{i}, S = s_{n^{'}}] \cdot P (S = s_{n^{'}}) = \sum_{n^{'} = 1}^{N_{Z} + 1} B_{t_{1}} [i, n^{'}] \cdot P (S = s_{n^{'}}) = \sum_{n^{'} = 1}^{i} P (S = s_{n^{'}}) .

(33)

The second equality uses the definition of the element in the i-th row and n′-th column of B_{t₁}, that is, B_{t₁}[i, n′] = 1[T = t₁|Z = z_i, S = s_{n′}], and the fact that N_S = N_Z + 1 under monotonicity A-1. The third equality uses triangularity property (32). Thus the following inequalities hold:

Since P (T = t_{1} ∣ Z = z_{i}) = \sum_{n^{'} = 1}^{i} P (S = s_{n^{'}}), then P (T = t_{1} ∣ Z = z_{i}) \geq \sum_{n^{'} = 1}^{n} P (S = s_{n^{'}}) for i \geq n

(34)

and P (T = t_{1} ∣ Z = z_{i}) < \sum_{n^{'} = 1}^{n} P (S = s_{n^{'}}) for i < n .

(35)

We can combine Equations (32) and (34)–(35) to express the elements B_t_₁[i, n] as:

B_{t_{1}} [i, n] = 1 [P (T = t_{1} ∣ Z = z_{i}) \geq ϕ (s_{n})],

(36)

where ϕ (s_{n}) = P (S \in {s_{1}, \dots, s_{n}}) = \sum_{n^{'} = 1}^{n} P (S = s_{n^{'}}) .

(37)

Vytlacil’s theorem emerges since B_t_₁[i, n] = 1[T = t₁|Z = z_i, S = s_n] and S is a balancing score for V, i.e., S = f_S(V). Thus, for any v ∈ supp(V) there is an s ∈ supp(S) such that s = f_S(v), and

1 [T = t_{1} ∣ Z = z, V = v] = 1 [T = t_{1} ∣ Z = z, S = f_{S} (v)] = 1 [\underset{τ (z)}{\underset{︸}{P (T = t_{1} ∣ Z = z)}} \geq \underset{φ (v)}{\underset{︸}{ϕ (f_{S} (v))}}]

(38)

This expression captures the key idea that the response variable S summarizes V. Section 6 establishes separability properties for a general unordered choice model. The triangularity property generating separability carries over to that general setting.
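
The step from triangularity (32) to the threshold representation (36)–(37) can be checked numerically. The sketch below assumes N_Z = 3 and a hypothetical vector of strictly positive response-type probabilities; the probabilities are dyadic fractions so that the floating-point comparisons are exact.

```python
import numpy as np

N_Z = 3

# Lower-triangular B_{t1} of dimension N_Z x (N_Z + 1), as in (32):
# B[i, n] = 1 for i >= n and 0 otherwise (1-based indices).
i_idx = np.arange(1, N_Z + 1)[:, None]
n_idx = np.arange(1, N_Z + 2)[None, :]
B = (i_idx >= n_idx).astype(int)

# Hypothetical response-type probabilities P(S = s_n), summing to one.
P_S = np.array([0.125, 0.25, 0.125, 0.5])

propensity = B @ P_S   # P(T = t1 | Z = z_i), Equation (33)
phi = np.cumsum(P_S)   # phi(s_n) = P(S in {s_1, ..., s_n}), Equation (37)

# Equation (36): B[i, n] = 1[P(T = t1 | Z = z_i) >= phi(s_n)].
B_from_threshold = (propensity[:, None] >= phi[None, :]).astype(int)
```

With strictly positive probabilities, `B_from_threshold` coincides with `B`, which is the separable representation with τ(z) given by the propensity score and φ given by the cumulative response-type probability.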

5 Multiple Unordered Choices

In the published literature, when LATE is extended to analyze multiple choices, T is assumed to be a scalar index defined over an ordered finite set of natural numbers {1, …, N_T} where the index is monotonically increasing (or decreasing) in the indicators of t (Angrist and Imbens, 1995). Treatment effects are defined in terms of variations in this index:

Assumption A-2. Ordered Monotonicity

The following inequalities hold for any z, z′ ∈ supp(Z), and each treatment t ∈ supp(T):

T_{ω} (z) \geq T_{ω} (z^{'}) \forall ω \in Ω or T_{ω} (z) \leq T_{ω} (z^{'}) \forall ω \in Ω .

(39)

Under standard assumptions about IV, A-2 is equivalent to the assumption that choices are generated by an ordered choice model (Vytlacil, 2004). To extend monotonicity to the unordered case, we retain the core feature of a monotonic relationship: shifts in Z move all agents toward or against making treatment choice t in supp(T). We do not require any order among the values of T, nor do we rely on a scalar representation of T. Instead, we replace comparisons of T with inequalities that compare indicator functions of the values taken by T for each pair of values z, z′ in supp(Z). If the support of T has no natural order, Assumption A-2 is meaningless.

This section extends the literature to define a concept of monotonicity for an unordered choice model. We discuss restrictions on the response matrix R that follow from this definition. We present some examples that build intuition.

5.1 Monotonicity for Unordered Models

Assumption A-3. Unordered Monotonicity

The following inequalities hold for any z, z′ ∈ supp(Z), and each treatment t ∈ supp(T):

1 [T_{ω} (z) = t] \geq 1 [T_{ω} (z^{'}) = t] \forall ω \in Ω or 1 [T_{ω} (z) = t] \leq 1 [T_{ω} (z^{'}) = t] \forall ω \in Ω,

(40)

where 1[T_ω(z) = t] indicates whether or not agent ω chooses treatment t ∈ supp(T) when Z is set to z.

Using indicator functions, we can make pairwise comparisons for all values of Z for each choice t ∈ supp(T) without imposing an arbitrary ordering on the values of the treatment choices T or creating a scalar index of T. Condition (40) preserves the key intuitive notion of monotonicity: a shift in an instrument moves all agents uniformly toward or against each possible choice. A-3 prohibits non-uniform movements induced by the instruments; such two-way flows are ruled out in Theorem T-3 below.

In the case of binary treatment, Ordered Monotonicity A-2 and unordered monotonicity A-3 generate the same monotonicity restriction A-1.³² In Appendix C, we present a simple example that demonstrates the benefits of using choice indicators rather than cardinal measures of outcomes to define monotonicity.

5.2 Linking Unordered Monotonicity to Choice Theory

Under unordered monotonicity, treatment choice can be characterized as the solution to a problem in which agents maximize utility Ψ(t, z, v), the utility arising from choosing t ∈ supp(T) for agent ω whose unobserved variable V takes value v when the instrument Z is set at z. We present a formal analysis of the properties of Ψ(t, z, v) generated by unordered monotonicity in Section 6. In this section we build economic intuition of how unordered monotonicity arises. We use revealed preference arguments to restrict R and generate monotonicity conditions. We give examples where plausible restrictions on choice theory, coupled with standard instrumental variable conditions, produce identification of various strata counterfactuals and response-type probabilities. We also examine cases in which the point identification of response-type probabilities fails.

Consider a model of car purchase in which each agent buys a single car from three possible options: {a, b, c}. Let T_ω = t_j if agent ω buys car j in supp(T) = {t_a, t_b, t_c}. Instruments are randomly assigned car-specific vouchers that offer price discounts to the car (or cars) specified by an offered voucher. We use z_a, z_b, z_c for vouchers that offer a discount to cars a, b and c respectively. We use z_bc for the voucher whose discount can be used to buy car b or c. z_no denotes no discount. If the voucher assigned to agent ω is z_a, then he faces a price discount if he decides to buy car a. Agent ω pays full price if he decides to buy car b or c. If the agent were assigned voucher z_bc, then cars b and c would become cheaper while car a would remain at full price. We compare experimental designs that randomly assign different combinations of 3 out of the 5 voucher-types described above. Each agent ω is assumed to buy some car. In this section and in Web Appendix D, we give some examples of how choice restrictions facilitate identification and where they fail.

Our main example carried throughout the rest of this paper considers vouchers in supp(Z) = {z_no, z_a, z_bc}. The response vector S is given by the 3-dimensional vector of counterfactual choices: S = [T(z_no), T(z_a), T(z_bc)]′. Each of the three counterfactual choices T(z); z ∈ {z_no, z_a, z_bc} takes values in {t_a, t_b, t_c}, which gives a total of 27 (= 3³) possible response-types.³³ Without restrictions on admissible strata, the model of strata-contingent counterfactuals is not identified.³⁴ There are four intuitive monotonicity relationships arising from changes in z:

1 [T_{ω} (z_{n o}) = t_{a}] \leq 1 [T_{ω} (z_{a}) = t_{a}],

(41)

1 [T_{ω} (z_{b c}) = t_{a}] \leq 1 [T_{ω} (z_{a}) = t_{a}],

(42)

1 [T_{ω} (z_{n o}) \in {t_{b}, t_{c}}] \leq 1 [T_{ω} (z_{b c}) \in {t_{b}, t_{c}}],

(43)

1 [T_{ω} (z_{a}) \in {t_{b}, t_{c}}] \leq 1 [T_{ω} (z_{b c}) \in {t_{b}, t_{c}}] .

(44)

Relationship (41) states that the agent is induced toward buying car a when the instrument changes from no voucher (z_no) to a voucher for car a (z_a). Relationship (42) states that the agent is induced toward buying car a when the instrument changes from a voucher to buy b or c (z_bc) to a voucher for car a (z_a). Relationship (43) states that the agent is induced toward buying either car b or c when the instrument changes from no voucher (z_no) to a voucher for either car b or c (z_bc). Relationship (44) states that the agent is induced toward buying either car b or c when the instrument changes from a voucher for car a (z_a) to a voucher that applies to either car b or c (z_bc). Monotonicity relationships (41)–(44) eliminate 12 response-types out of the 27 possible ones, leaving the 15 admissible response-types presented in Table 1.³⁵

Table 1.

Response Matrix Generated by Monotonicity Relationships (41)–(44)

Instrumental Variables	Choices	Response-types of S
Instrumental Variables	Choices	s₁	s₂	s₃	s₄	s₅	s₆	s₇	s₈	s₉	s₁₀	s₁₁	s₁₂	s₁₃	s₁₄	s₁₅
No Voucher	T(z_no)	t_a	t_a	t_a	t_b	t_b	t_b	t_b	t_b	t_b	t_c	t_c	t_c	t_c	t_c	t_c
Voucher for a	T(z_a)	t_a	t_a	t_a	t_a	t_a	t_b	t_b	t_c	t_c	t_a	t_a	t_b	t_b	t_c	t_c
Voucher for b or c	T(z_bc)	t_a	t_b	t_c	t_b	t_c	t_b	t_c	t_b	t_c	t_b	t_c	t_b	t_c	t_b	t_c
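
The count of admissible response-types can be reproduced by brute-force enumeration. The following is a minimal sketch in which cars are coded by the hypothetical labels "a", "b", "c" and each response-type is a triple (T(z_no), T(z_a), T(z_bc)); the function name `admissible` is illustrative.

```python
from itertools import product

CARS = ("a", "b", "c")

def admissible(no, a, bc):
    """Check relationships (41)-(44) for one response-type.

    Each relationship 1[X] <= 1[Y] is evaluated as a comparison of booleans.
    """
    r41 = (no == "a") <= (a == "a")
    r42 = (bc == "a") <= (a == "a")
    r43 = (no in ("b", "c")) <= (bc in ("b", "c"))
    r44 = (a in ("b", "c")) <= (bc in ("b", "c"))
    return r41 and r42 and r43 and r44

# Enumerate all 27 candidate response-types and keep the admissible ones.
types = [s for s in product(CARS, repeat=3) if admissible(*s)]
```

The enumeration order of `itertools.product` happens to match the column order s₁, …, s₁₅ of Table 1.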


Thus, by Corollary C-1, our model for counterfactuals is not identified. In addition, some of the remaining strata are not consistent with unordered monotonicity A-3. More stringent application of revealed preference analysis can generate additional choice restrictions. Let Λ_ω(z, t) be the consumption set of agent ω when assigned instrument z ∈ supp(Z) when treatment is set to t ∈ supp(T). Let γ ∈ Λ_ω(z, t) represent a consumption good. Agent ω is assumed to maximize a utility function u_ω defined over consumption goods γ and choice t. Thus, the choice function Ch_ω: supp(Z) → supp(T) of agent ω when the instrument is set to value z ∈ supp(Z) is:

Ch_{ω} (z) = \underset{t \in supp (T)}{argmax} (\max_{γ \in Λ_{ω} (z, t)} u_{ω} (γ, t)) .

(45)

For budget set Λ_ω(z, t) for agent ω, we assume the following relationships:

Λ_{ω} (z_{n o}, t_{a}) = Λ_{ω} (z_{b c}, t_{a}) \subset Λ_{ω} (z_{a}, t_{a}),

(46)

Λ_{ω} (z_{n o}, t_{b}) = Λ_{ω} (z_{a}, t_{b}) \subset Λ_{ω} (z_{b c}, t_{b}),

(47)

Λ_{ω} (z_{n o}, t_{c}) = Λ_{ω} (z_{a}, t_{c}) \subset Λ_{ω} (z_{b c}, t_{c}) .

(48)

Relationship (46) compares the budget sets of agent ω for each possible voucher assignment given the car choice is fixed at a. The budget set of agent ω is enlarged when she has a voucher for car a (z_a) compared to when she does not (z_a is the only voucher that applies to car a). Thus, assigning consumer ω who buys car a, voucher z_a provides additional income. Vouchers z_no and z_bc offer no discount for car a and produce the same budget set for this choice. Relationship (47) examines the agent’s budget set if ω purchases car b. The budget set of agent ω is enlarged if she has a voucher that subsidizes car b when compared to vouchers that do not affect the choice set (z_a, z_no). Relationship (48) examines the agent’s budget set when car c is assigned and is consistent with the budget analysis of relationship (47).³⁶ For this example, the Weak Axiom of Revealed Preference (WARP) generates the following choice rule:

if Ch_{ω} (z) = t and Λ_{ω} (z, t) \subseteq Λ_{ω} (z^{'}, t) and Λ_{ω} (z^{'}, t^{'}) \subseteq Λ_{ω} (z, t^{'}), then Ch_{ω} (z^{'}) \neq t^{'} .

(49)

³⁷

In particular, Choice Rule (49) applied to budget set relationships (46)–(48) generates the choice restrictions 1–6 in Table 2.

Table 2.

Choice Restrictions Generated by Revealed Preference Analysis for supp(Z) = {z_no, z_a, z_bc}

Choice Restriction 1 : Ch_ω(z_no) = t_a ⇒ Ch_ω(z_a) = t_a
Choice Restriction 2 : Ch_ω(z_no) = t_b ⇒ Ch_ω(z_a) ≠ t_c	and Ch_ω(z_bc) ≠ t_a
Choice Restriction 3 : Ch_ω(z_no) = t_c ⇒ Ch_ω(z_a) ≠ t_b	and Ch_ω(z_bc) ≠ t_a
Choice Restriction 4 : Ch_ω(z_a) = t_b ⇒ Ch_ω(z_no) = t_b	and Ch_ω(z_bc) ≠ t_a
Choice Restriction 5 : Ch_ω(z_a) = t_c ⇒ Ch_ω(z_no) = t_c	and Ch_ω(z_bc) ≠ t_a
Choice Restriction 6 : Ch_ω(z_bc) = t_a ⇒ Ch_ω(z_no) = t_a	and Ch_ω(z_a) = t_a
Choice Restriction 7 : Ch_ω(z_no) ≠ t_a ⇒ Ch_ω(z_bc) = Ch_ω(z_no)


Under additional assumptions about choice, we generate additional restrictions on the admissible strata. It is reasonable to assume that if an agent decides to buy a car without a discount, then the agent will not alter his choice if assigned a voucher that makes his choice of car cheaper. Specifically, consider the agent who decides between cars b and c when voucher assignment shifts from z_no to z_bc. There is no discount under z_no whereas z_bc offers a discount for either car. If most of the income increase is spent on goods, then the agent’s car choice likely remains the same.³⁸ Under this condition, an income increase should not decrease the agent’s consumption of a good. If the agent is already consuming one unit of car b and his income is increased, then the agent will not decrease his car consumption; hence the agent still buys car b if the voucher changes from z_no to z_bc.³⁹ This restriction on choice generates Choice Restriction 7 in Table 2. The choice restrictions of Table 2 eliminate 20 out of the 27 possible response-types, generating the admissible response matrix in Table 3.⁴⁰

Table 3.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {z_no, z_a, z_bc}

Instrumental Variables	Choices	Response-types of S
Instrumental Variables	Choices	s₁	s₂	s₃	s₄	s₅	s₆	s₇
No Voucher	T(z_no)	t_a	t_a	t_a	t_b	t_b	t_c	t_c
Voucher for car a	T(z_a)	t_a	t_a	t_a	t_a	t_b	t_a	t_c
Voucher for car b or c	T(z_bc)	t_a	t_b	t_c	t_b	t_b	t_c	t_c
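
The seven response-types of Table 3 can be recovered by applying Choice Restrictions 1–7 of Table 2 to all 27 candidates. The sketch below is illustrative; the car labels and the helper names `implies` and `satisfies_table2` are hypothetical.

```python
from itertools import product

CARS = ("a", "b", "c")

def implies(p, q):
    """Material implication p => q."""
    return (not p) or q

def satisfies_table2(no, a, bc):
    """Choice Restrictions 1-7 for the triple (Ch(z_no), Ch(z_a), Ch(z_bc))."""
    return all([
        implies(no == "a", a == "a"),                  # Restriction 1
        implies(no == "b", a != "c" and bc != "a"),    # Restriction 2
        implies(no == "c", a != "b" and bc != "a"),    # Restriction 3
        implies(a == "b", no == "b" and bc != "a"),    # Restriction 4
        implies(a == "c", no == "c" and bc != "a"),    # Restriction 5
        implies(bc == "a", no == "a" and a == "a"),    # Restriction 6
        implies(no != "a", bc == no),                  # Restriction 7
    ])

table3 = [s for s in product(CARS, repeat=3) if satisfies_table2(*s)]
```

Each element of `table3` is one column of Table 3, read from top to bottom.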


For the response matrix of Table 3, the rank of the indicator matrix B_T associated with this response matrix is equal to 7 which is also equal to the number of response-types. From Corollary C-1, response-type probabilities are identified. We can also identify mean counterfactual outcomes defined in terms of the strata in the table. The response matrix of Table 3 is generated by the nine unordered monotonicity relationships of Table 4.⁴¹ The choice restrictions generated by the revealed preference analysis in Table 2 produce unordered monotonicity A-3.
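
The rank claim can be verified directly. The sketch below assumes that B_T is formed by stacking the indicator matrices B_t = 1[R = t] vertically over t ∈ supp(T); that stacking convention is an assumption here, since the definition of B_T appears earlier in the paper.

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    list("aaabbcc"),   # T(z_no)
    list("aaaabac"),   # T(z_a)
    list("abcbbcc"),   # T(z_bc)
])

# Assumed construction: stack B_ta, B_tb, B_tc on top of each other.
B_T = np.vstack([(R == t).astype(int) for t in "abc"])

rank = np.linalg.matrix_rank(B_T)
```

Under this construction B_T has nine rows and seven columns, and full column rank means that the linear system linking propensity scores to response-type probabilities has a unique solution.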

Table 4.

An Identified Pattern of Response Matrices

	Monotonicity Relationships	Implied Propensity Score Inequalities
Relation 1	1[T_ω(z_no) = t_a] ≤ 1[T_ω(z_a) = t_a]	P(T = t_a|Z = z_no) ≤ P(T = t_a|Z = z_a)
Relation 2	1[T_ω(z_no) = t_a] ≥ 1[T_ω(z_bc) = t_a]	P(T = t_a|Z = z_no) ≥ P(T = t_a|Z = z_bc)
Relation 3	1[T_ω(z_a) = t_a] ≥ 1[T_ω(z_bc) = t_a]	P(T = t_a|Z = z_a) ≥ P(T = t_a|Z = z_bc)

Relation 4	1[T_ω(z_no) = t_b] ≥ 1[T_ω(z_a) = t_b]	P(T = t_b|Z = z_no) ≥ P(T = t_b|Z = z_a)
Relation 5	1[T_ω(z_no) = t_b] ≤ 1[T_ω(z_bc) = t_b]	P(T = t_b|Z = z_no) ≤ P(T = t_b|Z = z_bc)
Relation 6	1[T_ω(z_a) = t_b] ≤ 1[T_ω(z_bc) = t_b]	P(T = t_b|Z = z_a) ≤ P(T = t_b|Z = z_bc)

Relation 7	1[T_ω(z_no) = t_c] ≥ 1[T_ω(z_a) = t_c]	P(T = t_c|Z = z_no) ≥ P(T = t_c|Z = z_a)
Relation 8	1[T_ω(z_no) = t_c] ≤ 1[T_ω(z_bc) = t_c]	P(T = t_c|Z = z_no) ≤ P(T = t_c|Z = z_bc)
Relation 9	1[T_ω(z_a) = t_c] ≤ 1[T_ω(z_bc) = t_c]	P(T = t_c|Z = z_a) ≤ P(T = t_c|Z = z_bc)


Remark 5.1

The response matrix in Table 3 is uniquely generated by the unordered monotonicity relationships of Table 4. By uniquely we mean that a change in the direction of any of these inequalities produces a response matrix that differs from the one in Table 3. This property is useful for testing the model assumptions as each monotonicity relationship implies a propensity score inequality that can be tested on observed data.
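
To illustrate how the nine monotonicity relationships translate into testable propensity score inequalities, the sketch below computes P(T = t|Z = z) as B_t P_S for the response matrix of Table 3. The response-type probabilities `P_S` are hypothetical values chosen for illustration, not estimates.

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([list("aaabbcc"), list("aaaabac"), list("abcbbcc")])

# Hypothetical response-type probabilities P(S = s_n).
P_S = np.array([0.2, 0.1, 0.1, 0.2, 0.1, 0.2, 0.1])

# Propensity scores P(T = t | Z = z) = B_t P_S, one vector per treatment.
prop = {t: (R == t).astype(float) @ P_S for t in "abc"}
NO, A, BC = 0, 1, 2  # row positions of z_no, z_a, z_bc

relations_hold = [
    prop["a"][NO] <= prop["a"][A],    # Relation 1
    prop["a"][NO] >= prop["a"][BC],   # Relation 2
    prop["a"][A] >= prop["a"][BC],    # Relation 3
    prop["b"][NO] >= prop["b"][A],    # Relation 4
    prop["b"][NO] <= prop["b"][BC],   # Relation 5
    prop["b"][A] <= prop["b"][BC],    # Relation 6
    prop["c"][NO] >= prop["c"][A],    # Relation 7
    prop["c"][NO] <= prop["c"][BC],   # Relation 8
    prop["c"][A] <= prop["c"][BC],    # Relation 9
]
```

In an application the entries of `prop` would be replaced by estimated choice frequencies conditional on Z, and each inequality would be assessed with a suitable statistical test.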

Unordered monotonicity can arise under different configurations of the instrumental variable. Thus, in the previous example, consider changing the support of the instrumental variable Z from {z_no, z_a, z_bc} to {z_no, z_b, z_bc}. We can apply the same revealed preference analysis of the first example to {z_no, z_b, z_bc}. This analysis generates the response matrix shown in Table 5 which is also uniquely generated by nine inequalities consistent with unordered monotonicity A-3. The response matrix also identifies response-type probabilities and an associated set of counterfactual outcomes. However, three out of seven response-types in Table 5 differ from the ones in Table 3.

Table 5.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {z_no, z_b, z_bc}.

Instrumental Variables	Choices	Response-types of S
Instrumental Variables	Choices	s₁	s₂	s₃	s₄	s₅	s₆	s₇
No Voucher	T(z_no)	t_a	t_a	t_a	t_a	t_b	t_c	t_c
Voucher for car b	T(z_b)	t_a	t_a	t_b	t_b	t_b	t_c	t_b
Voucher for car b or c	T(z_bc)	t_a	t_c	t_b	t_c	t_b	t_c	t_c


Choice restrictions alone do not necessarily produce identifiability. For an example, see Web Appendix D.2. We further note that unordered monotonicity A-3 is not a necessary condition for identification of model parameters. In Web Appendix D.3, we modify the example of Table 5 by assuming that Z takes values in supp(Z) = {z_c, z_b, z_bc}. WARP alone generates the response matrix described in Table 6.⁴² The rank of its associated binary matrix B_T is equal to 7. Thus, response-type probabilities are identified. However, the response matrix in Table 6 is not consistent with unordered monotonicity A-3. There is no sequence of monotonic relationships consistent with A-3 that generates this response matrix. For example, consider the change in voucher assignment from voucher for c (z_c) to voucher for b (z_b) in Table 6. This change induces those in s₄ to move toward t_a (from t_c to t_a), while it induces those in s₂ to move away from t_a (from t_a to t_b). This pattern of counterfactual choices is inconsistent with monotonicity.⁴³ Moreover, revealed preference analysis may or may not identify the choice model, depending on the patterns of restrictions imposed on the variation in the instruments.⁴⁴

Table 6.

Response-types Generated by Revealed Preference Analysis for supp(Z) = {z_c, z_b, z_bc}.

Instrumental Variables	Count. Choices	Response-types of S
Instrumental Variables	Count. Choices	s₁	s₂	s₃	s₄	s₅	s₆	s₇
Voucher for c	T(z_c)	t_a	t_a	t_b	t_c	t_c	t_c	t_c
Voucher for b	T(z_b)	t_a	t_b	t_b	t_a	t_b	t_b	t_c
Voucher for b or c	T(z_bc)	t_a	t_b	t_b	t_c	t_b	t_c	t_c
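
Whether a given response matrix satisfies A-3 can be checked mechanically by comparing, for each treatment and each pair of instrument values, the two rows of the corresponding indicator matrix. The following is a minimal sketch; the function name `is_unordered_monotone` and the single-letter coding of treatments are hypothetical.

```python
import numpy as np
from itertools import combinations

def is_unordered_monotone(R):
    """Check A-3: for each t and each pair (z, z'), the rows of 1[R = t]
    must be ordered in the same direction for every response-type."""
    R = np.asarray(R)
    for t in np.unique(R):
        B = (R == t)
        for i, j in combinations(range(B.shape[0]), 2):
            if not ((B[i] >= B[j]).all() or (B[i] <= B[j]).all()):
                return False
    return True

# Rows of each matrix follow the order of the rows in the corresponding table.
table3 = [list("aaabbcc"), list("aaaabac"), list("abcbbcc")]
table5 = [list("aaaabcc"), list("aabbbcb"), list("acbcbcc")]
table6 = [list("aabcccc"), list("abbabbc"), list("abbcbcc")]

results = {name: is_unordered_monotone(R)
           for name, R in [("table3", table3), ("table5", table5), ("table6", table6)]}
```

For Table 6 the check fails at t_a when comparing z_c with z_b, which is the two-way flow between s₂ and s₄ described in the text.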


6 Equivalent Conditions for Characterizing Unordered Monotonicity

This section presents and interprets general properties shared by all response matrices that satisfy unordered monotonicity A-3. We explore a variety of ways to express A-3 including separability of choice equations.

6.1 Properties of Binary Matrices

To establish a relationship between identifiability and the properties of response matrix R, it is helpful to use concepts from the literature on binary matrices. A binary matrix is lonesum if it is uniquely determined by its row and column sums.⁴⁵ We establish that response matrix R is an unordered monotone response matrix (henceforth “monotone”) if each binary matrix derived from it, B_t = 1[R = t]; t ∈ supp(T), is lonesum. Lonesum matrices can be used to characterize monotonicity conditions in choice models. We show that identification and equivalence results arise from the properties of lonesum matrices.

Let r_i,t be the i-th row sum of the binary matrix $B_{t} : r_{i, t} = \sum_{n = 1}^{N_{S}} B_{t} [i, n]$ . Let c_n,t denote the sum of the n-th column of B_t, that is, $c_{n, t} = \sum_{i = 1}^{N_{Z}} B_{t} [i, n]$ . The maximal of matrix B_t is a matrix whose i-th row is given by r_i,t elements 1 followed by 0s. Two matrices are equivalent if one can be transformed into the other by a series of row and/or column permutations.

Table 7 displays matrix B_{t_a} = 1[R = t_a], where R is the response matrix of Table 3. The first column of Table 7 gives the row sums of B_{t_a}. The last row of Table 7 presents its column sums. To show that matrix B_{t_a} is lonesum, reorder its columns and rows based on decreasing values of column sums and increasing values of row sums. The maximal of B_{t_a} is obtained by a reordering of B_{t_a} based only on row and column sums. Note that there are different orderings for different t. The reordered version of the matrix in Table 7 is given in Table 8. It is a maximal matrix because each of its rows consists of elements 1 followed by 0s. For example, if a maximal matrix has 7 columns and its first row sum is 1, the first row is [1, 0, 0, 0, 0, 0, 0]. Thus a maximal matrix is uniquely determined by its row sums. Therefore we conclude that B_{t_a} is a lonesum matrix. One can check that matrices B_{t_b} and B_{t_c} of Table 3 are also lonesum. Thus, following our definition, response matrix R of Table 3 is unordered monotone. In our analysis of LATE in Section 4.1, B_{t₁} and B_{t₀} are both lonesum.

Table 7.

Row and Column Sums of Matrix B_{t_a} of Response Matrix in Table 3

Row Sum	Row Index	Matrix B_{t_a} = 1[R = t_a] of Table 3
Row Sum	Row Index	s₁	s₂	s₃	s₄	s₅	s₆	s₇
3	r_{1,t_a}	1	1	1	0	0	0	0
5	r_{2,t_a}	1	1	1	1	0	1	0
1	r_{3,t_a}	1	0	0	0	0	0	0

	Column Index	c_{1,t_a}	c_{2,t_a}	c_{3,t_a}	c_{4,t_a}	c_{5,t_a}	c_{6,t_a}	c_{7,t_a}
	Column Sum	3	2	2	1	0	1	0


Table 8.

Reordered Matrix B_{t_a} According to Increasing Values of Row Sums and Decreasing Values of Column Sums

Row Sum	Row Index	Reordered Rows and Columns by Sums
Row Sum	Row Index	s₁	s₂	s₃	s₄	s₆	s₅	s₇
1	r_{3,t_a}	1	0	0	0	0	0	0
3	r_{1,t_a}	1	1	1	0	0	0	0
5	r_{2,t_a}	1	1	1	1	1	0	0

	Column Index	c_{1,t_a}	c_{2,t_a}	c_{3,t_a}	c_{4,t_a}	c_{6,t_a}	c_{5,t_a}	c_{7,t_a}
	Column Sum	3	2	2	1	1	0	0
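
The reordering that takes Table 7 to Table 8 can be sketched as follows. The code sorts rows by increasing row sum and columns by decreasing column sum, and then tests whether the result is maximal; it relies on the standard fact that a binary matrix is lonesum exactly when it can be permuted into its maximal. The function names are hypothetical.

```python
import numpy as np

def reorder_by_sums(B):
    """Sort rows by increasing row sum and columns by decreasing column sum."""
    B = np.asarray(B)
    rows = np.argsort(B.sum(axis=1), kind="stable")
    cols = np.argsort(-B.sum(axis=0), kind="stable")
    return B[np.ix_(rows, cols)], rows, cols

def is_lonesum(B):
    """True if the reordered matrix is maximal: each row is 1s followed by 0s."""
    M, _, _ = reorder_by_sums(B)
    # Row i of the maximal has M.sum(axis=1)[i] leading ones.
    maximal = np.arange(M.shape[1])[None, :] < M.sum(axis=1)[:, None]
    return bool((M == maximal.astype(int)).all())

# Matrix B_{t_a} of Table 7 (rows z_no, z_a, z_bc; columns s1..s7).
B_ta = np.array([[1, 1, 1, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 1, 0],
                 [1, 0, 0, 0, 0, 0, 0]])

M, rows, cols = reorder_by_sums(B_ta)
```

The zero-based index vectors `rows` and `cols` record the permutation; for B_{t_a} they correspond to the row order r₃, r₁, r₂ and the column order s₁, s₂, s₃, s₄, s₆, s₅, s₇ of Table 8.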


6.2 Characterizing Unordered Monotonicity

The following conditions are necessary and sufficient for characterizing unordered monotonicity A-3:

Theorem T-3

The following statements are equivalent characterizations of A-3 for the IV model (1)–(3):

(i) R is an unordered monotone response matrix, i.e., each binary matrix B_t = 1[R = t]; t ∈ supp(T) is lonesum;
(ii) For any t, t′, t″ ∈ supp(T), there are no 2 × 2 sub-matrices of R of the type:
$(\begin{matrix} t & t^{'} \\ t^{″} & t \end{matrix}) or (\begin{matrix} t^{'} & t \\ t & t^{″} \end{matrix}), where t^{'} \neq t and t^{″} \neq t .$ (50) ⁴⁶
(iii) Unordered monotonicity: For any z, z′ ∈ supp(Z), and for each treatment t ∈ supp(T), we have that:⁴⁷
$1 [T_{ω} (z) = t] \geq 1 [T_{ω} (z^{'}) = t] \forall ω \in Ω or 1 [T_{ω} (z) = t] \leq 1 [T_{ω} (z^{'}) = t] \forall ω \in Ω .$
(iv) Unordered Separability: treatment choice can be represented by separable choice functions in V and Z, i.e., there exist functions φ : supp(V) × supp(T) → ℝ and τ : supp(Z) × supp(T) → ℝ such that:
$1 [T = t ∣ V = v, Z = z] = 1 [Ψ (t, z, v) \geq 0] = 1 [φ (v, t) + τ (z, t) \geq 0] .$ (51)

Proof

See Web Appendix A.6.

Condition (i) states our main condition for equivalence: response matrix R is unordered monotone if and only if each indicator matrix formed from it (B_t = 1[R = t]) is lonesum. Condition (ii) states that if R is an unordered monotone response matrix, no 2 × 2 sub-matrix of R is of the form in (50). Condition (iii) states that the conditions preceding it hold if and only if unordered monotonicity A-3 holds. As previously noted, condition (iii) implies monotonicity A-1 for the binary choice model. Condition (iv) is a separability property that characterizes the choice functions. Vytlacil’s (2002) equivalence theorem is generated by the equivalence of conditions (iii) and (iv) when we specialize the model to the case of a binary treatment.⁴⁸
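
The forbidden sub-matrix condition (50) lends itself to a direct search over pairs of rows and pairs of columns. The following is a minimal sketch; the function name `has_forbidden_submatrix` and the string coding of treatments are hypothetical.

```python
from itertools import combinations

def has_forbidden_submatrix(R):
    """Search R for a 2x2 sub-matrix with some t on one diagonal and
    values different from t on the other diagonal, as in (50)."""
    n_rows, n_cols = len(R), len(R[0])
    for i, j in combinations(range(n_rows), 2):
        for m, n in combinations(range(n_cols), 2):
            a, b, c, d = R[i][m], R[i][n], R[j][m], R[j][n]
            if a == d and b != a and c != a:   # t on the main diagonal
                return True
            if b == c and a != b and d != b:   # t on the anti-diagonal
                return True
    return False

table3 = ["aaabbcc", "aaaabac", "abcbbcc"]   # response matrix of Table 3
table6 = ["aabcccc", "abbabbc", "abbcbcc"]   # response matrix of Table 6

found_in_table3 = has_forbidden_submatrix(table3)
found_in_table6 = has_forbidden_submatrix(table6)
```

For Table 6 the search succeeds on the rows for z_c and z_b and the columns for s₂ and s₄, whereas no such sub-matrix exists in Table 3.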

6.3 Interpreting T-3

Condition (i) describes a key property of response matrices: the lonesum property of treatment choice indicators. Lonesum matrices are not only useful for characterizing unordered monotonicity, but they are key concepts for investigating properties of choice models.⁴⁹

Condition (i) of T-3 implies that B_t is fully characterized by its column and row sums. This condition implies that the response matrix R is also characterized by its row and column sums. However, the reverse is not true. We illustrate this in Remark 6.1 :

Remark 6.1

If R is an unordered monotone response, each matrix B_t is lonesum and therefore fully characterized by its column and row sums r_i,t, c_n,t; t ∈ supp(T), i ∈ {1, …, N_Z}, n ∈ {1, …, N_S}. Since Response matrix R can be written as $R = \sum_{t \in supp (T)} t B_{t}$ , R is characterized by its column and row sums r_i,t, c_n,t as well. However, the reverse is not true. R being characterized by its column and row sums does not imply that R is an unordered monotone response. To illustrate this claim, let response matrix R be defined by:

R = (\begin{matrix} t_{1} & t_{2} \\ t_{2} & t_{3} \end{matrix}), thus \underset{row sums}{\underset{︸}{\begin{matrix} r_{1, t_{1}} = 1, & r_{1, t_{2}} = 1, & r_{1, t_{3}} = 0, \\ r_{2, t_{1}} = 0, & r_{2, t_{2}} = 1, & r_{2, t_{3}} = 1, \end{matrix}}} and \underset{column sums}{\underset{︸}{\begin{matrix} c_{1, t_{1}} = 1, & c_{1, t_{2}} = 1, & c_{1, t_{3}} = 0, \\ c_{2, t_{1}} = 0, & c_{2, t_{2}} = 1, & c_{2, t_{3}} = 1. \end{matrix}}}

R is not unordered monotone because it violates condition (ii) of T-3. Moreover, B_{t₂} = 1[R = t₂] exhibits one of the prohibited patterns (52) and is not lonesum. Nevertheless, R is fully characterized by its column sums and row sums: r_{1,t₁} = 1 and c_{1,t₁} = 1 ⇒ R[1, 1] = t₁; r_{2,t₃} = 1 and c_{2,t₃} = 1 ⇒ R[2, 2] = t₃; r_{1,t₂} = 1 and R[1, 1] = t₁ ⇒ R[1, 2] = t₂; r_{2,t₂} = 1 and R[2, 2] = t₃ ⇒ R[2, 1] = t₂.

All response matrices for the case of binary treatment are equivalent under monotonicity A-1. This property does not hold for the general unordered case:

Remark 6.2

Consider the binary choice model in which the instrument takes N_Z values and T takes values in {0, 1}. Unordered monotonicity generates a monotonicity inequality for each pair of Z-values. Different sets of inequalities generate different response matrices. However, each of these response matrices is equivalent to the same lower triangular binary matrix with N_Z rows and N_Z + 1 columns (see the example in Section 4.1) and produces an identified model. However, in the case of multiple choices, unordered monotonicity does not generate response matrices that are equivalent to the same matrix. For example, the response matrices of Tables 3 and 5 are monotone responses but they are not equivalent, because one matrix cannot be transformed into another by row and/or column permutations. The response matrices in Tables 3 and 5 consist of seven response-types for N_T = 3 and N_Z = 3. There are 27 possible response-types for N_T = 3 and N_Z = 3. The combination of 7 response-types out of these 27 generates 888,030 possible response matrices, although some may not be identifiable. Among them, 66 response matrices satisfy unordered monotonicity condition (iii).⁵⁰ Response matrices of Tables 3 and 5 are two examples of these matrices.

Condition (ii) of T-3 imposes a restriction on counterfactual choices that does not depend on the number of treatment choices in supp(T) or the number of values that Z takes. The condition rules out two-way flows generated by changes in instruments. Thus the response matrix of Table 6 is not unordered monotone. The forbidden type of condition (ii) is obtained using the first and second rows of response-types s₂ and s₄ in Table 6.⁵¹ The change from z_c to z_b shifts people away from a in s₂ but toward a in s₄.

Remark 6.3

We note that a consequence of condition (ii) in T-3 is that under A-3, no 2 × 2 sub-matrix of any B_t; t ∈ supp(T) is of the type:⁵²

(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}) nor (\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}) .

(52)

Unordered monotonicity A-3 holds if and only if no prohibited patterns (52) occur for any B_t; t ∈ supp(T). An example clarifies the equivalence between the requirements for unordered monotonicity A-3 and the absence of prohibited patterns (52). Suppose that (1[T = t]|Z = z, V = v) ≥ (1[T = t]|Z = z′, V = v) holds for all v ∈ supp(V ). Then it must be the case that:

(1 [T = t] ∣ Z = z, S = s) \geq (1 [T = t] ∣ Z = z^{'}, S = s)

(53)

holds for all s ∈ supp(S) because for each v ∈ supp(V) there is s ∈ supp(S) such that s = f_S(v) (see (10)) and (T|S = s, Z = z) = (T|V = v, Z = z). Inequality (53) generates three possible sub-vectors of dimension 2 × 1 that indicate whether T is equal to t when Z takes value z and z′ for any response-type s ∈ supp(S):

(\begin{matrix} (1 [T = t] ∣ Z = z, S = s) \\ (1 [T = t] ∣ Z = z^{'}, S = s) \end{matrix}) \in {(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} 1 \\ 0 \end{matrix}), (\begin{matrix} 1 \\ 1 \end{matrix})} for all s \in supp (S) .

(54)

The matrix generated by a combination of sub-vectors in (54) for any two response-types s, s′ ∈ supp(S) is:

(\begin{array}{l} (1 [T = t] ∣ Z = z, S = s) & (1 [T = t] ∣ Z = z, S = s^{'}) \\ (1 [T = t] ∣ Z = z^{'}, S = s) & (1 [T = t] ∣ Z = z^{'}, S = s^{'}) \end{array}) .

It cannot be of the form:

(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}) or (\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}),

which are prohibited patterns (52). Hence the weak inequality (1[T = t]|Z = z, V = v) ≥ (1[T = t]|Z = z′, V = v) ∀ v ∈ supp(V) implies that B_t is lonesum. On the other hand, suppose that v, v′ ∈ supp(V) are such that (1[T = t]|Z = z, V = v) > (1[T = t]|Z = z′, V = v) and (1[T = t]|Z = z, V = v′) < (1[T = t]|Z = z′, V = v′). Then there must exist s, s′ ∈ supp(S) where s = f_S(v), s′ = f_S(v′) that generate the prohibited pattern:

(\begin{matrix} (1 [T = t] ∣ Z = z, S = s) & (1 [T = t] ∣ Z = z, S = s^{'}) \\ (1 [T = t] ∣ Z = z^{'}, S = s) & (1 [T = t] ∣ Z = z^{'}, S = s^{'}) \end{matrix}) = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}) .

(55)

The equality of the first columns in the right and left side of Equation (55) means that, for some type s, treatment t is chosen when the instrument shifts from z′ to z. The equality of the second columns of Equation (55) states the opposite. For some type s′, the instrument shift from z′ to z causes treatment t not to be chosen. This behavior violates both the intuitive notion and formal definition of monotonicity because the instrument shifts some agents to change their choice towards t while others change their choice away from t.⁵³ Condition (iii) is implied by super (or sub) modularity of Ψ(t, z, v) in terms of v and z for all t, but that condition is stronger than what is required to produce A-3. Strictly speaking, the requirement is that component-wise, $sgn (\frac{Δ Ψ (T, z, v)}{Δ z})$ is the same for all V = v for each T = t and Z = z.

6.4 Understanding Condition (iv) of T-3

We draw on and generalize the binary-treatment model of Sections 4.1–4.2 to build the intuition underlying condition (iv). In the binary case, monotonicity implies that B_t_₁ is lower triangular (28).⁵⁴ Triangularity generates Equation (38) which expresses treatment choice T as an indicator function that is separable in the observed propensity score P(T = t₁|Z), which depends on Z, and a sum of response-type probabilities, which depends on V.

Theorem T-3 applies to choice models with multiple treatments, which include the binary case. If unordered monotonicity (condition (iii) of T-3) holds, then each binary matrix B_t; t ∈ supp(T) is characterized solely by its row and column sums so that B_t; t ∈ supp(T) are lonesum (Item (i) of T-3). This property can be understood as a generalization of the lower triangular property in the binary case, but applied to each B_t.⁵⁵ Generalized triangularity generates condition (iv) which characterizes treatment choice as an indicator function that is separable in Z and V. We present a detailed discussion of this condition in Appendix G.

To interpret separability condition (iv), suppose that agent ω with V_ω = v ∈ supp(V) chooses t ∈ supp(T) when an instrumental variable is set to z ∈ supp(Z), so that 1[T = t|V = v, Z = z] = 1. According to condition (iv), there exist functions φ and τ such that φ(v, t) + τ (z, t) ≥ 0.⁵⁶ It is clear that expressions of this type rule out the prohibited patterns (52) and therefore generate unordered monotonicity. What is less obvious is that (iii) implies representation (iv), which is not necessarily unique.⁵⁷
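The claim that separable threshold expressions rule out the prohibited patterns (52) can be checked numerically for a single treatment t. The sketch below is illustrative only: the values of φ(v, t) and τ(z, t) are randomly drawn placeholders, and the helper `has_prohibited_pattern` is a hypothetical name, not part of the paper. It builds B_t[z, v] = 1[φ(v, t) + τ(z, t) ≥ 0] and searches for a 2 × 2 sub-matrix of the form (55).

```python
import numpy as np
from itertools import combinations

def has_prohibited_pattern(B):
    """True if some 2x2 sub-matrix of B is [[1,0],[0,1]] or [[0,1],[1,0]]."""
    B = np.asarray(B, dtype=int)
    for i, k in combinations(range(B.shape[0]), 2):
        diff = B[i] - B[k]  # +1: row i has 1 and row k has 0; -1: the reverse
        if (diff == 1).any() and (diff == -1).any():
            return True
    return False

rng = np.random.default_rng(0)
phi = rng.normal(size=8)  # placeholder phi(v, t) for 8 values of v, t held fixed
tau = rng.normal(size=4)  # placeholder tau(z, t) for 4 values of z, t held fixed

# Rows index instrument values z, columns index unobserved types v.
B_t = (tau[:, None] + phi[None, :] >= 0).astype(int)
separable_ok = not has_prohibited_pattern(B_t)
```

Because the rows of such a matrix are nested once they are sorted by τ(z, t), the search finds no prohibited pattern whatever values are drawn. The converse direction, from (iii) to (iv), is the nontrivial part and is not shown by this sketch.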

Note that 1[T = t|V = v, Z = z] = 1 implies that 1[T = t′|V = v, Z = z] = 0 for all t′ ∈ supp(T) \ {t}. Therefore it must be the case that φ(v, t′) + τ (z, t′) < 0 for all t′ that differs from t. In particular, condition (iv) implies that:

1 [T = t ∣ V = v, Z = z] = 1 \Leftrightarrow t = \underset{t^{'} \in supp (T)}{argmax} (Ψ (t^{'}, z, v)) = \underset{t^{'} \in supp (T)}{argmax} (φ (v, t^{'}) + τ (z, t^{'})) .

(56)

Condition (iv) does not claim that the functions φ and τ are unique. Indeed if t maximizes φ(v, t′) + τ (z, t′), it also maximizes m(φ(v, t′) + τ (z, t′)) where m is any strictly increasing function.

Condition (iv) does not impose rationality or perfect foresight on agent decision making. Suppose that agent ω decides among t₁, t₂, t₃ and that his treatment choice is generated by maximization of a utility function Ψ(t, z, v) where V_ω = v and Z_ω = z. Condition (iv) states that if unordered monotonicity A-3 holds, the maximized choice value Ψ(t, z, v) can be characterized as arising from the maximization of a separable function φ(v, t)+τ (z, t). Specifically, if ω chooses t₁, then t₁ is the maximum among Ψ(t, z, v) for t ∈ {t₁, t₂, t₃}. In this case, t₁ also maximizes φ(v, t) + τ (z, t) for t ∈ {t₁, t₂, t₃}:

t_{1} = \underset{t \in {t_{1}, t_{2}, t_{3}}}{argmax} Ψ (t, z, v) \Leftrightarrow t_{1} = \underset{t \in {t_{1}, t_{2}, t_{3}}}{argmax} (φ (v, t) + τ (z, t)) .

Condition (iv) does not imply that the ranking of treatment utilities generated by Ψ(t, z, v) is necessarily the same as the ranking generated by φ(v, t) + τ (z, t). For instance, if Ψ(t₁, z, v) > Ψ(t₂, z, v) > Ψ(t₃, z, v) then ω prefers t₁ to t₂, and t₂ to t₃. This does not necessarily imply that φ(v, t₁) + τ (z, t₁) > φ(v, t₂) + τ (z, t₂) > φ(v, t₃) + τ (z, t₃). Indeed, φ(v, t₁) + τ (z, t₁) > φ(v, t₃) + τ (z, t₃) > φ(v, t₂) + τ (z, t₂) may also occur. It is the ranking of t₁ relative to the next best that generates agent choices of t₁. Variation in instruments identifies only preferences relative to the next best choice and not an order among the remaining elements in the choice set.

To formalize this discussion, we establish that unordered monotonicity arises if we assume that utilities of a choice compared to the next best choice can be represented as additively separable functions:⁵⁸

u (v, t) + h (z, t) = Ψ (t, z, v) - max_{t^{'} \in supp (T) \ {t}} Ψ (t^{'}, z, v) .

The following theorem formalizes this point.

Theorem T-4

If there exist functions u: supp(V)×supp(T) → ℝ and h: supp(Z)× supp(T) → ℝ such that

u (v, t) + h (z, t) = (Ψ (t, z, v) - max_{t^{'} \in supp (T) \ {t}} Ψ (t^{'}, z, v)) \forall v \in supp (V), z \in supp (Z),

then the response matrix R associated with this choice model is unordered monotone.

Proof

See Web Appendix A.7.

As before, the separable representation is not necessarily unique.

Remark 6.4

T-4 imposes stronger functional form assumptions than T-3. Summarizing:

unordered monotonicity \Rightarrow \underset{t \in supp (T)}{argmax} (φ (v, t) + τ (z, t)) = \underset{t \in supp (T)}{argmax} (Ψ (t, z, v) - max_{t^{'} \in supp (T) \ {t}} Ψ (t^{'}, z, v)),

while

u (v, t) + h (z, t) = Ψ (t, z, v) - max_{t^{'} \in supp (T) \ {t}} Ψ (t^{'}, z, v) \Rightarrow unordered monotonicity.

Heckman et al. (2006b, 2008) assume separability in the underlying preference functions and show that IV estimates a LATE that compares the outcome of one choice to the outcome for the next best option. Our condition is weaker. Theorem T-4 states that unordered monotonicity only requires that the utility of a choice relative to the next best choice be separable. To clarify, the impact of instrument Z on the treatment choice is summarized by the term h(z, t). Suppose Z changes from z′ to z. If h(z, t) − h(z′, t) > 0, each agent is induced towards t. If h(z, t) − h(z′, t) < 0 agents are induced against t. This analysis applies for all pairwise values of (z, z′) ∈ supp(Z)×supp(Z) and for all t ∈ supp(T). The collection of all of these inequalities characterizes unordered monotonicity A-3.

6.5 Verifying Unordered Monotonicity Condition A-3

Verifying condition (ii) of Theorem T-3 is a daunting combinatorial task. It would require checking each 2 × 2 sub-matrix in R, which is impractical for large R. We show that a single calculation based on a simple multiplication of binary matrices suffices to check condition A-3. Our criterion is based on a binary matrix M:

For each t_{j} \in supp (T) = {t_{1}, \dots, t_{N_{T}}}, let M_{t_{j}} = [\underbrace{1_{N_{Z}, N_{S}}, \dots, 1_{N_{Z}, N_{S}}}_{j - 1 times}, B_{t_{j}}, \underbrace{0_{N_{Z}, N_{S}}, \dots, 0_{N_{Z}, N_{S}}}_{N_{T} - j times}], then M = {[M_{t_{1}}^{'}, \dots, M_{t_{N_{T}}}^{'}]}^{'},

(57)

where 1_{N_Z,N_S} is an N_Z × N_S matrix whose elements are all 1 and 0_{N_Z,N_S} is a matrix of the same dimension whose elements are all 0. Matrix M is block diagonal with matrices B_t on the diagonal, where, again, we eliminate any redundancies. M has elements 1 below this diagonal and elements 0 above it.

Theorem T-5

For the IV model (1)–(3), the response matrix R is an unordered monotone response, that is, each binary matrix B_t = 1[R = t]; t ∈ supp(T) is lonesum, if and only if,

ι_{c}^{'} ((M^{'} (ι_{r} ι_{c}^{'} - M)) ⊙ {(M^{'} (ι_{r} ι_{c}^{'} - M))}^{'}) ι_{c} = 0,

(58)

where ι_r is an N_T · N_Z vector of 1s and ι_c is an N_T · N_S vector of 1s. Moreover, if Equation (58) holds, then matrix M is lonesum.

Proof

See Web Appendix A.8.

Unordered monotonicity condition A-3 holds if and only if the scalar on the left-hand side of Equation (58) is equal to zero. Moreover, if Equation (58) holds, then all the conditions stated in Theorem T-3 also hold.
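As an illustration, the criterion in Equation (58) can be evaluated with a few lines of NumPy. The sketch below builds M as in Equation (57) and computes the left-hand side of (58) for the response matrix of Table 3 (as displayed in Table 9, with t_a, t_b, t_c abbreviated to "a", "b", "c"). The function names and the two-type counterexample `R_bad` are assumptions made for the illustration, and the sketch does not remove redundant rows or columns, which should not change the value because duplicated columns contribute zero.

```python
import numpy as np

def build_M(R, treatments):
    """Stack M_{t_j} = [1, ..., 1, B_{t_j}, 0, ..., 0] as in Equation (57)."""
    R = np.asarray(R)
    n_z, n_s = R.shape
    ones = np.ones((n_z, n_s), dtype=int)
    zeros = np.zeros((n_z, n_s), dtype=int)
    rows = []
    for j, t in enumerate(treatments):
        B_t = (R == t).astype(int)  # binary indicator matrix 1[R = t]
        rows.append([ones] * j + [B_t] + [zeros] * (len(treatments) - j - 1))
    return np.block(rows)

def criterion_58(M):
    """Left-hand side of Equation (58)."""
    # C[a, b] counts rows with a 1 in column a and a 0 in column b.
    C = M.T @ (1 - M)
    return int((C * C.T).sum())

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R_table3 = np.array([
    ["a", "a", "a", "b", "b", "c", "c"],
    ["a", "a", "a", "a", "b", "a", "c"],
    ["a", "b", "c", "b", "b", "c", "c"],
])
value_table3 = criterion_58(build_M(R_table3, ["a", "b", "c"]))

# Hypothetical violation: two types that move in opposite directions.
R_bad = np.array([["a", "b"],
                  ["b", "a"]])
value_bad = criterion_58(build_M(R_bad, ["a", "b"]))
```

The product C ⊙ C′ is nonzero exactly when some pair of columns of M contains a prohibited 2 × 2 pattern, so a zero total means no such pair exists.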

7 Identification of Counterfactuals and Treatment Effects

This section applies our analysis to determine which counterfactuals and treatment effects are identified and for which strata. We build on our analysis of binary LATE presented in Section 4.1. We generalize the notions of “compliers,” “always takers” and “never-takers” to a general unordered choice model.

To this end, it is helpful to introduce some additional notation. Let Σ_t(i) be the set of response-types in which t appears exactly i times:

\Sigma_{t} (i) = {s_{n}, such that s_{n} \in supp (S) and \sum_{j^{'} = 1}^{N_{Z}} B_{t} [j^{'}, n] = i} where i \in {0, \dots, N_{Z}} .

(59)

For example, Σ_{t_a}(2) for the response matrix of Table 3 consists of the response-types for which the value t_a appears exactly twice. They are Σ_{t_a}(2) = {s₂, s₃} (see Table 9). Those are also the response-types whose column-sum of B_{t_a} in Table 7 is 2.

Table 9.

Partition of Response-types in Table 3 where supp(Z) = {z_no, z_a, z_bc}

		Response-types of S
Instrumental Variables	Counterfactual Choices	s₁	s₂	s₃	s₄	s₅	s₆	s₇
No Voucher	T(z_no)	t_a	t_a	t_a	t_b	t_b	t_c	t_c
Voucher for car a	T(z_a)	t_a	t_a	t_a	t_a	t_b	t_a	t_c
Voucher for car b or c	T(z_bc)	t_a	t_b	t_c	t_b	t_b	t_c	t_c

Response-types in Σ_{t_a}(0)						s₅		s₇
Response-types in Σ_{t_a}(1)					s₄		s₆
Response-types in Σ_{t_a}(2)			s₂	s₃
Response-types in Σ_{t_a}(3)		s₁

t_a-Switchers			s₂	s₃	s₄		s₆
t_a-Always-takers		s₁
t_a-Never-takers						s₅		s₇


For each t ∈ supp(T), we can partition the set of response-types by the number of times a treatment value t appears: $supp (S) = \cup_{i = 0}^{N_{Z}} \Sigma_{t} (i)$ . Table 9 displays these partitions for Σ_{t_a}(i); i = 0, …, 3 based on the response matrix in Table 3. Let b_t(i) be the N_S-dimensional binary row-vector that indicates if response-type s belongs to Σ_t(i), that is, b_t(i)[n] = 1 if s_n ∈ Σ_t(i) and zero otherwise. For Table 3, b_{t_a}(2) = [0, 1, 1, 0, 0, 0, 0]. Using this notation, we prove the following identification theorem:

Theorem T-6

If unordered monotonicity A-3 holds for the IV model (1)–(3) then the following response-type probabilities and counterfactuals are identified:

P (S \in \Sigma_{t} (i)) and E (κ (Y (t)) ∣ S \in \Sigma_{t} (i)) \forall t \in supp (T) and i \in {1, \dots, N_{Z}} .

(60)

Moreover, those parameters can be evaluated by the following equations:

P (S \in \Sigma_{t} (i)) = b_{t} (i) B_{t}^{+} P_{Z} (t) and E (κ (Y (t)) ∣ S \in \Sigma_{t} (i)) = \frac{b_{t} (i) B_{t}^{+} Q_{Z} (t)}{b_{t} (i) B_{t}^{+} P_{Z} (t)},

(61)

where κ: supp(Y) → ℝ denotes an arbitrary function on the support of Y.

Proof

See Web Appendix A.9.

Note that, in general, we cannot identify counterfactuals within every stratum. Nonetheless, it can be shown that under unordered monotonicity, at least one treatment effect can always be identified.⁵⁹

Table 10 lists the mean counterfactual outcomes that are identified by applying Equation (60) of T-6 to the response matrix R of Table 3 (reproduced in Table 9). Those are E(Y (t)|S ∈ Σ_t(i)) for i ∈ {1, 2, 3} and t ∈ {t_a, t_b, t_c}.

Table 10.

Identified Counterfactual Outcomes for R in Table 3

	Counterfactual Outcomes
Response-type Sets	Y (t_a)	Y (t_b)	Y (t_c)
Σ_t(1)	E(Y (t_a)|S ∈ {s₄, s₆})	E(Y (t_b)|S = s₂)	E(Y (t_c)|S = s₃)
Σ_t(2)	E(Y (t_a)|S ∈ {s₂, s₃})	E(Y (t_b)|S = s₄)	E(Y (t_c)|S = s₆)
Σ_t(3)	E(Y (t_a)|S = s₁)	E(Y (t_b)|S = s₅)	E(Y (t_c)|S = s₇)


Remark 7.1

A direct implication of Theorem T-6 is that if there exist t, t′ ∈ supp(T) and i, i′ ∈ {1, …, N_Z} such that Σ_t(i) = Σ_{t′}(i′), then E(Y (t)−Y (t′)|S ∈ Σ_t(i)) is identified. This expression is the mean treatment effect of t relative to t′ for the set of strata Σ_t(i).

Expression (61) uses the tools for identification based on the generalized inverse developed in Section 4. For instance, we can apply Equation (61) of T-6 to generate the following identifying relationships:

E (Y (t_{a}) ∣ S \in {s_{2}, s_{3}}) = E (Y (t_{a}) ∣ S \in \Sigma_{t_{a}} (2)) = \frac{b_{t_{a}} (2) B_{t_{a}}^{+} Q_{Z} (t_{a})}{b_{t_{a}} (2) B_{t_{a}}^{+} P_{Z} (t_{a})} = \frac{E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{n o}) - E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{b c})}{P (T = t_{a} ∣ Z = z_{n o}) - P (T = t_{a} ∣ Z = z_{b c})} .

(62)
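One way to see why the denominator of Equation (62) isolates response-types s₂ and s₃ is to read the propensity scores directly off the rows of Table 9. The step below is a sketch that assumes, as in the IV model (1)–(3), that the instrument Z is independent of the response-type S, so that each propensity score is a sum of response-type probabilities:

```latex
\begin{aligned}
P(T = t_a \mid Z = z_{no}) &= P(S = s_1) + P(S = s_2) + P(S = s_3), \\
P(T = t_a \mid Z = z_{bc}) &= P(S = s_1), \\
P(T = t_a \mid Z = z_{no}) - P(T = t_a \mid Z = z_{bc}) &= P(S \in \{s_2, s_3\}).
\end{aligned}
```

The numerator follows from the same subtraction applied to E(Y · 1[T = t_a]|Z = z), which leaves E(Y (t_a)|S ∈ {s₂, s₃}) · P(S ∈ {s₂, s₃}).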

See Web Appendix A.10 for the derivation of Equation (62). Counterfactuals E(Y (t_b)|S = s₂) and E(Y (t_c)|S = s₃) are also identified (See Table 10). Thus we can identify the effect:

\frac{E (Y (t_{a}) - Y (t_{b}) ∣ S = s_{2}) P (S = s_{2}) + E (Y (t_{a}) - Y (t_{c}) ∣ S = s_{3}) P (S = s_{3})}{P (S = s_{2}) + P (S = s_{3})} .

(63)

Let $\bar{t_{a}}$ designate the choice values other than t_a. We use the notation $E (Y (t_{a}) - Y (\bar{t_{a}}) ∣ S \in {s_{2}, s_{3}})$ to designate treatment effect (63), which stands for the effect of choosing t_a versus not choosing t_a for response-types s₂, s₃. (See Web Appendix A.19 for its derivation.)

We generalize the terminology of Angrist et al. (1996) to the case of multiple treatments. The appropriate generalization is t-specific. In the binary case, there is no need to specify a particular t since the specification of one value automatically implies the other possible value. The t-Never-takers are the response-types in Σ_t(0), for which t does not occur. Σ_t(N_Z) consists of a single response-type whose elements are all t. It is the set of the t-Always-takers. The remaining response-types, the t-Switchers, defined as $\cup_{i = 1}^{N_{Z} - 1} \Sigma_{t} (i)$, consist of all strata for which the choice of treatment t varies as Z ranges in its support.⁶⁰ Those sets are formally defined as:

t-Never-takers = {s \in supp (S); P (T = t ∣ S = s) = 0} \equiv \Sigma_{t} (0);
t-Always-takers = {s \in supp (S); P (T = t ∣ S = s) = 1} \equiv \Sigma_{t} (N_{Z});
t-Switchers = {s \in supp (S); 0 < P (T = t ∣ S = s) < 1} \equiv \cup_{i = 1}^{N_{Z} - 1} \Sigma_{t} (i) .

These sets for the response matrix of Table 3 are: t_a-Always-takers = {s₁}, t_a-Never-takers = {s₅, s₇}, and t_a-Switchers = {s₂, s₃, s₄, s₆} (see Table 9). Corollaries C-2–C-3 present identification results for the various categories.
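The partition of response-types into Σ_t(i) and the derived t-specific categories can be computed mechanically from a response matrix. The following sketch does so for t_a in Table 3; the function `partition` and the helper `names` are hypothetical names introduced for the illustration, and treatments are abbreviated to "a", "b", "c".

```python
import numpy as np

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["a", "a", "a", "b", "b", "c", "c"],
    ["a", "a", "a", "a", "b", "a", "c"],
    ["a", "b", "c", "b", "b", "c", "c"],
])
labels = [f"s{n}" for n in range(1, R.shape[1] + 1)]

def partition(R, t):
    """Map i to the indicator vector b_t(i) of response-types where t appears i times."""
    counts = (R == t).sum(axis=0)  # column sums of B_t
    n_z = R.shape[0]
    return {i: (counts == i).astype(int) for i in range(n_z + 1)}

def names(indicator):
    """Labels of the response-types flagged by a 0/1 indicator vector."""
    return [s for s, flag in zip(labels, indicator) if flag]

b = partition(R, "a")
n_z = R.shape[0]
never_takers = names(b[0])                              # Sigma_t(0)
always_takers = names(b[n_z])                           # Sigma_t(N_Z)
switchers = names(sum(b[i] for i in range(1, n_z)))     # union of Sigma_t(1..N_Z-1)
```

Running the same function with "b" or "c" in place of "a" gives the corresponding partitions behind the other columns of Table 10.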

Corollary C-2

For the IV model (1)–(3) in which unordered monotonicity A-3 holds, the following probabilities are identified for each t ∈ supp(T):

P (S \in t-Always-takers) = P (S \in \Sigma_{t} (N_{Z})) = b_{t} (N_{Z}) B_{t}^{+} P_{Z} (t);
P (S \in t-Switchers) = P (S \in \cup_{i = 1}^{N_{Z} - 1} \Sigma_{t} (i)) = (\sum_{i = 1}^{N_{Z} - 1} b_{t} (i)) B_{t}^{+} P_{Z} (t);
P (S \in t-Never-takers) = P (S \in \Sigma_{t} (0)) = 1 - P (S \in t-Always-takers) - P (S \in t-Switchers) .

Proof

See Web Appendix A.11.

Corollary C-3

Assume the IV model (1)–(3) for which unordered monotonicity A-3 holds. The mean counterfactual outcomes for the t-Always-takers and t-Switchers for each t ∈ supp(T) are generated by:

E (Y (t) ∣ t-Always-takers) = E (Y (t) ∣ S \in \Sigma_{t} (N_{Z})) = \frac{b_{t} (N_{Z}) B_{t}^{+} Q_{Z} (t)}{b_{t} (N_{Z}) B_{t}^{+} P_{Z} (t)};
E (Y (t) ∣ t-Switchers) = \sum_{i = 1}^{N_{Z} - 1} E (Y (t) ∣ S \in \Sigma_{t} (i)) \cdot \frac{P (S \in \Sigma_{t} (i))}{P (S \in t-Switchers)}, where \sum_{i = 1}^{N_{Z} - 1} \frac{P (S \in \Sigma_{t} (i))}{P (S \in t-Switchers)} = 1;
Alternatively, E (Y (t) ∣ t-Switchers) = \frac{(\sum_{i = 1}^{N_{Z} - 1} b_{t} (i)) B_{t}^{+} Q_{Z} (t)}{(\sum_{i = 1}^{N_{Z} - 1} b_{t} (i)) B_{t}^{+} P_{Z} (t)} .

(64)

Proof

See Web Appendix A.12.

Corollary C-2 relies on the result in T-6 that P(S ∈ Σ_t(i)) is identified for all i ∈ {1, …, N_Z}. Corollary C-3 is obtained by setting κ(Y) = Y and using the fact that E(κ(Y (t))|S ∈ Σ_t(i)) is identified for all i ∈ {1, …, N_Z}. To illustrate these corollaries we present the following example.

Remark 7.2

Corollary C-3 states that the counterfactual mean outcomes for response-types s ∈ supp(S) such that P(T = t|S = s) = 1 (the t-Always-takers) or 0 < P(T = t|S = s) < 1 (the t-Switchers) are identified. According to Remark 3.1, these response-types refer to the values v ∈ supp(V) such that 0 < P(T = t|V = v) ≤ 1. Therefore, C-3 implies that E(Y (t)|V ∈ {v; 0 < P(T = t|V = v) ≤ 1}) is identified. The remaining response-types are the t-Never-takers, which consist of the response-types s ∈ supp(S) such that P(T = t|S = s) = 0. This set refers to the set of values v ∈ supp(V) such that P(T = t|V = v) = 0. If the set of t-Never-takers is empty, then all response-types belong to either the t-Always-takers or the t-Switchers and E(Y (t)) is identified.

Example 7.1

According to C-3, the counterfactual outcome mean for t_a-Switchers in the response matrix R of Table 3 is given by:

E (Y (t_{a}) ∣ t_{a}-Switchers) = E (Y (t_{a}) ∣ S \in {s_{2}, s_{3}, s_{4}, s_{6}}) = \frac{(\sum_{i = 1}^{2} b_{t_{a}} (i)) B_{t_{a}}^{+} Q_{Z} (t_{a})}{(\sum_{i = 1}^{2} b_{t_{a}} (i)) B_{t_{a}}^{+} P_{Z} (t_{a})}

(65)

The components of Equation (65) that can be estimated from observed data are:

P_{Z} (t_{a}) = {[P (T = t_{a} ∣ Z = z_{n o}), P (T = t_{a} ∣ Z = z_{a}), P (T = t_{a} ∣ Z = z_{b c})]}^{'}; Q_{Z} (t_{a}) = {[E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{n o}), E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{a}), E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{b c})]}^{'} .

The components of (65) that depend on the response matrix are:

\sum_{i = 1}^{2} b_{t_{a}} (i) = [0, 1, 1, 1, 0, 1, 0]; B_{t_{a}} = [\begin{matrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}] \Rightarrow B_{t_{a}}^{+} = [\begin{matrix} 0 & 0 & 1 \\ 1 / 2 & 0 & - 1 / 2 \\ 1 / 2 & 0 & - 1 / 2 \\ - 1 / 2 & 1 / 2 & 0 \\ 0 & 0 & 0 \\ - 1 / 2 & 1 / 2 & 0 \\ 0 & 0 & 0 \end{matrix}] .

Equation (65) produces the following expression:

E (Y (t_{a}) ∣ t_{a}-Switchers) = \frac{E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{a}) - E (Y \cdot 1 [T = t_{a}] ∣ Z = z_{b c})}{P (T = t_{a} ∣ Z = z_{a}) - P (T = t_{a} ∣ Z = z_{b c})} .
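The algebra of Example 7.1 can be reproduced numerically. The sketch below computes the Moore-Penrose inverse of B_{t_a} with NumPy, recovers the weights that Equation (65) places on the instrument values (z_no, z_a, z_bc), and then checks the formula on made-up numbers. The response-type probabilities `p_s` and the type-specific means `mu_a` are purely hypothetical, and the sketch assumes the mixture representation P_Z(t) = B_t P_S, with the analogous expression for Q_Z(t), which is how Equation (61) is read here.

```python
import numpy as np

# B_{t_a} from Example 7.1: rows are z_no, z_a, z_bc; columns are s1..s7.
B_ta = np.array([
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
], dtype=float)
B_plus = np.linalg.pinv(B_ta)  # Moore-Penrose inverse, shape (7, 3)

# Indicator of the t_a-Switchers {s2, s3, s4, s6}: b(1) + b(2).
b_switch = np.array([0, 1, 1, 1, 0, 1, 0], dtype=float)
weights = b_switch @ B_plus    # weights on (z_no, z_a, z_bc)

# Hypothetical response-type probabilities (sum to one) and means of Y(t_a).
p_s = np.array([0.10, 0.15, 0.05, 0.20, 0.25, 0.10, 0.15])
mu_a = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 5.0, 0.0])  # s5, s7 never choose t_a

# Observable quantities implied by the assumed mixture representation.
P_Z = B_ta @ p_s               # P(T = t_a | Z = z)
Q_Z = B_ta @ (mu_a * p_s)      # E(Y 1[T = t_a] | Z = z)

p_switchers = weights @ P_Z
mean_switchers = (weights @ Q_Z) / p_switchers
```

The recovered weights select the z_a row with a positive sign and the z_bc row with a negative sign, which is the contrast appearing in the final expression of the example.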

Web Appendix A.17 presents additional results on identification. Web Appendix A.21 shows that $E (Y (t_{a}) - Y (\bar{t_{a}}) ∣ t_{a} - Switchers)$ is also identified. In contrast with the binary case, there is no response-type s ∈ {s₁, …, s₇} of response matrix R in Table 3 such that E(Y (t) − Y (t′)|S = s) is identified for any t, t′ ∈ {t_a, t_b, t_c}.

It is important to distinguish identification of counterfactuals within sets of strata from the identification of treatment effects within sets of strata. An example is helpful. Consider the response matrix for a two-valued instrument with three treatment choices under unordered monotonicity. It has five response types:

\begin{matrix} \begin{matrix} s_{1} & s_{2} & s_{3} & s_{4} & s_{5} \end{matrix} \\ R = & [\begin{matrix} t_{1} & t_{1} & t_{2} & t_{3} & t_{3} \\ t_{1} & t_{2} & t_{2} & t_{2} & t_{3} \end{matrix}] \end{matrix}

Five counterfactual mean outcomes are identified: E(Y (t₁)|S = s₁), E(Y (t₁)|S = s₂), E(Y (t₂)|S = s₃), E(Y (t₃)|S = s₄), E(Y (t₃)|S = s₅). E(Y (t₂) − Y (t̄₂)|t₂-Switchers) is identified. This is the treatment effect of t₂ versus the next best treatment, which may vary among members of the t₂-Switchers.⁶¹

7.1 Maximum Number of Admissible Response-types to Secure Identification

The identification of strata probabilities P(S = s), s ∈ supp(S), can be achieved under weaker conditions than are required for identifying counterfactual outcomes. The identification of response-type probabilities depends on the column-rank of B_T while the identification of counterfactual outcomes for a choice t ∈ supp(T) depends on the column-rank of B_t. The rank of B_T is never smaller than the rank of each B_t; t ∈ supp(T) because B_T is generated by stacking B_t across t ∈ supp(T) (Equation (24)).

We characterize the maximum number of response-types N_S in R that facilitate the identification of all response-type probabilities, that is, the maximum N_S such that N_S ≤ rank(B_T).

Theorem T-7

Consider the IV model (1)–(3). Let R be the response matrix consisting of N_S response-types. If response-type probabilities are point identified, then it must be the case that:

N_{S} \leq 1 + (N_{T} - 1) N_{Z} - \sum_{i = 1}^{N_{Z}} \sum_{j = 1}^{N_{T}} 1 [P (T = t_{j} ∣ Z = z_{i}) = 0],

where N_Z is the number of possible values that the instrument takes and N_T is the number of possible values that the treatment choice T takes. In particular, if P(T = t|Z = z) > 0 for all z ∈ supp(Z) and t ∈ supp(T) then the maximum number of response-types N_S in R for the model to be identified is:

N_{S} = 1 + (N_{T} - 1) N_{Z} .

Proof

See Web Appendix A.13.

To identify response-type probabilities, choice restrictions must eliminate at least $N_{T}^{N_{Z}} - [1 + (N_{T} - 1) N_{Z}]$ of the $N_{T}^{N_{Z}}$ possible response-types. T-6 shows that even if we are not able to identify each probability P(S = s); s ∈ supp(S), it may still be possible to identify counterfactual means E(Y (t)|S ∈ Σ_t(i)) and associated probabilities P(S ∈ Σ_t(i)) for strata Σ_t(i). See Web Appendix A.14 for additional results on identification of strata probabilities.
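The counting argument of Theorem T-7 can be illustrated on the response matrix of Table 3. The sketch below computes the bound, the number of response-types that choice restrictions have to eliminate, and the rank of the stacked matrix B_T. The function name `max_response_types` is a hypothetical label, and the bound is evaluated for the case in which all propensity scores are strictly positive.

```python
import numpy as np

def max_response_types(n_t, n_z, n_zero_cells=0):
    """Upper bound on N_S in Theorem T-7; n_zero_cells counts P(T=t|Z=z) = 0."""
    return 1 + (n_t - 1) * n_z - n_zero_cells

# Response matrix of Table 3: rows are z_no, z_a, z_bc; columns are s1..s7.
R = np.array([
    ["a", "a", "a", "b", "b", "c", "c"],
    ["a", "a", "a", "a", "b", "a", "c"],
    ["a", "b", "c", "b", "b", "c", "c"],
])
treatments = ["a", "b", "c"]
n_t, (n_z, n_s) = len(treatments), R.shape

# B_T stacks the indicator matrices B_t across treatments.
B_T = np.vstack([(R == t).astype(int) for t in treatments])
rank_B_T = int(np.linalg.matrix_rank(B_T))

bound = max_response_types(n_t, n_z)  # 1 + (3 - 1) * 3
to_eliminate = n_t ** n_z - bound     # of the N_T^{N_Z} possible response-types
```

For this matrix the number of response-types equals both the bound and the rank of B_T, so the condition N_S ≤ rank(B_T) holds with equality. With three treatments and a two-valued instrument the same formula gives five, the number of response-types in the example of Section 7.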

8 Summary and Conclusions

This paper extends the literature on instrumental variables to general unordered choice models with heterogenous responses which affect choice. We generalize the notion of monotonicity to unordered choice models. Using discrete instruments, we present conditions under which certain counterfactuals and treatment effects associated with general unordered multiple discrete choice models can be identified. We demonstrate how to characterize the set of identified counterfactuals and treatment effects.

We represent IV equations using discrete mixtures. Identification is achieved by imposing restrictions on the kernels of the mixtures. We do not invoke separability of preferences or identification at infinity to achieve these results. Nonetheless, separability of choice equations emerges as one representation of the underlying choice process.

Unordered monotonicity can sometimes be justified by economic choice models. It can be represented in multiple ways. Unordered monotonicity implies and is implied by a form of separability in the equations generating choices. These representations are linked to properties of binary matrices that characterize the admissible response-types generated by the available instrumental variables. We develop a variety of criteria to determine if Unordered Monotonicity is satisfied. We interpret each of these criteria and explain how they can be used in practice. We show that “principal strata” in the statistics literature are coarse versions of control functions.

This paper demonstrates the power of binary matrices in generating and interpreting identification conditions and in unifying apparently diverse approaches to identification of mean counterfactuals and mean treatment effects. The broader lesson of this paper is that in general unordered discrete choice models restrictions on choice behavior, encoded in the kernel of the mixture representing the IV equations, play a fundamental role in identifying counterfactuals using instrumental variables.

Supplementary Material

Online Appendix

NIHMS938978-supplement-Online_Appendix.pdf (909KB, pdf)

Acknowledgments

We thank Joshua Shea for a close and perceptive read of the paper and helpful comments. We thank the participants in various seminars at the University of Chicago and members of the audience at various regional meetings of the Econometric Society for helpful comments. This research was supported in part by: the American Bar Foundation; the Pritzker Children’s Initiative; the Buffett Early Childhood Fund; NIH grants NICHD R37HD065072, NICHD R01HD54702, and NIA R24AG048081. The views expressed in this paper are solely those of the authors and do not necessarily represent those of the funders or the official views of the National Institutes of Health. The Web Appendix for this paper is posted at https://cehd.uchicago.edu/monotonicity_identifiability.

Footnotes

¹

See the statement of purpose for the Econometric Society by Ragnar Frisch (1933).

²

Theil (1953, 1958) developed two stage least-squares, the leading instrumental variable estimator.

³

By the economist.

⁴

This concept is more accurately interpreted as “uniformity” and does not correspond to ordinary mathematical definitions of monotonicity. See Heckman and Vytlacil (2005, 2007a).

⁵

See Heckman et al. (2006b).

⁶

For the binary choice model, these are the LATE parameters of Imbens and Angrist (1994). Their extension of LATE to situations with multiple choices assumes that indicators of choice are naturally ordered (e.g., years of schooling). It assumes a meaningful scalar aggregator can be constructed that is monotonic in the ordered indicators of choice (Angrist and Imbens, 1995). They identify a mixture of LATEs where the weights, but not the LATEs, are identified individually. In general, LATE does not identify a variety of policy relevant parameters. See Heckman and Vytlacil (2007b) or Heckman (2010).

⁷

See Heckman and Vytlacil (2007b) and Blevins (2014).

⁸

Continuity of instruments and full support produce identifiability in our model, but are not required. See Heckman and Pinto (2015a). In related work, Lee and Salanié (2016) use a general framework to investigate multivalued choice models defined by separable threshold-crossing rules. They show that the identification of causal effects is possible with enough variation in instrumental variables that assume values on a continuum (“identification at infinity”).

⁹

For an empirical application of unordered monotonicity, see Pinto (2016a), who evaluates the Moving to Opportunity Experiment.

¹⁰

By policy-invariant, we mean functions whose maps remain invariant under manipulation of the arguments. This is the notion of autonomy developed by Frisch (1938) and Haavelmo (1944). For a recent discussion of these conditions, see Heckman and Pinto (2015b).

¹¹

The assumption that Z is a multiple-valued scalar is a convenience. We can vectorize a matrix of instruments into a scalar form. Thus, we accommodate multiple instruments defined in the usual way.

¹²

Such error terms are often called “shocks” in structural equation models. f_T is a random function that could be written as a deterministic function if we introduced shock ε_T of arbitrary dimension as an argument of the function, where ε_T is independent of V and ε_Y.

¹³

Fixing is a causal operation that captures the notion of external (ceteris paribus) manipulation. It is a central concept in the study of causality and dates back to Haavelmo (1943). See Heckman and Pinto (2015b) for a recent discussion of fixing and causality.

¹⁴

Counterfactual E(Y (t)) and conditional expectation E(Y |T = t) differ if the conditional and unconditional distributions of V are different: E(Y (t)) = ∫ E(Y (t)|V = v)dF_V (v) ≠ ∫ E(Y |V = v, T = t)dF_{V|T=t}(v) = E(Y |T = t) where F_V is the CDF of V and F_{V|T=t} is the CDF of V conditional on T = t. See Heckman and Pinto (2015b).

¹⁵

Different notions of response vectors are used in the literature. In our notation, response vectors correspond to the choices a person of type V would make when confronted by different values of Z. Robins and Greenland (1992) initiated the literature. Frangakis and Rubin (2002) use the term “principal strata.” They do not explicitly model V or use the econometric framework (1)–(3) so the relationship between strata and V and the fact that conditioning on S is equivalent to conditioning on regions of V is only implicit in their analysis. T(z) can potentially take as many as N_T values for each value z ∈ supp(Z). Since Z has | supp(Z)| = N_Z elements, supp(S) can have at most $N_{T}^{N_{Z}}$ elements.

¹⁶

Figure B.3 in Web Appendix B displays our IV model with response vector S as a Directed Acyclic Graph (DAG).

¹⁷

The regions are distinct because f_T (·) is a function.

¹⁸

S being a balancing score means that properties of V are inherited by S. Formally, S = f_S(V ) is a surjective function of V that satisfies Y (t) ⫫ T|V ⇒ Y (t) ⫫ T|f_S(V ), and σ(S) ⊆ σ(V ) where σ denotes a σ-algebra in the probability space (Ω, ℱ, P).

¹⁹

See Heckman (2008) for a survey of a wide array of methods that implement this principle.

²⁰

If we set κ(Y ) = Y, we equate expected values of observed outcomes with expected counterfactual outcomes. Setting κ(Y) = 1[Y ≤ y], we equate the cumulative distribution function (CDF) of the observed outcome with the unobserved CDF of counterfactual outcomes.

²¹

Candidates for X are baseline variables caused by V. Knowledge of the X variables helps to identify the observed characteristics of persons within strata.

²²

Across the two equation systems for T and scalar Y there are (2·N_T −1)·N_Z observed quantities and (N_T)^2·N_Z+1 unknown parameters.

²³

See, e.g., Clogg (1995) and Henry et al. (2014).

²⁴

The Moore-Penrose inverse of a matrix A is denoted by A⁺ and is defined by the four following properties: (1) AA⁺A = A; (2) A⁺AA⁺ = A⁺; (3) A⁺A is symmetric; (4) AA⁺ is symmetric. The Moore-Penrose matrix A⁺ of a real matrix A is unique and always exists (Magnus and Neudecker, 1999).

²⁵

See Appendix A.3.

²⁶

See Section A.4 of the Web Appendix for bounds on the response-type probabilities and counterfactual outcomes.

²⁷

See, e.g., Yakowitz and Spragins (1968) and Prakasa Rao (1992).

²⁸

We also have that $B_{t_{1}} = ι_{N_{Z}} ι_{N_{S}}^{'} - B_{t_{0}}$ , where ι_N denotes an N-dimensional vector of elements 1.

²⁹

Imbens and Angrist (1994) do not use indicator functions. This is one innovation of our analysis. They compare the values of the counterfactual choices directly, e.g., T_ω(z) ≥ T_ω(z′), assuming the T are ordered. In their analysis, the values that choice T takes must be ordered. Our approach does not require T to be ordered. The two monotonicity criteria are equivalent for the binary choice model.

³⁰

Recall R does not have redundant rows or columns. Otherwise stated,.

³¹

In Section 6, we present a generalization of the triangular property for matrices called “lonesum matrices.”

³²

Heckman and Pinto (2015c) show that in the general case of multi-valued treatments, Ordered Monotonicity A-2 and unordered monotonicity A-3 do not imply each other.

³³

See Web Appendix Table D.1.

³⁴

If we assume an ordered choice model, we can readily secure identification. If we only assume a partially ordered model we lose identification. Heckman and Pinto (2015c) discuss these cases.

³⁵

See the elimination analysis in Table D.1 of Web Appendix D.

³⁶

Under this rationale, it follows that: Λ_ω(z_b, t_a) = Λ_ω(z_no, t_a), Λ_ω(z_b, t_b) = Λ_ω(z_bc, t_b), and Λ_ω(z_b, t_c) = Λ_ω(z_no, t_c).

³⁷

See Pinto (2016a).

³⁸

This would occur if utility is quasilinear in γ.

³⁹

A stronger assumption is homothetic preferences on consumption goods.

⁴⁰

See the elimination analysis in Table D.5 of Web Appendix D.

⁴¹

See the elimination analysis in Table D.6 of Web Appendix D.

⁴²

See Table D.13 in Web Appendix D.3 for the elimination analysis.

⁴³

This claim is formally proved in the next section using Condition (iii) of Theorem T-3.

⁴⁴

This paper does not consider issues of estimation and inference. If certain parameters are over-identified from different instrument configurations, the obvious approach is to combine estimators using efficient GMM (Hansen, 1982).

⁴⁵

See Ryser (1957), Brualdi (1980), Brualdi and Ryser (1991) and Sachnov and Tarakanov (2002) for surveys of the properties of binary matrices.

⁴⁷

Alternatively, this can be written as: for any z, z′ ∈ supp(Z) and t ∈ supp(T), we have that:

(1 [T = t] ∣ Z = z, V = v) \geq (1 [T = t] ∣ Z = z^{'}, V = v) for all v \in supp (V) or (1 [T = t] ∣ Z = z, V = v) \leq (1 [T = t] ∣ Z = z^{'}, V = v) for all v \in supp (V) .

⁴⁸

See Web Appendix E for a derivation.

⁴⁹

Heckman and Pinto (2015c) show that lonesum matrices also play a key role in equivalence results for ordered monotonicity. Pinto (2016b) develops a framework for the design of social interventions using lonesum matrices that rely on revealed preference relationships to identify causal parameters. He shows that incentive designs based on lonesum matrices generate a range of monotonicity conditions.

⁵⁰

Web Appendix F presents all of the 66 response matrices that consist of distinct sets of 7 response-types generated by unordered monotonicity A-3.

⁵¹

In this case, we obtain the following forbidden sub-matrix: $(\begin{matrix} t_{a} & t_{c} \\ t_{b} & t_{a} \end{matrix})$ .

⁵²

These are termed “prohibited” or “forbidden” patterns of binary matrices (see Ryser, 1957).

⁵³

Violation of Condition (ii) is not necessarily a violation of rationality. Table 6 is based on an application of WARP, but violates Condition (ii) and unordered monotonicity.

⁵⁴

Recall that we eliminate all redundancies in the rows or columns of R.

⁵⁵

With the caveat that we eliminate any redundancies in the rows and columns of R.

⁵⁶

Web Appendix H discusses the threshold property of condition (iv) in greater detail.

⁵⁷

Consider a binary choice model: T = 1[α + V · Z ≥ 0] T ∈ {0, 1} where (α, V) is a random vector and V, Z, and α are scalar. Suppose that we impose the requirement that V > 0 while Z is unrestricted. This is a monotone response model that is nonseparable. However, it can be represented as a separable model: $T = 1 [\frac{α}{V} + Z \geq 0]$ . This highlights the non-uniqueness of the representation of Ψ(t,Z, V) and that separability is only one characterization of preferences consistent with A-3.

⁵⁸

This transformation does not change an agent’s preferences towards choices in supp(T):

$Ψ(t, z, v) \geq Ψ(\tilde{t}, z, v) \Leftrightarrow \left(Ψ(t, z, v) - \max_{t' \in \text{supp}(T) \setminus \{t\}} Ψ(t', z, v)\right) \geq \left(Ψ(\tilde{t}, z, v) - \max_{t' \in \text{supp}(T) \setminus \{\tilde{t}\}} Ψ(t', z, v)\right)$.
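A small numerical check of this order-preservation claim, offered as a sketch rather than a proof. It assumes a finite choice set and fixes (z, v), so that Ψ reduces to a dictionary of utilities; the names `normalized` and `order_preserved` are hypothetical.

```python
import random
from itertools import permutations


def normalized(psi):
    """Subtract from each utility the best utility among the other treatments.

    psi maps each treatment to its utility at a fixed (z, v).
    """
    return {
        t: u - max(u2 for t2, u2 in psi.items() if t2 != t)
        for t, u in psi.items()
    }


def order_preserved(psi):
    """True if every pairwise weak ranking is the same before and after."""
    norm = normalized(psi)
    return all(
        (psi[t] >= psi[s]) == (norm[t] >= norm[s])
        for t, s in permutations(psi, 2)
    )


example = {"a": 3, "b": 1, "c": 2}
print(normalized(example))

random.seed(1)
# Integer utilities (ties included) keep the comparison exact.
all_preserved = all(
    order_preserved({t: random.randint(-5, 5) for t in "abc"})
    for _ in range(5_000)
)
print(all_preserved)
```

After the transformation only the most-preferred treatment has a non-negative value, which is what makes the normalized index convenient for writing choices as threshold crossings.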

⁵⁹

In general, a richer class of treatment effects can be identified in the unordered model than in the ordered or binary models. See Heckman and Pinto (2016).

⁶⁰

An alternative terminology would be t-Compliers.

⁶¹

See Heckman et al. (2006a). This analysis readily generalizes to any columns of switchers. This treatment effect can always be identified under the assumption of unordered monotonicity. Treatment effects for other categories of switchers can also be identified under these conditions. Additional treatment effects can sometimes be identified. See Heckman and Pinto (2016).

A version of this paper was first presented as the Presidential Address to the Econometric Society delivered in Taipei, June 22, 2014, under the title “Causal Models, Structural Equations, and Identification: Stratification and Instrumental Variables.” An earlier version was presented by Heckman at the European Seminar on Bayesian Econometrics in Vienna, November 1, 2012, hosted by Sylvia Fruehwirth-Schnatter, under the title “Causal Analysis After Haavelmo: Definitions and a Unified Analysis of Identification of Causal Models.”

Contributor Information

James J. Heckman, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, Phone: 773-702-0634

Rodrigo Pinto, Department of Economics, University of California Los Angeles, 8385 Bunche Hall, Los Angeles, CA 90095, Phone: 310-825-9528.

References

Angrist JD, Imbens GW. Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity. Journal of the American Statistical Association. 1995;90:431–442.
Angrist JD, Imbens GW, Rubin D. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association. 1996;91:444–455.
Blevins JR. Nonparametric Identification of Dynamic Decision Processes with Discrete and Continuous Choices. Quantitative Economics. 2014;5:531–554.
Brualdi RA. Matrices of zeros and ones with fixed row and column sum vectors. Linear Algebra and Its Applications. 1980;33:159–231.
Brualdi RA, Ryser HJ. Encyclopedia of Mathematics and its Applications. New York, NY: Cambridge University Press; 1991. Combinatorial Matrix Theory.
Carrasco M, Florens J-P, Renault E. Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier; 2007. pp. 5633–5751.
Clogg CC. Latent Class Models. In: Arminger G, Clogg CC, Sobel ME, editors. Handbook of Statistical Modeling for the Social and Behavioral Sciences. chap. 6. New York: Plenum Press; 1995. pp. 311–359.
Frangakis CE, Rubin D. Principal stratification in causal inference. Biometrics. 2002;58:21–29. doi: 10.1111/j.0006-341x.2002.00021.x.
Frisch R. Editor’s Note. Econometrica. 1933;1:1–4.
Frisch R. Paper given at League of Nations. 1938. Autonomy of Economic Relations: Statistical versus Theoretical Relations in Economic Macrodynamics. Reprinted in D.F. Hendry and M.S. Morgan (1995), The Foundations of Econometric Analysis, Cambridge University Press.
Haavelmo T. The Statistical Implications of a System of Simultaneous Equations. Econometrica. 1943;11:1–12.
Haavelmo T. The Probability Approach in Econometrics. Econometrica. 1944;12:iii–vi. 1–115.
Hansen LP. Large Sample Properties of Generalized Method of Moments Estimators. Econometrica. 1982;50:1029–1054.
Heckman JJ. The Principles Underlying Evaluation Estimators with an Application to Matching. Annales d’Economie et de Statistiques. 2008;91–92:9–73.
Heckman JJ. Building Bridges Between Structural and Program Evaluation Approaches to Evaluating Policy. Journal of Economic Literature. 2010;48:356–398. doi: 10.1257/jel.48.2.356.
Heckman JJ, Moon SH, Pinto R, Savelyev PA, Shaikh A, Yavitz AQ. The Perry Preschool Project: A Reanalysis. University of Chicago, Department of Economics; 2006a. Unpublished manuscript.
Heckman JJ, Pinto R. Alternative Ways to Identify PS. University of Chicago; 2015a. Aug, Unpublished manuscript. 2015.
Heckman JJ, Pinto R. Causal Analysis after Haavelmo. Econometric Theory. 2015b;31:115–151. doi: 10.1017/S026646661400022X.
Heckman JJ, Pinto R. Comparing Ordered and Unordered Choice Models. University of Chicago; 2015c. Jul, Unpublished manuscript. 2015.
Heckman JJ, Pinto R. Ordered and Unordered Monotonicity. UCLA Economics; 2016. Unpublished manuscript.
Heckman JJ, Robb R. Alternative Methods for Evaluating the Impact of Interventions: An Overview. Journal of Econometrics. 1985;30:239–267.
Heckman JJ, Urzúa S, Vytlacil EJ. Understanding Instrumental Variables in Models with Essential Heterogeneity. Review of Economics and Statistics. 2006b;88:389–432.
Heckman JJ, Urzúa S, Vytlacil EJ. Instrumental Variables in Models with Multiple Outcomes: The General Unordered Case. Annales d’Economie et de Statistique. 2008;91–92:151–174.
Heckman JJ, Vytlacil EJ. Structural Equations, Treatment Effects and Econometric Policy Evaluation. Econometrica. 2005;73:669–738.
Heckman JJ, Vytlacil EJ. Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier; 2007a. pp. 4779–4874. chap. 70.
Heckman JJ, Vytlacil EJ. Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Economic Estimators to Evaluate Social Programs, and to Forecast Their Effects in New Environments. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. 6B. Amsterdam: Elsevier B. V; 2007b. pp. 4875–5143. chap. 71.
Henry M, Kitamura Y, Salanié B. Partial Identification of Finite Mixtures in Econometric Models. Quantitative Economics. 2014;5:123–144.
Imbens GW, Angrist JD. Identification and Estimation of Local Average Treatment Effects. Econometrica. 1994;62:467–475.
Lee S, Salanié B. Identifying Effects of Multivalued Treatments. 2016. Unpublished manuscript.
Magnus J, Neudecker H. Matrix Differential Calculus with Applications in Statistics and Econometrics. 2nd ed. Wiley; 1999.
Pinto R. Unpublished paper based on Ph.D. Thesis. University of Chicago, Department of Economics; 2016a. Learning from Noncompliance in Social Experiments: The Case of Moving to Opportunity.
Pinto R. Randomized Biased-Controlled Trials: Adding Incentives to the Design of Experiments. 2016b. Unpublished paper.
Prakasa Rao BLS. Identifiability in Stochastic Models: Characterization of Probability Distributions. chap. 8. Boston, MA: Academic Press; 1992. Identifiability for Mixtures of Distributions; pp. 183–228.
Quandt RE. A New Approach to Estimating Switching Regressions. Journal of the American Statistical Association. 1972;67:306–310.
Robins JM, Greenland S. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
Ryser H. Combinatorial properties of matrices of zeros and ones. Canadian Journal of Mathematics. 1957;9:371–377.
Sachnov VN, Tarakanov VE. In: Combinatorics of Nonnegative Matrices. Kolchin Valentin F., translator. Providence, RI: American Mathematical Society; 2002. no. 213 in Translations of Mathematical Monographs.
Theil H. Estimation and Simultaneous Correlation in Complete Equation Systems. The Hague: Central Planning Bureau; 1953. mimeographed memorandum.
Theil H. Economic Forecasts and Policy. Amsterdam: North-Holland Publishing Company; 1958. no. 15 in Contributions to Economic Analysis.
Vytlacil EJ. Independence, Monotonicity, and Latent Index Models: An Equivalence Result. Econometrica. 2002;70:331–341.
Vytlacil EJ. Ordered Discrete Choice Selection Models: Equivalence, Nonequivalence, and Representation Results. Stanford University, Department of Economics; 2004. Unpublished manuscript.
Vytlacil EJ. Ordered Discrete-Choice Selection Models and Local Average Treatment Effect Assumptions: Equivalence, Nonequivalence, and Representation Results. Review of Economics and Statistics. 2006;88:578–581.
Yakowitz SJ, Spragins JD. On the Identifiability of Finite Mixtures. Annals of Mathematical Statistics. 1968;39:209–214.

Associated Data

Supplementary Materials

Online Appendix

NIHMS938978-supplement-Online_Appendix.pdf (909 KB, PDF)
