Published in final edited form as: Econometrica. 2023 Nov;91(6):2155–2185. doi: 10.3982/ecta19367

Nonrandom Exposure to Exogenous Shocks

Kirill Borusyak and Peter Hull

Abstract

We develop a new approach to estimating the causal effects of treatments or instruments that combine multiple sources of variation according to a known formula. Examples include treatments capturing spillovers in social or transportation networks and simulated instruments for policy eligibility. We show how exogenous shocks to some, but not all, determinants of such variables can be leveraged while avoiding omitted variables bias. Our solution involves specifying counterfactual shocks that may as well have been realized and adjusting for a summary measure of non-randomness in shock exposure: the average treatment (or instrument) across shock counterfactuals. We use this approach to address bias when estimating employment effects of market access growth from Chinese high-speed rail construction.

1. Introduction

Many questions in economics involve the causal effects of treatments which are computed from multiple sources of variation, and sometimes observed at different “levels,” according to a known formula. Consider three examples. First, when estimating spillovers from a randomized intervention, one might count the number of an individual’s neighbors who were selected for the intervention. This spillover treatment combines variation in who was selected with variation in who neighbors whom. Second, in studies of transportation infrastructure effects, one might measure the growth of regional market access: a treatment computed from the location and timing of transportation upgrades and the spatial distribution of economic activity in a country. A third example is a treatment capturing individual eligibility for a public program, such as Medicaid, which is jointly determined by the eligibility policy in the individual’s state and her household’s demographics and income.1

This paper develops a new approach to estimating the effects of such composite variables when some, but not all, of their determinants are generated by a true or natural experiment. We ask, for example, how one can estimate market access effects by leveraging the timing of new railroad line construction as exogenous shocks, when the other determinants of market access (such as the pre-determined location of large markets and planned lines) are non-random.

We first show that omitted variable bias (OVB) may confound conventional regression approaches in such settings. Bias arises from different observations receiving systematically different values of the treatment because of their individual non-random “exposure” to the exogenous shocks. For example, even when construction is delayed for a random set of lines, regions that are economically or geographically more central will tend to see a larger growth in market access because they are closer to a typical potential line (and thus closer to a typical constructed line). Regression estimation of market access effects then fails without an additional assumption on the exogeneity of economic geography: that more exposed (e.g., central) regions do not differ in their relevant unobservables, such as changes in local productivity or amenities. Intuitively, randomizing transportation upgrades does not randomize the market access growth generated by them.

Our solution to the OVB challenge is based on the specification of counterfactual exogenous shocks that might as well have been realized. This approach views the observed shocks as one realization of some data-generating process—what we call the shock assignment process—which can be simulated to obtain counterfactuals. In a true experiment, the shock assignment process is given by the randomization protocol. In natural experiments, shock counterfactuals make explicit the contrasts which the researcher wishes to leverage, for instance by specifying permutations of the shocks that were as likely to have occurred. For example, if line construction delays are considered as-good-as-random, one might produce counterfactual network maps by randomly exchanging the lines which were completed earlier and later.

Valid shock counterfactuals can be used to avoid OVB by a “recentering” procedure which involves measuring and appropriately adjusting for a single confounder: the expected treatment. To do so, a researcher draws counterfactual shocks from the assignment process and recomputes the instrument many times. Then, for each observation, the treatment is averaged across these many draws to obtain the expected treatment. Finally, the expected treatment is subtracted from the realized treatment to obtain the recentered treatment. We show that using this recentered treatment as an instrument for the realized treatment removes the bias from non-random shock exposure. Intuitively, observations only get high vs. low values of the recentered treatment because the observed shocks were drawn instead of the counterfactuals, which is assumed to happen by chance. For example, when the expected treatment is constructed by permuting the timing of new line construction, regressions that instrument with recentered market access growth compare regions which received higher vs. lower market access growth because proximate lines were constructed early vs. late, and not because of the economic geography. Another closely related solution to OVB is to include the expected treatment as a control in the regression of an outcome on the realized treatment; this can be viewed as recentering the treatment while also removing some residual variation in the outcome, in a control function approach.2

This approach to causal inference with composite variables, in which some determinants are labeled as exogenous and characterized by an assignment process, can be seen as formalizing the natural experiment of interest and bringing composite variables to familiar econometric territory.3 Indeed, the conditions we impose on the exogenous shocks are similar to those which might be used if the shocks were directly used as treatments: e.g., if shocks to the timing of railroad line upgrades were used in a regression of outcomes defined at the “level” of those lines. Recentering ensures identification from the natural experiment, even when the regression is estimated at a different level (e.g., across regions instead of lines).

Our framework further allows the treatment to have endogenous or unobserved determinants. In this case one may construct candidate composite instruments based on the treatment’s exogenous and predetermined components. The same OVB problem arises in this instrumental variable (IV) case, and it can again be solved by recentering the candidate instrument by its expectation over the shock assignment process. Controlling for the expected instrument is again another solution.

We establish several attractive properties of the recentering approach, beyond our primary results on OVB. First, recentered estimators are consistent provided the exogenous shocks induce sufficient cross-sectional variation in the instrument and treatment—regardless of the correlation structure of unobservables. Second, shock counterfactuals can be used for exact finite-sample inference and specification tests via randomization inference (RI). Finally, while our consistency and RI results rely on an assumption of constant treatment effects, recentered IV estimators generally capture a convex average of heterogeneous effects under a natural first-stage monotonicity condition.

We apply this framework to estimate the employment effects of market access (MA) growth due to the new high-speed rail (HSR) system in China. We show how recentering can help leverage variation in the timing of transportation upgrades to purge OVB. Simple regressions of employment growth on MA growth suggest a large and statistically significant effect, which is only partially reduced by conventional geography-based controls. But this effect is eliminated when we adjust for expected MA growth, measured by permuting constructed HSR lines with similar ones that were planned but not built. The unadjusted estimates thus reflect the fact that employment grew in regions which were more exposed to planned high-speed rail construction, whether or not construction actually occurred.

Econometrically, expected treatment and instrument adjustment is similar to propensity score methods for removing OVB (Rosenbaum and Rubin 1983), with two key differences. First, we propose using the structure of composite treatments and instruments to compute their expectation from more primitive assumptions on the assignment process for exogenous shocks. This approach is similar to how Borusyak et al. (2022) and Aronow and Samii (2017) address OVB when using linear shift-share instruments and network treatments, respectively. It differs from conventional methods of directly estimating propensity scores; such methods are typically infeasible in the settings we consider because the exposure to exogenous shocks is intractably high-dimensional. Second, our regression-based adjustment differs from conventional approaches of weighting by or matching on propensity scores.4 Regression adjustment is more popular in applied research, avoids practical issues of limited overlap (due to, e.g., propensity scores that are close to zero or one), does not require the treatments or instruments to be binary, and is natural for estimating constant structural parameters or convex averages of heterogeneous treatment effects.

The remainder of this paper is organized as follows. The next section motivates our analysis with three examples related to network spillovers, market access effects, and Medicaid eligibility effects. Section 3 develops our general framework and results. Section 4 presents our application, and Section 5 concludes. Additional results and extensions are given in an earlier working paper, Borusyak and Hull (2021, henceforth BH).

2. Motivating Examples

We develop three stylized examples, inspired respectively by the settings of Miguel and Kremer (2004), Donaldson and Hornbeck (2016), and Currie and Gruber (1996), to illustrate the main insights of this paper. In each example we consider estimating the parameter β of a causal or structural model which relates an outcome y_i to a treatment x_i,

y_i = β x_i + ε_i,  (1)

for a set of units i = 1, …, N with an unobserved error ε_i. The common feature of the examples is that x_i is computed from multiple sources of variation by a known formula.

Network spillovers:

Suppose y_i is student i's educational achievement and x_i counts the number of i's neighbors who have been dewormed in an intervention:

x_i = Σ_{k=1}^{N} Neighbor_ik · Dewormed_k.

Here Dewormed_k ∈ {0,1} is an indicator for student k being selected for the deworming intervention and Neighbor_ik ∈ {0,1} indicates that i and k are neighbors (i.e., connected by an observed network link). The error term ε_i captures i's educational outcome when none of her neighbors are dewormed. This example is a stylized version of the main specification in Miguel and Kremer (2004).5
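For concreteness, here is a minimal sketch (in Python, with simulated data and illustrative variable names, not the authors' code) of how this composite spillover treatment is computed from its two ingredients:

```python
import numpy as np

# Minimal sketch with simulated data: the spillover treatment
# x_i = sum_k Neighbor_ik * Dewormed_k, computed from an adjacency matrix
# and a deworming-assignment vector. All names are illustrative.
rng = np.random.default_rng(0)
N = 200
neighbor = rng.integers(0, 2, size=(N, N))   # Neighbor_ik in {0, 1}
neighbor = np.triu(neighbor, 1)              # symmetric network with no self-links
neighbor = neighbor + neighbor.T
dewormed = rng.binomial(1, 0.3, size=N)      # Dewormed_k in {0, 1}

x = neighbor @ dewormed                      # number of dewormed neighbors of each student i
```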

Market access:

Suppose y_i is the growth of land values in region i between two dates t ∈ {0,1} and x_i = log MA_i1 − log MA_i0 is the growth of regional market access (MA) due to improvements to the interregional railroad network. Market access is computed as

MA_it = Σ_{j=1}^{N} Pop_j · τ(Network_t, Loc_i, Loc_j),

following standard models of economic geography (e.g. Redding and Venables (2004)). Here Pop_j is the time-invariant population of region j, Network_t is the set of railway lines and other types of transit which comprise the transportation network in operation at time t, Loc_j is the location of region j on the map, and τ(·) is a decreasing function of the travel time between regions i and j implied by the network. The error term ε_i captures location i's land value growth in the absence of market access growth, due to some regional amenity and productivity shocks. Similar market access growth specifications are considered in, for example, Donaldson and Hornbeck (2016).
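As an illustration, here is a minimal sketch (Python, with simulated geography; the linear travel-time model and the `market_access` helper are stand-ins, not the paper's specification) of computing MA levels and the resulting growth treatment:

```python
import numpy as np

# Minimal sketch with simulated geography: MA_it = sum_j Pop_j * tau(Network_t, Loc_i, Loc_j),
# where tau(.) is a decreasing function of travel time. The travel-time model
# below (distance / network speed) is purely illustrative.
rng = np.random.default_rng(0)
N = 100
pop = rng.uniform(1.0, 10.0, size=N)                 # Pop_j, time-invariant
loc = rng.uniform(0.0, 1.0, size=(N, 2))             # Loc_j

def market_access(network_speed):
    dist = np.linalg.norm(loc[:, None, :] - loc[None, :, :], axis=2)
    travel_time = dist / network_speed               # faster network -> shorter travel times
    proximity = np.exp(-travel_time)                 # stand-in for tau(.), decreasing in time
    return proximity @ pop                           # MA_i under this network

x = np.log(market_access(2.0)) - np.log(market_access(1.0))   # MA growth from a network upgrade
```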

Medicaid eligibility:

Suppose y_i is individual i's health outcome and x_i ∈ {0,1} indicates her eligibility for Medicaid. Let IncDem_i be a vector of individual income and demographics, State_i ∈ {1, …, 50} index i's state of residence, and Policy_k be state k's eligibility policy: i.e. the set of income and demographic groups eligible for Medicaid in that state. Then

x_i = 1[IncDem_i ∈ Policy_{State_i}].

The error term εi captures individual i’s outcome when she is ineligible for Medicaid. This example comes from Currie and Gruber (1996).
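A minimal sketch of this composite eligibility indicator (Python; the policy sets, state labels, and category names are made up for illustration and are not actual Medicaid rules):

```python
# Minimal sketch: eligibility x_i = 1[IncDem_i in Policy_{State_i}], with each state's
# policy stored as a set of eligible (income bracket, demographic group) cells.
policy = {
    "A": {("income < 100% FPL", "child"), ("income < 100% FPL", "pregnant")},
    "B": {("income < 50% FPL", "child")},
}

def eligibility(inc_dem, state):
    return int(inc_dem in policy[state])

x_i = eligibility(("income < 100% FPL", "child"), state="A")   # -> 1
```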

To estimate β in each example, we consider a true or natural experiment that manipulates some of the determinants of x_i. Formally, we partition the variables from which x = (x_1, …, x_N) is computed into two groups: a set of shocks g and a set of predetermined variables w. The shocks g = (g_1, …, g_K) are assumed to be exogenous, i.e. independent of the errors ε = (ε_1, …, ε_N). Shock exogeneity combines two conceptually distinct assumptions—that g is as-good-as-randomly assigned, and that this assignment only affects the outcome of each unit i via its treatment x_i (an exclusion restriction). The shocks can be assigned at a different "level" than the observations, with K ≠ N. The remaining variables w have an arbitrary structure, and f_i(·; w) governs the mapping from the exogenous shocks to each unit's treatment, i.e. the observation's "exposure" to the shocks.6 We assume that w is determined prior to the (natural) experiment and is unaffected by the shocks.

Network spillovers (cont.)

Suppose deworming is assigned in a randomized control trial (RCT) and βx_i fully captures its spillover effects. Then g = (Dewormed_k)_{k=1}^{K} collects the exogenous shocks, for K = N.7 The remaining determinants of the spillover treatments of all units, w = (Neighbor_ik)_{i,k=1}^{N}, are fixed in the experiment.

Market access (cont.)

Suppose the timing of new railroads is exogenous. Specifically, suppose that among K lines planned to be constructed by t = 1 some are randomly delayed by unexpected engineering problems (unrelated to the trends in regional land values). Suppose also the model of economic geography is correctly specified, so β log MA_it fully captures the effects of transportation upgrades. Then g = (Open_k)_{k=1}^{K} collects the exogenous shocks, where Open_k is an indicator for whether planned line k faces no delays. Assuming no other changes to the network at t = 1, we can partition the determinants of MA growth into g and w = ((Loc_i, Pop_i)_{i=1}^{N}, Network_0), as Network_1 is fully determined by Network_0 and the set of newly opened lines.

Medicaid eligibility (cont.)

Suppose Medicaid policies across the K = 50 states are exogenous, i.e. are chosen irrespective of the potential health outcomes and affect individual outcomes only via Medicaid eligibility. Then g = (Policy_k)_{k=1}^{K} collects the exogenous shocks, with the other determinants of eligibility collected in w = (IncDem_i, State_i)_{i=1}^{N}.

The first point of this paper is that ordinary least squares (OLS) estimation of β can suffer from OVB, despite the experimental variation underlying xi.8 The OVB problem arises because some units receive systematically higher values of xi than others, as a consequence of their non-random exposure to the shocks. This systematic variation may be cross-sectionally correlated with the errors εi, generating bias in OLS estimation of equation (1).

Network spillovers (cont.)

Even when deworming is randomly assigned to students, those with more neighbors (e.g., because they live in dense urban areas) will tend to have more dewormed neighbors and therefore be more exposed to the deworming intervention. Urban areas may have different educational outcomes for reasons unrelated to deworming, generating OVB.

Market access (cont.)

Even when the opening status of lines is as-good-as-randomly assigned, regions in the economic and geographic center of the country will tend to see more market access growth than peripheral regions as the former are closer to a typical potential line. Central regions may face different amenity and productivity shocks, generating OVB.

Medicaid eligibility (cont.)

Even when Medicaid policies are as-good-as-randomly assigned to states, poorer individuals will tend to see higher rates of eligibility. Poor individuals may face different health shocks, generating OVB.

Our second insight is that this OVB problem has a conceptually simple solution, which follows from viewing the set of realized g as one draw from a shock assignment process and considering what counterfactual sets of exogenous shocks could have as likely been drawn. The specification of such counterfactuals allows one to measure and remove the systematic component of variation in the treatment which drives OVB. Specifically, the researcher recomputes the treatment x_i of each unit i across many counterfactual sets of shocks and takes their average to measure the expected treatment, μ_i. We show that this μ_i, which is co-determined by the exposure of x_i to the shocks g and the shock assignment process, is the sole confounder in equation (1). OVB can then be purged by "recentering" the treatment, i.e. instrumenting x_i with x̃_i = x_i − μ_i in equation (1), or by simply adding μ_i as a control in OLS estimation. The key to removing bias with this approach is thus to credibly specify and average over shock counterfactuals—a task which is trivial in true experiments and which otherwise formalizes the natural experiment of interest.

Network spillovers (cont.)

With deworming assigned in an RCT, the shock assignment process is given by the known randomization protocol. If, say, each student has a 30% chance of being dewormed, the expected number of i's dewormed neighbors over repeated draws of deworming shocks, μ_i, is 0.3 times their number of neighbors, Σ_{k=1}^{K} Neighbor_ik. OVB is thus purged by controlling for the number of neighbors, or by using the recentered number of dewormed neighbors x̃_i = x_i − μ_i to instrument for x_i. With either adjustment, the regression will only compare students who had more neighbors dewormed than expected (given the network) to those with fewer-than-expected dewormed neighbors.
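A minimal sketch of this recentering step under the simple Bernoulli(0.3) protocol (Python, simulated data; in practice the expected treatment would be computed from the actual randomization protocol):

```python
import numpy as np

# Minimal sketch: recentering the spillover treatment when each student is dewormed
# independently with probability p = 0.3. The expected treatment mu_i is approximated
# by averaging over counterfactual assignment draws; under this simple protocol it
# also equals p times the number of neighbors.
rng = np.random.default_rng(0)
N, p, n_draws = 200, 0.3, 2000
neighbor = rng.integers(0, 2, size=(N, N))
np.fill_diagonal(neighbor, 0)
dewormed = rng.binomial(1, p, size=N)                      # realized shocks

x = neighbor @ dewormed                                    # realized treatment
mu = np.mean([neighbor @ rng.binomial(1, p, size=N)        # expected treatment:
              for _ in range(n_draws)], axis=0)            #   average over counterfactual draws
x_tilde = x - mu                                           # recentered treatment

mu_closed_form = p * neighbor.sum(axis=1)                  # equals mu up to simulation error
```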

Market access (cont.)

The as-good-as-random assignment of opening status can be formalized by each planned line facing an equal and independent chance of opening. Then, if Σ_{k=1}^{K} Open_k = K_1 railway lines open by t = 1, every counterfactual network in which K_1 lines from the plan opened was as likely to have occurred. One can thus compute expected MA growth μ_i as the average MA growth of region i across these counterfactuals (or a random subset of them). Recentering by or controlling for this μ_i ensures that the regressions only compare regions which saw higher MA growth than expected—given pre-existing economic geography and the plan—to those which saw less-than-expected MA growth.
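A minimal sketch of this counterfactual-averaging step (Python; the geography, the travel-time savings from each line, and the `ma_growth` helper are simulated stand-ins for illustration only):

```python
import numpy as np
from itertools import combinations

# Minimal sketch: expected MA growth mu_i when K_1 of K planned lines open and every
# size-K_1 subset of the plan is equally likely.
rng = np.random.default_rng(0)
N, K, K_1 = 50, 8, 5
pop = rng.uniform(1.0, 10.0, size=N)
base_time = rng.uniform(1.0, 5.0, size=(N, N))             # travel times with no new lines
time_saved = rng.uniform(0.0, 0.3, size=(K, N, N))         # reduction if line k opens

def ma_growth(open_lines):
    t1 = np.clip(base_time - sum(time_saved[k] for k in open_lines), 0.1, None)
    ma0 = np.exp(-base_time) @ pop                          # tau(.) = exp(-travel time), a stand-in
    ma1 = np.exp(-t1) @ pop
    return np.log(ma1) - np.log(ma0)

realized = tuple(range(K_1))                                # the lines that actually opened (illustrative)
subsets = list(combinations(range(K), K_1))                 # equally likely counterfactual openings
mu = np.mean([ma_growth(s) for s in subsets], axis=0)       # expected MA growth
x_tilde = ma_growth(realized) - mu                          # recentered MA growth
```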

Medicaid eligibility (cont.)

The as-good-as-random assignment of Medicaid policies can be formalized by each state randomly drawing from a pool of potential policies, such that every permutation of the realized policies was equally likely to have occurred. Averaging individual i's eligibility across these permutations yields an expected eligibility μ_i which equals the share of states in which she would be eligible. Our solution is to instrument actual eligibility with recentered eligibility x_i − μ_i, or control for μ_i in an OLS regression. Either approach would, for example, effectively remove from the sample "always-eligible" or "never-eligible" individuals (with x_i = μ_i = 1 or x_i = μ_i = 0) whose income and demographics make them unaffected by policy variation.

The recentering solution generally dominates more conventional ones, such as instrumenting directly by the shocks or controlling for the other determinants of xi. Instrumenting with the shocks is infeasible when the shocks are assigned at a different level than the units, and generally discards variation in treatment due to w. Controlling for an observation’s non-random shock exposure flexibly is typically infeasible, because such exposure is high-dimensional. Conversely, low-dimensional controls are only guaranteed to purge OVB (absent additional non-experimental restrictions on the error term) when they linearly span μi, which is difficult to establish except when μi is known and recentering is feasible. If either the assignment process or shock exposure mapping is complex, μi is unlikely to be a simple function of observed characteristics.

Network spillovers (cont.)

Using student i's own deworming status as an instrument is infeasible as it does not predict the number of dewormed neighbors; incorporating the non-random network adjacency matrix is necessary. Controlling for the entire row of the adjacency matrix (which characterizes the student's exposure) is also infeasible, as it would absorb all cross-sectional variation in the treatment. Controlling for the number of i's neighbors is enough to purge OVB under completely random assignment of deworming, since this control is proportional to μ_i. However, such simple controls would not linearly span μ_i with more complex randomization protocols, such as with two tiers (by school, then by student) or stratification (e.g., with girls dewormed with a known higher probability). Simple controls are also generally insufficient with more complex specifications of spillovers.9

Market access (cont.)

Railroad timing shocks vary at the level of lines, so it is infeasible to use them as instruments for regional market access without incorporating some non-random features of economic geography. Controlling perfectly for these features is also infeasible, as each region’s market access depends on the entire spatial distribution of economic activity. Simple sets of controls, such as polynomials in the latitude and longitude of a region, need not linearly span μi given the complexity of xi, and thus are not guaranteed to purge OVB.

Medicaid eligibility (cont.)

Currie and Gruber (1996) propose instrumenting individual eligibility with a measure of the overall policy generosity of her state—a so-called “simulated instrument.” Such instruments are simple functions of Policyk for all individuals in state k and are thus exogenous and relevant under random policy assignment. However they discard relevant within-state variation in i’s income and demographics and are thus likely to yield a less powerful first-stage prediction of xi than recentered eligibility.10

We conclude this section by noting that the OVB problem and recentering solution both extend to the case with an arbitrary endogenous x_i and a candidate instrument z_i which is constructed from exogenous shocks and other variables by a known formula. This approach is natural when the treatment can be represented as a function of exogenous shocks g, predetermined variables w, and endogenous (and possibly unobserved) variables u: i.e. when x_i = h_i(g, w, u) for a known h_i(·). An intuitive candidate instrument for x_i is the prediction of x_i in the scenario when the u shocks are ignored: z_i = h_i(g, w, 0). Our framework shows that these candidate instruments are generally invalid, again because of the non-random exposure of z_i to g. Yet OVB can again be purged by measuring the expected instrument μ_i—the average z_i across counterfactual g—and either instrumenting x_i with the recentered IV z̃_i = z_i − μ_i or controlling for μ_i while instrumenting with z_i.

Market access (cont.)

Suppose population sizes also change between t = 0 and t = 1, and the observable changes u = (Pop_i1 − Pop_i0)_{i=1}^{N} are not exogenous (e.g. they respond to house price shocks in ε). Then one can consider instrumenting the realized change in MA by a predicted change in MA which keeps population sizes fixed at t = 0 levels. Without recentering, this IV regression may suffer from the same OVB as the OLS regression discussed above. OVB is now avoided by recentering the MA prediction via counterfactual railroad networks.

Medicaid eligibility (cont.)

Suppose one is interested in the effects of Medicaid takeup, instead of eligibility. Takeup is the product of eligibility and 1 − NeverTaker_i, where NeverTaker_i indicates that individual i would decline Medicaid if eligible and u = (NeverTaker_i)_{i=1}^{N} is unobserved. Under the appropriate exclusion restriction one can consider instrumenting takeup with eligibility; our recentering strategy then again removes OVB from non-random variation in policy exposure.

3. Theory

We now develop a general econometric framework for settings with non-random exposure to exogenous shocks. We introduce the baseline setting, develop our approach to estimation based on the recentering procedure, and discuss how this recentering can be performed by specifying counterfactual shocks in Sections 3.1-3.3. We then discuss conditions for consistency of recentered IV estimators in Section 3.4, and how inference can be conducted in Section 3.5. Several extensions are summarized in Section 3.6.

3.1. Setting

We consider estimation of β in the causal or structural model

y_i = β x_i + ε_i,  (2)

from a dataset of scalar and demeaned y_i and x_i, i = 1, …, N. Below we discuss extensions to heterogeneous causal effects, nonlinear models, multiple treatments, and additional control variables. Although we use a single index i for observations, we note our framework accommodates repeated cross-sections and panel data.

Importantly for the applicability of our framework, we do not assume that the observations of yi and xi are independently or identically distributed (iid) as when arising from random sampling. This allows for complex dependencies across the units due to their common exposure to observed and potentially unobserved shocks. It is also consistent with settings where the N units represent a population—for example, all regions of a country—and conventional random sampling assumptions are inappropriate (Abadie et al., 2020).11

We suppose that to estimate β a researcher has constructed a candidate instrument

z_i = f_i(g; w),  (3)

where f_1(·), …, f_N(·) is a list of known non-stochastic functions, g is a K × 1 vector of shocks, and w is a list of other variables of unrestricted dimension. Equation (3) is very general: any z_i that can be computed from a set of observed data, according to a known formula, can be described in this way.12 It also allows x_i = z_i, in which case β is the causal effect of the composite treatment.

We assume that the shocks g are exogenous, which we formalize by their conditional independence from the vector of errors given the other sources of instrument variation:

Assumption 1. (Shock exogeneity): g ⊥ ε | w.

As noted in Section 2, this notion of shock exogeneity combines two conceptually distinct conditions. First, it imposes an exclusion restriction, reflecting an economic model of how g can affect y. Second, it requires as-good-as-random shock assignment. This latter condition is satisfied when the shocks are fully randomly assigned, as in an RCT (i.e., g ⊥ (ε, w)), but also allows w to contain variables that govern the shock assignment process.13 Importantly, Assumption 1 allows E[ε_i | w] to vary arbitrarily across i; this reflects the lack of non-experimental assumptions, such as parallel trends, constraining the error in equation (2).14 Assumption 1 is consistent with a two-step data-generating process, where w is determined prior to the realization of shocks g and errors ε which then together determine (x, y).15

We start by considering an instrumental variable (IV) regression of y_i on x_i that instruments with z_i. As usual, this strategy requires z_i to be relevant to the treatment and orthogonal to the error term. In our non-iid setting, we formalize these two conditions in terms of the full-sample IV moments E[(1/N) Σ_i z_i x_i] and E[(1/N) Σ_i z_i y_i]. Since (2) implies E[(1/N) Σ_i z_i y_i] = β E[(1/N) Σ_i z_i x_i] + E[(1/N) Σ_i z_i ε_i], β is recoverable from the ratio of these moments (what we term identification) under the relevance condition E[(1/N) Σ_i z_i x_i] ≠ 0 and the orthogonality condition E[(1/N) Σ_i z_i ε_i] = 0.16 To start we assume the two IV moments are known, in order to focus on the potential for OVB when the orthogonality condition fails. We discuss conditions for consistent estimation in Section 3.4.

3.2. OVB and Instrument Recentering

We define the expected instrument μ_i = E[f_i(g; w) | w] as the average value of z_i across different realizations of the shocks, conditional on w. Our first result shows that OVB may arise when predetermined exposure to the natural experiment is endogenous, and that the potential for such bias is entirely governed by the relationship between μ_i and the error ε_i. Formally, under Assumption 1 instrument orthogonality need not hold: E[(1/N) Σ_i z_i ε_i] ≠ 0 in general. Rather,

E[(1/N) Σ_i z_i ε_i] = E[(1/N) Σ_i μ_i ε_i].  (4)

This result follows from the law of iterated expectations: E[z_i ε_i] = E[E[f_i(g; w) ε_i | w]] = E[μ_i E[ε_i | w]] = E[μ_i ε_i] for all i, where the second equality uses Assumption 1 and the definition of μ_i.

The central role of μ_i in governing OVB suggests the recentering solution: even though OVB results from potentially high-dimensional variation in units' exposure to shocks, adjustment for the one-dimensional confounder μ_i is sufficient for instrument orthogonality. We adjust z_i by defining the recentered instrument z̃_i = z_i − μ_i. By equation (4), orthogonality always holds for this instrument:

E[(1/N) Σ_i z̃_i ε_i] = E[(1/N) Σ_i z_i ε_i] − E[(1/N) Σ_i μ_i ε_i] = 0.

Thus, if z̃_i is also relevant, β is identified by the IV regression which uses the recentered instrument z̃_i instead of z_i.17

A closely related solution, also suggested by equation (4), is to include the expected instrument μi as a control in specification (2) while using the original zi as the instrument in a control function approach (Wooldridge, 2015). Controlling for μi can be thought of as recentering zi while also removing the residual variation in yi which is cross-sectionally correlated with μi. As usual, removing this residual variation may generate precision gains in large samples; similar gains may arise from including (a fixed number of) any predetermined controls in a recentered IV regression.18

Equation (4) further shows that adjusting for μi is generally necessary for identification, absent additional restrictions on the unobserved error. Conventional controls and fixed effects are only guaranteed to purge OVB when they linearly span μi: a condition that is difficult to verify except when recentering is also feasible.19

Adjustments based on μi, as the sole confounder of zi, are similar to more conventional propensity score methods. There are three key differences, concerning the setting, adjustment method, and computation of μi. First, propensity score methods have mostly been applied to binary treatments, starting from Rosenbaum and Rubin (1983). While generalizations to binary instruments (e.g. Abadie (2003)) and non-binary treatments (e.g. Imbens (2000)) have been proposed, our setting allows for arbitrary treatments or instruments. Second, the propensity score literature has mostly used non-regression adjustment methods, such as matching or binning (Abadie and Imbens, 2016; King and Nielsen, 2019). A notable exception is the E-estimator of Robins et al. (1992), which similarly leverages linearity of an outcome model like (2) to recenter by a scalar variable. Third, and most importantly, propensity scores are usually estimated from the data by relating the treatment to a vector of observation-specific covariates. This approach is generally not feasible because exposure to exogenous shocks is high-dimensional: for instance, as noted in Section 2, the expected market access of any region i depends on the entire economic geography of the country. We therefore take a different approach to computing μi, which we turn to next.

3.3. Computing the Expected Instrument via Shock Counterfactuals

We propose computing the expected instrument by specifying an assignment process for the shocks, drawing many sets of counterfactual shocks from this process, recomputing the candidate instrument each time, and averaging it across the counterfactuals. Here we formalize this approach, discuss general ways in which counterfactual shocks can be specified, and highlight the advantages of our approach over alternatives.

We define the shock assignment process as the conditional distribution of g given w, with cumulative distribution function G(g | w). When G(·) is known, the expected instrument μ_i = ∫ f_i(γ; w) dG(γ | w) can be computed and either used to recenter z_i or added as a regression control.20 To emphasize the importance of a known shock assignment process, we write it as an assumption:

Assumption 2. (Known assignment process): G(g | w) is known in the support of w.

This assumption is unrestrictive when the shocks are determined by a known randomization protocol, as in an RCT or with policy randomizations (such as tie-breaking lottery numbers in centralized assignment mechanisms; Abdulkadiroglu et al. (2017)). The assignment process may also be given by scientific knowledge when the shocks are randomized naturally, such as when g captures weather or seismic shocks governed by meteorological or geological processes (e.g., Carvalho et al. (2021); Madestam et al. (2013)). Policy discontinuities (as in regression discontinuity designs) can also yield a known G() when viewed as generating local randomization around known cutoffs (Lee, 2008; Cattaneo et al., 2015).

In observational data, where the distribution of shocks is unknown, Assumption 2 can be satisfied by specifying some permutations of shocks that were as likely to have occurred. For instance, if one is willing to assume the shocks g_k are iid across k, it follows that all permutations of the observed g are equally likely. In this case G(g | w) is uniform on the permutation class Π(g) = {π(g) : π(·) ∈ Π_K} when w is augmented by Π(g), where Π_K denotes the set of permutation operators π(·) on vectors of length K (e.g. Lehmann and Romano, 2006, p. 634). The distribution of each g_k (conditionally on other components of w) then need not be specified; the expected instrument is the average z_i across all permutations of shocks, which serve as counterfactuals:

μ_i = (1/K!) Σ_{π(·) ∈ Π_K} f_i(π(g); w).

Such μi are easy to compute (or approximate with a random set of permutations).
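A minimal sketch of this permutation-based calculation (Python; `f` stands in for the known instrument formula z_i = f_i(g; w), and random permutations approximate the full average over Π_K):

```python
import numpy as np

# Minimal sketch: approximate the expected instrument mu_i by averaging the known
# instrument formula over random permutations of the observed shocks g (valid when
# the g_k are assumed exchangeable). `f` is a placeholder for z = f(g; w).
def expected_instrument(f, g, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    return np.mean([f(rng.permutation(g)) for _ in range(n_draws)], axis=0)

# toy usage with a fixed exposure matrix W (illustrative only)
rng = np.random.default_rng(1)
W = rng.random((30, 15))                     # exposure of 30 units to 15 shocks
g = rng.normal(size=15)                      # observed shocks
f = lambda shocks: W @ shocks                # z_i = f_i(g; w)
mu = expected_instrument(f, g)
z_tilde = f(g) - mu                          # recentered instrument
```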

Similar expected instrument calculations follow under weaker shock exchangeability conditions, such as when the g_k are iid within, but not across, a set of known clusters and the class of within-cluster permutations is used to draw counterfactuals. We illustrate this approach in Section 4. In BH we discuss how our framework can also apply with G(g | w) specified up to a low-dimensional vector of consistently estimable parameters (Appendix C.5); we also show how Assumption 2 can be derived from an economic model (e.g. of transportation network formation) with stochastic shocks or from symmetries of the joint shock distribution (Appendices D.1 and D.2).

We note that even when G() is challenging to specify, a possibly incorrect specification can be useful as a sensitivity check. Specifically, if Assumption 1 holds and there is already no OVB because the included regression controls perfectly capture either the endogenous features of exposure or the expected instrument, then controlling for any candidate expected instrument mi(w) cannot introduce bias. In this case the researcher may safely control for one or several mi(w) based on some guesses of the assignment process.21 More generally, researchers may achieve additional robustness by controlling for multiple candidate mi(w) based on multiple shock assignment process guesses; only one such guess needs to be right to purge OVB.

3.4. Recentered IV Consistency

With the ratio of recentered IV moments identifying β, we now consider whether the corresponding IV estimator β̂ = ((1/N) Σ_i z̃_i y_i) / ((1/N) Σ_i z̃_i x_i) is consistent, i.e. whether β̂ →p β as the number of observed outcomes and treatments grows large (N → ∞). To formalize consistency in our non-iid context we consider a sequence of distributions P_N for the complete data (y, x, g, w). Only in this section, to make the asymptotic sequence explicit, we index moments by P_N: e.g., we write the recentered IV moments as E_{P_N}[(1/N) Σ_i z̃_i y_i] and E_{P_N}[(1/N) Σ_i z̃_i x_i]. We allow the number of observed shocks, K_N = dim(g), and the dimensions of w to change arbitrarily with N.
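As a point of reference, the estimator itself is one line of code; a minimal sketch (Python, illustrative):

```python
import numpy as np

def recentered_iv(y, x, z, mu):
    """Minimal sketch: beta_hat = (sum_i z_tilde_i * y_i) / (sum_i z_tilde_i * x_i),
    the IV estimate that instruments x with the recentered instrument z - mu."""
    z_tilde = np.asarray(z) - np.asarray(mu)
    return (z_tilde @ np.asarray(y)) / (z_tilde @ np.asarray(x))
```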

We first consider mean-square convergence of (1/N) Σ_i z̃_i ε_i to E_{P_N}[(1/N) Σ_i z̃_i ε_i] = 0 under Assumption 1: i.e., whether Var_{P_N}[(1/N) Σ_i z̃_i ε_i] → 0. Since β̂ = β + ((1/N) Σ_i z̃_i ε_i) / ((1/N) Σ_i z̃_i x_i) by (2), such convergence implies β̂ →p β so long as the instrument is asymptotically relevant (a condition we return to below). We establish this convergence under a regularity condition on ε_i and a substantive restriction on z̃_i, which we term weak mutual dependence:

Assumption 3. (Weak mutual dependence):

E_{P_N}[(1/N²) Σ_{i,j} Cov_{P_N}[z̃_i, z̃_j | w]] → 0.

Proposition 1. Suppose Assumptions 1–3 hold and E_{P_N}[ε_i² | w] ≤ U_ε uniformly across N and i = 1, …, N. Then Var_{P_N}[(1/N) Σ_i z̃_i ε_i] → 0.

Proof. See Appendix D.

Assumption 3 holds when the shocks induce rich cross-sectional variation in the recentered instrument, through heterogeneous exposure, such that most pairs of (z̃_i, z̃_j) have a weak covariance across possible realizations of g. The proof shows this is enough for a law of large numbers to apply to (1/N) Σ_i z̃_i ε_i.22

Note that in line with our approach to identification, Proposition 1 makes no substantive restrictions on the errors ε_i beyond Assumption 1 (in particular, it puts no restrictions on the dependence of ε_i across observations). In the absence of such restrictions, Proposition 2 in Appendix C shows that Assumption 3 is not only sufficient for Var_{P_N}[(1/N) Σ_i z̃_i ε_i] → 0 but, under regularity conditions, also necessary. Of course, more conventional restrictions on the mutual dependence of errors (such as iid or clustered ε_i) may also suffice for convergence when weak mutual dependence of z̃_i fails.

Three additional results in Appendix C, which extend the results on consistency with linear shift-share instruments from Borusyak et al. (2022), unpack Assumption 3 further. First, a large number of exogenous shocks is essentially necessary for the recentered instrument to not have many strong cross-sectional dependencies. Proposition 3 formalizes this intuition by showing that, with sufficiently smooth f_i(·; w), Assumption 3 can only hold with K_N → ∞. Moreover, the concentration of exposure to this growing number of shocks matters. Proposition 4 formalizes this idea by considering a concentration measure for average shock exposure which is similar to a Herfindahl-Hirschman Index (HHI): E_{P_N}[Σ_{k=1}^{K_N} (∂f̄(g; w)/∂g_k)²], where f̄(g; w) = (1/N) Σ_i (f_i(g; w) − μ_i). For binary g_k and weakly monotone f_i(·; w) (as in the network spillovers and market access examples) and with mutually-independent shocks, Assumption 3 is satisfied when this measure converges to zero such that the impact of any finite set of shocks on (1/N) Σ_i z̃_i vanishes.23 Proposition 5 considers a different low-level condition in a case covering the Medicaid eligibility example: Assumption 3 holds when most pairs of observations of z̃_i are affected by non-overlapping sets of shocks.

Convergence of (1/N) Σ_i z̃_i ε_i implies consistency of the recentered IV estimator so long as (i) E_{P_N}[(1/N) Σ_i z̃_i x_i] remains bounded away from zero and (ii) (1/N) Σ_i z̃_i x_i − E_{P_N}[(1/N) Σ_i z̃_i x_i] →p 0.24 Condition (i) follows when the relationship between z_i and x_i is strong and when most observations of z̃_i have exposure concentrated in a small number of exogenous shocks, such that Var_{P_N}[z̃_i] does not dissipate even as K_N → ∞. Proposition 6 in Appendix C formalizes these conditions with a linear first stage model of x_i = π z_i + u_i, with g ⊥ u | w, and a different measure of shock exposure concentration: the HHI of the effects of different shocks on z̃_i, Σ_{k=1}^{K_N} (E_{P_N}[∂f_i(g; w)/∂g_k | w])², averaged across observations i. In the case of mutually-independent binary shocks, we require π ≠ 0 and that the expectation of this concentration measure is bounded above zero. The HHI conditions from Propositions 4 and 6 may simultaneously hold when most observations are mostly exposed to a small number of shocks, differentially across a large number of shocks. Conditions similar to Assumption 3 can be derived to ensure convergence of the sample first stage, (ii).25

3.5. Randomization Inference and Specification Tests

In some applications of our framework, natural assumptions on the mutual independence of z~i or εi across observations can make conventional (e.g. clustered) asymptotic inference valid. Generally, however, the common exposure of observations to observed and unobserved shocks generates complex dependencies across observations making conventional asymptotic analysis inapplicable.26 In such cases, it may be attractive to construct confidence intervals for the constant effect β and tests for Assumptions 1 and 2 based on the specification of the shock assignment process, following a long tradition of randomization inference (RI; Fisher, 1935). The RI approach guarantees correct coverage in finite samples of both observations and shocks.27 We focus on a particular type of RI test which is tightly linked to the recentered IV estimator β^.

RI tests and confidence intervals for β are based on a scalar test statistic T = 𝒯(g, y − bx, w), where b is a candidate parameter value. Under the null hypothesis of β = b and Assumption 1, the distribution of T = 𝒯(g, ε, w) conditional on ε and w is implied by the shock assignment process G(g | w). One may simulate this distribution by redrawing the g shocks and recomputing T. If the original value of T is far in the tails of the simulated distribution, one has grounds to reject the null. Inversion of such tests yields confidence intervals for β by collecting all b that are not rejected. These intervals have correct size, both conditionally on (ε, w) and unconditionally (see Appendix C.3 of BH for details).

We propose addressing the practical issue of choosing a randomization test statistic by picking a T that is tightly linked to the recentered IV estimator, building on the theory of Hodges and Lehmann (1963) and Rosenbaum (2002). Specifically, we consider the sample covariance of the recentered instrument and implied residual: T = (1/N) Σ_i (f_i(g; w) − μ_i)(y_i − b x_i). Lemma 2 of BH shows that β̂ is a Hodges-Lehmann estimator corresponding to this T, meaning that β̂ equates T with its expectation across counterfactual shocks (specifically, zero).28 This connection makes RI tests and confidence intervals based on T inherit the consistency of β̂: the test power asymptotically increases to one for any fixed alternative b ≠ β under additional regularity conditions (Proposition S2 of BH).29
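A minimal sketch of this RI procedure (Python; `f` and `draw_shocks` stand in for the instrument formula and the specified assignment process, both of which are assumptions of the sketch):

```python
import numpy as np

# Minimal sketch: randomization-inference p-value for H0: beta = b using the statistic
# T = (1/N) * sum_i (f_i(g; w) - mu_i) * (y_i - b * x_i), and a confidence interval
# obtained by test inversion over a grid of candidate values b.
def ri_pvalue(b, y, x, g, f, mu, draw_shocks, n_draws=999, seed=0):
    rng = np.random.default_rng(seed)
    resid = y - b * x
    t_obs = np.mean((f(g) - mu) * resid)
    t_cf = np.array([np.mean((f(draw_shocks(rng)) - mu) * resid)   # null distribution of T,
                     for _ in range(n_draws)])                     #   simulated by redrawing shocks
    return (1 + np.sum(np.abs(t_cf) >= np.abs(t_obs))) / (1 + n_draws)

def ri_confidence_interval(grid, y, x, g, f, mu, draw_shocks, alpha=0.05):
    kept = [b for b in grid if ri_pvalue(b, y, x, g, f, mu, draw_shocks) > alpha]
    return (min(kept), max(kept)) if kept else None
```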

Randomization inference can also be used to perform falsification tests on our key Assumptions 1 and 2. Recentering implies a testable prediction that z̃_i is orthogonal to any variable r = (r_i)_{i=1}^{N} satisfying g ⊥ r | w, such as any function of w or other observables (either predetermined or contemporaneous) thought to be conditionally independent of g. To test this restriction, one may check that the sample covariance T = (1/N) Σ_i z̃_i r_i is sufficiently close to zero by drawing counterfactual shocks and checking that T is not in the tails of its conditional-on-(w, r) distribution. Multiple falsification tests, based on a vector of predetermined variables R_i, can be combined by an appropriate RI procedure, e.g. by taking T to be the sample sum of squared fitted values from regressing z̃_i on R_i.30

Falsification tests can be useful in two ways. First, when ri is a lagged outcome or another variable thought to proxy for εi, they provide an RI implementation of conventional placebo and covariate balance tests of Assumption 1. While the use of RI for inference on causal effects may be complicated by treatment effect heterogeneity, the sharp hypothesis of zero placebo effects is a natural null. Second, RI tests will generally have power to reject false specifications of the shock assignment process, i.e. violations of Assumption 2, even when ri does not proxy for εi. For ri=1, for example (which is trivially conditionally independent of g), the test verifies that the sample mean of zi is typical for the realizations of the specified assignment process. Setting ri=μi instead checks that the recentered instrument is not correlated with the expected instrument that it is supposed to remove.
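A minimal sketch of the joint balance version of this test (Python; again `f` and `draw_shocks` are placeholders for the instrument formula and the specified assignment process):

```python
import numpy as np

# Minimal sketch: joint RI falsification test. The statistic is the sum of squared
# fitted values from regressing the recentered instrument on a matrix R of
# predetermined variables; its null distribution is simulated by redrawing shocks.
def joint_balance_pvalue(g, f, mu, R, draw_shocks, n_draws=999, seed=0):
    rng = np.random.default_rng(seed)

    def stat(shocks):
        z_tilde = f(shocks) - mu
        coef, *_ = np.linalg.lstsq(R, z_tilde, rcond=None)   # OLS of z_tilde on R
        return np.sum((R @ coef) ** 2)                       # sum of squared fitted values

    t_obs = stat(g)
    t_cf = np.array([stat(draw_shocks(rng)) for _ in range(n_draws)])
    return (1 + np.sum(t_cf >= t_obs)) / (1 + n_draws)
```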

3.6. Extensions

While we analyze the constant-effect model (2), identification by μ_i-adjusted regressions extends to settings with heterogeneous treatment effects. Namely, Appendix C.1 of BH shows that the recentered IV estimator generally identifies a convex-weighted average of heterogeneous effects under an appropriate monotonicity condition, extending Imbens and Angrist (1994). The weights are proportional to the conditional variance of z̃_i given w across counterfactual shocks, σ_i². These σ_i², like μ_i, are given by the shock assignment process (Assumption 2) and therefore can be computed by the researcher. Moreover, they can be used to identify more conventional weighted average effects. For example, in reduced-form models of the form y_i = β_i z_i + ε_i, a recentered and rescaled IV (z_i − μ_i)/σ_i² identifies the average effect E[(1/N) Σ_{i=1}^{N} β_i]. Similarly, in IV settings with binary x_i and z_i, this rescaled instrument identifies the local average treatment effect of Imbens and Angrist (1994).31
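A minimal sketch of this rescaling (Python; σ_i² is approximated here by the variance of the instrument across the same counterfactual shock draws used for μ_i, with `f` and `draw_shocks` as assumed placeholders):

```python
import numpy as np

# Minimal sketch: the rescaled recentered instrument (z_i - mu_i) / sigma_i^2, with
# mu_i and sigma_i^2 both computed from counterfactual shock draws. Observations
# whose instrument never varies across counterfactuals get zero weight.
def rescaled_instrument(g, f, draw_shocks, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    cf = np.array([f(draw_shocks(rng)) for _ in range(n_draws)])  # counterfactual instruments
    mu, sigma2 = cf.mean(axis=0), cf.var(axis=0)
    z_tilde = f(g) - mu
    return np.divide(z_tilde, sigma2, out=np.zeros_like(z_tilde, dtype=float), where=sigma2 > 0)
```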

Further extensions are given in the appendix of BH. Appendix C.6 shows how predetermined observables can be included as regression controls to reduce residual variation and potentially increase power. Appendix C.7 discusses identification and inference with multiple treatments or instruments. Finally, Appendix C.8 extends the framework to nonlinear outcome models.

4. Application: Effects of Transportation Infrastructure

We now present an empirical application showing how our theoretical framework can be used to avoid OVB in practice. Specifically, we estimate the effect of market access growth on Chinese regional employment growth over 2007–2016, leveraging the recent construction of high-speed rail (HSR). We show how counterfactual HSR shocks can be specified, and how correcting for expected market access growth can help purge OVB.

The recent construction of Chinese HSR has produced a network longer than in all other countries combined (Lawrence et al., 2019). The network mostly consists of dedicated passenger lines and has developed rapidly since 2007.32 Construction objectives included freeing up capacity on the low-speed rail network and supporting economic development by improving regional connectivity (Lawrence et al., 2019; Ma, 2011). While affordable fares make HSR popular for multiple purposes, business travel is an important component of rail traffic, ranging between 28% and 62%, depending on the line (Ollivier et al., 2014; Lawrence et al., 2019). The role of HSR may also extend beyond directly connected regions, as passengers frequently transfer between HSR and traditional lines (and between intersecting HSR lines). An early analysis by Zheng and Kahn (2013) finds positive effects of HSR on housing prices, while Lin (2017) similarly finds positive effects on regional employment.

We analyze HSR-induced market access effects for 340 sub-province-level administrative divisions in mainland China, referred to as prefectures.33 We measure market access growth between 2007 and 2016 by combining data on the development of the HSR network and each prefecture's location and population (as measured in the 2000 census). A total of 83 HSR lines opened between these years, with the first in 2008; a further 66 lines were completed or under construction as of April 2019.34 We compute a simple market access measure in each prefecture i and year t based on the formula in Zheng and Kahn (2013): MA_it = Σ_j exp(−0.02 τ_ijt) Pop_j,2000, where Pop_j,2000 denotes the year-2000 population of prefecture j and τ_ijt denotes predicted travel time between regions i and j in year t (in minutes). Travel time predictions are based on the operational speed of each HSR line as well as geographic distance, which proxies for the travel time by car or a low-speed train. We relate MA growth, x_i = log MA_i,2016 − log MA_i,2007, to the corresponding growth in the prefecture's urban employment y_i from Chinese City Statistical Yearbooks. This yields a set of 275 prefectures with non-missing outcome data; see Appendix A for details on the sample construction and MA measure. Panel A of Figure 1 shows the Chinese HSR network as of the end of 2016, along with the implied MA growth relative to 2007.
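A minimal sketch of this MA measure and the resulting treatment (Python; illustrative, not the authors' code, taking matrices of predicted travel times in minutes as given):

```python
import numpy as np

# Minimal sketch: MA_it = sum_j exp(-0.02 * tau_ijt) * Pop_j,2000 and the implied
# treatment x_i = log MA_i,2016 - log MA_i,2007, given travel-time matrices (minutes).
def ma_growth(tau_2007, tau_2016, pop_2000):
    ma_2007 = np.exp(-0.02 * tau_2007) @ pop_2000
    ma_2016 = np.exp(-0.02 * tau_2016) @ pop_2000
    return np.log(ma_2016) - np.log(ma_2007)
```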

Figure 1: Chinese High-Speed Rail and Market Access Growth, 2007–2016. Panel A: Completed Lines and MA Growth. Panel B: All Planned Lines.

Notes: Panel A shows the completed China high-speed rail network by the end of 2016, with shading indicating MA growth (i.e. log-change in MA) relative to 2007. Panel B shows the network of all HSR lines, including those planned but not yet completed as of 2016.

Column 1 of Table I, Panel A, reports the coefficient from a regression of employment growth on MA growth.35 The estimated elasticity of 0.23 is large. With an average MA growth of 0.54 log points, it implies a 12.4% employment growth attributable to HSR for an average prefecture—almost half of the 26.6% average employment growth. The estimate is also highly statistically significant using Conley (1999) spatially-clustered standard errors.

Table I: Employment Effects of Market Access: Unadjusted and Recentered Estimates

                                   Unadjusted OLS    Recentered IV     Controlled OLS
                                        (1)               (2)               (3)

Panel A: No Controls
 Market Access Growth                  0.232             0.084             0.072
                                      (0.075)           (0.097)           (0.093)
                                                    [−0.245, 0.337]   [−0.169, 0.337]
 Expected Market Access Growth                                             0.317
                                                                          (0.096)

Panel B: With Geography Controls
 Market Access Growth                  0.133             0.056             0.047
                                      (0.064)           (0.089)           (0.092)
                                                    [−0.135, 0.280]   [−0.146, 0.280]
 Expected Market Access Growth                                             0.214
                                                                          (0.073)

Recentered                              No                Yes               Yes
Prefectures                            275                275               275

Notes: This table reports coefficients from regressions of employment growth on MA growth in Chinese prefectures from 2007–2016. MA growth is unadjusted in Column 1. In Column 2 this treatment is instrumented by MA growth recentered by permuting the opening status of built and unbuilt HSR lines with the same number of cross-prefecture links. Column 3 instead estimates an OLS regression with recentered MA growth as treatment and controlling for expected MA growth given by the same HSR counterfactuals. The regressions in Panel B control for distance to Beijing, latitude, and longitude. Standard errors which allow for linearly decaying spatial correlation (up to a bandwidth of 500km) are reported in parentheses. 95% RI confidence intervals based on the HSR counterfactuals are reported in brackets.

Panel A of Figure 1, however, gives reason for caution against causally interpreting the OLS coefficient. Prefectures with high MA growth, which serve as the effective treatment group, tend to be clustered in the main economic areas in the southeast of the country where HSR lines and large markets are concentrated. A comparison between these prefectures and the economic periphery may be confounded by the effects of unobserved policies, both contemporaneous and historical, that differentially affected the economic center.

We quantify the systematic nature of spatial variation in MA growth in Column 1 of Table II, by regressing it on a prefecture’s distance to Beijing, latitude, and longitude. These predictors capture over 80% of the variation in MA growth (as measured by the regression’s R2), reinforcing the OVB concern: for a causal interpretation of the Table I regression, one would need to assume that all unobserved determinants of employment growth (e.g. local productivity shocks) are uncorrelated with these geographic features. While one could of course control for the specific geographic variables from Table II (as we explore below), controlling perfectly for geography is impossible without removing all variation in xi.

Table II: Regressions of Market Access Growth on Measures of Economic Geography

                                Unadjusted                Recentered
                                   (1)           (2)          (3)          (4)

Distance to Beijing              −0.291         0.069                     0.088
                                 (0.062)       (0.039)                   (0.045)
Latitude/100                     −3.324        −0.342                    −0.182
                                 (0.646)       (0.276)                   (0.319)
Longitude/100                     1.321         0.485                     0.440
                                 (0.458)       (0.237)                   (0.240)
Expected Market Access Growth                                0.026        0.054
                                                            (0.056)      (0.069)
Constant                          0.536         0.018        0.018        0.018
                                 (0.029)       (0.018)      (0.021)      (0.018)

Joint RI p-value                                0.443        0.711        0.492
R²                                0.824         0.083        0.010        0.086
Prefectures                        275           275          275          275

Notes: This table reports coefficients from regressing the unadjusted and recentered MA growth of Chinese prefectures (2007–2016) on geographic controls. Recentering is done by permuting the opening status of built and unbuilt lines with the same number of cross-prefecture links. All regressors are measured for the prefecture's main city and demeaned such that the constant in each regression captures the average outcome. Distance to Beijing is measured in 1,000km. Standard errors which allow for linearly decaying spatial correlation (up to a bandwidth of 500km) are reported in parentheses. Joint RI p-values are based on the 1,999 HSR counterfactuals and the sum-of-square fitted values statistic, as described in footnote 30.

Our solution is to view certain features of the HSR network as realizations of a natural experiment. By specifying a set of counterfactual HSR networks we can compute the appropriate function of geography μi which removes the systematic variation in MA growth.

Our specification of counterfactuals exploits the heterogeneous timing of HSR construction. Specifically, we permute the 2016 completion status of the built and unbuilt (but planned) lines, assuming that the timing of line completion is conditionally as-good-as-random. Panel B of Figure 1 compares the built and unbuilt lines which form our counterfactuals. Unbuilt lines tend to be concentrated in the same areas of China as built lines, reinforcing the fact that construction is not uniformly distributed in space. Moreover, built lines tend to connect more regions: the average number of cross-prefecture “links” is 3.19 and 2.44 for built and unbuilt lines, respectively, with a statistically significant difference (p = 0.048). To account for this difference we construct counterfactual upgrades by permuting the 2016 completion status only among lines with the same number of links. For example the main Beijing to Shanghai HSR line, which has the greatest number of links, is always included in the counterfactuals. This procedure generates 1,999 counterfactual HSR maps that are visually similar to the actual 2016 network; Appendix Figure A1 gives an illustrative example.
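A minimal sketch of this counterfactual construction (Python; `links` and `built` are hypothetical vectors recording, for each planned line, its number of cross-prefecture links and whether it was completed by 2016):

```python
import numpy as np

# Minimal sketch: counterfactual HSR maps generated by permuting 2016 completion
# status among planned lines with the same number of cross-prefecture links.
# Lines whose link count is unique within the plan (e.g. the line with the greatest
# number of links) keep their realized status in every counterfactual.
def counterfactual_completions(links, built, n_draws=1999, seed=0):
    links, built = np.asarray(links), np.asarray(built)
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_draws):
        cf = built.copy()
        for m in np.unique(links):                      # permute within each link-count stratum
            idx = np.where(links == m)[0]
            cf[idx] = rng.permutation(built[idx])
        draws.append(cf)
    return draws

# expected MA growth mu_i: average of MA growth recomputed under each counterfactual map
```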

Columns 2–4 of Table II validate this specification of the HSR assignment process by the test described in Section 3.5. Column 2 shows that this recentering successfully removes the systematic geographic variation in market access. Specifically, we regress recentered MA growth on a constant and the same geographic controls as in Column 1. The regression coefficients and R2 fall dramatically relative to Column 1, while a permutation-based p-value for their joint significance (based on the regression’s sum-of-squares, as suggested in footnote 30) is 0.44. Columns 3 and 4 further show that recentered MA growth is uncorrelated with expected MA growth.36

Figure 2 plots expected and recentered MA growth given by the permutations of built and unbuilt lines. The effect of recentering is apparent from contrasting the solid and striped regions in Panel B of Figure 2 (indicating high and low recentered MA growth) with the dark- and light-shaded regions in Panel A of Figure 1 (indicating high and low MA growth). The recentered treatment no longer places western prefectures in the effective control group, as their MA growth is as low as expected. Similarly, some prefectures in the east (such as Tianjin) are no longer in the effective treatment group, as the large increase in MA they saw was expected. At the same time, recentering provides a justification for retaining other regional contrasts. Hohhot, for example, expected higher MA growth than Harbin due to the planned connection to Beijing. This line was still under construction in 2016, however, resulting in lower MA growth in Hohhot than Harbin.

Figure 2: Expected and Recentered Market Access Growth from Chinese HSR.

Notes: Panel A shows the variation in expected 2007–16 MA growth across Chinese prefectures, computed from 1,999 HSR counterfactuals that permute the opening status of built and unbuilt lines with the same number of cross-prefecture links. Panel B plots the variation in corresponding recentered MA growth: the difference between the MA growth shown in Panel A of Figure 1 and expected MA growth. The HSR network as of 2016 is also shown in this panel.

Column 2 of Table I, Panel A, shows that instrumenting MA growth with recentered MA growth reduces the estimated employment elasticity substantially, from 0.23 to 0.08. Controlling for expected MA growth yields a similar estimate of 0.07 in Column 3. Neither of the two adjusted estimates is statistically distinguishable from zero according to either Conley (1999) spatially-clustered standard errors or permutation-based inference (which yields a wider confidence interval in this setting). The difference between the unadjusted and adjusted estimates is explained by the fact that employment growth is strongly predicted by expected MA growth. In Column 3 we find a large coefficient of 0.32 on μ_i, meaning that employment grew faster in prefectures that were more highly exposed to potential HSR construction, whether or not the nearby lines were built yet.

Panel B of Table I shows that the geographic controls from Table II do not isolate the same variation as the expected MA growth adjustment. Including these controls in the unadjusted regression of Column 1 yields a smaller but still economically and statistically significant coefficient of 0.13. In contrast, Columns 2 and 3 show that the finding of no significant MA effect after adjusting for $\mu_i$ is robust to including geographic controls. The $\mu_i$ adjustment alone appears sufficient to remove the geographic dependence of MA, as Table II also showed.37

While our primary interest is to illustrate the recentering approach, we note that there are several possible explanations for the substantive finding of a small employment effect of MA. Unlike other transportation networks used for trading goods, the Chinese HSR network primarily operates passenger trains. Its scope for directly affecting production is therefore smaller, although it could still facilitate cross-regional business relationships. In addition, the employment effects of growing market access could be positive for some regions but negative for others, as easier commuting between regions relocates employers. We leave analyses of such mechanisms and heterogeneity for future study.

In BH we discuss how market access recentering relates to other approaches in the long literature estimating transportation infrastructure upgrade effects (Redding and Turner, 2015). We first contrast the well-known challenge of strategically chosen transportation upgrades with the less discussed problem that regional exposure to exogenous upgrades may be unequal. We then explain how common strategies to address the former issue (e.g. by leveraging historical routes or inconsequential places) can be incorporated in our framework, at least in principle. At the same time, we highlight that recentering may still be needed to address the latter issue. We further discuss how some of the existing approaches naturally yield specifications of counterfactual networks (e.g. the placebos in Donaldson (2018) and Ahlfeldt and Feddersen (2018)) and summarize the conceptual and practical advantages of our approach relative to employing more conventional controls. We emphasize that even when it is challenging to obtain a convincing specification of counterfactuals, any specification can yield a robustness check on these alternative strategies (see footnote 21).

5. Conclusion

Many studies in economics use treatments or instruments that combine multiple sources of variation, sometimes observed at different “levels,” according to a known formula. We develop a general approach to causal inference when some, but not all, of this variation is exogenous. Non-random exposure to the exogenous shocks can bias conventional regression estimators, but this problem can be solved by specifying a shock assignment process: namely, a set of counterfactual shocks that might as well have been realized. Averaging the treatment or instrument over these counterfactuals yields a single variable, $\mu_i$, which can be adjusted for to achieve identification and consistency. The specification of counterfactuals also yields a natural form of valid finite-sample inference.

In practice, researchers face a choice in how to use $\mu_i$ in a regression analysis: recentering by it or controlling for it. When the assignment process is given by a true randomization protocol, as in an RCT, we recommend researchers recenter first to purge OVB. Then any predetermined controls (i.e. functions of exposure) can be included to remove variation in the error term and likely increase estimation efficiency. While $\mu_i$ is one possible control, which automatically recenters the treatment or instrument, it need not be the best choice in terms of predicting the residual variation. Our recommendation differs for natural experiments, where assumptions must be placed on the assignment process. There, controlling for candidate $\mu_i$ instead of recentering can have a valuable “double-robustness” property: researchers can compute and control for several candidate $\mu_i$ based on different assignment processes, such that OVB is purged if at least one of the processes is specified correctly (or if there is no OVB to begin with).
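
As a schematic illustration of this double-robustness recommendation (a sketch under assumed inputs, not the authors' code), one can partial a constant and several candidate expected instruments out of the outcome, treatment, and instrument before forming the IV estimate:

    import numpy as np

    def iv_with_controls(y, x, z, controls):
        # Just-identified IV of y on x with instrument z, partialling a constant and the
        # supplied controls (e.g. several candidate expected instruments) out of y, x, and z.
        C = np.column_stack([np.ones(len(y))] + list(controls))
        def resid(v):
            return v - C @ np.linalg.lstsq(C, v, rcond=None)[0]
        yr, xr, zr = resid(y), resid(x), resid(z)
        return float(zr @ yr / (zr @ xr))

    # Example (hypothetical arrays): beta_hat = iv_with_controls(y, x, z, [mu_a, mu_b]).
    # OVB is purged if at least one of the assignment processes behind mu_a or mu_b is
    # correctly specified, or if there is no OVB to begin with.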

We conclude by noting that our framework bears practical lessons for a range of common treatments and instruments, well beyond the market access measure in our empirical application. In our working paper (Borusyak and Hull, 2021), we discuss and illustrate some of these implications for policy eligibility treatments, network spillover treatments, linear and nonlinear shift-share instruments, model-implied instruments, instruments from centralized school assignment mechanisms, “free-space” instruments for mass media access, and weather instruments. We expect other settings may also benefit from explicit specification of shock counterfactuals and appropriate adjustment for non-random shock exposure.

A. Data Appendix

Our analysis of market access effects uses data on 340 prefectures of mainland China. This excludes the islands of Hainan and Taiwan and the special administrative regions of Hong Kong and Macau, but includes six sub-prefecture-level cities (e.g. Shihezi) that do not belong to any prefecture. We use United Nations shapefiles to geocode each prefecture by the location of its main city (or, in a few cases, by the prefecture centroid).38

We use a variety of sources to assemble a comprehensive database of the HSR network in 2016 as well as the lines planned (and in many cases under construction) as of April 2019 but not yet opened by the end of 2016. Our starting points are Map 1.2 of Lawrence et al. (2019), China Railway Yearbooks (China Railway Yearbook Editorial Board, 2001-2013), and the replication files of Lin (2017). We cross-check network links across these sources and use Internet resources such as Wikipedia and Baidu Baike to confirm and fill in missing information. Our database includes various types of HSR lines, including the National HSR Grid (4+4 and 8+8) and high-speed intercity railways. However, we only consider newly built HSR lines, excluding traditional lines upgraded to higher speeds. We do not put further restrictions on the class of trains (e.g. to G- and D-classes only) or specify an explicit minimum speed. The operating speed therefore ranges between 160 and 380 kph, although the majority of lines operate at 250 kph. For each line we collect the date of its official opening (if it has opened), the actual or planned operating speed, and the list of prefecture stops. When different sections of the same line opened in a staggered way, we classify each section as a separate line for the purposes of constructing our 1,999 counterfactuals, following the definition of a line in footnote 34. We include only one contiguous stop per prefecture and drop lines that do not cross prefecture borders.

We compute travel time $\tau_{ijt}$ between all pairs of prefectures $i$ and $j$ as of the ends of 2007 and 2016, for both the actual and counterfactual networks. Travel time combines traditional modes of transportation (car or low-speed train) with HSR, where available. We allow for unlimited changes between different HSR lines and between HSR and traditional modes without a layover penalty, as HSR trains tend to operate frequently and traditional modes also involve downtime. Following the existing literature, we proxy for travel time by traditional modes using the straight-line distance and a speed of 100 kph (= 120/1.2), where 120 kph is the typical speed of those modes and the factor of 1.2 adjusts for actual routes being longer than the straight line. For two prefectures connected by an HSR line, we compute the distance along the line as the sum of straight-line distances between adjacent prefectures on the line. We use the operating speed of each line divided by an adjustment factor of 1.3 to capture the fact that the average speed is lower than the nominal speed we record. Computing MA further requires the population of each of the 340 prefectures from the 2000 population Census, which we obtain from Brinkhoff (2018).39
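
The following sketch (simplified, with assumed input formats: coords mapping each prefecture to a latitude/longitude pair and hsr_links listing adjacent-stop segments of open lines with their nominal speeds) illustrates the travel-time computation as a shortest-path problem mixing HSR links with direct traditional-mode travel:

    import heapq
    import math

    def haversine_km(a, b):
        # Great-circle distance in km between (lat, lon) points in degrees,
        # used here as the straight-line-distance proxy.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = math.sin((lat2 - lat1) / 2) ** 2 + \
            math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * math.asin(math.sqrt(h))

    def travel_times_from(origin, coords, hsr_links):
        # Dijkstra over a graph with HSR edges (nominal speed / 1.3) and direct
        # traditional-mode edges between every pair of prefectures at 100 kph (= 120/1.2),
        # which allows free mixing of modes and unlimited transfers.
        graph = {i: [] for i in coords}
        for i, j, speed_kph in hsr_links:
            t = haversine_km(coords[i], coords[j]) / (speed_kph / 1.3)
            graph[i].append((j, t)); graph[j].append((i, t))
        nodes = list(coords)
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                i, j = nodes[a], nodes[b]
                t = haversine_km(coords[i], coords[j]) / 100.0
                graph[i].append((j, t)); graph[j].append((i, t))
        times, frontier = {origin: 0.0}, [(0.0, origin)]
        while frontier:
            d, i = heapq.heappop(frontier)
            if d > times.get(i, math.inf):
                continue
            for j, t in graph[i]:
                if d + t < times.get(j, math.inf):
                    times[j] = d + t
                    heapq.heappush(frontier, (d + t, j))
        return times  # hours from origin to every prefecture

Market access for each actual or counterfactual network is then computed from these pairwise times and the 2000 Census populations.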

We measure prefecture employment in the 2008–2017 China City Yearbooks (China Statistics Press, 2000-2017).40 Each yearbook covers the previous year (so our data cover 2007–2016). While the yearbooks provide several employment variables, we use “The Average Number of Staff and Workers” (from the “People’s Living Conditions and Social Security” chapter), as measured in the entire prefecture and not just the main urban core. This employment series has by far the fewest large year-to-year deviations of the kind that may indicate data quality issues.

We finally apply a data cleaning procedure to the outcome variable. We first mark a prefecture-year observation as exhibiting a “structural break” if (i) the outcome more than doubles or more than halves relative to the previous non-missing value for the prefecture, (ii) it is not followed by a change in the opposite direction that is between 3/4 and 4/3 as large in terms of log-changes (which we would view as a one-off jump and ignore), and (iii) the previous change does not itself satisfy (i). We view the outcome change between 2007 and 2016 as valid only if there are no structural breaks in any year in between. This reduces the sample from 283 to the final set of 275 prefectures.
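
A sketch of this cleaning rule on a single prefecture's employment series (a dict from year to value, possibly with gaps; the handling of missing years is an assumption on our part):

    import math

    def structural_breaks(series: dict) -> set:
        # Flag years whose change from the previous non-missing value is a "structural break"
        # under conditions (i)-(iii) above.
        years = sorted(y for y, v in series.items() if v is not None)
        breaks, prev_big_jump = set(), False
        for t in range(1, len(years)):
            change = math.log(series[years[t]] / series[years[t - 1]])
            big_jump = abs(change) > math.log(2)            # (i) more than doubles or halves
            one_off = False
            if big_jump and t + 1 < len(years):             # (ii) reverted by an opposite change
                back = math.log(series[years[t + 1]] / series[years[t]])
                if back * change < 0 and 0.75 <= abs(back) / abs(change) <= 4 / 3:
                    one_off = True
            if big_jump and not one_off and not prev_big_jump:   # (iii) previous change not a jump
                breaks.add(years[t])
            prev_big_jump = big_jump
        return breaks

    # The 2007-2016 change for a prefecture is treated as valid only if
    # structural_breaks(series) returns no years between 2007 and 2016.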

B. Additional Exhibits

Figure A1: Simulated HSR Lines and Market Access Growth.


Notes: This figure shows an example map of simulated Chinese HSR lines and market access growth over 2007–2016, obtained by permuting the opening status of built and unbuilt lines with the same number of cross-prefecture links.

C. Additional Results

Throughout the results and later proofs we omit the phrase “almost surely with respect to w” for brevity. We also abbreviate weak mutual dependence (Assumption 3) as WMD.

Proposition 2 (Convergence for all errors implies WMD). Suppose $\mathrm{Var}_{P_N}[\tilde z_i \mid w] \le U_z$ uniformly. If, for some $U_\varepsilon > 0$, $\mathrm{Var}_{P_N}\!\left[\frac{1}{N}\sum_i \tilde z_i \varepsilon_i\right] \to 0$ for every sequence of distributions of $\varepsilon$ such that $\mathrm{Var}_{P_N}[\varepsilon_i] \le U_\varepsilon$ for all $N$ and $i = 1, \dots, N$, then WMD holds.

Proposition 3 (WMD implies growing number of shocks). Suppose the support of $g_k$ is bounded uniformly across $N$ and $k = 1, \dots, K_N$. Suppose further that, uniformly across $N$, $\tilde f_i(g; w) \equiv f_i(g; w) - E_{P_N}[f_i(g; w) \mid w]$ is Lipschitz-continuous in $g$ with Lipschitz constant below $U_{Lip}$, and that $\mathrm{Var}_{P_N}[\tilde f_i(g; w) \mid w] \ge L_z > 0$ for at least $N L_V$ units $i$, with $L_V > 0$. Then WMD implies $K_N \to \infty$.

Proposition 4 (WMD and dispersed shock exposure). Suppose $f_i(g; w)$ is weakly monotone in $g$ for all $i$, the components of $g$ are jointly independent conditionally on $w$, and $\mathrm{Var}_{P_N}[g_k \mid w] \in [L_\sigma, U_\sigma]$ for $0 < L_\sigma < U_\sigma < \infty$. Consider three cases: conditionally on $w$, (i) all components of $g$ are normally distributed and $E_{P_N}\!\left[\left|\frac{\partial \bar f(g)}{\partial g_k}\right| \,\middle|\, w\right] < \infty$, (ii) all components of $g$ have the Bernoulli distribution, or (iii) $\bar f$ is linear in $g$. In each case:

  1. If $\sum_k E_{P_N}\!\left[\left(\frac{\partial \bar f(g; w)}{\partial g_k}\right)^{\!2}\right] \to 0$, WMD holds;

  2. If WMD holds, $E_{P_N}\!\left[\sum_k \left(E_{P_N}\!\left[\frac{\partial \bar f(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2}\right] \to 0$,

where we define $\frac{\partial \bar f(g)}{\partial g_k} \equiv \bar f(g_1, \dots, g_{k-1}, 1, g_{k+1}, \dots, g_{K_N}) - \bar f(g_1, \dots, g_{k-1}, 0, g_{k+1}, \dots, g_{K_N})$ in the Bernoulli shock case.

Proposition 5 (WMD and non-overlapping exposure sets). For each $N$, let $G_1(\cdot), \dots, G_N(\cdot)$ be a fixed set of functions mapping $w$ to subsets of $\{1, \dots, K_N\}$ such that $f_i(\cdot; w)$ does not depend on $g_k$ for any $k \notin G_i(w)$. Suppose the components of $g$ are jointly independent conditionally on $w$, and $\mathrm{Var}_{P_N}[\tilde z_i \mid w] \le U_z$ uniformly across $N$ and $i = 1, \dots, N$. Then WMD holds if $E_{P_N}\!\left[\frac{1}{N^2}\sum_{i,j=1}^N 1[G_i(w) \cap G_j(w) \ne \emptyset]\right] \to 0$.

Proposition 6 (First stage and concentrated individual exposure). Suppose $x_i = \pi z_i + u_i$ with $u_i \perp g \mid w$, for all $i$ and $\pi \ne 0$. Suppose further that, conditionally on $w$, the components of $g$ are mutually independent with $\mathrm{Var}_{P_N}[g_k \mid w] \ge L_\sigma > 0$ uniformly across $N$ and $k = 1, \dots, K_N$. Moreover, suppose one of three conditions holds: (i) all components of $g$ are normally distributed and $E_{P_N}\!\left[\left|\frac{\partial \tilde f_i(g; w)}{\partial g_k}\right| \,\middle|\, w\right] < \infty$, (ii) all components of $g$ have the Bernoulli distribution, or (iii) all $\tilde f_i(\cdot; w)$ are linear in $g$. Then, if $E_{P_N}\!\left[\frac{1}{N}\sum_i \sum_k \left(E_{P_N}\!\left[\frac{\partial \tilde f_i(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2}\right] \ge L_{HHI} > 0$ uniformly across $N$, $E_{P_N}\!\left[\frac{1}{N}\sum_i x_i \tilde z_i\right]$ is also uniformly bounded away from zero (by $|\pi|\, L_{HHI} L_\sigma$).

D. Proofs

We drop the PN subscripting of moments for all proofs to simplify notation. We again abbreviate weak mutual dependence (Assumption 3) as WMD.

Proof of Proposition 1. By Assumption 1 and the Cauchy-Schwarz inequality

$$\begin{aligned}
\mathrm{Var}\!\left[\frac{1}{N}\sum_i \tilde z_i \varepsilon_i\right] &= E\!\left[\left(\frac{1}{N}\sum_i \tilde z_i \varepsilon_i\right)^{\!2}\right] = \frac{1}{N^2}\sum_{i,j} E[\tilde z_i \tilde z_j \varepsilon_i \varepsilon_j] = \frac{1}{N^2}\sum_{i,j} E\big[E[\tilde z_i \tilde z_j \mid w]\, E[\varepsilon_i \varepsilon_j \mid w]\big] \\
&\le \frac{1}{N^2}\sum_{i,j} E\!\left[\big|E[\tilde z_i \tilde z_j \mid w]\big| \sqrt{E[\varepsilon_i^2 \mid w]\, E[\varepsilon_j^2 \mid w]}\right] \le U_\varepsilon\, E\!\left[\frac{1}{N^2}\sum_{i,j} \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big|\right] \to 0. \qquad (5)
\end{aligned}$$

When $\mu_i$ is approximated by $\frac{1}{S}\sum_{s=1}^S z_i^{(s)}$, for $z_i^{(s)} = f_i(g^{(s)}; w)$ and a finite number $S$ of random draws $g^{(s)}$ from $G(g \mid w)$, the same argument holds with a variance upper bound that is at most twice as large. Indeed, since

$$\begin{aligned}
E\!\left[\left(z_i - \tfrac{1}{S}\textstyle\sum_s z_i^{(s)}\right)\!\left(z_j - \tfrac{1}{S}\textstyle\sum_s z_j^{(s)}\right) \,\Big|\, w\right] &= \mathrm{Cov}\!\left[z_i - \tfrac{1}{S}\textstyle\sum_s z_i^{(s)},\; z_j - \tfrac{1}{S}\textstyle\sum_s z_j^{(s)} \,\Big|\, w\right] + E\!\left[z_i - \tfrac{1}{S}\textstyle\sum_s z_i^{(s)} \,\Big|\, w\right] E\!\left[z_j - \tfrac{1}{S}\textstyle\sum_s z_j^{(s)} \,\Big|\, w\right] \\
&= \frac{S+1}{S}\,\mathrm{Cov}[z_i, z_j \mid w] = \frac{S+1}{S}\,\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w],
\end{aligned}$$

we have, repeating the steps in (5),

$$\mathrm{Var}\!\left[\frac{1}{N}\sum_i \varepsilon_i \left(z_i - \frac{1}{S}\sum_s z_i^{(s)}\right)\right] \le \frac{S+1}{S}\, U_\varepsilon\, E\!\left[\frac{1}{N^2}\sum_{i,j} \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big|\right] \to 0.$$

Proof of Proposition 2. For each $N$, consider $\varepsilon = \sqrt{U_\varepsilon / U_z}\,\tilde\varepsilon$, where $(\tilde\varepsilon, w)$ is distributed as $(\tilde z, w)$ and $\tilde\varepsilon \perp g \mid w$. Then $\mathrm{Var}[\varepsilon_i] = \frac{U_\varepsilon}{U_z}\mathrm{Var}[\tilde z_i] \le U_\varepsilon$. Moreover,

$$\mathrm{Var}\!\left[\frac{1}{N}\sum_i \tilde z_i \varepsilon_i\right] = \frac{1}{N^2}\sum_{i,j=1}^N E\big[E[\tilde z_i \tilde z_j \mid w]\, E[\varepsilon_i \varepsilon_j \mid w]\big] = \frac{U_\varepsilon}{U_z}\cdot\frac{1}{N^2}\sum_{i,j=1}^N E\big[E[\tilde z_i \tilde z_j \mid w]^2\big] = \frac{U_\varepsilon}{U_z}\cdot\frac{1}{N^2}\sum_{i,j=1}^N E\big[\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]^2\big] \to 0.$$

By the Cauchy-Schwarz inequality, $\frac{1}{N^2}\sum_{i,j=1}^N \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big| \le \left(\frac{1}{N^2}\sum_{i,j=1}^N \mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]^2\right)^{0.5}$.

And by Jensen’s inequality, $E\!\left[\frac{1}{N^2}\sum_{i,j=1}^N \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big|\right] \le E\!\left[\left(\frac{1}{N^2}\sum_{i,j=1}^N \mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]^2\right)^{0.5}\right] \le E\!\left[\frac{1}{N^2}\sum_{i,j=1}^N \mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]^2\right]^{0.5} \to 0$.

Proof of Proposition 3.41 We prove this result by contradiction. Without loss of generality, suppose $K_N = K$ is constant along the asymptotic sequence; whenever $K_N \not\to \infty$, there is a subsequence of $K_N$ bounded by some $K$, and the proof follows for that subsequence without change. Also without loss of generality, we condition on $w$ and suppress the $w$ notation. We denote the upper bound on the support of $|g_k|$ by $U_g/2$ and extend the domain of each $\tilde f_i$ to $[-U_g/2, U_g/2]^K$ preserving its Lipschitz constant, by the Kirszbraun theorem.

Let $\rho(x) = \min\{\lceil x/\delta \rceil \delta,\, U_g/2\}$ denote the upward-rounding function for some $\delta > 0$. Consider $\check f_i(g) = \rho\big(\tilde f_i(\rho(g_1), \dots, \rho(g_K))\big)$, which rounds both the shocks and the values of $\tilde f_i$. Note that

$$\big|\check f_i(g) - \tilde f_i(g)\big| \le \big|\check f_i(g) - \tilde f_i(\rho(g_1), \dots, \rho(g_K))\big| + \big|\tilde f_i(\rho(g_1), \dots, \rho(g_K)) - \tilde f_i(g)\big| \le \delta + \delta U_{Lip}\sqrt{K},$$

where the second inequality uses the Lipschitz condition and $\big\|g - (\rho(g_1), \dots, \rho(g_K))\big\|_2 \le \delta\sqrt{K}$, since $|g_k - \rho(g_k)| \le \delta$ for each $k$.

By the Lipschitz condition, $|\tilde f_i(g^A) - \tilde f_i(g^B)| \le U_{Lip} U_g \sqrt{K}$ for any $g^A, g^B \in [-U_g/2, U_g/2]^K$. Since $E[\tilde f_i(g)] = 0$ and $g \in [-U_g/2, U_g/2]^K$, this implies $|\tilde f_i(g)| \le U_{Lip} U_g \sqrt{K}$ and consequently $|\check f_i(g)| \le U_{Lip} U_g \sqrt{K} + \delta + \delta U_{Lip}\sqrt{K}$. Thus, for all $N$ and $i = 1, \dots, N$ there is only a finite number $U_R$ of possible “rounded” $\check f_i(\cdot)$ functions. Therefore, at least $N L_V / U_R$ of the observations $i$ with $\mathrm{Var}[\tilde f_i(g)] \ge L_z$ have the same rounded function, and thus there are at least $(N L_V / U_R)^2$ such pairs of observations $(i, j)$. For any such pair, since $E[\tilde f_i(g) - \tilde f_j(g)] = 0$,

$$\begin{aligned}
2\,\mathrm{Cov}[\tilde f_i(g), \tilde f_j(g)] &= \mathrm{Var}[\tilde f_i(g)] + \mathrm{Var}[\tilde f_j(g)] - E\big[(\tilde f_i(g) - \tilde f_j(g))^2\big] \\
&\ge 2L_z - E\Big[\big(|\tilde f_i(g) - \check f_i(g)| + |\check f_i(g) - \check f_j(g)| + |\check f_j(g) - \tilde f_j(g)|\big)^2\Big] \\
&\ge 2L_z - E\Big[\big(\delta(1 + U_{Lip}\sqrt{K}) + 0 + \delta(1 + U_{Lip}\sqrt{K})\big)^2\Big].
\end{aligned}$$

Setting $\delta = \frac{\sqrt{L_z}}{2(1 + U_{Lip}\sqrt{K})}$, we have $\mathrm{Cov}[\tilde f_i(g), \tilde f_j(g)] \ge \frac{L_z}{2} > 0$, and therefore

$$\frac{1}{N^2}\sum_{i,j=1}^N \big|\mathrm{Cov}[\tilde f_i(g), \tilde f_j(g)]\big| \ge \frac{1}{N^2}\left(\frac{N L_V}{U_R}\right)^{\!2} \frac{L_z}{2} = \frac{L_V^2 L_z}{2 U_R^2} \not\to 0.$$

Thus, weak mutual dependence does not hold, establishing the contradiction.

To establish Proposition 4, we first state and prove four lemmas. We assume all moments relevant for those lemmas exist.

Lemma 1. If $h: \mathbb{R}^K \to \mathbb{R}$ is weakly increasing and the random variables $g_1, \dots, g_K$ are independent, then for any $k \in \{1, \dots, K-1\}$ the conditional expectation $E[h(g_1, \dots, g_K) \mid g_1, \dots, g_k]$ is weakly increasing.

Proof. Fix $\gamma_1, \dots, \gamma_k$ and $\gamma_1', \dots, \gamma_k'$ such that $\gamma_r \le \gamma_r'$ for $r = 1, \dots, k$, and define the $K \times 1$ vectors $g = (\gamma_1, \dots, \gamma_k, g_{k+1}, \dots, g_K)$ and $g' = (\gamma_1', \dots, \gamma_k', g_{k+1}, \dots, g_K)$. Note $h(g) \le h(g')$. For $r = k+1, \dots, K$, denote the cumulative distribution function of $g_r$ by $G_r(\cdot)$. Then

$$E[h(g)] = \int \cdots \int h(g)\, dG_{k+1}(g_{k+1}) \cdots dG_K(g_K) \le \int \cdots \int h(g')\, dG_{k+1}(g_{k+1}) \cdots dG_K(g_K) = E[h(g')].$$

Lemma 2. For any weakly increasing $h_1, h_2: \mathbb{R}^K \to \mathbb{R}$, $\mathrm{Cov}[h_1(g), h_2(g)] \ge 0$ for $g = (g_1, \dots, g_K)$ with independent components.

Proof. For $K = 1$ this is well known. The proof for $K > 1$ follows by induction. Suppose it is true for $K - 1$. Then by the law of total covariance

$$\mathrm{Cov}[h_1(g), h_2(g)] = E\big[\mathrm{Cov}[h_1(g), h_2(g) \mid g_1]\big] + \mathrm{Cov}\big[E[h_1(g) \mid g_1],\, E[h_2(g) \mid g_1]\big].$$

The first term is the expectation of a covariance of two monotone (by Lemma 1) functions of $K - 1$ variables. The second term, again by Lemma 1, is a covariance of two monotone functions of random scalars. Thus both terms are non-negative.

Lemma 3. If $f_i(g; w)$ is weakly monotone in $g$ for all $i$ and the components of $g$ are jointly independent conditionally on $w$, then $\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w] \ge 0$ for all $i$ and $j$. Furthermore, WMD simplifies to $\mathrm{Var}[\bar z] \to 0$ for $\bar z = \frac{1}{N}\sum_i \tilde z_i$.

Proof. Applying Lemma 2 to $\tilde z_i = f_i(g; w) - E[f_i(g; w) \mid w]$ and $\tilde z_j = f_j(g; w) - E[f_j(g; w) \mid w]$ (or their negations, if $f_i(g; w)$ is weakly decreasing) and conditioning on $w$ everywhere, we obtain $\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w] \ge 0$. Thus, WMD simplifies to

$$\begin{aligned}
E\!\left[\frac{1}{N^2}\sum_{i,j} \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big|\right] &= E\!\left[\frac{1}{N^2}\sum_{i,j} \mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\right] \\
&= E\!\left[\mathrm{Var}\!\left[\frac{1}{N}\sum_i \tilde z_i \,\Big|\, w\right]\right] \\
&= \mathrm{Var}\!\left[\frac{1}{N}\sum_i \tilde z_i\right] \to 0,
\end{aligned}$$

where the second line rearranges terms and the third line follows by $E\!\left[\frac{1}{N}\sum_i \tilde z_i \,\middle|\, w\right] = 0$.

Lemma 4. Suppose $g = (g_1, \dots, g_K)$ has jointly independent components with $\sigma_k^2 \equiv \mathrm{Var}[g_k]$, and consider a scalar function $h$ on the support of $g$. Then if (i) all components of $g$ are normally distributed and $E\!\left[\left|\frac{\partial h(g)}{\partial g_k}\right|\right] < \infty$, or (ii) all components of $g$ have the Bernoulli distribution,

$$\sum_k \sigma_k^2 \left(E\!\left[\frac{\partial h(g)}{\partial g_k}\right]\right)^{\!2} \le \mathrm{Var}[h(g)] \le \sum_k \sigma_k^2\, E\!\left[\left(\frac{\partial h(g)}{\partial g_k}\right)^{\!2}\right], \qquad (6)$$

with $\frac{\partial h}{\partial g_k}$ defined in the Bernoulli case as in Proposition 4. Further, (iii) if $h$ is linear, (6) holds with equalities, regardless of the distributions of the components of $g$.

Proof. For part (i), the lower bound is established by Cacoullos (1982, Proposition 3.7), and the upper bound on $\mathrm{Var}[h(g)]$ is established by Chen (1982, Corollary 3.2). For part (ii), the lower bound follows from restricting the results for binomial distributions in Cacoullos and Papathanasiou (1989, p. 355), and the upper bound is similarly a special case of the result in Cacoullos and Papathanasiou (1985, p. 183). Part (iii) follows trivially from the fact that $\frac{\partial h}{\partial g_k}$ is non-stochastic.

Proof of Proposition 4. By Lemma 3, WMD is equivalent to $\mathrm{Var}[\bar f(g; w)] = E[\mathrm{Var}[\bar f(g; w) \mid w]] \to 0$. Applying Lemma 4 conditionally on $w$ and using the bounds on $\mathrm{Var}[g_k \mid w]$,

$$E\!\left[\sum_k L_\sigma \left(E\!\left[\frac{\partial \bar f(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2}\right] \le \mathrm{Var}[\bar f(g; w)] \le E\!\left[\sum_k U_\sigma\, E\!\left[\left(\frac{\partial \bar f(g; w)}{\partial g_k}\right)^{\!2} \,\middle|\, w\right]\right].$$

The upper bound, the law of iterated expectations, and $U_\sigma < \infty$ imply that if $E\!\left[\sum_k \left(\frac{\partial \bar f(g; w)}{\partial g_k}\right)^{\!2}\right] \to 0$, then $\mathrm{Var}[\bar f(g; w)] \to 0$, and thus WMD holds. The lower bound and $L_\sigma > 0$ imply that if WMD holds, and thus $\mathrm{Var}[\bar f(g; w)] \to 0$, we have $E\!\left[\sum_k \left(E\!\left[\frac{\partial \bar f(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2}\right] \to 0$.

Proof of Proposition 5. For any $N$ and fixed $\bar w$ in the support of $w$, and for $i$ and $j$ such that $G_i(\bar w) \cap G_j(\bar w) = \emptyset$, we have $\tilde z_i \perp \tilde z_j \mid w = \bar w$, because $f_i$ and $f_j$ are functions of two non-overlapping subvectors of $g$, the components of which are conditionally independent. Thus $\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w = \bar w] = 0$ for such $(i, j)$ pairs, and we obtain

$$\frac{1}{N^2}\sum_{i,j} \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big| = \frac{1}{N^2}\sum_{i,j} 1[G_i(w) \cap G_j(w) \ne \emptyset]\,\big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big| \le \frac{1}{N^2}\sum_{i,j} 1[G_i(w) \cap G_j(w) \ne \emptyset]\sqrt{\mathrm{Var}[\tilde z_i \mid w]\,\mathrm{Var}[\tilde z_j \mid w]} \le U_z\, \frac{1}{N^2}\sum_{i,j} 1[G_i(w) \cap G_j(w) \ne \emptyset],$$

and therefore $E\!\left[\frac{1}{N^2}\sum_{i,j} \big|\mathrm{Cov}[\tilde z_i, \tilde z_j \mid w]\big|\right] \to 0$.

Proof of Proposition 6. By the law of iterated expectations, $E\!\left[\frac{1}{N}\sum_i x_i \tilde z_i\right] = \pi E\!\left[\frac{1}{N}\sum_i \tilde z_i^2\right] = \pi E\!\left[\frac{1}{N}\sum_i \mathrm{Var}[\tilde z_i \mid w]\right]$. The result then follows directly from Lemma 4 applied to each $\tilde f_i(g; w)$:

$$\frac{1}{N}\sum_i \mathrm{Var}[\tilde z_i \mid w] \ge \frac{1}{N}\sum_i \sum_k \mathrm{Var}[g_k \mid w]\left(E\!\left[\frac{\partial \tilde f_i(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2} \ge L_\sigma\, \frac{1}{N}\sum_i \sum_k \left(E\!\left[\frac{\partial \tilde f_i(g; w)}{\partial g_k} \,\middle|\, w\right]\right)^{\!2},$$

and thus $\left|E\!\left[\frac{1}{N}\sum_i x_i \tilde z_i\right]\right| \ge |\pi|\, L_\sigma L_{HHI}$.

Footnotes

1

Examples of these three settings include Miguel and Kremer (2004), Donaldson and Hornbeck (2016), and Currie and Gruber (1996), respectively. Our working paper (Borusyak and Hull, 2021) discusses other common treatments and instruments nested in our framework: linear and nonlinear shift-share variables, model-implied optimal instruments, instruments based on centralized school assignment mechanisms, “free-space” instruments for access to mass media, and variables leveraging weather shocks.

2

While recentering is the key step that removes OVB, removing residual variation is likely to increase the efficiency of estimation in large samples. We give practical recommendations for each adjustment in the paper’s conclusion.

3

Our approach is “design-based,” in that identification is achieved by specifying the assignment process of some observed shocks (see, e.g., Lee (2008), Athey and Imbens (2022), Shaikh and Toulis (2021), and de Chaisemartin and Behaghel (2020)). This strategy for analyzing observational data builds on a long tradition in the analysis of randomized experiments, going back to Neyman (1923). It contrasts with other identification strategies that instead model the residual determinants of the outcome, such as difference-in-difference strategies (e.g. de Chaisemartin and D’Haultfoeuille (2020) and Athey et al. (2021)) or fully-specified structural models.

4

A notable exception of a recentering-type regression adjustment in the traditional propensity scores setting is the E-estimator of Robins et al. (1992).

5

Nothing is changed in what follows if one instead considers the number of not-dewormed neighbors as the treatment. For simplicity here we consider only the spillover treatment and not also the direct deworming treatment; see Section 3.6 on the extension to multiple treatments.

6

Aronow and Samii (2017) use a similar “exposure mapping” terminology for objects like $f_i(\cdot; w)$ in the network spillover context. We depart from this literature by referring to the realized $f_i(g; w)$ as the “treatment” or “candidate instrument” and not the realized “exposure.”

7

For simplicity here we assume away any direct effects of deworming; see Section 3.6 on the extension to multiple treatments.

8

Such OVB may arise even if (as in the network spillovers and market access examples) variation in the treatment “results” from the experimental shocks, in the sense that $x_1 = \dots = x_N = 0$ whenever $g_1 = \dots = g_K = 0$.

9

An example is given by Carvalho et al. (2021), where $i$ is a Japanese firm and $x_i$ is the distance in the firm-to-firm supply network from $i$ to the nearest firm located in the area hit by an earthquake. Unlike the number of treated neighbors, this spillover treatment is a nonlinear function of the earthquake shock dummies. The earthquake assignment process is also more complex, exhibiting spatial correlation. Our recentering approach still applies naturally in cases like this.

10

With completely random policy assignment, flexibly controlling for $\mathrm{IncDem}_i$ may purge OVB, as this is the only source of variation in $\mu_i$. However, even in this setting the relevant demographics in $\mathrm{IncDem}_i$ and their interactions can be high-dimensional, as discussed by Gruber (2003). This problem is exacerbated under more complex assignment processes, e.g. if policies can be viewed as random only within some groups of states, in which case group indicators and their interactions with the demographics would also have to be included. Recentering extends naturally and avoids the curse of dimensionality.

11

Formally, we assume $(x_i, \varepsilon_i)_{i=1}^N$ and the $g$ and $w$ variables introduced below are all drawn from some joint distribution which is unrestricted at this point.

12

In some cases (such as the network spillover and Medicaid eligibility examples in the previous section) the candidate instrument can be naturally written as $z_i = f(g, w_i)$ with a common function $f(\cdot)$ and a unit-specific measure of exposure $w_i$. In other cases a more general notation is necessary: in the market access example, for instance, the MA of each region depends on the entire country’s economic geography. An alternative way to formalize general composite variables is $z_i = f(g; w, \tilde w_i)$ for a common $w$ and unit-specific $\tilde w_i$. This notation is equivalent to (3); we use $f_i(g; w)$ as it is more compact. We also note that equation (3) does not contain a residual: it formalizes an algorithm for computing an instrument rather than characterizing an economic relationship.

13

The exclusion and as-good-as-random assignment assumptions are isolated in Appendix C.1 of BH, via a general potential outcomes model.

14

Our identification results hold under the weaker conditional mean independence assumption of $E[\varepsilon \mid g, w] = E[\varepsilon \mid w]$. This assumption can be understood as defining a partially linear model, as in Robinson (1988): $y_i = \beta x_i + \psi_i(w) + \tilde\varepsilon_i$, where $\psi_i(w) = E[\varepsilon_i \mid w]$ and $E[\tilde\varepsilon_i \mid g, w] = 0$ for $\tilde\varepsilon_i = \varepsilon_i - \psi_i(w)$. A difference from Robinson (1988) arises because we do not assume iid data; for instance, we do not assume $\psi_i(w) \equiv \psi(w_i)$ for iid $w_i$.

15

Throughout, we allow $(\varepsilon, w)$ to be stochastic (as when some components are sampled from a superpopulation) or fixed (as in a more conventional “design-based” analysis; e.g. Athey and Imbens (2022)). In the Medicaid eligibility example it may be more natural to view the observed $(\mathrm{IncDem}_i, \mathrm{state}_i)$ as sampled from the national population, along with untreated potential outcomes $\varepsilon_i$. Conversely, in the market access example, it may be more natural to view the set of observed regions as a finite population, with fixed geography $(\mathrm{Loc}_i)_{i=1}^N$. With fixed $(\varepsilon, w)$, Assumption 1 holds trivially but Assumption 2 below is still restrictive.

16

It is worth emphasizing that in our non-iid setup these conditions combine two dimensions of variation: over the stochastic realizations of $g$, $w$, $x$, and $\varepsilon$, and across the cross-section of observations $i = 1, \dots, N$. In the iid case they reduce to the more familiar conditions of $E[z_i x_i] \ne 0$ and $E[z_i \varepsilon_i] = 0$.

17

There exist $f_i(\cdot)$ constructions that yield a relevant recentered instrument whenever the shocks induce some variation in treatment. Formally, when $\mathrm{Var}[E[x_i \mid g, w] \mid w]$ is not almost-surely zero at least for some $i$, the recentered instrument constructed as $\tilde z_i = E[x_i \mid g, w] - E[x_i \mid w]$ is relevant. This again follows by the law of iterated expectations: $E\!\left[\frac{1}{N}\sum_i \tilde z_i x_i\right] = E\!\left[\frac{1}{N}\sum_i \tilde z_i E[x_i \mid g, w]\right] = E\!\left[\frac{1}{N}\sum_i \mathrm{Var}[E[x_i \mid g, w] \mid w]\right]$.

18

Formally, the regression with $\mu_i$ as a control yields the reduced-form and first-stage moments $E\!\left[\frac{1}{N}\sum_i z_i^\perp y_i\right]$ and $E\!\left[\frac{1}{N}\sum_i z_i^\perp x_i\right]$, where $v_i^\perp$ denotes the residuals from a cross-sectional projection of $v_i$ on $\mu_i$. We show in Appendix B.1 of BH that these moments also identify $\beta$ under Assumption 1. Appendix C.9 of BH shows that controlling for $\mu_i$ always reduces the asymptotic variance of the estimator when $z_i \mid w$ is homoskedastic, while also giving a counterexample under heteroskedasticity.

19

In panel data with $z_{it} = f_{it}(g_t, w_t)$, for example, unit fixed effects generally purge OVB only when the expected instrument is time-invariant, which generally requires the $f_{it}(\cdot)$ mapping, the value of $w_t$, and the distribution of $g_t$ to be time-invariant. While plausible in some applications, these conditions (in particular, stationarity of the shock distribution) can be quite restrictive. For instance, when new railroad lines tend to be built more often than existing lines are removed, expected market access will tend to grow over time.

20

For the identification results, it is enough to approximate $\mu_i$ by an average of $f_i(g^{(s)}; w)$ for any number $S$ of draws $g^{(s)}$ from $G(g \mid w)$, taken independently of each other and of $g$. We have, for example, $E\!\left[\frac{1}{N}\sum_i \left(z_i - \frac{1}{S}\sum_s f_i(g^{(s)}; w)\right)\varepsilon_i\right] = 0$ by iterated expectations, since $E[z_i \mid w, \varepsilon] = E[f_i(g^{(s)}; w) \mid w, \varepsilon]$. We discuss how the number of draws affects the asymptotic behavior of the recentered IV estimator below.

21

Formally, suppose either $E[\check z_i \mid w] = 0$ or $E[\check\varepsilon_i \mid w] = 0$ for each $i$, where $\check v_i$ denotes the residuals from a cross-sectional projection of variable $v_i$ on some functions $m_i(w)$ of $w$ used as controls. Then $E\!\left[\frac{1}{N}\sum_i \check z_i \check\varepsilon_i\right] = 0$. See Appendix C.6 of BH for our framework extended to predetermined controls.

22

The proof of Proposition 1 shows Assumption 3 is also sufficient for consistency when $\mu_i$ is approximated by an average of $f_i(g^{(s)}; w)$ for $g^{(s)}$ drawn as in footnote 20. Intuitively, the variance of $\frac{1}{N}\sum_i \tilde z_i \varepsilon_i$ is higher with fewer simulations $S$ but converges to zero for any fixed $S$ under Assumption 3.

23

Proposition 4 also applies when shocks are normally distributed, and when $f_i(\cdot; w)$ is linear regardless of the shock distribution. It further gives a necessary condition for Assumption 3 in all of these cases, which is a slightly stronger notion of vanishing average shock exposure concentration. For binary shocks, $\frac{\partial \bar f(g; w)}{\partial g_k}$ is defined as the difference in $\bar f$ when $g_k$ switches from 0 to 1, keeping all other shocks fixed.

24

This follows because $\hat\beta - \beta = h_N(r_1, r_2) \equiv \frac{r_1}{E_{P_N}\left[\frac{1}{N}\sum_i \tilde z_i x_i\right] + r_2}$ for $r_1 = \frac{1}{N}\sum_i \tilde z_i \varepsilon_i$ and $r_2 = \frac{1}{N}\sum_i \tilde z_i x_i - E_{P_N}\left[\frac{1}{N}\sum_i \tilde z_i x_i\right]$. Since $h_N(\cdot)$ is Lipschitz-continuous at $(0, 0)$ with a Lipschitz constant uniformly bounded from above, $\hat\beta \to_p \beta$ if $r_1$ and $r_2$ converge to zero in probability.

25

For example, with a linear first stage, $\mathrm{Var}_{P_N}\!\left[\frac{1}{N}\sum_i \tilde z_i x_i\right] = \mathrm{Var}_{P_N}\!\left[\pi \frac{1}{N}\sum_i \tilde z_i^2 + \frac{1}{N}\sum_i \tilde z_i(\pi\mu_i + u_i)\right]$. Here, with mutually independent binary shocks, Lemma 4 in Appendix C ensures $\mathrm{Var}_{P_N}\!\left[\frac{1}{N}\sum_i \tilde z_i^2\right] \to 0$ when the expected sum of squared effects of individual shocks on $\frac{1}{N}\sum_i \tilde z_i^2$ converges to zero, and Proposition 1 implies $\mathrm{Var}_{P_N}\!\left[\frac{1}{N}\sum_i \tilde z_i(\pi\mu_i + u_i)\right] \to 0$ when Assumption 3 holds and $E_{P_N}[(\pi\mu_i + u_i)^2 \mid w]$ is uniformly bounded.

26

An exception is Adão et al. (2019), who derive non-standard asymptotic inference in one such setting: when $z_i$ is a linear shift-share variable, i.e. with $f_i(g; w) = \sum_k w_{ik} g_k$.

27

Specifically, RI guarantees the validity of tests for the model parameter β, which can be interpreted as a constant treatment effect. Valid inference with heterogeneous effects in the kind of interdependent data we study is a difficult challenge, even with an asymptotic approach (Adão et al., 2019).

28

With additional predetermined controls included in the regression (e.g. $\mu_i$), the same property is satisfied by the residualized statistic $\frac{1}{N}\sum_i \tilde z_i (y_i^\perp - b x_i^\perp)$, where $v_i^\perp$ denotes the residuals from a cross-sectional projection of $v_i$ on the included controls.

29

RI confidence intervals based on this statistic are still obtained by test inversion, and not from the distribution of the recentered estimator itself across counterfactual shocks $g$. The latter idea fails in IV since a re-randomized instrument $f_i(g'; w) - \mu_i$, computed from counterfactual shocks $g'$, has a true first stage of zero. The distribution of reduced-form coefficients across counterfactual shocks is also not useful, except for testing $\beta = 0$, as that distribution is centered around zero rather than $\beta$.

30

This statistic $T = \tilde z' R (R'R)^{-1} R' \tilde z$ extends our single-dimensional test: it is a quadratic form in the vector-valued statistic $\frac{1}{N}\sum_i \tilde z_i R_i$, weighted by $(R'R)^{-1}$, where $R$ is the matrix collecting the $R_i$ and $\tilde z$ is the vector collecting the $\tilde z_i$.

31

We note that this heterogeneous effects extension applies to identification but not to randomization-based confidence intervals, which, as noted above, require a sharp null hypothesis $\beta_i = b$ for all $i$.

32

Construction was started by the Medium- and Long-Term Railway Plan in 2004; this plan was later expanded in 2008 and again in 2016.

33

Most prefectures are officially called “prefecture-level cities,” but typically include multiple urban areas.

34

We define a line by a contiguous set of inter-prefecture HSR links that were proposed together and opened simultaneously. One pilot HSR line between Qinhuangdao and Shenyang opened in 2003. We include it in our market access measure but focus on the bulk of HSR growth over 2007–2016.

35

This regression can be viewed as a reduced form of a hypothetical IV regression, in which the treatment is a measure of market access that accounts for changes in population. We focus on the reduced form because of data constraints: we only observe the population of all 340 prefectures in the 2000 Census.

36

These results are consistent with correct specification of counterfactuals (i.e. we cannot reject Assumption 2), though we note they do not provide direct support for the exogeneity of HSR construction to the unobserved determinants of employment (Assumption 1).

37

In BH we provide additional robustness checks, adjusting the definitions of MA and outcome variables, using a binary measure of connectivity to the HSR network, including province fixed effects, dropping influential prefectures, and examining the role of treatment effect heterogeneity.

38

The shapefiles are obtained from OCHA Regional Office for Asia and the Pacific (2018, 2020), accessed on April 4, 2020.

39

Accessed on November 20, 2018.

40

Data for 2008-2015, excluding 2009 and 2011, are from http://oversea.cnki.net.proxy.uchicago.edu/kns55/default.aspx; data from 2009, 2011, 2016, and 2017, are from http://tongji.oversea.cnki.net/chn/navi/HomePage.aspx?id=N2018050234&name=YZGCA (all accessed on January 23, 2019 via a University of Chicago portal).

41

We thank Mikhail Dektiarev for help with this proof.

Contributor Information

Kirill Borusyak, UC Berkeley and CEPR.

Peter Hull, Brown and NBER.

References

  1. Abadie A. (2003): “Semiparametric instrumental variable estimation of treatment response models,” Journal of Econometrics, 113, 231–263.
  2. Abadie A, Athey S, Imbens GW, and Wooldridge JM (2020): “Sampling-based vs. Design-based Uncertainty in Regression Analysis,” Econometrica, 88, 265–296.
  3. Abadie A, and Imbens GW (2016): “Matching on the Estimated Propensity Score,” Econometrica, 84, 781–807.
  4. Abdulkadiroglu A, Angrist JD, Narita Y, and Pathak PA (2017): “Research Design Meets Market Design: Using Centralized Assignment for Impact Evaluation,” Econometrica, 85, 1373–1432.
  5. Adão R, Kolesár M, and Morales E (2019): “Shift-Share Designs: Theory and Inference,” Quarterly Journal of Economics, 134, 1949–2010.
  6. Ahlfeldt GM, and Feddersen A (2018): “From periphery to core: Measuring agglomeration effects using high-speed rail,” Journal of Economic Geography, 18, 355–390.
  7. Aronow PM, and Samii C (2017): “Estimating average causal effects under general interference, with application to a social network experiment,” Annals of Applied Statistics, 11, 1912–1947.
  8. Athey S, Bayati M, Doudchenko N, Imbens GW, and Khosravi K (2021): “Matrix Completion Methods for Causal Panel Data Models,” Journal of the American Statistical Association, 116, 1716–1730.
  9. Athey S, and Imbens GW (2022): “Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption,” Journal of Econometrics, 226, 62–79.
  10. Borusyak K, and Hull P (2021): “Non-Random Exposure to Exogenous Shocks: Theory and Applications,” Mimeo.
  11. Borusyak K, Hull P, and Jaravel X (2022): “Quasi-Experimental Shift-Share Research Designs,” Review of Economic Studies, 89, 181–213.
  12. Brinkhoff T. (2018): “City Population,” http://www.citypopulation.de.
  13. Cacoullos T. (1982): “On Upper and Lower Bounds for the Variance of a Function of a Random Variable,” The Annals of Probability, 10, 799–809.
  14. Cacoullos T, and Papathanasiou V (1985): “On upper bounds for the variance of functions of random variables,” Statistics and Probability Letters, 3, 175–184.
  15. Cacoullos T, and Papathanasiou V (1989): “Characterizations of Distributions by Variance Bounds,” Statistics and Probability Letters, 7, 351–356.
  16. Carvalho VM, Nirei M, Saito YU, and Tahbaz-Salehi A (2021): “Supply Chain Disruptions: Evidence from the Great East Japan Earthquake,” Quarterly Journal of Economics, 136, 1255–1321.
  17. Cattaneo MD, Frandsen BR, and Titiunik R (2015): “Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate,” Journal of Causal Inference, 3, 1–24.
  18. de Chaisemartin C, and Behaghel L (2020): “Estimating the Effect of Treatments Allocated by Randomized Waiting Lists,” Econometrica, 88, 1453–1477.
  19. de Chaisemartin C, and D’Haultfœuille X (2020): “Two-way fixed effects estimators with heterogeneous treatment effects,” American Economic Review, 110, 2964–2996.
  20. Chen LHY (1982): “An Inequality for the Multivariate Normal Distribution,” Journal of Multivariate Analysis, 12, 306–315.
  21. China Railway Yearbook Editorial Board (2001-2013): China Railway Yearbook, https://oversea.cnki.net/KNavi/YearbookDetail?pcode=CYFD&pykm=YZGTD&.
  22. China Statistics Press (2000-2017): China City Statistical Yearbook, https://cnki.net/KNavi/YearbookDetail?pcode=CYFD&pykm=YZGCA.
  23. Conley TG (1999): “GMM estimation with cross sectional dependence,” Journal of Econometrics, 92, 1–45.
  24. Currie J, and Gruber J (1996): “Health Insurance Eligibility, Utilization of Medical Care, and Child Health,” The Quarterly Journal of Economics, 111, 431–466.
  25. Donaldson D. (2018): “Railroads of the Raj: Estimating the Impact of Transportation Infrastructure,” American Economic Review, 108, 899–934.
  26. Donaldson D, and Hornbeck R (2016): “Railroads and American Economic Growth: A ”Market Access” Approach,” Quarterly Journal of Economics, 131, 799–858.
  27. Fisher RA (1935): The design of experiments: Oliver & Boyd.
  28. Gruber J. (2003): “Medicaid,” in Means-tested transfer programs in the United States: University of Chicago Press, 15–78.
  29. Hodges JLJ, and Lehmann EL (1963): “Estimates of Location Based on Rank Tests,” The Annals of Mathematical Statistics, 34, 598–611.
  30. Imbens GW (2000): “The role of the propensity score in estimating dose-response functions,” Biometrika, 87, 706–710.
  31. Imbens GW, and Angrist JD (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.
  32. King G, and Nielsen R (2019): “Why Propensity Scores Should Not Be Used for Matching,” Political Analysis, 27, 435–454.
  33. Lawrence M, Bullock R, and Liu Z (2019): China’s High-Speed Rail Development, Washington, D.C.: World Bank.
  34. Lee DS (2008): “Randomized experiments from non-random selection in U.S. House elections,” Journal of Econometrics, 142, 675–697.
  35. Lehmann EL, and Romano JP (2006): Testing statistical hypotheses: Springer Science & Business Media.
  36. Lin Y. (2017): “Travel costs and urban specialization patterns: Evidence from China’s high speed railway system,” Journal of Urban Economics, 98, 98–123.
  37. Ma D. (2011): “China’s Long, Bumpy Road to High-Speed Rail,” The Atlantic.
  38. Madestam A, Shoag D, Veuger S, and Yanagizawa-Drott D (2013): “Do Political Protests Matter? Evidence from the Tea Party Movement,” Quarterly Journal of Economics, 128, 1633–1685.
  39. Miguel E, and Kremer M (2004): “Worms: Identifying impacts on education and health in the presence of treatment externalities,” Econometrica, 72, 159–217.
  40. Neyman J. (1923): “On the application of probability theory to agricultural experiments. Essay on principles,” Ann. Agricultural Sciences, 1–51.
  41. OCHA Regional Office for Asia and the Pacific (2018): “Province and Prefecture Capitals of China,” https://data.humdata.org/dataset/province-and-prefecture-capitals-of-china.
  42. OCHA Regional Office for Asia and the Pacific (2020): “China - Subnational Administrative Boundaries,” https://data.humdata.org/dataset/cod-ab-chn.
  43. Ollivier G, Bullock R, Jin Y, and Zhou N (2014): “High-Speed Railways in China: A Look at Traffic,” China Transport Topics, 1–12.
  44. Redding SJ, and Turner MA (2015): “Transportation Costs and the Spatial Organization of Economic Activity,” in Handbook of regional and urban economics: Elsevier, 1339–1398.
  45. Redding SJ, and Venables AJ (2004): “Economic geography and international inequality,” Journal of International Economics, 62, 53–82.
  46. Robins JM, Mark SD, and Newey WK (1992): “Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders,” Biometrics, 48, 479–495.
  47. Robinson P. (1988): “Root-N-Consistent Semiparametric Regression,” Econometrica, 56, 934–954.
  48. Rosenbaum PR (2002): “Covariance adjustment in randomized experiments and observational studies,” Statistical Science, 17, 286–327.
  49. Rosenbaum PR, and Rubin DB (1983): “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
  50. Shaikh A, and Toulis P (2021): “Randomization Tests in Observational Studies with Staggered Adoption of Treatment,” Journal of the American Statistical Association, 116, 1835–1848.
  51. Wooldridge JM (2015): “Control function methods in applied econometrics,” Journal of Human Resources, 50, 420–445.
  52. Zheng S, and Kahn ME (2013): “China’s bullet trains facilitate market integration and mitigate the cost of megacity growth,” Proceedings of the National Academy of Sciences of the United States of America, 110, 1248–1253.
