Published in final edited form as: Stat Probab Lett. 2023 Mar 21;198:109836. doi: 10.1016/j.spl.2023.109836

Proximal Causal Inference without Uniqueness Assumptions

Jeffrey Zhang a, Wei Li b, Wang Miao c, Eric Tchetgen Tchetgen a
PMCID: PMC10887303  NIHMSID: NIHMS1884513  PMID: 38405420

Abstract

We consider identification and inference about a counterfactual outcome mean when there is unmeasured confounding, using tools from proximal causal inference. Proximal causal inference requires the existence of solutions to at least one of two integral equations. We motivate the existence of solutions to these integral equations by demonstrating that, assuming the existence of a solution to one of the integral equations, √n-estimability of a mean functional of that solution requires the existence of a solution to the other integral equation. Solutions to the integral equations may not be unique, which complicates estimation and inference. We construct a consistent estimator for the solution set of one of the integral equations and then adapt the theory of extremum estimators to select from the estimated set a consistent estimator of a uniquely defined solution. A debiased estimator is shown to be root-n consistent, regular, and semiparametrically locally efficient under additional regularity conditions.

Keywords: Proximal Causal Inference, √n-estimability

1. Introduction

It is widely acknowledged that unmeasured confounding is pervasive in observational studies, as it is unlikely that an investigator will have measured all confounders of the treatment and outcome. Often, the best one can hope for is that some measured confounders can act as proxies for true, unmeasured confounders. Proximal causal inference was developed to circumvent the issue of unmeasured confounders through the use of suitable proxy variables. See Tchetgen Tchetgen et al. (2020), Shi et al. (2020), Miao et al. (2018), and Shi et al. (2020) for a more comprehensive overview of the framework. Proximal causal inference leverages the existence of either a treatment confounding bridge function or an outcome confounding bridge function, which solve certain integral equations (Miao et al. (2018), Cui et al. (2020), Deaner (2018)). Then, the population average treatment effect (ATE) and the average treatment effect on the treated (ATT) are respectively uniquely identified nonparametrically as a certain linear mean functional of a confounding bridge function, without the latter necessarily being uniquely identified (Miao et al. (2018), Cui et al. (2020), Deaner (2018)). However, construction of a root-n consistent, regular and asymptotically linear semiparametric locally efficient estimator of the ATE or ATT in prior literature has relied exclusively on uniqueness of the bridge functions. In this note, the goal is to investigate estimation and inference for the counterfactual mean, and thus for the ATE and ATT, when uniqueness of solutions to the integral equations does not hold. Specifically, we construct a root-n consistent, regular and asymptotically linear nonparametric estimator of the ATE without requiring uniqueness of the confounding bridge functions. The proposed methods build on recent methods developed in Santos (2011) and Li et al. (2021). In somewhat related settings, the former considers a linear functional of the structural function in (i) the widely studied nonparametric instrumental variable problem (Chen and Pouzo (2012)), while the latter considers (ii) a nonparametric shadow variable framework in a nonignorable missing data problem (D’Haultfoeuille (2010), Miao and Tchetgen Tchetgen (2016)). Proximal identification and inference differ from both of these settings in that the identification challenge requires the use of two proxies, whereas (i) and (ii) technically require a single proxy, namely a valid instrument in (i) and a valid shadow variable in (ii).

An outline of the paper is as follows: in Section 2, we review identification strategies for the counterfactual mean, and thus the ATE and ATT, from previous works under the proximal framework, and describe sufficient and necessary conditions for identification and root-n estimation of the ATE. In Section 3, we develop an estimator for the counterfactual mean, and therefore for the ATE and ATT, establish its asymptotic theory, and discuss its semiparametric efficiency. We conclude with a discussion in Section 4 and include proofs in the Supplementary Material. While, for ease of exposition, all results are given for the counterfactual outcome mean, they apply equally to a broader class of functionals, as discussed in the Supplementary Material.

2. Identification

We wish to estimate the effect of a binary treatment A on an outcome Y in a setting where there is unmeasured confounding. Define Y(a) for a = 0, 1 to be the potential outcomes of the response if treatment had been externally set to a. The overall goal is to estimate the average treatment effect, $E[Y(1) - Y(0)]$. Let U be an unmeasured confounder and L a vector of observed covariates. First, consider the following familiar assumptions:

Assumption 1.

(Consistency) Y=Y(A) almost surely.

Assumption 2.

(Positivity) $0 < P(A = a \mid L) < 1$ for $a = 0, 1$, almost surely.

Assumption 3.

(Exchangeability) $Y(a) \perp A \mid L$ for $a = 0, 1$.

Under these assumptions, the average treatment effect is identified. However, the exchangeability assumption requires that there be no unmeasured confounders, an assumption that is often untenable in observational study settings. Instead, we adopt the recent proximal causal inference framework, wherein we require a treatment confounding proxy Z and an outcome confounding proxy W, in addition to the measured covariates X. This leads to the following assumptions, as introduced by Cui et al. (2020):

Assumption 4.

(Latent Unconfoundedness)

$(Z, A) \perp (Y(a), W) \mid U, X$ for $a = 0, 1$.

Assumption 5.

(Positivity) $0 < P(A = a \mid U, X) < 1$ almost surely, for $a = 0, 1$.

Assumption 6.

(Completeness 1) For any square-integrable $g(U)$ and any $a, x$, if $E[g(U) \mid Z, A = a, X = x] = 0$ almost surely, then $g(U) = 0$ almost surely.

There are two ways to identify the counterfactual mean E[Y(a)]. The first method is given by the following theorem:

Theorem 2.1.

(Miao et al. (2018)) Suppose that there exists an outcome confounding bridge function h(w,a,x) that solves the following integral equation

$E[Y \mid Z, A, X] = \int h(w, A, X)\, dF(w \mid Z, A, X)$  (1)

almost surely. Under Assumptions 1, 4, 5, and 6, one has that

$E[Y(a)] = \int_{\mathcal{X}} \int h(w, a, x)\, dF(w \mid x)\, dF(x)$  (2)

Assumption 7.

(Completeness 2) For any square-integrable $g(U)$ and any $a, x$, if $E[g(U) \mid W, A = a, X = x] = 0$ almost surely, then $g(U) = 0$ almost surely.

Using this second completeness assumption, the following theorem provides an alternative identification scheme:

Theorem 2.2.

(Cui et al. (2020)) Suppose that there exists a treatment confounding bridge function q(z,a,x) that solves the integral equation:

$E[q(Z, a, X) \mid W, A = a, X] = \dfrac{1}{f(A = a \mid W, X)}$  (3)

Then under Assumptions 1, 4, 5, and 7, one has

$E[Y(a)] = \int_{\mathcal{X}} \int I(\tilde{a} = a)\, q(z, a, x)\, y\, dF(y, z, \tilde{a} \mid x)\, dF(x)$  (4)

In the above theorems, only existence of an outcome confounding bridge function h or a treatment confounding bridge function q is required for identification of the ATE; they need not be unique. Suppose that one has observed $n$ i.i.d. data samples consisting of the variables $(A, Z, X, W, Y)$. Let $L^2(Y, W, Z, A, X)$ denote the space of real-valued functions of $(Y, W, Z, A, X)$ that are square integrable with respect to the distribution of $(Y, W, Z, A, X)$, equipped with the inner product $\langle f_1, f_2 \rangle := E[f_1 f_2]$. For any bounded linear map $T$, define $\mathcal{D}(T)$, $\mathcal{R}(T)$, $\mathcal{N}(T)$, and $T^*$ to be the domain, range, null space, and adjoint of $T$, and let $T^{\perp}$ denote the orthocomplement of a set $T$. Let $T_o: L^2(W, A, X) \to L^2(Z, A, X)$ where $T_o g(W, A, X) = E[g(W, A, X) \mid Z, A, X]$. Let $T_t: L^2(Z, A, X) \to L^2(W, A, X)$ where $T_t g(Z, A, X) = E[g(Z, A, X) \mid W, A, X]$.

Before proceeding, we provide a purely statistical motivation for the integral equations (1) and (3). The following requires no reference to an unmeasured confounder U and invokes neither the assumptions of Theorem 2.1 nor those of Theorem 2.2. Consider the following two scenarios:

  1. Suppose (1) holds, i.e., there exists a function $h \in L^2(W, A, X)$ such that $E[Y \mid Z, A, X] = E[h(W, A, X) \mid Z, A, X]$.

  2. Suppose (3) holds, i.e., there exists a function $q \in L^2(Z, A, X)$ such that $\dfrac{1}{f(A \mid W, X)} = E[q(Z, A, X) \mid W, A, X]$.

Under the first scenario, consider the problem of estimating a functional of the following form:

$\beta_o = E[\phi_o(W, A, X)\, h(W, A, X)]$  (5)

where $\phi_o$ is a known function in $L^2(W, A, X)$. Recall that $T_o: L^2(W, A, X) \to L^2(Z, A, X)$ with $T_o g(W, A, X) = E[g(W, A, X) \mid Z, A, X]$. Then we have the following:

Proposition 2.3.

Under the assumption that (1) holds, $\beta_o$ is identified if and only if $\phi_o \in \mathcal{N}(T_o)^{\perp}$.

Proof. First, suppose $\beta_o$ is identified. Consider $h_1, h_2$ that satisfy Equation (1). Note that this implies that $h_1 - h_2 \in \mathcal{N}(T_o)$. Since $\beta_o$ is identified, both $h_1$ and $h_2$ yield the same value of $\beta_o$. Thus, we have that

$0 = E[\phi_o(W, A, X)(h_1 - h_2)]$

and so $\phi_o(W, A, X) \in \mathcal{N}(T_o)^{\perp}$, since $h_1 - h_2$ is an arbitrary element of $\mathcal{N}(T_o)$. Conversely, suppose $\phi_o(W, A, X) \in \mathcal{N}(T_o)^{\perp}$. Then for any $h_1$ and $h_2$ that satisfy Equation (1), we have $h_1 - h_2 \in \mathcal{N}(T_o)$, and so $E[\phi_o(W, A, X) h_1] = E[\phi_o(W, A, X) h_2]$; hence $\beta_o$ is identified. □

Note that $\mathcal{N}(T_o)^{\perp} = \mathrm{cl}(\mathcal{R}(T_o^*))$. However, the following lemma establishes that the stronger condition $\phi_o \in \mathcal{R}(T_o^*)$ is necessary for $\beta_o$ to be $\sqrt{n}$-estimable.

Lemma 2.4.

Assuming Equation (1) holds, together with additional regularity conditions described in the Supplementary Material, $\phi_o \in \mathcal{R}(T_o^*)$ is necessary for $\beta_o$ to be $\sqrt{n}$-estimable.

This result is analogous to a result derived in Severini and Tripathi (2012) in the non-parametric instrumental variables context. Next, note that

$E[h(W, a, X)] = E\big[E\{h(W, a, X) \mid W, X\}\big] = E\big[E\{h(W, a, X)\, I(A = a) / f(A = a \mid W, X) \mid W, X\}\big] = E\big[h(W, a, X)\, I(A = a) / f(A = a \mid W, X)\big] = E\big[h(W, A, X)\, I(A = a) / f(A = a \mid W, X)\big]$

which is in the form of Equation (5) with $\phi_o(W, A, X) = I(A = a) / f(A = a \mid W, X)$, which for current purposes may be assumed known. Lemma 2.4 thus implies that for $E[h(W, a, X)]$ to be $\sqrt{n}$-estimable, there must be a function $q(Z, A, X)$ that satisfies

$E[q(Z, A, X) \mid W, A, X] = I(A = a) / f(A = a \mid W, X)$

This corresponds to the condition from Equation 3. Likewise, consider the problem of estimating a functional of the following form:

$\beta_t = E[\phi_t(Z, A, X)\, q(Z, A, X)]$  (6)

where $\phi_t$ is a known function in $L^2(Z, A, X)$. Recall that $T_t: L^2(Z, A, X) \to L^2(W, A, X)$ with $T_t g(Z, A, X) = E[g(Z, A, X) \mid W, A, X]$. Analogous to Proposition 2.3, we have the following:

Proposition 2.5.

Under the assumption that (3) holds, $\beta_t$ is identified if and only if $\phi_t \in \mathcal{N}(T_t)^{\perp}$.

Proof. Note that for any $q_1$ and $q_2$ that satisfy (3), we must have $q_1 - q_2 \in \mathcal{N}(T_t)$. The argument then follows in exactly the same manner as in Proposition 2.3. □

As above, it is possible to establish that $\phi_t \in \mathcal{R}(T_t^*)$ is necessary for $\beta_t$ to be $\sqrt{n}$-estimable.

Lemma 2.6.

Assuming Equation (3) holds, together with additional regularity conditions described in the Supplementary Material, $\phi_t \in \mathcal{R}(T_t^*)$ is necessary for $\beta_t$ to be $\sqrt{n}$-estimable.

Next, observe that from Equation 4, we have

$E[I(A = a)\, q(Z, a, X)\, Y] = E\big[E\{I(A = a)\, q(Z, a, X)\, Y \mid Z, A, X\}\big] = E\big[I(A = a)\, q(Z, a, X)\, E[Y \mid Z, A, X]\big] = E\big[I(A = a)\, q(Z, A, X)\, E[Y \mid Z, A, X]\big]$

which is in the form of Equation (6) with $\phi_t(Z, A, X) = I(A = a)\, E[Y \mid Z, A, X]$, which for current purposes may be assumed known. Lemma 2.6 thus implies that for $E[I(A = a)\, q(Z, A, X)\, Y]$ to be $\sqrt{n}$-estimable, there must be a function $h(W, A, X)$ such that

$E[h(W, A, X) \mid Z, A, X] = I(A = a)\, E[Y \mid Z, A, X]$

This corresponds to the condition from Equation (1). We may conclude that, taking as a primitive condition that a solution to Equation (1) exists everywhere in the model, i.e. at all laws included in the semiparametric model, identification and root-n estimation of the counterfactual outcome mean necessarily implies that a solution to (3) exists at the true data generating law. On the other hand, taking as a primitive condition that a solution to Equation (3) exists at all laws of the semiparametric model, identification and root-n estimation of the counterfactual outcome mean necessarily implies that a solution to (1) exists at the true data generating law. The present setting differs from the shadow variable missing data setting studied in Li et al. (2021) in ways worth discussing. In the current setting, we aim to account for the presence of an unmeasured confounder U, and the key identification Assumption 4 involves this latent variable together with two fully observed auxiliary factors in the form of a pair of proxies Z and W, each of which plays a specific role. In contrast, identification in a shadow variable setting does not require invoking a latent factor, and requires only a single fully observed auxiliary variable satisfying a certain conditional independence condition (Li et al. (2021)). Despite these differences, our paper demonstrates that the analytic framework of Li et al. (2021) readily extends to the proximal causal inference setting. We further establish in the Supplementary Material that the approach applies to a general class of doubly robust functionals studied by Ghassami et al. (2022), for which a pair of nuisance functions is defined as solutions to Fredholm integral equations. The above propositions give some motivation for assuming the existence of solutions to the Fredholm integral equations in Theorems 2.1 and 2.2. In the next section, we describe an estimation strategy for the counterfactual mean without the assumption of a unique h or q solving the integral equations.

3. Estimation Strategy

We follow estimation strategies from Santos (2011) and Li et al. (2021). By the above discussion, it is sensible to construct solution sets for either of the Fredholm integral equations (1) and (3). However, estimating the solution set of the latter requires an estimate of the propensity score. Thus, we consider estimating the solution set of Equation (1). First, let $\mathcal{H}$ be a set of smooth functions. Define the solution set of the Fredholm integral equation as follows:

$\mathcal{H}_0 = \{h \in \mathcal{H} : E[h(W, A, X) \mid Z, A, X] = E[Y \mid Z, A, X]\}$  (7)

Under the assumptions from Theorem 2.1 and Theorem 2.2, $E[h(W, a, X)]$ has a causal interpretation as the counterfactual mean $E[Y(a)]$. Under these assumptions, to estimate $\mu_a := E[Y(a)]$, we first construct a consistent estimator $\hat{\mathcal{H}}_0$ for the set $\mathcal{H}_0$; next, we choose a specific $\hat{h}_0 \in \hat{\mathcal{H}}_0$ so that it is a consistent estimator of a fixed element $h_0 \in \mathcal{H}_0$.

3.1. Estimation of solution sets

Define the criterion function

$C(h) = E\big[E\{Y - h(W, A, X) \mid Z, A, X\}^2\big]$

In practice, the estimation of the solution set can be done in the two treatment arms separately, for example by taking the criterion function $C(h) = E\big[I(A = a)\, E\{Y - h(W, a, X) \mid Z, A = a, X\}^2\big]$. Note that

$\mathcal{H}_0 = \{h \in \mathcal{H} : C(h) = 0\}$

i.e., the solution set of the Fredholm integral equation consists of the zeros of the criterion function. To proceed with estimation, we adopt a two-stage approach. We aim to construct a sample analogue $C_n$ of the criterion function $C$. We let $\mathcal{H}_n$ be a sieve for $\mathcal{H}$. Specifically, for a known sequence of approximating functions $\{\psi_m(w, a, x)\}_{m=1}^{\infty}$, let

$\mathcal{H}_n = \Big\{h \in \mathcal{H} : h(w, a, x) = \sum_{m=1}^{m_n} \beta_m \psi_m(w, a, x)\Big\}$  (8)

where the coefficients $\beta_m$ are unknown and $m_n$ is known. To construct $C_n$, we require a nonparametric estimator of conditional expectations. For this, let $\{\phi_k(z, a, x)\}_{k=1}^{\infty}$ be a known sequence of approximating functions. Denote

$\phi(z, a, x) = \{\phi_1(z, a, x), \ldots, \phi_{k_n}(z, a, x)\}^T$

and let

$\Phi = \{\phi(Z_1, A_1, X_1), \ldots, \phi(Z_n, A_n, X_n)\}^T$

For a generic random variable $B = B(W, A, X, Y, \ldots)$ with realizations $\{B_i = B(W_i, A_i, X_i, Y_i, \ldots)\}_{i=1}^{n}$, the nonparametric sieve estimator of $E[B \mid Z = z, A = a, X = x]$ is

$\hat{E}(B \mid Z, A, X) = \phi(Z, A, X)^T (\Phi^T \Phi)^{-1} \sum_{i=1}^{n} \phi(Z_i, A_i, X_i) B_i$  (9)

The sample analogue $C_n(h)$ is then

$C_n(h) = \frac{1}{n} \sum_{i=1}^{n} \hat{e}^2(Z_i, A_i, X_i, h)$  (10)

where

$\hat{e}(Z_i, A_i, X_i, h) = \hat{E}[Y - h(W, A, X) \mid Z_i, A_i, X_i]$  (11)

Then the proposed estimator of $\mathcal{H}_0$ is $\hat{\mathcal{H}}_0 = \{h \in \mathcal{H}_n : C_n(h) \le c_n\}$, where $c_n$ is an appropriately chosen sequence that tends to 0.
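To make the two-stage construction concrete, here is a minimal Python sketch of how the sieve projection (9), the criterion (10)–(11), and the set estimator $\hat{\mathcal{H}}_0$ fit together. The synthetic data-generating process, the low-order polynomial bases standing in for $\{\psi_m\}$ and $\{\phi_k\}$, and the particular cutoff $c_n$ are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data with an unmeasured confounder U (illustrative only).
U = rng.normal(size=n)
X = rng.uniform(size=n)
Z = U + rng.normal(scale=0.5, size=n)                      # treatment confounding proxy
W = U + rng.normal(scale=0.5, size=n)                      # outcome confounding proxy
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * U + 0.3 * X))))
Y = 1.0 + A + U + 0.5 * X + rng.normal(scale=0.5, size=n)

# Low-order polynomial sieve bases standing in for {psi_m} and {phi_k}.
def psi(w, a, x):   # basis spanning H_n for h(w, a, x)
    return np.column_stack([np.ones_like(w), w, x, a, a * w])

def phi(z, a, x):   # basis used to estimate conditional expectations given (Z, A, X)
    return np.column_stack([np.ones_like(z), z, x, a, a * z, z * x])

Psi, Phi = psi(W, A, X), phi(Z, A, X)

# Projection onto the span of phi(Z_i, A_i, X_i): the sieve estimator (9) of E[. | Z, A, X].
Proj = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)

def Cn(beta):
    """Sample criterion (10): average of the squared projected residuals e_hat in (11)."""
    resid = Y - Psi @ beta
    return np.mean((Proj @ resid) ** 2)

# C_n is quadratic in beta, so its minimizer solves a projected least-squares problem.
beta_min, *_ = np.linalg.lstsq(Proj @ Psi, Proj @ Y, rcond=None)

# A cutoff c_n shrinking to zero more slowly than the minimized criterion (an assumed choice).
c_n = Cn(beta_min) + np.log(n) / n

def in_H0_hat(beta):
    """Membership in the estimated solution set {h in H_n : C_n(h) <= c_n}."""
    return Cn(beta) <= c_n

print(Cn(beta_min), in_H0_hat(beta_min))
```

Because $C_n$ is quadratic in the sieve coefficients, membership in $\hat{\mathcal{H}}_0$ reduces to a quadratic inequality in those coefficients, and the minimizer of $C_n$ has the projected least-squares form used above.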

3.2. Set consistency

In this section, we establish the set consistency of $\hat{\mathcal{H}}_0$ for $\mathcal{H}_0$ under Hausdorff distances. For this, define the Hausdorff distance between two sets $\mathcal{H}_1, \mathcal{H}_2 \subseteq \mathcal{H}$ as

$d_H(\mathcal{H}_1, \mathcal{H}_2, \|\cdot\|) = \max\{d(\mathcal{H}_1, \mathcal{H}_2, \|\cdot\|),\ d(\mathcal{H}_2, \mathcal{H}_1, \|\cdot\|)\}$

where $d(\mathcal{H}_1, \mathcal{H}_2, \|\cdot\|) = \sup_{h_1 \in \mathcal{H}_1} \inf_{h_2 \in \mathcal{H}_2} \|h_1 - h_2\|$ and $\|\cdot\|$ is a given norm. Consider the following two norms:

$\|h\|_w^2 = E\big[\{E[h(W, A, X) \mid Z, A, X]\}^2\big], \qquad \|h\|_{\infty} = \sup_{w, a, x} |h(w, a, x)|$

Notice that any $h, h_0 \in \mathcal{H}_0$ satisfy (1), so it holds that for any $\hat{h}_0 \in \hat{\mathcal{H}}_0$ and any $h, h_0 \in \mathcal{H}_0$, $\|\hat{h}_0 - h_0\|_w = \|\hat{h}_0 - h\|_w$, and so

$\|\hat{h}_0 - h_0\|_w = \inf_{h \in \mathcal{H}_0} \|\hat{h}_0 - h\|_w \le d_H(\hat{\mathcal{H}}_0, \mathcal{H}_0, \|\cdot\|_w)$  (12)

Thus, we can obtain the convergence rate of $\|\hat{h}_0 - h_0\|_w$ by finding the convergence rate of $d_H(\hat{\mathcal{H}}_0, \mathcal{H}_0, \|\cdot\|_w)$. We will also need to consistently estimate $\mathcal{H}_0$ under the supremum norm, because under the $\|\cdot\|_w$ norm the elements of $\mathcal{H}_0$ form an equivalence class. We will require several assumptions.
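Before turning to those assumptions, the following small sketch spells out the Hausdorff distance above for finite collections of candidate functions; representing each function by its values on a common grid, and using a root-mean-square as a stand-in for $\|\cdot\|_w$, are simplifications made purely for illustration.

```python
import numpy as np

def directed(H1, H2, norm):
    """Directed distance d(H1, H2, ||.||) = sup over h1 in H1 of inf over h2 in H2 of ||h1 - h2||."""
    return max(min(norm(h1 - h2) for h2 in H2) for h1 in H1)

def hausdorff(H1, H2, norm):
    """Hausdorff distance: the larger of the two directed distances."""
    return max(directed(H1, H2, norm), directed(H2, H1, norm))

# Toy illustration: each candidate function is stored by its values on a common grid of
# (w, a, x) points, so the sup norm becomes a max over grid values, while a simple
# root-mean-square serves as an illustrative stand-in for the weak norm ||.||_w.
H1 = [np.array([0.0, 1.0, 2.0]), np.array([0.5, 1.0, 1.5])]
H2 = [np.array([0.0, 1.1, 2.0])]

sup_norm = lambda v: float(np.max(np.abs(v)))
rms_norm = lambda v: float(np.sqrt(np.mean(v ** 2)))

print(hausdorff(H1, H2, sup_norm), hausdorff(H1, H2, rms_norm))
```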

Assumption 8.

The vector of covariates $X \in \mathbb{R}^d$ has support $[0, 1]^d$, and the outcome $Y$ and proxies $Z, W$ have compact support.

Definition 1.

For a generic function $\rho(\omega)$ defined for $\omega \in \mathbb{R}^d$, we define

$\|\rho\|_{\infty, \alpha} = \max_{|\lambda| \le \underline{\alpha}} \sup_{\omega} |D^{\lambda} \rho(\omega)| + \max_{|\lambda| = \underline{\alpha}} \sup_{\omega \ne \omega'} \dfrac{|D^{\lambda} \rho(\omega) - D^{\lambda} \rho(\omega')|}{\|\omega - \omega'\|^{\alpha - \underline{\alpha}}}$

where $\lambda$ is a $d$-dimensional vector of nonnegative integers, $|\lambda| = \sum_{i=1}^{d} \lambda_i$, $\underline{\alpha}$ denotes the largest integer smaller than $\alpha$, $D^{\lambda} \rho(\omega) = \partial^{|\lambda|} \rho(\omega) / \partial \omega_1^{\lambda_1} \cdots \partial \omega_d^{\lambda_d}$, and $D^{0} \rho(\omega) = \rho(\omega)$.

Assumption 9.

The following conditions hold:

  1. $\sup_{h \in \mathcal{H}} \|h\|_{\infty, \alpha} < \infty$ for some $\alpha > (d+1)/2$; $\mathcal{H}_0 \neq \emptyset$, and $\mathcal{H}_n$ and $\mathcal{H}$ are closed.

  2. for every $h \in \mathcal{H}$, there is a $\Pi_n h \in \mathcal{H}_n$ such that $\sup_{h \in \mathcal{H}} \|h - \Pi_n h\|_{\infty} = O(\eta_n)$ for some $\eta_n = o(1)$.

Assumption 10.

The following conditions hold:

  1. The smallest and largest eigenvalues of $E[\phi(Z, A, X) \phi(Z, A, X)^T]$ are bounded above and away from zero for all $k_n$.

  2. for every $h \in \mathcal{H}$, there is a $\pi_n(h) \in \mathbb{R}^{k_n}$ such that
    $\sup_{h \in \mathcal{H}} \|E[h(W, A, X) \mid Z = z, A = a, X = x] - \phi^T(z, a, x) \pi_n(h)\|_{\infty} = O(k_n^{-\alpha/(d+1)})$

  3. $\xi_n^2 k_n = o(n)$, where $\xi_n = \sup_{z, a, x} \|\phi(z, a, x)\|_2$.

Then we have the following proposition:

Proposition 3.1.

Suppose that Assumptions 8–10 hold. If $a_n = O(\lambda_n^{-1})$, $b_n \to 0$, and $b_n = o(a_n)$, where $a_n$, $b_n$, and $\lambda_n$ are sequences specified in the Supplementary Material, then

$d_H(\hat{\mathcal{H}}_0, \mathcal{H}_0, \|\cdot\|_{\infty}) = o_p(1) \quad \text{and} \quad d_H(\hat{\mathcal{H}}_0, \mathcal{H}_0, \|\cdot\|_w) = O_p(c_n^{1/2})$

After obtaining a consistent estimator $\hat{\mathcal{H}}_0$ of $\mathcal{H}_0$, we select a specific estimator from $\hat{\mathcal{H}}_0$ such that it converges to a unique element of $\mathcal{H}_0$.

3.3. A representer-based estimator

To do so, we let $M: \mathcal{H} \to \mathbb{R}$ be a population criterion function that attains a unique minimum $h_0$ on $\mathcal{H}_0$, and let $M_n(h)$ be its sample analogue. We then select

$\hat{h}_0 \in \arg\min_{h \in \hat{\mathcal{H}}_0} M_n(h)$  (13)

We make the following assumption about M:

Assumption 11.

The function set $\mathcal{H}$ is convex; the functional $M: \mathcal{H} \to \mathbb{R}$ is strictly convex and attains a unique minimum at $h_0$ on $\mathcal{H}_0$; its sample analogue $M_n: \mathcal{H} \to \mathbb{R}$ is continuous and satisfies $\sup_{h \in \mathcal{H}} |M_n(h) - M(h)| = o_p(1)$.

A sensible choice for $M$ is the squared length $M(h) = E[h(W, A, X)^2]$, with corresponding sample analogue $M_n(h) = \frac{1}{n} \sum_{i=1}^{n} h(W_i, A_i, X_i)^2$.

Proposition 3.2.

Suppose that Assumptions 8–11 hold. Then

$\|\hat{h}_0 - h_0\|_{\infty} = o_p(1)$

where $\hat{h}_0$ is defined in Equation (13). If $a_n = O(\lambda_n^{-1})$, $b_n \to 0$, and $b_n = o(a_n)$, we then have

$\|\hat{h}_0 - h_0\|_w = O_p(c_n^{1/2})$

Based on the identification condition $\mu_a = E[Y(a)] = E[h_0(W, a, X)]$, we propose the following estimator for $\mu_a$:

$\hat{\mu}_a = \frac{1}{n} \sum_{i=1}^{n} \hat{h}_0(W_i, a, X_i)$  (14)
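Continuing the numerical sketch from Section 3.1 (and reusing its data, bases, Proj, Cn, beta_min, and c_n), the following illustrates the selection step (13) with the squared-length criterion and the plug-in estimator (14). Because both $C_n$ and $M_n$ are quadratic in the sieve coefficients, a ridge-style relaxation of the constrained problem has a closed form; that shortcut, and fixing the treatment level at a = 1, are illustrative assumptions rather than the paper's prescription.

```python
# Selection step (13): within the estimated solution set, pick the element minimizing the
# squared-length criterion M_n(h) = (1/n) * sum_i h(W_i, A_i, X_i)^2.  Both C_n and M_n are
# quadratic in the sieve coefficients, so a ridge-style relaxation has a closed form; taking
# the largest tau on a coarse grid for which C_n <= c_n still holds is an illustrative
# shortcut, not the paper's algorithm.
def Mn(beta):
    return np.mean((Psi @ beta) ** 2)

def select(tau):
    G = Psi.T @ Proj @ Psi + tau * (Psi.T @ Psi)
    return np.linalg.solve(G, Psi.T @ Proj @ Y)

tau = 1.0
beta_hat = select(tau)
while Cn(beta_hat) > c_n and tau > 1e-8:     # relax the penalty until C_n(beta_hat) <= c_n
    tau /= 10.0
    beta_hat = select(tau)

# Plug-in estimator (14): average h_hat_0(W_i, a, X_i) with the treatment fixed at a = 1.
a = 1
h_hat_at_a = psi(W, np.full(n, a), X) @ beta_hat
mu_hat_a = h_hat_at_a.mean()
print(Mn(beta_hat), mu_hat_a)
```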

To facilitate analysis of the asymptotic properties of the estimator, let $\bar{\mathcal{H}}$ be the closure of the linear span of $\mathcal{H}$ under $\|\cdot\|_w$, and define

$\langle h_1, h_2 \rangle_w = E\big[I(A = a)\, E\{h_1(W, A, X) \mid A, Z, X\}\, E\{h_2(W, A, X) \mid A, Z, X\}\big]$

Next, we require the following representer assumption:

Assumption 12.

The following conditions hold:

  1. there exists a function $g_0 \in \mathcal{H}$ such that
    $\langle g_0, h \rangle_w = E\{h(W, a, X)\}$
    for all $h \in \bar{\mathcal{H}}$
  2. $\eta_n = o(n^{-1/3})$, $k_n^{-3\alpha/(d+1)} = o(n^{-1})$, $k_n^3 = o(n)$, $\xi_n^2 k_n^2 = o(n)$, and $\xi_n^2 k_n^{-2\alpha/(d+1)} = o(1)$.

Observe that any $g_0$ that satisfies Assumption 12(i) satisfies the following for all $h \in \bar{\mathcal{H}}$:

$E\big[I(A = a)\, E\{g_0(W, A, X) \mid A, Z, X\}\, h(W, A, X)\big] = E\big[I(A = a) / P(A = a \mid W, X)\, h(W, A, X)\big]$  (15)

Then suppose that

$E\big[I(A = a)\, E\{g_0(W, A, X) \mid A, Z, X\} - I(A = a) / P(A = a \mid W, X) \mid W, A, X\big] \in \bar{\mathcal{H}}$

Using (15) with this function in the role of $h$, and iterating expectations given $(W, A, X)$, one finds that this function has zero second moment, and hence

$E\big[I(A = a)\big(E\{g_0(W, A, X) \mid A, Z, X\} - 1 / P(A = a \mid W, X)\big) \mid W, A, X\big] = 0$ almost surely.

In other words, $E\{g_0(W, A, X) \mid A, Z, X\}$ solves the integral equation (3). Thus, Assumption 12(i) can be viewed as a strengthening of the assumption that there exists a function $q(Z, A, X)$ solving Equation (3). As in Santos (2011), the function $g_0$ will be unique only up to an equivalence class. We can now state the asymptotic expansion of our estimator:

Theorem 3.3.

Suppose that Assumptions 8–12 hold. Then we have that

$\sqrt{n}(\hat{\mu}_a - \mu_a) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big[h_0(W_i, a, X_i) - \mu_a + I(A_i = a)\, E\{g_0(W, A, X) \mid A_i, Z_i, X_i\}\,\{Y_i - h_0(W_i, A_i, X_i)\}\big] - \sqrt{n}\, r_n(\hat{h}_0) + o_p(1)$

where

$r_n(\hat{h}_0) = \frac{1}{n} \sum_{i=1}^{n} I(A_i = a)\, \hat{E}\{\Pi_n g_0(W, A, X) \mid A_i, Z_i, X_i\}\, \hat{e}(Z_i, A_i, X_i, \hat{h}_0)$  (16)

To obtain an asymptotically normal estimator, we may de-bias $\hat{\mu}_a$, which requires estimating the term $r_n(\hat{h}_0)$, as it may not be asymptotically negligible.

3.4. A de-biased estimator

First, we define a new criterion function

$R(h) = E\big[I(A = a)\, E\{h(W, A, X) \mid Z, A, X\}^2\big] - 2 E\{h(W, a, X)\}, \quad h \in \mathcal{H}$

and the sample analog

$R_n(h) = \frac{1}{n} \sum_{i=1}^{n} I(A_i = a)\, \hat{E}\{h(W, A, X) \mid Z_i, A_i, X_i\}^2 - \frac{2}{n} \sum_{i=1}^{n} h(W_i, a, X_i), \quad h \in \mathcal{H}$

Observe that since $\langle g_0, h \rangle_w = E\{h(W, a, X)\}$, we have that $R(h) = \|h - g_0\|_w^2 - \|g_0\|_w^2$. It follows that $g_0$ is the unique minimizer of the mapping $h \mapsto R(h)$. Then, since $g_0$ is close to $\Pi_n g_0$ by Assumption 9(ii), we can estimate the term $\Pi_n g_0$ by

$\hat{g} \in \arg\min_{h \in \mathcal{H}_n} R_n(h)$  (17)

With this estimate, we can construct the following estimator of $r_n(\hat{h}_0)$:

$\hat{r}_n(\hat{h}_0) = \frac{1}{n} \sum_{i=1}^{n} I(A_i = a)\, \hat{E}\{\hat{g}(W, A, X) \mid A_i, Z_i, X_i\}\, \hat{e}(Z_i, A_i, X_i, \hat{h}_0)$  (18)

Next, we have a lemma characterizing the convergence of $\hat{r}_n(\hat{h}_0)$ to $r_n(\hat{h}_0)$.

Lemma 3.4.

Suppose that Assumptions 8–10 and 12 hold. Then

$\sup_{\hat{h}_0 \in \hat{\mathcal{H}}_0} |\hat{r}_n(\hat{h}_0) - r_n(\hat{h}_0)| = O_p\big[c_n^{1/2}\{(k_n / n)^{1/4} + k_n^{-\alpha/\{2(d+1)\}}\}\big]$

Using this lemma, we can construct the following debiased estimator that is asymptotically normal:

$\hat{\mu}_a^{db} = \hat{\mu}_a + \hat{r}_n(\hat{h}_0)$  (19)
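Continuing the same numerical sketch (and reusing beta_hat, mu_hat_a, and the earlier projection), a minimal implementation of the de-biasing step follows: minimize the sample criterion $R_n$ in (17) over the sieve, form $\hat{r}_n$ as in (18), and apply the correction (19). Reusing the same sieve basis for $\hat{g}$ and adding a tiny ridge term for numerical stability are assumptions of this sketch.

```python
# De-biasing: estimate g by minimizing R_n in (17) over the sieve, form r_hat_n as in (18),
# and apply the correction (19).  A tiny ridge term is added purely for the numerical
# stability of this sketch.
Da = (A == a).astype(float)                      # indicator I(A_i = a)
ProjPsi = Proj @ Psi                             # columns: E_hat{psi_m(W, A, X) | Z_i, A_i, X_i}
Q = (ProjPsi * Da[:, None]).T @ ProjPsi / n      # quadratic part of R_n in the coefficients
m_a = psi(W, np.full(n, a), X).mean(axis=0)      # linear part: (1/n) * sum_i psi(W_i, a, X_i)
beta_g = np.linalg.solve(Q + 1e-8 * np.eye(Q.shape[0]), m_a)   # minimizer of (regularized) R_n

E_g = ProjPsi @ beta_g                           # E_hat{g_hat(W, A, X) | Z_i, A_i, X_i}
e_hat = Proj @ (Y - Psi @ beta_hat)              # e_hat(Z_i, A_i, X_i, h_hat_0) from (11)
r_hat_n = np.mean(Da * E_g * e_hat)              # estimator (18)

mu_hat_a_db = mu_hat_a + r_hat_n                 # de-biased estimator (19)
print(mu_hat_a_db)
```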

Theorem 3.5.

Suppose Assumptions 8–12 hold. If $a_n = O(\lambda_n^{-1})$, $b_n \to 0$, and $n^{2/3} b_n = o(a_n)$, then $\sqrt{n}(\hat{\mu}_a^{db} - \mu_a)$ converges in distribution to $N(0, \sigma^2)$, where $\sigma^2$ is the variance of

$h_0(W, a, X) - \mu_a + I(A = a)\, E\{g_0(W, A, X) \mid Z, A, X\}\,\{Y - h_0(W, A, X)\}$  (20)

Equation (20) is the influence function of the de-biased estimator $\hat{\mu}_a^{db}$. Cui et al. (2020) consider a nonparametric model that leaves the $h$ and $q$ functions solving the Fredholm integral equations unrestricted. Within this model, and under additional assumptions that ensure uniqueness of $h$ and $q$, they derive the efficient influence function for the counterfactual mean as $h(W, a, X) - \mu_a + I(A = a)\, q(Z, a, X)\,\{Y - h(W, a, X)\}$. Thus, we have the immediate corollary:

Corollary 3.6.

The influence function (20) attains the semiparametric efficiency bound for the model considered in Cui et al. (2020), under Assumptions 1 and 4–7 above and Assumption 10 of Cui et al. (2020).
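For inference, the asymptotic variance $\sigma^2$ can be estimated by the sample variance of the estimated influence function (20). Continuing the sketch above, a plug-in Wald interval might be formed as follows; substituting $\hat{h}_0$ and $\hat{E}\{\hat{g} \mid Z, A, X\}$ for their population counterparts is an illustrative plug-in choice rather than a procedure spelled out in the paper.

```python
# Plug-in variance estimate based on the estimated influence function (20),
# and a 95% Wald confidence interval for mu_a (illustrative plug-in choices).
h_hat_obs = Psi @ beta_hat                       # h_hat_0(W_i, A_i, X_i)
infl = h_hat_at_a - mu_hat_a_db + Da * E_g * (Y - h_hat_obs)
sigma2_hat = infl.var()
half_width = 1.96 * np.sqrt(sigma2_hat / n)
print(mu_hat_a_db - half_width, mu_hat_a_db + half_width)
```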

4. Discussion

Under the proximal causal inference framework, we have proposed an estimator for the counterfactual mean. We have shown that it is consistent and have presented conditions under which it is asymptotically normal. We have also discussed the conditions under which it achieves the semiparametric efficiency bound. Note that if interest lies in the average causal effect, $E[Y(1) - Y(0)]$, the proposed methodology can be adapted by slightly adjusting Assumption 12(i).

Supplementary Material


Acknowledgements

Jeffrey Zhang was supported by NIH grant 5R01HD101415-02. Wei Li’s research was supported by the National Natural Science Foundation of China (NSFC 12101607), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China. Eric Tchetgen Tchetgen (PI) was supported by NIH Grants: R01AI27271, R01CA222147, R01AG065276, R01GM139926.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Chen X. and Pouzo D. Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321, 2012. URL http://www.jstor.org/stable/41336586.
  2. Cui Y, Pu H, Shi X, Miao W, and Tchetgen Tchetgen E. Semiparametric proximal causal inference, 2020. URL https://arxiv.org/abs/2011.08411.
  3. Deaner B. Proxy controls and panel data, 2018. URL https://arxiv.org/abs/1810.00283.
  4. D’Haultfoeuille X. A new instrumental method for dealing with endogenous selection. Journal of Econometrics, 154(1):1–15, 2010. URL https://EconPapers.repec.org/RePEc:eee:econom:v:154:y:2010:i:1:p:1-15.
  5. Ghassami A, Ying A, Shpitser I, and Tchetgen Tchetgen E. Minimax kernel machine learning for a class of doubly robust functionals with application to proximal causal inference. In Camps-Valls G, Ruiz FJR, and Valera I, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7210–7239. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/ghassami22a.html.
  6. Li W, Miao W, and Tchetgen Tchetgen E. Identification and estimation of nonignorable missing outcome mean without identifying the full data distribution, 2021. URL https://arxiv.org/abs/2110.05776.
  7. Miao W. and Tchetgen Tchetgen EJ. On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103(2):475–482, 2016. URL http://www.jstor.org/stable/43908634.
  8. Miao W, Geng Z, and Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 2018. doi: 10.1093/biomet/asy038.
  9. Santos A. Instrumental variable methods for recovering continuous linear functionals. Journal of Econometrics, 161(2):129–146, 2011. doi: 10.1016/j.jeconom.2010.11.014. URL https://www.sciencedirect.com/science/article/pii/S0304407610002253.
  10. Severini TA and Tripathi G. Efficiency bounds for estimating linear functionals of nonparametric regression models with endogenous regressors. Journal of Econometrics, 170(2):491–498, 2012. doi: 10.1016/j.jeconom.2012.05.018. URL https://www.sciencedirect.com/science/article/pii/S0304407612001303.
  11. Shi X, Miao W, and Tchetgen Tchetgen E. A selective review of negative control methods in epidemiology. Current Epidemiology Reports, 7(4):190–202, 2020.
  12. Shi X, Miao W, Nelson JC, and Tchetgen Tchetgen EJ. Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):521–540, 2020. doi: 10.1111/rssb.12361.
  13. Tchetgen Tchetgen EJ, Ying A, Cui Y, Shi X, and Miao W. An introduction to proximal causal learning, 2020. URL https://arxiv.org/abs/2009.10982.
