Abstract
We consider identification and inference about a counterfactual outcome mean when there is unmeasured confounding using tools from proximal causal inference. Proximal causal inference requires existence of solutions to at least one of two integral equations. We motivate the existence of solutions to the integral equations from proximal causal inference by demonstrating that, assuming the existence of a solution to one of the integral equations, √n-estimability of a mean functional of that solution requires the existence of a solution to the other integral equation. Solutions to the integral equations may not be unique, which complicates estimation and inference. We construct a consistent estimator for the solution set for one of the integral equations and then adapt the theory of extremum estimators to find from the estimated set a consistent estimator for a uniquely defined solution. A debiased estimator is shown to be root-n consistent, regular, and semiparametrically locally efficient under additional regularity conditions.
Keywords: Proximal Causal Inference, √n-estimability
1. Introduction
It is widely acknowledged that unmeasured confounding is pervasive in observational studies, as it is unlikely that an investigator will have measured all confounders of the treatment and outcome. Often, the best one can hope for is that some measured confounders can act as proxies for true, unmeasured confounders. Proximal causal inference was developed to circumvent the issue of unmeasured confounders through the use of suitable proxy variables. See Tchetgen Tchetgen et al. (2020), Shi et al., Miao et al. (2018), and Shi et al. (2020) for a more comprehensive overview of the framework. Proximal causal inference leverages the existence of either a treatment confounding bridge function or an outcome confounding bridge function, each of which solves a certain integral equation (Miao et al. (2018), Cui et al. (2020), Deaner (2018)). The population average treatment effect (ATE) and the average treatment effect on the treated (ATT) are then each uniquely identified nonparametrically as a linear mean functional of a confounding bridge function, even when the bridge function itself is not uniquely identified (Miao et al. (2018), Cui et al. (2020), Deaner (2018)). However, construction of a root-n consistent, regular and asymptotically linear semiparametric locally efficient estimator of the ATE or ATT in prior literature has relied exclusively on uniqueness of the bridge functions. In this note, the goal is to investigate estimation and inference of the counterfactual mean, and thus of the ATE and ATT, when uniqueness of solutions to the integral equations does not hold. Specifically, we construct a root-n consistent, regular and asymptotically linear nonparametric estimator of the ATE without requiring uniqueness of the confounding bridge functions. The proposed methods build on recent methods developed in Santos (2011) and Li et al. (2021).
In somewhat related settings, the former considers a linear functional of the structural function in (i) the widely studied nonparametric instrumental variable problem (Chen and Pouzo (2012)), while the latter considers (ii) a nonparametric shadow variable framework in a nonignorable missing data problem (D’Haultfoeuille (2010), Miao and Tchetgen Tchetgen (2016)). Proximal identification and inference differ from both of these settings in that the identification challenge requires the use of two proxies, while (i) and (ii) technically require a single proxy, namely a valid instrument in (i) and a valid shadow variable in (ii).
An outline of the paper is as follows: in Section 2, we review identification strategies for the counterfactual mean, and thus the ATE and ATT, from previous works under the proximal framework, and describe sufficient and necessary conditions for identification and root-n estimation of the ATE. In Section 3, we develop an estimator for the counterfactual mean, and therefore for the ATE and ATT, establish the asymptotic theory, and discuss its semiparametric efficiency. We conclude with a discussion in Section 4 and include proofs in the Supplementary Material. For ease of exposition, all results are given for the counterfactual outcome mean, but they apply equally to a broader class of functionals, as discussed in the Supplementary Material.
2. Identification
We wish to estimate the effect of a binary treatment $A$ on an outcome $Y$ in a setting where there is unmeasured confounding. Define $Y(a)$ for $a = 0, 1$ to be the potential outcomes of the response if treatment had been externally set to $a$. The overall goal is to estimate the average treatment effect, $E[Y(1) - Y(0)]$. Let $U$ be an unmeasured confounder and $X$ a vector of observed covariates. First, consider the following familiar assumptions:
Assumption 1.
(Consistency) $Y = Y(A)$ almost surely.
Assumption 2.
(Positivity) $0 < \Pr(A = 1 \mid X) < 1$ almost surely.
Assumption 3.
(Exchangeability) $Y(a) \perp A \mid X$ for $a = 0, 1$.
Under these assumptions, the average treatment effect is identified. However, the exchangeability assumption requires that there be no unmeasured confounders, an assumption that is often untenable in observational study settings. Instead, we adopt the recent proximal causal inference framework wherein we require there to be a treatment confounding proxy $Z$ and an outcome confounding proxy $W$. This leads to the following assumptions as introduced by Cui et al. (2020):
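Assumption 4.
(Latent Unconfoundedness) $(Z, A) \perp (Y(a), W) \mid (U, X)$ for $a = 0, 1$.
Assumption 5.
(Positivity) $0 < \Pr(A = a \mid U, X) < 1$ almost surely, $a = 0, 1$.
Assumption 6.
(Completeness 1) For any square-integrable function $g$, if $E[g(U) \mid Z, A = a, X] = 0$ almost surely, then $g(U) = 0$ almost surely.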
There are two ways to identify the counterfactual mean $E[Y(a)]$. The first method is given by the following theorem:
Theorem 2.1.
(Miao et al. (2018)) Suppose that there exists an outcome confounding bridge function $h(w, a, x)$ that solves the following integral equation
$$E[Y \mid Z, A, X] = E[h(W, A, X) \mid Z, A, X] \tag{1}$$
almost surely. Under Assumptions 1, 4, 5, and 6, one has that
$$E[Y(a)] = E[h(W, a, X)]. \tag{2}$$
Assumption 7.
(Completeness 2) For any square-integrable function $g$, if $E[g(U) \mid W, A = a, X] = 0$ almost surely, then $g(U) = 0$ almost surely.
Using this second completeness assumption, the following theorem provides an alternative identification scheme:
Theorem 2.2.
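(Cui et al. (2020)) Suppose that there exists a treatment confounding bridge function $q(z, a, x)$ that solves the integral equation:
$$E[q(Z, a, X) \mid W, A = a, X] = \frac{1}{\Pr(A = a \mid W, X)} \tag{3}$$
Then under Assumptions 1, 4, 5, and 7, one has
$$E[Y(a)] = E[I(A = a)\, q(Z, a, X)\, Y]. \tag{4}$$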
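In the above theorems, only existence of an outcome confounding bridge function or treatment confounding bridge function is required for identification of the ATE; they need not be unique. Suppose that one has observed i.i.d. data samples $\{O_i\}_{i=1}^n$ consisting of variables $O = (Y, A, Z, W, X)$. Let $L_2(V)$ denote the space of real-valued functions of $V$ that are square integrable with respect to the distribution of $V$ and use the inner product $\langle g_1, g_2 \rangle = E[g_1(V) g_2(V)]$. For any bounded linear map $K$, define $\mathcal{D}(K)$, $\mathcal{R}(K)$, $\mathcal{N}(K)$, and $K'$ to be the domain, range, null space, and adjoint of $K$. Let $S^{\perp}$ be the orthocomplement of a set $S$. Let $T: L_2(W, A, X) \to L_2(Z, A, X)$ where $T(h) = E[h(W, A, X) \mid Z, A, X]$. Let $T': L_2(Z, A, X) \to L_2(W, A, X)$ where $T'(q) = E[q(Z, A, X) \mid W, A, X]$.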
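The non-uniqueness mentioned above can be made concrete with a small numerical sketch. The example below is not from the paper: it assumes a hypothetical discrete population without covariates, with a binary unmeasured confounder, a binary treatment proxy, and a three-level outcome proxy, and every probability in it is an illustrative choice. In that setting the outcome bridge equation is a 2x3 linear system in each treatment arm, so it has infinitely many solutions, and the sketch checks that every one of them returns the same value of the mean functional, which equals the counterfactual mean computed with the confounder observed.

```python
import numpy as np

# Hypothetical discrete population without covariates X (all numbers are illustrative).
p_u = np.array([0.6, 0.4])                       # P(U=u)
p_w_u = np.array([[0.5, 0.3, 0.2],               # P(W=w | U=u), rows indexed by u
                  [0.1, 0.3, 0.6]])
p_z_u = np.array([[0.7, 0.3],                    # P(Z=z | U=u)
                  [0.2, 0.8]])
p_a1_uz = np.array([[0.3, 0.5],                  # P(A=1 | U=u, Z=z)
                    [0.6, 0.8]])
m = np.array([[[1.0, 1.5, 2.0], [2.0, 2.5, 3.0]],   # E[Y | A=a, U=u, W=w], indexed [a, u, w]
              [[2.0, 3.0, 4.0], [4.0, 4.5, 6.0]]])


def observed_system(a):
    """Return M[z, w] = P(W=w | Z=z, A=a) and b[z] = E[Y | Z=z, A=a]."""
    p_a_uz = p_a1_uz if a == 1 else 1.0 - p_a1_uz
    joint_uz = p_u[:, None] * p_z_u * p_a_uz          # P(U=u, Z=z, A=a)
    p_u_z = joint_uz / joint_uz.sum(axis=0)           # P(U=u | Z=z, A=a)
    M = p_u_z.T @ p_w_u                               # uses W independent of (Z, A) given U
    b = p_u_z.T @ (m[a] * p_w_u).sum(axis=1)          # uses Y independent of Z given (U, A, W)
    return M, b


def bridge_solution(a, t):
    """One solution of M h = b: the minimum-norm solution shifted along the null space."""
    M, b = observed_system(a)
    null_direction = np.linalg.svd(M)[2][-1]          # M is 2x3 with rank 2
    return np.linalg.pinv(M) @ b + t * null_direction


p_w = p_u @ p_w_u                                     # P(W=w)
for a in (0, 1):
    truth = float(p_u @ (m[a] * p_w_u).sum(axis=1))   # E[Y(a)] computed with U observed
    for t in (0.0, 1.0, -2.5):
        h = bridge_solution(a, t)
        print(a, t, np.round(h, 3), round(float(p_w @ h), 6), round(truth, 6))
```

The bridge functions printed for different values of `t` differ, while the last two columns agree in every row, which is the sense in which the mean functional is identified without the bridge function being unique.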
Before proceeding, we provide a purely statistical motivation for the integral equations 1 and 3. The following discussion therefore requires no reference to an unmeasured confounder and makes neither assumption 2.1 nor 2.2. Consider the following two scenarios:
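Suppose (1) holds, i.e. there exists a function $h$ such that $E[Y \mid Z, A, X] = E[h(W, A, X) \mid Z, A, X]$.
Suppose (3) holds, i.e. there exists a function $q$ such that $E[q(Z, a, X) \mid W, A = a, X] = 1 / \Pr(A = a \mid W, X)$.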
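Under the first scenario, consider the problem of estimating a functional of the following form:
$$\psi = E[\phi(W, A, X)\, h(W, A, X)] \tag{5}$$
where $\phi$ is a known function in $L_2(W, A, X)$. Let $T: L_2(W, A, X) \to L_2(Z, A, X)$ where $T(h) = E[h(W, A, X) \mid Z, A, X]$. Then we have the following: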
Proposition 2.3.
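Under the assumption that 1 holds, $\psi$ is identified iff $\phi \in \mathcal{N}(T)^{\perp}$.
Proof. First, suppose $\psi$ is identified. Consider $h_1, h_2$ that satisfy Equation 1. Note that this implies that $h_1 - h_2 \in \mathcal{N}(T)$. Since $\psi$ is identified, both $h_1$ and $h_2$ yield the same value of $\psi$. Thus, we have that
$$E[\phi(W, A, X)\{h_1(W, A, X) - h_2(W, A, X)\}] = 0,$$
and so $\phi \in \mathcal{N}(T)^{\perp}$ since $h_1 - h_2$ is an arbitrary element of $\mathcal{N}(T)$. Conversely, suppose $\phi \in \mathcal{N}(T)^{\perp}$. Then for any $h_1$ and $h_2$ that satisfy Equation 1, we have $h_1 - h_2 \in \mathcal{N}(T)$ and so $E[\phi(W, A, X)\{h_1(W, A, X) - h_2(W, A, X)\}] = 0$ and so $\psi$ is identified. □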
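A finite-dimensional analogue may help fix ideas about the proposition just proved. The sketch below is an illustration under simplifying assumptions, not the paper's setting: the conditional expectation operator is replaced by a hypothetical 2x3 matrix `T`, functions are vectors in three dimensions, and the inner product is the plain Euclidean one rather than an expectation. A linear functional of a solution of `T h = b` takes the same value at every solution exactly when its weight vector is orthogonal to the null space of `T`.

```python
import numpy as np

# Hypothetical finite-dimensional stand-in for the operator (illustrative values).
T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

h_particular = np.linalg.pinv(T) @ b          # one solution of T h = b
null_vec = np.array([1.0, 1.0, -1.0])         # spans the null space: T @ null_vec = 0


def orthogonal_to_null_space(phi, T, tol=1e-10):
    """phi is orthogonal to N(T) iff projecting it onto the row space of T leaves it unchanged."""
    row_space_projector = np.linalg.pinv(T) @ T
    return bool(np.allclose(row_space_projector @ phi, phi, atol=tol))


def functional_values(phi, shifts=(-1.0, 0.0, 2.0)):
    """Evaluate the linear functional phi . h at several solutions of T h = b."""
    return [float(phi @ (h_particular + t * null_vec)) for t in shifts]


phi_identified = T.T @ np.array([0.5, -1.0])  # lies in the range of the transpose of T
phi_not_identified = np.array([1.0, 0.0, 0.0])

print(orthogonal_to_null_space(phi_identified, T), functional_values(phi_identified))
print(orthogonal_to_null_space(phi_not_identified, T), functional_values(phi_not_identified))
```

In finite dimensions the range of the transpose is closed, so orthogonality to the null space and membership in that range coincide; the distinction between the two only matters in the infinite-dimensional setting of the paper.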
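Note that $\mathcal{N}(T)^{\perp} = \overline{\mathcal{R}(T')}$, the closure of the range of the adjoint $T'$. However, the following lemma establishes that $\phi \in \mathcal{R}(T')$ is necessary for $\psi$ to be root-n estimable.
Lemma 2.4.
Assuming equation 1 holds and additional regularity conditions described in the Supplementary Material, $\phi \in \mathcal{R}(T')$ is necessary for $\psi$ to be root-n estimable.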
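This result is analogous to a result derived in Severini and Tripathi (2012) in the nonparametric instrumental variables context. Next, note that
$$E[h(W, a, X)] = E\left[\frac{I(A = a)}{\Pr(A = a \mid W, X)}\, h(W, A, X)\right],$$
which is in the form of equation 5 with $\phi(W, A, X) = I(A = a) / \Pr(A = a \mid W, X)$, which for current purposes may be assumed known. Lemma 2.4 thus implies that for $E[h(W, a, X)]$ to be root-n estimable, there must be a function $q$ that satisfies
$$E[q(Z, A, X) \mid W, A, X] = \frac{I(A = a)}{\Pr(A = a \mid W, X)}.$$
This corresponds to the condition from Equation 3. Likewise, consider the problem of estimating a functional of the following form: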
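$$\tilde{\psi} = E[\tilde{\phi}(Z, A, X)\, q(Z, A, X)] \tag{6}$$
where $\tilde{\phi}$ is a known function in $L_2(Z, A, X)$. Write $T': L_2(Z, A, X) \to L_2(W, A, X)$ where $T'(q) = E[q(Z, A, X) \mid W, A, X]$. Analogous to Proposition 2.3, we have the following: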
Proposition 2.5.
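Under the assumption that 3 holds, $\tilde{\psi}$ is identified iff $\tilde{\phi} \in \mathcal{N}(T')^{\perp}$.
Proof. Note that for any $q_1$ and $q_2$ that satisfy 3, we must have $q_1 - q_2 \in \mathcal{N}(T')$. Then the argument follows in the exact same manner as in Proposition 2.3. □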
As above, it is possible to establish that $\tilde{\phi} \in \mathcal{R}(T)$ is necessary for $\tilde{\psi}$ to be root-n estimable.
Lemma 2.6.
Assuming equation 3 holds and additional regularity conditions described in the Supplementary Material, $\tilde{\phi} \in \mathcal{R}(T)$ is necessary for $\tilde{\psi}$ to be root-n estimable.
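Next, observe that from Equation 4, we have
$$E[I(A = a)\, q(Z, a, X)\, Y] = E\left[I(A = a)\, E[Y \mid Z, A, X]\, q(Z, A, X)\right],$$
which is in the form of equation 6 with $\tilde{\phi}(Z, A, X) = I(A = a)\, E[Y \mid Z, A, X]$, which for current purposes may be assumed known. Lemma 2.6 thus implies that for $E[I(A = a)\, q(Z, a, X)\, Y]$ to be root-n estimable, there must be a function $h$ such that
$$E[h(W, A, X) \mid Z, A, X] = I(A = a)\, E[Y \mid Z, A, X].$$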
This corresponds to the condition from Equation 1. We may conclude that, taking as a primitive condition that a solution to Equation 1 exists everywhere in the model, i.e. at all laws included in the semiparametric model, identification and root-n estimation of the counterfactual outcome mean necessarily imply that a solution to Equation 3 exists at the true data generating law. On the other hand, taking as a primitive condition that a solution to Equation 3 exists at all laws of the semiparametric model, identification and root-n estimation of the counterfactual outcome mean necessarily imply that a solution to Equation 1 exists at the true data generating law.
The present setting differs from the shadow variable missing data setting studied in Li et al. (2021) in ways worth discussing. In the current setting, we aim to account for the presence of an unmeasured confounder, and the key identifying assumption, Assumption 4, involves this latent variable together with two fully observed auxiliary factors in the form of a pair of proxies $Z$ and $W$, each of which plays a specific role. In contrast, identification in a shadow variable setting does not require invoking a latent factor, and requires only a single fully observed auxiliary variable which satisfies a certain conditional independence condition (Li et al. (2021)). Despite these differences, our paper demonstrates that the analytic framework of Li et al. (2021) readily extends to the proximal causal inference setting. We further establish in the Supplementary Material that the approach applies to a general class of doubly robust functionals studied by Ghassami et al. (2022), for which a pair of nuisance functions is defined as solutions to Fredholm integral equations. The above propositions give some motivation for assuming the existence of solutions to the Fredholm integral equations in Theorems 2.1 and 2.2.
In the next section, we describe an estimation strategy for the counterfactual mean without the assumption of a unique $h$ or $q$ that solves the integral equations.
3. Estimation Strategy
We follow estimation strategies from Santos (2011) and Li et al. (2021). By the above discussion, it is sensible to construct solution sets for either of the Fredholm integral equations 1 and 3. However, estimating the solution set of the latter requires an estimate of the propensity score. Thus, we consider estimating the solution set of equation 1. First, let $\mathcal{H}$ be a set of smooth functions. Define the solution set of the Fredholm integral equation as follows:
$$\mathcal{H}_0 = \{h \in \mathcal{H} : E[Y - h(W, A, X) \mid Z, A, X] = 0\} \tag{7}$$
Under the assumptions from Theorem 2.1 and Theorem 2.2, $E[h(W, a, X)]$ with $h \in \mathcal{H}_0$ has a causal interpretation as the counterfactual mean $E[Y(a)]$. Under these assumptions, to estimate $E[Y(a)]$, we first construct a consistent estimator $\hat{\mathcal{H}}_0$ for the set $\mathcal{H}_0$; next, we choose a specific $\hat{h}_0 \in \hat{\mathcal{H}}_0$ so that it is a consistent estimator for a fixed element $h_0 \in \mathcal{H}_0$.
3.1. Estimation of solution sets
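Define the criterion function
$$Q(h) = E\left[\{E[Y - h(W, A, X) \mid Z, A, X]\}^2\right].$$
In practice, the estimation of the solution set can be done in the two arms separately, for example, by taking the criterion function $Q_a(h) = E\left[I(A = a)\{E[Y - h(W, a, X) \mid Z, A = a, X]\}^2\right]$. Note that
$$\mathcal{H}_0 = \{h \in \mathcal{H} : Q(h) = 0\},$$
i.e., the solution set of the Fredholm integral equation consists of the zeros of the criterion function. To proceed with estimation, we adopt a two-stage approach. We aim to construct a sample analogue $Q_n$ of the criterion function $Q$. We let $\mathcal{H}_n$ be a sieve for $\mathcal{H}$. Specifically, for a known sequence of approximating functions $\{r_j(w, a, x)\}_{j=1}^{\infty}$, let
$$\mathcal{H}_n = \Big\{h \in \mathcal{H} : h(w, a, x) = \sum_{j=1}^{j_n} \beta_j r_j(w, a, x)\Big\} \tag{8}$$
for unknown $\beta_j$ and known $j_n$. To construct $Q_n$, we require a nonparametric estimator of conditional expectations. For this, let $\{p_k(z, a, x)\}_{k=1}^{\infty}$ be a known sequence of approximating functions. Denote
$$p^{k_n}(z, a, x) = (p_1(z, a, x), \ldots, p_{k_n}(z, a, x))^{\top}$$
and let
$$\mathbf{P} = (p^{k_n}(Z_1, A_1, X_1), \ldots, p^{k_n}(Z_n, A_n, X_n))^{\top}.$$
For a generic random variable $B$ with realizations $\{B_i\}_{i=1}^{n}$, the nonparametric sieve estimator of $E[B \mid Z, A, X]$ is
$$\hat{E}[B \mid Z, A, X] = p^{k_n}(Z, A, X)^{\top} (\mathbf{P}^{\top}\mathbf{P})^{-1} \sum_{i=1}^{n} p^{k_n}(Z_i, A_i, X_i) B_i. \tag{9}$$
The sample analogue $Q_n$ is then
$$Q_n(h) = \frac{1}{n} \sum_{i=1}^{n} \hat{e}(Z_i, A_i, X_i; h)^2 \tag{10}$$
where
$$\hat{e}(Z, A, X; h) = \hat{E}[Y - h(W, A, X) \mid Z, A, X]. \tag{11}$$
Then the proposed estimator of $\mathcal{H}_0$ is $\hat{\mathcal{H}}_0 = \{h \in \mathcal{H}_n : Q_n(h) \le c_n\}$, where $c_n$ is an appropriately chosen sequence that tends to 0.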
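The two-stage construction in this subsection can be sketched in a few lines. The code below is a minimal illustration under assumptions that are not in the paper: a single treatment arm with no covariates, a simulated linear Gaussian model, polynomial approximating functions for both the sieve and the conditional expectation estimator, and a hypothetical threshold `c_n = log(n) / sqrt(n)`. It forms the series least-squares fitted values, evaluates the sample criterion as the average squared fitted conditional mean of the residual, and checks whether a candidate coefficient vector falls in the estimated solution set.

```python
import numpy as np


def poly_basis(v, degree):
    """Columns 1, v, ..., v**degree (an illustrative choice of approximating functions)."""
    return np.vander(v, degree + 1, increasing=True)


def series_projection(Phi):
    """n x n matrix mapping realizations B_i to series least-squares fitted values."""
    return Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T


def criterion(beta, Y, Psi, P):
    """Sample criterion: mean squared fitted conditional mean of Y - h(W), with h = Psi @ beta."""
    fitted_residual = P @ (Y - Psi @ beta)
    return float(np.mean(fitted_residual ** 2))


# Simulated data (hypothetical model): Z and W are noisy proxies of U.
rng = np.random.default_rng(0)
n = 500
U = rng.normal(size=n)
Z = U + rng.normal(size=n)
W = U + rng.normal(size=n)
Y = 1.0 + 2.0 * U + rng.normal(size=n)   # here h(w) = 1 + 2w satisfies E[Y - h(W) | Z] = 0

Phi = poly_basis(Z, 3)                   # approximating functions of the treatment proxy
Psi = poly_basis(W, 2)                   # sieve for the bridge function
P = series_projection(Phi)
c_n = np.log(n) / np.sqrt(n)             # hypothetical threshold sequence tending to 0

beta_true = np.array([1.0, 2.0, 0.0])    # coefficients of h(w) = 1 + 2w in the sieve
beta_wrong = np.zeros(3)                 # h = 0 does not solve the equation

for name, beta in (("solution", beta_true), ("non-solution", beta_wrong)):
    value = criterion(beta, Y, Psi, P)
    print(name, round(value, 4), "in estimated set:", value <= c_n)
```

The threshold rate used here is only a placeholder for the sketch; the rates required for set consistency are those stated in the assumptions and propositions of the paper.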
3.2. Set consistency
In this section, we establish the set consistency of for under Hausdorff distances. For this, define the Hausdorff distance between two sets as
where and is a given norm. Consider the following two norms:
Notice that any satisfy 1, so it holds that for any and any and so
(12) |
Thus, we can calculate the convergence rate of by finding the convergence rate of . We will need to consistently estimate under the supremum norm because under the norm, elements of form an equivalence class. We will require several assumptions.
Assumption 8.
The vector of covariates has support , and outcome and proxies have compact support.
Definition 1.
For a generic function defined on we define
where is a d-dimensional vector of nonnegative integers, denotes the largest integer smaller than , and .
Assumption 9.
The following conditions hold:
for some and are closed.
for every , there is a such that for some .
Assumption 10.
The following conditions hold:
The smallest and largest eigenvalues of are bounded above and away from zero for all
- for every , there is a such that
, where
Then we have the following proposition:
Proposition 3.1.
Suppose that Assumptions 8-10 hold. If , and , then
After obtaining a consistent estimator of , we select a specific estimator from such that it converges to a unique element in .
3.3. A representer-based estimator
To do so, we let be a population criterion function that attains a unique minimum on and its sample analogue. We then select
(13) |
We make the following assumption about :
Assumption 11.
The function set is convex; the functional is strictly convex and attains a unique minimum at on ; its sample analogue is continuous and .
A sensible choice for is the squared length with corresponding sample analog .
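The role of the squared length as a selection rule can be illustrated in a discrete population setting. The sketch below makes assumptions beyond the text: it works at the population level with an exact constraint rather than the estimated set, uses a hypothetical three-level outcome proxy and a two-level treatment proxy in a single arm, and takes the squared length to be the second moment of the bridge function under the marginal distribution of the outcome proxy. Under these assumptions the minimizer has a closed form from the Lagrangian of an equality-constrained quadratic problem, and every other solution has strictly larger squared length.

```python
import numpy as np


def min_weighted_norm_solution(M, b, weights):
    """Minimize sum_w weights[w] * h[w]**2 subject to M @ h = b (M of full row rank)."""
    D_inv = np.diag(1.0 / weights)
    multipliers = np.linalg.solve(M @ D_inv @ M.T, b)
    return D_inv @ M.T @ multipliers


# Hypothetical population quantities in one treatment arm (illustrative values).
M = np.array([[0.4, 0.3, 0.3],       # P(W=w | Z=z), rows indexed by z
              [0.2, 0.3, 0.5]])
b = np.array([2.0, 3.0])             # E[Y | Z=z]
p_w = np.array([0.3, 0.3, 0.4])      # P(W=w)


def squared_length(h):
    """Second moment of h(W) under the marginal distribution of W."""
    return float(p_w @ h ** 2)


h_star = min_weighted_norm_solution(M, b, p_w)
null_direction = np.cross(M[0], M[1])    # solutions differ along this direction

for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    h = h_star + t * null_direction
    print(t, np.allclose(M @ h, b), round(squared_length(h), 6))
```

Because the squared length is strictly convex and the solution set is convex, the selected element is unique even though the integral equation has many solutions.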
Proposition 3.2.
Suppose that assumptions 8-11 hold. Then
where is defined in equation 13. If and , we then have
Based on the identification condition , we propose the following estimator for :
(14) |
To facilitate analysis of the asymptotic properties of the estimator, let be the closure of the linear span of under , and define
Next, we require the following representer assumption:
Assumption 12.
The following conditions hold:
- there exists a function such that
for all , and .
Observe that any that satisfies Assumption 12(i) satisfies the following for all :
(15) |
Then suppose that
Using 15, this implies that
In other words, solves the integral equation from 3. Thus, Assumption 12(i) can be viewed as a strengthening of the assumption that there exists a function that solves Equation 3. As in Santos (2011), the function will be unique up to equivalence class. Now we can address the asymptotic expansion of our estimator:
Theorem 3.3.
Suppose that assumptions 8-12 hold. Then we have that
where
(16) |
To get an asymptotically normal estimator, we may de-bias the estimator, which requires estimating the term , which may not be asymptotically negligible.
3.4. A de-biased estimator
First, we define a new criterion function
and the sample analog
Observe that since , we have that . It follows that is the unique minimizer of the mapping . Then since is close to by assumption 9(ii), we can estimate the term by
(17) |
With this estimate, we can construct the following estimator for as
(18) |
Next, we have a lemma characterizing the convergence of to .
Lemma 3.4.
Suppose that assumptions 8-10 and 12 hold. Then
Using this lemma, we can construct the following debiased estimator that is asymptotically normal:
(19) |
Theorem 3.5.
Suppose assumptions 8 hold. If an , and , then converges in distribution to , where is the variance of
(20) |
Equation 20 is the influence function for the de-biased estimator. Cui et al. (2020) consider a nonparametric model that leaves the $h$ and $q$ functions that solve the Fredholm integral equations unrestricted. With this model and under additional assumptions that ensure uniqueness of $h$ and $q$, they derive the efficient influence function for the counterfactual mean as $I(A = a)\, q(Z, a, X)\{Y - h(W, a, X)\} + h(W, a, X) - E[Y(a)]$. Thus, we have the immediate corollary:
Corollary 3.6.
The influence function 20 attains the semiparametric efficiency bound under the model considered in Cui et al. (2020), under Assumptions 1 and 4-7 together with Assumption 10 of Cui et al. (2020).
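The double robustness behind this efficiency statement can be checked numerically. The sketch below is an illustration, not the paper's estimator: it assumes the efficient influence function has the form $I(A = a)\, q(Z, a, X)\{Y - h(W, a, X)\} + h(W, a, X) - E[Y(a)]$ reported in Cui et al. (2020), and it uses a hypothetical discrete population without covariates in which both bridge functions can be computed by solving small linear systems. All probabilities are illustrative. The population mean of the uncentered influence function equals the counterfactual mean when either bridge function is replaced by an arbitrary function, but not when both are.

```python
import numpy as np

# Hypothetical discrete population without covariates (all numbers are illustrative).
p_u = np.array([0.6, 0.4])                       # P(U=u)
p_w_u = np.array([[0.5, 0.3, 0.2],               # P(W=w | U=u)
                  [0.1, 0.3, 0.6]])
p_z_u = np.array([[0.7, 0.3],                    # P(Z=z | U=u)
                  [0.2, 0.8]])
p_a1_uz = np.array([[0.3, 0.5],                  # P(A=1 | U=u, Z=z)
                    [0.6, 0.8]])
m = np.array([[[1.0, 1.5, 2.0], [2.0, 2.5, 3.0]],   # E[Y | A=a, U=u, W=w], indexed [a, u, w]
              [[2.0, 3.0, 4.0], [4.0, 4.5, 6.0]]])
a = 1
p_a_uz = p_a1_uz if a == 1 else 1.0 - p_a1_uz

# P(U=u, Z=z, W=w, A=a), indexed [u, z, w], using (Z, A) independent of W given U.
joint = p_u[:, None, None] * p_z_u[:, :, None] * p_w_u[:, None, :] * p_a_uz[:, :, None]
p_zw = joint.sum(axis=0)                          # P(Z=z, W=w, A=a)
p_w = p_u @ p_w_u                                 # P(W=w)

# Outcome bridge: sum_w h(w) P(W=w | Z=z, A=a) = E[Y | Z=z, A=a].
M_h = p_zw / p_zw.sum(axis=1, keepdims=True)
b_h = (joint * m[a][:, None, :]).sum(axis=(0, 2)) / p_zw.sum(axis=1)
h = np.linalg.pinv(M_h) @ b_h

# Treatment bridge: sum_z q(z) P(Z=z | W=w, A=a) = 1 / P(A=a | W=w).
M_q = (p_zw / p_zw.sum(axis=0, keepdims=True)).T
b_q = p_w / p_zw.sum(axis=0)
q = np.linalg.lstsq(M_q, b_q, rcond=None)[0]


def dr_mean(h_used, q_used):
    """Population mean of I(A=a) q(Z) {Y - h(W)} + h(W)."""
    residual = m[a][:, None, :] - h_used[None, None, :]
    weighted = (joint * q_used[None, :, None] * residual).sum()
    return float(weighted + p_w @ h_used)


truth = float(p_u @ (m[a] * p_w_u).sum(axis=1))   # E[Y(a)] computed with U observed
h_wrong, q_wrong = np.zeros(3), np.ones(2)
print(truth, dr_mean(h, q), dr_mean(h_wrong, q), dr_mean(h, q_wrong), dr_mean(h_wrong, q_wrong))
```

The first four printed values coincide and the last does not, which is the population counterpart of the doubly robust structure of the influence function.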
4. Discussion
Under the proximal causal inference framework, we have established an estimator for the counterfactual mean. We have shown that it is consistent and presented conditions under which it is asymptotically normal. We have also discussed the conditions under which it achieves the semiparametric efficiency bound. Note that if interest is in the average causal effect, $E[Y(1) - Y(0)]$, the proposed methodology can be adapted by slightly adjusting Assumption 12(i).
Supplementary Material
Acknowledgements
Jeffrey Zhang was supported by NIH grant 5R01HD101415-02. Wei Li’s research was supported by the National Natural Science Foundation of China (NSFC 12101607), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China. Eric Tchetgen Tchetgen (PI) was supported by NIH Grants: R01AI27271, R01CA222147, R01AG065276, R01GM139926.
References
- Chen X and Pouzo D. Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321, 2012. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/41336586.
- Cui Y, Pu H, Shi X, Miao W, and Tchetgen Tchetgen E. Semiparametric proximal causal inference, 2020. URL https://arxiv.org/abs/2011.08411.
- Deaner B. Proxy controls and panel data, 2018. URL https://arxiv.org/abs/1810.00283.
- D’Haultfoeuille X. A new instrumental method for dealing with endogenous selection. Journal of Econometrics, 154(1):1–15, 2010. URL https://EconPapers.repec.org/RePEc:eee:econom:v:154:y:2010:i:1:p:1-15.
- Ghassami A, Ying A, Shpitser I, and Tchetgen Tchetgen E. Minimax kernel machine learning for a class of doubly robust functionals with application to proximal causal inference. In Camps-Valls G, Ruiz FJR, and Valera I, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7210–7239. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/ghassami22a.html.
- Li W, Miao W, and Tchetgen Tchetgen E. Identification and estimation of nonignorable missing outcome mean without identifying the full data distribution, 2021. URL https://arxiv.org/abs/2110.05776.
- Miao W and Tchetgen Tchetgen EJ. On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103(2):475–482, 2016. ISSN 00063444. URL http://www.jstor.org/stable/43908634.
- Miao W, Geng Z, and Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 08 2018. ISSN 0006-3444. doi: 10.1093/biomet/asy038.
- Santos A. Instrumental variable methods for recovering continuous linear functionals. Journal of Econometrics, 161(2):129–146, 2011. ISSN 0304-4076. doi: 10.1016/j.jeconom.2010.11.014. URL https://www.sciencedirect.com/science/article/pii/S0304407610002253.
- Severini TA and Tripathi G. Efficiency bounds for estimating linear functionals of nonparametric regression models with endogenous regressors. Journal of Econometrics, 170(2):491–498, 2012. ISSN 0304-4076. doi: 10.1016/j.jeconom.2012.05.018. URL https://www.sciencedirect.com/science/article/pii/S0304407612001303. Thirtieth Anniversary of Generalized Method of Moments.
- Shi X, Miao W, and Tchetgen Tchetgen E. A selective review of negative control methods in epidemiology. Current Epidemiology Reports, 7(4):190–202.
- Shi X, Miao W, Nelson JC, and Tchetgen Tchetgen EJ. Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):521–540, 2020. doi: 10.1111/rssb.12361. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12361.
- Tchetgen Tchetgen EJ, Ying A, Cui Y, Shi X, and Miao W. An introduction to proximal causal learning, 2020. URL https://arxiv.org/abs/2009.10982.