Abstract
Causal inference relies on two fundamental assumptions: ignorability and positivity. We study causal inference when the true confounder value can be expressed as a function of the observed data; we call this setting estimation with functional confounders (EFC). In this setting, ignorability is satisfied but positivity is violated, and causal inference is impossible in general. We consider two scenarios where causal effects are estimable. First, we discuss interventions on a part of the treatment called functional interventions and a sufficient condition for effect estimation of these interventions called functional positivity. Second, we develop conditions for nonparametric effect estimation based on the gradient fields of the functional confounder and the true outcome function. To estimate effects under these conditions, we develop Level-set Orthogonal Descent Estimation (LODE). Further, we prove error bounds on LODE’s effect estimates, evaluate our methods on simulated and real data, and empirically demonstrate the value of EFC.
1. Introduction
Determining the effect of interventions on outcomes using observational data lies at the core of many fields like medicine, economic policy, and genomics. For example, policy makers estimate effects to decide whether to invest in education or job training programs. In medicine, doctors use effects to design optimal treatment strategies for patients. Geneticists perform genome-wide association studies (GWAS) to relate genotypes and phenotypes. In observational data, there could exist unobserved variables that affect both the intervention and the outcome, called confounders. A necessary condition for the causal effect to be identified is that all confounders are observed; this condition is called ignorability. If ignorability holds, a sufficient condition for causal effect estimation is adequate variation in the intervention after conditioning on the confounders; this condition is called positivity.
A priori, the data does not differentiate between confounders and interventions. Practitioners select the interventions of interest from all pre-outcome variables (variables that occur before the outcome). Then, assuming knowledge of the data generating mechanism, practitioners can label certain variables amongst the remaining pre-outcome variables as confounders. This corresponds to indexing into the set of pre-outcome variables.
In certain problems the confounders are specified as a function of the pre-outcome variables that does not simply index into the set of pre-outcome variables. For a concrete example, consider GWAS. The goal in GWAS is to estimate the influence of genetic variations on phenotypes like disease risk. In GWAS, population and family structures both give rise to certain genetic variations and affect phenotypes, and are therefore confounders [4]. Practitioners specify these confounders by using the genetic similarity between individuals [15, 19, 31], which is a function of the genetic variations. When the confounders are a function of the same pre-outcome variables that define the interventions, positivity is violated. Then, the class of interventions whose effects are estimable is not well-defined.
We study causal effect estimation in such settings, where a function of the pre-outcome variables provides the confounder and these same pre-outcome variables define the intervention. We call this estimation with functional confounders (EFC). In EFC, one column in the observed data is the outcome and all others are pre-outcome variables. We assume access to a function h(·) that takes as input the pre-outcome variables and returns the value of the confounder. Further, we assume these confounders give us ignorability. In settings like GWAS, the function h reflects the practitioner-specified function that captures the genetic variation influenced by the population structure. In traditional observational causal inference (OBS-CI), h(·) reflects the selection of certain variables in the data and labelling them as confounders. In EFC, two different values of the confounder are never observed for the same setting of the pre-outcome variables. This means that positivity is violated and the effects of only certain interventions may be estimable.
We address this issue in two ways. First, we investigate a class of plausible interventions that are functions of the observed pre-outcome variables, called functional interventions. We develop a sufficient condition to estimate the effects of said functional interventions, called functional positivity (F-POSITIVITY). Second, we consider intervening on all pre-outcome variables, called the full intervention. We develop a sufficient condition to estimate the effect of the full intervention, called causal redundancy (C-REDUNDANCY). For an intervention, given a confounder value, C-REDUNDANCY allows us to compute a surrogate intervention such that the conditional effect of the surrogate is equal to that of the original intervention. We also show that such surrogate interventions exist only under a certain condition that we call Effect Connectivity, which is necessary for nonparametric effect estimation in EFC. This condition is satisfied by default in traditional OBS-CI if ignorability and positivity hold. Then, we develop an algorithm for causal estimation assuming C-REDUNDANCY, called Level-set Orthogonal Descent Estimation (LODE), which estimates effects using surrogate interventions. If the surrogate is not estimated well, LODE’s estimates are biased. We establish bounds on this bias that capture the mitigating effect of the smoothness of the true outcome function.
Related work
The problem of genome-wide association studies (GWAS) is to estimate the effect of genetic variations (also called single nucleotide polymorphisms, or SNPs) on the phenotype [29]. The ancestry of the subjects acts as a confounder in GWAS. In GWAS practice, principal component analysis (PCA) and linear mixed models (LMMs) are used to compute this confounding structure [19, 31]. Lippert et al. [15] suggest estimating the confounders and effects on separate subsets of the SNPs. This separation disregards the confounding that is captured in the interaction of the two subsets of SNPs. GWAS is a special case of effects from multiple treatments (MTE) where the confounder value is specified via optimization as a function of the pre-outcome variables [20, 30]. In all these settings, positivity is violated and not all effects are estimable. We provide an avenue for nonparametric effect estimation of the full intervention under a new condition, C-REDUNDANCY.
Traditional observational causal inference (OBS-CI) review
We set up causal inference with Structural Causal Models [17] and use do(t = t*) to denote making an intervention. Let t be a vector of the interventions, z be the confounder, and y be the outcome. Let η ~ p(η), with η ⫫ (z, t), be noise. With f as the outcome function, we define the causal model for traditional OBS-CI as:

$$z \sim p(z), \qquad t \sim p(t \mid z), \qquad y = f(t, z, \eta).$$
Let p(y, z, t) denote the joint distribution implied by this data generating process. The effects of interest under the full intervention do(t = t*) are the average and conditional effect
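$$\tau(t^*) = \mathbb{E}[y \mid do(t = t^*)] = \mathbb{E}_{z,\eta}\big[f(t^*, z, \eta)\big], \qquad \phi(t^*, z) = \mathbb{E}[y \mid do(t = t^*), z] = \mathbb{E}_{\eta}\big[f(t^*, z, \eta)\big]. \tag{1}$$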
With observed confounders, two assumptions make causal estimation possible: ignorability and positivity. Ignorability means that all confounders z are observed in the data. Conditional on all the confounders, the outcome under an intervention is distributed as the outcome conditional on the value of the intervention: p(y = y1 | do(t = t*), z = z) = p(f(t*, z, η) = y1) = p(y = y1 | t = t*, z = z). This allows the average effect to be expressed as an expectation over the observed outcomes, τ(t*) = E_z E[y | z, t = t*]. The conditional expectation exists for all t* only if p(y | z, t = t*) = p(y, z, t = t*) / (p(z) p(t = t* | z)) exists. Positivity guarantees this existence:

$$p(z = z) > 0 \implies p(t = t^* \mid z = z) > 0 \quad \text{for all } t^* \in \mathrm{supp}(t). \tag{2}$$
2. Estimation with functional confounders
In traditional OBS-CI, causal estimation relies on knowing the confounders. In this section, we consider settings where confounders are known via a function of the pre-outcome variables, h(t) = z. We call this setting estimation with functional confounders (EFC). An example of this is GWAS, where SNPs (the pre-outcome variables) are used to estimate the confounding population structure through methods like PCA [31]. Assuming the confounders are a function of the pre-outcome variables violates positivity in general. Positivity is violated in this setting because, for any t* and any confounder value z ≠ h(t*) with p(z = z) > 0,

$$p(t = t^* \mid z = z) = 0.$$
In words, two different confounder values cannot occur for the same t. A positivity violation precludes nonparametric effect estimation of the full intervention do(t = t*).
Positivity and Regression Identifiability
Positivity can be viewed as providing identifiability. To see this, let the confounder be z = h(t) and the outcome be y(t, z, η) = z + h(t). Now consider regressing y on z and t. Every function y = αz + βh(t), indexed by α, β with α + β = 2, is consistent with the observed data. Thus, there exist infinitely many solutions for the conditional expectation of y given (t, z), meaning that the regression is not identifiable. Positivity ensures sufficient randomness to identify the regression and thus the causal effect. A violation of positivity means that nonparametric estimation of causal effects needs further assumptions.
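The non-identifiability argument can be checked numerically. The following is a minimal sketch, assuming a scalar t and the illustrative choice h(t) = sign(t); the helper names `max_residual` and `predicted` are hypothetical and not part of any method in this paper.

```python
import random


def h(t):
    # Assumed functional confounder for this sketch: the sign of a scalar t.
    return 1.0 if t >= 0 else -1.0


def max_residual(alpha, beta, data):
    # Largest absolute gap between y and the candidate fit alpha*z + beta*h(t).
    return max(abs(y - (alpha * z + beta * h(t))) for t, z, y in data)


def predicted(alpha, beta, t_star, z):
    # Outcome the candidate fit predicts at an unobserved pair (t_star, z).
    return alpha * z + beta * h(t_star)


random.seed(0)
data = []
for _ in range(1000):
    t = random.gauss(0.0, 1.0)
    z = h(t)        # the confounder is a deterministic function of t
    y = z + h(t)    # outcome from the text: y = z + h(t)
    data.append((t, z, y))

# Every (alpha, beta) with alpha + beta = 2 fits the observed data exactly.
candidates = [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0), (-3.0, 5.0)]
residuals = [max_residual(a, b, data) for a, b in candidates]

# The same candidates disagree at a pair with z != h(t_star), which is never observed.
counterfactuals = [predicted(a, b, t_star=1.0, z=-1.0) for a, b in candidates]

print(residuals)
print(counterfactuals)
```

All candidates have zero residual on the observed data, yet they give different predictions at (t* = 1, z = −1), a combination that never occurs because z = h(t). This is the sense in which the regression is not identified without positivity.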
2.1. Setup for EFC
In EFC, the confounder is provided as a non-bijective function h of the pre-outcome variables t. To reflect this property, we use h(t) to denote the confounder. As an illustrative example, let [mi] be the Gamma distribution and consider z ∈ {−1, 1}, p(z = 1) = 0.5 is the confounder and the intervention of interest is . Note sign(t) = z, meaning that h(t) = sign(t) is the confounder. Figure 1 shows causal graphs connecting our EFC notation to that in traditional OBS-CI. With noise η ~ p(η), with η ⫫ t, our causal model samples, in order, the confounder "part" of the pre-outcome variables h(t), the pre-outcome variables t, and the outcome y via the outcome function f:

$$h(t) \sim p(h(t)), \qquad t \sim p(t \mid h(t)), \qquad y = f(t, h(t), \eta).$$
Figure 1: Causal graphs for traditional OBS-CI vs. EFC.
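Similar to traditional OBS-CI, for an intervention t* the average effect τ(·) and the conditional effect ϕ(·,·) at confounder value h(t2) are, respectively, defined as:

$$\tau(t^*) = \mathbb{E}_{h(t),\eta}\big[f(t^*, h(t), \eta)\big], \qquad \phi(t^*, h(t_2)) = \mathbb{E}_{\eta}\big[f(t^*, h(t_2), \eta)\big]. \tag{3}$$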
As the pre-outcome variables determine the confounder, positivity is violated. Further, the outcome function f(t, h(t), η) could recover the exact value of h(t) from t instead of its second argument. Thus, two different outcome functions could lead to the same observational data distribution, posing a fundamental obstacle to causal effect estimation. This is the central challenge in EFC.
2.2. Causal Questions With Functional Positivity
Without positivity, we can only estimate the effects of certain functions of t. We call such interventions, on some function g(t), functional interventions. The implied causal model for the outcome for functional intervention value g(t*) and confounder value is first t ~ p(t | g(t) = g(t*), and then 3. Then, the functional average effect is
An example of a functional intervention is intervening on the cumulative dosage of a drug. In contrast, traditional interventions would set each individual dose given at different points in time.
F-POSITIVITY and Functional Effect Estimation
For the causal model above to be well-defined for all functional interventions g(t*), the conditional must exist. To guarantee this existence, we define functional positivity (F-POSITIVITY) for any g(t*)
$$p\big(h(t) = h(t_2)\big) > 0 \implies p\big(g(t) = g(t^*) \mid h(t) = h(t_2)\big) > 0. \tag{4}$$
F-POSITIVITY says that the function of the pre-outcome variables that is being intervened on needs to have sufficient randomness when the function of the pre-outcome variables that defines the confounders is fixed. Further, under F-POSITIVITY, effect estimation for functional interventions is reduced to traditional OBS-CI on data p(y, g(t), h(t)). With positivity and ignorability satisfied, traditional causal estimators such as propensity scores [23], matching [21], regression [11], and doubly robust methods [22] can be used to estimate the causal effect. Focusing on regression, let fθ be a flexible function, then would estimate the conditional expectation of interest : . With θ, the effect of g(t*) can be estimated by averaging the estimate of the conditional expectation over the marginal distribution p(h(t)):
$$\hat{\tau}(g(t^*)) = \mathbb{E}_{p(h(t))}\big[f_\theta(g(t^*), h(t))\big]. \tag{5}$$
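The regress-then-average estimate described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions: t is a vector of three binary doses, the confounder is h(t) = t[0], the functional intervention is the cumulative dose g(t) = t[1] + t[2], and the flexible regression fθ is swapped for plain cell means of y over (g(t), h(t)), which is only sensible because both are discrete here. All names and the data generating process are hypothetical.

```python
import random
from collections import defaultdict

random.seed(0)


def h(t):
    # Assumed functional confounder: the first coordinate of t.
    return t[0]


def g(t):
    # Assumed functional intervention: cumulative dose of the other coordinates.
    return t[1] + t[2]


def sample():
    t1 = 1 if random.random() < 0.5 else 0
    # Dose probabilities depend on h(t) (confounding) but stay inside (0, 1),
    # so every value of g(t) has positive probability given h(t) (F-POSITIVITY).
    p = 0.8 if t1 == 1 else 0.2
    t = (t1, int(random.random() < p), int(random.random() < p))
    y = 2.0 * g(t) + 3.0 * h(t) + random.gauss(0.0, 0.1)
    return t, y


data = [sample() for _ in range(20000)]

# Stand-in for the regression: the mean of y within each (g(t), h(t)) cell.
sums, counts = defaultdict(float), defaultdict(int)
for t, y in data:
    sums[(g(t), h(t))] += y
    counts[(g(t), h(t))] += 1


def cond_mean(g_val, h_val):
    return sums[(g_val, h_val)] / counts[(g_val, h_val)]


def functional_effect(g_star):
    # Average the conditional mean over the empirical marginal of h(t).
    return sum(cond_mean(g_star, h(t)) for t, _ in data) / len(data)


def naive_mean(g_star):
    # Ignores the confounder: plain average of y among units with g(t) = g_star.
    ys = [y for t, y in data if g(t) == g_star]
    return sum(ys) / len(ys)


adjusted_contrast = functional_effect(2) - functional_effect(0)
naive_contrast = naive_mean(2) - naive_mean(0)
print(adjusted_contrast, naive_contrast)
```

In this simulation the true contrast between g = 2 and g = 0 is 4 by construction. The adjusted estimate should land close to that value, while the naive contrast is inflated because high cumulative doses co-occur with h(t) = 1.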
3. Identification of effects of the full intervention
When positivity is violated, causal effects cannot be estimated as conditional expectations over the observed data in general. We give a functional condition, called causal redundancy (C-REDUNDANCY), that allows us to estimate the effect of the full intervention do(t = t*), even when positivity is violated. Specifically, C-REDUNDANCY allows us to construct a surrogate intervention whose conditional effect at h(t′) matches the conditional effect of interest, . Let be a fixed value of the full intervention, then C-REDUNDANCY is
Assumption. Recall the outcome . With as gradient w.r.t. to argument :
In words, C-REDUNDANCY is the condition that the outcome function f uses the value of the confounder from its second argument instead of computing h(t) from the first argument4. To compute the conditional effect , we develop Level-set Orthogonal Descent Estimation (LODE). LODE’s key step is to construct a surrogate intervention such that
By definition, a surrogate intervention lives in the conditional effect level-set: . So LODE searches this level-set for . See fig. 2 which plots the conditional effect level-sets with the value of h(t) fixed (red) in (supp(t), supp(h(t)))-space. Green corresponds to the observed data, supp(t, h(t)). LODE finds by traversing the level-sets (black) to account for the confounder part mismatch . C-REDUNDANCY ensures LODE can traverse these level-sets as it implies under the regularity conditions in theorem 1. Thus, under C-REDUNDANCY, surrogate interventions can be constructed by solving a gradient flow equation which guarantees identification as follows:
Figure 2: LODE’s traversal.
Theorem 1. Assume C-REDUNDANCY holds. Assuming the following:
-
Let be the limiting solution to the gradient flow equation , initialized at ; i.e. .
Further, let and .
and as functions of , are continuous and differentiable and the derivatives exist for all , η. Let exist and be bounded and integrable w.r.t. the probability measure corresponding to p(η), for all values of and .
Then the conditional effect (and therefore the average effect) is identified:
| (6) |
In words, the key idea is that starting at and following means always lies in the level-set . See appendix A.2 for the proof. While C-REDUNDANCY is stated in terms of the gradient of the outcome function, it suffices for theorem 1 to assume a weaker condition about the gradient of the conditional effect: .
Surrogate Positivity
In theorem 1, we assumed that the surrogate ∈ supp(t). This condition, which we call surrogate positivity (analogous to positivity), states that for any intervention and confounder, surrogate interventions that are limiting solutions to the gradient flow equation have nonzero density conditional on the confounder value. Formally, for any intervention t = t*
| (7) |
and satisfies assumption 1 in theorem 1. Surrogate positivity along with C-REDUNDANCY, is sufficient for full effect estimation under EFC. Next, we show that the positivity assumption in traditional causal inference is a special case of surrogate positivity.
Traditional observational causal inference (OBS-CI) and LODE
Let the confounder and intervention of interest in traditional OBS-CI be z and a respectively. Assume both are scalars and ignorability and positivity hold. This setup can be embedded in EFC by defining the vector of pre-outcome variables as: t = [a; z]. In this setting, C-REDUNDANCY and surrogate positivity(eq. (7)) hold by default. Let the outcome be y = f(t, h(t)) = f(a, z), where f only depends on the first element of t, i.e. a5. Let e1 = [1, 0] and e2 = [0,1]. In traditional OBS-CI as EFC, and meaning that . Thus, C-REDUNDANCY holds by default. Moreover, under positivity of a w.r.t. z, we also have surrogate positivity for traditional OBS-CI as an EFC problem. In this setting, LODE computes by following , which only changes the value of , not the value of a. Thus, t* and will have the same first element and t′’s second element will be . As a has positivity w.r.t. z, we have which means t′ ∈ supp(t). The estimated conditional effect is which matches the estimate in traditional OBS-CI.
Implementation of LODE
LODE first estimates the conditional expectation ; this can be done with model-based or nonparametric estimators. This is achieved by regressing y on t, , with empirical distribution D. The surrogate intervention is computed using Euler integration to solve the gradient flow equation. Euler integration in this setting is equivalent to gradient descent with a fixed step size. Other, more efficient schemes like Runge–Kutta numerical integration methods [3] could also be used. The conditional effect estimate is . See algorithm 1 for a description.
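The Euler-integration step can be sketched as follows. This is a minimal illustration, not the authors' algorithm 1: it assumes the gradient flow descends the squared confounder mismatch (h(t) − h_target)²/2, uses a hypothetical linear h(t) = t[0] + t[1], and replaces the fitted outcome model with the exact conditional mean of a toy outcome function whose gradient in t is orthogonal to the gradient of h. The names `euler_surrogate`, `outcome_model`, and `true_conditional_effect` are illustrative.

```python
GAMMA = 2.0  # assumed confounding strength for the toy outcome


def h(t):
    # Assumed functional confounder (linear in t).
    return t[0] + t[1]


def grad_h(t):
    return [1.0, 1.0]


def true_conditional_effect(t_star, h_value):
    # Toy outcome mean f(t, c) = (t[0] - t[1]) + GAMMA * c. Its gradient in t,
    # (1, -1), is orthogonal to grad_h = (1, 1).
    return (t_star[0] - t_star[1]) + GAMMA * h_value


def outcome_model(t):
    # Stand-in for the regression of y on t: E[y | t] = f(t, h(t)).
    return true_conditional_effect(t, h(t))


def euler_surrogate(t_star, h_target, step=0.1, rel_tol=1e-4, max_steps=10_000):
    # Fixed-step gradient descent on the squared confounder mismatch,
    # initialized at t_star; stops once the squared mismatch has dropped
    # to rel_tol times its initial value.
    t = list(t_star)
    initial = (h(t) - h_target) ** 2
    for _ in range(max_steps):
        mismatch = h(t) - h_target
        if mismatch ** 2 <= rel_tol * initial:
            break
        t = [ti - step * mismatch * gi for ti, gi in zip(t, grad_h(t))]
    return t


t_star, h_target = [1.0, 0.5], -0.5
t_surrogate = euler_surrogate(t_star, h_target)

truth = true_conditional_effect(t_star, h_target)
lode_error = abs(outcome_model(t_surrogate) - truth)
baseline_error = abs(outcome_model(t_star) - truth)  # ignores the mismatch
print(t_surrogate, lode_error, baseline_error)
```

Each step moves t along the gradient of h, so the part of t that the toy outcome uses directly (t[0] − t[1]) is left unchanged while h(t) is driven to the target. The remaining error of the surrogate-based estimate comes from stopping at a nonzero mismatch.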
3.1. Estimation error of LODE in practice
To compute the surrogate intervention t′, LODE uses the gradients of h(·) in Euler integration. In practice, taking Euler integration steps, instead of solving the gradient flow exactly, could result in errors. Then t′ could lie outside the level-set of the conditional effect . Further, if , LODE incurs error for conditioning on a value of the confounder that is different from . The error due to t′ estimation is decoupled from the error in the estimation of which adds without further amplification. We formalize this error:
Theorem 2. Consider the conditional effect . Let be the estimate of the surrogate intervention computed by LODE, computed via Euler integration of the gradient flow , initialized at . Assume the true surrogate exists and is the limiting solution to the gradient flow equation.
Let the finite sample estimator of be . Let the error for all be bounded, , where N is the sample size and limN→∞ c(N) = 0.
Assume K Euler integrator steps were taken to find the surrogate estimate , each of size ℓ. Let the maximum confounder mismatch be .
-
Let be the Lipschitz-constant of as a function of , for fixed .
Let Le be the Lipschitz-constant of as a function of .
Assume h has a gradient with bounded norm, .
Assume f’s Hessian has bounded eigenvalues: , .
The conditional effect estimate error, , is upper bounded by:
| (8) |
See appendix A.3 for the proof. Theorem 2 captures the trade-off between biases due to conditioning on the wrong confounder value and due to the accumulated error in solving the gradient flow equation. This accumulated error analysis may be loose in settings where the sum of many gradient steps lead to , even if each step individually induces large error. In such settings, the term that depends on is a better measure of error. The maximum-mismatch [mi] appears because Euler integrator takes steps that depend on the magnitude of the gradient which depends on the mismatch value . If mismatch is large for some i, the Euler step could lead to a large error for a fixed step size ℓ. We discuss the assumptions in theorems 1 and 2 in appendix A.1
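The sensitivity to the Euler step size ℓ can be illustrated on a toy case. This sketch assumes a linear confounder h(t) = a·t and a flow that descends the squared mismatch; under these assumptions the mismatch is multiplied by (1 − ℓ‖a‖²) at every step, so the iteration contracts only when ℓ < 2/‖a‖². It illustrates step-size instability only and does not reproduce the bound in theorem 2; the name `mismatch_after` is hypothetical.

```python
def mismatch_after(t_star, h_target, a, step, n_steps):
    # Run n_steps fixed-size Euler steps on (a.t - h_target)^2 / 2 starting at
    # t_star and return the remaining absolute confounder mismatch.
    t = list(t_star)
    for _ in range(n_steps):
        residual = sum(ai * ti for ai, ti in zip(a, t)) - h_target
        t = [ti - step * residual * ai for ti, ai in zip(t, a)]
    return abs(sum(ai * ti for ai, ti in zip(a, t)) - h_target)


a = [1.0, 1.0]                    # ||a||^2 = 2, so steps must satisfy step < 1
t_star, h_target = [1.0, 0.0], 3.0

small_step = mismatch_after(t_star, h_target, a, step=0.1, n_steps=50)
large_step = mismatch_after(t_star, h_target, a, step=1.1, n_steps=50)
print(small_step, large_step)
```

With the small step the mismatch shrinks geometrically, whereas with the large step it grows geometrically, so the surrogate estimate drifts away from the target confounder value instead of approaching it.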
3.2. Effect Connectivity and the Existence of the Surrogate Intervention
The key element in Theorem 1 is the surrogate intervention t′ such that its conditional effect given h(t′), equals that of t* and . The orthogonality , is a functional condition that does not guarantee exists in supp(t); a necessity to compute without additional parametric assumptions. We give a general condition called Effect Connectivity that guarantees the surrogate intervention exists. With conditional effect , for any t*
| (9) |
In words, t has a chance of setting the conditional effect to any possible value supp(ϕ(t, h(t2))) given any confounder value . An equivalent statement is that every level set of the conditional effect , with fixed, contains an intervention for each confounder value. That is, for some define the level set , then , .
Theorem 3. Under Effect Connectivity, eq. (9), any surrogate intervention .
We give the proof in appendix A.4. Whether the intervention can be found via tractable search is problem-specific. If the surrogate exists , then eq. (9) holds by definition of the surrogate. Effect Connectivity allows us to reason about values of f anywhere in supp(t) × supp(h(t)) using only samples from p(y, t). Further, it is necessary in EFC:
Theorem 4. Effect Connectivity is necessary for nonparametric effect estimation in EFC.
We prove this in appendix A.5. Effect Connectivity ensures that causal models with different causal effects have different observational distributions. Then, parametric assumptions on the causal model are not necessary to estimate effects.
4. Experiments
We evaluate LODE on simulated data first and show that LODE can correct for confounding. We also investigate the error induced by imperfect estimation of the surrogate intervention in LODE. Further, we run LODE on a GWAS dataset [6] and demonstrate that LODE is able to correct for confounding and recovers genetic variations that have been reported relevant to Celiac disease [8, 25, 14, 1].
4.1. Simulated experiments
We investigate different properties of LODE on simulated data where ground truth is available. Let the dimension of t (pre-outcome variables) be T = 20 and outcome noise be . We consider two EFC causal models, denoted by A and B with different h(t) and f(t, h(t), η):
In both causal models, C-REDUNDANCY is satisfied. The constant γ controls the strength of the confounder and the constant α controls the Lipschitz constant of the outcome as a function of the confounder. We let the variance σ2 = 1, unless specified otherwise. In the following, we train on 1000 samples and report conditional effect root-mean-squared error (RMSE), computed with another 1000 samples. We used a degree-2 kernel ridge regression to fit the outcome model as a function of t. This model is correctly specified, and so the conditional can be estimated well. We compare against a baseline estimate of conditional effect that is the same outcome model’s estimate of . This baseline fails to account for confounding and produces a biased estimate of the conditional effect of do(t = t*), conditional on any .
First, we investigate how well LODE can correct for confounding for both causal models. We let α = 1 and obtain surrogate estimates by Euler integrating until the quantity is smaller than 10−4 times value at initialization, where is expectation over the evaluation set. In fig. 3, we plot the mean and standard deviation of conditional effect RMSE averaged over 10 seeds, for different strengths of confounding. We see that LODE is able to estimate effects well across multiple strengths of confounding while the baseline suffers.
Figure 3: RMSE of estimated conditional effect vs. strength of confounding γ. LODE corrects for confounding and produces good effect estimates across different values of γ.
Second, we investigate LODE’s estimation when surrogate positivity holds but the probability is very small. This results in estimation error due to poor fitting of the outcome model in low density regions of supp(t). We run LODE on simulated data where t is generated with different variances (σ2). For small σ, the outcome model error is large when using surrogate interventions , where either or t* is large. This leads to high variance effect estimation as we show in fig. 4 for both causal models. For various variances of t, σ2, we plot the mean and standard deviation of RMSE of estimated conditional effect over 10 seeds, against different γ.
Figure 4: RMSE of estimated conditional effect vs. the strength of confounding γ, for different levels of variance of t, σ². Small σ leads to large conditional estimation error.
Third, we investigate the bias induced due to imperfect estimation of the surrogate intervention in LODE for both causal models. We construct surrogate interventions by ensuring there is confounder-value mismatch . We do this by interrupting Euler integration when the objective , where the is over our evaluation set upon which we estimate conditional effects. For different α, we plot in fig. 6 the mean and standard deviation of RMSE of estimated conditional effect over 10 seeds, against different degrees of confounder mismatch, δ. The error due to confounder mismatch is mitigated by small α, the Lipschitz-constant of the outcome as a function of h(t). Finally, we consider how step size in Euler integration affects the quality of estimated effects. Large step sizes may result in biased surrogate estimates; this bias is captured in the accumulation error in section 3.1. We focus on the non-linear case in causal model B where gradient errors can accumulate(see appendix A.3.1). We demonstrate this error in fig. 5 where we plot mean and standard deviation of conditional effect RMSE against the strength of confounding, for different step sizes ℓ. We do not report results for larger step sizes (ℓ > 2) because Euler integration diverged for many surrogate estimates.
Figure 6: RMSE of estimated conditional effect vs. degree of confounder mismatch δ. Error due to conditioning on a mismatched value of the confounder increases with strength of confounding but is mitigated by smoothness of the outcome function.
Figure 5: RMSE of estimated conditional effect vs. step size in Euler integrator in causal model B. Accumulating error due to large step size in Euler integrator increases with strength of confounding.
4.2. Effects in Genetics (GWAS)
In this experiment, we explore the associations between genetic factors and Celiac disease. We utilize data from the Wellcome Trust Celiac disease GWAS dataset [8, 6], consisting of individuals with Celiac disease, called cases (n = 3796), and controls (n = 8154). We construct our dataset by filtering from the ~550,000 SNPs. The only preprocessing in our experiments is linkage disequilibrium pruning of adjacent SNPs (at an R² threshold of 0.5) and PLINK [5] quality control. After this, 337,642 SNPs remain for 11,950 people. We imputed missing SNPs for each person by sampling from the marginal distribution of that SNP. No further SNP or person was dropped due to missingness. The objective of this experiment is to show that LODE corrects for confounding and recovers SNPs reported in the literature [8, 25, 14, 1]. To this end, after preprocessing, we included in our data 50 SNPs reported in [8, 25, 14, 1] and 1000 randomly sampled from the rest.
We use outcome models and functional confounders h() traditionally employed in the GWAS literature. We choose a linear , where A is a matrix of the right singular vectors of a normalized Genotype matrix, that correspond to the top 10 singular values [19]. The outcome model is selected from logistic Lasso linear models with various regularization strengths, via cross validation within the training data (60% of the dataset). We defer details about the experimental setup to appendix B.
We then use this outcome model in LODE to compute causal effects on the whole filtered dataset. The effects are computed one SNP at a time. First, for each person , create which correspond to the ith SNP set to 1 and 0 respectively, with all other SNPs same as . Randomly sample a from the marginal p(h(t)) and, using the outcome model Pθ, compute . The average effect of SNP i is obtained by averaging across all persons: . Any SNP that beats a specified threshold of effect is deemed relevant to Celiac disease by LODE. We use a 60 − 40% train-test split, and outcome model selection is done via cross-validation within the training set. We did 5-fold cross-validation using just the training set. We use Scikit-learn [18] to fit the outcome models and for cross-validation.
Results
The best outcome model was a Lasso model, trained with regularization constant 10. We select relevant SNPs by thresholding estimated effects at a magnitude > 0.1. From 1050 SNPs (1000 not reported before) LODE returned 31 SNPs, out of which 13 were previously reported as being associated with Celiac disease [8, 25, 14, 1]. In appendix B.2 we plot the true positive and false negative rates of identifying previously reported SNPs, as a function of the effect threshold.
In table 1, we list a few SNPs that were both deemed relevant by LODE and reported in existing literature [8, 25, 14, 1], together with their effects and their Lasso coefficients. The full list is in table 2 in appendix B. If LODE could not adjust for confounding, the Lasso coefficients would dictate the effects; a coefficient of 0 would mean an effect of 0. However, the two pairs of SNPs in table 1 show that the effects estimated by LODE do not rely solely on the Lasso coefficients. For the first pair (rs13151961, rs2237236), the effect is the same but the coefficient of one is positive while that of the other is 0. We note that rs2237236 was found to be associated with ulcerative colitis [12, 2], which is an inflammatory bowel disease that has been reported to share some common genetic basis with Celiac disease [16]. For the second pair (rs1738074, rs11221332), the magnitude of the effect is larger for the former, but the magnitude of its coefficient is smaller. Thus, LODE adjusts for confounding factors that the outcome model ignored.
Table 1: A few SNPs previously reported as relevant and recovered by LODE, with estimated effects and Lasso coefficients. LODE produces effect estimates that do not rely purely on the coefficients.
| SNP | Effect | Lasso coefficient |
|---|---|---|
| rs13151961 | 0.17 | 0.32 |
| rs2237236 | 0.17 | 0.00 |
| rs1738074 | −0.16 | −0.23 |
| rs11221332 | −0.15 | −0.24 |
5. Discussion
When positivity is violated in traditional OBS-CI, not all effects are estimable without further assumptions. In such cases, practitioners have to turn to parametric models to estimate causal effects. However, parametric models can be misspecified when used without underlying causal mechanistic knowledge. We develop a new general setting of observational causal effect estimation called estimation with functional confounders (EFC), where the confounder can be expressed as a function of the data, meaning positivity is violated. Even when positivity is violated, the effects of many functional interventions are estimable. We develop a sufficient condition called functional positivity (F-POSITIVITY) to estimate effects of functional interventions. Such effects could be of independent interest, such as the effect of the cumulative dosage of a drug instead of the joint effects of multiple dosages at different times.
Second, we prove a necessary condition for nonparametric estimation of effects of the full intervention. We propose the C-REDUNDANCY condition, under which, the effect of the full intervention on t is estimable without parametric restrictions. We develop Level-set Orthogonal Descent Estimation (LODE) that computes surrogate interventions whose effects are estimable and match a conditional effect of interest. Further, we give bounds on errors (theorem 2) induced due to imperfect estimation of the surrogate intervention. Finally, we empirically demonstrate LODE’s ability to correct for confounding in both simulated and real data.
Future.
A few directions of improvement remain which we elaborate next. First, F-POSITIVITY may not hold for all functions g(t) that we want to intervene on. Instead, one could compute a “projection” gΠ to the space of functions that satisfy F-POSITIVITY and inspect the effects defined by gΠ instead. A second direction of interest is to let h(t) only account for a part of the confounding, meaning ignorability is violated. This bias could be mitigated under smoothness conditions of the outcome function and its interaction with the degree of violation of ignorability.
Finally, LODE’s search strategy is Euler integration, which is equivalent to gradient descent with a fixed step size. Optimization techniques like momentum, rescaling the gradient using an adaptive matrix, and using second order hessian information, speed up gradient descent. However, if there are many local or global minima for , such techniques will result in a different solution than Euler integration, which could mean that effect estimates are biased. One extension of LODE would allow for search strategies that use such techniques.
Broader Impact.
Our work mainly applies to causal inference where confounders are specified as functions of observed data, such as in problems in genetics and healthcare. We choose to assess the impact of our work through its applications in these fields. A positive impact of the work is that better estimates of causal effects help guide treatment for people and aid in understanding the biological pathways of diseases. However, in healthcare, data collected in hospitals has biases. If, for instance, a certain demographic has more complete data collected about it, then this demographic would have better-quality effect estimates, potentially meaning that its members receive better treatment. This problem could be characterized by evaluating the positivity of treatment and completeness of confounders in electronic health record data split by demographics.
Acknowledgements
The authors were partly supported by NIH/NHLBI Award R01HL148248, and by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science. The authors would like to thank Xintian Han, Raghav Singhal, Victor Veitch, Fredrik D. Johansson and the reviewers for thoughtful feedback. The authors would also like to thank Mukund Sudarshan and Prof. Sriram Sankararaman for help with running the GWAS experiments.
A. Theoretical details
A.1. A note about the assumptions
In theorem 1, assumption 1 consists of three parts that can all be validated on observed data: 1) that the gradient flow converges, 2) that the confounder value of the surrogate matches the confounder value whose effect is of interest, and 3) that the surrogate intervention lies in the support of the pre-outcome variables. Assumption 2 is required for expectations and their gradients to exist and be finite. In theorem 2, assumption 1 requires a consistent estimator of the conditional expectation of the outcome given the pre-outcome variables, which can be obtained by regression. Assumption 3 lists regularity conditions that help control how the surrogate estimation error propagates to the effect error.
A.2. Proof of Theorem 1
We restate the theorem for completeness:
Theorem 1. Assume C-REDUNDANCY holds. Assume the following:
1. Let be the limiting solution to the gradient flow equation , initialized at ; i.e. . Further, let and .
2. and as functions of , are continuous and differentiable and the derivatives exist for all , η. Let exist and be bounded and integrable w.r.t. the probability measure corresponding to p(η), for all values of and .
Then the conditional effect (and therefore the average effect) is identified:
| (10) |
Proof. Recall definition of conditional effect . Recall is the gradient with respect to the first argument of f, that is . First, by assumption 2, and ∇ commute, under the dominated convergence theorem. Then, by C-REDUNDANCY
Now consider the gradient flow equation . We refer to the gradient evaluated at as . We will express as defined by the starting point and the gradient flow equation.
Let the solution path to the gradient flow equation be C with t*, being the starting and ending points respectively. By the Gradient Theorem [26], we have that and are related via the line integral over C:
Let be a parametrization of solution path C by the scalar time s ∈ [0, ∞). Now, to obtain the value of , we will compute the line integral over the vector field defined by , which exists by assumption 2 in theorem 1, evaluated along the path C defined by :
| (11) |
Finally, by assumption 1 in theorem 1, , and so
| (12) |
For clarity, the same equation, but using t′ and suppressing dependence on t*, :
| (13) |
Under the causal model for EFC, the outcome y = f(t, h(t), η). Then, ,
| (14) |
Using that and eqs. (13) and (14), the conditional effect is identified
| (15) |
Thus, the conditional effect, and consequently the average effect, are identified as and respectively.
Note about convergence of gradient flow
Any ODE’s solution, if it exists and converges, converges to an ω-limit set [27]. An ω-limit set is nonempty when the solution path lies entirely in a closed and bounded set and can consist of limit cycles, equilibrium points, or neither [13, 27]. A gradient flow equation (also called a gradient system) has the special property that its ω-limit set only consists of critical points of ; critical points of are also equilibrium points of the gradient flow equation [13]. Further, if exists and is bounded and has bounded sublevel sets , then the solution to the gradient flow equation will entirely lie within a bounded set. This is because along the solution path, always decreases meaning that the solution will remain in any sublevel set it started in. Thus, if has bounded sublevel sets, the solution of the gradient flow equation will converge only to critical points of .
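The monotone decrease used in this argument follows from the chain rule. As a short worked step, write V for the function being descended (a symbol introduced here only for this sketch), so that the gradient flow reads dt(s)/ds = −∇V(t(s)):

```latex
\frac{d}{ds} V\big(t(s)\big)
  = \nabla V\big(t(s)\big)^\top \frac{d t(s)}{ds}
  = -\,\big\lVert \nabla V\big(t(s)\big) \big\rVert_2^2
  \;\le\; 0 .
```

Hence V(t(s)) ≤ V(t(0)) for all s ≥ 0, so the solution path never leaves the sublevel set {t : V(t) ≤ V(t(0))}, which is bounded by assumption.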
A.3. Estimation error in LODE
Theorem 2. Consider the conditional effect . Let be the estimate of the surrogate intervention computed by LODE, computed via Euler integration of the gradient flow , initialized at . Assume the true surrogate exists and is the limiting solution to the gradient flow equation.
1. Let the finite sample estimator of be . Let the error for all be bounded, , where N is the sample size and lim_{N→∞} c(N) = 0.
2. Assume K Euler integrator steps were taken to find the surrogate estimate , each of size ℓ. Let the maximum confounder mismatch be .
3. Regularity conditions:
   - Let be the Lipschitz-constant of as a function of , for fixed .
   - Let Le be the Lipschitz-constant of as a function of .
   - Assume h has a gradient with bounded norm, .
   - Assume f’s Hessian has bounded eigenvalues: , , .
The conditional effect estimate error, , is upper bounded by:
| (16) |
Proof. (of Theorem 2) Recall the definition of conditional effect : .
LODE’s estimate of the conditional effect is . We will suppress notation for dependence on t*, and use t′ and to refer to the true surrogate intervention and the estimated surrogate interventions respectively. Note is the estimate of the conditional expectation , learned from N samples. We first bound the error by splitting into two parts and bounding each separately:
The first term is bounded via the Lipschitz-ness of ϕ as a function of with fixed first argument .
We now bound the remaining term. Recall that LODE’s computation of the surrogate intervention involved K gradient steps, each of size ℓ. We work with a constant step size, but the analysis can be generalized to a non-uniform step size. Indexing steps with i, let be the confounder mismatch error at the ith iterate. Then note that . We can use this to bound the error . With and , we proceed by expressing the error as a telescoping sum and using the Taylor expansion for in terms of the first argument .
| (17) |
| (18) |
| (19) |
| (20) |
| (21) |
| (22) |
| (23) |
| (24) |
| (25) |
| (26) |
| (27) |
where the inequalities follow by the maximum value of , bounded eigenvalues of the Hessian of ϕ and the Lipschitz-ness of .
Another way we bound the error is via the Lipschitz constant of the conditional expectation as a function of . Recall this is Le. An alternate bound on the error is as follows:
The bound follows:
A.3.1. A note on linear confounder functions and LODE
In the proof above, the error in Euler integration accumulates due to terms like this one: . For a linear confounder function that satisfies , such terms can be expressed as under C-REDUNDANCY. Thus, such error does not accumulate even with large step sizes.
Further, note that the gradient flow equation in LODE for the causal model A in section 4 is a linear ODE whose solution has a closed form expression and one can estimate the surrogate without numerical integration [27].
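As an illustration of this closed form, here is a minimal sketch under assumptions that are ours rather than a restatement of causal model A: the confounder is h(t) = At and the flow descends 0.5‖At − c‖², where c is the confounder value to be matched and lies in the range of A. The limit of the linear ODE dt/ds = −Aᵀ(At − c) started at t* is then t* − A⁺(At* − c), with A⁺ the Moore-Penrose pseudoinverse. The matrix, dimensions, and variable names below are hypothetical.

```python
import numpy as np

def linear_surrogate(A, t_star, h_target):
    """Limit of dt/ds = -A^T (A t - h_target) started at t_star.

    Assumes h_target lies in the range of A, so the flow converges to the
    point of {t : A t = h_target} closest to t_star.
    """
    residual = A @ t_star - h_target
    return t_star - np.linalg.pinv(A) @ residual

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))    # hypothetical linear confounder h(t) = A t
t_star = rng.normal(size=5)    # intervention whose effect is of interest
t_2 = rng.normal(size=5)       # unit whose confounder value is matched
t_prime = linear_surrogate(A, t_star, A @ t_2)

print(np.allclose(A @ t_prime, A @ t_2))
```

The displacement t′ − t* lies entirely in the row space of A, so the surrogate is obtained in one step and no step-size error can accumulate.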
A.4. Proof of sufficiency of Effect Connectivity
Theorem 3. Under Effect Connectivity, eq. (9), any surrogate intervention .
Proof. Recall . We have ∀t* ∈ supp(p(t)):
This implies ∃t′ ∈ supp(t), , s.t. .
Then, .
A.5. Necessity of Effect Connectivity for Nonparametric effect estimation in EFC
Theorem 4. Effect Connectivity is necessary for nonparametric effect estimation in EFC.
Proof. (of Theorem 4) Let the outcome be y = f(t, h(t)). Recall the joint distribution p(t, y) and let h(t) be the confounder. Let Effect Connectivity be violated, i.e. there exists a non-measure-zero subset B ⊂ supp(t) × supp(h(t)) such that⁶:
Now, we construct a new outcome y2 = f2(t, h(t)) and show the conditional effects for this new outcome are different from the one defined by f on . Let
We have as the additional term in f2 is only present for ; this follows from the fact that , as
Thus, p(y, t) and p(y2, t) are equal in distribution, since B ∩ supp(t, h(t)) = ∅. Yet the conditional effects differ between the outcomes y and y2 for all :
Therefore, for causal models that violate Effect Connectivity, there exist observationally equivalent causal models with different causal effects, and nonparametric effect estimation is impossible. Thus, Effect Connectivity is required for EFC.
A.6. Algorithmic details
Algorithm 1 gives pseudocode for LODE.
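For readers who prefer code, the following is a minimal end-to-end sketch of the three stages described in the text (regress the outcome on the pre-outcome variables, compute surrogates by Euler steps, average the regression's predictions at the surrogates). It is not Algorithm 1 itself. The simulated data, the linear confounder h(t) = aᵀt, the squared-mismatch flow, the least-squares regression, and the definition of the average effect as the conditional effect averaged over observed confounder values are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated EFC data (not from the paper).
a = np.array([1.0, 1.0]) / np.sqrt(2.0)   # confounder h(t) = a^T t
b = np.array([1.0, -1.0])                 # direct effect of t, orthogonal to a
gamma = 2.0                               # confounding strength

def h(t):
    return t @ a

N = 2000
T = rng.normal(size=(N, 2))
y = T @ b + gamma * h(T) + 0.1 * rng.normal(size=N)

# Stage 1: estimate E[y | t] by ordinary least squares.
X = np.column_stack([np.ones(N), T])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(t):
    return coef[0] + t @ coef[1:]

# Stage 2: Euler steps from t_star toward each target confounder value.
def surrogates(t_star, h_targets, step=0.1, n_steps=200):
    t = np.tile(np.asarray(t_star, dtype=float), (len(h_targets), 1))
    for _ in range(n_steps):
        mismatch = h(t) - h_targets             # one value per target
        t = t - step * mismatch[:, None] * a    # grad_t h(t) = a
    return t

# Stage 3: average predictions at the surrogates over observed confounders.
t_star = np.array([1.0, 0.0])
lode_estimate = float(np.mean(predict(surrogates(t_star, h(T)))))
naive_estimate = float(predict(t_star))   # ignores confounding
true_effect = float(t_star @ b)           # E[h(t)] = 0 in this simulation

print(lode_estimate, naive_estimate, true_effect)
```

In this simulation the naive prediction at t* absorbs the confounder's contribution, while averaging over surrogates recovers a value close to the true average effect.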
Extensions of LODE
Consider that we have access to m(h(t)) for some bijective differentiable function m(·), instead of h(t). The orthogonality in C-REDUNDANCY holds . Then, using to compute the surrogate , LODE would estimate valid effects. Similarly, LODE can estimate the effect on any differentiable transformation of the outcome m(y), because holds.
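The reason such transformations preserve orthogonality can be sketched with the chain rule. The step below assumes a scalar-valued h and uses v for any direction orthogonal to the gradient of h; both are notational assumptions made only for this illustration.

```latex
\nabla_t\, m\big(h(t)\big) = m'\big(h(t)\big)\, \nabla_t h(t)
\quad\Longrightarrow\quad
\Big( v^\top \nabla_t h(t) = 0 \;\Rightarrow\; v^\top \nabla_t\, m\big(h(t)\big) = 0 \Big).
```

Because m is bijective, matching the transformed confounder value is equivalent to matching the original one, so the surrogate found with m(h(t)) still matches the confounder value of interest. The same chain-rule argument applied to m(y) rescales the outcome gradient by a scalar and leaves its direction unchanged.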

B. Experimental Details
B.1. Functional confounders in GWAS
Here, we show how h(t) = At and A reflect the traditional PCA-based adjustment in GWAS. Recall population structure acts as a confounder in GWAS. Price et al. [19] demonstrated that using the principal components of the normalized genetic relationship matrix adjusts for confounding due to population structure in GWAS. Let the genotype matrix be G with people as rows and SNPs as columns, such that each element is one of 0, 1/2, 1, where 1/2 and 1 refer to one and two copies of the allele respectively at the position of the SNP. With p_s as the allele frequency at SNP s [28], Φ is the genetic relationship matrix whose elements are defined as . Then, Price et al. [19] compute the top K (10 suggested) principal components of Φ to use as the axes of variation due to the population structure. The eigenvectors of Φ are the left eigenvectors of such that which capture independent axes of variation of individuals.
Price et al. [19] exploit the idea that if a SNP aligns with some of the axes of variation, this is due to the population structure. These axes of variation are the top K eigenvectors U of , where , and . Here, U are also the left singular vectors of where is diagonal, and . We use ≈ to denote that the chosen K eigenvectors explain the variation due to population structure; what remains are random mutations.
Let the sth SNP be , which is a column in . In Price et al. [19], population structure in the sth SNP is captured in . In words, projecting the SNP onto the axes of variation in individuals gives the population structure between the sth SNP and the outcome. This projection is a row of . In turn, is the population structure in all SNPs. Projecting this population structure onto the genotype of an individual gives the confounding due to population structure amongst the SNPs present in the genotype. With as the genotype for an individual j, this projection is . However, implies that . Reflecting this, h(t) = ΣV^T t is the functional confounder for an individual t.
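The construction of h(t) = ΣV^T t can be sketched numerically as follows. This is an illustration under assumptions, not the preprocessing used in our experiments: the genotype matrix is randomly simulated, allele frequencies are estimated by column means, and the per-SNP normalization (centering by p_s and scaling by sqrt(p_s(1 − p_s))) is one common choice whose exact constants may differ from the definition of Φ above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps, K = 50, 200, 10

# Hypothetical genotype matrix with entries in {0, 1/2, 1}.
freqs = rng.uniform(0.1, 0.9, size=n_snps)
G = rng.binomial(2, freqs, size=(n_people, n_snps)) / 2.0

# Empirical allele frequency per SNP; drop SNPs with no variation.
p = G.mean(axis=0)
keep = (p > 0) & (p < 1)
G_norm = (G[:, keep] - p[keep]) / np.sqrt(p[keep] * (1 - p[keep]))

# Thin SVD of the normalized genotype matrix: G_norm = U diag(s) V^T.
U, s, Vt = np.linalg.svd(G_norm, full_matrices=False)
U_K, s_K, Vt_K = U[:, :K], s[:K], Vt[:K, :]

# Functional confounder h(t) = Sigma V^T t, restricted to the top K components.
A = np.diag(s_K) @ Vt_K

def h(t):
    return A @ t

# For an observed individual j, h recovers that individual's top K
# principal-component coordinates, rescaled by the squared singular values.
print(np.allclose(h(G_norm[0]), s_K ** 2 * U_K[0]))
```

The final check shows that, for observed genotypes, the functional confounder is a fixed rescaling of the individual's principal-component coordinates, which is the quantity the PCA-based adjustment conditions on.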
B.2. Expanded results
In table 2, we list the 13 SNPs recovered by LODE that have been previously reported as relevant to Celiac disease. In fig. 7, we plot the true positive and false negative rates amongst SNPs deemed relevant by LODE. The ground truth here is the set of SNPs reported as associated with celiac disease in prior literature.
Figure 7: True positive vs. false negative rate as we vary the threshold on average effects that determines which SNPs LODE deems relevant to the outcome.
Table 2: Full list of SNPs previously reported as relevant that were recovered by LODE, with their estimated effects and Lasso coefficients. The effect threshold here is 0.1.
| SNP | Effect | Lasso Coef. |
|---|---|---|
| rs3748816 | 0.12 | 0.20 |
| rs10903122 | 0.10 | 0.17 |
| rs2816316 | 0.11 | 0.20 |
| rs13151961 | 0.17 | 0.32 |
| rs2237236 | 0.17 | 0.00 |
| rs12928822 | 0.14 | 0.29 |
| rs2187668 | −0.70 | −2.37 |
| rs2327832 | −0.12 | −0.20 |
| rs1738074 | −0.16 | −0.23 |
| rs11221332 | −0.15 | −0.24 |
| rs653178 | −0.13 | −0.21 |
| rs4899260 | −0.12 | −0.19 |
| rs17810546 | −0.12 | −0.20 |
Footnotes
We focus on f that generates y from t, z. SCMs generally specify the function that generates t from z also.
We also assume no interference [10] (also called Stable Unit Treatment Value Assumption [24]) which means that an individual’s outcome does not depend on others’ treatment. In EFC, when t and η are sampled IID there is no interference. To see this, note ∀i, j (ti, ηi)⫫(tj, ηj) ⇒ (yi, ti)⫫(yj, tj) ⇒ yi⫫tj.
If f transforms its first argument into as one amongst many different computations, the chain rule implies has a term which is non-zero in general.
We ignore noise in the outcome for ease of exposition.
Non-zero w.r.t. the product measure over supp(t) × supp(h(t)) due to p.
References
- [1]. Adamovic Svetlana, Amundsen SS, Lie BA, Gudjonsdottir AH, Ascher H, Ek J, Van Heel DA, Nilsson S, Sollid LM, and Torinsson Naluai Å. Association study of il2/il21 and fcgriia: significant association with the il2/il21 region in scandinavian coeliac disease families. Genes and immunity, 9(4):364, 2008.
- [2]. Anderson Carl A, Boucher Gabrielle, Lees Charlie W, Franke Andre, D’Amato Mauro, Taylor Kent D, Lee James C, Goyette Philippe, Imielinski Marcin, Latiano Anna, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature genetics, 43(3):246, 2011.
- [3]. Ascher Uri M and Petzold Linda R. Computer methods for ordinary differential equations and differential-algebraic equations, volume 61. Siam, 1998.
- [4]. Astle William, Balding David J, et al. Population structure and cryptic relatedness in genetic association studies. Statistical Science, 24(4):451–471, 2009.
- [5]. Chang Christopher C, Chow Carson C, Tellier Laurent CAM, Vattikuti Shashaank, Purcell Shaun M, and Lee James J. Second-generation plink: rising to the challenge of larger and richer datasets. Gigascience, 4(1):s13742–015, 2015.
- [6]. Wellcome Trust Case Control Consortium et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145):661, 2007.
- [7]. Correa J and Bareinboim E. A calculus for stochastic interventions: Causal effect identification and surrogate experiments. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, 2020. AAAI Press.
- [8]. Dubois Patrick CA, Trynka Gosia, Franke Lude, Hunt Karen A, Romanos Jihane, Curtotti Alessandra, Zhernakova Alexandra, Heap Graham AR, Ádány Róza, Aromaa Arpo, et al. Multiple common variants for celiac disease influencing immune gene expression. Nature genetics, 42(4):295, 2010.
- [9]. Eberhardt Frederick and Scheines Richard. Interventions and causal inference. Philosophy of Science, 74(5):981–995, 2007.
- [10]. Hernán Miguel A and Robins James M. Causal inference: what if. Boca Raton: Chapman & Hall/CRC, 2020.
- [11]. Hill Jennifer L. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011. doi: 10.1198/jcgs.2010.08162.
- [12]. Hindorff Lucia A, Sethupathy Praveen, Junkins Heather A, Ramos Erin M, Mehta Jayashri P, Collins Francis S, and Manolio Teri A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23):9362–9367, 2009.
- [13]. Hirsch Morris W, Devaney Robert L, and Smale Stephen. Differential equations, dynamical systems, and linear algebra, volume 60. Academic press, 1974.
- [14]. Hunt Karen A, Zhernakova Alexandra, Turner Graham, Heap Graham AR, Franke Lude, Bruinenberg Marcel, Romanos Jihane, Dinesen Lotte C, Ryan Anthony W, Panesar Davinder, et al. Novel celiac disease genetic determinants related to the immune response. Nature genetics, 40(4):395, 2008.
- [15]. Lippert Christoph, Listgarten Jennifer, Liu Ying, Kadie Carl M, Davidson Robert I, and Heckerman David. Fast linear mixed models for genome-wide association studies. Nature methods, 8(10):833, 2011.
- [16]. Pascual Virginia, Dieli-Crimi Romina, López-Palacios Natalia, Bodas Andrés, Medrano Luz María, and Núñez Concepción. Inflammatory bowel disease and celiac disease: overlaps and differences. World journal of gastroenterology: WJG, 20(17):4846, 2014.
- [17]. Pearl Judea et al. Causal inference in statistics: An overview. Statistics surveys, 3:96–146, 2009.
- [18]. Pedregosa Fabian, Varoquaux Gaël, Gramfort Alexandre, Michel Vincent, Thirion Bertrand, Grisel Olivier, Blondel Mathieu, Prettenhofer Peter, Weiss Ron, Dubourg Vincent, et al. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
- [19]. Price Alkes L, Patterson Nick J, Plenge Robert M, Weinblatt Michael E, Shadick Nancy A, and Reich David. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38(8):904, 2006.
- [20]. Ranganath Rajesh and Perotte Adler. Multiple causal inference with latent confounding. arXiv preprint arXiv:1805.08273, 2018.
- [21]. Ratkovic Marc. Balancing within the margin: Causal effect estimation with support vector machines. Department of Politics, Princeton University, Princeton, NJ, 2014.
- [22]. Robins James M. Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association, volume 1999, pages 6–10. Indianapolis, IN, 2000.
- [23]. Rosenbaum Paul R and Rubin Donald B. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
- [24]. Rubin Donald B. Randomization analysis of experimental data: The fisher randomization test comment. Journal of the American Statistical Association, 75(371):591–593, 1980.
- [25]. Sollid Ludvig M. Coeliac disease: dissecting a complex inflammatory disorder. Nature Reviews Immunology, 2(9):647, 2002.
- [26]. Spivak Michael. Calculus on manifolds: a modern approach to classical theorems of advanced calculus. CRC press, 2018.
- [27]. Teschl Gerald. Ordinary differential equations and dynamical systems, volume 140. American Mathematical Soc., 2012.
- [28]. Thornton Timothy and Wu Michael. Summer institute in statistical genetics 2015.
- [29]. Visscher Peter M, Wray Naomi R, Zhang Qian, Sklar Pamela, McCarthy Mark I, Brown Matthew A, and Yang Jian. 10 years of gwas discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1):5–22, 2017.
- [30]. Wang Yixin and Blei David M. The blessings of multiple causes. Journal of the American Statistical Association, (just-accepted):1–71, 2019.
- [31]. Yu Jianming, Pressoir Gael, Briggs William H, Bi Irie Vroh, Yamasaki Masanori, Doebley John F, McMullen Michael D, Gaut Brandon S, Nielsen Dahlia M, Holland James B, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature genetics, 38(2):203, 2006.
