Biometrika. 2017 May 19;104(3):583–596. doi: 10.1093/biomet/asx028

Joint sufficient dimension reduction and estimation of conditional and average treatment effects

Ming-Yueh Huang, Kwun Chuen Gary Chan
PMCID: PMC5793490  PMID: 29430034

Summary

The estimation of treatment effects based on observational data usually involves multiple confounders, and dimension reduction is often desirable and sometimes inevitable. We first clarify the definition of a central subspace that is relevant for the efficient estimation of average treatment effects. A criterion is then proposed to simultaneously estimate the structural dimension, the basis matrix of the joint central subspace, and the optimal bandwidth for estimating the conditional treatment effects. The method can easily be implemented by forward selection. Semiparametric efficient estimation of average treatment effects can be achieved by averaging the conditional treatment effects with a different data-adaptive bandwidth to ensure optimal undersmoothing. Asymptotic properties of the estimated joint central subspace and the corresponding estimator of average treatment effects are studied. The proposed methods are applied to a nutritional study, where the covariate dimension is reduced from 11 to an effective dimension of one.

Keywords: Forward selection, High-order kernel, Joint central subspace, Optimal bandwidth, Semiparametric efficiency, Undersmoothing

1. Introduction

Investigating the causal effect of a treatment on an outcome is often the primary interest in medical and social studies. While randomization is the gold standard in identifying treatment effects, often only observational data are available. A major challenge in observational studies is confounding, where the treatment and the outcome of interest are associated with other pretreatment variables, potentially leading to seriously biased estimation of average treatment effects. A simple method of dealing with confounding is local matching. Under conditional independence, the distribution of outcomes in a specific group behaves just like that of a random sample from the group while conditioning on values of confounding variables. Thus, a consistent estimator can be obtained by averaging the differences between groups over the distribution of confounding variables. Hahn (1998) introduced a class of nonparametric imputation estimators that use local matching and showed that they are asymptotically efficient.

Although nonparametric imputation estimators are $n^{1/2}$-consistent under regularity conditions, the remainder terms depend on their biases and variances. In particular, the variance increases with the number of confounders, so a balancing score vector with a smaller dimension is always preferred. According to Rosenbaum & Rubin (1983), the propensity score is the coarsest balancing score, and also has the smallest dimension. However, Hahn (1998) showed that projection onto the true propensity score can be inefficient. Another well-known balancing score is the prognostic score of Hansen (2008). Leacy & Stuart (2014) further combined propensity and prognostic scores to improve the classical matching estimator of average treatment effects. Unfortunately, estimators based directly on propensity and prognostic scores often cannot attain the semiparametric efficiency bound. Hence, finding a suitable balancing score vector with high efficiency and minimal dimension is important in practice.

There are two approaches to reducing the dimension of the covariate vector while keeping as much information as possible about the relationship between response and covariates. The first is variable selection, in which the main goal is to drop redundant variables: de Luna et al. (2011) first defined some subsets of confounding variables which are minimal in the sense that the treatment ceases to be unconfounded for any proper subset of these sets; they also showed that these subsets can reduce the efficiency bound for average treatment effects. Related methods are discussed in Vansteelandt et al. (2012). Instead of using the total subset selection procedure, another method of variable selection is penalized regression. Since confoundedness resides in the conditional distribution of the potential outcomes and treatment variable, given confounders, Ghosh et al. (2015) developed a lasso-type criterion to select redundant variables. The second approach is sufficient dimension reduction, which seeks a few linear combinations of confounders that retain the full information on confoundedness. In a related missing data problem, Hu et al. (2014) introduced effective balancing scores, which are the central subspaces of missing indicators and observed responses, given covariates. While the estimation of average treatment effect can often be regarded as two missing data problems, applying this approach twice would yield two estimates of central subspaces that could be highly collinear.

In this paper, we introduce a joint sufficient dimension reduction model on the propensity score as well as the conditional distributions of potential outcomes, and use the joint central subspace to form a semiparametric efficient estimator of average treatment effects. No stringent parametric model formulation is assumed in the dimension reduction framework. Further, while classical dimension reduction methods consider models on the joint distribution of treatment and potential outcomes, our approach focuses on the marginal distributions and can yield a balancing score with smaller dimension. The exclusion restrictions used in de Luna et al. (2011) for efficiency gains are not required, but the semiparametric efficiency bound is retained by our estimator.

Several approaches to sufficient dimension reduction can be extended to this causal inference problem. In a complete-data setting, these approaches include inverse regression (Li, 1991; Li & Wang, 2007; Zhu et al., 2010), average derivative and minimum average variance estimation (Zhu & Zeng, 2006; Xia, 2007; Wang & Xia, 2008; Yin & Li, 2011), and the semiparametric framework (Ma & Zhu, 2012, 2013; Huang & Chiang, 2017). Here we extend the work of Huang & Chiang (2017) and construct a crossvalidation-type least squares criterion to estimate the structural dimension and the basis matrix simultaneously. The bandwidth used in the semiparametric estimator of the unspecified link function can be selected using the same criterion and attains the optimal rate for estimating the conditional treatment effects. The proposed model is flexible and the average treatment effects can be estimated efficiently.

Although local matching using the propensity score may not be semiparametrically efficient, inverse propensity score weighting with estimated weights has been shown to be efficient (Hirano et al., 2003), and many recent efforts have focused on improving the weighting estimators (Qin & Zhang, 2007; Imai & Ratkovic, 2014; Chan et al., 2016). In contrast to those estimators, our method provides a rate-optimal estimator of covariate-specific treatment effects, which is useful for personalized prediction and the study of heterogeneity.

2. Joint sufficient dimension reduction

2.1. Notation and model construction

Let $Y(0)$ and $Y(1)$ be the potential outcomes when an individual is assigned to the control and treatment groups, respectively, and let $T$ be a binary treatment indicator. Since each unit is either treated or not treated, the observed outcome is $Y=TY(1)+(1-T)Y(0)$. In addition, a vector of covariates or confounders $X\in\mathbb{R}^p$ is observed for each subject, and we make the following conditional independence assumption.

Assumption 1

(Unconfounded treatment assignment). We have that $T\perp\{Y(0),Y(1)\}\mid X$, where $\perp$ denotes statistical independence.

This assumption is often made to identify the average treatment effect $\tau=E\{Y(1)\}-E\{Y(0)\}$. Under Assumption 1, Hahn (1998) and Robins et al. (1994) derived the semiparametric efficiency bound for $\tau$ as

\[
\sigma^2_{\rm eff}=E\left[\{m_1(X)-m_0(X)-\tau\}^2+\frac{\sigma_1^2(X)}{\pi(X)}+\frac{\sigma_0^2(X)}{1-\pi(X)}\right],
\]

where $\pi(X)={\rm pr}(T=1\mid X)$ is the propensity score, $m_k(X)=E\{Y(k)\mid X\}$ is the conditional mean of the potential outcome, and $\sigma_k^2(X)={\rm var}\{Y(k)\mid X\}$ ($k=0,1$). Also, $\sigma^2_{\rm eff}$ can be shown to be the asymptotic variance of a nonparametric imputation estimator that directly uses $X$ as a balancing score. A balancing score with smaller dimension but the same efficiency is obtained if the conditional distributions of $T$, $Y(0)$ and $Y(1)$ given $X$ are captured by a lower-dimensional linear subspace of $X$. Therefore, we focus on finding $B$ such that

\[
T\perp X\mid B^{\rm T}X,\quad Y(0)\perp X\mid B^{\rm T}X,\quad Y(1)\perp X\mid B^{\rm T}X,\qquad (1)
\]

where $B$ is a full-rank $p\times d$ parameter matrix with $d\leq p$; we call the column space of $B$ a joint sufficient dimension reduction subspace. For simplicity, we will write $\mathcal{S}(B)$ for the column space of a matrix $B$. Obviously, (1) holds when $d=p$ and $B$ is the $p\times p$ identity matrix, $I_p$. Thus it always covers the true model. Moreover, if $\mathcal{S}(B)\subseteq\mathcal{S}(\tilde B)$ and $\mathcal{S}(B)$ is a joint sufficient dimension reduction subspace, then $\mathcal{S}(\tilde B)$ will also be a joint sufficient dimension reduction subspace. The most interesting parameter is therefore the joint sufficient dimension reduction subspace of smallest dimension, called the joint central subspace when it exists, which is unique up to an equivalence class as discussed in Remark 2. The corresponding basis matrix $B_0$ has dimension $p\times d_0$. The existence and uniqueness of the joint central subspace can be ensured under some mild conditions, similar to the discussion of Cook (1998) on sufficient dimension reduction for univariate responses.

Alternatively, based on the classical literature on sufficient dimension reduction, one can also consider the model

\[
\{T,Y(0),Y(1)\}\perp X\mid B^{\rm T}X,\qquad (2)
\]

which is different from model (1). In fact, $\mathcal{S}(B_0)$ will be contained in the central subspace of (2). Since the average treatment effect involves only the marginal distributions of $Y(0)$ and $Y(1)$, and it is appealing to seek a balancing score with lower dimension, we consider model (1) instead of model (2).

Based on the definition of a joint central subspace, $B_0^{\rm T}X$ is obviously a balancing score and

\[
T\perp\{Y(0),Y(1)\}\mid B_0^{\rm T}X,
\]

which ensures unbiased estimation of the average treatment effect. The main feature of this balancing score is that it creates both propensity and prognostic balance (Hansen, 2008; Leacy & Stuart, 2014), and we will show in § 3 that it attains the semiparametric efficiency bound in the estimation of $\tau$.
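To fix ideas, the structure in (1) can be illustrated with a toy data-generating process. The sketch below is ours, not the authors': it uses $p=5$ covariates, structural dimension $d_0=1$, a logistic propensity and linear outcome models, all of which are illustrative choices.

```python
import numpy as np

def simulate_joint_sdr(n, seed=0):
    # Toy data-generating process satisfying model (1) with p = 5 covariates
    # and structural dimension d0 = 1: the treatment and both potential
    # outcomes depend on X only through the single index B0'X.
    rng = np.random.default_rng(seed)
    B0 = np.array([1.0, 0.5, -0.5, 0.0, 0.0])   # basis of the joint central subspace
    X = rng.normal(size=(n, 5))
    u = X @ B0                                   # the reduced index B0'X
    pi = 1.0 / (1.0 + np.exp(-u))                # propensity score pi(B0'X)
    T = (rng.random(n) < pi).astype(float)
    y0 = u + rng.normal(size=n)                  # potential outcome Y(0)
    y1 = 1.0 + 2.0 * u + rng.normal(size=n)      # potential outcome Y(1)
    Y = T * y1 + (1.0 - T) * y0                  # observed outcome
    return X, T, Y
```

In this hypothetical design the true average treatment effect is $E\{Y(1)-Y(0)\}=1$ because the index has mean zero, and $B_0^{\rm T}X$ is a one-dimensional balancing score even though $X$ has five components.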

Remark 1.

Unlike sufficient dimension reduction tools such as sliced inverse regression (Li, 1991), we do not require an additional linearity assumption on the covariate distribution.

Remark 2.

Under model (1), $\pi(B^{\rm T}X)$ and $F_k(y\mid B^{\rm T}X)$ ($k=0,1$) remain the same for any basis matrix $B$ with the same column space. In fact, all such $B$ span the same space and are isomorphic up to a linear transformation. The parameter space of $\mathcal{S}(B)$ is called the Grassmann manifold, or Grassmannian, denoted by ${\rm Gr}(d,p)$. To avoid ambiguity, we follow Ma & Zhu (2013) in employing the local coordinate system of the Grassmannian and parameterize the basis by $B=(I_d,C^{\rm T})^{\rm T}$, where $C$ is a $(p-d)\times d$ free parameter matrix. This parameterization is particularly useful in theoretical developments for characterizing the information matrix, and is not an additional model assumption or a restriction on $B_0$. Computation of the proposed estimator does not require fixing the reference variables in advance; see Remark 6.

Remark 3.

Since the main parameters of interest are means of potential outcomes, one could also consider the joint central mean subspace, which is the smallest linear subspace with basis matrix $B_M$ such that

\[
T\perp X\mid B_M^{\rm T}X,\quad Y(0)\perp E\{Y(0)\mid X\}\mid B_M^{\rm T}X,\quad Y(1)\perp E\{Y(1)\mid X\}\mid B_M^{\rm T}X.\qquad (3)
\]

The corresponding method in a complete-data setting can be found in Cook & Li (2002) and Xia (2008). Note that the conditional distribution of $T$ given $X$ remains the same as in the sufficient dimension reduction model because $T$ is binary and its distribution is determined by its mean. Since (3) models the conditional means only and the mean is a functional of the distribution, one can verify that $\mathcal{S}(B_M)\subseteq\mathcal{S}(B_0)$. Moreover, by Theorems 2 and 3 of Rosenbaum & Rubin (1983), $B_M^{\rm T}X$ is also a balancing score, for which Proposition 1 below holds. Thus we obtain a balancing score with a possibly smaller dimension for the estimation of average treatment effects. However, a comparison of $\sigma^2_{\rm eff}$ and (8) in § 3.1 reveals that the efficiency bound will not generally be attained with use of $B_M^{\rm T}X$ as a balancing score; that is, in general there is a trade-off between a lower dimension and a smaller asymptotic variance. This trade-off is also discussed in Hu et al. (2014), who studied mean estimation in missing data. Furthermore, the current formulation is sufficient for the estimation of any conditional functionals, not just the conditional means, and does not require re-estimation of the central subspace for different functionals of interest; see Remark 7. Therefore, we consider model (1) so that all relevant information is kept.

2.2. Simultaneous estimation for the basis and dimension of the joint central subspace

Here we develop an estimation criterion for the joint central subspace with a random sample $\{(T_i,Y_i,X_i)\}_{i=1}^n$. First, we note that model (1) is equivalent to

\[
E(T\mid X)=E(T\mid B^{\rm T}X),\qquad E[1\{Y(k)\leq y\}\mid X]=E[1\{Y(k)\leq y\}\mid B^{\rm T}X]\quad(y\in\mathbb{R};\ k=0,1),
\]

where $1\{\cdot\}$ represents the indicator function. That is, we consider semiparametric regression models with unspecified link functions $\pi$ and $F_k$, regressing the responses $T$ and $1\{Y(k)\leq y\}$ on their corresponding mean functions $\pi(B^{\rm T}X)$ and $F_k(y\mid B^{\rm T}X)$ ($k=0,1$). However, since $Y(0)$ and $Y(1)$ are not always observed, we must be careful in using them as responses. Our key idea comes from the following proposition.

Proposition 1.

Under Assumption 1 and model (1),

\[
E[1\{Y(k)\leq y\}\mid X]=E\{1(Y\leq y)\mid T=k,X\}=E\{1(Y\leq y)\mid T=k,B_0^{\rm T}X\}\quad(y\in\mathbb{R};\ k=0,1).\qquad (4)
\]

The first equality in (4) indicates that the regression problem can be solved by considering treatment and control groups separately. The second equality leads to a sieve approach for the estimation of the unknown link function. Let

\[
\hat\pi(B^{\rm T}x)=\frac{\sum_{j=1}^nT_jK_h\{B^{\rm T}(X_j-x)\}}{\sum_{j=1}^nK_h\{B^{\rm T}(X_j-x)\}},\qquad
\hat F_k(y\mid B^{\rm T}x)=\frac{\sum_{j=1}^nT_j^k(1-T_j)^{1-k}1(Y_j\leq y)K_h\{B^{\rm T}(X_j-x)\}}{\sum_{j=1}^nT_j^k(1-T_j)^{1-k}K_h\{B^{\rm T}(X_j-x)\}}\quad(k=0,1)
\]

denote estimators for $\pi(B^{\rm T}x)$ and $F_k(y\mid B^{\rm T}x)$, where $K_h(u)=K(u/h)/h^d$ for $u\in\mathbb{R}^d$, with $h$ a positive bandwidth and $K$ a $q$th-order kernel function. The choice of $q$ will be discussed in Remark 5. Now we may use Proposition 1 to construct a crossvalidation-type criterion for the estimation of the joint central subspace. Write $\|f\|^2_W=\int f(y)^{\rm T}Wf(y)\,{\rm d}F_Y(y)$ and $\langle f,g\rangle_W=\int f(y)^{\rm T}Wg(y)\,{\rm d}F_Y(y)$ for generic vector-valued functions $f,g$ and matrix $W$, where $F_Y$ is the marginal distribution of $Y$. Let $(T^0,Y^0,X^0)$ be a future run, independent of the current data $\{(T_i,Y_i,X_i)\}_{i=1}^n$, and define the prediction risk

\[
E\{\|\mathcal{Y}_y^0-\hat F(y\mid B^{\rm T}X^0)\|^2_{W^0}\},\qquad (5)
\]

where

\[
\mathcal{Y}_y^0=\{T^0,1(Y^0\leq y),1(Y^0\leq y)\}^{\rm T},\quad
\hat F(y\mid B^{\rm T}X)=\{\hat\pi(B^{\rm T}X),\hat F_0(y\mid B^{\rm T}X),\hat F_1(y\mid B^{\rm T}X)\}^{\rm T},\quad
W^0=(1-\pi)e_1e_1^{\rm T}+(1-T^0)e_2e_2^{\rm T}+T^0e_3e_3^{\rm T},
\]

with $\pi={\rm pr}(T=1)$ being the marginal probability of treatment and $e_1,e_2,e_3$ the standard basis of $\mathbb{R}^3$. The weight $W^0$ is used to treat the squared error as an integration over the distribution of $Y$. Further, let $F(y\mid B^{\rm T}X)=\{\pi(B^{\rm T}X),F_0(y\mid B^{\rm T}X),F_1(y\mid B^{\rm T}X)\}^{\rm T}$. A simple calculation shows that the prediction risk in (5) can be decomposed into

\[
\sigma_0^2+b_0^2(B)+{\rm MISE}_B(h)+C(B,h),\qquad (6)
\]

where

\[
\begin{aligned}
\sigma_0^2&=E\{\|\mathcal{Y}_y^0-F(y\mid B_0^{\rm T}X^0)\|^2_{W^0}\},\\
b_0^2(B)&=E\{\|F(y\mid B_0^{\rm T}X^0)-F(y\mid B^{\rm T}X^0)\|^2_{W^0}\},\\
{\rm MISE}_B(h)&=E\{\|F(y\mid B^{\rm T}X^0)-\hat F(y\mid B^{\rm T}X^0)\|^2_{W^0}\},\\
C(B,h)&=2E\{\langle F(y\mid B_0^{\rm T}X^0)-F(y\mid B^{\rm T}X^0),\,F(y\mid B^{\rm T}X^0)-\hat F(y\mid B^{\rm T}X^0)\rangle_{W^0}\}.
\end{aligned}
\]

As $h\to0$ and $nh^d\to\infty$, it is shown in the Supplementary Material that the last two terms of (6) converge to zero and are dominated by the first two terms. Note that $b_0^2(B)\geq0$, and $b_0^2(B)=0$ when $\mathcal{S}(B)$ is a joint sufficient dimension reduction subspace. Hence, the minimum of the prediction risk must occur when $b_0^2(B)=0$. Moreover, since our model has a nested structure, the prediction risk decreases with the working dimension $d$ when $d\leq d_0$. On the other hand, when $d\geq d_0$ and $b_0^2(B)=0$, the excess prediction risk ${\rm MISE}_B(h)$ has an asymptotic order of $n^{-2q/(2q+d)}$, which increases with $d$. Therefore, for large enough $n$, the minimal prediction risk occurs at the joint central subspace $\mathcal{S}(B_0)$. A formal result is stated as follows.

Proposition 2.

Under Assumption 1 and model (1), the joint central subspace $\mathcal{S}(B_0)$ and the optimal bandwidth $h_{\rm opt}$ minimize the prediction risk in (5) as $n\to\infty$, with $h_{\rm opt}=c_0n^{-1/(2q+d_0)}\{1+o(1)\}$, where the constant $c_0$ is given in the Supplementary Material.

The proof of Proposition 2 is given in the Supplementary Material. According to Proposition 2 and the fact that the prediction risk is asymptotically convex in the working dimension $d$, we obtain our estimator through the following algorithm.

Step 1.

For $d=0$, calculate

\[
{\rm CV}_0=\frac{1}{n}\sum_{i=1}^n\left[(T_i-\bar T)^2(1-\bar T)+\sum_{k=0}^1T_i^k(1-T_i)^{1-k}\int\{1(Y_i\leq y)-\hat F_k(y)\}^2\,{\rm d}\hat F_Y(y)\right],
\]

where $\bar T=n^{-1}\sum_{i=1}^nT_i$, $\hat F_Y$ is the empirical distribution of $Y$, and $\hat F_k(y)=\sum_{i=1}^nT_i^k(1-T_i)^{1-k}1(Y_i\leq y)/\sum_{i=1}^nT_i^k(1-T_i)^{1-k}$ ($k=0,1$).

Step 2.

For $d=1,2,\ldots$, let $(\hat B_d,\hat h_d)$ be the minimizer of ${\rm CV}(B,h)$ among all $p\times d$ matrices $B$ and positive $h$, where

\[
{\rm CV}(B,h)=\frac{1}{n}\sum_{i=1}^n\left[\{T_i-\hat\pi^{-i}(B^{\rm T}X_i)\}^2(1-\bar T)+\sum_{k=0}^1T_i^k(1-T_i)^{1-k}\int\{1(Y_i\leq y)-\hat F_k^{-i}(y\mid B^{\rm T}X_i)\}^2\,{\rm d}\hat F_Y(y)\right];
\]

here the superscript $-i$ indicates the estimator based on the sample with the $i$th subject deleted. Then calculate ${\rm CV}_d={\rm CV}(\hat B_d,\hat h_d)$.

Step 3.

Repeat Step 2 until ${\rm CV}_d>{\rm CV}_{d-1}$ with $d\geq1$, and set $\hat d=d-1$. The proposed estimator is $(\hat B,\hat h)=(\hat B_{\hat d},\hat h_{\hat d})$.
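As a rough illustration of the criterion evaluated in Steps 1–3, the leave-one-out quantity ${\rm CV}(B,h)$ can be sketched in Python as follows. This is a simplified stand-in written by us, not the authors' code: it uses a second-order Gaussian kernel (the theory calls for higher-order kernels, see Remark 5), and it only evaluates the criterion at a given $(B,h)$ rather than performing the Grassmann optimization of Step 2.

```python
import numpy as np

def cv_criterion(B, h, X, T, Y):
    # Leave-one-out crossvalidation criterion CV(B, h): a squared-error term
    # for the propensity plus integrated squared errors for the two
    # conditional distribution functions, integrated over the empirical
    # distribution of Y.
    n = len(T)
    Z = X @ B                                   # reduced index B'X
    D = (Z[:, None, :] - Z[None, :, :]) / h
    W = np.exp(-0.5 * (D ** 2).sum(axis=-1))    # Gaussian kernel weights
    np.fill_diagonal(W, 0.0)                    # delete the i-th subject
    pi_loo = (W @ T) / W.sum(axis=1)            # leave-one-out propensity
    out = np.mean((T - pi_loo) ** 2) * (1 - T.mean())
    ygrid = np.sort(Y)                          # integrate over empirical F_Y
    for k in (0, 1):
        grp = np.where(T == k)[0]
        ind = (Y[grp, None] <= ygrid[None, :]).astype(float)
        Wk = W[:, grp]
        for i in grp:
            F_loo = (Wk[i] @ ind) / Wk[i].sum()
            out += np.mean(((Y[i] <= ygrid) - F_loo) ** 2) / n
    return out
```

In practice one would minimize this criterion over $B$ and $h$ for each working dimension $d$; in a toy model where treatment and outcome depend only on the first covariate, the criterion is smaller at the true direction than at an irrelevant one.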

We will show that ${\rm CV}(B,h)$ converges uniformly to the prediction risk as $n\to\infty$ in the proof of the following theorem, and the proposed algorithm can simultaneously estimate the joint central subspace and the structural dimension consistently. In summary, our proposed method, which is easily implemented in practice, can simultaneously estimate the basis matrix, the structural dimension of the joint central subspace, and the optimal bandwidth. The asymptotic properties of the proposed estimators are established in the following theorem.

Theorem 1.

Suppose that Assumption 1 and Conditions A1–A5 in the Supplementary Material are satisfied. Then ${\rm pr}(\hat d=d_0)\to1$, $\hat h/h_{\rm opt}\to1$ in probability, and

\[
n^{1/2}{\rm vec}(\hat B-B_0)1(\hat d=d_0)=n^{-1/2}\sum_{i=1}^n\xi_{B_0}(T_i,Y_i,X_i)+o_p(1)\to N(0,\Sigma_{B_0})
\]

in distribution as $n\to\infty$, where ${\rm vec}(\cdot)$ is the columnwise matrix vectorization operator, $E\{\xi_{B_0}(T,Y,X)\}=0$ and $\Sigma_{B_0}=E\{\xi_{B_0}(T,Y,X)\xi_{B_0}(T,Y,X)^{\rm T}\}$, with $\xi_{B_0}$ and $\Sigma_{B_0}$ as defined in the Supplementary Material.

Remark 4.

The proposed criterion selects the basis matrix and the structural dimension of the joint central subspace simultaneously, which is different from most existing sufficient dimension reduction methods such as inverse regression (Zhu et al., 2010), minimum average variance estimation (Yin & Li, 2011) and semiparametric approaches (Ma & Zhu, 2012, 2013). Moreover, the bandwidth chosen by this criterion is the rate-optimal bandwidth in terms of the mean integrated squared error, which will be further discussed in § 2.3.

Remark 5.

We use different bandwidths $h_d$ for the working dimension $d$ in the proposed algorithm, and we show in the proof of Theorem 1 that $\hat h_d\asymp n^{-1/(2q+d)}$ in probability. Coupled with the restriction in Condition A3, the order of the kernel function should satisfy $q>d/2$. Since we always use a symmetric kernel function, whose order will be even, and require the order to be as small as possible, a suitable choice is $q=2\lfloor d/4\rfloor+2$ for each working dimension $d$, where $\lfloor\cdot\rfloor$ denotes the floor function.
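For concreteness, a standard even-order kernel of order four is the Gaussian-based kernel below. This is a textbook construction, not one specified by the paper: it integrates to one, its second moment vanishes, and its fourth moment is nonzero, at the price of taking negative values for $|u|>\sqrt{3}$.

```python
import numpy as np

def kernel4(u):
    # Fourth-order Gaussian-based kernel:
    #   K4(u) = (3/2 - u^2/2) * phi(u),
    # where phi is the standard normal density.  Its zeroth moment is 1,
    # its second moment is 0, and its fourth moment is -3.
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return (1.5 - 0.5 * u ** 2) * phi
```

The moment conditions can be checked by numerical integration, which is a quick sanity test before plugging any candidate kernel into the criterion.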

Remark 6.

Grassmannian optimization algorithms in Edelman et al. (1999) and Adragni et al. (2012) can be used in Step 2 without fixing the reference variables a priori. Those algorithms are a modification of conventional gradient-based algorithms, which consider movements along geodesics based on a metric defined in the tangent space of the Grassmann manifold. Since the optimization is nonconvex, a reliable initial value is needed. We suggest using the kernel-based method of Fukumizu & Leng (2014), as it is the fastest method known to date. That method is consistent but does not attain an $n^{1/2}$ convergence rate. Given this initial value, Step 2 can be implemented in a slightly different manner. The initial value could first be transformed into a local coordinate representation by Gaussian elimination, where the reference variables are chosen to be those with the largest columnwise coefficients in absolute value for numerical stability. Then the optimization can be performed with respect to free parameters in the Euclidean local coordinate system. We have compared these two methods and different initial values through simulations reported in the Supplementary Material, and found them to yield similar performance.

2.3. Estimation of conditional effects

An important advantage of our proposed criterion is that it selects the bandwidth simultaneously with the estimation of the joint central subspace, and the selected bandwidth $\hat h$ minimizes the mean integrated squared error asymptotically. In particular, the bandwidth achieves the optimal rate for estimating conditional regression functions. Hence, we may use the bandwidth directly to obtain estimators of the conditional effects $E\{Y(k)\mid X=x\}$ ($k=0,1$):

\[
\hat E\{Y(k)\mid X=x\}=\frac{\sum_{i=1}^nY_iT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}{\sum_{i=1}^nT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}\quad(k=0,1).\qquad (7)
\]

At the dimension reduction stage, we suggest using at least a fourth-order kernel function to ensure the large-sample properties of the estimated joint central subspace, as discussed in Remark 5. However, in practice the negative weights of a higher-order kernel can be detrimental to the stability of the resulting estimators. One possible way to obtain a more stable estimate is to use a second-order kernel function in (7) and substitute an adjusted bandwidth, so that the resulting convergence rate of (7) is optimal with respect to a mean integrated squared error based on a second-order kernel function. In our numerical experiments we have found that the finite-sample performance of (7) is much better when a second-order kernel and an adjusted bandwidth are used.

To estimate the variance of the estimated conditional effects, an infinitesimal jackknife estimator can be applied. The idea is to perturb the empirical weight $n^{-1}$ in the original estimator by a small amount $\epsilon$ and then take the limit as $\epsilon\to0$. More precisely, if we write $\hat E\{Y(k)\mid X=x\}$ as a function of $n$ variables

\[
Q_k(w_1,\ldots,w_n)=\frac{\sum_{i=1}^nw_iY_iT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}{\sum_{i=1}^nw_iT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}
\]

evaluated at $w_1=\cdots=w_n=n^{-1}$, the infinitesimal jackknife estimator of variance is $\sum_{i=1}^n\hat D_{k,i}^2$, where $n\hat D_{k,i}$ is the derivative of $Q_k$ with respect to $w_i$ evaluated at $(n^{-1},\ldots,n^{-1})$; that is, for $k=0$ or $1$,

\[
\hat D_{k,i}=\frac{Y_iT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}{\sum_{j=1}^nT_j^k(1-T_j)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_j-x)\}}-\hat E\{Y(k)\mid X=x\}\,\frac{T_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}{\sum_{j=1}^nT_j^k(1-T_j)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_j-x)\}}.
\]

To estimate the variance of $\hat E\{Y(1)\mid X=x\}-\hat E\{Y(0)\mid X=x\}$, we can directly apply the infinitesimal jackknife estimator $\sum_{i=1}^n(\hat D_{1,i}-\hat D_{0,i})^2$.
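A minimal sketch of this infinitesimal jackknife calculation is given below, under the simplifying assumptions of a second-order Gaussian kernel and our own function names; it returns the estimated conditional effect at a point together with its jackknife standard error.

```python
import numpy as np

def nw_effect_and_ij_se(X, T, Y, B, h, x):
    # Conditional effect estimate E{Y(1)|X=x} - E{Y(0)|X=x} on the reduced
    # index B'X, with an infinitesimal-jackknife standard error.
    K = np.exp(-0.5 * np.sum(((X - x) @ B / h) ** 2, axis=-1))
    n = len(T)
    D = np.zeros((2, n))
    q = np.zeros(2)
    for k in (0, 1):
        a = np.where(T == k, K, 0.0)            # kernel weights in group k
        q[k] = np.sum(a * Y) / np.sum(a)        # Nadaraya-Watson estimate
        # scaled derivative of the weighted estimator in the i-th weight
        D[k] = a * (Y - q[k]) / np.sum(a)
    effect = q[1] - q[0]
    se = np.sqrt(np.sum((D[1] - D[0]) ** 2))
    return effect, se
```

When the outcome is constant within each treatment group the jackknife derivatives vanish, so the standard error is exactly zero; that limiting case is a convenient check on the algebra.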

Remark 7.

The estimator (7) can be extended to estimate $E[g\{Y(k)\}\mid X=x]$ for real-valued functions $g$ in the following way:

\[
\hat E[g\{Y(k)\}\mid X=x]=\frac{\sum_{i=1}^ng(Y_i)T_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}{\sum_{i=1}^nT_i^k(1-T_i)^{1-k}K_{\hat h}\{\hat B^{\rm T}(X_i-x)\}}\quad(k=0,1).
\]

Model (1) guarantees that $E[g\{Y(k)\}\mid X]=E[g\{Y(k)\}\mid B_0^{\rm T}X]$ for an arbitrary function $g$.

3. Efficient estimation of average treatment effects

3.1. Semiparametric efficiency bound and the efficient estimator

One should note that $\sigma^2_{\rm eff}$ is the efficiency bound for the nonparametric model without any specification of the forms of $\pi$, $F_0$ and $F_1$. Since model (1) imposes a multi-index structure on these distribution functions, some efficiency gain might be expected. However, we have found that the dimension reduction structure (1) is ancillary for the estimation of $\tau$, so knowledge of the joint central subspace does not reduce the asymptotic variance bound.

Theorem 2.

Under model (1), the semiparametric efficiency bound of $\tau$ is $\sigma^2_{\rm eff}$.

According to Rosenbaum & Rubin (1983) and Hansen (2008), the average treatment effect $\tau$ can be consistently estimated through balancing scores or prognostic scores. In fact, if $b(X)$ satisfies

\[
T\perp\{Y(0),Y(1)\}\mid b(X),
\]

then $\tau=E[m_1\{b(X)\}-m_0\{b(X)\}]$, and $\tau$ can be estimated by

\[
\hat\tau_b=\frac{1}{n}\sum_{i=1}^n[\hat m_1\{b(X_i)\}-\hat m_0\{b(X_i)\}],
\]

where

\[
\hat m_k\{b(x)\}=\frac{\sum_{j=1}^nY_jT_j^k(1-T_j)^{1-k}K_\varsigma\{b(X_j)-b(x)\}}{\sum_{j=1}^nT_j^k(1-T_j)^{1-k}K_\varsigma\{b(X_j)-b(x)\}}.
\]

Since $\hat m_k\{b(x)\}$ is an estimator of $E\{Y\mid T=k,b(X)=b(x)\}$, we can follow the proof of Hahn (1998) for nonparametric imputation estimators and obtain the asymptotic variance of $\hat\tau_b$ as

\[
E\left([m_1\{b(X)\}-m_0\{b(X)\}-\tau]^2+\frac{\sigma_1^2\{b(X)\}}{\pi\{b(X)\}}+\frac{\sigma_0^2\{b(X)\}}{1-\pi\{b(X)\}}\right)\qquad (8)
\]

where $m_k\{b(X)\}=E\{Y\mid T=k,b(X)\}$, $\sigma_k^2\{b(X)\}={\rm var}\{Y\mid T=k,b(X)\}$ and $\pi\{b(X)\}={\rm pr}\{T=1\mid b(X)\}$, under some regularity conditions. Under model (1) and with $b(x)=B_0^{\rm T}x$, the asymptotic variance attains the semiparametric efficiency bound $\sigma^2_{\rm eff}$ of Hahn (1998).

A simple estimator of $\tau$ is

\[
\hat\tau=\frac{1}{n}\sum_{i=1}^n\{\hat m_1(\hat B^{\rm T}X_i)-\hat m_0(\hat B^{\rm T}X_i)\},
\]

where $\hat B$ is an estimator of $B_0$. In this study, we further show that the asymptotic variance of $\hat B$ does not affect the asymptotic behaviour of $\hat\tau$ and that $\hat\tau$ is semiparametrically efficient.
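The imputation estimator $\hat\tau$ can be sketched as follows, again with an illustrative second-order Gaussian kernel and function names of our own; a practical implementation would use the estimated basis matrix and an undersmoothed bandwidth as discussed below.

```python
import numpy as np

def nw_mean(X, T, Y, B, h, x, k):
    # Nadaraya-Watson estimate of m_k(B'x): regression of Y on the reduced
    # index B'X within the group T = k.
    K = np.exp(-0.5 * np.sum(((X - x) @ B / h) ** 2, axis=-1))
    a = np.where(T == k, K, 0.0)
    return np.sum(a * Y) / np.sum(a)

def ate(X, T, Y, B, h):
    # Average treatment effect: average of imputed conditional effects
    # m1_hat(B'X_i) - m0_hat(B'X_i) over the sample.
    return np.mean([nw_mean(X, T, Y, B, h, x, 1) - nw_mean(X, T, Y, B, h, x, 0)
                    for x in X])
```

In the degenerate case where the outcome equals a constant shift of the treatment indicator, the imputed group means are exact and the estimator recovers the shift exactly, which provides a simple correctness check.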

Although the estimator of the average treatment effect is an average of conditional treatment effects, one requires a different bandwidth to attain optimal undersmoothing. We first provide the asymptotic distribution of the proposed estimator for a range of bandwidths satisfying Condition A6 in the Supplementary Material; then a data-adaptive method for choosing the bandwidth is discussed in § 3.2.

Theorem 3.

Suppose that Assumption 1 and Conditions A1–A6 in the Supplementary Material are satisfied. Then $n^{1/2}(\hat\tau-\tau)\to N(0,\sigma^2_{\rm eff})$ in distribution as $n\to\infty$.

In practice, the semiparametric efficiency bound can be estimated by a direct plug-in estimator

\[
\hat\sigma^2_{\rm eff}=\frac{1}{n}\sum_{i=1}^n\left[\{\hat m_1(\hat B^{\rm T}X_i)-\hat m_0(\hat B^{\rm T}X_i)-\hat\tau\}^2+\frac{\hat\sigma_1^2(\hat B^{\rm T}X_i)}{\hat\pi(\hat B^{\rm T}X_i)}+\frac{\hat\sigma_0^2(\hat B^{\rm T}X_i)}{1-\hat\pi(\hat B^{\rm T}X_i)}\right],
\]

where

\[
\hat\sigma_k^2(B^{\rm T}x)=\frac{\sum_{i=1}^nT_i^k(1-T_i)^{1-k}Y_i^2K_\varsigma\{B^{\rm T}(X_i-x)\}}{\sum_{i=1}^nT_i^k(1-T_i)^{1-k}K_\varsigma\{B^{\rm T}(X_i-x)\}}-\hat m_k^2(B^{\rm T}x)\quad(k=0,1).
\]

The bandwidth $\varsigma$ can be replaced by that used to estimate $\hat m_k$.
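The plug-in variance estimate above admits a compact sketch, again with an illustrative second-order Gaussian kernel and our own names; all conditional quantities are kernel estimates on the reduced index.

```python
import numpy as np

def sigma_eff_sq(X, T, Y, B, h):
    # Plug-in estimate of the efficiency bound: kernel estimates of m_k,
    # sigma_k^2 and pi on the reduced index B'X, averaged over the sample.
    n = len(T)
    Z = X @ B
    W = np.exp(-0.5 * (((Z[None, :, :] - Z[:, None, :]) / h) ** 2).sum(axis=-1))
    pi_hat = (W @ T) / W.sum(axis=1)            # propensity estimates
    m = np.zeros((2, n))
    s2 = np.zeros((2, n))
    for k in (0, 1):
        A = W * np.where(T == k, 1.0, 0.0)[None, :]   # group-k kernel weights
        denom = A.sum(axis=1)
        m[k] = (A @ Y) / denom                  # conditional mean m_k
        s2[k] = (A @ (Y ** 2)) / denom - m[k] ** 2    # conditional variance
    tau = np.mean(m[1] - m[0])
    return np.mean((m[1] - m[0] - tau) ** 2
                   + s2[1] / pi_hat + s2[0] / (1 - pi_hat))
```

With a Gaussian kernel every subject receives positive weight from both groups, so the estimated propensities stay strictly inside $(0,1)$ and the ratios are well defined.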

Remark 8.

A slight variation of $\hat\tau_b$ can be constructed in the spirit of Cheng (1994). Since the potential outcomes are only partially unobservable, Cheng (1994) suggested imputing an unobserved value with its conditional expectation. More precisely, the estimator is

\[
\frac{1}{n}\sum_{i=1}^n\big([T_iY_i+(1-T_i)\hat m_1\{b(X_i)\}]-[(1-T_i)Y_i+T_i\hat m_0\{b(X_i)\}]\big).
\]

By paralleling the proof of Cheng (1994), one can show that the asymptotic distribution of this estimator is the same as that of $\hat\tau_b$, and hence neither is better in general. In our simulation studies, we have found that this alternative estimator has slightly smaller bias than $\hat\tau_b$ and a very similar standard deviation.

3.2. Bandwidth selection

As indicated in Theorem 3, the bandwidth $\varsigma$ used in the nonparametric imputation should be smaller than the classical optimal bandwidth with rate $n^{-1/(2q+d_0)}$, so an important issue in practice is how to select a proper bandwidth. Häggström & de Luna (2014) suggested minimizing the conditional mean squared error of $\hat\tau$, which is of the form

\[
E\left(\left[\hat\tau-\frac{1}{n}\sum_{i=1}^n\{m_1(B_0^{\rm T}X_i)-m_0(B_0^{\rm T}X_i)\}\right]^2\,\Big|\,X_1,\ldots,X_n\right).
\]

In our simulation experiments we have found that the bandwidth $\varsigma$ which minimizes the sample analogue of the conditional mean squared error

\[
E\left[\left\{\frac{1}{n}\sum_{i=1}^n\hat m_k(B_0^{\rm T}X_i)-\frac{1}{n}\sum_{i=1}^nm_k(B_0^{\rm T}X_i)\right\}^2\,\Big|\,X_1,\ldots,X_n\right]\quad(k=0,1)\qquad (9)
\]

leads to a slightly better estimator for $\tau$. The main difference is that Häggström & de Luna (2014) used a local linear regression instead of a Nadaraya–Watson estimator to estimate the conditional effects. Since we directly adopt the local constant smoothing estimator and the estimated optimal bandwidth in the dimension reduction stage, a separate criterion might be helpful for alleviating the boundary effects of the Nadaraya–Watson estimator. Following the proof of Häggström & de Luna (2014), we can show that (9) is asymptotically equivalent to

\[
\begin{aligned}
&\frac{1}{n}\int\sigma_k^2(B_0^{\rm T}x)f_X(x)\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}\,{\rm d}x
+\frac{\sigma_K^2}{n^2\varsigma^{d_0}}\int\frac{\sigma_k^2(B_0^{\rm T}x)}{\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}}\,{\rm d}x\\
&\quad+\varsigma^{2q}\left(\frac{\mu_{q,K}}{q!}\right)^2\left(\int\left[\frac{D^q_{B^{\rm T}x}[m_k(B_0^{\rm T}x)\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}f_{B_0^{\rm T}X}(B_0^{\rm T}x)]}{\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}}\right.\right.\\
&\qquad\left.\left.-\,m_k(B_0^{\rm T}x)\frac{D^q_{B^{\rm T}x}[\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}f_{B_0^{\rm T}X}(B_0^{\rm T}x)]}{\pi^k(B_0^{\rm T}x)\{1-\pi(B_0^{\rm T}x)\}^{1-k}}\right](1_{d_0},\ldots,1_{d_0})\,{\rm d}x\right)^2,
\end{aligned}
\]

where $\sigma_K^2=\int K^2(u)\,{\rm d}u$, $\mu_{q,K}=\int u^qK(u)\,{\rm d}u$, $D^q_{B^{\rm T}x}$ is the $q$th-order derivative with respect to $B^{\rm T}x$ in tensor form, $1_{d_0}$ is the all-ones vector with dimension $d_0$, and the optimal bandwidth is asymptotically equivalent to $cn^{-2/(2q+d_0)}$ for some constant $c$. According to Condition A6 in the Supplementary Material, we require $q>d_0/2$ to ensure the asymptotic normality of $\hat\tau$ if the estimated optimal bandwidth is used. However, we find that the second-order kernel still works very well in practice. In addition, consistency can be guaranteed under the weaker assumptions that $\varsigma\to0$ and $n\varsigma^{d_0}\to\infty$ as $n\to\infty$.

4. Application

In this section, we demonstrate our proposed method by applying it to the 2007–2008 National Health and Nutrition Examination Survey, the main goal of which was to investigate the health and nutrition statuses of children and adults in the United States. We focus on a subset of the data (Kohn et al., 2014) and study whether participation in the National School Lunch or School Breakfast programme would lead to an increase in body mass index for children and youths aged 4 to 17. The dataset contains 2330 children and youths, of whom 1284 (55%) participated in the school meal programme. The covariates are child age, $X_1$; child gender, $X_2$; black race, $X_3$; Hispanic race, $X_4$; family above 200% of the federal poverty level, $X_5$; participation in the Special Supplemental Nutrition Program for Women, Infants, and Children, $X_6$; participation in the Food Stamp Program, $X_7$; a childhood food security measurement, $X_8$; health insurance coverage, $X_9$; gender of survey respondent, $X_{10}$; and age of survey respondent, $X_{11}$.

The estimated structural dimension of the joint central subspace is 1 and the estimated linear index is shown in Table 1. Based on this balancing score, the estimated average difference in body mass index between participants and nonparticipants is Inline graphic with a standard error of Inline graphic. Furthermore, the 95% confidence interval (Inline graphic) indicates an insignificant difference in body mass index between participants and nonparticipants, which is consistent with the conclusion that Chan et al. (2016) reached through weighting estimators. Therefore, participation in the school meal programme does not seem to be correlated with excessive food consumption.

Table 1.

Health and Nutrition Examination Survey data: estimated structural dimension and coefficients of the linear index

Inline graphic  1

                 Estimate (SE)                     Estimate (SE)
Inline graphic   0·006 (0·0025)   Inline graphic   0·016 (0·0025)
Inline graphic   0·004 (0·0021)   Inline graphic   0·020 (0·0028)
Inline graphic   0·031 (0·0039)   Inline graphic   0·002 (0·0013)
Inline graphic  −0·030 (0·0039)   Inline graphic   0·029 (0·0015)
Inline graphic   0·000 (0·0027)   Inline graphic   0·965 (0·0002)

Figure 1 plots the estimated difference in body mass index between the two groups as a function of the estimated linear index. The standard errors are obtained using the infinitesimal jackknife. In general, the difference in average body mass index between the two groups is insignificant. However, the participants tend to have a slightly higher body mass index than nonparticipants when the linear index lies between 40 and 50, which accounts for the slightly positive estimated average treatment effect.

Fig. 1.

Estimated difference in body mass index between participants and nonparticipants of the school meal programme plotted against the estimated linear index for the 2007–2008 National Health and Nutrition Examination Survey data. The solid line represents the estimated conditional effects and the dashed lines represent pointwise 95% confidence limits.
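For a one-dimensional index such as the one estimated here, the conditional and average effects can be sketched as the difference of two kernel regressions on the index, averaged over the empirical distribution of the index. The following is a minimal illustration with a Gaussian kernel and a fixed bandwidth, not the paper's estimator, which uses higher-order kernels and separate data-adaptive bandwidths for the conditional and average effects; the function names and simulated data are hypothetical.

```python
import numpy as np

def conditional_effect(index, y, d, grid, h):
    """Difference of Nadaraya-Watson regressions of the outcome on the
    one-dimensional index within each treatment group, evaluated at the
    points in `grid`, using a Gaussian kernel with bandwidth h."""
    def nw(x0, x, yy):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        return np.sum(w * yy) / np.sum(w)
    m1 = np.array([nw(g, index[d == 1], y[d == 1]) for g in grid])
    m0 = np.array([nw(g, index[d == 0], y[d == 0]) for g in grid])
    return m1 - m0

def average_effect(index, y, d, h):
    """Average the conditional effects over the empirical distribution of
    the index to obtain an average treatment effect estimate."""
    return conditional_effect(index, y, d, index, h).mean()

# Illustration on simulated data with a known constant treatment effect of 2
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                       # plays the role of the estimated index
d = (rng.uniform(size=n) < 0.5).astype(int)  # treatment indicator
y = x + 2.0 * d + rng.normal(scale=0.1, size=n)
ate = average_effect(x, y, d, h=0.3)
```

Pointwise confidence limits such as those in Fig. 1 would additionally require variance estimates, for instance via the infinitesimal jackknife used in the paper.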

5. Discussion

Recently, Ma & Zhu (2013) introduced an efficient estimating equation, derived from a likelihood approach, that attains the semiparametric efficiency bound for the central subspace. However, this bound is derived for a fixed structural dimension, and in practice the true structural dimension is unknown. Our proposed estimator estimates the structural dimension and the basis matrix simultaneously.

In observational studies, continuous treatments or exposures are also common. In the literature, a generalized propensity score has been introduced to estimate continuous treatment effects; see, for example, Imbens (2000). Since our proposed model accommodates the propensity score and the outcome regressions jointly, it would be interesting to extend this modelling approach to continuous treatment regimes.


Acknowledgement

The authors thank the editor, an associate editor, two reviewers, and Dr Mary Lou Thompson for their helpful comments and suggestions. The authors were partially supported by the National Heart, Lung, and Blood Institute of the U.S. National Institutes of Health.

Supplementary material

Supplementary material available at Biometrika online includes a comparison of several alternative estimation criteria for the joint central subspace, additional simulation results, and the proofs of Proposition 2 and Theorems 1–3.

References

1. Adragni K. P., Cook R. D. & Wu S. (2012). Grassmannoptim: An R package for Grassmann manifold optimization. J. Statist. Software 50, 1–18.
2. Chan K. C. G., Yam S. C. P. & Zhang Z. (2016). Globally efficient nonparametric inference of average treatment effects by empirical balancing calibration weighting. J. R. Statist. Soc. B 78, 673–700.
3. Cheng P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. J. Am. Statist. Assoc. 89, 81–7.
4. Cook R. D. (1998). Regression Graphics. New York: Wiley.
5. Cook R. D. & Li B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30, 455–74.
6. de Luna X., Waernbaum I. & Richardson T. S. (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika 98, 861–75.
7. Edelman A., Arias T. A. & Smith S. T. (1999). The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20, 303–53.
8. Fukumizu K. & Leng C. (2014). Gradient-based kernel dimension reduction for regression. J. Am. Statist. Assoc. 109, 359–70.
9. Ghosh D., Zhu Y. & Coffman D. L. (2015). Penalized regression procedures for variable selection in the potential outcomes framework. Statist. Med. 34, 1645–58.
10. Häggström J. & de Luna X. (2014). Targeted smoothing parameter selection for estimating average causal effects. Comp. Statist. 29, 1727–48.
11. Hahn J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315–31.
12. Hansen B. B. (2008). The prognostic analogue of the propensity score. Biometrika 95, 481–8.
13. Hirano K., Imbens G. W. & Ridder G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 1161–89.
14. Hu Z., Follmann D. A. & Wang N. (2014). Estimation of mean response via the effective balancing score. Biometrika 101, 613–24.
15. Huang M.-Y. & Chiang C. T. (2017). An effective semiparametric estimation approach for the sufficient dimension reduction model. J. Am. Statist. Assoc., to appear, doi:10.1080/01621459.2016.1215987.
16. Imai K. & Ratkovic M. (2014). Covariate balancing propensity score. J. R. Statist. Soc. B 76, 243–63.
17. Imbens G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87, 706–10.
18. Kohn M. J., Bell J. F., Grow M. G. & Chan K. C. G. (2014). Food insecurity, food assistance and weight status in US youth: New evidence from NHANES 2007–08. Pediatric Obesity 9, 155–66.
19. Leacy F. P. & Stuart E. A. (2014). On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: A simulation study. Statist. Med. 33, 3488–508.
20. Li B. & Wang S. (2007). On directional regression for dimension reduction. J. Am. Statist. Assoc. 102, 997–1008.
21. Li K.-C. (1991). Sliced inverse regression for dimension reduction (with Discussion). J. Am. Statist. Assoc. 86, 316–42.
22. Ma Y. & Zhu L. (2012). A semiparametric approach to dimension reduction. J. Am. Statist. Assoc. 107, 168–79.
23. Ma Y. & Zhu L. (2013). Efficient estimation in sufficient dimension reduction. Ann. Statist. 41, 250–68.
24. Qin J. & Zhang B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. J. R. Statist. Soc. B 69, 101–22.
25. Robins J. M., Rotnitzky A. & Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 89, 846–66.
26. Rosenbaum P. R. & Rubin D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
27. Vansteelandt S., Bekaert M. & Claeskens G. (2012). On model selection and model misspecification in causal inference. Statist. Meth. Med. Res. 21, 7–30.
28. Wang H. & Xia Y. (2008). Sliced regression for dimension reduction. J. Am. Statist. Assoc. 103, 811–21.
29. Xia Y. (2007). A constructive approach to the estimation of dimension reduction directions. Ann. Statist. 35, 2654–90.
30. Xia Y. (2008). A multiple-index model and dimension reduction. J. Am. Statist. Assoc. 103, 1631–40.
31. Yin X. & Li B. (2011). Sufficient dimension reduction based on an ensemble of minimum average variance estimators. Ann. Statist. 39, 3392–416.
32. Zhu L. P., Zhu L. X. & Feng Z. H. (2010). Dimension reduction in regressions through cumulative slicing estimation. J. Am. Statist. Assoc. 105, 1455–66.
33. Zhu Y. & Zeng P. (2006). Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Am. Statist. Assoc. 101, 1638–51.
