Analysis of Double Single Index Models

Kun Chen; Yanyuan Ma

doi:10.1111/sjos.12238

Author manuscript; available in PMC: 2018 Mar 1.

Published in final edited form as: Scand Stat Theory Appl. 2016 Aug 22;44(1):1–20. doi: 10.1111/sjos.12238


PMCID: PMC5352986 NIHMSID: NIHMS816420 PMID: 28316363

Abstract

Motivated from problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Since nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single-indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modeling and estimation procedure in a multi-covariate multi-response problem concerning concrete.

Keywords: Canonical correlation analysis, Reduced rank regression, Semiparametric efficiency, Single index models, Sufficient dimension reduction

1 Introduction

In scientific research and engineering, many statistical problems share a common goal of deciphering the associations between certain features and outcomes/responses from noisy data. When both the feature and response variables are multivariate, several different strategies exist to model their relations. Among the popular approaches are canonical correlation analysis (CCA) (Hotelling, 1936) and reduced rank regression (RRR) (Anderson, 1951; Reinsel and Velu, 1998; Mukherjee and Zhu, 2011), both of which are designed to examine possible linear association between the two sets of random variables.

Specifically, write the covariate vector X ∈ ℝ^p and the response vector Y ∈ ℝ^q, where p > 1, q > 1. CCA seeks linear combinations α^TY and β^TX that have maximum correlation with each other. In other words, CCA searches for unit length vectors α and β so that corr(α^TY, β^TX) is maximized. Because correlation is chosen as the sole criterion to evaluate the closeness between α^TY and β^TX, CCA implicitly assumes a linear relation between these two quantities, or, at the very least, CCA is only interested in the linear relation between them. Similar to CCA, in the multivariate linear regression framework, the RRR model assumes a linear relation Y = C^TX + ε* between the responses and covariates, where the coefficient matrix C ∈ ℝ^{p×q} is possibly of low rank, say, rank(C) = r ≤ min(p, q), and ε* is usually assumed to follow a multivariate normal distribution with mean zero. The main idea of RRR amounts to seeking the best low-rank approximation of Y supervised by the covariate information in X, i.e., minimizing E{(Y − C^TX)^T(Y − C^TX)} subject to rank(C) ≤ r. When we consider the unit-rank RRR model, it becomes Y = cαβ^TX + ε*. Here c is the first singular value of C, and α, β are the first left and right singular vectors of C^T respectively. This can be further written as α^TY = cβ^TX + ε, where ε is a mean zero error term. Obviously, the linear relation between Y and X in RRR implies a linear relation between α^TY and β^TX. In fact, many commonly used multivariate techniques, including CCA, RRR and principal component analysis, are all intrinsically related and all rely on certain linearity assumptions (Hotelling, 1936; Reinsel and Velu, 1998). Although in practice multiple linearly dependent pairs of directions can be retained from CCA or RRR either sequentially or simultaneously, to focus on the main idea, we restrict our attention to the extraction of a single pair of directions in this paper, following the spirit of the single index model.
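The classical CCA search described above can be sketched numerically. The following is a minimal illustration, not the implementation used in this paper: it assumes centered sample covariance matrices that are positive definite, and the function name `leading_cca_pair` is hypothetical. It whitens each block with a Cholesky factor and reads the leading pair off a singular value decomposition.

```python
import numpy as np

def leading_cca_pair(X, Y):
    """Return (alpha, beta, rho): unit-length leading canonical directions
    for Y and X, and the sample canonical correlation rho."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the leading singular vectors back to the original coordinates
    beta = np.linalg.solve(Lx.T, U[:, 0])
    alpha = np.linalg.solve(Ly.T, Vt[0])
    return alpha / np.linalg.norm(alpha), beta / np.linalg.norm(beta), s[0]
```

Because only the correlation between the two indices enters the criterion, a purely nonlinear dependence between α^TY and β^TX can go undetected by this search, which is the limitation the remainder of the paper addresses.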

In real world applications, linearity is often too strong an assumption when characterizing variable association, and nonlinearity inevitably arises, especially in multivariate settings. However, extension of the available nonparametric techniques designed for univariate response to multivariate response is not quite straightforward, not only because of the curse of dimensionality, but also because of the difficulty in efficiently modeling the dependence structure among the response variables to fully embrace the multivariate nature of the problem. Many existing nonlinear methods originated from classical CCA and RRR (Gifi, 1990; Hsieh, 2000; He et al., 2003; Yuan et al., 2007; Mukherjee and Zhu, 2011). Xia (2008) proposed a semiparametric approach of CCA (SCA), in which the estimation was based on minimizing E{α^TY − E(α^TY | β^TX)}², where the conditional expectation E(· | ·) was estimated nonparametrically. SCA thus extends the classical CCA, as the latter simply assumes the conditional expectation to be linear. A similar approach is the generalized canonical correlation analysis proposed by Iaci et al. (2010); the method searches for the pair of indices by minimizing E{α^TY − E(α^TY | β^TX)}² + E{β^TX − E(β^TX | α^TY)}², treating the two sets of variables symmetrically. There have been several approaches that find (β^Tx, α^Ty) by maximizing a certain divergence measure between the joint distribution of (β^Tx, α^Ty) and the product of their marginal distributions. Iaci and Sriram (2013) proposed two families of multivariate association measures based on power divergence and alpha divergence, and Mandal and Cichocki (2013) proposed a generalized method of CCA called AB-canonical analysis using Alpha-Beta divergence. For extensions more related to RRR, Chan et al. (2004) studied the properties of a general semiparametric partial linear reduced-rank regression model, and Yuan et al. (2007) proposed a nonparametric low-rank factor model using regression splines. For other methods concerning the use of dimension reduction techniques to facilitate the exploration of multivariate nonlinear association, see Li et al. (2008) and the references therein.

Clearly, as soon as we venture into the territory beyond linearity, the possible multivariate association structures quickly become so rich and complex that it can even be infeasible to fully retrieve the true association structure. In this paper, to relax the assumption of linear association while still keeping the model tractable, motivated by the sufficient dimension reduction literature, we introduce a flexible and yet manageable modeling strategy, where we assume there exist α ∈ ℝ^q and β ∈ ℝ^p so that α^TY relies on X through β^TX, but we do not impose a linear relation or any specific functional link between α^TY and β^TX. Specifically, we only assume

f_{α^{T} Y | X} (α^{T} y, x) = f_{α^{T} Y | β^{T} X} (α^{T} y, β^{T} x) .

(1)

Here, f_α^TY|X stands for the probability density function of α^TY conditional on X, and f_α^TY|β^TX is similarly defined. The model described in (1) is what we name the double single index model (DSI), for the obvious reason that there are two single indices described by α and β respectively. Our proposal has several key ingredients. First, a variable index is often of practical interest and admits meaningful interpretation, and thus this desirable feature is retained in our model, the same as in single index models (Ichimura, 1993). Moreover, searching for a pair of associated indices is the essential objective in many real-world multivariate problems, see for example, Witten et al. (2009), Zhu et al. (2014) and Chen et al. (2014). Second, to allow flexibility in the process of pursuing nonlinearity, we do not intend to characterize the association between α^TY and β^TX. Instead, we aim to extract a relatively simple yet meaningful one-dimensional association between the response variables and the predictors. The DSI model is directly built on the conditional distribution of the variables, in contrast to many methods that only model the mean association structure. Third, in our approach, the desired simple DSI structure is perceived as lurking beneath other parts of the multivariate association of no direct interest. As such, these other parts are treated as nuisance and left unspecified. In estimation, only a working model is needed and the estimation is not sensitive to its misspecification; see Sections 2 and 3 for details. This is a rather important feature both practically and conceptually, especially given that modern data are obtained with ever increasing complexity and yet often only a few summary features of the data contribute to the actual knowledge discovery.

The DSI model has connections to several familiar multivariate models. For example, in the special case when q = 1, the DSI model in (1) reduces to the familiar single index model (Ichimura, 1993; Hardle et al., 1993). In addition, DSI is an extension and generalization of CCA and RRR, in that it allows the association between α^TY and β^TX to be nonlinear. As DSI is specified from conditional distribution, it is more comprehensive than the SCA model which only concerns the association in the mean. We also avoid specifying and modeling other possibly intractable dependence structures between Y and X. The DSI model is also related to the multivariate response sufficient dimension reduction model (Li et al., 2008). In this context, when the structural dimension is one, the model assumes that Y depends on X through β^TX, i.e., f_Y|X(y, x) = f_Y|β^TX(y, β^Tx). This automatically leads to f_α^TY|X(α^Ty, x) = f_α^TY|β^TX(α^Ty, β^Tx) for any α. Now if we relax the requirement so that this relation only holds for some specific α instead of all α, then we obtain the DSI model (1).

The DSI model described in (1) arises naturally in practice. In civil engineering, the association between the quality of concrete and its composition is an important topic of study (Yeh, 2006). Concrete is a highly complex material and consists of a mixture of several ingredients, including cement, fly ash, blast-furnace slag, water, superplasticizer and aggregate. To summarize the composition of concrete is to study the proportion of these different ingredients, hence a natural way to summarize them is via their linear combination. On the other hand, the quality of concrete is also measured in different aspects. Generally speaking, concrete that has high consistency in its fresh state and high strength in its hardened state is stable and durable, and is thus considered to be of high quality. The various aspects of concrete quality include strength, stability, durability, etc., and can be summarized into a linear combination of these individual properties. It is thus natural to apply DSI to explore whether a quality index of the concrete (α^TY) exhibits some interesting linear/nonlinear relationship with a certain composition of the concrete (β^TX). In Section 4, we analyze a data example concerning concrete to further demonstrate the application of DSI.

Given that we can estimate the linear combination coefficients in α, β, we can subsequently perform a classical univariate-covariate univariate-response nonparametric regression to identify the functional relationship between the two indices. Our proposed method thus provides a useful exploratory tool for examining potential nonlinear associations between two sets of variables.

2 Methodology

To ensure identifiability of α and β, we fix the last component of α and β to be 1 and require (1) to hold at unique α and β locally. We write $α = {(α_{u}^{T}, 1)}^{T}$ and $β = {(β_{u}^{T}, 1)}^{T}$ . The requirement can be easily satisfied by reordering the components in Y and X if necessary. We point out that the parameterization of requiring unit length of an index with positive first component and that of requiring a fixed component to be one are both commonly used in the literature (Newey and Stoker, 1993; Klein and Shen, 2007; Klein and Vella, 2009); here we choose the latter to enable the semi-parametric analysis and computation to be carried out in a more straightforward way. Under this parameterization, our interest is then exclusively in the q − 1 dimensional vector α_u and the p − 1 dimensional vector β_u. Here the subindex _u stands for unknown. Let $γ = {(α_{u}^{T}, β_{u}^{T})}^{T}$ be the unknown parameter of interest.

To provide a more direct and intuitive example of the model and its identifiability in (1), we consider the case when α_u is zero. This occurs when the last component Y_q depends on X through a single index β^TX, while all other components in Y, i.e. Y₁, …, Y_{q−1}, depend on X through structures more complex than the single index model. For example, Y_k = m_k(X_k X_{k+1}) + ε_k for k = 1, …, q − 1, where ε_k is a mean zero random variable independent of X and m_k is a non-constant function. Having understood the component-wise model corresponding to the special α, we can then generalize the situation to the case when the response variable is further rotated and stretched by incorporating a general α. Further, if we restrict our interest to α in a local neighborhood, we can allow more components of Y to depend on X through single indices, as long as different components of Y correspond to different single indices. In this case, in a local neighborhood, only one of these single index structures, corresponding to one particular component of Y, will be of interest. With the additional rotation and stretching, only one linear combination of Y will be captured by α, hence the problem is locally identifiable.

Following model (1), we write out the likelihood at one typical observation as

f_{X, Y} (x, y) = η_{1} (x) η_{2} (α^{T} y, β^{T} x) η_{3} (y_{r}, α^{T} y, x) .

Here y_r is the vector of the first q − 1 components of y, η₁ represents the probability density function (pdf) of X and η₃ represents the pdf of Y_r conditional on α^TY and X. We use η₂ to represent the pdf of α^TY conditional on X, which by the model assumption in (1) is a function of α^Ty and β^Tx only. Note that η₁, η₂, η₃ are all unknown. It is now clear that (1) can be viewed as a semiparametric model where the parameter of interest is γ and η₁, η₂, η₃ are three nuisance parameters. We thus use semiparametric analysis tools to derive the nuisance tangent space Λ and its orthogonal complement Λ^⊥. The details of the derivation are in the supplementary materials, where we obtain the conclusion that

Λ^{⊥} = {b (α^{T} Y, X) - E (b | α^{T} y, β^{T} x) : \forall b (α^{T} Y, X) \in ℝ^{p + q - 2} \text{ s.t. } E (b | β^{T} x) = E (b | x)} .

This result somewhat resembles the results in Ma and Zhu (2012), where their univariate Y is replaced by α^TY here. Thus, the constructions there can be applied here as well by replacing all the instances of Y with α^TY. Let a and the a_i’s be arbitrary functions of x, and let g and the g_i’s be arbitrary functions of α^Ty and β^Tx. Here a, a_i, g, g_i can be scalar, vector or matrix functions as long as their dimensions conform and the dimension of their product is p + q − 2, i.e. ga ∈ ℝ^{p+q−2} and g_ia_i ∈ ℝ^{p+q−2} for i = 1, …, k. Since

E [{g (α^{T} Y, β^{T} X) - E (g | β^{T} X)} {a (X) - E (a | β^{T} X)}] = 0

(2)

and

E [\sum_{i = 1}^{k} {g_{i} (α^{T} Y, β^{T} X) - E (g_{i} | β^{T} X)} {a_{i} (X) - E (a_{i} | β^{T} X)}] = 0,

(3)

we can use the functions inside the above expectations to construct root-n consistent estimators. The construction contained in (2) and (3) possesses a nice double robustness property, in that between the two expectations E{g(α^Ty, β^Tx) | β^Tx} (or E{g_i(α^Ty, β^Tx) | β^Tx}) and E{a(x) | β^Tx} (or E{a_i(x) | β^Tx}), as long as we calculate one of them correctly, we are free to mis-specify the other and the consistency of the estimating function will still be retained. That is, for instance in (2), we have

E [{g (α^{T} y, β^{T} x) - E (g | β^{T} x)} {a (x) - h (β^{T} x)}] = 0,

and

E [{g (α^{T} y, β^{T} x) - h (β^{T} x)} {a (x) - E (a | β^{T} x)}] = 0

for any function h(β^Tx). However, different from the practice in Ma and Zhu (2012), we summarize the theoretical results of estimating γ based on (2) in Theorem 2.1, where the matrix A is required to have rank p + q − 2.
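A short sketch of why the two double robustness identities hold may be helpful. It relies only on model (1), which implies E{g(α^TY, β^TX) | X} = E{g(α^TY, β^TX) | β^TX}, and on iterated expectations, assuming the relevant moments exist. The symbol k below is notation introduced here purely for illustration, with k(β^TX) ≡ E(g | β^TX) − h(β^TX).

```latex
\begin{align*}
E\big[\{g - E(g \mid \beta^{T}X)\}\{a(X) - h(\beta^{T}X)\}\big]
  &= E\big[\{E(g \mid X) - E(g \mid \beta^{T}X)\}\{a(X) - h(\beta^{T}X)\}\big] = 0, \\
E\big[\{g - h(\beta^{T}X)\}\{a(X) - E(a \mid \beta^{T}X)\}\big]
  &= E\big[k(\beta^{T}X)\{a(X) - E(a \mid \beta^{T}X)\}\big] \\
  &= E\big[k(\beta^{T}X)\, E\{a(X) - E(a \mid \beta^{T}X) \mid \beta^{T}X\}\big] = 0.
\end{align*}
```

In the first line the inner factor vanishes after conditioning on X; in the second, conditioning on X replaces g by E(g | β^TX), and conditioning on β^TX then removes the centered a(X).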

Theorem 2.1

Under the regularity conditions C1–C6 listed in the supplement A.2, the estimator γ̂ from the estimating equation

\sum_{i = 1}^{n} {g ({\hat{α}}^{T} y_{i}, {\hat{β}}^{T} x_{i}) - Ê (g | {\hat{β}}^{T} x_{i})} {a (x_{i}) - Ê (a | {\hat{β}}^{T} x_{i})} = 0

is consistent, i.e.

\hat{γ} \to γ

in probability when n → ∞. In addition, the estimator satisfies

\sqrt{n} A (\hat{γ} - γ) \to N (0, B)

in distribution when n → ∞. Here

A = E (\partial vec [{g (α^{T} y, β^{T} x) - E (g | β^{T} x)} {a (x) - E (a | β^{T} x)}] / \partial γ^{T}),

B = cov (vec [{g (α^{T} y, β^{T} x) - E (g | β^{T} x)} {a (x) - E (a | β^{T} x)}]) .

Theorem 2.1 implies an interesting phenomenon, in that although we estimated the two expectations conditional on β̂^Tx nonparametrically, the corresponding estimation has no effect on the final asymptotic properties of γ̂. In other words, if we had known how to obtain E(a | β̂^Tx) and E(g | β̂^Tx) exactly, the estimation of γ would not have been improved further. This nice property is a direct result of the double centering form of the estimating equation in Theorem 2.1, where we centered both g and a by subtracting their respective means conditional on β̂^Tx before multiplication. Similar practice has been used in the partially linear model literature (Ma et al., 2006; Ma and Zhu, 2013a) and the sufficient dimension reduction literature (Ma and Zhu, 2012). How the double centering operation leads to this property is clearly shown in the proof of Theorem 2.1, especially through Lemma 1.2, given in the supplement A.3. It is also clear from the derivation in the supplement A.3 that if we had taken advantage of the double robustness property mentioned before and had estimated only one expectation faithfully while using an arbitrary h(β^Tx) to replace the other expectation, then the fact that we had to estimate the conditional expectation would have altered the variability in estimating γ.

We now further investigate the efficient estimation issue through calculating the score and the efficient score. First, straightforward calculation yields

S_{β} (α^{T} y, β^{T} x) = \frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} x_{r},

where we use x_r to denote the vector of the first p − 1 components of x. Now projecting S_β onto Λ^⊥, we obtain

S_{eff β} (α^{T} y, β^{T} x) = \frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} {x_{r} - E (X_{r} | β^{T} x)} .

This is because we can easily verify that S_{eff β}(α^Ty, β^Tx) ∈ Λ^⊥ and {∂logη₂(α^Ty, β^Tx)/∂(β^Tx)}E(X_r | β^Tx) ∈ Λ, based on the description of Λ^⊥ and Λ in supplement A.1. We now further calculate

S_{α} (y_{r}, α^{T} y, x) = \frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (α^{T} y)} y_{r} + \frac{\partial log η_{3} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} .

Projecting S_α onto Λ^⊥, we obtain S_{eff α}(α^Ty, β^Tx) = E(S_α | α^Ty, x) − E(S_α | α^Ty, β^Tx). This is because

E {E (S_{α} | α^{T} y, x) | x} = E (S_{α} | x) = \int \frac{\partial {η_{2} (α^{T} y, β^{T} x) η_{3} (y_{r}, α^{T} y, x)}}{\partial (α^{T} y)} d (α^{T} y) y_{r} d y_{r} = 0,

hence E{E(S_α | α^Ty, x) | x} = E{E(S_α | α^Ty, x) | β^Tx}, which implies S_{eff α} ∈ Λ^⊥. On the other hand, S_α − E(S_α | α^Ty, x) ∈ Λ₃ and E{E(S_α | α^Ty, β^Tx) | β^Tx} = E(S_α | β^Tx) = 0, hence S_α − S_{eff α} ∈ Λ indeed. Hence the projection of S_α onto Λ^⊥ is indeed given by S_{eff α}. Specifically, we obtain

S_{eff α} (α^{T} y, β^{T} x) = \frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (α^{T} y)} {E (y_{r} | α^{T} y, x) - E (y_{r} | α^{T} y, β^{T} x)} + E {\frac{\partial log η_{3} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, x} - E {\frac{\partial log η_{3} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, β^{T} x} .

Combining the two calculations, we have $S_{eff} = {({S_{eff}}_{α}^{T}, {S_{eff}}_{β}^{T})}^{T}$ .

Unfortunately, the estimation of E(y_r | α^Ty, x) and η₃(y_r, α^Ty, x) is subject to the curse of dimensionality due to the presence of x (as well as y_r for η₃(y_r, α^Ty, x)). Hence the efficient estimator is unreachable in practice. This is in contrast to Ma and Zhu (2013b), where only a univariate Y is considered. However, we can use the form of S_eff to construct locally efficient estimators using a working model of η₃. Although we can estimate η₂, considering that in any case we cannot guarantee efficiency, we will use a working model of η₂ as well. To this end, we propose to posit the working models $η_{2}^{*} (α^{T} y, β^{T} x)$ and $η_{3}^{*} (y_{r}, α^{T} y, x)$ . We then construct the locally efficient estimators from $S_{eff}^{*} = {({S_{eff}^{*}}_{α}^{T}, {S_{eff}^{*}}_{β}^{T})}^{T}$ , where

S_{eff α}^{*} = \frac{\partial log η_{2}^{*} (α^{T} y, β^{T} x)}{\partial (α^{T} y)} [E^{*} (y_{r} | α^{T} y, x) - E {E^{*} (y_{r} | α^{T} y, x) | α^{T} y, β^{T} x}] + E^{*} {\frac{\partial log η_{3}^{*} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, x} - E [E^{*} {\frac{\partial log η_{3}^{*} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, x} | α^{T} y, β^{T} x],

and

S_{eff β}^{*} = [\frac{\partial log η_{2}^{*} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} - E {\frac{\partial log η_{2}^{*} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} | β^{T} x}] {x_{r} - E (X_{r} | β^{T} x)} .

We can obtain the locally efficient estimator through using $S_{eff}^{*}$ . Specifically, use O_i to denote the ith observation, and use $S_{eff}^{*} (O_{i}; γ, Ê)$ to denote the efficient score evaluated at O_i, with E replaced by its kernel estimator Ê. The estimator γ̂ = (α̂^T, β̂^T)^T satisfies

\sum_{i = 1}^{n} S_{eff}^{*} (O_{i}; \hat{γ}, Ê) = 0 .

(4)

We show that γ̂ is locally efficient, i.e., it is efficient when η₂ and η₃ are correctly specified; otherwise it is still consistent and asymptotically normal.

Theorem 2.2

Under the regularity conditions B1–B5 listed in the supplement A.4, the estimator γ̂ from the estimating equation (4) is locally efficient. Specifically, when n → ∞,

\sqrt{n} (\hat{γ} - γ) \to N (0, A^{- 1} B {(A^{- 1})}^{T}),

where $A = - E {\partial S_{eff}^{*} (O_{i}; γ) / \partial γ^{T}}$ and $B = E [{S_{eff}^{*} (O_{i}; γ) - u^{*} (x_{i}; γ)}^{\otimes 2}]$ . Here,

u^{*} (x_{i}; γ) \equiv \int b^{*} (α^{T} y, x_{i}) η_{2} (α^{T} y, β^{T} x_{i}) d (α^{T} y) - \int \frac{\int_{β^{T} x = β^{T} x_{i}} b^{*} (α^{T} y, x) f_{x} (x) d x η_{2} (α^{T} y, β^{T} x_{i})}{f (β^{T} x_{i})} d (α^{T} y) .

In addition, when $η_{2}^{*} (α^{T} y, β^{T} x) = η_{2} (α^{T} y, β^{T} x)$ and $η_{3}^{*} (y_{r}, α^{T} y, x) = η_{3} (y_{r}, α^{T} y, x)$ , then $A = B = E {S_{eff}^{\otimes 2} (O_{i}; γ)}$ , and the estimator is efficient. Here a^{⊗2} ≡ aa^T for any vector or matrix a.

The implementation of the locally efficient estimator proceeds as follows. To simplify the description, we first define the functions

m_{1} (β^{T} x) \equiv E (x_{r} | β^{T} x),

m_{2} (β^{T} x) \equiv E {\frac{\partial log η_{2}^{*} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} | β^{T} x},

m_{3} (α^{T} y, β^{T} x) \equiv E {b^{*} (α^{T} y, x) | α^{T} y, β^{T} x},

where

b^{*} (α^{T} y, x) \equiv \frac{\partial log η_{2}^{*} (α^{T} y, β^{T} x)}{\partial (α^{T} y)} E^{*} (y_{r} | α^{T} y, x) + E^{*} {\frac{\partial log η_{3}^{*} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, x} .

The Nadaraya-Watson kernel estimators of m₁, m₂ and m₃ are respectively

{\hat{m}}_{1} (β^{T} x) = \frac{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})} x_{r i}}{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})}},

{\hat{m}}_{2} (β^{T} x) = \frac{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})} \partial log η_{2}^{*} (α^{T} y_{i}, β^{T} x_{i}) / \partial (β^{T} x_{i})}{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})}},

{\hat{m}}_{3} (α^{T} y, β^{T} x) = \frac{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})} b^{*} (α^{T} y, x_{i})}{\sum_{i = 1}^{n} K_{h} {β^{T} (x - x_{i})}},

where h is a bandwidth. To emphasize the dependence on m₁, m₂, m₃, we can write the locally efficient score function

S_{eff}^{*} {α^{T} y, β^{T} x, m_{1} (β^{T} x), m_{2} (β^{T} x), m_{3} (α^{T} y, β^{T} x)} .

The locally efficient estimator can then be obtained in practice through solving the estimating equation

\sum_{i = 1}^{n} S_{eff}^{*} {α^{T} y_{i}, β^{T} x_{i}, {\hat{m}}_{1} (β^{T} x_{i}), {\hat{m}}_{2} (β^{T} x_{i}), {\hat{m}}_{3} (α^{T} y_{i}, β^{T} x_{i})} = 0 .

As long as the bandwidth h is fixed, the only unknown quantity in the estimating equation is γ. The estimating equation can be solved by standard optimization methods such as the Newton-Raphson algorithm or the trust region method. Because a wide range of bandwidths all lead to the same asymptotic result (see condition B4 in the supplement and Theorem 2.2), the estimator is quite insensitive to the bandwidth even in finite samples. Thus, we can simply use h = n^{−1/5} in the implementation. One can certainly perform cross validation and use a separate bandwidth for each specific nonparametric regression, at the cost of selecting more bandwidths. We have implemented our method in MATLAB, where the Newton-Raphson algorithm is applied and numerical differencing is used to approximate the local derivative functions. Based on our limited experience, the computation is stable and fast.
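The Nadaraya-Watson step can be sketched in a few lines. This is only an illustrative Python sketch, not the MATLAB implementation mentioned above: the Gaussian kernel is an assumption (the kernel is not specified in this section), the function name `nw_at_sample` is hypothetical, and the default bandwidth follows the suggestion h = n^{−1/5}. It evaluates the estimator of a conditional mean given the index, such as m̂₁, at the observed index values.

```python
import numpy as np

def nw_at_sample(index, values, h=None):
    """Nadaraya-Watson estimate of E(values | index), evaluated at each
    observed index value. `index` has shape (n,); `values` has shape
    (n,) or (n, d)."""
    index = np.asarray(index, dtype=float)
    values = np.asarray(values, dtype=float)
    n = index.shape[0]
    if h is None:
        h = n ** (-1.0 / 5.0)            # bandwidth suggested in the text
    diff = (index[:, None] - index[None, :]) / h
    K = np.exp(-0.5 * diff ** 2)         # Gaussian kernel (assumed); constants cancel
    denom = K.sum(axis=1)                # at least 1, since K[i, i] = 1
    vals2d = values.reshape(n, -1)
    est = (K @ vals2d) / denom[:, None]
    return est.reshape(values.shape)

# Illustrative use: estimate m_1(beta^T x) = E(x_r | beta^T x) on synthetic data
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
beta = np.array([1.0, -1.0, 1.0])
m1_hat = nw_at_sample(x @ beta, x[:, :2])
```

Since each evaluation point is itself an observation, the denominator includes the kernel evaluated at zero and is therefore strictly positive, so no division by zero can occur at the sample points.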

Having estimated γ, we can perform nonparametric regression of α̂^TY on β̂^TX to further estimate η₂, following, for example, Fan et al. (2003). Because γ is estimated at the parametric rate of root-n, the estimation of η₂ will have the usual nonparametric estimation rate, and its first order asymptotic properties are the same as those of the estimation of η₂ using the true parameter γ. Because the derivation and the results of the nonparametric procedure are standard, we omit the details.

Trimming (Ichimura, 1993) is often needed in nonparametric estimation to handle the potential issue of dividing by zero. However, trimming is avoided here because we only need the nonparametric evaluations at the observed β^Tx_i, where the kernel density estimate in the denominator is always positive since we include the ith observation in the estimator. Further, condition C3 guarantees the density of β^TX to be bounded away from zero. Thus, when the sample size is sufficiently large, the estimated density is also bounded away from zero.

In practice, to specify $η_{2}^{*}$ and $η_{3}^{*}$ , we suggest the following. First, use simpler methods such as CCA or SCA to obtain starting values (α̃, β̃) for DSI. Then, $η_{2}^{*}$ can be specified based on the empirical conditional distribution between the leading canonical pairs α̃^Ty and β̃^Tx, e.g., α^Ty ~ N(aβ^Tx + b, σ²), where a, b and σ² are estimated from a regression analysis between α̃^Ty and β̃^Tx. Our numerical results suggest that $η_{3}^{*}$ can be specified in a cruder way; see Section 3. For example, we can specify $η_{3}^{*}$ by assuming the components of y_r are independent conditional on α̃^Ty and x and conducting regression analysis between y_r and some linear/nonlinear functions of α̃^Ty and x.
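As a rough illustration of the normal working model for η₂ suggested above, the constants a, b and σ² can be obtained from an ordinary least squares fit of the starting response index on the starting predictor index. The sketch below is one possible way to do this; the function name `fit_normal_working_model` and the degrees-of-freedom correction in the variance estimate are choices made here, not taken from the paper.

```python
import numpy as np

def fit_normal_working_model(ay, bx):
    """Fit ay ~ N(a * bx + b, sigma2) by least squares.
    `ay` holds the starting response index values, `bx` the starting
    predictor index values. Returns (a, b, sigma2)."""
    ay = np.asarray(ay, dtype=float)
    bx = np.asarray(bx, dtype=float)
    design = np.column_stack([bx, np.ones_like(bx)])
    coef, *_ = np.linalg.lstsq(design, ay, rcond=None)
    resid = ay - design @ coef
    sigma2 = resid @ resid / (len(ay) - 2)   # residual variance, 2 fitted coefficients
    return coef[0], coef[1], sigma2
```

The fitted triple then defines the working density whose log-derivatives enter the locally efficient score; by Theorem 2.2, a misspecified working model affects efficiency but not consistency.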

3 Simulation

3.1 Setups

We conduct simulation studies to evaluate the finite sample performance of the proposed methods. For comparison, the most relevant nonlinear approach to our method is the semiparametric canonical correlation analysis (SCA) proposed by Xia (2008). SCA searches the pair of indices by minimizing E{α^TY − E(α^TY | β^TX)}², and the procedure involves the estimation of E(Y_i | X) and its derivatives using d-th order local polynomial smoothing, where d > p/2 + 1 in order to achieve $\sqrt{n}$ -consistency. Several classical multivariate tools based on certain linearity assumption can also be applied for such two-way search, with CCA and RRR as the popular prototypes of those. We thus compare the proposed DSI approach to CCA, RRR and SCA.

We set p = 5, q = 4, α = (1, 1, 1, 1)^T and β = (1, −1, 1, −1, 1)^T in all the simulation examples. The process of generating a typical observation (x, y) is as follows.

Generate x from η₁, the marginal distribution of X.
Compute β^Tx and generate α^Ty from η₂, the conditional distribution of α^TY given β^Tx.
Generate y_r from η₃, the conditional distribution of Y_r given α^Ty and x.
Compute y_q from the generated values α^Ty and y_r, i.e., $y_{q} = α^{T} y - α_{u}^{T} y_{r}$ . Let $y = {(y_{r}^{T}, y_{q})}^{T}$ .

We set η₁ as the standard multivariate normal distribution; in practice, with the components of X correlated, one may orthogonalize the variables before pursuing sufficient dimension reduction. We consider three models with different choices of η₂:

Model I: η₂ is the normal distribution with mean μ_η = β^Tx and variance $σ_{η}^{2} = 4$ .
Model II: η₂ is the normal distribution with mean μ_η = (β^Tx)² and variance $σ_{η}^{2} = 6$ .
Model III: η₂ is the normal distribution with mean μ_η = (β^Tx)² and variance $σ_{η}^{2} = σ^{2} exp \{(β^{T} x) / 3\}$ where σ² = 6.

In each of the above models, η₃ is set as the multivariate normal distribution with mean vector μ_r = (μ_{r,1}, …, μ_{r,q−1})^T with

μ_{r, i} = α^{T} y / q + 2 sin (α^{T} y) + a (h_{i}^{T} x) + b {(h_{i}^{T} x)}^{2}, i = 1, \dots, q - 1,

and covariance matrix 4I, where the h_i’s are orthonormal vectors that are also orthogonal to β. The constants a and b are chosen to control the marginal correlation structure of Y. Specifically, we set a = 3, b = 3 in Model I, and a = 3, b = 9 in both Models II and III, so that the correlations among the Y_i, i = 1, …, q are roughly at or below 0.6 in magnitude. These setups ensure that α^TY is not dominated by any particular coordinate in Y and it is indeed the desired simple direction, i.e., a direction in Y that is associated with a one-dimensional sufficient dimension reduction subspace in X. For the above models, it can be conveniently shown that
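The data-generating steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the code used for the reported results: the specific vectors h_i are not given in the text, so they are built here from a QR factorization; the Model III variance is read as σ² exp{(β^Tx)/3}; and the function name `generate_dsi` is hypothetical.

```python
import numpy as np

def generate_dsi(n, model="I", a=3.0, b=None, seed=0):
    """Draw n observations (x, y) from Models I-III with p = 5, q = 4."""
    rng = np.random.default_rng(seed)
    p, q = 5, 4
    alpha = np.ones(q)
    beta = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
    if b is None:
        b = 3.0 if model == "I" else 9.0
    # Orthonormal h_1, ..., h_{q-1}, all orthogonal to beta (assumed construction)
    Q, _ = np.linalg.qr(np.column_stack([beta, rng.normal(size=(p, q - 1))]))
    H = Q[:, 1:q]
    # Step 1: x from eta_1, the standard multivariate normal
    x = rng.normal(size=(n, p))
    t = x @ beta
    # Step 2: alpha^T y from eta_2
    if model == "I":
        mu, var = t, np.full(n, 4.0)
    elif model == "II":
        mu, var = t ** 2, np.full(n, 6.0)
    elif model == "III":
        mu, var = t ** 2, 6.0 * np.exp(t / 3.0)
    else:
        raise ValueError("model must be 'I', 'II' or 'III'")
    w = mu + np.sqrt(var) * rng.normal(size=n)
    # Step 3: y_r from eta_3, normal with mean mu_r and covariance 4I
    hx = x @ H
    mu_r = (w / q + 2.0 * np.sin(w))[:, None] + a * hx + b * hx ** 2
    y_r = mu_r + 2.0 * rng.normal(size=(n, q - 1))
    # Step 4: recover y_q so that alpha^T y equals the generated index
    y_q = w - y_r @ alpha[:-1]
    return x, np.column_stack([y_r, y_q]), alpha, beta, H
```

By construction, `y @ alpha` reproduces the generated index, so its conditional distribution given x depends on x only through `x @ beta`, while the remaining components of y depend on x through the other directions h_i.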

\frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (α^{T} y)} = \frac{μ_{η} - α^{T} y}{σ_{η}^{2}},

\frac{\partial log η_{2} (α^{T} y, β^{T} x)}{\partial (β^{T} x)} = - \frac{1}{2} \frac{\partial log σ_{η}^{2}}{\partial (β^{T} x)} - \frac{2 (μ_{η} - α^{T} y) \frac{\partial μ_{η}}{\partial (β^{T} x)} σ_{η}^{2} - {(μ_{η} - α^{T} y)}^{2} \frac{\partial σ_{η}^{2}}{\partial (β^{T} x)}}{2 σ_{η}^{4}},

and

E {\frac{\partial log η_{3} (y_{r}, α^{T} y, x)}{\partial (α^{T} y)} y_{r} | α^{T} y, x} = \frac{\partial μ_{r}}{\partial (α^{T} y)} .
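The two derivatives of log η₂ displayed above can be checked numerically against central finite differences. The sketch below does this for a normal η₂ with index-dependent mean and variance; the function names are hypothetical, and the Model III variance is again read as 6 exp{(β^Tx)/3}, which is an assumption about the intended formula.

```python
import numpy as np

def log_eta2(w, t, mu, var):
    """Log density of N(mu(t), var(t)) at w, where w stands for alpha^T y
    and t for beta^T x."""
    m, v = mu(t), var(t)
    return -0.5 * np.log(2.0 * np.pi * v) - (w - m) ** 2 / (2.0 * v)

def score_w(w, t, mu, var):
    """Closed-form derivative of log eta_2 with respect to alpha^T y."""
    return (mu(t) - w) / var(t)

def score_t(w, t, mu, var, dmu, dvar):
    """Closed-form derivative of log eta_2 with respect to beta^T x."""
    m, v = mu(t), var(t)
    return (-0.5 * dvar(t) / v
            - (2.0 * (m - w) * dmu(t) * v - (m - w) ** 2 * dvar(t)) / (2.0 * v ** 2))

# Model III ingredients (variance form assumed as stated in the lead-in)
mu = lambda t: t ** 2
dmu = lambda t: 2.0 * t
var = lambda t: 6.0 * np.exp(t / 3.0)
dvar = lambda t: 2.0 * np.exp(t / 3.0)

w0, t0, eps = 1.3, -0.7, 1e-6
fd_w = (log_eta2(w0 + eps, t0, mu, var) - log_eta2(w0 - eps, t0, mu, var)) / (2 * eps)
fd_t = (log_eta2(w0, t0 + eps, mu, var) - log_eta2(w0, t0 - eps, mu, var)) / (2 * eps)
print(abs(fd_w - score_w(w0, t0, mu, var)),
      abs(fd_t - score_t(w0, t0, mu, var, dmu, dvar)))
```

Both printed discrepancies should be at the level of finite-difference error if the closed forms are transcribed correctly.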

The proposed locally efficient estimation is from solving the estimating equations (4) with potentially misspecified η₂ and η₃. For all three simulation examples, we set $η_{3}^{*}$ , the working model for η₃, as normal with mean vector ${(α^{T} y / q + x_{1}^{2}, \dots, α^{T} y / q + x_{q}^{2})}^{T}$ and variance-covariance matrix identity. We set $η_{2}^{*}$ , the working model of η₂, as $N (μ_{η} = 2 β^{T} x, σ_{η}^{2} = 9)$ in Model I and $N (μ_{η} = | β^{T} x |, σ_{η}^{2} = 9)$ in Models II and III. Three locally efficient estimators are constructed: the first one is based on $η_{2}^{*}$ and $η_{3}^{*}$ (LOC1), the second one is based on η₂ and $η_{3}^{*}$ (LOC2), and the third one is based on $η_{2}^{*}$ and η₃ (LOC3). When both η₂ and η₃ are correctly specified, we obtain an efficient oracle estimator (OR). Since η₂ and η₃ are usually unknown in real problems, LOC2, LOC3 and OR are not feasible in practice, but here they may serve as benchmarks to examine the effects of model misspecification. Based on Theorem 2.1, we also construct a simple consistent estimator (SIM), in which we choose g(α^Ty, β^Tx) = E(x | α^Ty) and a(x) = x^T. As we focus on the single index setup, the first leading pair of canonical variables is extracted from CCA, and a unit-rank estimator is obtained from RRR; the resulting estimators are denoted as CCA1 and RRR1, respectively. For CCA and RRR, we also extracted min(p, q) pairs of directions and recorded the one that is the closest to the true pair as measured by ‖α̂(α̂^Tα̂)⁻¹α̂^T − α(α^Tα)⁻¹α^T‖_F + ‖β̂(β̂^Tβ̂)⁻¹β̂^T − β(β^Tβ)⁻¹β^T‖_F; the resulting estimators are denoted CCA* and RRR*, respectively. Similarly, for the semiparametric method SCA, we computed two estimators SCA1 and SCA*.

3.2 Results

We have considered various sample sizes, i.e., n = 500, 200 and 100; for brevity, we focus our discussion on the case n = 500 in the sequel, unless otherwise noted. The experiment is replicated 500 times under each setting. The obtained estimates (α̂, β̂) are standardized in the same way as the true (α, β), as described in Section 2. Figures 1–3 show the boxplots of the distances between the true parameters and their estimated counterparts from all simulation runs, i.e., d(α̂, α) = ‖α̂(α̂^Tα̂)⁻¹α̂^T − α(α^Tα)⁻¹α^T‖_F for measuring the distance from α̂ to α, and d(β̂, β) = ‖β̂(β̂^Tβ̂)⁻¹β̂^T − β(β^Tβ)⁻¹β^T‖_F for measuring the distance from β̂ to β, where ‖·‖_F denotes the Frobenius norm. Tables 1–3 report the average parameter estimates (ave) and their associated standard errors (std), for Models I–III respectively. For the proposed semiparametric estimators, we also report the average of the estimated standard deviations $(\hat{std})$ and the coverage of the estimated 95% confidence interval (95%), based on the asymptotic results.
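For concreteness, the four summary rows reported in the tables can be computed from the replicated estimates along the following lines. This is a sketch under stated assumptions: it takes std to be the Monte Carlo standard deviation across replications and uses Wald-type intervals (estimate ± 1.96 × estimated standard deviation); the function name `summarize` and the synthetic inputs are hypothetical.

```python
import numpy as np

def summarize(estimates, est_sd, truth, z=1.959964):
    """Column-wise summaries over replications.

    estimates, est_sd: arrays of shape (replications, parameters).
    z: standard normal 97.5% quantile (assumed Wald-type 95% intervals).
    """
    estimates = np.asarray(estimates, dtype=float)
    est_sd = np.asarray(est_sd, dtype=float)
    truth = np.asarray(truth, dtype=float)
    covered = (estimates - z * est_sd <= truth) & (truth <= estimates + z * est_sd)
    return {
        "ave": estimates.mean(axis=0),            # average estimate
        "std": estimates.std(axis=0, ddof=1),     # Monte Carlo standard deviation
        "std_hat": est_sd.mean(axis=0),           # average estimated standard deviation
        "cov95": covered.mean(axis=0),            # empirical coverage of the interval
    }

# Synthetic illustration only: 500 replications of two well-behaved estimators.
rng = np.random.default_rng(0)
truth = np.array([1.0, -1.0])
estimates = truth + 0.05 * rng.normal(size=(500, 2))
est_sd = np.full((500, 2), 0.05)

summary = summarize(estimates, est_sd, truth)
print(summary)
```

When the asymptotic standard deviations are accurate, `std` and `std_hat` should be close and `cov95` should be near 0.95, which is the comparison made in the discussion of the tables.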

Figure 1. Boxplots of d(α̂, α) and d(β̂, β) for Model I (n = 500).

Figure 3. Boxplots of d(α̂, α) and d(β̂, β) for Model III (n = 500).

Table 1.

Simulation results for Model I (n = 500).

		α₁	α₂	α₃	β₁	β₂	β₃	β₄
CCA1	ave	1.0019	1.0017	1.0025	−0.9954	0.9978	−0.9863	0.9964
CCA1	std	0.0353	0.0333	0.0344	0.1415	0.1396	0.1310	0.1336
CCA*	ave	1.0019	1.0017	1.0025	−0.9954	0.9978	−0.9863	0.9964
CCA*	std	0.0353	0.0333	0.0344	0.1415	0.1396	0.1310	0.1336
RRR1	ave	−0.4651	−0.5834	−0.4676	0.9328	1.1671	−0.3009	0.3320
RRR1	std	0.6076	0.8502	0.6741	1.4070	2.0669	0.7229	0.8102
RRR*	ave	0.8305	1.0038	1.0469	−0.7006	0.8643	−0.8071	0.7799
RRR*	std	1.6763	1.4649	2.1281	3.6369	4.5548	2.5557	2.8155
SCA1	ave	−1.2397	−0.1936	0.0127	0.7962	−0.7020	−0.2470	0.2418
SCA1	std	1.7829	1.7914	1.7786	1.4098	1.4535	0.5063	0.5015
SCA*	ave	1.0027	1.0014	1.0022	−0.9966	0.9972	−0.9860	0.9976
SCA*	std	0.0367	0.0347	0.0356	0.1455	0.1414	0.1337	0.1360
SIM	ave	1.0000	0.9929	0.9968	−1.0256	1.0153	−1.0300	1.0233
SIM	std	0.0553	0.0566	0.0570	0.1496	0.1340	0.1383	0.1331
SIM	\hat{std}	0.0548	0.0553	0.0568	0.1446	0.1437	0.1359	0.1366
SIM	95%	0.9100	0.9140	0.9000	0.9380	0.9560	0.9520	0.9700
LOC1	ave	1.0013	0.9984	1.0001	−1.0037	1.0034	−1.0092	1.0031
LOC1	std	0.0356	0.0344	0.0327	0.1385	0.1425	0.1396	0.1322
LOC1	\hat{std}	0.0322		0.0324	0.1406	0.1448	0.1375	0.1393
LOC1	95%	0.9340	0.9280	0.9500	0.9380	0.9460	0.9380	0.9540
LOC2	ave	1.0015	0.9986	1.0002	−1.0033	1.0044	−1.0083	1.0035
LOC2	std	0.0348	0.0341	0.0326	0.1304	0.1357	0.1309	0.1239
LOC2	\hat{std}	0.0332	0.0326	0.0330	0.1329	0.1365	0.1296	0.1311
LOC2	95%	0.9600	0.9300	0.9560	0.9480	0.9380	0.9520
LOC3	ave	0.9998	0.9994	1.0004	−1.0039	1.0052	−1.0113	1.0045
LOC3	std	0.0279	0.0287	0.0267	0.1371	0.1359	0.1368	0.1294
LOC3	\hat{std}	0.0280		0.0278	0.1270	0.1297	0.1233	0.1262
LOC3	95%	0.9480	0.9440	0.9540	0.9380	0.9340	0.9220	0.9500
OR	ave	0.9996		1.0007	−1.0040	1.0069	−1.0107	1.0052
OR	std	0.0272	0.0284	0.0264	0.1287	0.1292	0.1283	0.1213
OR	\hat{std}	0.0271	0.0266	0.0267	0.1312	0.1332	0.1272	0.1293
OR	95%	0.9500	0.9320	0.9480	0.9520	0.9500	0.9420	0.9600

Table 3.

Simulation results for Model III (n = 500).

		α₁	α₂	α₃	β₁	β₂	β₃	β₄
CCA1	ave	0.4335	−0.1846	0.6086	0.5321	−0.8804	−0.2799	0.2006
CCA1	std	5.5592	4.2589	5.6687	3.9372	3.7894	1.3377	1.0205
CCA*	ave	1.0234	0.9830	1.0141	−1.0164	1.1505	−0.9157	1.1674
CCA*	std	0.2575	0.2588	0.2505	4.6014	4.5963	7.1897	9.5704
RRR1	ave	−0.3377	−0.3885	−0.3376	0.7026	0.0307	−0.0410	−0.1217
RRR1	std	0.5326	0.3455	0.5272	2.6470	3.0442	1.1687	1.0144
RRR*	ave	0.9656	0.9601	1.0030	−0.7602	0.6793	−0.6031	0.8798
RRR*	std	0.4133	0.4283	0.6232	4.1492	3.2080	5.3231	5.5661
SCA1	ave	−0.2611	−0.2539	−0.4031	0.9776	−1.1616	−0.6185	0.6212
SCA1	std	1.9660	1.9287	1.9992	2.1519	2.1893	0.5933	0.5946
SCA*	ave	1.0071	1.0027	1.0069	−0.9820	0.9957	−0.9919	0.9922
SCA*	std	0.1836	0.1037	0.1223	0.3488	0.3186	0.2126	0.2215
SIM	ave	1.0051	1.0024	1.0059	−1.0837	1.0779	−1.0932	1.0816
SIM	std	0.0372	0.0344	0.0368	0.1679	0.1794	0.1671	0.1740
SIM	\hat{std}	0.0379	0.0386	0.0378	0.1870	0.1868	0.1885	0.1868
SIM	95%	0.9300	0.9440	0.9280	0.9700	0.9540	0.9780	0.9700
LOC1	ave	1.0140	1.0129	1.0136	−1.0034	0.9977	−0.9994	0.9954
LOC1	std	0.0164	0.0157	0.0163	0.0506	0.0523	0.0537	0.0521
LOC1	\hat{std}	0.0194		0.0216	0.0501	0.0496	0.0489	0.0497
LOC1	95%	0.9420	0.9740	0.9760	0.9440	0.9480	0.9460	0.9580
LOC2	ave	1.0010	0.9999	1.0004	−1.0017	1.0003	−1.0012	1.0014
LOC2	std	0.0117	0.0099	0.0101	0.0239	0.0223	0.0236	0.0231
LOC2	\hat{std}	0.0112	0.0099	0.0100	0.0239	0.0230	0.0232	0.0230
LOC2	95%	0.9480	0.9620	0.9560	0.9540	0.9740	0.9520	0.9500
LOC3	ave	1.0003	0.9999	1.0006	−1.0037	0.9975	−0.9999	0.9958
LOC3	std	0.0133	0.0127	0.0130	0.0485	0.0503	0.0523	0.0508
LOC3	\hat{std}	0.0122		0.0123	0.0495	0.0494	0.0486	0.0493
LOC3	95%	0.9340	0.9580	0.9400	0.9560	0.9580	0.9480	0.9660
OR	ave	1.0004	1.0000	1.0004	−1.0016	1.0000	−1.0009	1.0013
OR	std	0.0096		0.0098	0.0220	0.0211	0.0223	0.0218
OR	\hat{std}	0.0095		0.0096	0.0225	0.0220	0.0223	0.0220
OR	95%	0.9500	0.9560	0.9440	0.9580	0.9760	0.9520	0.9500

In Model I, the association between α^TY and β^TX is linear, which should benefit the linear methods. From Table 1, CCA performs very well in estimation, but RRR performs much worse. The discrepancy between the two methods is due to their different objectives: CCA maximizes the correlation between a pair of directions in Y and X, whereas RRR seeks to explain the variation in Y by X. In our model setup, α^TY and β^TX indeed have the strongest linear association among all possible directions, which makes CCA suitable. However, β^TX does not necessarily coincide with the direction targeted by RRR, along which most of the variation in Y can be explained in the least squares sense. As a consequence, RRR is unsuitable here for discovering the desired single indices, and even RRR* performs poorly. The performance of SCA* is comparable to that of CCA; however, the leading pair extracted by the SCA method does not necessarily correspond to the desired pair, as seen from the performance of SCA1. If we knew that the underlying model were linear, a parsimonious method like CCA would be preferable. Our results show that the proposed semiparametric approaches, which do not rely on knowledge of the linear model, work almost as well as CCA, with only a slight loss in efficiency. We plotted the results in Figure 1 to show the relative performance of the different methods. For better illustration, we omitted the estimators that perform much worse than the rest.
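To make the CCA objective concrete, the leading canonical pair can be obtained from a singular value decomposition of the whitened cross-covariance matrix. The following is a textbook-style sketch, not the implementation used in the experiments; the function name `first_canonical_pair` and the simulated linear design (loadings `c`, direction `beta_true`, unit-variance noise) are assumptions made for illustration.

```python
import numpy as np

def first_canonical_pair(Y, X):
    """Leading canonical directions (alpha, beta) and canonical correlation."""
    n = Y.shape[0]
    Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)
    Syy = Yc.T @ Yc / (n - 1)
    Sxx = Xc.T @ Xc / (n - 1)
    Syx = Yc.T @ Xc / (n - 1)
    # Cholesky factors S = L L^T are used to whiten each block.
    Ly, Lx = np.linalg.cholesky(Syy), np.linalg.cholesky(Sxx)
    K = np.linalg.solve(Ly, Syx) @ np.linalg.inv(Lx).T   # Ly^{-1} Syx Lx^{-T}
    U, s, Vt = np.linalg.svd(K)
    alpha = np.linalg.solve(Ly.T, U[:, 0])               # back-transform: Ly^{-T} u
    beta = np.linalg.solve(Lx.T, Vt[0])                  # back-transform: Lx^{-T} v
    return alpha, beta, s[0]

def abs_cos(u, v):
    """Absolute cosine between two directions (invariant to scale and sign)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical linear design: one index of X drives all responses linearly.
rng = np.random.default_rng(1)
n = 2000
beta_true = np.array([-1.0, 1.0, -1.0, 1.0])
c = np.array([1.0, 1.0, 1.0])                 # with unit noise, the best response direction is proportional to c
X = rng.normal(size=(n, 4))
Y = np.outer(X @ beta_true, c) + rng.normal(size=(n, 3))

alpha_hat, beta_hat, rho = first_canonical_pair(Y, X)
print(abs_cos(alpha_hat, c), abs_cos(beta_hat, beta_true), rho)
```

In this linear setting the recovered directions align closely with the generating ones, in line with the good performance of CCA reported for Model I; no such guarantee holds once the link between the two indices is nonlinear.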

In Models II and III, the association between α^TY and β^TX is nonlinear, and any other direction in Y may not be adequately characterized by a single direction in X. Not surprisingly, CCA and RRR both perform poorly. The bias in CCA* or RRR* is much smaller than that in CCA1 or RRR1, as expected, but the variance of either estimator is very high. Again, SCA1 may pick up other spurious directions to approximate a single index model. Nevertheless, it appears that the desired pair is most likely among the ones obtained from SCA, albeit with a much larger estimation error than that of the proposed DSI methods. Also, the performance of SCA* in Model II is relatively better than that in Model III, because in Model II the two indices are related only in their mean structure, while in Model III the two indices are also related in their second moments. In all cases, the DSI estimators continue to perform very well, clearly demonstrating the effectiveness of the proposed methods in detecting nonlinear association. LOC1 performs better than SIM in general, as expected. Comparing the three locally efficient estimators and the oracle estimator, the misspecification of η₂ has a bigger impact on estimation than that of η₃. In both models, OR performs the best among all the methods, because the search for the directions becomes more tractable when the underlying model structure is correctly chosen. On the other hand, even when both η₂ and η₃ are misspecified, LOC1 still achieves small bias and remarkable estimation accuracy, with only slightly increased standard errors.

Furthermore, we can see that the inference results based on the asymptotic analysis are accurate in general. The estimated standard errors match well with their counterparts based on Monte Carlo simulation, and the coverage probabilities are mostly close to the nominal level 95%. We notice that in Models II and III, LOC1 tends to be slightly biased for the estimation of α, and the standard errors also tend to be slightly overestimated. Nevertheless, in our experiment the inference results improve when we increase the sample size.

We have also experimented with smaller sample sizes. The estimation performance of the semiparametric estimators in Model II for n = 200 and n = 100 is shown in the supplementary materials. While the estimation accuracy of the DSI methods is still satisfactory, the performance of SCA* appears to deteriorate more severely. This is probably because the SCA method requires the estimation of E(Y_i | X) and its derivatives, which imposes a strong sample size requirement depending on the predictor dimension p. We have experimented with models in which the two indices are related in the second moments but not the first, and as expected SCA* fails while the proposed method continues to perform well. We note that the inference results of DSI may become less accurate for small sample sizes. In particular, the coverage probabilities for SIM and LOC1 tend to be slightly lower than the nominal level. This is expected, as the inference procedure involves numerical approximations in several places, and for complex models a larger sample size may be required for the asymptotic theory to take effect. Following the request of a referee, we also increased the dimensions p and q and investigated the scalability of the method. The results are very encouraging. We provide the details of the computational performance in the supplementary materials.

4 Concrete Slump Test Data

As a mixture of several ingredients, concrete is a highly complex material. Understanding the relationship between the quality and composition of concrete is an important topic in the field of Civil Engineering. Generally speaking, concrete with high consistency in its fresh state and with high strength in its hardened state exhibits desirable properties of stability and durability. The consistency of fresh concrete is commonly measured through a slump-cone test, by examining the behavior of a compacted inverted cone of fresh concrete under the action of gravity: the slump is measured by the length of the drop from the top of the slumped concrete, and the slump flow is measured by its diameter. Here we consider a slump test dataset, consisting of 103 sets of slump test measurements (Yeh, 2006, 2007). Three variables regarding the quality of concrete were recorded: slump (cm), slump flow (cm) and 28-day compressive strength (MPa). The ingredients composing the concrete were also recorded (kg/m³), including cement, fly ash, blast furnace slag, water, superplasticizer and aggregate. Here, we apply the DSI approach to explore the association between the three quality variables (q = 3) and three ingredient variables (p = 3), namely fly ash, water and superplasticizer, which are known to be important factors related to the slump and concrete quality (Yeh, 2006). All the variables are standardized prior to the analysis.

We apply CCA, RRR and SCA to identify possible linear/nonlinear relationships between the two sets of variables. We then conduct the DSI estimation, starting from 100 sets of initial values of α and β, randomly generated by adding Gaussian noise N(0, 3) to their CCA/RRR estimates. As the estimation problem is local in nature, this ensures that the starting points are fairly spread out in the vicinity of some initial linear estimates, enabling us to explore whether interesting directions of sufficient dimension reduction can be found when deviating from the linear analysis. Because of the nonconvexity of the problem, multiple roots of the estimating equations may exist. In this problem, we predominantly find two roots from the 100 model fitting attempts. The upper panel of Figure 4 depicts the observed data points along the estimated linear directions from CCA and RRR, together with the fitted linear regression curves. In the middle panel of Figure 4, we plot the two sets of solutions from the DSI method, together with the fitted nonparametric regression curves. The SCA method also extracted two pairs of directions, as shown in the bottom panel of Figure 4. The parameter estimates are given in Table 4. We have used the single-indexing (leave-one-out) cross-validation method (Xia, 2008) to assess the goodness of fit of the extracted pairs, i.e., to test whether α̂^TY can be adequately predicted by a single index model of β̂^TX, and all six pairs mentioned above passed the test.
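The multi-start strategy and the subsequent identification of distinct roots can be sketched as below. Several points are assumptions rather than details taken from the analysis: N(0, 3) is read as variance 3, every coordinate is perturbed, and two solutions are treated as the same root when their projection matrices agree. The function names are hypothetical, the estimating-equation solver itself is not shown, and the "converged" solutions are simply rescaled copies of the DSI rows of Table 4 used as stand-ins.

```python
import numpy as np

def random_starts(alpha0, beta0, n_starts=100, noise_var=3.0, seed=0):
    """Perturb initial estimates with Gaussian noise (variance noise_var, assumed)."""
    rng = np.random.default_rng(seed)
    alpha0, beta0 = np.asarray(alpha0, float), np.asarray(beta0, float)
    sd = np.sqrt(noise_var)
    A = alpha0 + rng.normal(0.0, sd, size=(n_starts, alpha0.size))
    B = beta0 + rng.normal(0.0, sd, size=(n_starts, beta0.size))
    return A, B

def proj(v):
    """Projection matrix onto the span of v."""
    v = np.asarray(v, float).ravel()
    return np.outer(v, v) / (v @ v)

def pair_dist(a1, b1, a2, b2):
    """Sum of Frobenius distances between the projections of two (alpha, beta) pairs."""
    return np.linalg.norm(proj(a1) - proj(a2)) + np.linalg.norm(proj(b1) - proj(b2))

def group_roots(pairs, tol=1e-6):
    """Greedily group solutions that define the same pair of directions."""
    reps, counts = [], []
    for a, b in pairs:
        for i, (ra, rb) in enumerate(reps):
            if pair_dist(a, b, ra, rb) < tol:
                counts[i] += 1
                break
        else:
            reps.append((a, b))
            counts.append(1)
    return reps, counts

# Starting values around the CCA row of Table 4.
A0, B0 = random_starts([-1.6433, 0.2853, 1.0], [0.1032, 0.0715, 1.0])

# Stand-ins for converged solutions: rescaled copies of the DSI(1) and DSI(2) rows.
dsi1 = (np.array([-1.5573, 0.2700, 1.0]), np.array([0.1108, 0.0713, 1.0]))
dsi2 = (np.array([3.1315, -1.5634, 1.0]), np.array([-0.1583, -0.1508, 1.0]))
solutions = [dsi1, (2 * dsi1[0], -dsi1[1]), dsi2,
             (0.5 * dsi1[0], dsi1[1]), (-dsi2[0], 3 * dsi2[1])]
reps, counts = group_roots(solutions)
print(len(reps), counts)
```

Grouping by projection distance rather than by raw coefficients matters because each index is identified only up to scale, so the same root can be returned under different normalizations.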

Figure 4. Scatter plots along the estimated single-index directions for the slump test data analysis.

Table 4.

Coefficient estimation in the slump test data analysis.

	Slump flow	Strength	Slump	Fly ash	Superplasticizer	Water
	α₁	α₂	α₃	β₁	β₂	β₃	Lack of fit
CCA	−1.6433	0.2853	1.0000	0.1032	0.0715	1.0000	No
RRR	1.2951	−0.5765	1.0000	−0.1115	−0.1691	1.0000	No
SCA(1)	−1.6917	0.2938	1.0000	0.1380	0.0469	1.0000	No
SCA(2)	−0.6435	0.0612	1.0000	−0.1487	−0.5702	1.0000	No
DSI(1)	−1.5573	0.2700	1.0000	0.1108	0.0713	1.0000	No
DSI(2)	3.1315	−1.5634	1.0000	−0.1583	−0.1508	1.0000	No


The first pair of directions found by either DSI or SCA mostly coincides with that from CCA. From the similarity of the results, as well as the fitted nonparametric curves, we can see a strong linear association along this pair of directions. It is worth pointing out that the coefficient for the slump flow has the opposite sign from those for the strength and the slump. This can be explained easily because in the slump test, the slump and the slump flow are in fact strongly positively correlated. Generally speaking, a lower slump implies a lower slump flow and higher compressive strength.

The second DSI solution reveals another interesting relation between the concrete quality and its composition. In this case, the estimated α̂ from DSI agrees in sign with that from RRR, although their coefficient values are quite different. The β̂ directions in X identified by the two methods are similar and are mainly dominated by the water content variable. This is not surprising, as the water content is the most important factor influencing the properties of concrete, and the fly ash and the superplasticizer are both supplemental admixtures that are expected to have some secondary impact. Up to a few outliers, the association between the single indices identified by DSI can be well characterized by the fitted robust nonlinear nonparametric regression line, as shown in Figure 4(d). The coefficient of determination (R²) for the DSI fitted line is 0.503, while that for the RRR fitted line is 0.420. (We have removed two potential outliers; the R² values before outlier removal are 0.426 and 0.364, respectively.) As the water content increases, the quality index seems to increase sharply at the beginning, then flattens out and eventually decreases slightly. These findings are consistent with the results in Yeh (2006), in which a similar nonlinear relationship between slump and water content was detected via neural network models. From Figure 4, the second pair found by SCA appears to be spurious and does not offer much insight into the problem. This example demonstrates that the DSI approach can be a useful and flexible tool for conveniently exploring simple nonlinear structures in complex multivariate association.
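The comparison of R² values between a nonparametric fit and a linear fit along a pair of indices can be illustrated as follows. This is only a sketch: the smoother used for the fitted line in Figure 4 is not specified here, so a Nadaraya–Watson estimator with a Gaussian kernel stands in for it, and the data, the bandwidth `h` and the function names are hypothetical.

```python
import numpy as np

def nw_smooth(x, y, x_eval, h):
    """Nadaraya-Watson regression estimate with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def r_squared(y, fitted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic indices with a strongly nonlinear link (illustration only).
rng = np.random.default_rng(2)
x = rng.normal(size=103)                 # stands in for the predictor index
y = -x ** 2 + 0.3 * rng.normal(size=103)  # stands in for the response index

fit_nw = nw_smooth(x, y, x, h=0.3)
fit_lin = np.polyval(np.polyfit(x, y, 1), x)
r2_nw, r2_lin = r_squared(y, fit_nw), r_squared(y, fit_lin)
print(r2_nw, r2_lin)
```

In-sample R² from a smoother depends on the bandwidth and tends to be optimistic, so such values are best read as descriptive summaries rather than formal measures of fit.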

5 Discussion

Although the DSI method is illustrated in an engineering problem, it has potential in other application areas. For example, in marine ecology, DSI can be used to study the dependence between the yearly adult fish abundance, summarized from the observed fish abundances in spatial regions (α^TY), and the yearly larval abundance, summarized from observed daily spawning biomass (β^TX) (Chen et al., 2014). In portfolio construction, DSI can be used to study the relation between the asset return, summarized from the allocation of the available assets (α^TY), and the market return, summarized from market indices and macroeconomic variables (β^TX). In genomic research, DSI can be used to study the relation between the summary of gene expression profiles (α^TY) and the summary of single-nucleotide polymorphisms (β^TX) (Witten et al., 2009). More broadly, DSI can also be applied in many time series problems, where several random variables evolve together over time. In particular, the reduced-rank linear vector autoregressive (VAR) model is an important tool in modeling vector time series (Reinsel and Velu, 1998). It can be readily seen that the DSI model extends a unit-rank VAR model, in that the present value of an index of the vector time series has a nonlinear relationship with the past value of another index.

We have developed a flexible double single index model for exploring unspecified and possibly nonlinear functional relations between multivariate responses and predictors. There are many directions for future research. For example, our method can serve as a building block to study multi-index models, analogous to the multi-factor CCA or RRR methods. To go beyond these linear methods, one challenge is how to exhaustively extract pairs of indices for sufficient dimension reduction without imposing any specific form or restrictive assumption on their functional relations. To this end, multi-index modeling and estimation strategies similar to those in the sufficient dimension reduction literature are one possibility. Sequentially extracting the single index pairs from both the covariate and response variables is also worth careful investigation. To further facilitate variable selection and model interpretation, we can also consider regularized estimation in the DSI model, e.g., imposing a sparsity assumption on α and β so that the constructed pair of indices involves only a subset of the responses and the predictors (Chen et al., 2012; Chen and Huang, 2012; Bunea et al., 2012).

An alternative model related to the one considered here can be constructed by further assuming that the dependence of the response variable Y on the covariates X is completely captured by the dependence of a linear combination of Y on X. In other words, Y_r is independent of X conditional on α^TY. Although this assumption is stronger than that of the double single index model, it offers an interesting modeling approach and may have important applications. The estimation, efficiency and application of such a model are certainly worth exploring.

Several possibilities exist for model checking. The general idea is that because our estimation method enables the estimation of α and β, one can construct both indices. This reduces the multi-covariate multi-response problem to an effective uni-covariate uni-response problem and facilitates the application of several existing methods. For example, to check whether α̂^TY can be adequately modeled by a single index model using β̂^TX, many existing goodness-of-fit methods developed in the single index model framework can be applied (Stute and Zhu, 2005; Xia, 2008; Liang et al., 2010; Ma et al., 2014). In addition, a graphical check is possible: one only needs to plot the data cloud formed by the two indices and inspect whether the data cloud is compact along the response index. This exploratory tool is often used in the dimension reduction literature.

A potentially more fundamental problem is how to parsimoniously and flexibly approximate the multivariate conditional distribution of Y given X (Hall and Yao, 2005). Given the curse of dimensionality arising from nonparametric estimation with multiple indices, a sequential estimation procedure, which extracts double single index model structures sequentially to improve the current approximation of the conditional distribution, can be particularly useful. Built upon the proposed DSI model, such a strategy has great potential in advancing nonlinear modeling and scalable dimension reduction and is certainly on our research agenda.

Supplementary Material

Supplemental data: NIHMS816420-supplement-supplemental_data.pdf (269 KB, PDF)

Figure 2. Boxplots of d(α̂, α) and d(β̂, β) for Model II (n = 500).

Table 2.

Simulation results for Model II (n = 500).

		α₁	α₂	α₃	β₁	β₂	β₃	β₄
CCA1	ave	0.6085	0.0534	−0.1249	0.4247	−1.0752	−0.1941	0.1703
CCA1	std	5.3256	3.7520	4.8869	3.2969	3.3596	1.1036	1.0924
CCA*	ave	1.0022	1.0121	1.0252	−0.9870	0.8972	−1.2320	1.0596
CCA*	std	0.2406	0.2551	0.2683	2.5884	2.2488	5.2078	5.8762
RRR1	ave	−0.3430	−0.4070	−0.3075	0.7447	0.0802	0.0928	−0.0913
RRR1	std	0.5085	0.2223	0.4834	2.6632	2.7881	1.0480	0.9500
RRR*	ave	0.9439	1.0289	0.9611	−1.1452	0.8949	−1.1906	0.7960
RRR*	std	1.2258	0.6755	0.5541	3.7731	2.0927	5.3769	4.9484
SCA1	ave	−0.3326	−0.1935	−0.1858	0.9299	−0.8993	−0.5943	0.5953
SCA1	std	1.9992	1.8853	1.9391	2.1112	2.1041	0.5956	0.5970
SCA*	ave	1.0006	1.0005	0.9996	−0.9979	0.9922	−0.9995	0.9965
SCA*	std	0.0574	0.0383	0.0457	0.1929	0.1960	0.1233	0.1168
SIM	ave	1.0047	1.0026	1.0063	−1.0826	1.0878	−1.0949	1.0776
SIM	std	0.0350	0.0368	0.0365	0.1751	0.1675	0.1787	0.1777
SIM	\hat{std}	0.0359	0.0351	0.0355	0.1711	0.1656	0.1626	0.1681
SIM	95%	0.9260	0.9140	0.9220	0.9280	0.9320	0.9360	0.9300
LOC1	ave	1.0145	1.0133	1.0129	−0.9999	1.0015	−1.0005	1.0015
LOC1	std	0.0144	0.0160	0.0145	0.0406	0.0409	0.0401	0.0392
LOC1	\hat{std}	0.0186	0.0198	0.0203	0.0394	0.0393	0.0382	0.0389
LOC1	95%	0.9540	0.9820	0.9580	0.9420	0.9480	0.9640
LOC2	ave	1.0010	1.0012	1.0001	−1.0022	1.0031	−1.0021	1.0030
LOC2	std	0.0117	0.0115	0.0109	0.0323	0.0318	0.0323	0.0308
LOC2	\hat{std}	0.0113	0.0110	0.0109	0.0322	0.0318	0.0322
LOC2	95%	0.9580	0.9340	0.9660	0.9580	0.9560	0.9640	0.9580
LOC3	ave	1.0003	1.0011	1.0000	−0.9991	1.0009	−1.0002	1.0015
LOC3	std	0.0120	0.0118	0.0114	0.0386	0.0393	0.0390	0.0382
LOC3	\hat{std}	0.0110		0.0109	0.0386	0.0387	0.0377	0.0385
LOC3	95%	0.9380	0.9340	0.9480	0.9500	0.9640
OR	ave	1.0006	1.0011	1.0000	−1.0021	1.0029	−1.0019	1.0028
OR	std	0.0110		0.0103	0.0319	0.0314	0.0320	0.0304
OR	\hat{std}	0.0106		0.0105	0.0318	0.0320	0.0315	0.0319
OR	95%	0.9460	0.9420	0.9640	0.9560	0.9620	0.9600

Acknowledgments

This work was partially supported by the U.S. National Science Foundation (DMS-1206693), the U.S. National Institute of Neurological Disorders and Stroke (R01-NS073671), and the U.S. National Institutes of Health (U01-HL114494). The authors are grateful to the referees and the editors for their valuable comments and suggestions.

Contributor Information

Kun Chen, Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269, U.S.A.

Yanyuan Ma, Department of Statistics, University of South Carolina, 1523 Greene Street Columbia, SC 29208, U.S.A.

References

Anderson TW. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics. 1951;22:327–351.
Bunea F, She Y, Wegkamp M. Joint variable and rank selection for parsimonious estimation of high dimensional matrices. Annals of Statistics. 2012;40:2359–2388.
Chan K-S, Li M-C, Tong H. Partially linear reduced-rank regression. Technical Report, Department of Statistics, University of Iowa. 2004.
Chen K, Chan K-S, Stenseth NC. Reduced rank stochastic regression with a sparse singular value decomposition. Journal of the Royal Statistical Society: Series B. 2012;74:203–221.
Chen K, Chan K-S, Stenseth NC. Source-sink reconstruction through regularized multicomponent regression analysis–with application to assessing whether North Sea cod larvae contributed to local fjord cod in Skagerrak. Journal of the American Statistical Association. 2014;109:560–573.
Chen L, Huang JZ. Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association. 2012;107:1533–1545.
Fan J, Yao Q, Cai Z. Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society: Series B. 2003;65:57–80.
Gifi A. Nonlinear Multivariate Analysis. New York: John Wiley & Sons; 1990.
Hall P, Yao Q. Approximating conditional distribution functions using dimension reduction. Annals of Statistics. 2005;33:1404–1421.
Härdle W, Hall P, Ichimura H. Optimal smoothing in single-index models. Annals of Statistics. 1993;21:157–178.
He G, Müller H-G, Wang J-L. Functional canonical analysis for square integrable stochastic processes. Journal of Multivariate Analysis. 2003;85:54–77.
Hotelling H. Relations between two sets of variates. Biometrika. 1936;28:321–377.
Hsieh W. Nonlinear canonical correlation analysis by neural networks. Neural Networks. 2000;13:1095–1105. doi: 10.1016/s0893-6080(00)00067-8.
Iaci R, Sriram T. Robust multivariate association and dimension reduction using density divergences. Journal of Multivariate Analysis. 2013;117:281–295.
Iaci R, Sriram T, Yin X. Multivariate association and dimension reduction: a generalization of canonical correlation analysis. Biometrics. 2010;66:1107–1118. doi: 10.1111/j.1541-0420.2010.01396.x.
Ichimura H. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics. 1993;58:71–120.
Klein R, Shen C. Bias corrections in testing and estimating semiparametric, single index models. 2007.
Klein R, Vella F. A semiparametric model for binary response and continuous outcomes under index heteroscedasticity. Journal of Applied Econometrics. 2009:735–762.
Li B, Wen S, Zhu L. On a projective resampling method for dimension reduction with multivariate responses. Journal of the American Statistical Association. 2008;103:1177–1186.
Liang H, Liu X, Li R, Tsai C-L. Estimation and testing for partially linear single-index models. Annals of Statistics. 2010;38:3811–3836. doi: 10.1214/10-AOS835.
Ma S, Zhang J, Sun Z, Liang H. Integrated conditional moment test for partially linear single index models incorporating dimension-reduction. Electronic Journal of Statistics. 2014;8:523–542.
Ma Y, Chiou J-M, Wang N. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika. 2006;93:75–84.
Ma Y, Zhu L. A semiparametric approach to dimension reduction. Journal of the American Statistical Association. 2012;107:168–179. doi: 10.1080/01621459.2011.646925.
Ma Y, Zhu L. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B. 2013a;75:305–322. doi: 10.1111/j.1467-9868.2012.01040.x.
Ma Y, Zhu L. Efficient estimation in sufficient dimension reduction. Annals of Statistics. 2013b;41:250–268. doi: 10.1214/12-AOS1072SUPP.
Mandal A, Cichocki A. Non-linear canonical correlation analysis using Alpha-Beta divergences. Entropy. 2013;15:2788–2804.
Mukherjee A, Zhu J. Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining. 2011;4:612–622. doi: 10.1002/sam.10138.
Newey WK, Stoker TM. Efficiency of weighted average derivative estimators and index models. Econometrica. 1993:1199–1223.
Reinsel GC, Velu P. Multivariate reduced-rank regression: theory and applications. New York: Springer; 1998.
Stute W, Zhu L-X. Nonparametric checks for single-index models. Annals of Statistics. 2005;33:1048–1083.
Witten DM, Tibshirani RJ, Hastie TJ. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10:515–534. doi: 10.1093/biostatistics/kxp008.
Xia Y. A semiparametric approach to canonical analysis. Journal of the Royal Statistical Society: Series B. 2008;70:519–543.
Yeh I-C. Exploring concrete slump model using artificial neural networks. Journal of Computing in Civil Engineering. 2006;20:217–221.
Yeh I-C. Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites. 2007;29:474–480.
Yuan M, Ekici A, Lu Z, Monteiro R. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B. 2007;69:329–346.
Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. Journal of the American Statistical Association. 2014;109:977–990.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplemental data

NIHMS816420-supplement-supplemental_data.pdf^{(269KB, pdf)}
