Author manuscript; available in PMC: 2018 Mar 1.
Published in final edited form as: Scand Stat Theory Appl. 2016 Aug 22;44(1):1–20. doi: 10.1111/sjos.12238

Analysis of Double Single Index Models

Kun Chen 1, Yanyuan Ma 2
PMCID: PMC5352986  NIHMSID: NIHMS816420  PMID: 28316363

Abstract

Motivated from problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Since nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single-indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modeling and estimation procedure in a multi-covariate multi-response problem concerning concrete.

Keywords: Canonical correlation analysis, Reduced rank regression, Semiparametric efficiency, Single index models, Sufficient dimension reduction

1 Introduction

In scientific research and engineering, many statistical problems share a common goal of deciphering the associations between certain features and outcomes/responses from noisy data. When both the feature and response variables are multivariate, several different strategies exist to model their relations. Among the popular approaches are canonical correlation analysis (CCA) (Hotelling, 1936) and reduced rank regression (RRR) (Anderson, 1951; Reinsel and Velu, 1998; Mukherjee and Zhu, 2011), both of which are designed to examine possible linear association between the two sets of random variables.

Specifically, write the covariate vector X ∈ ℝp and the response vector Y ∈ ℝq, where p > 1, q > 1. CCA seeks linear combinations αTY and βTX that have maximum correlation with each other. In other words, CCA searches for unit length vectors α and β so that corr(αTY, βTX) is maximized. Because correlation is the sole criterion used to evaluate the closeness between αTY and βTX, CCA implicitly assumes a linear relation between these two quantities, or, at the very least, CCA is only interested in the linear relation between them. Similar to CCA, in the multivariate linear regression framework, the RRR model assumes a linear relation Y = CTX + ε* between the responses and covariates, where the coefficient matrix C ∈ ℝp×q is possibly of low rank, say, rank(C) = r ≤ min(p, q), and ε* is usually assumed to follow a multivariate normal distribution with mean zero. The main idea of RRR is to seek the best low-rank approximation of Y supervised by the covariate information in X, i.e., to minimize E{(Y − CTX)T(Y − CTX)} subject to rank(C) ≤ r. In the unit-rank case, the RRR model becomes Y = cαβTX + ε*. Here c is the first singular value of C and α, β are the first left and right singular vectors of CT respectively. This can be further written as αTY = cβTX + ε, where ε is a mean zero error term. Obviously, the linear relation between Y and X in RRR implies a linear relation between αTY and βTX. In fact, many commonly used multivariate techniques, including CCA, RRR and principal component analysis, are intrinsically related and all rely on certain linearity assumptions (Hotelling, 1936; Reinsel and Velu, 1998). Although in practice multiple linearly dependent pairs of directions can be retained from CCA or RRR, either sequentially or simultaneously, to focus on the main idea we restrict our attention to the extraction of a single pair of directions in this paper, following the spirit of the single index model.
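As a concrete illustration of the linear benchmark described above, the following is a minimal sketch of how the leading canonical pair could be computed from a sample, assuming NumPy is available. The function name `first_canonical_pair` and the small `ridge` term are illustrative choices and are not part of the original paper.

```python
import numpy as np

def first_canonical_pair(X, Y, ridge=1e-8):
    """Leading canonical pair for samples X (n x p) and Y (n x q).

    Returns (alpha, beta, rho): alpha weights Y, beta weights X, and rho is
    the leading canonical correlation. The ridge term is an assumption added
    only for numerical stability.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / n + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the leading singular vectors back to the original coordinates.
    beta = np.linalg.solve(Lx.T, U[:, 0])
    alpha = np.linalg.solve(Ly.T, Vt[0])
    return alpha, beta, s[0]
```

The returned directions are scaled so that the two indices have (approximately) unit sample variance; they can be rescaled to any other normalization without changing the correlation.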

In real world applications, linearity is often too strong an assumption when characterizing variable association, and nonlinearity inevitably arises, especially in multivariate settings. However, extending the available nonparametric techniques designed for a univariate response to a multivariate response is not straightforward, not only because of the curse of dimensionality, but also because of the difficulty in efficiently modeling the dependence structure among the response variables to fully embrace the multivariate nature of the problem. Many existing nonlinear methods originated from classical CCA and RRR (Gifi, 1990; Hsieh, 2000; He et al., 2003; Yuan et al., 2007; Mukherjee and Zhu, 2011). Xia (2008) proposed a semiparametric approach to CCA (SCA), in which the estimation was based on minimizing E{αTY − E(αTY | βTX)}2, where the conditional expectation E(· | ·) was estimated nonparametrically. SCA thus extends the classical CCA, as the latter simply assumes the conditional expectation to be linear. A similar approach is the generalized canonical correlation analysis proposed by Iaci et al. (2010); the method searches for the pair of indices by minimizing E{αTY − E(αTY | βTX)}2 + E{βTX − E(βTX | αTY)}2, treating the two sets of variables symmetrically. There have been several approaches that find (βTx, αTy) by maximizing a certain divergence measure between the joint distribution of (βTx, αTy) and the product of their marginal distributions. Iaci and Sriram (2013) proposed two families of multivariate association measures based on power divergence and alpha divergence, and Mandal and Cichocki (2013) proposed a generalized method of CCA called AB-canonical analysis using Alpha-Beta divergence. For extensions more related to RRR, Chan et al. (2004) studied the properties of a general semiparametric partial linear reduced-rank regression model, and Yuan et al. (2007) proposed a nonparametric low-rank factor model using regression splines. For other methods concerning the use of dimension reduction techniques to facilitate the exploration of multivariate nonlinear association, see Li et al. (2008) and the references therein.

Clearly, as soon as we venture into the territory beyond linearity, the possible multivariate association structures quickly become so rich and complex that it can even be infeasible to fully retrieve the true association structure. In this paper, to relax the assumption of linear association while still keeping the model tractable, and motivated by the sufficient dimension reduction literature, we introduce a flexible yet manageable modeling strategy, where we assume there exist α ∈ ℝq and β ∈ ℝp so that αTY relies on X through βTX, but we do not impose a linear relation or any specific functional link between αTY and βTX. Specifically, we only assume

$$f_{\alpha^{T}Y \mid X}(\alpha^{T}y, x) = f_{\alpha^{T}Y \mid \beta^{T}X}(\alpha^{T}y, \beta^{T}x). \tag{1}$$

Here, fαTY|X stands for the probability density function of αTY conditional on X, and fαTY|βTX is similarly defined. The model described in (1) is what we name the double single index model (DSI), for the obvious reason that there are two single indices described by α and β respectively. Our proposal has several key ingredients. First, a variable index is often of practical interest and admits meaningful interpretation, and thus this desirable feature is retained in our model, the same as in single index models (Ichimura, 1993). Moreover, searching for a pair of associated indices is the essential objective in many real-world multivariate problems, see for example, Witten et al. (2009), Zhu et al. (2014) and Chen et al. (2014). Second, to allow flexibility in the process of pursuing nonlinearity, we do not intend to characterize the association between αTY and βTX. Instead, we aim to extract a relatively simple yet meaningful one-dimensional association between the response variables and the predictors. The DSI model is directly built on the conditional distribution of the variables, in contrast to many methods that only model the mean association structure. Third, in our approach, the desired simple DSI structure is perceived as lurking beneath other parts of the multivariate association of no direct interest. As such, these other parts are treated as nuisance and left unspecified. In estimation, only a working model is needed and the estimation is not sensitive to its misspecification; see Sections 2 and 3 for details. This is a rather important feature both practically and conceptually, especially given that modern data are obtained with ever increasing complexity and yet often only a few summary features of the data contribute to the actual knowledge discovery.

The DSI model has connections to several familiar multivariate models. For example, in the special case when q = 1, the DSI model in (1) reduces to the familiar single index model (Ichimura, 1993; Hardle et al., 1993). In addition, DSI is an extension and generalization of CCA and RRR, in that it allows the association between αTY and βTX to be nonlinear. As DSI is specified from conditional distribution, it is more comprehensive than the SCA model which only concerns the association in the mean. We also avoid specifying and modeling other possibly intractable dependence structures between Y and X. The DSI model is also related to the multivariate response sufficient dimension reduction model (Li et al., 2008). In this context, when the structural dimension is one, the model assumes that Y depends on X through βTX, i.e., fY|X(y, x) = fY|βTX(y, βTx). This automatically leads to fαTY|X(αTy, x) = fαTY|βTX(αTy, βTx) for any α. Now if we relax the requirement so that this relation only holds for some specific α instead of all α, then we obtain the DSI model (1).

The DSI model described in (1) arises naturally in practice. In civil engineering, it is an important topic to study the association between the quality of concrete and its composition (Yeh, 2006). Concrete is a highly complex material and consists of a mixture of several ingredients, including cement, fly ash, blast-furnace slag, water, superplasticizer and aggregate. To summarize the composition of concrete is to study the proportion of these different ingredients, hence a natural way to summarize them is via their linear combination. On the other hand, in terms of quality, concrete is also measured in different aspects. Generally speaking, concrete that has high consistency in its fresh state and high strength in its hardened state has the properties of stability and durability, and is thus considered to be of high quality. The various aspects of concrete quality include strength, stability, durability, etc., and can be summarized into a linear combination of these individual properties. It is thus natural to apply DSI to explore whether a quality index of the concrete (αTY) exhibits some interesting linear/nonlinear relationship with a certain composition index of the concrete (βTX). In Section 4, we analyze a data example concerning concrete to further demonstrate the application of DSI.

Given that we can estimate the linear combination coefficients in α, β, we can subsequently perform a classical univariate-covariate univariate-response nonparametric regression to identify the functional relationship between the two indices. Our proposed method thus provides a useful exploratory tool for examining potential nonlinear associations between two sets of variables.

2 Methodology

To ensure identifiability of α and β, we fix the last component of α and β to be 1 and require (1) to hold at unique α and β locally. We write α=(αuT,1)T and β=(βuT,1)T. The requirement can be easily satisfied by reordering the components in Y and X if necessary. We point out that the parameterization of requiring unit length of an index with positive first component and that of requiring a fixed component to be one are both commonly used in the literature (Newey and Stoker, 1993; Klein and Shen, 2007; Klein and Vella, 2009); here we choose the latter to enable the semi-parametric analysis and computation to be carried out in a more straightforward way. Under this parameterization, our interest is then exclusively in the q − 1 dimensional vector αu and the p − 1 dimensional vector βu. Here the subindex u stands for unknown. Let γ=(αuT,βuT)T be the unknown parameter of interest.
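The two parameterizations mentioned above can be illustrated with a short sketch, assuming NumPy. The helper names `fix_last_component` and `to_unit_length` are hypothetical and only serve to show how an index vector would be rescaled under each convention.

```python
import numpy as np

def fix_last_component(v):
    """Rescale an index vector so that its last component equals 1.

    This is the parameterization adopted in the text; it requires the last
    component to be nonzero (reorder the variables otherwise).
    """
    v = np.asarray(v, dtype=float)
    if np.isclose(v[-1], 0.0):
        raise ValueError("last component is zero; reorder the variables first")
    return v / v[-1]

def to_unit_length(v):
    """Alternative parameterization: unit length, nonnegative first component."""
    v = np.asarray(v, dtype=float)
    u = v / np.linalg.norm(v)
    return u if u[0] >= 0 else -u
```

Both conventions describe the same direction; only the free parameters differ (here, the first q − 1 or p − 1 components after fixing the last one to 1).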

To provide a more direct and intuitive example of the model and its identifiability in (1), we consider the case when αu is zero. This occurs when the last component Yq depends on X through a single index β, while all other components in Y, i.e. Y1, …, Yq−1 depend on X through structures more complex than the single index model. For example, Yk = mk(XkXk+1) + εk for k = 1, …, q − 1, where εk is a mean zero random variable independent of X and mk is a non-constant function. Having understood the component-wise model corresponding to the special α, we can then generalize the situation to the case when the response variable is further rotated and stretched by incorporating a general α. Further, if we restrict our interest on α in a local neighborhood, we can allow more components of Y to depend on X through single indices, as long as different components of Y correspond to different single indices. In this case, in a local neighborhood, only one of these single index structures, corresponding to one particular component of Y, will be of interest. With the additional rotation and stretching, only one linear combination of Y will be captured by α hence the problem is locally identifiable.

Following model (1), we write out the likelihood at one typical observation as

$$f_{X,Y}(x, y) = \eta_1(x)\,\eta_2(\alpha^{T}y, \beta^{T}x)\,\eta_3(y_r, \alpha^{T}y, x).$$

Here yr is the vector of the first q − 1 components of y, η1 represents the probability density function (pdf) of X and η3 represents the pdf of Yr conditional on αTY and X. We use η2 to represent the pdf of αTY conditional on X, which by the model assumption in (1) is a function of αTy and βTx only. Note that η1, η2, η3 are all unknown. It is now clear that (1) can be viewed as a semiparametric model where the parameter of interest is γ and η1, η2, η3 are three nuisance parameters. We thus use semiparametric analysis tools to derive the nuisance tangent space Λ and its orthogonal complement Λ⊥. The details of the derivation are in the supplementary materials, where we obtain the conclusion that

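$$\Lambda^{\perp} = \left\{ b(\alpha^{T}Y, X) - E(b \mid \alpha^{T}y, \beta^{T}x) : b(\alpha^{T}Y, X) \in \mathbb{R}^{p+q-2} \ \text{s.t.}\ E(b \mid \beta^{T}x) = E(b \mid x) \right\}.$$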

This result somewhat resembles the results in Ma and Zhu (2012), where their univariate Y is replaced by αTY here. Thus, the constructions there can be applied here as well by replacing all the instances of Y with αTY. Let a and ai’s be arbitrary functions of x, while g and gi’s be arbitrary functions of αTy and βTx. Here a, ai, g, gi can be scalar, vector or matrix functions as long as their dimensions conform and the dimension of their product is p + q − 2, i.e. ga ∈ ℝp+q−2 and giai ∈ ℝp+q−2 for i = 1, …, k. Since

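$$E\left[ \{ g(\alpha^{T}Y, \beta^{T}X) - E(g \mid \beta^{T}X) \} \{ a(X) - E(a \mid \beta^{T}X) \} \right] = 0 \tag{2}$$

and

$$E\left[ \sum_{i=1}^{k} \{ g_i(\alpha^{T}Y, \beta^{T}X) - E(g_i \mid \beta^{T}X) \} \{ a_i(X) - E(a_i \mid \beta^{T}X) \} \right] = 0, \tag{3}$$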

we can use the functions inside the above expectations to construct root-n consistent estimators. The construction contained in (2) and (3) possesses a nice double robustness property, in that between the two expectations E{g(αTy, βTx) | βTx} (or E{gi(αTy, βTx) | βTx}) and E{a(x) | βTx} (or E{ai(x) | βTx}), as long as we calculate one of them correctly, we are free to mis-specify the other and the consistency of the estimating function will still be retained. That is, for instance in (2), we have

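$$E\left[ \{ g(\alpha^{T}y, \beta^{T}x) - E(g \mid \beta^{T}x) \} \{ a(x) - h(\beta^{T}x) \} \right] = 0,$$

and

$$E\left[ \{ g(\alpha^{T}y, \beta^{T}x) - h(\beta^{T}x) \} \{ a(x) - E(a \mid \beta^{T}x) \} \right] = 0$$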

for any function h(βTx). However, different from the practice in Ma and Zhu (2012), we summarize the theoretical results of estimating γ based on (2) in Theorem 2.1, where the matrix A is required to have rank p + q − 2.
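The double robustness property can be checked in two lines by iterated expectations. The following is a sketch of the argument, relying only on the fact that, under model (1), the conditional distribution of αTY given X depends on X through βTX alone, so that E(g | X) = E(g | βTX) for any function g of (αTY, βTX). For the first identity, condition on X:

$$
\begin{aligned}
E\left[\{g - E(g \mid \beta^{T}X)\}\{a(X) - h(\beta^{T}X)\}\right]
&= E\left[\{E(g \mid X) - E(g \mid \beta^{T}X)\}\{a(X) - h(\beta^{T}X)\}\right] = 0 .
\end{aligned}
$$

For the second identity, condition first on X and then on βTX:

$$
\begin{aligned}
E\left[\{g - h(\beta^{T}X)\}\{a(X) - E(a \mid \beta^{T}X)\}\right]
&= E\left[\{E(g \mid \beta^{T}X) - h(\beta^{T}X)\}\{a(X) - E(a \mid \beta^{T}X)\}\right] \\
&= E\left[\{E(g \mid \beta^{T}X) - h(\beta^{T}X)\}\, E\{a(X) - E(a \mid \beta^{T}X) \mid \beta^{T}X\}\right] = 0 .
\end{aligned}
$$

In each case only one of the two conditional expectations needs to be correct, which is exactly the double robustness described above.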

Theorem 2.1

Under the regularity conditions C1–C6 listed in the supplement A.2, the estimator γ̂ from the estimating equation

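$$\sum_{i=1}^{n} \{ g(\hat\alpha^{T}y_i, \hat\beta^{T}x_i) - \hat E(g \mid \hat\beta^{T}x_i) \} \{ a(x_i) - \hat E(a \mid \hat\beta^{T}x_i) \} = 0$$

is consistent, i.e.

$$\hat\gamma - \gamma \to 0$$

in probability when n → ∞. In addition, the estimator satisfies

$$\sqrt{n}\,A(\hat\gamma - \gamma) \to N(0, B)$$

in distribution when n → ∞. Here

$$A = E\left( \partial\,\mathrm{vec}\left[ \{ g(\alpha^{T}y, \beta^{T}x) - E(g \mid \beta^{T}x) \} \{ a(x) - E(a \mid \beta^{T}x) \} \right] / \partial\gamma^{T} \right),$$
$$B = \mathrm{cov}\left( \mathrm{vec}\left[ \{ g(\alpha^{T}y, \beta^{T}x) - E(g \mid \beta^{T}x) \} \{ a(x) - E(a \mid \beta^{T}x) \} \right] \right).$$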

Theorem 2.1 implies an interesting phenomenon, in that although we estimated the two expectations conditional on β̂Tx nonparametrically, the corresponding estimation causes no effect on the final asymptotic properties of γ̂. In other words, if we had known how to obtain E(a | β̂Tx) and E(g | β̂Tx) exactly, the estimation of γ would not have been improved further. This nice property is a direct result of the double centering form of the estimating equation in Theorem 2.1, where we centered both g and a through subtracting their respective mean conditional on β̂Tx, before multiplication. Similar practice has been used in other models in the partially linear model related literature (Ma et al., 2006; Ma and Zhu, 2013a) and sufficient dimensional reduction literature (Ma and Zhu, 2012). How the double centering operation leads to this property is clearly shown in the proof of Theorem 2.1, especially through Lemma 1.2, given in the supplement A.3. It is also clear from the derivation in the supplement A.3 that if we had taken advantage of the double robustness property mentioned before and had estimated only one expectation faithfully while using an arbitrary h(βTx) to replace the other expectation, then the fact that we had to estimate the conditional expectation would have led to an alteration of the variability in estimating γ̂.

We now further investigate the efficient estimation issue through calculating the score and the efficient score. First, straightforward calculation yields

$$S_{\beta}(\alpha^{T}y, \beta^{T}x) = \frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)}\, x_r,$$

where we use xr to denote the vector of the first p − 1 components of x. Now projecting Sβ onto Λ⊥, we obtain

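$$S_{\mathrm{eff}\beta}(\alpha^{T}y, \beta^{T}x) = \frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)} \{ x_r - E(X_r \mid \beta^{T}x) \}.$$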

This is because we can easily verify that Seff β(αTy, βTx) ∈ Λ⊥ and {∂logη2(αTy, βTx)/∂(βTx)}E(Xr | βTx) ∈ Λ, based on the description of Λ and Λ⊥ in supplement A.1. We now further calculate

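$$S_{\alpha}(y_r, \alpha^{T}y, x) = \frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\alpha^{T}y)}\, y_r + \frac{\partial \log\eta_3(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r .$$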

Projecting Sα onto Λ⊥, we obtain Seff α(αTy, βTx) = E(Sα | αTy, x) − E(Sα | αTy, βTx). This is because

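$$E\{E(S_{\alpha} \mid \alpha^{T}y, x) \mid x\} = E(S_{\alpha} \mid x) = \int\!\!\int \frac{\partial \{ \eta_2(\alpha^{T}y, \beta^{T}x)\, \eta_3(y_r, \alpha^{T}y, x) \}}{\partial(\alpha^{T}y)}\, d(\alpha^{T}y)\, y_r\, dy_r = 0,$$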

hence E{E(Sα | αTy, x) | x} = E{E(Sα | αTy, x) | βTx}, which implies Seff α ∈ Λ⊥. On the other hand, Sα − E(Sα | αTy, x) ∈ Λ3 and E{E(Sα | αTy, βTx) | βTx} = E(Sα | βTx) = 0, hence Sα − Seff α ∈ Λ. The projection of Sα onto Λ⊥ is therefore indeed given by Seff α. Specifically, we obtain

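$$
\begin{aligned}
S_{\mathrm{eff}\alpha}(\alpha^{T}y, \beta^{T}x) ={}& \frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\alpha^{T}y)} \{ E(y_r \mid \alpha^{T}y, x) - E(y_r \mid \alpha^{T}y, \beta^{T}x) \} \\
&+ E\left\{ \frac{\partial \log\eta_3(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, x \right\} - E\left\{ \frac{\partial \log\eta_3(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, \beta^{T}x \right\}.
\end{aligned}
$$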

Combining the two calculations, we have $S_{\mathrm{eff}} = (S_{\mathrm{eff}\alpha}^{T}, S_{\mathrm{eff}\beta}^{T})^{T}$.

Unfortunately, the estimation of E(yr | αTy, x) and η3(yr, αTy, x) is subject to the curse of dimensionality due to the presence of x (as well as yr for η3(yr, αTy, x)). Hence the efficient estimator is unreachable in practice. This is in contrast to Ma and Zhu (2013b), where only a univariate Y is concerned. However, we can use the form of Seff to construct locally efficient estimators using a working model of η3. Although we can estimate η2, considering that in any case we cannot guarantee efficiency, we will use a working model of η2 as well. To this end, we propose to posit the working models $\eta_2^{*}(\alpha^{T}y, \beta^{T}x)$ and $\eta_3^{*}(y_r, \alpha^{T}y, x)$. We then construct the locally efficient estimators from $S_{\mathrm{eff}}^{*} = (S_{\mathrm{eff}\alpha}^{*T}, S_{\mathrm{eff}\beta}^{*T})^{T}$, where

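$$
\begin{aligned}
S_{\mathrm{eff}\alpha}^{*} ={}& \frac{\partial \log\eta_2^{*}(\alpha^{T}y, \beta^{T}x)}{\partial(\alpha^{T}y)} \left[ E^{*}(y_r \mid \alpha^{T}y, x) - E\{ E^{*}(y_r \mid \alpha^{T}y, x) \mid \alpha^{T}y, \beta^{T}x \} \right] \\
&+ E^{*}\left\{ \frac{\partial \log\eta_3^{*}(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, x \right\} - E\left[ E^{*}\left\{ \frac{\partial \log\eta_3^{*}(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, x \right\} \Big|\, \alpha^{T}y, \beta^{T}x \right],
\end{aligned}
$$

and

$$S_{\mathrm{eff}\beta}^{*} = \left[ \frac{\partial \log\eta_2^{*}(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)} - E\left\{ \frac{\partial \log\eta_2^{*}(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)} \,\Big|\, \beta^{T}x \right\} \right] \{ x_r - E(X_r \mid \beta^{T}x) \}.$$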

We can obtain the locally efficient estimator through using Seff*. Specifically, use Oi to denote the ith observation, and use Seff*(Oi;γ,Ê) to denote the efficient score evaluated at Oi, with E replaced by its kernel estimator Ê. The estimator γ̂ = (α̂T, β̂T)T satisfies

$$\sum_{i=1}^{n} S_{\mathrm{eff}}^{*}(O_i; \hat\gamma, \hat E) = 0. \tag{4}$$

We show that γ̂ is locally efficient, i.e., it is efficient when η2 and η3 are correctly specified; otherwise it is still consistent and asymptotically normal.

Theorem 2.2

Under the regularity conditions B1–B5 listed in the supplement A.4, the estimator γ̂ from the estimating equation (4) is locally efficient. Specifically, when n → ∞,

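$$\sqrt{n}\,(\hat\gamma - \gamma) \to N(0, A^{-1} B A^{-1T}),$$

where $A = E\{\partial S_{\mathrm{eff}}^{*}(O_i; \gamma)/\partial\gamma^{T}\}$ and $B = E[\{S_{\mathrm{eff}}^{*}(O_i; \gamma) - u^{*}(x_i; \gamma)\}^{\otimes 2}]$. Here,

$$u^{*}(x_i; \gamma) \equiv \int b^{*}(\alpha^{T}y, x_i)\, \eta_2(\alpha^{T}y, \beta^{T}x_i)\, d(\alpha^{T}y) - \int \frac{\int_{\beta^{T}x = \beta^{T}x_i} b^{*}(\alpha^{T}y, x) f_x(x)\, dx\; \eta_2(\alpha^{T}y, \beta^{T}x_i)}{f(\beta^{T}x_i)}\, d(\alpha^{T}y).$$

In addition, when $\eta_2^{*}(\alpha^{T}y, \beta^{T}x) = \eta_2(\alpha^{T}y, \beta^{T}x)$ and $\eta_3^{*}(y_r, \alpha^{T}y, \beta^{T}x) = \eta_3(y_r, \alpha^{T}y, \beta^{T}x)$, we have $A = B = E\{S_{\mathrm{eff}}^{\otimes 2}(O_i; \gamma)\}$, and the estimator is efficient. Here $a^{\otimes 2} \equiv aa^{T}$ for any vector or matrix $a$.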
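For inference, the asymptotic variance in Theorem 2.2 has the familiar sandwich form. The sketch below, assuming NumPy, shows how plug-in standard errors and 95% confidence intervals could be assembled once estimates of A and of the terms inside B are available. The function names and the inputs `A_hat` and `influence_terms` are hypothetical; how these quantities are actually computed is not specified here.

```python
import numpy as np

def sandwich_covariance(A_hat, influence_terms):
    """Plug-in estimate of cov(gamma_hat) = A^{-1} B A^{-T} / n.

    A_hat: (d, d) estimate of the derivative matrix A.
    influence_terms: (n, d) array whose ith row estimates
        S*_eff(O_i; gamma) - u*(x_i; gamma).
    """
    n = influence_terms.shape[0]
    B_hat = influence_terms.T @ influence_terms / n  # average outer product
    A_inv = np.linalg.inv(A_hat)
    return A_inv @ B_hat @ A_inv.T / n

def wald_intervals(gamma_hat, cov_hat, z=1.96):
    """Componentwise normal-approximation intervals (z = 1.96 for 95%)."""
    se = np.sqrt(np.diag(cov_hat))
    return gamma_hat - z * se, gamma_hat + z * se
```

Coverage figures such as those reported in the simulation tables correspond to the proportion of replications in which such an interval contains the true parameter value.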

The locally efficient estimator is implemented as follows. To simplify the description, we first define the functions

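$$m_1(\beta^{T}x) \equiv E(x_r \mid \beta^{T}x),$$
$$m_2(\beta^{T}x) \equiv E\left\{ \frac{\partial \log\eta_2^{*}(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)} \,\Big|\, \beta^{T}x \right\},$$
$$m_3(\alpha^{T}y, \beta^{T}x) \equiv E\{ b^{*}(\alpha^{T}y, x) \mid \alpha^{T}y, \beta^{T}x \},$$

where

$$b^{*}(\alpha^{T}y, x) \equiv \frac{\partial \log\eta_2^{*}(\alpha^{T}y, \beta^{T}x)}{\partial(\alpha^{T}y)}\, E^{*}(y_r \mid \alpha^{T}y, x) + E^{*}\left\{ \frac{\partial \log\eta_3^{*}(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, x \right\}.$$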

The Nadaraya-Watson kernel estimators of m1, m2 and m3 are respectively

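$$\hat m_1(\beta^{T}x) = \frac{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}\, x_{ri}}{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}},$$
$$\hat m_2(\beta^{T}x) = \frac{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}\, \partial \log\eta_2^{*}(\alpha^{T}y_i, \beta^{T}x_i)/\partial(\beta^{T}x_i)}{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}},$$
$$\hat m_3(\alpha^{T}y, \beta^{T}x) = \frac{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}\, b^{*}(\alpha^{T}y, x_i)}{\sum_{i=1}^{n} K_h\{\beta^{T}(x - x_i)\}},$$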

where h is a bandwidth. To emphasize the dependence on m1, m2, m3, we can write the locally efficient score function

$$S_{\mathrm{eff}}^{*}\{ \alpha^{T}y, \beta^{T}x, m_1(\beta^{T}x), m_2(\beta^{T}x), m_3(\alpha^{T}y, \beta^{T}x) \}.$$

The locally efficient estimator can then be obtained in practice through solving the estimating equation

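$$\sum_{i=1}^{n} S_{\mathrm{eff}}^{*}\{ \alpha^{T}y_i, \beta^{T}x_i, \hat m_1(\beta^{T}x_i), \hat m_2(\beta^{T}x_i), \hat m_3(\alpha^{T}y_i, \beta^{T}x_i) \} = 0.$$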

As long as the bandwidth h is fixed, the only unknown quantity in the estimating equation is γ. The estimating equation can be solved by standard methods such as the Newton-Raphson algorithm or the trust region method. Because a wide range of bandwidths all lead to the same asymptotic result (see condition B4 in the supplement and Theorem 2.2), the estimator is quite insensitive to the bandwidth even in finite samples. Thus, we can simply use $h = n^{-1/5}$ in the implementation. One can certainly perform cross validation and use a separate bandwidth for each specific nonparametric regression, at the cost of selecting more bandwidths. We have implemented our method in MATLAB, where the Newton-Raphson algorithm is applied and numerical differences are used to approximate the derivative functions. In our limited experience, the computation is stable and fast.
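The kernel step is simple to prototype. The sketch below, assuming NumPy, evaluates a Nadaraya-Watson estimate of a conditional mean given the index βTx at the observed index values, with the default bandwidth h = n^(−1/5) mentioned above. The Gaussian kernel and the function names are illustrative assumptions; the authors' own implementation is in MATLAB and is not reproduced here.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_conditional_mean(index, values, h=None):
    """Nadaraya-Watson estimate of E(values | index) at each observed index.

    index: (n,) array of beta^T x_i.
    values: (n,) or (n, d) array, e.g. the rows x_{ri} for m_1.
    h: bandwidth; defaults to n^(-1/5).
    """
    index = np.asarray(index, dtype=float)
    values = np.asarray(values, dtype=float)
    n = index.shape[0]
    if h is None:
        h = n ** (-1.0 / 5.0)
    # W[j, i] = K_h(index_j - index_i) with K_h(u) = K(u / h) / h.
    W = gaussian_kernel((index[:, None] - index[None, :]) / h) / h
    # Each row sum includes the term K_h(0) > 0, so it is strictly positive.
    denom = W.sum(axis=1)
    num = W @ values
    return num / (denom[:, None] if values.ndim == 2 else denom)
```

For example, `nw_conditional_mean(X @ beta, X[:, :-1])` would give the estimates of m1 at the n observed index values for a candidate β. The full n × n weight matrix is convenient for moderate n but would need to be replaced by a blocked computation for large samples.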

Having estimated γ, we can perform nonparametric regression of α̂TY on β̂TX to further estimate η2, following, for example, Fan et al. (2003). Because γ is estimated at the parametric rate of root-n, the estimation of η2 will have the usual nonparametric estimation rate, and its first order asymptotic properties are the same as that of the estimation of η2 using the true parameter γ. Because the derivation and the results of the nonparametric procedure are standard, we omit the details.

Trimming (Ichimura, 1993) is often needed in nonparametric estimation to handle the potential issue of dividing by zero. However, trimming is avoided here because we only need the nonparametric evaluations at the observed βTxi, where the kernel sum in the denominator is always positive since the ith observation is included in the estimator. Further, condition C3 guarantees the density of βTX to be bounded away from zero. Thus, when the sample size is sufficiently large, the estimated density is also bounded away from zero.

In practice, to specify η2* and η3*, we suggest the following. First, use simpler methods such as CCA or SCA to obtain starting values (α̃, β̃) for DSI. Then, η2* can be specified based on the empirical conditional distribution between the leading canonical pairs α̃Ty and β̃Tx, e.g., αTy ~ N(aβTx + b, σ2), where a, b and σ2 are estimated from a regression analysis between α̃Ty and β̃Tx. Our numerical results suggest that η3* can be specified more crudely; see Section 3. For example, we can specify η3* by assuming the components of yr are independent conditional on α̃Ty and x and conducting regression analysis between yr and some linear/nonlinear functions of α̃Ty and x.
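The suggested normal working model for η2* can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy, in which `u` stands for the starting index β̃Tx and `v` for α̃Ty; the function names are hypothetical. The second helper returns the two partial derivatives of log η2* that enter the locally efficient score.

```python
import numpy as np

def fit_normal_working_model(u, v):
    """Fit v ~ N(a * u + b, sigma2) by least squares.

    u: (n,) values of the starting predictor index.
    v: (n,) values of the starting response index.
    Returns (a, b, sigma2), with sigma2 the residual variance estimate.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    design = np.column_stack([u, np.ones_like(u)])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    resid = v - design @ coef
    sigma2 = resid @ resid / (len(v) - 2)
    return coef[0], coef[1], sigma2

def dlog_eta2_star(v, u, a, b, sigma2):
    """Derivatives of log N(v; a * u + b, sigma2) with respect to v and u."""
    mu = a * u + b
    d_v = (mu - v) / sigma2       # derivative in the response index
    d_u = a * (v - mu) / sigma2   # derivative in the predictor index
    return d_v, d_u
```

Because the working model only needs to be plausible, not correct, a constant variance is used here even when the true conditional variance may depend on the index.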

3 Simulation

3.1 Setups

We conduct simulation studies to evaluate the finite sample performance of the proposed methods. For comparison, the most relevant nonlinear approach to our method is the semiparametric canonical correlation analysis (SCA) proposed by Xia (2008). SCA searches for the pair of indices by minimizing E{αTY − E(αTY | βTX)}2, and the procedure involves the estimation of E(Yi | X) and its derivatives using d-th order local polynomial smoothing, where d > p/2 + 1 in order to achieve root-n consistency. Several classical multivariate tools based on certain linearity assumptions can also be applied for such a two-way search, with CCA and RRR as the popular prototypes. We thus compare the proposed DSI approach to CCA, RRR and SCA.

We set p = 5, q = 4, α = (1, 1, 1, 1)T and β = (1, −1, 1, −1, 1)T in all the simulation examples. The process of generating a typical observation (x, y) is as follows.

  1. Generate x from η1, the marginal distribution of X.

  2. Compute βTx and generate αTy from η2, the conditional distribution of αTY given βTx.

  3. Generate yr from η3, the conditional distribution of Yr given αTy and x.

  4. Compute yq from the generated values αTy and yr, i.e., $y_q = \alpha^{T}y - \alpha_u^{T}y_r$. Let $y = (y_r^{T}, y_q)^{T}$.

We set η1 as the standard multivariate normal distribution; in practice, with the components of X correlated, one may orthogonalize the variables before pursuing sufficient dimension reduction. We consider three models with different choices of η2:

  • Model I: η2 is the normal distribution with mean μη = βTx and variance ση2=4.

  • Model II: η2 is the normal distribution with mean μη = (βTx)2 and variance ση2=6.

  • Model III: η2 is the normal distribution with mean μη = (βTx)2 and variance ση2=σ2 exp(βTx)/3) where σ2 = 6.

In each of the above models, η3 is set as the multivariate normal distribution with mean vector μr = (μr,1, …, μr,q−1)T with

$$\mu_{r,i} = \alpha^{T}y/q + 2\sin(\alpha^{T}y) + a(h_i^{T}x) + b(h_i^{T}x)^{2}, \quad i = 1, \ldots, q - 1,$$

and covariance matrix 4I, where the hi's are orthonormal vectors that are also orthogonal to β. The constants a and b are chosen to control the marginal correlation structure of Y. Specifically, we set a = 3, b = 3 in Model I, and a = 3, b = 9 in both Models II and III, so that the correlations among the Yi, i = 1, …, q are roughly at or below 0.6 in magnitude. These setups ensure that αTY is not dominated by any particular coordinate in Y and that it is indeed the desired simple direction, i.e., a direction in Y that is associated with a one-dimensional sufficient dimension reduction subspace in X. For the above models, it can be conveniently shown that
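The four generation steps and the Model I specification can be put together in a few lines. The following is a sketch assuming NumPy; the particular vectors hi are not given in the text, so the QR-based construction below is only one possible choice, and the function name `simulate_model_one` is hypothetical.

```python
import numpy as np

def simulate_model_one(n, rng, a=3.0, b=3.0):
    """Draw n observations (x, y) following the Model I description.

    Returns X (n x 5), Y (n x 4), alpha and beta.
    """
    p, q = 5, 4
    alpha = np.ones(q)
    beta = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
    # Assumed construction of h_1, ..., h_{q-1}: QR of [beta, e_1, ..., e_4]
    # gives orthonormal columns, all but the first orthogonal to beta.
    Q, _ = np.linalg.qr(np.column_stack([beta, np.eye(p)[:, : p - 1]]))
    H = Q[:, 1:q]
    # Step 1: x from eta_1 (standard multivariate normal).
    X = rng.standard_normal((n, p))
    # Step 2: alpha^T y from eta_2 = N(beta^T x, 4).
    index_x = X @ beta
    index_y = index_x + 2.0 * rng.standard_normal(n)
    # Step 3: y_r from eta_3 = N(mu_r, 4 I).
    HX = X @ H
    mu_r = (index_y / q + 2.0 * np.sin(index_y))[:, None] + a * HX + b * HX ** 2
    Y_r = mu_r + 2.0 * rng.standard_normal((n, q - 1))
    # Step 4: y_q = alpha^T y - alpha_u^T y_r, so that Y @ alpha == index_y.
    y_q = index_y - Y_r @ alpha[:-1]
    Y = np.column_stack([Y_r, y_q])
    return X, Y, alpha, beta
```

A call such as `simulate_model_one(500, np.random.default_rng(0))` produces one replicate of size n = 500. Models II and III would differ only in the mean and variance used in step 2.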

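$$\frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\alpha^{T}y)} = \frac{\mu_\eta - \alpha^{T}y}{\sigma_\eta^{2}},$$
$$\frac{\partial \log\eta_2(\alpha^{T}y, \beta^{T}x)}{\partial(\beta^{T}x)} = -\frac{1}{2} \frac{\partial \log\sigma_\eta^{2}}{\partial(\beta^{T}x)} - \frac{2(\mu_\eta - \alpha^{T}y) \dfrac{\partial \mu_\eta}{\partial(\beta^{T}x)}\, \sigma_\eta^{2} - (\mu_\eta - \alpha^{T}y)^{2} \dfrac{\partial \sigma_\eta^{2}}{\partial(\beta^{T}x)}}{2\sigma_\eta^{4}},$$

and

$$E\left\{ \frac{\partial \log\eta_3(y_r, \alpha^{T}y, x)}{\partial(\alpha^{T}y)}\, y_r \,\Big|\, \alpha^{T}y, x \right\} = \frac{\partial \mu_r}{\partial(\alpha^{T}y)}.$$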

The proposed locally efficient estimator is obtained by solving the estimating equation (4) with potentially misspecified η2 and η3. For all three simulation examples, we set η3*, the working model for η3, as normal with mean vector $(\alpha^{T}y/q + x_1^{2}, \ldots, \alpha^{T}y/q + x_q^{2})^{T}$ and identity variance-covariance matrix. We set η2*, the working model of η2, as $N(\mu_\eta = 2\beta^{T}x, \sigma_\eta^{2} = 9)$ in Model I and $N(\mu_\eta = |\beta^{T}x|, \sigma_\eta^{2} = 9)$ in Models II and III. Three locally efficient estimators are constructed: the first is based on η2* and η3* (LOC1), the second on η2 and η3* (LOC2), and the third on η2* and η3 (LOC3). When both η2 and η3 are correctly specified, we obtain an efficient oracle estimator (OR). Since η2 and η3 are usually unknown in real problems, LOC2, LOC3 and OR are not feasible in practice, but here they serve as benchmarks to examine the effects of model misspecification. Based on Theorem 2.1, we also construct a simple consistent estimator (SIM), in which we choose g(αTy, βTx) = E(x | αTy) and a(x) = xT. As we focus on the single index setup, the first leading pair of canonical variables is extracted from CCA, and a unit-rank estimator is obtained from RRR; the resulting estimators are denoted CCA1 and RRR1, respectively. For CCA and RRR, we also extracted min(p, q) pairs of directions and recorded the one that is closest to the true pair as measured by ‖α̂(α̂Tα̂)−1α̂T − α(αTα)−1αT‖F + ‖β̂(β̂Tβ̂)−1β̂T − β(βTβ)−1βT‖F; the resulting estimators are denoted CCA* and RRR*, respectively. Similarly, for the semiparametric method SCA, we computed two estimators, SCA1 and SCA*.

3.2 Results

We have considered various sample sizes, i.e., n = 500, 200 and 100; for brevity, we mainly focus our discussion on the case n = 500 in the sequel, unless otherwise noted. The experiment is replicated 500 times under each setting. The obtained estimates (α̂, β̂) are standardized in the same way as the true (α, β), as described in Section 2. Figures 1–3 show the boxplots of the Euclidean distances between the true parameters and their estimated counterparts from all simulation runs, i.e., d(α̂, α) = ‖α̂(α̂Tα̂)−1α̂T − α(αTα)−1αT‖F for measuring the distance from α̂ to α, and d(β̂, β) = ‖β̂(β̂Tβ̂)−1β̂T − β(βTβ)−1βT‖F for measuring the distance from β̂ to β, where ‖·‖F denotes the Frobenius norm. Tables 1–3 report the average parameter estimates (ave) and their associated standard errors (std), for Models I–III respectively. For the proposed semiparametric estimators, we also report the average of the estimated standard deviations (std^) and the coverage of the estimated 95% confidence interval (95%), based on the asymptotic results.
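The distance d(·, ·) used in the boxplots compares the rank-one projection matrices of the two vectors, so it is invariant to the scale and sign of either vector. A minimal sketch, assuming NumPy and a hypothetical function name:

```python
import numpy as np

def projection_distance(v_hat, v):
    """Frobenius distance between the projections onto span(v_hat) and span(v)."""
    v_hat = np.asarray(v_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    P_hat = np.outer(v_hat, v_hat) / (v_hat @ v_hat)
    P = np.outer(v, v) / (v @ v)
    return np.linalg.norm(P_hat - P, ord="fro")
```

The distance is 0 when the two vectors are proportional and reaches its maximum, the square root of 2, when they are orthogonal.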

Figure 1. Boxplots of d(α̂, α) and d(β̂, β) for Model I (n = 500).

Figure 3. Boxplots of d(α̂, α) and d(β̂, β) for Model III (n = 500).

Table 1. Simulation results for Model I (n = 500).

| Method | Statistic | α1 | α2 | α3 | β1 | β2 | β3 | β4 |
|---|---|---|---|---|---|---|---|---|
| CCA1 | ave | 1.0019 | 1.0017 | 1.0025 | −0.9954 | 0.9978 | −0.9863 | 0.9964 |
| | std | 0.0353 | 0.0333 | 0.0344 | 0.1415 | 0.1396 | 0.1310 | 0.1336 |
| CCA* | ave | 1.0019 | 1.0017 | 1.0025 | −0.9954 | 0.9978 | −0.9863 | 0.9964 |
| | std | 0.0353 | 0.0333 | 0.0344 | 0.1415 | 0.1396 | 0.1310 | 0.1336 |
| RRR1 | ave | −0.4651 | −0.5834 | −0.4676 | 0.9328 | 1.1671 | −0.3009 | 0.3320 |
| | std | 0.6076 | 0.8502 | 0.6741 | 1.4070 | 2.0669 | 0.7229 | 0.8102 |
| RRR* | ave | 0.8305 | 1.0038 | 1.0469 | −0.7006 | 0.8643 | −0.8071 | 0.7799 |
| | std | 1.6763 | 1.4649 | 2.1281 | 3.6369 | 4.5548 | 2.5557 | 2.8155 |
| SCA1 | ave | −1.2397 | −0.1936 | 0.0127 | 0.7962 | −0.7020 | −0.2470 | 0.2418 |
| | std | 1.7829 | 1.7914 | 1.7786 | 1.4098 | 1.4535 | 0.5063 | 0.5015 |
| SCA* | ave | 1.0027 | 1.0014 | 1.0022 | −0.9966 | 0.9972 | −0.9860 | 0.9976 |
| | std | 0.0367 | 0.0347 | 0.0356 | 0.1455 | 0.1414 | 0.1337 | 0.1360 |
| SIM | ave | 1.0000 | 0.9929 | 0.9968 | −1.0256 | 1.0153 | −1.0300 | 1.0233 |
| | std | 0.0553 | 0.0566 | 0.0570 | 0.1496 | 0.1340 | 0.1383 | 0.1331 |
| | std^ | 0.0548 | 0.0553 | 0.0568 | 0.1446 | 0.1437 | 0.1359 | 0.1366 |
| | 95% | 0.9100 | 0.9140 | 0.9000 | 0.9380 | 0.9560 | 0.9520 | 0.9700 |
| LOC1 | ave | 1.0013 | 0.9984 | 1.0001 | −1.0037 | 1.0034 | −1.0092 | 1.0031 |
| | std | 0.0356 | 0.0344 | 0.0327 | 0.1385 | 0.1425 | 0.1396 | 0.1322 |
| | std^ | 0.0322 | 0.0322 | 0.0324 | 0.1406 | 0.1448 | 0.1375 | 0.1393 |
| | 95% | 0.9340 | 0.9280 | 0.9500 | 0.9380 | 0.9460 | 0.9380 | 0.9540 |
| LOC2 | ave | 1.0015 | 0.9986 | 1.0002 | −1.0033 | 1.0044 | −1.0083 | 1.0035 |
| | std | 0.0348 | 0.0341 | 0.0326 | 0.1304 | 0.1357 | 0.1309 | 0.1239 |
| | std^ | 0.0332 | 0.0326 | 0.0330 | 0.1329 | 0.1365 | 0.1296 | 0.1311 |
| | 95% | 0.9600 | 0.9300 | 0.9560 | 0.9480 | 0.9480 | 0.9380 | 0.9520 |
| LOC3 | ave | 0.9998 | 0.9994 | 1.0004 | −1.0039 | 1.0052 | −1.0113 | 1.0045 |
| | std | 0.0279 | 0.0287 | 0.0267 | 0.1371 | 0.1359 | 0.1368 | 0.1294 |
| | std^ | 0.0280 | 0.0278 | 0.0278 | 0.1270 | 0.1297 | 0.1233 | 0.1262 |
| | 95% | 0.9480 | 0.9440 | 0.9540 | 0.9380 | 0.9340 | 0.9220 | 0.9500 |
| OR | ave | 0.9996 | 0.9996 | 1.0007 | −1.0040 | 1.0069 | −1.0107 | 1.0052 |
| | std | 0.0272 | 0.0284 | 0.0264 | 0.1287 | 0.1292 | 0.1283 | 0.1213 |
| | std^ | 0.0271 | 0.0266 | 0.0267 | 0.1312 | 0.1332 | 0.1272 | 0.1293 |
| | 95% | 0.9500 | 0.9320 | 0.9480 | 0.9520 | 0.9500 | 0.9420 | 0.9600 |

Table 3. Simulation results for Model III (n = 500).

Method Statistic α1 α2 α3 β1 β2 β3 β4
CCA1 ave 0.4335 −0.1846 0.6086 0.5321 −0.8804 −0.2799 0.2006
std 5.5592 4.2589 5.6687 3.9372 3.7894 1.3377 1.0205

CCA* ave 1.0234 0.9830 1.0141 −1.0164 1.1505 −0.9157 1.1674
std 0.2575 0.2588 0.2505 4.6014 4.5963 7.1897 9.5704

RRR1 ave −0.3377 −0.3885 −0.3376 0.7026 0.0307 −0.0410 −0.1217
std 0.5326 0.3455 0.5272 2.6470 3.0442 1.1687 1.0144

RRR* ave 0.9656 0.9601 1.0030 −0.7602 0.6793 −0.6031 0.8798
std 0.4133 0.4283 0.6232 4.1492 3.2080 5.3231 5.5661

SCA1 ave −0.2611 −0.2539 −0.4031 0.9776 −1.1616 −0.6185 0.6212
std 1.9660 1.9287 1.9992 2.1519 2.1893 0.5933 0.5946

SCA* ave 1.0071 1.0027 1.0069 −0.9820 0.9957 −0.9919 0.9922
std 0.1836 0.1037 0.1223 0.3488 0.3186 0.2126 0.2215

SIM ave 1.0051 1.0024 1.0059 −1.0837 1.0779 −1.0932 1.0816
std 0.0372 0.0344 0.0368 0.1679 0.1794 0.1671 0.1740
std^ 0.0379 0.0386 0.0378 0.1870 0.1868 0.1885 0.1868
95% 0.9300 0.9440 0.9280 0.9700 0.9540 0.9780 0.9700

LOC1 ave 1.0140 1.0129 1.0136 −1.0034 0.9977 −0.9994 0.9954
std 0.0164 0.0157 0.0163 0.0506 0.0523 0.0537 0.0521
std^ 0.0194 0.0216 0.0216 0.0501 0.0496 0.0489 0.0497
95% 0.9420 0.9740 0.9760 0.9440 0.9480 0.9460 0.9580

LOC2 ave 1.0010 0.9999 1.0004 −1.0017 1.0003 −1.0012 1.0014
std 0.0117 0.0099 0.0101 0.0239 0.0223 0.0236 0.0231
std^ 0.0112 0.0099 0.0100 0.0239 0.0230 0.0232 0.0230
95% 0.9480 0.9620 0.9560 0.9540 0.9740 0.9520 0.9500

LOC3 ave 1.0003 0.9999 1.0006 −1.0037 0.9975 −0.9999 0.9958
std 0.0133 0.0127 0.0130 0.0485 0.0503 0.0523 0.0508
std^ 0.0122 0.0122 0.0123 0.0495 0.0494 0.0486 0.0493
95% 0.9340 0.9580 0.9400 0.9560 0.9580 0.9480 0.9660

OR ave 1.0004 1.0000 1.0004 −1.0016 1.0000 −1.0009 1.0013
std 0.0096 0.0098 0.0098 0.0220 0.0211 0.0223 0.0218
std^ 0.0095 0.0096 0.0096 0.0225 0.0220 0.0223 0.0220
95% 0.9500 0.9560 0.9440 0.9580 0.9760 0.9520 0.9500

In Model I, the association between αᵀY and βᵀX is linear, which should benefit the linear methods. From Table 1, CCA performs very well in estimation, but RRR performs much worse. The discrepancy in performance between these two methods is due to their different objectives: while CCA focuses on maximizing the correlation between a pair of directions in Y and X, RRR focuses on explaining the variation in Y by X. In our model setup, αᵀY and βᵀX indeed have the strongest linear association among all possible directions, which makes CCA suitable. However, βᵀX does not necessarily coincide with the targeted direction of RRR, along which most of the variation in Y can be explained in the least squares sense. As a consequence, RRR is unsuitable here for the discovery of the desired single indices, and even RRR* performs poorly. The performance of SCA* is comparable to that of CCA; however, the leading pair extracted by the SCA method does not necessarily correspond to the desired pair, as seen from the performance of SCA1. If we knew that the underlying model is linear, a parsimonious method like CCA would be preferable. Our results show that the proposed semiparametric approaches, which do not rely on the knowledge of a linear model, work almost as well as CCA, with only a slight loss in efficiency. We plotted the results in Figure 1 to show the relative performance of the different methods. For better illustration, we omitted the estimators that perform much worse than the rest of the methods.
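The different objectives of CCA and RRR can be seen in a small toy experiment. The sketch below is not the paper's Model I: the design (the vectors `alpha`, `beta`, `gamma`, the noise levels and the rotation `Q`) is a hypothetical construction in which one index pair is tightly linked but has small variance, while another direction carries more explained variation. CCA is computed from the whitened cross-covariance, and RRR is taken to be the rank-one least squares fit with identity weighting, which is an assumption about the variant used.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 2000, 4, 3
alpha = np.array([1.0, 1.0, 1.0])                      # hypothetical response direction
beta = np.array([-1.0, 1.0, -1.0, 1.0])                # hypothetical predictor direction
gamma = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)  # orthogonal to beta

X = rng.standard_normal((n, p))
U = np.column_stack([
    X @ beta + 0.1 * rng.standard_normal(n),            # tight link, small variance
    5.0 * (X @ gamma) + 5.0 * rng.standard_normal(n),   # noisy link, large variance
    rng.standard_normal(n),                             # pure noise
])
# Rotate so that Y @ alpha is proportional to the first latent component.
Q, _ = np.linalg.qr(np.column_stack([alpha, np.eye(q)[:, :2]]))
Y = U @ Q.T

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

# CCA: leading singular pair of the whitened cross-covariance matrix.
Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
K = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T   # Lx^{-1} Sxy Ly^{-T}
u, s, vt = np.linalg.svd(K)
beta_cca = np.linalg.solve(Lx.T, u[:, 0])
alpha_cca = np.linalg.solve(Ly.T, vt[0])

# Rank-one RRR (identity weighting): leading direction of the OLS fitted values.
C_ols = np.linalg.solve(Sxx, Sxy)
_, _, wt = np.linalg.svd(Xc @ C_ols, full_matrices=False)
alpha_rrr = wt[0]
beta_rrr = C_ols @ alpha_rrr

def abs_cos(a, b):
    """Absolute cosine of the angle between two directions."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("CCA:", abs_cos(alpha_cca, alpha), abs_cos(beta_cca, beta))
print("RRR:", abs_cos(alpha_rrr, alpha), abs_cos(beta_rrr, beta))
```

In this toy design CCA aligns with the tightly linked pair, whereas rank-one RRR follows the direction with the largest explained variation, which mirrors the qualitative explanation given above.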

In Models II and III, the association between αᵀY and βᵀX is nonlinear, and any other direction in Y may not be adequately characterized by a single direction in X. Not surprisingly, CCA and RRR both perform poorly. The bias in CCA* or RRR* is much smaller than that in CCA1 or RRR1 as expected, but the variance of either estimator is very high. Again, SCA1 may pick up other spurious directions to approximate a single index model. Nevertheless, it appears that the desirable pair is most likely among the ones obtained from SCA, albeit with a much larger estimation error compared to the proposed DSI methods. Also, the performance of SCA* in Model II is relatively better than that in Model III, because in Model II the two indices are related only in their mean structure, while in Model III the two indices are also related in their second moments. In all cases, the DSI estimators continue to perform very well, clearly demonstrating the effectiveness of the proposed methods in detecting nonlinear association. LOC1 performs better than SIM in general, as expected. Comparing the three locally efficient estimators and the oracle estimator, the misspecification of η2 has a bigger impact on estimation than that of η3. In both models, OR performs the best among all the methods, because the search for the directions becomes more tractable when the underlying model structure is correctly specified. On the other hand, even when both η2 and η3 are misspecified, LOC1 still achieves small bias and remarkable estimation accuracy, with only slightly increased standard errors.

Furthermore, we can see that the inference results based on the asymptotic analysis are accurate in general. The estimated standard errors match well with their counterparts based on Monte Carlo simulation, and the coverage probabilities are mostly close to the nominal level 95%. We notice that in Models II and III, LOC1 tends to be slightly biased for the estimation of α, and the standard errors also tend to be slightly overestimated. Nevertheless, in our experiment the inference results improve when we increase the sample size.
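The four rows reported for the semiparametric estimators (ave, std, std^ and 95%) can be computed from replicated estimates as sketched below. The sample mean stands in for the actual DSI estimator, and the function name `summarize`, the true values and the noise levels are hypothetical choices made only for illustration.

```python
import numpy as np

def summarize(estimates, est_se, truth, z=1.959964):
    """Monte Carlo summary over replications (rows) for each parameter (columns)."""
    estimates = np.asarray(estimates, dtype=float)
    est_se = np.asarray(est_se, dtype=float)
    lower, upper = estimates - z * est_se, estimates + z * est_se
    return {
        "ave": estimates.mean(axis=0),                 # average estimate
        "std": estimates.std(axis=0, ddof=1),          # Monte Carlo standard error
        "std_hat": est_se.mean(axis=0),                # average estimated standard error
        "cover95": ((lower <= truth) & (truth <= upper)).mean(axis=0),
    }

# Toy stand-in for an estimator: the sample mean of Gaussian data.
rng = np.random.default_rng(1)
truth = np.array([1.0, -1.0])
sd = np.array([0.5, 2.0])
n, reps = 500, 500
samples = truth + sd * rng.standard_normal((reps, n, 2))
estimates = samples.mean(axis=1)
est_se = samples.std(axis=1, ddof=1) / np.sqrt(n)

summary = summarize(estimates, est_se, truth)
for name, value in summary.items():
    print(name, np.round(value, 4))
```

Agreement between `std` and `std_hat`, together with coverage near 0.95, is the pattern that the comparison in the tables is checking for.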

We have also experimented with smaller sample sizes. The estimation performance of the semiparametric estimators in Model II for n = 200 and n = 100 is shown in the supplementary materials. While the estimation accuracy of the DSI methods is still satisfactory, it appears that the performance of SCA* deteriorates more severely. This is probably because the SCA method requires the estimation of E(Yi | X) and its derivatives, which imposes a strong sample size requirement that depends on the predictor dimension p. We have experimented with models in which the two indices are related in the second moments but not the first, and as expected SCA* fails while the proposed method continues to perform well. We note that the inference results of DSI may become less accurate for small sample sizes. In particular, the coverage probabilities for SIM and LOC1 tend to be slightly lower than the nominal level. This is expected as the inference procedure involves numerical approximations in several places, and for complex models a larger sample size may be required to allow the asymptotic theory to take effect. Following the request of a referee, we also increased the dimensions p and q and investigated the scalability of the method. The results are very encouraging. We provide the details of the computational performance in the supplementary materials.

4 Concrete Slump Test Data

As a mixture of several ingredients, concrete is a highly complex material. Understanding the relationship between the quality and composition of concrete is an important topic in the field of Civil Engineering. Generally speaking, concrete with high consistency at its fresh state and with high strength at its hardened state exhibits desirable properties of stability and durability. The consistency of fresh concrete is commonly measured through a slump-cone test, by examining the behavior of a compacted inverted cone of fresh concrete under the action of gravity: the slump is measured by the length of the drop from the top of the slumped concrete, and the slump flow is measured by its diameter. Here we consider a slump test dataset, consisting of 103 sets of slump test measurements (Yeh, 2006, 2007). Three variables regarding the quality of concrete were recorded, including slump (cm), slump flow (cm) and 28-day compressive strength (MPa). The ingredients composing the concrete were also recorded (kg/m³), including cement, fly ash, blast furnace slag, water, superplasticizer and aggregate. Here, we apply the DSI approach to explore the association between the three quality variables (q = 3) and three ingredient variables (p = 3), namely fly ash, water and superplasticizer, which are known to be important factors related to the slump and concrete quality (Yeh, 2006). All the variables are standardized prior to the analysis.

We apply CCA, RRR and SCA to identify possible linear/nonlinear relationships between the two sets of variables. We then conduct the DSI estimation, starting from 100 sets of initial values of α and β, randomly generated by adding Gaussian noise N(0, 3) to their CCA/RRR estimates. As the estimation problem is local in nature, this ensures that the starting points are fairly spread out in the vicinity of some initial linear estimates, enabling us to explore whether interesting directions of sufficient dimension reduction can be found when deviating away from the linear analysis. Because of the nonconvexity of the problem, multiple roots of the estimating equations may exist. In this problem, we predominantly find two roots from the 100 model fitting attempts. The upper plots of Figure 4 depict the observed data points along the estimated linear directions from CCA and RRR, together with the fitted linear regression curves. In the middle panel of Figure 4, we plot the two sets of solutions from the DSI method, together with the fitted nonparametric regression curves. The SCA method also extracted two pairs of directions, as shown in the bottom panel of Figure 4. The parameter estimates are given in Table 4. We have used the single-indexing (leave-one-out) cross-validation method (Xia, 2008) to assess the goodness of fit of the extracted pairs, that is, to test whether α̂ᵀY can be adequately predicted by a single index model of β̂ᵀX, and all six pairs mentioned above passed the test.
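The multi-start strategy described above can be sketched on a toy problem. The code below does not implement the DSI estimating equations: the scalar function `g`, its derivative `dg`, the initial value `theta_init` and the Newton solver are hypothetical stand-ins chosen so that exactly two roots exist. It also assumes that N(0, 3) denotes a variance of 3, which the text does not state explicitly.

```python
import numpy as np

def newton_root(g, dg, theta0, tol=1e-10, max_iter=200):
    """Plain Newton iteration; returns None if it does not converge."""
    theta = float(theta0)
    for _ in range(max_iter):
        step = g(theta) / dg(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    return None

# Toy estimating equation with two roots, at 1 and -2 (illustrative only).
def g(t):
    return (t - 1.0) * (t + 2.0)

def dg(t):
    return 2.0 * t + 1.0

rng = np.random.default_rng(2)
theta_init = 0.8                                   # plays the role of a linear estimate
starts = theta_init + np.sqrt(3.0) * rng.standard_normal(100)  # N(0, 3) perturbations

# Solve from every start and count how often each distinct root is reached.
roots = {}
for start in starts:
    root = newton_root(g, dg, start)
    if root is not None:
        key = round(root, 6)
        roots[key] = roots.get(key, 0) + 1

print(roots)
```

Grouping the converged solutions, here by rounding, is one simple way to see how many distinct roots the starting points lead to and how often each is found.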

Figure 4. Scatter plots along the estimated single-index directions for the slump test data analysis.

Table 4. Coefficient estimation in the slump test data analysis.

Method α1 (Slump flow) α2 (Strength) α3 (Slump) β1 (Fly ash) β2 (Superplasticizer) β3 (Water) Lack of fit
CCA −1.6433 0.2853 1.0000 0.1032 0.0715 1.0000 No
RRR 1.2951 −0.5765 1.0000 −0.1115 −0.1691 1.0000 No
SCA(1) −1.6917 0.2938 1.0000 0.1380 0.0469 1.0000 No
SCA(2) −0.6435 0.0612 1.0000 −0.1487 −0.5702 1.0000 No
DSI(1) −1.5573 0.2700 1.0000 0.1108 0.0713 1.0000 No
DSI(2) 3.1315 −1.5634 1.0000 −0.1583 −0.1508 1.0000 No

The first pair of directions found by either DSI or SCA mostly coincides with that from CCA. From the similarity of the results, as well as the fitted nonparametric curves, we can see a strong linear association along this pair of directions. It is worth pointing out that the coefficient for the slump flow has the opposite sign from those of the strength and the slump. This can be explained easily because in the slump test, the slump and the slump flow are in fact strongly positively correlated. Generally speaking, a lower slump implies a lower slump flow and higher compressive strength.

The second DSI solution reveals another interesting relation between the concrete quality and its character. In this case, the estimated α̂ from DSI agrees in sign with that from RRR, although their coefficient values are quite different. The identified β̂ directions in X from the two methods are similar and mainly dominated by the water content variable. This is not surprising as the water content is the most important factor influencing the property of concrete, and the fly ash and the superplasticizer are both supplemental admixtures that are expected to have some secondary impact. Up to a few outliers, the association between the single indices identified by DSI can be well characterized by the fitted robust nonlinear nonparametric regression line, as shown in Figure 4(d). The coefficient of determination (R²) for the DSI fitted line is 0.503, while that for the RRR fitted line is 0.420. (We have removed two potential outliers, and the R² values before outlier removal are 0.426 and 0.364, respectively.) As the water content increases, the quality index seems to increase sharply at the beginning, then flattens out and eventually decreases slightly. These findings are consistent with the results in Yeh (2006), in which a similar nonlinear relationship between slump and water content was detected via neural network models. From Figure 4, the second pair found by SCA appears to be spurious, and does not offer much insight into the problem. This example demonstrates that the DSI approach can be a useful and flexible tool for conveniently exploring simple nonlinear structures in complex multivariate association.
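The comparison of R² values between a nonparametric and a linear fit along a pair of indices can be illustrated on synthetic data. The sketch below uses a plain Nadaraya-Watson smoother with a Gaussian kernel as a stand-in for the robust nonparametric fit, whose exact form is not specified here; the simulated indices, the bandwidth `h` and the helper names are hypothetical, and the slump data themselves are not used.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson fitted values at the observed x (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def r_squared(y, fitted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-ins for the predictor index and the response index.
rng = np.random.default_rng(3)
n = 103
x_index = rng.standard_normal(n)
y_index = -(x_index - 0.5) ** 2 + 0.5 * rng.standard_normal(n)

# In-sample fits: kernel smoother versus straight line.
fit_kernel = nw_fit(x_index, y_index, h=0.4)
slope, intercept = np.polyfit(x_index, y_index, 1)
fit_linear = intercept + slope * x_index

r2_kernel = r_squared(y_index, fit_kernel)
r2_linear = r_squared(y_index, fit_linear)
print(round(r2_kernel, 3), round(r2_linear, 3))
```

Both values are in-sample, so the more flexible fit is favored by construction; a cross-validated version would give a fairer comparison.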

5 Discussion

Although the DSI method is illustrated in an engineering problem, it has potential in other application areas. For example, in marine ecology, DSI can be used to study the dependence between the yearly adult fish abundance, summarized from the observed fish abundances in spatial regions (αᵀY), and the yearly larval abundance, summarized from observed daily spawning biomass (βᵀX) (Chen et al., 2014). In portfolio construction, DSI can be used to study the relation between the asset return, summarized from the allocation of the available assets (αᵀY), and the market return, summarized from market indices and macroeconomic variables (βᵀX). In genomic research, DSI can be used to study the relation between the summary of gene expression profiles (αᵀY) and the summary of single-nucleotide polymorphisms (βᵀX) (Witten et al., 2009). More broadly, DSI can also be applied in many time series problems, where several random variables evolve together over time. In particular, the reduced-rank linear vector autoregressive (VAR) model is an important tool in modeling vector time series (Reinsel and Velu, 1998). It can be readily seen that the DSI model extends and renovates a unit-rank VAR model, i.e., the present value of an index of the vector time series has a nonlinear relationship with the past value of another index.

We have developed a flexible double single index model for exploring unspecified and possibly nonlinear functional relations between multivariate responses and predictors. There are many directions for future research. For example, our method can serve as a building block to study multi-index models, analogous to the multi-factor CCA or the RRR methods. To go beyond these linear methods, one challenge is how to exhaustively extract pairs of indices for sufficient dimension reduction without imposing any specific form or restrictive assumption on their functional relations. To this end, multi-index modeling and estimation strategies similar to those in the sufficient dimension reduction literature are one possibility. Sequentially extracting the single index pairs from both the covariate and response variables is also worth careful investigation. To further facilitate variable selection and model interpretation, we can also consider regularized estimation in the DSI model, e.g., imposing a sparsity assumption on α and β so that the constructed pair of indices only involves a subset of the responses and the predictors (Chen et al., 2012; Chen and Huang, 2012; Bunea et al., 2012).

An alternative model related to the one considered here can be constructed by further assuming that the dependence of the response variable Y on the covariates X is completely captured by the dependence of a linear combination of Y on X. In other words, Yr is independent of X conditional on αᵀY. Although this assumption is stronger than that of the double single index model, it offers an interesting modeling approach and may have important applications. The estimation, efficiency and application of such a model are certainly worth exploring.

Several possibilities exist for model checking. The general idea is that because our estimation method enables the estimation of α and β, one can construct both indices. This reduces the multi-covariate multi-response problem to an effective uni-covariate uni-response problem and facilitates the application of several existing methods. For example, to check whether α̂ᵀY can be adequately modeled by a single index model using β̂ᵀX, many existing goodness-of-fit methods developed in the single index model framework can be applied (Stute and Zhu, 2005; Xia, 2008; Liang et al., 2010; Ma et al., 2014). In addition, a graphical check can serve as an exploratory tool: one only needs to plot the data cloud formed by the two indices and inspect whether it is compact along the response index. This exploratory tool is often used in the dimension reduction literature.

A potentially more fundamental problem is how to parsimoniously and flexibly approximate the multivariate conditional distribution of Y given X (Hall and Yao, 2005). Given the curse of dimensionality in nonparametric estimation with multiple indices, a sequential estimation procedure, which extracts double single index model structures sequentially to improve the current approximation of the conditional distribution, can be particularly useful. Built upon the proposed DSI model, such a strategy has great potential in advancing nonlinear modeling and scalable dimension reduction and is certainly on our research agenda.

Supplementary Material

supplemental data

Figure 2. Boxplots of d(α̂, α) and d(β̂, β) for Model II (n = 500).

Table 2. Simulation results for Model II (n = 500).

Method Statistic α1 α2 α3 β1 β2 β3 β4
CCA1 ave 0.6085 0.0534 −0.1249 0.4247 −1.0752 −0.1941 0.1703
std 5.3256 3.7520 4.8869 3.2969 3.3596 1.1036 1.0924

CCA* ave 1.0022 1.0121 1.0252 −0.9870 0.8972 −1.2320 1.0596
std 0.2406 0.2551 0.2683 2.5884 2.2488 5.2078 5.8762

RRR1 ave −0.3430 −0.4070 −0.3075 0.7447 0.0802 0.0928 −0.0913
std 0.5085 0.2223 0.4834 2.6632 2.7881 1.0480 0.9500

RRR* ave 0.9439 1.0289 0.9611 −1.1452 0.8949 −1.1906 0.7960
std 1.2258 0.6755 0.5541 3.7731 2.0927 5.3769 4.9484

SCA1 ave −0.3326 −0.1935 −0.1858 0.9299 −0.8993 −0.5943 0.5953
std 1.9992 1.8853 1.9391 2.1112 2.1041 0.5956 0.5970

SCA* ave 1.0006 1.0005 0.9996 −0.9979 0.9922 −0.9995 0.9965
std 0.0574 0.0383 0.0457 0.1929 0.1960 0.1233 0.1168

SIM ave 1.0047 1.0026 1.0063 −1.0826 1.0878 −1.0949 1.0776
std 0.0350 0.0368 0.0365 0.1751 0.1675 0.1787 0.1777
std^ 0.0359 0.0351 0.0355 0.1711 0.1656 0.1626 0.1681
95% 0.9260 0.9140 0.9220 0.9280 0.9320 0.9360 0.9300

LOC1 ave 1.0145 1.0133 1.0129 −0.9999 1.0015 −1.0005 1.0015
std 0.0144 0.0160 0.0145 0.0406 0.0409 0.0401 0.0392
std^ 0.0186 0.0198 0.0203 0.0394 0.0393 0.0382 0.0389
95% 0.9540 0.9540 0.9820 0.9580 0.9420 0.9480 0.9640

LOC2 ave 1.0010 1.0012 1.0001 −1.0022 1.0031 −1.0021 1.0030
std 0.0117 0.0115 0.0109 0.0323 0.0318 0.0323 0.0308
std^ 0.0113 0.0110 0.0109 0.0322 0.0322 0.0318 0.0322
95% 0.9580 0.9340 0.9660 0.9580 0.9560 0.9640 0.9580

LOC3 ave 1.0003 1.0011 1.0000 −0.9991 1.0009 −1.0002 1.0015
std 0.0120 0.0118 0.0114 0.0386 0.0393 0.0390 0.0382
std^ 0.0110 0.0109 0.0109 0.0386 0.0387 0.0377 0.0385
95% 0.9380 0.9340 0.9340 0.9480 0.9480 0.9500 0.9640

OR ave 1.0006 1.0011 1.0000 −1.0021 1.0029 −1.0019 1.0028
std 0.0110 0.0110 0.0103 0.0319 0.0314 0.0320 0.0304
std^ 0.0106 0.0105 0.0105 0.0318 0.0320 0.0315 0.0319
95% 0.9460 0.9420 0.9640 0.9560 0.9560 0.9620 0.9600

Acknowledgments

This work was partially supported by the U.S. National Science Foundation (DMS-1206693), the U.S. National Institute of Neurological Disorders and Stroke (R01-NS073671), and the U.S. National Institutes of Health (U01-HL114494). The authors are grateful to the referees and the editors for their valuable comments and suggestions.

Contributor Information

Kun Chen, Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269, U.S.A.

Yanyuan Ma, Department of Statistics, University of South Carolina, 1523 Greene Street Columbia, SC 29208, U.S.A.

References

  1. Anderson TW. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics. 1951;22:327–351.
  2. Bunea F, She Y, Wegkamp M. Joint variable and rank selection for parsimonious estimation of high dimensional matrices. Annals of Statistics. 2012;40:2359–2388.
  3. Chan K-S, Li M-C, Tong H. Partially linear reduced-rank regression. Technical Report, Department of Statistics, University of Iowa. 2004.
  4. Chen K, Chan K-S, Stenseth NC. Reduced rank stochastic regression with a sparse singular value decomposition. Journal of the Royal Statistical Society: Series B. 2012;74:203–221.
  5. Chen K, Chan K-S, Stenseth NC. Source-sink reconstruction through regularized multicomponent regression analysis–with application to assessing whether North Sea cod larvae contributed to local fjord cod in Skagerrak. Journal of the American Statistical Association. 2014;109:560–573.
  6. Chen L, Huang JZ. Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. Journal of the American Statistical Association. 2012;107:1533–1545.
  7. Fan J, Yao Q, Cai Z. Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society: Series B. 2003;65:57–80.
  8. Gifi A. Nonlinear Multivariate Analysis. New York: John Wiley & Sons; 1990.
  9. Hall P, Yao Q. Approximating conditional distribution functions using dimension reduction. Annals of Statistics. 2005;33:1404–1421.
  10. Härdle W, Hall P, Ichimura H. Optimal smoothing in single-index models. Annals of Statistics. 1993;21:157–178.
  11. He G, Müller H-G, Wang J-L. Functional canonical analysis for square integrable stochastic processes. Journal of Multivariate Analysis. 2003;85:54–77.
  12. Hotelling H. Relations between two sets of variates. Biometrika. 1936;28:321–377.
  13. Hsieh W. Nonlinear canonical correlation analysis by neural networks. Neural Networks. 2000;13:1095–1105. doi: 10.1016/s0893-6080(00)00067-8.
  14. Iaci R, Sriram T. Robust multivariate association and dimension reduction using density divergences. Journal of Multivariate Analysis. 2013;117:281–295.
  15. Iaci R, Sriram T, Yin X. Multivariate association and dimension reduction: a generalization of canonical correlation analysis. Biometrics. 2010;66:1107–1118. doi: 10.1111/j.1541-0420.2010.01396.x.
  16. Ichimura H. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics. 1993;58:71–120.
  17. Klein R, Shen C. Bias corrections in testing and estimating semiparametric, single index models. 2007.
  18. Klein R, Vella F. A semiparametric model for binary response and continuous outcomes under index heteroscedasticity. Journal of Applied Econometrics. 2009:735–762.
  19. Li B, Wen S, Zhu L. On a projective resampling method for dimension reduction with multivariate responses. Journal of the American Statistical Association. 2008;103:1177–1186.
  20. Liang H, Liu X, Li R, Tsai C-L. Estimation and testing for partially linear single-index models. Annals of Statistics. 2010;38:3811–3836. doi: 10.1214/10-AOS835.
  21. Ma S, Zhang J, Sun Z, Liang H. Integrated conditional moment test for partially linear single index models incorporating dimension-reduction. Electronic Journal of Statistics. 2014;8:523–542.
  22. Ma Y, Chiou J-M, Wang N. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika. 2006;93:75–84.
  23. Ma Y, Zhu L. A semiparametric approach to dimension reduction. Journal of the American Statistical Association. 2012;107:168–179. doi: 10.1080/01621459.2011.646925.
  24. Ma Y, Zhu L. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B. 2013a;75:305–322. doi: 10.1111/j.1467-9868.2012.01040.x.
  25. Ma Y, Zhu L. Efficient estimation in sufficient dimension reduction. Annals of Statistics. 2013b;41:250–268. doi: 10.1214/12-AOS1072SUPP.
  26. Mandal A, Cichocki A. Non-linear canonical correlation analysis using Alpha-Beta divergences. Entropy. 2013;15:2788–2804.
  27. Mukherjee A, Zhu J. Reduced rank ridge regression and its kernel extensions. Statistical Analysis and Data Mining. 2011;4:612–622. doi: 10.1002/sam.10138.
  28. Newey WK, Stoker TM. Efficiency of weighted average derivative estimators and index models. Econometrica. 1993:1199–1223.
  29. Reinsel GC, Velu P. Multivariate reduced-rank regression: theory and applications. New York: Springer; 1998.
  30. Stute W, Zhu L-X. Nonparametric checks for single-index models. Annals of Statistics. 2005;33:1048–1083.
  31. Witten DM, Tibshirani RJ, Hastie TJ. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10:515–534. doi: 10.1093/biostatistics/kxp008.
  32. Xia Y. A semiparametric approach to canonical analysis. Journal of the Royal Statistical Society: Series B. 2008;70:519–543.
  33. Yeh I-C. Exploring concrete slump model using artificial neural networks. Journal of Computing in Civil Engineering. 2006;20:217–221.
  34. Yeh I-C. Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites. 2007;29:474–480.
  35. Yuan M, Ekici A, Lu Z, Monteiro R. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B. 2007;69:329–346.
  36. Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. Journal of the American Statistical Association. 2014;109:977–990.
