Published in final edited form as: J Am Stat Assoc. 2023 Dec 26;119(548):2733–2747. doi: 10.1080/01621459.2023.2277406

Dimension Reduction for Fréchet Regression

Qi Zhang 1, Lingzhou Xue 1,*, Bing Li 1

Abstract

With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications. The Fréchet regression model (Petersen and Müller, 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression to achieve two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors, and to provide a visual inspection tool for Fréchet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean (X, Y) into one for Euclidean X and metric space-valued Y. The basic idea is to first map the metric space-valued random object Y to a real-valued random variable f(Y) using a class of functions, and then perform classical SDR on the transformed data. If the class of functions is sufficiently rich, then we are guaranteed to recover the Fréchet SDR space. We show that such a class, which we call an ensemble, can be generated by a universal kernel (cc-universal kernel). We establish the consistency and asymptotic convergence rates of the proposed methods. The finite-sample performance of the proposed methods is illustrated through simulation studies for several commonly encountered metric spaces, including the Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrate the data visualization aspect of our method using the human mortality distribution data from the United Nations Databases.

Keywords: Ensembled sufficient dimension reduction, Inverse regression, Statistical objects, Universal kernel, Wasserstein space

1. Introduction

With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications, such as the graph Laplacians of networks, the covariance or correlation matrices for the brain functional connectivity in neuroscience (Ferreira and Busatto, 2013), and probability distributions in CT hematoma density data (Petersen and Müller, 2019). These data objects, also known as random objects, do not obey the operation rules of a vector space with an inner product or a norm, but instead reside in a general metric space. In a prescient paper, Fréchet (1948) proposed the Fréchet mean as a natural generalization of the expectation of a random vector. By extending the Fréchet mean to the conditional Fréchet mean, Petersen and Müller (2019) introduced the Fréchet regression model with random objects as the response and Euclidean vectors as predictors, which provides a promising framework for regression analysis with metric space-valued responses. Dubey and Müller (2019) showed the consistency of the sample Fréchet mean using the results of Petersen and Müller (2019), derived a central limit theorem for the sample Fréchet variance that quantifies the variation around the Fréchet mean, and further developed the Fréchet analysis of variance for random objects. Dubey and Müller (2020a) designed a method for change-point detection and inference in a sequence of metric-space-valued data objects.

Fréchet regression employs global least squares and local linear or polynomial regression to fit the conditional Fréchet mean. It is well known that global least squares rests on a restrictive assumption about the regression relation. Although local regression is more flexible, it is effective only when the dimension of the predictor is relatively low. As this dimension gets higher, its accuracy drops significantly, a phenomenon known as the curse of dimensionality. To address this issue, it is essential to reduce the dimension of the predictor without losing information about the response. For classical regression, this task is performed by sufficient dimension reduction (SDR; see Li 1991, Cook 1996, and Li 2018, among others). SDR projects the high-dimensional predictor onto a low-dimensional subspace that preserves the information about the response through the notion of sufficiency.

Besides assisting regression in overcoming the curse of dimensionality, another important function of SDR for classical regression is to provide a data visualization tool to gain insights into how the regression surface looks in high-dimensional space before even fitting a model. By inspecting the sufficient plots of the response objects against the sufficient predictors, we can gain insights into the general trends of the response as the most informative part of the predictor varies, whether there are outlying observations, and whether there are subjects with high leverage that have undue influence on the regression estimates: the usual information a statistician looks for in the exploratory and model-checking stages of a regression analysis. This function is also needed in Fréchet regression. In fact, it can be argued that data visualization is even more important for the regression of random objects, as the regression relation may be even more difficult to discern among the complex details of the objects.

To fulfill these demands, we systematically develop the theory and methodology of sufficient dimension reduction for Fréchet regression in this paper. To set the stage, we first outline SDR for classical regression. Let $X$ be a $p$-dimensional random vector in $\mathbb{R}^p$ and $Y$ a random variable in $\mathbb{R}$. Classical SDR aims to find a dimension reduction subspace $\mathcal{S}$ of $\mathbb{R}^p$ such that $Y$ and $X$ are independent conditioning on $P_{\mathcal{S}}X$, that is, $Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X$, where $P_{\mathcal{S}}$ is the projection onto $\mathcal{S}$ with respect to the usual inner product in $\mathbb{R}^p$. This way, $P_{\mathcal{S}}X$ can be used as the synthetic predictor without losing regression information about the response $Y$. Under mild conditions, the intersection of all such dimension reduction subspaces is also a dimension reduction subspace, and this intersection is called the central subspace, denoted by $\mathcal{S}_{Y|X}$ (Cook, 1996; Yin et al., 2008). For the situation where the primary interest is in estimating the regression function, Cook and Li (2002) introduced a weaker form of SDR, the mean dimension reduction subspace. A subspace $\mathcal{S}$ of $\mathbb{R}^p$ is a mean SDR subspace if it satisfies $E(Y \mid X) = E(Y \mid P_{\mathcal{S}}X)$, and the intersection of all such spaces, if it is still a mean SDR subspace, is the central mean subspace, denoted by $\mathcal{S}_{E(Y|X)}$. The central mean subspace $\mathcal{S}_{E(Y|X)}$ is always contained in the central subspace $\mathcal{S}_{Y|X}$ when they both exist. Many methods for estimating the central subspace and the central mean subspace have been developed over the past decades. For example, for the central subspace, we have the sliced inverse regression (SIR; Li 1991), the sliced average variance estimate (SAVE; Cook and Weisberg 1991), the contour regression (CR; Li et al. 2005), and the directional regression (DR; Li and Wang 2007). For the central mean subspace, we have the ordinary least squares (OLS; Li and Duan 1989), the principal Hessian directions (PHD; Li 1992), the iterative Hessian transformation (IHT; Cook and Li 2002), and the outer product of gradients (OPG) and the minimum average variance estimator (MAVE) of Xia et al. (2002).

SDR has been extended to accommodate some complex data structures in the past, for example, to functional data (Ferré and Yao 2003; Hsing and Ren 2009; Li and Song 2017), to tensorial data (Li et al. 2010; Ding and Cook 2015), and to panel data (Fan et al., 2017; Yu et al., 2020; Luo et al., 2021). Most recently, Ying and Yu (2022) extended SIR to the case where the response takes values in a metric space, and Zhang et al. (2022) extended the generalized SIR (Lee et al., 2013) to the case where the response and predictors are distributional data. Taking a substantial step forward, in this paper, we introduce a comprehensive and flexible method that can adapt any existing SDR estimators to metric space-valued responses.

The basic idea of our method stems from the ensemble SDR for Euclidean $X$ and $Y$ of Yin and Li (2011), which recovers the central subspace $\mathcal{S}_{Y|X}$ by repeatedly estimating the central mean subspace $\mathcal{S}_{E[f(Y)|X]}$ for a family $\mathfrak{F}$ of functions $f$ that is rich enough to determine the conditional distribution of $Y \mid X$. Such a family $\mathfrak{F}$ is called an ensemble and satisfies $\mathcal{S}_{Y|X} = \mathrm{span}\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$. Using this relation, we can turn any method for estimating the central mean subspace into one that estimates the central subspace.

While borrowing the idea of the ensemble, our goal is different from Yin and Li (2011): we are not interested in turning an estimator of the central mean subspace into one for the central subspace. Instead, we are interested in turning any existing SDR method for Euclidean $(X, Y)$ into one for Euclidean $X$ and metric space-valued $Y$. Let $X$ be a random vector in $\mathbb{R}^p$ and $Y$ a random object taking values in a metric space $(\Omega_Y, d)$. We still use the symbol $\mathcal{S}_{Y|X}$ to represent the intersection of all subspaces $\mathcal{S}$ of $\mathbb{R}^p$ satisfying $Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X$. We call $\mathcal{S}_{Y|X}$ the central subspace for Fréchet SDR, or simply the Fréchet central subspace. Let $\mathfrak{F}$ be a family of functions $f: \Omega_Y \to \mathbb{R}$ that are measurable with respect to the Borel σ-field on the metric space. We use two types of ensembles to connect classical SDR with Fréchet SDR:

  • A Central Mean Space ensemble (CMS-ensemble) is a family $\mathfrak{F}$ that is rich enough so that $\mathcal{S}_{Y|X} = \mathrm{span}\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$. Note that we know how to estimate the spaces $\mathcal{S}_{E[f(Y)|X]}$ using existing SDR methods, since $f(Y)$ is a real number. We use this ensemble to turn an SDR method that targets the central mean subspace into one that targets the Fréchet central subspace. We will focus on two forward regression methods, OPG and MAVE, and three moment estimators of the CMS.

  • A Central Space ensemble (CS-ensemble) is a family $\mathfrak{F}$ that is rich enough so that $\mathcal{S}_{Y|X} = \mathrm{span}\{\mathcal{S}_{f(Y)|X} : f \in \mathfrak{F}\}$. We use this ensemble to turn an SDR method that targets the central subspace for a real-valued response into one that targets the Fréchet central subspace. We will focus on three inverse regression methods: SIR, SAVE, and DR.

A key step in implementing both of the above schemes is to construct an ensemble $\mathfrak{F}$ in each case. For this purpose, we assume that the metric space $(\Omega_Y, d)$ is continuously embeddable into a Hilbert space. Under this assumption, one can construct a universal reproducing kernel, which leads to an $\mathfrak{F}$ that satisfies the required characterizing property.

As with classical SDR, the Fréchet SDR can also be used to assist data visualization. To illustrate this aspect, we consider an application involving factors that influence the mortality distributions of 162 countries (see Section 7 for details). For each country, the response is a histogram with the numbers of deaths for each five-year period from age 0 to age 100, which is smoothed to produce a density estimate, as shown in panel (a) of Figure 1. We considered nine predictors characterizing each country’s demography, economy, labor market, health care, and environment. Using our ensemble method, we obtained a set of sufficient predictors. In panel (b) of Figure 1, we show the mortality densities plotted against the first sufficient predictor. A clear pattern is shown in the plot: for countries with low values of the first sufficient predictor, the modes of the mortality distributions are at lower ages, and there are upticks at age 0, indicating high infant mortality rates; for countries with high values of the first sufficient predictor, the modes of the mortality distributions are significantly higher, and there are no upticks at age 0, indicating very low infant mortality rates. The information provided by the plot is useful, and many further insights can be gained about what affects the mortality distribution by taking a careful look at the loadings of the first sufficient predictor, as will be detailed in Section 7.

Figure 1:

Data visualization in Fréchet regression for mortality distributions of 162 countries. Panel (a) plots mortality densities that are placed in random order, and Panel (b) plots mortality densities versus the first sufficient predictor estimated by our ensemble method.

The rest of this paper is organized as follows. Section 2 defines the Fréchet SDR problem and provides sufficient conditions for a family $\mathfrak{F}$ to characterize the central subspace. Section 3 then constructs ensembles $\mathfrak{F}$ for the Wasserstein space of univariate distributions, the space of symmetric positive definite matrices, and a special Riemannian manifold, the sphere. Section 4 proposes the CMS-ensembles by extending five SDR methods that target the central mean subspace for a real-valued response, namely OLS, PHD, IHT, OPG, and MAVE, and the CS-ensembles by extending three SDR methods that target the central subspace for a real-valued response, namely SIR, SAVE, and DR. Section 5 establishes the convergence rates of the proposed methods. Section 6 uses simulation studies to examine the numerical performance of the different ensemble estimators in different settings, including distributional responses and covariance matrix responses. In Section 7, we analyze the mortality distribution data to demonstrate the usefulness of our methods. Section 8 includes a few concluding remarks and discussion. All proofs, additional simulation studies, and real applications are presented in the Supplementary Material.

2. Characterization of the Fréchet Central Subspace

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $(\Omega_Y, d)$ be a metric space with metric $d$ and $\mathcal{B}_Y$ the Borel σ-field generated by the open sets in $\Omega_Y$. Let $\Omega_X$ be a subset of $\mathbb{R}^p$ and $\mathcal{B}_X$ the Borel σ-field generated by the open sets in $\Omega_X$. Let $(X, Y)$ be a random element mapping from $\Omega$ to $\Omega_X \times \Omega_Y$, measurable with respect to the product σ-field $\mathcal{B}_X \times \mathcal{B}_Y$. We denote the marginal distributions of $X$ and $Y$ by $P_X$ and $P_Y$, and the conditional distributions of $Y \mid X$ and $X \mid Y$ by $P_{Y|X}$ and $P_{X|Y}$, respectively. We formulate the Fréchet SDR problem as finding a subspace $\mathcal{S}$ of $\mathbb{R}^p$ such that $Y$ and $X$ are independent conditioning on $P_{\mathcal{S}}X$:

$Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X$,    (1)

where $P_{\mathcal{S}}$ is the projection onto $\mathcal{S}$ with respect to the inner product in $\mathbb{R}^p$. As in classical SDR, the intersection of all such subspaces $\mathcal{S}$ still satisfies (1) under mild conditions (Cook and Li, 2002); indeed, this does not require any structure on the space $\Omega_Y$. A sufficient condition given in Yin et al. (2008) is that $X$ is supported on a matching set; for example, if the support of $X$ is convex, then this sufficient condition is satisfied. We call this intersection the Fréchet central subspace and denote it by $\mathcal{S}_{Y|X}$. Similar to Cook (1996), it can be shown that if the support of $X$ is open and convex, the Fréchet central subspace $\mathcal{S}_{Y|X}$ satisfies (1).

2.1. Two types of ensembles and their sufficient conditions

Let $\mathfrak{F}$ be a family of measurable functions $f: \Omega_Y \to \mathbb{R}$, and for an $f \in \mathfrak{F}$, let $\mathcal{S}_{E[f(Y)|X]}$ be the central mean subspace of $f(Y)$ versus $X$. As mentioned in Section 1, we use two types of ensembles to recover the Fréchet central subspace. The first type is any $\mathfrak{F}$ that satisfies

$\mathrm{span}\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\} = \mathcal{S}_{Y|X}$.    (2)

This is the same ensemble as that in Yin and Li (2011), except that, here, the right-hand side is the Fréchet central subspace. The relation (2) allows us to recover the Fréchet central subspace $\mathcal{S}_{Y|X}$ from a collection of classical central mean subspaces. We call a class $\mathfrak{F}$ that satisfies (2) a CMS-ensemble. The second type of ensemble is any family $\mathfrak{F}$ that satisfies

$\mathrm{span}\{\mathcal{S}_{f(Y)|X} : f \in \mathfrak{F}\} = \mathcal{S}_{Y|X}$,    (3)

which we call a CS-ensemble. Proposition 1 shows that a CMS-ensemble is also a CS-ensemble.

PROPOSITION 1.

If F is a CMS-ensemble, then it is a CS-ensemble.

We next develop a sufficient condition for an $\mathfrak{F}$ to be a CMS-ensemble and hence also a CS-ensemble. Let $\mathfrak{B} = \{I_B : B \text{ is a Borel set in } \Omega_Y\}$ be the family of measurable indicator functions on $\Omega_Y$, and let $\mathrm{span}(\mathfrak{F}) = \{\sum_{i=1}^{k} \alpha_i f_i : k \in \mathbb{N},\ \alpha_1, \dots, \alpha_k \in \mathbb{R},\ f_1, \dots, f_k \in \mathfrak{F}\}$ be the linear span of $\mathfrak{F}$, where $\mathbb{N} = \{1, 2, \dots\}$. Yin and Li (2011) showed that if $\mathfrak{F}$ is a subset of $L_2(P_Y)$ that is dense in $\mathfrak{B}$, then (2) holds for the classical $\mathcal{S}_{Y|X}$. Here, we generalize that result to our setting by requiring only $\mathrm{span}(\mathfrak{F})$ to be dense in $\mathfrak{B}$.

LEMMA 1.

If $\mathfrak{F}$ is a subset of $L_2(P_Y)$ and $\mathrm{span}(\mathfrak{F})$ is dense in $\mathfrak{B}$ with respect to the $L_2(P_Y)$-metric, then $\mathfrak{F}$ is a CMS-ensemble and hence also a CS-ensemble.

2.2. Construction of the CMS-ensemble

To construct a CMS-ensemble, we resort to the notion of the universal kernel. Let $C(\Omega_Y)$ be the family of continuous real-valued functions on $\Omega_Y$. When $\Omega_Y$ is compact, Steinwart (2001) defined a continuous kernel $\kappa$ as universal (we refer to it as c-universal) if its associated RKHS $\mathcal{H}_Y$ is dense in $C(\Omega_Y)$ under the uniform norm. To relax the compactness assumption, Micchelli et al. (2006) proposed the following notion of universality, which is referred to as cc-universal in Sriperumbudur et al. (2011). For any compact set $K \subseteq \Omega_Y$, let $\mathcal{H}_Y(K)$ be the RKHS generated by $\{\kappa(\cdot, y) : y \in K\}$. We should note that a member $f$ of $\mathcal{H}_Y(K)$ is supported on $\Omega_Y$, rather than on $K$. Let $f|_K$ denote the restriction of $f$ to $K$, and $C(K)$ the class of all continuous functions on $K$ with respect to the topology of $(\Omega_Y, d)$ restricted to $K$.

DEFINITION 1.

(Micchelli et al., 2006) We say that $\kappa$ is cc-universal if, for any compact set $K \subseteq \Omega_Y$, any member $f$ of $C(K)$, and any $\epsilon > 0$, there is an $h \in \mathcal{H}_Y(K)$ such that $\|f - h|_K\|_\infty = \sup_{y \in K} |f(y) - h(y)| < \epsilon$.

When $\Omega_Y$ is compact, Sriperumbudur et al. (2011) showed that the two notions of universality are equivalent. In the following, we investigate the conditions under which a metric space has a cc-universal kernel and how to construct such a kernel when it does.

Micchelli et al. (2006) showed that when $\Omega_Y = \mathbb{R}^d$, many standard kernels, including Laplacian kernels and Gaussian RBF kernels, are cc-universal. Unfortunately, when $\Omega_Y$ is a general metric space, direct extensions of these types of kernels, for example, $\kappa(y, y') = \exp(-\gamma d(y, y')^2)$, are no longer guaranteed to be cc-universal. Christmann and Steinwart (2010) showed that for a compact $\Omega_Y$, if there exist a separable Hilbert space $\mathcal{H}$ and a continuous injection $\rho: \Omega_Y \to \mathcal{H}$, then for any analytic function $F: \mathbb{R} \to \mathbb{R}$ whose Taylor series at zero has strictly positive coefficients, the function $\kappa(y, y') = F(\langle \rho(y), \rho(y') \rangle_{\mathcal{H}})$ defines a c-universal kernel on $\Omega_Y$. They also provided an analogous construction of the Gaussian-type kernel in this setting. We extend this result to construct cc-universal kernels on non-compact metric spaces. The proof is given in the Supplementary Material.

PROPOSITION 2.

Suppose $(\Omega_Y, d)$ is a complete and separable metric space, and there exist a separable Hilbert space $\mathcal{H}$ and a continuous injection $\rho: \Omega_Y \to \mathcal{H}$. If $F: \mathbb{R} \to \mathbb{R}$ is an analytic function of the form $F(t) = \sum_{n=0}^{\infty} a_n t^n$ with $a_n \ge 0$ for all $n \ge 1$, then the function $\kappa: \Omega_Y \times \Omega_Y \to \mathbb{R}$ defined by $\kappa(y, y') = F(\langle \rho(y), \rho(y') \rangle_{\mathcal{H}})$ is a positive definite kernel. Furthermore, if $a_n > 0$ for all $n \ge 1$, then $\kappa$ is a cc-universal kernel on $\Omega_Y$.

As an example, Corollary 1 shows that the Gaussian-type kernel is cc-universal on $\Omega_Y$.

COROLLARY 1.

Suppose the conditions in Proposition 2 are satisfied. Then the Gaussian-type kernel $\kappa_\gamma(y, y') = \exp(-\gamma \|\rho(y) - \rho(y')\|_{\mathcal{H}}^2)$, where $\gamma > 0$, is cc-universal. Furthermore, if the continuous injection $\rho: \Omega_Y \to \mathcal{H}$ is isometric, that is, $d(y, y') = \|\rho(y) - \rho(y')\|_{\mathcal{H}}$, then the Gaussian-type kernel $\kappa_\gamma(y, y') = \exp(-\gamma d^2(y, y'))$ is cc-universal.

The second part of Corollary 1 is straightforward since an isometry is an injection. Similar results can be established for the Laplacian-type kernel $\kappa_\gamma(y, y') = \exp(-\gamma \|\rho(y) - \rho(y')\|_{\mathcal{H}})$.
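To make the embedding-based construction concrete, the following sketch (our illustration, not code from the paper) evaluates a Gaussian- or Laplacian-type Gram matrix from an arbitrary metric; the callable `dist` and the sample `Y` are placeholders for whichever metric space is in use.

```python
import numpy as np

def gram_matrix(dist, Y, gamma, kernel="gaussian"):
    """Gram matrix K[i, j] = kappa(Y[i], Y[j]) for a metric `dist`.

    When the metric embeds isometrically into a Hilbert space
    (Corollary 1), exp(-gamma * d^2) is cc-universal; the Laplacian
    form exp(-gamma * d) is the analogue discussed above.
    """
    n = len(Y)
    D = np.array([[dist(Y[i], Y[j]) for j in range(n)] for i in range(n)])
    return np.exp(-gamma * D**2) if kernel == "gaussian" else np.exp(-gamma * D)
```

The columns of this matrix are the ensemble functions $\kappa(\cdot, y_j)$ evaluated on the sample, which is all that the downstream ensemble estimators require.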

As far as we know, the idea of embedding a (semi-)metric space into a Hilbert space was first proposed in Berg et al. (1984) and has been revisited by Sejdinovic et al. (2013) and Dubey and Müller (2020b). By Berg et al. (1984, Theorem 2.2), $\exp(-\gamma d^2(\cdot, \cdot))$ is positive definite for all $\gamma > 0$ if and only if $d^2(\cdot, \cdot)$ is negative definite, which is guaranteed when the metric space can be isometrically embedded into a Hilbert space.

The continuous embedding condition in Proposition 2 covers several metric spaces often encountered in statistical applications. Section 3 employs it to construct cc-universal kernels on the space of univariate distributions endowed with the Wasserstein-2 distance, the space of symmetric positive definite matrices endowed with the Frobenius distance, and spheres endowed with the geodesic distance.

By using the notion of a regular probability measure, we connect the cc-universal kernel on $(\Omega_Y, d)$ with the CMS-ensemble, which is the theoretical foundation of our method. Recall that a measure $P_Y$ on $(\Omega_Y, d)$ is regular if, for any Borel subset $B \subseteq \Omega_Y$ and any $\varepsilon > 0$, there are a compact set $K \subseteq B$ and an open set $G \supseteq B$ such that $P(G \setminus K) < \varepsilon$.

THEOREM 1.

Suppose, on the metric space $(\Omega_Y, d)$, (1) $\kappa$ is a bounded cc-universal kernel and (2) $P_Y$ is a regular probability measure. Then the family $\mathfrak{F} = \{\kappa(\cdot, y) : y \in \Omega_Y\}$ is a CMS-ensemble.

The proof of Theorem 1 is given in the Supplementary Material. Condition (2), which requires $P_Y$ to be regular, is quite mild: it is known that any Borel measure on a complete and separable metric space is regular (see Granirer (1970, Chapter 2: Theorem 1.2, Theorem 3.2)). Thus, a sufficient condition for Condition (2) is that $(\Omega_Y, d)$ is complete and separable, which is satisfied by all the metric spaces we consider. Specifically, if $M$ is separable and complete, then so is the Wasserstein-2 space $\mathcal{W}_2(M)$ (Panaretos and Zemel, 2020, Proposition 2.2.8, Theorem 2.2.7); therefore, $\mathcal{W}_2(\mathbb{R})$ is complete and separable. Similarly, the SPD matrix space endowed with the Frobenius distance and the sphere endowed with the geodesic distance are complete and separable metric spaces. Furthermore, the Gaussian and Laplacian kernels we consider satisfy Condition (1) in Theorem 1.

Thus, Proposition 2 and Theorem 1 provide a general mechanism for constructing a CMS-ensemble over any separable and complete metric space without a linear structure, provided it can be continuously embedded in a separable Hilbert space. For the case where multiple cc-universal kernels exist, we design a cross-validation framework in Section 6 to choose the kernel type and the parameter $\gamma$.

3. Important Metric Spaces and their CMS Ensembles

This section gives the construction of CMS-ensembles for three commonly used metric spaces.

3.1. Wasserstein space

Let $I$ be $\mathbb{R}$ or a closed interval of $\mathbb{R}$, $\mathcal{B}(I)$ the σ-field of Borel subsets of $I$, and $\mathcal{P}(I)$ the collection of all probability measures on $(I, \mathcal{B}(I))$. The Wasserstein space $\mathcal{W}_2(I)$ is defined as the subset of $\mathcal{P}(I)$ with finite second moments, that is, $\mathcal{W}_2(I) = \{\mu \in \mathcal{P}(I) : \int_I t^2 \, d\mu(t) < \infty\}$, endowed with the quadratic Wasserstein distance $d_W(\mu_1, \mu_2) = \{\int_0^1 (F_{\mu_1}^{-1}(s) - F_{\mu_2}^{-1}(s))^2 \, ds\}^{1/2}$, where $\mu_1$ and $\mu_2$ are members of $\mathcal{W}_2(I)$ and $F_{\mu_1}^{-1}$ and $F_{\mu_2}^{-1}$ are the quantile functions of $\mu_1$ and $\mu_2$, which we assume to be well defined. This distance can be equivalently written as $d_W(\mu_1, \mu_2) = \{\int_I (F_{\mu_1}^{-1}(F_{\mu_2}(t)) - t)^2 \, d\mu_2(t)\}^{1/2}$. The set $\mathcal{W}_2(I)$ endowed with $d_W$ is a metric space with a formal Riemannian structure (Ambrosio et al., 2004).

Here, we present some basic results that characterize $\mathcal{W}_2(I)$, whose proofs can be found, for example, in Ambrosio et al. (2004) and Bigot et al. (2017). For $\mu_1, \mu_2 \in \mathcal{W}_2(I)$, we say that a $\mathcal{B}(I)$-measurable map $r: I \to I$ transports $\mu_1$ to $\mu_2$ if $\mu_2 = \mu_1 \circ r^{-1}$. This relation is often written as $\mu_2 = r \# \mu_1$. Let $\mu_0 \in \mathcal{W}_2(I)$ be a reference measure with a continuous $F_{\mu_0}$. The tangent space at $\mu_0$ is $T_{\mu_0} = \mathrm{cl}_{L_2(\mu_0)}\{\lambda (F_\mu^{-1} \circ F_{\mu_0} - \mathrm{id}) : \mu \in \mathcal{W}_2(I), \lambda > 0\}$, where, for a set $A \subseteq L_2(\mu_0)$, $\mathrm{cl}_{L_2(\mu_0)}(A)$ denotes the $L_2(\mu_0)$-closure of $A$. The exponential map $\exp_{\mu_0}$ from $T_{\mu_0}$ to $\mathcal{W}_2(I)$, defined by $\exp_{\mu_0}(r) = (r + \mathrm{id}) \# \mu_0$, is surjective. Therefore its inverse, $\log_{\mu_0}: \mathcal{W}_2(I) \to T_{\mu_0}$, defined by $\log_{\mu_0}(\mu) = F_\mu^{-1} \circ F_{\mu_0} - \mathrm{id}$, is well defined on $\mathcal{W}_2(I)$. It is well known that the exponential map, restricted to the image $\log_{\mu_0}(\mathcal{W}_2(I))$ of the log map, is an isometric homeomorphism (Bigot et al., 2017). Therefore, $\log_{\mu_0}$ is a continuous injection from $\mathcal{W}_2(I)$ to $L_2(\mu_0)$. We can then construct CMS-ensembles using the general constructive method provided by Theorem 1 and Proposition 2. The next proposition gives two such constructions, where the subscripts “G” and “L” for the two kernels refer to “Gaussian” and “Laplacian”, respectively.

PROPOSITION 3.

For $I \subseteq \mathbb{R}$, $\kappa_G(y, y') = \exp(-\gamma \|\log_{\mu_0}(y) - \log_{\mu_0}(y')\|_{L_2(\mu_0)}^2) = \exp(-\gamma d_W^2(y, y'))$ and $\kappa_L(y, y') = \exp(-\gamma \|\log_{\mu_0}(y) - \log_{\mu_0}(y')\|_{L_2(\mu_0)}) = \exp(-\gamma d_W(y, y'))$ are both cc-universal kernels on $\mathcal{W}_2(I)$. Consequently, the families $\mathfrak{F}_G = \{\exp(-\gamma d_W^2(\cdot, t)) : t \in \mathcal{W}_2(I)\}$ and $\mathfrak{F}_L = \{\exp(-\gamma d_W(\cdot, t)) : t \in \mathcal{W}_2(I)\}$ are CMS-ensembles.

3.2. Space of symmetric positive definite matrices

We first introduce some notation. Let $\mathrm{Sym}(r)$ be the set of $r \times r$ symmetric matrices with real entries and $\mathrm{Sym}^+(r)$ the set of $r \times r$ symmetric positive definite (SPD) matrices. For any $Y \in \mathbb{R}^{r \times r}$, the matrix exponential of $Y$ is defined by the infinite power series $\exp(Y) = \sum_{k=0}^{\infty} Y^k / k!$. For any $X \in \mathrm{Sym}^+(r)$, the matrix logarithm of $X$ is defined as any $r \times r$ matrix $Y$ such that $\exp(Y) = X$, and is denoted by $\log(X)$.

Let $d_F$ be the Frobenius metric. Then $(\mathrm{Sym}^+(r), d_F)$ is a metric space continuously embedded by the identity mapping in $\mathrm{Sym}(r)$, which is a Hilbert space with the Frobenius inner product $\langle A, B \rangle = \mathrm{tr}(AB)$. Moreover, the identity mapping $\mathrm{id}: \mathrm{Sym}^+(r) \to \mathrm{Sym}(r)$ is obviously isometric. Therefore, by Corollary 1, the two types of radial basis function kernels for the Wasserstein space extend directly to $\mathrm{Sym}^+(r)$. That is, let $\kappa_G(y, y') = \exp(-\gamma d_F^2(y, y'))$ and $\kappa_L(y, y') = \exp(-\gamma d_F(y, y'))$; then $\mathfrak{F}_G = \{\kappa_G(\cdot, y) : y \in \mathrm{Sym}^+(r)\}$ and $\mathfrak{F}_L = \{\kappa_L(\cdot, y) : y \in \mathrm{Sym}^+(r)\}$ are CMS-ensembles.

Another widely used metric on $\mathrm{Sym}^+(r)$ is the log-Euclidean distance, defined as $d_{\log}(Y_1, Y_2) = \|\log(Y_1) - \log(Y_2)\|_F$. It pulls the Frobenius metric on $\mathrm{Sym}(r)$ back to $\mathrm{Sym}^+(r)$ through the matrix logarithm map. The matrix logarithm $\log(\cdot)$ is a continuous injection into the Hilbert space $\mathrm{Sym}(r)$. By Corollary 1, the two radial basis function kernels $\kappa_{G,\log}(y, y') = \exp(-\gamma d_{\log}^2(y, y'))$ and $\kappa_{L,\log}(y, y') = \exp(-\gamma d_{\log}(y, y'))$ are cc-universal. Then, $\mathfrak{F}_{G,\log} = \{\kappa_{G,\log}(\cdot, y) : y \in \mathrm{Sym}^+(r)\}$ and $\mathfrak{F}_{L,\log} = \{\kappa_{L,\log}(\cdot, y) : y \in \mathrm{Sym}^+(r)\}$ are CMS-ensembles.
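As an illustration (a sketch we add here, with the eigendecomposition-based matrix logarithm as an implementation choice), the two distances on $\mathrm{Sym}^+(r)$ can be computed as follows.

```python
import numpy as np

def spd_logm(A):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.T

def frobenius_dist(A, B):
    """d_F: Frobenius distance, using the identity embedding into Sym(r)."""
    return np.linalg.norm(A - B, "fro")

def log_euclidean_dist(A, B):
    """d_log: pulls the Frobenius metric back to Sym+(r) via log()."""
    return np.linalg.norm(spd_logm(A) - spd_logm(B), "fro")
```

Feeding either distance into a Gaussian- or Laplacian-type kernel then yields the CMS-ensembles above.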

3.3. The sphere

Consider a random vector taking values in the sphere $S^n = \{x \in \mathbb{R}^{n+1} : \|x\| = 1\}$. To respect the nonzero curvature of $S^n$, the geodesic distance $d_g(Y_1, Y_2) = \arccos(Y_1^\top Y_2)$, which is derived from its Riemannian geometry, is often used rather than the Euclidean distance. However, the popular Gaussian-type RBF kernel $\kappa_G(y, y') = \exp(-\gamma d_g^2(y, y'))$ is not positive definite on $S^n$ (Jayasumana et al., 2013). In fact, Feragen et al. (2015) proved that for a complete Riemannian manifold $M$ with its associated geodesic distance $d_g$, the kernel $\kappa_G(y, y') = \exp(-\gamma d_g^2(y, y'))$ is positive semidefinite only if $M$ is isometric to a Euclidean space. Honeine and Richard (2010) and Jayasumana et al. (2013) proved that the Laplacian-type kernel $\kappa_L(y, y') = \exp(-\gamma d_g(y, y'))$ is positive definite on the sphere $S^n$. We show in the following proposition that $\kappa_L(y, y')$ is cc-universal.

PROPOSITION 4.

The Laplacian-type kernel $\kappa_L: S^n \times S^n \to \mathbb{R}$, defined by $\kappa_L(y, y') = \exp(-\gamma d_g(y, y'))$ where $d_g$ is the geodesic distance on $S^n$, is a cc-universal kernel for any $\gamma > 0$. Consequently, $\mathfrak{F}_L = \{\exp(-\gamma d_g(\cdot, t)) : t \in S^n\}$ is a CMS-ensemble.
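A minimal sketch (our illustration) of the geodesic distance and the cc-universal Laplacian-type kernel on the sphere; the clipping guards against floating-point round-off:

```python
import numpy as np

def geodesic_dist(y1, y2):
    """Geodesic distance d_g(y1, y2) = arccos(y1' y2) on the unit sphere."""
    return np.arccos(np.clip(np.dot(y1, y2), -1.0, 1.0))

def laplacian_kernel_sphere(y1, y2, gamma=1.0):
    """exp(-gamma * d_g): positive definite and cc-universal (Proposition 4).
    The Gaussian-type analogue exp(-gamma * d_g^2) is NOT positive definite
    on the sphere, so it must not be used here."""
    return np.exp(-gamma * geodesic_dist(y1, y2))
```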

We note that the scope of Proposition 2 goes beyond Riemannian manifolds. For example, the Gaussian-type kernel on the space of Borel probability measures on a compact metric space with the Prohorov metric is universal (Christmann and Steinwart, 2010). We construct an explicit embedding and a universal kernel for any metric space of negative type in Theorem 4 in the Supplementary Material.

4. Fréchet Sufficient Dimension Reduction

In this section, we develop the Fréchet SDR estimators based on the CMS-ensembles and CS-ensembles and establish their Fisher consistency.

4.1. Ensembled moment estimators via CMS ensembles

We first develop a general class of Fréchet SDR estimators based on the ensembled moment estimators of the CMS, such as OLS, PHD, and IHT. Let $\mathcal{P}_{XY}$ be the collection of all distributions of $(X, Y)$, and let $M: \mathcal{P}_{XY} \to \mathbb{R}^{p \times p}$ be a measurable function to be used as an estimator of the Fréchet central subspace $\mathcal{S}_{Y|X}$. A function defined on $\mathcal{P}_{XY}$ is called a statistical functional; see, for example, Chapter 9 of Li (2018). In the SDR literature, such a function is also called a candidate matrix (Ye and Weiss, 2003). Let $F_{XY}$ be a generic member of $\mathcal{P}_{XY}$, $F_{XY}^{(0)}$ the true distribution of $(X, Y)$, and $\hat{F}_{XY}^{(n)}$ the empirical distribution of $(X, Y)$ based on an i.i.d. sample $(X_1, Y_1), \dots, (X_n, Y_n)$. Extending the terminology of classical SDR (see, for example, Li 2018, Chapter 2), we say that the estimate $M(\hat{F}_{XY}^{(n)})$ is unbiased if $\mathrm{span}\{M(F_{XY}^{(0)})\} \subseteq \mathcal{S}_{Y|X}$, exhaustive if $\mathrm{span}\{M(F_{XY}^{(0)})\} \supseteq \mathcal{S}_{Y|X}$, and Fisher consistent if $\mathrm{span}\{M(F_{XY}^{(0)})\} = \mathcal{S}_{Y|X}$. We refer to $M$ as the Fréchet candidate matrix.

Suppose we are given a CMS-ensemble $\mathfrak{F}$. Let $M_0: \mathcal{P}_{XY} \times \mathfrak{F} \to \mathbb{R}^{p \times p}$ be a function to be used as an estimator of $\mathcal{S}_{E[f(Y)|X]}$ for each $f$. In the classical sense, this is not a statistical functional, as it involves the additional set $\mathfrak{F}$. So, we redefine unbiasedness, exhaustiveness, and Fisher consistency for this type of augmented statistical functional.

DEFINITION 2.

We say that $M_0$ is unbiased for estimating $\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$ if, for each $f \in \mathfrak{F}$, $\mathrm{span}\{M_0(F_{XY}^{(0)}, f)\} \subseteq \mathcal{S}_{E[f(Y)|X]}$. Exhaustiveness and Fisher consistency of $M_0$ are defined by replacing $\subseteq$ in the above by $\supseteq$ and $=$, respectively.

Note that $M_0(\cdot, f)$ is an estimator of the classical central mean subspace $\mathcal{S}_{E[f(Y)|X]}$, as $f(Y)$ is a random number rather than a random object. We refer to $M_0$ as the ensemble candidate matrix, or, when confusion is possible, the CMS-ensemble candidate matrix. Our goal is to construct a Fréchet candidate matrix $M: \mathcal{P}_{XY} \to \mathbb{R}^{p \times p}$ from the ensemble candidate matrix $M_0: \mathcal{P}_{XY} \times \mathfrak{F} \to \mathbb{R}^{p \times p}$. To do so, we assume $\mathfrak{F}$ is of the form $\{\kappa(\cdot, y) : y \in \Omega_Y\}$, where $\kappa: \Omega_Y \times \Omega_Y \to \mathbb{R}$ is a cc-universal kernel. Given such an $\mathfrak{F}$ and $M_0$, we define $M$ as follows:

$M(F_{XY}) = \int_{\Omega_Y} M_0(F_{XY}, \kappa(\cdot, y)) \, dF_Y(y)$,

where $F_Y$ is the marginal distribution of $Y$ derived from $F_{XY}$.

We now adapt several estimates for the classical central mean subspace to the estimation of Fréchet SDR: the ordinary least squares (OLS; Li and Duan 1989), the principal Hessian directions (PHD; Li 1992), and the Iterative Hessian Transformation (IHT; Cook and Li 2002). These estimates are based on sample moments and require additional conditions on the predictor X for their unbiasedness. Specifically, we make the following assumptions:

ASSUMPTION 1.

  1. Linear Conditional Mean (LCM): $E(X \mid \beta^\top X)$ is a linear function of $\beta^\top X$, where $\beta$ is a basis matrix of the Fréchet central subspace $\mathcal{S}_{Y|X}$;

  2. Constant Conditional Variance (CCV): $\mathrm{var}(X \mid \beta^\top X)$ is a nonrandom matrix.

Under the LCM assumption, the ensembled OLS and IHT are unbiased for estimating the Fréchet central subspace; under both assumptions, the ensembled PHD is unbiased for estimating $\mathcal{S}_{Y|X}$. More detailed discussions of the unbiasedness and Fisher consistency of the ensemble estimators are presented in Section 4.4. In practice, the two assumptions above cannot be checked directly since we do not know $\beta$. However, as shown by Eaton (1986), the LCM assumption holds for all $\beta$ if and only if the distribution of $X$ is elliptical. If, in addition, $X$ is multivariate normal, then the CCV assumption is satisfied. This means that once the marginal distribution of the predictor $X$ is regular, Assumption 1 holds without being affected by the nonlinear nature of the response. Currently, the scatter plot matrix is the most commonly used empirical method to check the elliptical distribution assumption. If non-elliptical features are observed, one can use marginal transformations of the predictors, such as the Box-Cox transformation, to mitigate the non-ellipticity problem. Furthermore, in practice, SDR methods that require ellipticity usually still work reasonably well even when the elliptical distribution assumption is violated, particularly when the dimension $p$ of $X$ is high. See Hall and Li (1993) and Li and Yin (2007) for the theoretical support. Our simulation results in Section 6 corroborate this phenomenon.

It is most convenient to construct these ensemble estimators using standardized predictors. As stated in the next proposition, the theoretical basis for doing so is an equivariant property of the Fréchet central subspace.

PROPOSITION 5.

If $\mathcal{S}_{Y|X}$ is the Fréchet central subspace, $A \in \mathbb{R}^{p \times p}$ is a non-singular matrix, and $b$ is a vector in $\mathbb{R}^p$, then $\mathcal{S}_{Y|AX+b} = (A^\top)^{-1} \mathcal{S}_{Y|X}$.

The proof is essentially the same as that for the classical central subspace (see, for example, Li, 2018, page 24) and is omitted. Using this property, we first transform $X$ to $Z = \mathrm{var}(X)^{-1/2}(X - E(X))$, estimate the Fréchet central subspace $\mathcal{S}_{Y|Z}$, and then transform it back by $\mathrm{var}(X)^{-1/2} \mathcal{S}_{Y|Z}$, which is the same as $\mathcal{S}_{Y|X}$. The candidate matrices $M_0$ and $M$ for the ensembled OLS, PHD, and IHT are formulated in Remark 1; detailed motivation for each can be found in Li (2018, Chapter 8). The sample estimates can then be constructed by replacing the expectations in $M_0$ and $M$ with sample moments whenever possible. Algorithm 1 summarizes the steps to implement an ensembled moment estimator, where $\kappa_c(y, y')$ stands for the centered kernel $\kappa(y, y') - E_n[\kappa(Y, y')]$.

Algorithm 1.

Fréchet OLS, PHD, IHT, SIR, SAVE, DR

Step 1. Standardize the predictors: compute the sample mean $\hat\mu = E_n(X)$ and sample variance $\hat\Sigma = \mathrm{var}_n(X)$, and let $Z_i = \hat\Sigma^{-1/2}(X_i - \hat\mu)$.
Step 2. Compute $\hat M_0(y)$ for $y = Y_1, \dots, Y_n$ according to Remarks 1 and 2.
Step 3. Compute $\hat M = n^{-1} \sum_{i=1}^{n} \hat M_0(Y_i)$.
Step 4. Let $\hat v_1, \dots, \hat v_{d_0}$ be the leading $d_0$ eigenvectors of $\hat M$, and let $\hat u_k = \hat\Sigma^{-1/2} \hat v_k$ for $k = 1, \dots, d_0$. Use $\hat u_1, \dots, \hat u_{d_0}$ as an estimated basis of $\mathcal{S}_{Y|X}$.

REMARK 1.

The candidate matrices $M_0(y)$ for Fréchet OLS, PHD, and IHT are, respectively, $C(y) C(y)^\top$ with $C(y) = \mathrm{cov}[Z, \kappa(Y, y)]$; $\{E[Z Z^\top \kappa_c(Y, y)]\}^2$; and $W(y) W(y)^\top$ with $W(y) = (C(y), H(y) C(y), \dots, H(y)^r C(y))$ and $H(y) = E[Z Z^\top \kappa_c(Y, y)]$.
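To illustrate how Algorithm 1 and Remark 1 fit together, here is a hedged sketch (our illustration, not the authors' code) of the Fréchet OLS estimator, the simplest candidate matrix; it assumes a precomputed Gram matrix `K` with `K[i, j]` $= \kappa(Y_i, Y_j)$ and a nonsingular sample covariance.

```python
import numpy as np

def frechet_ols(X, K, d0):
    """Sketch of Algorithm 1 with the Frechet OLS candidate matrix.

    X  : (n, p) predictors; K : (n, n) Gram matrix kappa(Y_i, Y_j);
    d0 : dimension of the Frechet central subspace.
    Returns a (p, d0) estimated basis of S_{Y|X}.
    """
    n, p = X.shape
    # Step 1: standardize predictors, Z_i = Sigma^{-1/2} (X_i - mu)
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(w**-0.5) @ V.T
    Z = (X - mu) @ Sigma_inv_sqrt
    # Steps 2-3: M = n^{-1} sum_j C(Y_j) C(Y_j)^T, C(y) = cov[Z, kappa(Y, y)]
    Kc = K - K.mean(axis=0)        # center kappa(Y_i, y) over the sample
    C = Z.T @ Kc / n               # column j holds C(Y_j)
    M = C @ C.T / n
    # Step 4: leading eigenvectors, mapped back to the original X scale
    vals, vecs = np.linalg.eigh(M)
    return Sigma_inv_sqrt @ vecs[:, np.argsort(vals)[::-1][:d0]]
```

Swapping the body of Steps 2-3 for the PHD or IHT candidate matrices of Remark 1 gives the other two moment estimators.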

4.2. Ensembled forward regression estimators via CMS ensembles

We adapt the OPG (Xia et al. 2002), a popular method for estimating the classical CMS based on nonparametric forward regression, to the estimation of the Fréchet central subspace; it does not require the LCM and CCV conditions. The adaptation of another forward regression method, MAVE, is similar and is presented in Section S.3.2 of the Supplementary Material. The framework of the statistical functional $M_0(F_{XY}, f)$ is no longer sufficient to cover this case because we now have a tuning parameter. So, we adopt the notion of a tuned statistical functional in Section 11.2 of Li (2018) to accommodate a tuning parameter.

Let $\mathcal{P}_{XY}$, $F_{XY}$, $F_{XY}^{(0)}$, and $\hat F_{XY}^{(n)}$ be as defined in Section 4.1. For simplicity, we assume the tuning parameter $h$ to be a scalar, but it could also be a vector. Given a CMS-ensemble $\mathfrak{F}$, let $T_0: \mathcal{P}_{XY} \times \mathfrak{F} \times \mathbb{R} \to \mathbb{R}^{p \times p}$ be a tuned functional to be used as an estimator of $\mathcal{S}_{E[f(Y)|X]}$ for each $f$. We refer to $T_0$ as the ensemble-tuned candidate matrix. The unbiasedness, exhaustiveness, and Fisher consistency of $T_0$ are defined as follows.

DEFINITION 3.

We say that $T_0$ is unbiased for estimating $\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$ if, for each $f \in \mathfrak{F}$, $\mathrm{span}\{\lim_{h \to 0} T_0(F_{XY}^{(0)}, f, h)\} \subseteq \mathcal{S}_{E[f(Y)|X]}$. Exhaustiveness and Fisher consistency of $T_0$ are defined by replacing $\subseteq$ in the above by $\supseteq$ and $=$, respectively.

Given $\mathfrak{F} = \{\kappa(\cdot, y) : y \in \Omega_Y\}$ and $T_0$, we define the tuned Fréchet candidate matrix $T: \mathcal{P}_{XY} \times \mathbb{R} \to \mathbb{R}^{p \times p}$ as $T(F_{XY}, h) = \int_{\Omega_Y} T_0(F_{XY}, \kappa(\cdot, y), h) \, dF_Y(y)$. We say that the estimate $T(\hat F_{XY}^{(n)}, h)$ is unbiased if $\mathrm{span}(\lim_{h \to 0} T(F_{XY}^{(0)}, h)) \subseteq \mathcal{S}_{Y|X}$, exhaustive if $\mathrm{span}(\lim_{h \to 0} T(F_{XY}^{(0)}, h)) \supseteq \mathcal{S}_{Y|X}$, and Fisher consistent if $\mathrm{span}(\lim_{h \to 0} T(F_{XY}^{(0)}, h)) = \mathcal{S}_{Y|X}$.

In the following, for a function $h(x)$, we use $\partial h(X) / \partial X$ to denote $\partial h(x) / \partial x$ evaluated at $x = X$. The OPG aims to estimate the central mean subspace $\mathcal{S}_{E[\kappa(Y,y)|X]}$ by $E\left[\frac{\partial E(\kappa(Y,y) \mid X)}{\partial X} \frac{\partial E(\kappa(Y,y) \mid X)}{\partial X^\top}\right]$, where the gradient $\partial E(\kappa(Y,y) \mid X) / \partial X$ is estimated by local linear approximation as follows. Let $K_0: \mathbb{R} \to [0, \infty)$ be a kernel function as used in kernel estimation. For any $v \in \mathbb{R}^p$ and bandwidth $h > 0$, let $K_h(v) = h^{-p} K_0(\|v\| / h)$. At the population level, for fixed $x \in \Omega_X$ and $y \in \Omega_Y$, we minimize the objective function

$E\{[\kappa(Y, y) - a - b^\top (X - x)]^2 K_h(X - x)\} / E[K_h(X - x)]$    (4)

over all $a \in \mathbb{R}$ and $b \in \mathbb{R}^p$. The minimizer depends on $x$ and $y$, and we write it as $(a_h(x, y), b_h(x, y))$. The ensemble-tuned candidate matrix for estimating the central mean subspace $\mathcal{S}_{E[\kappa(Y,y)|X]}$ is $T_0(F_{XY}, \kappa(\cdot, y), h) = E[b_h(X, y) b_h(X, y)^\top]$, and the tuned Fréchet candidate matrix is $T(F_{XY}, h) = E[b_h(X, Y) b_h(X, Y)^\top]$.

At the sample level, we minimize, for each j,k=1,,n, the empirical objective function

$\sum_{i=1}^{n} w_h(X_i, X_j) [\kappa_\gamma(Y_i, Y_k) - a_{jk} - b_{jk}^\top (X_i - X_j)]^2$    (5)

over $a_{jk} \in \mathbb{R}$ and $b_{jk} \in \mathbb{R}^p$, where $w_h(X_i, X_j) = K_h(X_i - X_j) / \sum_{l=1}^{n} K_h(X_l - X_j)$. Following Xia et al. (2002), we take the bandwidth to be $h = c_0 n^{-1/(p_0 + 6)}$, where $p_0 = \max\{p, 3\}$ and $c_0 = 2.34$; this is slightly larger than the bandwidth $n^{-1/(p+4)}$ that is optimal in terms of the mean integrated squared error. As proposed in Li (2018, Lemma 11.6), instead of solving (5) for $b_{jk}$ $n^2$ times, we solve a multivariate weighted least squares problem to obtain $b_{j1}, \dots, b_{jn}$ simultaneously. The tuned Fréchet candidate matrix is then estimated by $\hat T(\hat F_{XY}^{(n)}, h) = n^{-2} \sum_{j,k=1}^{n} \hat b_{jk} \hat b_{jk}^\top$. The first $d$ eigenvectors of $\hat T(\hat F_{XY}^{(n)}, h)$ form an estimate of the Fréchet central subspace.

We can further enhance the performance by projecting the original predictors onto the directions produced by $\hat T(\hat F_{XY}^{(n)}, h)$ to re-estimate $\mathcal{S}_{Y|X}$. Specifically, after computing $\hat T(\hat F_{XY}^{(n)}, h)$, we form the matrix $\hat B = (\hat v_1, \dots, \hat v_d)$ containing its first $d$ eigenvectors. We then replace the kernel $K_h(X_j - X_i)$ by $K_h(\hat B^\top (X_j - X_i))$ with an updated bandwidth $h$, and compute $\hat b_{jk}$ from (5) again, which leads to an updated $\hat B$. We iterate this process until convergence. In this way, we reduce the dimension of the kernel from $p$ to $d_0$ and mitigate the “curse of dimensionality.” For classical SDR problems, this procedure is called the refined OPG; see Li (2018, Chapter 11.4). We call this refined estimator the Fréchet OPG, or FOPG. The algorithm for FOPG is summarized as Algorithm 2 in the Supplementary Material.
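The following sketch implements a single (unrefined) OPG pass under illustrative choices we make here: a Gaussian weight for $K_h$ and a small ridge term to stabilize the local least squares; the refinement loop that replaces $K_h(X_i - X_j)$ with $K_h(\hat B^\top (X_i - X_j))$ is omitted.

```python
import numpy as np

def fopg_candidate(X, K, h):
    """One OPG pass for the tuned Frechet candidate matrix.

    X : (n, p) predictors; K : (n, n) Gram matrix K[i, k] = kappa(Y_i, Y_k);
    h : bandwidth. Returns n^{-2} * sum_{j,k} b_jk b_jk^T.
    """
    n, p = X.shape
    T = np.zeros((p, p))
    for j in range(n):
        D = X - X[j]                                    # rows are X_i - X_j
        w = np.exp(-0.5 * np.sum(D**2, axis=1) / h**2)  # Gaussian K_h weights
        A = np.hstack([np.ones((n, 1)), D])             # local-linear design
        G = A.T @ (A * w[:, None])
        # weighted normal equations solved for all k = 1..n at once (cf. (5))
        coef = np.linalg.solve(G + 1e-10 * np.eye(p + 1), A.T @ (K * w[:, None]))
        B = coef[1:, :]                                 # column k holds b_jk
        T += B @ B.T
    return T / n**2
```

The leading $d$ eigenvectors of the returned matrix then estimate the Fréchet central subspace.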

4.3. Ensembled inverse regression estimators via CS ensembles

In this subsection, we adapt several well-known estimators of the classical central subspace to Fréchet SDR, including SIR (Li, 1991), SAVE (Cook and Weisberg, 1991), and DR (Li and Wang, 2007). We use the CS-ensemble to combine these classical estimates through (3). Let $\mathfrak{F} = \{\kappa(\cdot, y) : y \in \Omega_Y\}$ be a CS-ensemble, where $\kappa$ is a cc-universal kernel. Let $M_0: \mathcal{P}_{XY} \times \mathfrak{F} \to \mathbb{R}^{p \times p}$ be a CS-ensemble candidate matrix, and let $M(F_{XY}) = \int_{\Omega_Y} M_0(F_{XY}, \kappa(\cdot, y)) \, dF_Y(y)$ be the Fréchet candidate matrix.

Again, we work with the standardized predictor $Z$. The candidate matrices $M_0(y)$ for the ensembled SIR, SAVE, and DR are formulated in Remark 2; detailed motivation for each can be found in Li (2018, Chapters 3, 5, 6). At the sample level, we replace any unconditional moment $E$ by the sample average $E_n$, and replace any conditional moment, such as $E(Z \mid \kappa(Y, y))$, by the corresponding slice mean. The algorithms are also included in Algorithm 1.

REMARK 2.

The candidate matrices $M_0(y)$ for Fréchet SIR, SAVE, and DR are, respectively, $\mathrm{var}\{E[Z \mid \kappa(Y, y)]\}$; $E\{[I_p - \mathrm{var}(Z \mid \kappa(Y, y))]^2\}$; and $2 E\{E[Z Z^\top \mid \kappa(Y, y)]^2\} + 2 E^2\{E[Z \mid \kappa(Y, y)] E[Z^\top \mid \kappa(Y, y)]\} + 2 E\{E[Z^\top \mid \kappa(Y, y)] E[Z \mid \kappa(Y, y)]\} E\{E[Z \mid \kappa(Y, y)] E[Z^\top \mid \kappa(Y, y)]\} - 2 I_p$.
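As one concrete instance, here is a sketch of the Fréchet SIR candidate matrix with slicing on the ensemble member $\kappa(Y, y_k)$ (our illustration; the number of slices is a placeholder choice).

```python
import numpy as np

def fsir_candidate(Z, K, n_slices=5):
    """Average the SIR candidate matrix over the n ensemble members.

    Z : (n, p) standardized predictors (mean zero, identity covariance);
    K : (n, n) Gram matrix kappa(Y_i, y_k).
    M0(y_k) = var{E[Z | kappa(Y, y_k)]}, estimated by slice means.
    """
    n, p = Z.shape
    M = np.zeros((p, p))
    for k in range(n):
        order = np.argsort(K[:, k])          # sort cases by kappa(Y_i, y_k)
        for s in np.array_split(order, n_slices):
            zbar = Z[s].mean(axis=0)         # slice mean of Z
            M += (len(s) / n) * np.outer(zbar, zbar)
    return M / n
```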

REMARK 3.

Regarding the time complexity of the Fréchet SDR methods: by construction, the ensemble estimator requires $n$ times the computing time of the original estimator, because it reapplies the original estimator for each $\kappa(\cdot, Y_i)$, $i = 1, \dots, n$. For example, if SAVE is used as the original estimator, then the largest matrix multiplication is $A_{p \times n} B_{n \times p}$, which requires $O(np^2)$ basic computation units; the largest matrix to invert or eigendecompose is $p \times p$, which requires $O(p^3)$ basic computation units. So the net computational complexity is $n \times \max\{O(np^2), O(p^3)\}$.

4.4. Fisher consistency

In this subsection, we establish the unbiasedness and Fisher consistency of the tuned Fréchet candidate matrix. As a special case, the Fréchet candidate matrix constructed by any of the moment-based methods in Section 4.1 can be viewed as a tuned Fréchet candidate matrix with the tuning parameter $h$ taken to be 0. The next theorem shows that if $T_0$ is unbiased (or Fisher consistent), then $T$ is unbiased (or Fisher consistent). In the following, we say that a measure $\mu$ on $\Omega_Y$ is strictly positive if and only if $\mu(U) > 0$ for any nonempty open set $U \subseteq \Omega_Y$. For a matrix $A$, $\|A\|$ represents the operator norm.

THEOREM 2.

Suppose $\mathfrak{F} = \{\kappa(\cdot, y) : y \in \Omega_Y\}$ is a CMS-ensemble, where $\kappa$ is a cc-universal kernel. We have the following results regarding the unbiasedness and Fisher consistency of $T$.

  1. If $T_0$ is unbiased for $\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$ and $\|T_0(F_{XY}^{(0)}, \kappa(\cdot, Y), h)\| \le G(Y)$, where $G(Y)$ is a real-valued function with $E[G(Y)] < \infty$, then $T$ is unbiased for $\mathcal{S}_{Y|X}$;

  2. If (a) $T_0$ is Fisher consistent for $\{\mathcal{S}_{E[f(Y)|X]} : f \in \mathfrak{F}\}$, (b) $T_0(F_{XY}, \kappa(\cdot, y), h)$ is positive semidefinite for each $y \in \Omega_Y$, $h \in \mathbb{R}$, and $F_{XY} \in \mathcal{P}_{XY}$, (c) $\limsup_{h \to 0} \|T_0(F_{XY}^{(0)}, \kappa(\cdot, Y), h)\| \le G(Y)$ with $E[G(Y)] < \infty$, (d) $F_Y$ is strictly positive on $\Omega_Y$, and (e) the mapping $y \mapsto \lim_{h \to 0} T_0(F_{XY}, \kappa(\cdot, y), h)$ is continuous, then $T$ is Fisher consistent for $\mathcal{S}_{Y|X}$.

We similarly develop Fisher consistency for Fréchet SDR based on the CS-ensemble, including the methods in Section 4.3. The next corollary says that if $M_0$ is Fisher consistent for $\{\mathcal{S}_{\kappa(Y,y)|X} : y \in \Omega_Y\}$, then $M$ is Fisher consistent for $\mathcal{S}_{Y|X}$. The proof is similar to that of Theorem 2 and is omitted.

COROLLARY 2.

Suppose $\mathfrak{F} = \{\kappa(\cdot, y) : y \in \Omega_Y\}$ is a CS-ensemble, where $\kappa$ is a cc-universal kernel. We have the following results regarding the unbiasedness and Fisher consistency of $M$.

  1. If $M_0$ is unbiased for $\{\mathcal{S}_{\kappa(Y,y)|X} : y \in \Omega_Y\}$, then $M$ is unbiased for $\mathcal{S}_{Y|X}$;

  2. If $M_0$ is Fisher consistent for $\{\mathcal{S}_{\kappa(Y,y)|X} : y \in \Omega_Y\}$, $M_0(F_{XY}, \kappa(\cdot, y))$ is positive semidefinite for each $y \in \Omega_Y$ and $F_{XY} \in \mathcal{P}_{XY}$, $F_Y$ is strictly positive, and the mapping $y \mapsto M_0(F_{XY}, \kappa(\cdot, y))$ is continuous, then $M$ is Fisher consistent for $\mathcal{S}_{Y|X}$.

Unbiasedness and Fisher consistency of T0 or M0 are satisfied by different sets of sufficient conditions for the moment-based or forward-regression-based estimators. We outline these conditions below.

  1. For the ensembled moment estimators in Section 4.1 and the ensembled inverse regression estimators in Section 4.3, most are unbiased under either the LCM assumption or both the LCM and CCV assumptions. For example, the unbiasedness of SIR, OLS, and IHT requires the LCM assumption, whereas the unbiasedness of SAVE, DR, and PHD requires both the LCM and CCV assumptions. The estimators SIR, OLS, IHT, and PHD are generally not exhaustive (recall that unbiasedness together with exhaustiveness is equivalent to Fisher consistency), but sufficient conditions for SAVE and DR to be exhaustive are reasonably mild (see Li and Wang (2007) and Li (2018, Chapter 6)).

  2. Sufficient conditions for the Fisher consistency of OPG are given in Li (2018, Section 11.2). Specifically, it requires: (a) the smoothing kernel $K_0$ is a spherically contoured p.d.f. with finite fourth moments; (b) the p.d.f. of $X$ is supported on $\mathbb{R}^p$ and has continuous bounded second derivatives. Note that neither the LCM nor the CCV assumption is needed for the OPG estimator.

In practice, when we observe a severe violation of the elliptical assumption among the predictors, for example, via exploratory data analysis tools such as the scatter plot matrix, it is preferable to use forward regression methods such as FOPG. Otherwise, moment-based methods are recommended, since they are faster to compute and have a parametric ($n^{-1/2}$) convergence rate. Our experience also indicates that the performance of the moment-based methods is relatively robust against violations of ellipticity as long as they are not very severe.

5. Convergence Rates of the Ensemble Estimates

In this section, we develop the convergence rates of the ensemble estimates for Fréchet SDR. To save space, we only consider the CMS-ensemble; the results for the CS-ensemble are largely parallel. To simplify the asymptotic development, we slightly modify the ensemble estimator; this does not result in any significant numerical difference from the original ensembles developed in the previous sections. For each $i = 1, \dots, n$, let $\hat F_{XY}^{(i)}$ be the empirical distribution based on the sample with the $i$th subject removed: $\{(X_1, Y_1), \dots, (X_n, Y_n)\} \setminus \{(X_i, Y_i)\}$. Our modified ensemble estimate is of the form

$T(\hat F_{XY}^{(n)}, h_n) = n^{-1} \sum_{i=1}^{n} T_0(\hat F_{XY}^{(i)}, \kappa(\cdot, Y_i), h_n)$.

The purpose of this modification is to break the dependence between the ensemble member $\kappa(\cdot, Y_i)$ and the CMS estimate, which substantially simplifies the asymptotic argument. Here, we let the tuning parameter $h_n$ depend on $n$. Again, the Fréchet candidate matrix constructed by the moment-based methods can be considered a special case with $h_n = 0$.

Rather than deriving the convergence rate of each individual ensemble estimate case by case, we will show that, under some mild conditions, the ensemble convergence rate is the same as that of the corresponding CMS estimate. Since the convergence rates of many CMS estimates are well established, including all the forward regression and sample moment-based estimates mentioned earlier, our general result covers all the CMS-ensemble estimates.

In the following, for a matrix $A$, $\|A\|$ represents the operator norm and $\|A\|_F$ the Frobenius norm. If $a_n$ and $b_n$ are sequences of positive numbers, we write $a_n \prec b_n$ if $\lim_{n \to \infty} a_n / b_n = 0$, and $a_n \preceq b_n$ if $a_n / b_n$ is a bounded sequence. We write $b_n \succ a_n$ (or $b_n \succeq a_n$) if $a_n \prec b_n$ (or $a_n \preceq b_n$). We write $a_n \asymp b_n$ if $a_n \preceq b_n$ and $b_n \preceq a_n$. Let $T_0^*(F_{XY}^{(0)}, \kappa(\cdot, y)) = \lim_{h \to 0} T_0(F_{XY}^{(0)}, \kappa(\cdot, y), h)$ and $T^*(F_{XY}^{(0)}) = \lim_{h \to 0} T(F_{XY}^{(0)}, h)$.

THEOREM 3.

Let $C_n(y) = E\|T_0(\hat F_{XY}^{(n)}, \kappa(\cdot, y), h_n) - T_0^*(F_{XY}^{(0)}, \kappa(\cdot, y))\|$, and let $a_n$ be a positive sequence of numbers satisfying $a_{n+1}/a_n \to 1$ and $a_n \succeq n^{-1/2}$. Suppose the entries of $T_0^*(F_{XY}^{(0)}, \kappa(\cdot, Y))$ have finite variances. If $E[C_n(Y)] = O(a_n)$, then $\|T(\hat F_{XY}^{(n)}, h_n) - T^*(F_{XY}^{(0)})\| = O_P(a_n)$.

The above theorem says that, under some conditions, the convergence rate of an ensemble Fréchet SDR estimator is the same as the corresponding CMS estimator. This covers all the estimators developed in Section 4. Specifically:

  1. For all the moment-based ensemble methods, such as OLS, PHD, IHT, SIR, SAVE, and DR, the ensemble candidate matrices can be written in the form $T_0(\hat F_{XY}^{(n)}, \kappa(\cdot, y)) = \hat\Lambda(y) \hat\Lambda(y)^\top$, where $\hat\Lambda(y)$ is a matrix possessing a second-order von Mises expansion, implying $E[C_n(Y)] = O(n^{-1/2})$. See, for example, Li (2018).

  2. For the nonparametric forward regression ensemble method OPG, the convergence rate of $C_n(y)$ was reported in Xia et al. (2002) as $O(h_n^2 + h_n^{-1} \delta_n^2)$, where $\delta_n = \{(\log n) / (n h_n^p)\}^{1/2}$. Although the convergence was established in terms of convergence in probability, under mild conditions such as uniform integrability, we obtain the same rate for $E[C_n(Y)]$.

6. Simulations

We evaluate the performance of the proposed Fréchet SDR methods with distributions and symmetric positive definite matrices as responses. For space considerations, an additional simulation with spherical responses is presented in the Supplementary Material.

6.1. Computational details

Choice of tuning parameters and kernel types.

We first implement a unified cross-validation procedure to select the kernel type and the parameter $\gamma$ in the kernel. For both the distributional response and the symmetric positive definite matrix response, we consider the Gaussian radial basis kernel $\kappa_G(y, y') = \exp(-\gamma d(y, y')^2)$ and the Laplacian radial basis kernel $\kappa_L(y, y') = \exp(-\gamma d(y, y'))$ as candidates for constructing the ensembles. For the parameter $\gamma$, we set the default value as

$\gamma_G = \rho_Y^2 / \sigma_G^2$, where $\sigma_G^2 = \binom{n}{2}^{-1} \sum_{i<j} d(Y_i, Y_j)^2$ and $\rho_Y = 1$,    (6)

in the Gaussian radial basis kernel, and

$\gamma_L = \rho_Y^2 / \sigma_L$, where $\sigma_L = \binom{n}{2}^{-1} \sum_{i<j} d(Y_i, Y_j)$ and $\rho_Y = 1$,

in the Laplacian radial basis kernel. The same choices were used in Lee et al. (2013) and Li and Song (2017). We then tune $\rho_Y$ and the kernel type jointly via $k$-fold cross-validation as follows. Randomly split the whole sample into $k$ subsets of roughly equal size, say $D_1, \dots, D_k$. For each $i = 1, \dots, k$, use $D_i$ as the test set and its complement as the training set. We first use the training set to implement Fréchet SDR with an initial dimension $d$, say 5; this choice of a relatively large dimension helps to guarantee the unbiasedness of the estimated Fréchet central subspace. We then apply the estimated $\hat\beta$ to the test set to produce the sufficient predictor $\hat\beta^\top X$, and fit a global Fréchet regression model (Petersen and Müller, 2019) to predict the response in the test set. We compute the prediction error for each $i$ and aggregate the errors over all rotations $i = 1, \dots, k$, which yields an overall cross-validation error. This overall error depends on the tuning parameter $\rho_Y$ and the kernel type, and is minimized over the grid $\{10^{-2}, 10^{-1}, 1, 10\} \times \{\kappa_G, \kappa_L\}$ to obtain the optimal combination.
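A small sketch of the default choices in (6), as we have reconstructed them, computed from the pairwise distance matrix of the responses:

```python
import numpy as np

def default_gammas(D, rho=1.0):
    """Default kernel parameters from an (n, n) distance matrix D.

    gamma_G = rho^2 / sigma_G^2 and gamma_L = rho^2 / sigma_L, with
    sigma_G^2 (sigma_L) the average squared (plain) pairwise distance;
    rho plays the role of rho_Y and is tuned over {1e-2, 1e-1, 1, 10}.
    """
    iu = np.triu_indices_from(D, k=1)
    sigma_G2 = np.mean(D[iu] ** 2)
    sigma_L = np.mean(D[iu])
    return rho**2 / sigma_G2, rho**2 / sigma_L
```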

Estimation of the dimensions.

For the ensemble estimators that possess a candidate matrix (such as the ensembled moment estimators in Section 4.1), recently developed order-determination methods, such as the ladle estimate (Luo and Li, 2016) and the predictor augmentation estimator (Luo and Li, 2021), can be directly applied to estimate $d_0$. In addition, the BIC criterion introduced by Zhu et al. (2006) can also be used for this purpose.

In this paper, we adapt the predictor augmentation estimator to the current setting. A detailed introduction to the predictor augmentation method and more simulation results are included in the Supplementary Material. For the predictor augmentation estimator, we take the number of augmentations to be $s = 10$ and the dimension of the augmented predictors to be $r = p/2$, where $p$ is the original dimension of the predictors.

Estimation error assessment.

We used the error measure for subspace estimation in Li et al. (2005): if $\mathcal{S}_1$ and $\mathcal{S}_2$ are two subspaces of $\mathbb{R}^p$ of the same dimension, then their distance is defined as $d(\mathcal{S}_1, \mathcal{S}_2) = \|P_{\mathcal{S}_1} - P_{\mathcal{S}_2}\|_F$, where $P_{\mathcal{S}}$ is the projection onto $\mathcal{S}$ and $\|\cdot\|_F$ is the Frobenius matrix norm. If $B_1$ and $B_2$ are two matrices whose columns form bases of $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively, this distance can be equivalently written as $\|B_1 (B_1^\top B_1)^{-1} B_1^\top - B_2 (B_2^\top B_2)^{-1} B_2^\top\|_F$. This distance is coordinate-free, as it is invariant to the choice of basis matrices.
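This coordinate-free distance is straightforward to compute from any pair of basis matrices, as in the short sketch below (our illustration).

```python
import numpy as np

def subspace_dist(B1, B2):
    """||P_S1 - P_S2||_F for the column spaces of basis matrices B1, B2."""
    proj = lambda B: B @ np.linalg.solve(B.T @ B, B.T)  # projection onto span(B)
    return np.linalg.norm(proj(B1) - proj(B2), "fro")
```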

To facilitate the comparison, we also include the benchmark error, which is set as the expectation of the above distance when $B_1$ is taken as any basis matrix of the true central subspace and the entries of $B_2$ are generated as i.i.d. $N(0, 1)$. This expectation is computed by Monte Carlo with 1000 repetitions.

6.2. Scenario I: Fréchet SDR for distributions

Let $(\Omega_Y, d_W)$ be the metric space of univariate distributions endowed with the Wasserstein metric $d_W$, as described in Section 3. The construction of the ensembles requires computing the Wasserstein distances $d_W(Y_i, Y_j)$ for $i, j = 1, \dots, n$. However, the distributions $Y_1, \dots, Y_n$ are usually not fully observed in practice, which means we need to estimate them when implementing the proposed methods. There are multiple ways to do so, such as estimating the c.d.f.'s, the quantile functions (Parzen, 1979), or the p.d.f.'s (Petersen and Müller, 2016; Chen et al., 2021). For computational simplicity, we use the Wasserstein distances between the empirical measures. Specifically, suppose we observe $(X_1, \{W_{1j}\}_{j=1}^{m_1}), \dots, (X_n, \{W_{nj}\}_{j=1}^{m_n})$, where $\{W_{ij}\}_{j=1}^{m_i}$ are independent samples from the distribution $Y_i$. Let $\hat Y_i$ be the empirical measure $m_i^{-1} \sum_{j=1}^{m_i} \delta_{W_{ij}}$, where $\delta_a$ is the Dirac measure at $a$; then we estimate $d_W(Y_i, Y_k)$ by $d_W(\hat Y_i, \hat Y_k)$. For the theoretical justification, see Fournier and Guillin (2015) and Lei (2020). For simplicity, we assume the sample sizes $m_1, \dots, m_n$ to be the same and denote the common sample size by $m$. Then the distance between the empirical measures $\hat Y_i$ and $\hat Y_k$ is a simple function of the order statistics: $d_W(\hat Y_i, \hat Y_k) = \{m^{-1} \sum_{j=1}^{m} (W_{i(j)} - W_{k(j)})^2\}^{1/2}$, where $W_{i(j)}$ is the $j$th order statistic of the sample $W_{i1}, \dots, W_{im}$.
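The order-statistics formula makes the empirical distance a one-liner, as in this sketch (assuming, as above, equal within-sample sizes $m$):

```python
import numpy as np

def wasserstein2_empirical(Wi, Wk):
    """W2 distance between two empirical measures with the same number of
    atoms: {m^{-1} sum_j (W_{i(j)} - W_{k(j)})^2}^{1/2} via sorted samples."""
    return np.sqrt(np.mean((np.sort(Wi) - np.sort(Wk)) ** 2))
```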

Let $\beta_1 = (1, 1, 0, \dots, 0)^\top$, $\beta_2 = (0, \dots, 0, 1, 1)^\top$, $\beta_3 = (1, 2, 0, \dots, 0, 2)^\top$, and $\beta_4 = (0, 0, 1, 2, 2, \dots, 0)^\top$. To generate the univariate distributional response $Y$, we let $Y = N(\mu_Y, \sigma_Y^2)$, where $\mu_Y$ and $\sigma_Y^2$ are random variables dependent on $X$, with $\sigma_Y > 0$ almost surely, defined by the following models:

I-1: $\mu_Y \mid X \sim N(\exp(\beta_1^\top X), 1)$ and $\sigma_Y = 1$.

I-2: $\mu_Y \mid X \sim N(\exp(\beta_1^\top X), 1)$ and $\sigma_Y = 10^{-1} \, 1\{\varsigma_X < 10^{-1}\} + \varsigma_X \, 1\{10^{-1} \le \varsigma_X \le 10\} + 10 \cdot 1\{\varsigma_X > 10\}$, where $\varsigma_X = \exp(\beta_2^\top X)$.

To generate the predictor X, we consider both scenarios where Assumption 1 is satisfied and violated. Specifically, for Model I-1 and I-2, X is generated by the following two scenarios:

  1. $X \sim N(0, I_p)$; in this case both LCM and CCV in Assumption 1 are satisfied;

  2. we first generate $(U_1, \dots, U_p)$ from the AR(1) model with mean 0 and covariance matrix $\Sigma = (0.5^{|i-j|})_{i,j}$, and then generate $X$ as $(\sin(U_1), U_2, U_3, \dots, U_p)^\top$. For this model, both LCM and CCV are violated.

Ying and Yu (2022) considered models similar to Models I-1 and I-2. For Model I-1, $B_0 = \beta_1$ and $d_0 = 1$; for Model I-2, $B_0 = (\beta_1, \beta_2)$ and $d_0 = 2$. In the simulation, we first generate $X_1, \dots, X_n$, then generate $(\mu_{Y_1}, \sigma_{Y_1}), \dots, (\mu_{Y_n}, \sigma_{Y_n})$. For each $i = 1, \dots, n$, we then generate $W_{i1}, \dots, W_{im}$ independently from $N(\mu_{Y_i}, \sigma_{Y_i}^2)$. We take $(n, p) = (200, 10), (400, 20)$ and $m = 100$.

We compare the performances of the CMS-ensemble and CS-ensemble methods described in Section 4, including FOLS, FPHD, FIHT, FSIR, FSAVE, FDR, and FOPG (with refinement). We first implement the predictor augmentation (PA) method to estimate the dimension of the Fréchet central subspace. Then, with the estimated $\hat d$, we evaluate the estimation error. For FOPG, the number of iterations is set to 5, which is large enough to guarantee numerical convergence. For FSIR and FSAVE, the number of slices is chosen as $n/(2p)$; for FDR, the number of slices is chosen as $n/(6p)$. We also implement the weighted inverse regression ensemble (WIRE) method proposed by Ying and Yu (2022) for comparison. We repeat the experiments 500 times and summarize the proportion of correct order identification and the mean and standard deviation of the estimation error in Table 1. A smaller distance indicates a more accurate estimate, and the estimate with the smallest distance is shown in boldface. The benchmark distances are shown at the bottom of the table.

Table 2:

Percentage of correct order determination and mean (± standard deviation) of the estimation error, measured by $\|P_{B_0} - P_{\hat B}\|_F$, for different methods in Scenario II. The benchmarks for Model II-1 with $p = 10, 20$ are 1.334 and 1.373, respectively; for Model II-2 with $p = 10, 20$, they are 1.785 and 1.893, respectively. The bold-faced number indicates the best performer.

Model, (p,n)        Row      FOLS     FPHD     FIHT     FSIR     FSAVE    FDR      FOPG     WIRE
II-1-(a), (10,200)  order%   100%     71%      100%     99%      87%      98%      100%     100%
                    error    0.157    0.865    0.157    0.170    0.299    0.171    0.154    0.152
                    (sd)     (0.055)  (0.288)  (0.055)  (0.093)  (0.285)  (0.138)  (0.041)  (0.038)
II-1-(a), (20,400)  order%   100%     68%      100%     100%     92%      99%      100%     100%
                    error    0.162    0.921    0.162    0.167    0.258    0.165    0.153    0.160
                    (sd)     (0.029)  (0.262)  (0.029)  (0.031)  (0.221)  (0.089)  (0.027)  (0.028)
II-1-(b), (10,200)  order%   99%      58%      99%      92%      51%      67%      97%      98%
                    error    0.236    1.044    0.236    0.288    0.735    0.506    0.224    0.220
                    (sd)     (0.115)  (0.271)  (0.115)  (0.219)  (0.348)  (0.376)  (0.162)  (0.133)
II-1-(b), (20,400)  order%   100%     52%      100%     96%      51%      70%      96%      99%
                    error    0.235    1.126    0.235    0.260    0.726    0.472    0.233    0.222
                    (sd)     (0.047)  (0.245)  (0.047)  (0.155)  (0.336)  (0.364)  (0.157)  (0.09)
II-2-(a), (10,200)  order%   100%     20%      100%     100%     78%      99%      100%     100%
                    error    0.292    1.20     0.292    0.306    0.615    0.358    0.151    0.290
                    (sd)     (0.078)  (0.155)  (0.078)  (0.088)  (0.285)  (0.122)  (0.05)   (0.078)
II-2-(a), (20,400)  order%   100%     20%      100%     100%     85%      100%     100%     100%
                    error    0.308    1.218    0.308    0.318    0.618    0.375    0.179    0.307
                    (sd)     (0.057)  (0.15)   (0.057)  (0.058)  (0.218)  (0.075)  (0.029)  (0.057)
II-2-(b), (10,200)  order%   99%      35%      98%      100%     58%      80%      98%      100%
                    error    0.680    1.462    0.682    0.707    1.182    0.941    0.275    0.675
                    (sd)     (0.184)  (0.19)   (0.186)  (0.185)  (0.233)  (0.245)  (0.137)  (0.181)
II-2-(b), (20,400)  order%   100%     42%      100%     100%     50%      92%      100%     100%
                    error    0.694    1.505    0.694    0.710    1.228    0.933    0.336    0.691
                    (sd)     (0.128)  (0.187)  (0.128)  (0.126)  (0.189)  (0.194)  (0.079)  (0.128)

For Models I-1 and I-2, the best performer, FOPG, achieves a 100% correct order determination rate and enjoys the smallest estimation error. The moment-based ensemble methods are slightly less accurate than FOPG. Compared with the benchmark, all methods successfully estimate the true central subspace except FPHD. Comparing the results under predictor settings (a) and (b), we see that most moment-based and inverse-regression-based methods have larger estimation errors and a lower percentage of correct order determination under setting (b), but FOPG, which is free of the elliptical assumption on the predictors, still gives the most precise estimates. Overall, the correlation between predictors and the non-ellipticity do not affect the results much compared with the benchmark error.

6.3. Scenario II: Fréchet SDR for positive definite matrices

Let $\Omega_Y$ be the space $\mathrm{Sym}^+(r)$ endowed with the Frobenius distance $d_F(Y_1, Y_2) = \|Y_1 - Y_2\|_F$. To accommodate anatomical intersubject variability, Schwartzman (2006) introduced the symmetric matrix variate normal distributions. We adopt this distribution to construct the regression model with a correlation-type matrix response. We say that $Z \in \mathrm{Sym}(r)$ has the standard symmetric matrix variate normal distribution $N_{rr}(0; I_r)$ if it has density $\varphi(Z) = (2\pi)^{-q/2} \exp\{-\mathrm{tr}(Z^2)/2\}$, with $q = r(r+1)/2$, with respect to the Lebesgue measure on $\mathbb{R}^{r(r+1)/2}$. As pointed out in Schwartzman (2006), this definition is equivalent to a symmetric matrix with independent $N(0, 1)$ diagonal elements and $N(0, 1/2)$ off-diagonal elements. We say $Y \in \mathrm{Sym}(r)$ has the symmetric matrix variate normal distribution $N_{rr}(M; \Sigma)$ if $Y = G Z G^\top + M$, where $M \in \mathrm{Sym}(r)$, $G \in \mathbb{R}^{r \times r}$ is a non-singular matrix, and $\Sigma = G^\top G$. As a special case, we say $Y \sim N_{rr}(M; \sigma^2)$ if $Y = \sigma Z + M$.
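A sketch of the corresponding sampler (our illustration), drawing $Y = \sigma Z + M$ with $Z$ standard symmetric matrix variate normal:

```python
import numpy as np

def sym_matrix_normal(M, sigma, rng=None):
    """Draw Y ~ N_rr(M; sigma^2): Z has independent N(0, 1) diagonal and
    N(0, 1/2) off-diagonal entries (Schwartzman, 2006); Y = sigma * Z + M."""
    rng = rng or np.random.default_rng()
    r = M.shape[0]
    Z = np.zeros((r, r))
    iu = np.triu_indices(r, k=1)
    Z[iu] = rng.normal(scale=np.sqrt(0.5), size=iu[0].size)
    Z += Z.T                                   # symmetric off-diagonal part
    Z[np.diag_indices(r)] = rng.normal(size=r)
    return sigma * Z + M
```

In Scenario II one would apply this on the log scale, drawing $\log(Y)$ around $\log\{D(X)\}$ with $\sigma = 0.5$ and mapping back by the matrix exponential.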

We generate the predictors $X$ as in settings (a) and (b) of Scenario I. We generate $\log(Y)$ following $N_{rr}(\log\{D(X)\}, 0.25)$, where $\log(\cdot)$ is the matrix logarithm defined in Section 3 and $D(X)$ is specified by the following models:

II-1: $D(X) = \begin{pmatrix} 1 & \rho(X) \\ \rho(X) & 1 \end{pmatrix}$, where $\rho(X) = \{\exp(\beta_1^\top X) - 1\} / \{\exp(\beta_1^\top X) + 1\}$.

II-2: $D(X) = \begin{pmatrix} 1 & \rho_1(X) & \rho_2(X) \\ \rho_1(X) & 1 & \rho_1(X) \\ \rho_2(X) & \rho_1(X) & 1 \end{pmatrix}$, where $\rho_1(X) = 0.4 \{\exp(\beta_1^\top X) - 1\} / \{\exp(\beta_1^\top X) + 1\}$ and $\rho_2(X) = 0.4 \sin(\beta_3^\top X)$.

In Model II-1, $B_0 = \beta_1$ and $d_0 = 1$; in Model II-2, $B_0 = (\beta_1, \beta_3)$ and $d_0 = 2$. We note that $D(X)$ is not necessarily the conditional Fréchet mean of $Y$ given $X$, but it still measures the central tendency of the conditional distribution of $Y \mid X$. We again compare the performances of the CMS-ensemble and CS-ensemble methods, with $(n, p) = (200, 10), (400, 20)$. The experiments are repeated 500 times. The proportion of correct order identification and the means and standard deviations of the estimation errors are summarized in Table 2.

Table 3:

10-fold cross-validation prediction errors of GFR/LFR for mortality data

GFR LFR
9-dim full predictor 30.475 28.745
2-dim sufficient predictor 27.200 23.852

We conclude that all ensemble methods except FPHD give reasonable estimates. FOPG performs best in all settings except Model II-1-(b). To illustrate the relation between the response and the estimated sufficient predictor $\hat\beta_1^{\top}X$, we adopt the ellipsoidal representation of SPD matrices: each $A \in \mathrm{Sym}^+(d)$ can be associated with the ellipsoid centered at the origin, $\mathcal{E}_A = \{x : x^{\top}A^{-1}x \le 1\}$. For Model II-1-(a), Figure 2 plots the response ellipsoids against the estimated sufficient predictor in panel (a) and against the predictor $X_{10}$ in panel (b). A clear pattern of change in the shape and rotation of the response ellipsoids emerges as $\hat\beta_1^{\top}X$ varies.
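The ellipsoidal representation is straightforward to draw: the boundary of $\{x : x^{\top}A^{-1}x \le 1\}$ is the image of the unit circle under the symmetric square root $A^{1/2}$. A minimal sketch (illustrative names; not the paper's plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

def ellipse_boundary(A, n_points=200):
    """Boundary of {x : x' A^{-1} x <= 1}: the unit circle mapped by A^{1/2}."""
    w, V = np.linalg.eigh(A)
    sqrtA = V @ np.diag(np.sqrt(w)) @ V.T        # symmetric square root of A
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    return sqrtA @ np.vstack([np.cos(t), np.sin(t)])

A = np.array([[1.0, 0.6], [0.6, 1.0]])           # an SPD correlation matrix
xs, ys = ellipse_boundary(A)
plt.plot(xs, ys)
plt.axis("equal")
plt.show()
```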

Figure 2:

Ellipsoidal plots of the SPD matrix response versus the FOPG predictor $\hat\beta_1^{\top}X$ and versus $X_{10}$, for Model II-1-(a) with $(n, p) = (200, 10)$. Each horizontal ellipse is the ellipsoidal representation of an SPD matrix, and the vertical axis gives the value of (a) $\hat\beta_1^{\top}X$; (b) $X_{10}$; the ellipses are colored according to the vertical-axis values.

7. Application to the Human Mortality Data

This section presents an application concerning human life spans. Another application, concerning intracerebral hemorrhage, is presented in Section S.6 of the Supplementary Material.

Compared with summary statistics such as the crude death rate, viewing the entire age-at-death distribution as a data object gives a more comprehensive picture of human longevity and health conditions. Mortality distributions are affected by many factors, such as economics, the health care system, and social and environmental conditions. To investigate potential factors related to the mortality distributions across countries, we collect the nine predictors listed below, covering demographic, economic, labor-market, nutritional, health, and environmental factors in 2015: (1) Population Density: population per square kilometer; (2) Sex Ratio: number of males per 100 females in the population; (3) Mean Childbearing Age: the average age of mothers at the birth of their children; (4) Gross Domestic Product (GDP) per Capita; (5) Gross Value Added (GVA) by Agriculture: the percentage of gross value added from agriculture, hunting, forestry, and fishing activities; (6) Consumer Price Index (CPI), with 2010 as the base year; (7) Unemployment Rate; (8) Expenditure on Health (percentage of GDP); (9) Arable Land (percentage of total land area). The data are collected from the United Nations Databases (http://data.un.org/) and the UN World Population Prospects 2019 Databases (https://population.un.org/wpp/Download). For each country, the life table records the number of deaths d(x,n) aggregated over five-year age intervals. We treat these data as histograms of age at death with bin widths equal to five years. We smooth the histograms using the 'frechet' R package (https://cran.r-project.org/web/packages/frechet/index.html) to obtain probability density functions, and then compute the Wasserstein distances between them. After removing 10 countries with extreme values in Population Density, Sex Ratio, CPI, or Expenditure on Health, we retain data for 152 countries in 2015.
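For readers who want to reproduce this preprocessing without the 'frechet' package, a minimal sketch is given below: it converts binned death counts into quantile functions by interpolating the piecewise-linear CDF, and computes the 2-Wasserstein distance as the $L_2$ distance between quantile functions. The bin edges and Poisson counts are illustrative placeholders, not real life-table values.

```python
import numpy as np

edges = np.arange(0, 105, 5)                      # 5-year age bins, ages 0-100

def quantile_fn(counts, edges, grid):
    """Quantile function on `grid`, by inverting the piecewise-linear CDF
    defined over the bin edges of a histogram with the given counts."""
    cdf = np.concatenate([[0.0], np.cumsum(counts) / np.sum(counts)])
    return np.interp(grid, cdf, edges)

u = np.linspace(0.005, 0.995, 200)                # interior quantile grid on (0, 1)
counts_a = np.random.default_rng(2).poisson(50, size=len(edges) - 1)
counts_b = np.random.default_rng(3).poisson(50, size=len(edges) - 1)
Qa = quantile_fn(counts_a, edges, u)
Qb = quantile_fn(counts_b, edges, u)

# W_2 distance = L2 distance between quantile functions; the grid is
# uniform on (0, 1), so a mean over the grid approximates the integral.
w2 = np.sqrt(np.mean((Qa - Qb) ** 2))
```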

An inspection of the scatter plot matrix reveals non-ellipticity in the predictors, so we choose FOPG, which does not rely on Assumption 1, to analyze the data. We use the Gaussian kernel $\kappa(y, y') = \exp\{-\gamma d_W^2(y, y')\}$ for the ensemble, where $\gamma$ is chosen according to (6) in Section 6.1. We standardize all covariates separately, and then use the predictor augmentation method combined with FOPG to estimate the dimension of the Fréchet central subspace, which is estimated to be 2. The first two directions obtained by FOPG are

$\hat\beta_1 = (0.841, 0.155, 0.100, 0.885, 0.361, 0.075, 0.108, 0.214, 0.055)^{\top}$,
$\hat\beta_2 = (0.838, 0.706, 0.395, 0.424, 0.758, 0.005, 0.218, 0.102, 0.034)^{\top}$.
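The following sketch illustrates the ensemble construction behind these estimates, with assumptions flagged: the kernel-transformed responses $f_j(Y) = \kappa(y_j, Y)$ are fed to a simple Euclidean SDR method, here sliced inverse regression for brevity rather than the FOPG estimator actually used, and the resulting candidate matrices are summed before extracting the leading directions.

```python
import numpy as np

def sir_candidate(Z, f, n_slices=5):
    """SIR candidate matrix for a scalar response f and whitened predictors Z:
    the weighted covariance of the slice means of Z."""
    order = np.argsort(f)
    slices = np.array_split(order, n_slices)
    means = np.stack([Z[idx].mean(axis=0) for idx in slices])
    weights = np.array([len(idx) for idx in slices]) / len(f)
    return (means.T * weights) @ means

def ensemble_sdr(X, dist2, gamma, d):
    """dist2[i, j] = squared metric distance between responses i and j."""
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Sinv_half = V @ np.diag(w ** -0.5) @ V.T     # inverse square root of Cov(X)
    Z = Xc @ Sinv_half                           # whitened predictors
    K = np.exp(-gamma * dist2)                   # Gram matrix of the ensemble
    M = sum(sir_candidate(Z, K[:, j]) for j in range(K.shape[1]))
    evals, evecs = np.linalg.eigh(M)
    return Sinv_half @ evecs[:, ::-1][:, :d]     # top-d directions on the X scale
```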

A plot of the mortality densities versus the first sufficient predictor $\hat\beta_1^{\top}X$ is shown in Figure 3(a). Clear and useful patterns emerge: the mode of the mortality distribution shifts from right to left (with left indicating a longer life span) as the value of the first sufficient predictor increases. Moreover, there is a significant uptick at the right-most end as the first sufficient predictor decreases, indicating high infant mortality. Meanwhile, the loadings of the first sufficient predictor are strongly positive for GDP per capita, which reflects a country's level of economic development and health care, with larger values associated with more developed countries and smaller values with less developed ones. From Figure 3(c), we see that the mean of the mortality distribution increases, and the standard deviation decreases, with the value of the first sufficient predictor. This also makes sense: the mean life span increases with the level of development, consistent with Figure 3(a). The standard deviation decreases with the first predictor because, as the development level increases, life spans become increasingly concentrated on high values. Moreover, the high mortality in the lower region of the first sufficient predictor also contributes to the larger standard deviation in that region.

Figure 3:

(a) Mortality distributions versus the first sufficient predictor; (b) cross-validation-predicted mortality distributions using the sufficient predictors; (c) mean and standard deviation of the responses versus the first sufficient predictor.

We fit global and local Fréchet regressions (GFR/LFR) with all nine predictors and with the two sufficient predictors, respectively. The 10-fold cross-validation prediction errors are collected in Table 3. Using the sufficient predictors yields more accurate predictions, especially for the local Fréchet regression model. The LFR gives more accurate predictions than the GFR, indicating that the LFR is more flexible when a nonlinear regression pattern exists; this result is consistent with the recent findings of Bhattacharjee and Müller (2021). The predicted mortality densities versus the first sufficient predictor are shown in Figure 3(b). We also compare the cross-validation prediction errors of LFR using the sufficient predictors from FOPG with those using the sufficient predictors from FSIR and from WIRE of Ying and Yu (2022), both of which require the linear conditional mean assumption. The 10-fold cross-validation prediction errors using WIRE and FSIR are 24.765 and 24.342, respectively. Compared with 23.852 for FOPG, we see that FOPG performs better than FSIR and WIRE, while the latter two perform similarly.
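A minimal sketch of this comparison, under assumptions: responses represented as quantile functions on a common grid, the global (rather than local) Fréchet regression weights of Petersen and Müller (2019), and a crude cumulative-maximum fix in place of the exact isotonic projection onto monotone quantile functions.

```python
import numpy as np

def gfr_predict(X_tr, Q_tr, x_new):
    """Global Frechet regression in Wasserstein space: a weighted mean of
    the training quantile functions, with weights 1 + (X_i - mu)' S^{-1} (x - mu)."""
    mu = X_tr.mean(axis=0)
    Sinv = np.linalg.pinv(np.cov(X_tr, rowvar=False))
    w = 1.0 + (X_tr - mu) @ Sinv @ (x_new - mu)
    q = (w[:, None] * Q_tr).mean(axis=0)
    return np.maximum.accumulate(q)              # crude projection to monotone

def cv_error(X, Q, n_folds=10, seed=0):
    """Mean squared quantile-function discrepancy (a squared-W2 proxy)
    over 10-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    err = []
    for fold in np.array_split(idx, n_folds):
        tr = np.setdiff1d(idx, fold)
        for i in fold:
            q_hat = gfr_predict(X[tr], Q[tr], X[i])
            err.append(np.mean((q_hat - Q[i]) ** 2))
    return float(np.mean(err))

# cv_error(X_full, Q) vs. cv_error(X_full @ B_hat, Q) would reproduce the
# spirit of the full- vs. sufficient-predictor comparison in Table 3.
```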

8. Discussion

In the classical regression setting, sufficient dimension reduction has been used as a tool for exploratory data analysis and regression diagnostics, and as a mechanism to overcome the curse of dimensionality in regression. As a regression tool, it can help us treat collinearity in the predictors effectively, detect heteroscedasticity in the response, find the most important linear combinations of the predictors, and understand the general shape of the regression surface without fitting an elaborate regression model. Although regression with a metric-space-valued random object is a new problem, as a regression problem it shares the same set of issues, such as the need for exploratory analysis before regression, for model diagnostics after regression, and for mitigating the curse of dimensionality. As shown in Figure 1 of the paper, the first sufficient predictor clearly reveals useful information about the general trend of mortality distributions across countries.

The proposed methodology is flexible and versatile: it can turn any existing SDR method into one that handles metric-space-valued responses. Furthermore, it applies to any separable and complete metric space of negative type with an explicit CMS ensemble. It significantly broadens the current field of sufficient dimension reduction and provides a useful set of tools for Fréchet regression.

The proposed method also has its limitations, one of which is that it applies only to metric spaces that permit the construction of a universal kernel. Another possible criticism is that an ensemble constructed from the metric of the embedded Hilbert space is extrinsic to the original metric space. However, we do not regard this as a serious drawback, for two reasons. First, the role played by the ensemble family is rather like that played by the characteristic function, which need not be of the same nature as the original random variable. Second, in some important special cases (e.g., the Wasserstein space of univariate distributions and the space of SPD matrices), the embedding is isometric, so we are building the kernel from the original metric even though we work in the embedded Hilbert space. Nevertheless, when it is possible to use the original metric (as in the isometric embedding case), it seems intuitively appealing to take it as our first choice, as we have done in all three examples.

Supplementary Material

Supp 1

Table 1:

The percentages of correct order determination and the mean (standard deviation) of the estimation error, measured by $\|P_{B_0} - P_{\hat B}\|_F$, for Models I-1 and I-2 with settings (a) and (b). The benchmarks for Model I-1 with p = 10, 20 are 1.334 and 1.373, respectively; those for Model I-2 with p = 10, 20 are , respectively. The bold-faced number indicates the best performer.

Model (p,n) FOLS FPHD FIHT FSIR FSAVE FDR FOPG WIRE
I-1-(a) 100% 97% 100% 100% 95% 97% 100% 100%
(10,200) 0.334 0.593 0.341 0.260 0.437 0.336 0.167 0.236
(0.088) (0.158) (0.09) (0.081) (0.199) (0.144) (0.054) (0.057)
100% 97% 100% 100% 97% 98% 100% 100%
(20,400) 0.365 0.634 0.371 0.263 0.433 0.342 0.227 0.251
(0.075) (0.136) (0.075) (0.046) (0.149) (0.115) (0.05) (0.041)
I-1-(b) 99% 99% 99% 97% 95% 98% 100% 97%
(10,200) 0.380 0.638 0.399 0.239 0.361 0.280 0.136 0.204
(0.122) (0.122) (0.126) (0.145) (0.187) (0.12) (0.039) (0.147)
99% 99% 99% 98% 95% 99% 100% 97%
(20,400) 0.387 0.648 0.404 0.237 0.365 0.275 0.194 0.211
(0.098) (0.096) (0.094) (0.123) (0.176) (0.092) (0.053) (0.151)
I-2-(a) 100% 91% 100% 100% 99% 100% 100% 100%
(10,200) 0.409 1.032 0.412 0.370 0.528 0.413 0.267 0.304
(0.11) (0.254) (0.109) (0.082) (0.134) (0.09) (0.112) (0.061)
100% 90% 100% 100% 100% 100% 100% 100%
(20,400) 0.431 1.157 0.435 0.371 0.548 0.434 0.298 0.320
(0.069) (0.23) (0.069) (0.049) (0.086) (0.059) (0.072) (0.038)
I-2-(b) 100% 91% 100% 99% 100% 100% 100% 100%
(10,200) 0.551 1.122 0.557 0.464 0.630 0.507 0.290 0.370
(0.111) (0.203) (0.11) (0.119) (0.133) (0.102) (0.086) (0.084)
100% 91% 100% 100% 100% 100% 100% 100%
(20,400) 0.561 1.179 0.567 0.458 0.645 0.521 0.330 0.381
(0.081) (0.164) (0.08) (0.072) (0.089) (0.07) (0.051) (0.058)

Acknowledgments

The authors thank the Co-Editor, Associate Editor, and anonymous reviewers for their helpful comments, which significantly improved the quality of the paper.

Footnotes

Disclosure

The authors report there are no competing interests to declare.

References

  1. Ambrosio L, Gigli N and Savaré G (2004), 'Gradient flows with metric and differentiable structures, and applications to the Wasserstein space', Atti Accad. Naz. Lincei Cl. Sci. Fis. Mat. Natur. Rend. Lincei (9) Mat. Appl. 15(3–4), 327–343.
  2. Berg C, Christensen JPR and Ressel P (1984), Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, Vol. 100, Springer.
  3. Bhattacharjee S and Müller H-G (2021), 'Single index Fréchet regression', arXiv preprint arXiv:2108.05437.
  4. Bigot J, Gouet R, Klein T and López A (2017), 'Geodesic PCA in the Wasserstein space by convex PCA', Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 53, 1–26.
  5. Chen Y, Lin Z and Müller H-G (2021), 'Wasserstein regression', J. Amer. Statist. Assoc.
  6. Christmann A and Steinwart I (2010), Universal kernels on non-standard input spaces, in 'Advances in Neural Information Processing Systems', pp. 406–414.
  7. Cook RD (1996), 'Graphics for regressions with a binary response', J. Amer. Statist. Assoc. 91, 983–992.
  8. Cook RD and Li B (2002), 'Dimension reduction for conditional mean in regression', The Annals of Statistics 30(2), 455–474.
  9. Cook RD and Weisberg S (1991), 'Sliced inverse regression for dimension reduction: Comment', Journal of the American Statistical Association 86(414), 328–332.
  10. Ding S and Cook RD (2015), 'Tensor sliced inverse regression', J. Multivar. Anal. 133, 216–231.
  11. Dubey P and Müller H-G (2019), 'Fréchet analysis of variance for random objects', Biometrika 106(4), 803–821.
  12. Dubey P and Müller H-G (2020a), 'Fréchet change-point detection', Ann. Stat. 48(6), 3312–3335.
  13. Dubey P and Müller H-G (2020b), 'Functional models for time-varying random objects', Journal of the Royal Statistical Society Series B: Statistical Methodology 82(2), 275–327.
  14. Eaton ML (1986), 'A characterization of spherical distributions', J. Multivar. Anal. 20(2), 272–276.
  15. Fan J, Xue L and Yao J (2017), 'Sufficient forecasting using factor models', Journal of Econometrics 201(2), 292–306.
  16. Feragen A, Lauze F and Hauberg S (2015), Geodesic exponential kernels: when curvature and linearity conflict, in 'Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition', pp. 3032–3042.
  17. Ferré L and Yao A-F (2003), 'Functional sliced inverse regression analysis', Statistics 37(6), 475–488.
  18. Ferreira LK and Busatto GF (2013), 'Resting-state functional connectivity in normal brain aging', Neuroscience & Biobehavioral Reviews 37(3), 384–400.
  19. Fournier N and Guillin A (2015), 'On the rate of convergence in Wasserstein distance of the empirical measure', Probability Theory and Related Fields 162(3–4), 707–738.
  20. Fréchet M (1948), 'Les éléments aléatoires de nature quelconque dans un espace distancié', Annales de l'Institut Henri Poincaré, pp. 215–310.
  21. Granirer EE (1970), 'Review of Probability Measures on Metric Spaces by K. R. Parthasarathy (Academic Press, New York and London, 1967)', Canadian Mathematical Bulletin 13(2), 290–291.
  22. Hall P and Li K-C (1993), 'On almost linearity of low dimensional projections from high dimensional data', The Annals of Statistics, pp. 867–889.
  23. Honeine P and Richard C (2010), The angular kernel in machine learning for hyperspectral data classification, in '2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing', IEEE, pp. 1–4.
  24. Hsing T and Ren H (2009), 'An RKHS formulation of the inverse regression dimension-reduction problem', The Annals of Statistics 37(2), 726–755.
  25. Jayasumana S, Hartley R, Salzmann M, Li H and Harandi M (2013), Combining multiple manifold-valued descriptors for improved object recognition, in '2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)', IEEE, pp. 1–6.
  26. Lee K-Y, Li B and Chiaromonte F (2013), 'A general theory for nonlinear sufficient dimension reduction: Formulation and estimation', The Annals of Statistics 41(1), 221–249.
  27. Lei J (2020), 'Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces', Bernoulli 26(1), 767–798.
  28. Li B (2018), Sufficient Dimension Reduction: Methods and Applications with R, CRC Press.
  29. Li B, Kim MK and Altman N (2010), 'On dimension folding of matrix- or array-valued statistical objects', The Annals of Statistics 38(2), 1094–1121.
  30. Li B and Song J (2017), 'Nonlinear sufficient dimension reduction for functional data', The Annals of Statistics 45(3), 1059–1095.
  31. Li B and Wang S (2007), 'On directional regression for dimension reduction', Journal of the American Statistical Association 102(479), 997–1008.
  32. Li B and Yin X (2007), 'On surrogate dimension reduction for measurement error regression: an invariance law', The Annals of Statistics 35(5), 2143–2172.
  33. Li B, Zha H and Chiaromonte F (2005), 'Contour regression: a general approach to dimension reduction', The Annals of Statistics 33(4), 1580–1616.
  34. Li K-C (1991), 'Sliced inverse regression for dimension reduction', J. Amer. Statist. Assoc. 86, 316–327.
  35. Li K-C (1992), 'On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma', Journal of the American Statistical Association 87(420), 1025–1039.
  36. Li K-C and Duan N (1989), 'Regression analysis under link violation', Ann. Stat. 17(3), 1009–1052.
  37. Luo W and Li B (2016), 'Combining eigenvalues and variation of eigenvectors for order determination', Biometrika 103(4), 875–887.
  38. Luo W and Li B (2021), 'On order determination by predictor augmentation', Biometrika 108(3), 557–574.
  39. Luo W, Xue L, Yao J and Yu X (2021), 'Inverse moment methods for sufficient forecasting using high-dimensional predictors', Biometrika 109(2), 473–487.
  40. Micchelli CA, Xu Y and Zhang H (2006), 'Universal kernels', J. Mach. Learn. Res. 7(12), 2651–2667.
  41. Panaretos VM and Zemel Y (2020), An Invitation to Statistics in Wasserstein Space, Springer Nature.
  42. Parzen E (1979), 'Nonparametric statistical data modeling', J. Amer. Statist. Assoc. 74(365), 105–121.
  43. Petersen A and Müller H-G (2016), 'Functional data analysis for density functions by transformation to a Hilbert space', The Annals of Statistics 44(1), 183–218.
  44. Petersen A and Müller H-G (2019), 'Fréchet regression for random objects with Euclidean predictors', The Annals of Statistics 47(2), 691–719.
  45. Schwartzman A (2006), Random Ellipsoids and False Discovery Rates: Statistics for Diffusion Tensor Imaging Data, PhD thesis, Stanford University.
  46. Sejdinovic D, Sriperumbudur B, Gretton A and Fukumizu K (2013), 'Equivalence of distance-based and RKHS-based statistics in hypothesis testing', The Annals of Statistics, pp. 2263–2291.
  47. Sriperumbudur BK, Fukumizu K and Lanckriet GR (2011), 'Universality, characteristic kernels and RKHS embedding of measures', Journal of Machine Learning Research 12(7), 2389–2410.
  48. Steinwart I (2001), 'On the influence of the kernel on the consistency of support vector machines', Journal of Machine Learning Research 2(Nov), 67–93.
  49. Xia Y, Tong H, Li WK and Zhu L-X (2002), 'An adaptive estimation of dimension reduction space (with discussion)', Journal of the Royal Statistical Society, Series B 64(3), 363–410.
  50. Ye Z and Weiss RE (2003), 'Using the bootstrap to select one of a new class of dimension reduction methods', Journal of the American Statistical Association 98(464), 968–979.
  51. Yin X and Li B (2011), 'Sufficient dimension reduction based on an ensemble of minimum average variance estimators', The Annals of Statistics 39(6), 3392–3416.
  52. Yin X, Li B and Cook RD (2008), 'Successive direction extraction for estimating the central subspace in a multiple-index regression', Journal of Multivariate Analysis 99(8), 1733–1757.
  53. Ying C and Yu Z (2022), 'Fréchet sufficient dimension reduction for random objects', Biometrika.
  54. Yu X, Yao J and Xue L (2020), 'Nonparametric estimation and conformal inference of the sufficient forecasting with a diverging number of factors', J. Bus. Econ. Stat. 40(1), 342–354.
  55. Zhang Q, Li B and Xue L (2022), 'Nonlinear sufficient dimension reduction for distribution-on-distribution regression', arXiv preprint arXiv:2207.04613.
  56. Zhu L, Miao B and Peng H (2006), 'On sliced inverse regression with high-dimensional covariates', Journal of the American Statistical Association 101(474), 630–643.
