FUSED KERNEL-SPLINE SMOOTHING FOR REPEATEDLY MEASURED OUTCOMES IN A GENERALIZED PARTIALLY LINEAR MODEL WITH FUNCTIONAL SINGLE INDEX

Fei Jiang; Yanyuan Ma; Yuanjia Wang

doi:10.1214/15-AOS1330

Author manuscript; available in PMC: 2016 Aug 3.

Published in final edited form as: Ann Stat. 2015 Aug 3;43(5):1929–1958. doi: 10.1214/15-AOS1330



PMCID: PMC4536976 NIHMSID: NIHMS686160 PMID: 26283801

Abstract

We propose a generalized partially linear functional single index risk score model for repeatedly measured outcomes where the index itself is a function of time. We fuse the nonparametric kernel method and the regression spline method, and modify the generalized estimating equation to facilitate estimation and inference. We use local kernel smoothing to estimate the unspecified coefficient functions of time, and use B-splines to estimate the unspecified function of the single index component. The covariance structure is taken into account via a working model, which provides valid estimation and inference procedures whether or not it captures the true covariance. The estimation method is applicable to both continuous and discrete outcomes. We derive large sample properties of the estimation procedure and show the different convergence rates of the components of the model. The asymptotic properties when the kernel and regression spline methods are combined in a nested fashion have not been studied prior to this work, even in the independent data case.

Keywords: B-spline, Generalized linear model, Huntington's disease, Infinite dimension, Logistic model, Semiparametric model, Single index model

1. Introduction

As a semiparametric regression model, the single index model is a popular way to accommodate multivariate covariates while retaining model flexibility. For independent outcomes, Carroll et al. (1997) introduced a generalized partially linear single index model which enriches the family of single index models by allowing an additional linear component. The goal of this paper is to develop a class of generalized partially linear single index models with functional covariate effect and to explore estimation and inference for repeatedly measured dependent outcomes.

In the longitudinal data framework, let i denote the ith individual and k the kth measurement, where i = 1, . . . , n and k = 1, . . . , M_i. Here M_i is the total number of observations available for the ith individual. Let D_ik be the response variable, and let Z_ik and X_ik be d_w and d_β dimensional covariate vectors. We assume the observations from different individuals are independent, while the responses D_i1, . . . , D_iM_i assessed on the same individual at different time points are correlated, although we do not attempt to model this correlation. To model the relationship between the conditional mean of the repeatedly measured outcomes D_ik at time T_ik and the covariates Z_ik, X_ik, we propose a partially linear functional single index model which models the mean of D_ik given Z_ik, X_ik at time T_ik in the form of

E (D_{i k} ∣ X_{i k}, Z_{i k}, T_{i k}) = H [m {w {(T_{i k})}^{T} Z_{i k}} + β^{T} X_{i k}],

(1)

where H is a known differentiable monotone link function, w(t) ∈ R^{d_w} at any t, and β ∈ R^{d_β}. Such a model is useful when the time varying effect of Z_ik and the functional combined score effect of w(T_ik)^TZ_ik, adjusted by the covariate vector X_ik, are of main interest. Note that both X_ik and Z_ik can contain components that do not vary with k, such as gender, and components that vary with k, such as age. Here, m(0) serves as the intercept term, thus X_ik does not contain the constant one. In Model (1), Z_ik includes the covariates of main research interest, whose effects are usually time varying and modeled nonparametrically, and X_ik contains additional covariates of secondary scientific interest, whose effects are modeled only via a simple linear form. Here m is an unspecified smooth single index function. Further, w is a d_w-dimensional vector of smooth functions in L₂, while w(t) is w evaluated at t, hence a d_w-dimensional vector. In addition, w(t) contributes to form the argument of the function m, which yields a nested nonparametric functional form. To ensure identifiability and to reflect the practical application that motivated this model, we further require w(t) > 0 and ∥w(t)∥₁ = 1 ∀t. Here w(t) > 0 means every component in w(t) is positive, and ∥·∥₁ denotes the vector l₁-norm, i.e. the sum of the absolute values of the components in the vector. The choice of the l₁ norm incorporates the practical knowledge from our real data example described in Section 4 and is not critical. It can be replaced by other norms, such as the most often used l₂ norm or the sup norm, in our subsequent development. We assume the observed data follow the model described above. Throughout the text, we use subscript ₀ to denote the true parameters. Before we proceed, we first establish the identifiability of the model.

Proposition 1

Assume $m_{0} \in \mathcal{M}$, where $\mathcal{M} = \{m \in C^{1}([0, 1]) : m \text{ is one-to-one and } m(0) = c_{0}\}$. Here C¹([0, 1]) is the space of functions with continuous derivatives on [0, 1] and c₀ is a finite constant. Assume $w_{0}(t) \in \mathcal{D}$, where $\mathcal{D} = \{w = (w_{1}, \dots, w_{d_{w}})^{T} : \|w(t)\|_{1} = 1,\ w_{j} > 0,\ w_{j} \in C^{1}([0, \tau]),\ j = 1, \dots, d_{w}\}$. Here C¹([0, τ]) is the space of functions with continuous derivatives on [0, τ] and τ is a finite constant. Assume $E(X_{ik}^{\otimes 2})$ and $E(Z_{ik}^{\otimes 2})$ are both positive definite, where we define $a^{\otimes 2} = aa^{T}$ for an arbitrary vector a. Then under these assumptions, the parameter set (β₀, m₀, w₀) in (1) is identifiable.

The proof of Proposition 1 is in Appendix A.1. Model (1) can be viewed as a longitudinal extension of the generalized partially linear single index risk score model introduced in Carroll et al. (1997), i.e.,

E (D_{i k} ∣ X_{i k}, Z_{i k}) = H {m (w^{T} Z_{i k}) + β^{T} X_{i k}},

(2)

which is a popular way to increase flexibility when the covariate dimension may be high. Several existing works study the generalized partially linear single index model in longitudinal settings. Jiang and Wang (2011) consider a single index function of the form m(w^TZ_ik, t), which allows a time dependent function m, but w is time invariant, hence it does not have the nesting structure in Model (1) to capture the time dependent effect of Z_ik. Furthermore, the method does not consider the within subject correlation. Xu and Zhu (2012) adopted Model (2) as a marginal model in the longitudinal data setting. Their method takes into account the within subject correlation but, similar to the approach of Jiang and Wang (2011), it does not allow w to vary with time, hence it is not sufficient to describe the time varying effect of Z_ik. We modify the models of Jiang and Wang (2011) and Xu and Zhu (2012) to accommodate the time dependent score effect w(t). In Section 4, we show that the time-dependent effect is essential to improve model fit in some practical situations. In addition, we retain the virtue of these models by using the semiparametric functional single index model, which overcomes the curse of dimensionality and alleviates the risk of model mis-specification (Peng and Huang, 2011).
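To make the role of the time-varying index in Model (1) concrete, the following minimal Python sketch evaluates the mean function for a logit link. The particular choices of m, w(t), β and the helper names (`w_of_t`, `model_mean`) are hypothetical and serve only as an illustration; they are not the settings used in this paper.

```python
import math

def expit(u):
    """Inverse logit link H(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def w_of_t(t):
    """Hypothetical weight function: positive components with l1-norm one."""
    raw = [1.0 + t, 2.0 - t]          # both components are positive on [0, 1]
    total = sum(raw)
    return [r / total for r in raw]

def m(u):
    """Hypothetical smooth, one-to-one index function on [0, 1] with m(0) = 0."""
    return u ** 2 + u

def model_mean(z, x, t, beta, link=expit):
    """E(D | X, Z, T) = H[m{w(T)^T Z} + beta^T X], as in Model (1)."""
    index = sum(wj * zj for wj, zj in zip(w_of_t(t), z))
    linear = sum(bj * xj for bj, xj in zip(beta, x))
    return link(m(index) + linear)

# Same covariates, two different measurement times.
z, x, beta = [0.2, 0.8], [1.5], [0.3]
early = model_mean(z, x, t=0.0, beta=beta)
late = model_mean(z, x, t=1.0, beta=beta)
```

With a time-invariant w, as in Model (2), `early` and `late` would coincide for fixed covariates; here they differ only because the weights w(t) change with t.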

The estimation and inference for Model (1) are challenging due to the non-parametric form of m, w, and the complications from correlation between repeatedly measured outcomes. The estimation for single index models has been discussed extensively in both the kernel and the spline literature. Carroll et al. (1997) proposed a local kernel smoothing technique to estimate the unknown function m and the finite dimensional parameters w, β in Model (2) through iterative procedures. Later, Xia and Härdle (2006) applied a kernel-based minimum average variance estimation (MAVE) method for partially linear single index models, which was first proposed by Xia et al. (2002) for dimension reduction. When Z_ik is continuous, MAVE results in consistent estimators for the single index function m without the root-n assumption on w as in Carroll et al. (1997). Nevertheless, when Z_ik is discrete, the method may fail to obtain consistent estimators without prior information about β (Xia et al., 2002; Wang et al., 2010). Moreover, Wang and Yang (2009) showed that MAVE is unreliable for estimating the single index coefficient w when Z_ik is unbalanced and sparse, i.e., when Z_ik is measured at different time points for each subject, and each subject may have only a few measurements.

To overcome these limitations, we apply the B-spline method to estimate the unknown function m, which is stable when the data set contains discrete or sparse Z_ik. Although the B-spline method outperforms the kernel method in estimating m, problems arise if it is also used for estimating w(t) in our model setting. If spline approximations are used for both m and w(t) with k knots, then we must simultaneously solve (d_w + 1)k estimating equations to get the spline coefficients associated with the spline knots, which may cause numerical instability and is computationally expensive when the number of parameters increases with the sample size. To alleviate the computational burden and instability, we estimate w(t) by using the kernel method. At each time point t, the procedure solves for w(t) independently and in parallel, hence it does not suffer from the numerical instability and is computationally efficient. To handle longitudinal outcomes, we use the idea of the generalized estimating equation (GEE) to combine a set of estimating equations built from the marginal model. It is worth pointing out that the GEE in its original form is only applicable when the index w does not change over time. In conclusion, we combine kernel and B-spline smoothing with the GEE approach, and develop a fused kernel/B-spline procedure for estimation and inference.

The fusion of kernel and B-spline poses theoretical challenges which we address in this work. To the best of our knowledge, this is the first time kernel and spline methods are jointly implemented in a nested function setting. We study convergence properties, such as asymptotic bias and variance, for each component of the model, show that the parametric component achieves the regular root-n convergence rate, and establish the relation of the non-parametric function convergence rates to the number of B-spline basis functions and the B-spline order, as well as their relation to the kernel bandwidth. These results provide guidelines for choosing the number of knots in association with the spline order and bandwidth in order to optimize the performance. They also facilitate inference, such as constructing confidence intervals and performing hypothesis testing. Although theoretical properties of kernel smoothing and spline smoothing are available separately, the properties when these two methods are combined in a nested fashion have not been studied in the literature prior to this work, even for the independent data case. Because the vector function w appears inside the function m, the asymptotic analyses of the spline and kernel methods are not completely separable. This requires a comprehensive analysis and integration of both methods instead of a mechanical combination of two separate techniques.

The rest of the paper is structured as follows. In Section 2, we define notation and state the model assumptions, introduce the fused kernel/B-spline semiparametric estimating equation, illustrate the profiling estimation procedure to obtain the estimators, and study the asymptotic properties of the resulting estimators. In Section 3, we evaluate the estimation procedure on simulated data sets. In Section 4, we apply the model and estimation procedure to the Huntington's disease data set. We conclude the paper with some discussion in Section 5. We present the technical proofs in the Appendix and in an online supplementary document (Jiang, Ma and Wang, 2015).

2. Estimating equations and profiling procedure

In this section, we construct estimators for (β, m, w) in Model (1). We first derive a set of estimating equations, through applying both B-spline and kernel methods. We then introduce a profiling procedure to implement the estimation. Finally, we discuss the asymptotic properties of the estimators.

Many estimation procedures have been developed for the single index risk score model. In addition to the methods described in Section 1, for models with uncorrelated responses, Cui, Härdle and Zhu (2011) illustrate an estimating function method based on the kernel approach for the generalized single index risk score model. Ma and Zhu (2013) discuss a doubly robust and efficient estimation procedure for the single index risk score model with high dimensional covariates. Ma and Song (2014) and Lu and Loomis (2013) propose B-spline methods for estimating the unknown regression link functions in single index risk score models. However, these methods are not adequate for parameter estimation in our model. As shown in (1), in addition to an unknown link function m, our functional single index model contains a nonparametric function w(t) which is multivariate and appears inside m. Therefore, we develop a GEE type method for parameter estimation in our model, which allows us to take into account the within patient correlation. In conjunction with the kernel smoothing technique and the B-spline basis expansion, our fused method estimates both the coefficients as functions of time and the unspecified regression function, and simultaneously handles the complexities of repeated measurements and the curse of dimensionality.

More specifically, let B_r(u) = {B_r1(u), . . . , B_{rd_λ}(u)}^T be the set of B-spline basis functions of order r and let λ = (λ₁, . . . , λ_{d_λ})^T be the coefficients of the B-spline approximation. Write m̃(u, λ) = B_r(u)^Tλ. de Boor (2001) has shown the existence of a λ₀ ∈ R^{d_λ} such that m̃(u, λ₀) = B_r(u)^Tλ₀ converges to m₀(u) uniformly on (0, 1) when the number of B-spline inner knots goes to infinity (see Fact 1 in Section S.2 in the supplementary article (Jiang, Ma and Wang, 2015)). A detailed description of the B-spline functions and the properties of their derivatives can be found in de Boor (2001).

The B-spline approximation greatly eases the parameter estimation procedure. Operationally, for a given sample size n, the problem is reduced from estimating the infinite dimensional m to estimating a finite dimensional vector λ. Since the dimension of λ grows with the sample size, estimation consistency can be achieved when the sample size goes to infinity. Letting $θ = {(β^{T}, λ^{T})}^{T} \in R^{d_{θ}}$, the approximated mean function can be written as

H [B_{r} {w {(T_{i k})}^{T} Z_{i k}}^{T} λ + β^{T} X_{i k}] .

We investigate the properties for estimating m₀, w₀, β₀ through investigating the properties of the estimators for λ₀, w₀ and β₀.
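The approximation m̃(u, λ) = B_r(u)^Tλ can be illustrated with a short pure-Python sketch that evaluates the B-spline basis by the Cox-de Boor recursion. The coefficients below are chosen by evaluating m at the Greville abscissae (a Schoenberg-type quasi-interpolant), which is an assumption made purely for illustration and is not the estimator of λ₀ used in this paper; the test function and all helper names are likewise hypothetical.

```python
import math

def bspline_basis(u, knots, order):
    """All B-spline basis functions of the given order (degree + 1) at u,
    computed with the Cox-de Boor recursion."""
    # Order-1 functions are indicators of half-open knot spans.
    b = [1.0 if knots[j] <= u < knots[j + 1] else 0.0
         for j in range(len(knots) - 1)]
    if u == knots[-1]:
        # Close the right end point so that u = knots[-1] is covered.
        last = max(j for j in range(len(knots) - 1) if knots[j] < knots[j + 1])
        b[last] = 1.0
    for k in range(2, order + 1):
        nxt = []
        for j in range(len(knots) - k):
            left_den = knots[j + k - 1] - knots[j]
            right_den = knots[j + k] - knots[j + 1]
            left = (u - knots[j]) / left_den * b[j] if left_den > 0 else 0.0
            right = ((knots[j + k] - u) / right_den * b[j + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        b = nxt
    return b  # length len(knots) - order

def clamped_knots(n_interior, order):
    """Equally spaced interior knots on (0, 1) with repeated boundary knots."""
    interior = [(i + 1) / (n_interior + 1) for i in range(n_interior)]
    return [0.0] * order + interior + [1.0] * order

def greville(knots, order):
    """Greville abscissae: averages of order - 1 consecutive knots."""
    return [sum(knots[j + 1:j + order]) / (order - 1)
            for j in range(len(knots) - order)]

def m_tilde(u, lam, knots, order):
    """Spline approximation m~(u, lambda) = B_r(u)^T lambda."""
    return sum(b * l for b, l in zip(bspline_basis(u, knots, order), lam))

def sup_error(m, n_interior, order=4, grid=200):
    """Largest absolute error of the quasi-interpolant on a grid in [0, 1]."""
    knots = clamped_knots(n_interior, order)
    lam = [m(xi) for xi in greville(knots, order)]  # illustrative coefficients
    return max(abs(m_tilde(i / grid, lam, knots, order) - m(i / grid))
               for i in range(grid + 1))

m = lambda u: math.sin(2 * math.pi * u)   # hypothetical smooth target
coarse, fine = sup_error(m, 5), sup_error(m, 40)
```

Comparing `coarse` and `fine` shows the uniform error shrinking as the number of inner knots grows, which is the qualitative behaviour the existence result for λ₀ describes.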

2.1 Notations

We define some notation to present the estimation procedure. To keep the main text concise, we give the specific forms of the notation in Section A.2 in the Appendix. Generally, for a generic vector valued function a that depends on some additional parameters, we use â to denote the function with the estimated parameter values plugged in. For example, this applies to S_w, S_β, Ŝ_w, Ŝ_β in the following text. The specific forms of S_w, S_β, Ŝ_w, Ŝ_β are given in Section A.2 in the Appendix.

In our profiling procedure, we estimate λ₀ using $\hat{λ}$, considered as a functional of β₀, w₀. Then we estimate w₀ using ŵ, considered as a function of β₀ at different time points. Finally, we estimate β₀ using $\hat{β}$. We further define T_ik, k = 1, . . . , M_i, i = 1, . . . , n, to be the random measurement times, which are independent of X_ik, Z_ik, D_ik; w to be a function of t for t ∈ [0, τ], where τ is a finite constant; and ŵ(β), ŵ(β, t), considered as functions of β, to be the estimators for w and w(t), respectively.

Let Q_β(X_ik) = X_ik, Q_λ{Z_ik; w(t)} = B_r{w(t)^TZ_ik}, and $Q_{w}\{Z_{ik}; λ, w(t)\} = Z_{ik} B_{r}'\{w(t)^{T} Z_{ik}\}^{T} λ$ be the partial derivatives of B_r{w(t)^TZ_ik}^Tλ + β^TX_ik with respect to β, λ, w(t). In the sequel, we will frequently use Q_βik, Q_λik{w(t)}, Q_wik{λ, w(t)} as short forms for Q_β(X_ik), Q_λ{Z_ik; w(t)} and Q_w{Z_ik; λ, w(t)}, respectively.

In general, to simplify the notations, we use subscripts to indicate the observations, i.e. for a generic function a(·), we write a_i(·) ≡ a(O_i; ·), where O_i denotes the ith observed variables. For example we write

H_{i k} {β, λ, w (t)} \equiv H [B_{r} {w {(t)}^{T} Z_{i k}}^{T} λ + β^{T} X_{i k}] .

Further, we indicate the use of the true function instead of its B-spline approximation by replacing the argument λ with m, for example,

H_{i k} {β, m, w (t)} \equiv H [m {w {(t)}^{T} Z_{i k}} + β^{T} X_{i k}] .

We also define Θ(u) = dH(u)/du and

ϴ_{i k} {β, λ, w (t)} = ϴ [B_{r} {w {(t)}^{T} Z_{i k}}^{T} λ + β^{T} X_{i k}],

and

ϴ_{i k} {β, m, w (t)} = ϴ [m {w {(t)}^{T} Z_{i k}} + β^{T} X_{i k}]

throughout the text.

The profiling procedure has three steps. The notation used in each step and the corresponding population forms are detailed in Section A.2 in the Appendix.

2.2 Estimation procedure via profiling

In this section, we define the estimation procedures for m, w₀ and β₀ via estimating equations which are solved through a profiling procedure as we describe below. We first estimate the function m through B-splines, by treating w and β as parameters that are held fixed. This yields a set of estimating equations for the spline coefficients, as functions of w and β. We then estimate the partially linear nonparametric component w(t) of the cognitive score profiles through local kernel smoothing, while treating β as fixed parameters. This further allows us to obtain a second set of estimating equations at each time point that the function w(t) needs to be estimated, as a function of β. Finally, we estimate the parametric component coefficients through solving its own corresponding estimating equation set. The profiling procedure achieves a certain separation by allowing us to treat only one of the three components in each of the three nested steps, hence it eases the computational complexities. Because the B-spline estimator $\hat{λ}$ , kernel estimator ŵ(t), and linear parametric estimator $\hat{β}$ have different convergence rates, such separation also facilitates analysis of the asymptotic properties, compared with a simultaneous estimation procedure.

Step 1

We obtain $\hat{λ} (β_{0}, w_{0})$ by solving

\sum_{i = 1}^{n} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, λ, w_{0} (T_{i})} Ω_{i}^{- 1} [D_{i} - H_{i} {β_{0}, λ, w_{0} (T_{i})}] = 0

with respect to λ, where Ω_i is a working covariance matrix, and Θ_i = diag{Θ_ik, k = 1, . . . , M_i} is an M_i × M_i diagonal matrix. From the first step, we obtain the B-spline coefficients to estimate the function m.
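The structure of the Step 1 estimating function can be sketched in a few lines of Python. The sketch below assumes a logit link and a diagonal working covariance Ω_i (working independence), and it uses the plain basis rows B_r{w(T_ik)^TZ_ik} in place of the quantity Q̃_λi defined in the Appendix; these simplifications, the toy data and the function name `gee_score` are all hypothetical.

```python
import math

def expit(u):
    """Inverse logit link H(u)."""
    return 1.0 / (1.0 + math.exp(-u))

def gee_score(lam, clusters):
    """Sum over subjects of Q_i^T Theta_i Omega_i^{-1} (D_i - H_i),
    assuming a diagonal working covariance Omega_i."""
    score = [0.0] * len(lam)
    for basis_rows, offsets, responses, omega_diag in clusters:
        for q, off, d, om in zip(basis_rows, offsets, responses, omega_diag):
            eta = sum(qj * lj for qj, lj in zip(q, lam)) + off
            h = expit(eta)            # H_ik
            theta = h * (1.0 - h)     # Theta_ik = dH/du for the logit link
            for j, qj in enumerate(q):
                score[j] += qj * theta * (d - h) / om
    return score

# Hypothetical data for two subjects with 2 and 3 visits: basis rows,
# offsets beta^T X_ik, binary responses D_ik, and the diagonal of Omega_i.
clusters = [
    ([[1.0, 0.0], [0.5, 0.5]], [0.1, -0.2], [1, 0], [1.0, 1.0]),
    ([[0.0, 1.0], [0.3, 0.7], [0.6, 0.4]], [0.0, 0.4, -0.1],
     [0, 1, 1], [1.0, 1.0, 1.0]),
]
score_at_zero = gee_score([0.0, 0.0], clusters)
```

A root of `gee_score` in λ, found for example by Newton iterations, would play the role of the Step 1 solution for fixed β and w in this simplified setting.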

Step 2

We obtain ŵ(β) in this step. Let K_h(T_i − t₀) be a d_wM_i × d_wM_i diagonal matrix whose kth diagonal block is diag{K_h(T_ik − t₀)}, where K_h(s) = h⁻¹K(s/h) is a kernel function with bandwidth h.

To obtain ŵ(β₀, t₀), we solve the estimating equation

\begin{matrix} \sum_{i = 1}^{n} {\hat{A}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})} {\hat{V}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}^{- 1} \\ \times K_{h} (T_{i} - t_{0}) {\hat{S}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})} = 0 \end{matrix}

(3)

with respect to w. Recall that ∥w(t₀)∥₁ = 1. In the implementation, we parameterize $w_{d_{w}} = 1 - \sum_{j = 1}^{d_{w} - 1} w_{j}$, and derive the score functions for the vector (w₁, . . . , w_{d_w − 1}). We then solve the estimating equation system which contains the d_w − 1 equations constructed from the score functions and the equation $\sum_{j = 1}^{d_{w}} w_{j} - 1 = 0$. The roots of the estimating equation system automatically satisfy the l₁ constraint. In all our experiments, the resulting ŵ_j(t) are nonnegative automatically, hence we did not explicitly enforce nonnegativity as a constraint. If needed, one can further enforce nonnegativity and perform a constrained optimization.
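Two ingredients of Step 2 are easy to illustrate in isolation: the scaled kernel K_h(s) = h⁻¹K(s/h) and the reparameterization that enforces the l₁ constraint. The sketch below assumes the Epanechnikov kernel, which is one common compactly supported choice and is not stated to be the kernel used in this paper; the function names and numerical values are hypothetical.

```python
def epanechnikov(s):
    """A non-negative, compactly supported, symmetric kernel integrating to one."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0

def kernel_weight(t_obs, t0, h, kernel=epanechnikov):
    """K_h(T - t0) = K((T - t0) / h) / h."""
    return kernel((t_obs - t0) / h) / h

def full_weights(free):
    """Map the d_w - 1 free components to the full vector whose entries sum
    to one, i.e. w_{d_w} = 1 - sum_{j < d_w} w_j."""
    return list(free) + [1.0 - sum(free)]

# Observations far from t0 receive weight zero in the local estimating equation.
times = [0.10, 0.45, 0.52, 0.90]
local = [kernel_weight(t, t0=0.5, h=0.1) for t in times]

# Three-dimensional weight vector built from two free components.
w = full_weights([0.2, 0.3])
```

Only the observations at times 0.45 and 0.52 contribute to the local equation at t₀ = 0.5 in this example, and `w` satisfies the sum-to-one constraint by construction.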

Step 3

We obtain $\hat{β}$ by solving

\begin{matrix} \sum_{i = 1}^{n} {\hat{A}}_{β i} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i})] {\hat{V}}_{β i} {[β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i})]}^{- 1} \\ \times {\hat{S}}_{β i} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i})] = 0 . \end{matrix}

(4)

In the above steps, we approximate ∂ŵ(β, T_i), $\partial \hat{λ} (β, w) ∕ \partial β^{T}$, and $\partial \hat{λ} (β_{0}, w_{0}) ∕ \partial w$ by the leading terms in their expansions. Their explicit forms are shown in (S.27) in the proof of Lemma 6, (S.37) in the proof of Lemma 11, and Notations in Step 2 in the Appendix, respectively.

2.3 Asymptotic properties of the estimators

The profiling estimator described in Section 2.2 is quite complex, caused by the functional nature of w(t), the unspecified forms of both w and m and their nested appearance in the model, the correlation among different observations associated with the same individual and the different numbers of observations for each individual. In addition, the fused kernel/B-spline method requires careful joint consideration of both smoothing techniques. As a consequence, the analysis to obtain the asymptotic properties of the estimator described in Section 2.2 is very challenging and involved. We first list the regularity conditions under which we perform our theoretical analysis.

(A1) The kernel function K(·) is non-negative, has compact support, and satisfies $\int K (s) d s = 1$ , $\int K (s) s d s = 0$ and $\int K (s) s^{2} d s < \infty$ , and $\int K^{2} (s) s d s < \infty$ .

(A2) The bandwidth h in the kernel smoothing satisfies nh² → ∞ and nh⁴ → 0 when n → ∞.

(A3) The density function of w(t)^TZ for each t ∈ [0, τ] is bounded away from 0 on S_w(t) and satisfies the Lipschitz condition of order 1 on S_w(t), where w is in a neighborhood of w₀, S_w(t) = {w(t)^TZ, Z ∈ S}, S is a compact support of Z, and τ < ∞ is a finite constant. Without loss of generality, we assume S_w(t) = [0, 1].

(A4) Assume m₀ ∈ {m ∈ C^q([0, 1]), m is one-to-one, and m(0) = c₀}. Here C^q([0, 1]) is the space of functions with first q continuous derivatives on [0, 1]. The spline order r ≥ q. The cluster size M_i is a fixed finite number that does not diverge with the sample size, i.e. M_i < ∞ for all i.

(A5) Let h_p be the distance between the (p + 1)th and pth interior knots of the order r B-spline functions, and let $h_{b} = \max_{r \leq p \leq N + r} h_{p}$. There exists 0 < c_{h_b} < ∞ such that $\max_{r \leq p \leq N + r} h_{p + 1} = o(N^{-1})$ and $h_{b} / \min_{r \leq p \leq N + r} h_{p} < c_{h_{b}}$, where N is the number of knots, which satisfies N → ∞ as n → ∞, N⁻¹n(log n)⁻¹ → ∞ and $N n^{-1/(2q+1)} \to \infty$. Further assuming q > 3 and N⁻³n.

(A6) The matrices $E(X_{ik}^{\otimes 2})$, $E([X_{ik} - E\{X_{ik} \mid w(t)^{T} Z_{ik}\}]^{\otimes 2})$, $E([Z_{ik} - E\{Z_{ik} \mid w(t)^{T} Z_{ik}\}\, m_{0}'\{w(t)^{T} Z_{ik}\}]^{\otimes 2})$ and $E([X_{ik} Z_{ik}^{T} - E\{X_{ik} Z_{ik}^{T} \mid w(t)^{T} Z_{ik}\}\, m_{0}'\{w(t)^{T} Z_{ik}\}]^{\otimes 2})$ are finite and positive definite for any t ∈ [0, τ].

The requirements nh⁴ → 0 in (A2) and $N n^{-1/(2q+1)} \to \infty$ in (A5) are undersmoothing requirements on the kernel approximation and on the spline approximation, respectively. They are required to ensure that the biases, E(ŵ) − w and $E(B_{r}^{T} \hat{λ}) - m_{0}$, are negligible compared to the other terms left in the final analysis. Undersmoothing conditions of this kind are commonly required in semiparametric models.
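As a quick numerical illustration of the rate conditions, suppose the bandwidth is h = n^{−a} and the number of knots is N = n^{b}; these polynomial forms are an assumption made only for this sketch. Then nh² = n^{1−2a} and nh⁴ = n^{1−4a}, so (A2) holds when 1/4 < a < 1/2, and N n^{−1/(2q+1)} = n^{b−1/(2q+1)} diverges when b > 1/(2q+1). The helper names below are hypothetical, and the remaining requirements in (A5) are not checked here.

```python
def bandwidth_diagnostics(n, a):
    """For h = n^(-a), return (n h^2, n h^4), the two quantities in (A2)."""
    h = n ** (-a)
    return n * h ** 2, n * h ** 4

def knot_diagnostic(n, b, q):
    """For N = n^b, return N n^(-1/(2q+1)), which must diverge under (A5)."""
    return n ** b * n ** (-1.0 / (2 * q + 1))

# With a = 1/3 the first quantity grows and the second shrinks as n increases.
for n in (1e3, 1e4, 1e5, 1e6):
    nh2, nh4 = bandwidth_diagnostics(n, a=1.0 / 3.0)
    print(f"n={n:.0e}  n*h^2={nh2:8.2f}  n*h^4={nh4:.4f}  "
          f"N*n^(-1/(2q+1))={knot_diagnostic(n, b=0.3, q=4):.2f}")
```

The printed table is only a sanity check of the exponents; it does not indicate how a and b should be chosen in a given data set.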

Theorems 1–3 describe the asymptotic properties for the estimators of w₀(t), β₀ and m₀, respectively.

Theorem 1

Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let Â_wi, V̂_wi, Ŝ_wi, and their population forms A_wi, V_wi and S_wi be as defined in Notation in Step 2 in Section A.2 in the Appendix. Let ŵ(β₀, t₀) solve (3) and f_T be the probability density function of T_ik with support [0, τ]. Define

Σ_{w} = {(n h)}^{- 1} {B (t_{0}) f_{T} (t_{0})}^{- 1} E (f_{T} (t_{0}) [A_{w i} {β_{0}, m_{0}, w_{0} (t_{0})} \times V_{w i} {β_{0}, m_{0}, w_{0} (t_{0})}^{- 1}] \int K (s) V_{w i}^{*} {β_{0}, m_{0}, w_{0} (t_{0})} K (s) d s \times {[A_{w i} {β_{0}, m_{0}, w_{0} (t_{0})} V_{w i} {β_{0}, m_{0}, w_{0} (t_{0})}^{- 1}]}^{T}) {B (t_{0}) f_{T} (t_{0})}^{- 1} .

Then

Σ_{w}^{- 1 ∕ 2} {\hat{w} (β_{0}, t_{0}) - w_{0} (t_{0})} \overset{d}{\to} N (0, I),

where B is defined in Notation in Step 3 in Section A.2 in the Appendix.

Theorem 1 establishes the large sample properties of the estimator of the multivariate weight function w₀(t). It shows that our method achieves the usual nonparametric convergence rate of root-nh under the conditions given.

Theorem 2

Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let ${\hat{S}}_{β i k}$ , ${\hat{A}}_{β i k}$ , ${\hat{V}}_{β i k l}$ , and their population forms S_βik, A_βik, V_βikl be as defined in Notation in Step 3 in Section A.2 in Appendix, and ŵ(β), w(β) be as defined in Section 2.1. Let $\hat{β}$ solve (4), then

\sqrt{n} (\hat{β} - β_{0}) = F {(m_{0})}^{- 1} (\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} A_{β i} {β_{0}, m_{0}, w_{0} (T_{i})} V_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} \times S_{β i} {β_{0}, m_{0}, w_{0} (T_{i})} - \frac{1}{\sqrt{n}} \sum_{j = 1}^{n} E (A_{β i} {β_{0}, m_{0}, w_{0} (T_{j})} \times V_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} K (T_{j}) ∣ O_{j}) B {(T_{j})}^{- 1} [A_{w j} {β_{0}, m_{0}, w_{0} (T_{j})} \times V_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}^{- 1} S_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}] - G (m_{0}) V^{- 1} \times \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} [D_{i} - H_{i} {β_{0}, m_{0}, w_{0} (T_{i})}]) {1 + o_{p} (1)},

(5)

where $K(T_{i}) = \mathrm{diag}\{κ(T_{ik}), k = 1, \dots, M_{i}\}$ is a $d_{β} M_{i} \times d_{w} M_{i}$ matrix and κ(T_ik) is

{Q_{β i k} - δ {w_{0} {(T_{i k})}^{T} Z_{i k}} - {(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} \times m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} + γ {w_{0} {(T_{i k})}^{T} Z_{i k}}} Q_{w i k} {m_{0}, w_{0} (T_{i k})}^{T} \times ϴ_{i k} {β_{0}, m_{0}, w_{0} (T_{i k})} .

F (m_{0}) = - E {A_{β i} {β_{0}, m_{0}, w_{0} (T_{i})} V_{β i} {β, m_{0}, w_{0} (T_{i})}^{- 1} \times \frac{\partial S_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}}{\partial β^{T}}},

and

G (m_{0}) = E [A_{β i} {β_{0}, m_{0}, w_{0} (T_{i})} V_{β i} {β_{0}, λ_{0}, w_{0} (T_{i})}^{- 1} C_{i} ϴ_{i}^{*} {β_{0}, m_{0}, w_{0} (T_{i})} Q_{λ i}^{*} {w_{0} (T_{i})}] .

Here C_i is a d_β M_i × d_β M_i matrix with the kth block having the form

{Q_{β i k} - δ {w_{0} {(T_{i k})}^{T} Z_{i k}} - {(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \times \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} + γ {w_{0} {(T_{i k})}^{T} Z_{i k}}} .

Here $ϴ_{i}^{*}\{β_{0}, m_{0}, w_{0}(T_{i})\}$ is a d_β M_i × d_β M_i matrix with the kth block being a d_β × d_β diagonal matrix with the element Θ_ik{β₀, m₀, w₀(T_ik)}. And $Q_{λ i}^{*}\{w_{0}(T_{i})\}$ is a d_β M_i × d_λ matrix with the kth row block being a d_β × d_λ matrix, which consists of d_β replicates of the row vector Q_λik{w₀(T_ik)}^T. B, δ, γ are functions defined in Notation in Step 3 in Section A.2 in the Appendix.

Consequently, we have

\sqrt{n} (\hat{β} - β_{0}) \overset{d}{\to} N (0, Σ),

where

Σ = F {(m_{0})}^{- 1} E [({[A_{β i} {β_{0}, m_{0}, w_{0} (T_{i})} V_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} \times S_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}]}^{\otimes 2}) + {{(E (A_{β i} {β_{0}, m_{0}, w_{0} (T_{j})} \times V_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} K (T_{j}) ∣ O_{j}) B {(T_{j})}^{- 1} [A_{w j} {β_{0}, m_{0}, w_{0} (T_{j})} \times V_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}^{- 1} S_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}]))}^{\otimes 2}} + {(G (m_{0}) V^{- 1} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} {[D_{i} - H_{i} {β_{0}, m_{0}, w_{0} (T_{i})}])}^{\otimes 2}}] \times F {(m_{0})}^{- 1} .

Theorem 2 establishes the usual parametric convergence rate for $\hat{β}$, even though the estimation relies on multiple nonparametric estimates as well. The form of (5) in Theorem 2 indicates that the variance of estimating β₀ is inflated by the estimation of w₀ through ŵ, as given in

\frac{1}{\sqrt{n}} \sum_{j = 1}^{n} E (A_{β i} {β_{0}, m_{0}, w_{0} (T_{j})} V_{β i} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} K (T_{j}) ∣ O_{j}) B {(T_{j})}^{- 1} [A_{w j} {β_{0}, m_{0}, w_{0} (T_{j})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}^{- 1} \times S_{w j} {β_{0}, m_{0}, w_{0} (T_{j})}])

and is also inflated by the estimation of λ₀ through $\hat{λ}$, as given in

G (m_{0}) V^{- 1} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} [D_{i} - H_{i} {β_{0}, m_{0}, w_{0} (T_{i})}] .

See Lemmas 9 and 11 and the proof of Theorem 2 in the supplementary article (Jiang, Ma and Wang, 2015) for a more detailed discussion.

The asymptotic normality of $\hat{β}$ established in Theorem 2 further facilitates inference on β, such as constructing confidence intervals or performing hypothesis testing. In implementing these inference procedures, we replace the variance-covariance matrix Σ with its estimate, where we use empirical sample means over the observed samples to replace the expectations in Theorem 2, and plug in the estimates of the corresponding parameter and function values. This is the procedure adopted in all our numerical implementations.
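Given an estimate of Σ, Wald-type inference for a single component of β follows directly from the limiting normal distribution. The sketch below assumes that the plug-in estimate of the relevant diagonal entry of Σ has already been computed; the function names and the numbers used are hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(beta_hat, sigma_hat_jj, n, level=0.95):
    """Interval beta_hat +/- z * sqrt(Sigma_jj / n), motivated by
    sqrt(n) (beta_hat - beta_0) -> N(0, Sigma)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(sigma_hat_jj / n)
    return beta_hat - half, beta_hat + half

def wald_p_value(beta_hat, sigma_hat_jj, n, beta_null=0.0):
    """Two-sided p-value for H0: beta_j = beta_null."""
    stat = abs(beta_hat - beta_null) / math.sqrt(sigma_hat_jj / n)
    return 2.0 * (1.0 - NormalDist().cdf(stat))

# Hypothetical estimate 0.5 with Sigma_jj = 4 and n = 400 subjects,
# so that the standard error is sqrt(4 / 400) = 0.1.
lower, upper = wald_interval(0.5, 4.0, 400)
p_value = wald_p_value(0.5, 4.0, 400)
```

The same two functions apply componentwise to a vector-valued estimate once the diagonal of the estimated Σ is available.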

Theorem 3

Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let $\hat{m} {u, \hat{λ} (β, w)} = B_{r} {(u)}^{T} \hat{λ} (β, w)$ , $\tilde{m} {u, λ_{0}} = B_{r} {(u)}^{T} λ_{0}$ , where $\hat{λ} (β_{0}, w_{0})$ solves (3) and define

σ^{2} (u, w_{0}) \equiv \frac{1}{n} B_{r} {(u)}^{T} E {([{\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} \times ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}])}^{- 1} E ([{\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} \times ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} Ω_{i}^{*} Ω_{i}^{- 1} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} \times {\tilde{Q}}_{λ i} {w_{0} (T_{i})}]) E {([{\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} \times Ω_{i}^{- 1} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}])}^{- 1} B_{r} (u),

where $Ω_{i}^{*} = E {{(D_{i} - H_{i})}^{\otimes 2} ∣ X_{i}, Z_{i}}$ is the true covariance matrix, and

σ_{w}^{2} \equiv \frac{1}{n} B_{r}^{T} (u) E {{(V^{- 1} E [\sum_{k = 1}^{M_{i}} \sum_{v = 1}^{M_{i}} E {C_{i k v} ϴ_{i k} {β_{0}, m_{0}, w_{0} (T_{i k})} \times ϴ_{i v} {β_{0}, m_{0}, w_{0} (T_{i v})} B_{r} {w_{0} (T_{i v})^{T} Z_{i v}} m_{0}^{'} (w_{0} {(T_{i k})}^{T} Z_{i k}) Z_{i k}^{T} \times ({B (T_{i k}) f_{T} (T_{i k})}^{- 1} [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} K_{h} (T_{j} - T_{i k}) S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}]) ∣ M_{i}, O_{j}} ∣ O_{j}])}^{\otimes 2}} B_{r} (u) .

Here V is as defined in the Notations in Step 1 in Section A.2 in Appendix, and C_ikv is the (k, v)th entry of the matrix $Ω_{i}^{- 1}$ . Then we have

{σ^{2} (u, w_{0}) + σ_{w}^{2}}^{- 1 ∕ 2} (\hat{m} [u, \hat{λ} {\hat{β}, \hat{w} (\hat{β})}] - m_{0} (u)) \overset{d}{\to} N (0, 1) .

Further, because σ²(u, w₀) and $σ_{w}^{2}$ are both of order (nh_b)⁻¹, together with Fact 1 in Section S.2, we have

\begin{matrix} ∣ \hat{m} [u, \hat{λ} {\hat{β}, \hat{w} (\hat{β})}] - m_{0} (u) ∣ = & O_{p} {{(n h_{b})}^{- 1 ∕ 2} + h_{b}^{q}}, \\ ∣ {\hat{m}}^{'} [u, \hat{λ} {\hat{β}, \hat{w} (\hat{β})}] - m_{0}^{'} (u) ∣ = & O_{p} {n^{- 1 ∕ 2} h_{b}^{- 3 ∕ 2} + h_{b}^{q - 1}} \end{matrix}

uniformly for u ∈ (0, 1).

Theorem 3 shows that the estimation error of $\hat{m} [u, \hat{λ} {\hat{β}, \hat{w} (\hat{β})}]$ consists of two components, the approximation error of $\hat{m} [u, \hat{λ} {\hat{β}, \hat{w} (\hat{β})}]$ and the approximation error of m̃(u, λ₀) from their respective true functions. The errors of m̂ and m̂′ go to zero with the rates of O_p{(nh_b)^−1/2} and O_p(n^−1/2h_b^−3/2) respectively. Under Condition (A5), m̂ and m̂′ are both consistent, and they approach the truths with the standard B-spline convergence rate. We provide an outline of the proofs for Theorems 1-3 in the supplementary article (Jiang, Ma and Wang, 2015). The proofs are highly technical and lengthy, and they require several preliminary results which we summarize as lemmas. We present and prove these lemmas in the supplementary article (Jiang, Ma and Wang, 2015).
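As a concrete illustration of the spline estimator m̂(u) = B_r(u)^T λ̂ appearing in Theorem 3, the following minimal sketch evaluates a B-spline basis of order r on (0, 1) by the Cox-de Boor recursion and forms the linear combination with a coefficient vector. The function names, the equally spaced internal knots, and the coefficient values are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
def clamped_knots(n_internal, order):
    """Knot vector on [0, 1]: boundary knots repeated `order` times,
    plus equally spaced internal knots (an illustrative choice)."""
    internal = [(j + 1) / (n_internal + 1) for j in range(n_internal)]
    return [0.0] * order + internal + [1.0] * order


def bspline_basis(u, knots, order):
    """Values of all B-spline basis functions of the given order at u
    (Cox-de Boor recursion); returns len(knots) - order numbers."""
    n_intervals = len(knots) - 1
    # Order 1: indicators of the half-open knot intervals [t_j, t_{j+1}).
    basis = [1.0 if knots[j] <= u < knots[j + 1] else 0.0
             for j in range(n_intervals)]
    if u == knots[-1]:
        # Close the last non-empty interval so the right endpoint is covered.
        for j in reversed(range(n_intervals)):
            if knots[j] < knots[j + 1]:
                basis[j] = 1.0
                break
    for k in range(2, order + 1):
        raised = []
        for j in range(len(knots) - k):
            left_width = knots[j + k - 1] - knots[j]
            right_width = knots[j + k] - knots[j + 1]
            left = ((u - knots[j]) / left_width * basis[j]
                    if left_width > 0 else 0.0)
            right = ((knots[j + k] - u) / right_width * basis[j + 1]
                     if right_width > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis


def m_hat(u, knots, order, lam):
    """Spline value B_r(u)^T lambda for a coefficient vector lam."""
    return sum(b * c for b, c in zip(bspline_basis(u, knots, order), lam))


knots = clamped_knots(n_internal=3, order=3)
lam = [0.0, 0.4, 1.0, 0.7, 0.2, -0.1]  # hypothetical coefficients
print(len(bspline_basis(0.3, knots, 3)), m_hat(0.3, knots, 3, lam))
```

With N internal knots and order r, the basis has N + r functions, which is the dimension of λ in this sketch; the basis values are nonnegative and sum to one at every u in [0, 1].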

3. Numeric evaluation via simulations

We now evaluate the finite sample performance of the proposed estimation procedure on simulated data sets. We simulate 1000 data sets from Model (1) under three settings. In Settings 1 and 2, we consider a binary response and use the logit link function for H, while in Setting 3, we consider a continuous normal response and use an identity H function. In Setting 1, we choose m as a polynomial function of degree two. We generate w initially as positive linear functions of t, and then normalize the vector to sum to one. Note that the normalization modifies the structure of w(t) and results in a nonlinear vector-valued function of t. Additionally, we generate Z_ik from the Poisson distribution and normalize the vectors by the sample standard deviations. Furthermore, we generate T_ik from the exponential distribution and the covariate X_ik from the univariate normal distribution. In Settings 2 and 3, we use the sine function for m, generate w as power functions of t, and then normalize the vector to sum to one. We generate the covariate vector X_i from a three-dimensional multivariate normal distribution. In order to stabilize the computation and control numerical errors, in both settings, we transform the function $w {(T_{i k})}^{T} Z_{i k}$ to $F {w {(T_{i k})}^{T} Z_{i k}} = Φ ([w {(T_{i k})}^{T} Z_{i k} - E {w^{0} {(T_{i k})}^{T} Z_{i k}}] ∕ \sqrt{var {w^{0} {(T_{i k})}^{T} Z_{i k}}})$ , where w⁰ is the initial value of w, and E{w⁰(T_ik)^T Z_ik} and var{w⁰(T_ik)^T Z_ik} are approximated by the sample mean and the sample variance. We then use B-splines to approximate m ○ F⁻¹ instead of m, where ○ denotes composition. All other operations remain the same, and the estimation and inference of the functional single index risk score m{w(T_ik)^TZ_ik}, our main research interest, is carried out as described before. To recover information regarding m, one can use the Delta method to obtain the estimate of m and its variance from those of m ○ F⁻¹.
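The stabilizing transformation F can be sketched as follows, assuming it is built once from the index values computed at the initial weight function w⁰ and then applied to any later index value. The helper name `make_index_transform` and the numeric inputs are hypothetical; only the form Φ{(u − mean)/sd} comes from the text.

```python
from statistics import NormalDist, mean, stdev


def make_index_transform(initial_index_values):
    """Build F(u) = Phi((u - mean) / sd), where mean and sd are the sample
    mean and sample standard deviation of the initial index values."""
    center = mean(initial_index_values)
    scale = stdev(initial_index_values)
    standard_normal = NormalDist()

    def transform(u):
        return standard_normal.cdf((u - center) / scale)

    return transform


# Hypothetical values of w0(T_ik)^T Z_ik, for illustration only.
initial_index = [1.8, 2.4, 2.9, 3.1, 3.6, 4.2]
F = make_index_transform(initial_index)
print([round(F(u), 3) for u in initial_index])
```

Because F is strictly increasing and maps into (0, 1), the spline for m ○ F⁻¹ can be fitted on a fixed interval regardless of the scale of the raw index.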

In all the implementations, we use quadratic B-splines (order three). We select the number of internal knots N = {n^1/5(log n)^2/5}, which satisfies Condition (A5) in Section 2.3. We choose the Gaussian kernel with bandwidth h = n^−2/15h_s, where h_s is Silverman's rule-of-thumb bandwidth (Silverman, 1986). Because h_s = O(n^−1/5), the bandwidth h = O(n^−1/3) satisfies Condition (A2) in Section 2.3.
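The tuning choices above can be sketched in a few lines. Two points are assumptions rather than statements from the text: the braces in the knot formula are read here as rounding up to an integer, and Silverman's rule of thumb is taken in its common form 0.9 min(sd, IQR/1.34) n^−1/5. Which sample size (subjects or total observations) enters the formulas is also not specified here, so the sketch simply uses the length of the supplied sample.

```python
import math
from statistics import quantiles, stdev


def n_internal_knots(n):
    """Number of internal knots n^(1/5) * (log n)^(2/5).
    Rounding up is an assumption about the braces in the text."""
    return math.ceil(n ** 0.2 * math.log(n) ** 0.4)


def silverman_bandwidth(t):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(t)
    q1, _, q3 = quantiles(t, n=4)
    spread = min(stdev(t), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kernel_bandwidth(t):
    """Undersmoothed bandwidth h = n^(-2/15) * h_s used for the kernel step."""
    return len(t) ** (-2 / 15) * silverman_bandwidth(t)


times = [i / 50 for i in range(1, 51)]  # hypothetical observation times
print(n_internal_knots(100), kernel_bandwidth(times))
```

The extra factor n^−2/15 makes h shrink faster than the rule-of-thumb rate, which is what moves the bandwidth from order n^−1/5 to order n^−1/3.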

Table 1 shows the averaged point estimators of β, the empirical standard deviations calculated from the sample variances, the averages of the estimated asymptotic standard deviations (Σ^1/2 in Theorem 2) over the simulated samples, and the mean squared errors (MSE) when the sample sizes are 100, 500 and 800, respectively. The conclusions are similar under the three settings. To sum up, the estimation biases are consistently small across all sample sizes, and the empirical standard deviations and the estimated asymptotic standard deviations decrease as the sample size increases. The MSE decreases as the sample size increases as well, mainly due to the declining variation. Further, the empirical standard deviations of the estimators and the averages of the estimated standard deviations calculated from the asymptotic results are close. In addition, the coverage probabilities of the empirical confidence intervals are close to the nominal level of 95%. This suggests that we can use the asymptotic properties to perform inference and obtain sufficiently reliable results under moderate sample sizes.

Table 1.

Simulation results in Setting 1, 2, 3, based on 1000 data sets.

Setting 1
β ₀	$E (\hat{β})$	$sd (\hat{β})$	$\hat{sd} (\hat{β})$	MSE	CP
n = 100
−0.2	−0.202	0.157	0.113	0.0247	0.957
−0.4	−0.398	0.119	0.115	0.0142	0.940
−0.6	−0.601	0.124	0.118	0.0153	0.957
n = 500
−0.2	−0.198	0.052	0.050	0.0027	0.954
−0.4	−0.398	0.053	0.051	0.0028	0.947
−0.6	−0.601	0.056	0.053	0.0031	0.939
n = 800
−0.2	−0.197	0.041	0.040	0.0017	0.951
−0.4	−0.398	0.041	0.040	0.0017	0.949
−0.6	−0.602	0.044	0.042	0.0019	0.946

Setting 2
	β ₀	$E (\hat{β})$	$sd (\hat{β})$	$\hat{sd} (\hat{β})$	MSE	CP
n = 100
β ₁	−0.5	−0.505	0.131	0.116	0.0171	0.908
β ₂	−0.2	−0.200	0.122	0.112	0.0147	0.923
β ₃	−0.5	−0.515	0.125	0.116	0.0159	0.927
n = 500
β ₁	−0.5	−0.507	0.056	0.053	0.0032	0.946
β ₂	−0.2	−0.198	0.053	0.052	0.0028	0.951
β ₃	−0.5	−0.508	0.054	0.053	0.0031	0.944
n = 800
β ₁	−0.5	−0.504	0.043	0.042	0.0019	0.953
β ₂	−0.2	−0.202	0.041	0.041	0.0017	0.962
β ₃	−0.5	−0.505	0.043	0.042	0.0019	0.951

Setting 3
	β ₀	$E (\hat{β})$	$sd (\hat{β})$	$\hat{sd} (\hat{β})$	MSE	CP
n = 100
β ₁	−0.5	−0.501	0.062	0.052	3.85e-3	0.938
β ₂	−0.2	−0.200	0.060	0.063	3.60e-3	0.932
β ₃	−0.5	−0.503	0.061	0.053	3.73e-3	0.932
n = 500
β ₁	−0.5	−0.500	0.025	0.024	6.25e-4	0.966
β ₂	−0.2	−0.200	0.024	0.024	5.76e-4	0.945
β ₃	−0.5	−0.502	0.025	0.024	6.29e-4	0.963
n = 800
β ₁	−0.5	−0.500	0.020	0.019	4.00e-4	0.949
β ₂	−0.2	−0.200	0.019	0.019	3.61e-4	0.949
β ₃	−0.5	−0.501	0.020	0.019	4.01e-4	0.952


The true parameter β₀, the mean $E (\hat{β})$ , the empirical standard deviation $sd (\hat{β})$ , the average of the estimated standard deviations $\hat{sd} (\hat{β})$ , the mean squared error $MSE = {sd (\hat{β})}^{2} + {E (\hat{β}) - β_{0}}^{2}$ , and the coverage probabilities (CP) of the 95% empirical confidence intervals are reported.
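The MSE decomposition in the table note can be checked directly against the reported entries. The short sketch below recomputes the MSE from the reported mean and empirical standard deviation for two rows of Setting 1; the function name is illustrative.

```python
def mse_from_summary(true_value, mean_estimate, empirical_sd):
    """MSE = empirical variance + squared bias."""
    bias = mean_estimate - true_value
    return empirical_sd ** 2 + bias ** 2


# Setting 1, n = 100, beta_0 = -0.2: reported MSE is 0.0247.
print(round(mse_from_summary(-0.2, -0.202, 0.157), 4))  # 0.0247
# Setting 1, n = 500, beta_0 = -0.2: reported MSE is 0.0027.
print(round(mse_from_summary(-0.2, -0.198, 0.052), 4))  # 0.0027
```

In both rows the squared bias is of order 10⁻⁶, so the MSE is driven almost entirely by the variance term.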

We also examined the performance of ŵ and m̂ to assess the properties of the estimated functional single index risk score. Under the first setting, because the functional single index risk score is fixed with respect to β, we only evaluate the setting with β = −0.4. To evaluate the combined score ŵ(t)^TZ as a function of t, we fix Z at Z* = (1, 2, 3, 4) and plot the averages of the estimated combined score ŵ(t)^TZ* over the 1000 simulations around the true scores w₀(t)^TZ* in the upper panels of Figures 1, 2 and 3 for Settings 1, 2 and 3, respectively. Additionally, we present the 95% pointwise confidence band. The results show that the estimates are close to the true function. Further, the 95% confidence band becomes narrower when the sample size increases, which indicates that the estimation variation decreases with increased sample size. Moreover, we evaluated the coverage probabilities of the empirical pointwise confidence bands of w by computing the coverage probabilities at a set of fixed points across t and taking their average. The average coverage probabilities for n = 100, 500, 800 are 0.934, 0.936, 0.939 in Setting 1, 0.939, 0.940, 0.941 in Setting 2, and 0.931, 0.934, 0.936 in Setting 3, respectively. All are reasonably close to the nominal level of 95%.
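The averaged pointwise coverage described above can be sketched as follows. The sketch assumes Wald-type intervals (estimate ± 1.96 standard errors) at each grid point, which may differ from how the empirical bands were actually formed; the function name and the toy inputs are hypothetical.

```python
def average_pointwise_coverage(estimates, std_errors, truth, z=1.96):
    """estimates[s][g] and std_errors[s][g] are the estimate and its standard
    error at grid point g in simulated data set s; truth[g] is the true value.
    Returns the coverage proportion averaged over the grid points."""
    n_sim = len(estimates)
    per_point = []
    for g, true_value in enumerate(truth):
        hits = sum(
            1 for s in range(n_sim)
            if abs(estimates[s][g] - true_value) <= z * std_errors[s][g]
        )
        per_point.append(hits / n_sim)
    return sum(per_point) / len(per_point)


# Toy inputs: two simulated data sets, two grid points.
toy_estimates = [[0.0, 1.0], [0.5, 1.0]]
toy_std_errors = [[0.1, 0.1], [0.1, 0.1]]
print(average_pointwise_coverage(toy_estimates, toy_std_errors, [0.0, 1.0]))  # 0.75
```

Averaging over grid points summarizes pointwise coverage only; it says nothing about simultaneous coverage of the whole curve.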

Fig 1 — Estimation of w(t)^Tz (upper) and m(u) (bottom) as a function of t and u, respectively, in Setting 1 with sample sizes 100 (left), 500 (middle) and 800 (right). True function (solid line), average of 1000 estimated functions (dashed lines), and 95% pointwise confidence band (dash-dotted lines) are provided.

Fig 2 — Estimation of w(t)^Tz (upper) and m(u) (bottom) as a function of t and u, respectively, in Setting 2 with sample sizes 100 (left), 500 (middle) and 800 (right). True function (solid line), average of 1000 estimated functions (dashed lines), and 95% pointwise confidence band (dash-dotted lines) are provided.

Fig 3 — Estimation of w(t)^Tz (upper) and m(u) (bottom) as a function of t and u, respectively, in Setting 3 with sample sizes 100 (left), 500 (middle) and 800 (right). True function (solid line), average of 1000 estimated functions (dashed lines), and 95% pointwise confidence band (dash-dotted lines) are provided.

To evaluate the performance of m̂, we plot the average of m̂(u) based on the 1000 simulations, as well as the 95% pointwise confidence band, in the bottom panels of Figures 1, 2 and 3 for Settings 1, 2 and 3, respectively. The plots show that the estimators are close to the true functions except on the boundary when the sample size is relatively small. In addition, when the sample size increases, the confidence band becomes narrower, benefiting from the smaller estimation variation. Note that, because of the additional transformation of w(t)^TZ, it is not unexpected that the true m function does not appear to be a periodic sine function of w(t)^TZ. Moreover, we evaluate the coverage probability of the empirical pointwise confidence bands of m. The average coverage probabilities are 0.943, 0.947, 0.948 in Setting 1, 0.957, 0.960, 0.951 in Setting 2, and 0.939, 0.947, 0.946 in Setting 3, respectively. Again, they are all fairly close to the nominal level of 95%.

In summary, Table 1 and Figures 1, 2 and 3 illustrate the desirable finite sample performance of the fused kernel/B-spline combination method in estimating β, m and w. In terms of parameter estimation and function estimation in the non-boundary region, the estimators show very small biases across all sample sizes, and decreasing variability as the sample size increases. The asymptotic variance and the empirical sample variance in estimating β are close. Furthermore, the coverage probabilities of the empirical confidence intervals for β and of the empirical pointwise confidence bands for w and m are close to the nominal levels, which supports using the asymptotic results for the subsequent inferences.

4. Application

We apply the functional single index risk score model and the fused kernel/B-spline semiparametric estimation method to analyze a real data set from a Huntington's disease (HD) study. Current research in HD aims to find reliable prodromes to enable early detection of HD. The joint effect of the cognitive scores on odds of HD diagnosis is shown to change with time. In addition, the relationship between the cognitive symptoms and the log-odds of the disease diagnosis is shown to be nonlinear (Paulsen et al., 2008). Our goal is to study the nonlinear time dependent cognitive effects so as to facilitate the early detection of HD.

Specifically, let D_ik, Z_ik, and X_ik represent the binary disease indicator, the cognitive score vector, and the additional covariate vector for the ith individual at the kth measurement time, respectively. The cognitive scores include the SDMT (Smith, 1982), stroop color, stroop word, and stroop interference tests (Stroop, 1935). They are denoted by Z_i₁, ..., Z_i₄, respectively. The covariates of interest are gender, education, and CAP score (Zhang et al., 2011). They are denoted by X_i₁, ..., X_i₃, respectively. The subject's age at the visiting time serves as the time variable T_ik. We normalize the continuous variables to the interval (0, 1) to alleviate numerical instability. Without changing notation, we transform Z_i₁, ..., Z_i₄, X_i₃, T_ik by the normal distribution functions with means and variances estimated from the sample.

We use logit link function to model the binary outcomes, i.e., we assume

H [m {w {(T_{i k})}^{T} Z_{i k}} + β^{T} X_{i k}] = \frac{\exp [m {w {(T_{i k})}^{T} Z_{i k}} + β^{T} X_{i k}]}{1 + \exp [m {w {(T_{i k})}^{T} Z_{i k}} + β^{T} X_{i k}]} .

(6)
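For concreteness, the logit link in (6) can be evaluated as in the sketch below, which takes a value of the risk score m{w(T_ik)^T Z_ik} and the linear part β^T X_ik as inputs. The function names and the numeric inputs are hypothetical; the branch on the sign of the linear predictor is only a standard device to avoid overflow in the exponential.

```python
import math


def inverse_logit(eta):
    """exp(eta) / {1 + exp(eta)}, computed without overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def disease_probability(m_value, beta, x):
    """H[m + beta^T x] for the logit link in (6)."""
    linear_part = sum(b * xj for b, xj in zip(beta, x))
    return inverse_logit(m_value + linear_part)


# Hypothetical inputs: risk score -1.2, two covariates.
print(disease_probability(-1.2, beta=[0.5, -0.3], x=[1.0, 0.4]))
```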

We obtain the initial estimates and the working correlation matrix using the GEE method with an exchangeable covariance assumption. We choose the exchangeable covariance structure because, in our setting, it facilitates computation while also accounting for the longitudinal correlations. Let the working correlation matrix be R_i and the working covariance matrix be ${\hat{ϴ}}_{i}^{1 ∕ 2} R_{i} {\hat{ϴ}}_{i}^{1 ∕ 2}$ , where ${\hat{ϴ}}_{i}$ is H_i(1 − H_i) with the estimated $\hat{λ}$ , ŵ , $\hat{β}$ plugged in. We implement the profiling procedure described in Section 2.2 in the subsequent estimation. The kernel and B-spline functions are defined in the same way as described in Section 3. We obtain the point estimators $\hat{β} = {(- 0.34, - 0.89, 2.31)}^{T}$ and the asymptotic variances $var (\hat{β}) = {(0.0035, 0.00044, 0.011)}^{T}$ . Consequently, the 95% asymptotic confidence intervals are {(−0.46, −0.23), (−0.93, −0.85), (2.09, 2.52)}, which demonstrate significant effects of gender, education level, and CAP score on the disease risk. Specifically, females (X_i₁ = 0) tend to have higher disease risk than males (X_i₁ = 1). In addition, patients with lower education levels and higher CAP scores are more likely to develop Huntington's disease, which is consistent with the clinical literature (Zhang et al., 2011).
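The reported intervals can be reproduced approximately from the reported point estimates and variances with the usual Wald construction, estimate ± 1.96 × standard error. The sketch below uses only the rounded numbers printed above, so small discrepancies in the second decimal place are to be expected.

```python
import math


def wald_interval(estimate, variance, z=1.96):
    """Two-sided Wald interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


beta_hat = [-0.34, -0.89, 2.31]    # gender, education, CAP score
var_hat = [0.0035, 0.00044, 0.011]
intervals = [wald_interval(b, v) for b, v in zip(beta_hat, var_hat)]
for lower, upper in intervals:
    print(f"({lower:.2f}, {upper:.2f})")
```

None of the three intervals contains zero, which is the basis for calling the three covariate effects significant.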

We also plot ŵ(t) to show the variation patterns of the effects of the four cognitive scores over time. Figure 4 shows that the stroop interference score has a more important effect than all the others after age 30. The 95% pointwise confidence interval remains above the 0.25 level after age 27, and the stroop interference score effect largely dominates all the other effects during that period. This dominating effect indicates that the stroop interference score has the closest relationship with the onset of HD, and in turn could be used to predict HD most effectively among the four. Further, stroop color has a large effect at earlier ages (before 30 or in the early 30s), while the SDMT has a reasonably large effect at later ages (75 or above). Moreover, stroop word has a relatively small predictive effect (< 0.25) on the disease risk across all ages. The plots clearly show the time dependent nature of the cognitive score effects. More specifically, the stroop color effect is decreasing over time, the stroop interference effect is a concave function of time, while the SDMT and stroop word effects are convex functions of time. The last three non-monotone effects reach their extreme values around the ages of 40 to 50. In summary, the results show that stroop interference is more relevant to the disease risk than the other scores. Further, the relative magnitude of the score effects clearly changes over time, which suggests the need to closely monitor specific cognitive scores for different age groups. This illustrates the importance of modeling w as a function of age, and the convenience of using a weighted score w(t)^TZ as a combined cognitive profile in practice.

Fig 4 — Estimation of the weight function w(t)'s and the 95% asymptotic confidence bands in Huntington's disease data. The reference line is 0.25.

The form of the function m̂ is shown in the left panel of Figure 5. We also plot the 95% pointwise asymptotic confidence band of m̂ over the range of the combined scores U. The plot shows that the functional single index risk score is a decreasing function of the index. The upper limit of the confidence band stays below 0, which shows that the functional single index risk score is significantly smaller than 0 at all ages and cognitive score values in this population.

Fig 5 — Function m̂(u) (left) and the estimated disease risk as a function of u (right) in Huntington's disease data.

In the right panel of Figure 5, we plot the disease risk (the estimated probability of D = 1) and the 95% pointwise asymptotic confidence band, where the confidence band is based on the estimated variance, calculated using the Delta method and the estimated variance of m̂. The results show that the disease risk decreases with the combined cognitive score value U. The 95% confidence band does not include the 0.5 line, which shows that the disease risk in the population is smaller than 0.5 across all ages and cognitive score values. Combining the two plots, Figure 5 shows that a higher value of the combined score U = w(t)^TZ, which implies better cognitive functioning, tends to lower the functional single index risk score and in turn the risk of HD. The effect of the functional single index cognitive risk score on HD diagnosis is approximately quadratic for a standardized score U < 0.6, and is approximately constant for U > 0.6. The flattening of the effect reflects a ceiling effect for subjects with better cognitive performance.
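A minimal sketch of the Delta method step for the risk band is given below. It assumes the logit link, treats the linear part β^T X as a fixed offset, and ignores the variability of β̂; these simplifications, the function name, and the numeric inputs are assumptions for illustration and not necessarily the authors' exact computation.

```python
import math


def risk_band(m_estimate, m_std_error, offset=0.0, z=1.96):
    """Estimated risk H(m + offset) with a pointwise band from the Delta method.
    For the logit link, dH/dm = H(1 - H), so se(H) is H(1 - H) * se(m)."""
    eta = m_estimate + offset
    risk = 1.0 / (1.0 + math.exp(-eta))
    risk_std_error = risk * (1.0 - risk) * m_std_error
    return risk, risk - z * risk_std_error, risk + z * risk_std_error


# Hypothetical value of m-hat at one index value, with its standard error.
print(risk_band(-1.5, 0.2))
```

An alternative that keeps the band inside (0, 1) is to form the interval for m first and then apply H to its endpoints; the two constructions agree to first order.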

Next, we perform two sensitivity analyses to justify using a more flexible generalized partially linear functional single index model as shown in (6). We compare Model (6) with two simpler models. The first one assumes the function m is linear, hence

H {X_{i k}, Z_{i k}; θ, w (T_{i k})} = \frac{\exp {α_{c} + α_{1} w {(T_{i k})}^{T} Z_{i k} + β^{T} X_{i k}}}{1 + \exp {α_{c} + α_{1} w {(T_{i k})}^{T} Z_{i k} + β^{T} X_{i k}}},

(7)

where α_c, α₁ are unknown parameters. The second one assumes the weight function w is time-invariant, hence

H (X_{i k}, Z_{i k}; θ, w) = \frac{\exp {m (w^{T} Z_{i k}) + β^{T} X_{i k}}}{1 + \exp {m (w^{T} Z_{i k}) + β^{T} X_{i k}}},

(8)

where w is an unknown parameter vector. We carried out the estimation of w(t) in the first model using the kernel method and the estimation of m in the second model via the B-spline method. We repeated the 5-fold cross-validation analysis 1000 times. We evaluated the models by the mean squared predictive error (i.e., the mean squared difference between D_i and the predicted probability of D_i = 1 on the test set) as a function of the average of the four standardized cognitive scores $\sum_{j = 1}^{4} Z_{j} ∕ 4$ , which we named the standardized score. In Figure 6, we plot the mean squared predictive error curves obtained under the proposed Model (6) and the two simpler models. The results show that our original generalized partially linear model with functional single index outperforms Model (8) uniformly across the range of the standardized scores in terms of a lower mean squared error. We also plot the empirical 95% confidence intervals of the squared predictive errors under the proposed model. Compared with the simpler Model (7), our model gives significantly smaller predictive errors when the standardized score is smaller than 0.36. The medians of the squared predictive errors in this range are 0.040 and 0.049 for Models (6) and (7), respectively. When the standardized score is greater than 0.5, Model (7) performs slightly, but not significantly, better than Model (6). Overall, the total mean squared errors summarized by the area under the predictive error curves for Models (6), (7) and (8) are 0.022, 0.028, and 0.057, respectively, which justifies using the more flexible model in (6) to fit the Huntington's disease data. The results also demonstrate the potential of using our method as an exploratory tool to assess general patterns of data.
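The repeated cross-validation of squared predictive errors can be sketched as follows. The sketch splits individual observations at random, whereas with repeated measures one may prefer to split by subject; the text does not say which was done. The `fit` argument stands for any model-fitting routine that returns a prediction function, and the constant-prevalence fitter and the data are hypothetical stand-ins, not Models (6)-(8).

```python
import random


def kfold_indices(n, n_folds, rng):
    """Randomly partition 0..n-1 into n_folds disjoint test sets."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[f::n_folds] for f in range(n_folds)]


def repeated_cv_squared_errors(outcomes, covariates, fit,
                               n_folds=5, n_repeats=1000, seed=0):
    """Average, over repeats, of the squared difference between each outcome
    and its out-of-fold predicted probability."""
    rng = random.Random(seed)
    n = len(outcomes)
    totals = [0.0] * n
    for _ in range(n_repeats):
        for test_idx in kfold_indices(n, n_folds, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            predict = fit([covariates[i] for i in train_idx],
                          [outcomes[i] for i in train_idx])
            for i in test_idx:
                totals[i] += (outcomes[i] - predict(covariates[i])) ** 2
    return [total / n_repeats for total in totals]


def fit_constant(train_covariates, train_outcomes):
    """Stand-in model: predicts the training prevalence for everyone."""
    prevalence = sum(train_outcomes) / len(train_outcomes)
    return lambda z: prevalence


outcomes = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]                      # hypothetical D
scores = [0.7, 0.2, 0.6, 0.8, 0.3, 0.5, 0.1, 0.9, 0.6, 0.4]    # hypothetical scores
errors = repeated_cv_squared_errors(outcomes, scores, fit_constant, n_repeats=20)
print(sum(errors) / len(errors))
```

The per-observation errors returned here can then be smoothed or binned against the standardized score to obtain a predictive error curve of the kind compared across models.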

Fig 6 — The mean squared predictive errors versus the standardized averaged score $\sum_{j = 1}^{4} Z_{i k} ∕ 4$ in Huntington's disease data. The gray lines are the 95% confidence intervals for the fused kernel/B-spline method.

5. Conclusion and discussions

We have developed a generalized partially linear functional single index risk score model in the longitudinal data framework. We explore the relationship between the cognitive scores and the disease risk so as to predict HD diagnosis early, and in turn to intervene with the disease progression in a timely manner.

We introduce a framework of jointly using the B-spline and kernel methods in semiparametric estimation. We use B-splines to approximate the functional single index risk score function m, and use the kernel smoothing technique to estimate the cognitive weight functions of time w(t). We integrate B-spline basis expansion, kernel smoothing and longitudinal analysis, and have proven the consistency and asymptotic normality of the covariate coefficient estimators, the time dependent weight function estimators, and the single index risk score function estimators. The derivation relies on the assumption that the iteration procedure converges to a parameter vector value that is in a small neighborhood of the truth, which generally requires the estimating equation to have a unique zero. The unique zero property is difficult to guarantee in theory and is less likely to hold when the sample size is small or moderate. To this end, empirical knowledge is usually used to select a suitable root. In our simulations, the multiple-roots issue did not occur and the numerical results show desirable finite sample properties of the estimators. The real data analysis yields results which are interpretable and useful in practice. In summary, the functional single index model provides rich and meaningful information regarding the association between the disease risk and the cognitive score profiles. It is of course also possible to use B-spline methods alone or kernel methods alone to estimate both m and w(t); research along this line can also be interesting.

Our method accommodates both continuous and categorical response variables as long as the link function H is continuously differentiable and has a finite second derivative. One outstanding research question in these models, even when the marginal model is completely parametric (for example, when both m and w are known), is estimation efficiency. As far as we are aware, there is no guarantee that the GEE family contains the efficient estimator, and how to obtain an asymptotically efficient estimator is certainly worth further research.

The proposed generalized partially linear functional single index model can be used to incorporate high dimensional data, since the single index risk score is a natural way to alleviate the curse of dimensionality. For example, the single index score could be a combination of gene expression covariates to facilitate a genetic association study. Furthermore, the generalized partially linear functional single index risk score can be used in an adaptive randomization clinical trial to improve study efficiency. For example, we can use a single index risk score to summarize some disease related biomarkers which provide early information about the primary endpoints in adaptive trials. As a trial progresses, the information can be used to make certain intermediate decisions, such as treatment assignments among the patients, and stopping or continuation of the trial.


APPENDIX A.1: PROOF OF PROPOSITION 1

Assume there exist $m_{1} \in M, w_{1} (t) \in D$ and $β_{1} \in R^{d_{β}}$ , such that

m_{1} {w_{1}^{T} (t) Z} + β_{1}^{T} X = m_{0} {w_{0}^{T} (t) Z} + β_{0}^{T} X,

(9)

where m₀, w₀(t) and β₀ are the true parameter values. Taking derivatives with respect to Z and t on both sides of the equation, we obtain

\begin{matrix} m_{1}^{'} {w_{1}^{T} (t) Z} w_{1} (t) = & m_{0}^{'} {w_{0}^{T} (t) Z} w_{0} (t), \\ m_{1}^{'} {w_{1}^{T} (t) Z} w_{1}^{'} {(t)}^{T} Z = & m_{0}^{'} {w_{0}^{T} (t) Z} w_{0}^{'} {(t)}^{T} Z . \end{matrix}

(10)

Because m₁, m₀ are one-to-one, $m_{1}^{'} {w_{1}^{T} (t) Z} = m_{0}^{'} {w_{0}^{T} (t) Z} = 0$ can hold only for a discrete set of $w_{1}^{T} (t) Z$ and $w_{0}^{T} (t) Z$ values, hence a discrete set of t values. Thus, due to the continuity of $m_{1}^{'}$ , $m_{0}^{'}$ , w₁ and w₀, dividing the second equation in (10) by the jth component of the first gives $w_{1}^{'} {(t)}^{T} Z ∕ w_{1 j} (t) = w_{0}^{'} {(t)}^{T} Z ∕ w_{0 j} (t)$ for all j = 1, ..., d_w, all Z, and all t ∈ [0, τ]. Thus, $w_{1}^{'} {(t)}^{T} E (Z^{\otimes 2}) ∕ w_{1 j} (t) = w_{0}^{'} {(t)}^{T} E (Z^{\otimes 2}) ∕ w_{0 j} (t)$ . Furthermore, because $E (Z^{\otimes 2})$ is positive definite and hence invertible, this leads to $w_{1}^{'} (t) ∕ w_{1 j} (t) = w_{0}^{'} (t) ∕ w_{0 j} (t)$ . In particular, we have $w_{1 j}^{'} (t) ∕ w_{1 j} (t) = w_{0 j}^{'} (t) ∕ w_{0 j} (t)$ for all j = 1, ..., d_w. This gives $w_{1 j} (t) = c_{j} w_{0 j} (t)$ for some constant c_j, or equivalently, w₁(t) = Cw₀(t), where C is a diagonal matrix with the c_j's on the diagonal. Taking the derivative with respect to t, we further have $w_{1}^{'} (t) = C w_{0}^{'} (t)$ . Dividing both sides by $w_{1 j} (t)$ , we have $w_{1}^{'} (t) ∕ w_{1 j} (t) = (C ∕ c_{j}) w_{0}^{'} (t) ∕ w_{0 j} (t)$ . Therefore, C/c_j is the identity matrix. In other words, the c_j, j = 1, ..., d_w, are identical. Since ∥w₁(t)∥₁ = ∥w₀(t)∥₁ = 1 and w₁(t), w₀(t) are positive, this further implies w₁(t) = w₀(t). Therefore, (10) reduces to $m_{1}^{'} {w_{0}^{T} (t) Z} - m_{0}^{'} {w_{0}^{T} (t) Z} = 0$ . This further implies $m_{1} {w_{0}^{T} (t) Z} = m_{0} {w_{0}^{T} (t) Z} + C_{1}$ for a constant C₁. Because m₁(0) = m₀(0) = c₀, we have C₁ = 0, i.e. m₁ = m₀. Equation (9) now leads to $β_{1}^{T} X = β_{0}^{T} X$ . The equality holds for any X, which implies $β_{1}^{T} E (X^{\otimes 2}) = β_{0}^{T} E (X^{\otimes 2})$ . Since $E (X^{\otimes 2})$ is positive definite, and hence invertible, we have β₁ = β₀. Therefore, we have β₁ = β₀, w₁(t) = w₀(t), and m₁ = m₀, hence the problem is identifiable.

APPENDIX A.2: NOTATION IN ESTIMATION STEP

Notation in Step 1

We define an M_i × d_λ matrix

{\tilde{Q}}_{λ i} {w (T_{i})} = [\begin{matrix} B_{r 1} {w {(T_{i 1})}^{T} Z_{i 1}} & \dots & B_{r d_{λ}} {w {(T_{i 1})}^{T} Z_{i 1}} \\ ⋮ & ⋮ & ⋮ \\ B_{r 1} {w {(T_{i M_{i}})}^{T} Z_{i M_{i}}} & \dots & B_{r d_{λ}} {w {(T_{i M_{i}})}^{T} Z_{i M_{i}}} \end{matrix}],

and define ${\tilde{Q}}_{λ i} {w (t_{0})}$ to be the same as ${\tilde{Q}}_{λ i} {w (T_{i})}$ except that we replace T_ik by t₀ for k = 1, ..., M_i. Here and throughout the text, replacing T_i by t₀ means setting T_ik = t₀ for each k = 1, ..., M_i. Let

\begin{matrix} V_{n} = & n^{- 1} \sum_{i = 1}^{n} [{\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} \times ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}], \\ V = & E ([{\tilde{Q}}_{λ i} {w_{0} (T_{i})}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} \times ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} {\tilde{Q}}_{λ i} {w_{0} (T_{i})}]) . \end{matrix}

Notation in Step 2

We define ${\hat{S}}_{w i k} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}$ as

[Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})} + Q_{λ i k} {w (t_{0})}^{T} {\frac{\partial \hat{λ} (β_{0}, w)}{\partial w}}] \times [D_{i k} - H_{i k} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}],

and ${\hat{S}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})} = {[{\hat{S}}_{w i k} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}^{T}, k = 1, \dots, M_{i}]}^{T}$ .

We define a functional from $D$ to $R^{d_{w}}$ , so that this functional evaluated at w_h is $Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})}^{T} w_{h} (t_{0})$ . For notational brevity, we still use $Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})}$ to denote this functional, i.e.

Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})} (w_{h}) \equiv Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})}^{T} w_{h} (t_{0}) .

Let ${\hat{A}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}$ be a d_w × d_wM_i matrix, with the kth size d_w × d_w column block ${\hat{A}}_{w i k} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}$ being

{[Q_{w i k} {\hat{λ} (β_{0}, w), w (t_{0})} + Q_{λ i k} {w (t_{0})}^{T} {\frac{\partial \hat{λ} (β_{0}, w)}{\partial w}}]}^{\otimes 2} \times ϴ_{i k} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})} .

Let ${\hat{V}}_{w i} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}$ be a d_wM_i × d_wM_i matrix with the (p, q)th block ${\hat{V}}_{w i p q} {β_{0}, \hat{λ} (β_{0}, w), w (t_{0})}$ being

[Q_{w i p} {\hat{λ} (β_{0}, w), w (t_{0})} + Q_{λ i p} {w (t_{0})}^{T} {\frac{\partial \hat{λ} (β_{0}, w)}{\partial w}}] \times {[Q_{w i q} {\hat{λ} (β_{0}, w), w (t_{0})} + Q_{λ i q} {w (t_{0})}^{T} {\frac{\partial \hat{λ} (β_{0}, w)}{\partial w}}]}^{T} Ω_{i p q},

where Ω_ipq is the (p, q)th element of the working covariance matrix Ω_i.

We further define the population level quantities S_wik{ β₀, m₀, w₀(t₀)} to be

[Z_{i k} m_{0}^{'} {w_{0} {(t_{0})}^{T} Z_{i k}} - η {w_{0} {(t_{0})}^{T} Z_{i k}}] [D_{i k} - H_{i k} {β_{0}, m_{0}, w_{0} (t_{0})}]

and S_wi{β₀, m₀, w₀(t₀)} = [S_wik{β₀, m₀, w₀(t₀)}^T, k = 1, ..., M_i]^T. Let A_wi{β₀, m₀, w₀(t₀)} be a d_w × d_wM_i matrix, with the kth column block A_wik{β₀, m₀, w₀(t₀)} being a d_w × d_w matrix

{[Z_{i k} m_{0}^{'} {w_{0} {(t_{0})}^{T} Z_{i k}} - η {w_{0} {(t_{0})}^{T} Z_{i k}}]}^{\otimes 2} ϴ_{i k} {β_{0}, m_{0}, w_{0} (t_{0})} .

Let V_wi{β₀, m₀, w₀(t₀)} be a d_wM_i × d_wM_i matrix, with the (p, q)th block V_wipq{β₀, m₀, w₀(t₀)} being

[Z_{i p} m_{0}^{'} {w_{0} {(t_{0})}^{T} Z_{i p}} - η {w_{0} {(t_{0})}^{T} Z_{i p}}] \times {[Z_{i q} m_{0}^{'} {w_{0} {(t_{0})}^{T} Z_{i q}} - η {w_{0} {(t_{0})}^{T} Z_{i q}}]}^{T} Ω_{i p q} .

Let V*_wi{ β₀, m₀, w₀(t₀)} be a d_wM_i × d_wM_i matrix. The (p, q)th block is obtained by replacing Ω_ipq in V_wipq{ β₀, m₀, w₀(t₀)} with

[E (D_{i p} D_{i q}) - H_{i p} {β_{0}, m_{0}, w_{0} (t_{0})} H_{i q} {β_{0}, m_{0}, w_{0} (t_{0})}] .

Here η is an operator that maps functions in C¹([0, τ]) to functionals from $D$ to $R^{d_{w}}$ . Specifically, η minimizes

\sup_{w_{h} \in D} {‖ E ({[{\tilde{Q}}_{w i} {m_{0}, w_{h} (T_{i})} - η {U_{i} (T_{i})} (w_{h})]}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} \times Ω_{i}^{- 1} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} [{\tilde{Q}}_{w i} {m_{0}, w_{h} (T_{i})} - η {U_{i} (T_{i})} (w_{h})]) ‖}_{2}

where

{\tilde{Q}}_{w i} {m_{0}, w_{h} (T_{i})} = {[m_{0}^{'} {w_{0} {(T_{i 1})}^{T} Z_{i 1}} w_{h} {(T_{i 1})}^{T} Z_{i 1}, \dots, m_{0}^{'} {w_{0} {(T_{i M_{i}})}^{T} Z_{i M_{i}}} w_{h} {(T_{i M_{i}})}^{T} Z_{i M_{i}}]}^{T}

and η{U_i(T_i)}(w_h) = [η{w(T_ik)^TZ_ik}(w_h), k = 1, ..., M_i]^T are M_i-dimensional vectors. We can also write

η (w_{0} {(T_{i k})}^{T} Z_{i k}) = E [Z_{i k} m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} ∣ w_{0} {(T_{i k})}^{T} Z_{i k}] .

Further, we define ${\tilde{Q}}_{w j} {\hat{λ} {\hat{β}, \hat{w} (\hat{β})}, \cdot}$ to be an M_i × d_w matrix, with row k being $B_{r}^{'} {\hat{w} {(\hat{β}, T_{i k})}^{T} Z_{i k}} \hat{λ} {\hat{β}, \hat{w} (\hat{β})} Z_{i k}^{T}$ . In the estimation, we use the asymptotic form in Lemma 4 in the supplementary article in place of $\partial \hat{λ} (β_{0}, w) ∕ \partial w$ for computation.

Notation in Step 3

We define

{\hat{S}}_{β i k} [β, \hat{λ} {β, \hat{w} (β, T_{i k})}, \hat{w} (β)] = (Q_{β i k} + {[\frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial β^{T}} + \frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial \hat{w} (β)} \frac{\partial \hat{w} (β)}{\partial β^{T}}]}^{T} Q_{λ i k} {\hat{w} (β, T_{i k})} + {\frac{\partial \hat{w} (β, T_{i k})}{\partial β^{T}}}^{T} Q_{w i k} [\hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i k})]) \times (D_{i k} - H_{i k} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i k})]),

and ${\hat{S}}_{β i} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i})] = {({\hat{S}}_{β i k} [β, \hat{λ} {β, \hat{w} (β, T_{i k})}^{T}, \hat{w} (β)], k = 1, \dots, M_{i})}^{T}$ . Let ${\hat{A}}_{β i} [β, \hat{λ} {β, \hat{w} (β, T_{i})}, \hat{w} (β)]$ be a d_β × d_β M_i matrix with the kth size d_β × d_β column block ${\hat{A}}_{β i k} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i k})]$ being

{(Q_{β i k} + {[\frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial β^{T}} + \frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial \hat{w} (β)} \frac{\partial \hat{w} (β)}{\partial β^{T}}]}^{T} Q_{λ i k} {\hat{w} (β, T_{i k})} + {\frac{\partial \hat{w} (β, T_{i k})}{\partial β^{T}}}^{T} Q_{w i k} [\hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i k})])}^{\otimes 2} ϴ_{i} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i k})] .

Let ${\hat{V}}_{β i} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i})]$ be a d_β M_i × d_β M_i matrix with the (p, q)th block ${\hat{V}}_{β i p q} [β, \hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i p})]$ being

(Q_{β i p} + {[\frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial β^{T}} + \frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial \hat{w} (β)} \frac{\partial \hat{w} (β)}{\partial β^{T}}]}^{T} Q_{λ i p} {\hat{w} (β, T_{i p})} + {\frac{\partial \hat{w} (β, T_{i p})}{\partial β^{T}}}^{T} Q_{w i p} [\hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i p})]) \times {(Q_{β i q} + {[\frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial β^{T}} + \frac{\partial \hat{λ} {β, \hat{w} (β)}}{\partial \hat{w} (β)} \frac{\partial \hat{w} (β)}{\partial β^{T}}]}^{T} Q_{λ i q} {\hat{w} (β, T_{i q})} + {\frac{\partial \hat{w} (β, T_{i q})}{\partial β^{T}}}^{T} Q_{w i q} [\hat{λ} {β, \hat{w} (β)}, \hat{w} (β, T_{i q})])}^{T} Ω_{i p q} .
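The block layout of the two matrices just defined can be made concrete with a small numerical sketch. In the code below, `a[k]` is a hypothetical stand-in for the d_β-vector in parentheses evaluated at observation k, `theta[k]` stands in for the scalar Θ_ik, and `omega[p][q]` stands in for the working covariance entry Ω_ipq; all numbers are made up for illustration and the helper names are not from the paper.

```python
def outer(u, v):
    """Outer product u v^T of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def a_blocks(a, theta):
    """d x (d*M) matrix whose kth d x d column block is a_k a_k^T * theta_k."""
    d = len(a[0])
    rows = [[] for _ in range(d)]
    for a_k, th_k in zip(a, theta):
        block = outer(a_k, a_k)  # a_k with the "otimes 2" operation applied
        for r in range(d):
            rows[r].extend(th_k * x for x in block[r])
    return rows


def v_blocks(a, omega):
    """(d*M) x (d*M) matrix whose (p, q)th d x d block is a_p a_q^T * omega[p][q]."""
    d, M = len(a[0]), len(a)
    V = [[0.0] * (d * M) for _ in range(d * M)]
    for p in range(M):
        for q in range(M):
            block = outer(a[p], a[q])
            for r in range(d):
                for c in range(d):
                    V[p * d + r][q * d + c] = block[r][c] * omega[p][q]
    return V


# Toy inputs: M_i = 2 observations, d_beta = 2 (illustrative values only).
a = [[1.0, 2.0], [0.5, -1.0]]
theta = [0.25, 0.16]
omega = [[1.0, 0.3], [0.3, 1.0]]

A = a_blocks(a, theta)  # 2 x 4
V = v_blocks(a, omega)  # 4 x 4, symmetric because omega is symmetric
```

The sketch only assembles the blocks; it makes no claim about how the matrices are inverted or combined in the estimating equation.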

Additionally, let $δ_{u} \in C^{q} ([0, 1])$ and define $δ {w {(T_{i k})}^{T} Z_{i k}} = [δ_{u} {w {(T_{i k})}^{T} Z_{i k}}, u = 1, \dots, d_{β}] \in R^{d_{β}}$ as the minimizer of

1_{d_{β}}^{T} E ({[{\tilde{Q}}_{β i} - δ {U_{i} (T_{i})}]}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} \times [{\tilde{Q}}_{β i} - δ {U_{i} (T_{i})}]) 1_{d_{β}},

where Q̃_βi = (X_i1, ..., X_iM_i)^T is an M_i × d_β matrix, and δ{U_i(T_i)} = [δ{w(T_ik)^T Z_ik}, k = 1, ..., M_i]^T is an M_i × d_β matrix. We can also write $δ {w_{0} {(T_{i k})}^{T} Z_{i k}}$ as $E {X_{i k} ∣ w_{0} {(T_{i k})}^{T} Z_{i k}}$ . Further, we define

B (t_{0}) = E (A_{w i} {β_{0}, m_{0}, w_{0} (t_{0})} V_{w i} {β_{0}, m_{0}, w_{0} (t_{0})}^{- 1} [Q_{w i} {m_{0}, w_{0} (t_{0})} - η {U_{i} (t_{0})}] ϴ_{i}^{*} {β_{0}, m_{0}, w_{0} (t_{0})} Q_{w i}^{*} {m_{0}, w_{0} (t_{0})}),

where $ϴ_{i}^{*} {β_{0}, m_{0}, w (t_{0})}$ is a d_wM_i × d_wM_i diagonal matrix whose kth diagonal block is a d_w × d_w diagonal matrix with diagonal elements Θ_ik{β₀, m₀, w(t₀)}, and Q_wi{m₀, w(t₀)} is a d_wM_i × d_wM_i diagonal matrix whose kth diagonal block is $diag [Z_{i k} m_{0}^{'} {w {(t_{0})}^{T} Z_{i k}}]$ . Moreover, $Q_{w i}^{*} {m_{0}, w (t_{0})}$ is a d_wM_i × d_w matrix whose kth row block is a d_w × d_w matrix consisting of d_w replications of $Z_{i k}^{T} m_{0}^{'} {w {(t_{0})}^{T} Z_{i k}}$ , and $η {U_{i} (t_{0})} = {[η {w {(t_{0})}^{T} Z_{i 1}}^{T}, \dots, η {w {(t_{0})}^{T} Z_{i M_{i}}}^{T}]}^{T}$ . Also, let B(T_i) be the d_wM_i × d_wM_i block diagonal matrix with the kth block being B(T_ik), and let f_T(T_i) be the d_wM_i × d_wM_i block diagonal matrix with the kth block being f_T(T_ik).

Let γ_u ∈ C^q([0, 1]) and define γ{w(T_ik)^T Z_ik} = [γ_u{w(T_ik)^T Z_ik}, u = 1, ..., d_β] ∈ R^{d_β} as the minimizer of

1_{d_{β}}^{T} E ({[{\tilde{Q}}_{w i} (m_{0}, B {(T_{i})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}}{\partial β^{T}} ∣ O_{i}]) - γ {U_{i} (T_{i})}]}^{T} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} Ω_{i}^{- 1} ϴ_{i} {β_{0}, m_{0}, w_{0} (T_{i})} \times [{\tilde{Q}}_{w i} (m_{0}, B {(T_{i})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}}{\partial β^{T}} ∣ O_{i}]) - γ {U_{i} (T_{i})}]) 1_{d_{β}},

where γ{U_i(T_i)} = [γ{w(T_ik)^T Z_ik}, k = 1, ..., M_i]^T is an M_i × d_β matrix, and

{\tilde{Q}}_{w i} (m_{0}, B {(T_{i})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}^{- 1} \times (\frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i})}}{\partial β^{T}}) ∣ O_{i}]),

is an M_i × d_β matrix with the kth row being

{(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \times \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} .

We can also write

γ (w_{0} {(T_{i k})}^{T} Z_{i k}) = E {{(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} \times m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} ∣ w_{0} {(T_{i k})}^{T} Z_{i k}}
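Both δ and γ are written above as conditional expectations given the value of the single index. As a purely illustrative sketch (not the estimation procedure of this paper), such a conditional mean can be approximated by kernel regression on the index value. The function name `nw_conditional_mean`, the Gaussian kernel, the bandwidth, and the toy data are all assumptions introduced here for illustration.

```python
import math


def nw_conditional_mean(u_obs, x_obs, u0, bandwidth):
    """Nadaraya-Watson estimate of E(X | U = u0) with a Gaussian kernel.

    u_obs: observed index values (stand-ins for w_0(T_ik)^T Z_ik).
    x_obs: observed scalar responses (e.g. one coordinate of a covariate).
    """
    weights = [math.exp(-0.5 * ((u - u0) / bandwidth) ** 2) for u in u_obs]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, x_obs)) / total


# Toy data: index values on an evenly spaced grid, response linear in the index.
u_obs = [i / 10 for i in range(11)]
x_obs = [2 * u + 1 for u in u_obs]

# Close to 2.0 here because the toy design is symmetric about 0.5.
delta_hat = nw_conditional_mean(u_obs, x_obs, 0.5, 0.2)
```

For a vector-valued quantity such as δ, the same smoother would be applied coordinate by coordinate.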

We also define the population form S_βik{β₀, m₀, w₀(T_ik)} as

{Q_{β i k} - δ {w_{0} {(T_{i k})}^{T} Z_{i k}} - {(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} + γ {w_{0} {(T_{i k})}^{T} Z_{i k}}} [D_{i k} - H_{i k} {β_{0}, m_{0}, w_{0} (T_{i k})}]

and S_βi{β₀, m₀, w₀(T_i)} = [S_βik{β₀, m₀, w₀(T_ik)}^T, k = 1, ..., M_i]^T. Let A_βi{β₀, m₀, w₀(T_i)} be a d_β × d_β M_i matrix with the kth block A_βik{β₀, m₀, w₀(T_ik)} being the d_β × d_β matrix

{Q_{β i k} - δ {w_{0} {(T_{i k})}^{T} Z_{i k}} - {(B {(T_{i k})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i k})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i k} m_{0}^{'} {w_{0} {(T_{i k})}^{T} Z_{i k}} + γ {w_{0} {(T_{i k})}^{T} Z_{i k}}}^{\otimes 2} ϴ_{i k} [β_{0}, m_{0}, w_{0} (T_{i k})] .

Let V_βi{β₀, m₀, w₀(T_i)} be a d_β M_i × d_β M_i matrix with the (p, q)th block V_βipq{β₀, m₀, w₀(T_ip)} being

{Q_{β i p} - δ {w_{0} {(T_{i p})}^{T} Z_{i p}} - {(B {(T_{i p})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i p})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i p})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i p})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i p} m_{0}^{'} {w_{0} {(T_{i p})}^{T} Z_{i p}} + γ {w_{0} {(T_{i p})}^{T} Z_{i p}}} \times {Q_{β i q} - δ {w_{0} {(T_{i q})}^{T} Z_{i q}} - {(B {(T_{i q})}^{- 1} E [A_{w j} {β_{0}, m_{0}, w_{0} (T_{i q})} V_{w j} {β_{0}, m_{0}, w_{0} (T_{i q})}^{- 1} \frac{\partial S_{w j} {β_{0}, m_{0}, w_{0} (T_{i q})}}{\partial β^{T}} ∣ O_{i}])}^{T} Z_{i q} m_{0}^{'} {w_{0} {(T_{i q})}^{T} Z_{i q}} + γ {w_{0} {(T_{i q})}^{T} Z_{i q}}}^{T} Ω_{i p q} .

Let $V_{β i}^{*} {β_{0}, m_{0}, w_{0} (T_{i})}$ be a d_β M_i × d_β M_i matrix whose (p, q)th block is obtained by replacing Ω_ipq in the (p, q)th block of V_βi{β₀, m₀, w₀(T_i)} with

[E (D_{i p} D_{i q}) - H_{i p} {β_{0}, m_{0}, w (T_{i p})} H_{i q} {β_{0}, m_{0}, w (T_{i q})}] .
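The replacement term above is the covariance between the outcomes at observations p and q, in contrast to the working entry Ω_ipq. As a hedged illustration, the sketch below computes a sample analogue from toy binary outcomes. It assumes, purely for simplicity, that all subjects share the same mean at each visit, which is not the setting of the paper, where the mean depends on subject-specific covariates. The function name `true_cov_entry` and the data are hypothetical.

```python
def true_cov_entry(d_p, d_q, h_p, h_q):
    """Sample analogue of E(D_p D_q) - H_p H_q across subjects.

    d_p, d_q: outcomes at visits p and q, one entry per subject.
    h_p, h_q: mean outcome at visits p and q (assumed common to all subjects).
    """
    n = len(d_p)
    cross_moment = sum(x * y for x, y in zip(d_p, d_q)) / n
    return cross_moment - h_p * h_q


# Toy binary outcomes for 4 hypothetical subjects at two visits.
d1 = [1, 0, 1, 1]
d2 = [1, 0, 0, 1]
h1 = sum(d1) / len(d1)
h2 = sum(d2) / len(d2)

var_1 = true_cov_entry(d1, d1, h1, h1)   # diagonal entry (p = q)
cov_12 = true_cov_entry(d1, d2, h1, h2)  # off-diagonal entry (p != q)
```

For binary outcomes the diagonal entry reduces to H(1 - H), since D squared equals D; the toy value of `var_1` agrees with `h1 * (1 - h1)`.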

Footnotes

This work was supported by the National Science Foundation (DMS-1206693 and DMS-1000354) and the National Institute of Neurological Disorders and Stroke (NS073671, NS082062). The authors thank the editor, associate editor and three anonymous referees for their comprehensive review which greatly improved the paper.

SUPPLEMENTARY MATERIAL

Supplement: Supplement to “Fused Kernel-Spline Smoothing for Repeatedly Measured Outcomes in a Generalized Partially Linear Model with Functional Single Index” (http://www.e-publications.org/ims/support/dowload). We provide comprehensive proofs of Theorems 1, 2 and 3, together with additional lemmas that support the results.

REFERENCES

Bishop YM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Springer; New York: 2007.
Bosq D. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Lecture Notes in Statistics, Vol. 110. Springer; New York: 1998.
Carroll RJ, Fan J, Gijbels I, Wand MP. Generalized Partially Linear Single-Index Models. Journal of the American Statistical Association. 1997;92:477–489.
Cui X, Härdle WK, Zhu L. The EFM approach for single-index models. The Annals of Statistics. 2011;39:1658–1688.
de Boor C. A Practical Guide to Splines. Applied Mathematical Sciences, Vol. 27. Springer; 2001.
DeVore RA, Lorentz GG. Constructive Approximation. Grundlehren der mathematischen Wissenschaften, Vol. 303. Springer-Verlag; Berlin, New York: 1993.
Jiang F, Ma Y, Wang Y. Supplement to “Fused Kernel-Spline Smoothing for Repeatedly Measured Outcomes in a Generalized Partially Linear Model with Functional Single Index”. 2015. doi: 10.1214/15-AOS1330.
Jiang C-R, Wang J-L. Functional single index models for longitudinal data. The Annals of Statistics. 2011;39:362–388.
Lu M, Loomis D. Spline-based semiparametric estimation of partially linear Poisson regression with single-index models. Journal of Nonparametric Statistics. 2013;25:905–922.
Ma S, Song PX-K. Varying Index Coefficient Models. Journal of the American Statistical Association. 2014.
Ma Y, Zhu L. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2013;75:305–322. doi: 10.1111/j.1467-9868.2012.01040.x.
Paulsen JS, Langbehn DR, Stout JC, Aylward E, Ross CA, Nance M, Guttman M, Johnson S, MacDonald M, Beglinger LJ, Duff K, Kayson E, Biglan K, Shoulson I, Oakes D, Hayden M. Detection of Huntington's disease decades before diagnosis: the Predict-HD study. Journal of Neurology, Neurosurgery & Psychiatry. 2008;79:874–880. doi: 10.1136/jnnp.2007.128728.
Peng H, Huang T. Penalized least squares for single index models. Journal of Statistical Planning and Inference. 2011;141:1362–1379.
Silverman BW. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability, Vol. 26. Chapman and Hall; London, New York: 1986.
Smith A. Symbol digits modalities test: manual. Western Psychological Services; Los Angeles: 1982.
Stroop JR. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935;18:643–662.
Wang L, Yang L. Spline estimation of single-index models. Statistica Sinica. 2009;19:765.
Wang J-L, Xue L, Zhu L, Chong YS. Estimation for a partial-linear single-index model. The Annals of Statistics. 2010;38:246–274.
Xia Y, Härdle W. Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis. 2006;97:1162–1184.
Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64:363–410.
Xu P, Zhu L. Estimation for a marginal generalized single-index longitudinal model. Journal of Multivariate Analysis. 2012;105:285–299.
Zhang Y, Long JD, Mills JA, Warner JH, Lu W, Paulsen JS. Indexing disease progression at study entry with individuals at risk for Huntington disease. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2011;156:751. doi: 10.1002/ajmg.b.31232.

