Abstract
We propose a generalized partially linear functional single index risk score model for repeatedly measured outcomes, where the index itself is a function of time. We fuse the nonparametric kernel method and the regression spline method, and modify the generalized estimating equation to facilitate estimation and inference. We use local kernel smoothing to estimate the unspecified coefficient functions of time, and use B-splines to estimate the unspecified function of the single index component. The covariance structure is taken into account via a working model, which provides valid estimation and inference procedures whether or not it captures the true covariance. The estimation method is applicable to both continuous and discrete outcomes. We derive large sample properties of the estimation procedure and show the different convergence rates of the components of the model. The asymptotic properties when the kernel and regression spline methods are combined in a nested fashion have not been studied prior to this work, even in the independent data case.
Keywords: B-spline, Generalized linear model, Huntington's disease, Infinite dimension, Logistic model, Semiparametric model, Single index model
1. Introduction
As a semiparametric regression model, the single index model is a popular way to accommodate multivariate covariates while retaining model flexibility. For independent outcomes, Carroll et al. (1997) introduced a generalized partially linear single index model, which enriches the family of single index models by allowing an additional linear component. The goal of this paper is to develop a class of generalized partially linear single index models with functional covariate effects and to study estimation and inference for repeatedly measured, dependent outcomes.
In the longitudinal data framework, let i index the ith individual and k the kth measurement, where i = 1, . . . , n and k = 1, . . . , Mi. Here Mi is the total number of observations available for the ith individual. Let Dik be the response variable, and let Zik and Xik be dw- and dβ-dimensional covariate vectors, respectively. We assume that observations from different individuals are independent, while the responses Di1, . . . , DiMi assessed on the same individual at different time points are correlated; we do not attempt to model this correlation. To model the relationship between the conditional mean of the repeatedly measured outcomes Dik at time Tik and the covariates Zik, Xik, we propose a partially linear functional single index model, which models the mean of Dik given Zik, Xik at time Tik in the form
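$$E(D_{ik}\mid Z_{ik}, X_{ik}, T_{ik}) = H\left[m\{w(T_{ik})^{\mathrm T} Z_{ik}\} + \beta^{\mathrm T} X_{ik}\right], \qquad (1)$$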
where H is a known differentiable monotone link function, w(t) ∈ Rdw for any t, and β ∈ Rdβ. Such a model is useful when the time-varying effect of Zik and the functional combined score effect of w(Tik)TZik, adjusted by the covariate vector Xik, are of main interest. Note that both Xik and Zik can contain components that do not vary with k, such as gender, and components that do vary with k, such as age. Here m(0) serves as the intercept term, so Xik does not contain the constant one. In Model (1), Zik includes the covariates of main research interest, whose effects are usually time varying and are modeled nonparametrically, and Xik contains additional covariates of secondary scientific interest, whose effects are modeled only through a simple linear form. Here m is an unspecified smooth single index function. Further, w is a dw-dimensional vector of smooth functions in L2, and w(t) is w evaluated at t, hence a dw-dimensional vector. In addition, w(t) enters the argument of the function m, which yields a nested nonparametric functional form. To ensure identifiability and to reflect the practical application that motivated this work, we further require w(t) > 0 and ∥w(t)∥1 = 1 for all t. Here w(t) > 0 means that every component of w(t) is positive, and ∥·∥1 denotes the vector l1-norm, i.e. the sum of the absolute values of the components of the vector. The choice of the l1-norm incorporates practical knowledge from our real data example described in Section 4 and is not critical; in our subsequent development it can be replaced by other norms, such as the commonly used l2-norm or the sup-norm. We assume the observed data follow the model described above. Throughout the text, we use the subscript 0 to denote the true parameters. Before we proceed, we first establish identifiability.
Proposition 1
Assume m0 ∈ {m ∈ C1([0, 1]) : m is one-to-one and m(0) = c0}. Here C1([0, 1]) is the space of functions with continuous derivatives on [0, 1] and c0 is a finite constant. Assume w0 ∈ {w ∈ C1([0, τ]) : w(t) > 0 and ∥w(t)∥1 = 1 for all t ∈ [0, τ]}. Here C1([0, τ]) is the space of functions with continuous derivatives on [0, τ] and τ is a finite constant. Assume and are both positive definite, where we define for an arbitrary vector a. Then, under these assumptions, the parameter set (β0, m0, w0) in (1) is identifiable.
The proof of Proposition 1 is in Appendix A.1. Model (1) can be viewed as a longitudinal extension of the generalized partially linear single index risk score model introduced in Carroll et al. (1997), i.e.,
$$E(D\mid Z, X) = H\{m(w^{\mathrm T} Z) + \beta^{\mathrm T} X\}, \qquad (2)$$
which is a popular way to increase flexibility when the covariate dimension may be high. Many existing works explore the generalized partially linear single index model in longitudinal settings. Jiang and Wang (2011) consider the single index function in the form m(wTZik, t), which allows a time dependent function m, but w is time invariant; hence the model does not have the nested structure of Model (1) that captures the time dependent effect of Zik. Furthermore, their method does not account for the within-subject correlation. Xu and Zhu (2012) adopted Model (2) as a marginal model in the longitudinal data setting. Their method takes the within-subject correlation into account but, similar to the approach of Jiang and Wang (2011), does not allow w to vary with time, and hence is not sufficient to describe the time varying effect of Zik. We modify the models of Jiang and Wang (2011) and Xu and Zhu (2012) to accommodate the time dependent score effect w(t). In Section 4, we show that a time dependent effect is essential for improving model fit in some practical situations. In addition, we retain the virtue of the models of Jiang and Wang (2011) and Xu and Zhu (2012) by using a semiparametric functional single index model, which overcomes the curse of dimensionality and alleviates the risk of model misspecification (Peng and Huang, 2011).
The estimation and inference for Model (1) are challenging because of the nonparametric forms of m and w and the complications arising from the correlation between repeatedly measured outcomes. The estimation of single index models has been discussed extensively in both the kernel and the spline literature. Carroll et al. (1997) proposed a local kernel smoothing technique to estimate the unknown function m and the finite dimensional parameters w, β in Model (2) through an iterative procedure. Later, Xia and Härdle (2006) applied the kernel-based minimum average variance estimation (MAVE) method, first proposed by Xia et al. (2002) for dimension reduction, to partially linear single index models. When Zik is continuous, MAVE yields consistent estimators of the single index function m without the root-n assumption on w as in Carroll et al. (1997). Nevertheless, when Zik is discrete, the method may fail to produce consistent estimators without prior information about β (Xia et al., 2002; Wang et al., 2010). Moreover, Wang and Yang (2009) showed that MAVE is unreliable for estimating the single index coefficient w when Zik is unbalanced and sparse, i.e., when Zik is measured at different time points for each subject and each subject may have only a few measurements.
To overcome these limitations, we apply the B-spline method to estimate the unknown function m, which is stable when the data set contains discrete or sparse Zik. Although the B-spline method outperforms the kernel method in estimating m, problems arise if it is also used to estimate w(t) in our model setting. If spline approximations with k knots are used for both m and w(t), then we must simultaneously solve (dw + 1)k estimating equations for the spline coefficients, which may cause numerical instability and is computationally expensive when the number of parameters increases with the sample size. To alleviate the computational burden and instability, we estimate w(t) by the kernel method. The procedure solves for w(t) at each time point t separately and in parallel; hence it does not suffer from this numerical instability and is computationally efficient. To handle longitudinal outcomes, we use the idea of the generalized estimating equation (GEE) to combine a set of estimating equations built from the marginal model. It is worth pointing out that the GEE in its original form is only applicable when the index w does not change over time. In summary, we combine kernel and B-spline smoothing with the GEE approach and develop a fused kernel/B-spline procedure for estimation and inference.
The fusion of kernel and B-spline methods poses theoretical challenges, which we address in this work. To the best of our knowledge, this is the first time kernel and spline methods are jointly implemented in a nested function setting. We study convergence properties, such as the asymptotic bias and variance, of each component of the model, show that the parametric component achieves the regular root-n convergence rate, and establish how the convergence rates of the nonparametric functions depend on the number of B-spline basis functions and the B-spline order, as well as on the kernel bandwidth. These results provide guidelines for choosing the number of knots jointly with the spline order and bandwidth in order to optimize performance. They also facilitate inference, such as constructing confidence intervals and performing hypothesis tests. Although theoretical properties of kernel smoothing and spline smoothing are available separately, the properties when the two methods are combined in a nested fashion had not been studied in the literature prior to this work, even for independent data. Because the vector function w appears inside the function m, the asymptotic analyses of the spline and kernel methods are not completely separable. This requires a comprehensive analysis and integration of both methods rather than a mechanical combination of two separate techniques.
The rest of the paper is structured as follows. In Section 2, we define notation and state the model assumptions, introduce the fused kernel/B-spline semiparametric estimating equation, describe the profiling procedure used to obtain the estimators, and study the asymptotic properties of the resulting estimators. In Section 3, we evaluate the estimation procedure on simulated data sets. In Section 4, we apply the model and estimation procedure to the Huntington's disease data set. We conclude the paper with a discussion in Section 5. Technical proofs are presented in the Appendix and in an online supplementary document (Jiang, Ma and Wang, 2015).
2. Estimating equations and profiling procedure
In this section, we construct estimators for (β, m, w) in Model (1). We first derive a set of estimating equations, through applying both B-spline and kernel methods. We then introduce a profiling procedure to implement the estimation. Finally, we discuss the asymptotic properties of the estimators.
Many estimation procedures have been developed for the single index risk score model. In addition to the methods described in Section 1, for models with uncorrelated responses, Cui, Härdle and Zhu (2011) propose an estimating function method based on the kernel approach for the generalized single index risk score model. Ma and Zhu (2013) discuss a doubly robust and efficient estimation procedure for the single index risk score model with high dimensional covariates. Ma and Song (2014) and Lu and Loomis (2013) propose B-spline methods for estimating the unknown regression link functions in single index risk score models. However, these methods are not adequate for parameter estimation in our model. As shown in (1), in addition to an unknown link function m, our functional single index model contains a nonparametric function w(t), which is multivariate and appears inside m. Therefore, we develop a GEE type method for parameter estimation in our model that takes the within-patient correlation into account. Combining kernel smoothing with B-spline basis expansion, our fused method estimates both the coefficients as functions of time and the unspecified regression function, and simultaneously handles the complexities of repeated measurements and the curse of dimensionality.
More specifically, let Br(u) = {Br1(u), . . . , Brdλ(u)}T be the set of B-spline basis functions of order r and let λ = (λ1, . . . , λdλ)T be the coefficients of the B-spline approximation. Denoting m̃(u, λ) = Br(u)Tλ, de Boor (2001) has shown the existence of a λ0 ∈ Rdλ such that m̃(u, λ0) = Br(u)Tλ0 converges to m0(u) uniformly on (0, 1) when the number of B-spline interior knots goes to infinity (see Fact 1 in Section S.2 of the supplementary article (Jiang, Ma and Wang, 2015)). A detailed description of the B-spline functions and the properties of their derivatives can be found in de Boor (2001).
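The approximation property above can be checked numerically. The following Python sketch (assuming NumPy and SciPy; the helper names `bspline_basis` and `sup_error`, the equally spaced interior knots, and the least-squares fit on a grid are illustrative choices, not the paper's construction of λ0) evaluates Br(u) for quadratic splines (order r = 3) and reports the sup-norm error of the spline fit to a smooth m0 as the number of interior knots grows.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(u, n_interior, degree=2):
    """Evaluate the B-spline basis B_r(u) of order r = degree + 1 on [0, 1].

    Interior knots are equally spaced (an illustrative choice) and the boundary
    knots are repeated degree + 1 times (a clamped knot vector).
    Returns an array of shape (len(u), n_interior + degree + 1).
    """
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    n_basis = len(knots) - degree - 1
    # Identity coefficients evaluate every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(np.asarray(u, dtype=float))


def sup_error(m0, n_interior, degree=2, n_grid=2001):
    """Sup-norm error on a grid of the least-squares spline fit B_r(u)^T lambda to m0."""
    u = np.linspace(0.0, 1.0, n_grid)
    basis = bspline_basis(u, n_interior, degree)
    lam, *_ = np.linalg.lstsq(basis, m0(u), rcond=None)
    return np.max(np.abs(basis @ lam - m0(u)))
```

For example, with m0(u) = sin(2πu), the value of `sup_error(m0, k)` shrinks as k increases from 2 to 32, in line with the uniform convergence of Br(u)Tλ0 to m0(u). The basis rows also sum to one (partition of unity), which is why a constant λ reproduces a constant function.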
The B-spline approximation greatly eases the parameter estimation procedure. Operationally, for a given sample size n, the problem is reduced from estimating the infinite dimensional m to estimating a finite dimensional vector λ. Since the dimension of λ grows with the sample size, estimation consistency can be achieved as the sample size goes to infinity. Replacing m with m̃(u, λ) = Br(u)Tλ in Model (1), the approximated mean function can be written as
$$E(D_{ik}\mid Z_{ik}, X_{ik}, T_{ik}) \approx H\left[B_r\{w(T_{ik})^{\mathrm T} Z_{ik}\}^{\mathrm T}\lambda + \beta^{\mathrm T} X_{ik}\right].$$
We study the estimation of m0, w0 and β0 through the properties of the estimators of λ0, w0 and β0.
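To make the nested structure of the approximated mean concrete, here is a minimal sketch of H[Br{w(Tik)TZik}Tλ + βTXik] for stacked observations. It assumes a logit link for H, quadratic B-splines with equally spaced interior knots, and a hypothetical weight function `w_linear_normalized` that is positive with unit l1-norm; `approx_mean` and `clamped_knots` are likewise illustrative names, not the paper's code.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.special import expit


def clamped_knots(n_interior, degree=2):
    """Clamped knot vector on [0, 1] with equally spaced interior knots (illustrative)."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])


def approx_mean(lam, beta, w_fun, T, Z, X, knots, degree=2, link=expit):
    """Approximate mean H[B_r{w(T_ik)^T Z_ik}^T lambda + beta^T X_ik], one row per (i, k).

    lam   : spline coefficients, length len(knots) - degree - 1
    beta  : (d_beta,) linear coefficients
    w_fun : callable t -> (d_w,) weight vector w(t)
    T     : (N,) times; Z : (N, d_w); X : (N, d_beta)
    link  : H; the logistic function is assumed here (binary outcomes)
    """
    W = np.array([w_fun(t) for t in T])           # row k holds w(T_ik)
    index = np.sum(W * Z, axis=1)                 # single index w(T_ik)^T Z_ik
    m_tilde = BSpline(knots, lam, degree)(index)  # B_r(index)^T lambda
    return link(m_tilde + X @ beta)


def w_linear_normalized(t):
    """Hypothetical weight function: positive and linear in t, rescaled to unit l1-norm."""
    raw = np.array([1.0 + t, 2.0, 3.0 - 0.5 * t])
    return raw / raw.sum()
```

Because w(t) sums to one and Z lies in [0, 1] in this toy setup, the index stays inside the spline's support, mirroring the role of (A3).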
2.1 Notations
We define some notation to present the estimation procedure. To keep the main text concise, we give the specific forms of the notation in Section A.2 of the Appendix. Generally, for a generic vector valued function a that depends on some additional parameters, we use â to denote the function with the estimated parameter values plugged in. For example, this applies to Sw, Sβ, Ŝw and Ŝβ in the following text, whose specific forms are given in Section A.2 of the Appendix.
In our profiling procedure, we estimate λ0 using λ̂, considered as a functional of β0 and w0. We then estimate w0 using ŵ, considered as a function of β0 at different time points. Finally, we estimate β0 using β̂. We further define Tik, k = 1, . . . , Mi, i = 1, . . . , n, to be the random measurement times, which are independent of Xik, Zik and Dik; w to be a function of t for t ∈ [0, τ], where τ is a finite constant; and ŵβ and ŵ(β, t), considered as functions of β, to be the estimators of w and w(t), respectively.
Let Qβ(Xik) = Xik, Qλ{Zik; w(t)} = Br{w(t)TZik} and Qw{Zik; λ, w(t)} = B′r{w(t)TZik}Tλ Zik be the partial derivatives of Br{w(t)TZik}Tλ + βTXik with respect to β, λ and w(t), respectively, where B′r denotes the derivative of Br. In the sequel, we frequently use Qβik, Qλik{w(t)} and Qwik{λ, w(t)} as short forms for Qβ(Xik), Qλ{Zik; w(t)} and Qw{Zik; λ, w(t)}, respectively.
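The derivative vectors Qβ, Qλ and Qw can be computed directly, and a finite-difference check confirms the chain-rule form of Qw. The sketch below assumes SciPy B-splines and a hypothetical helper `q_vectors`; it differentiates with respect to the unconstrained w(t), so the l1 constraint imposed in Step 2 is not used here.

```python
import numpy as np
from scipy.interpolate import BSpline


def clamped_knots(n_interior, degree=2):
    """Clamped knot vector on [0, 1] with equally spaced interior knots (illustrative)."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])


def q_vectors(lam, beta, w_t, z, x, knots, degree=2):
    """Partial derivatives of f = B_r(w^T z)^T lam + beta^T x for one observation.

    Returns (Q_beta, Q_lambda, Q_w) with
      Q_beta   = x
      Q_lambda = B_r(w^T z)
      Q_w      = {B_r'(w^T z)^T lam} z   (chain rule through the single index)
    """
    u = float(w_t @ z)
    n_basis = len(knots) - degree - 1
    q_lambda = BSpline(knots, np.eye(n_basis), degree)(u)
    # B_r'(u)^T lam is the derivative of the scalar spline m~(u, lam).
    m_prime = BSpline(knots, lam, degree).derivative()(u)
    q_w = m_prime * z
    return x, q_lambda, q_w
```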
In general, to simplify the notation, we use subscripts to indicate observations, i.e. for a generic function a(·), we write ai(·) ≡ a(Oi; ·), where Oi denotes the observed variables of the ith individual. For example, we write
Further, we indicate the use of the true function instead of its B-spline approximation by replacing the argument λ with m, for example,
We also define Θ(u) = dH(u)/du and
and
throughout the text.
The profiling procedure has three steps. We define the notation used in each step and the corresponding population forms in Section A.2 of the Appendix.
2.2 Estimation procedure via profiling
In this section, we define the estimation procedures for m0, w0 and β0 via estimating equations, which are solved through the profiling procedure described below. We first estimate the function m through B-splines, treating w and β as fixed parameters. This yields a set of estimating equations for the spline coefficients, as functions of w and β. We then estimate the nonparametric component w(t) (the cognitive score weight profiles in our application) through local kernel smoothing, treating β as fixed. This yields a second set of estimating equations, as functions of β, at each time point where w(t) needs to be estimated. Finally, we estimate the parametric coefficients by solving their own set of estimating equations. The profiling procedure achieves a degree of separation by treating only one of the three components in each of the three nested steps, which eases the computational complexity. Because the B-spline estimator λ̂, the kernel estimator ŵ(t) and the linear parametric estimator β̂ have different convergence rates, such separation also facilitates the analysis of the asymptotic properties, compared with a simultaneous estimation procedure.
Step 1
We obtain λ̂ by solving
with respect to λ, where Ωi is a working covariance matrix and Θi = diag{Θik, k = 1, . . . , Mi} is an Mi × Mi diagonal matrix. The first step thus yields the B-spline coefficients that estimate the function m.
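The following sketch illustrates a GEE-type solve for λ with w and β held fixed, under assumptions we state explicitly: a logit link, working independence with Ωi = Θi (so the weighting ΘiΩi−1 reduces to the identity), and observations stacked across individuals. Under these assumptions the estimating equation reduces to ∑i∑k Qλik(Dik − μik) = 0, which the code solves by Fisher scoring. The function name `step1_lambda` is hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.special import expit


def clamped_knots(n_interior, degree=2):
    """Clamped knot vector on [0, 1] with equally spaced interior knots (illustrative)."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])


def step1_lambda(index, offset, D, knots, degree=2, n_iter=50, tol=1e-10, ridge=1e-8):
    """Fisher scoring for sum_ik Q_lambda_ik (D_ik - mu_ik) = 0 with w and beta fixed.

    index  : (N,) stacked single-index values w(T_ik)^T Z_ik
    offset : (N,) stacked linear predictors beta^T X_ik
    Assumes a logit link and working independence (Omega_i = Theta_i).
    """
    Q = BSpline(knots, np.eye(len(knots) - degree - 1), degree)(index)  # rows Q_lambda_ik^T
    lam = np.zeros(Q.shape[1])
    for _ in range(n_iter):
        mu = expit(Q @ lam + offset)
        score = Q.T @ (D - mu)
        # Expected information; the tiny ridge only stabilizes the linear solve
        # and leaves the root of the score equation unchanged.
        info = Q.T @ (Q * (mu * (1.0 - mu))[:, None]) + ridge * np.eye(Q.shape[1])
        step = np.linalg.solve(info, score)
        lam = lam + step
        if np.max(np.abs(step)) < tol:
            break
    return lam
```

A non-identity working correlation would replace the identity weighting by ΘiΩi−1 within each individual's block; the sketch does not cover that case.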
Step 2
We obtain ŵ(β) in this step. Let Kh(Ti − t0) be a dwMi × dwMi diagonal matrix whose kth diagonal block is diag{Kh(Tik − t0)}, where Kh(s) = h−1K(s/h), K is a kernel function and h is the bandwidth.
To obtain ŵ(β0, t0), we solve the estimating equation
| (3) |
with respect to w. Recall that ∥w(t0)∥1 = 1. In the implementation, we parameterize wdw(t0) = 1 − ∑j=1,…,dw−1 wj(t0) and derive the score functions for the vector (w1, . . . , wdw−1). We then solve the estimating equation system that consists of the dw − 1 equations constructed from the score functions and the equation ∑j=1,…,dw wj(t0) = 1. The roots of this system automatically satisfy the l1 constraint. In all our experiments, the resulting ŵj(t) were automatically nonnegative, so we did not explicitly enforce nonnegativity as a constraint. If needed, one can further enforce nonnegativity and perform a constrained optimization.
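Step 2 can be sketched as a kernel-weighted fit at a single time point t0. The sketch assumes a logit link, working independence, a Gaussian kernel (as used in Section 3), a fixed spline m̃ from Step 1 and a fixed offset βTXik. Under these assumptions, solving the kernel-weighted score equations in (w1, . . . , wdw−1), with wdw = 1 − ∑ wj, is equivalent to minimizing a kernel-weighted Bernoulli negative log-likelihood, which the code does with BFGS. The names `step2_w` and `spline_from_function` are hypothetical; the latter only stands in for the Step 1 output.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize
from scipy.special import expit


def spline_from_function(m0, n_interior=4, degree=2, n_grid=201):
    """Least-squares B-spline fit to a known m0 on [0, 1] (stand-in for Step 1)."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    grid = np.linspace(0.0, 1.0, n_grid)
    basis = BSpline(knots, np.eye(len(knots) - degree - 1), degree)(grid)
    lam, *_ = np.linalg.lstsq(basis, m0(grid), rcond=None)
    return BSpline(knots, lam, degree)


def step2_w(t0, T, Z, D, offset, m_spline, h, w_init):
    """Kernel-weighted estimate of w(t0) with ||w(t0)||_1 = 1 via w_d = 1 - sum(others)."""
    s = (T - t0) / h
    K = np.exp(-0.5 * s ** 2) / (np.sqrt(2.0 * np.pi) * h)  # Gaussian K_h(T_ik - t0)
    K = K / K.sum()              # rescaling the weights does not change the root
    dm = m_spline.derivative()   # m'(u) for the chain rule
    dU = Z[:, :-1] - Z[:, [-1]]  # d(w^T Z)/d w_j after substituting w_d

    def full_w(free):
        return np.append(free, 1.0 - np.sum(free))

    def objective(free):
        u = Z @ full_w(free)
        eta = m_spline(u) + offset
        value = np.sum(K * (np.logaddexp(0.0, eta) - D * eta))  # weighted Bernoulli neg. log-lik
        grad = -dU.T @ (K * (D - expit(eta)) * dm(u))           # minus the kernel-weighted score
        return value, grad

    res = minimize(objective, np.asarray(w_init, float)[:-1], jac=True, method="BFGS")
    return full_w(res.x)
```

Because each t0 is handled separately, a grid of time points can be processed independently, which reflects the parallelism described in Section 1.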
Step 3
We obtain β̂ by solving
| (4) |
In the above steps, we approximate ∂ŵ(β, Ti), , and by the leading terms in their expansions. Their explicit forms are given in (S.27) in the proof of Lemma 6, (S.37) in the proof of Lemma 11, and the Notation in Step 2 of the Appendix, respectively.
2.3 Asymptotic properties of the estimators
The profiling estimator described in Section 2.2 is complex because of the functional nature of w(t), the unspecified forms of both w and m and their nested appearance in the model, the correlation among observations of the same individual, and the varying numbers of observations across individuals. In addition, the fused kernel/B-spline method requires careful joint consideration of both smoothing techniques. As a consequence, deriving the asymptotic properties of the estimator described in Section 2.2 is challenging and involved. We first list the regularity conditions under which we perform our theoretical analysis.
(A1) The kernel function K(·) is non-negative, has compact support, and satisfies , and , and .
(A2) The bandwidth h in the kernel smoothing satisfies nh2 → ∞ and nh4 → 0 when n → ∞.
(A3) The density function of w(t)TZ for each t ∈ [0, τ] is bounded away from 0 on Sw(t) and satisfies the Lipschitz condition of order 1 on Sw(t), where w is in a neighborhood of w0, Sw(t) = {w(t)TZ, Z ∈ S}, S is the compact support of Z, and τ < ∞ is a finite constant. Without loss of generality, we assume Sw(t) = [0, 1].
(A4) Assume m0 ∈ {m ∈ Cq([0, 1]), m is one-to-one, and m(0) = c0}. Here Cq([0, 1]) is the space of functions with first q continuous derivatives on [0, 1]. The spline order r ≥ q. The cluster size Mi is a fixed finite number that does not diverge with the sample size, i.e. Mi < ∞ for all i.
(A5) Let hp be the distance between the (p + 1)th and pth interior knots of the order r B-spline functions. And . There exists 0 < chb < ∞ such that and , where N is the number of knots, which satisfies N → ∞ as n → ∞, N−1n(log n)−1 → ∞ and Nn−1/(2q+1) → ∞. Further, assume q > 3 and N−3n.
(A6) The matrices , and are finite and positive definite for any t ∈ [0, τ].

The requirements nh4 → 0 in (A2) and Nn−1/(2q+1) → ∞ in (A5) are undersmoothing requirements on the kernel approximation and on the spline approximation, respectively. They ensure that the biases, E(ŵ) − w and , are negligible compared with the other terms remaining in the final analysis. Undersmoothing conditions of this kind are commonly required in semiparametric models.
Theorems 1–3 describe the asymptotic properties for the estimators of w0(t), β0 and m0, respectively.
Theorem 1
Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let Âwi, V̂wi, Ŝwi and their population forms Awi, Vwi and Swi be as defined in the Notation in Step 2 of Section A.2 in the Appendix. Let ŵ(β0, t0) solve (3), and let fT be the probability density function of Tik with support [0, τ]. Define
Then
where B is defined in the Notation in Step 3 of Section A.2 in the Appendix.
Theorem 1 establishes the large sample properties of the estimator of the multivariate weight function w0(t). It shows that our method achieves the usual nonparametric convergence rate of root-nh under the given conditions.
Theorem 2
Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let , , , and their population forms Sβik, Aβik, Vβikl be as defined in Notation in Step 3 in Section A.2 in Appendix, and ŵ(β), w(β) be as defined in Section 2.1. Let solve (4), then
| (5) |
where matrix and κ(Tik) is
and
Here Ci is a dβMi × dβMi matrix with the kth block of the form
Here is a dβMi × dβMi matrix whose kth block is a dβ × dβ diagonal matrix with diagonal element Θik{β0, m0, w0(Tik)}. And is a dβMi × dλ matrix whose kth row block is a dβ × dλ matrix consisting of dβ replicates of the row vector Qλik{w0(Tik)}T. B, δ and γ are functions defined in the Notation in Step 3 of Section A.2 in the Appendix.
Consequently, we have
where
Theorem 2 establishes the usual parametric convergence rate for β̂, even though the estimation also relies on multiple nonparametric estimates. The form of (5) in Theorem 2 indicates that the variance of estimating β0 is inflated by the estimation of ŵ, as given in
and is also inflated by the estimation of λ̂, as given in
See Lemmas 9 and 11 and the proof of Theorem 2 in the supplementary article (Jiang, Ma and Wang, 2015) for a more detailed discussion.
The asymptotic normality of β̂ established in Theorem 2 further facilitates inference on β, such as constructing confidence intervals or performing hypothesis tests. In implementing these inference procedures, we replace the variance-covariance matrix Σ with its estimate, in which the expectations in Theorem 2 are replaced by empirical means over the observed samples and the estimates of the corresponding parameters and functions are plugged in. This is the procedure adopted in all our numerical implementations.
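As a rough illustration of the plug-in idea, the sketch below computes an empirical sandwich covariance A−1BA−T/n, in which A and B replace expectations with sample means, and forms Wald intervals. It is a generic M-estimation sketch: it does not include the additional terms in Theorem 2 that account for estimating ŵ and λ̂, so it should not be read as the paper's Σ. The helper names `sandwich_cov` and `wald_ci` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def sandwich_cov(scores, jacobians):
    """Empirical sandwich covariance of an M-/GEE-type estimator.

    scores    : (n, p) per-individual estimating-function values at the estimate
    jacobians : (n, p, p) per-individual derivatives of the estimating function
    Returns A^{-1} B A^{-T} / n with A = mean(jacobians) and B = mean(score score^T).
    """
    scores = np.asarray(scores, float)
    n = scores.shape[0]
    A = np.mean(np.asarray(jacobians, float), axis=0)
    B = scores.T @ scores / n
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / n


def wald_ci(estimate, cov, level=0.95):
    """Componentwise Wald intervals estimate -/+ z * sqrt(diag(cov))."""
    z = norm.ppf(0.5 + level / 2.0)
    se = np.sqrt(np.diag(cov))
    return estimate - z * se, estimate + z * se
```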
Theorem 3
Assume Conditions (A1)-(A6) and the identifiability conditions stated in Proposition 1 hold. Let , , where solves (3) and define
where is the true covariance matrix, and
Here V is as defined in the Notations in Step 1 in Section A.2 in Appendix, and Cikv is the (k, v)th entry of the matrix . Then we have
Further because the order of σ2 and are both (nhb)−1, together with Fact 1 in Section S.2, we have
uniformly for u ∈ (0, 1).
Theorem 3 shows that the estimation error of consists of two components, the approximation error of and the approximation error of m̃(u, λ0) from their respective true functions. The errors of m̂ and m̂′ go to zero with the rates of Op{(nhb)−1/2} and Op(n−1/2hb−1/2) respectively. Under Condition (A5), m̂ and m̂′ are both consistent, and they approach the truths with the standard B-spline convergence rate. We provide an outline of the proofs for Theorems 1-3 in the supplementary article (Jiang, Ma and Wang, 2015). The proofs are highly technical and lengthy, and they require several preliminary results which we summarize as lemmas. We present and prove these lemmas in the supplementary article (Jiang, Ma and Wang, 2015).
3. Numeric evaluation via simulations
We now evaluate the finite sample performance of the proposed estimation procedure on simulated data sets. We simulate 1000 data sets from Model (1) under three settings. In Settings 1 and 2, we consider binary responses and use the logit link function for H, while in Setting 3, we consider continuous normal responses and use the identity function for H. In Setting 1, we choose m as a polynomial function of degree two. We generate w initially as positive linear functions of t and then normalize the vector to sum to one. Note that the normalization modifies the structure of w(t) and results in a nonlinear vector-valued function of t. Additionally, we generate Zik from the Poisson distribution and normalize the vectors by the sample standard deviations. Furthermore, we generate Tik from the exponential distribution and the covariate Xik from the univariate normal distribution. In Settings 2 and 3, we use the sine function for m, generate w as power functions of t, and then normalize the vector to sum to one. We generate the covariate vector Xi from a three-dimensional multivariate normal distribution. In order to stabilize the computation and control numerical errors, in both settings we transform the function to , where w0 is the initial value of w, and E{w0(Tik)TZik} and var{w0(Tik)TZik} are approximated by the sample mean and the sample variance. We then use B-splines to approximate m ∘ F−1 instead of m, where ∘ denotes function composition. All other operations remain the same, and the estimation and inference of the functional single index risk score m{w(Tik)TZik}, our main research interest, are carried out as described before. To recover information regarding m, one can use the delta method to obtain the estimate of m and its variance from those of m ∘ F−1.
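The Setting 1 design can be sketched as follows. The intercepts and slopes of w, the Poisson and exponential rates, the quadratic m, the cluster size M and the name `simulate_setting1_like` are hypothetical values and names chosen only for illustration; responses are drawn independently given the covariates because the within-subject correlation mechanism is not specified in this section.

```python
import numpy as np
from scipy.special import expit


def simulate_setting1_like(n, M=4, beta=-0.4, seed=0):
    """Draw a Setting-1-style data set (hypothetical parameter values throughout)."""
    rng = np.random.default_rng(seed)
    d_w = 4
    a = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical intercepts of w before normalizing
    b = np.array([0.5, 1.0, 0.2, 0.1])   # hypothetical slopes of w before normalizing
    T = rng.exponential(scale=1.0, size=(n, M))               # exponential measurement times
    Z = rng.poisson(lam=3.0, size=(n, M, d_w)).astype(float)  # Poisson covariates
    Z /= Z.reshape(-1, d_w).std(axis=0, ddof=1)               # normalize by sample SDs
    X = rng.normal(size=(n, M))                               # univariate normal covariate
    W_raw = a + b * T[..., None]                   # positive linear functions of t
    W = W_raw / W_raw.sum(axis=-1, keepdims=True)  # normalize to sum to one
    index = np.sum(W * Z, axis=-1)                 # w(T_ik)^T Z_ik
    m = lambda u: 0.5 * u - 0.1 * u ** 2           # hypothetical quadratic m with m(0) = 0
    D = rng.binomial(1, expit(m(index) + beta * X))  # logit link
    return {"D": D, "T": T, "Z": Z, "X": X, "W": W, "index": index}
```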
In all implementations, we use quadratic B-splines (order r = 3). We select the number of interior knots N = {n1/5(log n)2/5}, which satisfies Condition (A5) in Section 2.3. We choose the Gaussian kernel with bandwidth h = n−2/15hs, where hs is Silverman's rule-of-thumb bandwidth (Silverman, 1986). Because hs is of order n−1/5, h is of order n−1/3, so that nh2 → ∞ and nh4 → 0; hence the bandwidth selection satisfies Condition (A2) in Section 2.3.
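The tuning rules above can be computed as follows. We assume the braces in N = {n1/5(log n)2/5} denote the integer part, and we use one common variant of Silverman's rule, hs = 0.9 min(sd, IQR/1.34) n−1/5; which variant the paper uses is not stated. The function names are hypothetical.

```python
import numpy as np


def n_interior_knots(n):
    """N = floor(n^{1/5} (log n)^{2/5}); the floor is an assumed reading of the braces."""
    return int(np.floor(n ** 0.2 * np.log(n) ** 0.4))


def silverman_bandwidth(x):
    """One common variant of Silverman's rule: 0.9 * min(sd, IQR / 1.34) * n^{-1/5}."""
    x = np.asarray(x, float)
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * x.size ** (-0.2)


def undersmoothed_bandwidth(x):
    """h = n^{-2/15} h_s, which is of order n^{-1/3}."""
    x = np.asarray(x, float)
    return x.size ** (-2.0 / 15.0) * silverman_bandwidth(x)
```

With h of order n−1/3, nh2 grows like n1/3 and nh4 decays like n−1/3, as (A2) requires. Under the integer-part reading, n = 100, 500 and 800 give 4, 7 and 8 interior knots.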
Table 1 shows the averages of the point estimates of β, the empirical standard deviations computed from the sample variances, the averages of the estimated asymptotic standard deviations (Σ1/2 in Theorem 2) over the simulated samples, and the mean squared errors (MSE) for sample sizes 100, 500 and 800. The conclusions are similar under the three settings. In summary, the estimation biases are consistently small across all sample sizes, and both the empirical standard deviations and the estimated asymptotic standard deviations decrease as the sample size increases. The MSE decreases as the sample size increases as well, mainly because of the declining variability. Further, the empirical standard deviations of the estimators and the averages of the estimated standard deviations calculated from the asymptotic results are close. In addition, the coverage probabilities of the empirical confidence intervals are close to the nominal level of 95%. This suggests that we can use the asymptotic properties to perform inference and obtain sufficiently reliable results under moderate sample sizes.
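The summaries in Table 1 can be computed from replicate estimates as sketched below. The function name is hypothetical, and the MSE is taken as the average squared deviation from the truth, which equals the squared bias plus the variance (with divisor equal to the number of replicates); coverage uses normal-theory intervals built from each replicate's estimated standard error.

```python
import numpy as np
from scipy.stats import norm


def summarize_replicates(estimates, std_errors, truth, level=0.95):
    """Mean, empirical SD, average estimated SE, MSE and coverage over replicates."""
    est = np.asarray(estimates, float)
    se = np.asarray(std_errors, float)
    z = norm.ppf(0.5 + level / 2.0)
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),                          # empirical standard deviation
        "avg_se": se.mean(),                            # average estimated standard deviation
        "mse": np.mean((est - truth) ** 2),             # squared bias + variance
        "cp": np.mean(np.abs(est - truth) <= z * se),   # empirical coverage probability
    }
```

As a consistency check against Table 1, Setting 1 with n = 100 and β0 = −0.2 gives 0.157² + 0.002² ≈ 0.0247, matching the reported MSE.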
Table 1.
Simulation results in Setting 1, 2, 3, based on 1000 data sets.
Setting 1

| n | β0 | E(β̂) | SD | SE | MSE | CP |
|---|---|---|---|---|---|---|
| 100 | −0.2 | −0.202 | 0.157 | 0.113 | 0.0247 | 0.957 |
| 100 | −0.4 | −0.398 | 0.119 | 0.115 | 0.0142 | 0.940 |
| 100 | −0.6 | −0.601 | 0.124 | 0.118 | 0.0153 | 0.957 |
| 500 | −0.2 | −0.198 | 0.052 | 0.050 | 0.0027 | 0.954 |
| 500 | −0.4 | −0.398 | 0.053 | 0.051 | 0.0028 | 0.947 |
| 500 | −0.6 | −0.601 | 0.056 | 0.053 | 0.0031 | 0.939 |
| 800 | −0.2 | −0.197 | 0.041 | 0.040 | 0.0017 | 0.951 |
| 800 | −0.4 | −0.398 | 0.041 | 0.040 | 0.0017 | 0.949 |
| 800 | −0.6 | −0.602 | 0.044 | 0.042 | 0.0019 | 0.946 |

Setting 2

| n | Parameter | β0 | E(β̂) | SD | SE | MSE | CP |
|---|---|---|---|---|---|---|---|
| 100 | β1 | −0.5 | −0.505 | 0.131 | 0.116 | 0.0171 | 0.908 |
| 100 | β2 | −0.2 | −0.200 | 0.122 | 0.112 | 0.0147 | 0.923 |
| 100 | β3 | −0.5 | −0.515 | 0.125 | 0.116 | 0.0159 | 0.927 |
| 500 | β1 | −0.5 | −0.507 | 0.056 | 0.053 | 0.0032 | 0.946 |
| 500 | β2 | −0.2 | −0.198 | 0.053 | 0.052 | 0.0028 | 0.951 |
| 500 | β3 | −0.5 | −0.508 | 0.054 | 0.053 | 0.0031 | 0.944 |
| 800 | β1 | −0.5 | −0.504 | 0.043 | 0.042 | 0.0019 | 0.953 |
| 800 | β2 | −0.2 | −0.202 | 0.041 | 0.041 | 0.0017 | 0.962 |
| 800 | β3 | −0.5 | −0.505 | 0.043 | 0.042 | 0.0019 | 0.951 |

Setting 3

| n | Parameter | β0 | E(β̂) | SD | SE | MSE | CP |
|---|---|---|---|---|---|---|---|
| 100 | β1 | −0.5 | −0.501 | 0.062 | 0.052 | 3.85e-3 | 0.938 |
| 100 | β2 | −0.2 | −0.200 | 0.060 | 0.063 | 3.60e-3 | 0.932 |
| 100 | β3 | −0.5 | −0.503 | 0.061 | 0.053 | 3.73e-3 | 0.932 |
| 500 | β1 | −0.5 | −0.500 | 0.025 | 0.024 | 6.25e-4 | 0.966 |
| 500 | β2 | −0.2 | −0.200 | 0.024 | 0.024 | 5.76e-4 | 0.945 |
| 500 | β3 | −0.5 | −0.502 | 0.025 | 0.024 | 6.29e-4 | 0.963 |
| 800 | β1 | −0.5 | −0.500 | 0.020 | 0.019 | 4.00e-4 | 0.949 |
| 800 | β2 | −0.2 | −0.200 | 0.019 | 0.019 | 3.61e-4 | 0.949 |
| 800 | β3 | −0.5 | −0.501 | 0.020 | 0.019 | 4.01e-4 | 0.952 |

The table reports the true parameter β0, the mean of the estimates E(β̂), the empirical standard deviation (SD), the average of the estimated standard deviations (SE), the mean squared error (MSE), and the coverage probability (CP) of the 95% empirical confidence intervals.
We also examined the performance of ŵ and m̂ to assess the properties of the estimated functional single index risk score. Under the first setting, because the functional single index risk score does not depend on β, we only evaluate the setting with β = −0.4. To evaluate the combined score ŵ(t)TZ as a function of t, we fix Z at Z* = (1, 2, 3, 4) and plot the average of the estimated combined scores ŵ(t)TZ* over the 1000 simulations together with the true scores w0(t)TZ* in the upper panels of Figures 1, 2 and 3 for Settings 1, 2 and 3, respectively. We also present the 95% pointwise confidence bands. The results show that the estimates are close to the true function. Further, the 95% confidence band becomes narrower as the sample size increases, which indicates that the estimation variation decreases with increasing sample size. Moreover, we evaluated the coverage probabilities of the empirical pointwise confidence bands of w by computing the coverage probabilities at a set of fixed points across t and taking their average. The average coverage probabilities for n = 100, 500, 800 are 0.934, 0.936, 0.939 in Setting 1, 0.939, 0.940, 0.941 in Setting 2, and 0.931, 0.934, 0.936 in Setting 3, respectively. All are reasonably close to the nominal level of 95%.
Fig 1.
Estimation of w(t)Tz (upper) and m(u) (bottom) as functions of t and u, respectively, in Setting 1 with sample sizes 100 (left), 500 (middle) and 800 (right). The true function (solid line), the average of 1000 estimated functions (dashed lines), and the 95% pointwise confidence band (dash-dotted lines) are shown.
Fig 2.

Estimation of w(t)Tz (upper) and m(u) (bottom) as functions of t and u, respectively, in Setting 2 with sample sizes 100 (left), 500 (middle) and 800 (right). The true function (solid line), the average of 1000 estimated functions (dashed lines), and the 95% pointwise confidence band (dash-dotted lines) are shown.
Fig 3.

Estimation of w(t)Tz (upper) and m(u) (bottom) as functions of t and u, respectively, in Setting 3 with sample sizes 100 (left), 500 (middle) and 800 (right). The true function (solid line), the average of 1000 estimated functions (dashed lines), and the 95% pointwise confidence band (dash-dotted lines) are shown.
To evaluate the performance of m̂, we plot the average of m̂(u) over the 1000 simulations, together with the 95% pointwise confidence band, in the bottom panels of Figures 1, 2 and 3 for Settings 1, 2 and 3, respectively. The estimators are close to the true functions except near the boundary when the sample size is relatively small. In addition, the confidence band narrows as the sample size increases, reflecting the smaller estimation variability. Note that because of the additional transformation applied to w(t)TZ, it is not surprising that the true m function does not appear as a periodic sine function of w(t)TZ. Moreover, we evaluate the coverage probabilities of the empirical pointwise confidence bands of m. The average coverage probabilities are 0.943, 0.947 and 0.948 in Setting 1; 0.957, 0.960 and 0.951 in Setting 2; and 0.939, 0.947 and 0.946 in Setting 3. Again, they are all fairly close to the nominal level of 95%.
In summary, Table 1 and Figures 1, 2 and 3 illustrate the desirable finite sample performance of the fused kernel/B-spline method in estimating β, m and w. For parameter estimation, and for function estimation away from the boundary, the estimators show very small biases across all sample sizes and decreasing variability as the sample size increases. The asymptotic variance of β̂ is close to its empirical sample variance. Furthermore, the coverage probabilities of the empirical confidence intervals for β and of the empirical pointwise confidence bands for w and m are close to the nominal level, which supports using the asymptotic results for subsequent inference.
4. Application
We apply the functional single index risk score model and the fused kernel/B-spline semiparametric estimation method to a real data set from a Huntington's disease (HD) study. Current research in HD aims to find reliable prodromes that enable early detection of HD. The joint effect of the cognitive scores on the odds of HD diagnosis has been shown to change with time. In addition, the relationship between the cognitive symptoms and the log-odds of the disease diagnosis has been shown to be nonlinear (Paulsen et al., 2008). Our goal is to study the nonlinear, time-dependent cognitive effects in order to facilitate early detection of HD.
Specifically, let Dik, Zik and Xik denote the binary disease indicator, the cognitive score vector and the additional covariate vector for the ith individual at the kth measurement time, respectively. The cognitive scores are the SDMT (Smith, 1982) and the stroop color, stroop word and stroop interference tests (Stroop, 1935), denoted by Zi1, …, Zi4, respectively. The covariates of interest are gender, education and CAP score (Zhang et al., 2011), denoted by Xi1, …, Xi3, respectively. The subject's age at the visit serves as the time variable Tik. To alleviate numerical instability, we map the continuous variables into the interval (0, 1): keeping the same notation, we transform Zi1, …, Zi4, Xi3 and Tik by normal distribution functions whose means and variances are estimated from the sample.
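The normalization step can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation (divisor n − 1) is used and that each continuous variable is transformed separately; the paper does not state which variance estimator it uses, and the ages below are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_cdf_transform(values):
    """Map one continuous variable into (0, 1) through a normal CDF whose
    mean and standard deviation are estimated from the sample.

    Assumption: sample standard deviation (divisor n - 1).
    """
    dist = NormalDist(mu=mean(values), sigma=stdev(values))
    return [dist.cdf(v) for v in values]


# Hypothetical ages; each continuous variable is transformed separately.
ages = [35.0, 42.5, 51.0, 60.2]
print(normal_cdf_transform(ages))
```

Because the normal CDF is strictly increasing, the transformation preserves the ordering of the observations while bounding them away from 0 and 1.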
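We use the logit link function to model the binary outcomes, i.e., we assume
$$\operatorname{logit}\{\Pr(D_{ik}=1 \mid Z_{ik}, X_{ik}, T_{ik})\} = m\{w(T_{ik})^{\mathrm T} Z_{ik}\} + X_{ik}^{\mathrm T}\beta. \tag{6}$$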
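To make the structure of model (6) concrete, the sketch below evaluates the fitted disease probability for one visit, given a single index function m, a time-varying weight function w and a coefficient vector β. It is an illustration under stated assumptions: m and w are passed as plain Python callables standing in for the estimates m̂ and ŵ, and the example functions are hypothetical.

```python
import math


def expit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def disease_probability(m, w, t, z, x, beta):
    """P(D = 1 | Z = z, X = x, T = t) under logit P = m{w(t)^T z} + x^T beta.

    m: callable single index function; w: callable returning the weight
    vector at time t. Both are user-supplied stand-ins for the estimates.
    """
    u = sum(wj * zj for wj, zj in zip(w(t), z))  # functional single index w(t)^T z
    eta = m(u) + sum(xj * bj for xj, bj in zip(x, beta))  # add the linear part
    return expit(eta)


# Hypothetical inputs: a decreasing m and positive weights summing to 1.
print(disease_probability(lambda u: -2.0 * u, lambda t: [t, 1.0 - t],
                          t=0.5, z=[1.0, 1.0], x=[0.0], beta=[0.3]))
```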
We obtain the initial estimates and the working correlation matrix using the GEE method with an exchangeable correlation structure. We choose the exchangeable structure because, in our setting, it facilitates computation while still accounting for the longitudinal correlation. Let the working correlation matrix be Ri and the working covariance matrix be Ωi = diag{Hi(1 − Hi)}^{1/2} Ri diag{Hi(1 − Hi)}^{1/2}, where Hi(1 − Hi) is evaluated elementwise with the estimates β̂, m̂ and ŵ plugged in. We implement the profiling procedure described in Section 2.2 in the subsequent estimation. The kernel and B-spline functions are defined as in Section 3. We obtain the point estimator β̂ and its asymptotic variance. The resulting 95% asymptotic confidence intervals for the coefficients of gender, education and CAP score are (−0.46, −0.23), (−0.93, −0.85) and (2.09, 2.52), respectively, which indicate significant effects of all three covariates on the disease risk. Specifically, females (Xi1 = 0) tend to have a higher disease risk than males (Xi1 = 1). In addition, individuals with lower education levels and higher CAP scores are more likely to develop Huntington's disease, which is consistent with the clinical literature (Zhang et al., 2011).
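The exchangeable working covariance for one subject and the Wald-type intervals reported above can be sketched as follows. This is a minimal illustration, assuming the standard GEE form in which the Bernoulli standard deviations scale an exchangeable correlation matrix; the fitted probabilities and the correlation value ρ used in practice would come from the initial GEE fit, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def exchangeable_working_covariance(mu, rho):
    """Working covariance diag{mu(1-mu)}^{1/2} R diag{mu(1-mu)}^{1/2} for the
    binary responses of one subject, with exchangeable R (1 on the diagonal,
    rho elsewhere). mu: fitted probabilities for the subject's M_i visits."""
    sd = [math.sqrt(p * (1.0 - p)) for p in mu]  # Bernoulli standard deviations
    m = len(mu)
    return [[sd[j] * sd[k] * (1.0 if j == k else rho) for k in range(m)]
            for j in range(m)]


def wald_interval(estimate, variance, level=0.95):
    """Asymptotic (Wald) interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half
```

An exchangeable structure needs only the single parameter ρ regardless of the number of visits Mi, which is what keeps the computation light when subjects have different numbers of observations.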
We also plot ŵ(t) to show how the effects of the four cognitive scores vary over time. Figure 4 shows that the stroop interference score has a larger effect than all the others after age 30. Its 95% pointwise confidence interval remains above the 0.25 level after age 27, and its effect largely dominates all the other effects during that period. This dominance indicates that the stroop interference score has the closest relationship with the onset of HD and, in turn, could be the most effective of the four for predicting HD. Further, stroop color has a large effect at earlier ages (before 30 or in the early 30s), while SDMT has a reasonably large effect at later ages (75 or above). Stroop word has a relatively small predictive effect (< 0.25) on the disease risk across all ages. The plots clearly show the time-dependent nature of the cognitive score effects. More specifically, the stroop color effect decreases over time, the stroop interference effect is a concave function of time, and the SDMT and stroop word effects are convex functions of time. These three non-monotone effects reach their extreme values around ages 40 to 50. In summary, the stroop interference score is more relevant to the disease risk than the other scores. Further, the relative magnitudes of the score effects clearly change over time, which suggests the need to closely monitor specific cognitive scores for different age groups. This illustrates the importance of modeling w as a function of age, and the convenience of using the weighted score w(t)TZ as a combined cognitive profile in practice.
Fig 4.

Estimated weight functions w(t) and their 95% asymptotic confidence bands in the Huntington's disease data. The reference line is at 0.25.
The left panel of Figure 5 shows the estimated function m̂, together with its 95% pointwise asymptotic confidence band over the range of the combined scores U. The functional single index risk score is a decreasing function of the index. The upper confidence bound stays below 0, which shows that the functional single index risk score is significantly smaller than 0 for all ages and cognitive score values in this population.
Fig 5.

Function m̂(u) (left) and the estimated disease risk as a function of u (right) in Huntington's disease data.
In the right panel of Figure 5, we plot the disease risk (the estimated probability of D = 1) and its 95% pointwise asymptotic confidence band, where the band is based on a variance computed by the delta method from the estimated variance of m̂. The disease risk decreases with the combined cognitive score U. The 95% confidence band lies entirely below 0.5, which shows that the disease risk in this population is smaller than 0.5 across all ages and cognitive score values. Taken together, the two panels of Figure 5 show that a higher combined score U = w(t)TZ, which indicates better cognitive functioning, tends to lower the functional single index risk score and, in turn, the risk of HD. The effect of the functional single index cognitive risk score on HD diagnosis is approximately quadratic for standardized scores U < 0.6 and approximately constant for U > 0.6. This flattening reflects a ceiling effect for subjects with better cognitive performance.
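The delta-method band for the disease risk can be sketched as below. This assumes the risk at a given u is the inverse logit of a linear predictor η̂(u) (for example m̂(u) plus a fixed covariate contribution) whose estimated variance is available; the function name is hypothetical, and the text above specifies only that the delta method is applied to the estimated variance of m̂.

```python
import math
from statistics import NormalDist


def risk_band_delta(eta_hat, var_eta, level=0.95):
    """Pointwise delta-method interval for p = expit(eta).

    Uses dp/deta = p(1 - p), so Var(p_hat) is approximately
    {p(1 - p)}^2 Var(eta_hat). Returns (p_hat, lower, upper).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = 1.0 / (1.0 + math.exp(-eta_hat))
    se_p = p * (1.0 - p) * math.sqrt(var_eta)
    return p, p - z * se_p, p + z * se_p
```

Because this interval is symmetric on the probability scale, it can extend outside (0, 1) when p̂ is close to 0 or 1; transforming the endpoints of the interval for η with the inverse logit is a common alternative that avoids this.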
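Next, we perform two sensitivity analyses to justify using the more flexible generalized partially linear functional single index model (6). We compare Model (6) with two simpler models. The first assumes that the function m is linear, hence
$$\operatorname{logit}\{\Pr(D_{ik}=1 \mid Z_{ik}, X_{ik}, T_{ik})\} = \alpha_c + \alpha_1 w(T_{ik})^{\mathrm T} Z_{ik} + X_{ik}^{\mathrm T}\beta, \tag{7}$$
where αc and α1 are unknown parameters. The second assumes that the weight function w is time-invariant, hence
$$\operatorname{logit}\{\Pr(D_{ik}=1 \mid Z_{ik}, X_{ik}, T_{ik})\} = m\{w^{\mathrm T} Z_{ik}\} + X_{ik}^{\mathrm T}\beta, \tag{8}$$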
where w is an unknown parameter vector. We estimated w(t) in the first model using the kernel method, and m in the second model using the B-spline method. We repeated 5-fold cross-validation 1000 times and evaluated the models by the mean squared predictive error (i.e., the mean squared difference between Di and the predicted probability of Di = 1 on the test set) as a function of the average of the four standardized cognitive scores, which we call the standardized score. Figure 6 plots the mean squared predictive error curves under the proposed Model (6) and the two simpler models. Our generalized partially linear model with a functional single index outperforms Model (8) uniformly across the range of the standardized scores, with a lower mean squared error. We also plot the empirical 95% confidence intervals of the squared predictive errors under the proposed model. Compared with the simpler Model (7), our model gives significantly smaller predictive errors when the standardized score is smaller than 0.36; the medians of the squared predictive errors in this range are 0.040 and 0.049 for Models (6) and (7), respectively. When the standardized score is greater than 0.5, Model (7) performs slightly, but not significantly, better than Model (6). Overall, the total mean squared errors, summarized by the areas under the predictive error curves, are 0.022, 0.028 and 0.057 for Models (6), (7) and (8), respectively, which justifies using the more flexible Model (6) for the Huntington's disease data. The results also demonstrate the potential of our method as an exploratory tool for assessing general patterns in data.
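The repeated cross-validation and the area summary described above can be sketched as follows. This is a minimal illustration under stated assumptions: folds are formed at the subject level so that all visits of a subject stay in the same fold, the model is supplied through hypothetical `fit`/`predict` callables, and the area under an error curve is computed by the trapezoidal rule. Turning the squared errors into a curve over the standardized score (by binning or smoothing) is left to the caller.

```python
import random


def repeated_kfold_squared_errors(subjects, fit, predict,
                                  n_folds=5, n_repeats=1000, seed=0):
    """Squared predictive errors from repeated subject-level K-fold CV.

    subjects: list of per-subject records, each a list of (features, outcome).
    fit(train_records) -> model; predict(model, features) -> P(D = 1).
    Returns a list of (features, squared error) pairs for all test visits.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_repeats):
        order = list(range(len(subjects)))
        rng.shuffle(order)
        for f in range(n_folds):
            test_ids = set(order[f::n_folds])  # every subject tested once per repeat
            train = [rec for i in order if i not in test_ids for rec in subjects[i]]
            model = fit(train)
            for i in sorted(test_ids):
                for features, outcome in subjects[i]:
                    out.append((features, (outcome - predict(model, features)) ** 2))
    return out


def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve through (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))
```

Splitting by subject rather than by visit prevents correlated visits of the same individual from appearing in both the training and test sets, which would otherwise make the predictive errors optimistic.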
Fig 6.

The mean squared predictive errors versus the standardized average score in the Huntington's disease data. The gray lines are the 95% confidence intervals for the fused kernel/B-spline method.
5. Conclusion and discussions
We have developed a generalized partially linear functional single index risk score model for longitudinal data. We explored the relationship between the cognitive scores and the disease risk in order to predict HD diagnosis early and, in turn, to intervene in the disease progression in a timely manner.
We introduce a framework that jointly uses B-spline and kernel methods in semiparametric estimation. We use B-splines to approximate the functional single index risk score function m, and kernel smoothing to estimate the cognitive weight functions of time w(t). Integrating B-spline basis expansion, kernel smoothing and longitudinal analysis, we have proven the consistency and asymptotic normality of the covariate coefficient estimators, the time-dependent weight function estimators and the single index risk score function estimators. The derivation relies on the assumption that the iterative procedure converges to a parameter value in a small neighborhood of the truth, which generally requires the estimating equation to have a unique zero. This uniqueness is difficult to guarantee in theory and is less likely to hold when the sample size is small or moderate. In practice, empirical knowledge is usually used to select a suitable root. In our simulations, multiple roots did not occur, and the numerical results show desirable finite sample properties of the estimators. The real data analysis yields results that are interpretable and useful in practice. In summary, the functional single index model provides rich and meaningful information on the association between the disease risk and the cognitive score profiles. It is of course also possible to use B-spline or kernel methods to estimate both m and w(t); research along this line would also be of interest.
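The B-spline basis used to approximate m can be evaluated with the Cox-de Boor recursion (de Boor, 2001). The sketch below is illustrative only: the cubic degree and the single interior knot at 0.5 in the usage are assumptions, not the choices made in Section 3, and x is assumed to lie within the boundary knots.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x using
    the Cox-de Boor recursion.

    knots: non-decreasing list with boundary knots repeated degree + 1 times.
    Returns len(knots) - degree - 1 basis values.
    """
    # Degree-0 indicators on [t_i, t_{i+1}); the last non-empty interval is
    # also closed on the right so that x equal to the right boundary is covered.
    last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
    b = [1.0 if (knots[i] <= x < knots[i + 1]
                 or (i == last and x == knots[i + 1])) else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - 1 - d):
            left = right = 0.0
            if knots[i + d] > knots[i]:  # skip terms with a zero denominator
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nb.append(left + right)
        b = nb
    return b


# Hypothetical cubic basis on [0, 1] with one interior knot at 0.5.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
print(bspline_basis(0.3, knots, 3))
```

The basis functions are nonnegative and sum to one on the boundary interval, so an approximation of m is a weighted combination of these values with spline coefficients estimated from the data.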
Our method accommodates both continuous and categorical response variables as long as the link function H is continuously differentiable and has a finite second derivative. One outstanding research question for these models, even when the marginal model is completely parametric (for example, when both m and w are known), is estimation efficiency. As far as we are aware, there is no guarantee that the GEE family contains the efficient estimator, and how to obtain an asymptotically efficient estimator certainly warrants further research.
The proposed generalized partially linear functional single index model can also incorporate high-dimensional data, since the single index risk score is a natural way to alleviate the curse of dimensionality. For example, the single index score could be a combination of gene expression covariates in a genetic association study. Furthermore, the generalized partially linear functional single index risk score can be used in adaptive randomization clinical trials to improve study efficiency. For example, a single index risk score could summarize disease-related biomarkers that provide early information about the primary endpoints in adaptive trials. As a trial progresses, this information can be used to make intermediate decisions, such as treatment assignment among the patients and stopping or continuing the trial.
Supplementary Material
APPENDIX A.1: PROOF OF PROPOSITION 1
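Assume there exist β1, w1(t) and m1 such that
$$m_1\{w_1(t)^{\mathrm T} Z\} + X^{\mathrm T}\beta_1 = m_0\{w_0(t)^{\mathrm T} Z\} + X^{\mathrm T}\beta_0 \tag{9}$$
for all Z, X and t ∈ [0, τ], where m0, w0(t) and β0 are the true parameter values. Taking derivatives with respect to Z and t on both sides of the equation, we obtain
$$m_1'\{w_1(t)^{\mathrm T} Z\}\, w_1(t) = m_0'\{w_0(t)^{\mathrm T} Z\}\, w_0(t), \qquad m_1'\{w_1(t)^{\mathrm T} Z\}\, w_1'(t)^{\mathrm T} Z = m_0'\{w_0(t)^{\mathrm T} Z\}\, w_0'(t)^{\mathrm T} Z. \tag{10}$$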
Because m1, m0 are one-to-one, can hold only for a discrete set of and values, hence a discrete set of t values. Thus, due to the continuity of , , w1 and w0, (10) implies for all j = 1, …, dw, all Z, and all t ∈ [0, τ]. Thus, . Furthermore, is positive definite and hence invertible, which leads to . In particular, we have for all j = 1, …, dw. This gives w1j(t) = w0j(t)cj for some constant cj, or equivalently, w1(t) = Cw0(t), where C is a diagonal matrix with the cj's on the diagonal. Taking derivatives with respect to t, we further have . Dividing both sides by w1j(t), we have . Therefore, C/cj is the identity matrix. In other words, the cj, j = 1, …, dw, are identical. Since ∥w1(t)∥1 = ∥w0(t)∥1 = 1 and w1(t), w0(t) are positive, this further implies w1(t) = w0(t). Therefore, (10) reduces to . This further implies for a constant C1. Because m1(0) = m0(0) = c0, C1 = 0, i.e., m1 = m0. (9) now leads to . The equality holds for any X, which implies . Since is positive definite and hence invertible, we have β1 = β0. Therefore, β1 = β0, w1(t) = w0(t) and m1 = m0, hence the problem is identifiable.
APPENDIX A.2: NOTATION IN ESTIMATION STEP
Notation in Step 1
We define an Mi × dλ matrix
and define to be the same as except that we replace Tik, k = 1, …, Mi, by t0. Here and throughout the text, replacing Ti by t0 means setting Tik = t0 for each k = 1, …, Mi. Let
Notation in Step 2
We define as
and .
We define a functional from to , so that this functional evaluated at wh is . For notational brevity, we still use to denote this functional, i.e.
Let be a dw × dwMi matrix whose kth column block, of size dw × dw, is
Let be a dwMi × dwMi matrix with the (p, q)th block being
where Ωipq is the (p, q)th element of the working covariance matrix Ωi.
We further define the population-level quantities Swik{β0, m0, w0(t0)} to be
and Swi{β0, m0, w0(t0)} = [Swik{β0, m0, w0(t0)}T, k = 1, …, Mi]T. Let Awi{β0, m0, w0(t0)} be a dw × dwMi matrix whose kth column block Awik{β0, m0, w0(t0)} is the dw × dw matrix
Let Vwi{β0, m0, w0(t0)} be a dwMi × dwMi matrix whose (p, q)th block Vwipq{β0, m0, w0(t0)} is
Let V*wi{β0, m0, w0(t0)} be a dwMi × dwMi matrix. Its (p, q)th block is obtained by replacing Ωipq in Vwipq{β0, m0, w0(t0)} with
Here η is an operator that maps functions in C1([0, τ]) to functionals from to . Specifically, η minimizes
where
and η{Ui(Ti)}(wh) = [η{w(Tik)TZik}(wh), k = 1, …, Mi] are Mi-dimensional vectors. We can also write
Further, we define as an Mi × dw matrix with jth row . In the estimation, we use the asymptotic form in Lemma 4 of the supplementary article in place of for computation.
Notation in Step 3
We define
and . Let be a dβ × dβMi matrix whose kth column block, of size dβ × dβ, is
Let be a dβ Mi × dβ Mi matrix with the (p, q)th block being
Additionally, let and we define which minimizes
where Q̃βi = (Xi1, …, XiMi)T is an Mi × dβ matrix, and δ{Ui(Ti)} = [δ{w(Tik)TZik}, k = 1, …, Mi]T is an Mi × dβ matrix. We can also write as . Further, we define
where is a dwMi × dwMi diagonal matrix whose kth diagonal block is a dw × dw diagonal matrix with diagonal element Θik{β0, m0, w(t0)}. Also, Qwi{m0, w(t0)} is a dwMi × dwMi diagonal matrix whose kth diagonal block is . Moreover, is a dwMi × dw matrix whose kth row block is a dw × dw matrix with dw replications of . And . Also, let B(Ti) be the dwMi × dwMi block diagonal matrix with kth block B(Tik), and let fT(Ti) be the dwMi × dwMi block diagonal matrix with kth block fT(Tik).
Let γu ∈ Cq([0, 1]), and define γ{w(Tik)TZik} = [γu{w(Tik)TZik}, u = 1, …, dβ] ∈ Rdβ, which minimizes
where γ{Ui(Ti)} = [γ{w(Tik)TZik}, k = 1, …, Mi]T is an Mi × dβ matrix, and
is an Mi × dβ matrix with kth row
We can also write
We also define the population forms Sβik{β0, m0, w0(Tik)} as
and Sβi{β0, m0, w0(Ti)} = [Sβik{β0, m0, w0(Tik)}T, k = 1, …, Mi]T. Let Aβi{β0, m0, w0(Ti)} be the dβ × dβMi matrix whose kth block Aβik{β0, m0, w0(Tik)} is the dβ × dβ matrix
Let Vβi{β0, m0, w0(Ti)} be a dβMi × dβMi matrix whose (p, q)th block Vβipq{β0, m0, w0(Ti)} is
Let be a dβMi × dβMi matrix. Its (p, q)th block is obtained by replacing Ωipq in Vβipq{β0, m0, w0(Ti)} with
Footnotes
This work was supported by the National Science Foundation (DMS-1206693 and DMS-1000354) and the National Institute of Neurological Disorders and Stroke (NS073671, NS082062). The authors thank the editor, associate editor and three anonymous referees for their comprehensive review which greatly improved the paper.
SUPPLEMENTARY MATERIAL
Supplement: Supplement to "Fused Kernel-Spline Smoothing for Repeatedly Measured Outcomes in a Generalized Partially Linear Model with Functional Single Index" (http://www.e-publications.org/ims/support/dowload). We provide complete proofs of Theorems 1, 2 and 3 and additional lemmas that support the results.
REFERENCES
- Bishop YM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Springer; New York: 2007.
- Bosq D. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Lecture Notes in Statistics, Vol. 110. Springer; New York: 1998.
- Carroll RJ, Fan J, Gijbels I, Wand MP. Generalized Partially Linear Single-Index Models. Journal of the American Statistical Association. 1997;92:477–489.
- Cui X, Härdle WK, Zhu L. The EFM approach for single-index models. The Annals of Statistics. 2011;39:1658–1688.
- de Boor C. A Practical Guide to Splines. Applied Mathematical Sciences, Vol. 27. Springer; 2001.
- DeVore RA, Lorentz GG. Constructive Approximation. Grundlehren der mathematischen Wissenschaften, Vol. 303. Springer-Verlag; Berlin; New York: 1993.
- Jiang F, Ma Y, Wang Y. Supplement to "Fused Kernel-Spline Smoothing for Repeatedly Measured Outcomes in a Generalized Partially Linear Model with Functional Single Index". 2015. doi: 10.1214/15-AOS1330.
- Jiang C-R, Wang J-L. Functional single index models for longitudinal data. The Annals of Statistics. 2011;39:362–388.
- Lu M, Loomis D. Spline-based semiparametric estimation of partially linear Poisson regression with single-index models. Journal of Nonparametric Statistics. 2013;25:905–922.
- Ma S, Song PX-K. Varying Index Coefficient Models. Journal of the American Statistical Association. 2014.
- Ma Y, Zhu L. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2013;75:305–322. doi: 10.1111/j.1467-9868.2012.01040.x.
- Paulsen JS, Langbehn DR, Stout JC, Aylward E, Ross CA, Nance M, Guttman M, Johnson S, MacDonald M, Beglinger LJ, Duff K, Kayson E, Biglan K, Shoulson I, Oakes D, Hayden M. Detection of Huntington's disease decades before diagnosis: the Predict-HD study. Journal of Neurology, Neurosurgery & Psychiatry. 2008;79:874–880. doi: 10.1136/jnnp.2007.128728.
- Peng H, Huang T. Penalized least squares for single index models. Journal of Statistical Planning and Inference. 2011;141:1362–1379.
- Silverman BW. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability, Vol. 26. Chapman and Hall; London; New York: 1986.
- Smith A. Symbol Digit Modalities Test: Manual. Western Psychological Services; Los Angeles: 1982.
- Stroop JR. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935;18:643–662.
- Wang L, Yang L. Spline estimation of single-index models. Statistica Sinica. 2009;19:765.
- Wang J-L, Xue L, Zhu L, Chong YS. Estimation for a partial-linear single-index model. The Annals of Statistics. 2010;38:246–274.
- Xia Y, Härdle W. Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis. 2006;97:1162–1184.
- Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64:363–410.
- Xu P, Zhu L. Estimation for a marginal generalized single-index longitudinal model. Journal of Multivariate Analysis. 2012;105:285–299.
- Zhang Y, Long JD, Mills JA, Warner JH, Lu W, Paulsen JS. Indexing disease progression at study entry with individuals at risk for Huntington disease. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2011;156:751. doi: 10.1002/ajmg.b.31232.