Abstract
We consider the problem of estimation in semiparametric varying coefficient models where the covariate modifying the varying coefficients is functional and is modeled nonparametrically. We develop a kernel-based estimator of the nonparametric component and a profiling estimator of the parametric component of the model and derive their asymptotic properties. Specifically, we show the consistency of the nonparametric functional estimates and derive the asymptotic expansion of the estimates of the parametric component. We illustrate the performance of our methodology using a simulation study and a real data application.
Keywords: Functional regression, Kernel smoothing, Profile method, Semi-varying coefficient model
1 Introduction
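Suppose that for $i = 1, \ldots, n$ we observe an outcome $Y_i$ and covariates $X_i$, $S_i$ and $\mathcal{Z}_i$, where $X_i = (X_{i1}, \ldots, X_{ip})^T \in \Re^p$, $S_i = (S_{i1}, \ldots, S_{iq})^T \in \Re^q$ and $\mathcal{Z}_i$ is a functional covariate assumed to belong to a function space $\mathcal{H}$. We consider the following partially linear functional varying coefficient model:

$$Y_i = X_i^T \beta_0(\mathcal{Z}_i) + S_i^T \eta_0 + \varepsilon_i, \qquad (1)$$

where $\beta_0(t) = [\beta_{01}(t), \ldots, \beta_{0p}(t)]^T$ with $\beta_{0k}(\cdot)$, $k = 1, \ldots, p$, assumed to be smooth but unknown functionals, $\eta_0 \in \Re^q$, and the random errors $\varepsilon_i$ are independently and identically distributed with $E[\varepsilon \mid X, \mathcal{Z}, S] = 0$. We also assume, without loss of generality, that the functional covariate $\mathcal{Z}_i$ is observed on a finite grid $0 \le t_1 < \cdots < t_s \le 1$.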
This model is the functional generalization of the standard partially linear varying coefficient model, where $\mathcal{Z}$ is typically assumed to be a scalar or a vector (Ahmad et al. 2005). A simple example of such a model is when $\mathcal{Z}$ is binary (taking values 0 or 1), indicating a group that is not exposed or exposed to some treatment. In that case, $\beta_0(\mathcal{Z})$ takes only two values, representing the different effects of $X$ in the two groups. In general, $\mathcal{Z}$ can be thought of as a stratification variable that divides the population into several strata, and the varying coefficient model assumes that $X$ has a different effect on $Y$ in different strata. In other words, it is a means to quantify the interaction between $X$ and $\mathcal{Z}$. Such varying coefficient models have received much attention in past years due to their usefulness, and a fair amount of literature exists when the nonparametrically modeled variable is a vector. For example, Fan and Huang (2005) considered semiparametric varying-coefficient partially linear models and derived asymptotic properties of profile likelihood estimators. For a much more detailed review, readers are directed to Fan and Zhang (2008) and the references therein. In this paper, however, we consider the case where $\mathcal{Z}$ is a functional stratifier and develop an estimation procedure for $\beta_0$ and $\eta_0$.
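To make the binary example concrete, the following display is a small worked illustration (a sketch in the notation of model (1), writing $\mathcal{Z}_i$ for the stratifier). When $\mathcal{Z}_i \in \{0, 1\}$, the varying coefficient model reduces to two linear regressions that share the same $\eta_0$:

$$
Y_i =
\begin{cases}
X_i^T \beta_0(0) + S_i^T \eta_0 + \varepsilon_i, & \mathcal{Z}_i = 0,\\
X_i^T \beta_0(1) + S_i^T \eta_0 + \varepsilon_i, & \mathcal{Z}_i = 1,
\end{cases}
$$

or, equivalently, to a single regression with an interaction term:

$$
Y_i = X_i^T \beta_0(0) + \mathcal{Z}_i X_i^T \{\beta_0(1) - \beta_0(0)\} + S_i^T \eta_0 + \varepsilon_i .
$$

The coefficient $\beta_0(1) - \beta_0(0)$ of the interaction term is exactly the difference between the effects of $X$ in the two strata.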
Semiparametric models are not well studied when the covariates modeled nonparametrically are functional-valued. There is some recent literature on partially linear functional models. Aneiros-Perez and Vieu (2006) considered a special case of model (1) with Xi ≡ 1. Aneiros-Perez and Vieu (2008) considered semi-functional partial linear time series modeling, in which a continuous path in the past (used as the functional covariate) predicts future values of the process. Lian (2011) considered functional partial linear models in which the functional covariate is modeled via a linear operator while the other covariate (assumed to be in a vectorial topological space with a semi-metric) is modeled nonparametrically. Ferraty and Romain (2011) also provided a detailed discussion of functional partial linear models, their estimation using kernel smoothing, and the asymptotic properties of the estimates. However, to the best of our knowledge, there are no results on partially linear functional varying-coefficient models where the varying coefficients are functionals of the observed functional covariates and are modeled nonparametrically.
In this note, we extend the work of Aneiros-Perez and Vieu (2006) and also discuss asymptotic expansion and properties of estimators of η0, which is new when there exist functional covariates that enter a regression model nonparametrically. Section 2 presents our estimation method and relevant asymptotic results. Some simulation results are presented in Section 3 and an application to real data is given in Section 4. The Appendix collects all technical conditions and proofs.
2 Estimation method and asymptotic results
2.1 Estimation of β0(·)
We first discuss estimation of $\beta(\cdot)$ for any fixed value $\eta_0 = \eta^*$. Recall that we assume the covariates $\mathcal{Z}_i$ to be in an abstract space $\mathcal{H}$. Ferraty and Vieu (2004, 2006) argued that the most interesting spaces for functional data modeling are the so-called semi-metric spaces. Here, to keep the formulation general, we assume that $\mathcal{H}$ is a semi-metric space with an associated semi-metric $d(\cdot, \cdot)$.
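We now use Nadaraya-Watson type local constant kernel-based estimators for $\beta$. Let $K(\cdot)$ denote the kernel function and $h$ a bandwidth, and define $K_h(z) = K(z/h)$. In general, $K(\cdot)$ should satisfy some usual smoothness conditions; see the Appendix for details. Define $\tilde{Y}(\eta^*) = Y - S^T \eta^*$. Then, for any given $z \in \mathcal{H}$, we estimate $\beta_0(z)$ by minimizing

$$\sum_{i=1}^{n} \{\tilde{Y}_i(\eta^*) - X_i^T \beta\}^2 K_h\{d(z, \mathcal{Z}_i)\}$$

with respect to $\beta(\cdot)$ or, equivalently, one solves for $\beta(z)$,

$$\sum_{i=1}^{n} X_i \{\tilde{Y}_i(\eta^*) - X_i^T \beta(z)\} K_h\{d(z, \mathcal{Z}_i)\} = 0.$$

Of course, the solution has a closed form:

$$\hat{\beta}(z, \eta^*) = \Big[\sum_{i=1}^{n} X_i X_i^T K_h\{d(z, \mathcal{Z}_i)\}\Big]^{-1} \sum_{i=1}^{n} X_i \tilde{Y}_i(\eta^*) K_h\{d(z, \mathcal{Z}_i)\}. \qquad (2)$$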
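The local constant estimator described above is a kernel-weighted least squares fit, which can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the implementation used in the paper: curves are stored as rows on a common equally spaced grid, the semi-metric is a plain L2 distance, and the kernel is the triangular kernel K(u) = 1 − u on [0, 1]. The function names (`l2_semimetric`, `kernel`, `beta_hat`) and the toy data are hypothetical.

```python
import numpy as np

def l2_semimetric(z, curves, dt):
    """L2 distance between curve z and each row of `curves` (Riemann sum on the grid)."""
    return np.sqrt(np.sum((curves - z) ** 2, axis=1) * dt)

def kernel(u):
    """Triangular kernel supported on [0, 1] (an assumed choice)."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.0 - u, 0.0)

def beta_hat(z, curves, X, Y, S, eta, h, dt):
    """Kernel-weighted least squares of Y - S eta on X, localized around the curve z."""
    w = kernel(l2_semimetric(z, curves, dt) / h)   # K_h{d(z, Z_i)}
    y_tilde = Y - S @ eta                          # partial residual for the fixed eta
    Xw = X * w[:, None]                            # rows of X scaled by their weights
    return np.linalg.solve(Xw.T @ X, Xw.T @ y_tilde)

# Toy data (hypothetical): Z_i(t) = a_i * t, constant coefficients, no noise.
rng = np.random.default_rng(0)
n = 400
grid = np.linspace(0.0, 1.0, 51)
dt = grid[1] - grid[0]
curves = rng.normal(size=(n, 1)) * grid
X = rng.normal(size=(n, 2))
S = rng.normal(size=(n, 1))
eta_star = np.array([5.0])
beta_true = np.array([1.0, -2.0])
Y = X @ beta_true + S @ eta_star

z0 = np.zeros_like(grid)                           # evaluate at the zero curve
est = beta_hat(z0, curves, X, Y, S, eta_star, h=1.0, dt=dt)
print(est)
```

Because the toy response is noise-free and the coefficients do not vary with the curve, the weighted least squares fit recovers `beta_true` up to rounding error. With varying coefficients and noise, the bandwidth `h` governs the usual bias and variance trade-off.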
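Define $\beta(z, \eta) = [E(XX^T \mid \mathcal{Z} = z)]^{-1} E[X(Y - S^T \eta) \mid \mathcal{Z} = z]$. Note that $\beta_0(z) = \beta(z, \eta_0)$. Suppose the functions $\mathcal{Z}_1, \ldots, \mathcal{Z}_n \in \mathcal{C}$, where $\mathcal{C}$ is a compact subset of $\mathcal{H}$ such that condition (C.1) in the Appendix is satisfied. Then we have the following result.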
Result 1
Under regularity conditions given in the Appendix, we have that uniformly in $z \in \mathcal{C}$,

where φ (h) is defined in the Appendix.
The proof of Result 1 is given in the Appendix.
2.2 Estimation of η0
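We estimate $\eta_0$ using the profile method, as described below. Recall that for each value of $\eta$, we can construct $\hat{\beta}(z, \eta)$ as in (2). We then propose to estimate $\eta_0$ by minimizing

$$\sum_{i=1}^{n} \{Y_i - S_i^T \eta - X_i^T \hat{\beta}(\mathcal{Z}_i, \eta)\}^2$$

and thereby solving for $\eta$

$$\sum_{i=1}^{n} \{Y_i - S_i^T \eta - X_i^T \hat{\beta}(\mathcal{Z}_i, \eta)\}\{S_i + \hat{\beta}_\eta(\mathcal{Z}_i, \eta)^T X_i\} = 0,$$

where $\hat{\beta}_\eta(z, \eta) = \partial \hat{\beta}(z, \eta) / \partial \eta^T$ is the $p \times q$ matrix of derivatives of $\hat{\beta}(z, \eta)$ with respect to $\eta$.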
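Since the kernel estimator of the varying coefficients is linear in the partial residual Y − Sᵀη, the profile criterion is an ordinary least squares problem in η once Y and each column of S have been replaced by their residuals from the local constant fit on X. The sketch below illustrates this under stated assumptions: a precomputed matrix `D` of pairwise semi-metric distances, a single global bandwidth `h`, and a triangular kernel. The names `local_fit` and `profile_eta` and the toy data (curves summarized by a scalar `a`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def kernel(u):
    """Triangular kernel supported on [0, 1] (an assumed choice)."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.0 - u, 0.0)

def local_fit(D, X, T, h):
    """Fitted values X_i^T B_hat(Z_i) from local constant fits of each column of T on X."""
    n = X.shape[0]
    fitted = np.empty(T.shape, dtype=float)
    for i in range(n):
        Xw = X * kernel(D[i] / h)[:, None]           # weights K_h{d(Z_i, Z_j)}
        coef = np.linalg.solve(Xw.T @ X, Xw.T @ T)   # p x m local coefficients
        fitted[i] = X[i] @ coef
    return fitted

def profile_eta(D, X, S, Y, h):
    """Profile least squares: regress the residualized Y on the residualized S."""
    Y_res = Y - local_fit(D, X, Y[:, None], h)[:, 0]
    S_res = S - local_fit(D, X, S, h)
    eta_hat = np.linalg.lstsq(S_res, Y_res, rcond=None)[0]
    return eta_hat, S_res, Y_res

# Toy data (hypothetical): each curve is summarized by a scalar a_i in [0, 1].
rng = np.random.default_rng(1)
n = 200
a = rng.uniform(size=n)
D = np.abs(a[:, None] - a[None, :])                  # stand-in for semi-metric distances
X = rng.normal(size=(n, 2))
S = rng.normal(size=(n, 2))
beta_at = np.column_stack([np.sin(2 * np.pi * a), 1.0 + a])
eta0 = np.array([5.0, 5.0])
Y = np.sum(X * beta_at, axis=1) + S @ eta0 + rng.normal(size=n)

eta_hat, S_res, Y_res = profile_eta(D, X, S, Y, h=0.3)
print(eta_hat)
```

The residualized matrix `S_res` returned here is the sample analogue of the residual of S after regressing on X within strata, so it can be reused when estimating the variance of the parametric estimate.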
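To investigate the asymptotic properties of $\hat{\eta}$, we first define $\beta_{\eta,j}(z) = -[E(XX^T \mid \mathcal{Z} = z)]^{-1} E[X S_j \mid \mathcal{Z} = z]$, $j = 1, \ldots, q$, and $\tilde{S} = (\tilde{S}_1, \ldots, \tilde{S}_q)^T$ with $\tilde{S}_j = S_j + X^T \beta_{\eta,j}(\mathcal{Z})$. Then we have the following result.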
Result 2
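Under regularity conditions given in the Appendix, we have that

$$n^{1/2}(\hat{\eta} - \eta_0) = \Sigma^{-1} n^{-1/2} \sum_{i=1}^{n} \tilde{S}_i \varepsilon_i + o_p(1) \to \mathrm{Normal}(0, \sigma^2 \Sigma^{-1})$$

in distribution, where $\sigma^2 = E(\varepsilon^2)$ and $\Sigma = E(\tilde{S}\tilde{S}^T)$.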
Remark 1
Note that by definition, $\tilde{S}$ does not depend on the true value of $\eta_0$, and hence $\Sigma$ is also independent of $\eta_0$. It is also interesting to note that $\tilde{S}_j$, the $j$th element of $\tilde{S}$, can be thought of as the residual of a nonparametric varying coefficient model with $S_j$, the $j$th element of $S$, as response and $X$ and $\mathcal{Z}$ as covariates. Specifically, if we posit the nonparametric varying coefficient model $S_{ij} = X_i^T \alpha_j(\mathcal{Z}_i) + \zeta_i$, $i = 1, \ldots, n$, $E(\zeta_i \mid X_i, \mathcal{Z}_i) = 0$, then by similar arguments as in Result 1, the kernel estimate $\hat{\alpha}_j(z)$ gives us a consistent estimate of $-\beta_{\eta,j}(z)$, regardless of the correctness of the posited model. Thus $\tilde{S}$ can be estimated by the residuals of the element-wise regressions of $S$ on $X$ and $\mathcal{Z}$ using nonparametric varying coefficient models. Consequently, one can easily estimate $\Sigma$ using the plug-in method, where one replaces the expectation by the corresponding sample version in the expression of $\Sigma$. The error variance $\sigma^2$ can also be estimated by the mean squared error of the estimated residuals from the full semiparametric model.
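The plug-in recipe in Remark 1 can be sketched as follows. This is a minimal illustration, assuming that the limiting covariance of n^{1/2}(η̂ − η0) has the form σ²Σ⁻¹, that the residualized covariates are already available as an n × q array `S_res`, and that `resid` holds the residuals from the full fitted model. The function name and the tiny input arrays are hypothetical.

```python
import numpy as np

def eta_standard_errors(S_res, resid):
    """Plug-in estimates of Sigma and sigma^2 and the implied standard errors."""
    n = S_res.shape[0]
    Sigma_hat = S_res.T @ S_res / n                  # sample version of E(S~ S~^T)
    sigma2_hat = np.mean(resid ** 2)                 # mean squared residual
    cov_scaled = sigma2_hat * np.linalg.inv(Sigma_hat)
    se_scaled = np.sqrt(np.diag(cov_scaled))         # sd of n^{1/2}(eta_hat - eta_0)
    return Sigma_hat, sigma2_hat, se_scaled, se_scaled / np.sqrt(n)

# Tiny hand-checkable input (hypothetical values).
S_res = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
resid = np.array([1.0, -1.0, 1.0, -1.0])
Sigma_hat, sigma2_hat, se_scaled, se_eta = eta_standard_errors(S_res, resid)
print(Sigma_hat, sigma2_hat, se_scaled, se_eta)
```

For this input the estimate of Σ is 0.5 times the identity and the estimate of σ² is 1, so the scaled standard errors equal √2 and the standard errors of η̂ itself equal √2/2. Wald-type intervals then take the form η̂ ± 1.96 × `se_eta`.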
3 Simulation Study
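In this section we present a simulation study to evaluate the performance of our method. We consider the model

$$Y_i = \eta_{00} + S_i \eta_{01} + X_{i1} \beta_1(\mathcal{Z}_i) + X_{i2} \beta_2(\mathcal{Z}_i) + \varepsilon_i,$$

where we set $X_{i1} \sim \mathrm{Normal}(0, 1)$, $X_{i2} \sim \mathrm{Normal}(0, 1)$, $S_i \sim \mathrm{Normal}(0, 1)$, and $\varepsilon_i \sim \mathrm{Normal}(0, 1)$. The true values of the parameters are $\eta_0 = (\eta_{00}, \eta_{01})^T = (5, 5)^T$. The functional covariates are taken to be of the form
with $a_i \sim \mathrm{Normal}(0, 4)$, $b_i \sim \mathrm{Normal}(0, 1)$ and $t \in [0, 1]$. We take for $z(t) \in \mathcal{H}$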

The functions are observed on a grid of 51 equally spaced points on [0, 1]. We use B-splines to approximate the observed functions and use the semi-metric based on the first derivative. To fit each model, we use k-nearest neighbors type bandwidths, which are selected by cross-validation.
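A k-nearest neighbors bandwidth replaces a single global h by a local value chosen so that a fixed number of sample curves fall inside the kernel window around each evaluation point, and k can then be chosen by leave-one-out cross-validation. The sketch below is only an illustration of that idea: the exact k-NN convention and cross-validation criterion used in the paper are not specified, so the choices here (the (k+1)-th smallest distance as bandwidth, squared prediction error as the criterion, a triangular kernel, a model without the linear part) are assumptions, and all names and toy data are hypothetical.

```python
import numpy as np

def kernel(u):
    """Triangular kernel supported on [0, 1] (an assumed choice)."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.0 - u, 0.0)

def knn_bandwidth(d, k):
    """(k+1)-th smallest distance, so that k curves receive strictly positive weight."""
    return np.sort(d)[k]

def loo_cv_score(D, X, Y, k):
    """Leave-one-out mean squared prediction error of the local constant fit."""
    n = len(Y)
    err = 0.0
    for i in range(n):
        d = np.delete(D[i], i)                       # distances to the other curves
        Xi = np.delete(X, i, axis=0)
        Yi = np.delete(Y, i)
        Xw = Xi * kernel(d / knn_bandwidth(d, k))[:, None]
        beta_i = np.linalg.solve(Xw.T @ Xi, Xw.T @ Yi)
        err += (Y[i] - X[i] @ beta_i) ** 2
    return err / n

# Toy data (hypothetical): curves summarized by a scalar a_i, intercept plus one covariate.
rng = np.random.default_rng(2)
n = 150
a = rng.uniform(size=n)
D = np.abs(a[:, None] - a[None, :])                  # stand-in for semi-metric distances
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_at = np.column_stack([np.sin(2 * np.pi * a), 1.0 + a])
Y = np.sum(X * beta_at, axis=1) + 0.5 * rng.normal(size=n)

candidates = [5, 10, 20, 40]
scores = {k: loo_cv_score(D, X, Y, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Selecting k rather than h makes the effective window adapt to how densely the curves populate each region of the function space, which matters when the semi-metric distances are unevenly distributed.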
To evaluate the performance of the estimator, we set up a test set of functions $\mathcal{T} = \{a(t - 0.5)^2 + 1 : -3 \le a \le 3\}$. We consider three different sample sizes, n = 200, 400 and 1000, and generate 500 data sets for each case.
For each data set, we estimate $\beta_1(z)$ and $\beta_2(z)$ for $z \in \mathcal{T}$ and compare the estimates with the actual values. The results are shown in Figure 1. Plotted are the actual values of the operators for $z \in \mathcal{T}$ and the median of the estimates over all simulated data sets, along with 95% point-wise confidence bands. The graphs show that the median estimates track the true functional values closely for $z \in \mathcal{T}$, with reasonable confidence bands for both functionals.
Figure 1.
Results for the simulation study. Plotted are the actual values of $\beta_1(z)$ (1st column) and $\beta_2(z)$ (2nd column) for $z \in \mathcal{T}$ (dashed line) and the median of the estimates over all simulated data sets (solid line), along with the point-wise empirical 95% confidence band (dotted line). The top, middle and bottom rows correspond to the three sample sizes, n = 200, 400 and 1000, respectively.
We also estimate $\eta_{00}$ and $\eta_{01}$ for each of the settings. We compute, for $j = 0, 1$, the scaled empirical bias as the average of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$, the scaled empirical standard error as the standard deviation of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$, the estimated standard error as the average of the estimated standard deviation of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$ (using Remark 1), and the empirical coverage at the 95% nominal level. The results are given in Table 1. As n increases, the bias remains close to zero, the estimated standard errors approach the empirical ones, and the coverage approaches the nominal level. To assess normality of the estimates, we provide Q-Q plots of the estimates of $\eta_{00}$ and $\eta_{01}$ against normal quantiles for n = 1000 in Figure 2. For this large sample size, the distribution of the estimates is close to Gaussian.
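The four summary measures described above are simple functions of the replicated estimates. The sketch below shows one way to compute them for a single parameter; it assumes Wald-type intervals with the normal quantile 1.96 and the sample standard deviation with divisor R − 1 over R replicates, neither of which is stated explicitly in the text. The function name and the four-replicate input are hypothetical.

```python
import numpy as np

def summarize(eta_hats, est_sd_scaled, eta_true, n, z=1.96):
    """Scaled bias, empirical se, average estimated se and coverage (%) over replicates."""
    dev = np.sqrt(n) * (np.asarray(eta_hats, dtype=float) - eta_true)
    est_sd_scaled = np.asarray(est_sd_scaled, dtype=float)
    bias = dev.mean()                                # average of n^{1/2}(eta_hat - eta_0)
    emp_se = dev.std(ddof=1)                         # sd of n^{1/2}(eta_hat - eta_0)
    est_se = est_sd_scaled.mean()                    # average plug-in sd
    covered = np.abs(dev) <= z * est_sd_scaled       # does the Wald interval contain eta_0?
    return bias, emp_se, est_se, 100.0 * covered.mean()

# Hand-checkable input (hypothetical): four replicates with n = 100 and true value 5.
bias, emp_se, est_se, coverage = summarize(
    eta_hats=[4.9, 5.1, 5.0, 5.2],
    est_sd_scaled=[1.0, 1.0, 1.0, 1.0],
    eta_true=5.0,
    n=100,
)
print(bias, emp_se, est_se, coverage)
```

In this toy input the scaled deviations are approximately −1, 1, 0 and 2, so the scaled bias is 0.5 and three of the four intervals cover the true value, giving 75% coverage.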
Table 1.
Simulation results from Section 3. Presented are the scaled empirical bias (Bias = average of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$), empirical standard error (Emp.se = standard deviation of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$), estimated standard error (Est.se = average of the estimated standard deviation of $n^{1/2}(\hat{\eta}_{0j} - \eta_{0j})$) and empirical coverage (at the 95% nominal level) of $\eta_{00}$ and $\eta_{01}$ over 500 simulations.
| n | Parameter | Bias | Emp.se | Est.se | Emp.cov (%) |
|---|---|---|---|---|---|
| 200 | η00 | 0.031 | 1.43 | 1.22 | 91.6 |
| 200 | η01 | −0.043 | 1.34 | 1.18 | 92.0 |
| 400 | η00 | 0.019 | 1.25 | 1.09 | 93.6 |
| 400 | η01 | −0.034 | 1.27 | 1.09 | 93.5 |
| 1000 | η00 | 0.005 | 1.12 | 1.02 | 94.2 |
| 1000 | η01 | −0.006 | 1.13 | 1.02 | 94.6 |
Figure 2.
Results for the simulation study. Displayed are the normal Q-Q plots of the estimates of η00 (left) and η01 (right) for n = 1000.
4 Data Example
We demonstrate our method using the Tecator data set available at http://lib.stat.cmu.edu/datasets/tecator. The goal is to predict the fat content of a meat sample on the basis of its near infrared absorbance spectrum. Each sample contains finely chopped pure meat with different moisture, fat and protein contents, which are measured in percent and are determined by analytic chemistry. The functional covariate for each food sample consists of a 100 channel spectrum of absorbances recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850–1050 nm by the near infrared transmission (NIT) principle. See Ferraty and Vieu (2006) for a detailed discussion of the data set.
In this analysis, we demonstrate our method using a subset of the full data. We have 215 samples, indexed by $i = 1, \ldots, 215$, where the response variable $Y_i$ is the fat content of the sample. We denote the protein and moisture content by $X_1$ and $X_2$, respectively, and denote the functional covariate by $\mathcal{Z}$. We consider various (varying coefficient) models. To evaluate the performance of the models, we divide the data set into a training set consisting of 172 samples and a test set with 43 samples, as suggested in the data documentation given at the above-mentioned link. We use the prediction mean squared error over the test set,

$$\mathrm{PMSE} = \frac{1}{43} \sum_{i \in \text{test set}} (Y_i - \hat{Y}_i)^2,$$

where $\hat{Y}_i$ is the predicted fat content of the $i$th test sample, as our evaluation criterion. To fit each model, we use k-nearest neighbors type bandwidths, which are selected by cross-validation over the training sample. We use the derivative-based semi-metrics $d_k(f, g) = [\int \{f^{(k)}(x) - g^{(k)}(x)\}^2 dx]^{1/2}$ for $k = 0, 1, 2$. The results are displayed in Table 2.
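The derivative-based semi-metric and the prediction mean squared error used here are both short computations on discretized curves. The sketch below is an illustration under stated assumptions: derivatives are approximated by repeated finite differences with `numpy.gradient` (a swapped-in simplification; the simulation study reports using a B-spline approximation of the curves), the integral is computed with the trapezoidal rule, and the function names `deriv_semimetric` and `pmse` are hypothetical.

```python
import numpy as np

def deriv_semimetric(f, g, grid, k):
    """d_k(f, g): L2 norm of the k-th derivative of f - g on the given grid."""
    diff = np.asarray(f, dtype=float) - np.asarray(g, dtype=float)
    for _ in range(k):
        diff = np.gradient(diff, grid)               # finite-difference derivative
    sq = diff ** 2
    integral = np.sum((sq[1:] + sq[:-1]) * np.diff(grid)) / 2.0   # trapezoidal rule
    return float(np.sqrt(integral))

def pmse(y_true, y_pred):
    """Prediction mean squared error over a test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy curves (hypothetical): f(t) = t^2 and g(t) = 0 on [0, 1].
grid = np.linspace(0.0, 1.0, 101)
f = grid ** 2
g = np.zeros_like(grid)
d0 = deriv_semimetric(f, g, grid, 0)                 # close to sqrt(1/5)
d1 = deriv_semimetric(f, g, grid, 1)                 # close to sqrt(4/3)
d2 = deriv_semimetric(f, g, grid, 2)                 # close to 2
shift = deriv_semimetric(f + 3.0, f, grid, 1)        # a vertical shift is invisible to d_1
print(d0, d1, d2, shift, pmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The last distance illustrates why these are semi-metrics rather than metrics: for k ≥ 1, two curves that differ only by a vertical shift are at distance zero, which is useful for spectra whose baseline level carries little information about the response.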
Table 2.
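Prediction mean squared errors for the Tecator data set for different types of varying coefficient models. Columns denoting PMSE(k) display PMSE values when the semi-metric $d_k$ is used. In each model, the best PMSE value is given in bold face. The overall best PMSE is in bold face and marked with an asterisk.

| Model | PMSE(2) | PMSE(1) | PMSE(0) |
|---|---|---|---|
| $Y = \beta_0(\mathcal{Z}) + X_1\beta_1(\mathcal{Z}) + \varepsilon$ | **2.72** | 5.55 | 8.82 |
| $Y = \beta_0(\mathcal{Z}) + X_2\beta_2(\mathcal{Z}) + \varepsilon$ | 1.80 | **1.41** | 2.28 |
| $Y = \beta_0(\mathcal{Z}) + X_1\beta_1 + X_2\beta_2 + \varepsilon$ | 1.05 | **0.80** | 2.03 |
| $Y = \beta_0(\mathcal{Z}) + X_1\beta_1(\mathcal{Z}) + X_2\beta_2 + \varepsilon$ | 1.29 | **0.88** | 1.58 |
| $Y = \beta_0(\mathcal{Z}) + X_1\beta_1 + X_2\beta_2(\mathcal{Z}) + \varepsilon$ | 1.30 | **0.40*** | 1.47 |
| $Y = \beta_0(\mathcal{Z}) + X_1\beta_1(\mathcal{Z}) + X_2\beta_2(\mathcal{Z}) + \varepsilon$ | 0.99 | **0.56** | 1.48 |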
The model with a linear effect of X1 and a varying coefficient for X2 performs best among all the models considered (PMSE 0.40 with the semi-metric d1). It is interesting to note that Aneiros-Perez and Vieu (2006) found that the semi-functional partial linear model with both X1 and X2 linear was the best among their models. From our results, it appears that allowing X2 to have a varying coefficient captures the data better. In fact, the model in which both X1 and X2 have varying coefficients performs competitively as well (PMSE 0.56).
We emphasize that the example provided in this section is not meant to be a full case study or analysis of the data, but a demonstrative example that shows the performance of our methods and illustrates that partially linear functional varying coefficient models are competitive for this kind of data set.
Acknowledgments
Maity’s work was partly supported by Award Number R00ES017744 from the National Institute of Environmental Health Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Environmental Health Sciences or the National Institutes of Health. Huang’s work was partly supported by grants from NSF (DMS-09-07170 and DMS-10-07618) and King Abdullah University of Science and Technology (KUS-CI-016-04). We are also grateful to two anonymous reviewers for their careful evaluation of the paper and constructive comments that led to a significantly improved version of the paper.
A Appendix
A.1 Conditions and Assumptions
We use the following assumptions for the results stated in this paper. These assumptions are usual in nonparametric functional modeling, see for example Ferraty and Vieu (2006) and Aneiros-Perez and Vieu (2006).
- (C.1) The functions $\mathcal{Z}_1, \ldots, \mathcal{Z}_n \in \mathcal{C}$, where $\mathcal{C}$ is a compact subset of $\mathcal{H}$ such that
, with
for some real positive constants $\gamma$, $c_0$.
- (C.K) $K(\cdot)$ has support [0, 1], is Lipschitz continuous and is strictly decreasing. Also, there exists $c_K > 0$ such that for all $t \in [0, 1]$, $-K'(t) > c_K$.
- (C.θ) There exist constants $c$ and $a$ so that $|f(z_1) - f(z_2)| \le c\, d(z_1, z_2)^a$ for all $z_1, z_2 \in \mathcal{C}$ with $f \in \{\beta_1, \ldots, \beta_p, \beta_{\eta,1}, \ldots, \beta_{\eta,q}, g_{11}, \ldots, g_{pp}, w_{11}, \ldots, w_{pq}\}$, where $g_{jk}(z) = E(X_j X_k \mid \mathcal{Z} = z)$ and $w_{jk}(z) = E(X_j S_k \mid \mathcal{Z} = z)$.
- (C.Y) For some $\beta \ge 2$, $E|Y|^\beta < \infty$.
- (C.2) There exists a function $\varphi(h)$ with for some $\theta_2 > 0$ such that for all $z \in \mathcal{C}$, there are constants $c_1$ and $c_2$ such that
- (C.X) $E(XX^T \mid \mathcal{Z})$ is finite positive definite. Also for all $m \ge 2$, $E(|X_j X_k|^m \mid \mathcal{Z} = z) < \sigma_{1m}(z) < \infty$, where $\sigma_{1m}(\cdot)$ is continuous at $z$.
- (C.S) $E(SS^T \mid \mathcal{Z})$ is finite positive definite. Also for all $m \ge 2$, $E(|X_j S_k|^m \mid \mathcal{Z} = z) < \sigma_{2m}(z) < \infty$, where $\sigma_{2m}(\cdot)$ is continuous at $z$.
- (C.ε) $\sigma^2 = E(\varepsilon^2) > 0$ and $E|\varepsilon|^r < \infty$ for some $r \ge 3$.
- (C.h) $nh^{4a} \to 0$ and $\varphi(h) \ge n^{(2/r)+b-1}/(\log n)^2$ for large enough $n$ and some constant $b > 0$ such that $(2/r) + b > 1/2$.
Remark 2
Note that assumption (C.1) describes the structure of the subset $\mathcal{C}$ containing the functional covariates. As noted in Ferraty and Vieu (2008), this assumption is trivially satisfied in standard nonparametric problems with vector-valued covariates, but it is not necessarily true for a general abstract semi-metric space. Thus it is necessary to assume that the compact subset $\mathcal{C}$ can be written such that (C.1) holds. To address this issue, Ferraty et al. (2010) proposed an alternative condition based on Kolmogorov’s ε-entropy (see their Definition 1 in Section 2). Specifically, let us denote the Kolmogorov ε-entropy of $\mathcal{C}$ by $\Psi_{\mathcal{C}}(\varepsilon)$. Ferraty et al. (2010) proposed to impose conditions on $\Psi_{\mathcal{C}}(\log(n)/n)$; see, for example, their conditions (H5) and (H6). In this paper, we use condition (C.1) to prove our results. However, we conjecture that it is possible to prove similar results using these alternative entropy-based conditions as well. We do not take up this problem in the present paper, but it is certainly an interesting open problem.
A.2 Proof of Result 1
We use the following two results to prove Result 1. Denote .
Lemma 1
Under regularity conditions, uniformly in $z \in \mathcal{C}$,
Lemma 2
Under regularity conditions, uniformly in $z \in \mathcal{C}$,
Now, using Lemma 1, Lemma 2 and the positive definiteness of $E[XX^T \mid \mathcal{Z} = z]$, it is easy to derive that uniformly in $z \in \mathcal{C}$

A.2.1 Proof of Lemma 1
Using results in Ferraty and Vieu (2004), see proof of their Lemma 3.1, we see that for j, k = 1, …, p,
Now, using conditions that h → 0 and log(n)/{nφ(h)} → 0, we see that the right hand side is in fact op(1) and hence the proof follows.
A.2.2 Proof of Lemma 2
Recall that under the model,
Define . Then we have that
where, by Ferraty and Vieu (2004), uniformly in $z \in \mathcal{C}$
| (3) |
Also, using the fact that $E(\varepsilon \mid X, \mathcal{Z}) = 0$, we see that
Also it is easy to see (Ferraty and Vieu, 2006) that
| (4) |
A.3 Proof of Result 2
A.3.1 Expansion of η̂
We start by noting that
Using similar argument as in the previous section, we obtain that
| (5) |
Also it is easy to see that $E(\tilde{S} X^T \mid \mathcal{Z}) = 0$. Now we derive
Using (5) and the fact that $E(\varepsilon \mid X, \mathcal{Z}) = 0$, it follows that $A_2 = o_p(n^{-1/2})$. Moreover, using condition (C.h), (5) and Result 1, we have $A_3 = o_p(n^{-1/2})$. Finally, $A_1 = o_p(n^{-1/2})$ follows from Result 1 and noting that
and $E(\tilde{S} X^T \mid \mathcal{Z}) = 0$.
We observe that
Hence we have that
and the result follows.
Contributor Information
Arnab Maity, Email: amaity@ncsu.edu, Department of Statistics, North Carolina State University, Raleigh NC 27695, U.S.A.
Jianhua Z. Huang, Email: jianhua@stat.tamu.edu, Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A
References
- Aneiros-Perez G, Vieu P. Semi-functional partial linear regression. Statistics & Probability Letters. 2006;76:1102–1110.
- Aneiros-Perez G, Vieu P. Nonparametric time series prediction: A semi-functional partial linear modeling. Journal of Multivariate Analysis. 2008;99:834–857.
- Ahmad I, Leelahanon S, Li Q. Efficient estimation of a semiparametric partially linear varying coefficient model. Annals of Statistics. 2005;33:256–283.
- Fan J, Huang T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli. 2005;11:1031–1057.
- Fan J, Zhang W. Statistical methods with varying coefficient models. Statistics and Its Interface. 2008;1:179–195. doi: 10.4310/sii.2008.v1.n1.a15.
- Ferraty F, Laksaci A, Tadj A, Vieu P. Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference. 2010;140:335–352.
- Ferraty F, Romain Y. The Oxford Handbook of Functional Data Analysis. Oxford University Press; 2011.
- Ferraty F, Vieu P. Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination. Journal of Nonparametric Statistics. 2004;16:111–125.
- Ferraty F, Vieu P. Nonparametric functional data analysis. Springer; New York: 2006.
- Ferraty F, Vieu P. Erratum of: ‘Non-parametric models for functional data, with application in regression, time-series prediction and curve discrimination’. Journal of Nonparametric Statistics. 2008;20:187–189.
- Lian H. Functional partial linear model. Journal of Nonparametric Statistics. 2011;23:115–128.


