Journal of Applied Statistics
. 2020 Mar 3;48(4):583–604. doi: 10.1080/02664763.2020.1736527

Bayesian bandwidth estimation and semi-metric selection for a functional partial linear model with unknown error density

Han Lin Shang
PMCID: PMC9041737  PMID: 35706989

ABSTRACT

This study examines the optimal selections of bandwidth and semi-metric for a functional partial linear model. Our proposed method begins by estimating the unknown error density using a kernel density estimator of residuals, where the regression function, consisting of parametric and nonparametric components, can be estimated by functional principal component and functional Nadaraya-Watson estimators. The estimation accuracy of the regression function and error density crucially depends on the optimal estimations of bandwidth and semi-metric. A Bayesian method is utilized to simultaneously estimate the bandwidths in the regression function and kernel error density by minimizing the Kullback-Leibler divergence. For estimating the regression function and error density, a series of simulation studies demonstrate that the functional partial linear model gives improved estimation and forecast accuracies compared with the functional principal component regression and functional nonparametric regression. Using a spectroscopy dataset, the functional partial linear model yields better forecast accuracy than some commonly used functional regression models. As a by-product of the Bayesian method, a pointwise prediction interval can be obtained, and marginal likelihood can be used to select the optimal semi-metric.

Keywords: Functional Nadaraya-Watson estimator, scalar-on-function regression, Gaussian kernel mixture, Markov chain Monte Carlo, error-density estimation, spectroscopy

2010 Mathematics Subject Classifications: 97K80, 62F15

1. Introduction

In scalar-on-function regression, parametric regression models are useful for interpreting the linear relationship between scalar response and functional predictor, while nonparametric regression models may capture a possible nonlinear relationship, and thus may improve the estimation and prediction accuracies of the regression function. By combining the advantages of parametric and nonparametric regression models, the semiparametric regression models, such as the functional partial linear model, have received increasing attention in the literature (e.g. [42]).

Despite rapid developments in the estimations of functional partial linear models, the optimal selections of semi-metric and bandwidth remain largely unexplored. To address this, we consider the optimal parameter selections from the perspective of error-density estimation in functional regression. The estimation of error density is important for understanding the residual behavior of regression models and assessing the adequacy of the error distribution assumption (e.g. [2,15]). The estimation of error density is also useful for testing the symmetry of the residual distribution (e.g. [1,49]); for the estimation of the density of response variable (e.g. [22]); and is important for statistical inference, prediction and model validation (e.g. [20,47]).

In a nonparametric scalar-on-function regression, Shang [55] proposes a Bayesian bandwidth estimation method for determining the bandwidth of the Nadaraya-Watson (NW) estimator for estimating the regression mean function and the bandwidth of the kernel-form error density. Also in nonparametric scalar-on-function regression, Shang [58] proposes a Bayesian bandwidth estimation method for determining the bandwidth of the NW estimator for estimating the regression quantile function and the bandwidth of the kernel-form error density. Further, Shang [58] uses marginal likelihood as a means of selecting the optimal semi-metric. Building on the early work by Zhang et al. [64], Zhang and King [63] and Shang [55–59], we consider a kernel error-density estimator that captures data-driven features, such as asymmetry, skewness, and multi-modality, and relies on residuals obtained from the estimated regression function and bandwidth of residuals. Differing from these earlier works, we derive an approximate likelihood and a posterior for the functional partial linear model (a semiparametric model). We present a Markov chain Monte Carlo (MCMC) sampling algorithm for simultaneously sampling bandwidth parameters, the linear regression coefficient function β(t), and the nonlinear NW estimator of m(z). Through a series of simulation studies, the functional partial linear model gives improved estimation and forecast accuracies of the regression mean function compared with the functional principal component regression and nonparametric scalar-on-function regression.

While the selection of bandwidth determines the amount of smoothing, the selection of semi-metric plays an important role in measuring distances among functions. In nonparametric functional data analysis, the optimal bandwidth can be determined either by functional cross-validation [50] or by a Bayesian method incorporating the information on error density [55]. However, the optimal selection of semi-metric is rather arbitrary. As noted by Ferraty and Vieu [27, Chapters 3 and 13], a semi-metric based on a derivative should be used for a set of smooth functional data; a semi-metric based on a dimension-reduction technique, such as functional principal component analysis (FPCA), should be used for a set of rough functional data. To the best of our knowledge, little attention has been paid to selecting an adequate semi-metric based on rigorous statistical criteria.

This lack of rigorous statistical criteria motivates our investigation of a Bayesian method for estimating bandwidths in the regression function and error density simultaneously, and for selecting the optimal semi-metric, based on the notion of marginal likelihood (also referred to as the evidence [37]). The marginal likelihood reflects a summary of evidence provided by the data supporting a choice of semi-metric as opposed to a competing semi-metric. The optimal semi-metric is the one with the largest marginal likelihood. With the marginal likelihood, it is straightforward to calculate the Bayes factor for measuring the strength of evidence between any two semi-metrics (e.g. [37]). Selecting the optimal semi-metric and bandwidths often improves the estimation and prediction accuracies of the regression function, including the functional partial linear model considered here.

The remainder of this paper is organized as follows. In Section 2, we introduce the partial linear model with a functional covariate, previously studied by Lian [42], and the estimation procedure in Aneiros-Pérez and Vieu [5], Aneiros-Pérez et al. [3] and Ling et al. [43]. In Section 3, we review the Bayesian bandwidth-estimation method and introduce the notion of marginal likelihood computed as a by-product of the MCMC. The optimal semi-metric is determined as the one with the largest marginal likelihood among several possible semi-metrics. Using a series of simulation studies in Section 4, we evaluate and compare the estimation accuracies of the regression function and error density. Also, we compare the forecast accuracy of the regression function among the functional partial linear model, functional principal component regression, and functional nonparametric regression. In Section 5, we apply the proposed method to a spectroscopy dataset in food quality control and compare its forecast accuracy with 13 commonly used functional regression models. Conclusions are presented in Section 6, along with some reflections on how the method presented here could be extended.

2. Model and estimator

There is an increasing amount of literature on the development of nonparametric functional estimators, such as the functional NW estimator [27], the functional local-linear estimator [8], the functional k-nearest neighbor estimator [11], and the distance-based local-linear estimator [9]. In the functional partial linear model, we choose to estimate the conditional mean by the functional NW estimator because of its simplicity and robustness against a large gap in design points.

2.1. A functional partial linear model

When modeling scalar responses with functional predictors, some functional variables are related to the response linearly, while other variables have nonlinear relationships with the response. We consider a random data triplet $(X, Z, y)$, where $y$ is a real-valued response, the functional random variable $X$ is valued in a Hilbert space of square-integrable functions, and $Z$ is valued in some infinite-dimensional semi-metric vector space $(\mathcal{F}, d(\cdot,\cdot))$. Let $(X_i, Z_i, y_i)_{i=1,2,\dots,n}$ be a sample of data triplets that are independent and identically distributed (iid) as $(X, Z, y)$. We consider a functional partial linear model with homoscedastic errors. Given a set of observations $(X_i, Z_i, y_i)$, the regression model can be expressed as

\[ y_i = \int_{t \in I} X_i(t)\,\beta(t)\,\mathrm{d}t + m(Z_i) + \varepsilon_i, \]

where $I$ represents the function support range, $\beta(\cdot)$ is a regression coefficient function, $m(\cdot)$ is an unknown smooth real function that captures the possible nonlinear relationship between $Z$ and $y$, and $(\varepsilon_1, \dots, \varepsilon_n)$ are iid random errors satisfying

\[ E\left(\varepsilon_i \mid X_i, Z_i\right) = 0. \]

Following the estimation procedure of Aneiros-Pérez and Vieu [4], it is natural to estimate the regression coefficient function β() and the smooth function m() by the ordinary least squares and functional NW estimators, given by

\[ \hat{\beta}_h(t) = \left(\tilde{\mathcal{X}}_h^\top(t)\,\tilde{\mathcal{X}}_h(t)\right)^{+}\tilde{\mathcal{X}}_h^\top(t)\,\tilde{y}_h, \tag{1} \]
\[ \hat{m}_h(z) = \sum_{i=1}^{n} w_h(z, Z_i)\left(y_i - \left\langle X_i(t),\, \hat{\beta}_h(t)\right\rangle\right), \tag{2} \]

where $^{+}$ represents the generalized inverse and $^\top$ represents the transpose of a column vector; $\mathcal{X} = (X_1, \dots, X_n)^\top$ and $y = (y_1, \dots, y_n)^\top$; $\tilde{\mathcal{X}}_h = (I - W_h)\mathcal{X}$ and $\tilde{y}_h = (I - W_h)y$; and $W_h = [w_h(Z_i, Z_j)]_{i,j}$ is a weight matrix with $w_h(z, Z_i)$ being the NW-type weights,

\[ w_h(z, Z_i) = \frac{K\!\left(d(z, Z_i)/h\right)}{\sum_{j=1}^{n} K\!\left(d(z, Z_j)/h\right)}, \]

where $K(\cdot)$ is a kernel function, such as the Gaussian kernel function considered in this paper; $h$ is a positive real-valued bandwidth parameter, controlling the trade-off between squared bias and variance in the mean squared error (MSE); and $d(\cdot,\cdot)$ is a semi-metric used to quantify differences among curves.
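To make the weighting concrete, here is a minimal Python sketch of the NW-type weights and the resulting conditional-mean estimate, assuming the semi-metric distances $d(z, Z_i)$ have already been computed (the function and variable names are ours, not the authors' code):

```python
import numpy as np

def nw_weights(d_z, h):
    """NW-type weights w_h(z, Z_i) from precomputed semi-metric
    distances d_z[i] = d(z, Z_i) and a positive bandwidth h,
    using the Gaussian kernel K(u) = exp(-u^2 / 2)."""
    k = np.exp(-0.5 * (d_z / h) ** 2)
    return k / k.sum()          # weights sum to one by construction

def nw_estimate(y, d_z, h):
    """Functional NW estimate of the conditional mean at z."""
    return nw_weights(d_z, h) @ y
```

A small bandwidth concentrates the weight on the curves closest to $z$ in the chosen semi-metric, whereas a large bandwidth averages over nearly all curves.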

The regression coefficient function $\hat{\beta}_h(t)$ in (1) can be seen as the ordinary least squares estimator obtained by regressing the partial residual vector $\tilde{y}_h$ on the partial residual vector of functions $\tilde{\mathcal{X}}_h$. To avoid possible singularity, it is common to use a dimension-reduction technique, such as functional principal component regression (e.g. [42,52,53]). In turn, $\hat{m}_h(z)$ in (2) can be seen as the functional NW estimator, with the partial residual vector as the response variable. Therefore, the estimation accuracies of $\hat{\beta}_h(t)$ and $\hat{m}_h(z)$ crucially depend on the optimal estimation of $h$. With an estimated bandwidth of the NW estimator, we obtain an estimate of $\beta(t)$ at each iteration of the MCMC sampling. We then average over all iterations to obtain the final estimates, namely $\hat{h}$, $\hat{\beta}_h(t)$ and $\hat{m}_h(z)$.

In functional nonparametric regression, the bandwidth parameter is commonly determined by functional cross-validation (e.g. [6,11,23,28]). Functional cross-validation aims to select a bandwidth that minimizes the squared loss function and has the appealing feature that no estimation of the error variance is required. However, as stated, an accurate estimation of error density is important. Thus, we consider a Bayesian method to estimate bandwidth parameters in the regression function and error density simultaneously. The Bayesian method aims to select a bandwidth that minimizes the Kullback-Leibler (KL) divergence.

2.2. Choice of semi-metric

The choice of semi-metric has effects on the size and form of neighborhoods and can thus control the concentration properties of functional data. As Ferraty and Vieu [27, p. 193] note, the selection of the semi-metric remains an open question: there is no established method to quantify which semi-metric is adequate, or to select the optimal semi-metric, using statistical criteria.

From a practical perspective, a semi-metric based on derivatives should be utilized for a set of smooth functional data. This semi-metric can be expressed as

\[ d_q^{\text{deriv}}(Z_i, Z) = \sqrt{\int_t \left(Z_i^{(q)}(t) - Z^{(q)}(t)\right)^2 \mathrm{d}t}, \]

where $Z^{(q)}$ is the $q$th-order derivative of $Z$; first and second derivatives are the most common choices in practice (e.g. [26,28,36]). Computationally, the semi-metric based on the $q$th-order derivative uses a B-spline approximation for each curve, and a derivative of a B-spline function can be computed directly by differencing the B-spline coefficients and using B-spline basis functions of one order lower (for more detail, see [19] and [27, Section 3.4]).
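A small Python sketch of this derivative-based semi-metric, using SciPy's interpolating B-splines (the cubic order, grid resolution, and names are our own illustrative choices, not the authors' code):

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def semimetric_deriv(x1, x2, t, q=1):
    """d_q(x1, x2) = sqrt( int (x1^(q)(t) - x2^(q)(t))^2 dt ).

    Each discretized curve is represented by a cubic interpolating
    B-spline; .derivative(q) differences the spline coefficients and
    lowers the basis order, as described in the text."""
    s1 = make_interp_spline(t, x1, k=3).derivative(q)
    s2 = make_interp_spline(t, x2, k=3).derivative(q)
    grid = np.linspace(t[0], t[-1], 501)
    d2 = (s1(grid) - s2(grid)) ** 2
    # trapezoidal rule for the integral
    return np.sqrt(np.sum((d2[1:] + d2[:-1]) / 2.0 * np.diff(grid)))
```

Note that a vertical shift between two curves vanishes under any derivative-based semi-metric, which is precisely why it suits smooth curves that differ mainly in level.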

For a set of rough functional data, a semi-metric based on FPCA is commonly used (for detail on the choice of semi-metric from the practical and theoretical perspectives, see [27], Chapters 3 and 13). The semi-metric based on FPCA computes proximities among a set of rough curves using the principal component scores. FPCA reduces the functional data to a finite-dimensional space (i.e. K number of components). The finite-dimensional representation can be expressed as

\[ d_K(Z_i, Z) = \sqrt{\sum_{\zeta=1}^{K} \left(\beta_{\zeta,i} - \beta_{\zeta}\right)^2 \left\|\varphi_{\zeta}(t)\right\|^2}, \]

where $K$ represents the number of retained functional principal components, $\beta_{\zeta,i}$ and $\beta_{\zeta}$ are the scores of $Z_i$ and $Z$ on the $\zeta$th eigenfunction $\varphi_{\zeta}$, and $\|v\| = \sqrt{\langle v, v\rangle}$ represents the induced norm.
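A minimal Python sketch of the FPCA-based semi-metric, computing pairwise distances among discretized curves from their first $K$ principal component scores (the SVD-based implementation and names are ours):

```python
import numpy as np

def fpca_semimetric(X, K=3):
    """Pairwise FPCA-based semi-metric for rough curves.

    X : (n, p) matrix of discretized curves.
    Returns the (n, n) matrix of distances built from the first K
    principal component scores:
    d_K(Z_i, Z_j)^2 = sum_k (score_ik - score_jk)^2."""
    Xc = X - X.mean(axis=0)                 # centre the curves
    # right singular vectors = eigenfunctions of the sample covariance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:K].T                  # projections on first K components
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Because the discretized eigenvectors here are orthonormal, their norms drop out of the distance; with non-orthonormal bases the score differences would be weighted as in the display above.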

3. Bayesian method

In Section 3.1, we briefly review Shang's [55] Bayesian method for selecting optimal bandwidth. As a by-product of the MCMC, the optimal semi-metric can be determined with the largest marginal likelihood among a set of semi-metrics (as presented in Section 3.3).

3.1. Bayesian bandwidth estimation

Zhang et al. [64] consider a kernel-form error density in a nonparametric regression model, while Zhang and King [63] consider the kernel-form error density in a nonlinear parametric model where Bayesian sampling approaches are used. Following the early work by Zhang et al. [64] and Zhang and King [63], the unknown error density f(ε) can be approximated by a location-mixture Gaussian density, given by

\[ f(\varepsilon; b) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{b}\,\phi\!\left(\frac{\varepsilon - \varepsilon_j}{b}\right), \]

where $\phi(\cdot)$ is the probability density function of the standard Gaussian distribution; the component Gaussian densities have means at $\varepsilon_j$, for $j = 1, 2, \dots, n$, and a common standard deviation $b$.

Given that errors are unknown in practice, we approximate them by residuals obtained from the functional principal component and functional NW estimators of the conditional mean. Given bandwidths h and b, the kernel likelihood of y=(y1,,yn) is given by

\[ \hat{L}\left(y \mid h, b\right) = \prod_{i=1}^{n}\left[\frac{1}{n-1}\sum_{\substack{j=1 \\ j\neq i}}^{n}\frac{1}{b}\,\phi\!\left(\frac{\hat{\varepsilon}_i - \hat{\varepsilon}_j}{b}\right)\right], \tag{3} \]

where $\hat{\varepsilon}_i = y_i - \int_{t \in I} X_i(t)\hat{\beta}(t)\,\mathrm{d}t - \hat{m}(Z_i)$ represents the $i$th residual obtained from the estimated regression function. This likelihood is not proper, since some terms are left out; it is instead a pseudo-likelihood, that is, a likelihood function associated with a family of probability distributions that does not necessarily contain the true function [35]. As a consequence, the resulting Bayesian estimators, while consistent, may have an inaccurate posterior variance, and credible sets constructed from this posterior may not be asymptotically valid (see also [16,48]).
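The leave-one-out kernel likelihood in (3) can be evaluated in vectorized form; a Python sketch (on the log scale for numerical stability; the names are ours):

```python
import numpy as np
from scipy.stats import norm

def log_kernel_likelihood(resid, b):
    """Log of the leave-one-out kernel likelihood of Eq. (3):
    each residual is evaluated under the Gaussian mixture centred
    at all *other* residuals, with common bandwidth b."""
    n = len(resid)
    z = (resid[:, None] - resid[None, :]) / b     # (eps_i - eps_j) / b
    dens = norm.pdf(z) / b
    np.fill_diagonal(dens, 0.0)                   # leave-one-out: j != i
    f_loo = dens.sum(axis=1) / (n - 1)
    return np.log(f_loo).sum()
```

In the MCMC algorithm below, this quantity is recomputed at every proposed bandwidth, with the residuals themselves refreshed from the current regression fit.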

By assuming that the squared bandwidths $h^2$ and $b^2$ have independent inverse Gamma priors $\text{IG}(10^{-3}, 10^{-3})$, the posterior density can be obtained by multiplying the kernel likelihood in (3) by the prior densities. Up to a normalizing constant, the posterior density can be expressed as

\[ \pi\left(h^2, b^2 \mid y\right) \propto \hat{L}\left(y \mid h, b\right) \times \pi\left(h^2\right) \times \pi\left(b^2\right). \]

3.2. Sampling algorithm

Computationally, an adaptive random-walk Metropolis algorithm can be used for sampling both parameters (h,b) iteratively (see [31]). The sampling procedure is briefly described as follows:

  1. Specify a Gaussian proposal distribution and begin the sampling iteration process by choosing an arbitrary value of $(h^2, b^2)$, denoted by $(h^2_{(0)}, b^2_{(0)})$. The starting points can be drawn from a uniform distribution $U(0, 1)$.

  2. From the initial bandwidths $h^2_{(0)}$ and $b^2_{(0)}$, we obtain an initial estimate of the linear regression coefficient function, $\hat{\beta}_{h_{(0)}}(t)$, from (1) and an initial estimate of the nonlinear regression function, $\hat{m}_{h_{(0)}}(z)$, from (2).

  3. At the $k$th iteration, the current state $h^2_{(k)}$ is proposed as $h^2_{(k)} = h^2_{(k-1)} + \tau_{(k-1)} u$, where $u$ is drawn from the proposal density, which can be the standard Gaussian density, and $\tau_{(k-1)}$ is an adaptive tuning parameter with an arbitrary initial value $\tau_{(0)}$. The proposed $h^2_{(k)}$ is accepted with probability
    \[ \min\left\{\frac{\pi\left(h^2_{(k)}, b^2_{(k-1)} \mid y\right)}{\pi\left(h^2_{(k-1)}, b^2_{(k-1)} \mid y\right)},\ 1\right\}. \]
  4. The tuning parameter for the next iteration is set to
    \[ \tau_{(k)} = \begin{cases} \tau_{(k-1)} + c\,(1 - \xi)/k & \text{if } h^2_{(k)} \text{ is accepted},\\ \tau_{(k-1)} - c\,\xi/k & \text{if } h^2_{(k)} \text{ is rejected}, \end{cases} \]
    where $c = \tau_{(k-1)}/(\xi - \xi^2)$ is a constant and $\xi$ is the optimal target acceptance probability, which is 0.44 for univariate updating (e.g. [54]).
  5. Conditional on $h^2_{(k)}$ and $y$, we obtain an estimator of the linear regression coefficient function, $\hat{\beta}_{h_{(k)}}(t)$, and an NW estimator of the nonlinear regression function, $\hat{m}_{h_{(k)}}(z)$ (see, e.g. [13]).

  6. Repeat Steps 2-5 for $b^2_{(k)}$, conditional on $h^2_{(k)}$ and $y$.

  7. Repeat Steps 2-6 $M + N$ times; discard the burn-in draws $(h^2_{(0)}, \beta_{h_{(0)}}(t), m_{h_{(0)}}(z), b^2_{(0)}), (h^2_{(1)}, \beta_{h_{(1)}}(t), m_{h_{(1)}}(z), b^2_{(1)}), \dots, (h^2_{(M)}, \beta_{h_{(M)}}(t), m_{h_{(M)}}(z), b^2_{(M)})$ to let the effects of the transients wear off, and estimate
    \[ \hat{h}^2 = \frac{1}{N}\sum_{k=M+1}^{M+N} h^2_{(k)}, \quad \hat{\beta}_h(t) = \frac{1}{N}\sum_{k=M+1}^{M+N} \beta_{h_{(k)}}(t), \quad \hat{m}_h(z) = \frac{1}{N}\sum_{k=M+1}^{M+N} m_{h_{(k)}}(z), \quad \hat{b}^2 = \frac{1}{N}\sum_{k=M+1}^{M+N} b^2_{(k)}. \]
    The burn-in period is taken to be M = 1000 iterations, and the number of iterations after burn-in is N = 10,000.
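Steps 1-7 can be sketched in Python for a generic log-posterior; the positivity floor on the tuning parameter and the two-parameter interface are our own additions to keep the sketch self-contained, not the authors' implementation:

```python
import numpy as np

def adaptive_rwm(log_post, n_burn=1000, n_keep=10000, xi=0.44, seed=0):
    """Adaptive random-walk Metropolis for two positive parameters
    (e.g. the squared bandwidths h^2 and b^2), following Steps 1-7:
    one univariate update per parameter per iteration, with the tuning
    parameter adapted toward the target acceptance probability xi."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=2)   # Step 1: U(0,1) starting points
    tau = np.ones(2)                        # arbitrary initial tuning values
    lp = log_post(theta)
    draws = np.empty((n_burn + n_keep, 2))
    for k in range(1, n_burn + n_keep + 1):
        for j in range(2):                  # update h^2 first, then b^2
            prop = theta.copy()
            prop[j] = theta[j] + tau[j] * rng.standard_normal()  # Step 3
            lp_prop = log_post(prop) if prop[j] > 0 else -np.inf
            c = tau[j] / (xi - xi ** 2)     # Step 4 constant
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
                tau[j] += c * (1.0 - xi) / k
            else:                           # floored to keep the step positive
                tau[j] = max(tau[j] - c * xi / k, 0.01)
        draws[k - 1] = theta
    return draws[n_burn:].mean(axis=0), draws[n_burn:]  # Step 7 averages
```

For the functional partial linear model, `log_post` would evaluate the log posterior of Section 3.1, re-fitting $\hat{\beta}$ and $\hat{m}$ at each proposed bandwidth as in Steps 2 and 5.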

3.3. Bayesian model selection

In Bayesian inference, model selection is often conducted through the marginal likelihood of the model of interest against a competing model. The marginal likelihood is the expectation of likelihood with respect to the prior density of parameters and reflects a summary of evidence provided by the data supporting the model as opposed to its competing model. It is seldom calculated as the integral of the product of the likelihood and prior density of parameters, but instead, is often computed numerically (e.g. [17,32]). We use the method proposed by Chib [17] to compute the marginal likelihood.

Let θ=(h,b) be the parameter vector and y=(y1,,yn) be the data. Chib [17] demonstrated that the marginal likelihood under model A can be expressed as

\[ L_A(y) = \frac{L_A(y \mid \theta)\,\pi_A(\theta)}{\pi_A(\theta \mid y)}, \]

where LA(y|θ), πA(θ) and πA(θ|y) denote the likelihood, prior and posterior, respectively. LA(y) is often computed at the posterior estimate of θ. The numerator has a closed form and can be computed analytically. The denominator can be estimated by its kernel-density estimator based on the simulated samples of θ through a posterior sampler. Based on marginal likelihoods, the Bayes factor of model A against model B is defined as LA(y)/LB(y), which can be used to decide whether model A or B is preferred, along with its degree of preference [37].
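A sketch of this identity in Python, with the posterior ordinate estimated by a Gaussian kernel density over the MCMC draws; the interface and the conjugate check in the comments are our own illustration, not the authors' code:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def log_marginal_likelihood(log_lik, log_prior, draws):
    """Chib's identity on the log scale:
    log L(y) = log L(y | theta) + log pi(theta) - log pi(theta | y),
    evaluated at the posterior mean, with the posterior ordinate
    pi(theta | y) estimated by a Gaussian KDE over the MCMC draws."""
    theta_hat = draws.mean(axis=0)
    kde = gaussian_kde(draws.T)          # density estimate of the posterior
    log_post_ord = np.log(kde(theta_hat)[0])
    return log_lik(theta_hat) + log_prior(theta_hat) - log_post_ord
```

The log Bayes factor of semi-metric A against semi-metric B is then simply the difference of the two log marginal likelihoods.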

3.4. Adaptive estimation of error density

In the kernel-density estimation of directly observed data, it has been noted that the leave-one-out estimator is profoundly affected by extreme observations in the sample (e.g. [10]). When the actual error density has sufficiently long tails, the leave-one-out kernel-density estimator with a global bandwidth estimated under the KL divergence is likely to overestimate the tails of the density [55]. To address this issue, Zhang and King [63] and Shang [58] suggested that localized bandwidths should improve the estimation accuracy of the error density, even for a symmetric unimodal distribution. The kernel density estimator with localized bandwidths assigns small bandwidths to the observations in the high-density region and large bandwidths to the observations in the low-density region. The localized error-density estimator can be written as

\[ \hat{f}\left(\varepsilon_i; \tau, \tau_\varepsilon\right) = \frac{1}{n-1}\sum_{\substack{j=1 \\ j\neq i}}^{n} \frac{1}{\tau\left(1 + \tau_\varepsilon |\hat{\varepsilon}_j|\right)}\,\phi\!\left(\frac{\hat{\varepsilon}_i - \hat{\varepsilon}_j}{\tau\left(1 + \tau_\varepsilon |\hat{\varepsilon}_j|\right)}\right), \]

where $\tau(1 + \tau_\varepsilon |\hat{\varepsilon}_j|)$ is the bandwidth assigned to $\hat{\varepsilon}_j$, for $j = 1, 2, \dots, n$, and the vector of parameters is now $(h, \tau, \tau_\varepsilon)$. The adaptive random-walk Metropolis algorithm described in Section 3.2 can be used for sampling the parameters, where the prior density of $\tau_\varepsilon$ is $U(0, 1)$.
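A vectorized Python sketch of this localized estimator (ours, not the authors' code):

```python
import numpy as np
from scipy.stats import norm

def f_localized(resid, tau, tau_eps):
    """Leave-one-out error-density estimate with localized bandwidths
    b_j = tau * (1 + tau_eps * |e_j|): residuals in the tails receive
    larger bandwidths than those in the high-density region."""
    n = len(resid)
    bj = tau * (1.0 + tau_eps * np.abs(resid))   # one bandwidth per residual
    dens = norm.pdf((resid[:, None] - resid[None, :]) / bj[None, :]) / bj[None, :]
    np.fill_diagonal(dens, 0.0)                  # leave-one-out: j != i
    return dens.sum(axis=1) / (n - 1)
```

Setting `tau_eps = 0` recovers the global-bandwidth estimator with $b = \tau$.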

4. Simulation

The principal aim of this section is to illustrate the proposed method through simulated data. We consider two cases in which simulated curves are smooth in Section 4.3 and in which simulated curves are rough in Section 4.5. Using the error criteria in Sections 4.1 and 4.2, we assess the estimation accuracies of the regression function and error density, respectively. In Section 4.4, we conduct a prior sensitivity analysis.

4.1. Criteria for assessing estimation and prediction accuracies of the regression function

To measure the estimation accuracy of the regression function, we first calculate the averaged mean squared error (AMSE) between the true regression function g() and the estimated regression function gˆ(). This is expressed as

\begin{align*}
\text{AMSE} &= E\left[g(X; Z) - \hat{g}(X; Z)\right]^2 = E\left\{E\left[\left(g(X; Z) - \hat{g}(X; Z)\right)^2 \mid X, Z\right]\right\} \\
&\approx \frac{1}{n}\sum_{i=1}^{n} E\left[\left(g(X; Z) - \hat{g}(X; Z)\right)^2 \mid X = X_i, Z = Z_i\right] \\
&\approx \frac{1}{n}\frac{1}{B}\sum_{i=1}^{n}\sum_{s=1}^{B}\left[g(X_i; Z_i) - \hat{g}_s(X_i; Z_i)\right]^2,
\end{align*}

where B = 100 represents the number of replications.

To measure the prediction accuracy of the regression function, we calculate the averaged mean squared prediction error (AMSPE) between the holdout regression function g(Xnew;Znew) and the predicted regression function gˆ(Xnew;Znew). This is expressed as

\[ \text{AMSPE} = E\left[g(X_{\text{new}}; Z_{\text{new}}) - \hat{g}(X_{\text{new}}; Z_{\text{new}})\right]^2 \approx \frac{1}{\eta}\frac{1}{B}\sum_{j=1}^{\eta}\sum_{s=1}^{B}\left[g(X_j; Z_j) - \hat{g}_s(X_j; Z_j)\right]^2, \]

where $\eta$ represents the number of holdout prediction samples.

4.2. Criteria for assessing error-density estimation

To measure the difference between the true error density f(ε) and the estimated error density fˆ(ε), we consider two risk functions. First, we calculate the mean integrated squared error (MISE), which is given by

\[ \text{MISE}\left[\hat{f}(\varepsilon)\right] = \int_{\varepsilon=a}^{b}\left[f(\varepsilon) - \hat{f}(\varepsilon)\right]^2 \mathrm{d}\varepsilon, \]

for $\varepsilon \in [a, b]$. For each replication, the MISE can be approximated at 1,001 grid points bounded within a closed interval, such as $[-10, 10]$. This can be expressed as

\[ \text{MISE}\left[\hat{f}(\varepsilon)\right] \approx \frac{1}{50}\sum_{i=1}^{1001}\left[f\!\left(-10 + \frac{i-1}{50}\right) - \hat{f}\!\left(-10 + \frac{i-1}{50}\right)\right]^2. \]

From an average over 100 replications, the averaged mean integrated squared error (AMISE) is used to assess the estimation accuracy of error density. The AMISE is defined as

\[ \text{AMISE} = \frac{1}{B}\sum_{b=1}^{B}\text{MISE}_b. \]

The second risk function is the KL divergence, given by

\[ \text{KL}\left(\hat{f}, f\right) = E\left[\int_{\varepsilon=a}^{b} f(\varepsilon)\ln\!\left(\frac{f(\varepsilon)}{\hat{f}(\varepsilon)}\right)\mathrm{d}\varepsilon\right]. \]

With $f$ fixed in the simulation, $\text{KL}(\hat{f}, f)$ is minimized when $\hat{f} = f$ [40].
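Both risk functions can be approximated on the grid described above; a Python sketch, assuming the true and estimated densities are supplied as vectorized functions (the interface is ours):

```python
import numpy as np

def mise_kl(f_true, f_hat, a=-10.0, b=10.0, m=1001):
    """Grid approximations of the two risk functions on [a, b]:
    MISE ~ delta * sum (f - f_hat)^2 and KL ~ delta * sum f * ln(f / f_hat),
    where delta = (b - a) / (m - 1) is the grid spacing (1/50 for
    1,001 points on [-10, 10])."""
    grid = np.linspace(a, b, m)
    delta = (b - a) / (m - 1)
    f, fh = f_true(grid), f_hat(grid)
    mise = delta * np.sum((f - fh) ** 2)
    mask = f > 0                     # the KL integrand vanishes where f = 0
    kl = delta * np.sum(f[mask] * np.log(f[mask] / fh[mask]))
    return mise, kl
```

Both quantities are zero when the two densities coincide on the grid, and positive otherwise, matching the minimization property noted above.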

4.3. Simulation of smooth curves

We describe the construction of the simulated data. First, we build simulated discretized curves as

\[ X_i(t_j) = a_i\cos(2t_j) + b_i\sin(4t_j) + c_i\left(t_j^2 - \pi t_j + \tfrac{2}{9}\pi^2\right), \quad i = 1, 2, \dots, n, \tag{4} \]

where $0 \leq t_1 < t_2 < \dots < t_{100} \leq \pi$ are equispaced points within the function support range; $a_i$, $b_i$, $c_i$ are independently drawn from a uniform distribution on $[0, 1]$; and $n$ represents the sample size. The functional form of (4) is taken from Ferraty et al. [25] and Shang [55,57]. Figure 1 presents a replication of 100 simulated smooth curves, along with the first-order derivative of the curves approximated by B-splines.

Figure 1. 100 simulated smooth curves. (a) Raw functional curves and (b) First-order derivative of the curves.

Once the curves are formed, we define Z as the first-order derivative of the curves (see also [45]). We compute the response variable through the following steps:

  1. Construct a regression-function operator g, which performs the mapping from function-valued space to real-valued space. The regression-function operator g is expressed as
    \[ g(X_i; Z_i) = 10 \times \left(a_i^2 - b_i^2\right). \]
  2. Generate ε1,ε2,,εn, which are independently drawn from mixture normal distributions. To highlight the possible non-normality of the error density, we consider three different error densities previously introduced by Marron and Wand [44]:
    1. a unimodal symmetric distribution, such as $t_5$;
    2. a skewunimodal distribution: $0.2\,N(0, 1) + 0.2\,N\!\left(0.5, \left(\tfrac{2}{3}\right)^2\right) + 0.6\,N\!\left(\tfrac{13}{12}, \left(\tfrac{5}{9}\right)^2\right)$;
    3. a skewbimodal distribution: $0.75\,N(0, 1) + 0.25\,N\!\left(1.5, \left(\tfrac{1}{3}\right)^2\right)$.
  3. Compute the corresponding responses: yi=g(Xi;Zi)+εi, for i=1,2,,n.
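The data-generating steps above can be sketched in Python; the constants follow (4) and Step 1 as reconstructed here, and only the $t_5$ error case is generated (the function name and defaults are ours):

```python
import numpy as np

def simulate_smooth(n=100, p=100, seed=1):
    """Simulated design of Section 4.3 (a sketch):
    X_i(t_j) = a_i cos(2 t_j) + b_i sin(4 t_j)
               + c_i (t_j^2 - pi t_j + (2/9) pi^2) on [0, pi],
    g(X_i; Z_i) = 10 (a_i^2 - b_i^2), and y_i = g + eps_i with t5 errors."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, np.pi, p)                 # equispaced support points
    a, b, c = rng.uniform(0.0, 1.0, size=(3, n))   # iid U(0,1) coefficients
    X = (a[:, None] * np.cos(2.0 * t)
         + b[:, None] * np.sin(4.0 * t)
         + c[:, None] * (t ** 2 - np.pi * t + (2.0 / 9.0) * np.pi ** 2))
    g = 10.0 * (a ** 2 - b ** 2)                   # regression-function operator
    y = g + rng.standard_t(df=5, size=n)           # case 1: t5 error density
    return t, X, g, y
```

The derivative covariate $Z$ would then be obtained by differentiating B-spline fits of the rows of `X`, as in Section 2.2.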

4.3.1. Estimating the regression function

For a given data triplet $(X, Z, y)$ and a bandwidth $h$, we compute the discrepancy between $g(\cdot)$ and $\hat{g}(\cdot)$ using the following Monte-Carlo scheme:

  • Build 100 replications $\left[(X_i^s, Z_i^s, y_i^s)_{i=1,\dots,n}\right]_{s=1,\dots,100}$;

  • Compute the 100 discrepancies $\left[g(\cdot) - \hat{g}_s(\cdot)\right]_{s=1,\dots,100}$, where $\hat{g}_s(\cdot)$ is the functional NW estimator of the regression function computed over the $s$th replication;

  • Obtain the AMSE and AMSPE by averaging over 100 replications of MSE and MSPE.

Table 1 presents the AMSEs and AMSPEs for the estimated conditional mean, where the error density is estimated by a kernel error density with a global bandwidth and localized bandwidths. The functional partial linear model considers two function-valued random variables, X and Z. In contrast, functional principal component regression and functional nonparametric regression models consider only the original functional curves, X. Thus, the functional partial linear model produces better estimation and forecast accuracies than functional principal component regression and functional nonparametric regression. It is also advantageous to estimate the error density with localized bandwidths to achieve the best estimation and forecast accuracies of the regression function. An exception is the skewbimodal error density, where the difference between the two bandwidth procedures is quite small. Among the three types of semi-metrics, the AMSEs are smaller for the two semi-metrics based on derivatives, compared with the semi-metric based on FPCA. Following the early work by Shang [57], we retain three principal components, which appear sufficient to capture the primary mode of variation in this example. The semi-metric based on the second derivative has the smallest AMSE and AMSPE among the three semi-metrics.

Table 1. Comparisons of the estimation and forecast accuracies of the regression function for the functional partial linear model, functional principal component regression, and functional nonparametric regression for the three choices of semi-metric.
    Type of semi-metric
    1st derivative 2nd derivative FPCA (K=3)
  FPCR FPLMG FPLML FNP FPLMG FPLML FNP FPLMG FPLML FNP
t5
AMSE 1.148 1.039 0.993 5.382 1.032 0.969 5.039 1.164 1.097 5.713
AMSPE 1.332 1.361 1.285 5.247 1.329 1.235 5.055 1.541 1.452 5.484
Skewunimodal
AMSE 1.600 1.237 1.216 5.847 1.148 1.136 5.353 1.337 1.320 6.182
AMSPE 1.783 1.366 1.371 5.746 1.313 1.297 5.383 1.551 1.519 5.980
Skewbimodal
AMSE 1.235 1.010 1.006 5.450 0.971 0.973 5.017 1.130 1.116 5.771
AMSPE 1.416 1.213 1.199 5.334 1.144 1.145 5.041 1.363 1.353 5.569

Notes: The subscripts G and L denote a global bandwidth and localized bandwidths, respectively, for estimating the error density. The smallest errors are highlighted in bold.

Figure 2 presents the log marginal likelihoods (LMLs) for the semi-metrics in all three error densities considered. Again, the semi-metric based on the second derivative has the largest LMLs and is thus considered the optimal semi-metric in this example.

Figure 2. Comparisons of the LMLs in the functional partial linear model with a global bandwidth and localized bandwidths for the three choices of semi-metric. The semi-metric based on the first derivative is shown in white, the semi-metric based on the second derivative in dark grey, and the semi-metric based on FPCA in light grey. (a) A global bandwidth and (b) Localized bandwidths.

4.3.2. Estimating the error density

With a set of residuals and a fixed residual bandwidth, it is possible to apply a univariate kernel-density estimator and compute the discrepancy between f(ε) and fˆ(ε) by using the following Monte-Carlo scheme:

  • Compute 100 replications of residuals $\left[y_i^s - \hat{g}_s(\cdot)\right]_{s=1,\dots,100}$;

  • Apply a univariate kernel density to estimate error density, where the residual bandwidth is estimated by the Bayesian method for 100 replications;

  • Compute the MISE and KL divergence between the true error density $f(\varepsilon)$ and the estimated error density $\hat{f}_s(\varepsilon)$, for $s = 1, 2, \dots, 100$;

  • Obtain the AMISE and averaged KL divergence by averaging over 100 replications of MISE and KL divergence.

Table 2 presents AMISEs and averaged KL divergences for the kernel error density with bandwidths estimated by Bayesian methods with a global bandwidth and localized bandwidths. For the three error densities, the kernel error density with localized bandwidths produces smaller AMISEs and smaller averaged KL divergences than the kernel error density with a global bandwidth.

Table 2. AMISEs and averaged KL divergences for the error-density estimation in the functional partial linear model with a global bandwidth and localized bandwidths for the three choices of semi-metric.
  Type of semi-metrics (G) Type of semi-metrics (L)
f(ε) 1st derivative 2nd derivative FPCA (K=3) 1st derivative 2nd derivative FPCA (K=3)
AMISE
t5 0.0015 0.0015 0.0019 0.0008 0.0008 0.0008
skewunimodal 0.0088 0.0084 0.0088 0.0074 0.0073 0.0075
skewbimodal 0.0026 0.0024 0.0026 0.0024 0.0022 0.0026
Averaged KL divergence
t5 0.1427 0.1421 0.1957 0.1299 0.1272 0.1391
skewunimodal 0.6205 0.5486 0.6737 0.4996 0.4636 0.5695
skewbimodal 0.4247 0.3905 0.4736 0.2990 0.2756 0.4112

Notes: The subscripts G and L denote a global bandwidth and localized bandwidths, respectively, for estimating the error density. The smallest errors are highlighted in bold.

4.4. Diagnostic check

As a demonstration with one replication, we plot the MCMC sample paths of the parameters on the left panel of Figure 3 and the autocorrelation functions (ACFs) of these sample paths on the right panel of Figure 3. With the t5 error density, these plots demonstrate that the sample paths are mixed reasonably well. Table 3 summarizes the ergodic averages, 95% Bayesian credible intervals (CIs), standard error (SE), batch mean SE, and simulation inefficiency factor (SIF) values. Considered previously by Kim et al. [38] and Meyer and Yu [46], the SIF can be interpreted as the number of draws needed to have iid observations. Based on the SIF values, we find that the simulation chain converges very well.

Figure 3. MCMC sample paths and ACFs of the sample paths with t5 error density.

Table 3. MCMC results of the bandwidth estimation under the prior density of IG(α=1,β=0.05) with t5 error density.

Prior density: IG(α=1,β=0.05)      
Parameter Mean 95% Bayesian CIs SE Batch-mean SE SIF
h 0.4656 (0.2136, 0.7026) 0.1217 0.3175 6.80
b 0.5000 (0.3377, 0.7078) 0.0967 0.2300 5.65

4.5. Simulation of rough curves

In this simulation study, we consider the same functional form as given in (4), but add one extra variable $d_j \sim U(-0.1, 0.1)$ in the construction of the curves. This functional form is taken from Shang [57, Section 4.2]. Figure 4 presents the simulated curves for one replication, along with the first-order derivative of the curves.

Figure 4. 100 simulated rough curves. (a) Raw functional curves and (b) The first-order derivative of the curves.

For the smooth curves examined in Section 4.3, we found the difference in LML between a global bandwidth and localized bandwidths to be quite small for the functional partial linear model (Figure 2). Therefore, in Figure 5, we report only the LMLs for the functional partial linear model with a global bandwidth. The LMLs are computed for the three semi-metrics in all three error densities considered, and the overall optimal semi-metric is determined based on 100 replications. With the median LML, the semi-metric based on the first derivative has the largest LML for the t5 error density, followed closely by the semi-metric based on the FPCA with three retained components. In the skewunimodal and skewbimodal error densities, the semi-metric based on the FPCA gives the largest LML and is thus considered the optimal semi-metric in these two error densities.

Figure 5. Comparisons of the LMLs in the functional partial linear model with a global bandwidth for the three choices of semi-metric. The semi-metric based on the first derivative is shown in white, the semi-metric based on the second derivative in dark grey, and the semi-metric based on FPCA in light grey.

5. Spectrometric data analysis

This dataset focuses on the prediction of the fat content in meat samples based on near-infrared (NIR) absorbance spectra. The dataset was obtained from http://lib.stat.cmu.edu/datasets/tecator and has been studied by many researchers (e.g. [4,27]). Each food sample contains finely chopped pure meat with a percentage of fat content. For each unit $i$ (among 215 pieces of finely chopped meat), we observe one spectrometric curve, denoted by $X_i$, which measures the absorbance at a grid of 100 wavelengths, $X_i = [X_i(t_1), \dots, X_i(t_{100})]$. For each unit $i$, we take $Z_i$ to be the first-order derivative of $X_i$ and observe the fat content $y \in \mathbb{R}$ obtained by analytical chemical processing. Given a new spectrometric curve and its first-order derivative $(X_{\text{new}}, Z_{\text{new}})$, we aim to predict the corresponding fat content $y_{\text{new}}$. A graphical display of the spectrometric curves is presented in Figure 6.

Figure 6.

Graphical display of spectrometric curves.

To assess the point forecast accuracy of the functional partial linear model, we split the original sample into two subsamples (see also [27, p. 105]). The first subsample, named the 'learning sample', contains the first 160 units {X_i, Z_i, y_i}, i = 1, 2, ..., 160. The second subsample, named the 'testing sample', contains the last 55 units {X_l, Z_l, y_l}, l = 161, ..., 215. The learning sample allows us to build the regression model with the optimal smoothing parameter; the testing sample allows us to evaluate the prediction accuracy.

To measure estimation and prediction accuracies, we consider the root mean squared error (RMSE) and root mean squared prediction error (RMSPE). The errors are expressed as

\[
\text{RMSE} = \sqrt{\frac{1}{160}\sum_{\omega=1}^{160}\big(y_\omega - \hat{y}_\omega\big)^2},
\qquad
\text{RMSPE} = \sqrt{\frac{1}{55}\sum_{\delta=161}^{215}\big(y_\delta - \hat{y}_\delta\big)^2}.
\]
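Under the 160/55 split described above, the two error measures can be sketched as follows; the responses and predictions are simulated stand-ins for the fat content and the fitted model's output.

```python
import numpy as np

# Hypothetical responses and predictions standing in for the Tecator fat
# content and a fitted model's output; only the error formulas are the point.
rng = np.random.default_rng(0)
y = rng.normal(18.0, 5.0, 215)           # fat content (%) for all 215 units
y_hat = y + rng.normal(0.0, 1.5, 215)    # predictions with some noise

# Learning sample: first 160 units; testing sample: last 55 units
rmse = np.sqrt(np.mean((y[:160] - y_hat[:160]) ** 2))
rmspe = np.sqrt(np.mean((y[160:] - y_hat[160:]) ** 2))
```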

For comparison, we also consider the functional principal component regression examined by Reiss and Ogden [53] and the functional nonparametric regression examined by Ferraty and Vieu [27].

As presented in Table 4, the functional partial linear model produces the smallest RMSE and RMSPE, in particular when the semi-metric is chosen based on the second derivative. The semi-metric based on the second derivative not only has the smallest RMSE and RMSPE, but it also has the largest LML.

Table 4. Estimation and forecast accuracies for the functional partial linear model (FPLM), functional principal component regression (FPCR), and functional nonparametric regression (FNP).

                         1st derivative        2nd derivative        FPCA (3 components)
Criterion   FPCR     FPLM       FNP        FPLM       FNP        FPLM       FNP
RMSE        8.0832   2.4281     4.6014     1.5993     2.2899     7.3236     10.8779
RMSPE       9.1156   1.9191     4.7755     1.4075     1.9429     8.3378     11.2603
LML                  −360.21               −291.37               −524.48

Notes: The smallest error measures and largest LML are shown in bold.

To verify the model adequacy, we present several scatter plots of the holdout responses against the predicted responses in Figure 7. Among the three regression models, the functional partial linear model provides the best prediction accuracy.

Figure 7.

Diagnostic check of the three functional regression models.

Because the original functional data are iid, we can sample with replacement to obtain a set of bootstrap samples. For each bootstrap sample, we use the first 160 pairs of data for estimation and the remaining 55 pairs for assessing the prediction accuracy. By repeating this procedure 100 times, we obtain the averaged RMSEs and RMSPEs presented in Table 5. The functional partial linear model produces more accurate forecasts than functional principal component regression and functional nonparametric regression, in particular when the semi-metric is based on the second derivative. This optimal selection of semi-metric is further confirmed by the boxplot of LMLs presented in Figure 8.
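The bootstrap procedure can be sketched as follows; a trivial mean predictor stands in for the functional partial linear model, and all data are simulated.

```python
import numpy as np

# Sketch of the iid bootstrap: resample the 215 (curve, response) pairs with
# replacement, fit on the first 160, evaluate on the remaining 55, and repeat
# 100 times. A mean predictor is a placeholder for the actual model fit.
rng = np.random.default_rng(1)
y = rng.normal(18.0, 5.0, 215)            # hypothetical fat content (%)

rmspes = []
for _ in range(100):                      # 100 bootstrap replications
    idx = rng.integers(0, 215, size=215)  # sample with replacement
    yb = y[idx]
    fit = yb[:160].mean()                 # placeholder for model estimation
    pred = np.full(55, fit)               # predictions on the testing sample
    rmspes.append(np.sqrt(np.mean((yb[160:] - pred) ** 2)))

avg_rmspe = float(np.mean(rmspes))
```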

Table 5. Estimation and forecast accuracy averaged over 100 replications for the functional partial linear model (FPLM), functional principal component regression (FPCR), and functional nonparametric regression (FNP).

                         1st derivative        2nd derivative        FPCA (3 components)
Criterion   FPCR     FPLM       FNP        FPLM       FNP        FPLM       FNP
RMSE        8.1011   2.2858     4.6449     1.5269     2.2343     7.3675     10.7080
RMSPE       8.2234   2.3100     4.5655     1.6259     2.1947     7.8725     10.8047

Notes: The smallest error measures are shown in bold.

Figure 8.

Comparisons of the LMLs in the functional partial linear model with a global bandwidth for the three choices of semi-metric.

There has been rapid development in functional regression models. Here, we consider a large set of functional regression models collected in Goldsmith and Scheipl [34] and Ferraty and Vieu [29]. These models include: (1) restricted maximum likelihood-based functional linear model with a locally adaptive penalty [12]; (2) penalized functional regression [33]; (3) functional principal component regression on the first few functional principal components [53]; (4) linear model on the first K functional principal components, where optimal K is estimated by 20-fold bootstrap [51]; (5) REML-based single-index signal regression with locally adaptive penalty [61]; (6) cross-validation based single-index signal regression [21]; (7) penalized partial least squares [39]; (8) least absolute shrinkage and selection operator penalized linear model on the first few functional principal components [30]; (9) functional nonparametric regression with NW estimator for estimating conditional mean [27]; (10) functional nonparametric regression with NW estimator for estimating conditional median [41]; (11) functional nonparametric regression with NW estimator for estimating conditional mode [24]; (12) functional nonparametric regression with k nearest neighbor estimator [11]; (13) functional nonparametric regression with most-predictive design points [23]; (14) functional partial linear model.

The boxplot of RMSPEs is presented in Figure 9. The functional partial linear model with the semi-metric and bandwidth selected by the Bayesian method has the smallest RMSPE among all methods considered.

Figure 9.

Comparisons of forecast accuracy for the 14 functional regression models listed above.

A by-product of the Bayesian method is that a pointwise prediction interval can be computed nonparametrically. To do so, we first compute the cumulative distribution function of the error density over a set of grid points within a range, such as between −10 and 10. We then invert the cumulative distribution function and find the two grid points closest to the 10% and 90% quantiles. The 80% pointwise prediction interval for a holdout sample is obtained by adding these two grid points to the point forecast. In Figure 10, the holdout fat content in percentage is presented as diamond-shaped dots, the point forecasts of the fat content are presented as round circles, and the 80% pointwise prediction intervals are presented as vertical bars. At the nominal coverage probability of 80%, the empirical coverage probability is 87%; at the nominal coverage probability of 50%, the empirical coverage probability is 51%.
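The interval construction just described can be sketched as follows; the residuals, bandwidth, and point forecast are hypothetical stand-ins, and a Gaussian kernel density plays the role of the estimated error density.

```python
import numpy as np

# Sketch of the nonparametric pointwise prediction interval: kernel-density
# estimate of residuals on a grid over [-10, 10], its CDF, inversion at the
# 10% and 90% quantiles, and adding those grid points to a point forecast.
rng = np.random.default_rng(2)
resid = rng.standard_t(5, 160)            # hypothetical learning-sample residuals
h = 0.4                                   # kernel bandwidth (assumed)

grid = np.linspace(-10.0, 10.0, 2001)
# Gaussian-kernel density of the residuals evaluated on the grid
dens = np.exp(-0.5 * ((grid[:, None] - resid[None, :]) / h) ** 2).sum(axis=1)
dens /= resid.size * h * np.sqrt(2.0 * np.pi)

cdf = np.cumsum(dens)
cdf /= cdf[-1]                            # normalize so the CDF ends at 1
lo = grid[np.searchsorted(cdf, 0.10)]     # grid point nearest the 10% quantile
hi = grid[np.searchsorted(cdf, 0.90)]     # grid point nearest the 90% quantile

point_forecast = 17.3                     # hypothetical point forecast
interval = (point_forecast + lo, point_forecast + hi)   # 80% prediction interval
```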

Figure 10.

A plot of predicted fat content in percentage and their 80% pointwise prediction intervals. The actual holdout data are presented as diamond-shaped dots; the point forecasts of the fat content are presented as circles; the 80% pointwise prediction intervals are presented as vertical bars.

6. Conclusion

We propose a Bayesian method to estimate optimal bandwidths in a functional partial linear model with homoscedastic errors and unknown error density. As a by-product of the MCMC sampling algorithm, we can compute the log marginal likelihood for each semi-metric. The optimal semi-metric is the one that provides the largest log marginal likelihood, as is evident from the results in Table 4. Because no closed-form expression for our bandwidth estimator exists, establishing its mathematical properties, such as the asymptotic optimality of Shibata [60], is difficult; instead, we obtain an approximate solution to the bandwidth estimator through MCMC. As a by-product of the Bayesian method, a prediction interval can be obtained, and the marginal likelihood can also be used to determine the optimal choice of semi-metric among a set of semi-metrics.

Through a series of simulation studies, we find that the functional partial linear model achieves better estimation and forecast accuracies than functional principal component regression and functional nonparametric regression. For a set of smooth curves, we further confirm the optimality of the semi-metric based on the second derivative; for a set of rough curves, we affirm that the overall optimal semi-metric is the one based on the FPCA, followed closely by the semi-metric based on the first derivative. The kernel density estimator with localized bandwidths performs similarly to the one with a global bandwidth for estimating the regression function, and outperforms it for estimating the error density.

Using the spectroscopy dataset, the functional partial linear model produces the smallest forecast error among several commonly used functional regression models. The Bayesian method not only allows the optimal semi-metric to be determined via the marginal likelihood, but also allows the nonparametric construction of pointwise prediction intervals for measuring prediction uncertainty.

There are many ways in which the Bayesian method and functional partial linear model can be extended, and we briefly outline five:

  1. Combine different semi-metrics with weights based on their marginal likelihoods; this leads to the idea of ensemble forecasting.

  2. Consider other functional regression estimators, such as the functional local-linear estimator of Benhenni et al. [7] or the k-nearest neighbor estimator of Burba et al. [11] – the functional local-linear estimator can improve the estimation accuracy of the regression function by using a higher-order kernel function; the k-nearest neighbor estimator considers the local structure of the data and gives better forecasts when the functional data are heterogeneously concentrated.

  3. Extend to a functional partial linear model with heterogeneous errors; the covariate-dependent variance can be modeled by another kernel-density estimator (e.g. [14]).

  4. Extend to a functional partial linear model with autoregressive errors (e.g. [62]).

  5. Consider a J test of Davidson and MacKinnon [18] for selecting non-nested functional regression models.
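Extension (1) can be sketched directly: each semi-metric receives a weight proportional to the exponential of its log marginal likelihood, computed stably by subtracting the maximum. The LML values below are taken from Table 4.

```python
import numpy as np

# LML-based ensemble weights for the three semi-metrics
# (1st derivative, 2nd derivative, FPCA), using the LMLs reported in Table 4.
lml = np.array([-360.21, -291.37, -524.48])
w = np.exp(lml - lml.max())   # subtract the max for numerical stability
w /= w.sum()                  # normalize so the weights sum to one
```

Here the 2nd-derivative semi-metric receives essentially all the weight, so the ensemble forecast would coincide with that model's forecast; with more closely matched LMLs, the weights would genuinely blend the candidate models.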

Acknowledgments

The author would like to thank a reviewer for insightful comments and suggestions, which substantially improved an earlier version of the paper.

Funding Statement

The author acknowledges a faculty research grant from the College of Business and Economics at the Australian National University (R62860 I704).

Disclosure statement

No potential conflict of interest was reported by the author(s).

ORCID

Han Lin Shang  http://orcid.org/0000-0003-1769-6430

References

  • 1.Ahmad I. and Li Q., Testing symmetry of an unknown density function by kernel method, J. Nonparametr. Stat. 7 (1997), pp. 279–293. doi: 10.1080/10485259708832704 [DOI] [Google Scholar]
  • 2.Akritas M.G. and Van Keilegom I., Non-parametric estimation of the residual distribution, Scand. J. Statist. 28 (2001), pp. 549–567. doi: 10.1111/1467-9469.00254 [DOI] [Google Scholar]
  • 3.Aneiros-Pérez G., Ferraty F. and Vieu P., Variable selection in partial linear regression with functional covariate, Statist.: A J. Theor. Appl. Statist. 49 (2015), pp. 1322–1347. doi: 10.1080/02331888.2014.998675 [DOI] [Google Scholar]
  • 4.Aneiros-Pérez G. and Vieu P., Semi-functional partial linear regression, Stat. Probab. Lett. 76 (2006), pp. 1102–1110. doi: 10.1016/j.spl.2005.12.007 [DOI] [Google Scholar]
  • 5.Aneiros-Pérez G. and Vieu P., Partial linear modelling with multi-functional covariates, Comput. Stat. 30 (2015), pp. 647–671. doi: 10.1007/s00180-015-0568-8 [DOI] [Google Scholar]
  • 6.Barrientos-Marin J., Ferraty F. and Vieu P., Locally modelled regression and functional data, J. Nonparametr. Stat. 22 (2010), pp. 617–632. doi: 10.1080/10485250903089930 [DOI] [Google Scholar]
  • 7.Benhenni K., Ferraty F., Rachdi M., and Vieu P., Local smoothing regression with functional data, Comput. Stat. 22 (2007), pp. 353–369. doi: 10.1007/s00180-007-0045-0 [DOI] [Google Scholar]
  • 8.Berlinet A., Elamine A., and Mas A., Local linear regression for functional data, Ann. Inst. Stat. Math. 63 (2011), pp. 1047–1075. doi: 10.1007/s10463-010-0275-8 [DOI] [Google Scholar]
  • 9.Boj E., Delicado P., and Fortiana J., Distance-based local linear regression for functional predictors, Comput. Stat. Data Anal. 54 (2010), pp. 429–437. doi: 10.1016/j.csda.2009.09.010 [DOI] [Google Scholar]
  • 10.Bowman A.W., An alternative method of cross-validation for the smoothing of density estimates, Biometrika 71 (1984), pp. 353–360. doi: 10.1093/biomet/71.2.353 [DOI] [Google Scholar]
  • 11.Burba F., Ferraty F., and Vieu P., k-nearest neighbour method in functional nonparametric regression, J. Nonparametr. Stat. 21 (2009), pp. 453–469. doi: 10.1080/10485250802668909 [DOI] [Google Scholar]
  • 12.Cardot H., Ferraty F., and Sarda P., Spline estimators for the functional linear model, Stat. Sin. 13 (2003), pp. 571–591. [Google Scholar]
  • 13.Chen H., Smyth R., and Zhang X., A Bayesian sampling approach to measuring the price responsiveness of gasoline demand using a constrained partially linear model, Energy Econ. 67 (2017), pp. 346–354. doi: 10.1016/j.eneco.2017.08.029 [DOI] [Google Scholar]
  • 14.Chen L.-H., Cheng M.-Y., and Peng L., Conditional variance estimation in heteroscedastic regression models, J. Stat. Plan. Inference 139 (2009), pp. 236–245. doi: 10.1016/j.jspi.2008.04.020 [DOI] [Google Scholar]
  • 15.Cheng F. and Sun S., A goodness-of-fit test of the errors in nonlinear autoregressive time series models, Stat. Probab. Lett. 78 (2008), pp. 50–59. doi: 10.1016/j.spl.2007.05.003 [DOI] [Google Scholar]
  • 16.Chernozhukov V. and Hong H., An MCMC approach to classical estimation, J. Econom. 115 (2003), pp. 293–346. doi: 10.1016/S0304-4076(03)00100-3 [DOI] [Google Scholar]
  • 17.Chib S., Marginal likelihood from the Gibbs output, J. Amer. Statist. Assoc.: Theory Method 90 (1995), pp. 1313–1321. doi: 10.1080/01621459.1995.10476635 [DOI] [Google Scholar]
  • 18.Davidson R. and MacKinnon J.G., Several tests for model specification in the presence of alternative hypotheses, Econometrica 49 (1981), pp. 781–793. doi: 10.2307/1911522 [DOI] [Google Scholar]
  • 19.de Boor C., A Practical Guide to Splines, Revised ed., volume 27 of Applied Mathematical Sciences. Springer, New York, 2001.
  • 20.Efromovich S., Estimation of the density of regression errors, Ann. Stat. 33 (2005), pp. 2194–2227. doi: 10.1214/009053605000000435 [DOI] [Google Scholar]
  • 21.Eilers P.H.C., Li B., and Marx B.D., Multivariate calibration with single-index signal regression, Chemometr. Intell. Lab. Syst. 96 (2009), pp. 196–202. doi: 10.1016/j.chemolab.2009.02.001 [DOI] [Google Scholar]
  • 22.Escanciano J.C. and Jacho-Chávez D.T., √n uniformly consistent density estimation in nonparametric regression models, J. Econom. 167 (2012), pp. 305–316. doi: 10.1016/j.jeconom.2011.09.017 [DOI] [Google Scholar]
  • 23.Ferraty F., Hall P., and Vieu P., Most-predictive design points for functional data predictors, Biometrika 97 (2010), pp. 807–824. doi: 10.1093/biomet/asq058 [DOI] [Google Scholar]
  • 24.Ferraty F., Laksaci A., and Vieu P., Functional time series prediction via conditional mode estimation, C. R. Acad. Sci. Paris Ser. I 340 (2005), pp. 389–392. doi: 10.1016/j.crma.2005.01.016 [DOI] [Google Scholar]
  • 25.Ferraty F., Van Keilegom I., and Vieu P., On the validity of the bootstrap in non-parametric functional regression, Scand. J. Statist. 37 (2010), pp. 286–306. doi: 10.1111/j.1467-9469.2009.00662.x [DOI] [Google Scholar]
  • 26.Ferraty F. and Vieu P., The functional nonparametric model and application to spectrometric data, Comput. Stat. 17 (2002), pp. 545–564. doi: 10.1007/s001800200126 [DOI] [Google Scholar]
  • 27.Ferraty F. and Vieu P., Nonparametric Functional Data Analysis: Theory and Practice, Springer, New York, 2006. [Google Scholar]
  • 28.Ferraty F. and Vieu P., Additive prediction and boosting for functional data, Comput. Stat. Data Anal. 53 (2009), pp. 1400–1413. doi: 10.1016/j.csda.2008.11.023 [DOI] [Google Scholar]
  • 29.Ferraty F. and Vieu P., Richesse et complexité des données fonctionnelles, Revue MODULAD 43 (2011), pp. 25–43. [Google Scholar]
  • 30.Friedman J., Hastie T., and Tibshirani R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010), pp. 1–22. doi: 10.18637/jss.v033.i01 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Garthwaite P.H., Fan Y., and Sisson S.A., Adaptive optimal scaling of Metropolis-Hastings algorithms using the Robbins-Monro process, Comm Statist Theory Methods 45 (2016), pp. 5098–5111. doi: 10.1080/03610926.2014.936562 [DOI] [Google Scholar]
  • 32.Gelfand A.E. and Dey D.K., Bayesian model choice: Asymptotics and exact calculations, J. R. Stat. Soc. Ser. B 56 (1994), pp. 501–514. [Google Scholar]
  • 33.Goldsmith J., Bobb J., Crainiceanu C.M., Caffo B., and Reich D., Penalized functional regression, J. Comput. Graph. Stat. 20 (2011), pp. 830–851. doi: 10.1198/jcgs.2010.10007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Goldsmith J. and Scheipl F., Estimator selection and combination in scalar-on-function regression, Comput. Stat. Data Anal. 70 (2014), pp. 362–372. doi: 10.1016/j.csda.2013.10.009 [DOI] [Google Scholar]
  • 35.Gourieroux C., Monfort A., and Trognon A., Pseudo maximum likelihood methods: Theory, Econometrica 52 (1984), pp. 681–700. doi: 10.2307/1913471 [DOI] [Google Scholar]
  • 36.Goutis C., Second-derivative functional regression with applications to near infra-red spectroscopy, J. R. Stat. Soc. Ser. B 60 (1998), pp. 103–114. doi: 10.1111/1467-9868.00111 [DOI] [Google Scholar]
  • 37.Kass R.E. and Raftery A.E., Bayes factors, J. Amer. Statist. Assoc.: Rev. Paper 90 (1995), pp. 773–795. doi: 10.1080/01621459.1995.10476572 [DOI] [Google Scholar]
  • 38.Kim S., Shephard N., and Chib S., Stochastic volatility: Likelihood inference and comparison with ARCH models, Rev. Econ. Stud. 65 (1998), pp. 361–393. doi: 10.1111/1467-937X.00050 [DOI] [Google Scholar]
  • 39.Krämer N., Boulesteix A.-L., and Tutz G., Penalized partial least squares with application to B-spline transformations and functional data, Chemometr. Intell. Lab. Syst. 94 (2008), pp. 60–69. doi: 10.1016/j.chemolab.2008.06.009 [DOI] [Google Scholar]
  • 40.Kullback S., Information Theory and Statistics, Wiley, New York, 1959. [Google Scholar]
  • 41.Laksaci A., Lemdani M., and Saïd E.O., Asymptotic results for an L1-norm kernel estimator of the conditional quantile for functional dependent data with application to climatology, Sankhyā 73-A (2011), pp. 125–141. doi: 10.1007/s13171-011-0002-4 [DOI] [Google Scholar]
  • 42.Lian H., Functional partial linear model, J. Nonparametr. Stat. 23 (2011), pp. 115–128. doi: 10.1080/10485252.2010.500385 [DOI] [Google Scholar]
  • 43.Ling N., Aneiros G., and Vieu P., kNN estimation in functional partial linear modeling, Statist. Papers 61 (2020), pp. 423–444. doi: 10.1007/s00362-017-0946-0 [DOI] [Google Scholar]
  • 44.Marron J.S. and Wand M.P., Exact mean integrated squared error, Ann. Stat. 20 (1992), pp. 712–736. doi: 10.1214/aos/1176348653 [DOI] [Google Scholar]
  • 45.Mas A. and Pumo B., Functional linear regression with derivatives, J. Nonparametr. Stat. 21 (2009), pp. 19–40. doi: 10.1080/10485250802401046 [DOI] [Google Scholar]
  • 46.Meyer R. and Yu J., BUGS for a Bayesian analysis of stochastic volatility models, Econom. J. 3 (2000), pp. 198–215. doi: 10.1111/1368-423X.00046 [DOI] [Google Scholar]
  • 47.Muhsal B. and Neumeyer N., A note on residual-based empirical likelihood kernel density estimation, Electron. J. Stat. 4 (2010), pp. 1386–1401. doi: 10.1214/10-EJS586 [DOI] [Google Scholar]
  • 48.Müller U., Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix, Econometrica 81 (2013), pp. 1805–1849. doi: 10.3982/ECTA9097 [DOI] [Google Scholar]
  • 49.Neumeyer N. and Dette H., Testing for symmetric error distribution in nonparametric regression models, Stat. Sin. 17 (2007), pp. 775–795. [Google Scholar]
  • 50.Rachdi M. and Vieu P., Nonparametric regression for functional data: Automatic smoothing parameter selection, J. Stat. Plan. Inference 137 (2007), pp. 2784–2801. doi: 10.1016/j.jspi.2006.10.001 [DOI] [Google Scholar]
  • 51.Ramsay J. and Silverman B., Functional Data Analysis, 2nd ed., Springer, New York, 2005. [Google Scholar]
  • 52.Reiss P.T., Goldsmith J., Shang H.L., and Ogden R.T., Methods for scalar-on-function regression, Int. Statist. Rev. 85 (2017), pp. 228–249. doi: 10.1111/insr.12163 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Reiss P.T. and Ogden R.T., Functional principal component regression and functional partial least squares, J. Am. Stat. Assoc. 102 (2007), pp. 984–996. doi: 10.1198/016214507000000527 [DOI] [Google Scholar]
  • 54.Roberts G.O. and Rosenthal J.S., Examples of adaptive MCMC, J. Comput. Graph. Stat. 18 (2009), pp. 349–367. doi: 10.1198/jcgs.2009.06134 [DOI] [Google Scholar]
  • 55.Shang H.L., Bayesian bandwidth estimation for a nonparametric functional regression model with unknown error density, Comput. Stat. Data Anal. 67 (2013), pp. 185–198. doi: 10.1016/j.csda.2013.05.006 [DOI] [Google Scholar]
  • 56.Shang H.L., Bayesian bandwidth estimation for a nonparametric functional regression model with mixed types of regressors and unknown error density, J. Nonparametr. Stat. 26 (2014), pp. 599–615. doi: 10.1080/10485252.2014.916806 [DOI] [Google Scholar]
  • 57.Shang H.L., Bayesian bandwidth estimation for a semi-functional partial linear regression model with unknown error density, Comput. Stat. 29 (2014), pp. 829–848. doi: 10.1007/s00180-013-0463-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Shang H.L., A Bayesian approach for determining the optimal semi-metric and bandwidth in scalar-on-function quantile regression with unknown error density and dependent functional data, J. Multivar. Anal. 146 (2016), pp. 95–104. doi: 10.1016/j.jmva.2015.06.015 [DOI] [Google Scholar]
  • 59.Shang H.L., Estimation of a functional single index model with dependent errors and unknown error density, Comm Statist Simulation Computation (2020). Available at https://www.tandfonline.com/doi/abs/10.1080/03610918.2018.1535068?journalCode=lssp20. [Google Scholar]
  • 60.Shibata R., An optimal selection of regression variables, Biometrika 68 (1981), pp. 45–54. doi: 10.1093/biomet/68.1.45 [DOI] [Google Scholar]
  • 61.Wood S.N., Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models, J. R. Stat. Soc. Ser. B 73 (2011), pp. 3–36. doi: 10.1111/j.1467-9868.2010.00749.x [DOI] [Google Scholar]
  • 62.You J. and Zhou X., Statistical inference in a panel data semiparametric regression model with serially correlated errors, J. Multivar. Anal. 97 (2006), pp. 844–873. doi: 10.1016/j.jmva.2005.04.005 [DOI] [Google Scholar]
  • 63.Zhang X. and King M.L., Bayesian semiparametric GARCH models, Working Paper 24, Monash University, 2011. Available at https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp24-11.pdf.
  • 64.Zhang X., King M.L., and Shang H.L., Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density, Working Paper 10, Monash University, 2011. Available at https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp10-11.pdf.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
