Author manuscript; available in PMC: 2015 Mar 1.
Published in final edited form as: J Agric Biol Environ Stat. 2014 Mar 1;19(1):57–81. doi: 10.1007/s13253-013-0157-7

Nonlinear Varying Coefficient Models with Applications to Studying Photosynthesis

Esra Kürüm 1, Runze Li 2, Yang Wang 3, Damla Şentürk 4
PMCID: PMC4070621  NIHMSID: NIHMS591009  PMID: 24976756

Abstract

Motivated by a study on factors affecting the level of photosynthetic activity in a natural ecosystem, we propose nonlinear varying coefficient models, in which the relationship between the predictors and the response variable is allowed to be nonlinear. One-step local linear estimators are developed for the nonlinear varying coefficient models and their asymptotic normality is established leading to point-wise asymptotic confidence bands for the coefficient functions. Two-step local linear estimators are also proposed for cases where the varying coefficient functions admit different degrees of smoothness; bootstrap confidence intervals are utilized for inference based on the two-step estimators. We further propose a generalized F test to study whether the coefficient functions vary over a covariate. We illustrate the proposed methodology via an application to an ecology data set and study the finite sample performance by Monte Carlo simulation studies.

Keywords: Generalized F test, Local linear regression, Nonlinear regression model, Varying coefficient models

1 INTRODUCTION

This work was motivated by an empirical study in the field of ecology. It is known that the rate of photosynthesis in an ecosystem is affected by sunlight intensity. The level of photosynthetic activity in a natural ecosystem is measured by the Net Ecosystem Exchange of CO2 (NEE), since leaves absorb carbon dioxide (CO2) during photosynthesis. NEE depends on the amount of Photosynthetically Active Radiation (PAR) available to an ecosystem. It is believed that the relationship between NEE and PAR is nonlinear (Monteith 1972) and follows the model,

$$\mathrm{NEE} = \beta_1 - \frac{\beta_2\,\mathrm{PAR}}{\mathrm{PAR} + \beta_3} + \varepsilon, \tag{1.1}$$

where ε is the random error with mean zero, and β1, β2, β3 are unknown parameters. The coefficient β1 represents the dark respiration rate, that is, the rate of respiration in plants whereby carbon dioxide is released without the aid of sunlight, and β2 represents the light-saturated net photosynthetic rate. The light saturation point refers to the amount of light beyond which the chloroplasts cannot absorb additional light; at this point photosynthesis still occurs, but the amount of light exceeds what the available pigments in the chlorophyll cells can absorb. Finally, β2/β3 is the apparent quantum yield. The quantum yield of photosynthesis is a measure of efficiency that refers to the amount of the products of photosynthesis, such as O2, produced per unit light input (Garrett and Grisham 2005). When the quantum yield is based only on the incident light, that is, the light falling on a surface, and does not take the scattered light into account, it is referred to as the apparent quantum yield, a lower limit of the true quantum yield.
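To make the light-response model concrete, equation (1.1) can be fit by ordinary nonlinear least squares. The sketch below, a plain Gauss-Newton iteration in Python on simulated data, is illustrative only: the parameter values, noise level, and starting values are hypothetical, not taken from the paper's data.

```python
import numpy as np

def light_response(par, b1, b2, b3):
    # Model (1.1): NEE = beta1 - beta2*PAR/(PAR + beta3)
    return b1 - b2 * par / (par + b3)

def jacobian(par, b1, b2, b3):
    # Partial derivatives of the model with respect to (beta1, beta2, beta3)
    d1 = np.ones_like(par)
    d2 = -par / (par + b3)
    d3 = b2 * par / (par + b3) ** 2
    return np.column_stack([d1, d2, d3])

def fit_gauss_newton(par, nee, beta0, n_iter=100):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = nee - light_response(par, *beta)
        step, *_ = np.linalg.lstsq(jacobian(par, *beta), r, rcond=None)
        t = 1.0  # step-halving safeguard against overshooting
        while (np.sum((nee - light_response(par, *(beta + t * step))) ** 2)
               > np.sum(r ** 2) and t > 1e-4):
            t /= 2
        beta = beta + t * step
        if np.max(np.abs(t * step)) < 1e-10:
            break
    return beta

rng = np.random.default_rng(0)
par = rng.uniform(0.0, 2000.0, 400)          # PAR values (umol m^-2 s^-1)
true = np.array([3.0, 20.0, 300.0])          # hypothetical (beta1, beta2, beta3)
nee = light_response(par, *true) + rng.normal(0.0, 0.5, par.size)
est = fit_gauss_newton(par, nee, beta0=[1.0, 10.0, 100.0])
```

With temperature held fixed, this is exactly the laboratory setting in which (1.1) applies; model (1.2) below replaces the constants by functions of T.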

Model (1.1) was obtained in a laboratory where climate variables such as temperature and moisture variability can be controlled. However, when the temperature (T) cannot be controlled, the parameters β1, β2, and β3 are likely to depend on temperature. To demonstrate that the relationship between NEE and PAR depends on temperature, we fit a 2-dimensional kernel regression of NEE (μmol m−2s−1) on PAR (μmol m−2s−1) and temperature (°C). Figure 1(a) gives the contour plot of the regression fit, where three slices of the estimated regression fit are given at three quartiles of the observed temperature values in Figure 1(b). In Figure 1(a), there seems to be a weak temperature effect on the association of NEE and PAR for low values of PAR; however, as PAR increases, the non-parallel pattern demonstrates that the effect of temperature on this relation is getting stronger. The profiles in Figure 1(b) strengthen this observation: with large PAR values, the estimates seem to differ for different temperature values. Hence, we consider the following model allowing the nonlinear regression relation between NEE and PAR to change with temperature:

$$\mathrm{NEE} = \beta_1(T) - \frac{\beta_2(T)\,\mathrm{PAR}}{\mathrm{PAR} + \beta_3(T)} + \varepsilon. \tag{1.2}$$

A detailed analysis of the ecological data set will be given in Section 4.

Figure 1. (a) Contour plot of the 2-dimensional kernel regression fit of NEE on PAR and TEMP; (b) slices from the kernel regression fit at three quartiles of the observed temperature values.

Powerful modeling techniques have been proposed for dealing with ‘curse of dimensionality’ while modeling high dimensional data. Examples include additive models (Breiman and Friedman 1985; Hastie and Tibshirani 1990), low-dimensional interaction models (Friedman 1991; Gu and Wahba 1993; Stone, Hansen, Kooperberg, and Truong 1997), multiple-index models (Härdle and Stoker 1989; Li 1991), and partially linear models (Wahba 1984; Green and Silverman 1994). With the aim of increasing the flexibility of linear regression models and reducing the modeling bias, varying coefficient models

$$Y = X^{T}\beta(U) + \varepsilon, \tag{1.3}$$

were first introduced by Cleveland, Grosse, and Shyu (1992) and became popular in the statistical literature through the work of Hastie and Tibshirani (1993). In (1.3), Y denotes the response variable, X the vector of predictors, and U a scalar covariate, such as temperature. Varying coefficient models explore the dynamic features of the data by allowing the model coefficients to vary over the covariate U. At a fixed value U = u, the regression coefficients in a varying coefficient model can be interpreted in the same way as in a linear regression model.

Model (1.2) motivates us to consider a new class of models that is a natural extension of varying coefficient models, namely, nonlinear varying coefficient models. We define the nonlinear varying coefficient model as

$$Y = f\{X, \beta(U)\} + \varepsilon, \tag{1.4}$$

where f(·, ·) is a pre-specified function and β(U) consists of unknown coefficient functions. The error term satisfies E(ε|X, U) = 0. Similar to the interpretation of a varying coefficient model, at a fixed value U = u, the nonlinear varying coefficient model can be interpreted as a nonlinear regression model.

Estimation in varying coefficient models has been studied extensively. Earlier work on estimation of varying coefficient models includes Wu, Chiang, and Hoover (1998), Hoover, Rice, Wu, and Yang (1998), Fan and Zhang (1999), and Kauermann and Tutz (1999). Fan and Zhang (2008) provide an excellent review of the literature on estimation and inference procedures for varying coefficient models. In this paper, we propose one-step and two-step local linear estimators for nonlinear varying coefficient models using local linear regression techniques (Fan and Gijbels 1996). While the one-step estimators estimate the varying coefficient functions using a single bandwidth, the two-step estimators adapt to cases where the varying coefficient functions admit different degrees of smoothness. Since there is no closed form solution for the local parameters, we propose an iterative linear regression algorithm to search for the local solutions; the proposed estimation procedures can thus be easily carried out. We establish the asymptotic normality of the proposed one-step estimators and derive consistent estimators of their asymptotic standard errors, leading to asymptotic point-wise confidence intervals. Bootstrap percentile confidence intervals are utilized for inference based on the two-step estimators.

It is of interest to test whether the proposed nonlinear varying coefficient model reduces to a regular nonlinear regression model, which amounts to testing whether the varying coefficient functions in (1.4) are in fact varying over the covariate U. In this case the null hypothesis is parametric, while the alternative is nonparametric. For this hypothesis testing problem, we develop a generalized likelihood ratio test whose test statistic asymptotically follows a χ2 distribution and inherits the Wilks phenomenon (Fan, Zhang, and Zhang 2001); that is, the asymptotic null distribution of the test statistic is independent of the nuisance parameters β(U) and the density function of U. Based on the Wilks phenomenon, we utilize a bootstrap procedure to estimate the degrees of freedom of the χ2 distribution of the proposed test statistic (Cai, Fan, and Li 2000). We apply the proposed methodology to a detailed empirical analysis of the aforementioned ecological data set. The efficacy of the proposed algorithms is studied via Monte Carlo simulations.

The paper is organized as follows. In Section 2, we develop one-step and two-step estimation procedures and propose a generalized F test for the nonlinear varying coefficient models. Asymptotic properties of the proposed one-step estimation procedure are also given in this section. Monte Carlo simulation studies are presented in Section 3. A detailed empirical analysis of the ecology data set is given in Section 4. Regularity conditions and technical proofs are given in the supplementary material.

2 STATISTICAL INFERENCE PROCEDURES

Suppose that {Ui,Xi,Yi}, i = 1,2, … ,n is an independent and identically distributed sample from the nonlinear varying coefficient model

$$Y_i = f\{X_i, \beta(U_i)\} + \varepsilon_i,$$

where E(εi|Xi, Ui) = 0 and β(Ui) = {β1(Ui), β2(Ui), …, βp(Ui)}T consists of p unknown nonparametric coefficient functions. At every fixed value U = u, a nonlinear varying coefficient model implies a nonlinear regression model. Hence, its identifiability requires the identifiability of the nonlinear model holding at each fixed U = u, i.e., that there exists no parameter β*(u) ≠ β(u) such that f{x, β*(u)} = f{x, β(u)} for all x (Seber and Wild 2003).

2.1 ONE-STEP ESTIMATION

One approach for estimating the coefficient functions βj(·), j = 1, …, p, is to employ local linear fitting techniques. In what follows, we will use h1 to denote the bandwidth used in obtaining the one-step estimator, and h0 and h2 to denote the initial and final bandwidths used in obtaining the two-step estimators. In order to estimate the nonparametric coefficient functions at a fixed point u0, we locally approximate the functions in a neighborhood of u0 by the Taylor expansion:

$$\beta_j(u) \approx \beta_j(u_0) + \beta_j'(u_0)(u - u_0) \equiv a_j + b_j(u - u_0),$$

for j = 1,…,p. Let a = (a1,…,ap)T and b = (b1,…,bp)T. We minimize the local least squares,

$$\ell(a, b) = \sum_{i=1}^{n} \left[ Y_i - f\{X_i, a + b(U_i - u_0)\} \right]^2 K_{h_1}(U_i - u_0), \tag{2.1}$$

with respect to the local parameters (a, b), where $K_{h_1}(t) = h_1^{-1} K(t/h_1)$ is the scaled kernel function K(·) with bandwidth $h_1$. The estimator of the nonlinear regression coefficients is given by $\hat{\beta}(u_0) = \hat{a}$.

Note that f(X, β) is pre-specified, but may be a nonlinear function of X and β. Thus, there is typically no closed form for $\hat{a}$ and $\hat{b}$. The Newton-Raphson algorithm may be employed to search for the solution. However, based on our limited experience, the Hessian matrix of ℓ(a, b) may not be positive definite during the Newton-Raphson iterations, which poses a challenge in finding the minimizer of the objective function. Hence, we propose the following iterative linear regression algorithm to minimize the local least squares function (2.1). Suppose that (a0, b0) is the value of (a, b) at the current step of the iteration. We approximate f{Xi, a + b(Ui − u0)} in the neighborhood of (a0, b0) using the Taylor expansion,

$$f\{X_i, a + b(U_i - u_0)\} \approx f\{X_i, a_0 + b_0(U_i - u_0)\} + \{(a - a_0) + (b - b_0)(U_i - u_0)\}^{T} f'\{X_i, a_0 + b_0(U_i - u_0)\},$$

where f′(Xi, β) is the p × 1 vector ∂f(Xi, β)/∂β. Denote the values of a and b at the kth iteration by $a^{(k)}$ and $b^{(k)}$. Let

$$F_k = \begin{pmatrix} f'^{T}\{X_1, a^{(k)} + b^{(k)}(U_1 - u_0)\} & (U_1 - u_0)\, f'^{T}\{X_1, a^{(k)} + b^{(k)}(U_1 - u_0)\} \\ \vdots & \vdots \\ f'^{T}\{X_n, a^{(k)} + b^{(k)}(U_n - u_0)\} & (U_n - u_0)\, f'^{T}\{X_n, a^{(k)} + b^{(k)}(U_n - u_0)\} \end{pmatrix}_{n \times 2p},$$

$$Y^{*}_{i,k} = Y_i - f\{X_i, a^{(k)} + b^{(k)}(U_i - u_0)\} + \{a^{(k)} + b^{(k)}(U_i - u_0)\}^{T} f'\{X_i, a^{(k)} + b^{(k)}(U_i - u_0)\},$$

and $Y^{*}_k = (Y^{*}_{1,k}, \ldots, Y^{*}_{n,k})^{T}$. Then, we update (a, b) according to

$$\begin{pmatrix} a^{(k+1)} \\ b^{(k+1)} \end{pmatrix} = (F_k^{T} W F_k)^{-1} F_k^{T} W Y^{*}_k,$$

where $W = \operatorname{diag}\{K_{h_1}(U_1 - u_0), \ldots, K_{h_1}(U_n - u_0)\}$. The solution of this iterative linear regression algorithm satisfies ℓ′(a, b) = 0, and the estimators are given by $\hat{\beta}(u_0) = \hat{a}$ and $\hat{\beta}'(u_0) = \hat{b}$. The proposed algorithm is in fact the Fisher scoring algorithm if we further assume that the error follows a normal distribution. Hence, our algorithm shares the convergence properties of the Fisher scoring algorithm. The asymptotic rate of convergence of the scoring algorithm approaches that of the Newton method as the sample size goes to infinity (Osborne 1992); in other words, the scoring algorithm attains a second-order convergence rate as the sample size increases. We demonstrate that the proposed algorithm converges quickly in the simulation studies.
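The iterative linear regression updates above can be sketched in a few lines. The Python sketch below applies them to the simple two-coefficient model f{X, β(U)} = exp{β1(U) + Xβ2(U)}; the coefficient functions, bandwidth, starting values, and step-halving safeguard are illustrative choices, not the paper's implementation.

```python
import numpy as np

def K_epa(t):
    # Epanechnikov kernel K(t) = 0.75*(1 - t^2)_+
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

def one_step_fit(u0, U, X, Y, h, max_iter=100):
    """Iterative linear regression (Gauss-Newton) solver for the local
    least squares (2.1) with f{X, beta} = exp(beta1 + X*beta2).
    Returns beta_hat(u0) = a_hat."""
    w = K_epa((U - u0) / h) / h
    keep = w > 0
    Uc, Xc, Yc, w = U[keep] - u0, X[keep], Y[keep], w[keep]

    def wss(a, b):  # weighted sum of squares at local parameters (a, b)
        eta = (a[0] + b[0] * Uc) + Xc * (a[1] + b[1] * Uc)
        return np.sum(w * (Yc - np.exp(eta)) ** 2)

    # start a1 at the log of the local mean response; a2 and slopes at zero
    a = np.array([np.log(max(float(np.average(Yc, weights=w)), 1e-6)), 0.0])
    b = np.zeros(2)
    for _ in range(max_iter):
        fit = np.exp((a[0] + b[0] * Uc) + Xc * (a[1] + b[1] * Uc))
        G = np.column_stack([fit, Xc * fit])     # row-wise gradients f'
        Fk = np.hstack([G, Uc[:, None] * G])     # local design F_k, n x 2p
        r = Yc - fit                             # working residuals
        WF = w[:, None] * Fk
        step = np.linalg.solve(Fk.T @ WF, WF.T @ r)
        t = 1.0  # step-halving safeguard against divergence
        while wss(a + t * step[:2], b + t * step[2:]) > np.sum(w * r ** 2) and t > 1e-4:
            t /= 2
        a, b = a + t * step[:2], b + t * step[2:]
        if np.max(np.abs(t * step)) < 1e-9:
            break
    return a

rng = np.random.default_rng(1)
n = 2000
U = rng.uniform(0.0, 1.0, n)
X = rng.normal(0.0, 1.0, n)
beta1, beta2 = np.sin(np.pi * U), 0.5 * np.cos(np.pi * U)   # illustrative truths
Y = np.exp(beta1 + X * beta2) + rng.normal(0.0, 0.1, n)
est = one_step_fit(0.5, U, X, Y, h=0.1)   # true values at u0 = 0.5: (1.0, 0.0)
```

The increment form used here, θ(k+1) = θ(k) + (FᵀWF)⁻¹FᵀW r, is algebraically identical to the update with the working response Y*k above.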

A question that arises in the practical implementation of the proposed procedure is the bandwidth selection. We suggest using a multi-fold cross-validation technique (Geisser 1975). Specifically, we propose minimizing the following cross validation score:

$$\mathrm{CV}(h) = \sum_{j} \| Y_j - \hat{Y}_j \|^2, \tag{2.2}$$

where $\hat{Y}_j$ denotes the fitted values for the jth group of subjects, computed with that group excluded from the fit. The bandwidth that minimizes (2.2) is chosen.
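The multifold cross-validation score (2.2) can be sketched as follows. To keep the example short, the fit is a univariate local linear smoother rather than the full nonlinear varying coefficient fit; the fold assignment, kernel, bandwidth grid, and data-generating model are illustrative assumptions.

```python
import numpy as np

def loclin(u0, U, Y, h):
    # Local linear fit at u0 with Epanechnikov weights; returns the intercept
    t = (U - u0) / h
    w = 0.75 * np.maximum(1.0 - t ** 2, 0.0)
    Z = np.column_stack([np.ones_like(U), U - u0])
    WZ = w[:, None] * Z
    coef = np.linalg.solve(Z.T @ WZ + 1e-10 * np.eye(2), WZ.T @ Y)
    return coef[0]

def cv_score(U, Y, h, n_folds=5):
    # Multifold CV score (2.2): sum over folds of ||Y_j - Yhat_j||^2,
    # where Yhat_j is computed with the j-th fold left out
    idx = np.arange(U.size) % n_folds
    score = 0.0
    for j in range(n_folds):
        tr, te = idx != j, idx == j
        preds = np.array([loclin(u, U[tr], Y[tr], h) for u in U[te]])
        score += np.sum((Y[te] - preds) ** 2)
    return score

rng = np.random.default_rng(2)
U = rng.uniform(0.0, 1.0, 300)
Y = np.sin(2 * np.pi * U) + rng.normal(0.0, 0.2, 300)
grid = [0.02, 0.05, 0.1, 0.2, 0.4]
scores = [cv_score(U, Y, h) for h in grid]
h_cv = grid[int(np.argmin(scores))]   # bandwidth minimizing (2.2)
```

Both undersmoothing (large variance) and oversmoothing (large bias) inflate the score, so the minimizer sits at an intermediate bandwidth.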

2.2 TWO-STEP ESTIMATION

As a second estimation proposal, we extend the two-step local linear estimation procedure that Fan and Zhang (1999) proposed for varying coefficient models to nonlinear varying coefficient models. While the proposed one-step estimation is easier to implement, the two-step estimators have been reported to lead to efficiency gains when the varying coefficient functions admit different degrees of smoothness, so that a single bandwidth is not optimal for estimating all varying coefficient functions. The proposed two-step estimation procedure is sketched as follows. In the first step, we obtain initial estimators $\tilde{\beta}_j(U_i)$, j = 1, …, p, by minimizing (2.1) with a small bandwidth $h_0$. These estimators have a smaller bias but a larger variance. As Fan and Zhang (1999) pointed out, the two-step estimator is not sensitive to the choice of the initial bandwidth $h_0$, since the two-step estimator is optimal for a wide range of initial bandwidths. We suggest using multifold cross-validation (2.2) or generalized cross-validation to obtain a bandwidth $\hat{h}$ and then using $h_0 = 0.5\hat{h}$ as the initial bandwidth (Fan and Zhang 1999). In the second step, we obtain the two-step estimator of βj(·), j = 1, …, p, by replacing all the varying coefficient functions except βj(·) with their corresponding initial estimators. In this step, we reduce the variance by further smoothing, for which we recommend using multi-fold cross-validation (2.2) to choose the optimal bandwidth. To obtain the two-step estimator of βj(·) at a fixed point $u_0$, we start by locally approximating βj(·) in the neighborhood of $u_0$:

$$\beta_j(u) \approx \beta_j(u_0) + \beta_j'(u_0)(u - u_0) \equiv a_j + b_j(u - u_0).$$

This leads to minimizing the following local least squares in order to obtain the two-step estimator of βj(u0):

$$\ell^{*}(a_j, b_j) = \sum_{i=1}^{n} \left[ Y_i - f^{*}\{X_i, a_j + b_j(U_i - u_0)\} \right]^2 K_{h_2}(U_i - u_0),$$

with respect to $(a_j, b_j)$, where $f^{*}(\cdot, \cdot)$ is the function f(·, ·) with all coefficients except $\beta_j(u_0)$ substituted with their corresponding initial estimators, and $K_{h_2}(t) = h_2^{-1} K(t/h_2)$ is the scaled kernel function K(·) with bandwidth $h_2$. The final estimator of $\beta_j(u_0)$ is $\hat{a}_j(u_0)$.

In the first step of the proposed estimation, the iterative algorithm described in Section 2.1 is employed and the initial estimators $\tilde{\beta}_j(U_i)$, for i = 1, …, n and j = 1, …, p, are obtained. For the minimization of the local least squares in the second step, we propose an iterative linear regression algorithm similar to the one given in Section 2.1. We denote the values of $a_j^{*}$ and $b_j^{*}$ at the kth iteration by $a_j^{*(k)}$ and $b_j^{*(k)}$, respectively, and $f^{*\prime}(X_i, \beta_j)$ is the derivative of $f^{*}(X_i, \beta_j)$ with respect to $\beta_j$, that is, $\partial f^{*}(X_i, \beta_j)/\partial \beta_j$. Let

$$F_k^{*} = \begin{pmatrix} f^{*\prime}\{X_1, a_j^{*(k)} + b_j^{*(k)}(U_1 - u_0)\} & (U_1 - u_0)\, f^{*\prime}\{X_1, a_j^{*(k)} + b_j^{*(k)}(U_1 - u_0)\} \\ \vdots & \vdots \\ f^{*\prime}\{X_n, a_j^{*(k)} + b_j^{*(k)}(U_n - u_0)\} & (U_n - u_0)\, f^{*\prime}\{X_n, a_j^{*(k)} + b_j^{*(k)}(U_n - u_0)\} \end{pmatrix}_{n \times 2},$$

$$Y^{*}_{i,k} = Y_i - f^{*}\{X_i, a_j^{*(k)} + b_j^{*(k)}(U_i - u_0)\} + \{a_j^{*(k)} + b_j^{*(k)}(U_i - u_0)\}\, f^{*\prime}\{X_i, a_j^{*(k)} + b_j^{*(k)}(U_i - u_0)\},$$

and $Y^{*}_k = (Y^{*}_{1,k}, \ldots, Y^{*}_{n,k})^{T}$. Then, we update $\theta_j^{*} = (a_j^{*}, b_j^{*})^{T}$ according to

$$\theta_j^{*(k+1)} = (F_k^{*T} W F_k^{*})^{-1} F_k^{*T} W Y^{*}_k,$$

where $W = \operatorname{diag}\{K_{h_2}(U_1 - u_0), \ldots, K_{h_2}(U_n - u_0)\}$. The solution of this iterative linear regression algorithm satisfies $\ell^{*\prime}(a_j^{*}, b_j^{*}) = 0$, and the two-step estimator of $\beta_j(u_0)$ is given by $\hat{a}_j^{*}$.
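A compact sketch of the two-step idea for the linear-in-parameters special case Y = β1(U) + Xβ2(U) + ε, where both steps have closed-form weighted least squares solutions: an undersmoothed first-step fit with a small h0, followed by a second-step refit of β1 alone (after substituting the initial estimate of β2) with its own bandwidth. The model, coefficient functions, bandwidths, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1500
U = rng.uniform(0.0, 1.0, n)
X = rng.normal(0.0, 1.0, n)
b1_true = np.sin(4 * np.pi * U)        # rough component: needs a small bandwidth
b2_true = 1.0 + U                      # smooth component
Y = b1_true + X * b2_true + rng.normal(0.0, 0.2, n)

def epa(t):
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

# Step 1: undersmoothed initial fits at every U_i with a small bandwidth h0.
# In this linear-in-parameters case the local fit is a weighted regression
# on [1, X, U-u0, X*(U-u0)].
def initial_fit(u0, h0):
    w = epa((U - u0) / h0)
    Z = np.column_stack([np.ones(n), X, U - u0, X * (U - u0)])
    WZ = w[:, None] * Z
    c = np.linalg.solve(Z.T @ WZ + 1e-10 * np.eye(4), WZ.T @ Y)
    return c[0], c[1]                  # (beta1_tilde(u0), beta2_tilde(u0))

h0 = 0.03
b2_tilde = np.array([initial_fit(u, h0)[1] for u in U])

# Step 2: refit beta1 alone with its own bandwidth h2, after substituting
# the initial estimate of beta2 (a partial-residual step)
def second_step_b1(u0, h2):
    R = Y - X * b2_tilde               # remove the beta2 part
    w = epa((U - u0) / h2)
    Z = np.column_stack([np.ones(n), U - u0])
    WZ = w[:, None] * Z
    return np.linalg.solve(Z.T @ WZ + 1e-10 * np.eye(2), WZ.T @ R)[0]

b1_hat = second_step_b1(0.5, h2=0.05)  # true value: sin(2*pi) = 0
```

For a genuinely nonlinear f the two weighted regressions are replaced by the iterative algorithms of this section, but the structure of the two steps is unchanged.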

2.3 ASYMPTOTIC PROPERTIES OF THE ONE-STEP ESTIMATOR

We now derive the asymptotic distributions of our one-step estimators $\hat{a}$ and $\hat{b}$. Define $\theta(u_0) = (a_1, \ldots, a_p, b_1, \ldots, b_p)^{T}$ and $\hat{\theta}(u_0) = (\hat{a}^{T}, \hat{b}^{T})^{T}$. Further define $\mu_k = \int t^k K(t)\,dt$, $\nu_k = \int t^k K^2(t)\,dt$, and $H = \operatorname{diag}(1, h_1) \otimes I_p$, with ⊗ denoting the Kronecker product and $I_p$ the p × p identity matrix. Let c(u) denote the marginal density of U, and let

$$\Gamma_1(u_0) = E\left( f'\{X, \beta(u_0)\} \left[ f'\{X, \beta(u_0)\} \right]^{T} \,\middle|\, U = u_0 \right)_{p \times p},$$

$$\Gamma_2(u_0) = E\left( \sigma^2(u_0, X)\, f'\{X, \beta(u_0)\} \left[ f'\{X, \beta(u_0)\} \right]^{T} \,\middle|\, U = u_0 \right)_{p \times p}.$$

The following theorem establishes the asymptotic normality of the one-step estimators of the regression coefficient functions β(u).

Theorem 1

Under the regularity conditions (A)—(G) in the supplementary material, we have the following result for θ^(u0):

$$\sqrt{n h_1}\left[ H\{\hat{\theta}(u_0) - \theta(u_0)\} - \frac{h_1^2}{2(\mu_2 - \mu_1^2)} \begin{pmatrix} (\mu_2^2 - \mu_1 \mu_3)\,\beta''(u_0) \\ (\mu_3 - \mu_1 \mu_2)\,\beta''(u_0) \end{pmatrix} + o_p(h_1^2) \right] \xrightarrow{D} N(0, \Delta^{-1} \Lambda \Delta^{-1})$$

as n→∞, where

$$\Delta = c(u_0) \begin{pmatrix} 1 & \mu_1 \\ \mu_1 & \mu_2 \end{pmatrix} \otimes \Gamma_1(u_0) \quad \text{and} \quad \Lambda = c(u_0) \begin{pmatrix} \nu_0 & \nu_1 \\ \nu_1 & \nu_2 \end{pmatrix} \otimes \Gamma_2(u_0).$$

If K(·) is symmetric, we obtain the following simplification:

$$\sqrt{n h_1}\left\{ \hat{a}(u_0) - \beta(u_0) - \frac{h_1^2 \mu_2}{2}\,\beta''(u_0) + o_p(h_1^2) \right\} \xrightarrow{D} N\{0, \Sigma(u_0)\}$$

as n→∞, where

$$\Sigma(u_0) = \frac{\nu_0\, \Gamma_1^{-1}(u_0)\, \Gamma_2(u_0)\, \Gamma_1^{-1}(u_0)}{c(u_0)}.$$

Theorem 2

Under the regularity conditions given in the supplementary material, it holds that

$$n h_1\, H (F^{T} W F)^{-1} F^{T} W Q W F\, (F^{T} W F)^{-1} H \xrightarrow{P} \Delta^{-1} \Lambda \Delta^{-1}$$

as n → ∞, where F = $F_k$ evaluated at convergence and $Q = \operatorname{diag}(e_1^2, \ldots, e_n^2)$ with $e_i = Y_i - f\{X_i, \hat{\beta}(U_i)\}$.

The proofs of Theorems 1 and 2 are given in the supplementary material. Theorem 1 shows that when K(·) is symmetric, the asymptotic bias of a^(u) is the same as that in standard varying coefficient models (Fan and Zhang 2008). Theorem 1 also establishes the asymptotic variance of a^(u); however, we need a consistent estimator of this asymptotic variance to obtain confidence intervals for β(u). To this end, we use Theorem 2 to derive a consistent estimator of the asymptotic covariance matrix of θ^(u0) given in Theorem 1,

$$\widehat{\operatorname{cov}}\{\hat{\theta}(u_0)\} = (F^{T} W F)^{-1} F^{T} W Q W F\, (F^{T} W F)^{-1}. \tag{2.3}$$

The accuracy of this estimator is tested in our simulation studies. Thus, the (1 − α)100% asymptotic pointwise confidence interval for $\beta_j(u_0)$ can be given as $\hat{\beta}_j(u_0) \pm z_{\alpha/2}\,[\widehat{\operatorname{var}}\{\hat{\beta}_j(u_0)\}]^{1/2}$, where $z_{\alpha/2}$ is the 100(1 − α/2)th percentile of N(0, 1).
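The sandwich estimator (2.3) and the resulting pointwise intervals can be checked numerically. The sketch below uses the special case f{X, β(U)} = Xβ(U) with a linear coefficient function (so the local linear fit is essentially unbiased) and monitors the empirical coverage of the nominal 95% interval over repeated samples; all settings are illustrative, not the paper's simulation design.

```python
import numpy as np

def local_fit_and_se(u0, U, X, Y, h):
    # Local linear fit of Y = X*beta(U) + eps at u0, with the sandwich
    # covariance estimator (2.3): (F'WF)^{-1} F'WQWF (F'WF)^{-1}
    t = (U - u0) / h
    w = 0.75 * np.maximum(1.0 - t ** 2, 0.0) / h     # Epanechnikov weights
    F = np.column_stack([X, X * (U - u0)])           # n x 2p design (p = 1)
    WF = w[:, None] * F
    A = np.linalg.inv(F.T @ WF)
    theta = A @ (WF.T @ Y)
    e = Y - F @ theta                                # residuals
    meat = (WF * (e ** 2)[:, None]).T @ WF           # F'WQWF with Q = diag(e_i^2)
    cov = A @ meat @ A
    return theta[0], np.sqrt(cov[0, 0])

rng = np.random.default_rng(4)
u0, truth, cover, reps = 0.5, 1.5, 0, 200
for _ in range(reps):
    U = rng.uniform(0.0, 1.0, 500)
    X = rng.normal(0.0, 1.0, 500)
    Y = X * (1.0 + U) + rng.normal(0.0, 0.3, 500)    # beta(u) = 1 + u
    b, se = local_fit_and_se(u0, U, X, Y, h=0.15)
    cover += abs(b - truth) < 1.96 * se              # 95% interval covers truth?
coverage = cover / reps
```

The empirical coverage should sit near the nominal 95%, consistent with the slight underestimation of the variance reported in Section 3.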

2.4 GENERALIZED F-TEST

It is of scientific interest to test whether the coefficient functions in (1.2) really change over the temperature T. In general, we may formulate this problem as a hypothesis test to study whether the coefficients of the nonlinear varying coefficient model (1.4) actually vary over the covariate U as follows:

$$H_0: \beta_j(U) \equiv \beta_{j0} \quad \text{vs.} \quad H_1: \beta_j(U) \not\equiv \beta_{j0}, \quad \text{for } j = 1, \ldots, p,$$

where the $\beta_{j0}$'s are unknown constants. This hypothesis testing problem is challenging because the null hypothesis is parametric, whereas the alternative is nonparametric; the parameter space is finite dimensional under the null hypothesis and infinite dimensional under the alternative. Intuitively, we would compare the residual sum of squares under H0 and under H1. Let $\hat{\beta}(\cdot)$ and $\hat{\beta}_0$ be the estimators of β under H1 and H0, respectively. Denote by $\mathrm{RSS}(H_1) = \sum_{i=1}^{n} [Y_i - f\{X_i, \hat{\beta}(U_i)\}]^2$ and $\mathrm{RSS}(H_0) = \sum_{i=1}^{n} \{Y_i - f(X_i, \hat{\beta}_0)\}^2$ the residual sums of squares under H1 and H0, respectively. Define

$$F_0 = \frac{\mathrm{RSS}(H_0) - \mathrm{RSS}(H_1)}{n^{-1}\,\mathrm{RSS}(H_1)}.$$

Note that F0 is similar to the F-test for linear regression models. However, the parameter space under H1 is infinite dimensional. Recall that an F-test for linear hypothesis in linear regression models is equivalent to the likelihood ratio test under the normality assumption. Specifically, assume that εi are iid N(0,σ2); then, the local log-likelihood function conditional on {Ui,Xi} equals

$$\ell\{\beta(\cdot)\} = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[Y_i - f\{X_i, \beta(U_i)\}\right]^2.$$

Then the likelihood ratio test is defined by

$$T_0 = \ell_n(H_1) - \ell_n(H_0) = \frac{n}{2}\log\left\{\frac{\mathrm{RSS}(H_0)}{\mathrm{RSS}(H_1)}\right\}, \tag{2.4}$$

where $\ell_n(H_0)$ and $\ell_n(H_1)$ are the maximized log-likelihoods under the null and alternative hypotheses, respectively, and (2.4) is a monotone function of the ratio RSS(H0)/RSS(H1). Noting that log(1 + x) ≈ x for small x by the Taylor expansion, we obtain

$$T_0 \approx \frac{1}{2}\cdot\frac{\mathrm{RSS}(H_0) - \mathrm{RSS}(H_1)}{n^{-1}\,\mathrm{RSS}(H_1)} = \frac{1}{2}\,F_0$$

under H0. Therefore, motivated by the generalized likelihood ratio test proposed in Fan, Zhang, and Zhang (2001), we define the generalized F-test statistic to be

$$F = \frac{r_K}{2}\,F_0,$$

where $r_K = \{K(0) - 0.5\int K^2(u)\,du\} / \int \{K(u) - 0.5\,(K * K)(u)\}\,du$, with K * K denoting the convolution of K with itself, and K(·) is the kernel function used to estimate the regression coefficients. The value of $r_K$ is 1.2000, 2.1153, 2.3061, 2.3797, and 2.5375 for the uniform, Epanechnikov, biweight, triweight, and Gaussian kernels, respectively. Note that the generalized F test does not rely on the normality assumption for the error distribution.

Intuitively, under the alternative hypothesis RSS(H0) would be larger than RSS(H1), making the test statistic F large, and we would reject the null hypothesis; under H0, RSS(H0) will be close to RSS(H1), so F would be small and we would be in favor of H0. Fan, Zhang, and Zhang (2001) showed that the generalized likelihood ratio test statistic has a χ2 limiting distribution for a variety of models, including Gaussian white noise models, nonparametric regression models, varying coefficient models, and generalized varying coefficient models. However, the null limiting distribution given in Fan, Zhang, and Zhang (2001) does not maintain the type I error rate well in finite samples. As demonstrated by Cai, Fan, and Li (2000), a bootstrap procedure may provide a better estimate of the null distribution in moderate sample sizes than the asymptotic distribution. Thus, we propose a nonparametric bootstrap method to estimate the null distribution. We first estimate the model coefficients under the null and alternative hypotheses using the original data. Let $\tilde{\beta}(U_i)$ and $\hat{\beta}(U_i)$ denote the estimates of the model coefficients under the null and the alternative hypotheses, respectively, from which the test statistic F is computed. We then obtain the residuals $e_i = Y_i - f\{X_i, \tilde{\beta}(U_i)\}$. Next, we obtain each bootstrap sample by generating data $(Y_i^{*}, X_i, U_i)$ from the model

$$Y_i^{*} = f\{X_i, \tilde{\beta}(U_i)\} + e_i^{*},$$

where $e_i^{*}$ denotes a residual sampled from $\{e_1, \ldots, e_n\}$ with replacement. The distribution of the test statistics $F^{*}$ computed from the bootstrap samples estimates the null distribution of F. We study the properties of the proposed generalized F test by Monte Carlo simulations in Section 3. Note that in the simulation studies and the data application we employ the one-step estimator when estimating the null distribution of the test statistic F, but the two-step estimator can also be used and the procedure would be the same.
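The bootstrap generalized F test can be sketched end-to-end for the special case f{X, β(U)} = Xβ(U). Below, data are generated under a strongly varying alternative, F = (rK/2)F0 is computed with the Epanechnikov rK value quoted above, and the null distribution is estimated by resampling residuals from the null fit; the sample size, bandwidth, effect size, and number of bootstrap replicates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, h, rK = 300, 0.15, 2.1153     # rK for the Epanechnikov kernel (Section 2.4)

def beta_hat(u0, U, X, Y):
    # Local linear estimate of beta(u0) in Y = X*beta(U) + eps
    t = (U - u0) / h
    w = 0.75 * np.maximum(1.0 - t ** 2, 0.0)
    F = np.column_stack([X, X * (U - u0)])
    WF = w[:, None] * F
    return np.linalg.solve(F.T @ WF + 1e-10 * np.eye(2), WF.T @ Y)[0]

def rss_and_null(U, X, Y):
    b0 = np.sum(X * Y) / np.sum(X * X)                 # H0: constant beta (OLS)
    rss0 = np.sum((Y - b0 * X) ** 2)
    fit1 = np.array([beta_hat(u, U, X, Y) for u in U]) * X
    rss1 = np.sum((Y - fit1) ** 2)
    return rss0, rss1, b0

def gen_F(U, X, Y):
    rss0, rss1, _ = rss_and_null(U, X, Y)
    return (rK / 2.0) * n * (rss0 - rss1) / rss1       # F = (rK/2) * F0

U = rng.uniform(0.0, 1.0, n)
X = rng.normal(0.0, 1.0, n)
Y = X * (1.0 + 2.0 * U) + rng.normal(0.0, 0.3, n)      # strongly varying beta(U)
F_obs = gen_F(U, X, Y)

# Bootstrap the null distribution: resample residuals from the null fit
_, _, b0 = rss_and_null(U, X, Y)
e = Y - b0 * X
F_boot = [gen_F(U, X, b0 * X + rng.choice(e, n, replace=True))
          for _ in range(100)]
p_value = float(np.mean(np.array(F_boot) >= F_obs))
```

Resampling residuals with replacement breaks any dependence between the residuals and (X, U), so the bootstrap samples obey the null even when the observed data do not.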

3 SIMULATION STUDIES

In this section, we study the performance of the proposed one-step and two-step estimators via Monte Carlo simulation studies. All simulations were conducted using R. The Epanechnikov kernel, K(t) = 0.75(1−t2)+, is used in our simulations. We evaluate the performance of the proposed procedures using root average squared error (RASE):

$$\mathrm{RASE}_j = \left( \frac{1}{n_{\mathrm{grid}}} \sum_{k=1}^{n_{\mathrm{grid}}} \left[ \frac{\beta_j(u_k) - \hat{\beta}_j(u_k)}{\max_u \beta_j(u) - \min_u \beta_j(u)} \right]^2 \right)^{1/2},$$

where {uk,k = 1, … ,ngrid} is an equidistant set of grid points over the range of U with ngrid = 200 and j = 1, … , p. We further study the accuracy of the proposed estimator of the covariance matrix from Section 2.3.
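The RASE criterion itself is a one-liner; a minimal sketch, with a hypothetical true coefficient function and a deliberately offset "estimate":

```python
import numpy as np

def rase(true_vals, est_vals):
    # Root average squared error, with the error normalized by the range
    # of the true coefficient function, as in the display above
    rng_ = np.max(true_vals) - np.min(true_vals)
    return float(np.sqrt(np.mean(((true_vals - est_vals) / rng_) ** 2)))

u = np.linspace(0.0, 1.0, 200)          # ngrid = 200 grid points, as in Section 3
truth = np.sin(np.pi * u)               # hypothetical true coefficient function
est = truth + 0.05                      # a toy "estimate": truth offset by 0.05
r = rase(truth, est)                    # close to 0.05 since the range is ~1
```

The range normalization puts coefficient functions of very different scales (such as β1 and β3 in Example 2) on a comparable footing.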

Example 1

In this example, data are generated from the following nonlinear varying coefficient model:

Yi=exp{β1(Ui)+Xiβ2(Ui)}+εi,

where β1(Ui) = sin(πUi) and β2(Ui) = sin{4π(Ui − 1/8)}. The predictor and error variables, Xi and εi, are generated from two independent N(0,1) distributions, while the Ui are generated from U[0,1], the uniform distribution on [0,1]. We consider sample sizes n = 250, 500, and 1000, with 500 Monte Carlo runs for each sample size.

In our simulation, we generate several pilot simulation data sets and use a cross-validation bandwidth selector to get an overall picture of the optimal bandwidth. To save computing time, we fix the bandwidth close to the optimal ones from the pilot simulation data sets. Specifically, we set the bandwidth to h1 = 0.10, 0.075, and 0.06 for n = 250, 500, and 1000, respectively, for the one-step procedure. We use h0 = h1/2 as the initial bandwidth of the two-step estimator and h2 = (0.2, 0.1), (0.15, 0.075), (0.12, 0.06) (for β1 and β2, respectively) as the final bandwidths of the two-step estimator for n = 250, 500, and 1000, respectively. It is of interest to examine the performance of the proposed estimators over a wide range of bandwidths. Thus, for each sample size, we also set the bandwidth to h1/2, 2h1, h2/2, and 2h2, corresponding to undersmoothing and oversmoothing.

The sample mean and sample standard deviation of the RASE values for the one-step and two-step procedures from the 500 replications are summarized in Tables 1 and 2, respectively. Figure 2 depicts the one-step estimates of the parameter functions and 95% asymptotic point-wise confidence intervals based on a typical sample, which has the median RASE value over the 500 Monte Carlo runs. The estimates track the underlying true coefficient functions, and the proposed confidence intervals get narrower with increasing sample size. In this simulation, β1(·) and β2(·) admit different degrees of smoothness, and the one-step estimate of β1(·) is undersmoothed (Figure 2). Hence, we employ the two-step procedure. Figure 3 displays the two-step estimates of β1(·) for each sample size based on a sample with the median RASE value over the 500 Monte Carlo runs, along with 95% percentile bootstrap confidence intervals based on 500 bootstrap samples. According to Figure 3, the two-step estimates are smoother and closer to the underlying true coefficient function than the one-step estimates. The two-step estimates of β2(·) are close to the one-step estimates and hence are omitted. Since the two varying coefficient functions admit different degrees of smoothness in this simulation, the two-step estimator leads to efficiency gains across all sample sizes, as demonstrated by the smaller RASE values in Table 2. Even though the two-step procedure outperforms the one-step procedure in this simulation, it requires about three times the computation time of the one-step procedure for n = 500 on a 2.3 GHz Intel Core i5 CPU.

Table 1.

RASE for One-step Estimates in Example 1

h       β1(u)                                   β2(u)
        n = 250     n = 500     n = 1000        n = 250     n = 500     n = 1000
h1/2    .224(.112)  .160(.048)  .116(.033)      .117(.058)  .076(.029)  .049(.018)
h1      .154(.054)  .114(.038)  .081(.022)      .083(.030)  .053(.019)  .035(.009)
2h1     .198(.125)  .132(.053)  .095(.028)      .147(.030)  .088(.017)  .059(.009)

Table 2.

RASE for Two-step Estimates in Example 1

h       β1(u)                                   β2(u)
        n = 250     n = 500     n = 1000        n = 250     n = 500     n = 1000
h2/2    .194(.162)  .128(.084)  .086(.032)      .106(.061)  .066(.026)  .043(.013)
h2      .125(.057)  .095(.038)  .064(.021)      .081(.038)  .050(.020)  .033(.010)
2h2     .191(.128)  .121(.089)  .075(.035)      .143(.033)  .082(.019)  .052(.009)

Figure 2. One-step estimates for the varying coefficient functions (dashed) overlaying the true functions (solid), along with 95% pointwise asymptotic confidence intervals (dotted): (a) and (b) for n = 250, (c) and (d) for n = 500, (e) and (f) for n = 1000.

Figure 3. Two-step estimates of the varying coefficient function β1(·) (dash-dotted) overlaying the true function (solid), with 95% percentile bootstrap confidence intervals based on 500 bootstrap samples (dotted): (a) n = 250, (b) n = 500, (c) n = 1000.

To examine the bias of the proposed one-step estimator, we also summarize the empirical bias of $\hat{\beta}_j(u)$. Table 3 reports the empirical bias at u = 0.5; the empirical bias at other locations is similar and is not reported here to save space. We also test the accuracy of the proposed standard error formula in (2.3). The standard deviation, denoted by SD in Table 3, of the 500 values of $\hat{\beta}_j(u)$ from the 500 simulations can be viewed as the true standard error. The sample average and the sample standard deviation of the 500 estimated standard errors of $\hat{\beta}_j(u)$, denoted by SE and SDse in Table 3, respectively, summarize the overall performance of the standard error formula (2.3). Table 3 presents the results at u = 0.5 and shows that the proposed covariance estimator slightly underestimates the true one, but the difference between SD and SE is less than twice the SDse. We conclude that the proposed covariance estimator works reasonably well (Cai et al. 2000; Fan and Li 2004). Results for other locations are similar and are not presented here to save space.

Table 3.

Bias and Standard Errors for One-step Estimates in Example 1 (h = h1)

        β1(u) at u = 0.5                β2(u) at u = 0.5
n       Bias    SD      SE(SDse)        Bias    SD      SE(SDse)
250     .081    .097    .074(.015)      .125    .112    .064(.024)
500     .067    .082    .065(.020)      .081    .081    .053(.021)
1000    .046    .056    .048(.014)      .052    .050    .036(.013)

We next investigate the behavior of the generalized F test and its power. It is expected that the null distribution approximately follows a χ2-distribution. Furthermore, the limiting χ2 distribution does not depend on the specific null values considered. This property has been called the Wilks phenomenon in Fan, Zhang, and Zhang (2001). To illustrate the Wilks phenomenon, we consider

$$H_0: \beta_j(U) = \gamma_{j0} \quad \text{vs.} \quad H_1: \beta_j(U) \neq \gamma_{j0}, \quad j = 1, 2, \tag{3.1}$$

with five different sets of values for (γ10, γ20): $(\mu_{\beta_1}, \mu_{\beta_2})$ and the four combinations $(\mu_{\beta_1} \pm 2\,\mathrm{std}(\beta_1), \mu_{\beta_2} \pm 2\,\mathrm{std}(\beta_2))$, where $\mu_{\beta_j} = E\{\beta_j(U)\}$ and $\mathrm{std}(\beta_j) = [\mathrm{var}\{\beta_j(U)\}]^{1/2}$ for j = 1, 2.

A nonparametric bootstrap procedure with resampling from residuals is applied for n = 500 to estimate the null distribution of the test statistic F at the different null values. After one sample is generated from the above simulation set-up, the residuals $e_i = Y_i - f(X_i, \gamma_0)$ are obtained, where $\gamma_0$ contains the null values. Next, 500 bootstrap samples $(Y_i^{*}, X_i, U_i)$ of size n = 500 are generated under the null according to

$$Y_i^{*} = f(X_i, \gamma_0) + e_i^{*},$$

where $e_i^{*}$ is randomly drawn from $\{e_1, \ldots, e_n\}$ with replacement. The test statistic is computed for each bootstrap sample. The estimated density of the test statistic F for the 5 different null values is plotted in Figure 4(a), along with the density of a chi-square distribution. The degrees of freedom for the null distribution of the test statistic F is chosen to be close to the sample mean of the bootstrap test statistic values across the different null values, which is approximately 31.2. The plotted densities in Figure 4(a) are very close to each other and to the chi-square density, which confirms that the null distribution of the test statistic follows a χ2 distribution with 31.2 degrees of freedom and is independent of the null values γj0. Hence, the Wilks phenomenon holds for nonlinear varying coefficient models. We also study the power of the proposed hypothesis testing procedure. Consider the hypotheses given in (3.1). The power is evaluated at a sequence of alternatives indexed by δ,

$$H_1: \beta_j(U, \delta) = (1 - \delta)\,\gamma_{j0} + \delta\,\beta_{j0}(U),$$

where j = 1, 2 and δ ∈ [0, 0.4]; larger values of δ correspond to further deviation from the null hypothesis. Figure 4(b) depicts the power functions at five different significance levels: 0.5, 0.25, 0.10, 0.05, and 0.01. The estimated powers at δ = 0 for the five significance levels are 0.498, 0.242, 0.074, 0.048, and 0.012, based on 500 replications at n = 500. This implies that the test statistic keeps the type I error rate very well. As expected, the power functions increase rapidly with increasing δ.

Figure 4. (a) Estimated densities of the test statistic F from the proposed generalized F test for the 5 different null values; the solid line is the density function of a χ2 distribution with 31.2 degrees of freedom. (b) Estimated power functions.

Example 2

In this example, we generate data from

$$Y_i = \beta_1(U_i) - \frac{\beta_2(U_i)\,X_i}{X_i + \beta_3(U_i)} + \varepsilon_i,$$

which is the same model as the one fit to the ecological data. In our simulation, εi ~ N(0, 0.1); the Xi follow a truncated normal distribution with mean and variance equal to the corresponding values of the PAR variable in the ecological data, truncated to lie between the minimum and maximum values of the PAR variable; and the Ui follow the uniform distribution over [11.5, 24.7], the range of temperature in the ecological data. Furthermore, we set β1(Ui) = 0.5exp(0.5Ui), β2(Ui) = 23 + sin(0.25Ui), and β3(Ui) = 300 + 2(Ui − 10)^2 so that their ranges are similar to those of the estimated parameters in Section 4. We consider three sample sizes, n = 500, 1000, and 2000. We take the bandwidth to be h1 = 5.3, 4.6, and 4.0 for n = 500, 1000, and 2000, respectively, for the one-step procedure. We use h0 = h1/2 as the initial bandwidth of the two-step estimator and h2 = (5.3, 5.3, 5.3), (4.6, 4.6, 4.6), (4.0, 4.0, 4.0) (for β1, β2, and β3, respectively) as the final bandwidths of the two-step estimator for n = 500, 1000, and 2000, respectively. For each sample size, we also set the bandwidth to h1/2, 2h1, h2/2, and 2h2, corresponding to undersmoothing and oversmoothing. We conduct 500 simulations for each case.

The sample mean and standard deviation of the 500 RASE values for the one-step and two-step procedures are reported in Tables 4 and 5, respectively. Since β1(·), β2(·) and β3(·) admit similar degrees of smoothness, the one-step and two-step procedures perform similarly, as expected, and the RASEs decrease as the sample size increases. Figure 5 depicts the one-step estimates of the parameter functions and their 95% point-wise asymptotic confidence intervals based on a typical sample, the one with the median RASE value over the 500 Monte Carlo runs. The two-step estimates are similar and are not displayed here. The estimated coefficient functions are close to the underlying true coefficient functions. In this second simulation, the two-step procedure requires 6 times more computation time than the one-step procedure for n = 1000, implemented on a 2.3 GHz Intel Core i5 CPU.
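The RASE reported in Tables 4 and 5 is, for each coefficient function, a root average squared error of the estimate over a grid of index values; the exact grid is defined in the paper's Section 3 and is not repeated here. A minimal sketch of the criterion and the table summaries:

```python
import numpy as np

def rase(beta_hat_grid, beta_true_grid):
    """Root average squared error of one estimated coefficient function,
    with both functions evaluated on a common grid of index values."""
    d = np.asarray(beta_hat_grid, float) - np.asarray(beta_true_grid, float)
    return np.sqrt(np.mean(d ** 2))

def summarize(rase_values):
    """Sample mean and standard deviation over Monte Carlo replications,
    as reported in Tables 4 and 5."""
    r = np.asarray(rase_values, float)
    return r.mean(), r.std(ddof=1)
```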

Table 4.

RASE for One-step Estimates in Example 2

h       β1                                      β2                                      β3
        n=500      n=1000     n=2000           n=500      n=1000     n=2000           n=500      n=1000     n=2000
h1/2    .051(.019) .035(.012) .026(.007)       .172(.060) .119(.038) .091(.025)       .046(.019) .034(.012) .025(.008)
h1      .046(.019) .031(.012) .023(.008)       .143(.056) .102(.038) .077(.026)       .041(.019) .031(.013) .022(.009)
2h1     .105(.027) .078(.016) .056(.011)       .211(.059) .159(.041) .119(.028)       .110(.020) .083(.015) .061(.011)

Table 5.

RASE for Two-step Estimates in Example 2

h       β1                                      β2                                      β3
        n=500      n=1000     n=2000           n=500      n=1000     n=2000           n=500      n=1000     n=2000
h2/2    .044(.020) .032(.012) .023(.008)       .159(.058) .111(.037) .080(.024)       .041(.019) .031(.013) .023(.009)
h2      .041(.019) .029(.012) .021(.008)       .142(.052) .101(.030) .077(.018)       .040(.018) .030(.013) .022(.009)
2h2     .046(.015) .037(.010) .031(.007)       .210(.039) .158(.017) .118(.009)       .052(.012) .042(.009) .035(.006)

Figure 5.


One-step estimates of the parameter functions and their confidence intervals. The dashed lines are the estimated parameter functions, the dotted lines the corresponding 95% confidence intervals, and the solid lines the true parameter functions. Panels (a), (b), and (c) are for n = 500; (d), (e), and (f) for n = 1000; and (g), (h), and (i) for n = 2000.

Table 6 reports the empirical bias, the SD, the SE and the SDse of β̂j(u) at u = 18. Note that the range of β3(·) is much larger than those of β1(·) and β2(·); thus, the bias and SD for β3(u) at u = 18 are much larger than those for the other two coefficients. The overall pattern in Tables 4 and 6 is similar to that in Tables 1 and 3: the RASE and bias decrease as the sample size increases, and the SE is slightly smaller than the SD, but their difference is less than two times the SDse. This indicates that the proposed procedures work reasonably well.

Table 6.

Bias and Standard Errors for One-step Estimates in Example 2 (h = h1)

n       β1(u), u = 18                   β2(u), u = 18                   β3(u), u = 18
        Bias   SD     SE(SDse)         Bias   SD     SE(SDse)         Bias    SD      SE(SDse)
500     .132   .145   .115(.024)       .126   .116   .106(.010)       9.265   10.633  9.586(1.204)
1000    .088   .095   .087(.014)       .094   .081   .079(.005)       7.091   7.522   7.212(.668)
2000    .073   .075   .066(.008)       .078   .065   .060(.003)       5.369   5.872   5.441(.394)

Similar to Example 1, we illustrate the Wilks phenomenon by considering

$$H_0: \beta_j(U) = \gamma_{j0} \quad \text{vs.} \quad H_1: \beta_j(U) \neq \gamma_{j0}, \qquad j = 1, 2, 3,$$

with seven different sets for (γ10, γ20, γ30): (μβ1, μβ2, μβ3), (μβ1 ± 2std(β1), μβ2, μβ3), (μβ1, μβ2 ± 2std(β2), μβ3), and (μβ1, μβ2, μβ3 ± 2std(β3)), where μβj = E{βj(U)} and std(βj) = [var{βj(U)}]^{1/2} for j = 1, 2, 3. Figure 6(a) depicts the kernel density estimates of the null distribution for these seven sets of (γ10, γ20, γ30) based on 500 bootstrap samples. The seven densities are very close to each other, implying that the null distribution is not sensitive to the value of (γ10, γ20, γ30). Figure 6(a) also shows that the null density is very close to that of a χ2 distribution with 12.1 degrees of freedom; as in the first simulation study, the degrees of freedom are obtained by computing the mean of the test statistic values over the 500 bootstrap samples. This suggests that the Wilks phenomenon holds for nonlinear varying coefficient models.

Figure 6.


(a) Estimated densities of the test statistic F from the proposed generalized F test for seven different null values; the solid curve is the density of a χ2 distribution with 12.1 degrees of freedom. (b) Estimated power functions.

Figure 6(b) depicts the power functions at levels 0.5, 0.25, 0.10, 0.05 and 0.01 under the following alternative hypothesis:

$$H_1: \beta_j(U, \delta) = (1 - \delta)\gamma_{j0} + \delta \beta_{j0}(U),$$

where γj0 = E{βj(U)} for j = 1, 2, 3 and δ ∈ [0, 0.015]. The estimated powers at δ = 0 for the five significance levels are 0.450, 0.234, 0.090, 0.048 and 0.014, based on 500 Monte Carlo simulations with n = 500. Again, the test controls the Type I error rate well for this model, and the power functions increase rapidly with δ.

4 APPLICATION TO THE ECOLOGICAL DATA

In this section, we apply the methodology proposed in Section 2 to the ecological data set introduced in Section 1. The data set was collected by the AmeriFlux network during the summer growing seasons (June 1 to August 31) of 1993-1995. This study is part of the U.S. Climate Change Research Initiative conducted by the North American Carbon Program (NACP), a component of the U.S. Interagency Carbon Cycle Science Program. The objective of AmeriFlux in this project is to help the NACP model the causes of variation in the net exchange of CO2, leading to general principles that can be applied more broadly. Let Y denote the response variable NEE, X denote PAR and U denote temperature. We fit the following nonlinear varying coefficient model

$$Y = \beta_1(U) - \frac{\beta_2(U)\, X}{X + \beta_3(U)} + \varepsilon,$$

using the iterative algorithm proposed in Section 2.1. Since photosynthetic activity is low at extreme temperatures, we analyze the subset of observations with temperature values between 11.5°C and 24.7°C, which has total sample size n = 4636. The bandwidth was chosen via multi-fold cross-validation; specifically, we minimize the cross-validation score

$$\mathrm{CV}(h) = \sum_j \left\| \mathbf{Y}_j - \hat{\mathbf{Y}}_j \right\|^2,$$

where Ŷ_j denotes the fitted values for the jth group of 20 observations, computed with that group excluded from estimation. This results in a bandwidth of h = 3.0.
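The cross-validation score above can be sketched as follows; `fit_predict`, which stands in for the one-step estimation procedure applied to the training folds, is a placeholder and not part of the paper:

```python
import numpy as np

def cv_score(h, Y, X, U, fit_predict, group_size=20):
    """Multi-fold cross-validation score CV(h) = sum_j ||Y_j - Yhat_j||^2,
    where the fitted values for group j are computed with group j excluded.
    `fit_predict(h, Y_tr, X_tr, U_tr, X_te, U_te)` returns predictions."""
    n = len(Y)
    idx = np.arange(n)
    score = 0.0
    for start in range(0, n, group_size):
        test = idx[start:start + group_size]
        train = np.setdiff1d(idx, test)
        Y_hat = fit_predict(h, Y[train], X[train], U[train], X[test], U[test])
        score += np.sum((Y[test] - Y_hat) ** 2)
    return float(score)
```

The bandwidth is then chosen by minimizing `cv_score` over a grid of candidate h values.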

Figures 7(a), (b), and (c) depict the estimated regression coefficients along with the proposed 95% point-wise asymptotic confidence intervals. From these figures, we can see that the estimated coefficient functions vary with temperature; as described below, the proposed F test rejects the null hypothesis of a regular nonlinear model in favor of the nonlinear varying coefficient model with a p-value of 0.00036. Our data come from mixed-species plants in a deciduous broadleaf forest. Figure 7(a) shows that these plants have higher dark respiration activity at higher temperatures (T ≥ 16°C) than at lower temperatures (T < 16°C). According to Figure 7(b), at their light-saturation point the species in our sample achieve the maximum net photosynthetic rate β2(·) of 36.5 around 12.5°C, beyond which the net photosynthetic rate declines for a period and then stabilizes. In other words, at the light-saturation point, maximum photosynthetic capacity occurs at lower temperatures and the lowest activity is observed around 16°C. Similarly, β3(·) achieves its maximum value shortly after 12°C, beyond which it declines and stabilizes. Figure 7(d) shows the estimated apparent quantum yield β̂2(·)/β̂3(·) along with 95% point-wise asymptotic confidence intervals. The apparent quantum yield increases with temperature until 19.5°C and stabilizes at higher temperatures. Note that the proposed inference assumes i.i.d. observations, and hence the confidence intervals may be narrower than those that would account for the correlation structure over time in the data set; extending the proposed inference framework to correlated data is of interest.

Figure 7.


(a), (b), and (c) One-step estimates of the varying coefficient functions (dashed) of the nonlinear varying coefficient model fit to the ecological data, along with 95% point-wise asymptotic confidence intervals (dotted). The solid lines are the estimates of βj under the null hypothesis. (d) Estimated apparent quantum yield (solid) along with 95% point-wise asymptotic confidence intervals (dotted).

This application illustrates the advantages of the proposed nonlinear varying coefficient models outlined in the introduction. First, the dimension is reduced by regressing NEE on PAR alone within a local temperature window, rather than considering a two-dimensional nonparametric regression of NEE on both PAR and temperature. Second, the proposed model reduces modeling bias by allowing the regression coefficients to vary with temperature; the nonlinear varying coefficient model fit leads to a smaller multi-fold cross-validation error than the regular nonlinear regression model fit. Finally, as illustrated by the above interpretations, the fits from the nonlinear varying coefficient model can be interpreted as describing photosynthesis via separate nonlinear regression models at each fixed temperature value. This parallels the interpretation of a varying coefficient model as a linear regression model at fixed U = u.
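The "separate nonlinear regression at each fixed temperature" interpretation can be sketched as kernel-weighted nonlinear least squares at each index value u. This is a local constant approximation for illustration only, under the assumption of an Epanechnikov kernel; the paper's one-step procedure uses local linear fits and an iterative algorithm:

```python
import numpy as np
from scipy.optimize import least_squares

def local_fit(u0, Y, X, U, h, beta0):
    """Kernel-weighted nonlinear least squares at U = u0: a local constant
    sketch of fitting Y = b1 - b2 X / (X + b3) within a window around u0."""
    w = 0.75 / h * np.maximum(1.0 - ((U - u0) / h) ** 2, 0.0)  # kernel weights
    sw = np.sqrt(w)

    def weighted_resid(beta):
        b1, b2, b3 = beta
        return sw * (Y - (b1 - b2 * X / (X + b3)))

    return least_squares(weighted_resid, beta0).x
```

Sweeping `u0` over a grid of temperature values traces out estimates of the three coefficient functions.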

For comparison, we also obtained the two-step estimates proposed in Section 2.2. The initial bandwidth is chosen to be h0 = 1.5, half of the bandwidth obtained by cross-validation in the one-step procedure. After obtaining the initial estimators, we apply a cross-validation bandwidth selector to each coefficient separately, and obtain the optimal bandwidths h2 = 2.0, 2.5 and 3.0 for β1(·), β2(·), and β3(·), respectively. Figure 8 shows the two-step estimates of the regression coefficients along with 95% percentile bootstrap confidence intervals based on 500 bootstrap samples. To allow a sensible comparison between the two procedures, we rerun the one-step procedure with bandwidths 2.0, 2.5 and 3.0; these estimates are also shown in Figure 8. The two-step estimates are very close to the one-step estimates but smoother. To compare the two methods, we also computed the ratio of residual sums of squares, RSS(one-step)/RSS(two-step), and found it to be very close to 1. The two-step procedure requires 8 times more computation time than the one-step procedure implemented on a 2.3 GHz Intel Core i5 CPU.

Figure 8.


Two-step estimates for the varying coefficient functions (solid) of the nonlinear varying coefficient model fit to the ecological data along with one-step estimates (dashed) and 95% percentile bootstrap confidence intervals based on 500 bootstrap samples (dotted).

We apply the testing procedure proposed in Section 2.4 to test whether the regression coefficients actually vary over temperature. To do so, we fit the nonlinear model (1.1) to the data and use the resulting estimates (β̂1, β̂2, β̂3) = (4.809, 31.940, 562.677) as the null value for testing whether the functions are constant over temperature. Figure 9 depicts the estimated density function of the null distribution of the test statistic F, overlaying a chi-square density. The degrees of freedom of the chi-square density, approximately 18.5, is chosen to be close to the sample mean of the test statistic values obtained from 500 bootstrap samples. The figure is consistent with the Wilks phenomenon, which states that under the null hypothesis the chi-square distribution is a good approximation to the asymptotic behavior of F. We reject the null hypothesis at significance level α = 0.05 in favor of the newly proposed nonlinear varying coefficient model, with a test statistic of 46.209 and a p-value of 0.00036. In other words, the proposed nonlinear varying coefficient model fits the data better than the nonlinear regression model (1.1).
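The bootstrap calibration of the null distribution described above can be sketched as a conditional bootstrap: regenerate responses from the fitted constant-coefficient model, recompute the test statistic on each bootstrap sample, and compare the observed statistic to the resulting distribution. Here `fit_null` and `compute_F` are placeholders for the paper's null-model fit and generalized F statistic, which are not reproduced in this section:

```python
import numpy as np

def bootstrap_null_test(F_obs, Y, X, U, fit_null, compute_F, B=500, seed=0):
    """Conditional bootstrap p-value for the generalized F test.
    fit_null(Y, X, U) -> (fitted values under H0, residuals);
    compute_F(Y, X, U) -> value of the test statistic."""
    rng = np.random.default_rng(seed)
    Y0_hat, resid = fit_null(Y, X, U)
    F_boot = np.empty(B)
    for b in range(B):
        e = rng.choice(resid, size=len(Y), replace=True)  # resample residuals
        F_boot[b] = compute_F(Y0_hat + e, X, U)
    p_value = float(np.mean(F_boot >= F_obs))
    return p_value, F_boot
```

The returned `F_boot` values also provide the kernel density estimate of the null distribution shown in Figure 9.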

Figure 9.


Estimated density of the test statistic F (dashed) based on 500 bootstrap samples, overlaying the density of a χ2 distribution with 18.5 degrees of freedom (solid).

5 DISCUSSION

Motivated by an empirical analysis of a real data set, we proposed nonlinear varying coefficient models. We further developed two estimation procedures based on local linear regression techniques and studied their finite sample properties. As in regular varying coefficient models, the two-step estimators for nonlinear varying coefficient models lead to efficiency gains when the varying coefficient functions admit different degrees of smoothness, but are less computationally efficient than the one-step estimators. Asymptotic properties of the one-step estimator are established; those for the two-step estimator require further research. Alternative estimation approaches for nonlinear varying coefficient models may be developed via basis expansions such as smoothing splines (Hastie and Tibshirani 1993). Although splines have many advantages, challenges such as the difficulty of choosing all the smoothing parameters simultaneously and their computational intensity have been reported (Fan and Zhang 1999).

The generalized F test is proposed for testing whether the nonlinear varying coefficient model reduces to a nonlinear regression model, i.e., whether all of the varying coefficient functions are constant over U. For assessing features of individual varying coefficient functions, we proposed asymptotic and bootstrap confidence intervals. Developing valid hypothesis testing procedures for specific features of individual varying coefficient functions would also be of interest and requires further research.

The proposal is based on a univariate index covariate U. Estimation in varying coefficient models with higher-dimensional indices quickly runs into the "curse of dimensionality". Proposals for incorporating multi-dimensional indices include Xia and Li (1999) and Fan, Yao, and Cai (2003), where the authors consider a linear combination of a multidimensional U vector to create a one-dimensional index. Similar extensions can be considered for nonlinear varying coefficient models.

As extensions of generalized varying coefficient models and model (1.4), it may be of interest to consider the following new class of models:

$$E(Y \mid X, U) = g[f\{X, \beta(U)\}], \qquad (5.1)$$

where f(·,·) is pre-specified, while g(·) is an unspecified nonparametric smooth function. This class of models includes (1.4) as a special case by taking the link function g(·) to be the identity link. With a pre-specified link function, model (5.1) also includes generalized varying coefficient models as special cases by taking f(X, β) = X^T β(U). Fan, Yao and Cai (2003) proposed adaptive varying coefficient models, in which the link function is an unknown nonparametric smooth function and f(X, β) = X^T β(U); the authors proposed an approach for estimating the unknown link function and β(·) iteratively.

In practice, data may be collected over time and would thus be correlated. The proposed estimation procedures are directly applicable to correlated data, but establishing the sampling properties of the proposed procedures in that setting will require theoretical techniques from time series analysis. Establishing these theoretical properties for dependent data is an interesting topic for future research.

Supplementary Material

supple

Acknowledgments

Runze Li's research was supported by National Institute on Drug Abuse (NIDA) grant P50-DA10075, National Cancer Institute (NCI) grant R01 CA168676, and National Natural Science Foundation of China grant 11028103. Damla Şentürk's research was supported by National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) grant R01 DK092232. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA, NCI, NIDDK or the NIH.

Contributor Information

Esra Kürüm, Department of Statistics, Istanbul Medeniyet University, Istanbul, Turkey (esra.kurum@medeniyet.edu.tr).

Runze Li, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA (rzli@psu.edu).

Yang Wang, Credit & Counterparty Risk Management, McLean, VA 22102, USA (yangwang@freddiemac.com).

Damla Şentürk, Department of Biostatistics, University of California, Los Angeles, CA 90095, USA (dsenturk@ucla.edu).

References

  1. Breiman L, Friedman JH. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association. 1985;80:580-619.
  2. Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95:888-902.
  3. Cleveland W, Grosse E, Shyu W. Local regression models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. Wadsworth & Brooks/Cole; Pacific Grove, CA: 1992. pp. 309-376.
  4. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman and Hall; London: 1996.
  5. Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association. 2004;99:710-723. URL http://www.jstor.org/stable/27590442.
  6. Fan J, Yao Q, Cai Z. Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society, Series B. 2003;65:57-80. URL http://dx.doi.org/10.1111/1467-9868.00372.
  7. Fan J, Zhang C, Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics. 2001;29:153-193.
  8. Fan J, Zhang W. Statistical estimation in varying coefficient models. Annals of Statistics. 1999;27:1491-1518.
  9. Fan J, Zhang W. Statistical methods with varying coefficient models. Statistics and Its Interface. 2008;1:179-195. doi: 10.4310/sii.2008.v1.n1.a15.
  10. Friedman JH. Multivariate adaptive regression splines. Annals of Statistics. 1991;19:1-67.
  11. Garrett RH, Grisham CM. Biochemistry. 3rd ed. Thomson Learning; Belmont, CA: 2005.
  12. Geisser S. The predictive sample reuse method with applications. Journal of the American Statistical Association. 1975;70:320-328.
  13. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall; London: 1994.
  14. Gu C, Wahba G. Smoothing spline ANOVA with component-wise Bayesian "confidence intervals". Journal of Computational and Graphical Statistics. 1993;2:97-117.
  15. Härdle W, Stoker TM. Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association. 1989;84:986-995.
  16. Hastie T, Tibshirani R. Generalized Additive Models. Chapman and Hall; New York: 1990.
  17. Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society, Series B. 1993;55:757-796.
  18. Hoover DR, Rice JA, Wu CO, Yang L-P. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809-822.
  19. Kauermann G, Tutz G. On model diagnostics using varying coefficient models. Biometrika. 1999;86:119-128.
  20. Li K-C. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 1991;86:316-327.
  21. Monteith JL. Solar radiation and productivity in tropical ecosystems. Journal of Applied Ecology. 1972;9:747-766.
  22. Osborne MR. Fisher's method of scoring. International Statistical Review / Revue Internationale de Statistique. 1992;60:99-117.
  23. Seber GAF, Wild CJ. Nonlinear Regression. Wiley-Interscience; 2003.
  24. Stone CJ, Hansen MH, Kooperberg C, Truong YK. Polynomial splines and their tensor products in extended linear modeling: 1994 Wald Memorial Lecture. Annals of Statistics. 1997;25:1371-1470.
  25. Wahba G. Partial spline models for the semiparametric estimation of functions of several variables. In: Statistical Analysis of Time Series, Proceedings of the Japan-U.S. Joint Seminar; Tokyo. Institute of Statistical Mathematics; 1984. pp. 319-329.
  26. Wu CO, Chiang C-T, Hoover DR. Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. Journal of the American Statistical Association. 1998;93:1388-1402.
  27. Xia Y, Li WK. On single-index coefficient regression models. Journal of the American Statistical Association. 1999;94:1275-1285. URL http://www.jstor.org/stable/2669941.
