Summary
Motivated by an analysis of a real data set in ecology, we consider a class of partially nonlinear models in which both a nonparametric component and a parametric component are present. We develop two new estimation procedures for the parameters in the parametric component, and establish consistency and asymptotic normality of the resulting estimators. We further propose an estimation procedure and a generalized F test for the nonparametric component in the partially nonlinear models, and derive the asymptotic properties of both. Finite sample performance of the proposed inference procedures is assessed by Monte Carlo simulation studies. An application in ecology is used to illustrate the proposed methods.
Keywords: Local linear regression, partial linear models, profile least squares, semiparametric models
1. Introduction
Semiparametric regression models are useful for data analysis because they retain the flexibility of nonparametric models and the ease of interpretation of parametric models. Ruppert, Wand and Carroll (2003) present various estimation procedures and many applications of semiparametric regression models. Motivated by a real data example in ecology, we consider a class of partially nonlinear models, natural extensions of the partially linear models studied systematically in Härdle, Liang and Gao (1999).
Let us first present some background on the motivating real data example. Photosynthesis is among the most important biochemical processes for the dynamics of life on Earth, and the dependence of the ecosystem light response on temperature is complex. For example, because of elevated temperature and enhanced respiration, an increase in radiation may lead to a decrease in photosynthesis (Gu et al., 2002). Much research has been conducted to understand the complexity of the effects of environmental stresses on the ecosystem light response. In the literature, researchers use Net Ecosystem CO2 Exchange (NEE) to measure the photosynthetic rate in a natural ecosystem. It is known that sunlight intensity affects the rate of photosynthesis; in other words, the photosynthetic rate depends on the amount of Photosynthetically Active Radiation (PAR) available to an ecosystem. Based on empirical studies using data collected in laboratory experiments, in which temperature can be well controlled, the relationship between NEE and PAR is assumed to be nonlinear (Monteith, 1972, Ruimy et al., 1999 and references therein); see also (3.1). On the other hand, the data analyzed in Section 3.2, consisting of 1997 observations of NEE, PAR and temperature (T), were collected over a subalpine forest during parts of the growth season of 1999. Because temperature cannot be controlled in a real ecosystem, and ecosystem respiration, which gives off CO2, depends on temperature, we should allow the parameter associated with ecosystem respiration, i.e., R in equation (3.1), to depend on temperature. Since a parametric relationship between R and T is not available, we assume that R depends on temperature through an unknown function R(T). Thus, we consider the partially nonlinear model (3.2). See Section 3.2 for the data description and further discussion.
The partially nonlinear model, defined in (2.1), retains the flexibility of nonparametric models for the baseline function and the interpretability of nonlinear parametric models. As special cases of partially nonlinear models, partially linear models are popular in the literature (Engle et al., 1986, Heckman, 1986, Chen, 1988, Speckman, 1988, and Ma, Chiou and Wang, 2006 for iid data; Fan and Li, 2004, Hu, Wang and Carroll, 2004, and Wang, Carroll and Lin, 2005 for longitudinal data). As another special case of partially nonlinear models, nonlinear parametric regression models have been widely used in the statistical literature; many interesting examples and applications of such models are given in Bates and Watts (1988), Seber and Wild (1989), and Huet (2004).
In this paper, we first propose an estimation procedure for the parameters in the parametric component using a profile nonlinear least squares technique, and establish the asymptotic normality of the resulting estimator. We further propose an alternative estimation procedure, in which the nonlinear parametric function is approximated by a linear function. Compared with the profile approach, the linear approximation approach is computationally less intensive but shares the same asymptotic efficiency. We also propose an estimation procedure for the nonparametric baseline function and establish its asymptotic properties. To address some scientific questions, one may have to test whether the nonparametric component has a simple form (e.g., constant or linear). To this end, we propose a generalized F test for the partially nonlinear model, and derive its asymptotic null distribution. The paper is organized as follows. In Section 2, we develop two estimation procedures for the parameters in the parametric component and statistical inference procedures for the baseline function. In Section 3, we conduct numerical studies to assess the finite sample performance of the proposed procedures, and a real data example is used to illustrate them. Some discussion is given in Section 4.
2. Inference Procedures
Let Y be a response variable and {x, U} be its covariates. A partially nonlinear model is defined to be

y = α(U) + g(x; β) + ε,  (2.1)
where α(·) is an unknown smooth function, g(·; ·) is a pre-specified function, β is an unknown parameter vector, and ε is a random error with mean zero and variance σ2. Due to the curse of dimensionality, it is assumed throughout this paper that U is a univariate continuous random variable. Suppose that {xi, ui, yi}, i = 1, ···, n is a random sample. We next propose two estimation procedures for the parameter β.
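To make the model concrete, the following sketch simulates a sample {xi, ui, yi} from a partially nonlinear model of the form (2.1). The choices α(u) = sin(2πu) and g(x; β) = β1x/(x + β2), and all parameter values, are illustrative assumptions (g is patterned loosely after the light-response curve in Section 3), not the paper's exact simulation design.

```python
import numpy as np

def simulate_pnl(n, beta, rng):
    """Simulate {x_i, u_i, y_i} from y = alpha(u) + g(x; beta) + eps.

    alpha(u) = sin(2*pi*u) and g(x; beta) = beta1*x/(x + beta2) are
    illustrative choices, not the paper's exact simulation design.
    """
    u = rng.uniform(0.0, 1.0, n)       # univariate smoothing covariate U
    x = rng.uniform(0.5, 2.0, n)       # kept positive so x + beta2 > 0
    eps = rng.normal(0.0, 0.5, n)      # mean-zero random error
    y = np.sin(2 * np.pi * u) + beta[0] * x / (x + beta[1]) + eps
    return x, u, y

rng = np.random.default_rng(1)
x, u, y = simulate_pnl(200, np.array([18.0, 0.8]), rng)
```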
2.1 Profile nonlinear least squares approach
For model (2.1), define the nonlinear least squares function

Q(α, β) = ∑_{i=1}^n {yi − α(ui) − g(xi; β)}2.  (2.2)
We first apply the profile least squares technique to estimate the parameters through (2.2) as follows. For a given β, let yi* = yi − g(xi; β). Then {(ui, yi*), i = 1, ···, n} can be viewed as a sample from the model

yi* = α(ui) + εi.  (2.3)
This is a one-dimensional ordinary nonparametric regression model; thus, α(·) can be estimated by any linear smoother. Here we employ local linear regression (Fan and Gijbels, 1996). For any u in a neighborhood of u0, it follows by Taylor’s expansion that

α(u) ≈ α(u0) + α′(u0)(u − u0) ≜ a + b(u − u0).
Let K(·) be a kernel function and h be a bandwidth. The local linear regression approach finds the local parameters (a, b) that minimize

∑_{i=1}^n {yi − g(xi; β) − a − b(ui − u0)}2 Kh(ui − u0),
where Kh(·) = h−1K(·/h). Note that local linear regression results in a linear estimator for α(·) in terms of the adjusted responses yi − g(xi; β). Let y = (y1, ···, yn)T, α = {α(u1), ···, α(un)}T, and g(β) = {g(x1; β), ···, g(xn; β)}T. Thus, the local linear estimate for α is

α̂ = Sh{y − g(β)},  (2.4)
where Sh is an n × n smoothing matrix depending only on u1, ···, un and the bandwidth h.
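Because the local linear fit is linear in the responses, the matrix Sh can be formed explicitly. Below is a minimal sketch, assuming the Epanechnikov kernel used later in Section 3; the h−1 factor in Kh cancels when the weights are normalized, so the raw kernel suffices. A defining property worth checking is that a local linear smoother reproduces any linear function of u exactly.

```python
import numpy as np

def local_linear_smoother(u, h):
    """Build the n x n local linear smoother matrix S_h over design points u.

    Row i gives weights so that (S_h y)_i is the local linear fit at u_i.
    Epanechnikov kernel; the 1/h factor in K_h cancels on normalization.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    S = np.zeros((n, n))
    for i in range(n):
        d = u - u[i]                    # u_j - u_0 for fitting point u_0 = u_i
        t = d / h
        k = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)  # kernel weights
        s1 = np.sum(k * d)
        s2 = np.sum(k * d**2)
        w = k * (s2 - d * s1)           # local linear weights at u_i
        S[i] = w / np.sum(w)
    return S
```

Applying this matrix to y − g(β) for a given β yields α̂ as in (2.4).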
Substituting α̂ for α in (2.2) results in the profile nonlinear least squares criterion:

Q(β) = ||(In − Sh){y − g(β)}||2,  (2.5)
where In is the n × n identity matrix, and ||·|| stands for the Euclidean norm. Minimizing Q(β) yields the nonlinear profile least squares estimator β̂P. The idea of the nonlinear profile least squares approach stems from the generalized profile likelihood principle (Severini and Wong, 1992). Let g′(x; β) be ∂g(x; β)/∂β, and β0 the true value of β. Denote a⊗2 = aaT and define

A = E[{g′(x; β0) − E(g′(x; β0)|U)}⊗2].
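As a sketch of the whole procedure, the code below minimizes Q(β) = ||(In − Sh){y − g(β)}||2 numerically over β. The model, the use of SciPy's Nelder-Mead optimizer, and the starting value are all illustrative assumptions; the paper's own computations were done in SAS.

```python
import numpy as np
from scipy.optimize import minimize

def smoother_matrix(u, h):
    # Local linear smoother matrix S_h with the Epanechnikov kernel.
    n = len(u)
    S = np.empty((n, n))
    for i in range(n):
        d = u - u[i]
        t = d / h
        k = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)
        w = k * (np.sum(k * d**2) - d * np.sum(k * d))
        S[i] = w / np.sum(w)
    return S

def profile_nls(y, x, u, g, beta_start, h):
    """Profile nonlinear least squares: minimize ||(I - S_h){y - g(x; b)}||^2."""
    R = np.eye(len(y)) - smoother_matrix(u, h)   # (I_n - S_h), fixed in b
    Q = lambda b: np.sum((R @ (y - g(x, b)))**2)
    return minimize(Q, beta_start, method="Nelder-Mead").x, Q

# Illustrative data: g(x; beta) = beta1*x/(x + beta2), alpha(u) = sin(2*pi*u).
rng = np.random.default_rng(2)
n = 200
u = rng.uniform(0.0, 1.0, n)
x = rng.uniform(0.5, 2.0, n)
g = lambda x, b: b[0] * x / (x + b[1])
beta0 = np.array([18.0, 0.8])
y = np.sin(2 * np.pi * u) + g(x, beta0) + rng.normal(0.0, 0.5, n)
beta_start = beta0 + np.array([1.0, 0.1])        # crude starting value
beta_hat, Q = profile_nls(y, x, u, g, beta_start, h=0.1)
```

Note that Sh does not depend on β, so (In − Sh) is computed once; the cost per objective evaluation is one matrix-vector product.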
Theorem 1
Suppose that the matrix A is finite and positive definite, and Conditions (A)—(H) in the Web Appendix hold. Then

√n(β̂P − β0) →D N(0, σ2A−1),

where “→D” stands for convergence in distribution.
The proof of Theorem 1 is given in the Web Appendix. From Theorem 1, the asymptotic variance of β̂P, i.e., σ2A−1/n, can be estimated by

σ̂2[∑_{i=1}^n {g′(xi; β̂P) − ḡ(ui)}⊗2]−1,  (2.6)

where ḡ(ui) is the local linear estimate of E{g′(x; β0)|U = ui} based on the sample {uj, g′(xj; β̂P)}, j = 1, ···, n, and σ̂2 is the residual mean squared error of the profile nonlinear least squares fit.
2.2 Linear approximation approach
Since g(β) is nonlinear in β, there is no closed form for the minimizer of Q(β) in (2.5). Typically, we employ the Newton-Raphson algorithm for the minimization, which requires computing Shg(β), Shg′(β) and Shg″(β) at each iteration. That is, we have to carry out nonparametric smoothing (i.e., local linear regression in this paper) many times at each step, so minimizing Q(β) involves heavy computation. We next propose an alternative estimation procedure that requires much less computation. This approach needs β̂I, a consistent estimate of β, as a starting point; an easy way to construct β̂I is introduced in (2.11). Applying a Taylor expansion to g(·; β) at β̂I, we have

g(xi; β) ≈ g(xi; β̂I) + g′(xi; β̂I)T(β − β̂I).

Thus, it follows that

yi ≈ α(ui) + g(xi; β̂I) + g′(xi; β̂I)T(β − β̂I) + εi.  (2.7)
Let zi = yi − g(xi; β̂I) + g′(xi; β̂I)Tβ̂I. We consider the following linear approximation model

zi = α(ui) + g′(xi; β̂I)Tβ + εi.  (2.8)
Existing estimation procedures can be directly employed to estimate β through the partially linear model (2.8). Here we use the profile least squares technique. For a given β, let zi* = zi − g′(xi; β̂I)Tβ; then (2.8) leads to

zi* = α(ui) + εi.
Local linear regression can be used to estimate the baseline function α(u). Denote z = (z1, ···, zn)T, g′(β̂I) = {g′(x1; β̂I), ···, g′(xn; β̂I)}T and α = {α(u1), ···, α(un)}T. Since local linear regression is a linear smoother, we have

α̂ = Sh{z − g′(β̂I)β}.

Substituting α̂ for α in (2.8), it follows that

(In − Sh)z ≈ (In − Sh)g′(β̂I)β + (In − Sh)ε.

We therefore estimate β by the ordinary least squares estimator,

β̂L = {g′(β̂I)T(In − Sh)T(In − Sh)g′(β̂I)}−1g′(β̂I)T(In − Sh)T(In − Sh)z.  (2.9)
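The one-step estimator (2.9) has a closed form once Sh, the gradient matrix g′(β̂I), and the working response z are in hand. A minimal sketch, assuming an illustrative Michaelis-Menten-type g(x; β) = β1x/(x + β2) (the form fitted to the ecology data in Section 3); the function and variable names are ours, not the paper's.

```python
import numpy as np

def smoother_matrix(u, h):
    # Local linear smoother matrix S_h with the Epanechnikov kernel.
    n = len(u)
    S = np.empty((n, n))
    for i in range(n):
        d = u - u[i]
        t = d / h
        k = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)
        w = k * (np.sum(k * d**2) - d * np.sum(k * d))
        S[i] = w / np.sum(w)
    return S

def linear_approx_estimator(y, x, u, g, gprime, beta_init, h):
    """One-step estimator (2.9): OLS of (I - S_h)z on (I - S_h)g'(beta_init)."""
    R = np.eye(len(y)) - smoother_matrix(u, h)
    G = gprime(x, beta_init)                   # n x p matrix of gradients
    z = y - g(x, beta_init) + G @ beta_init    # working response z_i
    return np.linalg.lstsq(R @ G, R @ z, rcond=None)[0]

# Illustrative g and its gradient in beta.
g = lambda x, b: b[0] * x / (x + b[1])
gprime = lambda x, b: np.column_stack([x / (x + b[1]),
                                       -b[0] * x / (x + b[1])**2])
```

A useful sanity check: with no noise, α identically zero and β̂I equal to the truth, the working model is exact and (2.9) recovers β exactly; with real data one plugs in the difference-based β̂I of (2.11).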
Theorem 2
Suppose that the matrix A is finite and positive definite. Under Conditions (A)—(H) given in the Web Appendix and ||β̂I − β0|| = OP(n−1/2),

√n(β̂L − β0) →D N(0, σ2A−1).
The proof of Theorem 2 is given in the Web Appendix. From Theorems 1 and 2, β̂P and β̂L share the same asymptotic distribution. Similar to (2.6), Var{β̂L} can be estimated by using its sample counterpart.
In practical implementation, one needs to obtain β̂I first. Reorder the data from smallest to largest according to the values of {ui}, and denote by (x(i), u(i), y(i)), i = 1, ···, n, the ordered sample. Because α(·) is smooth and the ordered u(i)'s are close, first differencing approximately removes the nonparametric component. Similar to Yatchew (1997), we obtain the following approximate model,

y(i) − y(i−1) ≈ g(x(i); β) − g(x(i−1); β) + ε(i) − ε(i−1), i = 2, ···, n,  (2.10)

and the nonlinear least squares estimate

β̂I = argminβ ∑_{i=2}^n {y(i) − y(i−1) − g(x(i); β) + g(x(i−1); β)}2  (2.11)

will be used as the initial value of β.
Under some conditions, β̂I is root n consistent (Li and Nie, 2007).
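A sketch of this difference-based starting value, in the spirit of Yatchew (1997): after sorting by u, first differences (approximately) cancel the smooth α(·), and nonlinear least squares on the differenced data gives β̂I. The helper name and optimizer choice are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def diff_initial_estimator(y, x, u, g, beta_guess):
    """Difference-based beta_I, cf. (2.10)-(2.11): sort by u, difference
    out alpha(u), then run nonlinear least squares on the differences."""
    order = np.argsort(u)
    ys, xs = y[order], x[order]
    dy = np.diff(ys)                                 # y_(i) - y_(i-1)
    sse = lambda b: np.sum((dy - np.diff(g(xs, b)))**2)
    return minimize(sse, beta_guess, method="Nelder-Mead").x
```

Under smoothness of α(·), the differences α(u(i)) − α(u(i−1)) are of smaller order than the noise, which is why β̂I can attain root n consistency as cited in the text.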
Remark
To ensure root n consistency of β̂L, the assumption ||β̂I − β0|| = OP(n−1/2) is critical in the linear approximation approach; an inconsistent estimate β̂I may result in an inconsistent estimate β̂L. The finite sample performance of β̂L may be sensitive to the initial value β̂I, which is confirmed by our simulation study. The efficiency of β̂L also depends on the intrinsic curvature of the partially nonlinear model (Zhu, Tang and Wei, 2000). If the intrinsic curvature at β̂I is large, or β̂I is not root n consistent, then we suggest the following iterative algorithm. For a given initial value β̂(0) (we may set β̂(0) = β̂I), we iteratively compute
β̂(m+1) = {g′(β̂(m))T(In − Sh)T(In − Sh)g′(β̂(m))}−1g′(β̂(m))T(In − Sh)T(In − Sh)z(m)  (2.12)

until convergence, where the i-th element of z(m) equals yi − g(xi; β̂(m)) + g′(xi; β̂(m))Tβ̂(m). As shown in the Web Appendix, algorithm (2.12) is equivalent to using the Fisher scoring algorithm to minimize Q(β) in (2.5). As shown in the proofs of Theorems 1 and 2, both n−1{Q″(β0)}/2 and n−1{g′(β0)T(In − Sh)T(In − Sh)g′(β0)} tend to A in probability; therefore the resulting estimate from algorithm (2.12) shares the same efficiency as the profile nonlinear least squares estimate.
2.3 Inference procedures for the baseline function
For a given root n consistent estimate β̂ of β, we estimate α(·) by smoothing the partial residuals {ri = yi − g(xi; β̂), i = 1, ···, n} over ui. Here β̂ can be either β̂P or β̂L. We utilize local linear regression (Fan and Gijbels, 1996) to estimate α(u). Denote by (â, b̂) the minimizer of

∑_{i=1}^n {ri − a − b(ui − u0)}2 Kh(ui − u0).  (2.13)

Then α̂(u0; β̂) = â.
The asymptotic bias and variance of α̂(u0; β̂) are given in the following theorem. They are the same as those with β known. This is because when β̂ is root n consistent, its convergence rate is faster than that of the nonparametric estimator; consequently, the error in estimating β is negligible in the nonparametric estimation of α.
Theorem 3
Suppose that ||β̂ − β0|| = OP(n−1/2). Under Conditions (A)—(C) given in the Web Appendix, if h → 0 and nh → ∞, then for any u0 ∈ Ω, the support of U, we have

√(nh){α̂(u0; β̂) − α(u0) − (h2/2)μ2α″(u0) + oP(h2)} →D N(0, ν0σ2/f(u0)),

where μ2 = ∫ t2K(t) dt, ν0 = ∫ K2(t) dt, and f(·) is the marginal density of U.
The bias and variance expressions are the same as those in Fan (1993). Thus, the theoretically optimal bandwidth remains the same as that for the one-dimensional nonparametric regression model with β known. The proof of Theorem 3 is similar to that in Fan (1993), so we omit the details. In practice, the bandwidth can be selected by existing data-driven methods. Specifically, let r̂i = yi − g(xi; β̂), and then apply an existing bandwidth selection procedure, such as the plug-in bandwidth selector (Ruppert, Sheather and Wand, 1995), to (ui, r̂i), i = 1, ···, n, to select a bandwidth h. This approach is implemented in Section 3.2.
In practice, a natural question that may arise is whether α(·) has a pre-specified parametric form. This leads us to consider the following hypothesis testing problem:

H0: α(u) = α0(u, θ) versus H1: α(u) ≠ α0(u, θ),

where α0(·, θ) is a pre-specified parametric function and θ is a vector of unknown parameters. For example, if we are interested in testing whether α(·) really depends on U, we may consider the null hypothesis α(·) = α0, where θ = α0 is an unknown constant. This kind of hypothesis testing is referred to as a nonparametric goodness of fit test. In this section, we propose a generalized F test for the partially nonlinear model. For simplicity of presentation, we consider only a test of linearity for α(·):
H0: α(u) = α0 + α1u versus H1: α(u) ≠ α0 + α1u,  (2.14)
where α0 and α1 are unknown constants. Note that the parameter space under the null hypothesis is finite dimensional, while it is infinite dimensional under the alternative hypothesis. Thus, many traditional tests, such as the F test, cannot be applied directly to the above hypothesis. Here we propose a generalized F test to deal with this issue. Let β̂ be either β̂P or β̂L, a root n consistent estimate of β under H1; let (α̂0, α̂1)T be an estimate of (α0, α1)T under H0, and α̂(·) an estimate of α(·) under H1. Denote by RSS(H0) = ∑_{i=1}^n {yi − α̂0 − α̂1ui − g(xi; β̂)}2 the residual sum of squares under H0, and by RSS(H1) = ∑_{i=1}^n {yi − α̂(ui) − g(xi; β̂)}2 the residual sum of squares under H1. As in linear models, it is natural to employ an F-type test to compare the model fits under H0 and H1. Since the parameter space is infinite dimensional under the alternative hypothesis, we define a generalized F test statistic

F = n{RSS(H0) − RSS(H1)}/RSS(H1).  (2.15)
Intuitively, under H0 there will be little difference between RSS(H0) and RSS(H1). Under the alternative hypothesis, however, RSS(H0) becomes systematically larger than RSS(H1), and the test statistic F tends to take a large positive value; a large value of F thus indicates that the null hypothesis should be rejected. The following theorem gives the asymptotic null distribution of F.
Theorem 4
Suppose that ||β̂ − β0|| = OP(n−1/2), E|ε|4 < ∞ and Conditions (A)—(C) in the Web Appendix hold. If h → 0 in such a way that nh3/2 → ∞, then, under H0 in (2.14), (rK/2)F has an asymptotic χ2 distribution with δn degrees of freedom, where δn = rK|Ω|{K(0) − 0.5 ∫ K2(u) du}/h, |Ω| stands for the length of the support of U, and rK = {K(0) − 0.5 ∫ K2(t) dt}/∫ {K(t) − 0.5K ∗ K(t)}2 dt. Here K ∗ K stands for the convolution of K with itself.
The proof of Theorem 4 is given in the Web Appendix. The value of rK is 1.2000, 2.1153, 2.3061, 2.3797 and 2.5375 for the uniform, Epanechnikov, biweight, triweight and Gaussian kernels, respectively. Since the F test and the likelihood ratio test are equivalent for linear models with normally distributed errors, the theorem above extends the generalized likelihood ratio theory (Fan, Zhang and Zhang, 2001) to model (2.1) and unveils a new Wilks phenomenon for partially nonlinear models: the asymptotic null distribution is chi-square and does not depend on the unknown parameters (α0, β0). We will also provide empirical justification for the null distribution. Similar to Cai, Fan and Li (2000), the null distribution of the generalized F test can be estimated by Monte Carlo simulation or a bootstrap procedure. This usually provides a better estimate than the asymptotic null distribution, since the degrees of freedom tend to infinity and the theorem gives only the leading order of the degrees of freedom.
Remark
Note that for a χ2 random variable with r degrees of freedom, Var{χ2(r)} = 2E{χ2(r)}. Applying this to (rK/2)F gives rK = 4E(F)/Var(F). Using this relationship, we can derive a better approximation to the normalizing constant rK from bootstrap samples of F. Specifically, let mean(F*) and var(F*) be the sample mean and the sample variance of the bootstrap samples {Fi*, i = 1, ···, N} of F under H0. Then rK can be replaced by r̂K = 4 mean(F*)/var(F*). This will be implemented in Section 3.
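The testing recipe can be sketched as follows, assuming β̂ has already been computed so that we can work with the partial residuals ri = yi − g(xi; β̂). The scaling F = n{RSS(H0) − RSS(H1)}/RSS(H1), and all function names, are our assumptions, chosen to be consistent with the (rK/2)F calibration in Theorem 4; the bootstrap resamples centered H0 residuals, as suggested in the text.

```python
import numpy as np

def smoother_matrix(u, h):
    # Local linear smoother matrix S_h with the Epanechnikov kernel.
    n = len(u)
    S = np.empty((n, n))
    for i in range(n):
        d = u - u[i]
        t = d / h
        k = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)
        w = k * (np.sum(k * d**2) - d * np.sum(k * d))
        S[i] = w / np.sum(w)
    return S

def gen_f_stat(r, u, S):
    """Generalized F from partial residuals r: linear fit under H0
    versus local linear fit under H1."""
    X = np.column_stack([np.ones_like(u), u])
    fit0 = X @ np.linalg.lstsq(X, r, rcond=None)[0]
    rss0 = np.sum((r - fit0)**2)
    rss1 = np.sum((r - S @ r)**2)
    return len(r) * (rss0 - rss1) / rss1, fit0

def bootstrap_pvalue(r, u, h, B, rng):
    S = smoother_matrix(u, h)
    f_obs, fit0 = gen_f_stat(r, u, S)
    e0 = r - fit0
    e0 = e0 - e0.mean()                 # centered H0 residuals
    f_star = np.empty(B)
    for b in range(B):
        r_b = fit0 + rng.choice(e0, size=len(r), replace=True)
        f_star[b] = gen_f_stat(r_b, u, S)[0]
    return f_obs, np.mean(f_star >= f_obs)

# Illustrative check under H0: a truly linear baseline.
rng = np.random.default_rng(5)
n = 150
u = rng.uniform(0.0, 1.0, n)
r = 1.0 + 2.0 * u + rng.normal(0.0, 0.5, n)
f_obs, pval = bootstrap_pvalue(r, u, h=0.15, B=200, rng=rng)
```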
3. Numerical study and Application
In this section, we assess the finite sample performance of the proposed procedures by Monte Carlo simulations and illustrate the proposed methodology through an application to the real data example. All simulations were conducted in SAS. In our simulations, the kernel function is taken to be the Epanechnikov kernel K(u) = 0.75(1 − u2)+, and β̂I in (2.11) is used as the initial value for β̂P. From our limited experience, the profile nonlinear least squares method is not sensitive to the specification of the initial value, although a good initial value leads to faster convergence of the algorithm.
3.1 Monte Carlo simulations
We generate random samples from the following model

y = α(u) + g(x; β) + ε,
where ε ~ N(0, 1) and U ~ U(0, 1), the uniform distribution over [0,1]. In our simulation, we consider the following two baseline functions:
and two nonlinear functions g(·; ·): first,

g1(x; β) = −β1x/(x + β2),

with β1 = 18 and β2 = 0.8, which were chosen to match the estimates for the real data example in Section 3.2, and x ~ N(0, 1); and
with β1 = β2 = 1. The covariate vector (x1, x2)T was simulated from a normal distribution with mean zero and cov(xi, xj) = 0.5|i−j|. The sample size n was set to 200 or 400. For each case, we conducted 1,000 Monte Carlo simulations.
Performance of β̂
Simulation results for β̂ are summarized in Table 1, in which “Mean” stands for the average of the 1000 estimates of β, and “SD” is the standard deviation of the 1000 estimates and can be regarded as the true standard error of β̂. “SE” and “Std(SE)” are the average and the standard deviation of the 1000 estimated standard errors computed with the formulas proposed in Section 2. From Table 1, we can see that the two estimation procedures for β perform almost identically, and both perform well. The SEs slightly underestimate the SD for sample size n = 200, which is typical for sandwich-formula estimators (Kauermann and Carroll, 2001). In general, the difference between the SD and the corresponding SE is less than one standard deviation of the SEs, indicating that the proposed formulas work very well. Results in Table 1 correspond to the bandwidth h = 0.1, which approximately equals the optimal bandwidth selected by the plug-in method of Ruppert, Sheather and Wand (1995). To study the effect of the choice of bandwidth, we considered a smaller bandwidth h = 0.05 and obtained similar results (data not shown). Results for the other cases are similar and omitted here. We have also conducted simulations with sample sizes n = 50 and 100; the results are similar to those in Table 1 and are not reported to save space.
Table 1.
Finite Sample Performance of β̂P and β̂L with h = 0.1
| Profile Nonlinear LS | Linear Approximation | ||||||
|---|---|---|---|---|---|---|---|
| β | (n, α, g) | Mean | SD | SE(Std(SE)) | Mean | SD | SE(Std(SE)) |
| β1 = 18 | (200, α1, g1) | 18.294 | 1.524 | 1.417(0.452) | 18.290 | 1.521 | 1.391(0.440) |
| (400, α1, g1) | 18.145 | 1.004 | 0.963(0.193) | 18.144 | 1.004 | 0.954(0.191) | |
| (200, α2, g1) | 18.294 | 1.524 | 1.416(0.452) | 18.290 | 1.521 | 1.390(0.440) | |
| (400, α2, g1) | 18.145 | 1.004 | 0.963(0.193) | 18.144 | 1.004 | 0.953(0.191) | |
| β2 = 0.8 | (200, α1, g1) | 0.829 | 0.165 | 0.153(0.048) | 0.829 | 0.164 | 0.150(0.047) |
| (400, α1, g1) | 0.814 | 0.106 | 0.104(0.020) | 0.814 | 0.106 | 0.103(0.020) | |
| (200, α2, g1) | 0.829 | 0.165 | 0.153(0.048) | 0.829 | 0.164 | 0.150(0.047) | |
| (400, α2, g1) | 0.814 | 0.106 | 0.104(0.020) | 0.814 | 0.106 | 0.103(0.020) | |
| β1 = 1 | (200, α1, g2) | 1.001 | 0.030 | 0.029(0.002) | 1.001 | 0.030 | 0.029(0.002) |
| (400, α1, g2) | 1.000 | 0.021 | 0.021(0.001) | 1.000 | 0.021 | 0.021(0.001) | |
| (200, α2, g2) | 1.001 | 0.030 | 0.029(0.002) | 1.001 | 0.029 | 0.029(0.002) | |
| (400, α2, g2) | 1.000 | 0.020 | 0.021(0.001) | 1.000 | 0.020 | 0.021(0.001) | |
| β2 = 1 | (200, α1, g2) | 1.000 | 0.029 | 0.029(0.002) | 1.000 | 0.029 | 0.029(0.002) |
| (400, α1, g2) | 1.001 | 0.021 | 0.021(0.001) | 1.001 | 0.021 | 0.021(0.001) | |
| (200, α2, g2) | 1.000 | 0.029 | 0.029(0.002) | 1.000 | 0.029 | 0.029(0.002) | |
| (400, α2, g2) | 1.001 | 0.021 | 0.021(0.001) | 1.001 | 0.021 | 0.021(0.001) | |
Performance of α̂(·)
The performance of α̂(·) is assessed by the square root of the average squared errors (RASE),

RASE = [ngrid−1 ∑_{k=1}^{ngrid} {α̂(uk) − α(uk)}2]1/2,

where {uk, k = 1, ···, ngrid} are the grid points at which α̂(·) is evaluated. In our simulation, the bandwidth was set to 0.1 or 0.2. The sample averages of the RASEs based on 1000 replicates are listed in Table 2, which reports only the case with g(x; β) = −β1x/(x + β2). Results for the other case are similar and are omitted here to save space.
Table 2.
Mean of RASEs for Baseline Function
| Profile Nonlinear LS | Linear Approximation | |||
|---|---|---|---|---|
| (n, g) | h = 0.1 | h = 0.2 | h = 0.1 | h = 0.2 |
| (200, g1) | 0.123 | 0.117 | 0.123 | 0.116 |
| (400, g1) | 0.056 | 0.058 | 0.056 | 0.058 |
| (200, g2) | 0.039 | 0.032 | 0.039 | 0.032 |
| (400, g2) | 0.012 | 0.021 | 0.012 | 0.021 |
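The RASE criterion defined in this subsection is straightforward to compute; a minimal helper (the function name is ours):

```python
import numpy as np

def rase(alpha_hat, alpha_true):
    """Square root of the average squared error of alpha_hat over a grid."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    alpha_true = np.asarray(alpha_true, dtype=float)
    return float(np.sqrt(np.mean((alpha_hat - alpha_true)**2)))
```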
Performance of the generalized F test
We first verify the Wilks-type result in Theorem 4 empirically. To this end, we take H0: α(u) = a0 + a1u, where (a0, a1) = argmin_{c0,c1} E{α(U) − c0 − c1U}2. Such a choice of (a0, a1) poses a challenge to the generalized F test in the power calculation below. We then find the null distribution based on 1,000 bootstrap simulations. We present only results for α(u) = α1(u); results for α2(u) are similar. Figure 1(a) and (c) depict the estimated density of the null distribution of (rK/2)F for g = g1 and g2, respectively. We also plot the density of the chi-square distribution in order to examine whether the null distribution is close to a chi-square distribution; the degrees of freedom are chosen to be the integer closest to the sample average of the (rK/2)F values from the 1,000 bootstrap samples. It is clear from Figure 1(a) and (c) that the Wilks-type result holds, i.e., the chi-square distribution is a good approximation to the null distribution of (rK/2)F.
Figure 1.
Plot of the null distribution of the generalized F test and its power function (n = 200). (a) and (c) show the density of (rK/2)F under the null hypothesis: the solid line is the estimated density curve using kernel density estimation, and the dash-dotted line is the density curve of a χ2-distribution with 15 and 21 degrees of freedom for (a) and (c), respectively. (b) and (d) show the power functions at levels α = 0.01, 0.05 and 0.10, from bottom to top.
To examine the power of the proposed nonparametric goodness of fit test, we evaluate the power of the generalized F test under a sequence of alternative models indexed by a constant c, where the baseline function equals either α1(·) or α2(·) and c = 0 corresponds to the null model. We took c = 0.2, 0.4 and 0.6. Results for α2(·) are similar to those for α1(·), so we present only results for α1(·).
Figure 1(b) and (d) depict the three power functions based on 400 Monte Carlo simulations with sample size n = 200, at three significance levels: 0.10, 0.05 and 0.01. The powers at c = 0 for these three significance levels are 0.095, 0.043 and 0.018 for Figure 1(b), and 0.090, 0.048 and 0.018 for Figure 1(d), respectively. This shows that the bootstrap method yields a test with approximately the right size.
3.2 An Application
In this section, we illustrate the proposed methodology by an analysis of a real data set in ecology. Of interest in this example is how temperature affects the relationship between the net ecosystem CO2 exchange (NEE) and the photosynthetically active radiation (PAR). The data set consists of 1997 observations of NEE, PAR and temperature (T), collected over a subalpine forest at an elevation of approximately 3050 meters, using three-dimensional sonic anemometers mounted on towers, during parts of the growth season of 1999. For data collected in laboratory experiments, in which the temperature can be well controlled, the following model is widely used to describe the NEE and PAR relationship:

NEE = R − β1PAR/(PAR + β2) + ε,  (3.1)

where ε is a random error with zero mean, and R, β1 and β2 are unknown parameters with physical interpretations. Specifically, R is the dark respiration rate, β1 is the light-saturated net photosynthetic rate, and β1/β2 is the apparent quantum yield.
In order to demonstrate that model (2.1) is appropriate, we take NEE as the response variable, and PAR and T as covariates. We first consider a fully nonparametric regression model:

NEE = m(PAR, T) + ε,
where m(·, ·) is an unspecified smooth function, and ε is a random error with mean 0 and variance σ2. Two-dimensional kernel regression was used to estimate m(·, ·). To examine how temperature affects the parameters in model (3.1), we plot m̂(PAR, T) versus PAR for given values of T. The three lines in Figure 2(a) depict m̂(PAR, T) over PAR for T = 10.76, 13.29 and 15.41, which correspond to the three sample quartiles of temperature. From Figure 2(a), we can see a nonlinear relationship between PAR and NEE when temperature is fixed. The nearly parallel pattern of the three lines suggests that the parameters β1 and β2 do not vary with temperature, while the different intercepts of the three lines show that the dark respiration rate R changes with temperature. The monotone decreasing pattern of the lines implies that model (3.1) may be appropriate when temperature is fixed.
Figure 2.
Plots for the example in Section 3.2. The dashed lines in (a) are the fitted values of NEE versus PAR given temperature using two-dimensional kernel regression; from bottom to top, the temperatures are 10.760, 13.290 and 15.410, which correspond to the three sample quartiles of temperature. In (b), the solid line is an estimate of the baseline function, and the dots are the partial residuals yi − β̂1xi/(xi + β̂2) with (β̂1, β̂2) = (17.644, 0.808). In (c), the solid line is an estimate of the density of (rK/2)F under H0, and the dotted line is the density of a chi-square distribution with 18 degrees of freedom.
From the above analysis, we consider the partially nonlinear model

NEE = R(T) − β1PAR/(PAR + β2) + ε.  (3.2)
The proposed estimation procedures for β were used to fit the data set with model (3.2). We first computed the difference-based estimate β̂I, and then used the plug-in method of Ruppert, Sheather and Wand (1995) to select a bandwidth; the resulting optimal bandwidth is h = 1.286. With this bandwidth, the estimates (standard errors in parentheses) are β̂1 = 17.656 (0.398) and β̂2 = 0.806 (0.071) for the profile nonlinear least squares approach, and β̂1 = 17.644 (0.384) and β̂2 = 0.808 (0.063) for the linear approximation approach.
As proposed in Section 2.3, we estimate the baseline function by smoothing the partial residuals, which are depicted in Figure 2(b); the partial residuals show an increasing trend over temperature. Figure 2(b) also depicts the estimated baseline function using local linear regression.
From Figure 2(b), the overall increasing trend might suggest a simple linear model for R(T): R(T) = a + b×T. Thus, it is of interest to test H0: R(T) = a + b×T. Using the nonlinear least squares approach, we obtain the estimates under H0: R(T) = −2.793 + 0.554×T, β̃1 = 17.556 and β̃2 = 0.798. Note that (β̃1, β̃2) is very close to (β̂1, β̂2) = (17.644, 0.808), the estimate β̂L obtained under H1. This implies that the model under H0 fits the data almost as well as the one under H1, and indicates that the test of linearity is challenging for this data set. Setting (β̂1, β̂2) = (17.644, 0.808), the generalized F test statistic equals 34.460, with P-value < 0.001 obtained from 1000 bootstrap samples. The null distribution is depicted in Figure 2(c). Thus, the dark respiration rate is not linear in temperature. This example also demonstrates that the generalized F test has good power.
4. Discussion
Motivated by the real data example analyzed in Section 3.2, we considered a class of partially nonlinear models. We proposed two estimation procedures for the parameters in the parametric component and inference procedures for the baseline function. For the setting in which both ui and xi are fixed design points and the random error is normally distributed, Zhu and Wei (1999) derived an asymptotically efficient estimation procedure for partially nonlinear models using the concept of the least favorable curve introduced by Severini and Wong (1992). They focused on theoretical development rather than practical implementation; their method is quite abstract, and they did not discuss how to implement it. Compared with their method, both of the newly proposed methods are easy to implement, although the profile nonlinear least squares method requires more computation than the linear approximation method.
5. Supplementary Materials
The Web Appendix referenced in Section 2 is available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.
Acknowledgments
The authors thank the Editor, the associate editor and the referees for their constructive comments. These comments substantially improved an earlier draft. The authors are grateful to Dr. C. Yi for providing data analyzed in Section 3.2. Li’s research was supported by NSF grant DMS-0348869 and National Institute on Drug Abuse grant P50 DA10075.
Contributor Information
Runze Li, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802, email: rli@stat.psu.edu.
Lei Nie, Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, Washington, DC 20057, email: ln54@georgetown.edu.
References
- Bates D, Watts D. Nonlinear Regression Analysis and Its Applications. Wiley; 1988.
- Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. J Amer Statist Assoc. 2000;95:888–902.
- Chen H. Convergence rates for parametric components in a partly linear model. Ann Statist. 1988;16:136–146.
- Engle R, Granger C, Rice J, Weiss A. Semiparametric estimates of the relation between weather and electricity sales. J Amer Statist Assoc. 1986;81:310–320.
- Fan J. Local linear regression smoothers and their minimax efficiencies. Ann Statist. 1993;21:196–216.
- Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman and Hall; London: 1996.
- Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Amer Statist Assoc. 2004;99:710–723.
- Fan J, Zhang C, Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon. Ann Statist. 2001;29:153–193.
- Gu L, Baldocchi D, Verma S, Black TA, Vesala T, Falge EM, Dowty PR. Advantages of diffuse radiation for terrestrial ecosystem productivity. J Geophysical Research. 2002;107. doi: 10.1029/2001JD001242.
- Härdle W, Liang H, Gao J. Partially Linear Models. Springer; New York: 1999.
- Heckman N. Spline smoothing in partly linear models. J Royal Statist Soc B. 1986;48:244–248.
- Hu Z, Wang N, Carroll RJ. Profile-kernel versus backfitting in partially linear models for longitudinal/clustered data. Biometrika. 2004;91:252–261.
- Huet S. Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS and R Examples. Springer; New York: 2004.
- Kauermann G, Carroll R. A note on efficiency of sandwich covariance matrix estimation. J Amer Statist Assoc. 2001;96:1387–1396.
- Li R, Nie L. A new estimation procedure for partially nonlinear model via mixed effects approach. 2007. Submitted for publication. Available at http://www.stat.psu.edu/rli/research/partialnlnv1.pdf.
- Ma Y, Chiou JM, Wang N. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika. 2006;93:75–84.
- Monteith JL. Solar radiation and productivity in tropical ecosystems. Journal of Applied Ecology. 1972;9:747–766.
- Ruimy A, Kergoat L, Bondeau A, and Potsdam participants. Model intercomparison: Comparing global models of terrestrial net primary productivity: analysis of differences in light absorption and light-use efficiency. Global Change Biology. 1999;5:56–64.
- Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. J Amer Statist Assoc. 1995;90:1257–1270.
- Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; New York: 2003.
- Seber GAF, Wild CJ. Nonlinear Regression. Wiley; New York: 1989.
- Severini TA, Wong WH. Profile likelihood and conditionally parametric models. Ann Statist. 1992;20:1768–1802.
- Speckman P. Kernel smoothing in partial linear models. J Royal Statist Soc B. 1988;50:413–436.
- Wang N, Carroll RJ, Lin X. Efficient semiparametric marginal estimation for longitudinal/clustered data. J Amer Statist Assoc. 2005;100:147–157.
- Yatchew A. An elementary estimator of the partial linear model. Economics Letters. 1997;57:135–143.
- Zhu Z, Tang N, Wei B. On confidence regions of semiparametric nonlinear regression models (a geometric approach). Acta Mathematica Scientia. 2000;20:68–75.
- Zhu Z, Wei B. Asymptotic efficient estimation in semiparametric nonlinear regression models. J Appl Math Chinese Univ. 1999;14B:43–51.