GENERALIZED PARTIALLY LINEAR MIXED-EFFECTS MODELS INCORPORATING MISMEASURED COVARIATES

Hua Liang

doi:10.1007/s10463-007-0146-0

Author manuscript; available in PMC: 2010 Mar 1.

Published in final edited form as: Ann Inst Stat Math. 2009 Mar 1;61(1):27–46. doi: 10.1007/s10463-007-0146-0

PMCID: PMC2768363 NIHMSID: NIHMS78899 PMID: 20160899

Abstract

In this article we consider a semiparametric generalized mixed-effects model, and propose combining local linear regression with penalized quasilikelihood and local quasilikelihood techniques to estimate both population and individual parameters and nonparametric curves. The proposed estimators take into account the local correlation structure of the longitudinal data. We establish asymptotic normality for the estimators of the parametric component and an asymptotic expansion for the estimators of the nonparametric part. For practical implementation, we propose an appropriate algorithm. We also consider the measurement error problem in covariates in our model, and suggest a strategy for adjusting the effects of measurement errors. We apply the proposed models and methods to study the relation between virologic and immunologic responses in AIDS clinical trials, in which virologic response is classified into binary variables. A dataset from an AIDS clinical study is analyzed.

Key words and phrases: AIDS clinical trial, generalized linear mixed-effects models, linear mixed-effects model, local linear, local quasilikelihood, longitudinal data, measurement error, penalized quasilikelihood

1 Introduction

In recent years, the relation between viral load and CD4 cell counts has been well studied (Wu et al., 1999; Wu, 2002; Liang, Wu, and Carroll, 2003). These studies have investigated the concordance and discordance between virologic and immunologic variables and may help clinicians understand AIDS pathogenesis more deeply and improve therapy regimens. Analysis of these AIDS data poses statistical challenges. A common feature of the markers used to evaluate antiviral therapy, such as viral load measurements and CD4+ cell counts, is that their measurement produces longitudinal data, that is, a series of independent and dependent variables obtained by taking repeated measurements over time from each subject. Consideration of the features of longitudinal data, such as within-subject and between-subject variations and the correlation structure, is of practical interest.

Antiretroviral therapy for HIV-1 infected patients has greatly improved in recent years. Administration of drug cocktails consisting of three or more drugs may reduce the viral load and maintain it below the detection limit in many patients, although it is unlikely that combination therapies alone can eradicate HIV in infected patients, because long-lived infected cells and sites within the body exist where drugs may not be effective. Because of the success of highly active antiretroviral therapy for HIV infection, viral load in AIDS patients is suppressed to levels below the limit of quantification, and AIDS becomes a chronic disease. Clinicians and patients are therefore sometimes only interested in maintaining the viral load below the detection limit, and in how the immune system (measured by CD4 cell counts) relates to this suppression.

Our primary goal in this paper is to propose a generalized partially linear mixed-effects model (GPLMeM) to study the relation between binary viral load measurement and CD4 cell count. Our focus is not only on the population characteristic but also on individual diversities. This is particularly important in AIDS research because there is generally a large between-subject variation, which indicates the importance of estimating each individual’s parameters for the individualization of treatment management and care of patients with AIDS. We propose a new method to explore the population and individual characteristics by combining local linear regression (Fan and Gijbels, 1996), penalized quasilikelihood (PQL, Breslow and Clayton, 1993), and local quasilikelihood (Severini and Staniswalis, 1994) techniques.

For longitudinal binary data analysis, generalized linear mixed-effects models (GLMeM) have been proposed to incorporate the between-subject and within-subject variations. See, for example, Breslow and Clayton (1993) and Schall (1991) for the PQL method, and Zeger and Karim (1991) for the Gibbs sampler. Wang et al. (1998) considered generalized linear mixed models when one of the covariates is measured with error, and they proposed a simulation extrapolation (SIMEX, Stefanski and Cook, 1995) estimation method.

To weaken model assumptions against possible misspecification and to avoid the curse of dimensionality of fully nonparametric regression in the presence of several predictor variables, semiparametric models have been studied and used for longitudinal data analysis in the literature (Severini and Staniswalis, 1994). Lin and Carroll (2001) studied marginal semiparametric mixed-effects models and used the generalized estimating equation to estimate the parameter of interest. Wu and Zhang (2002) and Ke and Wang (2001) proposed semiparametric nonlinear mixed-effects models. Lin and Zhang (1999), Zhang (2004), and Zhang et al. (1998) used the smoothing spline to fit semiparametric mixed-effects models. These authors considered between-subject and within-subject variations in their estimation methods, but they mainly aimed at delineating population features, while individual characteristics were ignored.

The article is organized as follows. In Section 2 we formally introduce the model’s framework, propose an estimation algorithm, and develop asymptotic results; extend to the case that X is measured with error; and discuss practical implementation of the methods. We illustrate the methods by conducting a small simulation experiment and extensively analyzing a data set from an AIDS study in Section 3 and provide a discussion in Section 4. The proof of the theoretical results is given in the Appendix.

2 Generalized Partially Linear Mixed-effects Models

2.1 Model

Suppose that data are obtained from n independent subjects with outcome variables Y_ij, linear covariates X_ij(p × 1) and A_ij(q × 1), and nonlinear scalar Z_ij, where i = 1, …, n indicates the subjects; j = 1, …, n_i indicates the observation of the ith subject. Given the covariates X_ij, Z_ij, A_ij, and unobserved q × 1 random effects vectors b_i and unobserved random effects function c_i(·), the observations Y_ij are assumed to be independent with means μ_ij and variance ϕω_ijV(μ_ij), where ω_ij is a known weight and V(·) is a known variance function. The GPLMeM of Y given X, Z is specified by

g (μ_{i j}) = X_{i j}^{T} β + A_{i j}^{T} b_{i} + θ (Z_{i j}) + c_{i} (Z_{i j}),

(2.1)

where g(·) is a known monotonic differentiable link function, β is a p × 1 vector, and θ(·) and c_i(·) are unknown smooth functions. The function c_i(·) represents the random effects and can be regarded as a realization of a stochastic process with mean zero. The random effects b_i are independent of c_i(·) and are independently distributed N(0, Σ_b), where the covariance matrix Σ_b is determined by finitely many parameters. Given the covariates X_ij, A_ij and Z_ij, and b_i and c_i(·), the observations Y_ij and Y_ik are independent; i.e., E(Y_ij|X_ij, Z_ij) = E{Y_ij|X_ij, Z_ij, (X_ik, Z_ik)_{k≠j}}. In longitudinal data, Z_ij are often the treatment times, and X_ij, A_ij, and Y_ij are the observations of the ith subject at time Z_ij, as in the AIDS data set studied later.

2.2 Estimation

To estimate the parametric and nonparametric parts β, b_i, θ(·), and c_i(·), we combine local linear regression, the PQL technique, and the quasilikelihood principle. We estimate these parameters by considering local PQL, then we update the estimate of β and b_i, using all data points, by relying on the global PQL. In local PQL, we approximate θ(·) and c_i(·) by linear functions:

\begin{array}{l} θ(z) \approx θ(z_0) + θ'(z_0)(z - z_0) \equiv α_0 + α_1 (z - z_0), \\ c_i(z) \approx c_i(z_0) + c_i'(z_0)(z - z_0) \equiv ν_{i0} + ν_{i1}(z - z_0). \end{array}

Denote α = (α₀, α₁)^T, ν_i = (ν_i₀, ν_i₁)^T, and $Λ_{ij} = \begin{pmatrix} 1 \\ Z_{ij} - z_0 \end{pmatrix}$. In a neighborhood of z₀, model (2.1) can be approximated by a GLMeM and described as

g (μ_{i j}) = X_{i j}^{T} β + A_{i j}^{T} b_{i} + α^{T} Λ_{i j} + ν_{i}^{T} Λ_{i j},

where ν_i ~ N(0, Σ_ν) are random effects. In a conventional GLMeM setting, we can estimate α, β, b_i, and ν_i by using the PQL principle; i.e., maximize

- \sum_{i=1}^{n} \left\{ \sum_{j=1}^{n_i} Q(\tilde{μ}_{ij}, y_{ij}) + ν_i^T Σ_ν^{-1} ν_i + b_i^T Σ_b^{-1} b_i \right\}

with respect to α, β, b_i, and ν_i, where ${\tilde{μ}}_{i j} = g^{- 1} (X_{i j}^{T} β + A_{i j}^{T} b_{i} + Λ_{i j}^{T} α + Λ_{i j}^{T} ν_{i})$ and $Q (s, y) = - 2 \int_{y}^{s} V^{- 1} (t) (y - t) d t$ .
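As a concrete case (a worked example added here for illustration, not part of the original derivation), take a binary response with variance function V(t) = t(1 − t). Evaluating the integral that defines Q at the two possible outcomes gives:

```latex
Q(s, 1) = -2 \int_{1}^{s} \frac{1 - t}{t(1 - t)} \, dt = -2 \log s ,
\qquad
Q(s, 0) = -2 \int_{0}^{s} \frac{-t}{t(1 - t)} \, dt = -2 \log(1 - s) ,
\\
\text{so that} \quad Q(s, y) = -2 \{ y \log s + (1 - y) \log(1 - s) \}, \quad y \in \{0, 1\} .
```

In this case Q is the Bernoulli deviance, so maximizing −ΣQ amounts to maximizing the Bernoulli log-likelihood up to the factor 2, and the quadratic terms in ν_i and b_i act as penalties.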

To account for localization, we propose a local PQL approach. The motivation is essentially the same as for localization in conventional nonparametric regression.

Denote ρ_k(t) = {dg⁻¹(t)/dt}^k/[σ²V{g⁻¹(t)}] and q_k(t,y) = ∂^kQ{g⁻¹(t), y}/∂t^k for k = 1, 2, 3. Let κ_j = ∫ u^jK(u)du and μ_j = ∫ u^jK²(u)du for j = 0, 1, 2, $N = \sum_{i = 1}^{n} n_{i}$ , and $M = \sum_{i = 1}^{n} n_{i}^{2}$ .

Step 1

For each fixed z₀, we consider the maximization

- \sum_{i=1}^{n} \left\{ \sum_{j=1}^{n_i} K_h(Z_{ij} - z_0) Q(\tilde{μ}_{ij}, y_{ij}) + ν_i^T Σ_ν^{-1} ν_i + b_i^T Σ_b^{-1} b_i \right\}

(2.2)

with respect to α, β, b_i, and ν_i. K_h(·) = K(·/h)/h, K(·) is a kernel function, and h is a bandwidth.

Take the differential of (2.2) on α, β, b_i, and ν_i. Step 1 is equivalent to solving the equations

\sum_{i=1}^{n} \sum_{j=1}^{n_i} K_h(Z_{ij} - z_0)(Y_{ij} - \tilde{μ}_{ij}) ρ_1(\tilde{μ}_{ij}) \begin{pmatrix} 1 \\ Z_{ij} - z_0 \\ X_{ij} \end{pmatrix} = 0

(2.3)

and, for each i,

\sum_{j=1}^{n_i} K_h(Z_{ij} - z_0)(Y_{ij} - \tilde{μ}_{ij}) ρ_1(\tilde{μ}_{ij}) \begin{pmatrix} A_{ij} \\ 1 \\ Z_{ij} - z_0 \end{pmatrix} = \begin{pmatrix} Σ_b^{-1} & 0 \\ 0 & Σ_ν^{-1} \end{pmatrix} \begin{pmatrix} b_i \\ ν_i \end{pmatrix}.

(2.4)

Denote $Y_{i j}^{*} = g ({\tilde{μ}}_{i j}) + (Y_{i j} - {\tilde{μ}}_{i j}) g^{'} ({\tilde{μ}}_{i j}), {\tilde{Y}}_{i} = {(Y_{i 1}^{*}, \dots, Y_{i n_{i}}^{*})}^{T}$ , Z_i = (Z_i₁,…Z_{in_i})^T, and A_i similarly. Let K_ih = diag{K_h(Z_i₁ − z₀), …, K_h(Z_{in_i} − z₀)}, R_i = diag(R_i₁, …, R_{in_i}) with R_ij = {g′(μ̃_ij)}²V(μ̃_ij).

Equations (2.3) and (2.4) are approximately equivalent to iteratively fitting the following linear mixed-effects (Laird and Ware, 1982) model:

K_{ih}^{1/2} \tilde{Y}_i = K_{ih}^{1/2} \{ J_i(α + ν_i) + X_i β + A_i b_i \} + ε_i

(2.5)

where ε_i ~ (0, R_i) and J_i = (Λ_i₁, ···, Λ_{in_i})^T.

Consequently, the estimates defined by step 1 are approximately given in closed-form expressions (Davidian and Giltinan, 1995):

\begin{array}{l} \begin{pmatrix} \hat{α} \\ \hat{β} \end{pmatrix} = \left\{ \sum_{i=1}^{n} (J_i, X_i)^T Ω_i (J_i, X_i) \right\}^{-1} \left\{ \sum_{i=1}^{n} (J_i, X_i)^T Ω_i \tilde{Y}_i \right\}, \\ \begin{pmatrix} \hat{b}_i \\ \hat{ν}_i \end{pmatrix} = \begin{pmatrix} Σ_b & 0 \\ 0 & Σ_ν \end{pmatrix} (J_i, A_i)^T Ω_i \left\{ \tilde{Y}_i - (J_i, A_i) \begin{pmatrix} \hat{α} \\ \hat{β} \end{pmatrix} \right\}, \end{array}

where $Ω_i = K_{ih}^{1/2} Σ_i^{-1} K_{ih}^{1/2}$ with $Σ_i = R_i + K_{ih}^{1/2} (J_i Σ_ν J_i^T + A_i Σ_b A_i^T) K_{ih}^{1/2}$. θ(z₀) and c_i(z₀) are estimated by θ̂(z₀) = α̂₀ and ĉ_i(z₀) = ν̂_i₀.
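The first closed-form expression is a kernel-weighted generalized least-squares step. The following is a minimal NumPy sketch of that step only, assuming the working responses, kernel weights, and covariance components are already available; the function name `local_fixed_effects` and its argument layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_fixed_effects(J, X, A, Y_work, k_weights, R_diag, Sigma_nu, Sigma_b):
    """Closed-form (alpha, beta) for the working model (2.5) at one point z0.

    J, X, A, Y_work, k_weights, R_diag are lists with one entry per subject:
    J_i (n_i x 2), X_i (n_i x p), A_i (n_i x q), working response (n_i,),
    kernel weights K_h(Z_ij - z0) (n_i,), and the diagonal of R_i (n_i,).
    """
    lhs, rhs = 0.0, 0.0
    for J_i, X_i, A_i, y_i, k_i, r_i in zip(J, X, A, Y_work, k_weights, R_diag):
        K_half = np.diag(np.sqrt(k_i))
        # Sigma_i = R_i + K^{1/2} (J Sigma_nu J^T + A Sigma_b A^T) K^{1/2}
        random_part = J_i @ Sigma_nu @ J_i.T + A_i @ Sigma_b @ A_i.T
        Sigma_i = np.diag(r_i) + K_half @ random_part @ K_half
        # Omega_i = K^{1/2} Sigma_i^{-1} K^{1/2}
        Omega_i = K_half @ np.linalg.solve(Sigma_i, K_half)
        D_i = np.hstack([J_i, X_i])          # design (J_i, X_i)
        lhs = lhs + D_i.T @ Omega_i @ D_i
        rhs = rhs + D_i.T @ Omega_i @ y_i
    return np.linalg.solve(lhs, rhs)         # stacked (alpha_0, alpha_1, beta)
```

In the full procedure this step would be repeated with updated working responses and with estimated Σ_b, Σ_ν, and R_i; the sketch treats them as given.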

Note that β and b_i are global parameters and their estimates given in step 1 are based only on local information. We update their estimate, using all data, by considering a global penalized quasilikelihood procedure:

Step 2

Update the estimates of β and b_i by maximizing the objective function

- \sum_{i=1}^{n} \left\{ \sum_{j=1}^{n_i} Q(\tilde{μ}_{ij}^{*}, y_{ij}) + b_i^T Σ_b^{-1} b_i \right\}

(2.6)

with respect to β and b_i, where $\tilde{μ}_{ij}^{*} = g^{-1} \{ X_{ij}^T β + A_{ij}^T b_i + \hat{α}_0(Z_{ij}) + \hat{ν}_{i0}(Z_{ij}) \}$.

The expressions above contain the matrices Σ_b, Σ_ν, and R_i, which are generally unknown and need to be estimated. One may use maximum likelihood or restricted maximum likelihood to estimate the unknown components of Σ_b, Σ_ν, and R_i under the normality assumption. To implement these nontrivial methods, the EM algorithm and the Newton-Raphson method have been proposed (Laird and Ware, 1982; Davidian and Giltinan, 1995). Standard statistical software packages such as SAS and Splus/R provide user-friendly functions to implement these methods (the lme function in Splus/R and the SAS procedure MIXED). After obtaining the point estimates of the unknown components, we have Σ̂_b, Σ̂_ν, and R̂_i, and then Σ̂_i and Ω̂_i. The estimators of α, β, b_i, and ν_i can thus be obtained by substituting Σ̂_i and Ω̂_i in steps 1 and 2.

2.3 Asymptotic Results

We assume the following conditions, which are standard in the literature on generalized partially linear models.

Condition

The function q₂(t, y) < 0 for t ∈ R and y in the range of the response variable.
The random-effects functions c_i(z) are iid with zero-mean Gaussian marginal distributions, and have the same distribution as a random-effects curve c(z), a twice continuously differentiable function.
The density function f(z) of Z is positive and continuous at the point z₀.
The functions θ(·) and θ⁽²⁾(·) are continuous at the point z₀.
With R = X^Tβ + A^Tb + θ(Z) + c(Z), $E\{q_1^2(R, Y) \mid z\}$, $E\{q_1^2(R, Y) X \mid z\}$, and $E\{q_1^2(R, Y) X X^T \mid z\}$ are twice differentiable in z.
$E\{q_2^2(R, Y)\} < \infty$ and $E\{q_1^{2+δ}(R, Y)\} < \infty$ for some δ > 2.

Mh/N → λ, a finite nonnegative constant.

Let γ_y(z₁, z₂) be the covariance of [Y − g⁻¹{X^Tβ + A^Tb + θ(Z) + c(Z)}]ρ₁{X^Tβ + A^Tb+ θ (Z) + c(Z)} for Z = z₁ and z₂. γ_y(z₁, z₂) is continuous on z₁ and z₂.

The functions V″(·) and g‴(·) are continuous.

Theorem 1

Consider the estimator θ̂(z₀) given in step 1. Then, as n → ∞, h → 0 and Nh → ∞, and under the specified condition, we have the asymptotic expansion

\begin{array}{c} \hat{θ}(z_0) - θ(z_0) = b_θ(z_0) + \frac{1}{N f(z_0) E\{ρ_2(R) \mid Z = z_0\}} \sum_{i=1}^{n} \sum_{j=1}^{n_i} q_1(η_{ij}, y_{ij}) K_h(Z_{ij} - z_0) \\ + o_P\{(Nh)^{-1/2} + h^2\}, \end{array}

(2.7)

and hence

(Nh)^{1/2} \{\hat{θ}(z_0) - θ(z_0) - b_θ(z_0)\} \overset{D}{\to} N\{0, σ_θ^2(z_0)\},

(2.8)

where

b_θ(z_0) = \frac{h^2}{2} κ_2 \left[ θ''(z_0) + \frac{E\{c''(Z) ρ_2(R) \mid Z = z_0\}}{E\{ρ_2(R) \mid Z = z_0\}} \right]

and

σ_θ^2(z_0) = \frac{μ_0 E\{ρ_2(R) \mid Z = z_0\} + λ f(z_0) γ_y(z_0, z_0)}{f(z_0) [E\{ρ_2(R) \mid Z = z_0\}]^2}.

Theorem 2

Let β̂ be the estimate given in step 2. Under the condition, as n → ∞, Nh⁴ → 0 and Nh²/log(1/h) → ∞,

N^{1/2} (\hat{β} - β) \overset{D}{\to} N(0, B^{-1} Σ_1 B^{-1}),

(2.9)

where B = E[ρ₂(R)XX^T] and $Σ_1 = \operatorname{cov}\left( q_1(R, Y) \left[ X - \frac{E\{ρ_2(R) X \mid Z\}}{E\{ρ_2(R) \mid Z\}} \right] \right)$.

As in generalized partially linear models for cross-sectional data, Theorem 2 indicates that in order to estimate β at the root-n rate, one must undersmooth the nonparametric part θ(·). This undersmoothing requirement is standard in the kernel literature, although ordinary bandwidth rates are permissible in partially linear models.
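To make the bandwidth conditions of Theorem 2 concrete, the following worked comparison (an illustration added here; the specific rates and the constant c > 0 are assumptions, not choices made in the paper) checks an undersmoothed rate against the usual optimal rate for estimating θ(·) alone:

```latex
h = c N^{-1/3}: \quad N h^{4} = c^{4} N^{-1/3} \to 0, \qquad
\frac{N h^{2}}{\log(1/h)} = \frac{c^{2} N^{1/3}}{\tfrac{1}{3} \log N - \log c} \to \infty ;
\\
h = c N^{-1/5}: \quad N h^{4} = c^{4} N^{1/5} \to \infty .
```

The rate N^{-1/3} therefore satisfies both conditions, whereas N^{-1/5} violates Nh⁴ → 0, which is why a bandwidth tuned for the curve estimate alone is too large for root-n estimation of β.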

2.4 Implementation

To obtain the estimates by fitting (2.5), we give an initial value for μ̃_ij, say ${\tilde{μ}}_{i j}^{(0)}$ , and then have the working response variable

{\tilde{Y}}_{i j}^{(0)} = g ({\tilde{μ}}_{i j}^{(0)}) + (Y_{i j} - {\tilde{μ}}_{i j}^{(0)}) g^{'} ({\tilde{μ}}_{i j}^{(0)}) .

For a given bandwidth h, we fit model (2.5) and obtain estimates α̂, β̂, b̂_i, and ν̂_i. Update $g ({\hat{μ}}_{i j}) = {\hat{α}}_{0} + {\hat{ν}}_{i 0} + X_{i j}^{T} \hat{β} + A_{i j}^{T} {\hat{b}}_{i}$ and g′(μ̂_ij). Correspondingly, Ỹ_ij can be updated as g(μ̂_ij) + (y_ij − μ̂_ij)g′(μ̂_ij), and fit (2.5) again. Repeat these procedures until convergence, and collect the final estimates θ̂ (z₀) = α̂₀, ĉ_i(z₀) = ν̂_i₀.
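The iteration above can be sketched in a deliberately simplified setting. The code below assumes a logit link, a scalar covariate, and no random effects (b_i and ν_i are dropped), so it illustrates only the working-response update with kernel weights, not the full mixed-effects fit; `local_logistic_fit` and `quartic` are hypothetical names, and the quartic kernel is the one used in Section 3.

```python
import numpy as np

def quartic(u):
    """Quartic kernel K(u) = (15/16)(1 - u^2)^2 on |u| <= 1."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def local_logistic_fit(y, x, z, z0, h, max_iter=30, tol=1e-8):
    """Local linear logistic fit at z0 by iteratively reweighted least squares.

    Simplification (assumption of this sketch): fixed effects only, so the
    linear predictor is alpha_0 + alpha_1 (z - z0) + beta * x.
    """
    D = np.column_stack([np.ones_like(z), z - z0, x])
    k = quartic((z - z0) / h) / h                  # K_h(Z - z0)
    coef = np.zeros(D.shape[1])
    for _ in range(max_iter):
        eta = D @ coef
        mu = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-6, 1 - 1e-6)
        g_prime = 1.0 / (mu * (1 - mu))            # g'(mu) for the logit link
        y_work = eta + (y - mu) * g_prime          # working response
        w = k * mu * (1 - mu)                      # K_h / R, R = g'(mu)^2 V(mu)
        WD = D * w[:, None]
        new = np.linalg.solve(D.T @ WD, WD.T @ y_work)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef                                    # coef[0] estimates theta(z0)
```

Evaluating the function over a grid of z0 values would trace out an estimate of θ(·); the cap of 30 iterations mirrors the limit mentioned in this section.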

During implementation, one should keep in mind that the bandwidth must be selected for the estimation procedure in step 1. Theorem 2 indicates that undersmoothing the nonparametric part is necessary to guarantee that β is estimated at the root-n rate. Because only the rate of convergence of the bandwidth h matters for the limiting distribution of the estimator of β, we select h by using the empirical bias bandwidth selection method (Ruppert, 1997) in our data analysis. In our numerical analysis below, we select initial values by fitting a generalized linear mixed-effects model to the data. Because iteration is needed, we set the maximum number of iterations to 30. In our numerical analysis, the computations converged within 18 iterations. This iterative scheme can be computationally burdensome, although the burden may be eased by a powerful computing platform.

2.5 Measurement errors in covariates

In practice, covariates may not be exactly observable; some, like the CD4 cell counts in our AIDS dataset analyzed later, are measured with substantial error. The estimators and inference may be biased if one ignores these measurement errors, so the resulting bias needs to be adjusted. In this special case, we denote X_ij by X_i(Z_ij), the true covariates of subject i, and W_ij by W_i(Z_ij), the observed values of X_ij at time Z_ij. Suppose we have the measurement error model

W_{i} (z) = X_{i} (z) + u_{i} (z),

(2.10)

where W_i(z) is the observed value and X_i(z) is the underlying true value for the ith patient at treatment time z. The error u_i(z) represents measurement error in the CD4+ cell counts. We assume that u_i(z) is a mean zero process, and that X_i(z) and u_i(z) are mutually independent. Fuller (1987) and Carroll, Ruppert, and Stefanski (1995) provided a good survey of measurement error models. Buonaccorsi, Demidenko, and Tosteson (2000) studied measurement errors in linear mixed-effects models. Higgins, Davidian, and Giltinan (1997) proposed a two-step approach to deal with measurement errors in nonlinear mixed-effects models. Wang et al. (1998) studied generalized linear mixed measurement error models. Liang and Ren (2005) investigated generalized partially linear models with errors-in-variables and proposed a measurement error calibration based on a SIMEX algorithm. Although its implementation is straightforward, the computational burden of SIMEX in a setting like ours is very heavy. In this paper we adjust for measurement error by using a mixed-effects regression spline model, as done by Higgins et al. (1997); a similar idea was used by Liang, Wu, and Carroll (2003).

The data points repeatedly measured along time z are similar to replications if we can assume that the measured variable is a smooth function of z. Shi, Weiss, and Taylor (1996) and Rice and Wu (2001) have modeled the natural history of the CD4+ cell process in untreated HIV-infected patients by using a mixed-effects regression spline model. We model the CD4+ cell counts as follows:

X_{i} (Z_{i j}) = \sum_{k = 0}^{p} ξ_{k} ψ_{k} (Z_{i j}) + \sum_{k = 0}^{q} η_{i k} φ_{k} (Z_{i j}) = Ψ_{i} (Z_{i j}) ξ + Φ_{i} (Z_{i j}) η_{i},

(2.11)

where Ψ_i(z) = {ψ₀(z), ψ₁(z), ···, ψ_p(z)} and Φ_i(z) = {φ₀(z), φ₁(z), ···, φ_q(z)} are basis functions such as cubic B-spline basis, ξ = (ξ₀, ξ₁, ···, ξ_p)^T is a fixed-effect parameter vector, and η_i = (η_i₀, η_i₁, ···, η_iq)^T is a random effect vector with mean zero and covariance matrix Σ_η (Σ_η may be unstructured or can be specified with a special structure). The selection of the number (p and q) and locations of knots for regression splines can be achieved by using cross-validation (Eubank, 1999). Rice and Wu (2001) suggest setting p = q and ψ_k(z) = φ_k(z). Model (2.11) is a standard linear mixed-effects model, which can be fitted by using the LME function of Splus (Pinheiro and Bates, 2000).

Higgins et al. (1997) have proposed a two-step approach to deal with measurement errors in time-dependent covariates in nonlinear mixed-effects models. The first step is to estimate the covariate function by fitting an appropriate model for the covariate process; the second step is to fit the nonlinear mixed-effects model with the estimated covariates plugged in. This is essentially similar to the regression calibration idea (Carroll et al. 1995). By applying a similar idea to our model for measurement error in CD4+ cell counts, we fit an LME model,

W_{i} (Z_{i j}) = Ψ_{i} (Z_{i j}) ξ + Φ_{i} (Z_{i j}) η_{i} + u_{i},

(2.12)

and obtain an estimate Ŵ_ij of W_ij, which is regarded as X_ij.

3 Numerical Illustration

In this section, we first conduct a small simulation study for illustration. We then use the proposed methods to analyze a data set from an AIDS study. In our numerical analysis, we calculate both the naive estimates, i.e., those obtained by ignoring measurement errors, and the proposed estimates. We use the quartic kernel K(u) = (15/16)(1 − u²)² I(|u| ≤ 1) for nonparametric regression.
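For reference, the quartic kernel and the kernel constants that enter Theorem 1 can be checked numerically. The sketch below is an added illustration with hypothetical function names; it approximates ∫K(u)du, κ₂ = ∫u²K(u)du, and μ₀ = ∫K²(u)du by a midpoint rule, and direct integration gives the exact values 1, 1/7, and 5/7 for this kernel.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel K(u) = (15/16)(1 - u^2)^2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def scaled_kernel(t, h):
    """Rescaled kernel K_h(t) = K(t / h) / h for bandwidth h > 0."""
    return quartic_kernel(np.asarray(t, dtype=float) / h) / h

# Midpoint-rule approximations of the kernel constants on [-1, 1].
m = 200_000
du = 2.0 / m
u = -1.0 + (np.arange(m) + 0.5) * du
total_mass = np.sum(quartic_kernel(u)) * du        # integral of K
kappa_2 = np.sum(u**2 * quartic_kernel(u)) * du    # second moment of K
mu_0 = np.sum(quartic_kernel(u) ** 2) * du         # integral of K^2
```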

3.1 A Small Simulation Experiment

We performed a small simulation experiment in which the data were generated from a logistic model whose log odds can be expressed as

\operatorname{logit} P\{Y_{ij} = 1 \mid X_{ij}, Z_{ij}\} = X_{ij}(β + b_i) + θ_i(Z_{ij}), \quad W_{ij} = X_{ij} + U_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, m,

where the X_ij are independent uniform(0, 1) variables, Z_ij is uniformly distributed U(0, 1), and U_ij is normally distributed $N(0, σ_u^2)$. The parameter β is equal to 1.85, and the nonparametric function is θ_i(z) = (1 + c_{1i}) cos(2πz) + (2 + c_{2i}) sin(2πz), with (c_{1i}, c_{2i}) following the 2-dimensional normal distribution $N(0, σ_b^2 I_2)$.

We consider all combinations of n = 20, 40, 60 and m = 30, 50, 70, with σ_u = 0.1, 0.25 and σ_b = 0.1. For each configuration, we run 500 replications. We calculate the proposed estimates and the naive estimates (i.e., we ignore the measurement errors). The estimated values of β are summarized in Table 1.
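A data generator for one replication of this design might look as follows. This is a sketch under stated assumptions rather than the code used for Table 1: the text does not give the distribution of b_i, so the sketch takes b_i ~ N(0, σ_b²), and the function name `simulate` and the seeding scheme are hypothetical.

```python
import numpy as np

def simulate(n, m, sigma_u, sigma_b=0.1, beta=1.85, seed=0):
    """Generate one replication: n subjects with m observations each."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, m))         # true covariate
    Z = rng.uniform(0.0, 1.0, size=(n, m))         # nonparametric covariate
    b = rng.normal(0.0, sigma_b, size=(n, 1))      # assumption: b_i ~ N(0, sigma_b^2)
    c = rng.normal(0.0, sigma_b, size=(n, 2))      # (c_1i, c_2i) ~ N(0, sigma_b^2 I_2)
    theta = ((1.0 + c[:, [0]]) * np.cos(2 * np.pi * Z)
             + (2.0 + c[:, [1]]) * np.sin(2 * np.pi * Z))
    eta = X * (beta + b) + theta                   # subject-specific log odds
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    W = X + rng.normal(0.0, sigma_u, size=(n, m))  # error-prone surrogate for X
    return Y, X, Z, W
```

The naive analysis would use W in place of X, while the proposed analysis would first calibrate W as in Section 2.5.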

Table 1.

The estimates with standard errors (s.e.) of the parameter β for the simulated data. “Mean”: the means of the estimated values; “s.e.”: sample standard errors based on 500 replications.

n	m	σ_u	Proposed mean	Proposed s.e.	Naive mean	Naive s.e.
20	30	0.1	1.876	0.355	1.784	0.352
40	30	0.1	1.873	0.256	1.759	0.243
60	30	0.1	1.887	0.305	1.742	0.299
20	50	0.1	1.831	0.338	1.650	0.337
40	50	0.1	1.822	0.227	1.670	0.266
60	50	0.1	1.848	0.180	1.706	0.201
20	70	0.1	1.824	0.238	1.672	0.238
40	70	0.1	1.796	0.214	1.627	0.208
60	70	0.1	1.813	0.180	1.628	0.185
20	30	0.25	1.890	0.343	1.298	0.354
40	30	0.25	1.885	0.317	1.257	0.326
60	30	0.25	1.891	0.264	1.253	0.264
20	50	0.25	1.822	0.258	1.128	0.254
40	50	0.25	1.799	0.237	1.100	0.248
60	50	0.25	1.832	0.207	1.179	0.206
20	70	0.25	1.821	0.239	1.127	0.219
40	70	0.25	1.790	0.197	1.128	0.194
60	70	0.25	1.809	0.192	1.089	0.195

The results are in accord with the theory for measurement error models. The simulation results indicate that the naive estimator of β is markedly biased, while the estimates based on the proposed method are much closer to the true value.
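A rough back-of-the-envelope check of the size of this bias uses the classical reliability ratio var(X)/{var(X) + σ_u²} from linear regression with additive measurement error. This heuristic is an addition for illustration and is only approximate for the logistic mixed model considered here; the names below are hypothetical.

```python
def attenuation_factor(var_x, sigma_u):
    """Classical reliability ratio var(X) / {var(X) + var(U)} for W = X + U."""
    return var_x / (var_x + sigma_u ** 2)

BETA = 1.85          # true slope used in the simulation
VAR_X = 1.0 / 12.0   # variance of a uniform(0, 1) covariate

# Heuristic attenuated slope for each error level in the simulation design.
naive_guess = {s: BETA * attenuation_factor(VAR_X, s) for s in (0.1, 0.25)}
```

Under these assumptions the heuristic gives about 1.65 for σ_u = 0.1 and about 1.06 for σ_u = 0.25, which is of the same order as the naive means reported in Table 1.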

3.2 Data Analysis

Let Y be binary viral load, X be CD4 cell counts and Z be the treatment time. An ordinary logistic mixed-effects model says that the logit of Y = 1 satisfies

\operatorname{logit} \{E(Y_{ij} \mid X_{ij}, Z_{ij}, b_{1i}, b_{2i})\} = α_0 + α_1 X_{ij} + α_2 Z_{ij} + b_{1i} X_{ij} + b_{2i} Z_{ij},

(3.1)

where b_{1i} and b_{2i} reflect the individual variations. The merits of this model include its computational convenience and the easy interpretation of its parameters. However, this model may not capture some curvature, as we shall see later, because drug resistance and noncompliance probably contribute nonlinearly over the treatment time. An alternative is a partially logistic mixed-effects model, described as

\operatorname{logit} \{E(Y_{ij} \mid X_{ij}, Z_{ij}, b_{1i}, c_i)\} = X_{ij} β + X_{ij} b_{1i} + θ(Z_{ij}) + c_i(Z_{ij}),

(3.2)

where θ(z) and c_i(z) are unknown smooth functions, which describe the population characteristic and individual diversities, respectively.

In this section, we present the results of analysis of an AIDS clinical study conducted by the Pediatric AIDS Clinical Trials Group (PACTG 345, Scott et al., 2001). In this study, 33 patients were enrolled as cohort II. Specimens were obtained on days 0, 1, 3, 7, 14, 28, 56, …, 1115, and 559 observations were obtained, with 256 HIV-1 RNA measurements below the detection limit of 400 copies/mL; 45% of the viral load observations were therefore suppressed below the detection limit. Figure 1 presents the individual observations of plasma HIV RNA concentration (viral load) after initial antiretroviral treatments. A main objective of the treatment is to suppress the viral load below the limit of detection.

Figure 1. Individual viral load measurements of plasma HIV RNA concentration in the PACTG 345 study. The detection limit of 400 copies of HIV RNA per mL of plasma is indicated by the horizontal line.

This analysis focuses on whether the viral load is suppressed and how the CD4 cell counts change during the treatment. We apply the model and estimation method described in Section 2 to explore this data set and address the concerns mentioned in Section 1 by modeling the dynamic relation between the binary response (with or without viral suppression) and CD4 cell counts during the treatment period of about 3 years.

Let Y_ij = 1 if the viral load of the ith subject at time Z_ij is below the detection limit, and 0 otherwise. In this analysis, we consider 4 scenarios: (i) model (3.1) and ignoring consideration of measurement error in CD4+ cell counts; (ii) model (3.1) with consideration of measurement error in CD4+ cell counts; (iii) model (3.2), ignoring measurement error in CD4+ cell counts; and (iv) model (3.2) with consideration of measurement error in CD4+ cell counts. When considering measurement errors in CD4+ cell counts and using the method proposed in subsection 2.5 to produce the “true” CD4+ cell counts, the smoothing parameters (the number of knots) are selected by the model selection criterion BIC and the location of knots is selected at the quantiles of the data (Eubank, 1999). We obtain p = q = 3. To stabilize the variance and computational algorithms, we centered the covariate CD4+ cell counts and took a log-transformation for time in the model fitting. We fitted the model by using the LME function of Splus.

The population estimates of the parameter β for these 4 scenarios are presented in Table 2. Comparing the estimates of β under scenario (i) with those under (iii), and under scenario (ii) with those under (iv), we see that models (3.1) and (3.2) yield remarkably different results. We also compared the estimates obtained with and without considering measurement error in the CD4+ cell counts. When the measurement error in the covariate is not considered, the estimates are attenuated toward zero regardless of whether model (3.1) or (3.2) is used. This finding is similar to that in standard linear or nonlinear regression models with measurement error (Carroll et al. 1995).

Table 2.

Estimates of the parameters in scenarios (i)–(iv), based on PACTG 345 data

	(i)	(ii)	(iii)	(iv)
β̂	−0.272	−0.358	−0.692	−1.046
SE	0.10	0.143	0.134	0.223
P-value	0.007	0.0014	< 10⁻⁴	< 10⁻⁴

The population estimates of θ(z) for scenarios (iii) and (iv) are shown in Figure 2. The two population curves have similar patterns; that is, the curves rise rapidly at first and then decline steadily from day 56 to the end of treatment. However, accounting for measurement error yields a sharper increase at the beginning.

Figure 2. Population curve estimates of θ(z) considering measurement errors (solid line) and ignoring measurement errors (dashed line).

Recalling the viral load of each subject shown in Figure 1, we note that the between-subject variation cannot be ignored and the individual curves, α_i(z) = θ(z) + c_i(z), may not follow the pattern of the population curve. We present the results for 4 individual subjects; these results illustrate the principal advantage of the proposed model and methods, namely that estimates can be obtained for both the population and individuals. Because the patterns for the same individual under scenarios (iii) and (iv) are similar, we report in detail only the results for the case in which measurement errors in CD4+ cell counts are ignored. Figure 3 shows the individual estimates from 4 patients. For comparison, the corresponding population estimate is also plotted. The population and individual estimates are different not only in magnitude but also in patterns of change. The individual curve for subject 1 almost follows the population curve. The individual curves of the other 3 subjects are totally different from the population curve. Curves of subjects 6 and 24 decline sharply at first and then rebound later, to the end of treatment, but have different rebounding times. In contrast, the curve of subject 17 follows a convex shape. Given this large between-subject variation, the estimated trajectories of the individual curves become very important for individualizing treatment management and care for AIDS patients.

Figure 3. Selected individual curve estimates (solid line), with the population curve estimate (dashed line) provided for comparison.

4 Discussion

To study the relation between binary virologic variables and immunologic variables, namely repeatedly measured indicators of successful viral load suppression (the virologic variable) and CD4+ cell counts (the immunologic variable) in AIDS clinical trials, we proposed a semiparametric mixed-effects model, which can parsimoniously reflect both population and individual relationships between the two longitudinal variables. Similar models have been reported in the literature. However, most published studies focused only on the estimation of the population characteristic, although between-subject and within-subject variations have been incorporated. The method proposed in this article extends existing methods and allows us to estimate the population profile and individual diversities.

In step 1 we maximized the objective function with respect to α, β, b_i and ν_i. Alternatively, we may specify initial values β⁽⁰⁾ and $b_{i}^{(0)}$ , and maximize the objective function (a similar form to (2.2) but replacing β and b_i with β⁽⁰⁾ and $b_{i}^{(0)}$ ) with respect to α and ν_i. This alternative may increase efficiency (see Carroll et al., 1997 for a related discussion), but it increases the computational effort.

When considering measurement errors in CD4 cell counts, we used a mixed-effects based approach to calibrate the effect of measurement error. This calibration may also be achieved by using a SIMEX-based procedure such as that of Liang and Ren (2005). However, additional information is required to estimate the covariance matrix cov(X) in implementing the SIMEX procedure, which also increases the computational effort.

In this article, we develop a method using local linear regression. There are many alternatives to the local linear approximation in step 1, including higher-degree local polynomial kernel methods and smoothing and regression splines. The details of these methods require further investigation in our setting. We chose the local linear smoother because theoretical results can be derived and the estimators of the nonparametric components do not suffer from boundary effects (Fan and Gijbels, 1996).

Model (2.1) may be extended to a generalized additive partially linear mixed-effects model of the form

g (μ_{i j}) = X_{i j}^{T} β + A_{i j}^{T} b_{i} + \sum_{k = 1}^{K} θ_{k} (Z_{i j}^{(k)}) + \sum_{k = 1}^{K} c_{i k} (Z_{i j}^{(k)}) .

Studying this model is of interest, but it requires substantial additional work and is beyond the scope of this paper.

Acknowledgments

The author thanks the Editor, Associate Editor, and two referees for their valuable comments and suggestions. This research was partially supported by NIH/NIAID grants AI62247 and AI59773.

Appendix

A.1 Proof of Theorem 1

Let $c_{n} = (N h)^{- 1 / 2}$ ,

Ξ_{i j}^{*} = (\begin{matrix} 1 \\ (Z_{i j} - z_{0}) / h \\ X_{i j} \end{matrix}), β^{*} = (\begin{matrix} c_{n}^{- 1} {α_{0} - θ (z_{0})} \\ c_{n}^{- 1} h {α_{1} - θ^{'} (z_{0})} \\ c_{n}^{- 1} (\tilde{β} - β) \end{matrix}), {\hat{β}}^{*} = (\begin{matrix} c_{n}^{- 1} {{\hat{α}}_{0} - θ (z_{0})} \\ c_{n}^{- 1} h {{\hat{α}}_{1} - θ^{'} (z_{0})} \\ c_{n}^{- 1} (\hat{β} - β) \end{matrix}),

and let f(z) denote the density function of Z_ij. Denote further ${\bar{η}}_{i j} = {\bar{η}}_{i j} (z_{0}) = X_{i j}^{T} β + A_{i j}^{T} {\hat{b}}_{i} + θ (z_{0}) + θ^{'} (z_{0}) (Z_{i j} - z_{0}) + {\hat{c}}_{i} (z_{0}) + {\hat{c}}_{i}^{'} (z_{0}) (Z_{i j} - z_{0})$ . Then (2.3) implies that β̂^* maximizes

ℓ_{n} (β^{*}) = h \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} [Q {g^{- 1} (c_{n} β^{* T} Ξ_{i j}^{*} + {\bar{η}}_{i j}), Y_{i j}} - Q {g^{- 1} ({\bar{η}}_{i j}), Y_{i j}}] K_{h} (Z_{i j} - z_{0}),

with respect to β^*. By the concavity of the function ℓ_n(β^*) and Taylor expansion of the function Q{g⁻¹(·), Y_ij} with respect to β^*, we obtain that

ℓ_{n} (β^{*}) = W_{n}^{T} β^{*} + \frac{1}{2} β^{* T} A_{n} β^{*} {1 + o_{P} (1)},

(A.1)

where $W_{n} = h c_{n} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} ({\bar{η}}_{i j}, Y_{i j}) Ξ_{i j}^{*} K_{h} (Z_{i j} - z_{0})$ and $A_{n} = h c_{n}^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{2} ({\bar{η}}_{i j}, Y_{i j}) Ξ_{i j}^{*} Ξ_{i j}^{* T} K_{h} (Z_{i j} - z_{0})$ . Furthermore, let $η_{i j}^{0} = X_{i j}^{T} β + A_{i j}^{T} b_{i} + θ (z_{0}) + θ^{'} (z_{0}) (Z_{i j} - z_{0}) + c_{i} (z_{0}) + c_{i}^{'} (z_{0}) (Z_{i j} - z_{0})$ . Without loss of generality, assume that the random-effects terms b̂_i − b_i, ĉ_i(z₀) − c_i(z₀), and ${\hat{c}}_{i}^{'} (z_{0}) - c_{i}^{'} (z_{0})$ have mean zero. Then

\begin{array}{l} A_{n} = h c_{n}^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{2} (η_{i j}^{0}, Y_{i j}) Ξ_{i j}^{*} Ξ_{i j}^{* T} K_{h} (Z_{i j} - z_{0}) \\ + h c_{n}^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{3} (ζ_{i j}, Y_{i j}) Ξ_{i j}^{*} Ξ_{i j}^{* T} ({\bar{η}}_{i j} - η_{i j}^{0}) K_{h} (Z_{i j} - z_{0}), \end{array}

(A.2)

where ζ_ij is between $η_{i j}^{0}$ and η̄_ij. Note that $({\bar{η}}_{i j} - η_{i j}^{0}) = A_{i j}^{T} ({\hat{b}}_{i} - b_{i}) + {{\hat{c}}_{i} (z_{0}) - c_{i} (z_{0})} + {{\hat{c}}_{i}^{'} (z_{0}) - c_{i}^{'} (z_{0})} (Z_{i j} - z_{0})$ . An argument similar to the proof of (A.9) of Carroll et al. (1997), together with condition (i), shows that the second term in (A.2) is of order o(h). We therefore have

A_{n} = h c_{n}^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{2} (η_{i j}^{0}, Y_{i j}) Ξ_{i j}^{*} Ξ_{i j}^{* T} K_{h} (Z_{i j} - z_{0}) + o (h) ≜ A_{n}^{0} + o (1) .

In a similar way, we obtain that

W_{n} = h c_{n} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} (η_{i j}^{0}, Y_{i j}) Ξ_{i j}^{*} K_{h} (Z_{i j} - z_{0}) + o (c_{n}^{- 1} h^{2}) ≜ W_{n}^{0} + o (c_{n}^{- 1} h^{2}) .

Let

A (X) = (\begin{matrix} 1 & 0 & X^{T} \\ 0 & κ_{2} & 0 \\ X & 0 & X X^{T} \end{matrix}); B (X) = (\begin{matrix} μ_{0} & 0 & μ_{0} X^{T} \\ 0 & μ_{2} & 0 \\ μ_{0} X & 0 & μ_{0} X X^{T} \end{matrix}) .

We write ${(A_{n}^{0})}_{i j} = {(E A_{n}^{0})}_{i j} + O_{P} ({var {(A_{n}^{0})}_{i j}}^{1 / 2})$ and deal with the first two moments of $A_{n}^{0}$ . Note that $q_{2} (x, y) = {y - g^{- 1} (x)} ρ_{1}^{'} (x) - ρ_{2} (x)$ . For the first moment, with η̄ = X^Tβ + A^Tb + θ(z₀) + θ′(z₀)(Z − z₀) + c(z₀) + c′(z₀)(Z − z₀), we find

\begin{array}{l} E A_{n}^{0} = E [E {q_{2} (\bar{η}, Y) Ξ^{*} Ξ^{* T} K_{h} (Z - z_{0}) ∣ X, b}] \\ = E [{g^{- 1} (R) - g^{- 1} (\bar{η})} ρ_{1}^{'} (\bar{η}) Ξ^{*} Ξ^{* T} K_{h} (Z - z_{0})] \\ - E {ρ_{2} (\bar{η}) Ξ^{*} Ξ^{* T} K_{h} (Z - z_{0})} . \end{array}

(A.3)

The first term in (A.3) is of order O(h²). It follows that

\begin{array}{l} E A_{n}^{0} = - E {ρ_{2} (\bar{η}) Ξ^{*} Ξ^{* T} K_{h} (Z - z_{0})} + O (h^{2}) \\ = - E (E [ρ_{2} {R + O (h)} Ξ^{*} Ξ^{* T} ∣ Z = z_{0}] K_{h} (Z - z_{0})) + O (h^{2}) \\ = - f (z_{0}) E {ρ_{2} (R) A (X) ∣ Z = z_{0}} + o (1) \equiv - A + o (1) . \end{array}

(A.4)

In an analogous way we can show that

var {(A_{n}^{0})}_{i j} = O (c_{n}^{2}) .

(A.5)

Combining (A.4) and (A.5), we obtain

A_{n}^{0} = - A + o_{P} (1) .

(A.6)

Therefore, by (A.1),

ℓ_{n} (β^{*}) = W_{n}^{0 T} β^{*} - \frac{1}{2} β^{* T} A β^{*} + o_{P} (1) .

(A.7)
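As a short worked step, the leading quadratic part of (A.7) has an explicit maximizer. The display below assumes that A is symmetric and positive definite (which holds, for example, when ρ₂ > 0 and the conditional second-moment matrix of (1, X) is nonsingular); the symbol $\tilde{ℓ}_{n}$ is introduced here only for this illustration.

```latex
\begin{aligned}
\tilde{\ell}_n(\beta^*) &= W_n^{0T}\beta^* - \tfrac{1}{2}\,\beta^{*T} A\,\beta^*, \\
\frac{\partial \tilde{\ell}_n}{\partial \beta^*} &= W_n^{0} - A\,\beta^* = 0
\quad\Longrightarrow\quad \beta^*_{\max} = A^{-1} W_n^{0}, \\
\frac{\partial^2 \tilde{\ell}_n}{\partial \beta^*\,\partial \beta^{*T}} &= -A \prec 0 .
\end{aligned}
```

Since the Hessian is negative definite, the stationary point is the unique maximizer of the quadratic. The convexity lemma is what transfers this conclusion from the quadratic approximation to the maximizer of ℓ_n itself, up to an o_P(1) term.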

By applying the convexity lemma (Pollard, 1991), we obtain that ${\hat{β}}^{*} = A^{- 1} W_{n}^{0} + o_{P} (1)$ . Hence the asymptotic normality of β̂^* will follow from that of $W_{n}^{0}$ , which we will establish below. By the definition of $W_{n}^{0}$ and using Taylor’s expansion we get for the first moment,

\begin{array}{l} E W_{n}^{0} = c_{n}^{- 1} E [E {q_{1} (\bar{η}, Y) Ξ^{*} K_{h} (Z - z_{0}) ∣ X, b}] \\ = c_{n}^{- 1} E [{g^{- 1} (R) - g^{- 1} (\bar{η})} ρ_{1} (\bar{η}) Ξ^{*} K_{h} (Z - z_{0})] \\ = \frac{1}{2} c_{n}^{- 1} E [(g^{- 1})^{'} (R) {c^{″} (z_{0}) + θ^{″} (z_{0})} {(Z - z_{0})}^{2} ρ_{1} (\bar{η}) Ξ^{*} K_{h} (Z - z_{0})] + o (c_{n}^{- 1} h^{2}) \\ = c_{n}^{- 1} \frac{1}{2} θ^{″} (z_{0}) h^{2} f (z_{0}) E {ρ_{2} (R) {(κ_{2}, 0, κ_{2} X^{T})}^{T} ∣ Z = z_{0}} \\ + c_{n}^{- 1} \frac{1}{2} h^{2} f (z_{0}) E {c^{″} (Z) ρ_{2} (R) {(κ_{2}, 0, κ_{2} X^{T})}^{T} ∣ Z = z_{0}} + o (c_{n}^{- 1} h^{2}) . \end{array}

(A.8)

It follows from these statements and (A.6) that

c_{n}^{- 1} {\hat{θ} (z_{0}) - θ (z_{0}) - b_{θ} (z_{0}) + o_{p} (h^{2})} = {[f (z_{0}) E {ρ_{2} (R) ∣ Z = z_{0}}]}^{- 1} S_{n} {1 + o_{p} (1)},

where $S_{n} = c_{n} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} s_{i j}$ with s_ij = q₁(η_ij, y_ij)K_h(Z_ij − z₀). We have

var (S_{n}) = c_{n}^{2} \sum_{i = 1}^{n} var (\sum_{j = 1}^{n_{i}} s_{i j}) = J_{n 1} + J_{n 2}

where

J_{n 1} = c_{n}^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} var (s_{i j}) and J_{n 2} = c_{n}^{2} \sum_{i = 1}^{n} \sum_{j_{1} \neq j_{2}} cov (s_{i j_{1}}, s_{i j_{2}}) .

It is easy to see that

var (s_{i j}) = h μ_{0} f (z_{0}) E {ρ_{2} (R) ∣ Z = z_{0}} + o (1)

and

J_{n 1} = μ_{0} f (z_{0}) E {ρ_{2} (R) ∣ Z = z_{0}} + o (1) .

Furthermore, for j₁ ≠ j₂, by condition (h)

cov (s_{i j_{1}}, s_{i j_{2}}) = E {γ_{y} (Z_{i j_{1}}, Z_{i j_{2}}) K_{h} (Z_{i j_{1}} - z_{0}) K_{h} (Z_{i j_{2}} - z_{0})} = h^{2} f^{2} (z_{0}) γ_{y} (z_{0}, z_{0}) {1 + o_{p} (1)},

which, combined with condition (g), yields that

J_{n 2} = h^{2} f^{2} (z_{0}) γ_{y} (z_{0}, z_{0}) c_{n}^{2} \sum_{i = 1}^{n} n_{i} (n_{i} - 1) {1 + o (1)} = λ f^{2} (z_{0}) γ_{y} (z_{0}, z_{0}) {1 + o (1)} .

It follows that

var (S_{n}) = μ_{0} f (z_{0}) E {ρ_{2} (R) ∣ Z = z_{0}} + λ f^{2} (z_{0}) γ_{y} (z_{0}, z_{0}) + o (1) .

Finally, using condition (f), it can be shown that Liapounov’s condition is satisfied and hence Theorem 1 holds, as claimed.

A.2 Proof of Theorem 2

First, under the stated conditions, an argument similar to the proof of Theorem 2 of Carroll et al. (1997) establishes

\begin{array}{r} \sup_{z \in D} | \hat{θ} (z) - θ (z) - \frac{1}{N f (z) E {ρ_{2} (R) ∣ Z = z}} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} (η_{i j}, y_{i j}) K_{h} (Z_{i j} - z) | \\ = O_{P} {h^{2} c_{n} + c_{n}^{2} {log}^{1 / 2} (1 / h)} . \end{array}

(A.9)

Let $\hat{ζ} = N^{1 / 2} (\hat{β} - β_{0})$ , ${\hat{m}}_{i j} = \hat{θ} (Z_{i j}) + X_{i j}^{T} β_{0} + {\hat{c}}_{i} (Z_{i j}) + A_{i j}^{T} b_{i}$ , and $m_{i j} = θ (Z_{i j}) + X_{i j}^{T} β_{0} + c_{i} (Z_{i j}) + A_{i j}^{T} b_{i}$ . Then, ζ̂ maximizes

ℓ_{n} (ζ) = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} [Q {g^{- 1} ({\hat{m}}_{i j} + N^{- 1 / 2} X_{i j}^{T} ζ), Y_{i j}} - Q {g^{- 1} ({\hat{m}}_{i j}), Y_{i j}}]

(A.10)

with respect to ζ. By Taylor’s expansion, we have

ℓ_{n} (ζ) = N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} ({\hat{m}}_{i j}, Y_{i j}) X_{i j}^{T} ζ + \frac{1}{2} ζ^{T} B_{n} ζ,

(A.11)

where

B_{n} = \frac{1}{N} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} [Y_{i j} ρ_{1}^{'} {g^{- 1} ({\hat{m}}_{i j} + ζ_{nij})} - ρ_{3} {g^{- 1} ({\hat{m}}_{i j} + ζ_{nij}^{'})}] X_{i j} X_{i j}^{T},

with ζ_nij and $ζ_{nij}^{'}$ between 0 and $N^{- 1 / 2} X_{i j}^{T} ζ$ , independent of Y_ij, and with $ρ_{3} (x) = g^{- 1} (x) ρ_{1}^{'} (x) + ρ_{2} (x)$ , so that the summand agrees with the expression for q₂ given in the proof of Theorem 1. It can be shown that

B_{n} = - E {ρ_{2} (R) X X^{T}} + o_{P} (1) = - B + o_{P} (1) .

(A.12)

Using arguments similar to those for (A.12), we get

\begin{array}{r} N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} ({\hat{m}}_{i j}, Y_{i j}) X_{i j} = N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{1} (m_{i j}, Y_{i j}) X_{i j} + N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{2} (m_{i j}, Y_{i j}) \\ {\hat{θ} (Z_{i j}) - θ (Z_{i j})} X_{i j} + O_{P} {N^{1 / 2} c_{n}^{2} {log}^{1 / 2} (1 / h)} . \end{array}

By (A.9), the second term in the above expression can be expressed as

\begin{array}{r} N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} q_{2} (m_{i j}, Y_{i j}) \frac{1}{Nhf (Z_{i j}) E {ρ_{2} (R) ∣ Z_{i j}}} {\sum_{l = 1}^{n} \sum_{s = 1}^{n_{l}} q_{1} (η_{l s}, y_{l s}) K_{h} (Z_{l s} - Z_{i j})} X_{i j} \\ + O_{P} {N^{1 / 2} c_{n}^{2} {log}^{1 / 2} (1 / h)} \\ \equiv T_{n 1} + O_{P} {N^{1 / 2} c_{n}^{2} {log}^{1 / 2} (1 / h)} . \end{array}

T_{n1} can be further simplified as

T_{n 1} = N^{- 1 / 2} \sum_{l = 1}^{n} \sum_{s = 1}^{n_{l}} q_{1} (η_{l s}, y_{l s}) \frac{E {ρ_{2} (R) X ∣ Z_{l s}}}{E {ρ_{2} (R) ∣ Z_{l s}}} + O_{P} (N^{1 / 2} h^{2}) .

(A.13)

Combining (A.10)–(A.13) we obtain that

ℓ_{n} (ζ) = N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} Ω (Z_{i j}, Y_{i j}, X_{i j})^{T} ζ - \frac{1}{2} ζ^{T} B ζ + o_{P} (1),

where

Ω (Z_{i j}, Y_{i j}, X_{i j}) = q_{1} (m_{i j}, Y_{i j}) [X_{i j} - \frac{E {ρ_{2} (R) X ∣ Z_{i j}}}{E {ρ_{2} (R) ∣ Z_{i j}}}] .

By the convexity lemma (Pollard, 1991) we find that $\hat{ζ} = B^{- 1} N^{- 1 / 2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} Ω (Z_{i j}, Y_{i j}, X_{i j}) + o_{P} (1)$ , from which it follows that

N^{1 / 2} (\hat{β} - β) \overset{D}{\to} N (0, B^{- 1} Σ_{1} B^{- 1}),

as claimed.

References

Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Statist Assoc. 1993;88:9–25.
Buonaccorsi JP, Demidenko E, Tosteson TD. Estimation in longitudinal random effects models with measurement error. Statist Sinica. 2000;10:885–904.
Carroll RJ, Fan J, Gijbels I, Wand MP. Generalized partially single-index models. J Am Statist Assoc. 1997;92:477–489.
Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. New York: Chapman and Hall; 1995.
Davidian M, Giltinan D. Nonlinear Models for Repeated Measurement Data. New York: Chapman and Hall; 1995.
Eubank RL. Nonparametric Regression and Spline Smoothing. New York: Marcel Dekker; 1999.
Fan J, Gijbels I. Local Polynomial Modeling and Its Applications. London: Chapman and Hall; 1996.
Fuller WA. Measurement Error Models. New York: John Wiley; 1987.
Higgins KM, Davidian M, Giltinan DM. A two-step approach to measurement error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I pharmacokinetics. J Am Statist Assoc. 1997;92:436–448.
Ke CL, Wang YD. Semiparametric nonlinear mixed-effects models and their applications (with discussions). J Am Statist Assoc. 2001;96:1272–1298.
Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics. 1982;38:963–974.
Liang H, Ren HB. Generalized partially linear measurement error models. J Comp Graph Statist. 2005;14:237–250.
Liang H, Wu HL, Carroll RJ. The relationship between virologic and immunologic responses in AIDS clinical research using mixed-effect varying-coefficient semiparametric models with measurement error. Biostatistics. 2003;4:297–312. doi: 10.1093/biostatistics/4.2.297.
Lin XH, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. J Am Statist Assoc. 2001;96:1045–1056.
Lin XH, Zhang DW. Inference in generalized additive mixed models by using smoothing splines. J R Statist Soc B. 1999;61:381–400.
Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. New York: Springer; 2000.
Pollard D. Asymptotics for least absolute deviation regression estimators. Economet Theory. 1991;7:186–199.
Ruppert D. Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J Am Statist Assoc. 1997;92:1049–1062.
Rice JA, Wu CO. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001;57:253–259. doi: 10.1111/j.0006-341x.2001.00253.x.
Schall R. Estimation in generalized linear models with random effects. Biometrika. 1991;78:717–727.
Scott ZA, Chadwick EG, Gibson LL, et al. Infrequent detection of HIV-1-specific, but not cytomegalovirus-specific, CD8+T cell responses in young HIV-1-infected infants. J Immunology. 2001;167:7134–7140. doi: 10.4049/jimmunol.167.12.7134.
Severini TA, Staniswalis JG. Quasilikelihood estimation in semiparametric models. J Am Statist Assoc. 1994;89:501–511.
Shi M, Weiss RE, Taylor JMG. An analysis of pediatric CD4+ counts for acquired immune deficiency syndrome using flexible random curves. Applied Statistics. 1996;45:151–163.
Stefanski LA, Cook JR. Simulation-extrapolation: the measurement error jackknife. J Am Statist Assoc. 1995;90:1247–1256.
Wang NS, Lin XH, Gutierrez RG, Carroll RJ. Bias analysis and SIMEX approach in generalized linear mixed measurement error models. J Am Statist Assoc. 1998;93:249–261.
Wu HL, Kuritzkes DR, Clair MS, et al. Characterizing individual and population viral dynamics in HIV-1-infected patients with potent antiretroviral therapy: correlations with host-specific factors and virological endpoints. J Infectious Disease. 1999;179:799–897.
Wu HL, Zhang JT. Semiparametric nonlinear mixed-effects models for longitudinal data application to an AIDS clinical study. Statist Med. 2002;21:3655–3675. doi: 10.1002/sim.1317.
Wu L. A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. J Am Statist Assoc. 2002;97:955–964.
Zeger SL, Karim MR. Generalized linear models with random effects: a Gibbs sampling approach. J Am Statist Assoc. 1991;86:79–86.
Zhang D. Generalized linear mixed models with varying coefficients for longitudinal data. Biometrics. 2004;60:8–15. doi: 10.1111/j.0006-341X.2004.00165.x.
Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. J Am Statist Assoc. 1998;93:710–719.
