Author manuscript; available in PMC: 2015 Oct 23.
Published in final edited form as: Stat Med. 2008 Jul 20;27(16):3042–3056. doi: 10.1002/sim.3262

A Varying-Coefficient Model for the Evaluation of Time-Varying Concomitant Intervention Effects in Longitudinal Studies

Colin O Wu 1,*, Xin Tian 2, Heejung Bang 3
PMCID: PMC4615703  NIHMSID: NIHMS66649  PMID: 18351714

Summary

Concomitant interventions are often introduced during a longitudinal clinical trial to patients who respond undesirably to the pre-specified treatments. In addition to the main objective of evaluating the pre-specified treatment effects, an important secondary objective in such a trial is to evaluate whether a concomitant intervention could change a patient’s response over time. Because the initiation of a concomitant intervention may depend on the patient’s general trend of pre-intervention outcomes, regression approaches that treat the presence of the intervention as a time-dependent covariate may lead to biased estimates for the intervention effects. Borrowing the techniques of Follmann and Wu (1995) for modeling informative missing data, we propose a varying-coefficient mixed-effects model to evaluate the patient’s longitudinal outcome trends before and after the patient’s starting time of the intervention. By allowing the random coefficients to be correlated with the patient’s starting time of the intervention, our model leads to less biased estimates of the intervention effects. Nonparametric estimation and inferences of the coefficient curves and intervention effects are developed using B-splines. Our methods are demonstrated through a longitudinal clinical trial in depression and heart disease and a simulation study.

Keywords: Change-Point Models, Concomitant Intervention, Longitudinal Study, Polynomial Splines, Shared Parameter Model, Varying-Coefficient Model

1 Introduction

A main objective of longitudinal analysis in clinical trials is to evaluate the effects of covariates of interest on the time-trends of the outcome variables. The treatment effects are usually modeled using a time-invariant categorical covariate, while the other covariates can be either time-invariant or time-dependent. Recent advances in longitudinal analysis have led to a wide range of regression methods. For parametric models, estimation and inference procedures, such as maximum likelihood estimation, restricted maximum likelihood (REML) estimation and generalized estimating equations (GEEs), can be found in [1–4]. For models involving nonparametric components, smoothing methods, such as local polynomials and splines, are often used [5–9].

The above regression approaches generally lead to satisfactory results when the subjects are properly randomized so that the treatments and the covariates are not subject to “selection bias”. In many longitudinal studies, however, concomitant interventions are initiated, usually due to ethical reasons, to patients who exhibit less satisfactory trends in their medical outcomes. This phenomenon bears some resemblance to longitudinal studies with informative missing data, where patients with undesirable outcome trends tend to drop out early from the study, except that in our situation the outcomes of these patients are continuously observed under the additional concomitant treatment. In a randomized longitudinal clinical trial with pre-specified treatments, patients who have taken a concomitant intervention in addition to their assigned treatments may generally have different disease pathology from those who do not need the intervention. Thus, in addition to the primary goal of evaluating the effect of the main treatment, an important objective is to evaluate the effects of the additional concomitant intervention(s) among eligible patients.

Our motivating example is the Enhancing Recovery in Coronary Heart Disease (ENRICHD) study, a randomized clinical trial that evaluated the efficacy of a psychosocial treatment versus usual cardiological care on survival and depression severity in 2,481 patients with depression and/or low perceived social support after acute myocardial infarction. Depression severity was measured by the Hamilton Rating Scale for Depression (HRSD) and the Beck Depression Inventory (BDI), where higher HRSD and BDI scores indicate worse depression. In addition to the randomized treatments, patients with high baseline depression scores and/or nondecreasing BDI trends were eligible for pharmacotherapy with antidepressants. Antidepressants were also prescribed at the request of the patients or their primary care physicians. Details of the study design, objectives and major findings of the trial have been described elsewhere [10,11]. Taylor et al. [12] compared the survival rates for death and cardiovascular morbidity and mortality among 1,834 depressed patients in this trial and found that the use of selective serotonin reuptake inhibitors seemed to reduce subsequent cardiovascular morbidity and mortality. Bang and Robins [13] also analyzed the same data, but only a cross-sectional component. However, the question of whether these antidepressant medications had added benefits for lowering the BDI scores of patients in the psychosocial treatment arm who had undergone this concomitant intervention during the trial was not addressed. To answer this question, we treat the patient’s starting time of pharmacotherapy as a subject-specific “change-point” and evaluate the effects of pharmacotherapy on the patient’s BDI scores over time using different models before and after the “change-point”. We attempt to model the relationships between the subject’s parameters for BDI trends and his/her change-point time using a B-spline nonparametric approach. It is important to note that in the ENRICHD trial pharmacotherapy was not simply initiated based on certain common HRSD or BDI threshold values, which may require a different methodology.

Methodologically, Murphy, van der Laan and Robins [14] studied estimation and causal inferences for the mean responses to dynamic treatment regimens that were tailored to subjects’ individual needs. Their designs involve a set of treatment intervals specified by selected time points and a pre-specified sequential randomization rule that assigns subjects to different treatment levels. The main difference between our data structure and their dynamic regimens is that the concomitant interventions considered in this paper are often not assigned based on specific treatment rules, so that their estimating equations may not be directly applied to the current context. Also Wall, Dai and Eberly [15] examined the impact of a misspecified time-varying covariate when they analyzed the effect of (nonrandomized) alcoholism treatment on medical utilization in the GEE framework. They implicitly suggested that a change-point model was a way to go but did not provide a solution.

We propose in this paper a nonparametric approach for estimating the effects of a concomitant intervention in longitudinal clinical trials. The proposed methodology is based on a varying-coefficient mixed-effects model and a B-spline least-squares estimation method. A key question that has not previously been well understood in the literature is why a naive linear mixed-effects model (e.g., Verbeke and Molenberghs [3], Sec. 3.3) could lead to biased estimates when a concomitant intervention is present. Using the general framework of the change-point shared-parameter model (Section 2.3), we show that mixed-effects models that do not properly incorporate the joint distribution of the random parameters and the starting time of the concomitant intervention are misspecified. On the other hand, our varying-coefficient mixed-effects model is a flexible nonparametric version of the change-point shared-parameter model which can adequately incorporate the concomitant intervention starting time when the joint distribution of the random parameters and the concomitant intervention starting time is completely unknown. Because our model may include both parametric and nonparametric components, our B-spline estimation method is more flexible than the usual local smoothing methods, such as kernel or local polynomial methods, in the sense that it can be naturally adapted to both parametric and nonparametric situations [8, 9]. Although certain components of our approach, such as B-splines, have been used for other longitudinal settings in the literature [5–9], the systematic modeling and estimation procedure proposed in this paper fills a gap for obtaining unbiased estimates in longitudinal clinical trials with the presence of a concomitant intervention.

Our modeling approach, however, shares some similarities with the shared parameter model that addresses the informative missingness [16,17]. Follmann and Wu [17] used this approach to link a mixed-effects model for the response variable and a marginal model for the characteristics of missing data, such as time to drop-out. Their conditional model resembles our varying-coefficient model with only the observations before the change-points. In contrast, our model also includes the response curves after the change-points. Unlike the classical change-point problems where all the subjects have the same change-point and its location is the unknown parameter to be estimated, individual starting times of the concomitant intervention are observed in our data [18,19].

We describe our regression models and their biological interpretations in Section 2. Subsequently, we propose a class of nonparametric estimation and inference procedures based on B-splines in Sections 3 and 4, apply these methods to the ENRICHD data in Section 5, and present a simulation study in Section 6. Finally, we discuss some potential extensions of our methods in Section 7.

2 Change-Point Mixed-Effects Models

2.1 Data Structure

We consider a study with $n$ randomly selected subjects. For the $i$th subject, $n_i$ is the number of visits, $T_{ij} \in [\mathcal{T}_0, \mathcal{T}_1]$ is the trial time or study time at the $j$th visit, $Y_{ij}$ is the real-valued outcome measured at $T_{ij}$, and $X_i = (1, X_{1i}, \ldots, X_{Pi})^T$ is the $\mathbb{R}^{P+1}$-valued time-invariant covariate vector. Here $[\mathcal{T}_0, \mathcal{T}_1]$ is the known time interval for the study. We assume that the study has one concomitant intervention, and each subject has one change-point from non-intervention to intervention within the study period $[\mathcal{T}_0, \mathcal{T}_1]$. Let $S_i$ be the $i$th subject’s intervention starting time or change-point time, and $\delta_{ij} = 1_{[T_{ij} \ge S_i]}$ the intervention indicator at the $j$th visit. The time from the concomitant intervention to the $j$th visit is $R_{ij} = T_{ij} - S_i$. The observed data are $\{(T_{ij}, Y_{ij}, X_i, S_i);\ 1 \le j \le n_i,\ 1 \le i \le n\}$.
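As a small illustration of this data structure, the sketch below derives the intervention indicator and the time since intervention for one subject. The function name `intervention_covariates` and the visit times are hypothetical and chosen only for illustration; they are not taken from the trial data.

```python
import numpy as np

def intervention_covariates(t, s):
    """Return (delta, r) for visit times t and change-point time s.

    delta_ij = 1[T_ij >= S_i] is the intervention indicator and
    R_ij = T_ij - S_i is the time elapsed since the change-point.
    """
    t = np.asarray(t, dtype=float)
    delta = (t >= s).astype(int)
    r = t - s
    return delta, r

# Hypothetical subject: five visits (in months), intervention started at month 2.
delta, r = intervention_covariates([0.0, 1.0, 2.0, 3.5, 5.0], s=2.0)
print(delta)  # [0 0 1 1 1]
print(r)
```

Only the products of `delta` with functions of `r` or `t` enter the post-intervention part of the models that follow, so negative values of `r` before the change-point never contribute.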

Because our objective is to investigate the relationships between Yij and (Tij,Xi) before and after the change-point Si, our model does not include subjects who have not received the concomitant intervention within the study period [𝒯0, 𝒯1]. In some situations, these subjects may provide useful information for the pre-intervention covariate effects on Yij. However, models incorporating such subjects require additional assumptions and/or models on the treatment receipt pattern and disease pathology. Hence, discussion on this problem is out of the scope of this paper. When the context is clear, we may interchange intervention with concomitant intervention and denote the random variables by (T, Y,X, S, δ,R).

2.2 A Naive Linear Mixed-Effects Model: Review

Intuitively, we may evaluate the intervention effects by comparing the response trajectories before and after the change-points. For ease of notation, we discuss the case without $X$. Suppose that $Y_{ij} = a_{0i} + a_{1i}T_{ij} + \varepsilon_{ij}$ when $T_{ij} < S_i$ and $Y_{ij} = (a_{0i} + b_{0i}) + (a_{1i} + b_{1i})T_{ij} + \varepsilon_{ij}$ when $T_{ij} \ge S_i$, for some subject-specific parameters $(a_{0i}, a_{1i}, b_{0i}, b_{1i})$ and measurement errors $\varepsilon_{ij}$. The individual intervention effects for the $i$th subject are characterized by $(b_{0i}, b_{1i})$, and the marginal intervention effects for the population are then characterized by $E(b_{0i}, b_{1i})^T = (\beta_0, \beta_1)^T$. Using the framework of linear mixed-effects models (see Ch. 3 in [3]), an intuitive change-point model is

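$$
\begin{cases}
Y_{ij} = \mathbf{T}_{ij}^T a_i + (\mathbf{T}_{ij}^*)^T b_i + \varepsilon_{ij}, \\
(a_i^T, b_i^T)^T \sim \mathrm{MVN}\big((\alpha^T, \beta^T)^T, \Gamma\big) \ \text{for some unknown } (\alpha, \beta, \Gamma),
\end{cases} \tag{2.1}
$$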

where, for known constants $D_1$ and $D_2$, $a_i = (a_{0i}, \ldots, a_{D_1 i})^T$, $b_i = (b_{0i}, \ldots, b_{D_2 i})^T$, $\mathbf{T}_{ij} = (1, T_{ij}, \ldots, T_{ij}^{D_1})^T$, $\mathbf{T}_{ij}^* = (\delta_{ij}, \delta_{ij}T_{ij}, \ldots, \delta_{ij}T_{ij}^{D_2})^T$, $\varepsilon_{ij}$ are mean zero error processes, and $(a_i^T, b_i^T)^T$ and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^T$ are independent. The intervention effects are characterized by $b_i$ for the $i$th subject and $E(b_i) = \beta$ for the population. A crucial assumption of (2.1) is that $\{a_i, b_i\}$ and $S_i$ are independent. Although it appears that the time-varying intervention is incorporated as a covariate by the term involving $\delta_{ij}$, we will demonstrate in Sections 5 and 6 that, by ignoring the correlations between $\{a_i, b_i\}$ and $S_i$ in the distribution assumption of $\{a_i, b_i\}$, (2.1) is a misspecified model for our data and may lead to biased estimates for the intervention effects.

2.3 The Shared-Parameter Model

To model the initiation of the concomitant intervention, a natural extension of (2.1) is to allow the intervention starting time Si to be correlated with the pre-intervention random coefficients ai or more generally {ai, bi}. When the context is clear, we will denote by μ1(·; ai) and [μ1(·; ai) + μ2(·; bi)] the subject-specific response curves before and after the start of the intervention, respectively. We interpret μ2(·; bi) as the intervention effect. Given {Tij ,Xi, Si}, our shared-parameter model is

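$$
\begin{cases}
Y_{ij} = \mu_1(T_{ij}, X_i; a_i) + \delta_{ij}\,\mu_2(T_{ij}, X_i, R_{ij}; b_i) + \varepsilon_{ij}, \\
(a_i^T, b_i^T, S_i)^T \sim \text{Joint Distribution},
\end{cases} \tag{2.2}
$$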

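where $\varepsilon_{ij}$ are mean zero errors with $\mathrm{cov}(\varepsilon_{ij_1}, \varepsilon_{ij_2}) = \sigma_{ij_1j_2}$, $\varepsilon_{i_1j_1}$ and $\varepsilon_{i_2j_2}$ are independent if $i_1 \ne i_2$, and, conditioning on $\{a_i, b_i\}$, $S_i$ and $\{T_{ij}, X_i\}$ are independent. In addition, we assume that $\{a_i, b_i\}$ and $\{T_{ij}, X_i\}$ are independent.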

Let $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$, $T_i = (T_{i1}, \ldots, T_{in_i})^T$ and $H(\cdot, \cdot)$ be the joint distribution function of $\{a_i, b_i\}$. The joint likelihood of $(Y_i^T, S_i)^T$ given $\{T_i, X_i\}$ is

$$
f(Y_i, S_i \mid T_i, X_i) = \int f(Y_i \mid T_i, X_i, S_i, a_i, b_i)\, f(S_i \mid a_i, b_i)\, dH(a_i, b_i), \tag{2.3}
$$

where f(·|·) denotes the conditional density. Because of the extra f(Si|ai, bi) in the integrand, (2.3) differs from the usual likelihood functions for the mixed-effects models (see p. 24 in [3]).

Unlike (2.1), (2.2) is a change-point model with shared parameters {ai, bi} which determine both the response curves of Yij and the distribution of Si. The shared parameters approach was proposed for modeling the behaviors of informative missing data [7]. In (2.2), the correlation between Si and ai suggests that the ith subject’s intervention starting time is determined by the pre-intervention response curve μ1, while the correlation between Si and bi suggests that Si may also influence the response curve μ2 that characterizes the intervention effect.

2.4 The Varying-Coefficient Mixed-Effects Model

The approach based on the joint likelihood (2.3) can be computationally complicated and requires some assumptions about the distribution of Si. In this paper, we consider a simpler method based on the conditional model, which is robust to the distributional assumption of Si. The conditional distribution can be written as

$$
f(Y_i \mid S_i, T_i, X_i) = \int f(Y_i \mid T_i, X_i, S_i, a_i, b_i)\, dG(a_i, b_i \mid S_i). \tag{2.4}
$$

Then we can rewrite (2.2) as a varying-coefficient model using the conditional distribution of $\{a_i, b_i\}$ given $S_i$. When $\mu_1$ and $\mu_2$ are linear functions, let $\mu_1(T_{ij}, X_i; a_i) = Z_{ij}^T a_i$ for $Z_{ij} = (Z_{ij0}, \ldots, Z_{ijD_1})^T$ generated by $\{(T_{ij}, X_i);\ 1 \le j \le n_i,\ \delta_{ij} = 0\}$, and $\mu_2(T_{ij}, X_i, S_i; b_i) = W_{ij}^T b_i$ for $W_{ij} = (W_{ij0}, \ldots, W_{ijD_2})^T$ generated by $\{(T_{ij}, X_i, S_i);\ 1 \le j \le n_i,\ \delta_{ij} = 1\}$. Writing $\alpha(S_i) = E(a_i \mid S_i)$, $\beta(S_i) = E(b_i \mid S_i)$, $a_i^* = a_i - \alpha(S_i)$ and $b_i^* = b_i - \beta(S_i)$, our varying-coefficient mixed-effects model has the expression

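$$
\begin{cases}
Y_{ij} = Z_{ij}^T[\alpha(S_i) + a_i^*] + \delta_{ij} W_{ij}^T[\beta(S_i) + b_i^*] + \varepsilon_{ij}, \\
(a_i^{*T}, b_i^{*T})^T \mid S_i \sim G(\cdot \mid S_i),
\end{cases} \tag{2.5}
$$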

where, for $S_i = s$, $G(\cdot \mid s)$ is a distribution function with mean zero and covariance matrix $\mathrm{cov}[(a_i^{*T}, b_i^{*T})^T \mid s] = C(s)$. Marginal parameters of interest are $\alpha(s)$ and $\beta(s)$. When $S_i = s$, the mean intervention effect is $\beta(s)$, and $\beta(s) = 0$ for all $s \in (\mathcal{T}_0, \mathcal{T}_1)$ implies that the concomitant intervention had no marginal effect on the response curve.
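To make the contrast with the naive model (2.1) explicit, it may help to write out the conditional mean implied by (2.5). The display below is a short derivation under the assumptions already stated (mean zero errors, and the centered random effects having conditional mean zero given the change-point time); it is a sketch rather than an additional result.

```latex
\begin{aligned}
E(Y_{ij} \mid T_{ij}, X_i, S_i = s)
  &= Z_{ij}^T \{\alpha(s) + E(a_i^* \mid S_i = s)\}
   + \delta_{ij} W_{ij}^T \{\beta(s) + E(b_i^* \mid S_i = s)\}
   + E(\varepsilon_{ij}) \\
  &= Z_{ij}^T \alpha(s) + \delta_{ij} W_{ij}^T \beta(s).
\end{aligned}
```

Model (2.1) corresponds to the special case in which the coefficient curves are constant in $s$, because independence of the random coefficients and the change-point time gives $E(a_i \mid S_i) = \alpha$ and $E(b_i \mid S_i) = \beta$. Any dependence of $\alpha(s)$ or $\beta(s)$ on $s$ is therefore left out of the mean structure of (2.1).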

An obvious choice for $G(\cdot \mid S_i)$ is the multivariate normal distribution with mean zero and covariance matrix $C = \mathrm{cov}[(a_i^{*T}, b_i^{*T})^T \mid s]$, which, for simplicity, is assumed to be time-invariant. Extensions to time-dependent covariances can be made by modeling $C(s)$. Since explicit forms of $G(\cdot \mid S_i)$ are often unknown, modeling $\alpha(s)$ and $\beta(s)$ is often more important than modeling $C(s)$. In linear models, we have $\alpha(s; \gamma) = (\alpha_0(s; \gamma_0), \ldots, \alpha_{D_1}(s; \gamma_{D_1}))^T$, $\beta(s; \tau) = (\beta_0(s; \tau_0), \ldots, \beta_{D_2}(s; \tau_{D_2}))^T$,

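$$
\alpha_d(s; \gamma) = \sum_{l=0}^{L_d} \gamma_{dl}\,\mathcal{T}_{dl}(s) \quad \text{and} \quad \beta_d(s; \tau) = \sum_{m=0}^{M_d} \tau_{dm}\,\mathcal{T}_{dm}^*(s), \tag{2.6}
$$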

where $\{L_d, M_d\}$ are fixed, and $\{\mathcal{T}_{dl}(s), \mathcal{T}_{dm}^*(s)\}$ are known transformations of $s$. The choice of $\mathcal{T}_{dl}(s) = s^l$ and $\mathcal{T}_{dm}^*(s) = s^m$ leads to polynomials for (2.6).

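Extended linear models can be used to approximate $\{\alpha(s), \beta(s)\}$ when their parametric forms are unknown. Let $\{\mathcal{B}_{d_1}(s) = (\mathcal{B}_{d_1 0}(s), \ldots, \mathcal{B}_{d_1 \mathcal{L}_{d_1}}(s))^T;\ 0 \le d_1 \le D_1\}$ and $\{\mathcal{B}_{d_2}^*(s) = (\mathcal{B}_{d_2 0}^*(s), \ldots, \mathcal{B}_{d_2 \mathcal{M}_{d_2}}^*(s))^T;\ 0 \le d_2 \le D_2\}$ be some pre-specified basis functions. Then $\alpha(s)$ and $\beta(s)$ can be approximated by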

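$$
\alpha_d(s; \gamma) \approx \sum_{l=0}^{\mathcal{L}_d} \gamma_{dl}\,\mathcal{B}_{dl}(s) \quad \text{and} \quad \beta_d(s; \tau) \approx \sum_{m=0}^{\mathcal{M}_d} \tau_{dm}\,\mathcal{B}_{dm}^*(s), \tag{2.7}
$$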

where ℒd and ℳd may tend to infinity as n → ∞. Popular basis choices include truncated polynomial bases, Fourier bases or B-splines. In this paper, we restrict our attention to B-splines with fixed knot sequences because of their superior numerical stability. The smoothing parameters {ℒd,ℳd} may be chosen subjectively or by a variable selection procedure, such as cross-validation and information criteria [8,9,20]. An alternative smoothing approach is to approximate {α(s), β(s)} by smoothing splines [21,22]. Because the explicit expressions and statistical properties of smoothing spline estimators are generally different from B-splines, we do not discuss this class of estimators in this paper.
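For readers who want to see what a B-spline basis with a fixed knot sequence looks like numerically, the following is a minimal sketch using the Cox-de Boor recursion in NumPy. The function name `bspline_basis` and the clamped (repeated boundary knot) construction are assumptions made for this illustration; the paper does not prescribe an implementation. The usage lines mirror the configuration reported in the simulation section (quadratic splines on [0, 6] with two equally spaced interior knots).

```python
import numpy as np

def bspline_basis(s, interior_knots, degree, lower, upper):
    """Evaluate all B-spline basis functions at the points s (Cox-de Boor recursion)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    # Clamped knot vector: each boundary knot is repeated degree + 1 times.
    knots = np.concatenate([np.repeat(float(lower), degree + 1),
                            np.asarray(interior_knots, dtype=float),
                            np.repeat(float(upper), degree + 1)])
    n_intervals = len(knots) - 1
    # Degree-0 basis: indicators of the half-open knot intervals.
    basis = np.zeros((len(s), n_intervals))
    for k in range(n_intervals):
        basis[:, k] = (knots[k] <= s) & (s < knots[k + 1])
    # Close the last non-empty interval so that s == upper is covered.
    basis[s == upper, n_intervals - degree - 1] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        new = np.zeros((len(s), n_intervals - d))
        for k in range(n_intervals - d):
            left_den = knots[k + d] - knots[k]
            right_den = knots[k + d + 1] - knots[k + 1]
            if left_den > 0:
                new[:, k] += (s - knots[k]) / left_den * basis[:, k]
            if right_den > 0:
                new[:, k] += (knots[k + d + 1] - s) / right_den * basis[:, k + 1]
        basis = new
    return basis

# Quadratic B-splines on [0, 6] with interior knots at 2 and 4.
grid = np.linspace(0.0, 6.0, 61)
B = bspline_basis(grid, interior_knots=[2.0, 4.0], degree=2, lower=0.0, upper=6.0)
print(B.shape)  # (61, 5)
```

Each row of `B` holds the basis values at one point of the grid and sums to one, so a coefficient curve is approximated by `B @ gamma` for a coefficient vector `gamma` of length five.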

3 Estimation Methods

3.1 Likelihood-Based Estimation

If (2.3) has an explicit parametric expression, the parameters can in principle be estimated by maximizing the log-likelihood $\sum_{i=1}^n \log f(Y_i, S_i \mid T_i, X_i)$. Suppose that (2.5) and (2.6) are satisfied, $G(\cdot \mid s)$ is Gaussian, and $(\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^T \sim N(0, \Gamma_i)$. We can estimate $\{\gamma, \tau\}$ by maximizing the partial likelihood

$$
L(\{\gamma, \tau\} \mid T_i, X_i, S_i) = \sum_{i=1}^n \log\left[\int f(Y_i \mid T_i, X_i, S_i, a_i, b_i)\, dG(a_i, b_i \mid S_i)\right]. \tag{3.1}
$$

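Let $\mathcal{W}_i$ be the matrix whose $j$th row is $(Z_{ij}^T, \delta_{ij}W_{ij}^T)$, $\mathcal{T}_d(s) = (\mathcal{T}_{d0}(s), \ldots, \mathcal{T}_{dL_d}(s))^T$, $\mathcal{T}_d^*(s) = (\mathcal{T}_{d0}^*(s), \ldots, \mathcal{T}_{dM_d}^*(s))^T$, $\mathcal{T}(s) = \mathrm{diag}\{\mathcal{T}_0^T(s), \ldots, \mathcal{T}_{D_1}^T(s), \mathcal{T}_0^{*T}(s), \ldots, \mathcal{T}_{D_2}^{*T}(s)\}$, $\mathcal{T}(S_i) = \mathcal{T}_i$, and $V_i$ be the covariance matrix of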

$$
e_{ij} = Z_{ij}^T a_i^* + \delta_{ij} W_{ij}^T b_i^* + \varepsilon_{ij}, \quad 1 \le j \le n_i. \tag{3.2}
$$

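The matrix representation for (2.6) is $(\alpha^T(s; \gamma), \beta^T(s; \tau))^T = \mathcal{T}(s)(\gamma^T, \tau^T)^T$, where $\gamma_d = (\gamma_{d0}, \ldots, \gamma_{dL_d})^T$, $\gamma = (\gamma_0^T, \ldots, \gamma_{D_1}^T)^T$, $\tau_d = (\tau_{d0}, \ldots, \tau_{dM_d})^T$ and $\tau = (\tau_0^T, \ldots, \tau_{D_2}^T)^T$. When $V_i$ are known, maximizing (3.1) leads to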

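$$
\begin{pmatrix} \hat\gamma_{ML}(\mathcal{T}) \\ \hat\tau_{ML}(\mathcal{T}) \end{pmatrix}
= \left\{\sum_{i=1}^n [\mathcal{W}_i\mathcal{T}_i]^T V_i^{-1} [\mathcal{W}_i\mathcal{T}_i]\right\}^{-1}
\left\{\sum_{i=1}^n [\mathcal{W}_i\mathcal{T}_i]^T V_i^{-1} Y_i\right\} \tag{3.3}
$$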

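provided that $\sum_{i=1}^n [(\mathcal{W}_i\mathcal{T}_i)^T V_i^{-1} (\mathcal{W}_i\mathcal{T}_i)]$ is nonsingular. When $V_i$ are unknown but can be consistently estimated by a nonsingular $\hat V_i$, we can estimate $\{\gamma, \tau\}$ by $\{\tilde\gamma_{ML}(\mathcal{T}), \tilde\tau_{ML}(\mathcal{T})\}$, which are given by (3.3) with $V_i$ replaced by $\hat V_i$.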

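Substituting $\{\mathcal{T}_{dl}(s), \mathcal{T}_{dm}^*(s)\}$ in (3.3) with the basis functions $\{\mathcal{B}_{dl}(s), \mathcal{B}_{dm}^*(s)\}$, we can compute $\{\hat\gamma_{ML}(\mathcal{B}), \hat\tau_{ML}(\mathcal{B})\}$. Likelihood-based nonparametric estimators of $\{\alpha(s), \beta(s)\}$ under (2.7) and known $V_i$ are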

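$$
(\hat\alpha_{ML}^T(s; \mathcal{B}), \hat\beta_{ML}^T(s; \mathcal{B}))^T = \mathcal{B}(s)\,(\hat\gamma_{ML}^T(\mathcal{B}), \hat\tau_{ML}^T(\mathcal{B}))^T,
$$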

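where $\mathcal{B}_i = \mathcal{B}(S_i)$, and $\mathcal{B}(s)$ is defined similarly to $\mathcal{T}(s)$ with $\{\mathcal{T}_{dl}(s), \mathcal{T}_{dm}^*(s)\}$ replaced by $\{\mathcal{B}_{dl}(s), \mathcal{B}_{dm}^*(s)\}$. Nonparametric estimators computed with $\hat V_i$ used in (3.3) are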

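$$
(\tilde\alpha_{ML}^T(s; \mathcal{B}), \tilde\beta_{ML}^T(s; \mathcal{B}))^T = \mathcal{B}(s)\,(\tilde\gamma_{ML}^T(\mathcal{B}), \tilde\tau_{ML}^T(\mathcal{B}))^T.
$$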

3.2 Least-Squares Based Estimation

Likelihood-based estimates of $\{\alpha(s), \beta(s)\}$ cannot be computed when the explicit forms of $G(\cdot \mid S_i)$ and the distribution of $\varepsilon_{ij}$ are unknown. In such situations, a practical approach is to first parameterize $\{\alpha(s), \beta(s)\}$ by certain parametric models $\{\alpha(s; \gamma), \beta(s; \tau)\}$ and then derive the weighted least-squares estimators $\{\hat\gamma_{LS}, \hat\tau_{LS}\}$ which minimize

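$$
\ell(\gamma, \tau) = \sum_{i=1}^n \Big\{ \big[Y_i - \big(Z_i^T\alpha(S_i; \gamma) + (\delta W)_i^T\beta(S_i; \tau)\big)\big]^T \Lambda_i \big[Y_i - \big(Z_i^T\alpha(S_i; \gamma) + (\delta W)_i^T\beta(S_i; \tau)\big)\big] \Big\}, \tag{3.4}
$$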

where Zi = (Zi1, …, Zini)T, (δW)i = (δi1Wi1, …, δiniWini)T, and Λi are some pre-specified symmetric nonsingular ni × ni weight matrices. The weighted least-squares estimators for (2.6) are

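$$
\begin{pmatrix} \hat\gamma_{LS}(\mathcal{T}) \\ \hat\tau_{LS}(\mathcal{T}) \end{pmatrix}
= \left\{\sum_{i=1}^n [\mathcal{W}_i\mathcal{T}_i]^T \Lambda_i [\mathcal{W}_i\mathcal{T}_i]\right\}^{-1}
\left\{\sum_{i=1}^n [\mathcal{W}_i\mathcal{T}_i]^T \Lambda_i Y_i\right\}, \tag{3.5}
$$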

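where $\sum_{i=1}^n [\mathcal{W}_i\mathcal{T}_i]^T \Lambda_i [\mathcal{W}_i\mathcal{T}_i]$ is nonsingular, and the $j$th row of $\mathcal{W}_i$ is $(Z_{ij}^T, \delta_{ij}W_{ij}^T)$. Substituting the basis approximations (2.7) in (3.4), the least-squares based nonparametric estimators of $\{\alpha(s), \beta(s)\}$ are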

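$$
(\tilde\alpha_{LS}^T(s; \mathcal{B}), \tilde\beta_{LS}^T(s; \mathcal{B}))^T = \mathcal{B}(s)\,(\tilde\gamma_{LS}^T(\mathcal{B}), \tilde\tau_{LS}^T(\mathcal{B}))^T, \tag{3.6}
$$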

where {γ̂LS(ℬ), τ̂LS(ℬ)} are given in (3.5) with 𝒯 (s) replaced by ℬ(s). Consistency and the rates of convergence for (3.6) can be derived [9].

Clearly, (3.5) and (3.6) are the same as the likelihood-based estimators when $\Lambda_i = V_i^{-1}$ and normality assumptions hold. In practice, $V_i$ are usually unknown and often difficult to estimate, so that subjective choices for $\Lambda_i$ are used. Guidance on this choice is also available [8,9,23].
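The closed form (3.5) is an ordinary weighted least-squares solve once the per-subject design matrices have been formed. The sketch below assumes those matrices (the products of the subject's covariate matrix with the transformation or basis matrix evaluated at the subject's change-point time) are already available as NumPy arrays; the function name `weighted_ls` and the toy data are hypothetical.

```python
import numpy as np

def weighted_ls(designs, responses, weights=None):
    """Weighted least squares in the form of (3.5).

    designs[i]   : n_i x p design matrix for subject i
    responses[i] : length n_i response vector Y_i
    weights[i]   : n_i x n_i symmetric weight matrix Lambda_i (identity if omitted)
    """
    p = designs[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for i, (d, y) in enumerate(zip(designs, responses)):
        lam = np.eye(len(y)) if weights is None else weights[i]
        lhs += d.T @ lam @ d
        rhs += d.T @ lam @ y
    # np.linalg.solve raises LinAlgError when the summed matrix is exactly singular.
    return np.linalg.solve(lhs, rhs)

# Toy check with noise-free responses: the coefficients are recovered exactly.
rng = np.random.default_rng(0)
truth = np.array([1.0, -2.0, 0.5])
designs = [rng.normal(size=(4, 3)) for _ in range(10)]
responses = [d @ truth for d in designs]
estimate = weighted_ls(designs, responses)
```

Passing the inverses of the subject covariance matrices as `weights` gives the form of (3.3); omitting `weights` corresponds to the identity weight.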

3.3 Estimation of the Covariances

The covariance structure Vi defined in Section 3.1 can be modeled in a number of ways. By the definition of eij in (3.2), the (j1, j2)th component of Vi is

$$
V_{i,j_1,j_2} = E(e_{ij_1} e_{ij_2}) = \rho_{i,j_1,j_2}(A, B, C) + \sigma_{i,j_1,j_2}, \tag{3.7}
$$

where $A = E(a_i^* a_i^{*T})$, $B = E(b_i^* b_i^{*T})$, $C = E(a_i^* b_i^{*T})$, $\sigma_{i,j_1,j_2} = E(\varepsilon_{ij_1}\varepsilon_{ij_2})$ and

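$$
\rho_{i,j_1,j_2}(A, B, C) = Z_{ij_1}^T A Z_{ij_2} + Z_{ij_1}^T C (\delta_{ij_2} W_{ij_2}) + (\delta_{ij_1} W_{ij_1}^T) C^T Z_{ij_2} + (\delta_{ij_1} W_{ij_1}^T) B (\delta_{ij_2} W_{ij_2}).
$$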

For the special case that $\varepsilon_{ij}$ are independent measurement errors such that $\sigma_{i,j_1,j_2} = 0$ if $j_1 \ne j_2$ and $\sigma^2$ if $j_1 = j_2$, $V_i$ adopts the parametric model $V_i(A, B, C, \sigma^2)$ with $V_{i,j_1,j_2} = \rho_{i,j_1,j_2}(A, B, C)$ if $j_1 \ne j_2$ and $\rho_{i,j,j}(A, B, C) + \sigma^2$ if $j_1 = j_2 = j$. Other structures for $V_i$ can be formulated by modeling $\sigma_{i,j_1,j_2}$ [4].
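In matrix form, the special case with independent measurement errors can be assembled directly from the subject's covariate matrices. The sketch below is illustrative only: the function name `subject_covariance` and the numerical values of the covariance components are hypothetical, and the cross term uses the transpose of the cross-covariance matrix so that the result is symmetric.

```python
import numpy as np

def subject_covariance(Z, dW, A, B, C, sigma2):
    """Covariance of (e_i1, ..., e_in_i) under independent errors with variance sigma2.

    Z  : n_i x (D1 + 1) matrix with rows Z_ij^T
    dW : n_i x (D2 + 1) matrix with rows delta_ij * W_ij^T
    A, B : covariance matrices of the centered random effects a_i^*, b_i^*
    C  : cross-covariance E(a_i^* b_i^{*T})
    """
    rho = Z @ A @ Z.T + Z @ C @ dW.T + dW @ C.T @ Z.T + dW @ B @ dW.T
    return rho + sigma2 * np.eye(Z.shape[0])

# Hypothetical subject: linear trends before and after a change-point at 1.5.
t = np.array([0.0, 1.0, 2.0, 3.0])
s = 1.5
delta = (t >= s).astype(float)
Z = np.column_stack([np.ones_like(t), t])
dW = np.column_stack([delta, delta * (t - s)])
V = subject_covariance(Z, dW, A=np.eye(2), B=np.eye(2), C=np.zeros((2, 2)), sigma2=4.0)
```

Because the random-effect part is positive semidefinite and the error variance is positive, the resulting matrix is positive definite whenever the stacked covariance of the random effects is a valid covariance matrix.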

For the general case of $\varepsilon_{ij}$ having unknown correlation structures, $\sigma_{i,j_1,j_2}$ is a nonparametric component in (3.7), and hence can be either directly estimated or approximated by a parametric model. Under a different regression model, a local smoothing technique was suggested but can be computationally intensive [24]. To ease the computational burden, a consistent covariance estimator can be constructed by B-spline approximations [9]. Specifically, $\sigma_{i,j_1,j_2}$ can be approximated via B-splines by $\sigma_{i,j_1,j_2}(u, v) = \sum_{k=1}^{K_1}\sum_{l=1}^{K_1} u_{kl}\,\mathcal{B}_k(T_{ij_1})\,\mathcal{B}_l(T_{ij_2})$ if $j_1 \ne j_2$, and $\sum_{k=1}^{K_2} v_k\,\mathcal{B}_k(T_{ij})$ if $j_1 = j_2 = j$, where $\{\mathcal{B}_k\}$ is a spline basis with a fixed knot sequence, $u = \{u_{kl} = u_{lk};\ k, l = 1, \ldots, K_1\}$ and $v = \{v_k;\ k = 1, \ldots, K_2\}$. Substituting $\sigma_{i,j_1,j_2}(u, v)$ into (3.7), $V_i$ is approximated by $V_i(A, B, C, u, v)$ such that

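$$
V_{i,j_1,j_2} =
\begin{cases}
\rho_{i,j_1,j_2}(A, B, C) + \sum_{k=1}^{K_1}\sum_{l=1}^{K_1} u_{kl}\,\mathcal{B}_k(T_{ij_1})\,\mathcal{B}_l(T_{ij_2}), & \text{if } j_1 \ne j_2; \\
\rho_{i,j,j}(A, B, C) + \sum_{k=1}^{K_2} v_k\,\mathcal{B}_k(T_{ij}), & \text{if } j_1 = j_2 = j.
\end{cases}
$$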

Once an approximate parametric model for $V_i$ is established, estimation of $V_i$ can be achieved by least squares. Let $\hat e_{ij} = Y_{ij} - [Z_{ij}^T\hat\alpha(S_i) + \delta_{ij}W_{ij}^T\hat\beta(S_i)]$ be the residual of $Y_{ij}$ computed based on some consistent estimators $\hat\alpha(s)$ and $\hat\beta(s)$. If $V_{i,j_1,j_2} = \rho_{i,j_1,j_2}(A, B, C) + \sigma_{i,j_1,j_2}(u, v)$, we can estimate $V_i$ by $V_i(\hat A, \hat B, \hat C, \hat u, \hat v)$, where $(\hat A, \hat B, \hat C, \hat u, \hat v)$ minimizes

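$$
\sum_{i=1}^n \sum_{\substack{j_1, j_2 = 1 \\ j_1 < j_2}}^{n_i} \Big\{ \hat e_{ij_1}\hat e_{ij_2} - \Big[\rho_{i,j_1,j_2}(A, B, C) + \sum_k \sum_l u_{kl}\,\mathcal{B}_k(T_{ij_1})\,\mathcal{B}_l(T_{ij_2})\Big] \Big\}^2
$$

subject to $u_{kl} = u_{lk}$ when $j_1 \ne j_2$, and

$$
\sum_{i=1}^n \sum_{j=1}^{n_i} \Big\{ \hat e_{ij}^2 - \Big[\rho_{i,j,j}(A, B, C) + \sum_k v_k\,\mathcal{B}_k(T_{ij})\Big] \Big\}^2
$$

when $j_1 = j_2 = j$.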

Here, $V_i(\hat A, \hat B, \hat C, \hat u, \hat v)$ need not be positive definite for a finite sample, although, by consistency, it is asymptotically positive definite [9]. The problem of imposing finite sample positive definiteness on the spline estimators of $V_i$ deserves substantial further investigation. The adequacy of $V_i(\hat A, \hat B, \hat C, \hat u, \hat v)$ depends on the choices of knots and the degrees of the splines. Although it is possible to develop data-driven knots using cross-validation or generalized cross-validation, statistical properties of such procedures are currently unknown. Subjective knot choices, such as using a few equally spaced knots, often give satisfactory results in biomedical applications.

4 Inferences

4.1 Inferences for Linear Models

Following the classical inferential framework with linear mixed-effects models, we consider the inferences for the fixed effects of (2.5) and (2.6) given $\{(Z_i, \delta_{ij}, W_{ij}, S_i);\ 1 \le i \le n,\ 1 \le j \le n_i\}$. From (3.5), $E[\hat\gamma_{LS}(\mathcal{T})] = \gamma$, $E[\hat\tau_{LS}(\mathcal{T})] = \tau$ and the covariance of $(\hat\gamma_{LS}^T(\mathcal{T}), \hat\tau_{LS}^T(\mathcal{T}))^T$ is

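$$
\left[\sum_{i=1}^n (\mathcal{W}_i\mathcal{T}_i)^T \Lambda_i (\mathcal{W}_i\mathcal{T}_i)\right]^{-1}
\left[\sum_{i=1}^n (\mathcal{W}_i\mathcal{T}_i)^T \Lambda_i V_i \Lambda_i (\mathcal{W}_i\mathcal{T}_i)\right]
\left[\sum_{i=1}^n (\mathcal{W}_i\mathcal{T}_i)^T \Lambda_i (\mathcal{W}_i\mathcal{T}_i)\right]^{-1}.
$$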

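Let $\hat\xi_{LS} = (\hat\gamma_{LS}^T(\mathcal{T}), \hat\tau_{LS}^T(\mathcal{T}))^T$, $\xi = (\gamma^T, \tau^T)^T$, and $\mathcal{L}\hat\xi_{LS}$ and $\mathcal{L}\xi$ be their corresponding linear combinations. Following the central limit theorem (see Sec. 1.9.3 in [25]), it can be shown that, when $n$ is sufficiently large, $\mathcal{L}\hat\xi_{LS}$ is asymptotically distributed as $N(\mathcal{L}\xi, \mathrm{Var}(\mathcal{L}\hat\xi_{LS}))$, where $\mathrm{Var}(\mathcal{L}\hat\xi_{LS})$ can be derived from the covariance matrix of $\hat\xi_{LS}$. Substituting $\mathrm{Var}(\mathcal{L}\hat\xi_{LS})$ with a consistent estimate $\widehat{\mathrm{Var}}(\mathcal{L}\hat\xi_{LS})$, an approximate $(1 - \alpha)$ confidence interval for $\mathcal{L}\xi$ is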

$$
\mathcal{L}\hat\xi_{LS} \pm Z_{\alpha/2}\,[\widehat{\mathrm{Var}}(\mathcal{L}\hat\xi_{LS})]^{1/2}, \tag{4.1}
$$

where Zα/2 is the [100 × (1 − α/2)]th percentile of the standard normal distribution.
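The covariance expression above and the interval (4.1) can be sketched numerically as follows. This is a minimal illustration that assumes the per-subject design matrices, weight matrices and covariance matrices (or consistent estimates of them) are supplied by the user; the function names `sandwich_cov` and `wald_interval` are hypothetical, and the default critical value is the 97.5th percentile of the standard normal distribution.

```python
import numpy as np

def sandwich_cov(designs, weights, covs):
    """Covariance of the weighted least-squares estimator (bread-meat-bread form)."""
    p = designs[0].shape[1]
    bread = np.zeros((p, p))
    meat = np.zeros((p, p))
    for d, lam, v in zip(designs, weights, covs):
        bread += d.T @ lam @ d
        meat += d.T @ lam @ v @ lam @ d
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv

def wald_interval(ell, xi_hat, cov, z=1.959964):
    """Approximate interval for the linear combination ell' xi, as in (4.1)."""
    center = ell @ xi_hat
    half_width = z * np.sqrt(ell @ cov @ ell)
    return center - half_width, center + half_width

# Toy inputs: five subjects, four visits each, two coefficients.
rng = np.random.default_rng(3)
designs = [rng.normal(size=(4, 2)) for _ in range(5)]
covs = []
for _ in range(5):
    m = rng.normal(size=(4, 4))
    covs.append(m @ m.T + 4.0 * np.eye(4))      # symmetric positive definite
weights = [np.linalg.inv(v) for v in covs]       # the choice Lambda_i = V_i^{-1}
cov = sandwich_cov(designs, weights, covs)
lo, hi = wald_interval(np.array([1.0, 0.0]), np.array([2.0, -1.0]), cov)
```

With the weights set to the inverse covariance matrices, the sandwich collapses to the inverse of the summed weighted cross-products, which is a convenient sanity check on an implementation.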

Let L be any given matrix with rank(L) = d. The above asymptotic approximations can be used to test the null hypothesis, H0: Lξ = C for a known constant vector C, versus the alternative, HA: Lξ ≠ C. An α-level approximate χ2-test rejects the null hypothesis if

$$
(L\hat\xi_{LS} - C)^T \{L[\mathrm{cov}(\hat\xi_{LS})]L^T\}^{-1} (L\hat\xi_{LS} - C) > \chi^2_{d,\alpha}, \tag{4.2}
$$

where $\chi^2_{d,\alpha}$ is the $[100 \times (1 - \alpha)]$th percentile of the $\chi^2_d$-distribution.

4.2 Bootstrap Confidence Intervals

Nonparametric inferences for the smoothing estimators of $\alpha(s)$ and $\beta(s)$ can be constructed using the “resampling-subject” bootstrap [8,26]. For the construction of bootstrap pointwise confidence intervals (CIs), this procedure generates bootstrap samples $\{(Y_{ij}^b, Z_{ij}^b, (\delta_{ij}W_{ij})^b, S_i^b);\ 1 \le i \le n,\ 1 \le j \le n_i\}$ by sampling the subjects with replacement from the original data and obtains the spline estimators $\tilde\alpha_{LS}^{(b)}(s, \mathcal{B})$ and $\tilde\beta_{LS}^{(b)}(s, \mathcal{B})$. Repeat the procedure multiple times, and let $L_{\alpha/2}(\tilde\alpha_{LS,d}(s, \mathcal{B}))$ and $U_{\alpha/2}(\tilde\alpha_{LS,d}(s, \mathcal{B}))$ be the lower and upper $[100 \times (\alpha/2)]$th percentiles of the bootstrap estimators of $\alpha_d(s)$. An approximate $(1 - \alpha)$ pointwise CI for $\alpha_d(s)$ is $[L_{\alpha/2}(\tilde\alpha_{LS,d}(s, \mathcal{B})), U_{\alpha/2}(\tilde\alpha_{LS,d}(s, \mathcal{B}))]$. Confidence intervals for $\beta_d(s)$ and other parameters can be constructed similarly. Moreover, variances of the parameters can be estimated by the sample variances of bootstrap estimates.
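The resampling-subject procedure can be sketched generically: whole subjects, rather than individual visits, are drawn with replacement, and the estimator is recomputed on each bootstrap sample. In the sketch below the function name `resample_subject_bootstrap`, the toy data and the pooled-mean estimator are hypothetical stand-ins; in the setting of this paper the estimator would return the spline estimates on a grid of $s$ values.

```python
import numpy as np

def resample_subject_bootstrap(subjects, estimator, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap that resamples whole subjects with replacement.

    subjects  : list with one entry per subject (any per-subject data object)
    estimator : function mapping a list of subjects to a scalar or 1-D array
    Returns (lower, upper, variance) computed across bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    n = len(subjects)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # subject indices, with replacement
        draws.append(estimator([subjects[i] for i in idx]))
    draws = np.asarray(draws)
    lower = np.percentile(draws, 100 * alpha / 2, axis=0)
    upper = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    return lower, upper, draws.var(axis=0, ddof=1)

# Toy usage: 50 subjects with unequal numbers of visits, estimator = pooled mean.
rng = np.random.default_rng(1)
subjects = [rng.normal(loc=2.0, size=rng.integers(3, 8)) for _ in range(50)]
pooled_mean = lambda sample: np.concatenate(sample).mean()
lo, hi, var = resample_subject_bootstrap(subjects, pooled_mean)
```

Resampling subjects keeps each subject's repeated measurements together, so the within-subject correlation is preserved in every bootstrap sample.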

5 Application to Pharmacotherapy in the ENRICHD Study

As described in Section 1, our objective is to evaluate the additional effects of pharmacotherapy (antidepressants) on the trends of depression (measured by BDI scores) for patients who received pharmacotherapy during the six-month psychosocial treatment period. Because pharmacotherapy was only designed as a concomitant intervention in this trial, the starting time of pharmacotherapy was decided by the patients or their physicians. Unfortunately, since patients in the usual care arm did not have accurate pharmacotherapy starting times and repeated BDI scores recorded within the first six-month period, the effects of pharmacotherapy could not be properly analyzed for these patients (refer to [11] for more details). Ninety-one patients (1,446 observations in total) in the psychosocial treatment arm received pharmacotherapy as a concomitant intervention during this period and had clear records of their pharmacotherapy starting time. Among them, 43 started pharmacotherapy at baseline and 48 started pharmacotherapy between 7 and 172 days. The number of visits for these patients ranges from 5 to 36, with a median of 16. Patients who did not have proper records of antidepressant use were excluded.

For the $i$th patient, $Y_{ij}$, $T_{ij}$, $S_i$, $R_{ij} = T_{ij} - S_i$ and $\delta_{ij} = 1_{[T_{ij} \ge S_i]}$ are the BDI score, trial time (in months), starting time of pharmacotherapy, time from initiation of pharmacotherapy, and pharmacotherapy indicator, respectively, at the $j$th visit. Our preliminary examination of the data revealed that the BDI scores over $T_{ij}$ could be approximated by a linear model (results not shown). An intuitive model is the following special case of (2.1),

$$
Y_{ij} = a_{0i} + a_{1i}T_{ij} + b_{0i}\delta_{ij} + b_{1i}\delta_{ij}R_{ij} + \varepsilon_{ij}, \tag{5.1}
$$

where E(a0i, a1i, b0i, b1i)T = (α0, α1, β0, β1)T and, when δij = 1 and Rij = r, (β0 + β1r) describes the mean pharmacotherapy effect at r months since the start of pharmacotherapy. Clearly, (5.1) ignores the correlation between Si and the pre-pharmacotherapy depression trends. To evaluate whether (5.1) leads to potential bias, we next considered the following special case of (2.5),

$$
Y_{ij} = \alpha_0(S_i) + \alpha_1(S_i)T_{ij} + \beta_0\delta_{ij} + \beta_1\delta_{ij}R_{ij} + e_{ij}, \tag{5.2}
$$

where $e_{ij} = a_{i0}^* + a_{i1}^*T_{ij} + b_{i0}^*\delta_{ij} + b_{i1}^*\delta_{ij}R_{ij} + \varepsilon_{ij}$, $\alpha_0(S_i) = \gamma_{00} + \gamma_{01}S_i$ and $\alpha_1(S_i) = \gamma_{10} + \gamma_{11}S_i$. In (5.2), the mean pre-pharmacotherapy BDI trend is associated with $S_i$ through the intercept $\alpha_0(S_i)$ and slope $\alpha_1(S_i)$. At $r$ months after the start of pharmacotherapy, the mean pharmacotherapy effect is $\beta_0 + \beta_1 r$, where a negative value for $\beta_0 + \beta_1 r$ corresponds to a beneficial effect for reducing depression. To reduce model complexity, we assume in (5.2) that $\beta_0(S_i) \equiv \beta_0$ and $\beta_1(S_i) \equiv \beta_1$, in the sense that the effects of pharmacotherapy depend only on how long the antidepressant has been used, but not on when it was started.

Table 1 summarizes the parameter estimates and their corresponding standard errors, 95% CIs and p-values obtained by the REML procedure with unstructured correlations. The negative estimates for (β0, β1) under (5.1) and (5.2) suggest that the beneficial effect of pharmacotherapy for this patient population is detected by both models. However, a slightly stronger depression lowering effect is exhibited by (5.2). The 95% CI for γ01 suggests a negative correlation of Si with baseline BDI scores, so that patients with higher baseline BDI scores tend to start pharmacotherapy sooner.

Table 1.

The ENRICHD Data Analysis

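| Model | Parameter | Estimate | SE | 95% CI | p-value |
|---|---|---|---|---|---|
| (5.1) | α0 | 23.380 | 1.107 | (21.167, 25.594) | <0.0001 |
| (5.1) | α1 | −0.619 | 0.479 | (−1.577, 0.339) | 0.199 |
| (5.1) | β0 | −3.410 | 0.994 | (−5.399, −1.422) | 0.0013 |
| (5.1) | β1 | −1.584 | 0.521 | (−2.626, −0.542) | 0.0039 |
| (5.2) | γ00 | 25.670 | 1.431 | (22.808, 28.533) | <0.0001 |
| (5.2) | γ01 | −1.389 | 0.586 | (−2.562, −0.216) | 0.0180 |
| (5.2) | γ10 | −0.278 | 0.822 | (−1.922, 1.366) | 0.736 |
| (5.2) | γ11 | 0.078 | 0.174 | (−0.272, 0.426) | 0.654 |
| (5.2) | β0 | −4.302 | 1.041 | (−6.385, −2.220) | 0.0001 |
| (5.2) | β1 | −2.062 | 0.773 | (−3.608, −0.516) | 0.0105 |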

Parameter estimates and their standard errors (SE), 95% confidence intervals (CIs) and p-values were obtained by restricted maximum likelihood with unstructured correlations for models (5.1) and (5.2).
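To make the comparison between the two fits concrete, the short sketch below evaluates the estimated mean pharmacotherapy effect at a few values of $r$ (months since the start of pharmacotherapy) using only the point estimates reported in Table 1. The function name `mean_effect` and the chosen values of $r$ are illustrative; the calculation ignores the uncertainty in the estimates and assumes the linear form of the effect holds over the range shown.

```python
def mean_effect(beta0, beta1, r):
    """Estimated mean pharmacotherapy effect beta0 + beta1 * r at r months."""
    return beta0 + beta1 * r

naive = {"beta0": -3.410, "beta1": -1.584}      # model (5.1), Table 1
adjusted = {"beta0": -4.302, "beta1": -2.062}   # model (5.2), Table 1

for r in (0, 1, 3):
    print(r,
          round(mean_effect(r=r, **naive), 3),
          round(mean_effect(r=r, **adjusted), 3))
```

At one month, for example, the point estimates give a change of about −4.99 BDI points under (5.1) and about −6.36 under (5.2).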

6 Simulation

Following the general framework (2.3), we consider a simulation design that resembles the data structure of the ENRICHD trial. Each simulated sample contains $n = 200$ subjects. Each subject has 30 “scheduled visits” at time points $(T_{i1}, \ldots, T_{i,30}) = (0, 0.2 + e_1, \ldots, 5.8 + e_{29})$, where $\{e_l\}$ are independently generated from the uniform $U(-0.2, 0.2)$ distribution, but each scheduled visit has a 40% probability of being skipped. This leads to unequal numbers of repeated measurements among the subjects, with $n_i$ being the number of repeated measurements for the $i$th subject. The random parameters $(a_i^T, b_i^T)^T = (a_{0i}, a_{1i}, b_{0i}, b_{1i})^T$ are generated from the multivariate normal distribution with mean $(25, 0, -4, -2)^T$ and covariance matrix $\mathrm{cov}(a_{0i}, a_{1i}, b_{0i}, b_{1i}) = \mathrm{diag}(6.25, 1, 1, 1)$. For each $\{a_i, b_i\}$, we generate two different change-point times: (a) $S_i \sim N(10 - 0.3a_{0i}, 0.16)$; and (b) $S_i \sim N(1 + 4\sin[(a_{0i} - 4)/9], 0.09)$. For each given $\{T_{ij}, S_i, a_i, b_i\}$, $Y_{ij}$ is generated from $N(a_{0i} + a_{1i}T_{ij} + b_{0i}\delta_{ij} + b_{1i}\delta_{ij}R_{ij}, 4)$. When $\{S_i;\ i = 1, \ldots, 200\}$ are generated from (a), direct calculation based on conditional normal distributions shows that the marginal model of $Y_{ij}$ is (5.2) with $\gamma_{00} = 31.49$, $\gamma_{01} = -2.60$, $\gamma_{10} = \gamma_{11} = 0$, $\beta_0 = -4$ and $\beta_1 = -2$. When $\{S_i;\ i = 1, \ldots, 200\}$ are generated from (b), we assume that the parametric form of $\alpha_0(S_i)$ is unknown and the marginal model of $Y_{ij}$ is

$$
Y_{ij} = \alpha_0(S_i) + \alpha_1 T_{ij} + \beta_0\delta_{ij} + \beta_1\delta_{ij}R_{ij} + e_{ij}, \tag{6.1}
$$

where $e_{ij} = a_{i0}^* + a_{i1}^*T_{ij} + b_{i0}^*\delta_{ij} + b_{i1}^*\delta_{ij}R_{ij} + \varepsilon_{ij}$, $\beta_0 = -4$, $\beta_1 = -2$ and $\alpha_0(S_i)$ is not a linear function.
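The intercept and slope quoted for design (a) follow from the standard formula for the conditional mean of one component of a bivariate normal vector. The sketch below reproduces that calculation; the function name `conditional_intercept` is hypothetical, and the second arguments of the normal distributions in the design are read as variances, which is the reading that matches the quoted values.

```python
def conditional_intercept(mu_a, var_a, c0, c1, var_s):
    """For a ~ N(mu_a, var_a) and S | a ~ N(c0 + c1 * a, var_s),
    return (g0, g1) such that E(a | S = s) = g0 + g1 * s."""
    mean_s = c0 + c1 * mu_a                     # E(S)
    total_var_s = c1 ** 2 * var_a + var_s       # Var(S)
    g1 = c1 * var_a / total_var_s               # Cov(a, S) / Var(S)
    g0 = mu_a - g1 * mean_s
    return g0, g1

# Design (a): a_0i ~ N(25, 6.25) and S_i | a_0i ~ N(10 - 0.3 * a_0i, 0.16).
g0, g1 = conditional_intercept(mu_a=25.0, var_a=6.25, c0=10.0, c1=-0.3, var_s=0.16)
print(round(g0, 3), round(g1, 3))  # 31.488 -2.595
```

These agree with the true values listed for the intercept coefficients in Table 2 and, after rounding to two decimals, with the values quoted in the text.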

The simulation was repeated 2,000 times. For samples with Si generated from (a), we first ignored the correlation between a0i and Si and estimated (α0, α1, β0, β1) using (5.1) with unstructured correlations, and then fitted the data to (5.2) with α1(Si) ≡ α1 and estimated (γ00, γ01, α1, β0, β1), all by REML. Table 2 summarizes the averages of the estimates, their standard errors, and root mean-squared errors as well as the empirical coverage probabilities of the 95% asymptotic CIs. The bias for the estimation of β0 in (5.1) can be seen from the large average root mean-squared errors and the low coverage probabilities compared with the estimates obtained using (5.2).
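The qualitative point of this comparison can be explored with a much cruder tool than REML. The sketch below generates data from design (a) and fits both mean structures by pooled ordinary least squares, which is a deliberate simplification: it ignores the within-subject correlation, so the size of the naive bias will differ from Table 2 and no standard errors are produced. The function name `simulate`, the seed and the sample size of 1,000 subjects are arbitrary choices made for this illustration.

```python
import numpy as np

def simulate(n, rng):
    """Generate one sample from simulation design (a); columns: y, t, delta, delta*r, s."""
    rows = []
    for _ in range(n):
        # Random coefficients: mean (25, 0, -4, -2), covariance diag(6.25, 1, 1, 1).
        a0, a1, b0, b1 = rng.normal([25.0, 0.0, -4.0, -2.0], [2.5, 1.0, 1.0, 1.0])
        s = rng.normal(10.0 - 0.3 * a0, 0.4)            # variance 0.16
        t = np.concatenate([[0.0], 0.2 * np.arange(1, 30) + rng.uniform(-0.2, 0.2, 29)])
        t = t[rng.random(30) >= 0.4]                     # each visit skipped with prob. 0.4
        delta = (t >= s).astype(float)
        r = t - s
        y = a0 + a1 * t + b0 * delta + b1 * delta * r + rng.normal(0.0, 2.0, t.size)
        rows.append(np.column_stack([y, t, delta, delta * r, np.full(t.size, s)]))
    return np.vstack(rows)

data = simulate(1000, np.random.default_rng(2008))
y, t, delta, dr, s = data.T
ones = np.ones_like(y)

# Naive mean structure of (5.1): intercept, T, delta, delta * R.
naive = np.linalg.lstsq(np.column_stack([ones, t, delta, dr]), y, rcond=None)[0]
# Mean structure of (5.2) with a constant slope: intercept, S, T, delta, delta * R.
adjusted = np.linalg.lstsq(np.column_stack([ones, s, t, delta, dr]), y, rcond=None)[0]

print("naive beta0:", naive[2], "adjusted beta0:", adjusted[3])
```

Under this design the conditional mean of the outcome given the visit time and the change-point time is linear in the adjusted regressors, so the adjusted coefficient on the intervention indicator should be close to −4, while the naive coefficient is expected to sit further from −4 because the omitted dependence of the intercept on the change-point time is partly absorbed by the intervention terms.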

Table 2.

Simulation results for (a) Si ~ N(10 − 0.3 a0i, 0.16)

| Model | Parameter (true value) | Estimate | SE | Root MSE | CP |
|---|---|---|---|---|---|
| (5.1) | α0 = 25 | 25.018 | 0.202 | 0.211 | 0.940 |
| (5.1) | α1 = 0 | −0.058 | 0.100 | 0.123 | 0.897 |
| (5.1) | β0 = −4 | −3.805 | 0.153 | 0.253 | 0.745 |
| (5.1) | β1 = −2 | −1.971 | 0.115 | 0.123 | 0.936 |
| (5.2) | γ00 = 31.488 | 31.480 | 0.350 | 0.359 | 0.942 |
| (5.2) | γ01 = −2.595 | −2.592 | 0.131 | 0.134 | 0.943 |
| (5.2) | α1 = 0 | 0.001 | 0.099 | 0.099 | 0.951 |
| (5.2) | β0 = −4 | −4.004 | 0.152 | 0.154 | 0.946 |
| (5.2) | β1 = −2 | −1.998 | 0.114 | 0.114 | 0.955 |

Estimate, SE and Root MSE denote the averages of the estimates, the averages of the standard errors and the square roots of the mean squared errors, and CP represents the estimated coverage probability of the 95% confidence intervals, computed from 2,000 simulated samples. Parameter estimates and SEs were obtained by restricted maximum likelihood with unstructured correlations.
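The four summary columns can be computed from the per-replication output as in the following minimal sketch. It assumes that the asymptotic 95% CI is the usual normal-approximation interval, estimate ± 1.96 × SE; the function name and argument layout are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def summarize(estimates, std_errors, truth):
    """Summarize one parameter over simulation replications.

    estimates, std_errors: one value per simulated sample.
    truth: the true parameter value used to generate the data.
    """
    n = len(estimates)
    avg_estimate = sum(estimates) / n
    avg_se = sum(std_errors) / n
    # Root mean-squared error around the true value (captures bias and variance).
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    # Fraction of replications whose normal-approximation CI covers the truth.
    covered = sum(
        1 for e, s in zip(estimates, std_errors) if abs(e - truth) <= Z_975 * s
    )
    return {"estimate": avg_estimate, "se": avg_se, "rmse": rmse, "cp": covered / n}
```

Because the root MSE combines bias and sampling variability, a root MSE well above the average SE, as for β0 under (5.1), points to bias; there the average estimate −3.805 differs from the true value −4 by 0.195.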

For samples with Si generated from (b), we approximated α0(s), s ∈ [0, 6], by a quadratic B-spline with 2 equally spaced interior knots (see Ch. 5.2 in [27]), and estimated (α0(s), α1, β0, β1) under (6.1) using (3.5) and (3.6) with the identity weight matrix Λi = Ini×ni. We computed the 95% bootstrap CIs for (α1, β0, β1) and the 95% pointwise bootstrap CIs for α0(s) at 60 equally spaced values of s ∈ [0, 6] using the percentile procedures with B = 300 bootstrap replications. For comparison, we also fitted the data to (5.1), which assumes that α0(s) ≡ α0, and estimated (α0, α1, β0, β1) using the same procedure of (3.5). Figure 1(a) shows the spline-estimated coefficient curve α0(s) and the 95% pointwise bootstrap CIs obtained from a randomly selected simulated sample, and Figure 1(b) displays the empirical coverage probabilities of the 95% pointwise bootstrap CIs, where the true α0(s) is calculated numerically. Table 3 presents the same set of summary statistics used in Table 2 for (α1, β0, β1) under both (5.1) and (6.1). Standard errors and CI coverage probabilities based on either the bootstrap procedure (Section 4.2) or the least squares and normal approximation procedure (Sections 3.3 and 4.1) are compared, and the two procedures perform similarly under each model. The large root mean-squared errors and poor coverage probabilities for the estimates obtained under (5.1) suggest that ignoring the association between a0i and Si may lead to erroneous conclusions in the present situation.
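To make the spline approximation concrete, the sketch below evaluates a quadratic B-spline basis on [0, 6] with interior knots at 2 and 4 using the Cox-de Boor recursion. With degree 2 and 2 interior knots there are 5 basis functions, so α0(s) is approximated by a linear combination of 5 terms. The function names are hypothetical, and the coefficients passed to `spline_value` stand in for whatever the least squares fit in (3.5) would return.

```python
def make_knots(lower, upper, n_interior, degree):
    """Knot vector with equally spaced interior knots and repeated boundary knots."""
    interior = [
        lower + (upper - lower) * k / (n_interior + 1)
        for k in range(1, n_interior + 1)
    ]
    return [lower] * (degree + 1) + interior + [upper] * (degree + 1)


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x."""
    n_intervals = len(knots) - 1
    # Index of the last non-empty knot interval; it is closed on the right
    # so that x equal to the upper boundary is still covered.
    last = max(i for i in range(n_intervals) if knots[i] < knots[i + 1])
    b = [
        1.0
        if (knots[i] <= x < knots[i + 1]) or (i == last and x == knots[i + 1])
        else 0.0
        for i in range(n_intervals)
    ]
    # Cox-de Boor recursion: build degree d from degree d - 1.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (
                (knots[i + d + 1] - x) / right_den * b[i + 1]
                if right_den > 0
                else 0.0
            )
            nxt.append(left + right)
        b = nxt
    return b


def spline_value(x, coefficients, knots, degree):
    """Value of the spline sum_k coefficients[k] * B_k(x)."""
    basis = bspline_basis(x, knots, degree)
    return sum(c * v for c, v in zip(coefficients, basis))
```

For example, `make_knots(0.0, 6.0, 2, 2)` returns the knot vector `[0, 0, 0, 2, 4, 6, 6, 6]`, and the five basis values at any s in [0, 6] sum to one.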

Figure 1.

(a) True curve α0(s) (solid line), spline-estimated curve α̃0(s) (dashed line) and pointwise 95% bootstrap percentile confidence intervals (dotted lines) obtained from a randomly selected simulated sample. (b) Empirical coverage probability of pointwise 95% bootstrap confidence intervals for α0(s) (solid line) and their sample mean (dashed line).

Table 3.

Simulation results for (b) Si ~ N(1 + 4 sin[(a0i − 4)/9], 0.09)

| Model | Parameter (true value) | Estimate | SE (SE*) | Root MSE | CP (CP*) |
|---|---|---|---|---|---|
| (5.1) | α1 = 0 | −0.862 | 0.150 (0.145) | 0.874 | 0 (0) |
| (5.1) | β0 = −4 | −2.404 | 0.340 (0.329) | 1.632 | 0.004 (0.007) |
| (5.1) | β1 = −2 | −0.237 | 0.372 (0.390) | 1.804 | 0.005 (0.012) |
| (6.1) | α1 = 0 | −0.001 | 0.097 (0.101) | 0.095 | 0.951 (0.956) |
| (6.1) | β0 = −4 | −3.999 | 0.264 (0.255) | 0.267 | 0.942 (0.940) |
| (6.1) | β1 = −2 | −1.991 | 0.258 (0.247) | 0.259 | 0.944 (0.921) |

Estimate, SE and Root MSE denote the averages of the estimates, the averages of the bootstrap standard errors and the square roots of the mean squared errors, and CP represents the estimated coverage probability of the 95% bootstrap confidence intervals, computed from 2,000 simulated samples. SE* and CP* denote the standard errors and coverage probabilities of the CIs obtained by the procedures in Sections 3.3 and 4.1. Parameter estimates were obtained using (3.5) with Λi = Ini×ni.
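The bootstrap standard errors and percentile CIs reported in Table 3 can be illustrated with the generic sketch below. It assumes that whole subjects are resampled with replacement, which preserves the within-subject correlation; the procedure of Section 4.2 may differ in its details. The quantile rule, the function names and the toy estimator `pooled_mean` are hypothetical; in practice the estimator would be the spline least squares fit.

```python
import math
import random
import statistics


def percentile_bootstrap(subjects, estimator, n_boot=300, level=0.95, seed=0):
    """Percentile bootstrap CI and bootstrap SE for a scalar estimator.

    subjects: list with one entry per subject (for example, that subject's rows).
    estimator: function mapping a list of subjects to a number.
    """
    rng = random.Random(seed)
    n = len(subjects)
    stats = []
    for _ in range(n_boot):
        # Resample subjects, not single observations (assumed scheme).
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    alpha = 1.0 - level
    # Simple order-statistic rule for the empirical quantiles.
    lower = stats[int(math.floor(alpha / 2 * (n_boot - 1)))]
    upper = stats[int(math.ceil((1 - alpha / 2) * (n_boot - 1)))]
    return lower, upper, statistics.stdev(stats)


def pooled_mean(subjects):
    """Toy estimator: mean of all observations across the given subjects."""
    values = [y for subject in subjects for y in subject]
    return sum(values) / len(values)
```

A call such as `percentile_bootstrap(data, pooled_mean, n_boot=300)` mirrors the B = 300 replications used in the simulation. For a curve such as α0(s), the same routine would be applied separately at each grid value of s to obtain pointwise intervals.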

7 Discussion

Our proposed methodology focuses on concomitant interventions in longitudinal clinical trials, but such interventions also commonly appear in other settings. For example, subjects in an epidemiological study may take antihypertensive medication during the study when their blood pressure levels either exhibit some undesirable trends or stay in an intolerable range. A crucial step in dealing with this type of data is to model the intervention selection mechanism as realistically as possible. In our pharmacotherapy example of the ENRICHD trial, there was only a vague guideline for the initiation of pharmacotherapy, so it appeared reasonable to model the intervention selection mechanism through shared parameters. We focus on the varying-coefficient model mainly because it has a simple and clear biological interpretation for this example, its assumptions seem realistic for this type of trial, and the nonparametric B-spline method can be easily implemented.

Several possible extensions are worthy of further investigation. First, our data structure allows for only a single intervention with one change-point per subject. In general, subjects in longitudinal studies may have single or multiple concomitant interventions, which can be turned on or off at different time points. In such situations, more general shared-parameter models may be needed to accommodate the possibility of multiple interventions and/or multiple change-points. Second, our model relies on linear functions to describe the trends before and after the intervention. It can be generalized to models with nonlinear response curves. Finally, we use the classical frequentist framework for the B-spline methods. In a different context, Fahrmeir and Lang [28] demonstrated a promising Bayesian inference procedure for generalized additive mixed models based on Markov random field priors. Analogous approaches for our model and estimators may lead to useful confidence regions and model diagnostic procedures.

Acknowledgement

Financial support for the ENRICHD study was provided by the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland. Pfizer Inc. provided sertraline (Zoloft) for the ENRICHD study. We thank the participants and the investigators of the ENRICHD study. We also thank two referees and the associate editor for their thoughtful suggestions and comments, which greatly improved our presentation.

Contributor Information

Colin O. Wu, Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892.

Xin Tian, Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892.

Heejung Bang, Division of Biostatistics and Epidemiology, Department of Public Health, Weill Medical College of Cornell University, NY 10021.

References

  • 1. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  • 2. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. London; New York: Chapman & Hall; 1995.
  • 3. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
  • 4. Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford: Oxford University Press; 2002.
  • 5. Fan J, Zhang JT. Functional linear models for longitudinal data. Journal of the Royal Statistical Society. Ser. B. 2000;62:303–322.
  • 6. Lin X, Carroll RJ. Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association. 2000;95:520–534.
  • 7. Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001;96:1045–1056.
  • 8. Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89:111–128.
  • 9. Huang JZ, Wu CO, Zhou L. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica. 2004;14:763–788.
  • 10. The ENRICHD Investigators. Enhancing recovery in coronary heart disease patients (ENRICHD): Study intervention rationale and design. Psychosomatic Medicine. 2001;63:747–755.
  • 11. The ENRICHD Investigators. Enhancing recovery in coronary heart disease patients (ENRICHD): The effects of treating depression and low perceived social support on clinical events after myocardial infarction. Journal of the American Medical Association. 2003;289:3106–3116. doi: 10.1001/jama.289.23.3106.
  • 12. Taylor CB, Youngblood ME, Catellier D, Veith RC, Carney RM, Burg MM, Kaufmann P, Shuster J, Mellman T, Blumenthal JA, Krishnan R, Jaffe AS. Effects of antidepressant medication on morbidity and mortality in depressed patients after myocardial infarction. Archives of General Psychiatry. 2005;62:792–798. doi: 10.1001/archpsyc.62.7.792.
  • 13. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–972. doi: 10.1111/j.1541-0420.2005.00377.x.
  • 14. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96:1410–1423. doi: 10.1198/016214501753382327.
  • 15. Wall MM, Dai Y, Eberly LE. GEE estimation of a misspecified time-varying covariate: an example with the effect of alcoholism treatment on medical utilization. Statistics in Medicine. 2005;24:925–939. doi: 10.1002/sim.1966.
  • 16. Wu MC, Carroll R. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
  • 17. Follmann D, Wu M. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168.
  • 18. Naumova EN, Must A, Laird NM. Tutorial in Biostatistics: Evaluating the impact of critical periods in longitudinal studies of growth using piecewise mixed effects models. International Journal of Epidemiology. 2001;30:1332–1341. doi: 10.1093/ije/30.6.1332.
  • 19. Bang H, Mazumdar M, Spence JD. Tutorial in Biostatistics: Analyzing associations between total plasma Homocysteine and B vitamins using optimal categorization and segmented regression. Neuroepidemiology. 2006;27:188–200. doi: 10.1159/000096149.
  • 20. Rice JA, Wu CO. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics. 2001;57:253–259. doi: 10.1111/j.0006-341x.2001.00253.x.
  • 21. Lin X, Zhang D. Inference in generalized additive mixed models by using smoothing splines. Journal of the Royal Statistical Society. Ser. B. 1999;61:381–400.
  • 22. Chiang CT, Rice JA, Wu CO. Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. Journal of the American Statistical Association. 2001;96:605–619.
  • 23. Welsh AH, Lin X, Carroll RJ. Marginal longitudinal nonparametric regression: Locality and efficiency of spline and kernel methods. Journal of the American Statistical Association. 2002;97:482–493.
  • 24. Diggle PJ, Verbyla AP. Nonparametric estimation of covariance structure in longitudinal data. Biometrics. 1998;54:401–415.
  • 25. Serfling RJ. Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons; 1980.
  • 26. Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
  • 27. Hastie TJ, Tibshirani RJ, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2001.
  • 28. Fahrmeir L, Lang S. Bayesian inference for generalized additive mixed models based on Markov random field priors. Applied Statistics. 2001;50:201–220.
