Regression modeling of longitudinal data with outcome-dependent observation times: extensions and comparative evaluation

Kay See Tan; Benjamin French; Andrea B Troxel

doi:10.1002/sim.6262

Author manuscript; available in PMC: 2024 Mar 19.

Published in final edited form as: Stat Med. 2014 Jul 23;33(27):4770–4789. doi: 10.1002/sim.6262


PMCID: PMC10949856 NIHMSID: NIHMS1971342 PMID: 25052289

Abstract

Conventional longitudinal data analysis methods assume that outcomes are independent of the data-collection schedule. However, the independence assumption may be violated, for example, when a specific treatment necessitates a different follow-up schedule than the control arm or when adverse events trigger additional physician visits in between prescheduled follow-ups. Dependence between outcomes and observation times may introduce bias when estimating the marginal association of covariates on outcomes using a standard longitudinal regression model. We formulate a framework of outcome-observation dependence mechanisms to describe conditional independence given observed observation-time process covariates or shared latent variables. We compare four recently developed semi-parametric methods that accommodate one of these mechanisms. To allow greater flexibility, we extend these methods to accommodate a combination of mechanisms. In simulation studies, we show how incorrectly specifying the outcome-observation dependence may yield biased estimates of covariate-outcome associations and how our proposed extensions can accommodate a greater number of dependence mechanisms. We illustrate the implications of different modeling strategies in an application to bladder cancer data. In longitudinal studies with potentially outcome-dependent observation times, we recommend that analysts carefully explore the conditional independence mechanism between the outcome and observation-time processes to ensure valid inference regarding covariate-outcome associations.

Keywords: joint models, observation-time process, outcome process, outcome-dependent follow-up, semi-parametric regression, informative observation times

1. Introduction

Longitudinal studies commonly assume that the data-collection schedule is independent of a subject’s outcomes and measured or unmeasured characteristics. However, this independence assumption may be violated if observed covariate or outcome values influence the occurrence or timing of subsequent visits. For example, in a study following patients with diabetes, routine visits are scheduled every 6 months. However, spikes in blood sugar levels, exacerbation of other symptoms, or underlying patient characteristics may trigger additional closely spaced physician visits until the blood sugar level has stabilized. The intensity of events such as physician visits is dependent on previous outcomes and measured or unmeasured covariates. Less healthy patients may be over-represented in the analysis because of more frequent data collection. In the presence of the resultant selection bias, conventional methods such as generalized estimating equations (GEE) [1] may yield biased estimates of covariate-outcome associations [2,3]. Proper estimation must account for such selection bias. We focus on a marginal mean regression model to evaluate the association between observed covariates and a continuous outcome of interest. We denote the longitudinal outcomes as the outcome process and the occurrence of visits over time as the observation-time process.

Several authors have proposed parametric models to account for the potential dependence between the outcome and observation-time processes. Lipsitz et al. [4] developed a likelihood-based procedure for continuous outcomes, Fitzmaurice et al. [5] proposed a pseudo-likelihood estimation procedure for binary outcomes, and Lin et al. [6] and Bůžková and Lumley [7] utilized inverse intensity-weighted estimators with observation-level inverse weights. Others focused on estimation procedures based on joint likelihood approaches: Ryu et al. [8] developed a Bayesian fully parametric regression model; Liu et al. [9] considered a joint mixed-effects model in which the outcomes, observation times, and censoring times were correlated through latent variables. The study of outcome-dependent observation times shares features of research regarding incomplete [10,11] and recurrent marked point process data [12] but differs in that subjects do not share a common set of visit times, and outcomes (e.g., blood sugar level) exist even if an event (e.g., a physician visit) does not occur.

We introduce a framework of three outcome-observation dependence mechanisms. The first mechanism applies when the outcome and the observation-time processes are conditionally independent given outcome-model covariates. The second mechanism applies when the processes are conditionally independent given observation-time model covariates, which may include outcome-model covariates and previous outcomes. The third applies when the processes are conditionally independent given shared, unobserved, latent variables. We consider four semi-parametric marginal regression methods that do not require estimation of the mean effect of time on the outcomes: the Lin method [13] accommodates the first mechanism, the Bůžková method [14] accommodates the second mechanism, and the Liang [15] and Sun [16] methods accommodate the third mechanism. We extend both the Liang and the Sun methods to accommodate a combination of the three mechanisms, thereby increasing the flexibility of the models.

In this article, we compare currently available and newly extended methods that accommodate outcome-dependent observation times. Our goal is to provide much-needed clarification of the strengths and limitations of each estimation method under alternative outcome-observation dependence mechanisms. In Section 2, we elaborate on our framework of outcome-observation dependence mechanisms. We review existing methods under each of these mechanisms (Section 2.2) and detail our extensions to both the Liang and Sun methods to accommodate conditional independence through observation-time model covariates, and our extension to the Liang method to accommodate time-dependent covariates in the observation-time model (Section 2.3). We present simulation studies to evaluate the performance of the reviewed methods under alternative outcome-observation dependence mechanisms in Section 3 and illustrate their application to a bladder cancer study in Section 4. Section 5 provides guidance on the selection of estimation methods.

2. Estimation methods

Let $Y_{i} (t)$ denote a continuous outcome of interest at time $t$ and $X_{i} (t)$ denote a $p \times 1$ vector of possibly time-dependent covariates for subject $i = 1, \dots, n$. We only consider external covariates, such that any time-dependent covariate process at time $t$ is conditionally independent of all previous outcomes given the history of the covariate process [17]. The outcome $Y_{i} (\cdot)$ is measured at $m_{i}$ observation times $0 ⩽ T_{i 1} < T_{i 2} < \dots < T_{i m_{i}} ⩽ τ$, for which $m_{i}$ denotes the number of follow-up measurements on the $i^{th}$ individual, and $τ$ denotes the maximum study duration. Using counting process notation, let $N_{i} (t) = \sum_{s ⩽ t} d N_{i} (s)$ denote the number of observations on the $i^{th}$ subject by $t ⩽ C_{i}$. The censoring time $C_{i} ⩽ τ$ is the time of last visit or an administrative end-of-study time. The indicator variable $d N_{i} (t)$ is 1 if a follow-up visit occurred at $t$ and 0 otherwise. We assume non-informative censoring, such that $E [Y_{i} (t) ∣ X_{i} (t), C_{i} ⩾ t] = E [Y_{i} (t) ∣ X_{i} (t)]$. That is, the covariate-outcome associations are the same in subjects who are censored at $C_{i}$ as those who are still in the study.

2.1. Models and assumptions

2.1.1. Semi-parametric outcome model.

We assume that primary scientific interest lies in a semi-parametric regression model for the longitudinal continuous outcomes [13]:

Y_{i} (t) = μ (t) + β^{'} X_{i} (t) + ϵ_{i} (t),

(1)

for which $μ (t)$ is an arbitrary function of time, $β$ is a $p \times 1$ vector of regression parameters of interest, and $ϵ_{i} (t)$ is a zero-mean process independent of $X_{i} (t)$. Model (1) specifies a parametric structure for the effect of $X_{i} (t)$ and a non-parametric structure for $μ (t)$ [3,18,19]. A semi-parametric model is appealing if the effect of $X_{i} (t)$ is of primary interest and the effect of time is considered a nuisance. Model (1) does not condition on the entire covariate process or on past outcomes. Instead, it includes covariate information available at $t$, such as baseline covariates, covariates measured at or before $t$, and summaries of the covariate history; that is, Model (1) is a partly conditional mean regression model [20].

2.1.2. Observation-time model.

We use a standard recurrent events model to describe the observation-time process. Given observation-time model covariates $Z_{i} (t)$ and a non-negative latent variable $η_{i}$ with mean 1 and unknown variance $σ^{2}$ , $N_{i}$ is a non-homogeneous Poisson process with intensity function [21]:

λ_{i} (t) = η_{i} \exp \{γ^{'} Z_{i} (t)\} λ (t),

(2)

in which $γ$ is a vector of unknown parameters, $λ (t)$ is an unspecified baseline intensity function with $Λ (t) = \int_{0}^{t} λ (u) d u$ , and $X_{i} (t) \subseteq Z_{i} (t)$ . Unless otherwise specified, we assume that $η_{i}$ is independent of $Z_{i} (t)$ . Model (2) implies that the occurrence of observations follows a proportional intensity model, in which $η_{i}$ inflates or deflates the visit intensity. The parameter $γ$ from the observation-time model is considered a nuisance. However, incorporating the observation-time process into the estimation of $β$ facilitates prediction from longitudinal data under similar outcome-observation dependence mechanisms, which we detail in the next section.

2.1.3. A framework of outcome-observation dependence mechanisms.

We distinguish three mechanisms that describe the dependence between the outcome and observation-time processes:

(M1) Conditional independence given past outcome-model covariates.

(M2) Conditional independence given past observation-time model covariates.

(M3) Conditional independence given shared latent variables.

Throughout the paper, ‘conditional independence given covariates’ implies conditional independence given past observed covariates. Recall that $X_{i} (t)$ incorporates covariate information available at $t$ , which may include baseline covariates, covariates measured at or before $t$ , and summaries of the covariate history.

(M1) Conditional independence given past outcome-model covariates

The first mechanism applies when the outcome process is conditionally independent of the observation-time process given observed outcome-model covariates $X_{i} (t)$ , or a subset of $X_{i} (t)$ :

E [d N_{i} (t) ∣ X_{i} (t), Y_{i} (t), C_{i} ⩾ t] = E [d N_{i} (t) ∣ X_{i} (t)] .

The probability of observation at time $t$ depends on $X_{i} (t)$ , $Y_{i} (t)$ , and $C_{i}$ only through observed outcome-model covariates $X_{i} (t)$ . (M1) is plausible when the occurrence of a visit is due to the features of the study design instead of subject-specific behaviors.

(M2) Conditional independence given past observation-time model covariates

The second mechanism applies when the probability of an observation at $t$ depends on observation-time model covariates $Z_{i} (t)$, in which all or a subset of $X_{i} (t)$ is contained in $Z_{i} (t)$:

E [d N_{i} (t) ∣ Z_{i} (t), Y_{i} (t), C_{i} ⩾ t] = E [d N_{i} (t) ∣ Z_{i} (t)] .

The set of observation-time model covariates $Z_{i} (t)$ can include the outcome-model covariates, any additional measured covariates at or before $t$ , and summaries of previous outcomes. Note that (M1) ⊂ (M2) because $X_{i} (t) \subseteq Z_{i} (t)$ .

(M3) Conditional independence given shared latent variables

The third mechanism applies when the outcome process is conditionally independent of the observation-time process given observed outcome-model covariates $X_{i} (t)$ and an unobserved subject-specific latent variable $η_{i}$ with mean 1:

E [d N_{i} (t) ∣ X_{i} (t), Y_{i} (t), η_{i}, C_{i} ⩾ t] = E [d N_{i} (t) ∣ X_{i} (t), η_{i}] .

The latent variable $η_{i}$ captures subject-specific unmeasured confounders and the subject's propensity for an observation. Note that (M1) ⊂ (M3) in situations to be described in Section 2.2.3.

Our framework for outcome-observation dependence in the analysis of longitudinal data provides guidance for the selection of reliable methods. (M2) and (M3) place fewer restrictions on the probability of having a visit than (M1) and are reasonable assumptions in most observational studies. However, (M2) and (M3) are more demanding in practice because fewer analysis methods provide valid inference under them, which we detail in the following section.

2.2. Existing methods

In this section, we describe four existing methods to estimate covariate-outcome associations in the presence of outcome-observation dependence. All of the methods require estimation of an observation-time model. If the observation-time process is conditionally independent of the censoring times, the parameter $γ$ can be consistently estimated by $\hat{γ}$ from the procedure in Lin et al. [21]:

U (γ) = \sum_{i = 1}^{n} \int_{0}^{τ} \{Z_{i} (t) - \overline{Z} (t; γ)\} d N_{i} (t),

(3)

for which $ξ_{i} (t) = I (C_{i} > t)$ and:

\overline{Z} (t; γ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} Z_{i} (t)\} Z_{i} (t)}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} Z_{i} (t)\}} .
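As a minimal illustration of how the estimating equation (3) might be solved, the following Python sketch treats the special case of a single time-independent covariate. The function names (`z_bar`, `score`, `solve_gamma`), the bisection solver, and the search bracket are assumptions introduced here for illustration; they are not taken from Lin et al. [21]. Bisection is used because, for a scalar covariate, $U (γ)$ is non-increasing in $γ$.

```python
import math

def z_bar(t, gamma, z, censor):
    """Risk-set weighted covariate mean at time t, with xi_i(t) = I(C_i > t)."""
    num = 0.0
    den = 0.0
    for z_i, c_i in zip(z, censor):
        if c_i > t:  # subject is still under observation at t
            w = math.exp(gamma * z_i)
            num += w * z_i
            den += w
    # den is positive whenever every visit occurs strictly before the
    # visiting subject's own censoring time (assumed here).
    return num / den

def score(gamma, z, censor, visits):
    """U(gamma): sum over all observed visits of Z_i - Z_bar(T_ik; gamma)."""
    return sum(z_i - z_bar(t, gamma, z, censor)
               for z_i, times in zip(z, visits)
               for t in times)

def solve_gamma(z, censor, visits, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on U(gamma) = 0; the bracket [lo, hi] is an assumption."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, z, censor, visits) > 0:
            lo = mid  # U is non-increasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With vector-valued or time-dependent covariates, a multivariate Newton-type solver would be needed in place of bisection.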

2.2.1. Method under (M1).

Lin and Ying [13] assume that the observation-time process is conditionally independent of the outcome process given the outcome-model covariates, as in (M1). The Lin method specifies a marginal semi-parametric outcome model $E [Y_{i} (t) ∣ X_{i} (t)] = μ (t) + β^{'} X_{i} (t)$ and a proportional rate observation-time model $E [d N_{i} (t) ∣ V_{i} (t)] = \exp \{γ^{'} V_{i} (t)\} d Λ (t)$, in which $V_{i} (t)$ is a subset of $X_{i} (t)$. We define a zero-mean stochastic process [13]:

M_{i} (t; 𝒜, β, γ) = \int_{0}^{t} \{Y_{i} (s) - β^{'} X_{i} (s)\} d N_{i} (s) - \int_{0}^{t} \exp \{γ^{'} V_{i} (s)\} ξ_{i} (s) d 𝒜 (s),

(4)

in which $𝒜 (t) = \int_{0}^{t} μ (s) d Λ (s)$ . Based on (4), one set of estimating equations to solve for $μ (t)$ and $β$ is:

\sum_{i = 1}^{n} M_{i} (t; β, γ) = 0, 0 < t ⩽ τ

(5)

\sum_{i = 1}^{n} \int_{0}^{τ} W (t) X_{i} (t) d M_{i} (t; β, γ) = 0.

(6)

The common weight $W (t)$ can improve efficiency and may be data-dependent, such as the proportion of subjects left in the study, that is, $n^{- 1} \sum_{i = 1}^{n} ξ_{i} (t)$. The closed-form expression of $𝒜 (t)$ in (5) yields $\tilde{𝒜} (t; β) = \sum_{i = 1}^{n} \int_{0}^{t} \frac{\{Y_{i} (s) - β^{'} X_{i} (s)\} d N_{i} (s)}{\sum_{j = 1}^{n} ξ_{j} (s) \exp \{γ^{'} V_{j} (s)\}}$, which replaces $𝒜 (t)$ in (6) to form the estimating equation:

\sum_{i = 1}^{n} \int_{0}^{τ} W (t) \{X_{i} (t) - \overline{X} (t; γ)\} \{Y_{i} (t) - β^{'} X_{i} (t)\} d N_{i} (t) = 0.

(7)

The centering term is defined as:

\overline{X} (t; γ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} V_{i} (t)\} X_{i} (t)}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} V_{i} (t)\}} .

We note that:

E [\sum_{i = 1}^{n} \int_{0}^{τ} \{X_{i} (t) - \overline{X} (t; γ)\} g (t) d N_{i} (t) ∣ \{X_{i} (t), C_{i}; i = 1, \dots, n\}] = \sum_{i = 1}^{n} \int_{0}^{τ} \{X_{i} (t) - \overline{X} (t; γ)\} g (t) ξ_{i} (t) \exp \{γ^{'} V_{i} (t)\} d Λ (t) = 0

for any function $g (\cdot)$ , so we extend the left side of (7) to obtain the class of estimating functions for $β$ :

U_{g} (β; γ) = \sum_{i = 1}^{n} \int_{0}^{τ} W (t) [X_{i} (t) - \overline{X} (t; γ)] \{Y_{i} (t) - β^{'} X_{i} (t) - g (t; γ)\} d N_{i} (t) .

One optimal choice of $g (\cdot)$ that minimizes the variance of $U_{g} (β, γ)$ is $g (t; γ) = {\overline{Y}}^{*} (t; γ) - β^{'} \overline{X} (t; γ)$ , in which:

{\overline{Y}}^{*} (t; γ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} V_{i} (t)\} Y_{i}^{*} (t)}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} V_{i} (t)\}},

and $Y_{i}^{*} (t)$ is the measurement of $Y_{i}$ at the observation nearest to $t$ . Hence, $β$ can be consistently estimated from the estimating equation [13]:

U (β; γ) = \sum_{i = 1}^{n} \int_{0}^{τ} W (t) [X_{i} (t) - \overline{X} (t; γ)] \{Y_{i} (t) - {\overline{Y}}^{*} (t; γ) - β^{'} [X_{i} (t) - \overline{X} (t; γ)]\} d N_{i} (t) = 0,

in which $γ$ is estimated by (3) conditioning on the covariates $V_{i} (t)$ . The inclusion of the centering term for covariates accounts for the probability of being observed at $t$ and removes the need for estimation of $μ (t)$ . The centering of the outcome increases the efficiency of the estimation procedure. Note that $Y_{i}^{*} (t)$ is the nearest-neighbor approximation of $Y_{i} (t)$ if the true measurement is not evaluable or collected at $t$ . Li and Ryan [22] documented the potential issue of such mismeasured covariates. Discussion of other forms of $g (\cdot)$ and $Y_{i}^{*} (t)$ can be found in the comments and rejoinder section of Lin and Ying [13].

2.2.2. Method under (M1) and (M2).

Bůžková and Lumley [14] relax the assumption of (M1) by addressing the dependence between the outcome and observation-time processes through observation-time model covariates. The set of covariates $Z_{i} (t)$ may include the outcome-model covariates $X_{i} (t)$ and past outcomes.

The Bůžková method uses inverse intensity rate ratio (IIRR)-weighted estimators to estimate $β$ in the outcome model $E [Y_{i} (t) ∣ X_{i} (t)] = μ (t) + β^{'} X_{i} (t)$. The observation-level inverse weights standardize the observed data to the time-specific underlying population under the proportional rate model for observation times $E [d N_{i} (t) ∣ Z_{i} (t)] = \exp \{γ^{'} Z_{i} (t)\} d Λ (t)$. Inverse weighting has also been shown to reduce bias when cluster size is informative (i.e., the outcomes measured among clustered units are not independent of cluster size) [23, 24] and when missing data are missing at random (i.e., missingness depends only on observed covariates and outcomes) [25, 26]. One particular weight with variance-stabilizing properties is:

ρ_{i} (t; γ, δ) = \frac{\exp \{γ^{'} Z_{i} (t)\}}{\exp \{δ^{'} X_{i} (t)\}},

for which $δ$ is estimated by $\hat{δ}$ using (3) conditioning on $X_{i} (t)$ instead of $Z_{i} (t)$ . The proposed estimating equation for $β$ is:

U (β; \hat{γ}, δ) = \sum_{i = 1}^{n} \int_{0}^{τ} \frac{W (t)}{ρ_{i} (t; \hat{γ}, δ)} [X_{i} (t) - \overline{X} (t; δ)] \{Y_{i} (t) - {\overline{Y}}^{*} (t; δ) - β^{'} [X_{i} (t) - \overline{X} (t; δ)]\} d N_{i} (t) = 0,

in which:

\overline{X} (t; δ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} X_{i} (t)}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\}},

and:

{\overline{Y}}^{*} (t; δ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} Y_{i}^{*} (t)}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\}} .

If $Z_{i} (t) = X_{i} (t)$ , then $ρ_{i} (t; γ, δ) = 1$ , and the Bůžková method reduces to the Lin method (Section 2.2.1). The IIRR-weighted estimates are asymptotically consistent and normal, but the validity of the proposed IIRR-weighted estimator is contingent upon correct specification of $Z_{i} (t)$ in the observation-time model [14].
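The role of the IIRR weight can be illustrated with a short Python sketch. It assumes that the fitted coefficient vectors for the two intensity models are already available; the function name `iirr_weight` and the numeric values in the usage note are hypothetical and chosen only for illustration.

```python
import math

def iirr_weight(gamma, z, delta, x):
    """rho_i(t) = exp(gamma'Z_i(t)) / exp(delta'X_i(t)) for one observation.

    gamma, z: coefficients and covariate values of the observation-time model.
    delta, x: coefficients and covariate values of the reduced model that
              conditions on the outcome-model covariates only.
    """
    log_rho = (sum(g * zk for g, zk in zip(gamma, z))
               - sum(d * xk for d, xk in zip(delta, x)))
    return math.exp(log_rho)

def inverse_weight(gamma, z, delta, x, w=1.0):
    """Factor W(t) / rho_i(t) applied to an observation's contribution."""
    return w / iirr_weight(gamma, z, delta, x)
```

For instance, an observation with `z = [1.0, 2.0]`, `gamma = [0.5, -0.2]`, `x = [1.0]`, and `delta = [0.5]` has a log-weight of -0.4, so its contribution is up-weighted by the factor exp(0.4): visits made less likely by the extra covariate count for more. When the two models coincide, the weight equals 1 and the contribution is unchanged.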

2.2.3. Methods under (M1) and (M3).

The following two methods accommodate subject-specific observation-time processes with arbitrary visit patterns through the use of latent variables. The Liang method [15] specifies a semi-parametric mixed-effects outcome model:

E [Y_{i} (t) ∣ X_{i} (t), Q_{i} (t)] = μ (t) + β^{'} X_{i} (t) + η_{i 1}^{'} Q_{i} (t),

(8)

in which $η_{i 1}$ is a vector of unobserved subject-specific latent variables and $Q_{i} (t)$ is a subset of $X_{i} (t)$. The observation-time process is modeled as $λ_{i} (t) = η_{i 2} λ (t) \exp \{γ^{'} V_{i}\}$, and $V_{i}$ is a subset of baseline covariates in $X_{i} (t)$. The Gamma-distributed latent variable $η_{i 2}$ is independent of $V_{i}$, $E [η_{i 2}] = 1$, and $\mathrm{Var} [η_{i 2}] = σ^{2}$ is unknown. The relationship between $η_{i 1}$ and $η_{i 2}$ is defined by the conditional expectation $E [η_{i 1} | η_{i 2}] = θ (η_{i 2} - 1)$, so $θ$ describes the magnitude and direction of the association between the outcome and observation-time processes. Note that the marginal expectation of $η_{i 1}$ is 0. The linear link between $η_{i 1}$ and $η_{i 2}$ can also be extended to other specified link functions [15]. When $η_{i 1} = 0$, the Liang method reduces to the Lin method (Section 2.2.1).

Conditioning on $η_{i 2}$, the observation-time process is a non-homogeneous Poisson process such that $m_{i}$ has a Poisson distribution with mean $η_{i 2} \exp (γ^{'} V_{i}) Λ (C_{i})$. The cumulative baseline intensity function $Λ (t)$ can be consistently estimated by the Aalen-Breslow-type estimator $\hat{Λ} (t) = \hat{Λ} (t, \hat{γ})$:

\hat{Λ} (t, \hat{γ}) = \sum_{i = 1}^{n} \int_{0}^{t} \frac{d N_{i} (s)}{\sum_{j = 1}^{n} ξ_{j} (s) \exp ({\hat{γ}}^{'} V_{j})},

for which $γ$ is estimated by (3) conditioning on the baseline covariates $V_{i}$ . Given $(C_{i}, m_{i}, η_{i 2})$ , the observation times $(T_{i 1}, T_{i 2}, \dots, T_{i m_{i}})$ are the order statistics of a set of independently and identically distributed random variables with the density function:

\prod_{i = 1}^{n} p (t_{i 1}, t_{i 2}, \dots, t_{i m_{i}} ∣ C_{i}, m_{i}, η_{i 2}) = \prod_{i = 1}^{n} \{m_{i}! \prod_{j = 1}^{m_{i}} \frac{d Λ (t_{i j})}{Λ (C_{i})}\} .

Hence, $E \{d N_{i} (t) ∣ C_{i}, m_{i}, η_{i 2}\} = ξ_{i} (t) m_{i} \frac{d Λ (t)}{Λ (C_{i})}$ . It follows that:

E [\{Y_{i} (t) - β^{'} X_{i} (t)\} d N_{i} (t) ∣ C_{i}, m_{i}] = E (E [\{μ (t) + η_{i 1}^{'} Q_{i} (t) + ϵ_{i} (t)\} d N_{i} (t) ∣ C_{i}, m_{i}, η_{i 2}] ∣ C_{i}, m_{i}) = μ (t) ξ_{i} (t) m_{i} \frac{d Λ (t)}{Λ (C_{i})} + θ^{'} Q_{i} (t) E \{(η_{i 2} - 1) ∣ C_{i}, m_{i}\} E \{d N_{i} (t) ∣ C_{i}, m_{i}\} .

(9)

We define $B_{i} (t) = Q_{i} (t) E [(η_{i 2} - 1) ∣ C_{i}, m_{i}]$ as a covariate based on the subject-specific propensity of visit, and $𝒜 (t) = \int_{0}^{t} μ (s) d Λ (s)$ . Then (9) can be expressed as:

E [\{Y_{i} (t) - β^{'} X_{i} (t) - θ^{'} B_{i} (t)\} d N_{i} (t) ∣ C_{i}, m_{i}] = ξ_{i} (t) \frac{m_{i}}{Λ (C_{i})} d 𝒜 (t) .

We can then formulate the zero-mean process:

M_{i 2} (t; 𝒜, β, θ, γ) = \int_{0}^{t} \{Y_{i} (s) - β^{'} X_{i} (s) - θ^{'} B_{i} (s)\} d N_{i} (s) - \int_{0}^{t} ξ_{i} (s) \frac{m_{i}}{Λ (C_{i})} d 𝒜 (s),

(10)

and define the set of estimating equations based on (10) to estimate $μ (t)$ , $β$ , and $θ$ simultaneously:

\sum_{i = 1}^{n} M_{i 2} (t; β, θ, γ) = 0, 0 < t ⩽ τ

(11)

\sum_{i = 1}^{n} \int_{0}^{τ} (\begin{matrix} X_{i} (t) \\ {\hat{B}}_{i} (t) \end{matrix}) d M_{i 2} (t; β, θ, γ) = 0.

(12)

The closed-form expression for $𝒜 (t)$ in (11) replaces $𝒜 (t)$ in (12), so $β$ and $θ$ can be consistently estimated using the class of estimating equations [15]:

U (β, θ; \hat{Λ}, \hat{B}) = \sum_{i = 1}^{n} \int_{0}^{τ} (\begin{array}{l} X_{i} (t) - \overline{X} (t) \\ {\hat{B}}_{i} (t) - \bar{\hat{B}} (t) \end{array}) \{Y_{i} (t) - β^{'} X_{i} (t) - θ^{'} {\hat{B}}_{i} (t)\} d N_{i} (t) = 0,

for which:

\overline{X} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) X_{i} (t) m_{i} / \hat{Λ} (C_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) m_{i} / \hat{Λ} (C_{i})},

and:

\bar{\hat{B}} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) {\hat{B}}_{i} (t) m_{i} / \hat{Λ} (C_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) m_{i} / \hat{Λ} (C_{i})} .

To estimate $B_{i} (t)$ , the conditional expectation of $η_{i 2}$ given $(C_{i}, m_{i})$ is required. If we assume that $η_{i 2}$ is Gamma distributed with mean 1 and variance $σ^{2}$ , the expectation of $η_{i 2}$ can be expressed as:

E (η_{i 2} ∣ C_{i}, m_{i}) = \frac{1 + m_{i} σ^{2}}{1 + \exp (γ^{'} V_{i}) Λ (C_{i}) σ^{2}} .
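This expression can be recovered from Gamma-Poisson conjugacy. The following sketch uses the shorthand $a_{i} = \exp (γ^{'} V_{i}) Λ (C_{i})$, which is introduced here for brevity, and the shape-rate parameterization of the Gamma distribution; it also assumes that $η_{i 2}$ is independent of $C_{i}$. A Gamma variable with mean 1 and variance $σ^{2}$ has shape $σ^{- 2}$ and rate $σ^{- 2}$, and $m_{i}$ given $η_{i 2}$ is Poisson with mean $η_{i 2} a_{i}$, so that:

p (η_{i 2} ∣ C_{i}, m_{i}) \propto η_{i 2}^{σ^{- 2} - 1} \exp (- η_{i 2} σ^{- 2}) \times η_{i 2}^{m_{i}} \exp (- η_{i 2} a_{i}) = η_{i 2}^{σ^{- 2} + m_{i} - 1} \exp \{- η_{i 2} (σ^{- 2} + a_{i})\} .

The conditional distribution is therefore Gamma with shape $σ^{- 2} + m_{i}$ and rate $σ^{- 2} + a_{i}$, with mean:

\frac{σ^{- 2} + m_{i}}{σ^{- 2} + a_{i}} = \frac{1 + m_{i} σ^{2}}{1 + a_{i} σ^{2}} .

Subjects with more visits than expected ($m_{i} > a_{i}$) thus have a conditional mean above 1, and the adjustment shrinks toward 1 as $σ^{2}$ approaches 0.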

The covariate $B_{i} (t)$ can thus be estimated by:

{\hat{B}}_{i} (t) = (\frac{1 + m_{i} {\hat{σ}}^{2}}{1 + \exp ({\hat{γ}}^{'} V_{i}) \hat{Λ} (C_{i}) {\hat{σ}}^{2}} - 1) Q_{i} (t),

for which ${\hat{σ}}^{2}$ is a consistent estimator of $σ^{2}$ defined as:

{\hat{σ}}^{2} = \max \{\frac{\sum_{i = 1}^{n} \{m_{i}^{2} - m_{i} - \exp (2 {\hat{γ}}^{'} V_{i}) {\hat{Λ}}^{2} (C_{i})\}}{\sum_{i = 1}^{n} \exp (2 {\hat{γ}}^{'} V_{i}) {\hat{Λ}}^{2} (C_{i})}, 0\} .

(13)
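The plug-in quantities that feed ${\hat{B}}_{i} (t)$ can be sketched in Python as follows. This is a simplified illustration for a single baseline covariate with $\hat{γ}$ supplied by the caller (assumed to come from (3)); the function names and the shorthand `a_i` for $\exp ({\hat{γ}}^{'} V_{i}) \hat{Λ} (C_{i})$ are introduced here and are not part of the Liang method's published notation.

```python
import math

def breslow_cumulative_intensity(t, visits, censor, v, gamma_hat):
    """Lambda-hat(t): each visit at s <= t adds 1 / sum_j xi_j(s) exp(gamma_hat * V_j)."""
    total = 0.0
    for times in visits:
        for s in times:
            if s <= t:
                # Risk set uses xi_j(s) = I(C_j > s); it is non-empty as long as
                # every visit precedes the visiting subject's censoring time.
                at_risk = sum(math.exp(gamma_hat * v_j)
                              for v_j, c_j in zip(v, censor) if c_j > s)
                total += 1.0 / at_risk
    return total

def frailty_variance(m, a):
    """Moment estimator (13), where a[i] = exp(gamma_hat * V_i) * Lambda-hat(C_i)."""
    num = sum(m_i ** 2 - m_i - a_i ** 2 for m_i, a_i in zip(m, a))
    den = sum(a_i ** 2 for a_i in a)
    return max(num / den, 0.0)

def conditional_frailty_mean(m_i, a_i, sigma2):
    """E(eta_i2 | C_i, m_i) under the Gamma assumption."""
    return (1.0 + m_i * sigma2) / (1.0 + a_i * sigma2)

def b_hat_multiplier(m_i, a_i, sigma2):
    """Scalar that multiplies Q_i(t) to give B-hat_i(t)."""
    return conditional_frailty_mean(m_i, a_i, sigma2) - 1.0
```

The truncation at zero in `frailty_variance` mirrors the maximum in (13): when visit counts are no more dispersed than a Poisson model implies, the estimated frailty variance is 0 and every multiplier is 0.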

Similar to the Liang method, the Sun method [16] accommodates (M3). In contrast to the Liang method, the distribution of the latent variable is completely unspecified, and the same latent variable $η_{i}$ is shared between the outcome and observation-time models. The Sun method specifies the semi-parametric marginal model:

E [Y_{i} (t) ∣ X_{i} (t), η_{i}] = μ (t) + β^{'} X_{i} (t) + α η_{i} .

(14)

Similar to $θ$ in the Liang method, $α$ parameterizes the correlation between the outcome and observation-time processes. If $α = 0$ , then the Sun method reduces to the Lin method.

Conditioning on $η_{i}$, $N_{i} (t)$ is a non-homogeneous Poisson process with intensity function $λ_{i} (t) = η_{i} λ (t) \exp \{γ^{'} X_{i} (t)\}$. The distribution of $η_{i}$ under the Sun method may depend on observed time-independent outcome-model covariates $V_{i}$ with $E [η_{i} ∣ V_{i}] = 1$. Discussion regarding covariate-dependent latent variables or frailties can be found in recent literature [27–30]. Let $\hat{π} (t; X_{i}) = \int_{0}^{t} \exp \{{\hat{γ}}^{'} X_{i} (u)\} d \hat{Λ} (u)$, ${\hat{η}}_{i} = (m_{i} - 1) / \hat{π} (C_{i}; X_{i})$, and ${\hat{Ω}}_{i} = (m_{i} - 1) (m_{i} - 2) / \hat{π} {(C_{i}; X_{i})}^{2}$. The class of estimating equations for $β$ and $α$ has the form:

U_{1} (β, α; γ) = \sum_{i = 1}^{n} \int_{0}^{τ} W (t) [\{X_{i} (t) - \overline{X} (t; γ)\} \{Y_{i} (t) - β^{'} X_{i} (t) - α {\hat{η}}_{i}\}] d N_{i} (t) = 0,

and:

U_{2} (β, α; γ) = \sum_{i = 1}^{n} \int_{0}^{τ} W (t) [\{{\hat{η}}_{i} - \overline{η} (t; γ)\} \{Y_{i} (t) - β^{'} X_{i} (t)\} - α \{{\hat{Ω}}_{i} - {\hat{η}}_{i} \overline{η} (t; γ)\}] d N_{i} (t) = 0,

for which:

\overline{X} (t; γ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} X_{i} (t) m_{i} / \hat{π} (C_{i}; X_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; X_{i})},

and:

\overline{η} (t; γ) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} {\hat{η}}_{i} m_{i} / \hat{π} (C_{i}; X_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; X_{i})} .

2.3. Extensions

2.3.1. Extension to Liang method to accommodate time-dependent covariates.

The estimation procedure of Liang et al. [15] allows adjustment for time-independent covariates in the observation-time model. Here, we extend the Liang method to accommodate time-dependent covariates. Define $\hat{π} (t; V_{i}) = \int_{0}^{t} \exp \{{\hat{γ}}^{'} V_{i} (u)\} d \hat{Λ} (u)$. The class of estimating equations for $β$ and $θ$ permitting time-dependent covariates in the observation-time model has the form:

U (β, θ; \hat{Λ}, \hat{B}) = \sum_{i = 1}^{n} \int_{0}^{τ} (\begin{array}{l} X_{i} (t) - \overline{X} (t) \\ {\hat{B}}_{i} (t) - \bar{\hat{B}} (t) \end{array}) \{Y_{i} (t) - β^{'} X_{i} (t) - θ^{'} {\hat{B}}_{i} (t)\} d N_{i} (t) = 0,

for which:

\overline{X} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} X_{i} (t) m_{i} / \hat{π} (C_{i}; X_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; X_{i})},

\bar{\hat{B}} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} {\hat{B}}_{i} (t) m_{i} / \hat{π} (C_{i}; X_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{γ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; X_{i})},

and ${\hat{B}}_{i} (t)$ can be estimated as before by replacing $\hat{Λ} (C_{i})$ with $m_{i} / \hat{π} (C_{i}; V_{i})$ . We provide details on consistency and asymptotic normality of the estimators in Appendix A.

2.3.2. Weighted-Liang and weighted-Sun methods.

We propose extensions to the Liang and Sun methods to offer additional flexibility when parameterizing outcome-observation dependence under both (M2) and (M3). Recall that we denote $X_{i} (t)$ as the outcome-model covariates and $Z_{i} (t)$ as the observation-time model covariates. With the inclusion of observation-level weights $ρ_{i} (t; \hat{γ}, δ)$, the set of estimating equations for the weighted-Liang method can be expressed as:

U (β, θ; \hat{Λ}, \hat{B}) = \sum_{i = 1}^{n} \int_{0}^{τ} \frac{W (t)}{ρ_{i} (t; \hat{γ}, δ)} (\begin{array}{l} X_{i} (t) - \overline{X} (t) \\ {\hat{B}}_{i} (t) - \bar{\hat{B}} (t) \end{array}) \{Y_{i} (t) - β^{'} X_{i} (t) - θ^{'} {\hat{B}}_{i} (t)\} d N_{i} (t) = 0,

for which:

\overline{X} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} X_{i} (t) m_{i} / \hat{π} (C_{i}; Z_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; Z_{i})},

and:

\bar{\hat{B}} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} {\hat{B}}_{i} (t) m_{i} / \hat{π} (C_{i}; Z_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; Z_{i})} .

Similarly, the set of estimating functions for the weighted-Sun method is:

U_{1} (β, α; \hat{γ}) = \sum_{i = 1}^{n} \int_{0}^{τ} \frac{W (t)}{ρ_{i} (t; \hat{γ}, δ)} [\{X_{i} (t) - \overline{X} (t)\} \{Y_{i} (t) - β^{'} X_{i} (t) - α {\hat{η}}_{i}\}] d N_{i} (t) = 0,

and:

U_{2} (β, α; \hat{γ}) = \sum_{i = 1}^{n} \int_{0}^{τ} \frac{W (t)}{ρ_{i} (t; \hat{γ}, δ)} [\{{\hat{η}}_{i} - \overline{η} (t)\} \{Y_{i} (t) - β^{'} X_{i} (t)\} - α \{{\hat{Ω}}_{i} - {\hat{η}}_{i} \overline{η} (t)\}] d N_{i} (t) = 0,

for which ${\hat{η}}_{i} = \frac{m_{i} - 1}{\hat{π} (C_{i}; Z_{i})}$ ,

\overline{η} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} {\hat{η}}_{i} m_{i} / \hat{π} (C_{i}; Z_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; Z_{i})},

and:

\overline{X} (t) = \frac{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} X_{i} (t) m_{i} / \hat{π} (C_{i}; Z_{i})}{\sum_{i = 1}^{n} ξ_{i} (t) \exp \{δ^{'} X_{i} (t)\} m_{i} / \hat{π} (C_{i}; Z_{i})} .

We provide details on consistency and asymptotic normality of our extensions in Appendices A and B.

2.4. Summary

In this section, we formulated a semi-parametric linear regression model to evaluate the marginal association between covariates and a continuous outcome of interest in the presence of outcome-dependent observation times. We presented a framework of outcome-observation dependence mechanisms. The Lin method is the most restrictive of the reviewed methods, because it is suitable only for the stronger assumption of (M1); the Bůžková method accommodates (M2) and reduces to (M1) when the additional covariates in the observation-time model are not required; the Liang and Sun methods accommodate (M3), with (M1) as a special case. We proposed two methods, the weighted-Liang and weighted-Sun methods, which offer considerable flexibility in that they can accommodate all (or any combination) of the three outcome-observation dependence mechanisms. We note that standard error estimation for all methods is most easily obtained using bootstrap procedures; in this setting, a cluster bootstrap, in which subjects are sampled with replacement, is required [31,32]. The resampling of subjects assumes that the correlation structure within each subject is retained [33,34]. R code for the cluster-bootstrap procedure is included in the Appendix. In subsequent sections, we evaluate the statistical properties of these methods in simulation studies (Section 3). We illustrate their application and propose model verification in a case study (Section 4).
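To make the resampling scheme concrete, the following Python sketch shows a generic cluster bootstrap in which whole subjects, with all of their observations, are drawn with replacement. It is not the R code referred to above; the function names are hypothetical, and `pooled_mean` is only a stand-in for whichever estimator of $β$ is being bootstrapped.

```python
import random
import statistics

def cluster_bootstrap_se(subjects, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of estimator(subjects), resampling subjects.

    subjects:  list with one entry per subject; each entry holds all of that
               subject's data, so within-subject correlation is preserved.
    estimator: function mapping a list of subjects to a scalar estimate.
    """
    rng = random.Random(seed)
    n = len(subjects)
    estimates = []
    for _ in range(n_boot):
        # Draw n subjects with replacement; a subject may appear several times.
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

def pooled_mean(subjects):
    """Placeholder estimator: mean of all outcomes pooled across subjects."""
    values = [y for subject in subjects for y in subject]
    return sum(values) / len(values)
```

Resampling individual observations instead of subjects would break the within-subject correlation and would also change each subject's number of visits $m_{i}$, on which several of the methods above depend.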

3. Simulation study

We evaluated the statistical properties of the reviewed methods through simulation studies under two outcome-observation dependence settings: (i) (M2) and (ii) (M2) and (M3). All simulations were conducted in R 2.13.1 (R Development Core Team, Vienna, Austria).

3.1. Setting 1: simulations under (M2)

3.1.1. Parameters.

In this setting, we used covariates to induce correlation between the outcome and observation-time processes. Following the simulation procedure of Bůžková and Lumley [14], we generated continuous outcomes at each of 1000 iterations using the linear mixed-effects model:

Y_{i} (t) = μ (t) + β_{1} X_{i 1} (t) + β_{2} (X_{i 2} - E [X_{i 2} ∣ X_{i 1}]) + ϵ_{i} (t),

(15)

for which $μ (t) = t$, $ϵ_{i} (t) \sim \mathrm{Normal} (0,1)$ and $β_{1}$ was the target of inference. The time-dependent covariate $X_{i 1} (t) = X_{i 1} \log (t)$ was a known function of time, in which $X_{i 1}$ followed a Uniform[0,1] distribution. The time-independent covariate $X_{i 2}$ was drawn from a mixture distribution, for which $X_{i 2} \sim \mathrm{Normal} (2,1)$ if $X_{i 1} ⩽ 0.5$ and $X_{i 2} \sim \mathrm{Normal} (0,4)$ if $X_{i 1} > 0.5$. Hence, $X_{i 2}$ in model (15) influenced the covariate-outcome association of $X_{i 1} (t)$. To ensure proper marginalization of model (15), $X_{i 2}$ was centered by its conditional mean given $X_{i 1}$, resulting in the marginal semi-parametric outcome model:

E [Y_{i} (t) ∣ X_{i} (t)] = μ (t) + β_{1} X_{i 1} (t) .

(16)

We generated the observation times $T_{i k}$ following a non-homogeneous Poisson process with intensity function $λ_{i} (t) = η_{i} λ (t) \exp \{γ_{1} X_{i 1} (t) + γ_{2} X_{i 2}\}$. Note that $X_{i 2}$ induced additional correlation between the outcome and observation-time processes. We set $λ (t) = \frac{\sqrt{t}}{2}$ and generated the latent variable $η_{i}$ from a Gamma distribution with mean 1 and variance $σ_{η}^{2} = 0.5$. The independent censoring time $C_{i}$ was generated from Uniform[5,10]. We considered various combinations of outcome parameters ($β_{1} = 1$, $β_{2} \in \{0, 0.3, 1\}$) and intensity parameters ($γ_{1} = 0.5$, $γ_{2} \in \{0, - 0.2, 0.5\}$). When $β_{2} = 0$ and $γ_{2} = 0$, the outcome-observation dependence model satisfied (M1); when $γ_{2} \neq 0$, the outcome-observation dependence model satisfied (M2).
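To make the observation-time mechanism concrete, the sketch below draws visit times for one subject in Python. It is not the simulation code used for the study (which was written in R). It relies on the fact that, with $X_{i 1} (t) = X_{i 1} \log (t)$ and $λ (t) = \sqrt{t} / 2$, the intensity is proportional to a power of $t$, so the cumulative intensity over $[0, C_{i}]$ has a closed form. Given the number of visits, the visit times can then be drawn by inverting that cumulative intensity. The function names and the hand-rolled Poisson sampler are assumptions for illustration; $X_{i 2}$ is passed in rather than simulated.

```python
import math
import random

def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_visit_times(x1, x2, gamma1=0.5, gamma2=-0.2, rng=random):
    """Visit times for one subject under the Setting 1 intensity model."""
    eta = rng.gammavariate(2.0, 0.5)   # shape 2, scale 0.5: mean 1, variance 0.5
    c = rng.uniform(5.0, 10.0)         # independent censoring time
    # Intensity: eta * exp(gamma2 * x2) * t**(0.5 + gamma1 * x1) / 2, so the
    # cumulative intensity at c is eta * exp(gamma2 * x2) * c**power / (2 * power).
    power = 1.5 + gamma1 * x1
    cumulative = eta * math.exp(gamma2 * x2) * c ** power / (2.0 * power)
    m = poisson_draw(cumulative, rng)
    # Given m, the times are i.i.d. with distribution function (t / c)**power.
    times = sorted(c * rng.random() ** (1.0 / power) for _ in range(m))
    return times, c, eta

rng = random.Random(42)
times, c, eta = simulate_visit_times(x1=0.3, x2=2.0, rng=rng)
```

The inversion step avoids the thinning algorithm that would otherwise be needed for a general non-homogeneous Poisson process. It applies here only because the assumed intensity integrates in closed form.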

3.1.2. Results.

Table I provides the estimated bias, empirical standard error estimates, and mean-squared error estimates for estimation of $β_{1}$ in model (16). Recall that the Lin, Liang, and Sun methods estimate $β_{1}$ without accounting for $X_{i 2}$ in any way, whereas the Bůžková, weighted-Liang, and weighted-Sun methods incorporate the effect of $X_{i 2}$ through observation-level weights. As anticipated, all six methods yielded approximately unbiased parameter estimates for $β_{1}$ if (M1) was satisfied ($γ_{2} = 0$), that is, when the outcome process was conditionally independent of the observation-time process given outcome-model covariates. The Lin, Bůžková, weighted-Liang, and weighted-Sun estimates of $β_{1}$ were comparable in bias and efficiency to both the Liang and Sun estimators. However, if (M1) was violated ($γ_{2} \neq 0$), that is, when additional correlation between the two processes was induced by the additional covariate $X_{i 2}$, then only the Bůžková, weighted-Liang, and weighted-Sun methods performed well, with negligible biases in all settings. When $β_{2} = 0$ and $γ_{2} \neq 0$, all methods performed well because $X_{i 2}$ was not associated with the outcome. As $β_{2}$ increased, the biases of the Lin, Liang, and Sun estimates for $β_{1}$ increased. A positive value of $γ_{2}$ with positive values of $X_{i 2}$ led to more observations per subject, which increased efficiency in the estimation of $β_{1}$ in most cases.

Table I.

Simulation results for $β_{1}$ under (M2).

$n$	$β_{2}$	$γ_{2}$	Lin			Bůžková			Liang (extension)			Weighted-Liang (extension)				Sun			Weighted-Sun (extension)
$n$	$β_{2}$	$γ_{2}$	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE	ERE^a	Bias	ESE	MSE	Bias	ESE	MSE	ERE^a

100	0	0	0.003	0.284	0.081	0.003	0.284	0.080	0.004	0.280	0.078	−0.006	0.393	0.155	1.915	0.004	0.282	0.079	−0.006	0.393	0.155	1.915
		−0.2	−0.001	0.292	0.085	−0.002	0.289	0.084	0.003	0.287	0.082	−0.012	0.428	0.183	2.193	0.003	0.288	0.083	−0.013	0.428	0.183	2.193
		0.5	−0.020	0.342	0.118	−0.009	0.289	0.084	−0.018	0.324	0.105	−0.007	0.379	0.143	1.720	−0.018	0.329	0.109	−0.008	0.379	0.144	1.724
	0.3	0	0.003	0.318	0.101	0.002	0.313	0.098	0.004	0.313	0.098	−0.008	0.411	0.169	1.724	0.004	0.315	0.099	−0.007	0.411	0.169	1.313
		−0.2	−0.136	0.336	0.132	−0.010	0.332	0.110	−0.098	0.323	0.114	−0.023	0.467	0.219	1.979	−0.119	0.328	0.121	−0.024	0.467	0.219	1.979
		0.5	0.336	0.399	0.273	−0.003	0.323	0.104	0.231	0.352	0.177	0.002	0.385	0.149	1.421	0.300	0.367	0.225	0.011	0.393	0.155	1.480
	1	0	0.003	0.543	0.294	−0.001	0.521	0.271	0.004	0.536	0.288	−0.010	0.577	0.333	1.227	0.004	0.539	0.290	−0.010	0.577	0.333	1.227
		−0.2	−0.452	0.624	0.594	−0.029	0.587	0.345	−0.333	0.555	0.419	−0.048	0.687	0.475	1.370	−0.402	0.582	0.500	−0.050	0.686	0.473	1.366
		0.5	1.167	0.769	1.954	0.011	0.558	0.312	0.811	0.549	0.960	0.022	0.560	0.314	1.007	1.043	0.629	1.484	0.054	0.599	0.362	1.152
200	0	0	−0.004	0.203	0.041	−0.003	0.203	0.041	−0.002	0.202	0.041	−0.010	0.284	0.081	1.957	−0.003	0.202	0.041	−0.010	0.285	0.081	1.971
		−0.2	−0.004	0.213	0.045	−0.004	0.209	0.044	−0.001	0.210	0.044	−0.009	0.304	0.092	2.116	−0.001	0.211	0.044	−0.009	0.304	0.092	2.116
		0.5	0.001	0.258	0.067	0.004	0.202	0.041	−0.003	0.237	0.056	0.002	0.269	0.072	1.773	−0.001	0.243	0.059	0.001	0.269	0.072	1.773
	0.3	0	−0.007	0.227	0.052	−0.007	0.224	0.050	−0.006	0.225	0.051	−0.014	0.299	0.090	1.782	−0.006	0.226	0.051	−0.014	0.299	0.090	1.782
		−0.2	−0.140	0.246	0.080	−0.008	0.237	0.056	−0.099	0.235	0.065	−0.013	0.329	0.108	1.927	−0.121	0.239	0.072	−0.013	0.329	0.108	1.927
		0.5	0.378	0.305	0.236	0.006	0.220	0.048	0.255	0.255	0.130	0.007	0.271	0.073	1.517	0.326	0.267	0.178	0.010	0.270	0.073	1.506
	1	0	−0.014	0.389	0.152	−0.015	0.374	0.140	−0.015	0.386	0.149	−0.023	0.421	0.178	1.267	−0.014	0.388	0.150	−0.023	0.421	0.178	1.267
		−0.2	−0.457	0.462	0.422	−0.016	0.420	0.177	−0.328	0.405	0.272	−0.023	0.486	0.236	1.339	−0.401	0.428	0.344	−0.023	0.485	0.236	1.333
		0.5	1.257	0.602	1.943	0.012	0.369	0.136	0.856	0.389	0.884	0.018	0.379	0.144	1.055	1.089	0.449	1.386	0.031	0.377	0.143	1.044


Bias, ${\hat{β}}_{1} - β_{1}$, with $β_{1} = 1$; ESE, empirical standard error; MSE, mean-squared error; ERE, estimated relative efficiency.

^a Estimated relative efficiency was calculated for unbiased estimators with the variance of the Bůžková parameter estimate in the denominator.
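The summary measures reported in Table I can be computed from the Monte Carlo replicates as in the following Python sketch. The function name `summarize` and the toy replicate values are hypothetical. The sketch also illustrates the identity that links the three columns: the mean-squared error is the squared bias plus the variance of the estimates, which equals the squared ESE up to the factor $(R - 1) / R$ for $R$ replicates.

```python
import statistics

def summarize(estimates, truth):
    """Return (bias, empirical standard error, mean-squared error)."""
    bias = statistics.fmean(estimates) - truth
    ese = statistics.stdev(estimates)  # sample SD across replicates
    mse = statistics.fmean([(b - truth) ** 2 for b in estimates])
    return bias, ese, mse

# Hypothetical replicate estimates of beta_1 (true value 1).
replicates = [0.9, 1.2, 1.05, 0.8, 1.1, 0.95]
bias, ese, mse = summarize(replicates, truth=1.0)

# Consistency check against the first row of Table I (Lin, n = 100):
# bias 0.003 and ESE 0.284 give an MSE of about 0.081.
table_check = round(0.003 ** 2 + 0.284 ** 2, 3)
```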

In this setting, we also quantified the price of assuming (M3) when the latent variable was unnecessary. We calculated the estimated relative efficiency (ERE) of unbiased estimators with the estimated variance of the weighted-Liang and weighted-Sun methods in the numerator and the estimated variance of the Bůžková method in the denominator. The ERE indicated that the loss of efficiency was reasonable and comparable between the weighted-Liang and weighted-Sun methods. As $β_{2}$ increased (i.e., the dependence between the outcome and observation-time models increased), the ERE decreased. In addition, we also calculated the ERE of IIRR-weighted versus unweighted methods to investigate the loss of efficiency due to inclusion of the additional covariate $X_{i 2}$ when none was needed (Appendix C). The EREs between the Bůžková and Lin methods were close to 1 under all scenarios. The loss of efficiency was greater for the weighted-Liang and weighted-Sun methods but decreased as the number of observations increased (i.e., greater $γ_{2}$ ) and when $β_{2}$ increased.
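The ERE values in Table I can be approximately reproduced from the reported empirical standard errors, as in the short Python sketch below. The function name is hypothetical. Because the tabulated ESEs are rounded to three decimals, agreement with the tabulated ERE is only approximate.

```python
def estimated_relative_efficiency(ese_numerator, ese_denominator):
    """Ratio of empirical variances, computed from empirical standard errors."""
    return (ese_numerator / ese_denominator) ** 2

# Table I, n = 100, beta_2 = 0, gamma_2 = 0:
# weighted-Liang ESE 0.393 versus Bůžková ESE 0.284 (tabulated ERE 1.915).
ere_first_row = estimated_relative_efficiency(0.393, 0.284)

# Table I, n = 100, beta_2 = 0, gamma_2 = -0.2:
# weighted-Liang ESE 0.428 versus Bůžková ESE 0.289 (tabulated ERE 2.193).
ere_second_row = estimated_relative_efficiency(0.428, 0.289)
```

An ERE above 1 means that the estimator in the numerator is less efficient than the Bůžková estimator in that scenario.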

3.2. Setting 2: simulation under (M2) and (M3)

3.2.1. Parameters.

In the previous setting, we focused on outcome-observation dependence induced through covariates. In this setting, we focus on estimation of $β_{1}$ under various forms of latent variable structures. To simulate data under both (M2) and (M3), we generated outcomes at each of 1000 iterations using the linear mixed-effects model:

Y_{i} (t) = μ (t) + β_{1} X_{i 1} (t) + β_{2} (X_{i 2} - E [X_{i 2} ∣ X_{i 1}]) + α η_{i 1} Q_{i} (t) + ϵ_{i} (t),

(17)

in which $μ (t)$, $ϵ_{i} (t)$, $X_{i 1} (t)$, and $X_{i 2}$ were as defined in Section 3.1.1. The observation times $T_{i k}$ were generated from a non-homogeneous Poisson process with intensity function $λ_{i} (t) = η_{i 2} λ (t) \exp \{γ_{1} X_{i 1} (t) + γ_{2} X_{i 2}\}$, for which $λ (t) = \frac{\sqrt{t}}{2}$. The independent censoring time $C_{i}$ was generated from Uniform[7,10]. The coefficients were set at $β_{1} = 1$, $β_{2} = 0.3$, $γ_{1} = 0.5$, $γ_{2} = - 0.2$, and $α = 1$. Because $α \neq 0$ in model (17), correlation was introduced between the outcome and the observation-time processes through latent variables. We generated the latent variable $η_{i 2}$ under two scenarios:

(i) $η_{i 2}$ from a Gamma distribution with mean 1 and variance 0.5; hereafter, $η_{i 2}^{(1)}$.
(ii) $η_{i 2}$ from a mixture distribution, following Uniform[0.5,1.5] if $X_{i 1} ⩽ 0.5$ and a Gamma distribution with mean 1 and variance 0.7 if $X_{i 1} > 0.5$; hereafter, $η_{i 2}^{(2)}$.

The latent variable $η_{i 1}$ was generated under two scenarios:

(i) $η_{i 1} = η_{i 2}$; hereafter, $η_{i 1}^{(1)}$.
(ii) $E [η_{i 1} ∣ η_{i 2}] = θ (η_{i 2} - 1)$, $θ = 1$; hereafter, $η_{i 1}^{(2)}$.

We let $Q_{i} (t) = 1$ or $Q_{i} (t) = X_{i 1}$ . When $Q_{i} (t) = X_{i 1}$ , Model (17) can be considered a random coefficient model. The latent variables were dependent on the outcome process either through $Q_{i} (t) = X_{i 1}$ or $η_{i 2}^{(2)}$ . The simulation setup mirrored the setup of Sun et al. [16] if $η_{i 1} = η_{i 2}$ and $Q_{i} (t) = 1$ and mirrored the setup of Liang et al. [15] if $α = 1$ , $η_{i 2}$ was Gamma distributed with mean 1 and $η_{i 1}$ and $η_{i 2}$ were linearly linked through $E [η_{i 1} ∣ η_{i 2}] = θ (η_{i 2} - 1)$ .
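The latent-variable scenarios above can be sketched in Python as follows. The Gamma shape and scale values follow from the stated means and variances (mean = shape × scale, variance = shape × scale²). The text specifies only the conditional mean of $η_{i 1}$ given $η_{i 2}$ in the second link scenario, so the additive mean-zero Normal noise used here, and its standard deviation `noise_sd`, are assumptions made purely for illustration. The function names are likewise hypothetical.

```python
import random

def draw_eta2(x1, scenario, rng):
    """Latent variable in the observation-time model."""
    if scenario == 1:
        return rng.gammavariate(2.0, 0.5)       # mean 1, variance 0.5
    if x1 <= 0.5:
        return rng.uniform(0.5, 1.5)            # covariate-dependent mixture
    return rng.gammavariate(1.0 / 0.7, 0.7)     # mean 1, variance 0.7

def draw_eta1(eta2, scenario, rng, theta=1.0, noise_sd=0.1):
    """Latent variable in the outcome model, linked to eta2."""
    if scenario == 1:
        return eta2                              # shared latent variable
    # Assumed form: conditional mean theta * (eta2 - 1) plus mean-zero noise.
    return theta * (eta2 - 1.0) + rng.gauss(0.0, noise_sd)

rng = random.Random(7)
x1 = rng.random()
eta2 = draw_eta2(x1, scenario=2, rng=rng)
eta1 = draw_eta1(eta2, scenario=2, rng=rng)
```

Under the mixture scenario, both components have mean 1 but different variances, which is what makes the distribution of $η_{i 2}$ depend on $X_{i 1}$.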

3.2.2. Results.

Table II provides the estimated bias, empirical standard error estimates, and mean-squared error estimates for estimation of $β_{1}$ in (17). The inclusion of $X_{i 2}$ in the observation-time model satisfied (M2) and induced additional correlation between the outcome and observation-time processes, so the IIRR-weighted methods (Bůžková, weighted-Liang, and weighted-Sun) performed better than their unweighted counterparts, reflecting the results of Setting 1.

Table II.

Simulation results for $β_{1}$ under (M2) and (M3).

$n$	$η_{i 1}$ ^a	$Q_{i}$	$η_{i 2}$ ^b	Lin			Bůžková			Liang (extension)			Weighted-Liang (extension)			Sun			Weighted-Sun (extension)
$n$	$η_{i 1}$ ^a	$Q_{i}$	$η_{i 2}$ ^b	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE	Bias	ESE	MSE

100	$η_{i 1}^{(1)}$	1	$η_{i 2}^{(1)}$	−0.155	0.430	0.209	−0.030	0.426	0.182	−0.603	0.496	0.611	0.007	0.421	0.177	−0.595	0.494	0.598	0.016	0.407	0.166
			$η_{i 2}^{(2)}$	0.335	0.426	0.294	0.469	0.418	0.394	−0.281	0.464	0.294	0.116	0.405	0.178	−0.305	0.462	0.306	0.008	0.398	0.158
	$η_{i 1}^{(2)}$	1	$η_{i 2}^{(1)}$	−0.171	0.488	0.267	−0.041	0.484	0.236	−0.592	0.539	0.641	0.002	0.481	0.231	−0.585	0.536	0.630	0.014	0.472	0.223
			$η_{i 2}^{(2)}$	0.356	0.528	0.406	0.490	0.528	0.519	−0.223	0.541	0.342	0.134	0.470	0.239	−0.247	0.540	0.353	0.025	0.465	0.217
		$X_{i 1}$	$η_{i 2}^{(1)}$	0.104	0.415	0.183	0.232	0.411	0.223	−0.358	0.481	0.359	0.004	0.445	0.198	−0.284	0.493	0.324	0.250	0.442	0.258
			$η_{i 2}^{(2)}$	0.348	0.483	0.354	0.486	0.485	0.472	−0.233	0.504	0.309	0.045	0.443	0.198	−0.163	0.506	0.283	0.127	0.438	0.208
200	$η_{i 1}^{(1)}$	1	$η_{i 2}^{(1)}$	−0.166	0.307	0.122	−0.025	0.310	0.097	−0.621	0.351	0.509	0.012	0.302	0.092	−0.615	0.349	0.500	0.018	0.292	0.085
			$η_{i 2}^{(2)}$	0.336	0.311	0.209	0.484	0.301	0.325	−0.285	0.342	0.198	0.113	0.294	0.100	−0.304	0.341	0.209	0.007	0.288	0.083
	$η_{i 1}^{(2)}$	1	$η_{i 2}^{(1)}$	−0.179	0.342	0.149	−0.043	0.350	0.124	−0.607	0.387	0.518	−0.004	0.353	0.124	−0.601	0.385	0.510	0.004	0.346	0.120
			$η_{i 2}^{(2)}$	0.338	0.372	0.253	0.492	0.377	0.384	−0.240	0.390	0.210	0.104	0.346	0.131	−0.258	0.391	0.220	−0.002	0.343	0.118
		$X_{i 1}$	$η_{i 2}^{(1)}$	0.100	0.297	0.098	0.235	0.303	0.147	−0.361	0.352	0.254	0.003	0.324	0.105	−0.289	0.358	0.211	0.257	0.329	0.174
			$η_{i 2}^{(2)}$	0.335	0.339	0.227	0.490	0.344	0.358	−0.237	0.367	0.190	0.024	0.323	0.105	−0.168	0.365	0.162	0.110	0.320	0.114


Bias, ${\hat{β}}_{1} - β_{1}$, with $β_{1} = 1$; ESE, empirical standard error; MSE, mean-squared error.

^a Two possible links: $η_{i 1}^{(1)} : η_{i 1} = η_{i 2}$; $η_{i 1}^{(2)} : E [η_{i 1} ∣ η_{i 2}] = θ (η_{i 2} - 1)$, $θ = 1$.

^b Latent variable distributions: $η_{i 2}^{(1)} : η_{i 2} \sim \mathrm{Gamma} (\mathrm{mean} = 1, σ^{2} = 0.5)$; $η_{i 2}^{(2)} : η_{i 2} \sim I (X_{i 1} ⩽ 0.5) \, \mathrm{Uniform} [0.5,1.5] + I (X_{i 1} > 0.5) \, \mathrm{Gamma} (1,0.7)$.

Under the Sun setup (i.e., $η_{i 1}^{(1)} : η_{i 1} = η_{i 2}$ and $Q_{i} (t) = 1$), all IIRR-weighted methods yielded approximately unbiased parameter estimates for $β_{1}$ under $η_{i 2}^{(1)}$. Under the Liang setup (i.e., $η_{i 1}^{(2)} : E [η_{i 1} ∣ η_{i 2}] = θ (η_{i 2} - 1)$ and $Q_{i} (t) = 1$), all IIRR-weighted methods likewise yielded approximately unbiased estimates under $η_{i 2}^{(1)}$. Under $η_{i 2}^{(2)}$, in which the distribution of the latent variable depended on $X_{i 1}$, only the weighted-Sun method yielded approximately unbiased estimates under $Q_{i} (t) = 1$, although the bias under the weighted-Liang method was smaller in magnitude than that under the Bůžková method. Note that the Bůžková method is not expected to perform well in this setting (in which unobserved latent variables affect the outcome and observation-time processes), because latent variable models represent a different class of models. If the effect of the latent variable $η_{i 1}$ on the outcomes was associated with the value of $X_{i 1}$ (i.e., $Q_{i} (t) = X_{i 1}$), then the bias in estimating $β_{1}$ was small under the weighted-Liang method but large under all other methods.

3.3. Summary

Our simulation results quantified the potential for bias in estimated covariate-outcome associations under various outcome-observation dependence mechanisms. The Bůžková, weighted-Liang, and weighted-Sun methods performed better when (M2) was satisfied. In Setting 1, we examined the robustness of the methods that included latent variables when they were not needed. We showed that the potential loss of efficiency was moderate and decreased when the dependence between the outcome and observation-time models increased. We also examined the relative efficiency between IIRR-weighted and unweighted methods to examine potential loss of efficiency due to including an unnecessary additional covariate in the observation-time model. The results indicated that the loss of efficiency was moderate and decreased with a greater number of observations or increased dependence between the outcome and observation-time processes. The weighted-Liang and weighted-Sun methods were the most flexible in that they could accommodate a combination of outcome-observation dependence mechanisms. They also provided estimates with negligible bias, depending on the relationship between the latent variable and the outcome-model covariates. In practice, ensuring unbiased estimates through a more complex dependence model may be more important than a potential loss in efficiency.

In our simulation study, we generated data using the same set of covariates in settings with and without latent variables. In settings without such latent variables, the Bůžková method performs well, and simulations here (Section 3.1) and by others [7,14,35] have demonstrated small empirical bias. In practice, observed covariates (including previous outcomes) that are correlated with an unobserved latent variable may be used to partially capture information regarding subject-specific visit intensities.

Exploratory data analysis, model diagnostics, and sensitivity analyses can be used to investigate the relationship between the outcome and observation-time processes and to ensure selection of an appropriate analysis method. We illustrate and discuss strategies for model selection in Sections 4 and 5.

4. Case study

4.1. Background

We compared the reviewed methods using a subset of data from a bladder cancer study conducted by the Veterans Administration Cooperative Urological Research Group [36]. Eighty-five patients with superficial bladder tumors were randomly assigned to placebo ($n = 47$) or thiotepa treatment ($n = 38$). At each follow-up visit, new tumors were counted before being removed transurethrally. The maximum study duration was 53 months. There was notable heterogeneity in visit patterns across patients. The median (25th, 75th percentile) number of visits in the placebo group and treatment group was 9 (5, 12) and 9 (4, 23), respectively. The average time between visits for the placebo group was 3.7 months, compared with 2.3 months for the treatment group. These differences suggested that the patients in the treatment group visited the clinic more often. Hence, the observation-time process must account for this difference to estimate properly the effect of treatment on tumor recurrence.

Our analysis focused on the natural logarithm of one plus the cumulative number of new tumors observed up to time $t$, to retain a marginal response. We included a treatment indicator ($X_{1}$) and the natural logarithm of one plus the initial number of tumors ($X_{2}$) in the outcome model. We considered the following outcome models:

Lin and Bůžková methods: $E [Y_{i} (t) ∣ X_{i} (t)] = μ (t) + β_{1} X_{i 1} + β_{2} X_{i 2}$ ;

Liang and weighted-Liang methods: $E [Y_{i} (t) ∣ X_{i} (t), η_{i 1}] = μ (t) + β_{1} X_{i 1} + β_{2} X_{i 2} + η_{i 1} Q_{i}$ , $Q_{i} = X_{i 1}$ ;

Sun and weighted-Sun methods: $E [Y_{i} (t) ∣ X_{i} (t), η_{i 1}] = μ (t) + β_{1} X_{i 1} + β_{2} X_{i 2} + α η_{i 1}$ .
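The outcome used in these models, the logarithm of one plus the cumulative number of new tumors, can be constructed from per-visit counts as in the following Python sketch. The function name and the example counts are hypothetical and are not taken from the bladder cancer data.

```python
import math
from itertools import accumulate

def log_cumulative_outcome(new_tumors_per_visit):
    """Y(t) at each visit: log(1 + cumulative number of new tumors so far)."""
    return [math.log(total + 1) for total in accumulate(new_tumors_per_visit)]

# Hypothetical patient with four visits and 0, 2, 1, 0 new tumors found.
y = log_cumulative_outcome([0, 2, 1, 0])
```

Because the cumulative count never decreases, the resulting outcome is non-decreasing over a patient's visits and equals 0 until the first new tumor is observed.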

The consensus of previous analyses was that the tumor recurrence and observation-time processes were dependent [15,37,38]. We note that the outcome may be intrinsically dependent upon the measurement process, such that longer intervals between visits allow more tumors to grow. The outcome is therefore expected to increase with longer time between visits. We considered two observation-time models:

Case 1: $λ_{i} (t) = η_{i 2} \exp \{γ_{1} X_{i 1} + γ_{2} X_{i 2}\} λ (t)$

Case 2: $λ_{i} (t) = η_{i 2} \exp \{γ_{1} X_{i 1} + γ_{2} X_{i 2} + γ_{3} \log (\text{\# new tumors since baseline} + 1)\} λ (t)$

Case 1 specified the same set of covariates in both the outcome and observation-time models. Case 2 specified an additional covariate based on number of tumors since baseline because it is common for the physician to schedule a patient’s next visit based on the outcomes so far. Recall that $η_{i 1} = η_{i 2}$ in the Sun and weighted-Sun methods and $E [η_{i 1} ∣ η_{i 2}] = θ (η_{i 2} - 1)$ in the Liang and weighted-Liang methods.

4.2. Results

Table III provides estimates for $β$ and $γ$ under the Lin, Liang, and Sun methods in Case 1. We obtained ${\hat{γ}}_{1} = 0.444$ (SE, 0.093) and ${\hat{γ}}_{2} = - 0.001$ (SE, 0.115), which suggested that treatment assignment was significantly associated with the observation-time process. We specified $Q_{i} = X_{i 1}$ because $X_{i 1}$ had a significant effect in the observation-time model, and the results in Table III mirrored the conclusion from Liang et al. [15].

Table III.

Parameter estimates and estimated standard errors (SE) under Case 1.

Method	${\hat{β}}_{1}$ (SE)	${\hat{β}}_{2}$ (SE)	$\hat{θ}$ (SE)^*	$\hat{α}$ (SE)^*

Lin	−0.701 (0.172)	0.657 (0.165)
Liang	−0.588 (0.175)	0.682 (0.147)	−0.235 (0.243)
Sun	−0.751 (0.188)	0.680 (0.159)		−0.043 (0.398)


${\hat{γ}}_{1} = 0.444 (0.093)$ , ${\hat{γ}}_{2} = - 0.001 (0.115)$ .

^* The parameters $θ$ and $α$ represent the association between the outcome and observation-time processes for the Liang and Sun methods, respectively.
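The significance statements about the Case 1 observation-time coefficients can be checked with a simple Wald statistic, sketched below in Python. The function name is hypothetical, and the use of a standard Normal reference distribution with the reported standard errors is an assumption; the exact test used in the analysis may differ.

```python
import math

def wald_test(estimate, se):
    """Wald z statistic and two-sided p-value under a standard Normal reference."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Case 1 estimates reported with Table III.
z_treatment, p_treatment = wald_test(0.444, 0.093)   # gamma_1: treatment
z_initial, p_initial = wald_test(-0.001, 0.115)      # gamma_2: log initial tumors
```

Under these assumptions, the treatment coefficient has a z statistic of about 4.77, whereas the coefficient for the initial number of tumors is essentially zero relative to its standard error.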

Next, we examined the importance of the additional covariate in Case 2. Table IV provides estimates for $β$ and $γ$ for IIRR-weighted methods under Case 2. We found that the cumulative number of tumors since baseline was significantly related to the observation-time process. The Wald test of $γ_{3} = 0$ in the observation-time model provided a $p$-value < 0.001, implying that the inclusion of the additional covariate was appropriate. Hence, the IIRR-weighted methods were more appropriate than the unweighted methods, and we focused on the results in Table IV. The observation-level weights applied to the Bůžková, weighted-Liang, and weighted-Sun methods ranged from 0.50 to 1.26, with median (25th, 75th percentile) = 0.93 (0.84, 1.06). With the incorporation of observation-level weights, the treatment effect under the Bůžková method was attenuated compared with the Lin method. The treatment effect estimates under the weighted-Liang and weighted-Sun methods were likewise smaller in magnitude than those under the Liang and Sun methods. Because the initial number of tumors was not significantly related to the observation-time process, the corresponding estimates ${\hat{β}}_{2}$ were comparable under all methods.

Table IV.

Parameter estimates and their estimated standard errors (SE) under Case 2.

Method	${\hat{β}}_{1}$ (SE)	${\hat{β}}_{2}$ (SE)	$\hat{θ}$ (SE)^*	$\hat{α}$ (SE)^*

Bůžková	−0.565 (0.170)	0.572 (0.165)
Weighted-Liang	−0.395 (0.166)	0.584 (0.147)	−0.266 (0.229)
Weighted-Sun	−0.423 (0.182)	0.580 (0.156)		−0.247 (0.247)


${\hat{γ}}_{1} = 0.536 (0.090)$ , ${\hat{γ}}_{2} = - 0.105 (0.128)$ , ${\hat{γ}}_{3} = 0.227 (0.076)$

Stabilized weights: median (25th, 75th percentile) = 0.93 (0.84, 1.06).

^* The parameters $θ$ and $α$ represent the association between the outcome and observation-time processes for the weighted-Liang and weighted-Sun methods, respectively.

To determine the necessity of latent variables in the outcome models, we focused on the variance of the latent variable $η_{i 2}$ in the observation-time model. Under Case 2, the estimated variance based on (13) was 0.448, indicating that the latent variable approaches were appropriate. Based on the variance property of the Gamma distribution, we partitioned the variance of $η_{i 2}$ as the contribution from the placebo group (0.059) and the thiotepa group (0.417). The difference in variance estimates indicated the possibility of covariate-dependent $η_{i 2}$, in that the distribution of $η_{i 2}$ was different between the treatment groups. Next, we used the density curve of ${\hat{η}}_{i 2}$ to graphically check if $η_{i 2}$ was covariate dependent. The density curves were indeed different between the treatment groups (Appendix D). Given the evidence of (M2) and (M3), we focused on the results under the weighted-Liang and weighted-Sun methods. We note that the same $Z_{i} (t)$ was used for the methods in Table IV. As in the simulation study, correct specification of covariates in $Z_{i} (t)$ may recover the effect of the latent variable under the Bůžková method. We did not have access to other measured covariates in this data set; if those were available, it may have been possible to find candidates for $Z_{i} (t)$ such that the treatment estimate under the Bůžková method was closer to those under the weighted-Liang and weighted-Sun methods.

The choice between weighted-Liang and weighted-Sun methods relied on the distribution of $η_{i 2}$ . The weighted-Liang method assumes that $η_{i 2}$ is derived from a Gamma distribution with a common variance for all subjects, whereas the weighted-Sun method places no distributional assumption on $η_{i 2}$ . Considering the evidence of covariate dependence based on the density curves, the results from the weighted-Sun method best described the data, although the estimates for $β_{1}$ were similar between the weighted-Liang and weighted-Sun methods. Overall, the results indicated that treatment and the initial number of tumors had significant effects on tumor recurrence. We also observed a negative correlation between tumor recurrence and the observation-time processes $(\hat{α} = - 0.247)$ .

Lastly, we evaluated the fit of the outcome model based on the procedure presented in Liang et al. [15]. We derived residuals ${\hat{ϵ}}_{i} (t) = Y_{i} (t) - {\hat{y}}_{i} (t)$ using parameter estimates from Table IV. Denote $0 ⩽ t_{1} < t_{2} < \dots < t_{M}$ as the $M$ total observation times among all subjects. The estimate of $μ (t)$ is a step function with jumps at unique observation times: $\hat{μ} (t_{k}) = \frac{d \hat{𝒜} (t_{k})}{d \hat{Λ} (t_{k})} = \frac{\hat{𝒜} (t_{k}) - \hat{𝒜} (t_{k} -)}{\hat{Λ} (t_{k}) - \hat{Λ} (t_{k} -)}$ , $1 ⩽ k ⩽ M$ . More information on $\hat{𝒜} (t_{k})$ and $d \hat{Λ} (t_{k})$ can be found in Appendix E. Based on the residual plots of ${\hat{ϵ}}_{i} (t)$ against the observation times (shown in Appendix E), there was some evidence of lack of fit for large outcome values, but it was not systematic with respect to time and was similar across all weighted methods.

5. Discussion

In this paper, we evaluated the statistical properties of currently available and newly extended semi-parametric methods for the analysis of longitudinal data with outcome-dependent observation times. Table V summarizes the strengths and limitations of each method under various outcome-observation dependence mechanisms. The performance of each method hinges on the assumed mechanism of dependence between the outcome and observation-time processes. For conditional independence given covariates in the outcome model only (M1), all reviewed methods are appropriate. For conditional independence given observation-time model covariates only (M2), the Bůžková method is preferred. For conditional independence given unobserved latent variables only (M3), all methods perform well when the latent variables are independent of outcome-model covariates. However, if the distribution of the latent variables is covariate dependent, then the Sun method is preferred; if the effect of the latent variable in the outcome model is modified by any outcome-model covariates, then the Liang method is preferred. Under both (M2) and (M3), our extensions, the weighted-Liang and weighted-Sun methods, are the most flexible and remove the bias otherwise associated with the original Liang and Sun methods under (M2). In addition, our extension of the method by Liang et al. [15] allows time-dependent covariates in the observation-time process, which would otherwise not be possible.

Table V.

Summary of methods for various outcome-observation dependence mechanisms.

				Lin	Bůžková	Liang	Liang (extension)	Weighted-Liang (extension)	Sun	Weighted-Sun (extension)
Time-independent covariates in the observation-time model	Mechanism (M1)	Conditional independence given outcome-model covariates		+	+	+	+	+	+	+
	Mechanism (M2)	Conditional independence given observation-time model covariates		−	+	−	−	+	−	+
	Mechanism (M3) Conditional independence given latent variables (LV)	LV not associated with outcome-model covariates	LV distribution specified Gamma	+	+	+	+	+	+	+
		LV not associated with outcome-model covariates	LV distribution unspecified	+	+	+	+	+	+	+
		LV associated with outcome-model covariates	Effect of LV modified by Q(t)^*	−	−	−	+	+	−	−
		LV associated with outcome-model covariates	Distribution of LV based on Q(t)^*	−	−	−	−	−	+	+
	Mechanisms (M2) + (M3)			−	−	−	−	+/−	−	+/−
Time-dependent covariates in the observation-time model				+/−	+/−	N/A	+/−	+/−	+/−	+/−


+ appropriate; − not appropriate; +/− appropriate under certain situations; N/A not applicable.

^* $Q (t)$ is a subset of $X (t)$.

In practice, empirical model checking can be useful to decide which method is most appropriate. First, to decide between (M1) and (M2), one can focus on the observation-time model and perform a Wald test of the additional $q - p$ covariates [21]. If the Wald test yields a significant result, the data suggest (M2). Next, one can determine the necessity of latent variables in the outcome model using the variance of the latent variable $η_{i 2}$. If the estimated variance of the latent variable is small (i.e., close to 0), latent variables may not be required. One method to estimate $\mathrm{Var} [η_{i 2}]$ is to assume a parametric distribution for the latent variables, such as using Equation (13) if we can assume $η_{i 2}$ is Gamma distributed. The distribution of the latent variable in the observation-time model is unspecified in the Lin, Bůžková, and Sun methods but is assumed to be Gamma distributed in the Liang method. There is a lack of formal techniques to check the Gamma distribution assumption of the unobserved latent variable. A series of sensitivity analyses is recommended. Liang et al. [15] showed that the Liang method provided reasonable estimates for covariate-outcome association even if the distribution of the latent variable $η_{i 2}$ was misspecified, especially when the variance of the distribution was small. Robustness of the Liang and weighted-Liang methods to misspecification of the distribution of $η_{i 2}$ can be improved by replacing the estimate of $η_{i 2}$ by ${\hat{η}}_{i 2} = m_{i} / \int_{0}^{C_{i}} \exp \{{\hat{γ}}^{'} X_{i} (t)\} d Λ (t)$, removing any distributional assumption. The choice between the Liang and Sun methods rests upon whether the distribution of the latent variable is covariate dependent. An informal check is to partition the estimated variance of $η_{i 2}$ by the covariate values to determine if the partitioned variances are similar across levels of $X_{i} (t)$. We can also graphically display the density curves of ${\hat{η}}_{i 2}$ to check for covariate-dependent latent variables. Lastly, we can evaluate the overall fit of the models based on residuals. Formal model selection is an area of future research.
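The informal check described above can be sketched in Python as follows, under the simplifying assumption of time-independent covariates, so that ${\hat{η}}_{i 2}$ reduces to $m_{i}$ divided by $\exp \{{\hat{γ}}^{'} X_{i}\}$ times the estimated cumulative baseline intensity at $C_{i}$. The function names and inputs are hypothetical. The raw sample variance of ${\hat{η}}_{i 2}$ used here is only a crude stand-in: it also reflects Poisson variation in $m_{i}$ and is not the estimator in Equation (13).

```python
import math
import statistics

def eta2_hat(m, x, gamma_hat, cum_baseline_at_c):
    """Distribution-free estimate of eta_i2 for time-independent covariates."""
    linear = sum(g * xi for g, xi in zip(gamma_hat, x))
    return m / (math.exp(linear) * cum_baseline_at_c)

def variance_by_group(eta_hats, groups):
    """Sample variance of the eta_i2 estimates within each covariate level."""
    by_group = {}
    for eta, g in zip(eta_hats, groups):
        by_group.setdefault(g, []).append(eta)
    return {g: statistics.variance(v) for g, v in by_group.items()}

# Hypothetical example: 6 visits, one covariate equal to 0, and an estimated
# cumulative baseline intensity of 3 at the censoring time.
example = eta2_hat(m=6, x=[0.0], gamma_hat=[0.5], cum_baseline_at_c=3.0)

# Hypothetical estimates for two treatment groups (0 and 1).
partition = variance_by_group([1.0, 1.0, 0.5, 1.5], [0, 0, 1, 1])
```

Markedly different within-group variances, as in this toy example, would point toward a covariate-dependent latent variable and hence toward the Sun-type methods.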

Several features of the methods discussed here deserve comment. First, the semi-parametric outcome model does not require the estimation of $μ (t)$. However, the potential gain from the flexibility of the form of $μ (t)$ is countered by the potential loss in efficiency of estimation of the parameters of interest. Second, we assume that censoring times are independent of the outcome and observation-time model processes, that is, non-informative censoring. This assumption may be relaxed to allow censoring to depend on the outcome and observation-time processes by estimating $γ$ and $Λ$ using the method proposed by Huang et al. [39]. In addition, the parameters in the outcome model are time-independent, which may not be appropriate in some cases. We refer readers to the procedure in Sun et al. [16] to derive time-dependent regression coefficients. Third, our goal is to generate inference regarding the marginal association between a set of covariates and the outcome of interest, rather than to conduct formal causal inference. If we allow intervention on $X_{i} (t)$, modification of the exposure may influence not only the outcome of interest but also the occurrence of a visit. Hence, the quantification of the causal effect of the exposure on the outcome of interest requires techniques that establish the temporal association between exposure and outcome. A g-computation algorithm [40] or inverse-probability-of-treatment-weighted estimators [41] may provide insight into estimation of causal effects. Lastly, the observation-time process can be modeled on two time scales: total time scale (i.e., time-to-events model) in which each recurrent event is measured from a time of origin, and gap time scale (i.e., time-between-events model) in which the measure of interest is time between successive events [42]. The methods in this paper adopt the total time scale, but it may be appropriate to consider the alternative parameterization. The time-between-events approach is well studied within the recurrent events field [43], but the use of the gap time scale in the regression modeling of longitudinal data with outcome-dependent observation times warrants future research.

It is of interest to note that in the framework of incomplete data, GEE is able to accommodate missing completely at random data and the special case of covariate-dependent missingness [44,45]. Similarly, in the current focus on outcome-dependent observation times, GEE does provide reliable estimates of $β$ under (M1), assuming a correctly specified function of time in the outcome model. With the inclusion of observation-level inverse intensity weights, a weighted-GEE model may also provide reasonable estimates of $β$ under (M2) with the ease of currently available software packages [7]. However, the advantage of the methods in Section 2 is the flexibility provided by the non-parametric specification of the effect of time.

The methods we described are currently limited to linear models for continuous outcomes. Recent research has focused on the development of log-linear models for count outcomes [35, 46]. Future research will likely extend to semi-parametric models for binary outcomes. In addition, broader application of existing methods is likely hampered by the lack of available general-purpose statistical software. R code to generate Table IV along with some model-checking procedures is provided in Appendix F.

Supplementary Material

Supplementary material: NIHMS1971342-supplement-Supplementary_material.pdf (312 KB, PDF)

Acknowledgements

We gratefully acknowledge the University of Pennsylvania for supporting this research. We also thank the associate editor and reviewers for invaluable comments that greatly improved the manuscript.

Footnotes

Supporting information

Additional supporting information may be found in the online version of this article at the publisher’s web site.

References

1. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73(1):13–22.
2. Huang CY, Wang MC, Zhang Y. Analysing panel count data with informative observation times. Biometrika 2006; 93(4):763–775.
3. Sun J, Park DH, Sun L, Zhao X. Semiparametric regression analysis of longitudinal data with informative observation times. Journal of the American Statistical Association 2005; 100(471):882–889.
4. Lipsitz S, Fitzmaurice G, Ibrahim J. Parameter estimation in longitudinal studies with outcome-dependent follow-up. Biometrics 2002; 58(3):621–630.
5. Fitzmaurice GM, Lipsitz SR, Ibrahim JG, Gelber R, Lipshultz S. Estimation in regression models for longitudinal binary data with outcome-dependent follow-up. Biostatistics 2006; 7(3):469–485.
6. Lin H, Scharfstein DO, Rosenheck RA. Analysis of longitudinal data with irregular, outcome-dependent follow-up. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2004; 66(3):791–813.
7. Bůžková P, Lumley T. Longitudinal data analysis for generalized linear models with follow-up dependent on outcome-related variables. Canadian Journal of Statistics 2007; 35(4):485–500.
8. Ryu D, Sinha D, Mallick B, Lipsitz SR, Lipshultz SE. Longitudinal studies with outcome-dependent follow-up. Journal of the American Statistical Association 2007; 102(479):952–961.
9. Liu L, Huang X, O’Quigley J. Analysis of longitudinal data in the presence of informative observational times and a dependent terminal event, with application to medical cost data. Biometrics 2008; 64(3):950–958.
10. Troxel AB, Lipsitz SR, Fitzmaurice GM, Ibrahim JG, Sinha D, Molenberghs G. A weighted combination of pseudo-likelihood estimators for longitudinal binary data subject to non-ignorable non-monotone missingness. Statistics in Medicine 2010; 29(14):1511–1521.
11. Albert PS. A transitional model for longitudinal binary data subject to nonignorable missing data. Biometrics 2000; 56(2):602–608.
12. French B, Heagerty PJ. Marginal mark regression analysis of recurrent marked point process data. Biometrics 2009; 65(2):415–422.
13. Lin DY, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association 2001; 96(453):103–126.
14. Bůžková P, Lumley T. Semiparametric modeling of repeated measurements under outcome-dependent follow-up. Statistics in Medicine 2009; 28:987–1003.
15. Liang Y, Lu W, Ying Z. Joint modeling and analysis of longitudinal data with informative observation times. Biometrics 2009; 65(2):377–384.
16. Sun L, Song X, Zhou J. Regression analysis of longitudinal data with time-dependent covariates in the presence of informative observation and censoring times. Journal of Statistical Planning and Inference 2011; 141(8):2902–2919.
17. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data 2nd edn. Wiley: New York, 2002.
18. Brumback BA, Rice JA. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association 1998; 93:961–976.
19. Lin X, Carroll RJ. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association 2001; 96(455):1045–1056.
20. Pepe MS, Couper D. Modeling partly conditional means with longitudinal data. Journal of the American Statistical Association 1997; 92(439):991–998.
21. Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2000; 62(4):711–730.
22. Li Y, Ryan L. Survival analysis with heterogeneous covariate measurement error. Journal of the American Statistical Association 2004; 99(467):724–735.
23. Williamson JM, Datta S, Satten G. Marginal analyses of clustered data when cluster size is informative. Biometrics 2003; 59(1):36–42.
24. Seaman SR, Pavlou M, Copas AJ. Methods for observed-cluster inference when cluster size is informative: a review and clarifications. Biometrics 2014; 70(2):449–456.
25. Rotnitzky A, Robins JM. Semiparametric regression estimation in the presence of dependent censoring. Biometrika 1995; 82(4):805–820.
26. Zhao LP, Rotnitzky A, Robins JM. Analysis of semiparametric models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 1995; 90(429):106–121.
27. McCulloch CE, Neuhaus JM. Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Statistical Science 2011; 26(3):388–402.
28. Liu D, Kalbfleisch JD, Schaubel DE. A positive stable frailty model for clustered failure time data with covariate-dependent frailty. Biometrics 2011; 67(1):8–17.
29. Neuhaus JM, McCulloch CE. Separating between- and within-cluster covariate effects by using conditional and partitioning methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2006; 68(5):859–872.
30. Heagerty PJ, Kurland BF. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika 2001; 88(4):973–985.
31. Efron B, Tibshirani R. An Introduction to the Bootstrap 1st edn. Chapman & Hall/CRC: Boca Raton, 1993.
32. Field CA, Welsh AH. Bootstrapping clustered data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2007; 69(3):369–390.
33. Chernick MR. Bootstrap Methods: A Guide for Practitioners and Researchers 2nd edn. Wiley: New York, 2007.
34. Cheng G, Yu Z, Huang JZ. The cluster bootstrap consistency in generalized estimating equations. Journal of Multivariate Analysis 2013; 115:33–47.
35. Bůžková P, Lumley T. Semiparametric log-linear regression for longitudinal measurements subject to outcome-dependent follow-up. Journal of Statistical Planning and Inference 2008; 138(8):2450–2461.
36. Andrews DF, Herzberg A. Data: a Collection of Problems from Many Fields for the Student and Research Worker 1st edn. Springer: New York, 1985.
37. Sun J, Wei L. Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2000; 62(2):293–302.
38. Hu XJ, Sun J, Wei LJ. Regression parameter estimation from panel counts. Scandinavian Journal of Statistics 2003; 30(1):25–43.
39. Huang CY, Qin J, Wang MC. Semiparametric analysis for recurrent event data with time-dependent covariates and informative censoring. Biometrics 2010; 66(1):39–49.
40. Robins JM, Greenland S, Hu FC. Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association 1999; 94:687–700.
41. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11(5):550–560.
42. Cook RJ, Lawless J. The Statistical Analysis of Recurrent Events 1st edn. Springer: New York, 2007.
43. Huang X, Liu L. A joint frailty model for survival and gap times between recurrent events. Biometrics 2007; 63(2):389–397.
44. Rubin DB. Inference and missing data. Biometrika 1976; 63(3):581–592.
45. Little RJA. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 1995; 90(431):1112–1121.
46. Sun J, Tong X, He X. Regression analysis of panel count data with dependent observation times. Biometrics 2007; 63(4):1053–1059.
