Biostatistics (Oxford, England). 2018 Mar 28;20(3):433–451. doi: 10.1093/biostatistics/kxy010

The ROC curve for regularly measured longitudinal biomarkers

Haben Michael, Lu Tian, Musie Ghebremichael
PMCID: PMC6587928  PMID: 29608649

Summary

The receiver operating characteristic (ROC) curve is a commonly used graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. Estimation and inference procedures for the ROC curve are well studied in the cross-sectional setting. However, there is a paucity of research when both biomarker measurements and disease status are observed longitudinally. In a motivating example, we are interested in characterizing the value of longitudinally measured CD4 counts for predicting the presence or absence of a transient spike in HIV viral load, which is also time-dependent. The existing method neither appropriately characterizes the diagnostic value of observed CD4 counts nor efficiently uses status history in predicting the current spike status. We propose to jointly model the binary status as a Markov chain and the biomarker levels, conditional on the binary status, as an autoregressive process, yielding a dynamic scoring procedure for predicting the occurrence of a spike. Based on the resulting prediction rule, we propose several natural extensions of the ROC curve to the longitudinal setting and describe procedures for statistical inference. Lastly, extensive simulations have been conducted to examine the small-sample operating characteristics of the proposed methods.

Keywords: HIV/AIDS, Longitudinal binary outcomes, Longitudinal biomarker, Predictive value, Receiver operator characteristic (ROC) curve

1. Introduction

The receiver operating characteristic (ROC) curve is a graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. The curve consists of pairs of true positive and false positive rates as the threshold is varied (Swets and Pickett, 1982). Although many alternative measures have been proposed (Uno and others, 2007, 2013; Pencina and others, 2008; Steyerberg and others, 2010), the ROC curve remains the most commonly used in many fields. In medical research, ROC curves are often used to characterize the quality of a continuous biomarker as a diagnostic for binary statuses such as “diseased” versus “non-diseased” (Pepe, 2003; Zhou and others, 2009). Well-studied in the cross-sectional setting, the ROC curve has been generalized to settings where the outcome of interest is time-to-event (Heagerty and others, 2000; Zheng and Heagerty, 2004; Heagerty and Zheng, 2005) and where the biomarker is longitudinally measured (Foulkes and others, 2010; Liu and Albert, 2014). However, less research has considered both longitudinal biomarker measurements and binary statuses.
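As a concrete illustration of the cross-sectional definition, the following minimal sketch traces out the pairs of false positive and true positive rates as the threshold is varied. The function name `empirical_roc`, the convention that a score at or above the threshold is classified as positive, and the toy data are assumptions made for illustration only and are not taken from the paper.

```python
import numpy as np

def empirical_roc(scores, labels):
    """Return (fpr, tpr) arrays traced out as the threshold sweeps over the scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Thresholds: +inf (nothing flagged), then each distinct score in descending order.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    pos = labels == 1
    neg = labels == 0
    # Assumed convention: classify as positive when score >= threshold.
    tpr = np.array([(scores[pos] >= c).mean() for c in thresholds])
    fpr = np.array([(scores[neg] >= c).mean() for c in thresholds])
    return fpr, tpr

# Toy data (hypothetical): two controls and two cases.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
fpr, tpr = empirical_roc(scores, labels)
```

Plotting `tpr` against `fpr` gives the usual step-shaped empirical ROC curve.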

A motivating example is data gathered from the Yale Prospective Longitudinal Pediatric HIV Cohort. The cohort comprises 97 children born to HIV-infected mothers in the New Haven, CT, area since 1985. Various measurements were taken on the participants every 2–3 months over the 10-year period 1996–2006. Among these measurements, we focus on a continuous biomarker, CD4+ lymphocyte count, as a predictor of a binary outcome, “blip” status, the presence or absence of a transient spike in viral load (Paintsil and others, 2008).

Let Inline graphic and Inline graphic denote the biomarker value and a binary status of patient Inline graphic at visit Inline graphic, respectively, Inline graphic, Inline graphic The values Inline graphic may be direct assay measurements, as in the motivating example, or they may be derived or composite quantities. To assess the predictive value of the longitudinal biomarker Inline graphic for predicting Inline graphic, Liu and Wu (2003) and Liu and others (2005) propose a simple mixed effect regression model (Breslow and Clayton, 1993)

graphic file with name M10.gif (1.1)

where Inline graphic is the logit function and Inline graphic and Inline graphic are, respectively, the subject-specific random intercept and slope. Similar models are described in Foulkes and others (2010) and Albert (2012). The random vector Inline graphic is assumed to follow a Gaussian distribution

graphic file with name M15.gif

The ROC curve summarizing the diagnostic value of Inline graphic is then constructed based on pairs

graphic file with name M17.gif

where Inline graphic are estimates of the subject-specific random effects obtained from the observed data, for example,

graphic file with name M19.gif

and Inline graphic, Inline graphic, and Inline graphic are maximum likelihood estimators for the corresponding population parameters.

While this approach is simple and intuitive, we mention several limitations. First, the parametric assumptions may be too restrictive for some applications. For example, as discussed below, the Yale pediatric HIV data suggest greater dependency among biomarkers and disease statuses nearer in time. Neither is accounted for by model (1.1), which is symmetric in time. Second, the subject-specific random effect estimates Inline graphic and Inline graphic, as defined above, are not available at visit Inline graphic, as the biomarker levels Inline graphic and responses Inline graphic are not yet observed. Third, the approach uses the same data both to fit the model and, by using the fitted biomarkers to construct the ROC curve, to assess the quality of the fit. One expects such an assessment to overestimate the true diagnostic quality of the biomarker (Janes and others, 2009). Efforts to set aside a group of patients for validation after estimation encounter the difficulty that subject effect estimates for the validation patients are unavailable (Foulkes and others, 2010). Lastly and more conceptually, the notion of an ROC curve stands to be refined in the context of longitudinal measurements of multiple patients. In contrast to the cross-sectional setting, several useful ROC curves suggest themselves. For example, the predictive performance of the biomarker for a given patient, as determined by that patient’s history, can be quite different from the predictive performance for the entire patient population. In the next section, we propose a general framework to address these limitations.

2. Methods

We first note two properties desirable in a framework for assessing diagnostic performance in the longitudinal, multiple subject design under consideration. First, to assess the predictive performance of a biomarker, the biomarker should depend only on data that is available when a prediction is to be made. We adopt the vantage of a practitioner who has previous biomarker and status data for a patient, is confronted with a current biomarker for the patient, and must now predict current status. In the HIV example, due to the turnaround time of the tests involved, CD4 count or percentage is normally available before blip status. The patient’s history includes previous blip statuses, and the clinician may need to determine a course of treatment based on as-yet unavailable current blip status as predicted from current CD4 count or percentage. A similar problem is described by Yang and others (2015). Here, the “status” is the presence or absence of influenza-like illness in periodic reports issued by the Centers for Disease Control and Prevention (CDC), and the predictor or “biomarker” is real-time internet search data. The CDC’s reports describe outbreaks at a 1–3 week delay. When making real-time predictions with the search data, only CDC outbreak data referencing previous time points are available. In this case, an accurate early warning of the outbreak can be very important for public health.

As predictions may be made at different times, the accuracy of the prediction and the associated ROC curve will depend on time, with the corresponding prediction depending only on patient history available at that time. Second, two types of prediction performance should be differentiated: that for an individual patient and that for a patient population. For the former, we target the performance of Inline graphic as a predictor of Inline graphic where Inline graphic is a continuous score summarizing the predictive information contained in the history up to visit Inline graphic of a given patient Inline graphic and Inline graphic For the latter, we are interested in the predictive performance of Inline graphic in the entire patient population at a time Inline graphic, that is, marginalizing across patients.

In the following, we first generalize the simple mixed effect model (1.1) and discuss the two types of predictive performance under the proposed model. As discussed further below, easy extensions lead to more sophisticated models allowing for more flexible prediction rules. We assume that the longitudinal biomarker levels Inline graphic follow an autoregressive process conditional on disease status Inline graphic which are generated by a Markov chain as in, e.g., Azzalini (1994). Specifically, we assume that for the Inline graphicth patient

graphic file with name M39.gif (2.1)

where

graphic file with name M40.gif

independently and identically, and Inline graphic are hyperparameters. Inline graphic is set to 0 to initialize the autoregressive process, implying that the baseline biomarker level Inline graphic follows a Gaussian distribution conditional on the blip status. This set of parametric distributions for the random effects is chosen in part for convenience, as it permits the model parameters to be estimated using many standard statistical software packages. Specifically, Inline graphic can be estimated by fitting the linear mixed effects model (Laird and Ware, 1982)

graphic file with name M45.gif

and Inline graphic can be estimated by fitting the generalized mixed effects model

graphic file with name M47.gif

where Inline graphic and Inline graphic In addition, Inline graphic can be estimated by the observed proportion across patients at the initial visit, i.e., the baseline. More importantly, under this model, we may link Inline graphic with the observed history at visit Inline graphic via a random effects model

graphic file with name M53.gif (2.2)

where

graphic file with name M54.gif (2.3)

Thus model (2.1) generalizes model (1.1), insofar as the log-odds of positive disease status is modeled as linear in the subject’s most current biomarker level, although the distribution of the coefficients in this linear combination may differ between the two models. Generalizations of (2.1) that include more biomarkers, e.g., Inline graphic, correspond to higher order autoregressive processes in model (1.1).

2.1. Individual patient ROC curve

We would like to evaluate the predictive performance of the biomarker or score Inline graphic (or its history) for patient Inline graphic at time Inline graphic by contrasting two survival functions, Inline graphic and Inline graphic where

graphic file with name M61.gif

and

graphic file with name M62.gif

Inline graphic uses the available history, since under model (2.1), Inline graphic is conditionally independent of the remaining history given Inline graphic. We may then use the ROC curve Inline graphic or derived statistics such as the area under the ROC curve Inline graphic to summarize this contrast.

As Inline graphic, depend on the unknown subject-specific random effect Inline graphic the score Inline graphic is unavailable in practice. An ROC curve based on Inline graphic can only serve as a theoretical benchmark. We therefore estimate the random effect Inline graphic based on its conditional distribution Inline graphic where Inline graphic and use a plug-in estimator for Inline graphic . For example, we may estimate the random effects and Inline graphic by the posterior mean

graphic file with name M77.gif

and

graphic file with name M78.gif (2.4)

respectively (Robinson, 1991), where the functions Inline graphic are obtained by replacing all the relevant subject-specific random effects in (2.3) with their estimated counterparts based on Inline graphic For example,

graphic file with name M81.gif

Here, the subscript Inline graphic is used to emphasize that the prediction of the subject-specific random effect is made at visit Inline graphic using information up to visit Inline graphic The estimator Inline graphic depends on the subject only through the first argument, i.e., the patient history, and so the subscript Inline graphic has been dropped. An explicit expression for this choice of Inline graphic can be found in Appendix A of the supplementary material available at Biostatistics online. Using the estimated score Inline graphic (or Inline graphic if Inline graphic is unknown) to predict the disease status at visit Inline graphic, the predictive performance of patient Inline graphic’s biomarker at visit Inline graphic can be summarized by the ROC curve

graphic file with name M94.gif

where

graphic file with name M95.gif

is the subject- and visit-specific survival function of the estimated score. Inline graphic depends on the joint distribution of the random history Inline graphic and the response Inline graphic and thus also on the subject-specific random effect Inline graphic Since we do not have a convenient analytic expression for Inline graphic we resort to a Monte-Carlo method. Specifically, for the Inline graphicth patient:

  1. Simulate Inline graphic and Inline graphic according to model (2.1) using consistent estimates of the subject-specific random effect Inline graphic and the population parameter Inline graphic

  2. Compute Inline graphic according to (2.4).

  3. Repeat steps 1–2 a large number of times and calculate the empirical ROC curve Inline graphic of the resulting pairs Inline graphic
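The three steps above can be sketched as follows. Because the displayed form of model (2.1) and of the score (2.4) is not reproduced in this text, the sketch assumes a specific, simplified form: a two-state Markov chain for status with stay probabilities `p00` and `p11`, and a Gaussian AR(1) biomarker whose mean depends on current status. The score used here is the Bayes posterior probability of positive status given the previous visit, computed with the patient's parameters treated as known, rather than the plug-in estimator of (2.4). All names and parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def simulate_history(T, par, rng):
    """Step 1: simulate statuses y and biomarkers x under the ASSUMED model form."""
    mu = (par["mu0"], par["mu1"])
    y = np.empty(T, dtype=int)
    x = np.empty(T)
    y[0] = int(rng.random() < par["pi_init"])
    x[0] = mu[y[0]] + par["sigma"] * rng.standard_normal()
    for t in range(1, T):
        # Probability of positive status given the previous status.
        p_pos = par["p11"] if y[t - 1] == 1 else 1.0 - par["p00"]
        y[t] = int(rng.random() < p_pos)
        # AR(1) deviation around the status-dependent mean.
        carry = par["rho"] * (x[t - 1] - mu[y[t - 1]])
        x[t] = mu[y[t]] + carry + par["sigma"] * rng.standard_normal()
    return x, y

def score(x_t, x_prev, y_prev, par):
    """Step 2: posterior probability of positive status at the current visit."""
    mu = (par["mu0"], par["mu1"])
    prior1 = par["p11"] if y_prev == 1 else 1.0 - par["p00"]
    carry = par["rho"] * (x_prev - mu[y_prev])
    f1 = normal_pdf(x_t, mu[1] + carry, par["sigma"])
    f0 = normal_pdf(x_t, mu[0] + carry, par["sigma"])
    return prior1 * f1 / (prior1 * f1 + (1.0 - prior1) * f0)

def monte_carlo_auc(T, par, n_rep=2000, seed=0):
    """Step 3: repeat and summarize the (score, status) pairs by the empirical AUC."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_rep)
    status = np.empty(n_rep, dtype=int)
    for r in range(n_rep):
        x, y = simulate_history(T, par, rng)
        scores[r] = score(x[-1], x[-2], y[-2], par)
        status[r] = y[-1]
    cases, controls = scores[status == 1], scores[status == 0]
    diff = cases[:, None] - controls[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Hypothetical parameter values for one patient.
par = dict(p00=0.9, p11=0.8, mu0=0.0, mu1=-1.0, rho=0.4, sigma=1.0, pi_init=0.3)
auc = monte_carlo_auc(T=10, par=par)
```

The same simulated pairs could be passed to any empirical ROC routine to obtain the full curve rather than only its area.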

Inline graphic can serve as an approximation to the subject-specific ROC curve of the Inline graphicth patient at the Inline graphicth visit provided that this patient’s subject-specific parameters are known or can be estimated up to the desired accuracy. When this assumption is unmet, e.g., when the time Inline graphic is small and few observations on the patient of interest are available, we instead propose two alternative summaries of the diagnostic performance of the biomarker at the individual level.

The first is the average individual-specific ROC curve over the patient population,

graphic file with name M113.gif (2.5)

where the expectation is taken with respect to the random effect Inline graphic In practice, we may use Monte-Carlo methods, simulating a large number Inline graphic of random effects Inline graphic from the distribution for the random effect and estimating Inline graphic by

graphic file with name M118.gif

The resulting Inline graphic is not the ROC curve for any individual patient but the expected patient-level ROC curve for a typical patient from the given population. As before, when Inline graphic is unknown, we may replace it by a consistent estimator Inline graphic and let

graphic file with name M122.gif

Since Inline graphic is a smooth function of Inline graphic, Inline graphic is a consistent estimator for Inline graphic and Inline graphic converges weakly to a mean zero Gaussian process indexed by Inline graphic when Inline graphic converges weakly to a mean zero Gaussian distribution.

The second option is the limit

graphic file with name M130.gif (2.6)

As Inline graphicInline graphicInline graphic and Inline graphic converge to Inline graphicInline graphic, and Inline graphic, respectively, where

graphic file with name M138.gif

are subject-specific state probabilities of the stationary distribution of the 2-state Markov chain. Therefore, provided Inline graphic,

graphic file with name M140.gif (2.7)

where Inline graphic is the cumulative distribution function of the standard normal. Here, we used the fact that under model (2.1), Inline graphic given Inline graphic is normally distributed with mean Inline graphic and variance Inline graphic. When Inline graphic, i.e., a patient’s diseased and non-diseased biomarker means are the same, the posterior probability of positive event status (2.2) reduces to

graphic file with name M147.gif

the posterior probability of a 2-state Markov chain. Consequently, the ROC curve summarizes the performance of a 2-state Markov chain in predicting the next state in this case. This performance serves as a limiting case: when Inline graphic becomes small in magnitude, the biomarkers cease to provide useful discrimination, and the patient’s prior status carries all the information about current status.
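For reference, the stationary state probabilities of a two-state Markov chain have a standard closed form, which the sketch below computes and checks numerically. The names `p00` and `p11` (probabilities of remaining in the negative and positive state, respectively) are notation assumed here for illustration; the paper's subject-specific parametrization is not reproduced in this text.

```python
import numpy as np

def stationary_two_state(p00, p11):
    """Stationary probabilities (pi0, pi1) of a 2-state chain with stay probabilities p00, p11."""
    denom = 2.0 - p00 - p11
    if denom <= 0.0:
        raise ValueError("stationary distribution is not unique when p00 = p11 = 1")
    pi1 = (1.0 - p00) / denom
    return 1.0 - pi1, pi1

# Hypothetical values: blips are fairly persistent but rarer than non-blips.
p00, p11 = 0.9, 0.8
pi = np.array(stationary_two_state(p00, p11))

# Check stationarity: pi P = pi for the transition matrix P.
P = np.array([[p00, 1.0 - p00],
              [1.0 - p11, p11]])
```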

Inline graphic can be viewed as the ROC curve for subject Inline graphic after adequate follow-up and therefore reflects the ultimate personalized diagnostic value of the biomarker for the Inline graphicth patient with the subject random effect Inline graphic It may or may not be similar to the population counterpart described in the next section. Inline graphic can be estimated by Inline graphic which is the same as Inline graphic with Inline graphic and Inline graphic being replaced by Inline graphic and Inline graphic respectively. Assuming that Inline graphic and Inline graphicInline graphic is consistent and Inline graphic converges to a mean zero Gaussian process. Therefore, the key assumption for estimating Inline graphic in practice is that Inline graphic be sufficiently large to allow acceptable estimation of the individual-specific random effect. The resulting estimated ROC curve can then be used to characterize the diagnostic value of the biomarker for an individual patient after sufficient follow-up.

Inference for Inline graphic and Inline graphic can be carried out with the parametric bootstrap. One simulates fresh data using the estimated population parameter Inline graphic from model (2.1) and obtains Inline graphic and Inline graphic, the estimators for the corresponding ROC curves, from the simulated data. The empirical distributions Inline graphic and Inline graphic based on a large number of simulations serve as approximations to the distributions of Inline graphic and Inline graphic respectively. Point-wise confidence intervals (CIs) of Inline graphic and Inline graphic can be constructed along these lines.
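The mechanics of a parametric bootstrap for point-wise intervals can be sketched on a much simpler model than (2.1). The example below swaps in a cross-sectional equal-variance binormal model, for which the ROC curve has the closed form Φ((μ1 − μ0)/σ + Φ⁻¹(u)), and uses percentile intervals as one simple choice. The model, function names, and sample sizes are all assumptions for illustration and are not the authors' procedure.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def simulate(n0, n1, mu0, mu1, sigma, rng):
    """Draw n0 controls and n1 cases from the assumed binormal model."""
    y = np.repeat([0, 1], [n0, n1])
    x = np.where(y == 1, mu1, mu0) + sigma * rng.standard_normal(n0 + n1)
    return x, y

def fit_binormal(x, y):
    """Estimate (mu0, mu1, sigma) with a pooled standard deviation."""
    mu0, mu1 = x[y == 0].mean(), x[y == 1].mean()
    resid = np.where(y == 1, x - mu1, x - mu0)
    sigma = np.sqrt((resid ** 2).sum() / (len(x) - 2))
    return mu0, mu1, sigma

def binormal_roc(u, mu0, mu1, sigma):
    """True positive rate at false positive rates u (higher score = positive)."""
    shift = (mu1 - mu0) / sigma
    return np.array([_STD_NORMAL.cdf(shift + _STD_NORMAL.inv_cdf(ui)) for ui in u])

rng = np.random.default_rng(1)
x, y = simulate(60, 40, 0.0, 1.0, 1.0, rng)   # stand-in for observed data
theta_hat = fit_binormal(x, y)
u = np.array([0.1, 0.3, 0.5])
roc_hat = binormal_roc(u, *theta_hat)

# Parametric bootstrap: simulate fresh data from the FITTED model and re-estimate.
boot = np.array([
    binormal_roc(u, *fit_binormal(*simulate(60, 40, *theta_hat, rng)))
    for _ in range(500)
])
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)  # point-wise 95% limits
```

For the longitudinal setting, `simulate` and `fit_binormal` would be replaced by simulation from, and re-estimation of, the fitted longitudinal model.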

Remark 2.1

One may be interested in the diagnostic value of Inline graphic at the Inline graphicth visit given the past history Inline graphic In this case, the ROC curve can be constructed based on the conditional survival function

Remark 2.1

In contrast to ROC curves based on Inline graphic or its estimator, this ROC curve reflects the predictive value of Inline graphic only. It also depends on the random effect Inline graphic, unknown at the visit Inline graphic One may also consider its expectation with respect to random effects or its limit when Inline graphic as an estimable alternative.

Remark 2.2

Both Inline graphic and Inline graphic are parametric in nature in that their summarization of the diagnostic value of the longitudinally measured biomarker is valid only if model (2.1) is correctly specified.

2.2. ROC curve for the patient population

The predictive performance of the biomarker across the entire population may be very different from that for an individual patient. For example, the latter does not take into account biomarker variation between patients, or differences between patients in the prior probabilities of positive status events. Were the data not longitudinal, we might consider the empirical ROC curve of biomarker–status pairs Inline graphic. To take accumulated patient data into account, we instead consider the ROC curve of Inline graphic, the patients’ biomarker scores (2.4) at a given time Inline graphic. The scores synthesize all the predictive information in the past history under model (2.1).

Conditionally on the population parameter Inline graphic, the patient scores are iid, and the empirical ROC curve is a valid metric for the predictive value of the scores regardless of the validity of the model being used to derive them. If model (2.1) is a good approximation to the true relationship between Inline graphic and Inline graphic, one may anticipate good prediction accuracy of the resulting score. A severely misspecified model may give a prediction score with poor performance. In either case, the ROC curve and derived statistics such as the area under the ROC curve remain objective measures for the predictive value of the scoring system.

Formally, assuming that Inline graphic in probability, the score Inline graphic converges to

graphic file with name M196.gif

where Inline graphic if the model is correctly specified. We are interested in estimating the ROC curve for the predictive value at the Inline graphicth visit

graphic file with name M199.gif (2.8)

where

graphic file with name M200.gif

We do so by plugging in the empirical survival function:

graphic file with name M201.gif (2.9)

where Inline graphic

graphic file with name M203.gif

and Inline graphic is the event indicator function. Similarly, the area under the ROC curve, the concordance statistic, may be estimated as

graphic file with name M205.gif
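A minimal sketch of the concordance form of the area under the empirical ROC curve follows: the proportion of case–control pairs in which the case has the higher score. Counting tied pairs as one half is a common convention assumed here, and the function name and toy data are hypothetical.

```python
import numpy as np

def concordance_auc(scores, labels):
    """Proportion of (case, control) pairs ranked correctly; ties count one half."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    cases, controls = s[y == 1], s[y == 0]
    # All pairwise differences: rows index cases, columns index controls.
    diff = cases[:, None] - controls[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy data (hypothetical): three of the four case-control pairs are concordant.
auc_toy = concordance_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```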

In Appendix B of supplementary material, available at Biostatistics online, we show that Inline graphic is a consistent estimator for Inline graphic and the distribution of Inline graphic converges to a mean zero Gaussian process under mild regularity conditions. The variance of Inline graphic can be approximated by an efficient resampling method. At each iteration, we first generate random weights Inline graphic from the unit exponential distribution and estimate Inline graphic under model (2.1) with the Inline graphicth observation weighted by Inline graphic Denote the estimator by Inline graphic and let

graphic file with name M215.gif

where

graphic file with name M216.gif

Obtaining in this way a large number of realizations of Inline graphic their empirical variance can be used to approximate that of Inline graphic Similar resampling methods can be used to make inference on Inline graphic, the area under the ROC curve at the Inline graphicth visit.
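The perturbation idea can be sketched for the area under the curve as follows. Each patient receives an independent unit-exponential weight, a weighted concordance statistic is recomputed, and the spread of the perturbed values approximates the sampling variability. As a simplification, this sketch holds the scores fixed, whereas the procedure described above also re-estimates the model parameters with the weighted observations. Function names and the simulated data are assumptions.

```python
import numpy as np

def weighted_auc(scores, labels, weights):
    """Concordance statistic with each case-control pair weighted by w_case * w_control."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    w = np.asarray(weights, dtype=float)
    case, ctrl = y == 1, y == 0
    pair_w = w[case][:, None] * w[ctrl][None, :]
    diff = s[case][:, None] - s[ctrl][None, :]
    concordant = (diff > 0) + 0.5 * (diff == 0)
    return float((pair_w * concordant).sum() / pair_w.sum())

def perturbation_se(scores, labels, n_rep=500, seed=0):
    """Standard error approximated by the spread of exponentially reweighted AUCs."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    reps = [weighted_auc(scores, labels, rng.exponential(1.0, size=n))
            for _ in range(n_rep)]
    return float(np.std(reps, ddof=1))

# Hypothetical scores for 80 patients at a single visit.
rng = np.random.default_rng(42)
labels = np.repeat([0, 1], [50, 30])
scores = rng.standard_normal(80) + 1.0 * labels
auc_hat = weighted_auc(scores, labels, np.ones(80))
se_hat = perturbation_se(scores, labels)
```

With unit weights, `weighted_auc` reduces to the ordinary concordance statistic.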

The predictive value of the biomarker in the entire population also varies with the visit Inline graphic. With more visits and richer data observed, the predictive ability of the updated scoring system is expected to increase. We may study the trend of predictive value from visit Inline graphic to Inline graphic by simultaneously estimating Inline graphic and Inline graphic. It is not difficult to show that

graphic file with name M226.gif

can be approximated by a multivariate mean-zero Gaussian distribution, based on which joint inference for the predictive value at all visits of interest may be conducted.

When the predictive value of the constructed scoring system varies only moderately from visit Inline graphic to Inline graphic, i.e., Inline graphic are similar, it is tempting to estimate the ROC curve by the average predictive value between these two visits. To this end, one may empirically construct an ROC curve as

graphic file with name M230.gif

where

graphic file with name M231.gif

and Inline graphic Since it averages observations from multiple visits, Inline graphic can be substantially more stable than Inline graphicInline graphic is a consistent estimator of

graphic file with name M236.gif

where

graphic file with name M237.gif

a weighted average of Inline graphic Statistical inference based on Inline graphic can be made by resampling methods similar to those previously described.

Remark 2.3

Despite some similarities, Inline graphic and Inline graphic are quite different. The former is parametric and interpretable only when model (2.1) is correctly specified, while the latter is non-parametric in nature. The former, ignoring the variation in biomarkers across patients, tends to be smaller than the latter.

Remark 2.4

The proposed ROC curves depend on the patient history Inline graphic through the biomarkers estimates Inline graphic We may consider other functions of Inline graphic given by different statistical models of the response. More generally, one may consider a working regression model

Remark 2.4

and construct the ROC curve based on

Remark 2.4

where Inline graphic is a parametric function of observed history and Inline graphic and Inline graphic are the model parameter and its appropriate estimator, respectively.

2.3. Extension

In model (2.1), we assume that (i) the underlying disease status follows a simple Markov chain, i.e., the distribution of Inline graphic only depends on Inline graphic and (ii) the distribution of the biomarker level at visit Inline graphic, Inline graphic only depends on Inline graphic and Inline graphic; see Figure 1a, which diagrams the probability generating process described in (2.1). There are several obvious extensions:

  1. The distribution of Inline graphic depends on Inline graphic (Figure 1b).

  2. The distribution of Inline graphic depends on Inline graphic (Figure 1c).

Fig. 1. Schematic of the data-generation process described by (2.1) and extensions. (a) Model (2.1). (b) Inline graphic generated from Inline graphic. (c) Inline graphic generated from Inline graphic.

Adapting model (2.1) to the first setting, where the biomarker value depends not only on the current disease status but also the status at the previous visit, gives:

graphic file with name M264.gif (2.10)

where Inline graphic is the subject-specific random effect and Inline graphic is independent Inline graphic. Under this model

graphic file with name M268.gif

where

graphic file with name M269.gif

Therefore, besides the terms in (2.2), model (2.10) leads to additional interaction terms Inline graphic and Inline graphic contributing to the prediction of the disease status at the Inline graphicth visit, Inline graphic

For the second setting, we may assume that

graphic file with name M274.gif

where

graphic file with name M275.gif

In other words, the transition probability of the underlying disease status depends on the biomarker level at the prior visit. Under this model

graphic file with name M276.gif

where

graphic file with name M277.gif

Therefore, compared with (2.2), there is an additional interaction term Inline graphic contributing to the prediction of the disease status at the Inline graphicth visit, Inline graphic There may be more extensive generalizations of model (2.2) such as the combination of extensions (1) and (2) or higher order Markov chains for Inline graphic. As mentioned in the previous sections, while the validity of the individualized ROC curve depends on the correct model specification, the population-based ROC curves Inline graphic and Inline graphic can be constructed for scoring systems developed under different modeling assumptions and used to compare different models in terms of their predictive ability.

3. Example

The goal of highly active antiretroviral therapy in the treatment of HIV is to keep a patient’s CD4 count high and to suppress viral load. CD4 count measures immunosuppression, the risk of opportunistic infections, and the strength of the immune system. Viral load is the amount of HIV in a sample, indicative of, among other things, transmission risk. Although viral load is regarded as a better indicator of disease status, it is also more expensive and time-consuming to measure than CD4 count. According to clinical guidelines, both are tested regularly in a typical treatment regimen and used to guide subsequent treatment.

Even when therapy is effective and viral load is clinically categorized as suppressed, transient spikes in viral load, or “blips,” are observed. The clinical significance of viral blips is not well understood. While some studies have reported that viral blips are of no clinical significance, others have reported an association between viral blips and virologic failure. The identification of the predictors of viral blips and the association between viral blips and CD4+ T-cell changes over time are subjects of ongoing research (see Paintsil and others, 2016, and references therein).

We consider the accuracy of absolute CD4+ T-lymphocyte count as a predictor of blip status among children. We analyzed longitudinal data from HIV-infected children enrolled in the Yale Prospective Longitudinal Cohort study comprising 97 children born to HIV-infected mothers in the greater New Haven, CT, area since 1985. The predictor CD4 count measures the number of CD4 cells/mm³ of blood and the response blip status is defined as a viral load equal to or exceeding 50 copies/mL. The median number of visits/patient is 33, with 1st and 3rd quartiles of 15 and 47 visits, respectively. For all of the 3309 visits in the data set, the median time between visits is exactly 90 days, with 1st and 3rd quartiles of 57 and 112 days, respectively, giving approximately evenly spaced visits during the follow-up. The average age at enrollment is 6.7 years (standard deviation: 3.9 years). Figure S1(a) of supplementary material available at Biostatistics online summarizes the dates of visits in the lifetimes of the subjects. Further details on the cohort and definitions used here can be found in Paintsil and others (2008) and the references therein. Sixteen patients with fewer than 10 visits were removed in order to allow for estimation of the individual ROC, Inline graphic as discussed in Section 2, leaving 81 subjects.

The choice of how to group longitudinal observations is an important issue in many cohort-based longitudinal data analyses, including ours. At each visit, measurements including CD4 count and blip status are taken, and antiretroviral treatment is administered. Therefore, the visit may serve as a surrogate for the number of treatments administered since baseline. While the specific enrollment time varies, the majority of the enrolled children (average age of 6.7 years) are in the early stages of treatment at baseline, and therefore it may be sensible to align observations according to their visit numbers. When the sample size allows, the analysis can be restricted to a more homogeneous subgroup of children with similar conditions at the baseline, which makes grouping by visits still more interpretable.

A crude indication of the value of CD4 as a predictor of blip status is given in Figure S1(b) of supplementary material available at Biostatistics online, which plots the histograms of CD4, aggregated over patients and visits, conditional on positive and negative blip status. Despite the large overlap, there is a clear location shift between the two distributions. Figure S2 of supplementary material available at Biostatistics online plots the trajectories of CD4, with plotting shape encoding blip status, for a representative sample of subjects. The long sequences of like shapes, even as CD4 fluctuates wildly, suggest previous blip status as a predictor of future blip status, motivating the Markov structure in model (2.1). Finally, Figure S3 of supplementary material available at Biostatistics online is a heatmap of the empirical correlation matrix among CD4 measurements on the first 40 visits, for the 44 patients with 40 or more visits. The entries tend to decrease in magnitude moving away from the diagonal. This correlation structure accords with the weak dependency implied by the autoregressive structure in model (2.1).

We apply the ROC estimation procedure described in Section 2 to the pediatric HIV data in order to assess the value of the past CD4 counts and blip statuses as a predictor of current blip status. The MLE Inline graphic (95% CI (0.28,0.63)) describes the strong autoregressive dependency of CD4, as suggested by the heatmap. Similarly, the strong dependence between previous and current blip status is confirmed by the population transition probabilities Inline graphic, giving the probabilities of remaining in the negative and positive blip states, respectively, in successive visits. A 95% CI for the difference Inline graphic is (0.05,0.12). The difference between the CD4 standardized means conditional on negative versus positive blip status, Inline graphic (95% bootstrap CI (0.63,0.91)), is consistent with the location shift apparent from Figure S1(b) of supplementary material available at Biostatistics online.

The resulting time-indexed ROC curves at time Inline graphic and their associated 95% CIs are summarized in Figure S4 of supplementary material available at Biostatistics online. The time Inline graphic was chosen to be consistent with our exclusion of patients with fewer than 10 visits from the analysis. As Figure S1(a) of supplementary material available at Biostatistics online shows, ROC curves at later time points are available if one is willing to exclude patients with insufficient visits. The CIs are constructed by a bootstrap with Inline graphic bootstrap samples. Despite the noisy data presented in Figures S1(b) and S2 of supplementary material available at Biostatistics online, the risk score taking into account both previous CD4 values and blip status performs reasonably well as a predictor of the current disease status. We also plot the time-asymptotic individual ROC curve Inline graphic for selected patients. Patient no. 14 exhibits a non-smooth curve. The “elbow” arises when a patient’s previous disease status is significantly more predictive than the patient’s biomarkers of future disease status. In such cases, the ROC curve approximates the discrete behavior of a threshold predictor. As mentioned above, the validity of the individualized ROC curves depends on the correct specification of model (2.1). If we view model (2.1) as a working device used to derive a risk score for predicting the blip status, we may use Inline graphic as well as Inline graphic to summarize the predictive value of the scoring system between visits Inline graphic and Inline graphic. Due to the small sample size and infrequent occurrence of blips, we construct Inline graphic and its 95% CI as shown in Figure S4 of supplementary material available at Biostatistics online. The area under the ROC curve is 0.865, with a bootstrap standard error of 0.008, also indicating good predictive value. The jagged shape of the ROC curve reflects the fact that few of the 81 patient scores lie in the overlap of the case and control distributions.
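As an illustration of the quantities reported above, the following sketch computes an empirical AUC, the empirical ROC curve at a fixed false positive rate, and a bootstrap standard error. It assumes one score and one status per patient at a given visit and uses a nonparametric bootstrap that resamples patients; the paper's own resampling scheme may differ. All function names and the toy data are hypothetical.

```python
import math
import random


def _split(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    return pos, neg


def auc(scores, labels):
    """Mann-Whitney form of the area under the empirical ROC curve."""
    pos, neg = _split(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores, labels, fpr):
    """Empirical ROC(fpr): TPR at the lowest cutoff whose FPR is <= fpr."""
    pos, neg = _split(scores, labels)
    neg.sort(reverse=True)
    k = math.floor(fpr * len(neg) + 1e-9)  # controls allowed above the cutoff
    if k >= len(neg):
        return 1.0
    cutoff = neg[k]  # call a patient positive when score > cutoff
    return sum(p > cutoff for p in pos) / len(pos)


def bootstrap_se(scores, labels, stat, n_boot=500, seed=0):
    """Standard error of `stat` from resampling patients with replacement."""
    _split(scores, labels)  # fail early if one class is missing
    rng = random.Random(seed)
    n = len(scores)
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking cases or controls
            reps.append(stat([scores[i] for i in idx], ys))
    mean = sum(reps) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (n_boot - 1))


# Toy scores and statuses, made up for illustration.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
toy_auc = auc(scores, labels)
toy_tpr = tpr_at_fpr(scores, labels, 0.25)
toy_se = bootstrap_se(scores, labels, auc, n_boot=200, seed=1)
```

With few scores in the overlap of the two classes, `tpr_at_fpr` changes only at a handful of cutoffs, which is one way to see why an empirical curve of this kind looks jagged.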

As a comparison, we also plot in Figure 2 the ROC curve based on scores Inline graphic obtained by fitting the simple random effect model (1.1). As expected, the resulting ROC curve is higher than Inline graphic and Inline graphic. However, using fitted values as scores to predict the blip status requires information not available at visit Inline graphic and is therefore not a comparable measure of the predictiveness of the biomarker at that time.

Fig. 2.

The ROC curve when fitted values under a random effects model (1.1) are used as scores compared with Inline graphic and Inline graphic (pediatric HIV data).

4. Simulation

In this section, we investigate the finite-sample performance of the proposed method. To this end, we generate data sets mimicking the pediatric HIV data. Specifically, Inline graphic are simulated under model (2.1) with the population parameter Inline graphic being the maximum likelihood estimator obtained from the HIV data. We use a Monte-Carlo approximation for the underlying true ROC curves Inline graphic and Inline graphic. We also calculate Inline graphic based on the analytic expression of Inline graphic given in (2.7) for selected random effects Inline graphic. Next, we generate 500 data sets, each consisting of Inline graphic patients with Inline graphic visits each to match the HIV example. For each simulated data set, we estimate

  1. the expected individual-specific ROC curve Inline graphic at Inline graphic and its 95% point-wise CI using the parametric bootstrap method;

  2. the limiting individual-specific ROC curve Inline graphic for selected patients;

  3. the population ROC curve Inline graphic and its 95% point-wise CI using the resampling method, for Inline graphic
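The data-generating step of such a simulation can be sketched as follows. The sketch follows only the verbal description of the model in the Summary (binary status as a two-state Markov chain, biomarker as an autoregressive process conditional on status); it omits subject-level random effects and is not a reproduction of model (2.1). Every parameter value below is a hypothetical placeholder, not an estimate from the HIV data, and the function names are invented for illustration.

```python
import random


def simulate_patient(n_visits, p00, p11, mu0, mu1, rho, sigma, rng):
    """Simulate one patient's status path and biomarker path.

    Status: two-state Markov chain with stay probabilities p00 and p11.
    Biomarker: status-specific mean plus an AR(1) deviation with
    coefficient rho (|rho| < 1) and innovation SD sigma.
    """
    status = [int(rng.random() < 0.5)]  # arbitrary initial state (assumption)
    for _ in range(n_visits - 1):
        stay = p11 if status[-1] == 1 else p00
        status.append(status[-1] if rng.random() < stay else 1 - status[-1])
    # Start the AR(1) deviation from its stationary distribution.
    dev = rng.gauss(0.0, sigma / (1.0 - rho ** 2) ** 0.5)
    marker = []
    for d in status:
        marker.append((mu1 if d == 1 else mu0) + dev)
        dev = rho * dev + rng.gauss(0.0, sigma)
    return status, marker


def simulate_dataset(n_patients, n_visits, p00=0.9, p11=0.8, mu0=0.0,
                     mu1=1.0, rho=0.5, sigma=1.0, seed=0):
    """Return a list of (status, marker) pairs, one per patient."""
    rng = random.Random(seed)
    return [
        simulate_patient(n_visits, p00, p11, mu0, mu1, rho, sigma, rng)
        for _ in range(n_patients)
    ]


# Hypothetical sizes; the paper matches its sizes to the HIV example.
data = simulate_dataset(n_patients=50, n_visits=10, seed=1)
```

Fixing the seed makes each simulated data set reproducible, which is convenient when the same replicates are reused to compare several estimators.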

The resulting ROC curve estimates and 95% CIs based on one generated data set are presented in Figure 3. To evaluate the performance of the proposed method, we estimate the empirical bias of the point estimators as well as the coverage level of the 95% CIs at selected Inline graphic for both Inline graphic and Inline graphic, Inline graphic. For each estimator of interest, we also calculate the empirical average of the estimated standard errors and the empirical standard error. The detailed simulation results for Inline graphic and Inline graphic are summarized in Table 1. The empirical biases are reasonably small in magnitude. The average estimated standard errors of all estimators are very close to the empirical standard errors, and the coverage levels of the 95% CIs are consistent with the nominal level, allowing for Monte-Carlo simulation error. In general, as expected, the population ROC curve tends to be higher than the individualized counterpart at the same visit. For estimating Inline graphic, we compare the area under the ROC curve (AUC), Inline graphic, based on data from an increasing number of visits with the true limiting AUC value for selected random effects Inline graphic. We focus in particular on whether the estimator converges to the truth as the number of visits increases under this correct model specification. Figure S5 of supplementary material available at Biostatistics online plots the number of visits against the difference between estimated AUCs and the truth with five different realizations of Inline graphic, showing the expected convergence. The convergence may be too slow for some purposes, requiring data from a large number of visits in order to achieve the required estimation accuracy of the individual random effect.
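The four performance summaries used here (coverage, bias, average estimated standard error, and empirical standard error) can be computed from simulation replicates as in the sketch below. It assumes normal-approximation intervals of the form estimate ± 1.96 × SE, which is a simplification; bootstrap CIs need not have that form. The helper `coverage_mc_se` gives the binomial Monte-Carlo standard error of an empirical coverage proportion. Both function names and the toy replicates are hypothetical.

```python
import math
import statistics


def summarize_replicates(estimates, std_errors, truth, z=1.96):
    """Coverage (CVL), bias (BS), average SE (ASE), and empirical SE (ESE)."""
    covered = [
        abs(est - truth) <= z * se for est, se in zip(estimates, std_errors)
    ]
    return {
        "CVL": sum(covered) / len(covered),
        "BS": statistics.fmean(estimates) - truth,
        "ASE": statistics.fmean(std_errors),
        "ESE": statistics.stdev(estimates),
    }


def coverage_mc_se(nominal, n_reps):
    """Monte-Carlo standard error of an empirical coverage proportion."""
    return math.sqrt(nominal * (1.0 - nominal) / n_reps)


# Toy replicates, made up for illustration.
summary = summarize_replicates(
    estimates=[0.5, 0.6, 0.7, 0.4],
    std_errors=[0.1, 0.1, 0.1, 0.01],
    truth=0.55,
)
mc_se = coverage_mc_se(0.95, 500)
```

With 500 replicates and a nominal level of 0.95, `coverage_mc_se` is a little under 0.01, so empirical coverage within about two percentage points of 0.95 is unremarkable.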

Fig. 3.

Expected individual ROC Inline graphic (solid) and population ROC Inline graphic (dotted) at visit Inline graphic with 95% bootstrap CI; limiting individual ROC Inline graphic for four patients. The data were generated under model (2.1) with hyperparameters estimated from the pediatric HIV data.

Table 1.

Nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of Inline graphic and Inline graphic for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits Inline graphic and 35 (synthetic data using hyperparameters estimated from the pediatric HIV data, Inline graphic patients)

Each cell reports CVL (BS,ASE,ESE).

Visit | ROC            | FPR 0.10               | FPR 0.25               | FPR 0.50               | FPR 0.75
5     | Inline graphic | 0.94 (0.01,0.05,0.05)  | 0.96 (−0.00,0.04,0.04) | 0.94 (−0.01,0.03,0.03) | 0.95 (−0.00,0.02,0.02)
      | Inline graphic | 0.94 (−0.04,0.12,0.12) | 0.95 (−0.02,0.12,0.11) | 0.97 (−0.00,0.10,0.10) | 0.97 (−0.00,0.06,0.06)
15    | Inline graphic | 0.93 (−0.01,0.05,0.05) | 0.93 (−0.01,0.04,0.04) | 0.91 (−0.02,0.03,0.03) | 0.97 (−0.00,0.02,0.02)
      | Inline graphic | 0.96 (−0.03,0.13,0.12) | 0.96 (−0.01,0.12,0.11) | 0.98 (0.00,0.08,0.08)  | 0.95 (0.01,0.05,0.04)
25    | Inline graphic | 0.93 (−0.00,0.05,0.05) | 0.94 (−0.01,0.04,0.04) | 0.95 (−0.01,0.03,0.03) | 0.94 (−0.01,0.02,0.02)
      | Inline graphic | 0.95 (−0.02,0.13,0.12) | 0.97 (−0.00,0.11,0.11) | 0.95 (−0.01,0.07,0.07) | 0.92 (0.00,0.04,0.04)
35    | Inline graphic | 0.94 (−0.01,0.05,0.05) | 0.92 (−0.01,0.04,0.04) | 0.96 (−0.00,0.03,0.03) | 0.95 (−0.00,0.02,0.02)
      | Inline graphic | 0.96 (0.02,0.13,0.12)  | 0.94 (0.01,0.11,0.11)  | 0.94 (−0.00,0.08,0.07) | 0.92 (−0.00,0.04,0.04)

In the second set of simulations, we examine the performance of the proposal under model misspecification. Specifically, we simulate data under the random effect model (1.1), with model parameters taken to be the maximum likelihood estimators from the HIV data. As discussed above, the diagnostic value represented by the ROC curve based on Inline graphic cannot be achieved in practice but may serve as a benchmark. Since model (2.1) is misspecified, we focus on the population ROC curve only. First, we plot in Figure S6 of supplementary material available at Biostatistics online the true Inline graphic and that based on Inline graphic by setting Inline graphic and Inline graphic. As expected, by comparison with the benchmark, Inline graphic fails to reflect the predictive value of the observed history due to model misspecification. Next, we repeat the simulation with a sample size Inline graphic and Inline graphic 500 times to examine the finite-sample biases of the point estimators and coverage levels of the 95% CIs for estimating the true ROC curves. Table 2 confirms our expectation that the inference procedure for Inline graphic remains valid in the presence of model misspecification.

Table 2.

Misspecified model: nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of Inline graphic for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits Inline graphic and 35 (random slope-intercept logistic model, Inline graphic patients)

Each cell reports CVL (BS,ASE,ESE).

Visit | FPR 0.10               | FPR 0.25               | FPR 0.50               | FPR 0.75
5     | 0.97 (−0.00,0.14,0.13) | 0.97 (−0.00,0.12,0.10) | 0.96 (0.00,0.09,0.08)  | 0.92 (0.00,0.06,0.05)
15    | 0.94 (0.01,0.15,0.16)  | 0.95 (0.05,0.11,0.10)  | 0.92 (−0.00,0.07,0.06) | 0.92 (0.00,0.04,0.03)
25    | 0.94 (−0.01,0.15,0.15) | 0.93 (0.03,0.11,0.11)  | 0.96 (0.03,0.07,0.07)  | 0.93 (0.02,0.04,0.03)
35    | 0.93 (−0.08,0.14,0.11) | 0.97 (−0.03,0.11,0.09) | 0.95 (−0.00,0.06,0.06) | 0.95 (0.01,0.03,0.03)

5. Discussion

We have proposed a set of ROC-based metrics and statistical methods for evaluating the predictive value of a biomarker in a longitudinal, multiple-patient design. We emphasize three key principles in extending the ROC curve from the cross-sectional to the longitudinal setting: (i) the score used to construct the ROC curve should take into account all of the observed history; (ii) the score should not require unobserved history; and (iii) the predictive value of the biomarker at the individual and the population levels should be treated differently. These objectives are not met satisfactorily by the mixed effects model (1.1) available in the literature, where (i) a patient’s observations are conditionally independent given the subject effects, and in particular past observations are not taken into account in using the observation as a score; (ii) all time points are used to estimate the subject effects, so that the score estimate for a given time point is a function of the disease status it is intended to predict; and (iii) there is no distinction between patient and population ROC curves.

The current approach is developed based on a simple parametric model. While the parametric assumptions are plausible in light of our data, they are chosen mainly for convenience in implementation and motivated by the HIV data. Model checking is needed before the approach is applied to other data.

In the proposed approach, we assume that the biomarker is measured at a regular time interval, which is true in the HIV example. However, in clinical practice, the measurement time is often irregular and it may not be possible to group the measurements into comparable 1st, 2nd Inline graphic visits. Furthermore, even with regular measurements, grouping measurements by visits may not be interpretable if the baseline does not represent an interpretable “origin,” such as onset of disease or start of treatment. In such cases, the predictiveness of the biomarker needs to be evaluated with respect to the measurement history, including the actual measurement times. Doing so may require complex joint modeling of measurement time, biomarker level and disease status. Further research in this direction is warranted.

Supplementary Material

kxy010_Supplementary_Data

Acknowledgments

This research was partially supported by grants NIH/NAIDS 2P30 AI060354-11 from the National Institutes of Health and R01HL089778-05 from the National Heart, Lung, and Blood Institute. We thank the two referees and associate editor for their constructive comments. The authors also thank the study participants and the principal investigator of the study, Dr. Elijah Paintsil, for sharing the data with us.

Conflict of Interest: None declared.

References

  1. Albert, P. S. (2012). A linear mixed model for predicting a binary event from longitudinal data under random effects misspecification. Statistics in Medicine 31, 143–154.
  2. Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika 81, 767–775.
  3. Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.
  4. Foulkes, A. S., Azzoni, L., Li, X., Johnson, M. A., Smith, C., Mounzer, K. and Montaner, L. J. (2010). Prediction based classification for longitudinal biomarkers. The Annals of Applied Statistics 4, 1476.
  5. Heagerty, P. J. and Zheng, Y. (2005). Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105.
  6. Heagerty, P. J., Lumley, T. and Pepe, M. S. (2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344.
  7. Janes, H., Longton, G. and Pepe, M. (2009). Accommodating covariates in ROC analysis. Stata Journal 9, 17–39.
  8. Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963–974.
  9. Liu, D. and Albert, P. S. (2014). Combination of longitudinal biomarkers in predicting binary events. Biostatistics 15, 706–718.
  10. Liu, H. and Wu, T. (2003). Estimating the area under a receiver operating characteristic (ROC) curve for repeated measures design. Journal of Statistical Software 8, 1–18.
  11. Liu, H., Li, G., Cumberland, W. and Wu, T. (2005). Testing statistical significance of the area under a receiving operating characteristics curve for repeated measures design with bootstrapping. Journal of Data Science 3, 257–278.
  12. Paintsil, E., Ghebremichael, M., Romano, S. and Andiman, W. (2008). Absolute CD4+ T-lymphocyte count as a surrogate marker of pediatric HIV disease progression. Pediatric Infectious Disease Journal 7, 629–635.
  13. Paintsil, E., Martin, R., Goldenthal, A., Bhandari, S., Andiman, W. and Ghebremichael, M. (2016). Frequent episodes of detectable viremia in HIV treatment-experienced children is associated with a decline in CD4+ T-cells over time. Journal of AIDS & Clinical Research 7, 565–577.
  14. Pencina, M. J., D’Agostino, R. B. and Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine 27, 157–172.
  15. Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press, USA.
  16. Robinson, G. K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science 6, 15–32.
  17. Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M. J. and Kattan, M. W. (2010). Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology (Cambridge, MA) 21, 128.
  18. Swets, J. A. and Pickett, R. M. (1982). Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press Series in Cognition and Perception. New York, NY: Elsevier Science & Technology Books.
  19. Uno, H., Cai, T., Tian, L. and Wei, L.-J. (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association 102, 527–537.
  20. Uno, H., Tian, L., Cai, T., Kohane, I. S. and Wei, L.-J. (2013). A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Statistics in Medicine 32, 2430–2442.
  21. Yang, S., Santillana, M. and Kou, S. C. (2015). Accurate estimation of influenza epidemics using Google search data via ARGO. Proceedings of the National Academy of Sciences 112, 14473–14478.
  22. Zheng, Y. and Heagerty, P. J. (2004). Semiparametric estimation of time-dependent ROC curves for longitudinal marker data. Biostatistics 5, 615–632.
  23. Zhou, X.-H., McClish, D. K. and Obuchowski, N. A. (2009). Statistical Methods in Diagnostic Medicine, Volume 569. New York: John Wiley & Sons.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
