American Journal of Epidemiology. 2022 Jul 22;191(10):1820–1830. doi: 10.1093/aje/kwac125

Analyzing Longitudinally Collected Viral Load Measurements in Youth With Perinatally Acquired HIV Infection: Problems and Possible Remedies

Sean S Brummel, Russell B Van Dyke, Kunjal Patel, Murli Purswani, George R Seage, Tzy-Jyun Yao, Rohan Hazra, Brad Karalius, Paige L Williams; for the Pediatric HIV/AIDS Cohort Study
PMCID: PMC9767869  PMID: 35872591

Abstract

Human immunodeficiency virus (HIV) viral load (VL) is an important quantitative marker of disease progression and treatment response in people living with HIV infection, including children with perinatally acquired HIV. Measures of VL are often used to predict different outcomes of interest in this population, such as HIV-associated neurocognitive disorder. One popular approach to summarizing historical viral burden is the area under a time-VL curve (AUC). However, alternative historical VL summaries (HVS) may better answer the research question of interest. In this article, we discuss and contrast the AUC with alternative HVS, including the time-averaged AUC, duration of viremia, percentage of time with suppressed VL, peak VL, and age at peak VL. Using data on youth with perinatally acquired HIV infection from the Pediatric HIV/AIDS Cohort Study Adolescent Master Protocol, we show that HVS and their associations with full-scale intelligence quotient depend on when the VLs were measured. When VL measurements are incomplete, as can be the case in observational studies, analysis results may be subject to selection bias. To alleviate bias, we detail an imputation strategy, and we present a simulation study demonstrating that unbiased estimation of a historical VL summary is possible with a correctly specified imputation model.

Keywords: bias, HIV, imputation, missing data, viral load, viremia copy-years, youth

Abbreviations

AIDS: acquired immunodeficiency syndrome
AMP: Adolescent Master Protocol
AUC: area under the viral load curve
AUCt: time-averaged AUC
FSIQ: full-scale intelligence quotient
HIV: human immunodeficiency virus
HVS: historical viral load summary(ies)
MAR: missing at random
MCAR: missing completely at random
MNAR: missing not at random
PHACS: Pediatric HIV/AIDS Cohort Study

Human immunodeficiency virus (HIV) infection is a chronic lifelong condition that can affect the function of different organs and systems, causing morbidity and reduced quality of life. For example, HIV can penetrate the blood-brain barrier, resulting in an HIV-associated neurocognitive disorder ranging from minor to severe dementia (1, 2). This disorder has been linked to a low CD4 cell count nadir, a surrogate for past viral load exposure (3). Children living with perinatal HIV infection may be particularly vulnerable to development of neurocognitive deficits. These deficits may be reduced with earlier antiretroviral treatment, which decreases viral load burden (4). However, as children age into young adulthood, they often face challenges with adherence, resulting in increased viral load exposure (5) and subsequent risk of adverse neurocognitive outcomes. Therefore, it is important to understand viral load exposure from birth to adulthood when investigating associations of viral load with neurocognitive and other health outcomes.

 

When evaluating multiple past viral load measurements, it is often advantageous to collapse them into a single historical viral load summary (HVS). HVS have been used in HIV research for at least 20 years (6), particularly among adults with HIV. Commonly used summaries include the area under the viral load curve (AUC) (viremia copy-years) (7–13), time-averaged AUC (AUCt) (14–16), total duration and percentage of time above or below a viral load threshold (17–23), peak viral load (23–25), and age at peak viral load (25). The AUC has been described most often in the literature, but it may not be the most scientifically meaningful HVS, especially in pediatric populations or those with incomplete viral load histories. To broaden the literature and increase the use of alternative HVS, here we compare and discuss the HVS measures noted above.

In addition to selecting a scientifically meaningful HVS, data collection needs to be carefully considered, particularly in observational studies, because HVS may be subject to selection bias in the presence of incomplete records (15, 26). Incomplete records in this context create a unique missing-data problem, because the viral load measurements needed to compute the HVS are only partially available. In this paper, we describe selection bias due to incomplete viral load records and show that it is possible to reduce bias when using a flexible imputation strategy.

We first discuss key considerations in selecting a meaningful HVS, including timing and scale transformations. Then, using data from the Pediatric HIV/AIDS Cohort Study (PHACS) Adolescent Master Protocol (AMP), we show how HVS based on incomplete data can exhibit large discrepancies in comparison with imputation-based methods. Next, we evaluate the association of 6 HVS with neurocognitive outcomes while varying the timing of viral load measures and the types of transformations to reinforce the need to select the HVS carefully. Finally, we present a simulation study to show that a flexible imputation strategy can reduce bias.

SELECTING A HISTORICAL VIRAL LOAD SUMMARY

First, we discuss 3 time-related concepts to help choose the HVS for analysis: relative timing, range (time span), and transformation. Next, we motivate the need for an HVS. Last, we further define the 6 HVS.

Relative timing, range (time span), and transformation

Relative timing is defined as the set of viral loads selected for analysis relative to when the outcome of interest was measured. Selection of the relative timing should be based on the expected influence of viral load on the outcome of interest. For example, suppose viral load measurements early in life are expected to influence cognitive functioning later in life. In that case, an HVS using viral loads up to 3 years after birth could be associated with a full-scale intelligence quotient (FSIQ) obtained in adolescence. Alternatively, if inflammation markers (e.g., C-reactive protein) are thought to be influenced by more recent viral loads, an HVS could be computed from measurements taken within 1 year before the inflammation measurement. Different choices for the relative timing can change relationships with the outcome under study and change the interpretation of the HVS.

A second metric related to time is the range (time span), defined as the difference in time between the first and last measured viral loads. In the 2 previous examples, the HVS time range was 3 years for FSIQ and 1 year for the inflammation marker. The selected range over which the HVS is calculated may alter associations with outcomes. In addition, HVS can be standardized by the time span, resulting in alternative HVS with new interpretations.
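As a minimal illustration of relative timing and range, the Python sketch below selects the viral loads that fall in a chosen age window and reports the observed time span. It is not taken from the study's analysis code; the function names and the example trajectory are hypothetical.

```python
def select_window(measurements, start_age, end_age):
    """Keep (age_years, viral_load) pairs with start_age <= age <= end_age."""
    return [(age, vl) for age, vl in measurements if start_age <= age <= end_age]


def time_span(measurements):
    """Range (time span): time between the first and last measured viral loads."""
    ages = [age for age, _ in measurements]
    return max(ages) - min(ages) if ages else 0.0


# Hypothetical participant: (age in years, viral load in copies/mL).
trajectory = [(0.25, 250_000), (1.0, 80_000), (2.5, 12_000), (3.0, 900),
              (8.0, 400), (11.5, 50), (12.0, 50)]

# Early-life window (birth to age 3) for an outcome such as FSIQ.
early_life = select_window(trajectory, 0.0, 3.0)
# Recent window (1 year before an outcome measured at age 12).
recent = select_window(trajectory, 11.0, 12.0)

early_span = time_span(early_life)
recent_span = time_span(recent)
```

In this example the observed spans (2.75 years and 0.5 years) are shorter than the nominal 3-year and 1-year windows, which is the kind of participant-level variation that complicates interpretation when records are incomplete.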

Another important consideration is how viral loads are transformed. Viral loads are typically transformed using either log10 or an indicator function above or below a threshold (e.g., ≥400 copies/mL vs. <400 copies/mL). Both of these transformations reduce the relative difference of the viral loads (e.g., 5,000/500 = 10 but log10(5,000)/log10(500) ≈ 1.37) and minimize the influence of the tails of the viral load distribution. When computing an HVS, transformations can be applied before or after the data are aggregated across time. The importance of the tails of the viral load distribution in the outcome under study should guide the timing of the transformation. If most of the effect of viral load on the outcome is expected to occur at higher levels of viremia, it may be more appropriate to perform the transformation after aggregating across time. If moderate and high levels of viremia are expected to have similar effects on the outcomes, it may be more appropriate to perform the transformation first.
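To make the order of operations concrete, the following Python sketch contrasts the two orderings for a hypothetical participant with equally spaced measurements. It is an illustration only: the function names and example values are assumptions introduced here, and a simple unweighted mean stands in for aggregation across time.

```python
import math


def mean_of_log10(viral_loads):
    """Transform first, then aggregate: the average of log10(viral load)."""
    return sum(math.log10(v) for v in viral_loads) / len(viral_loads)


def log10_of_mean(viral_loads):
    """Aggregate first, then transform: log10 of the average viral load."""
    return math.log10(sum(viral_loads) / len(viral_loads))


# Hypothetical trajectory (copies/mL) with a single high-viremia spike.
example = [500, 500, 500, 500_000]

transform_first = mean_of_log10(example)  # the spike is dampened
aggregate_first = log10_of_mean(example)  # the spike dominates

# Ratio quoted in the text: log10(5,000) / log10(500).
ratio = math.log10(5_000) / math.log10(500)
```

Here transforming first gives about 3.45 log10 copies/mL, whereas aggregating first gives about 5.10, because the single spike dominates the raw-scale average. Aggregating first therefore preserves the influence of the upper tail, and transforming first treats moderate and high viremia more alike.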

Motivation for HVS

Once the relative timing, range, and transformation have been selected, associations with historical viral loads could be analyzed in a single generalized linear model with a distributed lag or a moving average (27). This strategy would work well if, for example, each participant had 2 viral loads measured at 1 year and 2 years before the outcome. However, model instability or model misspecification may be encountered when the number of viral load measurements is large relative to the number of participants. When multiple viral loads are collected opportunistically (e.g., chart abstraction), it may be difficult to align the viral load data, resulting in many missing observations. Therefore, aggregating to a single HVS is often a necessary analysis step.

Historical measures of viral load

Popular non–time-standardized HVS measures of the total burden of viral load include the AUC and the area under a viral load threshold (duration of viremia). These measures are useful when it is expected that the effect of viral load accumulates without recovery from earlier viral load exposures. If the time range of the HVS varies by participant, the AUC and the duration of viremia may be subject to selection bias. In addition, varying ranges make regression coefficients from generalized linear models difficult to interpret because changes in these cumulative measures could be due to either viral load or the time scale. One approach to alleviating this problem is to divide the AUC or the duration of viremia by the individual-level time span, resulting in time-averaged HVS.

Standardization of the AUC and the duration of viremia results in the AUCt and the percentage of time suppressed. Time standardization helps to reduce changes from person to person that are attributable to varying lengths of follow-up time and therefore makes regression coefficients easier to interpret. However, standardization may not remove selection bias if the effect of the viral load on the outcome depends on the timing of measurement. For example, suppose the effect of viral load on cognitive abilities is different in the first, second, and third years of life. In that case, analysis results of AUCt based on the first year of life will be different from those based on AUCt over the first 3 years of life. Accordingly, when viral loads are measured sporadically through the first 3 years of life, associations for AUCt will correspond to the ages at which the viral loads were measured.

Two other HVS include peak viral load and age at peak viral load. These HVS are easy to compute and analyze, and they measure the level and timing of the worst HIV disease severity. Relative timing and range of the viral loads can also strongly influence the interpretation of these HVS and can bias the detected HVS-outcome relationship, similar to AUCt. Further, these peak HVS cannot be standardized by time.
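The 6 HVS described above can be sketched in a few lines of Python for a single participant. This is a hedged illustration, not the authors' implementation: it assumes the AUC is computed on the log10 scale with the trapezoidal rule, that each interval between measurements takes the viremia status of the measurement that opens it, and that the threshold is 400 copies/mL. The function name `hvs_summaries` and the dictionary keys are hypothetical.

```python
import math


def hvs_summaries(ages, viral_loads, threshold=400.0):
    """Compute six illustrative HVS from one participant's measurements.

    ages: measurement ages in years, strictly increasing.
    viral_loads: viral loads in copies/mL, same length as ages.
    """
    if len(ages) != len(viral_loads) or len(ages) < 2:
        raise ValueError("need at least 2 paired measurements")
    span = ages[-1] - ages[0]  # range (time span) in years
    if span <= 0:
        raise ValueError("ages must be strictly increasing")
    log_vl = [math.log10(v) for v in viral_loads]

    auc = 0.0           # area under the log10 viral load curve
    viremic_time = 0.0  # years at or above the threshold
    for i in range(1, len(ages)):
        dt = ages[i] - ages[i - 1]
        auc += dt * (log_vl[i] + log_vl[i - 1]) / 2.0  # trapezoidal rule
        # Assumption: the interval inherits the status of its first measurement.
        if viral_loads[i - 1] >= threshold:
            viremic_time += dt

    peak_index = max(range(len(log_vl)), key=lambda i: log_vl[i])
    return {
        "auc": auc,
        "auct": auc / span,
        "duration_viremia": viremic_time,
        "pct_time_suppressed": 100.0 * (1.0 - viremic_time / span),
        "peak_log10_vl": log_vl[peak_index],
        "age_at_peak": ages[peak_index],
    }


# Hypothetical participant measured yearly from birth to age 3.
example = hvs_summaries([0.0, 1.0, 2.0, 3.0], [100_000, 10_000, 100, 100])
```

For this trajectory the AUC is 9.5 log10 copy-years, the AUCt is about 3.17, the duration of viremia is 2 years, and the percentage of time suppressed is about 33%. Dropping the first measurement would change every one of these values, which illustrates how the summaries depend on which viral loads happen to be available.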

EVALUATING HVS MEASURES USING THE PHACS-AMP STUDY

In this section, we introduce the PHACS-AMP Study. Next, we show that PHACS-AMP viral load availability depends on age. We then detail an imputation-based modeling strategy to account for missing data. Subsequently, we compare the imputation-based HVS and the HVS computed from partial information. Lastly, we correlate the 6 HVS with FSIQ while varying the relative timing. (See Web Appendix 1, available at https://doi.org/10.1093/aje/kwac125, for analysis code.)

The PHACS-AMP Study

PHACS-AMP is a prospective observational cohort study that enrolled and followed 451 youth living with perinatal HIV (28). Youth aged 7–16 years were enrolled in PHACS-AMP between 2007 and 2009; follow-up continued until participants reached age 18 years. The primary goals of PHACS-AMP were to evaluate the long-term effects of HIV and antiretroviral treatment. PHACS-AMP was approved by institutional review boards at Harvard T.H. Chan School of Public Health and all clinical research sites. Written informed consent was obtained from each parent or legal guardian, with assent from children as appropriate.

Data on lifetime medical history, including antiretroviral treatment, opportunistic infections, viral loads, CD4 cell counts, and medical conditions, were recorded at study entry and throughout follow-up visits. Study visits were scheduled at entry, 6 months, 1 year, 2 years, 2.5 years, 3 years, and annually thereafter until the participant reached age 18 years or left the study. All PHACS-AMP youth living with perinatal HIV are now off-study, with 76% completing the age 18 visit. The median age at enrollment was 12 years, with 53% female; 70% were Black or African-American, and 24% were Hispanic or Latino. The FSIQ was obtained using the Wechsler Intelligence Scale for Children–Fourth Edition (29) and the Wechsler Adult Intelligence Scale–Fourth Edition (30). FSIQ was measured on average 2 months, 2 years, 4 years, and 4.5 years after PHACS-AMP study entry.

PHACS-AMP viral load availability

Figure 1 shows the distribution of log10 viral load for each year of age among PHACS-AMP youth living with perinatal HIV. Median viral loads decreased from birth through early childhood and stabilized at approximately age 7 years. Viral load data availability ranged from 18% in the first year of life to a maximum of 75% at age 9 years. Over 90% had a viral load measurement taken in at least 6 of the 18 years.

Figure 1. Distribution of log10(viral load) by age and number of youth living with perinatal human immunodeficiency virus (HIV) infection with at least 1 age measurement, Pediatric HIV/AIDS Cohort Study Adolescent Master Protocol, United States, 2007–2016. The top and bottom of the box represent the 75th and 25th percentiles, the horizontal line inside the box represents the median, the whiskers are the 10th and 90th percentiles, and outliers are shown as circles. AIDS, acquired immunodeficiency syndrome.

It is difficult to interpret HVS when availability varies by participant so that each participant has a unique viral load relative timing and time span. On average, PHACS-AMP participants with a greater time span of measured viral loads earlier in life will have greater AUCs with a higher peak viral load. Therefore, results based only on the observed viral loads would be biased if participants with more measurements had different health outcomes.

Imputation model

Multiple imputation (31–33) can be used to alleviate this bias by accounting for this missing-data pattern. Multiple imputation has been described extensively elsewhere (34–36). Briefly, an imputation model is created using the relationships between variables among records with complete data. The imputation model is used to repeatedly fill in missing values to correct for bias and account for uncertainty. Analyses are conducted on each completed data set, and the results are averaged using the multiple imputation combination rules (33, 37).
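The combination step can be sketched as follows, assuming the combination rules referenced above are the standard ones often called Rubin's rules: the pooled estimate is the average of the per-data-set estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the example numbers are hypothetical and are not drawn from the PHACS-AMP analysis.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = mean(estimates)       # average of the m estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical regression coefficients from 3 completed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

With these inputs the pooled estimate is 2.0 and the total variance is 0.5 + (4/3)(1.0), or about 1.83. The between-imputation term is what carries the extra uncertainty due to the missing values.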

To account for repeated measurements within an individual, we developed an imputation model using a longitudinal linear mixed-effects model with a random intercept and random slope for age and antiretroviral drug group. To account for the lack of viral suppression resulting from resistance or nonadherence, we modeled antiretroviral drugs with a random effect. Viral load data were divided into 3-month intervals from birth to the last clinic visit, so there were few repeated participant-level measurements within each bin.

Variables known to be associated with viral load included in our multiple imputation model were year of birth (grouped: 1991–1993, 1994–1995, 1996–1998, or 1999–2002), year of viral load measurement (grouped: 1991–2000, 2001–2004, 2005–2008, or 2009–2017), use of antiretroviral drugs (no antiretroviral drugs, antiretroviral treatment (including at least 3 antiretroviral drugs in 2 or more drug classes), or non–antiretroviral treatment antiretroviral drugs), and age. Age, CD4 cell count, CD4 cell percentage (CD4%), and CD8 cell count were transformed using natural splines. These splines are cubic splines with a linear constraint at the boundary (38). The Bayesian information criterion was used to select the spline degrees of freedom. To satisfy normality and constant variance assumptions, we fitted models after applying a log transformation (log10 of viral load; loge of CD4 count, CD4%, and CD8 count). To account for missing data on CD4 count, CD4%, and CD8 count, we used multiple imputation by chained equations (39). Chained equations were cycled through 10 times to create 25 imputation-based data sets.

Comparing the imputation-based HVS with the HVS from partial information

The participant-level HVS estimates from the imputation-based data were compared with an HVS based on the observed data using a scatterplot (Figure 2, Web Figure 1). The x-axis displays the HVS based only on the observed PHACS-AMP data, and the y-axis displays the HVS based on the imputation strategy. Points above the y = x reference line indicate a higher individual HVS based on the imputation strategy, while values below the y = x reference line indicate a higher individual-level HVS based on the observed viral loads. Cumulative viral load measures were lower, on average, when based only on the observed data. The percentage of time of suppressed viral load was higher for observed values than for imputed measures. The correlation, a measure of linear relationship strength, between the HVS imputation-based estimate and the HVS based only on observed data was 0.64 for AUC, 0.87 for AUCt, 0.86 for duration of viremia greater than or equal to 400 copies/mL, 0.91 for percentage of time with a viral load less than 400 copies/mL, 0.25 for age at peak viral load, and 0.25 for peak viral load.

Figure 2. Comparison of the imputation-based historical viral load summary (HVS) with the HVS based only on observed data for youth living with perinatal human immunodeficiency virus (HIV) infection, Pediatric HIV/AIDS Cohort Study Adolescent Master Protocol, United States, 2007–2016. AIDS, acquired immunodeficiency syndrome; AUC, area under the viral load curve; AUCt, time-averaged AUC.

Associations with HVS and neurocognitive outcomes

We sought to quantify average differences and directions of various exposure-outcome relationships by varying 1) the HVS, 2) the time span over which the HVS were calculated, and 3) the specific neurocognitive outcomes. Imputation-based regression coefficients, standardized by the HVS standard deviation, were plotted to allow for comparisons across the HVS (Figure 3, Web Figure 2). Unadjusted regression analyses were conducted for illustrative purposes.
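A standardized coefficient of this kind can be sketched for a single completed data set as the ordinary least-squares slope multiplied by the standard deviation of the HVS, which gives the mean difference in the outcome per 1-standard-deviation difference in the HVS. The Python sketch below is illustrative only; the function name and the data are hypothetical, and pooling across imputed data sets is omitted.

```python
from statistics import mean, stdev


def standardized_slope(hvs, outcome):
    """Unadjusted OLS slope of outcome on HVS, scaled by the HVS standard deviation."""
    if len(hvs) != len(outcome) or len(hvs) < 2:
        raise ValueError("need at least 2 paired observations")
    x_bar, y_bar = mean(hvs), mean(outcome)
    sxx = sum((x - x_bar) ** 2 for x in hvs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hvs, outcome))
    slope = sxy / sxx          # outcome units per 1 unit of HVS
    return slope * stdev(hvs)  # outcome units per 1 SD of HVS


# Hypothetical AUCt values and FSIQ scores for 5 participants.
auct_values = [1.0, 2.0, 3.0, 4.0, 5.0]
fsiq_scores = [100.0, 98.0, 96.0, 94.0, 92.0]
difference_per_sd = standardized_slope(auct_values, fsiq_scores)
```

Because each HVS is expressed in its own units (years, percentage points, log10 copies/mL), scaling by the standard deviation places the coefficients on a common footing for plotting.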

Figure 3. Mean difference in full-scale intelligence quotient (FSIQ) for a 1–standard-deviation (SD) difference in the imputed historical viral load among youth living with perinatal human immunodeficiency virus (HIV) infection, varying the time spans of the viral loads, Pediatric HIV/AIDS Cohort Study Adolescent Master Protocol, United States, 2007–2016. AIDS, acquired immunodeficiency syndrome; AUC, area under the viral load curve; AUCt, time-averaged AUC; HVS, historical viral load summary.

Figure 3 displays the standardized regression coefficients on the y-axis for FSIQ based on the 6 HVS. The x-axis in panel A indicates the number of years used to compute the HVS before the FSIQ outcome. As the number on the x-axis increases, viral loads further back in time are used to compute the HVS. FSIQ was most strongly associated with percentage of time with a viral load less than 400 copies/mL, and the average difference in FSIQ scores became larger as the number of years before FSIQ was measured increased. Figure 3B is similar, except that the x-axis displays the number of years after birth used to compute the HVS. For this plot, the percentage of time with a viral load less than 400 copies/mL showed the strongest association with FSIQ, but estimated differences per standard-deviation increase in this HVS flattened out after 5 years from birth. These results suggest that both HVS type and the time span affect the estimated exposure-outcome relationship.

SIMULATION STUDY

We conducted a simulation study to investigate operating characteristics of the previously described multiple imputation approach under plausible scenarios and to highlight simulation as a tool for gaining confidence in analytical decisions. We simulated viral loads to mimic the observed PHACS-AMP viral load data. We then applied 3 plausible missing-data mechanisms and 3 analytical strategies to summarize HVS. R software code (R Foundation for Statistical Computing, Vienna, Austria) with which to reproduce the simulation is provided in Web Appendix 2.

Longitudinal log-transformed viral loads were simulated every 3 months from birth to age 18 years. Simulations were based on a linear mixed-effect model with a random intercept and a natural spline for age with 6 degrees of freedom, estimated from the observed PHACS-AMP viral loads.

Three missing-data mechanisms were applied to the simulated viral loads: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In the MCAR scenario, longitudinal measurements were randomly set to missing independently and with probability equal to that observed overall in PHACS-AMP (0.36). The MAR scenario randomly sampled the average missing-data pattern within each age group observed in PHACS-AMP. The MNAR scenario set those who were older with higher viral loads to have a higher likelihood of missing data. The MNAR scenario was used to investigate the impact of lower adherence and an increased number of missed visits during adolescence. The probabilities for missing data within the age groups <5 years, 5–9 years, 10–14 years, and ≥15 years were set to 0.025, 0.05, 0.25, and 0.35, respectively. In addition, within each of these age ranges, the missing probabilities for mean log10 viral loads in the ranges of <2.5, 2.5–3.4, and ≥3.5 log10 copies/mL were set to 0, 0.05, and 0.15, respectively. The probabilities were then summed based on the age at the simulated viral load and the viral load for computation of the overall probability of missingness.
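The MCAR and MNAR probabilities described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code in Web Appendix 2: the viral load component is applied here to each simulated log10 viral load, the band 2.5–3.4 is treated as 2.5 to just under 3.5, and all function names are hypothetical.

```python
import random

MCAR_PROBABILITY = 0.36  # overall missingness quoted for PHACS-AMP


def age_component(age_years):
    """Age-group contribution to the MNAR missingness probability."""
    if age_years < 5:
        return 0.025
    if age_years < 10:
        return 0.05
    if age_years < 15:
        return 0.25
    return 0.35


def viral_load_component(log10_vl):
    """Viral-load contribution (assumed bands: <2.5, 2.5 to <3.5, >=3.5)."""
    if log10_vl < 2.5:
        return 0.0
    if log10_vl < 3.5:
        return 0.05
    return 0.15


def mnar_missing_probability(age_years, log10_vl):
    """Sum of the age and viral load components."""
    return age_component(age_years) + viral_load_component(log10_vl)


def mcar_missing_probability(age_years, log10_vl):
    """Constant probability, independent of age and viral load."""
    return MCAR_PROBABILITY


def mask_measurements(measurements, prob_fn, rng):
    """Set each (age, log10_vl) value to None with probability prob_fn(age, log10_vl)."""
    return [(age, None if rng.random() < prob_fn(age, vl) else vl)
            for age, vl in measurements]


# Hypothetical participant: an adolescent measurement with high viral load
# is far more likely to be dropped than an early, suppressed one.
rng = random.Random(2022)
observed = mask_measurements([(3.0, 2.0), (16.0, 4.0)], mnar_missing_probability, rng)
```

Under these assumptions the largest possible MNAR probability is 0.35 + 0.15 = 0.50, for a participant aged 15 years or older with a log10 viral load of at least 3.5.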

The simulation used study sample sizes of 451, 1,000, and 2,000. Five thousand simulations were conducted for each sample size and missing-data scenario. For each simulation, HVS were computed using 3 analytical strategies: 1) use of the “observed data”; 2) use of an incorrectly specified linear mixed-effect imputation model with a random intercept and a linear assumption for age; and 3) use of a correctly specified imputation linear mixed-effects model with a random intercept and a natural spline for age. Missing viral load data were imputed 25 times.

The percent bias was lowest when using an imputation model for MCAR and MAR missing data (Table 1). Consistent with Figure 2, bias was highest for age at peak viral load and peak viral load using the observed data. An analysis using the observed data was not unbiased even when the missing data were MCAR because of randomly missing the first or last viral load measurement. Analyses based on both the observed data and the imputation method were biased when the missing data were MNAR; however, results based on imputation tended to have lower bias. Except for age at peak HVS, imputation using an incorrect imputation model reduced bias in comparison with using only the observed data.

Table 1. Simulation Results Examining Percent Bias in the Estimation of Historical Viral Load Summaries According to Sample Size, Type of Missing Data, and Missing Data Strategy, Pediatric HIV/AIDS Cohort Study Adolescent Master Protocol, United States, 2007–2016

                                          % Bias (b)
Missing-Data Scenario (a) and Strategy    AUC    AUCt   Duration of Viremia  % With <400 Copies/mL  Peak Viral Load  Age at Peak Viral Load
Sample Size = 451
MCAR: probability of missing data = 0.36
  Observed data only                      1.8    0.2    1.7                  0.0                    3.4              −20.7
  Incorrect imputation model              −0.1   −0.1   −0.6                 0.8                    1.4              −33.1
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              0.6
MAR: AMP pattern
  Observed data only                      7.8    1.3    7.7                  −3.9                   8.7              −119.9
  Incorrect imputation model              0.9    0.9    0.4                  −0.8                   3.4              −63.0
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              1.3
MNAR: older/+VL high missing (c)
  Observed data only                      2.3    1.1    3.3                  1.4                    1.4              −62.4
  Incorrect imputation model              1.2    1.2    2.1                  −3.2                   0.8              −44.2
  Correct imputation model                0.9    0.9    1.8                  −2.7                   0.2              7.9
Sample Size = 1,000
MCAR: probability of missing data = 0.36
  Observed data only                      1.8    0.2    1.7                  0.0                    3.4              −20.5
  Incorrect imputation model              −0.1   −0.1   −0.6                 0.8                    1.4              −33.0
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              0.6
MAR: AMP pattern
  Observed data only                      7.8    1.3    7.7                  −3.9                   8.7              −120.1
  Incorrect imputation model              0.9    0.9    0.4                  −0.8                   3.4              −63.0
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              1.3
MNAR: older/+VL high missing
  Observed data only                      2.3    1.1    3.3                  1.4                    1.4              −5.7
  Incorrect imputation model              1.2    1.2    2.1                  −3.2                   0.8              −11.2
  Correct imputation model                0.9    0.9    1.8                  −2.7                   0.2              1.5
Sample Size = 2,000
MCAR: probability of missing data = 0.36
  Observed data only                      1.8    0.2    1.7                  0.0                    3.4              −20.5
  Incorrect imputation model              −0.1   −0.1   −0.6                 0.8                    1.4              33.0
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              0.6
MAR: AMP pattern
  Observed data only                      7.8    1.3    7.7                  −3.9                   8.7              −119.9
  Incorrect imputation model              0.9    0.9    0.4                  −0.8                   3.4              −62.9
  Correct imputation model                0.0    0.0    0.0                  0.0                    0.0              1.3
MNAR: older/+VL high missing
  Observed data only                      2.3    1.1    3.3                  1.4                    1.4              −5.7
  Incorrect imputation model              1.2    1.2    2.1                  −3.2                   0.8              −11.2
  Correct imputation model                0.9    0.9    1.8                  −2.7                   0.2              1.6

Abbreviations: AIDS, acquired immunodeficiency syndrome; AMP, Adolescent Master Protocol; AUC, area under the VL curve; AUCt, time-averaged AUC; HIV, human immunodeficiency virus; HVS, historical viral load summary; MAR, missing at random; MCAR, missing completely at random; MNAR, missing not at random; VL, viral load.

a The average percentage of missingness for the MCAR, MAR, and MNAR scenarios was 36%, 42%, and 22%, respectively.

b Percent bias was calculated by subtracting the average simulated HVS with no missing data from the average HVS computed with the missing-data strategy, which was then divided by the average simulated HVS with no missing data.

c Older participants with higher VLs had a higher chance of missing data.
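The percent bias defined in footnote b can be written as a short Python function. This is a minimal sketch of the stated definition; the function name and the example values are hypothetical.

```python
def percent_bias(hvs_with_strategy, hvs_complete):
    """Percent bias of the average HVS under a missing-data strategy.

    hvs_with_strategy: participant-level HVS computed with the strategy.
    hvs_complete: participant-level HVS computed with no missing data.
    """
    mean_strategy = sum(hvs_with_strategy) / len(hvs_with_strategy)
    mean_complete = sum(hvs_complete) / len(hvs_complete)
    if mean_complete == 0:
        raise ValueError("percent bias is undefined when the reference mean is 0")
    return 100.0 * (mean_strategy - mean_complete) / mean_complete


# Hypothetical AUC values for 2 participants.
bias = percent_bias([12.0, 21.0], [10.0, 20.0])
```

In this example the strategy average is 16.5 and the complete-data average is 15, giving a percent bias of 10%. A positive value means the strategy overestimates the average HVS relative to the complete data.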

DISCUSSION

We evaluated 6 HVS: 1) AUC, 2) AUCt, 3) duration of viremia, 4) percentage of time that HIV viral load is suppressed, 5) peak viral load, and 6) age at peak viral load. To guide the choice of the HVS, the relative timing, range (time span), and transformation need to be carefully considered. In addition, the data collection process needs to be considered, so analysis summaries do not simply reflect data availability. Missing-data analysis methods may be necessary to reduce bias.

We have 4 recommendations for researchers who plan to use HVS.

  1. Investigate data availability: Summarize how data collection differs by participant and by covariates.

  2. Satisfy the MAR assumption: Use flexible missing-data models to satisfy a plausible MAR assumption and include as many predictive variables as possible.

  3. Simulate missing-data mechanisms: Investigate bias using simulation studies based on complete data, the observed missing-data pattern, and a hypothetical MNAR scenario.

  4. Conduct exploratory analyses: Vary the relative timing, range, transformation, and type of HVS in exploratory analyses when there are no a priori hypotheses for how a historical exposure summary should influence the outcomes under study.

Investigate data availability

We discussed how the relative timing to the outcome of interest and the range of viral loads affect the interpretation of HVS. Similarly, when the relative timing and range vary across participants, analyses of HVS will be difficult to interpret. For this reason, it is essential to investigate data availability according to the time unit used to calculate the HVS. If the time unit varies by participant, missing-data methods can be used to provide interpretable measures of association. Lastly, viral load availability by other covariates should be summarized to inform missing-data strategies.
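One simple way to summarize availability is the proportion of participants with at least 1 viral load in each year of age. The Python sketch below assumes a hypothetical long-format list of (participant identifier, age in years) records; the function name and the data are illustrative only.

```python
import math


def availability_by_age(records, n_participants, max_age=18):
    """Proportion of participants with >= 1 measurement in each year of age.

    records: iterable of (participant_id, age_years), one per measurement.
    Returns a dict mapping year of age (0 .. max_age - 1) to a proportion.
    """
    seen = {year: set() for year in range(max_age)}
    for participant_id, age_years in records:
        year = math.floor(age_years)
        if 0 <= year < max_age:
            seen[year].add(participant_id)  # count each participant once per year
    return {year: len(ids) / n_participants for year, ids in seen.items()}


# Hypothetical records for a cohort of 4 participants.
records = [("a", 0.2), ("a", 0.8), ("a", 1.1), ("b", 1.5), ("c", 9.0)]
availability = availability_by_age(records, n_participants=4)
```

Here 25% of participants have a measurement in the first year of life and 50% in the second. The same tabulation can be repeated within levels of covariates, such as year of birth, to inform the imputation model.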

Satisfy the MAR assumption

We used a multiple imputation approach to alleviate selection bias. Multiple imputation assumes MAR, which states that the propensity for missingness depends only on the observed data values through a correctly specified imputation model (33). Therefore, the year of the viral load measurement and age were included in the imputation model, since viral load availability increased over time. The individual viral load trajectories by age were complex, because the virus can quickly rebound when participants stop antiretroviral treatment. To account for this complexity, we used a spline model with a random slope and intercept, so each participant would have an imputation model to reflect their unique viral load trajectory. Achieving the MAR assumption for age with a realistic model required a flexible modeling strategy.

Variables thought to be associated with viral load were also included in the imputation model. For example, antiretroviral drugs were included because they reduce viral load. To model periods of nonadherence, the imputation model included a random effect for antiretroviral drug use. In addition, CD4 measurements were included because viral load exposure weakens the immune system. However, CD4 measurements also had missing values, which resulted in the need to use chained equations. This strategy required an imputation model for the CD4 measurements, which makes imputation more model-dependent and more difficult to implement.

Simulate missing-data mechanisms

Multiple imputation is not a perfect solution, because results rely on model specification and the available data. For example, youth born earlier in the epidemic were more likely to have missing viral load data. The imputation model might have been different had more viral load data been available. Concerns like these can be investigated using a simulation approach, which assumes a known data-generating mechanism.

The simulations presented in this article showed that multiple imputation can provide an unbiased HVS estimate when the assumption of MAR is satisfied, in contrast to an analysis based on partial data. The simulations also showed that it is possible to reduce bias under a plausible MNAR missing-data mechanism. This reduction in bias is due to having access to correlated longitudinal measurements (40). The simulation results supported the use of imputation in PHACS-AMP analyses.

Conduct exploratory analyses

If little is known about how an HVS could affect an outcome, it is best to start with an exploratory analysis. This is because different HVS might have unique relationships with the outcome. As such, we varied the timing of the viral loads relative to the age of the FSIQ measurements. Depending on how proximate the viral loads were relative to birth or the outcome measurement, some associations changed. We also found that associations differed depending on which HVS was used, implying that different HVS describe different relationships.

Time standardization and extensions

When the HVS based on the observed data and the imputation approach differ substantially (e.g., peak viral load, age at peak viral load), the analysis results using just the observed data are probably unreliable. In these cases, analysis results may be sensitive to the imputation model assumptions. Standardizing by time (e.g., time-averaged AUC, percentage of time viral load was less than 400 copies/mL) resulted in the highest correlation between the imputation-based estimate and the estimate based on the observed data. That makes time standardization an attractive analytical strategy and follows the recommendation given by Lesosky et al. (26).

The analytical considerations discussed in this article can be generalized to other studies that collect data on longitudinally measured exposure variables. Examples include wearable data-collection devices that capture information on heart rate, brain waves, respiration, diet, exercise, or sleep (41), and commonly collected clinical measurements extracted from charts, such as blood pressure, weight, or routine laboratory test results. Irregular measurements and missing data are likely to be encountered with the use of wearable data-collection devices (42) and chart abstraction (43).

Missing data and the many HVS options make it difficult to characterize associations between HVS and outcomes. These difficulties should not discourage researchers from pursuing these summaries; rather, the assumptions and areas of uncertainty should be transparent, with limitations acknowledged.

Supplementary Material

Web_Material_kwac125

ACKNOWLEDGMENTS

Author affiliations: Center for Biostatistics in AIDS Research, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States (Sean S. Brummel, Kunjal Patel, Tzy-Jyun Yao, Paige L. Williams); Department of Pediatrics, School of Medicine, Tulane University, New Orleans, Louisiana, United States (Russell B. Van Dyke); Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States (Kunjal Patel, George R. Seage, Brad Karalius, Paige L. Williams); Division of Pediatric Infectious Disease, BronxCare Health System, New York, New York, United States (Murli Purswani); Maternal and Pediatric Infectious Disease Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, United States (Rohan Hazra); and Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States (Paige L. Williams).

This study was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the Office of the Director of the National Institutes of Health, the National Institute of Dental and Craniofacial Research, the National Institute of Allergy and Infectious Diseases, the National Institute of Neurological Disorders and Stroke, the National Institute on Deafness and Other Communication Disorders, the National Institute of Mental Health, the National Institute on Drug Abuse, the National Cancer Institute, the National Institute on Alcohol Abuse and Alcoholism, and the National Heart, Lung, and Blood Institute through cooperative agreements with the Harvard T.H. Chan School of Public Health (agreement HD052102) (Principal Investigator (PI): George R. Seage III; Program Director: Elizabeth Salomon) and Tulane University School of Medicine (agreement HD052104) (PI: Russell Van Dyke; Co-PI: Ellen Chadwick; Project Director: Patrick Davis), and through a cooperative agreement with the Harvard T.H. Chan School of Public Health for the Pediatric HIV/AIDS Cohort Study 2020 (agreement P01HD103133) (multiple PIs: Ellen Chadwick, Sonia Hernandez-Diaz, Jennifer Jao, and Paige Williams; Program Director: Liz Salomon). Data management services were provided by the Frontier Science and Technology Research Foundation (Boston, Massachusetts) (Data Management Center Director: Suzanne Siminski), and regulatory services and logistical support were provided by Westat, Inc. (Rockville, Maryland) (Project Directors: Julie Davidson and Tracy Wolbach).

Public-use PHACS-AMP data are available through the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Data and Specimen Hub (DASH).

We thank the participants and families for their participation in PHACS and the individuals and institutions involved in the conduct of PHACS.

The following institutions, clinical site investigators, and staff participated in conducting PHACS AMP and AMP Up in 2020 (in alphabetical order): Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, Illinois: Ellen Chadwick, Margaret Ann Sanders, Kathleen Malee, and Yoonsun Pyun; Baylor College of Medicine, Houston, Texas: Mary Paul, Shelley Buschur, Chivon McMullen-Jackson, and Lynnette Harris; BronxCare Health System, New York, New York: Murli Purswani, Marvin Alvarado, Mahboobullah Mirza Baig, and Alma Villegas; Children’s Diagnostic & Treatment Center, Fort Lauderdale, Florida: Lisa-Gaye Robinson, Celestyn Angot, and Patricia Garvie; Boston Children’s Hospital, Boston, Massachusetts: Sandra K. Burchett, Michelle E. Anderson, and Christine M. Salois; Jacobi Medical Center, New York, New York: Andrew Wiznia, Marlene Burey, and Ray Shaw; Rutgers New Jersey Medical School, Newark, New Jersey: Arry Dieudonne, Linda Bettica, Juliette Johnson, and Karen Surowiec; St. Christopher’s Hospital for Children, Philadelphia, Pennsylvania: Janet S. Chen, Taesha White, and Mitzie Grant; St. Jude Children’s Research Hospital, Memphis, Tennessee: Katherine Knapp, Jamie Russell-Bell, Megan Wilkins, and Erick Odero; San Juan Hospital, San Juan, Puerto Rico: Nicolas Rosario, Heida Rios, and Vivian Olivera; Tulane University School of Medicine, New Orleans, Louisiana: Margarita Silio, Medea Gabriel, and Patricia Sirois; University of California, San Diego, La Jolla, California: Stephen A. Spector, Megan Loughran, Veronica Figueroa, and Sharon Nichols; University of Colorado Denver Health Sciences Center, Denver, Colorado: Elizabeth McFarland, Carrie Chambers, Carrie Knowlton, and Nicole Petrovic; University of Miami, Miami, Florida: Gwendolyn Scott, Grace Alvarez, Juan Caffroni, and Anai Cuadra.

The conclusions and opinions expressed in this article are those of the authors and do not necessarily reflect those of the National Institutes of Health or the US Department of Health and Human Services.

Conflict of interest: none declared.

References

  • 1. Atluri VSR, Hidalgo M, Samikkannu T, et al. Effect of human immunodeficiency virus on blood-brain barrier integrity and function: an update. Front Cell Neurosci. 2015;9:212.
  • 2. McArthur JC, Brew BJ. HIV-associated neurocognitive disorders: is there a hidden epidemic? AIDS. 2010;24(9):1367–1370.
  • 3. Hassanzadeh-Behbahani S, Shattuck KF, Bronshteyn M, et al. Low CD4 nadir linked to widespread cortical thinning in adults living with HIV. NeuroImage Clin. 2020;25:102155.
  • 4. Crowell CS, Yanling H, Tassiopoulos K, et al. Early viral suppression improves neurocognitive outcomes in HIV-infected children. AIDS. 2015;29(3):295–304.
  • 5. Usitalo A, Leister E, Tassiopoulos K, et al. Relationship between viral load and self-report measures of medication adherence among youth with perinatal HIV infection. AIDS Care. 2014;26(1):107–115.
  • 6. Shearer WT, Quinn TC, LaRussa P, et al. Viral load and disease progression in infants infected with human immunodeficiency virus type 1. N Engl J Med. 1997;336(19):1337–1342.
  • 7. Cole SR, Napravnik S, Mugavero MJ, et al. Copy-years viremia as a measure of cumulative human immunodeficiency virus viral burden. Am J Epidemiol. 2010;171(2):198–205.
  • 8. Wright ST, Hoy J, Mulhall B, et al. Determinants of viremia copy-years in people with HIV/AIDS after initiation of antiretroviral therapy. J Acquir Immune Defic Syndr. 2014;66(1):55–64.
  • 9. Mugavero MJ, Napravnik S, Cole SR, et al. Viremia copy-years predicts mortality among treatment-naive HIV-infected patients initiating antiretroviral therapy. Clin Infect Dis. 2011;53(9):927–935.
  • 10. Chirouze C, Journot V, Le Moing V, et al. Viremia copy-years as a predictive marker of all-cause mortality in HIV-1-infected patients initiating a protease inhibitor-containing antiretroviral treatment. J Acquir Immune Defic Syndr. 2015;68(2):204–208.
  • 11. Mirani G, Williams PL, Chernoff M, et al. Changing trends in complications and mortality rates among US youth and young adults with HIV infection in the era of combination antiretroviral therapy. Clin Infect Dis. 2015;61(12):1850–1861.
  • 12. Avettand-Fenoel V, Blanche S, Le Chenadec J, et al. Relationships between HIV disease history and blood HIV-1 DNA load in perinatally infected adolescents and young adults: the ANRS-EP38-IMMIP Study. J Infect Dis. 2012;205(10):1520–1528.
  • 13. Zoufaly A, Stellbrink HJ, An der Heiden M, et al. Cumulative HIV viremia during highly active antiretroviral therapy is a strong predictor of AIDS-related lymphoma. J Infect Dis. 2009;200(1):79–87.
  • 14. Purswani MU, Karalius B, Yao TJ, et al. Prevalence and persistence of varicella antibodies in previously immunized children and youth with perinatal HIV-1 infection. Clin Infect Dis. 2016;62(1):106–114.
  • 15. Spritzler J, DeGruttola VG, Pei L. Two-sample tests of area-under-the-curve in the presence of missing data. Int J Biostat. 2008;4(1):Article 1.
  • 16. Williams PL, Abzug MJ, Jacobson DL, et al. Pubertal onset in children with perinatal HIV infection in the era of combination antiretroviral treatment. AIDS. 2013;27(12):1959–1970.
  • 17. Agwu AL, Yao TJ, Eshleman SH, et al. Phenotypic coreceptor tropism in perinatally HIV-infected youth failing antiretroviral therapy. Pediatr Infect Dis J. 2016;35(7):777–781.
  • 18. Stöhr W, Fidler S, McClure M, et al. Duration of HIV-1 viral suppression on cessation of antiretroviral therapy in primary infection correlates with time on therapy. PLoS One. 2013;8(10):e78287.
  • 19. Lima VD, Bangsberg DR, Harrigan PR, et al. Risk of viral failure declines with duration of suppression on highly active antiretroviral therapy irrespective of adherence level. J Acquir Immune Defic Syndr. 2010;55(4):460–465.
  • 20. Kempf DJ, Rode RA, Xu Y, et al. The duration of viral suppression during protease inhibitor therapy for HIV-1 infection is predicted by plasma HIV-1 RNA at the nadir. AIDS. 1998;12(5):F9–F14.
  • 21. Maggiolo F, Migliorino M, Pirali A, et al. Duration of viral suppression in patients on stable therapy for HIV-1 infection is predicted by plasma HIV RNA level after 1 month of treatment. J Acquir Immune Defic Syndr. 2000;25(1):36–43.
  • 22. Sempa JB, Dushoff J, Daniels MJ, et al. Reevaluating cumulative HIV-1 viral load as a prognostic predictor: predicting opportunistic infection incidence and mortality in a Ugandan cohort. Am J Epidemiol. 2016;184(1):67–77.
  • 23. Lewis-de los Angeles CP, Williams PL, Jenkins LM, et al. Brain morphometric differences in youth with and without perinatally acquired HIV: a cross-sectional study. NeuroImage Clin. 2020;26:102246.
  • 24. Geffner ME, Patel K, Miller TL, et al. Factors associated with insulin resistance among children and adolescents perinatally infected with HIV-1 in the Pediatric HIV/AIDS Cohort Study. Horm Res Paediatr. 2011;76(6):386–391.
  • 25. Nichols SL, Brummel SS, Smith RA, et al. Executive functioning in children and adolescents with perinatal HIV infection. Pediatr Infect Dis J. 2015;34(9):969–975.
  • 26. Lesosky M, Glass T, Rambau B, et al. Bias in the estimation of cumulative viremia in cohort studies of HIV-infected individuals. Ann Epidemiol. 2019;38:22–27.
  • 27. Gasparrini A. Modelling lagged associations in environmental time series data: a simulation study. Epidemiology. 2016;27(6):835–842.
  • 28. Van Dyke RB, Patel K, Siberry GK, et al. Antiretroviral treatment of US children with perinatally acquired HIV infection: temporal changes in therapy between 1991 and 2009 and predictors of immunologic and virologic outcomes. J Acquir Immune Defic Syndr. 2011;57(2):165–173.
  • 29. Wechsler D. Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV). San Antonio, TX: The Psychological Corporation; 2003.
  • 30. Wechsler D. Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV). San Antonio, TX: NCS Pearson; 2008.
  • 31. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley Classics Library ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2004.
  • 32. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592.
  • 33. Little RJ, Rubin DB. Statistical Analysis With Missing Data. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2019.
  • 34. Donders AR, Van Der Heijden GJ, Stijnen T, et al. A gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087–1091.
  • 35. Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142(12):1255–1264.
  • 36. Rezvan PH, Lee KJ, Simpson JA. The rise of multiple imputation: a review of the reporting and implementation of the method in medical research. BMC Med Res Methodol. 2015;15(1):30.
  • 37. Kenward MG, Carpenter J. Multiple imputation: current perspectives. Stat Methods Med Res. 2007;16(3):199–218.
  • 38. Perperoglou A, Sauerbrei W, Abrahamowicz M, et al. A review of spline function procedures in R. BMC Med Res Methodol. 2019;19(1):1–16.
  • 39. Resche-Rigon M, White IR. Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Stat Methods Med Res. 2018;27(6):1634–1649.
  • 40. Ginkel JR, Linting M, Rippe RC, et al. Rebutting existing misconceptions about multiple imputation as a method for handling missing data. J Pers Assess. 2020;102(3):297–308.
  • 41. Cadmus-Bertram L. Using fitness trackers in clinical research: what nurse practitioners need to know. J Nurse Pract. 2017;13(1):34–40.
  • 42. Izonin I, Kryvinska N, Tkachenko R, et al. An approach towards missing data recovery within IoT smart system. Procedia Comput Sci. 2019;155:11–18.
  • 43. Kaji AH, Schriger D, Green S. Looking through the retrospectoscope: reducing bias in emergency medicine chart review studies. Ann Emerg Med. 2014;64(3):292–298.

Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press