AIDS Research and Treatment. 2025 Jan 29;2025:5890464. doi: 10.1155/arat/5890464

Longitudinal Viral Load Clustering for People With HIV Using Functional Principal Component Analysis

Yunqing Ma 1, Xueying Yang 2,3, Jiayang Xiao 1, Xiaoming Li 2,3, Bankole Olatosi 2,4, Jiajia Zhang 1,2
PMCID: PMC11824709  PMID: 39949990

Abstract

Background: Longitudinal measures of viral load (VL) are critical in monitoring the HIV status. While multiple lab indicators exist for monitoring measures of VL, research on clustering historical/longitudinal VL measures is limited. Analyzing longitudinal VL patterns rather than aggregated measures offers deeper insights into HIV status. This study uses functional data clustering to classify longitudinal VL patterns and characterize each cluster by demographics, comorbidities, social behaviors, and CD4 count.

Methods: Adult PWH diagnosed from 2005 to 2015 in South Carolina with a 5-year minimum follow-up were included. We compared functional principal component analysis (FPCA), K-means, hierarchical clustering, and Gaussian mixture models for classification and found FPCA yielded the best results. ANOVA was used to compare VL characteristics, demographics, comorbidities, substance uses, and longitudinal CD4 count across clusters.

Results: Results obtained from FPCA could best distinguish the characteristics and patterns into four clusters. A total of 5916 PWH were grouped into long-term VS group (Cluster 1, 17.3%), short-term VS group (Cluster 2, 29.8%), suboptimal VS group (Cluster 3, 28.3%), and viral failure group (Cluster 4, 24.9%). In the long-term VS group with an average of 11-year follow-up, PWH displayed sustained VS (95.3%) and lower mean CD4 count (95.3%) than other clusters. The short-term VS group had shorter follow-up (6 years), more comorbidities (31.4%), and lower percentage of time with low CD4 count (79.9%). In suboptimal VS group, PWH were mostly under 30 years old (44.8%) and Black (77.2%), with relatively lower mean VL (92.9%) and lower VR history (18.4%). In the viral failure group, PWH had higher mean VL (40.6%) and lower mean CD4 count (34.7%).

Discussion: The findings highlight the impact of continuous clustering in understanding the distinct viral profiles of PWH and emphasize the importance of tailored treatment and insights to target interventions for all PWH.

Keywords: FPCA, HIV, longitudinal VL clustering, South Carolina

1. Introduction

Viral suppression (VS), generally defined as maintaining HIV viral load (VL) at undetectable levels [1, 2], is a primary goal of HIV treatment management. The longitudinal measure of VL is important for assessing disease progression, treatment effectiveness, and transmission risk [3, 4]. Maintaining low VL levels can reduce transmission risks and improve health outcomes [5, 6]. Fluctuations in VL can signal changes in disease progression or the effectiveness of antiretroviral therapy (ART), enabling timely adjustments to treatment regimens [7]. Therefore, proactive monitoring of longitudinal VL measures can serve as a robust metric for evaluating the impact of various interventions, e.g., a treatment switch, and is an indispensable tool in both clinical care and epidemiological studies that inform public health strategies for HIV control and prevention [8, 9].

Clustering people with HIV (PWH) based on their longitudinal VL offers several significant benefits in the context of HIV treatment [10–12]. First, such clustering enables the identification of distinct trajectories of disease progression and treatment response [11, 13, 14], providing valuable insights into heterogeneity within the patient population. This stratification allows for more personalized and effective treatment regimens, as different clusters may respond differently to ART or other treatment interventions. Second, clustering based on VL can reveal hidden subgroups that are at a higher risk of treatment failure or disease progression [15, 16], thereby facilitating early intervention strategies for these vulnerable populations. Third, it could support efficient resource allocation by helping public health agencies direct interventions and support more effectively when they know which clusters require the most urgent attention [11, 13, 16].

In previous research, researchers performed HIV population case studies using differing schema to classify VL patterns consistent with epidemiological principles, such as the difference in baseline clinical test results (CD4 or VL) among three cohorts, rather than statistical methodologies [17, 18]. Some groups defined clusters based on sustained high VL (SHVL) or durably suppressed VL (DSVL) before comparing patterns of different clusters [19]. However, the clusters may be preconceived because they are based on predetermined criteria that may not account for the complexity and variability of VL. In recent research, PWH have been clustered into time-varying patterns, like sustained low VL, rebounding VL, or SHVL; however, only aggregated VL measures were used, including relative area of viral exposure or weighted recency reliability [16], max VL [20], recent VL [21], and cumulative VL [22, 23], to generalize the dynamic VL patterns. However, current aggregate and/or temporal indicators cannot present the dynamic patterns well since they represent a summary of VL for a period of time rather than continuous changes. Sher et al. reported that it is unreliable to represent the impact of the VL pattern using only timepoint observations because VL change is a dynamic pattern [24]. In addition, the use of aggregate data may yield correlation coefficients exhibiting considerable bias, known as an ecological fallacy [25], encounter information loss [26], and ignore the temporal scope [27].

Longitudinal VL can better reflect the VL dynamics over a period, although cluster analysis is challenging because patient encounters in electronic health records (EHR) are sparse and occur at inconsistent intervals [28]. Wang et al. pointed out that EHR data recorded irregularly and continuously can be treated as functional data and analyzed via functional data analysis (FDA) [29]. Compared with traditional multivariate statistical methods, FDA can easily account for complex, temporal variation [29], extract essential features of the functional data, and be applied to functional classification [30]. It has been demonstrated to outperform standard clustering methods when used with longitudinal data [31–33]. The goal of this study was to highlight the value of FPCA clustering in understanding the distinct viral profiles of PWH and emphasize the importance of tailored treatment and insights for targeted interventions. Although South Carolina's (SC's) VS rate has increased [34], it has not achieved the national goal. In this study, using SC statewide EHR data with over 15 years of follow-up time, we applied functional principal component analysis (FPCA) to identify PWH with a similar VL history. Then we compared the FPCA clustering with other existing methods and highlighted the usage of FPCA in longitudinal clustering. Finally, we explored the differences in demographics, comorbidities, social behaviors, and CD4 count among different clusters, which could serve as targets for future treatment interventions.

2. Methods

2.1. Data Source

SC has been compiling a comprehensive electronic HIV/AIDS reporting system since 1986. This dataset is maintained by the South Carolina Department of Health and Environmental Control (SC DHEC) through a confidential system specifically designed for HIV/AIDS cases known as the enhanced HIV/AIDS Reporting System (eHARS) [35]. From January 1st, 2004, the CDC mandated the inclusion of CD4 count and VL tests [36]. De-identified EHR data from SC DHEC's eHARS and claims data from various payers were linked by the SC Office of Revenue and Fiscal Affairs (SC RFA). More details about the data sources and how they were linked can be found elsewhere [35, 37]. A total of 5,916 PWH who were (1) aged ≥ 18 years old, (2) diagnosed from January 1st, 2005, to December 31st, 2015, in SC, and (3) with a 5-year minimum follow-up from the first VS (VL ≤ 200 copies/mL) to the last VL test were included. Because each PWH took the VL test at varying times, this resulted in a sparse and irregular longitudinal VL history. The average follow-up time was 3616 days (9.90 years), with an SD of 1144 days (3.13 years), and the average number of follow-ups was 21.85 (SD: 11.01). Based on the reporting system, we defined VS as VL ≤ 200 copies/mL. When PWH reached VS, only VL ≤ 200 copies/mL was recorded, and the actual numerical value was unavailable.

2.2. Variables

The historical VL and CD4 measures were defined across the follow-up time. We defined seven aggregated measures for historical VL: mean VL, max VL, min VL, recent VL, VL percentiles, proportion of the time with VS, and number of viral rebounds (VRs). VR was defined as VL > 200 copies/mL that (1) followed two consecutive VS results, (2) occurred in a record with at least one year of follow-up after the first VS, i.e., VL ≤ 200 copies/mL, and (3) was at least 90 days apart from the last VS [6]. All VL measures were then categorized into four groups: ≤ 200, 200 to 10,000, 10,000 to 100,000, and > 100,000 copies/mL.
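The VL grouping and the rebound count can be illustrated with a short sketch. It is not the authors' code: the function names, the example history, the upper-inclusive category boundaries, and the way the three VR criteria are combined are all assumptions made for illustration.

```python
from datetime import date

VS_THRESHOLD = 200  # copies/mL, the suppression cutoff stated in the text


def vl_category(vl):
    """Map a VL value (copies/mL) to one of the four groups.

    Assumption: each upper boundary is inclusive; the text does not say.
    """
    if vl <= 200:
        return "<=200"
    if vl <= 10_000:
        return "200-10,000"
    if vl <= 100_000:
        return "10,000-100,000"
    return ">100,000"


def count_viral_rebounds(tests):
    """Count rebounds in a list of (date, vl) pairs sorted by date.

    Assumed reading of the criteria: a test is a rebound when VL > 200,
    the two preceding tests were both suppressed, the record extends at
    least 365 days past the first suppressed test, and the test falls at
    least 90 days after the preceding (suppressed) test.
    """
    suppressed_dates = [d for d, vl in tests if vl <= VS_THRESHOLD]
    if not suppressed_dates:
        return 0
    if (tests[-1][0] - suppressed_dates[0]).days < 365:
        return 0
    rebounds = 0
    for i in range(2, len(tests)):
        test_date, vl = tests[i]
        prior_two_suppressed = all(v <= VS_THRESHOLD for _, v in tests[i - 2:i])
        far_enough = (test_date - tests[i - 1][0]).days >= 90
        if vl > VS_THRESHOLD and prior_two_suppressed and far_enough:
            rebounds += 1
    return rebounds


# Hypothetical VL history for one person (suppressed results recorded as 200).
history = [
    (date(2010, 1, 10), 52_000),
    (date(2010, 4, 12), 200),
    (date(2010, 10, 5), 200),
    (date(2011, 3, 1), 4_500),  # VL > 200 after two suppressed tests
    (date(2011, 9, 15), 200),
]
print([vl_category(v) for _, v in history])
print(count_viral_rebounds(history))
```

Because suppressed results are only recorded as "≤ 200 copies/mL", the sketch stores them as 200; any summary such as mean VL computed from such data inherits that censoring.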

The demographic variables included gender, age at HIV diagnosis, race, transmission mode for HIV, and residence (urban vs. rural). The clinical variables included substance use (i.e., alcohol use, tobacco use, and illicit drug use) and number of baseline comorbidities at the HIV diagnosis (i.e., comorbidities included hypothyroidism, hypertension, arthritis, COPD, cardiovascular disease, renal disease, diabetes mellitus, obesity, cerebrovascular disease, dyslipidemia, hepatitis C, and hepatitis B). All the clinical variables were identified using ICD-9/10 codes.

In addition, we defined five aggregated measures for historical CD4 count, including baseline CD4 count, max CD4 count, mean CD4 count, nadir CD4 count, and proportion of the time with low CD4 (< 200 cells/μL). All CD4 count measures were categorized into four groups: < 200 cells/μL, 200 to 350 cells/μL, 350 to 500 cells/μL, and > 500 cells/μL. We also constructed the proportion of the time with low CD4 or VS over the total follow-up days, which was categorized as 0, > 0 and ≤ 25%, > 25% and ≤ 50%, > 50% and ≤ 75%, > 75% and < 100%, and 100%.
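As a rough illustration of the proportion-of-time measure and its six categories, the sketch below carries each test result forward until the next test. That carry-forward rule is an assumption (the text only says the proportion is taken over total follow-up days), and the function names and example data are hypothetical.

```python
def proportion_of_time(tests, in_state):
    """Share of follow-up days spent in a state.

    `tests` is a list of (day_offset, value) pairs sorted by day.
    Assumption: each result holds until the next test (carry-forward).
    """
    total_days = tests[-1][0] - tests[0][0]
    if total_days <= 0:
        raise ValueError("need at least two tests on different days")
    state_days = sum(
        next_day - day
        for (day, value), (next_day, _) in zip(tests, tests[1:])
        if in_state(value)
    )
    return state_days / total_days


def proportion_bin(p):
    """Assign a proportion in [0, 1] to the six categories from the text."""
    if p == 0:
        return "0"
    if p <= 0.25:
        return "> 0 and <= 25%"
    if p <= 0.50:
        return "> 25% and <= 50%"
    if p <= 0.75:
        return "> 50% and <= 75%"
    if p < 1:
        return "> 75% and < 100%"
    return "100%"


# Hypothetical CD4 history: (days since first test, CD4 cells/uL).
cd4_tests = [(0, 150), (100, 180), (200, 320), (400, 510), (1000, 600)]
low_cd4_share = proportion_of_time(cd4_tests, lambda cd4: cd4 < 200)
print(low_cd4_share, proportion_bin(low_cd4_share))
```

The same helper could be applied to VL tests with a suppression predicate (VL ≤ 200 copies/mL) to obtain the proportion of time with VS.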

2.3. Approach

We investigated four different clustering methods and selected the best one according to the visualization with multiple VL indicators. The four clustering methods included FPCA [30, 38], K-means clustering [39], hierarchical clustering (Hclust) [40], and Gaussian mixture models (GMM) clustering [41]. For the functional clustering, FPCA was employed to extract variations in sparse longitudinal VL data, which generated the corresponding eigenvalues to capture the underlying variability of longitudinal VL measures and functional principal component (FPC) scores to express the contribution of each FPC to individual observations. Then, the FPC scores were used for clustering because of their ability to distill complex functional data into a manageable number of dimensions that still retain the essential structure of the data. Through this approach, each individual's functional curve is estimated using local smoothing techniques that take into account the sparsity of the data, thus avoiding reliance on interpolation methods that could introduce bias. The EMCluster algorithm [42] was used for the clustering. FPCA clustering was conducted using the fdapace R package.
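The paper does not print the FPCA model, so the following is the standard formulation for sparse longitudinal data (the PACE approach on which fdapace is based), written in notation assumed here rather than taken from the source: $Y_{ij}$ is the $j$-th VL measurement of person $i$ at time $t_{ij}$, $X_i$ the latent smooth trajectory, $\mu$ the mean function, $\phi_k$ and $\lambda_k$ the eigenfunctions and eigenvalues, and $\xi_{ik}$ the FPC scores that are passed to the clustering step.

```latex
\begin{align}
% Sparse, noisy VL measurements for person i at irregular times t_{ij}
Y_{ij} &= X_i(t_{ij}) + \varepsilon_{ij}, \qquad j = 1, \dots, N_i, \\
% Truncated Karhunen-Loeve expansion of the latent VL trajectory
X_i(t) &\approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\, \phi_k(t), \\
% Eigenvalues and eigenfunctions come from the covariance function
G(s, t) &= \operatorname{Cov}\{X_i(s), X_i(t)\} = \sum_{k \ge 1} \lambda_k\, \phi_k(s)\, \phi_k(t), \\
% Conditional-expectation (PACE) estimate of the k-th FPC score
\hat{\xi}_{ik} &= \hat{\lambda}_k\, \hat{\boldsymbol{\phi}}_{ik}^{\top}\, \hat{\boldsymbol{\Sigma}}_{Y_i}^{-1} \left( \mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i \right).
\end{align}
```

Here $\hat{\boldsymbol{\phi}}_{ik}$ and $\hat{\boldsymbol{\mu}}_i$ collect $\hat{\phi}_k$ and $\hat{\mu}$ evaluated at person $i$'s own measurement times, and $\hat{\boldsymbol{\Sigma}}_{Y_i}$ has entries $\hat{G}(t_{ij}, t_{il}) + \hat{\sigma}^2 \delta_{jl}$. Because the score is estimated from pooled mean and covariance surfaces, each person needs only a few irregularly timed measurements, which is why no interpolation onto a common grid is required.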

Since the other three methods cannot treat the whole VL history as a function, we used the aggregated VL information, i.e., mean VL, max VL, min VL, last VL, and time range from first VS to last VL, for clustering with them. K-means is designed to partition a dataset into K distinct clusters, where each individual belongs to the cluster with the nearest mean of the aggregated VL information [39]. The hierarchical average linkage clustering method operates by successively merging or dividing individuals based on similarity measures of the aggregated VL, forming a tree-like structure known as a dendrogram [40]. GMM operates under the assumption that the data are generated from a mixture of several Gaussian distributions [41] and separates individuals based on the probability of each individual belonging to each cluster.
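To make the K-means step concrete, here is a minimal sketch of Lloyd's algorithm on aggregated VL features. It is illustrative only: the two features (log10 mean VL and log10 max VL), the log transform, the toy data, and the choice of starting centers are assumptions, not the study's settings.

```python
import math


def kmeans(points, centers, n_iter=50):
    """Lloyd's algorithm: assign each point to its nearest center, then
    move each center to the mean of the points assigned to it."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers


# Hypothetical aggregated features per person: [log10 mean VL, log10 max VL].
features = [
    [2.3, 2.3], [2.3, 3.0], [2.4, 2.9],  # mostly suppressed histories
    [4.6, 5.2], [4.8, 5.5], [4.5, 5.0],  # persistently high VL
]
labels, centers = kmeans(features, centers=[features[0], features[3]])
print(labels)
```

Each person is reduced to a fixed-length summary before clustering, so two people with the same mean and maximum VL are indistinguishable to this method even if their trajectories over time differ.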

To explore the similarities and differences among different clusters and validate the proposed clustering method, after getting the clustering labels through four methods, ANOVA was used to test the difference in VL characteristics, demographics, comorbidities, social behaviors, and historical CD4 count during each cluster's follow-up period.

We used R version 4.3.2 for analysis. A two-sided p value of 0.05 was employed to determine statistical significance.

3. Results

3.1. Basic Features of Longitudinal VL

Among the 5916 PWH in this study, the most common characteristics were age 18–30 years (39.9%), male sex (74.0%), Black race (72.2%), men who have sex with men (MSM) as the transmission risk (51.3%), and urban residence (83.0%) (Table 1). For the VL measures, most PWH were virally suppressed over the follow-up time (51.3%) and experienced no VR (82.0%). For the CD4 count measures, the most common categories were minimum CD4 count < 200 cells/μL (34.6%), baseline CD4 count > 500 cells/μL (36.1%), maximum CD4 count > 500 cells/μL (81.8%), and mean CD4 count > 500 cells/μL (57.6%). Alcohol, tobacco, and illicit drug use histories were recorded for 29.2%, 25.0%, and 8.0% of the population, respectively, while 27.2% had some comorbidity history.

Table 1. Distribution for VL characteristics, demographics, comorbidities, social behaviors, and historical CD4 count for overall and four clusters from FPCA.

| Characteristics | Cluster 1: Long-term VS, n = 1008 (17.3%) | Cluster 2: Short-term VS, n = 1763 (29.8%) | Cluster 3: Suboptimal VS, n = 1673 (28.3%) | Cluster 4: Viral failure, n = 1472 (24.9%) | p value¹ |
|---|---|---|---|---|---|
| Age group (years) | | | | | < 0.001 |
| ≥ 18 and < 30 | 283 (28.1%) | 678 (38.5%) | 750 (44.8%) | 651 (44.2%) | |
| ≥ 30 and < 40 | 251 (24.9%) | 353 (20.0%) | 389 (23.3%) | 374 (25.4%) | |
| ≥ 40 and < 50 | 298 (29.6%) | 389 (22.1%) | 323 (19.3%) | 317 (21.5%) | |
| ≥ 50 | 176 (17.5%) | 343 (19.5%) | 211 (12.6%) | 130 (8.8%) | |
| Sex | | | | | < 0.001 |
| Male | 738 (73.2%) | 1363 (77.3%) | 1249 (74.7%) | 1030 (70.0%) | |
| Female | 270 (26.8%) | 400 (22.7%) | 424 (25.3%) | 442 (30.0%) | |
| Race | | | | | < 0.001 |
| White | 298 (29.6%) | 484 (27.5%) | 288 (17.2%) | 246 (16.7%) | |
| Black | 647 (64.2%) | 1173 (66.5%) | 1292 (77.2%) | 1157 (78.6%) | |
| Hispanic | 48 (4.8%) | 67 (3.8%) | 63 (3.8%) | 47 (3.2%) | |
| Others | 15 (1.5%) | 39 (2.2%) | 30 (1.8%) | 22 (1.5%) | |
| Risk | | | | | < 0.001 |
| Heterosexual | 257 (25.5%) | 301 (17.1%) | 380 (22.7%) | 406 (27.6%) | |
| MSM/IDU | 48 (4.8%) | 78 (4.4%) | 100 (6.0%) | 104 (7.1%) | |
| MSM | 505 (50.1%) | 1000 (56.7%) | 849 (50.7%) | 683 (46.4%) | |
| Others | 198 (19.6%) | 384 (21.8%) | 344 (20.6%) | 279 (19.0%) | |
| Region | | | | | 0.325 |
| Urban | 847 (84.0%) | 1481 (84.0%) | 1376 (82.2%) | 1209 (82.1%) | |
| Rural | 161 (16.0%) | 282 (16.0%) | 297 (17.8%) | 263 (17.9%) | |
| Alcohol use | | | | | < 0.001 |
| No | 798 (79.2%) | 1193 (67.7%) | 1190 (71.1%) | 1008 (68.5%) | |
| Yes | 210 (20.8%) | 570 (32.3%) | 483 (28.9%) | 464 (31.5%) | |
| Tobacco use | | | | | < 0.001 |
| No | 845 (83.8%) | 1265 (71.8%) | 1249 (74.7%) | 1079 (73.3%) | |
| Yes | 163 (16.2%) | 498 (28.2%) | 424 (25.3%) | 393 (26.7%) | |
| Illicit drug use | | | | | 0.003 |
| No | 949 (94.1%) | 1632 (92.6%) | 1536 (91.8%) | 1327 (90.1%) | |
| Yes | 59 (5.9%) | 131 (7.4%) | 137 (8.2%) | 145 (9.9%) | |
| Comorbidity history | | | | | < 0.001 |
| No | 761 (75.5%) | 1209 (68.6%) | 1258 (75.2%) | 1075 (73.0%) | |
| Yes | 247 (24.5%) | 554 (31.4%) | 415 (24.8%) | 397 (26.9%) | |
| Percentage of time with low CD4 count | | | | | < 0.001 |
| = 0% | 737 (73.1%) | 1383 (79.9%) | 1152 (69.7%) | 613 (41.7%) | |
| > 0% and ≤ 25% | 241 (23.9%) | 260 (15.0%) | 397 (24.0%) | 483 (32.8%) | |
| > 25% and ≤ 50% | 24 (2.4%) | 42 (2.4%) | 44 (2.7%) | 183 (12.4%) | |
| > 50% and ≤ 75% | 5 (0.5%) | 11 (0.6%) | 20 (1.2%) | 88 (6.0%) | |
| > 75% and < 100% | 0 (0.0%) | 11 (0.6%) | 9 (0.5%) | 39 (2.7%) | |
| = 100% | 1 (0.1%) | 23 (1.3%) | 30 (1.8%) | 65 (4.4%) | |

Note: 2 A two-sided p value of 0.05 was employed to determine the statistical significance.

1 p values were calculated using Pearson's chi-squared test.
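As a worked check of how a p value in Table 1 arises, the sketch below applies Pearson's chi-squared test to the Region rows (urban and rural counts across the four FPCA clusters). The helper names are hypothetical, and the tail probability uses the closed form that holds only for 3 degrees of freedom, which is what a 2 × 4 table gives.

```python
import math


def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Region counts from Table 1, Clusters 1 to 4.
region = [
    [847, 1481, 1376, 1209],  # Urban
    [161, 282, 297, 263],     # Rural
]
stat, df = pearson_chi2(region)
p_value = chi2_sf_df3(stat)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

The resulting p value is approximately 0.325, in line with the value reported for Region in Table 1, so the urban/rural split does not differ significantly across clusters at the 0.05 level.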

3.2. Model Comparison

Using Gaussian mixture models (GMM), four distinct clusters were identified with sample sizes of 228 (3.9%), 928 (15.7%), 3650 (61.7%), and 1110 (18.8%), respectively (Figures 1 and 2). As Table 1 shows, the four FPCA clusters differed in demographics, comorbidities, social behaviors, and historical CD4 count. However, GMM failed to separate demographics (Supporting Table 2): there was no obvious difference between clusters, and only the third cluster could be distinguished.

Figure 1. Distribution for VL characteristics (baseline, last, maximum, and mean) using GMM.

Figure 2. Distribution for VL characteristics (percentage of time with VS and viral rebound history) using GMM.

For the first cluster, PWH had higher initial VL (> 100,000 copies/mL, 20.2%), higher mean VL (> 100,000 copies/mL, 72.4%), higher initial CD4 count (> 500 cells/μL, 17.5%), and lower minimum CD4 count (< 200 cells/μL, 85.5%). For the second cluster, mean VL mostly fell between 10,000 and 100,000 copies/mL (71.8%). For the third cluster, PWH tended to have lower initial VL (≤ 200 copies/mL, 96.4%), sustained VS over time (83.2%), and higher mean CD4 count (> 500 cells/μL, 67.7%). For the fourth cluster, mean VL mostly fell between 200 and 10,000 copies/mL (95.0%). VL percentiles are shown in Supporting Table 1.

Clusters from K-means and Hclust were not informative due to the strong imbalance: the numbers of PWH in the main cluster for K-means and Hclust were 5754 (97.3%) and 5898 (99.7%), meaning that these two methods failed to classify longitudinal VL in this case.

3.3. Results Based on the FPCA Clustering

A total of 5916 PWH were grouped into four clusters using the FPCA clustering method: long-term VS group (Cluster 1, 17.3%), short-term VS group (Cluster 2, 29.8%), suboptimal VS group (Cluster 3, 28.3%), and viral failure group (Cluster 4, 24.9%) (Table 1). In the long-term VS group, with an average of 11 years of follow-up, PWH were older (≥ 50 years old, 17.5%), included a lower percentage of Black people than other clusters (64.2%), more often had minimum CD4 count > 500 cells/μL (18.1%), reported less alcohol use (20.8%), and mostly had sustained VS (95.3%) (Figures 3 and 4). Characteristics of individuals in the short-term VS group were similar except for shorter follow-up (6 years), a higher percentage of individuals with comorbidity history (31.4%), and higher max CD4 count (> 500 cells/μL, 93.8%) (Figure 5). In the suboptimal VS group, PWH were mostly under 30 years old (44.8%) and Black (77.2%), with a relatively lower percentage of individuals with mean VL < 10,000 copies/mL compared with other clusters (92.9%) (Figure 4) and with VR history (18.4%). In the viral failure group, demographics were similar to the suboptimal VS group; the majority of PWH had no VS history, and this group had a lower percentage with mean CD4 count > 500 cells/μL (34.7%) and a lower percentage who never had low CD4 count (41.7%).

Figure 3. Distribution for VL characteristics (baseline, last, maximum, and mean) using FPCA.

Figure 4. Distribution for VL characteristics (percentage of time with VS and viral rebound history) using FPCA.

Figure 5. Distribution for CD4 characteristics using FPCA.

4. Discussion

The application of FPCA in clustering PWH based on their longitudinal VL patterns has proven to be effective in this study, while other clustering methods (K-means, Hclust, and GMM) failed to distinguish different clusters. The results successfully categorized patients into four distinct clusters with different characteristics: long-term VS group, short-term VS group, suboptimal VS group, and viral failure group. This method has allowed for a more comprehensive understanding of the heterogeneity within the patient population, providing valuable insights into disease progression and treatment responses.

The advantage of FPCA, compared to other clustering methods based only on aggregated VL and other longitudinal measures, lies in its ability to capture the continuity and dynamics of VL, handle sparsity and irregularity in longitudinal data, and accommodate variability and nonlinearity [38, 43]. Namely, among the four methods compared here, FPCA is the only one that clusters functional curves. Dealing with sparse and irregular longitudinal data is a common challenge in epidemiological studies involving EHR data, particularly when analyzing disease trajectories. Continuous clustering via FPCA offers multiple benefits for understanding viral profiles among PWH. It identifies unique VL trajectory patterns, enabling the stratification of PWH into distinct groups that may reflect different infection stages, treatment responses, or disease progressions. These clusters can correlate with varying risk profiles for comorbidity development or virus transmission, informing targeted public health strategies. Furthermore, continuous clustering aids in the predictive modeling of VL changes, guiding clinical decision making and trial designs. It also informs tailored treatment approaches by understanding how different PWH groups respond to treatment and evolve in VL trajectories, ultimately enhancing patient outcomes.

A good clustering method after extracting FPC scores is also important. Therefore, we also compared the use of EMCluster on FPC scores with K-means clustering on the same scores (Supporting Table 3). While the clusters generated by K-means were less balanced than those from EMCluster (with 74.0% of the data in one cluster), they still performed much better than applying K-means solely on aggregated VL data (which resulted in 97.3% in one cluster).

The clustering results obtained in this study are not only statistically meaningful but also clinically interpretable, which is essential in ensuring the relevance and applicability of the findings. The identified clusters align with what has been observed in clinical practice and other research studies [19, 44–47]. For example, PWH in suboptimal VS group may have very similar VL patterns as intermittent and sustained low-level HIV VR [45]; PWH in viral failure group tend to have similar clinical indicators as PWH with SHVL [19]. Thus, the distinction between four groups is consistent with the known variability in treatment responses among PWH.

While continuous clustering using FPCA is a powerful tool for VL pattern clustering, it is important to acknowledge its limitations. Disease and other biological indicators can affect VL dynamics. To achieve more comprehensive clustering results and clinical understanding, future studies should consider incorporating these confounders, such as longitudinal CD4 counts and comorbidities, into the clustering analysis to account for their influence on VL trajectories. Meanwhile, the study uses a 200 copies/mL threshold as the lower boundary for VL measurements. This threshold may affect the interpretation of VL data and could be a subject of consideration in future research. FPCA can effectively manage sparsity by estimating the underlying functional curves based on the observed data points, without requiring strict regularity in measurements. However, significant irregularity and sparsity continue to affect the effectiveness of the clustering performance [48].

In conclusion, this study highlights the effectiveness of continuous clustering using FPCA in understanding the distinct viral profiles of PWH. The results obtained provide valuable insights for both clinical practice and public health strategies, emphasizing the importance of tailored treatment and targeted interventions based on individualized disease trajectories.

Acknowledgments

The research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI127203.

Data Availability Statement

The authors are prohibited from making individual-level data available publicly due to provisions in our data use agreements with state agencies/data providers, institutional policy, and ethical requirements. To facilitate research, we make access to such data available via approved data access requests through the data owners. The data are unavailable externally or for re-release due to prohibitions in data use agreements with our state agencies or other data providers. Institutional policies stipulate that all external requests for data access require collaboration with a USC researcher. For more information or to make a request, please contact Bankole Olatosi, PhD: Olatosi@mailbox.sc.edu. The underlying analytical codes are available from the authors on request.

Ethics Statement

We did not do experiments/study on/involving humans or human data.

Consent

The authors have nothing to report.

Disclosure

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funders had no role in the design of the study and collection, analysis, and interpretation of the data. The manuscript was presented at CROI 2024 as a poster.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Yunqing Ma and Jiayang Xiao did the analysis part. Yunqing Ma wrote the manuscript. Xueying Yang and Jiajia Zhang gave suggestions about the writing and modified the manuscript. Xiaoming Li, Bankole Olatosi, and Jiajia Zhang supervised all the work.

Funding

The research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI164947.

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

To better show the differences among four clusters from FPCA, we provided the distribution for VL percentiles in Supporting Table 1. Supporting Table 2 shows the distribution for VL characteristics and percentiles, demographics, comorbidities, social behaviors, and historical CD4 count for overall and four clusters from GMM in order to show the advantage of our methods over the GMM clustering. Supporting Table 3 shows the distribution for VL characteristics and percentiles, demographics, comorbidities, social behaviors, and historical CD4 count for overall and four clusters from K-means on the same FPC scores in order to show the advantage of EM Clustering over K-means.

References

  • 1.Nance R. M., Delaney J. C., Simoni J. M., et al. HIV Viral Suppression Trends over Time Among HIV-Infected Patients Receiving Care in the United States, 1997 to 2015. Annals of Internal Medicine . 2018;169(6):376–384. doi: 10.7326/m17-2242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Thaker H. K., Snow M. H. HIV Viral Suppression in the Era of Antiretroviral Therapy. Postgraduate Medical Journal . 2003;79(927):36–42. doi: 10.1136/pmj.79.927.36. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Langford S. E., Ananworanich J., Cooper D. A. Predictors of Disease Progression in HIV Infection: a Review. AIDS Research and Therapy . 2007;4(1):p. 11. doi: 10.1186/1742-6405-4-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Wilson D. P., Law M. G., Grulich A. E., Cooper D. A., Kaldor J. M. Relation between HIV Viral Load and Infectiousness: a Model-Based Analysis. The Lancet . 2008;372(9635):314–320. doi: 10.1016/s0140-6736(08)61115-0. [DOI] [PubMed] [Google Scholar]
  • 5.Cohen M. S., Chen Y. Q., McCauley M., et al. Prevention of HIV-1 Infection with Early Antiretroviral Therapy. New England Journal of Medicine . 2011;365(6):493–505. doi: 10.1056/nejmoa1105243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Palmer A., Gabler K., Rachlis B., et al. Viral Suppression and Viral Rebound Among Young Adults Living with HIV in Canada. Medicine . 2018;97(22):p. e10562. doi: 10.1097/md.0000000000010562. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Fraser C., Hollingsworth T. D., Chapman R., de Wolf F., Hanage W. P. Variation in HIV-1 Set-point Viral Load: Epidemiological Analysis and an Evolutionary Hypothesis. Proceedings of the National Academy of Sciences . 2007;104(44):17441–17446. doi: 10.1073/pnas.0708559104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Arnold E. M., Swendeman D., Harris D., et al. The Stepped Care Intervention to Suppress Viral Load in Youth Living with HIV: Protocol for a Randomized Controlled Trial. JMIR Research Protocols . 2019;8(2):p. e10791. doi: 10.2196/10791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Ghose T., Shubert V., Poitevien V., Choudhuri S., Gross R. Effectiveness of a Viral Load Suppression Intervention for Highly Vulnerable People Living with HIV. AIDS and Behavior . 2019;23(9):2443–2452. doi: 10.1007/s10461-019-02509-5. [DOI] [PubMed] [Google Scholar]
  • 10.Moens K., Siegert R. J., Taylor S., Namisango E., Harding R., Encompass & Impact E. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-saharan African Countries: A Hierarchical Cluster Analysis. PLoS One . 2015;10(5):p. e0126554. doi: 10.1371/journal.pone.0126554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Oster A. M., Lyss S. B., McClung R. P., et al. HIV Cluster and Outbreak Detection and Response: The Science and Experience. American Journal of Preventive Medicine . 2021;61(5):S130–S142. doi: 10.1016/j.amepre.2021.05.029. [DOI] [PubMed] [Google Scholar]
  • 12. Zhu Z., Zhao R., Hu Y. Symptom Clusters in People Living with HIV: A Systematic Review. Journal of Pain and Symptom Management. 2019;58(1):115–133. doi: 10.1016/j.jpainsymman.2019.03.018.
  • 13. Cook P. F., Sousa K. H., Matthews E. E., Meek P. M., Kwong J. Patterns of Change in Symptom Clusters with HIV Disease Progression. Journal of Pain and Symptom Management. 2011;42(1):12–23. doi: 10.1016/j.jpainsymman.2010.09.021.
  • 14. Krebs E., Min J. E., Bayoumi A. M., et al. Informing Targeted Interventions to Optimize the Cascade of HIV Care Using Cluster Analyses of Health Resource Use Among People Living with HIV/AIDS. AIDS and Behavior. 2018;22(1):234–244. doi: 10.1007/s10461-017-1839-x.
  • 15. Farooq S., Weisenthal S. J., Trayhan M., et al. Revealing Patterns in HIV Viral Load Data and Classifying Patients via a Novel Machine Learning Cluster Summarization Method. 2018.
  • 16. Farooq S. A., Weisenthal S. J., Trayhan M., et al. Revealing HIV Viral Load Patterns Using Unsupervised Machine Learning and Cluster Summarization. F1000Res. 2018;7:p. 1144. doi: 10.12688/f1000research.15591.1.
  • 17. Phillips A. N., Staszewski S., Weber R., et al. HIV Viral Load Response to Antiretroviral Therapy According to the Baseline CD4 Cell Count and Viral Load. JAMA. 2001;286(20):2560–2567. doi: 10.1001/jama.286.20.2560.
  • 18. Rose C. E., Gardner L., Craw J., et al. A Comparison of Methods for Analyzing Viral Load Data in Studies of HIV Patients. PLoS One. 2015;10(6):p. e0130090. doi: 10.1371/journal.pone.0130090.
  • 19. Terzian A. S., Bodach S. D., Wiewel E. W., et al. Novel Use of Surveillance Data to Detect HIV-Infected Persons with Sustained High Viral Load and Durable Virologic Suppression in New York City. PLoS One. 2012;7(1):p. e29679. doi: 10.1371/journal.pone.0029679.
  • 20. Achenbach C. J., Buchanan A. L., Cole S. R., et al. HIV Viremia and Incidence of Non-Hodgkin Lymphoma in Patients Successfully Treated with Antiretroviral Therapy. Clinical Infectious Diseases. 2014;58(11):1599–1606. doi: 10.1093/cid/ciu076.
  • 21. Bruyand M., Thiébaut R., Lawson-Ayayi S., et al. Role of Uncontrolled HIV RNA Level and Immunodeficiency in the Occurrence of Malignancy in HIV-Infected Patients during the Combination Antiretroviral Therapy Era: Agence Nationale de Recherche sur le Sida (ANRS) CO3 Aquitaine Cohort. Clinical Infectious Diseases. 2009;49(7):1109–1116. doi: 10.1086/605594.
  • 22. Kowalkowski M., Day R., Du X., Chan W., Chiao E. Cumulative HIV Viremia and Non-AIDS-defining Malignancies Among a Sample of HIV-Infected Male Veterans. JAIDS Journal of Acquired Immune Deficiency Syndromes. 2014;67(2):204–211. doi: 10.1097/qai.0000000000000289.
  • 23. Zoufaly A., Stellbrink H.-J., Heiden M. A. der, et al. Cumulative HIV Viremia during Highly Active Antiretroviral Therapy Is a Strong Predictor of AIDS-Related Lymphoma. The Journal of Infectious Diseases. 2009;200(1):79–87. doi: 10.1086/599313.
  • 24. Sher R., Dlamini S., Muloiwa R. Patterns of Detectable Viral Load in a Cohort of HIV-Positive Adolescents on Antiretroviral Therapy in South Africa. Journal of the International AIDS Society. 2020;23(3):p. e25474. doi: 10.1002/jia2.25474.
  • 25. Clark W. A. V., Avery K. L. The Effects of Data Aggregation in Statistical Analysis. Geographical Analysis. 1976;8(4):428–438. doi: 10.1111/j.1538-4632.1976.tb00549.x.
  • 26. Beretta M., Pelka K., Cusidó J., Lichtenstein T. Quantification of the Information Loss Resulting from Temporal Aggregation of Wind Turbine Operating Data. Applied Sciences. 2021;11(17):p. 8065. doi: 10.3390/app11178065.
  • 27. Kim S. In: Encyclopedia of Gerontology and Population Aging. Gu D., Dupre M. E., editors. Springer International Publishing; 2021. pp. 1251–1255.
  • 28. Mullin S., Zola J., Lee R., et al. Longitudinal K-Means Approaches to Clustering and Analyzing EHR Opioid Use Trajectories for Clinical Subtypes. Journal of Biomedical Informatics. 2021;122:p. 103889. doi: 10.1016/j.jbi.2021.103889.
  • 29. Wang J.-L., Chiou J.-M., Müller H.-G. Functional Data Analysis. Annu. Rev. Stat. Appl. 2016;3(1):257–295. doi: 10.1146/annurev-statistics-041715-033624.
  • 30. Yao F., Müller H.-G., Wang J.-L. Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association. 2005;100(470):577–590. doi: 10.1198/016214504000001745.
  • 31. Pascucci S., Carfora M. F., Palombo A., et al. A Comparison between Standard and Functional Clustering Methodologies: Application to Agricultural Fields for Yield Pattern Assessment. Remote Sensing. 2018;10(4):p. 585. doi: 10.3390/rs10040585.
  • 32. Jacques J., Preda C. Functional Data Clustering: a Survey. Adv Data Anal Classif. 2014;8(3):231–255. doi: 10.1007/s11634-013-0158-y.
  • 33. Yang Q., Jiang M., Li C., Luo S., Crowley M. J., Shaw R. J. Predicting Health Outcomes with Intensive Longitudinal Data Collected by Mobile Health Devices: a Functional Principal Component Regression Approach. BMC Medical Research Methodology. 2024;24(1):p. 69. doi: 10.1186/s12874-024-02193-7.
  • 34. Sullivan P. S., Woodyatt C., Koski C., et al. A Data Visualization and Dissemination Resource to Support HIV Prevention and Care at the Local Level: Analysis and Uses of the AIDSVu Public Data Resource. Journal of Medical Internet Research. 2020;22(10):p. e23173. doi: 10.2196/23173.
  • 35. Olatosi B., Zhang J., Weissman S., Hu J., Haider M. R., Li X. Using Big Data Analytics to Improve HIV Medical Care Utilisation in South Carolina: A Study Protocol. BMJ Open. 2019;9(7):p. e027688. doi: 10.1136/bmjopen-2018-027688.
  • 36. CDC. HIV Surveillance Report. 2018.
  • 37. Zhang J., Olatosi B., Yang X., et al. Studying Patterns and Predictors of HIV Viral Suppression Using A Big Data Approach: a Research Protocol. BMC Infectious Diseases. 2022;22(1):p. 122. doi: 10.1186/s12879-022-07047-5.
  • 38. Lin N., Jiang J., Guo S., Xiong M. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis. PLoS One. 2015;10(7):p. e0132945. doi: 10.1371/journal.pone.0132945.
  • 39. Hartigan J. A., Wong M. A. Algorithm AS 136: A K-Means Clustering Algorithm. Applied Statistics. 1979;28(1):100–108. doi: 10.2307/2346830.
  • 40. Murtagh F., Contreras P. Algorithms for Hierarchical Clustering: an Overview. WIREs Data Mining and Knowledge Discovery. 2011;2(1):86–97. doi: 10.1002/widm.53.
  • 41. Yang M.-S., Lai C.-Y., Lin C.-Y. A Robust EM Clustering Algorithm for Gaussian Mixture Models. Pattern Recognition. 2012;45(11):3950–3961. doi: 10.1016/j.patcog.2012.04.031.
  • 42. Chen W.-C., Maitra R., Melnykov V., Nettleton D., Faden D., Rostamian R. EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution. 2015.
  • 43. Leroy A., Marc A., Dupas O., Rey J. L., Gey S. Functional Data Analysis in Sport Science: Example of Swimmers’ Progression Curves Clustering. Applied Sciences. 2018;8(10):p. 1766. doi: 10.3390/app8101766.
  • 44. de Jong M. D., Simmons C. P., Thanh T. T., et al. Fatal Outcome of Human Influenza A (H5N1) Is Associated with High Viral Load and Hypercytokinemia. Nature Medicine. 2006;12(10):1203–1207. doi: 10.1038/nm1477.
  • 45. Greub G., Cozzi-Lepri A., Ledergerber B., et al. Intermittent and Sustained Low-Level HIV Viral Rebound in Patients Receiving Potent Antiretroviral Therapy. AIDS. 2002;16(14):1967–1969. doi: 10.1097/00002030-200209270-00017.
  • 46. Viard J.-P., Burgard M., Hubert J.-B., et al. Impact of 5 Years of Maximally Successful Highly Active Antiretroviral Therapy on CD4 Cell Count and HIV-1 DNA Level. AIDS. 2004;18(1):45–49. doi: 10.1097/00002030-200401020-00005.
  • 47. Ylitalo N., Sørensen P., Josefsson A. M., et al. Consistent High Viral Load of Human Papillomavirus 16 and Risk of Cervical Carcinoma In Situ: a Nested Case-Control Study. The Lancet. 2000;355(9222):2194–2198. doi: 10.1016/s0140-6736(00)02402-8.
  • 48. Guo S., Zhang J., Wu Y., et al. Functional Multivariable Logistic Regression with an Application to HIV Viral Suppression Prediction. Biometrical Journal. Biometrische Zeitschrift. 2024;66(5):p. e202300081. doi: 10.1002/bimj.202300081.

Supplementary Materials

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

5890464.f1.docx (73.6KB, docx)

Data Availability Statement

The authors are prohibited from making individual-level data available publicly due to provisions in our data use agreements with state agencies/data providers, institutional policy, and ethical requirements. To facilitate research, we make access to such data available via approved data access requests through the data owners. The data are unavailable externally or for re-release due to prohibitions in data use agreements with our state agencies or other data providers. Institutional policies stipulate that all external requests for data access require collaboration with a USC researcher. For more information or to make a request, please contact Bankole Olatosi, PhD: Olatosi@mailbox.sc.edu. The underlying analytical code is available from the authors on request.


Articles from AIDS Research and Treatment are provided here courtesy of Wiley