PLOS Global Public Health. 2023 Jul 19;3(7):e0002105. doi: 10.1371/journal.pgph.0002105

Historical visit attendance as predictor of treatment interruption in South African HIV patients: Extension of a validated machine learning model

Rachel T Esra 1,2,*, Jacques Carstens 3, Janne Estill 1, Ricky Stoch 4, Sue Le Roux 5, Tonderai Mabuto 5, Michael Eisenstein 5, Olivia Keiser 1, Mhari Maskew 6, Matthew P Fox 6,7, Lucien De Voux 3, Kieran Sharpey-Schafer 3
Editor: Hannah Hogan Leslie
PMCID: PMC10355459  PMID: 37467217

Abstract

Retention of antiretroviral treatment (ART) patients is a priority for achieving HIV epidemic control in South Africa. While machine learning methods are increasingly utilised to identify high-risk populations for suboptimal HIV service utilisation, they are limited in terms of explaining relationships between predictors. To further understand these relationships, we implemented both machine learning methods optimised for predictive power and traditional statistical methods. We used routinely collected electronic medical record (EMR) data to evaluate longitudinal predictors of loss to follow-up (LTFU) and temporal interruptions in treatment (IIT) in the first two years of treatment for ART patients in the Gauteng and North West provinces of South Africa. Of the 191,162 ART patients and 1,833,248 visits analysed, 49% of patients experienced at least one IIT and 85% of those returned for a subsequent clinical visit. Patients iteratively transition in and out of treatment, indicating that ART retention in South Africa is likely underestimated. Historical visit attendance is shown to be predictive of IIT using machine learning, log binomial regression and survival analyses. Using a previously developed categorical boosting (CatBoost) algorithm, we demonstrate that historical visit attendance alone is able to predict almost half of next missed visits. With the addition of baseline demographic and clinical features, this model is able to predict up to 60% of next missed ART visits with a sensitivity of 61.9% (95% CI: 61.5–62.3%), specificity of 66.5% (95% CI: 66.4–66.7%), and positive predictive value of 19.7% (95% CI: 19.5–19.9%). While full use of this model is relevant for settings where infrastructure exists to extract EMR data and run computations in real time, historical visit attendance alone can be used to identify those at risk of disengaging from HIV care in the absence of other behavioural or observable risk factors.

Introduction

While South Africa has the largest HIV treatment programme globally, it is currently estimated that a quarter of the 7.5 million people living with HIV (PLHIV) are not on antiretroviral treatment (ART) [1]. ART is lifelong and stopping treatment results in rapid viral rebound, putting patients at individual risk of AIDS-defining illness and increasing the risk of viral transmission [2]. Retention on ART remains a challenge in South Africa, where 11–28% of patients become lost to follow-up (LTFU) within the first two years of treatment initiation [3, 4].

ART treatment interruption in South Africa is likely mediated by a complex mix of socio-behavioural factors including mobility, stigma and health facility access [5, 6]. Cohort studies indicate that the risk of LTFU varies over time [7, 8] and many patients iteratively transition in and out of treatment [9], making behavioural drivers of ART retention difficult to define longitudinally. Without socio-behavioural information linked to routine HIV management, many retention interventions are focused on broad demographic sub-populations with perceived elevated rates of LTFU, including men, those diagnosed with HIV at younger ages and those initiating treatment with lower CD4 counts [10, 11]. However, little evidence supports the effectiveness of this approach [12, 13].

Innovative approaches to understanding and addressing risk of disengagement from HIV care are needed. Traditional statistical methods such as regression and survival analysis are frequently used to enumerate factors that describe elevated risk of LTFU [7–11]. Though widely adopted due to their ease of computation and explainability, these methods are limited in terms of accurately modelling collinearity, interaction effects and non-linear relationships between predictors [14] and are thus unable to uncover the complex mechanisms driving risk of disengagement from care. In contrast to this, machine learning methods are able to account for non-linear patterns often present in routinely collected observational data, and are increasingly being used to identify high risk subgroups of populations with suboptimal HIV service utilisation in low- and middle-income contexts [15–18].

We have previously described a machine learning algorithm able to predict up to two thirds of missed ART clinic visits using only visit attendance and routinely collected clinical information [16, 17]. In this model, patterns of historical visit attendance ranked higher than baseline demographic and clinical characteristics when predicting next missed visits [17]. While this model is able to predict the risk of disengagement from care at the level of an individual patient and visit, the approach is still limited in terms of its ability to infer relationships between predictors and to interpret both the individual and relative roles of potential predictors of treatment interruptions [17].

Previously, we identified 13 predictors for ART treatment discontinuation relating to age, baseline clinical characteristics and patterns of visits attendance from routinely collected ART patient records [17]. Here, we aim to expand the explainability of these predictors as a means of providing more generalised descriptions of the population at risk of IIT and the underlying drivers of risk. We assess the relative contribution of historical visit attendance in predicting risk of treatment interruption, by defining mutually exclusive and collectively exhaustive visit attendance archetypes encompassing this information in a single categorical variable. We then evaluate the predictive ability of the archetypes alone and in combination with the previously identified demographic and clinical predictors using both machine learning and traditional statistical methods.

Methods

Ethics

This study utilises routinely collected patient record data from the TIER.net electronic medical register (EMR), consisting of patient-level data collected at public health facilities providing HIV care and treatment to the public sector in South Africa [19]. Data extraction, data anonymization and data management were approved by the University of Witwatersrand (Human Research Ethics Committee, Reference: 210106). Data extraction and anonymisation was performed by collaborators from The Aurum Institute South Africa, a not-for-profit organisation funded by the President’s Emergency Plan for Aids Relief (PEPFAR) to support the implementation and improvement of ART services at the health facilities included in this study. The use of de-identified routine programme data to identify areas for quality improvement efforts is standard practice in South Africa, and critical for achieving the country’s goals to control the HIV epidemic.

Data sources and study participants

Our cohort included patients receiving ART at facilities in the Gauteng and North West provinces of South Africa. We included patient records from 1 January 2017, after the implementation of the treatment for all policy, whereby ART is initiated for HIV patients in South Africa regardless of HIV disease progression [20]. We included all patients newly initiated onto ART from the study start date to 24 March 2022, aged 15 years and older at ART initiation, with a minimum of 18 months of observation time. Based on cohort data indicating that the risk of LTFU stabilises after two years on treatment [21], person time was censored at two years after ART initiation. From the 264,635 patients who matched our inclusion criteria, we excluded patients who had died (3%, N = 8,028) or had transferred out to other facilities (23%, N = 61,775). We additionally excluded patients with records flagged as poor quality, including patients confirmed as LTFU at visits prior to their final visit on record (N = 805) and patients with an HIV diagnosis recorded after the ART start date (N = 1).
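As a rough illustration of this cohort construction, the pandas sketch below applies the inclusion and exclusion rules described above to a hypothetical patient-level extract; all file and column names (for example months_observed, transferred_out) are placeholders, not the actual TIER.net schema.

```python
import pandas as pd

# Hypothetical patient-level extract; column names are illustrative only.
patients = pd.read_csv(
    "tier_net_patients.csv",
    parse_dates=["art_start_date", "hiv_diagnosis_date"],
)

study_start = pd.Timestamp("2017-01-01")
study_end = pd.Timestamp("2022-03-24")

# Inclusion: newly initiated in the study window, aged 15+ at initiation,
# with at least 18 months of potential observation time.
cohort = patients[
    patients["art_start_date"].between(study_start, study_end)
    & (patients["age_at_initiation"] >= 15)
    & (patients["months_observed"] >= 18)
]

# Exclusions: deaths, documented transfers out, and poor-quality records
# (LTFU flagged before the final visit, or diagnosis recorded after ART start).
cohort = cohort[~cohort["died"] & ~cohort["transferred_out"]]
cohort = cohort[~cohort["flagged_poor_quality"]]
cohort = cohort[cohort["hiv_diagnosis_date"] <= cohort["art_start_date"]]
```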

Measures

Operational definition of outcomes

For the purpose of this study, we considered different operational definitions of treatment interruption on the individual patient level. We assessed longitudinal treatment attendance on a visit by visit basis, by classifying each visit in a patient’s visit trajectory as an interruption in treatment (IIT) if the visit was attended more than 28 days after the scheduled visit date [16, 17, 22]. On a patient level, we investigated the relationship between the longitudinal pattern of visit attendance and a final outcome of patient retention, where patients were considered LTFU if they were 90 days or more late for a scheduled visit at the end of our observation period in accordance with the South African Department of Health guidelines [20].
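A minimal sketch of these two operational definitions, assuming a visit-level table with scheduled and actual attendance dates and a per-patient censoring date; column names are illustrative and not taken from the study code.

```python
import pandas as pd

# Hypothetical schema: one row per visit with patient ID, scheduled date,
# actual attendance date (NaT if never attended) and a per-patient censor date.
visits = pd.read_csv(
    "visits.csv", parse_dates=["scheduled_date", "attended_date", "censor_date"]
)

# Visit-level outcome: interruption in treatment (IIT) if the visit was
# attended more than 28 days after the scheduled date.
days_late = (visits["attended_date"] - visits["scheduled_date"]).dt.days
visits["iit"] = days_late > 28

# Patient-level outcome: LTFU if the last scheduled visit in the observation
# window is missed by 90 days or more.
last = visits.sort_values("scheduled_date").groupby("patient_id").tail(1).copy()
overdue_days = (last["censor_date"] - last["scheduled_date"]).dt.days
last["ltfu"] = last["attended_date"].isna() & (overdue_days >= 90)
```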

Visit attendance archetypes

In our previous work, variables describing historical visit attendance, including the ratio of visits attended late vs. visits attended on time and the number of historical IITs, were shown to be more important in predicting next missed visits than baseline demographic and clinical features [17]. Based on these results and input from the clinical and programme teams at The Aurum Institute, we developed mutually exclusive and collectively exhaustive visit attendance archetypes that describe historical visit attendance in a single categorical variable (Fig 1). We defined visits attended within 14 days of a scheduled appointment as on time and visits attended between 14 and 28 days after a scheduled appointment as late (Fig 1). Using these definitions of visits attended on time, visits attended late and IITs, we defined the visit archetypes illustrated and described in Fig 1 and Table 1.

Fig 1. Visit archetypes based on longitudinal patterns of ART visit attendance as described in Table 1.


Archetypes are mutually exclusive, collectively exhaustive and defined by the historical pattern of interruptions in treatment (red), late attendance (orange) and visits attended on time (green). The analysis focuses on how these historical patterns predict attendance at the next visit in the time series (grey).

Table 1. Visit archetype definitions based on longitudinal patterns of ART visit attendance.

Category / Visit archetype: Definition
Adherent / On time: Current visit and previous visit attended on time, OR current visit attended on time and previous visit attended late
Late / Late Once: Current visit attended late and previous visit attended on time
Late / Late Twice: Current visit attended late and previous visit attended late
Interrupter / First time interrupter: First IIT (visit attended >28 days after scheduled visit date) after last visit attended on time
Interrupter / First time interrupter late previously: First IIT (visit attended >28 days after scheduled visit date) after last visit attended late
Interrupter / Repeat interrupter: Visit attended >28 days after scheduled visit date, patient has a historical IIT and last visit was attended on time
Interrupter / Repeat interrupter late previously: Visit attended >28 days after scheduled visit date, patient has a historical IIT and last visit was attended late
Interrupter / IIT twice: Visit attended >28 days after scheduled visit date and previous visit was an IIT
Returning defaulter / Visit after IIT on time: Visit attended on time where previous visit was an IIT
Returning defaulter / Visit after IIT late: Visit attended late where previous visit was an IIT
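To make the archetype definitions concrete, the sketch below assigns the Table 1 categories from per-visit status labels ('on_time', 'late', 'iit') plus a flag for any earlier IIT; the function and its labels are illustrative and not taken from the study code.

```python
def visit_archetype(current: str, previous: str, has_prior_iit: bool) -> str:
    """Return the Table 1 archetype for the current visit, given the current
    and previous visit statuses ('on_time', 'late' or 'iit') and whether the
    patient has any earlier IIT on record. Labels are illustrative."""
    if current == "on_time":
        return "Visit after IIT on time" if previous == "iit" else "On time"
    if current == "late":
        if previous == "iit":
            return "Visit after IIT late"
        return "Late Twice" if previous == "late" else "Late Once"
    # current == "iit"
    if previous == "iit":
        return "IIT twice"
    if has_prior_iit:
        return ("Repeat interrupter late previously" if previous == "late"
                else "Repeat interrupter")
    return ("First time interrupter late previously" if previous == "late"
            else "First time interrupter")

# Example: an IIT following a late visit, with no earlier IIT on record.
print(visit_archetype("iit", "late", has_prior_iit=False))
# -> First time interrupter late previously
```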

Data analysis

As described previously, clearly describing and explaining relationships between predictors in machine learning algorithms is difficult due to non-linearity and collinearity [17]. While our previous work ranked historical visit attendance highly in predicting next missed ART visits, using machine learning methods alone we are unable to assess the contribution of historical visit attendance relative to the other variables included. Here we evaluate the individual and combined predictive ability of historical visit attendance and baseline demographic and clinical characteristics, comparing traditional statistical approaches with machine learning methods.

Description of baseline and time varying risk factors

Descriptive statistics were used to characterise the demographic and clinical profile of patients at baseline and/or at specific time points after ART initiation. We evaluated the demographic and clinical patient characteristics previously identified as predictive of IIT, including sex, age at ART initiation and baseline CD4 count [17]. To adjust for changes in ART service delivery due to the Covid-19 pandemic [23], we included a binary variable indicating whether ART initiation preceded or occurred during the national lockdown starting on 27 March 2020.

Defining baseline as the time of treatment initiation, we identified baseline risk factors using multivariable log binomial regression. We evaluated two separate outcomes, the risk of IIT and the risk of LTFU by the end of our observation period. For the latter, we included only patients who had two or more clinical visits. For all analyses, we report both sex aggregated and sex stratified estimates.
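These regressions were fitted in R; purely as an illustration, the statsmodels sketch below fits a log-binomial model in Python, where exponentiated coefficients are adjusted risk ratios. The formula and column names are hypothetical.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical patient-level frame with a binary `ltfu` outcome and baseline covariates.
fit = smf.glm(
    "ltfu ~ male + age_group + baseline_cd4_group + initiated_during_lockdown",
    data=patients,
    family=sm.families.Binomial(link=sm.families.links.Log()),
).fit()

# Exponentiated coefficients and confidence limits give adjusted risk ratios.
# Log-binomial models can fail to converge; a Poisson model with robust
# standard errors is a common fallback in that case.
print(np.exp(fit.params))
print(np.exp(fit.conf_int()))
```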

Second, we evaluated the impact of previous visit attendance on the time-varying risk of IIT and LTFU using a semi-parametric mixed-effects Cox proportional hazards model. We included previous visit archetypes (Fig 1) and covariates identified as significant in our previous analysis in a model specified as:

h_i(t) = h_0(t) \exp(b_i + \beta^\top x_i + \gamma^\top z_i(t)),

where the hazard h_i(t) of the event occurring at time t for individual i is the product of the baseline hazard h_0(t), an exponentiated random effect b_i for unobserved individual variance, and a linear function of predictors that may be time-invariant (x_i, e.g. sex) or time-varying (z_i(t), e.g. previous visit attendance). This semi-parametric extension of the Cox proportional hazards model relaxes the proportional hazards assumption through the inclusion of time-varying covariates, accounting for within-subject correlation whereby the occurrence of an event may affect the occurrence of future events. We additionally included an individual-level random effect describing unmeasured heterogeneity in excess risk for clusters of individuals that cannot be explained by the observed covariates. In this recurrent event analysis, each IIT experienced was recorded as an event and patients who did not experience an IIT were censored at the end of the two-year observation period. All statistical analyses were run in R version 4.2.1.
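A partial sketch of the recurrent-event analysis using lifelines, assuming the visit history has been reshaped into counting-process (start/stop) intervals with illustrative column names. CoxTimeVaryingFitter handles time-varying covariates and recurrent IIT events, but it does not fit the individual-level random effect described above (the authors fitted the full mixed-effects model in R), so the frailty term is omitted here.

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Hypothetical counting-process data: one row per interval between visits,
# with `start`/`stop` in days since ART initiation, baseline covariates,
# the previous-visit archetype, and `event` = 1 if the interval ends in an IIT.
intervals = pd.read_csv("iit_intervals.csv")

ctv = CoxTimeVaryingFitter()
ctv.fit(
    intervals,
    id_col="patient_id",
    event_col="event",
    start_col="start",
    stop_col="stop",
)
ctv.print_summary()  # exp(coef) column gives hazard ratios for each covariate
```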

Inclusion of visit archetypes in machine learning model

We have previously developed and validated a machine learning model predicting missed ART visits using baseline characteristics, historical visit attendance, clinical data and ART dispensing information constructed from the same South African EMR source [17]. Here we compare the performance of the model using the original set of 13 predictors, the original set of 13 predictors with previous visit archetypes, and previous visit archetypes alone. We apply the model to the same dataset with an extended study period. Model validation, feature engineering and feature selection have been described previously [17]. Briefly, we randomly split 70% of visits into a training dataset (N = 1,833,248 visits) with the remaining 30% (N = 456,472 visits) reserved to act as an unseen test dataset. The training dataset was upsampled using the RandomOverSampler method from imblearn [24] to build a 50:50 balanced dataset. We implemented a gradient boosting model using the CatBoost algorithm [25]. The model was run for 1000 iterations using the model training parameters summarised previously [17].
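A minimal sketch of the training pipeline described above: a random 70/30 visit-level split, RandomOverSampler from imblearn to balance the training data 50:50, and a CatBoost classifier run for 1000 iterations. Feature and column names are placeholders, and the remaining training parameters (given in full in [17]) are omitted.

```python
from catboost import CatBoostClassifier
from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split

# `X` holds the visit-level predictors (the 13 original features and/or the
# previous-visit archetype) and `y` flags whether the next visit was an IIT.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Upsample the minority class so the training set is balanced 50:50.
X_bal, y_bal = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

cat_features = ["sex", "previous_visit_archetype"]  # illustrative categorical columns
model = CatBoostClassifier(iterations=1000, cat_features=cat_features, verbose=100)
model.fit(X_bal, y_bal)
```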

Model performance was assessed using metrics to ascertain the ability to classify both the positive and negative outcomes. These included positive predictive value (PPV, the proportion of predicted missed visits that were truly missed) and negative predictive value (NPV, the proportion of predicted attended visits that were truly attended). We additionally evaluated the overall model performance, reporting the Area Under the Precision-Recall Curve (PR AUC, demonstrating model sensitivity and PPV at different classification thresholds), accuracy (total proportion of correctly identified visits) and F1 score (harmonic mean of overall model precision and recall). We constructed 95% confidence intervals using bootstrap resampling. We resampled the test dataset with replacement n = 1000 times, while the training set and model remained fixed. Feature importance was calculated using the Loss Function Change from CatBoost [25].
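Continuing the sketch above, a rough illustration of the evaluation: the fitted model and training set stay fixed while the test set is resampled with replacement 1000 times to obtain percentile confidence intervals, and LossFunctionChange importances are requested from CatBoost. Variable names carry over from the previous sketch and remain illustrative.

```python
import numpy as np
from catboost import Pool
from sklearn.metrics import confusion_matrix

y_true = np.asarray(y_test)
y_pred = (model.predict_proba(X_test)[:, 1] >= 0.5).astype(int)

def summarise(truth, pred):
    tn, fp, fn, tp = confusion_matrix(truth, pred, labels=[0, 1]).ravel()
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}

rng = np.random.default_rng(42)
boot = []
for _ in range(1000):  # resample the test set only; the model stays fixed
    idx = rng.integers(0, len(y_true), len(y_true))
    boot.append(summarise(y_true[idx], y_pred[idx]))

ci = {m: np.percentile([b[m] for b in boot], [2.5, 97.5]) for m in boot[0]}
print(ci)

# Feature importance by Loss Function Change requires an evaluation Pool.
importance = model.get_feature_importance(
    Pool(X_test, y_true, cat_features=cat_features), type="LossFunctionChange"
)
```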

Comparison of survival analysis to machine learning predictions

For each visit in the test dataset, the model above calculates a probability that an IIT will occur at the next visit. If the probability is higher than 0.50, the visit is assigned an outcome of predicted IIT. Model predictions are compared to the occurrence of the outcome in the dataset and the model metrics are calculated accordingly. We then compared the relationship between previous visit type and the predicted probabilities of IIT produced by the machine learning model to the hazard ratios produced by the survival analysis in the first part of this study.
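Continuing the same sketch, this comparison can be illustrated by summarising the model's predicted probabilities and 0.50-threshold predictions by previous-visit archetype, to set against the hazard ratios from the survival analysis; column names remain illustrative.

```python
summary = X_test.copy()
summary["p_iit"] = model.predict_proba(X_test)[:, 1]
summary["predicted_iit"] = summary["p_iit"] >= 0.50

# Mean predicted risk and share of predicted IITs for each previous-visit
# archetype, for comparison with the adjusted hazard ratios in S2 Table.
by_archetype = (
    summary.groupby("previous_visit_archetype")[["p_iit", "predicted_iit"]]
    .mean()
    .sort_values("p_iit", ascending=False)
)
print(by_archetype)
```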

Results

Cohort characteristics

Our cohort included 191,162 patients, of whom 63% were women; the median age at ART initiation was 34 years (IQR: 28–41). Despite our study beginning after the implementation of treatment for all policies [20], only 55% of patients were initiated onto treatment on the same day as HIV diagnosis, 20% within a week of HIV diagnosis, 9% within two weeks of diagnosis and 16% two or more weeks after HIV diagnosis.

Baseline and time-varying risk factors for LTFU

Using the definition of a last scheduled appointment missed by 90 days or more, 38.8% (N = 73,978) of patients were defined as LTFU within two years of ART initiation. Of those who became LTFU, 24.5% (N = 18,568) did not return to treatment after initiation, 25.5% (N = 18,878) became LTFU within the first six months of treatment, 18.9% (N = 13,960) became LTFU between six months and one year, and the remaining 31.1% (N = 23,016) dropped out in the second year of treatment (Fig 2).

Overall, men were at a higher risk of LTFU after initiation (RR: 1.19 [95% CI: 1.15–1.23]) and within the first two years of treatment (RR: 1.07 [95% CI: 1.02–1.12]) (S1 Table). Risk of LTFU was lower for those initiated during Covid-19 lockdowns relative to those initiated prior (S1 Table). This effect was consistent in the aggregated and sex-stratified analyses, both after initiation (RR: 1.09 [95% CI: 1.04–1.14]) and within the first two years of treatment (RR: 2.01 [95% CI: 1.92–2.11]). Similarly, the risk of LTFU decreased over time on treatment for both men and women (S1 Table).

Longitudinal risk factors for IIT

Distribution of visit archetypes. During the first two years of treatment, 49% (N = 95,581) of patients who attended at least one additional visit after initiation experienced at least one IIT. Of the 1,778,074 visits observed, 75% were attended on time, 14% were attended late, 7% were defined as IIT and 4% occurred after an interruption in treatment. Based on our operational definition of visit attendance (Fig 1), visits were classified into the mutually exclusive and collectively exhaustive visit archetypes defined in Table 2.

Table 2. Characteristics of current visit archetypes based on longitudinal patterns of ART visit attendance in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

Category / Visit archetype: % of total visits; % next visit IIT
Adherent / On time: 75.31% (N = 1,309,143); 6.30%
Late / Late Once: 10.63% (N = 184,827); 8.74%
Late / Late Twice: 2.89% (N = 50,280); 11.85%
Interrupter / First time interrupter: 3.48% (N = 60,553); 13.48%
Interrupter / First time interrupter late previously: 0.81% (N = 14,154); 17.54%
Interrupter / Repeat interrupter: 0.96% (N = 16,639); 15.22%
Interrupter / Repeat interrupter late previously: 0.41% (N = 7,178); 21.13%
Interrupter / IIT twice: 0.82% (N = 14,276); 23.48%
Returning defaulter / Visit after IIT on time: 3.53% (N = 61,291); 11.15%
Returning defaulter / Visit after IIT late: 1.15% (N = 20,022); 16.91%

While the overall rate of return to treatment was lower than the rate of interruption in treatment (Fig 2), 85% of patients returned for a subsequent ART visit after an interruption in treatment. After an initial peak due to those who drop out of treatment after initiation, rates of IIT consistently increase with time on treatment, with marked declines at the one-year and two-year time points (Fig 2).

Fig 2. Longitudinal ART clinic visit attendance in the first two years of antiretroviral therapy in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.


Purple dots represent the proportion of patients who do not return to treatment after treatment initiation. Green lines represent the monthly proportion of visits attended more than 28 days after a scheduled appointment (IIT) and yellow lines represent the proportion of monthly visits attended by patients within 28 days of a scheduled visit date after previously experiencing an IIT (return after IIT).

As with LTFU, age at ART initiation and baseline CD4 count were not shown to be predictive of the risk of IIT (Fig 3, S2 Table). Previous visit attendance was associated with the risk of experiencing an IIT, with both late previous visit attendance and a historical IIT increasing the hazard of the next visit being an IIT (Fig 3, S2 Table).

Fig 3. Adjusted survival analysis of baseline and longitudinal risk factors for interruption in treatment in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.


Results from the semi-parametric extension of the Cox model are summarised as hazard ratios (boxes) and 95% confidence intervals (whiskers). Colours denote reference (black) and comparator (grey) groups for categorical variables and size denotes the number of observations in each variable group.

Relating linear risk factors to machine learning model predictions

When using the original set of 13 predictors, model performance decreased relative to previous iterations [17] when trained and tested with health records collected during and after Covid-19 lockdown measures (Table 3). While sensitivity remained similar, with both models able to predict approximately 62% of next missed visits, PPV decreased by 2 percentage points, such that 17.5% of next visits labelled as missed were truly missed (Table 3). Model performance was not improved by the addition of a single categorical predictor describing previous visit archetypes (Table 3). In comparison to the full model containing information on historical visit attendance, baseline demographics and clinical features, a model using only previous visit archetypes was able to correctly predict almost half of next missed visits, with a small decrease in precision (PPV of 16.5%).

Table 3. Model performance for prediction of interruption in treatment in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

We compared the performance of (A) the top 13 predictors from the validated CatBoost model, (B) the addition of previous visit archetypes to the validated CatBoost model and (C) a model using only previous visit archetypes as predictor.

Model: (A) CatBoost, top 13 predictors; (B) CatBoost, top 13 predictors + archetypes; (C) CatBoost, archetypes only
Cohort: (A) 1 Jan 2017 to 31 March 2020; (B) 1 Jan 2017 to 24 March 2022; (C) 1 Jan 2017 to 24 March 2022
Patients: (A) 136,082; (B) 191,162; (C) 191,162
Visits: (A) 1,494,728; (B) 1,833,248; (C) 1,833,248
Socio-demographic predictors: (A, B) Age, Sex
Clinical predictors: (A, B) ART regimen duration, Viral load count, Last VL Value
Historical visit attendance predictors: (A, B) Visit count, Months since last visit, Months since first visit, Next visit: day of month, Next visit: day of week, 3 days late ratio, 28 days late count, # Months missed Tx; (B, C) Previous visit archetype
N visits train (% IIT): (A) 1,833,248 (50%); (B) 4,638,562 (50%); (C) 4,638,562 (50%)
N visits test (% IIT): (A) 456,472 (12%); (B) 456,472 (10%); (C) 456,472 (10%)
Sensitivity: (A) 61.9% (61.5–62.3%); (B) 62.4% (62.2–62.7%); (C) 48.1% (47.9–48.4%)
Specificity: (A) 66.5% (66.4–66.7%); (B) 66.6% (66.6–66.7%); (C) 72.4% (72.4–72.5%)
PPV: (A) 19.7% (19.5–19.9%); (B) 17.5% (17.4–17.6%); (C) 16.5% (16.4–16.6%)
NPV: (A) 93% (92.9–93%); (B) 94% (93.9–94%); (C) 92.5% (92.4–92.5%)
F1 score: (A) 0.299 (0.296–0.301); (B) 0.274 (0.272–0.275); (C) 0.246 (0.244–0.248)
ROC AUC: (A) 0.692 (0.69–0.695); (B) 0.697 (0.695–0.698); (C) 0.625 (0.623–0.627)

We ranked previous visit archetypes based on the hazard ratios calculated in S2 Table, and evaluated how these results related to the risk of IIT predicted by the CatBoost model (Fig 4). We found that, relative to previous on-time visits, late previous visit attendance or a historical IIT was associated with a prediction of IIT in the machine learning model. Subsequent late visit attendance and/or IITs were strongly associated with elevated risk of IIT in both the machine learning and adjusted Cox models. While associated with a relatively smaller increase in the risk of IIT compared to other visit archetypes in the survival analysis, a late visit where the previous visit was an IIT often preceded an IIT in the machine learning predictions. Conversely, a single late visit, shown to confer a 20% elevated hazard of IIT in the survival analysis, was not a strong predictor of IIT in the machine learning model.

Fig 4. CatBoost predicted probability of IIT for all visits summarised by previous visits type in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.


Hazard ratios for each visit archetype from the adjusted survival model for all patients (S2 Table) are labelled above.

Discussion

ART patient retention is a priority for achieving HIV epidemic control in South Africa. To design effective intervention strategies, there is a need for more precise descriptions of longitudinal changes in the risk of LTFU as well as of the characteristics of those who disengage from treatment [26]. We have previously reported that a machine learning model informed by historical visit attendance, baseline demographics and clinical risk factors is able to predict up to two-thirds of next missed ART clinic visits [17]. Here, we demonstrate that historical visit attendance alone is able to predict up to half of next missed visits and is predictive of IIT using both machine learning and traditional statistical methods.

Depending on the programme context, the operational definition of LTFU ranges from 28 to 90 days out of treatment [20, 22]. Given inconsistent definitions of LTFU and the inability to account for undocumented patient transfers between clinics and undocumented mortality, true rates of LTFU in South Africa are difficult to quantify [27, 28]. A cohort study involving intensive retrospective contact tracing of patients who discontinued ART at a KwaZulu-Natal clinic found that only 14% of patients marked as LTFU were truly unaccounted for [10]. Using a definition of 90 days out of treatment, we observed that almost 40% of patients became LTFU within the first two years of treatment and that half of treatment discontinuation occurred within the first six months of treatment. When assessing trends in IIT, we observed that 85% of patients who missed a visit by more than 28 days returned for a subsequent visit, implying that cross-sectional estimates of LTFU are not a good indicator of current treatment coverage. We observed temporal landmarks in ART visit attendance, with rates of IIT being lowest at the one-year and two-year landmarks. This demonstrates that ART retention and treatment engagement are dynamic processes and that current approaches that do not consider temporal trends are not appropriate for characterising gaps in care over time [29]. Understanding this is critical for informing cross-sectional estimates of treatment coverage, given the large variation in the sensitivity and specificity of current methods to assess ART treatment adherence [30].

Collinearity and the non-linear nature of predictors in our previously validated machine learning model limit the explainability of risk factors identified as predictive of IIT [17], and therefore the ability to understand drivers of risk at the individual level and intervene accordingly before a treatment interruption occurs. While the inclusion of demographic and clinical features improves model performance, we have demonstrated with both machine learning and traditional statistical methods that historical visit attendance alone is a strong predictor of IIT. Furthermore, lateness and repeat patterns of lateness can predict IIT, irrespective of current age, age at ART initiation, sex and baseline CD4 count. The use of machine learning models such as this is limited to clinical settings where infrastructure exists to extract EMR data and run computations in real time. In settings where this is not possible, patient archetypes based on historical visit attendance may be used to triage patient retention interventions.

These findings are aligned with results observed in historical cohort studies reporting that the timeliness of clinic attendance is a good predictor of viral load suppression and the development of ART resistance [31, 32]. In the absence of observable risk factors, we believe lateness is an actionable behavioural flag for a patient that may become LTFU in future but is currently present at a healthcare access point. This finding may inform patient retention strategies by identifying patients who are good candidates for prioritised interventions—those who are demonstrating a willingness to be on treatment and experiencing some external barrier. This creates the potential for targeted proactive intervention, as opposed to resource intensive retrospective tracing.

While there is little quantitative evidence detailing the effectiveness of individual retention interventions in South Africa [5, 6], modelling studies have demonstrated that improving ART retention is cost saving even at low levels of effectiveness relative to alternatives for HIV spending [33]. A recent systematic review found that ART retention in South Africa under standard care was similar to that under 37 direct-service-delivery treatment interventions, including facility-based individual models, out-of-facility-based individual models, client-led groups, and healthcare worker-led groups [5]. The direct effectiveness of retention interventions is difficult to quantify given that they are often implemented as part of multifaceted HIV service provision and retrospectively evaluated in an observational study framework [6]. While our work provides some visibility into ART patients who are at a high risk of LTFU, more work needs to be done to evaluate how the risk cohorts defined here can be effectively matched with appropriate retention or directed service delivery treatment modalities.

In previous applications of the IIT model, we censored data at the end of March 2020, as we were unable to account for interruptions in ART service delivery from the onset of the Covid-19 pandemic [17]. Here, we extended the application of the model to March 2022 and observed a moderate decrease in model performance despite doubling the size of the training dataset. Our regression analysis demonstrated that those initiated on ART during Covid-19 lockdowns were at a reduced risk of longitudinal IIT and eventual LTFU relative to those initiated before. While this may reflect an improvement in treatment adherence, it is likely an artefact of the adoption of longer ART dispensing durations to account for limited facility access during that period [23, 34]. The sensitivity of the model in correctly predicting IIT is a function of the prevalence of IIT as well as of the occurrence of consistent patterns preceding IIT. Because of this, optimising the model towards either sensitivity or precision is context dependent, as discussed in our previous work [17]. Model performance may have been impacted by both a decrease in the overall observed rate of IIT and heterogeneity in visit attendance patterns after March 2020. Adding a single categorical predictor describing historical visit attendance did not improve model performance relative to the original set of predictors, indicating that information on historical visit attendance is already distributed amongst the original predictors [17].

Due to the absence of unique patient identifiers, we were unable to account for patient mobility or to validate outcome reporting. As a result, it is likely that a subset of patients classified as experiencing IIT or becoming LTFU were attending treatment at other facilities. Comparison of facility-level outcomes to a South African national laboratory cohort demonstrates that HIV patient retention is underestimated at the facility level, where undocumented patient transfers appear as discontinuations in treatment [3]. Over six years of treatment, retention in care at the national level accounting for patient mobility was 63%, relative to 29% at the facility level [3]. We plan to focus future work on aligning viral load testing records with longitudinal patient attendance records as an improved method of ascertaining and predicting individual-level treatment status.

A study of HIV patients in the United States demonstrated that data extracted from clinical records, patient mental health evaluations and insurance claims can be leveraged by machine learning methods to produce high-precision predictions of patient behaviour across the HIV care cascade [35]. Without socio-behavioural information linked to routine HIV management systems, we use lateness as a signal for the occurrence of events that increase the risk of IIT and LTFU. Using only longitudinal visit attendance and baseline clinical information, we are able to predict up to two thirds of next missed visits. The incorporation of socio-behavioural data could improve the ability of this approach to inform retention interventions for those at risk of disengaging from HIV care.

In this study, we describe baseline and time-varying predictors of ART treatment interruption in South African PLHIV. Longitudinal trajectories of ART visit attendance demonstrate that patients transition in and out of treatment, indicating that patient retention in South Africa is likely underestimated. Historical visit attendance is predictive of future interruptions in treatment and can be used to identify those at risk of disengaging from HIV care in the absence of other behavioural or observable risk factors.

Supporting information

S1 Table. Log binomial regression results for risk factors for LTFU in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

(DOCX)

S2 Table. Survival analysis of baseline and longitudinal risk factors for IIT in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

(DOCX)

Data Availability

Access to primary data is subject to restrictions owing to privacy and ethics policies set by the South African Government. Data extraction, data anonymization and data management were approved by the University of Witwatersrand (Human Research Ethics Committee, Reference: 210106). The contact point for this data sharing agreement is the University of Witwatersrand Ethics Administrators (HREC-Medical.ResearchOffice@wits.ac.za).

Funding Statement

The Aurum Institute is funded by the PEPFAR programme under grant number GGH001981: Programmatic Implementation and Technical Assistance for HIV/AIDS & TB Programs in Priority Districts of South Africa. The contents are the responsibility of the authors and do not necessarily reflect the views of PEPFAR, USAID or the United States Government. Jacques Carstens received funding in the form of salary from the commercial company Palindrome Data. Palindrome Data is partially funded by Janssen Pharmaceutica (Pty) Ltd, part of the Janssen Pharmaceutical Companies of Johnson & Johnson. The contents are the responsibility of the authors and do not necessarily reflect the views of Janssen Pharmaceutica (Pty) Ltd. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The corresponding author had full access to all of the data in the study and had final responsibility for the decision to submit for publication. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

  • 1.The Joint United Nations Programme on HIV/AIDS (UNAIDS). Global HIV & AIDS statistics—Fact sheet. 2022. Available from: URL:https://www.unaids.org/en/resources/documents/2022/UNAIDS_FactSheet
  • 2.Delva W, Eaton JW, Meng F, Fraser C, White RG, Vickerman P, et al. HIV treatment as prevention: optimising the impact of expanded HIV treatment programmes. PLOS medicine. 2012. Jul 10;9(7):e1001258. doi: 10.1371/journal.pmed.1001258 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Fox MP, Bor J, Brennan AT, MacLeod WB, Maskew M, Stevens WS, et al. Estimating retention in HIV care accounting for patient transfers: A national laboratory cohort study in South Africa. PLoS medicine. 2018. Jun 11;15(6):e1002589. doi: 10.1371/journal.pmed.1002589 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kaplan SR, Oosthuizen C, Stinson K, Little F, Euvrard J, Schomaker M, et al. Contemporary disengagement from antiretroviral therapy in Khayelitsha, South Africa: a cohort study. PLOS medicine. 2017. Nov 7;14(11):e1002407. doi: 10.1371/journal.pmed.1002407 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Clouse K, Pettifor AE, Maskew M, Bassett J, Van Rie A, Behets F, et al. Patient retention from HIV diagnosis through one year on antiretroviral therapy at a primary healthcare clinic in Johannesburg, South Africa. Journal of acquired immune deficiency syndromes (1999). 2013. Feb 2;62(2):e39. doi: 10.1097/QAI.0b013e318273ac48 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Chirambo L, Valeta M, Banda Kamanga TM, Nyondo-Mipando AL. Factors influencing adherence to antiretroviral treatment among adults accessing care from private health facilities in Malawi. BMC public health. 2019. Dec;19(1):1–1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Chauke P, Huma M, Madiba S. Lost to follow up rate in the first year of ART in adults initiated in a universal test and treat programme: a retrospective cohort study in Ekurhuleni District, South Africa. The Pan African Medical Journal. 2020;37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Mugglin C, Haas AD, van Oosterhout JJ, Msukwa M, Tenthani L, Estill J, et al. Long-term retention on antiretroviral therapy among infants, children, adolescents and adults in Malawi: A cohort study. PloS one. 2019. Nov 14;14(11):e0224837. doi: 10.1371/journal.pone.0224837 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Hallett TB, Eaton JW. A side door into care cascade for HIV-infected patients?. JAIDS Journal of Acquired Immune Deficiency Syndromes. 2013. Jul 1;63:S228–32. doi: 10.1097/QAI.0b013e318298721b [DOI] [PubMed] [Google Scholar]
  • 10.Arnesen R, Moll AP, Shenoi SV. Predictors of loss to follow-up among patients on ART at a rural hospital in KwaZulu-Natal, South Africa. PLoS One. 2017. May 24;12(5):e0177168. doi: 10.1371/journal.pone.0177168 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Plazy M, Orne-Gliemann J, Dabis F, Dray-Spira R. Retention in care prior to antiretroviral treatment eligibility in sub-Saharan Africa: a systematic review of the literature. BMJ Open. 2015. Jun 1;5(6):e006927 doi: 10.1136/bmjopen-2014-006927 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Long L, Kuchukhidze S, Pascoe S, Nichols BE, Fox MP, Cele R, et al. Retention in care and viral suppression in differentiated service delivery models for HIV treatment delivery in sub‐Saharan Africa: a rapid systematic review. Journal of the International AIDS Society. 2020. Nov;23(11):e25640 doi: 10.1002/jia2.25640 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Fox MP, Rosen S, Geldsetzer P, Bärnighausen T, Negussie E, Beanland R. Interventions to improve the rate or timing of initiation of antiretroviral therapy for HIV in sub‐Saharan Africa: meta‐analyses of effectiveness. Journal of the International AIDS Society. 2016. Jan;19(1):20888. doi: 10.7448/IAS.19.1.20888 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Moncada-Torres A, van Maaren MC, Hendriks MP, Siesling S, Geleijnse G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Scientific Reports. 2021. Mar 26;11(1):1–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Stockman J, Friedman J, Sundberg J, Harris E. Predictive analytics using machine learning to identify ART clients at health system level at greatest risk of treatment interruption in Mozambique and Nigeria. JAIDS Journal of Acquired Immune Deficiency Syndromes. 2022. May 13:10–97. doi: 10.1097/QAI.0000000000002947 [DOI] [PubMed] [Google Scholar]
  • 16.Maskew M, Sharpey-Schafer K, De Voux L, Crompton T, Bor J, Rennick M, et al. Applying machine learning and predictive modeling to retention and viral suppression in South African HIV treatment cohorts. Scientific reports. 2022. Jul 26;12(1):1–0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Esra R, Carstens J, Le Roux S, Mabuto T, Eisenstein M, Keiser O, et al. Validation and improvement of a machine learning model to predict interruptions in antiretroviral treatment in South Africa. Journal of Acquired Immune Deficiency Syndromes. 2022. Oct 3. [DOI] [PubMed] [Google Scholar]
  • 18.Fahey CA, Wei L, Njau PF, Shabani S, Kwilasa S, Maokola W, et al. Machine learning with routine electronic medical record data to identify people at high risk of disengagement from HIV care in Tanzania. PLOS Global Public Health. 2022. Sep 16;2(9):e0000720. doi: 10.1371/journal.pgph.0000720 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Osler M, Hilderbrand K, Hennessey C, Arendse J, Goemaere E, Ford N, et al. A three‐tier framework for monitoring antiretroviral therapy in high HIV burden settings. Journal of the International AIDS Society. 2014. Jan;17(1):18908. doi: 10.7448/IAS.17.1.18908 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.South African National Department of Health. 2019 ART Clinical Guidelines.; 2019. Available from: https://www.health.gov.za/wp-content/uploads/2020/11/2019-art-guideline.pdf
  • 21.Mukumbang FC, Orth Z, Van Wyk B. What do the implementation outcome variables tell us about the scaling-up of the antiretroviral treatment adherence clubs in South Africa? A document review. Health Research Policy and Systems. 2019. Dec;17(1):1–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.The United States President’s Emergency Plan for AIDS Relief (PEPFAR). Monitoring, Evaluation, and Reporting Indicator Reference Guide (MER 2.0, Version 2.6). 2022. Available from: https://www.state.gov/wp-content/uploads/2021/09/FY22-MER-2.6-Indicator-Reference-Guide.pdf
  • 23.Grimsrud A, Wilkinson L. Acceleration of differentiated service delivery for HIV treatment in sub‐Saharan Africa during COVID‐19. Journal of the International AIDS Society. 2021. Jun;24(6):e25704. doi: 10.1002/jia2.25704 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. The Journal of Machine Learning Research. 2017. Jan 1;18(1):559–63. [Google Scholar]
  • 25.Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. Catboost: Unbiased boosting with categorical features. Adv Neural Inf Process Syst. 2018;2018-Decem(Section 4):6638–6648. [Google Scholar]
  • 26.Nosyk B, Humphrey L. Highlighting the need for investment and innovation in ART retention interventions. The Lancet Global Health. 2022. Sep 1;10(9):e1218–9. doi: 10.1016/S2214-109X(22)00327-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Haas AD, et al. Retention and mortality on antiretroviral therapy in sub-Saharan Africa: collaborative analyses of HIV treatment programmes. J Int AIDS Soc. 2018;21(2):e25084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Etoori D, Wringe A, Renju J, Kabudula CW, Gomez-Olive FX, Reniers G. Challenges with tracing patients on antiretroviral therapy who are late for clinic appointments in rural South Africa and recommendations for future practice. Global Health Action. 2020. Dec 31;13(1):1755115. doi: 10.1080/16549716.2020.1755115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Mody A, Tram KH, Glidden DV, Eshun-Wilson I, Sikombe K, Mehrotra M, et al. Novel longitudinal methods for assessing retention in care: a synthetic review. Current HIV/AIDS Reports. 2021. Aug;18(4):299–308. doi: 10.1007/s11904-021-00561-2 [DOI] [PubMed] [Google Scholar]
  • 30.Smith R, Villanueva G, Probyn K, Sguassero Y, Ford N, Orrell C, et al. Accuracy of measures for antiretroviral adherence in people living with HIV. Cochrane Database of Systematic Reviews. 2022(7) doi: 10.1002/14651858.CD013080.pub2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Bastard M, Pinoges L, Balkan S, Szumilin E, Ferreyra C, Pujades-Rodriguez M. Timeliness of clinic attendance is a good predictor of virological response and resistance to antiretroviral drugs in HIV-infected patients. PLoS One. 2012. Nov 7;7(11):e49091. doi: 10.1371/journal.pone.0049091 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Dear N, Esber A, Iroezindu M, Bahemana E, Kibuuka H, Maswai J, et al. Routine HIV clinic visit adherence in the African Cohort Study. AIDS research and therapy. 2022. Dec;19(1):1–2. doi: 10.1186/s12981-021-00425-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Bershteyn A, Jamieson L, Kim H-Y, Milali M.P, Brink D, Martin-Huges M, et al. Modeling the impact and cost-effectiveness of interventions for retention in HIV care. CROI (2022). Poster 00909. [Google Scholar]
  • 34.Mendelsohn AS, Ritchwood T. COVID-19 and antiretroviral therapies: South Africa’s charge towards 90–90–90 in the midst of a second pandemic. AIDS and Behavior. 2020. Oct;24(10):2754–6. doi: 10.1007/s10461-020-02898-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Semerdjian J, Lykopoulos K, Maas A, Harrell M, Priest J, Eitz-Ferrer P, et al. Supervised machine learning to predict HIV outcomes using electronic health record and insurance claims data. AIDS. 2018. Available at: https://programme.aids2018.org/Abstract/Abstract/4559 [Google Scholar]
PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002105.r001

Decision Letter 0

Hannah Hogan Leslie

6 Mar 2023

PGPH-D-22-01941

Historical visit attendance as predictor of treatment interruption in South African HIV patients: extension of a validated machine learning model

PLOS Global Public Health

Dear Dr. Esra,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 20 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Hannah Hogan Leslie, PhD

Academic Editor

PLOS Global Public Health

Journal Requirements:

1. Please ensure that Funding Information and Financial Disclosure Statement are matched.

2. In the Funding Information you indicated that no funding was received. Please revise the Funding Information field to reflect funding received.

3. We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list. 

4. In the online submission form, you indicated that "Access to primary data is subject to restrictions owing to privacy and ethics policies set by the South African Government. Code for the analysis may be accessed from the authors upon reasonable request". All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons by return email and your exemption request will be escalated to the editor for approval. Your exemption request will be handled independently and will not hold up the peer review process, but will need to be resolved should your manuscript be accepted for publication. One of the Editorial team will then be in touch if there are any issues.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I don't know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall comments:

I think this is an interesting manuscript to better understand the relationship between sociodemographic, clinical, and care history variables in patterns and prediction of retention. The authors use sophisticated methods (e.g., machine learning) and present interesting analyses to help unpack which variables are most important for understanding future risk of LTFU. I think this is an incredibly important question as to date most attention on risk prediction has focused primarily on sociodemographic and clinical variables rather than observed behavior like visit history. My main suggestions for this article are to better clarify the questions they are trying to answer with each analysis and why it is important. Currently, these are interesting analyses but the overall message, learning points are a bit obscured. I also have additional suggestions for potential analyses that may tighten the message and also to help with clarity.

Specific comments:

Intro

• I think the intro is generally well-written and touches on the important points, except I found the second paragraph a bit winding…it is more about how we target retention interventions rather than if they are effective or not.

• I think there are points that could be highlighted more clearly about how we target. A lot of attention is on just using clinical and sociodemographic characteristics that are not that predictive, but are easily identified. Statistical approaches that include more variables may be a bit better, but are harder to implement (and machine learning may be even a little bit better). But it seems to me that the most important aspect is what kind of data/variables we focus on and put into our models. This is discussed well in the discussion, but I think the main point is that we need to incorporate patients’ observed behaviors (i.e., their care history) into how we do risk assessments. Whether simple variables of care history (prior LTFU, prior lateness) suffice or more advanced machine learning algorithms are needed is I think an important point (and interesting question). I think both the types of data we use and how sophisticated the algorithm needs to be should be discussed.

• For the last paragraph, I wasn’t totally clear on the purpose of the paper. Is it to identify variables that may improve a machine learning algorithm? Is it to identify which variables are most important (sociodemographics, clinical, care history)?

• I would also distinguish between sociodemographic, clinical, care history predictors. I think many people just think of sociodemographic and clinical being important, but what this paper and the authors’ prior work really demonstrates is that care history is likely most important.

Methods:

• I had a bit of a hard time keeping track of the definitions and how the analysis was being done. From my understanding, all analyses were done at the level of a scheduled appointment (whether they attended the visit or not). And then each visit (whether attended or not) was classified as on time, late, or IIT depending on when they eventually showed up (and then further categorized based on prior histories). And clarify at what level the analyses are being done.

• Figure 1 may need a more descriptive legend. To me I think the visit in gray is the one being categorized and the red, orange, green represent what has happened in the past (or perhaps it is the rightmost colored dot). A figure like this is great, but I think needs to be a lot clearer.

• How can someone return to treatment on time or late? If they have an appointment and are 28 days late they have an interruption, but they may come back at any time after 28 days late and reengage in care. Either way they are more than 28 days late for their last appointment. Looking more in depth at Table 1, the description also makes it sound like this is the visit after someone returns (not their return visit)…but then not sure how their return visit gets classified (IIT?).

• For the Cox PH model, I would clarify what time zero is and what the event is. Is this time to first IIT or LTFU? Or is this a recurrent event analysis looking forward from each attended visit?

• How do the original 13 variables differ from the visit archetypes? My understanding is that previous history was already included in the original machine learning model (and those models may more flexibly characterize complex trajectories over time). I do understand more after looking at Table 2, but it would be nice to understand this in the methods so the question becomes clear.

• In general, I would try to be specific about the question each analysis is trying to answer. My sense is that you are trying to see whether visit archetypes are good enough on their own and do not even require machine learning (or something to that end), but I am not sure. The same applies to machine learning vs. the Cox PH model.
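
To make the visit classification discussed in these comments concrete, the following minimal sketch (in Python) illustrates one way the on time / late / IIT categories could be assigned from visit dates. The 28-day interruption threshold follows the reviewer's description above; the function name, the 7-day grace period separating "on time" from "late", and the handling of visits not yet attended are assumptions added for illustration and are not taken from the manuscript.

    from datetime import date
    from typing import Optional

    # Illustrative sketch only. The 28-day IIT threshold follows the reviewer's
    # description; the 7-day grace period separating "on time" from "late" is an
    # assumption for illustration and is not taken from the manuscript.
    IIT_THRESHOLD_DAYS = 28
    LATE_THRESHOLD_DAYS = 7  # assumed grace period

    def classify_visit(scheduled: date, attended: Optional[date], as_of: date) -> str:
        """Classify a scheduled visit as 'on time', 'late', or 'IIT'."""
        # For visits not yet attended, measure lateness up to the analysis date.
        reference = attended if attended is not None else as_of
        days_late = (reference - scheduled).days
        if days_late > IIT_THRESHOLD_DAYS:
            return "IIT"
        if days_late > LATE_THRESHOLD_DAYS:
            return "late"
        return "on time"

    # Example: a visit scheduled 1 Mar 2021 and attended 5 Apr 2021 is 35 days late -> "IIT"
    print(classify_visit(date(2021, 3, 1), date(2021, 4, 5), date(2021, 4, 5)))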

Results

• Should you also describe the characteristics of visits, as I think most analyses are being done at that level?

• What does “returned to treatment after initiation” mean? I think it represents someone who came back a second time (i.e., was not LTFU the day of initiation), but it also sounds a bit like re-engagement. You may want to rephrase.

• I don’t fully understand Figure 2 and how the dynamics of LTFU and then return to care play out with the blue and yellow lines (related to prior comment).

• Table 2 seems to have some of the most interesting findings from the study. I think it would be really interesting to step back and consider the important question to ask here, and I think that has to do with the types of variables included. I see three categories: sociodemographic (age, sex); clinical (VLs); and care history (duration in care, appointments, lateness ratio, visit archetypes). You could examine what prediction looks like with or without care history (e.g., how well sociodemographics alone perform), and you could also examine using visit archetypes vs. the other, more nuanced care history variables (a sketch of such a comparison is given after this list). Ultimately, I think it comes down to what is most useful. A clinician can easily identify sex, age, and a visit archetype, but cannot really calculate lateness ratios (etc.). You could try to understand what the minimum feasible set is to get good prediction.
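
The comparison suggested above could, in principle, be run against the authors' CatBoost setup by refitting on nested feature subsets and comparing discrimination. The sketch below assumes a visit-level DataFrame with hypothetical column names (e.g., lateness_ratio, visit_archetype, missed_next_visit) and hypothetical CatBoost settings; none of these are taken from the manuscript.

    # Illustrative sketch: comparing prediction with and without care-history
    # features. Column names, file name, and model settings are hypothetical.
    import pandas as pd
    from catboost import CatBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    SOCIODEMOGRAPHIC = ["age", "sex"]                                   # hypothetical columns
    CLINICAL = ["last_viral_load"]
    CARE_HISTORY = ["duration_in_care", "lateness_ratio", "visit_archetype"]

    def auc_for(features, df):
        """Fit a CatBoost classifier on the given feature subset and report test AUC."""
        X_train, X_test, y_train, y_test = train_test_split(
            df[features], df["missed_next_visit"], test_size=0.2, random_state=42
        )
        model = CatBoostClassifier(
            iterations=200,
            verbose=False,
            random_seed=42,
            cat_features=[c for c in features if c in ("sex", "visit_archetype")],
        )
        model.fit(X_train, y_train)
        return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # df = pd.read_csv("visits.csv")  # one row per scheduled visit (hypothetical file)
    # print("Sociodemographic only:", auc_for(SOCIODEMOGRAPHIC, df))
    # print("+ clinical:", auc_for(SOCIODEMOGRAPHIC + CLINICAL, df))
    # print("+ care history:", auc_for(SOCIODEMOGRAPHIC + CLINICAL + CARE_HISTORY, df))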

Discussion

• I think the discussion is good and focuses on the important points, particularly the importance of care history. A sharper focus on the questions at hand could be helpful (but I think that will come from addressing my comments in other sections).

• Additional citations to consider for the intro or discussion are PMID 33948789, 33105396, 32173743, and 31661487. For disclosure, I am a co-author on these studies, so feel free to include or not include them based on your judgement. I think they may complement what this study does by discussing how we think about risk of LTFU, its time-varying nature, and what types of data are needed to get a better understanding, which I think are all very important questions that this paper helps unpack.

Reviewer #2: Background

Additionally, it would be helpful to clarify the specific research questions or hypotheses being addressed in this study. While the overall aim of the study is clear, it may be useful to explicitly state the research questions or hypotheses that were tested using the machine learning and traditional statistical methods.

Methods

The methods section could benefit from a clearer explanation of what "previous visit archetypes" are and how they were used in the analysis.

Overall

Overall, the manuscript appears to be well-written and provides valuable insights into patient retention in HIV treatment in South Africa. The use of machine learning to predict patient outcomes based on historical data is particularly innovative and could have important implications for improving patient retention strategies. However, the manuscript could benefit from additional detail on the specific interventions that could be implemented based on the findings. Additionally, further discussion of potential limitations of the study, such as the lack of unique patient identifiers and the potential for misclassification of patient outcomes, could help contextualize the results.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Aaloke Mody

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002105.r003

Decision Letter 1

Hannah Hogan Leslie

24 Apr 2023

PGPH-D-22-01941R1

Historical visit attendance as predictor of treatment interruption in South African HIV patients: extension of a validated machine learning model

PLOS Global Public Health

Dear Dr. Esra,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. In particular, reviewer concerns around the clarity of the research question, the transparency in decision making, and the conceptual basis for including variables have not been fully addressed within the manuscript. A more complete response is necessary to fairly evaluate the manuscript and progress it further. 

Please submit your revised manuscript by Jun 08 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Hannah Hogan Leslie, PhD

Academic Editor

PLOS Global Public Health

Journal Requirements:

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall comments:

Thank you to the authors for addressing my previous comments. I still have a few points for clarification, and I still think additional clarity is needed around the question at hand and why it is important.

• I would remove all references to causality. The issue with causal inference is not a statistical one due to non-linearity and collinearity; there are more complex issues at hand, such as time-varying mediation and confounding and unmeasured confounding, that cannot be handled statistically.

• I agree with the authors that discussion of the effectiveness of LTFU interventions is relevant, but as a reader, the narrative gets lost. It is not until the end of the intro or the beginning of the methods that I start to understand what the manuscript is about. I would consider reframing/reorganizing so the main narrative stays clear.

• I appreciate the authors’ perspective that discussion of which variables are included is not within the scope of this manuscript, though I am not sure I agree. Again, this may be partially because I do not follow the intro narrative for the specific questions and their implications. Machine learning also seems potentially useful when data are complex and would otherwise require a lot of decision-making beforehand. Considering these issues also directly guides what types of statistical issues and questions are most important to address.

• I still think that for Table 3 it would be important to consider the conceptually different categories of data types. Looking at the previously published paper, variables on visit history were the most important, and sociodemographics less so. Is the question here whether adding archetypes improves prediction when other visit history is already included in the machine learning model? The rationale for adding archetypes appears to be that they were associated with outcomes using more routine statistical methods (although those models did not include visit history). Again, I may not be following correctly, but I would not anticipate that this would add much. There are many practical considerations regarding how one would want to add additional variables like visit archetypes to an existing model, and I do not particularly follow the rationale for the approach taken here. Some conceptual distinction between the variables used in the prior models vs. these archetypes should be made (beyond the history of how the authors have chosen to use them).

Reviewer #2: Thanks for the thorough review and for making the manuscript clear.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Aaloke Mody

Reviewer #2: No

**********


PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002105.r005

Decision Letter 2

Hannah Hogan Leslie

22 May 2023

PGPH-D-22-01941R2

Historical visit attendance as predictor of treatment interruption in South African HIV patients: extension of a validated machine learning model

PLOS Global Public Health

Dear Dr. Esra,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Thank you for addressing the reviewer's comments on the previous submission; the revisions have substantially clarified the paper. I would ask that you expand on the findings in Table 3 and link these to the summary implications in the discussion and abstract to be sure the clarified research aims are fully executed in the paper.

Please submit your revised manuscript by Jun 21 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Hannah Hogan Leslie, PhD

Academic Editor

PLOS Global Public Health

Journal Requirements:

1. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

There is limited engagement in the text with the contents of Table 3, which is the key evidence in response to the question on whether categorical visit attendance is on its own a valid predictor. The discussion summarizes across the models ("Historical visit attendance is highly informative in a machine learning model predicting up to two thirds of next missed ART visits" - presumably a reference to the sensitivity >60% of the 2 models with clinical and demographic variables also included) and the abstract highlights the performance of the model without categorical adherence. Please provide greater interpretation in the text of the results shown in Table 3 and elaborate in the discussion what the implications are for both variable type and statistical approach recommended for use moving forward.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you to the authors for addressing my previous comments. I think the manuscript is improved and easier to follow with these changes. I do still find some of the explanation dense, particularly related to Table 3, and I feel a general reader who is not familiar with the difference between examining causality vs. prediction would struggle. Ultimately, what I believe the authors are saying, and an underlying point, is that machine learning algorithms are complex and model complex relationships, and thus they are not readily usable in clinical settings (unless implemented in an EHR that can do the computations in real time in the background). Essentially, they are a black box. Visit archetypes, however, are readily identifiable by clinicians/practitioners (with or without EHRs) and without any need to understand complex relationships between variables. Thus, it is useful and important to understand the performance of these archetypes. I think bringing out these practical implications would be helpful in the results and discussion.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********


PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002105.r007

Decision Letter 3

Hannah Hogan Leslie

6 Jun 2023

Historical visit attendance as predictor of treatment interruption in South African HIV patients: extension of a validated machine learning model

PGPH-D-22-01941R3

Dear Ms Esra,

We are pleased to inform you that your manuscript 'Historical visit attendance as predictor of treatment interruption in South African HIV patients: extension of a validated machine learning model' has been provisionally accepted for publication in PLOS Global Public Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact globalpubhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Global Public Health.

Best regards,

Hannah Hogan Leslie, PhD

Academic Editor

PLOS Global Public Health

***********************************************************

Reviewer Comments (if any, and for reference):

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Log binomial regression results: risk factors for LTFU in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

    (DOCX)

    S2 Table. Survival analysis of baseline and longitudinal risk factors for IIT in a cohort of 191,162 patients initiating antiretroviral therapy in South Africa from Jan 2017-March 2022.

    (DOCX)

    Attachment

    Submitted filename: plos_ltfu_response_to_reviewers.docx

    Attachment

    Submitted filename: 090523_response_to_reviewers.docx

    Attachment

    Submitted filename: 260523_response_to_reviewers.docx

    Data Availability Statement

    Access to primary data is subject to restrictions owing to privacy and ethics policies set by the South African Government. Data extraction, data anonymization and data management were approved by the University of Witwatersrand (Human Research Ethics Committee, Reference: 210106). The contact point for this data sharing agreement is the University of Witwatersrand Ethics Administrators (HREC-Medical.ResearchOffice@wits.ac.za).

