PLOS One. 2026 Mar 6;21(3):e0344570. doi: 10.1371/journal.pone.0344570

Machine learning prediction of weight gain after antiretroviral therapy initiation in people with HIV: Insights from a large French real-world cohort

Cyrielle Codde 1,2,*,#, Clément Benoist 2, Laurent Hocqueloux 3, Cyrille Delpierre 4, Clotilde Allevena 5, Amélie Ménard 6, Antoine Chéret 7, Cédric Arvieux 8, Jean-François Faucher 1, Jean-Baptiste Woillard 2,9,#; on behalf of the Dat’AIDS Study Group
Editor: Carmen María González-Domenech10
PMCID: PMC12965677  PMID: 41790866

Abstract

Excessive weight gain after initiation of antiretroviral therapy (ART) has become a recognized concern among people living with HIV. Individual weight trajectories remain highly heterogeneous and challenging to predict using conventional methods. We leveraged the French Dat’AIDS national cohort to assess whether machine learning (ML) could enhance the prediction of individual body weight at 6, 12, and 24 months after ART initiation. Using 112 baseline variables encompassing demographic, clinical, laboratory, and treatment-related data, we trained XGBoost models and evaluated performance using root mean square error (RMSE), R², and mean prediction error. A simple benchmark model based on baseline weight was used for comparison. Among 24,014 eligible ART-naïve adults, the ML models achieved RMSEs of approximately 4.6 kg, 5.3 kg, and 6.4 kg at 6, 12, and 24 months respectively, with declining predictive power over time. Baseline weight (Weight_M0) consistently emerged as the strongest predictor, while other factors contributed minimally. Although ML marginally outperformed the benchmark (Weight_M0), accuracy remained insufficient for clinical decision-making. Sensitivity analyses excluding individuals with implausibly large monthly weight changes modestly improved RMSE (3.9–6.0 kg), underscoring the impact of data quality. Our results demonstrate that, despite large sample size and rich clinical variables, ML lacks the precision necessary for individual weight forecasting in this context. These findings highlight the limitations of applying artificial intelligence to heterogeneous real-world cohorts and underscore the need to incorporate behavioral and lifestyle factors to improve predictive modeling.

Introduction

Excessive weight gain following initiation of antiretroviral therapy (ART) is increasingly recognized in people living with HIV (PLWH) [1,2]. This phenomenon is particularly associated with integrase strand transfer inhibitors (INSTIs) and tenofovir alafenamide (TAF) [3–5], but weight trajectories remain highly heterogeneous [6,7]. While weight gain may reflect a desirable “return-to-health” in patients presenting with HIV-related wasting, it can also lead to overweight, obesity, and cardiometabolic complications in those with normal baseline weight.

Several population-based studies have identified risk factors for weight gain, including female sex, Black ethnicity, and exposure to specific antiretroviral drugs [3,8–10]. However, individual prediction remains challenging [4]. Traditional linear regression approaches have provided insights at the group level but fail to capture the complexity of weight evolution at the patient level, which depends on medical, behavioral, and lifestyle factors that are not always measured.

Machine learning (ML) approaches are well suited to model complex, nonlinear relationships and may therefore improve individual-level prediction. In other fields, ML has been successfully applied to forecast weight gain during pregnancy or after exposure to psychotropic drugs [11]. In HIV research, ML has mainly been used to predict virological suppression, treatment adherence, or resistance, but little is known about its capacity to predict weight trajectories under ART [12].

Recent pilot studies, such as the work by Motta et al. [13], have explored ML algorithms in specialized cohorts with moderate success. However, a critical research gap remains: the scalability of these models to large, unselected real-world populations. It is currently unknown whether the predictive signals identified in smaller, homogeneous datasets remain robust when applied to nationwide heterogeneous cohorts, or if they are diluted by the variability of routine clinical care.

The objective of this study was to develop and evaluate machine learning models for predicting individual weight trajectories at 6, 12, and 24 months after ART initiation in treatment-naïve PLWH from the French Dat’AIDS cohort. Beyond assessing prediction accuracy, we aimed to highlight the opportunities and limitations of applying machine learning to large heterogeneous real-world datasets.

Methods

Study design and population

We conducted a retrospective cohort study using data from the French Dat’AIDS cohort, a nationwide collaboration of 16 HIV reference centers. Dat’AIDS collects standardized longitudinal clinical information from PLWH receiving routine care.

We included all ART-naïve adults (≥18 years) who initiated combination ART between 01/01/2004 and 31/12/2021 and had at least one available weight measurement at baseline and during follow-up. Patients with missing baseline weight or incomplete ART regimen data were excluded. Data were accessed for research purposes from 01/03/2023 to 31/10/2023.

Outcome

The primary outcome of interest was body weight, expressed in kilograms, at 6, 12, and 24 months after ART initiation. When multiple weight measurements were available within a time window of ±2 months around the target visit, the measurement closest to the corresponding timepoint was retained.
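The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `weight_at_checkpoint`, the tuple layout of the measurements, and the conversion of days to months with an average month length are all assumptions made for illustration.

```python
from datetime import date

# Assumption: days are converted to months using the average month length.
DAYS_PER_MONTH = 30.44


def weight_at_checkpoint(art_start, measurements, target_month, window_months=2.0):
    """Return the weight measured closest to target_month, or None if no
    measurement falls within +/- window_months of the target.

    measurements: list of (measurement_date, weight_kg) tuples.
    """
    best_weight = None
    best_gap = None
    for measured_on, weight_kg in measurements:
        months_since_start = (measured_on - art_start).days / DAYS_PER_MONTH
        gap = abs(months_since_start - target_month)
        # Keep the measurement only if it is inside the window and closer
        # to the target than the best candidate seen so far.
        if gap <= window_months and (best_gap is None or gap < best_gap):
            best_weight, best_gap = weight_kg, gap
    return best_weight


# Hypothetical patient who started ART on 1 January 2020.
visits = [
    (date(2020, 5, 20), 71.0),
    (date(2020, 7, 10), 72.5),
    (date(2020, 8, 25), 73.0),
]
m6_weight = weight_at_checkpoint(date(2020, 1, 1), visits, target_month=6)
m12_weight = weight_at_checkpoint(date(2020, 1, 1), visits, target_month=12)
```

In this hypothetical example all three visits fall inside the 6-month window, and the July visit is retained because it is the closest to month 6. No visit falls inside the 12-month window, so the patient would not contribute to the 12-month model.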

Predictors

A total of 112 baseline variables were considered as potential predictors after multidisciplinary discussion (pharmacologist, infectious disease specialist, data scientist) and are presented in Table 1.

Table 1. Predictors selected for machine learning analysis of weight gain in PWH.

Socio-Demographic and Follow-up Predictors
COREVIH center number
Gender
Time interval between visit and HIV diagnosis
Weight at first visit
Birth country
Age at visit n
Weight at visit n
Year and month at visit n
Biological predictors
Log HIV viral load at visit n
CD4 count at visit n, values bounded between 200–3000 (/mm3)
CD4 CD8 ratio at visit n, values bounded between 0–4
Predictors related to HIV history
Year of HIV diagnosis
HIV type (1, 2)
AIDS stage at visit n
Time since initiation of ARV line
Second-generation INSTI
TAF
INSTI
NRTI
NNRTI
PI
Raltegravir
Elvitegravir
Dolutegravir
Bictegravir
Cabotegravir
Saquinavir
Ritonavir
Lopinavir
Atazanavir
Fosamprenavir
Tipranavir
Darunavir
Indinavir
Zidovudine
Lamivudine
Abacavir
Tenofovir disoproxil fumarate
Emtricitabine
Stavudine
Didanosine
Nevirapine
Efavirenz
Etravirine
Rilpivirine
Doravirine
Maraviroc
Islatravir
BMS-955176
Predictors related to comorbidities
HBV, HCV co-infection
Metabolic disorders
Endocrine risk factors
Pregnancy
Time since pregnancy
Menopause
Time since menopause
Sedentary lifestyle, mobility restriction
Diet and hygiene habits
Socio-professional risk factors
Malnutrition
Thyrotoxicosis
Anorexia
Intake of toxic substances
Mood disorders§
Schizophrenia disorders§
Bulimia§
Sleep disorders§
Non-compliance with treatment§
Enterocolitis§
Stomas§
Predictors related to co-medications
Atypical neuroleptics
Tricyclic antidepressants
Other antidepressants
Thymoregulators and anti-convulsants
Corticosteroids
GLP1 analogues
Medications associated with weight loss (others)
Time since each medication

INSTI, Integrase strand transfer inhibitor; TAF, Tenofovir alafenamide; NRTI, Nucleos(t)ide reverse transcriptase inhibitor; NNRTI, Non-nucleoside reverse transcriptase inhibitor; PI, Protease inhibitor.

† Supposed predictors of weight gain, ‡ supposed predictors of weight loss, § supposed predictors of weight variation.

These included demographic characteristics such as age, sex, and geographical origin; clinical parameters such as baseline weight, body mass index (BMI), HIV disease stage, and comorbidities including hypertension, diabetes, renal or hepatic disease; laboratory results including CD4 and CD8 cell counts, plasma HIV RNA, creatinine, and liver function tests; and treatment-related variables such as ART regimen composition by drug class and individual antiretroviral agents. Lifestyle factors such as smoking, alcohol use, and history of opportunistic infections were also incorporated. For women aged 50 years or older, menopausal status was imputed based on age, which we acknowledge as an approximation with potential limitations.

The selection of comorbidities was based on the International Classification of Diseases, 10th Revision (ICD-10), and the comedications of interest and ARV treatment lines were identified according to their International Nonproprietary Names (INNs) and marketed brand names. These selections are presented in S1 Table.

Unlike the start date of a comorbidity or comedication, the stop date was not recorded in the Dat’AIDS database. Therefore, each medical history item or treatment of interest, once entered, was considered to be continuously present during longitudinal follow-up, with the exception of pregnancy, which was considered to end after 9 months. Weights recorded when the CD4 count was below 200/mm3 were omitted, so that the weight gain analyzed would reflect ARV treatment exposure rather than a return to health.
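As an illustration of this exposure rule, the following Python sketch flags whether a recorded comorbidity or comedication is treated as present at a given visit. The function name `is_exposed`, the use of months as the time unit, and the choice to end pregnancy once a full 9 months have elapsed (strict inequality) are assumptions, not details taken from the study code.

```python
# Assumption: pregnancy is the only exposure with a fixed duration.
PREGNANCY_DURATION_MONTHS = 9


def is_exposed(kind, start_month, visit_month):
    """Return True if an exposure first recorded at start_month is treated
    as present at visit_month. Both months are counted from a common origin."""
    if visit_month < start_month:
        # The exposure has not been recorded yet at this visit.
        return False
    if kind == "pregnancy":
        # Pregnancy is the one exception: it stops after 9 months.
        return visit_month - start_month < PREGNANCY_DURATION_MONTHS
    # Every other exposure is carried forward indefinitely,
    # because no stop date is available.
    return True
```

Under this rule, a corticosteroid first recorded at month 3 is still counted as present at month 40, which shows how the missing stop dates can overstate exposure duration.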

Data preprocessing

Predictors with more than 30% missing values were excluded from the analysis. Data processing, imputation of missing data, correction of outliers, and graphical exploration were performed using the tidyverse and tidymodels packages [14,15]. Imputation was performed using a k-nearest neighbors (KNN) approach. Continuous biological variables were bounded to eliminate outliers, resulting in an approximately Gaussian distribution. Continuous variables were standardized, and categorical variables were transformed using one-hot encoding.
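The study performed these steps in R with tidymodels. The sketch below only illustrates the logic of four of them in plain Python: the missingness filter, bounding, standardization, and one-hot encoding. KNN imputation is deliberately left out. All function names are hypothetical, missing values are represented by `None`, and the sample standard deviation is assumed for standardization.

```python
from statistics import mean, stdev


def missing_fraction(values):
    """Share of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_predictors(columns, max_missing=0.30):
    """Keep predictors whose missing share does not exceed max_missing."""
    return {name: vals for name, vals in columns.items()
            if missing_fraction(vals) <= max_missing}


def clip(values, low, high):
    """Bound a continuous variable to [low, high], leaving missing values as is."""
    return [None if v is None else min(max(v, low), high) for v in values]


def standardize(values):
    """Center and scale observed values (assumes at least two distinct values)."""
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), stdev(observed)
    return [None if v is None else (v - mu) / sigma for v in values]


def one_hot(values):
    """Encode a categorical variable as one 0/1 indicator column per level."""
    return {level: [int(v == level) for v in values] for level in sorted(set(values))}


# Hypothetical columns: creatinine is 75% missing and is therefore dropped.
columns = {"cd4": [150, 450, None, 3500], "creatinine": [None, None, 80, None]}
kept = drop_sparse_predictors(columns)
cd4_bounded = clip(kept["cd4"], 200, 3000)  # bounds taken from Table 1
```

The order of operations in the actual pipeline (for example, whether bounding was applied before or after imputation) is not specified in the text, so the sequence shown here is only one plausible reading.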

For all analyses, data cleaning consisted of restricting reported weight values to between 30 and 230 kg, in order to obtain an approximately Gaussian distribution of outcome values, and limiting the maximum variation in weight to 10 kg per month. Sensitivity analyses were performed by lowering this maximum variation to 5 kg and 3 kg per month.
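A minimal Python sketch of such a plausibility filter is shown below. It assumes that the monthly variation is computed between consecutive measurements as the absolute weight difference divided by the elapsed time in months, and that the rule is applied at the patient level. Neither detail is stated explicitly in the text, and the function name `plausible_trajectory` is hypothetical.

```python
def plausible_trajectory(weights, max_change_per_month=10.0, low=30.0, high=230.0):
    """Check one patient's weight trajectory.

    weights: list of (month, weight_kg) tuples sorted by month.
    Returns True if every weight lies within [low, high] and every change
    between consecutive measurements stays at or below max_change_per_month.
    """
    for _, weight_kg in weights:
        if not (low <= weight_kg <= high):
            return False
    for (m0, w0), (m1, w1) in zip(weights, weights[1:]):
        elapsed = m1 - m0
        if elapsed <= 0:
            # Same-date duplicates are skipped in this sketch.
            continue
        if abs(w1 - w0) / elapsed > max_change_per_month:
            return False
    return True


# Hypothetical trajectory: +8 kg over 2 months, then +2 kg over 4 months.
trajectory = [(0, 70.0), (2, 78.0), (6, 80.0)]
in_main_analysis = plausible_trajectory(trajectory, max_change_per_month=10.0)
in_5kg_subset = plausible_trajectory(trajectory, max_change_per_month=5.0)
in_3kg_subset = plausible_trajectory(trajectory, max_change_per_month=3.0)
```

The example patient gains 4 kg per month over the first interval, so the trajectory passes the 10 kg and 5 kg thresholds but is excluded from the 3 kg sensitivity subset.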

Machine learning modeling

We applied the XGBoost (Extreme Gradient Boosting) algorithm to predict weight at 6, 12, and 24 months after ART initiation. Data were split into a training set (75%; 5053, 4283, and 3549 patients, respectively) and a test set (25%; 1686, 1430 and 1185 patients). XGBoost models were trained on each training set to predict the weight at the checkpoint of interest. Each training set was used to tune the hyperparameters (trees, tree depth, learning rate, min_n, loss_reduction, sample_size, and mtry) and to evaluate model performance by 10-fold cross-validation. The specific hyperparameter settings for the final models are detailed in S2 Table. Each best model was then evaluated in the corresponding test set by measuring the root mean square error (RMSE; expressed in kg) between the predicted and reference weight at the checkpoint. Performance was evaluated by calculating the RMSE, R², mean prediction error (MPE; expressed in kg), relative MPE (%), and relative RMSE (%) in the test set. Variable importance plots obtained by random permutation were drawn to assess the importance of predictors. Finally, scatter plots of predicted vs reference weight at each checkpoint in the test set were drawn.
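The performance metrics listed above can be sketched in Python as follows. The exact formulas used in the study are not given, so this block makes several assumptions: errors are computed as predicted minus observed, R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares), and the relative metrics are expressed as a percentage of the mean observed weight. The function name `regression_metrics` is hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Compute RMSE, R2, MPE and their relative versions for paired weights (kg)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    rmse = sqrt(ss_res / n)
    mpe = sum(errors) / n                                  # signed mean error (bias)
    return {
        "rmse_kg": rmse,
        "r2": 1 - ss_res / ss_tot,
        "mpe_kg": mpe,
        # Assumption: relative metrics are scaled by the mean observed weight.
        "relative_rmse_pct": 100 * rmse / mean_obs,
        "relative_mpe_pct": 100 * mpe / mean_obs,
    }


# Hypothetical test-set values for three patients.
metrics = regression_metrics(observed=[70.0, 80.0, 90.0], predicted=[72.0, 79.0, 93.0])
```

Because RMSE squares the errors before averaging, a few patients with large prediction errors weigh heavily on it, whereas MPE can stay close to zero when over- and under-predictions cancel out.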

Benchmark models

To contextualize the performance of the machine learning approach, we also developed a LASSO-penalized multivariable linear regression model using the same set of predictors and data splitting strategy (training/testing sets) as the XGBoost model at 6, 12 and 24 months. This served to assess whether a non-linear approach provided a significant advantage over traditional statistical modeling.

In addition to the ML models, we defined a simple benchmark model based on baseline weight at ART initiation (Weight_M0). In this approach, the predicted weight at 6, 12, or 24 months was assumed to be equal to the baseline weight, i.e., assuming no change over time. For each timepoint, we calculated the same performance metrics as for the ML models.
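The baseline-weight benchmark amounts to carrying the baseline value forward unchanged. The Python sketch below shows how it can be compared with a model on the same patients. All numbers are made up for illustration, `model_pred` stands in for the output of any fitted model, and the function names are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted weights (kg)."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def carry_forward_benchmark(baseline_weights):
    """Naive benchmark: predicted follow-up weight equals baseline weight (Weight_M0)."""
    return list(baseline_weights)


# Hypothetical data for three patients.
baseline = [60.0, 75.0, 90.0]
observed_m6 = [62.0, 75.0, 93.0]
model_pred = [61.5, 75.5, 92.0]  # hypothetical model output, not study results

benchmark_rmse = rmse(observed_m6, carry_forward_benchmark(baseline))
model_rmse = rmse(observed_m6, model_pred)
improvement_kg = benchmark_rmse - model_rmse
```

Reporting the difference between the two RMSE values, rather than the model RMSE alone, shows how much of the apparent accuracy is already provided by baseline weight.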

Sensitivity analyses

To account for potential measurement errors and implausible fluctuations in weight trajectories, we conducted sensitivity analyses by restricting the study population to individuals with limited monthly variations in body weight. Two alternative cohorts were defined: the first excluded patients with weight changes exceeding 5 kg per month, and the second excluded those with variations greater than 3 kg per month. For each restricted dataset, XGBoost models were re-trained and evaluated at 6, 12, and 24 months following ART initiation, using the same procedures for data preprocessing, model training, and performance assessment as in the main analysis.

Ethics

This study was conducted in accordance with French ethics regulations and the database received approval from the Commission Nationale de l’Informatique et des Libertés (CNIL) and is registered in ClinicalTrials.gov. Medical records were collected from the Dat’AIDS cohort study, which is a collaboration of 30 HIV treatment centers in France and overseas (registered with ClinicalTrials.gov under the identifier NCT02898987). These centers maintain prospective cohorts of PLWH who provide written informed consent via a unique electronic medical record (Nadis®). Anonymized data for clinical events, laboratory test results and therapeutic history are collected by the networking organization, and the Dat’AIDS study was registered with the French National Commission on Informatics and Liberties (CNIL Registration number: 2001/762876/nadiscnil.doc). This study was carried out in compliance with the International guidelines for human research protection as per the Declaration of Helsinki and ICH-GCP.

Results

Study population

Dat’AIDS provided 11 separate tables: Data (socio-demographic data), Medical history (longitudinal entry of cohort comorbidities), Comedic (longitudinal entry of comedications), CVVIH (longitudinal monitoring of viral load), CD4 CD8 (longitudinal monitoring of CD4 and CD8 lymphocyte levels and ratio), Creat (longitudinal monitoring of serum creatinine), Leuco (longitudinal monitoring of leukocyte levels), Transa (longitudinal monitoring of transaminases), Lipids (longitudinal monitoring of lipid abnormalities), Exam_clinique (height, abdominal and hip circumference, blood pressure and weight), and Evt_ther (ARV treatment line, with reason for stopping or switching). The number of patients and the overall percentage of missing data in each initial table are presented in S3 Table. After data cleaning and removal of outliers, the number of patients in each table was reduced.

Out of a theoretical number of 78,621 patients, 37,621 were diagnosed between January 1, 2004 and December 31, 2021 (DATA Table). Of this sample, 26,070 patients had at least 3 weights recorded during follow-up (EXAM_CLINIQUE Table). Of these, 47 had a history of ARV treatment according to the EVT_THER table and 2,009 had no available biological values (CD4 CD8 and CVVIH tables); these patients were excluded. The flow chart is presented in Fig 1.

Fig 1. Flow-chart.

A total of 24,014 patients met the criteria for workable data. Within this population, 6,739, 5,713 and 4,734 patients receiving first-line treatment (ART-naïve) were analyzed in the weight prediction models at 6, 12 and 24 months, respectively. The cohorts had a mean age of about 40 years, comprised more than 70% men, and more than half of the patients were born in France. Almost all patients had HIV-1 infection, and fewer than 15% were at the AIDS stage. At each checkpoint, there was an average weight gain of about 2 kg. Their characteristics are summarized in Table 2.

Table 2. Main characteristics of PWH treated in first line, so-called naive, used for weight prediction at 6, 12 and 24 months.

Characteristic | Cohort M6 (N = 6739) | Cohort M12 (N = 5713) | Cohort M24 (N = 4734)
Age (year)¹ | 40.27 (31.20-48.20) | 40.99 (32.10-49.00) | 42.06 (33.20-50.10)
Gender: Male | 4851 (71.98) | 4190 (73.34) | 3483 (73.36)
Gender: Female | 1843 (27.35) | 1484 (25.98) | 1218 (25.73)
Gender: Trans M-F | 45 (0.67) | 39 (0.68) | 33 (0.70)
Birth country | † | ‡ | §
Birth country: France | 3759 (55.78) | 3285 (57.50) | 2720 (57.46)
Birth country: West and Central Africa | 1403 (20.82) | 1149 (20.11) | 972 (20.53)
Birth country: Middle East and North Africa | 244 (3.62) | 198 (3.47) | 182 (3.84)
Birth country: Others | 789 (11.71) | 667 (11.68) | 576 (12.17)
HIV-1 | 6698 (99.39) | 5688 (99.56) | 4704 (99.37)
AIDS stage | 891 (13.22) | 727 (12.73) | 538 (11.36)
Weight at first visit (kg)¹ | 70.70 (62.00-78.00) | 71.02 (62.00-78.50) | 71.42 (62.50-79.00)
Weight at checkpoint (kg)¹ | 72.26 (63.00-80.00) | 73.16 (64.00-81.00) | 73.98 (64.00-82.00)
Viral load (log)¹ | 1.69 (1.30-1.68) | 1.67 (1.30-1.60) | 1.66 (1.30-1.60)
CD4 count (/mm3)¹ | 573 (379-711) | 606 (405-748) | 646 (446-794)
CD4 CD8 ratio¹ | 0.80 (0.46-1.02) | 0.87 (0.51-1.11) | 0.95 (0.58-1.21)
ARV treatment: TAF | 593 (8.80) | 548 (9.59) | 463 (9.78)
ARV treatment: INSTI | 1691 (25.09) | 1432 (25.07) | 1196 (25.26)
ARV treatment: INSTI, including second generation INSTI | 878 (13.00) | 760 (13.30) | 679 (14.34)
ARV treatment: NRTI | 6572 (97.52) | 5584 (97.74) | 4628 (97.76)
ARV treatment: NNRTI | 1469 (21.80) | 1393 (24.38) | 1353 (28.58)
ARV treatment: PI | 3569 (52.96) | 2856 (49.99) | 2167 (45.78)
≥ 1 Supposed comorbidities of weight gain | 1250 (18.55) | 1161 (20.32) | 1068 (22.56)
≥ 1 Supposed comorbidities of weight loss | 126 (1.87) | 105 (1.84) | 106 (2.24)
≥ 1 Supposed comorbidities of weight variation | 780 (11.57) | 761 (13.32) | 664 (14.03)
≥ 1 Supposed comedications of weight gain | 204 (3.03) | 171 (2.99) | 175 (3.70)
≥ 1 Supposed comedications of weight loss | 4 (0.06) | 2 (0.04) | 8 (0.17)

Values are number (%); ¹ average (quartiles).

†Not reported: 544 (8.07), ‡Not reported: 414 (7.25), §Not reported: 287 (6.06).

ARV, Antiretroviral; INSTI, Integrase strand transfer inhibitor; TAF, Tenofovir alafenamide; NRTI, Nucleos(t)ide reverse transcriptase inhibitor; NNRTI, Non-nucleoside reverse transcriptase inhibitor; PI, Protease inhibitor.

The characteristics for the subpopulations (sensitivity analyses) bounded by maximum monthly weight variations of 5 kg and 3 kg are presented in the S4 Table.

Model performance

The results obtained after 10-fold cross-validation for each checkpoint are presented in Table 3.

Table 3. Performance of XGBoost models for weight prediction in naive PWH at 6, 12 and 24 months in the test sets.

Metric | Prediction M6 | Prediction M12 | Prediction M24
RMSE, kg (a) | 4.60 | 5.28 | 6.36
R2 of RMSE (a) | 0.893 | 0.862 | 0.808
Relative RMSE, % | 6.21 | 6.88 | 8.27
Relative bias, % | −0.08 | −0.10 | −0.87

(a) Value obtained after 10-fold cross-validation.

The RMSE evaluates the accuracy of the model by measuring the average difference between the actual values and the predictions (the lower it is, the better the performance of the model). The R2 of RMSE measures the quality of fit of the model in relation to the variability of the real data (0: no fit, 1: perfect fit). The relative RMSE expresses the relative error of the mean compared to the actual values. Relative bias measures the systematic error of the model compared to actual values.

The models showed modest performance, with RMSE values in the test set of 4.60, 5.28 and 6.36 kg for predictions at 6, 12 and 24 months, respectively. The calibration of the model was assessed visually using scatter plots of observed vs. predicted weights, presented in Fig 2. These plots illustrate the dispersion of predictions around the identity line.

Fig 2. Scatter plot of weights predicted by ML versus reference weight and examples of weight predictions at the checkpoint.

(A) 6-month prediction, (B) 12-month prediction, and (C) 24-month prediction.

The dispersion of predictions increased with the prediction horizon, in agreement with the increasing RMSE values.

Predictor importance

Baseline weight consistently emerged as the strongest predictor across all timepoints. Other variables such as age, sex, country of birth, CD4 cell count, and ART regimen contributed modestly to prediction. Lifestyle factors and comorbidities had limited influence on model performance (Fig 3).

Fig 3. Variable Importance Plot at 6 months (A), 12 months (B) and 24 months (C).

Benchmark linear model and baseline weight

The comparative LASSO-penalized linear regression model yielded RMSE (relative RMSE/relative bias) values in the test sets of 4.53 kg (6.23%/0.34%), 5.18 kg (6.85%/0.27%), and 6.18 kg (7.95%/0.72%) at 6, 12, and 24 months, respectively. The marginal difference between XGBoost and the linear model suggests that the predictive limitation lies in the informational content of the variables rather than the modeling technique.

When compared with the simple benchmark model assuming no weight change from baseline, XGBoost yielded lower RMSE values and higher R² values, indicating better fit. However, the magnitude of improvement was limited and insufficient for precise individual-level prediction (S5 Table).

Sensitivity analysis: Bounded subpopulations on weight change per month

The sensitivity populations were defined by stricter criteria and were therefore smaller, with cohorts of 6,679, 5,670 and 4,726 patients at 6, 12 and 24 months, respectively, when weight variations were restricted to 5 kg per month, and 6,540, 5,596 and 4,684 patients, respectively, when restricted to 3 kg per month. The XGBoost models developed from these more strictly selected subpopulations showed improved performance, which supports the presence of outliers with aberrantly large weight variations in the main dataset. Once again, better metric values were obtained for prediction at 6 months than at 12 and 24 months. When the maximum variation in weight per month was capped at 5 kg, we obtained RMSE values (S6 Table) of 4.42, 4.98 and 5.96 kg for predictions at 6, 12 and 24 months, respectively; with the cap at 3 kg, the corresponding values were 3.89, 4.82 and 5.93 kg. As in the main analysis, weight at the first visit was the most important variable; using this value (Weight_T0) directly as the prediction yielded poorer metrics than our XGBoost models.

Discussion

In this large national cohort of treatment-naïve people living with HIV, we evaluated the capacity of machine learning to predict individual weight evolution after ART initiation. Despite the use of more than one hundred baseline variables, a rigorous modeling strategy, and sensitivity analyses in carefully cleaned datasets, prediction accuracy remained limited, with RMSE values between 4 and 7 kg. This degree of error prevents the use of such models for clinically actionable individual prediction.

ML offers several advantages over the statistical methods classically used in population studies: it can model complex, non-linear relationships between variables and thereby capture more subtle patterns, and it can adapt predictions to each individual's characteristics. A particular advantage of XGBoost is its ability to handle large amounts of data. It can use datasets with correlated variables and is relatively resistant to overfitting, which allows a large number of variables to be considered.

The best results were obtained for predictions at 6 months, which is not surprising because weight varies less over this interval than over longer periods of 12 or 24 months of exposure to an ARV regimen. Furthermore, despite our efforts to group certain predictors together and thus limit the number of variables, we still had more than a hundred. Entering this many parameters during an ARV treatment initiation consultation is not feasible in the clinic without automatic extraction and preprocessing of the electronic health record. Although the choice of predictors was made after multidisciplinary consultation, some of them had obvious limitations: consideration of medical history and comedications was hampered by the absence of data on the duration of exposure.

The number of patients available in the Dat’AIDS database was very large; however, from a theoretical number of 78,621 patients, we had to reduce the final number considerably. Data cleaning was laborious, with missing data and many outliers to correct. The weight values, which were our key data in this project, were not systematically reported. Few patients ultimately had at least three “reliable” weights available for use in our models.

Finally, the most important predictor was baseline weight, whereas we would have expected predictive importance to be more evenly distributed across sex, ethnicity, age, comorbidities, or targeted comedications, as found in population studies [3,8–10]. The consistent emergence of baseline weight as the primary predictor reflects the strong biological inertia of anthropometric measurements. In adults, body weight remains highly stable over time in the absence of acute illness or significant lifestyle modification. Consequently, baseline weight exerts a dominant statistical effect that overshadows the more modest contributions of ART regimens or immune parameters.

Several key lessons emerge from our study. First, the results highlight the limitations of real-world multicenter data for building clinically precise predictive tools. Although Dat’AIDS covers a nationwide population including a significant proportion of patients of non-European descent, the consistency of data reporting varies significantly across collection centers and individual practitioners. This heterogeneity is quantitatively illustrated by our flow chart: out of 16,375 eligible ART-naïve patients, only 6,679 had a recorded weight available for the 6-month prediction. This attrition rate underscores that weight is not systematically reported in the database during routine care, even if measured clinically. This variability creates a ‘noisy’ dataset where the presence of data often depends on provider habits rather than patient characteristics. Consequently, the model had to be trained on a selected fraction of the original cohort, limiting its accuracy and ability to generalize.

The improvement observed when excluding patients with implausibly large monthly weight variations illustrates how sensitive ML models are to data quality, reinforcing the fundamental principle of “garbage in, garbage out”. Thus, “more data” is not sufficient if the data are heterogeneous or noisy; standardized and high-quality phenotyping is essential.

Second, while we included a wide range of demographic, clinical, laboratory, and treatment-related variables, the prediction of weight trajectories is inherently influenced by behavioral and lifestyle determinants, such as dietary patterns, physical activity, and psychosocial factors, that were not available in our dataset. Their absence likely contributed to the limited explanatory power of the models. To overcome these limitations, future predictive efforts must move beyond passive extraction of Electronic Medical Records. We recommend the integration of standardized Patient-Reported Outcomes (PROs) to actively capture determining factors such as dietary habits and psychosocial stressors. Furthermore, the increasing availability of digital health tools—such as connected activity trackers or mobile health applications—offers a way to collect granular, longitudinal data on lifestyle behaviors.

Our findings are consistent with those of Motta et al., who evaluated weight prediction using machine learning in a smaller, specialized cohort and also reported limited precision (3.5–5 kg) [13]. Similarly, trajectory analyses from the US CNICS cohort [16] and pooled analyses of international randomized trials [5] have highlighted the extreme heterogeneity of weight gain, which remains largely unexplained by standard demographic and clinical variables. Taken together, these studies demonstrate that the accuracy of purely clinical prediction models remains modest, both in diverse healthcare settings (Europe, USA) and in more homogeneous or specialized datasets. This reinforces the notion that weight gain after ART initiation is a multifactorial process, driven in part by unmeasured behavioral variables and biological variability that transcend specific national cohorts, which may limit the potential of purely data-driven prediction approaches.

Finally, beyond its immediate results, this work illustrates the broader challenge of applying artificial intelligence to real-world, multicenter clinical cohorts. Machine learning is highly sensitive to missing data, measurement inconsistencies, and unrecorded confounders. While we managed missing data using KNN imputation on variables with <30% missingness, we acknowledge that this method assumes data are missing at random and does not account for the uncertainty of the imputation as robustly as multiple imputation methods might. However, given that the primary predictor (baseline weight) was complete for all subjects, the impact of this limitation on overall performance is likely contained.

Conclusion

In summary, while machine learning applied to a large, heterogeneous national cohort was able to capture general weight trends, standard baseline clinical variables proved insufficient to provide accurate individual predictions. This result suggests that the limitation lies not within the modeling methodology, but rather in the inherent noise of real-world measurements and the absence of key behavioral determinants in routine Electronic Health Records. Consequently, this study highlights the critical importance of shifting focus from simply increasing sample size to improving data quality and integrating patient-reported outcomes to unlock the full potential of predictive analytics in HIV care.

Legros, G. Mchantaf, C. Mille, Y. Mohamed-Kassim, T. Prazuck, A. Sève, L. Vitry d’Aubigny

Supporting information

S1 Table. Predictors selection. (A) Comorbidities, (B) Co-medications and (C) Antiretroviral treatment.

INSTI, Integrase strand transfer inhibitor; TAF, Tenofovir alafenamide; NRTI, Nucleos(t)ide reverse transcriptase inhibitor; NNRTI, Non-nucleoside reverse transcriptase inhibitor; PI, Protease inhibitor.

(DOCX)

S2 Table. Hyperparameters setting for final XGBoost models. The following hyperparameters were tuned using grid search with 10-fold cross-validation via tidymodels package.

(DOCX)

S3 Table. Tables from Dat’AIDS database.

(DOCX)

S4 Table. Characteristics of the subpopulations of PLHIV treated in the first line used for weight prediction at 6, 12 and 24 months: maximum variation in weight per month limited to 5 kg (A) and 3 kg (B).

Values in number (%), 1Average (quartiles). †Not provided: 537 (8.04), ‡Not provided: 410 (7.23), §Not provided: 285 (6.03). $Not specified: 530 (8.10), $$Not specified: 409 (7.31), $$$Not specified: 284 (6.06).

(DOCX)

S5 Table. Performance using Weight_T0 = weight at checkpoints. a Value obtained after 10-fold cross-validation.

The RMSE evaluates the accuracy of the model by measuring the average difference between the actual values and the predictions (the lower it is, the better the performance of the model). The R2 of RMSE measures the quality of fit of the model in relation to the variability of the real data (0: no fit, 1: perfect fit). The relative RMSE expresses the relative error of the mean compared to the actual values. Relative bias measures the systematic error of the model compared to actual values.

(DOCX)

pone.0344570.s005.docx (23.8KB, docx)
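The four metrics described in the S5 Table note can be written out as follows. The RMSE and R² formulas are the standard ones; the supplement does not give formulas for the relative RMSE and relative bias, so the definitions below (each scaled by the mean observed value and expressed as a percentage) are assumptions, and the example weights are made up.

```python
import math


def rmse(obs, pred):
    """Root mean square error, in the unit of the outcome (kg here)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """1 - SS_res / SS_tot: share of observed variance captured by the predictions."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def relative_rmse(obs, pred):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    return 100 * rmse(obs, pred) / (sum(obs) / len(obs))


def relative_bias(obs, pred):
    """Mean signed error as a percentage of the mean observed value (assumed definition)."""
    mean_err = sum(p - o for o, p in zip(obs, pred)) / len(obs)
    return 100 * mean_err / (sum(obs) / len(obs))


observed = [70.0, 80.0, 90.0]   # made-up weights in kg
predicted = [72.0, 79.0, 93.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
print(relative_rmse(observed, predicted), relative_bias(observed, predicted))
```

A positive relative bias under this convention means the model overpredicts weight on average; a value near zero with a large RMSE indicates errors that are large but not systematically one-sided.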
S6 Table. Performance of XGBoost models for weight prediction at 6, 12 and 24 months in study subpopulations. (A) Subpopulation limited to 5 kg/month, (B) Subpopulation limited to 3 kg/month.

a: Value obtained after 10-fold cross-validation. The RMSE evaluates the accuracy of the model as the square root of the mean squared difference between the actual values and the predictions (the lower it is, the better the performance of the model). The R² measures the quality of fit of the model in relation to the variability of the real data (0: no fit, 1: perfect fit). The relative RMSE expresses the RMSE relative to the actual values. Relative bias measures the systematic error of the model relative to the actual values.

(DOCX)

pone.0344570.s006.docx (25.1KB, docx)

Acknowledgments

Members of the Dat’AIDS study group: referent Dr Laurent Hocqueloux (laurent.hocqueloux@chu-orleans.fr)

1.Besançon: C. Chirouze, K. Bouiller, F. Bozon, AS. Brunel, L. Hustache-Mathieu, J. Lagoutte, Q. Lepiller, S. Marty-Quinternet, L. Pépin-Puget, B. Rosolen, N. Tissot, C. Lebreton, L. Bohard

2.Brest: S. Jaffuel, S. Ansart, Y. Quintric, S. Rezig, P. Gazeau, R. Paret, A. Coste, S. Rolland, JC. Duthe

3.Clermont-Ferrand: C. Jacomet, N. Mrozek, C. Theis, M. Vidal, C. Richaud, V. Corbin, A. Benelhadj, A. Zaghdoudi, C. Aumeran, O. Baud, M. Berthommier, M. Charles, C. Durand, D. Coban, A. Mirand, A. Brebion, H. Chabrolles, O. Perruche, E. Creuzet, C. Henquell

4.Guadeloupe: I. Lamaury, G. Baronnet, F. Bissuel, F. Boulard, A. Chéret, J. Coussement, E. Curlier, T. Dequidt, C. Desfontaines, S. Devatine, E. Duvallon, I. Fabre, C. Herrmann-Storck, C. Loraux, S. Markowicz, M. Marquet, R. Ouissa, S. Peugny, L. Pradat-Paz, M. C. Receveur, J. Reltien, K. Samar, K. Schepers, B. Tressieres, V. Walte

5.La Roche sur Yon: D. Merrien, O. Bollangier, D. Boucher, T. Guimard, L. Laine, S. Leautez, M. Morrier, P. Perré

6.La Rochelle: M. Roncato-Saberan, X. Pouget-Abadie, C. Chapuzet, A. Thomas

7.Limoges: JF. Faucher, A. Cypierre, S. Ducroix-Roubertou, H. Durox, C. Genet-Villeger, J. Pascual, P. Pinet, C. Codde, S. Rogez, JB. Woillard, C. Benoist, S. Mafi

8.Lyon: A. Becker, M. Godinot, F. Ader, M. Bonjour, E. Braun, C. Brochier, F. Brunel-Dalmas, P. Chiarello, A. Conrad, S. Degroodt, P. Fascia, T. Ferry, V. Gueripel, V. Icard, J. Izard, C. Javaux, H. Lardot, J. Lippmann, D. Makhloufi, Y. Merad, T. Perpoint, S. Roux, S. Sahyouni, M. Simon, S. Soueges, C. Triffault-Fillit, F. Valour, L. Van den Bogaart, M. Wan, AS. Batalla

9.Marseille IHU Méditerranée: A. Ménard, Y. Belkhir, P. Colson, C. Dhiver, M. Martin-Degioanni, A. Motte, C. Toméi, M. Million, N. De Palmas, M. Champeaux, I. Ravaux

10.Marseille Ste Marguerite: S. Brégigeon, O. Zaegel-Faucher, H. Laroche, MJ. Ducassou, A. Ivanova, I. Jaquet, V. Obry-Roguet, M. Orticoni, E. Ressiot, AS. Ritleng, F. Niemetzky, C. Ferron

11.Martinique: A. Cabié, S. Abel, O. Cabras, L. Cuzin, G. Dos Santos, L. Facelina, L. Fagour, L. de Ghellinck, K. Guitteaud, E. Louis-Michel, E. Medo, F. Quenard, S. Pierre-François, P. Richard, A. Schapira, B. Tregan

12.Metz: C. Robert, Z. Cavalli, L. Bucy, C. Emilie, A. Fournier

13.Montpellier: A. Makinson, A. Artiaga, M. Bistoquet, E. Delaporte, V. Le Moing, J. Lejeune, N. Meftah, C. Merle de Boever, B. Montes, L. Perez, N. Pansu, J. Reynes, C. Tramoni, E. Tuaillon

14.Nancy: B. Lefèvre, M. André, S. Bevilacqua, L. Boyer, MP. Grandin, A. Charmillon, M. Delestan, E. Frentiu, F. Goehringer, S. Hénard, E. Jeanmaire, C. Rabaud, L. Lalevée, J. Kotzyba

15.Nantes: C. Allavena, E. André-Garnier, A. Asquier-Khati, V. Bellon, E. Billaud, C. Biron, B. Bonnet, S. Bouchez, D. Boutoille, J. Brochon, C. Brunet-Cartier, M. Cavellec, L. Collias, C. Deschanvres, T. Drumel, BJ. Gaborit, M. Gregoire, T. Jovelin, R. Lecomte, M. Lefebvre, M. Le Goff, C. Mear-Passard, P. Morineau, C. Moyon, E. Paredes, V. Pineau, G. Querne, A. Soria

16.Nice: D. Chirio, P. Pugliese, C. Bonnefoy, M. Buscot, M. Carles, A. Courdurié, J. Courjon, E. Cua, P. Dellamonica, E. Demonchy, A. De Monte, S. Ferrando, C. Pradier, K. Risso, A. Viot, S. Wehrlen-Pugliese

17.Niort: S. Sunder, V. Goudet, A. Dos Santos, V. Rzepecki, A. Metais

18.Orléans: L. Hocqueloux, R. Albert, V. Avettand-Fènoël, S. Bafong-Ketchemen, G. Béraud, J. Effa, C. Gubavu, V. Legros, G. Mchantaf, C. Mille, Y. Mohamed-Kassim, T. Prazuck, A. Sève, L. Vitry d’Aubigny

19.Paris APHP Bicêtre: C. Goujard, A. Castro-Gordon, P. David-Chevallier, V. Godard, Y. Quertainmont, E. Teicher/ S. Jaureguiberry, L. Escaut, B. Henry, C. Couzigou, O. Derradji, R. Collarino, J. Y. Liotier, M. Merad, L. Lévi, L. Lefèvre, R. Courtois

20.Paris APHP Bichat: V. Joly, A. Bachelard, C. Charpentier, D. Descamps, M. Digumber, A. Gervais, J. Ghosn, Z. Julia, R. Landman, F. Ouvrard, N. Peiffer-Smadja, G. Peytavin, C. Rioux, Y. Yazdanpanah, L. Deconinc

21.Paris APHP Necker/Institut Pasteur: C. Duvivier, K. Amazzough, G. Benabdelmoumen, P. Bossi, G. Cessot, PH. Consigny, M. Garzaro, E. Gomes-Pires, P. Hochedez, O. Itani, K. Jidar, E. Lafont, F. Lanternier, O. Lortholary, C. Louisin, J. Lourenco, C. Melenotte, P. Parize, C. Rouzaud, A. Serris, F. Taieb, J. Zeggagh

22.Paris APHP Pitié Salpetrière: V. Pourcher, MA. Valantin, C. Katlama, L. Schneider, S. Seang, R. Tubiana, A. Faycal, S. Saliba, M. Favier, C. Aubron, R. Agher, Y. Dudoit, N. Hamani, N. Qatib, A. Chermak, M. Chansombat, G. Osseni, D. Beniken, A. Nadour

23.Quimper: N. Hall, P. Perfezou, JC. Duthe, FB. Drevillon, JP. Talarmin, L. Khatchatourian, P. Petitgas, P. Martinet

24.Reims: F. Bani-Sadr, V. Brodard, M. Hentzien, I. Kmiec, D. Lambert, D. Lebrun, M. Moutel, M. Petithomme-Nanrocki, A. Brunet, H. Marty, Y. N’Guyen, C. Strady, V. Greigert

25.Rennes: C. Arvieux, M. Baldeyrou, F. Benezit, G. Bury, M. Cailleaux, JM. Chapplain, M. Dupont, JC. Duthé, S. Ismaël, T. Jovelin, A. Lebot, F. Lemaitre, D. Luque-Paz, A. Maillard, C. Morlat, S. Patrat-Delon, L. Picard, M. Poisson-Vannier, L. Poussier, C. Pronier, M. Revest, M. Sebillotte, P. Tattevin, C. Thoreux

26.St Etienne: A. Gagneux-Brunon, E. Botelho-Nevers, A. Pouvaret, F. Saunier, V. Ronat

27.Strasbourg: A. Ursenbach, C. Cheneau, C. Bernard-Henry, S. Fafi-Kremer, P. Gantner, C. Mélounou, P. Klee, Y. Hansmann, N. Lefebvre, Y. Ruch, F. Danion, B. Hoellinger, T. Lemmet, V. Gerber, JM. Schevin, A. Fuchs, C. Le Hyaric, D. Rey

28.Toulouse: P. Delobel, M. Alvarez, N. Biezunski, X. Boumaza, A. Chan Sui Ko, N. Collercandy, A. Debard, C. Delpierre, P. Gandia, C. Garnier, R. Gueneau, L. Lelièvre, G. Martin-Blondel, C. Rastoll, S. Raymond, C. Vellas

29.Tourcoing: O. Robineau, E. Aïssi, I. Alcaraz, E. Alidjinou, V. Baclet, A. Boucher, V. Derdour, B. Lafon-Desmurs, A. Meybeck, M. Tetart, M. Valette, N. Viget, A. Diarra, E. Bontemps, B. Capelliez, P. Coulon

30.Vannes: G. Corvaisier, M. Brière, M. De La Chapelle, M. Gousseff, R. Nguyen Van, M. Thierry

Data Availability

The data underlying this study are drawn from the French Dat’AIDS cohort. These data cannot be shared publicly due to national data protection regulations (Commission Nationale de l’Informatique et des Libertés, CNIL). Access to Dat’AIDS data may be granted upon reasonable request to the Dat’AIDS scientific committee (president: Laurent Hocqueloux; laurent.hocqueloux@chu-orleans.fr and data protection officer: dpo@dataids.com), subject to compliance with French regulations and institutional agreements.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Rockstroh JK, Lennox JL, Dejesus E, Saag MS, Lazzarin A, Wan H, et al. Long-term treatment with raltegravir or efavirenz combined with tenofovir/emtricitabine for treatment-naive human immunodeficiency virus-1-infected patients: 156-week results from STARTMRK. Clin Infect Dis. 2011;53(8):807–16. doi: 10.1093/cid/cir510
  • 2. McComsey GA, Moser C, Currier J, Ribaudo HJ, Paczuski P, Dubé MP, et al. Body composition changes after initiation of raltegravir or protease inhibitors: ACTG A5260s. Clin Infect Dis. 2016;62(7):853–62. doi: 10.1093/cid/ciw017
  • 3. Norwood J, Turner M, Bofill C, Rebeiro P, Shepherd B, Bebawy S, et al. Brief Report: Weight gain in persons with HIV switched from efavirenz-based to integrase strand transfer inhibitor-based regimens. J Acquir Immune Defic Syndr. 2017;76(5):527–31. doi: 10.1097/QAI.0000000000001525
  • 4. Bourgi K, Jenkins CA, Rebeiro PF, Palella F, Moore RD, Althoff KN, et al. Weight gain among treatment-naïve persons with HIV starting integrase inhibitors compared to non-nucleoside reverse transcriptase inhibitors or protease inhibitors in a large observational cohort in the United States and Canada. J Int AIDS Soc. 2020;23(4):e25484. doi: 10.1002/jia2.25484
  • 5. Sax PE, Erlandson KM, Lake JE, McComsey GA, Orkin C, Esser S, et al. Weight Gain Following Initiation of Antiretroviral Therapy: Risk Factors in Randomized Comparative Clinical Trials. Clin Infect Dis. 2020;71(6):1379–89. doi: 10.1093/cid/ciz999
  • 6. NAMSAL ANRS 12313 Study Group, Kouanfack C, Mpoudi-Etame M, Omgba Bassega P, Eymard-Duvernay S, Leroy S, et al. Dolutegravir-based or low-dose efavirenz-based regimen for the treatment of HIV-1. N Engl J Med. 2019;381(9):816–26. doi: 10.1056/NEJMoa1904340
  • 7. Venter WDF, Sokhela S, Simmons B, Moorhouse M, Fairlie L, Mashabane N, et al. Dolutegravir with emtricitabine and tenofovir alafenamide or tenofovir disoproxil fumarate versus efavirenz, emtricitabine, and tenofovir disoproxil fumarate for initial treatment of HIV-1 infection (ADVANCE): Week 96 results from a randomised, phase 3, non-inferiority trial. Lancet HIV. 2020;7(10):e666–76. doi: 10.1016/S2352-3018(20)30241-1
  • 8. Lake JE, Wu K, Bares SH, Debroy P, Godfrey C, Koethe JR, et al. Risk factors for weight gain following switch to integrase inhibitor-based antiretroviral therapy. Clin Infect Dis. 2020;71(9):e471–7. doi: 10.1093/cid/ciaa177
  • 9. Menard A, Meddeb L, Tissot-Dupont H, Ravaux I, Dhiver C, Mokhtari S, et al. Dolutegravir and weight gain: An unexpected bothering side effect? AIDS. 2017;31(10):1499–500. doi: 10.1097/QAD.0000000000001495
  • 10. Bhagwat P, Ofotokun I, McComsey GA, Brown TT, Moser C, Sugar CA, et al. Changes in waist circumference in HIV-infected individuals initiating a raltegravir or protease inhibitor regimen: Effects of sex and race. Open Forum Infectious Diseases. 2018;5(11). doi: 10.1093/ofid/ofy201
  • 11. Puri C, Dolui K, Kooijman G, Masculo F, Van Sambeek S, Den Boer S, et al. Gestational weight gain prediction using privacy preserving federated learning. Annu Int Conf IEEE Eng Med Biol Soc. 2021;2021:2170–4. doi: 10.1109/EMBC46164.2021.9630505
  • 12. Qiao S, Li X, Olatosi B, Young SD. Utilizing Big Data analytics and electronic health record data in HIV prevention, treatment, and care research: A literature review. AIDS Care. 2024;36(5):583–603. doi: 10.1080/09540121.2021.1948499
  • 13. Motta F, Milic J, Gozzi L, Belli M, Sighinolfi L, Cuomo G, et al. A machine learning approach to predict weight change in ART-experienced people living with HIV. J Acquir Immune Defic Syndr. 2023;94(5):474–81. doi: 10.1097/QAI.0000000000003302
  • 14. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019;4(43):1686. doi: 10.21105/joss.01686
  • 15. Kuhn M, Wickham H. Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles. https://www.tidymodels.org. 2020.
  • 16. Bailin SS, Gabriel CL, Wanjalla CN, Koethe JR. Obesity and weight gain in persons with HIV. Curr HIV/AIDS Rep. 2020;17(2):138–50. doi: 10.1007/s11904-020-00483-5

Decision Letter 0

Carmen González-Domenech

10 Nov 2025

Dear Dr. Codde,

Please submit your revised manuscript by Dec 25 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Carmen María González-Domenech, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS One has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. In the online submission form, you indicated that [The data underlying this study are drawn from the French Dat’AIDS cohort. These data cannot be shared publicly due to national data protection regulations (Commission Nationale de l’Informatique et des Libertés, CNIL). Access to Dat’AIDS data may be granted upon reasonable request to the Dat’AIDS scientific committee, subject to compliance with French regulations and institutional agreements.].

All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

4. One of the noted authors is a group or consortium [Dat’AIDS Study Group]. In addition to naming the author group, please list the individual authors and affiliations within this group in the acknowledgments section of your manuscript. Please also indicate clearly a lead author for this group along with a contact email address.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

This is a well-designed study with a large and representative sample (over 24,000 patients), rigorous methodology, and robust statistical and sensitivity analyses. It demonstrates the current limitations of ML in real-world clinical cohorts, especially when behavioral or lifestyle data are lacking. My opinion is positive regarding publication, but the authors should first address the minor comments from two of the reviewers and, especially, the major points raised by the third.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #1: The manuscript presents an important and well-executed analysis of weight gain prediction after antiretroviral therapy initiation using a large French real-world cohort. The authors should be commended for the rigorous data processing, multidisciplinary approach, and transparent reporting of a scientifically meaningful “negative result.”

However, several points need clarification or elaboration before acceptance:

Please clarify the conceptual novelty and intended contribution of the study beyond demonstrating limited ML performance — is the focus methodological (data quality and model robustness) or clinical (individual prediction feasibility)?

Consider adding a comparative analysis using alternative models (e.g., Random Forest, SVM) to contextualize XGBoost’s relative performance.

Expand on the feature importance analysis: provide SHAP plots or additional interpretation of why baseline weight dominates the prediction.

Quantify data heterogeneity across centers and discuss how measurement frequency or data quality influenced model accuracy.

The Discussion could be more concise and focused on broader implications for AI in healthcare, emphasizing the importance of behavioral and lifestyle data integration.

Reviewer #2: This study uses machine learning (XGBoost) on a large French HIV cohort to predict weight gain after antiretroviral therapy (ART) initiation at 6, 12, and 24 months using 112 baseline clinical variables. The models marginally outperformed a simple benchmark (baseline weight) but did not achieve clinically actionable accuracy, primarily due to data heterogeneity and absence of behavioral variables.

Major Points

• Innovation and Importance:

The study addresses a well-recognized clinical issue—excessive weight gain following ART—and applies advanced ML methods on a large real-world dataset, filling a gap in prediction research in HIV care.

• Sample and Data:

The cohort size is impressive (over 24,000 ART-naïve adults), and the data includes a comprehensive range of clinical, laboratory, and demographic features.

• Model Choice:

XGBoost is an appropriate algorithm given the dataset size and the mix of variable types. The use of cross-validation and careful train/test splitting is appropriate.

• Limitations Clearly Stated:

The paper explicitly acknowledges limitations around the lack of high-quality behavioral, lifestyle, and granular longitudinal data. It also carefully details the reduction of sample size due to missing data and the impact such missingness has on model performance.

• Performance and Interpretation:

The models achieve RMSE values of 4.6, 5.3, and 6.4 kg (at 6, 12, and 24 months), which is only a marginal improvement over the baseline. Baseline weight overwhelms all other predictors; other variables, including ART components, have limited additional value. The discussion around why ML does not perform well in this context is appropriately critical and balanced.

• Methodological Transparency:

Data processing, variable selection, and imputation steps are described transparently. Sensitivity analyses excluding outliers and restricting datasets were thoughtful and further contextualized the findings.

• Ethical and Data Sharing Declarations:

Ethical approvals and data access limitations are well described. The paper observes regulatory limitations on data sharing, and this is explained up front.

Minor Points

• Lifestyle Predictor Handling:

Lifestyle and behavioral factors are mentioned as potentially important, but their absence is only briefly discussed. The authors could speculate more on how to either estimate or collect these for future work.

• Imputation Strategy:

The choice of k-nearest neighbors for imputation is standard, but potential biases from this method are not deeply explored. Some simulation or secondary analysis around missingness mechanisms might add value.

• Model Calibration:

The paper does not report calibration plots (e.g., observed vs predicted weights). Given the clinical implications, calibration is important to assess and could be shown, even if limited.

• External Generalizability:

The study’s scope is the French population, but some comment about applicability to other settings, especially outside Europe, would be welcome.

• Figure/Table Presentation:

Figures and tables are referenced well, but future submissions could improve access to the key visuals (since they are in supplemental content).

• Comparison to Published Literature:

Only a few related studies are referenced (notably Motta et al.). Adding further international context may help underscore the universality of the limitations found.

Overall Assessment

This is a high-quality, carefully executed study with an honest appraisal of the challenges of applying ML to real-world clinical prediction in HIV. While negative in primary results, the findings are valuable and relevant to the field. The main area for improvement would be a deeper exploration of missing data and model calibration, and a more detailed discussion of the challenges of integrating behavioral variables in future iterations.

Reviewer #3: The manuscript is well structured and statistically appropriate. However, there are some issues/questions.

1. The introduction is weak. There are already studies investigating ML approaches to predicting weight change among PLWH. The manuscript fails to provide a comprehensive literature review of related work and research gaps. Given this, what are the additional contributions of this study to the literature?

2. What are the missing rates of predictors?

3. What are the hyper parameters to be tuned and what are the specific parameter settings of XGBoost?

4. The benchmark only considered a no-weight-change assumption by using Weight_M0 while a simple linear model (or Lasso) should also be added and compared with ML approach.

5. The conclusion that ML "failed to provide accurate individual predictions" did not convince me. The non-clinically significant improvement of the ML approach may be due to the noisy variation of weight change or to the limited predictive ability of the predictors. For example, the study only used baseline predictors although the cohort had dynamic information after ART initiation; predictors that were missing in this study may be other important factors for weight change.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

Attachment

Submitted filename: PlosOne.pdf

pone.0344570.s007.pdf (84.9KB, pdf)
PLoS One. 2026 Mar 6;21(3):e0344570. doi: 10.1371/journal.pone.0344570.r002

Author response to Decision Letter 1


2 Feb 2026

The journal requirements have been addressed.

Additional Editor Comments:

This is a well-designed study with a large and representative sample (over 24,000 patients), rigorous methodology, and robust statistical and sensitivity analyses. It demonstrates the current limitations of ML in real-world clinical cohorts, especially when behavioral or lifestyle data are lacking. My opinion is positive regarding publication, but the authors should first address the minor comments from two of the reviewers and, especially, the major points raised by the third.

Answer: We thank the Editor for this highly encouraging assessment and for recognizing the value of our rigorous methodology despite the challenges of real-world data.

We have carefully addressed every comment raised by the three reviewers. In particular, to address the major points raised by Reviewer 3, we have:

● Conducted a comparative analysis with a Linear Regression model to benchmark the ML performance (proving that the limitation lies in the data, not the algorithm).

● Significantly expanded the Introduction to better define the research gap regarding large-scale scalability.

● Provided full transparency on hyperparameters (new S1 Table).

● Rewritten the Conclusion to clarify that the results reflect the insufficiency of standard clinical variables rather than a failure of the machine learning methodology itself.

We have also incorporated all minor suggestions from Reviewers 1 and 2, including explicitly assessing calibration and discussing generalizability. We believe these revisions have significantly strengthened the manuscript and fully meet the journal's requirements.

Reviewer #1: The manuscript presents an important and well-executed analysis of weight gain prediction after antiretroviral therapy initiation using a large French real-world cohort. The authors should be commended for the rigorous data processing, multidisciplinary approach, and transparent reporting of a scientifically meaningful “negative result.”

However, several points need clarification or elaboration before acceptance:

Please clarify the conceptual novelty and intended contribution of the study beyond demonstrating limited ML performance — is the focus methodological (data quality and model robustness) or clinical (individual prediction feasibility)?

Answer: Initially, our primary objective was indeed clinical. Given the widespread concern regarding weight gain associated with INSTIs and TAF, we hypothesized that the large sample size of the Dat’AIDS cohort, combined with non-linear ML models, would enable accurate individual weight prediction.

However, contrary to our expectations, the results demonstrated that despite the comprehensive set of variables available, we reached a predictive "glass ceiling." Consequently, the contribution of this manuscript evolved from providing a clinical tool to establishing a methodological proof: we demonstrate that without behavioral data (such as diet and physical activity), even massive clinical datasets are insufficient to predict individual weight trajectories. This "negative" result serves as a critical message to prevent the research community from overestimating the power of heterogeneous real-world data for this specific outcome.

Consider adding a comparative analysis using alternative models (e.g., Random Forest, SVM) to contextualize XGBoost’s relative performance.

Answer: We agree that contextualizing XGBoost’s performance is essential. To address this, and in line with Reviewer 3’s recommendation, we conducted a comparative analysis using a standard multivariable linear regression model with LASSO penalization.

We chose this comparison (linear vs. non-linear) rather than testing other non-linear methods such as Random Forest or SVM, because our primary goal was to determine whether the complex architecture of XGBoost was capturing non-linear patterns that traditional statistical methods miss.

The results showed that the Linear Regression model achieved RMSE values of 4.53 kg, 5.18 kg, and 6.18 kg at 6, 12, and 24 months, respectively, very similar to the XGBoost performance. This similarity confirms that the limiting factor is the predictive power of the variables themselves, rather than the choice of the algorithm.
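The reasoning in this answer (a flexible model cannot beat a simple one when the remaining error is noise that the predictors do not carry) can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data are synthetic, the mean gain of 2 kg and the noise SD of 4.5 kg are hypothetical, and the comparison is made in-sample for brevity, whereas the study evaluated its models on held-out data.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def fit_ols(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic cohort (NOT study data): follow-up weight is baseline weight plus
# a hypothetical mean gain and person-level noise unrelated to any predictor.
rng = random.Random(0)
weight_m0 = [rng.gauss(75, 15) for _ in range(1000)]
weight_m6 = [w + rng.gauss(2, 4.5) for w in weight_m0]

# Benchmark: assume no change, i.e. predict Weight_M0 itself.
rmse_benchmark = rmse(weight_m6, weight_m0)

# Linear model fitted on the same rows (in-sample, for illustration only).
intercept, slope = fit_ols(weight_m0, weight_m6)
rmse_linear = rmse(weight_m6, [intercept + slope * w for w in weight_m0])

print(f"benchmark RMSE: {rmse_benchmark:.2f} kg, linear RMSE: {rmse_linear:.2f} kg")
```

In this setup the fitted model can only recover the average gain; its RMSE stays close to the simulated noise SD, which is the floor that any algorithm given the same predictors would face.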

Expand on the feature importance analysis: provide SHAP plots or additional interpretation of why baseline weight dominates the prediction.

Answer: Regarding the dominance of baseline weight, we believe this reflects the strong biological inertia of anthropometric parameters rather than a model bias. Body weight is a highly autoregressive variable: in adults, weight at month 6 is structurally strongly correlated with weight at month 0.

Given that our Permutation Importance analysis (Supplemental Figure 1) already clearly quantifies this overwhelming dominance, we chose to focus on expanding the biological interpretation in the manuscript rather than adding SHAP plots, which would likely be visually redundant with the finding that baseline weight dominates all other predictors. We have added a dedicated paragraph in the Discussion to explain this “biological inertia”.
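For readers unfamiliar with permutation importance, the idea is to shuffle one predictor at a time and record how much the prediction error degrades. The sketch below is a generic illustration, not the authors' pipeline: the "fitted model" is a fixed formula with hypothetical coefficients, the two predictors and all data are synthetic, and importance is expressed as the increase in RMSE.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def predict(row):
    """Stand-in for a fitted model; the coefficients are hypothetical."""
    return row["weight_m0"] + 0.1 * row["age"]


def permutation_importance(rows, target, feature, rng):
    """Increase in RMSE after shuffling one feature column across rows."""
    baseline = rmse(target, [predict(r) for r in rows])
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return rmse(target, [predict(r) for r in permuted_rows]) - baseline


# Synthetic data (NOT study data): the outcome depends strongly on baseline
# weight, weakly on age, plus unexplained noise.
rng = random.Random(1)
rows = [{"weight_m0": rng.gauss(75, 15), "age": rng.gauss(40, 10)}
        for _ in range(500)]
target = [predict(r) + rng.gauss(0, 4) for r in rows]

importance = {f: permutation_importance(rows, target, f, rng)
              for f in ("weight_m0", "age")}
for name, gain in importance.items():
    print(f"{name}: RMSE increase of {gain:.2f} kg when permuted")
```

When one predictor carries most of the signal, as baseline weight does for an autoregressive outcome, shuffling it inflates the error far more than shuffling any other column.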

Quantify data heterogeneity across centers and discuss how measurement frequency or data quality influenced model accuracy.

Answer : We agree that data heterogeneity across centers is a critical factor. Regarding quantification, the most telling metric is the attrition rate shown in the Flow Chart (Figure 1). Starting from 16,375 eligible ART-naïve patients, only 6,679 had a usable weight measurement for the 6-month prediction.

This loss is not merely random missing data; it reflects structural disparities in data collection practices across the clinical centers contributing to the cohort. While some centers systematically record weight, others do so inconsistently. This issue also extends beyond measurement frequency: in many cases, patients may be weighed during the consultation, but the value is not reported in the structured database. This variation depends heavily on individual practitioner habits, leading to significant heterogeneity not only between centers but also between physicians. This lack of standardized reporting introduces structural noise and potential selection bias. Consequently, the ML model struggles to distinguish true biological signals from data quality artifacts, which directly contributes to the predictive inaccuracy observed.
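As a small worked illustration, the attrition implied by the two figures quoted above can be computed directly, and the same idea extends to a per-center completeness rate. The sketch below is in Python; the `completeness_by_center` helper and the visit records for centers "A" and "B" are hypothetical and only show how such a breakdown could be obtained.

```python
eligible = 16_375   # eligible ART-naive patients, as quoted in the response
usable_m6 = 6_679   # patients with a usable weight for the 6-month prediction

retained = usable_m6 / eligible
lost = 1.0 - retained
print(f"retained: {retained:.1%}, lost: {lost:.1%}")

def completeness_by_center(records):
    """Share of visits with a recorded weight, per center.

    `records` is a list of (center, weight_or_None) pairs.
    """
    totals, recorded = {}, {}
    for center, weight in records:
        totals[center] = totals.get(center, 0) + 1
        if weight is not None:
            recorded[center] = recorded.get(center, 0) + 1
    return {c: recorded.get(c, 0) / totals[c] for c in totals}

# Hypothetical visit records, NOT cohort data.
visits = [("A", 70.0), ("A", 71.5), ("A", None), ("A", 69.0),
          ("B", None), ("B", None), ("B", 82.0), ("B", None)]
by_center = completeness_by_center(visits)
print(by_center)
```

With the quoted figures, roughly 40.8% of eligible patients are retained for the 6-month model and roughly 59.2% are lost.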

The Discussion could be more concise and focused on broader implications for AI in healthcare, emphasizing the importance of behavioral and lifestyle data integration.

Answer: To address the request for conciseness, we have removed the detailed paragraph discussing the specific recording limitations of certain variables (smoking cessation, compliance, amphetamines). We agree that these technical details distracted from the main findings.

At the same time, we believe it is essential to retain the other sections of the discussion. For a study reporting a 'negative' result, a thorough exploration of the methodological process is essential to ensure transparency and to robustly support our conclusion that the limitation lies within the data, not the analysis. Furthermore, regarding the 'broader implications,' we have expanded the discussion to emphasize that the future of AI in healthcare lies in hybridizing clinical data with behavioral Patient-Reported Outcomes (PROs), rather than solely relying on larger EHR datasets.

Reviewer #2: This study uses machine learning (XGBoost) on a large French HIV cohort to predict weight gain after antiretroviral therapy (ART) initiation at 6, 12, and 24 months using 112 baseline clinical variables. The models marginally outperformed a simple benchmark (baseline weight) but did not achieve clinically actionable accuracy, primarily due to data heterogeneity and absence of behavioral variables.

Major Points

• Innovation and Importance:

The study addresses a well-recognized clinical issue—excessive weight gain following ART—and applies advanced ML methods on a large real-world dataset, filling a gap in prediction research in HIV care.

• Sample and Data:

The cohort size is impressive (over 24,000 ART-naïve adults), and the data includes a comprehensive range of clinical, laboratory, and demographic features.

• Model Choice:

XGBoost is an appropriate algorithm given the dataset size and the mix of variable types. The use of cross-validation and careful train/test splitting is appropriate.

• Limitations Clearly Stated:

The paper explicitly acknowledges limitations around the lack of high-quality behavioral, lifestyle, and granular longitudinal data. It also carefully details the reduction of sample size due to missing data and the impact such missingness has on model performance.

• Performance and Interpretation:

The models achieve RMSE values of 4.6, 5.3, and 6.4 kg (at 6, 12, and 24 months), which is only a marginal improvement over the baseline. Baseline weight overwhelms all other predictors; other variables, including ART components, have limited additional value. The discussion around why ML does not perform well in this context is appropriately critical and balanced.

• Methodological Transparency:

Data processing, variable selection, and imputation steps are described transparently. Sensitivity analyses excluding outliers and restricting datasets were thoughtful and further contextualized the findings.

• Ethical and Data Sharing Declarations:

Ethical approvals and data access limitations are well described. The paper observes regulatory limitations on data sharing, and this is explained up front.

Minor Points

• Lifestyle Predictor Handling:

Lifestyle and behavioral factors are mentioned as potentially important, but their absence is only briefly discussed. The authors could speculate more on how to either estimate or collect these for future work.

Answer: Our results suggest that future improvements in prediction will not come from larger clinical databases, but from different types of data.

We have expanded the Discussion to specifically propose two concrete strategies: 1) The systematic integration of standardized Patient-Reported Outcomes (PROs) to capture dietary and psychosocial factors, and 2) The potential use of digital health tools (connected devices) for longitudinal monitoring of physical activity. We argue that hybridizing clinical cohorts with these patient-generated data is the necessary next step for the field.

• Imputation Strategy:

The choice of k-nearest neighbors for imputation is standard, but potential biases from this method are not deeply explored. Some simulation or secondary analysis around missingness mechanisms might add value.

Answer: We acknowledge that investigating missingness mechanisms is valuable, particularly for inferential statistics. However, in this prediction-focused study, we opted for KNN imputation for its efficiency and compatibility with our machine learning pipeline.

We believe that the potential bias introduced by imputation remains minimal with respect to our main conclusion, because our feature importance analysis showed that Baseline Weight is the overwhelming driver of the prediction. By design (inclusion criteria), Baseline Weight had 0% missing data. Consequently, imputation was applied only to secondary variables with low predictive power. Therefore, using a more complex method such as MICE would essentially refine the "noise" without altering the primary signal driven by the non-imputed baseline weight.
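The principle of k-nearest-neighbour imputation can be sketched in a few lines of Python. This is a deliberately simplified, assumed version (unscaled Euclidean distance over the columns both patients have observed, mean of the k nearest donors); the function name, the column layout, and the four patients are hypothetical, and the actual pipeline may differ in distance metric, scaling, and choice of k.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have the missing column observed
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                # Distance averaged over shared columns; a real pipeline would scale features first.
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical columns: baseline weight (kg, never missing), CD4 count, age.
rows = [
    (60.0, 350.0, 30.0),
    (62.0, None, 31.0),
    (90.0, 600.0, 55.0),
    (61.0, 400.0, 29.0),
]
filled = knn_impute(rows, k=2)
print(filled[1])
```

In this toy example the missing CD4 value is filled from the two patients most similar in weight and age, while the baseline weight column is never touched, which mirrors the argument made above.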

• Model Calibration:

The paper does not report calibration plots (e.g., observed vs predicted weights). Given the clinical implications, calibration is important to assess and could be shown, even if limited.

Answer: We intended Figure 2 (Scatter plots of observed vs. predicted weights) to serve this specific purpose, as plotting predicted values against reference values is the standard method for visualizing calibration in regression tasks.

However, to ensure this is unambiguous for the reader, we have modified the manuscript text to explicitly label this analysis as a calibration assessment.
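One common numerical companion to an observed-versus-predicted scatter plot is the calibration line: regress observed on predicted values and check that the slope is near 1 and the intercept near 0. The Python sketch below illustrates this under stated assumptions; the function names and the four toy values are hypothetical and are not results from Figure 2.

```python
def calibration_line(predicted, observed):
    """Ordinary least squares fit of observed = intercept + slope * predicted."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return slope, intercept

def mean_prediction_error(predicted, observed):
    """Average of (predicted - observed); negative values mean under-prediction."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical toy values (kg), NOT cohort data.
predicted = [60.0, 70.0, 80.0, 90.0]
observed = [58.0, 70.0, 82.0, 94.0]

slope, intercept = calibration_line(predicted, observed)
bias = mean_prediction_error(predicted, observed)
print(f"slope: {slope:.2f}, intercept: {intercept:.1f} kg, mean error: {bias:.1f} kg")
```

A slope above 1, as in this toy case, would indicate that predictions are too compressed: heavier patients are under-predicted and lighter patients over-predicted.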

• External Generalizability:

The study’s scope is the French population, but some comment about applicability to other settings, especially outside Europe, would be welcome.

Answer: While the study was conducted in France, it is important to note that the Dat'AIDS cohort is not ethnically homogeneous, even though the environmental context (healthcare system, diet) is specific to France. As indicated by the inclusion of 'Geographical origin' and 'Country of birth' in our predictors, a significant proportion of our study population consists of migrants, particularly from Sub-Saharan Africa.

Consequently, the biological and genetic diversity of our sample supports the applicability of these findings to non-European populations. We have added a sentence in the Discussion to explicitly state that this demographic diversity mitigates the limitation of a single-country study.

• Figure/Table Presentation:

Figures and tables are referenced well, but future submissions could improve access to the key visuals (since they are in supplemental content).

Answer: We agree with the suggestion to make key visual data more accessible. Accordingly, we have moved the Variable Importance Plot (previously Supplemental Figure 1) into the main body of the manuscript as Figure 3. This figure illustrates the dominance of baseline weight over other predictors.

However, we have kept the tables in the Supplemental Content to maintain the flow and readability of the main text: Supplemental Table 1 is too extensive (comprehensive list of ICD-10 codes/medications), Supplemental Table 2 is technical (detailed missing data rates per variable), and the remaining tables focus on sensitivity analyses for specific subpopulations.

• Comparison to Published Literature:

Only a few related studies are referenced (notably Motta et al.). Adding further international context may help underscore the universality of the limitations found.

Answer: In the Discussion, we have added references to major international studies, specifically the pooled analysis of randomized trials by Sax et al. and the trajectory modeling from the US CNICS cohort by Bailin et al. These studies, conducted in different healthcare settings, also highlight the heterogeneity of weight trajectories and the difficulty of explaining the variance solely through clinical factors. Citing them reinforces the universality of the limitations we encountered.

Overall Assessment

This is a high-quality, carefully executed study with an honest appraisal of the challenges of applying ML to real-world clinical prediction in HIV. While negative in primary results, the findings are valuable and relevant to the field. The main area for improvement would be a deeper exploration of missing data and model calibration, and a more detailed discussion of the challenges of integrating behavioral variables in future iterations.

Answer: We would like to express our sincere gratitude to the Reviewer for this highly encouraging assessment. We particularly appreciate your recognition of the scientific value of reporting 'negative' results and the challenges of real-world data. As detailed in the responses above, we have enriched the discussion regarding behavioral variable integration, clarified our approach to missing data, and explicitly addressed model calibration. We believe these improvements have significantly strengthened the manuscript.

Reviewer #3: The manuscript is well structured and statistically appropriate. However, there are some issues/questions.

The introduction is weak. There are already studies investigating ML approaches to predicting weight change among PLWH. The manuscript fails to provide a comprehensive literature review of related work and research gaps. Given this, what are the additional contributions of this study to the literature?

Answer: We agree with the Reviewer that the introduction required a more comprehensive review of the existing ML literature to better define the research gap.

We have revised the Introduction to explicitly reference prior works, such as the study by Motta et al., which explored ML in a smaller, specialized cohort. We have clarified that the specific addi

Attachment

Submitted filename: Response to Reviewers.docx

pone.0344570.s008.docx (4.1MB, docx)

Decision Letter 1

Carmen González-Domenech

23 Feb 2026

Machine learning prediction of weight gain after antiretroviral therapy initiation in people with HIV: insights from a large french real-world cohort

PONE-D-25-47116R1

Dear Dr. Cyrielle Codde,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Carmen María González-Domenech, Ph.D.

Academic Editor

PLOS One

Additional Editor Comments (optional):

All the concerns raised by the reviewers have been thoroughly and satisfactorily addressed, including those comments requiring major revision. Therefore, the manuscript is now ready and suitable for publication in PLOS ONE.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

Reviewer #1: Authors have addressed my concerns well. The manuscript presents an important and well-executed analysis of weight gain prediction after antiretroviral therapy initiation using a large French real-world cohort. I recommend that this paper be accepted by PLOS One.

Reviewer #2: The authors responded adequately; my questions are well addressed. No further comments from my side; the paper is good to go.

Reviewer #3: (No Response)

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: Yes: Noland Ding

Reviewer #2: No

Reviewer #3: No

**********

Acceptance letter

Carmen González-Domenech

PONE-D-25-47116R1

PLOS One

Dear Dr. Codde,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Carmen María González-Domenech

Academic Editor

PLOS One

Associated Data


    Supplementary Materials

    S1 Table. Predictors selection. (A) Comorbidities, (B) Co-medications and (C) Antiretroviral treatment.

    INSTI, Integrase strand transfer inhibitor; TAF, Tenofovir alafenamide; NRTI, Nucleos(t)ide reverse transcriptase inhibitor; NNRTI, Non-nucleos(t)ide reverse transcriptase inhibitor; PI, Protease inhibitor.

    (DOCX)

    pone.0344570.s001.docx (36.6KB, docx)
    S2 Table. Hyperparameter settings for the final XGBoost models. The following hyperparameters were tuned using grid search with 10-fold cross-validation via the tidymodels package.

    (DOCX)

    pone.0344570.s002.docx (24.3KB, docx)
    S3 Table. Tables from Dat’AIDS database.

    (DOCX)

    pone.0344570.s003.docx (25.1KB, docx)
    S4 Table. Characteristics of the subpopulations of PLHIV treated in the first line used for weight prediction at 6, 12 and 24 months: maximum variation in weight per month limited to 5 kg (A) and 3 kg (B).

    Values are given as number (%) or ¹average (quartiles). †Not provided: 537 (8.04), ‡Not provided: 410 (7.23), §Not provided: 285 (6.03). $Not specified: 530 (8.10), $$Not specified: 409 (7.31), $$$Not specified: 284 (6.06).

    (DOCX)

    pone.0344570.s004.docx (31.4KB, docx)
    S5 Table. Performance using Weight_T0 = weight at checkpoints. ᵃValue obtained after 10-fold cross-validation.

    The RMSE evaluates the accuracy of the model by measuring the average difference between the actual values and the predictions (the lower it is, the better the performance of the model). The R² measures the quality of fit of the model in relation to the variability of the real data (0: no fit, 1: perfect fit). The relative RMSE expresses the relative error of the mean compared to the actual values. Relative bias measures the systematic error of the model compared to actual values.

    (DOCX)

    pone.0344570.s005.docx (23.8KB, docx)
    S6 Table. Performance of XGBoost models for weight prediction at 6, 12 and 24 months in study subpopulations. (A) Subpopulation limited to 5 kg/month, (B) Subpopulation limited to 3 kg/month.

    ᵃValue obtained after 10-fold cross-validation. The RMSE evaluates the accuracy of the model by measuring the average difference between the actual values and the predictions (the lower it is, the better the performance of the model). The R² measures the quality of fit of the model in relation to the variability of the real data (0: no fit, 1: perfect fit). The relative RMSE expresses the relative error of the mean compared to the actual values. Relative bias measures the systematic error of the model compared to actual values.

    (DOCX)

    pone.0344570.s006.docx (25.1KB, docx)
    Attachment

    Submitted filename: PlosOne.pdf

    pone.0344570.s007.pdf (84.9KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0344570.s008.docx (4.1MB, docx)

    Data Availability Statement

    The data underlying this study are drawn from the French Dat’AIDS cohort. These data cannot be shared publicly due to national data protection regulations (Commission Nationale de l’Informatique et des Libertés, CNIL). Access to Dat’AIDS data may be granted upon reasonable request to the Dat’AIDS scientific committee (president: Laurent Hocqueloux; laurent.hocqueloux@chu-orleans.fr and data protection officer: dpo@dataids.com), subject to compliance with French regulations and institutional agreements.


    Articles from PLOS One are provided here courtesy of PLOS
