PLOS Neglected Tropical Diseases. 2022 May 4;16(5):e0010388. doi: 10.1371/journal.pntd.0010388

Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China

Minjuan Shi 1,#, Jianyan Lin 2,#, Wudi Wei 3,#, Yaqin Qin 2, Sirun Meng 2, Xiaoyu Chen 2, Yueqi Li 3, Rongfeng Chen 3, Zongxiang Yuan 1, Yingmei Qin 2, Jiegang Huang 1, Bingyu Liang 1, Yanyan Liao 3, Li Ye 1,3,*, Hao Liang 1,3,*, Zhiman Xie 2,*, Junjun Jiang 1,3,*
Editor: Roderick Hay 4
PMCID: PMC9067679  PMID: 35507586

Abstract

Objective

Talaromycosis is a serious regional disease endemic in Southeast Asia. In China, Talaromyces marneffei (T. marneffei) infections are mainly concentrated in the southern region, especially in Guangxi, and cause considerable in-hospital mortality in HIV-infected individuals. Currently, the factors that influence in-hospital death of HIV/AIDS patients with T. marneffei infection are not completely clear. Existing machine learning techniques can be used to develop a predictive model that identifies relevant prognostic factors and predicts death, which appears to be essential to reducing in-hospital mortality.

Methods

We prospectively enrolled HIV/AIDS patients with talaromycosis in the Fourth People’s Hospital of Nanning, Guangxi, from January 2012 to June 2019. Clinical features were selected and used to train four different machine learning models (logistic regression, XGBoost, KNN, and SVM) to predict the treatment outcome of hospitalized patients, and 30% of the patients were held out for internal validation to evaluate the performance of the models. Machine learning model performance was assessed according to a range of learning metrics, including the area under the receiver operating characteristic curve (AUC). The SHapley Additive exPlanations (SHAP) tool was used to explain the model.

Results

A total of 1927 HIV/AIDS patients with T. marneffei infection were included. The average in-hospital mortality rate was 13.3% (256/1927) from 2012 to 2019. The most common complications/coinfections were pneumonia (68.9%), followed by oral candida (47.5%), and tuberculosis (40.6%). Deceased patients showed higher CD4/CD8 ratios, aspartate aminotransferase (AST) levels, creatinine levels, urea levels, uric acid (UA) levels, lactate dehydrogenase (LDH) levels, total bilirubin levels, creatine kinase levels, white blood cell (WBC) counts, neutrophil counts, procalcitonin levels and C-reactive protein (CRP) levels, and lower CD3+ T-cell counts, CD8+ T-cell counts, lymphocyte counts, platelet (PLT) counts, high-density lipoprotein cholesterol (HDL) levels, and hemoglobin (Hb) levels than surviving patients. The predictive XGBoost model exhibited 0.71 sensitivity, 0.99 specificity, and 0.97 AUC in the training dataset, and our outcome prediction model provided robust discrimination in the testing dataset, showing an AUC of 0.90 with 0.69 sensitivity and 0.96 specificity. The other three models were ruled out due to poor performance. Septic shock and respiratory failure were the most important predictive features, followed by uric acid, urea, platelets, and the AST/ALT ratio.

Conclusion

The XGBoost machine learning model is a good predictor of the hospitalization outcome of HIV/AIDS patients with T. marneffei infection. The model may have potential application in mortality prediction and high-risk factor identification in the talaromycosis population.

Author summary

Talaromyces marneffei can cause talaromycosis, a fatal deeply disseminated fungal infection. It is widely distributed in Southeast Asia and is spreading globally; the disease is insidious and responsible for a significant number of deaths. Clinicians need easy-to-use tools to decide which patients are at a higher risk of dying after infection with T. marneffei. In this study, conducted in Southern China, we developed an XGBoost machine learning model. Fifteen clinical indicators and laboratory measures were used to estimate a patient’s risk of dying in the hospital due to T. marneffei infection. The study showed that the machine learning model has good predictive ability when tested in an internal testing population of patients. We expect that the model could help clinicians assess a patient’s risk of death at the time of admission and thereby help decide on the timing of early treatment for high-risk patients who are likely to die.

Introduction

Talaromyces marneffei (formerly known as Penicillium marneffei) is a thermally dimorphic fungus. Invading a variety of tissues and organs, it can cause talaromycosis, a fatal deeply disseminated fungal infection that primarily occurs in tropical or subtropical regions of Asia. Since the global outbreak of HIV/AIDS in the 1980s [1], talaromycosis has gradually increased in prevalence, accounting for 6.4–11% of HIV-related admissions in Vietnam [2,3], 3.3% in Thailand [4], 16.1% in Guangxi, China [5], and 17.3% in Guangdong, China [6]. Currently, due to immunosuppressive therapy for autoimmune diseases and malignancies, as well as increased international travel and migration, an increasing number of cases are being reported among HIV-negative patients. Furthermore, cases outside of traditional endemic regions have been reported, such as in Wuhan [7], Beijing [8], Shanghai [9], and Hong Kong [10], China. Due to the inability to make an early diagnosis, the in-hospital mortality of talaromycosis patients can be as high as 16.7–30%, despite antifungal therapy [11–13]. By the end of 2018, the cumulative number of talaromycosis cases was estimated at 288,000 (95% CI: 146,000–613,800), with 87,900 (95% CI: 37,200–204,300) cumulative deaths [14]. Thus, talaromycosis is a tropical infectious disease with high morbidity and mortality and is a serious threat to regional health. Thuy Le, Linghua Li, and other experts call for talaromycosis to be recognized as a neglected tropical disease that urgently needs to be taken seriously despite the perpetuation of the condition by a cycle of poverty, stigma, and global neglect [15].

In China, 40–56.6% of the cases of talaromycosis are reported in Guangxi [11,16]. Guangxi is a province with a high burden of AIDS patients, where the number of cumulative reports ranks second in China. By the end of October 2020, Guangxi had more than 97,000 HIV-infected people, accounting for 9% of all infected people in China. More than 30,000 patients have died of AIDS-related opportunistic infections in Guangxi [17]. Our previous study found that the proportion of HIV/AIDS-related deaths due to talaromycosis increased from 11.5% in 2012 to 16.1% in 2015, and that talaromycosis was the leading cause of in-hospital HIV/AIDS-related death in Guangxi (AHR = 1.8–4.51), which represents a major public health problem [5,18].

Although T. marneffei infection has a high prevalence and in-hospital mortality rate, the risk factors influencing in-hospital death of patients with talaromycosis are still unclear in Guangxi, and relevant studies to guide clinical work are lacking. Although several studies have reported factors influencing the death of hospitalized patients, including occupation, antiviral treatment, and clinical complications, these factors cannot be fully used as clinical prognosis predictors. In addition, the current research still has limitations, such as insufficient sample sizes and confounding factors.

In recent years, the application of artificial intelligence in the medical field has become a hot spot, and various machine learning algorithms have shown their potential to be applied to large-scale biomedical and patient datasets. Moreover, machine learning methods might overcome some of the limitations of current analytical approaches to risk prediction by applying computer algorithms to large datasets with numerous, multidimensional variables, capturing high-dimensional, nonlinear relationships among clinical features to make data-driven death outcome predictions. Machine learning models based on clinical features have been used in many applications in cancer and tumor prognosis prediction, such as in lung cancer and breast cancer [19,20]. The application of death prediction in infectious diseases is also becoming a trend, typically regarding the prediction of mortality risk and prognosis of COVID-19 patients [21–23]. Similarly, the assessment of dengue severity risk factors has been reported [24].

The in-hospital mortality rate of patients with talaromycosis is high, yet there is no machine learning model for predicting T. marneffei treatment outcome. Therefore, we aimed to develop an optimal machine learning-based risk prediction model by fitting routine laboratory measures and clinical indicators. Such a model could guide clinicians in adjusting treatment plans in a timely manner for patients with talaromycosis who present with different symptoms, and may help reduce deaths.

Methods

Ethics statement

This study was approved by the Human Research Ethics Committee of Guangxi Medical University (Ethical Review No. 20210099).

Datasets

To develop the machine learning models, we used a cohort of 1927 hospitalized adult patients (≥ 18 years old) with talaromycosis and gathered information from the hospital’s electronic medical records system. This large-scale observational cohort study was conducted in the Fourth People’s Hospital of Nanning, which is the largest tertiary hospital specializing in infectious diseases in Guangxi and the province’s largest treatment center for HIV/AIDS. The present study included all HIV/AIDS patients admitted to the Fourth People’s Hospital of Nanning from January 2012 to June 2019. HIV/AIDS patients with talaromycosis were identified through the hospital electronic medical records system. For those with multiple admissions, data from the latest admission were preferentially included, and the laboratory data were the results of the first blood test collected at admission, before the patient had started formal treatment. The endpoint of observation was the time of discharge, and observation stopped if the patient died during this period. The inclusion criteria were as follows: (1) HIV infection confirmed by a positive enzyme-linked immunosorbent assay (ELISA) and confirmatory western blotting; (2) T. marneffei infection confirmed by isolation and culture of T. marneffei from blood, skin tissue, bone marrow, lymph nodes, and/or other bodily fluid samples (mycelia at 25°C and yeast-like structures at 37°C), in compliance with the diagnostic criteria. Patients with a complete absence of laboratory results were excluded from the analysis. The study design and grouping are shown in Fig 1.

Fig 1. Workflow for machine learning.

Fig 1

Information such as clinical complications/coinfections and laboratory measures of HIV/AIDS patients with talaromycosis was collected. Different machine-learning methods were evaluated after feature selection to establish the best clinical outcome prediction model.

The sample size was calculated with the following equation: $n = \frac{\left(Z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + Z_{\beta}\sqrt{p_0 q_0 + p_1 q_1}\right)^2}{(p_1 - p_0)^2}$, where $Z_{1-\alpha/2}$ represents the standard normal distribution bound; α was set as 0.05, so $Z_{1-\alpha/2} = 1.96$, and $Z_{\beta} = 1.282$. The number of patients in the exposed group was designed to be equal to that in the control group. According to data previously reported in the literature, the mortality rate of AIDS patients without comorbid T. marneffei was $p_0 = 0.076$, and the mortality rate of AIDS patients with comorbid T. marneffei was $p_1 = 0.175$ [5]; $\bar{p} = (p_0 + p_1)/2$, $q_0 = 1 - p_0$, $q_1 = 1 - p_1$, and $\bar{q} = 1 - \bar{p}$. The sample size was chosen as 233 based on the equation. The numbers of cases in the two groups were 256 and 1671, respectively, which met the sample size requirement. To ensure statistical power, we collected as many samples as we could beyond the minimum sample size requirement; in fact, all the samples we could find were included.
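The sample size arithmetic can be checked with a short script. The following is a minimal sketch that assumes the standard two-proportion formula with equal group sizes and uses the values stated in the text; the function name is illustrative and not taken from the study code.

```python
import math

def two_group_sample_size(p0, p1, z_alpha=1.96, z_beta=1.282):
    """Per-group sample size for comparing two proportions (equal group sizes)."""
    p_bar = (p0 + p1) / 2          # pooled mortality rate
    q_bar = 1 - p_bar
    q0, q1 = 1 - p0, 1 - p1
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p0 * q0 + p1 * q1)) ** 2
    return numerator / (p1 - p0) ** 2

# Values stated in the text: p0 = 0.076, p1 = 0.175, alpha = 0.05, Z_beta = 1.282
n = two_group_sample_size(p0=0.076, p1=0.175)
print(round(n))  # 233, consistent with the sample size reported in the text
```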

Definitions of various complications and coinfections

Fever was defined as a single oral temperature ≥ 38.3°C (armpit temperature ≥ 38.0°C), or an oral temperature ≥ 38.0°C (armpit temperature ≥ 37.7°C) lasting more than 1 hour. The diagnosis of pneumonia includes bacterial pneumonia, viral pneumonia, pulmonary mycosis (including Pneumocystis pneumonia) and pneumonia caused by other factors, but does not include pulmonary tuberculosis, which is classified as tuberculosis [25]. Anemia was defined as a hemoglobin level < 120 g/L in males and < 110 g/L in females [26]. The definition of meningitis includes purulent meningitis, cryptococcal meningitis, and viral meningitis, but does not include tuberculous meningitis, which is classified as tuberculosis [27]. Chronic hepatitis (hepatitis B or hepatitis C) and oral candida infection were confirmed according to the diagnostic criteria in Infectious Diseases [28]. The diagnostic criteria of the remaining complications or coinfections were based on the standard of Internal Medicine [26].

Study outcomes

The patients were classified into two groups according to their outcome at discharge: the good outcome (survival) group and the bad outcome (death) group.

Feature selection and data preprocessing

The structured dataset included 80 variables: 23 clinical complications/coinfections and symptom variables (fever, pneumonia, tuberculosis, lung infection, lymphatic tuberculosis, pneumocystis, oral fungal infections, cryptococcus, herpesvirus, syphilis, cytomegalovirus, electrolyte disturbances, hypoproteinemia, IRIS, bronchitis, hepatitis (B or C), enteritis, dermatitis, hypertension, diabetes, respiratory failure, septic shock, and tumor) and 57 laboratory measures (CD4+ T-cell count, CD8+ T-cell count, and levels of AST, ALT, PLT, Hb, etc.).

Model construction and validation

The patients were randomly split into two datasets: a training cohort (70% of patients), which was used to train the four machine learning models and tune their parameters, and a testing cohort (30% of patients), which was used to test the models and to fine-tune the hyperparameters. We used bootstrapping as an internal validation method, with 2000 trials of random sampling, for four machine learning classifiers (logistic regression, eXtreme Gradient Boosting (XGBoost), K-nearest neighbors (KNN), and support vector machine (SVM)) to generate four models for the prediction of outcome.
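The split and resampling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the study code: the rounding rule for the testing cohort (rounding up), the seeds, and the function names are all assumed. With 1927 patients, this rule yields cohorts of 1348 and 579 patients, the sizes reported in the Results.

```python
import math
import random

def split_train_test(n_patients, test_fraction=0.30, seed=0):
    """Randomly split patient indices into training and testing cohorts."""
    rng = random.Random(seed)
    indices = list(range(n_patients))
    rng.shuffle(indices)
    n_test = math.ceil(n_patients * test_fraction)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]

def bootstrap_samples(indices, n_trials, seed=0):
    """Yield resamples drawn with replacement, each the size of `indices`."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield rng.choices(indices, k=len(indices))

train_idx, test_idx = split_train_test(1927)
print(len(train_idx), len(test_idx))  # 1348 579

# First of the 2000 bootstrap resamples of the training cohort
first_resample = next(bootstrap_samples(train_idx, n_trials=2000))
```

Each bootstrap resample would then be used to refit and score a classifier, so that the spread of a metric across the 2000 trials indicates its stability.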

Performance evaluation

Model performance was assessed according to the sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC) and other learning metrics (F1 score (F1), mean average precision (mAP), and the recall-precision (RP) curve). The best-performing model based on a combination of performance evaluation metrics was used as the final model.
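As an illustration of the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted score than a randomly chosen survivor. The labels and scores are hypothetical toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted death probabilities (1 = death, 0 = survival)
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 14 of the 15 pairs are ordered correctly
```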

Feature importance

For clinical complications/coinfections, the variables with p < 0.05 in Pearson’s chi-square test were selected. The laboratory measures with p < 0.05 in the t-test or Mann–Whitney U test were selected.
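The chi-square screening step can be illustrated with the septic shock counts reported in the Results (86 deaths among 102 patients with shock, 170 deaths among 1825 patients without shock). The sketch below assumes a Pearson chi-square test without continuity correction, which the text does not specify; for one degree of freedom the p-value can be obtained from the complementary error function.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: septic shock yes / no; columns: died / survived
stat, p = chi_square_2x2(86, 102 - 86, 170, 1825 - 170)
print(f"chi-square = {stat:.1f}, p = {p:.3g}")
keep_feature = p < 0.05  # selection rule stated in the text
```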

To determine the major predictors of the study outcome in our patient population, the permutation importance of each feature was measured from the final model. The importance of each feature was quantified by calculating the decrease in the model’s performance after permuting its values; the higher this value was, the more influential the feature. Information gain ranking was also used to evaluate the worth of each variable by measuring the entropy gain with respect to the outcome. According to the information gain ranking criteria for this study, we calculated the feature importance of all the variables.
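Permutation importance as described above can be sketched as follows. The data and the stand-in model are hypothetical: the toy outcome depends only on the first feature, so permuting that feature should lower accuracy, while permuting the second should not. Accuracy is used as the performance measure here as an assumption; the text does not state which measure was permuted against.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: the outcome is fully determined by feature 0
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

def toy_model(row):
    return int(row[0] > 0.5)  # stand-in for a fitted classifier

importances = permutation_importance(toy_model, rows, labels)
print(importances)  # feature 0 has a large drop; feature 1 has none
```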

Statistical analysis

Categorical variables are reported as counts (%), and continuous variables are reported as means (SDs) or medians (IQRs). The presence of a normal distribution was verified by the Kolmogorov–Smirnov test. We used the t-test to assess differences between normally distributed continuous variables, and the Mann–Whitney U test to assess differences for non-normally distributed variables. Categorical variables were analyzed using the chi-square test or Fisher’s exact test. No correction for multiple testing was performed. A two-sided p < 0.05 was considered statistically significant. All analyses were performed with Statistical Package for the Social Sciences (SPSS) version 24.0 (SPSS Inc, Chicago, IL, USA) and Anaconda 3 (Python v 3.8.5).

Results

General characteristics of study participants

In all, 1927 eligible patients with talaromycosis were included in this study between January 2012 and June 2019, and the outcome at the time of hospital discharge was defined as death (n = 256) or survival (n = 1671).

The general characteristics of the patients are summarized in S1 Table. The median age of the 1927 patients with talaromycosis was 43 years (range: 18–86 years). In total, 82.3% (1585/1927) of patients were male, 59.5% (1147/1927) of patients were of Han nationality, 59.5% (1146/1927) of patients were married, 55.1% (1061/1927) of patients were farmers, and the median length of hospital stay was 20 (11–28) days. Significant differences in baseline characteristics were identified between the survival and death groups in nationality, marital status, occupation, and length of hospital stay (p < 0.05).

The mortality of talaromycosis among hospitalized HIV/AIDS patients from 2012 to 2019

Among the 1927 admitted patients, the overall mortality of talaromycosis among hospitalized HIV/AIDS patients was 13.3% (256/1927) from 2012 to 2019. The mortality rates were 18.4% (45/245) in 2012, 14.4% (44/306) in 2013, 12.1% (36/297) in 2014, 12.2% (31/255) in 2015, 10.8% (26/240) in 2016, 15.1% (39/259) in 2017, 9.8% (22/224) in 2018 and 12.9% (13/101) in the first half of 2019. The number of deaths and the overall in-hospital mortality rate showed a downward trend (p = 0.021) (Fig 2).
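The yearly rates above follow directly from the reported counts, which can be recomputed as a consistency check. The sketch below uses only the counts stated in the text; the dictionary layout is illustrative.

```python
# Deaths and admissions per year, as reported in the text
admissions = {"2012": 245, "2013": 306, "2014": 297, "2015": 255,
              "2016": 240, "2017": 259, "2018": 224, "2019 (first half)": 101}
deaths = {"2012": 45, "2013": 44, "2014": 36, "2015": 31,
          "2016": 26, "2017": 39, "2018": 22, "2019 (first half)": 13}

rates = {year: 100 * deaths[year] / admissions[year] for year in admissions}
overall = 100 * sum(deaths.values()) / sum(admissions.values())

for year, rate in rates.items():
    print(f"{year}: {rate:.1f}% ({deaths[year]}/{admissions[year]})")
print(f"Overall: {overall:.1f}% ({sum(deaths.values())}/{sum(admissions.values())})")
```

The yearly counts sum to 256 deaths among 1927 admissions, matching the overall figure of 13.3%.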

Fig 2. The mortality change in HIV/AIDS patients with T. marneffei infection at the Fourth People’s Hospital of Nanning, Guangxi from 2012 to 2019.

Fig 2

Clinical characteristics

The baseline clinical complications/coinfections in the study population are shown in S2 Table. The most common complications/coinfections or symptoms of in-hospital HIV/AIDS patients with talaromycosis were pneumonia (56.7%, 1092/1927), followed by oral candida (47.5%, 915/1927), tuberculosis (40.6%, 737/1927), fever (38.2%, 643/1956) and hypoproteinemia (19.6%, 340/1956). The influence of clinical complications/coinfections on the outcome is shown in Fig 3A (p < 0.05). Septic shock and respiratory failure were the two complications/coinfections associated with the highest proportions of deaths, followed by pulmonary infection, hypoproteinemia, and electrolyte disturbances. The mortality rate of T. marneffei-infected patients with shock was as high as 84% (86/102), compared with 9% (170/1825) for patients without shock; the mortality rate was 69% (55/80) for patients with respiratory failure, compared with 11% (201/1847) for patients without respiratory failure.

Fig 3. Feature engineering for filtering machine learning predictive model variables.

Fig 3

(A) Percentage of deaths of all patients with different clinical complications/coinfections, all variables χ2 p < 0.05. (B) Violin diagram comparing the laboratory measures levels between the two groups, with p < 0.001 in all items. (C) Spearman’s rank correlation coefficient analysis for 39 laboratory measures. (D) Radar plot for the 15 most important predictors of death in the XGBoost model. Abbreviations: CD3, CD3+ T-cell count; CD4/CD8, CD4/CD8 ratio; CD8, CD8+ T-cell count; LDL, low-density lipoprotein cholesterol; Ca, calcium; HDL, high-density lipoprotein cholesterol; CREA, creatinine; AST, aspartate aminotransferase; UA, uric acid; LDH, lactate dehydrogenase; Ccr, endogenous creatinine clearance rate; Glu, glucose; CHOL, total cholesterol; TBIL, total bilirubin; AST/ALT, AST/ALT ratio; BUN/CREA, BUN/CREA ratio; BUN, urea nitrogen; K, potassium; IBIL, indirect bilirubin; P, phosphorus; Cl, chlorine; Na, sodium; STY, osmolarity; HCO3, carbonate; Cys-C, serum cystatin C; AG, anion gap; DBIL, direct bilirubin; TBA, total bile acid; Hb, hemoglobin; PLT, platelet; MONO%, monocyte ratio; RDW-CV, red blood cell distribution width; HCT, hematocrit; EOS, eosinophil; EOS%, eosinophil ratio; PDW, platelet distribution width; PCT, platelet distributing width; CRP, C-reactive protein; hsCRP, high-sensitivity C-reactive protein.

We assessed and compared the median levels of some essential indicators in the two groups of patients. The deceased patients had higher levels of urea, uric acid, phosphorus (P), chlorine (Cl), serum cystatin C (Cys-C), red blood cell distribution width (RDW-CV), and platelet distribution width (PDW), as well as lower CD8+ T-cell counts and lower levels of triglycerides (TG), total cholesterol (CHOL) and platelets. In particular, the level of PLT in surviving patients (131×10^9/L) was more than twice as high as that in deceased patients (64.5×10^9/L), as detailed in Fig 3B. There are also some features of concern, such as elevated levels of aspartate aminotransferase and lactate dehydrogenase and elevated white blood cell counts; the specific comparison is shown in S3 Table.

Features used to build the model

Fig 3C shows the results of the correlation analysis of selected laboratory features. Eight pairs of features were highly correlated (R > 0.8); the features with lower contributions (TBIL, HCT, PCT, STY, Ccr, CD3+ T-cell count, and EOS) in the comparison were excluded.
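The correlation filter can be sketched with a small pure-Python Spearman implementation. The feature values below are hypothetical, and two details are assumptions rather than statements from the text: the threshold is applied to the absolute value of the coefficient, and tied values share their average rank.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def highly_correlated_pairs(features, threshold=0.8):
    names = list(features)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(spearman(features[a], features[b])) > threshold]

# Hypothetical measurements for six patients
features = {
    "TBIL": [8, 12, 15, 30, 45, 60],
    "DBIL": [3, 5, 6, 14, 20, 31],
    "PLT": [210, 64, 131, 180, 95, 150],
}
pairs = highly_correlated_pairs(features)
print(pairs)  # [('TBIL', 'DBIL')]
```

For each flagged pair, the feature with the lower contribution to the model would then be dropped, as described above.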

After excluding the correlated features, the 15 top ranked features are shown in Fig 3D; together they provided approximately 69% of the overall importance weight. As expected, 13 of the top 15 features for in-hospital mortality were laboratory measures; the AST/ALT ratio, septic shock and respiratory failure were important features in the XGBoost model. (The ranking does not show the direction of the relationship between a feature and the final prediction result, but it directly reflects the importance of the feature.)

Discrimination of four machine learning prediction models

The four prediction models constructed based on the top 15 most important variables had different predictive performances. Logistic regression had an AUC of 0.72 in the training cohort and 0.80 in the internal validation cohort. We also tested the KNN model (training/testing: AUC = 1.00/0.85, sen = 1.00/0.60, and spe = 1.00/0.95) and the SVM model (AUC = 0.91/0.70, sen = 0.82/0.47, and spe = 1.00/0.94) to predict patient outcome. The KNN model showed the worst discrimination ability and exhibited overfitting. In contrast, the XGBoost model showed the best discrimination ability; the model yielded an AUC of 0.98 in the training data, with a sensitivity of 0.71 and specificity of 0.99 when using a score of 0.5 as the cutoff value. In the testing set, the sensitivity of the model was 0.69, while its specificity was 0.96, indicating that the model had meaningful predictive ability. The ROC curves of the training data and testing data of the four models are shown in Fig 4A and 4B; the ROC curves of the XGBoost model were the closest to ideal.

Fig 4. Performance evaluation of four machine learning models.

Fig 4

(A-B) Receiver operating characteristic curves of the models. (A) AUCs for death of the training (70%) set. (B) AUCs for death of the testing (30%) set. (C) Confusion matrix for the training set. (D) Confusion matrix for the testing set. (E-F) RP curve for death of the training set (E) and the testing set (F). AUC = area under the receiver operating characteristic curve. Precision = true positive/(true positive + false positive); recall = true positive/(true positive + false negative). (C-D) “0”: “Survival”, “1”: “Death”.

In extremely unbalanced data (where the positive class has fewer samples), PR curves may be more practical than ROC curves. When several models are trained on the same data, if the PR curve of model A completely encloses the PR curve of model B, it can be asserted that A outperforms B. If the curves of A and B cross, a comparison can be made based on the size of the area under the curve. The break-even point, at which precision equals recall, is also commonly used. Precision and recall sometimes appear contradictory, so they need to be considered together, with the most common method being the F-measure (also known as the F-score, F1). Combined with the RP graph (Fig 4E and 4F), F1 combines the precision and recall results; the larger F1 is, the better we can assume the performance of the model. Although the training set F1 values of KNN and SVM are higher than that of XGBoost, these models do not have a good ability to recognize imbalanced data in actual operation. The KNN has only three cases of better output results. Regarding the RP curve’s ability to measure the performance, the comprehensive performance of XGBoost is the strongest among these four models, and its F1 value is greater than 0.70. Considering the performance of both aspects, XGBoost is the better prediction model for this study. The effectiveness of the four models is summarized in Table 1.

Table 1. The effectiveness of the four machine learning predictive models.

Classifiers  Datasets  Accuracy  Error   Sensitivity  Specificity  Precision  F1_score  mAP     MCC     AUC
KNN          Training  1.0000    0.0000  1.0000       1.0000       1.0000     1.0000    1.0000  1.0000  1.0000
KNN          Testing   0.9171    0.0829  0.6029       0.9589       0.6613     0.6308    0.5955  0.5850  0.8514
Logistic     Training  0.9088    0.0912  0.4734       0.9793       0.7876     0.5914    0.6604  0.5659  0.7264
Logistic     Testing   0.9326    0.0674  0.6324       0.9726       0.7544     0.6880    0.7060  0.6538  0.8025
SVM          Training  0.9755    0.0245  0.8245       1.0000       1.0000     0.9038    0.9763  0.8954  0.9122
SVM          Testing   0.8912    0.1088  0.4706       0.9472       0.5424     0.5039    0.5185  0.4446  0.7089
XGBoost      Training  0.9518    0.0482  0.7128       0.9905       0.9241     0.8048    0.9158  0.7864  0.9794
XGBoost      Testing   0.9344    0.0656  0.6912       0.9667       0.7344     0.7121    0.6472  0.6755  0.9008
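The F1 values in Table 1 can be recomputed from the reported precision and sensitivity (recall) as their harmonic mean. The sketch below uses the testing rows of Table 1; because the inputs are rounded to four decimals, the recomputed values may differ from the table in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, sensitivity) from the testing rows of Table 1
testing = {
    "KNN": (0.6613, 0.6029),
    "Logistic": (0.7544, 0.6324),
    "SVM": (0.5424, 0.4706),
    "XGBoost": (0.7344, 0.6912),
}
f1 = {name: f1_score(p, r) for name, (p, r) in testing.items()}
best = max(f1, key=f1.get)

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print("Highest testing F1:", best)  # XGBoost
```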

Explanatory assessment of model stability

To better investigate the predictive significance of the XGBoost model to guide specific practice, we introduced the SHAP value to describe the impact of features on the outcome. For each predicted sample, the model generates a predictive value, and the SHAP value is the value assigned to each feature in that sample, which reflects the size of each feature’s impact and whether that impact is positive or negative. As seen in Fig 5, septic shock and respiratory failure were the two most important features. Both were positively correlated with death and were the features most closely related to it; patients who exhibited both features had a greatly increased risk of death compared with those who did not. Uric acid, urea, RDW-CV, Cys-C, BUN/CREA, PDW, and P also significantly affected death: the higher the values of these features, the higher the risk of death. Conversely, the smaller the values of chlorine, total cholesterol, platelets, and calcium, the higher the risk of death, especially for platelets and total cholesterol. For the AST/ALT ratio, the risk of death tended to increase when the ratio decreased slightly. The contribution of the CD8+ T-cell count to the outcome was predominantly negative, and it was more pronounced when the CD8+ T-cell count was greater than a certain level.
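The idea behind SHAP values can be illustrated by computing exact Shapley values by brute force for a small toy model. This is a sketch of the underlying definition only: the SHAP tool uses more efficient algorithms, and the risk function, its coefficients, and the baseline patient below are hypothetical, not the fitted XGBoost model.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(features):
    """Hypothetical risk score with an interaction between shock and respiratory failure."""
    shock, resp_failure, platelets = features
    return (0.05 + 0.4 * shock + 0.25 * resp_failure
            + 0.1 * shock * resp_failure + 0.001 * (131 - platelets))

x = [1, 1, 64.5]        # hypothetical patient: shock, respiratory failure, low platelets
baseline = [0, 0, 131]  # hypothetical reference patient
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 4) for v in phi])
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the property that lets each feature's effect be read as pushing the predicted risk up or down.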

Fig 5. The effect of 15 top ranked features on the outcome.

Fig 5

Each row represents a feature, and the horizontal coordinate is the SHAP value. Blue means the feature’s contribution is negative, and red means the feature’s contribution is positive. Each point represents a sample; the redder the point, the larger the value of the feature itself, and the bluer the point, the smaller the value of the feature itself.

The impact of these indicators on the prediction results used in the discrimination of patient outcome can also be verified by analyzing the number of misclassified cases. Fig 4C and 4D show the confusion matrix of the model; 1149 out of 1348 patients in the training set were correctly predicted as negative cases (survived), 134 patients were correctly predicted as positive cases (died), and a total of 65 patients were misclassified (ACC = 95%). Of the 579 patients in the test set, 541 patients were correctly predicted, and 38 patients were misclassified (ACC = 93%). Among the misclassified cases in both the training and test sets, those whose actual prognosis was survival but who were misclassified as death (FP) had a higher prevalence of respiratory failure and shock than patients who were correctly judged to be alive (survival). Conversely, those whose actual prognosis was death but who were judged to be alive (FN) had a lower prevalence of both respiratory failure and shock than patients who were correctly judged to be dead (death) (Fig 6A and 6B, p < 0.05).
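The training-set metrics in Table 1 can be recovered from these confusion matrix counts. In the sketch below, the true positive and true negative counts are taken from the text; the split of the 65 misclassified patients into 54 false negatives and 11 false positives is not stated and is inferred here from the reported sensitivity of 0.7128, so it should be read as an assumption.

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Standard classification metrics from confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# tp and tn from the text; fn = 54 and fp = 11 are inferred (assumption)
training = confusion_metrics(tp=134, fn=54, fp=11, tn=1149)
for name, value in training.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the computed values agree with the XGBoost training row of Table 1 to four decimal places.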

Fig 6. Analysis of clinical complications/coinfections and laboratory results of misclassified cases.

Fig 6

(A-B) Percentage of deaths of all patients with different clinical complications/coinfections in the training dataset (A) and testing dataset (B). (C-D) Violin diagram comparing the levels of laboratory measures between the four groups. Survival: correctly classified to be alive; Death: correctly classified to be dead; FN: those whose actual prognosis was death and were classified to be alive; FP: those whose actual prognosis was survival and were misclassified as death.

The presence of a relative abnormality in an index of misclassified patients was also reflected in the laboratory characteristics. For example, higher urea levels and AST/ALT ratios and lower platelet levels were observed in patients classified as FP (compared with those correctly judged as survivors), and the opposite was true for patients classified as FN when these values were compared with those of patients who died (Fig 6C and 6D).

Discussion

We conducted a cohort study with a large sample size and obtained the latest in-hospital mortality rate of T. marneffei infections among HIV/AIDS inpatients in southern China. The number of deaths and the in-hospital mortality rate of talaromycosis patients among HIV/AIDS admissions decreased from 45 and 18.4% in 2012 to 13 and 12.9% in the first half of 2019, respectively. Pneumonia, oral candidiasis, tuberculosis, and hypoproteinemia were common complications/coinfections in HIV/AIDS patients with T. marneffei infection, a finding similar to the results of Pang et al. [29].

In this study, we used data on 1927 HIV/AIDS patients with T. marneffei coinfection at the time of admission to develop and test a machine learning-based prediction model to predict the risk of death during patient hospitalization. Our XGBoost prognostic model exhibited good discrimination for the prediction of death during patient hospitalization. The clinically meaningful cutoff value of 0.5 was bounded by a sensitivity and specificity of approximately 70% for both the training and test sets. There was no substantial decrease in model performance between the training data and test validation, which should allay most concerns about overfitting of the training data. Finally, robust hypothetical trade-offs in the occurrence of mortality events are observed for each patient according to the SHAP value of each feature. Specifically, septic shock and respiratory failure were the most important variables affecting death, and we also considered serum uric acid, urea, platelet, and AST/ALT levels as relatively important variables.

The results of a recent prognostic model developed to predict outcomes in patients with HIV-associated tuberculosis were published [30]. Accurate prediction of patient death after coinfection with HIV/AIDS and T. marneffei still represents an unmet need. Our previous study developed a simple-to-use nomogram for predicting the survival of hospitalized HIV/AIDS patients [31]; however, it did not involve laboratory measures, so it is not an optimally comprehensive evaluation of the specific conditions of patients. Thuy Le developed a prognostic model using Bayesian logistic regression to identify predictors of death [32]. In general, the value of models for prognostic evaluation of T. marneffei infection populations using available data is increasingly recognized as a very economical means to aid clinical practice, but thus far, there is a lack of relatively well-developed studies with large sample sizes and especially of well-performing predictive models. Our XGBoost predictive model offers relatively high accuracy in detecting the risk of in-hospital death in a population in which 28.7% of patients (553/1927) were treated with current standard ART during the study period.

There is growing evidence that respiratory failure, shock, urea levels, and platelet levels significantly affect adverse outcomes such as death. A study in Vietnam found that urea levels were higher in fatal cases of HIV/AIDS complicated by T. marneffei infection than in nonfatal cases. Dyspnea is an independent predictor of in-hospital mortality [2]. Similarly, another article reported that both respiratory difficulty and lower platelet counts predict poor in-hospital outcomes [33]. Infectious shock accounted for 10.2% of the total causes of death among HIV patients with T. marneffei infection at Beijing Ditan Hospital, ranking fourth [34]. Septic shock and respiratory failure are often manifestations of a patient’s progression to cachexia. Patients with both respiratory failure and shock are often clinically classified as high risk, which indicates that their prognosis may be relatively poor; in other words, they are more likely to die. Our study found that both were indicators of poor prognosis.

We ranked the contributions of all the independent variables, and the AST/ALT ratio was the highest in the feature contribution ranking. Patients who died had significantly higher AST/ALT ratios than those who survived (3.07 versus 1.96). A previous study showed an elevated AST/ALT ratio in talaromycosis patients [33]. Two other studies also showed abnormal changes in AST or ALT levels in HIV/AIDS patients with talaromycosis [35,36]. This phenomenon has also been found in other fungal diseases: one study reported that the mean AST/ALT ratio in patients with disseminated histoplasmosis (a fungal disease) was higher than that in patients with localized pulmonary disease or other endemic mycoses [37]. ALT is primarily distributed in the liver, kidneys, heart, and skeletal muscle, while AST is primarily distributed in the heart, liver, skeletal muscle, and kidneys. Given our results, the AST/ALT ratio may be a predictor of death. Nevertheless, because talaromycosis is a disseminated disease, the exact site of the damage, the cause of the AST and ALT changes, and the underlying biological mechanism deserve further research. Similarly, the association of platelets with poor in-hospital outcomes of talaromycosis has been reported previously. The platelet level in the group of deceased patients (64.5 × 10⁹/L) was less than half of that in the survival group (131 × 10⁹/L). The lower the platelet level, the more likely the patient is to bleed and develop coagulation disorders, which is also consistent with the results of the misclassification case analysis. The higher the urea level, the lower the platelet level, and the higher the AST/ALT ratio, the more likely a surviving patient is to be judged as deceased; conversely, patients at high risk of death may be judged to be alive. Therefore, clarifying the significance of these indicators for death is valuable for correctly identifying and predicting the prognosis of patients.
The combination of chloride, calcium, and phosphorus levels reflects the electrolyte status of the body and may indicate electrolyte disturbances in patients at high risk of death. Cys-C levels, BUN/CREA, PDW, and RDW-CV are less clinically significant and may receive less attention, but they are also essential for model prediction. We also note that deceased patients showed higher CD4/CD8 ratios. From the data in S2 Table, the median CD4+ T-cell count was 22 in surviving patients and 21 in deceased patients (p > 0.05), while the median CD8+ T-cell count was 271 and 215, respectively (p < 0.001). This result shows that the higher CD4/CD8 ratio in deceased patients was due mainly to their lower CD8+ T-cell count, which suggests that the CD8+ T-cell count is also important and that focusing on the CD4+ T-cell count alone may not be enough to avoid death. This raises the question of how to reduce deaths among HIV/AIDS patients with T. marneffei infection, which usually involves changing the treatment approach, including the choice of drugs, the timing of treatment, and the recommended drug dosages.
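The CD4/CD8 reasoning above can be checked with simple arithmetic on the quoted medians. The sketch below is only a rough illustration: it divides group medians, which is not the same as the per-patient CD4/CD8 ratio analyzed in the study, and the variable names are assumptions made for this example.

```python
# Group medians as quoted in the paragraph above (cell counts; units as in the paper's tables).
median_cd4 = {"survived": 22, "died": 21}
median_cd8 = {"survived": 271, "died": 215}

# Ratio of medians per group. This approximates, but is not identical to,
# the median of per-patient CD4/CD8 ratios.
ratio_of_medians = {group: median_cd4[group] / median_cd8[group] for group in median_cd4}

# Relative difference of the deceased group versus the surviving group.
cd4_change = median_cd4["died"] / median_cd4["survived"] - 1
cd8_change = median_cd8["died"] / median_cd8["survived"] - 1

for group, ratio in ratio_of_medians.items():
    print(f"{group}: CD4/CD8 ratio of medians = {ratio:.3f}")
print(f"CD4 median difference: {cd4_change:+.1%}")
print(f"CD8 median difference: {cd8_change:+.1%}")
```

Under these assumptions the denominator falls by roughly a fifth while the numerator barely moves, so the ratio rises in the deceased group even though the CD4+ count is essentially unchanged.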

Notably, patients with T. marneffei infection share many clinical symptoms with patients who have other infections, which makes early diagnosis of talaromycosis difficult. Special clinical attention therefore needs to be paid to early diagnosis: the earlier the diagnosis, the more deaths can be prevented. This study attempted to build four machine learning-based prognostic prediction models for HIV/AIDS patients with talaromycosis during hospitalization. Our XGBoost model stems from the exploration of 15 variables that are routinely assessed during the management of hospitalized patients, to identify which factors are more predictive of death in talaromycosis patients. This machine learning prediction model may help clinicians reduce talaromycosis deaths to some extent. We remind clinicians to differentiate talaromycosis from other opportunistic infections, such as tuberculosis, which is clinically and radiologically similar to T. marneffei infection; to manage pneumonia arising from concurrent tuberculosis while treating pneumonia caused by T. marneffei; and then to provide targeted treatment to reduce deaths.

Although this is, to our knowledge, the only study to propose an in-hospital machine learning-based mortality prediction model for HIV/AIDS patients with T. marneffei infection in such a large sample of patients in China, our research should be interpreted in light of some limitations. First, for various objective reasons, we unfortunately did not obtain more comprehensive information, such as the time from onset to diagnosis, the antifungal treatment regimen, the time of fungal culture positivity, the types and number of other comorbidities, the identities and timing of antifungal treatments, delays in diagnosis after admission, the severity of coinfections, and the timing of antiretroviral therapy. Second, our data came from only one hospital, the largest HIV/AIDS treatment center in Guangxi Province, and no external validation dataset was available. The model we built could therefore guide mortality prediction in this hospital, but it was validated only internally, where it maintained a good and stable level of discrimination for the explored outcome. Finally, the data we used are cross-sectional; however, the data can be updated in real time when the model is applied in the clinic, and further efforts are needed to increase the sample size.

In conclusion, we have developed and tested an XGBoost predictive model, a machine learning-based tool to predict the risk of death. This study showed that a machine learning-based approach in this setting is feasible and effective, with potentially significant application in mortality prediction for HIV/AIDS patients with talaromycosis.

Supporting information

S1 Table. General characteristics of 1927 HIV/AIDS patients with T. marneffei infection at the Fourth People’s Hospital of Nanning, Guangxi.

ART, antiretroviral therapy. a Kolmogorov-Smirnov test; b chi-square test; c t-test.

(DOCX)

S2 Table. Effects of different clinical complications/coinfections on the mortality of 1956 HIV/AIDS patients with T. marneffei infection at admission.

IRIS, immune reconstitution inflammatory syndrome.

(DOCX)

S3 Table. Laboratory measures of 1927 HIV/AIDS patients with T. marneffei infection.

(DOCX)

Acknowledgments

We would like to express our gratitude to all the staff of the Fourth People’s Hospital of Nanning in Guangxi, China, for collecting, verifying, and cleaning the data used in this study.

Data Availability

Data cannot be shared publicly because of ethical and legal reasons. Data are available from the Langchao (Nanning) Computer Technology Co., LTD Institutional Data Access / Ethics Committee (Hetai Science Park, No. 9 Gaoxin 4th Road, Nanning City, Guangxi Province; email address: xueying01@inspur.com) for researchers who meet the criteria for access to confidential data.

Funding Statement

The study was supported by the National Natural Science Foundation of China (NSFC; 81971934), Guangxi Bagui Scholar (to JJ), Guangxi Science Fund for Distinguished Young Scholars (2018GXNSFFA281001), Guangxi Medical University Training Program for Distinguished Young Scholars (to JJ), Natural Science Foundation of Guangxi (2021GXNSFBA196004, to WW), Guangxi Key Research and Development Plan (GuikeAB18050022, to HL), and the Nanning Science and Technology Major Project (20193008, to JL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Cao C, Xi L, Chaturvedi V. Talaromycosis (Penicilliosis) Due to Talaromyces (Penicillium) marneffei: Insights into the Clinical Trends of a Major Fungal Disease 60 Years After the Discovery of the Pathogen. Mycopathologia. 2019;184(6):709–20. doi: 10.1007/s11046-019-00410-2
  • 2. Larsson M, Nguyen LH, Wertheim HF, Dao TT, Taylor W, Horby P, et al. Clinical characteristics and outcome of Penicillium marneffei infection among HIV-infected patients in northern Vietnam. AIDS research and therapy. 2012;9(1):24. doi: 10.1186/1742-6405-9-24
  • 3. Qin Y, Huang X, Chen H, Liu X, Li Y, Hou J, et al. Burden of Talaromyces marneffei infection in people living with HIV/AIDS in Asia during ART era: a systematic review and meta-analysis. BMC infectious diseases. 2020;20(1):551. doi: 10.1186/s12879-020-05260-8
  • 4. Chayakulkeeree M, Denning DW. Serious fungal infections in Thailand. European journal of clinical microbiology & infectious diseases: official publication of the European Society of Clinical Microbiology. 2017;36(6):931–5.
  • 5. Jiang J, Meng S, Huang S, Ruan Y, Lu X, Li JZ, et al. Effects of Talaromyces marneffei infection on mortality of HIV/AIDS patients in southern China: a retrospective cohort study. Clinical microbiology and infection: the official publication of the European Society of Clinical Microbiology and Infectious Diseases. 2019;25(2):233–41. doi: 10.1016/j.cmi.2018.04.018
  • 6. Ying RS, Le T, Cai WP, Li YR, Luo CB, Cao Y, et al. Clinical epidemiology and outcome of HIV-associated talaromycosis in Guangdong, China, during 2011–2017. HIV medicine. 2020;21(11):729–38. doi: 10.1111/hiv.13024
  • 7. Hu F, Liu S, Liu Y, Li X, Pang R, Wang F. The decreased number and function of lymphocytes is associated with Penicillium marneffei infection in HIV-negative patients. Journal of microbiology, immunology, and infection = Wei mian yu gan ran za zhi. 2020. doi: 10.1016/j.jmii.2020.02.007
  • 8. Zhang J, Zhang D, Du J, Zhou Y, Cai Y, Sun R, et al. Rapid diagnosis of Talaromyces marneffei infection assisted by metagenomic next-generation sequencing in a HIV-negative patient. IDCases. 2021;23:e01055. doi: 10.1016/j.idcr.2021.e01055
  • 9. Zhu YM, Ai JW, Xu B, Cui P, Cheng Q, Wu H, et al. Rapid and precise diagnosis of disseminated T.marneffei infection assisted by high-throughput sequencing of multifarious specimens in a HIV-negative patient: a case report. BMC infectious diseases. 2018;18(1):379. doi: 10.1186/s12879-018-3276-5
  • 10. Chan JF, Lau SK, Yuen KY, Woo PC. Talaromyces (Penicillium) marneffei infection in non-HIV-infected patients. Emerging microbes & infections. 2016;5(3):e19. doi: 10.1038/emi.2016.18
  • 11. Hu Y, Zhang J, Li X, Yang Y, Zhang Y, Ma J, et al. Penicillium marneffei infection: an emerging disease in mainland China. Mycopathologia. 2013;175(1–2):57–67. doi: 10.1007/s11046-012-9577-0
  • 12. Chen J, Zhang R, Shen Y, Liu L, Qi T, Wang Z, et al. Clinical Characteristics and Prognosis of Penicilliosis Among Human Immunodeficiency Virus-Infected Patients in Eastern China. The American journal of tropical medicine and hygiene. 2017;96(6):1350–4. doi: 10.4269/ajtmh.16-0521
  • 13. Son VT, Khue PM, Strobel M. Penicilliosis and AIDS in Haiphong, Vietnam: evolution and predictive factors of death. Medecine et maladies infectieuses. 2014;44(11–12):495–501. doi: 10.1016/j.medmal.2014.09.008
  • 14. Ning C, W W, B X, NT T3, Ye L. The Global Distribution, Drivers, and Burden of Talaromycosis 1964–2017. DuKeHeath. 2020.
  • 15. Narayanasamy S, Dat VQ, Thanh NT, Ly VT, Chan JF, Yuen KY, et al. A global call for talaromycosis to be recognised as a neglected tropical disease. The Lancet Global health. 2021;9(11):e1618–e22. doi: 10.1016/S2214-109X(21)00350-8
  • 16. Zhou LH, Jiang YK, Li RY, Huang LP, Yip CW, Denning DW, et al. Risk-Based Estimate of Human Fungal Disease Burden, China. Emerg Infect Dis. 2020;26(9):2137–47. doi: 10.3201/eid2609.200016
  • 17. Xianmin D, Wenmin Y, Qiuying Z, Xiuling W, Zhiyong S. Epidemiological characteristic of HIV/AIDS in Guangxi Zhuang Autonomous Region, 2010–2017. Chin J Epidemiol. 2019;40(3).
  • 18. Han J, Lun WH, Meng ZH, Huang K, Mao Y, Zhu W, et al. Mucocutaneous manifestations of HIV-infected patients in the era of HAART in Guangxi Zhuang Autonomous Region, China. Journal of the European Academy of Dermatology and Venereology: JEADV. 2013;27(3):376–82. doi: 10.1111/j.1468-3083.2011.04429.x
  • 19. Liu G, Xu Z, Zhang Y, Jiang B, Zhang L, Wang L, et al. Machine-Learning-Derived Nomogram Based on 3D Radiomic Features and Clinical Factors Predicts Progression-Free Survival in Lung Adenocarcinoma. Frontiers in oncology. 2021;11:692329. doi: 10.3389/fonc.2021.692329
  • 20. Chen Z, Wang M, De Wilde RL, Feng R, Su M, Torres-de la Roche LA, et al. A Machine Learning Model to Predict the Triple Negative Breast Cancer Immune Subtype. Frontiers in immunology. 2021;12:749459. doi: 10.3389/fimmu.2021.749459
  • 21. Campbell TW, Wilson MP, Roder H, MaWhinney S, Georgantas RW 3rd, Maguire LK, et al. Predicting prognosis in COVID-19 patients using machine learning and readily available clinical data. International journal of medical informatics. 2021;155:104594. doi: 10.1016/j.ijmedinf.2021.104594
  • 22. Hu C, Liu Z, Jiang Y, Shi O, Zhang X, Xu K, et al. Early prediction of mortality risk among patients with severe COVID-19, using machine learning. International journal of epidemiology. 2021;49(6):1918–29. doi: 10.1093/ije/dyaa171
  • 23. Li M, Zhang Z, Cao W, Liu Y, Du B, Chen C, et al. Identifying novel factors associated with COVID-19 transmission and fatality using the machine learning approach. The Science of the total environment. 2021;764:142810. doi: 10.1016/j.scitotenv.2020.142810
  • 24. Huang SW, Tsai HP, Hung SJ, Ko WC, Wang JR. Assessing the risk of dengue severity using demographic information and laboratory test results with machine learning. PLoS neglected tropical diseases. 2020;14(12):e0008960. doi: 10.1371/journal.pntd.0008960
  • 25. Y M, L Z, Y P. Tuberculosis. Beijing: People’s Medical Publishing House; 2006.
  • 26. J G, Y X. Internal medicine. 8th ed. Beijing: People’s Medical Publishing House; 2013.
  • 27. J J, S C. Neurology. 7th ed. Beijing: People’s Medical Publishing House; 2013.
  • 28. L L, H R. Infectious diseases. 8th ed. Beijing: People’s Medical Publishing House; 2013.
  • 29. Pang W, Shang P, Li Q, Xu J, Bi L, Zhong J, et al. Prevalence of Opportunistic Infections and Causes of Death among Hospitalized HIV-Infected Patients in Sichuan, China. The Tohoku journal of experimental medicine. 2018;244(3):231–42. doi: 10.1620/tjem.244.231
  • 30. Hanifa Y, Fielding KL, Chihota VN, Adonis L, Charalambous S, Foster N, et al. A clinical scoring system to prioritise investigation for tuberculosis among adults attending HIV clinics in South Africa. PloS one. 2017;12(8):e0181519. doi: 10.1371/journal.pone.0181519
  • 31. Yuan Z, Zhou B, Meng S, Jiang J, Huang S, Lu X, et al. Development and external-validation of a nomogram for predicting the survival of hospitalised HIV/AIDS patients based on a large study cohort in western China. Epidemiology and infection. 2020;148:e84. doi: 10.1017/S0950268820000758
  • 32. Klus J, Ly VT, Chan C, Le T. Prognosis and treatment effects of HIV-associated talaromycosis in a real-world patient cohort. Medical mycology. 2021;59(4):392–9. doi: 10.1093/mmy/myab005
  • 33. Le T, Wolbers M, Chi NH, Quang VM, Chinh NT, Lan NP, et al. Epidemiology, seasonality, and predictors of outcome of AIDS-associated Penicillium marneffei infection in Ho Chi Minh City, Viet Nam. Clinical infectious diseases: an official publication of the Infectious Diseases Society of America. 2011;52(7):945–52.
  • 34. Xiao J, Du S, Tian Y, Su W, Yang D, Zhao H. Causes of Death Among Patients Infected with HIV at a Tertiary Care Hospital in China: An Observational Cohort Study. AIDS research and human retroviruses. 2016;32(8):782–90. doi: 10.1089/AID.2015.0271
  • 35. Yousukh A, Jutavijittum P, Pisetpongsa P, Chitapanarux T, Thongsawat S, Senba M, et al. Clinicopathologic study of hepatic Penicillium marneffei in Northern Thailand. Archives of pathology & laboratory medicine. 2004;128(2):191–4. doi: 10.5858/2004-128-191-CSOHPM
  • 36. Kawila R, Chaiwarith R, Supparatpinyo K. Clinical and laboratory characteristics of penicilliosis marneffei among patients with and without HIV infection in Northern Thailand: a retrospective study. BMC infectious diseases. 2013;13:464. doi: 10.1186/1471-2334-13-464
  • 37. Spec A, Barrios CR, Ahmad U, Proia LA. AST to ALT Ratio is elevated in disseminated histoplasmosis as compared to localized pulmonary disease and other endemic mycoses. Medical mycology. 2017;55(5):541–5. doi: 10.1093/mmy/myw106
PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010388.r001

Decision Letter 0

Roderick Hay, Ahmed Fahal

11 Jan 2022

Dear Author

Thank you very much for submitting your manuscript "Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

The reviewers have identified some issues that require clarification

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roderick Hay

Guest Editor

PLOS Neglected Tropical Diseases

Ahmed Fahal

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: The objectives of the study were clearly articulated with a clear testable hypothesis stated.

The study design was basically appropriate to address the stated objectives

The population was clearly described and appropriate for the hypothesis being tested.

No sample size calculation was provided to ensure adequate power to address the hypothesis being tested.

The statistical analyses used to support the conclusions were correct in general.

The ethical requirements were met.

However, there are quite a few points that need to be considered and clarified in this article.

1. Many factors are potentially associated with the prognosis of AIDS-related talaromycosis; however, apart from the laboratory indexes, many important indicators were not included in this article, such as the time from onset to diagnosis, the antifungal regimen, the time of fungal culture turning positive, and the types and number of comorbidities.

2. The article listed many laboratory indicators but ignored the significance of, and the mutual relationships among, these indicators. For example, ALT and AST represent liver function and would be better combined for analysis.

3. The observation end point and time, and the diagnostic criteria for talaromycosis and other OIs, were not explained clearly in this article.

Reviewer #2: The aim is clearly stated and a large cohort of patients has been used to address the objectives. There is a fairly well described cohort, although there are issues as to when the individual data were obtained in relation to each patient's illness. The study design (use of machine learning) is appropriate although more details of each method of machine learning would be useful. The statistical approach seems reasonable.

There are no ethical or regulatory issues.

Reviewer #3: The study needs more data to support its hypothesis.

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: The analysis presented matches the analysis plan, and the results were clearly and completely presented. However, some figures and tables could be merged and simplified to present the results more efficiently.

Reviewer #2: The authors analysed 4 different types of machine learning after having selected the 15 most important variables. There was an emphasis on laboratory variables, possibly because they were easier to obtain from the patient records. This might have introduced some bias away from clinical features, which are often poorly recorded compared to lab data. The results section is slightly difficult to follow as there seemed to be confusion between complications of Talaromyces (e.g. anaemia) and co-infections (e.g. tuberculosis). There was also inclusion of pneumonia as a variable without discussing the causes, one of which could be talaromyces. Failure to recognise that some of the co-infections, rather than Talaromyces, could have been the cause of death in some patients weakens the data set.

Table S2 is a problem: it states that the number of cases is 1956, rather than the 1927 stated elsewhere. Also, the numbers with and without fever (643 and 940, respectively) do not add up to either 1927 or 1956.

Reviewer #3: No for all these questions

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: In general, the conclusions were supported by the data presented. The limitations of the analysis were not clearly described. The authors have discussed how these data can be helpful to advance our understanding of the topic under study and addressed the relevant public health issue. For some contradictory results, for instance “deceased patients showed higher levels of CD4/CD8 ratio”, the discussion lacked a reasonable explanation.

Reviewer #2: The conclusions appear to be supported by the data, although the machine learning methodology is not clear to me. Some of the limitations are acknowledged but a major flaw seems to be a paucity of clinical data, such as what antifungals were used and when, any delays in diagnosis after admission, the severity of co-infections, the timing of antiretroviral therapy etc. There was over-reliance on lab parameters without stating when, in the stage of illness, the blood tests were obtained.

The authors could be clearer as to how the results can be applied and whether the clinical approach to patients with Talaromyces should be altered as a consequence.

Reviewer #3: None

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: 1. The case numbers in the article were not consistent, for example between Tables S1 and S2.

2. The English needs to be polished.

Reviewer #2: Clearly Table S2 needs to be modified. Some of the use of English needs editorial input.

Reviewer #3: (No Response)

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: This is an interesting study on in-hospital mortality predictors of AIDS-related talaromycosis using machine learning. In general, the study has very good originality and the statistical methods are appropriately applied. However, there are quite a few points that need to be considered and clarified in this article.

Reviewer #2: Talaromyces is an important complication of HIV infection, particularly in Asia. Mortality rates remain high, often because of late presentation or lack of appropriate therapy. Assessing ways to improve outcomes is, therefore, important both for the individual patient and for public health reasons. Using machine learning to process data to derive prediction models is increasingly being investigated, but the quality of the output depends critically on the data input. This study conflates some variables that are due to the fungal infection with other co-infections that can also be lethal. For example, TB and Talaromyces can look similar in HIV, both clinically and radiologically, and pneumonia is a common cause of death in HIV.

If such methods are used to predict mortality, there needs to be some view from the authors as to how such predictions can be used to modify diagnosis and treatment. It is well known that septic shock and respiratory failure are associated with an increased risk of death, and, as both are obvious clinically, the authors need to explain how the machine learning tool adds value. Clinicians need to be able to identify at-risk patients much earlier, before shock or respiratory failure, to be able to intervene.

Reviewer #3: This study attempted to build ML-based prognostic prediction models for HIV/AIDS patients with talaromycosis during hospitalization, and the results may have positive significance for reducing deaths. However, there are some weaknesses that require further data before acceptance. I have some comments as detailed below:

1. The clinical complication and symptom variables described in the study included “pneumonia, lung infection”, which might be the same concept. As the respiratory tract is considered the initial site of T. marneffei infection, pneumonia or lung infection could be caused by T. marneffei. If so, it should not be considered a clinical complication. This is the first question that should be clarified.

2. Thirty-nine laboratory variables were evaluated in the study, but it is unclear when the data were extracted. The parameters change over the course of the disease, some for the better and some for the worse. It would be better to assess the outcomes based on changes in the parameters rather than on the results at a certain time point.

3. The authors mention that the AST/ALT ratio in patients who died is higher than that in the surviving patients, and suggest that an elevated ALT level indicates the possibility of liver damage and that the higher rise in the AST level may indicate that the myocardium is involved. However, elevated AST and ALT could also be caused by side effects of drugs, especially antifungal agents.

4. The sentence in lines 330-331, p27, reads: “The number and in-hospital mortality of talaromycosis among HIV admissions increased from 45 and 18.4% in 2012 to 13 and 12.9% in the first half of 2019, respectively”. Did the mortality increase or decrease over these years? The statement seems to contradict the data.

There are too many spelling and grammatical errors, It requires professional p

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example, see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010388.r003

Decision Letter 1

Roderick Hay, Ahmed Fahal

22 Mar 2022

Dear Mr. Jiang,

Thank you very much for submitting your manuscript "Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript.

Please note while forming your response that, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roderick Hay

Guest Editor

PLOS Neglected Tropical Diseases

Ahmed Fahal

Deputy Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #3: The authors mentioned that "the elevated AST level may be a manifestation of myocardial damage", which needs other biomarkers to support the hypothesis, such as other myocardial enzymes and PNP.

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #3: Elevated AST alone cannot imply myocardial damage.

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #3: The presented data do not completely support the conclusions.

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #3: The manuscript needs further revision.

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #3: The revised version is better than the original version; however, it still needs further revision.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

References

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010388.r005

Decision Letter 2

Roderick Hay, Ahmed Fahal

2 Apr 2022

Dear Mr. Jiang,

We are pleased to inform you that your manuscript 'Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Roderick Hay

Guest Editor

PLOS Neglected Tropical Diseases

Ahmed Fahal

Deputy Editor

PLOS Neglected Tropical Diseases

***********************************************************

Prior to printing, talaromycosis should not be in italics, as it is the name of a disease, not an organism.

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010388.r006

Acceptance letter

Roderick Hay, Ahmed Fahal

21 Apr 2022

Dear Mr. Jiang,

We are delighted to inform you that your manuscript, "Machine learning-based in-hospital mortality prediction of HIV/AIDS patients with Talaromyces marneffei infection in Guangxi, China," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article on to the PLOS Production Department, who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Shaden Kamhawi

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Paul Brindley

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Associated Data

    Supplementary Materials

    S1 Table. General characteristics of 1927 HIV/AIDS patients with T. marneffei infection at the Fourth People’s Hospital of Nanning, Guangxi.

    ART, antiretroviral therapy; a Kolmogorov-Smirnov test, b Chi-square test, c t-test.

    (DOCX)

    S2 Table. Effects of different clinical complications/coinfections on the mortality of 1956 HIV/AIDS patients with T. marneffei infection at admission.

    IRIS, immune reconstitution inflammatory syndrome.

    (DOCX)

    S3 Table. Laboratory measures of 1927 HIV/AIDS patients with T. marneffei infection.

    (DOCX)

    Attachment

    Submitted filename: Response to reviewers.docx

    Attachment

    Submitted filename: Response to reviewers.docx

    Data Availability Statement

    Data cannot be shared publicly because of ethical and legal reasons. Data are available from the Langchao (Nanning) Computer Technology Co., LTD Institutional Data Access / Ethics Committee (Hetai science park, No. 9 gaoxin 4th Road, Nanning City, Guangxi Province, email address: xueying01@inspur.com) for researchers who meet the criteria for access to confidential data.


    Articles from PLoS Neglected Tropical Diseases are provided here courtesy of PLOS
