Abstract
Background
Scrub typhus (ST) is a life-threatening infectious disease caused by Orientia tsutsugamushi. Early prediction of whether the disease will progress to a severe state is crucial for clinicians to provide targeted medical care in advance.
Methods
This study retrospectively collected severe and mild ST cases in two hospitals in Fujian Province, China from 2011 to 2022. Eighteen objective clinical and laboratory features collected at admission were screened using various feature selection algorithms, and used to construct models based on six machine learning algorithms.
Results
The model based on the Gradient Boosting Decision Tree using 14 features screened by Recursive Feature Elimination was evaluated as the optimal one. The model showed high accuracy, precision, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve of 0.975, 0.967, 0.983, 0.966, 0.975, and 0.981, respectively, indicating its potential for clinical application. Additionally, a simplified model based on the Support Vector Machine was constructed and evaluated as an alternative optimal model.
Conclusions
This study is the first to use machine learning algorithms to accurately predict the disease development of ST patients upon admission to hospitals. The models can help clinicians assess the potential risks of their patients early on, thereby improving patient outcomes.
Keywords: Scrub typhus, machine learning model, prediction of severity, Gradient Boosting Decision Tree, forecast
1. Introduction
Scrub typhus (ST), caused by Orientia tsutsugamushi, is a life-threatening disease prevalent in the well-known ‘tsutsugamushi triangle’ in Asia, where over one billion people are at risk of exposure. In addition, cases in East Africa and South America have been reported in recent years, indicating an expanding range and a potential threat worldwide [1–6]. Nevertheless, ST is described by the World Health Organization (WHO) as one of the most underdiagnosed and underreported febrile illnesses.
The symptoms of ST vary widely among patients, ranging from mild to severe multiple organ failure, and even death. Therefore, it is crucial to predict whether the disease will progress to a severe stage in the early stages of admission. This can help clinicians identify patients at risk of disease progression, morbidity, and mortality, and provide targeted medical care in advance.
Elderly patients are generally considered to be at higher risk of severe complications, with a higher rate of severe disease and a median case fatality rate across reported case series of up to 29.4% [7,8]. Other predictors of ST severity have also been reported in multiple studies. Sharma et al. analyzed 92 patients and found that leukocytosis, hyperbilirubinemia, transaminitis, hypoalbuminemia, and uremia were significantly associated with morbidity and mortality [9]. Chang et al. found that older age, dyspnea, less relative bradycardia, less febrile illness, prolonged prothrombin time, higher initial C-reactive protein (CRP) levels, higher blood leukocyte counts, and lower platelet counts were significantly associated with severe complications of ST [10]; in multivariate analysis, prolonged prothrombin time was considered an important predictor of severe complications. Khemka conducted a cross-sectional observational study in children and found, by multivariate analysis, that the infant age group, altered sensorium, reduced urine output, thrombocytopenia, elevated inflammatory markers, coagulopathy, hypoalbuminemia, and hyponatremia were predictors of ST-induced multiple organ dysfunction syndrome (MODS) [11]. Guan et al. separately analyzed pediatric and elderly patients and found that peripheral edema and decreased hemoglobin were the most important predictors of severe illness in pediatric patients, while dyspnea and increased total bilirubin were potential determinants of severe disease in elderly patients [12].
However, few studies have established models that can strongly forecast the development (severity or mildness) of ST based on symptoms at admission to hospitals. Sriwongpan et al. established a simple clinical risk-scoring algorithm using predictors including age, pulse rate, crepitation, serum aspartate aminotransferase, serum albumin, and serum creatinine [13]. However, the exact severity classification was obtained only in 68.3% of cases. After optimization, the algorithm achieved a sensitivity of 75.9% and a specificity of 77.5% and correctly predicted 76.77% of severe cases [14].
The advent of machine learning models has shown promise in improving predictive ability in severity prediction of various diseases [15]. For instance, in COVID-19 risk stratification, multiple models have been reported to use various types of data, such as demographic information, disease history, laboratory results, and clinical symptoms, to predict the likelihood of a patient developing severe COVID-19 [16–18]. These machine learning-based models offer advantages in that they account for high-order, non-linear interactions between predictors and gain more stable prediction [15]. However, to the best of our knowledge, the prediction of ST prognosis at admission to hospitals based on machine learning algorithms has not been reported.
In this study, we analyzed the clinical features and laboratory indicators of mild and severe ST cases admitted to two hospitals in Fujian Province, China, between 2011 and 2022. We identified the risk factors associated with severe symptoms and constructed 18 machine learning-based models to predict disease development. After screening, we identified the optimal model, which demonstrated strong predictive ability for ST development at admission to hospitals.
2. Materials and methods
2.1. Ethics statement
The study was approved by the Ethics Committee of Huadong Research Institute for Medicine and Biotechniques (approval number: 20230026; approval date: 17 February 2023), which also waived the need for consent to participate. Case information was deidentified, and participant confidentiality was maintained throughout the research process. All procedures involving human participants were conducted in accordance with the ethical standards of the institutional and national research committees and with the ethical principles of the 1964 Declaration of Helsinki and its later amendments.
2.2. Case and data collection
Retrospective data analysis was conducted at two hospitals (The 900th and 910th hospital of PLA) in Fujian Province, China, from 2011 to 2022. All ST patients diagnosed based on the National Scrub Typhus Control and Prevention Guideline (2009) issued by the Chinese Center for Disease Control and Prevention were included. All labs adhered to standardized guidelines, and longitudinal data consistency was ensured through regular calibration and quality control. Patients were classified into mild and severe ST groups as previously described with a slight modification [12]. Briefly, ST patients with multiple organ dysfunction syndrome (MODS), shock, requiring intensive care unit (ICU) admission during their hospitalization, and death cases were classified into the severe group. The other patients were defined as mild ST patients.
Data regarding patients’ demographic information (sex and age), disease-exposure histories, clinical signs and symptoms (including skin, respiratory, gastrointestinal, hemorrhagic, and other non-specific manifestations), image findings, and laboratory test results (white blood cell count (WBC), neutrophil count (N), eosinophil count (EO), hemoglobin concentration (HGB), platelet count (PLT), alanine transaminase concentration (ALT), aspartate amino transferase concentration (AST), gamma-glutamyl transferase concentration (GGT), total bilirubin concentration (TBIL), direct bilirubin concentration (DBIL), lactate dehydrogenase concentration (LDH), creatinine concentration (CREA), blood urea nitrogen concentration (BUN), alkaline phosphatase concentration (ALP), and C reactive protein concentration (CRP)) within 24 h after admission were obtained from the medical records. Cases with missing features exceeding 20% were removed.
2.3. Preliminary analysis of the data
The collected variables were statistically analyzed to preliminarily determine the clinical differences between the two groups. Continuous variables were expressed as median (IQR, interquartile range), and categorical variables were summarized as frequencies or proportions. The statistical differences of these variables between the mild and severe groups were analyzed using the Wilcoxon two-sample rank sum test for continuous variables and the chi-square test or Fisher’s exact test for categorical variables, as appropriate. A two-sided p value of <0.05 was considered statistically significant. All statistical analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, North Carolina).
2.4. Prediction model construction
We developed a forecasting system based on 18 variables, including 3 features (gender, age, and body temperature) typically collected at the pre-test triage stage and 15 laboratory indicators; these were chosen because they are objective measurements rather than subjective judgments. The forecasting system comprised a series of steps, as shown in Figure 1, including imputing missing values, balancing the dataset, selecting features, applying classifiers, optimizing hyperparameters, and applying the SHapley Additive exPlanation (SHAP) method. We used PYTHON 3.7 software (https://www.python.org/) to build the models.
Figure 1.
The flowchart of the present study. MAE: mean absolute error; RMSE: root mean square error; LR: Logistic Regression; SVM: Support Vector Machine; RF: Random Forest; GBDT: Gradient Boosting Decision Tree; XGBoost: Extreme Gradient Boosting; lightGBM: Light Gradient Boosting Machine; FCBF: Fast Correlation-based Filter; RFE: Recursive Feature Elimination; LASSO: Least Absolute Shrinkage and Selection Operator; SMOTE: Synthetic Minority Over-sampling Technique; AUC: area under the receiver operating characteristic curve.
2.4.1. Imputing missing values
Retrospective datasets often contain abnormalities such as duplicate, unavailable, missing, and erroneous records, which can affect the accuracy and stability of subsequent model building. Missing data is a common problem, and samples with missing values are typically removed when enough samples remain. However, given the small number of samples in our study, we instead used five methods to fill the missing data: mean, median, K-Nearest Neighbors (KNN), random forest (RF), and IterativeImputer. It is worth noting that the IterativeImputer method imputes a sample's missing values according to the overall high-dimensional distribution of the data; its theoretical basis comes from multiple imputation by chained equations (MICE).
We determined the best imputing method as follows. Firstly, we identified all the features with missing values and selected all the samples with complete features. We then proportionally created missing values in the selected samples according to the number of missing values of a feature in all the samples. Finally, we used the five methods to fill the created missing values and compared the filled values with the original true values. We calculated the mean absolute error (MAE) and root mean square error (RMSE) for each method to evaluate the accuracy of the imputation. The method with the smallest MAE and RMSE was selected as the best one. To ensure the reliability of the test results, we repeated the test process for 50 times and calculated the average performance indicators to reduce the variability of the results.
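The benchmark described above can be sketched with scikit-learn. This is a toy illustration, not the study's code: the data, missingness rate, and matrix size are hypothetical, it shows a single round rather than the 50 repetitions, and the study's RF-based imputer could be approximated by passing a forest regressor as the IterativeImputer estimator.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

rng = np.random.default_rng(42)

# Toy stand-in for the complete-case matrix (rows: patients, columns: lab indicators).
X_true = rng.normal(loc=50.0, scale=10.0, size=(100, 5))

# Artificially mask ~10% of entries so imputed values can be checked against truth.
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "KNN": KNNImputer(n_neighbors=5),
    "Iterative": IterativeImputer(random_state=0),  # MICE-style chained equations
}

scores = {}
for name, imputer in imputers.items():
    X_filled = imputer.fit_transform(X_missing)
    err = X_filled[mask] - X_true[mask]
    scores[name] = (np.mean(np.abs(err)),        # MAE
                    np.sqrt(np.mean(err ** 2)))  # RMSE
```

The method with the smallest MAE and RMSE in `scores` would then be chosen, averaged over repeated rounds as described above.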
2.4.2. Dataset balancing
Due to the imbalanced distribution of the mild and severe ST patients in the dataset, the machine learning-based model may be biased towards the majority class. Therefore, we used the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset. SMOTE is a widely used method based on the KNN algorithm that generates synthetic samples for the minority class by interpolating between existing samples. Therefore, SMOTE is suitable for continuous variables, and can effectively improve the performance of the model in predicting the minority class. As for the binary attribute gender, we used SMOTENC, which handles nominal attributes differently from continuous attributes and preserves the original labels of the categorical features in the resampled data.
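The study used the SMOTE/SMOTENC implementations (available in the imbalanced-learn package). For illustration only, the core interpolation idea for continuous features can be sketched in plain NumPy; the function name, data, and class sizes below are hypothetical stand-ins.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic sample interpolates
    between a minority sample and one of its k nearest minority-class
    neighbours (continuous features only)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)  # distances to sample i
        neighbours = np.argsort(dist)[1:k + 1]           # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                               # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# e.g. oversampling 29 minority (severe) cases up to 197 majority (mild) cases
X_severe = np.random.default_rng(1).normal(size=(29, 14))
X_new = smote_sketch(X_severe, n_new=197 - 29)
```

Because every synthetic point lies on a segment between two real minority samples, each feature stays within the observed minority range; SMOTENC additionally keeps categorical attributes such as gender at their original labels rather than interpolating them.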
2.4.3. Feature selection
Feature selection is a critical step in machine learning model training as it reduces the dimensionality of the dataset and improves the efficiency and accuracy of the model. There are several feature selection methods, including filter, wrapper, and embedded methods. In this study, we used three algorithms for feature selection, including Fast Correlation-based Filter Feature (FCBF), Recursive Feature Elimination (RFE), and Least Absolute Shrinkage and Selection Operator (LASSO).
FCBF is a feature selection algorithm that considers the correlation between features and labels, as well as the correlation between features themselves. It aims to remove redundant and irrelevant features to improve the performance of machine learning models. The algorithm uses symmetric uncertainty to represent the correlation/redundancy between features and labels and features themselves, making it suitable for handling high-dimensional data.
RFE is a recursive feature selection algorithm that builds a model on all features to rank the features based on their importance. It then removes the unimportant features until the desired number of features is obtained. This algorithm is particularly useful for identifying the most important features for the model.
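The RFE procedure can be sketched with scikit-learn's `RFE` wrapper; the synthetic dataset and the choice of logistic regression as the ranking estimator are illustrative assumptions, with only the sample count (226) and target subset size (14) mirroring this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 18-feature admission dataset (226 cases).
X, y = make_classification(n_samples=226, n_features=18, n_informative=8,
                           random_state=0)

# Recursively drop the least important feature until 14 remain,
# matching the size of the RFE-selected subset reported here.
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=14, step=1)
selector.fit(X, y)
kept = np.flatnonzero(selector.support_)  # indices of retained features
```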
LASSO is a classic linear regression variant algorithm that introduces an L1 norm regularization term in the objective function. This regularization term constrains the model parameters to 0 or around 0, effectively removing the unimportant features. LASSO is widely used in feature selection for its ability to handle high-dimensional data and its interpretability. The objective function of the LASSO algorithm can be expressed as:
$$\min_{w}\ \frac{1}{2n}\left\| y - Xw \right\|_2^2 + \alpha \left\| w \right\|_1 \quad (1)$$
X represents the feature matrix, y represents the label, w represents the model parameters, and α is the regularization coefficient. Furthermore, the objective function of LASSO can be transformed into:
$$\min_{w}\ \left\| y - Xw \right\|_2^2 \quad \text{subject to} \quad \left\| w \right\|_1 \le t \quad (2)$$

where $t$ is a sparsity budget corresponding to the regularization coefficient $\alpha$.
From Equation (2), it can be seen that the LASSO algorithm compresses the weights of some features to zero or near zero, meaning that LASSO can simplify the model by removing or suppressing irrelevant or redundant features while retaining features with large weights. Common solvers for LASSO are coordinate descent and least-angle regression.
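The L1-driven sparsity of Equation (1) can be seen directly with scikit-learn's `Lasso`; the synthetic data and the `alpha` value below are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=226, n_features=18, n_informative=8,
                           random_state=0)
X_std = StandardScaler().fit_transform(X)  # L1 penalties need comparable scales

# A larger alpha gives a stronger L1 penalty, driving more weights to exactly 0.
lasso = Lasso(alpha=0.05).fit(X_std, y)
kept = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)     # surviving features
dropped = np.flatnonzero(np.abs(lasso.coef_) <= 1e-8) # zeroed-out features
```

Sweeping `alpha` trades off sparsity against fit, which is how a subset such as the 15 LASSO-selected features here arises.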
2.4.4. Classifiers
After feature selection, three new datasets were generated, each divided into training and testing sets at a 7:3 ratio using stratified sampling to keep the class ratio consistent between the two sets. The training set was used for model training, and the testing set was used to evaluate the performance of each model. We used six classifiers to construct the prediction models: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (lightGBM).
LR is an improved version of the linear regression algorithm that uses the fitting results of linear regression and the Sigmoid function to estimate the probability of the sample belonging to a certain class. It then compares the probability with a threshold for classification.
SVM aims to find an optimal classification hyperplane in the feature space that maximizes the margin between positive and negative samples and minimizes the classification error rate. In tasks where linear separability is not possible, this algorithm usually uses kernel functions to map features to a high-dimensional space and then seeks the optimal classification hyperplane to complete the nonlinear classification.
RF is an ensemble learning algorithm that combines multiple decision trees to form a strong classifier. Each decision tree is built using a random subset of the training data. The final prediction is made by combining the predictions of all the decision trees. It has high stability and robustness due to the random selection mechanism.
GBDT is another ensemble learning algorithm that combines multiple decision trees. However, unlike RF, GBDT builds decision trees in a sequential manner, with each new tree learning from the errors of the previous trees. This approach improves the accuracy of the model by reducing the bias.
XGBoost is an optimized implementation of GBDT that uses a gradient boosting algorithm and parallel processing to improve the efficiency and accuracy of the model [19]. It is widely used in machine learning competitions and is known for its high performance and flexibility.
LightGBM is another optimized implementation of GBDT that uses a histogram-based approach to reduce the memory usage and improve the speed of the algorithm. It is particularly useful for handling large-scale datasets.
Overall, we used a range of classifiers to predict the development of ST and selected the best-performing one based on the evaluation indicators.
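The split-and-train pipeline can be sketched as follows with scikit-learn; the toy dataset is an assumption standing in for the balanced 14-feature data, and only the four classifiers shipped with scikit-learn are shown (XGBoost and lightGBM live in their own packages).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Balanced toy dataset standing in for the SMOTE-balanced 14-feature data.
X, y = make_classification(n_samples=394, n_features=14, random_state=0)

# 7:3 split; stratify=y keeps the class ratio identical in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    # XGBoost and lightGBM come from the separate xgboost / lightgbm packages
}

accuracy = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
            for name, clf in classifiers.items()}
```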
2.4.5. Hyperparameter optimization
The performance of a machine learning-based model largely depends on the selection of hyperparameters, which are manually set during the model training process, such as learning rate, regularization parameter, etc. In this study, we used Bayesian optimization (BO) for hyperparameter selection. BO is a method for optimizing black-box functions, and its main idea is to use Bayesian inference to build a surrogate model that models the posterior probability of the objective function. This surrogate model is used to select the next parameter configuration to be evaluated, so as to find the optimal solution in a limited number of evaluations.
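The BO loop described above can be sketched with a Gaussian-process surrogate and an expected-improvement acquisition function. This is a minimal illustration on an assumed one-dimensional toy objective standing in for a validation error; in practice, libraries such as scikit-optimize or hyperopt package this loop for multi-dimensional hyperparameter spaces.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy stand-in for cross-validated error as a function of one hyperparameter.
    return (x - 0.3) ** 2 + 0.05 * np.sin(15 * x)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
X = rng.uniform(0.0, 1.0, 3).reshape(-1, 1)  # initial random evaluations
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(10):
    gp.fit(X, y)  # posterior surrogate over the objective
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y.min() - mu) / sigma
    # Expected improvement balances exploiting low predicted error vs. exploring
    # uncertain regions; the maximizer is the next configuration to evaluate.
    ei = (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

x_best = X[np.argmin(y), 0]  # best configuration found in 13 evaluations
```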
2.4.6. Performance evaluation
We evaluated the performance of the established models using the testing datasets. Various indicators, including sensitivity (also known as recall), precision, specificity, accuracy, and F1 score were calculated. Firstly, we designated severe cases as positive and mild cases as negative. True positive (TP) cases represented severe cases that were correctly predicted to be positive by the model, while true negative (TN) cases represented mild cases that were correctly predicted to be negative. On the other hand, false positive (FP) cases represented mild cases that were incorrectly predicted to be positive, and false negative (FN) cases represented severe cases that were incorrectly predicted to be negative. We calculated these indicators according to Equations (3)–(7).
In addition, we drew receiver operating characteristics (ROC) curves and calculated the area under the curve (AUC) for each model. The model with higher values of these indicators was considered better.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (3)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (5)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6)$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \quad (7)$$
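As a worked check of Equations (3)–(7), the indicators can be computed from confusion-matrix counts; the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Illustrative confusion-matrix counts (hypothetical, not the study's data):
TP, FN, FP, TN = 57, 1, 2, 58

sensitivity = TP / (TP + FN)                 # Equation (3), a.k.a. recall
precision = TP / (TP + FP)                   # Equation (4)
specificity = TN / (TN + FP)                 # Equation (5)
accuracy = (TP + TN) / (TP + TN + FP + FN)   # Equation (6)
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Equation (7)
```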
2.4.7. SHapley Additive exPlanation
To interpret the results of the optimal model, we used the SHAP method to calculate the contribution of each feature to the prediction results. The SHAP values represent the impact of each feature on the model output and can help identify potential biomarkers for ST development prediction.
3. Results
3.1. Demographic and epidemiological findings
A total of 236 cases of ST were collected from two hospitals between 2011 and 2022, and after removing samples with missing features exceeding 20%, 226 cases remained, including 197 mild and 29 severe cases (including 3 deaths). The number of cases peaked in 2015 and has been declining since then (Supplementary Figure 1A). Few cases were collected from 2020 to 2022, possibly due to the impact of the COVID-19 pandemic. The monthly incidence peaked from May to October, with the lowest monthly incidence occurring in March (Supplementary Figure 1B). The majority of cases were male (60.6%, 137/226).
The median age of severe patients was significantly higher than that of mild patients (66 (IQR, 53–70) vs. 51 (IQR, 27–60), p < 0.001, Supplementary Table 1). The detection rate by the Weil-Felix test was significantly higher in the severe group than in the mild group (p < 0.001). No gender trend was observed between the two groups (p > 0.05).
3.2. Clinical manifestations
In terms of clinical manifestations, the frequencies of symptoms differed between the two groups. Severe cases had significantly higher rates of chest distress (p < 0.05), cough (p < 0.001), expectoration (p < 0.001), anorexia (p < 0.01), pleural effusion (p < 0.001), ascites (p < 0.05), and signs of lung or bronchial injury on imaging examination (p < 0.001), but a significantly lower rate of headache (p < 0.001) (Supplementary Table 1). The proportion of patients with fever was slightly higher in the severe group than in the mild group, with a p value just below the significance threshold (p = 0.044, Supplementary Table 1).
3.3. Laboratory indicators
A total of 15 laboratory indicators were obtained, and most showed significant differences (p < 0.05) between the severe and mild groups, except four indicators: WBC count, HGB, N count, and ALT (p > 0.05, Supplementary Table 1). The severe group had significantly higher EO count (p < 0.05), AST (p < 0.01), GGT (p < 0.01), TBIL (p < 0.05), DBIL (p < 0.01), LDH (p < 0.01), ALP (p < 0.05), CREA (p < 0.001), BUN (p < 0.001), and CRP (p < 0.01) than the mild group, and a significantly lower PLT count (p < 0.001) (Supplementary Table 1).
3.4. Model construction and screening
After processing the data using the five methods for filling in missing data, IterativeImputer was selected as the final method based on its minimum MAE and RMSE (Figure 2). After dataset balancing, the features determined by each of the three feature selection algorithms were used for model construction with each of the six classifiers, resulting in 18 models. The performances of these models are shown in Figure 3 and Supplementary Table 2.
Figure 2.
Evaluation of 5 missing value filling algorithms using MAE (A) and RMSE (B). RF: random Forest; KNN: K-Nearest Neighbors; MAE: mean absolute error; RMSE: root mean square error.
Figure 3.
Prediction performances of the constructed models based on 6 algorithms using the features selected by FCBF, RFE, or LASSO. LR: Logistic Regression; SVM: Support Vector Machine; RF: Random Forest; GBDT: Gradient Boosting Decision Tree; XGBoost: Extreme Gradient Boosting; lightGBM: Light Gradient Boosting Machine; FCBF: Fast Correlation-based Filter; RFE: Recursive Feature Elimination; LASSO: Least Absolute Shrinkage and Selection Operator.
The feature selection algorithms selected different features. FCBF selected 16 of the 18 features (excluding gender and body temperature), RFE selected 14 of the 18 features (excluding gender, WBC, HGB, and TBIL), and LASSO selected 15 of the 18 features (excluding WBC, TBIL, and DBIL). In model construction, RFE performed best, with the highest average indicators (Figure 3 and Supplementary Table 2) and the fewest features.
All classifiers achieved high accuracy, precision, sensitivity, specificity, and F1-score values, ranging from 0.855 to 1, with most exceeding 0.9 (Figure 3 and Supplementary Table 2). Models based on different classifiers but the same feature selection algorithm performed differently. In brief, among models using FCBF-selected features, GBDT and lightGBM had the highest accuracy and F1 score, while RF had the highest precision and specificity. Among models using RFE-selected features, GBDT had the highest values for all five indicators. Among models using LASSO-selected features, SVM performed best. Overall, the model using GBDT on RFE-selected features (GBDT-RFE) was considered the best, with the highest accuracy, precision, specificity, and F1 score, and the third-highest sensitivity among all 18 models. In the ROC analysis, all six models using the 14 RFE-selected features had high AUC values, ranging from 0.942 to 0.995 (Figure 4), with the lightGBM and SVM algorithms performing best at an AUC of 0.995. The GBDT-based model ranked third with an AUC of 0.981, which is still acceptable.
Figure 4.
Receiver operating characteristics (ROC) curves and Area under curve (AUC) of models based on 6 algorithms using features selected by recursive feature elimination (RFE). LR: Logistic regression; SVM: Support Vector Machine; RF: Random Forest; GBDT: Gradient Boosting Decision Tree; XGBoost: Extreme Gradient Boosting; lightGBM: Light Gradient Boosting Machine.
3.5. Feature contributions
To identify the most important features for predicting the development of ST in the optimal GBDT-RFE model, we used the SHAP method to calculate the SHAP values of each feature. The top six most important features were BUN, age, EO, PLT, N, and CREA, with all these features contributing positively to the model except for PLT, which contributed negatively (Figure 5). These results suggest that these features may play a crucial role in predicting the development of ST and could be potential biomarkers.
Figure 5.
Contributions of each feature to the optimal model based on the Gradient Boosting Decision Tree (GBDT) algorithm using Recursive Feature Elimination (RFE)-selected features. The contribution degree (A) and its direction (positive or negative) (B) of each feature were evaluated by SHapley Additive exPlanation (SHAP). N: neutrophil count; EO: eosinophil count; PLT: platelet count; ALT: alanine transaminase concentration; AST: aspartate amino transferase concentration; GGT: gamma-glutamyl transferase concentration; DBIL: direct bilirubin concentration; LDH: lactate dehydrogenase concentration; CREA: creatinine concentration; BUN: blood urea nitrogen concentration; ALP: alkaline phosphatase concentration; CRP: C reactive protein concentration.
3.6. Model simplification
Simplified models based on the six classifiers using the top six most important features were further constructed and evaluated (Supplementary Table 3 and Supplementary Figures 2–8). The indicator values (sensitivity, accuracy, precision, specificity, and F1 score) decreased in most of the simplified models, except for the SVM-based model, in which they increased, making it the optimal simplified model. However, this simplified model remained inferior to the original optimal model, with lower indicator values and AUC.
4. Discussion
Based on retrospective data from two hospitals, we applied 18 machine learning models to predict the development of ST at hospital admission. Compared to the previously reported risk-scoring algorithm [13], which achieved an accuracy of 68.3% for severity classification, these models demonstrated superior performance. The established GBDT-RFE model achieved high sensitivity (98.3%), specificity (96.6%), and AUC (0.981), highlighting its enhanced discriminative capacity. This improvement likely stems from the ability of machine learning to capture non-linear interactions among predictors. Furthermore, compared to logistic regression-based models for COVID-19 severity prediction (e.g. Zhao et al. [17], AUC: 0.74–0.83), our GBDT-RFE framework leverages ensemble learning to reduce bias and improve generalizability, as demonstrated by its higher AUC (0.981). The SHAP-based interpretability also addresses a key limitation of ‘black-box’ models, providing clinicians with actionable insights into critical biomarkers. Our automated feature selection (RFE) and hyperparameter optimization (Bayesian methods) mitigate the constraints of linear assumptions and manual feature weighting in traditional risk-scoring systems, ensuring robust performance even with imbalanced data.
The models had high values of sensitivity, specificity, accuracy, precision, F1 score, and AUC. The GBDT algorithm model using RFE selected features was determined as the optimal model, although the difference in prediction effects between the models was small. This is the first study to comprehensively examine the utility of machine learning models for predicting the development of ST patients at hospital admission.
Data quality is a critical factor that affects the effectiveness of prediction models. Although we collected data covering approximately 11 years, the number of cases was not large, some data were missing, and there were many more mild cases than severe ones. Despite these imperfections, we established a complete modeling solution with the essential steps. We introduced and screened various methods and algorithms to resolve the problems of missing data and unbalanced datasets during the data preprocessing stage. We obtained over 40 features of ST patients but, to reduce the influence of subjective human factors, chose 18 objective features for model construction. It is worth noting that coagulation markers (PT/APTT), previously reported predictors of ST severity [10,20], were excluded due to incomplete data. Future prospective studies should prioritize the collection of these parameters to refine predictive accuracy. To eliminate the adverse effects caused by dataset imbalance, we processed continuous and categorical variables using different algorithms. Our solution provides a valuable reference for future research in this area.
Different models based on various variables and algorithms can produce different prediction effects, making it essential to screen for the optimal model. In our study, the optimal model based on 14 features identified BUN, age, EO, PLT, N, and CREA as the top six most important features that physicians should pay attention to when admitting ST cases. Most of them showed significant differences between the two groups with p < 0.001. We further evaluated the performance of machine learning-based models using these six features and obtained a simplified optimal model based on SVM algorithm, which greatly reduced computational requirements while producing acceptable prediction effects. Both the original and simplified models have their advantages and can be used selectively in different scenarios.
Our study demonstrates the potential of machine learning in predicting scrub typhus severity; however, certain limitations must be acknowledged. While we employed stratified training-test splits (7:3 ratio) and rigorous anti-overfitting strategies—including SMOTE for class balancing, LASSO regularization, and Bayesian hyperparameter optimization—the relatively small sample size (n = 226) and retrospective design inherently constrain generalizability. Although our model achieved high internal performance (AUC: 0.981), the absence of an external test set precludes definitive conclusions about its applicability to diverse populations or emerging ST strains, a limitation shared by retrospective prognostic models [13,17]. Clinical application of this model requires validation in prospective external cohorts. Future studies should prioritize external validation in multicenter, prospective cohorts to confirm the model’s reliability across diverse populations. Additionally, while cross-validation was not originally included due to sample size constraints, its application in larger datasets could further mitigate overfitting risks.
Furthermore, while this study demonstrates the potential of machine learning in predicting severe scrub typhus with promising accuracy, the inherent complexity of models can pose significant challenges for their adoption by primary care physicians (PCPs), particularly in resource-limited or time-constrained environments. Such models often require specialized software, computational resources, and expertise for deployment and interpretation, which may not be readily accessible or practical for bedside decision-making in primary care. Simpler, well-established approaches such as logistic regression analysis, potentially translated into user-friendly formats like a risk score or a nomogram, might offer greater feasibility and immediate utility for PCPs in stratifying scrub typhus patients. Therefore, building upon the feature selection capabilities of machine learning demonstrated in this study, our planned future work focuses on developing precisely such a pragmatic clinical prediction tool. We intend to study the potential of utilizing the key predictors identified through machine learning classification analysis and subsequently refining and validating them using traditional logistic regression. This hybrid approach aims to leverage the feature discovery power of machine learning while retaining the interpretability and deployability of logistic regression. The resulting risk scoring system or nomogram will be designed specifically for ease-of-use at the primary care level.
5. Conclusions
In conclusion, our study demonstrated the effectiveness of machine learning algorithms in accurately distinguishing severe from mild ST at hospital admission. The GBDT-RFE model and the simplified SVM-based model were identified as the optimal models, with high sensitivity, specificity, accuracy, precision, F1 score, and AUC in predicting the development of ST. These models can help physicians evaluate patients' potential risks early after admission, improving patient outcomes. The identified important features also provide valuable insights into potential biomarkers for predicting ST progression. Despite imperfect data quality, we established a complete modeling workflow with the steps needed to address missing data and unbalanced datasets, providing a valuable reference for future research in this area. Further validation and clinical studies are necessary to confirm the effectiveness and reliability of our models.
Supplementary Material
Funding Statement
This study was funded by the Natural Science Foundation of Jiangsu Province (Grant No. BK20241748), the National Key Research and Development Program of China (No. 2022YFC2305001), and a Project of the 900th Hospital of Joint Logistics Support Force of PLA (No. 2022ZL10).
Ethical approval
The use of human data was approved by the Ethics Committee of Huadong Research Institute for Medicine and Biotechniques.
Disclosure statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data availability statement
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
- 1. Weitzel T, Dittrich S, López J, et al. Endemic scrub typhus in South America. N Engl J Med. 2016;375(10):954–961. doi: 10.1056/NEJMoa1603657.
- 2. Maina AN, Farris CM, Odhiambo A, et al. Q fever, scrub typhus, and rickettsial diseases in children, Kenya, 2011–2012. Emerg Infect Dis. 2016;22(5):883–886. doi: 10.3201/eid2205.150953.
- 3. Kocher C, Jiang J, Morrison AC, et al. Serologic evidence of scrub typhus in the Peruvian Amazon. Emerg Infect Dis. 2017;23(8):1389–1391. doi: 10.3201/eid2308.170050.
- 4. Thiga JW, Mutai BK, Eyako WK, et al. High seroprevalence of antibodies against spotted fever and scrub typhus bacteria in patients with febrile illness, Kenya. Emerg Infect Dis. 2015;21(4):688–691. doi: 10.3201/eid2104.141387.
- 5. Weitzel T, Abarca K, Martínez-Valdebenito C, et al. Scrub typhus in continental Chile, 2016–2018. Emerg Infect Dis. 2019;25(6):1214–1217. doi: 10.3201/eid2506.181860.
- 6. Soong L, Dong X. Emerging and re-emerging zoonoses are major and global challenges for public health. Zoonoses. 2021;1:1–2. doi: 10.15212/ZOONOSES-2021-00011.
- 7. Taylor AJ, Paris DH, Newton PN. A systematic review of mortality from untreated scrub typhus (Orientia tsutsugamushi). PLOS Negl Trop Dis. 2015;9(8):e0003971. doi: 10.1371/journal.pntd.0003971.
- 8. Jang M-O, Kim JE, Kim UJ, et al. Differences in the clinical presentation and the frequency of complications between elderly and non-elderly scrub typhus patients. Arch Gerontol Geriatr. 2014;58(2):196–200. doi: 10.1016/j.archger.2013.10.011.
- 9. Sharma R, Mahajan SK, Singh B, et al. Predictors of severity in scrub typhus. J Assoc Physicians India. 2019;67(4):35–38.
- 10. Chang K, Lee N-Y, Ko W-C, et al. Characteristics of scrub typhus, murine typhus, and Q fever among elderly patients: prolonged prothrombin time as a predictor for severity. J Microbiol Immunol Infect. 2019;52(1):54–61. doi: 10.1016/j.jmii.2016.08.023.
- 11. Khemka A, Sarkar M, Basu A, et al. Predictors of severity of scrub typhus in children requiring pediatric intensive care admission. J Pediatr Intensive Care. 2022;11(3):247–253. doi: 10.1055/s-0041-1723947.
- 12. Guan X-G, Wei Y-H, Jiang B-G, et al. Clinical characteristics and risk factors for severe scrub typhus in pediatric and elderly patients. PLOS Negl Trop Dis. 2022;16(4):e0010357. doi: 10.1371/journal.pntd.0010357.
- 13. Sriwongpan P, Krittigamas P, Tantipong H, et al. Clinical risk-scoring algorithm to forecast scrub typhus severity. Risk Manag Healthc Policy. 2013;7:11–17. doi: 10.2147/RMHP.S55305.
- 14. Gulati S, Chunduru K, Madiyal M, et al. Validation of a clinical risk-scoring algorithm for scrub typhus severity in South India. Indian J Crit Care Med. 2021;25(5):551–556. doi: 10.5005/jp-journals-10071-23828.
- 15. Raita Y, Goto T, Faridi MK, et al. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care. 2019;23(1):64. doi: 10.1186/s13054-019-2351-7.
- 16. Raman G, Ashraf B, Demir YK, et al. Machine learning prediction for COVID-19 disease severity at hospital admission. BMC Med Inform Decis Mak. 2023;23(1):46. doi: 10.1186/s12911-023-02132-4.
- 17. Zhao Z, Chen A, Hou W, et al. Prediction model and risk scores of ICU admission and mortality in COVID-19. PLOS One. 2020;15(7):e0236618. doi: 10.1371/journal.pone.0236618.
- 18. Lee NCJ, Demir YK, Ashraf B, et al. Immature platelet fraction as a biomarker for disease severity in pediatric respiratory coronavirus disease 2019. J Pediatr. 2022;251:187–189. doi: 10.1016/j.jpeds.2022.07.035.
- 19. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM; 2016.
- 20. Sul H, Yun NR, Kim D-M, et al. Development of a scoring system to differentiate severe fever with thrombocytopenia syndrome from scrub typhus. Viruses. 2022;14(5):1093. doi: 10.3390/v14051093.