Abstract
Background
Lumbar drainage is widely used in the clinic; however, forecasting lumbar drainage-related meningitis (LDRM) is limited. We aimed to establish prediction models using supervised machine learning (ML) algorithms.
Methods
We utilized a cohort of 273 eligible lumbar drainage cases. Data were preprocessed and split into training and testing sets. Optimal hyper-parameters were archived by 10-fold cross-validation and grid search. The support vector machine (SVM), random forest (RF), and artificial neural network (ANN) were adopted for model training. The area under the operating characteristic curve (AUROC) and precision-recall curve (AUPRC), true positive ratio (TPR), true negative ratio (TNR), specificity, sensitivity, accuracy, and kappa coefficient were used for model evaluation. All trained models were internally validated. The importance of features was also analyzed.
Results
In the training set, all the models had AUROC exceeding 0.8. SVM and the RF models had an AUPRC of more than 0.6, but the ANN model had an unexpectedly low AUPRC (0.380). The RF and ANN models revealed similar TPR, whereas the ANN model had a higher TNR and demonstrated better specificity, sensitivity, accuracy, and kappa efficiency. In the testing set, most performance indicators of established models decreased. However, the RF and AVM models maintained adequate AUROC (0.828 vs. 0.719) and AUPRC (0.413 vs. 0.520), and the RF model also had better TPR, specificity, sensitivity, accuracy, and kappa efficiency. Site leakage showed the most considerable mean decrease in accuracy.
Conclusions
The RF and SVM models could predict LDRM, in which the RF model owned the best performance, and site leakage was the most meaningful predictor.
Keywords: lumbar drainage, meningitis, machine learning, prediction model, infectious diseases
Highlights
- The supervised machine learning algorithm has value in developing the LDRM model.
- The RF and SVM models had acceptable performance.
- Site leakage was the most meaningful predictor.
- Our proposed model may serve as a decision-making tool in the clinic.
Introduction
Lumbar drainage (LD) is a temporary technique for neurologically disease patients with the purpose of therapeutically or prophylactically releasing cerebrospinal fluid (CSF) and modulating CSF pressure (1, 2). This technique is conducted in isolation from the environment through a closed medical instrument with the catheter tip placed into the lumbar cistern. As a routine operation, it is widely used in clinical practice and manipulated by multiple specialists, including neurosurgeons, neurocritical care physicians, interventional radiologists, anesthesiologists, and otolaryngologists (3). The common LD indications consist of intraventricular or subarachnoid hemorrhage, CSF leakage, communicating hydrocephalus, and drainage of CSF during operation to increase anatomical exposure (4–8).
Previous studies reveal that LD can bring obvious clinical benefits to patients, such as reducing angiographic and symptomatic vasospasm, preventing permanent shunt dependency, accelerating CSF leakage healing, and predicting the outcome of hydrocephalus shunt (9–12). In addition, LD may be a safe and effective method to lower intracranial pressure in traumatic brain injury patients with refractory intracranial hypertension (13). However, the risks from LD are frequent and need to be noticed, although LD placement is often viewed as benign (2). One of the most common complications is lumbar drainage-related meningitis (LDRM), which has diverse incidence in different reports, usually as 3–20%, and a few may be as high as 40% (14). This complication can prolong the hospital stay, increase medical expenses, and even lead to catastrophic outcomes (15, 16).
Several risk factors have been identified to promote the occurrence of LDRM. These factors include duration days, drain opening, site leakage, admission to intensive care unit (ICU), diabetes, and accompanied craniotomy (15–18). In our earlier study, we proposed a prediction model by screening the risk factors of LDRM and established a nomogram as a simple tool to estimate the infection risk (19). However, this model was built based on a traditional logistic regression method, which is challenging to fit the actual distribution of data and deal with the problem of collinearity.
Novel supervised machine learning (ML) algorithms have become widely accepted in recent decades, and have emerged as a popular method of clinical infection research (20). The algorithms can build complex non-linear models that associate the independent features with dependent corresponds in large data sets, with high efficiency and accuracy (21). In this work, we used three frequent ML algorithms, the support vector machine (SVM), random forest (RF), and artificial neural network (ANN), to build prediction models of LDRM. We also evaluated the model performance and conducted internal validation to assess possible clinical application value.
Methods
Program Environment
The data preprocessing and model development in this research were implemented within the environment of R (4.1.2).
Study Population
We used a cohort of 273 eligible cases, as described in our previous report (19). All the enrolled patients received LD treatment during a research period from January 2012 to December 2018 in the Chengdu Fifth People's Hospital (Chengdu, China). The original clinical features were extracted from the hospital electronic medical records, including sex, age, admission diagnosis, admission to surgery intensive care unit (SICU), initial Glasgow coma scale score, blood CSF, malignancy, immunosuppression, diabetes, duration days, site leakage of CSF, accompanied craniotomy, and antibiotic treatment for other types of infection (before or after LD initiation). LDRM was identified as the response variable, in which meningitis was the positive response, and the contrary situation (without meningitis) was the negative response. After univariate analysis, we only collected statistically significant features related to the occurrence of LDRM to form the original set (Figure 1A). All procedures in this study were following the ethical standards of the institutional ethical committee of the Chengdu Fifth People's Hospital (ref. no. 2019–074), and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Figure 1.
Schematic diagrams of data processing and model development. (A) The data from electronic medical records (EMR) were selected by univariate analysis to form an original set, which was divided into training and test sets by stratified sampling. (B) Models were developed through a procedure with four steps. (C) Three machine learning algorithms, the support vector machine (SVM), random forest (RF), and artificial neural network (ANN), were used for model training.
Data Pre-processing
All data had normalization (range “0–1”) to eliminate the dimensional influence between features and make different predictors comparable. We conducted stratified sampling according to the response, taking 70% of the original set as the training set and 30% as the testing set (Figure 1A). Compared with random sampling, stratified sampling can help achieve a consistent distribution of response in training and testing sets. Features were modified to categorical variables using as.factor function if they were automatically identified as numeric in R.
Model Development
We selected three supervised ML algorithms, SVM, RF, and ANN, to construct binary LDRM prediction models following a procedure with four steps (Figure 1B).
(1) Hyper-parameter tuning. We adopted a 10-fold cross-validation and grid search to achieve the best parameters with minimal classification error for each algorithm. The tune function in R implemented this process.
(2) Model training. We trained models via three ML algorithms using the determined hyper-parameters. The SVM, RF, and ANN algorithms were proceeded by R packages of e1071 (22), randomForest (23), and nnet (24), respectively (Figure 1C). We also analyzed the importance of features in the RF model.
(3) Model evaluation. We drew the receiver operating characteristic curve (ROC) and precision-recall curve (PRC) and calculated the area under the two curves (AUROC and AUPRC) to evaluate the performance of different models. We also constructed the confusion matrix and calculated other performance indicators for evaluation, such as true positive ratio (TPR), true negative ratio (TNR), specificity, sensitivity, accuracy, and kappa coefficient.
(4) Model validation. We verified the trained models in the testing set for internal validation to determine whether the models were generalizable. Similarly, we computed performance indicators of the model in the testing set as the method in step three.
Results
The original set of this study enrolled 273 LD patients, including 37 (13.6%) cases with meningitis and 236 (86.4%) without meningitis. The demographic information and univariate analysis of the data set are demonstrated in our previous report (19). Five features (admission to SICU, diabetes, duration days, site leakage, and associated craniotomy) had significant differences between infected and non-infected cases. See reference 19 for further details. We adopted stratified random sampling to divide the original set into training and testing sets. The former has 192 patients (70.3%) and the latter has 81 (29.7%). The distribution of LD patients with and without meningitis in different data sets is shown in Table 1.
Table 1.
Distribution of lumbar drainage patients with and without meningitis in different data sets.
Data Sets | Meningitis | Non-meningitis | Total |
---|---|---|---|
Original set | 37 (13.6%) | 236 (86.4%) | 273 (100%) |
Training set | 25 (13.0%) | 167 (87.0%) | 192 (100%) |
Testing set | 12 (14.8%) | 69 (85.2%) | 81 (100%) |
We used the 10-fold cross-validation and grid search to obtain optimal hyper-parameters (Figure 2). The optimal constraints violation cost (cost) and gamma parameter (gamma) of the SVM model were 31 and 0.01, with a minimal error of 0.109. The optimal number of trees to grow (ntree) of the RF model was 500, and the number of variables randomly sampled as candidates at each split (mtry) was 2, with a minimal error of 0.110. As for the ANN model, the optimal number of units in the hidden layer (size) was 8, and the best maximum number of iterations (maxit) was 170, with a minimal error of 0.111.
Figure 2.
Perspective view of the outcomes in hyper-parameter tuning. The 10-fold cross-validation and grid search were used to obtain optimal hyper-parameters for three machine learning models. (A) The constraints violation cost (cost) and gamma parameter (gamma) in the support vector machine (SVM) model. (B) The number of trees to grow (ntree) and variables randomly sampled as candidates at each split (mtry) in the random forest (RF) model. (C) The number of hidden units (size) and the maximum number of iterations (maxit) in the artificial neural network (ANN) model.
We established three ML models in the training set. The kernel used in the SVM model was radial. The AUROC of all three models exceeded 0.8; the ANN model had a maximal under curve area of 0.925 (Figure 3A). The SVM and RF models had acceptable AUPRC, both of which were more than 0.6. However, the ANN model had a low AUPRC, and the value was only 0.380 (Figure 3B). The RF and ANN models revealed similar TPR, while ANN had a higher TNR (Figure 4). And the ANN model showed better specificity, sensitivity, accuracy, and kappa efficiency (Table 2).
Figure 3.
The area under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC) of three machine learning models, which were trained using the support vector machine (SVM), random forest (RF), and artificial neural network (ANN) algorithms. (A) AUROC of the training set. (B) AUPRC of the training set. (C) AUROC of the testing set. (D) AUPRC of the testing set.
Figure 4.
The confusion matrix of three machine learning models, constructed by the support vector machine (SVM), random forest (RF), and artificial neural network (ANN), respectively. The value in the upper-right grid represents the true positive ratio, and the value in the lower-left grid means the true negative ratio.
Table 2.
Performance indicators of confusion matrix in training and testing sets.
Models | Training set | Testing set | ||||||
---|---|---|---|---|---|---|---|---|
Specificity | Sensitivity | Accuracy | Kappa coefficient | Specificity | Sensitivity | Accuracy | Kappa coefficient | |
SVM | 0.665 | 0.880 | 0.693 | 0.287 | 0.783 | 0.750 | 0.679 | 0.247 |
RF | 0.689 | 0.920 | 0.719 | 0.329 | 0.826 | 0.833 | 0.691 | 0.290 |
ANN | 0.790 | 0.920 | 0.807 | 0.455 | 0.696 | 0.500 | 0.667 | 0.129 |
SVM, support vector machine; RF, random forest; ANN, artificial neural network.
Ulteriorly, we internally validated the established models in the testing set. The RF and SVM models maintained adequate AUROC (0.828 vs. 0.719), whereas the ANN models decreased by a prodigious degree (0.574) (Figure 3C). All three models had different levels of decrements in AUPRC, in which the RF and SVM models comparatively performed better (0.413 vs. 0.520) (Figure 3D). The RF model had better TPR, although the indicator of all models decreased comprehensively. And the TNR of SVM and RF models changed slightly, compared with a notable decline in the ANN model (Figure 4). RF showed better specificity, sensitivity, accuracy, and kappa efficiency than the other models (Table 2).
In addition, the importance of features was analyzed in the RF model, in which site leakage had a significant impact on the prediction accuracy, with the most meaningful mean decrease accuracy (Figure 5).
Figure 5.
The importance of features in the random forest (RF) model. Site leakage showed the most considerable mean decrease in accuracy.
Discussion
At present, scholars have an increasing enthusiasm for utilizing supervised ML to predict the occurrence of infection, including iatrogenic and non-iatrogenic (21, 25–33). Nonetheless, the study that forecasts the risk of LDRM in an early stage before the clinical diagnosis is limited, although we have proposed a prediction model using the traditional logistic regression algorithm (19). To search for potential ML models and improve the prediction accuracy, we used three prevalent ML algorithms to establish the LDRM prediction models in this research. The main findings are the RF and SVM models showed the ability to predict LDRM, in which the RF model had the best performance among all established models, and site leakage was the most meaningful predictor.
Data preprocessing is an important preceding step to initiating machine learning (34). The features included in the model often have different dimensions and units, which will affect the results of data analysis and cause bias (35). We normalized the values (range “0–1”) between included features to eliminate the overrepresentation or underrepresentation between predictors. In addition, we conducted feature selection to delete redundant or useless characteristics and retain the contributing variables in the prediction (29, 35). We used the univariate analysis consistent with our previous studies to make the included variables concordant so that the models are comparable.
We included five features as model predictor variables, which were significantly different in infected and non-infected groups in the this study. Duration days and site leakage are two features directly related to LD. Although the relationship between drainage time and infection is non-linear (36), it is difficult to obtain an infection cutoff as the risk increases gradually and progressively (16). Site leakage is another significant predictor variable, regarded as a critical driving factor causing retrograde infection (19). Diabetes is usually a risk factor for surgical site infection because diabetic patients are prone to hyperglycemia, vascular changes, and white blood cell dysfunction (37). Furthermore, admission to SICU and associated craniotomy are predictors included in this work, with the possible reason these LD patients are more severe and susceptible to bacteria.
Typically, the original set is divided into a large set (training set) for model training and a small set (testing set) for model validation. Some studies that predict infection use random sampling to split the original data (25, 27). Considering that this method may cause a disproportionate distribution between different data sets, we used stratified sampling in the this study. We also recommend this hierarchical data segmentation method, of which the utilization can contribute to balancing the class distributions within the splits. Data imbalance in ML algorithms may be an obstacle to obtaining excellent results (38). Some data resampling methods, such as oversampling or under sampling, are likely to help solve this problem. However, we did not resample data because the benefit of data balance is limited in infection prediction, given the low rate of positive events (25).
Hyper-parameter has a considerable influence on the model performance in ML. These parameters are predetermined but not obtained through the training process (39). It is needed to optimize the hyper-parameters for the ML models to improve their performance. Improper selection of hyper-parameters in some models, such as SVM and ANN, can significantly affect the outcome (35). In ML, a validation set is usually established, with the role of adjusting hyper-parameters. Nevertheless, we did not divide a validation set separately in the this study due to the limited data. We, instead, adopted a 10-fold cross-validation to achieve the optimal hyper-parameter in the training set, using the tune function of R. This general function uses grid search to adjust the hyper-parameter of the ML methods within the provided parameter range.
We established prediction models of LDRM using three ML algorithms. The model evaluation in the training set showed that the ANN model had excellent AUROC and additional performance indicators, including TNR, specificity, sensitivity, accuracy, and kappa efficiency. However, this model had a minimal area under the PRC, suggesting that it may not be satisfying because the PRC is more informative than ROC when dealing with imbalanced data sets (40). This conclusion is further confirmed when using the testing set for model verification. The performance of the ANN model decreased notably, which reveals there might be overfitting during training. Some previous studies also support our viewpoint, in which the ANN model does not achieve the best performance in infection prediction (27, 32).
On the contrary, the RF model in this study showed encouraging performance both in the training and testing sets. This model also outperforms our earlier logistic regression model (AUROC 0.888 vs. 0.837), and it consequently may be the most promising ML model for predicting LDRM. In addition, the strongest predictor of LDMR in the RF model was site leakage, which coincides with our previous studies. We have proposed a possible reason that the site leakage of LD is usually inconspicuous to be found, resulting in retrograde infection through CSF or infected soft tissue (19). Another model based on the SVM algorithm also had acceptable performance, although indicators were not superior to the RF model except AUPRC in the testing set. With further optimization, this model may become an alternative candidate for predicting LDMR.
Although the effective prevention of LDRM needs additional research, the ML models in this study, for example, may play a meaningful role. These prediction models can help clinicians and nurses judge the possibility of LDRM and identify high-risk patients when multiple risk factors coexist, to strengthen monitoring or adjust treatment strategy (30). Some procedures that increase the infection risk, such as CSF sampling, can be accurately enhanced or diminished as appropriate. Moreover, early prediction of LDRM may improve clinical outcomes and reduce medical costs, as the infection is closely related to disease deterioration and extra drug use (15, 17). It is worth noting that optimizing the model according to the target population is encouraged, given that the patient population, clinical scheme, and possible risk factors are diverse among institutions (26).
This study is the first to predict LDRM using supervised ML algorithms, in which we constructed and evaluated three prevalent models. However, there were several limitations. (1) Our study included retrospective data from a single research center to establish ML models, which may result in selection bias and introduce uncontrollable confounding factors. (2) We only used three ML algorithms, and other unused algorithms may help achieve models with better prediction performance. We also did not use unstructured data, which may contain extra prediction information (31). Furthermore, the modeling method we used entirely depends on supervised ML, which requires a lot of expensive and time-consuming tag data and may not extend well to related but non-identical tasks. (3) We did not conduct external verification of the established models, although some of them had a good performance during internal verification. It is necessary to update the model and verify the generalization in other clinical environments. In addition, whether these models can improve the clinical prognosis of LD patients and reduce medical costs remains to be explored.
Conclusion
In summary, this study constructed and verified three supervised ML prediction models to predict LDRM. The results suggest that RF and SVM models had the predictive power, in which the RF model owned the best performance, and site leakage was the most meaningful predictor. Our research highlights that the prediction model based on the ML algorithm, with further optimization, may become an important decision-making tool for clinical staff in the future, supplementing the existing schemes to identify high-risk patients.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics Statement
The studies involving human participants were reviewed and approved by Ethical Committee of Chengdu Fifth People's Hospital. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.
Author Contributions
PW: conceptualization, funding acquisition, and writing—original draft. SC: methodology, visualization, and supervision. YL: formal analysis. LL, JL, and QZ: data curation and investigation. SL: validation and writing—review and editing. All authors contributed to the article and approved the submitted version.
Funding
This research was supported by grants from the Xinglin Scholar Discipline Talents Scientific Research Promotion Plan of Chengdu University of TCM (YYZX2021047), the Chengdu High-level Key Clinical Specialty Construction Project (GSPZX2021-15), and the Foundation of Sichuan Health Commission (19PJ016).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
We acknowledge Ms. Qing Qing, who graduated from the Massey University in New Zealand, for her English editorial assistance.
Glossary
Abbreviations
- LD
Lumbar drainage
- CSF
Cerebrospinal fluid
- LDRM
Lumbar drainage-related meningitis
- ICU
Intensive care unit
- ML
Machine learning
- SVM
Support vector machine
- RF
Random forest
- ANN
Artificial neural network
- SICU
Surgery intensive care unit
- ROC
Receiver operating characteristic curve
- PRC
Precision-recall curve
- AUROC
Area under receiver operating characteristic curve
- AUPRC
Area under precision-recall curve
- TPR
True positive ratio
- TNR
True negative ratio.
References
- 1.Lele AV, Hoefnagel AL, Schloemerkemper N, Wyler DA, Chaikittisilpa N, Vavilala MS, et al. Perioperative management of adult patients with external ventricular and lumbar drains: guidelines from the society for neuroscience in anesthesiology and critical care. J Neurosurg Anesthesiol. (2017) 29:191–210. 10.1097/ANA.0000000000000407 [DOI] [PubMed] [Google Scholar]
- 2.Livingston AJ, Laing B, Zwagerman NT, Harris MS. Lumbar drains: practical understanding and application for the otolaryngologist. Am J Otolaryngol. (2020) 41:102740. 10.1016/j.amjoto.2020.102740 [DOI] [PubMed] [Google Scholar]
- 3.Maguire RC, Gull J, Weaver M, Sataloff RT. Otolaryngologic uses for spinal drains. Ear Nose Throat J. (2010) 89:E17–22. 10.1177/014556131008901004 [DOI] [PubMed] [Google Scholar]
- 4.Giese H, Meyer J, Unterberg A, Beynon C. Preoperative lumbar drainage placement for surgical cranioplasty. J Clin Neurosci. (2018) 52:88–91. 10.1016/j.jocn.2018.03.008 [DOI] [PubMed] [Google Scholar]
- 5.Abulhasan YB, Al-Jehani H, Valiquette MA, McManus A, Dolan-Cake M, Ayoub O, et al. Lumbar drainage for the treatment of severe bacterial meningitis. Neurocrit Care. (2013) 19:199–205. 10.1007/s12028-013-9853-y [DOI] [PubMed] [Google Scholar]
- 6.Fang Y, Shao Y, Lu J, Dong X, Zhao X, Zhang J, et al. The effectiveness of lumbar cerebrospinal fluid drainage in aneurysmal subarachnoid hemorrhage with different bleeding amounts. Neurosurg Rev. (2020) 43:739–47. 10.1007/s10143-019-01116-1 [DOI] [PubMed] [Google Scholar]
- 7.Fan S, Chen Y, Cao Y, Liu L, Liu F, Zhang C, et al. The effectiveness of lumbar drainage in the management of delayed or recurrent cerebrospinal fluid leaks: a retrospective case series in a single center. World Neurosurg. (2019) 129:e845–e50. 10.1016/j.wneu.2019.06.054 [DOI] [PubMed] [Google Scholar]
- 8.Manet R, Payen JF, Guerin R, Martinez O, Hautefeuille S, Francony G, et al. Using external lumbar Csf drainage to treat communicating external hydrocephalus in adult patients after acute traumatic or non-traumatic brain injury. Acta Neurochir (Wien). (2017) 159:2003–9. 10.1007/s00701-017-3290-1 [DOI] [PubMed] [Google Scholar]
- 9.Qian C, Yu X, Chen J, Gu C, Wang L, Chen G, et al. Effect of the drainage of cerebrospinal fluid in patients with aneurismal subarachnoid hemorrhage: a meta-analysis. Medicine. (2016) 95:e5140. 10.1097/MD.0000000000005140 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Staykov D, Kuramatsu JB, Bardutzky J, Volbers B, Gerner ST, Kloska SP, et al. Efficacy and safety of combined intraventricular fibrinolysis with lumbar drainage for prevention of permanent shunt dependency after intracerebral hemorrhage with severe ventricular involvement: a randomized trial and individual patient data meta-analysis. Ann Neurol. (2017) 81:93–103. 10.1002/ana.24834 [DOI] [PubMed] [Google Scholar]
- 11.Strickland BA, Lucas J, Harris B, Kulubya E, Bakhsheshian J, Liu C, et al. Identification and repair of intraoperative cerebrospinal fluid leaks in endonasal transsphenoidal pituitary surgery: surgical experience in a series of 1002 patients. J Neurosurg. (2018) 129:425–9. 10.3171/2017.4.JNS162451 [DOI] [PubMed] [Google Scholar]
- 12.Skalicky P, Mladek A, Vlasak A, De Lacy P, Benes V, Bradac O. Normal pressure hydrocephalus-an overview of pathophysiological mechanisms and diagnostic procedures. Neurosurg Rev. (2020) 43:1451–64. 10.1007/s10143-019-01201-5 [DOI] [PubMed] [Google Scholar]
- 13.Ginalis EE, Fernandez LL, Avila JP, Aristizabal S, Rubiano AM. A review of external Lumbar drainage for the management of intracranial hypertension in traumatic brain injury. Neurochirurgie. (2021) 68:206–11. 10.1016/j.neuchi.2021.05.004 [DOI] [PubMed] [Google Scholar]
- 14.Li G, Zhang Y, Zhao J, Han Z, Zhu X, Hou K. Some cool considerations of external lumbar drainage during its widespread application in neurosurgical practice: a long way to go. Chin J Neurosurg. (2016) 2:117–24. 10.1186/s41016-016-0033-8 [DOI] [Google Scholar]
- 15.Liang H, Zhang L, Gao A, Li Y, Jiang Z, Hu F, et al. Risk factors for infections related to lumbar drainage in spontaneous subarachnoid hemorrhage. Neurocrit Care. (2016) 25:243–9. 10.1007/s12028-015-0239-1 [DOI] [PubMed] [Google Scholar]
- 16.Hussein K, Rabino G, Feder O, Eghbaryeh H, Zayyad H, Sviri G, et al. Risk factors for meningitis in neurosurgical patients with cerebrospinal fluid drains: prospective observational cohort study. Acta Neurochir (Wien). (2019) 161:517–24. 10.1007/s00701-019-03801-y [DOI] [PubMed] [Google Scholar]
- 17.Leverstein-van Hall MA, Hopmans TE, van der Sprenkel JW, Blok HE, van der Mark WA, Hanlo PW, et al. A Bundle Approach to Reduce the Incidence of External Ventricular and Lumbar Drain-Related Infections. J Neurosurg. (2010) 112:345–53. 10.3171/2009.6.JNS09223 [DOI] [PubMed] [Google Scholar]
- 18.van Mourik MS, Groenwold RH. Berkelbach van der Sprenkel JW, van Solinge WW, Troelstra A, Bonten MJ. Automated detection of external ventricular and lumbar drain-related meningitis using laboratory and microbiology results and medication data. PLoS ONE. (2011) 6:e22846. 10.1371/journal.pone.0022846 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wang P, Song W, Cheng S, Shuai Y, Yang J, Luo S. Establishment of a nomogram for predicting lumbar drainage-related meningitis: a simple tool to estimate the infection risk. Neurocrit Care. (2021) 34:557–65. 10.1007/s12028-020-01076-1 [DOI] [PubMed] [Google Scholar]
- 20.Luz CF, Vollmer M, Decruyenaere J, Nijsten MW, Glasner C, Sinha B. Machine learning in infection management using routine electronic health records: tools, techniques, and reporting of future technologies. Clin Microbiol Infect. (2020) 26:1291–9. 10.1016/j.cmi.2020.02.003 [DOI] [PubMed] [Google Scholar]
- 21.Ozkan IA, Koklu M, Sert IU. Diagnosis of Urinary Tract Infection Based On Artificial Intelligence Methods. Comput Methods Programs Biomed. (2018) 166:51–9. 10.1016/j.cmpb.2018.10.007 [DOI] [PubMed] [Google Scholar]
- 22.Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), Tu Wien. (2021). [Google Scholar]
- 23.Liaw A, Wiener M. Classification and regression by randomforest. R News. (2002) 2:18–22. 10.1057/9780230509993 [DOI] [Google Scholar]
- 24.Venables WN, Ripley BD. Modern Applied Statistics With S. 4th ed. New York: Springer. (2002). 211–50, 331–51. 10.1007/978-0-387-21706-2 [DOI] [Google Scholar]
- 25.Bartz-Kurycki MA, Green C, Anderson KT, Alder AC, Bucher BT, Cina RA, et al. Enhanced neonatal surgical site infection prediction model utilizing statistically and clinically significant variables in combination with a machine learning algorithm. Am J Surg. (2018) 216:764–77. 10.1016/j.amjsurg.2018.07.041 [DOI] [PubMed] [Google Scholar]
- 26.Li BY, Oh J, Young VB, Rao K, Wiens J. Using machine learning and the electronic health record to predict complicated clostridium difficile infection. Open Forum Infect Dis. (2019) 6:ofz186. 10.1093/ofid/ofz186 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Tunthanathip T, Sae-Heng S, Oearsakul T, Sakarunchai I, Kaewborisutsakul A, Taweesomboonyat C. Machine learning applications for the prediction of surgical site infection in neurological operations. Neurosurg Focus. (2019) 47:E7. 10.3171/2019.5.FOCUS19241 [DOI] [PubMed] [Google Scholar]
- 28.Ramgopal S, Horvat CM, Yanamala N, Alpern ER. Machine learning to predict serious bacterial infections in young febrile infants. Pediatrics. (2020) 146:4096. 10.1542/peds.2019-4096 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Azimi K, Honaker MD, Chalil Madathil S, Khasawneh MT. Post-operative infection prediction and risk factor analysis in colorectal surgery using data mining techniques: a pilot study. Surg Infect. (2020) 21:784–92. 10.1089/sur.2019.138 [DOI] [PubMed] [Google Scholar]
- 30.Rahmani K, Garikipati A, Barnes G, Hoffman J, Calvert J, Mao Q, et al. Early prediction of central line associated bloodstream infection using machine learning. Am J Infect Control. (2021) 50:440–5. 10.1016/j.ajic.2021.08.017 [DOI] [PubMed] [Google Scholar]
- 31.Tabaie A, Orenstein EW, Nemati S, Basu RK, Kandaswamy S, Clifford GD, et al. Predicting presumed serious infection among hospitalized children on central venous lines with machine learning. Comput Biol Med. (2021) 132:104289. 10.1016/j.compbiomed.2021.104289 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Wang H, Fan T, Yang B, Lin Q, Li W, Yang M. Development and internal validation of supervised machine learning algorithms for predicting the risk of surgical site infection following minimally invasive transforaminal lumbar interbody fusion. Front Med (Lausanne). (2021) 8:771608. 10.3389/fmed.2021.771608 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Yeo I, Klemt C, Robinson MG, Esposito JG, Uzosike AC, Kwon YM. The Use of Artificial neural networks for the prediction of surgical site infection following Tka. J Knee Surg. (2022). 10.1055/s-0041-1741396. [Epub ahead of print]. [DOI] [PubMed] [Google Scholar]
- 34.Leisman DE, Harhay MO, Lederer DJ, Abramson M, Adjei AA, Bakker J, et al. Development and reporting of prediction models: guidance for authors from editors of respiratory, sleep, and critical care journals. Crit Care Med. (2020) 48:623–33. 10.1097/CCM.0000000000004246 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Kocak B, Kus EA, Kilickesmez O. How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts. Eur Radiol. (2021) 31:1819–30. 10.1007/s00330-020-07324-4 [DOI] [PubMed] [Google Scholar]
- 36.Conen A, Fux CA, Vajkoczy P, Trampuz A. Management of infections associated with neurosurgical implanted devices. Expert Rev Anti Infect Ther. (2017) 15:241–55. 10.1080/14787210.2017.1267563 [DOI] [PubMed] [Google Scholar]
- 37.Martin ET, Kaye KS, Knott C, Nguyen H, Santarossa M, Evans R, et al. Diabetes and risk of surgical site infection: a systematic review and meta-analysis. Infect Control Hosp Epidemiol. (2016) 37:88–99. 10.1017/ice.2015.249 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Lin WJ, Chen JJ. Class-imbalanced classifiers for high-dimensional data. Brief Bioinform. (2013) 14:13–26. 10.1093/bib/bbs006 [DOI] [PubMed] [Google Scholar]
- 39.Olson RS, Cava W, Mustahsan Z, Varik A, Moore JH. Data-driven advice for applying machine learning to bioinformatics problems. Pac Symp Biocomput. (2018) 23:192–203. 10.1142/9789813235533_0018 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Saito T, Rehmsmeier M. Precrec: fast and accurate precision-recall and roc curve calculations in R. Bioinformatics. (2017) 33:145–7. 10.1093/bioinformatics/btw570 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.