Abstract
In this narrative review, we discuss studies assessing the use of machine learning (ML) models for the early diagnosis of candidemia, focusing on employed models and the related implications. There are currently few studies evaluating ML techniques for the early diagnosis of candidemia as a prediction task based on clinical and laboratory features. The use of ML tools holds promise to provide highly accurate and real-time support to clinicians for relevant therapeutic decisions at the bedside of patients with suspected candidemia. However, further research is needed in terms of sample size, data quality, recognition of biases and interpretation of model outputs by clinicians to better understand if and how these techniques could be safely adopted in daily clinical practice.
Keywords: artificial intelligence, candidemia, classification, machine learning, neural networks, prediction, random forest
Plain language summary
Candida is a type of fungus that can cause fatal infections. To confirm the presence of the infection, doctors may search for the fungus in the blood. Here, we discuss if computer systems can help to identify infection more easily and more rapidly.
Executive summary
Bloodstream infections (BSI) by Candida spp. are among the most frequent healthcare-associated BSI, and are associated with high mortality rates.
In the past few years some authors have started to explore the potential role of machine learning (ML) models for the prediction of candidemia, as early diagnostic tools in the form of ML-guided clinical decision support systems.
Brief summary of main results of studies assessing the use of ML techniques for the early diagnosis of candidemia
There are currently only a few studies evaluating ML techniques for the early diagnosis of candidemia in clinical practice.
Study design & populations
All four studies retrieved by our literature search were retrospective and investigated the diagnostic performance of various ML algorithms for the prediction of candidemia in adult patients.
Sample sizes & choice & availability of input features
Various factors affect the pre-test probability and post-test probability of candidemia (where the test is the evaluated ML-based classifier) and may lead to very different diagnostic performances when extrapolated to settings different from those registered in the original studies.
Feature selection, model selection & performance evaluation
Exploration of the performance of complex ML models in predicting candidemia on very large samples of automatically extracted laboratory and microbiological data could help, in our opinion, to better understand the potential of such models in identifying hidden patterns of association that are difficult for clinicians to recognize due to their apparently nonspecific nature.
The best performances for the early diagnosis of candidemia (as a prediction/classification task) within each retrieved study were from models in the category of ensemble learning methods.
The future of ML techniques for diagnosis of candidemia
The use of ML tools holds promise to provide highly accurate and real-time support to clinicians for relevant therapeutic decisions at the bedside of patients with suspected candidemia, although due caution is still necessary in terms of extrapolation and generalization.
An accurate, transparent, and large dataset is at least as important as the choice of the best suited ML model for training.
Further research is needed in terms of sample size, data quality, recognition of biases and interpretation of model outputs by clinicians to better understand if and how these techniques could be safely adopted in daily clinical practice.
Bloodstream infections (BSI) by Candida spp. are among the most frequent healthcare-associated BSI, and are associated with mortality rates possibly surpassing 50% when they develop in critically ill patients in intensive care units (ICUs) [1-4]. The development of candidemia has also been reported to unfavorably impact length of hospital stay, and to increase costs in healthcare [1].
Diagnosis of candidemia is usually made through collection of blood for culture, the results of which become available only 48–72 h after the blood draw. Therefore, while blood cultures currently remain the gold standard for diagnosis (and also allow species identification and susceptibility testing), more rapid antigen-based or molecular tests are nowadays also commonly used in clinical practice to help physicians with early therapeutic decisions, in other words, whether or not to add an antifungal, or to discontinue empirical antifungals, in patients with signs and symptoms of BSI and severe clinical presentation, while awaiting blood culture results [5-14].
In the past few years, some authors have also started to explore the potential role of machine learning (ML) models for the prediction of candidemia, as early diagnostic tools in the form of ML-guided clinical decision support systems. In the present narrative review, we discuss currently available studies assessing the use of ML models for the early diagnosis of candidemia, focusing on employed models and the related implications. Finally, we provide a personal view on some crucial points that should be considered for the future inclusion of these techniques in routine diagnostic algorithms and clinical reasoning at the bedside of patients with suspected candidemia.
To identify available studies dealing with the use of ML techniques to support the diagnosis of candidemia in clinical practice, we performed a literature search. On 2 August 2023, we searched the PubMed and Embase databases using the following combination of keywords: ((candidemia) OR (candidaemia)) AND (machine learning). We did not apply any filter on time, type of study or text availability. An update of the search, employing the same combination of keywords, was performed on 28 November 2023. Overall, four original studies were eventually selected for inclusion in the present review [15-18]. The main characteristics of the included studies are reported in Table 1. Finally, the text of the present review was structured into the following sections, reflecting crucial aspects of the use of ML techniques for the early diagnosis of candidemia: brief summary of main results of studies assessing the use of ML techniques for the early diagnosis of candidemia; considerations on study design and population; considerations on sample sizes and on choice and availability of input features; considerations on feature selection, model selection and performance evaluation. The manuscript was then completed with the following two sections: conclusion; future perspective.
Table 1.
Summary of the principal characteristics of included studies.
| Study (year) | Objectives | End point | Study period | Strengths | Limitations | Ref. |
|---|---|---|---|---|---|---|
| Giacobbe et al. (2023) | Differential diagnosis of candidemia vs bacteremia | Candidemia, defined as the presence of one or more positive blood cultures for a Candida species within a 30-day window | 2011–2019 | Automation in feature extraction from hospital LIS; identification of nonspecific markers | Based on laboratory tests and microbiological cultures only; single-center study; large proportion of missing values | [18] |
| Ripoli et al. (2020) | Detection of candidemia in patients hospitalized in IMW and experiencing fever and/or other clinical signs of infection (SIRS) | Candidemia, defined as a patient with at least one blood culture yielding Candida spp. | 2012–2015 | Attempt to explain model outputs to clinicians | Limited sample; case-control study; only one ML model considered | [17] |
| Yuan et al. (2021) | Prediction of candidemia in ICU patients with new onset of SIRS | Candidemia, defined as a positive blood culture for Candida spp. obtained during SIRS, only in the absence of previous SIRS within 24 h | 2013–2017 | Limited number of positive cases managed with SMOTE; comparison of five different algorithms (four ML models + LR) | Results not generalizable to non-ICU patients; case-control study | [15] |
| Yoo et al. (2021) | Diagnosis of candidemia in patients with cancer | Candidemia, defined as a positive culture for any Candida spp. from more than one blood sample within a 7-day window | 2010–2018 | Specific for patients with malignancy (which could nonetheless also be viewed as a limitation for generalization) | Single-center study; non-standard definition of candidemia; the developed algorithm only works with no or few missing values | [16] |
ICU: Intensive care unit; IMW: Internal medical ward; LIS: Laboratory information system; LR: Logistic regression; ML: Machine learning; SIRS: Systemic inflammatory response syndrome; SMOTE: Synthetic minority over-sampling technique.
Brief summary of main results of studies assessing the use of ML techniques for the early diagnosis of candidemia
In 2020, Ripoli and colleagues reported the results of a retrospective, multicenter study assessing the performance of a random forest (RF) algorithm for the early differential diagnosis between candidemia and bacteremia in internal medicine wards [17]. The study population consisted of 295 patients: 157 with candidemia and 138 controls with bacteremia. Eventually, the RF algorithm showed 84% sensitivity and 91% specificity, which were higher than those obtained through a classical multivariable logistic regression model (80% sensitivity and 85% specificity) [17].
In 2021, Yuan and colleagues also assessed the performance of four different ML algorithms (XGBoost, support vector machine, RF and ExtraTrees) and logistic regression for the early diagnosis of candidemia in ICU patients with systemic inflammatory response syndrome (SIRS) [15]. The study was multicenter and retrospective, and included a total of 137 patients with SIRS and candidemia and 7795 patients with SIRS and blood cultures either negative or positive for a pathogen other than Candida species. The ML algorithm showing the best predictive performance in the study population was XGBoost, with 84% sensitivity and 89% specificity. The model also showed a very high negative predictive value (NPV) of 99.6%, partly connected to the low prevalence of candidemia in the study population (1.7%) [15].
In the same year, Yoo and colleagues reported the results of a retrospective, single center study assessing the performance of various ML algorithms (including, among others, deep neural network, gradient boosting and RF) and logistic regression for the early diagnosis of candidemia in patients with malignancies [16]. The study sample consisted of 501 candidemia episodes and 2000 control episodes (either bacteremia episodes or negative blood cultures). The ML algorithm showing the best predictive performance in the study population was RF, with 89% sensitivity and 90% specificity. The model also showed 96.7% NPV, in a study population with a prevalence of candidemia of 20% [16].
Finally, we recently assessed the performance of an RF algorithm for the early differential diagnosis between candidemia and bacteremia on an automatically extracted dataset of 1275 episodes of candidemia and 11,208 episodes of bacteremia over 9 years (10% prevalence of candidemia) [18]. The study was retrospective and exploited automated collection of laboratory and microbiological features through a previously validated automated extraction system [19]. In the overall study population, the RF algorithm predicting candidemia on the basis of microbiological features and nonspecific laboratory markers achieved 98% sensitivity and 65% specificity in the training set, and 74% sensitivity and 57% specificity in the test set (outperforming predictions of penalized logistic regression models using the same training and test sets). When these results were used as pre-training for selecting the most influential features, in order to then predict candidemia in a subset of episodes for which results of more specific laboratory markers for the diagnosis of candidemia (serum β-D-glucan [BDG] and serum procalcitonin [PCT]) were available, the diagnostic performance of an RF model based on the selected most influential features plus BDG and PCT numerically outperformed that of an RF model based on BDG and PCT only, although the difference was eventually not statistically significant [18].
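The general workflow shared by these studies (stratified train-test split, RF training, evaluation of sensitivity and specificity on the held-out set) can be sketched with scikit-learn on synthetic data; the class imbalance loosely mirrors the 10% prevalence above, but all data, parameters and results here are illustrative, not those of the original studies:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset standing in for laboratory/microbiological
# features, with ~10% "candidemia" (positive class) prevalence.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

# Stratified 70:30 split preserves the ground-truth class distribution
# in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Sensitivity and specificity on the held-out test set.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

Note that, as discussed later in this review, evaluating on a test set whose class distribution matches the real-world one (rather than an artificially rebalanced one) is what makes the resulting predictive values meaningful for clinical use.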
Study design & populations
All four studies retrieved by our literature search were retrospective and investigated the diagnostic performance of various ML algorithms for the prediction of candidemia in adult patients [15-18]. As shown in Table 2, three studies were restricted to specific wards or baseline conditions (internal medicine wards, ICUs and patients with malignancies, respectively), whereas in our previous study we considered all hospitalized patients regardless of ward or condition [15-18]. It is noteworthy that population selection has implications for the baseline prevalence of candidemia, which is generally low but fluctuates notably according to the baseline risk of the given population, in turn influencing both the NPV and the positive predictive value (PPV) of the explored models. Other factors affecting the original baseline prevalence of candidemia (in a way that may lead to artificial increases) are whether all patients without candidemia meeting the study entry criteria are included or only a subset of them is included as a control group (as in the studies by Ripoli and colleagues and by Yoo and colleagues [16,17]), and whether baseline prevalences were intentionally changed to avoid difficulties in model training (as in the study by Yoo and colleagues [16]). While such adjustments are not incorrect and may help to evaluate model performance more adequately (as well as to reduce the time of data collection when performed manually), readers should always exercise caution when extrapolating NPV and PPV to their own settings, as their baseline prevalence of candidemia is likely different from that of samples adjusted for model training. Even more important clinically, when extrapolating results to their own setting, readers should pay careful attention to the type of population on which the model was trained.
Indeed, both the baseline prevalence of candidemia and the diagnostic performance of markers (for example, that of inflammatory markers in populations frequently treated with potent anti-inflammatory drugs versus populations usually not receiving them) may vary greatly across, for example, ICUs, internal medicine wards, patients with malignancies and whole hospital populations. All these factors variously affect the pre-test and post-test probability of candidemia (where the 'test' is the evaluated ML-based classifier) and may lead to local diagnostic performances very different from those registered in the original studies. Notably, these considerations do not apply only to ML algorithms but also to classical diagnostic studies, and highlight the importance of, and need for, external validation of trained models (both in different centers and in different populations). With regard to the design of future multicenter studies evaluating the performance of ML-based classifiers for the early diagnosis of candidemia, it is of note that the type and characteristics of the information used for training the model could also unintentionally influence and confound the assessment of model performance. For example, if all (or, more likely, part) of the data are collected from the text of clinical notes through ML techniques (for example, natural language processing [NLP]), between-center heterogeneity in how, and how much, information is reported in clinical charts may lead to differences in the accuracy of information extraction, in turn influencing the performance of the model assessing the impact of extracted features on the endpoint of interest (early diagnosis of candidemia).
In our opinion, measurement of such potential heterogeneity in the ability to extract information across centers should also be defined at the time of study design, and may represent an important strategy to understand both how much it could influence model results and how to reduce it to improve generalization.
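The prevalence dependence of NPV and PPV discussed above follows directly from Bayes' theorem. A short sketch makes it concrete; the sensitivity and specificity values are borrowed from the figures reported earlier (84% and 89%), while the two prevalences (1.7% ICU-like, 20% oncology-like) are taken from the included studies for illustration:

```python
def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)      # false-positive fraction
    fn = (1 - sensitivity) * prevalence            # false-negative fraction
    tn = specificity * (1 - prevalence)            # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical classifier, two very different baseline prevalences:
ppv_low, npv_low = ppv_npv(0.84, 0.89, 0.017)   # low-prevalence setting (1.7%)
ppv_high, npv_high = ppv_npv(0.84, 0.89, 0.20)  # high-prevalence setting (20%)
```

At 1.7% prevalence the NPV exceeds 99% while the PPV stays low; at 20% prevalence the PPV rises substantially while the NPV decreases, illustrating why predictive values cannot be extrapolated across settings with different baseline risk.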
Table 2.
Study populations of included studies.
| Study (year) | Population | Candidemia (no.) | No candidemia (no.) | Case-control matching | Ref. |
|---|---|---|---|---|---|
| Giacobbe et al. (2023) | Mixed-ward patients | 1275 | 11,208† | - | [18] |
| Ripoli et al. (2020) | Internal medical wards patients | 157 | 138† | 1:1 | [17] |
| Yuan et al. (2021) | Intensive care units patients | 137 | 7865‡ | - | [15] |
| Yoo et al. (2021) | Patients with malignancy | 501 | 2000‡ | 1:4 in four subsets | [16] |
†Bacteremia episodes.
‡Either bacteremia or negative blood cultures.
Sample sizes & choice & availability of input features
Clinical data included in observational studies mainly derive from procedures concerning patient care, which are not primarily aimed at research. This is classically defined as real-world experience and comprises all data collected during daily clinical practice. Nowadays, the richest sources of patients' data are laboratory information systems (LISs) and electronic medical records (EMRs), which contain multidimensional and heterogeneous data, usually collected manually by investigators for inclusion in datasets specifically designed for a given research project. Manual collection is extremely time consuming, so sample sizes are usually limited to hundreds or sometimes a few thousand patients. This is in line with the sample sizes of three of the retrieved studies (Table 2), whereas in our previous study, in which the data were collected not manually but through an automated extraction system, the final number of included episodes (of either candidemia or bacteremia) was far higher, reaching 12,483 (a number very difficult to achieve in general, and likely impossible for a single-center study to achieve by means of manual collection, at least within a short time window) [15-18]. On the other hand, in our study only laboratory and microbiological features were collected by the automated extraction system, whereas in the other studies clinical features that are well-recognized predictors of candidemia (for example, presence of central venous catheters, total parenteral nutrition, receipt of broad-spectrum antibiotics) were also collected manually [15-18]. This reflects the current difficulty of automatically and reliably extracting high-detail, accurate clinical features from unstructured data (for example, free text from clinical notes in EMRs), although advancements in the field are likely in the near future, exploiting progress in the application of NLP techniques to the extraction of clinical features from EMRs [20-23].
In the meantime, in our opinion, exploring the performance of complex ML models in predicting candidemia on very large samples of automatically extracted laboratory and microbiological data can help to better understand the potential of such models in identifying hidden patterns of association that are difficult for clinicians at the bedside to recognize because of their apparently nonspecific nature. In turn, such results could be exploited to improve the performance of real-time ML-based clinical decision support systems for the early diagnosis of candidemia, provided interpretability or sufficient explainability of the employed models is guaranteed (see future perspective below).
Feature selection, model selection & performance evaluation
Before proceeding with training and evaluation of the performance of ML models, descriptive univariable comparisons (for example, through the chi-square test and the Kruskal–Wallis test) were performed in all included studies in order to spot possible associations between each variable and the outcome measure of interest (diagnosis of candidemia), eventually informing feature selection for inclusion in the ML models [15-18]. The number of features screened, the methods and processes used for feature selection, and the number and type of features eventually included in the best-performing models are summarized graphically in Figure 1. In line with the design of the study, in our previous experience we selected only features derived from laboratory tests and microbiological culture results, since the employed dataset included data automatically collected from the LIS and not from EMRs [18]. Conversely, in the other included studies, feature selection was also based on demographics and clinical data (for example, comorbidities, previous antibiotic treatments, total parenteral nutrition) [15-17]. In our opinion, such different designs in these early studies, attempting to explore how ML-based classifiers could help in the early diagnosis of candidemia, reflect both the current lack of standardized approaches and the need to adhere to local possibilities of extraction/collection in terms of type and quantity of data. Nonetheless, neither of these considerations should necessarily be viewed as a limitation. Indeed, in our opinion, they could also help to discover, in a hypothesis-generating way, which approach (or part of it) could work better, prompting further study of specific aspects of how to improve both training and performance of ML models for this specific task.
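The univariable screening step described above can be illustrated with a minimal SciPy sketch on synthetic data; the feature names (`cvc` for central venous catheter, `crp` for C-reactive protein) and the 0.05 threshold are hypothetical choices for illustration, not taken from the included studies:

```python
import numpy as np
from scipy.stats import chi2_contingency, kruskal

rng = np.random.default_rng(42)

# Hypothetical cohort: binary outcome (candidemia yes/no) plus one
# categorical and one continuous candidate feature.
n = 400
outcome = rng.integers(0, 2, size=n)                 # 0 = no candidemia, 1 = candidemia
cvc = rng.integers(0, 2, size=n)                     # categorical feature (synthetic, independent)
crp = rng.normal(50, 20, size=n) + 15 * outcome      # continuous feature, shifted in cases

# Chi-square test for the categorical feature...
table = np.array([[np.sum((cvc == i) & (outcome == j)) for j in (0, 1)]
                  for i in (0, 1)])
chi2, p_cat, _, _ = chi2_contingency(table)

# ...and Kruskal-Wallis test for the continuous one.
stat, p_cont = kruskal(crp[outcome == 0], crp[outcome == 1])

# Features below a screening p-value threshold would be retained for the ML models.
selected = [name for name, p in [("cvc", p_cat), ("crp", p_cont)] if p < 0.05]
```

In this synthetic example only `crp` carries a true signal, so it is the feature expected to survive the screen; in real studies, of course, univariable screening is only a first filter and does not capture multivariable interactions, which is precisely where ML models may add value.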
Certainly, caution is still necessary when extrapolating this preliminary evidence, especially considering the limited number of studies (and consequently of models, and of choices regarding how to train them) currently available, as retrieved by our literature search restricted to diagnosis (other studies explore the use of ML-based classifiers for the prognosis of candidemia, which nonetheless represents a different endpoint [24-26]).
Figure 1.

Feature selection process in the different studies.
*Comprehensive of higher variable importance, higher clinical importance, better extractability and less missing values.
BDG: Serum β-D-glucan; PCT: Serum procalcitonin; PFI: Permutation feature importance.
Details regarding model selection and performance evaluation of the ML algorithms in the different studies are reported in Table 3. Notably, all included studies essentially compared the performances of several ML models (plus logistic regression in most cases) for the classification task 'candidemia episode yes/no' [15-18]. For each of the selected models, a first subset of the study sample (the training set) was used to identify the best set of parameters for each model. Then, the predictive performance of models trained on the training set was evaluated on the remaining subsample (the test set), using several performance metrics. In the study by Yuan and colleagues, the authors employed the synthetic minority over-sampling technique (SMOTE) algorithm to deal with the imbalance of sample categories [15]. In our previous study, no augmentation, oversampling or undersampling techniques were adopted; rather, the training and test sets had a ground-truth distribution similar to that of the available dataset, which is a relevant point when actually evaluating and creating operational ML models [18]. Finally, it should be highlighted that all the ML models eventually showing the best performances for the early diagnosis of candidemia (as a prediction/classification task) within each study belong to the category of ensemble learning methods. Specifically, these are a subclass of supervised learning algorithms whose final prediction is obtained as a weighted average of the predictions returned by a set of base classifiers (for example, an RF is composed of K decision trees) [27]. The impact on model performance of biases specific to ML-based classifiers requires particular attention. In particular, the frequent need to fine-tune hyperparameters (in order to achieve better model performance) once initial results are available is apparently in contrast with the classical approach of developing a statistical analysis plan before study conduct.
This is also not necessarily a disadvantage, in our opinion, since fine-tuning in this case can also be conceived a priori, as a predefined measure with a certain degree of flexibility. What we think will become fundamental is, for each specific topic, to achieve standardization of the desired bias-variance trade-off (for example, for the early diagnosis of candidemia, a possible solution would be to focus the fine-tuning of hyperparameters on selecting a non-overfitted model that prioritizes sensitivity over specificity, so as not to miss true cases of candidemia that would otherwise encounter perilous delays in antifungal treatment, while still improving specificity compared with, or in addition to, the standard of care). Of course, other and possibly better solutions may exist and deserve investigation, nonetheless conceptually highlighting the need to tailor each specific approach to the pertinent clinical research question.
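One way to operationalize such a priori, sensitivity-prioritizing fine-tuning is to set the scoring function of the hyperparameter search to recall (which equals sensitivity for the positive class). The following scikit-learn sketch on synthetic data is illustrative only and does not reproduce any of the included studies' pipelines; the parameter grid is a hypothetical example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data standing in for a candidemia cohort.
X, y = make_classification(n_samples=1000, n_features=15, weights=[0.85, 0.15],
                           random_state=1)

# Scoring with recall steers the grid search toward hyperparameters that
# minimize missed positive (candidemia) cases, as argued above.
grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [100, 200],
                "max_depth": [3, None],
                "class_weight": [None, "balanced"]},
    scoring="recall",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
grid.fit(X, y)
best_sensitivity = grid.best_score_  # mean cross-validated sensitivity
```

Declaring such a scoring choice (and the parameter grid) in the statistical analysis plan before seeing the results is precisely the kind of a priori flexibility discussed above.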
Table 3.
Summary of the information related to ML models extracted from the included studies.
| Study (year) | Technical details | ML models | Output metrics | Best performances | Software (packages) used | Ref. |
|---|---|---|---|---|---|---|
| Giacobbe et al. (2023) | (Stratified) train test split: 70:30; (stratified) tenfold cross-validation; parameters tuning with GridSearch | RF | Sensitivity, specificity, PPV, NPV, Accuracy, F1 score, TSS | RF: sensitivity† = 0.566, Specificity† = 0.739 | Python | [18] |
| Ripoli et al. (2020) | Train test split: 2/3:1/3; tenfold cross-validation | RF | Sensitivity, specificity, AUROC, HLT statistics | RF: sensitivity† = 0.842, Specificity† = 0.91 | R (Boruta) | [17] |
| Yuan et al. (2021) | (Stratified) train test split: 80:20; SMOTE algorithm to balance sample categories | ET, RF, SVM, XGBoost | Sensitivity, specificity, PPV, NPV, AUROC | XGBoost: sensitivity† = 0.81, Specificity† = 0.89 | Python 3.7.0, (SMOTE) | [15] |
| Yoo et al. (2021) | Train test split: 70:30; parameters tuning with grid search | Auto-ML (TPOT), DNN, gradient boosting, RF | Sensitivity, specificity, PPV, NPV, F1 score, AUROC | RF: sensitivity† = 0.901 Specificity† = 0.722 | R, Python | [16] |
†Evaluated at the optimal threshold, based on specific conditions.
AUROC: Area under the receiver operating characteristics curve; Auto-ML: Automated machine learning; DNN: Deep neural network; ET: ExtraTrees; HLT: Hosmer–Lemeshow test; RF: Random forest; SMOTE: Synthetic minority over-sampling technique; SVM: Support vector machine; TPOT: Tree-based pipeline optimization tool.
The future of ML techniques for diagnosis of candidemia
There are currently only a few studies evaluating ML techniques for the early diagnosis of candidemia in clinical practice (as a prediction task based on clinical and routine laboratory features) [15-18]. In our opinion, four important considerations should be made regarding the future development of these techniques.
The first consideration concerns the overlap of two points: almost all current studies do not have very large sample sizes, which are instead similar to those of some previous prediction models of candidemia based on logistic regression [28-33]; and logistic regression itself is frequently presented as an ML model (although we did not include studies employing only logistic regression in the present review), with most teaching courses on ML for students or other interested professionals starting from logistic regression [15-18]. This underscores the lack of a clear conceptual border between classical statistical techniques and ML models, since both are able to 'learn' from data. The learned knowledge is then used to make a prediction for a new patient (for example, this is what is usually done when applying the Candida Score by Leon and colleagues in clinical practice, which was based on a logistic regression model) [30]. The crucial difference between classical approaches and modern ML techniques may lie in the highly interconnected concepts of ML and 'big data' [34,35]. Some modern ML techniques are able to learn complex associations/correlations across features (some of which may be difficult, or even impossible, for humans to spot or understand) and then to predict an outcome/result with high accuracy, usually unachievable with classical techniques (although this is not an absolute rule). Provided there are no other biases (or they are limited) in the data, very large training samples are needed for these complex models to learn to predict without overfitting to the learning data, which partly explains why the concepts of ML and 'big data' are highly interconnected [34,36].
The second consideration regards the fact that, in clinical practice, the overall diagnostic accuracy, considering the best compromise across performance measures (for example, sensitivity, specificity, PPV, NPV), may not be the most desired result of model training (see also the previous section). Indeed, at the bedside of patients with signs and symptoms of sepsis, clinicians may prefer an early prediction tool favoring sensitivity over specificity, considering that the least desirable situation (in other words, that with the worst prognostic impact for the patient) is not treating a true episode of candidemia (in other words, delaying the initiation of an antifungal agent until blood culture results are available). In our opinion, the crucial question to be answered in the near future is whether ML models, provided extremely high sensitivity is guaranteed in order not to perilously miss the early diagnosis of true cases, would be able to dramatically improve specificity, limiting as much as possible (and more than what is currently achieved with classical prediction tools) the number of false-positive cases, thereby avoiding useless antifungal treatments in line with antifungal stewardship principles [37-40].
The third consideration regards the explainability of ML model results. Indeed, while models like logistic regression are interpretable (the 'weight' that clinical and laboratory features confer on the prediction is evident from the model equation and easily allows calculation of the related odds ratios), some complex ML models (for example, neural networks) are commonly described as 'black boxes', since the calculations leading to the final prediction are hidden and obscure to both data scientists and clinicians [41-43]. From this perspective, we support the use of interpretable models whenever their performance equals that of black box models. However, when this is not the case and a black box model is used, crucial issues arise regarding the degree of explainability achievable for the employed model. Explainability is the ability to understand how the model produced its output (which inherently cannot be full explainability, as that would be interpretability); it is an intense matter of research and should be considered essential for healthcare in general and for candidemia in particular. This is because a lack of understanding (lack of explainability) of the reasons leading to a possible mistake of the model (for example, classifying as 'no candidemia' a patient who conversely has candidemia) may hamper the recognition of the biases leading to the mistake, consequently increasing the risk of wrong clinical decisions unfavorably impacting patients' outcomes [44,45].
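One widely used technique for partially opening such black boxes is permutation feature importance (PFI, also mentioned in the feature-selection footnote of Figure 1): each feature is shuffled in turn on held-out data, and the resulting degradation of the model's score measures how much the prediction relies on it. A minimal scikit-learn sketch on synthetic data, for illustration only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with only a few truly informative features.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=7)

clf = RandomForestClassifier(random_state=7).fit(X_train, y_train)

# Permute one feature at a time on held-out data and measure how much
# the model's score degrades; larger drops indicate more influential features.
result = permutation_importance(clf, X_test, y_test, n_repeats=10,
                                random_state=7)
ranking = np.argsort(result.importances_mean)[::-1]  # most influential first
```

Showing clinicians such a ranking (ideally mapped to familiar clinical and laboratory variable names) is one pragmatic step toward the explainability argued for above, although PFI describes global model behavior and does not explain individual predictions.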
Finally, the importance of data quality should not be overlooked. An accurate, transparent and large dataset is at least as important as the choice of the best-suited ML model for training. Both increased speed of data collection (automation of extraction could be crucial for healthcare personnel overwhelmed by clinical tasks, who have limited or no time to collect research data) and increased data quality can be achieved by well-designed and calibrated automatic extraction tools, provided the format and semantics of extraction are properly standardized [46].
Conclusion
The use of ML techniques has also started to be explored for the early diagnosis of candidemia. As in other fields of medicine, the use of ML tools holds promise to provide highly accurate, real-time support to clinicians for relevant therapeutic decisions at the bedside of patients with suspected candidemia. Nonetheless, given the limited number of currently available studies, due caution is still necessary when extrapolating and generalizing this potential and when applying ML tools for the early diagnosis of candidemia in real-life practice.
Future perspective
As described in depth in the main body of the article, further research is needed in terms of sample size, data quality, recognition of biases and interpretation of model outputs by clinicians to better understand if and how these techniques could be safely adopted in daily clinical practice.
Author contributions
Conceptualization: DR Giacobbe; writing—original draft preparation: DR Giacobbe, C Marelli, S Mora, A Cappello, S Guastavino, A Vena, A Signori; writing—review and editing: DR Giacobbe, C Marelli, S Mora, A Cappello, S Guastavino, A Vena, A Signori, N Rosso, C Campi, M Giacomini, M Bassetti; supervision: N Rosso, C Campi, M Giacomini, M Bassetti.
Financial disclosure
The authors have no financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Competing interests disclosure
Outside the submitted work, M Bassetti has received funding for scientific advisory boards, travel and speaker honoraria from Angelini, Astellas, BioMérieux, Cidara, Gilead, Menarini, MSD, Pfizer, Shionogi, Tetraphase and Nabriva. Outside the submitted work, DR Giacobbe reports investigator-initiated grants from Pfizer, Shionogi, BioMérieux and Gilead Italia, and speaker/advisor fees from Menarini, Pfizer and Tillotts Pharma. The authors have no other competing interests or relevant affiliations with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Writing disclosure
No writing assistance was utilized in the production of this manuscript.
References
Papers of special note have been highlighted as: • of interest; •• of considerable interest
- 1. Bouza E, Munoz P. Epidemiology of candidemia in intensive care units. Int. J. Antimicrob. Agents 2008;32(Suppl. 2):S87–S91. doi: 10.1016/S0924-8579(08)70006-2
- 2. Bougnoux ME, Kac G, Aegerter P et al. Candidemia and candiduria in critically ill patients admitted to intensive care units in France: incidence, molecular diversity, management and outcome. Intensive Care Med. 2008;34(2):292–299. doi: 10.1007/s00134-007-0865-y
- 3. Bassetti M, Righi E, Ansaldi F et al. A multicenter study of septic shock due to candidemia: outcomes and predictors of mortality. Intensive Care Med. 2014;40(6):839–845. doi: 10.1007/s00134-014-3310-z
- 4. Bassetti M, Giacobbe DR, Vena A et al. Incidence and outcome of invasive candidiasis in intensive care units (ICUs) in Europe: results of the EUCANDICU project. Crit. Care 2019;23(1):219. doi: 10.1186/s13054-019-2497-3
- 5. White PL, Archer AE, Barnes RA. Comparison of non-culture-based methods for detection of systemic fungal infections, with an emphasis on invasive Candida infections. J. Clin. Microbiol. 2005;43(5):2181–2187. doi: 10.1128/JCM.43.5.2181-2187.2005
- 6. Walker B, Powers-Fletcher MV, Schmidt RL et al. Cost-effectiveness analysis of multiplex PCR with magnetic resonance detection versus empiric or blood culture-directed therapy for management of suspected candidemia. J. Clin. Microbiol. 2016;54(3):718–726. doi: 10.1128/JCM.02971-15
- 7. Rouze A, Loridant S, Poissy J et al. Biomarker-based strategy for early discontinuation of empirical antifungal treatment in critically ill patients: a randomized controlled trial. Intensive Care Med. 2017;43(11):1668–1677. doi: 10.1007/s00134-017-4932-8
- 8. Posteraro B, Tumbarello M, De Pascale G et al. (1,3)-beta-d-Glucan-based antifungal treatment in critically ill adults at high risk of candidaemia: an observational study. J. Antimicrob. Chemother. 2016;71(8):2262–2269. doi: 10.1093/jac/dkw112
- 9. Mikulska M, Magnasco L, Signori A et al. Sensitivity of serum beta-D-glucan in candidemia according to Candida species epidemiology in critically ill patients admitted to the intensive care unit. J. Fungi (Basel) 2022;8(9):921. doi: 10.3390/jof8090921
- 10. Martinez-Jimenez MC, Munoz P, Valerio M et al. Candida biomarkers in patients with candidaemia and bacteraemia. J. Antimicrob. Chemother. 2015;70(8):2354–2361. doi: 10.1093/jac/dkv090
- 11. Giannella M, Paolucci M, Roncarati G et al. Potential role of T2Candida in the management of empirical antifungal treatment in patients at high risk of candidaemia: a pilot single-centre study. J. Antimicrob. Chemother. 2018;73(10):2856–2859. doi: 10.1093/jac/dky247
- 12. Giacobbe DR, Signori A, Tumbarello M et al. Desirability of outcome ranking (DOOR) for comparing diagnostic tools and early therapeutic choices in patients with suspected candidemia. Eur. J. Clin. Microbiol. Infect. Dis. 2019;38(2):413–417. doi: 10.1007/s10096-018-3441-1
- 13. Giacobbe DR, Mikulska M, Tumbarello M et al. Combined use of serum (1,3)-beta-D-glucan and procalcitonin for the early differential diagnosis between candidaemia and bacteraemia in intensive care units. Crit. Care 2017;21(1):176. doi: 10.1186/s13054-017-1763-5
- 14. Arendrup MC, Andersen JS, Holten MK et al. Diagnostic performance of T2Candida among ICU patients with risk factors for invasive candidiasis. Open Forum Infect. Dis. 2019;6(5):ofz136. doi: 10.1093/ofid/ofz136
- 15. Yuan S, Sun Y, Xiao X et al. Using machine learning algorithms to predict candidaemia in ICU patients with new-onset systemic inflammatory response syndrome. Front. Med. (Lausanne) 2021;8:720926. doi: 10.3389/fmed.2021.720926 • One of the first studies exploring the use of machine learning techniques for the prediction of candidemia.
- 16. Yoo J, Kim S-H, Hur S et al. Candidemia risk prediction (CanDETEC) model for patients with malignancy: model development and validation in a single-center retrospective study. JMIR Med. Inform. 2021;9(7):e24651. doi: 10.2196/24651 • One of the first studies exploring the use of machine learning techniques for the prediction of candidemia.
- 17. Ripoli A, Sozio E, Sbrana F et al. Personalized machine learning approach to predict candidemia in medical wards. Infection 2020;48(5):749–759. doi: 10.1007/s15010-020-01488-3 • One of the first studies exploring the use of machine learning techniques for the prediction of candidemia.
- 18. Giacobbe DR, Marelli C, Mora S et al. Early diagnosis of candidemia with explainable machine learning on automatically extracted laboratory and microbiological data: results of the AUTO-CAND project. Ann. Med. 2023;55(2):2285454. doi: 10.1080/07853890.2023.2285454 • One of the first studies exploring the use of machine learning techniques for the prediction of candidemia, on a large automatically extracted dataset of laboratory and microbiological variables.
- 19. Giacobbe DR, Mora S, Signori A et al. Validation of an automated system for the extraction of a wide dataset for clinical studies aimed at improving the early diagnosis of candidemia. Diagnostics (Basel) 2023;13(5):961. doi: 10.3390/diagnostics13050961
- 20. Rabhi S, Jakubowicz J, Metzger M-H. Deep learning versus conventional machine learning for detection of healthcare-associated infections in French clinical narratives. Methods Inf. Med. 2019;58(1):31–41. doi: 10.1055/s-0039-1677692
- 21. Mora S, Attene J, Gazzarata R et al. A NLP pipeline for the automatic extraction of a complete microorganism's picture from microbiological notes. J. Pers. Med. 2022;12(9):1424. doi: 10.3390/jpm12091424
- 22. Datta S, Bernstam EV, Roberts K. A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J. Biomed. Inform. 2019;100:103301. doi: 10.1016/j.jbi.2019.103301
- 23. Chen L, Song L, Shao Y et al. Using natural language processing to extract clinically useful information from Chinese electronic medical records. Int. J. Med. Inform. 2019;124:6–12. doi: 10.1016/j.ijmedinf.2019.01.004
- 24. Li Y, Wu Y, Gao Y et al. Machine-learning based prediction of prognostic risk factors in patients with invasive candidiasis infection and bacterial bloodstream infection: a singled centered retrospective study. BMC Infect. Dis. 2022;22(1):150. doi: 10.1186/s12879-022-07125-8
- 25. Hu W-H, Lin S-Y, Hu Y-J et al. Application of machine learning for mortality prediction in patients with candidemia: feasibility verification and comparison with clinical severity scores. Mycoses 2024;67(1):e13667. doi: 10.1111/myc.13667
- 26. Gao Y, Tang M, Li Y et al. Machine-learning based prediction and analysis of prognostic risk factors in patients with candidemia and bacteraemia: a 5-year analysis. PeerJ 2022;10:e13594. doi: 10.7717/peerj.13594
- 27. Mahajan P, Uddin S, Hajati F et al. Ensemble learning for disease prediction: a review. Healthcare (Basel) 2023;11(12):1808. doi: 10.3390/healthcare11121808
- 28. Paphitou NI, Ostrosky-Zeichner L, Rex JH. Rules for identifying patients at increased risk for candidal infections in the surgical intensive care unit: approach to developing practical criteria for systematic use in antifungal prophylaxis trials. Med. Mycol. 2005;43(3):235–243. doi: 10.1080/13693780410001731619
- 29. Michalopoulos AS, Geroulanos S, Mentzelopoulos SD. Determinants of candidemia and candidemia-related death in cardiothoracic ICU patients. Chest 2003;124(6):2244–2255. doi: 10.1378/chest.124.6.2244
- 30. Leon C, Ruiz-Santana S, Saavedra P et al. A bedside scoring system (“Candida score”) for early antifungal treatment in nonneutropenic critically ill patients with Candida colonization. Crit. Care Med. 2006;34(3):730–737. doi: 10.1097/01.CCM.0000202208.37364.7D
- 31. Hermsen ED, Zapapas MK, Maiefski M et al. Validation and comparison of clinical prediction rules for invasive candidiasis in intensive care unit patients: a matched case-control study. Crit. Care 2011;15(4):R198. doi: 10.1186/cc10366
- 32. Guillamet CV, Vazquez R, Micek ST et al. Development and validation of a clinical prediction rule for candidemia in hospitalized patients with severe sepsis and septic shock. J. Crit. Care 2015;30(4):715–720. doi: 10.1016/j.jcrc.2015.03.010
- 33. Bassetti M, Giacobbe DR, Vena A et al. Diagnosis and treatment of candidemia in the intensive care unit. Semin. Respir. Crit. Care Med. 2019;40(4):524–539. doi: 10.1055/s-0039-1693704
- 34. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA 2018;319(13):1317–1318. doi: 10.1001/jama.2017.18391 •• An interesting overview on the emerging role of big data and machine learning in healthcare.
- 35. Giacobbe DR, Signori A, Del Puente F et al. Early detection of sepsis with machine learning techniques: a brief clinical perspective. Front. Med. (Lausanne) 2021;8:617486. doi: 10.3389/fmed.2021.617486
- 36. Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262–e273. doi: 10.1016/S1470-2045(19)30149-4
- 37. Miyazaki T, Kohno S. Current recommendations and importance of antifungal stewardship for the management of invasive candidiasis. Expert Rev. Anti Infect. Ther. 2015;13(9):1171–1183. doi: 10.1586/14787210.2015.1058157
- 38. Kriegl L, Boyer J, Egger M et al. Antifungal stewardship in solid organ transplantation. Transpl. Infect. Dis. 2022;24(5):e13855. doi: 10.1111/tid.13855
- 39. Bassetti M, Giacobbe DR, Vena A et al. Challenges and research priorities to progress the impact of antimicrobial stewardship. Drugs Context 2019;8:212600. doi: 10.7573/dic.212600
- 40. Ananda-Rajah MR, Slavin MA, Thursky KT. The case for antifungal stewardship. Curr. Opin. Infect. Dis. 2012;25(1):107–115. doi: 10.1097/QCO.0b013e32834e0680
- 41. Wang S-H, Zhang Y-D. Advances and challenges of deep learning. Recent Patents Engin. 2023;17(4):1–2. doi: 10.2174/1872212116666220530125230
- 42. Goodswen SJ, Barratt JLN, Kennedy PJ et al. Machine learning and applications in microbiology. FEMS Microbiol. Rev. 2021;45(5):fuab015. doi: 10.1093/femsre/fuab015
- 43. Giacobbe DR, Zhang Y, de la Fuente J. Explainable artificial intelligence and machine learning: novel approaches to face infectious diseases challenges. Ann. Med. 2023;55(2):2286336. doi: 10.1080/07853890.2023.2286336
- 44. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019;1(5):206–215. doi: 10.1038/s42256-019-0048-x
- 45. Ali S, Akhlaq F, Imran AS et al. The enlightening role of explainable artificial intelligence in medical & healthcare domains: a systematic literature review. Comput. Biol. Med. 2023;166:107555. doi: 10.1016/j.compbiomed.2023.107555
- 46. Blobel B, Ruotsalainen P, Oemig F et al. Principles and standards for designing and managing integrable and interoperable transformed health ecosystems. J. Pers. Med. 2023;13(11):1579. doi: 10.3390/jpm13111579
