Figure 5.
Overview of area under the curve (AUC) values for the identification or early detection of infection, sepsis, septic shock, and severe sepsis using different data types (structured data and text, structured data only, and text only).∗ Each panel shows the study and year, machine learning model,a and natural language processing techniqueb. (A) AUC values for infection identification. Horng et al47 2017: SVM (BoW) has 2 AUC values: 0.86 when using chief complaints and nursing notes, and 0.83 when using chief complaints only. (B) AUC values for early sepsis detection. The AUC values for Amrollahi et al53 are for detection 4 h before sepsis onset, and those for Qin et al55 are averaged over detection 0 to 6 h before sepsis onset. (C) AUC values for early septic shock detection. The AUC values for Hammoud et al54 are for detection 30.64 h before septic shock onset, and those for Liu et al50 are for detection 6.0 to 7.3 h before septic shock onset. (D) AUC values for early sepsis, severe sepsis, or septic shock detection and for sepsis identification in Goh et al.52 Different symbols distinguish the data types. (E) AUC values for early septic shock detection in Culliton et al49 using test set results. (F) AUC values for early septic shock detection in Culliton et al49 using 3-fold validation results.
∗Disclaimer: AUC values should not be directly compared across studies or across the panels for infection, sepsis, severe sepsis, and septic shock. Additionally, the lines connecting points in Figures 5D and 5F do not indicate AUC values changing over time; they only separate the different methods visually.
aMachine learning models: dag: dagging (partition data into disjoint subgroups); GBT: gradient boosted trees; GRU: gated recurrent unit; LSTM: long short-term memory; NB: Naïve Bayes; RF: random forest; SVM: support vector machines.
bNatural language processing techniques: BoW: Bag-of-words; ClinicalBERT: Clinical Bidirectional Encoder Representations from Transformers; ClinicalBERT-m: ClinicalBERT with embeddings obtained by merging all textual features; ClinicalBERT-sf: fine-tuned ClinicalBERT using the concatenated embeddings of each individual textual feature; CM: Amazon Comprehend Medical service for named entity recognition; GloVe: Global Vectors for Word Representation; LDA: Latent Dirichlet Allocation; tf-idf: term frequency-inverse document frequency.
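Every panel reports AUC, so a short illustration of what that number measures may help when reading the figure. The sketch below uses the standard rank (Mann-Whitney) interpretation: AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name `auc_mann_whitney` and the six patient scores are hypothetical illustrations, not data or code from any of the reviewed studies.

```python
from itertools import product


def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive with every negative
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients (1 = sepsis, 0 = no sepsis).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8 of 9 pairs ranked correctly, about 0.889
```

Because the value depends only on how cases are ranked within one cohort, it also hints at why the disclaimer above warns against comparing AUCs across studies with different populations, outcomes, and prediction horizons.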

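Two of the text representations listed in footnote b, bag-of-words (BoW) and tf-idf, can be sketched in a few lines. This is a minimal illustration under stated assumptions: lowercased whitespace tokenization and the unsmoothed textbook weighting tf × log(N / df). Library implementations (for example, scikit-learn's `TfidfVectorizer`) apply smoothing and normalization by default, so their numbers differ. The example complaints are hypothetical and are not taken from Horng et al47 or any other study.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Raw token counts per document (lowercase whitespace split, an assumption)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tf_idf(docs):
    """Unsmoothed tf-idf: (term count / document length) * log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each term counted once per document it appears in
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {term: (cnt / total) * math.log(n_docs / df[term]) for term, cnt in c.items()}
        )
    return weights


# Hypothetical chief complaints.
complaints = ["fever chills", "fever cough", "chest pain"]
print(bag_of_words(complaints)[0])  # raw counts for the first complaint
print(tf_idf(complaints)[0])        # "chills" outweighs the more common "fever"
```

BoW keeps only counts, while tf-idf down-weights terms that appear in many documents; a term present in every document receives a weight of zero under this unsmoothed variant.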