Abstract
Background
Traditional methods of outbreak investigations utilize reactive whole genome sequencing (WGS) to confirm or refute the outbreak. We have implemented WGS surveillance and a machine learning (ML) algorithm for the electronic health record (EHR) to retrospectively detect previously unidentified outbreaks and to determine the responsible transmission routes.
Methods
We performed WGS surveillance to identify and characterize clusters of genetically-related Pseudomonas aeruginosa infections during a 24-month period. ML of the EHR was used to identify potential transmission routes. A manual review of the EHR was performed by an infection preventionist to determine the most likely route and results were compared to the ML algorithm.
Results
We identified a cluster of 6 genetically related P. aeruginosa cases that occurred during a 7-month period. The ML algorithm identified gastroscopy as a potential transmission route for 4 of the 6 patients. Manual EHR review confirmed gastroscopy as the most likely route for 5 patients. This transmission route was confirmed by identification of a genetically related P. aeruginosa isolate incidentally cultured from a gastroscope used on 4 of the 5 patients. Three infections, 2 of which were bloodstream infections, could have been prevented if the ML algorithm had been running in real time.
Conclusions
WGS surveillance combined with an ML algorithm of the EHR identified a previously undetected outbreak of gastroscope-associated P. aeruginosa infections. These results underscore the value of WGS surveillance and ML of the EHR for enhancing outbreak detection in hospitals and preventing serious infections.
Keywords: outbreak detection, healthcare-associated infections, whole genome sequencing surveillance, Pseudomonas aeruginosa, machine learning
Whole genome sequence surveillance combined with machine learning of the electronic health record discovered a previously undetected outbreak of Pseudomonas aeruginosa infections and accurately defined the transmission route as a contaminated gastroscope.
Outbreaks of healthcare-associated infections (HAIs) are serious events causing high morbidity, mortality, and burden on healthcare systems [1, 2]. A series of infections caused by the same pathogen with geotemporal clustering within the hospital may alert infection preventionists to investigate a potential transmission event or outbreak. Traditionally, hospital epidemiology involves creating a case definition, making a line list of potential patients, reviewing patient medical records for common exposures, performing an environmental investigation, and auditing health care practices [3]. This manual process is labor intensive and involves multiple staff dedicated to the investigation, which diverts valuable resources from routine infection prevention practice. In addition, infection prevention programs that focus on geotemporal clustering for outbreak detection may miss outbreaks and generate false positive signals, which limits the utility of the approach [4, 5]. Further, not all patients within the investigation may be involved in the outbreak. Whole genome sequencing (WGS) in response to a suspected outbreak (reactive WGS) is one method of confirming genetic relatedness and can therefore help confirm or refute presumptive transmission patterns.
In contrast to using WGS to confirm epidemiologically-defined transmission events, WGS surveillance—the routine prospective sequencing of select pathogens collected from hospitalized patients—has the potential to identify outbreaks more accurately and promptly [6]. However, while WGS surveillance can identify outbreaks, some studies using this method alone have been unable to identify the responsible transmission route [7–10].
In late 2016, we began development and validation of the Enhanced Detection System for Healthcare-Associated Transmission (EDS-HAT), which utilizes machine learning (ML) of the electronic health record (EHR) to identify the transmission route responsible for outbreaks detected by WGS surveillance. The EHR contains a multitude of clinical and epidemiologic information that can assist with outbreak investigations, including medical procedures and location data. We previously demonstrated that our EHR ML algorithm automatically detected the correct transmission routes for outbreaks defined by molecular methods [11, 12]. To further investigate this approach, we retrospectively performed 2.5 years of WGS surveillance and combined these data with our ML algorithm. During this analysis, we uncovered a previously undetected Pseudomonas aeruginosa outbreak.
METHODS
Study Setting
This study was conducted at the University of Pittsburgh Medical Center (UPMC) Presbyterian Hospital, an adult medical/surgical tertiary care hospital with 758 total beds, 134 critical care unit beds, more than 32 000 yearly inpatient admissions, and over 400 solid organ transplants per year. Ethics approval was obtained from the Institutional Review Board of the University of Pittsburgh.
Isolate Collection
We collected pre-specified EDS-HAT bacterial pathogens isolated from clinical cultures, including P. aeruginosa, between November 2016 and May 2019. Inclusion criteria were hospital admission greater than 2 days before the culture date and/or a recent inpatient or outpatient UPMC hospital encounter in the 30 days before the culture date. Any EDS-HAT pathogen, including P. aeruginosa, cultured from a separate program of microbiological surveillance of endoscopes (eg, gastroscopes, bronchoscopes) as a component of our routine infection prevention practice also underwent WGS. Surveillance scope cultures were performed in the clinical microbiology laboratory using the filtration method promulgated by the Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA) [13]. If a particular scope was positive it was examined for mechanical defects, and if none were found the scope was again treated with high-level disinfection and recultured to ensure that it was negative.
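The clinical isolate inclusion rule described above can be written as a small predicate. This is only an illustrative sketch: the function name `meets_inclusion`, its arguments, and the exact handling of boundary days (for example, whether a same-day encounter counts) are assumptions and are not taken from the EDS-HAT implementation.

```python
from datetime import date

def meets_inclusion(culture_date, admission_date=None, prior_encounter_date=None):
    """Return True if an isolate meets either stated inclusion criterion.

    Criterion 1: hospital admission more than 2 days before the culture date.
    Criterion 2: an inpatient or outpatient encounter in the 30 days before
    the culture date (boundary handling here is an assumption).
    """
    long_stay = (
        admission_date is not None
        and (culture_date - admission_date).days > 2
    )
    recent_encounter = (
        prior_encounter_date is not None
        and 0 < (culture_date - prior_encounter_date).days <= 30
    )
    return long_stay or recent_encounter

# Hypothetical dates, for illustration only.
admitted_5_days_before = meets_inclusion(date(2017, 3, 10), admission_date=date(2017, 3, 5))
admitted_1_day_before = meets_inclusion(date(2017, 3, 10), admission_date=date(2017, 3, 9))
```

In this sketch the two criteria are joined with a logical OR, matching the "and/or" wording of the text.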
Whole Genome Sequencing of P. aeruginosa
WGS was performed as previously described [14]. Briefly, following DNA extraction and library preparation, sequencing was performed on the NextSeq 550 platform (Illumina, San Diego, CA). Bacterial species were assigned by k-mer clustering with Kraken v1.0 [15], and genomes were assembled with SPAdes v3.13 [16]. Genomes were annotated with Prokka v1.14 [17], and multi-locus sequence types (STs) were assigned using PubMLST typing schemes with the mlst program (https://github.com/tseemann/mlst) [18]. Selected isolates were also resequenced with long-read technology on a MinION device (Oxford Nanopore Technologies, Oxford, United Kingdom).
Pairwise single nucleotide polymorphisms (SNPs) were calculated for isolates belonging to all P. aeruginosa STs with >2 isolates using snippy (https://github.com/tseemann/snippy), and core genome phylogenies based on aligned core SNPs were constructed for each ST using RAxML v8.0.26 by running 100 bootstrap replicates under the generalized time-reversible model (GTRCAT) and Lewis correction for ascertainment bias [19]. Clusters of genetically related P. aeruginosa isolates were defined as isolates from >1 patient having ≤15 pairwise core SNPs [14].
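To make the cluster definition concrete, the following sketch groups isolates whose pairwise core SNP distance is at most 15 and keeps only groups that span more than one patient. It assumes single-linkage grouping, which the text does not specify, and the names `snp_clusters`, `pairwise_snps`, and `patient_of` are hypothetical. The distances in the example are invented for illustration and are not study data.

```python
SNP_THRESHOLD = 15  # cluster cutoff stated in the text (<=15 pairwise core SNPs)

def snp_clusters(pairwise_snps, patient_of, threshold=SNP_THRESHOLD):
    """Group isolates by single linkage on pairwise SNP distance.

    pairwise_snps: dict mapping (isolate_a, isolate_b) -> SNP count
    patient_of:    dict mapping isolate -> patient identifier
    Returns a list of sets of isolates, each spanning more than one patient.
    """
    parent = {iso: iso for iso in patient_of}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of isolates at or below the threshold.
    for (a, b), dist in pairwise_snps.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for iso in patient_of:
        groups.setdefault(find(iso), set()).add(iso)

    # Keep only groups containing isolates from more than one patient.
    return [g for g in groups.values()
            if len({patient_of[i] for i in g}) > 1]

# Invented example: iso1 and iso2 come from different patients and are close;
# iso3 and iso4 are identical but come from the same patient.
patient_of = {"iso1": "P1", "iso2": "P2", "iso3": "P3", "iso4": "P3"}
pairwise_snps = {
    ("iso1", "iso2"): 2, ("iso1", "iso3"): 64, ("iso1", "iso4"): 64,
    ("iso2", "iso3"): 66, ("iso2", "iso4"): 66, ("iso3", "iso4"): 0,
}
clusters = snp_clusters(pairwise_snps, patient_of)
```

Here only the iso1/iso2 group qualifies, because the iso3/iso4 pair comes from a single patient.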
Extraction and Processing of EHR Data
Extraction and processing of EHR data were performed using a data mining algorithm as previously described [12]. In brief, all inpatient, emergency room, and same-day-surgery encounters were mined for charge codes and microbiologic data. A secure research database was utilized with each patient assigned a unique identification number (ID) using De-ID software (De-ID Data, Philadelphia, PA). Information from healthcare workers who wrote and signed clinical notes was also extracted and de-identified. Charge codes representing medical procedures (eg, bronchoscopy) were analyzed individually and also as user-defined group codes if there were multiple codes representing the procedure.
Machine Learning Algorithm
Machine learning algorithms based on Bayesian inference were designed and applied to defined lists of potential transmission routes using a case-control approach with a scoring function and optimization process, as previously described [11, 12]. Cases were defined as patients whose clinical isolates differed by ≤15 pairwise SNPs, whereas controls were patients who were hospitalized during the 30 days prior to the case patients’ culture date and did not test positive for the genetically related isolate. Transmission routes with a significant odds ratio (P-value less than 0.05) were ranked by likelihood of transmission. The top 10 transmission routes were categorized by type (procedures, locations, and providers) and further reviewed manually for accuracy and biologic plausibility. The EHR was also manually reviewed to explore other possible transmission routes that were not captured in the ML results.
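As a simplified illustration of the case-control comparison, the sketch below computes an odds ratio and a one-sided Fisher exact P-value for a single candidate exposure from a 2×2 table. This is a stand-in for intuition only; it is not the Bayesian scoring function of EDS-HAT described in references 11 and 12. The function names and the counts in the example are hypothetical, and the 0.5 continuity correction for zero cells is an added assumption.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P-value: P(exposed cases >= a) under the null."""
    n_cases = a + b
    n_exposed = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_exposed)
    tail = sum(
        comb(n_exposed, k) * comb(n_total - n_exposed, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_cases)

# Hypothetical counts: 4 of 6 cases exposed, 10 of 1000 controls exposed.
example_or = odds_ratio(4, 2, 10, 990)
example_p = fisher_one_sided(4, 2, 10, 990)
```

With these invented counts the odds ratio is 198. In a workflow like the one described above, exposures passing the P-value cutoff would then be ranked and reviewed manually for biologic plausibility.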
Infections Prevented
The number of potential infections prevented had EDS-HAT been running in real time was calculated using methods previously described [12]. Seven and 14 days from the initially detected transmission route were used as a potential intervention timeline. All subsequent infections with the same transmission route were considered as preventable. We estimated costs saved due to preventable infections using AHRQ’s (Agency for Healthcare Research and Quality) Healthcare Cost and Utilization Project data [20]. The costs were adjusted to 2020 using the medical component of the Consumer Price Index [21].
RESULTS
There were 31/882 (3.5%) genetically related P. aeruginosa isolates across 10 WGS-defined clusters collected between November 2016 and May 2019. One cluster, designated cluster A, contained P. aeruginosa isolates collected from 6 patients over a 7-month period (Table 1). A gastroscope surveillance culture collected from Gastroscope A in month 8 grew a P. aeruginosa isolate that by WGS was included in cluster A. Isolates within cluster A belonged to ST27 and had between 0 and 9 core SNPs in pairwise comparisons (median pairwise SNPs = 2) of isolate genomes. The most closely related P. aeruginosa genome outside of the cluster had 64 SNPs compared to isolates within the cluster. All isolates were susceptible to piperacillin/tazobactam, cefepime, ciprofloxacin, and the aminoglycosides, and resistant or intermediate to aztreonam and meropenem.
Table 1.
Patient | Age | Infection | Culture Day | Gastroscope A exposure day | Gastroscope B exposure day | Days from Gastroscope A exposure to positive clinical culture | Days from Gastroscope B exposure to positive clinical culture | Outcome |
---|---|---|---|---|---|---|---|---|
1 | 89 | Urinary tract infection | 17 | 1 | - | 16 | - | Discharged to skilled nursing facility |
2 | 67 | Pneumonia | 55 | 43 | - | 12 | - | Discharged to home |
3 | 82 | Bacteremia | 80 | 66 | - | 14 | - | Discharged to hospice |
4 | 66 | Bacteremia | 124 | 123 | 192 | 1 | - | Discharged to long-term acute care facility |
5 | 52 | Pneumonia | 201 | - | 196 | - | 5 | Discharged to home |
6 | 66 | Pneumonia | 208 | - | - | - | - | Deceased |
Note: Day 1 is the day of gastroscope exposure for patient 1. The positive culture from Gastroscope A was obtained on day 219 of the outbreak.
A total of 190 transmission route groups, including procedure codes, locations, and providers, were explored for cluster A. There were 23 top-ranked transmission routes identified by the ML algorithm. Gastroscopy (rank #3; patients 1, 2, 4, and 5; OR 338.04, P-value < .01) and bronchoscopy (rank #1; patients 1, 2, and 4–6; OR 420.32, P-value < .001) were initially deemed to be plausible routes.
Traditional infection prevention methods did not initially identify these 6 patients as a cluster. Manual review of the EHR revealed that multiple different bronchoscopes on different nursing units were used, suggesting no common bronchoscope exposure and therefore a lack of biologic plausibility. There was no single nursing unit common to all patients, no prior admission from outside facilities, and no other medical procedure not detected by EDS-HAT. Thus, manual EHR review did not identify any additional plausible transmission routes.
A timeline of the ST27 P. aeruginosa outbreak considering gastroscopes as a potential transmission route is illustrated in Figure 1. Patients 1–4 had gastroscopy with gastroscope A, the gastroscope with the genetically related isolate cultured in month 8, with Patient 1’s gastroscopy performed on day 1 of the outbreak. Patient 3 underwent gastroscopy with gastroscope A but did not have a charge code that was detected by the algorithm. Patient 4 later underwent gastroscopy with gastroscope B, which was subsequently used on patient 5. Patient 6 was noted to have a procedure with an endoscope on day 197 (12 days prior to culture date), but no associated documentation for the endoscope type, serial number, or charge code could be identified. No other epidemiological link was identified for patient 6.
Gastroscopy was first identified as a potential transmission route by WGS on cluster day 43, the date of the Gastroscope A procedure for patient 2 (Figure 2). Using a 14-day intervention period, scope A could have been identified as a source of P. aeruginosa transmission and removed from service on day 57. Therefore, the infections of patients 3 and 4 (both with bacteremia) and patient 5 (pneumonia) could potentially have been prevented if EDS-HAT had been running in real time, representing a potential cost savings of $50 370.
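The preventable-infection count can be reproduced with a short calculation from the first gastroscope exposure days in Table 1. This is a minimal sketch: the function name `preventable_patients` is hypothetical, and treating an exposure as preventable only when it falls strictly after the intervention day is an assumption about boundary handling.

```python
def preventable_patients(exposure_day, detection_day, window_days):
    """Return patients whose implicated exposure falls after the intervention day.

    exposure_day:  dict mapping patient number -> day of first implicated exposure
    detection_day: day the transmission route was first detected
    window_days:   assumed time needed to intervene (7 or 14 days in the text)
    """
    intervention_day = detection_day + window_days
    return sorted(p for p, day in exposure_day.items() if day > intervention_day)

# First gastroscope exposure day per patient, taken from Table 1.
# Patient 6 had no documented gastroscope exposure and is omitted.
exposure_day = {1: 1, 2: 43, 3: 66, 4: 123, 5: 196}

prevented_14 = preventable_patients(exposure_day, detection_day=43, window_days=14)
prevented_7 = preventable_patients(exposure_day, detection_day=43, window_days=7)
```

Both the 14-day window (intervention on day 57) and the 7-day window (day 50) yield patients 3, 4, and 5, consistent with the three infections reported as potentially preventable.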
DISCUSSION
In this study, whole genome sequence surveillance combined with machine learning of the EHR identified a previously undetected outbreak caused by a gastroscope that was contaminated with P. aeruginosa. Although the outbreak was detected by EDS-HAT, the P. aeruginosa isolate cultured from Gastroscope A helped to confirm gastroscopy as the correct transmission route. Infections from contaminated endoscopes have been previously described [22–24]. To our knowledge, there is only one other report describing a P. aeruginosa outbreak related to a contaminated gastroscope [25]. The prior report identified the outbreak given the presence of 2 extended-spectrum β-lactamase Pseudomonas isolates, a rare finding within their institution, in a 1-month period. Our study describes a cluster of P. aeruginosa isolates with an unremarkable susceptibility profile, which may explain in part why the outbreak went undetected by traditional epidemiologic methods.
There are several salient features about this outbreak investigation. First, WGS surveillance was able to “connect the dots” between patients in this outbreak that were not detected by traditional hospital epidemiology methods. Similarly, we reported that EDS-HAT detected an outbreak of vancomycin-resistant Enterococcus faecium infections caused by unsafe technique for injecting sterile contrast during interventional radiology procedures, a transmission route that was previously unrecognized [14]. Second, automated ML of the EHR can identify the transmission routes responsible for outbreaks detected by WGS surveillance and has the potential to prevent infections, as well as associated healthcare costs and patient morbidity. Third, the availability of a P. aeruginosa positive gastroscope surveillance culture confirmed the transmission route identified by the ML algorithm and supports a role of environmental cultures for hospital epidemiology.
There are limitations to this study. First, this analysis only included P. aeruginosa isolates that were obtained within our hospital as part of clinical care and were selected based upon presumptive criteria for healthcare-associated infections. Therefore, transmission events may be missed if a clinical culture was not obtained or the infection was not classified as healthcare-acquired based on hospital length of stay. Second, the ML algorithm relies on the presence of charge codes in the patient EHR to identify transmission routes. Patient 3 did not have a charge code for gastroscopy but, upon manual EHR review, was found to have had a procedure with gastroscope A. Thus, inconsistencies in charge codes may result in decreased sensitivity to detect all transmission events.
Ongoing retrospective investigations using EDS-HAT have identified multiple additional outbreaks and suspected transmission routes. Our proof-of-concept analysis [12] and the present study support our assertion that EDS-HAT is an effective tool for detecting outbreaks and identifying novel and previously unrecognized routes of transmission. This investigation supports the use of WGS surveillance and ML algorithms to identify highly impactful outbreaks for which a substantial proportion of infections could be prevented. Widespread use of EDS-HAT by healthcare institutions has the potential to increase patient safety substantially, decrease healthcare costs, and change the paradigm for outbreak detection and investigation in hospitals.
Notes
Acknowledgments. We thank Daniel Snyder, Daniel Evans, and Hayley Nordstrom for assistance with whole genome sequencing and data analysis.
Financial support. This study was funded in part by the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH) (R21AI109459 and R01AI127472). NIH played no role in data collection, analysis, or interpretation; study design; writing of the manuscript; or decision to submit for publication.
Potential conflicts of interest. G. S. reports Scientific Advisory fees from Infectious Disease Connect, outside the submitted work. All other authors have no potential conflict. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.
Nonstandard Abbreviations
EDS-HAT, Enhanced Detection System for Healthcare-Associated Transmission; EHR, electronic health record; HAI, healthcare-associated infection; ML, machine learning; SNP, single nucleotide polymorphism; ST, sequence type; WGS, whole genome sequencing.
References
- 1. Magill SS, Edwards JR, Bamberg W, et al. Multistate point-prevalence survey of health care–associated infections. N Engl J Med 2014; 370:1198–208.
- 2. Scott R. The direct medical costs of healthcare-associated infections in U.S. hospitals and the benefits of prevention. Centers for Disease Control and Prevention website. Published online 2009:16. Available at: http://www.cdc.go/A/df/a/cott_CostPaper.pdf. Accessed 12 August 2020.
- 3. Principles of Epidemiology: Lesson 6, Section 2|Self-Study Course SS1978|CDC. Published October 29, 2019. Available at: https://www.cdc.gov/csels/dsepd/ss1978/lesson6/section2.html. Accessed 9 February 2020.
- 4. Lefebvre A, Bertrand X, Vanhems P, et al. Detection of temporal clusters of healthcare-associated infections or colonizations with Pseudomonas aeruginosa in two hospitals: comparison of SaTScan and WHONET software packages. PLoS One 2015; 10:e0139920.
- 5. Huang SS, Yokoe DS, Stelling J, et al. Automated detection of infectious disease outbreaks in hospitals: a retrospective cohort study. PLoS Med 2010; 7:e1000238.
- 6. Popovich KJ, Snitkin ES. Whole genome sequencing; implications for infection prevention and outbreak investigations. Curr Infect Dis Rep 2017; 19:15.
- 7. Raven KE, Gouliouris T, Brodrick H, et al. Complex routes of nosocomial vancomycin-resistant Enterococcus faecium transmission revealed by genome sequencing. Clin Infect Dis 2017; 64:886–93.
- 8. Mellmann A, Bletz S, Böking T, et al. Real-time genome sequencing of resistant bacteria provides precision infection control in an institutional setting. J Clin Microbiol 2016; 54:2874–81.
- 9. Price JR, Cole K, Bexley A, et al. Transmission of Staphylococcus aureus between health-care workers, the environment, and patients in an intensive care unit: a longitudinal cohort study based on whole-genome sequencing. Lancet Infect Dis 2017; 17:207–14.
- 10. Robilotti E, Huang W, Babady NE, Chen D, Kamboj M. Transmission of Clostridioides difficile infection (CDI) from patients less than 3 years of age in a pediatric oncology setting. Infect Control Hosp Epidemiol 2020; 41:233–6.
- 11. Miller JK, Chen J, Sundermann A, et al. Statistical outbreak detection by joining medical records and pathogen similarity. J Biomed Inform 2019; 91:103126.
- 12. Sundermann AJ, Miller JK, Marsh JW, et al. Automated data mining of the electronic health record for investigation of healthcare-associated outbreaks. Infect Control Hosp Epidemiol 2019; 40:314–9.
- 13. U.S. Food and Drug Administration/Centers for Disease Control and Prevention/American Society for Microbiology Working Group on Duodenoscope Culturing. Duodenoscope surveillance sampling and culturing. Published online February 2018. Available at: https://www.fda.gov/media/111081/download. Accessed 12 August 2020.
- 14. Sundermann AJ, Babiker A, Marsh JW, et al. Outbreak of vancomycin-resistant Enterococcus faecium in interventional radiology: detection through whole genome sequencing-based surveillance. Clin Infect Dis 2020; 71:2336–43.
- 15. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014; 15:R46.
- 16. Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455–77.
- 17. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 30:2068–9.
- 18. Jolley KA, Maiden MC. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics 2010; 11:595.
- 19. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014; 30:1312–3.
- 20. Healthcare Cost and Utilization Project (HCUPnet). HCUPnet, healthcare cost and utilization project. Rockville, MD: Agency for Healthcare Research and Quality. Available at: https://hcupnet.ahrq.gov. Accessed 31 August 2020.
- 21. Bureau of Labor Statistics. Medical care in US city average, all urban consumers, not seasonally adjusted. Washington, DC: Division of Consumer Prices and Price Indexes, 2020. Accessed 31 August 2020.
- 22. Marsh JW, Mustapha MM, Griffith MP, et al. Evolution of outbreak-causing carbapenem-resistant Klebsiella pneumoniae ST258 at a tertiary care hospital over 8 years. mBio 2019; 10. doi: 10.1128/mBio.01945-19.
- 23. Galdys AL, Marsh JW, Delgado E, et al. Bronchoscope-associated clusters of multidrug-resistant Pseudomonas aeruginosa and carbapenem-resistant Klebsiella pneumoniae. Infect Control Hosp Epidemiol 2019; 40:40–6.
- 24. Mustapha MM, Marsh JW, Ezeonwuka CD, et al. Draft genome sequences of four hospital-associated Pseudomonas putida isolates. Genome Announc 2016; 4. doi: 10.1128/genomeA.01039-16.
- 25. Bajolet O, Ciocan D, Vallet C, et al. Gastroscopy-associated transmission of extended-spectrum beta-lactamase-producing Pseudomonas aeruginosa. J Hosp Infect 2013; 83:341–3.