Abstract
By circumventing the need for a pure colony, MALDI-TOF Mass Spectrometry of bacterial membrane glycolipids (lipid A) has the potential to identify microbes more rapidly than protein-based methods. However, currently available bioinformatics algorithms (e.g. dot products) do not work well with glycolipid mass spectra like those produced by lipid A, the membrane anchor of lipopolysaccharide. To address this issue, we propose a spectral library approach coupled with a machine learning technique to more accurately identify microbes. Here, we demonstrate the performance of the model-based spectral library approach for microbial identification using approximately a thousand mass spectra collected from multidrug-resistant bacteria. At false discovery rates < 1%, our approach identified many more bacterial species than the existing approaches such as the Bruker Biotyper and characterized over 97% of their phenotypes accurately. As the diversity in our glycolipid mass spectral library increases, we anticipate that it will provide valuable information to more rapidly treat infected patients.
Graphical abstract

Despite public health efforts to combat antimicrobial resistance, challenges remain in bacterial identification, particularly for organisms that are antimicrobial resistant.1 To better address this problem and in turn more effectively control the spread of infectious diseases, it is essential to develop accurate, affordable, and timely diagnostic tools.2 Profiling the Gram-negative glycolipid lipid A (and other bacterial membrane glycolipids from Gram-positive bacteria) by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a candidate for such a rapid and low-cost diagnostic tool.3 Lipid A is the primary immunostimulatory component of lipopolysaccharide (LPS) and is responsible for the toxicity of Gram-negative bacteria. Because lipid A structures differ between species/phenotypes in the arrangement of fatty acyl side chains and sugar-associated functional groups, mass spectra generated from lipid A contain information to identify and characterize Gram-negative bacteria.4 Circumventing the need for biological culture to produce a pure colony allows this glycolipid approach to be much faster and cheaper than currently used pathogen detection methods (e.g., morphological/biochemical methods), as well as the protein-based MALDI-TOF MS approach.5–10 The mass spectrometry approach of Leung et al.,3 which is based on profiling bacterial glycolipids like lipid A from Gram-negative microbes and related molecules from Gram-positive microbes, is also more cost-effective than the next generation sequencing approach that analyzes whole bacterial genomes.11
Bioinformatics tools exist that analyze protein-based MALDI-TOF MS mass spectra. For example, FDA-approved software such as Biotyper from Bruker Daltonics and Spectral Archive and Microbial Identification System (SARAMICS) from bioMerieux are currently used in hospital clinical laboratories.7 Recognizing that these tools cannot differentiate closely related bacterial species (e.g. Bacillus cereus group), Yang et al.12 proposed new measures of spectral similarity and a statistical assessment of such identifications. However, these tools were developed for information-rich protein-based MALDI-TOF MS data, not for glycolipid mass spectra, which contain fewer peaks that are unique to species. To fully utilize glycolipid mass spectra for bacterial identification, it is essential to develop bioinformatic tools specific to glycolipid mass spectra like those produced by lipid A and related Gram-positive molecules including lipoteichoic acid and cardiolipin.
Constructing meaningful theoretical lipid A mass spectra with reasonable complexity is very challenging. Wilson et al.13 showed that a Cartesian product algorithm based on membrane glycolipid structure can, in theory, produce >2 billion molecular masses from the lipid A scaffold. However, we observe far fewer meaningful masses representing unique structures in real lipid A mass spectra. Here, we propose a Spectral Library approach that utilizes mass spectra generated by known lipid A structures and related glycolipids from Gram-positive bacteria. Since our algorithm is based on acquired data, we can develop an algorithm that reflects the stochastic nature of bacterial glycolipid ions. Such a spectral library concept has been previously used in proteomics. There, a peptide sequence is determined from a tandem mass spectrum by searching against previously assigned tandem mass spectral libraries of peptides.14–16 The popular scoring approach used in this area involves several variations of dot product analysis. However, because glycolipid mass spectra contain few meaningful masses representing species-specific structures, the traditional dot product approaches did not work well in analyzing membrane glycolipid-based mass spectra. Our previous work also shows that it is more suitable to use a machine learning technique for glycolipid mass spectra identification.4
In this paper, we propose a model-based spectral library approach17 for matching glycolipid mass spectra that we refer to as the Lipid A Spectral Library (LASL). Different from previously proposed spectral library approaches, LASL contains bacterial identification models instead of mass spectra or representative mass spectra. The machine learning model can select key ions in glycolipid mass spectra during its training runs. Thus, it can work better in identifying glycolipid mass spectra than algorithms designed for protein mass spectra. By using a model-based approach, LASL is complex enough to capture the apparent stochastic nature of glycolipid mass spectra better than using only one representative mass spectrum per bacterium.
Here, we first introduce LASL as a model-based spectral library approach and then discuss measures of uncertainty of bacterial identifications. Then, we demonstrate the performance of LASL using nearly a thousand glycolipid mass spectra. Finally, we discuss the limitations and potential of our approach.
Materials and Methods
Data
For our analysis, we used the glycolipid mass spectral dataset published by Leung et al.3 that contained 906 mass spectra from various strains of six microbial species. We consider these 906 glycolipid mass spectra as the main dataset. These mass spectra were generated by negative ion MALDI-TOF-MS analysis. In short, samples were grown in liquid culture and lipids were isolated using the hot ammonium isobutyrate method described by El Hamidi et al.,18 after which they were analyzed by MALDI-TOF-MS in the negative ion mode. The dataset included 404 mass spectra from Acinetobacter baumannii (AB), 79 from Enterobacter cloacae (EC), 55 from Enterococcus faecalis (EF), 207 from Klebsiella pneumoniae (KP), 78 from Pseudomonas aeruginosa (PA), and 83 from Staphylococcus aureus (SA). There were two phenotypes available in the dataset: colistin-susceptible (cs) and colistin-resistant (cr). Colistin (also known as polymyxin E) is used as a major antibiotic for fighting Gram-negative infections. Since colistin is the last resort to treat patients infected by multidrug-resistant bacteria (e.g. multidrug-resistant Acinetobacter baumannii19), the ability to accurately detect colistin-resistant bacteria and monitor their presence is essential. We denote colistin-susceptible Acinetobacter baumannii and colistin-resistant Acinetobacter baumannii as ABcs and ABcr, respectively. Similarly, we denote colistin-susceptible Klebsiella pneumoniae and colistin-resistant Klebsiella pneumoniae as KPcs and KPcr, respectively. Besides this main dataset, we had a supplementary dataset of four lipid A mass spectra generated from the following bacteria: Clostridium difficile, Legionella bozermannii, Salmonella typhimurium, and Yersinia pseudotuberculosis.
All mass spectra were converted to mzXML format using msconvert (v3.0.9393 ProteoWizard), then processed using the MALDIquant (v1.16.2) and MALDIquantForeign (v0.10) R packages.20 Specifically, the mass spectra were square root-transformed and smoothed using a Savitzky-Golay filter.21 Then, the baselines of the mass spectra were corrected using the Statistics-sensitive Non-linear Iterative Peak-clipping (SNIP) algorithm,22 and peak intensities were normalized by their total ion current. The top K = 50 peaks were selected for further analysis.4 Then, we binned peaks by their mass-to-charge ratios with a bin size of 1 Da. The highest peak in each bin was selected. Their masses, (normalized) intensities, and ranks of intensities (across bins) were recorded.
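The peak selection and binning steps described above can be sketched as follows. This is a minimal illustration in Python, not the MALDIquant-based pipeline used in the paper. The function names `top_k_peaks` and `bin_peaks` are hypothetical, and aligning bins by taking the floor of m/z divided by the bin width is an assumption, since the text does not state how bin edges were placed.

```python
import math

def top_k_peaks(peaks, k=50):
    """Keep the k most intense (mz, intensity) peaks."""
    return sorted(peaks, key=lambda p: p[1], reverse=True)[:k]

def bin_peaks(peaks, bin_width=1.0):
    """Keep the most intense peak in each m/z bin.

    Returns {bin_index: (mz, intensity, rank)}, where rank 1 is the most
    intense retained peak across all bins. Bin alignment via floor() is an
    assumption made for this sketch.
    """
    best = {}
    for mz, inten in peaks:
        b = math.floor(mz / bin_width)
        if b not in best or inten > best[b][1]:
            best[b] = (mz, inten)
    # Rank the retained peaks by intensity, highest first.
    ordered = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return {b: (mz, inten, rank)
            for rank, (b, (mz, inten)) in enumerate(ordered, start=1)}

# Hypothetical toy spectrum: two peaks fall into the same 1 Da bin (1910).
example_peaks = [(1910.2, 0.30), (1910.7, 0.10), (1728.1, 0.50), (1404.9, 0.20)]
example_features = bin_peaks(top_k_peaks(example_peaks, k=50))
```

Each retained bin carries the three quantities the text lists as features: the m/z value, the normalized intensity, and the intensity rank.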
Finally, we created decoy mass spectra, which did not belong to any species. Only a training set from the main dataset was used for decoy spectra construction. For bacterial identification, two sets of decoy spectra were constructed. One set was used to train the model (N=1,500) and another was used to test the model performance and measure false discovery rates (N=10,000). Decoy mass spectra were created by extracting K (e.g. K=50) random peaks from N mass spectra and randomly permuting their intensities. For example, if the K sampled peaks are expressed as (mz1, intensity1), (mz2, intensity2), …, (mzK, intensityK), then one example decoy spectrum can contain the following peaks: (mz1, intensity30), (mz2, intensity14), (mz3, intensity11), …, (mzK, intensity2), where the original intensity values are mismatched with their m/z values. Our model performance was not too sensitive to the choice of N as long as N was not too small (e.g. N = 1). In this paper, for each decoy spectrum, we randomly chose N to be an integer between 5 and 10. For the same purpose, we also constructed two sets of decoy mass spectra for AB phenotype identification and two other sets for KP phenotype identification.
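As a rough illustration of this decoy construction, the sketch below pools the peaks of N randomly chosen training spectra, samples K of them, and shuffles the intensities. The function name `make_decoy` is hypothetical, and pooling peaks before sampling is one possible reading of "extracting K random peaks from N mass spectra", not a confirmed detail of the authors' implementation.

```python
import random

def make_decoy(training_spectra, k=50, rng=random):
    """Build one decoy spectrum from a list of training spectra.

    Each spectrum is a list of (mz, intensity) tuples. Pooling the peaks of
    the N chosen spectra before sampling is an assumption of this sketch.
    """
    n = min(rng.randint(5, 10), len(training_spectra))  # N between 5 and 10
    chosen = rng.sample(training_spectra, n)
    pool = [peak for spectrum in chosen for peak in spectrum]
    sampled = rng.sample(pool, min(k, len(pool)))
    mzs = [mz for mz, _ in sampled]
    intensities = [inten for _, inten in sampled]
    rng.shuffle(intensities)  # mismatch intensities with their m/z values
    return list(zip(mzs, intensities))
```

Passing a seeded `random.Random` instance as `rng` makes the decoy sets reproducible, which is convenient when separate decoy sets are needed for training and testing.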
Model-based Spectral Library
The main dataset was divided into training and test sets in a ratio of 2:1. For each set, we added decoy mass spectra, which did not belong to any species. Specifically, we added 1,500 decoy mass spectra to the training set and 10,000 to the test set. Adding decoy mass spectra to the training set improved the model performance, allowing identification of the correct species with higher confidence. Decoy mass spectra in the test set did not overlap with those in the training set and were used to estimate p-values and false discovery rates.
Mass spectra in the training set were used to construct a model-based spectral library. We built bacteria/phenotype identification models using eXtreme Gradient Boosting (XGBoost) with a logistic regression (binary classification) option.23 One model was built for each microbial species. We treated mass spectra from the bacterium of interest as positive cases and mass spectra from other species and decoy mass spectra as negative cases. A total of six bacterial identification models were constructed. Similarly, two phenotype models were built for AB and two other phenotype models were built for KP. The best tuning parameters for the bacteria/phenotype models were selected using 5-fold cross-validation and a grid search (see details about tuning parameters in Supporting Information).
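The one-model-per-species setup can be illustrated by how the binary labels are formed. The sketch below only builds the label vectors; the helper `one_vs_rest_labels` and the toy label list are hypothetical. In practice each label vector, together with the feature matrix, would be passed to a gradient-boosted classifier with a binary logistic objective (in XGBoost this objective is named `binary:logistic`).

```python
def one_vs_rest_labels(labels, target):
    """Return 1 for spectra of the target species, 0 for other species and decoys."""
    return [1 if lab == target else 0 for lab in labels]

# The six species in the main dataset.
species = ["AB", "EC", "EF", "KP", "PA", "SA"]

# Hypothetical training labels; "decoy" marks a decoy spectrum.
train_labels = ["AB", "KP", "decoy", "SA", "AB", "decoy"]

# One binary label vector per species model.
label_sets = {sp: one_vs_rest_labels(train_labels, sp) for sp in species}
```

Because decoys are always labeled 0, every species model learns to score decoy-like spectra low, which is what later allows decoy scores to serve as a null distribution.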
Bacteria/Phenotype Identification
The general framework of Bacteria/Phenotype Identification is displayed in Figure 1. Given a glycolipid mass spectrum, we first identified a bacterial species. If the bacterium was identified with high confidence (e.g. FDR < 0.01) and its phenotype models were available in the Spectral Library, we identified its phenotype. In detail, in Step 1, we measured a predicted probability, pb, that a given mass spectrum was from a microbial species b, where b represented a species in the spectral library. In our setting, the sum of pb over all species b in SL was not necessarily equal to one, where SL was the set of all species in the spectral library, because we chose not to use m-group classification models where m > 2. Noting that in practice a given mass spectrum may not be from a microbial species in the spectral library, we intentionally added decoy mass spectra to the training set and used pb as mere scores to choose the best species model. We refer to pb as matching scores for the rest of the paper. After a matching score of the given mass spectrum was estimated for each species model, the species with the highest matching score was assigned to the mass spectrum as the bacterial identification. We denote the top matching score as pb*.
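Step 1 amounts to taking the species whose model gives the highest matching score. A minimal sketch, with hypothetical score values and a hypothetical helper name `best_match`:

```python
def best_match(scores):
    """Return (species, score) for the model with the highest matching score."""
    best_species = max(scores, key=scores.get)
    return best_species, scores[best_species]

# Hypothetical matching scores from six independent binary models.
# They come from separate models, so they need not sum to one.
scores = {"AB": 0.97, "EC": 0.02, "EF": 0.01, "KP": 0.40, "PA": 0.03, "SA": 0.01}
best_species, top_score = best_match(scores)
```

Here `top_score` plays the role of pb*, which is then assessed against decoy scores in Step 2.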
Figure 1:
General Workflow of LASL.
In Step 2, we measured the uncertainty of bacterial species identifications. We note that the spectral library may not contain a microbial species of interest. Even when the library contains such a species, misidentifications can occur. Since it will be important to be certain about bacterial species identifications made from patients with infections, we calculated p-values and the corresponding false discovery rates (FDR) for the bacterial species identifications and discarded identifications with FDRs > 0.01. The p-values were estimated using 10,000 decoy mass spectra in the test set:
$$\text{p-value} = \frac{1}{N_d} \sum_{i=1}^{N_d} I\left(p_{d_i} > p_{b^*}\right) \qquad (1)$$
where d represents a decoy mass spectrum, Nd is the number of decoy mass spectra, I is the indicator function, pdi is the top matching score of the ith decoy mass spectrum, and pb* is the top matching score of an observed (non-decoy) mass spectrum. In other words, the p-value was calculated by dividing the number of decoy mass spectra whose top matching score was greater than the top matching score of a given non-decoy spectrum by the total number of decoy mass spectra. The false discovery rates were then estimated from these p-values to correct for multiple testing.24
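The decoy-based p-value and the subsequent multiple-testing correction can be sketched as below. The p-value function follows the description above directly. The FDR function implements the standard Benjamini-Hochberg adjustment from the cited reference; whether the authors used exactly this adjusted-p-value form is an assumption, and both function names are hypothetical.

```python
def decoy_p_value(top_score, decoy_top_scores):
    """Fraction of decoy spectra whose top matching score exceeds top_score."""
    exceed = sum(1 for s in decoy_top_scores if s > top_score)
    return exceed / len(decoy_top_scores)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An identification would then be kept when its adjusted value falls below the chosen threshold (0.01 in this paper). With 10,000 decoys, the smallest nonzero p-value that can be resolved is 1/10,000.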
For the glycolipid mass spectra identified as either AB or KP with high confidence (FDRs <0.01), we identified their phenotypes in Step 3 (Figure 1). We note that once phenotypes for other species become available, similar procedures can be incorporated. Similar to species identifications, the given mass spectrum was matched to the available phenotype identification models and their matching scores were calculated. A phenotype with the top matching score was assigned to a given mass spectrum. The corresponding p-value and FDR were estimated.
Results and Discussion
LASL performed very well in identifying many species at low false discovery rates as shown in Figure 2(a). Most LASL identifications had very low false discovery rates. At FDR < 1%, LASL identified about 95% of mass spectra. Out of 305 mass spectra, LASL identified 289 spectra at false discovery rates of 1% or less. Examples of correctly and incorrectly identified spectra are shown in Supporting Information (Figure S1). Since true identifications in the test set were known, we investigated how many mass spectra with FDRs < 0.01 were true identifications. Only one mass spectrum identified by LASL at FDR < 1% was a false identification. Furthermore, we also calculated a true false discovery rate (tFDR), which is the proportion of incorrect identifications among the accepted non-decoy spectra. The dotted line in Figure 2(a) is based on true false discovery rates. As shown here, our FDR estimates were close to the true FDRs and were conservative estimates of the tFDRs.
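The counts reported above can be checked with a short calculation. This is only an arithmetic illustration using the numbers in the text; the helper name `true_fdr` is hypothetical.

```python
def true_fdr(n_incorrect, n_accepted):
    """Proportion of incorrect identifications among accepted non-decoy spectra."""
    return n_incorrect / n_accepted if n_accepted else 0.0

n_test = 305      # non-decoy spectra in the test set
n_accepted = 289  # identified at estimated FDR <= 1%
n_incorrect = 1   # false identifications among those accepted

identified_fraction = n_accepted / n_test    # about 0.948, i.e. "about 95%"
tfdr = true_fdr(n_incorrect, n_accepted)     # about 0.0035, i.e. about 0.35%
```

A tFDR of roughly 0.35% at an estimated threshold of 1% is what makes the estimate conservative: the estimated rate overstates the observed error rate.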
Figure 2:
The LASL performance in species identifications. (a) Estimated FDR Threshold vs. the number of identified species plot. The horizontal green line was the number of (non-decoy) mass spectra in test set. The dotted line was based on true FDR (tFDR) threshold. (b) Receiver Operating Characteristic (ROC) curve. (c) Precision-Recall (PR) curve.
The proposed scores (pb*) were also good at differentiating correct identifications from incorrect identifications (Figure 2(b) and Figure 2(c)). We used decoy mass spectra in the test set to measure the discriminative power of the proposed scores. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was 98.84%. The AUC of the precision-recall (PR) curve was also very high, at 96.01%.
LASL used multiple characteristics of the mass spectra to identify species. Among those, the top 10 important features for the AB model are displayed in Figure 3. (The top 10 important features for other species/phenotypes can be found in Supporting Information.) LASL automatically chose the signature ions, which we define as only those ions necessary and sufficient to correctly identify a microbe, and used them to identify the species. The characteristics of the signature ions were reproducible between both technical and biological replicates with some variations (see Figure S2 in Supporting Information).
Figure 3:
(a) An example mass spectrum for AB. The m/z values related to the top 10 important features are displayed. The zoomed spectrum is also shown. (b) Top 10 important features in the AB model. The intensity, the m/z value, and the rank of the intensity of the highest peak in each m/z bin were the features considered in constructing the species model.
We note that there was no overlap in either decoy or non-decoy spectra between the training and test sets. However, adding decoy spectra to the training set helped us identify more non-decoy spectra as shown in Table 1. Even without decoy spectra in the training set, LASL performed well. About 97% of top-ranked identifications among non-decoy spectra were correct (FDR filtering was not applied at this stage). However, when the proposed decoy spectra were included in the training set, LASL performed even better. About 99% of top-ranked identifications among non-decoy spectra were correct identifications. In addition, including decoy spectra in the training process helped the model distinguish correct identifications from incorrect or decoy identifications. Without decoy spectra in the training process, LASL had a poor precision-recall area under curve (PR AUC), while LASL with decoy-training had a very good PR AUC. Specifically, LASL without decoy-training resulted in 93.80% ROC AUC and 56.51% PR AUC, while LASL with decoy-training resulted in 98.84% ROC AUC and 96.01% PR AUC.
Table 1:
The performance comparison in bacterial identifications with and without decoy-training. The proportion of correctly identified (non-decoy) spectra, area under curve (AUC) for receiver operating characteristic (ROC) curves and precision-recall (PR) curves were used to compare the performance.
| Metric | decoy-training | no decoy-training |
|---|---|---|
| correct top-ranked IDs, % | 99.08 | 97.38 |
| ROC AUC, % | 98.84 | 93.80 |
| PR AUC, % | 96.01 | 56.51 |
We further investigated the following alternative decoy spectra generation strategies: 1) a real number D was added to all the m/z values of K peaks that were extracted from one spectrum, where D was a random number between −100 and 100; and 2) the intensity values of K peaks that were extracted from one spectrum were permuted, where K = 50. At the estimated false discovery rate threshold of 1%, tFDRs were 0.35%, 0.42%, and 0.37% for the original decoy strategy, the alternative decoy strategy 1 (mass shift), and alternative decoy strategy 2 (intensity permutation), respectively. All of these decoy strategies were conservative, but among them, our original decoy strategy was the most conservative one. The proportions of correct top-ranked identifications were 99.08%, 98.69%, and 97.38% for the original decoy strategy, the alternative decoy strategy 1 (mass shift), and alternative decoy strategy 2 (intensity permutation), respectively. Further investigation into the best way to construct decoy spectra is needed in the future.
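The two alternative strategies are simple enough to sketch directly. The function names are hypothetical, and drawing the shift from a uniform distribution is an assumption, since the text only says it was a random number between −100 and 100.

```python
import random

def mass_shift_decoy(peaks, rng=random, max_shift=100.0):
    """Alternative strategy 1: add one random offset D to every m/z value."""
    d = rng.uniform(-max_shift, max_shift)  # uniform draw is an assumption
    return [(mz + d, inten) for mz, inten in peaks]

def intensity_permutation_decoy(peaks, rng=random):
    """Alternative strategy 2: permute intensities among the same m/z values."""
    intensities = [inten for _, inten in peaks]
    rng.shuffle(intensities)
    return [(mz, inten) for (mz, _), inten in zip(peaks, intensities)]
```

Both functions operate on the K peaks of a single spectrum, in contrast to the original strategy, which draws peaks from several spectra.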
Finally, we compared our proposed method to Biotyper25 and bootstrap-based confidence scores.12 We denote bootstrap-based confidence scores based on cosine and relative Euclidean distance as cosine and ieu, respectively. Details about the bootstrap-based confidence scores are shown in Supporting Information. LASL performed better than Biotyper, cosine, and ieu in various aspects. Since the FDR estimation strategy was developed specifically for LASL, we compared performance without making use of estimated false discovery rates. First, LASL was able to correctly identify more (non-decoy) mass spectra than the competing approaches. The proportion of correctly assigned bacteria for LASL was 99.08%, while the competing approaches produced results of 90.79%, 90.49%, and 84.27% for Biotyper, cosine, and ieu, respectively. Most importantly, LASL identified many more bacteria than the competing approaches at true FDRs (tFDRs) < 0.01 (Table 2 and Figure 4). This comparison demonstrated the degree to which a glycolipid-specific bioinformatics tool could improve bacterial identifications from glycolipid mass spectra.
Table 2:
The proportions of true bacterial identifications and the numbers of species identifications among LASL, Biotyper, Cosine correlation (cosine), and intensity-weighted Euclidean similarity (ieu). The proportions of correct IDs were calculated before tFDR thresholds were applied.
| Methods | correct IDs, % | no. of IDs at tFDR < 1% |
|---|---|---|
| LASL | 99.08 | 305 |
| Biotyper | 90.79 | 49 |
| cosine | 90.49 | 135 |
| ieu | 84.27 | 132 |
Figure 4:
True FDR Threshold vs. the number of identified species plot comparing LASL, Biotyper, cosine correlation (cosine), and intensity-weighted Euclidean distance (ieu).
LASL also performed well in identifying phenotypes when phenotypes of species were available in a spectral library (Table 3). At FDR < 1%, LASL identified phenotypes of 130 AB mass spectral entries, which were 97% of AB mass spectra in the test set. At the same threshold, 66 out of 67 KP mass spectra received phenotype identifications. The areas under the ROC and precision-recall curves were over 94% for both AB and KP phenotype identifications. We did not compare our approach to Biotyper, cosine, and ieu in phenotype identifications since the numbers of confidently identified bacteria for Biotyper, cosine, and ieu were substantially smaller than for LASL in the bacterial identification stage.
Table 3:
The performance of LASL in phenotype identifications. The number of identified phenotypes with FDR < 0.01, area under curve (AUC) for receiver operating characteristic (ROC) curves and precision-recall (PR) curves were used to measure the performance.
| Phenotypes | ROC AUC | PR AUC | no. of IDs |
|---|---|---|---|
| ABcr vs. ABcs | 99.90% | 96.98% | 130 (134)* |
| KPcr vs. KPcs | 99.94% | 94.72% | 66 (67)* |
*The numbers in parentheses represent the total numbers of mass spectra identified as either AB or KP at FDR < 1% in the test set.
In this paper, we proposed and tested a model-based spectral library approach for bacterial identifications using glycolipid mass spectra. LASL performed substantially better than the existing bioinformatics approaches in terms of accurately identifying and characterizing bacteria. However, LASL can identify only bacteria that are present in the spectral library. Thus, in the future, it is essential to build a spectral library that contains mass spectra from many different microbes. Noting that the mass spectrometry technology needed for this assay is relatively low-cost, widely distributed in hospital clinics, and easy to use, we anticipate that the diversity of bacteria in this library will increase rapidly in the future.
Another way to overcome the limitation of the existing small library with very few entries is to utilize false discovery rates. In practice, we may not know whether a bacterium of interest is present in a given spectral library, even when the library contains a wide variety of microbes. If a glycolipid mass spectrum of interest is not from bacteria in the spectral library, the best outcome would be that LASL assigns low matching scores (pb*) and high false discovery rates to such spectra. Thus, the identification of those mass spectra would be discarded, not passing the FDR threshold (e.g., 1%, 5%). When we tested LASL with the supplementary dataset, which contained no species from the spectral library, the matching scores for those identifications were very small ranging from 0.01 to 0.02. Their false discovery rates were larger than 5%. High false discovery rates or low matching scores of mass spectra do not necessarily imply that those spectra are not from bacteria in our spectral library. This is because glycolipid mass spectra of bacteria from the spectral library can have low matching scores due to the poor quality of mass spectra (e.g., low signal-to-noise). However, this demonstrated the potential use of our approach in practice in cases where our spectral library does not contain all bacteria. In the future, constructing decoy spectra from all the currently available bacteria using theoretical lipid structures may enable us to more accurately measure false discovery rates for the identifications of bacteria that are not present in the spectral library. More investigation about decoy spectra generation strategies will help us use LASL in practice.
Conclusions
We developed and tested a model-based spectral library framework to analyze MALDI-TOF-MS data of bacterial membrane glycolipids like lipid A from Gram-negative bacteria and related species from Gram-positive bacteria. The performance of LASL was demonstrated using human pathogens notorious for causing hospital-acquired infections (HAIs) and for acquiring resistance to antibiotics. With the proposed framework, the library can be extended easily to contain many more pathogens and organisms of general interest. As the number of microbial entries in the library increases, we believe that LASL will be able to provide valuable information for treatment decisions of infected patients, ultimately helping to improve health care outcomes by decreasing morbidity/mortality as well as decreasing costs.
Supplementary Material
Acknowledgement
This publication was made possible by grants from the National Institute of General Medical Sciences (GM103440 for SR and GAW; 1R15GM126562–01 for GAW; 1R01GM111066–01 for DRG and RKE) from the National Institutes of Health. We thank anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript.
Footnotes
Data Availability
Please contact goodlett@umaryland.edu to obtain data used in this paper.
Supporting Information Available
• SupportingInformation.pdf: Experimental details, supporting figures, and references. This material is available free of charge via the Internet at http://pubs.acs.org/.
Conflict of Interest: DRG and RKE have a significant financial interest in Pataigin LLC, the company developing diagnostic technology for rapid bacterial identification. All other authors declare no competing financial interests.
References
- (1). Willyard C The drug-resistant bacteria that pose the greatest health threats. Nature News 2017, 543, 15.
- (2). Khabbaz RF; Moseley RR; Steiner RJ; Levitt AM; Bell BP Challenges of infectious diseases in the USA. The Lancet 2014, 384, 53–63.
- (3). Leung LM; Fondrie WE; Doi Y; Johnson JK; Strickland DK; Ernst RK; Goodlett DR Identification of the ESKAPE pathogens by mass spectrometric analysis of microbial membrane glycolipids. Scientific Reports 2017, 7, 6403.
- (4). Fondrie WE; Liang T; Oyler BL; Leung LM; Ernst RK; Strickland DK; Goodlett DR Pathogen identification direct from polymicrobial specimens using membrane glycolipids. Scientific Reports 2018, 8, 15857.
- (5). Elssner T; Kostrzewa M; Maier T; Kruppa G Microorganism identification based on MALDI-TOF-MS fingerprints. Detection of Biological Agents for the Prevention of Bioterrorism. 2011; pp 99–113.
- (6). Belkum A.v.; Durand G; Peyret M; Chatellier S; Zambardi G; Schrenzel J; Shortridge D; Engelhardt A; Dunne WM Rapid clinical bacteriology and its future impact. Annals of Laboratory Medicine 2013, 33, 14–27.
- (7). Mather CA; Rivera SF; Butler-Wu SM Comparison of the Bruker Biotyper and Vitek MS matrix-assisted laser desorption ionization-time of flight mass spectrometry systems for identification of mycobacteria using simplified protein extraction protocols. Journal of Clinical Microbiology 2014, 52, 130–138.
- (8). Pence MA; McElvania TeKippe E; Wallace MA; Burnham CA Comparison and optimization of two MALDI-TOF MS platforms for the identification of medically relevant yeast species. European Journal of Clinical Microbiology & Infectious Diseases 2014, 33, 1703–1712.
- (9). Seng P; Drancourt M; Gouriet F; La Scola B; Fournier P; Rolain JM; Raoult D Ongoing revolution in bacteriology: routine identification of bacteria by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 2009, 49, 543–551.
- (10). Clark AE; Kaleta EJ; Arora A; Wolk DM Matrix-assisted laser desorption ionization-time of flight mass spectrometry: a fundamental shift in the routine practice of clinical microbiology. Clinical Microbiology Reviews 2013, 26, 547–603.
- (11). Bertelli C; Greub G Rapid bacterial genome sequencing: methods and applications in clinical microbiology. Clinical Microbiology and Infection 2013, 19, 803–813.
- (12). Yang Y; Lin Y; Chen Z; Gong T; Yang P; Girault H; Liu B; Qiao L Bacterial whole cell typing by mass spectra pattern matching with bootstrapping assessment. Analytical Chemistry 2017, 89, 12556–12561.
- (13). Wilson MC; Liang T; Yoon SH; Leung L; Ernst RK; Goodlett DR A cartesian produce approach to lipid A structure identification. 2015; http://goodlettlab.org/posters/2015_ASMS_Lisa.pdf.
- (14). Lam H; Deutsch EW; Eddes JS; Eng JK; King N; Stein SE; Aebersold R Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007, 7, 655–667.
- (15). Lam H; Deutsch EW; Eddes JS; Eng JK; Stein SE; Aebersold R Building consensus spectral libraries for peptide identification in proteomics. Nature Methods 2008, 5, 873–875.
- (16). Deutsch EW; Perez-Riverol Y; Chalkley RJ; Wilhelm M; Tate S; Sachsenberg T; Walzer M; Käll L; Delanghe B; Böcke S et al. Expanding the use of spectral libraries in proteomics. 2018, 17, 4051–4060.
- (17). Ryu S Computer implemented methods and systems for identifying a species from mass spectra. United States Provisional Patent 62/809285, 2019.
- (18). El Hamidi A; Tirsoaga A; Novikov A; Hussein A; Caroff M Microextraction of bacterial lipid A: easy and rapid method for mass spectrometric characterization. Journal of Lipid Research 2005, 46, 1773–1778.
- (19). Cai Y; Chai D; Wang R; Liang B; Bai N Colistin resistance of Acinetobacter baumannii: clinical reports, mechanisms and antimicrobial strategies. Journal of Antimicrobial Chemotherapy 2012, 67, 1607–1615.
- (20). Gibb S; Strimmer K MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics 2012, 28, 2270–2271.
- (21). Savitzky A; Golay MJE Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry 1964, 36, 1627–1639.
- (22). Ryan CG; Clayton E; Griffin WL; Sie SH; Cousens DR SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 1988, 34, 396–402.
- (23). Chen T; Guestrin C XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016; pp 785–794.
- (24). Benjamini Y; Hochberg Y Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 1995, 57, 289–300.
- (25). Mellmann A; Bimet F; Bizet C; Borovskaya A; Drake R; Eigner U; Fahr A; He Y; Ilina E; Kostrzewa M et al. High interlaboratory reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry-based species identification of nonfermenting bacteria. Journal of Clinical Microbiology 2009, 47, 3732–3734.