KEYWORDS: diagnosis, metagenomics
ABSTRACT
Metagenomic shotgun sequencing for the identification of pathogens is being increasingly utilized as a diagnostic method. Interpretation of large and complicated data sets is a significant challenge, for which multiple commercial tools have been developed. Three commercial metagenomic shotgun sequencing tools, CosmosID, One Codex, and IDbyDNA, were compared to determine whether they result in similar interpretations of the same sequencing data. We selected 24 diverse samples from a previously characterized data set derived from DNA extracted from biofilms dislodged from the surfaces of resected arthroplasties (sonicate fluid). Sequencing data sets were analyzed using the three commercial tools and compared to culture results and prior metagenomic analysis interpretation. Identical interpretations from all three tools occurred for 6 samples. The total number of species identified included 28 by CosmosID, 59 by One Codex, and 41 by IDbyDNA. All of the tools performed similarly in detecting those microorganisms identified by culture, including polymicrobial mixes. These data show that while all of the tools performed well overall, there were some differences, particularly in their predilection for identifying low-abundance or contaminant organisms as present.
INTRODUCTION
Metagenomic shotgun sequencing, a method in which nucleic acid from a sample is sequenced to identify and characterize microorganisms present in the sample, is being evaluated and used with increasing frequency for clinical microbiology diagnostics (1–4). This approach is appealing as it is a single test that can theoretically detect all bacteria, fungi, viruses, or protozoa present in a sample without preconceived notions with regard to possible pathogens. Analyses of genome content from the same test, such as genes or mutations conferring antibiotic resistance, also have the potential to impact therapeutic decisions, while other genetic content may be useful for epidemiological purposes.
While these methods hold great promise as diagnostic tools, several barriers limit their routine implementation in clinical diagnostic laboratories (5, 6). These include, but are not limited to, the cost and time associated with next-generation sequencing, regulatory challenges for tests without predefined possible results, and technical challenges such as optimizing sample preparation to maximize detection of potential pathogens while minimizing background signal and contamination.
Another barrier, particularly for those new to metagenomic shotgun sequencing, is data analysis and interpretation of results. Sequence data comprise large files containing millions of short sequencing reads per sample. Several tools with a variety of underlying methods for sequence analysis have been developed in the past decade and have been reviewed and compared head to head (7–10). Many were developed by research laboratories and are free and open access. However, many require significant computational resources, and most lack refined interfaces that allow straightforward analysis of complicated data sets. Not surprisingly, multiple commercial entities have developed bioinformatics products intended to fill this void by being fast, accurate, and easy to use through a more user-friendly interface. These are especially attractive for clinical laboratories looking to avoid the need for extensive specialty training or experience to conduct these analyses.
In the study reported here, three commercially available metagenomic analytical tools were employed to detect and identify pathogens in clinical samples to determine whether the choice of tool affected the identification of pathogens. We previously published the results of metagenomic sequencing of a large sample set of resected joint arthroplasties consisting of both infected samples (prosthetic joint infection [PJI]) and uninfected samples (aseptic failure [AF]) (11). We also recently reported the use of the CosmosID platform for evaluation of this entire set of samples (12). In this study, a curated set of those previously characterized samples was selected to represent a range of sample complexity, including uninfected samples and monomicrobial and polymicrobial infections, along with various signal strengths, from samples that were weakly positive to those with a high percentage of microbial reads. The data were then analyzed with the three commercial products.
MATERIALS AND METHODS
Samples.
The metagenomic sequencing data sets used for comparison were generated as previously reported (11). They consisted of selected sonicate fluid samples generated from total joint arthroplasties removed for reasons of infection or for noninfectious causes. Arthroplasty components were sent to the clinical microbiology laboratory in a sterile container, after which Ringer’s solution was added and the sample underwent sonication and vortex mixing to create sonicate fluid (13). Intraoperative cultures (tissue cultures and synovial fluid) were also collected. The negative controls of Ringer’s solution were processed in the same way as the sonicate fluid. The samples were collected under Mayo Clinic Institutional Review Board protocol 09-00808 and have been previously described (11, 12).
Metagenomic data sets.
Data files were submitted as FASTQ files (One Codex and IDbyDNA) or FASTA files (CosmosID) as recommended by the developers. No quality filtering of reads or prefiltering of human reads was performed prior to submission as this was not recommended by any developer. Due to security concerns regarding transmission of files without depletion of human subject sequences, files were not submitted through the typical website interfaces. Instead, an institutionally approved secure file transfer system (Signiant Media Exchange) was used to transfer files containing human sequences. Samples were analyzed with software versions corresponding to the following dates: January 2017 for IDbyDNA, February 2017 for One Codex, and March 2017 for CosmosID.
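Because the tools accepted different input formats, the same reads had to exist as both FASTQ and FASTA. The conversion method used is not stated in the text; the following is only a minimal Python sketch of one way to do it, assuming simple four-line FASTQ records with no line-wrapped sequences. The function name `fastq_to_fasta` is hypothetical. In keeping with the text, it performs no quality filtering and no removal of human reads.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write the header and sequence of each 4-line FASTQ record as FASTA.

    Quality strings are discarded; no reads are filtered or removed.
    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate blank lines between records
            continue
        sequence = fastq_handle.readline().rstrip("\n")
        plus = fastq_handle.readline()
        fastq_handle.readline()  # quality line, not needed for FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near read {count + 1}")
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count


# Illustrative two-read input (made-up reads, not study data).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2 desc\nGGCC\n+\n!!!!\n")
converted = io.StringIO()
n_reads = fastq_to_fasta(example, converted)
```

For real data sets with millions of reads, the same function can be passed file handles opened with `open(...)` or `gzip.open(..., "rt")` so that records are streamed rather than held in memory.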
Interpretation of results.
Cultures were considered positive if organisms were isolated at >20 CFU/10 ml sonicate fluid, if there were <20 CFU/10 ml sonicate fluid and at least one other culture was positive for the same organism, or if two tissue and/or synovial fluid culture specimens were positive for the same organism. The 20 CFU/10 ml is the cutoff used by the clinical microbiology laboratory for significant levels; organisms present in smaller amounts are not routinely identified (13). Any cultures positive for fungal and mycobacterial species were considered positive as quantification of these was not routinely reported. Previous metagenomic interpretations based on an analysis pipeline centered on the Livermore Metagenomics Analysis Toolkit (LMAT) (14) were as previously reported (11). Interpretations were based on meeting a minimal threshold number of reads, percentage of microbial reads, and relative coverage of the reference genome for a species. Samples were analyzed using default settings for both CosmosID and One Codex. CosmosID interpretation was as previously reported (12) (minimum thresholds of frequency number of >15, total match percentage of >30% for Acinetobacter and Cutibacterium species or >3% for all other species, and relative abundance of >10%), with one additional modification. To improve detection of polymicrobial infections, if a microorganism met criteria with the thresholds described above, then any additional identifications within that same sample no longer required meeting a minimum level of 10% relative abundance. One Codex interpretations were based on identification of organisms as present by the software analysis. This included any organism given a value corresponding to a percentage of abundance, including those listed as present at less than 1% abundance. 
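The culture criteria and the CosmosID thresholds above are simple decision rules, and a short Python sketch may make them easier to follow. This is an illustration only, not the laboratory's or CosmosID's software: the function names, the `Hit` fields, and the example values are hypothetical, and the genus is assumed to be the first word of the species name. How a count of exactly 20 CFU/10 ml is handled is left to the caller through the `sonicate_meets_cutoff` flag.

```python
from dataclasses import dataclass

# Genera held to the stricter total-match threshold (>30%) in the text.
STRICT_GENERA = {"Acinetobacter", "Cutibacterium"}


def culture_positive(sonicate_growth, sonicate_meets_cutoff, other_positive,
                     fungal_or_mycobacterial=False):
    """Apply the culture-positivity rules to one organism.

    sonicate_growth: organism grew in sonicate fluid culture.
    sonicate_meets_cutoff: growth reached the 20 CFU/10 ml cutoff.
    other_positive: number of tissue/synovial fluid cultures positive
        for the same organism.
    """
    if fungal_or_mycobacterial and (sonicate_growth or other_positive >= 1):
        return True  # fungi and mycobacteria were not routinely quantified
    if sonicate_growth and sonicate_meets_cutoff:
        return True
    if sonicate_growth and other_positive >= 1:
        return True  # below cutoff, but supported by another culture
    return other_positive >= 2


@dataclass(frozen=True)
class Hit:
    species: str
    frequency: float
    total_match_pct: float
    relative_abundance_pct: float


def meets_core_thresholds(hit):
    """Frequency >15 and total match above the genus-specific minimum."""
    genus = hit.species.split()[0]
    min_match = 30.0 if genus in STRICT_GENERA else 3.0
    return hit.frequency > 15 and hit.total_match_pct > min_match


def interpret_sample(hits):
    """Return species called present under the modified criteria."""
    anchored = any(
        meets_core_thresholds(h) and h.relative_abundance_pct > 10.0
        for h in hits
    )
    if not anchored:
        return []
    # Once one organism meets all criteria, the remaining hits no longer
    # need to exceed 10% relative abundance.
    return [h.species for h in hits if meets_core_thresholds(h)]


# Made-up values for illustration only.
sample_hits = [
    Hit("Staphylococcus aureus", 200, 45.0, 80.0),
    Hit("Peptoniphilus harei", 40, 12.0, 4.0),
    Hit("Cutibacterium acnes", 30, 10.0, 5.0),
]
called = interpret_sample(sample_hits)
```

In this made-up sample, the first organism meets every threshold, so the second is also reported despite its low relative abundance, while the *Cutibacterium* hit is excluded because its total match percentage is below 30%.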
IDbyDNA analysis and interpretation were performed by individuals associated with the company, who provided a final interpretation on the basis of analyses performed using Taxonomer, a previously reported k-mer-based method (15). Each analysis was manually reviewed and curated. The individuals performing the analysis were blinded to the culture results, to the infection classification, and to which samples were controls. They were aware only of the sample source, that the samples represented a variety of infectious etiologies, and that some were classified as not infected or were controls.
Data availability.
The next-generation sequencing data, with identifiable human reads removed for human subject privacy protection, are available from the NCBI Sequence Read Archive under BioProject PRJNA378504.
RESULTS
Identification of monomicrobial culture-positive PJI pathogens.
To assess the performance of the tools, we compared their sensitivity for detecting organisms identified from sonicate fluid and intraoperative cultures. Thirteen samples were selected that represented a variety of pathogens, including numerous Gram-positive bacterial species (aerobic, anaerobic, and fastidious), Serratia marcescens, Mycobacterium abscessus, and Candida species. The samples were chosen to represent a wide range of pathogen burdens, as determined by quantitative sonicate fluid culture and by sequencing signal strength, i.e., the number of known pathogen reads found in the earlier analysis with the LMAT-based pipeline (11).
The analytical tools provided similar results, with some differences (Table 1 and Table 2). All of the tools identified the known pathogen in 7 of 13 monomicrobial and fungal infection samples. No tool identified M. abscessus in sample 1041, presumably because few reads were identifiable as being from this organism. Four samples (samples 536, 650, 814, and 903) had mixed results, with at least one tool not identifying the pathogen as present; in almost all of these cases, the known pathogen was the top organism detected but did not meet the threshold for positivity (see Table S1 in the supplemental material). In only one sample (sample 1196) was the known pathogen, Candida parapsilosis, not detected at all (no reads present) by one of the tools (CosmosID), despite this organism being present in the database used. One Codex did not report C. parapsilosis or C. albicans as present, despite many reads being present (Table S1).
TABLE 1.
Interpretation of metagenomic sequencing results by different analytical toolsᵃ
| Sample | Sonicate fluid culture (CFU/10 ml) | Intraoperative tissue culture | LMAT | CosmosID | One Codex | IDbyDNA |
|---|---|---|---|---|---|---|
| Monomicrobial infections | ||||||
| 536 | Corynebacterium jeikeium 20–50 | Cutibacterium acnes 2/4, Staphylococcus saccharolyticus 1/4 | Corynebacterium jeikeium | Corynebacterium jeikeium top hit, did not reach threshold | Corynebacterium jeikeium top hit, did not reach threshold | Corynebacterium jeikeium, Cutibacterium acnes |
| 637 | Aerobic bacterium <20 | Streptococcus viridans group 4/4 | Streptococcus mitis group | Streptococcus species | Streptococcus mitis group; Acinetobacter species | Streptococcus mitis group |
| 656 | Streptococcus agalactiae >100 | 0/4 | Streptococcus agalactiae | Streptococcus agalactiae | Streptococcus agalactiae | Streptococcus agalactiae |
| 782 | Serratia species >100, aerobic bacterium <20 | Serratia species 5/5, Staphylococcus epidermidis 1/5 | Serratia marcescens, Staphylococcus epidermidis | Serratia marcescens | Serratia marcescens, Staphylococcus epidermidis, Aquamicrobium species | Serratia marcescens |
| 814 | Staphylococcus epidermidis 20–50 | Staphylococcus epidermidis 2/4 | Staphylococcus epidermidis | Staphylococcus epidermidis | Staphylococcus epidermidis, Cutibacterium acnes, Streptococcus mitis, Staphylococcus aureus, Staphylococcus warneri, Penicillium citrinum | Staphylococcus epidermidis top hit, did not reach threshold |
| 867 | Staphylococcus aureus >100 | Staphylococcus aureus 5/5, Streptococcus mitis group 1/5, Veillonella species 1/5 | Staphylococcus aureus, Peptoniphilus harei | Staphylococcus aureus, Peptoniphilus harei | Staphylococcus aureus, Peptoniphilus harei | Staphylococcus aureus, Peptoniphilus harei |
| 903 | Staphylococcus epidermidis <20, aerobic bacterium <20 | Staphylococcus epidermidis 2/3 | Staphylococcus epidermidis | Staphylococcus epidermidis top hit, did not reach threshold | Staphylococcus epidermidis, Corynebacterium glutamicum | Staphylococcus epidermidis top hit, did not reach threshold |
| 930 | Streptococcus anginosus >100 | Streptococcus anginosus group 3/3 | Streptococcus anginosus group | Multiple Streptococcus anginosus group species | Multiple Streptococcus anginosus group species | Streptococcus anginosus group |
| 1041 | Mycobacterium abscessus group, not quantified | Mycobacterium abscessus group 2/3 | None identified | None identified | None identified | None identified |
| 1050 | Cutibacterium avidum >100 | Cutibacterium avidum 1/4, small aerobic Gram-positive bacillus 1/4 | Cutibacterium avidum | Cutibacterium avidum, another Propionibacterium species | Cutibacterium avidum | Cutibacterium avidum |
| 1120 | Corynebacterium pyruviciproducens 51–100 | 0/5 | Corynebacterium pyruviciproducens | Corynebacterium pyruviciproducens | Corynebacterium pyruviciproducens, Corynebacterium glutamicum, Acinetobacter junii, Aquamicrobium species, Cutibacterium acnes | Corynebacterium pyruviciproducens |
| Fungal infections | ||||||
| 650 | Candida albicans, not quantified | Candida albicans 3/4, coagulase-negative Staphylococcus species 1/4 | Candida albicans | Candida albicans | Candida albicans top hit, did not reach threshold | Candida albicans |
| 1196 | Candida parapsilosis <20 | Candida parapsilosis 3/3, Cutibacterium acnes 2/3 | Candida parapsilosis | None identified | Candida parapsilosis top hit, did not reach threshold | Candida parapsilosis, Corynebacterium glutamicum |
| Polymicrobial infections | ||||||
| 611 | Streptococcus viridans group >100, Klebsiella oxytoca >100, Gram-positive bacillus resembling Corynebacterium >100 | Klebsiella oxytoca 5/5, Streptococcus viridans group 5/5, Corynebacterium species 5/5, coagulase-negative Staphylococcus species 1/5, Finegoldia magna 1/5, anaerobic Gram-positive coccus 4/5 | Streptococcus viridans group, Klebsiella oxytoca, Corynebacterium species, Staphylococcus epidermidis, Staphylococcus lugdunensis, Finegoldia magna | Streptococcus species, Klebsiella species, Corynebacterium striatum; Finegoldia magna | Streptococcus mitis, Klebsiella oxytoca, Corynebacterium striatum; also Finegoldia magna, Staphylococcus epidermidis, Staphylococcus lugdunensis, Corynebacterium aurimucosum | Streptococcus mitis, Klebsiella oxytoca, Corynebacterium striatum; Staphylococcus epidermidis, Staphylococcus lugdunensis, Finegoldia magna |
| 624 | Finegoldia magna <20, Peptostreptococcus species <20 | Finegoldia magna 5/5, Peptostreptococcus species 1/5 | Finegoldia magna, Anaerococcus prevotii | None identifiedᵇ | Anaerococcus prevotii; Finegoldia magna next top hit, did not reach threshold | Finegoldia magna, Anaerococcus prevotii |
| 863 | Actinomyces odontolyticus >100, Peptoniphilus species >100, Anaerococcus vaginalis >100, Streptococcus agalactiae >100, Enterobacter aerogenes 20–50, small Gram-positive bacillus 20–50, Staphylococcus aureus <20, aerobic bacterium <20 | Actinomyces odontolyticus 3/3, Peptoniphilus species 1/3, Streptococcus agalactiae 3/3, Enterobacter aerogenes 2/3, Staphylococcus aureus 1/3, Trueperella bernardiae 1/3, Pseudomonas aeruginosa 1/3, Corynebacterium striatum 1/3, anaerobic Gram-positive coccus 1/3 | Actinomyces odontolyticus, Peptoniphilus harei, Anaerococcus vaginalis, Streptococcus agalactiae, Enterobacter aerogenes (not Staphylococcus aureus); Anaerococcus obesiensis, Finegoldia magna, Peptoniphilus nanciensis, Varibaculum cambriense | Actinomyces species, Peptoniphilus harei, Anaerococcus vaginalis, Streptococcus agalactiae, Enterobacter aerogenes, Staphylococcus aureus, Anaerococcus obesiensis, Varibaculum cambriense | Actinomyces sp., Peptoniphilus harei, Anaerococcus vaginalis, Streptococcus agalactiae, Staphylococcus aureus, Enterobacter aerogenes, Anaerococcus obesiensis, Varibaculum cambriense, Prevotella nanceiensis, Trueperella bernardiae, Xylanimonas cellulosilytica, Tissierellia bacterium | Actinomyces odontolyticus, Peptoniphilus harei, Anaerococcus vaginalis, Streptococcus agalactiae, Enterobacter aerogenes (not Staphylococcus aureus*), Anaerococcus obesiensis, Trueperella bernardiae, Finegoldia magna, Peptoniphilus nanciensis, Varibaculum cambriense |
| Culture-negative PJIs | ||||||
| 674 | No growth | 0/4 | Aerococcus urinae, Peptoniphilus species, Facklamia languida | Aerococcus urinae, Peptoniphilus species, Clostridiales species | Aerococcus urinae, Peptoniphilus species, Clostridiales species, Cutibacterium acnes, Staphylococcus epidermidis, Acinetobacter baumannii, Lactobacillus iners, Acinetobacter indicus, Actinomyces species | Aerococcus urinae, Peptoniphilus species, Facklamia languida |
| 1094 | No growth | Granulicatella adiacens 1/4 | Granulicatella adiacens | Granulicatella adiacens | Granulicatella adiacens, Cutibacterium acnes, Staphylococcus aureus | Granulicatella adiacens |
| 1116 | No growth | 0/5 | Mycoplasma salivarium | Mycoplasma salivarium | Mycoplasma salivarium | Mycoplasma salivarium |
| 1195 | Corynebacterium tuberculostearicum 1 CFU, aerobic bacterium <20, anaerobic bacterium <20 | Mycobacterium bovis BCG 3/3, Staphylococcus epidermidis 1/3 | Mycobacterium tuberculosis complex | Corynebacterium glutamicumᵇ | Mycobacterium bovis/africanum complex, Corynebacterium glutamicum, Bradyrhizobium species | Mycobacterium tuberculosis complex, Corynebacterium tuberculostearicum |
| Aseptic failures | ||||||
| 641 | No growth | 0/2 | Negative | Negative | Negative | Negative |
| 736 | No growth | 0/3 | Negative | Negative | Negative | Cutibacterium acnes |
| 748 | No growth | 0/3 | Negative | Negative | Bacillus species | Negative |
| 991 | Aerobic bacterium <20 | 0/3 | Negative | Negative | Negative | Staphylococcus epidermidis, Streptococcus mitis |
| Control | ||||||
| LR070516 (Ringer’s) | Not applicable | Not applicable | Negative | Bradyrhizobium 23193 branch | Acinetobacter species | Acinetobacter ursingii, Staphylococcus aureus |
ᵃ Quantitative data reported indicate the number of positive cultures/total number collected. Lightface font indicates organisms identified by culture or corroborated by another metagenomic analytical tool. Underlining indicates uncorroborated identifications. Boldface font indicates an organism from culture that was not identified.
ᵇ The indicated organisms (Anaerococcus hydrogenalis for sample 624 and M. bovis for sample 1195) were the organisms detected at the next highest levels in the “unfiltered” results.
TABLE 2.
Number of species detected by different analytical methodsᵃ
| Parameter | LMAT | CosmosID | One Codex | IDbyDNA |
|---|---|---|---|---|
| Monomicrobial culture-positive PJIs (13 samples) | ||||
| No. of species detected (13) | 12 | 9 (2ᵇ) | 9 (3ᵇ) | 10 (2ᵇ) |
| No. of additional species detected | ||||
| Corroborated | 2 (2 samples) | 1 | 2 | 1 |
| Uncorroborated | | 1 | 11 (4 samples) | 2 (2 samples) |
| Polymicrobial samples (3 samples) | ||||
| No. of species detected (11) | 10 | 9 | 10 | 10 |
| No. of additional species detected | ||||
| Corroborated | 7 (2 samples) | 3 (2 samples) | 7 (2 samples) | 7 (2 samples) |
| Uncorroborated | 0 | 0 | 3 (1 sample) | 1 |
| Culture-negative PJIs (4 samples) | ||||
| No. of additional species detected | ||||
| Corroborated | 6 (4 samples) | 6 (3 samples) | 8 (4 samples) | 6 (4 samples) |
| Uncorroborated | 0 | 0 | 8 (3 samples) | 1 |
| Aseptic failures (4 samples) | ||||
| No. of additional species detected | ||||
| Corroborated | 0 | 0 | 0 | 0 |
| Uncorroborated | 0 | 0 | 1 | 3 (2 samples) |
ᵃ The numbers of bacterial or fungal species identified are listed. “Corroborated” indicates species not identified by culture but detected by another analytical tool. “Uncorroborated” indicates that no other analytical tool identified the species.
ᵇ Values in parentheses are the numbers of additional species that were the top organism detected but failed to meet the threshold for identification as present.
More differences were observed in the identification of organisms not detected by culture. Some of these identifications were shared between tools (i.e., were “corroborated”): all of the tools identified Peptoniphilus harei in sample 867, and LMAT and One Codex both identified Staphylococcus epidermidis in sample 782. Other identifications were made by only one analytical tool (i.e., were “uncorroborated”); One Codex identified the greatest number of additional uncorroborated species, with 10 species from 4 different samples detected.
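The corroborated/uncorroborated distinction used in Table 2 amounts to set operations on the species lists from each tool. The Python sketch below illustrates this for sample 782, using the identifications listed in Table 1. It is not the procedure used to build the tables: the function name is hypothetical, exact string matching of species names is an assumption (real output would need name normalization, for example to reconcile genus-level and species-level calls), and the culture result "Serratia species" is written here as Serratia marcescens for matching purposes.

```python
def classify_identifications(culture_positive_species, calls_by_tool):
    """Split each tool's calls into culture-positive, corroborated, and
    uncorroborated species.

    Corroborated: not culture positive, but called by at least one other tool.
    Uncorroborated: not culture positive and called by no other tool.
    """
    result = {}
    for tool, species in calls_by_tool.items():
        # Union of everything called by the other tools.
        others = set().union(
            *(calls for name, calls in calls_by_tool.items() if name != tool)
        )
        not_cultured = species - culture_positive_species
        result[tool] = {
            "culture_positive": sorted(species & culture_positive_species),
            "corroborated": sorted(not_cultured & others),
            "uncorroborated": sorted(not_cultured - others),
        }
    return result


# Sample 782 as listed in Table 1 (names normalized by assumption).
culture_782 = {"Serratia marcescens"}
calls_782 = {
    "LMAT": {"Serratia marcescens", "Staphylococcus epidermidis"},
    "CosmosID": {"Serratia marcescens"},
    "One Codex": {
        "Serratia marcescens",
        "Staphylococcus epidermidis",
        "Aquamicrobium species",
    },
    "IDbyDNA": {"Serratia marcescens"},
}
classified = classify_identifications(culture_782, calls_782)
```

For this sample, `classified["One Codex"]` lists Staphylococcus epidermidis as corroborated (LMAT also called it) and Aquamicrobium species as uncorroborated.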
Identification of pathogens from polymicrobial samples.
Three samples were selected based on their polymicrobial nature, containing two, three, and six species that met criteria for culture positivity. The total numbers of culture-positive species detected were similar (9 or 10 of 11 species detected) for all methods. Differences included CosmosID not detecting either pathogen (Finegoldia magna or Anaerococcus prevotii) in sample 624, One Codex not identifying F. magna as present in sample 624 despite identifying many reads as being from F. magna, and IDbyDNA not identifying Staphylococcus aureus in sample 863.
As with the monomicrobial samples, greater variability between tools was seen with identification of noncultured organisms. Eight additional species were identified by at least two tools (i.e., were “corroborated”) that either were not detected on culture or did not meet culture positivity criteria. Both One Codex and IDbyDNA identified seven additional species that were detected by another analytical tool, while CosmosID detected four additional species. One Codex again identified the most species (three) not detected by the other tools, whereas IDbyDNA detected only a single uncorroborated species.
Culture-negative PJI analysis.
Four samples were selected that qualified as culture negative for sonicate fluid growth but had organisms identified by previous analysis with LMAT. The samples contained anaerobic or fastidious organisms, including Granulicatella adiacens, Mycoplasma salivarium, and Mycobacterium bovis BCG. All organisms were identified using all of the tools, with the exception of M. bovis BCG not being identified using CosmosID (though reads were present in the “unfiltered” CosmosID results).
Aseptic failure and negative-control analysis.
Four samples from prosthetic joints removed for reasons other than infection (aseptic failure) were also tested. CosmosID did not identify any organisms, while One Codex and IDbyDNA identified 1 and 3 potential pathogens, respectively. Both One Codex and IDbyDNA identified an Acinetobacter species in the negative control of Ringer’s solution, which underwent processing steps identical to those used for the sonicate fluid samples. Acinetobacter species were considered common contaminants in these samples based on prior LMAT analysis.
DISCUSSION
Metagenomic shotgun sequencing as a diagnostic tool for infectious diseases is a rapidly developing field. However, there are challenges that need to be addressed such as cost and time required for sequencing. An additional hurdle for researchers and laboratories engaging in metagenomic next-generation sequencing is the analysis of large and complicated sets of data generated from DNA sequencing. Early tools often required supercomputers and knowledge of command-line computer programs that limited access. With the growth of interest in metagenomic analysis, improved analytical algorithms requiring less computing power, and increasing use of cloud computing, the options available for analysis are growing. These include free tools that can be run on external servers (e.g., MG-RAST [16]) or powerful desktop computers (e.g., Kraken [17]), as well as commercial products, such as those included in this study. While commercial products operate on a fee-for-analysis basis, they offer many advantages, including servers available for data processing, reducing the local computer processing power required and instead primarily requiring sufficient storage space for files. Web interfaces for these commercial products bypass the requirements for command-line software knowledge. The tools also generate graphical representations of data, including across data sets.
The three commercial tools tested in this study yielded similar results with some small differences. CosmosID identified the fewest organisms thought to be present (28 total, including culture-positive and corroborated culture-negative species), compared to One Codex and IDbyDNA, which each identified 36. However, One Codex and IDbyDNA had more identifications that were not corroborated by another analytical tool (23 and 7 identifications, respectively). Unfortunately, because no gold standard exists to establish which organisms are present in a clinical sample, it is not possible to determine whether the increased number of detections represents increased sensitivity or decreased specificity due to false-positive identifications. Even with uninfected aseptic failures or negative controls, the well-known presence of contaminant DNA makes it difficult to conclude that a given identification is incorrect, since the analytical tool may be correctly detecting DNA that is present in the sample. Contaminant DNA also limits the usefulness of secondary molecular tests (16S rRNA gene PCR or pathogen-directed PCR) in addressing this issue.
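The uncorroborated totals quoted above for One Codex and IDbyDNA can be traced back to the per-category "Uncorroborated" rows of Table 2. The short Python sketch below adds them up. The dictionary keys are hypothetical labels, and the assignment of the monomicrobial row's values to One Codex (11) and IDbyDNA (2) is an assumption based on the identifications listed in Table 1.

```python
# Uncorroborated identifications per sample category, taken from Table 2.
UNCORROBORATED = {
    "One Codex": {
        "monomicrobial": 11,
        "polymicrobial": 3,
        "culture_negative": 8,
        "aseptic_failure": 1,
    },
    "IDbyDNA": {
        "monomicrobial": 2,
        "polymicrobial": 1,
        "culture_negative": 1,
        "aseptic_failure": 3,
    },
}

# Sum across categories for each tool.
uncorroborated_totals = {
    tool: sum(by_category.values())
    for tool, by_category in UNCORROBORATED.items()
}
# uncorroborated_totals == {"One Codex": 23, "IDbyDNA": 7}
```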
There are many potential explanations for the discrepancies in the results from the different analytical tools studied here. Each factor in metagenomic data set analysis likely differs between the tools, including the methods for prefiltering human reads, the read quality cutoffs, the algorithms used for taxonomic assignment, the genome databases used, and the thresholds used for defining when sufficient reads are present to identify an organism as present. While some basic information is available, such as the makeup of the reference databases, the proprietary nature of each tool package prevents detailed dissection of which aspects account for the discrepancies observed.
It should also be noted that the CosmosID interpretations were based on an algorithm that includes parameters of sequencing results that were developed by our laboratory (12) and slightly modified, whereas One Codex interpretations were provided automatically and IDbyDNA included a secondary human component for interpretation of results. Altering the interpretive criteria for CosmosID could be expected to alter performance of the analysis, while the potential variability in interpretation by the other tools cannot be assessed with this one data set. Additionally, we included all organisms identified as present by One Codex, including those at less than 1% relative abundance, which are not typically included in the summary of organisms of high, medium, and low abundance. These were included because we were analyzing a normally sterile site, and with PJIs, any organism present, regardless of the relative abundance, would typically be considered clinically relevant. Excluding them would have led to fewer uncorroborated identifications (9 fewer) but also would have decreased the sensitivity of detection of culture-positive (4 fewer) and corroborated (5 fewer) organisms.
While the tools tested here perform well for taxonomic assignment, they are not designed at this time for more-detailed additional analysis of the sequencing data available. Metagenomic sequencing offers the potential for evaluating additional genomic information if organisms are sequenced at sufficient depth. Predictions of antibiotic resistance and epidemiological comparisons of strains are among the most commonly cited analyses considered. CosmosID includes reports of detected genes associated with antibiotic resistance, and One Codex offers this as a separate analysis. Xu et al. previously reported on CosmosID and antibiotic resistance prediction in staphylococcal species using a larger data set which included the samples studied here (18).
It should also be noted that there are concerns regarding the security of transmitting data files containing human sequences outside medical institutions, an issue especially germane to sequences that have not undergone human sequence depletion, as well as concerns regarding ownership of the data after analysis. For these studies, institutionally approved secure file exchange software (Signiant Media Exchange) was used to minimize risks regarding file transfer security. Confidentiality agreements were also used to protect the privacy of the data generated. Users of any third-party metagenomic analysis tool are cautioned to be aware of institutional and regulatory rules regarding transmission of sensitive data, as well as of third-party policies regarding data retention and use. This helps ensure that privacy is not compromised and that what is done is consistent with institutional review board policies and/or clinical practice.
Limitations of this study included that the samples used for comparison were selected to present challenges for detection based on results from previous analysis using tools that included LMAT. This approach prevents any comparison between LMAT and the commercial tools, though this was not the intent of this study. This approach also precludes analysis of sensitivity or specificity of these tools in regard to PJI pathogen diagnosis in general. These tools are also regularly updated in regard to the analytical methods and databases used, so interpretation of results must be considered in the context of the date and version of the software used. The scope of analysis was also limited, as the sequencing methods were DNA based and did not include RNA, and virus identifications were not included as these are not known to be significant contributors to PJI.
The selection of a metagenomic shotgun sequencing analysis tool is a difficult one for research and clinical microbiology laboratories as there are many publicly provided and commercial tools available. The tools tested here performed similarly, though some differences were present, particularly in the identification of organisms not detected by culture.
ACKNOWLEDGMENTS
Research reported in this publication was supported by the National Institutes of Health under award number R01 AR056647. N.C. is supported by R01 CA179243. R.P. reports grants from CD Diagnostics, BioFire, Curetis, Merck, Contrafect, Hutchison Biofilm Medical Solutions, Accelerate Diagnostics, Allergan, EnBiotix, and The Medicines Company. R.P. is or has been a consultant to Curetis, Specific Technologies, Next Gen Diagnostics, Accelerate Diagnostics, Selux Dx, GenMark Diagnostics, PathoQuest, Heraeus Medical, and Qvella; relevant monies are paid to Mayo Clinic. In addition, R.P. has a patent on Bordetella pertussis/parapertussis PCR issued, a patent on a device/method for sonication with royalties paid by Samsung to Mayo Clinic, and a patent on an antibiofilm substance issued. R.P. receives travel reimbursement from the American Society for Microbiology (ASM) and the Infectious Diseases Society of America (IDSA), an editor’s stipend from IDSA, and honoraria from the National Board of Medical Examiners (NBME), Up-to-Date, and the Infectious Diseases Board Review Course.
The content of this report is solely our responsibility and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Supplemental material is available online only.
REFERENCES
- 1. Forbes JD, Knox NC, Ronholm J, Pagotto F, Reimer A. 2017. Metagenomics: the next culture-independent game changer. Front Microbiol 8:1069. doi: 10.3389/fmicb.2017.01069.
- 2. Wilson MR, Sample HA, Zorn KC, Arevalo S, Yu G, Neuhaus J, Federman S, Stryke D, Briggs B, Langelier C, Berger A, Douglas V, Josephson SA, Chow FC, Fulton BD, DeRisi JL, Gelfand JM, Naccache SN, Bender J, Dien Bard J, Murkey J, Carlson M, Vespa PM, Vijayan T, Allyn PR, Campeau S, Humphries RM, Klausner JD, Ganzon CD, Memar F, Ocampo NA, Zimmermann LL, Cohen SH, Polage CR, DeBiasi RL, Haller B, Dallas R, Maron G, Hayden R, Messacar K, Dominguez SR, Miller S, Chiu CY. 2019. Clinical metagenomic sequencing for diagnosis of meningitis and encephalitis. N Engl J Med 380:2327–2340. doi: 10.1056/NEJMoa1803396.
- 3. Langelier C, Zinter MS, Kalantar K, Yanik GA, Christenson S, O'Donovan B, White C, Wilson M, Sapru A, Dvorak CC, Miller S, Chiu CY, DeRisi JL. 2018. Metagenomic sequencing detects respiratory pathogens in hematopoietic cellular transplant patients. Am J Respir Crit Care Med 197:524–528. doi: 10.1164/rccm.201706-1097LE.
- 4. Blauwkamp TA, Thair S, Rosen MJ, Blair L, Lindner MS, Vilfan ID, Kawli T, Christians FC, Venkatasubrahmanyam S, Wall GD, Cheung A, Rogers ZN, Meshulam-Simon G, Huijse L, Balakrishnan S, Quinn JV, Hollemon D, Hong DK, Vaughn ML, Kertesz M, Bercovici S, Wilber JC, Yang S. 2019. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nat Microbiol 4:663–674. doi: 10.1038/s41564-018-0349-6.
- 5. Greninger AL. 2018. The challenge of diagnostic metagenomics. Expert Rev Mol Diagn 18:605–615. doi: 10.1080/14737159.2018.1487292.
- 6. Simner PJ, Miller S, Carroll KC. 2018. Understanding the promises and hurdles of metagenomic next-generation sequencing as a diagnostic tool for infectious diseases. Clin Infect Dis 66:778–788. doi: 10.1093/cid/cix881.
- 7. McIntyre ABR, Ounit R, Afshinnekoo E, Prill RJ, Henaff E, Alexander N, Minot SS, Danko D, Foox J, Ahsanuddin S, Tighe S, Hasan NA, Subramanian P, Moffat K, Levy S, Lonardi S, Greenfield N, Colwell RR, Rosen GL, Mason CE. 2017. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol 18:182. doi: 10.1186/s13059-017-1299-7.
- 8. Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. 2015. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 9:75–88. doi: 10.4137/BBI.S12462.
- 9. Escobar-Zepeda A, Godoy-Lozano EE, Raggi L, Segovia L, Merino E, Gutierrez-Rios RM, Juarez K, Licea-Navarro AF, Pardo-Lopez L, Sanchez-Flores A. 2018. Analysis of sequencing strategies and tools for taxonomic annotation: defining standards for progressive metagenomics. Sci Rep 8:12034. doi: 10.1038/s41598-018-30515-5.
- 10. Breitwieser FP, Lu J, Salzberg SL. 19 July 2019, posting date. A review of methods and databases for metagenomic classification and assembly. Brief Bioinform doi: 10.1093/bib/bbx120.
- 11. Thoendel MJ, Jeraldo PR, Greenwood-Quaintance KE, Yao JZ, Chia N, Hanssen AD, Abdel MP, Patel R. 2018. Identification of prosthetic joint infection pathogens using a shotgun metagenomics approach. Clin Infect Dis 67:1333–1338. doi: 10.1093/cid/ciy303.
- 12. Yan Q, Wi YM, Thoendel MJ, Raval YS, Greenwood-Quaintance KE, Abdel MP, Jeraldo PR, Chia N, Patel R. 2018. Evaluation of the CosmosID bioinformatics platform for prosthetic joint-associated sonicate fluid shotgun metagenomic data analysis. J Clin Microbiol 57:e01182-18. doi: 10.1128/JCM.01182-18.
- 13. Trampuz A, Piper KE, Jacobson MJ, Hanssen AD, Unni KK, Osmon DR, Mandrekar JN, Cockerill FR, Steckelberg JM, Greenleaf JF, Patel R. 2007. Sonication of removed hip and knee prostheses for diagnosis of infection. N Engl J Med 357:654–663. doi: 10.1056/NEJMoa061588.
- 14. Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. 2013. Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics 29:2253–2260. doi: 10.1093/bioinformatics/btt389.
- 15. Graf EH, Simmon KE, Tardif KD, Hymas W, Flygare S, Eilbeck K, Yandell M, Schlaberg R. 2016. Unbiased detection of respiratory viruses by use of RNA sequencing-based metagenomics: a systematic comparison to a commercial PCR panel. J Clin Microbiol 54:1000–1007. doi: 10.1128/JCM.03060-15.
- 16. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. 2008. The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. doi: 10.1186/1471-2105-9-386.
- 17. Wood DE, Salzberg SL. 2014. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46. doi: 10.1186/gb-2014-15-3-r46.
- 18. Xu Y, Rudkjobing VB, Simonsen O, Pedersen C, Lorenzen J, Schonheyder HC, Nielsen PH, Thomsen TR. 2012. Bacterial diversity in suspected prosthetic joint infections: an exploratory study using 16S rRNA gene analysis. FEMS Immunol Med Microbiol 65:291–304. doi: 10.1111/j.1574-695X.2012.00949.x.
Data Availability Statement
The next-generation sequencing data, with identifiable human reads removed for human subject privacy protection, are available from the NCBI Sequence Read Archive under BioProject PRJNA378504.
