KEYWORDS: 16S rRNA, broad range, enrichment, hybridization capture, metagenomics, molecular diagnosis, next-generation sequencing, sequencing
ABSTRACT
The broad-range detection and identification of bacterial DNA from clinical specimens are a foundational approach in the practice of molecular microbiology. However, there are circumstances under which conventional testing may yield false-negative or otherwise uninterpretable results, including the presence of multiple bacterial templates or degraded nucleic acids. Here, we describe an alternative, next-generation sequencing approach for the broad-range detection of bacterial DNA using broad-range 16S rRNA gene hybrid capture (“16S Capture”). The method is able to deconvolute multiple bacterial species present in a specimen, is compatible with highly fragmented templates, and can be readily implemented when the overwhelming majority of nucleic acids in a specimen derive from the human host. We find that this approach is sensitive enough to detect as few as 17 Staphylococcus aureus genomes in a background of 100 ng of human DNA, providing 19- to 189-fold greater sensitivity for identifying bacterial sequences than standard shotgun metagenomic sequencing, and that it successfully recovers organisms from across the eubacterial tree of life. Application of 16S Capture to a proof-of-principle case series demonstrated its ability to identify bacterial species consistent with histological evidence of infection, even when a diagnosis could not be established using conventional broad-range bacterial detection assays. 16S Capture provides a novel means for the efficient and sensitive detection of bacteria embedded in human tissues and in specimens containing highly fragmented template DNA.
INTRODUCTION
The detection of microorganisms by molecular techniques has become a fundamental tool in the practice of modern medical bacteriology (1–3). In contrast to conventional, culture-based identification methods, molecular techniques are able to identify fastidious, slow-growing, unculturable, or nonviable bacteria and to directly interrogate clinical specimens, without the need to isolate and expand organisms through growth in vitro (2–4). Of particular utility are “broad-range” nucleic acid amplification and Sanger sequencing of the taxonomically informative 16S rRNA gene, which enable the recovery and subsequent classification of bacterial DNA, often to the species level, without prior knowledge or expectation of what organism is present (1, 2). Nevertheless, conventional broad-range species identification is not readily applicable to polymicrobial populations, where the presence of multiple interfering templates gives rise to generally uninterpretable sequence traces (5), and also is not robust when nucleic acid templates are highly fragmented (6), as can occur in specimens that have undergone formalin fixation or improper sample storage.
To help overcome these limitations, next-generation sequencing (NGS) approaches that offer expanded diagnostic capabilities have been developed. To date, three general categories of NGS-based diagnostics for bacteria have emerged, each with its own advantages and disadvantages. The first, and conceptually simplest, is PCR amplification of portions of the 16S rRNA gene using broad-range primers followed by deep sequencing of the product, which allows individual template molecules to be interrogated and taxonomically classified (4). Amplicon deep sequencing has been shown to be effective and sensitive in deconvoluting both the composition and relative abundance of taxa in even very complex microbial communities (4). However, the approach depends on prior PCR amplification and, like conventional broad-range PCR, is therefore limited by DNA template fragmentation and by the potential for amplification bias among species (7). Second, metagenomic next-generation sequencing (mNGS) has been used to classify DNA fragments randomly sampled from clinical specimens by shotgun sequencing of their total nucleic acid content (8). Because this process is unbiased, mNGS can identify organisms that are not amplifiable by broad-range PCR, including previously undescribed pathogens and nonbacterial organisms (8), but it is limited by the overwhelming quantity of host-derived material that is sequenced concomitantly and that can obscure the detection of far rarer pathogen sequences (9, 10). Methods for enriching bacterial DNA relative to human DNA in this context have been developed (11, 12) but are lossy, variably effective, and/or compatible only with fresh specimens. Current clinical applications of mNGS are therefore largely limited to paucicellular specimens, such as cerebrospinal fluid (8) or cell-free plasma DNA (13, 14), and require comparatively high sequencing depths that can render them cost prohibitive (13).
Third, techniques have been developed to selectively enrich one (15) or multiple (16) species of interest from an NGS library using hybrid capture probes that match an organism’s genome, permitting recovery of complementary sequences while leaving host DNA behind (15, 17, 18). This technique provides high sensitivity for targeted organisms in specimens containing high burdens of human nucleic acids, while enabling interrogation of an organism’s genomic content, such as the presence of antibiotic resistance or virulence genes (16). However, because enrichment approaches are specific for predefined organisms of interest, even when multiplexed (16), they lack the breadth of broad-range diagnostic approaches.
Here, we describe an NGS approach for bacterial DNA detection, broad-range 16S rRNA gene enrichment (“16S Capture”), that addresses several shortcomings of conventional and existing NGS approaches. We developed a panel of hybrid capture enrichment probes that span the length of the 16S rRNA gene and redundantly cover regions of sequence diversity across bacterial taxa, such that one or more individual oligonucleotides share ≥80% identity with, and can consequently hybridize to, each eubacterial species across the tree of life. Shotgun sequencing libraries prepared from input material are enriched using this panel to selectively recover bacterial sequences and are then subjected to NGS to catalog the organisms present.
MATERIALS AND METHODS
Probe panel design.
An alignment of 16S rRNA gene sequences was downloaded from the Ribosomal Database Project (19) (v11.4), providing a broad, yet highly curated and minimally redundant sequence database to inform probe design. We selected full-length sequences and restricted the selection to type strains and isolates. A total of 9,669 sequences met these criteria. The alignment was trimmed from 102 bp upstream of V1 to 157 bp downstream of V9, based on the Escherichia coli 16S rRNA gene sequence and numbering nomenclature (20).
Probe design using this multiple sequence alignment is diagrammed schematically in Fig. 1. The gapped alignment was split into multiple, nonoverlapping windows such that the average length of ungapped sequences contained within each window was 100 bp. This operation divided the multiple alignment into 14 windows. Sequences were extracted from within each window; those containing ambiguous or degenerate positions were excluded, gaps were removed, and sequences exceeding 120 bp in length were split into multiple subsequences of 120 bp each. Where the sequence extracted from a window was not a multiple of 120 bp in length, subsequences were selected so that they were evenly spaced, with appropriate overlap, across the window. Within each window, sequences and subsequences were clustered at an 80% identity threshold and deduplicated using CD-HIT-EST (21) (v3.6.1) with the following parameters: -n 5 -c 0.80 -G 1 -aL 0.8 -aS 0.8 -B 1 -d 0 -g 1 -gap -5 -p 1. The final probe pool comprised the representative sequence selected by CD-HIT from each deduplicated sequence cluster, combined across all windows. This yielded a final hybridization probe panel of 1,402 probes (see Table S1 in the supplemental material). Capture probes were synthesized by IDT as an xGen Lockdown target capture panel (biotinylated DNA oligonucleotides).
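The window-and-tile procedure described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the alignment is a list of equal-length gapped strings using "-" or "." as gap characters, treats U as T, and uses hypothetical helper names (`split_windows`, `window_sequences`, `tile`, `cdhit_command`). Only the CD-HIT-EST parameters are taken from the text.

```python
import math

GAP_CHARS = "-."
VALID_BASES = set("ACGT")


def split_windows(alignment, target_mean_len=100):
    """Cut alignment columns into contiguous, nonoverlapping windows.

    A window is closed as soon as the mean number of ungapped residues per
    sequence inside it reaches target_mean_len; leftover columns form a
    final, shorter window.
    """
    n_cols = len(alignment[0])
    windows, start = [], 0
    residues = [0] * len(alignment)  # ungapped residues per sequence in the open window
    for col in range(n_cols):
        for i, seq in enumerate(alignment):
            if seq[col] not in GAP_CHARS:
                residues[i] += 1
        if sum(residues) / len(residues) >= target_mean_len:
            windows.append((start, col + 1))
            start = col + 1
            residues = [0] * len(alignment)
    if start < n_cols:
        windows.append((start, n_cols))
    return windows


def window_sequences(alignment, window):
    """Ungapped sequences from one window, dropping any with ambiguous bases."""
    start, end = window
    kept = []
    for seq in alignment:
        ungapped = "".join(c for c in seq[start:end] if c not in GAP_CHARS)
        ungapped = ungapped.upper().replace("U", "T")
        if ungapped and set(ungapped) <= VALID_BASES:
            kept.append(ungapped)
    return kept


def tile(seq, probe_len=120):
    """Split seq into evenly spaced, possibly overlapping probe_len pieces."""
    if len(seq) <= probe_len:
        return [seq]
    n_tiles = math.ceil(len(seq) / probe_len)
    slack = len(seq) - probe_len  # distance between the first and last start
    starts = [round(i * slack / (n_tiles - 1)) for i in range(n_tiles)]
    return [seq[s:s + probe_len] for s in starts]


def cdhit_command(fasta_in, fasta_out):
    """argv for clustering one window's candidates at 80% identity (parameters from Methods)."""
    return ["cd-hit-est", "-i", fasta_in, "-o", fasta_out,
            "-n", "5", "-c", "0.80", "-G", "1", "-aL", "0.8", "-aS", "0.8",
            "-B", "1", "-d", "0", "-g", "1", "-gap", "-5", "-p", "1"]
```

In practice each window's candidates would be written to a FASTA file and the command passed to `subprocess.run`, with the cluster representatives pooled across windows to form the probe set. Spacing tiles evenly, rather than leaving a short remainder, keeps every probe from a long sequence at the full 120 bp.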
FIG 1.

Schematic of 16S Capture probe design. All panels are drawn to scale, as represented by the common scale bar at bottom. (A) Schematic of the domains of a representative 16S rRNA gene (derived from E. coli), displayed 5′ to 3′, indicating the relative size of constant and variable regions. Variable regions (V1 to V9) are depicted as thick lines, and conserved domains are depicted as thin lines. (B) Relative lengths of 16S features (variable and conserved regions, according to E. coli numbering) as they have been expanded through a gapped multiple alignment across all bacterial species and subsequently used as the substrate for probe design. (C) Location of probe-design windows, represented as rectangular boxes, is shown relative to features in the multiple alignment. The count of probes designed within each windowed region is indicated by the height of shading.
DNA extraction and sequencing library preparation.
Staphylococcus aureus DNA was extracted from strain ATCC 29213 using a DNeasy UltraClean microbial kit (Qiagen) according to the manufacturer’s instructions. Purified human DNA from HapMap reference individual NA12878 was obtained from the Coriell Cell Repositories. The microbial mock community (Microbial Mock Community B, even, low concentration, HM-782D) was obtained from BEI Resources. S. aureus DNA, human DNA, and mock community DNA were sheared using a Covaris E220 focused-ultrasonicator to an average fragment size of 150 bp (175 W, 10% duty factor, 200 cycles/burst, 330 s, 7°C, water level –5, 50-μl sample volume) or 400 bp (175 W, 10% duty factor, 200 cycles/burst, 38 s, 7°C, water level –5, 50-μl sample volume).
DNA was extracted from fully deidentified formalin-fixed paraffin-embedded (FFPE) patient samples using the GeneRead DNA FFPE kit (Qiagen) according to the manufacturer’s instructions, except that the kit’s standard silica gel membrane columns were replaced with QIAamp UCP MinElute spin columns (Qiagen) to minimize possible exogenous contamination with microbial DNA. 16S rRNA gene equivalents were estimated from organism genome size (22) and the average operon count per genome reported in rrnDB (23). Sequencing libraries were prepared from 10 to 100 ng of purified input DNA as previously described (24), with the fragmentation step omitted.
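The estimate of 16S rRNA gene equivalents can be illustrated with the standard mass-to-copy-number conversion. This sketch rests on an assumption not stated in the text, an average mass of 650 g/mol per double-stranded base pair, and takes genome size and operon count as caller-supplied values; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
MEAN_BP_MASS = 650.0          # assumed g/mol per double-stranded base pair


def genome_equivalents(dna_mass_g, genome_size_bp):
    """Number of haploid genome copies contained in a mass of dsDNA."""
    grams_per_genome = genome_size_bp * MEAN_BP_MASS / AVOGADRO
    return dna_mass_g / grams_per_genome


def rrna_gene_equivalents(n_genomes, operons_per_genome):
    """16S rRNA gene copies, assuming operons_per_genome copies per genome."""
    return n_genomes * operons_per_genome
```

With an illustrative S. aureus genome size of about 2.87 Mb, 10 pg of DNA corresponds to roughly 3,200 genome equivalents, and the 6 operons per genome implied by the Results (102 operons for 17 genomes) give `rrna_gene_equivalents(17, 6) == 102`.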
Hybridization capture enrichment.
Hybridization capture enrichment was accomplished using two sequential rounds of enrichment with xGen Lockdown Reagents (IDT), as described in the IDT xGen Lockdown probe and reagents protocol v1.0 (25). A total of 18 cycles of PCR amplification were used after the first round of capture, and 15 cycles were used after the second round.
Sequencing and data analysis.
Sequencing was performed using an Illumina MiSeq system with 250-bp, paired-end sequencing chemistries. An average of ∼447,000 (standard deviation = 302,295) unique, deduplicated reads were allocated per specimen.
The data analysis pipeline used in this project is available from a public repository (https://github.com/nhoffman/16s-capture). Briefly, sequence reads were filtered, trimmed, deduplicated, and assembled using barcodecop v0.5 (https://github.com/nhoffman/barcodecop), ea-utils fastq-mcf (https://github.com/ExpressionAnalysis/ea-utils), HTStream SuperDeduper (26), and PEAR (27), respectively. Reads corresponding to 16S rRNA genes were identified using the Infernal 1.1.2 (28) cmsearch function and aligned using the cmalign function. The resulting alignments were merged with reference alignments using the Infernal esl-alimerge function to place all sequences in the same alignment register.
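PEAR merges overlapping read pairs with a statistical scoring model; the simplified overlap merge below only illustrates the underlying idea and is not PEAR's algorithm. It assumes adapters have already been trimmed (so the fragment is at least as long as either read), ignores base qualities (read 1's bases are kept in the overlap), and uses hypothetical thresholds `min_overlap` and `max_mismatch_frac`.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=20, max_mismatch_frac=0.1):
    """Merge a read pair at the longest overlap with few enough mismatches.

    r2 is reverse-complemented so both reads are on the same strand; the
    3' end of r1 is compared with the 5' end of revcomp(r2). Returns the
    merged fragment, or None if no overlap of at least min_overlap passes.
    """
    r2rc = revcomp(r2)
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail, head = r1[len(r1) - ov:], r2rc[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            return r1 + r2rc[ov:]
    return None
```

Scanning from the longest candidate overlap downward favors the most complete merge; a real merger would additionally weigh base qualities and test the overlap's statistical significance.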
The 16S rRNA gene is composed of interspersed variable and constant regions that carry different information content for ascertaining the taxonomic assignment of bacterial species (2). Fragments originating from different regions of the 16S rRNA gene therefore carry varying degrees of phylogenetic signal. To allow each read to be assigned at the most specific level of the taxonomic hierarchy possible, sequence reads were classified using a phylogenetic placement approach based on phylogenetic trees built from appropriate reference sequences. For experiments using the mock bacterial community, predefined 16S rRNA gene reference sequences were used to establish the reference set (29). For experiments using clinical material, case-matched reference packages were established by recruiting 16S rRNA gene reference sequences from a curated set of NCBI 16S sequences (https://github.com/nhoffman/ya16sdb) on the basis of similarity to sequence reads, using the DeeNuRP 0.2.4 (https://github.com/fhcrc/deenurp) search-sequences and select-references functions (5). Query sequences from sequence reads were placed onto the appropriate phylogenetic tree of reference sequences using epa-ng 0.3.5 (30) and classified using gappa 0.2.4 (31).
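Placement-based classifiers such as gappa work by accumulating the likelihood weight ratios (LWRs) of a read's placements over a taxonomically labeled reference tree. The function below is a simplified illustration of that idea, not gappa's implementation: each placement is reduced to the lineage of its reference branch plus its LWR, and the read is assigned to the deepest lineage prefix whose summed weight reaches a threshold. The data layout and the 0.9 threshold are assumptions; the threshold must exceed 0.5 so that at most one taxon per rank can qualify.

```python
from collections import defaultdict


def assign_taxonomy(placements, threshold=0.9):
    """Assign a read to the most specific taxon supported by its placements.

    placements: list of (lineage, weight) pairs, where lineage is a tuple of
    taxon names from root to tip and weight is the placement's LWR. Total
    weight must be positive. Returns the deepest lineage prefix whose
    normalized summed weight is >= threshold (an empty tuple if none).
    """
    total = sum(w for _, w in placements)
    best, depth = (), 1
    while True:
        support = defaultdict(float)
        for lineage, w in placements:
            if len(lineage) >= depth:
                support[lineage[:depth]] += w / total
        if not support:
            break  # no lineage is resolved this deeply
        prefix, score = max(support.items(), key=lambda kv: kv[1])
        if score < threshold:
            break  # support is split among taxa at this rank
        best, depth = prefix, depth + 1
    return best
```

A read whose placements split between two Streptococcus species, for example, stops at the genus, mirroring how reads from less informative 16S regions receive less specific classifications.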
For limit of detection studies, DESeq2 v1.28.1 (32) was used to assess differences in the abundance of S. aureus species-level classifications between three replicates subjected to 16S Capture and controls composed entirely of human DNA. The Wald test was used to determine the statistical significance of differences in normalized read counts.
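Two pieces of this analysis can be sketched in plain Python: DESeq2's median-of-ratios size factors used to normalize read counts, and the Wald statistic, which divides an estimated log2 fold change by its standard error and compares the result with a standard normal distribution. The sketch does not fit DESeq2's negative binomial model or estimate dispersions, so the fold change and standard error must be supplied; function names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict sample -> list of raw counts, same feature order in every
    sample. Features with a zero count in any sample are skipped; at least
    one feature must be nonzero in all samples.
    """
    samples = list(counts)
    n_features = len(counts[samples[0]])
    log_geo_means = []
    for j in range(n_features):
        vals = [counts[s][j] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo_means.append(None)  # excluded from the ratios
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][j]) - g
                      for j, g in enumerate(log_geo_means) if g is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors


def wald_p_value(log2_fold_change, standard_error):
    """Two-sided p-value for z = estimate / SE under a standard normal."""
    z = log2_fold_change / standard_error
    return math.erfc(abs(z) / math.sqrt(2))
```

Dividing each sample's counts by its size factor yields the normalized counts on which the comparison between spiked replicates and human-only controls is made.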
Standard mNGS data analysis was performed using CLOMP v1.01 (https://github.com/rcs333/CLOMP) with strict tiebreaking logic (14). Briefly, the CLOMP pipeline quality filters and adapter trims reads, removes reads derived from the human host, aligns the remaining reads against NCBI’s nonredundant sequence database, assigns each read to the most specific taxonomic level possible, and aggregates read counts for each observed taxonomic category.
Organism-specific PCR.
Nested PCR to identify Lactobacillus iners was performed using LnestedF (5′-GCCTAATACATGCAAGTCGAGC-3′) and LnestedR (5′-CCGTTACCCTACCAACTAGCT-3′), followed by amplification with species-specific primers InersFw and InersRev (33). Species-specific PCR for Prevotella oris was carried out using “squirrel” primers (34) incorporating primer PO1 (35) and primer PO1R (5′-CCCATCCCTGACCGATGAAAT-3′), newly designed to minimize the size of the resultant amplicon. The primers were synthesized by IDT.
Data availability.
Sequence reads supporting bacterial classifications are available from the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA635908.
RESULTS
16S Capture provides sensitive enrichment of bacterial DNA.
After designing a panel of hybridization capture probes with homology across known eubacterial 16S rRNA genes, we first evaluated the sensitivity of the 16S Capture approach for recovering bacterial DNA from a sample matrix composed largely of human material (Fig. 2). DNA extracted from a human cell line and from a Staphylococcus aureus culture was separately sheared to 150 and 400 bp in length to simulate different degrees of DNA degradation. These materials were then used to prepare serial dilutions containing various quantities of bacterial template, ranging from 10 to 100 pg, in a fixed background of 100 ng of human DNA. Sequencing adapters were ligated directly to the fragmented DNA molecules, and the ligation products were PCR amplified to generate NGS libraries. Libraries were either sequenced directly as mNGS specimens or enriched using 16S Capture prior to sequencing. The fraction of sequence reads unambiguously classifiable as S. aureus at the species level was compared with that of a control specimen derived entirely from human genomic DNA, to determine whether the abundance of bacterial sequence reads was significantly greater than the spurious background classifications recovered when human material alone was sequenced. Three biological replicates were prepared for each condition tested.
FIG 2.
Recovery of bacterial sequences from human background using mNGS and 16S Capture. Three biological replicates for various S. aureus genome equivalents spiked into a fixed background of 100 ng of human DNA were detected to the species level using standard mNGS (A) or 16S Capture (B). Points indicate data from individual replicates, error bars indicate standard errors of the mean for replicates within a condition, and horizontal lines denote mean value across those replicates. Asterisks indicate significance of P ≤ 0.05 (Wald test) relative to the negative control (0 genome equivalents). Note the difference in the y-axis scale between panels.
With standard mNGS shotgun sequencing, levels of S. aureus DNA significantly higher than background (P < 0.05 by the Wald test) were detectable when 322 genome equivalents were present in specimens sheared to a 400-bp average fragment length, whereas DNA sheared to an average of 150 bp required 3,220 genome equivalents to be distinguishable from background (Fig. 2A). In contrast, libraries subjected to 16S Capture showed measurable and significant enrichment for S. aureus reads with as few as 17 genome equivalents of S. aureus template (Fig. 2B). In this experiment, no significant difference in enrichment was seen between 150- and 400-bp library fragments with 16S Capture. These results indicate that 17 S. aureus genomes, equivalent to 102 16S rRNA operons (23), approximates the limit of detection achievable by 16S Capture. This corresponds to a 19- to 189-fold gain in sensitivity for 16S Capture relative to unenriched mNGS shotgun sequencing, depending on the fragment size of the input material. Moreover, when the average normalized read counts obtained by each method were compared for conditions in which both methods identified a statistically significant level of bacterial DNA, 16S Capture recovered 13- to 837-fold more bacterial sequence reads than standard mNGS, with the largest differences observed for specimens having the lowest burden of bacterial DNA.
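The fold gains quoted above follow from dividing each mNGS detection threshold by the 16S Capture threshold of 17 genome equivalents, and the operon figure from the copy number of 6 per S. aureus genome implied by 102/17:

$$
\begin{aligned}
\frac{322}{17} &\approx 19 \quad (\text{400-bp fragments}), \qquad \frac{3{,}220}{17} \approx 189 \quad (\text{150-bp fragments}),\\
17\ \text{genomes} \times 6\ \text{operons per genome} &= 102\ \text{16S rRNA operons}.
\end{aligned}
$$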
Broad specificity and classification of multiple bacterial species.
To evaluate the ability of 16S Capture to broadly identify bacterial species, we next analyzed a mock bacterial community composed of 20 taxa from across the eubacterial phylogeny, normalized on the basis of 16S operon copy number (Fig. 3; see Tables S2 to S5 in the supplemental material). 16S Capture was performed using either 1,000 or 10,000 copies of 16S operons per taxon spiked into a background of 100 ng of human DNA, with material sheared to an average of 150 or 400 bp in length. Testing was performed in duplicate for each condition. Because some taxa in the community may not be readily distinguishable by 16S rRNA gene polymorphisms, concordance of sequencing results with the expected composition was evaluated at both the genus and species levels. A taxon was considered successfully recovered if its relative abundance in both replicates exceeded that in a paired nontemplate control reaction by at least 10-fold.
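The recovery criterion can be expressed as a small predicate. This sketch assumes each replicate is paired with its own nontemplate control and that relative abundances are fractions of classified reads. The text does not say how a control with zero reads for a taxon is handled, so treating any positive sample abundance as exceeding it is an assumption; the function names are hypothetical.

```python
def recovered(replicate_pairs, min_fold=10.0):
    """True if every (sample, control) relative-abundance pair passes the cutoff.

    replicate_pairs: list of (sample_abundance, control_abundance) tuples,
    one per replicate. A zero control counts as passed whenever the sample
    abundance is positive (assumption).
    """
    for sample, control in replicate_pairs:
        if control == 0:
            if sample <= 0:
                return False
        elif sample / control < min_fold:
            return False
    return True


def count_recovered(taxa):
    """Number of taxa (dict name -> replicate pairs) meeting the criterion."""
    return sum(recovered(pairs) for pairs in taxa.values())
```

Requiring every replicate to pass, rather than their mean, keeps a single high-abundance replicate from carrying a taxon over the threshold.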
FIG 3.

Affinity of 16S Capture probes across disparate species. (A and B) Enrichment of genera (A) or species (B) from a 20-organism mock community by 16S Capture. The number of 16S templates per taxon and the average length of the template fragments are indicated below each plot, with replicates performed for each condition. Matched controls prepared from specimens lacking bacterial DNA are shown for comparison for each condition. A heatmap indicates the relative abundance of each classification in a specimen. Black “X” overlays indicate classifications for which a replicate exhibited ≤10-fold relative abundance of microbial DNA compared with the paired, nontemplate control reaction.
For samples sheared to 150 bp, 16S Capture successfully recovered 13 of 20 taxa at the species level and 12 of 17 taxa at the genus level or lower when 1,000 16S operon copies per taxon were used as template. When template was increased to 10,000 copies per taxon, recovery improved to 18 of 20 species and 15 of 17 taxa at the genus level or lower. Results with 400-bp templates followed the same pattern but showed greater recovery: 17 of 20 species-level classifications and 15 of 17 taxa at the genus level or lower were recovered at the low template concentration, and 19 of 20 species and 17 of 17 taxa at the genus level or lower were recovered at the higher template concentration.
These findings confirm the broad cross-species specificity of the 16S Capture probe panel, although performance depended on the starting bacterial template concentration. Some taxa were consistently recovered with higher read counts than others across replicates within a condition, raising the possibility that certain species are enriched more efficiently than others by this approach. Other taxa, including Escherichia coli, Bacillus cereus, and Clostridium beijerinckii, were recovered more abundantly at the genus level but were underrepresented in species-level classifications, consistent with the inherent inability of 16S rRNA gene sequences to resolve closely related species within particular genera (22).
16S Capture performed on human clinical specimens.
To provide proof of principle for the utility of the method in clinical practice, we next applied 16S Capture to a pair of representative clinical specimens. Each was processed in parallel with a matched tissue sample derived from the same patient that showed no histologic evidence of infection, so that the relative abundance of bacterial species from the diagnostic specimen could be compared against a matched control in order to establish the significance of positive findings. To facilitate taxonomic classification of reads, dedicated reference sets were constructed for each clinical sample by empirically recruiting sequences from a curated 16S rRNA gene database having high homology to those recovered by sequencing.
The first specimens were derived from an 80-year-old woman with a history of polymyalgia rheumatica, hypertension, and peripheral artery disease, who presented with deep vein thrombosis, aortic aneurysms, and pleural effusions. Blood cultures from the patient were positive for Streptococcus pneumoniae. Tissue sections of the aortic aneurysm revealed diplococci by Brown and Brenn stain (used for differential staining of Gram-positive and Gram-negative organisms in tissue sections) (Fig. 4A and B), and conventional broad-range bacterial PCR accordingly identified S. pneumoniae. As expected, 16S Capture of DNA from an FFPE biopsy specimen of the lesion (Fig. 4C) recovered reads that were predominantly consistent with S. pneumoniae, classified at either the genus (36% of classified reads) or species (28% of reads) level, in agreement with prior clinical diagnoses. A smaller fraction of reads (13%) was classified as other Streptococcus species, potentially reflecting polymerase errors and other artifacts arising during NGS library preparation and sequencing (36), as well as limitations of taxonomic classification. In support of the latter explanation, BLAST (37) analysis of reads classified as S. pneumoniae identified multiple type-strain sequences with 100% identity to that species and markedly lower identity to other taxa, whereas lower-abundance Streptococcus classifications typically showed equally high identity to multiple independent species. The remaining reads were distributed among a collection of unrelated bacterial species, which may indicate preanalytic contamination introduced during specimen preparation and processing. Of the 299 classifications in the sample with read counts greater than 1, 182 (61%) were also present in the control, arguing in favor of preanalytic contamination.
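The overlap with the matched control can be computed as sketched below. The sketch assumes classifications are held as dictionaries mapping taxon name to read count and that a classification counts as present in the control if it has at least one read there; the text does not state a control threshold, and the function name is hypothetical. With the counts reported here it reproduces 182/299 ≈ 0.61, and the same calculation applies to case 2 (152/337 ≈ 0.45).

```python
def shared_fraction(sample_counts, control_counts):
    """Fraction of sample classifications with >1 read that also occur in the control.

    sample_counts, control_counts: dict taxon -> read count. A taxon is
    treated as present in the control if it has at least one read there.
    """
    kept = [taxon for taxon, n in sample_counts.items() if n > 1]
    if not kept:
        return 0.0
    shared = [taxon for taxon in kept if control_counts.get(taxon, 0) > 0]
    return len(shared) / len(kept)
```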
FIG 4.
Clinical case series. (A) Histology of case 1 biopsy specimen showing diplococci by Gram stain. Original magnification, ×100. (B) Case 1 histology. Original magnification, ×400. (C) Relative abundance of top 10 identifications established by 16S Capture of case 1, shown relative to patient-matched negative-control material. (D) Histology of case 2 biopsy specimen showing Gram-variable bacilli by Gram stain. Original magnification, ×100. (E) Case 2 histology. Original magnification, ×400. (F) Relative abundance of top 10 identifications established by 16S Capture of case 2, shown relative to patient-matched negative-control material.
The second case was a presumed infectious process that could not be resolved by conventional molecular diagnostic testing. The patient was a 56-year-old man with a history of cardiomyopathy, atrial fibrillation, left ventricular assist device placement, and leukocytosis, who died shortly after presentation. A postmortem diagnosis of bacterial endomyocarditis with septic emboli to the brain and kidneys was established, with Gram-variable, mixed bacilli visualized using Brown and Brenn and Giemsa stains (Fig. 4D and E).
Culture of fresh tissue and broad-range bacterial endpoint PCR of FFPE tissue, with gel electrophoretic evaluation of the product, were performed. Although both specimens showed evidence of infection, both tests were negative, and the causative organism or organisms could not be identified. Using 16S Capture (Fig. 4F), the most abundant individual classifications of bacterial reads corresponded to Lactobacillus iners (17% of classified reads) and Prevotella oris (11% of reads), providing a provisional diagnosis. As before, a smaller proportion of sequence reads mapped to classifications related to the two major species, followed by a long tail of species that could reflect either a polymicrobial process or contamination. Similar to the previous case, 152 of the 337 classifications in the sample with more than one supporting read (45%) were also detectable in the paired control, consistent with some level of preanalytic contamination or background. We next sought to confirm the two major organisms identified by 16S Capture using orthogonal testing methods, performing organism-specific PCR of the clinical material with primers derived from the primary literature (33, 35) (Fig. 5). Primers designed for L. iners (Fig. 5A) were confirmed to amplify that organism specifically and did not yield product when DNA extracted from human cells or from the closely related species L. rhamnosus was used as template. DNA extracted from the patient autopsy material yielded strong amplification, whereas the patient-matched control was PCR negative, consistent with the presence of L. iners in the diagnostic specimen. Parallel results were obtained using primers designed to specifically amplify P. oris (Fig. 5B).
Amplification did not occur with human DNA or DNA from closely related Prevotella species as template but was positive for the clinical specimen alone, albeit with an additional, higher-molecular-weight band. To confirm the identity of the PCR products from the patient specimen, both species-specific amplicons were Sanger sequenced and compared against the NCBI nonredundant database using BLAST. In both cases, the amplicon sequences matched DNA from the expected organisms (a 172-bp L. iners product with 100% identity to L. iners strain DSM 13335 [accession NR_036982.1] and a P. oris product with 100% identity to P. oris strain JCM 12252 [accession NR_133118.1]).
FIG 5.

Organism-specific PCR confirmation of 16S Capture diagnoses of patient material. (A) L. iners-specific PCR for defined control specimens and patient autopsy material. (B) P. oris-specific PCR for defined control specimens and patient autopsy material. Arrowheads in both panels indicate the expected amplicon size.
DISCUSSION
16S Capture is a novel, targeted-enrichment NGS technique for the broad-range identification of bacterial DNA from clinical specimens. By virtue of its selectivity for bacterial DNA, 16S Capture effectively depletes human nucleic acids prior to sequencing, bypassing one of the major limitations of current mNGS protocols and allowing diagnosis from specimens in which the overwhelming majority of DNA derives from host material. We note that mNGS protocols require approximately 24 million reads per specimen (13), whereas in this study robust classifications were achieved with fewer than 1/50 as many reads. Unlike target enrichment assays designed for specific organisms, our method provides broad specificity across eubacterial species, allowing bacteria to be recovered without prior knowledge or expectation of their presence. 16S Capture is compatible with small quantities of highly fragmented DNA, making it suitable for formalin-fixed paraffin-embedded material (38, 39) or other clinical specimens containing degraded nucleic acids. Analytically, focused interrogation of the 16S rRNA gene allows extensively populated and curated 16S rRNA gene sequence databases to be used for taxonomic classification of reads, providing a far more comprehensive knowledge base than is available for whole-genome analyses (40). Hybridization capture technologies also allow scalable sequencing of the full-length 16S rRNA gene, whereas amplicon-based 16S rRNA gene analysis is typically limited to particular variable regions (2, 4). Targeting the multicopy 16S rRNA operon (23) additionally provides an inherent level of signal amplification per bacterial genome equivalent present. Although we opted for a comprehensive design in this study, hybridization probes could be designed against a narrower set of defined pathogens to achieve higher selection efficiency at the expense of generality.
Despite these advantages, 16S Capture also has several drawbacks. The method is more time-consuming than mNGS, since hybridization enrichment adds both hands-on and hands-off time to the assay. Because the sequence fragments examined by 16S Capture can be shorter than those generated by standard 16S rRNA gene PCR, and because they may derive from 16S rRNA gene variable regions that have suboptimal discriminatory power among taxa (22, 41), the taxonomic resolution achievable by 16S Capture can also be limited and subject to ambiguity, at least for a subset of recovered reads. Classifications assignable to individual sequences may consequently be equally compatible with multiple species, so that species-level assignment is not possible in every case. Hybridization capture methods can also introduce sequence-dependent biases (42), affecting perceived species abundance and potentially limiting the diversity of organisms recovered. Lastly, given the sensitivity of the approach, specimens should optimally be analyzed in parallel with uninfected matched tissue or extraction controls to exclude contamination as the source of bacterial nucleic acids.
Using proof-of-principle case specimens, we found that pathogenic organisms consistent with the observed histology could be identified directly from specimens by 16S Capture but were not found in patient-matched, uninfected control tissue. These diagnoses could be confirmed by conventional broad-range 16S rRNA gene sequencing or, where conventional testing failed to generate broad-range 16S rRNA gene PCR product (likely owing to the poor quality of nucleic acid extracted from fixed material [38, 39]), by dedicated species-specific analyses. These findings support the use of 16S Capture when clinical suspicion of infection is high, whether from patient presentation or from histologic analysis, but conventional diagnostic testing is negative or inconclusive.
16S Capture provides a means for the efficient and sensitive detection of bacteria from human tissues and specimens with highly fragmented template DNA, adding to the growing collection of NGS techniques that can be used to interrogate challenging clinical cases. Future prospective studies will expand the number of clinical cases examined by this method and evaluate the diagnostic yield of the approach in clinical practice.
Footnotes
Supplemental material is available online only.
REFERENCES
1. Procop GW. 2007. Molecular diagnostics for the detection and characterization of microbial pathogens. Clin Infect Dis 45(Suppl 2):S99–S111. doi: 10.1086/519259.
2. Clarridge JE. 2004. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev 17:840–862. doi: 10.1128/CMR.17.4.840-862.2004.
3. Nikkari S, Lopez FA, Lepp PW, Cieslak PR, Ladd-Wilson S, Passaro D, Danila R, Relman DA. 2002. Broad-range bacterial detection and the analysis of unexplained death and critical illness. Emerg Infect Dis 8:188–194. doi: 10.3201/eid0802.010150.
4. Cummings LA, Kurosawa K, Hoogestraat DR, SenGupta DJ, Candra F, Doyle M, Thielges S, Land TA, Rosenthal CA, Hoffman NG, Salipante SJ, Cookson BT. 2016. Clinical next generation sequencing outperforms standard microbiological culture for characterizing polymicrobial samples. Clin Chem 62:1465–1473. doi: 10.1373/clinchem.2016.258806.
5. Salipante SJ, Sengupta DJ, Rosenthal C, Costa G, Spangler J, Sims EH, Jacobs MA, Miller SI, Hoogestraat DR, Cookson BT, McCoy C, Matsen FA, Shendure J, Lee CC, Harkins TT, Hoffman NG. 2013. Rapid 16S rRNA next-generation sequencing of polymicrobial clinical samples for diagnosis of complex bacterial infections. PLoS One 8:e65226. doi: 10.1371/journal.pone.0065226.
6. Golenberg EM, Bickel A, Weihs P. 1996. Effect of highly fragmented DNA on PCR. Nucleic Acids Res 24:5026–5033. doi: 10.1093/nar/24.24.5026.
7. Kennedy K, Hall MW, Lynch MDJ, Moreno-Hagelsieb G, Neufeld JD. 2014. Evaluating bias of Illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol 80:5717–5722. doi: 10.1128/AEM.01451-14.
8. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. 2014. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med 370:2408–2417. doi: 10.1056/NEJMoa1401268.
9. Doughty EL, Sergeant MJ, Adetifa I, Antonio M, Pallen MJ. 2014. Culture-independent detection and characterization of Mycobacterium tuberculosis and M. africanum in sputum samples using shotgun metagenomics on a benchtop sequencer. PeerJ 2:e585. doi: 10.7717/peerj.585.
10. Schlaberg R, Chiu CY, Miller S, Procop GW, Weinstock G, Professional Practice Committee and Committee on Laboratory Practices of the American Society for Microbiology, Microbiology Resource Committee of the College of American Pathologists. 2017. Validation of metagenomic next-generation sequencing tests for universal pathogen detection. Arch Pathol Lab Med 141:776–786. doi: 10.5858/arpa.2016-0539-RA.
11. Thoendel M, Jeraldo PR, Greenwood-Quaintance KE, Yao JZ, Chia N, Hanssen AD, Abdel MP, Patel R. 2016. Comparison of microbial DNA enrichment tools for metagenomic whole-genome sequencing. J Microbiol Methods 127:141–145. doi: 10.1016/j.mimet.2016.05.022.
12. Hasan MR, Rawat A, Tang P, Jithesh PV, Thomas E, Tan R, Tilley P. 2016. Depletion of human DNA in spiked clinical specimens for improvement of sensitivity of pathogen detection by next-generation sequencing. J Clin Microbiol 54:919–927. doi: 10.1128/JCM.03050-15.
13. Blauwkamp TA, Thair S, Rosen MJ, Blair L, Lindner MS, Vilfan ID, Kawli T, Christians FC, Venkatasubrahmanyam S, Wall GD, Cheung A, Rogers ZN, Meshulam-Simon G, Huijse L, Balakrishnan S, Quinn JV, Hollemon D, Hong DK, Vaughn ML, Kertesz M, Bercovici S, Wilber JC, Yang S. 2019. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nat Microbiol 4:663–674. doi: 10.1038/s41564-018-0349-6.
14. Barrett SLR, Holmes EA, Long DR, Shean RC, Bautista GE, Ravishankar S, Peddu V, Cookson BT, Singh PK, Greninger AL, Salipante SJ. 2020. Cell free DNA from respiratory pathogens is detectable in the blood plasma of cystic fibrosis patients. Sci Rep 10:6903. doi: 10.1038/s41598-020-63970-0.
15. Brown AC, Bryant JM, Einer-Jensen K, Holdstock J, Houniet DT, Chan JZM, Depledge DP, Nikolayevskyy V, Broda A, Stone MJ, Christiansen MT, Williams R, McAndrew MB, Tutill H, Brown J, Melzer M, Rosmarin C, McHugh TD, Shorten RJ, Drobniewski F, Speight G, Breuer J. 2015. Rapid whole-genome sequencing of Mycobacterium tuberculosis isolates directly from clinical samples. J Clin Microbiol 53:2230–2237. doi: 10.1128/JCM.00486-15.
16. Allicock OM, Guo C, Uhlemann A-C, Whittier S, Chauhan LV, Garcia J, Price A, Morse SS, Mishra N, Briese T, Lipkin WI. 2018. BacCapSeq: a platform for diagnosis and characterization of bacterial infections. mBio 9:e02007-18. doi: 10.1128/mBio.02007-18.
17. Pinto M, Borges V, Antelo M, Pinheiro M, Nunes A, Azevedo J, Borrego MJ, Mendonça J, Carpinteiro D, Vieira L, Gomes JP. 2016. Genome-scale analysis of the non-cultivable Treponema pallidum reveals extensive within-patient genetic variation. Nat Microbiol 2:16190. doi: 10.1038/nmicrobiol.2016.190.
18. Gaudin M, Desnues C. 2018. Hybrid capture-based next generation sequencing and its application to human infectious diseases. Front Microbiol 9:2924. doi: 10.3389/fmicb.2018.02924.
19. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. 2014. Ribosomal Database Project: data and tools for high-throughput rRNA analysis. Nucleic Acids Res 42:D633–D642. doi: 10.1093/nar/gkt1244.
20. Brosius J, Palmer ML, Kennedy PJ, Noller HF. 1978. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc Natl Acad Sci U S A 75:4801–4805. doi: 10.1073/pnas.75.10.4801.
21. Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. doi: 10.1093/bioinformatics/btl158.
22. McLean K, Rosenthal CA, Sengupta D, Owens J, Cookson BT, Hoffman NG, Salipante SJ. 2019. Improved species-level clinical identification of Enterobacteriaceae through broad-range DnaJ PCR and sequencing. J Clin Microbiol 57:e00986-19. doi: 10.1128/JCM.00986-19.
23. Stoddard SF, Smith BJ, Hein R, Roller BRK, Schmidt TM. 2015. rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res 43:D593–D598. doi: 10.1093/nar/gku1201.
24. SenGupta DJ, Cummings LA, Hoogestraat DR, Butler-Wu SM, Shendure J, Cookson BT, Salipante SJ. 2014. Whole-genome sequencing for high-resolution investigation of methicillin-resistant Staphylococcus aureus epidemiology and genome plasticity. J Clin Microbiol 52:2787–2796. doi: 10.1128/JCM.00759-14.
25. Schmitt MW, Fox EJ, Prindle MJ, Reid-Bayliss KS, True LD, Radich JP, Loeb LA. 2015. Sequencing small genomic targets with high efficiency and extreme accuracy. Nat Methods 12:423–425. doi: 10.1038/nmeth.3351.
26. Petersen KR, Gerritsen AT, Settles ML, Streett DA, Hunter SS. 2015. Super Deduper, fast PCR duplicate detection in fastq files. Proc 6th ACM Conf Bioinforma Comput Biol Health Inform, p 491–492.
27. Zhang J, Kobert K, Flouri T, Stamatakis A. 2014. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30:614–620. doi: 10.1093/bioinformatics/btt593.
28. Nawrocki EP, Eddy SR. 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29:2933–2935. doi: 10.1093/bioinformatics/btt509.
29. Salipante SJ, Kawashima T, Rosenthal C, Hoogestraat DR, Cummings LA, Sengupta DJ, Harkins TT, Cookson BT, Hoffman NG. 2014. Performance comparison of Illumina and Ion Torrent next-generation sequencing platforms for 16S rRNA-based bacterial community profiling. Appl Environ Microbiol 80:7583–7591. doi: 10.1128/AEM.02206-14.
30. Barbera P, Kozlov AM, Czech L, Morel B, Darriba D, Flouri T, Stamatakis A. 2019. EPA-ng: massively parallel evolutionary placement of genetic sequences. Syst Biol 68:365–369. doi: 10.1093/sysbio/syy054.
31. Czech L, Barbera P, Stamatakis A. 2020. Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data. Bioinformatics 36:3263–3265. doi: 10.1093/bioinformatics/btaa070.
32. Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550. doi: 10.1186/s13059-014-0550-8.
33. Alqumber MA, Burton JP, Devenish C, Tagg JR. 2008. A species-specific PCR for Lactobacillus iners demonstrates a relative specificity of this species for vaginal colonization. Microb Ecol Health Dis 20:135–139. doi: 10.1080/08910600802340967.
34. Ebili HO, Hassall JC, Fadhil W, Ham-Karim H, Asiri A, Raposo TP, Agboola AJ, Ilyas M. 2017. "Squirrel" primer-based PCR assay for direct and targeted Sanger sequencing of short genomic segments. J Biomol Tech 28:97–110. doi: 10.7171/jbt.17-2803-001.
35. Riggio MP, Lennon A. 2007. Development of a novel PCR assay for detection of Prevotella oris in clinical specimens. FEMS Microbiol Lett 276:123–128. doi: 10.1111/j.1574-6968.2007.00926.x.
36. Ma X, Shao Y, Tian L, Flasch DA, Mulder HL, Edmonson MN, Liu Y, Chen X, Newman S, Nakitandwe J, Li Y, Li B, Shen S, Wang Z, Shurtleff S, Robison LL, Levy S, Easton J, Zhang J. 2019. Analysis of error profiles in deep next-generation sequencing data. Genome Biol 20:50. doi: 10.1186/s13059-019-1659-6.
37. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
38. Lu XJD, Liu KYP, Zhu YS, Cui C, Poh CF. 2018. Using ddPCR to assess the DNA yield of FFPE samples. Biomol Detect Quantif 16:5–11. doi: 10.1016/j.bdq.2018.10.001.
39. Amemiya K, Hirotsu Y, Oyama T, Omata M. 2019. Relationship between formalin reagent and success rate of targeted sequencing analysis using formalin fixed paraffin embedded tissues. Clin Chim Acta 488:129–134. doi: 10.1016/j.cca.2018.11.002.
40. Breitwieser FP, Lu J, Salzberg SL. 2019. A review of methods and databases for metagenomic classification and assembly. Brief Bioinform 20:1125–1136. doi: 10.1093/bib/bbx120.
41. Chakravorty S, Helb D, Burday M, Connell N, Alland D. 2007. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods 69:330–339. doi: 10.1016/j.mimet.2007.02.005.
42. Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H, Zhang X. 2011. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol 12:R95. doi: 10.1186/gb-2011-12-9-r95.
Data Availability Statement
Sequence reads supporting bacterial classifications are available from the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA635908.


