ABSTRACT
The development and continuous improvement of high-throughput sequencing platforms have stimulated interest in the study of complex microbial communities. Currently, the most popular sequencing approach to study microbial community composition and dynamics is targeted 16S rRNA gene metabarcoding. To prepare samples for sequencing, there are a variety of processing steps, each with the potential to introduce bias at the data analysis stage. In this short review, key information from the literature pertaining to each processing step is described, and consequently, general recommendations for future 16S rRNA gene metabarcoding experiments are made.
KEYWORDS: 16S rRNA, DNA sequencing, microbiome
INTRODUCTION
In recent years, the emergence of high-throughput sequencing platforms has revolutionized the study of complex microbial communities. Most commonly, marker genes (e.g., 16S rRNA and 18S rRNA genes) are amplified and sequenced, providing both qualitative and quantitative (i.e., relative abundance) data. However, the variety of methodologies available for marker gene analysis can be overwhelming. Each methodological stage, from sampling to data analysis, can introduce biases; such biases can skew data sets by altering the observed relative abundances, and they can affect the perception of community diversity. This short review includes key information from current literature on sample collection, sample storage and processing, and sequencing and data analysis, specifically for the study of bacterial communities using 16S rRNA gene metabarcoding. By collating fundamental research from each of these areas, we aim to ensure that scientists entering this field are better informed when making decisions on experimental design for 16S rRNA gene sequencing studies.
SAMPLE COLLECTION
The choice of sampling method depends on sample type, and as such, the factors which may introduce bias will also vary between different types of microbiome studies. Study-specific concerns cannot be covered in full in this review; however, the overarching factors which should be taken into account are briefly covered in this section.
First, it is important to consider the proposed sampling site. Bacterial community composition varies even within a specific environment, for example, at different sites within the gastrointestinal tract (1) and the respiratory tract (2) and at different soil depths (3, 4). Since the magnitude of interindividual variation is very much dependent on sampling site (5), this can have implications for experimental design, specifically with regard to the number of subjects and the number of samples to be taken.
Second, there are conflicting results in the literature with regard to the variation introduced by different sample collection methodologies. For example, there have been attempts to replace invasive sampling with less invasive methods; however, significant differences have been found in microbial populations in comparisons of swab and biopsy samples from human intestines (6), breath condensate and lung brushings (7), and rumen fluid samples obtained via oral stomach tubing and a fistula (8). However, other works contradict these findings, with two studies showing no statistically significant differences when studying the rumen microbiota in cattle using a variety of sampling methods (9, 10). Additionally, no significant differences were evident in microbial composition in comparisons of sinonasal swabs and biopsy samples (11) and rectal swabs and stool samples (12). This kind of conflict in the literature is not uncommon, which leads to a lack of consensus and standardization.
A final consideration is whether samples should be homogenized. This appears to be most critical in studies of gut contents (8, 13) and soil (14), since different microbial compositions have been observed in different stool fractions and in soils of different particle sizes. Although the literature on sampling methodology is generally conflicting, data obtained using different sampling approaches should not be directly compared.
SAMPLE STORAGE
There is conflicting evidence on whether different storage conditions alone can have an impact on microbial community studies (15–18). It is often not practical to extract DNA from fresh samples; therefore, samples are generally stored for various durations prior to DNA extraction. Conventionally, it is assumed that rapid freezing to −80°C is best practice (18, 19), but this is not feasible for all study designs, for example, at remote sites where low-temperature storage is unavailable (20). Several studies have been carried out to assess the effects of storage conditions on study findings, which will be summarized in this section.
FRESH VERSUS FROZEN SAMPLES
Two studies showed that freezing samples appeared to increase the Firmicutes-to-Bacteroidetes ratio compared with fresh samples (15, 19). Conversely, in a study by Fouhy et al., the only bacterial groups that were differentially abundant between fresh and snap-frozen fecal samples were the genera Faecalibacterium and Leuconostoc, with no significant differences evident at the phylum or family level (18). No significant effects on microbial composition or diversity were observed in fecal samples refrigerated for 24 h (21) or 72 h (20) prior to DNA extraction.
The impact of storage duration has also been explored in various studies. Lauber et al. stored soil, feces, and skin samples at various temperatures and found that storage duration had no significant impact on overall bacterial community structure or diversity (17). In samples stored at −80°C for 2 years, a small number of changes in the microbial communities were observed, with increased abundances of lactobacilli and bacilli and a reduction in the total number of operational taxonomic units (OTUs) (for a definition of OTUs, please see Operational Taxonomic Unit Picking Methods, below). Based on the available literature, processing fresh samples is generally the best approach. When this is not possible, there is a trade-off: samples can either be frozen for unequal amounts of time and processed in one batch, or be frozen for an equal amount of time and processed in multiple batches. The decision on how to proceed will depend on the duration of the sample collection phase and on the study design, but regardless of processing method, the storage duration and DNA extraction batch should be recorded so that they can be taken into account during analysis.
USE OF CRYOPROTECTANT
McKain et al. explored the effects of using a cryoprotectant (i.e., glycerol/phosphate-buffered saline) to store ruminal digesta samples and found that freezing samples without cryoprotectant caused a significant loss in Bacteroidetes when measured by 16S rRNA gene copy number by quantitative PCR (15). The authors consequently suggested that simply storing samples without a cryoprotectant and carrying out DNA extraction at a later date would impact downstream results with regard to archaeal and bacterial community composition. Choo et al. explored the effects of using several common preservative buffers (i.e., RNAlater, OMNIgene.GUT, and Tris-EDTA) relative to samples stored dry at −80°C on fecal microbiota composition (20). Samples stored in the OMNIgene.GUT buffer diverged the least from the samples stored dry at −80°C, and the results obtained from the samples stored in Tris-EDTA diverged the most, with associated changes in relative abundances of biologically important bacterial groups, such as Escherichia-Shigella, Citrobacter, and Enterobacter. Additionally, RNAlater has previously been shown to be unsuitable for the storage of samples subject to microbial community analysis, with samples stored in RNAlater being the least similar to fresh samples and samples immediately frozen at −80°C (22, 23). Consequently, when considering the use of a cryoprotectant for storage, it is important to ensure that all samples are stored in the same manner.
DNA EXTRACTION
During DNA extraction, it is important to consider that some microbial cells, such as bacterial endospores (24) and Gram-positive bacteria, are more resistant to lysis, which affects DNA extraction efficiency. Inhibitors (e.g., debris in environmental samples and organic matter in soil and feces) can also directly reduce DNA extraction efficiency and can affect the efficiency of downstream PCR (reviewed in detail by Schrader et al. [25]). Common inhibitors include inorganic substances (e.g., calcium ions), although most are organic, such as humic acid, bile salts, and polysaccharides. These issues vary according to sample type; therefore, matrix-specific DNA extraction protocols should be optimized as part of a 16S rRNA gene metabarcoding experiment.
In addition to phenol-chloroform DNA extraction methods, many commercial extraction kits are available which incorporate mechanical and/or chemical/enzymatic lysis steps. Numerous authors have demonstrated that the abundances of specific bacterial groups vary between DNA extraction methodologies (8, 26–31). In particular, different methods give different DNA yields and quality, which can lead to different results in downstream analyses (28).
One key DNA extraction step which can introduce bias is the presence or absence of a mechanical lysis step. The inclusion of a bead-beating step has been linked to a higher DNA yield (8, 29, 32), higher bacterial diversity (29, 32), and more efficient extraction of DNA from Gram-positive and spore-forming bacteria (29, 33, 34). Consequently, some authors suggest that samples subjected to different DNA extraction methods are not comparable (8, 28, 35). Ultimately, the best approach is to use a method which extracts the highest possible yield and quality of DNA without biasing the results toward particular bacterial taxa. To achieve this, we recommend including a bead-beating step and optimizing the DNA extraction method for yield and quality before carrying out 16S rRNA gene sequencing.
SEQUENCING STRATEGY
Library preparation.
Since the entire 16S rRNA gene cannot be sequenced using short-read second-generation sequencing platforms, a short region of the gene must be selected for PCR amplification and sequencing. There is currently no consensus on the most appropriate hypervariable region(s), and several studies have been carried out to determine the advantages and disadvantages of each. Importantly, the choice of hypervariable region(s) and the design of “universal” PCR primers have an effect on phylogenetic resolution (36–40). Indeed, no primer set is truly universal, with some commonly used 16S rRNA gene primers proving ineffective at amplifying biologically relevant bacteria (34, 41). Fouhy et al. explored the effects of primer choice (as well as DNA extraction and sequencing platform) on microbial composition data using a mock bacterial community and three primer sets (42), with differences in relative abundances and richness being observed.
Further biases can be introduced during PCR amplification by PCR inhibitors (described in DNA Extraction, above), and the number of PCR cycles and the use of a high-fidelity polymerase (43) also affect results. Chimeras form predominantly in later PCR cycles, when incompletely extended products are at their highest concentration and compete with the original primers; consequently, the potential for chimera formation can be reduced by lowering the number of PCR cycles (44). Previous work found that bacterial richness increased with PCR cycle number (45, 46), but that cycle number had no significant effect on community structure (46). Fewer PCR artifacts were found when using a high-fidelity polymerase than with a standard polymerase (43). The use of different polymerases has also been found to significantly affect PCR efficiencies for particular bacterial groups and overall bacterial community structure (46). Finally, the quantity of input DNA in a PCR has been found to have a significant effect on the observed bacterial community structure (31). In summary, there is no “gold standard” hypervariable region for 16S sequencing, and whichever region is chosen, PCR reagents and conditions should be optimized and kept consistent across a study.
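Why small differences in per-cycle efficiency matter can be shown with a short calculation. The sketch below assumes idealized exponential amplification (copies after c cycles = N0 × (1 + E)^c) with a fixed efficiency E per taxon; the taxon names and efficiencies are hypothetical, and because real reactions plateau in late cycles, it exaggerates the bias at high cycle numbers.

```python
"""Sketch: how small per-cycle differences in PCR efficiency compound.

Assumes idealised exponential amplification, N_c = N_0 * (1 + E) ** c,
with a constant, taxon-specific efficiency E (0 < E <= 1). Real PCR
plateaus in late cycles, so this overstates bias at high cycle numbers.
Taxon names and efficiency values are hypothetical.
"""


def amplify(start_copies: dict[str, float],
            efficiency: dict[str, float],
            cycles: int) -> dict[str, float]:
    """Return the number of copies of each template after `cycles` rounds."""
    return {taxon: n0 * (1.0 + efficiency[taxon]) ** cycles
            for taxon, n0 in start_copies.items()}


def relative_abundance(counts: dict[str, float]) -> dict[str, float]:
    """Convert copy numbers to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Two hypothetical taxa present at equal starting abundance...
    start = {"taxon_A": 1000.0, "taxon_B": 1000.0}
    # ...where taxon_B amplifies slightly less efficiently.
    eff = {"taxon_A": 0.95, "taxon_B": 0.90}
    for c in (20, 25, 30, 35):
        ra = relative_abundance(amplify(start, eff, c))
        print(c, round(ra["taxon_A"], 3), round(ra["taxon_B"], 3))
```

With these hypothetical efficiencies, an equal mixture is observed at roughly 63:37 after 20 cycles and 71:29 after 35, which illustrates why the polymerase and cycle number should be held constant within a study.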
Sequencing platforms.
D'Amore et al. recently compared sequencing platforms for 16S studies (47), and we refer the reader to that paper for a more in-depth analysis. Illumina technology (primarily the MiSeq system) has become the most common sequencing platform for 16S rRNA gene metabarcoding, because the MiSeq system generally produces the most accurate long reads and has a much higher throughput than the other platforms, which enables more samples to be sequenced at higher depth or lower cost. Indeed, while D'Amore et al. caution that the choice of sequencer depends on the question being asked, they note that the MiSeq system is likely to be the platform of choice in most cases. The Roche 454 sequencer was, for a long time, the platform most used for 16S studies. Its potentially longer reads had some advantages, but it suffered from an elevated error rate due to miscalling of homopolymers, and it is no longer available, as Roche announced its discontinuation in 2013. The Ion Torrent and Ion Proton platforms are often available at low capital cost and produce data more quickly than the MiSeq system; however, their lower throughput and higher error rates mean that many researchers prefer the MiSeq system. While Illumina MiSeq offers the highest-quality data, some problems with the platform have been reported. MiSeq error rates are often thought to be around 0.01%; however, Kozich et al. showed that actual error rates can be as high as 10% and recommended a complete overlap of 250-bp reads to correct for this (48). D'Amore et al. similarly showed library-dependent error rates in either read 1 or read 2 (but not in the overlap) in MiSeq data, albeit at a lower rate (2 to 3%) (47). A further suggested improvement is the use of heterogeneity spacers, which increase base diversity in otherwise low-diversity amplicon libraries (49).
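A complete overlap lets every position be read twice, so that disagreements between the two reads can be resolved or masked. The minimal sketch below assumes the amplicon is exactly as long as each read and that qualities are integer Phred scores; the mismatch rule (keep the base whose quality is higher by at least a hypothetical `min_q_diff` of 6, otherwise emit N) is an illustrative assumption rather than the exact rule of any particular tool.

```python
"""Sketch: consensus of two fully overlapping paired-end reads.

Assumptions (not taken from any specific tool): the amplicon is exactly as
long as each read, so read 2 (after reverse-complementing) covers the same
positions as read 1; qualities are integer Phred scores; at a mismatch a base
is kept only if its quality beats the other read's by at least `min_q_diff`
(hypothetical default of 6), otherwise 'N' is emitted.
"""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_full_overlap(r1: str, q1: list[int],
                       r2: str, q2: list[int],
                       min_q_diff: int = 6) -> str:
    if not (len(r1) == len(r2) == len(q1) == len(q2)):
        raise ValueError("reads and qualities must all have the same length")
    # Put read 2 on the same strand and in the same orientation as read 1.
    r2_fwd = reverse_complement(r2)
    q2_fwd = q2[::-1]
    merged = []
    for base1, qual1, base2, qual2 in zip(r1, q1, r2_fwd, q2_fwd):
        if base1 == base2:
            merged.append(base1)          # both reads agree
        elif qual1 - qual2 >= min_q_diff:
            merged.append(base1)          # read 1 is clearly more reliable
        elif qual2 - qual1 >= min_q_diff:
            merged.append(base2)          # read 2 is clearly more reliable
        else:
            merged.append("N")            # unresolved; filter downstream
    return "".join(merged)
```

Reads containing N after merging can then be discarded by an ambiguous-base filter, which is how full overlap removes errors rather than merely flagging them.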
PacBio and Oxford Nanopore technologies can sequence the full length of the 16S rRNA gene, which potentially offers much greater taxonomic resolution than a single hypervariable region. However, error rates are again an issue, in the range of 5 to 15% for both technologies, and these errors can propagate into downstream analyses. Despite the high error rates of long-read single-molecule sequencing systems (50–52), studies are beginning to demonstrate their utility for 16S rRNA gene sequencing (53–56). For example, Schloss et al. reduced the observed error rate for the V1 to V9 region of PacBio data from 0.69% to 0.027%, which is comparable to those for the Illumina, 454, and Ion Torrent systems (54). One drawback of the PacBio technology is its throughput: the number of samples that can be run on the platform simultaneously and at a reasonable cost is much lower than with the MiSeq system.
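Per-base error rates matter more for full-length reads because errors accumulate along the read. This sketch assumes errors occur independently at a constant per-base rate and takes roughly 1,500 bp as the length of the full 16S rRNA gene; the two rates are the PacBio values quoted above.

```python
"""Sketch: why per-base error rate matters more for full-length reads.

Assumes independent errors at a constant per-base rate e, so a read of
length L is error-free with probability (1 - e) ** L and carries e * L
errors on average. ~1,500 bp approximates the full 16S rRNA gene.
"""


def p_error_free(error_rate: float, length: int) -> float:
    """Probability that a read of `length` bases contains no errors."""
    return (1.0 - error_rate) ** length


def expected_errors(error_rate: float, length: int) -> float:
    """Mean number of erroneous bases per read."""
    return error_rate * length


if __name__ == "__main__":
    full_length = 1500  # approximate length of the 16S rRNA gene (bp)
    for rate in (0.0069, 0.00027):  # PacBio rates before/after processing
        print(f"error rate {rate}: "
              f"expected errors/read = {expected_errors(rate, full_length):.2f}, "
              f"P(error-free) = {p_error_free(rate, full_length):.3g}")
```

At 0.69% a full-length read carries about 10 errors on average and almost none are error-free; at 0.027% the average falls to about 0.4 errors, and roughly two-thirds of reads are error-free.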
When planning a 16S sequencing study, three key considerations are the quality of sequence data, the cost of sequencing, and the length of generated reads, as detailed above. A further factor is the number of samples which can be analyzed per sequencing run. On Illumina platforms, samples can be multiplexed by using unique single-indexed (57) or dual-indexed (48) (i.e., barcoded) primers during library preparation. Increasing the number of samples per run lowers the coverage (number of sequences generated) per sample. If the coverage per sample is too low, the diversity of the microbial community is likely to be underestimated, as rarer members of the community are less likely to be detected. Guidance on the number of samples per run should therefore be obtained from small pilot studies (and inspection of the resulting rarefaction curves) or from published literature. When studying core members of a microbial community, lowering coverage by increasing the number of samples per run may be an effective way to decrease costs; if rarer members of a community are of interest, fewer samples per run, and therefore higher coverage, may be more appropriate. Larger studies may require more than one sequencing run, and Caporaso et al. showed that data were highly reproducible across sequencing lanes (57). Overall, the sequencing platform should be selected based on the aims of the experiment and the error rates associated with the available platforms.
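Rarefaction curves can be computed analytically rather than by repeated random subsampling. The sketch below uses Hurlbert's expected-richness formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], for a sample of N reads whose OTU counts are N_i; the counts in the example are hypothetical.

```python
"""Sketch: an analytical rarefaction curve for one sample.

Hurlbert's expected richness: for N reads spread over OTUs with counts N_i,
the expected number of OTUs seen in a random subsample of n reads (drawn
without replacement) is sum_i [1 - C(N - N_i, n) / C(N, n)].
The OTU counts in the example are hypothetical.
"""
from math import comb


def expected_richness(otu_counts: list[int], depth: int) -> float:
    total = sum(otu_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    denom = comb(total, depth)
    # comb(a, b) is 0 when b > a, i.e. the OTU cannot be missed at this depth.
    return sum(1.0 - comb(total - n_i, depth) / denom for n_i in otu_counts)


def rarefaction_curve(otu_counts: list[int],
                      step: int) -> list[tuple[int, float]]:
    """Expected richness at depths 0, step, 2*step, ... up to the total."""
    total = sum(otu_counts)
    return [(d, expected_richness(otu_counts, d))
            for d in range(0, total + 1, step)]


if __name__ == "__main__":
    # Hypothetical sample: a few dominant OTUs plus a tail of rare ones.
    counts = [500, 300, 120, 40, 20, 10, 5, 2, 1, 1, 1]
    for depth, richness in rarefaction_curve(counts, step=100):
        print(depth, round(richness, 2))
```

Where the curve flattens, additional reads mostly resample OTUs already observed; a curve still rising steeply at the achieved depth suggests that rare community members are being missed and that fewer samples per run may be warranted.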
MOCK BACTERIAL COMMUNITIES
As part of 16S microbiome studies, it is useful to include a mock community control composed of a mixture of bacterial species (or their DNA) in predetermined ratios. This not only allows sequencing error to be quantified (58) but also allows bias introduced during sample processing and library preparation to be identified (42, 47, 59, 60). For example, a mock community containing taxa of specific interest to the research group can be used to estimate whether these taxa are likely to be over- or underrepresented in samples. Like mock communities, spike-in standards can be used to analyze bias and the reproducibility of methodologies (61). Unlike mock communities, however, these standards are added directly to samples, so quality control can be performed on a per-sample basis. The drawback is a risk of crossover between the 16S rRNA gene sequences contained in the standards and those which may be found in samples. Consequently, care must be taken to select bacteria which are highly unlikely to occur in the samples of interest (62, 63) or to use standards which have been designed in silico to be dissimilar to sequences found in 16S databases (61).
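Estimating whether a taxon is over- or underrepresented comes down to comparing observed with expected proportions. A minimal sketch, assuming taxon-level read counts for the mock community are already available; the pseudocount of 1e-6 (to avoid log(0) for undetected taxa) and all taxon names are hypothetical.

```python
"""Sketch: taxon-level bias estimated from a mock community.

Reports log2(observed / expected) per expected taxon: positive means
over-represented, negative under-represented. Observed proportions are
computed over expected taxa only; reads assigned elsewhere (contaminants,
errors) are summarised separately. Pseudocount and names are assumptions.
"""
from math import log2


def to_relative(counts: dict[str, float]) -> dict[str, float]:
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def mock_bias(expected: dict[str, float], observed_counts: dict[str, int],
              pseudocount: float = 1e-6) -> dict[str, float]:
    """log2 fold difference between observed and expected composition."""
    exp_rel = to_relative(expected)
    obs_rel = to_relative({t: observed_counts.get(t, 0) for t in expected})
    return {t: log2((obs_rel[t] + pseudocount) / exp_rel[t]) for t in expected}


def unexpected_fraction(expected: dict[str, float],
                        observed_counts: dict[str, int]) -> float:
    """Share of reads assigned to taxa that are not in the mock community."""
    total = sum(observed_counts.values())
    off_target = sum(n for t, n in observed_counts.items() if t not in expected)
    return off_target / total
```

For a hypothetical even four-member mock, a taxon observed at 50% of expected-taxon reads scores about +1 (twofold overrepresented) and one at 12.5% scores about −1.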
A variety of sources provide mock bacterial communities for use in research; however, some researchers choose to create their own mock communities in-house which better reflect the bacteria of interest to them. Commercially prepared mock communities are available in two formats: DNA mock communities and whole-cell mock communities. Whole-cell mock communities are useful for establishing the efficiency of the DNA extraction step, whereas DNA mock communities only assess the PCR, clean-up, sequencing, and analysis steps. At the time of writing, mock communities are available from the American Type Culture Collection (ATCC) and Zymo Research. When planning a 16S study, the inclusion of a mock community is strongly encouraged.
ANALYSIS STRATEGY
Comparing pipelines.
The analysis of large and complex 16S rRNA gene sequencing data sets requires the use of bioinformatic tools. There are many pipelines available to process and analyze 16S rRNA gene sequencing data, including the commonly used QIIME (64), MG-RAST (65), UPARSE (66) (https://www.drive5.com/usearch/manual/uparse_pipeline.html), and mothur (67). These packages contain sets of tools which facilitate the complete analysis of 16S rRNA gene data, from quality control to operational taxonomic unit (OTU) clustering. Where they differ is predominantly in their accessibility to those with limited computational knowledge and in the availability of documentation.
Nilakanta et al. compared seven different packages (mothur, QIIME, WATERS, RDPipeline, VAMPS, Genboree, and SnoWMan) and concluded that while all of these packages provide effective pipelines for 16S rRNA gene analysis, the extensive documentation which accompanies mothur and QIIME provides them with an advantage over the other packages (68). Plummer et al. analyzed a single data set using QIIME, mothur, and MG-RAST and found that there were few differences in the results with regard to taxonomic classification and diversity (69). However, there were differences in the ease of use of each of these packages and the time required for analysis, with QIIME being the quickest analysis package (approximately 1 h) and MG-RAST being the slowest (approximately 2 days, due to the need for manual quality control to remove multiple annotations of reads). The authors do state that although MG-RAST is the slowest analysis method, it is perhaps the most suitable package for users with no command line experience. Ultimately, the choice of analysis package will be made on the basis of the user's level of experience in bioinformatics and on the available resources at the user's host institution.
Quality control, alignment, and taxonomic assignment.
It is essential to carry out quality filtering to remove DNA sequences which are of unexpected length, have long homopolymers, contain ambiguous bases, or do not align to the correct 16S rRNA gene region. Critically, sequences should then be screened for chimeras, as chimeric sequences can affect the interpretation of the final data set and could, for example, inflate the perceived community diversity (70). A variety of tools have been developed to remove chimeric sequences, such as UCHIME (66) and ChimeraSlayer (70). Because the true sequences of a mock bacterial community are known, including one in a sequencing run allows the number of chimeric sequences to be calculated (58).
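The length, homopolymer, and ambiguous-base checks described above are simple to express in code. A minimal sketch, assuming reads have already been merged and trimmed of primers; the thresholds (250 to 260 bp, homopolymers of at most 8 bp) are illustrative assumptions to be tuned to the targeted region, and alignment screening and chimera removal are separate steps not shown.

```python
"""Sketch: basic read-level quality filtering for 16S amplicons.

Checks an expected length window, the absence of ambiguous bases, and a
maximum homopolymer length. Thresholds are illustrative assumptions.
"""
import re

AMBIGUOUS = re.compile(r"[^ACGT]")  # anything other than an unambiguous base


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    previous = ""
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


def passes_filters(seq: str, min_len: int = 250, max_len: int = 260,
                   max_homopolymer: int = 8) -> bool:
    seq = seq.upper()
    if not min_len <= len(seq) <= max_len:
        return False                      # unexpected length
    if AMBIGUOUS.search(seq):
        return False                      # contains N or other ambiguity codes
    return longest_homopolymer(seq) <= max_homopolymer


def filter_reads(reads: dict[str, str], **kwargs) -> dict[str, str]:
    """Keep only reads (id -> sequence) that pass every filter."""
    return {rid: s for rid, s in reads.items() if passes_filters(s, **kwargs)}
```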
Sequences should then be aligned to a reference alignment or assigned to a suitable reference using a sequence classifier, such as the RDP Classifier, which uses a naive Bayesian approach based on 8-mers (71). Schloss showed that alignment quality can significantly affect diversity estimates and artificially inflate the number of bacterial OTUs, and advised against using alignments which do not take into account the secondary structure of the 16S rRNA gene (72). Of the three most commonly used alignments guided by secondary structure (i.e., Greengenes [73], RDP [74], and SILVA [75]), the Greengenes alignment was observed to be of poor quality, leading to significantly greater richness and diversity estimates.
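The naive Bayesian 8-mer approach can be illustrated with a simplified sketch. It is written in the style of the RDP Classifier but is not its implementation: the smoothed word probabilities, P(w|G) = (m + P_w)/(M + 1) with prior P_w = (n_w + 0.5)/(N + 1), and the bootstrap over random 1/8 subsets of the query's words are our understanding of the published approach, and the class and function names are hypothetical.

```python
"""Sketch: a naive Bayesian 8-mer classifier in the style of the RDP Classifier.

Simplified illustration, not the RDP implementation. Each sequence becomes
its set of overlapping 8-mers (words). Per-genus word probability:
P(w|G) = (m + P_w) / (M + 1), with prior P_w = (n_w + 0.5) / (N + 1), where
m = genus sequences containing w, M = genus size, n_w = training sequences
containing w, N = training set size. The genus with the highest summed
log-probability wins; a bootstrap over random 1/8 word subsets gives a
confidence value.
"""
import random
from collections import Counter, defaultdict
from math import log

K = 8


def kmers(seq: str, k: int = K) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesKmerClassifier:
    def __init__(self, training: list[tuple[str, str]]):
        """`training` is a list of (sequence, genus) pairs."""
        self.n_total = len(training)
        self.word_in_seqs = Counter()          # n_w
        self.genus_size = Counter()            # M per genus
        self.genus_word = defaultdict(Counter)  # m per genus and word
        for seq, genus in training:
            words = kmers(seq)
            self.word_in_seqs.update(words)
            self.genus_size[genus] += 1
            self.genus_word[genus].update(words)

    def _log_p(self, word: str, genus: str) -> float:
        prior = (self.word_in_seqs[word] + 0.5) / (self.n_total + 1)
        m = self.genus_word[genus][word]
        return log((m + prior) / (self.genus_size[genus] + 1))

    def _best_genus(self, words) -> str:
        return max(self.genus_size,
                   key=lambda g: sum(self._log_p(w, g) for w in words))

    def classify(self, query: str, n_boot: int = 100,
                 seed: int = 0) -> tuple[str, float]:
        """Return (genus, bootstrap confidence between 0 and 1)."""
        words = sorted(kmers(query))
        if not words:
            raise ValueError("query is shorter than the word size")
        best = self._best_genus(words)
        rng = random.Random(seed)
        subset_size = max(1, len(words) // 8)
        hits = sum(self._best_genus(rng.sample(words, subset_size)) == best
                   for _ in range(n_boot))
        return best, hits / n_boot
```

In practice, assignments with low bootstrap confidence at a given rank are usually reported only to a higher rank, which is one reason the composition of the training set matters so much.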
After alignment, sequences and OTUs are assigned taxonomy based on their similarity to training sets, which are most commonly constructed from the Greengenes, RDP, and SILVA databases. Errors within these databases, caused by sequencing/PCR errors (76) or by incorrect labeling of sequences (77), may lead to misidentification. Another issue when relying on databases for taxonomic assignment is their bias toward bacteria which are clinically relevant in humans, meaning that researchers investigating nonhuman hosts or environmental samples may struggle to assign taxonomy to their sequences. For example, in a study of the honey bee gut microbiota, the three databases listed above disagreed in their taxonomic assignments (78): at the genus level, they concurred for only 13% of sequences. Classification improved when bee-specific full-length 16S rRNA gene sequences were included in the training set, highlighting the need to include representative sequences from a greater number of habitats.
Werner et al. similarly highlighted the benefit of broader reference coverage and advised using the largest and most diverse database possible (79). They also found that trimming the reference sequences to the region amplified by the primers improved classification depth. However, in a more extensively studied environment, such as the human intestine, Ritari et al. found that a personalized reference database containing only bacterial species known to inhabit that niche led to more assignments at lower taxonomic levels, probably because there was less competition among reference sequences than in large databases (80).
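Trimming reference sequences to the amplified region is essentially an in silico PCR. The sketch below assumes exact matching of degenerate (IUPAC) primers with no mismatches allowed, whereas dedicated tools usually tolerate some; the function names and the short primers in the test are hypothetical.

```python
"""Sketch: trimming a reference sequence to the region amplified by a primer pair.

Simple in silico PCR: degenerate IUPAC primers become regular expressions,
the forward primer is located first, then the reverse complement of the
reverse primer downstream of it. Exact (degenerate) matching only.
"""
import re
from typing import Optional

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complements, including degenerate codes (e.g. R <-> Y, K <-> M, B <-> V).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> re.Pattern:
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_to_amplicon(reference: str, fwd_primer: str, rev_primer: str,
                     keep_primers: bool = False) -> Optional[str]:
    """Return the amplified region, or None if either primer site is missing."""
    ref = reference.upper()
    fwd = primer_regex(fwd_primer).search(ref)
    if fwd is None:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer site.
    rev = primer_regex(reverse_complement(rev_primer)).search(ref, fwd.end())
    if rev is None:
        return None
    if keep_primers:
        return ref[fwd.start():rev.end()]
    return ref[fwd.end():rev.start()]
```

References that return None lack a perfect primer match and would be excluded from a trimmed training set in this sketch; whether to keep them untrimmed instead is a design choice.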
OPERATIONAL TAXONOMIC UNIT PICKING METHODS
Operational taxonomic units (OTUs) are the common currency of 16S and other marker gene studies of microbiomes. The term was originally coined by Sokal and Sneath (81) and, in its more general usage, refers simply to groups of closely related organisms. There are two major methods for defining OTUs: reference-based and de novo. In reference-based clustering, sequences from a community are clustered against a known reference database, and in de novo clustering, sequences are clustered according to pairwise distances between them. Reference-based OTUs are sometimes referred to as phylotypes (82). As with many areas of microbiome analysis, the evidence on which approach is best is mixed. De novo methods have been found to produce higher-quality OTU assignments (83), whereas another study showed that de novo OTUs were unstable (84). However, Westcott and Schloss (83) argued that OTUs can be stable yet still incorrect and, in particular, showed that some reference-based techniques were sensitive to the order of sequences in the database. Sul et al. found that reference-based techniques produced results similar to those of de novo methods, with the added benefits of low computational overhead and the ability to compare data sets from different variable regions (85). Indeed, perhaps the major difference between the two is that de novo methods have a significantly greater computational overhead, since in their most naive form every sequence must be compared with every other sequence.
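De novo clustering at a fixed identity threshold can be sketched as a greedy, abundance-sorted centroid procedure, which avoids the full all-against-all comparison by comparing each sequence only with existing centroids. This is a generic illustration, not any particular tool's algorithm, and it assumes sequences are already trimmed to the same region and length, so identity is simply the fraction of matching positions (real tools derive identity from pairwise alignments that tolerate indels).

```python
"""Sketch: greedy, abundance-sorted de novo OTU clustering.

Simplifying assumptions: all sequences have equal length, so identity is the
fraction of matching positions; unique sequences are visited in decreasing
abundance, and each joins the first existing centroid it matches at
>= `threshold` identity, otherwise it founds a new OTU.
"""
from collections import Counter


def identity(a: str, b: str) -> float:
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: list[str],
                   threshold: float = 0.97) -> dict[str, int]:
    """Return {centroid_sequence: total_reads_in_OTU}."""
    uniques = Counter(reads).most_common()   # dereplicate, most abundant first
    otus: dict[str, int] = {}
    for seq, count in uniques:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count       # join the first matching OTU
                break
        else:
            otus[seq] = count                 # no match: new centroid
    return otus
```

Processing the most abundant sequences first makes it more likely that centroids are genuine sequences rather than error-containing variants of them.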
Even within clustering tools, the choice of parameters has been shown to have a critical impact on results. While a 97% similarity threshold has become standard, Patin et al. showed that 16S rRNA gene sequences as similar as 99% can represent functionally distinct microorganisms, meaning that functionally distinct species would be merged into a single OTU at the 97% threshold (86). However, this argument assumes that the sequences are accurate; when they contain sequencing errors, the 97% threshold can help avoid overestimating biodiversity (87). Susceptibility to differing parameters may also be pipeline dependent (88). Given the controversy and potential biases of clustering, some authors have proposed methods and models that use individual sequences to represent OTUs (i.e., that remove the clustering step entirely) (89–92).
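The simplest form of using individual sequences in place of clusters is a table of exact sequences and their per-sample counts. The sketch below only dereplicates and applies a minimum total abundance (a hypothetical default of 2, i.e., singletons discarded); the published exact-sequence methods go further and model sequencing error to denoise reads, so this illustrates the data structure rather than substituting for those methods.

```python
"""Sketch: a feature table keyed by exact sequences instead of OTU clusters.

Reads are dereplicated per sample and sequences whose total count across all
samples is below `min_abundance` (assumed 2) are dropped. No error model is
applied, unlike real denoising methods.
"""
from collections import Counter


def exact_sequence_table(samples: dict[str, list[str]],
                         min_abundance: int = 2) -> dict[str, dict[str, int]]:
    """Return {sequence: {sample: count}} for sequences passing the filter."""
    per_sample = {name: Counter(reads) for name, reads in samples.items()}
    totals = Counter()
    for counts in per_sample.values():
        totals.update(counts)
    table = {}
    for seq, total in totals.most_common():
        if total < min_abundance:
            continue                          # e.g. singletons, likely errors
        table[seq] = {name: counts.get(seq, 0)
                      for name, counts in per_sample.items()}
    return table
```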
CORRECTING FOR GENE COPY NUMBER
Bacterial species also differ in the number of copies of the 16S rRNA gene they carry (93, 94), which can lead to misinterpretation when comparing the abundances of bacterial OTUs or when attempting to construct a “true” description of the microbial community within a sample (95). Accurate copy numbers are rarely known for all OTUs identified in a 16S rRNA gene study. Tools have therefore been developed which correct for copy number variation using sequence databases and phylogenetic information, giving a more accurate picture of the relative abundances of OTUs. These include CopyRighter (96), rrnDB (93), functions in the picante R package and pplacer (97), and part of the PICRUSt package (98).
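Once copy numbers have been estimated, the correction itself is simple: divide each OTU's reads by its copy number and renormalize. In this minimal sketch the copy numbers are hypothetical placeholders for values that would come from a tool or database such as those listed above, and OTUs without an estimate are left uncorrected (an assumption).

```python
"""Sketch: correcting relative abundances for 16S rRNA gene copy number.

Counts are divided by each OTU's estimated copy number and renormalised to
sum to 1. OTUs missing from `copy_numbers` are treated as having one copy
(an assumption of this sketch). Values in the example are hypothetical.
"""


def copy_number_correct(counts: dict[str, float],
                        copy_numbers: dict[str, float]) -> dict[str, float]:
    adjusted = {otu: n / copy_numbers.get(otu, 1.0)
                for otu, n in counts.items()}
    total = sum(adjusted.values())
    return {otu: v / total for otu, v in adjusted.items()}
```

For example, a hypothetical OTU with six gene copies that contributes 60% of reads (the remainder coming from a single-copy OTU) corresponds to only 20% of genomes after correction.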
As these techniques rely on databases, they share the problems described above for taxonomic identification; principally, lesser-studied bacterial taxa are less likely to be represented. It is also important to note that when OTUs are compared between samples rather than within a sample (e.g., when comparing treatment effects), the impact of copy number variation is reduced, as the under- or overrepresentation of an OTU would be consistent across samples as long as the same methodology had been used.
CONTAMINATION ISSUES
Microbial DNA contamination arising from DNA extraction kits, PCR reagents, and the laboratory environment can have a particularly large effect in studies of low-microbial-biomass samples. Salter et al. found that contamination in DNA extraction kits varied not only by manufacturer but also by lot, and that samples processed in separate laboratories contained different types of contaminating DNA (99). This lack of predictability led the authors to suggest that “negative” (or reagent-only) controls should be run alongside samples in all 16S rRNA gene metabarcoding studies; without them, results can be misinterpreted. When Salter et al. analyzed a data set comparing nasopharyngeal microbiota samples from children at two time points, they found that while the time points appeared to cluster separately, this effect was mainly due to contamination from the extraction kits used. Randomization of samples prior to processing may help avoid this type of bias. Contamination could also lead to the false identification of microbial communities where none in fact exist (100) and could affect our understanding of which bacteria are relevant in clinical samples (101).
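Randomizing samples across processing batches can be done with a seeded shuffle so that the layout can be recorded and reproduced. This sketch assumes fixed-size extraction batches and adds one reagent-only blank per batch; the batch size, seed, and sample/blank identifiers are hypothetical.

```python
"""Sketch: randomising samples across DNA extraction batches.

Shuffling before batching helps keep batch effects (such as kit-lot
contamination) from lining up with biological groups. A fixed seed makes the
layout reproducible so it can be recorded with the metadata. One reagent-only
blank is added per batch. Batch size, seed, and IDs are assumptions.
"""
import random


def assign_batches(sample_ids: list[str], batch_size: int,
                   seed: int = 2024) -> dict[str, int]:
    """Return {id: batch_number} (batches from 1), including one blank per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    layout = {sid: i // batch_size + 1 for i, sid in enumerate(order)}
    n_batches = max(layout.values(), default=0)
    for b in range(1, n_batches + 1):
        layout[f"reagent_blank_{b}"] = b      # reagent-only control per batch
    return layout
```

Checking the resulting layout against the study groups (for instance, confirming that no batch contains only one treatment) before extraction is a cheap safeguard.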
The amplification of background contaminants from PCR reagents could perhaps be avoided by using primer-extension PCR (102), but this would have no effect on contamination originating from other sources. Several methods have been suggested to remove contaminating DNA from reagents and the laboratory environment, including UV and gamma irradiation (103–107); DNA intercalation by 8-methoxypsoralen, ethidium monoazide, and propidium monoazide (104, 106–108); enzymatic treatments (105–107, 109–111); silica-based membrane filtration (112); CsCl density gradient centrifugation (111); and treatment with bleach/copper-bis-(phenanthroline)-sulfate/H2O2 (CoPA) solution (105). These methods have had varied effects on contamination levels and PCR sensitivity, and reagent-only controls are still recommended alongside them.
How sequencing data from reagent-only controls should be handled is still under debate. It is often not appropriate to simply remove all bacterial OTUs found in controls, as these may overlap OTUs genuinely present in samples (108). Other methods take into account the abundance of OTUs to predict the likelihood that sequence reads originated from contamination. These include adapting the neutral community model (12) and combining quantitative PCR data with OTU relative abundance data to compare the absolute abundances of contaminating OTUs in controls and samples (113). There is, however, rapidly growing consensus that omitting reagent-only controls compromises the quality control of sequence data. When planning a 16S study, the inclusion of reagent-only controls (i.e., DNA extraction kit and PCR controls) is advised.
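The idea of combining quantitative PCR with relative abundances can be sketched as follows: relative abundances are scaled by the total 16S rRNA gene copies measured in the same extract, and an OTU is flagged in a sample when its absolute abundance is not clearly higher than in the reagent-only control. The decision rule and its `fold` factor (assumed 10) are hypothetical illustrations, not a published threshold; OTU names are placeholders.

```python
"""Sketch: comparing qPCR-scaled OTU abundances in a sample and a blank.

absolute copies per OTU ~= relative abundance * total 16S copies (qPCR).
An OTU present in the blank is flagged as a likely contaminant when its
absolute abundance in the sample is below `fold` times that in the blank.
The rule and the default fold of 10 are hypothetical.
"""


def absolute_abundance(rel_abund: dict[str, float],
                       total_16s_copies: float) -> dict[str, float]:
    return {otu: r * total_16s_copies for otu, r in rel_abund.items()}


def flag_contaminants(sample_rel: dict[str, float], sample_copies: float,
                      blank_rel: dict[str, float], blank_copies: float,
                      fold: float = 10.0) -> set[str]:
    sample_abs = absolute_abundance(sample_rel, sample_copies)
    blank_abs = absolute_abundance(blank_rel, blank_copies)
    return {otu for otu, a in sample_abs.items()
            if otu in blank_abs and a < fold * blank_abs[otu]}
```

Unlike removing every OTU seen in a control, this keeps OTUs that also occur in the blank but are far more abundant in the sample, which addresses the overlap problem described above.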
CONCLUSIONS
The study of complex microbial communities using high-throughput sequencing platforms has allowed a better understanding of a variety of biological systems and the impact of various conditions (e.g., disease states) on the host microbiome. Looking at the literature, it is clear that bias can be introduced into microbiota studies at all methodological stages from sampling to bioinformatic analysis. While the variety of different 16S rRNA gene metabarcoding methodologies might seem overwhelming, the main factor to keep in mind when designing a microbiota study is consistency. It is paramount to use consistent methodology throughout a study to minimize potential biases which could lead to spurious results.
The volume of studies attempting to define best practice for various stages of the microbiome experimental process is large, and we cover only some of the literature in this review. Unfortunately, as can be seen, there is little consensus, and further studies are unlikely to find any. The reality is that many of the biases described in this review are context and environment specific, and while individual studies may be true within their context, their conclusions may not be transferable to other studies. Clearly, with biases possible at every step, a good experimental design is essential. Recording and publication of all experimental metadata are essential for understanding microbiome studies, and unfortunately, many currently published studies lack these data.
Finding consensus in the literature is challenging, as many studies produce conflicting evidence about the effects of the various steps in the experimental process. It is therefore essential to maintain consistency within a study and to accept that comparisons between studies may not be possible.
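Checking within-study consistency is easier when processing metadata are recorded in machine-readable form. The Python sketch below reports every metadata field that takes more than one value across samples, so that the variation can either be removed or recorded and accounted for in the analysis. The field names and sample records are hypothetical; they stand in for whatever a given study's sample sheet contains.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical field names for sample-processing metadata; adapt them to
# whatever your own sample sheet records.
PROCESSING_FIELDS = [
    "storage_temp_c",
    "storage_days",
    "cryoprotectant",
    "extraction_kit_lot",
]


def inconsistent_fields(records: List[Dict[str, object]],
                        fields: List[str]) -> Dict[str, Set[object]]:
    """Return every field that takes more than one value across samples,
    mapped to the set of values observed.

    A missing field is treated as the value None, so a field recorded for
    some samples but not others is also reported.
    """
    seen: Dict[str, Set[object]] = defaultdict(set)
    for rec in records:
        for field in fields:
            seen[field].add(rec.get(field))
    return {field: vals for field, vals in seen.items() if len(vals) > 1}


# Hypothetical sample sheet.
samples = [
    {"sample": "S1", "storage_temp_c": -80, "storage_days": 14,
     "cryoprotectant": "none", "extraction_kit_lot": "LOT1"},
    {"sample": "S2", "storage_temp_c": -80, "storage_days": 14,
     "cryoprotectant": "none", "extraction_kit_lot": "LOT2"},
    {"sample": "S3", "storage_temp_c": -20, "storage_days": 14,
     "cryoprotectant": "none"},  # kit lot was never recorded
]

for field, values in sorted(inconsistent_fields(samples, PROCESSING_FIELDS).items()):
    print(field, sorted(values, key=str))
```

In this example the storage temperature differs between samples and the extraction kit lot both varies and is missing for one sample, while storage duration and cryoprotectant use are consistent.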
In summary, we recommend extracting DNA from fresh samples if possible; if not, samples should be stored in a consistent manner (i.e., at the same temperature, for the same duration, and with or without cryoprotectant), and the appropriate metadata should be recorded. A mechanical lysis step is recommended to minimize biases arising from some microbial cells being more resistant to lysis than others. Primers should be selected after careful consideration of the literature, bearing in mind that even universal primers will not amplify all bacteria in a given sample. Sequencing both mock bacterial communities and “negative”/reagent-only controls is important for determining background contamination and the sequencing error rate; these controls should be included at least once per sequencing run and, ideally, for every batch of commercial reagents/kits. To reduce the chance of OTU inflation caused by sequencing errors, consider amplicons short enough for paired MiSeq reads to overlap completely, which in practice means targeting a single hypervariable region. Finally, and to reiterate, record every aspect of your experiment and report it in the methods section, and remember that the critical consideration is consistency in methodology at each stage.
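The link between read length, amplicon length, and complete overlap is simple arithmetic. With complete overlap, each read spans the whole amplicon, so every base is sequenced twice and disagreements between the two reads can expose sequencing errors. The Python sketch below computes the overlap for paired-end reads, assuming the reads start at opposite ends of the amplicon and are not trimmed; "amplicon length" here means the sequenced stretch between the read start sites, and the 240 bp and 450 bp values are hypothetical lengths chosen for illustration.

```python
def overlap_bp(read_len: int, amplicon_len: int) -> int:
    """Bases of the amplicon covered by both the forward and the reverse read.

    Assumes the forward read starts at one end of the amplicon, the reverse
    read starts at the other end, and neither read is trimmed.
    """
    forward_end = min(read_len, amplicon_len)        # forward covers [0, forward_end)
    reverse_start = max(0, amplicon_len - read_len)  # reverse covers [reverse_start, amplicon_len)
    return max(0, forward_end - reverse_start)


def fully_overlapping(read_len: int, amplicon_len: int) -> bool:
    """True when each read spans the whole amplicon, so every base is read twice."""
    return read_len >= amplicon_len


# Hypothetical amplicon lengths sequenced with 2 x 250 bp reads.
for amplicon in (240, 450):
    print(amplicon, overlap_bp(250, amplicon), fully_overlapping(250, amplicon))
```

With 250 bp reads, the 240 bp amplicon overlaps completely, whereas the 450 bp amplicon shares only a 50 bp overlap, leaving most bases covered by a single read.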
ACKNOWLEDGMENTS
This project was supported by the Biotechnology and Biological Sciences Research Council (BBSRC; grants BB/N016742/1 [principal investigator {PI}, Mick Watson], BB/N01720X/1 [PI, Mick Watson], BB/K501591/1 [PI, Jos Houdijk, SRUC], and BB/J01446X/1 [PI, Gerry McLachlan, The Roslin Institute]), including institute strategic program and national capability awards to The Roslin Institute (BBSRC: grants BB/P013732/1, BB/J004235/1, and BB/J004243/1). SRUC receives support from the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS).
REFERENCES
- 1. Hong PY, Croix JA, Greenberg E, Gaskins HR, Mackie RI. 2011. Pyrosequencing-based analysis of the mucosal microbiota in healthy individuals reveals ubiquitous bacterial groups and micro-heterogeneity. PLoS One 6:e25042. doi: 10.1371/journal.pone.0025042.
- 2. Glendinning L, Wright S, Pollock J, Tennant P, Collie D, McLachlan G. 2016. Variability of the sheep lung microbiota. Appl Environ Microbiol 82:3225–3238. doi: 10.1128/AEM.00540-16.
- 3. Mendes LW, Tsai SM. 2014. Variations of bacterial community structure and composition in mangrove sediment at different depths in Southeastern Brazil. Diversity 6:827–843. doi: 10.3390/d6040827.
- 4. Steven B, Gallegos-Graves LV, Belnap J, Kuske CR. 2013. Dryland soil microbial communities display spatial biogeographic patterns associated with soil depth and soil parent material. FEMS Microbiol Ecol 86:101–113. doi: 10.1111/1574-6941.12143.
- 5. Flores GE, Caporaso JG, Henley JB, Rideout JR, Domogala D, Chase J, Leff JW, Vázquez-Baeza Y, Gonzalez A, Knight R, Dunn RR, Fierer N. 2014. Temporal variability is a personalized feature of the human microbiome. Genome Biol 15:531. doi: 10.1186/s13059-014-0531-y.
- 6. Araújo-Pérez F, Mccoy AN, Okechukwu C, Carroll IM, Smith KM, Jeremiah K, Sandler RS, Asher GN, Keku TO. 2012. Differences in microbial signatures between rectal mucosal biopsies and rectal swabs. Gut Microbes 3:530–535. doi: 10.4161/gmic.22157.
- 7. Glendinning L, Wright S, Tennant P, Gill AC, Collie D, McLachlan G. 2017. Microbiota in exhaled breath condensate and the lung. Appl Environ Microbiol 83:e00515-17. doi: 10.1128/AEM.00515-17.
- 8. Henderson G, Cox F, Kittelmann S, Miri VH, Zethof M, Noel SJ, Waghorn GC, Janssen PH. 2013. Effect of DNA extraction methods and sampling techniques on the apparent structure of cow and sheep rumen microbial communities. PLoS One 8:e74787. doi: 10.1371/journal.pone.0074787.
- 9. Paz HA, Anderson CL, Muller MJ, Kononoff PJ, Fernando SC. 2016. Rumen bacterial community composition in Holstein and Jersey cows is different under same dietary condition and is not affected by sampling method. Front Microbiol 7:1206. doi: 10.3389/fmicb.2016.01206.
- 10. Ramos-Morales E, Arco-Pérez A, Martín-García AI, Yáñez-Ruiz DR, Frutos P, Hervás G. 2014. Use of stomach tubing as an alternative to rumen cannulation to study ruminal fermentation and microbiota in sheep and goats. Anim Feed Sci Technol 198:57–66. doi: 10.1016/j.anifeedsci.2014.09.016.
- 11. Bassiouni A, Cleland EJ, Psaltis AJ, Vreugde S, Wormald P-J. 2015. Sinonasal microbiome sampling: a comparison of techniques. PLoS One 10:e0123216. doi: 10.1371/journal.pone.0123216.
- 12. Bassis CM, Moore NM, Lolans K, Seekatz AM, Weinstein RA, Young VB, Hayden MK, CDC Prevention Epicenters Program. 2017. Comparison of stool versus rectal swab samples and storage conditions on bacterial community profiles. BMC Microbiol 17:78. doi: 10.1186/s12866-017-0983-9.
- 13. Gorzelak MA, Gill SK, Tasnim N, Ahmadi-Vand Z, Jay M, Gibson DL. 2015. Methods for improving human gut microbiome data by reducing variability through sample processing and storage of stool. PLoS One 10:1–14. doi: 10.1371/journal.pone.0134802.
- 14. Portillo MC, Leff JW, Lauber CL, Fierer N. 2013. Cell size distributions of soil bacterial and archaeal taxa. Appl Environ Microbiol 79:7610–7617. doi: 10.1128/AEM.02710-13.
- 15. McKain N, Genc B, Snelling TJ, Wallace RJ. 2013. Differential recovery of bacterial and archaeal 16S rRNA genes from ruminal digesta in response to glycerol as cryoprotectant. J Microbiol Methods 95:381–383. doi: 10.1016/j.mimet.2013.10.009.
- 16. Rubin BER, Gibbons SM, Kennedy S, Hampton-Marcell J, Owens S, Gilbert JA. 2013. Investigating the impact of storage conditions on microbial community composition in soil samples. PLoS One 8:e70460. doi: 10.1371/journal.pone.0070460.
- 17. Lauber CL, Zhou N, Gordon JI, Knight R. 2011. Effect of storage conditions on the assessment of bacterial community structure in soil and human-associated samples. FEMS Microbiol Lett 307:80–86. doi: 10.1111/j.1574-6968.2010.01965.x.
- 18. Fouhy F, Deane J, Rea MC, O'Sullivan Ó, Ross RP, O'Callaghan G, Plant BJ, Stanton C. 2015. The effects of freezing on faecal microbiota as determined using MiSeq sequencing and culture-based investigations. PLoS One 10:e0119355. doi: 10.1371/journal.pone.0119355.
- 19. Bahl MI, Bergström A, Licht TR. 2012. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis. FEMS Microbiol Lett 329:193–197. doi: 10.1111/j.1574-6968.2012.02523.x.
- 20. Choo JM, Leong LEX, Rogers GB. 2015. Sample storage conditions significantly influence faecal microbiome profiles. Sci Rep 5:1–10. doi: 10.1038/srep16350.
- 21. Tedjo DI, Jonkers DMAE, Savelkoul PH, Masclee AA, van Best N, Pierik MJ, Penders J. 2015. The effect of sampling and storage on the fecal microbiota composition in healthy and diseased subjects. PLoS One 10:e0126685. doi: 10.1371/journal.pone.0126685.
- 22. Hale VL, Tan CL, Knight R, Amato KR. 2015. Effect of preservation method on spider monkey (Ateles geoffroyi) fecal microbiota over 8 weeks. J Microbiol Methods 113:16–26. doi: 10.1016/j.mimet.2015.03.021.
- 23. Dominianni C, Wu J, Hayes RB, Ahn J. 2014. Comparison of methods for fecal microbiome biospecimen collection. BMC Microbiol 14:103. doi: 10.1186/1471-2180-14-103.
- 24. Kuske C, Banton K, Adorada D, Stark P, Hill K, Jackson P. 1998. Small-scale DNA sample preparation method for field PCR detection of microbial cells and spores in soil. Appl Environ Microbiol 64:2463–2472.
- 25. Schrader C, Schielke A, Ellerbroek L, Johne R. 2012. PCR inhibitors–occurrence, properties and removal. J Appl Microbiol 113:1014–1026. doi: 10.1111/j.1365-2672.2012.05384.x.
- 26. Desneux J, Pourcher AM. 2014. Comparison of DNA extraction kits and modification of DNA elution procedure for the quantitation of subdominant bacteria from piggery effluents with real-time PCR. Microbiologyopen 3:437–445. doi: 10.1002/mbo3.178.
- 27. Mackenzie BW, Waite DW, Taylor MW. 2015. Evaluating variation in human gut microbiota profiles due to DNA extraction method and inter-subject differences. Front Microbiol 6:130. doi: 10.3389/fmicb.2015.00130.
- 28. Gerasimidis K, Bertz M, Quince C, Brunner K, Bruce A, Combet E, Calus S, Loman N, Ijaz UZ, Kennedy N, Walker A, Berry S, Salonen A, Nikkila J, Jalanka-Tuovinen J, Boer R, Peters R, Gierveld S, McOrist A, Jackson M, Bird A, Nechvatal J, Ram J, Basson M, D'Amore R, Ijaz U. 2016. The effect of DNA extraction methodology on gut microbiota research applications. BMC Res Notes 9:365. doi: 10.1186/s13104-016-2171-7.
- 29. Guo F, Zhang T. 2013. Biases during DNA extraction of activated sludge samples revealed by high throughput sequencing. Appl Microbiol Biotechnol 97:4607–4616. doi: 10.1007/s00253-012-4244-4.
- 30. Hart ML, Meyer A, Johnson PJ, Ericsson AC. 2015. Comparative evaluation of DNA extraction methods from feces of multiple host species for downstream next-generation sequencing. PLoS One 10:e0143334. doi: 10.1371/journal.pone.0143334.
- 31. Kennedy NA, Walker AW, Berry SH, Duncan SH, Farquarson FM, Louis P, Thomson JM, UK IBD Genetics Consortium, Satsangi J, Flint HJ, Parkhill J, Lees CW, Hold GL. 2014. The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing. PLoS One 9:e88982. doi: 10.1371/journal.pone.0088982.
- 32. Maukonen J, Simoes C, Saarela M. 2012. The currently used commercial DNA-extraction methods give different results of clostridial and actinobacterial populations derived from human fecal samples. FEMS Microbiol Ecol 79:697–708. doi: 10.1111/j.1574-6941.2011.01257.x.
- 33. Salonen A, Nikkilä J, Jalanka-Tuovinen J, Immonen O, Rajilić-Stojanović M, Kekkonen RA, Palva A, de Vos WM. 2010. Comparative analysis of fecal DNA extraction methods with phylogenetic microarray: Effective recovery of bacterial and archaeal DNA using mechanical cell lysis. J Microbiol Methods 81:127–134. doi: 10.1016/j.mimet.2010.02.007.
- 34. Walker AW, Martin JC, Scott P, Parkhill J, Flint HJ, Scott KP. 2015. 16S rRNA gene-based profiling of the human infant gut microbiota is strongly influenced by sample processing and PCR primer choice. Microbiome 3:26. doi: 10.1186/s40168-015-0087-4.
- 35. Wesolowska-Andersen A, Bahl MI, Carvalho V, Kristiansen K, Sicheritz-Pontén T, Gupta R, Licht TR. 2014. Choice of bacterial DNA extraction method from fecal material influences community structure as evaluated by metagenomic analysis. Microbiome 2:19. doi: 10.1186/2049-2618-2-19.
- 36. Yang B, Wang Y, Qian P-Y. 2016. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. BMC Bioinformatics 17:135. doi: 10.1186/s12859-016-0992-y.
- 37. Tremblay J, Singh K, Fern A, Kirton ES, He S, Woyke T, Lee J, Chen F, Dangl JL, Tringe SG. 2015. Primer and platform effects on 16S rRNA tag sequencing. Front Microbiol 6:771. doi: 10.3389/fmicb.2015.00771.
- 38. Cruaud P, Vigneron A, Lucchetti-Miganeh C, Ciron PE, Godfroy A, Cambon-Bonavita MA. 2014. Influence of DNA extraction method, 16S rRNA targeted hypervariable regions, and sample origin on microbial diversity detected by 454 pyrosequencing in marine chemosynthetic ecosystems. Appl Environ Microbiol 80:4626–4639. doi: 10.1128/AEM.00592-14.
- 39. Ghyselinck J, Pfeiffer S, Heylen K, Sessitsch A, De Vos P. 2013. The effect of primer choice and short read sequences on the outcome of 16S rRNA gene based diversity studies. PLoS One 8:e71360. doi: 10.1371/journal.pone.0071360.
- 40. Chakravorty S, Helb D, Burday M, Connell N, Alland D. 2007. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods 69:330–339. doi: 10.1016/j.mimet.2007.02.005.
- 41. Bergmann GT, Bates ST, Eilers KG, Lauber CL, Caporaso G, Walters WA, Knight R, Fierer N. 2012. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil Biol Biochem 43:1450–1455. doi: 10.1016/j.soilbio.2011.03.012.
- 42. Fouhy F, Clooney AG, Stanton C, Claesson MJ, Cotter PD. 2016. 16S rRNA gene sequencing of mock microbial populations–impact of DNA extraction method, primer choice and sequencing platform. BMC Microbiol 16:123. doi: 10.1186/s12866-016-0738-z.
- 43. Gohl D, Vangay P, Garbe J, MacLean A, Hauge A, Becker A, Gould T, Clayton J, Johnson T, Hunter R, Knights D, Beckman KB. 2016. Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies. Nat Biotechnol 34:942–949. doi: 10.1038/nbt.3601.
- 44. Kanagawa T. 2003. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J Biosci Bioeng 96:317–323. doi: 10.1016/S1389-1723(03)90130-7.
- 45. Ahn J, Kim B, Song J, Weon HY. 2012. Effects of PCR cycle number and DNA polymerase type on the 16S rRNA gene pyrosequencing analysis of bacterial communities. J Microbiol 50:1071–1074. doi: 10.1007/s12275-012-2642-z.
- 46. Wu J-Y, Jiang X-T, Jiang Y-X, Lu S-Y, Zou F, Zhou H-W. 2010. Effects of polymerase, template dilution and cycle number on PCR based 16S rRNA diversity analysis using the deep sequencing method. BMC Microbiol 10:255. doi: 10.1186/1471-2180-10-255.
- 47. D'Amore R, Ijaz UZ, Schirmer M, Kenny JG, Gregory R, Darby AC, Shakya M, Podar M, Quince C, Hall N. 2016. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC Genomics 17:55. doi: 10.1186/s12864-015-2194-9.
- 48. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. 2013. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl Environ Microbiol 79:5112–5120. doi: 10.1128/AEM.01043-13.
- 49. Fadrosh DW, Ma B, Gajer P, Sengamalay N, Ott S, Brotman RM, Ravel J. 2014. An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform. Microbiome 2:6. doi: 10.1186/2049-2618-2-6.
- 50. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, DeWinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T, Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323:133–138. doi: 10.1126/science.1162986.
- 51. Loman NJ, Watson M. 2015. Successful test launch for nanopore sequencing. Nat Methods 12:303–304. doi: 10.1038/nmeth.3327.
- 52. Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. 2015. Improved data analysis for the MinION nanopore sequencer. Nat Methods 12:351–356. doi: 10.1038/nmeth.3290.
- 53. Fichot EB, Norman RS. 2013. Microbial phylogenetic profiling with the Pacific Biosciences sequencing platform. Microbiome 1:10. doi: 10.1186/2049-2618-1-10.
- 54. Schloss PD, Jenior ML, Koumpouras CC, Westcott SL, Highlander SK. 2016. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system. PeerJ 4:e1869. doi: 10.7717/peerj.1869.
- 55. Benítez-Páez A, Portune KJ, Sanz Y. 2016. Species-level resolution of 16S rRNA gene amplicons sequenced through the MinION portable Nanopore sequencer. Gigascience 5:4. doi: 10.1186/s13742-016-0111-z.
- 56. Wagner J, Coupland P, Browne HP, Lawley TD, Francis SC, Parkhill J. 2016. Evaluation of PacBio sequencing for full-length bacterial 16S rRNA gene classification. BMC Microbiol 16:274. doi: 10.1186/s12866-016-0891-4.
- 57. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, Owens SM, Betley J, Fraser L, Bauer M, Gormley N, Gilbert JA, Smith G, Knight R. 2012. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J 6:1621–1624. doi: 10.1038/ismej.2012.8.
- 58. Schloss PD, Gevers D, Westcott SL. 2011. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 6:e27310. doi: 10.1371/journal.pone.0027310.
- 59. Parada AE, Needham DM, Fuhrman JA. 2016. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ Microbiol 18:1403–1414. doi: 10.1111/1462-2920.13023.
- 60. Brooks JP, Edwards DJ, Harwich MD, Rivera MC, Fettweis JM, Serrano MG, Reris RA, Sheth NU, Huang B, Girerd P, Vaginal Microbiome Consortium, Strauss JF III, Jefferson KK, Buck GA. 2015. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiol 15:66. doi: 10.1186/s12866-015-0351-6.
- 61. Tourlousse DM, Yoshiike S, Ohashi A, Matsukura S, Noda N, Sekiguchi Y. 2016. Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing. Nucleic Acids Res 45:e23.
- 62. Smets W, Leff JW, Bradford MA, McCulley RL, Lebeer S, Fierer N. 2016. A method for simultaneous measurement of soil bacterial abundances and community composition via 16S rRNA gene sequencing. Soil Biol Biochem 96:145–151. doi: 10.1016/j.soilbio.2016.02.003.
- 63. Stämmler F, Glasner J, Hiergeist A, Holler E, Weber D, Oefner PJ, Gessner A, Spang R. 2016. Adjusting microbiome profiles for differences in microbial load by spike-in bacteria. Microbiome 4:28. doi: 10.1186/s40168-016-0175-0.
- 64. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–336. doi: 10.1038/nmeth.f.303.
- 65. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. 2008. The metagenomics RAST server–a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. doi: 10.1186/1471-2105-9-386.
- 66. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27:2194–2200. doi: 10.1093/bioinformatics/btr381.
- 67. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537–7541. doi: 10.1128/AEM.01541-09.
- 68. Nilakanta H, Drews KL, Firrell S, Foulkes MA, Jablonski KA. 2014. A review of software for analyzing molecular sequences. BMC Res Notes 7:830. doi: 10.1186/1756-0500-7-830.
- 69. Plummer E, Twin J, Bulach DM, Garland SM, Tabrizi SN. 2015. A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data. J Proteomics Bioinformatics 8:283–291. doi: 10.4172/jpb.1000381.
- 70. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, Ciulla D, Tabbaa D, Highlander SK, Sodergren E, Methé B, DeSantis TZ, The Human Microbiome Consortium, Petrosino JF, Knight R, Birren BW. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21:494–504. doi: 10.1101/gr.112730.110.
- 71. Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73:5261–5267. doi: 10.1128/AEM.00062-07.
- 72. Schloss PD. 2010. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. PLoS Comput Biol 6:e1000844. doi: 10.1371/journal.pcbi.1000844.
- 73. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069–5072. doi: 10.1128/AEM.03006-05.
- 74. Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. 2013. Ribosomal database project: data and tools for high throughput rRNA analysis. Nucleic Acids Res 42:D633–D642. doi: 10.1093/nar/gkt1244.
- 75. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner FO. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41:D590–D596. doi: 10.1093/nar/gks1219.
- 76. Ashelford KE, Chuzhanova NA, Fry JC, Jones AJ, Weightman AJ. 2005. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl Environ Microbiol 71:7724–7736. doi: 10.1128/AEM.71.12.7724-7736.2005.
- 77. Kozlov AM, Zhang JJ, Yilmaz P, Glockner FO, Stamatakis A. 2016. Phylogeny-aware identification and correction of taxonomically mislabeled sequences. Nucleic Acids Res 44:5022–5033. doi: 10.1093/nar/gkw396.
- 78. Newton ILG, Roeselers G. 2012. The effect of training set on the classification of honey bee gut microbiota using the Naive Bayesian Classifier. BMC Microbiol 12:221. doi: 10.1186/1471-2180-12-221.
- 79. Werner JJ, Koren O, Hugenholtz P, DeSantis TZ, Walters WA, Caporaso JG, Angenent LT, Knight R, Ley RE. 2012. Impact of training sets on classification of high-throughput bacterial 16S rRNA gene surveys. ISME J 6:94–103. doi: 10.1038/ismej.2011.82.
- 80. Ritari J, Salojarvi J, Lahti L, de Vos WM. 2015. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database. BMC Genomics 16:1056. doi: 10.1186/s12864-015-2265-y.
- 81. Sokal RR, Sneath PHA. 1965. Principles of numerical taxonomy. J Mammal 46:111–112. doi: 10.2307/1377831.
- 82. Schloss PD, Westcott SL. 2011. Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219–3226. doi: 10.1128/AEM.02810-10.
- 83. Westcott SL, Schloss PD. 2015. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ 3:e1487. doi: 10.7717/peerj.1487.
- 84. He Y, Caporaso JG, Jiang X-T, Sheng H-F, Huse SM, Rideout JR, Edgar RC, Kopylova E, Walters WA, Knight R, Zhou H-W. 2015. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity. Microbiome 3:20. doi: 10.1186/s40168-015-0081-x.
- 85. Sul WJ, Cole JR, Jesus EDC, Wang Q, Farris RJ, Fish JA, Tiedje JM. 2011. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering. Proc Natl Acad Sci U S A 108:14637–14642. doi: 10.1073/pnas.1111435108.
- 86. Patin NV, Kunin V, Lidström U, Ashby MN. 2013. Effects of OTU clustering and PCR artifacts on microbial diversity estimates. Microb Ecol 65:709–719. doi: 10.1007/s00248-012-0145-4.
- 87. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123. doi: 10.1111/j.1462-2920.2009.02051.x.
- 88. Schmidt TSB, Matias Rodrigues JF, von Mering C. 2015. Limits to robustness and reproducibility in the demarcation of operational taxonomic units. Environ Microbiol 17:1689–1706. doi: 10.1111/1462-2920.12610.
- 89. Tikhonov M, Leach RW, Wingreen NS. 2015. Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution. ISME J 9:68–80. doi: 10.1038/ismej.2014.117.
- 90. Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. ISME J doi: 10.1038/ismej.2017.119.
- 91. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. 2016. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods 13:581–583. doi: 10.1038/nmeth.3869.
- 92. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, Earth Microbiome Project Consortium. 2017. A communal catalogue reveals Earth's multiscale microbial diversity. Nature 551:457–463. doi: 10.1038/nature24621.
- 93. Stoddard SF, Smith BJ, Hein R, Roller BRK, Schmidt TM. 2015. rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res 43:D593–D598. doi: 10.1093/nar/gku1201.
- 94. Větrovský T, Baldrian P. 2013. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. PLoS One 8:e57923. doi: 10.1371/journal.pone.0057923.
- 95. Props R, Kerckhof F-M, Rubbens P, De Vrieze J, Hernandez Sanabria E, Waegeman W, Monsieurs P, Hammes F, Boon N. 2016. Absolute quantification of microbial taxon abundances. ISME J 11:584–587. doi: 10.1038/ismej.2016.117.
- 96. Angly FE, Dennis PG, Skarshewski A, Vanwonterghem I, Hugenholtz P, Tyson GW. 2014. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction. Microbiome 2:11. doi: 10.1186/2049-2618-2-11.
- 97. Kembel SW, Wu M, Eisen JA, Green JL. 2012. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance. PLoS Comput Biol 8:e1002743. doi: 10.1371/journal.pcbi.1002743.
- 98. Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Thurber RLV, Knight R, Beiko RG, Huttenhower C. 2013. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31:814–821. doi: 10.1038/nbt.2676.
- 99. Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner P, Parkhill J, Loman NJ, Walker AW. 2014. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol 12:87. doi: 10.1186/s12915-014-0087-z.
- 100. Lauder AP, Roche AM, Sherrill-Mix S, Bailey A, Laughlin AL, Bittinger K, Leite R, Elovitz MA, Parry S, Bushman FD. 2016. Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota. Microbiome 4:1–11. doi: 10.1186/s40168-015-0145-y.
- 101. Laurence M, Hatzis C, Brash DE. 2014. Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes. PLoS One 9:e97876. doi: 10.1371/journal.pone.0097876.
- 102. Chang SS, Hsu HL, Cheng JC, Tseng CP. 2011. An efficient strategy for broad-range detection of low abundance bacteria without DNA decontamination of PCR reagents. PLoS One 6:e20303. doi: 10.1371/journal.pone.0020303.
- 103. Tamariz J, Voynarovska K, Prinz M, Caragine T. 2006. The application of ultraviolet irradiation to exogenous sources of DNA in plasticware and water for the amplification of low copy number DNA. J Forensic Sci 51:790–794. doi: 10.1111/j.1556-4029.2006.00172.x.
- 104. Humphrey B, McLeod N, Turner C, Sutton JM, Dark PM, Warhurst G. 2015. Removal of contaminant DNA by combined UV-EMA treatment allows low copy number detection of clinically relevant bacteria using pan-bacterial real-time PCR. PLoS One 10:e0132954. doi: 10.1371/journal.pone.0132954.
- 105. Champlot S, Berthelot C, Pruvost M, Bennett EA, Grange T, Geigl E-M. 2010. An efficient multistrategy DNA decontamination procedure of PCR reagents for hypersensitive PCR applications. PLoS One 5:e13042. doi: 10.1371/journal.pone.0013042.
- 106. Klaschik S, Lehmann L, Raadts A, Hoeft A, Stuber F. 2002. Comparison of different decontamination methods for reagents to detect low concentrations of bacterial 16S DNA by real-time-PCR. Mol Biotechnol 22:231–242. doi: 10.1385/MB:22:3:231.
- 107. Corless CE, Guiver M, Borrow R, Edwards-Jones V, Kaczmarski EB, Fox AJ. 2000. Contamination and sensitivity issues with a real-time universal 16S rRNA PCR. J Clin Microbiol 38:1747–1752.
- 108. Glassing A, Dowd SE, Galandiuk S, Davis B, Chiodini RJ. 2016. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog 8:24. doi: 10.1186/s13099-016-0103-7.
- 109.Czurda S, Smelik S, Preuner-Stix S, Nogueira F, Lion T. 2016. Occurrence of fungal DNA contamination in PCR reagents: approaches to control and decontamination. J Clin Microbiol 54:148–152. doi: 10.1128/JCM.02112-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110.Mennerat A, Sheldon BC. 2014. How to deal with PCR contamination in molecular microbial ecology. Microb Ecol 68:834–841. doi: 10.1007/s00248-014-0453-y. [DOI] [PubMed] [Google Scholar]
- 111.Rand KH, Houck H. 1990. Taq polymerase contains bacterial DNA of unknown origin. Mol Cell Probes 4:445–450. doi: 10.1016/0890-8508(90)90003-I. [DOI] [PubMed] [Google Scholar]
- 112.Mohammadi T, Reesink HW, Vandenbroucke-Grauls C, Savelkoul PHM. 2005. Removal of contaminating DNA from commercial nucleic acid extraction kit reagents. J Microbiol Methods 61:285–288. doi: 10.1016/j.mimet.2004.11.018. [DOI] [PubMed] [Google Scholar]
- 113.Lazarevic V, Gaïa N, Girard M, Schrenzel J. 2016. Decontamination of 16S rRNA gene amplicon sequence datasets based on bacterial load assessment by qPCR. BMC Microbiol 16:73. doi: 10.1186/s12866-015-0617-z. [DOI] [PMC free article] [PubMed] [Google Scholar]