Performance of Microbiome Sequence Inference Methods in Environments with Varying Biomass

Vincent Caruso; Xubo Song; Mark Asquith; Lisa Karstens

doi:10.1128/mSystems.00163-18

mSystems. 2019 Feb 19;4(1):e00163-18.

Editor: Sean M Gibbons

PMCID: PMC6381225 PMID: 30801029

KEYWORDS: ASV methods, OTU clustering, bioinformatics, microbiome

ABSTRACT

Microbiome community composition plays an important role in human health, and while most research to date has focused on high-microbial-biomass communities, low-biomass communities are also important. However, contamination and technical noise make determining the true community signal difficult when biomass levels are low, and the influence of varying biomass on sequence processing methods has received little attention. Here, we benchmarked six methods that infer community composition from 16S rRNA sequence reads, using samples of varying biomass. We included two operational taxonomic unit (OTU) clustering algorithms, one entropy-based method, and three more-recent amplicon sequence variant (ASV) methods. We first compared inference results from high-biomass mock communities to assess baseline performance. We then benchmarked the methods on a dilution series made from a single mock community—samples that varied only in biomass. ASVs/OTUs inferred by each method were classified as representing expected community, technical noise, or contamination. With the high-biomass data, we found that the ASV methods had good sensitivity and precision, whereas the other methods suffered in one area or in both. Inferred contamination was present only in small proportions. With the dilution series, contamination represented an increasing proportion of the data from the inferred communities, regardless of the inference method used. However, correlation between inferred contaminants and sample biomass was strongest for the ASV methods and weakest for the OTU methods. Thus, no inference method on its own can distinguish true community sequences from contaminant sequences, but ASV methods provide the most accurate characterization of community and contaminants.

IMPORTANCE Microbial communities have important ramifications for human health, but determining their impact requires accurate characterization. Current technology makes microbiome sequence data more accessible than ever. However, popular software methods for analyzing these data are based on algorithms developed alongside older sequencing technology and smaller data sets and thus may not be adequate for modern, high-throughput data sets. Additionally, samples from environments where microbes are scarce present additional challenges to community characterization relative to high-biomass environments, an issue that is often ignored. We found that a new class of microbiome sequence processing tools, called amplicon sequence variant (ASV) methods, outperformed conventional methods. In samples representing low-biomass communities, where sample contamination becomes a significant confounding factor, the improved accuracy of ASV methods may allow more-robust computational identification of contaminants.

INTRODUCTION

Microbiome research has established the crucial role of microbial communities in many environments, including the important link between human microbial communities at various body sites and a number of disorders, ranging from obesity (1) to irritable bowel disease (2) to Parkinson’s disease (3). While the majority of research has focused on environments with relatively high microbial biomass, such as the human gut, microbial communities are also found at much lower abundance in a variety of other environments. Some examples of low-biomass microbiomes are the urinary tract (4), mucosae of the lungs (5), and blood (6), as well as the built environment (7), including hospitals (8) and spacecraft assembly facilities (9). As with higher-biomass microbiomes, dysbioses of low-biomass microbiomes are also associated with disease, including urgency urinary incontinence (10, 11), cystic fibrosis, and asthma (12). Thus, these low-biomass environments are medically important.

Currently, a common method for profiling microbial communities is to sequence the 16S rRNA gene. Found in all prokaryotes, the 16S rRNA gene consists of hypervariable regions, which serve as barcodes to identify distinct organisms, flanked by highly conserved regions that offer a target for PCR primers to isolate and amplify the region of interest in a wide range of organisms. DNA sequencing reads generated from the 16S region are processed to remove sequencing noise and intraorganism variation, as well as to remove PCR chimeras. Clustering reads into operational taxonomic units (OTUs) has been the de facto standard for sequence inference with 16S rRNA gene sequencing data since at least 2006 (13). With OTU methods, the researcher selects a radius of variability (typically 3%), within which sequence differences are assumed to be due to variation within the taxonomic group or to random sequencer noise. All sequence reads within the chosen radius are clustered into a single OTU, representing one unit of analysis.

Recently, several methods have been published that take a different approach (14–16). These algorithms, which we (and others [17, 18]) refer to as amplicon sequence variant (ASV) methods, attempt to model the error of the sequencer and to cluster reads such that their distribution within clusters is consistent with the error model. This approach avoids making assumptions about the variation within a taxonomic group, a weakness of OTU methods (19). By considering both sequence similarity and abundance in the model, ASV methods account for the error profile that results from next-generation sequencing (NGS) experiments, which may produce tens of thousands of reads for a single 16S rRNA gene template sequence. Hence, ASV methods have the potential to improve both the sensitivity and the specificity of 16S rRNA gene sequence inference compared to OTU methods.

Samples taken from an environment with low microbial biomass present distinct challenges (20, 21), and methods deemed appropriate for high-biomass samples—both in the laboratory and in silico—may not transfer well to low-biomass studies. In dealing with low-biomass samples, there is less starting template DNA for the PCR. Consequently, any contamination from extraction reagents or the laboratory environment makes up a larger fraction of the extracted sample than is the case with high-microbial-biomass samples (20). Additionally, the greater number of PCR cycles typically required with low-biomass samples may produce disproportionate quantities of contaminant sequences, depending on the amplification bias of the primers used (5). In other words, the sequencing of low-biomass microbiome communities suffers from a low signal-to-noise ratio, a problem not encountered in sequencing high-microbial-biomass communities, since contaminating sequences are overwhelmed by the community DNA of high-biomass samples.

In this study, we focused on in silico sequence inference and compared the performance characteristics of several inference methods to provide an unbiased assessment of performance in high-biomass settings, as well as to investigate how the starting DNA concentration of a sample affects the inferred community composition. To do this, we performed two distinct but related experiments. First, we compared selected methods applied to various mock community data sets to establish their performance on high-microbial-biomass samples of varying compositions. We then evaluated the same methods on a dilution series made from a single mock microbial community to see how inference results changed as the starting DNA concentration decreased. We hypothesized that ASV methods would be both more sensitive and more specific than OTU methods, regardless of the starting biomass. We also anticipated that decreasing the starting DNA concentration would lead to an increase in the inference of spurious and contaminant sequences due to the lower signal-to-noise ratio but that the ASV methods would more accurately identify the true contamination present.

While other studies have investigated how sample biomass affects community composition estimates (22–24), to our knowledge, this is the first to have studied the impact of sample biomass on in silico community inference methods.

RESULTS

Experimental design.

Six 16S rRNA read clustering methods were chosen for comparison: two de novo OTU methods (UCLUST and UPARSE), three ASV methods (UNOISE, Deblur, and Divisive Amplicon Denoising Algorithm 2 [DADA2]), and an information-theoretic approach (Minimum Entropy Decomposition [MED]). Only methods that infer ASVs/OTUs de novo were selected, as de novo inference introduces less bias and generally accounts for more of the data. To the extent possible, each inference method was used in its default mode or with default parameters, along with its native chimera-removal function, as this represents the most likely usage by the typical user. Where no native chimera-removal tool existed, UCHIME (25) was used.

To assess the performance of the six selected methods, we first compared the methods on four high-biomass (undiluted) mock community data sets to show the baseline performance of each method on samples representative of high-microbial-biomass communities. Three of these data sets, referred to here as “Kozich,” “Schirmer,” and “D’Amore,” were from previously published studies (26–28), and the fourth data set, which we call “Zymo,” was generated for this study (see Table 1).

TABLE 1.

High-microbial-biomass mock communities

Data set name (reference)	No. of strains	Genomic distribution	No. of raw reads
Kozich (26)	21	Uniform	269.8K
Schirmer (27)	57	Uniform	593.9K
D’Amore (28)	53	Log-normal	262.1K
Zymo	8	Uniform	427.2K

We next evaluated each method’s performance with varying microbial biomass by benchmarking each on a mock community dilution series. The dilution series mimics samples of successively lower biomass and allowed us to observe how each method’s inference results changed as biomass decreased.

Evaluation.

To evaluate the results from each processing method, we classified ASVs/OTUs into five categories, using a scheme similar to that used previously by Edgar (29), Callahan et al. (14), and Nearing et al. (18). ASVs/OTUs that exactly matched a reference sequence from the known community were classified as “Reference” ASVs/OTUs. Those that differed from a more abundant Reference ASV/OTU by up to 10 nucleotides (nt) were labeled “Ref Noisy” ASVs/OTUs, as these likely represented reference-derived ASVs/OTUs incorrectly inferred as distinct due to sequencing errors (technical noise). The remaining ASVs/OTUs were compared to the National Center for Biotechnology Information’s Nucleotide (NT) database (30) using BLAST (31). Those that matched an NT sequence exactly were classified as “Contaminant” ASVs/OTUs, as these likely represented correctly identified contaminating DNA in the sample. ASVs/OTUs that differed from a Contaminant ASV/OTU by up to 10 nucleotides were dubbed “Contam Noisy”. All remaining ASVs/OTUs were labeled “Other” and might include unaccounted-for PCR artifacts (such as chimeras) and sequencing noise.
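The classification scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the function names `n_diffs` and `classify` are hypothetical, the set `nt_exact_hits` stands in for the result of an exact-match BLAST search against NT, and the distance is a crude positional mismatch count rather than an alignment-based comparison.

```python
from typing import Dict, List, Set


def n_diffs(a: str, b: str) -> int:
    """Crude distance: positional mismatches plus any length difference.

    Assumption: a stand-in for an alignment-based nucleotide comparison.
    """
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def classify(abundance: Dict[str, int], references: Set[str],
             nt_exact_hits: Set[str], max_diff: int = 10) -> Dict[str, str]:
    """Assign each inferred sequence to one of the five categories."""
    labels: Dict[str, str] = {}

    # 1. Exact matches to the known community.
    found_refs: List[str] = [s for s in abundance if s in references]
    for s in found_refs:
        labels[s] = "Reference"

    # 2. Within max_diff nucleotides of a MORE abundant Reference sequence.
    for s, count in abundance.items():
        if s in labels:
            continue
        if any(abundance[r] > count and n_diffs(s, r) <= max_diff
               for r in found_refs):
            labels[s] = "Ref Noisy"

    # 3. Remaining sequences with an exact database match.
    contaminants = [s for s in abundance
                    if s not in labels and s in nt_exact_hits]
    for s in contaminants:
        labels[s] = "Contaminant"

    # 4. Within max_diff of a Contaminant; 5. everything else.
    for s in abundance:
        if s in labels:
            continue
        near = any(n_diffs(s, c) <= max_diff for c in contaminants)
        labels[s] = "Contam Noisy" if near else "Other"
    return labels


# Toy data (invented for illustration only).
ref = "ACGT" * 10
noisy = "TT" + ref[2:]            # 2 mismatches from ref
contam = "G" * 40                 # pretend this has an exact NT match
contam_noisy = "G" * 39 + "A"     # 1 mismatch from contam
other = "C" * 40
abundance = {ref: 1000, noisy: 5, contam: 50, contam_noisy: 2, other: 1}
labels = classify(abundance, {ref}, {contam})
```

The order of the passes matters: a sequence is tested against the reference set before the database, so a near-reference sequence is never counted as a contaminant.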

We further summarized results by computing recall and precision for the inferred ASVs/OTUs. Recall measures the proportion of known community members detected by each method, while precision gives the proportion of predicted community members that belong to the known community. Precision was computed in two ways: first, by considering all reported ASVs/OTUs, where all non-Reference results represent false positives (FP); second, by considering only Reference and Ref Noisy results to represent true positives and FP, respectively (technical precision), as there is more ambiguity in the remaining categories and Contaminant ASVs/OTUs represent true positives in some contexts. These statistics give a sense of the accuracy of community diversity estimates. In addition, we computed the proportion of reads mapped to Reference ASVs/OTUs, which measures the overall effect of spurious ASV/OTU detection by an inference algorithm. Finally, we computed observed alpha diversities using three different indices and compared each to expected alpha diversities.
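As a worked illustration of these definitions, the following Python sketch computes recall, overall precision, and technical precision from category counts. The function name `summarize` is hypothetical. Capping recall at 100% when multiple alleles of a strain are detected is an assumption, made so that the result agrees with the values reported in Table 3.

```python
from typing import Tuple


def summarize(n_strains: int, reference: int, ref_noisy: int,
              inferred_total: int) -> Tuple[float, float, float]:
    """Return (recall, overall precision, technical precision) as fractions."""
    # Assumption: extra alleles cannot push recall above 1.0.
    recall = min(reference, n_strains) / n_strains
    # Overall precision: every non-Reference ASV/OTU is a false positive.
    overall = reference / inferred_total
    # Technical precision: only Ref Noisy ASVs/OTUs are false positives.
    technical = reference / (reference + ref_noisy)
    return recall, overall, technical


# Counts taken from Table 2: UPARSE on the Schirmer data set (57 strains).
recall, overall, technical = summarize(57, 46, 1, 77)
print(f"recall={recall:.0%} overall={overall:.0%} technical={technical:.0%}")
```

With the UPARSE counts for the Schirmer data set, the three values round to 81%, 60%, and 98%, matching the corresponding entries in Table 3.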

High-microbial-biomass mock communities. (i) Total inferred ASVs/OTUs.

With the four undiluted, high-biomass mock communities, the total number of distinct ASVs/OTUs inferred by each method varied widely (see Fig. 1). UCLUST reported the largest number of ASVs/OTUs on all data sets, while Deblur reported the fewest (for Zymo and Kozich) or second fewest (for Schirmer and D’Amore). MED found the fewest ASVs/OTUs on the Schirmer and D’Amore data sets but fell in the middle on the Zymo and Kozich data sets. Among the ASV methods, DADA2 detected the most ASVs.

(ii) Classification of ASVs/OTUs.

The inference methods differed in their ability to detect the expected reference strains (Table 2; see also Table 3). All methods recovered nearly all references for the less diverse Zymo and Kozich data sets (8 of 8 for Zymo and at least 20 of 21 for Kozich, representing 100% and 95% recall, respectively), but for the larger Schirmer and D’Amore data sets, the OTU methods detected notably fewer references (46 of 57 for Schirmer and 42 of 53 for D’Amore, representing 81% and 79% recall, respectively). DADA2 and UNOISE detected the greatest number of reference strains in all data sets (96% to 100% recall), closely followed by MED (94% to 100% recall).

TABLE 2.

Number of ASVs/OTUs in each category for the high-microbial-biomass mock communities

Data set	Method	Inferred total	Reference	Ref Noisy^b	Contaminant	Contam Noisy^b	Other
Zymo (8 strains)	UCLUST	200	8	74	47	1	70
	UPARSE	69	8	1	35	0	25
	MED	57	9^a	48	0	0	0
	UNOISE	12	9^a	1	1	0	1
	Deblur	8	8	0	0	0	0
	DADA2	20	9^a	2	5	0	4

Kozich (21 strains)	UCLUST	191	20	42	102	4	23
	UPARSE	101	20	1	75	0	5
	MED	46	22^a	21	3	0	0
	UNOISE	40	21	1	17	0	1
	Deblur	32	20	0	11	0	1
	DADA2	56	22^a	1	31	0	2

Schirmer (57 strains)	UCLUST	185	46	68	28	4	39
	UPARSE	77	46	1	26	0	4
	MED	65	56	3	6	0	0
	UNOISE	78	57	0	20	1	0
	Deblur	71	54	0	16	1	0
	DADA2	88	57	2	28	0	1

D’Amore (53 strains)	UCLUST	66	42	4	16	0	4
	UPARSE	58	42	0	15	0	1
	MED	55	50	2	3	0	0
	UNOISE	59	51	0	8	0	0
	Deblur	56	48	0	8	0	0
	DADA2	66	51	0	15	0	0

^a As some strains have more than one allele, the number of references detected may be greater than the total number of strains.

^b Ref, Reference; Contam, Contaminant.

TABLE 3.

ASV/OTU recall and precision for the high-microbial-biomass mock communities^a

Method	Zymo recall (%)	Zymo overall precision (%)	Zymo technical precision (%)	Kozich recall (%)	Kozich overall precision (%)	Kozich technical precision (%)	Schirmer recall (%)	Schirmer overall precision (%)	Schirmer technical precision (%)	D’Amore recall (%)	D’Amore overall precision (%)	D’Amore technical precision (%)
UCLUST	100	4	10	95	10	32	81	25	40	79	64	91
UPARSE	100	12	89	95	20	95	81	60	98	79	72	100
MED	100	16	16	100	48	51	98	86	95	94	91	96
UNOISE	100	75	90	100	53	95	100	73	100	96	86	100
Deblur	100	100	100	95	63	100	95	76	100	91	86	100
DADA2	100	45	82	100	39	96	100	65	96	96	77	100

^a Precision was calculated in two ways. The first value (overall precision) counts all unexpected (non-Reference) ASVs/OTUs as false positives, whereas the second value (technical precision) counts only technical noise (Ref Noisy) as false positives.

Non-Reference ASVs/OTUs included the Ref Noisy, Contaminant, Contam Noisy, and Other categories described above. There was wide variation in the Ref Noisy category: UCLUST reported high numbers (42 to 74) of Ref Noisy ASVs/OTUs for three of the four mock communities, as did MED (21 to 48) for two data sets, whereas all other methods inferred no more than 2 Ref Noisy results. In general, several Contaminant ASVs/OTUs were detected. UCLUST and UPARSE gave the highest number (15 to 102) of Contaminant ASVs/OTUs, while MED identified the fewest (0 to 6). Among the ASV methods, DADA2 reported the most (5 to 31) and Deblur the fewest (0 to 16) Contaminant results. However, no method identified more than 4 Contam Noisy ASVs/OTUs. The number of inferred Other ASVs/OTUs typically ranged from 0 to 5, but UCLUST found much higher totals (23 to 70) for three communities, as did UPARSE (25) for the Zymo data set.

Owing to the wide variation in numbers of unanticipated, non-Reference ASVs/OTUs, precision varied greatly across methods and data sets (Table 3). Deblur and UNOISE gave relatively high precision (63% to 100% and 53% to 86%, respectively) on all data sets, as did MED on the Schirmer and D’Amore communities (86% and 91%), whereas UCLUST and UPARSE ranked last on all data sets (4% to 64% and 12% to 72%, respectively). MED exhibited the most variation across data sets, ranging from 16% to 91% precision. Technical precision (which counts only Ref Noisy results as false positives) was necessarily higher, with a large increase for UPARSE, but otherwise, the same general trends were observed among the methods.

(iii) ASV/OTU abundance.

To measure the overall impact of the various noise sources on inference with respect to the target community, we computed the percentage of output reads assigned to Reference ASVs/OTUs for each method (see Table 4). For all high-biomass data sets, a large majority of reads (95.6% to 100%) were mapped to the target mock community regardless of the inference method. The proportion of reads assigned to each ASV/OTU category is shown in Fig. S1 in the supplemental material. We also plotted abundance distributions of Reference and non-Reference ASVs/OTUs, presented in Fig. 2, which shows how well the target community and unexpected ASVs/OTUs are separated in terms of signal strength.

TABLE 4.

Percentage of sequence reads mapped to Reference ASVs/OTUs in high-biomass samples

Method	Zymo (%)	Kozich (%)	Schirmer (%)	D’Amore (%)
UCLUST	99.2	98.9	96.6	98.3
UPARSE	99.8	99.1	95.2	98.0
MED	95.6	97.6	96.9	98.3
UNOISE	99.8	99.3	97.0	98.3
Deblur	100.0	99.3	96.9	98.3
DADA2	99.8	99.2	96.9	98.3

FIG 2 — Abundance distributions of Reference and non-Reference ASVs/OTUs for high-biomass communities. Data represent log₁₀-transformed abundance distributions of Reference ASVs/OTUs (those that match the 16S rRNA sequence of a known mock community member) and non-Reference ASVs/OTUs, as inferred by each of the six methods. Box plots show median, interquartile range (IQR), and 1.5 × IQR data. Individual ASV/OTU data points are overlaid on the box plots. Each subplot shows abundance distributions for one of the four high-biomass communities.

FIG S1

Composition of high-biomass samples. Compositions are represented in terms of the relative abundances of reads assigned to ASVs/OTUs in each category for the four high-biomass mock communities. Categories are defined in Materials and Methods. Each panel shows sample compositions for one of the four high-biomass data sets.

This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.

(iv) Alpha diversity.

Shannon, inverse Simpson, and Fisher indices for alpha diversity, computed for each method’s inferred ASVs/OTUs, are plotted in Fig. 3. With the Shannon and inverse Simpson indices, all methods gave the diversity ranking that we would expect, given each community’s known richness and evenness (see Table 1), with somewhat higher diversities for the more sensitive MED, UNOISE, and DADA2 methods. With the Fisher index, only the ASV methods gave the expected ranking, while the other three methods gave inflated values for one or more data sets.
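For reference, the three indices can be computed from a vector of ASV/OTU read counts as in the sketch below. This is a minimal illustration and not the software used in the study; the function names are hypothetical, the Shannon index is assumed to use the natural logarithm, and Fisher's alpha is obtained by numerically solving S = α ln(1 + N/α) with bisection, where S is the number of observed units and N the total read count.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p * ln p) over nonzero proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts: Sequence[int]) -> float:
    """Inverse Simpson index 1 / sum(p^2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def fisher_alpha(counts: Sequence[int], tol: float = 1e-9) -> float:
    """Solve S = a * ln(1 + N / a) for a by bisection."""
    s = sum(1 for c in counts if c > 0)
    n = sum(counts)
    if n <= s:
        raise ValueError("Fisher's alpha needs more reads than observed units")

    def f(a: float) -> float:
        return a * math.log(1.0 + n / a) - s

    lo, hi = 1e-12, 1.0
    while f(hi) < 0:          # f is increasing in a; expand until bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# A perfectly even 8-member community, as expected for the Zymo mock.
even = [125] * 8
print(shannon(even), inverse_simpson(even), fisher_alpha(even))
```

For an even community of 8 members, the Shannon index equals ln 8 and the inverse Simpson index equals 8, which gives the expected values against which inferred diversities can be compared. Fisher's alpha depends only on S and N, so it is sensitive to spurious low-abundance ASVs/OTUs that barely move the other two indices.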

Dilution series of Zymo mock community.

A summary of ASVs/OTUs inferred by each method for a subset of dilution series samples, including classification results, is shown in Table 5. Full results for all samples are given in Table S1 in the supplemental material.

TABLE 5.

Number of ASVs/OTUs in each category for selected samples of the dilution series mock community benchmark

Dilution	Method	Inferred total	Reference	Ref Noisy	Contaminant	Contam Noisy	Other
1:1 (neat) (243.5K reads)	UCLUST	202	8	74	47	0	73
	UPARSE	69	8	1	35	0	25
	MED	57	9	48	0	0	0
	UNOISE	12	9	1	1	1	0
	Deblur	8	8	0	0	0	0
	DADA2	20	9	2	5	1	3

1:9 (282.0K reads)	UCLUST	450	8	62	218	25	137
	UPARSE	288	8	0	197	2	81
	MED	78	9	63	6	0	0
	UNOISE	119	9	0	97	3	10
	Deblur	85	8	0	75	0	2
	DADA2	114	9	2	91	0	12

1:81 (243.5K reads)	UCLUST	336	8	23	200	14	92
	UPARSE	269	8	1	186	2	72
	MED	153	9	65	76	2	1
	UNOISE	449	9	1	277	91	71
	Deblur	339	8	0	237	38	56
	DADA2	261	9	1	195	9	48

1:729 (144.3K reads)	UCLUST	377	8	2	239	37	91
	UPARSE	304	8	0	228	5	63
	MED	570	9	29	349	139	44
	UNOISE	530	9	0	330	123	68
	Deblur	430	8	0	293	68	61
	DADA2	381	9	1	270	49	52

1:6561 (49.4K reads)	UCLUST	195	8	2	127	9	49
	UPARSE	183	8	1	126	2	46
	MED	325	9	24	177	64	51
	UNOISE	267	9	3	161	39	55
	Deblur	226	9	1	152	16	48
	DADA2	193	8	1	129	11	44

TABLE S1

ASVs/OTUs in each class for each method over all nine Zymo mock community dilution series samples.

(i) Total inferred ASVs/OTUs.

As starting microbial biomass decreased, the total number of inferred ASVs/OTUs increased for all methods, dramatically for some (see Fig. 4A). This trend appeared not to hold for the two most dilute samples, but the deviation can be explained by the much lower sequencing depth obtained for these two samples (fewer than 50K reads each, compared to more than 140K reads for each of the other samples). When inferred ASV/OTU totals were normalized by sample read count, the trend of increasing numbers of ASVs/OTUs was observed across the full dilution series (see Fig. S2).
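The effect of normalizing by read count can be illustrated with a short Python sketch. It assumes that normalization means dividing each sample's ASV/OTU total by that sample's read count and rescaling by the median read count across samples. The function name `normalize_totals` is hypothetical, and the inputs are the DADA2 totals and per-sample read counts for the five samples shown in Table 5, used here as a stand-in for the per-method output read counts over all nine samples.

```python
from statistics import median
from typing import List, Sequence


def normalize_totals(totals: Sequence[float],
                     reads: Sequence[float]) -> List[float]:
    """Rescale each total to the median sequencing depth."""
    med = median(reads)
    return [t * med / r for t, r in zip(totals, reads)]


# Samples 1:1, 1:9, 1:81, 1:729, 1:6561 from Table 5 (reads in thousands).
reads_k = [243.5, 282.0, 243.5, 144.3, 49.4]
dada2_totals = [20, 114, 261, 381, 193]

normalized = normalize_totals(dada2_totals, reads_k)
for raw, norm in zip(dada2_totals, normalized):
    print(f"raw={raw:4d}  normalized={norm:7.1f}")
```

On these five samples the raw DADA2 total drops at the most dilute sample, whereas the normalized totals rise monotonically, which is the pattern described for the full series.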

FIG S2

Total ASVs/OTUs inferred by each method at each concentration, normalized by median sample read count. For each dilution sample, the total number of inferred ASVs/OTUs was divided by the total number of output reads for a given method, and the result was then multiplied by the median number of output reads among all samples for that method. Each color corresponds to a different method.

At the highest concentrations (1:1 and 1:3), the ASV methods reported the fewest ASVs/OTUs (8 to 22), with the number of ASVs detected increasing steadily across the dilution series to a peak of 381 to 530 at a 1:729 dilution (the two most dilute samples were an exception, as explained above). With MED, the total numbers reported at higher concentrations were greater than those seen with the ASV methods but lower than those seen with the OTU methods for the undiluted sample, remaining relatively steady over the first four dilution samples (57 to 102 ASVs/OTUs); however, the MED total rose sharply such that the method detected the highest totals (278 to 570) for the three most dilute samples. In contrast, the totals reported by the OTU methods were at the high end for the three highest-concentration samples (69 to 288 for UPARSE and 202 to 450 for UCLUST), with a sharp spike for the 1:9 sample, but their totals leveled off over the rest of the dilution series, with UPARSE reporting the fewest ASVs/OTUs (142 to 304) for the four most dilute samples.