Abstract
The accurate description of a microbial community is an important first step in understanding the roles of its components in ecosystem function. A method for surveying microbial communities termed serial analysis of rRNA genes (SARD) is described here. Through a series of molecular cloning steps, short DNA sequence tags are recovered from the fifth variable (V5) region of the prokaryotic 16S rRNA genes from microbial communities. These tags are ligated to form concatemers comprised of 20 to 40 tags which are cloned and identified by DNA sequencing. Four agricultural soil samples were profiled with SARD to assess the method's utility. A total of 37,008 SARD tags comprising 3,127 unique sequences were identified. A comparison of duplicate profiles from one soil genomic DNA preparation revealed that the method was highly reproducible. The large numbers of singleton tags, together with nonparametric richness estimates, indicated that a significant amount of sequence tag diversity remained undetected with this level of sampling. The abundance classes of the observed tags were scale-free and conformed to a power law distribution. Numerically, the majority of the total tags observed belonged to abundance classes that were each present at less than 1% of the community. Over 99% of the unique tags individually made up less than 1% of the community. Therefore, from either a numerical or diversity standpoint, taxa with low abundance comprised a significant proportion of the microbial communities examined and could potentially make a large contribution to ecosystem function. SARD may provide a means to explore the ecological roles of these rare members of microbial communities in qualitative and quantitative terms.
Microbial communities are typified by extraordinary numbers of cells and species richness (43). A key step in understanding the ecological role of these communities is an accurate census of the community structure and composition. Culture independence is a crucial part of any microbial surveying method, since the overwhelming majority of environmental prokaryotes are not culturable under standard laboratory conditions (3).
Current molecular microbial surveying methods, such as terminal restriction fragment length polymorphism (28), automated ribosomal intergenic spacer analysis (17), and denaturing gradient gel electrophoresis (30), offer fairly quick and inexpensive means to detect a few hundred of the most-abundant taxa in a sample. Comparison of the profiles created with these methods has been a valuable approach for addressing ecological questions about microbial communities, especially in studies where large numbers of samples are involved (reviewed in reference 24).
Another approach for surveying microbial communities has been through DNA sequencing of 16S rRNA gene clones (35). Although this approach provides good phylogenetic resolution, it is not the most efficient method of surveying a complex community, since the majority of the 16S rRNA gene sequence is conserved among prokaryotes (10). Thus, considerable effort is expended sequencing regions of the gene with little information content. An alternative approach, including that taken in this report, has been to focus sequencing resources on a given variable region within the 16S rRNA gene (27, 32, 33, 39). While the details of the methods that use this strategy differ, their common theme is a tradeoff of phylogenetic resolution for sequence throughput.
The need for methods that can be used for deep surveys of microbial communities is exemplified by studies showing that the majority of species are present at very low abundance (1, 5, 12, 42, 49). Moreover, the members of such communities with low abundance are disproportionately affected by environmental stress or disturbances (19, 44). Are these species with low abundance viable, and if so, do they contribute to ecosystem function in meaningful ways? To address these questions, new surveying methods are needed that can detect species with low abundance. A new method, termed SARD for serial analysis of rRNA genes, is described here that enables the detection and quantitation of rare sequences in microbial communities. In agricultural soil samples, this method indicated that, numerically, most of the DNA sequences came from prokaryotic species that were among the least abundant, revealing an unexpected dominance of rare species in the overall microbial population.
MATERIALS AND METHODS
Molecular biological reagents.
Oligonucleotides were obtained from Operon Technologies (Alameda, CA). AmpliTaq Gold DNA polymerase was obtained from Applied Biosystems (Foster City, CA). All other DNA-modifying enzymes, including Taq DNA polymerase, were obtained from New England Biolabs (Beverly, MA).
Soil sampling.
Soil samples were collected from Union Island and Victoria Island in the Sacramento River delta of California in October, 2004. These sites corresponded to locations directly above potential natural gas accumulations identified by three-dimensional seismic surveys. Wells were drilled at these sites within 4 weeks after the samples were collected, and the structures at both locations were found to contain noncommercial levels of hydrocarbons.
The sampling locations were determined with a wide area augmentation system-enabled Garmin GPS V (Olathe, KS) handheld unit. The coordinate measurements were averaged for approximately 10 min to increase accuracy. At each location, a hole was drilled to a depth of 30 cm with a Stihl BT45 gas-powered drill fitted with a 3.5-cm-diameter ship auger bit (no. 47422; Irwin). Core samples were collected at the bottom of the hole with a 2.5-cm-diameter by 30-cm AMS soil core sampler tool (American Falls, ID). The core sampler had a replaceable tip that was changed between samples. The tool was modified by adding a 1-in.-diameter by 8-in. (2.5 by 20 cm) plastic plug such that 1-in. by 4-in. (2.5 by 10 cm) core samples were collected in a plastic liner. Following collection, the core samples were capped and maintained on ice until they arrived at the laboratory for DNA extraction.
Soil DNA extractions.
Soil core samples were extruded from the sampling sleeves and were dissected to remove the outer portions that had been in contact with the liner or sampler tip. Genomic DNA was extracted from about 0.5 g soil from each sample with a FastDNA spin kit for soil in a FastPrep FP120 instrument (MP Biomedicals, Irvine, CA), except that the components for the lysis matrix were obtained from CeroGlass (Columbia, TN). The bead-beating matrix consisted of one 4-mm glass bead (GSM-40), 0.75 g 1.4- to 1.6-mm zirconium silicate beads (SLZ-15), and 1.0 g 0.07- to 0.125-mm zirconium silicate beads (BSLZ-1). The beads were prepared by acid washing with HCl and HNO3 followed by neutralization with extensive water washings and autoclaving. Prechilled samples were disrupted by bead beating in a FastPrep instrument at 6.5 m/s for 45 s. Subsequent steps were performed according to the kit instructions.
SARD profiling.
The following oligonucleotides were used in SARD profiles: TX9, 5′-[BioTEG]-GGATTAGAWACCCBGGTAGTC-3′; 1391R, 5′-GACGGGCRGTGWGTRCA-3′ (45); TX12, 5′-[Phos]CTCCAGGTCTACATCCTAGTCAGGAC[23-ddC-Q]-3′; TX13, 5′-ATAGGTCCTGACTAGGATGTAGACCTGGAG-3′; TX14, 5′-[Phos]CTCCAGACTAGCATCCGCTGACTTGA[23-ddC-Q]-3′; TX15, 5′-AATGTCAAGTCAGCGGATGCTAGTCTGGAG-3′; TX131, 5′-[BioTEG]-GTACGATTACTCGATAGTCACGGTCCTGACTAGGATGTAGACC-3′; and TX141, 5′-[BioTEG]-GGATATACTCAGGTTGCAACGGTCAAGTCAGCGGATGCTAGTC-3′.
Primers TX9 and 1391R were used to PCR amplify a 600-bp region of the 16S rRNA gene from 600 ng soil genomic DNA in a 400-μl reaction mixture volume as outlined in Fig. 1. The numbers of PCR-amplifiable 16S rRNA genes with these primers in 600 ng of soil genomic DNA were determined by quantitative PCR to be 2.8 × 10⁷, 2.2 × 10⁷, 7.3 × 10⁷, and 2.4 × 10⁷ for samples WP43, WP45, Pol-NE, and Pol-W, respectively.
FIG. 1.
Serial analysis of rRNA genes (SARD). A conserved AluI restriction enzyme recognition site is located immediately downstream of the fifth variable (V5) region of the bacterial 16S rRNA gene. Step I, “universal” bacterial PCR primers are employed to create amplicons flanking this region from environmental genomic DNA. The amplicons, containing 5′ biotin groups, are digested with AluI. Step II, the 5′-most AluI restriction fragments are immobilized on magnetic streptavidin-coated (SA) beads. Step III, the beads are split into two pools and a unique double-stranded adapter is ligated to each pool. Step IV, the adapters, including short sequence tags, are released from the beads by digestion with BpmI. Step V, the 3′ overhangs of the released fragments are removed with the Klenow fragment and the products are recombined. Step VI, DNA fragments are ligated head-to-head, PCR amplified with primers specific to the unique adapters, and cleaved with the restriction enzyme FokI. Step VII, the resulting 28-bp ditags, possessing 4-bp AGCT 5′ overhangs, are purified and ligated to form concatemers. Step VIII, concatemers comprising 20 or more SARD tags are purified and cloned into the HindIII site of pUC19.
The PCR conditions for the SARD library included 7.5 min at 95°C followed by 20 to 24 cycles (depending upon the sample) of 30 s at 94°C, 30 s at 55°C, and 1 min at 72°C. The number of cycles was kept to a minimum to reduce PCR-dependent biases and errors resulting from misincorporation of nucleotides. The resulting amplicon was agarose gel purified, recovered with a QIAquick gel extraction kit (QIAGEN, Valencia, CA), and cleaved with AluI. The ∼80-bp restriction digest band, including fragments corresponding to 40 to 120 bp in size, was purified on a 10% polyacrylamide gel electrophoresis-Tris-borate-EDTA (PAGE-TBE) gel. The acrylamide pieces were fragmented by passage from a 0.5-ml microcentrifuge tube, with a hole pierced in the bottom with a 21-gauge needle, into a 1.5-ml microcentrifuge tube by centrifugation at 13,000 × g for 2 min. The excised DNA fragments were extracted from the acrylamide gel by diffusion in TEN buffer (10 mM Tris HCl, pH 8, 1 mM EDTA, 50 mM NaCl) at 55°C for 20 min. The buffer-acrylamide mixture was transferred to a Spin-X tube (no. 8163; Corning Inc., Corning, NY). The buffer containing DNA was separated from the acrylamide by centrifugation.
The ∼80-bp AluI fragments were immobilized by binding to Dynal M-280 streptavidin beads (Invitrogen, Carlsbad, CA) in 1× BW buffer (5 mM Tris HCl, pH 7.5, 0.5 mM EDTA, 1.0 M NaCl). The beads were washed twice in 1× BW buffer and twice in wash buffer (10 mM Tris HCl, pH 8, 10 mM MgSO4, 50 mM NaCl). During the final wash, the beads were split into two pools and the buffer was removed while the beads were positioned next to the magnetic tube holder. Each bead pool was resuspended in a ligation mixture containing T4 DNA ligase, buffer, and one of the two double-stranded adapters, TX12/13 or TX14/15. The ligations were incubated overnight at 16°C, followed by heat inactivation at 65°C for 15 min. The beads were then washed twice in 1× BW buffer and twice in wash buffer.
SARD tags were cleaved from the streptavidin beads by incubation with 2.5 U BpmI for 2 h at 37°C. Released tags were transferred to a new tube, and the enzyme was heat inactivated at 65°C for 20 min. The 3′ overhangs were removed by incubation with 2.5 U Klenow fragment in the presence of 200 μM deoxynucleoside triphosphates at 25°C for 30 min. The split pools were recombined, and the enzyme was heat inactivated at 70°C for 30 min. The SARD tag adapters were ligated together to form ditags by adding rATP to 1 mM and 200 U of T4 DNA ligase and incubating overnight at 16°C.
The ditags were amplified in two steps, utilizing the primers TX131 and TX141, to prepare an adequate amount for concatenation. Prior to each step, the number of cycles that would lead to the greatest amount of 120-bp amplicon product without the appearance of artifact bands or smears was determined empirically in small-scale (15 μl) reaction mixtures. PCR conditions for the first step consisted of 94°C for 5 min followed by 10 cycles of 94°C for 20 s, 55°C for 30 s, and 72°C for 30 s followed by 6 to 10 cycles (depending upon the sample) of 94°C for 20 s and 72°C for 45 s. Ditags were amplified with AmpliTaq Gold DNA polymerase in three 100-μl reaction mixtures. The amplified ditags were purified on 10% PAGE-TBE gels and served as the template for the second amplification. PCR conditions for the second step included 94°C for 2 min followed by 6 to 10 cycles of 94°C for 30 s and 72°C for 1 min. The ditags were amplified with Taq DNA polymerase (New England Biolabs) in 96 100-μl reaction mixture flasks (9.6 ml total) in the second large-scale PCR step. The amplified ditags were purified on 10% PAGE-TBE gels.
The PAGE-purified large-scale ditag preparations were digested with 240 U of FokI for 4 h at 37°C to release the adapters. The 28-bp ditags were purified by two rounds of PAGE-TBE purification, first on a 12% gel and then on a 16% gel. DNA was recovered from the acrylamide gel by fragmentation and diffusion as described above, phenol-chloroform extracted, and ethanol precipitated.
The purified 28-bp ditags were concatenated by resuspending ethanol precipitates in a ligation mixture including 1,000 U of T4 DNA ligase. The ligations were incubated overnight at 16°C. The reaction mixtures were heated to 65°C for 15 min, followed by incubation on ice for 15 min as described previously (23). The ditag concatemers were ethanol precipitated and purified on 8% PAGE-TBE gels. DNA fragments corresponding to 300 to 700 bp were excised and purified by fragmentation/diffusion followed by ethanol precipitation. The resulting ditag concatemers were ligated into the HindIII site of pUC19 and transformed into chemically competent Escherichia coli DH10B cells (Invitrogen, Carlsbad, CA).
Bacterial transformants were plated on medium containing 5-bromo-4-chloro-3-indolyl-β-d-galactopyranoside for blue/white screening to identify cells harboring plasmids with inserts. Cells from white colonies were further screened for clones with large inserts by colony PCR utilizing M13 forward and reverse primers. Amplicons were resolved on 1.5% agarose gels. Clones with insert sizes of 300 to 600 bp (20 to 40 SARD tags) were selected for sequencing.
SARD clone DNA sequencing.
Sequencing of the SARD clones was performed by using a modification of the BigDye terminator method used for microbial genome sequencing at The Institute for Genomic Research (18, 40). Briefly, the sequencing reaction used the M13 forward primer only, and a sequencing buffer containing sucrose and betaine (final concentrations, 80 mM Tris HCl, pH 9.0, 2 mM MgCl2, 2% sucrose, 0.75 M betaine) was substituted for the original sequencing buffer. Finally, the following cycling conditions were used: 98°C for 2 min, 98°C for 10 min to 50°C for 5 min to 70°C for 4 min (40 cycles).
SARD profile analysis.
A software program was written to extract the SARD tag sequences from the raw sequence files and convert the data into a tab-delimited file that could be further manipulated in a spreadsheet program such as Microsoft Excel. Any SARD tags that were too short or that were more than 1 nucleotide too long were discarded. Tags that were 1 nucleotide too long were trimmed to the correct length and included in the analyses. These tags likely resulted from incomplete removal of the 3′ overhang by the Klenow enzyme prior to the head-to-head tag ligation to form ditags. A small number of tags (<0.4%) that were either presumptive cloning artifacts derived from TX131/TX141 or were from a downstream, conserved AluI site at position 1067 (E. coli numbering) of the 16S rRNA gene were filtered out of the tag set. The program reported additional numerical features of the profiles, including, for each tag, the minimum number of nucleotide position changes necessary to equal another tag sequence. SARD tag richness and diversity estimates were made with the EstimateS software package, version 7.5 (9). To construct the rarefaction plots and Chao1 estimates, the SARD data were randomized a total of 50 times.
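The per-tag "minimum number of nucleotide position changes necessary to equal another tag sequence" can be read as a nearest-neighbor Hamming distance. The following is a minimal sketch under that reading; the function names are hypothetical and this is not the program used in the study.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_distance(tags):
    """Map each unique tag to its smallest Hamming distance to any other tag.

    A tag with no other tag to compare against maps to None.
    """
    unique = sorted(set(tags))
    result = {}
    for tag in unique:
        distances = [hamming(tag, other) for other in unique if other != tag]
        result[tag] = min(distances) if distances else None
    return result
```

A tag whose nearest neighbor is one substitution away from a much more abundant tag is a candidate PCR or sequencing artifact, which is one plausible use of this statistic.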
16S rRNA gene libraries.
Approximately 600 bp of the bacterial 16S rRNA genes were PCR amplified with the same primers used for SARD profiling (TX9 and 1391R), except that neither primer had a biotin modification and both primers possessed 5′ phosphate groups to facilitate cloning. The number of PCR cycles was kept to 20 to 22 cycles to minimize amplification-dependent biases. Amplicons were purified on agarose gels with a QIAquick gel purification kit (QIAGEN, Valencia, CA).
PCR products were ligated nondirectionally into the SmaI site of pUC19 and were transformed into E. coli DH10B cells (Invitrogen, Carlsbad, CA). Transformants were plated onto LB agar-ampicillin plates containing 5-bromo-4-chloro-3-indolyl-β-d-galactopyranoside for blue/white screening. White colonies were picked and grown in LB-ampicillin medium. Plasmid template DNA was prepared by a modified alkaline lysis method. The 16S rRNA gene nucleotide sequences of the clone inserts were determined by using BigDye terminators (Applied Biosystems, Foster City, CA) and 3.2 pmol of M13F sequencing primer as described previously (20). Sequences were analyzed on ABI 3730xl sequencers (Applied Biosystems, Foster City, CA) and trimmed to remove the vector sequence.
Phylogenetic analysis.
16S rRNA gene sequences were aligned with ClustalX, version 1.81 (41). Unrooted phylogenetic trees were created with the PHYLIP package of programs, version 3.63 (16), including DNADIST and NEIGHBOR, where the output file of one program served as the input for the next program. The evolutionary distances were computed with DNADIST using the Kimura two-parameter model. The trees were edited using RETREE. The bootstrap values for the nodes were determined from 1,000 iterative analyses with the SEQBOOT, DNADIST, NEIGHBOR, and CONSENSE programs.
Nucleotide sequence accession numbers.
Partial 16S rRNA gene sequences were deposited in the GenBank database (accession nos. EF600115 to EF600228 [Pol-NE], EF600229 to EF600332 [Pol-W], EF600333 to EF600441 [WP43], and EF600442 to EF600552 [WP45]).
SARD data accession numbers.
SARD tag sequences and abundance data were deposited in the NCBI Gene Expression Omnibus (GEO) site located at http://www.ncbi.nlm.nih.gov/geo (accession no. GSE8119).
RESULTS
SARD library construction.
The test samples comprised four soil samples collected from a depth of about 40 cm from the Sacramento River delta area of California. The four samples were collected as two pairs. The two pairs were located about 7 km apart, and the two samples within each pair were collected about 25 m apart. Two of the samples were located within an irrigated asparagus field (WP43 and WP45), and two were located within an alfalfa field (Pol-W and Pol-NE). Genomic DNA extracted from these soil samples served as the starting material both for SARD profiles and 16S rRNA gene clone libraries.
SARD was designed to recover a 16-bp sequence tag from the fifth variable region (V5) of the bacterial 16S rRNA gene (4). The method begins by PCR amplification of a subregion of the 16S rRNA gene at positions 790 to 1391 (Fig. 1). The location where the SARD tag is recovered from the 16S rRNA gene is defined by the first occurrence of an AluI restriction site downstream of the forward primer (TX9). The forward primer is complementary to a conserved DNA sequence immediately upstream of the V5 region. In most cases, the first AluI site downstream of TX9 occurs at position 860 (E. coli numbering), which is located at the downstream junction of the V5 region and the subsequent conserved region (10). As described below, the AluI site was found to be conserved at this location (or within the V5 region) in 70 to 80 percent of 16S rRNA gene sequences examined.
The forward primer, TX9, was biotinylated to facilitate binding to a solid support following digestion with AluI. However, prior to binding to the streptavidin beads, the AluI-digested PCR products were PAGE purified to recover only those fragments where the AluI site was located within or adjacent to the V5 region. These fragments were then bound to a streptavidin solid support, which resulted in the V5 variable sequence being exposed at the free end of the bound DNA. Double-stranded DNA adapters that include recognition sites for the type IIS restriction enzyme BpmI were ligated to the bound DNA. Type IIS restriction enzymes cleave DNA at a distance from their recognition sites. Thus, cleavage of these bound DNA fragments with BpmI released the adapter sequences, including 12 bp of variable sequence from the V5 region.
In a series of subsequent enzymatic modifications, the 3′ overhangs of the released sequence tags were removed with the Klenow fragment, and the resulting blunt end tags were ligated together to form ditags. Following amplification and PAGE purification, the adapter sequences were released from the ditags by cleavage with a second type IIS restriction enzyme, FokI, whose recognition site was also located within the adapter sequences. FokI digestion resulted in 28-bp ditags that possessed 4-bp AGCT 5′ overhangs on each end. These ditags were purified and ligated together to form concatemers. These concatemers were cloned into the HindIII site of pUC19.
The SARD tag concatemers were free from adapter sequences, which increased the throughput of tags identified per sequencing run. The ditags that comprise the concatemers were themselves the result of head-to-head ligation of individual tags. Therefore, the SARD tags were arranged within the concatemers on alternating DNA strands separated by an AluI site every 28 bp. Each tag consisted of 12 bp of variable sequence plus the 4-bp AluI site. A software program was written to identify and extract SARD tags from the raw sequence data.
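The concatemer layout described above (an AluI site every 28 bp, with two head-to-head 12-bp variable segments between consecutive sites) suggests how tags could be pulled from a sequencing read. The sketch below is an illustration only, not the authors' program. It assumes that the first 12 bp of each ditag lies on the opposite strand and must be reverse complemented, and it silently skips ditags that are not exactly 24 bp between sites; all names are hypothetical.

```python
import re

ALUI_SITE = "AGCT"
VARIABLE_LEN = 12  # bp of V5 sequence per tag, as stated in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# One ditag: an AluI site, 24 bp (two 12-bp variable parts), then the next site.
# The lookahead leaves the closing site available as the start of the next ditag.
_DITAG = re.compile(r"AGCT([ACGT]{24})(?=AGCT)")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(read: str):
    """Return 16-bp tags (12 bp variable + AGCT) found in a concatemer read."""
    tags = []
    for match in _DITAG.finditer(read.upper()):
        ditag = match.group(1)
        left, right = ditag[:VARIABLE_LEN], ditag[VARIABLE_LEN:]
        # Assumption: the left half is the other tag read on the opposite strand.
        tags.append(revcomp(left) + ALUI_SITE)
        tags.append(right + ALUI_SITE)
    return tags
```

Because AGCT is its own reverse complement, each tag can be reported in a single orientation regardless of which strand of the concatemer was sequenced.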
A total of 37,008 SARD tags were identified from the four soil sample genomic DNA preparations (Table 1). These tags comprised 3,127 unique tag sequences. The number of times a tag sequence is present in a sample is expected to reflect the abundance of the corresponding 16S rRNA gene sequence in the sample community. Most unique SARD tags were observed only once (singletons) or twice (doubletons) in each sample, indicating extensive tag richness in these samples and that the profiles had captured only a fraction of the tag diversity present.
TABLE 1.
SARD profile summary
| Sample | Total no. tags | No. unique tags | Total no. tags at <1% abundance (%) | No. unique tags at <1% abundance (%) | No. singletons | No. doubletons |
|---|---|---|---|---|---|---|
| Pol-W | 7,625 | 819 | 2,609 | 808 | 515 | 98 |
| Pol-NE | 10,062 | 1,045 | 4,876 | 1,031 | 602 | 141 |
| WP43 | 8,120 | 1,143 | 4,331 | 1,130 | 791 | 115 |
| WP45 | 11,201 | 1,253 | 5,777 | 1,234 | 810 | 148 |
| Total^a | 37,008 | 3,127 | 19,910 (53.8)^b | 3,110 (99.5)^c | 2,017 | 366 |
^a Determined by combining tags from the four profiles and recalculating the results.
^b Total number of tags at less than 1% abundance divided by the total number of tags.
^c Total number of unique tags at less than 1% abundance divided by the total number of unique tags.
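The columns of Table 1 can be derived from a mapping of tag sequence to observed count. The following sketch shows one way to do so; the function and key names are hypothetical, and the 1% threshold is applied to each tag's share of the total tags in the profile.

```python
def profile_summary(tag_counts, rare_fraction=0.01):
    """Summarize a SARD profile given a {tag_sequence: count} mapping."""
    total = sum(tag_counts.values())
    # Tags whose individual share of the profile is below the threshold.
    rare = {t: n for t, n in tag_counts.items() if n / total < rare_fraction}
    return {
        "total_tags": total,
        "unique_tags": len(tag_counts),
        "rare_total": sum(rare.values()),
        "rare_unique": len(rare),
        "singletons": sum(1 for n in tag_counts.values() if n == 1),
        "doubletons": sum(1 for n in tag_counts.values() if n == 2),
    }
```

Applied to the combined totals in Table 1, the two percentages follow directly: 19,910 of 37,008 tags is 53.8%, and 3,110 of 3,127 unique tags is 99.5%.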
The extent of a SARD survey is determined by the number of tags per concatemer and the number of concatemers that are sequenced. Since typically only a fraction of the SARD clones in a library are sequenced, the extent of coverage of a SARD library can be increased by additional sequencing of the same library. The sensitivity of SARD is thus dependent upon the level of sequencing that is performed. Additional sequencing of SARD clones from these samples would be expected to have revealed a greater proportion of the tag sequences present.
A total of five SARD profiles were conducted from soil genomic DNA extracted from four samples. Two profiles were conducted from the same genomic DNA, prepared from the sample Pol-W, to assess the reproducibility of SARD (Fig. 2). As expected, the least-abundant tags (those seen the fewest times) showed the most variability between the duplicate profiles. Since the data are plotted on a log-log scale, only those tags that were present in both SARD profiles were included. The Pearson correlation from this comparison (r² > 0.99) and the plots on a linear scale obtained from the entire data set (data not shown) were similarly robust.
FIG. 2.
Reproducibility of SARD profiles. Data from two separate SARD profiles of the same sample are plotted against one another. Values are shown only for tags that were observed in both profiles. The symbol sizes reflect the number of coincident tags that overlapped on the plot as a result of occurring at the same abundance levels. The largest symbol corresponds to 45 tags that were seen once in both profiles.
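A comparison of duplicate profiles of this kind can be sketched as a Pearson r² over the tags observed in both profiles. The text does not state whether the reported correlation was computed on raw or log-transformed counts, so the sketch below exposes that choice as a hypothetical `log_scale` parameter; it is an illustration, not the analysis that was run.

```python
import math


def pearson_r_squared(profile_a, profile_b, log_scale=False):
    """Squared Pearson correlation of counts for tags present in both profiles.

    profile_a and profile_b map tag sequence -> observed count.
    """
    shared = sorted(set(profile_a) & set(profile_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared tags")
    xs = [profile_a[t] for t in shared]
    ys = [profile_b[t] for t in shared]
    if log_scale:
        xs = [math.log10(x) for x in xs]
        ys = [math.log10(y) for y in ys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("counts are constant in one profile")
    return sxy * sxy / (sxx * syy)
```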
Error rate estimates.
Some fraction of the SARD tags identified in these profiles were expected to contain incorrect sequences that could have resulted either from Taq polymerase errors during PCR or from errors during DNA sequencing. Knowing what level of artifact contamination existed in the SARD profiles was important, since these tags will contribute to the numbers of rare tags and will thus influence richness estimates. While the exact number of SARD tag artifacts was unknown, some estimates could be made.
One approach for estimating the number of tag artifacts is through published error rate determinations. Taq polymerase misincorporates nucleotides at a rate of 8 × 10⁻⁶/bp/duplication (8). Therefore, under the conditions used in this study, <180 of the tags identified in this study were expected to be artifacts as a result of Taq replication errors. DNA sequencing errors, which can be discriminated during automated sequencing by using Phred scores (13, 14), were expected to have resulted in <550 tag artifacts. Taken together, the total number of tag artifacts was expected to comprise <730 (2%) of the ∼37,000 tags identified.
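The arithmetic behind an estimate of this kind can be sketched as follows. The error rate is taken from the text; the tag length and number of duplications passed to the function are hypothetical inputs, since the text does not state the values used to arrive at the <180 bound.

```python
TAQ_ERROR_RATE = 8e-6  # misincorporations per bp per duplication, from the text


def expected_taq_artifacts(n_tags, tag_length_bp, duplications, rate=TAQ_ERROR_RATE):
    """Approximate number of tags carrying at least one Taq error.

    Uses the small-probability approximation
    p(error in a tag) ~ rate * tag_length_bp * duplications.
    """
    return n_tags * rate * tag_length_bp * duplications


def artifact_fraction(taq_artifacts, sequencing_artifacts, n_tags):
    """Combined artifact bound expressed as a fraction of all tags."""
    return (taq_artifacts + sequencing_artifacts) / n_tags
```

With the bounds quoted above, `artifact_fraction(180, 550, 37008)` is about 0.0197, which matches the stated 2%.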
Community structure assessment.
Rank abundance plots were used to assess the SARD tag diversity of the four agricultural soil samples (Fig. 3). In this and subsequent experiments, the data from the two SARD profiles from Pol-W were combined into a single data set. The SARD profiles were found to be comprised of a large number of rare sequence tags and a small number of abundant tags. The Simpson's reciprocal diversity index values were similar for three of the four samples. The lower diversity value for the fourth sample, Pol-W, could be attributed to the presence of two abundant SARD tag sequences that were present at levels of 16 and 17 percent of the sample, respectively. The most-abundant tag found in any of the other three samples was present at 11 percent.
FIG. 3.
Rank abundance plot of SARD tag sequences from four soil samples. Boxed values indicate Simpson's reciprocal diversity index (1/D) values.
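For readers unfamiliar with the index shown in Fig. 3, Simpson's reciprocal diversity index (1/D) can be computed from tag counts as sketched below. This uses the finite-sample form D = Σ n(n − 1) / [N(N − 1)]; whether this exact form matches the one applied by EstimateS in this study is an assumption.

```python
def simpson_reciprocal(counts):
    """Simpson's reciprocal index 1/D from a list of per-tag counts.

    D is the probability that two tags drawn without replacement are identical.
    """
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("need at least two observations")
    d = sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))
    if d == 0:
        return float("inf")  # every tag is a singleton
    return 1.0 / d
```

A profile dominated by one or two tags gives a low 1/D, which is consistent with the lower value reported for Pol-W, where two tags together made up about a third of the sample.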
The same SARD tag abundance data were used to examine abundance class distributions. Histograms of SARD tag abundance class data were made where the width of each bin size was adjusted through logarithmic binning (34). Plots of the abundance classes as a function of the proportion of all tags that each class comprised showed a linear relationship on a log-log scale (Fig. 4). Thus, the abundance classes were found to be scale-free and to follow a power law distribution. Similarly, the data fit a power law form when plotted as cumulative Zipf distributions (not shown).
FIG. 4.
Abundance class distributions of SARD data. SARD tag abundance data plotted as histograms with logarithmic binning.
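The logarithmic binning and log-log linearity described for Fig. 4 can be illustrated with a short sketch. It assumes bins that double in width ([1, 2), [2, 4), [4, 8), ...) and an ordinary least-squares fit on log-transformed values; the bin base and fitting method used in the study are not stated, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def log_binned_shares(counts):
    """Group per-tag counts into doubling-width abundance classes.

    Returns (lower_edge, unique_tags, share_of_all_tags) per non-empty class.
    """
    total = sum(counts)
    unique = defaultdict(int)
    tags = defaultdict(int)
    for n in counts:
        k = n.bit_length() - 1  # floor(log2(n)) for a positive integer
        unique[k] += 1
        tags[k] += n
    return [(2 ** k, unique[k], tags[k] / total) for k in sorted(unique)]


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

A straight line on the log-log plot corresponds to a constant slope, which is the exponent of the power law.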
Richness estimates.
SARD profiles are based on sampling individual tags in a population where the number of times any given tag is seen is proportional to its abundance in the original sample. To determine whether the sampling effort of the SARD profiles had captured a significant fraction of the different tags present in the samples, the SARD data were plotted as accumulation curves. These accumulation curves were computed using resampling of the data with a total of 50 randomizations without replacement with the program EstimateS (9). In addition, predictions of the total tag richness from these samples were made by using the nonparametric richness estimator Chao1 (6, 7).
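An accumulation curve built by randomization without replacement can be sketched as follows. This is not the EstimateS implementation; it simply shuffles the pool of observed tags a fixed number of times and averages the number of unique tags seen after each draw. The function name and the `seed` parameter are hypothetical.

```python
import random


def accumulation_curve(tag_counts, randomizations=50, seed=0):
    """Mean number of unique tags seen after 1, 2, ..., N draws.

    tag_counts maps tag sequence -> observed count. Each randomization
    is a shuffle of all N observations (sampling without replacement).
    """
    pool = [tag for tag, n in tag_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    sums = [0] * len(pool)
    for _ in range(randomizations):
        rng.shuffle(pool)
        seen = set()
        for i, tag in enumerate(pool):
            seen.add(tag)
            sums[i] += len(seen)
    return [s / randomizations for s in sums]
```

A curve that is still rising at the final point indicates that further sampling would continue to recover new tags.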
In the case of the Pol-NE and Pol-W samples, the accumulation plots and Chao1 estimates from the two samples were essentially indistinguishable from one another. In contrast to the observed richness in the accumulation plots, the Chao1 richness estimate for WP43 was significantly higher than that for the WP45 sample. The difference between these samples did not become significant at the 95% confidence level until the sample size had reached about 5,000 tags. Despite similar accumulation plots in the four samples, the Chao1 richness estimates varied by nearly a factor of two between the alfalfa field (Pol-NE and Pol-W) and asparagus field (WP43 and WP45) sampling locations (Fig. 5). Importantly, the richness estimates appeared sample-size dependent and did not reach a plateau. Therefore, additional sampling would be expected to lead to higher richness estimates.
FIG. 5.
Observed and estimated SARD tag richness from four agricultural soil samples. The bars show the 95% confidence intervals provided for the Chao1 richness estimates. The plots were made with EstimateS (9) following 50 randomizations. For clarity, only every 300th data point is plotted.
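The Chao1 estimator depends only on the observed richness and the numbers of singletons and doubletons. The sketch below uses the classic form, S_obs + F1²/(2·F2), with the bias-corrected form as a fallback when there are no doubletons. EstimateS may apply a different variant, so values from this sketch are illustrative and are not the estimates plotted in Fig. 5.

```python
def chao1(observed, singletons, doubletons):
    """Classic Chao1 richness estimate.

    observed   -- number of unique tags seen (S_obs)
    singletons -- number of tags seen exactly once (F1)
    doubletons -- number of tags seen exactly twice (F2)
    """
    if doubletons == 0:
        # Bias-corrected form, which stays defined when F2 is zero.
        return observed + singletons * (singletons - 1) / 2.0
    return observed + singletons ** 2 / (2.0 * doubletons)
```

Applied to the Table 1 counts, this form gives roughly 3,470 for WP45 (1,253 unique, 810 singletons, 148 doubletons) and roughly 2,330 for Pol-NE (1,045 unique, 602 singletons, 141 doubletons), which shows how a large singleton count drives the estimate well above the observed richness.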
SARD tag occurrence in 16S rRNA genes.
To determine how well a SARD profile represents a bacterial community, a 16S rRNA clone library was created from the sample WP45 for comparison. The clone library was constructed by PCR amplification of an approximately 600-bp region of the 16S rRNA genes with the same primers (TX9/1391R) and from the same soil genomic DNA preparation that was used for the SARD profiles. These PCR amplicons were ligated into the pUC19 vector. The DNA sequences from 110 clones passed quality control (long, high-quality sequencing read and nonchimeric) and were the subject of subsequent analysis. Each sequence was examined for the presence of a SARD tag that would be expected to be recovered in a SARD profile. Eighty-five of the 110 clone sequences (77%) possessed an AluI restriction site within or adjacent to the V5 region and, therefore, would be expected to be identified in a SARD profile. There were a total of 53 different SARD tags identified out of the 85 tags present in the clone sequences. Of these tags, 49 (92%) were also identified in the corresponding SARD profile. In summary, there were 110 total sequences, 85 informative tags, 53 unique tags, and 49 tags in common in the 16S rRNA clone sequences from WP45. The four tags not seen in the WP45 SARD profile were observed once (singletons) in the clone sequences, and thus, the actual abundance of these sequences in the community may be very low and could explain their absence in the SARD profiles.
Two factors can limit the resolution of SARD profiles: (i) different 16S rRNA genes may share the same SARD tag sequence, and (ii) some 16S rRNA genes may not produce an informative SARD tag. To understand how these limitations affect SARD profiles, a phylogenetic tree was constructed from the WP45 16S rRNA clones and was annotated with the corresponding SARD tags (Fig. 6). Sixteen of the 53 SARD tags identified in the 16S rRNA clones occurred in more than one clone sequence. In these cases, where different 16S rRNA clones possessed the same SARD tag sequence, the 16S gene sequences were significantly similar to one another and were members of the same bacterial division, except in two cases. These were PT0015 and PT0023. The PT0015 tag was found in a Betaproteobacteria clone and an Acidobacteria clone. The PT0023 tag was likewise found in clone sequences from two divisions, Chloroflexi and WS3. Taken together, the average distance between 16S rRNA clones identified in this study that encoded the same SARD tag was 3.3%.
FIG. 6.
Unrooted phylogenetic tree of 16S rRNA gene sequences from sample WP45. Bootstrap values of 50 percent or greater are indicated. Identification numbers for SARD tags found in the 16S rRNA sequences are indicated to the right. The names of 16S rRNA sequences that possess the same SARD tag sequence are shaded. The corresponding SARD tag identification numbers are grouped by vertical bars. GenBank accession numbers for reference sequences utilized to deduce bacterial division affiliations are given in italics. SARD tags that occur in sequences of different division affiliations are indicated by an asterisk. NIT, not informative tag. The scale bars represent distances of 10 percent.
Not all 16S rRNA genes possess a recoverable and informative SARD tag. For example, about 70% of 16S rRNA sequences in an ARB database had an AluI restriction site within or adjacent to the V5 region and would be expected to produce an informative tag (19). For most of the remaining sequences, a SARD tag will still be recovered from an AluI site located at position 1067 (E. coli numbering). Any SARD tag produced from this region will have little information content, since this region is conserved. For this reason a step was included in the SARD library construction to gel purify only those terminal AluI restriction fragments that occur within or adjacent to the V5 region (Fig. 1, Steps I and II).
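The in silico check described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in this study: the function names, the toy sequences, and the window coordinates (`toy_window`) are hypothetical, and the sketch assumes that each amplicon is read 5' to 3' starting at the forward primer so that the first AGCT encountered is the site that defines the terminal fragment.

```python
from typing import Optional, Tuple

ALUI_SITE = "AGCT"  # AluI recognition sequence


def first_alui_site(amplicon: str) -> Optional[int]:
    """Return the 0-based index of the first AGCT in the amplicon, or None."""
    pos = amplicon.upper().find(ALUI_SITE)
    return pos if pos >= 0 else None


def is_informative(amplicon: str, window: Tuple[int, int]) -> bool:
    """True if the first AluI site starts inside the half-open window [start, end).

    The window stands in for "within or adjacent to the V5 region"; its
    coordinates are an assumption supplied by the caller.
    """
    pos = first_alui_site(amplicon)
    start, end = window
    return pos is not None and start <= pos < end


# Toy sequences (hypothetical, not real 16S rRNA data).
toy_window = (10, 30)
seq_in_window = "ACGTACGTACGGTT" + "AGCT" + "GGATTACCGGTTAACC"
seq_downstream = "ACGTACGTAC" * 4 + "AGCT"  # first site lies past the window
seq_no_site = "ACGTACGTAC"                  # no AluI site at all

for name, seq in [("in_window", seq_in_window),
                  ("downstream", seq_downstream),
                  ("no_site", seq_no_site)]:
    print(name, first_alui_site(seq), is_informative(seq, toy_window))
```

Only the first sequence would be scored as yielding an informative tag; the second mimics a gene whose first AluI site falls in a conserved downstream region.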
Comparison of the SARD data with the 16S rRNA clone sequences also revealed that 13 of the 14 bacterial divisions represented by 16S rRNA clones had corresponding SARD tag sequences (Fig. 6). The exception was the Planctomycetes division, where three 16S rRNA clones that did not encode an informative SARD tag were identified in this study by DNA sequencing. The Gemmimonas division also appeared underrepresented in the SARD profile, since only 1 of the 10 nonidentical 16S clone sequences possessed an informative SARD tag. To more objectively assess phylogenetic bias in the SARD method, we examined a database of 5,100 16S rRNA gene sequences (21) to determine whether different bacterial divisions would be equally represented in a SARD profile (Fig. 7). In addition to Planctomycetes and Gemmimonas, less than 50 percent of the 16S sequences from five other divisions, including Haloanaerobiales, Fibrobacteres, Bacteroidetes, Synergistes, and Deinococcus-Thermus, were predicted to be identified in a SARD profile and these divisions would therefore be underrepresented. The majority of 16S rRNA sequences from 28 other bacterial phylum-level divisions were predicted to be well represented in a SARD profile.
FIG. 7.
Expected phylogenetic coverage of a SARD profile. Approximately 5,100 16S rRNA gene sequences (21) were examined to determine the predicted locations of SARD tags for each sequence. The fraction of sequences for each bacterial division where a tag would be expected to be recovered from the V5 region is indicated. A SARD tag will not be recovered from 16S rRNA genes where an AluI restriction site is not present in or near the V5 region.
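The per-division fractions summarized in Fig. 7 amount to a grouped proportion. The sketch below shows one way such a tabulation could be made, assuming each database sequence has already been scored as informative or not; the record layout, the function names, and the placeholder division labels are hypothetical and carry no real data.

```python
from collections import defaultdict


def coverage_by_division(records):
    """Map each division to the fraction of its sequences scored informative.

    records: iterable of (division_name, informative_flag) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for division, informative in records:
        totals[division] += 1
        hits[division] += 1 if informative else 0
    return {division: hits[division] / totals[division] for division in totals}


def underrepresented(coverage, threshold=0.5):
    """Divisions whose predicted coverage falls below the threshold."""
    return sorted(d for d, frac in coverage.items() if frac < threshold)


# Placeholder records (hypothetical labels and flags).
toy_records = [
    ("division_A", True), ("division_A", True), ("division_A", False),
    ("division_B", False), ("division_B", False),
    ("division_B", True), ("division_B", False),
]
coverage = coverage_by_division(toy_records)
print(coverage)
print(underrepresented(coverage))
```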
DISCUSSION
In any sampling-based survey method, there is a premium placed on the ability to create large datasets. This situation is especially true when surveying microbial communities, which are often characterized by a large number of rare species and a small number of abundant species (19, 39, 42). Large, coherent datasets are necessary to carry out statistically significant quantitative analyses of microbial communities. SARD was developed to facilitate such surveys.
SARD is based on concatenating short DNA sequence tags and is similar to a method devised to measure gene expression, termed SAGE, for serial analysis of gene expression (46-48). One advantage that SAGE has over other methods of gene expression analysis, such as microarrays or reporter matrices, is that no prior DNA sequence information for the genome under study is necessary. This attribute is a critical property of any microbial community surveying method, since only a small fraction of bacterial DNA sequences are known (37). SAGE per se would not work as a microbial community profiling tool, since the method relies on the fact that essentially none of the mRNA transcripts being surveyed share sequence similarities. Thus, the location of a SAGE tag within a given mRNA is not important. For the purpose of surveying a common gene from a community of genomes, a method necessarily must target a variable region. SARD accomplishes this by targeting a conserved AluI restriction site located immediately adjacent to the V5 variable region of the 16S gene (Fig. 1).
Methods similar to SARD that rely on concatenation of variable regions of the 16S rRNA have been developed. One of these methods, termed SARST, for serial analysis of ribosomal sequence tags (RSTs), uses PCR to amplify and concatenate the V1 region of the 16S gene (33). SARST produces variable region RSTs that are from 17 to 55 bp in length. In a recent application of SARST, nearly 13,000 RSTs were identified from arctic tundra and boreal forest samples (32). A variation on SARST that targets the V6 region has also been reported (27). In each of these methods, there is a tradeoff between RST length and phylogenetic resolution. Relative to SARST and its related methods, SARD provides less phylogenetic resolution (shorter tags) for increased throughput (more tags/sequence run) to create deeper surveys.
SARD may be further distinguished from these other methods in that SARD tags are perhaps best utilized as barcodes to facilitate quantitative analysis of the abundance and distribution of microbial taxa rather than to make phylogenetic inferences between tags that possess various degrees of sequence identity. In order to realize the most value from the longer tag sequences recovered by SARST and related methods, alignments need to be made to accommodate insertions and deletions in related sequences. While this step enables the grouping of related RSTs into operational taxonomic units to lessen the effects of artifacts, the alignment of large sets of sequence data is computationally intensive and currently limited by available bioinformatics tools. The much larger molecular survey data sets that are now possible with new sequencing technologies (39) pose a significant challenge to software normally employed to handle DNA sequence data. SARD tags, by contrast, are treated as unique sequence identifiers, resulting in much smaller-sized data files that can be manipulated and analyzed with common spreadsheet software. The presence of SARD tag artifacts in the resulting SARD profiles must be kept in mind, however, when interpreting the results.
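Treating tags as exact identifiers means that a profile reduces to a table of counts, with no alignment step. A minimal sketch of that bookkeeping follows; the function name and the two toy tag sequences are hypothetical, and exact string matching is the only comparison performed.

```python
from collections import Counter


def tag_profile(tags):
    """Return (tag, count, percent_of_total) rows, most abundant first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [(tag, n, 100.0 * n / total) for tag, n in counts.most_common()]


# Toy tag list (hypothetical 16-mers).
toy_tags = ["AGCTAAAACCCCGGGG"] * 3 + ["AGCTTTTTGGGGCCCC"]
profile = tag_profile(toy_tags)
singletons = [tag for tag, n, _ in profile if n == 1]

for row in profile:
    print(row)
print("singletons:", len(singletons))
```

Because identical strings are simply pooled, a sequencing or PCR error that alters a single base creates a new apparent tag, which is why artifacts must be kept in mind when such tables are interpreted.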
Phylogenetic resolution of a SARD tag.
A SARD tag is composed of 12 bp of variable sequence and 4 bp of the AluI restriction site (AGCT). The resolution of a 16-bp tag corresponds to a single base change, or 6.3% (1/16); that is, the smallest difference that can be detected when comparing two 16-mer sequences is ∼6%. As a broad approximation, when comparing 16S rRNA gene sequences, distance values of 3%, 5%, and 10% correspond to species-, genus-, and family-level phylogenetic resolutions, respectively (37). From a practical standpoint, the phylogenetic resolution of a SARD tag may be somewhat greater. For example, 75% (12/16) of a SARD tag is variable sequence, whereas variable regions comprise only 38% (591/1,542) of the E. coli 16S rRNA gene (10). A SARD tag can then be considered to possess more information content per unit length than a longer piece of DNA having a greater proportion of conserved sequence. Also, the information content of the 16S gene is partially redundant because of the intramolecular base pairing that occurs to produce the rRNA secondary structure. The targeted location of a SARD tag within the 16S rRNA gene is on one side of a hairpin structure that forms the V5 region, and thus, the SARD tag itself has no redundant sequence.
The notion that a SARD tag contains more information density than a longer piece of DNA that includes conserved DNA and inverted repeats was supported by the observation that nonidentical 16S rRNA clone sequences from this study that encoded the same SARD tag were, on average, only 3.3% different from one another. Therefore, at least operationally, the phylogenetic resolution of a SARD tag corresponds to somewhere between the species and genus levels.
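The arithmetic behind these resolution figures can be checked directly. The sketch below computes the percent difference between two equal-length tags by counting mismatched positions, together with the two variable-sequence fractions quoted above; the function name and the toy tags are hypothetical.

```python
def percent_difference(tag_a: str, tag_b: str) -> float:
    """Percent of positions that differ between two equal-length tags."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must have the same length")
    mismatches = sum(a != b for a, b in zip(tag_a, tag_b))
    return 100.0 * mismatches / len(tag_a)


# Two toy 16-mers that differ at the final position only.
tag_a = "AGCTACGTACGTACGT"
tag_b = "AGCTACGTACGTACGA"
smallest_step = percent_difference(tag_a, tag_b)  # one change in 16 bases

# Fraction of variable sequence in a tag versus the E. coli 16S rRNA gene.
tag_variable_pct = 100.0 * 12 / 16
gene_variable_pct = 100.0 * 591 / 1542

print(smallest_step, tag_variable_pct, round(gene_variable_pct, 1))
```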
SARD limitations.
As with all microbial surveying methods, SARD has inherent limitations and biases. Sources of bias that are intrinsic to most microbial surveying projects include the choice of the sample DNA extraction method and the choice of “universal” primers for the initial PCR amplification of the 16S rRNA gene. In addition, the differential amplification of templates, or PCR bias, can lead to distortions of the relative abundance of DNA sequences in a sample. This type of bias occurs in a small minority of 16S rRNA templates and can be lessened by decreasing the amplification cycles and increasing the template concentrations (2, 15, 29, 36), as was done in this study.
A potentially more significant bias arises from variability in the location of the first AluI site downstream of the TX9 primer. The position within the 16S gene where a SARD tag will be recovered is determined by the presence of a conserved AluI restriction site within or adjacent to the V5 region. This site was found to be present in about 70 percent of sequences examined. 16S rRNA gene sequences that do not have an AluI site in this region will be excluded from a SARD profile. Some bacterial divisions will be disproportionately affected by this bias, which will lead to their being underrepresented in a SARD profile (Fig. 7). This bias highlights the need for conducting more than one type of survey method. Sequencing 16S rRNA gene clones is complementary to SARD in that the approach provides a qualitative framework of a microbial community, as well as identifying underrepresented sequences in a SARD profile.
rRNA operons are known to vary from 1 to 15 copies per genome (26). As a result, 16S-based survey data cannot be translated directly to genome equivalents or cell numbers. Nevertheless, in comparisons of 16S data from two samples to identify relative differences or ratios, the variation in rRNA operon copy number is not relevant, since it cancels out in the arithmetic. Therefore, meaningful quantitative comparisons of 16S data can be made despite rRNA operon copy number variation between species. A potentially relevant exception could occur when two different species that harbor identical 16S rRNA gene sequences (or SARD tags) and that possess different rRNA operon copy numbers are being compared in two samples. This possibility may not significantly detract from such comparative analyses, since rRNA operon copy number appears to have a phylogenetic basis (related genomes have similar rRNA operon copy numbers) (25).
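The cancellation can be made concrete with a small numerical sketch. All cell counts and copy numbers below are hypothetical (the copy numbers are merely chosen within the 1 to 15 range), and the model assumes that the number of 16S genes contributed by a taxon is its cell count multiplied by a fixed copy number. Under that assumption the between-sample ratio of gene counts for a taxon equals the ratio of its cell counts. When normalized tag fractions are compared instead, a factor that is shared by all taxa remains, so it is the ratio of two such ratios that is free of copy number.

```python
import math


def gene_counts(cells, copies):
    """16S gene copies per taxon: cell count times operon copy number."""
    return {taxon: cells[taxon] * copies[taxon] for taxon in cells}


def fractions(counts):
    """Normalize counts so that they sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def ratio(values_a, values_b, taxon):
    """Between-sample ratio (sample A over sample B) for one taxon."""
    return values_a[taxon] / values_b[taxon]


# Hypothetical communities and copy numbers.
cells_a = {"taxon_x": 1000, "taxon_y": 500, "taxon_z": 200}
cells_b = {"taxon_x": 250, "taxon_y": 500, "taxon_z": 800}
copies = {"taxon_x": 4, "taxon_y": 7, "taxon_z": 15}

genes_a, genes_b = gene_counts(cells_a, copies), gene_counts(cells_b, copies)
frac_a, frac_b = fractions(genes_a), fractions(genes_b)

# Gene-count ratio equals cell-count ratio: the copy number divides out.
print(ratio(genes_a, genes_b, "taxon_x"), ratio(cells_a, cells_b, "taxon_x"))

# For normalized fractions, compare one taxon relative to another.
double_ratio_tags = ratio(frac_a, frac_b, "taxon_x") / ratio(frac_a, frac_b, "taxon_y")
double_ratio_cells = ratio(cells_a, cells_b, "taxon_x") / ratio(cells_a, cells_b, "taxon_y")
print(double_ratio_tags, double_ratio_cells)
```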
SARD tag abundance distribution.
The abundance class distribution of a bacterial community is an important parameter that can enable estimates to be made of certain features of the community based on relatively small samples. Such features include estimates on the amount of sampling necessary to achieve a given level of coverage, estimates of species diversity, and estimates of the total species richness present. Without knowing what type of abundance distribution a community follows, species or operational taxonomic unit richness estimates must rely on nonparametric estimators (22).
Abundance classes of bacterial communities have been suggested to follow log-normal (11, 38), power law (Zipf) (19), or other (31) distributions. The abundance class data presented here from four soil samples clearly followed a power law distribution down to a lower abundance level of 0.01%. Interestingly, the observed power law distribution model would not be consistent with a single-copy abundance class. Since the total number of PCR-amplifiable 16S genes used in the construction of the SARD libraries (determined by quantitative PCR) was ∼25 million, tags present in single copy would represent an abundance class of ∼4 × 10⁻⁶ percent. Such an abundance class would not fit the line in any of the plots in Fig. 4 at any tag fraction. The data therefore suggest that one of two mutually exclusive scenarios exists: (i) the SARD tag abundance distribution for the entire community follows a power law and does not include very-low-abundance classes, or (ii) the data follow a bimodal (e.g., log-normal) distribution with an inflection point that has not yet become evident with the current level of sampling. Whether the observed SARD tag distribution from these four samples translates to other taxonomic levels of resolution is not known.
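Two pieces of arithmetic in this argument lend themselves to a short sketch: the abundance class implied by a single gene copy among ∼25 million, and the fitting of a straight line to abundance class data on log-log axes. The code below is illustrative only. The fitting routine is an ordinary least-squares fit to log-transformed values, which is a simple but crude way to estimate a power law exponent and is not necessarily the procedure used for Fig. 4; the axis definitions, the function name, and the synthetic data points are assumptions.

```python
import math


def fit_power_law(x_values, y_values):
    """Least-squares line through (log10 x, log10 y).

    Returns (exponent, prefactor) for the model y = prefactor * x ** exponent.
    """
    xs = [math.log10(x) for x in x_values]
    ys = [math.log10(y) for y in y_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept


# Synthetic points generated from y = 2 * x ** -1.5 (not data from this study).
abundance_pct = [0.01, 0.1, 1.0, 10.0]
class_size = [2.0 * x ** -1.5 for x in abundance_pct]
exponent, prefactor = fit_power_law(abundance_pct, class_size)
print(exponent, prefactor)

# Abundance class of a tag present as a single copy among ~25 million genes.
total_genes = 25_000_000
single_copy_pct = 100.0 / total_genes
print(single_copy_pct)
```

The last value is roughly 4 × 10⁻⁶ percent, more than three orders of magnitude below the 0.01% level down to which the power law was observed.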
SARD tag richness estimates.
Soil habitats are known to harbor significant numbers of bacterial cells and species. The classic DNA reassociation experiments by Torsvik et al. indicated that as many as 4,000 different bacterial genomes were present in a 30-g soil sample (42). More recently, a mathematical reevaluation of published soil genomic DNA reassociation data indicated that there may have been ∼10⁷ prokaryotic species in a 10-g soil sample (19). Richness estimates made from sampling-based survey data, such as 16S rRNA clone libraries, have tended to be much lower and are probably the result of inadequate sampling.
In the SARD data presented here, both the accumulation curves and Chao1 richness estimates of the SARD profiles of the four soil samples were found to be sample-size dependent. Thus, additional sampling would be expected to identify more unique tags and lead to higher estimates of the total richness. Taken together with the large number of singleton tags observed, the level of sampling of these soil communities was not sufficient to estimate the total tag richness in these samples. The inadequate sampling of these profiles was significant given the size of these surveys (∼10⁴) and the level of taxonomic resolution of the SARD profiles (approximately between the genus and species levels).
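For readers unfamiliar with the estimator, the following is a minimal sketch of a Chao1 calculation (7) from a list of per-tag counts. It uses the classic form, S_obs + F1²/(2·F2), where F1 and F2 are the numbers of tags seen exactly once and exactly twice, and falls back to the bias-corrected form, S_obs + F1(F1 − 1)/(2(F2 + 1)), when no doubletons are present. Which variant was applied to the profiles in this study is not stated here, and the toy counts are hypothetical.

```python
def chao1(tag_counts):
    """Chao1 richness estimate from per-tag observation counts."""
    observed = [n for n in tag_counts if n > 0]
    s_obs = len(observed)
    f1 = sum(1 for n in observed if n == 1)  # singletons
    f2 = sum(1 for n in observed if n == 2)  # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2.0 * f2)
    # Bias-corrected form evaluated at f2 = 0 avoids division by zero.
    return s_obs + f1 * (f1 - 1) / 2.0


# Toy profile: 8 observed tags, 4 singletons, 2 doubletons.
toy_counts = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1(toy_counts))

# Profile with singletons but no doubletons.
print(chao1([3, 1, 1]))
```

Because the correction term grows with the square of the singleton count, a profile dominated by singletons yields an estimate that keeps rising as more tags are sampled, which is the sample-size dependence noted above.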
SARD tags do not provide a species-level resolution, and thus, one SARD tag may represent multiple different 16S sequences (species). Importantly, these sequences are usually quite closely related (Fig. 6). For some applications, this sequence consolidation aspect of SARD can be an asset. For example, to our knowledge, no soil or similarly complex bacterial community has been surveyed to completion. A method, such as SARD, that effectively reduces microbial complexity in a coherent manner and provides increased throughput may be an important tool for characterizing extraordinarily species-rich bacterial communities, such as those found in soil and sediments.
Acknowledgments
We thank G. Nichols and T. Piazza (Victoria Island Farms) and R. Ferguson (Ferguson Farms) for generously sharing their time and for providing access to their properties for soil sample collection.
Authors M.N.A., J.R., and D.D.-D. declare a financial interest in the subject matter as founders/stockholders of Taxon. This work was supported in part by Department of Energy Small Business Innovations Research (SBIR) grant DE-FG02-04ER84089.
Footnotes
Published ahead of print on 25 May 2007.
REFERENCES
- 1. Acinas, S. G., V. Klepac-Ceraj, D. E. Hunt, C. Pharino, I. Ceraj, D. L. Distel, and M. F. Polz. 2004. Fine-scale phylogenetic architecture of a complex bacterial community. Nature 430:551-554.
- 2. Acinas, S. G., R. Sarma-Rupavtarm, V. Klepac-Ceraj, and M. F. Polz. 2005. PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Appl. Environ. Microbiol. 71:8966-8969.
- 3. Amann, R. I., W. Ludwig, and K. H. Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143-169.
- 4. Ashby, M. September 2003. Methods for the survey and genetic analysis of populations. U.S. patent 6,613,520.
- 5. Borneman, J., and E. W. Triplett. 1997. Molecular microbial diversity in soils from eastern Amazonia: evidence for unusual microorganisms and microbial population shifts associated with deforestation. Appl. Environ. Microbiol. 63:2647-2653.
- 6. Chao, A. 1987. Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43:783-791.
- 7. Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J. Stat. 11:265-270.
- 8. Cline, J., J. C. Braman, and H. H. Hogrefe. 1996. PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24:3546-3551.
- 9. Colwell, R. K. 2005. EstimateS: statistical estimation of species richness and shared species from samples, version 7.5. User's guide. University of Connecticut, Storrs. http://viceroy.eeb.uconn.edu/EstimateS.
- 10. De Rijk, P., J. M. Neefs, Y. Van de Peer, and R. De Wachter. 1992. Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res. 20:2075-2089.
- 11. Dunbar, J., S. M. Barns, L. O. Ticknor, and C. R. Kuske. 2002. Empirical and theoretical bacterial diversity in four Arizona soils. Appl. Environ. Microbiol. 68:3035-3045.
- 12. Dunbar, J., S. Takala, S. M. Barns, J. A. Davis, and C. R. Kuske. 1999. Levels of bacterial community diversity in four arid soils compared by cultivation and 16S rRNA gene cloning. Appl. Environ. Microbiol. 65:1662-1669.
- 13. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8:186-194.
- 14. Ewing, B., L. Hillier, M. C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8:175-185.
- 15. Farrelly, V., F. A. Rainey, and E. Stackebrandt. 1995. Effect of genome size and rrn gene copy number on PCR amplification of 16S rRNA genes from a mixture of bacterial species. Appl. Environ. Microbiol. 61:2798-2801.
- 16. Felsenstein, J. 2004. PHYLIP (phylogeny inference package), version 3.6. Department of Genome Sciences, University of Washington, Seattle, WA.
- 17. Fisher, M. M., and E. W. Triplett. 1999. Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Appl. Environ. Microbiol. 65:4630-4636.
- 18. Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512.
- 19. Gans, J., M. Wolinsky, and J. Dunbar. 2005. Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science 309:1387-1390.
- 20. Gill, S. R., M. Pop, R. T. Deboy, P. B. Eckburg, P. J. Turnbaugh, B. S. Samuel, J. I. Gordon, D. A. Relman, C. M. Fraser-Liggett, and K. E. Nelson. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312:1355-1359.
- 21. Hugenholtz, P. 2002. Exploring prokaryotic diversity in the genomic era. Genome Biol. 3:REVIEWS0003.1-0003.8.
- 22. Hughes, J. B., J. J. Hellmann, T. H. Ricketts, and B. J. Bohannan. 2001. Counting the uncountable: statistical approaches to estimating microbial diversity. Appl. Environ. Microbiol. 67:4399-4406.
- 23. Kenzelmann, M., and K. Muhlemann. 1999. Substantially enhanced cloning efficiency of SAGE (serial analysis of gene expression) by adding a heating step to the original protocol. Nucleic Acids Res. 27:917-918.
- 24. Kirk, J. L., L. A. Beaudette, M. Hart, P. Moutoglis, J. N. Klironomos, H. Lee, and J. T. Trevors. 2004. Methods of studying soil microbial diversity. J. Microbiol. Methods 58:169-188.
- 25. Klappenbach, J. A., J. M. Dunbar, and T. M. Schmidt. 2000. rRNA operon copy number reflects ecological strategies of bacteria. Appl. Environ. Microbiol. 66:1328-1333.
- 26. Klappenbach, J. A., P. R. Saxman, J. R. Cole, and T. M. Schmidt. 2001. rrndb: the ribosomal RNA operon copy number database. Nucleic Acids Res. 29:181-184.
- 27. Kysela, D. T., C. Palacios, and M. L. Sogin. 2005. Serial analysis of V6 ribosomal sequence tags (SARST-V6): a method for efficient, high-throughput analysis of microbial community composition. Environ. Microbiol. 7:356-364.
- 28. Liu, W. T., T. L. Marsh, H. Cheng, and L. J. Forney. 1997. Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol. 63:4516-4522.
- 29. Lueders, T., and M. W. Friedrich. 2003. Evaluation of PCR amplification bias by terminal restriction fragment length polymorphism analysis of small-subunit rRNA and mcrA genes by using defined template mixtures of methanogenic pure cultures and soil DNA extracts. Appl. Environ. Microbiol. 69:320-326.
- 30. Muyzer, G., E. C. de Waal, and A. G. Uitterlinden. 1993. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl. Environ. Microbiol. 59:695-700.
- 31. Narang, R., and J. Dunbar. 2004. Modeling bacterial species abundance from small community surveys. Microb. Ecol. 47:396-406.
- 32. Neufeld, J. D., and W. W. Mohn. 2005. Unexpectedly high bacterial diversity in arctic tundra relative to boreal forest soils, revealed by serial analysis of ribosomal sequence tags. Appl. Environ. Microbiol. 71:5710-5718.
- 33. Neufeld, J. D., Z. Yu, W. Lam, and W. W. Mohn. 2004. Serial analysis of ribosomal sequence tags (SARST): a high-throughput method for profiling complex microbial communities. Environ. Microbiol. 6:131-144.
- 34. Newman, M. E. J. 2005. Power laws, Pareto distributions and Zipf's law. Contemp. Physics 46:323-352.
- 35. Pace, N. R. 1997. A molecular view of microbial diversity and the biosphere. Science 276:734-740.
- 36. Polz, M. F., and C. M. Cavanaugh. 1998. Bias in template-to-product ratios in multitemplate PCR. Appl. Environ. Microbiol. 64:3724-3730.
- 37. Schloss, P. D., and J. Handelsman. 2004. Status of the microbial census. Microbiol. Mol. Biol. Rev. 68:686-691.
- 38. Schloss, P. D., and J. Handelsman. 2006. Toward a census of bacteria in soil. PLoS Comput. Biol. 2:e92.
- 39. Sogin, M. L., H. G. Morrison, J. A. Huber, D. M. Welch, S. M. Huse, P. R. Neal, J. M. Arrieta, and G. J. Herndl. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc. Natl. Acad. Sci. USA 103:12115-12120.
- 40. Tettelin, H., and T. V. Feldblyum. 2004. Genome sequencing and analysis. John Wiley & Sons, Ltd., London, United Kingdom.
- 41. Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.
- 42. Torsvik, V., J. Goksoyr, and F. L. Daae. 1990. High diversity in DNA of soil bacteria. Appl. Environ. Microbiol. 56:782-787.
- 43. Torsvik, V., L. Ovreas, and T. F. Thingstad. 2002. Prokaryotic diversity—magnitude, dynamics, and controlling factors. Science 296:1064-1066.
- 44. Torsvik, V., R. Sorheim, and J. Goksoyr. 1996. Total bacterial diversity in soil and sediment communities—a review. J. Ind. Microbiol. Biotechnol. 17:170-178.
- 45. Tyson, G. W., J. Chapman, P. Hugenholtz, E. E. Allen, R. J. Ram, P. M. Richardson, V. V. Solovyev, E. M. Rubin, D. S. Rokhsar, and J. F. Banfield. 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37-43.
- 46. Velculescu, V. E., L. Zhang, B. Vogelstein, and K. W. Kinzler. 1995. Serial analysis of gene expression. Science 270:484-487.
- 47. Velculescu, V. E., L. Zhang, W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler. 1997. Characterization of the yeast transcriptome. Cell 88:243-251.
- 48. Zhang, L., W. Zhou, V. E. Velculescu, S. E. Kern, R. H. Hruban, S. R. Hamilton, B. Vogelstein, and K. W. Kinzler. 1997. Gene expression profiles in normal and cancer cells. Science 276:1268-1272.
- 49. Zhou, J., B. Xia, D. S. Treves, L. Y. Wu, T. L. Marsh, R. V. O'Neill, A. V. Palumbo, and J. M. Tiedje. 2002. Spatial and resource factors influencing high microbial diversity in soil. Appl. Environ. Microbiol. 68:326-334.








