Abstract
Sequence analysis of environmental DNA promises to provide new insights into the ecology and biogeochemistry of uncultured marine microbes. In this study we used the Sargasso Sea Whole Genome Sequence (WGS) data set to search for hydrolases used by Cytophaga-like bacteria to degrade biopolymers such as polysaccharides and proteins. Analysis of the Sargasso WGS data for contigs bearing both the 16S rRNA genes of Cytophaga-like bacteria and hydrolase genes revealed a cellulase gene (celM) most similar to the gene found in Cytophaga hutchinsonii. A BLAST search of the entire Sargasso Sea WGS data set indicated that celM was the most abundant cellulase-like gene in the Sargasso Sea. However, the similarity between CelM-like cellulases and peptidases belonging to metalloprotease family M42 led us to question whether CelM is involved in the degradation of polysaccharides or proteins. PCR primers were designed for the celM genes in the Sargasso Sea WGS data set and used to identify celM in a fosmid library constructed with prokaryotic DNA from the western Arctic Ocean. Expression analysis of the Cytophaga-like Arctic CelM, which is 63% identical and 77% similar to CelM in C. hutchinsonii, indicated that there was peptidase activity, whereas cellulase activity was not detected. Our analysis suggests that the celM gene plays a role in the degradation of protein by Cytophaga-like bacteria. The abundance of peptidase genes in the Cytophaga-like fosmid clone provides further evidence for the importance of Cytophaga-like bacteria in the degradation of protein in high-molecular-weight dissolved organic matter.
The hydrolysis of biopolymers is potentially a rate-limiting step in the degradation of high-molecular-weight dissolved organic matter (DOM) by microbial communities in the oceans (2). The consumption of high-molecular-weight DOM by microbes requires a hydrolysis step because only compounds with molecular masses of less than about 500 Da can be transported across the bacterial membrane (33). Despite the requirement for hydrolysis, high-molecular-weight DOM is highly bioreactive (1) and plays an important role in bacterial metabolism in the ocean. Polysaccharides and proteins are the most abundant known constituents of high-molecular-weight DOM (4) and are readily utilized by bacteria. In the Sargasso Sea, for example, protein supports about one-half of the bacterial nitrogen requirement (20), and in estuaries polysaccharides such as chitin can support as much as 10% of bacterial production (22).
Cytophaga-like bacteria are hypothesized to be important in the hydrolysis and mineralization of biopolymers in the oceans. Cultured isolates of Cytophaga-like bacteria are proficient in degrading carbohydrate biopolymers, such as cellulose and chitin (21), which are constituents of high-molecular-weight DOM (3). Efficient utilization of biopolymers in high-molecular-weight DOM and in detritus particles might explain the high abundance of free-living Cytophaga-like bacteria and their especially high abundance on detritus particles in the ocean (12-14, 29). The hypothesis that high-molecular-weight DOM is consumed by Cytophaga-like bacteria is supported by radiotracer studies that examined the consumption of protein and chitin by uncultured bacteria (8). Finally, Cytophaga-like bacteria grow more rapidly in seawater incubation mixtures supplemented with concentrated high-molecular-weight DOM (11). Examining hydrolase genes in environmental DNA could provide another approach for linking uncultured bacteria to biopolymer degradation and other biogeochemical processes.
In this study we examined genes in the Sargasso Sea Whole Genome Sequence (WGS) data set (32) encoding hydrolases potentially used by marine Cytophaga-like bacteria for degrading biopolymers in high-molecular-weight DOM. Our focus on Cytophaga-like bacteria was motivated by the desire to understand the role of these bacteria in the utilization of high-molecular-weight DOM. PCR primers were designed for the most abundant type of endoglucanase identified in the Sargasso data set and used to screen a fosmid library constructed with prokaryotic DNA from the western Arctic Ocean. The cloned hydrolase was expressed in Escherichia coli and assayed for various hydrolase activities because its function could not be determined with confidence from amino acid similarity alone: the hydrolase is similar to both cellulases and peptidases. Our results highlight the value of experimental data for establishing actual gene function. The complete sequence of the fosmid bearing Arctic Cytophaga-like DNA supports the hypothesis that Cytophaga-like bacteria are especially adapted to utilizing biopolymers in high-molecular-weight DOM.
MATERIALS AND METHODS
Contigs in the Sargasso WGS database (32) were screened for Cytophaga-like 16S rRNA genes using BLASTN and a data set of bacterial 16S rRNA genes (http://rdp.cme.msu.edu/download/philrdp.16Sab.arb) (19). Open reading frames on the Cytophaga-like contigs were then examined for endoglucanase genes using BLASTP and the GenBank database (http://www.ncbi.nlm.nih.gov/). This analysis identified a celM-like endoglucanase gene most similar (63% identity) to that found in Cytophaga hutchinsonii. The C. hutchinsonii gene was then used in a second round of BLASTP searches to find similar genes in the Sargasso WGS data set.
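As an illustration of the contig-screening step, the sketch below parses BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are standard) and keeps, for each contig, the highest-scoring 16S rRNA reference hit. The reference-to-taxon mapping, the 90% identity cutoff, and the function names are hypothetical and are not taken from the analysis described here.

```python
import csv
from typing import Dict, Iterable, Set

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Keep the highest-bitscore hit for each query contig."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        rec = dict(zip(OUTFMT6, row))
        rec["pident"] = float(rec["pident"])
        rec["bitscore"] = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or rec["bitscore"] > best[query]["bitscore"]:
            best[query] = rec
    return best


def cytophaga_like_contigs(lines: Iterable[str], taxon_of: Dict[str, str],
                           min_pident: float = 90.0) -> Set[str]:
    """Contigs whose best 16S rRNA hit is a Bacteroidetes reference.

    taxon_of (reference ID -> group) and min_pident are hypothetical inputs.
    """
    return {query for query, rec in best_hits(lines).items()
            if taxon_of.get(rec["sseqid"]) == "Bacteroidetes"
            and rec["pident"] >= min_pident}
```

Contigs passing such a filter would then have their open reading frames searched with BLASTP, as described above.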
Fosmid library screening.
The Arctic fosmid library was constructed using DNA isolated from cells in the <0.8-μm size fraction of a seawater sample collected at a depth of 10 m in the Chukchi Sea (72°19.33′N, 151°59.07′W) in July 2004. For DNA isolation and library construction we used procedures described previously (10), except that the bacterial biomass was collected by vacuum filtration rather than by tangential flow filtration. Pools of 96 fosmid clones were screened for 16S rRNA genes by denaturing gradient gel electrophoresis (DGGE) of PCR amplicons generated with primers GC358F and 517R (28). Selected bands resolved on an 8% polyacrylamide gel containing a 25% to 55% denaturant gradient (13.8% to 22% formamide and 10.5% to 23% urea) were reamplified and sequenced. Phylogenetic classification was performed using BLAST and the ARB sequence analysis tool (26).
PCR primers were designed to amplify the celM-like genes identified in the Sargasso WGS data set using the consensus primer tool in the Oligo primer analysis software package (Molecular Biology Insights, Inc.). The celM gene primers CelM298F (5′-GCTCCTTCAAAATGGG-3′) and CelM559R (5′-AACCTCAGCAATCATAAATCC-3′) were selected based on successful in silico amplification of four of the most complete (>990-bp) Sargasso celM-like genes. These primers were then used to screen the Arctic fosmid library. Pools of 96 fosmid clones were screened by celM gene PCR with primers at a concentration of 200 nM in a buffer containing 0.7 mM MgCl2 provided with the Taq polymerase (Promega). The thermal cycling conditions consisted of 35 cycles of denaturation at 96°C, primer annealing at 50°C, and DNA polymerization at 72°C. The celM-bearing clones in PCR-positive plate pools were then identified by PCR screening of pools made from the rows and columns of each positive 96-well plate.
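The in silico amplification check and the row/column deconvolution of plate pools can be sketched as follows. This is a simplified illustration, assuming exact primer matches (the Oligo software used here applies its own criteria) and a standard 8-row by 12-column plate; the function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Optional, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the predicted PCR product, or None if either primer site is absent.

    Exact matching only (a simplification): the forward primer must occur on the
    top strand, and the reverse primer's reverse complement must occur downstream.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Row/column pooling of one 96-well plate: 8 row pools (A-H) and 12 column pools.
def positive_pools(positive_wells: Set[Tuple[str, int]]) -> Tuple[Set[str], Set[int]]:
    """Row and column pools expected to give a PCR product."""
    return {r for r, _ in positive_wells}, {c for _, c in positive_wells}


def candidate_wells(rows: Iterable[str], cols: Iterable[int]) -> List[Tuple[str, int]]:
    """Wells at the intersections of positive rows and columns."""
    return sorted(product(rows, cols))
```

With a single positive clone, one row pool and one column pool test positive and the well is unambiguous; two positive clones on the same plate give four candidate wells, which would need to be retested individually.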
The celM-bearing Arctic clone was completely sequenced by the Joint Genome Institute. The sequence was analyzed using the annotation tools in the Artemis DNA sequence viewer from the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Software/Artemis/) and the FGENESB website (http://www.softberry.ru/) (Softberry, Inc.).
Phylogenetic analysis.
16S rRNA gene sequences were aligned using the ARB fast aligner (26), and celM nucleotide sequences were aligned according to a CelM amino acid sequence alignment generated with ClustalW (6). Distance matrices and neighbor-joining trees were constructed using PHYLIP (15).
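The codon-guided alignment step and a distance matrix can be sketched as below: each residue of the gapped protein alignment consumes one codon of the corresponding coding sequence, and each gap becomes a three-base gap. The Jukes-Cantor correction is used only for simplicity; PHYLIP's DNADIST offers several substitution models, and the model used for Fig. 3 is not stated here. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread a coding sequence (no stop codon) onto its gapped protein alignment."""
    residues = sum(aa != "-" for aa in aligned_protein)
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must equal 3 x number of aligned residues")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")             # protein gap -> codon-sized gap
        else:
            out.append(cds[pos:pos + 3])  # one codon per residue
            pos += 3
    return "".join(out)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance over columns where neither sequence is gapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)
    return -0.75 * math.log(1 - 4 * p / 3)  # undefined when p >= 0.75


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances, e.g., as input to a neighbor-joining program."""
    return {(i, j): jukes_cantor(aln[i], aln[j]) for i, j in combinations(aln, 2)}
```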
Expression analysis of CelM.
The celM gene of the Arctic fosmid clone (Arctic CelM) was subcloned into an expression vector for activity analyses. The entire celM gene of the Arctic CelM clone was amplified using PCR primers CelM516U (5′-CACCATGGCAACAAAAAAAATACTT-3′) and CelM1648L (5′-AGATGTTTCTACTTTACTTAACCCAGA-3′), cloned into the pET101 expression vector, and transformed into E. coli TOP10 (Invitrogen) according to the manufacturer's instructions. The celM-pET101 construct was subsequently transformed into E. coli BL21(DE3) for expression analysis.
To examine the size and amino acid sequence of Arctic CelM, overnight cultures (500 ml) of E. coli bearing the celM-pET101 construct or the pET101 vector alone were pelleted by centrifugation and washed with phosphate-buffered saline (PBS) (8 g of NaCl per liter, 0.2 g of KCl per liter, 1.44 g of Na2HPO4 per liter, 0.24 g of KH2PO4 per liter; pH adjusted to 7.4). The pellet was resuspended in 5 ml of PBS, and the cells were lysed by sonication. Cell debris was removed by centrifugation at 100,000 × g for 60 min at 4°C. The molecular mass of the expressed protein was estimated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and compared to that expected from the 1,089-bp celM open reading frame. Amino acid sequencing of the expressed protein by liquid chromatography-mass spectrometry was performed by the Campus Chemical Instrument Center, Ohio State University, using a procedure described by Hanson and Tabita (18). The Arctic CelM was digested with trypsin, and the sequences of 21 peptides, covering 63% of the protein, were matched to the predicted sequence of Arctic CelM.
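Sequence coverage, as reported for the 21 tryptic peptides, is the fraction of residues covered by at least one matched peptide. A minimal sketch, assuming the simple trypsin rule (cleavage after K or R except before P) and a made-up protein fragment in place of Arctic CelM:

```python
import re
from typing import Iterable, List


def tryptic_peptides(protein: str) -> List[str]:
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Fraction of residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:                  # mark every occurrence
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)
```

Real search engines also account for missed cleavages and modified residues, which this sketch ignores.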
Cell lysates were assayed for glutamyl aminopeptidase activity using glutamine-p-nitroanilide (Glu-pNA) (Peptide Institute, Inc.) as described by L'Anson et al. (25). Twenty microliters of 5 mM Glu-pNA was added to 180 μl of lysate and incubated for 18 h at 37°C (or at 60°C for the cloned Clostridium thermocellum CelM). The reaction was stopped by addition of 100 μl of 30% (vol/vol) acetic acid. The sample was then centrifuged at 10,000 × g for 5 min, and the absorbance at 410 nm was determined.
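Absorbance readings from the Glu-pNA assay can be converted to the amount of p-nitroaniline released with the Beer-Lambert law. The sketch assumes a molar absorption coefficient of 8,800 M⁻¹ cm⁻¹ at 410 nm, a 1-cm path length, and a final volume of 300 μl (180 μl of lysate + 20 μl of substrate + 100 μl of acetic acid); none of these conversion parameters is reported in this study, and the function names are hypothetical.

```python
def pna_released_nmol(a410_sample: float, a410_blank: float,
                      epsilon: float = 8800.0,   # M^-1 cm^-1, assumed value
                      path_cm: float = 1.0,      # assumed path length
                      volume_ul: float = 300.0) -> float:
    """nmol of p-nitroaniline released, from blank-corrected A410 (Beer-Lambert)."""
    conc_molar = (a410_sample - a410_blank) / (epsilon * path_cm)  # c = A / (epsilon * l)
    return conc_molar * volume_ul * 1e-6 * 1e9  # mol/L * L -> mol -> nmol


def rate_nmol_per_h(nmol: float, hours: float = 18.0) -> float:
    """Average release rate over the 18-h incubation."""
    return nmol / hours
```

Subtracting the vector-only extract as the blank would isolate activity attributable to the expressed protein.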
Hydrolysis of carboxymethylcellulose (CMC) was assayed by monitoring the production of reducing sugars using the procedure outlined by Wu et al. (34). The reaction mixture contained 20 μl of 1% CMC (Polysciences, Inc.) and 180 μl of lysate and was incubated at 37°C for 18 h. The reaction was terminated by addition of 0.2 ml of 2% sodium carbonate and 1 ml of cyanide-carbonate solution (10 mM KCN, 50 mM Na2CO3). Two milliliters of 0.05% potassium ferricyanide was added, and the solution was vortexed and boiled for 30 min. The tubes were cooled, and the absorbance at 420 nm was determined.
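Reducing-sugar assays are usually quantified against a standard curve. A minimal sketch, assuming glucose standards and hypothetical absorbance values (no standards are reported here): a least-squares line is fitted to the standards and then inverted for each sample. In ferricyanide-based assays A420 generally falls as ferricyanide is reduced, so the slope is expected to be negative, but the code does not depend on its sign.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sugar_from_absorbance(a420: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: reducing-sugar equivalents for a sample A420."""
    return (a420 - intercept) / slope
```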
Cellulase activity was assayed by measuring the hydrolysis of fluorogenic and chromogenic substrate analogs, including 4-methylumbelliferyl-β-d-glucopyranoside, 4-nitrophenyl-β-d-cellobioside, and 4-nitrophenyl-β-d-cellotetraoside (Sigma-Aldrich). The reaction mixture contained 20 μl of 5 mM substrate and 180 μl of lysate. The reaction mixtures were incubated for 18 h at 37°C (or at 60°C for the cloned C. thermocellum CelM) and terminated by addition of glycine carbonate buffer (pH 9.7). The samples were then centrifuged at 10,000 × g for 5 min, and either the absorbance at 410 nm was determined (nitrophenyl-linked substrates) or the preparation was examined for blue fluorescence under UV light (methylumbelliferyl-linked substrate).
Protease activity was measured using fluorescein isothiocyanate-conjugated bovine albumin. The reaction mixture contained 20 μl of a 1-mg/ml labeled albumin solution and 180 μl of lysate. The mixture was incubated at 37°C for 18 h, and the reaction was terminated by addition of 5% trichloroacetic acid. The sample was centrifuged at 10,000 × g for 15 min, and the green fluorescein fluorescence in the supernatant was assayed under UV light. Controls were treated with proteinase K.
Nucleotide sequence accession number.
The nucleotide sequence of the Arctic fosmid was deposited in GenBank under accession number DQ272742.
RESULTS AND DISCUSSION
Cytophaga-like contigs in the Sargasso Sea WGS data set.
BLASTN analysis of the Sargasso WGS data set using a database of 16S rRNA genes revealed 26 contigs with 16S rRNA genes from Cytophaga-like bacteria. In contrast, the original annotation of the Sargasso WGS data set identified only 11 contigs with Cytophaga-like 16S rRNA genes (32), corresponding to less than 5% of the prokaryotes in the data set. Such a low abundance is somewhat surprising for a marine system because Cytophaga-like bacteria are one of the most abundant groups of bacteria in the ocean (21). Our BLASTN analysis indicates that the abundance of Cytophaga-like bacteria in the Sargasso WGS data set might be closer to 15%. The contribution of this group to bacterial communities has been underestimated by studies using clone libraries of 16S rRNA genes (7). Although shortcomings of general PCR primers that might miss Cytophaga-like bacteria would not affect the shotgun sequencing approach, DNA extraction efficiency could be a factor. Our results suggest that 16S rRNA genes of other bacterial groups may have been missed in the original annotation as well.
The Cytophaga-like bacteria identified in the Sargasso WGS data set were related to other Cytophaga-like bacteria seen previously in marine environments, based on 16S rRNA gene phylogeny. The Sargasso Cytophaga-like sequences were grouped into nine clusters of marine Cytophaga-like bacteria (Fig. 1), although one Sargasso gene (contig AACY01458001) remained separate from the other Cytophaga-like bacterial sequences (Fig. 1). Because the shotgun sequencing approach has no PCR step, we might have expected the Sargasso WGS data set to include completely new types of Cytophaga-like bacteria not seen previously in PCR-based studies. However, the Cytophaga-like bacteria in the Sargasso study were closely related to bacteria already identified in PCR clone libraries.
FIG. 1.
Phylogenetic relationships of Cytophaga-like bacterial 16S rRNA genes in the Sargasso Sea WGS data set (Sargasso Sea) and in a fosmid clone library of Arctic environmental DNA (Arctic Ocean). Clusters 1 and 3 are expanded to reveal the Sargasso contigs and Arctic fosmid having both 16S rRNA and celM genes. The other Cytophaga-like clusters contained one to three Sargasso Cytophaga-like 16S rRNA genes and Cytophaga-like genes of uncultured bacteria from various marine environments.
Cytophaga-like hydrolases in the Sargasso WGS data set.
BLASTP analysis identified two Cytophaga-like contigs bearing genes encoding hydrolases most similar to the celM gene in C. hutchinsonii. Contig AACY01014206 encodes a protein that is 63% identical and 77% similar to C. hutchinsonii CelM (Table 1). This protein is also 33% to 39% identical to the CelM in Geobacter metallireducens, a putative hydrolase in Geobacter sulfurreducens, and endoglucanases in three other prokaryotes. The protein is also similar to peptidases in three species of Bacillus (Table 1).
TABLE 1.
BLASTP analysis of a Cytophaga-like celM gene in the Sargasso WGS data set (contig AACY01014206)a
| Accession no. | Description | Organism | % Identity | % Similarity |
|---|---|---|---|---|
| ZP_00309918 | Cellulase M and related proteins (COG1363) | Cytophaga hutchinsonii | 63 | 77 |
| ZP_00301131 | Cellulase M and related proteins (COG1363) | Geobacter metallireducens | 38 | 58 |
| NP_953245 | Hydrolase, putative | Geobacter sulfurreducens | 39 | 58 |
| AAV45786 | Endoglucanase | Haloarcula marismortui | 36 | 55 |
| YP_176856 | Endoglucanase | Bacillus clausii | 36 | 53 |
| NP_693253 | Endoglucanase | Oceanobacillus iheyensis | 34 | 53 |
| ZP_00185964 | Cellulase M and related proteins (COG1363) | Rubrobacter xylanophilus | 31 | 51 |
| NP_618720 | Cellulase | Methanosarcina acetivorans | 32 | 54 |
| NP_632697 | Hypothetical protein MM0673 | Methanosarcina mazei | 32 | 53 |
| NP_868423 | Probable endoglucanase | Rhodopirellula baltica | 31 | 52 |
| ZP_00149447 | Cellulase M and related proteins (COG1363) | Methanococcoides | 34 | 52 |
| YP_038617 | Glucanase/aminopeptidase | Bacillus thuringiensis | 33 | 55 |
| NP_834277 | Aminopeptidase | Bacillus cereus | 33 | 55 |
| YP_021461 | Peptidase, M42 family | Bacillus anthracis | 32 | 55 |
| YP_085892 | Glucanase/aminopeptidase | Bacillus cereus | 32 | 55 |
| NP_693064 | Endo-1,4-beta-glucanase | Oceanobacillus iheyensis | 33 | 55 |
The 16 genes in the GenBank database with the highest levels of similarity to the Sargasso celM gene are listed. The levels of identity and similarity are the levels of identity and similarity between the amino acid sequence of the Sargasso CelM and the amino acid sequences of the homologous gene products in the organisms listed.
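The identity and similarity columns can be illustrated with a short sketch. BLAST counts as "positives" aligned pairs with a positive BLOSUM62 score; to keep the example short, similarity here is approximated with a hypothetical set of residue groups, so its values will not exactly reproduce Table 1. Percentages are taken over the full alignment length, including gap columns.

```python
from typing import Tuple

# Hypothetical groups of chemically similar residues, used here in place of
# BLOSUM62 "positives" to keep the sketch short.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
_GROUP = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(a: str, b: str) -> Tuple[float, float]:
    """Percent identity and approximate percent similarity of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue                      # gap columns count only in the denominator
        if x == y:
            identical += 1
            similar += 1
        elif x in _GROUP and _GROUP[x] == _GROUP.get(y):
            similar += 1                  # conservative substitution
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n
```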
A protein encoded on a second Cytophaga-like contig (AACY01060866) also is similar to CelM. BLASTP analysis revealed that the hydrolase encoded on the second contig is 53% identical and 79% similar to CelM in C. hutchinsonii. In contrast, there is no similarity between the hydrolase encoded on this Cytophaga-like contig and peptidases or proteases.
Abundance of CelM in the Sargasso WGS.
CelM is the most abundant cellulase-like hydrolase in the Sargasso data set (Fig. 2) based on a BLASTP analysis using 113 cellulases (EC 3.2.1.4) available from the Swiss-Prot database (March 2005). CelM occurs 30 times in the Sargasso data set, outnumbering all other cellulases classified in glycosyl hydrolase families, according to the nomenclature used by the Carbohydrate-Active Enzymes (CAZY) database (http://afmb.cnrs-mrs.fr/CAZY/) (Fig. 2). Family 5 glycosyl hydrolase is the second most abundant type of cellulase and occurs on 27 contigs. Cellulases in families 8, 9, 10, and 12 occur on eight or fewer contigs. No contig contained more than one cellulase.
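Because no contig carried more than one cellulase, each contig can be counted once under the family of its best-scoring hit. The sketch below shows that tally; the hit tuples and family labels are hypothetical stand-ins for BLASTP results against the 113 Swiss-Prot cellulases.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_families(hits: Iterable[Tuple[str, str, float]]) -> Counter:
    """Count contigs per cellulase family from (contig, family, bitscore) hits.

    Each contig is assigned once, to the family of its best-scoring hit.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for contig, family, score in hits:
        if contig not in best or score > best[contig][1]:
            best[contig] = (family, score)
    return Counter(family for family, _ in best.values())
```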
FIG. 2.
Abundance of celM-like genes and genes belonging to various glycosyl hydrolase families in the Sargasso WGS data set.
Phylogeny of celM.
The celM gene has been annotated in a broad range of Archaea and Bacteria, including enteric bacteria, methanogens, thermophiles, and high-G+C-content gram-positive bacteria, as well as Cytophaga-like bacteria. Phylogenetic analysis suggested that some of the genes are true orthologs of celM, but others may be distantly related genes that have been mistaken for celM. There does not seem to be a simple relationship between these putative celM genes and bacterial phylogeny. For example, celM genes of Archaea and Bacteria are not clearly separated from each other (Fig. 3). Although Methanobacterium thermoautotrophicum and Archaeoglobus fulgidus grouped together, the other Archaea, including Methanococcus, Methanocaldococcus, and Pyrococcus, occurred on different branches.
FIG. 3.
Phylogenetic relationships of celM-like genes in cultivated and uncultivated Bacteria and Archaea. The neighbor-joining tree was constructed from 527 nucleotide positions aligned according to the corresponding amino acid alignment. The numbers at the nodes are bootstrap values based on 100 replicates. Scale bar = 10 substitutions per 100 nucleotides.
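Bootstrap values such as those in Fig. 3 come from resampling alignment columns with replacement, rebuilding a tree from each pseudo-alignment, and counting how often each node recurs (PHYLIP provides SEQBOOT and CONSENSE for these steps). A minimal sketch of the resampling step only, with a hypothetical function name and toy alignments:

```python
import random
from typing import Dict, List


def bootstrap_replicates(alignment: Dict[str, str], n_reps: int = 100,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Resample alignment columns with replacement to build pseudo-alignments.

    The same column indices are used for every sequence, so aligned positions stay
    aligned; each replicate keeps the original alignment length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    rng = random.Random(seed)  # fixed seed for reproducibility
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append({name: "".join(alignment[name][c] for c in cols)
                           for name in names})
    return replicates
```

Each replicate would then go through the same distance and neighbor-joining steps as the original alignment.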
Some bacteria apparently possess several genes annotated as celM, which complicates the relationship between the celM gene tree and bacterial phylogeny. For example, C. hutchinsonii has three copies of the celM gene. The proteins encoded by these C. hutchinsonii genes are only 20% to 30% identical to one another, which is comparable to the levels of similarity among celM genes in different bacteria. As a result, the C. hutchinsonii genes appear in different parts of the celM tree (Fig. 3). Similarly, celM genes in different strains of E. coli do not group together, and the four Clostridium celM genes, which come from different Clostridium species, also appear in different parts of the tree. One possibility is that bacteria acquired celM genes with different evolutionary histories through lateral gene transfer. Alternatively, closer scrutiny may reveal that misidentification of distantly related genes is responsible for the observed difference between the bacterial phylogeny and the celM gene tree.
Despite possible problems with the deep branches in the celM gene phylogeny, the presence of closely related clusters of celM genes does provide some insight into the types of bacteria in the Sargasso Sea that have celM. Although celM genes from related bacteria, such as Clostridium sp. strains, do not always group together, when celM genes do group together, they come from related bacteria. For example, all 14 Firmicutes celM genes belong to a single clade, as do the six Pyrococcus celM genes (Fig. 3). Similarly, all 15 of the Sargasso celM genes cluster together with celM from C. hutchinsonii (Fig. 3), suggesting that all of these genes are associated with Cytophaga-like bacteria. Although most of the celM genes in the Sargasso data are not linked to 16S rRNA genes, the phylogenetic analysis suggests that despite the broad diversity of bacteria that potentially have this gene, celM seems to be restricted to Cytophaga-like bacteria in the Sargasso Sea.
Gene content of an Arctic fosmid bearing celM.
To test the hypothesis that Cytophaga-like bacteria having celM also possess genes encoding other hydrolases, we examined a celM-containing genome fragment from an uncultivated Cytophaga-like bacterium. This analysis was performed using a fosmid library of prokaryotic DNA collected from the Arctic Ocean.
The Arctic fosmid library had 4,800 clones (average insert size, 40 kb), which were largely derived from prokaryotic DNA. DGGE screening showed that 27 clones carried 16S rRNA genes. BLASTN analysis of the DGGE band sequences revealed that 23 of these clones contained prokaryotic DNA and four carried DNA from the photosynthetic picoeukaryote Mantoniella squamata. The clones carrying prokaryotic DNA included 10 clones bearing Cytophaga-like 16S rRNA genes and four clones carrying DNA from Gammaproteobacteria. The remaining clones with 16S rRNA genes belonged to the Alphaproteobacteria, Betaproteobacteria, and Actinobacteria (three clones each). No archaeal clones were detected with DGGE primers for this group.
One clone in the Arctic fosmid library (Arctic CelM) was positive in the celM gene PCR. As with the Sargasso data set, the Arctic CelM clone also carried a Cytophaga-like 16S rRNA gene detected in the DGGE analysis. Sequence analysis extending upstream and downstream of the celM gene PCR priming sites revealed a 1,089-bp open reading frame. BLASTP analysis indicated that this open reading frame encodes a protein that is 63% identical and 77% similar to CelM in C. hutchinsonii. Based on an analysis using SignalP (http://www.cbs.dtu.dk/services/SignalP/), the amino acid sequence of Arctic CelM does not appear to contain a signal peptide of the type found in secreted proteins characterized so far.
Sequence analysis of the Arctic CelM fosmid clone revealed 28 genes, including 25 open reading frames, 16S and 23S rRNA genes, and a 16S-23S rRNA internal transcribed spacer (Table 2; see Table S1 in the supplemental material). Phylogenetic analysis placed the 16S rRNA gene in marine Cytophaga-like clade 1, which also has genes from the Sargasso Sea and other marine environments (Fig. 1). Many of the protein-encoding genes were also most similar to genes in Cytophaga-like bacteria, which further indicates that the Arctic CelM fosmid clone contains DNA from a Cytophaga-like bacterium. Sixty percent of the genes on the Arctic fosmid were most similar to genes in Bacteroidetes, and 40% were most similar to genes in Cytophaga-like bacteria. Seven (30%) of the fosmid genes were most similar to genes in C. hutchinsonii. The celM gene exhibited the highest level of similarity (63% identity) to a gene in C. hutchinsonii. Other genes similar to genes in C. hutchinsonii include genes encoding acyl coenzyme A synthetase (COG0365), dihydroorotate dehydrogenase (COG0167), and a hypothetical protein, which were 62 to 78% identical to genes in the Arctic fosmid (see Table S1 in the supplemental material). Two genes were 50% identical to peptidyl-prolyl cis-trans isomerase and glycerol kinase genes.
TABLE 2.
Analysis of select open reading frames and rRNA genes of the Arctic CelM fosmid clonea
| Open reading frame | Location | Length (bp) | Accession no. | Gene function | Organism | % Identity | % Similarity |
|---|---|---|---|---|---|---|---|
| 3 | c(3976) | 735 | AAD36499 | Glycerol uptake facilitator protein | Thermotoga maritima | 47 | 66 |
| 7 | c(10446) | 1,233 | CAD78184 | Probable aminopeptidase | Pirellula sp. | 38 | 52 |
| 8 | 10478 | 1,065 | CAE79899 | Probable aminopeptidase | Bdellovibrio bacteriovorus | 36 | 51 |
| 13 | c(16837) | 1,858 | AY03305 | 23S rRNA | Uncultured Cytophagales | 98 | NAb |
| 14 | c(17336) | 500 | AY03305 | 16S-23S rRNA internal transcribed spacer | Uncultured Cytophagales | 93 | NA |
| 15 | c(18840) | 1,505 | AY03305 | 16S rRNA | Uncultured Cytophagales | 97 | NA |
| 16 | 19781 | 1,092 | ZP_00309918 | Cellulase M and related proteins (COG1363) | Cytophaga hutchinsonii | 63 | 77 |
| 21 | 25715 | 1,758 | AAM24093 | ABC-type multidrug/protein/lipid transport system ATPase | Thermoanaerobacter tengcongensis | 37 | 55 |
| 24 | 27957 | 2,508 | AAO35142 | Zinc carboxypeptidase | Clostridium tetani | 25 | 41 |
Gene functions were assigned based on the highest levels of similarity in BLASTP analyses (GenBank database, March 2005). rRNA genes were identified using BLASTN. The levels of identity and similarity are the levels of identity and similarity between the amino acid sequences encoded by the open reading frames and the amino acid sequences of the homologous gene products in the organisms listed. The level of nucleotide identity is indicated for rRNA genes. The location indicates where the gene starts; c indicates genes encoded on the complementary strand. For a complete list of all genes in the fosmid clone see Table S1 in the supplemental material.
NA, not applicable to rRNA genes.
Several of the genes in the Arctic CelM fosmid clone appear to encode hydrolases. In fact, two genes encoding peptidases, an aminopeptidase gene (gene 7) and a zinc carboxypeptidase gene (gene 24), were identified in addition to the celM gene (gene 16) (Table 2). Such a combination of proteolytic and glycolytic potential is consistent with the hypothesis that Cytophaga-like bacteria are well adapted to using biopolymers such as those in high-molecular-weight DOM. Genes involved in organic nutrient uptake were also well represented. The Arctic fosmid had genes encoding a glycerol uptake facilitator protein (gene 3) and an ABC-type transporter (gene 21) that is potentially involved in the uptake of peptides (see Table S1 in the supplemental material).
Peptidase activity of the Cytophaga-like celM gene.
Most studies of genes cloned from environmental DNA have relied on sequence analysis to infer the metabolic functions the genes might mediate. Function is commonly assigned according to the best hit in a BLAST search of a large database, such as GenBank. Although this approach is easy and widely used, it can be misleading because there is often little or no experimental evidence for the functions of many, if not most, of the genes identified by the search. This was the case in our analysis of celM. The only experimental evidence for endoglucanase activity of CelM comes from the celM gene of C. thermocellum (24). The Arctic CelM was 29% identical and 48% similar to the C. thermocellum CelM (Table 1). Other genes similar to the Arctic celM gene encode family M42 peptidases. The peptidase most similar to the Arctic CelM was a protein in Bacillus cereus, which was 32% identical and 55% similar to the Arctic CelM.
To clarify the true identity of Arctic CelM, PCR primers were designed to amplify the entire gene and then were used to subclone the celM gene into an expression vector. The molecular mass of the protein expressed from the Arctic CelM clone determined by polyacrylamide gel electrophoresis was 34 kDa, which was consistent with the size of the Arctic celM open reading frame (1,089 bp). The amino acid sequence of the expressed protein determined by liquid chromatography-mass spectrometry (21 peptide fragments and 63% coverage) was identical to that of the expected protein based on the celM sequence in the Arctic CelM fosmid clone.
CelM encoded in the Arctic fosmid tested positive for glutamyl aminopeptidase activity when assayed with Glu-pNA (Fig. 4). The peptidase activity in crude extracts of E. coli expressing the Arctic CelM was much higher than that in extracts of E. coli expressing the C. thermocellum CelM; the activity of the C. thermocellum CelM did not differ from that of the negative control, the expression vector alone (Fig. 4). Because both the Arctic CelM and the C. thermocellum CelM are similar in amino acid sequence to zinc metalloproteases, we expected Zn addition to increase the peptidase activity of the Arctic CelM (30). Instead, Zn addition reduced the activity of the Arctic CelM and had no effect on the activity of the C. thermocellum CelM (Fig. 4). The absence of peptidase activity in the C. thermocellum CelM is noteworthy because, based solely on amino acid similarity, one might predict that this protein would have peptidase activity due to the presence of a putative peptidase M42 domain (PFAM 05343).
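The comparison in Fig. 4 rests on replicate absorbance readings summarized as means with standard errors. A minimal sketch of that summary, with a Welch t statistic added only as one possible way to compare an extract with the pET101 control; the function names are hypothetical, and the statistical test, if any, used for Fig. 4 is not stated.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error (sample SD / sqrt(n)) of replicate readings."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(va + vb)
```

A p value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, for example from a statistics library.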
FIG. 4.
Aminopeptidase activity of Arctic CelM, Clostridium CelM, and pET101 control assayed with and without addition of Zn using glutamine-p-nitroanilide. The error bars indicate standard errors.
No cellulase activity was detected in Arctic CelM extracts assayed with CMC, 4-methylumbelliferyl-β-d-glucopyranoside, 4-nitrophenyl-β-d-cellobioside, and 4-nitrophenyl-β-d-cellotetraoside. As we expected, the cloned C. thermocellum CelM tested positive for CMC hydrolysis but did not hydrolyze any of the cellulose analogs.
Conclusions.
Sequence analysis of environmental DNA from uncultivated microbes is a potentially powerful tool for uncovering the metabolic capabilities of microbes in the environment and broadening our view of microbial ecology. However, gene sequences must be translated into biological functions, and this is a limiting step in using environmental sequence data to address ecological questions. This study exposed the shortcomings of assigning gene function based on sequence similarity alone. The difficulty in distinguishing CelM from the M42 family of peptidases based on sequence similarity was overcome only by determining the activity of the expressed protein. Although expression analysis is labor-intensive, it yields more complete and reliable functional information from environmental sequence data than sequence similarity alone.
Functional assays circumvent the problem of assigning biological functions to genes recovered from metagenomic libraries, but this approach has its shortcomings. Sophisticated approaches based on complementation of genes in mutant E. coli host cells have proven to be useful in detecting genes that are active in the fermentation of glycerol (23). In contrast, rather simple assays using selective media (31) and looking for distinctive coloration of colonies (17) have been effective for detecting antibiotic resistance genes. Similarly straightforward assays are available to detect genes encoding hydrolases, such as xylanase (5) and chitinase (9). However, the effectiveness of any functional assay depends on successful expression in E. coli or alternative hosts, such as those proposed for studying uncultured soil bacteria (16, 27).
The combination of sequence and expression analyses should allow us to start to examine the role of CelM in biopolymer degradation by Cytophaga-like bacteria, although we need more information about signal peptides used by these bacteria and other marine microbes. If CelM is exported using a signal peptide unlike those studied to date, then it may liberate low-molecular-weight peptides that either are transported directly into the cell or are hydrolyzed further to even smaller peptides or free amino acids. However, if CelM remains in the cell, it may play a role analogous to that of the PepA glutamyl aminopeptidase in Lactococcus lactis (25), which acts on intracellular peptides. Regardless, CelM is probably one part of a multicomponent system used by Cytophaga-like bacteria for the consumption of biopolymers in high-molecular-weight DOM and particulate detritus. CelM and other peptidases probably work together with proteases and peptide transporters in the degradation of protein-containing DOM by Cytophaga-like bacteria. Characterizing the genes encoding components of biopolymer utilization systems should be useful in linking uncultured Cytophaga-like bacteria and other bacterial groups to biopolymer degradation and DOM cycling.
Acknowledgments
This study was supported by grants from the U.S. Department of Energy (DE-FG02-01ER63142) and the National Science Foundation (OPP 0124733). The sequencing and assembly of the fosmid clone were performed by the production sequencing group at the DOE/Joint Genome Institute through the Sequence-For-Others Program under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and the University of California, Lawrence Livermore National Laboratory.
We thank Rex Malmstrom for his assistance and the Chief Scientists of the Shelf Basin Interaction project, Jackie Grebmeier and Lee Cooper, for their support during sample collection in the western Arctic Ocean. David Wilson kindly provided the clone of celM from C. thermocellum.
Footnotes
Supplemental material for this article may be found at http://aem.asm.org/.
REFERENCES
- 1. Amon, R. M. W., and R. Benner. 1996. Bacterial utilization of different size classes of dissolved organic matter. Limnol. Oceanogr. 41:41-51.
- 2. Arnosti, C. 2003. Microbial extracellular enzymes and their role in dissolved organic matter cycling, p. 316-342. In S. Findlay and R. L. Sinsabaugh (ed.), Aquatic ecosystems: interactivity of dissolved organic matter. Aquatic Ecology Series. Academic Press, San Diego, CA.
- 3. Benner, R. 2003. Molecular indicators of the bioavailability of dissolved organic matter, p. 121-138. In S. Findlay and R. L. Sinsabaugh (ed.), Aquatic ecosystems: interactivity of dissolved organic matter. Aquatic Ecology Series. Academic Press, San Diego, CA.
- 4. Benner, R., J. D. Pakulski, M. McCarthy, J. I. Hedges, and P. G. Hatcher. 1992. Bulk chemical characteristics of dissolved organic matter in the ocean. Science 255:1561-1564.
- 5. Brennan, Y., W. N. Callen, L. Christoffersen, P. Dupree, F. Goubet, S. Healey, M. Hernandez, M. Keller, K. Li, N. Palackal, A. Sittenfeld, G. Tamayo, S. Wells, G. P. Hazlewood, E. J. Mathur, J. M. Short, D. E. Robertson, and B. A. Steer. 2004. Unusual microbial xylanases from insect guts. Appl. Environ. Microbiol. 70:3609-3617.
- 6. Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, and J. D. Thompson. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31:3497-3500.
- 7. Cottrell, M. T., and D. L. Kirchman. 2000. Community composition of marine bacterioplankton determined by 16S rRNA gene clone libraries and fluorescence in situ hybridization. Appl. Environ. Microbiol. 66:5116-5122.
- 8. Cottrell, M. T., and D. L. Kirchman. 2000. Natural assemblages of marine proteobacteria and members of the Cytophaga-Flavobacter cluster consuming low- and high-molecular-weight dissolved organic matter. Appl. Environ. Microbiol. 66:1692-1697.
- 9. Cottrell, M. T., J. A. Moore, and D. L. Kirchman. 1999. Chitinases from uncultured marine microorganisms. Appl. Environ. Microbiol. 65:2553-2557.
- 10. Cottrell, M. T., L. Yu, and D. L. Kirchman. Bacterial diversity of metagenomic and PCR libraries from the Delaware River. Environ. Microbiol., in press.
- 11. Covert, J. S., and M. A. Moran. 2001. Molecular characterization of estuarine bacterial communities that use high- and low-molecular weight fractions of dissolved organic carbon. Aquat. Microb. Ecol. 25:127-139.
- 12. Crump, B. C., E. V. Armbrust, and J. A. Baross. 1999. Phylogenetic analysis of particle-attached and free-living bacterial communities in the Columbia River, its estuary, and the adjacent coastal ocean. Appl. Environ. Microbiol. 65:3192-3204.
- 13. DeLong, E. F., D. G. Franks, and A. L. Alldredge. 1993. Phylogenetic diversity of aggregate-attached vs free-living marine bacterial assemblages. Limnol. Oceanogr. 38:924-934.
- 14. Fandino, L. B., L. Riemann, G. F. Steward, R. A. Long, and F. Azam. 2001. Variations in bacterial community structure during a dinoflagellate bloom analyzed by DGGE and 16S rDNA sequencing. Aquat. Microb. Ecol. 23:119-130.
- 15. Felsenstein, J. 1989. PHYLIP—Phylogeny Inference Package (version 3.2). Cladistics 5:164-166.
- 16. Gabor, E. M., E. J. de Vries, and D. B. Janssen. 2003. Efficient recovery of environmental DNA for expression cloning by indirect extraction methods. FEMS Microbiol. Ecol. 44:153-163.
- 17. Gillespie, D. E., S. F. Brady, A. D. Bettermann, N. P. Cianciotto, M. R. Liles, M. R. Rondon, J. Clardy, R. M. Goodman, and J. Handelsman. 2002. Isolation of antibiotics turbomycin A and B from a metagenomic library of soil microbial DNA. Appl. Environ. Microbiol. 68:4301-4306.
- 18. Hanson, T. E., and F. R. Tabita. 2003. Insights into the stress response and sulfur metabolism revealed by proteome analysis of a Chlorobium tepidum mutant lacking the Rubisco-like protein. Photosynth. Res. 78:231-248.
- 19. Hugenholtz, P. 2002. Exploring prokaryotic diversity in the genomic era. Genome Biol. [Online.] doi: 10.1186/gb-2002-3-2-reviews0003.
- 20. Keil, R. G., and D. L. Kirchman. 1999. Utilization of dissolved protein and amino acids in the northern Sargasso Sea. Aquat. Microb. Ecol. 18:293-300.
- 21. Kirchman, D. L. 2002. The ecology of Cytophaga-Flavobacteria in aquatic environments. FEMS Microbiol. Ecol. 39:91-100.
- 22. Kirchman, D. L., and J. White. 1999. Hydrolysis and mineralization of chitin in the Delaware Estuary. Aquat. Microb. Ecol. 18:187-196.
- 23. Knietsch, A., S. Bowien, G. Whited, G. Gottschalk, and R. Daniel. 2003. Identification and characterization of coenzyme B12-dependent glycerol dehydratase- and diol dehydratase-encoding genes from metagenomic DNA libraries derived from enrichment cultures. Appl. Environ. Microbiol. 69:3048-3060.
- 24. Kobayashi, T., M. P. M. Romaniec, P. J. Barker, U. T. Gerngross, and A. L. Demain. 1993. Nucleotide sequence of gene celM encoding a new endoglucanase (CelM) of Clostridium thermocellum and purification of the enzyme. J. Ferment. Bioeng. 76:251-256.
- 25. L'Anson, K. J. A., S. Movahedi, H. G. Griffin, M. J. Gasson, and F. Mulholland. 1995. A nonessential glutamyl aminopeptidase is required for optimal growth of Lactococcus lactis MG1363 in milk. Microbiology 141:2873-2881.
- 26. Ludwig, W., O. Strunk, R. Westram, L. Richter, H. Meier, Yadhukumar, A. Buchner, T. Lai, S. Steppi, G. Jobb, W. Forster, I. Brettske, S. Gerber, A. W. Ginhart, O. Gross, S. Grumann, S. Hermann, R. Jost, A. Konig, T. Liss, R. Lussmann, M. May, B. Nonhoff, B. Reichel, R. Strehlow, A. Stamatakis, N. Stuckmann, A. Vilbig, M. Lenke, T. Ludwig, A. Bode, and K. H. Schleifer. 2004. ARB: a software environment for sequence data. Nucleic Acids Res. 32:1363-1371.
- 27. Martinez, A., S. J. Kolvek, C. L. T. Yip, J. Hopke, K. A. Brown, I. A. MacNeil, and M. S. Osburne. 2004. Genetically modified bacterial strains and novel bacterial artificial chromosome shuttle vectors for constructing environmental libraries and detecting heterologous natural products in multiple expression hosts. Appl. Environ. Microbiol. 70:2452-2463.
- 28. Muyzer, G., A. Teske, C. O. Wirsen, and H. W. Jannasch. 1995. Phylogenetic relationships of Thiomicrospira species and their identification in deep-sea hydrothermal vent samples by denaturing gradient gel electrophoresis of 16S rDNA fragments. Arch. Microbiol. 164:165-172.
- 29. Rath, J., K. Y. Wu, G. J. Herndl, and E. F. DeLong. 1998. High phylogenetic diversity in a marine-snow-associated bacterial assemblage. Aquat. Microb. Ecol. 14:261-269.
- 30. Rawlings, N. D., and A. J. Barrett. 1995. Evolutionary families of metallopeptidases. Methods Enzymol. 248:183-228.
- 31. Riesenfeld, C. S., R. M. Goodman, and J. Handelsman. 2004. Uncultured soil bacteria are a reservoir of new antibiotic resistance genes. Environ. Microbiol. 6:981-989.
- 32. Venter, J. C., K. Remington, J. F. Heidelberg, A. L. Halpern, D. Rusch, J. A. Eisen, D. Y. Wu, I. Paulsen, K. E. Nelson, W. Nelson, D. E. Fouts, S. Levy, A. H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R. Parsons, H. Baden-Tillson, C. Pfannkoch, Y. H. Rogers, and H. O. Smith. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74.
- 33. Weiss, M. S., U. Abele, J. Weckesser, W. Welte, E. Schiltz, and G. E. Schulz. 1991. Molecular architecture and electrostatic properties of a bacterial porin. Science 254:1627-1630.
- 34. Wu, J. H. D., W. H. Orme-Johnson, and A. L. Demain. 1988. Two components of an extracellular protein aggregate of Clostridium thermocellum together degrade crystalline cellulose. Biochemistry 27:1703-1709.