Abstract
Cellulosilyticum lentocellum DSM 5427 is an anaerobic, endospore-forming member of the Firmicutes. We describe the complete genome sequence of this cellulose-degrading bacterium, which was originally isolated from estuarine sediment of a river that received both domestic and paper mill waste. Comparative genomics of cellulolytic clostridia will provide insight into factors that influence degradation rates.
TEXT
Cellulosilyticum lentocellum DSM 5427 (2), previously known as Clostridium lentocellum, was isolated from estuarine sediment of the River Don, Aberdeenshire, Scotland (15). The site was sampled in search of novel cellulose degraders because the river received both domestic and paper mill effluent and its sediments exhibited cellulolytic activity. C. lentocellum degrades cellulose slowly and may form a single terminal endospore (15). Based on its 16S rRNA gene sequence, C. lentocellum DSM 5427 belongs to Clostridium cluster XIVb (4), or Lachnospiraceae, with its closest relatives being C. ruminicola (2), Metabacterium polyspora, and Epulopiscium spp. (2, 4). A second strain assigned to the species, C. lentocellum SG6 (16, 17), was isolated from budgerigar droppings and named on the basis of its phenotypic similarities, although its phylogenetic relationship to DSM 5427 has not been determined.
The C. lentocellum DSM 5427 genome was sequenced at the Joint Genome Institute (JGI) using 454 Titanium (14) and Illumina (1) technologies. Descriptions of library construction, sequencing, and assembly can be found at the JGI website (http://www.jgi.doe.gov/). Illumina data were assembled using VELVET (19), and the consensus sequences were shredded into overlapping 1.5-kb fake reads and assembled together with the 454 data. The initial Newbler assembly contained 125 contigs in 12 scaffolds. This assembly was then converted into a phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired-end library. The Phred/Phrap/Consed software package (5–7) was used for sequence assembly and quality assessment. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher, developed at the JGI. After the shotgun stage, reads were assembled with parallel Phrap (High Performance Software, LLC). Possible misassemblies were corrected with gapResolution (Cliff Han, unpublished data), with Dupfinisher (8), or by sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR, and by Bubble PCR primer walks. Automated annotation was performed at Oak Ridge National Laboratory and included coding sequence prediction using PRODIGAL (10) and functional annotation using Clusters of Orthologous Groups (COG) (18) and KEGG (11). Genes encoding tRNAs and rRNA operons were identified using tRNAscan-SE (13) and RNAmmer 1.2 (12), respectively.
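The shredding step described above, in which a consensus sequence is cut into overlapping 1.5-kb fake reads, can be illustrated with a minimal Python sketch. It is not the JGI pipeline code. The function name `shred`, the 1,000-bp step between read starts (that is, a 500-bp overlap), and the handling of the contig tail are assumptions made for illustration; the text specifies only the 1.5-kb read length.

```python
def shred(consensus: str, read_len: int = 1500, step: int = 1000) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length pseudo-reads.

    read_len follows the 1.5-kb size given in the text; step (and hence the
    500-bp overlap) is an illustrative assumption, not a documented setting.
    """
    if read_len <= 0 or not 0 < step <= read_len:
        raise ValueError("need read_len > 0 and 0 < step <= read_len")
    if not consensus:
        return []
    # A contig shorter than one read is returned whole.
    if len(consensus) <= read_len:
        return [consensus]
    starts = list(range(0, len(consensus) - read_len + 1, step))
    # Add a final read anchored at the contig end so the tail is covered.
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)
    return [consensus[s:s + read_len] for s in starts]


if __name__ == "__main__":
    contig = "ACGT" * 1000  # hypothetical 4,000-bp consensus
    reads = shred(contig)
    print(len(reads), {len(r) for r in reads})
```

Because every read start is at most one step from the next and the step does not exceed the read length, consecutive reads always overlap or abut, so the shredded set covers the entire consensus.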
The genome of C. lentocellum DSM 5427 consists of a single circular chromosome of 4,714,237 bp with a G+C content of 34.3%. The genome contains 4,185 predicted protein-coding sequences, 105 tRNA genes, 12 23S rRNA genes, and 11 16S rRNA genes. COG annotation showed that 9.74% of the predicted proteins fall into the carbohydrate transport and metabolism category. Thirteen predicted open reading frames are annotated as cellulose degradation enzymes. Seven of these coding sequences are closely related to cellulases in C. ruminicola (3).
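For readers who want to reproduce summary statistics of this kind from a sequence file, the following Python sketch shows how a G+C percentage is computed and how the reported COG percentage translates into an approximate protein count. The function name `gc_content` is hypothetical, and the derived count assumes that the 9.74% figure is taken over all 4,185 predicted coding sequences; only the percentage itself is stated in the text.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Values reported in the text.
predicted_cds = 4185
carbohydrate_cog_percent = 9.74

# Assumption: the percentage refers to all predicted coding sequences,
# so this is an approximate, derived count rather than a reported one.
approx_carbohydrate_proteins = round(predicted_cds * carbohydrate_cog_percent / 100)

if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(approx_carbohydrate_proteins)
```

Ambiguous bases such as N are excluded from the denominator here; other tools may count them, which shifts the percentage slightly for draft assemblies.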
This is the first complete genome sequence from a member of Clostridium cluster XIVb, which includes anaerobes from anoxic sediments and herbivores. Comparative studies with other cellulose degraders (9) hold promise for advancing our understanding of the cellular and genomic factors that influence cellulose degradation and biofuel production.
Nucleotide sequence accession number.
The genome sequence of C. lentocellum DSM 5427 has been deposited in GenBank under accession no. CP002582.
Acknowledgments
We thank Sandra M. Adams for facilitating DNA submission.
This work was funded by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC02-07ER64494) supporting G.S., B.G.F., and C.R.C. The work conducted by the U.S. Department of Energy JGI was supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231.
Footnotes
Published ahead of print on 11 March 2011.
REFERENCES
- 1. Bennett S. 2004. Solexa Ltd. Pharmacogenomics 5:433–438
- 2. Cai S., Dong X. 2010. Cellulosilyticum ruminicola gen. nov., sp. nov., isolated from the rumen of yak, and reclassification of Clostridium lentocellum as Cellulosilyticum lentocellum comb. nov. Int. J. Syst. Evol. Microbiol. 60:845–849
- 3. Cai S., et al. 2010. Cellulosilyticum ruminicola, a newly described rumen bacterium that possesses redundant fibrolytic-protein-encoding genes and degrades lignocellulose with multiple carbohydrate-borne fibrolytic enzymes. Appl. Environ. Microbiol. 76:3818–3824
- 4. Collins M. D., et al. 1994. The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. Int. J. Syst. Bacteriol. 44:812–826
- 5. Ewing B., Green P. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8:186–194
- 6. Ewing B., Hillier L., Wendl M. C., Green P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175–185
- 7. Gordon D., Abajian C., Green P. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195–202
- 8. Han C. S., Chain P. 2006. Finishing repeat regions automatically with Dupfinisher, p. 141–146. In Arabnia H. R., Valafar H. (ed.), Proceedings of the 2006 International Conference on Bioinformatics and Computational Biology. CSREA Press, Las Vegas, NV
- 9. Hemme C. L., et al. 2010. Sequencing of multiple clostridial genomes related to biomass conversion and biofuel production. J. Bacteriol. 192:6494–6496
- 10. Hyatt D., et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119
- 11. Kanehisa M., Goto S., Kawashima S., Okuno Y., Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280
- 12. Lagesen K., et al. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108
- 13. Lowe T. M., Eddy S. R. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964
- 14. Margulies M., et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380
- 15. Murray W. D., Hoffman L., Campbell N. L., Madden R. H. 1986. Clostridium lentocellum sp. nov., a cellulolytic species from river sediment containing paper-mill waste. Syst. Appl. Microbiol. 8:181–184
- 16. Ravinder T., Sudha Rani K., Gopal R., Seenayya G. 1998. Direct conversion of biomass to acetic acid by anaerobic cellulolytic isolates. J. Sci. Ind. Res. 57:591–594
- 17. Ravinder T., Swamy M. V., Seenayya G., Reddy G. 2001. Clostridium lentocellum SG6—a potential organism for fermentation of cellulose to acetic acid. Bioresour. Technol. 80:171–177
- 18. Tatusov R. L., et al. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41
- 19. Zerbino D. R., Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821–829
