Abstract
Clostridium thermocellum is a potent cellulolytic bacterium. C. thermocellum strain PAL5, which was derived from strain S14 isolated from bagasse paper sludge, possesses higher cellulose-degradation ability than the representative strains ATCC27405 and DSM1313. In this work, we determined the draft genome sequence of C. thermocellum PAL5. Genomic DNA was subjected to whole-genome sequencing on the Illumina HiSeq 2500. We obtained 215 contigs of >200 bp (N50, 78,366 bp; mean length, 17,378 bp). The assembled data were subjected to the National Center for Biotechnology Information (NCBI) Prokaryotic Genome Annotation Pipeline, and 3,198 protein-coding sequences, 53 tRNA genes, and 4 rRNA genes were identified. The data are accessible at NCBI (accession number SBHL00000000). Our data resource will facilitate further studies of efficient cellulose degradation using C. thermocellum.
Keywords: Clostridium thermocellum, Cellulose, Cellulose-degradation, Draft genome sequence
Specifications table
| Subject area | Biology |
| More specific subject area | Bacteriology, Genomics |
| Type of data | Genomic sequence, predicted genes and annotation of respective proteins, deposited in NCBI database and available by links provided within article |
| How data were acquired | Whole-genome sequencing using Illumina HiSeq 2500 |
| Data format | Raw and analyzed |
| Experimental factors | Genomic DNA extracted from pure culture of Clostridium thermocellum PAL5 |
| Experimental features | Genome sequencing, de novo assembly, gene prediction |
| Data source location | Tsukuba, Ibaraki, Japan |
| Data accessibility | Deposited data are available at the National Center for Biotechnology Information (NCBI) under the accession number SBHL00000000 (https://www.ncbi.nlm.nih.gov/nuccore/SBHL00000000) |
| Related research article | C. Tachaapaikoon, A. Kosugi, P. Pason, R. Waeonukul, K. Ratanakhanokchai, K.L. Kyu, T. Arai, Y. Murata, Y. Mori, Isolation and characterization of a new cellulosome-producing Clostridium thermocellum strain, Biodegradation 23 (1) (2012) 57–68. |
Value of the data
1. Data
The thermophilic anaerobic bacterium Clostridium thermocellum (recently called Hungateiclostridium thermocellum) is a multifunctional ethanol producer, capable of both saccharification and fermentation [1]. C. thermocellum PAL5 was derived from strain S14 [2], [3], [4], which was isolated from bagasse paper sludge. The cellulolytic activity of strain PAL5 was compared with those of C. thermocellum ATCC27405T, the type strain of this species [5], and C. thermocellum DSM1313 [6] by incubation for 3 days at 60 °C in CTFUD medium [7] containing 1.0% microcrystalline cellulose powder instead of cellobiose. PAL5 degraded more cellulose than the other strains (Fig. 1), indicating that PAL5 may, like strain S14, possess high cellulose-degradation ability.
Fig. 1.
Comparison of cellulose-degradation ability of three strains of Clostridium thermocellum. The percentage of residual cellulose relative to the original weight is shown for experiments with Clostridium thermocellum strains PAL5, ATCC27405, and DSM1313 and for uninoculated controls (control) after 3 days of incubation at 60 °C. PAL5, ATCC27405, and DSM1313 were grown on CTFUD medium containing 1.0% microcrystalline cellulose. The data are means of four independent experiments. Error bars represent ± standard deviation (n = 4).
In this work, we determined the draft genome sequence of C. thermocellum PAL5 to identify which factors affect its cellulose-degradation ability. In total, 81,421,880 single reads of 100 bp were obtained after quality-score filtering. De novo genome assembly was performed using the CLC Genomics Workbench (CLC Bio, Qiagen, Valencia, CA); 215 contigs of >200 bp, excluding scaffolded regions, were obtained. Features of the genome are shown in Table 1. The assembled data for PAL5 were subjected to the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), and 3,198 protein-coding sequences (CDSs), 53 tRNA genes, and 4 rRNA genes were identified. The equivalent values for strain ATCC27405 were 3,204 CDSs, 56 tRNA genes, and 12 rRNA genes (GenBank accession number: NC_009012). The gene content predicted for PAL5 is thus similar to the known genome information for the type strain, which supports the reliability of the sequencing results.
Table 1.
Features of C. thermocellum PAL5 genome.
| Feature | Description |
|---|---|
| Number of reads used in assembly | 81,421,088 |
| Read length | 100 bp |
| Genome size (total contig size) | 3.84 Mbp |
| Assembly G + C percent | 38.80% |
| N50 contig length | 78,366 bp |
| Minimum contig length | 208 bp |
| Maximum contig length | 424,669 bp |
| Average contig length | 17,378 bp |
| Number of contigs | 215 contigs |
| Total contig size | 3,736,353 bp |
| Genome coverage | 2,178-fold |
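The contig statistics in Table 1 follow from the list of contig lengths alone. The following is a minimal Python sketch of how such values are commonly derived; the function name `assembly_stats` and the toy contig lengths are hypothetical, and the coverage is approximated as sequenced bases divided by assembled bases, which may differ from how the assembly software reports it.

```python
from typing import Dict, List


def assembly_stats(contig_lengths: List[int], n_reads: int, read_length: int) -> Dict[str, float]:
    """Summarize an assembly from its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum, taken from the
    # longest contig downward, first reaches half of the total assembly size.
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "n_contigs": len(lengths),
        "total_bp": total,
        "min_bp": lengths[-1],
        "max_bp": lengths[0],
        "mean_bp": total / len(lengths),
        "n50_bp": n50,
        # Assumption: coverage approximated as sequenced bases / assembled bases.
        "coverage": n_reads * read_length / total,
    }


# Toy contig lengths for illustration only, not the PAL5 assembly.
toy = assembly_stats([500, 400, 300, 200, 100], n_reads=150, read_length=100)
print(toy["n50_bp"], toy["mean_bp"], toy["coverage"])
```

Applying the same ratio to the Table 1 values (81,421,088 reads of 100 bp over a total contig size of 3,736,353 bp) gives roughly 2,179, close to the reported 2,178-fold coverage.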
We used the average nucleotide identity (ANI) assay [8] to compare eight strains of C. thermocellum, including PAL5, and two outgroup strains, C. clariflavum DSM19732 (CP003065.1) and Herbivorax (Hungateiclostridium) saccincola GGR1 (CP025197.1). The ANI value is calculated as the mean identity of BLASTn matches between the virtually fragmented query genome and the reference genome. A dendrogram of relatedness based on the ANI values (Suppl. Table 1) was constructed using the unweighted pair group method with arithmetic mean (UPGMA) (Fig. 2) and the single-linkage method (data not shown) as clustering methods, which showed that PAL5 is closely related to all the C. thermocellum strains.
Fig. 2.
Dendrogram of average nucleotide identity (ANI) values. The ANI value for each combination of strains was calculated, and a dendrogram was constructed using the unweighted pair group method with arithmetic mean. Clostridium clariflavum DSM19732 (GenBank accession number: NZ_CP003065.1) and Herbivorax saccincola GGR1 (NZ_CP025197.1) were used as outgroups. Strains of Clostridium thermocellum: PAL5, ATCC27405 (NC_009012), DSM1313 (NC_017304), DSM2360 (NZ_CP016502), CB1 (NZ_CBQ0000000000.1), JW20 (NZ_ABVG00000000.2), AD2 (NZ_CP013828.1), and YS (AJGT00000000.1).
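The ANI definition used here (mean identity of BLASTn matches for a virtually fragmented query genome) can be illustrated with a short Python sketch. It does not run BLASTn; it assumes the best hit of each fragment has already been obtained. The record layout `Hit`, the window and step sizes, and the identity and length cutoffs are illustrative assumptions, not settings confirmed by this article.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Best BLASTn hit of one query fragment (hypothetical record layout)."""
    percent_identity: float  # identity over the aligned region, 0-100
    alignment_length: int    # aligned length in bp


def fragment(sequence: str, window: int = 1000, step: int = 1000) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs of full-length windows along the query."""
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


def average_nucleotide_identity(
    hits: List[Hit], min_identity: float = 70.0, min_length: int = 700
) -> float:
    """Mean identity of the fragment hits that pass both cutoffs."""
    kept = [
        h.percent_identity
        for h in hits
        if h.percent_identity >= min_identity and h.alignment_length >= min_length
    ]
    if not kept:
        raise ValueError("no fragment passed the cutoffs")
    return sum(kept) / len(kept)


# Toy data: a 2,400 bp query yields two full 1,000 bp fragments.
n_fragments = len(list(fragment("ACGT" * 600)))

# Toy hits: the last two are discarded (low identity, short alignment).
toy_hits = [Hit(99.0, 1000), Hit(98.0, 950), Hit(65.0, 990), Hit(99.5, 300)]
toy_ani = average_nucleotide_identity(toy_hits)
print(n_fragments, toy_ani)
```

Filtering before averaging keeps poorly aligned or divergent fragments from lowering the mean, so the value reflects the shared portion of the two genomes.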
Eight putative cellulosomal scaffolding proteins of PAL5 were identified from the genomic data by similarity with those of strain ATCC27405 (Table 2). CipA and OlpB were each split across three nonconsecutive protein accessions; we suggest this was because the single reads could not be concatenated by the algorithm used in the de novo assembly. We consider that our genome data are of sufficient quality for further analysis of the factors that affect the cellulose-degradation ability of strain PAL5 and other strains.
Table 2.
Comparison of cellulosomal scaffolding proteins from strains ATCC27405T and PAL5.
| Predicted protein | ATCC27405T | Protein accession number in PAL5 |
|---|---|---|
| Scaffolding protein | CipA | THJ77199.1, THJ77201.1, THJ77215.1 (partial) |
| Anchoring protein | OlpA | THJ76703.1 |
| Anchoring protein | OlpC | THJ77790.1 |
| Anchoring protein | SdbA | THJ78951.1 |
| Anchoring protein | Orf2p | THJ76702.1 |
| Anchoring protein | OlpB | THJ76701.1, THJ77198.1, THJ77200.1 (partial) |
| Cellulosomal integrated protein | Cthe_0735 | THJ78005.1 |
| Cellulosomal integrated protein | Cthe_0736 | THJ78004.1 |
2. Experimental design, materials, and methods
2.1. Genomic DNA extraction and sequencing
Genomic DNA of C. thermocellum PAL5 was extracted from cells grown under anaerobic conditions at 60 °C using the cetyltrimethylammonium bromide (CTAB) method [9]. The genomic DNA was processed into template samples using the TruSeq Nano DNA LT Library Prep Kit (Illumina, San Diego, CA). The template samples were formed into clusters using the HiSeq PE Rapid Cluster Kit v2-HS and HiSeq Rapid Duo cBot v2 Sample Loading Kit, and then sequenced using the HiSeq Rapid SBS Kit v2-HS (Illumina) on the HiSeq 2500 next-generation sequencer (Illumina). De novo genome assembly was performed using the CLC Genomics Workbench. The assembled data were subjected to the NCBI PGAP.
2.2. Genomic average nucleotide identity
ANI analysis, an in silico counterpart to DNA–DNA hybridization, was performed. ANI values for pairwise combinations of the whole-genome sequences of C. thermocellum strains were calculated using the web tool ANI calculator (http://enve-omics.ce.gatech.edu/ani/). The matrix of ANI values between strains was converted to a dendrogram by the unweighted pair group method with arithmetic mean and by single-linkage clustering in the R statistical software.
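The conversion of an ANI matrix into a UPGMA dendrogram can be sketched as follows. The analysis described above was run in R; this is an independent, minimal Python illustration of the same clustering idea, not the script used in this work. The function name `upgma`, the use of 100 − ANI as the distance, the averaging of the two directions of each pair, and the toy matrix are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple


def upgma(names: List[str], ani: List[List[float]]) -> Tuple[str, List[float]]:
    """Cluster strains by UPGMA; return a Newick topology and the merge heights."""
    n = len(names)
    # Each cluster id maps to (Newick label, number of strains it contains).
    clusters: Dict[int, Tuple[str, int]] = {i: (names[i], 1) for i in range(n)}
    # Assumption: distance = 100 - ANI, averaging the two directions of a pair.
    dist: Dict[FrozenSet[int], float] = {}
    for i in range(n):
        for j in range(i + 1, n):
            dist[frozenset((i, j))] = 100.0 - (ani[i][j] + ani[j][i]) / 2.0

    heights: List[float] = []
    next_id = n
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        heights.append(dist.pop(pair) / 2.0)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances (the UPGMA update).
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1

    label, _ = next(iter(clusters.values()))
    return label + ";", heights


# Toy ANI matrix (percent) for three related strains and one outgroup.
toy_names = ["A", "B", "C", "Out"]
toy_matrix = [
    [100.0, 99.0, 98.0, 80.0],
    [99.0, 100.0, 98.0, 80.0],
    [98.0, 98.0, 100.0, 80.0],
    [80.0, 80.0, 80.0, 100.0],
]
newick, heights = upgma(toy_names, toy_matrix)
print(newick, heights)
```

Single-linkage clustering differs only in the update step: the distance from a merged cluster to another cluster is the minimum of the two old distances rather than their size-weighted mean.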
Acknowledgements
This work was conducted as part of a development project funded by Exploratory Research for Advanced Technology (ERATO) (Grant number JPMJER1502) of the Japan Science and Technology Agency (JST) and the Science and Technology Research Partnership for Sustainable Development (SATREPS), Japan Science and Technology Agency (JST)/Japan International Cooperation Agency (JICA). We thank James Allen, DPhil, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.dib.2019.104274.
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Prawitwong P., Waeonukul R., Tachaapaikoon C., Pason P., Ratanakhanokchai K., Deng L., Sermsathanaswadi J., Septiningrum K., Mori Y., Kosugi A. Direct glucose production from lignocellulose using Clostridium thermocellum cultures supplemented with a thermostable β-glucosidase. Biotechnol. Biofuels. 2013;6(1):184. doi: 10.1186/1754-6834-6-184.
- 2. Tachaapaikoon C., Kosugi A., Pason P., Waeonukul R., Ratanakhanokchai K., Kyu K.L., Arai T., Murata Y., Mori Y. Isolation and characterization of a new cellulosome-producing Clostridium thermocellum strain. Biodegradation. 2012;23(1):57–68. doi: 10.1007/s10532-011-9486-9.
- 3. Shikata A., Sermsathanaswadi J., Thianheng P., Baramee S., Tachaapaikoon C., Waeonukul R., Pason P., Ratanakhanokchai K., Kosugi A. Characterization of an anaerobic, thermophilic, alkaliphilic, high lignocellulosic biomass-degrading bacterial community, ISHI-3, isolated from biocompost. Enzym. Microb. Technol. 2018;118:66–75. doi: 10.1016/j.enzmictec.2018.07.001.
- 4. Widyasti E., Shikata A., Hashim R., Sulaiman O., Sudesh K., Wahjono E., Kosugi A. Biodegradation of fibrillated oil palm trunk fiber by a novel thermophilic, anaerobic, xylanolytic bacterium Caldicoprobacter sp. CL-2 isolated from compost. Enzym. Microb. Technol. 2018;111:21–28. doi: 10.1016/j.enzmictec.2017.12.009.
- 5. Wilson C.M., Rodriguez M.J., Johnson C.M., Martin S.L., Chu T.M., Wolfinger R.D., Hauser L.J., Land M.L., Klingeman D.M., Syed M.H., Ragauskas A.J., Tschaplinski T.J., Mielenz J.R., Brown S.D. Global transcriptome analysis of Clostridium thermocellum ATCC27405 during growth on dilute acid pretreated Populus and switchgrass. Biotechnol. Biofuels. 2013;6:179. doi: 10.1186/1754-6834-6-179.
- 6. Feinberg L., Foden J., Barrett T., Davenport K.W., Bruce D., Detter C., Tapia R., Han C., Lapidus A., Lucas S., Cheng J.F., Pitluck S., Woyke T., Ivanova N., Mikhailova N., Land M., Hauser L., Argyros D.A., Goodwin L., Hogsett D., Caiazza N. Complete genome sequence of the cellulolytic thermophile Clostridium thermocellum DSM1313. J. Bacteriol. 2011;193:2906–2907. doi: 10.1128/JB.00322-11.
- 7. Olson D.G., Lee L.R. Transformation of Clostridium thermocellum by electroporation. Methods Enzymol. 2012;510:317–330. doi: 10.1016/B978-0-12-415931-0.00017-3.
- 8. Goris J., Konstantinidis K.T., Klappenbach J.A., Coenye T., Vandamme P., Tiedje J.M. DNA–DNA hybridization values and their relationship to whole-genome sequence similarities. Int. J. Syst. Evol. Microbiol. 2007;57:81–91. doi: 10.1099/ijs.0.64483-0.
- 9. Murray M.G., Thompson W.F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980;8:4321–4325. doi: 10.1093/nar/8.19.4321.