ABSTRACT
Cystobasidium ongulense has been reported from East Ongul Island near Syowa Station, East Antarctica, as a new basidiomycetous yeast species. This species has cold-active lipases and cellulases that remain active even at subzero temperatures. We report draft genome sequences of five Cystobasidium ongulense strains isolated from East Antarctica.
ANNOUNCEMENT
Cystobasidium ongulense was isolated from East Ongul Island, East Antarctica, and reported as a new species of basidiomycetous yeast (1). Because this yeast grows across a wide temperature range, from −3°C to 30°C, its genomic information is expected to be valuable as a new genetic resource (2).
This study analyzed the whole-genome sequences of two strains (9A-2 = JCM 31527 and 9A-5 = JCM 31528) isolated from East Ongul Island and three strains (051-20-1, 056-1-20Y-3, and 056-20-8) isolated from Inhovde, near Syowa Station.
Each strain was cultured in yeast extract-peptone-dextrose (YPD) liquid medium (Difco) at 15°C for 5 days. The cells were collected from the cultures by centrifugation at 3,500 × g for 5 min at 4°C. The cell pellets were washed with sterile distilled water and pelleted again by centrifugation. The cell pellets were then dried using a vacuum freeze dryer. The lyophilized cells were powdered in a mortar and used for genomic DNA extraction. The genomic DNA of the five Cystobasidium ongulense strains was extracted and purified using a NucleoBond AXG100 column with a Nucleo buffer set III (TaKaRa Bio) according to the manufacturer’s protocol. The concentration and purity of the genomic DNA were determined using a NanoDrop spectrophotometer (Thermo Scientific) and the Qubit double-stranded DNA (dsDNA) broad-range (BR) assay kit (Thermo Scientific). The genomic DNA was then sheared using a microTUBE (Covaris), and the genomic library was constructed using the Illumina TruSeq DNA PCR-free library preparation kit (Illumina). The quality of the library was confirmed using the Quant-iT PicoGreen dsDNA assay kit, and the library size was checked using the Agilent 2200 TapeStation system (Agilent Technologies, Inc.). Paired-end sequencing was performed on a HiSeq 2500 instrument (Illumina). Adapters and low-quality bases were trimmed using fastp ver. 0.12.4 (3). The nuclear genomes were assembled using VelvetOptimiser ver. 2.2.6 (4). The completeness of the assemblies was assessed using BUSCO ver. 5.2.2 (5) with the basidiomycota_odb10 database (1,764 markers). After the identification of repeat sequences using RepeatModeler ver. 2.0.1 and RepeatMasker ver. 4.1.2 (6), annotation was performed using the BRAKER2 ver. 2.1.6 (7–11) automated annotation pipeline with the fungal protein data set from OrthoDB ver. 10 (12). BRAKER2 ab initio gene predictions were carried out using AUGUSTUS ver. 3.4.0 (10) and GeneMark-EP+ ver. 4.64 (13).
The resulting BRAKER2 GTF file was parsed using TSEBRA (14) and AGAT ver. 0.8.0 (15). Finally, functional annotation was performed using InterProScan ver. 5 (16), Phobius ver. 1.01 (17), antiSMASH ver. 6.0 (18), and eggNOG-mapper ver. 2.1.3 (19) against the eggNOG database ver. 5.0.1 (20) using DIAMOND ver. 2.0.9 (21).
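The assembly metrics reported for these genomes (number of scaffolds, total length, scaffold N50, and GC content) are standard summary statistics. As a minimal sketch of how such values can be derived from a scaffold FASTA file, the Python below may be useful. It is an illustration, not the pipeline used in this study: the function names (`parse_fasta`, `n50`, `gc_percent`, `assembly_summary`) are hypothetical, and computing GC content over unambiguous bases only is an assumption.

```python
from typing import Dict, Iterable, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def gc_percent(seqs: Iterable[str]) -> float:
    """GC content (%) over unambiguous bases (A, C, G, T); Ns are ignored (assumption)."""
    seqs = list(seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    at = sum(s.count("A") + s.count("T") for s in seqs)
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


def assembly_summary(fasta_text: str) -> dict:
    """Collect the basic per-assembly statistics for one FASTA file."""
    seqs = parse_fasta(fasta_text)
    lengths = [len(s) for s in seqs.values()]
    return {
        "scaffolds": len(lengths),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gc_percent": gc_percent(seqs.values()),
    }
```

In practice, `assembly_summary` would be called on the text of each strain's scaffold FASTA file; dedicated tools report the same metrics and may treat ambiguous bases differently.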
A summary of the genomic analysis results for the five C. ongulense strains is shown in Table 1. Briefly, the genome size ranged from 19.9 Mb to 20.1 Mb, the GC content ranged from 49.1% to 49.2%, and the BUSCO completeness score ranged from 92.4% to 92.7%. The number of predicted protein-coding genes ranged from 7,183 to 7,250.
TABLE 1.
Assembly summary for draft genomic data of five Cystobasidium ongulense strains
| Characteristic | 9A-2 (=JCM 31527) | 9A-5 (=JCM 31528) | 051-20-1 | 056-1-20Y-3 | 056-20-8 |
|---|---|---|---|---|---|
| Sampling site | East Ongul Island | East Ongul Island | Inhovde | Inhovde | Inhovde |
| No. of reads | 8,304,954,096 | 6,269,707,240 | 7,317,973,400 | 7,758,637,304 | 7,164,512,704 |
| No. of scaffolds | 69 | 67 | 120 | 73 | 81 |
| Total length (Mb) | 19.9 | 19.9 | 20.1 | 20.0 | 20.0 |
| Coverage (×) | 392 | 297 | 337 | 361 | 330 |
| GC content (%) | 49.2 | 49.2 | 49.2 | 49.1 | 49.1 |
| Scaffold N50 value (kb) | 2,125 | 2,126 | 1,057 | 1,432 | 1,222 |
| No. of proteins | 7,190 | 7,183 | 7,243 | 7,227 | 7,250 |
| BUSCO score (%) | 92.6 | 92.6 | 92.7 | 92.6 | 92.4 |
| GenBank accession no. | BQUO00000000.1 | BQUP00000000.1 | BQUQ00000000.1 | BQUR00000000.1 | BQUS00000000.1 |
| SRA accession no. | DRR118811, DRR118812 | DRR118813, DRR118814 | DRR118815, DRR118816 | DRR118817, DRR118818 | DRR118819, DRR118820 |
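The ranges quoted in the text can be recomputed directly from the values in Table 1. The short Python sketch below does this for four of the characteristics; the dictionary layout and the names `TABLE1` and `value_range` are illustrative choices, not part of the original analysis.

```python
# Values transcribed from Table 1 (one entry per strain).
TABLE1 = {
    "9A-2":        {"total_length_mb": 19.9, "gc_percent": 49.2, "proteins": 7190, "busco_percent": 92.6},
    "9A-5":        {"total_length_mb": 19.9, "gc_percent": 49.2, "proteins": 7183, "busco_percent": 92.6},
    "051-20-1":    {"total_length_mb": 20.1, "gc_percent": 49.2, "proteins": 7243, "busco_percent": 92.7},
    "056-1-20Y-3": {"total_length_mb": 20.0, "gc_percent": 49.1, "proteins": 7227, "busco_percent": 92.6},
    "056-20-8":    {"total_length_mb": 20.0, "gc_percent": 49.1, "proteins": 7250, "busco_percent": 92.4},
}


def value_range(metric: str) -> tuple:
    """Return (minimum, maximum) of one Table 1 characteristic across all strains."""
    values = [row[metric] for row in TABLE1.values()]
    return min(values), max(values)


if __name__ == "__main__":
    for metric in ("total_length_mb", "gc_percent", "busco_percent", "proteins"):
        low, high = value_range(metric)
        print(f"{metric}: {low} to {high}")
```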
We believe that the genomic information of C. ongulense will help us understand how Antarctic fungi adapt to the extreme environment.
Data availability.
This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under BioProject accession number PRJDB6568 with accession numbers BQUO00000000.1 (9A-2), BQUP00000000.1 (9A-5), BQUQ00000000.1 (051-20-1), BQUR00000000.1 (056-1-20Y-3), and BQUS00000000.1 (056-20-8). The raw reads are available under SRA accession numbers DRR118811, DRR118812 (9A-2), DRR118813, DRR118814 (9A-5), DRR118815, DRR118816 (051-20-1), DRR118817, DRR118818 (056-1-20Y-3), and DRR118819, DRR118820 (056-20-8).
ACKNOWLEDGMENTS
This work was supported by the National Institute of Polar Research (NIPR) research project KP-309, general collaboration project number 2-29. This work was also supported by a JSPS Grant-in-Aid for Young Scientists (A) (number 16H06211), JSPS KAKENHI grant number 16H06279 (PAGS), and the Institute for Fermentation, Osaka, general research grant number G-2022-1-007.
Contributor Information
Masaharu Tsuji, Email: spindletuber@gmail.com.
Jason E. Stajich, University of California, Riverside
REFERENCES
- 1. Tsuji M, Tsujimoto M, Imura S. 2017. Cystobasidium tubakii and Cystobasidium ongulense, new basidiomycetous yeast species isolated from East Ongul Island, East Antarctica. Mycoscience 58:103–110. doi: 10.1016/j.myc.2016.11.002.
- 2. Tsuji M. 2018. Genetic diversity of yeasts from East Ongul Island, East Antarctica and their extracellular enzymes secretion. Polar Biol 41:249–258. doi: 10.1007/s00300-017-2185-1.
- 3. Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 4. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. doi: 10.1101/gr.074492.107.
- 5. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. doi: 10.1093/bioinformatics/btv351.
- 6. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117:9451–9457. doi: 10.1073/pnas.1921046117.
- 7. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3:lqaa108. doi: 10.1093/nargab/lqaa108.
- 8. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. Methods Mol Biol 1962:65–95. doi: 10.1007/978-1-4939-9173-0_5.
- 9. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2016. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767–769. doi: 10.1093/bioinformatics/btv661.
- 10. Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644. doi: 10.1093/bioinformatics/btn013.
- 11. Stanke M, Schöffmann O, Morgenstern B, Waack S. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7:62. doi: 10.1186/1471-2105-7-62.
- 12. Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, Zdobnov EM. 2019. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res 47:D807–D811. doi: 10.1093/nar/gky1053.
- 13. Brůna T, Lomsadze A, Borodovsky M. 2020. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform 2:lqaa026. doi: 10.1093/nargab/lqaa026.
- 14. Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. 2021. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22:566. doi: 10.1186/s12859-021-04482-0.
- 15. Dainat J, Hereñú D, Davis E, Crouch K, LucileSol, pascal-git, tayyrov. 2021. NBISweden/AGAT: AGAT-v0.8.0. Zenodo doi: 10.5281/zenodo.3552717.
- 16. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031.
- 17. Käll L, Krogh A, Sonnhammer EL. 2004. A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338:1027–1036. doi: 10.1016/j.jmb.2004.03.016.
- 18. Blin K, Shaw S, Kloosterman AM, Charlop-Powers Z, van Wezel GP, Medema MH, Weber T. 2021. AntiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 49:W29–W35. doi: 10.1093/nar/gkab335.
- 19. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol Biol Evol 34:2115–2122. doi: 10.1093/molbev/msx148.
- 20. Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47:D309–D314. doi: 10.1093/nar/gky1085.
- 21. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60. doi: 10.1038/nmeth.3176.
