Draft Genome Sequences of Five Cystobasidium ongulense Strains Isolated from Areas near Syowa Station, East Antarctica

Masaharu Tsuji; Jun-ichi Ishihara; Yasuhiro Gotoh; Tetsuya Hayashi; Hiroki Takahashi

2022 Jun 13;11(7):e00224-22. doi: 10.1128/mra.00224-22

Editor: Jason E. Stajich

PMCID: PMC9302155 PMID: 35695498

ABSTRACT

Cystobasidium ongulense has been reported from East Ongul Island near Syowa Station, East Antarctica, as a new basidiomycetous yeast species. This species has cold-active lipases and cellulases that remain active even at subzero temperatures. We report draft genome sequences of five Cystobasidium ongulense strains isolated from East Antarctica.

ANNOUNCEMENT

Cystobasidium ongulense was isolated from East Ongul Island, East Antarctica, and reported as a new species of basidiomycetous yeast (1). Because this yeast can grow over a wide temperature range, from −3°C to 30°C, its genomic information is expected to be valuable as a new genetic resource (2).

This study analyzed the whole-genome sequences of two strains (9A-2 = JCM 31527 and 9A-5 = JCM 31528) isolated from East Ongul Island and three strains (051-20-1, 056-1-20Y-3, and 056-20-8) isolated from Inhovde, near Syowa Station.

Each strain was cultured in yeast extract-peptone-dextrose (YPD) liquid medium (Difco) at 15°C for 5 days. Cells were collected by centrifugation at 3,500 × g for 5 min at 4°C, washed with sterile distilled water, and pelleted again by centrifugation. The cell pellets were dried using a vacuum freeze dryer, and the lyophilized cells were ground to a powder in a mortar and used for DNA extraction. Genomic DNA of the five Cystobasidium ongulense strains was extracted and purified using a NucleoBond AXG100 column with a Nucleo buffer set III (TaKaRa Bio) according to the manufacturer’s protocol. The concentration and purity of the genomic DNA were determined using a NanoDrop spectrophotometer (Thermo Scientific) and the Qubit double-stranded DNA (dsDNA) broad-range (BR) assay kit (Thermo Scientific).

The genomic DNA was then fragmented using a microTUBE (Covaris), and the genomic library was constructed using the Illumina TruSeq DNA PCR-free library preparation kit (Illumina). The quality of the library was confirmed using the Quant-iT PicoGreen dsDNA assay kit, and the library size was checked using the Agilent 2200 TapeStation system (Agilent Technologies, Inc.). Paired-end sequencing was performed on a HiSeq 2500 instrument (Illumina).

Adapters and low-quality bases were trimmed using fastp ver. 0.12.4 (3). The nuclear genomes were assembled using VelvetOptimiser ver. 2.2.6 (4). The completeness of the assemblies was assessed using BUSCO ver. 5.2.2 (5) with the basidiomycota_odb10 database (1,764 markers). After repeat sequences were identified using RepeatModeler ver. 2.0.1 and RepeatMasker ver. 4.1.2 (6), annotation was performed using the BRAKER2 ver. 2.1.6 automated annotation pipeline (7–11) with the fungal protein data set from OrthoDB ver. 10 (12). BRAKER2 ab initio gene predictions were carried out using AUGUSTUS ver. 3.4.0 (10) and GeneMark-EP+ ver. 4.64 (13). The resulting BRAKER2 GTF file was parsed using TSEBRA (14) and AGAT ver. 0.8.0 (15). Finally, functional annotation was performed using InterProScan ver. 5 (16), Phobius ver. 1.01 (17), antiSMASH ver. 6.0 (18), and eggNOG-mapper ver. 2.1.3 (19) against the eggNOG database ver. 5.0.1 (20) using DIAMOND ver. 2.0.9 (21).
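The announcement does not report the command lines or parameters used for these tools. Purely as an illustration, the sketch below shows how the read-trimming and completeness-assessment steps might be invoked, by building fastp and BUSCO argument lists in Python. All file names, output names, and the thread count are hypothetical; the only value taken from the text is the basidiomycota_odb10 lineage data set. The flags are the tools' basic input/output options (fastp: -i, -I, -o, -O; BUSCO v5: -i, -l, -m, -o, -c), and the authors may have used other settings.

```python
import shlex


def fastp_command(r1, r2, out1, out2):
    """Build a fastp call for one paired-end library (default trimming settings)."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]


def busco_command(assembly, out_name, lineage="basidiomycota_odb10", cpus=4):
    """Build a BUSCO v5 call in genome mode against the given lineage data set."""
    return [
        "busco",
        "-i", assembly,
        "-l", lineage,
        "-m", "genome",
        "-o", out_name,
        "-c", str(cpus),
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command lines.
    trim = fastp_command("strain_R1.fastq.gz", "strain_R2.fastq.gz",
                         "strain_trimmed_R1.fastq.gz", "strain_trimmed_R2.fastq.gz")
    check = busco_command("strain_scaffolds.fasta", "strain_busco")
    print(shlex.join(trim))
    print(shlex.join(check))
```

The commands are built as lists rather than strings so that they could be passed to subprocess.run without shell quoting problems.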

A summary of the genomic analysis results for the five C. ongulense strains is shown in Table 1. Briefly, the genome size ranged from 19.9 to 20.1 Mb, the GC content from 49.1 to 49.2%, and the BUSCO score from 92.4 to 92.7%. The number of predicted protein-coding genes ranged from 7,183 to 7,250.
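The ranges quoted above can be reproduced directly from the per-strain values in Table 1. The short sketch below does this; the values are copied from the table in strain order (9A-2, 9A-5, 051-20-1, 056-1-20Y-3, 056-20-8), while the dictionary keys and the helper name are hypothetical.

```python
# Per-strain values copied from Table 1, in the column order of the table.
table1 = {
    "total_length_mb": [19.9, 19.9, 20.1, 20.0, 20.0],
    "gc_percent": [49.2, 49.2, 49.2, 49.1, 49.1],
    "busco_percent": [92.6, 92.6, 92.7, 92.6, 92.4],
    "proteins": [7190, 7183, 7243, 7227, 7250],
}


def value_range(values):
    """Return the (minimum, maximum) of a list of per-strain values."""
    return min(values), max(values)


ranges = {name: value_range(values) for name, values in table1.items()}

if __name__ == "__main__":
    for name, (low, high) in ranges.items():
        print(f"{name}: {low} to {high}")
```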

TABLE 1.

Assembly summary for draft genomic data of five Cystobasidium ongulense strains

	Data for strain:
Characteristic	9A-2 (=JCM 31527)	9A-5 (=JCM 31528)	051-20-1	056-1-20Y-3	056-20-8
Sampling site	East Ongul Island	East Ongul Island	Inhovde	Inhovde	Inhovde
No. of reads	8,304,954,096	6,269,707,240	7,317,973,400	7,758,637,304	7,164,512,704
No. of scaffolds	69	67	120	73	81
Total length (Mb)	19.9	19.9	20.1	20.0	20.0
Coverage (×)	392	297	337	361	330
GC content (%)	49.2	49.2	49.2	49.1	49.1
Scaffold N₅₀ value (kb)	2,125	2,126	1,057	1,432	1,222
No. of proteins	7,190	7,183	7,243	7,227	7,250
BUSCO score (%)	92.6	92.6	92.7	92.6	92.4
GenBank accession no.	BQUO00000000.1	BQUP00000000.1	BQUQ00000000.1	BQUR00000000.1	BQUS00000000.1
SRA accession no.	DRR118811, DRR118812	DRR118813, DRR118814	DRR118815, DRR118816	DRR118817, DRR118818	DRR118819, DRR118820
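For readers less familiar with the assembly metrics in Table 1, the following minimal sketch shows how scaffold N50 and GC content are commonly defined. It is not the authors' code, and assembly tools may differ in details such as how ambiguous bases are handled; the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length of the scaffold at which the running total,
    taken from the longest scaffold downward, first reaches half of the
    total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous (A, C, G, T) bases."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    # Toy scaffold lengths and sequence, not data from this study.
    print(n50([10, 8, 6, 4, 2]))
    print(gc_percent("GGCCAATT"))
```

A higher N50 for a similar total length indicates a more contiguous assembly, which is why it is reported alongside the scaffold count.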

We believe that the genomic information of C. ongulense will help us understand how Antarctic fungi adapt to the extreme environment.

Data availability.

This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under BioProject accession number PRJDB6568 with accession numbers BQUO00000000.1 (9A-2), BQUP00000000.1 (9A-5), BQUQ00000000.1 (051-20-1), BQUR00000000.1 (056-1-20Y-3), and BQUS00000000.1 (056-20-8). The raw reads are available under SRA accession numbers DRR118811, DRR118812 (9A-2), DRR118813, DRR118814 (9A-5), DRR118815, DRR118816 (051-20-1), DRR118817, DRR118818 (056-1-20Y-3), and DRR118819, DRR118820 (056-20-8).

ACKNOWLEDGMENTS

This work was supported by the National Institute of Polar Research (NIPR) research project KP-309, general collaboration project number 2-29. This work was also supported by a JSPS Grant-in-Aid for Young Scientists (A) (number 16H06211), JSPS KAKENHI grant number 16H06279 (PAGS), and Institution for Fermentation, Osaka, general research grant number G-2022-1-007.

Contributor Information

Masaharu Tsuji, Email: spindletuber@gmail.com.

Jason E. Stajich, University of California, Riverside

REFERENCES

1. Tsuji M, Tsujimoto M, Imura S. 2017. Cystobasidium tubakii and Cystobasidium ongulense, new basidiomycetous yeast species isolated from East Ongul Island, East Antarctica. Mycoscience 58:103–110. doi: 10.1016/j.myc.2016.11.002.
2. Tsuji M. 2018. Genetic diversity of yeasts from East Ongul Island, East Antarctica and their extracellular enzymes secretion. Polar Biol 41:249–258. doi: 10.1007/s00300-017-2185-1.
3. Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. doi: 10.1093/bioinformatics/bty560.
4. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. doi: 10.1101/gr.074492.107.
5. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. doi: 10.1093/bioinformatics/btv351.
6. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117:9451–9457. doi: 10.1073/pnas.1921046117.
7. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3:lqaa108. doi: 10.1093/nargab/lqaa108.
8. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. Methods Mol Biol 1962:65–95. doi: 10.1007/978-1-4939-9173-0_5.
9. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2016. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767–769. doi: 10.1093/bioinformatics/btv661.
10. Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644. doi: 10.1093/bioinformatics/btn013.
11. Stanke M, Schöffmann O, Morgenstern B, Waack S. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7:62. doi: 10.1186/1471-2105-7-62.
12. Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, Zdobnov EM. 2019. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res 47:D807–D811. doi: 10.1093/nar/gky1053.
13. Brůna T, Lomsadze A, Borodovsky M. 2020. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform 2:lqaa026. doi: 10.1093/nargab/lqaa026.
14. Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. 2021. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22:566. doi: 10.1186/s12859-021-04482-0.
15. Dainat J, Hereñú D, Davis E, Crouch K, LucileSol, pascal-git, tayyrov. 2021. NBISweden/AGAT: AGAT-v0.8.0. Zenodo doi: 10.5281/zenodo.3552717.
16. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031.
17. Käll L, Krogh A, Sonnhammer EL. 2004. A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338:1027–1036. doi: 10.1016/j.jmb.2004.03.016.
18. Blin K, Shaw S, Kloosterman AM, Charlop-Powers Z, van Wezel GP, Medema MH, Weber T. 2021. AntiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 49:W29–W35. doi: 10.1093/nar/gkab335.
19. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol Biol Evol 34:2115–2122. doi: 10.1093/molbev/msx148.
20. Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47:D309–D314. doi: 10.1093/nar/gky1085.
21. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60. doi: 10.1038/nmeth.3176.
