Microbiology Resource Announcements. 2021 May 27;10(21):e00195-21. doi: 10.1128/MRA.00195-21

Draft Genome Sequences of Five Fungal Strains Isolated from Kefir

Simonas Marcišauskas a, Yongkyu Kim b,*, Sonja Blasche b,*, Kiran R Patil b,*, Boyang Ji a,c, Jens Nielsen a,c
Editor: Antonis Rokas
PMCID: PMC8201632  PMID: 34042484

ABSTRACT

We present the annotated draft genome sequences of five fungal strains isolated from kefir grains. These isolates included three ascomycetous (Candida californica, Kazachstania exigua, and Kazachstania unispora) and one basidiomycetous (Rhodotorula mucilaginosa) species. The results revealed a detailed overview of the metabolic features of kefir fungi that will be potentially useful in biotechnological applications.

ANNOUNCEMENT

Kefir is fermented milk traditionally produced by a specific symbiotic culture of bacteria and fungi. Also known as kefir grains, this culture usually consists of 40 to 50 different species, including lactic acid bacteria, acetic acid bacteria, and yeasts (1). The ascomycetous yeast Kluyveromyces marxianus was previously identified in kefir grains (2), but little is known about other cooccurring fungi. Here, we report the annotated whole-genome sequences of the ascomycetous yeasts Candida californica, Kazachstania exigua, and Kazachstania unispora and the basidiomycetous fungus Rhodotorula mucilaginosa, isolated from kefir grains collected from private sources. These kefir grain cultures were collected in Germany (Ger04, C. californica and K. unispora; Ger06/OG2, K. exigua) and South Korea (Kefir Korea, R. mucilaginosa). C. californica SB-48 (strain designations refer to internal stock identifiers) was isolated from ground kefir grains and plated in serial dilutions onto yeast extract-peptone-dextrose-adenine (YPDA) medium. C. californica SB-116 was isolated and plated in serial dilutions onto Sabouraud dextrose (SD) medium. K. exigua SB-178 was isolated and plated in serial dilutions onto M17 medium supplemented with glucose. K. unispora SB-162 was isolated and plated in serial dilutions onto de Man, Rogosa, and Sharpe (MRS) agar-milk agar (1/1 mix of MRS agar and 3.5% ultrahigh-temperature processing [UHT] milk). R. mucilaginosa SB-353 was isolated and plated in serial dilutions onto tomato juice agar (TJA). All isolates were grown in their corresponding medium for up to 5 days at 30°C. Isolates were identified by PCR amplification of the internal transcribed spacer (ITS) region using the primers S-D-Bact-0515-a-S-16 (GTGCCAGCMGCNGCGG) and S-*-Univ-1392-a-A-15 (ACGGGCGGTGTGTRC) (3) and subsequent Sanger sequencing of the amplified region. ITS sequences were taxonomically assigned using an open-reference method. The kefir-isolated yeast was used as the reference, and subsequent naive Bayesian classification was performed using UNITE (4). Strains were deposited and are available in the Leibniz Institute DSMZ collection of microorganisms under the same strain names.
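The two primer sequences quoted above contain IUPAC degenerate bases (M, N, and R). As an illustration only, and not part of the authors' workflow, the following minimal Python sketch expands a degenerate primer into a regular expression and counts how many concrete sequences it represents. The function names `primer_to_regex` and `degeneracy` are hypothetical helpers introduced here.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every concrete variant."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def degeneracy(primer):
    """Number of distinct non-degenerate sequences encoded by the primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


# Primer sequences as quoted in the text.
FORWARD = "GTGCCAGCMGCNGCGG"
REVERSE = "ACGGGCGGTGTGTRC"

if __name__ == "__main__":
    print(FORWARD, degeneracy(FORWARD), primer_to_regex(FORWARD).pattern)
    print(REVERSE, degeneracy(REVERSE), primer_to_regex(REVERSE).pattern)
```

The forward primer has one two-fold (M) and one four-fold (N) position, and the reverse primer has a single two-fold (R) position, so the sketch reports 8 and 2 variants, respectively.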

Genomic DNA was extracted using a two-step approach combining enzymatic digestion with lysozyme, followed by bead beating with 0.3-g glass beads. The supernatant was then digested with proteinase K and subjected to phenol-chloroform extraction and DNA precipitation, as described in references 5 and 6. DNA was then prepared for sequencing using a Nextera DNA library preparation kit (Illumina) and sequenced on an Illumina HiSeq 2000 instrument to obtain 100-bp paired-end reads with insert sizes ranging between 250 bp and 300 bp. Read quality was checked with FastQC v0.11.9 (7), and Trimmomatic v0.36 (8) was used to trim adapters and low-quality bases from the reads (with the following parameter settings: ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). A separate removal step for other contaminants was not performed. The resulting reads were assembled with ABySS v2.02 (9) and SPAdes v3.9.0 (10). No correction steps were performed for the ABySS assemblies; however, mismatches and short indels were corrected for the SPAdes assemblies (enabled with the --careful flag). The genome assemblies obtained with different parameters were evaluated for contiguity using QUAST v4.1 (11) and for completeness with single-copy orthologs using BUSCO v3 (12). The lineage data sets in the benchmarking universal single-copy ortholog (BUSCO) analysis were saccharomycetes_odb9 (C. californica, K. exigua, and K. unispora) and basidiomycota_odb9 (R. mucilaginosa). For K. exigua and R. mucilaginosa, the best genome assemblies were obtained with ABySS, with the k-mer length (parameter k) set to 45 and 67, respectively. For the other three strains, SPAdes with default settings and the --careful flag produced the best assemblies. Contigs shorter than 500 bp were discarded.
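The final filtering step and the contiguity metric reported in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes contig lengths are already available as a list of integers, applies the 500-bp cutoff stated above, and computes N50 by its usual definition (the length of the contig at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly length). The function names and the toy lengths are hypothetical.

```python
def filter_contigs(lengths, min_len=500):
    """Keep contigs of at least min_len bp (shorter contigs are discarded)."""
    return [n for n in lengths if n >= min_len]


def n50(lengths):
    """Contig length at which the sorted cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running * 2 >= total:
            return n
    return 0  # empty assembly


if __name__ == "__main__":
    # Toy contig lengths in bp, not taken from the paper.
    toy = [1000, 800, 600, 400, 200]
    kept = filter_contigs(toy)
    print(kept, n50(kept))
```

Because N50 depends on which contigs are retained, applying the length cutoff before computing it changes the value; in the toy example the two shortest contigs are dropped first.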

Assemblies were annotated for repeat regions and soft masked with the RepeatModeler v1.0.11 (13) and RepeatMasker v4.0.7 (14) tools. The protein-encoding sequences (CDSs) and tRNAs were predicted with the funannotate predict function in funannotate v1.5.3 (15). The predicted genes were functionally annotated based on their protein sequences using the funannotate annotate function in funannotate v1.5.3 (15) against the MEROPS v12.0 (16), MIBiG v1.4 (17), Pfam v32.0 (18), dbCAN v7, and eggNOG v4.5.1 (19) databases. Transmembrane and secreted proteins were annotated using Phobius v1.0.1 (20) and SignalP v4.1 (21). Finally, secondary metabolite biosynthetic gene clusters were identified with antiSMASH v4.2.0 (22). Default parameters were used for all software unless otherwise specified.
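For readers unfamiliar with the term, soft masking marks repeat regions by converting them to lowercase rather than replacing them with N, so that the underlying sequence is preserved for downstream gene prediction. The sketch below illustrates the idea only; it is not how RepeatMasker is implemented, and the function name, the interval convention (0-based, end-exclusive), and the toy sequence are assumptions made for this example.

```python
def soft_mask(sequence, repeats):
    """Lowercase the bases inside each (start, end) repeat interval.

    Intervals are assumed to be 0-based and end-exclusive; bases outside
    all intervals are returned in uppercase.
    """
    chars = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    # Toy sequence and repeat interval, not taken from the paper.
    print(soft_mask("ACGTACGTAC", [(2, 5)]))
```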

Table 1 shows that the five newly isolated strains have genome sizes ranging from 12.02 Mb to 20.07 Mb and GC contents ranging from 28.6% to 60.6%.

TABLE 1. Accession numbers and characteristics of kefir fungal isolates

| Species | Strain | SRA accession no. | GenBank accession no. | No. of reads | Coverage (×) | Genome size (bp) | GC content (%) | No. of contigs | Contig N50 (bp) | No. of genes | Single-copy BUSCOs (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Candida californica | SB-48 | SRX9449769 | PUHW00000000 | 25,448,750 | 192 | 12,323,006 | 28.6 | 1,206 | 23,604 | 5,524 | 91.1^a |
| Candida californica | SB-116 | SRX9449771 | PUHU00000000 | 25,749,226 | 194 | 12,320,729 | 28.7 | 981 | 28,622 | 5,490 | 92.0^a |
| Kazachstania exigua | SB-178 | SRX9449774 | PUHR00000000 | 27,522,278 | 189 | 13,507,013 | 33.3 | 773 | 38,581 | 5,522 | 96.9^a |
| Kazachstania unispora | SB-162 | SRX9449773 | PUHS00000000 | 29,185,902 | 225 | 12,020,007 | 32.3 | 432 | 60,809 | 5,464 | 96.8^a |
| Rhodotorula mucilaginosa | SB-353 | SRX9449775 | PUHQ00000000 | 19,300,818 | 89 | 20,066,154 | 60.6 | 416 | 112,846 | 7,169 | 93.4^b |
^a The lineage data set was saccharomycetes_odb9.

^b The lineage data set was basidiomycota_odb9.
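The summary ranges quoted in the text can be recomputed from the Table 1 values, as in the minimal sketch below. It also includes a naive depth estimate (number of reads × read length / genome size), which is an assumption of this example: the source does not state how the coverage column was calculated or whether the read counts are before or after trimming, so the estimate is not expected to reproduce the reported values exactly. The 100-bp read length is taken from the sequencing description; the function names are hypothetical.

```python
# (strain, number of reads, genome size in bp, GC content in %) from Table 1.
TABLE_1 = [
    ("SB-48", 25_448_750, 12_323_006, 28.6),
    ("SB-116", 25_749_226, 12_320_729, 28.7),
    ("SB-178", 27_522_278, 13_507_013, 33.3),
    ("SB-162", 29_185_902, 12_020_007, 32.3),
    ("SB-353", 19_300_818, 20_066_154, 60.6),
]


def size_range_mb(rows):
    """Smallest and largest genome size in Mb, rounded to two decimals."""
    sizes = [size for _, _, size, _ in rows]
    return round(min(sizes) / 1e6, 2), round(max(sizes) / 1e6, 2)


def gc_range(rows):
    """Lowest and highest GC content (%)."""
    gcs = [gc for *_, gc in rows]
    return min(gcs), max(gcs)


def naive_coverage(n_reads, genome_size, read_len=100):
    """Rough depth estimate, assuming every read is read_len bp and is used."""
    return n_reads * read_len / genome_size


if __name__ == "__main__":
    print(size_range_mb(TABLE_1))
    print(gc_range(TABLE_1))
    for strain, n_reads, size, _ in TABLE_1:
        print(strain, round(naive_coverage(n_reads, size), 1))
```

The naive estimates come out somewhat higher than the reported coverage values, which would be consistent with reads or bases being removed during trimming, although the source does not confirm this.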

Data availability.

The raw reads have been deposited at the NCBI Sequence Read Archive (SRA), and the whole-genome shotgun projects have been deposited at DDBJ/ENA/GenBank. While all these data are available under BioProject number PRJNA435582, the individual SRA and GenBank accession numbers described in this report are included in Table 1. The GenBank versions described in this paper are the first versions (01).

ACKNOWLEDGMENTS

The DNA sequencing libraries were created and sequenced at the EMBL Genomics Core Facility.

This work was sponsored by the German Ministry of Education and Research (BMBF) (grant number 031A601B) as a part of the ERASysAPP project SysMilk.

Contributor Information

Jens Nielsen, Email: nielsenj@chalmers.se.

Antonis Rokas, Vanderbilt University.

REFERENCES

  1. Lopitz-Otsoa F, Rementeria A, Elguezabal N, Garaizar J. 2006. Kefir: a symbiotic yeasts-bacteria community with alleged healthy capabilities. Rev Iberoam Micol 23:67–74. doi: 10.1016/S1130-1406(06)70016-X.
  2. Simova E, Beshkova D, Angelov A, Hristozova T, Frengova G, Spasov Z. 2002. Lactic acid bacteria and yeasts in kefir grains and kefir made from them. J Ind Microbiol Biotechnol 28:1–6. doi: 10.1038/sj/jim/7000186.
  3. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, Glöckner FO. 2013. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res 41:e1. doi: 10.1093/nar/gks808.
  4. Nilsson RH, Larsson K-H, Taylor AFS, Bengtsson-Palme J, Jeppesen TS, Schigel D, Kennedy P, Picard K, Glöckner FO, Tedersoo L, Saar I, Kõljalg U, Abarenkov K. 2019. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Res 47:D259–D264. doi: 10.1093/nar/gky1022.
  5. Kowalczyk M, Kolakowski P, Radziwill-Bienkowska JM, Szmytkowska A, Bardowski J. 2012. Cascade cell lyses and DNA extraction for identification of genes and microorganisms in kefir grains. J Dairy Res 79:26–32. doi: 10.1017/S0022029911000677.
  6. Blasche S, Kim Y, Mars RAT, Machado D, Maansson M, Kafkia E, Milanese A, Zeller G, Teusink B, Nielsen J, Benes V, Neves R, Sauer U, Patil KR. 2021. Metabolic cooperation and spatiotemporal niche partitioning in a kefir microbial community. Nat Microbiol 6:196–208. doi: 10.1038/s41564-020-00816-5.
  7. Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  8. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  9. Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. 2017. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res 27:768–777. doi: 10.1101/gr.214346.116.
  10. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021.
  11. Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. doi: 10.1093/bioinformatics/btt086.
  12. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  13. Smit A, Hubley R. 2015. RepeatModeler Open-1.0. https://www.repeatmasker.org/RepeatModeler/.
  14. Smit A, Hubley R, Green P. 2015. RepeatMasker Open-4.0. https://www.repeatmasker.org/RepeatMasker/.
  15. Palmer J, Stajich J. 2019. nextgenusfs/funannotate: funannotate v1.5.3. doi: 10.5281/ZENODO.2604804.
  16. Rawlings ND, Barrett AJ, Thomas PD, Huang X, Bateman A, Finn RD. 2018. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 46:D624–D632. doi: 10.1093/nar/gkx1134.
  17. Medema MH, Kottmann R, Yilmaz P, Cummings M, Biggins JB, Blin K, de Bruijn I, Chooi YH, Claesen J, Coates RC, Cruz-Morales P, Duddela S, Düsterhus S, Edwards DJ, Fewer DP, Garg N, Geiger C, Gomez-Escribano JP, Greule A, Hadjithomas M, Haines AS, Helfrich EJN, Hillwig ML, Ishida K, Jones AC, Jones CS, Jungmann K, Kegler C, Kim HU, Kötter P, Krug D, Masschelein J, Melnik AV, Mantovani SM, Monroe EA, Moore M, Moss N, Nützmann H-W, Pan G, Pati A, Petras D, Reen FJ, Rosconi F, Rui Z, Tian Z, Tobias NJ, Tsunematsu Y, Wiemann P, Wyckoff E, Yan X, et al. 2015. Minimum Information about a Biosynthetic Gene cluster. Nat Chem Biol 11:625–631. doi: 10.1038/nchembio.1890.
  18. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. 2014. Pfam: the protein families database. Nucleic Acids Res 42:D222–D230. doi: 10.1093/nar/gkt1223.
  19. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol 34:2115–2122. doi: 10.1093/molbev/msx148.
  20. Kall L, Krogh A, Sonnhammer ELL. 2005. An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics 21:i251–i257. doi: 10.1093/bioinformatics/bti1014.
  21. Nielsen H. 2017. Predicting secretory proteins with SignalP. Methods Mol Biol 1611:59–73. doi: 10.1007/978-1-4939-7015-5_6.
  22. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. 2011. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res 39:W339–W346. doi: 10.1093/nar/gkr466.


