Data in Brief. 2022 Feb 5;41:107888. doi: 10.1016/j.dib.2022.107888

Chromosome-level genome sequence data and analysis of the white koji fungus, Aspergillus luchuensis mut. kawachii IFO 4308

Kazuki Mori a,b, Chihiro Kadooka c,d,#, Ken Oda e, Kayu Okutsu c, Yumiko Yoshizaki c,d, Kazunori Takamine c,d, Kosuke Tashiro a, Masatoshi Goto d,f, Hisanori Tamaki c,d, Taiki Futagami c,d
PMCID: PMC8847812  PMID: 35198670

Abstract

Aspergillus luchuensis mut. kawachii is used primarily in the production of shochu, a traditional Japanese distilled alcoholic beverage. Here, we report the chromosome-level genome sequence of A. luchuensis mut. kawachii IFO 4308 (NBRC 4308) and a comparison of the sequence with that of A. luchuensis RIB2601. The genome of strain IFO 4308 was assembled into nine contigs consisting of eight chromosomes and one mitochondrial DNA segment. The nearly complete genome of strain IFO 4308 comprises 37,287,730 bp with a GC content of 48.85% and contains 12,664 predicted coding sequences and 267 tRNA genes. Comparison of the IFO 4308 and RIB2601 genomes revealed a highly conserved structure; however, the IFO 4308 genome is larger than that of RIB2601, a difference attributed primarily to chromosome 5. The genome sequence of IFO 4308 was deposited in DDBJ/ENA/GenBank under accession numbers AP024425–AP024433.

Keywords: Aspergillus luchuensis mut. kawachii, white koji fungus, shochu, chromosome-level genome assembly

Specifications Table

Subject: Biological sciences
Specific subject area: Applied Microbiology, Genomics
Type of data: Genomic sequence; Table; Figure; Supplementary file
How the data were acquired: Whole-genome sequencing using the Illumina NovaSeq 6000 platform for short reads and the Oxford Nanopore Technologies MinION for long reads.
Data format: Raw; Assembled/analyzed
Description of data collection: The genomic DNA of strain IFO 4308 was isolated. Raw sequence reads were generated using Illumina NovaSeq 6000 (short reads) and Oxford Nanopore Technologies MinION (long reads). The data were filtered, de novo assembled, and annotated using the Funannotate pipeline and MFannot.
Data source location: Institution: Kagoshima University; City/Town/Region: Kagoshima; Country: Japan
Data accessibility: The nucleotide sequence of IFO 4308 was deposited in DDBJ/ENA/GenBank under the accession numbers AP024425 (https://www.ncbi.nlm.nih.gov/nuccore/AP024425), AP024426 (https://www.ncbi.nlm.nih.gov/nuccore/AP024426), AP024427 (https://www.ncbi.nlm.nih.gov/nuccore/AP024427), AP024428 (https://www.ncbi.nlm.nih.gov/nuccore/AP024428), AP024429 (https://www.ncbi.nlm.nih.gov/nuccore/AP024429), AP024430 (https://www.ncbi.nlm.nih.gov/nuccore/AP024430), AP024431 (https://www.ncbi.nlm.nih.gov/nuccore/AP024431), AP024432 (https://www.ncbi.nlm.nih.gov/nuccore/AP024432), and AP024433 (https://www.ncbi.nlm.nih.gov/nuccore/AP024433). The nucleotide sequence of IFO 4308 was also deposited in the Comprehensive Aspergillus oryzae Genome Database (CAoGD) hosted by the National Research Institute of Brewing, Japan (https://nribf21.nrib.go.jp/CAoGD/). Raw sequence reads were deposited in the SRA under accession numbers DRX251718 (https://www.ncbi.nlm.nih.gov/sra/DRX251718) and DRX251719 (https://www.ncbi.nlm.nih.gov/sra/DRX251719).

Value of the Data

  • The white koji fungus, Aspergillus luchuensis mut. kawachii, is used in the production of the traditional Japanese distilled spirit shochu.

  • The chromosome-level genome sequence of the white koji fungus can assist shochu brewers and researchers studying koji fungi.

  • These data are useful for comparative genomics studies of koji fungi, providing further insights into the genetic background of the white koji fungus that makes it superior for use in shochu production.

1. Data Description

The white koji fungus, Aspergillus luchuensis mut. kawachii, is primarily used to produce shochu, a traditional distilled alcoholic beverage indigenous to Japan [1], [2], [3]. The white koji fungus plays an important role in supplying amylolytic enzymes that decompose starch in shochu ingredients, such as rice, barley, buckwheat, and sweet potato. The fungus also secretes large amounts of citric acid, which prevents the growth of contaminating microbes during the fermentation process. We previously reported the genome sequence of A. luchuensis mut. kawachii IFO 4308 (NBRC 4308) [4]. In addition, genome sequences of four other white koji fungi have recently been reported [5]. However, as these sequences were incomplete draft genome assemblies, we conducted a chromosome-level genome analysis of strain IFO 4308.

The nearly complete genome of strain IFO 4308 comprises 37,287,730 bp with a GC content of 48.85% and contains 12,664 predicted coding sequences and 267 tRNA genes. Quality assessment identified 97.7% complete and single-copy, 0.2% complete and duplicated, 0.9% fragmented, and 1.2% missing Benchmarking Universal Single-Copy Orthologs (BUSCOs) [6]. We confirmed that most of the missing BUSCOs were actually present in the genome of IFO 4308; the discrepancy was attributed to technical limitations in gene prediction [6]. Details regarding the chromosomes present in strain IFO 4308 are summarized in Table 1.
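The four BUSCO categories partition the ortholog set, so the reported percentages should sum to 100%, and the overall "complete" fraction is the sum of the single-copy and duplicated classes. A minimal Python check follows, using only the percentages quoted in the text; the variable names are illustrative and not part of any BUSCO output format.

```python
# BUSCO category percentages as reported in the text for strain IFO 4308.
busco = {
    "complete_single": 97.7,
    "complete_duplicated": 0.2,
    "fragmented": 0.9,
    "missing": 1.2,
}

# Total "complete" BUSCOs = single-copy + duplicated.
complete_total = round(
    busco["complete_single"] + busco["complete_duplicated"], 1
)

# The four categories partition the ortholog set, so they should sum to 100%.
category_sum = round(sum(busco.values()), 1)

print(f"Complete BUSCOs: {complete_total}%")
print(f"Sum of categories: {category_sum}%")
```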

Table 1.

Chromosomes of A. luchuensis mut. kawachii strain IFO 4308

| Location^a | Accession no. | Size (Mb) | GC (%) | No. of CDS^b | No. of rRNA^c | No. of tRNA |
|---|---|---|---|---|---|---|
| Chr. 1 | AP024425.1 | 6.19 | 49.5 | 2,106 | NA | 47 |
| Chr. 2 | AP024426.1 | 4.96 | 48.9 | 1,621 | NA | 34 |
| Chr. 3 | AP024427.1 | 4.83 | 49.3 | 1,636 | NA | 27 |
| Chr. 4 | AP024428.1 | 3.79 | 49.2 | 1,341 | 15 (72)^d | 17 |
| Chr. 5 | AP024429.1 | 6.27 | 48.4 | 2,077 | NA | 30 |
| Chr. 6 | AP024430.1 | 3.97 | 48.7 | 1,386 | NA | 37 |
| Chr. 7 | AP024431.1 | 3.19 | 48.2 | 1,077 | NA | 12 |
| Chr. 8 | AP024432.1 | 4.05 | 48.7 | 1,405 | NA | 37 |
| MT | AP024433.1 | 0.03 | 26.4 | 15 | 1 | 26 |

^a Chr, chromosome; MT, mitochondria.

^b CDS, coding DNA sequences.

^c NA, not applicable.

^d The number of rRNA genes is not clear due to their highly repetitive structure. The number in parentheses indicates the estimated copy number based on the median per-base coverage.
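The per-chromosome values in Table 1 can be cross-checked against the genome-wide totals given in the text (12,664 coding sequences, 267 tRNAs, about 37.29 Mb). The sketch below simply transcribes the table rows into a Python list and sums them; the tuple layout is an illustrative choice, and the summed size differs slightly from the exact base-pair total because the table values are rounded to 0.01 Mb.

```python
# Rows transcribed from Table 1: (location, size in Mb, CDS count, tRNA count).
table1 = [
    ("Chr. 1", 6.19, 2106, 47),
    ("Chr. 2", 4.96, 1621, 34),
    ("Chr. 3", 4.83, 1636, 27),
    ("Chr. 4", 3.79, 1341, 17),
    ("Chr. 5", 6.27, 2077, 30),
    ("Chr. 6", 3.97, 1386, 37),
    ("Chr. 7", 3.19, 1077, 12),
    ("Chr. 8", 4.05, 1405, 37),
    ("MT", 0.03, 15, 26),
]

# Sum each column across the eight chromosomes and the mitochondrial DNA.
total_size_mb = round(sum(row[1] for row in table1), 2)
total_cds = sum(row[2] for row in table1)
total_trna = sum(row[3] for row in table1)

print(f"Total size: {total_size_mb} Mb")
print(f"Total CDS:  {total_cds}")
print(f"Total tRNA: {total_trna}")
```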

Aspergillus luchuensis mut. kawachii is an albino mutant of a particular A. luchuensis black koji fungus; however, the parent strain of IFO 4308 remains unknown [1], [2], [3], [7]. Determination of the nearly complete genome sequence of IFO 4308 enabled us to compare its genomic structure with that of A. luchuensis RIB2601, the nearly complete genome of which was sequenced previously [8]. The genome of strain RIB2601 is 35,508,746 bp in size [8], which is smaller than that of strain IFO 4308. Genome comparison indicated a high degree of conservation in the genome structures of strains IFO 4308 and RIB2601, with the larger genome of IFO 4308 primarily attributed to chromosome 5 (Fig. 1). Differences in the genomes could have resulted from transposable elements, such as retrotransposons, because putative reverse transcriptase–encoding genes and long interspersed nuclear elements (LINEs) have been identified in the region specific to IFO 4308 (indicated by triangles and lines in Fig. 1).
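To put the size difference in concrete terms, the two genome sizes quoted above can be subtracted directly. This is plain arithmetic on the reported values, not an analysis from the original study; the variable names are illustrative.

```python
# Genome sizes in base pairs, as reported in the text.
ifo4308_bp = 37_287_730   # A. luchuensis mut. kawachii IFO 4308
rib2601_bp = 35_508_746   # A. luchuensis RIB2601 [8]

# Absolute difference and the increase relative to RIB2601.
difference_bp = ifo4308_bp - rib2601_bp
relative_increase = difference_bp / rib2601_bp * 100

print(f"IFO 4308 is larger by {difference_bp:,} bp "
      f"({relative_increase:.1f}% of the RIB2601 genome)")
```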

Fig. 1.

Comparison of the genome structures of A. luchuensis mut. kawachii strain IFO 4308 and A. luchuensis strain RIB2601. The figure was created based on supplementary files. Triangles indicate the locations of genes annotated as reverse transcriptase, whereas lines indicate the locations of repetitive elements annotated as LINEs. Chr, chromosome.

2. Experimental Design, Materials and Methods

2.1. Sequencing and assembly

Strain IFO 4308 was grown in yeast extract-peptone-dextrose medium (2% [wt/vol] glucose, 1% [wt/vol] yeast extract, and 2% [wt/vol] peptone). After cultivation at 30 °C with shaking at 163 rpm for 24 h, mycelia were harvested by filtration. The harvested mycelia were freeze-dried and ground into powder using a mortar and pestle. DNA was extracted from the mycelial powder using DNAs-ici!-F DNA extraction reagent (Rizo, Inc., Tsukuba, Japan).

The genome of strain IFO 4308 was sequenced using a hybrid approach combining Oxford Nanopore Technologies (ONT) MinION and Illumina NovaSeq 6000 reads. ONT long reads were used for de novo assembly, whereas the Illumina short reads were used for error correction. The genomic library for ONT sequencing was prepared using a Ligation Sequencing Kit (SQK-LSK109) and sequenced on a MinION with an R9.4.1 flow cell. Adapter sequences were trimmed using Porechop v0.2.4, and chimeric reads were removed using Yacrd v0.6.1, yielding 1,664,000 ONT reads (mean length, 7,354 bp). The genomic library for Illumina sequencing was prepared using a NEBNext Ultra II DNA Library Prep Kit (E7645) and sequenced on the NovaSeq 6000 using a paired-end strategy. The Illumina reads were filtered using Fastp v0.20.1 with default parameters, yielding 42,205,278 reads (mean length, 150 bp). The ONT and Illumina reads provided 328× and 169× sequence coverage, respectively.

De novo assembly of the ONT reads was performed using Canu v2.0 [9], and the initial assembly and the trimmed and corrected ONT reads were then reassembled using Flye v2.8-b1674 [10]. Next, several contigs were bridged by contigs generated using MaSuRCA v3.4.2 [11]. The best assembly was selected on the basis of telomere-to-telomere chromosome contiguity. Assemblies were polished using medaka v1.0.3 [12] and pilon v1.23 [13] for ONT reads and pilon v1.23 [13] for Illumina reads. The resulting assembly consisted of nine contigs corresponding to eight chromosomes and one mitochondrial DNA segment. Chromosomes 2, 3, 5, 6, 7, and 8 were generated using only Canu and Flye, whereas chromosomes 1 and 4 were generated by bridging two contigs with a MaSuRCA contig.
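The reported coverage values can be approximated from the read counts, mean read lengths, and genome size given above, using coverage = (number of reads × mean read length) / genome size. The sketch below is an illustration under that assumption; the text does not state how the authors computed coverage, and `fold_coverage` is a hypothetical helper name.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Approximate fold coverage as total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


# Assembled genome size of strain IFO 4308 in bp, as reported in the text.
GENOME_SIZE = 37_287_730

# Read counts and mean lengths after filtering, as reported in the text.
ont_coverage = fold_coverage(1_664_000, 7_354, GENOME_SIZE)
illumina_coverage = fold_coverage(42_205_278, 150, GENOME_SIZE)

print(f"ONT coverage:      {ont_coverage:.1f}x")
print(f"Illumina coverage: {illumina_coverage:.1f}x")
```

The ONT estimate rounds to the reported 328×. The Illumina estimate comes out slightly below 170×, which is consistent with the reported 169× if the mean read length after filtering is marginally shorter than 150 bp or if the value was truncated rather than rounded.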

2.2. Gene prediction and analysis

The obtained chromosomes and mitochondrial DNA were annotated using the Funannotate v1.8.1 pipeline [14] and MFannot v1.1 [15], respectively. For the Funannotate analysis, the RNA-sequencing (RNA-seq) data for strain IFO 4308 [16] (Sequence Read Archive [SRA] accession numbers SRX9800147 [https://www.ncbi.nlm.nih.gov/sra/SRX9800147] through SRX9800149 [https://www.ncbi.nlm.nih.gov/sra/SRX9800149]) were also used for gene prediction. RNA-seq reads were assembled using Trinity v2.8.5 [17] and mapped using HISAT2 v2.2.0 [18], and gene predictions were updated using PASA v2.4.1. Gene products were annotated based on sequence similarity to the dbCAN2 v9.0 (based on CAZy database v7/30/2020), MEROPS v12.0, MIBiG v1.4, Pfam v33.1, and UniProt v2020_05 databases, together with antiSMASH v5.1.2, Barrnap v0.9, eggNOG-mapper v1.0.3 (for the EggNOG v4.5 database), InterProScan v5.47-82.0, Phobius v1.01, SignalP v4.1, and tRNAscan-SE v2.0.7. Repetitive elements were identified using RepeatMasker v4.1.0 with the Dfam_3.1 and RepBase-20170127 databases [19]. Data from RepeatMasker are provided as supplementary files. Genome assembly and annotation completeness were assessed using BUSCO v5.1.2 with the ascomycota_odb10 (2020-09-10) data set [6]. The genome structures of strains IFO 4308 and RIB2601 were compared using Minimap2 v2.17 [20].
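Minimap2 writes alignments in the tab-separated PAF format by default, in which column 1 is the query name, column 6 the target name, and column 11 the alignment block length. As a hedged illustration of how such output could be summarized per chromosome pair, the sketch below sums block lengths for each query/target combination. The text does not say which output format or downstream processing the authors used, and the function name, sequence names, and records in the example are hypothetical.

```python
from collections import defaultdict


def aligned_bases_by_pair(paf_lines):
    """Sum PAF alignment block lengths per (query, target) sequence pair."""
    totals = defaultdict(int)
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query_name = fields[0]       # PAF column 1: query sequence name
        target_name = fields[5]      # PAF column 6: target sequence name
        block_len = int(fields[10])  # PAF column 11: alignment block length
        totals[(query_name, target_name)] += block_len
    return dict(totals)


# Hypothetical PAF records for illustration only (not real alignment data).
example_paf = [
    "IFO4308_chr5\t6270000\t0\t1000\t+\tRIB2601_chr5\t5000000\t0\t1000\t990\t1000\t60",
    "IFO4308_chr5\t6270000\t2000\t2500\t+\tRIB2601_chr5\t5000000\t1000\t1500\t495\t500\t60",
    "IFO4308_chr1\t6190000\t0\t800\t-\tRIB2601_chr1\t6000000\t0\t800\t800\t800\t60",
]

summary = aligned_bases_by_pair(example_paf)
for (query, target), bases in sorted(summary.items()):
    print(f"{query} vs {target}: {bases} aligned bases")
```

A region present in one genome but absent from the other, such as the IFO 4308-specific segment of chromosome 5, would show up in this kind of summary as query coordinates with no corresponding alignment records.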

CRediT Author Statement

Kazuki Mori: Conceptualization, Investigation, Writing - Reviewing and Editing; Chihiro Kadooka: Investigation, Writing - Reviewing and Editing; Ken Oda: Data curation, Visualization, Writing - Reviewing and Editing; Kayu Okutsu: Writing - Reviewing and Editing; Yumiko Yoshizaki: Writing - Reviewing and Editing; Kazunori Takamine: Writing - Reviewing and Editing; Kosuke Tashiro: Writing - Reviewing and Editing; Masatoshi Goto: Data curation, Writing - Reviewing and Editing; Hisanori Tamaki: Writing - Reviewing and Editing; Taiki Futagami: Supervision, Funding acquisition, Writing - Original draft preparation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by a JSPS KAKENHI grant (grant number 19K05773), the Novozymes Japan Research Fund, and the Nagase Science and Technology Foundation (to T.F.). C.K. was supported by a Grant-in-Aid for JSPS Research Fellows (grant number 17J02753).

Footnotes

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2022.107888.

Appendix. Supplementary materials

mmc1.zip (306.9KB, zip)
mmc2.zip (297.2KB, zip)

References

  • 1. Yamada O., Takara R., Hamada R., Hayashi R., Tsukahara M., Mikami S. Molecular biological researches of Kuro-Koji molds, their classification and safety. J. Biosci. Bioeng. 2011;112:233–237. doi: 10.1016/j.jbiosc.2011.05.005.
  • 2. Hong S.B., Lee M., Kim D.H., Varga J., Frisvad J.C., Perrone G., Gomi K., Yamada O., Machida M., Houbraken J., Samson R.A. Aspergillus luchuensis, an industrially important black Aspergillus in East Asia. PLoS One. 2013;8:e63769. doi: 10.1371/journal.pone.0063769.
  • 3. Hong S.B., Yamada O., Samson R.A. Taxonomic re-evaluation of black koji molds. Appl. Microbiol. Biotechnol. 2014;98:555–561. doi: 10.1007/s00253-013-5332-9.
  • 4. Futagami T., Mori K., Yamashita A., Wada S., Kajiwara Y., Takashita H., Omori T., Takegawa K., Tashiro K., Kuhara S., Goto M. Genome sequence of the white koji mold Aspergillus kawachii IFO 4308, used for brewing the Japanese distilled spirit shochu. Eukaryot. Cell. 2011;10:1586–1587. doi: 10.1128/EC.05224-11.
  • 5. Yamamoto N., Watarai N., Koyano H., Sawada K., Toyoda A., Kurokawa K., Yamada T. Analysis of genomic characteristics and their influence on metabolism in Aspergillus luchuensis albino mutants using genome sequencing. Fungal Genet. Biol. 2021;155. doi: 10.1016/j.fgb.2021.103601.
  • 6. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 7. Kitahara K., Yoshida M. On the so-called Awamori white mold part III. (1) Morphological and several physiological characteristics. J. Ferment. Technol. 1949;27:162–166.
  • 8. Mori K., Kadooka C., Nishitani A., Okutsu K., Yoshizaki Y., Takamine K., Tashiro K., Goto M., Tamaki H., Futagami T. Chromosome-level genome sequences of the black koji fungus Aspergillus luchuensis RIB2601. Microbiol. Resour. Announc. 2021;10:e00384-21. doi: 10.1128/MRA.00384-21.
  • 9. Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–736. doi: 10.1101/gr.215087.116.
  • 10. Kolmogorov M., Yuan J., Lin Y., Pevzner P.A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
  • 11. Zimin A.V., Marçais G., Puiu D., Roberts M., Salzberg S.L., Yorke J.A. The MaSuRCA genome assembler. Bioinformatics. 2013;29:2669–2677. doi: 10.1093/bioinformatics/btt476.
  • 12. Oxford Nanopore Technologies, Medaka (2018) https://nanoporetech.github.io/medaka/.
  • 13. Walker B.J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., Cuomo C.A., Zeng Q., Wortman J., Young S.K., Earl A.M. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9. doi: 10.1371/journal.pone.0112963.
  • 14. Palmer J.M., Stajich J.E. Funannotate v1.8.1: Eukaryotic genome annotation. Zenodo. 2020. doi: 10.5281/zenodo.4054262.
  • 15. N. Beck, B.F. Lang, MFannot (2010) http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl.
  • 16. Kadooka C., Izumitsu K., Asai T., Mori K., Okutsu K., Yoshizaki Y., Takamine K., Goto M., Tamaki H., Futagami T. Overexpression of the RNA-binding protein NrdA affects global gene expression and secondary metabolism in Aspergillus species. bioRxiv. 2021. doi: 10.1101/2021.03.15.435561.
  • 17. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Adiconis X., Fan L., Raychowdhury R., Zeng Q., Chen Z., Mauceli E., Hacohen N., Gnirke A., Rhind N., di Palma F., Birren B.W., Nusbaum C., Lindblad-Toh K., Friedman N., Regev A. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 2011;29:644. doi: 10.1038/nbt.1883.
  • 18. Kim D., Paggi J.M., Park C., Bennett C., Salzberg S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
  • 19. A. Smit, R. Hubley, G. Glusma, RepeatMasker, (2021) http://www.repeatmasker.org.
  • 20. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.

Articles from Data in Brief are provided here courtesy of Elsevier
