Abstract
Here, we report the draft genome sequence and assembly of Penicillium sp. strain E22, which was isolated from Antarctic soil on Deception Island, South Shetland Islands, close to the Antarctic Peninsula. The genome was sequenced using a 2 × 250 bp paired-end method on an Illumina MiSeq 6000. The genome assembly was performed using software implemented in the KBase web service. A phylogenetic tree comparing the internal transcribed spacer (ITS) region of strain E22 with that of other Penicillium species showed high genetic similarity to Penicillium griseofulvum MN545450 and Penicillium camemberti MT530220. The draft genome of Penicillium sp. strain E22 comprises 33,653 coding sequences, with a high G + C content of 48.32% and a total size of 37,484,944 bp. This draft genome assembly version has been deposited at GenBank under accession JASJUN000000000.
Keywords: Penicillium; Antarctica; Deception Island; Draft genome sequencing
Specifications Table
Subject | Microbiology • Fungal Biology |
Specific subject area | The genome was sequenced on an Illumina MiSeq 6000. De novo assembly: SPAdes Genome Assembler software (v3.15.3). Annotation: DRAM (Distilled and Refined Annotation of Metabolism) software (v0.1.2), as implemented in the KBase web service. |
Data format | Raw, Analyzed, Filtered and deposited |
Type of data | Table, Figure |
Data collection | Genomic DNA was purified from a pure culture of Penicillium sp. strain E22 isolated from Antarctic soil. The sequencing library was generated using the Nextera® XT DNA sample preparation kit for Illumina. Illumina MiSeq PE250 was used for whole genome sequencing. |
Data source location | The strain E22 was isolated from the soil of Deception Island (S 62° 55′ 58.1″ W 60° 35′ 26.8″), Antarctica. |
Data accessibility | Data are deposited at the NCBI GenBank https://www.ncbi.nlm.nih.gov/bioproject/PRJNA970415 https://www.ncbi.nlm.nih.gov/sra/SRR24472943 |
1. Value of the Data
• The availability of the draft genome assembly for Penicillium sp. strain E22 provides significant benefits for microbial taxonomy and ecological studies, especially in terms of identifying and mapping species distribution.
• The information presented in this article may benefit researchers working in environmental microbiology, environmental biotechnology, extremophiles and genomics.
• The genomic data of Penicillium sp. strain E22 contained in this report could be a valuable asset for scientists who wish to conduct comparative genomic analyses across different strains and environments.
2. Background
The genus Penicillium comprises the most extensively distributed fungi, which are found in both outdoor and indoor environments, including food, water, plants, and soils. Presently, there are 354 accepted species in this genus, and many of them can produce a wide range of natural products and enzymes, including amylases, glucoamylase, cellulase, proteases, and xylanase [1], [2], [3].
2.1. Data description
The data presented here represent the genome sequencing, assembly, and annotation of the Antarctic Penicillium strain E22, isolated from Deception Island soil. Illumina sequencing yielded 874.33 million paired-end reads. The N50 contig length was 53.5 kb (53,495 bp; Table 1) with an average coverage of 24×. The resulting draft genome was 37,484,944 bp in size with a G+C content of 48.32%. Gene prediction analysis using the kb_DRAM web-based app in KBase (v.0.1.2) [4] resulted in 33,653 protein-coding genes (Table 1).
Table 1.
QUAST report and genome features for Penicillium sp. strain E22 assembly.
Statistics without reference | Penicillium sp. strain E22 |
---|---|
# contigs | 2,704 |
# contigs (>= 0 bp) | 2,705 |
# contigs (>= 1000 bp) | 1,846 |
# contigs (>= 10000 bp) | 769 |
# contigs (>= 100000 bp) | 70 |
# contigs (>= 1000000 bp) | 0 |
Largest contig | 298,884 |
Total length | 37,484,944 |
Total length (>= 0 bp) | 37,485,436 |
Total length (>= 1000 bp) | 36,903,868 |
Total length (>= 10000 bp) | 33,127,976 |
Total length (>= 100000 bp) | 9,345,970 |
Total length (>= 1000000 bp) | 0 |
N50 | 53,495 |
N75 | 24,743 |
L50 | 203 |
L75 | 460 |
GC (%) | 48.32 |
Mismatches | |
# N's | 2,597 |
# N's per 100 kbp | 6.93 |
Genome features | |
Total coding sequences | 33,653 |
tRNA genes | 198 |
rRNA genes | 50 |
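For readers less familiar with the contiguity statistics in Table 1, the following minimal Python sketch shows how N50/L50 (and N75/L75) are conventionally defined. It is an illustration only, not the QUAST implementation; the function name `nx_lx` and the toy contig lengths are hypothetical and are not taken from the E22 assembly.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running total of
    length-sorted contigs (longest first) first reaches `fraction`
    of the total assembly length; Lx is the number of contigs
    needed to get there.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


if __name__ == "__main__":
    # Hypothetical toy assembly of five contigs (total 300 bp).
    toy_contigs = [100, 80, 60, 40, 20]
    print(nx_lx(toy_contigs, 0.50))  # (80, 2)
    print(nx_lx(toy_contigs, 0.75))  # (60, 3)
```

Applied to the real contig lengths, the same definition would be expected to give the N50, N75, L50 and L75 rows of Table 1, provided the same minimum contig length filter is used as in the QUAST report.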
Based on the comparison of the internal transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear ribosomal DNA of the isolate with that of other strains, it had the closest genetic similarity to Penicillium griseofulvum MN545450 and Penicillium camemberti MT530220, with 99.15% identity to both species (Fig. 1). Functional gene annotation of the draft genome predicted about 3,253 genes using KEGG. The carbohydrate-active enzyme analysis showed that Penicillium sp. strain E22 was dominated by the families AA1, AA3, GH13, GT2, GH16, GH43 and GH5. Different types of gene clusters that may be involved in the formation of secondary metabolites were found: T1PKS, NRPS, NRPS-like, fungal-RiPP-like, NI-siderophore, betalactone, indole, terpene and several hybrids (NRPS,T1PKS; NRPS,indole; NRPS-like,T1PKS; NRPS,fungal-RiPP-like; NRP-metallophore-NRPS and T1PKS,indole,NRPS-like,terpene). This whole genome project has been deposited at NCBI GenBank under the BioProject, BioSample and SRA accession numbers PRJNA970415, SAMN35003752 and SRR24472943, respectively. The assembly version described in this paper is version JASJUN000000000.
Fig. 1.
Phylogenetic tree of ITS region sequences inferred by the maximum-likelihood method. Numbers above branches indicate bootstrap values from 1000 replicates. The 44 Penicillium sequences used are presented with GenBank accession numbers followed by strain names.
3. Experimental Design, Materials and Methods
3.1. Genome DNA extraction and sequencing
Penicillium sp. strain E22 was isolated from Deception Island (62° 55′ 58.1″ S 60° 35′ 26.8″ W), Antarctica. Strain E22 was routinely cultivated in Yeast Malt Extract Agar medium at 28°C for 3 days. TRIzol™ (Invitrogen™, USA) was used for genomic DNA extraction. The genomic library of strain E22 was generated using the Nextera® XT DNA sample preparation kit according to the manufacturer's instructions. Whole genome sequencing was then performed using an Illumina MiSeq PE250 at the Biotechnology Research Institute, Universiti Malaysia Sabah.
3.2. Species identification
The ITS region was amplified using the universal primer set ITS1 (forward primer) 5′-TCCGTAGGTGAACCTGCGG-3′ and ITS4 (reverse primer) 5′-TCCTCCGCTTATTGATATGC-3′. The PCR product was sequenced bi-directionally. The sequence was analyzed by BLAST against the NCBI database. A maximum-likelihood phylogenetic tree based on ITS rRNA gene sequences was constructed (879 base pair alignment positions including gaps; substitution model: HKY85; gamma shape parameter: 0.467; transition/transversion ratio: 3.761; number of categories: 4; proportion of invariant sites: 0.740), showing the relationship between strain E22 and the 43 most closely related reference species. The alignment, substitution model and construction of the phylogenetic tree were performed using the online Phylogeny.fr tool [4].
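As a quick sanity check on the primer pair, the short Python sketch below computes the length, G+C percentage and reverse complement of the two primer sequences quoted above. It is only an illustration: the helper names `reverse_complement` and `gc_percent` are hypothetical, and no such check is described in the original workflow.

```python
# Primer sequences as given in the text (written 5' to 3').
ITS1 = "TCCGTAGGTGAACCTGCGG"
ITS4 = "TCCTCCGCTTATTGATATGC"

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for name, primer in (("ITS1", ITS1), ("ITS4", ITS4)):
        print(f"{name}: {len(primer)} nt, GC = {gc_percent(primer):.1f}%")
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is the motif expected at the 3' end of the amplicon.
    print("ITS4 reverse complement:", reverse_complement(ITS4))
```

The reverse complement of ITS4 is the string one would look for near the end of a forward-strand ITS sequence when trimming primer regions before alignment.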
3.3. Reads pre-processing, genome assembly, quality assessment, and annotation
The raw reads were pre-processed using the Trimmomatic (v1.2.14) tool to trim low-quality bases and remove short reads (minimum length = 36), then assembled using SPAdes Genome Assembler software (v3.15.3). Assembly quality was assessed with QUAST (QUality ASsessment Tool, v4.4), and annotation was performed with DRAM (Distilled and Refined Annotation of Metabolism) software (v0.1.2). All software used was implemented in the KBase web service [5].
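The effect of the pre-processing step can be illustrated with a deliberately simplified Python sketch: low-quality bases are trimmed from the 3′ end of a read, and the read is discarded if fewer than 36 bases remain. This is not Trimmomatic's algorithm or its command-line interface. The minimum length of 36 comes from the text, whereas the quality threshold of 20 and the function names are assumptions made purely for illustration.

```python
MIN_LENGTH = 36          # minimum read length stated in the text
QUALITY_THRESHOLD = 20   # ASSUMPTION: not reported in the article


def trim_trailing(seq, quals, threshold=QUALITY_THRESHOLD):
    """Remove bases from the 3' end while their Phred quality is below threshold."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return seq[:end], quals[:end]


def passes_length_filter(seq, min_length=MIN_LENGTH):
    """Return True if the (trimmed) read is long enough to keep."""
    return len(seq) >= min_length


if __name__ == "__main__":
    # Hypothetical 40-base read whose last five bases are low quality.
    read = "A" * 40
    quals = [30] * 35 + [5] * 5
    trimmed, trimmed_quals = trim_trailing(read, quals)
    print(len(trimmed), passes_length_filter(trimmed))  # 35 False
```

In this toy case the read drops to 35 bases after trimming and is therefore removed, which is the kind of short fragment the minimum-length setting is meant to exclude before assembly.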
Limitations
Not applicable.
Ethics Statement
This work neither involves human subjects nor animal subjects. The authors declare that this manuscript is original work and has not been published elsewhere.
CRediT authorship contribution statement
Teoh Chui Peng: Formal analysis, Writing – original draft. Paris Lavin: Conceptualization, Supervision, Methodology, Writing – review & editing. Rómulo Oses Pedraza: Writing – review & editing. Natalia Fierro-Vásquez: Formal analysis, Writing – original draft. Cristina Purcarea: Writing – review & editing. Sheau Ting Yong: Data curation. Clemente M.V.L. Wong: Writing – review & editing, Conceptualization, Supervision, Methodology.
Acknowledgments
This work was supported by the INACH RT_20-19 Project (Instituto Antartico Chileno), Fondecyt Iniciación grant No 11190754 (National Agency of Research and Development, ANID), Convenio Mineduc-UA ANT20992, funding granted by the Vice-Rector's Office for Research, Innovation and Postgraduate Studies at the University of Antofagasta, and Romanian Academy project RO1567-IBB05/2022.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
References
- 1. Visagie C.M., Houbraken J., Frisvad J.C., Hong S.-B., Klaassen C.H.W., Perrone G., Seifert K.A., Varga J., Yaguchi T., Samson R.A. Identification and nomenclature of the genus Penicillium. Stud. Mycol. 2014;78:343–371. doi: 10.1016/j.simyco.2014.09.001.
- 2. Kita D.M., Giovanella P., Yoshinaga T.T., Pellizzer E.P., Sette L.D. Antarctic fungi applied to textile dye bioremediation. Anais da Academia Brasileira de Ciências. 2022;94. doi: 10.1590/0001-3765202220210234.
- 3. Vaishnav N., Singh A., Adsul M., Dixit P., Sandhu S.K., Mathur A., Puri S.K., Singhania R.R. Penicillium: the next emerging champion for cellulase production. Bioresour. Technol. Rep. 2018;2:131–140. doi: 10.1016/j.biteb.2018.04.003.
- 4. Dereeper A., Guignon V., Blanc G., Audic S., Buffet S., Chevenet F., Dufayard J.F., Guindon S., Lefort V., Lescot M., Claverie J.M., Gascuel O. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008;36(Web Server issue):W465–W469. doi: 10.1093/nar/gkn180.
- 5. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: the United States department of energy systems biology knowledgebase. Nat. Biotechnol. 2018;36:566. doi: 10.1038/nbt.4163.