Scientific Data. 2025 Aug 21;12:1460. doi: 10.1038/s41597-025-05798-9

Whole genome sequencing and annotations of Trametes sanguinea ZHSJ

Rathna Silviya Lodi 1,2,#, Xiuwen Jia 1,#, Peng Yang 3,#, Chune Peng 1, Xiaodan Dong 1, Jiandong Han 3, Xiaofei Liu 2, Luzhang Wan 3, Lizeng Peng 1,2
PMCID: PMC12371009  PMID: 40841404

Abstract

Trametes sanguinea belongs to the family Polyporaceae, and its fruiting body has several reported medicinal benefits. In the current study, Trametes sanguinea ZHSJ (T. sanguinea ZHSJ), isolated from chestnut tree wood, was subjected to whole genome and transcriptome sequencing and annotation. The T. sanguinea ZHSJ genome is 41.3243 Mb in length and comprises 135 contigs with a contig N50 of 2.8138 Mb. A total of 10,886 genes of T. sanguinea ZHSJ were annotated. Further, RNA sequencing and assembly yielded 11,861 genes, of which 10,507 were annotated against four annotation databases. Moreover, de novo gene prediction with Funannotate yielded 12,268 coding genes, and simple sequence repeat (SSR) prediction identified a total of 1,242 SSRs, of which 707 were trinucleotide repeat motifs.

Subject terms: Fungal genomics, Fungal biology

Background & Summary

Trametes sanguinea Lloyd (T. sanguinea) belongs to the family Polyporaceae and is an edible and medicinal fungus widely cultivated in Yunnan province of China1. The fruiting body of this mushroom contains several polysaccharides with potential medicinal benefits, such as antitumor activity through inhibition of migration, invasion and tumor microvascular activity2. The bioactive polysaccharide TS2-2A of T. sanguinea is recognized by Toll-like receptor 4 (TLR4) and stimulates production of the related cytokines by RAW 264.7 macrophages, thereby facilitating immune enhancement3. Treatment with crude polysaccharides of T. sanguinea attenuated myocardial damage in doxorubicin (DOX)-induced cardiotoxicity. It inhibited autophagy of cardiomyocytes by suppressing signaling pathways related to the autophagy marker microtubule-associated protein 1A/1B-light chain 3 (LC3A), and inhibited apoptosis of cardiomyocytes by suppressing signaling pathways related to cleaved poly (ADP-ribose) polymerase (PARP), thus showing cardioprotective activity4. Moreover, polysaccharide-based composite particles from T. sanguinea effectively healed diabetic foot ulcers by killing bacteria, reducing reactive oxygen species (ROS) levels and enhancing the activities of superoxide dismutase (SOD) and catalase (CAT)5. Furthermore, the partially acetylated heteropolysaccharide TS1-1A from the T. sanguinea fruiting body showed activity against human cytomegalovirus (HCMV): it prevented attachment of the virus and arrested viral replication by downregulating NAD(P)H quinone dehydrogenase 1 (NQO1) and heme oxygenase-1 (HO-1), proteins involved in oxidative stress6. In addition, diabetic rats treated with Pinus sp. sawdust and T. sanguinea mycelium extract showed decreased non-high-density lipoprotein (non-HDL) cholesterol and triglycerides, suggesting potential for treating dyslipidemia7.

T. sanguinea also has other useful properties, such as the biodegradation and biotransformation of triphenyl phosphate8, 2,4,6-trinitrotoluene (TNT)9 and polychlorinated biphenyls (Sadañoski et al.10). It can also break down lignin by secreting several lignocellulolytic enzymes such as cellulases, xylanases, laccases and manganese peroxidases11–13. Furthermore, T. sanguinea has potential in bioremediation through decolorization of hazardous, recalcitrant, complex synthetic textile dyes14. The current study reports the whole genome sequencing, RNA sequencing, functional annotation of coding genes and de novo gene prediction for T. sanguinea ZHSJ. The complete genome of T. sanguinea ZHSJ provides new insights for further research into this medicinally important fungus and into the specific genes responsible for producing compounds with prominent medicinal activities such as antitumor, anti-inflammatory and antimicrobial activity. Further research on the biosynthetic pathways of these compounds would aid in identifying novel medically important drugs and producing them industrially through genetic bioengineering.

Experimental workflow

Methods

T. sanguinea ZHSJ source and its genomic DNA extraction and sequencing

The fungal strain was collected from chestnut tree wood in the Mengyin Yun Meng scenic area, Linyi city, Shandong province, China. The fruiting body was surface sterilized, cut into small 1 cm pieces, placed on potato dextrose agar (PDA) plates and incubated at 25 °C for 7–14 days; the resulting fungal culture was maintained as a pure culture by subculturing on PDA medium. Agar plugs of the fungal culture grown on PDA for 7 days were then inoculated into liquid potato dextrose (PD) broth and incubated at 25 °C for 7 days to obtain the amount of mycelium required for genomic analysis. Genomic DNA was isolated from T. sanguinea ZHSJ cultured in the liquid medium for 7 days. The mycelium was separated from the broth by centrifugation at 9000 × g for 1 minute15. The pelleted mycelium was frozen in liquid nitrogen and ground thoroughly to break the cell wall and release the DNA. DNA was then extracted following the instructions of the Bioflux Biospin DNA extraction kit (Hangzhou Bioer Technology Co., Ltd., Hangzhou, China). Library construction and genome sequencing were performed using a combination of the PacBio Sequel IIe and Illumina sequencing platforms16. For Illumina sequencing, the DNA samples were fragmented to ~400 bp using a Covaris M220 acoustic shearer following the manufacturer’s protocol. Illumina sequencing libraries were prepared from the fragments using the NEXTFLEX Rapid DNA-Seq kit17. The 5′ ends were end-repaired and phosphorylated, and the 3′ ends were A-tailed and ligated to sequencing adapters. The adapter-ligated products were then enriched by PCR and the libraries were sequenced on an Illumina NovaSeq 6000 (Illumina Inc, San Diego, CA, USA).
For PacBio sequencing, genomic DNA was fragmented to ~10 kb, purified, end-repaired and ligated with SMRTbell sequencing adapters following the manufacturer’s instructions (Pacific Biosciences, CA). The PacBio library was then prepared and sequenced on one SMRT cell using the standard protocol18.

T. sanguinea ZHSJ RNA extraction, purification and library construction and sequencing

T. sanguinea ZHSJ was grown in PD broth for 7 days at 28 °C. After incubation the mycelium was collected and ground in liquid nitrogen to break the cell wall, and 100 mg of the mycelial powder was used for RNA extraction with the Omega Bio-Tek E.Z.N.A. fungal RNA kit following the kit instructions. mRNA was then enriched from the total RNA using the RNeasy pure mRNA bead kit, following its instructions, to obtain pure mRNA from the crude RNA sample. The mRNA was reverse transcribed to cDNA: the first strand was synthesized in an M-MuLV reverse transcriptase system using fragmented mRNA as template and oligonucleotides as primers, and RNase H was used to degrade the RNA strand. The second strand was synthesized from dNTPs in a DNA polymerase I system, followed by cDNA end repair and A-tailing. Adapters were then ligated to the cDNA, and cDNA fragments of about 200 bp were selected and purified with Hieff NGS® DNA selection beads. Finally, the cDNA library was amplified by PCR and sequenced on the Illumina NovaSeq X Plus platform.

Basic genomic information of T. sanguinea ZHSJ

The total genome size of T. sanguinea ZHSJ was 41324262 bp across 135 contigs, 31 of which were identified as mitochondrial sequences. The GC content and the numbers of coding genes, KEGG genes, COG genes and RNAs are given in Table 1.

Table 1.

Basic genomic statistical information of T. sanguinea ZHSJ.

S. No. | Feature | Value
1 | Total genome size | 41324262 bp
2 | Contigs | 135
3 | Contig N50 | 2813802 bp
4 | GC content (%) | 52.08
5 | Coding sequences (total number of genes) | 9049
6 | Genes of KEGG | 5341
7 | Genes of COG | 2859
8 | Total number of tRNAs | 1280
9 | Total number of rRNAs | 1035

The table summarizes the total genome size, contig N50, coding sequences and RNA counts.

Quality control

The genome sequence was assembled from PacBio and Illumina reads. For the Illumina data, base calling converted the original signal data into sequence data, defined as raw data or raw reads and stored in FASTQ format. Low-quality data were then removed from the raw data by quality trimming to give clean data. The base distribution and quality fluctuation at each cycle were calculated across all sequencing reads, which gives an overall view of the library construction quality and sequencing quality of the samples. The third-generation (PacBio) and second-generation (Illumina) quality control data are presented in Table 2 and Fig. 1. The cDNA reads obtained from sequencing were filtered with fastp (version 0.18.0) by removing reads containing adapters, reads containing more than 10% unknown nucleotides (N), and low-quality reads in which more than 50% of the bases had a Q value ≤ 20. Clean reads were obtained after this filtering (Table 3). The filtered data were then analyzed for base composition and quality distribution to give a visual representation of data quality; the more balanced the base composition, the higher the quality (Fig. 2).
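
The two per-read thresholds described above (N content and share of low-quality bases) can be illustrated with a minimal Python sketch. This is not fastp's implementation: the function names `phred33` and `keep_read` are hypothetical, Phred+33 quality encoding is assumed, and adapter removal is not covered.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def keep_read(seq, quals, max_n_frac=0.10, low_q=20, max_low_frac=0.50):
    """Return True if a read passes both filters described in the text.

    A read is discarded when more than 10% of its bases are N, or when
    more than 50% of its bases have a quality value <= 20.
    """
    if not seq or len(seq) != len(quals):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


if __name__ == "__main__":
    print(keep_read("ACGTACGTAC", [30] * 10))             # passes both filters
    print(keep_read("NNACGTACGT", [30] * 10))             # 20% N, rejected
    print(keep_read("ACGTACGTAC", [10] * 6 + [30] * 4))   # 60% low quality, rejected
```

For paired-end data a pair would normally be dropped when either mate fails; that pairing logic is left out here to keep the sketch short.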

Table 2.

Statistical information of third generation and second-generation quality control data of T. sanguinea ZHSJ.

S. No. | Feature | Value
Third-generation quality control data
1 | Total read number | 17752
2 | Total bases | 3.89 Gb
3 | Largest read length (bp) | 61076
4 | Average read length (bp) | 21935.22
5 | Coverage of raw data (read depth) | 94.2459
Second-generation quality control data
6 | Insert size | 496 bp
7 | Read length of original reads | 151 bp
8 | Raw read pairs | 22611262
9 | Raw bases | 3.41 Gb
10 | Raw Q20 (%) | 98.5452
11 | Raw Q30 (%) | 95.9331
12 | Clean read pairs | 21359246
13 | Clean bases | 3.21 Gb
14 | Clean Q20 (%) | 98.8822
15 | Clean Q30 (%) | 96.4834

The Illumina library had an insert size of 496 bp. Q20 and Q30 give the percentage of bases with a Phred value greater than 20 and 30, respectively, relative to the total bases after quality control; the larger the Phred value, the better the quality of the sequence.
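
As a brief illustration of what the Q20 and Q30 figures mean, the sketch below converts a Phred value to an error probability using the standard relation Q = -10 log10(P) and computes the share of bases at or above a threshold. The function names are hypothetical, and the "at or above" cut-off is an assumption based on common usage; the text above says "greater than".

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quals, threshold):
    """Share of base qualities that reach the threshold (e.g. 20 for Q20)."""
    if not quals:
        raise ValueError("no quality values given")
    return sum(q >= threshold for q in quals) / len(quals)


if __name__ == "__main__":
    print(phred_to_error_prob(20))  # about 1 error in 100 bases
    print(phred_to_error_prob(30))  # about 1 error in 1000 bases
    print(fraction_at_or_above([10, 20, 30, 40], 30))
```

Read this way, a clean Q30 of 96.4834% means that roughly 96% of the bases have an estimated error probability of about 0.1% or lower.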

Fig. 1.

Quality control of sequencing data. (a) Distribution of base error rate before and after quality control. The x-axis indicates the position of bases along the reads from the 5′ to the 3′ end and the y-axis represents the average error rate (%) of all reads. The first half shows the error rate distribution of the first-end reads and the second half that of the other-end reads. (b) Base composition distribution before and after quality control. The x-axis represents the position of bases along the reads from the 5′ to the 3′ end and the y-axis represents the percentage of A, C, G, T and N at each position, shown in different colors. Because the start of each read is adjacent to the primer adapter, A, C, G and T fluctuate initially and later stabilize. Moreover, the low proportion of unknown bases (N) indicates that the sequencing sample was little affected by the system’s AT preference. (c) Distribution of the average quality value of each base across all sequencing reads before and after quality control. The x-axis shows base position along the reads and the y-axis the average base quality value; the first half of the graph shows the distribution for the first-end reads and the second half that for the other-end reads. (d) Sequencing length distribution of clean reads of T. sanguinea ZHSJ.

Table 3.

Transcriptome data quality control.

S. No. | Feature | Value
Data filtering statistics
1 | Raw data | 38,385,854
2 | Clean data (%) | 38,382,184 (99.99%)
3 | Adapter (%) | 3,670 (0.01%)
4 | Low quality (%) | 0 (0.00%)
5 | Poly A (%) | 0 (0.00%)
6 | N (%) | 0 (0.00%)
Base information statistics
7 | Raw data (bp) | 5.75 Gb
8 | BF-Q20 (%) | 5.69 Gb (98.92%)
9 | BF-Q30 (%) | 5.54 Gb (96.22%)
10 | BF-N (%) | 76,798 bp (0.00%)
11 | BF-GC (%) | 3.39 Gb
12 | Clean data (bp) | 5.75 Gb
13 | AF-Q20 (%) | 5.68 Gb
14 | AF-Q30 (%) | 5.53 Gb
15 | AF-N (%) | 76,665 bp (0.00%)
16 | AF-GC (%) | 3.38 Gb (58.91%)

BF, before filtering; AF, after filtering. Low-quality data were filtered by removing reads containing adapters, reads with more than 10% N content, poly-A reads and low-quality reads. After filtering, the clean data comprised 5.68 Gb of Q20 bases, 5.53 Gb of Q30 bases, 76,665 bp of N bases and 3.38 Gb of GC bases.

Fig. 2.

Transcriptome quality control: base distribution before and after filtration. The base composition along the read positions was more balanced after filtration; the more balanced the base composition, the higher the quality of the data.

Genomic assessment

Genomic assessment was performed by analyzing the GC-depth distribution and the k-mer frequency distribution. The GC-depth distribution was analyzed using SOAP (short oligonucleotide alignment program)19 and Bowtie 2 (version 2.5.1)20 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), and the k-mer frequency distribution was analyzed with the K-mer Analysis Toolkit (KAT)21. In the GC-depth analysis, the original reads of the paired-end (PE) library were aligned to the assembly sequence to obtain the base depth. The GC content was concentrated at 50–60% with a sequencing depth of about 100, indicating no obvious GC bias. However, the GC-depth distribution was divided into three layers, with the major distribution at 50–70% and the minor distribution at 15–35%. BLAST analysis of these sequences showed no contamination. Hence, the three-layer distribution is probably due to heterozygosity, which causes homologous chromosomes to assemble into one or two strands at heterozygous sites (Fig. 3a). For the k-mer frequency analysis, PE sequencing reads were used: the high-quality sequencing region was selected, 17-mers were extracted base by base, and the proportion of k-mers at each depth was counted. The highest peak, with a frequency of 1.651%, occurred at a depth of 92. Two peaks were observed because of heterozygosity: a lower peak precedes the main peak, while the tail of the main peak reflects repetitive sequence (Fig. 3b).
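
The k-mer depth spectrum described above can be sketched in a few lines of Python. This only illustrates the counting idea and is not how KAT is implemented; the function names are hypothetical, k-mers are not collapsed with their reverse complements, and k-mers containing N are skipped as a simplifying assumption.

```python
from collections import Counter


def kmer_counts(reads, k=17):
    """Count every k-mer, taken base by base, across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def depth_spectrum(counts):
    """Map each depth to the percentage of distinct k-mers seen at that depth."""
    hist = Counter(counts.values())
    total = sum(hist.values())
    return {depth: 100.0 * n / total for depth, n in sorted(hist.items())}


if __name__ == "__main__":
    # Toy example with k=4; real data would use k=17 on the PE reads.
    spectrum = depth_spectrum(kmer_counts(["ACGTACGT"], k=4))
    print(spectrum)
```

In a spectrum built from real reads, the depth with the highest percentage corresponds to the main peak, and a smaller peak at roughly half that depth is the usual signature of heterozygous sites.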

Fig. 3.

Results of genomic assessment. (a) GC-depth distribution analysis: GC content was concentrated at 50–60% with a sequencing depth of about 100, indicating no GC bias. (b) k-mer frequency analysis: the highest peak, with a frequency of 1.651%, occurred at a depth of 92.

Genomic and transcriptomic assembly and prediction

HiFi reads of the clean data were assembled into contigs using CANU (a correct-then-assemble assembler) and the assembly software hifiasm v0.19.5, and the PacBio assembly was then error-corrected using the Illumina clean reads. Assembly evaluation was performed with CEGMA (core eukaryotic genes mapping approach) (version 2.5)22 http://korflab.ucdavis.edu/datasets/cegma/ and BUSCO (Benchmarking Universal Single-Copy Orthologs) (version 5.4.5)23 http://busco.ezlab.org/. GapCloser (version 1.12) https://sourceforge.net/projects/soapdenovo2/files/GapCloser/bin/r6/GapCloser-bin-v1.12-r6.tgz/download was used to fill the gaps in the results of the second-generation assembly24. Transcriptome de novo assembly was performed with the short-read assembly program Trinity. Trinity is a software package that combines three components: Inchworm, Chrysalis and Butterfly. Inchworm assembles reads by a k-mer based approach, producing a collection of linear contigs. Chrysalis clusters related contigs that correspond to alternatively spliced transcripts or to unique portions of paralogous genes, and then builds a de Bruijn graph for each cluster of related contigs. Butterfly analyzes the paths of reads and read pairings in the context of the corresponding de Bruijn graph and outputs one linear sequence for each alternatively spliced isoform and for transcripts derived from paralogous genes25. Funannotate (v1.8.17)26 was used to perform de novo prediction. Funannotate employs an evidence-based integrative approach to identify protein-coding genes by incorporating homology-based protein alignments and transcriptome data processed through Trinity (v2.5.1) and PASA (v2.5.3)27. These evidence sources generate hints for ab initio predictors such as Augustus (v3.5.0)28 and GeneMark-ES/ET (vG.72)29. EvidenceModeler (v2.1.0)30 then integrates these predictions with weighted evidence, resolving conflicts and selecting the most reliable gene models.
Genome prediction covered coding gene prediction, non-coding RNA gene prediction and repeat sequence prediction. Coding genes were predicted with MAKER2 (gene prediction for fungi)31 (http://www.yandell-lab.org/software/maker.html), and Barrnap (version 0.4.2) https://github.com/tseemann/barrnap/ and tRNAscan-SE (version 1.3.1) http://trna.ucsc.edu/software/ were used to predict the rRNAs and tRNAs contained in the genome32. Repeats were predicted with RepeatMasker http://www.repeatmasker.org/33. De novo assembly of the quality-controlled clean reads gave a total of 135 contigs with a G + C content of 52.08%; detailed information is given in Table 4. Of these 135 contigs, 12 contain the majority of the coding genes (Fig. 4a). Assembly evaluation was performed with BUSCO and CEGMA (Table 4). tRNA prediction identified 1280 tRNAs of 21 types (Fig. 4b). Repeat prediction with RepeatMasker showed that repeats make up 25.81% of the genome (10,665,171 bp); these interspersed repeats are transposable elements that include DNA transposons, retrotransposons and unclassified repeats. There were 209 DNA transposons, while the retrotransposons were categorized into long terminal repeats (LTRs, n = 641), short interspersed nuclear elements (SINEs, n = 16) and long interspersed nuclear elements (LINEs, n = 170). A further 1902 repeats were unclassified (Table 4). For the transcriptome, unigenes were sorted from longest to shortest and the assembly was assessed by its N50 number and N50 length. The N50 number is the count of the longest unigenes needed to cover half of the assembled bases; the smaller the N50 number, the better the assembly quality (Table 4). Assembly quality was further analyzed by unigene length distribution (Fig. 4c), and the completeness of the assembly was assessed with BUSCO (Fig. 4d).
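
The N50 length and N50 number used above (and the N90 reported in Table 4) follow from a simple cumulative sum over sorted sequence lengths. The sketch below is a generic illustration with a hypothetical function name `nx` and made-up toy lengths; it is not taken from the assembly pipeline.

```python
def nx(lengths, x=50):
    """Return (Nx length, Nx number) for a list of sequence lengths.

    Sequences are sorted from longest to shortest; Nx is the length of the
    sequence at which the running total first reaches x% of all bases, and
    the Nx number is how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no sequence lengths given")


if __name__ == "__main__":
    toy_lengths = [10, 8, 6, 4, 2]   # hypothetical lengths, 30 bases in total
    print(nx(toy_lengths, 50))       # N50 length and N50 number
    print(nx(toy_lengths, 90))       # N90 length and N90 number
```

For a fixed total size, a larger Nx length and a smaller Nx number both indicate that more of the assembly sits in long sequences.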

Table 4.

Statistical information of genome assembly, evaluation and prediction of T. sanguinea ZHSJ.

S. No. | Feature | Value
Genome assembly
1 | Total contig number | 135
2 | Total bases in contigs | 41324262 bp
3 | Largest contig length | 4110638 bp
4 | Contig N50 | 2813802 bp
5 | Contig N90 | 69383 bp
6 | G + C | 52.08%
Genome assembly evaluation
BUSCO
7 | Complete | 94.3%
8 | Complete duplicated | 0.4%
9 | Fragmented | 2.0%
10 | Missing | 3.7%
CEGMA
11 | Complete | 94.35%
12 | Partial | 95.97%
13 | Total orthologs number | 247
14 | Average orthologs number | 1.06
Genetic prediction
Coding gene prediction
15 | Number of coding genes | 9049
16 | Total gene length | 23070809 bp
17 | Gene average length | 2549.54 bp
18 | Gene density | 0.22 kb
19 | GC content in gene region | 57.39%
20 | Percentage of genes in genome | 55.83%
21 | Intergenic region length | 18253453 bp
22 | GC content in intergenic region | 54.93%
23 | Genome percentage in intergenic region | 44.17%
24 | Transposable elements | 209
rRNA prediction
25 | Total rRNAs | 1035
26 | 5S rRNA | 259
27 | 5.8S rRNA | 258
28 | 18S rRNA | 256
29 | 28S rRNA | 262
Transcriptome assembly
30 | Number of genes | 11861
31 | Gene coverage | 96.37%
32 | GC (%) | 57.9754%
33 | N50 number | 2865
34 | N50 length | 2316
35 | Maximum length | 15161
36 | Minimum length | 201
37 | Average length | 1715
38 | Total assembled bases | 20,345,744 bp
De novo prediction
39 | Number of coding genes | 12268
40 | Total gene length | 20258617 bp
41 | Gene average length | 1651.34 bp
42 | Gene density | 3.25 kb
43 | GC content in gene region | 56.91%
44 | Percentage of genes in genome | 50.78%
45 | Intergenic region length | 19636407 bp
46 | GC content in intergenic region | 55.43%
47 | Genome percentage in intergenic region | 44.17%
48 | Transposable elements | 209

The genome sequence and transcriptome sequence after quality control were subjected to assembly and genetic prediction.

Fig. 4.

Genome assembly and prediction. (a) Circos plot depicting the genomic assembly and prediction of T. sanguinea ZHSJ. Circle 1 represents the 12 large contigs of the T. sanguinea ZHSJ genome, circle 2 represents the predicted annotated coding genes, and circles 3 and 4 represent the short and long reads across the genome together with the repeat regions between these 12 contigs of more than 20 kb in size. (b) The abscissa represents the types of tRNA and the ordinate the number of tRNAs; methionine tRNAs were the most numerous. (c) Unigene length distribution. (d) BUSCO analysis: a total of 290 BUSCOs were searched, of which 278 were complete (274 complete and single-copy, 4 complete and duplicated), 12 were fragmented and none were missing.

Genome annotation

Basic genome annotation was performed using alignment tools such as BLAST, Diamond and HMMER to annotate the predicted genes of T. sanguinea ZHSJ against the following databases: the NR (non-redundant protein) database34, the Swiss-Prot library, the Pfam (protein families) library35, the COG (clusters of orthologous genes) database36 (http://www.ncbi.nlm.nih.gov/COG/), the GO (gene ontology) database37 (http://www.geneontology.org/) and the KEGG (Kyoto Encyclopedia of Genes and Genomes) database38 (http://www.genome.jp/kegg/). The predicted coding genes were functionally annotated by comparison with five major databases: NR, Swiss-Prot, Pfam, COG and GO. NR and Swiss-Prot annotations are based on protein sequence alignments; a total of 10886 genes of T. sanguinea ZHSJ were annotated. The Pfam database is a large collection of protein families that rely on multiple sequence alignments and hidden Markov models. Proteins act mainly through structural units derived from their primary sequence, referred to as domains, and different combinations of domains produce proteins that vary in function. The identification of protein domains is therefore important for the analysis of protein function; a total of 6148 genes of T. sanguinea ZHSJ were annotated with the Pfam database.

COG annotations

COG database alignments were performed for functional annotation, classification and protein evolution analysis of the predicted proteins. The COG annotations fall into four main classifications: information storage and processing, metabolism, cellular processing and signaling, and poorly characterized (function unknown). A total of 2859 genes were assigned to 24 COG types; of these, carbohydrate transport and metabolism; translation, ribosomal structure and biogenesis; amino acid transport and metabolism; and post-translational modification, protein turnover and chaperones contained the largest numbers of annotated genes. In addition, 369 genes had only a general function prediction and were poorly characterized, and the functions of 42 genes were unknown (Fig. 5a).

Fig. 5.

Genome annotation. (a) COG annotation: the x-axis represents the COG functional classification and the y-axis the number of genes with that type of function. Genes with carbohydrate transport and metabolism functions are the most numerous. (b) GO annotation: the abscissa represents the three branches of GO, namely biological process, cellular component and molecular function, with their further classifications, and the ordinate represents the relative proportion of genes. Metabolic process has the highest number of genes in biological process, cellular anatomical entity has the highest number in cellular component and catalytic activity has the highest number in molecular function. (c) KEGG annotation: the ordinate gives the name of the KEGG metabolic pathway and the abscissa the number of genes/transcripts annotated to the pathway. KEGG metabolic pathways are divided into 7 categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases and drug development. (d) Summary of annotation: a total of 11,861 assembled unigenes were annotated against four major public databases, namely Nr (10,473 unigenes, 88.3%), KEGG (9,125 unigenes, 76.9%), KOG (4,301 unigenes, 36.2%) and Swiss-Prot (5,224 unigenes, 44.0%). Of these, 10,507 (88.6%) were successfully annotated in at least one database, while only 1,354 (11.4%) remained unannotated, indicating a high annotation success rate. The Venn diagram analysis showed that 4,209 (35.5%) unigenes were annotated in all four databases.

GO annotations

According to the GO database, 5688 genes of T. sanguinea ZHSJ were assigned to three major categories: cellular component (3252 genes), molecular function (4290 genes) and biological process (2911 genes). Within cellular component, cellular anatomical entity has the largest number of genes (2915). Within molecular function, catalytic activity and binding have the highest numbers of genes (2912 and 2479, respectively). Within biological process, metabolic process and cellular process have the highest numbers of genes (2193 and 2086, respectively) (Fig. 5b).

KEGG annotations

The metabolic pathways of gene products from T. sanguinea ZHSJ were further analyzed with the KEGG database. A total of 13,253 gene assignments were made across six KEGG categories. Cellular processes contained 1273 genes, among which transport and catabolism pathway genes were the most numerous (552). Environmental information processing contained 922 genes, among which signal transduction pathway genes were the most numerous (856). Genetic information processing contained 1118 genes, among which translation pathway genes were the most numerous (334). Human diseases contained 3048 genes, among which neurodegenerative disease pathway genes were the most numerous (844). Metabolism contained 5349 genes, among which global and overview maps (a special class of KEGG metabolic pathway maps) were the most numerous (2679). Organismal systems contained 1543 genes, among which endocrine system pathway genes were the most numerous (436) (Fig. 5c). The transcriptomic unigene annotation of T. sanguinea ZHSJ is shown in Fig. S1 and Fig. 5d.

Simple sequence repeats (SSR) prediction and primer design

The microsatellite identification tool MISA http://pgrc.ipk-gatersleben.de/misa/ was employed for microsatellite mining in the whole transcriptome of T. sanguinea ZHSJ. The parameters used were (definition unit_size, min-repeats): 2–6 n3-5 4-4 5-4 6-4; interruptions (max_difference-between_2_SSRs): 100. If the distance between two SSRs is shorter than 100 bp, they are considered as one SSR. Based on the MISA results, primer3 http://www.broadinstitute.org/genome_software/other/promer3.html was used to design primer pairs in the flanking regions of the SSRs39. A total of 1,242 SSRs were identified with MISA in the T. sanguinea ZHSJ transcriptome. The most abundant repeat motif type was trinucleotide (n = 707), comprising AAG/CTT, ACC/CGT, AGC/CTG, AGG/CCT, ATC/ATG and CCG/CGG. This was followed by dinucleotide (n = 218), commonly AC/GT, AG/CT and CG/CG, and then tetranucleotide (n = 191), comprising AACG/CGTT, ACGC/CGTT, ACGG/CCGT, AGCG/CGCT and ATCC/ATGG. Pentanucleotide and hexanucleotide SSRs numbered n = 45 and n = 81, respectively (Fig. 6).
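
To make the unit-size and minimum-repeat thresholds concrete, the following Python sketch finds perfect SSRs with a regular expression. It is an illustration only, not MISA's algorithm. The threshold table `MIN_REPEATS` is one reading of the parameter string above (di-6, tri-5, tetra-4, penta-4, hexa-4) and should be treated as an assumption; the function name `find_ssrs` is hypothetical, and the merging of SSRs closer than 100 bp into compound SSRs is not implemented.

```python
import re

# Assumed thresholds: repeat unit size -> minimum number of repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # A motif of `unit` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (AAAA, ACAC),
            # since those are reported under the shorter unit size.
            if any(motif == motif[:d] * (unit // d)
                   for d in range(1, unit) if unit % d == 0):
                continue
            copies = (match.end() - match.start()) // unit
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("TT" + "AG" * 6 + "CC"))
    print(find_ssrs("AAG" * 5))
```

Coordinates are 0-based with an exclusive end, so a hit can be sliced directly from the sequence, and the flanking regions on either side are what a primer design tool would then target.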

Fig. 6.

Statistical plot of the proportion of SSRs of each tandem repeat unit type among the total SSRs. Trinucleotide SSRs were the most abundant repeat motif type in the transcriptome of T. sanguinea ZHSJ.

Data Records

The whole genome sequence raw reads, FASTA files and GFF annotation files of T. sanguinea ZHSJ have been deposited at the National Center for Biotechnology Information (NCBI) under BioProject number PRJNA1174344, with accession number GCA_050630565.1 (https://identifiers.org/ncbi/insdc.gca:GCA_050630565.1)40 and sequence read archive accession SRP600787 (https://identifiers.org/ncbi/insdc.sra:SRP600787)41. The Funannotate annotation GFF3 file has been submitted to Zenodo (10.5281/zenodo.16750848)42.

Technical Validation

The genome was sequenced with PacBio and Illumina platforms, the reads were subjected to quality control, and the clean genomic reads were assessed by analyzing the GC-depth distribution with SOAP and Bowtie 2 v2.5.1 and the k-mer frequency distribution. The GC content was concentrated at 50–60% with a sequencing depth of about 100. For the k-mer analysis, the high-quality sequencing region was selected, 17-mers were extracted base by base and the proportion of k-mers at each depth was counted; the highest peak, with a frequency of 1.651%, occurred at a depth of 92 (Fig. 3). PacBio HiFi reads were assembled into 135 contigs using CANU and hifiasm v0.19.5. Assembly evaluation was performed with BUSCO v5.4.5 and CEGMA v2.5; 94.3% of BUSCOs were complete in the assembled genome (Table 4). Transcriptome de novo assembly was performed with the short-read assembly program Trinity and gave 96.37% gene coverage (Table 4). De novo gene prediction with Funannotate v1.8.17 yielded 12268 coding genes with a total gene length of 20258617 bp (Table 4). Together, these evaluations support the high quality of the genome assembly and annotation of T. sanguinea ZHSJ.

Acknowledgements

This work was jointly supported by the National Natural Science Foundation of China (grant number 32101035), the Natural Science Foundation of Shandong Province (grant numbers ZR2021QC025 and ZR2023QC061), the Special Project of Central Government for Local Science and Technology Development of Shandong Province (grant number YDZX2022151), the Key R&D Program of Shandong Province (grant number 2024TZXD020) and the Agricultural Science and Technology Innovation Project (grant number CXGC2025F09).

Author contributions

R.S.L., X.J. and P.Y.: whole genome sequencing, bioinformatics, data interpretation and manuscript writing. X.D. and J.H.: data analysis and HiFi sequencing. X.L.: pipeline investigation. C.P., L.W. and L.P.: conceptualization, project supervision, data interpretation, T. sanguinea ZHSJ cultivation and manuscript revision.

Code availability

All analyses were performed following the guidelines provided in the manuals of the software and pipelines used. The software used and their versions are detailed in the Methods section. All software and tools in the study were used with their default parameters unless otherwise detailed.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Rathna Silviya Lodi, Xiuwen Jia, Peng Yang.

Contributor Information

Chune Peng, Email: pengchune@saas.ac.cn.

Luzhang Wan, Email: wanluzhang657@163.com.

Lizeng Peng, Email: penglizeng@sdnu.edu.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-025-05798-9.

References

  • 1. Lesage-Meessen, L. et al. Phylogeographic relationships in the polypore fungus Pycnoporus inferred from molecular data. FEMS Microbiol Lett 325, 37–48 (2011).
  • 2. Yan, M. X. et al. Structural Characterization and Tumor Microvascular Inhibition Activity of Total Polysaccharide from Trametes sanguinea Lloyd. Chem Biodivers 19 (2022).
  • 3. Zhang, M. et al. Structural characterization of a polysaccharide from Trametes sanguinea Lloyd with immune-enhancing activity via activation of TLR4. Int J Biol Macromol 206, 1026–1038 (2022).
  • 4. Shen, C. et al. Cardioprotective effect of crude polysaccharide fermented by Trametes Sanguinea Lyoyd on doxorubicin-induced myocardial injury mice. BMC Pharmacol Toxicol 24 (2023).
  • 5. Huang, X. et al. Pycnoporus sanguineus Polysaccharides as Reducing Agents: Self-Assembled Composite Nanoparticles for Integrative Diabetic Wound Therapy. Int J Nanomedicine 18, 6021–6035 (2023).
  • 6. Wang, Y. et al. Structure elucidation and antiviral activity of a cold water-extracted mannogalactofucan Ts1-1A from Trametes sanguinea against human cytomegalovirus in vitro. Carbohydr Polym 335 (2024).
  • 7. Rech, G. et al. Lipid-lowering effect of Pinus sp. sawdust and Pycnoporus sanguineus mycelium in streptozotocin-induced diabetic rats. J Food Biochem 44 (2020).
  • 8. Feng, M. et al. Bioremediation of triphenyl phosphate by Pycnoporus sanguineus: Metabolic pathway, proteomic mechanism and biotoxicity assessment. J Hazard Mater 417 (2021).
  • 9. Alvarado-Ramírez, L. et al. Biotransformation of 2,4,6-Trinitrotoluene by a cocktail of native laccases from Pycnoporus sanguineus CS43 under oxygenic and non-oxygenic atmospheres. Chemosphere 352 (2024).
  • 10. Sadañoski, M. A. et al. Bioprocess conditions for treating mineral transformer oils contaminated with polychlorinated biphenyls (PCBs). J Environ Chem Eng 8 (2020).
  • 11. Gauna, A., Larran, A. S., Feldman, S. R., Permingeat, H. R. & Perotti, V. E. Secretome characterization of the lignocellulose-degrading fungi Pycnoporus sanguineus and Ganoderma resinaceum growing on Panicum prionitis biomass. Mycologia 113, 877–890 (2021).
  • 12. Sánchez-Corzo, L. D. et al. Lignocellulolytic enzyme production from wood rot fungi collected in Chiapas, Mexico, and their growth on lignocellulosic material. Journal of Fungi 7 (2021).
  • 13. Lu, C., Wang, H., Luo, Y. & Guo, L. An efficient system for pre-delignification of gramineous biofuel feedstock in vitro: Application of a laccase from Pycnoporus sanguineus H275. Process Biochemistry 45, 1141–1147 (2010).
  • 14. Malcı, K., Kurt-Gür, G., Tamerler, C. & Yazgan-Karatas, A. Combinatorial decolorization performance of Pycnoporus sanguineus MUCL 38531 sourced recombinant laccase/mediator systems on toxic textile dyes. International Journal of Environmental Science and Technology 20, 951–966 (2023).
  • 15. Bellemare, A., John, T. & Marqueteau, S. Fungal genomic DNA extraction methods for rapid genotyping and genome sequencing. in Methods in Molecular Biology 1775, 11–20 (Humana Press Inc., 2018).
  • 16. Head, S. R. et al. Library construction for next-generation sequencing: Overviews and challenges. Biotechniques 56, 61–77 (2014).
  • 17. Tan, G., Opitz, L., Schlapbach, R. & Rehrauer, H. Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep 9 (2019).
  • 18. Kanwar, N., Blanco, C., Chen, I. A. & Seelig, B. PacBio sequencing output increased through uniform and directional fivefold concatenation. Sci Rep 11 (2021).
  • 19. Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: Short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).
  • 20. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359 (2012).
  • 21. Mapleson, D., Accinelli, G. G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: A K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
  • 22. Parra, G., Bradnam, K. & Korf, I. CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
  • 23. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
  • 24. Xu, M. et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience 9 (2020).
  • 25. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011).
  • 26. Palmer, J. M. & Stajich, J. Funannotate v1.8.1: Eukaryotic genome annotation (v1.8). Zenodo (2020).
  • 27.He, Y. et al. PaSa: An LLM Agent for Comprehensive Academic Paper Search. 1https://icml.cc/Conferences/2023.
  • 28. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
  • 29. Borodovsky, M. Genmark: parallel gene recognition for both DNA strands. Claverie & Bougueleret 17 (1993).
  • 30. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9 (2008).
  • 31. Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12 (2011).
  • 32. Lowe, T. M. & Eddy, S. R. TRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Research 25 (1997).
  • 33. Besemer, J., Lomsadze, A. & Borodovsky, M. GeneMarkS: A Self-Training Method for Prediction of Gene Starts in Microbial Genomes. Implications for Finding Sequence Motifs in Regulatory Regions. Nucleic Acids Research 29 (2001).
  • 34. Li, W., Kondratowicz, B., McWilliam, H., Nauche, S. & Lopez, R. The annotation-enriched non-redundant patent sequence databases. Database 2013 (2013).
  • 35. Finn, R. D. et al. Pfam: The protein families database. Nucleic Acids Research 42, D222–D230, doi: 10.1093/nar/gkt1223 (2014).
  • 36. Galperin, M. Y. et al. COG database update: Focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 49, D274–D281 (2021).
  • 37. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32 (2004).
  • 38. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28, http://www.genome.ad.jp/kegg/ (2000).
  • 39. Misener, S., Krawetz, S. A., Rozen, S. & Skaletsky, H. Primer3 on the WWW for General Users and for Biologist Programmers. http://www.dnastar.com/.
  • 40. Peng, L., Peng, C. & Lodi, R. S. GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_050630565.1 (2025).
  • 41. Peng, L., Peng, C. & Lodi, R. S. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP600787 (2025).
  • 42. Peng, L., Peng, C. & Lodi, R. S. Trametes sanguinea ZHSJ Genome Annotation. Zenodo. 10.5281/zenodo.16750848 (2025).

Articles from Scientific Data are provided here courtesy of Nature Publishing Group
