Abstract
Trichoptera is one of the most evolutionarily successful aquatic insect lineages and is highly valued in adaptive evolution research. This study presents the chromosome-level genome assemblies of Himalopsyche anomala and Eubasilissa splendida achieved using PacBio, Illumina, and Hi-C sequencing. For H. anomala and E. splendida, assembly sizes were 663.43 and 859.28 Mb, with scaffold N50 lengths of 28.44 and 31.17 Mb, respectively. In H. anomala and E. splendida, we anchored 24 and 29 pseudochromosomes, and identified 11,469 and 10,554 protein-coding genes, respectively. The high-quality genomes of H. anomala and E. splendida provide critical genomic resources for understanding the evolution and ecology of Trichoptera and performing comparative genomics analyses.
Subject terms: Entomology, Comparative genomics
Background & Summary
Trichoptera, commonly known as caddisflies, represent the largest order of completely aquatic insects within Endopterygota1. Encompassing approximately 17,000 extant species, Trichoptera are distributed across all continents except Antarctica2. Their larvae exhibit remarkably diverse behavior, constructing various nest structures or living freely in aquatic environments3. Their adaptability to varying water conditions, including temperature and dissolved oxygen, differs significantly among families, genera, and individual species4. Consequently, they serve as vital indicator organisms in water quality monitoring efforts. Additionally, the varied feeding habits of trichopteran larvae contribute to the energy dynamics within stream ecosystems5,6.
Trichoptera is divided into two suborders, Annulipalpia and Integripalpia, based on morphology and habit. Annulipalpian larvae typically inhabit running water or wave-washed riverbanks, using pin silk along with plant debris and small stones to construct fixed shelters. Integripalpia includes “cocoon-makers” and “Phryganides”7,8. Cocoon-maker larvae are either free-living or construct purse-cases or saddle-cases and are usually found in fast-flowing rivers and streams. Their last-instar larvae produce closed, semipermeable cocoons for pupation. In contrast, most Phryganides larvae thrive in stagnant or slow-moving water, adeptly combining stones, leaves, and twigs with silk proteins to construct mobile nests9,10. Rhyacophilidae and Phryganeidae are representative cocoon-makers and Phryganides, respectively, and exhibit marked differences in ecological habit and lifestyle.
The family Rhyacophilidae originated in the Palaearctic region and is primarily distributed in the Northern Hemisphere11. Their predatory larvae exhibit high sensitivity to environmental changes12. In contrast, the majority of phryganeid larvae are shredders, feeding on detritus and plant material in aquatic environments13. These larvae tend to be less sensitive to environmental changes than rhyacophilid larvae, and some species can survive in humid terrestrial environments after leaving the water10. Himalopsyche anomala Banks and Eubasilissa splendida Yang & Yang are typical representatives of Rhyacophilidae and Phryganeidae, respectively. Despite extensive studies on their biological characteristics, their precise phylogenetic positions and the molecular mechanisms underlying their adaptive evolution remain uncertain. High-quality reference genomes are crucial for advancing genetics and genome research. To date, nearly 30 trichopteran species have had their genomes sequenced and published, including two Himalopsyche species and Eubasilissa regina. However, chromosome-level assemblies have been achieved for only some species from five families (Glossosomatidae, Hydropsychidae, Leptoceridae, Limnephilidae, and Odontoceridae).
To enhance our understanding of the adaptive evolution and ecology of holometabolous aquatic insects, we used PacBio long-read sequencing, Illumina short-read sequencing, and Hi-C sequencing to achieve the first chromosome-level genome assemblies for H. anomala Banks and E. splendida Yang & Yang, with assembly sizes of 663.43 and 859.28 Mb and scaffold N50 lengths of 28.44 and 31.17 Mb, respectively. Hi-C scaffolding resulted in chromosome-level assemblies, with 99.29% (2,697 contigs) and 99.61% (643 contigs) of the initially assembled sequences anchored to 24 and 29 pseudochromosomes for H. anomala and E. splendida, respectively. In total, 288.10 Mb (43.43%) and 471.23 Mb (54.84%) of the sequences were identified as repetitive elements in these two respective assemblies. Moreover, integrating three prediction methods enabled the identification of 11,469 and 10,554 protein-coding genes (PCGs) in H. anomala and E. splendida, respectively. The high-quality genomes of these species not only advance our understanding of adaptive evolution in Trichoptera but also serve as resources for comparative genomics research in evolutionary biology and ecology. Furthermore, they contribute to elucidating the phylogenetic relationships between the cocoon-maker and Phryganides groups.
Methods
Sample collection
Himalopsyche anomala and E. splendida specimens were collected using ultraviolet light tubes from Xi-niu Sea (33°11′42″N; 103°53′46″E; alt: 2,348 m) and Wu-hua Sea (33°09′32″N; 103°51′55″E; alt: 2,377 m), respectively, in Jiuzhaigou National Nature Reserve, Sichuan Province, in July 2020. Specimens were identified by X-Y Ge and C-H Sun. Each sample was cleaned with phosphate-buffered saline, and the gut was removed under a stereo microscope to minimize intestinal microbial contamination. Subsequently, samples were stored in liquid nitrogen until nucleic acid extraction14.
Nucleic acid extraction and sequencing
For genome survey, transcriptome, PacBio, and Hi-C sequencing, four male individuals of each species were used. Additionally, a female individual underwent DNA sequencing on the Illumina platform to identify the sex chromosome. DNA and RNA were extracted from samples using the Qiagen DNeasy Blood & Tissue Kit (Qiagen) and TRIzol Reagent Kit (Invitrogen), respectively15.
For PacBio sequencing, libraries with insert sizes of 20 kb (H. anomala) and 30 kb (E. splendida) were constructed using the SMRTbell Template Prep Kit 1.0-SPv3, tailored to the quality of extracted DNA. Long-read sequencing was performed using the PacBio Sequel II platform with the CLR strategy. PCR-free sequencing libraries with a 350 bp insert size were generated for short-read genome sequencing. The Hi-C library was created using the MboI restriction endonuclease16. Both library types were subsequently sequenced on the Illumina NovaSeq 6000 and BGISEQ-500 platforms.
In total, approximately 285.76 and 352.18 Gb of raw data were generated for H. anomala and E. splendida, respectively. For H. anomala, the raw data included 117.23 Gb (approximately 176×) of PacBio reads with a read N50 of 19.78 kb, 86.45 Gb of Illumina reads (comprising 28.87 and 57.58 Gb from the female and male samples, respectively), 74.62 Gb of Hi-C data, and 6.11 Gb of transcriptome data. For E. splendida, the raw data consisted of 117.9 Gb (approximately 136×) of PacBio reads with a read N50 of 29.33 kb, 131.42 Gb of Illumina reads (comprising 43.73 and 87.69 Gb from the female and male samples, respectively), 91.40 Gb of Hi-C data, and 6.16 Gb of transcriptome data.
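The fold-coverage figures quoted above follow from dividing sequencing yield by genome size. The sketch below is illustrative only and is not part of the original workflow; the function name is hypothetical, and using the final assembly size as the denominator is an assumption (the estimated genome size or the draft assembly size would give slightly different values).

```python
def sequencing_depth(yield_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage for a sequencing yield (Gb) and a genome size (Mb)."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    # Convert Gb to Mb so that both quantities share the same unit.
    return yield_gb * 1000.0 / genome_size_mb


# PacBio yields and final assembly sizes as reported in the text.
# Assumption: the final assembly size is used as the denominator.
h_anomala_depth = sequencing_depth(117.23, 663.43)
e_splendida_depth = sequencing_depth(117.9, 859.28)

print(f"H. anomala PacBio depth:   {h_anomala_depth:.1f}x")
print(f"E. splendida PacBio depth: {e_splendida_depth:.1f}x")
```

Because the result depends on the chosen denominator, small differences from the rounded values in the text are expected.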
Genome size estimation and assembly
The acquired DNA sequencing reads underwent rigorous quality control using BBmap v38.6717. This process included the removal of duplicate reads and the filtering of low-quality reads, which were defined by a quality score < 20, a length < 15, or a homopolymer run of A/G/C > 10. For k-mer analysis, khist.sh was used with the parameter k = 21. Genome size was estimated using the R package GenomeScope v2.0.118 to calculate the k-mer distribution and generate a histogram, with a maximum sequencing coverage of 10,000. The estimated genome sizes were approximately 608.17 and 786.73 Mb for H. anomala and E. splendida, respectively, with the H. anomala genome exhibiting higher heterozygosity (1.03%; Fig. S1) than that of E. splendida (0.79%; Fig. S2).
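The idea behind k-mer-based genome size estimation can be illustrated with a simplified peak-based estimator. This is a sketch under stated assumptions, not the mixture-model fit that GenomeScope performs: the function name, the `min_depth` error cutoff, and the assumption that the tallest peak is the homozygous peak are all illustrative choices.

```python
from typing import Dict


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate haploid genome size (in bp) from a k-mer histogram.

    hist maps k-mer depth -> number of distinct k-mers observed at that depth.
    K-mers below min_depth are treated as sequencing errors and ignored
    (hypothetical cutoff; a real analysis would choose it from the histogram).
    """
    solid = {depth: count for depth, count in hist.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    # Assumption: this is the homozygous peak, not the heterozygous one.
    peak_depth = max(solid, key=solid.get)
    # Total number of k-mer occurrences in the reads, excluding likely errors.
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical values, not data from this study).
toy_hist = {1: 10000, 39: 100, 40: 300, 41: 100}
print(estimate_genome_size(toy_hist))
```

For highly heterozygous genomes the tallest peak may be the heterozygous one, which is one reason model-based tools are preferred in practice.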
Flye v2.8.319 was used for PacBio long-read assembly, with one round of self-polishing based on long reads. This resulted in 774.15 and 870.01 Mb assemblies for H. anomala and E. splendida, respectively. Illumina short-read mapping was performed using Minimap2 v2.1720, and the assembled genome underwent two rounds of polishing with NextPolish v1.1.021. Redundant sequences were removed using Purge_Dups v1.2.522 with the haploid cutoff set at 60 (-s 60) based on the aforementioned short-read mapping. Before chromosome anchoring, Hi-C read alignment and quality control were conducted using Juicer v1.6.223 with its default parameters. Subsequently, 3D-DNA v18092224 was employed to automatically anchor the majority of contigs into pseudochromosomes. Mis-joins were corrected using Juicebox v1.11.0823 through manual inspection and refinement. In total, 97.68% and 99.58% of assembly contigs were anchored into 24 and 29 pseudochromosomes, with lengths of 11.53–39.79 Mb for H. anomala and 9.92–51.78 Mb for E. splendida (Fig. 1).
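The contig and scaffold N50 values used to summarize these assemblies follow a standard definition: the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy input are illustrative and not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total length defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half")


# Toy example with hypothetical sequence lengths.
print(n50([2, 3, 4, 5, 6]))
```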
Fig. 1.
Genome-wide chromosomal interactive heatmap. Each chromosome and contig is framed in blue and green, respectively. (a) Himalopsyche anomala. (b) Eubasilissa splendida.
A thorough examination for potential contaminants was conducted using MMseqs2 v1125 with the parameter “--min-seq-id 0.8” against the National Center for Biotechnology Information (NCBI) nt and UniVec databases. Sequences with > 90% alignment were removed. The final assembly lengths were 663.43 Mb (H. anomala) and 859.28 Mb (E. splendida) (Table 1). To identify sex chromosomes, Illumina reads of the female individual were mapped against the assembly, and the sequencing depth for each chromosome was calculated. Because Trichoptera has a Z0 sex determination system in which females carry a single Z chromosome26, chromosomes with approximately half the sequencing depth of the others were identified as sex chromosomes (Tables S1, S2). The GC content of the H. anomala and E. splendida assemblies was 31.55% and 32.76%, respectively. Notably, the estimated genome size closely matched the assembly size, with the genome assembly size of H. anomala resembling that of other Himalopsyche species27,28, whereas the genome size of E. splendida exceeded that of Eubasilissa regina (440.07 Mb)29. Genome completeness was assessed at each stage of the assembly using Benchmarking Universal Single-Copy Orthologs (BUSCO) v3.0.230 with the parameter “-m genome”. The completeness was 98.1% and 98.2% for H. anomala and E. splendida, respectively, indicating high-quality assembled genomes (Table 2).
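The depth-based screen for sex chromosomes described above can be sketched as a comparison of each chromosome's mean depth against the genome-wide median. The function name, the 0.4 to 0.6 ratio window, and the toy depths are hypothetical choices for illustration; the study does not state the thresholds it used.

```python
from statistics import median
from typing import Dict, List


def flag_half_depth(depth_by_chrom: Dict[str, float],
                    low: float = 0.4, high: float = 0.6) -> List[str]:
    """Return chromosomes whose depth is roughly half the median depth.

    depth_by_chrom maps chromosome name -> mean read depth from the
    female sample. The [low, high] ratio window is an assumed tolerance.
    """
    if not depth_by_chrom:
        raise ValueError("no chromosomes supplied")
    typical = median(depth_by_chrom.values())
    return [
        chrom
        for chrom, depth in depth_by_chrom.items()
        if low <= depth / typical <= high
    ]


# Toy depths (hypothetical values, not data from Tables S1 or S2).
toy_depths = {"chr1": 40.0, "chr2": 42.0, "chr3": 39.0, "chr4": 20.5}
print(flag_half_depth(toy_depths))
```

Using the median as the reference keeps the single half-depth chromosome from distorting the autosomal baseline.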
Table 1.
Genome assembly statistics for Himalopsyche anomala and Eubasilissa splendida.
| Assembly | Himalopsyche anomala | Eubasilissa splendida |
|---|---|---|
| Total contig/scaffold number | 3,084/402 | 935/321 |
| Total length (Mb) | 663.433 | 859.28 |
| Contig/scaffold N50 (Mb) | 0.48/23.32 | 4.68/22.87 |
| Max scaffold length (Mb) | 39.791 | 51.78 |
| Max contig length (Mb) | 6.762 | 17.607 |
| Gap (%) | 0.04 | 0.01 |
| GC Content (%) | 31.55 | 32.76 |
Table 2.
Statistical result of BUSCO for Himalopsyche anomala and Eubasilissa splendida.
| Summary | Himalopsyche anomala | Eubasilissa splendida |
|---|---|---|
| Complete BUSCOs | 1,342 (98.1%) | 1,343 (98.2%) |
| Complete and single-copy BUSCOs | 1,328 (97.1%) | 1,336 (97.7%) |
| Complete and duplicated BUSCOs | 14 (1.0%) | 7 (0.5%) |
| Fragmented BUSCOs | 6 (0.4%) | 5 (0.4%) |
| Missing BUSCOs | 19 (1.5%) | 19 (1.4%) |
Repetitive sequence and noncoding RNAs annotation
RepeatModeler v2.0.231 and the LTR discovery pipeline (-LTRstruct) of GenomeTools32 were used to build a de novo repetitive element database. Subsequently, we merged this database with the known repeat element databases (Repbase-2018102633 and Dfam 3.134). RepeatMasker v4.0.735 was used to annotate the repeat elements of the two assemblies based on the custom database, identifying 288.10 Mb (approximately 43.43%) and 471.23 Mb (approximately 54.84%) of repetitive sequences for H. anomala and E. splendida, respectively. Among these elements, the largest proportion comprised unclassified elements, accounting for 21.43% and 28.44% of the total genomes of the respective species. Details regarding other common repetitive elements are provided in Tables S3, S4. To annotate the non-coding RNAs, we employed Infernal v1.1.436 and tRNAscan-SE v2.0.937, and low-confidence tRNAs were filtered out using “EukHighConfidenceFilter”. A total of 717 and 766 ncRNAs were annotated in the H. anomala and E. splendida genomes, respectively, with tRNAs constituting more than 50% (384 and 420) of these ncRNAs. Details regarding other noncoding RNAs are provided in Tables S5, S6.
Genome annotation
We integrated a multifaceted approach encompassing ab initio predictions, homologous proteins, and transcriptomic strategies to predict gene structures in the H. anomala and E. splendida genomes. Initially, we used BRAKER v2.1.638, which integrated results from Augustus v3.3.339 and GeneMark v4.3240. In this process, we utilized the arthropod reference proteins from OrthoDB10 v1041 to perform ab initio predictions. Additionally, we downloaded the protein sequences of model organisms and closely related species (Table 3), including Drosophila melanogaster Meigen, Bombyx mori (Linnaeus), and Spodoptera litura (Fabricius). These sequences were used for homologous gene prediction, employing GeMoMa v1.7.142 with the parameters “GeMoMa.c=0.5 GeMoMa.p=10”. Transcriptome sequencing reads underwent the same quality control methods used for DNA sequencing. Subsequently, HISAT2 v2.2.043 and samtools were employed to produce BAM alignments against the reference assembly, and StringTie v2.1.644 was used to perform transcriptome assembly. Finally, we used MAKER v3.01.0345 to integrate the results of the three strategies. A total of 11,469 and 10,554 PCGs were predicted in the H. anomala and E. splendida genomes, respectively (Table 4). The average number of exons and introns per gene was similar in H. anomala (9.4 exons and 8.2 introns) and E. splendida (7.1 exons and 8.3 introns). Variations in gene density were observed across different chromosomes, with the highest gene density on chromosome 21 and chromosome 23 in the H. anomala and E. splendida genomes, respectively (Fig. 2a,b). BUSCO assessment of the predicted protein sequences in protein mode yielded a completeness of 98.4% for both genomes, attesting to the high-quality annotation of the genomes.
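Summary statistics such as exons per gene model and mean exon length can be derived directly from a GFF3 annotation file. The sketch below is one possible way to compute them and is not the procedure used in the study; the function name is hypothetical, it counts exons per parent transcript rather than per gene, and it assumes a GFF3 file in which exon features carry a `Parent` attribute.

```python
from collections import defaultdict
from typing import Tuple


def exon_stats(gff3_text: str) -> Tuple[float, float]:
    """Return (mean exons per transcript, mean exon length in bp)."""
    exons_per_parent = defaultdict(int)
    total_exon_length = 0
    exon_count = 0
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        # GFF3 coordinates are 1-based and inclusive.
        total_exon_length += end - start + 1
        exon_count += 1
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon shared by several transcripts lists them comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_per_parent[parent] += 1
    if exon_count == 0 or not exons_per_parent:
        raise ValueError("no exon features with a Parent attribute found")
    mean_exons = sum(exons_per_parent.values()) / len(exons_per_parent)
    return mean_exons, total_exon_length / exon_count
```

A transcript with n exons has n - 1 introns, so the mean intron count can be obtained from the same per-transcript tallies.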
Table 3.
Species taxonomic information and accession code of all samples used in this study.
| Species | Class | Order | Source |
|---|---|---|---|
| Tribolium castaneum | Insecta | Coleoptera | NCBI (GCF_000002335.3) |
| Drosophila melanogaster | Insecta | Diptera | NCBI (GCF_000001215.4) |
| Bombyx mori | Insecta | Lepidoptera | NCBI (GCF_014905235.1) |
| Helicoverpa armigera | Insecta | Lepidoptera | NCBI (GCF_023701775.1) |
| Spodoptera litura | Insecta | Lepidoptera | NCBI (GCF_002706865.1) |
| Cheumatopsyche charites | Insecta | Trichoptera | 10.6084/m9.figshare.19673562.v1 |
Table 4.
Structural annotation information of protein-encoding genes of Himalopsyche anomala and Eubasilissa splendida.
| Structural annotation | Himalopsyche anomala | Eubasilissa splendida |
|---|---|---|
| Number of protein-coding genes | 11,469 | 10,554 |
| Number of predicted protein sequences | 13,652 | 12,736 |
| Mean protein length (aa) | 576.5 | 576.4 |
| Mean gene length (bp) | 12,237.20 | 14,481.30 |
| Gene ratio | 21.15% | 17.79% |
| Number of exons per gene | 9.4 | 7.1 |
| Mean exon length (bp) | 347.3 | 330.8 |
| Exon ratio | 5.70% | 3.88% |
| Number of CDSs per gene | 9.2 | 9.3 |
| Mean CDS length (bp) | 223 | 223.9 |
| CDS ratio | 3.56% | 2.57% |
| Number of introns per gene | 8.2 | 8.3 |
| Mean intron length (bp) | 1,084.10 | 1,358.4 |
| Intron ratio | 15.45% | 13.91% |
Fig. 2.
Characterization of the assembled Himalopsyche anomala and Eubasilissa splendida genome, phylogenetic relationship, and gene family evolution. (a) Himalopsyche anomala. (b) Eubasilissa splendida. From the inner to outer layers: gene density, GC content (GC), DNA transposons (DNA), long-interspersed elements (LINE), long-terminal repeat elements (LTR), short-interspersed elements (SINE), chromosome length (Chr).
To functionally annotate the PCGs, DIAMOND v2.0.11.14946 was applied to search against the UniProtKB database47 using a sensitive strategy. Furthermore, eggNOG-mapper v2.0.148 was used to annotate protein domains based on eggNOG v5.049. Concurrently, InterProScan 5.53-87.050 was employed to identify domains using the Pfam51, SMART52, Superfamily53, Gene3D54, and CDD55 databases. Integration of the predicted results led to the functional annotation of 10,715 (93.42%) and 9,947 (94.24%) PCGs for H. anomala and E. splendida, respectively (Table S7).
Data Records
The newly assembled genomes are available at the NCBI under the BioProject IDs: PRJNA749930 (H. anomala) and PRJNA749861 (E. splendida). Raw Illumina, PacBio, Hi-C, and transcriptome data for both species have been deposited in the Sequence Read Archive under identification numbers SRP351561 (H. anomala)56 and SRP351440 (E. splendida)57. The chromosomal assemblies of H. anomala and E. splendida have been deposited in the NCBI assembly with the accession numbers JAHZMQ00000000058 and JAHZML00000000059, respectively. Results of annotation for repetitive elements and gene prediction for both species are available in the figshare database60.
Technical Validation
We evaluated the quality of the H. anomala and E. splendida genome assemblies, focusing on completeness and accuracy. The completeness of each assembly was evaluated using BUSCO with the insects_odb10 database, yielding final assemblies with BUSCO completeness of 98.1% and 98.2% for H. anomala and E. splendida, respectively, affirming the high quality of these genomes. To verify the accuracy of the assemblies, we calculated mapping rates by aligning PacBio and Illumina reads to the final assembly: for H. anomala, 96.21%, 96.99%, and 96.41% of reads were successfully mapped, respectively; for E. splendida, higher mapping rates of 96.99%, 97.11%, and 96.42% were obtained, respectively. The Hi-C assembly underwent manual correction to ensure accuracy, and the Hi-C heatmap showed a well-organized interaction pattern at the chromosomal level (Fig. 1). Additionally, the final annotated gene BUSCO completeness was 98.4% for both H. anomala and E. splendida. Collectively, these results confirm the high quality and accuracy of the new chromosome-level assemblies.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (32271631; 32370489; 32311520285).
Author contributions
X.G., C.S. and B.W. conceived and designed the experiments. X.G. and J.D. collected the samples. X.G., L.P. and Z.D. analyzed the data and results. X.G. wrote the manuscript. X.G., C.S. and B.W. revised the manuscript. All authors read and approved the final manuscript.
Code availability
No specific code was used in this study. All analytical processes were executed according to the manuals and protocols of the corresponding bioinformatic tools.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Changhai Sun, Email: chsun@njau.edu.cn.
Beixin Wang, Email: wangbeixin@njau.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-024-03097-3.
References
- 1.Dijkstra KD, Monaghan MT, Pauls SU. Freshwater biodiversity and aquatic insect diversification. Annu. Rev. Entomol. 2014;59:143–163. doi: 10.1146/annurev-ento-011613-161958.
- 2.Morse, J. C. Trichoptera World Checklist. http://entweb.clemson.edu/database/trichopt/index.htm. (2023).
- 3.Wiggins, G. B. Caddisflies: the underwater architects. University of Toronto Press. (2004).
- 4.Hamid SA, Che S. Application of aquatic insects (Ephemeroptera, Plecoptera and Trichoptera) in water quality assessment of malaysian headwater. Trop. Life Sci. Res. 2017;28:143–162. doi: 10.21315/tlsr2017.28.2.11.
- 5.Morse JC, et al. Freshwater biomonitoring with macroinvertebrates in East Asia. Front Ecol Environ. 2007;5:33–42. doi: 10.1890/1540-9295(2007)5[33:FBWMIE]2.0.CO;2.
- 6.Morse, J. C, Frandsen, P. B, Graf, W. & Thomas, J. A. Diversity and ecosystem services of Trichoptera. Diversity and ecosystem Services of Aquatic Insects (ed. by Morse, J. C. & Adler, P. H.). Insects. 10, 125 (2019).
- 7.Thomas JA, Frandsen PB, Prendini E, Zhou X, Holzenthal RW. A multigene phylogeny and timeline for Trichoptera (Insecta). Syst. Entomol. 2020;45:670–686. doi: 10.1111/syen.12422.
- 8.Ge X, et al. Massive gene rearrangements of mitochondrial genomes and implications for the phylogeny of Trichoptera (Insecta). Syst. Entomol. 2023;48:278–295. doi: 10.1111/syen.12575.
- 9.Malm T, Johanson KA, Wahlberg N. The evolutionary history of Trichoptera (Insecta): A case of successful adaptation to life in freshwater. Syst. Entomol. 2013;38:459–473. doi: 10.1111/syen.12016.
- 10.Wiggins, G. B. The caddisfly family Phryganeidae (Trichoptera). University of Toronto Press. (1996).
- 11.de Moor FC, Ivanov VD. Global diversity of caddisflies (Trichoptera: Insecta) in freshwater. Hydrobiologia. 2008;595:393–407. doi: 10.1007/s10750-007-9113-2.
- 12.Hjalmarsson AE, et al. Molecular phylogeny of Himalopsyche (Trichoptera, Rhyacophilidae). Syst. Entomol. 2019;44:973–984. doi: 10.1111/syen.12367.
- 13.Jannot JE, Bruneau E, Wissinger SA. Effects of larval energetic resources on life history and adult allocation patterns in a caddisfly (Trichoptera: Phryganeidae). Ecol Entomol. 2007;32:376–383. doi: 10.1111/j.1365-2311.2007.00876.x.
- 14.Luo S, Tang M, Frandsen PB, Stewart RJ, Zhou X. The genome of an underwater architect, the caddisfly Stenopsyche tienmushanensis Hwang (Insecta: Trichoptera). GigaScience. 2018;7:giy143. doi: 10.1093/gigascience/giy143.
- 15.Ge X, et al. The First Chromosome-level Genome Assembly of Cheumatopsyche charites Malicky and Chantaramongkol, 1997 (Trichoptera: Hydropsychidae) Reveals How It Responds to Pollution. Genome. Biol. Evol. 2022;14(10):evac136. doi: 10.1093/gbe/evac136.
- 16.Liu Y, et al. Apolygus lucorum genome provides insights into omnivorousness and mesophyll feeding. Mol. Ecol. Resour. 2020;21:287–300. doi: 10.1111/1755-0998.13253.
- 17.Bushnell, B. BBtools. Available online: https://sourceforge.net/projects/bbmap/. (accessed on 1 October 2022) (2014).
- 18.Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. doi: 10.1038/s41467-020-14998-3.
- 19.Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
- 20.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
- 21.Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–2255. doi: 10.1093/bioinformatics/btz891.
- 22.Guan D, et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36:2896–2898. doi: 10.1093/bioinformatics/btaa025.
- 23.Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
- 24.Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
- 25.Steinegger M, Soding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017;35:1026–1028. doi: 10.1038/nbt.3988.
- 26.Yoshido A, et al. Step-by-step evolution of neo-sex chromosomes in geographical populations of wild silkmoths, Samia cynthia ssp. Heredity. 2011;106:614–624. doi: 10.1038/hdy.2010.94.
- 27.Deng XL, et al. The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol. Evol. 2022;12:e9583. doi: 10.1002/ece3.9583.
- 28.Heckenhauer J, et al. Genome size evolution in the diverse insect order Trichoptera. GigaScience. 2022;11:giac011. doi: 10.1093/gigascience/giac011.
- 29.Heckenhauer J, et al. Characterization of the primary structure of the major silk gene, h-fibroin, across caddisfly (Trichoptera) suborders. iScience. 2023;26:107253. doi: 10.1016/j.isci.2023.107253.
- 30.Waterhouse RM, et al. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 2018;35:543–548. doi: 10.1093/molbev/msx319.
- 31.Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
- 32.Gremme, G. The GENOMETOOLS genome analysis system. http://genometools.org. (2023).
- 33.Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9.
- 34.Hubley R, et al. The Dfam database of repetitive DNA families. Nucleic. Acids. Res. 2016;44:D81–D89. doi: 10.1093/nar/gkv1272.
- 35.Smit, A. F. A, Hubley, R. & Green, P. RepeatMasker Open-4.0. Available online: http://www.repeatmasker.org (2013–2015) (accessed on 1 October 2022).
- 36.Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509.
- 37.Chan PP, Lowe TM. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods. Mol. Biol. 2019;1962:1–14. doi: 10.1007/978-1-4939-9173-0_1.
- 38.Bruna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP and AUGUSTUS supported by a protein database. Nar. Genom. Bioinform. 2021;3:lqaa108. doi: 10.1093/nargab/lqaa108.
- 39.Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic. Acids. Res. 2004;32:W309–W312. doi: 10.1093/nar/gkh379.
- 40.Bruna T, Lomsadze A, Borodovsky M. GeneMark-EP: Eukaryotic gene prediction with self-training in the space of genes and proteins. Nar Genom. Bioinform. 2020;2:lqaa026. doi: 10.1093/nargab/lqaa026.
- 41.Kriventseva EV, et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic. Acids. Res. 2019;47:D807–D811. doi: 10.1093/nar/gky1053.
- 42.Keilwagen J, Hartung F, Grau J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods. Mol. Biol. 2019;1962:161–177. doi: 10.1007/978-1-4939-9173-0_9.
- 43.Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317.
- 44.Kovaka S, et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome. Biol. 2019;20:278. doi: 10.1186/s13059-019-1910-1.
- 45.Holt C, Yandell M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC. Bioinform. 2011;12:491. doi: 10.1186/1471-2105-12-491.
- 46.Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.
- 47.Morgat A, et al. Enzyme annotation in UniProtKB using Rhea. Bioinformatics. 2020;36:1896–1901. doi: 10.1093/bioinformatics/btz817.
- 48.Huerta-Cepas J, et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol. Biol. Evol. 2017;34:2115–2122. doi: 10.1093/molbev/msx148.
- 49.Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
- 50.Finn RD, et al. InterPro in 2017—Beyond protein family and domain annotations. Nucleic Acids Res. 2017;45:D190–D199. doi: 10.1093/nar/gkw1107.
- 51.El-Gebali S, et al. The Pfam protein families database in 2019. Nucleic. Acids. Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
- 52.Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic. Acids. Res. 2018;46:D493–D496. doi: 10.1093/nar/gkx922.
- 53.Wilson D, et al. SUPERFAMILY—Sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic. Acids. Res. 2009;37:D380–D386. doi: 10.1093/nar/gkn762.
- 54.Lewis TE, et al. Gene3D: Extensive Prediction of Globular Domains in Proteins. Nucleic. Acids. Res. 2018;46:D1282. doi: 10.1093/nar/gkx1187.
- 55.Marchler-Bauer A, et al. CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic. Acids. Res. 2017;45:D200–D203. doi: 10.1093/nar/gkw1129.
- 56.2023. NCBI Sequence Read Archive. SRP351561
- 57.2023. NCBI Sequence Read Archive. SRP351440
- 58.Ge XY, Peng L, Sun CH, Wang BX. 2023. Genbank. GCA_031772345.1
- 59.Ge XY, Peng L, Sun CH, Wang BX. 2023. Genbank. GCA_031772225.1
- 60.Ge XY. 2023. Chromosome-scale genomes of two caddisflies (Himalopsyche anomala and Eubasilissa splendida). figshare.


