G3: Genes | Genomes | Genetics. 2025 Oct 1;15(12):jkaf231. doi: 10.1093/g3journal/jkaf231

Chromosome-level assembly of the club-legged grasshopper (Gomphocerus sibiricus) genome

Octavio M Palacios-Gimenez 1,2,3, Mahendra Varma 4,5, Xinyi Cheng 6,7, Mai-Britt Mosbech 8, Alexander Suh 9,10, Holger Schielzeth 11,12
Editor: K Vogel
PMCID: PMC12693501  PMID: 41029998

Abstract

Grasshoppers represent true outliers in genome size, both within insects and within animals in general. Their genomes are large, generally variable in size, and feature a high abundance of repetitive DNA sequences. This has hampered the assembly of grasshopper genomes to the chromosome level. Here we present a chromosome-level reference genome for the club-legged grasshopper (Gomphocerus sibiricus, Acrididae: Gomphocerinae) using PacBio HiFi long-read and Hi-C sequencing technologies. In male haploid cells, the species has a chromosome set of n = 9 with an X0 sex-determination system, characterized by the absence of a Y chromosome. Our assembly spans 9.57 Gb in total, with 8.87 Gb organized into 9 chromosomes (8 autosomes and the X chromosome). The final assembly has a scaffold N50 value of 1.58 Gb, covers 96.7% of single-copy Insecta orthologs, and contains 43,655 predicted protein-coding genes and 43,885 mRNA transcripts. We compiled a curated, nonredundant, species-specific repeat library and used it to annotate repetitive DNA, which covers 81.69% of the genome and consists mostly of DNA transposons, long-interspersed nuclear elements, and long-terminal repeat retrotransposons. The genome of the club-legged grasshopper shows a high degree of synteny with the locusts Schistocerca gregaria and Locusta migratoria, and the analysis strongly indicates 3 autosome–autosome centric fusions in Gomphocerinae. The genome offers a valuable resource for grasshopper genomics and for exploring the genetic basis of a transspecies color polymorphism.

Keywords: color polymorphism, chromosome-level genome assembly, Orthoptera, Acrididae, Gomphocerinae

Introduction

Genomes of Orthoptera (grasshoppers, crickets, and bush-crickets) are interesting for at least 4 reasons. First, they represent the largest genomes of all insects (Hanrahan and Johnston 2011; Mao et al. 2020; Yuan et al. 2021; Sun et al. 2023) and are thus of interest for the study of genome size evolution. Second, they have a dynamic karyotype, with several known cases of major chromosomal rearrangement (Palacios-Gimenez et al. 2025), including the formation of newly evolved neo-sex chromosome systems (Castillo et al. 2010) and a high abundance of supernumerary B chromosomes (Palestis et al. 2004). They are thus of interest for studying genomic organization and stability. Third, Orthoptera feature one of the most phylogenetically widespread cases of a transspecies color polymorphism, with green and brown color morphs coexisting within populations in about 30% of the species (Schielzeth 2020). Genomic resources will facilitate the quest for the genomic basis of this shared polymorphism. Fourth, many species are short winged and thus sedentary but develop long wings and the ability to disperse under unfavorable conditions. They are thus model systems for the study of phenotypic plasticity and dispersal polymorphisms (Cabon et al. 2025).

Orthopteran genomes are known to be large, variable in size, and rich in repetitive DNA sequences (Palacios-Gimenez et al. 2020; Shah et al. 2020; Hawlitschek et al. 2023; Sun et al. 2023; Liu et al. 2024). This has hampered the assembly of orthopteran genomes using short-read sequencing technologies. The situation is currently changing with the availability of long-read sequencing technologies, yet fully phased genome assembly still represents a challenge for very large and repeat-rich genomes like those of orthopterans. To date, 15 chromosome-level and 8 scaffold-level reference genome assemblies have been published for Orthoptera (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=6993). Most of the assemblies come from crickets (families Gryllidae, Trigonidiidae, Myrmecophilidae) with relatively small genomes (0.6–2.3 Gb). Furthermore, the genomes of 2 swarming locusts (Locusta migratoria, 6.3 Gb, and Schistocerca gregaria, 8.7 Gb) have been sequenced along with 5 other species of Schistocerca (8.5–9.1 Gb). Outside crickets and the 2 locust genera, only 6 reference genomes have been assembled (2 Ensifera, 6.4–9.0 Gb; 3 Tetrigidae, 1.0–1.7 Gb; 4 lines of a species of Morabinae, 3.75–4.1 Gb). This means that the available genomes do not cover the phylogenetic diversity of Orthoptera. In particular, no genome from the subfamily Gomphocerinae has been sequenced and assembled, even though Gomphocerinae represent one of the most species-rich clades with among the largest insect genomes (Cigliano et al. 2025).

We present a reference chromosome-level annotated genome assembly for the club-legged grasshopper Gomphocerus sibiricus (Caelifera, Acrididae: Gomphocerinae, Fig. 1a-b). The species is green–brown polymorphic in both sexes and is unique among grasshoppers in featuring a marked sexual dimorphism in front-leg morphology (Valverde et al. 2018). Males of this species have a haploid chromosome set of n = 8A + X0 (Gosálvez and López-Fernández 1981) (Fig. 1c). Their genome includes 8 autosomes (3 large metacentric and 5 smaller acrocentric chromosomes) and follows an X0 sex-determination system. In this system, females are homogametic (XX), while males are hemizygous (X0) and lack a Y chromosome (Gosálvez and López-Fernández 1981). This haploid karyotype matches that of other species in the subfamily Gomphocerinae (Cabrero and Camacho 1986) but differs from the standard n = 11A + X0 found in most male Acrididae grasshoppers (John and Hewitt 1966; Palacios-Gimenez et al. 2025).

Fig. 1.


a) Male club-legged grasshopper G. sibiricus showing its unique swollen front-leg clubs. b) Female club-legged grasshoppers illustrating the green–brown polymorphism that is shared with many other species. c) Schematic drawing of the haploid male karyotype of club-legged grasshopper. Dots indicate positions of the centromere on each chromosome. Chromosomes are sorted by size and centromere position. The karyotype consists of 3 pairs of large metacentric autosomes (chr1 to chr3), 5 small-medium acrocentric autosomes (chr4 to chr8), and the acrocentric X chromosome. d) GenomeScope2 profile plot for k-mer 40 with associated summary statistics below the header. e) Hi-C contact map indicates 9 super scaffolds (indicated by black boxes), corresponding to 9 chromosomes of the club-legged grasshopper. Circles and dashed lines indicate the putative location of the centromeres. f) Summary and evaluation of the genome assembly, including genome assembly and scaffold statistics (depicted in a snail plot), along with a BUSCO (insecta_odb10) assessment displayed at the top right.

To provide a foundation for studying transspecies color polymorphism and genome evolution in species with large genomes, we generated a chromosome-level assembly of the club-legged grasshopper. Using PacBio HiFi long reads and high-throughput chromatin conformation capture (Hi-C) for long-range scaffolding, we generated a 9.57 Gb assembly with 9 major scaffolds, containing 43,655 predicted protein-coding genes and 81.69% repetitive DNA sequences. The assembly is highly contiguous, with a scaffold N50 of 1.58 Gb and a BUSCO completeness score of 96.7%. This high-quality genome provides an essential resource for understanding genomic architecture and evolutionary dynamics in grasshoppers.

Methods

Taxon sampling, DNA extraction, and sequencing

Males of the club-legged grasshopper (G. sibiricus) were collected in 2022 north of the Col du Galibier (Savoie, France). Entire individuals were flash frozen in liquid nitrogen and preserved at −80 °C until DNA extraction. A total of 3 males were sequenced using different platforms. (i) We sequenced the genome of 1 male using PacBio HiFi long-read sequencing (12 Sequel SMRT cells 1 M on a PacBio Sequel II system), which produced 17,674,582 HiFi raw reads (283.1 Gb, genome coverage 28.6×). Mean HiFi read length was 16,013 bp, median HiFi read quality was Q33, and the mean number of HiFi passes was 10. (ii) Two male individuals were used to generate paired-end chromatin conformation capture (Hi-C) libraries using the Omni-C kit (Cantata Bio), following the manufacturer's protocol for nonmammalian samples (version 1). The Hi-C libraries were sequenced on an Illumina NovaSeq 6000 S4 flowcell, yielding a total of 2,257.7 million read pairs (2 × 150 bp). PacBio HiFi and Hi-C library preparation and sequencing were performed at the Uppsala Genome Center (UGC), Science for Life Laboratory (SciLifeLab), Department of Immunology, Genetics and Pathology, Uppsala University, Sweden.

Genome size estimation

We used GeneScopeFK (https://github.com/thegenemyers/GENESCOPE.FK), a modified version of GenomeScope v2.0 (Ranallo-Benavidez et al. 2020), to estimate the genome size, genome coverage, and heterozygosity based on PacBio HiFi long-read sequencing with a k-mer size of 40.
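
The principle behind k-mer-based genome size estimation can be illustrated with a short sketch. This is not the mixture model fitted by GenomeScope; it only shows the basic idea that genome size is roughly the total number of k-mer occurrences divided by the depth of the main coverage peak. The function name, the error cutoff, and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Approximate genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    min_depth: depths below this are treated as sequencing errors (assumed cutoff).
    """
    # Keep only depths that are unlikely to stem from sequencing errors.
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # Take the depth with the most distinct k-mers as the main coverage peak.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram: an error tail at depths 1-2 and a peak at depth 20.
toy_histogram = {1: 900, 2: 300, 18: 40, 19: 70, 20: 100, 21: 70, 22: 40, 40: 10}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

In a heterozygous genome the tallest peak may be the heterozygous one at half depth, which is one reason model-based tools are preferred over this simple ratio.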

Genome assembly and Hi-C scaffolding

We assembled the PacBio HiFi long-read data into a primary assembly using Hifiasm v0.16.1 (Cheng et al. 2021) with default parameters and removed foreign contaminants with NCBI FCS-GX v0.5.4 (Astashyn et al. 2024). We then eliminated haplotigs and contig overlaps using Purge_Dups v1.2.6 (Guan et al. 2020). Hi-C reads were preprocessed and mapped to the decontaminated and purged primary assembly with Pairtools (Open2C et al. 2024), followed by scaffolding with YaHS v1.2.2 (Zhou et al. 2023). We generated a Hi-C contact map using JuicerTools v1.11.08 (Durand et al. 2016b) and visualized the map with Juicebox v1.11.08 (Durand et al. 2016a) to manually curate and correct the boundaries of the superscaffolds. We assessed the completeness of the chromosome-level assembly using BUSCO v5.5.0 (Manni et al. 2021) with the insecta_odb10 dataset (n = 1,367).

X chromosome identification

In the male karyotype of the club-legged grasshopper, the X chromosome is present in a single copy, while each autosome is present in 2 copies. Sequencing reads from a male are therefore expected to cover the X chromosome at half the depth of the autosomes. We thus identified the X chromosome based on read coverage in PacBio HiFi long-read sequencing. We mapped the PacBio HiFi reads onto the newly generated chromosome-level assembly using minimap2 v2.26-r1175 (Li 2018) with default parameters. After mapping, we sorted the output BAM files by coordinate with SAMtools v1.20 (Danecek et al. 2021) and calculated the coverage in 1 Mb windows for each chromosomal scaffold with mosdepth v0.3.3 (Pedersen and Quinlan 2018) with the “--mapq 60” flag.
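
The coverage-based reasoning above can be sketched in a few lines. This is a minimal illustration and not the authors' script: the function name, the 0.4 to 0.6 ratio bounds, and the toy depths are assumptions, and the input is taken to be per-window mean depths such as those reported by mosdepth.

```python
from statistics import median


def flag_hemizygous_scaffolds(window_coverage, low=0.4, high=0.6):
    """Flag scaffolds whose median window coverage is about half the typical value.

    window_coverage: dict mapping scaffold name -> list of per-window mean depths.
    low, high: assumed bounds on the coverage ratio expected for a single-copy X.
    """
    per_scaffold = {name: median(depths) for name, depths in window_coverage.items()}
    # With mostly autosomal scaffolds, the median of medians approximates autosomal depth.
    baseline = median(per_scaffold.values())
    ratios = {name: depth / baseline for name, depth in per_scaffold.items()}
    flagged = [name for name, ratio in ratios.items() if low <= ratio <= high]
    return flagged, ratios


# Hypothetical per-window depths for 4 scaffolds; the last one has halved coverage.
toy = {
    "scaffold_1": [30, 29, 31, 30],
    "scaffold_2": [28, 30, 30, 32],
    "scaffold_3": [31, 30, 29, 30],
    "scaffold_4": [15, 14, 16, 15],
}
flagged, ratios = flag_hemizygous_scaffolds(toy)
print(flagged)
```

Using medians rather than means keeps the comparison robust to a few windows with collapsed repeats or unusually high depth.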

Synteny with published grasshopper genomes

We used AnchorWave v1.2.5 (Song et al. 2022) to identify collinear regions between the genomes of the club-legged grasshopper and the desert locust S. gregaria (NCBI RefSeq assembly GCF_023897955.1), as well as between the club-legged grasshopper and the migratory locust L. migratoria (Li et al. 2024). AnchorWave performs whole-genome duplication–informed collinear anchor identification between genomes and base pair–resolved global alignment of collinear blocks using a 2-piece affine gap cost strategy. First, we took the desert locust genome sequence and gene annotation as input and extracted the full-length coding sequences (CDS) to serve as anchors using the gff2seq function in AnchorWave. Second, we used minimap2 (Li 2018), as called in the AnchorWave workflow, to lift over the start and end positions of the desert locust full-length CDS to the club-legged grasshopper genome, with the settings -x splice -t 16 -k 12 -a -p 0.4 -N 20. Third, we used the proali function (settings -R2 -Q 1 -ns) in AnchorWave to identify collinear regions between the genomes. The outputs of this function are end-to-end sequence alignments for each collinear block, which were used for plotting in R v4.3.0 (R Core Team 2022). Finally, we repeated all the aforementioned steps using the genomes of the club-legged grasshopper and the migratory locust L. migratoria in place of S. gregaria.

Repeat annotation

We used Earl Grey v5.1.0 (Baril et al. 2024) to predict repetitive DNA sequences de novo in the new chromosome-level assembly and retrieve a nonredundant species-specific repeat library. We used the repeatlib_filtering_workflow to filter the repeat library for non-TEs resembling proteins (source code: https://github.com/NBISweden/repeatlib_filtering_workflow) and parsed the filtered repeat library through TEtrimmer (Qian et al. 2025) to assist in the manual curation of the repeat library. The curated library is available on Figshare (https://doi.org/10.6084/m9.figshare.29148746) and is currently being submitted to the Dfam database (Storer et al. 2021). After curation, we merged the curated nonredundant species-specific repeat library with Arthropoda consensus sequences from Repbase (Bao et al. 2015). Finally, we used this repeat library to annotate the chromosome-level assembly with RepeatMasker v4.1.0 (Smit et al. 2019). We used the script calcDivergenceFromAlign.pl from RepeatMasker utils to calculate the divergence between repeats and their consensus sequences using the Kimura 2-parameter distance while accounting for the presence of CpG sites.
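
For readers unfamiliar with the Kimura 2-parameter distance, the sketch below computes it for a pair of aligned sequences from the proportions of transitions (P) and transversions (Q) as K = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). It is a plain illustration of the formula only: it does not reproduce calcDivergenceFromAlign.pl and, in particular, omits the CpG adjustment mentioned above. The function name and example sequences are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A substitution within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too diverged for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical copy with one transition (A to G) in 10 aligned sites.
distance = kimura_2p("AAAAAAAAAA", "GAAAAAAAAA")
print(round(distance, 4))
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.1 here) because the model accounts for multiple substitutions at the same site.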

Genome annotation

The repeat-masked genome was used for gene model annotation with BRAKER3 v3.0.8 (Gabriel et al. 2024). To support gene prediction, we used the paired-end RNA-seq reads previously generated for the club-legged grasshopper and deposited in NCBI under the BioProjects PRJNA525981 and PRJNA1241690, together with Arthropoda protein data from OrthoDB v11 (Kuznetsov et al. 2023). The published RNA-seq samples (Shah et al. 2019) in the BioProject PRJNA525981 comprised a pool of 5 individuals: 1 imago brown female, 1 imago green female, 1 imago brown male, 1 imago green male, and 1 last-instar green female (accession numbers SRX5491242 and SRX5491243). RNA-seq samples in the BioProject PRJNA1241690 included 11 adult females (accession numbers SAMN47561012-SAMN47561022) and 11 adult males (accession numbers SAMN47561023-SAMN47561033), with RNA extracted specifically from the thorax tissue.

We aligned each published paired-end RNA-seq dataset to the repeat-masked genome using HISAT2 v2.2 (Kim et al. 2015) with the “--dta” parameter and otherwise default settings. We then sorted the resulting BAM files with SAMtools v1.14 and used them alongside the Arthropoda protein database to run BRAKER3. To complement BRAKER3's predictions, we used Helixer v0.3.5 (Holst et al. 2023) with the flags “--lineage invertebrate --subsequence-length 213840 --overlap-offset 106920 --overlap-core-length 160380 --peak-threshold 0.9 --batch-size 16.” Finally, we combined the annotation files from BRAKER3 and Helixer using the agat_sp_complement_annotations.pl script from the AGAT v1.4.0 toolkit (Dainat et al. 2024).
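
The general idea of complementing one gene set with another can be sketched as follows: all gene models of a primary set are kept, and models from a secondary set are added only where they overlap no primary model. This is a simplified illustration and not AGAT's implementation. Which prediction set served as the reference is not stated above, so treating one set as primary is an assumption, as are the function name, the toy coordinates, and the decision to ignore strand.

```python
def complement_gene_sets(primary, secondary):
    """Keep all primary genes and add secondary genes that overlap none of them.

    Each gene is a tuple (scaffold, start, end) with 1-based inclusive coordinates.
    Strand is ignored in this simplified sketch.
    """
    by_scaffold = {}
    for scaffold, start, end in primary:
        by_scaffold.setdefault(scaffold, []).append((start, end))
    merged = list(primary)
    for scaffold, start, end in secondary:
        # Two closed intervals overlap if each starts before the other ends.
        overlaps = any(start <= p_end and p_start <= end
                       for p_start, p_end in by_scaffold.get(scaffold, []))
        if not overlaps:
            merged.append((scaffold, start, end))
    return sorted(merged)


# Hypothetical gene coordinates for two prediction sets.
primary = [("chr1", 100, 500), ("chr1", 2000, 2600)]
secondary = [("chr1", 450, 900), ("chr1", 1000, 1500), ("chr2", 10, 300)]
merged = complement_gene_sets(primary, secondary)
print(merged)
```

Because overlapping secondary models are dropped rather than merged, the combined set can be smaller than the sum of the two inputs.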

Results and discussion

Genome size and genome assembly

Using GenomeScope2 with a k-mer length of 40, we estimated the genome size of the club-legged grasshopper to be 9.12 Gb with 2% heterozygosity (Fig. 1d). This estimate is slightly lower than the 10.15 Gb genome size estimated from the testis using Feulgen densitometry (Gosalvez et al. 1980) and the 10.43 Gb estimated from brain cells using flow cytometry (Shah et al. 2020).

The decontaminated and purged primary PacBio HiFi assembly consisted of 3,752 contigs (contig N50 6.52 Mb) with a total assembly size of 9.57 Gb (Table 1). Scaffolding with Hi-C data resulted in a final assembly containing 9 chromosome models (scaffold N50 1.58 Gb, Fig. 1e; Table 1). The assembly size slightly exceeds the estimate from the k-mer analysis but is lower than the estimates from Feulgen densitometry and flow cytometry. This chromosome-level assembly represents the largest published chromosome-level insect genome to date. The 9 chromosome models match the species karyotype (Gosálvez and López-Fernández 1981) and comprise scaffolds of 158 to 2,237 Mb in size, covering 93% of the whole assembly (Fig. 1e; Table 1); unplaced scaffolds made up 7% of the total assembly size. BUSCO analysis of the chromosome-level assembly revealed C:96.7% [S:92.5%, D:4.2%], F:1%, M:2.3%, n:1,367 (Fig. 1f; Table 1), indicating that the assembly captured protein-coding genes well.
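
The N50 and L50 statistics used here can be computed from a list of sequence lengths as sketched below. The function name and the toy lengths are illustrative assumptions and do not correspond to the actual scaffold lengths of this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("empty length list")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # N50 is the length of the sequence that brings the running sum to
        # at least half of the total; L50 is how many sequences that took.
        if running >= half_total:
            return length, count


toy_lengths = [100, 80, 60, 40, 20]  # hypothetical lengths, arbitrary units
print(n50_l50(toy_lengths))
```

An L50 of 3, as reported in Table 1, means that the 3 longest scaffolds together contain at least half of the assembly.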

Table 1.

Summary statistics for the final assembly, gene, and repeat annotation

Assembly statistics
Assembly size (Gb) 9.57
Scaffolds 1,462
N50 scaffold (Gb) 1.58
L50 scaffold count 3
Contigs 3,752
N50 contig (Mb) 6.52
Assembly in scaffolded contigs (%) 92.7
Assembly in unscaffolded contigs (%) 7.3
BUSCO C:96.7% [S:92.5%, D:4.2%], F:1%, M:2.3%, n:1367
Annotation statistics
Protein-coding genes 43,655
mRNA 43,885
Single exon gene 9,799
Mean exons per mRNA 5.8
Mean exon length (bp) 292
Mean intron length (bp) 18,564
BUSCO C:93.9% [S:73.7%, D:20.2%], F:2.3%, M:3.8%, n:1367
Repeat statistics
TE content (%) 71.63
 SINE (%) 0.57
 LINE (%) 21.01
 LTR (%) 22.05
 DNA (%) 22.57
 RC (%) 7.95
 Unknown (%) 5.18
Satellite DNA (%) 1.41
Simple repeats (%) 0.64
Low complexity (%) 0.04
Total repeat content (%) 81.69

Comparing the metrics of the chromosome-level assembly of the club-legged grasshopper to the available Schistocerca grasshopper assemblies (size range 8.5 to 9.1 Gb, scaffold N50 range 791.2 to 992.9 Mb) on NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=7008) reveals that the club-legged grasshopper assembly is slightly larger and more contiguous than the Schistocerca assemblies. The genome also stands out in size and contiguity from all other orthopteran genome assemblies published so far.

X chromosome identification

We identified the X chromosome by analyzing genome coverage from PacBio HiFi long-read sequencing. Most chromosome models showed consistent coverage across 1 Mb windows, but the fourth-largest chromosome model (899 Mb) showed reduced coverage compared with the remaining chromosome models (Fig. 2a). This chromosome model thus represents the assembly of the X chromosome. We designated the remaining 8 chromosomes chr1 to chr8 in descending order of size, as shown earlier (Fig. 1c, Fig. 1e).

Fig. 2.


a) Genome-wide PacBio HiFi read coverage across 1 Mb windows is plotted along chromosomal positions in the club-legged grasshopper G. sibiricus genome. The dashed black line indicates the average read coverage across the autosomes. The reduced coverage on one of the scaffolds identifies the scaffold as the sex-linked X chromosome. b) Synteny of Acridid genomes inferred from collinear anchors between the desert locust S. gregaria and the club-legged grasshopper chromosome-level assemblies. Each dot is plotted based on the start coordinate of the reference genome (S. gregaria) and query genome (G. sibiricus) of each anchor. Collinear anchors on the same strand are shown in blue; anchors on the opposite strands are shown in red.

Synteny of chromosomes with other Orthoptera

Male club-legged grasshoppers have a haploid karyotype of n = 8A + X0, which includes 3 large metacentric chromosomes, characterized by a centromere located near the midpoint of the chromosome, resulting in a short and a long arm. The same karyotype is found in all European Gomphocerinae grasshoppers characterized so far (Cabrero and Camacho 1986; Palacios-Gimenez et al. 2025). In contrast, most other subfamilies of Acrididae grasshoppers have a male haploid karyotype of n = 11A + X0, in which all chromosomes are acrocentric, featuring a single arm with the centromere located near one end (Palacios-Gimenez et al. 2025).

Three autosome–autosome centric fusions (A-A centric fusions) putatively reduced the chromosome number in gomphocerine grasshoppers (including the club-legged grasshopper). The reduced karyotype is shared by half of the analyzed gomphocerine grasshopper species (including all European species), suggesting that it originated in a common ancestor (Cabrero and Camacho 1986). Using AnchorWave, we identified 14,607 collinear anchors between the desert locust S. gregaria (Acrididae: Cyrtacanthacridinae) and our novel assembly. The same set of anchors was also found between the migratory locust L. migratoria (Acrididae: Oedipodinae) and our assembly. Our analysis showed that A-A centric fusions between chr1-chr5, chr2-chr6, and chr3-chr8 chromosomes (ordered by size in the locusts) gave rise to the 3 largest autosomal metacentric chromosomes in the club-legged grasshopper (Fig. 2b, Supplementary Fig. 1). These findings support the role of A-A centric fusions in driving karyotype diversity across grasshopper genomes.
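
The logic of reading fusions from collinear anchors can be sketched as follows: a query chromosome that receives a substantial share of its anchors from 2 different reference chromosomes is a fusion candidate. This is a simplified illustration and not the analysis code used here; the function name, the 20% share threshold, and the toy anchors are assumptions.

```python
from collections import Counter, defaultdict


def candidate_fusions(anchors, min_share=0.2):
    """List query chromosomes whose anchors come from 2 or more reference chromosomes.

    anchors: iterable of (reference_chromosome, query_chromosome) pairs, one per anchor.
    min_share: assumed minimum fraction of a query chromosome's anchors that a
               reference chromosome must contribute to be counted.
    """
    counts = defaultdict(Counter)
    for ref_chrom, query_chrom in anchors:
        counts[query_chrom][ref_chrom] += 1
    fusions = {}
    for query_chrom, ref_counts in counts.items():
        total = sum(ref_counts.values())
        major = sorted(ref for ref, n in ref_counts.items() if n / total >= min_share)
        if len(major) >= 2:
            fusions[query_chrom] = major
    return fusions


# Hypothetical anchors: query_1 is covered by 2 reference chromosomes,
# query_2 by 1 reference chromosome plus a few stray anchors.
toy_anchors = (
    [("ref_A", "query_1")] * 50 + [("ref_B", "query_1")] * 40
    + [("ref_C", "query_2")] * 60 + [("ref_A", "query_2")] * 2
)
result = candidate_fusions(toy_anchors)
print(result)
```

The share threshold keeps isolated anchors, such as those caused by paralogs or misplaced hits, from being mistaken for fusions.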

Furthermore, we found that the X chromosome of the club-legged grasshopper is homologous to that of the locusts (Fig. 2b, Supplementary Fig. 1), confirming its shared ancestry with the X chromosome of other insects (Toups and Vicoso 2023; Li et al. 2024). We found many rearrangements, mostly inversions, within chromosomes, but no evidence of any cross-chromosome translocations (Fig. 2b, Supplementary Fig. 1). This seems remarkable given the divergence time of 45 My (Song et al. 2020).

Centromere identification

The Hi-C map showed increased contacts between 1 region each in the center of the 3 large chromosome models and at 1 end of the 6 other chromosome models. These contacts are indicative of the approximate location of the centromeres (Fig. 1e). For the 3 metacentric chromosomes, the putative centromere is located slightly off-center, consistent with the karyotype description. In contrast, for the 5 smaller acrocentric autosomes, the contact map suggests that the centromere is positioned at the distal end, and for the acrocentric X chromosome at the proximal end (Fig. 1e).

Repeat annotation

We identified repetitive DNA sequences de novo and constructed a curated nonredundant species-specific repeat library containing 4,408 consensus sequences. The proliferation of repetitive DNA sequences drives the large genome size of the club-legged grasshopper (Shah et al. 2020), a pattern that correlates with genome size variation across the Tree of Life (Gregory 2005; Hua-Van et al. 2011; Cabral-de-Mello and Palacios-Gimenez 2025). Repetitive DNA accounts for 81.69% of the club-legged grasshopper genome assembly, including 22.57% DNA transposons, 22.05% long-terminal repeat (LTR) retrotransposons, 21.01% long-interspersed nuclear elements (LINE), 7.95% rolling-circle (RC) elements, 0.57% short-interspersed nuclear elements (SINE), 0.25% Penelope-like elements (PLE), 1.41% satellite DNA, and only 5.18% unknown repeats (Table 1). The repetitive DNA content of the club-legged grasshopper (81.69%) is comparable to that of other previously assembled grasshopper genomes, which ranges from 60% to 80% (Palacios-Gimenez et al. 2020; Verlinden et al. 2020; Jayaprasad et al. 2024; Li et al. 2024; Li et al. 2024). The fraction of satellite DNA and RC elements is substantially lower than in previous estimates for the club-legged grasshopper (Shah et al. 2020), likely due to long-read sequencing combined with improved curation and annotation. After subtracting repetitive DNA sequences, the club-legged grasshopper's nonrepetitive genome spans 1.75 Gb, making it remarkably larger than that of most other insects (Cong et al. 2022; Cabral-de-Mello and Palacios-Gimenez 2025).
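
The size of the nonrepetitive fraction follows directly from the assembly size and the total repeat content reported in Table 1. The short calculation below reproduces that arithmetic; the variable names are illustrative.

```python
assembly_size_gb = 9.57   # total assembly size (Table 1)
repeat_fraction = 0.8169  # total repeat content, 81.69% (Table 1)

# The nonrepetitive part is what remains after removing the repeat fraction.
nonrepetitive_gb = assembly_size_gb * (1 - repeat_fraction)
print(f"{nonrepetitive_gb:.2f} Gb")  # prints "1.75 Gb"
```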

Estimates of sequence divergence within repeat classes indicate 2 periods of repetitive DNA proliferation. A recent repetitive DNA burst occurred within the 0% to 10% divergence range, involving all major repetitive DNA classes (Fig. 3a). Another burst, primarily consisting of RC elements, appeared within the 41% to 43% divergence range, suggesting that these represent older and degenerated RC copies (Fig. 3a).
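
A repeat landscape of this kind is essentially a histogram: the length of each repeat copy is added to a bin defined by its repeat class and its divergence from the consensus, and expressed as a percentage of the genome. The sketch below illustrates the binning step only. The function name, the 1% bin width, and the toy copies are assumptions and are not taken from the RepeatMasker output of this study.

```python
from collections import defaultdict


def repeat_landscape(copies, genome_size, bin_width=1.0):
    """Summarise repeat copies as percent of genome per (class, divergence bin).

    copies: iterable of (repeat_class, length_bp, divergence_percent) tuples.
    genome_size: assembly size in bp.
    bin_width: width of each divergence bin in percent (assumed value).
    """
    landscape = defaultdict(float)
    for repeat_class, length_bp, divergence in copies:
        # Assign the copy to the bin whose lower edge is at or below its divergence.
        bin_start = int(divergence // bin_width) * bin_width
        landscape[(repeat_class, bin_start)] += 100.0 * length_bp / genome_size
    return dict(landscape)


# Hypothetical copies: three young copies and one old RC copy.
toy_copies = [
    ("LINE", 2000, 3.2),
    ("LINE", 1000, 3.9),
    ("DNA", 500, 3.5),
    ("RC", 1500, 42.1),
]
landscape = repeat_landscape(toy_copies, genome_size=100_000)
print(landscape)
```

Peaks at low divergence then correspond to recently active repeats, and peaks at high divergence to older, degenerated copies.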

Fig. 3.


Assembly-based repeat landscape analysis in the club-legged grasshopper G. sibiricus genome. a) The divergence between repetitive DNA copies and their consensus sequences is shown on the X-axis as genetic distance calculated using the Kimura 2-parameter distance. The percentage of the genome assembly occupied by repetitive DNA sequences in each class is shown on the Y-axis. b) The percentage of repeat-derived base pairs shown per window of 1 Mb along each assembled chromosome, following the same color scheme as in (a).

Gene annotation

We predicted the structure of 43,655 putative protein-coding genes and 43,885 mRNAs in the club-legged grasshopper chromosome-level assembly (19,410 genes identified by BRAKER3 and 30,992 genes identified by Helixer), with an average exon size of 292 bp (median 143 bp) and an average intron size of 18,564 bp (median 6,551 bp) (Table 1). The BUSCO completeness score of the predicted transcripts is 93.9% (Table 1), indicating that the annotation captures almost the entire gene space. Overall, 6.4% of the predicted protein-coding genes are found on the unplaced scaffolds (only 2% of the BRAKER3-predicted genes). The total of 43,655 predicted protein-coding genes in the club-legged grasshopper is substantially higher than the numbers reported for the migratory locust L. migratoria (26,636) (Li et al. 2024) and the desert locust S. gregaria (18,815) (Verlinden et al. 2020). This probably resulted from the use of multiple annotation platforms to enhance gene prediction accuracy. The average intron size in the club-legged grasshopper genome is substantially larger than the average intron size of ∼1,000 bp in Drosophila melanogaster (Hoskins et al. 2015) and even larger than the average intron size of ∼7,000 bp (median ∼1,700 bp) in humans (Piovesan et al. 2019). Intron gigantism therefore contributes to the large genome size of the club-legged grasshopper.
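
Intron statistics such as those above can be derived from exon coordinates, since each intron is the gap between 2 consecutive exons of a transcript. The sketch below shows this derivation under stated assumptions: the function name and the toy transcripts are hypothetical, and coordinates are taken to be 1-based and inclusive, as in GFF files.

```python
from statistics import mean, median


def intron_lengths(exons):
    """Return intron lengths implied by the exons of one transcript.

    exons: list of (start, end) tuples, 1-based inclusive, on one scaffold.
    """
    ordered = sorted(exons)
    # An intron spans the bases strictly between 2 consecutive exons.
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Hypothetical transcripts; tx2 has a single exon and contributes no introns.
toy_transcripts = {
    "tx1": [(1, 100), (1101, 1200), (21201, 21300)],
    "tx2": [(500, 700)],
}
all_introns = [length for exons in toy_transcripts.values()
               for length in intron_lengths(exons)]
print(all_introns, mean(all_introns), median(all_introns))
```

Reporting both mean and median is informative because a few very long introns can pull the mean far above the median, as in the values reported here.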

Acknowledgments

We acknowledge the support of the National Genomics Infrastructure (NGI)/UGC and UPPMAX for providing assistance in massive parallel sequencing and computational infrastructure. Work performed at NGI/UGC has been funded by RFI/VR and Science for Life Laboratory (SciLifeLab), Sweden. We are thankful to SciLifeLab, the Knut and Alice Wallenberg Foundation, the Council for Research Infrastructures, and the Swedish Research Council for providing infrastructure for massive parallel sequencing, and to SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing assistance with computational infrastructure. We also thank Ignas Bunikis for his guidance and support during the genome assembly process.

Contributor Information

Octavio M Palacios-Gimenez, Population Ecology Group, Institute of Biodiversity, Ecology and Evolution, Friedrich Schiller University Jena, Dornburger Straße 159, Jena 07743, Thuringia, Germany; Department of Organismal Biology—Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala 75236, Uppland, Sweden; German Centre for Integrative Biodiversity Research (IDiv) Halle-Jena-Leipzig, Puschstraße 4, Leipzig 04103, Saxony, Germany.

Mahendra Varma, Population Ecology Group, Institute of Biodiversity, Ecology and Evolution, Friedrich Schiller University Jena, Dornburger Straße 159, Jena 07743, Thuringia, Germany; Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, Jena 07745, Thüringen, Germany.

Xinyi Cheng, Population Ecology Group, Institute of Biodiversity, Ecology and Evolution, Friedrich Schiller University Jena, Dornburger Straße 159, Jena 07743, Thuringia, Germany; Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, Jena 07745, Thüringen, Germany.

Mai-Britt Mosbech, Department of Immunology, Genetics and Pathology, Uppsala Genome Center, Uppsala University, National Genomics Infrastructure Hosted by SciLifeLab, Box 518, Uppsala 75108, Uppland, Sweden.

Alexander Suh, Department of Organismal Biology—Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala 75236, Uppland, Sweden; Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Zoologisches Forschungsmuseum A. Koenig, Adenauerallee 160, Bonn 53113, North Rhine-Westphalia, Germany.

Holger Schielzeth, Population Ecology Group, Institute of Biodiversity, Ecology and Evolution, Friedrich Schiller University Jena, Dornburger Straße 159, Jena 07743, Thuringia, Germany; German Centre for Integrative Biodiversity Research (IDiv) Halle-Jena-Leipzig, Puschstraße 4, Leipzig 04103, Saxony, Germany.

Data availability

The final assembly and raw data have been deposited at NCBI under the project accession number PRJNA1250961. The genome assembly is available under accession number JBPBLO000000000. PacBio HiFi long-read sequencing data can be found under accession numbers SRX29045381, SRX29045382, SRX29045386-SRX29045393, SRX29073929, and SRX29073928. Hi-C sequencing data are available under accession numbers SRX29045383-SRX29045385. The associated files for this manuscript are available on the figshare database: Supplementary Fig. 1 (https://doi.org/10.6084/m9.figshare.29476994), gene annotation models (https://doi.org/10.6084/m9.figshare.29148779), RepeatMasker annotation (https://doi.org/10.6084/m9.figshare.29148836), and curated nonredundant species-specific repeat library (https://doi.org/10.6084/m9.figshare.29148746).

Supplemental material available at G3 online.

Funding

This work was supported by the Swedish Research Council (grant number 2020-03866 to OMPG), the Swedish Research Council Formas (2017-01597 to AS), and by the German Research Foundation (DFG) as part of the SFB TRR 212 (NC³)—Project numbers 316099922 and 396776775.

Conflicts of interest. The authors declare no conflict of interest.

Literature cited

  1. Astashyn  A  et al.  2024. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 25:60. 10.1186/s13059-024-03198-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bao  W, Kojima  KK, Kohany  O. 2015. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA.  6:11. 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Baril  T, Galbraith  J, Hayward  A. 2024. Earl grey: a fully automated user-friendly transposable element annotation and analysis pipeline. Mol Biol Evol.  41:msae068. 10.1093/molbev/msae068. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Cabon  L, Varma  M, Winter  G, Ebeling  A, Schielzeth  H. 2025. Phenotypic plasticity, heritability, and genotype-by-environment interactions in an insect dispersal polymorphism. BMC Ecol Evol. (25):94. 10.1186/s12862-025-02447-y. [DOI] [PMC free article] [PubMed]
  5. Cabral-de-Mello  DC, Palacios-Gimenez  OM. 2025. Repetitive DNAs: the ‘invisible’ regulators of insect adaptation and speciation. Curr Opin Insect Sci. (67):101295. 10.1016/j.cois.2024.101295. [DOI] [PubMed] [Google Scholar]
  6. Cabrero  J, Camacho  JPM. 1986. Cytogenetic studies in gomphocerine grasshoppers. I. Comparative analysis of chromosome C-banding pattern. Heredity (Edinb).  56:365–372. 10.1038/hdy.1986.58. [DOI] [Google Scholar]
  7. Castillo  ER, Marti  DA, Bidau  CJ. 2010. Sex and neo-sex chromosomes in Orthoptera: a review. J Orthoptera Res. 19:213–231. 10.1665/034.019.0207. [DOI] [Google Scholar]
  8. Cheng  H, Concepcion  GT, Feng  X, Zhang  H, Li  H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18:170–175. 10.1038/s41592-020-01056-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Cigliano  M, Braun  H, Eades  D, Otte  D.  2025. Orthoptera Species File. Version 5.0/5.0. Orthoptera Species File Online. http://Orthoptera.SpeciesFile.org.
  10. Cong  Y, Ye  X, Mei  Y, He  K, Li  F. 2022. Transposons and non-coding regions drive the intrafamily differences of genome size in insects. iScience. 25:104873. 10.1016/j.isci.2022.104873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Dainat  J  et al.  2024. NBISweden/AGAT: AGAT-v1.4.1. Zenodo. 10.5281/zenodo.3552717. [DOI]
  12. Danecek  P  et al.  2021. Twelve years of SAMtools and BCFtools. GigaScience. 10:giab008. 10.1093/gigascience/giab008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Durand  NC  et al.  2016a. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst.  3:99–101. 10.1016/j.cels.2015.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Durand  NC  et al.  2016b. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst.  3:95–98. 10.1016/j.cels.2016.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Gabriel  L  et al.  2024. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 34:769–777. 10.1101/gr.278090.123. [DOI] [PMC free article] [PubMed]
  16. Gosálvez  J, López-Fernández  C. 1981. Extra heterochromatin in natural populations of Gomphocerus sibiricus (Orthoptera: Acrididae). Genetica. 56:197–204. 10.1007/BF00057560. [DOI] [Google Scholar]
  17. Gosalvez  J, López-Fernandez  C, Esponda  P. 1980. Variability of the DNA content in five orthopteran species. Caryologia. 33:275–281. 10.1080/00087114.1980.10796840. [DOI] [Google Scholar]
  18. Gregory  TR. 2005. Genome size evolution in animals. In: Gregory  TR, editor. The evolution of the genome. Elsevier Academic Press, Elsevier. p. 3–87. [Google Scholar]
  19. Guan  D  et al.  2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36:2896–2898. 10.1093/bioinformatics/btaa025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hanrahan  SJ, Johnston  JS. 2011. New genome size estimates of 134 species of arthropods. Chromosome Res. 19:809–823. 10.1007/s10577-011-9231-6. [DOI] [PubMed] [Google Scholar]
  21. Hawlitschek  O  et al.  2023. New estimates of genome size in Orthoptera and their evolutionary implications. PLoS One. 18:e0275551. 10.1371/journal.pone.0275551. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Holst  F  et al.  2023. Helixer: de novo prediction of primary eukaryotic gene models combining deep learning and a hidden Markov model. bioRxiv. 2023.02.06.527280. 10.1101/2023.02.06.527280. [DOI] [PMC free article] [PubMed]
  23. Hoskins  RA  et al.  2015. The release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25:445–458. 10.1101/gr.185579.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Hua-Van  A, Le Rouzic  A, Boutin  TS, Filée  J, Capy  P. 2011. The struggle for life of the genome's selfish architects. Biol Direct.  6:19. 10.1186/1745-6150-6-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Jayaprasad  S  et al.  2024. Orthopteran neo-sex chromosomes reveal dynamics of recombination suppression and evolution of supergenes. Mol Ecol. 33:e17567. 10.1111/mec.17567. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. John  B, Hewitt  GM. 1966. Karyotype stability and DNA variability in the Acrididae. Chromosoma. 20:155–172. 10.1007/BF00335205. [DOI] [Google Scholar]
  27. Kim  D, Langmead  B, Salzberg  SL. 2015. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 12:357–360. 10.1038/nmeth.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kuznetsov  D  et al.  2023. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res.  51:D445–D451. 10.1093/nar/gkac998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Li  H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34:3094–3100. 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Li  X  et al.  2024. Chromosome-level genome assembly of the morabine grasshopper Vandiemenella viatica19. Sci Data. 11:997. 10.1038/s41597-024-03858-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Li  X, Mank  JE, Ban  L. 2024. The grasshopper genome reveals long-term gene content conservation of the X chromosome and temporal variation in X chromosome evolution. Genome Res. 34:997–1007. 10.1101/gr.278794.123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Liu  X, Zhao  L, Majid  M, Huang  Y. 2024. Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation. Mob DNA.  15:5. 10.1186/s13100-024-00316-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Manni  M, Berkeley  MR, Seppey  M, Simão  FA, Zdobnov  EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38:4647–4654. 10.1093/molbev/msab199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Mao  Y  et al.  2020. Genome size of 17 species from Caelifera (Orthoptera) and determination of internal standards with very large genome size in Insecta. Front Physiol. 11:567125. 10.3389/fphys.2020.567125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Open2C  et al.  2024. Pairtools: from sequencing data to chromosome contacts. PLoS Comput Biol. 20:e1012164. 10.1371/journal.pcbi.1012164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Palacios-Gimenez  OM, Castillo  ERD, Schielzeth  H. 2025. Karyotype evolution and speciation in Orthoptera. J Evol Biol. 38:516–529. 10.1093/jeb/voaf018. [DOI] [PubMed] [Google Scholar]
  37. Palacios-Gimenez  OM  et al.  2020. Comparative analysis of morabine grasshopper genomes reveals highly abundant transposable elements and rapidly proliferating satellite DNA repeats. BMC Biol. 18:199. 10.1186/s12915-020-00925-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Palestis  BG, Trivers  R, Burt  A, Jones  RN. 2004. The distribution of B chromosomes across species. Cytogenet Genome Res. 106:151–158. 10.1159/000079281. [DOI] [PubMed] [Google Scholar]
  39. Pedersen  BS, Quinlan  AR. 2018. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 34:867–868. 10.1093/bioinformatics/btx699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Piovesan  A  et al.  2019. Human protein-coding genes and gene feature statistics in 2019. BMC Res Notes.  12:315. 10.1186/s13104-019-4343-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Qian  J  et al.  2025. TEtrimmer: a tool to automate the manual curation of transposable elements. Nat Commun. 16:8429. 10.1038/s41467-025-63889-y. [DOI] [PMC free article] [PubMed]
  42. R Core Team. 2022. R: a language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/. [Google Scholar]
  43. Ranallo-Benavidez  TR, Jaron  KS, Schatz  MC. 2020. GenomeScope 2.0 and smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 11:1432. 10.1038/s41467-020-14998-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Schielzeth  H. 2020. Phylogenetic, geographic and habitat distribution of the green-brown polymorphism in European orthopterans. bioRxiv. 2020.03.31.016915. 10.1101/2020.03.31.016915. [DOI]
  45. Shah  A, Hoffman  JI, Schielzeth  H. 2019. Transcriptome assembly for a colour-polymorphic grasshopper (Gomphocerus sibiricus) with a very large genome size. BMC Genomics. 20:370. 10.1186/s12864-019-5756-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Shah  A, Hoffman  JI, Schielzeth  H. 2020. Comparative analysis of genomic repeat content in gomphocerine grasshoppers reveals expansion of satellite DNA and helitrons in species with unusually large genomes. Genome Biol Evol. 12:1180–1193. 10.1093/gbe/evaa119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Smit  A, Hubley  R, Green  P.  2019. RepeatMasker. http://www.repeatmasker.org.
  48. Song  B  et al.  2022. AnchorWave: sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proc Natl Acad Sci U S A.  119:e2113075119. 10.1073/pnas.2113075119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Song  H  et al.  2020. Phylogenomic analysis sheds light on the evolutionary pathways towards acoustic communication in Orthoptera. Nat Commun. 11:4939. 10.1038/s41467-020-18739-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Storer  J, Hubley  R, Rosen  J, Wheeler  TJ, Smit  AF. 2021. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA.  12:2. 10.1186/s13100-020-00230-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Sun  K, Lu  Y, Huang  Y, Huang  H. 2023. Genome size evolution in grasshoppers (Orthoptera: Caelifera: Acrididae). Syst Entomol. 48:434–444. 10.1111/syen.12586. [DOI] [Google Scholar]
  52. Toups  MA, Vicoso  B. 2023. The X chromosome of insects likely predates the origin of class Insecta. Evolution. 77:2504–2511. 10.1093/evolut/qpad169. [DOI] [PubMed] [Google Scholar]
  53. Valverde  J, Eggert  H, Kurtz  J, Schielzeth  H. 2018. Condition-dependence and sexual ornamentation: effects of immune challenges on a highly sexually dimorphic grasshopper. Insect Sci. 25:617–630. 10.1111/1744-7917.12448. [DOI] [PubMed] [Google Scholar]
  54. Verlinden  H  et al.  2020. First draft genome assembly of the desert locust, Schistocerca gregaria. F1000Res. 9:775. 10.12688/f1000research.25148.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Yuan  H  et al.  2021. The evolutionary patterns of genome size in Ensifera (Insecta: Orthoptera). Front Genet. 12:693541. 10.3389/fgene.2021.693541. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Zhou  C, McCarthy  SA, Durbin  R. 2023. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 39:btac808. 10.1093/bioinformatics/btac808. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Data Availability Statement

The final assembly and raw data have been deposited at NCBI under the project accession number PRJNA1250961. The genome assembly is available under accession number JBPBLO000000000. PacBio HiFi long-read sequencing data can be found under accession numbers SRX29045381, SRX29045382, SRX29045386-SRX29045393, SRX29073928, and SRX29073929. Hi-C sequencing data are available under accession numbers SRX29045383-SRX29045385. The associated files for this manuscript are available on the figshare database: Supplementary Fig. 1 (https://doi.org/10.6084/m9.figshare.29476994), gene annotation models (https://doi.org/10.6084/m9.figshare.29148779), RepeatMasker annotation (https://doi.org/10.6084/m9.figshare.29148836), and the curated nonredundant species-specific repeat library (https://doi.org/10.6084/m9.figshare.29148746).

Supplemental material available at G3 online.


Articles from G3: Genes | Genomes | Genetics are provided here courtesy of Oxford University Press