Chromosome-level genome assembly of the predatory stink bug Arma custos

Yuqin Wang; Yunfei Luo; Yunkang Ge; Sha Liu; Wenkai Liang; Chaoyan Wu; Shujun Wei; Jiaying Zhu

doi:10.1038/s41597-024-03270-8

. 2024 Apr 23;11:417. doi: 10.1038/s41597-024-03270-8

Chromosome-level genome assembly of the predatory stink bug Arma custos

Yuqin Wang ^1,^#, Yunfei Luo ^1,^#, Yunkang Ge ^1,^#, Sha Liu ¹, Wenkai Liang ¹, Chaoyan Wu ¹, Shujun Wei ^1,², Jiaying Zhu ^1,^3,^✉

PMCID: PMC11039643 PMID: 38654007

Abstract

The stink bug Arma custos (Hemiptera: Pentatomidae) is a predatory enemy successfully used for biocontrol of lepidopteran and coleopteran pests in notorious invasive species. In this study, a high-quality chromosome-scale genome assembly of A. custos was achieved through a combination of Illumina sequencing, PacBio HiFi sequencing, and Hi-C scaffolding techniques. The final assembled genome was 969.02 Mb in size, with 935.94 Mb anchored to seven chromosomes, and a scaffold N50 length of 135.75 Mb. This genome comprised 52.78% repetitive elements. The detected complete BUSCO score was 99.34%, indicating its completeness. A total of 13,708 protein-coding genes were predicted in the genome, and 13219 of them were annotated. This genome provides an invaluable resource for further research on various aspects of predatory bugs, such as biology, genetics, and functional genomics.

Subject terms: Entomology, Genome

Background & Summary

The stink bug Arma custos (Fabricius, 1794) (Hemiptera: Pentatomidae) is synonymous with Arma chinensis (Fallou, 1881), which has been recorded in China, Mongolia and Korea, as well as central and southern Europe (except the British Islands) and the neighboring parts of the Middle East^1,2. Both nymphs and adults of this zoophytophagous bug can predate many agricultural and forestry pests belonging to the orders of Coleoptera, Lepidoptera, Hemiptera and Hymenoptera by utilizing a venomous cocktail produced by the salivary gland to capture and digest preys^3,4. It can be easily mass-reared using artificial diet in a factory and exhibits strong adaptability to diverse ecological niches, enabling its successful use as a commercialized biocontrol agent^3,5. Notably, it has shown effective management of notorious invasive pests such as the fall webworm Hyphantria cunea, the Colorado potato beetle Leptinotarsa decemlineata, and the fall armyworm Spodoptera frugiperda through the augmentative release^6–8. However, limited attention has been given to the investigation of the biological characteristics^9–11, artificial rearing methods^3,5,12, chemoecology¹³, response to temperature and drought stresses^8,14–17, and developmental regulation by miRNA¹⁸ of this predatory bug. In terms of its genetic information, only the mitochondrial genome and several transcriptomic datasets are available as the current genetic resources^{13,15,16,18,19}. Obtaining high-quality genome for providing a whole set of gene resources of A. custos will greatly facilitate a wide range of biological researches and allow further investigations, such as population genetic diversity, venomics, adaptive evolution, and comparative genomics.

In this study, we have assembled a chromosome-level genome of A. custos by combining PacBio HiFi sequencing and High-throughput chromosome conformation capture (Hi-C) technologies. The genome assembly allowed us to identify repeat sequences and protein-coding genes. Predicted genes were annotated. The generated genomic resources will facilitate to the investigation of this predatory bug.

Methods

Sample collection and rearing

The population of A. custos used in this study originated from a colony collected in the suburb of Kunming, Yunnan Province, China. These bugs have been maintained in our laboratory for more than 20 generations. They were fed with larvae of the yellow mealworm Tenebrio molitor, the greater wax moth Galleria mellonella, and the fall armyworm S. frugiperda. Cages measuring 40 cm × 40 cm × 40 cm, constructed with Nylon netting (44 × 32 mesh) on all sides, were used to rear the bugs. Each cage housed approximately 100 bugs. Soybean plants were also provided in the cage for feeding and perching. The bugs were reared at a constant temperature of 25 ± 1 °C, 70 ± 5% relative humidity, and a photoperiod of 14 L:10D.

Sequencing

Genomic DNA was extracted from one newly emerged male adult using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). Total RNA was isolated from various adult tissues including different glands of the salivary venom apparatus (anterior main gland, posterior main gland, and accessary gland), gut and residual body (adult deprived of salivary venom apparatus and gut). The integrity and contamination of the DNA and RNA were assessed on a 1% agarose gel. The purity of the DNA and RNA was measured with a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The DNA and RNA concentration of was determined using the Qubit DNA Assay Kit in Qubit 3.0 Flurometer (Invitrogen, Carlsbad, CA, USA).

For short-read genomic and transcriptome sequencing, the library with an insert size of 350 bp was constructed using the NEBNext Ultra DNA Library Prep Kit (Illumina, San Diego, CA, USA) following manufacturer’s recommendations. This library was then sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA). The genomic short-read data yielded from the Illumina NovaSeq 6000 platform amounted to 74.86 Gb with a Q20 value of 96.56% and a Q30 value of 90.84% (Table 1). A total of 72.86 Gb transcriptomic data were generated, which have Q20 values over 96.56% and Q30 values more than 90.84%.

Table 1.

Statistics of sequencing data for genome assembly and annotation.

Library type	Sequencing platform	Sample	Reads number	Raw data (Gb)	NCBI SRA accession no.
Genome	Illumina NovaSeq 6000	Male adult	249,518,674	74.86	SRR25498178
Genome	PacBio sequel II	Male adult	2,299,735	34.41	SRR25503034
Hi-C	Illumina NovaSeq 6000	Male adult	8,661,026	163.62	SRR25518321
Transcriptome	Illumina NovaSeq 6000	Anterior main gland of male	42,024,044	12.61	SRR25541878 SRR25541877
Transcriptome	Illumina NovaSeq 6000	Posterior main gland of male	45,376,382	13.61	SRR25541873 SRR25541872
Transcriptome	Illumina NovaSeq 6000	Accessary gland of male	42,227,792	12.67	SRR25541880 SRR25541879
Transcriptome	Illumina NovaSeq 6000	Duct of accessary gland of male	43,771,219	13.13	SRR25541882 SRR25541881
Transcriptome	Illumina NovaSeq 6000	Gut of male	40,425,550	12.13	SRR25541876 SRR25541875
Transcriptome	Illumina NovaSeq 6000	Residual body of male	44,540,809	13.36	SRR25541871 SRR25541870
Transcriptome	Illumina NovaSeq 6000	Anterior main gland of female	44,301,583	13.29	SRR25541868 SRR25541867
Transcriptome	Illumina NovaSeq 6000	Posterior main gland of female	61,418,417	18.52	SRR25541864 SRR25541863
Transcriptome	Illumina NovaSeq 6000	Accessary gland of female	43,782,125	13.13	SRR25541874 SRR25541869
Transcriptome	Illumina NovaSeq 6000	Duct of accessary gland of female	44,006,115	13.2	SRR25541886 SRR25541885
Transcriptome	Illumina NovaSeq 6000	Gut of female	41,769,353	12.53	SRR25541866 SRR25541865
Transcriptome	Illumina NovaSeq 6000	Residual body of female	45,693,537	13.71	SRR25541884 SRR25541883

Open in a new tab

For PacBio HiFi long-read sequencing, the SMRTbell library was prepared with the SMRTbell Express template preparation kit 2.0 (Pacific Biosciences, Menlo Park, CA) and subsequently sequenced using the Sequel II Sequencing Kit 2.0 with SMRT Cell 8 M Tray on a PacBio sequel II instrument (Pacific Biosciences, Menlo Park, CA). In total, 34.41 Gb high-quality HiFi reads (34.85 × coverage) were obtained with an average length of 14.96 kb and an N50 length of 15.18 kb (Table 1).

The Hi-C library was generated using the restriction endonuclease Mbol following the standard protocol described previously²⁰, which was sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA) using a 150-bp paired-end strategy. A total of 163.62 Gb (165.72 × coverage) of raw data was generated.

Genome survey

To ensure data quality, adapter sequences and low-quality reads were removed with fastp v0.21.0²¹. The resulting clean reads were used to generate a histogram of the 17-mer distribution with Jellyfish v2.2.7 with parameters ‘count -g generators -G 4 -s 5 G -m 17 -C -t 10’²² (Fig. 1), followed by calculation of genome heterozygosity. Based on these analyses, the estimated genome size was determined to be 987.35 Mb, with a heterozygosity of 0.80%.

Fig. 1 — The 17-mer analysis of the genome of *Arma custos*. The X-axis represents the k-mer depth. The Y-axis indicates the k-mer frequency for a given depth.

Genome assembly

The PacBio HiFi reads were utilized to assemble the genome into contigs using hifiasm v0.16.1²³. The assembled draft genome was polished by employing the genomic short-reads generated by Illumina NovaSeq 6000 sequencer with the NextPolish v1.4.0²⁴. To identify and remove potential contaminant sequences, Kraken2 was employed against a custom database²⁵. A total of 137 contigs were identified as bacteria and subsequently eliminated. The resulting draft genome was 969.02 Mb with a contig N50 of 2.11 Mb, and the GC content of 33.18% (Table 2).

Table 2.

Statistics of the Arma custos genome assembly.

Features	Statistics
Total contig length (bp)	969,016,255
Number of contigs	1,142
Contig N50 size (bp)	2,105,537
Maximum contig size (bp)	11,334,306
Number of chromosomes	7
Total length of chromosomes (bp)	935,936,572
GC content (%)	33.18

Open in a new tab

Hi-C scaffolding

The raw HiC data were processed using Hi-C-Pro v2.8.0²⁶, followed by quality control with fastp v0.21.0²¹. The resulting data were aligned to the draft genome assembly utilizing bowtie 2 v2.2.3²⁷ to obtain the uniquely mapped paired-end reads. Among the 8,661,026 reads, 4,330,513 reads were paired, with a total paired ratio of 38.70%. And a total of 1,470,719 reads were uniquely mapped to the genome, with an effect rate of 33.96%, representing valid interaction pairs. These valid interaction pairs were used to anchor the assembled contigs to near-chromosomal level using the Allhic v0.9.8²⁸. Then, juicebox v1.11.08²⁹ was employed for manual correction based on chromosome interaction strength, ultimately resulting a chromosome-level genome. After curation, a total of 935.94 Mb of contigs, accounting for 96.58% of the assembled draft genome, were anchored into seven chromosomes, ranging from 77.33 Mb to 234.11 Mb (Table 3). The number of anchored chromosomes matched the result of chromosome karyotype analysis following the previously reported method³⁰ (Fig. 2). The final genome exhibited an N50 of 135.75 Mb. A genome-wide chromatin interaction HiC heatmap was constructed using the ggplot2 software in the R package. According to the heatmap, all chromosomes were clearly distinguishable from each other (Fig. 3). The Advanced Circos tool implanted in TBtools v1.098765³¹ was used to visualize the landscape of the chromosomes (Fig. 4).

Table 3.

Summary of the assembled seven chromosomes of Arma custos.

Chr ID	Contig number	Chr length (bp)
Chr1	174	234,112,533
Chr2	138	135,751,263
Chr3	147	124,603,657
Chr4	97	115,729,252
Chr5	97	108,710,392
Chr6	168	139,701,585
Chr7	149	77,327,890

Open in a new tab

Fig. 2 — Karyotype analysis of *Arma custos* reveals a chromosome count of seven. The chromosomes from two nuclei are shown.

Fig. 3 — Heatmap of the Hi-C assembly of *Arma custos*. The interaction intensity of Hi-C links represents by colors shown in the left bar, ranging from yellow (low) to red (high).

Fig. 4 — Overview of the genome characteristics of *Arma custos* in a circos plot. (a), length of chromosomes at the Mb scale; (b), gene density per Mb; (c), CG content per Mb.

Genome annotation

A combined strategy of homology alignment and de novo search was applied to identify repetitive elements in the genome. Tandem repeats were detected using Tandem Repeats Finder (TRF) v4.09³². Repetitive elements homologous to those available in the Repbase28.06³³ were identified with RepeatMasker v4.1.0 and RepeatProteinMask v4.1.0³⁴. In addition, a de novo repetitive elements database was generated using LTR_FINDER v1.0.6³⁵, RepeatScout v1.0.5³⁶, and RepeatModeler v2.0.1³⁷. The resulting repeat sequences with lengths greater than 100 bp and gap ‘N’ less than 5%, obtained from both two strategies, were combined to construct the raw transposable element library. This library was then processed by UCLUST algorithm³⁸ to yield a non-redundant library, followed by DNA-level repeat identification using RepeatMasker v2.0.1³⁷. The results indicated that the genome contained 52.78% repetitive elements, most of which were long terminal repeat (LTR) retrotransposons, representing 40.05% of the genome (Table 4).

Table 4.

Summary of repetitive sequences identified in the genome of Arma custos.

Repeat family	De novo + Repbase		TE Proteins		Combined TEs
Repeat family	Length (bp)	% of genome	Length (bp)	% of genome	Length (bp)	% of genome
DNA transposon	35,910,229	3.71	4,460,637	0.46	38,432,048	3.97
LINE	51,833,697	5.35	50,304,093	5.19	84,751,997	8.75
SINE	727,401	0.08	0	0	727,401	0.08
LTR	386,285,131	39.86	22,523,095	2.32	388,095,502	40.05
Unknown	45,755,684	4.72	222	0	45,755,906	4.72
Total (TRF not included)	502,619,694	51.86	77,280,191	7.97	505,079,672	52.12

Open in a new tab

For non-coding RNA (ncRNAs) annotation, the transfer RNAs (tRNAs) were predicted using tRNAscan-SE v1.4³⁹. As ribosomal RNAs (rRNAs) are highly conserved, they were predicted by searching against selected rRNA sequences from closely related species as references using the BLAST v2.2.26⁴⁰. Other ncRNAs, including micro RNAs (miRNAs) and small nuclear RNAs (snRNAs), were identified by searching against the Rfam database v14.1⁴¹ using the Infernal v1.1.2⁴². Overall, 20,337 tRNAs, 1,556 rRNAs, 2,790 miRNAs and 596 snRNAs were predicted, resulting in a total of 25,279 ncRNAs (Table 5).

Table 5.

Summary of non-coding RNAs predicted in the genome of Arma custos.

Class	Type	Number	Average length (bp)	Total length (bp)	% of genome
miRNA		2,790	125.08	348,971	0.036009
tRNA		20,337	73.44	1,493,526	0.15
rRNA	rRNA	778	207.06	161,090	0.016622
	18 S	214	244.99	52,428	0.00541
	28 S	477	211.16	100,722	0.010393
	5.8 S	2	110	220	0.000023
	5 S	85	90.82	7,720	0.000797
snRNA	snRNA	298	131.52	39,193	0.004044
	CD-box	46	141.17	6,494	0.00067
	HACA-box	16	187.44	2,999	0.000309
	splicing	231	124.84	28,837	0.002976
	scaRNA	5	172.6	863	0.000089

Open in a new tab

A combined three-pronged strategy, involving de novo prediction, homology-based gene prediction, and transcriptome-based prediction, was employed to annotate the genes in the genome. De novo gene models were generated by multiple programs, namely Augustus v3.3.3⁴³, GlimmerHMM v3.0.4⁴⁴, SNAP v2013.11.29⁴⁵, Geneid v1.4⁴⁶, and Genscan v1.0⁴⁷. For homology-based prediction, protein sets from five bugs including Halyomorpha halys⁴⁸, Nesidiocoris tenui⁴⁹, Oncopeltus fasciatus⁵⁰, Rhodnius prolixus⁵¹ and Apolygus lucorum⁵² were downloaded from Insectbase 2.0⁵³ on January 3, 2022. These protein sets were aligned to the assembled genome using TblastN v2.2.26⁴⁰ with an E-value threshold of ≤1e⁻⁵. The matching proteins from these bugs were used to predict the gene structure of the assembled genome with GeneWise v2.4.1⁵⁴. For transcriptome-based prediction, raw reads from five transcriptomic libraries were subjected to quality control with fastp v0.21.0²¹. After eliminating adapter sequences and low-quality reads with Trimmomatic v1.4⁵⁵, clean data were assembled into transcripts using Trinity v2.11.0⁵⁶ and StringTie2 v2.1.6⁵⁷. The candidate coding regions in these transcripts were predicted using TransDecoder v5.5.0⁵⁶, which is implemented in the Trinity software. The resulting protein sequences were used to predict the gene structures following the procedure as described for homology-based prediction. In addition, the clean transcriptomic data were aligned to the assembled genome using HISAT2 v2.2.1⁵⁸ to identify the exons and splice sites, and these were used to extract the gene structures using PASA v2.4.1⁵⁹. A non-redundant reference gene set was generated by merging genes predicted by the three strategies with EVidenceModeler (EVM) v1.1.1⁶⁰. The gene models were further updated with PASA v2.4.1⁵⁹ to identify untranslated regions. Finally, the final comprehensive gene set was generated, resulting in a total of 13,708 protein-coding genes (Table 6). These genes had an average gene length of 25,698.42 bp. The average lengths of their coding sequence (CDS), exon, and intron length were 1,400.68 bp, 192.17 bp, and 3,863.69 bp, respectively. On average, each gene contained 7.29 exons (Table 7).

Table 6.

Summary of protein-coding genes annotated in Arma custos genome by three strategies.

	Gene set	Number	Average transcript length (bp)	Average CDS length (bp)	Average exons per gene	Average exon length (bp)	Average intron length (bp)
De novo	Augustus	19,761	14,722.95	1,082.37	5.26	205.61	3,198.78
	GlimmerHMM	58,896	10,708.91	415.56	3.21	129.28	4,648.56
	SNAP	13,204	64,327.32	600.21	9.76	61.49	7,274.35
	Geneid	17,994	24,118.89	970.43	4.28	226.78	7,059.29
	Genscan	19,234	31,631.85	1,038.01	5.3	195.91	7,117.62
Homolog	Nten	7,909	8,905.95	957.12	4.39	218.25	2,347.9
	Aluc	11,081	11,029.78	1,076.08	5.24	205.26	2,346.15
	Hhal	15,287	12,849.29	1,176.33	5.82	202.21	2,423.1
	Ofas	14,804	5,945.94	795.81	3.76	211.46	1,863.76
	Rpro	11,814	8,474.72	932.33	4.63	201.46	2,079.04
Transcriptome	PASA	20,975	25,353.77	1,278.69	6.62	193.13	4,283.15
Transcriptome	Transcripts	35,938	44,647.4	2,807.63	8.33	336.92	5,705.45
EVM		19,234	17,188.62	1,098.45	5.64	194.8	3,468.63
PASA update		19,029	20,359.35	1,128.47	5.79	195.06	4,018.72
Final set		13,708	25,698.42	1,400.68	7.29	192.17	3,863.69

Open in a new tab

Nten, Nesidiocoris tenui; Aluc, Apolygus lucorum; Hhal, Halyomorpha halys; Ofas, Oncopeltus fasciatus; Rpro, Rhodnius prolixus.

Table 7.

Comparison of protein-coding genes annotated in the genomes of Arma custos and other bugs.

Species	Number	Average transcript length (bp)	Average CDS length (bp)	Average exons per gene	Average exon length (bp)	Average intron length (bp)
Acus	13,708	25,698.42	1,400.68	7.29	192.17	3,863.69
Rpro	15,438	7,353.42	1,059.56	5.77	183.5	1,318.33
Ofas	19,587	11,934.86	899.86	5.09	176.68	2,695.92
Aluc	20,111	22,559.88	1,348.05	6.6	204.18	3,786.2
Nten	24,514	7,117.77	957.91	4.22	226.79	1,910.8
Hhal	14,454	22,935.31	1,445.21	7.39	195.44	3,360.68

Open in a new tab

Achi, Arma custos; Rpro, Rhodnius prolixus; Ofas, Oncopeltus fasciatus; Aluc, Apolygus lucorum; Nten, Nesidiocoris tenui; Hhal, Halyomorpha halys.

The annotation of the protein-coding genes was performed using BLAST v2.2.26⁴⁰ against SwissProt and National Center for Biotechnology Information (NCBI) non-redundant (Nr) database with DIAMOND v2.2.22⁶¹, parameters used ‘-ultra-sensitive -max-target-seqs. 1 -evalue 1e⁻⁵’ with a threshold of E-value ≤ 1e⁻⁵. The motifs and domains present in the predicted proteins encoding by these genes were annotated using InterProScan v86.0 with parameters ‘-disable-precalc, -goterms, -pathways’ and Pfam⁶². Additionally, these genes were classified into functional categories based on KEGG⁶³ and GO⁶⁴ with a threshold of E-value ≤ 1e⁻⁵. Overall, 13,219 predicted genes were annotated using the databases of Nr, SwissProt, InterProScan, Pfam, KEGG and GO, representing 96.43% of the total gene set (Table 8).

Table 8.

Summary of functional annotation of protein-coding genes encoded in genome of Arma custos.

Database	Number	Percent (%)
Nr	12864	93.84
Swissprot	9933	72.46
InterPro	12353	90.12
Pfam	9729	70.97
KEGG	10228	74.61
GO	7810	56.97
Annotated at least one database	13219	96.43
Unannotated	489	3.57
Total	13708

Open in a new tab

Data Records

The raw data of Illumina short reads, PacBio HiFi long reads and Hi-C reads for assembling the genome of A. custos, as well as the transcriptome Illumina sequencing data for genomic annotation, have been deposited in the NCBI SRA (Sequence Read Archive) database under BioProject number PRJNA1001510. Illumina sequencing data for genome survey can be accessed and downloaded with accession number SRR25498178⁶⁵. PacBio sequel II sequencing data for genome assembly can be accessed and downloaded with accession number SRR25503034⁶⁶. Hi-C sequencing data can be accessed and downloaded with accession number SRR25518321⁶⁷. Transcriptome sequencing data for genome annotation can be accessed and downloaded from NCBI SRA database (https://identifiers.org/ncbi/insdc.sra:SRP453032)⁶⁸. The genome sequence has been deposited in Genbank under the accession number JBBAGI000000000 (https://www.ncbi.nlm.nih.gov/nuccore/JBBAGI000000000)⁶⁹. The final chromosome assembly, genome structure annotation, amino acid sequences and CDS sequences data are available at the Figshare database (10.6084/m9.figshare.25284943)⁷⁰.

Technical Validation

The accuracy of the assembled genome was assessed using two methods. Firstly, the clean Illumina genomic short reads were aligned back to the genome by Burrows–Wheeler Aligner (BWA) v.0.7.12-r1039⁷¹. Approximately 97.81% of the short reads were successfully aligned to the genome, providing a genome coverage of 99.95%. The heterozygous and homozygous nucleotide polymorphisms (SNPs) in the genome were 0.407191% and 0.00011%, respectively. The results indicate a high accuracy of the genome assembly. Secondly, the accuracy of the assembled genome was evaluated using Merqury v1.4⁷². A quality value of 46.78 was obtained, affirming the base-level accuracy genome assembly. The completeness of the assembled genome was evaluated using three methods. Firstly, Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.4.7 (-l insecta_odb10 -m genome)⁷³ was employed. The results showed that the complete and fragment scores were 99.34% and 0.22%, respectively. Among the retrieved complete single-copy genes, only 2.3% of them are duplicated. Secondly, Core Eukaryotic Genes Mapping Approach (CEGMA, v2.5)⁷⁴ was employed. Among the 248 most highly conserved core eukaryotic genes (CEGs) within CEGMA, 230 CEGs were successfully assembled, accounting for 92.74%, and 222 CEGs were complete, accounting for 89.52%. Thirdly, LTR Assembly Index (LAI) was assessed using LTR_retriever v. 2.9.0⁷⁵, resulting in a value of 8.44. These results indicated a high level of completeness in the genome assembly.

Acknowledgements

This work was supported by the Joint Special Key Project of Agricultural Basic Research in Yunnan Province (202101BD070001-024), the National Natural Science Foundation of China (32360686); the Xing Dian Talents Support Program of Yunnan Province to Shujun Wei, the National Science and Technology Innovation Talent Program in Forestry and Grassland for Young Top-notch Talents (2019132615), and the Funding for the Construction of First-Class Discipline of Forestry in Yunnan Province.

Author contributions

J.Z. and S.W. conceived this study. Y.W., Y.L., Y.G. and C.W. collected the samples, and prepared DNA and RNA for sequencing. Y.W., Y.L. and Y.G. performed the experiments and analysed the data. Y.W. drew the figures. S.L., W.L. and C.W. assisted in data analysis. J.Z. drafted the manuscript. J.Z. and S.W. revised the manuscript. All authors read the final manuscript for submission.

Code availability

In this study, no custom scripts or command lines were utilized. All software employed for data processing and analysis are publicly available. The specific versions and parameters of each software are detailed in the Methods section. If no specific parameters were mentioned for a particular software, default parameters were used. The software was applied following the manuals and protocols provided by the respective bioinformatic tools.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Yuqin Wang, Yunfei Luo, Yunkang Ge.

References

1.Rider DA, Zheng LY. Checklist and nomenclatural notes on the Chinese Pentatomidae (Heteroptera) I, Asopinae. Entomotaxonomia. 2002;24:107–115. [Google Scholar]
2.Zhao Q, Wei J, Bu W, Liu G, Zhang H. Synonymize Arma chinensis as Arma custos based on morphological, molecular and geographical data. Zootaxa. 2018;4455:161–176. doi: 10.11646/zootaxa.4455.1.7. [DOI] [PubMed] [Google Scholar]
3.Zou DY, et al. Taxonomic and bionomic notes on Arma chinensis (Fallou) (Hemiptera: Pentatomidae: Asopinae) Zootaxa. 2012;3382:41–52. doi: 10.11646/zootaxa.3382.1.4. [DOI] [Google Scholar]
4.Pan M, Zhang H, Zhang L, Chen H. Effects of starvation and prey availability on predation and dispersal of an omnivorous predator Arma chinensis Fallou. J. Insect Behav. 2019;32:134–144. doi: 10.1007/s10905-019-09718-9. [DOI] [Google Scholar]
5.Zou DY, et al. Performance and cost comparisons for continuous rearing of Arma chinensis (Hemiptera: Pentatomidae: Asopinae) on a zoophytogenous artificial diet and a secondary prey. J. Econ. Entomol. 2015;108:454–461. doi: 10.1093/jee/tov024. [DOI] [PubMed] [Google Scholar]
6.Wang WL, et al. Preliminary observation of preyed ability of Arma chinensis (Fallou), a new natural enemy of Hyphantria cunea (Drury). Shandong For. Sci. Technol. 2012;1:11–14. [Google Scholar]
7.Tang YT, et al. Predation and behaviour of Arma chinensis to Spodoptera frugiperda. Plant Protection. 2019;45:65–68. [Google Scholar]
8.Liu J, Liao J, Li C. Bottom-up effects of drought on the growth and development of potato, Leptinotarsa decemlineata Say and Arma chinensis Fallou. Pest Manag. Sci. 2022;78:4353–4360. doi: 10.1002/ps.7054. [DOI] [PubMed] [Google Scholar]
9.Li JJ, et al. Effects of three prey species on development and fecundity of the predaceous stinkbug Arma chinensis (Hemiptera: Pentatomidae) Chin. J. Biol. Control. 2016;32:552–561. [Google Scholar]
10.Wang J, et al. Population growth performance of Arma custos (Faricius) (Hemiptera: Pentatomidae) at different temperatures. J. Insect Sci. 2022;22:12. doi: 10.1093/jisesa/ieac058. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Liu, J., Liu, X., Liao, L. & Li, C., Biological performance of Arma chinensis on three preys Antheraea pernyi, Plodia interpunctella and Leptinotarsa decemlineata, Int. J. Pest Manag. 10.1080/09670874.2023.2216173, 1-8 (2023).
12.Guo Y, Liu CX, Zhang LS, Wang MQ, Chen HY. Sterol content in the artificial diet of Mythimna separata affects the metabolomics of Arma chinensis (Fallou) as determined by proton nuclear magnetic resonance spectroscopy. Arch. Insect Biochem. Physiol. 2017;96:e21426. doi: 10.1002/arch.21426. [DOI] [PubMed] [Google Scholar]
13.Wu S, et al. Analysis of chemosensory genes in full and hungry adults of Arma chinensis (Pentatomidae) through antennal transcriptome. Front. Physiol. 2020;11:588291. doi: 10.3389/fphys.2020.588291. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Zou DY, et al. A meridic diet for continuous rearing of Arma chinensis (Hemiptera: Pentatomidae: Asopinae) Biol. Control. 2013;67:491–497. doi: 10.1016/j.biocontrol.2013.09.020. [DOI] [Google Scholar]
15.Zou DY, et al. Performance of Arma chinensis reared on an artificial diet formulated using transcriptomic methods. Bull. Entomol. Res. 2019;109:24–33. doi: 10.1017/S0007485318000111. [DOI] [PubMed] [Google Scholar]
16.Zou D, et al. Differential proteomics analysis unraveled mechanisms of Arma chinensis responding to improved artificial diet. Insects. 2022;13:605. doi: 10.3390/insects13070605. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Meng JY, Yang CL, Wang HC, Cao Y, Zhang CY. Molecular characterization of six heat shock protein 70 genes from Arma chinensis and their expression patterns in response to temperature stress. Cell Stress Chaperones. 2022;27:659–671. doi: 10.1007/s12192-022-01303-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Yin Y, Zhu Y, Mao J, Gundersen-Rindal DE, Liu C. Identification and characterization of microRNAs in the immature stage of the beneficial predatory bug Arma chinensis Fallou (Hemiptera: Pentatomidae) Arch. Insect Biochem. Physiol. 2021;107:e21796. doi: 10.1002/arch.21796. [DOI] [PubMed] [Google Scholar]
19.Zou D, et al. Nutrigenomics in Arma chinensis: transcriptome analysis of Arma chinensis fed on artificial diet and Chinese oak silk moth Antheraea pernyi pupae. PLoS One. 2013;8:e60881. doi: 10.1371/journal.pone.0060881. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Belton JM, et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012;58:268–276. doi: 10.1016/j.ymeth.2012.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Cheng H, Concepcion G, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–2255. doi: 10.1093/bioinformatics/btz891. [DOI] [PubMed] [Google Scholar]
25.Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:1–13. doi: 10.1186/s13059-019-1891-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Shi J, et al. Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat. Commun. 2019;10:464. doi: 10.1038/s41467-018-07876-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Zhang X, Zhang S, Zhao Q, Ming R, Tang H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants. 2019;5:833–845. doi: 10.1038/s41477-019-0487-8. [DOI] [PubMed] [Google Scholar]
29.Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Imai HT, Taylor RW, Crosland MW, Crozier RH. Modes of spontaneous chromosomal mutation and karyotype evolution in ants with reference to the minimum interaction hypothesis. Jpn. J. Genet. 1988;63:159–185. doi: 10.1266/jjg.63.159. [DOI] [PubMed] [Google Scholar]
31.Chen C, et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol. Plant. 2020;13:1194–1202. doi: 10.1016/j.molp.2020.06.009. [DOI] [PubMed] [Google Scholar]
32.Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8:382–392. doi: 10.1093/bib/bbm048. [DOI] [PubMed] [Google Scholar]
35.Ou S, Jiang N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA. 2019;10:48. doi: 10.1186/s13100-019-0193-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018. [DOI] [PubMed] [Google Scholar]
37.Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461. doi: 10.1093/bioinformatics/btq461. [DOI] [PubMed] [Google Scholar]
39.Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49:9077–9096. doi: 10.1093/nar/gkab688. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Griffiths-Jones S, et al. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441. doi: 10.1093/nar/gkg006. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32:W309–W312. doi: 10.1093/nar/gkh379. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315. [DOI] [PubMed] [Google Scholar]
45.Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59. doi: 10.1186/1471-2105-5-59. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr. Protoc. Bioinformatics. 2007;18:4.3.1–4.3.28. doi: 10.1002/0471250953.bi0403s18. [DOI] [PubMed] [Google Scholar]
47.Aggarwal G, Ramaswamy R. Ab initio gene identification: Prokaryote genome annotation with GeneScan and GLIMMER. J. Biosci. 2002;27:7–14. doi: 10.1007/BF02703679. [DOI] [PubMed] [Google Scholar]
48.Sparks ME, et al. Brown marmorated stink bug, Halyomorpha halys (Stål), genome: putative underpinnings of polyphagy, insecticide resistance potential and biology of a top worldwide pest. BMC Genomics. 2020;21:227. doi: 10.1186/s12864-020-6510-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Shibata T, et al. High-quality genome of the zoophytophagous stink bug, Nesidiocoris tenuis, informs their food habit adaptation. G3. 2024;14:jkad289. doi: 10.1093/g3journal/jkad289. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Panfilio KA, et al. Molecular evolutionary trends and feeding ecology diversification in the Hemiptera, anchored by the milkweed bug genome. Genome Biol. 2019;20:64. doi: 10.1186/s13059-019-1660-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Mesquita RD, et al. Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection. Proc. Natl. Acad. Sci. USA. 2015;112:14936–14941. doi: 10.1073/pnas.1506226112. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Liu Y, et al. Apolygus lucorum genome provides insights into omnivorousness and mesophyll feeding. Mol. Ecol. Resour. 2021;21:287–300. doi: 10.1111/1755-0998.13253. [DOI] [PubMed] [Google Scholar]
53.Mei Y, et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 2022;50:D1040–D1045. doi: 10.1093/nar/gkab1090. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Kovaka S, et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278. doi: 10.1186/s13059-019-1910-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Kim D, Paggi J, Park C, Bennett C, Salzberg S. Graph-based genome alignment and genotyping with HISAT2 and HISAT genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Haas BJ, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–5666. doi: 10.1093/nar/gkg770. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7. [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Buchfink B, Reuter K, Drost H. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 2021;18:366–368. doi: 10.1038/s41592-021-01101-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
62.Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031. [DOI] [PMC free article] [PubMed] [Google Scholar]
63.Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–D361. doi: 10.1093/nar/gkw1092. [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
65.2024. NCBI Sequence Read Archive. SRR25498178
66.2024. NCBI Sequence Read Archive. SRR25503034
67.2024. NCBI Sequence Read Archive. SRR25518321
68.NCBI Sequence Read Archive.https://identifiers.org/ncbi/insdc.sra:SRP453032 (2024).
69.Wang YQ, Zhu JY. 2024. Arma custos isolate FDSW210240299, whole genome shotgun sequencing project. GenBank. JBBAGI000000000
70.Wang YQ, 2024. Chromosome-level genome assembly of the predatory stink bug Arma custos (Hemiptera: Pentatomidae) Figshare. [DOI] [PMC free article] [PubMed]
71.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Rhie A, et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. doi: 10.1186/s13059-020-02134-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 2021;1:e323. doi: 10.1002/cpz1.323. [DOI] [PubMed] [Google Scholar]
74.Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071. [DOI] [PubMed] [Google Scholar]
75.Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI) Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

2024. NCBI Sequence Read Archive. SRR25498178
2024. NCBI Sequence Read Archive. SRR25503034
2024. NCBI Sequence Read Archive. SRR25518321
Wang YQ, Zhu JY. 2024. Arma custos isolate FDSW210240299, whole genome shotgun sequencing project. GenBank. JBBAGI000000000
Wang YQ, 2024. Chromosome-level genome assembly of the predatory stink bug Arma custos (Hemiptera: Pentatomidae) Figshare. [DOI] [PMC free article] [PubMed]

Data Availability Statement

[CR1] 1.Rider DA, Zheng LY. Checklist and nomenclatural notes on the Chinese Pentatomidae (Heteroptera) I, Asopinae. Entomotaxonomia. 2002;24:107–115. [Google Scholar]

[CR2] 2.Zhao Q, Wei J, Bu W, Liu G, Zhang H. Synonymize Arma chinensis as Arma custos based on morphological, molecular and geographical data. Zootaxa. 2018;4455:161–176. doi: 10.11646/zootaxa.4455.1.7. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Zou DY, et al. Taxonomic and bionomic notes on Arma chinensis (Fallou) (Hemiptera: Pentatomidae: Asopinae) Zootaxa. 2012;3382:41–52. doi: 10.11646/zootaxa.3382.1.4. [DOI] [Google Scholar]

[CR4] 4.Pan M, Zhang H, Zhang L, Chen H. Effects of starvation and prey availability on predation and dispersal of an omnivorous predator Arma chinensis Fallou. J. Insect Behav. 2019;32:134–144. doi: 10.1007/s10905-019-09718-9. [DOI] [Google Scholar]

[CR5] 5.Zou DY, et al. Performance and cost comparisons for continuous rearing of Arma chinensis (Hemiptera: Pentatomidae: Asopinae) on a zoophytogenous artificial diet and a secondary prey. J. Econ. Entomol. 2015;108:454–461. doi: 10.1093/jee/tov024. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Wang WL, et al. Preliminary observation of preyed ability of Arma chinensis (Fallou), a new natural enemy of Hyphantria cunea (Drury). Shandong For. Sci. Technol. 2012;1:11–14. [Google Scholar]

[CR7] 7.Tang YT, et al. Predation and behaviour of Arma chinensis to Spodoptera frugiperda. Plant Protection. 2019;45:65–68. [Google Scholar]

[CR8] 8.Liu J, Liao J, Li C. Bottom-up effects of drought on the growth and development of potato, Leptinotarsa decemlineata Say and Arma chinensis Fallou. Pest Manag. Sci. 2022;78:4353–4360. doi: 10.1002/ps.7054. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Li JJ, et al. Effects of three prey species on development and fecundity of the predaceous stinkbug Arma chinensis (Hemiptera: Pentatomidae) Chin. J. Biol. Control. 2016;32:552–561. [Google Scholar]

[CR10] 10.Wang J, et al. Population growth performance of Arma custos (Faricius) (Hemiptera: Pentatomidae) at different temperatures. J. Insect Sci. 2022;22:12. doi: 10.1093/jisesa/ieac058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Liu, J., Liu, X., Liao, L. & Li, C., Biological performance of Arma chinensis on three preys Antheraea pernyi, Plodia interpunctella and Leptinotarsa decemlineata, Int. J. Pest Manag. 10.1080/09670874.2023.2216173, 1-8 (2023).

[CR12] 12.Guo Y, Liu CX, Zhang LS, Wang MQ, Chen HY. Sterol content in the artificial diet of Mythimna separata affects the metabolomics of Arma chinensis (Fallou) as determined by proton nuclear magnetic resonance spectroscopy. Arch. Insect Biochem. Physiol. 2017;96:e21426. doi: 10.1002/arch.21426. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Wu S, et al. Analysis of chemosensory genes in full and hungry adults of Arma chinensis (Pentatomidae) through antennal transcriptome. Front. Physiol. 2020;11:588291. doi: 10.3389/fphys.2020.588291. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Zou DY, et al. A meridic diet for continuous rearing of Arma chinensis (Hemiptera: Pentatomidae: Asopinae) Biol. Control. 2013;67:491–497. doi: 10.1016/j.biocontrol.2013.09.020. [DOI] [Google Scholar]

[CR15] 15.Zou DY, et al. Performance of Arma chinensis reared on an artificial diet formulated using transcriptomic methods. Bull. Entomol. Res. 2019;109:24–33. doi: 10.1017/S0007485318000111. [DOI] [PubMed] [Google Scholar]

[CR16] 16.Zou D, et al. Differential proteomics analysis unraveled mechanisms of Arma chinensis responding to improved artificial diet. Insects. 2022;13:605. doi: 10.3390/insects13070605. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Meng JY, Yang CL, Wang HC, Cao Y, Zhang CY. Molecular characterization of six heat shock protein 70 genes from Arma chinensis and their expression patterns in response to temperature stress. Cell Stress Chaperones. 2022;27:659–671. doi: 10.1007/s12192-022-01303-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Yin Y, Zhu Y, Mao J, Gundersen-Rindal DE, Liu C. Identification and characterization of microRNAs in the immature stage of the beneficial predatory bug Arma chinensis Fallou (Hemiptera: Pentatomidae) Arch. Insect Biochem. Physiol. 2021;107:e21796. doi: 10.1002/arch.21796. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Zou D, et al. Nutrigenomics in Arma chinensis: transcriptome analysis of Arma chinensis fed on artificial diet and Chinese oak silk moth Antheraea pernyi pupae. PLoS One. 2013;8:e60881. doi: 10.1371/journal.pone.0060881. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Belton JM, et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012;58:268–276. doi: 10.1016/j.ymeth.2012.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Cheng H, Concepcion G, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–2255. doi: 10.1093/bioinformatics/btz891. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:1–13. doi: 10.1186/s13059-019-1891-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Servant N, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Shi J, et al. Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat. Commun. 2019;10:464. doi: 10.1038/s41467-018-07876-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Zhang X, Zhang S, Zhao Q, Ming R, Tang H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants. 2019;5:833–845. doi: 10.1038/s41477-019-0487-8. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Imai HT, Taylor RW, Crosland MW, Crozier RH. Modes of spontaneous chromosomal mutation and karyotype evolution in ants with reference to the minimum interaction hypothesis. Jpn. J. Genet. 1988;63:159–185. doi: 10.1266/jjg.63.159. [DOI] [PubMed] [Google Scholar]

[CR31] 31.Chen C, et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol. Plant. 2020;13:1194–1202. doi: 10.1016/j.molp.2020.06.009. [DOI] [PubMed] [Google Scholar]

[CR32] 32.Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8:382–392. doi: 10.1093/bib/bbm048. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Ou S, Jiang N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA. 2019;10:48. doi: 10.1186/s13100-019-0193-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461. doi: 10.1093/bioinformatics/btq461. [DOI] [PubMed] [Google Scholar]

[CR39] 39.Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49:9077–9096. doi: 10.1093/nar/gkab688. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Griffiths-Jones S, et al. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441. doi: 10.1093/nar/gkg006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 2004;32:W309–W312. doi: 10.1093/nar/gkh379. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315. [DOI] [PubMed] [Google Scholar]

[CR45] 45.Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5:59. doi: 10.1186/1471-2105-5-59. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr. Protoc. Bioinformatics. 2007;18:4.3.1–4.3.28. doi: 10.1002/0471250953.bi0403s18. [DOI] [PubMed] [Google Scholar]

[CR47] 47.Aggarwal G, Ramaswamy R. Ab initio gene identification: Prokaryote genome annotation with GeneScan and GLIMMER. J. Biosci. 2002;27:7–14. doi: 10.1007/BF02703679. [DOI] [PubMed] [Google Scholar]

[CR48] 48.Sparks ME, et al. Brown marmorated stink bug, Halyomorpha halys (Stål), genome: putative underpinnings of polyphagy, insecticide resistance potential and biology of a top worldwide pest. BMC Genomics. 2020;21:227. doi: 10.1186/s12864-020-6510-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Shibata T, et al. High-quality genome of the zoophytophagous stink bug, Nesidiocoris tenuis, informs their food habit adaptation. G3. 2024;14:jkad289. doi: 10.1093/g3journal/jkad289. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Panfilio KA, et al. Molecular evolutionary trends and feeding ecology diversification in the Hemiptera, anchored by the milkweed bug genome. Genome Biol. 2019;20:64. doi: 10.1186/s13059-019-1660-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Mesquita RD, et al. Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection. Proc. Natl. Acad. Sci. USA. 2015;112:14936–14941. doi: 10.1073/pnas.1506226112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR52] 52.Liu Y, et al. Apolygus lucorum genome provides insights into omnivorousness and mesophyll feeding. Mol. Ecol. Resour. 2021;21:287–300. doi: 10.1111/1755-0998.13253. [DOI] [PubMed] [Google Scholar]

[CR53] 53.Mei Y, et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 2022;50:D1040–D1045. doi: 10.1093/nar/gkab1090. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995. doi: 10.1101/gr.1865504. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR57] 57.Kovaka S, et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278. doi: 10.1186/s13059-019-1910-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR58] 58.Kim D, Paggi J, Park C, Bennett C, Salzberg S. Graph-based genome alignment and genotyping with HISAT2 and HISAT genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR59] 59.Haas BJ, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–5666. doi: 10.1093/nar/gkg770. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR60] 60.Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR61] 61.Buchfink B, Reuter K, Drost H. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 2021;18:366–368. doi: 10.1038/s41592-021-01101-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR62] 62.Jones P, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. doi: 10.1093/bioinformatics/btu031. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] 63.Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–D361. doi: 10.1093/nar/gkw1092. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.2024. NCBI Sequence Read Archive. SRR25498178

[CR66] 66.2024. NCBI Sequence Read Archive. SRR25503034

[CR67] 67.2024. NCBI Sequence Read Archive. SRR25518321

[CR68] 68.NCBI Sequence Read Archive.https://identifiers.org/ncbi/insdc.sra:SRP453032 (2024).

[CR69] 69.Wang YQ, Zhu JY. 2024. Arma custos isolate FDSW210240299, whole genome shotgun sequencing project. GenBank. JBBAGI000000000

[CR70] 70.Wang YQ, 2024. Chromosome-level genome assembly of the predatory stink bug Arma custos (Hemiptera: Pentatomidae) Figshare. [DOI] [PMC free article] [PubMed]

[CR71] 71.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR72] 72.Rhie A, et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. doi: 10.1186/s13059-020-02134-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 2021;1:e323. doi: 10.1002/cpz1.323. [DOI] [PubMed] [Google Scholar]

[CR74] 74.Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071. [DOI] [PubMed] [Google Scholar]

[CR75] 75.Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI) Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Chromosome-level genome assembly of the predatory stink bug Arma custos

Yuqin Wang

Yunfei Luo

Yunkang Ge

Sha Liu

Wenkai Liang

Chaoyan Wu

Shujun Wei

Jiaying Zhu

Abstract

Background & Summary

Methods

Sample collection and rearing

Sequencing

Table 1.

Genome survey

Fig. 1.

Genome assembly

Table 2.

Hi-C scaffolding

Table 3.

Fig. 2.

Fig. 3.

Fig. 4.

Genome annotation

Table 4.

Table 5.

Table 6.

Table 7.

Table 8.

Data Records

Technical Validation

Acknowledgements

Author contributions

Code availability

Competing interests

Footnotes

References

Associated Data

Data Citations

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases