Scientific Data. 2024 Aug 17;11:899. doi: 10.1038/s41597-024-03753-8

Chromosome-level genome assembly of the invasive pest Pseudococcus jackbeardsleyi (Hemiptera: Pseudococcidae)

Shaokun Guo 1,2,3, Bo Liu 1,2, Qingying Zhao 4, Zhihong Li 1,2,3, Guoping Zhan 4
PMCID: PMC11330446  PMID: 39154014

Abstract

Among the more than 2,000 species of mealybugs (Hemiptera: Pseudococcidae), only 13 genomes have been published so far. This seriously limits research on the phylogeny and adaptive evolution of the group. Additional mealybug genomes will greatly facilitate the study of the biology, damaging traits, and control strategies of the family Pseudococcidae. The Jack Beardsley mealybug (Pseudococcus jackbeardsleyi) is a harmful invasive pest that can cause severe losses to the fruit and vegetable industries worldwide. Here, we combined Nanopore long-read, Illumina short-read and Hi-C sequencing to generate a high-quality chromosome-level genome assembly of P. jackbeardsleyi. The assembly is 334.818 Mb long and is anchored to 5 linkage groups, with a scaffold N50 of 67.233 Mb. BUSCO analysis showed that the genome assembly is 95.7% complete and the gene annotation is 92.8% complete. This genome will serve as a resource for investigating the genetic mechanisms underlying the invasiveness of P. jackbeardsleyi. It also provides a theoretical foundation for the prevention and management of Pseudococcidae pests.

Subject terms: Entomology, Sequencing

Background & Summary

Mealybugs (Hemiptera: Pseudococcidae) are a group of significant pests comprising over 2,000 species. About 20% of these species are polyphagous, attacking a wide variety of agricultural, horticultural, and ornamental plants worldwide [1,2]. Mealybugs feed on plant sap and excrete honeydew, which restricts plant growth, reduces yields, and can even kill plants, causing heavy crop losses [3]. Their broad host range, rapid reproduction, global distribution, and ability to transmit important plant viruses give them considerable potential for damage [4]. Among the Pseudococcidae, the Jack Beardsley mealybug (Pseudococcus jackbeardsleyi Gimpel and Miller) is a polyphagous species originating from the Neotropical region [5]. It is known to infest hosts from 88 genera in 38 plant families. These include economically important vegetables, fruits, and ornamental crops such as Luffa cylindrica, Nephelium lappaceum and Ficus microcarpa [6]. As an important invasive species, P. jackbeardsleyi has been recorded in 46 countries and regions to date and is still rapidly expanding its range [6]. Beyond the direct damage it causes to host plants, P. jackbeardsleyi is also a potential vector of plant viruses such as cacao mild mosaic virus (CaMMV) [7]. Its broad range of economically important hosts and its capacity to expand its geographical range therefore make it a likely target pest in the future [8]. However, little information is available on control strategies for this species. Here, we performed whole-genome sequencing to construct a high-quality genome assembly for this species. The assembly will support studies of the role of Pseudococcidae in ecosystems, the protection of biodiversity, and sustainable development.

Despite the large number of species in the Pseudococcidae, only 13 genomes have been published, and only 5 of them are assembled to the chromosome level (Table 1). This severely limits our understanding of the adaptability, systematic evolution and invasion strategies of the group. In the present study, we generated a high-quality chromosome-level reference genome of P. jackbeardsleyi. We combined Nanopore long-read sequencing, high-throughput chromosome conformation capture (Hi-C), and Illumina paired-end short-read sequencing. The assembled genome is 334.818 Mb long, with contigs clustered into 5 linkage groups and a scaffold N50 of 67.233 Mb. We annotated 10,908 protein-coding genes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis indicated a completeness of 95.7% for the chromosome-level assembly and 92.8% for the annotated gene set. In addition, members of 12 gene families involved in detoxification and chemosensation were identified. Overall, the P. jackbeardsleyi genome provides a useful resource for phylogenomic and comparative genomic studies of Pseudococcidae and will facilitate the development of control strategies for these pests.

Table 1.

Assembly features for genomes of Pseudococcus jackbeardsleyi and other scale insects.

Species | Level | Size (Mb) | No. Scaf./Chr. | Scaf. N50 (Mb) | No. contigs | Contig N50 (Mb)
Pseudococcus jackbeardsleyi | Chr. | 334.818 | 5 | 67.233 | 96 | 7.767
Pseudococcus viburni | Scaf. | 435.4 | 2,392 | 0.875 | 2,465 | 0.8266
Pseudococcus longispinus | Scaf. | 285 | 66,857 | 0.0099 | 67,377 | 0.0097
Phenacoccus solenopsis | Chr. | 292.5 | 588 | 49 | 1,500 | 0.4898
Planococcus citri | Chr. | 403.6 | 5 | 83.7 | 35 | 23.6
Paracoccus marginatus | Scaf. | 191.2 | 60,102 | 0.0065 | 61,408 | 0.0063
Acanthococcus lagerstroemiae | Chr. | 658.1 | 9 | 70.5 | 1,035 | 5.5
Balanococcus diminutus | Chr. | 313.1 | 5 | 63.3 | 75 | 6.7
Coronaproctus castanopsis | Chr. | 700.1 | 3 | 273.8 | 143 | 12.4
Ferrisia virgata | Scaf. | 304.6 | 32,723 | 0.0254 | 33,491 | 0.0243
Hypogeococcus pungens | Scaf. | 238.2 | 250,844 | 0.0019 | 258,686 | 0.0017
Maconellicoccus hirsutus | Scaf. | 163 | 12,889 | 0.0468 | 13,288 | 0.0449
Trionymus perrisii | Scaf. | 237.6 | 80,386 | 0.0046 | 80,611 | 0.0046

Methods

Sample collection, DNA and RNA preparation

Specimens of Pseudococcus jackbeardsleyi for genome sequencing were collected from mangosteen in Pingxiang, Guangxi Zhuang Autonomous Region, China (22.1178° N, 106.7394° E). Genomic DNA for Nanopore and Illumina paired-end library preparation was extracted from 20 females. Extraction used Blood & Cell Culture DNA Kits (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The purity and concentration of all DNA extracts were assessed with a NanoDrop (NanoDrop Products, Wilmington, DE, USA). They were also measured with the Qubit™ dsDNA Quantification Assay Kit on a Qubit® 3.0 Fluorometer (Life Technologies Corporation, Eugene, OR, USA). Large DNA fragments were size-selected by gel cutting on a BluePippin system (Sage Science, Beverly, MA, USA). For RNA-seq, total RNA was extracted from a single female per sample using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA). It was quantified with a NanoDrop ND-2000 spectrophotometer (NanoDrop Products, Wilmington, DE, USA), and three biological replicates were prepared.

Genome sequencing and assembly

Genomic DNA was repaired and purified using the same methods as in our previous study [9]. Adapter ligation and purification were performed with the Ligation Sequencing Kit (Cat# SQK-LSK109, Oxford Nanopore Technologies, Oxford, UK). The DNA library was then constructed and quantified using the Qubit® 3.0 Fluorometer (Life Technologies Corporation, Eugene, OR, USA). Approximately 700 ng of the library was loaded onto an R9.4.1 flow cell (FLO-PRO002) for real-time single-molecule sequencing. Sequencing was performed on an Oxford Nanopore PromethION P48 device running MinKNOW v22.03.4 (Oxford Nanopore Technologies, Oxford, UK) at the Genome Center of Grandomics (Wuhan, China). Raw signals were basecalled into canonical DNA bases with the Guppy basecaller (v6.0.7) using the following parameters: --flowcell FLO-PRO002 --kit SQK-LSK109 --basecaller r9.4.1 --min_length 1000 --compress_fastq. The resulting FASTQ reads were used for genome assembly. Short-read paired-end library construction and sequencing followed previously described methods [10]. After quality control, we obtained 15.468 Gb of Illumina short reads (coverage: 47.741×) and 42.105 Gb of Nanopore long reads (coverage: 129.954×) for genome assembly (Table 2).
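The 1,000 bp minimum read length and the post-QC read yield amount to a single pass over the FASTQ output. The sketch below is illustrative only. It is not Guppy's filtering code, the function names are hypothetical, and it assumes the standard four-line FASTQ record layout.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def iter_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        yield header[1:].split()[0], seq, qual


def length_filtered_yield(handle: TextIO, min_length: int = 1000) -> Tuple[int, int]:
    """Return (reads kept, bases kept) for reads of at least min_length bases."""
    kept_reads = kept_bases = 0
    for _read_id, seq, _qual in iter_fastq(handle):
        if len(seq) >= min_length:
            kept_reads += 1
            kept_bases += len(seq)
    return kept_reads, kept_bases


def yield_from_file(path: str, min_length: int = 1000) -> Tuple[int, int]:
    """Open plain or gzip-compressed FASTQ (hypothetical path) and filter it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return length_filtered_yield(handle, min_length)


def yield_from_text(text: str, min_length: int = 1000) -> Tuple[int, int]:
    """Same as yield_from_file, for in-memory FASTQ text (handy for testing)."""
    return length_filtered_yield(io.StringIO(text), min_length)
```

Dividing the kept bases by 1e9 gives a yield in Gb, which is the unit reported in Table 2.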

Table 2.

Statistics for sequencing data for Pseudococcus jackbeardsleyi genome assembly.

Method Insert size (bp) Data (Gb) Coverage (×) Usage
Illumina NovaSeq 500 15.468 47.741 Survey, correction
Nanopore 20,000 42.105 129.954 De novo assembly
Hi-C library 100–500 33.114 102.204 Chromosome-level assembly
Total / 90.687 279.899 /
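Sequencing depth in Table 2 is data volume divided by genome size. Back-calculating from the reported values gives a divisor of about 324.0 Mb for every row, rather than the 334.818 Mb assembly length. This suggests, as an inference not stated in the text, that depth was computed against the k-mer-based size estimate of Fig. 1a. A minimal sketch of the arithmetic follows; the function names are illustrative.

```python
from typing import Dict, Tuple

# Table 2 values: platform -> (data volume in Gb, reported coverage in x)
TABLE2: Dict[str, Tuple[float, float]] = {
    "Illumina NovaSeq": (15.468, 47.741),
    "Nanopore": (42.105, 129.954),
    "Hi-C library": (33.114, 102.204),
}

ASSEMBLY_MB = 334.818  # assembled genome size reported in the text


def coverage(data_gb: float, genome_mb: float) -> float:
    """Fold coverage of data_gb gigabases over a genome of genome_mb megabases."""
    return data_gb * 1000.0 / genome_mb


def implied_genome_mb(data_gb: float, reported_coverage: float) -> float:
    """Genome size (Mb) that must have been used as the divisor for a reported coverage."""
    return data_gb * 1000.0 / reported_coverage


if __name__ == "__main__":
    for platform, (gb, cov) in TABLE2.items():
        print(
            f"{platform}: implied genome size {implied_genome_mb(gb, cov):.1f} Mb; "
            f"depth over the {ASSEMBLY_MB} Mb assembly {coverage(gb, ASSEMBLY_MB):.1f}x"
        )
```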

The Oxford Nanopore long reads were used for de novo genome assembly. Raw reads were corrected and assembled into a draft assembly with NextDenovo v2.4.0 (https://github.com/Nextomics/NextDenovo) using default parameters. NextPolish v1.3.1 [11] was then used to improve single-base accuracy with standard parameters, as described previously [9]. This yielded 96 contigs with a contig N50 of 7.767 Mb (Table 1). Genome size, heterozygosity, and duplication rate were estimated by k-mer analysis. 21-mers in the Illumina short reads were counted with Jellyfish v2.2.9 [12], and the k-mer frequency distribution was modelled with GenomeScope v1.0 [13] (Fig. 1a). Assembly completeness was assessed with BUSCO v4.1.4 against the insecta_odb10 database (1,367 genes) [14]. In the contig-level assembly, 95.7% of these genes were complete (Table 3).
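The k-mer step can be illustrated in a few lines. The sketch counts canonical k-mers, which is the usual input for GenomeScope (an assumption about the authors' Jellyfish settings). It does not reproduce GenomeScope's model. It shows only the classic rough estimate behind such tools: genome size ≈ total k-mers / depth of the main peak. The function names and the error cutoff are hypothetical. For a heterozygous genome the tallest peak may be the heterozygous one; GenomeScope accounts for this, but this sketch does not.

```python
from collections import Counter
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int = 21) -> Counter:
    """Count canonical k-mers, skipping k-mers that contain non-ACGT characters."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def histogram(counts: Counter) -> Dict[int, int]:
    """Map k-mer multiplicity -> number of distinct k-mers, like `jellyfish histo` output."""
    return dict(Counter(counts.values()))


def naive_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Total k-mers above an error cutoff divided by the depth of the tallest peak.

    Much cruder than GenomeScope: there is no heterozygosity/duplication model,
    and min_depth (the cutoff separating error k-mers) is an arbitrary assumption.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    peak = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak
```

In practice the histogram would come from `jellyfish histo`; counting k-mers in pure Python is far too slow for ~15 Gb of reads.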

Fig. 1. Assembly features of Pseudococcus jackbeardsleyi genome.


(a) 21-mer frequency distribution with the estimated genome size, heterozygosity and duplication rate; (b) genome-wide all-by-all Hi-C interaction map.

Table 3.

Completeness of Pseudococcus jackbeardsleyi genome assembly and annotation evaluated by BUSCO based on insecta_odb10 database (1,367 genes).

Source Complete (C) Single copy (S) Duplicated (D) Fragmented (F) Missing (M)
Contig-level 95.7% 93.0% 2.7% 0.7% 3.6%
Chromosome-level 95.7% 93.4% 2.3% 0.8% 3.5%
Annotation 92.8% 89.5% 3.3% 2.3% 4.9%
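BUSCO reports its result as a one-line summary string, and each Table 3 row can be written in that notation. The sketch below parses such a string and checks that C = S + D and that C + F + M ≈ 100%. It then converts the percentages to approximate gene counts out of n = 1,367. The strings are reconstructed from Table 3, not copied from the authors' output files, and the helper names are hypothetical. Because the percentages are rounded, counts may be off by about one gene.

```python
import re
from typing import Dict

SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

# Table 3 rows rewritten in BUSCO summary notation (reconstructed for illustration).
TABLE3 = {
    "Contig-level": "C:95.7%[S:93.0%,D:2.7%],F:0.7%,M:3.6%,n:1367",
    "Chromosome-level": "C:95.7%[S:93.4%,D:2.3%],F:0.8%,M:3.5%,n:1367",
    "Annotation": "C:92.8%[S:89.5%,D:3.3%],F:2.3%,M:4.9%,n:1367",
}


def parse_busco(summary: str) -> Dict[str, float]:
    """Extract the C, S, D, F, M percentages and n from a BUSCO summary string."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary string: {summary!r}")
    values = {key: float(val) for key, val in match.groupdict().items()}
    values["n"] = int(values["n"])
    return values


def to_counts(values: Dict[str, float], tol: float = 0.11) -> Dict[str, int]:
    """Check C = S + D and C + F + M = 100 within rounding, then convert to gene counts."""
    if abs(values["C"] - (values["S"] + values["D"])) > tol:
        raise ValueError("C does not equal S + D")
    if abs(values["C"] + values["F"] + values["M"] - 100.0) > tol:
        raise ValueError("C + F + M does not sum to 100%")
    n = values["n"]
    return {key: round(values[key] / 100.0 * n) for key in ("C", "S", "D", "F", "M")}
```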

Hi-C sequencing and chromosome anchoring

The Hi-C library was constructed following a previously described standard protocol with some modifications [15]. Briefly, 20 P. jackbeardsleyi females were ground in 2% formaldehyde to cross-link cellular proteins. Cross-linking was quenched by adding glycine, followed by additional vacuum infiltration. The fixed tissue was then ground into a powder and resuspended in nuclei isolation buffer. The purified nuclei were digested with 100 units of the restriction enzyme DpnII, and the digested ends were marked by incubation with biotin-14-dATP, as described previously [9,10]. Hi-C libraries were quantified by quantitative real-time PCR with a library quantification kit for Illumina GA Universal (KAPA, Wilmington, MA, USA). They were then sequenced on the Illumina NovaSeq platform to generate 150 bp paired-end reads. In total, 33.114 Gb (coverage: 102.204×) of Hi-C data were generated for P. jackbeardsleyi (Table 2). The Juicer v1.6 and 3D de novo assembly (3D-DNA) pipelines were used to scaffold the contigs into a chromosome-level genome [16,17]. Of the sequenced read pairs, 84.12% were normal pairs. The remainder were chimeric paired (12.24%), chimeric ambiguous (2.21%) or unmapped (1.43%), and 26.08% of all read pairs yielded Hi-C contacts (Table 4). The contigs were clustered into 5 linkage groups, giving a scaffold N50 of 67.233 Mb (Fig. 1b, Table 1). BUSCO analysis of the chromosome-level assembly showed that 95.7% of the genes were complete (Table 3).
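DpnII cuts at ^GATC. After the 5' overhangs are filled in (including biotin-14-dATP) and the blunt ends are ligated, a genuine Hi-C junction carries the duplicated site GATCGATC. Juicer's "Ligation Motif Present" rows in Table 4 are based on detecting such junction sequences in the reads. The sketch below illustrates the idea on toy reads. It is not Juicer's implementation, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

DPNII_SITE = "GATC"
# Filling in two 5'-GATC overhangs and blunt-ligating them duplicates the site.
DPNII_JUNCTION = DPNII_SITE * 2  # "GATCGATC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_junction(read: str, junction: str = DPNII_JUNCTION) -> bool:
    """True if the read contains the ligation junction on either strand."""
    read = read.upper()
    return junction in read or revcomp(junction) in read


def split_at_junction(read: str, junction: str = DPNII_JUNCTION) -> Tuple[str, str]:
    """Split a chimeric read after the first copy of the cut site inside the junction.

    Returns (left, right); right is empty when no junction is present.
    """
    idx = read.upper().find(junction)
    if idx < 0:
        return read, ""
    cut = idx + len(junction) // 2
    return read[:cut], read[cut:]


def junction_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of read pairs in which either mate carries the ligation junction."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for r1, r2 in pairs if has_junction(r1) or has_junction(r2))
    return hits / len(pairs)
```

GATCGATC is its own reverse complement, so the strand check in `has_junction` is redundant for DpnII. It is kept so that the helper still works for junctions that are not palindromic.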

Table 4.

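Summary of Hi-C data for the chromosome-level assembly of the Pseudococcus jackbeardsleyi genome. Where two percentages are given (x%/y%), the first is relative to all sequenced read pairs and the second to unique reads.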

Parameter Value
Sequenced Read Pairs 109,359,863
 Normal Paired 91,989,238 (84.12%)
 Chimeric Paired 13,386,340 (12.24%)
 Chimeric Ambiguous 2,420,030 (2.21%)
 Unmapped 1,564,255 (1.43%)
Ligation Motif Present 67,777,133 (61.98%)
Alignable (Normal + Chimeric Paired) 105,375,578 (96.36%)
Unique Reads 45,133,938 (41.27%)
PCR Duplicates 59,592,184 (54.49%)
Optical Duplicates 649,456 (0.59%)
Library Complexity Estimate 52,123,569
Intra-fragment Reads 8,501,126 (7.77%/18.84%)
Below MAPQ Threshold 8,110,642 (7.42%/17.97%)
Hi-C Contacts 28,522,170 (26.08%/63.19%)
 Ligation Motif Present 20,951,834 (19.16%/46.42%)
 3’ Bias (Long Range) 78% - 22%
 Pair Type %(L-I-O-R) 25% - 25% - 25% - 25%
Inter-chromosomal 5,178,347 (4.74%/11.47%)
Intra-chromosomal 23,343,823 (21.35%/51.72%)
Short Range (<20Kb) 19,074,662 (17.44%/42.26%)
Long Range (>20Kb) 4,268,693 (3.90%/9.46%)
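Recomputing the Table 4 rows from their counts shows how the categories nest. Alignable pairs are normal plus chimeric paired pairs. Alignable pairs split into unique reads and PCR and optical duplicates. Unique reads split into intra-fragment reads, reads below the MAPQ threshold, and Hi-C contacts. The two percentages use all sequenced pairs and unique reads as denominators. A minimal check follows; the dictionary keys simply mirror the table rows.

```python
from typing import Dict, Union

TABLE4: Dict[str, int] = {
    "sequenced": 109_359_863,
    "normal": 91_989_238,
    "chimeric_paired": 13_386_340,
    "chimeric_ambiguous": 2_420_030,
    "unmapped": 1_564_255,
    "unique": 45_133_938,
    "pcr_dup": 59_592_184,
    "optical_dup": 649_456,
    "intra_fragment": 8_501_126,
    "below_mapq": 8_110_642,
    "hic_contacts": 28_522_170,
    "inter_chr": 5_178_347,
    "intra_chr": 23_343_823,
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in Table 4."""
    return round(100.0 * part / whole, 2)


def check_table4(t: Dict[str, int]) -> Dict[str, Union[int, bool, tuple]]:
    """Verify that the Table 4 categories add up and recompute the Hi-C contact percentages."""
    alignable = t["normal"] + t["chimeric_paired"]
    return {
        "alignable": alignable,
        "categories_sum_ok": alignable + t["chimeric_ambiguous"] + t["unmapped"] == t["sequenced"],
        "dedup_sum_ok": t["unique"] + t["pcr_dup"] + t["optical_dup"] == alignable,
        "unique_split_ok": t["intra_fragment"] + t["below_mapq"] + t["hic_contacts"] == t["unique"],
        "contact_split_ok": t["inter_chr"] + t["intra_chr"] == t["hic_contacts"],
        # (relative to all sequenced pairs, relative to unique reads)
        "hic_pct": (pct(t["hic_contacts"], t["sequenced"]), pct(t["hic_contacts"], t["unique"])),
    }
```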

RNA-seq

cDNA libraries were constructed from 1 μg of total RNA with the TruSeq™ RNA Sample Preparation Kit (Illumina, San Diego, CA, USA). Libraries were size-selected for ~300 bp fragments on 2% Low Range Ultra Agarose. They were then PCR-amplified for 15 cycles with Phusion DNA polymerase (NEB, Ipswich, MA, USA). After quantification by TBS380 (Picogreen, Waltham, MA, USA), the paired-end libraries were sequenced on an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA). Sequencing was carried out at Majorbio Bio-pharm Technology Co., Ltd (Shanghai, China). Transcripts were assembled de novo from the raw RNA-seq reads with Trinity v2.11.0 [18], using otherwise default parameters (--seqType fq --max_memory 200G --left R1.raw.fastq --right R2.raw.fastq --CPU 60 --trimmomatic --output pj_trinity).
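Trinity names each transcript TRINITY_DN<n>_c<n>_g<n>_i<n>. The prefix up to _g<n> identifies a Trinity "gene" (a cluster of isoforms), and _i<n> identifies the isoform. A common downstream step, which the text does not describe, is to keep one representative transcript per gene. The sketch below assumes that naming convention; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str]  # (transcript ID, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (first word of header, sequence) records."""
    records: List[Record] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trinity_gene_id(transcript_id: str) -> str:
    """Strip the trailing _i<n> isoform suffix: TRINITY_DN5_c0_g1_i2 -> TRINITY_DN5_c0_g1."""
    gene, sep, isoform = transcript_id.rpartition("_i")
    if not sep or not isoform.isdigit():
        raise ValueError(f"not a Trinity transcript ID: {transcript_id}")
    return gene


def longest_isoform_per_gene(records: List[Record]) -> Dict[str, Record]:
    """Keep the longest transcript for each Trinity 'gene'."""
    best: Dict[str, Record] = {}
    for transcript_id, seq in records:
        gene = trinity_gene_id(transcript_id)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (transcript_id, seq)
    return best
```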

Gene structure and function annotation

Gene structures were annotated with the MAKER v3.01.03 genome annotation pipeline following established protocols [10,19]. The RNA-seq evidence described above was used to improve the accuracy of exon prediction. Gene functions were annotated with eggNOG-mapper v2.1.7 [20]. In total, 10,908 protein-coding genes were annotated. BUSCO analysis showed that 92.8% of the single-copy orthologs were complete, 2.3% were fragmented and 4.9% were missing (Table 3). Among scale insects, species with chromosome-level assemblies tend to have fewer annotated protein-coding genes than those with scaffold-level assemblies (Table 5). This may reflect redundancy in less complete assemblies.
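A gene count such as 10,908 is usually read off the final GFF3 annotation by counting `gene` (or `mRNA`) features. MAKER's raw GFF3 also contains evidence alignments (for example `protein_match`), so counting by feature type matters. This is a minimal sketch, not the authors' script. It assumes standard GFF3: nine tab-separated columns, the feature type in column 3, and key=value attributes in column 9. The function names are hypothetical, and multi-valued Parent attributes are not split.

```python
from collections import Counter
from typing import Dict, Iterator, List


def iter_features(gff3_text: str) -> Iterator[List[str]]:
    """Yield the nine tab-separated columns of each feature line in GFF3 text."""
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            return  # embedded sequences follow; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            yield cols


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' attributes from column 9."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def feature_type_counts(gff3_text: str) -> Counter:
    """Count features by type (column 3), e.g. gene, mRNA, exon."""
    return Counter(cols[2] for cols in iter_features(gff3_text))


def transcripts_per_gene(gff3_text: str) -> Counter:
    """Number of mRNA features whose Parent attribute points at each gene ID."""
    per_gene: Counter = Counter()
    for cols in iter_features(gff3_text):
        if cols[2] == "mRNA":
            parent = parse_attributes(cols[8]).get("Parent")
            if parent:
                per_gene[parent] += 1
    return per_gene
```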

Table 5.

Statistics for number of protein-coding genes in the genome of Pseudococcus jackbeardsleyi and other scale insects.

Species | Assembly level | No. protein-coding genes
Pseudococcus jackbeardsleyi | Chr. | 10,908
Pseudococcus viburni | Scaf. | 23,629
Phenacoccus solenopsis | Chr. | 11,880
Planococcus citri | Chr. | 18,954
Coronaproctus castanopsis | Chr. | 10,542
Ferrisia virgata | Scaf. | 47,978
Maconellicoccus hirsutus | Scaf. | 21,623

Repeats and non-coding RNA (ncRNA) annotation

RepeatMasker v4.0.7 was used to detect repetitive elements in scaffolds longer than 1,000 bp against the Insecta repeats in Repbase Update [21] (http://www.girinst.org). For de novo prediction, RepeatModeler v2.0.1 (http://www.repeatmasker.org/RepeatModeler.html, RRID: SCR_015027) was first used to build a library of candidate repeat families. Interspersed repeats covered 34.68% of the genome, and most of this fraction (23.85% of the genome) was unclassified. Retroelements and DNA transposons accounted for 5.22% and 5.61% of the genome, respectively (Table 6). In total, 4,380 satellites and 63,734 simple repeats were identified as tandem repeats (TRs). These account for 0.19% and 0.86% of the P. jackbeardsleyi genome, respectively (Table 6). For ncRNA annotation, transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) were predicted with tRNAscan-SE and RNAmmer, respectively, using default parameters [22,23]. MicroRNAs (miRNAs) were predicted by aligning the genome sequence against the Rfam v14.10 database (http://rfam.xfam.org/) with BLASTN [24]. In total, 219 tRNAs, 83 rRNAs and 28 miRNAs were predicted in the P. jackbeardsleyi genome (Table 7). A circular diagram of gene count, repeat density and GC content was generated with Circos [25] (Fig. 2).
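Repeat percentages are masked bases divided by assembly length. When summing hits from a RepeatMasker annotation, overlapping hits should be merged so that no base is counted twice. This is a minimal sketch, not the authors' pipeline, and the function names are hypothetical. It assumes the standard whitespace-delimited RepeatMasker .out layout: query sequence in column 5, begin and end in columns 6–7 (1-based, inclusive), and repeat class/family in column 11.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, int, int, str]  # (sequence, begin, end, class/family)


def parse_rm_out(text: str) -> List[Hit]:
    """Parse RepeatMasker .out text into (sequence, begin, end, class/family) hits."""
    hits: List[Hit] = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines start with an integer Smith-Waterman score; header and blank lines do not.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append((fields[4], int(fields[5]), int(fields[6]), fields[10]))
    return hits


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def masked_fraction(hits: List[Hit], genome_length: int, class_prefix: str = "") -> float:
    """Fraction of the genome covered by hits whose class/family starts with class_prefix."""
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for seq, begin, end, rclass in hits:
        if rclass.startswith(class_prefix):
            by_seq[seq].append((begin, end))
    return sum(merged_length(iv) for iv in by_seq.values()) / genome_length
```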

Table 6.

Statistics for repeat elements in the genome of Pseudococcus jackbeardsleyi.

Type Number Length (bp) Percentage (%)
Retroelements 23,414 17,479,410 5.22
 SINEs 61 9,813 0.00
 Penelope 31 29,910 0.01
 LINEs 6,205 1,671,215 0.50
  CRE/SLACS 0 0 0.00
  L2/CR1/Rex 927 328,377 0.10
  R1/LOA/Jockey 2,848 570,516 0.17
  R2/R4/NeSL 0 0 0.00
  RTE/Bov-B 136 27,046 0.01
  L1/CIN4 0 0 0.00
 LTR elements 17,148 15,798,382 4.72
  BEL/Pao 590 733,282 0.22
  Ty1/Copia 2,294 2,695,829 0.81
  Gypsy/DIRS1 7,417 9,319,913 2.78
  Retroviral 1,004 109,343 0.03
DNA transposons 82,721 18,792,160 5.61
 hobo-Activator 29,616 6,473,573 1.93
 Tc1-IS630-Pogo 16,404 3,948,341 1.18
 En-Spm 0 0 0.00
 MuDR-IS905 0 0 0.00
 PiggyBac 190 101,936 0.03
 Tourist/Harbinger 2,530 622,867 0.19
 Other (Mirage, P-element, Transib) 0 0 0.00
Rolling-circles 7,887 2,366,428 0.71
Unclassified 313,462 79,851,041 23.85
Total interspersed repeats 419,597 116,122,611 34.68
Small RNA 248 228,707 0.07
Satellites 4,380 624,937 0.19
Simple repeats 63,734 2,869,998 0.86
Low complexity 14,688 1,558,602 0.47
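In this RepeatMasker-style summary, some rows are subtotals of others. With the Table 6 values, retroelements equal SINEs + LINEs + LTR elements. Total interspersed repeats equal retroelements + DNA transposons + unclassified elements. The Penelope and rolling-circle rows are therefore not added on top. Percentages follow from dividing by the assembly length. A small check is below. The assembly length is approximated as 334,818,000 bp because the exact base count is not given, and the helper names are illustrative.

```python
from typing import Dict, Iterable, Tuple

# Selected Table 6 rows: class -> (number of elements, total length in bp)
TABLE6: Dict[str, Tuple[int, int]] = {
    "SINEs": (61, 9_813),
    "LINEs": (6_205, 1_671_215),
    "LTR elements": (17_148, 15_798_382),
    "Retroelements": (23_414, 17_479_410),
    "DNA transposons": (82_721, 18_792_160),
    "Unclassified": (313_462, 79_851_041),
    "Total interspersed repeats": (419_597, 116_122_611),
}

ASSEMBLY_BP = 334_818_000  # 334.818 Mb from the text; exact base count is an approximation


def subtotal(classes: Iterable[str], column: int) -> int:
    """Sum one column (0 = number of elements, 1 = length in bp) over the given classes."""
    return sum(TABLE6[name][column] for name in classes)


def percent_of_genome(length_bp: int, genome_bp: int = ASSEMBLY_BP) -> float:
    """Share of the assembly covered by length_bp, rounded as in Table 6."""
    return round(100.0 * length_bp / genome_bp, 2)
```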

Table 7.

Statistics for noncoding RNA genes in the genome of Pseudococcus jackbeardsleyi.

Type Number
Infernal statistics
 Candidate tRNAs read 244
 Infernal-confirmed tRNAs 219
 Bases scanned by Infernal 24,287
tRNA counts
 tRNAs decoding the standard 20 amino acids 195
 Selenocysteine tRNAs (TCA) 0
 Possible suppressor tRNAs (CTA, TTA, TCA) 0
 tRNAs with undetermined/unknown isotypes 10
 Predicted pseudogenes 14
 Total tRNAs 219
 tRNAs with introns 17
rRNA
 5S rRNA 26
 5.8S rRNA 14
 18S rRNA 19
 28S rRNA 24
miRNA 28
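The tRNAscan-SE categories in Table 7 partition the tRNA total. The 219 tRNAs include 14 predicted pseudogenes and 10 tRNAs of undetermined isotype, so 195 (about 89%) decode the standard amino acids. The rRNA subtypes sum to the 83 rRNAs reported in the text. A small arithmetic check using the table values follows; the dictionary keys are just labels.

```python
from typing import Dict

TRNA_CLASSES: Dict[str, int] = {
    "decoding the standard 20 amino acids": 195,
    "selenocysteine (TCA)": 0,
    "possible suppressors (CTA, TTA, TCA)": 0,
    "undetermined/unknown isotype": 10,
    "predicted pseudogenes": 14,
}
RRNA: Dict[str, int] = {"5S": 26, "5.8S": 14, "18S": 19, "28S": 24}
MIRNA = 28


def trna_total(classes: Dict[str, int]) -> int:
    """Total tRNAs as the sum of the mutually exclusive tRNAscan-SE categories."""
    return sum(classes.values())


def standard_trna_fraction(classes: Dict[str, int]) -> float:
    """Share of predicted tRNAs that decode one of the standard 20 amino acids."""
    return classes["decoding the standard 20 amino acids"] / trna_total(classes)
```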

Fig. 2. Overview of assembled Pseudococcus jackbeardsleyi genome.


The outermost ring of coloured blocks represents the 5 linkage groups. The inner Circos tracks show, from outside to inside, gene count (histogram), repeat density (heatmap) and GC content (line).

Gene family analysis

Twelve gene families associated with detoxification and chemosensation were manually annotated in P. jackbeardsleyi. These were cytochrome P450 monooxygenases (P450s), glutathione S-transferases (GSTs), carboxyl/cholinesterases (CCEs), UDP-glycosyltransferases (UGTs), ATP-binding cassette (ABC) transporters, and heat shock proteins (HSPs). The remaining six were odorant-binding proteins (OBPs), odorant receptors (ORs), gustatory receptors (GRs), ionotropic receptors (IRs), chemosensory proteins (CSPs), and sensory neuron membrane proteins (SNMPs). Candidate genes were searched with the BITACORA pipeline (full mode), which runs HMMER and BLAST searches [26], using the default E-value cutoff of 10^-5. They were then manually verified on the basis of gene length and conserved domains from the SMART database [27]. Among the annotated genes, we identified 83 P450s, 16 GSTs, 150 CCEs, 38 UGTs, 83 ABC transporters and 47 HSPs in the P. jackbeardsleyi genome (Table 8). We also identified 81 chemosensory genes, comprising 21 OBPs, 5 ORs, 15 GRs, 19 IRs, 9 CSPs and 12 SNMPs (Table 8). Manual annotation added further genes to several families, and the final totals are given in Table 8.

Table 8.

Statistics for 12 gene families of Pseudococcus jackbeardsleyi.

Gene family | Annotated genes identified | Manually annotated genes | Total identified genes | Total after clustering identical sequences
P450 | 83 | 0 | 83 | 83
GST | 16 | 1 | 17 | 17
CCE | 150 | 0 | 150 | 150
UGT | 38 | 0 | 38 | 38
ABC | 83 | 3 | 86 | 86
OBP | 21 | 1 | 22 | 22
OR | 5 | 16 | 21 | 21
GR | 15 | 8 | 23 | 23
IR | 19 | 8 | 27 | 27
CSP | 9 | 2 | 11 | 11
SNMP | 12 | 0 | 12 | 12
HSP | 47 | 17 | 64 | 62
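The per-family numbers quoted in the text come from the first column of Table 8 (annotated genes identified). Adding the manually annotated genes gives the totals. Clustering identical sequences then reduces the HSP count from 64 to 62. A quick tally of the Table 8 values shows that the 81 chemosensory genes become 116 after manual annotation. The keys and helper names below are illustrative.

```python
from typing import Dict, Iterable, Tuple

# family -> (annotated genes identified, manually annotated, total, total after clustering)
TABLE8: Dict[str, Tuple[int, int, int, int]] = {
    "P450": (83, 0, 83, 83),
    "GST": (16, 1, 17, 17),
    "CCE": (150, 0, 150, 150),
    "UGT": (38, 0, 38, 38),
    "ABC": (83, 3, 86, 86),
    "OBP": (21, 1, 22, 22),
    "OR": (5, 16, 21, 21),
    "GR": (15, 8, 23, 23),
    "IR": (19, 8, 27, 27),
    "CSP": (9, 2, 11, 11),
    "SNMP": (12, 0, 12, 12),
    "HSP": (47, 17, 64, 62),
}

CHEMOSENSORY = ("OBP", "OR", "GR", "IR", "CSP", "SNMP")


def rows_consistent(table: Dict[str, Tuple[int, int, int, int]]) -> bool:
    """Annotated + manually annotated genes should equal the total before clustering."""
    return all(annotated + manual == total for annotated, manual, total, _ in table.values())


def column_sum(table: Dict[str, Tuple[int, int, int, int]], families: Iterable[str], column: int) -> int:
    """Sum one Table 8 column over the given gene families."""
    return sum(table[family][column] for family in families)
```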

Data Records

The genome assembly is available at the National Center for Biotechnology Information (NCBI) under accession number JAZDXF000000000 [28]. The NCBI BioProject accession number is PRJNA1070360. The RNA-seq, Hi-C and Illumina raw reads have been deposited in the Sequence Read Archive (SRA) under accession number SRP486604 [29]. In addition, the genome annotation, ncRNA and repeat annotation files have been deposited in figshare [30–32].

Technical Validation

The integrity of the extracted DNA was assessed by agarose gel electrophoresis. DNA concentration was measured with a NanoDrop and a Qubit 3.0 Fluorometer, and the A260/A280 ratio was approximately 2.0. The scaffold N50 is the length such that scaffolds of at least this length together contain half of the assembly. It reached 67.233 Mb, higher than that of most other scale-insect genomes in Table 1. Completeness was also evaluated by the sequence identity method, aligning the Illumina short-insert reads to the assembly with BWA. BUSCO analysis showed 95.7% completeness (Table 3), supporting the high quality of the assembly. The proportion of duplicated BUSCO genes was low (2.3%; Table 3), indicating that duplication is not a significant problem in the assembly. Furthermore, BlobTools detected no signs of contamination in the assembly. Together, these results indicate that we obtained a high-quality genome of P. jackbeardsleyi.
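N50 can be computed directly from sequence lengths. Sort the lengths in decreasing order and report the length at which the running total first reaches half of the total assembly size. Other Nx values use other fractions. This is a minimal sketch with hypothetical function names. The Table 1 values would be obtained by applying it to the contig or scaffold lengths of the assembly FASTA.

```python
from typing import List, Sequence


def fasta_lengths(fasta_text: str) -> List[int]:
    """Lengths of the sequences in FASTA text, in file order."""
    lengths: List[int] = []
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 by default) of a set of sequence lengths.

    Nx is the largest length L such that sequences of length >= L together
    cover at least `fraction` of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # not reached for fraction <= 1
```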

Acknowledgements

This research was supported by the National Key Research and Development Programme of China (2021YFF0601901), Beijing Natural Science Foundation (6244049), Hainan Natural Science Foundation (323MS065) and the China Agriculture Research System of MOF and MARA.

Author contributions

Shaokun Guo conceived and designed the study; Shaokun Guo and Bo Liu performed the molecular work; Guoping Zhan and Qingying Zhao provided the insect pictures; Shaokun Guo, Guoping Zhan and Zhihong Li discussed the results; Shaokun Guo analyzed the data and wrote the manuscript.

Code availability

Data analyses followed the manuals and protocols provided by the developers of the corresponding bioinformatics tools. All software and code used in this work are publicly available, and the versions used are given in Methods.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Garcia Morales, M. et al. ScaleNet: a literature-based model of scale insect biology and systematics. Database 2016, bav118 (2016).
2. Miller, D., Miller, G. & Watson, G. Invasive species of mealybugs (Hemiptera: Pseudococcidae) and their threat to US agriculture. P. Entomol. Soc. Wash. 104, 825–836 (2002).
3. Bellotti, A. C. et al. Cassava pests in Latin America, Africa and Asia. (Centro Internacional de Agricultura Tropical (CIAT), 2011).
4. Meyer, J. B., Kasdorf, G. G. F., Nel, L. H. & Pietersen, G. Transmission of activated-episomal Banana streak OL (badna) virus (BSOLV) to cv. Williams Banana (Musa sp.) by three mealybug species. Plant Dis. 92, 1158–1163 (2008). https://doi.org/10.1094/PDIS-92-8-1158
5. Williams, D. The distribution of the neotropical mealybug Pseudococcus elisae Borchsenius in the Pacific region and Southern Asia (Hem.-Hom., Pseudococcidae). Entomologist's Monthly Magazine 124, 123–124 (1988).
6. CABI. Pseudococcus jackbeardsleyi (Jack Beardsley mealybug). CABI Compendium, https://www.cabi.org/cpc/datasheet/45087 (2021).
7. Puig, A. S., Wurzel, S., Suarez, S., Marelli, J. P. & Niogret, J. Mealybug (Hemiptera: Pseudococcidae) species associated with cacao mild mosaic virus and evidence of virus acquisition. Insects 12, 994 (2021). https://doi.org/10.3390/insects12110994
8. Williams, D. J. & Watson, G. W. Scale insects of the tropical South Pacific region. Part 2. Mealybugs (Pseudococcidae). (CAB International, 1988).
9. Guo, S. et al. Chromosome-level genome assembly of an important wolfberry fruit fly (Neoceratitis asiatica Becker). Sci. Data 10, 675 (2023). https://doi.org/10.1038/s41597-023-02601-5
10. Guo, S. et al. Chromosome-level assembly of the melon thrips genome yields insights into evolution of a sap-sucking lifestyle and pesticide resistance. Mol. Ecol. Resour. 20, 1110–1125 (2020). https://doi.org/10.1111/1755-0998.13189
11. Hu, J., Fan, J. P., Sun, Z. Y. & Liu, S. L. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020). https://doi.org/10.1093/bioinformatics/btz891
12. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). https://doi.org/10.1093/bioinformatics/btr011
13. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017). https://doi.org/10.1093/bioinformatics/btx153
14. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021). https://doi.org/10.1093/molbev/msab199
15. Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: an optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65 (2017). https://doi.org/10.1016/j.ymeth.2017.04.004
16. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017). https://doi.org/10.1126/science.aal3327
17. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016). https://doi.org/10.1016/j.cels.2016.07.002
18. Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29, 644–652 (2011). https://doi.org/10.1038/nbt.1883
19. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008). https://doi.org/10.1101/gr.6743907
20. Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021). https://doi.org/10.1093/molbev/msab293
21. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf. 25, unit 4.10 (2009).
22. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007). https://doi.org/10.1093/nar/gkm160
23. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). https://doi.org/10.1093/nar/25.5.955
24. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2020). https://doi.org/10.1093/nar/gkaa1047
25. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009). https://doi.org/10.1101/gr.092759.109
26. Vizueta, J., Sanchez-Gracia, A. & Rozas, J. BITACORA: a comprehensive tool for the identification and annotation of gene families in genome assemblies. Mol. Ecol. Resour. 20, 1445–1452 (2020). https://doi.org/10.1111/1755-0998.13202
27. Letunic, I., Khedkar, S. & Bork, P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 49, D458–D460 (2020). https://doi.org/10.1093/nar/gkaa937
28. NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_038380155.1 (2024).
29. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486604 (2024).
30. Guo, S. Pseudococcus jackbeardsleyi genome annotation. figshare https://doi.org/10.6084/m9.figshare.25622025.v1 (2024).
31. Guo, S. Pseudococcus jackbeardsleyi noncoding RNA prediction. figshare https://doi.org/10.6084/m9.figshare.26268106.v1 (2024).
32. Guo, S. Pseudococcus jackbeardsleyi repeat content annotation. figshare https://doi.org/10.6084/m9.figshare.26268229.v1 (2024).
