Chromosome-level genome assembly of the invasive pest Pseudococcus jackbeardsleyi (Hemiptera: Pseudococcidae)

Shaokun Guo; Bo Liu; Qingying Zhao; Zhihong Li; Guoping Zhan

Sci Data. 2024 Aug 17;11:899. doi: 10.1038/s41597-024-03753-8

PMCID: PMC11330446 PMID: 39154014

Abstract

Among over 2,000 species of mealybugs (Hemiptera: Pseudococcidae), only 13 genomes have been published so far, which seriously limits research on the phylogeny and adaptive evolution of this group. The continued publication of mealybug genomes will significantly facilitate exploration of the biological characteristics, detrimental attributes, and control strategies of the Pseudococcidae family. The Jack Beardsley mealybug (Pseudococcus jackbeardsleyi) is a hazardous invasive pest that can cause enormous losses to the fruit and vegetable industries worldwide. Herein, we combined Nanopore long-read, Illumina short-read and Hi-C sequencing to generate a high-quality chromosome-level genome assembly of P. jackbeardsleyi. The genome size was determined to be 334.818 Mb, assembled into 5 linkage groups with an N50 of 67.233 Mb. BUSCO analysis showed that the completeness of the genome assembly and annotation is 95.7% and 92.8%, respectively. This high-quality genome will serve as a resource for investigating the genetic mechanisms underlying the invasiveness of P. jackbeardsleyi, thereby offering a theoretical foundation for the prevention and management of Pseudococcidae pests.

Subject terms: Entomology, Sequencing

Background & Summary

Mealybugs (Hemiptera: Pseudococcidae) are significant pests comprising over 2,000 species, of which 20% are polyphagous, affecting a wide variety of agricultural, horticultural, and ornamental plants worldwide^1,2. They feed on plant sap and excrete honeydew, leading to restricted plant growth, reduced yields, or even plant death, which can cause huge crop losses³. Their broad host range, rapid reproduction cycle, global distribution, and ability to transmit important plant viruses contribute to their considerable potential for causing damage⁴. Among the members of Pseudococcidae, the Jack Beardsley mealybug (Pseudococcus jackbeardsleyi Gimpel and Miller) is a polyphagous species originating from the Neotropical region⁵. This species is known to infest hosts from 88 genera in 38 plant families, including various vegetables, fruits, and ornamental crops of significant economic importance, such as Luffa cylindrica, Nephelium lappaceum and Ficus microcarpa⁶. As an important invasive species, P. jackbeardsleyi is currently distributed in 46 countries and regions and is still rapidly expanding its invasion range⁶. Besides its direct impact on host plants, P. jackbeardsleyi is also a potential vector of plant viruses such as cacao mild mosaic virus (CaMMV)⁷. Therefore, its broad spectrum of economically important hosts and its capacity to extend its geographical range make it a candidate pest target in the future⁸. However, there is limited information concerning control strategies for this species. Here, whole-genome sequencing was performed to construct a high-quality genome assembly for this species, which will help to study the role of Pseudococcidae in ecosystems, protect biodiversity and promote sustainable development.

Among the multitude of species within the Pseudococcidae family, only 13 genomes have been published, with merely 5 of them assembled to the chromosomal level thus far (Table 1), which has severely limited our understanding of their adaptability, systematic evolution and invasive strategies. In the present study, we generated a high-quality chromosome-level reference genome of P. jackbeardsleyi by combining Nanopore long-read sequencing, high-throughput chromosome conformation capture (Hi-C) technology, and Illumina paired-end short-read sequencing. The assembled genome size is 334.818 Mb, clustered into 5 linkage groups with an N50 of 67.233 Mb. Moreover, we identified 10,908 annotated protein-coding genes, and Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed a completeness of 95.7% for the chromosome-level genome and 92.8% for the annotated genes. Additionally, 12 gene families covering detoxification and chemosensory genes were predicted. Overall, the high-quality P. jackbeardsleyi genome not only provides a useful resource for understanding the phylogenomics and comparative genomics of Pseudococcidae, but also facilitates the development of potential control strategies for these pests.

Table 1.

Assembly features for genomes of Pseudococcus jackbeardsleyi and other scale insects.

Feature	Pseudococcus jackbeardsleyi	Pseudococcus viburni	Pseudococcus longispinus	Phenacoccus solenopsis	Planococcus citri	Paracoccus marginatus	Acanthococcus lagerstroemiae	Balanococcus diminutus	Coronaproctus castanopsis	Ferrisia virgata	Hypogeococcus pungens	Maconellicoccus hirsutus	Trionymus perrisii
Level	Chr.	Scaf.	Scaf.	Chr.	Chr.	Scaf.	Chr.	Chr.	Chr.	Scaf.	Scaf.	Scaf.	Scaf.
Size (Mb)	334.818	435.4	285	292.5	403.6	191.2	658.1	313.1	700.1	304.6	238.2	163	237.6
No. Scaf./Chr.	5	2,392	66,857	588	5	60,102	9	5	3	32,723	250,844	12,889	80,386
Scaf. N50 (Mb)	67.233	0.875	0.0099	49	83.7	0.0065	70.5	63.3	273.8	0.0254	0.0019	0.0468	0.0046
No. contig	96	2,465	67,377	1,500	35	61,408	1,035	75	143	33,491	258,686	13,288	80,611
Contig N50 (Mb)	7.767	0.8266	0.0097	0.4898	23.6	0.0063	5.5	6.7	12.4	0.0243	0.0017	0.0449	0.0046

Methods

Sample collection, DNA and RNA preparation

Pseudococcus jackbeardsleyi specimens were collected from mangosteens in Pingxiang, Guangxi Zhuang Autonomous Region of China (22.1178° N, 106.7394° E) for genome sequencing. Genomic DNA for the Nanopore and Illumina paired-end library preparation was extracted from 20 females using Blood & Cell Culture DNA Kits (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The purity and concentration of all DNA extracts were verified using the Qubit™ dsDNA Quantification Assay Kits (Life Technologies Corporation, Eugene, OR, USA) with a NanoDrop (NanoDrop Products, Wilmington, DE, USA) and a Qubit® 3.0 Fluorometer (Life Technologies Corporation, Eugene, OR, USA). The Blue Pippin system (Sage Science, Beverly, MA, USA) was used to retrieve large DNA fragments by gel cutting. For RNA-seq, total RNA from one female was extracted using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) and quantified with a NanoDrop ND-2000 spectrophotometer (NanoDrop Products, Wilmington, DE, USA), with three biological replicates prepared.

Genome sequencing and assembly

Genomic DNA was repaired and purified using the methods employed in a previous study⁹. The Ligation Sequencing Kit (Cat# SQK-LSK109, Oxford Nanopore Technologies, Oxford, UK) was used for adaptor ligation and purification. Subsequently, the DNA library was constructed and quantified using the Qubit® 3.0 Fluorometer (Life Technologies Corporation, Eugene, OR, USA). Approximately 700 ng of the DNA library was loaded onto an Oxford Nanopore PromethION P48 device running MinKNOW v22.03.4 with an R9.4.1 flow cell (FLO-PRO002) (Oxford Nanopore Technologies, Oxford, UK) at the Genome Center of Grandomics (Wuhan, China) for real-time single-molecule sequencing. The Guppy basecaller (v6.0.7) was used to convert the raw signal into DNA bases with the following parameters (--flowcell FLO-PRO002 --kit SQK-LSK109 --basecaller r9.4.1 --min_length 1000 --compress_fastq). After sequencing and basecalling, the resulting FASTQ data were used for genome assembly. For short-read sequencing, paired-end library construction and sequencing followed previously described methods¹⁰. After quality control, we obtained 15.468 Gb of short reads (coverage: 47.741×) from the Illumina platform and 42.105 Gb of long reads (coverage: 129.954×) from the Nanopore platform for genome assembly (Table 2).

Table 2.

Statistics for sequencing data for Pseudococcus jackbeardsleyi genome assembly.

Method	Insert size (bp)	Data (Gb)	Coverage (×)	Usage
Illumina NovaSeq	500	15.468	47.741	Survey, correction
Nanopore	20,000	42.105	129.954	De novo assembly
Hi-C library	100–500	33.114	102.204	Chromosome-level assembly
Total	/	90.687	279.899	/
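
The relationship between data volume and coverage in Table 2 can be illustrated with a short calculation. The sketch below assumes the usual definition (coverage = sequenced bases / genome size) and back-calculates the genome size implied by each row. The function and variable names are ours, for illustration only. The implied denominator comes out at roughly 324 Mb for every library, which is our inference from the table and is not stated in the text.

```python
# Sequencing yield (Gb) and reported coverage (x) for each library in Table 2.
libraries = {
    "Illumina NovaSeq": (15.468, 47.741),
    "Nanopore": (42.105, 129.954),
    "Hi-C library": (33.114, 102.204),
}


def implied_genome_size_mb(data_gb, coverage):
    """Genome size (Mb) implied by coverage = data / genome size."""
    return data_gb * 1000.0 / coverage


# Genome size implied by each row, plus the column totals.
implied = {name: implied_genome_size_mb(d, c) for name, (d, c) in libraries.items()}
total_data_gb = sum(d for d, _ in libraries.values())
total_coverage = sum(c for _, c in libraries.values())

for name, size_mb in implied.items():
    print(f"{name}: implied genome size ~{size_mb:.1f} Mb")
print(f"Total: {total_data_gb:.3f} Gb at {total_coverage:.3f}x")
```

The column totals reproduce the "Total" row of Table 2 (90.687 Gb and 279.899×).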

Oxford Nanopore long reads were used for de novo genome assembly. Raw reads were corrected and assembled using NextDenovo v2.4.0 (https://github.com/Nextomics/NextDenovo) with default parameters to generate a draft assembly. NextPolish v1.3.1¹¹ was then used to improve single-base accuracy with standard parameters, as described in a previous publication⁹. This process resulted in 96 contigs with a contig N50 of 7.767 Mb (Table 1). Genome size, heterozygosity, and duplication rate were estimated by the k-mer method: 21-mers were counted in the Illumina short reads with Jellyfish v2.2.9¹² (Fig. 1a), and the parameters were estimated with GenomeScope v1.0¹³. BUSCO v4.1.4 was used to assess the completeness of the assembly based on the insecta_odb10 database (1,367 genes)¹⁴, revealing that 95.7% of the genes were complete in the contig-level genome (Table 3).
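
The idea behind k-mer based genome size estimation can be sketched in a few lines. This is a deliberately simplified illustration, not the Jellyfish/GenomeScope workflow used here: GenomeScope fits a statistical model to the k-mer histogram, whereas the sketch below simply divides the total number of k-mers by the depth of the main histogram peak. The toy genome, the read layout and all function names are hypothetical.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Peak-based estimate: total k-mers divided by the main peak depth.

    K-mers seen fewer than min_depth times are treated as errors and ignored.
    """
    histogram = Counter(kmer_counts.values())  # depth -> number of distinct k-mers
    depths = [d for d in histogram if d >= min_depth]
    peak_depth = max(depths, key=lambda d: histogram[d])
    total_kmers = sum(d * histogram[d] for d in depths)
    return round(total_kmers / peak_depth)


# Toy data: a random 2,000 bp circular genome tiled by 100 bp error-free reads
# starting every 10 bp, so every 21-mer is covered by exactly 8 reads.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2000))
circular = genome + genome[:99]
reads = [circular[start:start + 100] for start in range(0, 2000, 10)]

kmer_counts = count_kmers(reads, k=21)
estimate = estimate_genome_size(kmer_counts)
print(estimate)
```

With real reads, sequencing errors, heterozygosity and repeats distort the histogram, which is why model-based tools are preferred in practice.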

Fig. 1 — (a) Kmer (21) distribution and estimated genome size, heterozygosity and duplication rate; (b) genome-wide all-by-all Hi-C interaction.

Table 3.

Completeness of Pseudococcus jackbeardsleyi genome assembly and annotation evaluated by BUSCO based on insecta_odb10 database (1,367 genes).

Source	Complete (C)	Single copy (S)	Duplicated (D)	Fragmented (F)	Missing (M)
Contig-level	95.7%	93.0%	2.7%	0.7%	3.6%
Chromosome-level	95.7%	93.4%	2.3%	0.8%	3.5%
Annotation	92.8%	89.5%	3.3%	2.3%	4.9%
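
As a quick sanity check on Table 3, the BUSCO categories should satisfy C = S + D and C + F + M = 100%. The sketch below verifies this for the three rows and converts the completeness percentages into approximate gene counts out of the 1,367 insecta_odb10 genes. The counts are approximations derived from rounded percentages, not values reported in the text, and the helper name is hypothetical.

```python
N_BUSCO_GENES = 1367  # size of the insecta_odb10 set, as stated in the text

# (Complete, Single copy, Duplicated, Fragmented, Missing) in percent, from Table 3.
busco = {
    "Contig-level": (95.7, 93.0, 2.7, 0.7, 3.6),
    "Chromosome-level": (95.7, 93.4, 2.3, 0.8, 3.5),
    "Annotation": (92.8, 89.5, 3.3, 2.3, 4.9),
}


def is_consistent(row, tol=0.051):
    """Check C = S + D and C + F + M = 100 within rounding tolerance."""
    complete, single, duplicated, fragmented, missing = row
    return (abs(complete - (single + duplicated)) < tol
            and abs(complete + fragmented + missing - 100.0) < tol)


consistent = {name: is_consistent(row) for name, row in busco.items()}
approx_complete = {name: round(row[0] / 100.0 * N_BUSCO_GENES)
                   for name, row in busco.items()}

for name in busco:
    print(name, consistent[name], f"~{approx_complete[name]} complete genes")
```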

Hi-C sequencing and chromosome anchoring

The Hi-C library was constructed following a standard protocol described previously, with certain modifications¹⁵. Briefly, 20 females of P. jackbeardsleyi were ground in 2% formaldehyde to cross-link cellular proteins. Cross-linking was halted by adding glycine and applying additional vacuum infiltration. Fixed tissue was then ground into a powder and resuspended in nuclei isolation buffer, and the purified nuclei were digested with 100 units of DpnII restriction enzyme and marked by incubation with biotin-14-dATP as described in previous studies^9,10. Hi-C libraries were quantified using quantitative real-time PCR with a library quantification kit/Illumina GA Universal (KAPA, Wilmington, MA, USA). Subsequently, the libraries were sequenced on the Illumina NovaSeq platform, generating 150 bp paired-end reads. In total, 33.114 Gb (coverage: 102.204×) of Hi-C data for P. jackbeardsleyi was generated (Table 2). The Juicer v1.6 and 3D de novo assembly (3D-DNA) pipelines were used to assemble the scaffolds into a chromosome-level genome^16,17. Of the sequenced read pairs, 84.12% were normal paired reads, while the others were chimeric paired (12.24%), chimeric ambiguous (2.21%) or unmapped (1.43%), and 26.08% of the read pairs represented Hi-C contacts (Table 4). The assembled contigs were clustered into 5 linkage groups with an N50 of 67.233 Mb (Fig. 1b, Table 1). BUSCO was also used to evaluate the completeness of the chromosome-level genome, in which 95.7% of the genes were identified as complete (Table 3).

Table 4.

Summary of Hi-C data for chromosome-level assembly of Pseudococcus jackbeardsleyi genome.

Parameter	Value
Sequenced Read Pairs	109,359,863
Normal Paired	91,989,238 (84.12%)
Chimeric Paired	13,386,340 (12.24%)
Chimeric Ambiguous	2,420,030 (2.21%)
Unmapped	1,564,255 (1.43%)
Ligation Motif Present	67,777,133 (61.98%)
Alignable (Normal + Chimeric Paired)	105,375,578 (96.36%)
Unique Reads	45,133,938 (41.27%)
PCR Duplicates	59,592,184 (54.49%)
Optical Duplicates	649,456 (0.59%)
Library Complexity Estimate	52,123,569
Intra-fragment Reads	8,501,126 (7.77%/18.84%)
Below MAPQ Threshold	8,110,642 (7.42%/17.97%)
Hi-C Contacts	28,522,170 (26.08%/63.19%)
Ligation Motif Present	20,951,834 (19.16%/46.42%)
3’ Bias (Long Range)	78% - 22%
Pair Type %(L-I-O-R)	25% - 25% - 25% - 25%
Inter-chromosomal	5,178,347 (4.74%/11.47%)
Intra-chromosomal	23,343,823 (21.35%/51.72%)
Short Range (<20Kb)	19,074,662 (17.44%/42.26%)
Long Range (>20Kb)	4,268,693 (3.90%/9.46%)
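
The read-pair categories in Table 4 are nested, and the way they add up can be checked directly from the reported counts. The sketch below is our reading of the table layout: the dictionary keys are shortened labels, and the two percentages given for some rows are taken to be relative to sequenced read pairs and to unique reads, respectively.

```python
# Read-pair counts copied from Table 4.
hic = {
    "sequenced": 109_359_863,
    "normal_paired": 91_989_238,
    "chimeric_paired": 13_386_340,
    "chimeric_ambiguous": 2_420_030,
    "unmapped": 1_564_255,
    "alignable": 105_375_578,
    "unique": 45_133_938,
    "pcr_duplicates": 59_592_184,
    "optical_duplicates": 649_456,
    "intra_fragment": 8_501_126,
    "below_mapq": 8_110_642,
    "hic_contacts": 28_522_170,
    "inter_chromosomal": 5_178_347,
    "intra_chromosomal": 23_343_823,
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Each level of the hierarchy should sum to its parent category.
checks = {
    "sequenced = normal + chimeric paired + chimeric ambiguous + unmapped":
        hic["normal_paired"] + hic["chimeric_paired"]
        + hic["chimeric_ambiguous"] + hic["unmapped"] == hic["sequenced"],
    "alignable = normal + chimeric paired":
        hic["normal_paired"] + hic["chimeric_paired"] == hic["alignable"],
    "alignable = unique + PCR duplicates + optical duplicates":
        hic["unique"] + hic["pcr_duplicates"]
        + hic["optical_duplicates"] == hic["alignable"],
    "unique = intra-fragment + below MAPQ + Hi-C contacts":
        hic["intra_fragment"] + hic["below_mapq"]
        + hic["hic_contacts"] == hic["unique"],
    "Hi-C contacts = inter- + intra-chromosomal":
        hic["inter_chromosomal"] + hic["intra_chromosomal"] == hic["hic_contacts"],
}

contacts_of_sequenced = percent(hic["hic_contacts"], hic["sequenced"])
contacts_of_unique = percent(hic["hic_contacts"], hic["unique"])

for label, ok in checks.items():
    print(f"{label}: {ok}")
print(f"Hi-C contacts: {contacts_of_sequenced:.2f}% of sequenced, "
      f"{contacts_of_unique:.2f}% of unique")
```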

RNA-seq

The cDNA libraries were constructed with the TruSeq™ RNA sample preparation Kit (Illumina, San Diego, CA, USA) using 1 μg of total RNA. Libraries were size-selected for 300 bp target fragments on 2% low range ultra-agarose, followed by PCR amplification for 15 cycles using Phusion DNA polymerase (NEB, Ipswich, MA, USA). After quantification by TBS380 (Picogreen, Waltham, MA, USA), the paired-end library was sequenced on an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) at Majorbio Bio-pharm Technology Co., Ltd (Shanghai, China). Trinity v2.11.0¹⁸ was used for de novo assembly of transcripts from the raw RNA-seq data with the following parameters (--seqType fq --max_memory 200G --left R1.raw.fastq --right R2.raw.fastq --CPU 60 --trimmomatic --output pj_trinity).

Gene structure and function annotation

Gene structure annotation was performed using the Maker v3.01.03 genome annotation pipeline, following established protocols^10,19. The RNA-seq evidence described above was used in genome annotation to improve exon nucleotide accuracy. Gene functions were annotated using eggNOG-mapper v2.1.7²⁰. A total of 10,908 protein-coding genes were annotated, and BUSCO analysis showed that 92.8% of the evaluated single-copy genes were complete, with 2.3% fragmented and 4.9% missing (Table 3). Among scale insects, species with chromosome-level assemblies tend to have fewer annotated genes than those with scaffold-level assemblies, which may be due to redundancy caused by insufficient assembly levels (Table 5).

Table 5.

Statistics for number of protein-coding genes in the genome of Pseudococcus jackbeardsleyi and other scale insects.

Species name	Assembly level	No. of protein-coding genes
Pseudococcus jackbeardsleyi	Chr.	10,908
Pseudococcus viburni	Scaf.	23,629
Phenacoccus solenopsis	Chr.	11,880
Planococcus citri	Chr.	18,954
Coronaproctus castanopsis	Chr.	10,542
Ferrisia virgata	Scaf.	47,978
Maconellicoccus hirsutus	Scaf.	21,623
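
The tendency described for Table 5 can be summarised by averaging gene counts per assembly level. The sketch below only restates the table. With seven species it illustrates the pattern but cannot establish its cause, and the variable names are ours.

```python
from statistics import mean

# (assembly level, number of protein-coding genes) from Table 5.
gene_counts = {
    "Pseudococcus jackbeardsleyi": ("Chr.", 10_908),
    "Pseudococcus viburni": ("Scaf.", 23_629),
    "Phenacoccus solenopsis": ("Chr.", 11_880),
    "Planococcus citri": ("Chr.", 18_954),
    "Coronaproctus castanopsis": ("Chr.", 10_542),
    "Ferrisia virgata": ("Scaf.", 47_978),
    "Maconellicoccus hirsutus": ("Scaf.", 21_623),
}

# Group gene counts by assembly level and average them.
by_level = {}
for level, n_genes in gene_counts.values():
    by_level.setdefault(level, []).append(n_genes)
mean_by_level = {level: mean(counts) for level, counts in by_level.items()}

for level, value in mean_by_level.items():
    print(f"{level}: mean {value:,.0f} genes across {len(by_level[level])} species")
```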

Repeats and non-coding RNA (ncRNA) annotation

RepeatMasker v4.0.7 was used to detect repetitive elements in scaffolds longer than 1,000 bp against the Insecta repeats within RepBase Update²¹ (http://www.girinst.org). For ab initio prediction, RepeatModeler v2.0.1 (http://www.repeatmasker.org/RepeatModeler.html, RRID: SCR_015027) was first used to construct a de novo candidate database of repetitive elements. Among the repetitive sequences, retroelements and DNA transposons accounted for 5.22% and 5.61% of the whole genome, respectively (Table 6). In total, 4,380 satellites and 63,734 simple repeats were identified as tandem repeats (TRs), accounting for 0.19% and 0.86% of the P. jackbeardsleyi genome, respectively (Table 6). For ncRNA annotation, transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) were predicted by tRNAscan-SE and RNAmmer, respectively, with default parameters^22,23. MicroRNAs (miRNAs) were predicted by aligning the genomic sequence against the Rfam v14.10 database (http://rfam.xfam.org/) using BLASTN²⁴. A total of 219 tRNAs, 83 rRNAs, and 28 miRNAs were predicted in the P. jackbeardsleyi genome (Table 7). A circular diagram illustrating gene count, repeat density and GC content was generated using Circos²⁵ (Fig. 2).

Table 6.

Statistics for repeat elements in the genome of Pseudococcus jackbeardsleyi.

Types	Number	Length (bp)	Percentage (%)
Retroelements	23,414	17,479,410	5.22
SINEs	61	9,813	0
Penelope	31	29,910	0.01
LINEs	6,205	1,671,215	0.5
CRE/SLACS	0	0	0
L2/CR1/Rex	927	328,377	0.1
R1/LOA/Jockey	2,848	570,516	0.17
R2/R4/NeSL	0	0	0
RTE/Bov-B	136	27,046	0.01
L1/CIN4	0	0	0
LTR elements	17,148	15,798,382	4.72
BEL/Pao	590	733,282	0.22
Ty1/Copia	2,294	2,695,829	0.81
Gypsy/DIRS1	7,417	9,319,913	2.78
Retroviral	1,004	109,343	0.03
DNA transposons	82,721	18,792,160	5.61
hobo-Activator	29,616	6,473,573	1.93
Tc1-IS630-Pogo	16,404	3,948,341	1.18
En-Spm	0	0	0
MuDR-IS905	0	0	0
PiggyBac	190	101,936	0.03
Tourist/Harbinger	2,530	622,867	0.19
Other (Mirage, P-element, Transib)	0	0	0
Rolling-circles	7,887	2,366,428	0.71
Unclassified	313,462	79,851,041	23.85
Total interspersed repeats	419,597	116,122,611	34.68
Small RNA	248	228,707	0.07
Satellites	4,380	624,937	0.19
Simple repeats	63,734	2,869,998	0.86
Low complexity	14,688	1,558,602	0.47
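
The percentages in Table 6 follow from dividing each repeat length by the assembly size. The sketch below recomputes a few rows, assuming a genome size of exactly 334,818,000 bp (the 334.818 Mb figure from the text; the exact base count is not given). It also confirms that the interspersed total is the sum of retroelements, DNA transposons and unclassified repeats.

```python
GENOME_SIZE_BP = 334_818_000  # assumption: 334.818 Mb taken as an exact base count

# (length in bp, reported percentage) for selected rows of Table 6.
repeats = {
    "Retroelements": (17_479_410, 5.22),
    "DNA transposons": (18_792_160, 5.61),
    "Unclassified": (79_851_041, 23.85),
    "Total interspersed repeats": (116_122_611, 34.68),
    "Satellites": (624_937, 0.19),
    "Simple repeats": (2_869_998, 0.86),
}


def percent_of_genome(length_bp, genome_bp=GENOME_SIZE_BP):
    """Share of the genome (in percent) occupied by a repeat class."""
    return 100.0 * length_bp / genome_bp


computed = {name: percent_of_genome(bp) for name, (bp, _) in repeats.items()}
interspersed_sum = sum(
    repeats[name][0] for name in ("Retroelements", "DNA transposons", "Unclassified")
)

for name, (_, reported) in repeats.items():
    print(f"{name}: computed {computed[name]:.2f}%, reported {reported}%")
```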

Table 7.

Statistics for noncoding RNA genes in the genome of Pseudococcus jackbeardsleyi.

Types		Number
Infernal stats	Candidate tRNAs read	244
	Infernal-confirmed tRNAs	219
	Bases scanned by Infernal	24,287
tRNA count	tRNAs decoding Standard 20 AA	195
	Selenocysteine tRNAs (TCA)	0
	Possible suppressor tRNAs (CTA,TTA,TCA)	0
	tRNAs with undetermined/unknown isotypes	10
	Predicted pseudogenes	14
Total tRNAs		219
tRNAs with introns		17
rRNA	5s rRNA	26
	5.8s rRNA	14
	18s rRNA	19
	28s rRNA	24
miRNA		28

Fig. 2 — The outer layer of coloured blocks is a circular representation of the 5 linkage-groups and circos demonstration of gene count (histogram), repeat density (heatmap) and GC content (line) from the outer to the inner circle, respectively.

Gene family analysis

Twelve gene families associated with detoxification and chemosensory functions were manually annotated in P. jackbeardsleyi, including cytochrome P450 monooxygenases (P450s), glutathione S-transferases (GSTs), carboxyl/cholinesterases (CCEs), UDP-glycosyltransferases (UGTs), ATP-binding cassette (ABC) transporters, heat shock proteins (HSPs), odorant binding proteins (OBPs), odorant receptors (ORs), gustatory receptors (GRs), ionotropic receptors (IRs), chemosensory proteins (CSPs), and sensory neuron membrane proteins (SNMPs). HMMER and BLAST analyses were conducted with the bioinformatic pipeline BITACORA (full mode)²⁶. Genes were annotated with a default cutoff E-value of 10e-5 and manually verified based on gene length and conserved domains sourced from the SMART database²⁷. In total, we identified 83 P450s, 16 GSTs, 150 CCEs, 38 UGTs, 83 ABC transporters and 47 HSPs in the P. jackbeardsleyi genome (Table 8). Additionally, there are 81 chemosensory genes in P. jackbeardsleyi, including 21 OBPs, 5 ORs, 15 GRs, 19 IRs, 9 CSPs and 12 SNMPs (Table 8).

Table 8.

Statistics for 12 gene families of Pseudococcus jackbeardsleyi.

Gene family	Annotated genes identified	Manually annotated genes	Total identified genes	Total identified genes after clustering identical sequences
P450	83	0	83	83
GST	16	1	17	17
CCE	150	0	150	150
UGT	38	0	38	38
ABC	83	3	86	86
OBP	21	1	22	22
OR	5	16	21	21
GR	15	8	23	23
IR	19	8	27	27
CSP	9	2	11	11
SNMP	12	0	12	12
HSP	47	17	64	62
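
The columns of Table 8 relate to one another in a simple way: the total is the sum of the annotated and manually added genes, and the last column removes identical sequences. The sketch below recomputes these relationships and the chemosensory subtotal quoted in the text. The dictionary layout and names are ours.

```python
# (annotated, manually annotated, total, total after clustering) from Table 8.
families = {
    "P450": (83, 0, 83, 83),
    "GST": (16, 1, 17, 17),
    "CCE": (150, 0, 150, 150),
    "UGT": (38, 0, 38, 38),
    "ABC": (83, 3, 86, 86),
    "OBP": (21, 1, 22, 22),
    "OR": (5, 16, 21, 21),
    "GR": (15, 8, 23, 23),
    "IR": (19, 8, 27, 27),
    "CSP": (9, 2, 11, 11),
    "SNMP": (12, 0, 12, 12),
    "HSP": (47, 17, 64, 62),
}
CHEMOSENSORY = ("OBP", "OR", "GR", "IR", "CSP", "SNMP")

# Annotated + manually annotated should equal the reported total in every row.
totals_match = all(a + m == t for a, m, t, _ in families.values())

# Chemosensory genes before and after manual annotation.
chemo_annotated = sum(families[f][0] for f in CHEMOSENSORY)
chemo_total = sum(families[f][2] for f in CHEMOSENSORY)

# Families in which clustering identical sequences removed entries.
collapsed = {f: t - c for f, (_, _, t, c) in families.items() if t != c}

print(totals_match, chemo_annotated, chemo_total, collapsed)
```

The subtotal of 81 chemosensory genes given in the text corresponds to the annotated column. Including the manually annotated genes raises it to 116.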

Data Records

The genome assembly is available at the National Center for Biotechnology Information (NCBI) under the accession number JAZDXF000000000²⁸. The NCBI BioProject accession number is PRJNA1070360. RNA-seq, Hi-C and Illumina raw reads have been deposited in the Sequence Read Archive (SRA) under the accession number SRP486604²⁹. In addition, the annotation files for the genome, ncRNA and repeat content have been deposited at figshare^30–32.

Technical Validation

The integrity of the extracted DNA was assessed by agarose gel electrophoresis, and DNA concentration was determined using a NanoDrop and a Qubit 3.0 Fluorometer, with an absorbance ratio (260/280) of approximately 2.0. The scaffold N50, the length such that scaffolds of this length or longer contain at least half of the total assembly, reached 67.233 Mb, exceeding that of many other published scale insect genomes (Table 1). We evaluated the completeness of the genome assembly using the sequence identity method, aligning small-fragment library reads to the assembled genome using BWA. BUSCO analysis showed 95.7% completeness (Table 3), supporting the high quality of the genome assembly. The percentage of duplicated single-copy genes assessed by BUSCO was low at 2.3% (Table 3), indicating that duplication was not a significant issue in the assembly. Furthermore, BlobTools was used to screen the assembly for potential contamination and revealed none. These results indicate that we successfully obtained a high-quality genome of P. jackbeardsleyi.
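
For readers unfamiliar with the N50 statistic, a minimal sketch of its computation follows. The scaffold lengths in the example are hypothetical toy values, since the individual linkage-group lengths are not listed in this paper, and the function name is ours.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total length; the length of the
    sequence at which this happens is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (Mb); total = 290, half = 145.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In the toy example the two longest scaffolds (80 + 70 = 150) already cover half of the 290 Mb total, so the N50 is 70.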

Acknowledgements

This research was supported by the National Key Research and Development Programme of China (2021YFF0601901), Beijing Natural Science Foundation (6244049), Hainan Natural Science Foundation (323MS065) and the China Agriculture Research System of MOF and MARA.

Author contributions

Shaokun Guo conceived and designed the study; Shaokun Guo and Bo Liu conducted the molecular work; Guoping Zhan and Qingying Zhao provided the insect pictures; Shaokun Guo, Guoping Zhan and Zhihong Li discussed the results; Shaokun Guo analyzed the data and wrote the manuscript.

Code availability

The data analyses were performed according to the manuals and protocols provided by the developers of the corresponding bioinformatics tools. All software and code used in this work are publicly available, with the corresponding versions indicated in Methods.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Garcia Morales, M. et al. ScaleNet: a literature-based model of scale insect biology and systematics. Database 2016, bav118 (2016).
2. Miller, D., Miller, G. & Watson, G. Invasive species of mealybugs (Hemiptera: Pseudococcidae) and their threat to US agriculture. P. Entomol. Soc. Wash. 104, 825–836 (2002).
3. Bellotti, A. C. et al. Cassava pests in Latin America, Africa and Asia. (Centro Internacional de Agricultura Tropical (CIAT), 2011).
4. Meyer, J. B., Kasdorf, G. G. F., Nel, L. H. & Pietersen, G. Transmission of activated-episomal Banana streak OL (badna) virus (BSOLV) to cv. Williams Banana (Musa sp.) by three mealybug species. Plant Dis. 92, 1158–1163 (2008). doi:10.1094/PDIS-92-8-1158
5. Williams, D. The distribution of the neotropical mealybug Pseudococcus elisae Borchsenius in the Pacific region and Southern Asia (Hem.-Hom., Pseudococcidae). Entomologist’s Monthly Magazine 124, 123–124 (1988).
6. CABI. Pseudococcus jackbeardsleyi (Jack Beardsley mealybug). CABI Compendium, https://www.cabi.org/cpc/datasheet/45087 (2021).
7. Puig, A. S., Wurzel, S., Suarez, S., Marelli, J. P. & Niogret, J. Mealybug (Hemiptera: Pseudococcidae) species associated with cacao mild mosaic virus and evidence of virus acquisition. Insects 12, 994 (2021). doi:10.3390/insects12110994
8. Williams, D. J. & Watson, G. W. Scale insects of the tropical South Pacific region. Part 2. Mealybugs (Pseudococcidae). (CAB International, 1988).
9. Guo, S. et al. Chromosome-level genome assembly of an important wolfberry fruit fly (Neoceratitis asiatica Becker). Sci. Data 10, 675 (2023). doi:10.1038/s41597-023-02601-5
10. Guo, S. et al. Chromosome-level assembly of the melon thrips genome yields insights into evolution of a sap-sucking lifestyle and pesticide resistance. Mol. Ecol. Resour. 20, 1110–1125 (2020). doi:10.1111/1755-0998.13189
11. Hu, J., Fan, J. P., Sun, Z. Y. & Liu, S. L. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020). doi:10.1093/bioinformatics/btz891
12. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). doi:10.1093/bioinformatics/btr011
13. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017). doi:10.1093/bioinformatics/btx153
14. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021). doi:10.1093/molbev/msab199
15. Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65 (2017). doi:10.1016/j.ymeth.2017.04.004
16. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017). doi:10.1126/science.aal3327
17. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016). doi:10.1016/j.cels.2016.07.002
18. Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29, 644–652 (2011). doi:10.1038/nbt.1883
19. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008). doi:10.1101/gr.6743907
20. Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021). doi:10.1093/molbev/msab293
21. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf. 25, unit 4.10 (2009).
22. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007). doi:10.1093/nar/gkm160
23. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). doi:10.1093/nar/25.5.955
24. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2020). doi:10.1093/nar/gkaa1047
25. Krzywinski, M. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009). doi:10.1101/gr.092759.109
26. Vizueta, J., Sanchez-Gracia, A. & Rozas, J. BITACORA: A comprehensive tool for the identification and annotation of gene families in genome assemblies. Mol. Ecol. Resour. 20, 1445–1452 (2020). doi:10.1111/1755-0998.13202
27. Letunic, I., Khedkar, S. & Bork, P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 49, D458–D460 (2020). doi:10.1093/nar/gkaa937
28. NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_038380155.1 (2024).
29. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486604 (2024).
30. Guo, S. Pseudococcus jackbeardsleyi genome annotation. figshare, doi:10.6084/m9.figshare.25622025.v1 (2024).
31. Guo, S. Pseudococcus jackbeardsleyi noncoding RNA prediction. figshare, doi:10.6084/m9.figshare.26268106.v1 (2024).
32. Guo, S. Pseudococcus jackbeardsleyi repeat content annotation. figshare, doi:10.6084/m9.figshare.26268229.v1 (2024).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

NCBI Assemblyhttps://identifiers.org/ncbi/insdc.gca:GCA_038380155.1 (2024).
NCBI Sequence Read Archivehttps://identifiers.org/ncbi/insdc.sra:SRP486604 (2024).
Guo, S. Pseudococcus jackbeardsleyi genome annotation. figshare10.6084/m9.figshare.25622025.v1 (2024). 10.6084/m9.figshare.25622025.v1 [DOI]
Guo, S. Pseudococcus jackbeardsleyi noncoding RNA prediction. figshare10.6084/m9.figshare.26268106.v1 (2024). 10.6084/m9.figshare.26268106.v1 [DOI]

Data Availability Statement

[CR1] 1.Garcia Morales, M. et al. ScaleNet: a literature-based model of scale insect biology and systematics. Database2016, bav118 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Miller, D., Miller, G. & Watson, G. Invasive species of mealybugs (Hemiptera: Pseudococcidae) and their threat to US agriculture. P. Entomol. Soc. Wash.104, 825–836 (2002). [Google Scholar]

[CR3] 3.Bellotti, A. C., et al.Cassava pests in Latin America, Africa and Asia. (Centro Internacional de Agricultura Tropical (CIAT), 2011).

[CR4] 4.Meyer, J. B., Kasdorf, G. G. F., Nel, L. H. & Pietersen, G. Transmission of activated-episomal Banana streak OL (badna) virus (BSOLV) to cv. Williams Banana (Musa sp.) by three mealybug species. Plant Dis.92, 1158–1163 (2008). 10.1094/PDIS-92-8-1158 [DOI] [PubMed] [Google Scholar]

[CR5] 5.Williams, D. The distribution of the neotropical mealybug Pseudococcus elisae Borchsenius in the Pacific region and Southern. Asia (Hem.-Hom., Pseudococcidae). Entomologist’s Monthly Magazine124, 123–124 (1988). [Google Scholar]

[CR6] 6.CABI. Pseudococcus jackbeardsleyi (Jack Beardsley mealybug). CABI Compendium, https://www.cabi.org/cpc/datasheet/45087 (2021).

[CR7] 7.Puig, A. S., Wurzel, S., Suarez, S., Marelli, J. P. & Niogret, J. Mealybug (Hemiptera: Pseudococcidae) species associated with cacao mild mosaic virus and evidence of virus acquisition. Insects12, 994 (2021). 10.3390/insects12110994 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Williams, D. J. & Watson, G. W. Scale insects of the tropical South Pacific region. Part 2. Mealybugs (Pseudococcidae). (CAB International, 1988).

9. Guo, S. et al. Chromosome-level genome assembly of an important wolfberry fruit fly (Neoceratitis asiatica Becker). Sci. Data 10, 675 (2023). https://doi.org/10.1038/s41597-023-02601-5

10. Guo, S. et al. Chromosome-level assembly of the melon thrips genome yields insights into evolution of a sap-sucking lifestyle and pesticide resistance. Mol. Ecol. Resour. 20, 1110–1125 (2020). https://doi.org/10.1111/1755-0998.13189

11. Hu, J., Fan, J. P., Sun, Z. Y. & Liu, S. L. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020). https://doi.org/10.1093/bioinformatics/btz891

12. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). https://doi.org/10.1093/bioinformatics/btr011

13. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017). https://doi.org/10.1093/bioinformatics/btx153

14. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021). https://doi.org/10.1093/molbev/msab199

15. Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65 (2017). https://doi.org/10.1016/j.ymeth.2017.04.004

16. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017). https://doi.org/10.1126/science.aal3327

17. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016). https://doi.org/10.1016/j.cels.2016.07.002

18. Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29, 644–652 (2011). https://doi.org/10.1038/nbt.1883

19. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008). https://doi.org/10.1101/gr.6743907

20. Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021). https://doi.org/10.1093/molbev/msab293

21. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinf. 25, unit 4.10 (2009).

22. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007). https://doi.org/10.1093/nar/gkm160

23. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997). https://doi.org/10.1093/nar/25.5.955

24. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2020). https://doi.org/10.1093/nar/gkaa1047

25. Krzywinski, M. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009). https://doi.org/10.1101/gr.092759.109

26. Vizueta, J., Sanchez-Gracia, A. & Rozas, J. BITACORA: A comprehensive tool for the identification and annotation of gene families in genome assemblies. Mol. Ecol. Resour. 20, 1445–1452 (2020). https://doi.org/10.1111/1755-0998.13202

27. Letunic, I., Khedkar, S. & Bork, P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 49, D458–D460 (2020). https://doi.org/10.1093/nar/gkaa937

28. NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_038380155.1 (2024).

29. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486604 (2024).

30. Guo, S. Pseudococcus jackbeardsleyi genome annotation. figshare https://doi.org/10.6084/m9.figshare.25622025.v1 (2024).

31. Guo, S. Pseudococcus jackbeardsleyi noncoding RNA prediction. figshare https://doi.org/10.6084/m9.figshare.26268106.v1 (2024).

32. Guo, S. Pseudococcus jackbeardsleyi repeat content annotation. figshare https://doi.org/10.6084/m9.figshare.26268229.v1 (2024).
