Data in Brief. 2024 Mar 6;54:110298. doi: 10.1016/j.dib.2024.110298

A de novo assembly of genomic dataset sequences of the sugar beet root maggot Tetanops myopaeformis, TmSBRM_v1.0

Nadim W Alkharouf, Chenggen Chu, Vincent P Klink
PMCID: PMC10965452  PMID: 38544912

Abstract

The sugar beet root maggot (SBRM), Tetanops myopaeformis (von Röder), is a devastating insect pathogen of sugar beet (SB), Beta vulgaris ssp. vulgaris (B. vulgaris). SB is an important food crop and one of only two plants from which sugar is widely produced globally, accounting for 35% of global raw sugar with an annual farm value of $3 billion in the United States alone. SBRM is the most devastating pathogen of sugar beet in North America. Because B. vulgaris has limited natural resistance, an understanding of the SBRM genome is needed to build knowledge of its basic biology, including the interaction between the pathogen and its host(s). Presented here is the de novo assembled draft genome sequence of T. myopaeformis isolated from field-grown B. vulgaris in North Dakota, USA. The SBRM genome sequence, TmSBRM_v1.0, will be valuable for developing molecular genetic markers that facilitate the identification and study of host resistance genes, including SB polygalacturonase inhibiting protein (PGIP); for developing new control strategies for this pathogen; for examining its relationship to model genetic organisms such as Drosophila melanogaster; and for aiding the agronomic improvement of sugar beet for stakeholders. It will also provide information on the relationship between the SBRM and climate change.

Keywords: Genome, Plant pathogen, Insect, Evolution, Drosophila, Climate change, Agronomic, Stakeholder


Specifications Table

Subject: Omics
Specific subject area: Genomics
Data format: Raw, Analyzed, Filtered
Type of data: Table
Data collection: A draft of the T. myopaeformis genome was assembled from isolated genomic DNA from a mate-pair library using the PacBio Sequel Revio sequencing platform.
Data source location: Fargo, ND, USA
Data accessibility: Repository name: NCBI
 Data identification number: BioSample accession SAMN37733483; Temporary SubmissionID SUB13882507; BioProject ID PRJNA1026092
 Direct URL to data: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1026092
 The data have been deposited in the GenBank SRA archive at NCBI under embargo and met the repository's requirements for submission. The BioSample accession, Temporary SubmissionID, and BioProject ID that relate to the data are provided above.

1. Value of the Data

  • The DNA sequence reads provide data that researchers can use to study insect evolution, the evolution of a pathological niche, host selection, ecology, climate change, and other facets, improving the understanding of the insect's biology and agronomic impact(s).

  • The data, deposited in a public database, are available freely for use.

  • The data are anticipated to be used for scientific purposes. The genomic data can be used to identify targets for suppressing essential pathogen gene function through RNA interference, mutagenesis, or CRISPR/Cas9-mediated gene editing that synthetically modifies genes [[1], [2], [3], [4], [5]]. Suppressing the function of essential SBRM genes leads to the death of the pathogen, allowing the plant to grow unimpeded by the pathogen's otherwise detrimental effects, which would improve the agronomic value of the crop. The reference genome will also allow better scientific investigation of this insect and related insect species. The advantage of having a reference genome has been demonstrated repeatedly in biological studies, with subsequent generations of the genome improving on the original sequence [6] (Hoskins et al. 2014).

2. Background

Tetanops myopaeformis (von Röder), the sugar beet root maggot (SBRM), is a devastating pathogen of sugar beet (SB), Beta vulgaris ssp. vulgaris (B. vulgaris) [[1], [6]]. SB is an important food crop and one of only two plants from which sugar is widely produced globally, accounting for 35% of global raw sugar with an annual farm value of $3 billion in the United States alone [[1], [6]]. SBRM is the most devastating pathogen of sugar beet in North America [[1], [6]]. Agricultural control of SBRM is limited by a scarcity of genetic knowledge. The analysis presented here aims to provide such a genetic resource for the scientific community and a basis for future updates. The work describes the data, explains their utility to the community, provides protocols and references, and provides a documented link to the data, which are in a standard, reusable format.

3. Data Description

A draft de novo assembly of the T. myopaeformis genome was made (BioSample accession: SAMN37733483, Temporary SubmissionID: SUB13882507, BioProject ID: PRJNA1026092). The data are available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1026092. PacBio HiFi reads were assembled using the Flye pipeline, version 2.9.2 [2]. Default values were used, except that the --asm-coverage argument was set to 50 to reduce memory consumption. Flye was installed and run on the Windows Subsystem for Linux (Ubuntu 22.04) running on a Windows 2022 workstation with 45 GB of memory. The T. myopaeformis sequencing and assembly statistics are summarized in Table 1. The mate-pair library produced a total of 6,356,906 raw reads. The total read length was 71,844,227,661 bp, and the read N50 and N90 were 11,313 bp and 8,294 bp, respectively (Table 1). The assembly had a total length of 414,327,873 bp in 8,228 contigs (Table 1). The contig N50 was 57,402 bp, and the largest contig was 573,329 bp (Table 1). The mean genome coverage was 94x (Table 1).
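The Flye settings described above can be sketched as a command line. This is a minimal illustration, not the authors' exact invocation: the file name, output directory, thread count, and genome size estimate are hypothetical placeholders, and the genome size is included on the assumption that Flye requires a `--genome-size` estimate whenever `--asm-coverage` is set. The sketch only builds and prints the command; it does not run Flye.

```python
import shlex


def build_flye_command(reads_fastq, out_dir, genome_size, asm_coverage=50, threads=8):
    """Return a Flye command as an argument list (nothing is executed here)."""
    return [
        "flye",
        "--pacbio-hifi", reads_fastq,         # HiFi reads as input
        "--out-dir", out_dir,
        "--genome-size", genome_size,         # placeholder estimate, not from the article
        "--asm-coverage", str(asm_coverage),  # 50, as stated in the text
        "--threads", str(threads),            # placeholder, not from the article
    ]


# Hypothetical file names and genome size, for illustration only.
cmd = build_flye_command("sbrm_hifi.fastq", "flye_out", "400m")
print(shlex.join(cmd))
```

Limiting the coverage used for the initial disjointig assembly is what reduces memory use; all reads can still contribute to later stages.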

Table 1.

T. myopaeformis TmSBRM_v1.0 genome assembly statistics.

Sequencing technology: PacBio HiFi
Total number of raw reads: 6,356,906
Total read length (bp): 71,844,227,661
Reads N50/N90 (bp): 11,313 / 8,294
GC%: 40.2
Assembly statistics:
 Total length (bp): 414,327,873
 Number of contigs: 8,228
 Contigs N50 (bp): 57,402
 Largest contig (bp): 573,329
 Mean coverage: 94x
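
For readers unfamiliar with the N50 and N90 values in Table 1, the following is a minimal sketch of how such statistics are conventionally computed from a list of read or contig lengths. The function name `nx` and the toy lengths are illustrative assumptions; this is not the software that produced the table.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic: the length L such that sequences of
    length >= L together contain at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Toy example (total = 30 bases), not data from the article.
toy_lengths = [10, 8, 6, 4, 2]
n50 = nx(toy_lengths, 0.5)  # 10 + 8 = 18 >= 15, so N50 is 8
n90 = nx(toy_lengths, 0.9)  # 10 + 8 + 6 + 4 = 28 >= 27, so N90 is 4
print(n50, n90)
```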

4. Experimental Design, Materials and Methods

Five larvae of T. myopaeformis were collected at Fargo, North Dakota and used as the source of DNA for the analysis. DNA isolation and sequencing of the agriculturally important T. myopaeformis were performed at CD Genomics using their proprietary methods. Briefly, DNA was isolated from larvae flash frozen in liquid nitrogen. For sample preparation and DNA quality control, the quality and quantity of the isolated DNA were assessed using Qubit and the Agilent 5200 Fragment Analyzer. For DNA fragmentation, the DNA was sheared into fragments of 15 Kb or larger using the Covaris g-TUBE and subsequently purified using magnetic beads. For DNA repair and end modification, DNA repair enzyme was used to correct any DNA damage and ensure uniformity. Overhang adapters were then ligated to the repaired DNA ends, followed by purification using magnetic beads. For SMRTbell library preparation, the repaired and adapter-ligated DNA fragments were converted into SMRTbell libraries. For SMRTbell library size selection, the SMRTbell libraries were subjected to BluePippin size selection to enrich the fragments over 9-13 Kb and 15 Kb. For binding to polymerase, the size-selected SMRTbell libraries were bound to DNA polymerase molecules. For DNA sequencing, the prepared SMRTbell libraries, with bound polymerase molecules, were loaded onto the PacBio Sequel Revio sequencing platform with options enabled to retain subreads and kinetics sequencing metrics. The BAM files generated through HiFi sequencing were then converted to FASTQ files using the SAMtools fastq command, and the FASTQ files were used as input for genome assembly.
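
The BAM to FASTQ conversion step mentioned above can be sketched as follows. This is an illustrative wrapper around `samtools fastq`, assuming SAMtools is installed and on the PATH; the function names, file paths, and thread count are hypothetical and are not taken from the article.

```python
import subprocess


def build_bam_to_fastq_command(bam_path, threads=4):
    """Return a `samtools fastq` command as an argument list."""
    # -@ sets the number of additional threads; unpaired reads go to stdout.
    return ["samtools", "fastq", "-@", str(threads), bam_path]


def bam_to_fastq(bam_path, fastq_path, threads=4):
    """Run the conversion, writing samtools' standard output to fastq_path."""
    with open(fastq_path, "w") as out:
        subprocess.run(
            build_bam_to_fastq_command(bam_path, threads),
            stdout=out,
            check=True,  # raise CalledProcessError if samtools fails
        )


# Hypothetical file name; only the command is built here, nothing is run.
example_cmd = build_bam_to_fastq_command("sbrm_hifi_reads.bam")
print(" ".join(example_cmd))
```

Calling `bam_to_fastq("sbrm_hifi_reads.bam", "sbrm_hifi.fastq")` would perform the actual conversion on a machine where SAMtools and the input file are available.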

Limitations

Insects are known to have sex chromosomes, and the assembly produced here may not fully account for sex-linked differences. Subsequent iterations of the SBRM genome sequencing effort can target this detail.

Ethics Statement

The authors have read and follow the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.

CRediT authorship contribution statement

Nadim W. Alkharouf: Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft. Chenggen Chu: Investigation, Resources. Vincent P. Klink: Conceptualization, Methodology, Resources, Visualization, Supervision, Project administration, Funding acquisition, Writing – original draft.

Acknowledgments

This work was supported by the USDA-ARS NP 8042-21220-262-000D project to VK. The mention of trade names or commercial products in this publication was solely for the purpose of providing specific information and does not imply recommendation or endorsement by the United States Department of Agriculture. USDA is an equal opportunity provider and employer.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

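Data Availability

The data are deposited at NCBI under BioProject ID PRJNA1026092 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1026092).
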
References

  • 1. Alkharouf N.W., Klink V.P., Matthews B.F. Identification of Heterodera glycines (soybean cyst nematode [SCN]) cDNA sequences with high identity to those of Caenorhabditis elegans having lethal mutant or RNAi phenotypes. Exp. Parasitol. 2007;115:247–258. doi: 10.1016/j.exppara.2006.09.009.
  • 2. Anderson M.A.E., Gonzalez E., Edgington M.P., Ang J.X.D., Purusothaman D.K., Shackleford L., Nevard K., Verkuijl S.A.N., Harvey-Samuel T., Leftwich P.T., Esvelt K., Alphey L. A multiplexed, confinable CRISPR/Cas9 gene drive can propagate in caged Aedes aegypti populations. Nat. Commun. 2024;15:729. doi: 10.1038/s41467-024-44956-2.
  • 3. Bai X., Yu K., Xiong S., Chen J., Yang Y., Ye X., Yao H., Wang F., Fang Q., Song Q., Ye G. CRISPR/Cas9-mediated mutagenesis of the white gene in an ectoparasitic wasp, Habrobracon hebetor. Pest Manag. Sci. 2023. doi: 10.1002/ps.7851.
  • 4. Chan D.T.C., Baldwin G.S., Bernstein H. Revealing the host-dependent nature of an engineered genetic inverter in concordance with physiology. Biodes Res. 2023;5:0016. doi: 10.34133/bdr.0016.
  • 5. Ijaz M., Khan F., Zaki H.E.M., Khan M.M., Radwan K.S.A., Jiang Y., Qian J., Ahmed T., Shahid M.S., Luo J., Li B. Recent trends and advancements in CRISPR-based tools for enhancing resistance against plant pathogens. Plants (Basel). 2023;12:1911. doi: 10.3390/plants12091911.
  • 6. Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D., Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F., et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. doi: 10.1126/science.287.5461.2185.

Articles from Data in Brief are provided here courtesy of Elsevier