Abstract
Members of the whitefly Bemisia tabaci species complex cause millions of dollars of damage globally and are considered among the world’s most invasive species. They are capable of causing extensive damage to major vegetable, grain legume and fiber crops. All members of the species complex are morphologically identical; therefore, partial mitochondrial cytochrome oxidase subunit I (mtCOI) gene sequences have been used to identify the various species. The reference dataset currently in wide use is hosted on the CSIRO data portal. However, the reference set stored on the CSIRO data portal does not include newly added sequences (2013-2017), so an updated reference dataset is needed. All mtCOI data for the Bemisia tabaci species complex were downloaded from GenBank on 22 May 2017 and, after quality checking, a dataset of 1,071 unique sequences and 696 base pairs was generated (https://doi.org/10.6084/m9.figshare.5437420.v1).
Keywords: species identification, whitefly, insect vector, mitochondrial cytochrome oxidase, DNA barcoding
Introduction
Members of the Bemisia tabaci (whitefly) species complex are among the world’s most devastating insect pests and cause billions of dollars (US) of damage each year, leaving farmers in the developing world food insecure (De Barro et al., 2011). Because the complex has at least 34 members, identification is based on the 657 bp portion of the 3’ end of the mitochondrial COI (mtCOI) gene (Boykin et al., 2012; Boykin et al., 2013). A curated reference dataset is therefore needed to identify members of the complex correctly. In 2012, a reference mtCOI dataset was made available on the CSIRO data portal (De Barro & Boykin, 2012). Errors in that dataset were subsequently identified, and it was updated on 15 May 2017 (http://doi.org/10.4225/08/591a4018dfca8) (De Barro & Boykin, 2017), but the update did not include sequences added to GenBank after 2012. The dataset described herein therefore represents the most up-to-date reference resource for members of the complex.
Methods
The CSIRO dataset (http://doi.org/10.4225/08/591a4018dfca8), updated 15 May 2017, was used as the starting point. The existing records were updated to include host plant data. New post-2012 records were then downloaded directly from GenBank on 22 May 2017. All downloaded data were treated as follows:
1) Data were classified with BLAST using the new CSIRO reference dataset.
2) Sequences that caused gaps in the alignment were removed.
3) Sequences that had stop codons present were removed.
4) Clustal Omega (Sievers & Higgins, 2014) was used for preliminary alignment, and fine tuning of the alignment was carried out with MAFFT (Katoh & Standley, 2013).
5) Duplicate sequences were then removed using BBMap Dedupe (Bushnell, 2017).
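The stop-codon screen in the steps above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `has_stop_codon` and the `frame` parameter are hypothetical, and the reading frame of the mtCOI fragment is an assumption because the text does not state it. The sketch uses the NCBI invertebrate mitochondrial genetic code (translation table 5), in which only TAA and TAG act as stop codons.

```python
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial).
# In this code TGA encodes Trp and AGA/AGG encode Ser, so only TAA and TAG
# terminate translation.
STOP_CODONS = {"TAA", "TAG"}


def has_stop_codon(seq, frame=0):
    """Return True if an in-frame stop codon occurs in seq.

    frame (0, 1 or 2) is an assumption: the reading frame of the
    fragment must be supplied by the user.
    """
    # Ignore alignment gap characters and normalise case.
    s = seq.upper().replace("-", "")
    # Walk complete codons only; a trailing partial codon is skipped.
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return True
    return False


# Hypothetical toy records, not real GenBank sequences.
records = {
    "seq_ok": "ATGTGAGGA",    # TGA is Trp in table 5, so no stop
    "seq_stop": "ATGTAAGGA",  # TAA in frame 0
}
kept = {name: s for name, s in records.items() if not has_stop_codon(s)}
```

Sequences flagged this way would be candidates for removal; in practice the frame should be confirmed against an annotated reference before anything is discarded.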
In addition, all MEAM2 sequences were removed as they have now been confirmed to be pseudogenes (Tay et al., 2017).
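The duplicate-removal and MEAM2-removal steps can be sketched in plain Python as follows. This is a simplified stand-in, not BBMap Dedupe itself: it removes only exact duplicate sequences and does not handle contained sequences or reverse complements. The helper names (`parse_fasta`, `drop_labelled`, `dedupe_exact`) are hypothetical, and the sketch assumes that the species label appears in each FASTA header, which may not match the layout of the actual files.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_labelled(records, label="MEAM2"):
    """Drop records whose header contains label (assumed header layout)."""
    return [(h, s) for h, s in records if label not in h]


def dedupe_exact(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical toy input, not real GenBank records.
fasta_text = """>acc1_MEAM1
ATGGCA
>acc2_MEAM2
ATGGCC
>acc3_MEAM1
atggca
>acc4_MED
ATGGCT
"""
records = list(parse_fasta(fasta_text))
cleaned = dedupe_exact(drop_labelled(records))
headers = [h for h, _ in cleaned]
```

Keeping the first occurrence of each sequence makes the result depend on input order, so sorting the records beforehand (for example by accession) would make the output reproducible.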
Data availability
The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Boykin LM et al.
Figshare: Dataset 1. mtCOI reference data for species ID of Bemisia tabaci. DOI: 10.6084/m9.figshare.5437420 (Boykin et al., 2017)
Funding Statement
The author(s) declared that no grants were involved in supporting this work.
[version 1; referees: 2 approved]
References
- Boykin LM, Armstrong KF, Kubatko L, et al.: Species delimitation and global biosecurity. Evol Bioinform Online. 2012;8:1–37. 10.4137/EBO.S8532
- Boykin LM, Bell CD, Evans G, et al.: Is agriculture driving the diversification of the Bemisia tabaci species complex (Hemiptera: Sternorrhyncha: Aleyrodidae)?: Dating, diversification and biogeographic evidence revealed. BMC Evol Biol. 2013;13:228. 10.1186/1471-2148-13-228
- Boykin L, Savill A, De Barro P: mtCOI reference data for species ID of Bemisia tabaci. figshare. 2017. 10.6084/m9.figshare.5437420
- Bushnell B: BBMap. 2017.
- De Barro P, Boykin LM: Global Bemisia dataset release version 31 Dec 2012. CSIRO. 2012. 10.4225/08/50EB54B6F1042
- De Barro P, Boykin LM: Global Bemisia dataset release version 15 May 2017. CSIRO. 2017. 10.4225/08/591a4018dfca8
- De Barro PJ, Liu SS, Boykin LM, et al.: Bemisia tabaci: a statement of species status. Annu Rev Entomol. 2011;56:1–19. 10.1146/annurev-ento-112408-085504
- Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–780. 10.1093/molbev/mst010
- Sievers F, Higgins DG: Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol. 2014;1079:105–116. 10.1007/978-1-62703-646-7_6
- Tay WT, Elfekih S, Court LN, et al.: The trouble with MEAM2: Implications of pseudogenes on species delimitation in the globally invasive Bemisia tabaci (Hemiptera: Aleyrodidae) cryptic species complex. Genome Biol Evol. 2017. 10.1093/gbe/evx173