Data in Brief. 2022 Mar 23;42:108090. doi: 10.1016/j.dib.2022.108090

rbcL gene in global molecular data repository

Conrad Omonhinmin 1, Chinedu Onuselogu 1
PMCID: PMC8987485  PMID: 35402671

Abstract

The volume of ribulose-bisphosphate carboxylase large subunit (rbcL) gene sequence data in molecular data repositories has increased significantly over the years, with contributions from different parts of the world. This abundance has broadened the gene's applications. Bulk records were obtained from the National Center for Biotechnology Information (NCBI) GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records matching the query “rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]” were retrieved. The retrieved records were cleaned and then analysed using the code files in the supplementary materials. Country information was obtained by searching each record's reference information for matches to country names in the pycountry package; where no match was found, null was returned. This data article contains information about the plant families and species whose rbcL gene sequences have been deposited in NCBI GenBank and the regions of the world that have contributed to the growth of the rbcL repository. The data can be used to analyse the intra- and inter-family relatedness of plants and compare it with existing relationships, and to support molecular characterization of plants, evolutionary relationship studies, and inference of the biogeographic origin of plants.

Keywords: rbcL gene, Evolutionary, Biogeography, Phylogeny, Molecular repository

Specifications Table

Subject: Biological sciences
Specific subject area: Molecular phylogenetics, phylogeny and evolution
Type of data: Text, Table, Chart, Figure
How data were acquired: The Biopython package, version 1.77, was used to retrieve rbcL gene sequence data from NCBI GenBank. The code written to retrieve the data can be accessed in the supplementary materials.
Data format: Raw, Analysed and Filtered
Description of data collection: Bulk data were obtained from the NCBI GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records that did not contain the matching words rbcL, plant and DNA were filtered out, leaving only records with the keywords rbcL, plant and DNA.
Data source location: The data were obtained from the NCBI GenBank database.
Data accessibility: With the article.
Repository name: Mendeley Data
Data identification number: 10.17632/wdmtpnwsrn.1
Direct link to the dataset: http://www.rbcLGeneinGlobalMolecularDataRepository.com
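
The keyword filtering described in the table might look roughly like the sketch below. It is only an illustration under stated assumptions: the record layout (a dict of text fields) and the names `keep_record` and `filter_records` are hypothetical, and the authors' actual criteria are in their supplementary code.

```python
# Keywords named in the article; matching is case-insensitive (an assumption).
KEYWORDS = ("rbcl", "plant", "dna")


def keep_record(record, keywords=KEYWORDS):
    """Return True if every keyword occurs somewhere in the record's fields."""
    # Join all field values into one lower-cased string and search it.
    text = " ".join(str(value) for value in record.values()).lower()
    return all(keyword in text for keyword in keywords)


def filter_records(records):
    """Keep only the records that contain all keywords."""
    return [record for record in records if keep_record(record)]
```

A simple substring test like this also accepts words such as "Viridiplantae"; a stricter filter would match on dedicated fields instead.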

Values of the Data

  • These data present information on the plant species, phyla, and families for which rbcL gene sequences have been deposited in NCBI GenBank.

  • Molecular systematists can use the data to reassess the relatedness of plants both within and between families, and to compare the results with existing relationships.

  • The data are useful in the following fields: molecular characterization of plants, evolutionary relationship studies, inference of the biogeographic origin of plants, codon usage bias profiling, protein structure analysis, and ecological preference studies.

  • The data can be used to determine the pattern of growth of rbcL gene sequences from different regions in the molecular repository.

  • The data show the least-explored plant species and the need to study them further.

1. Data Description

The data in this article give an overview of the total number of plant species and families with rbcL gene sequences in GenBank, and of the regions that have contributed to the growth of rbcL sequences in the repository. rbcL sequence data are used to reconstruct phylogenies among the seed plants [1]. The rbcL gene is preferred over other plant genes for phylogenetic studies because of its slow rate of evolutionary change and because it shows the lowest divergence among the plastid genes in flowering plants [2,3]. Reddy [4] described the gene's suitability for resolving intergeneric and interspecific relationships and the absence of alignment difficulties. Applications of the gene in molecular investigations of plant species include tracing the molecular origin of plants [5] and inferring the biogeographic origin of plants [6]. The datasets used in the study were collected as secondary data, and the Biopython code written for data collection can be accessed as Supplementary data; the rbcL gene data cover the period from the first report until 2020. Fig. 1 shows the most studied plant families with rbcL gene sequences in GenBank. Fig. 2 shows the plant phyla with rbcL gene sequences and the extent to which the sequences have been used in rbcL-related studies. The continents that have submitted rbcL sequences, and their percentage contributions to GenBank, are shown in Fig. 3; Figs. 4 and 5 show the countries with higher rbcL sequence submissions to GenBank. The plant species and other species with rbcL gene sequences can be accessed in the supplementary materials.

Fig. 1.

Most studied plant families with rbcL gene sequence in GenBank.

*The numbers indicate the number of species in each family with rbcL gene sequences deposited in NCBI GenBank.

*NB: The study found a total of 808 plant families with rbcL gene sequences submitted to NCBI GenBank, too many to include in the tree map in Fig. 1; only the families with the most rbcL gene submissions are therefore shown.

Fig. 2.

Percentage of plant phyla with rbcL gene data deposited on GenBank.

Fig. 3.

Percentage contribution of rbcL sequences from different regions.

Fig. 4.

Countries with higher submissions of rbcL sequences to the GenBank repository.

Fig. 5.

Map showing global concentration of rbcL sequence contribution to GenBank repository.

* Regions shaded dark blue have a higher contribution of rbcL gene sequences to NCBI GenBank.
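
As a rough illustration of how the per-family species counts (Fig. 1) and regional percentages (Fig. 3) can be derived from a cleaned record table, consider the sketch below. The field names (`family`, `species`, `region`) and the toy rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Toy rows with hypothetical field names; None marks a record with no region.
rows = [
    {"family": "Poaceae", "species": "Zea mays", "region": "Africa"},
    {"family": "Poaceae", "species": "Zea mays", "region": "Asia"},
    {"family": "Poaceae", "species": "Oryza sativa", "region": "Asia"},
    {"family": "Fabaceae", "species": "Glycine max", "region": "Asia"},
    {"family": "Fabaceae", "species": "Glycine max", "region": None},
]


def species_per_family(rows):
    """Count distinct species per family (repeat records count once)."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["family"]].add(row["species"])
    return {family: len(species) for family, species in seen.items()}


def region_percentages(rows):
    """Percentage of records per region, ignoring rows with no region."""
    counts = Counter(row["region"] for row in rows if row["region"] is not None)
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}
```

Counting distinct species rather than records keeps heavily sequenced species from inflating a family's total.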

2. Experimental Design, Materials and Methods

Bulk records were obtained from the NCBI GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records matching the query “rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]” were retrieved. The retrieved records were cleaned and then analysed using the code files in the supplementary material. Country information was obtained by searching each record's reference information for matches to country names in the pycountry package. Where no match was found, null was returned.
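
The retrieval and country-matching steps described above could be sketched roughly as follows. This is a minimal illustration, not the authors' supplementary code: the function names, the batch size, the `retmax` value, the e-mail placeholder, and the choice of GenBank flat-file output are all assumptions. Only the query string and the use of Biopython's Entrez module and pycountry come from the article.

```python
import re

# Query string quoted in the article.
QUERY = "rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]"


def batches(ids, size):
    """Split a list of record IDs into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def search_ids(query=QUERY, retmax=10000, email="you@example.org"):
    """Return nucleotide record IDs matching `query` (needs network access)."""
    from Bio import Entrez  # lazy import: the helpers below work without Biopython
    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="nucleotide", term=query, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])


def fetch_batches(ids, size=200):
    """Yield GenBank flat-file text for each batch of IDs (needs network access)."""
    from Bio import Entrez
    for chunk in batches(ids, size):
        handle = Entrez.efetch(db="nucleotide", id=",".join(chunk),
                               rettype="gb", retmode="text")
        yield handle.read()
        handle.close()


def country_names():
    """Names of all countries known to pycountry."""
    import pycountry
    return [country.name for country in pycountry.countries]


def match_country(reference_text, names):
    """Return the first name in `names` that occurs as a whole word, else None."""
    for name in names:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, reference_text, re.IGNORECASE):
            return name
    return None
```

Matching on whole words keeps "Niger" from matching inside "Nigeria"; returning `None` mirrors the null value the article describes for records with no recognizable country.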

CRediT Author Statement

Conrad Omonhinmin: Conceptualization, Methodology, Validation and Supervision; Chinedu Onuselogu: Data curation, Investigation, Software, Reviewing and Editing, Writing - Original draft preparation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Acknowledgments

The authors are grateful to the Covenant University Centre for Research Innovation and Discovery (CUCRID) for the publication funding, and to Mr Bode Onile-ere of the Biological Science department, Covenant University, for assistance with data acquisition from NCBI GenBank.

Footnotes

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2022.108090.

Appendix. Supplementary materials

mmc1.txt (4KB, txt)
mmc2.txt (4KB, txt)
mmc3.xlsx (21.2MB, xlsx)

References

  • 1. Hashim A.M., Alatawi A., Altaf F.M., Qari S.H., Elhady M.E., Osman G.H., Abouseadaa H.H. Phylogenetic relationships and DNA barcoding of nine endangered medicinal plant species endemic to Saint Katherine protectorate. Saudi J. Biol. Sci. 2021;28(3):1919–1930. doi: 10.1016/j.sjbs.2020.12.043.
  • 2. Kress W.J., Wurdack K.J., Zimmer E.A., Weigt L.A., Janzen D.H. Use of DNA barcodes to identify flowering plants. Proc. Natl. Acad. Sci. 2005;102(23):8369–8374. doi: 10.1073/pnas.0503123102.
  • 3. Ahmed I., Biggs P.J., Matthews P.J., Collins L.J., Hendy M.D., Lockhart P.J. Mutational dynamics of aroid chloroplast genomes. Genome Biol. Evol. 2012;4(12):1316–1323. doi: 10.1093/gbe/evs110.
  • 4. Reddy B.U. Cladistic analyses of a few members of Cucurbitaceae using rbcL nucleotide and amino acid sequences. Int. J. Bioinf. Res. 2009;1:58–64.
  • 5. Soltis D., Soltis P., Endress P., Chase M., Manchester S., Judd W., Mavrodiev E. Phylogeny and Evolution of the Angiosperms: Revised and Updated Edition. University of Chicago Press; 2018.
  • 6. Marion A., Sfriso A., Andreoli C., Moro I. The presence of exotic Hypnea flexicaulis (Rhodophyta) in the Mediterranean Sea as indicated by morphology, rbcL and cox1 analyses. Aquat. Bot. 2011;95(1):55–58.

Articles from Data in Brief are provided here courtesy of Elsevier