Data in Brief. 2019 Mar 7;24:103757. doi: 10.1016/j.dib.2019.103757

Data on draft genome sequence of Bacillus sp. strain VKPM B-3276 isolated from Culex pipiens larvae

VV Zinina a, AA Korzhenkov a, AV Tepliuk a, AA Kanikovskaja b, MV Patrushev a, IV Kublanov a, SV Toshchakov a
PMCID: PMC6441799  PMID: 30976634

Abstract

The draft genome sequence of Bacillus sp. VKPM B-3276, a mesophilic, gram-positive bacterium isolated from dead Culex pipiens larvae, is presented. This strain was deposited in the Russian National Collection of Industrial Microorganisms as a prospective candidate for the development of new entomopathogenic agents. The genome of Bacillus sp. VKPM B-3276 is 6,126,346 bp in length with a predicted completeness of 99.43%. Genome analysis identified 6518 protein-coding sequences and 111 RNA genes. According to RAST/SEED, 13% (271) of the protein-coding genes were assigned to the “Carbohydrates” subsystem category. Among them, about 50 enzymes responsible for the decomposition of chitin, peptidoglycan and related molecules were found. The draft genome of strain VKPM B-3276 was deposited at DDBJ/EMBL/GenBank under the accession nos. RZHM00000000, PRJNA511803 and SAMN10644103 for Genome, Bioproject and Biosample, respectively.

Keywords: Draft genome assembly, De novo assembly, Entomopathogenic bacteria, Bacillus, Mosquito biocontrol


Specifications table.

Subject area: Biology, Microbiology
More specific subject area: Microbial biotechnology
Type of data: Genomic sequence, predicted genes and annotation of the respective proteins, deposited in the NCBI database and available via the links provided in the article; heatmap of average nucleotide identity between type strain genome assemblies of the “Bacillus cereus group” and histogram of genes involved in degradation of chitin, peptidoglycan and related compounds, presented within the article
How data was acquired: De novo whole genome sequencing with Illumina MiSeq
Data format: Analyzed and annotated draft genome assembly
Experimental factors: Extraction of genomic DNA from a pure culture, fragment library preparation, Illumina sequencing, de novo assembly and annotation procedures
Experimental features: Genomic DNA was extracted with the standard phenol-chloroform method; the fragment library was prepared with the KAPA HyperPlus™ Kit; sequencing was performed on the Illumina MiSeq™ system. The genome was assembled using SPAdes and annotated with the RAST web server
Data source location: The culture of strain VKPM B-3276 is deposited in the Russian National Collection of Industrial Microorganisms (VKPM) in Moscow, Russian Federation. http://vkpm.genetika.ru/katalog-mikroorganizmov/show21240/
Data accessibility: Data are publicly available at NCBI GenBank. The Biosample, Bioproject and assembly/WGS accession numbers are SAMN10644103 (https://www.ncbi.nlm.nih.gov/biosample/SAMN10644103/), PRJNA511803 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA511803) and RZHM00000000 (https://www.ncbi.nlm.nih.gov/nuccore/RZHM00000000), respectively.
Value of the data
  • Bacillus sp. strain VKPM B-3276 was isolated from Culex pipiens larvae and showed significant entomopathogenic activity [1]; it could therefore be regarded as a prospective entomocide.

  • The genome encodes a large number of diverse enzymes participating in chitin and peptidoglycan degradation, which could be relevant in medicine (antimicrobial agents) or for waste utilization (chitin bioconversion).

  • According to whole genome alignment data, Bacillus sp. strain VKPM B-3276 may be regarded as a new subspecies within the “Bacillus cereus group”.

  • Data on the genome sequence of strain VKPM B-3276 can be used to search for and characterize novel biotechnology-relevant enzymes and gene clusters.

1. Data

Bacillus sp. strain VKPM B-3276 was isolated from Culex pipiens larvae as an entomopathogenic agent [1]. Its genome was sequenced on the Illumina MiSeq platform to identify the genes responsible for its entomopathogenic properties. De novo assembly resulted in 176 contigs with an average coverage of 44x. The total length of the assembly was 6,126,346 bp with a G + C content of 35%. Automatic annotation by the RAST (Rapid Annotation using Subsystems Technology) server [2] identified 6518 protein-coding and 111 RNA genes. Protein-coding sequences were organized in 358 subsystems, among which the most numerous were “Amino acids and derivatives” (395 genes), “Carbohydrates” (271), “Protein” (191) and “Cofactors, Vitamins, Prosthetic Groups, Pigments” (187). Of almost 200 Carbohydrate Active enZymes (CAZymes) [3] detected using the dbCAN server [4], 50 were predicted to participate in the decomposition of chitin, peptidoglycan and their derivatives (Fig. 1). This finding agrees well with the isolation source of this strain as well as its entomocidal capabilities [1]. According to sequence comparisons, some of these enzymes are only distantly related to currently known members of CAZyme families and/or represent recently proposed families with a limited number of members. For example, the VKPM B-3276 genome possesses a gene for GH129, a family whose only characterized member is an α-N-acetylgalactosaminidase possibly involved in mucin degradation [5]. This observation emphasizes the potential of this strain for applications other than insecticide-related ones.
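The assembly summary figures quoted above (number of contigs, total length, G + C content) can be recomputed from any contig FASTA file. The following is a minimal sketch in Python, not the authors' procedure; the function names parse_fasta and assembly_stats and the toy sequences are hypothetical and serve only as an illustration.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {contig_name: sequence} mapping."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = header[0] if header else f"contig_{len(chunks) + 1}"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}


def assembly_stats(contigs: Dict[str, str]) -> Dict[str, float]:
    """Return contig count, total length (bp) and G + C content (%)."""
    total = sum(len(seq) for seq in contigs.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "n_contigs": len(contigs),
        "total_length": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Toy input (hypothetical sequences, not data from strain VKPM B-3276).
toy_fasta = """>c1
ATGC
GGCC
>c2 example description
ATAT
""".splitlines()

stats = assembly_stats(parse_fasta(toy_fasta))
print(stats)
```

For a real assembly, the lines would come from an open file handle, for example the contigs file written by the assembler.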

Fig. 1. Genes encoding proteins involved in degradation of chitin, peptidoglycan and related compounds. CBM50 - CBM module of enzymes cleaving either chitin or peptidoglycan; CE14 - diacetylchitobiose deacetylase, putative; CE4 - chitin deacetylase, putative; CE9 - N-acetylglucosamine 6-phosphate deacetylase; GH129 - α-N-acetylgalactosaminidase; GH18 - chitinase, putative; GH23 - lysozyme, putative; GH25 - lysozyme, putative; GH73 - lysozyme, putative; GH8 - chitosanase.

Analysis of the genes responsible for secondary metabolite biosynthesis showed that VKPM B-3276 has a number of pathogen-related features. Gene clusters for bacillibactin and anthrachelin siderophore biosynthesis, a system for petrobactin-mediated iron uptake, and multiple toxin/antitoxin systems were found. VKPM B-3276 also possesses a heme utilization system characteristic of gram-positive pathogens [6]. Interestingly, a large proportion of the pathogen-related gene clusters show a high level of synteny with the extremely pathogenic B. anthracis [7]. This not only accentuates the importance of this strain as a prospective insecticide, but also indicates that extensive safety studies are required before the strain is used in the agricultural industry.

According to the analysis of 16S rRNA genes, strain VKPM B-3276 belongs to the “Bacillus cereus group” of species, which includes B. cereus, B. anthracis and the well-known entomocidal species Bacillus thuringiensis [8]. To refine the phylogenetic position of strain B-3276, average nucleotide identity (ANI) was calculated between B-3276 and all available genomes of the “Bacillus cereus group”. ANI analysis showed that B. thuringiensis serovar berliner ATCC 10792 (96.48% ANI) and B. cereus strain NCTC2599 (95.96% ANI) were the closest relatives of strain VKPM B-3276, forming a distinct cluster with it (Fig. 2). Digital DNA:DNA hybridization (DDH) analysis showed that the probability that these strains belong to the same subspecies is less than 25%.

Fig. 2. Average nucleotide identity between type strain genome assemblies of the “Bacillus cereus group” available at NCBI GenBank and Bacillus sp. strain VKPM B-3276 (marked in red).

2. Experimental design, materials and methods

2.1. Strain isolation and deposition into collection

Strain VKPM B-3276 was isolated from a Culex pipiens larva cadaver [1] and deposited in the Russian National Collection of Industrial Microorganisms (VKPM). In 2018 it was sequenced within the framework of the Russian program “Genomes of industrially-relevant microorganisms”.

2.2. DNA extraction, library preparation and sequencing

Genomic DNA was extracted and purified with the standard phenol-chloroform method. DNA integrity was assessed by agarose gel electrophoresis. DNA was fragmented with a Bioruptor™ sonicator (Diagenode, Belgium) to achieve an average fragment length of 500 bp. An additional size-selection step by electrophoresis was performed before library preparation to obtain fragments in the range of 400 to 600 bp. Further steps of library preparation were performed with the KAPA™ HyperPlus fragment library kit (Roche) according to the manufacturer's instructions. Sequencing was done on the Illumina MiSeq™ platform (Illumina, USA) using a 500-cycle paired-end sequencing cartridge. A total of 579,166 read pairs were obtained from the sequencing run.
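As a rough consistency check, the raw sequencing depth can be estimated from the number of read pairs and the 6,126,346 bp assembly length reported in the Data section. The sketch below assumes a read length of 250 bp per mate (an assumption derived from the 500-cycle paired-end cartridge, not a value stated in the article); the function name expected_coverage is hypothetical.

```python
def expected_coverage(read_pairs: int, read_length: int, genome_length: int) -> float:
    """Raw depth of coverage: total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return 2 * read_pairs * read_length / genome_length


READ_PAIRS = 579_166        # read pairs reported for the sequencing run
READ_LENGTH = 250           # assumption: 500-cycle kit used as 2 x 250 bp
GENOME_LENGTH = 6_126_346   # total assembly length in bp

raw_coverage = expected_coverage(READ_PAIRS, READ_LENGTH, GENOME_LENGTH)
print(f"Raw coverage before trimming: about {raw_coverage:.1f}x")
```

Under these assumptions the estimate comes out slightly above the reported average coverage of 44x. This is the expected direction, since quality and adapter trimming remove bases before mapping; the estimate should be read as an upper bound rather than a reproduction of the reported value.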

2.3. De novo assembly

Low-quality reads, low-quality bases and sequencing adapters were removed with fastq-mcf [9] using the following parameters: Phred score ≥ 25, window size = 5. The genome was assembled with SPAdes v3.10 [10] in “careful” mode. To check the quality of the assembly, reads were mapped back to the contigs with bowtie2 [11], and the mapping file was processed with samtools [12].
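The steps above can be sketched as a sequence of command lines. The Python snippet below only builds and prints the commands; it does not run them. All file names, the thread count and the mapping of the stated parameters onto the fastq-mcf options -q (quality threshold) and -w (window size) are assumptions, as the article does not give the exact invocations.

```python
import shlex
from typing import List


def build_pipeline(r1: str, r2: str, adapters: str,
                   outdir: str = "assembly", threads: int = 4) -> List[List[str]]:
    """Build trimming, assembly and read-mapping commands (hypothetical file names)."""
    trimmed1, trimmed2 = "trimmed_R1.fastq", "trimmed_R2.fastq"
    contigs = f"{outdir}/contigs.fasta"  # contig file written by SPAdes
    return [
        # Quality and adapter trimming; -q 25 / -w 5 are assumed to match the text.
        ["fastq-mcf", "-q", "25", "-w", "5",
         "-o", trimmed1, "-o", trimmed2, adapters, r1, r2],
        # De novo assembly in "careful" mode.
        ["spades.py", "--careful", "-1", trimmed1, "-2", trimmed2,
         "-t", str(threads), "-o", outdir],
        # Map the trimmed reads back to the contigs.
        ["bowtie2-build", contigs, "contigs_index"],
        ["bowtie2", "-p", str(threads), "-x", "contigs_index",
         "-1", trimmed1, "-2", trimmed2, "-S", "mapped.sam"],
        # Sort the alignments and report per-position depth.
        ["samtools", "sort", "-o", "mapped.sorted.bam", "mapped.sam"],
        ["samtools", "depth", "-a", "mapped.sorted.bam"],
    ]


commands = build_pipeline("reads_R1.fastq", "reads_R2.fastq", "adapters.fasta")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass it to subprocess.run later without shell quoting problems. Averaging the depth column of the last command's output gives a mean coverage comparable to the figure reported for the assembly.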

2.4. Genome annotation and analysis

The genome was annotated with RAST [2] using the RASTtk scheme [13]. Functional analysis was performed using the tools embedded in the SEED portal [14]. CAZymes [3] were predicted using the dbCAN meta server [4]. Genes involved in the biosynthesis of secondary metabolites were analyzed with the antiSMASH server [15]. Average nucleotide identity was calculated using the ani.rb script (https://github.com/lmrodriguezr/enveomics). The ANI heatmap was plotted using the ggplot2 library for R. The probability of the strain being a new species or subspecies was assessed with GGDC 2.1 [16].

Acknowledgements

This work was supported by the NRC “Kurchatov Institute” (internal grant #1972 from 09.08.2018 “Genomes of industrially-relevant microorganisms”). The computationally intensive bioinformatic part of this work was carried out using the computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/. The authors also thank Drs. Konstantin Voyushin and Konstantin Sidoruk from the Russian National Collection of Industrial Microorganisms for their valuable input.

Footnotes

Transparency document associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.103757.

References

  • 1. Shevtsov V.V., Gajtan V.I., Krajnova O.A., Khovrychev M.P., Rasnitsyn S.P., Vojtsik A.A., Ganina E.A., Pakhtuev A.I., Lomovskaja T.F. 1996. Strain of Bacterium Bacillus Sphaericus - a Producer of Entomopathogenic Preparation against Blood-Sucking Mosquito Larvae. http://www1.fips.ru/wps/portal/IPS_Ru#1546427824754 Russian Patent No 1305916.
  • 2. Aziz R.K., Bartels D., Best A., DeJongh M., Disz T., Edwards R.A., Formsma K., Gerdes S., Glass E.M., Kubal M., Meyer F., Olsen G.J., Olson R., Osterman A.L., Overbeek R.A., McNeil L.K., Paarmann D., Paczian T., Parrello B., Pusch G.D., Reich C., Stevens R., Vassieva O., Vonstein V., Wilke A., Zagnitko O. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75. doi: 10.1186/1471-2164-9-75.
  • 3. Lombard V., Golaconda Ramulu H., Drula E., Coutinho P.M., Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42:D490–D495. doi: 10.1093/nar/gkt1178.
  • 4. Yin Y., Mao X., Yang J., Chen X., Mao F., Xu Y. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40:W445–W451. doi: 10.1093/nar/gks479.
  • 5. Kiyohara M., Nakatomi T., Kurihara S., Fushinobu S., Suzuki H., Tanaka T., Shoda S.I., Kitaoka M., Katayama T., Yamamoto K., Ashida H. α-N-acetylgalactosaminidase from infant-associated bifidobacteria belonging to novel glycoside hydrolase family 129 is implicated in alternative mucin degradation pathway. J. Biol. Chem. 2012;287:693–700. doi: 10.1074/jbc.M111.277384.
  • 6. Contreras H., Chim N., Credali A., Goulding C.W. Heme uptake in bacterial pathogens. Curr. Opin. Chem. Biol. 2014;19:34–41. doi: 10.1016/j.cbpa.2013.12.014.
  • 7. Ravel J., Jiang L., Stanley S.T., Wilson M.R., Decker R.S., Read T.D., Worsham P., Keim P.S., Salzberg S.L., Fraser-Liggett C.M., Rasko D.A. The complete genome sequence of Bacillus anthracis Ames “Ancestor”. J. Bacteriol. 2009;191:445–446. doi: 10.1128/JB.01347-08.
  • 8. Helgason E., Økstad O.A., Caugant D.A., Johansen H.A., Fouet A., Mock M., Hegna I., Kolstø A.B. Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis - one species on the basis of genetic evidence. Appl. Environ. Microbiol. 2000;66:2627–2630. doi: 10.1128/aem.66.6.2627-2630.2000.
  • 9. Aronesty E. Comparison of sequencing utility programs. Open Bioinf. J. 2013;7:1–8.
  • 10. Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V., Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021.
  • 11. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  • 12. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 13. Brettin T., Davis J.J., Disz T., Edwards R.A., Gerdes S., Olsen G.J., Olson R., Overbeek R., Parrello B., Pusch G.D., Shukla M., Thomason J.A., Stevens R., Vonstein V., Wattam A.R., Xia F. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 2015;5:8365. doi: 10.1038/srep08365.
  • 14. Overbeek R., Olson R., Pusch G.D., Olsen G.J., Davis J.J., Disz T., Edwards R.A., Gerdes S., Parrello B., Shukla M., Vonstein V., Wattam A.R., Xia F., Stevens R. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res. 2014;42:D206–D214. doi: 10.1093/nar/gkt1226.
  • 15. Blin K., Wolf T., Chevrette M.G., Lu X., Schwalen C.J., Kautsar S.A., Suarez Duran H.G., de Los Santos E.L.C., Kim H.U., Nave M., Dickschat J.S., Mitchell D.A., Shelest E., Breitling R., Takano E., Lee S.Y., Weber T., Medema M.H. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res. 2017;45:W36–W41. doi: 10.1093/nar/gkx319.
  • 16. Meier-Kolthoff J.P., Auch A.F., Klenk H.P., Göker M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinf. 2013;14:60. doi: 10.1186/1471-2105-14-60.
