Abstract
Here, we report the complete genome sequences of the biocontrol strains Bacillus velezensis BP1.2A and BT2.4, isolated from Vietnamese crop plants. The genome sizes are 3,916,868 bp (BP1.2A) and 3,922,686 bp (BT2.4). The BioProjects have been deposited at NCBI GenBank. The accession numbers for the B. velezensis strains are PRJNA634914 (BP1.2A) and PRJNA634832 (BT2.4) for the BioProjects, CP085504 (BP1.2A) and CP085505 (BT2.4) for the chromosomes, GCA_013285085.2 (BP1.2A) and GCA_013284785.2 (BT2.4) for the GenBank assemblies, and SAMN15012571 (BP1.2A) and SAMN15009897 (BT2.4) for the BioSamples. Both genomes are closely related to FZB42, the model strain for plant growth-promoting bacilli.
Keywords: Complete genome, Phylogenetic analysis, Bacillus velezensis, Lipopeptides, Polyketides, Macrolactin
Specifications Table
| Subject | Biological sciences |
| Specific subject area | Molecular Phylogenetics |
| Type of data | Table, Figure, genome sequencing data in FASTA format. |
| How the data were acquired | Short reads were generated with Illumina HiSeq at LGC Genomics (Berlin, Germany). Long reads were obtained with Oxford Nanopore MinION. |
| Data format | Analyzed DNA sequence data in FASTA, NEWICK and text format. |
| Description of data collection | Pure cultures of BP1.2A and BT2.4 were used to isolate genomic DNA and to obtain the genomic data. Genome annotation was carried out using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) and RAST. |
| Data accessibility | The BioProjects have been deposited at NCBI GenBank under the following accession numbers: BioProjects: PRJNA634914 (BP1.2A) and PRJNA634832 (BT2.4); BioSamples: SAMN15012571 (BP1.2A) and SAMN15009897 (BT2.4); chromosome sequences: CP085504.1 (BP1.2A) and CP085505.1 (BT2.4); GenBank assembly accessions: GCA_013285085.2 (BP1.2A) and GCA_013284785.2 (BT2.4). The SRA records for BP1.2A and BT2.4 can be accessed through the corresponding BioProject links: https://www.ncbi.nlm.nih.gov/sra/PRJNA634914 https://www.ncbi.nlm.nih.gov/sra/PRJNA634832 |
| With the article | |
| L.T.T. Tam, J. Jähne, P.T. Luong, L.T.P. Thao, L.T.K. Chung, A. Schneider, C. Blumenscheit, P. Lasch, T. Schweder, R. Borriss. Draft genome sequences of 59 endospore-forming Gram-positive bacteria associated with crop plants grown in Vietnam. Microbiol. Resour. Announc. 9 (2020): e01154-20. https://doi.org/10.1128/MRA.01154-20 |
Value of the Data
- The data in this article demonstrate that closely related Bacillus strains can be isolated from remote geographical regions with different climatic conditions.
- BP1.2A and BT2.4 share 99.99% identical residues with the model strain FZB42 (Table 3). The high similarity of the two novel strains to the biocontrol strain FZB42 encourages their development as promising biocontrol agents for sustainable agriculture in temperate and subtropical zones as well.
- The data demonstrate that gene clusters involved in non-ribosomal and ribosomal synthesis of antibacterial and antifungal secondary metabolites are highly conserved in different representatives of B. velezensis, regardless of their geographical distribution.
- For the scientific community, the genome data presented here extend the resources for comparative genomic analysis among the members of the Bacillus amyloliquefaciens operational group, including Bacillus velezensis, at present the most important species used in biological plant protection.
- Furthermore, extended genomic analyses of closely related bacteria should elucidate regions and/or genes with different variability and might identify regions (genes) with an enhanced mutation bias.
Table 3.
Sequence comparison of BP1.2A and BT2.4 with FZB42 using blastn and ANIb [11]. The numbers in brackets indicate the overlap of the sequences used in the comparison. Analysis of singletons was performed with the EDGAR software package [12].
| ANIb comparison | BP1.2A (CP085504.1) | BT2.4 (CP085505.1) | FZB42 (CP000560.2) |
|---|---|---|---|
| BP1.2A | * | 100 (99.74) | 100.00 (99.64) |
| BT2.4 | 100.00 (99.67) | * | 99.99 (99.58) |
| FZB42 | 100.00 (99.64) | 99.99 (99.61) | * |
| BLASTN comparison | Query BP1.2A | Query BT2.4 | Query FZB42 |
| BP1.2A cover | 100 | 99.854% | 98.877% |
| BP1.2A identities | 100 | 99.995% | 99.989% |
| BP1.2A different nts | 0 | 184/3,916,940 | 426/3,874,585 |
| BP1.2A gaps | 0 | 74/3,916,940 | 102/3,874,585 |
| BT2.4 cover | 100% | 100% | 99.866% |
| BT2.4 identities | 99.996% | 100 | 99.993% |
| BT2.4 different nts | 174/3,916,868 | 0 | 274/3,911,604 |
| BT2.4 gaps | 25/3,916,868 | 0 | 21/3,911,604 |
| FZB42 cover | 99.697% | 98.026% | 100 |
| FZB42 identities | 99.987% | 99.990% | 100 |
| FZB42 different nts | 490/3,904,992 | 382/3,845,221 | 0 |
| FZB42 gaps | 182/3,904,992 | 192/3,845,221 | 0 |
| Singletons (CDS) | BP1.2A | BT2.4 | FZB42 |
| BP1.2A | * | 1 | 41 |
| BT2.4 | 0 | * | 40 |
| FZB42 | 67 | 67 | * |
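The identity percentages in the BLASTN part of Table 3 can be cross-checked from the reported counts. The following is a minimal sketch, assuming that "different nts" counts the non-identical aligned positions and that the denominator is the alignment length; the function name `percent_identity` is introduced here for illustration only and is not part of any BLAST tool.

```python
def percent_identity(different_nts: int, aligned_length: int) -> float:
    """Percent identity, assuming identity = 1 - different / aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (1.0 - different_nts / aligned_length)


# (row strain, query strain) -> (different nts, alignment length), copied from Table 3
table3_counts = {
    ("BP1.2A", "BT2.4"): (184, 3_916_940),
    ("BP1.2A", "FZB42"): (426, 3_874_585),
    ("BT2.4", "BP1.2A"): (174, 3_916_868),
    ("BT2.4", "FZB42"): (274, 3_911_604),
    ("FZB42", "BP1.2A"): (490, 3_904_992),
    ("FZB42", "BT2.4"): (382, 3_845_221),
}

for (row, query), (diff, length) in table3_counts.items():
    print(f"{row} vs query {query}: {percent_identity(diff, length):.3f}%")
```

Under this assumption the recomputed values agree with the "identities" rows of Table 3 to three decimal places, which suggests that gaps are reported separately and are not included in the "different nts" counts.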
1. Data Description
The draft genome sequences of 59 Gram-positive bacterial strains isolated from Vietnamese crop plants have already been reported [1]. Two of these strains, B. velezensis BP1.2A and B. velezensis BT2.4, have now been completely sequenced using nanopore sequencing technology. Both sequences exhibited a very high degree of similarity to the model strain of plant growth-promoting Gram-positive bacteria, B. velezensis FZB42 [2].
The complete genomes each consist of a single circular chromosome of 3,916,868 bp (BP1.2A) and 3,922,686 bp (BT2.4), respectively. Automatic genome annotation was performed using the RAST (Rapid Annotation using Subsystems Technology) server [3], and the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [4] was used for the general genome annotation deposited at NCBI.
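Basic features such as genome size and G+C content can be recomputed directly from the deposited FASTA files. The sketch below is illustrative only: the helper `fasta_stats` and the toy record are assumptions introduced here, not part of the annotation pipelines used in this work, and all records in the input are summed into a single total.

```python
from io import StringIO
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (total sequence length, G+C percentage) for FASTA-formatted lines."""
    length = 0
    gc = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        seq = line.upper()
        length += len(seq)
        gc += seq.count("G") + seq.count("C")
    if length == 0:
        raise ValueError("no sequence data found")
    return length, 100.0 * gc / length


# Toy record; for real data, pass an open file handle of a downloaded
# chromosome sequence (for example CP085504.1) instead.
toy = StringIO(">toy_record\nGGCCAATT\nggccaatt\n")
size, gc_percent = fasta_stats(toy)
print(f"size = {size} bp, G+C = {gc_percent:.1f}%")
```

Because the function accepts any iterable of lines, it works unchanged on an open file handle, so a whole chromosome never has to be held in memory as a single string.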
As shown in Table 1, the distribution of subsystem proteins [5] in the two strains is very similar to that of FZB42 [6], indicating their close relationship. Genome mining of B. velezensis performed with antiSMASH version 6.0 [7] extracted the complete set of gene clusters and genes involved in non-ribosomal and ribosomal synthesis of secondary metabolites previously identified in FZB42. Table 2 shows the potential of B. velezensis strains BP1.2A, BT2.4, and FZB42 to synthesize an impressive number of different secondary metabolites.
Table 1.
General genomic features of B. velezensis BP1.2A (CP085504.1), and BT2.4 (CP085505.1) compared with FZB42 (NC_009725.2). Methods used for generating the data are set in brackets (PGAP, RAST, EDGAR). Differences to FZB42 are labelled in red letters.
| Attributes | BP1.2A | BT2.4 | FZB42 |
|---|---|---|---|
| Genome size (bp) | 3,916,868 | 3,922,686 | 3,918,596 |
| G+C% | 46.5 | 46.5 | 46.5 |
| Number of genes (PGAP) | 3871 | 3870 | 3855 |
| CDSs total (PGAP) | 3753 | 3752 | 3734 |
| CDS core genome (EDGAR) | 3633 | 3633 | 3633 |
| CDS pan genome (EDGAR) | 3757 | 3757 | 3757 |
| RNA genes (RAST) | 118 | 118 | 118 |
| rRNAs (PGAP) | 27 | 27 | 29 |
| tRNAs (PGAP) | 86 | 86 | 88 |
| ncRNAs (PGAP) | 5 | 5 | 4 |
| Pseudo genes (PGAP) | 71 | 69 | 59 |
| Number of coding sequences (RAST) | 3939 | 3946 | 3938 |
| Number of Subsystems (RAST) | 324 | 324 | 324 |
| Subsystem Feature Counts | | | |
| Cofactors, Vitamins, Prosthetic Groups, Pigments | 147 | 147 | 147 |
| Cell Wall and Capsule | 73 | 73 | 73 |
| Virulence, Disease and Defense | 38 | 38 | 38 |
| Potassium metabolism | 3 | 3 | 3 |
| Miscellaneous | 24 | 24 | 24 |
| Phages, Prophages, Transposable elements, Plasmids | 0 | 0 | 0 |
| Membrane Transport | 42 | 42 | 42 |
| Iron acquisition and metabolism | 25 | 25 | 25 |
| RNA metabolism | 63 | 63 | 64 |
| Nucleosides and Nucleotides | 95 | 95 | 95 |
| Protein Metabolism | 209 | 209 | 211 |
| Cell Division and Cell Cycle | 6 | 6 | 6 |
| Motility and Chemotaxis | 42 | 42 | 42 |
| Regulation and Cell signaling | 28 | 28 | 28 |
| Secondary Metabolism | 6 | 6 | 6 |
| DNA Metabolism | 63 | 63 | 63 |
| Fatty Acids, Lipids, and Isoprenoids | 53 | 53 | 53 |
| Nitrogen Metabolism | 20 | 20 | 20 |
| Dormancy and Sporulation | 91 | 91 | 91 |
| Respiration | 40 | 40 | 40 |
| Stress Response | 43 | 43 | 43 |
| Metabolism of Aromatic Compounds | 12 | 12 | 13 |
| Amino Acids and Derivatives | 299 | 300 | 301 |
| Sulfur Metabolism | 6 | 6 | 6 |
| Phosphorus Metabolism | 12 | 12 | 12 |
| Carbohydrates | 215 | 215 | 215 |
Table 2.
Detection of gene clusters involved in the synthesis of secondary metabolites in the genomes of B. velezensis BP1.2A (CP085504) and B. velezensis BT2.4 (CP085505). For comparison, FZB42 (CP000560.2) was also analyzed. Similarity to known metabolites listed in the MIBiG 2.0 repository [8] is indicated.
| Region | CP085504 (from) | CP085504 (to) | CP085505 (from) | CP085505 (to) | CP000560.2 (from) | CP000560.2 (to) | Similarity | Most similar cluster (MIBiG ID or taxon) |
|---|---|---|---|---|---|---|---|---|
| Surfactin | 318,208 | 383,067 | 318,208 | 383,067 | 322,723 | 387,582 | 95% | BGC0000433 |
| Plantazolicin | 717,159 | 740,336 | 717,099 | 740,276 | 721,674 | 744,851 | 100% | BGC0000569 |
| Ketoacyl:ACP synthase | 935,682 | 976,926 | 935,298 | 976,542 | 940,739 | 981,983 | 100% | Bacillus |
| Squalene/phytoene synthase | 1,062,552 | 1,079,781 | 1,062,168 | 1,079,397 | 1,074,783 | 1,075,523 | 100% | Bacillus |
| Macrolactin H | 1,366,841 | 1,453,226 | 1,366,457 | 1,452,842 | 1,371,897 | 1,458,282 | 100% | BGC0000181 |
| Bacillaene | 1,676,755 | 1,777,357 | 1,676,371 | 1,776,973 | 1,681,811 | 1,782,413 | 100% | BGC0001089 |
| Fengycin | 1,866,123 | 1,903,373 | 1,865,739 | 1,902,989 | 1,871,179 | 1,908,429 | 100% | BGC0001095 |
| Bacillomycin D | 1,907,878 | 1,963,948 | 1,918,319 | 1,963,564 | 1,923,759 | 1,969,004 | 100% | BGC0001090 |
| Squalene-hopene synthase | 2,010,880 | 2,032,763 | 2,010,496 | 2,032,379 | 2,024,219 | 2,026,102 | 100% | Bacillus |
| T3PKS | 2,099,249 | 2,140,349 | 2,098,865 | 2,139,965 | 2,102,588 | 2,143,688 | 100% | Bacillus |
| Difficidin | 2,269,142 | 2,362,931 | 2,268,758 | 2,362,547 | 2,344,012 | 2,286,309 | 100% | BGC0000176 |
| PK-5x Cys | 2,851,295 | 2,900,808 | 2,850,911 | 2,906,712 | 2,873,990 | 2,884,225 | 88% | B. velezensis |
| Bacillibactin | 3,017,800 | 3,024,927 | 3,023,696 | 3,030,823 | 3,021,021 | 3,033,995 | 100% | BGC0000309 |
| Amylocyclicin | 3,039,655 | 3,045,228 | 3,045,551 | 3,051,124 | 3,043,470 | 3,049,481 | 100% | BGC0000616 |
| Bacilysin | 3,574,134 | 3,615,552 | 3,580,030 | 3,621,448 | 3,593,882 | 3,599,780 | 100% | BGC0001184 |
The phylogenomic analysis performed with TYGS [10] reveals that BP1.2A and BT2.4 are representatives of the species B. velezensis (Fig. 1). No differences to B. velezensis FZB42 were detected when the genomes were compared pairwise using ANIb [11] (Fig. 2), indicating their close relationship even though the sites of their isolation (Vietnam and Germany) are very remote from each other.
Fig. 1.
Phylogenetic tree of B. velezensis strains BP1.2A (CP085504) and BT2.4 (CP085505), labelled in red letters. The tree, based on whole genome sequences, was inferred with FastME 2.1.6.1 [9] from GBDP distances calculated from genome sequences. The branch lengths are scaled in terms of GBDP distance formula d5. The numbers below branches are GBDP pseudo-bootstrap support values from 100 replications, with an average branch support of 57.3%.
Fig. 2.
Pairwise comparison of the genomes of B. velezensis BP1.2A and BT2.4 with B. velezensis FZB42 and the type strain of B. velezensis, CCUG 50740, using ANIb [11].
Table 3 and the Venn diagram presented in Fig. 3 summarize the comparison of the whole genome sequences of BP1.2A and BT2.4 with FZB42. The three strains share a core genome of 3633 CDS. BP1.2A contains only one additional CDS (encoding a hypothetical protein) compared with BT2.4, suggesting that both strains are identical or nearly identical clones and that the observed difference is due to sequencing error(s). Slight differences were detected when the genomes were compared with FZB42: BP1.2A and BT2.4 harbored 41 and 40 CDS, respectively, that do not occur in the FZB42 genome. Conversely, FZB42 harbored a total of 67 singletons not present in the Vietnamese strains (Table 3). The slight differences to the numbers given in the Venn diagram (Fig. 3) are due to the different methods applied, as explained in the legend to Fig. 3.
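The terms core genome, pan genome, and singleton used above can be illustrated with plain set operations. This is a toy sketch under a strong simplifying assumption: each CDS is already assigned to an ortholog group identifier. EDGAR [12] derives such relationships from sequence comparisons, so the code does not reproduce its method, and the group identifiers below are hypothetical.

```python
from typing import Dict, Set


def core_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Ortholog groups present in every genome."""
    return set.intersection(*genomes.values())


def pan_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Ortholog groups present in at least one genome."""
    return set.union(*genomes.values())


def singletons(genomes: Dict[str, Set[str]], name: str) -> Set[str]:
    """Ortholog groups found only in the named genome."""
    others = set().union(*(g for n, g in genomes.items() if n != name))
    return genomes[name] - others


# Hypothetical ortholog group assignments, for illustration only
toy = {
    "FZB42": {"g1", "g2", "g3", "g7"},
    "BP1.2A": {"g1", "g2", "g4", "g5"},
    "BT2.4": {"g1", "g2", "g4"},
}
print(sorted(core_genome(toy)), sorted(pan_genome(toy)), sorted(singletons(toy, "BP1.2A")))
```

In the toy data, BP1.2A carries one group absent from both other genomes, which mirrors the kind of single-CDS difference between BP1.2A and BT2.4 described in the text.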
Fig. 3.
Venn diagram of the genomes of FZB42 (1), BP1.2A (2), and BT2.4 (3). Please note that the singleton numbers do not necessarily correspond to the numbers in the "Singleton" interface (Table 3). The Venn diagram constructed with EDGAR shows the number of best hits between subsets of genomes; however, a gene without a reciprocal best hit to another genome is not necessarily a singleton [12].
2. Experimental Design, Materials and Methods
2.1. Strain growth conditions and DNA isolation
Cultivation of the Bacillus strains and DNA isolation have been previously described [1].
2.2. Genome sequencing, assembly, and annotation
Short-read sequencing was conducted at LGC Genomics (Berlin, Germany) using Illumina HiSeq in paired-end 150 bp mode. Default parameters were used for all software unless otherwise specified. The short reads were trimmed and filtered using fastp [13] with default settings. Long-read sequencing was done in house on the Oxford Nanopore MinION with an R9.4.1 flow cell; libraries were prepared with the Ligation Sequencing Kit (SQK-LSK109). The samples were sequenced for 48 h and basecalled afterwards with Guppy v3.1.5. Long reads were trimmed using Porechop (https://github.com/rrwick/Porechop, v0.2.4) and filtered using Filtlong (https://github.com/rrwick/Filtlong, v0.2.0) with default settings. De novo assemblies were generated with the hybrid assembler Unicycler v0.4.8 [14]. The short-read assembly was done by SPAdes v3.13.0 [15] without read correction and with normal bridging, and the long-read assembly was done by racon v1.4.20. The quality of the assemblies was assessed by determining the ratio of falsely trimmed proteins using Ideel (https://github.com/phiweger/ideel).
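The read-processing and assembly steps described above can be outlined as a sequence of command lines. The sketch below only builds and prints the commands; it does not run them. The file names are hypothetical, and the option choices (in particular `--mode normal` and `--no_correct` for Unicycler) are an interpretation of "normal bridging" and "without read correction", not the exact commands used by the authors.

```python
import shlex
from typing import Dict, List


def build_commands(r1: str, r2: str, long_reads: str, outdir: str) -> Dict[str, List[str]]:
    """Assemble illustrative command lines for trimming, filtering, and hybrid assembly."""
    return {
        # Trim and filter paired-end short reads with default settings
        "fastp": ["fastp", "-i", r1, "-I", r2,
                  "-o", "trimmed_R1.fastq.gz", "-O", "trimmed_R2.fastq.gz"],
        # Remove adapters from the nanopore reads
        "porechop": ["porechop", "-i", long_reads, "-o", "porechopped.fastq.gz"],
        # Filtlong writes to standard output; redirect it to filtered.fastq when running
        "filtlong": ["filtlong", "porechopped.fastq.gz"],
        # Hybrid assembly from trimmed short reads and filtered long reads
        "unicycler": ["unicycler", "-1", "trimmed_R1.fastq.gz", "-2", "trimmed_R2.fastq.gz",
                      "-l", "filtered.fastq", "-o", outdir,
                      "--mode", "normal", "--no_correct"],
    }


commands = build_commands("BP1.2A_R1.fastq.gz", "BP1.2A_R2.fastq.gz",
                          "BP1.2A_nanopore.fastq.gz", "BP1.2A_unicycler")
for step, argv in commands.items():
    print(f"{step}: {shlex.join(argv)}")
```

Keeping each command as an argument list rather than a shell string makes it straightforward to pass to `subprocess.run` later, with the Filtlong output captured via the `stdout` argument.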
2.3. Phylogenomics
The genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS) for a whole genome-based analysis [10]. All pairwise comparisons were conducted using GBDP, with 100 distance replicates calculated for each comparison. The resulting intergenomic distances were used to infer a balanced minimum evolution tree via FastME 2.1.6.1 [9]. The tree was visualized with iTOL (https://itol.embl.de/#).
Ethics Statements
This work did not contain human subjects, animals, cell lines or endangered species.
CRediT authorship contribution statement
Christian Blumenscheit: Investigation, Methodology, Data curation, Software, Writing – original draft. Jennifer Jähne: Investigation, Methodology, Data curation. Andy Schneider: Investigation, Methodology. Jochen Blom: Software. Thomas Schweder: Conceptualization, Supervision. Peter Lasch: Conceptualization, Methodology, Supervision. Rainer Borriss: Conceptualization, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We thank Le Thi Thanh Tam, Division of Plant Pathology and Phyto-Immunology, Plant Protection Research Institute, Hanoi, Vietnam, for strains BP1.2A and BT2.4. This research was supported through the project ENDOBICA (Bundesministerium für Bildung und Forschung grant 031B0582A).
References
- 1. Tam L.T.T., Jähne J., Luong P.T., Thao L.T.P., Chung L.T.K., Schneider A., Blumenscheit C., Lasch P., Schweder T., Borriss R. Draft genome sequences of 59 endospore-forming Gram-positive bacteria associated with crop plants grown in Vietnam. Microbiol. Resour. Announc. 2020;9:e01154-20. doi: 10.1128/MRA.01154-20.
- 2. Chowdhury S.P., Hartmann A., Gao X., Borriss R. Biocontrol mechanism by root-associated Bacillus amyloliquefaciens FZB42 - a review. Front. Microbiol. 2015;6:780. doi: 10.3389/fmicb.2015.00780.
- 3. Overbeek R., Olson R., Pusch G.D., Olsen G.J., Davis J.J., Disz T., Edwards R.A., Gerdes S., Parrello B., Shukla M., Vonstein V., Wattam A.R., Xia F., Stevens R. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res. 2014;42:D206–D214. doi: 10.1093/nar/gkt1226.
- 4. Li W., O'Neill K.R., Haft D.H., DiCuccio M., Chetvernin V., Badretdin A., Coulouris G., Chitsaz F., Derbyshire M.K., Durkin A.S., Gonzales N.R., Gwadz M., Lanczycki C.J., Song J.S., Thanki N., Wang J., Yamashita R.A., Yang M., Zheng C., Marchler-Bauer A., Thibaud-Nissen F. RefSeq: expanding the prokaryotic genome annotation pipeline reach with protein family model curation. Nucleic Acids Res. 2021;49:D1020–D1028. doi: 10.1093/nar/gkaa1105.
- 5. Overbeek R., Begley T., Butler R.M., Choudhuri J.V., Chuang H.Y., Cohoon M., de Crécy-Lagard V., Diaz N., Disz T., Edwards R., Fonstein M., Frank E.D., Gerdes S., Glass E.M., Goesmann A., Hanson, Iwata-Reuyl D., Jensen R., Jamshidi N., Krause L., Kubal M., Larsen N., Linke B., McHardy A.C., Meyer F., Neuweger H., Olsen G., Olson R., Osterman A., Portnoy V., Pusch G.D., Rodionov D.A., Rückert C., Steiner J., Stevens R., Thiele I., Vassieva O., Ye Y., Zagnitko O., Vonstein V. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33(17):5691–5702. doi: 10.1093/nar/gki866.
- 6. Chen X.H., Koumoutsi A., Scholz R., Eisenreich A., Schneider K., Heinemeyer I., Morgenstern B., Voss A., Hess W.R., Reva O., Junge H., Voigt B., Jungblut P.R., Vater J., Süssmuth R., Liesegang H., Strittmatter A., Gottschalk G., Borriss R. Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat. Biotechnol. 2007;25(9):1007–1014. doi: 10.1038/nbt1325.
- 7. Blin K., Shaw S., Kloosterman A.M., Charlop-Powers Z., van Wezel G.P., Medema M.H., Weber T. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021;49:W29–W35. doi: 10.1093/nar/gkab335.
- 8. Kautsar S.A., Blin K., Shaw S., Navarro-Muñoz J.C., Terlouw B.R., van der Hooft J.J.J., van Santen J.A., Tracanna V., Suarez Duran H.G., Pascal Andreu V., Selem-Mojica N., Alanjary M., Robinson S.L., Lund G., Epstein S.C., Sisto A.C., Charkoudian L.K., Collemare J., Linington R.G., Weber T., Medema M.H. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res. 2020;48:D454–D458. doi: 10.1093/nar/gkz882.
- 9. Lefort V., Desper R., Gascuel O. FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 2015;32:2798–2800. doi: 10.1093/molbev/msv150.
- 10. Meier-Kolthoff J.P., Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat. Commun. 2019;10:2182. doi: 10.1038/s41467-019-10210-3.
- 11. Richter M., Rosselló-Móra R., Glöckner F.O., Peplies J. JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics. 2016;32:929–931. doi: 10.1093/bioinformatics/btv681.
- 12. Dieckmann M.A., Beyvers S., Nkouamedjo-Fankep R.C., Hanel P.H.G., Jelonek L., Blom J., Goesmann A. EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure. Nucleic Acids Res. 2021;49:W185–W192. doi: 10.1093/nar/gkab341.
- 13. Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 14. Wick R.R., Judd L.M., Gorrie C.L., Holt K.E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 2017;13(6). doi: 10.1371/journal.pcbi.1005595.
- 15. Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V., Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19(5):455–477. doi: 10.1089/cmb.2012.0021.