Draft Genome Sequence of a Poly-γ-Glutamic Acid-Producing Isolate, Bacillus paralicheniformis Strain bcasdu2018/01

Anirban Adhikary; Tanu Bansal; Pooja Gupta; Deepti Jain; Purnima Anand; Rani Gupta; Jugsharan Singh Virdi; Ruchi Gulati Marwah

2021 Nov 18;10(46):e01013-21. doi: 10.1128/MRA.01013-21

Editor: Steven R. Gill

PMCID: PMC8601133 PMID: 34792384

ABSTRACT

Bacillus paralicheniformis bcasdu2018/01 was isolated from the indoor environment of a chemistry laboratory. This isolate produces copious amounts of poly-γ-glutamic acid (γ-PGA) as part of its extracellular matrix. Here, we report the 4.25-Mbp draft genome assembly of the organism, which has a G+C content of 45.92%.

ANNOUNCEMENT

Bacillus paralicheniformis bcasdu2018/01 was isolated as a colony on a nutrient agar plate (HiMedia Laboratories, India) exposed to the indoor environment of a chemistry laboratory at our institute in New Delhi, India. Members of the Bacillus paralicheniformis group remain understudied with respect to biofilm formation and their capacity to secrete γ-PGA, a biopolymer of industrial interest. Identifying strains with a suitable genetic background is therefore essential for cost-effective γ-PGA production. To this end, we performed Illumina-based whole-genome sequencing of this isolate through Xcelris Labs (Ahmedabad, Gujarat, India).

Initially, the 16S rRNA gene was amplified using primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1492R (5′-TACGGYTACCTTGTTACGACTT-3′) and sequenced by the Sanger method (GenBank accession number MK881510.2). A BLAST (1) search against the NCBI rRNA database showed the closest similarity to Bacillus haynesii strain NRRL B-41327 (99.59%) and Bacillus licheniformis strain DSM-13 (99.46%). Genomic DNA was isolated using TRIzol reagent according to the manufacturer's instructions; its quality was assessed by agarose gel electrophoresis, and its concentration was determined with a Qubit 2.0 fluorometer.

A paired-end (PE) sequencing library was prepared with a TruSeq Nano DNA library prep kit and sequenced on the Illumina HiSeq 2500 platform (2 × 150 bp chemistry), generating a total of 141,059,626 reads (4,876× coverage). Adapter sequences were removed and low-quality reads were filtered using Trimmomatic v0.36 (2) with the following parameters: ILLUMINACLIP:adapter.fasta:2:30:10, LEADING:10, TRAILING:10, SLIDINGWINDOW:0:20, and MINLEN:40. The high-quality PE reads were assembled de novo using Velvet v1.2.10 (3), with the assembly optimized at a k-mer size of 121. Gaps in the assembled scaffolds were closed with GapCloser v1.12 using PE read information (4). The final assembly consisted of 28 scaffolds with a total size of 4,248,830 bp, a scaffold N50 of 824,371 bp, an average scaffold length of 151,744 bp, maximum and minimum scaffold lengths of 1,060,188 bp and 511 bp, respectively, and a G+C content of 45.92%.

Genes were predicted on the assembled scaffolds using NCBI PGAP v5.2 (5). Homologs of the predicted protein-coding genes were identified in the NCBI nonredundant (nr) database using BLASTP with an E value threshold of 1e-05, and similarity searches were also run against the UniProt, COG, and Pfam databases.
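The assembly metrics reported above can be recomputed from the scaffold sequences in a few lines. The sketch below is an illustration under stated assumptions, not the authors' pipeline: scaffolds are passed as plain strings (FASTA parsing is omitted), N50 is taken as the length of the scaffold at which the running total, summed from the longest scaffold downward, first reaches half of the assembly size, and G+C is computed over unambiguous A/C/G/T bases only, ignoring N gap characters (an assumption; tools differ on this point). All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AssemblyStats:
    n_scaffolds: int
    total_length: int
    n50: int
    mean_length: float
    min_length: int
    max_length: int
    gc_percent: float


def scaffold_n50(lengths: Sequence[int]) -> int:
    """Length of the scaffold at which the running total, summed from the
    longest scaffold down, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


def gc_content_percent(seqs: Sequence[str]) -> float:
    """G+C as a percentage of unambiguous A/C/G/T bases (N and other codes ignored)."""
    gc = acgt = 0
    for seq in seqs:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / acgt


def assembly_stats(seqs: Sequence[str]) -> AssemblyStats:
    """Summarize a list of scaffold sequences (hypothetical helper)."""
    lengths: List[int] = [len(s) for s in seqs]
    return AssemblyStats(
        n_scaffolds=len(lengths),
        total_length=sum(lengths),
        n50=scaffold_n50(lengths),
        mean_length=sum(lengths) / len(lengths),
        min_length=min(lengths),
        max_length=max(lengths),
        gc_percent=gc_content_percent(seqs),
    )


if __name__ == "__main__":
    # Toy scaffolds only; the real input would be the 28 assembled scaffolds.
    toy = ["ACGT" * 25, "GGCC" * 10, "ATNNAT" * 5]
    print(assembly_stats(toy))
```

As a consistency check on the reported figures, 4,248,830 bp divided by 28 scaffolds is approximately 151,744 bp, matching the stated average scaffold length.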
Taxonomic identity was established by in silico DNA-DNA hybridization (DDH) using formula 2 on the GGDC 2.1 server (6) and confirmed by ribosomal multilocus sequence typing (rMLST) (7). Secondary metabolite gene clusters were predicted using antiSMASH v6.0 (8), and bacteriocins using BAGEL4 (9). Microsatellites were predicted using MISA-web (10), prophages using PHASTER (11, 12), and clustered regularly interspaced short palindromic repeats (CRISPRs) using CRISPRFinder (13). Sequence identities of genes of interest were determined with NCBI BLASTN. Default parameters were used for all software unless stated otherwise.
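Formula 2 of the GGDC server is described in its documentation as the sum of identities within high-scoring segment pairs (HSPs) divided by the overall HSP length, a formulation reportedly chosen to be robust to incomplete draft genomes. The sketch below shows only that distance calculation, assuming the HSPs between the two genomes have already been produced by a local aligner. The `HSP` record and `formula2_distance` function are hypothetical, and GGDC's own HSP filtering and the regression model it uses to convert the distance into the reported DDH estimate are not reproduced here.

```python
from typing import Iterable, NamedTuple


class HSP(NamedTuple):
    """One high-scoring segment pair (hypothetical minimal record)."""
    length: int      # alignment length of the HSP
    identities: int  # number of identical aligned positions


def formula2_distance(hsps: Iterable[HSP]) -> float:
    """Distance = 1 - (sum of identities / sum of HSP lengths).

    0.0 means every aligned position is identical; larger values mean more divergence.
    """
    total_length = 0
    total_identities = 0
    for hsp in hsps:
        if not 0 <= hsp.identities <= hsp.length:
            raise ValueError(f"inconsistent HSP: {hsp}")
        total_length += hsp.length
        total_identities += hsp.identities
    if total_length == 0:
        raise ValueError("no HSPs, so the distance is undefined")
    return 1.0 - total_identities / total_length


if __name__ == "__main__":
    # Two made-up HSPs: 1,470 identities over 1,500 aligned bases.
    print(formula2_distance([HSP(1000, 990), HSP(500, 480)]))  # about 0.02
```

Because the distance is normalized by HSP length rather than by genome length, unassembled or missing regions in a draft genome mainly reduce the number of HSPs instead of inflating the distance.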

PGAP identified a total of 4,278 genes: 4,108 protein-coding sequences, 73 pseudogenes, 77 tRNA genes, 15 rRNA genes, and 5 noncoding RNA (ncRNA) genes. In silico DDH between this isolate and the type strain Bacillus paralicheniformis KJ16 gave a value of 90.6%, well above the 70% threshold conventionally used to delineate species, which assigns the isolate to B. paralicheniformis. rMLST analysis independently identified it as Bacillus paralicheniformis. The genome contains one putative CRISPR module, one intact and one incomplete prophage, and one simple sequence repeat (SSR). Functional annotation against the UniProt database identified the following genes involved in γ-PGA metabolism and secretion (BLASTN percent identities to Bacillus licheniformis ATCC 14580 are given in parentheses): pgsB (97.72%), pgsC (98.22%), pgsA (96.58%), pgsE (96.43%), pgdS (94.38%), and ggt (94.8%). All of these genes except ggt are located in a single operon. The genome also contains putative biosynthetic gene clusters for the secondary metabolites fengycin, lichenysin, bacteriocin, bacitracin, polyketides, bacilysin, petrobactin, streptin, bottromycin, lanthipeptides, and lasso peptides, as well as for the siderophore bacillibactin.
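Per-gene percent identities such as those listed above are typically read from BLAST tabular output. A minimal sketch, assuming BLAST+ tabular format (`-outfmt 6`) with its default 12 columns: hits whose E value exceeds a cutoff (here the 1e-05 threshold stated for the BLASTP searches) are discarded, and for each query gene the hit with the highest bit score is kept. The function name, gene names, subject IDs, and demo rows are hypothetical; the authors do not describe filtering criteria beyond the stated E value threshold.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Return {query: (subject, percent identity)} for each query's top-bitscore hit.

    Hits with an E value above max_evalue are discarded before ranking.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        rec = dict(zip(OUTFMT6, row))
        if float(rec["evalue"]) > max_evalue:
            continue  # fails the E value cutoff
        query, bitscore = rec["qseqid"], float(rec["bitscore"])
        if query not in best or bitscore > best[query][2]:
            best[query] = (rec["sseqid"], float(rec["pident"]), bitscore)
    return {q: (subject, pident) for q, (subject, pident, _) in best.items()}


if __name__ == "__main__":
    # Made-up rows for hypothetical genes; not data from this genome.
    demo = io.StringIO(
        "geneA\tsubjA\t97.50\t1000\t25\t0\t1\t1000\t1\t1000\t0.0\t1800\n"
        "geneA\tsubjB\t80.00\t1000\t200\t0\t1\t1000\t1\t1000\t1e-150\t900\n"
        "geneB\tsubjC\t94.00\t500\t30\t0\t1\t500\t1\t500\t2e-120\t800\n"
        "geneC\tsubjD\t40.00\t90\t54\t0\t1\t90\t1\t90\t0.5\t30\n"
    )
    print(best_hits(demo))  # geneC is dropped by the 1e-05 cutoff
```

Ranking by bit score rather than by percent identity avoids favoring short, nearly identical fragments over longer, more informative alignments.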

Data availability.

The whole-genome shotgun project has been deposited at GenBank under the accession number JAGTPZ000000000. The reads were deposited in the Sequence Read Archive (SRA) under the accession number SRX12486464. The BioProject accession number for the sequencing project is PRJNA724655.

ACKNOWLEDGMENTS

This work was supported by a grant to the Department of Microbiology, Bhaskaracharya College of Applied Sciences, under the Star College Scheme of the Department of Biotechnology, Government of India (to R.G.M.), and by core funding from the Regional Centre for Biotechnology, Faridabad (to D.J.).

Contributor Information

Ruchi Gulati Marwah, Email: ruchi.gulati@bcas.du.ac.in.

Steven R. Gill, University of Rochester School of Medicine and Dentistry

REFERENCES

1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
2. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170.
3. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. doi: 10.1101/gr.074492.107.
4. Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, Fan G, Liu X, Xu X, Deng L, Zhang Y. 2020. TGS-GapCloser: a fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience 9:1–11. doi: 10.1093/gigascience/giaa094.
5. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J. 2016. NCBI Prokaryotic Genome Annotation Pipeline. Nucleic Acids Res 44:6614–6624. doi: 10.1093/nar/gkw569.
6. Auch AF, Klenk HP, Göker M. 2010. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genomic Sci 2:142–148. doi: 10.4056/sigs.541628.
7. Jolley KA, Bliss CM, Bennett JS, Bratcher HB, Brehony C, Colles FM, Wimalarathna H, Harrison OB, Sheppard SK, Cody AJ, Maiden MCJ. 2012. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain. Microbiology (Reading) 158:1005–1015. doi: 10.1099/mic.0.055459-0.
8. Blin K, Shaw S, Kloosterman AM, Charlop-Powers Z, van Wezel GP, Medema MH, Weber T. 2021. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 49:W29–W35. doi: 10.1093/nar/gkab335.
9. Van Heel AJ, De Jong A, Song C, Viel JH, Kok J, Kuipers OP. 2018. BAGEL4: a user-friendly Web server to thoroughly mine RiPPs and bacteriocins. Nucleic Acids Res 46:W278–W281. doi: 10.1093/nar/gky383.
10. Beier S, Thiel T, Münch T, Scholz U, Mascher M. 2017. MISA-web: a Web server for microsatellite prediction. Bioinformatics 33:2583–2585. doi: 10.1093/bioinformatics/btx198.
11. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, Wishart DS. 2016. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res 44:W16–W21. doi: 10.1093/nar/gkw387.
12. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. 2011. PHAST: a fast phage search tool. Nucleic Acids Res 39:W347–W352. doi: 10.1093/nar/gkr485.
13. Grissa I, Vergnaud G, Pourcel C. 2007. CRISPRFinder: a Web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35:W52–W57. doi: 10.1093/nar/gkm360.
