Genome sequence data of Bacillus sp. CCB-MMP212 isolated from Malaysian mangrove: A potential strain in arsenic resistance with ArsI, C•As lyase

Nor Azura Azami; Nyok-Sean Lau; Go Furusawa

doi:10.1016/j.dib.2022.108597

2022 Sep 13;45:108597.


PMCID: PMC9508498 PMID: 36164294

Abstract

Bacillus sp. CCB-MMP212 is a Gram-positive bacterium isolated from mangrove sediment in Matang, Perak, Malaysia (4.85496°N, 100.73495°E). Genome sequencing was performed using the Oxford Nanopore and Illumina platforms. The assembled genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server (rast.nmpdr.org). The genome of Bacillus sp. CCB-MMP212 is 6,151,644 base pairs (bp) in size with a G+C content of 34.75%. It includes 6,311 coding sequences and 58 RNAs. The sequence has been deposited in GenBank under accession number JALDQE000000000. Interestingly, an arsenic resistance (ars) operon consisting of the arsenic resistance operon repressor (arsR), ACR3 family arsenite efflux transporter (arsB), and arsenate reductase (arsC) genes was found in the genome. In addition, the arsenic inducible gene (arsI), which encodes a dioxygenase with C•As lyase activity, was also found in the ars operon. The enzyme is crucial for the demethylation of methylarsonous acid [MAs(III)] and the degradation of trivalent roxarsone [Rox(III)]. This dataset reveals the genetic capacity of this strain for arsenic resistance. To the best of our knowledge, arsI encoding C•As lyase is rarely reported within the genus Bacillus. Therefore, the dataset presented in this manuscript provides further insight into the arsenic resistance mechanisms of the genus Bacillus.

Keywords: Bacillus, Genome sequence, Arsenic resistance, Mangrove

Specification Table

Subject	Biology
Specific subject area	Microbiology, Genomics and Molecular Biology
Type of data	Tables, Figures and whole-genome sequencing data
How data were acquired	The complete genome sequence was determined using the Oxford Nanopore and Illumina platforms
Data format	Raw and analysed
Parameters for data collection	A pure culture of Bacillus sp. CCB-MMP212 was grown on marine agar (MA) at a temperature of 30°C and a pH of 7
Description of data collection	The genomic DNA was sequenced using the Oxford Nanopore and Illumina platforms, and the subsequent annotation was done using the RAST server
Data source location	Sediment samples were collected from Matang mangrove forest, Perak, Malaysia
Data accessibility	The complete genome sequence of Bacillus sp. CCB-MMP212 was deposited in NCBI GenBank under accession number JALDQE000000000. Direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/JALDQE000000000. Database links: BioProject PRJNA818481; BioSample SAMN26865143


Value of the Data

• The whole-genome sequence of Bacillus sp. CCB-MMP212 could provide valuable information to researchers working on Bacillus strains with potential for arsenic resistance.
• Bacillus sp. CCB-MMP212 could serve as a reference strain for arsI encoding C•As lyase in the genus Bacillus.
• The whole-genome sequence of Bacillus sp. CCB-MMP212 can contribute to the understanding of the molecular information and related characteristics of this strain.
• The data can be used by researchers working in the fields of Microbiology, Genomics, and Molecular Biology.

1. Data Description

Bacillus sp. CCB-MMP212 was isolated from mangrove sediment during a microbial diversity investigation of Matang Mangrove Forest, Perak, Malaysia. This study presents the whole-genome sequence of Bacillus sp. CCB-MMP212. Genome sequencing was performed using the Oxford Nanopore and Illumina platforms. The assembled genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server (rast.nmpdr.org) [1]. The genome contains 6,151,644 base pairs (bp) with a G+C content of 34.75%, and includes 6,311 coding sequences and 58 RNAs. The assembly statistics and genomic features of Bacillus sp. CCB-MMP212 are summarised in Table 1. The whole-genome sequence of Bacillus sp. CCB-MMP212 was used to infer its evolutionary relationship with closely related Bacillus species using the Type Strain Genome Server (TYGS) (https://tygs.dsmz.de) [2]. Fig. 1 shows that Bacillus sp. CCB-MMP212 is closely related to Bacillus thuringiensis ATCC 10792 and forms a clade with Bacillus cereus ATCC 14579. To confirm the phylogenetic relationship of CCB-MMP212, the average nucleotide identity (ANI) values and digital DNA-DNA hybridization (dDDH) values between Bacillus sp. CCB-MMP212 and closely related species were calculated by the OrthoANI algorithm [3] and TYGS, respectively. As shown in Table 2, the highest ANI value was obtained with Bacillus thuringiensis ATCC 10792 (98.19%), followed by Bacillus cereus ATCC 14579 (96.66%). The ANI values of all other strains in Table 2 were below the species boundary value (95%) [4]. The dDDH values of B. thuringiensis ATCC 10792 (84.4%) and B. cereus ATCC 14579 (71.5%) were higher than the species boundary value (70%) (Table 2) [5], which is consistent with the phylogenetic placement of Bacillus sp. CCB-MMP212.

Table 1.

Assembly statistics and genomic features of Bacillus sp. CCB-MMP212.

Number of contigs	41
Genome size (bp)	6,151,644
GC content (%)	34.75
Largest contig (bp)	933,547
N50 contig (bp)	556,935
N75 contig (bp)	211,718
L50 contig	5
L75 contig	10
Number of coding sequences	6,311
Number of RNAs	58
Number of subsystems	353
NCBI accession no.	JALDQE000000000
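
The N50 and L50 values in Table 1 follow the usual definition: contigs are sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there (N75 and L75 use 75% instead). The sketch below illustrates that definition only. The contig lengths are made up, not taken from the CCB-MMP212 assembly, and the function name `nx_lx` is an illustrative choice.

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives N50/L50; fraction=0.75 gives N75/L75.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            # 'length' is Nx, 'count' is Lx
            return length, count
    raise AssertionError("unreachable for 0 < fraction <= 1")


# Hypothetical contig lengths (bp), NOT the real assembly.
toy_contigs = [900, 700, 500, 300, 200, 100]
n50, l50 = nx_lx(toy_contigs)
n75, l75 = nx_lx(toy_contigs, fraction=0.75)
print(f"N50={n50} L50={l50} N75={n75} L75={l75}")
```

Applied to the real contig lengths of the deposited assembly, the same calculation should reproduce the N50, N75, L50 and L75 entries in Table 1.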


Fig. 1. Whole-genome phylogenetic tree constructed by the Type Strain Genome Server using the maximum likelihood method based on the Generalised Time Reversible (GTR) model. The tree shows the relationship between Bacillus sp. CCB-MMP212 and closely related species. Geobacillus stearothermophilus ATCC 12980 is included as the outgroup.

Table 2.

Comparison of several Bacillus isolates based on genomic metrics including digital DNA-DNA hybridization (dDDH) and average nucleotide identity (ANI).

Strain	dDDH (d4, in %)	ANI (%)	NCBI accession
Bacillus thuringiensis ATCC 10792	84.4	98.19	CM000753
Bacillus cereus ATCC 14579	71.5	96.66	AE016877
Bacillus toyonensis BCT-7112	45.0	91.61	CP006863
Bacillus tropicus N24	44.9	91.78	NZ_MACG00000000
Bacillus fungorum 17-SMS-01	44.5	91.52	NZ_NWUW00000000
Bacillus paranthracis Mn5	44.3	91.48	NZ_MACE00000000
Bacillus wiedmannii FSL W8-0169	44.0	91.29	NZ_LOBC00000000
Bacillus anthracis Ames	43.8	91.41	AE016879
Bacillus luti TD41	43.5	91.30	NZ_MACI00000000
Bacillus pacificus EB422	43.4	91.28	NZ_MACD00000000
Bacillus albus N35-10-2	43.1	91.13	NZ_MAOE00000000
Bacillus mobilis 0711P9-1	42.6	91.01	NZ_MACF00000000
Bacillus proteolyticus TD42	39.5	89.85	NZ_MACH00000000
Bacillus nitratireducens 4049	39.3	89.77	NZ_MAOC00000000
Bacillus mycoides DSM 2048	38.3	89.43	CM000742
Bacillus paramycoides NH24A2	37.1	88.97	NZ_MAOI00000000
Bacillus pseudomycoides DSM 12442	26.9	82.28	CM000745
Bacillus bingmayongensis FJAT-13831T	26.8	82.53	NZ_AKCS00000000
Bacillus cytotoxicus NVH 391-98	25.5	81.38	CP000764
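
The species-boundary comparison described in the text can be written out explicitly. The sketch below copies a few rows of Table 2 and checks them against the 95% ANI and 70% dDDH cutoffs quoted above. The variable and function names are illustrative, and treating a value exactly at the cutoff as passing is an assumption.

```python
# Species-boundary cutoffs quoted in the text (percent).
ANI_CUTOFF = 95.0
DDDH_CUTOFF = 70.0

# (strain, dDDH %, ANI %) copied from selected rows of Table 2.
table2_rows = [
    ("Bacillus thuringiensis ATCC 10792", 84.4, 98.19),
    ("Bacillus cereus ATCC 14579", 71.5, 96.66),
    ("Bacillus toyonensis BCT-7112", 45.0, 91.61),
    ("Bacillus anthracis Ames", 43.8, 91.41),
    ("Bacillus cytotoxicus NVH 391-98", 25.5, 81.38),
]


def above_species_boundary(dddh, ani):
    """True when both metrics meet the species-boundary cutoffs."""
    return dddh >= DDDH_CUTOFF and ani >= ANI_CUTOFF


above_boundary = [
    name for name, dddh, ani in table2_rows
    if above_species_boundary(dddh, ani)
]
for name in above_boundary:
    print(name)
```

Only the B. thuringiensis and B. cereus type strains pass both cutoffs, which matches the reading of Table 2 given in the text.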


Fig. 2 shows the subsystem statistics of Bacillus sp. CCB-MMP212. The bar chart on the left side of the figure depicts the percentage coverage of subsystems. The pie chart, generated by the RAST server and viewed in the SEED viewer, depicts the distribution of the 27 most common subsystem categories among 2118 subsystem categories. The most abundant subsystem categories were amino acids and derivatives (384), carbohydrates (281), and cofactors, vitamins, prosthetic groups, and pigments (158). Interestingly, an ars operon consisting of arsR, arsI, arsB, and arsC was present in the genome (Table 3). Yoshinaga and Rosen reported that trivalent organoarsenicals, such as MAs(III) and Rox(III), are degraded to As(III) by ArsI with C•As lyase activity [6]. As(III) might then be exported from the cell by the arsenite efflux permease ArsB. Thus, bacteria with C•As lyase, including CCB-MMP212, might play an important role in the arsenic biogeochemical cycle through the degradation of environmental organoarsenicals.

Table 3.

Arsenic enzyme coding genes found in Bacillus sp. CCB-MMP212 genome.

Start	Stop	Strand	Gene	Locus	No. of amino acids	Protein name	Description
348446	348751	+	arsR	MCI4251078.1	101	Arsenical resistance operon transcriptional regulator ArsR	As(III)-responsive repressor of transcription [1].
348812	349249	+	arsI	MCI4251079.1	145	Glyoxalase/bleomycin resistance/dioxygenase family protein	Responsible for MAs(III) demethylation. Cleaves the C•As bond in a wide range of trivalent organoarsenicals, including trivalent roxarsone [Rox(III)], yielding As(III) [3].
349268	350308	+	arsB	MCI4251080.1	346	ACR3 family arsenite efflux transporter	Extrudes trivalent arsenic, As(III), from the cell [3].
350329	350733	+	arsC	MCI4251081.1	134	Arsenate reductase (thioredoxin)	Reduces the arsenate ion (H₂AsO₄⁻) to the arsenite ion (AsO₂⁻) [2].
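
The coordinates in Table 3 can be checked against the reported protein lengths with simple arithmetic. The sketch below assumes the Start and Stop values are 1-based, inclusive, and include the stop codon; that convention is not stated in the table, but it is the one under which the numbers agree. It also computes the gaps between neighbouring genes. All names in the code are illustrative.

```python
# (gene, start, stop) copied from Table 3; all genes are on the + strand.
ars_genes = [
    ("arsR", 348446, 348751),
    ("arsI", 348812, 349249),
    ("arsB", 349268, 350308),
    ("arsC", 350329, 350733),
]


def protein_length(start, stop):
    """Amino-acid count for a CDS given 1-based inclusive coordinates.

    Assumes the stop codon lies inside the interval, hence the "- 1".
    """
    nucleotides = stop - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return nucleotides // 3 - 1


protein_lengths = {gene: protein_length(s, e) for gene, s, e in ars_genes}

# Intergenic gap (bp) between each gene and the next one downstream.
gaps = {
    f"{a[0]}-{b[0]}": b[1] - a[2] - 1
    for a, b in zip(ars_genes, ars_genes[1:])
}

print(protein_lengths)
print(gaps)
```

The short gaps between the four genes, all on the same strand, are in line with their description as a single operon, although the spacing alone does not demonstrate co-transcription.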


2. Experimental Design, Materials and Methods

2.1. Sample collection

Bacillus sp. CCB-MMP212 was isolated from sediment in Matang Mangrove Forest, Perak, Malaysia. The strain was deposited in the Centre for Chemical Biology-Microbial Biodiversity Library (CCB-MBL) in freeze-dried form and as a 40% glycerol stock stored at −80°C.

2.2. DNA Extraction

DNA extraction was performed according to the method of Sokolov [7] with slight modifications. The bacterial suspension was spun down and the supernatant (ethanol) was removed by decantation. The pellet was resuspended in 500 µL of lysis buffer (50 mM NaCl, 50 mM Tris-HCl pH 8, 50 mM EDTA, 2% SDS) and incubated for 30 min at 60°C. A volume of 3 µL of RNase A (10 mg/mL) was added to the lysate, which was then incubated for 10 min at room temperature. A volume of 50 µL (0.1× volume) of saturated KCl was added and the lysate was held at 4°C for 5 min to remove the salt. The lysate was extracted once with an equal volume of chloroform to remove the remaining proteins. The aqueous layer containing the DNA was mixed with an equal volume of isopropanol and 20 µL of solid-phase reversible immobilization (SPRI) beads to promote the binding of DNA onto the carboxylated bead surface [8]. The mixture was incubated for 10 min at room temperature, then placed on a magnetic rack for 2 min, and the supernatant was discarded. The bound magnetic beads were washed twice with 75% ethanol. The beads were resuspended in 100 µL of TE buffer and incubated at 50°C for 5 min to elute the DNA.

2.3. Nanopore and Illumina library preparation and genome sequencing

Approximately 400 ng of DNA, as measured by Qubit, was fragmented with the Nanopore rapid barcoding kit according to the manufacturer's instructions (Oxford Nanopore, UK). The sample was sequenced on a Nanopore Flongle flow cell. Guppy v4.4.1 was used to basecall the fast5 files in high-accuracy mode [9]. For Illumina sequencing, approximately 100 ng of DNA was fragmented to 350 bp using a Bioruptor, and the library was prepared with the NEB Ultra II library preparation kit for Illumina according to the manufacturer's instructions (NEB, Ipswich, MA). The sample was sequenced on a NovaSeq 6000 (Illumina, San Diego, CA), yielding approximately 1 Gb of paired-end data (2×150 bp).
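
As a rough worked example, the Illumina sequencing depth implied by these figures can be estimated by dividing the yield by the genome size. The sketch below reads the reported yield as about one gigabase and uses exactly 1e9 bases for illustration; the true yield is only given approximately, so the result is an order-of-magnitude estimate rather than a reported value.

```python
# Assumption: "approximately 1 Gb" is taken as exactly 1e9 bases here.
ILLUMINA_YIELD_BP = 1_000_000_000
# Assembled genome size from Table 1.
GENOME_SIZE_BP = 6_151_644


def expected_coverage(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


illumina_depth = expected_coverage(ILLUMINA_YIELD_BP, GENOME_SIZE_BP)
print(f"Estimated Illumina depth: about {illumina_depth:.0f}x")
```

Under these assumptions the short-read data correspond to a depth on the order of 160-fold.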

2.4. Hybrid De novo assembly - Nanopore and Illumina

Raw Nanopore reads were quality- and length-filtered to retain reads with quality scores of 7 or higher that were longer than 2,000 bp. The filtered Nanopore reads were then used in combination with the Illumina reads for hybrid assembly with Unicycler (default settings) [10]. Contigs shorter than 500 bp were removed, and the filtered assembly was used for further analysis.
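
The filtering criteria above can be expressed as two small predicates. The source does not name the filtering tool or say how the per-read score was computed, so the sketch below makes two assumptions: qualities are Phred+33 encoded, and the read score is the mean taken in error-probability space (a common convention for Nanopore reads). All function and constant names are illustrative.

```python
import math

MIN_MEAN_Q = 7        # keep reads with a score of 7 or higher
MIN_READ_LEN = 2000   # keep reads longer than 2,000 bp
MIN_CONTIG_LEN = 500  # drop contigs shorter than 500 bp


def mean_qscore(quality_string, offset=33):
    """Mean Phred score of a read, averaged over error probabilities."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_read(sequence, quality_string):
    """Apply the read-level length and quality thresholds."""
    return (len(sequence) > MIN_READ_LEN
            and mean_qscore(quality_string) >= MIN_MEAN_Q)


def keep_contig(sequence):
    """Apply the contig-level length threshold after assembly."""
    return len(sequence) >= MIN_CONTIG_LEN


# Toy reads (not real data): '5' encodes Q20 and '$' encodes Q3 in Phred+33.
long_good = ("A" * 2500, "5" * 2500)
short_good = ("A" * 1500, "5" * 1500)
long_poor = ("A" * 2500, "$" * 2500)

for label, (seq, qual) in [("long_good", long_good),
                           ("short_good", short_good),
                           ("long_poor", long_poor)]:
    print(label, keep_read(seq, qual))
```

Only the first toy read passes both thresholds; the second fails on length and the third on quality.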

Ethics Statement

NA.

Credit Author Statement

Nor Azura Azami: Methodology, Writing – original draft, Writing – reviewing; Lau Nyok-Sean: Methodology and editing; Go Furusawa: Supervision, Writing – review & editing.

Funding Sources

This work was supported by the Short-Term Grant (304/PCCB/6315540) by Universiti Sains Malaysia awarded to Nor Azura.

Declaration of Competing Interest

The authors declared that they have no conflicts of interest.

Acknowledgement

The authors thank WorldFish and GeneSEQ for the gDNA extraction, sequencing and data analysis.

Data Availability

Bacillus sp. CCB-MMP212, whole genome shotgun sequencing project (Original data) (NCBI).

References

1. Aziz R.K., Bartels D., Best A., DeJongh M., Disz T., Edwards R.A., Formsma K., Gerdes S., Glass E.M., Kubal M., Meyer F., Olsen G.J., Olson R., Osterman A.L., Overbeek R.A., McNeil L.K., Paarmann D., Paczian T., Parrello B., Pusch G.D., Reich C., Stevens R., Vassieva O., Vonstein V., Wilke A., Zagnitko O. The RAST server: rapid annotations using subsystems technology. BMC Genom. 2008;9:1–15. doi: 10.1186/1471-2164-9-75.
2. Meier-Kolthoff J.P., Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat. Commun. 2019;10:2182. doi: 10.1038/s41467-019-10210-3.
3. Yoon S.H., Ha S.M., Lim J.M., Kwon S.J., Chun J. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie van Leeuwenhoek. 2017;110:1281–1286. doi: 10.1007/s10482-017-0844-4.
4. Lee I., Kim Y.O., Park S.C., Chun J. OrthoANI: An improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 2016;66(22):1100–1103. doi: 10.1099/ijsem.0.000760.
5. Meier-Kolthoff J.P., Klenk H.P., Göker M. Taxonomic use of DNA G+C content and DNA–DNA hybridization in the genomic age. Int. J. Syst. Evol. Microbiol. 2014;64:352–356. doi: 10.1099/ijs.0.056994-0.
6. Yoshinaga M., Rosen B.P. A C•As lyase for degradation of environmental organoarsenical herbicides and animal husbandry growth promoters. Proc. Natl. Acad. Sci. U.S.A. 2014;111(21):7701–7706. doi: 10.1073/pnas.1403057111.
7. Sokolov E.P. An improved method for DNA isolation from mucopolysaccharide-rich molluscan tissues. J. Molluscan Stud. 2000;66(4):573–575. doi: 10.1093/mollus/66.4.573.
8. Oberacker P., Stepper P., Bond D.M., Hohn S., Focken J., Meyer V., Schelle L., Sugrue V.J., Jeunen G.J., Moser T., Hore S.R., Meyenn F.V., Hipp K., Hore T.A., Jurkowski T.P. Bio-on-magnetic-beads (BOMB): open platform for high-throughput nucleic acid extraction and manipulation. PLoS Biol. 2019;10(17):1–16. doi: 10.1371/journal.pbio.3000107.
9. Wick R.R., Judd L.M., Holt K.E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019;20:129. doi: 10.1186/s13059-019-1727-y.
10. Wick R.R., Judd L.M., Gorrie C.L., Holt K.E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 2017;13(6). doi: 10.1371/journal.pcbi.1005595.
