Whole genome sequence data of an Antarctic bacterium, Arthrobacter sp. ES1 from the Schirmacher Oasis, East Antarctica

Chui Peng Teoh; Nur Athirah Yusof; Cahyo Budiman; Yoke Kqueen Cheah; Clemente Michael Vui Ling Wong

doi:10.1016/j.dib.2023.109052

Data Brief. 2023 Mar 8;48:109052. doi: 10.1016/j.dib.2023.109052

PMCID: PMC10024075 PMID: 36942092

Abstract

Arthrobacter is a genus of coryneform bacteria in the family Micrococcaceae. Arthrobacter species isolated from hostile environments are capable of producing interesting bioactive compounds, some of which may represent new classes of antibiotics. Here, we present the draft genome sequence of Arthrobacter sp. ES1 isolated from the Schirmacher Oasis in East Antarctica. Genomic DNA sequencing was performed using the Illumina MiSeq sequencer. Arthrobacter sp. ES1 has a genome size of 3,964,927 bp and a GC content of 65.73%. The raw genome sequences have been deposited in the NCBI Sequence Read Archive database under the accession number SRR20664316.

Keywords: Arthrobacter, Antarctica, Whole genome sequencing, Genome

Specification Table

Subject	Biology
Specific subject area	Microbiology and genomics
Type of data	Table, Figure
How the data were acquired	The genomic library was constructed using the Nextera® XT DNA Sample Preparation kit. Genome sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on the Illumina MiSeq platform. Raw sequencing data were trimmed and filtered using the SolexaQA and bowtie2 tools [1,2]. De novo assembly was performed using Velvet [3]. Genome completeness was assessed using the BUSCO tool [4]. Genome annotation was performed using the online web tool Rapid Annotation using Subsystem Technology (RAST) [5].
Data format	Raw, Filtered, Analyzed, Assembled, Annotated
Description of data collection	Genomic DNA of Arthrobacter sp. ES1 was extracted using the QIAGEN® DNeasy Blood and Tissue kit.
Data source location	Arthrobacter sp. ES1 was isolated from a snow sample collected from the Schirmacher Oasis (S70° 07’ 2.3” E 22° 55’ 46.5”), East Antarctica.
Data accessibility	The data are hosted at the National Center for Biotechnology Information (NCBI). BioProject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA344835; BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN05712594; NCBI GenBank accession number: NZ_MQTO01000000 (https://www.ncbi.nlm.nih.gov/nuccore/2119834887); Repository name: NCBI SRA database; Data identification number: SRR20664316; Direct URL to data: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR20664316

Value of the Data

• Whole genome sequence data can be used to identify Arthrobacter sp. ES1 and determine whether it is a new species.
• Whole genome sequence data of strain ES1 can be useful for comparative genomic studies with other Arthrobacter species.
• Unravelling the genome of strain ES1 may aid in the discovery of novel bioactive compound-coding gene clusters.

1. Objective

Arthrobacter spp. are ubiquitous in the environment and are frequently isolated from soil. Arthrobacter strains have exceptional survival abilities: they have been isolated from a variety of harsh environments, including radioactive and chemically contaminated sites as well as the polar regions. These strains are capable of metabolizing and resisting environmental hazards and heavy metals [6]. In addition, polar environments have been proposed as a source of novel bioactive compounds. Wietz et al. [7] isolated sixteen antibiotic-producing bacterial strains from the central Arctic Ocean. Seven of these strains were Arthrobacter spp. that can produce arthrobacilins A and C under different growth conditions [7]. Our objective was to sequence, assemble, and annotate the genome of Arthrobacter sp. ES1, which may support the discovery of new bioactive compounds and future evolutionary research.

2. Data Description

This data set includes raw and assembled DNA sequences that have been quality-assessed, as well as the annotated genome of Arthrobacter sp. ES1. The paired-end sequencing reads were designated ES1_R1.fastq and ES1_R2.fastq. Herein, the raw and clean sequencing reads, the assembly statistics, the genome quality, and the annotation are reported. A total of 3,248,048 raw reads were generated, comprising 490,455,248 bases (Table 1). The sequencing reads were then pre-processed to remove low-quality, contaminant, and short reads, after which 72.92% of the reads were retained as clean reads. The genome size of strain ES1 is 3,964,927 bp at 77× sequence coverage, with a GC content of 65.73%. The draft genome consists of 170 contigs; the longest contig is 356,645 bases, the N50 is 66,568 bases, and the N90 is 15,117 bases. De novo assembly produced 111 small contigs (<10,000 bp) and 59 large contigs (>10,000 bp) (Table 2). The quality of the draft genome was examined using Benchmarking Universal Single-Copy Orthologs (BUSCO) with the actinobacteria_odb9 lineage, which reported 97.8% complete BUSCOs (Fig. 1). Genome annotation was performed using the Rapid Annotation using Subsystem Technology (RAST) server. The output shows 3,904 coding sequences and 51 RNAs in strain ES1, with 25% of the coding sequences classified into 285 subsystems (Fig. 2).
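The clean-read percentage and the sequence coverage quoted above can be reproduced from the reported totals. The sketch below is only an arithmetic check; it assumes that coverage was calculated as total clean bases divided by genome size, which the text does not state explicitly. The function and variable names are illustrative.

```python
# Figures copied from the Data Description text and Table 1.
RAW_READS = 3_248_048
CLEAN_READS = 2_368_458
CLEAN_BASES = 304_323_783
GENOME_SIZE_BP = 3_964_927


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


clean_pct = percent(CLEAN_READS, RAW_READS)
# Assumption: coverage is based on the clean (post-filtering) bases.
coverage = fold_coverage(CLEAN_BASES, GENOME_SIZE_BP)

print(f"Clean reads retained: {clean_pct:.2f}%")
print(f"Coverage from clean bases: {coverage:.1f}x")
```

Under this assumption the result rounds to the reported 77× coverage.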

Table 1.

Pre-processed sequencing reads statistics of forward (ES1_R1.fastq) and reverse (ES1_R2.fastq) reads.

Description	Forward	Reverse	Total
Total Raw Reads	1,624,024	1,624,024	3,248,048
Total Raw Reads Bases	245,227,624	245,227,624	490,455,248
Total Clean Reads	1,357,978	1,010,480	2,368,458
Total Clean Reads Bases	186,557,107	117,766,675	304,323,783
Clean Reads (%)	83.62	62.22	72.92
GC Content Clean Reads (%)	64	64
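
For readers who want to recompute a GC percentage such as the one reported in Table 1, a minimal sketch follows. It assumes GC content is taken over unambiguous A/C/G/T bases only (N and other ambiguity codes are ignored); the tools used in this work may count bases differently. The example sequences are toy data, not ES1 reads.

```python
from typing import Iterable


def gc_percent(sequences: Iterable[str]) -> float:
    """GC percentage over all A/C/G/T bases in the given sequences."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / (gc + at)


# Toy sequences for illustration only.
toy_reads = ["GGCC", "ATGC"]
print(f"GC content: {gc_percent(toy_reads):.1f}%")
```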

Table 2.

Assembly statistics for the draft genome of Arthrobacter sp. ES1

Feature	Value	Percentage (%)
Genome size	3,964,927
Numbers of contigs	170
N50	66,568	1.679
N90	15,117	0.381
L50	18
GC content	2,604,737	65.73
Longest contig	356,645	8.995
Shortest contig	201	0.005
Number of contigs < 8k bases	90	52.941
Number of contigs > 8k bases	21	12.353
Number of contigs > 16k bases	54	31.765
Number of contigs > 100k bases	4	2.353
Number of contigs > 200k bases	1	0.588
Mean contig size	23,323.1	0.588
Median contig size	7,247	0.182
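
The N50, N90, and L50 values in Table 2 follow the usual definitions: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly size, while Lx is the number of contigs needed to reach it. The sketch below illustrates that definition on toy contig lengths; the function name `nx_lx` is hypothetical and the input is not the ES1 assembly.

```python
from typing import List, Tuple


def nx_lx(contig_lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction of the total assembly length.

    Nx is the length of the contig at which the running total of lengths,
    taken from longest to shortest, first reaches `fraction` of the assembly;
    Lx is the number of contigs needed to get there.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    target = fraction * sum(contig_lengths)
    running = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("fraction must be between 0 and 1")


# Toy contig lengths for illustration only; these are NOT the ES1 contigs.
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy, 0.5)
n90, l90 = nx_lx(toy, 0.9)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

Applied to the real contig lengths of an assembly (for example, parsed from its FASTA file), the same function would yield the corresponding N50, N90, and L50 values.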

Fig. 1. Quality of the draft genome of Arthrobacter sp. ES1, assessed using the BUSCO tool with the actinobacteria_odb9 lineage.

Fig. 2. Subsystem distribution for the draft genome of Arthrobacter sp. ES1 generated by RAST.

3. Experimental Design, Materials, and Methods

3.1. Genome DNA extraction and sequencing

Arthrobacter sp. ES1 was grown in nutrient broth (NB) medium at 20°C for 3 days and then used for genomic DNA extraction. Genomic DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen, Inc., USA) according to the manufacturer's instructions. The Nextera® XT DNA sample preparation kit was used to construct the genomic library. Whole-genome shotgun sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on an Illumina MiSeq sequencer to generate 150 bp paired-end reads.

3.2. Reads Pre-Processing, Genome Assembly, Quality Assessment, and Annotation

The raw reads were pre-processed with the SolexaQA tool to remove low-quality bases (Qphred < 20) and short reads (minimum length = 50) [1]. Reads were then filtered using bowtie2 to remove phiX reads [2]. FastQC was used to confirm that the resulting clean reads were of high quality (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [8]. De novo assembly and scaffolding were performed using Velvet v1.2.10 [3]. The quality of the draft genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) [4]. The draft genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server [5].
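
The quality-trimming and length-filtering step can be illustrated with a short sketch. It assumes a longest-contiguous-segment strategy (keep the longest run of bases with Phred quality of at least 20, then discard reads shorter than 50 bases) and Phred+33 quality encoding. This is an illustration of the idea only, not SolexaQA's implementation, and the function names are hypothetical.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33   # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_QUALITY = 20    # threshold from the text: bases with Qphred < 20 removed
MIN_LENGTH = 50     # minimum read length from the text


def longest_good_segment(qualities: str, min_q: int = MIN_QUALITY) -> Tuple[int, int]:
    """Return (start, end) of the longest run of bases with quality >= min_q."""
    best_start = best_end = 0
    run_start = 0
    for i, ch in enumerate(qualities):
        if ord(ch) - PHRED_OFFSET < min_q:
            run_start = i + 1  # a low-quality base breaks the current run
        elif i + 1 - run_start > best_end - best_start:
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Trim a read to its best segment; return None if it becomes too short."""
    start, end = longest_good_segment(qual)
    if end - start < MIN_LENGTH:
        return None
    return seq[start:end], qual[start:end]


# Toy read: 5 low-quality bases, 60 high-quality bases, 3 low-quality bases.
toy_seq = "A" * 68
toy_qual = "#" * 5 + "I" * 60 + "#" * 3   # '#' is Q2, 'I' is Q40
trimmed = trim_read(toy_seq, toy_qual)
print(len(trimmed[0]) if trimmed else "read discarded")
```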

Ethics Statement

This work involves neither human nor animal subjects. The authors declare that this manuscript is original work and has not been published elsewhere.

CRediT authorship contribution statement

Chui Peng Teoh: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing. Nur Athirah Yusof: Conceptualization, Methodology. Cahyo Budiman: Conceptualization, Methodology. Yoke Kqueen Cheah: Conceptualization, Methodology. Clemente Michael Vui Ling Wong: Conceptualization, Methodology, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Funding support from the Yayasan Penyelidikan Antartika Sultan Mizan (YPASM) and the Academy of Sciences, Malaysia is gratefully acknowledged. We would like to thank the Academy of Sciences, Malaysia for arranging our trip to the Antarctic, and the National Centre for Antarctic and Ocean Research (NCAOR), India for inviting Malaysians to join their scientific expedition.

Data Availability

WGS of Arthrobacter sp. ES1: Pure culture (Original data) (NCBI SRA database).

References

1. Cox M.P., Peterson D.A., Biggs P.J. SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinform. 2010;11:485. doi: 10.1186/1471-2105-11-485.
2. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
3. Zerbino D.R., Birney E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
4. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
5. Aziz R.K., Bartels D., Best A.A., DeJongh M., Disz T., Edwards R.A., Formsma K., Gerdes S., Glass E.M., Kubal M., Meyer F., Olsen G.J., Olson R., Osterman A.L., Overbeek R.A., McNeil L.K., Paarmann D., Paczian T., Parrello B., Pusch G.D., Reich C., Stevens R., Vassieva O., Vonstein V., Wilke A., Zagnitko O. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genom. 2008;9:75. doi: 10.1186/1471-2164-9-75.
6. Mongodin E.F., Shapir N., Daugherty S.C., DeBoy R.T., Emerson J.B., Shvartzbeyn A., Radune D., Vamathevan J., Riggs F., Grinberg V., Khouri H., Wackett L.P., Nelson K.E., Sadowsky M.J. Secrets of soil survival revealed by the genome sequence of Arthrobacter aurescens TC1. PLoS Genet. 2006;2:e214. doi: 10.1371/journal.pgen.0020214.
7. Wietz M., Månsson M., Bowman J.S., Blom N., Ng Y., Gram L. Wide distribution of closely related, antibiotic-producing Arthrobacter strains throughout the Arctic Ocean. Appl. Environ. Microbiol. 2012;78:2039–2042. doi: 10.1128/AEM.07096-11.
8. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
