Abstract
Arthrobacter is a coryneform bacterium in the family Micrococcaceae. Arthrobacter species isolated from hostile environments are capable of producing interesting bioactive compounds, some of which may represent new classes of antibiotics. Here, we present the draft genome sequence of Arthrobacter sp. ES1, isolated from the Schirmacher Oasis in East Antarctica. Genomic DNA was sequenced on an Illumina MiSeq sequencer. Arthrobacter sp. ES1 has a genome size of 3,964,927 bp and a GC content of 65.73%. The raw genome sequences have been deposited in the NCBI Sequence Read Archive database under the accession number SRR20664316.
Keywords: Arthrobacter, Antarctica, Whole genome sequencing, Genome
Specification Table
| Subject | Biology |
| Specific subject area | Microbiology and genomics |
| Type of data | Table, Figure |
| How the data were acquired | The genomic library was constructed using the Nextera® XT DNA Sample Preparation Kit. Genome sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on the Illumina MiSeq platform. Raw sequencing data were trimmed and filtered using the SolexaQA and bowtie2 tools [1,2]. De novo assembly was performed using Velvet [3]. Genome completeness was assessed using the BUSCO tool [4]. Genome annotation was performed using the online web tool Rapid Annotation using Subsystem Technology (RAST) [5]. |
| Data format | Raw, Filtered, Analyzed, Assembled, Annotated |
| Description of data collection | Genomic DNA of Arthrobacter sp. ES1 was extracted using the QIAGEN® DNeasy Blood and Tissue kit. |
| Data source location | Arthrobacter sp. ES1 was isolated from a snow sample collected from the Schirmacher Oasis (S70° 07’ 2.3” E 22° 55’ 46.5”), East Antarctica. |
| Data accessibility | The data are hosted at the National Center for Biotechnology Information (NCBI). Bioproject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA344835; Biosample: https://www.ncbi.nlm.nih.gov/biosample/SAMN05712594; NCBI GenBank Accession Number: NZ_MQTO01000000, https://www.ncbi.nlm.nih.gov/nuccore/2119834887; Repository name: NCBI SRA database; Data identification number: SRR20664316; Direct URL to data: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR20664316 |
Value of the Data
- Whole genome sequence data can be used to identify Arthrobacter sp. ES1 and determine whether it is a new species.
- Whole genome sequence data of strain ES1 can be useful for comparative genomic studies with other Arthrobacter species.
- Unravelling the genome of strain ES1 may aid in the discovery of novel bioactive compound-coding gene clusters.
1. Objective
Arthrobacter spp. are ubiquitous in the environment and are frequently isolated from soil. Arthrobacter strains have exceptional survival abilities. They have been isolated from a variety of harsh environments, including radioactive and chemically contaminated sites as well as the polar regions. These Arthrobacter strains are capable of metabolizing and resisting environmental hazards and heavy metals [6]. In addition, polar environments have been proposed as a source of novel bioactive compounds. Wietz et al. isolated sixteen antibiotic-producing bacterial strains from the central Arctic Ocean [7]. Seven of these strains were Arthrobacter spp. that can produce arthrobacilins A and C under different growth conditions [7]. Our objective was to sequence, assemble, and annotate the genome of Arthrobacter sp. ES1 to support the discovery of new bioactive compounds and evolutionary research.
2. Data Description
This data set includes the raw, quality-filtered, and assembled DNA sequences of Arthrobacter sp. ES1, together with the annotated genome. The paired-end sequencing reads were designated ES1_R1.fastq and ES1_R2.fastq. Herein, we report the raw and clean sequencing reads, the assembly statistics, the genome quality, and the annotation. A total of 3,248,048 raw reads were generated, amounting to 490,455,248 bases (Table 1). The sequencing reads were then pre-processed to remove low-quality, contaminant, and short reads; 72.92% of the reads were retained as clean reads. The genome size of strain ES1 is 3,964,927 bp at 77× sequence coverage, with a GC content of 65.73%. The strain ES1 draft genome consists of 170 contigs, with a longest contig of 356,645 bases, an N50 of 66,568 bases, and an N90 of 15,117 bases. De novo assembly produced 111 small contigs (<10,000 bp) and 59 large contigs (>10,000 bp) (Table 2). The quality of the draft genome of strain ES1 was examined using Benchmarking Universal Single-Copy Orthologs (BUSCO) with the actinobacteria_odb9 lineage dataset, which recovered 97.8% complete BUSCOs (Fig. 1). The genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server. The output shows 3,904 coding sequences and 51 RNAs in strain ES1, and 25% of the coding sequences were classified into 285 subsystems (Fig. 2).
Table 1.
Read statistics before and after pre-processing for the forward (ES1_R1.fastq) and reverse (ES1_R2.fastq) reads.
| Description | Forward | Reverse | Total |
|---|---|---|---|
| Total Raw Reads | 1,624,024 | 1,624,024 | 3,248,048 |
| Total Raw Reads Bases | 245,227,624 | 245,227,624 | 490,455,248 |
| Total Clean Reads | 1,357,978 | 1,010,480 | 2,368,458 |
| Total Clean Reads Bases | 186,557,107 | 117,766,675 | 304,323,783 |
| Clean Reads (%) | 83.62 | 62.22 | 72.92 |
| GC Content Clean Reads (%) | 64 | 64 | |
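The percentages in Table 1 and the reported sequence coverage follow from simple ratios. The short Python sketch below recomputes them from the published counts. It assumes that coverage was estimated as total clean bases divided by genome size; the article does not state the formula, and the function names are illustrative only.

```python
# Values copied from Table 1 and the Data Description text.
RAW_READS = {"forward": 1_624_024, "reverse": 1_624_024}
CLEAN_READS = {"forward": 1_357_978, "reverse": 1_010_480}
TOTAL_CLEAN_BASES = 304_323_783
GENOME_SIZE_BP = 3_964_927


def percent_retained(clean: int, raw: int) -> float:
    """Share of raw reads that survived pre-processing, in percent."""
    return round(100 * clean / raw, 2)


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth, ASSUMED here to be sequenced bases / genome length."""
    return total_bases / genome_size


forward_pct = percent_retained(CLEAN_READS["forward"], RAW_READS["forward"])
reverse_pct = percent_retained(CLEAN_READS["reverse"], RAW_READS["reverse"])
total_pct = percent_retained(sum(CLEAN_READS.values()), sum(RAW_READS.values()))
coverage = mean_coverage(TOTAL_CLEAN_BASES, GENOME_SIZE_BP)

print(f"Clean reads: forward {forward_pct}%, reverse {reverse_pct}%, total {total_pct}%")
print(f"Estimated coverage: {coverage:.1f}x")
```

Under this assumption the estimate rounds to the 77× coverage reported in the text.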
Table 2.
Assembly statistics for the draft genome of Arthrobacter sp. ES1
| Feature | Value | Percentage (%) |
|---|---|---|
| Genome size | 3,964,927 | |
| Number of contigs | 170 | |
| N50 | 66,568 | 1.679 |
| N90 | 15,117 | 0.381 |
| L50 | 18 | |
| GC content (G+C bases) | 2,604,737 | 65.73 |
| Longest contig | 356,645 | 8.995 |
| Shortest contig | 201 | 0.005 |
| Number of contigs < 8k bases | 90 | 52.941 |
| Number of contigs > 8k bases | 21 | 12.353 |
| Number of contigs > 16k bases | 54 | 31.765 |
| Number of contigs > 100k bases | 4 | 2.353 |
| Number of contigs > 200k bases | 1 | 0.588 |
| Mean contig size | 23,323.1 | 0.588 |
| Median contig size | 7,247 | 0.182 |
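For readers unfamiliar with the contiguity metrics in Table 2, the sketch below shows one common way to compute N50, N90, and L50 from a list of contig lengths. The toy contig lengths are hypothetical and are not the ES1 contigs; the function name `nx_stats` is also illustrative. The last lines check the mean contig size against the genome size and contig count given in the table.

```python
def nx_stats(contig_lengths, fraction):
    """Return (Nx, Lx) for an assembly.

    Contigs are sorted longest first; Nx is the length of the contig at
    which the cumulative length first reaches `fraction` of the total
    assembly length, and Lx is the number of contigs needed to get there.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly (total length 300), NOT the ES1 contigs.
toy_contigs = [100, 80, 60, 40, 20]
n50, l50 = nx_stats(toy_contigs, 0.5)
n90, l90 = nx_stats(toy_contigs, 0.9)
print(f"N50={n50} (L50={l50}), N90={n90} (L90={l90})")

# Mean contig size from the values in Table 2.
GENOME_SIZE_BP = 3_964_927
NUM_CONTIGS = 170
mean_contig_size = round(GENOME_SIZE_BP / NUM_CONTIGS, 1)
print(f"Mean contig size: {mean_contig_size}")
```

Applied to the real contig lengths of the deposited assembly, the same function should reproduce the N50, N90, and L50 values in Table 2, provided the assembler used the same definition.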
Fig. 1.
Quality for the draft genome of Arthrobacter sp. ES1 was assessed by using the BUSCO tool tested with actinobacteria_odb9 lineage.
Fig. 2.
Subsystem distribution for the draft genome of Arthrobacter sp. ES1 generated from RAST.
3. Experimental Design, Materials, and Methods
3.1. Genome DNA extraction and sequencing
Arthrobacter sp. ES1 was grown in nutrient broth (NB) medium at 20°C for 3 days and used for genomic DNA extraction. Genomic DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen, Inc, USA) according to the manufacturer's instructions. The Nextera® XT DNA sample preparation kit was used to construct a genomic library. Whole-genome shotgun sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on an Illumina MiSeq sequencer to generate 150 bp paired-end reads.
3.2. Reads Pre-Processing, Genome Assembly, Quality Assessment, and Annotation
The raw reads were pre-processed with the SolexaQA tool to remove low-quality bases (Phred quality score < 20) and short reads (minimum length = 50 bp) [1]. Reads were then filtered using bowtie2 to remove phiX reads [2]. FastQC was used to confirm that the resulting clean reads were of high quality (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [8]. De novo assembly and scaffolding were performed using Velvet v1.2.10 [3]. The quality of the draft genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) [4]. The draft genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server [5].
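The quality-trimming and length-filtering step can be illustrated with a small Python sketch. This is not SolexaQA's code and makes no claim about its exact behaviour; it is a simplified stand-in that keeps the longest contiguous stretch of bases with Phred scores of at least 20 and discards reads shorter than 50 bp after trimming. Phred+33 quality encoding is assumed, and all function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def longest_high_quality_segment(scores, min_q=20):
    """Return (start, end) of the longest run in which every score >= min_q."""
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(scores):
        if q >= min_q:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def trim_read(sequence, quality, min_q=20, min_len=50):
    """Trim one read; return (sequence, quality) or None if it is too short."""
    start, end = longest_high_quality_segment(phred_scores(quality), min_q)
    if end - start < min_len:
        return None
    return sequence[start:end], quality[start:end]


# Synthetic read: 60 high-quality bases, 5 low-quality bases, 10 high-quality bases.
seq = "A" * 60 + "C" * 5 + "G" * 10
qual = "I" * 60 + "#" * 5 + "I" * 10  # 'I' = Q40, '#' = Q2 in Phred+33
kept = trim_read(seq, qual)
dropped = trim_read("A" * 40, "I" * 40)  # shorter than 50 bp, so discarded
print(kept is not None, dropped is None)
```

In paired-end data, discarding a read from only one mate leaves the two files with different read counts, which is consistent with the unequal forward and reverse clean-read totals in Table 1.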
Ethics Statement
This work involves neither human nor animal subjects. The authors declare that this manuscript is original work and has not been published elsewhere.
CRediT authorship contribution statement
Chui Peng Teoh: Conceptualization, Methodology, Data curation, Writing – original draft, Writing – review & editing. Nur Athirah Yusof: Conceptualization, Methodology. Cahyo Budiman: Conceptualization, Methodology. Yoke Kqueen Cheah: Conceptualization, Methodology. Clemente Michael Vui Ling Wong: Conceptualization, Methodology, Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Funding support from the Yayasan Penyelidikan Antartika Sultan Mizan (YPASM) and the Academy of Sciences, Malaysia is gratefully acknowledged. We would like to thank the Academy of Sciences, Malaysia for arranging our trip to the Antarctic, and the National Centre for Antarctic and Ocean Research (NCAOR), India for inviting Malaysians to join their scientific expedition.
Data Availability
WGS of Arthrobacter sp. ES1: Pure culture (Original data) (NCBI SRA database).
References
- 1. Cox M.P., Peterson D.A., Biggs P.J. SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinform. 2010;11:485. doi: 10.1186/1471-2105-11-485.
- 2. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 3. Zerbino D.R., Birney E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
- 4. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
- 5. Aziz R.K., Bartels D., Best A.A., DeJongh M., Disz T., Edwards R.A., Formsma K., Gerdes S., Glass E.M., Kubal M., Meyer F., Olsen G.J., Olson R., Osterman A.L., Overbeek R.A., McNeil L.K., Paarmann D., Paczian T., Parrello B., Pusch G.D., Reich C., Stevens R., Vassieva O., Vonstein V., Wilke A., Zagnitko O. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genom. 2008;9:75. doi: 10.1186/1471-2164-9-75.
- 6. Mongodin E.F., Shapir N., Daugherty S.C., DeBoy R.T., Emerson J.B., Shvartzbeyn A., Radune D., Vamathevan J., Riggs F., Grinberg V., Khouri H., Wackett L.P., Nelson K.E., Sadowsky M.J. Secrets of soil survival revealed by the genome sequence of Arthrobacter aurescens TC1. PLoS Genet. 2006;2:e214. doi: 10.1371/journal.pgen.0020214.
- 7. Wietz M., Månsson M., Bowman J.S., Blom N., Ng Y., Gram L. Wide distribution of closely related, antibiotic-producing Arthrobacter strains throughout the Arctic Ocean. Appl. Environ. Microbiol. 2012;78:2039–2042. doi: 10.1128/AEM.07096-11.
- 8. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/