Abstract
The draft genome of Streptomyces sp. FH025 comprises 8,381,474 bp with a high GC content of 72.51%. The genome contains 7035 coding sequences spanning 1261 contigs. Streptomyces sp. FH025 contains 57 secondary metabolite gene clusters, including polyketide synthase and nonribosomal peptide synthetase clusters, as well as other biosynthetic pathways such as amglyccycl, butyrolactone, terpenes, siderophores, lanthipeptide-class-iv, and ladderane. 16S rRNA gene analysis showed that Streptomyces sp. FH025 is closely related to members of the genus Streptomyces. This whole genome project has been deposited at NCBI under the accession JAFLNG000000000.
Keywords: Streptomyces sp., Draft genome sequence, FH025, Secondary metabolites, Anti-malarial activity
Specification Table
| Subject | Biology |
| Specific subject area | Microbiology, Bacterial genomics, Biotechnology |
| Type of data | Figure, Table, Draft genome sequence data |
| How data were acquired | Genome sequencing on the Illumina MiSeq platform |
| Data format | Raw and analyzed |
| Parameters for data collection | Genomic DNA was isolated from a pure culture of Streptomyces sp. FH025. AntiSMASH software predicted the putative biosynthetic gene clusters. |
| Description of data collection | Whole-genome sequencing, assembly, and annotation |
| Data source location | Soil samples used for bacteria isolation were collected at Likas, Sabah, Malaysia. (06°2′18.4″N 116° 7′16.6″E) |
| Data accessibility | The data is available at NCBI Genbank from the following links: http://www.ncbi.nlm.nih.gov/bioproject/705517 https://www.ncbi.nlm.nih.gov/biosample/18091016 https://www.ncbi.nlm.nih.gov/sra/PRJNA705517 |
Value of the Data
- The draft genome of Streptomyces strain FH025 shows that it is distinct from the other strains analyzed and has the potential to produce novel bioactive compounds.
- The putative secondary metabolite genes identified in the Streptomyces sp. FH025 genome could contribute to antibiotic and drug discovery for the treatment of various human diseases.
- Based on the genome data and a previous study, this strain is a potential candidate for the study of anti-malarial compounds and for the production of various enzymes.
1. Data Description
Streptomyces sp. FH025 was isolated from Likas, Sabah, Malaysia (06°2′18.4″ N 116° 7′16.6″ E). The characteristics of the draft genome of Streptomyces sp. FH025 are summarized in Table 1. The assembly consisted of 1261 contigs with a total size of 8,381,474 bp, a contig N50 of 10,071 bp and an L50 of 246; the GC content was 72.51%. Genome annotation identified 1261 contigs with protein-encoding genes, 406 subsystems and 7035 coding sequences (Table 1, Fig. 1), together with 74 RNAs.
Table 1.
Characteristics of draft genome assembly of Streptomyces sp. FH025.
| Attribute | Value |
|---|---|
| Number of contigs | 1261 |
| Total contig size (bp) | 8,381,474 |
| Contig N50 (bp)ᵃ | 10,071 |
| L50 | 246 |
| GC content (%) | 72.51 |
| Number of contigs (with protein encoding genes) | 1261 |
| Number of subsystems | 406 |
| Number of coding sequences | 7035 |
| Number of RNAs | 74 |
ᵃ Length of the shortest contig in the minimum set of contigs that together represent at least 50% of the total genome sequence.
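The assembly metrics reported in Table 1 can be recomputed from any contig set. The sketch below is illustrative only: it uses a toy list of sequences rather than the FH025 assembly, and the function name `assembly_stats` is an assumption, not part of the tools used in this work. It follows the usual convention that N50 is a contig length and L50 is the number of contigs needed to reach half of the total assembly size.

```python
def assembly_stats(contigs):
    """Return contig count, total size, N50, L50 and GC% for a list of sequences."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no sequence data supplied")

    # Walk from the longest contig down until half of the assembly is covered.
    running = 0
    n50 = l50 = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:
            n50, l50 = length, count
            break

    # GC content is the share of G and C bases over all bases.
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "l50": l50,
        "gc_percent": round(100 * gc / total, 2),
    }


# Toy contigs (hypothetical data, lengths 100, 80, 60, 40 and 20 bp).
toy = ["GC" * 50, "AT" * 40, "GGCC" * 15, "ATGC" * 10, "A" * 20]
stats = assembly_stats(toy)
print(stats)
```

For the toy input, the two longest contigs (100 + 80 bp) cover at least half of the 300 bp total, so the N50 is 80 bp and the L50 is 2.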
Fig. 1.
Subsystem statistics information of FH025 using RAST annotation. The subsystems category and corresponding feature counts were shown in the legend.
Additionally, antiSMASH analysis predicted that Streptomyces sp. FH025 could produce important secondary metabolites. A total of 57 secondary metabolite gene clusters (smCOG) were predicted (Table 2). The polyketide synthase (PKS) clusters comprised type I, type III and PKS-like clusters. Nine non-ribosomal peptide synthetase (NRPS), 10 NRPS-like and one NRPS-type I PKS clusters were identified. In addition, several other secondary metabolite biosynthetic pathways were present, such as amglyccycl, butyrolactone, terpenes, siderophores, lanthipeptide and ladderane.
Table 2.
Putative gene clusters coding for secondary metabolites detected by antiSMASH annotation of Streptomyces sp. FH025.
| Features | Number of clusters |
|---|---|
| No. of smCOG¹ | 57 |
| PKS² | |
| PKS-like | 2 |
| Type I | 17 |
| Type III | 2 |
| NRPS³ | 9 |
| NRPS-like | 10 |
| NRPS-Type I PKS | 1 |
| Biosynthetic Pathways | |
| Amglyccycl | 1 |
| Butyrolactone | 1 |
| Terpenes | 3 |
| Siderophores | 4 |
| Lanthipeptide-class-iv | 1 |
| Ladderane | 1 |
| RiPP-like | 1 |
| RRE-containing | 2 |
| NAPAA | 1 |
| Others | 1 |
¹ Secondary metabolism Clusters of Orthologous Groups.
² Polyketide synthase.
³ Nonribosomal peptide synthetase.
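As a quick consistency check, the per-type counts in Table 2 can be summed and compared with the reported total of 57 clusters. The snippet below simply transcribes the table. The grouping into three families (`PKS`, `NRPS`, `Biosynthetic pathways`) is our own labelling for illustration and is not an antiSMASH output format.

```python
# Cluster counts transcribed from Table 2 (antiSMASH annotation of strain FH025).
# The three group names are illustrative labels, not antiSMASH categories.
table2 = {
    "PKS": {"PKS-like": 2, "Type I": 17, "Type III": 2},
    "NRPS": {"NRPS": 9, "NRPS-like": 10, "NRPS-Type I PKS": 1},
    "Biosynthetic pathways": {
        "Amglyccycl": 1,
        "Butyrolactone": 1,
        "Terpenes": 3,
        "Siderophores": 4,
        "Lanthipeptide-class-iv": 1,
        "Ladderane": 1,
        "RiPP-like": 1,
        "RRE-containing": 2,
        "NAPAA": 1,
        "Others": 1,
    },
}

# Sum each group, then sum the groups to obtain the overall cluster count.
group_totals = {group: sum(counts.values()) for group, counts in table2.items()}
total_clusters = sum(group_totals.values())

for group, n in group_totals.items():
    print(f"{group}: {n}")
print(f"Total: {total_clusters}")
```

The rows add up to 21 PKS-type, 20 NRPS-type and 16 other clusters, which matches the 57 clusters listed in the first row of the table.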
ContEst16S analysis indicated that the draft genome assembly was not contaminated with other prokaryotic genomes. The 16S rRNA phylogenetic analysis revealed that Streptomyces sp. FH025 is closely related to members of the genus Streptomyces (Fig. 2). Furthermore, genome-based taxonomy analysis revealed that strain FH025 has the highest average nucleotide identity (ANI) value (89.42%) and the highest digital DNA-DNA hybridization (dDDH) value (38.4%) with Kitasatospora aureofaciens strain DM-1 (Table 3). However, strain FH025 was not affiliated with Kitasatospora because the ANI and dDDH values did not exceed the established species delimitation cutoffs for ANI (>95–96%) [1] and dDDH (>70%) [2]. The low genome identity of strain FH025 with the other strains analyzed indicates that strain FH025 is unique and warrants further investigation.
Fig. 2.
Phylogenetic tree of FH025 generated using the neighbor-joining method based on the 16S rRNA gene sequence (947 bp), showing that FH025 is closely related to members of the genus Streptomyces. The numbers at branch nodes indicate percentages from 1000 bootstrap replicates.
Table 3.
The 16S rRNA sequence similarity, ANI and dDDH values of strain FH025 and its closely related species.
| Closely related species | 16S rRNA sequence similarity (%) | OrthoANIu value (%) | dDDH value (%) |
|---|---|---|---|
| NC_016109.1 Kitasatospora setae KM 6054, complete sequence | 98.17 | 80.54 | 24.6 |
| NZ_CP020563.1 Kitasatospora albolonga strain YIM 101047 chromosome, complete genome | 97.46 | 75.52 | 21.8 |
| NZ_CP020567.1 Kitasatospora aureofaciens strain DM-1 chromosome, complete genome | 99.80 | 89.42 | 38.4 |
| NZ_CP025394.1 Kitasatospora sp. MMS16-BH015 chromosome, complete genome | 98.67 | 81.01 | 25.2 |
| NZ_CP054919.1 Kitasatospora sp. NA04385 chromosome, complete genome | 98.57 | 80.72 | 24.7 |
| Streptomyces clavuligerus strain ATCC 27064 chromosome, complete genome | 96.64 | 75.89 | 21.8 |
| Streptomyces galilaeus strain ATCC 14969 chromosome, complete genome | 96.13 | 75.59 | 21.3 |
| Streptomyces nitrosporeus strain ATCC 12769 chromosome, complete genome | 97.25 | 75.87 | 21.7 |
| Streptomyces subrutilus strain ATCC 27467 chromosome, complete genome | 96.85 | 76.23 | 21.5 |
| Streptomyces tsukubensis strain NRRL 18488 chromosome, complete genome | 96.95 | 75.59 | 21.8 |
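The species delimitation reasoning applied to Table 3 can be expressed as a simple threshold test. The sketch below uses the OrthoANIu and dDDH values from the table. The cutoffs are taken as ANI > 95% (the lower bound of the 95–96% range) and dDDH > 70%, and the helper name `same_species` is an assumption made for illustration.

```python
ANI_CUTOFF = 95.0   # lower bound of the 95-96% ANI species cutoff
DDDH_CUTOFF = 70.0  # dDDH species cutoff

# (reference genome, OrthoANIu %, dDDH %) transcribed from Table 3.
references = [
    ("Kitasatospora setae KM 6054", 80.54, 24.6),
    ("Kitasatospora albolonga YIM 101047", 75.52, 21.8),
    ("Kitasatospora aureofaciens DM-1", 89.42, 38.4),
    ("Kitasatospora sp. MMS16-BH015", 81.01, 25.2),
    ("Kitasatospora sp. NA04385", 80.72, 24.7),
    ("Streptomyces clavuligerus ATCC 27064", 75.89, 21.8),
    ("Streptomyces galilaeus ATCC 14969", 75.59, 21.3),
    ("Streptomyces nitrosporeus ATCC 12769", 75.87, 21.7),
    ("Streptomyces subrutilus ATCC 27467", 76.23, 21.5),
    ("Streptomyces tsukubensis NRRL 18488", 75.59, 21.8),
]


def same_species(ani, dddh):
    """True only if both genome-based indices exceed their species cutoffs."""
    return ani > ANI_CUTOFF and dddh > DDDH_CUTOFF


# Closest reference by ANI, and any references passing both cutoffs.
best_match = max(references, key=lambda row: row[1])
conspecific = [name for name, ani, dddh in references if same_species(ani, dddh)]

print("Closest reference:", best_match)
print("References above both cutoffs:", conspecific)
```

With the tabulated values, the closest reference is Kitasatospora aureofaciens DM-1, and no reference genome passes both cutoffs.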
2. Experimental Design, Materials and Methods
2.1. Sample collection and isolation of Streptomyces
Soil samples covered with dead leaves were collected under a Shorea parvifolia tree in Likas, Sabah, Malaysia, and bacterial isolation was performed as previously described [3]. Briefly, the soil samples were serially diluted and bacteria were isolated on modified humic acid agar (supplemented with vitamin B). Isolates were screened for anti-malarial activity, and FH025 was observed to exhibit anti-malarial activity, as previously described [3]. The isolate was sub-cultured on oatmeal agar (pH 7.2) at 28 °C to obtain a pure isolate named FH025. The culture was stored as a 20% glycerol stock at −80 °C.
2.2. DNA isolation, genome sequencing, assembly, and annotation
Genomic DNA was isolated using the Wizard® Genomic DNA Purification Kit according to the manufacturer's instructions (Promega, USA). A whole-genome sequencing library was prepared using the Nextera XT DNA library preparation kit following the manufacturer's instructions (Illumina, USA). The libraries were sequenced on the MiSeq platform (Illumina, USA) to generate 2 × 250 bp paired-end reads. Adapters were trimmed from the raw reads, and low-quality sequences (<Q30) were then trimmed using Trimmomatic version 0.38.0 [4]. Primary genome assembly was performed using Unicycler version 0.4.8.0 [5]. The primary draft genome was annotated using Rapid Annotation using Subsystems Technology (RAST) [6], [7], [8]. The secondary metabolite biosynthetic gene clusters in the draft genome of strain FH025 were identified using antiSMASH version 5.0 [9].
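To illustrate what trimming at Q30 means, the sketch below removes bases from the 3′ end of a read until a base with Phred quality of at least 30 is reached. It is a conceptual example only: the exact Trimmomatic steps and parameters used in this work are not stated, and the function `trim_trailing_low_quality` and the example read are hypothetical. Quality strings are assumed to use the Phred+33 encoding.

```python
def trim_trailing_low_quality(seq, qual, min_q=30, offset=33):
    """Trim 3' bases whose Phred score is below min_q (Phred+33 assumed)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string must have equal length")
    end = len(seq)
    # Step back from the 3' end while the last retained base is below min_q.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
read_seq = "ACGTACGT"
read_qual = "IIII?5##"
trimmed_seq, trimmed_qual = trim_trailing_low_quality(read_seq, read_qual)
print(trimmed_seq, trimmed_qual)
```

In this example the last three bases (Q20, Q2, Q2) are removed and the read is cut back to `ACGTA`, whose final base has a quality of exactly Q30.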
2.3. 16S rRNA phylogenetic analysis
ContEst16S software was used to extract the 16S rRNA gene sequence (981 bp) of Streptomyces sp. FH025 and to check for contamination by other prokaryotic genomes [10]. A basic local alignment search tool (BLAST) search was performed against the NCBI database, and the 16S rRNA gene sequences of the 20 closest species were retrieved. The sequences were aligned using ClustalW and trimmed to 947 bp [11]. The phylogenetic tree was constructed by the neighbor-joining method with 1000 bootstrap replicates using MEGA X software [12].
2.4. Average nucleotide identity and digital DNA-DNA hybridization genome-based taxonomy analysis
The ANI between the genome of strain FH025 and those of related species with complete genomes in the NCBI database was determined using the OrthoANIu algorithm [13]. Digital DNA-DNA hybridization (dDDH) was performed using Genome BLAST Distance Phylogeny against 10 closely related species with complete genome sequences obtained from the NCBI database [14].
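Distance-based methods such as neighbor joining start from pairwise distances between aligned sequences. The sketch below computes percent identity, and the corresponding p-distance, for two aligned sequences, skipping gapped columns. It is a simplified illustration with hypothetical toy sequences. The substitution model used in MEGA X is not stated here, so this should not be read as the exact distance calculation behind Fig. 2 or Table 3.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not FH025 data).
seq_a = "ACGT-ACGTA"
seq_b = "ACGTTACGAA"
identity = percent_identity(seq_a, seq_b)
p_distance = 1.0 - identity / 100.0
print(round(identity, 2), round(p_distance, 4))
```

Here 8 of the 9 ungapped columns match, giving an identity of about 88.89% and a p-distance of about 0.1111.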
Data Availability
The whole genome project was deposited at NCBI under the accession JAFLNG000000000.
Ethics Statement
This study did not involve any human subjects or animal experiments. No ethical approval was required.
CRediT Author Statement
Lucky Poh Wah Goh: Formal analysis, Data curation, Writing – original draft, Writing – review & editing; Fauze Mahmud: Writing – review & editing; Ping-Chin Lee: Conceptualization, Resources, Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no conflicts of interest that could influence the work reported in this paper.
Acknowledgement
This work is partly supported by Universiti Malaysia Sabah (GKP22–2018).
References
- 1. Varghese N.J., Mukherjee S., Ivanova N., Konstantinidis K.T., Mavrommatis K., Kyrpides N.C., Pati A. Microbial species delineation using whole genome sequences. Nucleic Acids Res. 2015;43:6761–6771. doi: 10.1093/nar/gkv657.
- 2. Meier-Kolthoff J.P., Klenk H.-P., Göker M. Taxonomic use of DNA G+C content and DNA–DNA hybridization in the genomic age. Int. J. Syst. Evol. Microbiol. 2014;64:352–356. doi: 10.1099/ijs.0.056994-0.
- 3. Dahari D.E., Salleh R.M., Mahmud F., Lee P.-C., Embi N., Sidek H.M. Anti-malarial activities of two soil actinomycete isolates from Sabah via inhibition of glycogen synthase kinase 3β. Trop. Life Sci. Res. 2016;27:53–71. doi: 10.21315/tlsr2016.27.2.5.
- 4. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 5. Wick R.R., Judd L.M., Gorrie C.L., Holt K.E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 2017;13 doi: 10.1371/journal.pcbi.1005595.
- 6. Aziz R.K., Bartels D., Best A.A., DeJongh M., Disz T., Edwards R.A., Formsma K., Gerdes S., Glass E.M., Kubal M., Meyer F., Olsen G.J., Olson R., Osterman A.L., Overbeek R.A., McNeil L.K., Paarmann D., Paczian T., Parrello B.…Zagnitko O. The RAST server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75. doi: 10.1186/1471-2164-9-75.
- 7. Overbeek R., Olson R., Pusch G.D., Olsen G.J., Davis J.J., Disz T., Edwards R.A., Gerdes S., Parrello B., Shukla M., Vonstein V., Wattam A.R., Xia F., Stevens R. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res. 2013;42:D206–D214. doi: 10.1093/nar/gkt1226.
- 8. Brettin T., Davis J.J., Disz T., Edwards R.A., Gerdes S., Olsen G.J., Olson R., Overbeek R., Parrello B., Pusch G.D., Shukla M., Thomason J.A., Stevens R., Vonstein V., Wattam A.R., Xia F. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 2015;5 doi: 10.1038/srep08365.
- 9. Blin K., Shaw S., Steinke K., Villebro R., Ziemert N., Lee S.Y., Medema M.H., Weber T. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res. 2019;47:W81–W87. doi: 10.1093/nar/gkz310.
- 10. Lee I., Chalita M., Ha S.-M., Na S.-I., Yoon S.-H., Chun J. ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences. Int. J. Syst. Evol. Microbiol. 2017;67:2053–2057. doi: 10.1099/ijsem.0.001872.
- 11. Thompson J.D., Higgins D.G., Gibson T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- 12. Kumar S., Stecher G., Li M., Knyaz C., Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018;35:1547–1549. doi: 10.1093/molbev/msy096.
- 13. Yoon S.-H., Ha S., Lim J., Kwon S., Chun J. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek. 2017;110:1281–1286. doi: 10.1007/s10482-017-0844-4.
- 14. Meier-Kolthoff J.P., Auch A.F., Klenk H.-P., Göker M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics. 2013;14:60. doi: 10.1186/1471-2105-14-60.