Draft genomes of 3 cyanobacteria strains and 17 co-habiting proteobacteria assembled from metagenomes

Maximilian Berthold; Martin Albrecht; Douglas A Campbell; Naaman M Omar

Microbiol Resour Announc. 2023 Nov 9;12(12):e00460-23. doi: 10.1128/MRA.00460-23

Editor: J Cameron Thrash

PMCID: PMC10720521 PMID: 37943043

ABSTRACT

Cyanobium and Synechococcus are prominent, globally distributed, and ecologically significant genera of cyanobacteria. Here, we report the genomes of the marine Synechococcus sp. CCMP836 and two strains of Cyanobium (CZS25K and CZS48M), along with the genomes of 17 co-occurring proteobacteria. These genomes will improve our understanding of the strain-specific ecological positions of these cyanobacteria.

KEYWORDS: cyanobacteria, proteobacteria, metagenomics

ANNOUNCEMENT

The global brackish microbiome is dominated by Synechococcus and Cyanobium (1), yet more strains are available in culture than have sequenced genomes. Cyanobium spp. CZS25K and CZS48M were isolated in 2017 from surface water of the Darss-Zingst lagoon system in the southern Baltic Sea (54.43°N, 12.68°E), a eutrophic lagoon system with a salinity range of 2 to 6 (2); details on isolation are given in reference 3. Both strains were obtained from the Applied Ecology Culture Collection at the University of Rostock. Synechococcus sp. CCMP836 (synonyms WH 8007, 838BG) was isolated from the surface waters of the Gulf of Mexico in 1980 (19.75°N, 92.41°W) (4) and obtained from NCMA Bigelow.

CZS25K and CZS48M were grown in freshwater BG11 medium (1 ppt salinity), whereas CCMP836 was grown in marine BG11 medium (32 ppt salinity), under a 12:12 photoperiod on a shaker for 7 days before DNA extraction. For short-read sequencing, DNA was extracted from xenic cultures of CCMP836, CZS25K, and CZS48M using a Qiagen DNeasy kit. Libraries were prepared using the NEBNext Ultra II DNA Library Prep kit for Illumina (New England Biolabs) and sequenced on an Illumina NovaSeq S4 lane using the Xp protocol, both as per the manufacturer’s recommendations, by Genome Québec, Montréal, Canada, yielding 30.6 million, 28.4 million, and 32.4 million paired-end reads of 150 bp, respectively. For long-read sequencing, DNA was extracted in a second extraction from xenic cultures of the same three strains using a Qiagen Genomic-tip 20/G kit. Libraries were prepared following the Pacific Biosciences protocol “Preparing whole genome and metagenome libraries using SMRTbell prep kit 3.0” and sequenced on a PacBio Sequel II (Genome Québec, Montréal, Canada). No size selection was done prior to sequencing.

Sequence quality was checked using FastQC 0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc), and reads were subsequently processed using Cutadapt 4.1 (5). Illumina reads and PacBio CCS reads were error-corrected and de novo assembled by SPAdes v3.15.4 (6) using the “-s” option for PacBio CCS reads and the “-1” and “-2” options for Illumina reads. Assemblies were aligned and sorted using Bowtie2 v2.4.4 (7) and SAMtools v1.17 (8) and binned using MetaBAT2 (9) with default parameters (Table 1). SAMtools v1.17 was also used to calculate genome coverage. Bin reliability was verified using FOCUS 1.5 (10) and CheckM v1.2.0 (11), and gaps were closed by GenomeFinisher (12) under default parameters using alternate MEGAHIT v1.2.9 (13) and GenPipes (14) assemblies. MEGAHIT v1.2.9 was run using the “-r” option for PacBio CCS reads, the “-1” and “-2” options for Illumina reads, and the “meta-sensitive” preset. GenPipes was run by Genome Québec, Montréal, Canada, using PacBio reads. The completeness of bins was assessed using BUSCO v5.2.2 (15) (Figure 1), and the bins were subsequently annotated using PGAP (16). Taxonomy was assigned using the GTDB-Tk v2.1.1 classify workflow (17). Default parameters were used for all software unless otherwise specified.
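The read-to-assembler mapping described above can be sketched as argument lists for the two assemblers. This is a minimal illustration, not the authors' actual invocation: the file names and output directories are hypothetical, the "-o" output flag is added because both tools require one, and any options not named in the text (threads, memory, k-mer sizes) are left at their defaults. The functions only build the commands; nothing is executed.

```python
import shlex
from typing import List


def spades_command(r1: str, r2: str, ccs: str, outdir: str) -> List[str]:
    """SPAdes: Illumina pairs via -1/-2, PacBio CCS reads via -s."""
    return ["spades.py", "-1", r1, "-2", r2, "-s", ccs, "-o", outdir]


def megahit_command(r1: str, r2: str, ccs: str, outdir: str) -> List[str]:
    """MEGAHIT: Illumina pairs via -1/-2, CCS reads via -r, meta-sensitive preset."""
    return [
        "megahit", "-1", r1, "-2", r2, "-r", ccs,
        "--presets", "meta-sensitive", "-o", outdir,
    ]


if __name__ == "__main__":
    # Hypothetical file names for one culture; replace with real paths.
    reads = ("CZS25K_R1.fastq.gz", "CZS25K_R2.fastq.gz", "CZS25K_ccs.fastq.gz")
    print(shlex.join(spades_command(*reads, "spades_CZS25K")))
    print(shlex.join(megahit_command(*reads, "megahit_CZS25K")))
```

Each list can be passed to `subprocess.run` once the paths point at real read files; keeping the commands as lists avoids shell quoting problems.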

TABLE 1. Taxonomy and attributes of binned metagenome-assembled genomes^a

Phylum	Order	Organism	Total length (Mbp)	No. of contigs	Largest contig (bp)	GC (%)	N50 (bp)	Genome coverage (×)	Source strain	Bin	Genome contamination (%, CheckM)	National Center for Biotechnology Information (NCBI) genome accession no.	NCBI Sequence Read Archive (SRA) accession nos.
C	Synechococcales	Cyanobium sp. CZS48M	2.769251	37	457,164	66.93	98,032	3,538.57	CZS48M	9	0.14	JAUCZK000000000	SRR24524117 SRR24524116
C	Synechococcales	Cyanobium sp. CZS25K	3.100390	34	369,722	67.92	132,422	4,373.01	CZS25K	11	0.82	JAUCZB000000000	SRR24524118 SRR24524119
C	Synechococcales	Synechococcus sp. CCMP836	2.253937	19	357,230	63.53	155,493	1,145.01	CCMP836	24	0	JAUCYY000000000	SRR24524120 SRR24524121
P	Alteromonadales	Alteromonas macleodii	4.571015	58	300,811	44.65	130,570	37.377	CZS25K	4	0	JAUCZE000000000	SRR24524118 SRR24524119
P	Alteromonadales	Alteromonas macleodii	4.503556	49	286,734	44.67	138,090	324.57	CCMP836	21	0	JAUCYV000000000	SRR24524120 SRR24524121
P	Burkholderiales	Hydrogenophaga sp.	4.017909	52	251,823	66.32	102,357	115.426	CZS48M	2	0	JAUCZG000000000	SRR24524117 SRR24524116
P	Burkholderiales	Hydrogenophaga sp.	3.855627	10	1,132,111	64.5	446,382	180.227	CZS25K	9	0	JAUCZF000000000	SRR24524118 SRR24524119
P	Caulobacterales	Maricaulis sp.	2.986590	32	433,653	60.81	129,008	233.057	CCMP836	23	0	JAUCYX000000000	SRR24524120 SRR24524121
P	Oceanibaculales	Oceanibaculum nanhaiense	3.500712	17	570,757	65.29	317,371	189.412	CZS48M	7	0	JAUCZI000000000	SRR24524117 SRR24524116
P	Oceanospirillales	Marinobacter salarius	4.275244	62	365,476	57.32	135,686	61.0175	CCMP836	13	0	JAUCYT000000000	SRR24524120 SRR24524121
P	Rhizobiales	Allorhizobium sp.	4.479681	30	509,794	61.4	227,190	40.1371	CCMP836	11	0	JAUCYR000000000	SRR24524120 SRR24524121
P	Rhizobiales	Allorhizobium sp.	4.342418	21	823,773	61.46	352,591	199.246	CZS25K	3	0	JAUCZD000000000	SRR24524118 SRR24524119
P	Rhodobacterales	Roseovarius sp.	4.804078	18	743,804	66.05	328,058	400.304	CCMP836	15	0	JAUCYU000000000	SRR24524120 SRR24524121
P	Rhodobacterales	Tabrizicola sp.	3.616173	17	987,302	63.3	372,849	319.115	CZS48M	3	0	JAUCZH000000000	SRR24524117 SRR24524116
P	Rhodobacterales	Rhodobacteraceae sp.	3.599685	21	573,686	63.23	224,738	249.302	CCMP836	22	0	JAUCYW000000000	SRR24524120 SRR24524121
P	Rhodospirillales	Thalassospira xiamenensis	4.764832	23	970,535	54.71	264,246	254.064	CCMP836	9	0	JAUCYZ000000000	SRR24524120 SRR24524121
P	Sphingomonadales	Blastomonas fulva	3.805544	56	387,436	64.57	116,093	62.7709	CZS25K	10	0	JAUCZA000000000	SRR24524118 SRR24524119
P	Sphingomonadales	Blastomonas sp.	3.658526	58	237,960	64.04	112,763	92.1295	CZS25K	1	0	JAUCZC000000000	SRR24524118 SRR24524119
P	Sphingomonadales	Blastomonas fulva	3.463794	8	1,112,259	64.39	693,030	1,031.03	CZS48M	8	0	JAUCZJ000000000	SRR24524117 SRR24524116
P	Sphingomonadales	Parasphingorhabdus sp.	3.127999	28	434,863	59.39	180,829	20.2539	CCMP836	12	0.84	JAUCYS000000000	SRR24524120 SRR24524121

^a The phylum column indicates whether organisms are cyanobacteria (C) or proteobacteria (P).
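For readers unfamiliar with the N50 column in Table 1, the statistic can be computed from a list of contig lengths as in the sketch below. It uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the function name and the example lengths are illustrative and are not taken from the assemblies reported here.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the cumulative length to at least
        # half of the assembly defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative length always reaches the total")


if __name__ == "__main__":
    # Toy example: total 300 bp, half is 150 bp; 100 + 80 = 180 >= 150.
    print(n50([100, 80, 60, 40, 20]))  # prints 80
```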

Fig 1. Predicted completeness of cyanobacterial and proteobacterial metagenome-assembled genomes based on core genes as analyzed by BUSCO.

Contributor Information

Maximilian Berthold, Email: mberthold@mta.ca.

J. Cameron Thrash, University of Southern California, Los Angeles, California, USA.

DATA AVAILABILITY

This project has been deposited at the NCBI under BioProject accession number PRJNA956506. The raw metagenomic sequence reads are available in the SRA under accession numbers SRR24524116 to SRR24524121. Genome assemblies have been deposited at DDBJ/ENA/GenBank under accession numbers JAUCYR000000000 to JAUCZK000000000.
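The accession ranges given here can be expanded into explicit lists, for example before scripting a download. The sketch below assumes that both ranges are contiguous and that the six-letter genome accession prefix increments like a base-26 counter; the individual accessions listed in Table 1 are consistent with that assumption. The helper names are hypothetical.

```python
from string import ascii_uppercase as LETTERS
from typing import List

PREFIX_WIDTH = 6  # letters in accessions such as JAUCYR000000000


def sra_range(first: str, last: str) -> List[str]:
    """Expand a run accession range such as SRR24524116 to SRR24524121."""
    start, end = int(first[3:]), int(last[3:])
    return [f"{first[:3]}{n}" for n in range(start, end + 1)]


def _prefix_to_int(prefix: str) -> int:
    value = 0
    for ch in prefix:
        value = value * 26 + LETTERS.index(ch)
    return value


def _int_to_prefix(value: int, width: int) -> str:
    chars = []
    for _ in range(width):
        value, rem = divmod(value, 26)
        chars.append(LETTERS[rem])
    return "".join(reversed(chars))


def wgs_range(first: str, last: str) -> List[str]:
    """Expand a genome accession range by counting the letter prefix in base 26."""
    suffix = first[PREFIX_WIDTH:]
    start = _prefix_to_int(first[:PREFIX_WIDTH])
    end = _prefix_to_int(last[:PREFIX_WIDTH])
    return [_int_to_prefix(n, PREFIX_WIDTH) + suffix for n in range(start, end + 1)]


if __name__ == "__main__":
    runs = sra_range("SRR24524116", "SRR24524121")
    genomes = wgs_range("JAUCYR000000000", "JAUCZK000000000")
    print(len(runs), len(genomes))  # prints 6 20
```

The counts match the study design: six read sets (three cultures, two sequencing platforms) and 20 genomes (3 cyanobacteria and 17 proteobacteria).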

REFERENCES

1. Doré H, Leconte J, Guyet U, Breton S, Farrant GK, Demory D, Ratin M, Hoebeke M, Corre E, Pitt FD, Ostrowski M, Scanlan DJ, Partensky F, Six C, Garczarek L, Blanchard JL. 2022. Global phylogeography of marine Synechococcus in coastal areas reveals strong community shifts. mSystems 7:e0065622. doi: 10.1128/msystems.00656-22
2. Schiewer U. 2008. Darß-Zingst Boddens, northern Rügener Boddens and Schlei, p 35–86. In Schiewer U (ed), Ecology of Baltic coastal waters. Springer, Berlin, Heidelberg.
3. Albrecht M, Pröschold T, Schumann R. 2017. Identification of cyanobacteria in a eutrophic coastal lagoon on the southern Baltic coast. Front Microbiol 8:923. doi: 10.3389/fmicb.2017.00923
4. Wood AM. 1985. Adaptation of photosynthetic apparatus of marine ultraphytoplankton to natural light fields. Nature 316:253–255. doi: 10.1038/316253a0
5. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j 17:10. doi: 10.14806/ej.17.1.200
6. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. 2017. metaSPAdes: a new versatile metagenomic assembler. Genome Res 27:824–834. doi: 10.1101/gr.213959.116
7. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923
8. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352
9. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. Retrieved 23 May 2023. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662567
10. Silva GGZ, Cuevas DA, Dutilh BE, Edwards RA. 2014. FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares. PeerJ 2:e425. doi: 10.7717/peerj.425
11. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Retrieved 23 May 2023. https://genome.cshlp.org/content/25/7/1043
12. Guizelini D, Raittz RT, Cruz LM, Souza EM, Steffens MBR, Pedrosa FO. 2016. GFinisher: a new strategy to refine and finish bacterial genome assemblies. Sci Rep 6:34963. doi: 10.1038/srep34963
13. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676. doi: 10.1093/bioinformatics/btv033
14. Bourgey M, Dali R, Eveleigh R, Chen KC, Letourneau L, Fillon J, Michaud M, Caron M, Sandoval J, Lefebvre F, Leveque G, Mercier E, Bujold D, Marquis P, Van PT, Anderson de Lima Morais D, Tremblay J, Shao X, Henrion E, Gonzalez E, Quirion P-O, Caron B, Bourque G. 2019. GenPipes: an open-source framework for distributed and scalable genomic analyses. Gigascience 8:giz037. doi: 10.1093/gigascience/giz037
15. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol 38:4647–4654. doi: 10.1093/molbev/msab199
16. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J. 2016. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44:6614–6624. doi: 10.1093/nar/gkw569
17. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2022. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38:5315–5316. doi: 10.1093/bioinformatics/btac672
