Proceedings of the National Academy of Sciences of the United States of America. 2017 Mar 6;114(12):3198–3203. doi: 10.1073/pnas.1618556114

Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea

Tiago Leao a, Guilherme Castelão b, Anton Korobeynikov c,d, Emily A Monroe e, Sheila Podell a, Evgenia Glukhov a, Eric E Allen a, William H Gerwick a,f, Lena Gerwick a,1
PMCID: PMC5373345  PMID: 28265051

Significance

The genus Moorea has yielded more than 40% of all reported marine cyanobacterial natural products. Preliminary genomic data suggest that many more natural products are yet to be discovered. However, incomplete genomic information has hampered the discovery of novel compounds using genome-mining approaches. Here, we report a complete genome of a filamentous marine tropical cyanobacterium, Moorea producens PAL, along with improved assemblies of three other Moorea draft genomes. Our analyses revealed a vast and distinctive natural product metabolic potential in these strains, highlighting that they are still an excellent source of unique metabolites despite previous extensive studies.

Keywords: tropical marine cyanobacteria, genome comparison, biosynthetic gene clusters, heterocyst glycolipids, gene cluster network

Abstract

Cyanobacteria are major sources of oxygen, nitrogen, and carbon in nature. In addition to the importance of their primary metabolism, some cyanobacteria are prolific producers of unique and bioactive secondary metabolites. Chemical investigations of the cyanobacterial genus Moorea have resulted in the isolation of over 190 compounds in the last two decades. However, preliminary genomic analysis has suggested that genome-guided approaches can enable the discovery of novel compounds from even well-studied Moorea strains, highlighting the importance of obtaining complete genomes. We report a complete genome of a filamentous tropical marine cyanobacterium, Moorea producens PAL, which reveals that about one-fifth of its genome is devoted to production of secondary metabolites, an impressive four times the cyanobacterial average. Moreover, possession of the complete PAL genome has allowed improvement to the assembly of three other Moorea draft genomes. Comparative genomics revealed that they are remarkably similar to one another, despite their differences in geography, morphology, and secondary metabolite profiles. Gene cluster networking highlights that this genus is distinctive among cyanobacteria, not only in the number of secondary metabolite pathways but also in the content of many pathways, which are potentially distinct from all other bacterial gene clusters to date. These findings portend that future genome-guided secondary metabolite discovery and isolation efforts should be highly productive.


Cyanobacteria are carbon-fixing, oxygenic photosynthetic prokaryotes that play essential roles in nearly every biotic environment. Moreover, the development of oxygenic photosynthesis in cyanobacteria was responsible for creating Earth’s oxygen-rich atmosphere, thereby stimulating evolution of the extraordinary species diversity currently present (1, 2). In the open ocean, nitrogen-fixing (N2-fixing) cyanobacteria are the major source of biological nitrogen, and this can be a limiting factor to productivity in these oligotrophic environments (3). Filamentous diazotrophic cyanobacteria from subsection VIII, such as Nostoc and Anabaena, fix nitrogen within specialized cells called heterocysts (4).

Apart from their importance in biogeochemical cycles because of their primary metabolism, cyanobacteria are also a prolific source of secondary metabolites known as natural products (NPs). NPs from diverse life forms have been major inspirational sources of therapeutic agents used to treat cancer, infections, inflammation, and many other disease states (5). One genus of cyanobacteria in particular, Moorea, has been an exceptionally rich source of novel bioactive NPs (6). This taxonomic group, previously identified as “marine Lyngbya” but recently reclassified on the basis of genetic data as Moorea, consists of large, nondiazotrophic filaments that are mostly found growing benthically in shallow tropical marine environments (7). This genus has already yielded over 190 new NPs in the past two decades, accounting for more than 40% of all reported marine cyanobacterial NPs (8). The discovery of these NPs was mostly driven by classical isolation approaches, although this has been accelerated by the recent development of mass spectrometry (MS)-based molecular networking (groups metabolites according to their MS fragmentation fingerprints, simplifying the search for new NPs or their analogs) (9). Genomic analyses of these filamentous cyanobacteria have revealed that even well-studied strains possess additional genetic capacity to produce novel and chemically unique NPs (10), and suggest that bottom-up approaches (11) would be productive; a recent example is given by the discovery and description of the columbamides from Moorea bouillonii (12). Additionally, and despite the growing interest and importance of genome-guided isolation of NPs as well as the vast biosynthetic potential of these tropical filamentous marine cyanobacteria, not a single complete genome is available in the public databases. Such a complete genome is essential to serve as a reference for other sequencing projects and thereby improve our understanding of their full biosynthetic capacity to produce NPs.

In the present project, we applied a variety of computational and assembly methods to obtain a complete genome of a tropical filamentous marine cyanobacterium (the genome of Moorea producens PAL). This knowledge was applied to three other draft genomes by reference assembly (Moorea producens JHB, Moorea producens 3L, and Moorea bouillonii PNG), thereby greatly improving their assemblies as well as the ensuing evaluation of their metabolic and NP-producing capabilities. Comparisons between these genomes demonstrated that these four strains are remarkably similar, despite their differences in geographical site, morphology, and NP chemistry. Additionally, the presence in Moorea spp. of glycolipid biosynthetic genes associated with heterocyst formation, the site of nitrogen fixation in some filamentous cyanobacteria, suggests that this genus evolved from one that was capable of fixing atmospheric nitrogen. Moreover, we observed that these four Moorea strains are metabolically distinct from all previously described cyanobacteria, both in number and content of their NP pathways, providing support and raising expectations for future genome-guided isolation efforts.

Results and Discussion

Geographical, Morphological, and Chemical Features of Four Filamentous Marine Cyanobacteria.

The present study analyzed and compared four strains of tropical filamentous marine cyanobacteria of the genus Moorea (Fig. 1): M. producens PAL 15AUG08-1, M. producens JHB 22AUG96-1, M. producens NAK12DEC93-3L, and M. bouillonii PNG 19MAY05-8 (abbreviated as PAL, JHB, 3L, and PNG, respectively). All of these strains have been laboratory cultured in saltwater BG-11 media since the time of their original collection. PAL was collected from a remote island in the Northern Pacific Ocean, Palmyra Atoll, in August 2008, and it produces the NPs palmyramide A and curacin D. PNG was collected from Papua New Guinea in May 2005, and it produces columbamides A–C, apratoxins A–C, and lyngbyabellin A. These two Pacific Ocean strains have similar morphologies composed of discoid cells that are arranged into large isopolar filaments, present as trichomes covered by thick mucilaginous sheaths (7). The exterior of the sheath material is richly populated with various heterotrophic bacteria, some of which may exist in obligate commensal relationships (13). However, M. bouillonii PNG has a lighter coloration and thinner filaments (around 20–40 µm instead of 80–100 µm in PAL). The other two strains described here, JHB and 3L, are from the Caribbean Sea and hence constitute Atlantic species. JHB was collected from Hector’s Bay, Jamaica, in August 1996, and it produces hectoramide, hectochlorins A–D, and jamaicamides A–F. The 3L strain was collected from Curaçao in December 1993, and it produces barbamide, dechlorobarbamide, carmabins A and B, curacins A–C, and curazole. These two Atlantic strains have a similar morphology to PAL, with the exception that 3L has an overall red coloration caused by a larger relative proportion of the pigment phycoerythrin. As recently reviewed by Kleigrewe et al. (8), the compounds cited above are produced via enzymes encoded by unique biosynthetic genes, some of which are almost exclusive to filamentous marine tropical cyanobacteria.
Moreover, it is interesting to observe that some of the unique structural features in Moorea NPs (e.g., terminal olefins, t-butyl groups, gem-dichloro groups) are shared among different cyanobacterial metabolites that have different structural backbones. This suggests the likelihood of combinatorial repurposing of these genetic elements during the evolution of their pathways. Given the divergent geographical locations of their collection, differences in morphology, and variations in NP chemistry, a comparative genomic study of these four strains was undertaken.

Fig. 1.

Geographical location and microscopy images (using 40× magnification) of the four investigated Moorea strains.

The Use of Hybrid Assembly and Long-Reads Scaffolding to Obtain a Complete Genome of Tropical Filamentous Cyanobacterium.

The genus Moorea currently lacks a reliable reference genome, which would be invaluable for the relative placement of fragmented genomic data from sequencing projects of other Moorea strains. Therefore, to obtain a high-quality genome sequence, DNA from a nonaxenic laboratory culture of Moorea producens PAL was sequenced on two different platforms, Illumina MiSeq and PacBio. The short and long reads were assembled together (described as “hybrid assembly”) using standard settings of SPAdes 3.5 (14), yielding 47 linear contigs larger than 500 bp along with one circular contig of 35.5 kb (a candidate cyanobacterial plasmid). Hybrid assembly has previously been used to improve overall draft genome quality; however, in this case, the assembly was still fragmented because large repeated regions remained unresolved (15). To resolve these regions and close the genome, we developed an approach that involved trimming the repetitive edges from the assembled contigs (which tend to have assembly mistakes) and then submitting these trimmed contigs to SSPACE-LongReads scaffolding with the standard settings (16). Fourteen of the contigs assembled into a single circular scaffold of 9.67 Mb, and gaps were closed, again using the long reads. The minimum coverage was 98-fold, and together with the 35.5-kb circular plasmid, this scaffold constitutes the complete M. producens PAL genome, a complete genome of a tropical filamentous marine cyanobacterium (Table S1). To ensure that no cyanobacterial contigs were left out of the assembly, especially because the sequenced culture was nonaxenic, we performed a binning procedure using multiple features (GC content, coverage, phylogenetic identification of conserved genes, tetranucleotide fingerprint). This analysis confirmed that the 15 contigs from the PAL genome (14 comprising the circular chromosome and 1 for the circular plasmid) were the only cyanobacterial contigs in the sample (confirming that the culture was monocyanobacterial).
Moreover, the binning procedure identified a fully assembled large contig of 3.63 Mb that represents a draft genome of a Hyphomonas sp. strain “Mor2” (GenBank: CP017718), an uncultured α-proteobacterium associated with M. producens PAL.
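Two of the sequence-composition features named for the binning step, GC content and the tetranucleotide fingerprint, can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function names and the toy contig are hypothetical, and ambiguous bases are simply ignored as a simplifying assumption.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def tetranucleotide_profile(seq: str) -> dict:
    """Relative frequency of each of the 256 tetranucleotides.

    Overlapping 4-base windows are counted; windows containing
    anything other than A, C, G, or T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Hypothetical toy contig, for illustration only
toy_contig = "ATGCGCGATATTTGCGC"
print(round(gc_content(toy_contig), 3))
profile = tetranucleotide_profile(toy_contig)
print(len(profile))
```

Contigs from the same organism tend to have similar values for both features, which is why such profiles are commonly combined with coverage when separating a target genome from associated bacteria.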

Table S1.

Genomic features of Moorea spp. genomes (one complete and three drafts)

| Feature | M. producens PAL | M. producens JHB | M. bouillonii PNG | M. producens 3L |
|---|---|---|---|---|
| Main scaffold size (chromosome), Mb | 9.67 | 9.35 | 8.23 | 8.15 |
| Total genome size, Mb | 9.71 | 9.38 | 8.32 | 8.37 |
| Contigs that constitute the main scaffold (chromosome) | 1 | 205 | 291 | 204 |
| Unmapped scaffolds | 0 | 0 | 12 (0.09 Mb) | 78 (0.19 Mb) |
| Plasmid scaffolds | 1 (circular) | 2 (linear contigs) | 0 | 0 |
| G+C content, % | 43.52 | 43.67 | 43.63 | 43.68 |
| N50 of scaffolds (besides plasmids) | NA | NA | 8,262,658 | 8,171,464 |
| tRNA genes | 60 | 54 | 56 | 56 |
| rRNA genes (5S, 16S, 23S) | 6 | 3 | 6 | 7 |
| Total genes | 7,571 | 7,517 | 6,982 | 7,080 |
| Functions assigned, % | 62.17 | 62.22 | 62.33 | 61.5 |
| Chromosomal GI* | 27 | 30 | 20 | 24 |
| Number of BGCs† | 44 | 44 | 31 | 33 |
| Genomic % of BGC | 19.89 | 21.96 | 14.96 | 14.99 |

Statistics obtained from JGI/IMG annotation, unless marked with an asterisk (*) for statistics from IslandViewer3 or a dagger (†) for statistics from antiSMASH. BGCs, biosynthetic gene clusters; GI, genomic islands.

Possession of this reference genome for M. producens PAL enabled a substantial improvement in the assemblies of several other Moorea genomes via standard referencing procedures (Supporting Information) (17, 18). In the case of M. producens JHB, this reference assembly procedure resulted in a linear chromosomal scaffold of 9.6 Mb consisting of 205 contigs with ∼26,000 Ns that connect the contigs, along with two small plasmid scaffolds of 9.5 and 2 kb. The final draft genome of M. bouillonii PNG consisted of a linear chromosomal scaffold of 8.23 Mb (291 contigs and ∼32,000 Ns) and 12 unmapped scaffolds from 1.6 to 16.7 kb. The M. producens 3L final draft genome consisted of a linear chromosomal scaffold of 8.15 Mb (205 contigs and ∼20,000 Ns) and 78 unmapped scaffolds from 0.5 to 9.4 kb. Additional features of these four genomes are presented in Table S1.

The completeness of the four genomes was estimated by the presence and absence of ubiquitous cyanobacterial housekeeping genes [i.e., genes present in single copy in nearly all finished cyanobacterial genomes from the Joint Genome Institute (JGI)/Integrated Microbial Genomes (IMG) database (total of 107 genomes, Dataset S1, worksheet 1)]. Our reference genome, M. producens PAL, contained all 195 housekeeping genes, reinforcing its completeness. The other three draft genomes were compared with the same 195 single-copy gene dataset, which revealed that the assemblies of 3L, PNG, and JHB contained 98.97, 98.46, and 99.49% of these genes, respectively (missing genes are listed in Dataset S1, worksheet 2). These percentages are close to that of the reference genome and thus indicative of their relative completeness and the excellent quality of their assembly. Other parameters from Table S1, such as GC content, number of genes, and percentage of annotated genes, are consistent with other cyanobacterial genomes (19).
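The completeness percentages above follow directly from the 195-gene marker set. The short sketch below reproduces the arithmetic; the missing-gene counts (2, 3, and 1) are not stated in the text and are inferred here from the reported percentages, and the function name is hypothetical.

```python
TOTAL_MARKERS = 195  # single-copy housekeeping genes used as the marker set


def completeness(found: int, total: int = TOTAL_MARKERS) -> float:
    """Percentage of marker genes present, rounded to two decimals."""
    return round(100 * found / total, 2)


# Missing-gene counts inferred from the reported percentages (assumption)
missing = {"PAL": 0, "3L": 2, "PNG": 3, "JHB": 1}

for strain, n_missing in missing.items():
    print(strain, completeness(TOTAL_MARKERS - n_missing))
```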

Genome Comparison Among Moorea Strains Reveals Significant Synteny.

Given the wide geographical range from which the four Moorea strains were obtained, spanning some 16,000 km and existing in two distinct oceans, one could expect that they might show considerable sequence divergence. However, a precedent set from the genus Salinispora indicates that genomic conservation is in some cases observed for geographically divergent species (20). The four genomes investigated here were found to be remarkably similar with a very high average nucleotide identity (minimum of 94.6%), consistent with previously reported 16S rRNA gene identities of more than 99% (7). This is visualized as a circular map that compares the reference and draft genomes (Fig. S1) and bar graphs that depict the number and the percent identities between homologous genes in the different genomes (Fig. 2A). In Fig. S1, the high nucleotide identity between the Moorea genomes indicates that the reference assembly approach was a good solution for improving the quality of these three draft genomes. This high nucleotide identity translates to a high amino acid similarity, confirming their close evolutionary relationship (Fig. 2A). It is remarkable that M. producens PAL has higher similarity to M. bouillonii PNG than to other M. producens strains (also observed in the phylogenomic tree; Fig. 3), suggesting that it may require reclassification at the species level. These phylogenetic relationships may reflect the degree of separation between Pacific (PAL and PNG) and Atlantic strains (3L and JHB); however, a larger genome dataset will be required to substantiate this hypothesis. Last, the MUMmer plots in Fig. S2 indicate that these Moorea genomes are also highly syntenic with one another (similar genomic regions are present in the same order) yet are very distinct from the genome of Microcoleus sp. PCC 7113, the closest sequenced relative to Moorea.

Fig. S1.

Circular map showing three linear draft genomes (PNG, JHB, and 3L) aligned to the reference PAL circular chromosome. Each of the three outer rings represents a main scaffold/chromosome, with the color code representing percent nucleotide identity. The fourth ring represents the G+C content of the reference.

Fig. 2.

(A) A histogram of percent amino acid identity for all shared homologous genes with the PAL genome (bidirectional best BLAST hit, minimum ID (identity) of 50%). (B) Venn diagram for the shared homologous genes and strain-specific genes among the four Moorea strains. AA, amino acid.

Fig. 3.

(A) Phylogenomic analyses of completed cyanobacterial genomes using 29 conserved genes from Calteau et al. (19). Branches are colored according to cyanobacterial subsections (except for PCC 7418 and PCC 8305, which are not yet classified). All bootstrap values are higher than 85, except those marked by a circle (minimum bootstrap value is 52). (B) The number of biosynthetic gene clusters as deduced by antiSMASH analysis and colored by antiSMASH NP categories. For branches with more than one genome (triangular tips), the number of BGCs corresponds to the most prolific genome.

Fig. S2.

MUMmer plots comparing colinearity of Moorea producens PAL to (A) Moorea bouillonii PNG; (B) Moorea producens JHB; (C) Moorea producens 3L; and (D) Microcoleus sp. PCC 7113, the next closest phylogenomic relative.

These four Moorea genomes share 5,944 homologous genes as identified by BLAST analysis (Fig. 2B). Therefore, only 8–13.5% of the total genes per genome are strain specific. Unfortunately, the great majority of the strain-specific genes lack detailed annotation (e.g., hypothetical proteins). On average, the largest numbers of annotated orthologous genes (OG) belong to the categories “R: General function prediction only” (13%), “M: Cell wall biogenesis” (9%), “T: Signal transduction mechanisms” (7%), “E: Amino acid transport and metabolism” (7%), and “X: Mobilome” (7%). As expected from the high synteny and average nucleotide identity, the gene counts in most clusters of orthologous groups (COG) categories are remarkably similar across all four genomes (Table S2). Moreover, most of these categories possess a very similar OG content among the strains, represented by the normalized D-rank. When the D-rank is close to zero, the genes in the category have higher similarity to the homologs in the reference genome. In the categories related to primary metabolism, all four strains are nearly identical. All are annotated as photosynthetic (atmospheric carbon dioxide as primary carbon source), nondiazotrophic (absence of nitrogenase genes), capable of the biosynthesis of all proteinogenic amino acids (except for tyrosine and phenylalanine), and possessing the biosynthetic genes for important cofactors including CoA, cobalamin, biotin, flavin, NAD, heme, and thiamine. Additionally, the number of specialized sigma factors in the genomes of these four filamentous marine cyanobacterial strains, as previously discussed in Jones et al. (21), is virtually the same (five specialized sigma factors per genome). Despite the significant similarity between the four genomes, some COG categories were indicative of a number of subtle genetic differences (see comparison of COG categories in Supporting Information).
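The shared homologs in Fig. 2 are defined by bidirectional best BLAST hits with a minimum identity of 50%. The following is a minimal sketch of that criterion, assuming hit tables have already been parsed into tuples; the tuple layout, function names, and gene identifiers are hypothetical and do not reproduce the authors' scripts.

```python
def best_hits(hits, min_identity=50.0):
    """Map each query to its highest-bitscore subject.

    hits: iterable of (query, subject, percent_identity, bitscore).
    Hits below min_identity are discarded before ranking.
    """
    best = {}
    for query, subject, identity, bitscore in hits:
        if identity < min_identity:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=50.0):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, min_identity)
    reverse = best_hits(b_vs_a, min_identity)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy hit tables (gene names are placeholders)
pal_vs_png = [
    ("palA", "pngA", 98.0, 500),
    ("palA", "pngB", 60.0, 120),
    ("palB", "pngB", 45.0, 80),   # below the 50% identity cutoff
    ("palC", "pngC", 91.0, 300),
]
png_vs_pal = [
    ("pngA", "palA", 98.0, 500),
    ("pngB", "palA", 60.0, 120),
    ("pngC", "palC", 91.0, 300),
]
print(reciprocal_best_hits(pal_vs_png, png_vs_pal))
```

Requiring agreement in both directions removes one-sided matches such as pngB to palA in the toy data, where the best hit is not returned by the partner gene.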

Table S2.

COGs comparison by category

| Category | Category description | PAL (reference) | D-rank | PNG | D-rank | 3L | D-rank | JHB |
|---|---|---|---|---|---|---|---|---|
| B | Chromatin structure and dynamics | 2 | 0* | 2 | 0* | 2 | 0* | 2 |
| C | Energy production and conversion | 157 | 0.03 | 155 | 0.05 | 151 | 0.02 | 171 |
| D | Cell cycle control, cell division, chromosome partitioning | 43 | 0.01 | 40 | 0.05 | 35 | 0.03 | 39 |
| E | Amino acid transport and metabolism | 245 | 0.05 | 237 | 0.03 | 236 | 0.02 | 252 |
| F | Nucleotide transport and metabolism | 69 | 0.01 | 62 | 0.01 | 63 | 0.03 | 61 |
| G | Carbohydrate transport and metabolism | 141 | 0.05 | 155 | 0.04 | 137 | 0.04 | 153 |
| H | Coenzyme transport and metabolism | 210 | 0.03 | 206 | 0.01 | 194 | 0.01 | 210 |
| I | Lipid transport and metabolism | 115 | 0.01 | 96 | 0.05 | 92 | 0.08 | 123 |
| J | Translation, ribosomal structure, and biogenesis | 194 | 0* | 190 | 0* | 188 | 0* | 194 |
| K | Transcription | 114 | 0.03 | 112 | 0.09 | 101 | 0.08 | 113 |
| L | Replication, recombination, and repair | 104 | 0.09 | 107 | 0* | 91 | 0.03 | 91 |
| M | Cell wall/membrane/envelope biogenesis | 296 | 0.09 | 285 | 0.23 | 293 | 0.16 | 326 |
| N | Cell motility | 69 | 0.09 | 63 | 0.12 | 59 | 0.07 | 69 |
| O | Posttranslational modification, protein turnover, chaperones | 163 | 0.04 | 149 | 0.01 | 145 | 0.04 | 169 |
| P | Inorganic ion transport and metabolism | 153 | 0.04 | 148 | 0.01 | 141 | 0.02 | 153 |
| Q | Secondary metabolites biosynthesis, transport, and catabolism | 171 | 0.58 | 118 | 0.57 | 112 | 0.08 | 161 |
| R | General function prediction only | 480 | 0.02 | 425 | 0.04 | 409 | 0.03 | 465 |
| S | Function unknown | 211 | 0.01 | 203 | 0.01 | 200 | 0.04 | 226 |
| T | Signal transduction mechanisms | 266 | 0.21 | 256 | 0.05 | 237 | 0.03 | 269 |
| U | Intracellular trafficking, secretion, and vesicular transport | 62 | 0.27 | 39 | 0.31 | 42 | 0.2 | 49 |
| V | Defense mechanisms | 135 | 0.03 | 125 | 0.16 | 115 | 0.08 | 135 |
| W | Extracellular structures | 17 | 0* | 16 | 0* | 19 | 0* | 18 |
| X | Mobilome: prophages, transposons | 294 | 0.73 | 183 | 0.49 | 226 | 0.48 | 235 |
| B–X | Total | 3,711 | | 3,372 | | 3,288 | | 3,684 |

Data in boldface refer to the highest D-ranks, highlighting differences between categories from draft genomes compared with the reference (PAL).

* Indicates D-ranks with value of P > 0.05 (not statistically significant).

Data represent the most common categories (in average percentage of genes).

The Evolved Loss of Nitrogen Fixation in the Genus Moorea?

The gene cluster for heterocyst envelope glycolipid biosynthesis (hgl) has been identified and characterized in the filamentous diazotrophic cyanobacteria Anabaena sp. PCC 7120 and Nostoc punctiforme ATCC 29133 (22, 23). These genes are commonly found in diazotrophic cyanobacteria from subsection VIII but are lacking in the other subsections. BLAST analysis of 267 cyanobacterial genomes from JGI/IMG confirmed the absence of these four core genes in subsections I–VII. As expected, M. producens 3L, a filamentous non–heterocyst-forming cyanobacterium from subsection VI, does not possess the hgl cluster. Surprisingly, the other three Moorea genomes described herein (PAL, PNG, and JHB) contain the complete hgl cluster. As depicted in Fig. S3, it appears that M. producens 3L recently lost the hgl cluster. Homologs of the genes upstream and downstream of the hgl cluster in PNG, JHB, and PAL are adjacent to one another in the 3L genome. Two new genes at this position that encode hypothetical proteins have apparently replaced the hgl cluster in the 3L genome (red box in Fig. S3). Despite the presence of the hgl cluster, filaments cultured in nitrogen-deficient medium (up to 8 d, at which time the cells start to die rapidly) did not develop heterocysts, nor did they visibly produce heterocyst glycolipids (e.g., they were not reactive to Alcian blue staining, a dye used for acidic polysaccharides such as heterocyst glycolipids) (24). The only regulatory homolog for heterocyst development located in Moorea was hetR (∼70% nucleotide identity, located about 1.7–2.2 Mb from the hgl cluster); the ntcA and patS genes were absent. An additional four predicted regulatory elements in the immediate vicinity of the hgl core (Fig. S3) suggest that its regulation may be different and perhaps more complex than previously reported in Nostocales. Future transcriptomic experiments may provide insights into the regulation of this cluster.

Fig. S3.

Schematic of synteny within the vicinity of hgl core genes (same colors represent homologous genes); red boxes and arrows represent the displacement of those genes by mobile elements in M. producens 3L genome. Minimum identity is 92%, and all coverages are 100%; hgl core represents 19–21 genes. CST, chemotaxis sensory transducer; TCP, two-component system; TPR, tetratricopeptide repeat. Gene product predictions were retrieved from JGI expert-reviewed annotation. According to antiSMASH, genes in red are response regulators.

This study reports a cyanobacterium from outside subsection VIII that possesses the hgl cluster. To the best of our knowledge, the only other cyanobacterium capable of forming heterocyst glycolipids and not fixing nitrogen (the nif cluster is absent) is Raphidiopsis brookii D9 (Nostocales, subsection VIII) (24). Here, we propose an analogous situation in which retention of the hgl cluster (except in 3L) and selective loss of the nif cluster have occurred. However, because there are no close relatives of Moorea that possess nif genes, we are unable to draw specific conclusions regarding the position or timing of this loss. Interestingly, several unclustered genes are present in these four genomes with predicted functions as “global nitrogen regulator,” “nitrogen fixation proteins of unknown function,” and “nitrogen regulatory protein P-II 1”; nonetheless, these genes have also been reported in non–heterocyst-forming and nondiazotrophic cyanobacteria (25). The fact that Moorea strains survive up to 8 d under nitrogen deprivation can likely be attributed to the presence of cyanophycin, a multi-L-arginyl-poly-L-aspartate nitrogen storage reserve material typical of cyanobacteria (26). Of note, our genomic analysis revealed that each of the Moorea genomes contained one cyanophycin synthetase and at least one cyanophycinase gene.

Uncovering the Metabolic Potential of the Genus Moorea.

A phylogenomic analysis (Fig. 3A) confirmed that these four Moorea strains are monophyletic, supporting the findings of high genomic synteny. However, based on phylogeny (Fig. 3A) and the occurrence of the hgl cluster, this genus may be misplaced within subsection VI of the cyanobacteria. Another highly prominent feature that distinguishes Moorea from other cyanobacteria (Fig. 3B) is the large number of biosynthetic gene clusters (BGCs). The average number of BGCs in this clade is dramatically larger than in any other radiation of cyanobacteria. Although Moorea harbors an average of 38 BGCs per genome, some of its closest relatives (e.g., Microcoleus sp. PCC 7113, Dactylococcopsis salina PCC 8305, Gloeocapsa sp. PCC 7428) contain fewer than one-half this number. As such, Moorea spp. are “superproducers” among cyanobacteria, and on average 18% of their genome is dedicated to secondary metabolism (Table S1), nearly four times the average of other cyanobacteria (1). In comparison with all other bacterial genomes (Fig. 4), Moorea are among the most prolific producers of NPs, with only some actinobacterial strains being more endowed (27). The discrepancy between our analyses and that performed previously by Jones et al. (21) on the draft genome of M. producens 3L arises because the BGC-mining tool antiSMASH (28) was not yet available at the time of the earlier analysis. In the previous study, BGCs in the 3L genome were identified primarily by BLAST searching for NRPS and PKS genes, and this resulted in an underestimation of the resident biosynthetic pathways.
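The two genus-level averages quoted above (38 BGCs per genome and about 18% of the genome) can be recomputed from the per-strain rows of Table S1. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Values copied from Table S1
bgc_counts = {"PAL": 44, "JHB": 44, "PNG": 31, "3L": 33}
bgc_genome_percent = {"PAL": 19.89, "JHB": 21.96, "PNG": 14.96, "3L": 14.99}

mean_bgcs = mean(bgc_counts.values())
mean_percent = mean(bgc_genome_percent.values())

print(f"Mean BGCs per genome: {mean_bgcs}")
print(f"Mean genome share devoted to BGCs: {mean_percent:.2f}%")
```

The mean share of 17.95% rounds to the 18% cited in the text.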

Fig. 4.

Distribution of bacterial genomes from the JGI/IMG database in terms of genomic percentage dedicated to secondary metabolism (NP biosynthesis). Several prolific NP producers are identified in the figure, including Streptomyces coelicolor A3(2), Streptomyces bingchenggensis BCW-1, and two Salinispora strains (highest and lowest genomic percentages from this genus). The total number of genomes interrogated was 40,532.

To investigate the novelty of these numerous Moorea BGCs, we decided to group these BGCs into families according to sequence homology at the gene level. This “gene cluster networking” procedure has been applied to explore the biosynthetic capacity of 830 actinobacterial genomes (29). Because the code to the aforementioned networking approach is not publicly available, we adapted our own strategy for the discovery of gene cluster families (as described in Supporting Information). We refer to this workflow as BioCompass, found at biocompass.net/. The output can be displayed as a network diagram using Cytoscape, version 3.2.1 (Fig. 5). BioCompass predictions were verified to match well-known previously characterized pathways. For uncharacterized pathways, all BioCompass predictions were manually examined to confirm consistency between the multigene alignments within members of the same family. Nodes in the network signify gene clusters, whereas edges represent shared subclusters or subunits of the gene cluster. Subclusters indicate groups of adjacent and/or nonadjacent genes that share synteny and predicted function. Self-loops represent unique subclusters (not shared with any other pathway).
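The network semantics described above (nodes are gene clusters, edges are shared subclusters, self-loops are subclusters found in no other cluster) can be sketched with plain Python sets. This is not the BioCompass implementation; the cluster names, subcluster labels, and the `build_network` function are hypothetical and serve only to show how the diagram is defined.

```python
from itertools import combinations

# Hypothetical clusters: node name -> set of subcluster labels
clusters = {
    "PAL_1": {"halogenase_cassette", "pks_core"},
    "JHB_7": {"halogenase_cassette", "nrps_core"},
    "PNG_3": {"nrps_core"},
    "PAL_9": {"unique_ripp"},
}


def build_network(clusters):
    """Return {(node_a, node_b): shared subclusters}.

    An edge joins two clusters that share at least one subcluster.
    A self-loop (node, node) holds subclusters unique to that cluster.
    """
    edges = {}
    for a, b in combinations(sorted(clusters), 2):
        shared = clusters[a] & clusters[b]
        if shared:
            edges[(a, b)] = shared
    for name, subclusters in clusters.items():
        others = set().union(
            *(subs for other, subs in clusters.items() if other != name)
        )
        unique = subclusters - others
        if unique:
            edges[(name, name)] = unique
    return edges


for edge, labels in sorted(build_network(clusters).items()):
    print(edge, sorted(labels))
```

An edge table of this form could be exported as a two-column node list plus an attribute column for display in a network viewer such as Cytoscape.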

Fig. 5.

Gene cluster networking of PAL versus gene clusters from PNG, 3L, JHB, the MIBiG database, completed cyanobacterial genomes from JGI/IMG, and their closest homologs from the National Center for Biotechnology Information (NCBI) database (according to antiSMASH results). A represents only orphan gene clusters from the PAL genome. B contains known and cryptic gene clusters from cyanobacteria, and C contains only Moorea-specific cryptic gene clusters. Nodes represent clusters, and edges represent subclusters. Node size is proportional to gene cluster size. Incomplete gene clusters are sequences that contain undefined nucleotides and therefore require further validation. Known gene clusters are named in red. For more information regarding Moorea clusters, see Dataset S2, worksheet 2 (numbers on nodes refer to tabulated data in Dataset S2).

As depicted by the gene cluster network (Fig. 5 and Table 1), the great majority of gene clusters from PAL (40 out of 44 clusters, around 91%) match only cryptic gene clusters in other organisms (gene clusters not assigned to known NPs), suggesting that they likely encode the biosynthesis of unique NPs. Interestingly, 26 of the PAL clusters (about 59%, Fig. 5C) only have homology to other Moorea pathways, confirming previous chemical investigations that indicated they possess a unique secondary metabolite profile compared with other bacteria (8). Moreover, these findings suggest that M. producens PAL is not only a source of unique NPs but that these NPs will likely be composed of unique chemical backbones. Finally, given the level of synteny between Moorea genomes, it is intriguing to observe a significant number of orphan gene clusters (gene clusters only found in PAL, a total of seven clusters, ∼16%) (Fig. 5A).

Table 1.

Summary table listing number of known (K), “cryptic” (C), and “orphan” (O) NP pathways according to Fig. 5

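| Strain | PKS K | PKS C* | PKS O† | NRPS K | NRPS C | NRPS O | PKS-NRPS K | PKS-NRPS C | PKS-NRPS O | RiPP K | RiPP C | RiPP O | Terpene K | Terpene C | Terpene O | Others K | Others C | Others O | Sum K | Sum C | Sum O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PAL | – | 5 | 1 | – | 7 | 1 | 2‡ | 5 | 1 | – | 10 | 4 | – | 4 | – | – | 4 | – | 2 | 35 | 7 |
| JHB | – | 3 | – | – | 11 | – | 2§ | 5 | – | – | 12 | – | – | 5 | – | – | 6 | – | 2 | 42 | – |
| PNG | – | 3 | – | – | 8 | – | 3¶ | 1 | – | – | 10 | 1 | – | 3 | – | – | 2 | – | 3 | 27 | 1 |
| 3L | – | 4 | – | 2# | 4 | – | 1‖ | 2 | – | – | 9 | 1 | – | 4 | – | – | 6 | – | 3 | 29 | 1 |
| Subtotal | – | 15 | 1 | 2 | 30 | 1 | 8 | 13 | 1 | – | 41 | 6 | – | 16 | – | – | 18 | – | 10 | 133 | 9 |

Total per category: PKS, 16; NRPS, 33; PKS-NRPS, 22; RiPP, 47; Terpene, 16; Others, 18; all categories, 152.

Pathways are divided by biosynthetic category. Zeroes were replaced with dashes to improve data visualization. NRPS, nonribosomal peptide synthetase; PKS, polyketide synthase; RiPP, ribosomally synthesized and posttranslationally modified peptides.

* Cryptic: A gene cluster not assigned to any known NP.

† Orphan: A cryptic gene cluster only found in one strain (no matches to any sequence in the NCBI database).

‡ Palmyramide and curacin.

§ Hectochlorin and jamaicamide.

¶ Lyngbyabellin, columbamide, and apratoxin.

# Carmabin and barbamide.

‖ Curacin.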

As previously reported, accurate prediction of BGC borders is a common challenge for the field (27, 30). This issue can have an effect on the estimated percentage of the genome dedicated to NP biosynthesis. However, the homology alignment feature of BioCompass allowed us to refine the BGC borders by removing unshared genes of unknown function, excluding from the analysis predicted proteins most likely representing genes adjacent rather than integral to BGCs. This more conservative approach to estimating cluster sizes had only a small effect on the percentage of the M. producens PAL genome allocated to secondary metabolism, reducing it from 19.89% (JGI) to 18.02% (Dataset S2, worksheet 2), confirming the validity of the relationships shown in Fig. 4. Further analyses of various features of Moorea’s BGCs (Dataset S2, worksheet 2), such as G+C content, few mobile elements within clusters, and encoding of relatively rare structural moieties (8), suggest that these strains have vertically acquired these biosynthetic pathways, consistent with previous reports for cyanobacteria (19). However, a larger sample size and better-characterized pathway products are needed to fully understand the evolution and distribution of Moorea’s NP pathways.
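To illustrate how border refinement changes the estimated share of a genome devoted to BGCs, the following minimal sketch sums cluster coordinates (counting overlaps once) and divides by genome length. The coordinates and the 1,000-bp genome are hypothetical toy values, not data from this study, and the function names are ours.

```python
def merged_length(intervals):
    """Bases covered by 1-based inclusive (start, end) intervals,
    counting overlapping regions only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def percent_of_genome(intervals, genome_length):
    """Percentage of the genome covered by the given intervals."""
    return 100.0 * merged_length(intervals) / genome_length


# Hypothetical cluster borders on a toy 1,000-bp genome.
predicted = [(1, 100), (90, 150), (400, 449)]
# Hypothetical borders after trimming unshared flanking genes.
refined = [(10, 100), (90, 140), (400, 449)]

print(percent_of_genome(predicted, 1000))
print(percent_of_genome(refined, 1000))
```

Merging before summing avoids double counting when two predicted clusters overlap, which would otherwise inflate the percentage.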

In summary, analysis of the genetic constitution and relationship of Moorea to other cyanobacteria suggests that the genus is distinctive among known cyanobacteria, especially in its exceptional capacity for production of secondary metabolites. Development of a reference genome for M. producens PAL has increased understanding of the genomic capacities of three related strains of filamentous cyanobacteria, providing fresh insights into this important source of NPs. Using gene cluster networking, we were able to demonstrate that many of the Moorea BGCs are rare among bacterial genomes, and suggest future directions for productive genome-guided isolation efforts of unique NPs from this genus.

Materials and Methods

See SI Materials and Methods for details of sampling, culturing methods, DNA extraction, sequencing, assembly, genome comparison, and other bioinformatic analyses.

SI Materials and Methods

Sampling, Culturing, Microscopy, and Previous Sequencing Efforts.

PAL was collected at ∼1 m from a remote island in the Northern Pacific Ocean, Palmyra Atoll, in August 2008, as previously described by Taniguchi et al. (31). Its DNA was previously extracted (using the protocol below, except for the SDS pretreatment step) and submitted for MiSeq Illumina sequencing, using a 300-bp paired-end library. JHB was collected from Jamaica, at 2-m depth in Hector Bay, in August 1996, as described by Marquez et al. (32). Its DNA was extracted and submitted for HiSeq Illumina sequencing, using a 100-bp paired-end library. PNG was collected from Pigeon Island at 10-m depth in Papua New Guinea in May 2005, as described by Grindberg et al. (33). Its DNA was extracted and sequenced in the same way as for JHB. Last, 3L was collected near Carmabi Station at ∼2-m depth from Curaçao in December 1993, and its DNA was extracted and sequenced by Sanger technology, as described by Jones et al. (21). All four draft genomes were highly fragmented; for that reason, only the 3L genome was published (accession no. GCA_000211815.1), making it the only Moorea genome published to date. All of these strains were established as unicyanobacterial cultures using standard microbiological isolation methods (plating, dilution, and microscopy analysis), and they have been maintained as live cultures in saltwater BG-11 media (34) since their isolation. Experiments with nitrogen-depleted media used the same BG-11 formulation, except that ferric ammonium citrate was replaced with ferric citrate and NaNO3 was replaced with NaCl, both in equimolar concentrations. Filaments from nitrogen-depleted cultures were stained with 100 µL of Alcian blue [1% in 3% (vol/vol) acetic acid] for 2 s, washed with sterile distilled water, and then observed under light microscopy using an Olympus IX51 epifluorescence microscope and Olympus U-CMAD3 camera. Control filaments were grown in regular saltwater BG-11 media and also stained using Alcian blue.

DNA Extraction, PacBio Sequencing of M. producens PAL, and De Novo Assembly.

To reduce the amount of heterotrophic contaminant bacteria, a pretreatment was performed with 10% (vol/vol) SDS, followed by BG-11 rinsing for SDS removal. DNA extraction was performed using a “QIAGEN Bacterial Genomic DNA Extraction Kit” optimized for cyanobacteria by carefully grinding filaments using mortar and pestle under liquid nitrogen before extracting using the standard kit protocol. The quality of the genomic DNA (gDNA) was evaluated by NanoDrop, 1% agarose gel electrophoresis, and Genomic DNA Screen Tape analysis. After quality control, the gDNA was sequenced using a PacBio RS II platform (Pacific Biosciences) at the Institute for Genomic Medicine, University of California, San Diego, using a 10-kb fragment library and two Smart Cells to obtain high coverage. Both short and long reads were assembled together into contigs using SPAdes, version 3.5, using default settings with automatic coverage cutoff. Scaffolds were generated by using SSPACE-LongReads, with default settings, and gaps were closed using long reads by Geneious 8.1, with a minimum read coverage of 98-fold. The binning pipeline was adapted from Albertsen et al. (35), using coverage versus GC content to produce bins, indicating that all scaffolded contigs most likely belong to the same taxon of cyanobacteria. In the pipeline described by Albertsen et al., the minimum length threshold for the contigs was 500 bp, and other parameters evaluated were the phylogenetic designations of 107 single-copy genes conserved in bacteria and tetranucleotide fingerprint.
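The two quantities plotted in coverage-versus-GC binning can be sketched as follows. This is only an illustration of the idea, not the Albertsen et al. pipeline: the contig names, sequences, and coverage values are hypothetical, and the only parameter taken from the text is the 500-bp minimum contig length.

```python
def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def binning_points(contigs, min_length=500):
    """Return (name, length, GC fraction, coverage) for each contig that
    meets the minimum length; these are the points of a coverage-vs-GC plot.

    contigs maps a contig name to (sequence, mean read coverage).
    """
    points = []
    for name, (sequence, coverage) in contigs.items():
        if len(sequence) >= min_length:
            points.append((name, len(sequence), gc_content(sequence), coverage))
    return points


# Hypothetical contigs: (sequence, mean coverage).
toy_contigs = {
    "contig_1": ("ATGC" * 200, 120.0),  # 800 bp
    "contig_2": ("AATT" * 50, 15.0),    # 200 bp, below the length threshold
    "contig_3": ("GGCA" * 150, 8.0),    # 600 bp
}

for point in binning_points(toy_contigs):
    print(point)
```

Contigs from one organism tend to cluster together in such a plot, whereas contaminants with a different GC content or abundance fall elsewhere.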

Reference Assembly of Draft Genomes.

Moorea producens JHB and Moorea bouillonii PNG reads were both assembled by SPAdes 3.5, using default settings with automatic coverage cutoff, generating 2,435 and 908 contigs, respectively. These contigs were scaffolded using SSPACE-ShortReads with a minimal link number of 20 reads, to ensure more accurate scaffolding. The complete genome of M. producens PAL was used as a reference template to order the contigs of the other strains into scaffolds with the software CONTIGuator. The scaffolds were submitted to the tool GapFiller, to reduce the number of Ns and gaps. Scaffolds not included in the reference assembly step (because of low nucleotide similarity to the reference template) were binned and named either unmapped scaffolds or plasmid scaffolds, depending on gene content annotations and best BLAST matches to the NCBI/NR database. The taxonomic origins of binned contigs were verified using DarkHorse software, version 1.5 (36), to determine closest phylogenetic matches in GenBank nr for each individual gene in the contigs submitted. Moorea producens 3L was previously assembled and binned by Jones et al. (21) using similar methods; therefore, we proceeded straight to reference assembly. The genomes of PAL, 3L, JHB, and PNG have been deposited at DNA Data Bank of Japan/European Nucleotide Archive/GenBank under accession numbers GCA_001767235.1, GCA_000211815.1, MKZR00000000, and MKZS00000000, respectively.

Comparative Genomics.

The complete reference genome and the three draft genomes were submitted to the Joint Genome Institute (JGI)/Integrated Microbial Genomes (IMG) database expert review pipeline for annotation. Comparative genomic and statistical analyses were performed using the Genome Statistics (Table 1), Pairwise average nucleotide identity (Dataset S2, worksheet 1), Genome Gene Best Homologs (Fig. 2), and COG Homology and Abundance Profile (Table S1) tools from the JGI/IMG web server. Two genes were considered homologs if they shared at least 50% amino acid identity. Synteny plots (Fig. S3) were generated by MUMmer 3.0 (37), with a maximum gap of 500 bp and minimum cluster length of 100 bp between Moorea genomes. Circular maps (Fig. S1) were generated by BRIG software (38), using Moorea producens PAL as a reference chromosome. MultiGeneBLAST (39) was used to compare hgl core genes and vicinities among the four Moorea genomes (Fig. S3). Traditional BLASTN and BLASTP searches versus the NCBI/NR database were performed to investigate some specific genes (for example, hetR, ntcA, and patS).

Phylogenomics.

Phylogenomic analysis was performed using 29 different conserved genes reviewed in Calteau et al. (19), from all finished cyanobacterial genomes at JGI up to July 13, 2016 (so as to increase tree resolution) plus our four Moorea genomes, totaling 107 genomes. The genes were aligned by MUSCLE and the tree was built with Geneious Tree Builder, using the Jukes-Cantor genetic distance model and the Neighbor-Joining method, with 1,000 bootstrap repetitions and one outgroup genome (Chloroflexus aurantiacus J-10). The tree image was edited in FigTree 1.4.2 (tree.bio.ed.ac.uk/software/figtree/), with branches colored according to cyanobacterial subsections as previously reported in ref. 4.
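For readers unfamiliar with the Jukes-Cantor model, the sketch below shows the standard nucleotide form of the correction, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. It assumes a nucleotide alignment and is not the Geneious implementation; the function names and the example sequences are ours.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among aligned positions where both
    sequences carry an unambiguous base (gaps and Ns are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for nucleotides.

    The correction is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical aligned fragments differing at one of eight sites.
p = p_distance("ACGTACGT", "ACGTACGA")
print(p, jukes_cantor_distance(p))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining algorithm uses to build the tree.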

Biosynthetic Gene Cluster.

Biosynthetic gene clusters (BGCs) were identified using antiSMASH 3.0 (28), standard configurations, ClusterFinder option off. All BGCs from Moorea strains were manually verified, to reduce false positives. All 103 currently available finished cyanobacterial genomes were also submitted to antiSMASH, but not manually verified. The number of BGCs per genome was generated using a Python script to parse antiSMASH results and to incorporate counts into Fig. 3B (gene clusters containing only RiPP precursor peptides were excluded from the count). Of particular note, despite the use of antiSMASH to identify the BGCs in these cyanobacterial genomes (including our strains), we used the statistics for Fig. 4 (plot of genomic percentage devoted to BGCs in bacteria) directly from JGI/IMG database, using all bacterial genomes in the July 2016 version of the database. We acknowledge that antiSMASH represents a more accurate tool for gene cluster identification than ClusterFinder (30), the standard tool used by JGI/IMG; however, submitting all bacterial genomes to antiSMASH was not feasible for our analyses. Nevertheless, as observed in Moorea producens PAL, the use of antiSMASH, manual inspection, and border refinement of PAL’s BGCs did not significantly differ from JGI/IMG statistics.
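The per-genome counting step can be sketched as below. This is not the script used in the study. It assumes that each BGC appears in the antiSMASH GenBank output as a feature with the key `cluster` and a `/product` qualifier, which should be checked against the actual files, and it does not reproduce the exclusion of clusters containing only RiPP precursor peptides. The example record is hypothetical.

```python
from collections import Counter


def count_clusters(genbank_text, feature_key="cluster"):
    """Count features with the given key in GenBank-formatted text and
    tally their /product qualifiers."""
    total = 0
    by_product = Counter()
    inside = False
    for line in genbank_text.splitlines():
        if not line.startswith(" "):
            # Section headers such as FEATURES or ORIGIN end any feature.
            inside = False
        elif line.startswith(" " * 5) and line[5:6].strip():
            # Feature keys start in column 6 of the feature table.
            inside = line[5:21].strip() == feature_key
            if inside:
                total += 1
        elif inside:
            text = line.strip()
            if text.startswith('/product="'):
                by_product[text[len('/product="'):].rstrip('"')] += 1
    return total, by_product


# Hypothetical fragment of an annotated record.
EXAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..90000
                     /organism="hypothetical cyanobacterium"
     cluster         1000..42000
                     /product="nrps"
     CDS             1200..2500
                     /product="hypothetical protein"
     cluster         50000..61000
                     /product="terpene"
ORIGIN
"""

print(count_clusters(EXAMPLE))
```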

Gene Cluster Networking.

For the dereplication and initial analysis of biosynthetic gene clusters in Moorea spp. genomes, we developed a custom pipeline called Comparative Synteny Software for Biosynthetic Gene Clusters (BioCompass). Source code and detailed use instructions are provided at biocompass.net/ (along with the instructions to reproduce this exact analysis). Analogously to Doroghazi et al. (29), we grouped the BGCs into gene cluster families based on synteny and homology. A similarity matrix was used to divide each given BGC into subclusters based on synteny at best MultiGeneBLAST hits (obtained using antiSMASH 3.0) and the functional annotation of each gene in the queried cluster. This information was then incorporated into a query-specific database to search for the best matches for each subcluster. The newly created database included microbial BGCs identified by antiSMASH (downloaded from NCBI database, GenBank NR August 2016), the MiBIG repository (version 1.2), and BGCs from all finished JGI/IMG cyanobacterial genomes (same as included in phylogenomic analysis; Dataset S2, worksheet 1). Final similarity scores were calculated via MultiGeneBLAST for each subcluster, and the output was displayed as a network diagram using Cytoscape, version 3.2.1 (as shown in Fig. 5). The cutoff parameters used were 25% of matching genes per subcluster, with matching genes defined as having a minimum of 50% amino acid identity, 80% alignment coverage, a cumulative BLAST bit score of 1,000, and a MultiGeneBLAST score of 5.
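The cutoff parameters listed above can be expressed as a simple filter, sketched below. This is not the BioCompass source code. The threshold values come from the text, but how they combine is our assumption: a gene counts as matching when it meets both the identity and coverage cutoffs, the cumulative bit score is summed over matching genes only, and a subcluster match is kept only if all three subcluster-level cutoffs are met. The class and function names and the example hits are hypothetical.

```python
from dataclasses import dataclass

MIN_IDENTITY = 50.0               # percent amino acid identity
MIN_COVERAGE = 80.0               # percent alignment coverage
MIN_MATCHING_FRACTION = 0.25      # matching genes per subcluster
MIN_CUMULATIVE_BIT_SCORE = 1000.0
MIN_MULTIGENEBLAST_SCORE = 5.0


@dataclass
class GeneHit:
    identity: float
    coverage: float
    bit_score: float


def is_matching_gene(hit):
    """A gene matches when both the identity and coverage cutoffs are met."""
    return hit.identity >= MIN_IDENTITY and hit.coverage >= MIN_COVERAGE


def subcluster_passes(n_query_genes, hits, multigeneblast_score):
    """Decide whether a subcluster match would become an edge in the network."""
    matching = [hit for hit in hits if is_matching_gene(hit)]
    fraction = len(matching) / n_query_genes
    cumulative = sum(hit.bit_score for hit in matching)
    return (
        fraction >= MIN_MATCHING_FRACTION
        and cumulative >= MIN_CUMULATIVE_BIT_SCORE
        and multigeneblast_score >= MIN_MULTIGENEBLAST_SCORE
    )


# Hypothetical hits for an eight-gene subcluster.
hits = [
    GeneHit(62.0, 91.0, 450.0),
    GeneHit(55.0, 85.0, 380.0),
    GeneHit(71.0, 95.0, 300.0),
    GeneHit(48.0, 99.0, 500.0),  # fails the identity cutoff
]
print(subcluster_passes(8, hits, multigeneblast_score=6.0))
```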

SI Results and Discussion

COG Category Comparison.

The most obvious differences between orthologs in Moorea are found in categories M, Q, T, U, and X (normalized D-score equal to or greater than 0.2, marked in red in Table S1). For category M, although the D-rank between PAL and 3L is greater than 0.2, this difference is mainly due to the number of copies of an uncharacterized “glycosyltransferase involved in cell wall biosynthesis”; therefore, it was not investigated further.

Curiously, the last two categories mentioned above, category U and category X, present a similar pattern of a positive D-rank in all three drafts when compared with the reference. The major difference between all four strains is the number of copies of two specific gene products, named “filamentous hemagglutinin family N-terminal domain-containing protein” (fhaB homologs) and “transposases” (encoded by transposons and insertion sequence elements). Transposases are enzymes that promote site-specific recombination between two DNA sites and are often associated with genomic islands (GIs); hence they tend to provide genome plasticity, and their abundance tends to increase the rate of exchange of genetic material (40). Given this description of transposases and the similar pattern in categories U and X, one could hypothesize that a greater number of fhaB homologs inside GIs from M. producens PAL would be the cause of this trend. However, none of these homologs is located partially or completely inside PAL’s GIs; hence, we hypothesize that the difference in number of copies is due to arbitrary duplication events and does not reflect key biological differences among those four strains.

Next, by analyzing the genes in category T from the four Moorea genomes, we identified two gene products specific to Moorea producens, “Forkhead associated (FHA) domain, binds pSer, pThr, pTyr” and “Stress-induced morphogen (activity unknown)” (COG1716 and COG0271, respectively), as well as one gene exclusive to Moorea bouillonii, “DNA-binding response regulator, LytR/AlgR family” (COG3279). All of these genes were present on the main chromosome scaffold of each strain. FHA domains have many different roles in bacteria, including carbohydrate storage and pathogenic and symbiotic host–bacterium interactions (41). In cyanobacteria, not much is currently known regarding the function of FHA, and given the diversity of functions in other bacteria, we believe that further knockout and/or enzymology studies are required to draw conclusions about this gene function in M. producens. However, we hypothesize that this protein could be related to host–bacterium interactions between M. producens strains (hosts) and heterotrophic bacteria (species-specific associates), because these kinds of associations have been previously reported (13). Regarding the Stress-induced morphogen (activity unknown), it is even more difficult to hypothesize about its function, because protein similarity only indicates that this gene product belongs to the BolA-like protein superfamily, whose members are stress response proteins involved in the regulation of cell morphology. Cyanobacterial differentiation and morphology driven by environmental stimulus is a very complex subject (42). This protein could be an interesting target for additional exploration of how Moorea producens regulates its morphology according to environmental factors and how it differs from Moorea bouillonii. The last noteworthy gene from category T annotation is DNA-binding response regulator, LytR/AlgR family (with the closest homologs found in the Flavobacteriales), suggesting that it may have been horizontally acquired by M. bouillonii PNG. This gene is a response regulator commonly found associated with bacteriocin production in Lactobacillus spp. (43), but it is not found inside any of the BGCs in M. bouillonii. Therefore, we speculate that this gene may have been acquired to regulate a resistance mechanism against toxic compound(s) from heterotrophs associated with M. bouillonii.

Category Q (secondary metabolism biosynthesis, transport and catabolism) is one of the more interesting to compare between the Moorea strains. Surprisingly, the category Q genes of PAL are similar to those of JHB and substantially different from those of PNG and 3L. In part, this pattern emerges because of the number of BGCs in these four strains (Table 1). Further investigations using gene cluster networking (Fig. 5; more details in the final section of this manuscript) suggest that PAL and JHB share 10 more pathways than PAL and either of the other two strains. It is noteworthy that the total number of BGCs in all four strains is remarkably high, a feature that has not been previously observed to this magnitude from any other cyanobacteria. As explored more thoroughly in the final section, the high abundance of BGCs is a prominent feature that distinguishes Moorea from other cyanobacteria.

Last, we searched for genus-specific genes and identified three COGs that are only present in Moorea genomes. The first set of genes belongs to “COG1465: 3-dehydroquinate synthase” and it has low similarity (<53%) to actinobacterial “3-dehydroquinate synthases.” 3-Dehydroquinate synthases are commonly involved with the biosynthesis of mycosporine-like amino acids (MAAs). However, based on the reference genome, none of those three genus-specific genes is inside the identified genomic islands or BGCs. Therefore, given its low similarity, we believe that this first gene product was misannotated in PAL. In addition, these Moorea strains do have MAA-like gene clusters (in PAL, this is cluster number 020) that contain a gene predicted to encode 3-dehydroquinate synthase. However, this homolog is shared with many other cyanobacterial genomes, as expected for an MAA-like biosynthetic gene. The second set of genes is from “COG1504: Uncharacterized protein,” and it has low identity (<52%) to eukaryotic “mth938 domain-containing proteins.” The last group of genes belongs to “COG4518: Mu-like prophage FluMu protein gp41” and has similarity to a recently published homolog in the genome of Stanieria sp. NIES-3757 (cyanobacterial genome absent from JGI/IMG database; GenBank: AP017375.1) annotated as a “sulfotransferase” (72% ID). Unfortunately, little information is available for these genes and their protein products, and thus the analysis was not mechanistically informative. Nevertheless, these three orthologs can be useful as specific markers to assist in the identification of Moorea strains in environmental samples.

Supplementary Material

Supplementary File
pnas.1618556114.sd01.xlsx (108.4KB, xlsx)
Supplementary File
pnas.1618556114.sd02.xlsx (83.2KB, xlsx)

Acknowledgments

This research was supported by National Institutes of Health Grants CA108874 and GM107550 (to W.H.G. and L.G.) and by Russian Science Foundation Grant 14-50-00069 (to A.K.). We thank the CAPES Foundation for Research Fellowship 13425-13-7 (to T.L.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The genomes of PAL, 3L, JHB, and PNG have been deposited at DNA Data Bank of Japan/European Nucleotide Archive/GenBank (accession nos. GCA_001767235.1, GCA_000211815.1, MKZR00000000, and MKZS00000000, respectively).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618556114/-/DCSupplemental.

References

  • 1. Shih PM, et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc Natl Acad Sci USA. 2013;110(3):1053–1058. doi: 10.1073/pnas.1217107110.
  • 2. Flores E, López-Lozano A, Herrero A. Nitrogen fixation in the oxygenic phototrophic prokaryotes (cyanobacteria): The fight against oxygen. Biol Nitrogen Fixat. 2015;2:879–889.
  • 3. Zehr JP. Nitrogen fixation by marine cyanobacteria. Trends Microbiol. 2011;19(4):162–173. doi: 10.1016/j.tim.2010.12.004.
  • 4. Komarek J, Kastovsky J, Mares J, Johansen JR. Taxonomic classification of cyanoprokaryotes (cyanobacterial genera) 2014, using a polyphasic approach. Preslia. 2014;86(4):295–335.
  • 5. Newman DJ, Cragg GM. Natural products as sources of new drugs from 1981 to 2014. J Nat Prod. 2016;79(3):629–661. doi: 10.1021/acs.jnatprod.5b01055.
  • 6. Dittmann E, Gugger M, Sivonen K, Fewer DP. Natural product biosynthetic diversity and comparative genomics of the cyanobacteria. Trends Microbiol. 2015;23(10):642–652. doi: 10.1016/j.tim.2015.07.008.
  • 7. Engene N, et al. Moorea producens gen. nov., sp. nov. and Moorea bouillonii comb. nov., tropical marine cyanobacteria rich in bioactive secondary metabolites. Int J Syst Evol Microbiol. 2012;62(Pt 5):1171–1178. doi: 10.1099/ijs.0.033761-0.
  • 8. Kleigrewe K, Gerwick L, Sherman DH, Gerwick WH. Unique marine derived cyanobacterial biosynthetic genes for chemical diversity. Nat Prod Rep. 2016;33(2):348–364. doi: 10.1039/c5np00097a.
  • 9. Wang M, et al. Sharing and community curation of mass spectrometry data with global natural products social molecular networking. Nat Biotechnol. 2016;34(8):828–837. doi: 10.1038/nbt.3597.
  • 10. Moss NA, et al. Integrating mass spectrometry and genomics for cyanobacterial metabolite discovery. J Ind Microbiol Biotechnol. 2016;43(2-3):313–324. doi: 10.1007/s10295-015-1705-7.
  • 11. Luo Y, Cobb RE, Zhao H. Recent advances in natural product discovery. Curr Opin Biotechnol. 2014;30:230–237. doi: 10.1016/j.copbio.2014.09.002.
  • 12. Kleigrewe K, et al. Combining mass spectrometric metabolic profiling with genomic analysis: A powerful approach for discovering natural products from cyanobacteria. J Nat Prod. 2015;78(7):1671–1682. doi: 10.1021/acs.jnatprod.5b00301.
  • 13. Cummings SL, et al. A novel uncultured heterotrophic bacterial associate of the cyanobacterium Moorea producens JHB. BMC Microbiol. 2016;16(1):198. doi: 10.1186/s12866-016-0817-1.
  • 14. Nurk S, et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 2013;20(10):714–737. doi: 10.1089/cmb.2013.0084.
  • 15. Utturkar SM, et al. Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences. Bioinformatics. 2014;30(19):2709–2716. doi: 10.1093/bioinformatics/btu391.
  • 16. Boetzer M, Pirovano W. SSPACE-LongRead: Scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 2014;15(1):211. doi: 10.1186/1471-2105-15-211.
  • 17. Pop M, Phillippy A, Delcher AL, Salzberg SL. Comparative genome assembly. Brief Bioinform. 2004;5(3):237–248. doi: 10.1093/bib/5.3.237.
  • 18. Galardini M, Biondi EG, Bazzicalupo M, Mengoni A. CONTIGuator: A bacterial genomes finishing tool for structural insights on draft genomes. Source Code Biol Med. 2011;6(1):11. doi: 10.1186/1751-0473-6-11.
  • 19. Calteau A, et al. Phylum-wide comparative genomics unravel the diversity of secondary metabolism in cyanobacteria. BMC Genomics. 2014;15(1):977. doi: 10.1186/1471-2164-15-977.
  • 20. Ziemert N, et al. Diversity and evolution of secondary metabolism in the marine actinomycete genus Salinispora. Proc Natl Acad Sci USA. 2014;111(12):E1130–E1139. doi: 10.1073/pnas.1324161111.
  • 21. Jones AC, et al. Genomic insights into the physiology and ecology of the marine filamentous cyanobacterium Lyngbya majuscula. Proc Natl Acad Sci USA. 2011;108(21):8815–8820. doi: 10.1073/pnas.1101137108.
  • 22. Campbell EL, Cohen MF, Meeks JC. A polyketide-synthase-like gene is involved in the synthesis of heterocyst glycolipids in Nostoc punctiforme strain ATCC 29133. Arch Microbiol. 1997;167(4):251–258. doi: 10.1007/s002030050440.
  • 23. Zhang CC, Laurent S, Sakr S, Peng L, Bédu S. Heterocyst differentiation and pattern formation in cyanobacteria: A chorus of signals. Mol Microbiol. 2006;59(2):367–375. doi: 10.1111/j.1365-2958.2005.04979.x.
  • 24. Stucken K, et al. The smallest known genomes of multicellular and toxic cyanobacteria: Comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS One. 2010;5(2):e9235. doi: 10.1371/journal.pone.0009235.
  • 25. Lee H-M, Vázquez-Bermúdez MF, de Marsac NT. The global nitrogen regulator NtcA regulates transcription of the signal transducer PII (GlnB) and influences its phosphorylation level in response to nitrogen and carbon supplies in the cyanobacterium Synechococcus sp. strain PCC 7942. J Bacteriol. 1999;181(9):2697–2702. doi: 10.1128/jb.181.9.2697-2702.1999.
  • 26. Berg H, et al. Biosynthesis of the cyanobacterial reserve polymer multi-l-arginyl-poly-l-aspartic acid (cyanophycin). Eur J Biochem. 2000;267:5561–5570. doi: 10.1046/j.1432-1327.2000.01622.x.
  • 27. Cimermancic P, et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014;158(2):412–421. doi: 10.1016/j.cell.2014.06.034.
  • 28. Weber T, et al. AntiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015;43(W1):W237–W243. doi: 10.1093/nar/gkv437.
  • 29. Doroghazi JR, et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat Chem Biol. 2014;10(11):963–968. doi: 10.1038/nchembio.1659.
  • 30. Medema MH, Fischbach MA. Computational approaches to natural product discovery. Nat Chem Biol. 2015;11(9):639–648. doi: 10.1038/nchembio.1884.
  • 31. Taniguchi M, et al. Palmyramide A, a cyclic depsipeptide from a Palmyra Atoll collection of the marine cyanobacterium Lyngbya majuscula. J Nat Prod. 2010;73(3):393–398. doi: 10.1021/np900428h.
  • 32. Marquez BL, et al. Structure and absolute stereochemistry of hectochlorin, a potent stimulator of actin assembly. J Nat Prod. 2002;65(6):866–871. doi: 10.1021/np0106283.
  • 33. Grindberg RV, et al. Single cell genome amplification accelerates identification of the apratoxin biosynthetic pathway from a complex microbial assemblage. PLoS One. 2011;6(4):e18565. doi: 10.1371/journal.pone.0018565.
  • 34. Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. Generic assignments, strain histories and properties of pure cultures of cyanobacteria. Microbiology. 1979;111(1):1–61.
  • 35. Albertsen M, et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol. 2013;31(6):533–538. doi: 10.1038/nbt.2579.
  • 36. Podell S, Gaasterland T. DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biol. 2007;8(2):R16. doi: 10.1186/gb-2007-8-2-r16.
  • 37. Kurtz S, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12. doi: 10.1186/gb-2004-5-2-r12.
  • 38. Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. BLAST ring image generator (BRIG): Simple prokaryote genome comparisons. BMC Genomics. 2011;12(1):402. doi: 10.1186/1471-2164-12-402.
  • 39. Medema MH, Takano E, Breitling R. Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol Biol Evol. 2013;30(5):1218–1223. doi: 10.1093/molbev/mst025.
  • 40. Langille MGI, Hsiao WWL, Brinkman FSL. Detecting genomic islands using bioinformatics approaches. Nat Rev Microbiol. 2010;8(5):373–382. doi: 10.1038/nrmicro2350.
  • 41. Weiling H, Xiaowen Y, Chunmei L, Jianping X. Function and evolution of ubiquitous bacterial signaling adapter phosphopeptide recognition domain FHA. Cell Signal. 2013;25(3):660–665. doi: 10.1016/j.cellsig.2012.11.019.
  • 42. Singh SP, Montgomery BL. Determining cell shape: Adaptive regulation of cyanobacterial cellular differentiation and morphology. Trends Microbiol. 2011;19(6):278–285. doi: 10.1016/j.tim.2011.03.001.
  • 43. Kuo YC, et al. Characterization of putative class II bacteriocins identified from a non-bacteriocin-producing strain Lactobacillus casei ATCC 334. Appl Microbiol Biotechnol. 2013;97(1):237–246. doi: 10.1007/s00253-012-4149-2.
