ABSTRACT
Long-standing cyanobacterial harmful algal blooms (CyanoHABs) are known to result from synergistic interaction between elevated nutrients and superior ecophysiology of cyanobacteria. However, it remains to be determined whether CyanoHABs are a result of positive selection by eutrophic waters. To address this, we conducted molecular evolutionary analyses on the genomes of 9 bloom-forming cyanobacteria, combined with pangenomics and metatranscriptomics. The results showed no positive selection by water eutrophication. Instead, all homologous genes in the species are under strong purifying selection based on the ratio of divergence at nonsynonymous and synonymous sites (dN/dS) and phylogeny. The dN/dS ratios of all homologous genes were <0.85 (median = 0.3) and were similar between the genes in the pathways driving CyanoHABs and those with housekeeping functions. Phylogenetic support for the absence of positive selection comes from the mixed clustering of strains: strains of the same species from diverse geographic origins form the same clusters, while strains from the same origins form different clusters. Further support lies in the codon adaptation index (CAI) and single nucleotide polymorphisms (SNPs). The CAI ranged from 0.42 to 0.9 (mean = 0.75), which indicates high-level codon usage bias; the pathways for CyanoHABs and housekeeping functions showed a similar CAI. Interestingly, CAI was negatively correlated with gene expression in 3 metatranscriptomes. Most genes had 5 to 50 SNPs, and highly expressed genes had fewer SNPs than genes expressed at low or medium levels. These negative relationships agree with the population-level dN/dS and phylogeny in supporting purifying selection in bloom-forming cyanobacteria. In summary, superior ecophysiology appears to be acquired prior to water eutrophication.
IMPORTANCE CyanoHABs are global environmental hazards, and their mechanisms of action are being intensively investigated. On an ecological scale, CyanoHABs are consequences of synergistic interactions between biological functions and elevated nutrients in eutrophic waters. On an evolutionary scale, one important question is how bloom-forming cyanobacteria acquire these superior biological functions. There are several possibilities, including adaptive evolution and horizontal gene transfer. Here, we explored the possibility of positive selection. We reasoned that there are two possible periods for cyanobacteria to acquire these functions: before the onset of water eutrophication or during water eutrophication. Either way, there should be molecular signatures in protein sequences for positive selection. Interestingly, we found no positive selection by water eutrophication, but strong purifying selection instead on nearly all the genes, suggesting these superior functions aiding CyanoHABs are acquired prior to water eutrophication.
KEYWORDS: cyanobacterial blooms, dN/dS, evolution, metatranscriptomics, pangenomics, purifying selection, single nucleotide polymorphism
INTRODUCTION
Cyanobacterial harmful algal blooms (CyanoHABs) are one of the most profound environmental hazards in modern history due in part to their global distribution (1), historical prevalence (2, 3), and economic consequences (4). While a variety of factors have been identified which contribute to CyanoHAB proliferation in the face of climate change (1, 5–7), many questions remain regarding their “complicated and confusing ecology” (8, 9).
To date, there remains a general consensus that nutrient loading serves as a main driver of CyanoHABs; these nutrient inputs range from macronutrients to trace metals, and nutrient residency times vary from long-term legacy nutrients to rapid ephemeral pulses (10). The corresponding metabolic pathways which utilize these nutrients in CyanoHABs have been systematically identified (11) and are curated in a web database, CyanoPATH (12). Among these pathways are physiological processes such as gas vesicle biosynthesis, nitrogen utilization, phycobilisome, CO2-concentrating mechanisms, and amino acid uptake. Cumulatively, these are examples of a few CyanoHAB features which are implicitly linked to nutrient acquisition and metabolism. Changes in water nutrient levels are unforeseeable to cyanobacteria; therefore, their success, observed on a global scale, attests to their superior ability to utilize elevated nutrients in eutrophic waters. From an evolutionary perspective, these superior biological functions could be acquired either prior to the onset of water eutrophication or during the eutrophication process.
In light of this recognition and the longstanding existence of CyanoHABs, one question is whether CyanoHABs are a result of positive selection by water eutrophication. In positive selection, also called Darwinian selection, genotypic changes increase the fitness of an organism (13). More specifically, either new beneficial mutations or a favorable environment enables new/existing variants to have a reproductive advantage which increases in frequency and finally fixes in the relevant population (14). In contrast, purifying selection removes (i.e., purifies) inferior variants carrying harmful mutations out of a population, typically to maintain a specific important biological function (14). Importantly, it has been shown that the widespread niches of extant cyanobacteria result from long adaptive radiation in various habitats (15). This process often involves the acquisition of some special functions in adaptation to specific habitats (16). In the presence of water eutrophication, we addressed the question of positive selection using select species of bloom-forming cyanobacteria with sufficient in vitro whole-genome sequences of single isolates; both molecular evolution and bioinformatics analysis were performed, including phylogeny, pangenome analysis, selection pressure, codon usage bias, and in situ metatranscriptome profiles in natural waters.
RESULTS
Discrepancy between phylogeny and biogeography in bloom-forming cyanobacteria.
Nine bloom-forming cyanobacterial species with at least 4 genomes of single isolates, not genomes binned from metagenomes, were included in the study. In the phylogeny of whole genome sequences, each species forms a separate clade with Synechocystis spp. and Synechococcus spp. as reference species, which have smaller genomes and earlier origins than other cyanobacteria (Fig. 1) (17). Considering the geographical origins of the strains, a clear discrepancy is observed between phylogeny and biogeography. Specifically, within the individual clades of different species, 2 types of subclades are formed: those consisting of strains from different geographic origins and those consisting of strains from the same location. Meanwhile, distances of varying lengths are clear between conspecific strains from the same locations. For example, this is most clear in the smallest clade of Leptolyngbya boryana, which comprises only four strains. One Japanese strain, NIES-2135, is closer to the European strain PCC 6306 than to the other two strains from Japan, dg5 and IAM-M101.
FIG 1.
Phylogeny of the 9 bloom-forming cyanobacterial species assessed in this study, based on in vitro whole genome sequences of single culture isolates. The alignment of 150 whole genomes from the National Center for Biotechnology Information (NCBI) was used to infer the Maximum Likelihood phylogeny. Cyanobacterial species are color-coded, each species forming a separate clade, with Synechocystis spp. and Synechococcus spp. as reference species. Circles on the tree branches indicate bootstrap values greater than or equal to 70%. Details about strains are provided in Data Set S1.
To further examine this discrepancy between phylogeny and biogeography, we focused on the largest Microcystis aeruginosa clade (Fig. 1), which has the most (50) whole genome sequences (Data Set S1). Provided with more traits associated with the strains, such as toxicity (microcystin-producing or not) and genome size, it is apparent that each cluster has strains from different geographic locations, which are often geographically far from one another. For example, one subclade has strains from Japan (NIES strains), China (TAIHU98 and FACHB strains), Europe (PCC strains), and the USA (LE3), and the other 2 subclades comprise a mix of strains from Japan, the USA, and Europe (Fig. 2B). The phylogenetically close but geographically distant strains are visualized on a global map (Fig. 2C), and at least 10 pairs of strains ranged across 5 continents. Interestingly, the toxic strains tended to cluster within a few clades, despite their origins across the 5 continents. Genome sizes tend to vary within each clade, whether the clade consists of international or national strains. This geography-phylogeny disparity contradicts the expectation for divergent evolution and more specifically suggests that no diverging adaptation to local eutrophic waters is required for these strains to form blooms across the world. Put differently, they are capable of forming blooms in eutrophic waters anywhere.
FIG 2.
Phylogenetic trees and global distribution of Microcystis aeruginosa strains. (A) The ML phylogenetic tree built with the core genome sequences. Biogeographical location of the original culture isolate is indicated by color (Asia = orange, Europe = blue, North America = red, Africa = black, Oceania = green) on the inner concentric circle. Genome size is shown as a color gradient from red (5.9 Mb) to blue (4.3 Mb) on the outer circle. Squares on the tree branches indicate bootstrap values greater than or equal to 70%. (B) The ML phylogenetic tree built with the whole genome sequences. The tree was rooted using two cyanobacteria (Anabaena cylindrica PCC7122 and Planktothrix agardhii NIES-204) as outgroups. Taxonomy and biogeographical location of the original culture isolate of each strain are shown. Clustering was conducted using the TYGS server (22). Strains belonging to the same clusters and subclusters are marked using the same color stripes. Triangles on the tree branches indicate bootstrap values greater than or equal to 70%. (C) World map showing the biogeographical location of the original culture isolate. Strains sharing the most recent common ancestor are connected. Map was obtained from OpenStreetMap and is licensed under the Open Data Commons Open Database License (ODbL).
Core and pangenomes analysis in bloom-forming cyanobacteria.
Pangenome analysis of all the bloom-forming species showed 2 general patterns: the number of total genes increased while the number of core genes decreased as more genomes were included (Fig. 3A and Fig. S1). For example, 50 Microcystis aeruginosa genomes give a total of 28,829 genes, and among them, there are 1,580 core genes (shared by all 50 strains) and 13,881 unique genes (present in only 1 strain) (Fig. 3B), with a range of 9 (NIES-298B) to 1,365 (FD4) unique genes per genome. For the other cyanobacteria, similar patterns were observed even though their genome sizes and numbers of core and unique genes differ; however, the rate of change as more genomes were included differed between species (Fig. S1).
FIG 3.
Pangenome analysis of 50 M. aeruginosa strains. (A) Core-pan-genome profiles of 50 selected M. aeruginosa genomes. Colored boxes denote the pan-genome (yellow) and core genome (blue) sizes, respectively. (B) Flower plot showing core and unique genes of the 50 selected M. aeruginosa genomes. Central circle represents the number of genes shared by all strains, while petals represent the number of unique genes in each strain. (C) Core genes of M. aeruginosa strains in CyanoPATH. AAPEP, uptake of amino acids and peptides; CCM, CO2-concentrating mechanism; DrugR, antibiotics resistance; MAA, mycosporine-like amino acid biosynthesis for screening UV radiation; MetalR, heavy metal resistance; Nfix, nitrogen fixation; Nitrogen, nitrogen utilization; Osmos, osmosis homeostasis; OSR, oxidative stress resistance; PBS, phycobilisome; Phosphorus, inorganic/organic utilization; PS-I/PS-II, photosystem I and II; Sugar, sugar assimilation; Sulfur, sulfur utilization; TEVit, assimilation of trace metals and vitamins; Toxins, cyanotoxins; UFA, unsaturated fatty acids; Vesicle, gas vesicles.
We then assessed the essentiality of the core genes by comparing them to the essential genes in Synechococcus elongatus, a photosynthetic model organism which has a much smaller genome (2.7 Mb) (19). We found a high degree of overlap between them: 73.4% and 84.0% of the essential genes were core and soft-core genes (present in ≥95% of the genomes), respectively. Because S. elongatus rarely forms blooms, this suggests that additional genes besides the core genes are needed for CyanoHABs. Next, we examined how many genes were involved in pathways for CyanoHABs, as curated in the pathway database CyanoPATH (12). A total of 185 core genes were found to be involved in CyanoPATH (Fig. 3C). Functionally, these genes are mainly involved in nitrogen utilization (24 genes), PS-I (photosystem-1) (20), CCM (CO2-concentrating mechanism) (21), AAPEP (amino acid utilization) (22), MAA (mycosporine-like amino acid biosynthesis) (22), and gas vesicle biosynthesis (14), followed by PBS (phycobilisome) (10) and pathways for utilizing other nutrients (23).
The genes for CyanoHABs are under strong purifying selection.
To determine whether bloom-forming cyanobacteria are under positive selection, we first performed a nonsynonymous/synonymous rate ratio (dN/dS) analysis. All homologous genes of the strains of each species had a dN/dS ratio of <1, with a median of about 0.32 and a minimum close to 0 (Fig. 4A and S2); this strongly suggests that the strains from different locations are under purifying selection, not positive selection for divergence (9, 21, 24). In M. aeruginosa, all dN/dS ratios are less than 0.85, with a median of 0.3. When the homologous genes were categorized into the functional pathways as curated in CyanoPATH (12), these functional genes showed similar dN/dS ratios to those of the housekeeping genes, e.g., central metabolism, transcription, translation, etc. (Kruskal-Wallis test, P > 0.5; Fig. 4B) (19). This strong purifying selection on the genes for CyanoHABs and housekeeping genes suggests that these genes are functionally important to the fitness of the species and any deleterious mutations are eliminated from the population.
FIG 4.
dN/dS and CAI for all homologous genes in 50 genomes of M. aeruginosa. (A) The dN/dS distribution of all genes. (B) The dN/dS of the genes in the pathways of CyanoPATH. (C) Distribution of CAI in all M. aeruginosa genes. (D) CAI for genes in different groups of functions. Purple bars are pathways strongly associated with CyanoHABs; green bars are pathways not specifically related to CyanoHABs; last 7 blue bars represent essential biological functions (19).
Second, we determined the codon usage bias of the CyanoPATH genes in M. aeruginosa, which results from the balance between mutation and selection on translation. We used the codon adaptation index (CAI), which measures how strongly a gene favors the most frequently used synonymous codons and serves as a proxy for translational efficiency (25). The CAI values ranged from 0.42 to 0.90, with a median of 0.75 (Fig. 4C). When the homologous genes are grouped into the metabolic pathways in CyanoPATH, their CAI values are similar to those in core metabolism and housekeeping processes (e.g., energy metabolism and protein) (Kruskal-Wallis test, P > 0.5; Fig. 4D).
Correlation between gene expression and CAI and dN/dS.
We further tested the possibility that the codon usage bias of genes can predict their expression level. Interestingly, the CAI in the common bloom-forming species M. aeruginosa was negatively correlated with the gene expression levels in most pathways in CyanoPATH, except for PBS, TEVit (vitamins and trace elements), and phosphorus utilization, using the metatranscriptomes from our study and 2 other studies (Fig. 5 and Table 1). Negative correlations were observed in the data sets from all 3 studies, with correlation coefficients ranging from −0.07 to −0.42 (P < 0.001; Table 1). This suggests that the genes which were highly expressed in most of the pathways tended to have low CAI, while those with high CAI tended to be expressed at relatively low levels. Next, we analyzed the relationship between dN/dS and gene expression. A similar negative correlation was observed, but it was significant in fewer pathways (Fig. S3): only Nfix (nitrogen fixation), nitrogen utilization, TEVit, PS-I, cyanotoxin biosynthesis, and phosphorus utilization showed significant correlations (P < 0.05).
FIG 5.
Relationships between gene expression and CAI in M. aeruginosa. A linear regression was drawn for each metabolic pathway with 95% confidence intervals and P values.
TABLE 1.
Correlation coefficients between TPM and CAI in each metatranscriptome data set
| Data set | Correlation coefficient | No. of genes |
|---|---|---|
| Lake Erie 2012_1 | −0.42 | 1,574 |
| Lake Erie 2012_2 | −0.09 | 1,577 |
| Lake Erie 2012_3 | −0.36 | 1,524 |
| Lake Taihu 2019_1 | −0.31 | 4,144 |
| Lake Taihu 2019_2 | −0.34 | 4,088 |
| Lake Taihu 2019_3 | −0.33 | 3,897 |
| Kranji Reservoir 2014 | −0.07 | 2,291 |
TPM, transcripts per kilobase million; CAI, codon adaptation index.
Relationship between SNP and gene expression.
Finally, we determined the single nucleotide polymorphisms (SNPs) in the genes of M. aeruginosa. A standard Poisson distribution of the SNPs was observed for all genes, using the Lake Taihu data sets (Fig. 6). About 2,500 genes had 5 to 50 SNPs, about 500 had more than 50 SNPs, and the rest (800) had very few SNPs. Moreover, all 1,580 core genes had SNPs, and 3,992 pan genes had at least 1 SNP. No linear correlations were observed between the number of SNPs and gene expression level. However, by dividing the gene expression levels into 3 groups, low (transcripts per kilobase million [TPM] = 0.5 to 10), medium (TPM = 11 to 1,000), and high (TPM > 1,000) (20), we found significant differences in the number of SNPs between the low and high groups and between the medium and high groups (Kruskal-Wallis test, P < 0.001). In other words, genes expressed at a high level had fewer SNPs than those expressed at a low or medium level (Fig. 6).
FIG 6.
SNP numbers in 3 gene expression levels of in situ metatranscriptome data from Lake Taihu. Data Set 1: (A) SRA: SRR21035067, (B) SRA: SRR21035066. Data Set 2: (C) SRA: SRR21035065, (D) SRA: SRR21035064. Data Set 3: (E) SRA: SRR21035063, (F) SRA: SRR21035062. Kruskal-Wallis test: *, P < 0.05; ***, P < 0.01; ****, P < 0.001. Using the EMBL-EBI’s Expression Atlas thresholds as our guide, we set the following quantitative cutoffs for gene expression: low expression, 0.5 to 10 transcripts per kilobase million (TPM); medium, 11 to 1,000 TPM; high, >1,000 TPM (20).
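The grouping of genes by expression level and the comparison of SNP counts between groups can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function names and the toy data are hypothetical, TPM values falling between 10 and 11 are assigned to the medium group as an assumption (the stated cutoffs leave that gap open), and only the Kruskal-Wallis H statistic is computed, not its P value.

```python
def expression_level(tpm):
    """Assign a TPM value to the low/medium/high bins described in the text."""
    if tpm < 0.5:
        return None          # below the lowest cutoff; not assigned to any bin
    if tpm <= 10:
        return "low"
    if tpm <= 1000:
        return "medium"      # assumption: values in (10, 11) count as medium
    return "high"

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks and a tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}             # distinct value -> average rank
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                               # size of this tie block
        rank_of[pooled[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction

# Toy data: (TPM, SNP count) pairs, made up for illustration only
genes = [(2.0, 30), (5.0, 42), (50.0, 25), (300.0, 28), (2000.0, 3), (5000.0, 6)]
snps_by_level = {"low": [], "medium": [], "high": []}
for tpm, n_snps in genes:
    level = expression_level(tpm)
    if level is not None:
        snps_by_level[level].append(n_snps)
h = kruskal_wallis_h(list(snps_by_level.values()))
```

A larger H indicates that the SNP counts differ more between the expression groups; converting H to a P value requires the chi-square distribution, which is left to a statistics package.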
DISCUSSION
CyanoHABs of dozens of species are prevalent in eutrophic waters worldwide. One key question as to this ubiquitous dominance is whether CyanoHABs result from positive selection by eutrophic waters in different geographic locations. Our results show no evidence of directed (positive) evolution; instead, most genes are under strong purifying selection. Meanwhile, the phylogenies of conspecific strains from different geographic origins showed no clear divergent evolution patterns; instead, some of the geographically distant strains tended to cluster together. This discrepancy between phylogeny and geography has been repeatedly observed in other studies (26, 27). This closeness between geographically distant strains, shared by all 9 selected cyanobacteria, contradicts their geographic separation and argues against positive selection toward water eutrophication. Furthermore, the purifying selection on most genes provides the most important evidence against positive selection. Additional support for purifying selection comes from the similar dN/dS ratios between the pathways for CyanoHABs and central metabolism; these bloom-driving functions are no different in the selection regime than the essential housekeeping functions. We are also aware that the conventional dN/dS ratios used at the population level were developed for distinct sequences, and thus caution should be taken when using them (28). Despite these considerations, these two findings provide evidence for purifying selection, not positive selection, in cyanobacteria.
The lack of evidence for positive selection in this study underscores the fact that water eutrophication due to human activities is a recent shift in the water nutrient state (10). First, this upward shift is unforeseeable to cyanobacteria, so they cannot prepare for it ahead of time in terms of biological functions. Second, after eutrophication occurs, they slowly begin to adapt to elevated nutrients; however, the past 100 years do not appear to have been enough time to accumulate beneficial mutations, given that their mutation rates are similar to those in bacteria (29, 30), that the effective population size is reduced in asexual organisms (31), and that beneficial mutations are lost to annual bottlenecking (32) when they overwinter (33, 34). This is not to say that no adaptation whatsoever has occurred since water eutrophication, but the molecular signature of adaptive evolution is probably too weak to detect at the moment.
The finding that CyanoHABs are not a consequence of positive selection by water eutrophication provides a foundation for understanding the molecular genetics of bloom-forming cyanobacteria. One important discovery is the negative correlation between CAI and gene expression, with genes of low CAI (<0.5) having higher expression. CAI has a range of 0 to 1, with the lower bound (CAI = 0) when only the least frequent codons are used and the upper bound (CAI = 1) when the most frequent codons are used in a gene (35). It has been shown that a value greater than 0.5 indicates a high bias (36), a result of selection. Thus, the negative correlation indeed reflects the level of purifying selection, the potential to increase transcription/translation efficiency (37, 38). Specifically, most genes have CAI values of >0.65, which is higher than the neutral 0.5 and suggests that the codons are already used in a biased manner. Thus, the genes used the most, which are highly expressed, prefer common codons to facilitate transcription/translation and thus have high CAI, while the genes less used are not as highly expressed and can use rare codons to achieve an overall transcription efficiency. The negative relationship between the number of SNPs and gene expression levels can be understood in the same manner. Fewer SNPs reflect stronger purifying selection and the importance of certain genes which are often used and indeed expressed at higher levels than those with more SNPs.
The finding that CyanoHABs are not the result of positive selection by water eutrophication also has an important bearing on bloom control. A lack of positive selection suggests that bloom-forming cyanobacteria are functionally ready prior to water eutrophication. In other words, these species must already have had these functions encoded in their genomes and fully operational, so that when freshwaters became eutrophic (i.e., loaded with excess external nutrients from human activities), they could exploit the elevated nutrients more effectively than co-occurring algae and dominate the phytoplankton community. In light of the superior biological functions in place in bloom-forming cyanobacteria, nutrient reduction appears to be the most effective method for controlling CyanoHABs because it eradicates the problem at the source; however, it is also the most expensive and time-consuming (39).
In summary, we found that CyanoHABs are not a result of positive selection by water eutrophication. Instead, bloom-forming cyanobacteria are under purifying selection. Consequently, the genes with relatively less codon usage bias and fewer SNPs are expressed at higher levels than those with more codon bias and more SNPs. Given the superior functions encoded by the genomes of bloom-forming cyanobacteria and the driving role of nutrients established by scientific consensus (10), this study also suggests that the most effective control of CyanoHABs is nutrient reduction in eutrophic waters.
MATERIALS AND METHODS
Genomes and sources.
In December 2020, all the genome sequences of 9 bloom-forming cyanobacteria (M. aeruginosa, Aphanizomenon flos-aquae, Arthrospira platensis, Cylindrospermopsis raciborskii, Fischerella thermalis, Nodularia spumigena, Planktothrix agardhii, Synechococcus elongatus, and Synechocystis salina) according to CyanoPATH were obtained from the National Center for Biotechnology Information (NCBI) genome database. Metagenome-assembled genomes (MAGs) were filtered out because MAGs are mostly incomplete and may contain fragments from other species due to practical or technical limitations (40). Only the genomes reconstructed from pure cultures (single isolates) were subjected to the following analyses. Their features are listed in Data Set S1, together with the geographic locations of the samples as reported by the NCBI, Joint Genome Institute (JGI), National Institute for Environmental Studies (NIES), and Biological Resource Center of Institut Pasteur (CRBIP).
Comparative genome analyses (core-pan-genome analysis).
All the genomes were annotated with Prokka v1.14.6 (41) using the corresponding genus databases to determine the location and attributes of the genes they contained. The core-pan-genome analysis was performed with Roary v3.13.0 using the default settings (42). The resulting core- and pan-genomes were visualized by a core-pan-genome plot using Prism v9.1.1 (43) and a flower plot using R v4.1.1 (44).
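As a rough illustration of the gene categories used in the core-pan-genome analysis, the following Python sketch classifies gene clusters from a presence/absence table. It is not the Roary implementation: the `presence` mapping, the strain labels, and the `classify_pangenome` helper are hypothetical, and the 95% soft-core threshold follows the definition given in Results.

```python
def classify_pangenome(presence, n_genomes, soft_core_fraction=0.95):
    """Split gene clusters into core, soft-core, and unique sets.

    presence: dict mapping a gene cluster name to the set of strains carrying
              it (a hypothetical stand-in for a gene presence/absence table).
    """
    core, soft_core, unique = set(), set(), set()
    for gene, strains in presence.items():
        n = len(strains)
        if n == n_genomes:
            core.add(gene)        # shared by every genome
        if n / n_genomes >= soft_core_fraction:
            soft_core.add(gene)   # present in >= 95% of genomes (includes core)
        if n == 1:
            unique.add(gene)      # present in exactly one genome
    return core, soft_core, unique

# Toy example with three made-up strains
presence = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
}
core, soft_core, unique = classify_pangenome(presence, n_genomes=3)
pan_size = len(presence)  # pan-genome size = all gene clusters observed
```

Under this definition the pan-genome can only grow and the core genome can only shrink as genomes are added, which matches the two general patterns reported for all species.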
Phylogenetic analyses.
A whole genome-based phylogenetic tree of all cyanobacteria was built using PhyloPhlAn v3.0.60 in “medium diversity” mode (45) and was visualized on iTOL (46).
For M. aeruginosa, 2 phylogenetic trees were reconstructed based on the core genome and whole genome sequences, respectively. The core genome sequences of 50 M. aeruginosa strains generated from pangenomic analysis were aligned with Mafft v7.310 using the FFT-NS-2 progressive algorithm (47). A Maximum Likelihood (ML) tree was built by FastTree v2.1.10 with default parameters (48) and visualized online on iTOL (46). The genome sequences of M. aeruginosa were then searched against microcystin synthetase A (GenBank accession no. AAF00960.1) with NCBI BLAST+ v2.6.0 (49), and those containing the enzyme were considered toxic (50).
The whole genome sequences of 50 M. aeruginosa strains, P. agardhii (GenBank assembly no. GCA_003609755.1), and Anabaena cylindrica (GenBank assembly no. GCA_000317695.1) were uploaded to TYGS for genome-based phylogenetic analysis (22). The resulting phylogenetic tree was then uploaded to iTOL for visualization (46). The global distribution of the samples was visualized on a world map obtained from OpenStreetMap (51), and strains sharing the most recent common ancestor were connected by lines using Tableau v2021.3 (18).
Selection pressure analyses.
The dN/dS and CAI were calculated to determine the selection pressure on the protein-coding genes. Genes shared by fewer than 4 strains in each species and those without nonsynonymous mutations were excluded from the selection pressure analyses (52). The dN/dS of each gene was calculated using CodeML from the PAML (Phylogenetic Analysis by Maximum Likelihood) package v4.9j with default settings (53). For M. aeruginosa, random subsampling of 10 genomes was repeated 40 times. The dN/dS was then calculated for each repetition, and the average values were recorded.
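The subsampling scheme for M. aeruginosa can be sketched as below. This is only an outline under stated assumptions: `estimator` is a hypothetical callable standing in for a wrapper around a dN/dS program (none is defined here), sampling is assumed to be without replacement, and `selection_regime` simply encodes the conventional reading of the ratio (below 1, purifying; about 1, neutral; above 1, positive).

```python
import random
from statistics import mean

def subsampled_mean(genomes, estimator, subset_size=10, repetitions=40, seed=0):
    """Average an estimator over repeated random subsamples of genomes.

    estimator: callable taking a list of genomes and returning one number,
               e.g. a per-gene dN/dS estimate for that subset (hypothetical).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(repetitions):
        subset = rng.sample(genomes, subset_size)  # without replacement
        values.append(estimator(subset))
    return mean(values)

def selection_regime(dnds, tolerance=0.0):
    """Conventional interpretation of a dN/dS ratio."""
    if dnds < 1 - tolerance:
        return "purifying"
    if dnds > 1 + tolerance:
        return "positive"
    return "neutral"

def dummy_estimator(subset):
    """Placeholder returning a constant; a real run would call a dN/dS tool."""
    return 0.3

# Toy usage: 50 placeholder genome labels
genomes = [f"genome_{i}" for i in range(50)]
avg = subsampled_mean(genomes, dummy_estimator)
regime = selection_regime(avg)
```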
The CAI values were determined by CAIcal v1.4 (25) based on the codon usage table available in the Codon Usage Database (54).
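For readers unfamiliar with the index, the sketch below computes CAI in its commonly used form: each codon receives a weight equal to its frequency divided by that of the most frequent synonymous codon, and the CAI of a gene is the geometric mean of these weights. The code is a simplified illustration, not CAIcal itself; the toy usage table is invented, and skipping single-codon amino acids and zero-weight codons are assumptions made here for brevity.

```python
from math import exp, log

def relative_adaptiveness(usage):
    """usage: dict amino_acid -> {codon: count}. Returns codon -> weight in (0, 1]."""
    weights = {}
    for codons in usage.values():
        if len(codons) < 2:
            continue  # single-codon amino acids carry no usage information
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights

def cai(sequence, weights):
    """Geometric mean of the weights of the scorable codons in a coding sequence."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    # Simplification: codons without a weight, or with weight 0, are skipped
    logs = [log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))

# Toy codon usage table (made-up counts, not from the Codon Usage Database)
usage = {
    "K": {"AAA": 30, "AAG": 10},
    "E": {"GAA": 20, "GAG": 20},
    "M": {"ATG": 25},
}
w = relative_adaptiveness(usage)
preferred = cai("AAAGAA", w)   # only top codons are used
mixed = cai("AAGGAA", w)       # one rare codon lowers the index
```

A gene built only from the most frequent codons reaches the upper bound of 1, and each rarer codon pulls the geometric mean down.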
Metatranscriptomic sequencing and analyses.
Water samples were collected from 3 locations at Lake Taihu with 2 biological replicates and processed for next-generation sequencing (NGS) as previously described (55). The prepared NGS libraries were loaded onto the Illumina MiSeq platform for 150-bp paired-end sequencing. Two or three sets of RNA-seq data were first merged according to their sampling sites as shown in Table 2. Low-quality reads were filtered out using Trimmomatic v0.39 (56). The trim settings used are as follows: headcrop = 10, trailing = 20, slidingwindow = 4:26, minlen = 75. From the 50 M. aeruginosa strains, GenBank assembly no. GCA_002282945.1 was selected as the reference for the following analyses and annotated by Prokka v1.13 (41). Next, for each set of RNA-seq data, expression levels were calculated using RSEM v1.3.3 (23). Transcript abundances were quantified in terms of TPM.
The correlation coefficients between TPM and CAI values in each data set (Lake Erie, Lake Taihu and Kranji Reservoir) were calculated using the “corrr” package v0.4.3 (57).
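A correlation of this kind can be sketched in a few lines of Python. The sketch assumes a Pearson coefficient and applies a log10(TPM + 1) transform before computing it; the study used an R package, and neither the coefficient type, the use of the transform at this step, nor the logarithm base is stated here, so all three are assumptions. The values are invented for illustration.

```python
from math import log10, sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up per-gene values: higher CAI paired with lower expression
cai_values = [0.60, 0.70, 0.80, 0.90]
tpm_values = [999.0, 99.0, 9.0, 0.0]
log_tpm = [log10(t + 1) for t in tpm_values]  # assumption: base-10 log(TPM + 1)
r = pearson_r(cai_values, log_tpm)
```

With these toy values the relationship is perfectly linear and decreasing, so `r` is close to −1; real data sets give much weaker coefficients, as Table 1 shows.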
Genetic variant analyses.
Three sets of RNA sequencing data were used for SNP calling. We performed a two-pass alignment to map reads to the reference assembly (GenBank assembly no. GCA_002282945.1) using STAR v2.7.10a (58). Identical reads were marked using the MarkDuplicates tool in Picard v2.21.9 (59) and were ignored in downstream analysis. VarScan v2.4.4 was used to call variants with a minimum coverage value of 1 and a P value of 0.95 (60).
Metabolic analyses.
A previous study identified 718 essential genes necessary for survival in cyanobacteria (19). Genes in our study which overlapped with the essential gene set were annotated as putative essential genes. Core genes were grouped into different metabolic pathways by searching them against a reference set of protein sequences obtained from CyanoPATH (11), a database of pathways associated with cyanobacterial blooms, using DIAMOND v0.9.14 (61). Hits with less than 75% identity or less than 75% coverage (hit length divided by the reference length) were filtered out. Box plots of dN/dS and CAI values of core genes in each pathway were generated using the Python Matplotlib package v3.4.3 (62). Scatterplots of CAI versus transformed TPM values of core genes in each pathway were drawn by Prism v9.1.1 (43). TPM values were transformed into log(TPM + 1).
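The identity and coverage filter applied to the search hits can be written out explicitly. The sketch below is illustrative only: the `Hit` record and its field names are hypothetical and do not correspond to DIAMOND's output columns, and hits lying exactly on a 75% threshold are kept because only values below 75% are described as filtered out.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str             # core gene identifier
    reference: str         # reference protein identifier
    identity: float        # percent identity of the alignment
    hit_length: int        # aligned length, in residues
    reference_length: int  # full length of the reference protein

def coverage(hit):
    """Coverage as hit length divided by reference length, in percent."""
    return 100.0 * hit.hit_length / hit.reference_length

def keep_hit(hit, min_identity=75.0, min_coverage=75.0):
    """Keep a hit unless identity or coverage falls below its threshold."""
    return hit.identity >= min_identity and coverage(hit) >= min_coverage

# Made-up hits for illustration
hits = [
    Hit("gene1", "refA", 92.0, 280, 300),  # passes both thresholds
    Hit("gene2", "refB", 60.0, 290, 300),  # identity too low
    Hit("gene3", "refC", 88.0, 150, 300),  # coverage too low (50%)
]
kept = [h.query for h in hits if keep_hit(h)]
```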
Data and code availability.
The raw metatranscriptomic sequencing data generated from this study have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession no. PRJNA869295. The SRA and GenBank accession numbers of the omics data used are listed in Table 2 and Data Set S1, respectively. All code is available on request.
TABLE 2.
Metatranscriptome data sets used in this study
| Data set | Source | SRA accession no. |
|---|---|---|
| Lake Erie 2012_1 | Steffen et al. (63) | SRR1927216, SRR1927217, SRR1927219 |
| Lake Erie 2012_2 | Steffen et al. (63) | SRR1927220, SRR1927222, SRR1927223 |
| Lake Erie 2012_3 | Steffen et al. (63) | SRR1927225, SRR1927229, SRR1927239 |
| Lake Taihu 2019_1 | This paper | SRR21035067, SRR21035066 |
| Lake Taihu 2019_2 | This paper | SRR21035065, SRR21035064 |
| Lake Taihu 2019_3 | This paper | SRR21035063, SRR21035062 |
| Kranji Reservoir 2014 | Penn et al. (55) | SRR1171050, SRR1171059, SRR1171061 |
SRA, Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra).
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (32171565), the Kunshan Municipal Government Research Fund, and a Duke Kunshan University Summer Research Scholarship (to Y.Y., W.C., X.C., and Q.G.).
Footnotes
Supplemental material is available online only.
Contributor Information
Huansheng Cao, Email: hc284@duke.edu.
Yanbin Yin, University of Nebraska - Lincoln.
REFERENCES
- 1. Paerl HW, Havens KE, Hall NS, Otten TG, Zhu M, Xu H, Zhu G, Qin B. 2020. Mitigating a global expansion of toxic cyanobacterial blooms: confounding effects and challenges posed by climate change. Mar Freshw Res 71:579–592. doi: 10.1071/MF18392.
- 2. Francis G. 1878. Poisonous Australian lake. Nature 18:11–12. doi: 10.1038/018011d0.
- 3. Lathrop RC, Carpenter SR. 1992. Phytoplankton and their relationship to nutrients, p 97–126. In Kitchell JF (ed), Food web management: a case study of Lake Mendota. Springer New York, New York, NY. doi: 10.1007/978-1-4612-4410-3_7.
- 4. Steffensen DA. 2008. Economic cost of cyanobacterial blooms, p 855–865. In Hudnell HK (ed), Cyanobacterial harmful algal blooms: state of the science and research needs. Springer New York, New York, NY. doi: 10.1007/978-0-387-75865-7_37.
- 5. Chapra SC, Boehlert B, Fant C, Bierman VJ, Henderson J, Mills D, Mas DML, Rennels L, Jantarasami L, Martinich J, Strzepek KM, Paerl HW. 2017. Climate change impacts on harmful algal blooms in U.S. freshwaters: a screening-level assessment. Environ Sci Technol 51:8933–8943. doi: 10.1021/acs.est.7b01498.
- 6. Gobler CJ. 2020. Climate change and harmful algal blooms: insights and perspective. Harmful Algae 91:101731. doi: 10.1016/j.hal.2019.101731.
- 7. Paerl HW, Huisman J. 2008. Blooms like it hot. Science 320:57–58. doi: 10.1126/science.1155398.
- 8. Wilhelm SW, Bullerjahn GS, McKay RML, Moran MA. 2020. The complicated and confusing ecology of Microcystis blooms. mBio 11:e00529-20. doi: 10.1128/mBio.00529-20.
- 9. Wells ML, Karlson B, Wulff A, Kudela R, Trick C, Asnaghi V, Berdalet E, Cochlan W, Davidson K, De Rijcke M, Dutkiewicz S, Hallegraeff G, Flynn KJ, Legrand C, Paerl H, Silke J, Suikkanen S, Thompson P, Trainer VL. 2020. Future HAB science: directions and challenges in a changing climate. Harmful Algae 91:101632. doi: 10.1016/j.hal.2019.101632.
- 10. Heisler J, Glibert PM, Burkholder JM, Anderson DM, Cochlan W, Dennison WC, Dortch Q, Gobler CJ, Heil CA, Humphries E, Lewitus A, Magnien R, Marshall HG, Sellner K, Stockwell DA, Stoecker DK, Suddleson M. 2008. Eutrophication and harmful algal blooms: a scientific consensus. Harmful Algae 8:3–13. doi: 10.1016/j.hal.2008.08.006.
- 11. Cao H, Shimura Y, Steffen MM, Yang Z, Lu J, Joel A, Jenkins L, Kawachi M, Yin Y, Garcia-Pichel F. 2020. The trait repertoire enabling cyanobacteria to bloom assessed through comparative genomic complexity and metatranscriptomics. mBio 11:e01155-20. doi: 10.1128/mBio.01155-20.
- 12. Du W, Li G, Ho N, Jenkins L, Hockaday D, Tan J, Cao H. 2021. CyanoPATH: a knowledgebase of genome-scale functional repertoire for toxic cyanobacterial blooms. Briefings in Bioinformatics 22:bbaa375. doi: 10.1093/bib/bbaa375.
- 13. Stajich JE. 2013. Comparative genomics, p 380–386. In Losos JB, Baum DA, Futuyma DJ, Hoekstra HE, Lenski RE, Moore AJ, Peichel CL, Schluter D, Whitlock MC (ed), The Princeton guide to evolution. Princeton University Press, Princeton, NJ. doi: 10.1515/9781400848065-054.
- 14. Aquadro CF. 2013. Molecular evolution, p 367–373. In Losos JB, Baum DA, Futuyma DJ, Hoekstra HE, Lenski RE, Moore AJ, Peichel CL, Schluter D, Whitlock MC (ed), The Princeton guide to evolution. Princeton University Press, Princeton, NJ.
- 15. Shi T, Falkowski PG. 2008. Genome evolution in cyanobacteria: the stable core and the variable shell. Proc Natl Acad Sci USA 105:2510–2515. doi: 10.1073/pnas.0711165105.
- 16. Chen M-Y, Teng W-K, Zhao L, Hu C-X, Zhou Y-K, Han B-P, Song L-R, Shu W-S. 2021. Comparative genomics reveals insights into cyanobacterial evolution and habitat adaptation. ISME J 15:211–227. doi: 10.1038/s41396-020-00775-z.
- 17. Tomitani A, Knoll AH, Cavanaugh CM, Ohno T. 2006. The evolutionary diversification of cyanobacteria: molecular-phylogenetic and paleontological perspectives. Proc Natl Acad Sci USA 103:5442–5447. doi: 10.1073/pnas.0600999103.
- 18. Tableau Software. 2022. Tableau desktop. Retrieved from https://www.tableau.com/. Tableau Software, Mountain View, CA.
- 19. Rubin BE, Wetmore KM, Price MN, Diamond S, Shultzaberger RK, Lowe LC, Curtin G, Arkin AP, Deutschbauer A, Golden SS. 2015. The essential gene set of a photosynthetic organism. Proc Natl Acad Sci USA 112:E6634–E6643. doi: 10.1073/pnas.1519220112.
- 20. European Molecular Biology Laboratory. 2020. Expression Atlas. Retrieved from https://www.ebi.ac.uk/gxa/home.
- 21. Yang Z. 2001. Adaptive molecular evolution. In Balding D, Bishop M, Cannings C (ed), Handbook of statistical genetics, 1st ed. Wiley, New York, NY.
- 22. Meier-Kolthoff JP, Göker M. 2019. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun 10:2182. doi: 10.1038/s41467-019-10210-3.
- 23. Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323. doi: 10.1186/1471-2105-12-323.
- 24. Martinez-Gutierrez CA, Aylward FO. 2019. Strong purifying selection is associated with genome streamlining in epipelagic Marinimicrobia. Genome Biol Evol 11:2887–2894. doi: 10.1093/gbe/evz201.
- 25. Puigbò P, Bravo IG, Garcia-Vallve S. 2008. CAIcal: a combined set of tools to assess codon usage adaptation. Biol Direct 3:38. doi: 10.1186/1745-6150-3-38.
- 26. Humbert J-F, Barbe V, Latifi A, Gugger M, Calteau A, Coursin T, Lajus A, Castelli V, Oztas S, Samson G, Longin C, Medigue C, de Marsac NT. 2013. A tribute to disorder in the genome of the bloom-forming freshwater cyanobacterium Microcystis aeruginosa. PLoS One 8:e70747. doi: 10.1371/journal.pone.0070747.
- 27. Pérez-Carrascal OM, Terrat Y, Giani A, Fortin N, Greer CW, Tromas N, Shapiro BJ. 2019. Coherence of Microcystis species revealed through population genomics. ISME J 13:2887–2900. doi: 10.1038/s41396-019-0481-1.
- 28. Mugal CF, Wolf JBW, Kaj I. 2014. Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol Biol Evol 31:212–231. doi: 10.1093/molbev/mst192.
- 29. Osburne MS, Holmbeck BM, Coe A, Chisholm SW. 2011. The spontaneous mutation frequencies of Prochlorococcus strains are commensurate with those of other bacteria. Environ Microbiol Rep 3:744–749. doi: 10.1111/j.1758-2229.2011.00293.x.
- 30. Denamur E, Matic I. 2006. Evolution of mutation rates in bacteria. Mol Microbiol 60:820–827. doi: 10.1111/j.1365-2958.2006.05150.x.
- 31. Park S-C, Krug J. 2007. Clonal interference in large populations. Proc Natl Acad Sci USA 104:18135–18140. doi: 10.1073/pnas.0705778104.
- 32. Wahl LM, Gerrish PJ. 2001. The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55:2606–2610. doi: 10.1111/j.0014-3820.2001.tb00772.x.
- 33. Misson B, Latour D. 2012. Influence of light, sediment mixing, temperature and duration of the benthic life phase on the benthic recruitment of Microcystis. J Plankton Res 34:113–119. doi: 10.1093/plankt/fbr093.
- 34. Preston T, Stewart WDP, Reynolds CS. 1980. Bloom-forming cyanobacterium Microcystis aeruginosa overwinters on sediment surface. Nature 288:365–367. doi: 10.1038/288365a0.
- 35. Sharp PM, Li W-H. 1987. The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295. doi: 10.1093/nar/15.3.1281.
- 36. Uddin A. 2017. Indices of codon usage bias. J Proteomics Bioinform 10:1000e34. doi: 10.4172/jpb.1000e34.
- 37. Zhao F, Zhou Z, Dang Y, Na H, Adam C, Lipzen A, Ng V, Grigoriev IV, Liu Y. 2021. Genome-wide role of codon usage on transcription and identification of potential regulators. Proc Natl Acad Sci USA 118:e2022590118. doi: 10.1073/pnas.2022590118.
- 38. Zhou Z, Dang Y, Zhou M, Li L, Yu C-h, Fu J, Chen S, Liu Y. 2016. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci USA 113:E6117–E6125. doi: 10.1073/pnas.1606724113.
- 39. Paerl HW, Gardner WS, Havens KE, Joyner AR, McCarthy MJ, Newell SE, Qin B, Scott JT. 2016. Mitigating cyanobacterial harmful algal blooms in aquatic ecosystems impacted by climate change and anthropogenic nutrients. Harmful Algae 54:213–222. doi: 10.1016/j.hal.2015.09.009.
- 40. Parks DH, Rinke C, Chuvochina M, Chaumeil P-A, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2:1533–1542. doi: 10.1038/s41564-017-0012-7.
- 41. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153.
- 42. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, Fookes M, Falush D, Keane JA, Parkhill J. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31:3691–3693. doi: 10.1093/bioinformatics/btv421.
- 43. GraphPad Software. 2022. Prism. Retrieved from https://www.graphpad.com/scientific-software/prism/. GraphPad Holdings LLC, San Diego, CA.
- 44. R Core Team. 2017. R: A language and environment for statistical computing. Available from https://www.R-project.org/. R Foundation for Statistical Computing, Vienna, Austria.
- 45. Asnicar F, Thomas AM, Beghini F, Mengoni C, Manara S, Manghi P, Zhu Q, Bolzan M, Cumbo F, May U, Sanders JG, Zolfo M, Kopylova E, Pasolli E, Knight R, Mirarab S, Huttenhower C, Segata N. 2020. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat Commun 11:2500. doi: 10.1038/s41467-020-16366-7.
- 46. Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49:W293–W296. doi: 10.1093/nar/gkab301.
- 47. Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. doi: 10.1093/molbev/mst010.
- 48. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490. doi: 10.1371/journal.pone.0009490.
- 49. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. doi: 10.1186/1471-2105-10-421.
- 50. Tillett D, Dittmann E, Erhard M, von Döhren H, Börner T, Neilan BA. 2000. Structural organization of microcystin biosynthesis in Microcystis aeruginosa PCC7806: an integrated peptide-polyketide synthetase system. Chem Biol 7:753–764. doi: 10.1016/s1074-5521(00)00021-1.
- 51. OpenStreetMap. 2022. Planet dump. Retrieved from https://planet.openstreetmap.org. OpenStreetMap Foundation, Cambridge, United Kingdom.
- 52. Yang Z. 2005. PAML FAQ. Available from http://abacus.gene.ucl.ac.uk/software/pamlFAQs.pdf. Ziheng Yang Lab, University College London, London, United Kingdom.
- 53. Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 24:1586–1591. doi: 10.1093/molbev/msm088.
- 54. Nakamura Y, Gojobori T, Ikemura T. 2000. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28:292. doi: 10.1093/nar/28.1.292.
- 55. Penn K, Wang J, Fernando SC, Thompson JR. 2014. Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacterial bloom. ISME J 8:1866–1878. doi: 10.1038/ismej.2014.27.
- 56. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 57. Kuhn M, Jackson S, Cimentada J. 2022. corrr: Correlations in R. Available from https://corrr.tidymodels.org.
- 58. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21. doi: 10.1093/bioinformatics/bts635.
- 59. Broad Institute. 2019. Picard toolkit. Broad Institute, Cambridge, MA.
- 60. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. 2012. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22:568–576. doi: 10.1101/gr.129684.111.
- 61. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60. doi: 10.1038/nmeth.3176.
- 62. Hunter JD. 2007. Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. doi: 10.1109/MCSE.2007.55.
- 63. Steffen MM, Belisle BS, Watson SB, Boyer GL, Bourbonniere RA, Wilhelm SW. 2015. Metatranscriptomic evidence for co-occurring top-down and bottom-up controls on toxic cyanobacterial communities. Appl Environ Microbiol 81:3268–3276. doi: 10.1128/AEM.04101-14.
Supplementary Materials
Data Set S1. spectrum.03194-22-s0001.xls (XLS file, 0.07 MB)
Fig. S1 to S3. spectrum.03194-22-s0002.pdf (PDF file, 0.9 MB)