Abstract
Plants fix nitrogen in concert with diverse microbial symbionts, often recruiting them from the surrounding environment each generation. Vertical transmission of a microbial symbiont from parent to offspring can produce extreme evolutionary consequences, including metabolic codependence, genome reduction, and synchronized life cycles. One of the few examples of vertical transmission of N-fixing symbionts occurs in Azolla ferns, which maintain an obligate mutualism with the cyanobacterium Trichormus azollae—but the genomic consequences of this interaction, and whether the symbiosis involves other vertically transmitted microbial partners, are currently unknown. We generated high-coverage metagenomes across the genus Azolla and reconstructed metagenome assembled genomes to investigate whether a core microbiome exists within Azolla leaf cavities, and how the genomes of T. azollae diverged from their free-living relatives. Our results suggest that T. azollae is the only consistent symbiont across all Azolla accessions, and that other bacterial groups are transient or facultative associates. Pangenomic analyses of T. azollae indicate extreme pseudogenization and gene loss compared to free-living relatives—especially in defensive, stress-tolerance, and secondary metabolite pathways—yet, the key functions of nitrogen fixation and photosynthesis remain intact. Additionally, differential codon bias and intensified positive selection on photosynthesis, intracellular transport, and carbohydrate metabolism genes suggest ongoing evolution in response to the unique conditions within Azolla leaf cavities. These findings highlight how genome erosion and shifting selection pressures jointly drive the evolution of this unique mutualism, while broadening the taxonomic scope of genomic studies on vertically transmitted symbioses.
Keywords: cyanobacteria, nitrogen fixation, vertical transmission, Azolla, pangenome, Trichormus azollae, Nostoc azollae, Anabaena azollae
Introduction
Plants partner with a wide variety of microbes to secure nutrients, tolerate stress, and regulate their development [1]. In most cases, beneficial plant symbionts are recruited by new generations from the surrounding soil or aquatic microbiota (termed horizontal transmission), whereas vertical transmission from parent to offspring appears rare [2]. However, the selective pressures and genomic consequences of obligate, vertically-transmitted symbioses can be extreme [3, 4], making such associations important to the study of the evolution of symbioses. The consequences of such associations include the evolution of metabolic codependence [5], life-cycle synchronization [6], and horizontal gene transfer [7]. Understanding the evolutionary pathways and fitness consequences of these modifications is crucial for efforts to control or engineer beneficial, long-term plant-microbe interactions for improving food security, carbon capture, and ecosystem restoration [8, 9].
A particularly extreme example of plant-microbe mutualism is found in the aquatic Azolla ferns (Salviniaceae). These ferns grow on the surfaces of still, fresh waters on every continent barring Antarctica. Despite this widespread distribution, there are only seven recognized species in the genus, arranged into two taxonomic sections [10, 11]. The first section, Azolla sect. Rhizosperma, includes Azolla pinnata and Azolla nilotica, which are native to Africa, Asia, and Oceania (Fig. 1A). The second section, Azolla sect. Azolla, is presumed native to the Americas and possibly Oceania, and contains five species: A. caroliniana, A. microphylla, A. mexicana, A. filiculoides, and A. rubra, which are now widely naturalized outside of their native ranges (Fig. 1B). The plants primarily reproduce clonally, though sexual reproduction is regularly observed in nature. As N-fixers, Azolla are widely used as biofertilizers [12], protein supplements for animals [13], and water purifiers [14], and have even been implicated in global cooling during the Eocene epoch [15].
Figure 1.
(A) Photographs of A. pinnata (sect. Rhizosperma) and (B) A. caroliniana (sect. Azolla). (C) Scanning electron micrograph of the leaf and leaf pocket of A. pinnata. (D) Close-up of T. azollae cells within the pocket. Arrows show examples of heterocyst cells, SH is a simple hair cell. (E) Close-up of unknown bacteria attached to individual T. azollae cells. SEM imaging methods are provided in the supplemental methods.
All Azolla species host a nitrogen-fixing endophytic cyanobacterium in hollow cavities within most of their minute leaves (Fig. 1C and D). This cyanobacterium—named Trichormus azollae Komárek and Anagnostidis 1989 (syn Anabaena azollae, Nostoc azollae)—is filamentous and heterocystous, closely resembling free-living members of the Nostocaceae [16]. Relative to most host-microbe mutualisms, the Azolla–Trichormus symbiosis is unique in its mode and efficiency of transmission. For one, T. azollae appears incapable of autonomous growth outside of its host and has evolved a complex, multi-stage life history that persists through both clonal (sporophytic) and sexual reproductive (gametophytic) life stages of the fern [17]. This synchronized developmental cycle, involving motile, vegetative, and dormant phases in the bacterium, results in a maternal transmission with near-perfect fidelity and hints at considerable adaptive divergence from its free-living relatives in the genera Nostoc and Anabaena.
The first sequenced genome of T. azollae revealed extreme levels of pseudogenization, loss of genes predicted to be critical for life outside the host, and a carbohydrate transport system rarely encountered in cyanobacteria [18]. These features may represent the incipient stages of an endosymbiotic event that may have resembled the evolution of the reduced N-fixing “nitroplast” organelle of certain marine algae [19]. However, it is unclear whether these trends extend across the entire T. azollae lineage, and whether other, more cryptic evolutionary consequences (such as shifts in selection regimes on particular genes) have also occurred, as they have in many intracellular arthropod symbionts [20, 21]. Further, a suite of additional microbial taxa also resides inside the leaf chambers, often adhering to the cells of T. azollae (Fig. 1E). Some of these bacteria have been implicated in N fixation or denitrification but do not appear to carry out these processes within the leaf cavity [22], and some may themselves be vertically-transmitted symbionts of the ferns [23]. It remains unknown whether a core microbiome, beyond T. azollae, exists in the leaf pockets across the Azolla lineage and, if so, what its characteristics are.
We performed comparative metagenomic, pangenomic, and phylogenomic analyses to investigate two key questions. First, is there evidence of a shared, potentially co-diversifying Azolla leaf pocket microbiome across different host fern strains? Second, how do the primary T. azollae symbionts differ from their free-living cyanobacterial relatives in terms of functional genome content, gene loss, codon bias, and natural selection regime? By generating multiple metagenome replicates for each Azolla species, we achieve unprecedented statistical power to identify the genes, functions, and microbial taxa essential for maintaining this unique and ecologically important mutualism.
Materials and methods
Specimen sourcing and sequencing
Living cultures of Azolla ferns were sourced from the Azolla germplasm collection kept at the Dr. Cecilia Koo Botanical Conservation Center (KBCC) in Gaoshu Township, Taiwan. This collection had been maintained since the 1980s in indoor growth chambers at the International Rice Research Institute (IRRI) in Los Baños, Philippines until its transfer to KBCC in 2017. Strains from this collection had been sourced from around the world and have been maintained as inbred lines. From this collection, we selected at least two individuals of each species that appeared healthiest (with the exception of A. nilotica, which was unavailable). In addition, we sourced a wild strain of what was presumed to be A. filiculoides from the University of California Botanical Garden, USA (37.875, −122.239) and Azolla sect. Azolla from Brazos Bend State Park in SE Texas, USA (29.379, −95.614), and Azolla sect. Azolla and A. pinnata growing in outdoor ponds at the USDA Invasive Plant Research Lab in SE Florida, USA (26.083, −80.242). Upon entering our lab, each strain was briefly surface sterilized with a dilute 10% bleach solution, thoroughly rinsed in sterile water, and moved into sterile plant culture containers filled with “IRRI2” nutrient medium [24]. These were allowed to grow for 4–8 months in a plant growth chamber with 16 h:8 h 26°C/21°C day/night cycle at 60% relative humidity and 150 μmol m−2 s−1 light intensity.
We isolated and concentrated leaf pockets from the host plant to enrich microbial deoxyribonucleic acid (DNA) using a modified enzymatic digest method [25], followed by a standard phenol-chloroform DNA extraction on the resulting material. The steps are detailed in the supplemental methods. Purified leaf pocket metagenomes, which included enriched densities of pocket symbionts accompanied by a reduced number of host plant protoplasts and hair cells, were sequenced on a NovaSeq 6000 System (Illumina) (2 × 151 bp paired-end sequencing) at the US Department of Energy’s Joint Genome Institute (JGI) using their low-input metagenomic workflow [26]. Reads were quality-filtered, error-corrected, assembled using metaSPAdes [27], and mapped to assembled contigs using BBMap [28]. Feature prediction and functional annotation were carried out using PROKKA v.1.14 [29] and eggNOG mapper v.2 [30], respectively, after which predicted genes were clustered into orthologous groups (“orthogroups”) using Orthofinder v.2.5.5 [58]. Finally, contigs were binned into metagenome-assembled genomes (MAGs) using MetaBat2 v.2.15 [31] and checked for completeness with CheckM v.1.1.3 [32]. Only MAGs surpassing the “medium quality” criterion were retained (>50% completion, <10% contamination) [33]. Details of software parameter settings are listed in the supplementary methods. Raw and processed data are hosted on the JGI Genome Portal under proposal number 503794 (genome.jgi.doe.gov).
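The retention criterion described above can be sketched in a few lines. This is only an illustration: the bin names and percentages are hypothetical rather than values from this study, and the helper function is not part of CheckM or MetaBat2.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Completeness and contamination estimates for one genome bin (percent)."""
    name: str
    completeness: float
    contamination: float


def is_medium_quality(mag, min_completeness=50.0, max_contamination=10.0):
    """Return True if the bin passes the thresholds stated in the text
    (>50% completion, <10% contamination)."""
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


# Hypothetical bins, for illustration only.
bins = [
    MagStats("bin_1", completeness=98.2, contamination=0.4),
    MagStats("bin_2", completeness=47.5, contamination=1.1),   # too incomplete
    MagStats("bin_3", completeness=81.0, contamination=12.3),  # too contaminated
]

retained = [mag.name for mag in bins if is_medium_quality(mag)]
print(retained)
```

Both thresholds are applied as strict inequalities, following the wording of the text.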
Metagenome processing and analysis
Two Azolla host plant phylogenies were estimated, one based on genomic SNPs and one on plastid marker sequences. To recover host plant sequences, metagenomic reads were aligned to either the entire A. filiculoides reference genome [34] or six plastid markers (rbcL, atpB, rps4, trnL-trnF, trnG-trnR, and rps4-trnS) (one or more for every Azolla species) [11] using bowtie2 v.2.5.2 [35] with the --sensitive and --local options enabled. For each mapping, variant calling was carried out using mpileup in samtools v.1.21 [36]. For the full genome mapping, variants of low quality (<30) or those classified as multiallelic, monomorphic, or within 10 bases of another SNP were removed, after which SNP alignment was carried out using vcf-kit v.0.3 [37]. Reads mapped to the plastid references were also processed through mpileup and used to generate consensus sequences, which were then aligned on a per-locus basis to the existing plastid reference alignment using the --add and --keeplength options in MAFFT v.7.520 [38], after which they were concatenated. The resulting plastid and SNP alignments were 5795 and 268 177 bp in length, respectively, collapsing into 1463 and 13 592 unique site patterns. Maximum likelihood-based phylogenetic inference was carried out using raxml-ng [39] with 10 random starting trees and nonparametric bootstrapping across 200 iterations. Substitution models were selected after preliminary likelihood ratio tests: GTR + G for plastid marker sequences, GTR + G + ASC_LEWIS for genomic SNPs.
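The SNP filtering rules for the full genome mapping can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the `Snp` record and the example sites are hypothetical, and the sketch assumes that proximity to another SNP is evaluated among the sites that survive the per-site filters, which the text does not specify.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int
    qual: float
    alt_alleles: Tuple[str, ...]   # alternate alleles called at the site
    genotypes: Tuple[int, ...]     # per-sample allele index (0 = reference)


def filter_snps(snps: List[Snp], min_qual: float = 30.0,
                min_spacing: int = 10) -> List[Snp]:
    # Per-site filters: quality, biallelic only, polymorphic among samples.
    kept = [s for s in snps
            if s.qual >= min_qual
            and len(s.alt_alleles) == 1
            and len(set(s.genotypes)) > 1]
    kept.sort(key=lambda s: (s.chrom, s.pos))

    # Spacing filter: drop every site within min_spacing bases of a neighbour
    # on the same chromosome (assumption: neighbours are the surviving sites).
    result = []
    for i, s in enumerate(kept):
        close_prev = (i > 0 and kept[i - 1].chrom == s.chrom
                      and s.pos - kept[i - 1].pos <= min_spacing)
        close_next = (i + 1 < len(kept) and kept[i + 1].chrom == s.chrom
                      and kept[i + 1].pos - s.pos <= min_spacing)
        if not (close_prev or close_next):
            result.append(s)
    return result


# Hypothetical sites, for illustration only.
example = [
    Snp("chr1", 100, 50.0, ("T",), (0, 1, 1)),       # too close to pos 105
    Snp("chr1", 105, 60.0, ("G",), (0, 0, 1)),       # too close to pos 100
    Snp("chr1", 500, 20.0, ("A",), (0, 1, 0)),       # low quality
    Snp("chr1", 900, 45.0, ("A", "C"), (0, 1, 2)),   # multiallelic
    Snp("chr1", 1500, 80.0, ("C",), (1, 1, 1)),      # monomorphic
    Snp("chr1", 2000, 70.0, ("G",), (0, 1, 0)),      # retained
    Snp("chr2", 2005, 70.0, ("A",), (0, 0, 1)),      # retained (other chromosome)
]
passing = [(s.chrom, s.pos) for s in filter_snps(example)]
print(passing)
```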
Phylogenetic assignment of the MAGs was carried out using the PhyloPhlAn 3.0 [40] pipeline, which identifies and aligns shared markers from unknown MAGs with those from a large database of >150 000 MAGs and 80 000 reference genomes. These alignments were used to taxonomically classify the MAGs and to identify and align common marker genes, which were concatenated into amino acid alignments for partitioned phylogenetic reconstruction in raxml-ng using the LG + G8 + F substitution model. Cospeciation between T. azollae and Azolla was assessed using the Procrustean Approach to Cophylogeny (PACo) method, which has been shown to outperform related methods in minimizing type I and type II error rates [41]. The PACo approach compares the residuals of two phylogenetic distance matrices subjected to Procrustes analysis with the residuals obtained from a randomly permuted association matrix. Our choice of permutation algorithm (“r0”) assumed that the evolution of T. azollae tracked the evolution of the host plant. This permutation was repeated 1000 times, returning a P-value for the null hypothesis that the observed cophylogenetic signal is no different from that expected by chance alone.
We used the related pipeline MetaPhlAn4 [42] to investigate overall patterns of diversity among all contigs in each metagenome. This pipeline attempts to classify metagenomic reads based on a reference-database of ~5.1 million clade-specific marker genes and outputs a taxonomic abundance profile. As diversity was very low for all metagenomes and sequencing depths were comparable, rarefaction of metagenomic reads was not carried out, and there was no relationship between sequencing depth and subsequent derived diversity metrics.
The R package phyloseq [43] was used to characterize the leaf pocket communities based on taxonomic assignments from MetaPhlAn4. The richness and Shannon diversities of samples were compared within leaf pockets of the two major Azolla clades. The R package rbims v.0.0.0.9 [44] was used to generate a database of functional predictions from the annotated MAGs. This R package organizes annotated sequences into a hierarchical database based on their KEGG IDs [45] and COG functional annotations [46] output by EggNOG mapper, which can then be used to visualize and compare metabolic pathways between genomes or metagenomes.
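The two diversity metrics compared here can be written out explicitly. The sketch below is not the phyloseq implementation; it assumes richness is the count of taxa with nonzero abundance and that the Shannon index uses the natural logarithm, and the abundance vectors are hypothetical.

```python
import math


def richness(abundances):
    """Number of taxa with nonzero abundance in one sample."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical relative abundances (percent) for two leaf pocket samples.
dominated = [90.0, 5.0, 5.0, 0.0]   # one taxon dominates
even = [25.0, 25.0, 25.0, 25.0]     # four equally abundant taxa

print(richness(dominated), round(shannon(dominated), 3))
print(richness(even), round(shannon(even), 3))
```

A sample dominated by a single taxon, as most leaf pockets are by T. azollae, yields a Shannon value well below the maximum of ln(S) reached when all S taxa are equally abundant.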
Selection, gene loss, and codon bias across the Trichormus azollae pangenome
For pangenomic analyses, we used all high-quality MAGs as well as additional closely-related cyanobacterial genomes including Nostoc punctiforme GCF_000020025.1 [47], Nostoc PCC7120 GCF_000009705.1 [48], Nodularia spumigena GCF_003054475.1, and Trichormus variabilis GCF_009856605.1, all of which are free-living, heterocystous cyanobacteria isolated from aquatic habitats or plant surfaces. Pangenomes of the T. azollae MAGs and their free-living relatives were generated in anvi’o v.8 [49] following the bioinformatic pipeline described in [50], wherein gene clusters (defined based on sequence similarity) were aligned among related T. azollae genomes and visualized. We then calculated functional enrichment of particular gene clusters between clades of T. azollae and between T. azollae MAGs and the free-living outgroup genomes using the statistical test described in [51]. This test fits logistic regressions to gene cluster occurrences with focal group category as a predictor (e.g. T. azollae vs. outgroup genomes) and outputs false-discovery rate-corrected significance values.
We investigated pseudogenization of coding regions in all MAGs of our metagenomes, focusing specifically on those of T. azollae and their free-living relatives listed above. MAGs and reference genomes were processed through Prokka [29] to identify genes and CDS, which were then run through the Pseudofinder v.1.1 pipeline [52] using default parameters and the --annotate option against a reference database containing all cyanobacterial protein sequences available on the UniProt server (accessed on 14 August 2022) [53]. We attempted to predict the original function of these pseudogenes by running BLAST searches against a database of all cyanobacterial proteins and annotating the top hits from this database. This approach permits the annotation of pseudogenes that have lost crucial functional sites, whose original functions would otherwise be challenging to predict. Counts of orthologous genes and pseudogenes in each cyanobacterial genome were modeled using negative binomial generalized linear models for comparison across taxonomic clade, COG categories, and pangenomic core-accessory regions.
We also quantified codon bias in orthologous genes using the “mean expression level predictor” (MELP) metric with the coRdon R package [54], which measures codon bias in intact genes relative to those of known highly expressed genes (here, ribosomal proteins) and positively correlates with transcript abundances [55–57]. MELP values above 1 indicate codon frequencies match those of ribosomal genes whereas values below 1 indicate a closer match to genome average codon frequencies. Intact genes were further analyzed for signatures of selection. To do this, amino acid sequence alignments of all orthogroups present in all the free-living outgroups and at least 22 of the 24 T. azollae MAGs (3520 total) were created with MAFFT via the Orthofinder pipeline and used for gene tree estimation with raxml-ng (LG + G8 + F substitution model). Codon-aware nucleotide alignments were then generated using pal2nal software [59]. Tests for episodic positive (diversifying) selection on T. azollae genes were carried out using Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED) [60] in the HyPhy v.2.5.62 [61] software package, with default parameters and allowing for substitution rate variation. This analysis uses a likelihood ratio test to assess whether a rate-variable codon model is a better fit to the data (a subset of test branches in an orthologous gene tree) than a “null” model constrained to disallow positive selection, which is characterized by a ratio of nonsynonymous to synonymous mutations >1. This test has been expanded as part of the BUSTED-PH approach to assess differential selection between phenotypes and is detailed here: https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH. The approach first involves conducting a BUSTED test on groups of gene tree branches belonging to tips with particular phenotypes (here, Azolla symbionts and free-living bacteria). 
It then compares the statistical likelihood of a universal dN/dS distribution over the entire gene tree to one where the two phenotypes have different distributions using a likelihood ratio test. Orthogroups failing this differential selection test can still show evidence of selection associated with the focal phenotype. This may be due to the influence of outliers in either the focal or background branches, as variation in the relatively low number of branches in our comparison can increase the probability of type II (false negative) error. The entire BUSTED-PH procedure is diagrammed in the supplemental methods (Fig. S1).
We used HyPhy’s RELAX test to assess whether either purifying or positive selection on a particular orthologous group was either relaxed or intensified in the T. azollae clade as a consequence of their symbiotic lifestyle [62]. This test compares an unconstrained null dN/dS model fit to that of a more complex model containing a selection intensity exponent (k) which imposes a shift in the null model between focal and background gene tree branches. Values of k > 1 indicate selection on the focal branches (T. azollae) has intensified relative to the background (free-living Nostocales), whereas 0 < k < 1 indicates selection has relaxed toward neutrality (with k = 0 representing a relaxation to pure neutrality). For all P-values, false discovery rates were controlled using the Benjamini–Hochberg step-up procedure [65].
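Two ideas in this paragraph can be illustrated with short sketches: the effect of the selection intensity exponent k, and the Benjamini–Hochberg adjustment. Neither reproduces HyPhy's implementation. The first assumes the RELAX parameterization in which each dN/dS rate class on focal branches equals the corresponding background class raised to the power k; the rate classes and P-values are hypothetical.

```python
def relax_transform(background_omegas, k):
    """Focal-branch dN/dS classes under the assumption omega_focal = omega_bg ** k."""
    return [w ** k for w in background_omegas]


def benjamini_hochberg(pvalues):
    """Return BH step-up adjusted P-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical background rate classes: strong purifying, weak purifying,
# and positive selection.
background = [0.1, 0.5, 4.0]
intensified = relax_transform(background, 2.0)  # k > 1: classes move away from 1
relaxed = relax_transform(background, 0.5)      # 0 < k < 1: classes move toward 1
neutral = relax_transform(background, 0.0)      # k = 0: every class equals 1

# Hypothetical raw P-values from four orthogroup tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print(intensified, relaxed, neutral)
print(adj_p)
```

Under this parameterization a single exponent moves both purifying (dN/dS < 1) and positive (dN/dS > 1) classes in the same direction relative to neutrality, which is why one k value can summarize intensification or relaxation of both.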
Results
Microbial community patterns
Metagenome assembly resulted in an acceptable average N50 value across all assemblies of 31 kb, with high average coverage (>300×) and mapping ratio (>95%) across all contigs. Contig binning returned 121 medium and high-quality MAGs. Statistics for these are provided in the supplemental data table alongside processed data used for statistical analyses.
Between one and fourteen MAGs were recovered per Azolla sample, with T. azollae reads recovered at high relative abundances in every metagenome (Fig. 2). Beyond the cyanobacterial symbionts, there was no group of bacteria consistently found within all or even the majority of leaf pockets, including any additional cyanobacterial strains. The largest number of non-cyanobacterial MAGs was classified to the order Rhizobiales (syn. Hyphomicrobiales), which was found in 43% of samples. The leaf cavity samples with the highest taxonomic diversity were those that were freshly collected from natural habitats, indicating that losses of bacterial populations in leaf cavities likely occurred during long-term laboratory propagation (Fig. S2). On average, lower taxonomic diversity (in terms of raw richness and Shannon diversity) was observed in the leaf cavities of A. pinnata (sect. Rhizosperma) relative to Azolla sect. Azolla (richness: t(19.7) = 2.3, P < .05; Shannon: t(15.7) = 2.3, P < .05), though the statistical significance of this comparison disappeared when wild-collected samples were excluded from the analysis. Overall, most leaf pockets contained very low microbial diversity (mean = 5.3 ± 0.2 species) (Fig. S2).
Figure 2.
Phylogenomic tree of medium and high-quality MAGs recovered from Azolla leaf pocket metagenomes. Branches are shaded by bacterial clade. First series of rings shows normalized KEGG ortholog gene (OG) counts within major nitrogen transformation pathways in each MAG.
The metabolic traits of the leaf pocket MAGs were quite diverse, and hierarchical clustering separated the cyanobacterial symbionts and a small number of facultative anaerobes from the majority of the MAGs (Fig. S3). Relative to the non-cyanobacterial MAGs, the T. azollae genomes were characterized by complete N-fixation pathways and an absence of chemotaxis and associated motility genes. Metabolic pathways represented in the remainder of MAGs were highly variable, and included a wide array of inorganic N transformations, including N fixation outside of T. azollae and denitrification (Fig. 2). Various fixed-acid fermentative pathways were also detected, including in a homofermentative, nonmotile Lactococcus species. Additional putatively nonmotile MAGs included members of Chitinophagaceae detected across multiple samples. Oxidative phosphorylation pathway genes were common across most MAGs but reduced in those classified as Enterobacter ludwigii, Methylovorus sp., and the aforementioned Lactococcus genomes. However, there were no clear trends in metabolic traits in the set of MAGs that clearly distinguish them from other common free-living aquatic or plant-associated bacteria.
Evolutionary genomics of Trichormus azollae
Both Azolla trees were generally concordant, recovering the two monophyletic Azolla sections and two caroliniana/microphylla/mexicana and filiculoides/rubra complexes within sect. Azolla, with all inner branches having bootstrap support >75% (Fig. S4). Comparison of the T. azollae and Azolla phylogenetic trees (based on both genomic SNP and plastid markers) revealed substantial cophylogenetic signal, supporting the hypothesis that the phylogeny of T. azollae tracks that of its host plant (PACo SSresid < 0.05, P < .0001 for both Azolla trees) (Fig. 3, Fig. S5). Pangenomic analysis of the T. azollae MAGs returned a suite of 1672 conserved, intact gene clusters common to both the obligate cyanobionts and their free-living relatives, which we designated the core pangenome (Fig. 3). All single-copy genes fell into this conserved core. A second, more fragmented conserved region was also heavily shared among all genomes. Two additional clusters were also identified. The first contained the 2333 clusters exclusive to the free-living taxa whereas the second contained 1069 “accessory” gene clusters that were inconsistently shared among T. azollae lineages. The gene clusters found within these core and accessory bins differed in their functional annotations, with only COG category X (mobilome/prophages/transposons) having a greater total abundance in the accessory genome than the core genome (Fig. S6). Likewise, the fractional representation of each COG category in intact clusters also significantly differed between core and accessory genomes, with gene clusters belonging to COG categories X (mobile elements), Q (secondary metabolite synthesis), V (defense), K (transcription), P (inorganic ion transport/metabolism), and T (signal transduction) being found at higher frequencies (relative to other COG categories) in the accessory set compared to the core pangenome (Fig. S6).
Figure 3.
Pangenome of T. azollae symbionts and a selection of closely-related free-living cyanobacterial genomes. Inset graph shows phylogenomic trees of T. azollae MAGs and their host ferns with lines connecting the host and symbiont tips. Vertical ordering is based on phylogenetic relatedness. Inner rows on the radial display show intact gene clusters of each genome. The “core genome” is the region on the right comprising 1672 gene clusters shared across all genomes. “SCGs” bar indicates the region of single-copy genes, all clustered within the core genome. The functional homogeneity index identifies relative similarity of each gene cluster with respect to the amino acid composition of each cluster among genomes.
Functional enrichment analysis of pangenomic gene clusters identified 693 (35% of the total count) KEGG-assigned gene clusters and 974 (40% of the total count) COG-annotated clusters to be statistically associated with either T. azollae or their free-living relatives (Fig. 4A). Trichormus azollae genomes were significantly enriched with genes regulating fimbrial expression and histidine kinase sensing, important components of attachment, biofilm formation, and secretion. Conversely, the free-living outgroup genomes were enriched with clusters coding for phototaxis, light capture, UV protection, gas vesicle formation, and osmoregulation. A few of these genes were specifically associated with individual Azolla lineages. Two distinct fimbrial regulatory genes, fimE and fimB, were found only in the sections Azolla and Rhizosperma, respectively, and were absent in the free-living outgroups. A COG-category-wise analysis of differentially enriched genes suggests that the three categories most enriched in the outgroups relative to T. azollae were X (mobilome), Q (secondary metabolites), and V (defense mechanisms) (Fig. 4B).
Figure 4.

(A) Functional enrichment of gene clusters between the three focal groups. Placement of points and their coloring represent relative representation of each gene cluster among the three genomic groups. KEGG annotations of certain differentially-enriched gene clusters are shown in gray boxes. (B) Differential functional enrichment of genes by COG category. Higher values indicate fewer genes within a category were present across T. azollae genomes relative to outgroup genomes. (C) Comparison of pseudo- and intact gene counts between the T. azollae and its free-living relatives. (D) Percentage of pseudogenes in each genome as a function of COG category. Category key: L = replication and repair, V = defense mechanisms, T = signal transduction, Q = secondary metabolites, K = transcription, S = unknown function, P = inorganic ion transport and metabolism, I = lipid metabolism, D = cell cycle control and mitosis, O = post-translational modification and protein turnover, G = carbohydrate metabolism, E = amino acid metabolism, U = intracellular trafficking and secretion, M = cell wall/membrane/envelope biogenesis, N = cell motility, C = energy production and conversion, F = nucleotide metabolism, H = coenzyme metabolism, J = translation, X = mobilome/prophages/transposons, and “–” = no description.
Pseudogene counts were significantly higher in T. azollae compared to their free-living relatives (Neg. Binomial GLM; Z = 17.72, P < .0001) (Fig. 4C), and, in T. azollae, were significantly greater in abundance than even intact genes (Neg. Binomial GLM; Z = −12.79, P < .0001). The relative fractions of pseudogenes among COG categories varied significantly for both the cyanobiont and free-living outgroup genomes (Fig. 4D). However, this variability was much higher for T. azollae, which had significantly more pseudogenes than intact genes for COG categories L (replication and repair), V (defense mechanisms), T (signal transduction), Q (secondary metabolites), and K (transcription), with the largest differences between T. azollae and outgroup genomes in the replication/repair and signal transduction categories. In contrast, MAGs classified to the order Rhizobiales (a second hypothesized symbiont of Azolla [22]) did not show evidence for greater fractional pseudogene representation in their genomes (t-test; t(92) = 0.8, P = .43) and had higher average intact gene counts than most other non-cyanobacterial MAGs recovered from the samples (Neg. Binomial GLM; Z = −3, P < .005). In both T. azollae and outgroup genomes, most pseudogenes were identified due to the absence of detectable open reading frames or substantial gene truncation. A smaller proportion were flagged due to gene fragmentation, and none were identified based on elevated dN/dS values (Fig. S7).
The MELP metric identified 90 orthologous groups as differentially biased (and therefore possibly differentially expressed) between T. azollae and free-living relatives (Fig. 5A). Of these, 32 had significantly higher MELP values in T. azollae compared to the outgroup genomes, whereas 58 had significantly lower values. Genes with MELP scores >1 in the outgroup and <1 in the cyanobionts include those coding for glycosidases (cd, ma, nplT), cobalamin synthesis (cobW), and Fe-S cluster assembly proteins (sufB). Genes enriched in the cyanobionts relative to the outgroups include photosystem II proteins (psbK), phycoerythrocyanin linker proteins (pecC), and a putative virulence factor (mviM). MELP scores of orthologous genes varied between T. azollae and the free-living relatives in a category-dependent fashion (Fig. 5B). Most markedly, COG category U (intracellular trafficking, secretion, and vesicular transport) showed the greatest difference between the two groups and was the only category with significantly higher predicted expression in T. azollae genomes relative to the outgroup. All other statistically significant MELP comparisons favored the free-living outgroups.
Figure 5.
(A) Heatmap of predicted gene expression based on the MELP metric. Values >1 indicate codon bias is closer to highly expressed ribosomal reference genes than to genomic average codon frequencies. Rows are genes shared among T. azollae and free-living relatives which could be assigned to KEGG orthologs. The subset of genes shown here corresponds to those with significantly different MELP scores between the two groups with substantial effect sizes (|D| > 0.8) indicated in the leftmost column, where negative values indicate higher MELP scores for T. azollae and positive values indicate higher scores for the free-living relatives. The top 10% of these genes with regard to effect size are highlighted alongside their KEGG annotations. (B) Genes with MELP scores that significantly differed between T. azollae and free-living genomes, ordered by COG category. Asterisks next to each line denote adjusted significance of difference among groups (P < *.05, **.001, ***.0001). Here, for example, genes in the “U” category (intracellular trafficking and secretion) have significantly higher MELP scores in T. azollae than in free-living relatives. COG category definitions are listed in the Fig. 4 caption.
Tests for changes in the strength of selection for orthologous genes using the RELAX procedure returned 139 orthologues (out of 3520 total) exhibiting statistically significant evidence of intensified selection (k > 1) and 22 showing evidence of relaxed selection (k < 1) (Fig. 6A). Genes in the test group with relaxed selection include many in the COG categories T (signal transduction) and L (replication, recombination and repair), among them subunit 1 of the cytochrome oxidase enzyme (coxA/ctaD) and two DNA replication and repair proteins (recF and alkA). Conversely, genes experiencing intensification of selection were detected across all COG categories and included circadian clock proteins (kaiB/kaiC), genes involved in photosynthesis (ftrC, chlB, psbC), transport and adhesion (psbC, pilC, exbD), and heterocyst pattern formation (patA).
Figure 6.
(A) Results of RELAX analysis for intensification or relaxation of natural selection on test branches (corresponding to T. azollae MAGs) of OG trees. The violin plot shows the overall distribution of selection intensity parameters inferred for each OG. Values >1 indicate selection on the OG is intensified in test branches and values <1 indicate selection has relaxed. Dashed line denotes median k across all genes and points identify the OGs where k ≠ 1. Lists above the graph identify a subset of annotated OGs. Inset bar graph shows the relative distribution of OGs with relaxed or intensified selection relative to the total number of orthogroups in each COG category. (B) Flowchart illustrating the results of the BUSTED-PH test for differential positive selection on OGs. Arrows denote the number of OGs either passing or failing each statistical test for positive selection on foreground/background branches and differential selection between the two. The boxes on the right highlight the annotations of some of the OGs found to differ in selection regimes between T. azollae and the free-living outgroup genomes.
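To make the meaning of k concrete: RELAX is generally described as raising each ω (dN/dS) rate class inferred on the reference branches to the power k to obtain the corresponding class on the test branches. The sketch below assumes that parameterization; the ω values and function names are hypothetical and are not drawn from the fitted models.

```python
import math

def apply_relax_exponent(reference_omegas, k):
    """Return test-branch omega classes as reference omega raised to k."""
    return [omega ** k for omega in reference_omegas]

def log_distance_from_neutrality(omegas):
    """|ln(omega)|: how far each class sits from neutrality (omega = 1)."""
    return [abs(math.log(omega)) for omega in omegas]

# Hypothetical classes: strong purifying, weak purifying, positive selection.
reference = [0.1, 0.8, 3.0]

intensified = apply_relax_exponent(reference, k=2.0)  # k > 1
relaxed = apply_relax_exponent(reference, k=0.5)      # k < 1

for label, omegas in [("reference", reference),
                      ("k = 2.0", intensified),
                      ("k = 0.5", relaxed)]:
    print(label, [round(w, 3) for w in omegas])
```

With k > 1, classes below 1 shrink and classes above 1 grow, so both purifying and positive selection become stronger; with k < 1, every class moves toward neutrality. This is why intensified selection on a gene (as for nifZ in the Discussion) need not imply positive selection.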
Subjecting these orthologues to a more specific test for positive selection under the BUSTED-PH procedure identified two mutually exclusive sets of 22 and 29 orthologues with at least one branch experiencing statistically significant positive selection in T. azollae and in free-living relatives, respectively (Fig. 6B). Of these orthologues, only 7 (for T. azollae) and 2 (for the outgroup) showed evidence for statistically greater positive selection than the other group. Genes differentially under increased positive selection in T. azollae include an arsenosugar glycosyltransferase and mannose-6-phosphate isomerase, whereas additional genes under positive selection in this group but failing the final differential selection test include proteins playing important roles in chlorophyll production in low-light environments (chlB), inhibition and directionality of cell division (minD), and long-term nitrogen storage (cyanophycin synthetase). In contrast, genes under significantly increased positive selection in the free-living outgroup include nitrogenase iron protein (nifH) and an IS630 family transposase.
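The three-step logic summarized in Fig. 6B can be sketched as a small decision function. This is an illustration under assumptions: the 0.05 threshold, the category labels, and the function name are hypothetical, and the p-values stand in for the three tests (foreground, background, and difference between the two).

```python
def classify_orthogroup(p_foreground, p_background, p_difference, alpha=0.05):
    """Classify an orthogroup from three BUSTED-PH style p-values.

    p_foreground: positive selection on T. azollae (foreground) branches
    p_background: positive selection on free-living (background) branches
    p_difference: difference in selection regime between the two groups
    alpha is an assumed significance threshold.
    """
    fg = p_foreground <= alpha
    bg = p_background <= alpha
    differs = p_difference <= alpha

    if fg and bg:
        return "positive selection in both groups"
    if fg:
        return ("differential: foreground" if differs
                else "foreground only, not differential")
    if bg:
        return ("differential: background" if differs
                else "background only, not differential")
    return "no positive selection detected"

# Hypothetical p-values for four orthogroups.
examples = {
    "og_a": (0.001, 0.60, 0.01),  # passes all three steps for the foreground
    "og_b": (0.020, 0.45, 0.30),  # foreground signal, fails the final test
    "og_c": (0.700, 0.003, 0.02), # differential selection in the background
    "og_d": (0.500, 0.800, 0.90), # no signal
}
for name, pvals in examples.items():
    print(name, classify_orthogroup(*pvals))
```

In the terms used above, the 7 T. azollae orthologues correspond to the "differential: foreground" outcome, whereas genes such as chlB and minD fall under "foreground only, not differential".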
Discussion
In this study, we used metagenomic and pangenomic analyses to examine the taxonomic and metabolic diversity of the symbiotic leaf pocket microbial community across much of the Azolla fern lineage. We focused on comparing the pangenomes of the dominant cyanobacterial symbiont, T. azollae, with those of closely related free-living species to identify patterns in functional gene enrichment, pseudogenization, and differences in codon bias and selection regimes on specific orthologous genes.
At the community level, we encountered only one bacterial taxon, T. azollae (Nostocales), that was present in leaf pockets across all Azolla strains, regardless of whether the plants had been recently collected from the wild or were cultured as germplasm for the past 30 years. This extreme host fidelity is due to the unique vertical transmission mechanism of T. azollae, which persists even in benign germplasm culture conditions due to its fitness benefits to the host [66]. In contrast, no other members of Nostocaceae were detected, highlighting the absence of any of the hypothesized secondary symbionts [90]. Though other microbial taxa were found across multiple host fern strains, particularly Bradyrhizobium and Rhizobium species, none showed a cophylogenetic signal comparable to that between T. azollae and their host plants. This suggests that although the conditions in the leaf pocket are suitable for the invasion and possible vertical transmission of bacterial clades with a reasonably wide range of metabolic traits, the associations of most non-cyanobacterial clades are likely transient and nonessential for host performance. A second possibility is that some of these non-cyanobacterial MAGs represent epiphytes or endophytes of the host ferns that remained after surface sterilization. This is more likely the case for bacterial clades such as Chitinophaga and Ferrovibrio, which were found as singletons in the dataset and are commonly detected in bulk water samples.
The relatively large representation of Rhizobium and Bradyrhizobium MAGs in the leaf pocket metagenomes aligns closely with the results of other recent surveys. A previous metagenomic analysis [22] surveyed the compositions of eight Azolla accessions, including one of the same strains analyzed here (IRRI 3017). In that study, the authors recovered ribosomal RNA biomarkers for the Rhizobia at relative abundances comparable to those recovered here. In agreement with our study, the authors further speculated that these taxa did not contribute to N-fixation in the leaves, as they lacked the requisite genes in their MAG assemblies, but that they possessed denitrification genes which may be important for removal of excess ammonium from the leaf chamber. Another recent study of 112 Azolla leaf pocket microbiomes in wild strains from California, USA (primarily members of sect. Azolla) also encountered members of Rhizobiales across their samples [23]. Their analysis revealed that these non-cyanobacterial members show a significant but weaker cophylogenetic signal with the host fern than does its primary cyanobiont.
The sources of these Rhizobiales in Azolla leaf pockets—whether via vertical transmission or repeated invasion of leaf pockets from distinct local source populations—remain unclear. Rhizobia are already known to epibiotically colonize the heterocysts of both free-living Anabaena species [67] and T. azollae [68]. However, none of the non-cyanobacterial MAGs showed evidence of cophylogeny with the host plant, and most possessed pathways for flagellar assembly and chemotaxis, suggesting that invasion of a leaf pocket and transient persistence by motile, free-living bacteria from the surrounding environment happens with some regularity. Invasion most likely occurs in the early stages of leaf development, when the young leaf’s pocket is briefly open to the outside environment [17]. To better clarify the mode of microbiome transmission and leaf invasion, a fluorescence in situ approach could be used to track the spatial location and persistence of a bacterial clade such as Rhizobium across life stages. It would also be valuable to explore how leaf pocket microbial communities vary with environmental context—for example, between wild and cultured specimens, or among congeners growing under different climatic or water conditions—as different bacterial taxa might be more likely to colonize and persist under stressful or benign conditions.
Comparison of the T. azollae pangenome to the genomes of close, free-living relatives revealed differences in pseudogenization, gene set enrichment, codon bias, and signatures of selection, which help clarify the evolution of this cyanobacterium’s unique lifestyle. Our results confirm at the clade level the substantial pseudogenization (56%) previously documented in the T. azollae genome [18]. The pseudogenic content of most bacterial genomes appears to range between 1% and 25% [69, 70], though our approach to pseudogene detection uses a broader and more sensitive set of criteria [63, 64] and therefore returns pseudogenization estimates between 10% and 30% higher than earlier methods [52]. Highly pseudogenized genomes are hypothesized to be characteristic of recently evolved symbioses that have not been stable long enough to complete reductive genome evolution. This is likely the case in Azolla, as the fossil-calibrated molecular clock places divergence of the genus at 89 Ma in the late Cretaceous [11]. In contrast, highly eroded intracellular symbiont genomes such as that of Buchnera have divergence estimates of at least 150 Ma [71]. Nonetheless, this reduction process may have already begun in T. azollae, as the average genome size has decreased to ca. 5 Mb from 6–7 Mb in its free-living relatives [18].
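As a rough worked figure using only the approximate genome sizes quoted above (ca. 5 Mb versus 6–7 Mb), the implied fractional reduction in genome size is:

```latex
\frac{6 - 5}{6} \approx 0.17
\qquad\text{to}\qquad
\frac{7 - 5}{7} \approx 0.29
```

That is, a reduction of roughly 17% to 29% relative to free-living relatives. This is an approximation based on rounded sizes rather than a result of the analyses reported here.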
Our functional annotation of these pseudogenes both confirms and expands the scope of previous genomic analyses of T. azollae. In agreement with the first genomic analysis of T. azollae [18], the highest proportions of pseudogenes were from replication, recombination, and repair (COG category L), signal transduction (T), and secondary metabolite biosynthesis, transport, and catabolism (Q) categories. The former two categories also contained the most genes experiencing relaxed selection. The relative degree of pseudogenization in signal transduction genes was especially pronounced. Genes in this category have also been disproportionately purged from numerous obligate symbionts of arthropods (e.g. Serratia, Buchnera, Pantoea, Rickettsia) [72–74]. However, this trend is less apparent in the vertically-transmitted bacteria present in the leaf nodules of angiosperms such as Psychotria and Ardisia [75, 76].
Among the orthologous genes shared between free-living cyanobacteria and T. azollae, many exhibited differential codon bias. Genes associated with intracellular trafficking, secretion, and vesicular transport (U) were the only functional category predicted to be more optimized for expression in T. azollae compared to the outgroup. In contrast, genes related to posttranslational modification (O), defense mechanisms (V), and replication and repair (L) were predicted to have higher expression in the outgroup than in T. azollae. Additionally, the functional enrichment of COG categories related to wall, membrane, and envelope biogenesis (M) and carbohydrate transport and metabolism (G) in the T. azollae genomes suggests a shift in selective pressure away from environmental stress tolerance and toward adhesion and extracellular transport.
We identified hundreds of genes that exhibit signs of differential enrichment, codon evolution, or natural selection between T. azollae clades and free-living Nostocales. Many of these genes are linked to pathways involved in nitrogen fixation, motility, stress tolerance, photosynthesis, carbon metabolism, and defense. Major differences in the properties of these orthologs between the cyanobacterial lineages are briefly highlighted below.
Nitrogen fixation and heterocyst development
The Azolla–Trichormus symbiosis relies on effective nitrogen fixation, reflected in the higher heterocyst frequency of T. azollae (~20%) compared to free-living cyanobacteria (~5%–10%) [77, 78]. This suggests a greater investment in nitrogen fixation at the expense of photosynthesis and replication. In line with this observation, we found evidence for increased selection pressure on the patA gene, a key regulator of heterocyst formation and patterning [79]. Additionally, cyanophycin synthetase (critical for nitrogen storage and transfer among cells) is under positive selection in T. azollae but not in the outgroup. Conversely, nifZ (essential for MoFe protein synthesis in nitrogenase) was identified as being under strengthened selection according to the RELAX procedure but was not identified as being under positive selection according to the BUSTED-PH test, suggesting that it may be experiencing increased purifying selection instead. The nifH gene, in contrast, appears to be experiencing increased positive selection in the outgroups, despite an earlier proteomic observation of its post-translational modification in Azolla leaves [80]. These results highlight an overall reduction of positive selection in nif genes in T. azollae, possibly in line with increased purifying selection to maintain this critical metabolic function. However, whereas our analyses focused solely on gene bodies, a large 11 kb intergenic spacer between nifK and nifDH found in the vegetative cells of free-living Nostocales taxa has been lost in those of T. azollae [81], which may hasten or expand the production of these nitrogenase enzymes beyond the effects of selection on the gene bodies themselves [82].
Motility, chemotaxis, and stress tolerance
Genes associated with environmental stress responses were enriched in free-living outgroups, and include osmoprotectant transporters (opuC/D), UV damage endonucleases, and efflux pump regulators tetR/acrR [83]. Motility-related genes coding for type IV pilus proteins (pilW/V), a gas vesicle structural protein (gvpA), and a phototaxis regulator (pixH) were also enriched in free-living genomes. In contrast, T. azollae showed enrichment in adhesion proteins associated with fimbriae, an important structure in plant-cyanobacterial symbiosis [84]. The fimbrial regulator fimB was enriched in A. pinnata (sect. Rhizosperma), whereas its counterpart fimE was enriched in sect. Azolla species. The fimB regulator exhibits optimal performance at higher temperatures than fimE [85], which aligns with the increased growth performance of A. pinnata at higher temperatures relative to A. filiculoides [86]. In addition, genes involved in DNA repair such as recF and alkA showed evidence for relaxed selection in T. azollae. This evidence indicates a broad evolutionary genomic reshaping in the transition from a motile existence in a periodically stressful environment to a more stable, surface-attached lifestyle in line with the natural history of T. azollae.
Photosynthesis and carbon metabolism
We anticipated that genes related to photosynthesis would be under relaxed selection in T. azollae, given that its carbon appears to be provisioned by its host plant. Contrary to this expectation, genes that showed the largest differences in codon bias between the test groups included antenna protein-related genes pecA/pecC and photosystem II subunits psbM, psbY, psbX, and psbK, all of which were predicted to be optimized for expression in T. azollae. Similarly, two PSII antenna protein genes (psbC, psbY) and cytochrome f (petA) are predicted to be experiencing increased selection pressure in T. azollae. This differential codon optimization and increased selection on multiple essential and conserved photosystem components in T. azollae warrants further investigation, as it has long been understood that whereas the Azolla ferns show especially high photosynthetic rates [87], their T. azollae symbionts seem to produce markedly less Rubisco and phosphoribulokinase and have lower CO₂ fixation rates than free-living Anabaena strains [80, 88]. While this may simply be due to the higher heterocyst frequencies observed in T. azollae, another explanation for this discrepancy is that selection on photosynthetic genes in T. azollae may favor increased efficiency rather than absolute carbon fixation capacity. The continued production of photosynthetic pigments, structures, and enzymes suggests that maintaining a functional photosynthetic apparatus is important for the symbiosis. A functional apparatus may help the cyanobiont supplement its carbon budget during periods of host dormancy, when external carbon supply is reduced. Such a mechanism seems necessary in seasonal climates, where the host plants exhibit dramatic changes in pigment composition and carbon fixation rates [89]. Additionally, the light environment within Azolla leaf chambers may be quite different from the spectrum experienced by free-living cyanobacteria, leading to increased selection on light-harvesting pigments.
Little is currently known about the physiology of T. azollae, as they have proven resistant to culturing efforts, and previous investigations of cyanobacterial isolates from Azolla have, by accident, been carried out on free-living cyanobacterial contaminants rather than the primary symbiont itself [90]. Notably, we did not detect contaminant genomes from related Nostocales taxa in our samples, suggesting that long-term secondary cyanobionts are not a general feature of this symbiosis. To progress on this front, our genomic data can be used to develop metabolic models which can guide the development of isolation media [91]. Further, there have been promising developments in identifying differentially expressed genes in the host fern in the presence and absence of T. azollae [92], but similar studies of gene expression in the symbiont are needed [93], particularly across different stages of the host plant’s life cycle. Likewise, studies linking physiological status and gene or metabolite expression in situ are becoming more common in Azolla [94–96], and the divergently evolving or eroding genes identified herein can contextualize results in this developing area of research.
The Azolla leaf pocket is also a useful model microcosm for investigating microbial community dynamics at high spatiotemporal resolution. Fine-scale genomic and transcriptomic assays, alongside in situ imaging, could provide direct insights into the invasion and persistence of mixed strain assemblages in these micro-scale bioreactors. Introduction of new microbes to the plant at the megaspore stage has already been carried out successfully, and has even placed a free-living cyanobacterial isolate into the leaf cavity [97]. This sets up the exciting prospect of experimentally introducing novel beneficial symbionts to either improve plant productivity or study the evolution of the symbiosis by attempting to recreate its early stages.
Conclusions
Contrary to observations of a diverse, shared leaf pocket microbiome in wild Azolla ferns, our metagenomic analysis of leaf pockets from both recently collected strains and long-term cultures does not support a persistent vertically transmitted microbiome beyond the cyanobacterium Trichormus (syn. Nostoc, Anabaena) azollae, the persistent microbial symbiont of its host plant. Our findings further highlight the substantial genomic differentiation of the T. azollae symbiont from its free-living relatives, shaped by both selection on essential symbiotic functions and the ongoing loss of non-essential genes. The evolutionary trajectory of the Azolla–Trichormus symbiosis appears to be one of ongoing, pairwise co-diversification, with natural selection reshaping the cyanobiont’s role in nitrogen fixation, host adhesion, and metabolite transfer, while retaining the metabolic independence that differentiates it from more ancient intracellular symbionts. By expanding our understanding of the only known vertically transmitted nitrogen-fixing mutualism, these results contribute to a growing body of knowledge on how bacterial symbioses evolve in complex settings, particularly in the ecologically and economically important N-fixing cyanobacteria.
Supplementary Material
Acknowledgements
The authors thank the Cecilia Koo Botanic Conservation Center (KBCC), Ken-Yu Cheng (KBCC), C. Rothfels (Utah State University), H. Forbes (UC Berkeley Botanical Garden), P. Madeira (USDA), and E. Pokorny (USDA) for assistance with specimen sourcing.
Contributor Information
David W Armitage, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan; Department of BioSciences, Rice University, MS-140, PO Box 1892, Houston, TX 77251-1892, United States.
Alexandro G Alonso-Sánchez, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan.
Samantha R Coy, Department of BioSciences, Rice University, MS-140, PO Box 1892, Houston, TX 77251-1892, United States; Biology Department, Woods Hole Oceanographic Institution, 266 Woods Hole Rd, MS #52, Woods Hole, MA 02543, United States.
Zhuli Cheng, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan.
Arno Hagenbeek, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan.
Karla P López-Martínez, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan.
Yong Heng Phua, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan.
Alden R Sears, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan; Department of Plant and Microbial Biology, North Carolina State University, Campus Box 7612, Raleigh, NC 27695-7612, United States.
Author contributions
David W. Armitage (Conceptualization, Funding acquisition, Supervision, Data curation, Formal analysis, Writing—original draft, Writing—review & editing), Alexandro G. Alonso-Sánchez (Data curation, Writing—review & editing), Samantha R. Coy (Data curation, Writing—review & editing), Alden R. Sears (Data curation, Writing—review & editing), Zhuli Cheng (Formal analysis, Writing—review & editing), Arno Hagenbeek (Formal analysis, Writing—review & editing), Karla P. López-Martínez (Formal analysis, Writing—review & editing), and Yong Heng Phua (Visualization, Formal analysis, Writing—review & editing)
Conflicts of interest
The authors declare no conflicts of interest.
Funding
Funding was provided by USDA NIFA Postdoctoral Fellowship 2018-07819, the Rice University Department of BioSciences, and a cabinet subsidy to OIST. Sequencing support was provided by the US Department of Energy Joint Genome Institute Community Science Program (CSP-503794).
Data availability
Metagenomic contigs, MAGs, and annotations are publicly available on the JGI IMG server under proposal ID 503794, and processed data are available as supplementary tables alongside this article.
References
- 1. Hirsch AM. Plant-microbe symbioses: a continuum from commensalism to parasitism. Symbiosis 2004;37:345–63. [Google Scholar]
- 2. Frank AC, Saldierna Guzmán JP, Shay JE. Transmission of bacterial endophytes. Microorganisms 2017;5:70. 10.3390/microorganisms5040070 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet 2008;42:165–90. 10.1146/annurev.genet.41.110306.130119 [DOI] [PubMed] [Google Scholar]
- 4. Sachs JL, Skophammer RG, Regus JU. Evolutionary transitions in bacterial symbiosis. Proc Natl Acad Sci 2011;108:10800–7. 10.1073/pnas.1100304108 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Wierz JC, Dirksen P, Kirsch R. et al. Intracellular symbiont Symbiodolus is vertically transmitted and widespread across insect orders. ISME J 2024;18:wrae099. 10.1093/ismejo/wrae099 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Zheng W, Bergman B, Chen B. et al. Cellular responses in the cyanobacterial symbiont during its vertical transfer between plant generations in the Azolla microphylla symbiosis. New Phytol 2009;181:53–61. 10.1111/j.1469-8137.2008.02644.x [DOI] [PubMed] [Google Scholar]
- 7. Danneels B, Viruel J, Mcgrath K. et al. Patterns of transmission and horizontal gene transfer in the Dioscorea sansibarensis leaf symbiosis revealed by whole-genome sequencing. Curr Biol 2021;31:2666–2673.e4. 10.1016/j.cub.2021.03.049 [DOI] [PubMed] [Google Scholar]
- 8. Bailey-Serres J, Parker JE, Ainsworth EA. et al. Genetic strategies for improving crop yields. Nature 2019;575:109–18. 10.1038/s41586-019-1679-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Griffin C, Oz MT, Demirer GS. Engineering plant–microbe communication for plant nutrient use efficiency. Curr Opin Biotechnol 2024;88:103150. 10.1016/j.copbio.2024.103150 [DOI] [PubMed] [Google Scholar]
- 10. Saunders RMK, Fowler K. A morphological taxonomic revision of Azolla Lam. section Rhizosperma (Mey.) Mett. (Azollaceae). Bot J Linn Soc 1992;109:329–57. 10.1111/j.1095-8339.1992.tb00277.x [DOI] [Google Scholar]
- 11. Metzgar JS, Schneider H, Pryer KM. Phylogeny and divergence time estimates for the fern genus Azolla (Salviniaceae). Int J Plant Sci 2007;168:1045–53. 10.1086/519007 [DOI] [Google Scholar]
- 12. Wagner GM. Azolla: a review of its biology and utilization. Bot Rev 1997;63:1–26. 10.1007/BF02857915 [DOI] [Google Scholar]
- 13. Brouwer P, Schluepmann H, Nierop KG. et al. Growing Azolla to produce sustainable protein feed: the effect of differing species and CO2 concentrations on biomass productivity and chemical composition. J Sci Food Agric 2018;98:4759–68. 10.1002/jsfa.9016 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Song U, Park H, Lee EJ. Ecological responses and remediation ability of water fern (Azolla japonica) to water pollution. J Plant Biol 2012;55:381–9. 10.1007/s12374-012-0010-5 [DOI] [Google Scholar]
- 15. Speelman EN, Van Kempen MML, Barke J. et al. The Eocene Arctic Azolla bloom: environmental conditions, productivity and carbon drawdown. Geobiology 2009;7:155–70. 10.1111/j.1472-4669.2009.00195.x [DOI] [PubMed] [Google Scholar]
- 16. Komárek J, Anagnostidis K. Modern approach to the classification system of Cyanophytes 4 - Nostocales. Algol Stud Für Hydrobiol Suppl 1989;56:247–345. [Google Scholar]
- 17. Peters GA, Toia RE Jr, Raveed D. et al. The Azolla–Anabaena azollae relationship VI. Morphological aspects of the association. New Phytol 1978;80:583–93. 10.1111/j.1469-8137.1978.tb01591.x [DOI] [Google Scholar]
- 18. Ran L, Larsson J, Vigil-Stenman T. et al. Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular cyanobacterium. PLoS One 2010;5:e11486. 10.1371/journal.pone.0011486 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Coale TH, Loconte V, Turk-Kubo KA. et al. Nitrogen-fixing organelle in a marine alga. Science 2024;384:217–22. 10.1126/science.adk1075 [DOI] [PubMed] [Google Scholar]
- 20. Chong RA, Park H, Moran NA. Genome evolution of the obligate endosymbiont Buchnera aphidicola. Mol Biol Evol 2019;36:1481–9. 10.1093/molbev/msz082 [DOI] [PubMed] [Google Scholar]
- 21. Siozios S, Nadal-Jimenez P, Azagi T. et al. Genome dynamics across the evolutionary transition to endosymbiosis. Curr Biol 2024;34:5659–5670.e7. 10.1016/j.cub.2024.10.044 [DOI] [PubMed] [Google Scholar]
- 22. Dijkhuizen LW, Brouwer P, Bolhuis H. et al. Is there foul play in the leaf pocket? The metagenome of floating fern Azolla reveals endophytes that do not fix N2 but may denitrify. New Phytol 2018;217:453–66. 10.1111/nph.14843 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Song MJ, Freund F, Tibble CM. et al. The nitrogen-fixing fern Azolla has a complex microbiome characterized by varying degrees of cophylogenetic signal. Am J Bot 2025;112:e70010. 10.1002/ajb2.70010 [DOI] [PubMed] [Google Scholar]
- 24. Pereira AL, Carrapiço F. Culture of Azolla filiculoides in artificial conditions. Plant Biosyst Int J Deal Asp Plant Biol 2009;143:431–4. 10.1080/11263500903172110 [DOI] [Google Scholar]
- 25. Uheda E. Isolation of empty packets from Anabaena-free Azolla. Plant Cell Physiol 1986;27:1187–90. 10.1093/oxfordjournals.pcp.a077203 [DOI] [Google Scholar]
- 26. Clum A, Huntemann M, Bushnell B. et al. DOE JGI metagenome workflow. mSystems 2021;6:e00804-20. 10.1128/msystems.00804-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Nurk S, Meleshko D, Korobeynikov A. et al. metaSPAdes: a new versatile metagenomic assembler. Genome Res 2017;27:824–34. 10.1101/gr.213959.116 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner. Berkeley: Lawrence Berkeley National Lab. (LBNL), 2014. [Google Scholar]
- 29. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014;30:2068–9. 10.1093/bioinformatics/btu153 [DOI] [PubMed] [Google Scholar]
- 30. Cantalapiedra CP, Hernández-Plaza A, Letunic I. et al. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol 2021;38:5825–9. 10.1093/molbev/msab293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019;20:238. 10.1186/s13059-019-1832-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Kang D, Li F, Kirton E. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 2019;7:e7359. 10.7717/peerj.7359 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Parks DH, Imelfort M, Skennerton CT. et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 2015;25:1043–55. 10.1101/gr.186072.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Bowers RM, Kyrpides NC, Stepanauskas R. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 2017;35:725–31. 10.1038/nbt.3893 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Li F-W, Brouwer P, Carretero-Paulet L. et al. Fern genomes elucidate land plant evolution and cyanobacterial symbioses. Nat Plants 2018;4:460. 10.1038/s41477-018-0188-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods 2012;9:357–9. 10.1038/nmeth.1923 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Li H, Handsaker B, Wysoker A. et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9. 10.1093/bioinformatics/btp352 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Cook DE, Andersen EC. VCF-kit: assorted utilities for the variant call format. Bioinformatics 2017;33:1581–2. 10.1093/bioinformatics/btx011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Katoh K, Standley DM. MAFFT multiple sequence alignment software fersion 7: improvements in performance and usability. Mol Biol Evol 2013;30:772–80. 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Kozlov AM, Darriba D, Flouri T. et al. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 2019;35:4453–5. 10.1093/bioinformatics/btz305
- 41. Asnicar F, Thomas MA, Beghini F. et al. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat Commun 2020;11:2500. 10.1038/s41467-020-16366-7
- 42. Hutchinson MC, Cagua EF, Balbuena JA. et al. Paco: implementing procrustean approach to cophylogeny in R. Methods Ecol Evol 2017;8:932–40. 10.1111/2041-210X.12736
- 43. Blanco-Míguez A, Beghini F, Cumbo F. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat Biotechnol 2023;41:1633–44. 10.1038/s41587-023-01688-w
- 44. McMurdie PJ, Holmes S. Phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 2013;8:e61217. 10.1371/journal.pone.0061217
- 45. Vázquez-Rosas-Landa, M. Rbims: R Tools for Reconstructing Bin Metabolisms. R package version 0.0.0.9000. https://github.com/mirnavazquez/RbiMs. 2025.
- 46. Kanehisa M, Sato Y, Kawashima M. et al. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 2016;44:D457–62. 10.1093/nar/gkv1070
- 47. Galperin MY, Wolf YI, Makarova KS. et al. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 2021;49:D274–81. 10.1093/nar/gkaa1018
- 48. Ekman M, Picossi S, Campbell EL. et al. A Nostoc punctiforme sugar transporter necessary to establish a cyanobacterium-plant symbiosis. Plant Physiol 2013;161:1984–92. 10.1104/pp.112.213116
- 49. Kaneko T, Nakamura Y, Wolk CP. et al. Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. Strain PCC 7120. DNA Res 2001;8:227–53. 10.1093/dnares/8.5.227
- 50. Eren AM, Kiefl E, Shaiber A. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat Microbiol 2021;6:3–6. 10.1038/s41564-020-00834-3
- 51. Delmont TO, Eren AM. Linking pangenomes and metagenomes: the Prochlorococcus metapangenome. PeerJ 2018;6:e4320. 10.7717/peerj.4320
- 52. Shaiber A, Willis AD, Delmont TO. et al. Functional and genetic markers of niche partitioning among enigmatic members of the human oral microbiome. Genome Biol 2020;21:292. 10.1186/s13059-020-02195-w
- 53. Syberg-Olsen MJ, Garber AI, Keeling PJ. et al. Pseudofinder: detection of pseudogenes in prokaryotic genomes. Mol Biol Evol 2022;39:msac153. 10.1093/molbev/msac153
- 54. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res 2023;51:D523–31. 10.1093/nar/gkac1052
- 55. Elek A, Kuzman M, Vlahoviček K. coRdon: Codon Usage Analysis and Prediction of Gene Expressivity. R package version 1.26.0. 10.18129/B9.bioc.coRdon. 2025.
- 56. Supek F, Vlahoviček K. Comparison of codon usage measures and their applicability in prediction of microbial gene expressivity. BMC Bioinformatics 2005;6:182. 10.1186/1471-2105-6-182
- 57. Zhou Z, Dang Y, Zhou M. et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci 2016;113:E6117–25. 10.1073/pnas.1606724113
- 58. Newman ZR, Young JM, Ingolia NT. et al. Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9. Proc Natl Acad Sci 2016;113:E1362–71. 10.1073/pnas.1518976113
- 59. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res 2006;34:W609–12. 10.1093/nar/gkl315
- 60. Murrell B, Weaver S, Smith MD. et al. Gene-wide identification of episodic selection. Mol Biol Evol 2015;32:1365–71. 10.1093/molbev/msv035
- 61. Kosakovsky Pond SL, Poon AFY, Velazquez R. et al. HyPhy 2.5—a customizable platform for evolutionary hypothesis testing using phylogenies. Mol Biol Evol 2020;37:295–9. 10.1093/molbev/msz197
- 62. Wertheim JO, Murrell B, Smith MD. et al. RELAX: detecting relaxed selection in a phylogenetic framework. Mol Biol Evol 2015;32:820–32. 10.1093/molbev/msu400
- 63. Oakeson KF, Gil R, Clayton AL. et al. Genome degeneration and adaptation in a nascent stage of symbiosis. Genome Biol Evol 2014;6:76–93. 10.1093/gbe/evt210
- 64. Clayton AL, Oakeson KF, Gutin M. et al. A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses. PLoS Genet 2012;8:e1002990. 10.1371/journal.pgen.1002990
- 65. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 1995;57:289–300. 10.1111/j.2517-6161.1995.tb02031.x
- 66. Brouwer P, Bräutigam A, Buijs VA. et al. Metabolic adaptation, a specialized leaf organ structure and vascular responses to diurnal N2 fixation by Nostoc azollae sustain the astonishing productivity of Azolla ferns without nitrogen fertilizer. Front Plant Sci 2017;8:442. 10.3389/fpls.2017.00442
- 67. Stevenson BS, Waterbury JB. Isolation and identification of an epibiotic bacterium associated with heterocystous Anabaena cells. Biol Bull 2006;210:73–7. 10.2307/4134596
- 68. Carrapiço F. Are bacteria the third partner of the Azolla–Anabaena symbiosis? Plant Soil 1991;137:157–60. 10.1007/BF02187448
- 69. Liu Y, Harrison PM, Kunin V. et al. Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes. Genome Biol 2004;5:R64. 10.1186/gb-2004-5-9-r64
- 70. Douglas GM, Shapiro BJ. Pseudogenes act as a neutral reference for detecting selection in prokaryotic pangenomes. Nat Ecol Evol 2024;8:304–14. 10.1038/s41559-023-02268-6
- 71. Moran NA, Munson MA, Baumann P. et al. A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proc R Soc Lond B Biol Sci 1993;253:167–71. 10.1098/rspb.1993.0098
- 72. Blanc G, Ogata H, Robert C. et al. Reductive genome evolution from the mother of rickettsia. PLoS Genet 2007;3:e14. 10.1371/journal.pgen.0030014
- 73. Otero-Bravo A, Goffredi S, Sabree ZL. Cladogenesis and genomic streamlining in extracellular endosymbionts of tropical stink bugs. Genome Biol Evol 2018;10:680–93. 10.1093/gbe/evy033
- 74. Burke GR, Moran NA. Massive genomic decay in Serratia symbiotica, a recently evolved symbiont of aphids. Genome Biol Evol 2011;3:195–208. 10.1093/gbe/evr002
- 75. Carlier AL, Eberl L. The eroded genome of a Psychotria leaf symbiont: hypotheses about lifestyle and interactions with its plant host. Environ Microbiol 2012;14:2757–69. 10.1111/j.1462-2920.2012.02763.x
- 76. Carlier A, Fehr L, Pinto-Carbó M. et al. The genome analysis of Candidatus Burkholderia crenata reveals that secondary metabolism may be a key function of the Ardisia crenata leaf nodule symbiosis. Environ Microbiol 2016;18:2507–22. 10.1111/1462-2920.13184
- 77. Canini A, Grilli Caiola M, Mascini M. Ammonium content, nitrogenase activity and heterocyst frequency within the leaf cavities of Azolla filiculoides Lam. FEMS Microbiol Lett 1990;71:205–10. 10.1111/j.1574-6968.1990.tb03823.x
- 78. Singh RP, Singh PK. Symbiotic algal nitrogenase activity and heterocyst frequency in seven Azolla species after phosphorus fertilization. Hydrobiologia 1988;169:313–8. 10.1007/BF00007554
- 79. Risser DD, Callahan SM. Genetic and cytological evidence that heterocyst patterning is regulated by inhibitor gradients that promote activator decay. Proc Natl Acad Sci 2009;106:19884–8. 10.1073/pnas.0909152106
- 80. Ekman M, Tollbäck P, Bergman B. Proteomic analysis of the cyanobacterium of the Azolla symbiosis: identity, adaptation, and NifH modification. J Exp Bot 2008;59:1023–34. 10.1093/jxb/erm282
- 81. Franche C, Cohen-Bazire G. Evolutionary divergence in the nifH.D.K. gene region among nine symbiotic Anabaena azollae and between Anabaena azollae and some free-living heterocystous cyanobacteria. Symbiosis 1987;3:159–78.
- 82. de Vries S, de Vries J. Azolla: A model system for symbiotic nitrogen fixation and evolutionary developmental biology. In: Fernández H. (ed.), Current Advances in Fern Research. Cham: Springer International Publishing, 2018, 21–46.
- 83. Deng W, Li C, Xie J. The underling mechanism of bacterial TetR/AcrR family transcriptional repressors. Cell Signal 2013;25:1608–13. 10.1016/j.cellsig.2013.04.003
- 84. Dick H, Stewart WDP. The occurrence of fimbriae on a N2-fixing cyanobacterium which occurs in lichen symbiosis. Arch Microbiol 1980;124:107–9. 10.1007/BF00407037
- 85. Gally DL, Bogan JA, Eisenstein BI. et al. Environmental regulation of the fim switch controlling type 1 fimbrial phase variation in Escherichia coli K-12: effects of temperature and media. J Bacteriol 1993;175:6186–93. 10.1128/jb.175.19.6186-6193.1993
- 86. Ocloo XS, Vazquez-Prokopec GM, Civitello DJ. Mapping current and future habitat suitability of Azolla spp., a biofertilizer for small-scale rice farming in Africa. PLoS One 2023;18:e0291009. 10.1371/journal.pone.0291009
- 87. Shi D-J, Hall DO. The Azolla–Anabaena association: historical perspective, symbiosis and energy metabolism. Bot Rev 1988;54:353–86. 10.1007/BF02858416
- 88. Nierzwicki-Bauer SA, Haselkorn R. Differences in mRNA levels in Anabaena living freely or in symbiotic association with Azolla. EMBO J 1986;5:29–35. 10.1002/j.1460-2075.1986.tb04173.x
- 89. Kösesakal T. Effects of seasonal changes on pigment composition of Azolla filiculoides Lam. Am Fern J 2014;104:58–66. 10.1640/0002-8444-104.2.58
- 90. Pereira AL, Vasconcelos V. Classification and phylogeny of the cyanobiont Anabaena azollae Strasburger: an answered question? Int J Syst Evol Microbiol 2014;64:1830–40. 10.1099/ijs.0.059238-0
- 91. Heirendt L, Arreckx S, Pfau T. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 2019;14:639–702. 10.1038/s41596-018-0098-2
- 92. Eily AN, Pryer KM, Li F-W. A first glimpse at genes important to the Azolla–Nostoc symbiosis. Symbiosis 2019;78:149–62. 10.1007/s13199-019-00599-2
- 93. Dijkhuizen LW, Tabatabaei BES, Brouwer P. et al. Far-red light-induced Azolla filiculoides symbiosis sexual reproduction: responsive transcripts of symbiont Nostoc azollae encode transporters whilst those of the fern relate to the angiosperm floral transition. Front Plant Sci 2021;12:693039. 10.3389/fpls.2021.693039
- 94. de Vries S, de Vries J, Teschke H. et al. Jasmonic and salicylic acid response in the fern Azolla filiculoides and its cyanobiont. Plant Cell Environ 2018;41:2530–48. 10.1111/pce.13131
- 95. de Vries S, Herrfurth C, Li F-W. et al. An ancient route towards salicylic acid and its implications for the perpetual Trichormus–Azolla symbiosis. Plant Cell Environ 2023;46:2884–908. 10.1111/pce.14659
- 96. Güngör E, Bartels B, Bolchi G. et al. Biosynthesis and differential spatial distribution of the 3-deoxyanthocyanidins apigenidin and luteolinidin at the interface of a plant-cyanobacteria symbiosis exposed to cold. Plant Cell Environ 2024;47:4151–70. 10.1111/pce.15010
- 97. Lin C, Liu Z-Z, Zheng D-Y. et al. Re-establishment of symbiosis to Anabaena-free Azolla. In: Bothe H, de Bruijn FJ, Newton WE (eds.), Nitrogen Fixation: Hundred Years After. Proceedings of the 7th International Congress on Nitrogen Fixation. Stuttgart: Gustav Fischer, 1988, 223–7.
Data Availability Statement
Metagenomic contigs, MAGs, and annotations are publicly available on the JGI IMG server under proposal ID 503794, and processed data are available as supplementary tables alongside this article.





