Abstract
Convergence provides clues to unveil the non-random nature of evolution. Intermediate paths toward convergence inform us of the stochasticity and the constraint of evolutionary processes. Although previous studies have suggested that substantial constraints exist in microevolutionary paths, it remains unclear whether macroevolutionary convergence follows stochastic or constrained paths. Here, we performed comparative genomics for hundreds of lactic acid bacteria (LAB) species, including clades showing a convergent gene repertoire and sharing fructose-rich habitats. By adopting phylogenetic comparative methods we showed that the genomic convergence of distinct fructophilic LAB (FLAB) lineages was caused by parallel losses of more than a hundred orthologs and the gene losses followed significantly similar orders. Our results further suggested that the loss of adhE, a key gene for phenotypic convergence to FLAB, follows a specific evolutionary path of domain architecture decay and amino acid substitutions in multiple LAB lineages sharing fructose-rich habitats. These findings unveiled the constrained evolutionary paths toward the convergence of free-living bacterial clades at the genomic and molecular levels.
Subject terms: Comparative genomics, Molecular evolution, Bacterial evolution
Using comparative genomics, the authors unveil constrained evolutionary paths both at genomic and molecular levels during convergent evolution of two distinct lactic acid bacteria lineages.
Introduction
Convergent evolution has been a powerful clue suggesting the non-random or predictable nature of evolution in both microevolution1–6 and macroevolution7–10. Statistically significant similarity among independent evolutionary outcomes suggests the existence of shared selective pressures or evolutionary constraints among lineages11,12. To date, convergence has been studied at multiple levels, including the phenotypic, genomic, and molecular levels13–17. While many studies have focused on the evolutionary mechanisms leading to extant traits, studies on the intermediate evolutionary steps and processes have provided insights into the randomness and constraints on potential evolutionary paths from a past to a current state18–23. Those constraints on evolutionary paths have been observed by comparing distinct evolutionary paths toward convergence, i.e., the orders of multiple evolutionary events in different lineages. Evolutionary paths toward convergence can be different among lineages if the order of evolutionary events is random (the “evolutionary funnel” model), or similar if the intermediate evolutionary process is non-random or constrained (the “evolutionary stream” model) (Fig. 1). The “evolutionary funnel” model suggests that there are potentially diverse evolutionary intermediates from the common ancestor to the converged evolutionary outcomes, while the “evolutionary stream” model suggests that evolutionary constraints only allow specific evolutionary paths.
Fig. 1. Models for convergence through multiple evolutionary events.

a, b Given that two distinct lineages show genomic or phenotypic convergence through multiple evolutionary events (A–F), the lineages may follow different evolutionary paths, i.e., experience events A–F in different orders (the evolutionary “funnel” model) (a), or follow similar paths (the evolutionary “stream” model) (b).
Constraints on intermediate paths of microevolution were studied by direct tracing of evolution or by measuring the fitness of artificial evolutionary intermediates18–20,24–26. Those microevolutionary studies showed that mutations on a gene accumulate following constrained paths, which reflect epistatic relationships among amino acid residues. On the other hand, macroevolutionary convergence in nature often takes so long that evolutionary intermediates are not generally observable. Phylogenetic comparative methods have been widely adopted to reconstruct long-term processes on a phylogenetic tree27–29. While a study on long-term phenotypic convergence supported stochastic paths, that is, the evolutionary funnel model30, studies on genomic evolution have suggested an evolutionary stream model23,31,32. In addition, previous studies on convergent genomic evolution have been limited to the genome streamlining of organelles and endosymbiotic species such as mitochondria23,31 and Buchnera32. Thus, it is still unclear whether a common order of evolutionary events is observed in the macroevolutionary convergence, especially of free-living species.
Fructophilic lactic acid bacteria (FLAB) are a group of Lactobacillaceae species sharing unique metabolic traits (explained below) and are reported to show clear convergent evolution in terms of phenotypic, ecological, and genomic traits33. FLAB consist of two phylogenetically distant genera, Fructobacillus and Apilactobacillus (excluding A. ozensis), which initially belonged to the families Leuconostocaceae and Lactobacillaceae, respectively34,35. Phenotypically, FLAB are a unique group of lactic acid bacteria (LAB) that grow poorly on glucose but well on fructose. Supplementation with external electron acceptors markedly improves the growth of FLAB on glucose36. Pyruvate, oxygen, and fructose are the major electron acceptors. In addition, plant-derived phenolic compounds are also possible electron acceptors37. Ecologically, FLAB are free-living clades found only in fructose-rich niches, including fruit surfaces, fermented fruits, flowers, and honeybee guts, environments that likely led to their convergent evolution33,35,38. Honeybee-based food products are a rich source of FLAB39.
FLAB possess small genomes (<1.69 Mbp) and small numbers of coding DNA sequences characterized by the convergent absence of genes, especially those for metabolism such as phosphotransferase system transporters33. The genomic characteristics of FLAB cannot be explained only by typical genome streamlining, as their gene repertoires are not similar to those of other related clades with small genomes33. A characteristic of the FLAB genome is the absence of the adhE gene40,41. AdhE is an 850–900 amino-acid bifunctional enzyme that has alcohol dehydrogenase (ADH) and acetaldehyde dehydrogenase (ALDH) domains, and is one of the key enzymes for NAD/NADH cycling in the heterolactic phosphoketolase pathway42. FLAB use this pathway for glucose metabolism; thus, the lack of AdhE causes poor growth on glucose if external electron acceptors for the oxidation of NADH in glucose metabolism are unavailable33,43. FLAB grow well on fructose, which functions both as a carbon source and an electron acceptor. In contrast, adhE is essential for all heterofermentative LAB, excluding FLAB43. Notably, the sole non-FLAB member in the genus Apilactobacillus, A. ozensis, grows on glucose and possesses both complete AdhE (877 amino acids) and partial AdhE (458 amino acids)43. In this way, FLAB have been defined as a group of lactobacilli sharing metabolic traits: (1) growing well on fructose and requiring electron acceptors (e.g., fructose, oxygen, or pyruvate) to grow on glucose, (2) metabolizing a limited number of carbohydrates, (3) lacking many genes for glycometabolism, and (4) lacking adhE fully or partially.
In this study, we first applied ancestral reconstruction of gene content to the recently accumulating genome resources of various Lactobacillaceae species and analyzed the intermediate evolutionary processes of genomic convergence toward FLAB. Then, we statistically investigated whether the evolutionary funnel or stream model better fits the gene-content evolution of FLAB. We further focused on the molecular evolutionary process of adhE gene loss in terms of domain loss and amino acid substitution patterns, and examined which evolutionary model is supported at the molecular level. Our study sheds light on the possibility of constraints on the macroevolutionary paths toward convergence of free-living species and the importance of systematic evolutionary analysis in revealing the driving forces and predicting the future of natural evolution.
Results
Reconstruction of gene-content evolution toward two FLAB clades
To analyze intermediate processes toward genomic convergence of the two FLAB lineages (Apilactobacillus and Fructobacillus), we reconstructed the gene content of ancestral species from that of extant species using Diversitree44. A function of Diversitree estimates the presence/absence of an ortholog group (OG) in each ancestral species, i.e., each internal node in a phylogeny, by maximizing the likelihood under a stochastic model (Mk model45) of gene gains/losses (Fig. 2a). This method requires the phylogeny and a profile of the presence/absence of an OG for every extant species, i.e., every tip of the phylogenetic tree. To prepare the required dataset, we retrieved representative genomes and a genome-based reference phylogenetic tree for all phyla of bacteria from the Genome Taxonomy Database (GTDB r202), a database of a genome-phylogeny-based taxonomy of prokaryotes46. The reference phylogenetic tree was reconstructed from a concatenated multiple sequence alignment of 120 marker genes46. We then extracted reference genomes and a phylogenetic tree of all 344 Lactobacillaceae species for which GTDB provides high-quality genomes (>95% completeness and <5% contamination). We employed stringent thresholds for genome completeness and contamination because incomplete or contaminated genomes risk false negatives and false positives in detecting orthologous genes in our downstream analysis (explained below). We finally constructed a presence/absence profile for 2293 OGs using a gene annotation tool (KofamScan47) applied to the representative genomes of the 344 species. Because the estimation of ancestral gene content was uncertain, we repeated the inference of ancestral presence/absence 500 times for every OG with different model parameter sets pre-computed using a Bayesian method44. We confirmed the absence of auto-correlation among the 500 model parameter sets to ensure that the repeatedly reconstructed ancestral states were independent of each other (Supplementary Fig. 1a).
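As an illustration of the kind of calculation that underlies such a reconstruction, the following minimal Python sketch computes the probability that an OG was present at the root of a toy tree under a two-state Mk model with fixed rates, using Felsenstein's pruning algorithm. It is not the Diversitree implementation: the function names, the dictionary-based tree format, the rate values, and the flat root prior are all assumptions made for this example, and the rates are supplied by hand rather than estimated.

```python
import math

def mk2_transition(q01, q10, t):
    """Transition probabilities P(t) of a two-state Mk model.

    State 0 = OG absent, state 1 = OG present.
    q01 is the gain rate and q10 is the loss rate (hypothetical values).
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    return [
        [(q10 + q01 * decay) / total, q01 * (1.0 - decay) / total],
        [q10 * (1.0 - decay) / total, (q01 + q10 * decay) / total],
    ]

def conditional_likelihoods(node, q01, q10):
    """Felsenstein pruning: return [L(absent), L(present)] for a subtree.

    A tip is {"state": 0 or 1}; an internal node is
    {"children": [(child_node, branch_length), ...]} (toy format).
    """
    if "state" in node:
        return [1.0 if node["state"] == k else 0.0 for k in (0, 1)]
    like = [1.0, 1.0]
    for child, branch_length in node["children"]:
        p = mk2_transition(q01, q10, branch_length)
        child_like = conditional_likelihoods(child, q01, q10)
        for i in (0, 1):
            like[i] *= sum(p[i][j] * child_like[j] for j in (0, 1))
    return like

def root_presence_probability(tree, q01, q10, prior=(0.5, 0.5)):
    """Marginal probability that the OG is present at the root."""
    like = conditional_likelihoods(tree, q01, q10)
    present = prior[1] * like[1]
    return present / (prior[0] * like[0] + present)

# Toy example: ((A:0.1, B:0.1):0.1, C:0.2) with the OG absent in A and B.
toy_tree = {
    "children": [
        ({"children": [({"state": 0}, 0.1), ({"state": 0}, 0.1)]}, 0.1),
        ({"state": 1}, 0.2),
    ]
}
print(root_presence_probability(toy_tree, q01=0.2, q10=1.0))
```

This sketch only evaluates the root; obtaining marginal states for every internal node requires an additional pass over the tree (or rerooting), and repeating the calculation over many sampled rate pairs would mimic the repeated reconstructions described above.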
Fig. 2. The genomic convergence of FLAB involved shared losses of over 100 orthologs.
a A schematic diagram of ancestral reconstruction. Gene presence and absence at each extant or ancestral species are represented by circles and bars, respectively. In this example, the presence or absence of orthologs a and b for each internal node of the phylogenetic tree is estimated based on whether they are present or absent at each tip. Here, orthologs a and b are inferred to have been acquired once and twice (suggesting horizontal transfers for b), respectively. b The phylogenetic relationship of Lactobacillaceae species that are descendants of the latest common ancestor of the two FLAB clades. Paths from the common ancestor of the two FLAB clades (LCAfa) to the common ancestor of each of them (LCAa and LCAf) are indicated with thick branches. F0-F7 and A0-A5 represent the branches of the tree from LCAfa to LCAf and LCAa, respectively. The scale bar indicates a branch length of 0.1. c Classification of Lactobacillaceae OGs based on inferred ancestral scenarios. The heatmap shows the ratio of 500 ancestral scenarios inferred for each OG in every presence/absence pattern at LCAfa, LCAa, and LCAf. The number of OGs in each class corresponding to the major group of estimated scenarios is shown below the heatmap. d Contingency tables for metabolic and non-metabolic genes to test the significant overlap of OGs lost by Fructobacillus and Apilactobacillus. The color of each cell corresponds to the number in the cell. P values were 3.53 × 10^−15 and 4.39 × 10^−19 for metabolic and non-metabolic genes, respectively. Classifications of OGs lost in the lineages from LCAfa to LCAa (e) and LCAf (f) by the most likely branch where an OG was lost. The heatmap shows the ratio of ancestral scenarios supporting each branch as the branch where each OG was lost. Phylogenetic distributions of OGs lost in the lineages from LCAfa to LCAa (g) and LCAf (h). The OGs were sorted in the same order as in e and f.
We extracted OGs commonly and independently lost in the two lineages leading to FLAB clades, that is, OGs that were present in the latest common ancestor (LCA) of the two FLAB clades (LCAfa) but absent in the LCA of Apilactobacillus (LCAa) and that of Fructobacillus (LCAf) (Fig. 2b). We classified every reconstruction result for each OG into eight (2^3) scenarios corresponding to the presence/absence in LCAfa, LCAa, and LCAf and represented each scenario as a triplet of zero (absence) and one (presence). For example, a scenario in which an OG is present in LCAfa, absent in LCAa, and present in LCAf is represented by (1, 0, 1). We selected the most strongly supported scenario for each OG from the 500 reconstructions (Fig. 2c). We found that the supported scenarios for most OGs were robust across the independently reconstructed ancestral states, and found 137 OGs (55 metabolic and 82 non-metabolic OGs; metabolic genes were defined as the set of OGs included in ‘Metabolism’ of the KEGG Brite database48) commonly lost in the two lineages leading to the FLAB clades.
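The scenario classification described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the input format (one (LCAfa, LCAa, LCAf) triplet per repeated reconstruction) are assumptions made here, and the toy counts are invented for the example.

```python
from collections import Counter

def best_supported_scenario(reconstructions):
    """Return the most frequent (LCAfa, LCAa, LCAf) triplet and its support.

    `reconstructions` holds one 0/1 triplet per repeated ancestral
    reconstruction of a single OG (hypothetical input format).
    """
    counts = Counter(tuple(r) for r in reconstructions)
    scenario, n = counts.most_common(1)[0]
    return scenario, n / len(reconstructions)

def is_commonly_lost(scenario):
    """Present in LCAfa but absent in both LCAa and LCAf."""
    return scenario == (1, 0, 0)

# Toy example: 400 of 500 repeats support a loss in both lineages.
toy_repeats = [(1, 0, 0)] * 400 + [(1, 0, 1)] * 100
scenario, support = best_supported_scenario(toy_repeats)
print(scenario, support, is_commonly_lost(scenario))
```

Reporting the support fraction alongside the winning scenario makes it easy to check how robust each call is across the repeated reconstructions.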
We further found that the overlap between the metabolic gene sets lost along the two evolutionary paths toward the FLAB clades (LCAfa-to-LCAa and LCAfa-to-LCAf) was larger than expected by chance, confirming the previously suggested metabolic convergence of FLAB33 (Fig. 2d). The non-metabolic genes lost in the two lineages also showed significant overlaps, suggesting that the repertoires of non-metabolic genes converged under shared evolutionary pressures or constraints. A hierarchical clustering method previously used to detect the convergence of metabolic gene sets33 clustered Fructobacillus and Apilactobacillus together even though the two genera are distantly related in the reference phylogeny, suggesting the convergence of metabolic gene repertoires (Supplementary Fig. 1b). Notably, the convergence of non-metabolic gene repertoires was not apparent in the dendrogram, because the two FLAB genera were not clustered (Supplementary Fig. 1c). This result suggests that, without ancestral reconstruction, we might have missed convergent gene-loss evolution because of divergent gene gains and weak gene-content diversification in non-converging clades.
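One common way to ask whether two sets of lost OGs overlap more than expected by chance is a one-sided hypergeometric test on a 2 × 2 contingency table. The sketch below shows that calculation; it is an assumption that a test of this kind matches the one used for Fig. 2d, and the function name, the choice of background set, and the toy numbers are illustrative only.

```python
from math import comb

def overlap_p_value(n_total, n_lost_a, n_lost_f, n_both):
    """One-sided hypergeometric P(overlap >= n_both).

    n_total  : size of the background OG set (assumed here to be the
               OGs that could have been lost in either lineage)
    n_lost_a : OGs lost in the first lineage
    n_lost_f : OGs lost in the second lineage
    n_both   : OGs lost in both lineages
    """
    max_overlap = min(n_lost_a, n_lost_f)
    denominator = comb(n_total, n_lost_f)
    tail = sum(
        comb(n_lost_a, k) * comb(n_total - n_lost_a, n_lost_f - k)
        for k in range(n_both, max_overlap + 1)
    )
    return tail / denominator

# Toy numbers, not taken from the study.
print(overlap_p_value(n_total=1000, n_lost_a=200, n_lost_f=150, n_both=60))
```

Because the calculation uses exact integer binomial coefficients, it stays accurate for very small P values, although the choice of background set strongly affects the result.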
To reveal the order of gene loss in the two evolutionary paths toward FLAB, we next estimated the branch where the gene loss of each OG occurred (Fig. 2e, f). Among the branches in the paths toward Apilactobacillus and Fructobacillus (A0-A5 and F0-F7, respectively), we determined the gene-loss branch that was best supported by repeated ancestral reconstructions. We found that the supported branches were overall similar among the 500 reconstruction results and were consistent with the phylogenetic distribution of OGs in the extant species (Fig. 2g, h). Notably, the number of lost genes in Apilactobacillus evolution was especially large in branch A3, where the clade including Apilactobacillus and Fructilactobacillus diverged from Lentilactobacillus. Consistent with our results, Fructilactobacillus spp. are closely linked to specific niches, including flowers49, beer50, sourdoughs51,52, and insects34, and are known to have limited carbohydrate metabolic properties and small genomes33,49,50,53. These results suggest a phase of massive gene losses associated with the transition to their specific niches (e.g., insects and flowers) before the derivation of the clade including Fructilactobacillus and Apilactobacillus (Supplementary Fig. 3). However, we did not observe such a peak in gene loss during Fructobacillus evolution.
In this way, we reconstructed the gene content of the evolutionary intermediates and the orders of gene gains/losses along the paths toward the two FLAB clades to analyze whether the stream or funnel model fits the convergent evolutionary processes.
Gene loss events toward FLAB clades supported the evolutionary stream model
To statistically assess whether the order of gene loss was shared between the paths leading to the FLAB clades, we evaluated the similarity between the gene loss orders of the two FLAB lineages. Here, the loss-order similarity was defined as the ratio of OG pairs lost in the same order in the two lineages among all pairs of the 137 commonly lost OGs (Supplementary Data 3). We then calculated the null distribution of the loss-order similarity by randomly shuffling the gene loss orders. Finally, we compared the observed similarity with the null distribution to test whether the observed score was significantly higher than expected by chance (Fig. 3a).
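The test described above can be sketched as a simple permutation test in Python. The sketch below is illustrative rather than a reproduction of the study's procedure: the function names, the representation of loss timing as a branch index per OG, the number of permutations, and in particular the handling of ties (a pair counts as "same order" only when its relative order, including a tie, matches in both lineages) are assumptions made for this example.

```python
import random
from itertools import combinations

def loss_order_similarity(order_a, order_f):
    """Fraction of OG pairs whose relative loss order agrees in both lineages.

    order_a, order_f: dict mapping OG id -> branch index (0 = earliest).
    Assumes at least two OGs are shared between the two dicts.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    ogs = sorted(set(order_a) & set(order_f))
    pairs = list(combinations(ogs, 2))
    same = sum(
        sign(order_a[x] - order_a[y]) == sign(order_f[x] - order_f[y])
        for x, y in pairs
    )
    return same / len(pairs)

def permutation_p_value(order_a, order_f, n_perm=1000, seed=0):
    """Compare the observed similarity with a shuffled-order null."""
    rng = random.Random(seed)
    observed = loss_order_similarity(order_a, order_f)
    ogs = sorted(order_f)
    values = [order_f[g] for g in ogs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign loss timings at random among OGs
        if loss_order_similarity(order_a, dict(zip(ogs, values))) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy example with six hypothetical OGs lost in the same order.
toy_a = {"og1": 0, "og2": 1, "og3": 2, "og4": 3, "og5": 4, "og6": 5}
toy_f = {"og1": 0, "og2": 2, "og3": 3, "og4": 5, "og5": 6, "og6": 7}
print(permutation_p_value(toy_a, toy_f))
```

Only relative order is compared, so the two lineages may have different numbers of branches, as is the case for A0-A5 and F0-F7.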
Fig. 3. Significantly similar yet partially divergent gene loss orders in the evolutionary paths of two FLAB clades.
a The schematic diagram of a statistical test method to detect the significance of gene loss order similarity between the two FLAB lineages. b The result of the statistical test depicted in (a). The green vertical line segment represents the observed score of gene loss order similarity calculated for 137 OGs commonly and independently lost by FLAB, while the grey histogram illustrates the null distribution of the score. The loss order similarity was defined as the ratio of OG pairs lost in the same order between LCAfa-to-LCAa and LCAfa-to-LCAf lineages. c Comparison of the relative loss order of COG functional categories in the two FLAB lineages. Each dot represents the order of average gene loss timings of each functional category in each lineage. Red and grey dots indicate functions that showed markedly different or similar relative timing of gene losses, respectively. d COG-functional-category-wise counts of OGs lost at each branch in paths toward the two FLAB lineages. The dot size indicates the number of lost OGs. Only the OGs commonly lost in the two lineages were included in the counts. The COG functional categories were sorted by the average timing of gene losses in each lineage. Red and grey edges connect the same functions, corresponding to functions represented as red and grey dots in (c).
The results showed that the order of gene loss was significantly similar, supporting the evolutionary stream model for gene content evolution in FLAB (Fig. 3b). These trends were confirmed by visualizing the overlap ratio (Jaccard index) of the OGs lost in each branch of the two evolutionary paths (Supplementary Fig. 2a). To interpret the evolutionary paths shared between the lineages leading to the two FLAB clades, we mapped the 137 commonly lost OGs onto functional categories defined by the COG database54. By sorting the functional categories according to the average timing of gene loss, we found that most functional categories (14 of 18) were lost in a similar order (Fig. 3c, d). Functions lost early in both lineages included secondary metabolism and defense mechanisms, which tend to be encoded in accessory genomes55–57. Genes of unknown function (“Function unknown” and “General function prediction only”) also tended to be lost early. Because genes in accessory genomes and genes of unknown function are often non-essential, these results suggest that non-essential genes were lost early in both convergent evolutionary paths.
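The branch-by-branch overlap mentioned above can be illustrated with a short Python sketch that computes the Jaccard index between the OG sets lost on every pair of branches. The function names, the input format (a dict from branch label to a set of OG ids), and the toy OG ids are assumptions for illustration.

```python
def jaccard(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def branch_jaccard_matrix(lost_a, lost_f):
    """Jaccard index for every pair of branches from the two paths.

    lost_a, lost_f: dict mapping branch label -> set of OG ids lost there.
    """
    return {
        (branch_a, branch_f): jaccard(ogs_a, ogs_f)
        for branch_a, ogs_a in lost_a.items()
        for branch_f, ogs_f in lost_f.items()
    }

# Toy example with hypothetical OG ids.
toy_lost_a = {"A0": {"og1", "og2"}, "A1": {"og3"}}
toy_lost_f = {"F0": {"og2", "og4"}, "F1": {"og3"}}
for pair, value in sorted(branch_jaccard_matrix(toy_lost_a, toy_lost_f).items()):
    print(pair, round(value, 3))
```

If loss orders are shared, high values are expected to concentrate near the diagonal when branches are arranged from earliest to latest in both paths.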
In contrast, four of the 18 functional categories did not show similar gene loss orders. These four categories contained 44 OGs, 37 of which were related to carbohydrate transport and metabolism or amino acid transport and metabolism. Notably, the order of loss of these two functions was reversed between the two lineages. These trends were confirmed by mapping OGs onto pathways defined in the KEGG database48 (Supplementary Fig. 2b). This result indicates that the gene-loss evolution of these two metabolic functions followed the stochastic evolutionary funnel model, in which different orders of gene losses led to the same destination, that is, the loss of both functions.
The phylogeny of 344 Lactobacillaceae species was extracted from the maximum-likelihood genome phylogeny provided in GTDB58 and further confirmed by re-inferring a phylogeny from the GTDB marker gene sequences of Lactobacillaceae species (Supplementary Fig. 4). We noted that the phylogenetic location of the Pediococcus clade was different from that reported in a previous study, likely because of the selection of the marker gene sets59 (Supplementary Fig. 5a). While the ultrafast bootstrap values of the relevant branches in our tree were 96%, the tree topology may have ambiguities. When we changed the position of the Pediococcus clade (Supplementary Fig. 5b), the significantly large overlap of the commonly lost gene sets was robustly observed (P value = 6.8 × 10^−32), and the tendency toward gene-loss order similarity was still observed, but without statistical significance, after removing or regrafting the Pediococcus clade (Supplementary Fig. 5c–f).
In summary, convergent gene losses toward FLAB were mainly consistent with the evolutionary stream model, except for genes involved in the metabolism of carbohydrates and amino acids.
Non-FLAB Lactobacillaceae clades in fructose-rich environments convergently lost adhE
Next, we focused on the loss of the adhE gene as a key convergent gene loss event toward the emergence of the two FLAB clades. To comprehensively detect and compare adhE loss processes that occurred during convergent evolution, we first examined whether other Lactobacillaceae clades also lost adhE under selective pressures similar to those of the two FLAB clades. Given a previous study’s report that certain Lactobacillaceae species harbor partial adhE genes33, we employed a sequence similarity-based search to sensitively detect these partial genes. This was achieved by querying complete adhE genes, detected from all bacterial clades through profile hidden Markov model (pHMM) searches (Supplementary Fig. 6a). We searched for adhE-like genes by pHMM searches against all high-quality genomes retrieved from GTDB with low contamination and high completeness (Supplementary Fig. 6b, c). After excluding markedly short or long sequences, we detected 4833 adhE-like genes from diverse bacterial phyla (Supplementary Fig. 6d). We then used this comprehensive adhE-like gene dataset as a query for sequence similarity searches by MMSeqs60. Here, we set the thresholds of both sequence identity and coverage to 45%. The thresholds were chosen for the sensitive detection of partial adhE genes. As a result, we identified 399 adhE-like genes in 369 non-FLAB Lactobacillaceae species (Supplementary Fig. 6e, Supplementary Data 1). Among these genes, only 287 (71.9%) were as long as typical adhE genes (800–1000 aa), and 77 (19.3%) were substantially shorter (350–600 aa) (discussed later).
Our extensive searches unveiled that multiple species in at least eight non-FLAB genera of Lactobacillaceae lost the adhE gene or had substantially shorter adhE-like genes. This suggests that adhE was repeatedly lost in non-FLAB clades, i.e., clades other than Apilactobacillus and Fructobacillus (Fig. 4a; Supplementary Data 2). Most non-FLAB clades lacking adhE were homofermentative species in which adhE was not essential. Notably, the homofermentative species missing adhE also tend to lack pyruvate formate lyases, which are known to convert pyruvate to acetyl-CoA61 (Supplementary Fig. 6f). On the other hand, pyruvate formate lyase is not conserved in heterofermentative species, as reported previously59. These findings suggest that the acetyl-CoA-to-ethanol conversion by AdhE in homofermentative species can be associated with the activity of pyruvate formate lyase. Moreover, the occurrence of adhE loss in non-FLAB species was suggested to be non-random because a significantly high proportion of those non-FLAB species (30 of 87) were isolated from bee-associated environments, which are major isolation sources of Fructobacillus and Apilactobacillus (Fig. 4a; Chi-square test, p = 1.14 × 10^−20, 2.30 × 10^−20, and 1.83 × 10^−20 for all, heterofermentative, and homofermentative species, respectively). Three of the four bee-associated non-FLAB clades included at least one basal clade possessing adhE, which suggests that the adhE genes were lost after inhabiting bee-associated environments (Fig. 4b). In addition to the isolation sites, we analyzed the habitats of each Lactobacillaceae species based on published datasets of shotgun metagenome sequencing. Using a previously developed pipeline62, we showed the enrichment of non-FLAB Lactobacillaceae species missing complete adhE in pollen (Fig. 4b; Phylogenetic ANOVA, p = 0.002), which is also known as a fructose-rich environment and a habitat for FLAB.
Fig. 4. Repeated loss of adhE in fructose-associated species and the tendency in the domain loss order.
a adhE is repeatedly lost in bee-associated clades. The left bar graph indicates the number of species with and without a complete adhE sequence by isolation site (bee-associated or others). The right panel shows the phylogenetic distributions of hetero/homofermentative species and complete adhE in a genomic phylogeny of 344 Lactobacillaceae species with a high-quality representative genome. The copy number of complete adhE is indicated by the color intensity of the heatmap. b Phylogenetic distribution of species in fructose-rich environments. Habitat preference scores for the pollen environment and species isolated from bee-associated environments are represented by a bar graph and a color strip, respectively. c The phylogenetic distribution of the number of OGs that are present in each species’ genome and commonly lost in the evolutionary paths toward the two FLAB clades (LCAfa-to-LCAf and LCAfa-to-LCAa in Fig. 2b). d The phylogenetic distribution of partial adhE-derived genes, which were classified into ALDH fragment, ADH fragment, and middle fragment. e Gene phylogenies of N-terminal and C-terminal halves of complete and partial adhE in Lactobacillaceae and the amino acid lengths of their whole sequences. f State transition rates among the repertoires of complete and partial adhE across the whole Lactobacillaceae phylogeny. g Number of extant Lactobacillaceae species with each combination of complete and partial adhE. The X-axis labels represented as colored circles correspond to the labels in (f).
We also found that the genomes of non-FLAB species without adhE convergently lost a part of the 137 OGs that were commonly lost from the two FLAB clades (Phylogenetic ANOVA, p = 0.006 and 0.0002 for homo- and heterofermentative species, respectively) (Fig. 4c). This shared gene loss also supports the notion that non-FLAB clades without adhE were subjected to selective pressures similar to those acting on FLAB clades. A clear example of such a non-FLAB species is the homofermentative Holzapfelia floricola, which was isolated from flowers63 and has a gene content similar to that of FLAB species43.
Collectively, our results showed that multiple non-FLAB Lactobacillaceae clades, whose habitats are similar to those of FLAB, convergently lost adhE and other genes, likely under shared pressures.
Repeated domain-wise loss of adhE also followed the evolutionary stream model
As previously mentioned, AdhE is a bifunctional enzyme with ADH and ALDH domains, and at least ten species of Apilactobacillus have only a partial adhE gene with the ALDH domain (Supplementary Data 2). This implies that the adhE gene was lost in a domain-wise manner in Apilactobacillus, where the ADH domain was lost first. Although we could not find partial adhE genes in Fructobacillus (Supplementary Fig. 6a; see section “Methods”), we found that many non-FLAB Lactobacillaceae species contained a partial (350–600 aa) adhE gene as described above (Supplementary Fig. 6e). Therefore, we examined whether domain-wise losses occurred in the non-FLAB Lactobacillaceae clades that lost adhE and whether they followed the same domain-loss order (i.e., the evolutionary stream model).
First, we carefully discriminated (partial) adhE genes from their paralogs because our sensitive searches of adhE-like genes could identify non-adhE genes only with ALDH or ADH domain. We constructed phylogenetic trees of the N- and C-terminal halves of all 399 adhE-like genes in the non-FLAB Lactobacillaceae species. Using the adhE gene of E. coli as an outgroup, we selected 372 genes, including 85 partial genes distributed in multiple clades, as complete and partial adhE (Fig. 4d, e and Supplementary Fig. 7). A sequence similarity network of 399 adhE-like genes also indicated that the 372 genes belonged to the same gene family cluster as adhE (Supplementary Fig. 6g). In both gene phylogenies and similarity networks, 85 partial genes were separated into different clades or clusters, indicating that domain-wise losses repeatedly occurred in Lactobacillaceae (Fig. 4e, Supplementary Fig. 6h).
We then classified the 85 partial adhE genes into three classes, “ALDH-only,” “ADH-only,” and “incomplete fragment,” by analysis of the multiple sequence alignment (an incomplete fragment typically contains one domain and a part of the other domain). Interestingly, as many as 70 of the 85 partial adhE genes were ALDH-only fragments, while only five and ten were ADH-only and incomplete fragments, respectively. Thus, in both Apilactobacillus and non-FLAB Lactobacillaceae clades, adhE was lost in a domain-wise manner, and an ALDH-only fragment was the major evolutionary intermediate (Fig. 4d and Supplementary Fig. 7). For example, Apilactobacillus apinorum and Lacticaseibacillus thailandensis are completely missing adhE, while their sister groups possess only one ALDH fragment, suggesting that the ALDH fragment was the intermediate state of adhE loss in their lineages. We also found that ALDH-only fragments were conserved across species even after the loss of the ADH domain, suggesting that the ALDH-only fragments alone contributed to fitness (Fig. 4d). In contrast, the ADH-only fragments were not conserved across species. Three species (Companilactobacillus zhachilii, Lacticaseibacillus saniviri, Lentilactobacillus kefiri) were annotated to have both ALDH-only and ADH-only fragments in tandem, possibly because of sequencing errors, as a complete adhE gene was found in all other strains of the three species (Supplementary Fig. 8). Therefore, we assumed that these three species possessed a complete adhE gene throughout the following analysis.
To quantitatively compare the frequencies of the adhE domain-wise losses in different orders, we estimated the transition rate parameters among the five states (complete, ALDH-only, ADH-only, incomplete, and no adhE) and two additional states because some species had another adhE gene in addition to an ALDH-only domain or a complete adhE gene (Fig. 4f). In the state transition model, we allowed transitions in which both ALDH and ADH domains are lost simultaneously because those transitions are possible by single mutations (e.g., nonsense mutations at upstream regions). We found that the transition rate from complete adhE to an ALDH-only fragment was 6.7 times faster than that to an ADH-only fragment. The results also revealed a substantially high rate of gaining an ALDH-only fragment by species with complete adhE, whereas none of the species with complete adhE gained the ADH fragment (Fig. 4f, g). We also found that the rate of a transition from “no AdhE” to “one complete AdhE” was non-zero, suggesting adhE genes could be horizontally transferred.
In summary, adhE loss occurred repeatedly in a domain-wise manner in Lactobacillaceae, beginning with the loss of the ADH domain. This finding suggests that the decay of the adhE domain architecture follows the evolutionary stream model.
Amino-acid substitutions suggest the benefit of a specific intermediate step of adhE evolution
To reveal the molecular basis of how an ALDH-only fragment of the adhE gene can be beneficial and lead to constrained evolutionary paths, we investigated the evolution of amino acid sequences in ALDH domains. AdhE is known to function by forming a filament-like self-assembled complex called spirosome, where ALDH and ADH domains of different AdhE molecules interact64, and substrate channeling of an intermediate product, acetaldehyde, between them enhances their enzymatic activities65,66.
We constructed a molecular phylogenetic tree of the ALDH domain sequences encompassing the complete adhE genes and ALDH-only fragments. Then, we reconstructed the ancestral amino acid sequences for every internal node of the tree and extracted mutations specific to the ALDH-only fragment clades (Fig. 5a). Notably, some mutations were repeatedly observed, specifically in the ALDH-only fragment clades (Supplementary Fig. 9a). We found 15 sites that were the most variable among the ALDH-only fragments, but were conserved in the complete adhE genes (Fig. 5b, c). In particular, sites 105 and 451 accumulated specific substitutions in ALDH-only fragments (Fig. 5d). We confirmed that the estimation of ancestral sequences was supported by overall high posterior probabilities (Supplementary Fig. 9b).
Fig. 5. Mutation history suggests fragmental AdhEs have lost channeling ability but retain enzymatic activity.
a A schematic diagram to detect mutations specific to partial adhE missing ADH domains. b Estimated number of amino acid substitutions that occurred on the gene tree of adhE at every sequence position in the N-terminal half of AdhE. The grey and red parts of every bar indicate the number of substitutions of complete AdhEs and ALDH fragments, respectively. c Enrichment of substitutions in ALDH fragments at every sequence position in the N-terminal half of AdhE. Each dot indicates the ratio of substitutions that occurred in ALDH fragments at a site of AdhE. Only sites mutated ≥5 times are shown here. Red dots indicate the top 15 sites that tended to be substituted specifically in ALDH fragments. d Amino acid composition of the 15 sites in complete AdhEs and ALDH fragments. Amino acid characters are indicated for sites 105 and 451, which are the focus of (f). e A previously reported complex structure of the AdhE dimer (PDB: 6ahc)65. The two AdhE molecules are colored white and yellow. The locations of the 15 sites are colored red in each molecule. f Zoom-in diagrams of the AdhE dimer structure focusing on two of the 15 sites in the AdhE molecule colored white. g Enrichment of substitutions in ALDH fragments for AdhE-AdhE interaction sites, NAD-binding sites, and others. Each dot indicates the ratio of mutations that occurred in ALDH fragments at each sequence position. Box plots indicate the distributions of the data points. Asterisks indicate statistical significance by Mann–Whitney U-test (p < 0.05). h Number of substitutions that occurred in complete AdhEs and ALDH fragments for AdhE-AdhE interaction sites, NAD-binding sites, and others. Each dot indicates the number of substitutions that occurred in ALDH fragments at each sequence position. Box plots indicate the distributions of the data points. Asterisks and “n.s.” indicate significant and non-significant differences by Mann–Whitney U-test (p < 0.05), respectively.
To infer the characteristics of the sites with fragment-specific mutations, we mapped the 15 sites that were variable in the ALDH-only fragments onto a previously reported AdhE complex structure (PDB: 6ahc)65 and found that these sites were enriched at the interface between the ALDH and ADH domains in the spirosome (Fig. 5e). In particular, sites 105 and 451 formed ionic bonds with another AdhE molecule (Fig. 5f). We statistically confirmed that the ratios of mutations in the ALDH-only fragments were significantly larger at the ALDH-ADH interaction interfaces than at other sites (Fig. 5g). These results suggest that ALDH domains translated from ALDH-only fragments do not form spirosomes and do not require substrate channeling.
In contrast, we found that the NAD+-binding sites in the ALDH-only fragments were strongly conserved, similar to those of the complete adhE genes (Fig. 5h and Supplementary Fig. 9c). These results suggest that the ALDH-only fragments retained the enzymatic activity of ALDH even after the ADH domains were lost. Notably, some sites in the ALDH-only fragments showed markedly better conservation than complete adhE, suggesting ALDH-only fragments convergently acquired point mutations (Supplementary Fig. 9d). This raised a possibility that ALDH-only fragments gained specific functions after domain loss, as reported for other protein families67,68.
Discussion
In this study, we examined multiple LAB clades that convergently adapted to a fructose-rich environment to investigate the similarity of their long-term evolutionary paths at multiple levels. At the gene-content evolution level, we raised the possibility that the loss of as many as 137 genes occurred in significantly similar orders, although the result depends on the reference phylogeny topology. At the domain architecture evolution level, the repeated losses of adhE followed the same order of domain-wise losses, with the ADH domain lost first. At the amino acid sequence level, we also observed a convergent accumulation of point mutations at the substrate-channeling interfaces of AdhE. These results, which suggest evolutionary stream models at multiple scales, illuminate the non-random nature and predictability of the intermediate processes of evolution.
The similarity in the gene loss order based on the GTDB phylogeny was remarkable. While the evolution of endosymbiotic organisms and organelles follows similar evolutionary paths, FLAB are free-living and have experienced habitat changes23,31,32. As a recent study showed the general predictability of bacterial species that will gain/lose a gene in the future based on current gene content information69, there can be non-random or predictable patterns in the gene gain/loss order of diverse microbes. In the present study, both FLAB lineages independently lost non-essential genes before inhabiting fructose-rich environments, where specific genes were previously suggested to be commonly lost (e.g., phosphotransferase system33). Thus, the mechanism of the suggested evolutionary stream may be that gene essentiality determines which genes are lost first, and that similar selective pressures in specific niches (e.g., insects and flowers) then cause shared gene losses. Notably, gene loss patterns in carbohydrate transport and metabolism and in amino acid transport and metabolism did not follow the same pattern in the two FLAB clades (Fig. 3d). In other words, the convergent gene losses of these two functional categories followed an evolutionary funnel model. While the ancestors of Apilactobacillus markedly lost carbohydrate metabolism genes during the homo-to-heterofermentation transition (first branch: A1), the ancestors of Fructobacillus lost these genes mainly in the later stages (F6). Fructobacillus mainly lost genes for amino acid metabolism at or soon after the homo-to-heterofermentation transition (F0–3). Although glycolysis would have become unnecessary after the transition to heterofermentation, our results suggest that genes for carbohydrate metabolism can be stochastically retained after the homo-to-heterofermentation transition. They may have been retained because these two pathways are still advantageous in changing environments, or at least not disadvantageous.
It should be noted that the phylogenetic tree topology of Lactobacillaceae needs to be examined with the growing genome datasets in the future.
In addition to gene content evolution, we identified common evolutionary paths of the AdhE domain architecture in fructose-rich environments, where the ADH domain was first lost. This clear evolutionary trend is especially notable given that domain architecture convergence was reported to be rare in general70. It should be noted that the domain-loss frequency of proteins is generally high at both the N- and C-termini71, and the bias of the domain-loss order is not likely attributed to their relative positions.
Our amino acid sequence analysis suggests that the loss of adhE genes tends to proceed through an ALDH-only fragment as an evolutionary intermediate. Our hypothesis to explain the order of domain loss is that ALDH fragments are less toxic than ADH fragments, and ALDH fragments can be beneficial for detoxifying aldehydes. Substrate channeling of AdhE has been reported to be critical for the forward reaction (i.e., conversion from acetyl-CoA to acetaldehyde) catalyzed by the ALDH domain65. According to the previous study, the acetaldehyde-producing activity of the ALDH domain was decreased by disrupting spirosome formation, while ethanol-to-acetaldehyde conversion activity of the ADH domain was not affected. Thus, we reasoned that ALDH fragments would have less activity to produce toxic acetaldehyde than ADH fragments.
We further hypothesized that the conserved ALDH-only fragments may contribute to fitness by removing the toxic aldehydes generated from other pathways (e.g., acetaldehyde generated from pyruvate via acetyl-CoA72 and formate generated by pyruvate-formate lyases61), because active sites were still significantly conserved in ALDH-only fragments (Fig. 5h). Indeed, the ALDH-only fragment of Apilactobacillus kunkeei has been reported to catalyze a reverse reaction (the enzymatic activity for the forward reaction has yet to be investigated)43. Notably, the detoxification activity can be acquired before losing ADH domains. Moreover, we found species with both ALDH-only fragments and complete adhE, implying that ALDH fragments might contribute to fitness even in the presence of complete adhE. A previous study has suggested that the expression of complete AdhE can be toxic to bacteria73. Therefore, ALDH fragments may catalyze a reverse reaction to eliminate the acetaldehyde leaked from the AdhE spirosome and/or produced by other pathways as an intermediate product. Consistently, a eukaryotic clade, Polytomella, has two adhE genes, one of which has an ADH domain with very low activity, and an ALDH domain that preferentially catalyzes the acetaldehyde-to-acetyl CoA reaction74. Here, we focused on the evolution of adhE because loss of the gene characterizes FLAB. Nevertheless, the analysis methodologies can be applied generally to other multidomain proteins, which may further reveal the evolutionary patterns of domain architecture decay.
Collectively, our findings on stream-model convergent evolution at multiple levels extend our knowledge of the constrained evolutionary paths in the long-term evolution of free-living microbes. So far, convergent evolution at different levels has been studied using separate model clades18–20,23–26,31,32. Thus, analyzing the evolution of lactic acid bacteria will also lead us to find the interplay between evolutionary events at different levels of convergence, such as domain architecture evolution contingent on the losses of other specific genes. This study also established lactic acid bacteria in a fructose-rich environment as a model system to scrutinize convergent evolution at multiple levels and their interplays.
Methods
Dataset
We retrieved all the protein sequences of every representative genome and a reference phylogeny from GTDB r202 on April 28, 2021. The datasets covered 45,555 bacterial species of all the phyla defined in GTDB. A tree of Lactobacillaceae species (369 species) was extracted from the whole reference phylogeny using the TreeNode.prune function in ete3 toolkit 3.1.2 (ref. 75), preserving the branch lengths (with an option “preserve_branch_length = True”). We also extracted a tree of Lactobacillaceae for 344 species with high-quality representative genomes (defined as >95% completeness and <5% contamination throughout this study). We used representative genomes and phylogenies only of Lactobacillaceae throughout this study, except for searching complete adhE genes from all bacterial phyla (explained later). 16S rRNA gene sequences of 336 of the 344 species were downloaded from GTDB r202.
Reconstruction of ancestral gene content
To reveal the order of gene loss in Fructobacillus and Apilactobacillus lineages, we reconstructed gene gain/loss scenarios (i.e., gene presence/absence at each internal node in the reference phylogeny) for every ortholog group. We first conducted Hidden Markov Model-based ortholog annotation for all the 344 high-quality representative genomes of Lactobacillaceae by KofamScan v1.3.0 (ref. 47). From the annotation results, we prepared an ortholog table representing the presence/absence (represented as one and zero, respectively) of each ortholog group in each of the 344 species. Then, we estimated gene gain/loss rate parameters for every ortholog group under the two-state Mk model45 using Markov chain Monte Carlo (MCMC) sampling implemented in Diversitree v0.9.16 (ref. 44). Here, we used an ultrametric Lactobacillaceae tree converted from the GTDB reference phylogeny by the chronos function in the ape 5.6.2 (ref. 76) package. For MCMC sampling, we initialized model parameters by maximum likelihood estimation and set the burn-in and sampling interval to 500 and 10, respectively (we confirmed negligible autocorrelation of estimated parameters under this setting (Supplementary Fig. 4)). This yielded 500 MCMC samples with gene gain/loss rate parameters for each ortholog group. Finally, we estimated the gene gain/loss scenarios for each of the 500 MCMC samples using the maximum-likelihood joint reconstruction in Diversitree v0.9.16.
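As an illustration of the two-state Mk model used above, the following minimal Python sketch computes the probability of gene gain or loss along a single branch from a gain rate and a loss rate, using the standard closed-form solution for a two-state continuous-time Markov chain. It is not the Diversitree implementation; the function name and the example rates are hypothetical.

```python
import math


def two_state_mk_probs(gain_rate, loss_rate, t):
    """Transition probabilities of a two-state Mk model over branch length t.

    States: 0 = ortholog absent, 1 = ortholog present.
    Returns [[P(0->0), P(0->1)], [P(1->0), P(1->1)]].
    """
    total = gain_rate + loss_rate
    if total == 0:
        # No change is possible when both rates are zero.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * t)
    p_gain = (gain_rate / total) * (1.0 - decay)  # absent -> present
    p_loss = (loss_rate / total) * (1.0 - decay)  # present -> absent
    return [[1.0 - p_gain, p_gain], [p_loss, 1.0 - p_loss]]


# Hypothetical rates: loss is four times faster than gain.
for row in two_state_mk_probs(gain_rate=0.2, loss_rate=0.8, t=1.0):
    print(row)
```

On long branches the probabilities approach the stationary frequencies, gain_rate / (gain_rate + loss_rate) for presence, regardless of the starting state.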
Test of gene loss order similarity
To evaluate the similarity between the evolutionary processes of the two FLAB clades (Fructobacillus and Apilactobacillus), we tested whether the order of gene loss was significantly similar. We first listed genes lost during evolution from the common ancestor of the two FLAB clades (LCAfa) to that of Apilactobacillus (LCAa) or that of Fructobacillus (LCAf) based on the best-supported gene gain/loss scenario for every ortholog. We then estimated the gene-loss branch of each ortholog (i.e., the branch where the ortholog was lost) by selecting the most frequently supported scenario. To determine the most frequently supported scenario for every ortholog, we only considered scenarios in which gene loss occurred once during the evolution from LCAfa to LCAa/LCAf (Fig. 4a). Next, we calculated the similarity index S, defined as the ratio of ortholog pairs lost in the same order between the two lineages to all pairs of orthologs lost in both lineages. To obtain the null distribution of S, we calculated the same index (S’) after randomly shuffling the correspondence between the orthologs and loss branches. We repeated the shuffling and the calculation of S’ 10,000 times and calculated the p value of S as the fraction of shuffles in which S’ ≥ S.
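The similarity index and its permutation test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: loss branches are encoded as integer ranks along each lineage, a pair counts as lost in the same order when the sign of the rank difference agrees in both lineages (so a pair tied in both lineages counts as concordant), and only the Fructobacillus assignment is shuffled. The function names and the KO identifiers are hypothetical.

```python
import random
from itertools import combinations


def _sign(x):
    return (x > 0) - (x < 0)


def similarity_index(order_a, order_f):
    """Fraction of ortholog pairs lost in the same relative order in both lineages.

    order_a / order_f map ortholog id -> rank of its loss branch in each lineage.
    """
    shared = sorted(set(order_a) & set(order_f))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two orthologs lost in both lineages")
    same = sum(
        _sign(order_a[x] - order_a[y]) == _sign(order_f[x] - order_f[y])
        for x, y in pairs
    )
    return same / len(pairs)


def permutation_p_value(order_a, order_f, n_shuffles=10000, seed=0):
    """Return (S, p), where p is the fraction of shuffles with S' >= S."""
    rng = random.Random(seed)
    shared = sorted(set(order_a) & set(order_f))
    ranks_f = [order_f[k] for k in shared]
    observed = similarity_index(order_a, order_f)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(ranks_f)  # break the ortholog-to-branch correspondence
        if similarity_index(order_a, dict(zip(shared, ranks_f))) >= observed:
            hits += 1
    return observed, hits / n_shuffles


# Hypothetical toy input: four orthologs lost in the same order in both lineages.
apilactobacillus = {"K00001": 0, "K00002": 1, "K00003": 2, "K00004": 3}
fructobacillus = {"K00001": 1, "K00002": 2, "K00003": 4, "K00004": 6}
print(permutation_p_value(apilactobacillus, fructobacillus, n_shuffles=1000))
```

How ties are scored changes S when many orthologs are lost on the same branch, so that choice should be fixed before the observed and shuffled values are compared.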
Habitat preference analysis of Lactobacillaceae based on metagenome
To identify species inhabiting fructose-rich environments, we estimated the habitat preference for each of the 336 Lactobacillaceae species for which high-quality representative genomes and 16S rRNA sequences were available from GTDB (see Dataset). We queried the 16S rRNA sequences of the 336 species against ProkAtlas online62 on June 30, 2021. ProkAtlas conducts a BLAST search of queried 16S rRNA sequences in short-read metagenomic resources from various environments (e.g., soil, human gut, and pollen), then returns a habitat preference score for each environment. ProkAtlas parameters for nucleotide identity and sequence coverage thresholds were set to 97% and 150 bp, respectively.
Detection of adhE-like genes
adhE genes possessed by Lactobacillaceae were comprehensively detected, as shown in Supplementary Fig. 6a. We first searched for adhE (K04072 in the KEGG Orthology database48) in all the 25,877 high-quality representative genomes of all bacterial phyla by Hidden Markov Model-based ortholog annotation using KofamScan v1.3.0 (ref. 47). Based on the ortholog annotation results, we extracted all genes annotated as adhE and retained genes encoding proteins of 800–1000 aa in length, which yielded 4833 genes. We then conducted a sensitive homology search for every protein sequence of the 369 Lactobacillaceae species, treating the 4833 genes as a database, using MMseqs2 13.45111 (mmseqs easy-search -s 7.50)60. By extracting genes with both target coverage and identity greater than 45% for any of the 4833 genes, we retrieved 399 adhE-like genes from Lactobacillaceae.
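The hit-filtering step described above can be sketched as follows. This is a minimal illustration that assumes the search results have already been parsed into dictionaries; the field names (query, target, identity, target_cov), the fractional scale of the values, and the example identifiers are hypothetical and do not reflect a specific MMseqs2 output format.

```python
def select_adhe_like(hits, min_identity=0.45, min_target_cov=0.45):
    """Return query ids with at least one hit exceeding both thresholds.

    hits: iterable of dicts with keys 'query', 'target', 'identity', and
    'target_cov', where identity and coverage are fractions in [0, 1].
    """
    return {
        h["query"]
        for h in hits
        if h["identity"] > min_identity and h["target_cov"] > min_target_cov
    }


# Hypothetical parsed hits against the reference set of complete adhE genes.
example_hits = [
    {"query": "geneA", "target": "ref1", "identity": 0.62, "target_cov": 0.98},
    {"query": "geneB", "target": "ref1", "identity": 0.80, "target_cov": 0.30},
    {"query": "geneB", "target": "ref2", "identity": 0.47, "target_cov": 0.51},
    {"query": "geneC", "target": "ref3", "identity": 0.40, "target_cov": 0.99},
]
print(sorted(select_adhe_like(example_hits)))
```

Because a gene is kept when any single hit passes, geneB is retained through its second hit even though its first hit fails the coverage threshold.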
Classification of adhE-like genes into adhE-derived genes and other gene families
To remove genes belonging to different families from the 399 adhE-like genes, we constructed phylogenetic trees for both the N- and C-terminal regions of AdhE. We first conducted a multiple sequence alignment (MSA) for the 399 adhE-like genes in Lactobacillaceae and the adhE gene of Escherichia coli (here assumed to be an outgroup within the AdhE family) using MAFFT 7.310 (ref. 77). We then extracted 500 N-terminal and C-terminal columns from the resulting MSA and excluded sequences in which >50% of the extracted columns were gaps, which yielded MSAs of 381 and 314 sequences for the N-terminal and C-terminal half regions of adhE, respectively. After trimming gap-rich MSA columns by trimAl v1.3.rev15 (-gappyout)78, phylogenetic trees were constructed for both MSAs using IQ-TREE v2.0.3 (-m MFP -bb 1000 -nt 20)79. For the two phylogenetic trees, we first estimated the last common ancestor node of the adhE family by assuming that E. coli adhE was an outgroup of all Lactobacillaceae complete adhE genes, which are ~900 aa. Finally, we identified the clade of adhE-like genes falling outside this node and treated them as non-adhE genes, that is, genes belonging to other protein families. After excluding these sequences, we obtained 372 sequences of adhE and adhE-derived fragment genes possessed by Lactobacillaceae.
To verify the validity of the non-adhE gene identification, we constructed sequence similarity networks (SSNs) for 399 adhE-like genes. We conducted an all-versus-all alignment of the sequences by MMseqs2 (easy-search -s 7.50) and constructed two SSNs by linking all gene pairs with >45% and >70% sequence identity, respectively.
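A sequence similarity network of this kind can be summarized by its connected components. The sketch below is a minimal illustration with a union-find structure; the input format (a list of gene ids and a list of (gene, gene, identity) tuples with identity as a fraction) and all names are hypothetical, not the format produced by MMseqs2.

```python
def ssn_components(nodes, pairs, min_identity):
    """Connected components of a network linking pairs with identity > threshold.

    nodes: iterable of gene ids; pairs: iterable of (gene_a, gene_b, identity).
    Returns a list of sets, largest component first.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in pairs:
        if a != b and identity > min_identity:
            parent[find(a)] = find(b)

    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Hypothetical all-versus-all identities among four genes.
genes = ["a", "b", "c", "d"]
identities = [("a", "b", 0.80), ("b", "c", 0.50), ("c", "d", 0.30)]
print(ssn_components(genes, identities, 0.45))
print(ssn_components(genes, identities, 0.70))
```

Comparing the components at the two thresholds shows which groups separate only under the stricter cutoff.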
adhE detection in representative and non-representative genomes of Companilactobacillus zhachilii, Lacticaseibacillus saniviri, and Lentilactobacillus kefiri
Representative/non-representative genomes of the three species in GTDB were downloaded by ncbi-genome-download-0.3.1 from Refseq and Genbank, and we annotated genes in the genomes by prodigal 2.6.3 (ref. 80). We then searched for adhE-like genes in these genomes using the same method used to detect adhE-like genes in representative Lactobacillaceae genomes. Finally, we aligned the detected genes using MAFFT and visualized the multiple sequence alignments.
Rate parameter estimation for state transitions of AdhE repertoire
To analyze the evolutionary dynamics of the AdhE repertoire, we counted AdhEs per species, distinguishing complete AdhEs, ALDH fragments, ADH fragments, and intermediates (partially missing the ALDH or ADH domain) based on the sequence length and center position of every sequence in the MSA. Here, complete AdhEs, ALDH fragments, ADH fragments, and intermediates were defined as sequences with >800 aa lengths and center positions in the 400–900th columns of the MSA, those with center positions in the <400th columns, those with center positions in the >900th columns, and the other sequences, respectively.
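These classification rules can be expressed as a short Python sketch. It assumes, as choices not specified in the text, that the center position is the midpoint between the first and last non-gap alignment columns (1-based), that the 400 and 900 boundaries are inclusive for complete AdhEs, and that the rules are applied in the order listed. The function names and toy sequences are hypothetical.

```python
def center_column(aligned_seq, gap="-"):
    """Midpoint between the first and last non-gap columns (1-based)."""
    cols = [i for i, ch in enumerate(aligned_seq, start=1) if ch != gap]
    if not cols:
        raise ValueError("sequence contains only gaps")
    return (cols[0] + cols[-1]) / 2


def classify_adhe(aligned_seq, gap="-"):
    """Classify one aligned AdhE-derived sequence by length and center position."""
    length = sum(ch != gap for ch in aligned_seq)  # ungapped length in aa
    center = center_column(aligned_seq, gap)
    if length > 800 and 400 <= center <= 900:
        return "complete"
    if center < 400:
        return "ALDH fragment"
    if center > 900:
        return "ADH fragment"
    return "intermediate"


# Toy aligned sequences in a 1300-column alignment (hypothetical).
print(classify_adhe("A" * 900 + "-" * 400))              # spans both domains
print(classify_adhe("A" * 450 + "-" * 850))              # N-terminal part only
print(classify_adhe("-" * 850 + "A" * 450))              # C-terminal part only
print(classify_adhe("-" * 300 + "A" * 600 + "-" * 400))  # partial, centered
```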
To estimate the state transition rates among the seven observed states of the AdhE repertoire per species (color-filled circles in Fig. 4f), we conducted a phylogenetic comparative analysis using the Mk model (k = 7) implemented in the Diversitree package. All state transition parameters (42 parameters) were estimated using the maximum likelihood method (method = “nlm”), given as inputs the species tree of 369 Lactobacillaceae species and the state information of the AdhE repertoire for each species. Each parameter was initialized to 1/42.
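The parameter count follows from the structure of the Mk rate matrix: with k states there are k(k − 1) off-diagonal transition rates, which gives 7 × 6 = 42 for k = 7. The sketch below builds such a matrix with every rate set to the initial value 1/42; it only illustrates the bookkeeping and is not the Diversitree code. The function name is hypothetical.

```python
def initial_rate_matrix(k):
    """Build a k-state Mk rate matrix with all off-diagonal rates equal.

    Returns (Q, n_params), where n_params = k * (k - 1) free rates and
    each diagonal entry makes its row sum to zero.
    """
    n_params = k * (k - 1)
    rate = 1.0 / n_params
    q = [[rate if i != j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k):
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the row is summed
    return q, n_params


Q, n_params = initial_rate_matrix(7)
print(n_params)  # number of free transition rates
```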
Analysis of mutations accumulating specifically in adhE-derived fragment genes
To detect mutations accumulating specifically in adhE-derived fragment genes, we conducted ancestral sequence reconstruction of the N-terminal half regions of 381 adhE-like genes using IQ-TREE v2.0.3 (-m MFP -asr -te). The tree topology was fixed as described above. For each sequence position of every ancestral node, we adopted the amino acid with a >50% posterior probability. If no amino acid showed >50% posterior probability, the site at the node was treated as uncertain. For every sequence position in the MSA, we detected branches of the phylogenetic tree where one or more substitutions occurred based on whether different amino acids were adopted for the branch’s parental and child nodes. By counting the branches where substitutions occurred, we detected the top-15 sequence positions with substitutions specific to the clades of adhE-derived N-terminal fragments.
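The state-calling and branch-counting rules can be sketched for a single alignment column as follows. This is a minimal illustration, assuming the posterior probabilities have already been parsed into dictionaries and that branches with an uncertain state at either end are skipped (an assumption, since the text does not specify how uncertain sites are counted). All names and the toy tree are hypothetical.

```python
def call_state(posteriors, threshold=0.5):
    """Return the amino acid whose posterior exceeds the threshold, else None."""
    amino_acid, prob = max(posteriors.items(), key=lambda kv: kv[1])
    return amino_acid if prob > threshold else None


def count_substitution_branches(parent_of, states):
    """Count branches whose parental and child states differ at one site.

    parent_of: dict child node -> parent node (one entry per branch).
    states: dict node -> called amino acid, or None when uncertain.
    """
    count = 0
    for child, parent in parent_of.items():
        a, b = states.get(parent), states.get(child)
        if a is not None and b is not None and a != b:
            count += 1
    return count


# Hypothetical toy tree with one internal node below the root.
parent_of = {"n1": "root", "tipA": "n1", "tipB": "n1"}
states = {
    "root": call_state({"K": 0.9, "R": 0.1}),
    "n1": call_state({"K": 0.6, "E": 0.4}),
    "tipA": "E",
    "tipB": call_state({"K": 0.5, "E": 0.5}),  # uncertain, no state called
}
print(count_substitution_branches(parent_of, states))
```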
Structural analysis of AdhE
To interpret the functional implications of the mutations at the 15 sequence positions, we downloaded the structure of the AdhE complex (6ahc) from the Protein Data Bank, extracted a dimer structure (chains B and C), and mapped the positions onto it. Next, we tested the enrichment of these positions at the interaction interface of AdhEs or at the active site of the AdhE ALDH domain. We annotated the 88 residues of chain B that have any atom within 5 Å of chain C as residues for dimerization of AdhE. We also downloaded and analyzed another structure of AdhE bound to NAD+ (6tqh66) and defined the 41 residues of chain B that have any atom within 5 Å of the NAD+ molecule as residues for the active site of the ALDH domain. We then compared the proportion of substitutions in the ALDH fragment clades among residues for dimerization, the active site of the ALDH domain, and other residues.
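The distance criterion used to define interface and active-site residues can be sketched as follows. This is a minimal illustration that assumes atom coordinates have already been parsed from the structure file into (residue id, (x, y, z)) tuples and that the 5 Å cutoff is inclusive; the function name and toy coordinates are hypothetical, and no structure-parsing library is used.

```python
import math


def residues_within(atoms_a, atoms_b, cutoff=5.0):
    """Residues of set A with any atom within `cutoff` angstroms of any atom in B.

    atoms_a / atoms_b: lists of (residue_id, (x, y, z)) tuples.
    """
    close = set()
    for res_id, xyz in atoms_a:
        if res_id in close:
            continue  # residue already qualifies through another atom
        if any(math.dist(xyz, other) <= cutoff for _, other in atoms_b):
            close.add(res_id)
    return close


# Hypothetical coordinates: residue 105 lies near the partner, residue 300 does not.
chain_b = [(105, (0.0, 0.0, 0.0)), (105, (1.5, 0.0, 0.0)), (300, (30.0, 0.0, 0.0))]
partner = [(451, (3.0, 0.0, 0.0))]
print(sorted(residues_within(chain_b, partner)))
```

The same function covers both annotations in the text: the partner can be the atoms of chain C or the atoms of the bound NAD+ molecule.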
Statistics and reproducibility
Throughout the paper, we selected appropriate statistical tests for each format of analyzed data. We selected Mann–Whitney U-test or Phylogenetic ANOVA to test the significance of differences between data groups when the data are independent or dependent from a phylogeny, respectively. We chose Chi-squared test for testing significant differences between expected and observed values of contingency tables. Furthermore, we conducted permutation tests on the significance of gene loss order similarity. All analyses in this study are reproducible using public datasets or the source data of this study (Supplementary Data 4).
Acknowledgments
We thank all the members of the Iwasaki lab for valuable discussions and critical comments on the content of this paper, especially T. K. Suzuki and K. Miyake for their insightful feedback. Computations were partially performed on the SuperComputer System, Institute for Chemical Research, Kyoto University. This work was funded by JPMJCR19S2 from Japan Science and Technology Agency to W.I.; KAKENHI 19H05688 and 22H04925 from the Japan Society for the Promotion of Science (JSPS) to W.I.; and KAKENHI 22J20318 from the JSPS to N.K.
Author contributions
Conceptualization: N.K., S.M., A.E., and W.I. Methodology: N.K., S.M., A.E., and W.I. Investigation: N.K., S.M., and Y.T. Visualization: N.K. Supervision: W.I. Writing – original draft: N.K., A.E., and W.I. Writing – review and editing: N.K., Y.T., M.A., A.E., and W.I.
Peer review
Peer review information
Communications Biology thanks Ingo Ebersberger, Duhita Sant and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Sabina Leanti La Rosa and Dario Ummarino. A peer review file is available.
Data availability
The bacterial reference phylogeny and genome sequence data used in this study are publicly available from GTDB. We provide the list of genome accessions for Lactobacillaceae species for which we analyzed their genomes and habitats (Supplementary Data 2). In addition, the list of 137 orthologs commonly and independently lost in two FLAB lineages is provided (Supplementary Data 3). Amino acid sequences of adhE-like genes are provided as Supplementary Data 1. The source data behind figures in this paper are provided as Supplementary Data 4. These supplementary materials are available in a public repository, Zenodo (10.5281/zenodo.11378208)81. All data needed to evaluate the conclusions in the paper are present in the paper, the Supplementary Materials, and/or the public repositories described above.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Naoki Konno, Email: konno-naoki555@g.ecc.u-tokyo.ac.jp.
Wataru Iwasaki, Email: iwasaki@k.u-tokyo.ac.jp.
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-024-06580-0.
References
- 1.Kocher, T. D., Conroy, J. A., McKaye, K. R. & Stauffer, J. R. Similar morphologies of cichlid fish in Lakes Tanganyika and Malawi are due to convergence. Mol. Phylogenet. Evol.2, 158–165 (1993). 10.1006/mpev.1993.1016 [DOI] [PubMed] [Google Scholar]
- 2.Losos, J. B., Jackman, T. R., Larson, A., de Queiroz, K. & Rodríguez-Schettino, L. Contingency and determinism in replicated adaptive radiations of island lizards. Science279, 2115–2118 (1998). 10.1126/science.279.5359.2115 [DOI] [PubMed] [Google Scholar]
- 3.Blackledge, T. A. & Gillespie, R. G. Convergent evolution of behavior in an adaptive radiation of Hawaiian web-building spiders. Proc. Natl Acad. Sci. USA101, 16228–16233 (2004). 10.1073/pnas.0407395101 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Muschick, M., Indermaur, A. & Salzburger, W. Convergent evolution within an adaptive radiation of cichlid fishes. Curr. Biol.22, 2362–2368 (2012). 10.1016/j.cub.2012.10.048 [DOI] [PubMed] [Google Scholar]
- 5.Martin, C. H. & Wainwright, P. C. Multiple fitness peaks on the adaptive landscape drive adaptive radiation in the wild. Science339, 208–211 (2013). 10.1126/science.1227710 [DOI] [PubMed] [Google Scholar]
- 6.Blount, Z. D., Lenski, R. E. & Losos, J. B. Contingency and determinism in evolution: Replaying life’s tape. Science362, eaam5979 (2018). 10.1126/science.aam5979 [DOI] [PubMed] [Google Scholar]
- 7.Mahler, D. L., Ingram, T., Revell, L. J. & Losos, J. B. Exceptional convergence on the macroevolutionary landscape in island lizard radiations. Science341, 292–295 (2013). 10.1126/science.1232392 [DOI] [PubMed] [Google Scholar]
- 8.Figueirido, B. et al. Body-axis organization in tetrapods: a model-system to disentangle the developmental origins of convergent evolution in deep time. Biol. Lett.18, 20220047 (2022). 10.1098/rsbl.2022.0047 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Powell, R. Is convergence more than an analogy? Homoplasy and its implications for macroevolutionary predictability. Biol. Philos.22, 565–578 (2007). 10.1007/s10539-006-9057-3 [DOI] [Google Scholar]
- 10.Moen, D. S., Morlon, H. & Wiens, J. J. Testing Convergence Versus History: Convergence Dominates Phenotypic Evolution for over 150 Million Years in Frogs. Syst. Biol.65, 146–160 (2015). 10.1093/sysbio/syv073 [DOI] [PubMed] [Google Scholar]
- 11.Stayton, C. T. What does convergent evolution mean? The interpretation of convergence and its implications in the search for limits to evolution. Interface Focus5, 20150039 (2015). 10.1098/rsfs.2015.0039 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Stayton, C. T. The definition, recognition, and interpretation of convergent evolution, and two new measures for quantifying and assessing the significance of convergence. Evolution69, 2140–2153 (2015). 10.1111/evo.12729 [DOI] [PubMed] [Google Scholar]
- 13.Arendt, J. & Reznick, D. Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends Ecol. Evol.23, 26–32 (2008). 10.1016/j.tree.2007.09.011 [DOI] [PubMed] [Google Scholar]
- 14.Stern, D. L. The genetic causes of convergent evolution. Nat. Rev. Genet.14, 751–764 (2013). 10.1038/nrg3483 [DOI] [PubMed] [Google Scholar]
- 15.Storz, J. F. Causes of molecular convergence and parallelism in protein evolution. Nat. Rev. Genet.17, 239–250 (2016). 10.1038/nrg.2016.11 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Xu, S., Wang, J., Guo, Z., He, Z. & Shi, S. Genomic convergence in the adaptation to extreme environments. Plant Commun.1, 100117 (2020). 10.1016/j.xplc.2020.100117 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Gerstein, A. C., Chun, H.-J. E., Grant, A. & Otto, S. P. Genomic convergence toward diploidy in Saccharomyces cerevisiae. PLoS Genet2, e145 (2006). 10.1371/journal.pgen.0020145 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Greenbury, S. F., Barahona, M. & Johnston, I. G. HyperTraPS: inferring probabilistic patterns of trait acquisition in evolutionary and disease progression pathways. Cell Syst.10, 39–51.e10 (2020). 10.1016/j.cels.2019.10.009 [DOI] [PubMed] [Google Scholar]
- 19.Hosseini, S.-R., Diaz-Uriarte, R., Markowetz, F. & Beerenwinkel, N. Estimating the predictability of cancer evolution. Bioinformatics35, i389–i397 (2019). 10.1093/bioinformatics/btz332 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Caravagna, G. et al. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nat. Methods15, 707–714 (2018). 10.1038/s41592-018-0108-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Suzuki, T. K. On the origin of complex adaptive traits: Progress since the Darwin versus Mivart debate. J. Exp. Zool. B Mol. Dev. Evol.328, 304–320 (2017). 10.1002/jez.b.22740 [DOI] [PubMed] [Google Scholar]
- 22.Suzuki, T. K. Phenotypic systems biology for organisms: concepts, methods and case studies. Biophys. Physicobiol.19, e190011 (2022). 10.2142/biophysico.bppb-v19.0011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Johnston, I. G. & Williams, B. P. Evolutionary inference across eukaryotes identifies specific pressures favoring mitochondrial gene retention. Cell Syst.2, 101–111 (2016). 10.1016/j.cels.2016.01.013 [DOI] [PubMed] [Google Scholar]
- 24.Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science312, 111–114 (2006). 10.1126/science.1123539 [DOI] [PubMed] [Google Scholar]
- 25.Gong, L. I., Suchard, M. A. & Bloom, J. D. Stability-mediated epistasis constrains the evolution of an influenza protein. Elife2, e00631 (2013). 10.7554/eLife.00631 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Bloom, J. D., Gong, L. I. & Baltimore, D. Permissive secondary mutations enable the evolution of influenza oseltamivir resistance. Science328, 1272–1275 (2010). 10.1126/science.1187816 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Harvey, P. H. & Purvis, A. Comparative methods for explaining adaptations. Nature351, 619–624 (1991). 10.1038/351619a0 [DOI] [PubMed] [Google Scholar]
- 28.Pagel, M. Inferring the historical patterns of biological evolution. Nature401, 877–884 (1999). 10.1038/44766 [DOI] [PubMed] [Google Scholar]
- 29.Uyeda, J. C., Zenil-Ferguson, R. & Pennell, M. W. Rethinking phylogenetic comparative methods. Syst. Biol.67, 1091–1109 (2018). 10.1093/sysbio/syy031 [DOI] [PubMed] [Google Scholar]
- 30.Santos, B. F. & Perrard, A. Testing the Dutilleul syndrome: host use drives the convergent evolution of multiple traits in parasitic wasps. J. Evol. Biol.31, 1430–1439 (2018). 10.1111/jeb.13343 [DOI] [PubMed] [Google Scholar]
- 31.Giannakis, K. et al. Evolutionary inference across eukaryotes identifies universal features shaping organelle gene retention. Cell Syst.13, 874–884.e5 (2022). 10.1016/j.cels.2022.08.007 [DOI] [PubMed] [Google Scholar]
- 32.Yizhak, K., Tuller, T., Papp, B. & Ruppin, E. Metabolic modeling of endosymbiont genome reduction on a temporal scale. Mol. Syst. Biol.7, 479 (2011). 10.1038/msb.2011.11 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Maeno, S. et al. Unique niche-specific adaptation of fructophilic lactic acid bacteria and proposal of three Apilactobacillus species as novel members of the group. BMC Microbiol. 21, 41 (2021). 10.1186/s12866-021-02101-9
- 34. Zheng, J. et al. A taxonomic note on the genus Lactobacillus: description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int. J. Syst. Evol. Microbiol. 70, 2782–2858 (2020). 10.1099/ijsem.0.004107
- 35. Endo, A. et al. Fructophilic lactic acid bacteria, a unique group of fructose-fermenting microbes. Appl. Environ. Microbiol. 84, e01290-18 (2018).
- 36. Endo, A., Futagawa-Endo, Y. & Dicks, L. M. T. Isolation and characterization of fructophilic lactic acid bacteria from fructose-rich niches. Syst. Appl. Microbiol. 32, 593–600 (2009). 10.1016/j.syapm.2009.08.002
- 37. Filannino, P., Di Cagno, R., Addante, R., Pontonio, E. & Gobbetti, M. Metabolism of fructophilic lactic acid bacteria isolated from the Apis mellifera L. bee gut: phenolic acids as external electron acceptors. Appl. Environ. Microbiol. 82, 6899–6911 (2016). 10.1128/AEM.02194-16
- 38. Endo, A. & Salminen, S. Honeybees and beehives are rich sources for fructophilic lactic acid bacteria. Syst. Appl. Microbiol. 36, 444–448 (2013). 10.1016/j.syapm.2013.06.002
- 39. Takatani, N. & Endo, A. Viable fructophilic lactic acid bacteria present in honeybee-based food products. FEMS Microbiol. Lett. 368, fnab150 (2021). 10.1093/femsle/fnab150
- 40. Maeno, S., Kajikawa, A., Dicks, L. & Endo, A. Introduction of bifunctional alcohol/acetaldehyde dehydrogenase gene (adhE) in Fructobacillus fructosus settled its fructophilic characteristics. Res. Microbiol. 170, 35–42 (2019). 10.1016/j.resmic.2018.09.004
- 41. Endo, A., Tanaka, N., Oikawa, Y., Okada, S. & Dicks, L. Fructophilic characteristics of Fructobacillus spp. may be due to the absence of an alcohol/acetaldehyde dehydrogenase gene (adhE). Curr. Microbiol. 68, 531–535 (2014). 10.1007/s00284-013-0506-3
- 42. Zaunmüller, T., Eichert, M., Richter, H. & Unden, G. Variations in the energy metabolism of biotechnologically relevant heterofermentative lactic acid bacteria during growth on sugars and organic acids. Appl. Microbiol. Biotechnol. 72, 421–429 (2006). 10.1007/s00253-006-0514-3
- 43. Maeno, S. et al. Genomic characterization of a fructophilic bee symbiont Lactobacillus kunkeei reveals its niche-specific adaptation. Syst. Appl. Microbiol. 39, 516–526 (2016). 10.1016/j.syapm.2016.09.006
- 44. FitzJohn, R. G. Diversitree: comparative phylogenetic analyses of diversification in R. Methods Ecol. Evol. 3, 1084–1092 (2012). 10.1111/j.2041-210X.2012.00234.x
- 45. Pagel, M. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. R. Soc. Lond. Ser. B: Biol. Sci. 255, 37–45 (1997).
- 46. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022). 10.1093/nar/gkab776
- 47. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020). 10.1093/bioinformatics/btz859
- 48. Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021). 10.1093/nar/gkaa970
- 49. Endo, A., Futagawa-Endo, Y., Sakamoto, M., Kitahara, M. & Dicks, L. M. T. Lactobacillus florum sp. nov., a fructophilic species isolated from flowers. Int. J. Syst. Evol. Microbiol. 60, 2478–2482 (2010). 10.1099/ijs.0.019067-0
- 50. Suzuki, K., Iijima, K., Ozaki, K. & Yamashita, H. Isolation of a hop-sensitive variant of Lactobacillus lindneri and identification of genetic markers for beer spoilage ability of lactic acid bacteria. Appl. Environ. Microbiol. 71, 5089–5097 (2005). 10.1128/AEM.71.9.5089-5097.2005
- 51. Gänzle, M. G. & Zheng, J. Lifestyles of sourdough lactobacilli – Do they matter for microbial ecology and bread quality? Int. J. Food Microbiol. 302, 15–23 (2019). 10.1016/j.ijfoodmicro.2018.08.019
- 52. Rogalski, E., Ehrmann, M. A. & Vogel, R. F. Strain-specific interaction of Fructilactobacillus sanfranciscensis with yeasts in the sourdough fermentation. Eur. Food Res. Technol. 247, 1437–1447 (2021). 10.1007/s00217-021-03722-0
- 53. Vuong, H. Q. & McFrederick, Q. S. Comparative genomics of wild bee and flower isolated Lactobacillus reveals potential adaptation to the bee host. Genome Biol. Evol. 11, 2151–2161 (2019). 10.1093/gbe/evz136
- 54. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997). 10.1126/science.278.5338.631
- 55. Bernheim, A. & Sorek, R. The pan-immune system of bacteria: antiviral defence as a community resource. Nat. Rev. Microbiol. 18, 113–119 (2019). 10.1038/s41579-019-0278-2
- 56. Belbahri, L. et al. Comparative genomics of Bacillus amyloliquefaciens strains reveals a core genome with traits for habitat adaptation and a secondary metabolites rich accessory genome. Front. Microbiol. 8, 1438 (2017). 10.3389/fmicb.2017.01438
- 57. Livingstone, P. G., Morphew, R. M. & Whitworth, D. E. Genome sequencing and pan-genome analysis of 23 Corallococcus spp. strains reveal unexpected diversity, with particular plasticity of predatory gene sets. Front. Microbiol. 9, 3187 (2018). 10.3389/fmicb.2018.03187
- 58. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018). 10.1038/nbt.4229
- 59. Zheng, J., Ruan, L., Sun, M. & Gänzle, M. A genomic view of lactobacilli and pediococci demonstrates that phylogeny matches ecology and physiology. Appl. Environ. Microbiol. 81, 7233–7243 (2015). 10.1128/AEM.02116-15
- 60. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017). 10.1038/nbt.3988
- 61. Knappe, J., Blaschkowski, H. P., Gröbner, P. & Schmitt, T. Pyruvate formate-lyase of Escherichia coli: the acetyl-enzyme intermediate. Eur. J. Biochem. 50, 253–263 (1974). 10.1111/j.1432-1033.1974.tb03894.x
- 62. Mise, K. & Iwasaki, W. Environmental atlas of prokaryotes enables powerful and intuitive habitat-based analysis of community structures. iScience 23, 101624 (2020). 10.1016/j.isci.2020.101624
- 63. Kawasaki, S. et al. Lactobacillus floricola sp. nov., lactic acid bacteria isolated from mountain flowers. Int. J. Syst. Evol. Microbiol. 61, 1356–1359 (2011). 10.1099/ijs.0.022988-0
- 64. Kessler, D., Leibrecht, I. & Knappe, J. Pyruvate-formate-lyase-deactivase and acetyl-CoA reductase activities of Escherichia coli reside on a polymeric protein particle encoded by adhE. FEBS Lett. 281, 59–63 (1991). 10.1016/0014-5793(91)80358-A
- 65. Kim, G. et al. Aldehyde-alcohol dehydrogenase forms a high-order spirosome architecture critical for its activity. Nat. Commun. 10, 1–11 (2019). 10.1038/s41467-019-12427-8
- 66. Pony, P., Rapisarda, C., Terradot, L., Marza, E. & Fronzes, R. Filamentation of the bacterial bi-functional alcohol/aldehyde dehydrogenase AdhE is essential for substrate channeling and enzymatic regulation. Nat. Commun. 11, 1426 (2020). 10.1038/s41467-020-15214-y
- 67. Koludarov, I. et al. Domain loss enabled evolution of novel functions in the snake three-finger toxin gene superfamily. Nat. Commun. 14, 1–15 (2023). 10.1038/s41467-023-40550-0
- 68. Casewell, N. R., Wagstaff, S. C., Harrison, R. A., Renjifo, C. & Wüster, W. Domain loss facilitates accelerated evolution and neofunctionalization of duplicate snake venom metalloproteinase toxin genes. Mol. Biol. Evol. 28, 2637–2649 (2011). 10.1093/molbev/msr091
- 69. Konno, N. & Iwasaki, W. Machine learning enables prediction of metabolic system evolution in bacteria. Sci. Adv. 9, eadc9130 (2023). 10.1126/sciadv.adc9130
- 70. Gough, J. Convergent evolution of domain architectures (is rare). Bioinformatics 21, 1464–1471 (2005). 10.1093/bioinformatics/bti204
- 71. Weiner, J. 3rd, Beaussart, F. & Bornberg-Bauer, E. Domain deletions and substitutions in the modular protein evolution. FEBS J. 273, 2037–2047 (2006). 10.1111/j.1742-4658.2006.05220.x
- 72. Holzapfel, W. H. & Wood, B. J. B. Lactic Acid Bacteria: Biodiversity and Taxonomy (John Wiley & Sons, 2014).
- 73. Membrillo-Hernández, J. et al. Evolution of the adhE gene product of Escherichia coli from a functional reductase to a dehydrogenase: genetic and biochemical studies of the mutant proteins. J. Biol. Chem. 275, 33869–33875 (2000). 10.1074/jbc.M005464200
- 74. Van Lis, R. et al. Phylogenetic and functional diversity of aldehyde-alcohol dehydrogenases in microalgae. Plant Mol. Biol. 105, 497–511 (2021). 10.1007/s11103-020-01105-9
- 75. Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016). 10.1093/molbev/msw046
- 76. Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2018). 10.1093/bioinformatics/bty633
- 77. Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002). 10.1093/nar/gkf436
- 78. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009). 10.1093/bioinformatics/btp348
- 79. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020). 10.1093/molbev/msaa015
- 80. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010). 10.1186/1471-2105-11-119
- 81. Konno, N. et al. Evolutionary paths toward multi-level convergence of lactic acid bacteria in fructose-rich environments [Data set]. Zenodo. 10.5281/zenodo.11378208 (2024).
Data Availability Statement
The bacterial reference phylogeny and genome sequence data used in this study are publicly available from GTDB. The genome accessions of the Lactobacillaceae species whose genomes and habitats we analyzed are listed in Supplementary Data 2, and the 137 orthologs lost independently in both of the two FLAB lineages are listed in Supplementary Data 3. Amino acid sequences of adhE-like genes are provided as Supplementary Data 1. The source data behind the figures in this paper are provided as Supplementary Data 4. These supplementary materials are available in the public repository Zenodo (10.5281/zenodo.11378208)81. All data needed to evaluate the conclusions in the paper are present in the paper, the Supplementary Materials, and/or the public repositories described above.




