Abstract
Multidomain proteins can have a complex evolutionary history that may involve de novo domain evolution, recruitment and / or recombination of existing domains and domain losses. Here, the domain evolution of the plant-specific Ca2+-permeable mechanosensitive channel protein, MID1-COMPLEMENTING ACTIVITY (MCA), was investigated. MCA, a multidomain protein, possesses a Ca2+-influx-MCAfunc domain and a PLAC8 domain. Profile Hidden Markov Models (HMMs) of domains were assessed in 25 viridiplantae proteomes. While PLAC8 was detected in plants, animals, and fungi, MCAfunc was found in streptophytes but not in chlorophytes. Full MCA proteins were only found in embryophytes. We identified the MCAfunc domain in all streptophytes including charophytes where it appeared in E3 ubiquitin ligase-like proteins. Our Maximum Likelihood (ML) analyses suggested that the MCAfunc domain evolved early in the history of streptophytes. The PLAC8 domain showed similarity to Plant Cadmium Resistance (PCR) genes, and the coupling of MCAfunc and PLAC8 seemed to represent a single evolutionary event. This combination is unique in MCA, and does not exist in other plant mechanosensitive channels. Within angiosperms, gene duplications increased the number of MCAs. Considering their role in mechanosensing in roots, MCA might be instrumental for the rise of land plants. This study provides a textbook example of de novo domain emergence, recombination, duplication, and losses, leading to the convergence of function of proteins in plants.
Introduction
Proteins are essential components in any biological organism, including plants. Each protein can be assembled from smaller units, termed domains, and a protein can consist of a single or multiple domains [1]. Several databases serve as repositories of protein domains found in biological organisms [2]. Pfam, for example, currently has 19,179 entries ([3]; Pfam v.34.0, released March 2021). During organismal evolution, protein domains can combine but also evolve de novo. These de novo domains can be further combined with other de novo or existing domains to create novel proteins [1]. During plant evolution, it has been suggested that at least 500 novel protein domains unique to this evolutionary lineage have emerged [4]. A search of Arabidopsis thaliana proteomes suggested that 75% of its proteins have domains registered in Pfam [5]. This indicates that there still exists a significant number of unknown protein domains or domain combinations even in well-studied plants, let alone plants in general. The combination of domains is perhaps a cost-effective way for organisms to create novel proteins [1], and in A. thaliana, at least 25% of proteins have multiple domains [5].
Integral membrane proteins that mediate ion fluxes in response to mechanical stresses, including touch, wind, water flow, osmotic pressure, gravity, and cell division- and cell expansion-generated forces, are called mechanosensitive channels. To date, five groups of mechanosensitive channels have been found in plants [6]. One of them is a group of MID1-COMPLEMENTING ACTIVITY (MCA) proteins, which have been shown to function as Ca2+-permeable mechanosensitive channels [7, 8]. The genes encoding MCAs are found exclusively in the plant kingdom [7, 9], whereas genes encoding other groups of mechanosensitive channels are found in prokaryotes and/or eukaryotes. Therefore, MCAs are unique in terms of molecular evolution, and it is of interest to investigate when and where the MCA genes appeared during plant evolution.
In A. thaliana, two paralogous MCA genes, AtMCA1 and AtMCA2, have been isolated, and their functions examined in great detail. The AtMCA1 protein is involved in touch sensing at the root tip and in a hypoosmotic shock-induced increase in the cytosolic free Ca2+ concentration [7]. AtMCA2 was reported to participate in Ca2+ uptake at the roots [10]. In addition, AtMCA1 and AtMCA2 respond to membrane stretch to generate cation currents when expressed in Xenopus laevis oocytes [8]. Furthermore, MCA channels appear to have common functions in plants, based on studies on Oryza sativa OsMCA1 [11–13], Nicotiana tabacum NtMCA1 and NtMCA2 [14], Zea mays CNR13 [15], and a Streptocarpus MCA-like gene (as Saintpaulia in [16]; see [17]).
MCAs are multidomain proteins of approximately 420 amino acid (aa) residues. They retain the provisionally advocated ARPK domain (Amino-terminal domain of Rice putative Protein Kinases; 1–143 aa) [7], overlapping with the EF hand-like region at the N-terminal region (136–180 aa) (InterPro: IPR002048), and the well-curated PLAC8 domain (Pfam ID: PF04749) at the C-terminal region (S1 Appendix). A coiled-coil motif is located in the middle of the proteins. An approximately 170 aa region at the N-terminus, covering the ARPK and the EF hand-like domains, has Ca2+ influx activity and is proposed to be a functional domain of MCAs [18]. In this study, we defined this N-terminal region as the MCA functional (MCAfunc) domain.
In previous work, an MCA Neighbor-Joining tree was published that included only a limited number of plants, i.e. one moss, one lycophyte, one gymnosperm, and eight angiosperms. The unrooted tree showed that MCA proteins were mostly grouped following the tree of life (e.g. tolweb.org/tree/), except for Picea sitchensis (gymnosperm) and Linum usitatissimum (angiosperm) [9]. However, information from this tree is insufficient to elucidate the evolutionary history of the protein family or their domains. To better understand the origin and evolution of MCA proteins in plants, a more comprehensive study is required. Thus, in the present study, wide-ranging phylogenetic analyses of MCA proteins were carried out on 25 viridiplantae proteomes and full MCA proteins of 55 streptophyte species. Here, for ranks, we followed the definition by Leliaert et al. [19] and NCBI Taxonomy Browser (https://www.ncbi.nlm.nih.gov/guide/taxonomy/), where viridiplantae include green algae (chlorophytes) and streptophytes, streptophytes include charophytes and embryophytes, and embryophytes (also termed as “land plants”) include bryophytes (Hornworts, Liverworts, Mosses), lycophytes, ferns, gymnosperms and angiosperms. Since MCA is a multidomain protein, we focused on the evolution, origin and fate of each domain (MCAfunc and PLAC8) as well as the full MCA protein. Comprehensive domain searches were carried out against the viridiplantae proteomes that included two chlorophytes and two charophytes. The study represents an example for the evolutionary dynamics of a multidomain protein in plants.
Materials and methods
Proteomes, genome, and transcriptomes used in this study
Twenty-five proteomes, including species ranging from chlorophytes to angiosperms, were downloaded from UniProt (https://www.uniprot.org/) and PLAZA (https://bioinformatics.psb.ugent.be/plaza/versions/gymno-plaza/) (S2 Appendix). Genomes / transcriptomes of 55 plant species were explored to find the full MCA genes (S3 Appendix; KEGG: [20]; Phytozome: https://phytozome.jgi.doe.gov/pz/portal.html#; OneKP: [21]; NCBI Genome: https://www.ncbi.nlm.nih.gov; Fernbase: https://www.fernbase.org; EnsemblPlants: https://plants.ensembl.org). Recently, systematic studies returned the genus Physcomitrella to the genus Physcomitrium [22], but we used the name Physcomitrella in this study for consistency with the registered names in the databases. Proteome completeness information, i.e. BUSCO completeness values (BUSCO-C), was available for most taxa in the UniProt database. The BUSCO-C values of the proteomes from the PLAZA database (Cycas micholitzii, Taxus baccata) were newly obtained in this study using BUSCO v.4.0.6 [23], by comparison against the viridiplantae_odb10 lineage dataset.
Building profile Hidden Markov Models (HMMs)
Three profile HMMs were used in this study. For the full MCA protein, the model in PANTHER (http://www.pantherdb.org/), ‘PROTEIN MID1-COMPLEMENTARY ACTIVITY 1 (MCA1): PTHR46604.SF3.pir.hmm’ (422 aa), was used. A new profile HMM was created with hmmbuild in the HMMER v.3.1b2 package (http://hmmer.org/) for the 1–167 aa region of the putative MCAfunc domain (MCAfunc.hmm; 167 aa). MCAfunc.hmm was registered in Pfam v.34.0 (PF19584). The profile HMM of PLAC8 was obtained from Pfam v.33.1 (PLAC8.hmm: PF04749; 91 aa). Logos of the profile HMMs were generated with Skylign (http://www.skylign.org) (S4 and S5 Appendices).
Building domain matrices
Proteomes were interrogated for the presence of the MCAfunc and PLAC8 domains using hmmsearch (HMMER package) with the default setting (E-value < 10.00). Of the hits, proteins with a ‘full E-value’ < 0.001 and homologous regions > 30 aa were kept for further analyses. The MCAfunc and PLAC8 domain sequences were aligned to MCAfunc.hmm and PLAC8.hmm, respectively, using hmmalign. The alignments were manually checked and corrected in BioEdit v.7.2.5 [24]. They were further trimmed to remove hypervariable regions with BMGE v.1.12 [25] on the Galaxy server (https://galaxy.pasteur.fr/).
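The retention criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that hmmsearch was run with the `--domtblout` option, and that the "homologous region" corresponds to the aligned span on the target protein (`ali from` to `ali to`); the function names, thresholds as defaults, and the example rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    protein: str        # target sequence name
    hmm: str            # query profile name
    full_evalue: float  # full-sequence E-value
    ali_from: int       # alignment start on the protein (1-based)
    ali_to: int         # alignment end on the protein (inclusive)


def parse_domtblout(lines: Iterable[str]) -> List[DomainHit]:
    """Parse HMMER3 --domtblout rows (whitespace separated, '#' comments)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        # Columns: 0 target, 3 query, 6 full E-value, 17/18 ali from/to.
        hits.append(DomainHit(f[0], f[3], float(f[6]), int(f[17]), int(f[18])))
    return hits


def keep_hit(hit: DomainHit, max_evalue: float = 1e-3, min_len: int = 30) -> bool:
    """Keep hits with full E-value < max_evalue and aligned region > min_len aa."""
    aligned_len = hit.ali_to - hit.ali_from + 1
    return hit.full_evalue < max_evalue and aligned_len > min_len


# Hypothetical example rows (not taken from the study's data).
ROWS = [
    "# target name accession tlen query name ...",
    "protA - 421 MCAfunc - 167 1.2e-50 170.1 0.1 1 1 1e-52 1.2e-50 169.8 0.1 1 167 3 168 1 170 0.98 hypothetical protein",
    "protB - 300 MCAfunc - 167 0.5 12.0 0.0 1 1 0.4 0.5 11.8 0.0 20 90 40 110 35 115 0.70 weak hit",
    "protC - 150 MCAfunc - 167 1e-05 25.0 0.0 1 1 1e-06 1e-05 24.5 0.0 5 25 10 30 8 32 0.85 short fragment",
]

kept = [h.protein for h in parse_domtblout(ROWS) if keep_hit(h)]
print(kept)  # ['protA']
```

In this sketch, protB is rejected on its E-value and protC on the length of its aligned region, so only protA is retained.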
The proteome of M. polymorpha subsp. ruderalis (UP000077202) did not include proteins with both domains, but that of the closely related M. polymorpha did. The full MCA sequence was found in the genome database of M. polymorpha subsp. ruderalis (NCBI Genome GCA_001641455.1; Mp_v4; LVLJ01003617.1:83933–90898), and was highly homologous to that in M. polymorpha (Phytozome v.12.0: Mapoly0134s0009) (S6 Appendix). Thus, the translated amino acid sequence of the genome region (LVLJ01003617.1:83933–90898) was used as “A0A176VHI1_MARPO*” (S7 and S8 Appendices).
Domain-based phylogenetic analyses
Maximum likelihood (ML) analyses were carried out with PhyML v.3.0 [26] on the ATGC server (www.atgc-montpellier.fr), with Smart Model Selection (SMS) [27]. Tree topology searches using SPR were carried out, and SH-like aLRT values were obtained for branch support. ML rapid bootstrapping analyses of 2000 replicates were performed for additional clade support with RAxML v.8 [28], using models selected with ToPALi v.2 [29].
The PhyML trees were examined with Notung v.2.9 [30] to determine the root of the trees. The required species tree for this analysis (S9 Appendix) followed the Tree of Life Web Project (http://tolweb.org) and the Angiosperm Phylogeny Website v.14 [31]. The bryophyte relationships followed [32]. For the MCAfunc domain tree, the proteins of the charophyte K. nitens were suggested as root (S10 and S11 Appendices). For the PLAC8 domain tree, no strong root position was indicated (S12 Appendix), and thus a midpoint-rooted tree was shown for ease of visualization (S13 Appendix).
Partner domain HMMER searches of MCAfunc and PLAC8 domains
The retained proteins possessed either MCAfunc or PLAC8, or both domains. To determine the exact domain composition of these proteins, they were searched against the Pfam database with hmmscan in HMMER (https://www.ebi.ac.uk/Tools/hmmer/). Based on their E-values, the domains were visualized in R [33] as a colour-coded rooted phylogeny and heatmap utilizing ggplot2 [34], ggtree [35], ape v.5.0 [36], and phytools [37]. In cases where two closely related domains were predicted for the same sequence position, or domain duplications were involved, the domains with the lowest E-values were selected. The data were also used for the schematic illustrations of representative domain structures, visualized in R with a modified script based on Brennan (https://rforbiochemists.blogspot.com/2015/11/drawing-protein-domain-structure-using-r.html).
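One way to resolve competing domain predictions by E-value, as described above, is sketched below in Python. This is an assumed greedy procedure for illustration, not the authors' script: hits are visited from the lowest to the highest E-value, and a hit is kept only if it does not overlap a hit that has already been kept. The domain coordinates and E-values in the example are hypothetical.

```python
from typing import List, Tuple

# (Pfam domain name, start, end, E-value); coordinates are 1-based, inclusive.
Domain = Tuple[str, int, int, float]


def overlaps(a: Domain, b: Domain) -> bool:
    """True if the two domain intervals share at least one residue."""
    return a[1] <= b[2] and b[1] <= a[2]


def resolve_overlaps(domains: List[Domain]) -> List[Domain]:
    """Greedy selection: lowest E-value first, skip hits overlapping a kept hit."""
    kept: List[Domain] = []
    for dom in sorted(domains, key=lambda d: d[3]):
        if not any(overlaps(dom, k) for k in kept):
            kept.append(dom)
    # Return the surviving domains in N- to C-terminal order.
    return sorted(kept, key=lambda d: d[1])


# Hypothetical hmmscan result: Arm and Arm_2 predicted for the same region.
example = [
    ("MCAfunc", 1, 167, 1e-60),
    ("U-box", 250, 320, 1e-20),
    ("Arm", 400, 440, 1e-8),
    ("Arm_2", 395, 445, 1e-5),
]
print([d[0] for d in resolve_overlaps(example)])  # ['MCAfunc', 'U-box', 'Arm']
```

A greedy pass is a simplification; it treats any shared residue as an overlap, so a tolerance for short overlaps would need to be added if adjacent domains are expected to abut.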
Phylogeny analysis of full MCA protein sequences
BLAST searches were carried out on plant genome and transcriptome databases using the AtMCA1 protein sequence. The sequences found were further evaluated using hmmsearch with MCAfunc.hmm and PLAC8.hmm. Only genes possessing both domains were included in the phylogenetic analysis. The positions of MCA sequences in the genomes were examined where possible, and only one transcript sequence per genomic locus was included. For example, in Selaginella moellendorffii, one MCA genome sequence was found, whereas two identical proteins are present in the proteome (UP000001514). Thus, only one MCA from S. moellendorffii was included in the analyses. A phylogenetic tree was built with PhyML v.3.0, and subjected to Notung analyses for rooting. The bryophytes were suggested as the likely root (S14 Appendix).
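The selection rule described above (both domains present, one sequence per locus) can be sketched in Python as follows. The sketch is illustrative only: it assumes the hmmsearch results have been reduced to two sets of protein identifiers, and it uses sequence identity within a single species as a stand-in for the genome-position check, which is an assumption rather than the procedure used in this study. All identifiers and sequences are hypothetical.

```python
from typing import Dict, Set


def select_full_mca(
    seqs: Dict[str, str],
    mcafunc_hits: Set[str],
    plac8_hits: Set[str],
) -> Dict[str, str]:
    """Keep proteins hit by both profiles, dropping identical duplicates.

    Intended to be applied to one species at a time, so that identical
    sequences from different species are not collapsed.
    """
    both = mcafunc_hits & plac8_hits  # proteins carrying both domains
    selected: Dict[str, str] = {}
    seen: Set[str] = set()
    for pid in sorted(both):          # sorted for a reproducible choice
        seq = seqs[pid]
        if seq in seen:               # identical protein already retained
            continue
        seen.add(seq)
        selected[pid] = seq
    return selected


# Hypothetical single-species example with two identical proteome entries.
seqs = {"p1": "MAAA", "p2": "MAAA", "p3": "MCCC", "p4": "MDDD"}
full = select_full_mca(seqs, {"p1", "p2", "p3"}, {"p1", "p2", "p4"})
print(sorted(full))  # ['p1']
```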
Because the study focussed on MCA, we specifically analysed gene duplication events for the full MCA protein tree in a Notung reconcile analysis (tree rearranged with Edge Weight Threshold = 0.6). The species tree used here (S15 Appendix) followed the Angiosperm Phylogeny Website v.14 [31]. The bryophyte relationships followed [32]. The relationships within angiosperms followed [38], with those for Brassicaceae following [39], for Fabaceae [40], and for Poaceae [41].
Results
MCAfunc domain found in streptophytes, MCAfunc+PLAC8 in land plants
To determine the distribution of MCA proteins in viridiplantae, 25 proteomes (see Table 1) were interrogated for the domains of the MCA protein, MCAfunc and PLAC8, with profile HMMs using HMMER. In total, 217 proteins were found possessing the MCAfunc domain and 438 possessing the PLAC8 domain, of which 32 possessed both domains (Table 1; S7 and S8 Appendices). The MCAfunc domain was only present in streptophytes, whereas the PLAC8 domain was found in all proteomes examined in this study (Table 1).
Table 1. Number of proteins found in proteomes.
Vernacular name | ID | Taxon | No. of proteins with MCAfunc | No. of proteins with PLAC8 |
---|---|---|---|---|
Chlorophytes | CHLRE | Chlamydomonas reinhardtii | 0 | 12 |
Chlorophytes | VOLCA | Volvox carteri f. nagariensis | 0 | 7 |
Charophytes | KLENI | Klebsormidium nitens | 2 | 13 |
Charophytes | CHABU | Chara braunii | 8 | 12 |
Bryophytes | MARPO | Marchantia polymorpha | 9 | 13 |
Bryophytes | MapoRu | Marchantia polymorpha subsp. ruderalis | 9 | 7 |
Bryophytes | PHYPA | Physcomitrella patens | 9 | 23 |
Lycophytes | SELML | Selaginella moellendorffii | 17 | 15 |
Gymnosperm | CMI | Cycas micholitzii | 1 | 12 |
Gymnosperm | TBA | Taxus baccata | 7 | 10 |
Angiosperm | AMBTC | Amborella trichopoda | 7 | 9 |
Angiosperm | MUSAM | Musa acuminata subsp. malaccensis | 9 | 26 |
Angiosperm | ORYSJ | Oryza sativa subsp. japonica | 29 | 23 |
Angiosperm | MAIZE | Zea mays | 11 | 25 |
Angiosperm | SORBI | Sorghum bicolor | 13 | 19 |
Angiosperm | AQUCA | Aquilegia coerulea | 10 | 10 |
Angiosperm | VITVI | Vitis vinifera | 7 | 20 |
Angiosperm | POPTR | Populus trichocarpa | 12 | 26 |
Angiosperm | MEDTR | Medicago truncatula | 10 | 22 |
Angiosperm | CUCSA | Cucumis sativus | 5 | 10 |
Angiosperm | GOSRA | Gossypium raimondii | 14 | 29 |
Angiosperm | BRAOL | Brassica oleracea var. oleracea | 7 | 29 |
Angiosperm | ARATH | Arabidopsis thaliana | 5 | 20 |
Angiosperm | ERYGU | Erythranthe guttata | 9 | 27 |
Angiosperm | SOLLC | Solanum lycopersicum | 7 | 19 |
sum | | | 217 | 438 |
The number of MCAfunc domain proteins within a proteome varied between species. In charophytes, Klebsormidium nitens had two proteins, whereas Chara braunii had eight. In angiosperms, monocots generally possessed higher numbers, between nine and 29, whereas dicots possessed five to 14 per species. The number of PLAC8 domain proteins was between seven and 29 per species, the lowest in Volvox carteri f. nagariensis and the liverwort Marchantia polymorpha subsp. ruderalis, and the highest in Gossypium raimondii and Brassica oleracea var. oleracea (Table 1). The more complete liverwort proteome of Marchantia polymorpha (BUSCO 96.7% in UniProt) had 13 PLAC8 proteins. The low number in M. polymorpha subsp. ruderalis (BUSCO 91.3%) might be explained by the incompleteness of its proteome. The full MCA protein, with both the MCAfunc and PLAC8 domains, was not found in charophytes. In embryophytes, at least one and up to three full MCA proteins were found per species.
MCAfunc and PLAC8 domain phylogenies
Since MCA is a multidomain protein, we studied the phylogenetic relationships of the MCAfunc and PLAC8 domains separately. In the MCAfunc domain Maximum Likelihood (ML) tree of 217 domain sequences, the sequences clustered according to the presence of partner domains (Fig 1; S11 and S16 Appendices). For example, sequences of charophytes and ‘Clade a’, which included AtPUB13 (RING-type E3 ubiquitin ligase), had U-box (PF04564.15), Arm (PF00514.23) or Arm_2 (PF04826.13) as partner domains to MCAfunc (Fig 1). Arm and Arm_2 are overlapping domains. ‘Clade a’ contained two major clades, each including all streptophyte lineages, suggesting a gene duplication. ‘Clade b’ (Fig 1; S11 Appendix) also contained two main clades including most streptophyte lineages, suggesting a further duplication, where most proteins in one clade had lost the Arm domain. The following clades, ‘c’ and ‘d’, contained mostly monocot-specific undescribed or potential protein kinase proteins (Fig 1) (e.g. rice Q2QZY3). ‘Clade e’ is the MCA protein clade including AtMCA1 and AtMCA2, where the majority of MCAfunc domain proteins were partnered with the PLAC8 domain, which suggested that MCAs are the derived proteins. A few proteins scattered across ‘Clade e’ had lost PLAC8 (Fig 1E), but there was always at least one protein with MCAfunc plus PLAC8 present in each species (Table 1). Some MCA proteins had obtained alternative partner domains, such as C1_2 (PF03107.16) and PP2 (PF14299.6) (M. polymorpha); Pkinase (PF00069.25) or Pkinase_Tyr (PF07714.17) (M. acuminata); and C1_2 and Mlh1_C (PF16413.5) (Z. mays).
In Zea mays, A0A1D6PNG8 and A0A1D6F850 hold the protein name “MCA1” in UniProt, but they were placed in ‘Clade d’ and lacked PLAC8 while retaining Pkinase or Pkinase_Tyr. On the other hand, CNR13 and A0AD6JP06 were found to be proper MCAs since they were in the MCA clade (‘Clade e’) and possessed PLAC8 (S11 and S16 Appendices), as previously reported [15].
The other MCA domain, PLAC8, was also examined phylogenetically. In the PLAC8 domain ML tree of 438 domain sequences, the sequences also clustered according to their partner domains (Fig 2; S13 and S17 Appendices). Most PLAC8 domain proteins appeared as single-domain proteins, but the MCA clade (‘Clade I’) retained MCAfunc, while another clade (‘Clade II’) retained a DUF2985 (PF11204) domain of unknown function (Fig 2; S13 and S17 Appendices). In A. thaliana, PLAC8 single-domain proteins are registered as “Plant Cadmium Resistance proteins (PCR)”, with the function of reducing cadmium uptake [42]. The MCA clade appeared to be closely related to a clade including AtPCR9 and AtPCR12. All proteins in ‘Clade I’, except for two, retained both MCAfunc and PLAC8. In gymnosperms, two proteins per species were found, one having MCAfunc while the other lacked it. The ML tree topology and the distribution of partner domains suggested that the coupling between the MCAfunc and PLAC8 domains occurred once in the plant lineage, possibly in the common ancestor of embryophytes, and was sometimes lost after gene duplication events but was always retained in at least one copy.
U-box and Arm are original partners of the MCAfunc domain
While the PLAC8 domain commonly exists within the plant, animal, and fungal kingdoms, the MCAfunc domain was only observed in streptophytes in the plant kingdom. Thus, the MCAfunc domain might be the key domain for the MCA protein, and we further assessed the coupling of the MCAfunc domain with its partner domains. The predicted domain combination for K. nitens (charophyte) was MCAfunc + U-box + Arm (type I) (Fig 3). Type I was found in almost all streptophyte species (Table 2), but the number of Arm domains varied from one to five. In C. braunii (charophyte), in addition to type I, MCAfunc only (type II), MCAfunc + U-box (type III), and lineage-specific types (Fig 3Cb) were found. In P. patens (bryophyte, moss), type I, type II, the MCA type (MCAfunc + PLAC8: type IV), and also lineage-specific types were found (Fig 3Pp). The proteome of ferns was not available, but types I to IV and lineage-specific combinations were widely observed from lycophytes to angiosperms. The O. sativa MCAfunc + Pkinase (Os) type was widely present in angiosperm monocots, and also occurred in the moss P. patens, where it possessed an additional HSP70 domain (Pp). The A. thaliana MCAfunc + Arm: ARO3 (At) type was only observed in the angiosperms A. thaliana, Brassica oleracea, and Amborella trichopoda (Fig 3; Table 2).
Table 2. Domain partners of MCAfunc domain and their combinations found in proteomes across viridiplantae.
ID | Taxon | Type I | Type II | Type III | Type IV | Os | At | sum |
---|---|---|---|---|---|---|---|---|
CHLRE | Chlamydomonas reinhardtii | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
VOLCA | Volvox carteri f. nagariensis | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
KLENI | Klebsormidium nitens | 2 | 0 | 0 | 0 | 0 | 0 | 2 |
CHABU | Chara braunii | 4 | 1 | 1 | 0 | 0 | 0 | 6 |
MARPO | Marchantia polymorpha | 4 | 1 | 1 | 1 | 0 | 0 | 7 |
MapoRu | Marchantia polymorpha subsp. ruderalis | 4 | 2 | 0 | *1 | 0 | 0 | 7 |
PHYPA | Physcomitrella patens | 3 | 2 | 0 | 2 | †2 | 0 | 9 |
SELML | Selaginella moellendorffii | 12 | 0 | 3 | 2 | 0 | 0 | 17 |
CMI | Cycas micholitzii | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
TBA | Taxus baccata | 3 | 2 | 1 | 1 | 0 | 0 | 7 |
AMBTC | Amborella trichopoda | 3 | 0 | 3 | 1 | 0 | 1 | 8 |
MUSAM | Musa acuminata subsp. malaccensis | 1 | 0 | 4 | 3 | 1 | 0 | 9 |
ORYSJ | Oryza sativa subsp. japonica | 1 | 14 | 0 | 1 | 7 | 0 | 23 |
MAIZE | Zea mays | 1 | 2 | 1 | 2 | 4 | 0 | 10 |
SORBI | Sorghum bicolor | 1 | 2 | 0 | 1 | 9 | 0 | 13 |
AQUCA | Aquilegia coerulea | 4 | 0 | 2 | 1 | 0 | 0 | 7 |
VITVI | Vitis vinifera | 3 | 0 | 3 | 1 | 0 | 0 | 7 |
POPTR | Populus trichocarpa | 6 | 0 | 2 | 2 | 0 | 0 | 10 |
MEDTR | Medicago truncatula | 4 | 0 | 2 | 2 | 0 | 0 | 8 |
CUCSA | Cucumis sativus | 2 | 1 | 0 | 1 | 0 | 0 | 4 |
GOSRA | Gossypium raimondii | 10 | 0 | 2 | 1 | 0 | 0 | 13 |
BRAOL | Brassica oleracea var. oleracea | 3 | 0 | 0 | 3 | 0 | 1 | 7 |
ARATH | Arabidopsis thaliana | 1 | 0 | 1 | 2 | 0 | 1 | 5 |
ERYGU | Erythranthe guttata | 3 | 1 | 3 | 2 | 0 | 0 | 9 |
SOLLC | Solanum lycopersicum | 4 | 0 | 1 | 1 | 0 | 0 | 6 |
sum | | 79 | 28 | 30 | 32 | 21 | 3 | |
The results of the HMMER searches for MCAfunc domain partners and their combinations are listed and arranged following the Tree of Life. The types of domain combinations are as follows. Type I: MCAfunc + U-box + Arm/Arm_2; Type II: MCAfunc only; Type III: MCAfunc + U-box; Type IV: MCAfunc + PLAC8 (MCA protein type); Os (monocot type): MCAfunc + Pkinase/Pkinase_Tyr; At (ARO3 type): MCAfunc + Arm. †: with HSP70.
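The type definitions in the Table 2 legend can be expressed as a small lookup, sketched here in Python. This is an illustration under stated assumptions, not the classification code used for Table 2: Arm_2 is merged with Arm and Pkinase_Tyr with Pkinase, repeated domains are counted once, and any partner set that does not match a listed type exactly is reported as lineage-specific. Under this strict rule, the P. patens proteins marked † (Os type with an additional HSP70 domain) would need separate handling.

```python
from typing import Iterable

# Merge overlapping or closely related Pfam families before matching.
ALIASES = {"Arm_2": "Arm", "Pkinase_Tyr": "Pkinase"}

# Partner domains of MCAfunc (as sets) mapped to the types of Table 2.
TYPES = {
    frozenset({"U-box", "Arm"}): "I",
    frozenset(): "II",
    frozenset({"U-box"}): "III",
    frozenset({"PLAC8"}): "IV",
    frozenset({"Pkinase"}): "Os",
    frozenset({"Arm"}): "At",
}


def classify(domains: Iterable[str]) -> str:
    """Assign a domain-combination type to a protein's list of Pfam domains."""
    names = {ALIASES.get(d, d) for d in domains}
    if "MCAfunc" not in names:
        return "no MCAfunc"
    partners = frozenset(names - {"MCAfunc"})
    return TYPES.get(partners, "lineage-specific")


print(classify(["MCAfunc", "U-box", "Arm", "Arm", "Arm_2"]))  # I
print(classify(["MCAfunc", "PLAC8"]))                         # IV
print(classify(["MCAfunc", "C1_2", "PP2"]))                   # lineage-specific
```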
Full MCA protein phylogeny, duplication and diversification in land plants
In order to unravel the history of MCA proteins in plants, a phylogeny of 106 full MCA proteins from 55 embryophyte species was reconstructed. The full MCA proteins include both MCAfunc and PLAC8 domain sequences. In this analysis, the basal grades in the ML tree, from bryophytes to gymnosperms, followed the tree of life relationships (Fig 4). The angiosperm MCAs formed two clades. The majority of proteins fell in ‘Clade α’, which included proteins of all angiosperm species analysed in this study. Only nine proteins formed ‘Clade β’, representing the orders Laurales (Cinnamomum camphora), Malpighiales, Rosales, Solanales and Brassicales (Carica papaya). These showed an MCA diversification and lacked approximately 10 aa in the N-terminal region (Fig 5). The predicted functional site of AtMCAs, the aspartic acid at position 21 (Asp21; Fig 5, arrow) [18], was different in mosses (asparagine) and liverworts (alanine). Hornworts, on the other hand, retained Asp21. At least one MCA per species retained Asp21 from lycophytes to angiosperms (Fig 5).
A maximum of 39 duplication events was estimated across the ML tree, with two outside angiosperms (Fig 4, S18 Appendix). Three duplication events were inferred prior to or at the point of diversification of angiosperms. Within the angiosperms, duplications were scattered among the lineages, but the superrosid clade stood out with an accumulation of six inferred duplication events. For several species, repeated duplication events were inferred, e.g. three in Linum usitatissimum and two each in Beta vulgaris and Helianthus annuus (S18 Appendix).
Discussion
The evolution of multidomain proteins can be complex, and may involve de novo domain evolution, recruitment of existing domains as partners, and recombination and domain losses [1]. In the present study where the evolution of the multidomain protein MCA was examined in detail, the results showed that it represents an example with a complex evolutionary history.
Our comprehensive proteome interrogation with profile HMMs suggested that the MCAfunc domain [18], formerly described as the ARPK domain plus the EF hand-like region [7], is a well-conserved domain among plants. Accordingly, MCA can be described as a multidomain protein composed of the MCAfunc and the PLAC8 domains. PLAC8 is widely observed in eukaryotes, and in our profile HMM searches we found it in all proteomes we examined (Table 1). The MCAfunc domain, on the other hand, was streptophyte-specific and not found in chlorophytes, suggesting that the domain originated in the common ancestor of streptophytes, i.e. charophytes plus embryophytes (Fig 6) [4].
The evolution of MCAfunc further included domain recruitment, recombination and losses. The E3 ubiquitin ligase-type proteins (type I in Fig 3) found in charophytes represent an ancestral combination (Table 2). Type I proteins were found in most streptophytes, except Cycas micholitzii, possibly due to the incompleteness of its proteome. The ancestral charophyte K. nitens retained only type I, while in the more derived C. braunii, MCAfunc obtained different partner domains or lost them all (Fig 3). Although domain losses need to be treated with caution in some of the species included here due to the incompleteness of their proteomes, single MCAfunc domain proteins were also observed in well-assembled genomes such as those of P. patens and O. sativa, supporting the existence of single-domain MCAfunc proteins (Fig 1 and Fig 3). Lineage-specific domain combinations were also observed in angiosperms, such as the Os- and At-types. Intriguingly, the At-type was only found in Brassicaceae and A. trichopoda, perhaps due to unrelated parallel evolutionary events (Table 2).
A key event in MCA evolution seemed to be the partnering of MCAfunc and PLAC8, which were first combined in the common ancestor of embryophytes. MCA is seemingly embryophyte-specific and might play some basic roles, perhaps as a mechanosensor, in habitat expansion to terra firma [9]. A previous study indicated that Asp21 of the MCAfunc domain is crucial for Ca2+ uptake [18]. Since Asp21 is substituted in mosses and liverworts, it could be hypothesized that their MCAs do not have a Ca2+ uptake function. In angiosperms, MCA diverged into two clades, and one might have changed its function from that of proper MCA. However, further studies would be needed to support these hypotheses.
Intriguingly, the E3 ubiquitin ligase type, with MCAfunc + U-box + Arm, seems to represent the most ancient MCAfunc protein (Fig 1). E3 ubiquitin ligases mediate substrate specificity for ubiquitylation [43] and form a large protein family. Among the E3 ubiquitin ligases of A. thaliana, only plant U-Box13 (PUB13) and PUB45 retained MCAfunc domains (Fig 1; S11 and S16 Appendices). PUB13 was suggested to be involved in the abscisic acid signalling pathway, flowering time, and abiotic stress resistance [44]. The expression level of PUB45 seemed to be affected by nutrients [45]. It could be postulated that the MCAfunc domain was first utilized in these E3 ubiquitin ligases for roles in environmental adaptation, though further studies are required here. In addition, there are proteins retaining only the MCAfunc domain, but their function has not yet been reported and remains unknown (S7 Appendix).
The PLAC8 domain exists in the Plant Cadmium Resistance (PCR) protein family as single-domain proteins. PCRs are possibly transmembrane proteins and have roles in cadmium resistance [42] and zinc transport (PCR2; [46]). It is possible that the MCAfunc domain, initially part of an E3 ubiquitin ligase, and PLAC8, an ion transporter, combined at some point in time and resulted in a novel protein, MCA, as a mechanosensor reacting to environmental calcium ions [11, 14]. The sequences of MCA and other plant mechanosensitive channels, such as MSL, are different [9], and the evolutionary history of MCA differs from that of MSL, which originated in prokaryotes [47]. MCA and MSL may thus represent an example of convergence in function.
Conclusions
In conclusion, MCA is an example of a multidomain protein whose MCAfunc domain emerged de novo in the ancestor of streptophytes and recruited an existing domain, PLAC8, in the ancestor of embryophytes. The full MCA protein further duplicated and diversified during the evolution of land plants, involving recombination and losses of domains. However, each embryophyte species analysed had at least one complete full MCA copy, pointing to the importance of the protein. The functions of many MCA proteins have not yet been investigated, but they appear somewhat related to environment sensing, protein-protein interactions, and ion transport. In the basal lineage of streptophytes, i.e. charophytes, the MCAfunc domain is associated with U-box and Arm domains, which are supposed to play roles in the E3 ubiquitin ligase pathway. On the other hand, MCA proteins with MCAfunc and PLAC8 domains show quite different roles in ion transport. This further supports a hypothesis where domain swapping is an efficient mechanism to increase protein numbers with diversified functions during organismal evolution. Future studies will shed more light on the roles of these proteins and their interactions in relation to land plant evolution.
Supporting information
Acknowledgments
The authors are indebted to Dannie Durand for helpful comments and discussions pertaining to this study, particularly relating to the Notung rooting analyses. The authors also thank Daniel Barker and Frank Wright for helpful discussions. KN is grateful to the following persons for facilitating research associateships, to Pete Hollingsworth and Mark Newman at the Royal Botanic Garden Edinburgh (RBGE), UK, and to Akitoshi Iwamoto at Kanagawa University, Japan. This work was logistically supported by RBGE’s Science and ICT divisions. We also thank Duncan Reddish and Catherine Kidner for facilitating RBGE Linux server access and support, and Iain Milne for organizing access to the CropDiversity server, James Hutton Institute, Dundee, UK. We acknowledge the National Institute of Genetics, Japan, for allowing the use of their NIG-supercomputer system. We thank three anonymous reviewers and the editor for their constructive comments.
Data Availability
The R code used in this study is published on protocols.io [dx.doi.org/10.17504/protocols.io.bkqwkvxe]. Gene trees and their alignments are available from TreeBASE [http://purl.org/phylo/treebase/phylows/study/TB2:S26880]. The seed alignment of the MCAfunc domain is available in the March 2021 release of Pfam, v.34.0 [http://pfam.xfam.org/family/PF19584#tabview=tab0], as well as in the S19 Appendix of this study.
Funding Statement
This work was supported by the Japan Society for the Promotion of Science (JSPS) [KAKENHI Grant Number 25120708], Ministry of Education, Culture, Sports, Science & Technology of Japan, to HI. KN’s stay at RBGE is financially supported by the Edinburgh Botanic Garden Sibbald Trust [2018#18], JSPS [JSPS KAKENHI Grant Number 18K06375], and the Sumitomo Foundation [170204].
References
- 1. Buljan M, Bateman A. The evolution of protein domain families. Biochem Soc Trans. 2009; 37: 751–755. 10.1042/BST0370751
- 2. Bagowski CP, Bruins W, Te Velthuis AJ. The nature of protein domain evolution: shaping the interaction network. Curr Genomics. 2010; 11: 368–376. 10.2174/138920210791616725
- 3. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucl Acids Res. 2019; 47: D427–D432. 10.1093/nar/gky995
- 4. Kersting AR, Bornberg-Bauer E, Moore AD, Grath S. Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution. Genome Biol Evol. 2012; 4: 316–329. 10.1093/gbe/evs004
- 5. Zhang XC, Wang Z, Zhang X, Le MH, Sun J, Xu D, et al. Evolutionary dynamics of protein domain architecture in plants. BMC Evol Biol. 2012; 12: 6. 10.1186/1471-2148-12-6
- 6. Basu D, Haswell ES. Plant mechanosensitive ion channels: an ocean of possibilities. Curr Opin Plant Biol. 2017; 40: 43–48. 10.1016/j.pbi.2017.07.002
- 7. Nakagawa Y, Katagiri T, Shinozaki K, Qi Z, Tatsumi H, Furuichi T, et al. Arabidopsis plasma membrane protein crucial for Ca2+ influx and touch sensing in roots. Proc Natl Acad Sci USA. 2007; 104: 3639–3644. 10.1073/pnas.0607703104
- 8. Furuichi T, Iida H, Sokabe M, Tatsumi H. Expression of Arabidopsis MCA1 enhanced mechanosensitive channel activity in the Xenopus laevis oocyte plasma membrane. Plant Signal Behav. 2012; 7: 1022–1026. 10.4161/psb.20783
- 9. Kurusu T, Kuchitsu K, Nakano M, Nakayama Y, Iida H. Plant mechanosensing and Ca2+ transport. Trends Plant Sci. 2013; 18: 227–233. 10.1016/j.tplants.2012.12.002
- 10. Yamanaka T, Nakagawa Y, Mori K, Nakano M, Imamura T, Kataoka H, et al. MCA1 and MCA2 that mediate Ca2+ uptake have distinct and overlapping roles in Arabidopsis. Plant Physiol. 2010; 152: 1284–1296. 10.1104/pp.109.147371
- 11. Kurusu T, Nishikawa D, Yamazaki Y, Gotoh M, Nakano M, Hamada H, et al. Plasma membrane protein OsMCA1 is involved in regulation of hypo-osmotic shock-induced Ca2+ influx and modulates generation of reactive oxygen species in cultured rice cells. BMC Plant Biol. 2012a; 12: 11. 10.1186/1471-2229-12-11
- 12.Liu Z, Cheng Q, Sun Y, Dai H, Song G, Guo Z, et al. A SNP in OsMCA1 responding for a plant architecture defect by deactivation of bioactive GA in rice. Plant Mol Biol. 2015; 87: 17–30. 10.1007/s11103-014-0257-y [DOI] [PubMed] [Google Scholar]
- 13.Liang J, He Y, Zhang Q, Wang W, Zhang Z. Plasma membrane Ca2+ permeable mechanosensitive channel OsDMT1 is involved in regulation of plant architecture and ion homeostasis in rice. Int J Mol Sci. 2020; 21: 1097. 10.3390/ijms21031097 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Kurusu T, Yamanaka T, Nakano M, Takiguchi A, Ogasawara Y, Hayashi T, et al. Involvement of the putative Ca2+-permeable mechanosensitive channels, NtMCA1 and NtMCA2, in Ca2+ uptake, Ca2+-dependent cell proliferation and mechanical stress-induced gene expression in tobacco (Nicotiana tabacum) BY-2 cells. J Plant Res. 2012b; 125: 555–568. 10.1007/s10265-011-0462-6 [DOI] [PubMed] [Google Scholar]
- 15.Rosa M, Abraham-Juárez MJ, Lewis MW, Fonseca JP, Tian W, Ramirez V, et al. 2017. The maize MID-COMPLEMENTING ACTIVITY homolog CELL NUMBER REGULATOR13/NARROW ODD DWARF coordinates organ growth and tissue patterning. Plant Cell 2017; 29: 474–490. 10.1105/tpc.16.00878 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Ohnishi M, Kadohama N, Suzuki Y, Kajiyama T, Shichijo C, Ishizaki K, et al. Involvement of Ca2+ in vacuole degradation caused by a rapid temperature decrease in Saintpaulia palisade cells: a case of gene expression analysis in a specialized small tissue. Plant Cell Physiol. 2015; 56: 1297–1305. 10.1093/pcp/pcv048 [DOI] [PubMed] [Google Scholar]
- 17.Nishii K, Hughes M, Briggs M, Haston E, Christie F, DeVilliers MJ, et al. Streptocarpus redefined to include all Afro-Malagasy Gesneriaceae: Molecular phylogenies prove congruent with geography and cytology and uncovers remarkable morphological homoplasies. Taxon 2015; 64: 1243–1274. 10.12705/646.8 [DOI] [Google Scholar]
- 18.Nakano M, Iida K, Nyunoya H, Iida H. Determination of structural regions important for Ca2+ uptake activity in Arabidopsis MCA1 and MCA2 expressed in yeast. Plant Cell Physiol. 2011; 52: 1915–1930. 10.1093/pcp/pcr131 [DOI] [PubMed] [Google Scholar]
- 19.Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, et al. Phylogeny and molecular evolution of the green algae. Critical Rev Plant Sci 2012; 31: 1–46. 10.1080/07352689.2011.615705 [DOI] [Google Scholar]
- 20.Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M. New approach for understanding genome variations in KEGG. Nucl Acids Res. 2019; 47: D590–D595. 10.1093/nar/gky962 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Leebens-Mack JH, Barker MS, Carpenter EJ, Deyholos MK, et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 2019; 574: 679–685. 10.1038/s41586-019-1693-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Medina R, Johnson MG, Liu Y, Wickett NJ, Shaw AJ, Goffinet B. Phylogenomic delineation of Physcomitrium (Bryophyta: Funariaceae) based on targeted sequencing of nuclear exons and their flanking regions rejects the retention of Physcomitrella, Physcomitridium and Aphanorrhegma. J Syst Evol. 2019; 57: 404–417. 10.1111/jse.12516 [DOI] [Google Scholar]
- 23.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015; 31: 3210–3212. 10.1093/bioinformatics/btv351 [DOI] [PubMed] [Google Scholar]
- 24.Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acid S. 1999; 41: 95–98. 10.1021/bk-1999-0734.ch008 [DOI] [Google Scholar]
- 25.Criscuolo A, Gribaldo S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 2010; 10: 210. 10.1186/1471-2148-10-210 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010; 59: 307–321. 10.1093/sysbio/syq010 [DOI] [PubMed] [Google Scholar]
- 27.Lefort V, Longueville J-E, Gascuel O. SMS: Smart Model Selection in PhyML." Mol Biol Evol. 2017; 34: 2422–2424. 10.1093/molbev/msx149 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Stamatakis A RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014; 30: 1312–1313. 10.1093/bioinformatics/btu033 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Milne I, Lindner D, Bayer M, Husmeier D, McGuire G, Marshall DF, et al. TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics. 2009; 25: 126–127. 10.1093/bioinformatics/btn575 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Chen K, Durand D, Farach-Colton M. NOTUNG: a program for dating gene duplications and optimizing gene family trees. J Comput Biol. 2000; 7: 429–447. 10.1089/106652700750050871 [DOI] [PubMed] [Google Scholar]
- 31.Stevens, P. F. 2001 onwards. Angiosperm Phylogeny Website. Version 14, July 2017 [and more or less continuously updated since] Available from: http://www.mobot.org/MOBOT/research/APweb/
- 32.Puttick MN, Morris JL, Williams TA, Cox CJ, Edwards D, Kenrick P, et al. The interrelationships of land plants and the nature of the ancestral embryophyte. Curr Biol. 2018; 28: 733–745 e732. 10.1016/j.cub.2018.01.063 [DOI] [PubMed] [Google Scholar]
- 33.R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2020. Available from: https://www.R-project.org/ [Google Scholar]
- 34.Wickham H. ggplot2: elegant graphics for data analyses. Springer-Varlag; New York. 2016. Available from: https://ggplot2.tidyverse.org [Google Scholar]
- 35.Yu G-C, Smith D, Zhu H, Guan Y, Tommy Lam T T-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017; 8: 28–36. 10.1111/2041-210X.12628 [DOI] [Google Scholar]
- 36.Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019; 35: 526–528. 10.1093/bioinformatics/bty633 [DOI] [PubMed] [Google Scholar]
- 37.Revell LJ. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012; 3: 217–223. 10.1111/j.2041-210X.2011.00169.x [DOI] [Google Scholar]
- 38.Edger PP, Jocelyn C. Hall JC, Harkess A, Tang M, Coombs J, Mohammadin S, et al. Brassicales phylogeny inferred from 72 plastid genes: A reanalysis of the phylogenetic localization of two paleopolyploid events and origin of novel chemical defenses. Amer J Bot. 2017; 105: 463–469. 10.1002/ajb2.1040 [DOI] [PubMed] [Google Scholar]
- 39.Nikolov LA, Shushkov P, Nevado B, Gan X, Al-Shehbaz IA, Filatov D, et al. Resolving the backbone of the Brassicaceae phylogeny for investigating trait diversity. New Phytol. 2019; 222: 1638–1651. 10.1111/nph.15732 [DOI] [PubMed] [Google Scholar]
- 40.Choi IS, Ruhlman TA, Jansen RK. Comparative mitogenome analysis of the genus Trifolium reveals independent gene fission of ccmFn and intracellular gene transfers in Fabaceae. Int J Mol Sci. 2020; 21: 6. 10.3390/ijms21061959 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Saarela JM, Burke SV, Wysocki WP, Barrett MD, Clark LG, Craine JM, et al. A 250 plastome phylogeny of the grass family (Poaceae): topological support under different data partitions. PeerJ. 2018; 6: e4299. 10.7717/peerj.4299 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Song WY, Martinoia E, Lee J, Kim D, Kim DY, Vogt E, et al. A novel family of cys-rich membrane proteins mediates cadmium resistance in Arabidopsis. Plant Physiol. 2004; 135: 1027–1039. 10.1104/pp.103.037739 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Metzger MB, Pruneda JN, Klevit RE, Weissman AM. RING-type E3 ligases: master manipulators of E2 ubiquitin-conjugating enzymes and ubiquitination. Biochim Biophys Acta. 2014; 1843: 47–60. 10.1016/j.bbamcr.2013.05.026 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Liao D, Cao Y, Sun X, Espinoza C, Nguyen CT, Liang Y, et al. Arabidopsis E3 ubiquitin ligase PLANT U-BOX13 (PUB13) regulates chitin receptor LYSIN MOTIF RECEPTOR KINASE5 (LYK5) protein abundance. New Phytol. 2017; 214: 1646–1656. 10.1111/nph.14472 [DOI] [PubMed] [Google Scholar]
- 45.Pradhan M. Understanding the regulatory basis of defense signalling in plants: The role of Argonautes in modulating defense responses in Nicotiana attenuate. PhD Dissertation Friedrich-Schiller- Universität Jena. 2019. [Google Scholar]
- 46.Song WY, Choi KS, Kim DY, Geisler M, Park J, Vincenzetti V, et al. Arabidopsis PCR2 is a zinc exporter involved in both zinc extrusion and long-distance zinc transport. Plant Cell 2010; 22: 2237–2252. 10.1105/tpc.109.070185 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Cox CD, Nakayama Y, Nomura T, Martinac B. The evolutionary ‘tinkering’ of MscS-like channels: generation of structural and functional diversity. Pflugers Arch.–Eur. J. Physiol. 2014; 467: 3–13. 10.1007/s00424-014-1522-2 [DOI] [PubMed] [Google Scholar]