Plant Physiology. 2020 Apr 14;183(2):637–655. doi: 10.1104/pp.19.01082

Insights into the Diversification and Evolution of R2R3-MYB Transcription Factors in Plants

Chen-Kun Jiang 1, Guang-Yuan Rao 1,2,3
PMCID: PMC7271803  PMID: 32291329

Gene duplications within a specific subfamily led to the expansion of one of the largest transcription factor families in land plants.

Abstract

As one of the largest families of transcription factors (TFs) in plants, R2R3-MYB proteins play crucial roles in regulating a series of plant-specific biological processes. Although the diversity of plant R2R3-MYB TFs has been studied previously, the processes and mechanisms underlying the expansion of these proteins remain unclear. Here, we performed evolutionary analyses of plant R2R3-MYB TFs with dense coverage of streptophyte algae and embryophytes. Our analyses revealed that ancestral land plants exhibited 10 subfamilies of R2R3-MYB proteins, among which orthologs of seven subfamilies were present in chlorophytes and charophycean algae. We found that asymmetric gene duplication events in different subfamilies account for the expansion of R2R3-MYB proteins in embryophytes. We further discovered that the largest subfamily of R2R3-MYBs in land plants, subfamily VIII, emerged in the common ancestor of Zygnematophyceae and embryophytes. During plant terrestrialization, six duplication events gave rise to seven clades of subfamily VIII. Subsequently, this TF subfamily showed a tendency for expansion in bryophytes, lycophytes, and ferns and extensively diversified in ancestral gymnosperms and angiosperms in clades VIII-A-1, VIII-D, and VIII-E. In contrast to subfamily VIII, other subfamilies of R2R3-MYB TFs have remained less expanded across embryophytes. The findings regarding phylogenetic analyses, auxiliary motifs, and DNA-binding specificities provide insight into the evolutionary history of plant R2R3-MYB TFs and shed light on the mechanisms underlying the extensive expansion and subsequent sub- and neofunctionalization of these proteins.


MYB transcription factors (TFs) are a group of pan-eukaryotic TFs defined by the presence of one to four MYB repeats of ∼50 amino acids forming three α-helices (Frampton et al., 1989; Stracke et al., 2001; Dubos et al., 2010; de Mendoza et al., 2013). According to the MYB repeat number and the identity of the MYB repeats, MYB proteins are generally classified as MYB-related, R2R3-MYB, R1R2R3-MYB, and 4R-MYB proteins (Jin and Martin, 1999; Riechmann and Ratcliffe, 2000; Stracke et al., 2001; Yanhui et al., 2006; Dubos et al., 2010). Both R2R3-MYB and R1R2R3-MYB proteins contain R2R3-MYB repeats, which are essential for their DNA binding (Sakura et al., 1989; Howe et al., 1990). MYB TFs are one of the largest classes of TFs in plants, and the size of this group has been mainly attributed to the rapid expansion of the R2R3-MYB TF family (Riechmann and Ratcliffe, 2000; Dubos et al., 2010).

Previous studies have shown that the number of R2R3-MYB TFs increased during the course of green plant evolution (Feller et al., 2011; Du et al., 2015; Bowman et al., 2017). Species within Chlorophyta generally exhibit fewer than 12 R2R3-MYB TFs, whereas the genome of the streptophyte alga Klebsormidium nitens encodes 22 R2R3-MYB TFs (Feller et al., 2011; Hori et al., 2014; Du et al., 2015; Bowman et al., 2017). Land plants generally express a greater number of R2R3-MYB TFs in comparison with chlorophytes and charophytes; however, most land plant R2R3-MYB TFs have no orthologs in chlorophytes and charophytes. Only orthologs of Arabidopsis (Arabidopsis thaliana) FOUR LIPS (FLP)/AtMYB124 and DUO POLLEN 1 (DUO1)/AtMYB125 have been identified in chlorophytes and/or charophytes (Du et al., 2015; Bowman et al., 2017; Higo et al., 2018). The genome and transcriptome sequencing of algae, especially charophytes, has provided insights into the evolution of several large TF families, including the lateral organ boundaries domain (LBD; Chanderbali et al., 2015), homeodomain (HD; Catarino et al., 2016; Romani et al., 2018), basic helix-loop-helix (bHLH; Catarino et al., 2016), and NAM/ATAF/CUC (NAC; Maugarny-Calès et al., 2016) families. However, the evolutionary history of R2R3-MYB TFs before plant terrestrialization remains unclear.

Previous studies have identified more than 20 R2R3-MYB genes in the genomes of most land plants (Du et al., 2015; Bowman et al., 2017), suggesting that R2R3-MYB TFs have become substantially expanded in such organisms. In comparison with bryophytes and lycophytes, seed plants generally exhibit more R2R3-MYB TFs (Feller et al., 2011; Brockington et al., 2013; Du et al., 2015; Bowman et al., 2017). The R2R3-MYB TFs of land plants have been classified into 23 to 90 subgroups in different studies (Kranz et al., 1998; Stracke et al., 2001, 2014; Jiang et al., 2004b; Wilkins et al., 2009; Dubos et al., 2010; Du et al., 2012a, 2012b, 2015; Li et al., 2012; Hou et al., 2014; Soler et al., 2015; Zhang et al., 2018; Yang et al., 2019). There is little consensus regarding the classification schemes of R2R3-MYB TFs, including the number of subgroups, primarily because of lineage-specific gene loss or expansion (Wilkins et al., 2009; Li et al., 2012; Du et al., 2015; Soler et al., 2015), but also because of differences in sampling coverage, type of applied phylogenetic analysis, and strategy used for subgroup designation. The R2R3-MYB proteins of each subgroup usually share similar functions (Dubos et al., 2010). For example, referring to Arabidopsis nomenclature, the R2R3-MYBs of subgroup 18 are involved in pollen or spore development (Millar and Gubler, 2005; Aya et al., 2011), whereas those of subgroup 9 regulate epidermal cell differentiation and cuticle development (Glover et al., 1998; Brockington et al., 2013; Oshima et al., 2013), and the members of subgroups 4 to 7 regulate flavonoid biosynthesis (Liu et al., 2015; Xu et al., 2015). The evolutionary history of R2R3-MYB TFs in land plants is of particular interest because of their remarkable genetic and functional diversity. However, previous studies of R2R3-MYB TFs have provided limited information about the emergence of R2R3-MYB subgroups and the phylogenetic relationships between and within subgroups. 
Moreover, the frequently used angiosperm-based classification of R2R3-MYBs is unsuitable for analyses of the diversity and evolutionary history of these TFs in other land plants (Bowman et al., 2017). In addition, highly conserved R2R3-MYB domains and limited lineage sampling are obstacles to studies aimed at inferring the evolutionary history and phylogenetic relationships of R2R3-MYB TFs with a high resolution (Brockington et al., 2013). Therefore, to trace the lineage-asymmetric expansion of R2R3-MYB TFs in land plants, targeted strategies for analyzing their ancestral and recent evolutionary history were used in the current study.

Auxiliary motifs outside of the MYB domain are significant signatures of closely related R2R3-MYB TFs (Kranz et al., 1998; Stracke et al., 2001). However, not only are the molecular functions of the most conserved motifs unknown (Millard et al., 2019), but the evolutionary histories of the auxiliary motifs are unclear. The evolution of the DNA-binding specificities of plant R2R3-MYB TFs is also obscure. Three types of DNA sites (MYB binding site I [MBSI], MBSII, and MBSIIG) for R2R3-MYB TF binding were described prior to the year 2000 (Solano et al., 1997; Romero et al., 1998), but a study in 2010 showed that the DNA-binding specificity of FLP is uniquely distinct from that of other R2R3-MYBs (Xie et al., 2010). In addition, although the R2R3-MYB TFs within a phylogenetic clade exhibit similar DNA-binding preferences, overlap of DNA-binding specificity has been observed in different groups of R2R3-MYBs (Romero et al., 1998).

R2R3-MYB TFs are involved in plant development, cell differentiation, secondary metabolism, and stress responses (Dubos et al., 2010; Li et al., 2015; Daneva et al., 2016; Ma and Constabel, 2019). Some R2R3-MYB TFs function as lineage-specific regulators, whereas others present conserved functions (Dubos et al., 2010; Aya et al., 2011; Albert et al., 2018; Higo et al., 2018; Yasui et al., 2019), suggesting that plant R2R3-MYB TFs exhibit diverse evolutionary patterns. To better understand the evolution of R2R3-MYB proteins, we used a broad plant sample from recently available transcriptome and genome databases. Here, we report the complicated evolutionary history of the R2R3-MYB family in plants. We concentrated on the early diversification events of R2R3-MYB TFs before the divergence of extant embryophyte lineages and further lineage-specific expansions. By combining multiple phylogenetic analyses, the diverse evolutionary episodes of this TF family were revealed, including the emergence of land plant R2R3-MYB proteins and the evolutionary events driving the expansion of R2R3-MYB TFs. Additionally, conserved auxiliary motifs within each R2R3-MYB subfamily were identified, and the primary DNA-binding specificities of many R2R3-MYBs in Arabidopsis were classified. Importantly, we identified an unrecognized subfamily of land plant R2R3-MYB TFs that emerged in the common ancestor of Zygnematophyceae and embryophytes and became the largest R2R3-MYB clade in land plants due to multiple gene duplications.

RESULTS

Ten Subfamilies of R2R3-MYB TFs in Land Plants

We obtained more than 4,300 sequences with R2R3-MYB domains from 87 species covering the major lineages of Archaeplastida (Table 1; Supplemental Fig. S1; Supplemental Table S1).

Table 1. Sampling as reported in previous phylogenetic analyses of proteins with the R2R3-MYB domain as compared to the current study.

This table summarizes MYB sampling across the Archaeplastida from six studies, with the number of species (left column) and number of sequences (right column) shown for each study.

Lineage                   Wilkins et al., 2009       Li et al., 2012    Soler et al., 2015       Du et al., 2015   Bowman et al., 2017            This Study
                            Species  Sequences    Species  Sequences    Species  Sequences    Species  Sequences    Species  Sequences    Species  Sequences
Rhodophytes                       0          0          0          0          0          0          1          4          0          0          3         11
Glaucophytes                      0          0          0          0          0          0          0          0          0          0          1          1
Chlorophytes                      0          0          0          0          0          0          4         17          2         16          6         35
Charophytes
  Mesostigmatophyceae             0          0          0          0          0          0          0          0          0          0          1          4
  Chlorokybophyceae               0          0          0          0          0          0          0          0          0          0          2         16
  Klebsormidiophyceae             0          0          0          0          0          0          0          0          1         23          2         30
  Charophyceae                    0          0          0          0          0          0          0          0          0          0          5         18
  Coleochaetophyceae              0          0          0          0          0          0          0          0          0          0          4         37
  Zygnematophyceae                0          0          0          0          0          0          0          0          0          0         13         89
Bryophytes                        1          2          0          0          0          0          1         52          2         79          3        120
Lycophytes                        0          0          0          0          0          0          1         11          1         18          1         42
Monilophytes                      0          0          0          0          0          0          0          0          0          0         15        377
Gymnosperms                       3         21          0          0          0          0          1        159          1        165          6        439
Angiosperms
  ANA grade                       0          0          0          0          0          0          1         55          1         55          2        141
  Magnoliids                      0          0          0          0          0          0          0          0          0          0          2        208
  Monocots                        4         12          1        102          1        106          2        265          0          0          4        592
  Basal eudicots                  0          0          0          0          0          0          1         77          0          0          2        214
  Core eudicots                  11        469          5        719          4        570          7        862          1        131         15      1,938
Total                            19        504          6        821          5        676         19      1,502          9        487         87      4,312
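The totals row of Table 1 can be cross-checked against the per-lineage counts. The short sketch below does this for the "This Study" column only; the dictionary simply transcribes the values from the table, and the variable names are illustrative rather than taken from the study.

```python
# (species, sequences) per lineage, transcribed from the "This Study"
# column of Table 1. Variable names are illustrative assumptions.
this_study = {
    "Rhodophytes": (3, 11),
    "Glaucophytes": (1, 1),
    "Chlorophytes": (6, 35),
    "Mesostigmatophyceae": (1, 4),
    "Chlorokybophyceae": (2, 16),
    "Klebsormidiophyceae": (2, 30),
    "Charophyceae": (5, 18),
    "Coleochaetophyceae": (4, 37),
    "Zygnematophyceae": (13, 89),
    "Bryophytes": (3, 120),
    "Lycophytes": (1, 42),
    "Monilophytes": (15, 377),
    "Gymnosperms": (6, 439),
    "ANA grade": (2, 141),
    "Magnoliids": (2, 208),
    "Monocots": (4, 592),
    "Basal eudicots": (2, 214),
    "Core eudicots": (15, 1938),
}

# Sum each column separately and compare with the reported totals.
total_species = sum(species for species, _ in this_study.values())
total_sequences = sum(seqs for _, seqs in this_study.values())

print(total_species, total_sequences)  # 87 4312
```

The sums match the reported totals of 87 species and 4,312 sequences.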

Previous phylogenetic analyses of R2R3-MYBs showed that increasing the number of sequences in the dataset could reduce support values for shallow and deep branches (Du et al., 2015; Soler et al., 2015). Genome-wide duplications have contributed to the expansion of R2R3-MYB TFs in land plants (Du et al., 2015). To balance the resolution and species sampling, we first analyzed the phylogeny of the R2R3-MYB and R1R2R3-MYB proteins in Marchantia polymorpha, Selaginella moellendorffii, and Amborella trichopoda, whose genomes present no recent whole-genome duplications or polyploidizations (Banks et al., 2011; Amborella Genome Project, 2013; Bowman et al., 2017). We rooted the tree with the CDC5s, a group of pan-eukaryotic MYB proteins that diverged from R2R3-MYB and R1R2R3-MYB in the most recent common ancestor of eukaryotes (Du et al., 2015). The phylogeny revealed that the R2R3-MYB TFs clustered into 10 monophyletic clades, whereas the R1R2R3-MYBs were paraphyletic (Fig. 1A; Supplemental Fig. S2). The branch length of the ASYMMETRIC LEAVES1/ROUGH SHEATH1/PHANTASTICA (ARP) clade was long, similar to the extraordinarily long branch of this clade identified in previous studies (Stracke et al., 2001; Jiang et al., 2004a; Du et al., 2012b; Soler et al., 2015). This clade had the lowest leaf stability index (Supplemental Fig. S2), which implied that the placement of the ARP clade was unstable. Placement of the other nine monophyletic clades did not change after the ARPs were removed, and the support value for clade VIII thus increased to 95 (Supplemental Fig. S2). The 10 clades of R2R3-MYB TFs were still retained after the fern Salvinia cucullata and the gymnosperm Picea abies were included in the dataset (Fig. 1F; Supplemental Fig. S3). Phylogenetic reanalysis of R2R3-MYB datasets from previous studies (Du et al., 2015; Bowman et al., 2017) also showed the presence of ARPs and the remaining nine monophyletic clades (Supplemental Fig. S4), confirming that our findings are not artifacts of biased sampling. Therefore, these 10 clades were designated as land plant R2R3-MYB subfamilies (Fig. 1, A and E). The analyses of intron-exon structure and conserved auxiliary motifs outside of the R2R3-MYB domain also supported the identification of 10 subfamilies of R2R3-MYB TFs in land plants, with few exceptions (Fig. 1, B–E; Supplemental Figs. S5 and S6; Supplemental Table S2). All 10 R2R3-MYB subfamilies contained sequences from M. polymorpha and S. moellendorffii (Fig. 1, A and F).

Figure 1.


Phylogenetic analysis of proteins with R2R3-MYB domains in representative land plants. A, Maximum likelihood (ML) phylogeny of proteins with R2R3-MYB domains in M. polymorpha, S. moellendorffii, and A. trichopoda. The MYB protein CDC5 sequences were used as the outgroup. Ultrafast bootstrap values are associated with the internal branches; values <50 were omitted. R1R2R3-MYB TFs and 10 subfamilies of R2R3-MYB TFs are labeled. The bracketed text indicates the source database or the sequence amplified in this study. B, The presence or absence of subfamily-specific auxiliary motifs of R2R3-MYB TFs is indicated by a solid or open square, respectively, with number labels corresponding to the terminal nodes of the phylogenetic tree. C, Intron-exon structure of MYB genes in the R2R3-MYB-encoding region. D, Subfamily-specific amino acid auxiliary motifs (gray boxes) of land plant R2R3-MYB TFs. The diagrams are not drawn to scale. E, Subfamily VIII R2R3-MYB protein sequences of motifs 31 (left column) and 32 (right column). F, ML phylogeny of proteins with R2R3-MYB domains in five representative species. R1R2R3-MYB TFs and 10 subfamilies of R2R3-MYB TFs are labeled and collapsed into triangles. Ultrafast bootstrap values are associated with the internal branches. The color of the solid circles in A, E, and F indicates the source of the sequence: green, bryophytes; orange, lycophytes; yellow, ferns; purple, gymnosperms; and red, angiosperms.

Seven Land Plant R2R3-MYB Subfamilies Emerged in Algae

Next, we explored whether orthologs of the 10 subfamilies of land plant R2R3-MYBs are present in algae. The phylogenetic tree of algal and Marchantia R2R3-MYB TFs showed that land plant R2R3-MYB proteins did not reside in a single monophyletic clade (Fig. 2A). Five clades containing green plant sequences with high support values (>90) were identified in the phylogenetic tree (Fig. 2A). The unrooted tree also supported the presence of the five monophyletic groups (Supplemental Fig. S7). The second clade, consisting of the R1R2R3-MYB proteins, was omitted from subsequent analyses.

Figure 2.


Phylogenetic analysis of proteins with R2R3-MYB domains in glaucophytes, red algae, and green plants. A, ML phylogram of proteins with R2R3-MYB domains in algae and M. polymorpha. The MYB protein CDC5 sequences were used as the outgroup. Ultrafast bootstrap values are associated with the internal branches; values <50 were omitted. Orthologs of land plant R2R3-MYB proteins from subfamilies FLP, II, V, ARP, VI, VII, and VIII are labeled in gray blocks. The key applies to the sequences in A to C. B, Sequences of auxiliary motifs of R2R3-MYB TFs from subfamilies FLP, II, V, ARP, VI, VII, and VIII in land plants and their charophycean orthologs. C, Intron-exon structure of MYB genes from subfamilies FLP, V, ARP, VI, and VIII, as well as their algal orthologs in the R2R3-MYB domain-encoding region. D, The presence or absence of orthologs of land plant R2R3-MYB subfamilies in the plant lineage is indicated by solid or open squares, respectively.

The first clade, with a support value of 100, contained a sequence from the chlorophyte Chlamydomonas reinhardtii and two FLPs from M. polymorpha; the remaining sequences were from streptophyte algae (Fig. 2A). R2R3-MYBs of subfamilies I and II from liverwort were identified in the third clade, which consisted only of streptophyte sequences (Fig. 2A). Sequences of both subfamilies I and II from M. polymorpha clustered with basal charophyte sequences with high support. In the fourth clade, 10 chlorophyte sequences did not cluster on a single branch, and dozens of charophyte branches were identified (Fig. 2A), which indicated that the R2R3-MYBs of this clade have diversified along with green plant evolution. M. polymorpha sequences from subfamilies III, IV, and V were included in this clade and were placed near charophycean sequences; however, only subfamily V R2R3-MYB TFs constituted a monophyletic group with charophyte sequences, with a support value of 100. Similar to the third clade, the fifth clade contained only streptophyte sequences. Interestingly, no sequences of Mesostigmatophyceae or Chlorokybophyceae were identified in the fifth clade (Fig. 2A), indicating that this clade diversified after the split between the Mesostigmatophyceae-Chlorokybophyceae lineage and the lineage leading to the remaining streptophytes. Notably, subfamily ARP, VI, VII, and VIII sequences from M. polymorpha clustered with charophyte sequences with strong support (≥93; Fig. 2A).

Analyses of the gene structure and protein sequences of land plant R2R3-MYBs and their candidate charophyte orthologs indicate that some of the charophyte sequences contained conserved auxiliary motifs and exhibited intron-exon structures identical to those of the corresponding land plant subfamilies (Fig. 2, B and C). No additional evidence was found to support the orthology of the charophyte Mesvi458S06219/Chrsp259S00284 sequences to liverwort Mapoly0085s0092, a subfamily I sequence. In summary, these results strongly support the emergence of plant R2R3-MYB subfamilies II, V, ARP, VI, VII, and VIII in the streptophyte alga ancestor of land plants, and that the emergence of subfamily FLP could date back to ancestral green plants (Fig. 2D).

Land Plant Subfamily VIII R2R3-MYB TFs

More than half of the R2R3-MYB TFs within land plant species belonged to subfamily VIII (Fig. 1A). Next, we analyzed the diversification of the R2R3-MYB TFs of subfamily VIII from seven species of major lineages of embryophytes, including M. polymorpha, Physcomitrella patens, S. moellendorffii, S. cucullata, P. abies, A. trichopoda, and Arabidopsis. The subfamily VIII R2R3-MYB TFs from land plants formed a monophyletic group when Zygnematophyceae sequences were used as the outgroup (Fig. 3A). Four branches were identified in the land plant subtree. The first branch was composed of two monophyletic clades (VIII-A-1 and VIII-A-2), and the second branch contained three clades (VIII-b, VIII-B, and VIII-C). The remaining two branches were VIII-D and VIII-E. Notably, clades VIII-A-1, VIII-D, and VIII-E were shown to be extensively diversified in seed plants (Fig. 3A; Supplemental Fig. S8).

Figure 3.


Phylogenetic analysis of subfamily VIII R2R3-MYB TFs. A, ML phylogram of R2R3-MYB proteins of subfamily VIII in representative land plants. Clades within subfamily VIII are labeled and collapsed into triangles. Sequences from Zygnematophyceae were used as the outgroup. Ultrafast bootstrap values are associated with the internal branches. B, Schematic and alignment of VIII-A-, VIII-b/B-, and VIII-D-specific auxiliary motifs. Diagrams are not drawn to scale. C, Hypothetical evolutionary model of subfamily VIII TFs before the diversification of extant land plants. Circles represent inferred gene duplication events. Question marks indicate that there is conflict between motif analysis and topology.

Focusing on the region outside of the R2R3-MYB domain and subfamily-specific motifs, clade-specific motifs were identified in VIII-A, VIII-b/B, and VIII-D (Fig. 3B). Some R2R3-MYB TFs belonging to VIII-A-1 and VIII-A-2 shared an exclusive auxiliary motif, VIII-(1), at the C terminus, suggesting that duplication of the ancestral VIII-A R2R3-MYB gene gave rise to the VIII-A-1/2 clades (Fig. 3, B and C). Similarly, clades VIII-b and VIII-B may share an ancestral MYB gene, since some proteins from both VIII-b and VIII-B contained conserved motifs (Fig. 3, B and C). The finding that all of the major clades in subfamily VIII contained bryophyte sequences indicated that the land plant subfamily VIII MYB TFs were derived from six deep duplications of an ancestral subfamily VIII R2R3-MYB gene before the divergence of extant land plant lineages (Fig. 3C).
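The relationship between seven clades and six duplications follows from simple tree arithmetic: a fully bifurcating gene tree with n tips has n - 1 internal splits, each of which corresponds to one duplication when all tips are paralogs. The sketch below illustrates this with nested tuples. The pairing of VIII-A-1 with VIII-A-2 and of VIII-b with VIII-B follows the text, but the arrangement of the deeper branches is an arbitrary assumption made only so the tree is fully resolved; it is not the topology inferred in this study, and the function name is hypothetical.

```python
def count_tips_and_splits(tree):
    """Return (tips, splits) for a tree written as nested tuples of names.

    A node with k children contributes k - 1 splits, so a bifurcating
    node contributes exactly one inferred duplication.
    """
    if isinstance(tree, str):
        return 1, 0
    tips = 0
    splits = len(tree) - 1
    for child in tree:
        child_tips, child_splits = count_tips_and_splits(child)
        tips += child_tips
        splits += child_splits
    return tips, splits


# Assumed, illustrative arrangement of the seven subfamily VIII clades.
# Only the pairs (A-1, A-2) and (b, B) are suggested by shared motifs.
subfamily_viii = (
    (("VIII-A-1", "VIII-A-2"), (("VIII-b", "VIII-B"), "VIII-C")),
    ("VIII-D", "VIII-E"),
)

tips, duplications = count_tips_and_splits(subfamily_viii)
print(tips, duplications)  # 7 6
```

Any other fully resolved arrangement of the same seven clades gives the same count, which is why the number of duplications does not depend on the unresolved deep branching order.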

Gene Duplications of R2R3-MYB Genes and Their Extraordinary Expansion

The evolutionary history of each subfamily of land plant R2R3-MYB TFs was explored. Phylogenetic analyses showed that R2R3-MYB genes from subfamilies II, IV, V, and ARP were duplicated in the common ancestor of angiosperms (Fig. 4A; Supplemental Fig. S9) and that R2R3-MYBs from subfamilies IV and VII were duplicated before the divergence of gymnosperms and angiosperms (Supplemental Fig. S9). Lycophyte- and gymnosperm-specific gene duplication of R2R3-MYBs was identified in subfamilies VI and III, respectively (Fig. 4, B and C; Supplemental Fig. S9). Most subfamilies of land plant R2R3-MYB TFs were not duplicated before embryophyte diversification (Supplemental Fig. S9). Land plant MYB sequences from subfamily FLP constitute a paraphyletic group, suggesting that the ancestral gene of subfamily FLP may have been duplicated before plant terrestrialization (Supplemental Fig. S9).

Figure 4.


ML phylograms of representative subfamilies/clades of R2R3-MYB TFs in streptophytes. Phylogenies of subfamily II (A), subfamily III (B), subfamily VI (C), clade A-2 of subfamily VIII (D), and clade B of subfamily VIII (E). The color of the terminal node indicates the source of the sequence, as shown in the inset. Ultrafast bootstrap values are associated with the internal branches; values <50 were omitted. The Greek letter in the colored square indicates the name of the subclade, whereas an X indicates the presence of a within-clade duplication in core eudicots. The presence or absence of subfamily/clade-specific auxiliary motifs of R2R3-MYB TFs is indicated by solid and open squares, respectively, corresponding to the terminal nodes of the phylogenetic tree.

Next, we focused on the evolution of the largest subfamily of land plant R2R3-MYB TFs, namely subfamily VIII. One to three duplication events occurred in the common ancestor of seed plants and angiosperms in clades VIII-A-2, VIII-B, and VIII-C (Fig. 4D; Supplemental Fig. S10). The unrooted trees of the remaining three clades, VIII-A-1, VIII-D, and VIII-E, were constructed to explore their diversification (Fig. 5; Supplemental Fig. S11). We found that the topologies of the three unrooted trees were incongruent with the land plant phylogeny. The bryophyte sequences constituted only a few monophyletic groups in the three clades, and the lycophyte sequences diversified into four groups in VIII-A-1 but showed no expansion in VIII-D and VIII-E (Fig. 5; Supplemental Figs. S10 and S11). We identified eight monophyletic groups consisting of euphyllophyte or seed plant sequences in the unrooted trees, including VIII-A-1-[s1]/[s2], VIII-D-[e1]/[s1]/[s2], and VIII-E-[e1]/[e2]/[s1] (Fig. 5). The results indicate that the ancestral genes of these monophyletic groups emerged before the diversification of extant euphyllophytes (i.e. ferns and seed plants) or seed plants, and the emergence of the ancestral genes was possibly caused by deep gene duplications. In addition to deep gene duplications, the detailed phylogenetic analyses showed that lineage-specific radiations within ferns, gymnosperms, and angiosperms have also played crucial roles in the expansion of clades VIII-A-1, VIII-D, and VIII-E (Supplemental Fig. S10). For example, gene duplications occurred at least four and two times in the lineages leading to gymnosperms and angiosperms, respectively, as identified in VIII-D-[s1] (Supplemental Fig. S10).

Figure 5.


Unrooted trees of clades VIII-A-1 (A), VIII-D (B), and VIII-E (C) from representative land plants. Gray-shaded areas represent clades with sequences from multiple gymnosperm and angiosperm species; ultrafast bootstrap values of the clades are indicated in the basal region. The letters and colors of the solid circles indicate the sources of the sequences: green, bryophytes; orange, lycophytes; yellow, ferns; purple, gymnosperms; and red, angiosperms. Green circles with a blue outline indicate sequences of mosses.

Some lineage-specific subclades and sequences were identified in subfamily VIII within angiosperms, such as the AtMYB47/95, AtMYB28/29/76, and AtMYB34/51/122 sequences, which belonged to Arabidopsis-specific subclades in clade VIII-D (Fig. 6, A and B; Supplemental Fig. S10). Additionally, we found that VIII-D-[s2]-ε is a core eudicot-specific subclade (Fig. 6A; Supplemental Fig. S10).

Figure 6.


Summary of the evolutionary history of R2R3-MYB TFs in land plants and their algal orthologs. A, Schematic diagram of the evolution of R2R3-MYB TFs in each subfamily. The schematic representation is dependent on the phylogenetic analyses conducted in this study (Supplemental Figs. S9 and S10). Black lines indicate the phylogeny of green plants based on Wickett et al. (2014), The Angiosperm Phylogeny Group (2016), Puttick et al. (2018), and One Thousand Plant Transcriptomes Initiative (2019). Squares on the black lines indicate the presence of lineage-specific subclades of R2R3-MYB TFs; a rounded square represents an Arabidopsis-specific subclade. A Greek letter in a square indicates the name of the subclade, an X indicates the presence of within-clade duplication in core eudicots, an L indicates that the sequence is likely to be a member of an existing subclade, and a question mark indicates that the evolutionary pattern of the subclade is questionable. A number in the square indicates the number of related subclades in the phylogenetic tree. B, Classification of R2R3-MYB TFs in Arabidopsis according to our analyses. The placement of TFs corresponds with their respective position (squares) in A (to the left). C, Summary of the classification systems of plant R2R3-MYB TFs in previous studies. Rectangles represent the R2R3-MYB subgroup(s) identified in the study and are labeled with the name of the subgroup/branch; strikethrough of the Arabidopsis R2R3-MYB TF name indicates that the TF was not included in the subgroup.

Several R2R3-MYB gene-loss events were identified in land plants (Fig. 6; Supplemental Figs. S9 and S10). For example, no R2R3-MYB TFs of subfamily I were identified in euphyllophytes. Similarly, the genomes of ferns did not encode R2R3-MYB TFs of subfamily VI or clade VIII-C. No VIII-b sequences were identified in vascular plants. In subfamily VIII, some clades only contained sequences from either liverworts or mosses: clades VIII-A-2 and VIII-B did not contain sequences from M. polymorpha, whereas no moss sequences were identified in VIII-C (Supplemental Fig. S10).

Evolution of Conserved Motifs of R2R3-MYB TFs

We found that the loss of subfamily-specific auxiliary motifs of R2R3-MYB TFs is common. Sequences from some monophyletic clades have completely lost their auxiliary motif(s). For example, motif 1 was only identified in the R2R3-MYBs of subfamily FLP from Coleochaetophyceae and euphyllophytes (Supplemental Fig. S9; Supplemental Table S2). Subfamily-specific motifs 10 and 11 were absent in angiosperm subclades IV-α and IV-β, and auxiliary motifs 27 and 29 were not detected in the sequences from angiosperm subclade VII-α (Supplemental Fig. S9). In subfamily VIII, clades VIII-b and VIII-E-[e1] did not contain motif 31, whereas the sequences in VIII-C and VIII-E-[e1] did not contain motif 32 (Supplemental Fig. S10).

The loss of the clade-specific motifs of VIII-A, VIII-B, and VIII-D accompanied land plant evolution. In clade VIII-A-1, motif VIII-(1) was haphazardly distributed among land plants (Supplemental Fig. S10); motif VIII-(1) was not present in the angiosperm sequences of VIII-A-2 (Fig. 4D). Both motifs VIII-(3) and VIII-(5) of VIII-b/B were absent from angiosperm subclade VIII-B-β (Fig. 4E). In VIII-D, motifs VIII-(6) and VIII-(7) were only present in a few angiosperm subclades, although many VIII-D sequences from nonflowering plants contained these motifs (Supplemental Fig. S10).

The conserved amino acid signature ([D/E]LX2[R/K]X3LX6LX3R) in the R3 repeat, involved in the interaction with R/B-like bHLH proteins to form the MYB-bHLH-WDR (WD repeat) TF complex (Zimmermann et al., 2004; Feller et al., 2011; Xu et al., 2015), was only present in the R2R3-MYB TFs of VIII-E, except for clade VIII-E-[e1] (Supplemental Fig. S8). This finding indicates that the capacity to form a protein complex with bHLH and WDR TFs emerged in the ancestral VIII-E TFs.
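The bHLH-interaction signature quoted above, [D/E]LX2[R/K]X3LX6LX3R, translates directly into a regular expression, where X stands for any residue. The following is a minimal sketch of how a protein sequence could be scanned for it; the function name is hypothetical, the example sequence is synthetic (built only to satisfy the pattern, not taken from any real protein), and this is not the search procedure used in the study.

```python
import re

# [D/E] L X2 [R/K] X3 L X6 L X3 R, with X = any single residue.
BHLH_SIGNATURE = re.compile(r"[DE]L.{2}[RK].{3}L.{6}L.{3}R")


def find_bhlh_signature(protein_seq):
    """Return (start, matched_text) for each non-overlapping match.

    Positions are 0-based offsets into the input sequence.
    """
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in BHLH_SIGNATURE.finditer(seq)]


# Synthetic 20-residue stretch that satisfies the signature, with
# arbitrary flanking residues. Alanine is used as a neutral filler.
synthetic_motif = "DLAARAAALAAAAAALAAAR"
hits = find_bhlh_signature("MG" + synthetic_motif + "GG")
print(hits)  # [(2, 'DLAARAAALAAAAAALAAAR')]
```

Because `re.finditer` reports non-overlapping matches only, a sequence with overlapping candidate sites would need a lookahead-based pattern instead.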

Three Types of Primary DNA-Binding Specificities of R2R3-MYB TFs

Three types of primary DNA-binding specificities of R2R3-MYB TFs were identified from eight subfamilies (Fig. 7; Supplemental Fig. S12). Type 1 specificities were found in the seven subfamilies of R2R3-MYB TFs (II–VIII), in addition to R1R2R3-MYB TFs. Intriguingly, we observed that type 2 DNA-binding motifs were the target sequences of R2R3-MYBs from subfamily VIII, rather than targets of other subfamilies. The third type was the specificity of R2R3-MYB TFs AtMYB88 and AtMYB124 of subfamily FLP.

Figure 7.


Classification of the primary DNA-binding specificities of mouse c-Myb and MYB proteins from Arabidopsis. A, Network graph of the pairwise similarities of DNA-binding specificities of MYB proteins. The nodes represent the DNA-binding specificity of the MYB proteins, and the edges represent pairwise similarities (S) >1.5e−4. Dashed rings indicate clusters of binding specificities. B, DNA-binding specificities of MYB TFs clustered in the UPGMA tree. Notably, the UPGMA tree is rooted at the midpoint, and the topology does not indicate the evolutionary history of the DNA-binding specificities. The numbers indicate the position of a base in the DNA-binding profile (Solano et al., 1997). The color of the nodes in A and B indicates the source of the MYB protein.
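The network in Figure 7A draws an edge between two binding specificities when their pairwise similarity S exceeds 1.5e−4. A minimal sketch of that thresholding step is given below. Grouping nodes by connected components is an assumption made here for illustration; the caption does not state how the dashed-ring clusters were delimited. The node names and similarity values are placeholders, not data from the study.

```python
from collections import defaultdict

SIMILARITY_THRESHOLD = 1.5e-4  # edge cutoff quoted in the Figure 7 legend


def threshold_clusters(similarities, threshold=SIMILARITY_THRESHOLD):
    """Group nodes into connected components of the thresholded graph.

    `similarities` maps an unordered pair (a, b) to a similarity score.
    An edge is kept only when the score is strictly above the threshold.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), score in similarities.items():
        nodes.update((a, b))
        if score > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Placeholder scores for five hypothetical binding profiles.
toy_scores = {
    ("A", "B"): 3e-4,
    ("B", "C"): 2e-4,
    ("C", "D"): 1e-5,
    ("D", "E"): 5e-4,
    ("A", "D"): 1.5e-4,  # equal to the cutoff, so no edge is drawn
}
print(threshold_clusters(toy_scores))  # [['A', 'B', 'C'], ['D', 'E']]
```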

DISCUSSION

The Evolution of MYB Proteins in Plants Has Involved Changes in the Number of MYB Repeats

MYB-related, R2R3-MYB, R1R2R3-MYB, and 4R-MYB proteins are recognized as separate groups in the MYB TF superfamily because they contain different numbers of MYB repeats (Jin and Martin, 1999; Riechmann and Ratcliffe, 2000; Stracke et al., 2001; Yanhui et al., 2006; Dubos et al., 2010). We found that some MYB proteins with two tandem MYB repeats are orthologous to R1R2R3-MYB or 4R-MYB TFs (Supplemental Fig. S13). The number of tandem MYB repeats in the R1R2R3-MYB and 4R-MYB proteins may change during plant evolution, which indicates that in addition to the number of MYB repeats, phylogenetic relationships should be considered in the classification and identification of MYB proteins.
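The point that repeat number alone can mislabel a protein can be made concrete with a small sketch. The function below assigns a class from the repeat count but lets a phylogenetically inferred ortholog group, when one is available, take precedence. The function name, its parameters, and the mapping of one repeat to the MYB-related class are illustrative assumptions rather than the classification procedure used in this study.

```python
from typing import Optional

# Conventional count-based labels (assumed mapping for illustration).
COUNT_LABELS = {
    1: "MYB-related",
    2: "R2R3-MYB",
    3: "R1R2R3-MYB",
    4: "4R-MYB",
}


def classify_myb(n_repeats: int, ortholog_group: Optional[str] = None) -> str:
    """Label a MYB protein, preferring phylogenetic evidence over repeat count."""
    if ortholog_group is not None:
        # Orthology to a known group overrides the count-based label.
        return ortholog_group
    if n_repeats not in COUNT_LABELS:
        raise ValueError(f"expected 1 to 4 MYB repeats, got {n_repeats}")
    return COUNT_LABELS[n_repeats]


# A two-repeat protein is called R2R3-MYB by count alone...
print(classify_myb(2))
# ...but is assigned to R1R2R3-MYB if the phylogeny places it there.
print(classify_myb(2, ortholog_group="R1R2R3-MYB"))
```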

It has been hypothesized that R2R3-MYB proteins were derived from R1R2R3-MYB proteins via the loss of the first MYB repeat (Rosinski and Atchley, 1998). An alternative hypothesis suggests that R1R2R3-MYB TFs emerged when R2R3-MYB TFs gained the MYB repeat R1 (Jiang et al., 2004a). This “gain-of-repeat” hypothesis is based on phylogenetic reconstruction by neighbor joining, which supports the notion that ARP is sister to R1R2R3-MYB and the remaining R2R3-MYB TFs (Jiang et al., 2004a). The gain-of-repeat hypothesis is not in accord with our finding that ARP is a clade of streptophyte-specific R2R3-MYB TFs (Fig. 2). Therefore, we infer that R1R2R3-MYB proteins did not evolve from R2R3-MYB TFs through the gain of a MYB repeat.

The placement of subfamily FLP is close to the R1R2R3-MYB TFs (Fig. 1, A and F), which was also noted previously (Kelemen et al., 2015). Our analysis of the intron-exon structure of the R2R3-MYB domain showed that land plant FLP and several R2R3-MYB genes in algae shared the same intron positions and phases as R1R2R3-MYB genes (Supplemental Fig. S14). It is likely that the shared intron positions were derived from a common ancestral MYB gene rather than having evolved independently.

Identification of R2R3-MYB TFs Is Influenced by Quality of Genome Annotation and Assembly

Previous analyses identified fewer than 20 R2R3-MYB genes in the genome of the spikemoss S. moellendorffii (Du et al., 2015; Bowman et al., 2017); however, 41 R2R3-MYB genes were identified in the genome of S. moellendorffii in this study via the molecular cloning and in silico analysis of two versions of the genome annotation (Phytozome and the National Center for Biotechnology Information) after removing alleles. In the Phytozome version of the genome of S. moellendorffii, the corresponding loci of the R2R3-MYB genes identified and cloned here encode proteins without the full-length R2R3-MYB domain (Supplemental Fig. S15). Similarly, the annotated coding sequences (CDSs) of five R2R3-MYB genes in A. trichopoda encode proteins with only one MYB repeat, which is different from their transcriptome assemblies (Supplemental Fig. S16). The discrepancies indicate that the misidentification of sequences could occur due to low-quality genome annotation. We also found that the genomic sequences of many potential R2R3-MYB genes were located at the edges of scaffolds or had assembly gaps, which hampered identification and subsequent analyses (Supplemental File S1). Therefore, the number of R2R3-MYB genes within some species is probably underestimated; more R2R3-MYB genes might be identified as the quality of genome assembly and annotation is improved.

Our analyses lay a solid foundation for further evolutionary studies of R2R3-MYB TFs. Further data sources and the improvement of genome annotation will inevitably refine the evolutionary history of plant R2R3-MYB TFs and improve our understanding of their diversification. Detailed evolutionary analysis within a subfamily or a clade using denser taxonomic sampling would be helpful to trace the processes and mechanisms of gene duplications.

Increased Diversity of R2R3-MYB TFs during Green Plant Evolution

Compared with rhodophytes and glaucophytes, chlorophytes and charophycean algae exhibit a greater number of R2R3-MYB TFs (Fig. 2; Supplemental Fig. S1). Many chlorophyte- and charophyte-specific sequences or clades of R2R3-MYB TFs were found in the phylogeny, which indicates that the diversification of R2R3-MYB TFs in green plants initially occurred in water. The results also imply that our classification scheme of R2R3-MYB subfamilies is applicable to the sequences of land plants and, to a limited extent, those of algae.

Similar to TFs from the LBD, HD, bHLH, and NAC families (Chanderbali et al., 2015; Catarino et al., 2016; Maugarny-Calès et al., 2016; Romani et al., 2018), the subfamilies of land plant R2R3-MYBs are not exclusive to embryophytes; TFs from R2R3-MYB subfamilies FLP, II, V, ARP, VI, VII, and VIII emerged in the aquatic ancestor of land plants (Fig. 2). This study is the first to identify orthologs of the R2R3-MYB protein subfamilies II, V, ARP, VII, and VIII in streptophyte algae. There are three possible reasons for the absence of subfamilies I, III, and IV in charophytes. The first is that their gene expression has not been detected by transcriptomic studies (Gao et al., 2018). Second, orthologs of these sequences may have been lost in streptophyte algae. The third possibility is that these sequences may have emerged after the split of Zygnematophyceae and the lineage leading to embryophytes.

The exon-intron structures of the R2R3-MYB genes of subfamilies V and VI from land plants and charophytes are different (Fig. 2C), indicating that the structures of the ancestral genes of these subfamilies changed before the divergence of major land plant lineages. The absence of motif 23 of subfamily VI and the lack of subfamily VII motifs 25 to 30 in the charophycean orthologs (Fig. 2B; Supplemental Fig. S8) suggest that these proteins may have undergone structural evolution in the common ancestor of land plants. Similarly, the miR319/miR159 target site is absent in subfamily VII genes from Coleochaete irregularis and Spirogyra sp. (Supplemental Fig. S8), suggesting that this subfamily gained this site and became subject to microRNA regulation during plant terrestrialization (Lin and Bowman, 2018).

Orthologs of the land plant subfamily VI R2R3-MYB gene DUO1 have been recently identified in streptophyte algae (Higo et al., 2018). A phylogenetic analysis by Higo et al. (2018) indicated that the DUO1-like genes of Charophyceae and Zygnematophyceae constitute a monophyletic group that is sister to land plant DUO1 genes. However, our results strongly indicate (support value = 99) that the Charophyceae DUO1 sequences form a sister group to the land plant subfamily VI R2R3-MYB TF, whereas the Zygnematophyceae DUO1-like sequences are more distant, being positioned as sister to the clade containing the first two groups (Fig. 2A). We found that motif 22 is shared by the embryophyte subfamily VI R2R3-MYB TFs and their Charophyceae orthologs, but the motif is absent in the sequences of Zygnematophyceae (Fig. 2B). Considering both the phylogenetic relationships and the primary structure of the proteins, it is possible that independent gene loss after gene duplication contributed to the early evolution of the subfamily VI R2R3-MYB gene DUO1.

Recent studies have revealed that there are differences in the expression patterns and functions of closely related R2R3-MYB and NAC TFs from land plants and Chara sp., indicating that the changes in the expression and neofunctionalization of these genes occurred after the split of Charophyceae and the lineage leading to embryophytes (Higo et al., 2018; Bonnot et al., 2019). It would be interesting to explore whether the neofunctionalization of the R2R3-MYB TFs of subfamilies FLP, II, V, ARP, VII, and VIII occurred during plant terrestrialization to rewire regulatory networks to assist early embryophytes in adapting to land.

A Classification Scheme for R2R3-MYB TFs in Land Plants

This study provides a classification scheme for R2R3-MYB TFs in land plants (Fig. 6) to replace the previous schemes that classified this group of TFs into dozens of “subgroups”. There are three major reasons in support of our classification scheme.

First, previous analyses used an all-in-one strategy for phylogenetic analyses, in which an overall phylogenetic tree was constructed to explore the evolution of all R2R3-MYB TFs. However, only the R2R3-MYB domain was used for these analyses, inevitably decreasing the resolution of the subtrees. Additionally, the inclusion of all R2R3-MYB sequences may reduce the support values in shallow and deep clades of the phylogenetic trees (Du et al., 2015; Soler et al., 2015). By contrast, the high-coverage species sampling strategy applied in this study provided the evolutionary patterns of R2R3-MYB TFs with high confidence. Our approach simultaneously achieved high resolution, excellent reliability, and broad sampling. We used a progressive strategy in our phylogenetic analyses in which sequences with deep divergence were first divided into subsets, and homologs within a subset were then analyzed using alignments with more informative sites.

Second, previous subgroup classifications were mostly angiosperm based and were not suitable for other land plant taxa (Bowman et al., 2017); all subgroups were given the same rank in these analyses, which ignored the evolutionary processes leading to their occurrence. Our results show that expansions of R2R3-MYB TFs in land plants occurred asymmetrically among subfamilies and among different plant groups. To better reflect this complex evolutionary history, we introduced a hierarchical classification containing subfamilies, clades, and lineage-specific subclades.

Finally, the previous classifications of R2R3-MYB TFs did not mention within-subgroup gene duplications. Focusing on angiosperms, we found that many subgroups identified in previous studies were derived from more than one ancestral R2R3-MYB gene, suggesting that such groups are polyphyletic (Fig. 6C).

One clade of Poaceae-specific R2R3-MYB TFs did not cluster with any existing subfamilies (Supplemental Fig. S4). The analysis of their primary structure showed that they are divergent members of subfamily IV on the basis of the presence of subfamily IV-specific motif 12 (Supplemental Fig. S17). Additionally, through BLAST, we found that Poaceae species possess a group of unique R2R3-MYB sequences (Supplemental Table S3; Supplemental File S2). The rogue placement and low identity between these sequences and other R2R3-MYB TFs indicate that these genes may be fast evolving.

Functional Divergence after Duplications of Subfamily VIII of the R2R3-MYB Genes during Plant Evolution

Our results revealed that the ancestral R2R3-MYB gene of subfamily VIII was duplicated six times during plant terrestrialization (Fig. 5). It has been shown that the functions of different clades of subfamily VIII R2R3-MYB TFs are diverse in bryophytes. For instance, in M. polymorpha, the clade-VIII-E R2R3-MYB gene Mapoly0073s0038 plays a role in regulating stress-induced flavonoid accumulation (Albert et al., 2018), and the clade-VIII-C gene Mapoly0034s0034 is involved in gemma cup development (Yasui et al., 2019). In P. patens, the VIII-D R2R3-MYB gene Pp3c11_5420 (Pp1) and the VIII-A-2 R2R3-MYB gene Pp3c15_15960 (Pp2) are essential for cell growth during protonemata development (Leech et al., 1993). The expression of the VIII-A-2 R2R3-MYB gene Pp3c9_15970 (Phypa_184923 in genome version V1.1) can be induced by UV-B radiation (Wolf et al., 2010). The expression levels of some VIII-A-1 and VIII-A-2 R2R3-MYB genes are significantly upregulated after the overexpression of PpVNS7, a regulator of the development of water-conducting and supporting tissue in P. patens (Xu et al., 2014). The diverse functions of the bryophyte R2R3-MYB TFs of subfamily VIII suggest that the neofunctionalization of subfamily VIII R2R3-MYB genes occurred after gene duplications in the common ancestor of land plants. In other words, functional differentiation among clades accompanied the early gene duplication events of subfamily VIII. It would be interesting to explore whether the ancestral function of the R2R3-MYBs of subfamily VIII in Zygnematophyceae was retained in land plants after gene duplications.

We found that some R2R3-MYB TFs within subfamily VIII clades share similar functions. Many R2R3-MYBs of VIII-A are regulators of secondary cell wall accumulation. For example, in Arabidopsis, VIII-A-1 TFs such as AtMYB26 (Yang et al., 2007), AtMYB46 (Zhong et al., 2007; McCarthy et al., 2009), AtMYB61 (Romano et al., 2012), AtMYB83 (McCarthy et al., 2009), and AtMYB103 (Zhong et al., 2008), as well as VIII-A-2 TFs including AtMYB20, AtMYB42, AtMYB43, and AtMYB85 (Zhong et al., 2008; Geng et al., 2020), are important players in the regulatory network of secondary cell wall biosynthesis and can activate the expression of secondary wall biosynthetic genes. AtMYB35/TDF1 and AtMYB80/MS188 (formerly MYB103) belong to two angiosperm subclades of VIII-B that diverged before the diversification of extant seed plants, and both play vital roles in tapetum development (Phan et al., 2011; Lou et al., 2018). The R2R3-MYB TFs of VIII-C are regulatory factors of meristem formation in liverworts and angiosperms (Schmitz et al., 2002; Keller et al., 2006; Müller et al., 2006; Yasui et al., 2019). Two angiosperm subclades of VIII-D R2R3-MYBs regulate the development of epidermal hair in different regions (Brockington et al., 2013). In addition to the liverwort R2R3-MYB TF of clade VIII-E (Albert et al., 2018), a number of angiosperm VIII-E-[e2] TFs are involved in the regulation of flavonoid biosynthesis (Liu et al., 2015; Allan and Espley, 2018). The partial conservation of the functions of R2R3-MYB TFs within a clade of subfamily VIII indicates that the ancestral TFs of each clade of subfamily VIII may have regulated similar developmental or physiological processes. Evolution of the gene regulatory network (GRN), including its rewiring and cooption, could occur after TF gene duplication and subsequent functional divergence (Das Gupta and Tsiantis, 2018). We infer that after multiple gene duplications, some R2R3-MYB genes of subfamily VIII retained partial ancestral functions, increased the complexity of the original GRNs, or formed similar GRNs due to changes in expression patterns, whereas other R2R3-MYB genes of subfamily VIII formed novel GRNs through neofunctionalization.

Gene Duplications in the Expansion of Land Plant R2R3-MYB TFs

It has been suggested that genome-wide duplications and tandem duplication events have contributed to the expansion of R2R3-MYBs in land plants (Du et al., 2015). However, the current study shows that the expansion of R2R3-MYB TFs in land plants has been subfamily/clade asymmetric and lineage specific (Figs. 3–6; Supplemental Figs. S8–S11), and that gene duplications have played an important role in their expansion.

Our results reveal that the number of R2R3-MYB TFs in flowering plants is variable, but the number of subclades of R2R3-MYB proteins is relatively stable (Supplemental Fig. S18). The number of R2R3-MYB TFs in flowering plants is not closely correlated with the number of whole-genome duplication events (Supplemental Fig. S18; Van de Peer et al., 2017; Cheng et al., 2018). It is possible that R2R3-MYB gene retention after duplication events is biased among angiosperm taxa (Freeling, 2009).

Additionally, our results show that sequences from 38 angiosperm R2R3-MYB subclades found within single analyzed core eudicot species do not form a monophyletic group, indicating that they were derived from gene duplications (Figs. 4 and 6A; Supplemental Figs. S9 and S10). As gene duplications within core eudicots may have been derived from gamma paleohexaploidy (Vekemans et al., 2012), we suggest that core eudicot-specific duplications that are pervasive in different angiosperm subclades of R2R3-MYB TFs are gamma-triplication derived.

Functional redundancy is a challenge to studies exploring the functions of duplicated R2R3-MYB genes (Dubos et al., 2010). Our results reveal recent and ancestral duplications of R2R3-MYB genes, which may provide a backbone for the study of the neo- and subfunctionalization of these genes as well as the evolution of gene regulatory networks.

Evolution of Auxiliary Motifs of R2R3-MYB TFs and Their Neofunctionalization

Thirty-two subfamily-specific motifs were explored in the 10 subfamilies of plant R2R3-MYB TFs. At least one conserved motif of land plant R2R3-MYB TFs has been reported previously in each subfamily except subfamily I, but 14 subfamily-specific motifs have not been reported (Supplemental Fig. S5; Gocal et al., 2001; Stracke et al., 2001; Du et al., 2012a, 2012b, 2015). Additionally, only two of eight clade-specific auxiliary motifs [VIII-(6) and VIII-(7)] of subfamily VIII were identified previously (Fig. 3; Brockington et al., 2013). In this study, we identified more conserved motifs than were found in previous studies because we used updated datasets with higher coverage of plant taxa.

The functions of most subfamily-specific motifs and subfamily VIII clade-specific motifs are still unknown. A recent study found that in the cucumber (Cucumis sativus), the R2R3-MYB TF CsMYB6 can directly interact with the MYB-related protein CsTRY, mediated by the R3 MYB repeat and motif VIII-(6) of CsMYB6 (Yang et al., 2018). We observed multiple independent loss events of conserved motifs across different R2R3-MYB subfamilies, whereas most motifs identified previously are seed plant or angiosperm specific, suggesting that losses and gains of auxiliary motifs during plant evolution are common in R2R3-MYB TFs. These auxiliary motifs of R2R3-MYBs can be targets of posttranslational modification or be involved in protein-protein interactions and transcriptional activation or repression (Millard et al., 2019). The loss of motifs of TFs could lead to functional shifts (Finet et al., 2013). We infer that the losses and gains of auxiliary motifs of R2R3-MYBs could contribute to their neofunctionalization.

Evolution of the DNA-Binding Specificity of Plant R2R3-MYB TFs

It has been noted that the DNA base-contacting residues of subfamilies FLP and ARP are distinct from those of the typical R2R3- and R1R2R3-MYB proteins, which might contribute to their unique DNA-binding specificity (Guo et al., 2008; Xie et al., 2010). Interestingly, the DNA base-contacting residues of the FLPs of chlorophytes and some charophyte species are identical to those of typical R2R3-/R1R2R3-MYB proteins (Supplemental Fig. S19), which suggests that the change in the DNA-binding specificity of subfamily FLP evolved during streptophyte evolution.

The major difference between the type-1 and type-2 DNA motifs is that the preference of the −2′ site is for A or G (Fig. 7B). Our results show that R2R3-MYB TFs with type-2 DNA-binding specificity belong to subfamily VIII, suggesting that ancestral subfamily VIII R2R3-MYB TFs attained altered specificity during their emergence. This inference is also consistent with the previous observation that the R2R3-MYB TFs within our subfamily VIII in Arabidopsis preferentially bind to the MBSIIG sequence (Romero et al., 1998). Further analyses of subfamily VIII R2R3-MYBs in Zygnematophyceae will provide more details of the early evolution of their DNA-binding specificity. As a member of subfamily VIII, however, AtMYB113 preferentially binds to the type-1 DNA motif (Fig. 7B), which may be correlated with an increase in the ability to bind G at the −2′ site, suggesting that the DNA-binding specificity is still evolving within subfamily VIII.

The diversity of the DNA-binding properties of R2R3-MYB TFs in subfamilies VII and VIII has been explored (Solano et al., 1997; Romero et al., 1998). Subfamily VII R2R3-MYB TFs can bind both A and G at the −2′ site, which is consistent with the earlier finding that PhMYB3, a subfamily VII protein, can bind MBSI and MBSII (Solano et al., 1997). Additionally, we noted that within subfamily VIII, the variation of the DNA-binding profiles was high (Fig. 7B). For example, the +2′ sites of the DNA-binding profiles harbor either an A or a C (Fig. 7B), corresponding to the difference between MBSII and MBSIIG (Solano et al., 1997). It would be interesting to explore whether the variation of the specificity of subfamily VIII R2R3-MYB TFs has played a role in their expansion and subsequent sub- and neofunctionalization in land plants.

Notably, the primary and secondary binding motifs of AtMYB52 were detected in a protein-binding microarray (Franco-Zorrilla et al., 2014). Similarly, the binding of MBSIIG was detected in PhMYB3, although its binding was weak (Solano et al., 1997). In addition, the DNA-binding specificities analyzed by in vitro and in vivo approaches can be different. For instance, the DNA-binding profile of AtMYB125 was successfully explored by protein-binding microarray analysis (Higo et al., 2018) but was not detected in yeast one-hybrid assays (Kelemen et al., 2015). By contrast, AtMYB91/AS1 from subfamily ARP was found to be unable to bind to DNA in vitro (Guo et al., 2008), but interaction of AtMYB91/AS1 with DNA was identified in yeast one-hybrid assays (Kelemen et al., 2015). The discordance among primary and secondary motifs as well as the disparity between in vivo and in vitro approaches for examining DNA-binding specificity suggests that future analysis of secondary motifs and the combination of multiple pieces of evidence will be helpful in elucidating the evolution of the DNA-binding specificity of plant R2R3-MYB TFs.

CONCLUSION

We found that the number of R2R3-MYB genes increased in green plants, but the number of large lineages (i.e. R2R3-MYB TF subfamilies) remained stable in land plants. Our study shows that the expansion of R2R3-MYB TFs in land plants is attributable to multiple gene duplications exhibiting asymmetry between subfamilies and clades, which cannot be explained by the previous hypothesis that the expansion of these TFs was dependent on whole-genome duplication events (Du et al., 2015). The expansion of R2R3-MYBs was mainly caused by the rapid expansion of subfamily VIII, which emerged in the common ancestor of Zygnematophyceae and land plants, whereas other subfamilies showed little change in size during embryophyte evolution. Moreover, our work provides insight into the phylogenetic relationships among dozens of subclades of R2R3-MYB TFs that have been ignored previously.

MATERIALS AND METHODS

Sequence Retrieval

The CDSs of MYB genes were obtained from The Arabidopsis Information Resource (http://www.arabidopsis.org) according to the accession numbers from previous studies (Riechmann and Ratcliffe, 2000; Stracke et al., 2001; Yanhui et al., 2006; Dubos et al., 2010). The CDSs of CDC5, 4R-MYB, and R2R3-MYB domain-containing proteins were identified from the genome and transcriptome databases of rhodophytes, glaucophytes, chlorophytes, charophytes, and embryophytes (Supplemental Table S1; Supplemental Methods) with TBLASTN (e-value <0.01; Altschul et al., 1997) using AtCDC5, AtMYB4R1, and the R2R3-MYB protein AtMYB46 from Arabidopsis (Arabidopsis thaliana) as separate queries. RNA-sequencing data from charophycean algae including Spirotaenia minuta, Coleochaete irregularis, and Spirogloea muscicola were processed by quality trimming, adapter clipping, and de novo assembly (Supplemental Methods) using TRIMMOMATIC v0.38 (Bolger et al., 2014) and Trinity v2.8.6 (Grabherr et al., 2011) for MYB sequence retrieval. To identify R2R3-MYB genes in Mesotaenium kramstae, we assembled its draft genome with SparseAssembler (Ye et al., 2012) using raw data from the National Center for Biotechnology Information Sequence Read Archive (accession no. SRR7051064); the CDSs of R2R3-MYB genes were predicted using AUGUSTUS v3.3.1 (Supplemental Methods; Stanke and Morgenstern, 2005). We discarded redundant sequences after BLASTN searches were performed when multiple sources of sequences within a species were retrieved. As the number of R2R3-MYB TFs identified in the spikemoss S. moellendorffii was unexpectedly smaller than the number of R2R3-MYB TFs identified in bryophyte species in previous studies (Du et al., 2015; Bowman et al., 2017), we amplified 19 CDSs of R2R3-MYB genes in S. moellendorffii by reverse transcription PCR followed by cloning (GenBank MN199004–MN199022; Supplemental Table S4; Supplemental Methods). To exclude redundant alleles of the MYB gene in S. moellendorffii, synteny analysis was performed using MCScanX (Wang et al., 2012). The CDSs of candidate R2R3-MYB genes with questionable annotations from ferns, gymnosperms, and A. trichopoda were predicted using AUGUSTUS v3.3.1 (Supplemental Table S1; Stanke and Morgenstern, 2005).

To exclude false-positive sequences, the retrieved sequences were used as queries to conduct reverse BLAST searches of the database of Arabidopsis MYB proteins. Sequences whose first hit in the reverse BLAST searches was not an R2R3-MYB or R1R2R3-MYB protein were discarded. Sequences with long insertions or deletions in MYB domains were also removed from our dataset. All coding sequences were translated into protein sequences for the following analyses when necessary.
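The reverse BLAST filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the reverse searches were saved in BLAST tabular format (12 tab-separated columns with the bit score last), and the function names and the `subject_class` lookup table are hypothetical.

```python
import csv
import io

# Assumption: database entries carrying one of these labels count as valid first hits.
ACCEPTED_CLASSES = {"R2R3-MYB", "R1R2R3-MYB"}


def best_hits(blast_tabular):
    """Return {query_id: subject_id} for the highest-bit-score hit of each query.

    Expects BLAST tabular text: 12 tab-separated columns per hit,
    with the bit score in the last column.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def filter_candidates(blast_tabular, subject_class):
    """Keep candidates whose first reverse-BLAST hit belongs to an accepted class.

    subject_class is a hypothetical mapping from database protein ID to MYB class.
    """
    kept = []
    for query, subject in best_hits(blast_tabular).items():
        if subject_class.get(subject) in ACCEPTED_CLASSES:
            kept.append(query)
    return sorted(kept)
```

Calling `filter_candidates(text, {"AtMYB46": "R2R3-MYB", "AtCDC5": "CDC5"})` would keep only those candidates whose top-scoring hit is AtMYB46. The length-based removal of sequences with long insertions or deletions is a separate step and is not covered here.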

Sequence Alignment

All sequence alignments were performed by MAFFT v7.307 with different algorithms (Supplemental Methods; Katoh and Standley, 2013). The L-INS-i algorithm was used to align sequences with one alignable domain, such as 4R-MYBs, R2R3-MYB domains, and R2R3-MYB TFs with auxiliary motifs of subfamily VIII. The regions outside of the MYB domain or subfamily VIII-specific motifs were manually trimmed using MEGA 7 (Kumar et al., 2016). The "--addlong" parameter, an algorithm for aligning long sequences to short alignments, was used in combination with L-INS-i to align R2R3-MYB sequences from independent species to a trimmed alignment. The E-INS-i alignment strategy was used to align sequences whose variable regions outside of the R2R3-MYB domain may contain subfamily/clade/subclade conserved motifs. The full-length alignment computed by E-INS-i was used for phylogenetic reconstruction because the filtered multiple sequence alignments had the potential to reduce the fidelity of the phylogenetic analyses (Tan et al., 2015). The protein alignments generated by E-INS-i were converted to nucleotide sequence alignments by PAL2NAL (Suyama et al., 2006).
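The last step, converting a protein alignment into a codon-based nucleotide alignment, can be sketched in a few lines of Python. This is only an illustration of the idea behind PAL2NAL, not its implementation: it assumes each CDS is in frame, has no terminal stop codon, and corresponds residue for residue to its protein; the function name `thread_codons` is hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped CDS onto a gapped protein sequence, codon by codon.

    Each residue is replaced by its codon and each gap '-' by '---'.
    Assumes the CDS is in frame and contains exactly one codon per residue
    (no terminal stop codon); the codons are not checked against the residues.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    next_codon = iter(codons)
    return "".join("---" if aa == "-" else next(next_codon)
                   for aa in aligned_protein)


def codon_alignment(protein_alignment, cds_by_id):
    """Apply thread_codons to every sequence of an alignment given as {id: gapped_protein}."""
    return {seq_id: thread_codons(gapped, cds_by_id[seq_id])
            for seq_id, gapped in protein_alignment.items()}
```

Because gaps are expanded to three columns, the resulting alignment preserves codon boundaries, which is what allows the three codon positions to be treated as separate partitions in later model selection.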

Phylogenetic Analysis

The best-fit substitution model and partitioning schemes for each alignment were determined by PartitionFinder2 (Lanfear et al., 2017) and/or ModelFinder (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017) according to the Bayesian information criterion. The explored partitioning schemes included MYB repeats R2 and R3, regions outside of the MYB domain, and three separate codon positions, according to the types of alignments (Supplemental Methods). Multiple phylogenetic analyses were performed in this study (for details, please see Supplemental Methods). In most cases, IQ-TREE v1.6.8 (Nguyen et al., 2015) was used to conduct ML analysis. To test the reliability of the phylogenetic trees, 1,000 ultrafast bootstrap replicates were conducted in each analysis (Hoang et al., 2018).
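For readers unfamiliar with the criterion, the following Python sketch shows how a Bayesian information criterion comparison works in principle, using the standard definition BIC = k ln(n) − 2 ln(L). It is not the code of PartitionFinder2 or ModelFinder; the function names are hypothetical, and it assumes that n is the number of alignment sites and that k counts all free parameters of a candidate model or partitioning scheme.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits, n_sites):
    """Pick the candidate with the lowest BIC.

    fits maps a candidate name to (log_likelihood, n_params); the names and
    values would come from fitting each model or partitioning scheme.
    """
    return min(fits, key=lambda name: bic(*fits[name], n_sites))
```

The penalty term k ln(n) is why a more finely partitioned scheme is selected only when its gain in likelihood outweighs the extra parameters it introduces.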

The phylogenies of the R2R3-MYB and R1R2R3-MYB proteins of M. polymorpha, S. moellendorffii, and A. trichopoda rooted by CDC5s were constructed using IQ-TREE, RAxML v8.2.12 (Stamatakis, 2014), PhyML v3.3 (Guindon et al., 2010), and MrBayes v3.2.6 (Ronquist et al., 2012). Rapid bootstrap trees of RAxML were used to calculate the leaf stability index for each taxon by RnR-lsi v1.0 (Aberer et al., 2013). Since the stability indexes of ARP R2R3-MYB TFs were low, phylogenetic analyses were performed after the pruning of this clade. IQ-TREE analyses of TFs with R2R3-MYB domains from five representative land plant species and previously reported datasets were subsequently performed.

To explore the origin of land plant R2R3-MYB TFs, the R2R3-MYB, R1R2R3-MYB, and CDC5s from algae species (Fig. 2) and M. polymorpha were used to construct the ML tree.

Ten subfamilies of R2R3-MYB TFs were identified in land plants in representative lineages. As the topology of the phylogenetic tree of R2R3-MYB TFs was nearly the same as that obtained when using sequences from additional species (Soler et al., 2015), the phylogenetic approach was used to identify orthologs of each R2R3-MYB subfamily with the exception of the ARP subfamily in previously unanalyzed land plant species (Supplemental Table S3). After the pruning of ARPs, proteins with R2R3-MYB domains in each species were combined with the sequences of M. polymorpha, S. moellendorffii, and A. trichopoda to perform phylogenetic analyses.

To explore the expansion history of subfamily VIII, the largest R2R3-MYB subfamily in land plants, we first constructed phylogenetic trees from the alignment of sequences from Zygnematophyceae species, M. polymorpha, P. patens, S. moellendorffii, S. cucullata, P. abies, A. trichopoda, and Arabidopsis. Next, subfamily VIII R2R3-MYB TFs from the other land plant species were subjected to phylogenetic analyses in combination with the Zygnematophyceae sequences and those of representative embryophyte lineages. The unrooted trees of clades VIII-A-1, VIII-D, and VIII-E were subsequently constructed to explore duplication events in representative land plants.

To understand the evolutionary history within subfamilies FLP, I, II, III, IV, V, ARP, VI, and VII as well as within clades VIII-A-1, VIII-A-2, VIII-b, VIII-B, VIII-C, VIII-D, and VIII-E, the full-length nucleotide alignments of each subfamily/clade were used for phylogenetic reconstructions. The phylogenetic analyses of six euphyllophyte or seed plant clades in VIII-D and VIII-E were performed using the same approach.

ML analyses of 4R-MYB and R1R2R3-MYB TFs were performed to verify their orthologs that contain only two MYB repeats.

Primary Structure and Gene Structure Analysis

MEME (Bailey and Elkan, 1994) and FIMO (Grant et al., 2011) software were used to identify subfamily-, clade-, and subclade-specific motifs in the full-length amino acid sequences of the R2R3-MYB proteins of streptophytes (Supplemental Fig. S5; Supplemental Table S2) after R2R3-MYB domains were excluded. We carefully checked each motif to exclude incorrect and false-positive matches.

The regions encoding MYB repeats in coding sequences and genomic sequences were submitted to GSDS v2.0 (Hu et al., 2015) to illustrate their intron-exon structure.

Analysis of DNA-Binding Specificity

The position frequency matrices of the primary DNA-binding specificities of mouse c-Myb as well as R1R2R3- and R2R3-MYB TFs from Arabidopsis were obtained from several databases and studies (Howe and Watson, 1991; Xie et al., 2010; Franco-Zorrilla et al., 2014; Lotkowska et al., 2015; O’Malley et al., 2016; Higo et al., 2018; Fornes et al., 2020). The pairwise similarities of the binding specificities were calculated by using sstat in MOSTA v1.1 (Pape et al., 2008) and illustrated by Cytoscape v3.7.2 (Shannon et al., 2003). The distance-based tree of DNA-binding specificities was computed with the UPGMA algorithm in STAMP (Mahony and Benos, 2007). The logos of binding specificities were illustrated by Ceqlogo in the MEME suite (Bailey et al., 2009).
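The UPGMA step can be illustrated with a short Python sketch. It is not STAMP's implementation: STAMP derives its own distances from column-by-column matrix comparisons, whereas this example takes a precomputed symmetric distance matrix as given, and the function name `upgma` and its return format are hypothetical.

```python
def upgma(names, dist):
    """Cluster leaves by UPGMA (average linkage).

    names: list of leaf labels; dist: symmetric matrix (list of lists) of
    pairwise distances in the same order. Returns (tree, heights), where tree
    is a nested tuple and heights lists the height of each merge in order.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the new cluster to each remaining one is the
        # size-weighted mean of the distances from its two children.
        new = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        heights.append(d_ab / 2.0)  # UPGMA places the node at half the distance
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights
```

Because UPGMA assumes a constant rate and groups items purely by overall similarity, the resulting tree is a clustering of binding profiles rather than a reconstruction of how those specificities evolved.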

Accession Numbers

The sequences from this article can be obtained according to the accession numbers in Supplemental Table S1.

Supplemental Data

The following supplemental materials are available.

  • Supplemental Figure S1. Numbers of MYB proteins used in this study.

  • Supplemental Figure S2. Phylogenetic trees of R1R2R3-MYB and R2R3-MYB TFs of M. polymorpha, S. moellendorffii, and A. trichopoda constructed by various methods.

  • Supplemental Figure S3. Phylogenetic tree of proteins with R2R3-MYB domains in M. polymorpha, S. moellendorffii, S. cucullata, P. abies, and A. trichopoda.

  • Supplemental Figure S4. Phylogenetic trees of land plant R1R2R3-MYB and R2R3-MYB TFs from previous studies.

  • Supplemental Figure S5. Auxiliary motifs of R2R3-MYB TFs in land plants identified by MEME.

  • Supplemental Figure S6. Phylogenetic tree and exon-intron structures of the MYB domain-encoding regions of R2R3-MYB genes of subclade VIII-E-[e2]-γ in flowering plants.

  • Supplemental Figure S7. Unrooted phylogenetic tree of MYB proteins from algae and M. polymorpha.

  • Supplemental Figure S8. Nucleotide-level phylogenetic analysis of subfamily VIII R2R3-MYB TFs.

  • Supplemental Figure S9. Phylogenetic trees of R2R3-MYB sequences from subfamilies FLP, I, II, III, IV, V, ARP, VI, and VII.

  • Supplemental Figure S10. Phylogenetic trees of subfamily VIII R2R3-MYB sequences from clades VIII-A-1, VIII-A-2, VIII-b, VIII-B, VIII-C, VIII-D, and VIII-E.

  • Supplemental Figure S11. Unrooted phylogenetic trees of R2R3-MYB TFs of clades VIII-A-1, VIII-D, and VIII-E from representative land plants.

  • Supplemental Figure S12. Similarity network of DNA-binding specificity of R2R3- and R1R2R3-MYB TFs with TF names.

  • Supplemental Figure S13. Examples of misidentified R2R3-MYB TFs in previous studies.

  • Supplemental Figure S14. Exon-intron structures of the R2R3-MYB domain-encoding regions of R1R2R3-MYB genes and potentially homologous R2R3-MYB genes.

  • Supplemental Figure S15. Pairwise sequence alignment of R2R3-MYB TFs in S. moellendorffii, which were annotated in Phytozome database and amplified in this study.

  • Supplemental Figure S16. Pairwise sequence alignment of A. trichopoda R2R3-MYB TFs as annotated in the Phytozome database and as assembled from the transcriptome using Trinity.

  • Supplemental Figure S17. Schematic representation of the primary structure of R2R3-MYB TFs in the Poaceae-specific clade and their subfamily IV homologs.

  • Supplemental Figure S18. The number of subclades of R2R3-MYB TFs in angiosperm species analyzed in this study.

  • Supplemental Figure S19. Sequence alignment of the R2R3-MYB domains of subfamily FLP proteins.

  • Supplemental Table S1. Accession numbers of the MYB sequences and corresponding data sources used in this study.

  • Supplemental Table S2. Sequences of auxiliary motifs of subfamily VIII R2R3-MYB TFs in streptophytes.

  • Supplemental Table S3. Classification of R2R3-MYB TFs in land plants.

  • Supplemental Table S4. Primer sequences used for the cloning of MYB genes in S. moellendorffii.

  • Supplemental File S1. Potential R2R3-MYB gene loci in seed plants that were omitted by proteome sequence searches.

  • Supplemental File S2. BLAST results of monocots using Poaceae-specific R2R3-MYB proteins as query.

  • Supplemental Methods. Detailed methods for sequence retrieval and phylogenetic analyses.

Acknowledgments

We are grateful to Yan-Ping Guo (Beijing Normal University) for valuable comments on the article. Many thanks to the members of the Rao lab for helpful and inspiring discussions. Our sincere appreciation is extended to Theodor C. H. Cole (Freie Universität Berlin) for English language editing.

Footnotes

1 This work was supported by the National Natural Science Foundation of China (NSFC; grant no. 31670221).


Articles from Plant Physiology are provided here courtesy of Oxford University Press
