Evolutionary Dynamics of the DNA-Binding Domains in Putative R2R3-MYB Genes Identified from Rice Subspecies indica and japonica Genomes

Li Jia; Michael T Clegg; Tao Jiang

doi:10.1104/pp.103.027201

Plant Physiol. 2004 Feb;134(2):575–585.

PMCID: PMC344534 PMID: 14966247

Abstract

The molecular evolution of the R2R3-MYB gene family is of great interest because it is one of the most important transcription factor gene families in the plant kingdom. Comparative analyses of a gene family may reveal important adaptive changes at the protein level and thereby provide insights that relate structure to function. We have performed a range of comparative and bioinformatics analyses on R2R3-MYB genes identified from the rice (Oryza sativa subsp. japonica and indica) and Arabidopsis genome sequences. The study provides an initial framework to investigate how different evolutionary lineages in a gene family evolve new functions. Our results reveal a remarkable excess of non-synonymous substitutions, an indication of adaptive selection on protein structure that occurred during the evolution of both helix1 and helix2 of rice R2R3-MYB DNA-binding domains. These flexible α-helix regions associated with high frequencies of excess non-synonymous substitutions may play critical roles in the characteristic packing of R2R3-MYB DNA-binding domains and thereby modify the protein-DNA interaction process resulting in the recognition of novel DNA-binding sites. Furthermore, a co-evolutionary pattern is found between the second α-helix of the R2 domain and the second α-helix of the R3 domain by examining all the possible α-helix pairings in both the R2 and R3 domains. This points to the functional importance of pairing interactions between related secondary structures.

Recent studies have shown that gene duplication, one of the most important mechanisms for the emergence of novel gene functions (Ohno, 1967, 1970; Nei, 1969; Zhang et al., 1998), has resulted in substantial genetic redundancy in all sequenced genomes (Lynch and Conery, 2000; Prince and Pickett, 2002). After gene duplication, relaxed selective constraint on the newly duplicated copies allows them to explore the space of adaptive protein structures and thereby promotes functional diversification of the duplicates through biochemical innovation (Sankoff, 2001; Wagner, 2001). Genetic redundancy is common in the rice (Oryza sativa) genome and in most other plant genomes, such as that of Arabidopsis (Feng et al., 2002; Goff et al., 2002; Sasaki et al., 2002; Yu et al., 2002). Rice offers important advantages for basic research in monocotyledonous plants because of its diploid nature, its transformability, and the availability of genetic and molecular resources (Sasaki and Burr, 2000; Yuan et al., 2001). Rice is also the only case in which two subspecies (japonica and indica) have been sequenced and made publicly available. The rice genomic information provides a rich resource not only for dissecting the rice genome but also for comparative studies between the two rice subspecies and between rice and other cereal species (Bennetzen, 2002; Yuan et al., 2003). Systematic analyses of the Arabidopsis genome have facilitated detailed studies of genome structure and evolution in the plant kingdom (Blanc et al., 2000; Ball and Cherry, 2001; Castresana, 2001; Sreekumar et al., 2001). It is now possible to study the evolutionary dynamics of rice gene families by using the corresponding genes from the Arabidopsis genome to identify rice orthologs, which makes comparative analyses feasible.

The Arabidopsis R2R3-MYB (R2R3-AtMYB) gene family has provided an excellent model for studying the molecular mechanisms underlying the acquisition of divergent functions by paralogous genes during evolution (Jia et al., 2003). Selection is expected to affect synonymous and non-synonymous changes differently, because synonymous substitutions leave the encoded protein unchanged whereas non-synonymous substitutions alter it. Therefore, comparing synonymous and non-synonymous substitution rates among genes from different evolutionary lineages provides a powerful means of analyzing protein evolution (Sutton and Wilkinson, 1997; Yang and Nielsen, 1998; Lukens and Doebley, 2001). R2R3-MYB genes are well known to play important roles in the regulation of secondary metabolism, the control of cell shape, disease resistance, and hormone responses in the plant kingdom (Urao et al., 1993; Gubler et al., 1995; Martin and Paz-Ares, 1997; Kranz et al., 1998; Romero et al., 1998; Braun and Grotewold, 1999; Jin and Martin, 1999; Stracke et al., 2001). It has also been speculated that transcription factors play crucial roles in generating morphological and physiological variation through their control of developmental processes (Doebley and Lukens, 1998; Stern, 1998; Cubas et al., 1999; Arthur, 2002). These features make the R2R3-MYB gene family an ideal subject for comparative analyses. Because more than 100 R2R3-MYB genes have been identified in the Arabidopsis genome (Kranz et al., 1998; Stracke et al., 2001) and our initial results suggest that at least 80 R2R3-MYB genes exist in either the japonica or the indica rice genome (data not shown), it is of great interest to ask how patterns of protein change in rice differ from those in Arabidopsis. To address this question, we identified rice orthologs of R2R3-AtMYB genes and analyzed differences between non-synonymous and synonymous substitution rates following gene duplication to search for evidence of positive selection on different regions of the R2R3-MYB DNA-binding domains.
A remarkable excess of non-synonymous substitutions was identified in both helix1 and helix2 of the rice R2R3-MYB DNA-binding domains compared with helix3. This is similar to the findings for the R2R3-AtMYB family (Jia et al., 2003) and indicates that greater adaptive flexibility is a property of the first two helices. These flexible α-helix regions with excess non-synonymous substitutions are thought to play critical roles in the characteristic packing of the DNA-binding domains during protein-DNA interaction. Moreover, a striking co-evolutionary pattern was found between the second α-helix of the R2 domain and that of the R3 domain in both Arabidopsis and rice R2R3-MYB genes, pointing to the functional importance of pairing correlations between related secondary structures in preserving the conformation of the specific protein folding pocket. Our results suggest that, as in Arabidopsis, adaptive protein evolution has occurred in the R2R3-MYB DNA-binding domains during the evolutionary history of rice.

RESULTS

Conserved DNA-Binding Domains of Rice and Arabidopsis R2R3-MYB Genes

R2R3-MYB is one of the most important subfamilies of the HLH transcription factor superfamily. The DNA-binding domains of R2R3-MYB genes consist of two structurally identical repeats: the R2 domain and the R3 domain (Ogata et al., 1994; Lipsick, 1996; Kranz et al., 1998). The protein sequence alignment of the rice R2R3-MYB DNA-binding domains revealed that each repeat consists of three α-helices, and that the last two α-helices of each repeat, together with the loop between them, form the characteristic HLH DNA-binding motif (Fig. 1A). The first and second helices (i.e. helix1 and helix2) are known not to bind DNA directly; however, they contribute to the characteristic folding of the HLH motif, which is important for recognizing a specific gene target. By contrast, the third helix (i.e. helix3) of each repeat makes direct contact with the major groove of its DNA target (Ogata et al., 1994; Williams and Grotewold, 1997). A full-length R2R3-MYB gene contains two DNA-binding domains, an activation domain, and several flexible domains (Fig. 1B). A sequence comparison of DNA-binding domains among the putative R2R3-MYB genes from rice and Arabidopsis suggests that R2R3-MYB genes have evolved in a highly conserved manner, especially in their DNA-binding domains, and that this conservation has persisted since the divergence of dicots and monocots about 200 million years ago (Wolfe et al., 1989).

Figure 1. — The amino acid sequence alignment of DNA-binding domains of the R2R3-MYB genes from Arabidopsis and rice. A, Amino acid sequence alignment of the DNA-binding domains (the R2 and R3 domains) of 78 R2R3-MYB genes from Arabidopsis and rice, obtained with the ClustalW program. Dark shading, identical residues; light shading, similar residues. The three helices in the R2 and R3 domains are indicated with white boxes. The R2 and R3 domains are marked with black bars under the corresponding residues. B, Structure of R2R3-MYB genes. R2R3-MYB genes contain two DNA-binding domains (hatched boxes), an activation domain (shaded box), and multiple flexible domains (white boxes).

Heterogeneous Positive Selection Pressure in the Phylogeny of Rice R2R3-MYB DNA-Binding Domains

Rooted phylogenetic trees of the R2R3-MYB DNA-binding domains were reconstructed for both japonica and indica rice R2R3-MYB genes using the maximum likelihood method in fastDNAml (Figs. 2 and 3). For both subspecies, the maximum likelihood free ratio model, which assumes that selective pressures differ among all lineages, fit significantly better (at the 0.01 level) than the one ratio model, which assumes the same selective pressure for all lineages. For japonica, the maximum log-likelihood of the free ratio model is –10,494.99, compared with –11,353.40 for the one ratio model; the log-likelihood difference is 858.41, so twice the difference, 2Δl, is 1,716.82. For indica, the maximum log-likelihood of the free ratio model is –4,959.52, compared with –5,160.64 for the one ratio model; the difference is 201.12, so 2Δl is 402.24. Significance was assessed against a χ² distribution with 106 degrees of freedom for japonica and 42 degrees of freedom for indica. Because the free ratio model fit significantly better than the one ratio model for both the japonica and indica phylogenies at the 0.01 level, the dN:dS ratio (non-synonymous substitution rate:synonymous substitution rate) for each branch of the given tree topologies was estimated under the free ratio model. Positively selected branches A to G were inferred with dN:dS ratios of 4.2, 2.7, 5.6, 40.9, 11.3, 4.1, and 13.5, respectively, for japonica (Fig. 2), and branches A to D with dN:dS ratios of 6.3, 2.1, 17.2, and 20.8, respectively, for indica (Fig. 3). These high dN:dS ratios suggest that the branches experienced excess non-synonymous substitutions, indicating episodes of accelerated protein evolution.
We also confirmed these putatively positively selected branches with a pair-wise comparison method based on estimated ancestral sequences, using Fisher's exact test to assess statistical significance (Zhang et al., 1997, 1998; Jia et al., 2003).
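The model comparison above is a standard likelihood ratio test and can be reproduced from the reported log-likelihoods. The following is a minimal Python sketch, assuming (as stated in the text) that 2Δl is referred to a χ² distribution with 106 df for japonica and 42 df for indica. The closed-form survival function below is exact only for even df, which happens to cover both cases; for odd df a general routine such as `scipy.stats.chi2.sf` would be needed. The function names are illustrative and are not part of PAML.

```python
import math


def chi2_log_sf_even_df(x, df):
    """Natural log of P(X > x) for X ~ chi-square with a positive even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!; it is evaluated in log space
    so that extremely small p-values do not underflow to zero.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 0.0  # P(X > x) = 1
    half = x / 2.0
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(df // 2)]
    peak = max(log_terms)
    return -half + peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, natural-log p-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_log_sf_even_df(stat, df)


# (one ratio lnL, free ratio lnL, df) as reported in the text
fits = {
    "japonica": (-11353.40, -10494.99, 106),
    "indica": (-5160.64, -4959.52, 42),
}
for name, (lnl_one, lnl_free, df) in fits.items():
    stat, log_p = likelihood_ratio_test(lnl_one, lnl_free, df)
    print(f"{name}: 2*delta lnL = {stat:.2f}, log10(p) = {log_p / math.log(10):.1f}")
```

Working on the log scale keeps the p-values representable even though they lie hundreds of orders of magnitude below the 0.01 threshold.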

Figure 2. — A maximum likelihood phylogeny of the DNA-binding domains of *japonica* rice R2R3-MYB genes. The maximum likelihood free ratio test shows that branches A to G experienced significant excess non-synonymous substitutions; all seven branches are significant at the 0.01 level. The clades defined by these branches were examined with the subtree sampling technique to identify positively selected amino acid sites. Branch lengths are scaled to the number of substitutions per nucleotide. AF189786 was chosen as the outgroup. The scale bar at the bottom of the figure corresponds to a branch length of 0.1 (10%) substitutions per site.

Figure 3. — A maximum likelihood phylogeny of the DNA-binding domains of *indica* rice R2R3-MYB genes. The maximum likelihood free ratio test shows that branches A to D experienced significant excess non-synonymous substitutions; all four branches are significant at the 0.01 level. The clades defined by these branches were examined with the subtree sampling technique to identify positively selected amino acid sites. Branch lengths are scaled to the number of substitutions per nucleotide. AF189786 was chosen as the outgroup. The scale bar at the bottom of the figure corresponds to a branch length of 0.1 (10%) substitutions per site.

Unequal Positive Selection Pressure along the DNA-Binding Domains

Identifying the sites of accelerated amino acid change should give us clues about the relationship between sites that experienced positive selection episodes and the consequences for protein secondary structure. This information can assist experimental biologists by pointing to potentially fruitful avenues for structural research. Because predominant purifying and neutral selection signals can mask positive selection signals, a noise reduction technique (Jia et al., 2003) was used to identify sites that may have experienced excess non-synonymous substitutions. In our analysis (Table I), all subtrees of each selected clade with eight or fewer taxa were sampled. Because of computational limitations, however, larger clades were only partially sampled by randomly choosing a fixed number of subtrees. For japonica, 5,000 subtrees were sampled from approximately 2.73 × 10¹⁴ subtrees defined by a total of eight positively selected clades, and 413 of the sampled subtrees were informative for inferring positive selection sites. For indica, 1,000 subtrees were sampled from 1,049,065 subtrees defined by a total of four positively selected clades, and 107 of the sampled subtrees were informative. Inferred sites with a confidence level of 99% or higher were recorded. The total number of excess non-synonymous substitutions at the recorded sites was counted and plotted for japonica (Fig. 4A) and indica (Fig. 4B). In total, 789 counts of excess non-synonymous substitutions for japonica and 252 for indica were found along the 104 amino acid sites of the R2 and R3 domains (Table II).
If these substitutions were randomly distributed, the expected average would be 7.6 counts per amino acid site for japonica (789/104) and 2.4 counts per site for indica (252/104). However, our permutation test showed that helix1 and helix2 of the R2 domain have significantly higher counts per site than helix3 of the R2 domain in both japonica and indica, and likewise that helix1 and helix2 of the R3 domain have significantly higher counts per site than helix3 of the R3 domain. These results strongly suggest that excess non-synonymous substitutions are not randomly distributed across the protein secondary structures; instead, positive selection events are concentrated in certain regions of the R2 and R3 domains. In particular, helix1 and helix2 of both the R2 and R3 domains have experienced more intense positive selection pressure.
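The subtree sampling step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of Jia et al. (2003): a "subtree" is taken here to be the clade pruned down to a subset of at least three of its taxa (the minimum size is a hypothetical choice), clades with eight or fewer taxa are enumerated exhaustively, and larger clades are sampled at random without replacement. Deciding whether a sampled subtree is informative would require fitting the discrete model in PAML to it and is not shown.

```python
import itertools
import math
import random


def count_subtrees(n_taxa, min_taxa=3):
    """Number of taxon subsets of size >= min_taxa that can be drawn from a clade."""
    return sum(math.comb(n_taxa, k) for k in range(min_taxa, n_taxa + 1))


def sample_subtrees(taxa, n_samples, min_taxa=3, exhaustive_limit=8, seed=0):
    """Return taxon subsets standing in for pruned subtrees of one clade.

    Clades with <= exhaustive_limit taxa are enumerated completely; larger
    clades are sampled uniformly at random (without replacement) among all
    subsets of size >= min_taxa, using rejection sampling.
    """
    taxa = list(taxa)
    if len(taxa) <= exhaustive_limit:
        return [
            combo
            for k in range(min_taxa, len(taxa) + 1)
            for combo in itertools.combinations(taxa, k)
        ]
    rng = random.Random(seed)
    target = min(n_samples, count_subtrees(len(taxa), min_taxa))
    seen = set()
    while len(seen) < target:
        # Including each taxon with probability 1/2 makes every subset equally
        # likely; subsets that are too small are rejected.
        subset = tuple(t for t in taxa if rng.random() < 0.5)
        if len(subset) >= min_taxa:
            seen.add(subset)
    return sorted(seen)


# Hypothetical clades: a small one (enumerated) and a large one (sampled)
small = sample_subtrees(["g1", "g2", "g3", "g4"], n_samples=100)
large = sample_subtrees([f"g{i}" for i in range(20)], n_samples=50)
print(len(small), len(large), count_subtrees(20))
```

The number of candidate subtrees grows roughly as 2^n with clade size, which is why exhaustive enumeration is practical only for small clades.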

Table I.

The nos. of relevant subtree topologies (combinations), subtrees sampled, and informative subtrees in the application of the subtree sampling technique on the DNA-binding domains of the R2R3-MYB genes from japonica and indica rice

Subspecies	Combinations	Sampled	Informative
japonica	∼2.73 × 10¹⁴	5,000	413
indica	1,049,065	1,000	107


Figure 4. — Unequal positive selection pressure along the DNA-binding domain. The three helices of each repeat are indicated by gray hatched boxes. The boundary between the R2 and R3 domains is indicated by a vertical dashed black line. A consensus sequence is given to relate the amino acid positions to the bar chart. The most conserved residues are underlined as landmarks. A, Positively selected amino acid sites in the R2 and R3 domains of the R2R3-MYB genes from *japonica* rice. B, Positively selected amino acid sites in the R2 and R3 domains of the R2R3-MYB genes from *indica* rice.

Table II.

The statistical analysis of positive selection sites based on protein secondary structures of the R2 and R3 domains

The statistical significance of the positive selection counts on each of the helices was determined by a permutation test. *, Significance at 0.05 level; **, significance at 0.01 level.

Category	No. of Sites	Count (indica)	Count (japonica)	Percentage (indica)	Percentage (japonica)	Count/Site (indica)	Count/Site (japonica)
Full R2R3 region	104	252	789	100%	100%	2.4	7.6
R2 domain
Helix1	15	62	137	25%	17%	4.1**	9.1**
Helix2	7	49	128	19%	16%	7.0**	18.3**
Helix3	10	9	13	4%	2%	0.9	1.3
R3 domain
Helix1	14	76	289	30%	37%	5.4**	20.6**
Helix2	6	23	87	9%	11%	3.8**	14.5**
Helix3	10	7	11	3%	1%	0.7	1.1
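The table footnote states only that a permutation test was used. The following minimal Python sketch shows one common form of such a test, assuming that secondary-structure labels are shuffled among sites and that the statistic is the difference in mean counts per site between a helix and helix3 (the exact statistic used in the study is not stated here). The per-site counts in the usage example are invented for illustration; only the totals 789, 252, and 104 come from Table II.

```python
import random
from statistics import mean


def permutation_test(counts, labels, group, reference, n_perm=10_000, seed=1):
    """One-sided permutation test of mean count/site: `group` > `reference`.

    counts: excess non-synonymous substitution count at each site
    labels: secondary-structure label of each site (e.g. "R2-helix2")
    Only sites labeled `group` or `reference` take part; their labels are
    shuffled to build the null distribution of the difference in means.
    """
    pairs = [(c, lab) for c, lab in zip(counts, labels) if lab in (group, reference)]
    values = [c for c, _ in pairs]
    site_labels = [lab for _, lab in pairs]

    def diff_in_means(labs):
        a = [v for v, lab in zip(values, labs) if lab == group]
        b = [v for v, lab in zip(values, labs) if lab == reference]
        return mean(a) - mean(b)

    observed = diff_in_means(site_labels)
    rng = random.Random(seed)
    shuffled = list(site_labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if diff_in_means(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Expected counts per site if substitutions fell at random (Table II totals)
print(789 / 104, 252 / 104)

# Hypothetical per-site counts for two helices (not the study's data)
counts = [9, 8, 10, 7, 9, 1, 0, 2, 1, 1]
labels = ["R2-helix2"] * 5 + ["R2-helix3"] * 5
print(permutation_test(counts, labels, "R2-helix2", "R2-helix3"))
```

Shuffling labels rather than counts preserves the observed distribution of counts, so the null hypothesis is simply that a site's count does not depend on which helix it belongs to.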


Co-Evolution of Certain Protein Secondary Structures in R2R3-MYB DNA-Binding Domains

Co-evolution of certain protein secondary structures (e.g. α-helices) may arise from interactions (e.g. covalent linkage and hydrogen bonding) among these structures within a single protein. The relationship between any two subdomains (or structures) that interact with each other within such a single molecule is one-to-one. Recall that R2R3-MYB proteins function as monomers when binding to their DNA targets, that their DNA-binding domains consist of two structurally identical repeats, and that each repeat consists of three α-helices. Helix1 and helix2 contribute to the characteristic folding of the HLH motif, and helix3 of each repeat makes direct contact with the major groove of the target DNA. The function of R2R3-MYB proteins depends on an active DNA-binding pocket formed between the R2 and R3 domains. Hence, a working molecule requires the two domains to co-evolve: any change in one domain that perturbs the activity of the protein must either be selected against or be compensated for by a correlated change in the other domain. For these reasons, it is important to identify the secondary structures that determine the co-evolution of the R2 and R3 domains, and studying these subtle patterns of co-evolution should shed light on how the co-evolutionary pairings have been preserved. To address this problem, we performed a correlation analysis to quantify the strength of co-evolution between α-helices in R2R3-MYB DNA-binding domains. The analysis revealed a significant (at the 0.01 level) positive linear relationship between the R2 and R3 domains of R2R3-MYB genes, with correlation coefficients (r) of 0.84 for japonica and 0.78 for indica and coefficients of determination (r²) of 0.71 and 0.61, respectively. For comparison, the corresponding coefficients for Arabidopsis were 0.79 and 0.62.
A further correlation analysis at the level of secondary structures revealed another significant (at the 0.01 level) positive linear relationship, between helix2 of the R2 domain and helix2 of the R3 domain, in both rice and Arabidopsis (Table III).
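The correlation analysis can be sketched as follows. This minimal Python example assumes that each data point pairs a substitution-rate estimate for the R2 domain (or one of its helices) with the corresponding estimate for the R3 domain from the same lineage; the unit of pairing is an assumption, and the numbers are invented. Significance of r is tested here with the usual t statistic on n - 2 degrees of freedom, whereas the section does not state which test the authors used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic with n - 2 df for testing H0: rho = 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired substitution estimates for R2 and R3 (illustrative only)
r2_rates = [0.12, 0.30, 0.25, 0.41, 0.08, 0.35]
r3_rates = [0.10, 0.28, 0.31, 0.38, 0.11, 0.29]
r = pearson_r(r2_rates, r3_rates)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, t = {t_statistic(r, len(r2_rates)):.2f}")

# r^2 follows directly from r, e.g. for the reported japonica, indica, Arabidopsis r
print([round(v * v, 2) for v in (0.84, 0.78, 0.79)])
```

Squaring the reported correlation coefficients reproduces the reported coefficients of determination, which is a quick consistency check on the values in the text.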

Table III.

Correlation coefficients for the given domain pairs

Correlation Coefficients	japonica	indica	Arabidopsis
r (R2, R3)	0.84**	0.78**	0.79**
r (R2-1, R3-1)	0.19	0.25	0.21
r (R2-2, R3-2)	0.81**	0.75**	0.76**
r (R2-3, R3-3)	0.23	0.27	0.28


r (R2, R3), Correlation coefficient between the R2 and R3 domains; r (R2-1, R3-1), correlation coefficient between helix1 of the R2 domain and helix1 of the R3 domain; r (R2-2, R3-2), correlation coefficient between helix2 of the R2 domain and helix2 of the R3 domain; r (R2-3, R3-3), correlation coefficient between helix3 of the R2 domain and helix3 of the R3 domain. **, Significance at 0.01 level.

DISCUSSION

Gene duplication and subsequent functional divergence of duplicated genes are major mechanisms for the evolution of novel gene function (Ohno, 1967, 1970; Nei, 1969; Li and Gojobori, 1983; Long and Langley, 1993; Ohta, 1994; Walsh, 1995; Zhang et al., 1998; Lynch and Conery, 2000; Sankoff, 2001; Prince and Pickett, 2002). The functional novelty acquired by duplicate genes may reside in the acquisition of novel catalytic or structural properties or in shifts in developmental expression, thereby facilitating a finer modulation of the developmental program. Furthermore, it has been suggested that evolutionary changes in regulatory genes should be the predominant molecular mechanism governing both physiological and morphological evolution. The R2R3-MYB gene family presents an excellent model for exploring the utility of bioinformatics analyses to reveal the potential structural information latent in genome sequence data. Our investigation of R2R3-MYB DNA-binding domains shows that: (a) accelerated rates of protein evolution exist in both helix1 and helix2 of the R2 and R3 domains, whereas helix3 is more constrained in protein evolution, which is consistent with its direct DNA-binding function; and (b) a strong correlation between separate but functionally interacting protein secondary structures exists in the R2 and R3 domains.

Because R2R3-MYB proteins function as monomers when binding to their DNA targets, and alterations in protein sequence play important roles in generating distinct R2R3-MYB functions (Ogata et al., 1994; Williams and Grotewold, 1997; Ganter et al., 1999; Lee and Schiefelbein, 2001), it is reasonable to speculate that the divergent functions of R2R3-MYB genes derive mainly from the evolution of their DNA-binding domains. Experimental evidence shows that helix3 of both the R2 and R3 domains interacts directly with DNA, and that mutations in helix3 often abolish MYB DNA binding and result in loss of MYB protein function (Saikumar et al., 1990; Frampton et al., 1991; Gabrielsen et al., 1991; Guehman et al., 1992; Ogata et al., 1994; Oda et al., 1997; Sasaki et al., 2000). Because helix3 tolerates so little change, these findings suggest that the more flexible helix1 and helix2 can accommodate a higher incidence of amino acid change without deleterious effects. Consistent with these results and with the concept of “evolutionary filtering” (Zurawski and Clegg, 1987), we observed that helix1 and helix2 of both the R2 and R3 domains were subject to significantly higher positive selection pressure than helix3.

It appears likely that subtle changes in DNA-binding characteristics might arise through minor conformational changes associated with structural interactions between helix1 and helix2 of the R2 and R3 domains. These changes in turn might produce structural variation in R2R3-MYB proteins and thereby generate novel characteristics for the recognition of new target gene sites. It remains unclear how positive selection actually operated on R2R3-MYB genes. We speculate that the DNA-binding domains of R2R3-MYB genes evolved to recognize different gene targets as follows: when an R2R3-MYB gene was duplicated, purifying selection on the duplicates was relaxed, and at this stage a few neutral or nearly neutral mutations modified the conformation of one of the two redundant proteins, allowing it to recognize new gene targets. Accelerated non-synonymous substitution followed this event and further enhanced the specialized gene function. Because the analysis of Arabidopsis R2R3-AtMYB genes showed similar evolutionary patterns (Jia et al., 2003), we believe that positive selection was involved in the evolution of novel specific functions of R2R3-MYB genes in both monocot and dicot lineages. However, further experiments are required to examine the functional changes of DNA-binding domains during evolution.

The functions of proteins in biological systems are determined by their physical interactions with other molecules. Molecular interactions are of primary importance in metabolic and signaling pathways. It has been shown that proteins and their interaction partners must co-evolve so that any divergent change in one molecule can be complemented by its interaction partner; otherwise, the interaction between the molecules will be lost along with its function (Moyle et al., 1994; Atwell et al., 1997; Pazos et al., 1997; Jespers et al., 1999). However, the co-evolutionary pattern of DNA-binding domains in proteins that act as transcription factors is not well understood. Recent advances in computational methods allow a preliminary identification of interaction partners from comparative analysis of genome sequence data, providing a different approach to this fundamental problem. In this paper, we performed a co-evolution analysis to ask whether sequence comparisons can be used to recognize novel co-evolutionary relationships. Operationally, this analysis involves statistical comparisons between pairs of secondary structures suspected of interacting with each other within a single molecule. The analysis reveals a strong correlation in amino acid substitution rates between the separate but functionally interacting R2 and R3 domains, and between helix2 of the R2 domain and helix2 of the R3 domain. These specific co-evolutionary patterns highlight the functional importance of the protein tertiary structures formed by these domains (or subdomains). In addition to the finding that helix2 of both the R2 and R3 domains experienced excess non-synonymous substitutions, the analysis showed that the co-evolutionary relationship between the second helices of the R2 and R3 domains is the strongest among all helix pairings in the two domains.
These unique properties of helix2 point to its crucial role in the evolution of the R2R3-MYB gene family.

Molecular genetic analyses have demonstrated that novel morphological and physiological variations may arise from changes in transcription factors. However, recent studies failed to find evidence of positive selection in the evolution of several transcription factor families (Doebley and Lukens, 1998; Lukens and Doebley, 2001; Zhang et al., 2001). It has been suggested that positive selection might have acted over a relatively short period compared with the much longer time elapsed since the gene duplication events. Because most gene families appear to have evolved in an episodic fashion (Gillespie, 1991; Endo et al., 1996; Messier and Stewart, 1997; Zhang et al., 1998, 2001; Lukens and Doebley, 2001), the more frequent purifying and neutral selection events could mask positive selection episodes. The noise reduction technique employed here is an effective approach for inferring, with good sensitivity and specificity, amino acid sites that experienced excess non-synonymous substitutions on specific lineages, and it has been used successfully to identify individual amino acid sites that were positively selected during the evolution of the R2R3-MYB gene family (Jia et al., 2003). Combined with protein secondary structure prediction programs, this approach offers an effective way to study the protein-DNA interaction regions of transcription factors from their primary amino acid sequences.

It has been proposed that manipulating the gene regulatory system could pave the way for improving rice and other cereals (Tyagi and Mohanty, 2000). Experimental data have shown that R2R3-MYB genes carry out important regulatory functions in rice. For example, expression of the GAMYB gene in rice subsp. indica has been reported to be stimulated by gibberellin (GA), and the gene plays important roles in rice flowering (Gubler et al., 1997; Gocal et al., 1999). Some rice R2R3-MYB genes are known to be involved in proanthocyanidin accumulation in developing seeds, the control of fungal infection and host cell death, and the regulation of seed maturation (Suzuki et al., 1997; Lee et al., 2001; Nesi et al., 2001). Unfortunately, the functions of most of the rice R2R3-MYB genes studied in this paper are unknown, and further functional genomics studies are needed to determine their roles in rice. The identification of positive selection sites (or regions) in the DNA-binding domains of rice R2R3-MYB genes indicates the regions of protein structure most susceptible to adaptive change; these analyses should therefore help protein engineers narrow the huge space of potential protein modifications to a manageable subset.

MATERIALS AND METHODS

Rice (Oryza sativa) R2R3-MYB Gene Sequences

The predicted rice R2R3-MYB protein-coding sequences were retrieved from the indica and japonica rice contig databases maintained at the Beijing Genome Institute (http://btn.genomics.org.cn/rice) and the Torrey Mesa Research Institute (http://www.tmri.org), respectively. Both the indica and japonica genome sequences are draft sequences of unknown quality. At least one cDNA sequence (the sequence encoding the DNA-binding domains of an R2R3-AtMYB gene) was arbitrarily chosen from each distinguishable lineage (i.e. each lineage with bootstrap support greater than 90%) in the Arabidopsis R2R3-MYB phylogeny (see Fig. 4 in Jia et al., 2003); these 11 cDNA sequences served as queries in separate BLAST searches against the two databases. They are: AtMYB0 (M79448), AtMYB4 (AF062860), AtMYB31 (X90387), AtMYB49 (AF175991), AtMYB68 (AF062901), AtMYB71 (U62743), AtMYB97 (AF176002), AtMYB102 (X90382), AtMYB109 (AF262734), AtMYB113 (AY008378), and AtMYB117 (AF334816). The contig sequences retrieved by the BLAST searches were pooled to form our initial libraries of R2R3-MYB gene-containing contigs, which were prepared separately for the japonica and indica rice genomes using our automated program (Perl scripts are available from our Web site at http://www.cs.ucr.edu/~lijia/Downloads/Genomics/PlantPhysiology/Perlscripts). GlimmerR (Salzberg et al., 1999), trained to recognize genes in the rice genome, was used to annotate genes along the R2R3-MYB gene-containing contigs in each library. GlimmerR has been shown to have a sensitivity of 94% and a specificity of 97% at the nucleotide level in predicting gene-coding regions on a test set of 172 complete genes (Yuan et al., 2001).
To further ensure the reliability of the GlimmerR-predicted genes for our molecular evolution analyses, expressed sequence tag (EST) databases maintained by GenBank, EMBL, the DNA Data Bank of Japan, The Institute for Genomic Research, the Beijing Genome Institute, and the Torrey Mesa Research Institute were used to confirm genes with EST evidence. Only sequences with an expectation value close to 0.0 and identities above 90% were selected for our study. A table of the japonica and indica putative R2R3-MYB genes and the EST/cDNA clones used to confirm them is presented as supplemental material (Supplemental Table I, available in the online version of this article at http://www.plantphysiol.org). In addition, R2R3-MYB cDNA sequences from KOME (http://cdna01.dna.affrc.go.jp/cDNA; Kikuchi et al., 2003) and GenBank were incorporated into our data set (Supplemental Table II). All annotated codon sequences of their DNA-binding domains are available at http://www.cs.ucr.edu/~lijia/Downloads/Genomics/PlantPhysiology/Sequences. For convenience, each putative rice R2R3-MYB gene identified above was named after the contig containing it, except for cDNAs retrieved from KOME, whose names all start with AK; GAMYB, which is the gene's conventional name; and CA767403, which is the gene's GenBank accession no. All contig sequences used in our study are available in GenBank, and their accession numbers are provided in Supplemental Table I.
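The hit-filtering and contig-pooling steps can be sketched in Python. This is an illustration, not the authors' Perl scripts: it assumes BLAST output in the 12-column tabular format of the modern BLAST+ suite (`-outfmt 6`), which postdates this study, and uses a hypothetical E-value cutoff of 1e-50 as a stand-in for "close to 0.0". It applies the stated identity threshold (above 90%) to each hit and pools the passing subject sequences across queries; the contig identifiers are invented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def passing_subjects(tabular_text, max_evalue=1e-50, min_identity=90.0):
    """Subject IDs with at least one hit meeting both thresholds."""
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS,
                            delimiter="\t")
    return {
        row["sseqid"]
        for row in reader
        if float(row["evalue"]) <= max_evalue and float(row["pident"]) > min_identity
    }


def pool_contigs(per_query_hits):
    """Union of subjects that pass the filters for any query."""
    library = set()
    for hits in per_query_hits.values():
        library |= passing_subjects(hits)
    return library


# Hypothetical BLAST output for two Arabidopsis queries
hits = {
    "AtMYB4": "AtMYB4\tcontig_1\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-120\t500\n"
              "AtMYB4\tcontig_2\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-100\t400\n",
    "AtMYB31": "AtMYB31\tcontig_1\t97.0\t300\t9\t0\t1\t300\t1\t300\t1e-130\t520\n"
               "AtMYB31\tcontig_3\t99.0\t60\t1\t0\t1\t60\t1\t60\t0.001\t50\n",
}
print(sorted(pool_contigs(hits)))
```

Using a set for the pooled library removes the duplicates that arise when several Arabidopsis queries hit the same rice contig.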

Phylogeny Reconstruction

Multiple sequence alignments of the DNA-binding domains of putative rice R2R3-MYB genes were obtained using ClustalW (Thompson et al., 1994). Protal2dna (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html) was then used to align the R2R3-MYB codon sequences according to their protein sequence alignment. As in Jia et al. (2003), only the R2 and R3 domains, 312 nucleotides (i.e. 104 codons) long, were kept for our phylogenetic analyses. Phylogenetic trees were reconstructed from the aligned DNA-binding-domain nucleotide sequences using the maximum likelihood method in fastDNAml (Olsen et al., 1994) and the neighbor-joining method in PHYLIP (Felsenstein, 1995). The maximum likelihood analysis used global rearrangements and a transition/transversion ratio of 1.7. Because both methods gave nearly the same tree topologies, the maximum likelihood trees were used in the subsequent positive selection analyses.
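
The idea behind protal2dna, threading each coding sequence onto its gapped protein alignment so that codons stay in frame, can be sketched as follows. This is not protal2dna's own code; the function name is hypothetical, and it assumes one codon per aligned residue with any terminal stop codon already removed.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Map a gapped protein alignment row back onto its coding sequence.

    Each residue consumes the next codon of `cds`; each gap '-' becomes '---'.
    `cds` must be ungapped and contain exactly one codon per residue
    (strip any terminal stop codon first).
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)


# Toy example: one gap in the protein alignment becomes one gapped codon.
print(protein_to_codon_alignment("MK-W", "ATGAAATGG"))  # ATGAAA---TGG
```

Because gaps are inserted a whole codon at a time, a 104-column protein alignment of the R2R3 region always yields a 312-column codon alignment, which keeps codon positions comparable across sequences for the dN/dS analyses.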

Inference of Excess Non-Synonymous Substitution Sites and Unequal Positive Selection Pressure Test

A maximum likelihood-based free-ratio model from PAML (Yang, 1997), which allows selective pressure to vary among lineages, was used to estimate the dN:dS ratio on each lineage and thereby identify branches that experienced excess non-synonymous substitutions. This ratio, denoted ω, is the non-synonymous substitution rate divided by the synonymous substitution rate, where dN is the number of non-synonymous substitutions per non-synonymous site and dS is the number of synonymous substitutions per synonymous site. The identified branches were also confirmed with a pair-wise comparison method using estimated ancestral codon sequences (Zhang et al., 1997, 1998; Jia et al., 2003). For each identified lineage, a discrete model (PAML; Yang, 1997) was then fitted, in which three ω classes (ω₀, purifying selection; ω₁, neutral evolution; and ω₂, positive selection) and their proportions (p₀, p₁, and p₂) are estimated from the data by maximum likelihood. These estimates identify amino acid sites with excess non-synonymous substitutions that are likely to have experienced positive selection. If the third ω value is significantly greater than 1.0, non-synonymous substitution is inferred to occur more frequently than synonymous substitution at some codons, implicating positive selection. Positively selected amino acid sites were then identified using a Bayesian method implemented in PAML, which calculates the posterior probability that a codon site belongs to each of the three ω classes. However, when purifying selection or neutrality governs the evolution of a phylogeny, the third ω ratio should be less than or equal to 1.0 under the discrete model, and no positively selected sites can be identified (Jia et al., 2003). To infer potential positive selection sites for a given lineage with excess non-synonymous substitutions, the background noise caused by purifying selection and neutral evolution therefore needs to be minimized.
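
The Bayesian site-classification step can be illustrated with a minimal sketch of the posterior calculation under a three-class discrete model. It assumes the per-site likelihoods f(x | ω_k) have already been computed (in PAML they come from the fitted tree and model; here they are made-up numbers). The posterior for class k is p_k f(x | ω_k) divided by the sum of the same product over all three classes. The function name `site_posteriors` is hypothetical, and this is not PAML's code.

```python
def site_posteriors(proportions, site_likelihoods):
    """Posterior probability that one codon site belongs to each omega class.

    proportions:      [p0, p1, p2] estimated under the discrete model.
    site_likelihoods: [f(x | w0), f(x | w1), f(x | w2)] for this site
                      (assumed to be precomputed; values here are invented).
    """
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)  # marginal likelihood of the site
    return [j / total for j in joint]


# Invented numbers: the site fits the omega-2 (positive selection) class best.
post = site_posteriors([0.6, 0.3, 0.1], [1e-6, 2e-6, 5e-5])
print([round(x, 3) for x in post])  # [0.097, 0.097, 0.806]
```

Even though the positive-selection class has the smallest prior proportion (0.1), a site whose data are much more likely under ω₂ receives most of the posterior mass; a cut-off on this posterior is then used to call positively selected sites.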
A noise reduction technique (Jia et al., 2003) was used to infer positively selected sites by sampling all or many random subtrees of the clade corresponding to an identified lineage. In this technique, a sampled subtree of a lineage is informative if its third ω ratio is significantly greater than 1.0 (α = 0.05), so that putative positive selection sites can be determined for the subtree under the discrete model. Once a positively selected branch is identified, the lineage is defined as the clade that has this branch as its deepest branch. In our subtree sampling procedure, this original deepest branch is kept the same for all sampled subtrees. Given the taxa in a lineage, subtree topologies are generated by deleting different combinations of taxa, always retaining the “outgroup” taxon; each taxa combination considered induces a subtree of the original lineage. These induced subtrees were checked one by one under the discrete model to identify informative subtree topologies. For each informative subtree topology, positively selected amino acid sites can be identified and used to infer unequal positive selection pressure along the R2 and R3 domains. Positive selection pressure on a specific protein region is estimated by counting the total number of inferred positive selection sites in that region across the above sampling process.
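
The enumeration and counting steps can be sketched as below. The function names are hypothetical, `min_size=3` is an assumed lower bound on subtree size (the text gives none), and pruning the tree and refitting the discrete model for each taxon subset are not shown. Exhaustive enumeration grows as 2^(n−1) in the number of non-outgroup taxa, which is why random sampling of subtrees is the practical option for large lineages.

```python
from itertools import combinations


def induced_taxon_sets(taxa, outgroup, min_size=3):
    """Enumerate taxon subsets obtained by deleting combinations of taxa.

    The outgroup taxon is always retained, so the lineage's deepest branch
    survives in every induced subtree. min_size is an assumed lower bound.
    """
    others = [t for t in taxa if t != outgroup]
    for n_delete in range(len(others) + 1):
        for deleted in combinations(others, n_delete):
            kept = [outgroup] + [t for t in others if t not in deleted]
            if len(kept) >= min_size:
                yield kept


def count_sites_in_region(site_lists, region):
    """Total inferred positive selection sites falling in region (start, end).

    site_lists holds one list of codon positions per informative subtree.
    """
    start, end = region
    return sum(1 for sites in site_lists for s in sites if start <= s <= end)


print(list(induced_taxon_sets(["A", "B", "C", "D"], outgroup="A")))
# [['A', 'B', 'C', 'D'], ['A', 'C', 'D'], ['A', 'B', 'D'], ['A', 'B', 'C']]
print(count_sites_in_region([[12, 15, 40], [15, 60]], (10, 20)))  # 3
```

A site inferred in several informative subtrees is counted once per subtree, so the regional count reflects how consistently a region is implicated across the sampled topologies.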

Permutation Test on Unequal Positive Selection Pressure

An adapted permutation test (Churchill and Doerge, 1994; Good, 1994; Jia et al., 2003) was used to examine whether positive selection pressures differ significantly between two given regions of the DNA-binding domains. The positive selection count at each codon (amino acid) site is the sum, over all putative positively selected lineages, of the number of sampled subtrees that inferred this site to be a positive selection site. This count is treated as the data point for that site. We use (X₁,..., X_m) to denote the data points in a region with m codon sites and (Y₁,..., Y_n) to denote the data points in another region with n codon sites. Together, these form an observation Z = (X₁,..., X_m, Y₁,..., Y_n). The permutation test checks whether the two sets of data points (i.e. positive selection counts) differ significantly. It was performed by generating 1,000 random samples from the set of all permutations of the observed set Z. Student's t values were calculated for all the permuted data sets, and their distribution was plotted. A two-sided test at the 0.05 or 0.01 level of significance was then used to detect a significant difference.
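
A minimal sketch of this permutation test follows. It assumes a pooled-variance Student's t statistic and estimates the two-sided p value as the fraction of permutations whose |t| is at least the observed |t|; the text does not state exactly how significance was read off the plotted distribution, and the function names and example counts are hypothetical.

```python
import math
import random
import statistics


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance."""
    m, n = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    pooled = ((m - 1) * statistics.variance(x)
              + (n - 1) * statistics.variance(y)) / (m + n - 2)
    if pooled == 0:
        # No within-region spread: t is 0 if the means agree, else unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / m + 1 / n))


def permutation_test(x, y, n_perm=1000, seed=0):
    """Two-sided permutation test on per-site positive selection counts.

    Returns the observed t and the fraction of random permutations of the
    pooled observation Z whose |t| is at least the observed |t|.
    """
    rng = random.Random(seed)
    t_obs = student_t(x, y)
    z = list(x) + list(y)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(z)  # first m values play region X, the rest region Y
        if abs(student_t(z[:len(x)], z[len(x):])) >= abs(t_obs):
            as_extreme += 1
    return t_obs, as_extreme / n_perm


# Hypothetical per-site counts for two helix regions.
t, p = permutation_test([5, 6, 7, 8, 6, 7], [0, 1, 0, 1, 0, 1])
print(f"t = {t:.2f}, p = {p:.3f}")
```

Comparing |t| against the permutation distribution avoids assuming that per-site counts are normally distributed, which they usually are not (many zeros, small integers).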

Correlation Analysis

A correlation analysis protocol originally developed to study the co-evolution of proteins with their interaction partners (Goh et al., 2000) was modified and applied to the secondary structures of R2R3-MYB DNA-binding domains. All multiple alignments of protein sequences from the domains (e.g. the R2 domain) or subdomains (e.g. helix1 in the R3 domain) analyzed were generated using ClustalW. Distances between protein sequences were computed with the PROTDIST program (Felsenstein, 1995), using maximum likelihood estimates based on the Dayhoff PAM matrix. To quantify possible co-evolution among key DNA-binding structures (i.e. the R2 and R3 domains; helix1, helix2, and helix3 in the R2 domain; and helix1, helix2, and helix3 in the R3 domain), a linear regression analysis was used to measure the correlation between the pair-wise evolutionary distances among all sequences in an alignment, as follows.

We define X as a two-dimensional N × N matrix of evolutionary distances, where N is the number of aligned sequences from the subdomains or domains being analyzed (e.g. all the helix2 subdomains of the R2 domains). A similar distance matrix, Y, is constructed for the subdomains hypothesized to have co-evolved with those in X (e.g. the helix2 subdomains of the R3 domains). X_ij denotes the pair-wise distance between subdomains m_i and m_j; likewise, Y_ij denotes the pair-wise distance between subdomains n_i and n_j (where n_i and m_i are from the same DNA-binding domain, and n_j and m_j are from another DNA-binding domain). The correlation coefficient (r) between matrices X and Y (Press et al., 1988) is defined as:
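
$$
r = \frac{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(X_{ij}-\bar{X}\right)\left(Y_{ij}-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(X_{ij}-\bar{X}\right)^{2}\,\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(Y_{ij}-\bar{Y}\right)^{2}}}
$$
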
with −1 ≤ r ≤ 1, where X̄ is the mean of all X_ij values and Ȳ is the mean of all Y_ij values. Co-evolution is inferred when r is significantly different from zero; otherwise, the co-evolution hypothesis is not supported.
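
As a sketch of the correlation itself, the following computes r over the upper-triangle entries (i < j) of two distance matrices. It assumes symmetric matrices with zero diagonals, such as PROTDIST output, given as nested lists; `matrix_correlation` is a hypothetical name and the toy matrices are invented.

```python
import math


def matrix_correlation(X, Y):
    """Pearson r between the i < j entries of two N x N distance matrices."""
    n = len(X)
    xs = [X[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [Y[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all distances are equal")
    return sxy / math.sqrt(sxx * syy)


# Toy matrices: Y is exactly twice X, so the distances correlate perfectly.
X = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
Y = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(matrix_correlation(X, Y))  # 1.0
```

Using only the upper triangle counts each sequence pair once and excludes the zero diagonal, which would otherwise inflate r.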

Estimation of Statistical Significance of Correlation

The significance of an observed value of r is assessed in two ways: by a bootstrap analysis that estimates the sd of r given the size of our data set, and by estimating the probability of obtaining the observed value of r by chance (i.e. the p value; Goh et al., 2000). In the bootstrap analysis, 1,000 data sets, each containing K pairs of pair-wise distances (X_ij, Y_ij), are drawn randomly with replacement from the K pairs in the original set, and the bootstrap correlation coefficient (r_b) is computed for each. The bootstrap interval (i.e. the interval containing 68% of the r_b values) is bounded by the 16th (a) and 84th (b) percentiles of the histogram of the 1,000 r_b values, and the central value of r_b is taken from the 50th percentile (the median). The bootstrap estimate of the sd of the observed correlation is then calculated as:

$$
\sigma_{r} = \frac{b - a}{2}
$$

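
The bootstrap procedure can be sketched as follows, assuming the K distance pairs have been flattened into two lists `xs` and `ys` (for example, from the upper triangles of X and Y). It takes half the width of the 68% interval, (b − a)/2, as the sd estimate; for a roughly normal distribution of r_b, the 16th and 84th percentiles lie one sd on either side of the center. Percentiles come from Python's `statistics.quantiles`, whose interpolation may differ slightly from reading a histogram. The function names are hypothetical, and resamples in which r is undefined are skipped, an assumption not stated in the text.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_sd(xs, ys, n_boot=1000, seed=0):
    """Return ((b - a) / 2, median r_b) from a pairs bootstrap of r."""
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        raise ValueError("both distance sets need at least two distinct values")
    rng = random.Random(seed)
    k = len(xs)
    r_boot = []
    while len(r_boot) < n_boot:
        idx = [rng.randrange(k) for _ in range(k)]  # resample pairs together
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) > 1 and len(set(by)) > 1:   # skip undefined r
            r_boot.append(pearson(bx, by))
    cuts = statistics.quantiles(r_boot, n=100)      # 99 percentile cut points
    a, median, b = cuts[15], cuts[49], cuts[83]     # 16th, 50th, 84th
    return (b - a) / 2, median


xs = [float(i) for i in range(20)]
rng = random.Random(1)
noisy = [v + rng.gauss(0, 5) for v in xs]
sd, med = bootstrap_sd(xs, noisy)
print(f"sd = {sd:.3f}, median r_b = {med:.3f}")
```

Resampling (X_ij, Y_ij) pairs together, rather than each list independently, preserves the pairing whose correlation is being measured.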
The p value is the probability of obtaining a correlation at least as strong as the observed one if the correspondence between a domain (or subdomain) and its putative interaction partner were random. It is obtained by randomly shuffling the pair-wise distances between domains (or subdomains): the assignment of correspondences (e.g. helix1 paired with helix2 in the R2 domain and helix1 paired with helix2 in the R3 domain) is replaced by random assignments, and the correlation coefficient is recomputed as described above. From the 1,000 resulting values of r_rand, the z score for the actual observed value r is calculated as:

$$
z = \frac{r - \bar{r}_{\mathrm{rand}}}{\sigma}
$$

where σ is the sd of the r_rand values and r̄_rand is their mean. The p value is then computed as p = erfc(|z|/√2), where erfc is the complementary error function.
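
The shuffling test can be sketched as below, assuming the paired distances are flattened into lists `xs` and `ys` and that "shuffling the pair-wise distances" means randomly permuting the Y distances relative to the X distances. The two-sided p value p = erfc(|z|/√2) uses Python's `math.erfc`. The function names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def shuffle_significance(xs, ys, n_shuffle=1000, seed=0):
    """Observed r, its z score against shuffled pairings, and two-sided p."""
    rng = random.Random(seed)
    r_obs = pearson(xs, ys)
    shuffled = list(ys)
    r_rand = []
    for _ in range(n_shuffle):
        rng.shuffle(shuffled)  # random correspondence between X and Y entries
        r_rand.append(pearson(xs, shuffled))
    z = (r_obs - statistics.mean(r_rand)) / statistics.stdev(r_rand)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return r_obs, z, p


xs = [float(i) for i in range(30)]
rng = random.Random(2)
ys = [v + rng.gauss(0, 3) for v in xs]
r, z, p = shuffle_significance(xs, ys)
print(f"r = {r:.3f}, z = {z:.1f}, p = {p:.2e}")
```

As a sanity check on the formula, |z| = 1.96 gives p ≈ 0.05, the familiar two-sided 5% threshold.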

Supplementary Material

Supplemental Data

plntphys_134_2_575__index.html (846 B, HTML)

Acknowledgments

We are grateful to Bin Shuai and Haifeng Li for their helpful discussions and to the anonymous referees for many valuable suggestions.

http://www.plantphysiol.org/cgi/doi/10.1104/pp.103.027201.

This work was supported in part by the National Science Foundation Information Technology Research (ITR; grant no. ACI-0085910 to T.J. and M.T.C.), by the National Key Project for Basic Research (973; grant no. 2002CB512801 to T.J.), and by the National Institutes of Health (grant no. P20 RR16475 from the Biomedical Research Infrastructure Network Program of the National Center for Research Resources).

[w] The online version of this article contains Web-only data.

References

Arthur W (2002) The emerging conceptual framework of evolutionary developmental biology. Nature 415: 757–764
Atwell S, Ultsch M, De Vos AM, Wells JA (1997) Structural plasticity in a remodeled protein-protein interface. Science 278: 1125–1128
Ball CA, Cherry JM (2001) Genome comparisons highlight similarity and diversity within the eukaryotic kingdoms. Curr Opin Chem Biol 5: 86–89
Bennetzen J (2002) Opening the door to comparative plant biology. Science 296: 60–63
Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12: 1093–1101
Braun EL, Grotewold E (1999) Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol 121: 21–24
Castresana J (2001) Comparative genomics and bioenergetics. Biochim Biophys Acta 1506: 147–162
Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971
Cubas P, Vincent C, Coen E (1999) An epigenetic mutation responsible for natural variation in floral symmetry. Nature 401: 157–161
Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075–1082
Endo T, Ikeo K, Gojobori T (1996) Large-scale search for genes on which positive selection may operate. Mol Biol Evol 13: 685–690
Felsenstein J (1995) PHYLIP (Phylogeny Inference Package), Version 3.57c. University of Washington, Seattle
Feng Q, Zhang Y, Hao P, Wang S, Fu G, Huang Y, Li Y, Zhu J, Liu Y, Hu X et al. (2002) Sequence and analysis of rice chromosome 4. Nature 420: 316–320
Frampton J, Gibson TJ, Ness SA, Döderlein G, Graf T (1991) Proposed structure for the DNA-binding domain of the Myb oncoprotein based on model building and mutational analysis. Protein Eng 4: 891–901
Gabrielsen OS, Sentenac A, Fromageot P (1991) Specific DNA binding by c-Myb: evidence for a double helix-turn-helix-related motif. Science 253: 1140–1143
Ganter B, Chao S, Lipsick J (1999) Transcriptional activation by the Myb proteins requires a specific local promoter structure. FEBS Lett 460: 401–410
Gillespie J (1991) The Causes of Molecular Evolution. Oxford University Press, New York
Gocal G, Poole F, Gubler AT, Watts F, Blundell RG, King RW (1999) Long-day up-regulation of a GAMYB gene during Lolium temulentum inflorescence formation. Plant Physiol 119: 1271–1278
Goff S, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H et al. (2002) A draft sequence of the rice genome (Oryza sativa ssp. japonica). Science 296: 92–100
Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE (2000) Coevolution of proteins with their interaction partners. J Mol Biol 299: 283–293
Good P (1994) Permutation Tests: A Practical Guide to Resampling for Testing Hypotheses. Springer-Verlag, New York
Gubler F, Kalla R, Roberts JK, Jacobsen JV (1995) Gibberellin-regulated expression of a myb gene in barley aleurone cells: evidence for Myb transactivation of a high-pI alpha-amylase gene promoter. Plant Cell 7: 1879–1891
Gubler F, Watts R, Kalla J, Matthews R, Keys P, Jacobsen JV (1997) Cloning of a rice cDNA encoding a transcription factor homologous to barley GAMyb. Plant Cell Physiol 38: 362–365
Guehman S, Vorbrueggen G, Kalkbrenner F, Moelling K (1992) Reduction of a conserved Cys is essential for Myb DNA-binding. Nucleic Acids Res 20: 2279–2286
Jespers L, Lijnen HR, Vanwetswinkel S, Van Hoef B, Brepoels K, Collen D, De Maeyer M (1999) Guiding a docking mode by phage display: selection of correlated mutations at the staphylokinase-plasmin interface. J Mol Biol 290: 471–479
Jia L, Clegg MT, Jiang T (2003) Excess nonsynonymous substitutions suggest that positive selection episodes operated in the DNA-binding domain evolution of Arabidopsis R2R3-MYB genes. Plant Mol Biol (in press)
Jin H, Martin C (1999) Multifunctionality and diversity within the plant MYB-gene family. Plant Mol Biol 41: 577–585
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H et al. (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376–379
Kranz HD, Denekamp M, Greco R, Jin H, Leyva A, Meissner RC, Petroni K, Urzainqui A, Bevan M, Martin C et al. (1998) Towards functional characterization of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J 16: 263–276
Kranz HD, Scholz K, Weisshaar B (2000) c-MYB oncogene-like genes encoding three MYB repeats occur in all major plant lineages. Plant J 21: 231–235
Lee MM, Schiefelbein J (2001) Developmentally distinct MYB genes encode functionally equivalent proteins in Arabidopsis. Development 128: 1539–1546
Lee MW, Qi M, Yang Y (2001) A novel jasmonic acid-inducible rice myb gene associates with fungal infection and host cell death. Mol Plant-Microbe Interact 14: 527–535
Li WH, Gojobori T (1983) Rapid evolution of goat and sheep globin genes following gene duplication. Mol Biol Evol 1: 94–108
Lipsick JS (1996) One billion years of Myb. Oncogene 13: 223–235
Long M, Langley CH (1993) Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260: 91–95
Lukens L, Doebley J (2001) Molecular evolution of the teosinte branched gene among maize and related grasses. Mol Biol Evol 18: 627–638
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155
Martin C, Paz-Ares J (1997) MYB transcription factors in plants. Trends Genet 13: 67–73
Messier W, Stewart C (1997) Episodic adaptive evolution of primate lysozymes. Nature 385: 151–154
Moyle WR, Campbell RK, Myers RV, Bernard MP, Han Y, Wang X (1994) Co-evolution of ligand-receptor pairs. Nature 368: 251–255
Nei M (1969) Gene duplication and nucleotide substitution in evolution. Nature 221: 5175–5177
Nesi N, Jond C, Debeaujon I, Caboche M, Lepiniec L (2001) The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. Plant Cell 13: 2099–2114
Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929–936
Oda M, Furukawa K, Ogata K, Sarai A, Ishii S, Nishimura Y, Nakamura H (1997) Identification of indispensable residues for specific DNA-binding in the imperfect tandem repeats of c-Myb R2R3. Protein Eng 10: 1407–1414
Ogata K, Morikawa S, Nakamura H, Sekikawa A, Inoue T, Kanai H, Sarai A, Ishii S, Nishimura Y (1994) Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79: 639–648
Ohno S (1967) Sex Chromosomes and Sex-Linked Genes. Springer, New York
Ohno S (1970) Evolution by Gene Duplication. Springer, New York
Ohta T (1994) Further examples of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138: 1331–1337
Olsen GJ, Matsuda H, Hagstrom R, Overbeek R (1994) fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput Appl Biosci 10: 41–48
Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271: 511–523
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1988) Numerical Recipes in C. Cambridge University Press, Cambridge, UK
Prince VE, Pickett FB (2002) Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet 3: 827–837
Romero I, Fuertes A, Benito MJ, Malpica JM, Leyva A, Paz-Ares J (1998) More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. Plant J 14: 273–284
Saikumar P, Murali R, Reddy EP (1990) Role of tryptophan repeats and flanking amino acids in Myb-DNA interactions. Proc Natl Acad Sci USA 87: 8452–8456
Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59: 24–31
Sankoff D (2001) Gene and genome duplication. Curr Opin Genet Dev 11: 681–684
Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y et al. (2002) The genome sequence and structure of rice chromosome 1. Nature 420: 312–316
Sasaki M, Ogata K, Hatanaka H, Nishimura Y (2000) Backbone dynamics of the c-Myb DNA-binding domain complexed with a specific DNA. J Biochem 127: 945–953
Sasaki T, Burr B (2000) International rice genome sequencing project: the effort to completely sequence the rice genome. Curr Opin Plant Biol 3: 138–141
Sreekumar KR, Aravind L, Koonin E (2001) Computational analysis of human disease-associated genes and their protein products. Curr Opin Genet Dev 11: 247–257
Stern D (1998) A role of Ultrabithorax in morphological differences between Drosophila species. Nature 396: 463–466
Stracke R, Werber M, Weisshaar B (2001) The R2R3-MYB gene family in Arabidopsis thaliana. Curr Opin Plant Biol 4: 447–456
Sutton K, Wilkinson M (1997) Rapid evolution of a homeodomain: evidence for positive selection. J Mol Evol 45: 579–588
Suzuki A, Suzuki T, Tanabe F, Toki S, Washida H, Wu CY, Takaiwa F (1997) Cloning and expression of five myb-related genes from rice seed. Gene 198: 393–398
Tyagi AK, Mohanty A (2000) Rice transformation for crop improvement and functional genomics. Plant Sci 158: 1–18
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680
Urao T, Yamaguchi-Shinozaki K, Urao S, Shinozaki K (1993) An Arabidopsis myb homolog is induced by dehydration stress and its gene product binds to the conserved MYB recognition sequence. Plant Cell 5: 1529–1539
Wagner A (2001) Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet 17: 237–239
Walsh JB (1995) How often do duplicated genes evolve new functions? Genetics 139: 421–428
Williams C, Grotewold E (1997) Differences between plant and animal Myb domains are fundamental for DNA binding activity, and chimeric Myb domains have novel DNA binding specificities. J Biol Chem 272: 563–571
Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH (1989) Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86: 6201–6205
Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13: 555–556
Yang Z, Nielsen R (1998) Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol 46: 409–418
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Peng Y, Dai L, Zhou Y, Zhang X et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–91
Yuan QP, OuYang S, Liu J, Suh B, Cheung F, Sultana R, Lee D, Quackenbush J, Buell CR (2003) The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists. Nucleic Acids Res 31: 229–233
Yuan QP, Quackenbush J, Sultana R, Pertea M, Salzberg SL, Buell CR (2001) Rice bioinformatics: analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol 125: 1166–1174
Zhang J, Kumar S, Nei M (1997) Small-sample tests of episodic adaptive evolution: a case study of primate lysozymes. Mol Biol Evol 14: 1335–1338
Zhang J, Rosenberg HF, Nei M (1998) Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA 95: 3708–3713
Zhang L, Pond SK, Gaut B (2001) A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa. J Mol Evol 52: 144–156
Zurawski G, Clegg MT (1987) Evolution of higher-plant chloroplast DNA-encoded genes: implications for structure-function and phylogenetic studies. Annu Rev Plant Physiol 38: 391–418

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Data

plntphys_134_2_575__index.html^{(846B, html)}

plntphys_134_2_575__27201_Supplemental_table.doc^{(68KB, doc)}

[ref1] Arthur W (2002) The emerging conceptual framework of evolutionary developmental biology. Nature 415: 757–764 [DOI] [PubMed] [Google Scholar]

[ref2] Atwell S, Ultsch M, De Vos AM, Wells JA (1997) Structural pasticity in a remodeled protein-protein interface. Science 278: 1125–1128 [DOI] [PubMed] [Google Scholar]

[ref3] Ball CA, Cherry JM (2001) Genome comparisons highlight similarity and diversity within the eukaryotic kingdoms. Curr Opin Chem Biol 5: 86–89 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] Bennetzen J (2002) Opening the door to comparative plant biology. Science 296: 60–63 [DOI] [PubMed] [Google Scholar]

[ref5] Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12: 1093–1101 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] Braun EL, Grotewold E (1999) Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol 121: 21–24 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] Castresana J (2001) Comparative genomics and bioenergetics. Biochim Biophys Acta 1506: 147–162 [DOI] [PubMed] [Google Scholar]

[ref8] Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] Cubas P, Vincent C, Coen E (1999) An epigenetic mutation responsible for natural variation in floral symmetry. Nature 401: 157–161 [DOI] [PubMed] [Google Scholar]

[ref10] Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075–1082 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] Endo T, Ikeo K, Gojobori T (1996) Large-scale search for genes on which positive selection may operate. Mol Biol Evol 13: 685–690 [DOI] [PubMed] [Google Scholar]

[ref12] Felsenstein J (1995) PHYLIP (Phylogeny Inference Package), Version 3.57c. University of Washington, Seattle

[ref13] Feng Q, Zhang Y, Hao P, Wang S, Fu G, Huang Y, Li Y, Zhu J, Liu Y, Hu X et al. (2002) Sequence and analysis of rice chromosome 4. Nature 420: 316–320 [DOI] [PubMed] [Google Scholar]

[ref14] Frampton J, Gibson TJ, Ness SA, Doderiein G, Graf T (1991) Proposed structure for the DNA-binding domain of the Myb oncoprotein based on model building and mutational analysis. Prot Eng 4: 891–901 [DOI] [PubMed] [Google Scholar]

[ref15] Gabrielsen OS, Sentenac A, Fromageot P (1991) Specific DNA binding by c-Myb: evidence for a double helix-turn-helix-related motif. Science 253: 1140–1143 [DOI] [PubMed] [Google Scholar]

[ref16] Ganter B, Chao S, Lipsick J (1999) Transcriptional activation by the Myb proteins requires a specific local promoter structure. FEBS letters 460: 401–410 [DOI] [PubMed] [Google Scholar]

[ref17] Gillespie J (1991) The causes of molecular evolution. Oxford University Press, New York

[ref18] Gocal G, Poole F, Gubler AT, Watts F, Blundell RG, King RW (1999) Long-day up-regulation of a GAMYB gene during Lolium temulentum inflorescence formation. Plant Physiol 119: 1271–1278 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref19] Goff S, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H et al. (2002) A draft sequence of the rice genome (Oryza sativa ssp. japonica). Science 296: 92–100 [DOI] [PubMed] [Google Scholar]

[ref20] Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE (2000) Coevolution of proteins with their interaction partners. J Mol Biol 299: 283–293 [DOI] [PubMed] [Google Scholar]

[ref21] Good P (1994) Permutation Tests: A Practical Guide to Resampling for Testing Hypotheses. Springer-Verlag, New York

[ref22] Gubler F, Kalla R, Roberts JK, Jacobsen JV (1995) Gibberellin-regulated expression of a myb gene in barley aleurone cells: evidence for Myb transactivation of a high-pI alpha-amylase gene promoter. Plant Cell 7: 1879–1891 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] Gubler F, Watts R, Kalla J, Matthews R, Keys P, Jacobsen JV (1997) Cloning of a rice cDNA encoding a transcription factor homologous to barley GAMyb. Plant Cell Physiol 38: 362–365 [DOI] [PubMed] [Google Scholar]

[ref24] Guehman S, Vorbrueggen G, Kalkbrenner F, Moelling K (1992) Reduction of a conserved Cys is essential for Myb DNA-binding. Nucleic Acids Res 20: 2279–2286 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref25] Jespers L, Lijnen HR, Vanwetswinkel S, Van Hoef B, Brepoels K, Collen D, De Maeyer M (1999) Guiding a docking made by phage display: selection of correlated mutations at the staphylokinase-plasmin interface. J Mol Biol 290: 471–479 [DOI] [PubMed] [Google Scholar]

[ref26] Jia L, Clegg MT, Jiang T (2003) Excess nonsynonymous substitutions suggest that positive selection episodes operated in the DNA-binding domain evolution of Arabidopsis R2R3-MYB genes. Plant Mol Biol (in press) [DOI] [PubMed]

[ref27] Jin H, Martin C (1999) Multifunctionality and diversity within the plant MYB-gene family. Plant Mol Biol 41: 577–585 [DOI] [PubMed] [Google Scholar]

[ref28] Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, et al (2003) Collection, mapping, and annotation of over 28, 000 cDNA clones from japonica rice. Science 301: 376–379 [DOI] [PubMed] [Google Scholar]

[ref29] Kranz HD, Denekamp M, Greco R, Jin H, Leyva A, Meissner RC, Petroni K, Urzainqui A, Bevan M, Martin C et al. (1998) Towards functional characterization of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J 16: 263–276 [DOI] [PubMed] [Google Scholar]

[N0xb095bc8N0xac42df0] Kranz HD, Scholz K, Weisshaar B (2000) c-MYB oncogene-like genes encoding three MYB repeats occur in all major plant lineages. Plant J 21: 231–235 [DOI] [PubMed] [Google Scholar]

[ref31] Lee MM, Schiefelbein J (2001) Developmentally distinct MYB genes encode functionally equivalent proteins in Arabidopsis. Development 128: 1539–1546 [DOI] [PubMed] [Google Scholar]

[ref32] Lee MW, Qi M, Yang Y (2001) A novel jasmonic acid-inducible rice myb gene associates with fungal infection and host cell death. Mol Plant-Microbe Interact 14: 527–535 [DOI] [PubMed] [Google Scholar]

[ref33] Li WH, Gojobori T (1983) Rapid evolution of goat and sheep globin genes following gene duplication. Mol Biol Evol 1: 94–108 [DOI] [PubMed] [Google Scholar]

[ref34] Lipsick JS (1996) One billion years of Myb. Oncogene 13: 223–235 [PubMed] [Google Scholar]

[ref35] Long M, Langley CH (1993) Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260: 91–95 [DOI] [PubMed] [Google Scholar]

[ref36] Lukens L, Doebley J (2001) Molecular evolution of the teosinte branched gene among maize and related grasses. Mol Biol Evol 18: 627–638 [DOI] [PubMed] [Google Scholar]

[ref37] Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155 [DOI] [PubMed] [Google Scholar]

[ref38] Martin C, Paz-Ares J (1997) MYB transcription factors in plants. Trends Genet 13: 67–73 [DOI] [PubMed] [Google Scholar]

[ref39] Messier W, Stewart C (1997) Episodic adaptive evolution of primate lysozymes. Nature 385: 151–154 [DOI] [PubMed] [Google Scholar]

[ref40] Moyle WR, Campbell RK, Myers RV, Bernard MP, Han Y, Wang X (1994) Co-evolution of ligand-receptor pairs. Nature 368: 251–255 [DOI] [PubMed] [Google Scholar]

[ref41] Nei M (1969) Gene duplication and nucleotide substitution in evolution. Nature 221: 40–42 [DOI] [PubMed] [Google Scholar]

[ref42] Nesi N, Jond C, Debeaujon I, Caboche M, Lepiniec L (2001) The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. Plant Cell 13: 2099–2114 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref43] Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929–936 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref44] Oda M, Furukawa K, Ogata K, Sarai A, Ishii S, Nishimura Y, Nakamura H (1997) Identification of indispensable residues for specific DNA-binding in the imperfect tandem repeats of c-Myb R2R3. Protein Eng 10: 1407–1414 [DOI] [PubMed] [Google Scholar]

[ref45] Ogata K, Morikawa S, Nakamura H, Sekikawa A, Inoue T, Kanai H, Sarai A, Ishii S, Nishimura Y (1994) Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79: 639–648 [DOI] [PubMed] [Google Scholar]

[ref46] Ohno S (1967) Sex Chromosomes and Sex-Linked Genes. Springer, New York

[ref47] Ohno S (1970) Evolution by Gene Duplication. Springer, New York

[ref48] Ohta T (1994) Further examples of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138: 1331–1337 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref49] Olsen GJ, Matsuda H, Hagstrom R, Overbeek R (1994) fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput Appl Biosci 10: 41–48 [DOI] [PubMed] [Google Scholar]

[ref50] Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271: 511–523 [DOI] [PubMed] [Google Scholar]

[ref51] Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1988) Numerical Recipes in C. Cambridge University Press, Cambridge, UK

[ref52] Prince VE, Pickett FB (2002) Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet 3: 827–837 [DOI] [PubMed] [Google Scholar]

[ref53] Romero I, Fuertes A, Benito MJ, Malpica JM, Leyva A, Paz-Ares J (1998) More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. Plant J 14: 273–284 [DOI] [PubMed] [Google Scholar]

[ref54] Saikumar P, Murali R, Reddy EP (1990) Role of tryptophan repeats and flanking amino acids in Myb-DNA interactions. Proc Natl Acad Sci USA 87: 8452–8456 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref55] Salzberg SL, Pertea M, Delcher AL, Gardner MJ, Tettelin H (1999) Interpolated Markov models for eukaryotic gene finding. Genomics 59: 24–31 [DOI] [PubMed] [Google Scholar]

[ref56] Sankoff D (2001) Gene and genome duplication. Curr Opin Genet Dev 11: 681–684 [DOI] [PubMed] [Google Scholar]

[ref57] Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y, et al. (2002) The genome sequence and structure of rice chromosome 1. Nature 420: 312–316 [DOI] [PubMed] [Google Scholar]

[ref58] Sasaki M, Ogata K, Hatanaka H, Nishimura Y (2000) Backbone dynamics of the c-Myb DNA-binding domain complexed with a specific DNA. J Biochem 127: 945–953 [DOI] [PubMed] [Google Scholar]

[ref59] Sasaki T, Burr B (2000) International rice genome sequencing project: the effort to completely sequence the rice genome. Curr Opin Plant Biol 3: 138–141 [DOI] [PubMed] [Google Scholar]

[ref60] Sreekumar KR, Aravind L, Koonin EV (2001) Computational analysis of human disease-associated genes and their protein products. Curr Opin Genet Dev 11: 247–257 [DOI] [PubMed] [Google Scholar]

[ref61] Stern D (1998) A role of Ultrabithorax in morphological differences between Drosophila species. Nature 396: 463–466 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref62] Stracke R, Werber M, Weisshaar B (2001) The R2R3-MYB gene family in Arabidopsis thaliana. Curr Opin Plant Biol 4: 447–456 [DOI] [PubMed] [Google Scholar]

[ref63] Sutton K, Wilkinson M (1997) Rapid evolution of a homeodomain: evidence for positive selection. J Mol Evol 45: 579–588 [DOI] [PubMed] [Google Scholar]

[ref64] Suzuki A, Suzuki T, Tanabe F, Toki S, Washida H, Wu CY, Takaiwa F (1997) Cloning and expression of five myb-related genes from rice seed. Gene 198: 393–398 [DOI] [PubMed] [Google Scholar]

[ref65] Tyagi AK, Mohanty A (2000) Rice transformation for crop improvement and functional genomics. Plant Sci 158: 1–18 [DOI] [PubMed] [Google Scholar]

[ref66] Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref67] Urao T, Yamaguchi-Shinozaki K, Urao S, Shinozaki K (1993) An Arabidopsis myb homolog is induced by dehydration stress and its gene product binds to the conserved MYB recognition sequence. Plant Cell 5: 1529–1539 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref68] Wagner A (2001) Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet 17: 237–239 [DOI] [PubMed] [Google Scholar]

[ref69] Walsh JB (1995) How often do duplicated genes evolve new functions? Genetics 139: 421–428 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref70] Williams C, Grotewold E (1997) Differences between plant and animal Myb domains are fundamental for DNA binding activity, and chimeric Myb domains have novel DNA binding specificities. J Biol Chem 272: 563–571 [DOI] [PubMed] [Google Scholar]

[ref71] Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH (1989) Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86: 6201–6205 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref72] Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13: 555–556 [DOI] [PubMed] [Google Scholar]

[ref73] Yang Z, Nielsen R (1998) Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol 46: 409–418 [DOI] [PubMed] [Google Scholar]

[ref74] Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Peng Y, Dai L, Zhou Y, Zhang X, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79–91 [DOI] [PubMed] [Google Scholar]

[ref75] Yuan QP, OuYang S, Liu J, Suh B, Cheung F, Sultana R, Lee D, Quackenbush J, Buell CR (2003) The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists. Nucleic Acids Res 31: 229–233 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref76] Yuan QP, Quackenbush J, Sultana R, Pertea M, Salzberg SL, Buell CR (2001) Rice bioinformatics: analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol 125: 1166–1174 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref77] Zhang J, Kumar S, Nei M (1997) Small-sample tests of episodic adaptive evolution: a case study of primate lysozymes. Mol Biol Evol 14: 1335–1338 [DOI] [PubMed] [Google Scholar]

[ref78] Zhang J, Rosenberg HF, Nei M (1998) Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA 95: 3708–3713 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref79] Zhang L, Pond SK, Gaut B (2001) A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa. J Mol Evol 52: 144–156 [DOI] [PubMed] [Google Scholar]

[ref80] Zurawski G, Clegg MT (1987) Evolution of higher-plant chloroplast DNA-encoded genes: implications for structure-function and phylogenetic studies. Annu Rev Plant Physiol 38: 391–418 [Google Scholar]
