Determinants of Exon-Level Evolutionary Rates in Arabidopsis Species

Gideon C-T Wu; Feng-Chi Chen

2012 Jul 4;8:389–415. doi: 10.4137/EBO.S9743


PMCID: PMC3399485 PMID: 22844194

Abstract

What causes variation in evolutionary rates is a fundamental question in molecular evolution. However, in plants, the causes of within-gene evolutionary rate variations remain underexplored. Here we use principal component regression to examine the contributions of eleven exon features to the within-gene variations in nonsynonymous substitution rate (d_N), synonymous substitution rate (d_S), and the d_N/d_S ratio in Arabidopsis species. We demonstrate that exon features related to protein structural-functional constraints and mRNA splicing account for the largest proportions of within-gene variations in d_N/d_S and d_N. Meanwhile, for d_S, a combination of expression level, exon length, and structural-functional features explains the largest proportion of within-gene variances. Our results suggest that the determinants of within-gene variations differ from those of between-gene variations in evolutionary rates. Furthermore, the relative importance of different exon features also differs between plants and animals. Our study thus may shed new light on the evolution of plant genes.

Keywords: exonic evolutionary rates, nonsynonymous substitution rate, synonymous substitution rate, principal component regression, Arabidopsis thaliana

Introduction

Evolutionary rates are known to vary significantly between genes.¹,² The genomic determinants of evolutionary rates have been extensively studied.² Gene expression level is presently considered a major determinant of evolutionary rate: highly expressed genes tend to have a lower nonsynonymous substitution rate (d_N) and a lower nonsynonymous-to-synonymous rate ratio (d_N/d_S) than lowly expressed genes.³,⁴ Meanwhile, other biological features are also reported to be important for determining evolutionary rates, including tissue specificity of gene expression,⁵,⁶ gene compactness,⁷ protein extracellularity,⁸ presence of paralogous genes,⁹,¹⁰ and G+C content.¹¹,¹² Notably, evolutionary rates also vary within genes. For example, the determinants of evolutionary rates, such as the number of solvent-accessible amino acid residues,¹³,¹⁴ proportion of intrinsically disordered regions (IDRs),¹⁵–¹⁷ and proportion of functional domains,¹⁸–²⁰ may differ significantly between exons, leading to significant variations in evolutionary rates even between different exons of the same gene.

Recently, it has been shown that alternative splicing (AS), an important mechanism to increase proteome diversity,²¹,²² has significant effects on within-gene variations in evolutionary rates in mammals.¹⁷,²³,²⁴ In particular, the protein regions encoded by alternatively spliced exons (ASEs) evolve more rapidly than those encoded by constitutively spliced exons (CSEs).²³,²⁴ Notably, however, the contribution of AS to the diversity of the transcriptome and proteome differs between animals and plants, with the former having more alternatively spliced transcript isoforms than the latter.²²,²⁵–²⁷ Furthermore, the patterns of AS also differ between the two lineages. Genome-wide studies showed that the major AS type in plants is intron retention,²⁸,²⁹ which is the rarest event in animals.²²,²⁶,²⁷ Therefore, whether AS also plays an important role in plant exon evolution remains unknown.

Another difference between plants and animals is the higher rate of gene duplication in plant genomes. In particular, whole-genome duplications (WGDs) occur more frequently in plants than in animals.³⁰,³¹ The genome of Arabidopsis thaliana, for example, has experienced at least four WGD events.³² In addition, tandem gene duplicates also occur more frequently in plants than in animals.³³ Accordingly, gene duplication may have profound impacts on the functions and evolution of plant genes. Since both AS and gene duplication can increase proteome diversity, the higher level of gene duplication may have compensated for the lower level of AS in plants, thus decreasing the importance of AS. In this vein, it is likely that the influences of AS on exon evolution are less significant in plants than in animals. This hypothesis, nevertheless, has not been examined.

In this study, we systematically examined the contributions of eleven exon features to the evolutionary rates of plant exons: (1) exonic expression level; (2) the ASE/CSE exon types; (3) weighted exon frequency (WEF, which is an mRNA splicing-related feature; see Materials and Methods); (4) proportion of solvent-accessible amino acid residues (PSA); (5) proportion of Pfam domain (PD); (6) proportion of intrinsically disordered regions (PIDR); (7) exon length; (8) 5′ intron length; (9) 3′ intron length; (10) exon duplicability; and (11) G+C content. Note that some of these features are correlated with each other. One obvious example is PSA, PD, and PIDR, all of which are structural-functional features. Intrinsically disordered regions tend to lack functionally structured domains and to be solvent-accessible.³⁴–³⁶ In addition, 5′ and 3′ intron lengths are both related to the “compactness” of the exons of interest, and thus are collectively termed “compactness features”. Other correlations are less obvious. For example, G+C content was suggested to be positively correlated with the level of gene duplication.¹⁰ Gene compactness was also reported to be positively correlated with G+C content and gene expression level, but negatively correlated with protein evolutionary rate.⁹ Furthermore, gene compactness correlates with gene expression level in opposite directions in mammals and plants.³⁷,³⁸ In view of the correlations between the analyzed exon features, an analysis that can control for the confounding effects of intercorrelated variables is required for the purpose of this study. Here we use principal component regression (PCR) analysis to delineate the relative contributions of the eleven exon features to the variations in exonic evolutionary rates in Arabidopsis species (Arabidopsis thaliana and A. lyrata). PCR has been shown to be appropriate for analyzing interacting variables in noisy data.³⁹,⁴⁰

Our results suggest that for Arabidopsis, structural-functional features constitute a single dominant component in affecting the variances in exonic d_N/d_S and d_N. For d_S, however, a combination of multiple features, including exonic expression level, exon length, and structural-functional features, constitutes the most important component in determining the variance. Our results suggest that the determinants of exon-level evolutionary rates are fairly different from those of gene-level evolutionary rates. Furthermore, the determinants of exonic evolutionary rates also differ between animals and plants. Our analysis thus provides new insights into plant exon evolution.

Materials and Methods

Data sources and sequence alignments

The genomic sequences, transcript sequences, gene annotations, transcript structures, and gene orthology between A. thaliana and A. lyrata were retrieved from the EnsemblPlants website (http://plants.ensembl.org/index.html) (Release 11, TAIR10) via the BioMart interface.⁴¹,⁴² To ensure data quality, only known transcripts with known protein products were analyzed. Since AS is one of the foci of this study, we retained the genes that have at least 2 transcripts and obtained 52,840 orthologous exon pairs from 4,926 one-to-one orthologous gene pairs, which correspond to 11,723 A. thaliana transcripts. We defined the CSE/ASE exon type according to the transcripts of A. thaliana because it is better annotated and has a larger number of alternatively spliced transcripts than A. lyrata. Exons lacking information for any of the analyzed features were filtered out. In the end, we generated an integrated dataset of 28,173 within-gene exon pairs based on the 9,412 exons of 2,102 transcripts.

For each orthologous gene pair, the peptide sequences of all transcript isoforms were aligned by using the MUSCLE program.⁴³ We then chose the longest alignable pair of orthologous peptide sequences for each gene pair. These sequences were then back-translated to nucleotide sequences and divided into exons with reference to A. thaliana annotations for calculation of evolutionary rates and measurements of exon features. For simplicity, the codons that span an exon-exon boundary were excluded from our analysis.
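The exon-splitting step can be illustrated with a minimal sketch. It assumes an ungapped coding sequence and a list of exon lengths in transcript order; the function name `split_codons_by_exon` and the toy input are hypothetical, and the real pipeline works on aligned ortholog pairs rather than a single sequence.

```python
def split_codons_by_exon(cds, exon_lengths):
    """Assign whole codons to exons and drop codons that span an exon-exon boundary.

    cds          -- coding sequence (string), read in frame from position 0
    exon_lengths -- coding length of each exon, in transcript order
    Returns one string of concatenated codons per exon.
    """
    if sum(exon_lengths) != len(cds):
        raise ValueError("exon lengths must add up to the CDS length")

    # Half-open [start, end) coordinates of each exon within the CDS.
    bounds = []
    start = 0
    for length in exon_lengths:
        bounds.append((start, start + length))
        start += length

    per_exon = [[] for _ in exon_lengths]
    last_full_codon_end = len(cds) - len(cds) % 3
    for codon_start in range(0, last_full_codon_end, 3):
        codon_end = codon_start + 3
        for i, (s, e) in enumerate(bounds):
            # Keep the codon only if it lies entirely inside one exon.
            if s <= codon_start and codon_end <= e:
                per_exon[i].append(cds[codon_start:codon_end])
                break
    return ["".join(codons) for codons in per_exon]


# Toy example: the second codon (positions 3-5) straddles exons 1 and 2.
exons = split_codons_by_exon("ATGAAACCCGGGTTT", [4, 5, 6])
```

In this toy input the codon `AAA` is discarded because it crosses the first boundary, so the three exons retain `ATG`, `CCC`, and `GGGTTT`, respectively.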

Measurements of exonic expression level

The RNA-seq data were retrieved from the Sequence Read Archive (SRA)⁴⁴ website (http://trace.ncbi.nlm.nih.gov/Traces/sra/). We retrieved a series of raw data submitted by Filichkin et al (SRP000935)²⁹ and Schimidt’s lab (SRP007763). These RNA-seq data cover several experimental conditions, including abiotic stresses (cold, drought, heat, high light, and salt), aging stress, and response to different iron concentrations, in addition to normal growth conditions. To measure exonic expression level, we first identified the unique sequences in the analyzed exons using in-house PERL scripts, and then counted the RNA-seq short reads that could be mapped to these unique regions. The RNA-seq raw data were converted to fastq files by using sratoolkit 2.1.6 (available at: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software). The fastq files were further processed and filtered by the Bowtie-TopHat-Cufflinks-SAMtools toolset⁴⁵–⁴⁸ (available at: http://bowtie-bio.sourceforge.net/index.shtml; http://tophat.cbcb.umd.edu/; http://cufflinks.cbcb.umd.edu/; http://samtools.sourceforge.net/) to generate the FPKM (fragments per kilobase of exon per million fragments mapped) value for each exon of interest. The expression level of an exon (expL) is the average FPKM value.
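For orientation, the textbook FPKM definition can be written out as a short sketch. This is only the basic formula, not the statistical model that Cufflinks applies; the function names are hypothetical, and averaging over RNA-seq samples is an assumption about what "average FPKM" refers to.

```python
def fpkm(fragments, region_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments              -- fragments mapped to the unique region of the exon
    region_length_bp       -- length of that region in base pairs
    total_mapped_fragments -- all mapped fragments in the sample
    """
    if region_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million fragments)
    return fragments * 1e9 / (region_length_bp * total_mapped_fragments)


def mean_expression(per_sample_fpkm):
    """Average FPKM of one exon over samples (assumed meaning of expL)."""
    if not per_sample_fpkm:
        raise ValueError("need at least one sample")
    return sum(per_sample_fpkm) / len(per_sample_fpkm)


# 50 fragments on a 500-bp unique region in a library of 2 million mapped fragments.
example_fpkm = fpkm(50, 500, 2_000_000)
```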

Estimations of exonic evolutionary rates

We calculated d_N, d_S, and d_N/d_S of all orthologous exon pairs by using the YN00 program of the PAML 4 package.⁴⁹,⁵⁰ To avoid biases in evolutionary rate estimations, we excluded exons shorter than 81 bp.¹⁷,⁵¹

Computation of splicing features

The analyzed exons were divided into five different classes: complex ASE, multiple ASE, single ASE, simple CSE, and complex CSE (see Supplementary Fig. S1A for more details). Briefly, ASEs are the exons that are occasionally skipped in the transcript, whereas CSEs are those that are always present in the transcript. This classification, however, is apparently oversimplified because exons from different transcript isoforms usually partially overlap with each other (Supplementary Fig. S1A). We thus divided ASEs into three different types. A single ASE is one that occurs only once in all of the transcript isoforms. By contrast, a multiple ASE occurs multiple times, and the boundary of this ASE remains unchanged in different transcript isoforms. If a multiple ASE changes its boundary in different transcripts, it is termed a complex ASE. Meanwhile, a simple CSE is one that does not change its boundaries in transcript isoforms. In the case where the boundaries of a CSE do change, or the CSE becomes discontinuous in another transcript isoform, the CSE is termed a complex CSE (Supplementary Fig. S1A). To incorporate this exon classification into our PCR analysis, we assigned an integer value to each of the five types of exons: “1” for complex ASE, “2” for multiple ASE, “3” for single ASE, “4” for complex CSE, and “5” for simple CSE.
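One way to read the five-class scheme is sketched below. It is a simplified interpretation rather than the authors' script: each isoform is assumed to be a list of half-open `(start, end)` tuples in genomic coordinates, "present" is taken to mean any overlap with the exon of interest, and the discontinuous-CSE case is folded into "boundaries changed". The names `classify_exon` and `EXON_TYPE_CODE` are hypothetical; only the integer codes come from the text.

```python
# Integer codes as assigned in the text.
EXON_TYPE_CODE = {
    "complex ASE": 1,
    "multiple ASE": 2,
    "single ASE": 3,
    "complex CSE": 4,
    "simple CSE": 5,
}


def classify_exon(exon, isoforms):
    """Classify one exon against all transcript isoforms of its gene.

    exon     -- (start, end) tuple, half-open
    isoforms -- list of isoforms, each a list of (start, end) tuples
    """
    start, end = exon
    n_present = 0
    boundaries_unchanged = True
    for isoform in isoforms:
        overlaps = [(s, e) for s, e in isoform if s < end and e > start]
        if not overlaps:
            continue  # exon is skipped in this isoform
        n_present += 1
        if overlaps != [(start, end)]:
            # Shifted boundary, or the region is split into several exons.
            boundaries_unchanged = False

    if n_present == 0:
        raise ValueError("exon does not occur in any isoform")
    if n_present == len(isoforms):
        return "simple CSE" if boundaries_unchanged else "complex CSE"
    if n_present == 1:
        return "single ASE"
    return "multiple ASE" if boundaries_unchanged else "complex ASE"


isoforms = [
    [(0, 100), (200, 300), (400, 500)],
    [(0, 100), (400, 500)],
    [(0, 100), (200, 300), (400, 520)],
]
codes = [EXON_TYPE_CODE[classify_exon(e, isoforms)] for e in isoforms[0]]
```

Note that treating the exon type as an ordinal integer, as the text does, implicitly assumes that the five classes lie on a single scale from "most alternative" to "most constitutive".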

We also calculated the weighted exon frequency (WEF) as a quantitative measurement of the relative importance of exons in AS events. The “frequency” of an exon is the proportion of transcript isoforms that include this specific exon. For example, if a gene contains four AS isoforms, and a certain exon is included in three of the four isoforms, the frequency of this exon is 3/4. However, since exons from different isoforms can partially overlap with each other, it is sometimes difficult to clearly define the “frequency” of an exon. Therefore, we use WEF instead, which is the length-weighted frequency of an exon. Supplementary Figure S1B gives examples of WEF calculation. Briefly, an exon was divided into several sub-regions according to how it overlaps with exons from other transcript isoforms. The frequency of each sub-region can be calculated and then weighted by the length of each sub-region to yield WEF.
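The WEF description above can be sketched as follows. The details are assumptions: isoforms are lists of half-open `(start, end)` tuples, sub-regions are delimited by every exon boundary from any isoform that falls inside the exon, and the weighted sum is normalized by the exon length so that the value stays between 0 and 1. The function name is hypothetical, and Supplementary Figure S1B remains the authoritative definition.

```python
def weighted_exon_frequency(exon, isoforms):
    """Length-weighted inclusion frequency of an exon across isoforms.

    exon     -- (start, end) tuple, half-open
    isoforms -- list of isoforms, each a list of (start, end) tuples
    """
    start, end = exon
    if end <= start or not isoforms:
        raise ValueError("need a non-empty exon and at least one isoform")

    # Cut the exon wherever another isoform's exon starts or ends inside it.
    cuts = {start, end}
    for isoform in isoforms:
        for s, e in isoform:
            for position in (s, e):
                if start < position < end:
                    cuts.add(position)
    cuts = sorted(cuts)

    weighted_sum = 0.0
    for a, b in zip(cuts, cuts[1:]):
        # Number of isoforms whose exons fully cover sub-region [a, b).
        n_including = sum(
            any(s <= a and b <= e for s, e in isoform) for isoform in isoforms
        )
        weighted_sum += (b - a) * n_including / len(isoforms)
    return weighted_sum / (end - start)


# Four isoforms: two contain the full exon, one only its 3' half, one skips it.
wef = weighted_exon_frequency(
    (200, 300),
    [[(200, 300)], [(250, 300)], [], [(200, 300)]],
)
```

Here the first 50 bp are included in 2 of 4 isoforms and the last 50 bp in 3 of 4, giving (50 × 0.5 + 50 × 0.75) / 100 = 0.625. When boundaries never change, the value reduces to the plain exon frequency (for example 3/4).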

Measurements of structural-functional features

The structural-functional features analyzed in this study include the proportion of solvent-accessible amino acid residues (PSA), proportion of Pfam domain sites (PD), and proportion of intrinsically disordered regions (PIDR).

Solvent-accessible residues were predicted by using the SSpro/ACCpro 4.1 program⁵² with a 30% exposure threshold. Intrinsically disordered regions were predicted by using DISOPRED V2⁵³ with default parameters. The Pfam domain information⁵⁴ was retrieved from the EnsemblPlants database.

Measurements of compactness features and other features

The exon length, 5′/3′ intron length, and G+C content were calculated using an in-house PERL script with reference to the annotations of EnsemblPlants. Note that the first and the last coding exons, which do not have a 5′ or 3′ intron, respectively, were excluded. Exon duplicability (ED) was defined as the “copy number” of an exon in the A. thaliana transcriptome. We BLASTN-aligned⁵⁵ the exon of interest against the A. thaliana exome using default parameters. A BLAST hit was considered a potential duplicate of the exon of interest if it satisfied the following criteria: (1) the alignable length is ≥90% of the query exon length; (2) the alignable region has ≥90% similarity with the query exon.
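The two filtering criteria can be sketched as a small function. Several details are assumptions: hits are given as `(subject_id, alignment_length, percent_identity)` tuples already parsed from BLASTN output, "similarity" is taken to be BLASTN percent identity, each subject exon is counted once, and the query's hit to itself is not removed. The function name and the toy hits are hypothetical.

```python
def exon_duplicability(query_length, hits, min_coverage=0.9, min_identity=90.0):
    """Count distinct subject exons that pass both duplicate criteria.

    query_length -- length of the query exon in bp
    hits         -- iterable of (subject_id, alignment_length, percent_identity)
    """
    duplicates = {
        subject_id
        for subject_id, alignment_length, percent_identity in hits
        if alignment_length >= min_coverage * query_length  # criterion (1)
        and percent_identity >= min_identity                # criterion (2)
    }
    return len(duplicates)


toy_hits = [
    ("exonA", 200, 100.0),  # the query itself
    ("exonB", 185, 93.5),   # passes both criteria
    ("exonC", 170, 99.0),   # alignment too short
    ("exonD", 190, 88.0),   # identity too low
]
copy_number = exon_duplicability(200, toy_hits)
```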

Statistical analysis

The PCR analyses were conducted by using the program provided by Drummond et al³⁹ in the R environment.⁵⁶ In this study, we calculated the differences in evolutionary rates (d_N, d_S, and d_N/d_S) and the eleven exon features for pairs of exons from the same transcripts (the total number of within-gene exon pairs analyzed here is 28,173). These differences were then analyzed using PCR to delineate the contributions of the exon features to the variances in evolutionary rates.
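The general shape of this analysis can be sketched with NumPy. This is not the R program of Drummond et al; it is a generic principal component regression under stated assumptions: all exon pairs within a transcript are used, the sign of each difference (first exon minus second) is arbitrary, variables are standardized, and a feature's contribution to a component is taken as the component's R² times the squared loading of that feature. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def within_transcript_differences(values, transcript_ids):
    """Row differences for every pair of exons that share a transcript.

    values         -- (n_exons, n_columns) array-like of rates or features
    transcript_ids -- transcript identifier of each exon (same order as rows)
    """
    values = np.asarray(values, dtype=float)
    members_by_transcript = {}
    for index, transcript in enumerate(transcript_ids):
        members_by_transcript.setdefault(transcript, []).append(index)
    diffs = [
        values[i] - values[j]
        for members in members_by_transcript.values()
        for i, j in combinations(members, 2)
    ]
    return np.array(diffs)


def pcr_contributions(X, y):
    """Principal component regression of y on the columns of X.

    Returns (r2, contrib): r2[k] is the fraction of variance in y explained
    by component k; contrib[k, f] is the share of r2[k] attributed to feature f.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)

    # PCA via SVD: rows of Vt are unit-length loading vectors.
    _, _, Vt = np.linalg.svd(Xz, full_matrices=False)
    scores = Xz @ Vt.T

    # Component scores are mutually orthogonal, so each component's R^2
    # is its squared correlation with the response.
    r = np.array([np.corrcoef(scores[:, k], yz)[0, 1] for k in range(scores.shape[1])])
    r2 = r ** 2
    contrib = (Vt ** 2) * r2[:, None]
    return r2, contrib
```

Because the squared loadings of each component sum to one, the per-feature contributions of a component add up to that component's R², and the component R² values add up to the R² of an ordinary multiple regression on all features.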

We also compared exons from different genes by randomly generating 500 datasets, with each dataset containing 28,173 between-gene exon pairs, which is the number of exon pairs in the within-gene analysis. We first randomly selected one exon from the analyzed transcripts, and then selected a second exon that belongs to a gene other than that of the first exon. By doing so we could be sure that the two exons are from different genes. This process was iterated 28,173 times to derive 28,173 between-gene exon pairs for each of the 500 random datasets. A PCR analysis was then conducted for each of these 500 random datasets. The results of the 500 random sampling experiments were then averaged to represent the effects of between-gene differences in exon features on variances in evolutionary rates.
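The random pairing procedure can be sketched as below. The sketch assumes that exons are drawn uniformly and with replacement across iterations, and it enforces the different-gene requirement by redrawing; neither detail is stated in the text. The function name and the toy exon-to-gene mapping are hypothetical.

```python
import random


def sample_between_gene_pairs(exon_to_gene, n_pairs, rng):
    """Draw exon pairs whose two members belong to different genes.

    exon_to_gene -- dict mapping exon identifier to gene identifier
    n_pairs      -- number of pairs to draw (28,173 in the text)
    rng          -- random.Random instance, for reproducibility
    """
    if len(set(exon_to_gene.values())) < 2:
        raise ValueError("need exons from at least two genes")
    exons = list(exon_to_gene)
    pairs = []
    while len(pairs) < n_pairs:
        first = rng.choice(exons)
        second = rng.choice(exons)
        # Redraw whenever both exons come from the same gene.
        if exon_to_gene[first] != exon_to_gene[second]:
            pairs.append((first, second))
    return pairs


toy_mapping = {"e1": "geneA", "e2": "geneA", "e3": "geneB", "e4": "geneC"}
datasets = [
    sample_between_gene_pairs(toy_mapping, 10, random.Random(seed))
    for seed in range(5)  # 500 datasets in the text
]
```

Each dataset would then go through the same PCR analysis as the within-gene pairs, and the per-component results would be averaged over datasets.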

Results and Discussions

The correlations between exon features and exonic evolutionary rates

To confirm whether the eleven analyzed exon features are correlated with exonic evolutionary rates in Arabidopsis, we pooled all of the exons together and calculated simple Pearson’s correlations separately for d_N, d_S, and the d_N/d_S ratio against each individual exon feature. As shown in Table 1, exonic d_N/d_S is significantly correlated with each of the eleven exon features. Similar results are also observed for d_N, except that the correlation between 3′ intron length and d_N is statistically insignificant. By contrast, only five of the eleven features (exon length, expression level, exon duplicability, 3′ intron length, and ASE/CSE exon type) are significantly correlated with d_S. Notably, exon features related to structural-functional constraints (PD, PSA, and PIDR) have the highest correlations with exonic d_N/d_S. For d_N and d_S, unexpectedly, exon length seems to be the most important determinant. Another unexpected observation is that expression level ranks only fifth and fourth, respectively, in terms of coefficient of correlation with d_N/d_S and d_N, although it ranks second in the case of d_S. These results, however, are oversimplified because the interactions between different features are not controlled. Accordingly, we conducted PCR analyses to control for the confounding effects of inter-correlations between the analyzed features.
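The correlation-and-ranking step can be sketched in a few lines. The sketch assumes that features are ranked by the absolute value of Pearson's r, which is consistent with the ranks reported in Table 1; the function names are hypothetical and the significance tests are omitted.

```python
import math


def pearson_r(x, y):
    """Pearson's coefficient of correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_strength(correlations):
    """Rank features by |r|, strongest first (rank 1)."""
    ordered = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Three d_N/d_S coefficients taken from Table 1.
ranks = rank_by_strength({"PD": -0.2338, "PSA": 0.2372, "PIDR": 0.2694})
```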

Table 1.

The Pearson’s coefficient of correlation between each of the eleven exon features and d_N/d_S, d_N, and d_S.

Exon feature	d_N/d_S, r (rank)	d_N, r (rank)	d_S, r (rank)
% Pfam domain	−0.2338 (3)^a,b,****	−0.2292 (5)^****	0.0126 (8)
% solvent-accessible amino acid residues	0.2372 (2)^****	0.2434 (3)^****	0.0055 (10)
% intrinsically disordered regions	0.2694 (1)^****	0.2842 (2)^****	0.0165 (7)
ASE/CSE exon type	−0.0294 (11)^*	−0.0210 (10)^*	0.0171 (5)^*
Weighted exon frequency	−0.0309 (10)^*	−0.0232 (9)^*	0.0169 (6)
5′ intron length	0.0496 (7)^***	0.0475 (7)^**	−0.0068 (9)
3′ intron length	0.0421 (8)^**	0.0096 (11)	−0.0537 (3)^***
Exon length	0.2180 (4)^****	0.3133 (1)^****	0.1560 (1)^****
Exonic expression level	−0.1852 (5)^****	−0.2382 (4)^****	−0.0892 (2)^****
Exon duplicability	−0.0625 (6)^****	−0.0898 (6)^****	−0.0394 (4)^**
G+C content	−0.0349 (9)^**	−0.0356 (8)^**	−0.0016 (11)

Exon features	Comp 1	Comp 2	Comp 3	Comp 4	Comp 5	Comp 6	Comp 7	Comp 8	Comp 9	Comp 10	Comp 11
ASE/CSE exon type	0.070	0.421	0.002	0.001	0.003	0.000	0.001	0.000	0.000	0.000	0.501
Weighted exon frequency (WEF)	0.070	0.418	0.002	0.001	0.007	0.002	0.001	0.000	0.000	0.000	0.498
Average exonic expression level	0.042	0.018	0.454	0.000	0.004	0.006	0.000	0.000	0.462	0.013	0.000
5′ intron length	0.014	0.000	0.017	0.088	0.330	0.161	0.362	0.025	0.002	0.001	0.000
3′ intron length	0.001	0.000	0.003	0.422	0.161	0.001	0.389	0.010	0.012	0.001	0.000
Exon length	0.092	0.028	0.330	0.005	0.001	0.000	0.063	0.002	0.434	0.046	0.000
G+C content	0.005	0.001	0.017	0.428	0.252	0.089	0.133	0.017	0.038	0.020	0.000
Exon duplicability	0.000	0.013	0.001	0.022	0.216	0.708	0.035	0.000	0.002	0.000	0.001
% solvent-accessible amino acid residues	0.208^a	0.031	0.087	0.030	0.024	0.029	0.000	0.270	0.032	0.289	0.000
% intrinsically disordered regions	0.298	0.042	0.053	0.002	0.001	0.002	0.002	0.022	0.012	0.566	0.000
% Pfam domain	0.198	0.027	0.035	0.000	0.001	0.001	0.013	0.655	0.006	0.063	0.000

Exon features	Comp 1	Comp 2	Comp 3	Comp 4	Comp 5	Comp 6	Comp 7	Comp 8	Comp 9	Comp 10	Comp 11
ASE/CSE exon type	0.003	0.485	0.000	0.005	0.001	0.003	0.000	0.000	0.000	0.000	0.500
Weighted exon frequency (WEF)	0.003	0.483	0.001	0.006	0.000	0.003	0.000	0.000	0.000	0.000	0.500
Average exonic expression level	0.094	0.001	0.409	0.004	0.003	0.003	0.003	0.001	0.481	0.002	0.000
5′ intron length	0.018	0.009	0.002	0.301	0.080	0.058	0.525	0.001	0.000	0.000	0.000
3′ intron length	0.008	0.004	0.003	0.385	0.097	0.035	0.458	0.005	0.008	0.000	0.000
Exon length	0.137	0.006	0.259	0.008	0.071	0.026	0.001	0.050	0.422	0.025	0.000
G+C content	0.000	0.000	0.094	0.207	0.267	0.355	0.004	0.014	0.061	0.014	0.000
Exon duplicability	0.010	0.004	0.006	0.014	0.462	0.480	0.005	0.000	0.000	0.000	0.000
% solvent accessible amino acid residues	0.187^a	0.003	0.127	0.062	0.013	0.035	0.001	0.300	0.025	0.248	0.000
% intrinsically disordered regions	0.302	0.003	0.076	0.008	0.005	0.000	0.000	0.006	0.002	0.600	0.000
% Pfam domain	0.237	0.002	0.024	0.000	0.002	0.000	0.002	0.622	0.001	0.109	0.000

Exon features	(Sub)total	Comp 1	Comp 2	Comp 9	Comp 10	Comp 3	Comp 4	Comp 5	Comp 6	Comp 7	Comp 8	Comp 11
Var. explained	8.00	6.47	1.09	0.27	0.11	0.01	0.01	0.01	0.01	0.01	0.01	0.00
ASE/CSE exon type	0.913	0.4536	0.4589	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
Weighted exon frequency (WEF)	0.909	0.4536	0.4556	0.0000	0.0000	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000
Average exonic expression level	0.423	0.2722	0.0196	0.1247	0.0014	0.0045	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000
5′ intron length	0.101	0.0907	0.0000	0.0005	0.0001	0.0002	0.0009	0.0033	0.0016	0.0036	0.0003	0.0000
3′ intron length	0.020	0.0065	0.0000	0.0032	0.0001	0.0000	0.0042	0.0016	0.0000	0.0039	0.0001	0.0000
Exon length	0.753	0.5962	0.0305	0.1172	0.0051	0.0033	0.0001	0.0000	0.0000	0.0006	0.0000	0.0000
G+C content	0.055	0.0324	0.0011	0.0103	0.0022	0.0002	0.0043	0.0025	0.0009	0.0013	0.0002	0.0000
Exon duplicability	0.025	0.0000	0.0142	0.0005	0.0000	0.0000	0.0002	0.0022	0.0071	0.0004	0.0000	0.0000
% solvent-accessible amino acid residues	1.426	1.3478	0.0338	0.0086	0.0318	0.0009	0.0003	0.0002	0.0003	0.0000	0.0027	0.0000
% intrinsically disordered regions	2.043	1.9310	0.0458	0.0032	0.0623	0.0005	0.0000	0.0000	0.0000	0.0000	0.0002	0.0000
% Pfam domain	1.328	1.2830	0.0294	0.0016	0.0069	0.0004	0.0000	0.0000	0.0000	0.0001	0.0066	0.0000
Exon feature category	Subtotal
Structural-functional features	4.798	4.562	0.109	0.014	0.101	0.002	0.000	0.000	0.000	0.000	0.009	0.000
Splicing features	1.822	0.907	0.915	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
Exon length	0.753	0.596	0.031	0.117	0.005	0.003	0.000	0.000	0.000	0.001	0.000	0.000
Expression level	0.423	0.272	0.020	0.125	0.001	0.005	0.000	0.000	0.000	0.000	0.000	0.000
Compactness features	0.121	0.097	0.000	0.004	0.000	0.000	0.005	0.005	0.002	0.008	0.000	0.000
Other features	0.080	0.032	0.015	0.011	0.002	0.000	0.005	0.005	0.008	0.002	0.000	0.000

Exon features	(Sub)total	Comp 1	Comp 2	Comp 9	Comp 3	Comp 10	Comp 7	Comp 6	Comp 4	Comp 5	Comp 11	Comp 8
Var. explained	13.49	9.38	2.33	0.7	0.52	0.24	0.21	0.05	0.02	0.02	0.02	0.00
ASE/CSE exon type	1.650	0.6580	0.9809	0.0000	0.0010	0.0000	0.0002	0.0000	0.0000	0.0001	0.0100	0.0000
Weighted exon frequency (WEF)	1.643	0.6580	0.9739	0.0000	0.0010	0.0000	0.0002	0.0001	0.0000	0.0001	0.0100	0.0000
Average exonic expression level	1.000	0.3948	0.0419	0.3234	0.2361	0.0031	0.0000	0.0003	0.0000	0.0001	0.0000	0.0000
5′ intron length	0.235	0.1316	0.0000	0.0014	0.0088	0.0002	0.0760	0.0081	0.0018	0.0066	0.0000	0.0000
3′ intron length	0.113	0.0094	0.0000	0.0084	0.0016	0.0002	0.0817	0.0001	0.0084	0.0032	0.0000	0.0000
Exon length	1.430	0.8648	0.0652	0.3038	0.1716	0.0110	0.0132	0.0000	0.0001	0.0000	0.0000	0.0000
G+C content	0.136	0.0470	0.0023	0.0266	0.0088	0.0048	0.0279	0.0045	0.0086	0.0050	0.0000	0.0000
Exon duplicability	0.080	0.0000	0.0303	0.0014	0.0005	0.0000	0.0074	0.0354	0.0004	0.0043	0.0000	0.0000
% solvent-accessible amino acid residues	2.167	1.9552	0.0722	0.0224	0.0452	0.0694	0.0000	0.0015	0.0006	0.0005	0.0000	0.0000
% intrinsically disordered regions	3.071	2.8012	0.0979	0.0084	0.0276	0.1358	0.0004	0.0001	0.0000	0.0000	0.0000	0.0000
% Pfam domain	1.964	1.8612	0.0629	0.0042	0.0182	0.0151	0.0027	0.0001	0.0000	0.0000	0.0000	0.0000
Exon feature category	Subtotal
Structural-functional features	7.203	6.618	0.233	0.035	0.091	0.220	0.003	0.002	0.001	0.001	0.000	0.000
Splicing features	3.294	1.316	1.955	0.000	0.002	0.000	0.000	0.000	0.000	0.000	0.020	0.000
Exon length	1.430	0.865	0.065	0.304	0.172	0.011	0.013	0.000	0.000	0.000	0.000	0.000
Expression level	1.000	0.395	0.042	0.323	0.236	0.003	0.000	0.000	0.000	0.000	0.000	0.000
Compactness features	0.348	0.141	0.000	0.010	0.010	0.000	0.158	0.008	0.010	0.010	0.000	0.000
Other features	0.215	0.047	0.033	0.028	0.009	0.005	0.035	0.040	0.009	0.009	0.000	0.000

Exon features	(Sub)total	Comp 3	Comp 2	Comp 1	Comp 7	Comp 6	Comp 9	Comp 4	Comp 10	Comp 11	Comp 5	Comp 8
Var. explained	3.12	1.48	0.44	0.32	0.26	0.23	0.19	0.1	0.07	0.03	0.00	0.00
ASE/CSE exon type	0.226	0.0030	0.1852	0.0224	0.0003	0.0000	0.0000	0.0001	0.0000	0.0150	0.0000	0.0000
Weighted exon frequency (WEF)	0.225	0.0030	0.1839	0.0224	0.0003	0.0005	0.0000	0.0001	0.0000	0.0149	0.0000	0.0000
Average exonic expression level	0.783	0.6719	0.0079	0.0134	0.0000	0.0014	0.0878	0.0000	0.0009	0.0000	0.0000	0.0000
5′ intron length	0.170	0.0252	0.0000	0.0045	0.0941	0.0370	0.0004	0.0088	0.0001	0.0000	0.0000	0.0000
3′ intron length	0.151	0.0044	0.0000	0.0003	0.1011	0.0002	0.0023	0.0422	0.0001	0.0000	0.0000	0.0000
Exon length	0.633	0.4884	0.0123	0.0294	0.0164	0.0000	0.0825	0.0005	0.0032	0.0000	0.0000	0.0000
G+C content	0.134	0.0252	0.0004	0.0016	0.0346	0.0205	0.0072	0.0428	0.0014	0.0000	0.0000	0.0000
Exon duplicability	0.182	0.0015	0.0057	0.0000	0.0091	0.1628	0.0004	0.0022	0.0000	0.0000	0.0000	0.0000
% solvent-accessible amino acid residues	0.245	0.1288	0.0136	0.0666	0.0000	0.0067	0.0061	0.0030	0.0202	0.0000	0.0000	0.0000
% intrinsically disordered regions	0.235	0.0784	0.0185	0.0954	0.0005	0.0005	0.0023	0.0002	0.0396	0.0000	0.0000	0.0000
% Pfam domain	0.136	0.0518	0.0119	0.0634	0.0034	0.0002	0.0011	0.0000	0.0044	0.0000	0.0000	0.0000
Exon feature category	Subtotal
Expression level	0.783	0.672	0.008	0.013	0.000	0.001	0.088	0.000	0.001	0.000	0.000	0.000
Exon length	0.633	0.488	0.012	0.029	0.016	0.000	0.082	0.001	0.003	0.000	0.000	0.000
Structural-functional features	0.617	0.259	0.044	0.225	0.004	0.007	0.010	0.003	0.064	0.000	0.000	0.000
Splicing features	0.451	0.006	0.369	0.045	0.001	0.000	0.000	0.000	0.000	0.030	0.000	0.000
Compactness features	0.321	0.030	0.000	0.005	0.195	0.037	0.003	0.051	0.000	0.000	0.000	0.000
Other features	0.315	0.027	0.006	0.002	0.044	0.183	0.008	0.045	0.001	0.000	0.000	0.000

Exon features	(Sub)total	Comp 1	Comp 3	Comp 4	Comp 7	Comp 8	Comp 5	Comp 2	Comp 10	Comp 6	Comp 11	Comp 9
Var. explained	13.678	13.088	0.237	0.157	0.070	0.039	0.030	0.020	0.017	0.015	0.003	0.002
ASE/CSE exon type	0.046	0.0340	0.0001	0.0008	0.0000	0.0000	0.0000	0.0097	0.0000	0.0000	0.0015	0.0000
Weighted exon frequency (WEF)	0.057	0.0446	0.0002	0.0009	0.0000	0.0000	0.0000	0.0097	0.0000	0.0001	0.0015	0.0000
Average exonic expression level	1.332	1.2329	0.0967	0.0006	0.0002	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0011
5′ intron length	0.324	0.2365	0.0004	0.0471	0.0367	0.0000	0.0024	0.0002	0.0000	0.0008	0.0000	0.0000
3′ intron length	0.201	0.1036	0.0008	0.0604	0.0320	0.0002	0.0029	0.0001	0.0000	0.0005	0.0000	0.0000
Exon length	1.863	1.7945	0.0612	0.0012	0.0001	0.0019	0.0021	0.0001	0.0004	0.0004	0.0000	0.0010
G+C content	0.070	0.0004	0.0222	0.0324	0.0003	0.0006	0.0081	0.0000	0.0002	0.0052	0.0000	0.0001
Exon duplicability	0.160	0.1354	0.0013	0.0023	0.0003	0.0000	0.0141	0.0001	0.0000	0.0070	0.0000	0.0000
% solvent-accessible amino acid residues	2.509	2.4524	0.0302	0.0098	0.0001	0.0116	0.0004	0.0001	0.0042	0.0005	0.0000	0.0001
% intrinsically disordered regions	3.983	3.9533	0.0179	0.0012	0.0000	0.0002	0.0001	0.0001	0.0101	0.0000	0.0000	0.0000
% Pfam domain	3.133	3.1008	0.0057	0.0001	0.0002	0.0242	0.0001	0.0000	0.0018	0.0000	0.0000	0.0000
Exon feature category	Subtotal
Structural-functional features	9.625	9.507	0.054	0.011	0.000	0.036	0.001	0.000	0.016	0.001	0.000	0.000
Exon length	1.863	1.794	0.061	0.001	0.000	0.002	0.002	0.000	0.000	0.000	0.000	0.001
Expression level	1.332	1.233	0.097	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.001
Compactness features	0.593	0.340	0.062	0.109	0.069	0.002	0.008	0.000	0.000	0.002	0.000	0.001
Other features	0.230	0.136	0.024	0.035	0.001	0.001	0.022	0.000	0.000	0.012	0.000	0.000
Splicing features	0.103	0.079	0.000	0.002	0.000	0.000	0.000	0.019	0.000	0.000	0.003	0.000

Exon features	(Sub)total	Comp 1	Comp 3	Comp 4	Comp 8	Comp 9	Comp 2	Comp 6	Comp 10	Comp 5	Comp 7	Comp 11
Var. explained	17.973	15.941	1.269	0.267	0.207	0.121	0.059	0.050	0.042	0.008	0.005	0.003
ASE/CSE exon type	0.077	0.0457	0.0003	0.0014	0.0001	0.0000	0.0286	0.0002	0.0000	0.0000	0.0000	0.0013
Weighted exon frequency (WEF)	0.091	0.0590	0.0010	0.0016	0.0000	0.0000	0.0285	0.0002	0.0000	0.0000	0.0000	0.0013
Average exonic expression level	2.082	1.5039	0.5183	0.0011	0.0002	0.0583	0.0001	0.0001	0.0001	0.0000	0.0000	0.0000
5′ intron length	0.380	0.2908	0.0022	0.0797	0.0002	0.0000	0.0005	0.0033	0.0000	0.0008	0.0025	0.0000
3′ intron length	0.241	0.1272	0.0042	0.1027	0.0010	0.0010	0.0002	0.0015	0.0000	0.0009	0.0022	0.0000
Exon length	2.583	2.1880	0.3275	0.0021	0.0105	0.0511	0.0004	0.0013	0.0011	0.0006	0.0000	0.0000
G+C content	0.206	0.0005	0.1197	0.0553	0.0029	0.0075	0.0000	0.0175	0.0006	0.0022	0.0000	0.0000
Exon duplicability	0.204	0.1647	0.0072	0.0040	0.0000	0.0000	0.0002	0.0242	0.0000	0.0038	0.0000	0.0000
% solvent-accessible amino acid residues	3.239	2.9830	0.1619	0.0167	0.0622	0.0031	0.0003	0.0017	0.0104	0.0001	0.0000	0.0000
% intrinsically disordered regions	4.933	4.8077	0.0961	0.0021	0.0012	0.0002	0.0002	0.0000	0.0252	0.0000	0.0000	0.0000
% Pfam domain	3.935	3.7704	0.0307	0.0001	0.1290	0.0001	0.0001	0.0000	0.0046	0.0000	0.0000	0.0000
Exon feature category	Subtotal
Structural-functional features	12.107	11.561	0.289	0.019	0.192	0.003	0.001	0.002	0.040	0.000	0.000	0.000
Exon length	2.583	2.188	0.328	0.002	0.010	0.051	0.000	0.001	0.001	0.001	0.000	0.000
Expression level	2.082	1.504	0.518	0.001	0.000	0.058	0.000	0.000	0.000	0.000	0.000	0.000
Compactness features	0.621	0.418	0.006	0.182	0.001	0.001	0.001	0.005	0.000	0.002	0.005	0.000
Other features	0.410	0.165	0.127	0.059	0.003	0.007	0.000	0.042	0.001	0.006	0.000	0.000
Splicing features	0.169	0.105	0.001	0.003	0.000	0.000	0.057	0.000	0.000	0.000	0.000	0.003

Exon features	(Sub)total	Comp 3	Comp 1	Comp 9	Comp 6	Comp 8	Comp 5	Comp 7	Comp 2	Comp 4	Comp 10	Comp 11
Var. explained	2.991	1.329	0.411	0.298	0.247	0.239	0.198	0.112	0.073	0.043	0.038	0.004
ASE/CSE exon type	0.040	0.0003	0.0011	0.0000	0.0008	0.0001	0.0002	0.0000	0.0355	0.0002	0.0000	0.0018
Weighted exon frequency (WEF)	0.041	0.0011	0.0014	0.0000	0.0009	0.0000	0.0000	0.0000	0.0354	0.0002	0.0000	0.0018
Average exonic expression level	0.726	0.5426	0.0388	0.1430	0.0006	0.0003	0.0005	0.0003	0.0001	0.0002	0.0001	0.0000
5′ intron length	0.113	0.0023	0.0074	0.0001	0.0153	0.0002	0.0160	0.0584	0.0006	0.0133	0.0000	0.0000
3′ intron length	0.107	0.0043	0.0033	0.0024	0.0082	0.0011	0.0197	0.0513	0.0003	0.0169	0.0000	0.0000
Exon length	0.559	0.3432	0.0564	0.1254	0.0063	0.0121	0.0141	0.0002	0.0004	0.0003	0.0010	0.0000
G+C content	0.296	0.1248	0.0000	0.0183	0.0872	0.0034	0.0529	0.0004	0.0000	0.0087	0.0005	0.0000
Exon duplicability	0.223	0.0074	0.0042	0.0000	0.1189	0.0000	0.0910	0.0006	0.0003	0.0006	0.0000	0.0000
% solvent-accessible amino acid residues	0.349	0.1699	0.0769	0.0075	0.0084	0.0716	0.0024	0.0001	0.0002	0.0027	0.0094	0.0000
% intrinsically disordered regions	0.251	0.1008	0.1240	0.0006	0.0001	0.0014	0.0009	0.0000	0.0001	0.0003	0.0229	0.0000
% Pfam domain	0.283	0.0321	0.0972	0.0002	0.0001	0.1488	0.0005	0.0003	0.0001	0.0000	0.0042	0.0000
Exon feature category	Subtotal
Structural-functional features	0.884	0.303	0.298	0.008	0.009	0.222	0.004	0.000	0.000	0.003	0.037	0.000
Expression level	0.726	0.543	0.039	0.143	0.001	0.000	0.001	0.000	0.000	0.000	0.000	0.000
Exon length	0.559	0.343	0.056	0.125	0.006	0.012	0.014	0.000	0.000	0.000	0.001	0.000
Other features	0.519	0.132	0.004	0.018	0.206	0.003	0.144	0.001	0.000	0.009	0.001	0.000
Compactness features	0.221	0.007	0.011	0.002	0.023	0.001	0.036	0.110	0.001	0.030	0.000	0.000
Splicing features	0.081	0.001	0.002	0.000	0.002	0.000	0.000	0.000	0.071	0.000	0.000	0.004

Principal component	Pearson's r, d_N/d_S	Pearson's r, d_N	Pearson's r, d_S
Component 1	−0.0763^a,***	−0.0450^***	0.0515^***
Component 2	0.0745^***	0.1481^***	0.1011^***
Component 3	0.0504^***	0.0519^***	0.0034
Component 4	0.1210^***	0.2049^***	0.1123^***
Component 5	0.0398^***	0.1307^***	0.1278^***
Component 6	0.0745^***	0.1205^***	0.0596^***
Component 7	−0.1415^***	−0.2097^***	−0.0941^***
Component 8	−0.0391^***	−0.0388^***	0.0060
Component 9	0.0291^***	0.0663^***	0.0499^***
Component 10	−0.1976^***	−0.2229^***	−0.0236^**
Component 11	0.0561^***	0.1260^***	0.0970^***

