Molecular Biology and Evolution. 2015 Oct 24;33(2):585–590. doi: 10.1093/molbev/msv234

Disentangling Sources of Selection on Exonic Transcriptional Enhancers

Rachel M Agoglia 1, Hunter B Fraser 2,*
PMCID: PMC4909131  PMID: 26500252

Abstract

In addition to coding for proteins, exons can also impact transcription by encoding regulatory elements such as enhancers. It has been debated whether such features confer heightened selective constraint, or evolve neutrally. We have addressed this question by developing a new approach to disentangle the sources of selection acting on exonic enhancers, in which we model the evolutionary rates of every possible substitution as a function of their effects on both protein sequence and enhancer activity. In three exonic enhancers, we found no significant association between evolutionary rates and effects on enhancer activity. This suggests that despite having biochemical activity, these exonic enhancers have no detectable selective constraint, and thus are unlikely to play a major role in protein evolution.

Keywords: cis-regulation, enhancer, evolutionary rate, exon


Beyond their essential role in specifying amino acid sequences, exons can play additional roles in the regulation of transcription, translation, splicing, and mRNA stability. For example, transcriptional enhancers located within exons can regulate the expression of their respective genes, or neighboring genes, in a tissue-specific manner (Birnbaum et al. 2012; Ritter et al. 2012).

It is plausible that exonic enhancers may be subject to purifying selection to preserve their function, leading to greater conservation than in exons without regulatory roles (Lin et al. 2011; Suzuki and Saitou 2011). An initial investigation of conservation at exonic enhancers (Stergachis et al. 2013) examined the conservation of 4-fold degenerate bases, both inside and outside exonic transcription factor (TF) binding sites, the building blocks of exonic enhancers. This analysis found that exonic enhancers are more conserved than nonregulatory exons, and the authors concluded that exonic enhancers have a strong influence on the trajectory of protein evolution. Recently, however, a reevaluation of this claim argued that, when a different metric is used to measure conservation, there is no detectable difference in conservation between TF binding and nonbinding sites within exons (Xing and He 2015). Both of these contradictory studies focused on genome-wide trends, and thus may be susceptible to biases caused by confounding factors that were not taken into account. Indeed, the latter study claims that the original finding of conservation was primarily due to higher GC content in exonic TF binding sites. Whether additional confounders exist is still an open question.

We decided to investigate this question from a different perspective, by developing a method for comparing the relative roles of protein sequence and enhancer function in selection acting on exonic enhancers (fig. 1). Building on the method introduced by Smith et al. (2013) for detecting selection on regulatory elements, our approach tests whether the substitutions that have occurred during the evolution of specific exonic enhancers are best explained by selection on protein sequence, enhancer activity, or a combination of both. This is possible when, for each possible single-nucleotide variant (SNV), we have experimental data measuring its effect on enhancer activity. More formally, we fit a linear model of the form:

\[ k = \alpha\,\delta(\text{nonsynonymous}) + \beta U + \gamma D + \varepsilon \tag{1} \]

where:

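\[
F = \left|\log_2\!\left(\frac{\text{Expression of derived allele}}{\text{Expression of ancestral allele}}\right)\right|, \qquad
U = \begin{cases} 0 & \text{if the derived allele lowers expression} \\ F & \text{otherwise} \end{cases}, \qquad
D = \begin{cases} 0 & \text{if the derived allele raises expression} \\ F & \text{otherwise} \end{cases}
\]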

Here, k is the evolutionary rate of each SNV (measured by the rate of that substitution at a given position in the exon); “nonsynonymous” is a binary indicator of whether each SNV changes an amino acid; “U” (upregulation) and “D” (downregulation) are the log fold-changes in enhancer activity caused by each SNV; and ε is an error term. Our goal is to find the coefficients α, β, and γ that best explain the observed evolutionary rates; negative coefficients would reflect negative selection on each respective type of change, whereas positive coefficients would be expected from a predominance of positive selection. This model is attractive because it naturally accounts for substitutions with effects on both protein sequence and enhancer activity, and it can be extended to more complex models (discussed below). Changes in enhancer activity are split into two separate terms to maximize the linearity of the relationships (with two separate terms, we can capture scenarios such as negative selection on both up- and downregulation, as well as positive selection for upregulation coupled with negative selection for downregulation; these scenarios cannot both be captured with just one regulatory variable in a linear model).
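As a minimal sketch of how the predictors in eq. 1 could be assembled for a single SNV, the Python below splits a log2 fold-change into the U and D terms. The function names and the example expression values are hypothetical, and the rule that U is nonzero only for increases and D only for decreases is our reading of the definitions above, not code from the study.

```python
import math

def split_regulatory_effect(expr_derived, expr_ancestral):
    """Return (U, D) for one SNV from reporter expression of the two alleles."""
    signed = math.log2(expr_derived / expr_ancestral)  # signed log2 fold-change
    magnitude = abs(signed)                            # F in the text
    up = magnitude if signed > 0 else 0.0              # assumption: U only for increases
    down = magnitude if signed < 0 else 0.0            # assumption: D only for decreases
    return up, down

def design_row(is_nonsynonymous, expr_derived, expr_ancestral):
    """Predictors (nonsynonymous indicator, U, D) for one SNV in eq. 1."""
    up, down = split_regulatory_effect(expr_derived, expr_ancestral)
    return (1.0 if is_nonsynonymous else 0.0, up, down)

# Hypothetical SNVs: one nonsynonymous change that doubles activity,
# one synonymous change that reduces activity to a quarter.
print(design_row(True, 2.0, 1.0))   # (1.0, 1.0, 0.0)
print(design_row(False, 1.0, 4.0))  # (0.0, 0.0, 2.0)
```

Keeping U and D as separate nonnegative columns is what lets a linear fit assign different coefficients to increases and decreases in activity.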

Fig. 1.

Outline of our approach. For each possible substitution in an exonic enhancer, we determined the evolutionary rate across placental mammals, as inferred from the rate of substitution. We then fit a linear model to determine the relative contribution of two plausible drivers of evolutionary rate: amino acid sequence, where synonymous changes are expected to occur more commonly than nonsynonymous ones; and regulatory impact, wherein positions with high regulatory impact (such as TF binding sites) may dictate a lower evolutionary rate.

Regulatory effects for each possible SNV were measured in saturation mutagenesis experiments of three human exonic enhancers performed by Birnbaum et al. (2014). The three exons (SORL1 exon 17, TRAF3IP2 exon 2, and PPARG exon 6) had strong enhancer activity in both human hepatocytes and mouse liver, though only the human sequences were tested. Birnbaum et al. (2014) subjected >10^4 randomly mutated versions of each exon to a massively parallel reporter assay in the livers of live mice, which allowed them to infer each SNV’s precise effect on enhancer activity. These SNVs were also tested in HeLa cells, resulting in different spectra of regulatory effects; we analyzed the data from both cell types.

To determine the evolutionary rate at each position in these exons, we identified and aligned orthologous exon sequences from a wide range of placental mammals (30–36 for each exon), and used them to reconstruct the most likely ancestral exon sequence at each node of the phylogeny (fig. 2A, supplementary fig. S1, Supplementary Material online). Each substitution that occurred was identified and scored as synonymous or nonsynonymous. The evolutionary rates of the exons varied, with PPARG exon 6 showing the highest and TRAF3IP2 exon 2 showing the lowest overall conservation (fig. 2B). As expected, there was a strong bias toward synonymous substitutions and transitions over transversions.

Fig. 2.

Exonic enhancer evolution. (A) Phylogenetic tree of the mammalian species for which we analyzed the SORL1 exon 17 sequences. Numbers indicate the number of point mutations identified in each branch. (B) Summary of the substitution patterns for each of the three exons analyzed. Percentages for “# Sites Substituted” are relative to the total number of sites in the exon; percentages for the other columns are given relative to the total number of substitutions.

In fitting the linear model (eq. 1), we determined the three coefficients and their associated P values for each of the three exons in both liver and HeLa cells (table 1). For all three exons in both cell types, we found highly significant negative coefficients for the nonsynonymous term (indicating negative selection on protein sequence; all P < 3 × 10^−11), but no significance for the regulatory terms (all P > 0.05). This indicates that all detectable selection is acting solely on the protein sequence; when considering the effects of both coding and regulatory changes, the enhancer activities do not contribute significantly to substitution rates.

Table 1.

Results from Linear Regression Analysis.

| Cell Type | Equation Parameter | SORL1 Exon 17 Coeff. | SORL1 Exon 17 P Value | TRAF3IP2 Exon 2 Coeff. | TRAF3IP2 Exon 2 P Value | PPARG Exon 6 Coeff. | PPARG Exon 6 P Value |
|---|---|---|---|---|---|---|---|
| Liver | Nonsynonymous | −0.0167 | 2.2e-13*** | −0.0052 | 2.4e-11*** | −0.0057 | 1.3e-13*** |
| Liver | Upregulation | −0.0031 | 0.597 | −0.0008 | 0.711 | −0.0026 | 0.232 |
| Liver | Downregulation | −0.0013 | 0.717 | −0.0010 | 0.476 | 0.0001 | 0.934 |
| HeLa | Nonsynonymous | −0.0162 | 1.1e-12*** | −0.0052 | 2.0e-11*** | −0.0056 | 2.5e-13*** |
| HeLa | Upregulation | 0.0275 | 0.051 | 0.0010 | 0.816 | −0.0015 | 0.640 |
| HeLa | Downregulation | −0.0080 | 0.446 | 0.0022 | 0.573 | 0.0005 | 0.791 |

*** indicates P values significant after Bonferroni correction for multiple tests.

We then explored several additional approaches for fitting the linear model. First, to test for synergistic effects of regulatory and nonsynonymous SNVs, we added pairwise interaction terms to the original model (supplementary table S1, Supplementary Material online). Second, we combined the up- and downregulation terms from the original model into a single term for regulatory effect (supplementary table S2, Supplementary Material online). Third, we examined the effect of including only up- or only downregulation, instead of both (supplementary tables S3 and S4, Supplementary Material online), as well as leaving out the nonsynonymous term (supplementary table S5, Supplementary Material online). Fourth, we replaced our binary nonsynonymous variable, which indicates whether a substitution is synonymous or nonsynonymous, with a variable that further discriminates between conservative versus radical nonsynonymous changes (supplementary table S6, Supplementary Material online). Fifth, because these enhancers may not be active in more distantly related mammals, we restricted the analysis to just primates and rodents (supplementary table S7, Supplementary Material online). Finally, we weighted the substitution rates to account for sequence-specific mutation rates (supplementary table S8, Supplementary Material online). In all cases, these modified linear models confirmed our initial results.

Although the enhancer boundaries for these exons are not precisely defined, Birnbaum et al. (2014) did note that two of them had a higher density of SNVs with regulatory effects in part of the exon (the third, PPARG, had enhancer activity throughout its exon 6). Therefore we tested the core-enhancer and outer-enhancer sections of these two exons separately in our linear model (supplementary tables S9 and S10, Supplementary Material online). Additionally, to determine if regulatory selection was detectable in the absence of protein-coding constraints, we applied our model to introns flanking the SORL1 and PPARG exons (there was little coverage of the flanking introns in the TRAF3IP2 data set), omitting the “nonsynonymous” term from the regression (supplementary table S11, Supplementary Material online). We again found no signals of selection on regulatory activity in these additional regions, further suggesting that these enhancers have no detectable selective constraint.

As a complementary approach to our linear model, we also calculated the correlation between substitution rate and regulatory impact when controlling for nonsynonymous effect, via partial correlation. In agreement with our regression results, we found no significant associations (supplementary tables S12–S14, Supplementary Material online).
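For readers unfamiliar with partial correlation, the following is a minimal Python sketch of the standard first-order formula, which removes the linear effect of one covariate (here, the nonsynonymous indicator) from a Pearson correlation. The study computed these values in R; the function names and the idea of passing plain lists are our own illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy inputs: substitution rates, regulatory impact, and a
# 0/1 nonsynonymous indicator for five SNVs (values are made up).
rates = [0.030, 0.000, 0.025, 0.000, 0.010]
impact = [0.2, 1.1, 0.4, 0.9, 0.1]
nonsyn = [0, 1, 0, 1, 1]
print(partial_correlation(rates, impact, nonsyn))
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the ordinary Pearson correlation, which is a quick sanity check on any implementation.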

These results suggest that there is little selection on the regulatory functions of these exonic enhancers. This is in stark contrast to the three noncoding enhancers studied by Smith et al. (2013), where strong selection was detected on each enhancer. The saturation mutagenesis data used in each study are quite similar (Patwardhan et al. 2012; Birnbaum et al. 2014), having been produced by the same group using the same in vivo reporter assay. Therefore the lack of detectable selection on the activities of all three exonic enhancers is unlikely to be explained by an inability of this type of data to reveal selection. To further investigate the power of our analysis, we fit our model to simulated evolutionary rates in which the regulatory effects are under varying strengths of selection (supplementary tables S15–S21, Supplementary Material online, see Materials and Methods), which confirmed that our approach has the ability to detect selection on the activity levels of these enhancers.

If selection is not acting on these enhancer activities, then a reasonable question is why these exons act as such strong enhancers; did they acquire this activity simply by neutral drift? While such activity may be unlikely to evolve within any given exon, it is important to note that these three exonic enhancers were selected as being among the strongest such elements from a genome-wide screen (Birnbaum et al. 2014). It is not hard to imagine that across the entire human genome, a handful of exons might have evolved enhancer activity simply by chance. Additionally, as noted by Xing and He (2015), the relatively short binding motifs of TFs coupled with open chromatin during transcription make it plausible that enhancers could evolve within an exon without having been under selection to do so.

Our framework for disentangling sources of selection has a number of limitations. First, it is limited to sequences for which we have saturation mutagenesis data. The number of such studies is small, though rapidly growing (Patwardhan et al. 2009, 2012; Kinney et al. 2010; Kwasnieski et al. 2012; Melnikov et al. 2012; Kheradpour et al. 2013; Birnbaum et al. 2014; Metzger et al. 2015). Second, our approach measures selection across an entire phylogeny; if selection was acting only in a small subset (e.g., one branch) of mammals, we would not detect it, since the signal from any one branch would be diluted by the others. Third, our approach does not account for epistatic interactions, trans-acting differences between species, artifacts caused by the plasmid context, or differential effects of SNVs across environments or tissues (see Smith et al. 2013 for further discussion). It is likely that these limitations will be addressed by future saturation mutagenesis studies that examine SNV combinations (cf. Patwardhan et al. 2012) and additional cell types/environments. In addition, the recent invention of “saturation editing” of the genome now allows SNVs to be studied in their natural chromosomal context (Findlay et al. 2014).

As our analysis only covers three exonic enhancers, we cannot extrapolate these findings to the rest of the genome. However, it is nonetheless striking that even among three of the strongest human exonic enhancers, where one might expect to have the greatest chance of finding selection to maintain their function, the only detectable selection is to conserve amino acid sequence. Therefore, we interpret these results as consistent with the conclusion of Xing and He (2015), that exonic enhancers likely do not play a major role in protein evolution.

Our framework for disentangling sources of selection is quite flexible, and can be applied to any instance of a sequence encoding distinct functions—such as exonic splice enhancers and silencers, exonic miRNA binding sites, overlapping TF binding sites, etc. As saturation mutagenesis studies become more commonplace, our approach may help distinguish between sequences that only appear to be “functional” because they have some biochemical activity, as opposed to those that are truly important for organismal function and fitness (Graur et al. 2013).

Materials and Methods

We used BLAST (Altschul et al. 1990) to identify orthologs of each exon among placental mammals, using the human sequence as the query. We aligned the sequences with ClustalW2 (McWilliam et al. 2013). The phylogenies were extracted from the best dates mammalian supertree generated by Bininda-Emonds et al. (2007). We excluded any sequences from mammals not found in this phylogeny, except for those that had a close relative (in the same genus) represented in the phylogeny. We then reconstructed the ancestral sequences at each node of the tree using FastML (Ashkenazy et al. 2012) with the default settings for nucleotide reconstruction. Due to a lack of sensitivity to indels, a small number of manual adjustments were made to the resulting multiple sequence alignment for TRAF3IP2 exon 2 (three internal node sequences had 6-bp deletions that were incorrectly filled by FastML to match the sequences of their parent nodes). When reconstructing the ancestral intron sequences, we adjusted the FastML settings to call indels when there was at least 90% probability of a given position being an indel.

Within the phylogeny, we located the most likely branch for each substitution by comparing the sequences at each node. Because the saturation mutagenesis data only covered SNVs, the few cases of insertions and deletions were ignored in this analysis. For each possible substitution, we counted how many times it occurred in the phylogeny (most occurred zero times; very few occurred more than once). Because not all possible substitutions had an equal opportunity to occur, we divided these counts by the frequency of the starting base at that position (an “A” if the change is A → T, etc.) across all sequences in the phylogeny; for example, a potential substitution whose starting base is present at a particular position in 40 of the sequences in the phylogeny would be divided by 40 to yield a weighted substitution rate. Any potential substitutions whose starting base never appeared in any sequences were excluded from the analysis, because they would never have had the chance to occur.
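The weighting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, sequences are assumed to be equal-length aligned strings, and substitutions whose starting base never appears are returned as None so they can be dropped.

```python
def start_base_count(sequences, position, base):
    """Number of aligned sequences carrying `base` at `position` (0-based)."""
    return sum(1 for seq in sequences if seq[position] == base)

def weighted_substitution_rate(times_observed, start_base_occurrences):
    """Substitution count divided by how often its starting base was available.

    Returns None when the starting base never appears, mirroring the
    exclusion of substitutions that never had a chance to occur.
    """
    if start_base_occurrences == 0:
        return None
    return times_observed / start_base_occurrences

# Example from the text: a substitution observed once whose starting base
# is present at that position in 40 sequences.
print(weighted_substitution_rate(1, 40))  # 0.025
```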

We then scored each substitution that occurred as synonymous or nonsynonymous. For potential substitutions that never occurred, we scored them based on the human exon sequence. In some rare instances, the same substitution caused a synonymous change in one branch of the phylogeny, and a nonsynonymous change in a different branch. We scored such cases as whichever type they occurred as most often. Instances that were tied for being both synonymous and nonsynonymous (1, 2, and 0 substitutions for SORL1 exon 17, TRAF3IP2 exon 2, and PPARG exon 6, respectively) were excluded from the analysis.
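A minimal Python sketch of this scoring logic follows. It assumes the standard genetic code and codon-aligned input; the function names are hypothetical, and the majority rule with ties returned as None is our reading of the procedure described above.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snv(codon, offset, new_base):
    """Score a single-base change at `offset` (0-2) within `codon`."""
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "nonsynonymous"

def consensus_class(calls):
    """Majority class across branches; None for a tie (excluded from analysis)."""
    counts = Counter(calls)
    syn, non = counts["synonymous"], counts["nonsynonymous"]
    if syn == non:
        return None
    return "synonymous" if syn > non else "nonsynonymous"

print(classify_snv("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu
print(classify_snv("GAA", 1, "T"))  # GAA -> GTA, Glu -> Val
```

The same substitution can fall in different codon contexts on different branches, which is why the per-branch calls are collected first and reconciled afterwards.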

We obtained the saturation mutagenesis data from Birnbaum et al. (2014) Table S5 online. These data include regulatory effect estimates for the three possible SNVs at each position, starting from the human sequence. For substitutions that did not involve the human reference base, which were not measured by Birnbaum et al. (2014), we calculated the effect size by adding the effect sizes (in log-space) of the two human substitutions that would yield the substitution in question. For example, for a T → C substitution at a position that is an A in human, we add the effects of T → A and A → C to get the effect size for this substitution. Thus if the T → A was a 2-fold increase, and the A → C was a 1.5-fold increase, then T → C would be inferred to be a 3-fold increase. Regulatory effect sizes were divided into separate variables for up- and downregulation (upregulating substitutions were given a value of zero for downregulation, and vice versa; see eq. 1). For the severity scores in our supplementary table S6, Supplementary Material online, we grouped all amino acids based on their size, polarity, and charge (Zhang 2000); transitions between these groups were scored as “radical” (given a score of 2) whereas transitions within these groups were scored as “conservative” (given a score of 1) when considering nonsynonymous substitutions (where synonymous substitutions are given a score of 0). Instances of nonsynonymous changes that were radical in one branch of the phylogeny and conservative in another were handled in the same way as those tied for being synonymous and nonsynonymous, described above. For the mutation rate-corrected substitution rates in our supplementary table S8, Supplementary Material online, rates for transitions were divided by 2 and rates for CpG → TpG substitutions were divided by 10, to account for the greater frequency of these mutations.
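The log-space addition of effect sizes can be checked with a short Python sketch that reproduces the worked example from the text (a 2-fold step followed by a 1.5-fold step). The function name is hypothetical, and the two step effects are taken as given inputs; how each step is obtained from the measured data is not shown here.

```python
import math

def compose_log2_effects(log2_first_step, log2_second_step):
    """Effect of a two-step substitution as the sum of its log2 step effects."""
    return log2_first_step + log2_second_step

# Worked example from the text: T -> A is a 2-fold increase and
# A -> C is a 1.5-fold increase at a position that is A in human.
t_to_a = math.log2(2.0)
a_to_c = math.log2(1.5)
t_to_c = compose_log2_effects(t_to_a, a_to_c)
print(round(2 ** t_to_c, 6))  # fold-change implied for T -> C
```

Adding in log-space is equivalent to multiplying fold-changes, which is why 2-fold and 1.5-fold combine to 3-fold rather than 3.5-fold.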
All linear regression analyses were done using the lm() function in the R statistical package (allowing for a flexible intercept), and reported P values are from the t-test.
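The regressions in this study were run with lm() in R. As a rough Python illustration of the same kind of fit (ordinary least squares with an intercept), the sketch below solves the normal equations directly. The function names and the toy data are hypothetical, the coefficients used to generate the toy rates are made up, and the sketch returns only point estimates, not the t-test P values reported in table 1.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, rates):
    """Least-squares fit of rate ~ intercept + predictors (normal equations)."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * k for d, k in zip(design, rates)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Toy predictors (nonsynonymous, U, D) and rates generated from made-up
# coefficients: intercept 0.02, alpha -0.015, beta 0, gamma 0.
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 2)]
rates = [0.02 - 0.015 * ns for ns, _, _ in rows]
intercept, alpha, beta, gamma = fit_ols(rows, rates)
print(round(intercept, 6), round(alpha, 6), round(beta, 6), round(gamma, 6))
```

Solving the normal equations is adequate for a handful of well-conditioned predictors; a QR-based solver would be the more robust choice for larger or nearly collinear designs.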

The core enhancer regions for SORL1 and TRAF3IP2 are at positions 259–320 and 333–409, respectively, in the Birnbaum et al. (2014) mutagenesis data. The flanking intron regions for SORL1 and PPARG are at positions 325–555 and 1–260, respectively. Correlations and partial correlations for our supplementary tables S12–S14, Supplementary Material online are Pearson correlations calculated in R, with P values from the t-test. For the first four simulations (supplementary tables S15–S18, Supplementary Material online), substitutions with magnitude of regulatory impact greater than the indicated cutoff were assigned a substitution rate of zero, to mimic the effects of strong selection against mutations exceeding some minimum regulatory impact. Similarly, for our supplementary tables S19 and S20, Supplementary Material online, we set substitutions with regulatory impact above the listed cutoff to be equal to the maximum of the substitution rates for the given data set. For our simulation in supplementary table S21, Supplementary Material online, we allowed the strength of selection to vary continuously with effect size, by setting the substitution rate to ln(1/|log2(fold-change)|).
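The three simulation schemes can be sketched in Python as simple transformations of the observed rates. This is an illustration only: the function names are hypothetical, the cutoff is applied to the magnitude of the log2 fold-change in both cutoff schemes (an assumption for the second scheme), and SNVs with an effect of exactly zero are returned as None in the continuous scheme because the logarithm is undefined there and the text does not say how they were handled.

```python
import math

def simulate_zero_above_cutoff(rates, log2_effects, cutoff):
    """Strong negative selection: rate set to zero for large regulatory effects."""
    return [0.0 if abs(e) > cutoff else r for r, e in zip(rates, log2_effects)]

def simulate_max_above_cutoff(rates, log2_effects, cutoff):
    """Large-effect substitutions set to the maximum rate in the data set."""
    top = max(rates)
    return [top if abs(e) > cutoff else r for r, e in zip(rates, log2_effects)]

def simulate_continuous(log2_effects):
    """Rate = ln(1 / |log2 fold-change|); None where the effect is exactly zero."""
    return [math.log(1.0 / abs(e)) if e != 0 else None for e in log2_effects]

# Hypothetical toy data: three SNVs with made-up rates and effects.
rates = [0.1, 0.2, 0.3]
effects = [0.5, -2.0, 1.0]
print(simulate_zero_above_cutoff(rates, effects, cutoff=1.0))  # [0.1, 0.0, 0.3]
print(simulate_max_above_cutoff(rates, effects, cutoff=1.0))   # [0.1, 0.3, 0.3]
print(simulate_continuous(effects))
```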

Supplementary Material

Supplementary tables S1–S21 and figure S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank J. Smith and the Fraser Lab for helpful discussions and comments on the manuscript. This study was supported by NIH grants 5T32GM007790-36 (supporting R.M.A.) and 1R01GM097171-01A1.

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.
  2. Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, Pupko T. 2012. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res. 40 (Web Server issue): W580–W584.
  3. Bininda-Emonds ORB, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A. 2007. The delayed rise of present-day mammals. Nature 446:507–512.
  4. Birnbaum RY, Clowney EJ, Agamy O, Kim MJ, Zhao J, Yamanaka T, Pappalardo Z, Clarke SL, Wenger AM, Nguyen L, et al. 2012. Coding exons function as tissue-specific enhancers of nearby genes. Genome Res. 22:1059–1068.
  5. Birnbaum RY, Patwardhan RP, Kim MJ, Findlay GM, Martin B, Zhao J, Bell RJ, Smith RP, Ku AA, Shendure J, Ahituv N. 2014. Systematic dissection of coding exons at single nucleotide resolution supports an additional role in cell-specific transcriptional regulation. PLoS Genet. 10:e1004592.
  6. Findlay GM, Boyle EA, Hause RJ, Klein JC, Shendure J. 2014. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513:120–123.
  7. Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E. 2013. On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol. 5:578–590.
  8. Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M. 2013. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23:800–811.
  9. Kinney JB, Murugan A, Callan CG Jr, Cox EC. 2010. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci U S A. 107:9158–9163.
  10. Kwasnieski JC, Mogno I, Myers CA, Corbo JC, Cohen BA. 2012. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc Natl Acad Sci U S A. 109:19498–19503.
  11. Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. 2011. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 21:1916–1928.
  12. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. 2013. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 41 (Web Server issue): W597–W600.
  13. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG Jr, Kinney JB, et al. 2012. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol. 30:271–277.
  14. Metzger BP, Yuan DC, Gruber JD, Duveau F, Wittkopp PJ. 2015. Selection on noise constrains variation in a eukaryotic promoter. Nature 521:344–347.
  15. Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, et al. 2012. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 30:265–270.
  16. Patwardhan RP, Lee C, Litvin O, Young DL, Pe’er D, Shendure J. 2009. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat Biotechnol. 27:1173–1175.
  17. Ritter DI, Dong Z, Guo S, Chuang JH. 2012. Transcriptional enhancers in protein-coding exons of vertebrate developmental genes. PLoS One 7:e35202.
  18. Smith JD, McManus K, Fraser HB. 2013. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol Biol Evol. 30(11):2509–2518.
  19. Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM, Stamatoyannopoulos JA. 2013. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342:1367–1372.
  20. Suzuki R, Saitou N. 2011. Exploration for functional nucleotide sequence candidates within coding regions of mammalian genes. DNA Res. 18:177–187.
  21. Xing K, He X. 2015. Reassessing the “duon” hypothesis of protein evolution. Mol Biol Evol. 32(4):1056–1062.
  22. Zhang J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol. 50:56–68.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press