Molecular Biology and Evolution. 2017 Feb 25;34(6):1352–1362. doi: 10.1093/molbev/msx068

Structure of the Transcriptional Regulatory Network Correlates with Regulatory Divergence in Drosophila

Bing Yang 1, Patricia J Wittkopp 1,2,*
PMCID: PMC5435113  PMID: 28333240

Abstract

Transcriptional control of gene expression is regulated by biochemical interactions between cis-regulatory DNA sequences and trans-acting factors that form complex regulatory networks. Genetic changes affecting both cis- and trans-acting sequences in these networks have been shown to alter patterns of gene expression as well as higher-order organismal phenotypes. Here, we investigate how the structure of these regulatory networks relates to patterns of polymorphism and divergence in gene expression. To do this, we compared a transcriptional regulatory network inferred for Drosophila melanogaster to differences in gene regulation observed between two strains of D. melanogaster as well as between two pairs of closely related species: Drosophila sechellia and Drosophila simulans, and D. simulans and D. melanogaster. We found that the number of transcription factors predicted to directly regulate a gene (“in-degree”) was negatively correlated with divergence in both gene expression (mRNA abundance) and cis-regulation. This observation suggests that the number of transcription factors directly regulating a gene’s expression affects the conservation of cis-regulation and gene expression over evolutionary time. We also tested the hypothesis that transcription factors regulating more target genes (higher “out-degree”) are less likely to evolve changes in their cis-regulation and expression (presumably due to increased pleiotropy), but found little support for this predicted relationship. Taken together, these data show how the architecture of regulatory networks can influence regulatory evolution.

Keywords: regulatory network, evolution, cis-regulation, transcription factor, in-degree, out-degree, gene expression

Introduction

Genetic changes that alter gene expression contribute to phenotypic evolution; thus, understanding how gene expression is regulated and changes over evolutionary time is important for understanding how phenotypes evolve (Wray 2007; Carroll 2008; Stern and Orgogozo 2009; Wittkopp and Kalay 2011). The first step in gene expression is transcription, which is controlled by interactions between trans-acting transcription factors and cis-acting DNA sequences. Transcriptional regulatory networks summarize the connections between transcription factors and the genes that they regulate, known as their “target genes” (Zhu et al. 2007). Because evolutionary changes arise within the context of these regulatory networks, the architecture of a regulatory network might make some types of changes more likely to evolve than others. Indeed, the connectivity of genes in a transcriptional regulatory network (i.e., the number of genes a gene regulates or is regulated by) has been found to correlate with evolutionary properties such as the rate of coding sequence evolution and gene duplication (e.g., Evangelisti and Wagner 2004; Jovelin and Phillips 2009).

Connectivity within a transcriptional regulatory network might also influence the evolution of transcriptional regulation itself. For example, the number of transcription factors regulating expression of a gene, a quantity known as “in-degree,” has been shown to positively correlate with plasticity in gene expression among environments (Promislow 2005) as well as mutational variance (Landry et al. 2007). This latter study, which examined the effects of new mutations arising in the near absence of selection on mRNA abundance, found that genes whose cis-regulatory elements had binding sites for more transcription factors were more likely to have their expression altered by new mutations, presumably because such genes had a larger mutational target size. It has also been suggested, however, that new mutations are less likely to alter expression of genes with many transcriptional regulators (higher in-degree) than genes with fewer transcriptional regulators (lower in-degree) because of robustness conferred by transcription factor binding sites with redundant or overlapping functions (Macneil and Walhout 2011). Natural selection is also expected to enforce greater constraints on expression of genes with higher in-degree because they tend to be key players in developmental pathways and changes in their expression tend to have large phenotypic consequences (Borneman et al. 2006; Batada and Hurst 2007). Depending on the interplay of mutational target size, robustness conferred by multiple regulators, and selective constraints on gene expression, in-degree might be either positively or negatively correlated with gene expression divergence.

The number of genes regulated by a transcription factor, a quantity known as “out-degree,” has also been predicted to influence the evolution of gene expression (McGuigan et al. 2014). Specifically, it has been proposed that mutations that alter expression of a transcription factor with many target genes should be more deleterious than mutations that alter expression of transcription factors with fewer target genes because the former have a greater potential to affect many phenotypes at once, increasing the probability that the mutation has deleterious effects on fitness (Cooper et al. 2007). Consistent with this idea, studies examining the effects of individual gene deletions in the baker’s yeast Saccharomyces cerevisiae have shown a significant negative correlation between the number of genes that change expression upon knockout of a gene and the relative fitness of that gene’s deletion (Hughes et al. 2000; Featherstone and Broadie 2002). Simulations of regulatory evolution also show evidence of a negative correlation between out-degree and effects of gene deletions on fitness but suggest that this correlation is quite weak (Siegal et al. 2007). Taken together, these data suggest that if a relationship is present between the out-degree of transcription factors and the evolution of gene expression, it should be negative, with transcription factors regulating more target genes showing less expression divergence among species than transcription factors regulating fewer target genes.

Here, we test these hypotheses about relationships between in-degree or out-degree and the evolution of gene expression by comparing a transcriptional regulatory network inferred for Drosophila melanogaster (Marbach et al. 2012) to expression differences observed within and between closely related Drosophila species (Coolon et al. 2014). Correlations between network topology and regulatory evolution are observed that suggest the architecture of existing transcriptional regulatory networks influences paths of future evolutionary change.

Results

Assessing Reliability of the D. melanogaster Transcriptional Regulatory Network

To examine the evolution of gene expression in the context of a transcriptional regulatory network, we used the “supervised” network that Marbach et al. (2012) constructed from datasets describing conservation of transcription factor binding motifs, physical binding of transcription factors, chromatin marks, patterns of gene expression, and experimentally confirmed regulatory interactions curated in REDfly (Halfon et al. 2008). Statistically significant differences in expression within and between closely related Drosophila species were taken from Coolon et al. (2014), in which RNA-seq data were used to compare transcript abundance between African and North American strains of D. melanogaster (mel-mel), Drosophila simulans and Drosophila sechellia (sim-sec), and D. melanogaster and D. simulans (mel-sim). Differences in cis-regulatory activity between each pair of strains or species reported in Coolon et al. (2014) were also used to test for relationships between in-degree or out-degree and cis-regulatory evolution, as cis-regulatory activity might provide a more direct read-out of the relationship between transcription factors and their target genes. We restricted our analysis to the 4,577 of 12,286 genes in the Marbach et al. (2012) regulatory network for which both expression differences and relative cis-regulatory activity were analyzed in all three comparisons (Coolon et al. 2014). Of these, 227 were transcription factors that appeared as regulators in the network and 4,576 were target genes in the network; one transcription factor did not appear as a target gene in the network. The Coolon et al. (2014) and Marbach et al. (2012) datasets are described in more detail in the Materials and Methods section, and supplementary figure S1, Supplementary Material online, explains how these datasets were merged. A comparison of in-degree and out-degree for genes in the Marbach et al. (2012) network that were included and excluded in our study is shown in supplementary figure S2, Supplementary Material online.
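
To make the two network properties concrete, the following minimal sketch counts in-degree and out-degree from a list of (regulator, target) edges and then keeps only genes for which expression data are available. The gene names, the toy edge list, and the `degrees` helper are hypothetical illustrations, and counting degrees on the full network before filtering is an assumption rather than a description of the processing actually applied to the Marbach et al. (2012) network.

```python
from collections import defaultdict


def degrees(edges):
    """Return (in_degree, out_degree) dicts from (regulator, target) pairs."""
    regulators_of = defaultdict(set)  # target gene -> set of TFs regulating it
    targets_of = defaultdict(set)     # TF -> set of genes it regulates
    for tf, target in edges:
        # Sets make repeated edges count only once.
        regulators_of[target].add(tf)
        targets_of[tf].add(target)
    in_degree = {gene: len(tfs) for gene, tfs in regulators_of.items()}
    out_degree = {tf: len(genes) for tf, genes in targets_of.items()}
    return in_degree, out_degree


# Toy edge list (hypothetical names); the last edge duplicates the first.
edges = [("tfA", "g1"), ("tfB", "g1"), ("tfA", "g2"),
         ("tfA", "tfB"), ("tfB", "g3"), ("tfA", "g1")]

# Hypothetical set of genes measured in every comparison.
genes_with_data = {"g1", "g2", "tfA", "tfB"}

in_degree, out_degree = degrees(edges)
# Assumption: degrees are counted on the full network, then filtered.
in_degree = {g: d for g, d in in_degree.items() if g in genes_with_data}
out_degree = {g: d for g, d in out_degree.items() if g in genes_with_data}
```

In this toy network, `tfA` regulates other genes but is not itself a target, so it has an out-degree but no in-degree, analogous to the one transcription factor described above that did not appear as a target gene.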

Because the transcriptional regulatory network we used was derived from data collected from D. melanogaster, we first considered whether or not this network provided a reasonable approximation of transcriptional regulatory networks in D. sechellia and D. simulans. These two species last shared a common ancestor with D. melanogaster ∼2.5 million years ago (Cutter 2008), yet both can still form viable F1 hybrids with D. melanogaster, suggesting that their transcriptional regulatory networks remain largely compatible. The strong conservation of transcription factor binding sites between D. melanogaster and D. yakuba (Bradley et al. 2010), species which diverged twice as long ago as D. melanogaster, D. simulans and D. sechellia (Cutter 2008), further suggests that network topology should be largely conserved among the species examined.

If a transcriptional network reliably represents regulatory relationships, we expect that transcription factors in this network with altered expression should tend to have more target genes with altered expression than transcription factors with conserved expression. Indeed, for all three comparisons, we found that transcription factors with expression differences between the strains or species compared had a greater proportion of target genes with statistically significant expression differences than transcription factors without expression differences (fig. 1A–C). We also expect the converse to be true: target genes with expression differences should be more likely to have regulators (transcription factors) with expression differences between the strains or species being compared than target genes with conserved expression. Again, the data analyzed were consistent with this expectation: the proportion of transcription factors with significant differences in expression between the strains or species compared was larger for target genes that showed significant differences in expression than for target genes that did not (fig. 1D–F). An assessment of the sensitivity of this metric to errors in the network structure is presented in supplementary figure S3, Supplementary Material online.
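
The first of these reliability checks can be sketched as follows: for each transcription factor, compute the proportion of its target genes that are differentially expressed, then compare those proportions between transcription factors that are and are not themselves differentially expressed. All gene names and the `target_de_proportions` helper are hypothetical, and comparing medians here is only a stand-in for the Wilcoxon rank sum tests reported in the text.

```python
from statistics import median


def target_de_proportions(targets_of, de_genes):
    """For each TF, the fraction of its targets that are differentially expressed."""
    return {
        tf: sum(t in de_genes for t in targets) / len(targets)
        for tf, targets in targets_of.items()
        if targets  # skip TFs with no targets to avoid dividing by zero
    }


# Hypothetical network: TF -> list of target genes.
targets_of = {
    "tfA": ["g1", "g2", "g3", "g4"],
    "tfB": ["g1", "g2"],
    "tfC": ["g3", "g4", "g5"],
    "tfD": ["g5", "g6"],
}
# Hypothetical set of genes with a significant expression difference.
de_genes = {"tfA", "tfB", "g1", "g2", "g3"}

props = target_de_proportions(targets_of, de_genes)
with_de = [p for tf, p in props.items() if tf in de_genes]
without_de = [p for tf, p in props.items() if tf not in de_genes]

print("median proportion, TFs with expression difference:", median(with_de))
print("median proportion, TFs without expression difference:", median(without_de))
```

The converse check (proportion of differentially expressed regulators per target gene) follows the same pattern with the mapping inverted.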

Fig. 1.

Assessing reliability of the regulatory network. (A–C) For each transcription factor (N = 227), we calculated the proportion of its target genes that showed significant expression differences between the strains or species compared. The boxplots show the distributions of these proportions for transcription factors with (dark gray) and without (light gray) significant expression differences between the two strains of D. melanogaster (A), D. simulans and D. sechellia (B), and D. melanogaster and D. simulans (C). P-values shown are from non-parametric Wilcoxon rank sum tests, and N indicates the number of transcription factors included in each category. (D–F) For each target gene (N = 4,576), we calculated the proportion of regulators (transcription factors) that showed significant expression differences between the strains or species being compared. The boxplots show the distributions of these proportions for target genes with (dark gray) and without (light gray) significant expression differences between the two strains of D. melanogaster (D), D. simulans and D. sechellia (E), and D. melanogaster and D. simulans (F). P-values shown are from non-parametric Wilcoxon rank sum tests, and N indicates the number of target genes included in each category.

In-Degree Correlates with Differences in Gene Expression within and between Species

As described in the Introduction, the number of transcription factors directly controlling a gene’s expression (in-degree) has been predicted to correlate positively or negatively with gene expression divergence depending on the factor assumed to be primarily responsible for the correlation. To empirically determine the relationship between in-degree and expression divergence, we compared the in-degree distributions between genes with (Nmel-mel = 1,372, Nsim-sec = 1,281, Nmel-sim = 1,480) and without (Nmel-mel = 3,204, Nsim-sec = 3,295, Nmel-sim = 3,096) statistically significant expression differences between the strains and species examined (fig. 2). We found that the medians of the in-degree distributions for the two groups were significantly different for all three comparisons (Wilcoxon rank sum test, Pmel-mel = 2 × 10⁻¹⁴, Psim-sec = 2 × 10⁻¹⁰, Pmel-sim = 1 × 10⁻¹²), with differentially expressed genes having a lower median in-degree than genes that were not differentially expressed (fig. 2A–C).
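
For readers unfamiliar with the test used here, the sketch below implements a two-sided Wilcoxon rank sum test using the normal approximation with a tie correction. The in-degree values are invented for illustration, the function names are hypothetical, and the choice of approximation (no continuity correction, no exact small-sample distribution) is an assumption; it is not necessarily the implementation used to produce the P-values above.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    # Two-sided tail probability of the standard normal distribution.
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented in-degrees for genes with and without an expression difference.
in_degree_de = [2, 3, 3, 5, 6, 8]
in_degree_not_de = [7, 9, 10, 12, 12, 15, 18]

u, p = rank_sum_test(in_degree_de, in_degree_not_de)
print(f"U = {u}, two-sided p = {p:.4f}")
```

A small U for the first group relative to its expected value (n1 × n2 / 2) indicates that its values tend to rank lower, which corresponds to the lower in-degree of differentially expressed genes described above.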

Fig. 2.

Relationship between network in-degree and differences in gene expression within species and between species. (A–C) Boxplots show the in-degree distributions for genes with (dark gray) and without (light gray) significant differences in gene expression in the mel-mel (A), sim-sec (B), and mel-sim (C) comparisons. P-values are from non-parametric Wilcoxon rank sum tests, and N indicates the number of genes in each group. (D–F) Absolute magnitude of gene expression differences (Y-axis) is plotted against in-degree (X-axis) in the mel-mel (D), sim-sec (E), and mel-sim (F) comparisons. A LOESS line fitted to these data is shown in dark gray. Spearman’s rank correlation coefficients (ρ) and associated p-values are also shown.

To better understand the relationship between in-degree and the evolution of gene expression, we asked how the proportion of genes with a significant expression difference changed with increasing in-degree. Consistent with the tendency for genes with an expression difference to have lower in-degree than genes without an expression difference, increasing in-degree was found to be associated with a decreasing proportion of genes with expression differences using logistic regression (Pmel-mel < 2 × 10⁻¹⁶, Psim-sec = 5 × 10⁻⁹, Pmel-sim = 9 × 10⁻¹², N = 4,576 in all tests). We also compared in-degree of each gene to its magnitude of expression difference (regardless of whether or not this difference was statistically significant) and used the nonparametric Spearman’s rank correlation coefficient (ρ) to test for a significant relationship between the two (fig. 2D–F). This analysis showed that genes with larger in-degrees are not only less likely to have a significant expression difference between strains and species, but that the magnitude of any expression differences that do exist also tends to be smaller (ρmel-mel = −0.17, Pmel-mel < 2 × 10⁻¹⁶; ρsim-sec = −0.14, Psim-sec < 2 × 10⁻¹⁶; ρmel-sim = −0.17, Pmel-sim < 2 × 10⁻¹⁶; N = 4,576 in all tests).
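
The logistic regression described above models the probability that a gene shows an expression difference as a function of its in-degree. A minimal sketch, assuming a single predictor, a Newton-Raphson fit, and a Wald test for the slope, is given below; the data are invented, `fit_logistic` is a hypothetical helper, and the fitting procedure is not claimed to match the software used for the reported analyses.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # 2x2 information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 from the inverse information matrix, evaluated at the
    # start of the final iteration (effectively the estimate once converged).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Invented data: in-degree and whether the gene has an expression difference.
in_degree = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
has_difference = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

b0, b1, se_b1 = fit_logistic(in_degree, has_difference)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald test for the slope
print(f"slope = {b1:.3f}, Wald p = {p_value:.3f}")
```

A negative slope corresponds to the pattern reported above: the probability of an expression difference decreases as in-degree increases. The sketch does not guard against perfectly separated data, for which the maximum likelihood estimate does not exist.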

In-Degree Correlates with Differences in Cis-Regulatory Activity within and between Species

To determine whether the relationship observed between in-degree and differences in transcript abundance (gene expression) also exists between in-degree and differences in cis-regulatory activity, we again divided genes into two groups, those with (Nmel-mel = 316, Nsim-sec = 489, Nmel-sim = 732) and without (Nmel-mel = 4,260, Nsim-sec = 4,087, Nmel-sim = 3,844) significant cis-regulatory differences, and compared their in-degree distributions. A significantly lower in-degree was observed for genes with differences in cis-regulatory activity using Wilcoxon rank sum tests to compare the medians of the in-degree distributions (fig. 3A–C, Pmel-mel = 3 × 10⁻³, Psim-sec = 2 × 10⁻⁸, Pmel-sim = 2 × 10⁻³). Logistic regressions also indicated that higher in-degree was associated with a decreased probability of differences in cis-regulatory activity between strains and species (Pmel-mel = 0.003; Psim-sec = 2 × 10⁻¹⁰; Pmel-sim = 0.0027; N = 4,576 in all cases), and a significantly negative Spearman’s rank correlation coefficient was observed between in-degree and the magnitude of differences in cis-regulatory activity (fig. 3D–F, ρmel-mel = −0.08, Pmel-mel = 5 × 10⁻⁸; ρsim-sec = −0.13, Psim-sec < 2 × 10⁻¹⁶; ρmel-sim = −0.07, Pmel-sim = 6 × 10⁻⁶, N = 4,576 in all cases). These findings suggest that the effects of in-degree on the evolution of cis-regulatory activity are at least partially responsible for the observed relationship between in-degree and differences in gene expression.
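
Spearman’s rank correlation coefficient, used here and in the expression analysis, is the Pearson correlation computed on ranks. The sketch below shows the calculation with tied values sharing their average rank; the in-degree and magnitude values are invented, the function names are hypothetical, and no p-value is computed because that would require a t or permutation distribution beyond the scope of this illustration.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented values: in-degree and absolute magnitude of cis-regulatory difference.
in_degree = [1, 2, 3, 4, 5, 6]
magnitude = [2.0, 1.5, 1.8, 0.9, 0.5, 0.7]

rho = spearman_rho(in_degree, magnitude)
print(f"Spearman's rho = {rho:.3f}")
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship and is insensitive to the skewed distributions typical of degree and expression data.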

Fig. 3.

Relationship between network in-degree and difference in cis-regulatory activity within species and between species. (A–C) Boxplots show in-degree distributions for genes with (dark gray) and without (light gray) significant differences in cis-regulatory activity in the mel-mel (A), sim-sec (B), and mel-sim (C) comparisons. P-values are from non-parametric Wilcoxon rank sum tests, and N indicates the number of genes in each group. (D–F) Absolute magnitude of cis-regulatory difference (Y-axis) is plotted against in-degree (X-axis) in the mel-mel (D), sim-sec (E), and mel-sim (F) comparisons. A LOESS line fitted to these data is shown in dark gray. Spearman’s rank correlation coefficients (ρ) and associated p-values are also shown.

Out-Degree Correlates with Differences in Gene Expression within but Not between Species

Expression of transcription factors with many target genes (higher out-degree) is often assumed to evolve more slowly than expression of transcription factors with fewer target genes (lower out-degree) because changing expression of the former is expected to have greater pleiotropic effects and hence greater selective constraint than changing the latter. To test this hypothesis, we compared the median out-degree between transcription factors with (Nmel-mel = 65, Nsim-sec = 44, Nmel-sim = 56) and without (Nmel-mel = 162, Nsim-sec = 183, Nmel-sim = 171) differences in expression in the mel-mel, sim-sec, and mel-sim comparisons using the same tests described above for in-degree. [Note that the smaller number of transcription factors (N = 227) than target genes (N = 4,576) provides less power to detect similarly sized effects for out-degree than in-degree.] When comparing expression between two strains of D. melanogaster, we found evidence of the predicted patterns: lower out-degree for transcription factors with expression differences (fig. 4A; Pmel-mel = 2 × 10⁻⁴) and fewer (logistic regression: β = −0.001, P = 4 × 10⁻⁴, N = 227) as well as smaller (fig. 4D, Spearman’s rank correlation: ρ = −0.23, P = 5 × 10⁻⁴, N = 227) expression differences for transcription factors with higher out-degree. Surprisingly, these relationships were not seen in either of the interspecific comparisons. Rather, we found no statistically significant differences in median out-degree between transcription factors with and without expression differences in the sim-sec and mel-sim comparisons (fig. 4B and C; Psim-sec = 0.71, Pmel-sim = 0.70; N = 227 in both cases) nor any significant correlation between the probability of expression differences and out-degree (logistic regression, Psim-sec = 0.50, Pmel-sim = 0.79, N = 227 in both cases) or the magnitude of expression differences and out-degree (fig. 4E and F; Spearman’s rank correlation, ρsim-sec = −0.094, Psim-sec = 0.16; ρmel-sim = −0.090, Pmel-sim = 0.17, N = 227 in both cases). These results are especially surprising given that the effects of selection, which is assumed to be the force driving a negative correlation between out-degree and expression divergence, should be stronger between than within species.

Fig. 4.

Relationship between network out-degree and difference in gene expression within species and between species. (A–C) Boxplots show out-degree distributions for genes with (dark gray) and without (light gray) significant differences in gene expression in the mel-mel (A), sim-sec (B), and mel-sim (C) comparisons. P-values are from non-parametric Wilcoxon rank sum tests, and N indicates the number of genes in each group. (D–F) Absolute magnitude of gene expression difference (Y-axis) is plotted against out-degree (X-axis) in the mel-mel (D), sim-sec (E), and mel-sim (F) comparisons. A LOESS line fitted to these data is shown in dark gray. Spearman’s rank correlation coefficients (ρ) and associated p-values are also shown.

The hypothesis that out-degree negatively correlates with expression divergence is based on the assumption that out-degree is a good proxy for pleiotropy; however, this assumption might not be true. To examine this possibility, we compared out-degree to the number of Gene Ontology (GO) categories associated with each transcription factor, a measure previously shown to be correlated with other empirical measures of pleiotropy in yeast (He and Zhang 2006). We found no significant correlation between out-degree and the number of GO categories among the transcription factors examined (supplementary fig. S4, Supplementary Material online; Spearman’s rank correlation coefficient = −0.07, P = 0.4). We also tested whether the number of GO terms associated with a transcription factor correlates significantly with expression differences within or between species and found evidence for such a correlation only in the mel-mel comparison (fig. 5A–C, Pmel-mel = 0.02; Psim-sec = 0.11; Pmel-sim = 0.17). Similarly, the number of GO terms only showed a statistically significant correlation with the magnitude of expression differences in the mel-mel comparison (fig. 5D–F; ρmel-mel = 0.16, Pmel-mel = 0.05; ρsim-sec = −0.05, Psim-sec = 0.52; ρmel-sim = 0.00, Pmel-sim = 0.95, N = 227 in all cases). In both of these cases, however, the significant relationship observed in the mel-mel comparison between the number of GO terms and expression differences was in the opposite direction from that expected, with transcription factors having more ontology terms more likely to have an expression difference (fig. 5A) or a larger expression difference (fig. 5D) than transcription factors with fewer ontology terms.

Fig. 5.

Relationship between number of GO SLIM terms associated with a transcription factor and differences in gene expression within species and between species. (A–C) Boxplots show GO SLIM term distributions for genes with (dark gray) and without (light gray) significant differences in gene expression in the mel-mel (A), sim-sec (B), and mel-sim (C) comparisons. P-values are from non-parametric Wilcoxon rank sum tests, and N indicates the number of genes in each group. (D–F) Absolute magnitude of gene expression differences (Y-axis) is plotted against the number of GO SLIM terms (X-axis) in the mel-mel (D), sim-sec (E), and mel-sim (F) comparisons. A LOESS line fitted to these data is shown in dark gray. Spearman’s rank correlation coefficients (ρ) and associated p-values are also shown.

Out-Degree Does Not Correlate with Differences in Cis-Regulation within or between Species

To determine whether the relationship between out-degree and expression differences seen for the mel-mel comparison might be explained by a correlation between out-degree and differences in cis-regulatory activity, we compared out-degree between transcription factors with (Nmel-mel = 10) and without (Nmel-mel = 217) cis-regulatory differences in the mel-mel comparison. We found no significant difference in the median out-degree between the two groups of genes (fig. 6A; P = 0.71) nor any significant correlation between out-degree and the probability of cis-regulatory differences (logistic regression, P = 0.57, Nmel-mel = 227) or magnitude of cis-regulatory differences (fig. 6D, Spearman’s ρ = −0.01, P = 0.88, Nmel-mel = 227). For completeness, we also tested cis-regulatory differences in the sim-sec and mel-sim comparisons for a correlation with out-degree. Again, we found no significant difference in out-degree between transcription factors with (Nsim-sec = 16, Nmel-sim = 22) and without (Nsim-sec = 211, Nmel-sim = 205) differences in cis-regulatory activity (fig. 6B and C; Psim-sec = 0.56, Pmel-sim = 0.44) nor any significant correlation between out-degree and the probability of cis-regulatory differences (logistic regression, Psim-sec = 0.27; Pmel-sim = 0.38; N = 227 in both cases) or magnitude of cis-regulatory differences (fig. 6E and F; Spearman’s ρsim-sec = −0.08, Psim-sec = 0.21; ρmel-sim = −0.07, Pmel-sim = 0.32, N = 227 in both cases). These data suggest that the out-degree of a transcription factor has little influence on the evolution of its cis-regulatory activity.

Fig. 6.

Relationship between network out-degree and difference in cis-regulatory activity within species and between species. (A–C) Boxplots show out-degree distributions for genes with (dark gray) and without (light gray) significant differences in cis-regulation in the mel-mel (A), sim-sec (B), and mel-sim (C) comparisons. P-values are from non-parametric Wilcoxon rank sum tests, and N indicates the number of genes in each group. (D–F) Absolute magnitude of cis-regulatory differences (Y-axis) is plotted against out-degree (X-axis) in the mel-mel (D), sim-sec (E), and mel-sim (F) comparisons. A LOESS line fitted to these data is shown in dark gray. Spearman’s rank correlation coefficients (ρ) and associated p-values are also shown.

Discussion

By comparing in-degree and out-degree in the Drosophila regulatory network with changes in gene expression and cis-regulation that have evolved within and between species, we found that genes regulated by larger numbers of transcription factors tended to have fewer and smaller changes in expression both within and between species than genes regulated by smaller numbers of transcription factors. By contrast, we found that the number of genes a transcription factor regulates, a property predicted to be related to pleiotropy, showed a statistically significant correlation with differences in total gene expression only when comparing two strains of D. melanogaster. No significant correlation between out-degree and differences in cis-regulation was observed in any comparison, either within or between species. These relationships are summarized in figure 7. Below, we discuss the implications of our findings and compare our results with results from a similar study of regulatory differences between S. cerevisiae and S. paradoxus (Kopp and McIntyre 2012).

Fig. 7.

In-degree is a better predictor of changes in cis-regulation and gene expression over evolutionary time than out-degree. This schematic shows the direction of the relationship, if any, between differences in gene expression (A) or cis-regulation (B) observed within or between Drosophila species and either in-degree (top row) or out-degree (bottom row), which are properties of the network architecture. A horizontal line indicates that no statistically significant relationship (defined as P < 0.01 for the Wilcoxon rank sum test) was observed. As described in the main text and shown in figure 5, we also compared differences in transcription factor expression to the number of GOSlim terms associated with each transcription factor and found evidence of a marginally significant relationship (P = 0.02 for Wilcoxon test, P = 0.05 for Spearman’s rank correlation coefficient) only in the mel-mel comparison, and the sign of this correlation was in the opposite direction from that predicted.

Network In-Degree Appears to Influence the Evolution of Gene Expression

The combinatorial control of a gene’s expression by sets of transcription factors might either suppress or enhance the effects of new mutations on transcript levels. For example, genes regulated by many transcription factors may be more likely to have their expression altered by new mutations than genes regulated by fewer transcription factors because there are more sites in the genome that affect the expression of these genes. Consistent with this prediction, a study of mutation accumulation lines in yeast found that mutational variance (differences in gene expression caused by new mutations) correlated positively with the number of trans-acting regulators predicted to regulate a gene’s expression (Landry et al. 2007). Interactions among transcription factors regulating expression of a gene can complicate the relationship between mutational target size and changes in gene expression, however. For example, a mutation that disrupts activity of a transcription factor might have little to no effect on expression of a target gene if another transcription factor(s) partially or completely compensates for the loss of the first transcription factor’s activity. Effects of mutating individual transcription factors might also be smaller for genes regulated by larger sets of transcription factors than smaller sets if each transcription factor contributes a comparable amount to gene expression. These properties might cause genes regulated by large sets of transcription factors to acquire changes in expression more slowly and/or less often than genes regulated by fewer transcription factors. Our data are consistent with these latter models: the number of transcription factors regulating a gene’s expression showed a significant negative correlation with both the frequency and magnitude of total expression differences as well as cis-regulatory differences within and between species.

A similar comparison between the number of transcription factors regulating a gene’s expression and its cis-regulatory divergence was performed for S. cerevisiae and Saccharomyces paradoxus by Kopp and McIntyre (2012). They found the opposite relationship between network in-degree and cis-regulatory divergence, with larger differences in cis-regulation observed for genes with larger numbers of transcriptional regulators, but the magnitude of this effect was described as small. The different relationships observed in these two studies might result from differences in the structure of transcriptional regulatory networks between yeast and flies. For example, compared to genes in Drosophila, Saccharomyces genes tend to have relatively few regulators (supplementary fig. S5, Supplementary Material online), which limits the opportunity for interactions among transcription factors to buffer the effects of regulatory changes. The smaller number of regulators might also cause genetic changes affecting a single transcriptional regulator to tend to have larger effects on expression. Ultimately, however, the reason for the different relationships reported between in-degree and regulatory divergence in yeast (Kopp and McIntyre 2012) and flies (this study) remains unknown.

Network Out-Degree Appears to Have Minimal Effect on the Evolution of Gene Expression

Patterns of variation within and between species are influenced by both mutation and selection, with selection acting to preserve favorable genetic variants and eliminate deleterious ones. In the context of regulatory networks, genetic variants that affect expression of genes that influence activity of many other genes are thought to often be deleterious because of their expected greater pleiotropy and thus expected to be preferentially eliminated by natural selection (e.g., Featherstone and Broadie 2002; Cooper et al. 2007; McGuigan et al. 2014). Our data provide limited support for this hypothesis, however, with the predicted negative correlation between divergence and the number of target genes of a transcription factor observed only for expression differences within a species. Selection is expected to have a larger impact on differences between than within species because of the longer divergence time, suggesting that selection is unlikely to be responsible for the relationship observed within D. melanogaster. Kopp and McIntyre (2012) also failed to find a statistically significant correlation between out-degree of a transcription factor and cis-regulatory divergence between S. cerevisiae and S. paradoxus.

We do not think that these findings refute the idea that increasing pleiotropy increases the probability that a genetic change is deleterious, but rather that they suggest that the number of direct target genes a transcription factor regulates (“out-degree”) is not a good measure of pleiotropy. For example, the target genes of a transcription factor might tend to affect the same biological functions, minimizing the pleiotropic effects of genetic changes affecting expression of that transcription factor. The absence of a correlation between out-degree and the number of GO terms associated with a transcription factor is consistent with this potential explanation. The number of GO terms also failed to correlate with expression divergence, however, suggesting that it might also be a poor measure of pleiotropy (at least in Drosophila). Quantifying pleiotropy is notoriously difficult (Paaby and Rockman 2013), and detecting any relationship between pleiotropy and the evolution of gene expression that might exist will likely require information beyond the topology of regulatory networks and GO categorizations.

Looking Ahead

Understanding how existing biological systems shape the paths for future evolutionary change is an important goal for evolutionary biology. We must understand how genotypes are translated into phenotypes to achieve this goal, and the elucidation of regulatory networks controlling gene expression is a key step in this process. Our results suggest that some topological features of regulatory networks (e.g., in-degree) might be useful predictors of evolutionary change, whereas others (e.g., out-degree) might have less explanatory power than expected. The scope of these conclusions is limited, however, by the small number of species for which even a preliminary comparison between network topology and expression divergence is possible; elucidating regulatory networks remains challenging in even the most developed genetic model systems. Advances in functional genomics, computational tools for inferring regulatory networks, and methods for perturbing genomes to assess the phenotypic effects of a particular genetic change promise to provide more opportunities to study the relationship between biological networks and regulatory evolution.

Materials and Methods

Transcriptional Regulatory Network

The transcriptional regulatory network used in this work was the “supervised” network described by Marbach et al. (2012). It was inferred using information from several sources, including genome-wide chromatin immunoprecipitation, conserved transcription factor binding motifs among 12 Drosophila species, gene expression profiles across different developmental stages, chromatin modification profiles among several cell types, and experimentally confirmed regulatory relationships (Marbach et al. 2012).

Additional tests of the reliability of this network and its applicability to other Drosophila species (D. simulans and D. sechellia) were performed by switching edges among genes in the network as shown in supplementary figure S3, Supplementary Material online. Although this approach is intuitive and has been used to compare observed and randomized network structures in prior work (e.g., Milo et al. 2002, 2003; Iorio et al. 2016), the statistical properties of null models generated in this way have not been established for comparisons with datasets, such as gene expression, that have covariance among measures, and the results should be interpreted with this in mind (Churchill and Doerge 2008). Briefly, the degree-preserving network randomization was done by randomly selecting two edges in the network and then, as long as the newly created edges did not already exist in the network, exchanging their target genes. This process was repeated until the intended percentage of edges was switched. In other words, 10% edge switching means that 10% of the edges have exchanged ends with other edges. This randomization strategy keeps the in-degree and out-degree of every gene unchanged in all randomized networks (Milo et al. 2002, 2003). Error bars shown in supplementary figure S3B–G, Supplementary Material online, indicate two standard deviations around the mean derived from the 200 permutations with the same percent edge switching.
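The edge-switching procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' script (their analyses were run in R): the function name `switch_edges`, the `(regulator, target)` tuple format, the retry cap, and the reading of "10% of edges switched" as "10% of edges took part in at least one swap" are all assumptions made for the example.

```python
import random


def switch_edges(edges, fraction, seed=None, max_attempts_per_edge=100):
    """Swap target genes between random pairs of edges, preserving degrees.

    edges    -- iterable of (regulator, target) pairs (hypothetical input format)
    fraction -- share of edges that should take part in at least one swap
    """
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))  # drop duplicate edges, keep order
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list

    n_to_switch = round(fraction * len(edge_list))
    switched = set()  # positions of edges that have been swapped at least once
    attempts_left = max_attempts_per_edge * len(edge_list)  # assumed safety cap

    while len(switched) < n_to_switch and attempts_left > 0:
        attempts_left -= 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (reg_a, tgt_a), (reg_b, tgt_b) = edge_list[i], edge_list[j]
        new_a, new_b = (reg_a, tgt_b), (reg_b, tgt_a)
        # Reject swaps that would recreate an edge already in the network.
        # This also rejects pairs that share a regulator or share a target.
        if new_a in edge_set or new_b in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new_a, new_b))
        edge_list[i], edge_list[j] = new_a, new_b
        switched.update((i, j))
    return edge_list
```

Because each accepted swap only exchanges the targets of two edges, every regulator keeps its out-degree and every target keeps its in-degree. The sketch does not treat self-regulation specially; whether such edges need separate handling depends on the network being randomized.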

Comparing Gene Expression and Cis-Regulatory Activity among Strains and Species

Differences in mRNA transcript abundance (“gene expression”) and relative cis-regulatory activity between the zhr and z30 strains of D. melanogaster, the droSec1 strain of D. sechellia and Tsimbazaza strain of D. simulans, and the zhr strain of D. melanogaster and Tsimbazaza strain of D. simulans were taken from the analysis of RNA-seq data described in Coolon et al. (2014). These data include comparisons of gene expression between each pair of strains or species (mel-mel, sim-sec, and mel-sim) as well as comparisons of relative cis-regulatory activity inferred by comparing relative allelic expression in F1 hybrids produced by crossing each pair of strains or species (Wittkopp et al. 2004; McManus et al. 2010). The statistical significance of differences in gene expression and cis-regulatory activity between strains or species was determined using binomial exact tests to compare read abundance in mixed parental (for expression differences) and F1 hybrid (for cis-regulatory differences) RNA-seq datasets with a Benjamini and Hochberg (1995) 5% false discovery rate (as implemented in R v3.0.1) to correct for multiple testing (Coolon et al. 2014). The process used to merge the expression and network files and identify the 4,577 genes analyzed in this study is described in supplementary figure S1, Supplementary Material online. Ultimately, we analyzed the 4,577 genes (including 227 transcription factors) that passed the quality control standards for measuring allele-specific expression used by Coolon et al. (2014) and also appeared in the regulatory network inferred by Marbach et al. (2012) (supplementary fig. S1, Supplementary Material online). All gene annotations were based on D. melanogaster FlyBase FBgn#s (Attrill et al. 2015).
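For readers unfamiliar with the two statistical steps named above, the following Python sketch shows an exact two-sided binomial test followed by the Benjamini and Hochberg adjustment. It is not the code used by Coolon et al. (2014), which was run in R. The null proportion of 0.5 (equal reads from the two alleles or samples), the function names, and the example read counts are assumptions made for illustration.

```python
from math import comb


def binom_exact_two_sided(k, n):
    """Exact two-sided binomial p-value for k successes in n trials.

    Assumes a null proportion of 0.5, so each outcome's probability is
    proportional to comb(n, i) and outcomes can be compared as integers.
    """
    if n == 0:
        return 1.0
    observed = comb(n, k)
    # Sum the probability of every outcome no more likely than the observed one.
    total = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return min(1.0, total / 2 ** n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (reads assigned to allele 1, total allele-specific reads)
counts = [(48, 100), (80, 100), (5, 40)]
pvals = [binom_exact_two_sided(k, n) for k, n in counts]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]  # 5% false discovery rate
```

The exact test enumerates all n + 1 outcomes, which is adequate for a sketch but slow for very deep read counts; a statistics library would normally be used in practice.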

Comparing Network Properties to Differences in Gene Expression and Cis-Regulation

Analyses shown in figures 1, 2A–C, 3A–C, 4A–C, and 6A–C compare the presence or absence of statistically significant (FDR = 0.05) differences in gene expression or cis-regulatory activity described in Coolon et al. (2014) to relationships among genes in the network (fig. 1), in-degree of all target genes (figs. 2 and 3), and out-degree of all transcription factors (figs. 4 and 6). Non-parametric Wilcoxon rank sum tests were used to compare median in-degree and out-degree between sets of genes with and without statistically significant differences in gene expression or cis-regulation for each pair of strains or species examined as well as to compare the proportion of target genes with differential expression between transcription factors with and without differential expression and vice versa. These tests evaluated the null hypothesis of no association between in-degree or out-degree and differences in gene expression or cis-regulation. Logistic regressions were also used to compare an indicator variable representing whether or not a gene had a statistically significant difference in gene expression and/or cis-regulatory activity in a given comparison to its in-degree or out-degree. These tests were performed using the glm function in R with the options “family = binomial, link = logit,” which uses a Z-score to assess the statistical significance of the factor being tested; a significant test indicates that the factor tested (e.g., in-degree or out-degree) has statistically significant predictive ability for which genes have significant expression differences. The null hypothesis in each case was that the factor tested was not a significant predictor of differences in expression or cis-regulation.
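As an illustration of the rank sum comparison described above, the sketch below computes a Wilcoxon rank sum (Mann-Whitney U) statistic and a two-sided p-value in Python. It is a minimal sketch, not the authors' R code: it uses the normal approximation with tie and continuity corrections rather than an exact test, and the function names and example in-degree values are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) by normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    n = n1 + n2
    ranks = midranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract a term for every group of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity correction shrinks the deviation from the mean by 0.5.
    z = max(abs(u - mean_u) - 0.5, 0.0) / sqrt(var_u)
    return u, erfc(z / sqrt(2))


# Hypothetical in-degrees for genes with and without significant divergence
in_degree_diverged = [3, 5, 8, 8, 12]
in_degree_conserved = [9, 14, 15, 20, 22, 31]
u_stat, p_value = rank_sum_test(in_degree_diverged, in_degree_conserved)
```

Working on ranks rather than raw degrees makes the comparison insensitive to the strongly skewed degree distributions typical of regulatory networks.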

Spearman’s rank correlation coefficients were used to test for a significant correlation between the log2 transformed magnitude of the differences in gene expression or cis-regulatory activity reported in Coolon et al. (2014) and a gene’s in-degree or out-degree. The null hypothesis for this test is that there is no relationship between a gene’s in-degree or out-degree and the magnitude of its expression difference between strains or species. Results from these tests are shown in figures 2D–F, 3D–F, 4D–F, and 6D–F. A LOESS (locally weighted scatterplot smoothing) line was fitted to these data using the loess function with default parameters in R.
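Spearman’s coefficient is the Pearson correlation computed on ranks, which the following Python sketch makes explicit. It is illustrative only and is not the authors' R code; the function names and example values are hypothetical, and the sketch returns the coefficient without the associated significance test.

```python
from math import sqrt


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: in-degree and magnitude of expression difference per gene
in_degree = [2, 5, 5, 9, 14, 20]
log2_magnitude = [1.8, 1.1, 0.9, 0.7, 0.4, 0.5]
rho = spearman_rho(in_degree, log2_magnitude)
```

Because only ranks enter the calculation, any monotonic transformation of the expression differences (including the log2 transformation) leaves the coefficient unchanged.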

GO Analysis

GO terms were obtained from FlyBase (Attrill et al. 2015) for each transcription factor in our dataset, and the number of GO terms associated with each transcription factor was used as a proxy for its degree of pleiotropy. To minimize redundancy among GO terms, we restricted our analysis to the GO SLIM categories defined by the Gene Ontology Consortium (2015). To determine whether the number of GO SLIM terms associated with a transcription factor was related to differences in expression of its target genes between each pair of strains or species, we used Wilcoxon rank sum tests to compare the median number of GO SLIM terms between sets of transcription factors with and without expression differences (fig. 5A–C). Spearman’s rank correlation coefficients were also used to test for a significant relationship between the log2 magnitude of differences in gene expression or cis-regulatory activity and number of GO SLIM terms associated with each transcription factor.

Comparing in-Degree Distributions between Flies and Yeast

In the Discussion section, we compare our results from analysis of a Drosophila regulatory network to a similar study that was performed using an S. cerevisiae regulatory network (Kopp and McIntyre 2012), including a comparison of the in-degree distributions between the two networks. The Drosophila melanogaster network used for this analysis was the same network used for the rest of the analyses in this paper (Marbach et al. 2012), and the S. cerevisiae network used was described in Balaji et al. (2006). In each case, in-degree was calculated as the number of transcription factors predicted to regulate a target within the network.
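The degree calculation described above amounts to counting edges per gene. The Python sketch below shows one way to do this from a list of directed edges; the `(regulator, target)` tuple format, the function names, and the gene labels are hypothetical and are not taken from the network files used in this study.

```python
from collections import Counter


def in_degrees(edges):
    """Number of distinct regulators predicted for each target gene."""
    return Counter(target for _, target in set(edges))


def out_degrees(edges):
    """Number of distinct target genes predicted for each regulator."""
    return Counter(regulator for regulator, _ in set(edges))


def degree_distribution(degrees):
    """Map each degree value to the number of genes having that degree."""
    return Counter(degrees.values())


# Hypothetical edge list; the repeated edge is counted only once.
edges = [("tfA", "g1"), ("tfB", "g1"), ("tfA", "g2"), ("tfA", "g2")]
indeg = in_degrees(edges)                 # g1 has two regulators, g2 has one
outdeg = out_degrees(edges)               # tfA has two targets, tfB has one
indeg_dist = degree_distribution(indeg)   # one gene each with in-degree 1 and 2
```

Genes with no incoming edges never appear in the edge list, so they are absent from the in-degree table; they would need to be added with a count of zero if the comparison between networks is meant to include unregulated genes.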

Statistical Analyses

All statistical analyses were performed in R v3.2.2 (R Core Team 2016). Database files and scripts used to perform these analyses are available for download from https://deepblue.lib.umich.edu/data/concern/generic_works/9s161628x (last accessed February 8, 2017). Supplementary figures S1 and S3, Supplementary Material online, and their associated legends describe which files were used for each step of the project.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

We thank the University of Michigan LSA High Performance Computing for computational resources; Joseph Coolon for providing the gene expression datasets; Joseph Coolon, Kraig Stevenson and Brian Metzger for advice on statistical analysis; and Brian Metzger and Fabien Duveau for comments on the manuscript. Funding for this work was provided by the National Science Foundation (MCB-1021398) and the National Institutes of Health (1R35GM118073).

References

  1. Attrill H, Falls K, Goodman JL, Millburn GH, Antonazzo G, Rey AJ, Marygold SJ, FlyBase Consortium. 2015. FlyBase: establishing a gene group resource for Drosophila melanogaster. Nucleic Acids Res. 44:D786–D792.
  2. Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L. 2006. Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. J Mol Biol. 360:213–227.
  3. Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39:945–949.
  4. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 57:289–300.
  5. Borneman AR, Leigh-Bell JA, Yu H, Bertone P, Gerstein M, Snyder M. 2006. Target hub proteins serve as master regulators of development in yeast. Genes Dev. 20:435–448.
  6. Bradley RK, Li X-Y, Trapnell C, Davidson S, Pachter L, Chu HC, Tonkin LA, Biggin MD, Eisen MB. 2010. Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLOS Biol. 8:e1000343.
  7. Carroll SB. 2008. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134:25–36.
  8. Churchill GA, Doerge RW. 2008. Naive application of permutation testing leads to inflated type I error rates. Genetics 178:609–610.
  9. Coolon JD, McManus CJ, Stevenson KR, Graveley BR, Wittkopp PJ. 2014. Tempo and mode of regulatory evolution in Drosophila. Genome Res. 24:797–808.
  10. Cooper TF, Ostrowski EA, Travisano M. 2007. A negative relationship between mutation pleiotropy and fitness effect in yeast. Evolution 61:1495–1499.
  11. Cutter AD. 2008. Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate. Mol Biol Evol. 25:778–786.
  12. Evangelisti AM, Wagner A. 2004. Molecular evolution in the yeast transcriptional regulation network. J Exp Zool Part B 302:392–411.
  13. Featherstone DE, Broadie K. 2002. Wrestling with pleiotropy: genomic and topological analysis of the yeast gene expression network. Bioessays 24:267–274.
  14. Gene Ontology Consortium. 2015. Gene ontology consortium: going forward. Nucleic Acids Res. 43:D1049–D1056.
  15. Halfon MS, Gallo SM, Bergman CM. 2008. REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila. Nucleic Acids Res. 36:D594–D598.
  16. He X, Zhang J. 2006. Toward a molecular understanding of pleiotropy. Genetics 173:1885–1891.
  17. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al. 2000. Functional discovery via a compendium of expression profiles. Cell 102:109–126.
  18. Iorio F, Bernardo-Faura M, Gobbi A, Cokelaer T, Jurman G, Saez-Rodriguez J. 2016. Efficient randomization of biological networks while preserving functional characterization of individual nodes. BMC Bioinformatics 17:542–555.
  19. Jovelin R, Phillips PC. 2009. Evolutionary rates and centrality in the yeast gene regulatory network. Genome Biol. 10:R35.
  20. Kopp A, McIntyre LM. 2012. Transcriptional network structure has little effect on the rate of regulatory evolution in yeast. Mol Biol Evol. 29:1899–1905.
  21. Landry CR, Lemos B, Rifkin SA, Dickinson WJ, Hartl DL. 2007. Genetic properties influencing the evolvability of gene expression. Science 317:118–121.
  22. Macneil LT, Walhout AJM. 2011. Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res. 21:645–657.
  23. Marbach D, Roy S, Ay F, Meyer PE, Candeias R, Kahveci T, Bristow CA, Kellis M. 2012. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res. 22:1334–1349.
  24. McGuigan K, Collet JM, Allen SL, Chenoweth SF, Blows MW. 2014. Pleiotropic mutations are subject to strong stabilizing selection. Genetics 197:1051–1062.
  25. McManus CJ, Coolon JD, Duff MO, Eipper-Mains J, Graveley BR, Wittkopp PJ. 2010. Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 20:816–825.
  26. Milo R, Kashtan N, Itzkovitz S, Newman MEJ, Alon U. 2003. On the uniform generation of random graphs with prescribed degree sequences. arXiv:cond-mat/0312028.
  27. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. 2002. Network motifs: simple building blocks of complex networks. Science 298:824–827.
  28. Paaby AB, Rockman MV. 2013. The many faces of pleiotropy. Trends Genet. 29:66–73.
  29. Promislow D. 2005. A regulatory network analysis of phenotypic plasticity in yeast. Am Nat. 165:515–523.
  30. R Core Team. 2016. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/.
  31. Siegal ML, Promislow DEL, Bergman A. 2007. Functional and evolutionary inference in gene networks: does topology matter? Genetica 129:83–103.
  32. Stern DL, Orgogozo V. 2009. Is genetic evolution predictable? Science 323:746–751.
  33. Wittkopp PJ, Haerum BK, Clark AG. 2004. Evolutionary changes in cis and trans gene regulation. Nature 430:85–88.
  34. Wittkopp PJ, Kalay G. 2011. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 13:59–69.
  35. Wray GA. 2007. The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 8:206–216.
  36. Zhu X, Gerstein M, Snyder M. 2007. Getting connected: analysis and principles of biological networks. Genes Dev. 21:1010–1024.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press