Genetics. 2014 Aug 29;198(3):1155–1166. doi: 10.1534/genetics.114.168112

Hybrid Incompatibility Arises in a Sequence-Based Bioenergetic Model of Transcription Factor Binding

Alexander Y Tulchinsky, Norman A Johnson, Ward B Watt, Adam H Porter
PMCID: PMC4224158  PMID: 25173845

Abstract

Postzygotic isolation between incipient species results from the accumulation of incompatibilities that arise as a consequence of genetic divergence. When phenotypes are determined by regulatory interactions, hybrid incompatibility can evolve even as a consequence of parallel adaptation in parental populations because interacting genes can produce the same phenotype through incompatible allelic combinations. We explore the evolutionary conditions that promote and constrain hybrid incompatibility in regulatory networks using a bioenergetic model (combining thermodynamics and kinetics) of transcriptional regulation, considering the bioenergetic basis of molecular interactions between transcription factors (TFs) and their binding sites. The bioenergetic parameters describe the free energy of formation of the bond between the TF and its binding site and the availability of TFs in the intracellular environment. Together these determine the fractional occupancy of the TF on the promoter site, the degree of subsequent gene expression and, in diploids, the degree of dominance among allelic interactions. This results in a sigmoid genotype–phenotype map and fitness landscape, with the details of the shape determining the degree of bioenergetic evolutionary constraint on hybrid incompatibility. Using individual-based simulations, we subjected two allopatric populations to parallel directional or stabilizing selection. Misregulation of hybrid gene expression occurred under either type of selection, although it evolved faster under directional selection. Under directional selection, the extent of hybrid incompatibility increased with the slope of the genotype–phenotype map near the derived parental expression level. Under stabilizing selection, hybrid incompatibility arose from compensatory mutations and was greater when the bioenergetic properties of the interaction caused the space of nearly neutral genotypes around the stable expression level to be wide.
F2’s showed higher hybrid incompatibility than F1’s to the extent that the bioenergetic properties favored dominant regulatory interactions. The present model is a mechanistically explicit case of the Bateson–Dobzhansky–Muller model, connecting environmental selective pressure to hybrid incompatibility through the molecular mechanism of regulatory divergence. The bioenergetic parameters that determine expression represent measurable properties of transcriptional regulation, providing a predictive framework for empirical studies of how phenotypic evolution results in epistatic incompatibility at the molecular level in hybrids.

Keywords: speciation, Dobzhansky–Muller interactions, cis–trans coevolution, regulatory evolution, adaptive landscape, genotype–phenotype map


POSTZYGOTIC hybrid incompatibility (HI), an important component of reproductive isolation (Coyne and Orr 2004), is usually not due to failure of a single gene, but arises from incompatibilities between interacting genes, as described by the Bateson–Dobzhansky–Muller (BDM) model (Bateson 1909; Dobzhansky 1937; Muller 1942). Recent studies of the genes underlying HI show that this BDM model is well supported and that some form of selection plays a role in the evolution of HI (Johnson 2010; Presgraves 2010; Maheshwari and Barbash 2011). In most cases, however, the molecular basis of HI is unresolved; because incompatibility requires at least two interacting genes under the BDM model, the cause of incompatibility can be understood fully only after all of the interacting partners have been identified.

Gene interactions are central to developmental biology. Networks of interacting genes map an organism’s genotype to its phenotype through developmental and physiological processes (Wilkins 2002). Motivated by the pervasive nature of these networks, Johnson and Porter (2000, 2001, 2007) (see also Palmer and Feldman 2009) developed models that connect directional selection on phenotypic traits to the evolution of hybrid incompatibility through the action of gene networks. In these models, BDM incompatibilities can be understood in terms of molecular interactions among the nucleic acid and protein components of these networks. However, simplifications in how these models represented such interactions yielded several results that limited their generality and their tractability for future empirical work.

Molecular interactions in biological systems are governed by bioenergetic (combining thermodynamics and kinetics, Morowitz 1978) principles inherent to the physics of interacting molecules. In the context of gene regulation (Gerland et al. 2002), these principles require that interactions between transcription factors (TFs) and the promoter sites they associate with be viewed not as binary and static, with prospective TFs always or never bound such that gene expression proceeds at its maximum level or not at all. Rather, bioenergetic principles require that binding be dynamic and expressed in terms of fractional occupancy, the probability that the TF is associated with its binding site at a given moment. The values of bioenergetic parameters themselves depend on the configurations and concentrations of the interacting molecules, which are evolvable in regulatory networks, with the consequence that gene expression evolves on a continuous rather than binary scale (Gerland et al. 2002). Intermediate levels of gene expression are possible and can be favored in environments where the corresponding phenotypes are optimal.

The genotype–phenotype (G–P) map, which describes the relationship between genotypic information and phenotypic expression (Segal et al. 2008), is shaped by the bioenergetic properties of the molecular interactions (Gerland et al. 2002). The fitness landscape, which describes the relationship between fitness and the underlying genetic information and environmental conditions that ultimately determine reproductive success, is mediated through the phenotype. When phenotypes are based upon regulatory interactions, the shape of the fitness landscape is determined by the bioenergetics of the interacting molecules. The fitness landscape is therefore ultimately based upon these bioenergetic properties and subject to bioenergetic constraints (Watt and Dean 2000; Watt et al. 2003; Watt 2013). The evolutionary dynamics of HI depend on the details of the landscape’s shape.

In this study, we investigate the effects of bioenergetic parameters on evolving genetic regulatory interactions and the evolutionary constraints imposed upon their by-product, BDM incompatibilities. We show the conditions under which BDM incompatibilities are most likely to evolve in the simplest regulatory interaction, the two-locus case. Our results overcome the limitations in the Johnson and Porter (2000, 2001, 2007) models and are well suited to empirical studies of the bioenergetic basis of gene expression (Segal and Widom 2009; Shultzaberger et al. 2012) and bioinformatic data characterizing promoter sequences and TF binding (Segal et al. 2008; Wittkopp and Kalay 2012).

Modeling Approach

To characterize the bioenergetic properties of regulatory molecular interactions, we modify a class of statistical physics models (Von Hippel and Berg 1986; Gerland et al. 2002; Mustonen et al. 2008), originally developed to model the bioenergetics of transcription regulation in terms of the information content of the regulatory site. In these models, the information content represents the fit between the TF’s binding motif and the binding site’s nucleotide sequence, which in turn determines the free energy of formation of bonds between them. In the cell, a TF molecule may interact to varying extents with its target binding site, any spurious binding sites in the remainder of the genome, and other molecules in the intracellular environment. The total free energy of binding to this nonspecific background varies with the motif length and genome size. The binding energy of the TF to the specific site of interest, relative to the collective binding energies to the competing interactors, and in combination with the number of TF molecules available, determines the fractional occupancy—the probability that the TF is associated with its binding site at a given moment. Gene expression occurs while the TF is bound.

We modify this basic statistical physics model by allowing the information content of the TF’s binding motif to vary in the same way as the binding site and represent the information content of both as mutable binary sequences (Figure 1), with fractional occupancies calculated from novel variants of each. The phenotype is proportional to fractional occupancy, and the fitnesses of those phenotypes depend on environmental conditions that may change. Adaptation arises from evolutionary changes in the underlying regulatory interaction responsible for gene expression. Inasmuch as the fitness landscape—the relationship between genotype and fitness—is mediated through the phenotype, the dynamics of adaptation ultimately are subject to the evolutionary constraints imposed by the underlying bioenergetic parameters. As in Johnson and Porter (2000, 2007), we expect hybrid incompatibility, in the form of misregulated gene expression, to arise as a by-product of adaptation if parent populations evolve different and incompatible solutions in response to selection for a given level of expression, but subject to the bioenergetic constraints that act upon adaptation.

Figure 1.

“Lock and key” model of a two-locus regulatory interaction. Expression depends on the fit between the transcription factor, encoded at the first locus, and a binding site in the regulatory region of the second locus. Alleles of the transcription factor and the binding site are represented as binary strings. A perfect match results in the maximum level of expression (but not necessarily the highest fitness).

Model

We model a regulatory interaction between two unlinked diploid loci, where the first locus encodes a mutable TF and the second locus encodes a protein whose expression level determines the organismal phenotype of interest. Expression of the second locus is activated by binding of the TF to a mutable binding site in its regulatory region (Figure 1). Information contained in the alleles of the TF and the binding site is represented using sequences of n = 12 bits, where the constant n is the length of the binding motif. The number of matched bits represents the fit between the binding site’s nucleotide sequence and the TF’s binding motif, which in combination with bioenergetic parameters determines the level of expression and the resulting phenotype.
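The bit-string representation reduces the fit between a TF and its binding site to a single count, m. A minimal Python sketch of that count follows; the helper names are hypothetical, and it assumes that a "match" means equal bit values at a position (a complementary convention would count mismatches identically after flipping one string).

```python
# Hypothetical helpers (not from the article): alleles are stored as
# n-bit integers and a "match" is taken to mean equal bits.
N_BITS = 12  # motif length n used throughout the article


def mismatches(tf_allele: int, site_allele: int, n: int = N_BITS) -> int:
    """Count positions where the TF motif and the binding site differ (m)."""
    mask = (1 << n) - 1
    return bin((tf_allele ^ site_allele) & mask).count("1")


def flip_bit(allele: int, position: int) -> int:
    """A single mutation changes exactly one bit of information."""
    return allele ^ (1 << position)


tf = 0b101100111000
site = tf                                  # perfectly matched pair, m = 0
print(mismatches(tf, site))
print(mismatches(tf, flip_bit(site, 3)))   # one substitution, m = 1
```

XOR marks the differing positions and the popcount totals them, so m runs from 0 (perfect match) to n (every bit mismatched).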

Fractional occupancy

Expression is calculated from the fractional occupancy of the TF on its target binding site (the proportion of time the TF is bound) following the statistical physics models of Von Hippel and Berg (1986), Gerland et al. (2002), and Mustonen et al. (2008), which model fractional occupancy as a dynamic equilibrium determined by the free energy of binding and the availability of TF molecules in the cell, and take into account the possibility of spurious occupancy of the TF on nontarget sites. Transcription factor molecules may exist freely in solution or in association with the genome, where they may be bound at any position with a sequence-dependent probability. An individual binding site’s sequence distinguishes it from the rest of the genome and determines the relative affinity for that site by a given TF molecule. Following Gerland et al. (2002), we focus on a single functional binding site and treat the rest of the genome as random sequence that may contain whole or partial instances of the binding motif. Attraction to the genomic background and the rest of the intracellular environment is treated as nonspecific binding with no functional consequence except to decrease the fractional occupancy at the focal site.

Equilibrium fractional occupancy, θ, at the focal binding site is determined by how well its sequence matches the TF motif vs. the extent of nonspecific binding. This is expressed in terms of bioenergetic parameters as

$$\theta = \frac{N_{TF}}{N_{TF} + e^{\Delta G_{TF} - \Delta G_b}} \qquad (1)$$

where NTF is the number of molecules of the transcription factor in the cell, ΔGTF is the free energy of binding of the TF to the focal site, and ΔGb is the combined free energy of formation for all binding interactions with the nonspecific background. (ΔGTF and ΔGb are in units of kbT, the Boltzmann constant multiplied by temperature, and by definition take negative values.) To account for the effect of mutation on fractional occupancy, we express the effect of the fit between a TF and its target cis-regulatory site on binding energy as ΔGTF = (n − m)ΔG1 = ΔGmatch − mΔG1, where m is the number of mismatched bits between the TF and its target site, ΔG1 is the contribution of a single bit of information to the free energy of formation, and ΔGmatch = nΔG1 is the free energy of binding of a perfectly matched TF–focal site pair. Each matched bit increases the negative magnitude of the free energy of formation, increasing the fractional occupancy. Implicit in the ΔG1 parameter is the simplifying assumption that each position in the TF–binding site interaction has an equal and additive effect on binding energy (Von Hippel and Berg 1986; Gerland et al. 2002; see also Khatri et al. 2009).

Following Gerland et al. (2002), we define a heuristic parameter describing a relationship between two causal free-energy parameters. Ediff = ΔGb – ΔGmatch is the difference in free energies of binding between the nonspecific background and a perfectly matched TF, with its sign chosen such that Ediff < 0 when nonspecific sites are collectively more attractive than the best-matched focal site. (Gerland et al. 2002 show that under plausible regulatory conditions, Ediff ≈ 0 is expected to yield biologically realistic levels of transcriptional control and that empirical estimates are consistent with that or slightly lower.) This allows us to cast fractional occupancy in terms of mismatches, the free-energy contribution of a single match, and the relative attractiveness of the nonspecific background, as

$$\theta = \frac{N_{TF}}{N_{TF} + e^{-m\,\Delta G_1 - E_{diff}}} \qquad (2)$$
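The step from Equation 1 to Equation 2 is a direct substitution of ΔGTF = ΔGmatch − mΔG1 and Ediff = ΔGb − ΔGmatch into the exponent:

```latex
\Delta G_{TF} - \Delta G_b
  = (\Delta G_{match} - m\,\Delta G_1) - \Delta G_b
  = -m\,\Delta G_1 - (\Delta G_b - \Delta G_{match})
  = -m\,\Delta G_1 - E_{diff}

\Rightarrow\quad
\theta = \frac{N_{TF}}{N_{TF} + e^{-m\,\Delta G_1 - E_{diff}}}
```

Because ΔG1 < 0, each additional mismatch raises the exponent by |ΔG1| and lowers θ, while a more negative Ediff (a more attractive background) also lowers θ.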

To model the phenotypes of diploid organisms, we need to take into account dominance between allelic forms of the TF. Dominance effects emerge from allelic differences in fractional occupancy and are therefore sensitive to bioenergetic parameters. They play a role in the degree of hybrid incompatibility seen in F1 vs. F2 crosses. We model dominance by treating each TF as a competitive inhibitor of the other, reciprocally reducing their fractional occupancies, and scaling the term e^(−mΔG1 − Ediff) accordingly. Following Michaelis and Menten (1913) (Supporting Information, File S1), the fractional occupancy of each allelic form of the TF on its target binding site is reduced from its haploid level to

$$\theta' = \frac{N_{TF}}{N_{TF} + \alpha\, e^{-m\,\Delta G_1 - E_{diff}}} \qquad (3)$$

where α = 1 + NTFc e^(mcΔG1 + Ediff,c). NTF is the number of molecules of the TF allele of interest, NTFc is the number of molecules of the competing TF allele, mc is the number of mismatches in the competing TF, and Ediff,c = ΔGb,c − ΔGmatch scales the net free energy of formation of nonspecific binding of the competing TF to that of its own perfectly matched, specific binding site. Allelic variation may occur at both TF and binding-site loci, and θ′ is calculated separately for each of the four possible allelic interactions. We used NTFc = NTF and Ediff,c = Ediff throughout, tantamount to the assumptions that each TF allele is equally expressed and nonspecific binding is equal among alleles.
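Equations 2 and 3 translate directly into two small functions. The Python sketch below is illustrative rather than the authors' code; the function names are hypothetical, energies are in kbT, and the defaults follow the article's choices NTF = NTFc = 100 and Ediff,c = Ediff.

```python
import math


def theta(m, dG1, e_diff, n_tf=100.0):
    """Equation 2: fractional occupancy of a single TF with m mismatched bits."""
    return n_tf / (n_tf + math.exp(-m * dG1 - e_diff))


def theta_prime(m, m_c, dG1, e_diff, n_tf=100.0, n_tf_c=100.0, e_diff_c=None):
    """Equation 3: occupancy of one TF allele competing with a second allele (m_c mismatches)."""
    if e_diff_c is None:
        e_diff_c = e_diff  # the article uses E_diff,c = E_diff throughout
    alpha = 1.0 + n_tf_c * math.exp(m_c * dG1 + e_diff_c)
    return n_tf / (n_tf + alpha * math.exp(-m * dG1 - e_diff))


dG1, e_diff = -0.610, -1.0
print(theta(0, dG1, e_diff), theta(1, dG1, e_diff))
print(theta_prime(0, 0, dG1, e_diff))  # each of two identical, perfectly matched alleles
```

Setting mc very large removes the competitor (α → 1) and recovers Equation 2. With two identical, perfectly matched alleles, each allele's occupancy is NTF/(2NTF + e^(−Ediff)), so together they occupy the site only slightly more than a single allele would.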

Phenotype

We treat the organismal phenotype, P, as equivalent to the level of expression. At each allele of the cis-regulated locus, expression of the trait proceeds if either TF allele is bound; thus, the rate of expression, r, at each equals kΣθ′, the sum of the competition-adjusted fractional occupancies of the TF alleles multiplied by a proportionality constant k. The maximum total fractional occupancy, θmax, occurs in perfectly matched (m = 0) homozygotes. Expression is assumed to proceed independently from the two binding sites, and, keeping all other cellular processes constant, the final level of expression is proportional to the expression rate (Gertz et al. 2009). Thus, we calculate the phenotype P = Σr, the sum of the expression rates of each allele of the cis-regulated locus. To place P on a unit scale, we set k = 1/θmax such that P = 1 when m = 0 at all four allelic interactions. If the transcription factor were a repressor rather than an activator, P would instead equal (1 − Σr). In the simple two-locus case, this would not change the results except for reversing the effect of expression level, so we do not investigate repressors further here.
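The phenotype calculation can be sketched by summing Equation 3 over the four TF–site pairs of a diploid genotype and dividing by the same sum for a perfectly matched homozygote (k = 1/θmax). This Python sketch is not the authors' code: genotypes are hypothetical tuples of 12-bit integers, a match is taken to mean equal bits, and NTFc = NTF and Ediff,c = Ediff as stated above.

```python
import math

N_BITS = 12


def mismatches(a, b, n=N_BITS):
    return bin((a ^ b) & ((1 << n) - 1)).count("1")


def theta_prime(m, m_c, dG1, e_diff, n_tf=100.0):
    # Equation 3 with N_TFc = N_TF and E_diff,c = E_diff, as in the article
    alpha = 1.0 + n_tf * math.exp(m_c * dG1 + e_diff)
    return n_tf / (n_tf + alpha * math.exp(-m * dG1 - e_diff))


def total_occupancy(tfs, sites, dG1, e_diff):
    """Sum of competition-adjusted occupancies over the four TF-site pairs."""
    t1, t2 = tfs
    total = 0.0
    for s in sites:  # each binding-site allele is expressed independently
        m1, m2 = mismatches(t1, s), mismatches(t2, s)
        total += theta_prime(m1, m2, dG1, e_diff) + theta_prime(m2, m1, dG1, e_diff)
    return total


def phenotype(tfs, sites, dG1, e_diff):
    """P = sum of expression rates, scaled by k = 1/theta_max so that P = 1 at m = 0."""
    theta_max = total_occupancy((0, 0), (0, 0), dG1, e_diff)  # perfectly matched homozygote
    return total_occupancy(tfs, sites, dG1, e_diff) / theta_max


dG1, e_diff = -0.610, -1.0
print(phenotype((0, 0), (0, 0), dG1, e_diff))            # perfect match
print(phenotype((0, 0xFFF), (0, 0), dG1, e_diff))        # one useless TF allele
print(phenotype((0, 0), (0xFFF, 0xFFF), dG1, e_diff))    # both sites fully mismatched
```

With these parameter values, a genotype carrying one perfectly matched and one completely mismatched TF allele still expresses close to P = 1, which is the dominance of high expression discussed in the Results.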

The genotype–phenotype map

The genotype–phenotype (G–P) map comprises the set of rules governing the translation of genetic information into the realized phenotype (Travisano and Shaw 2012). When those rules follow the bioenergetic principles embodied by Equation 3, the G–P map takes a general sigmoid form (Figure 2), with the details of the shape and position determined by the three bioenergetic parameters, ΔG1, Ediff, and NTF. Once a G–P map’s shape is established by assigning values to those parameters, the phenotype it produces depends on the number of mismatches, m, in the binding motif. Figure S1 shows the effects on the phenotype of varying each of these parameters independently. Increasing ΔG1 in negative magnitude (free energies are by definition negative) increases the slope at the inflection point and shifts the overall curve to the left (Figure S1A). Increasing Ediff decreases the attraction of the TF to nonspecific targets and therefore increases its availability at the binding site. This makes the binding interaction more tolerant to mismatches, which shifts the curve to the right without changing the slope at the inflection point (Figure S1B). It follows from Equation 2 that an exponential increase in NTF has the same effect as a linear increase in Ediff (Figure S1C), because θ depends on these two parameters only through ln NTF + Ediff; the outcomes are isomorphic. In our analyses, we held NTF constant and varied Ediff instead. The lateral shifts induced by varying the bioenergetic parameters one at a time also imply that, for a constant motif length n, the minimum phenotype differs and in some cases considerable gene expression can proceed even when m = n (Figure S1). To explore the effects of variation in the shape of the G–P map while constraining it to its biologically relevant range, such that P ≈ 0 when m = n, it is necessary to vary ΔG1 and Ediff concurrently (Figure 2A).
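The isomorphism between NTF and Ediff can be checked numerically from Equation 2: multiplying NTF by e^c gives exactly the same occupancy as adding c to Ediff. The minimal Python sketch below uses the ΔG1 = −0.610 kbT, Ediff = −1.0 kbT pair from the Simulations section and an arbitrary shift c = 1.5 chosen only for illustration; it also confirms that occupancy falls monotonically as mismatches accumulate.

```python
import math


def theta(m, dG1, e_diff, n_tf):
    """Equation 2: single-TF fractional occupancy with m mismatched bits."""
    return n_tf / (n_tf + math.exp(-m * dG1 - e_diff))


DG1, E_DIFF, N_TF = -0.610, -1.0, 100.0
C = 1.5  # arbitrary shift used only for this check

for m in range(13):
    scaled_n = theta(m, DG1, E_DIFF, N_TF * math.exp(C))  # N_TF multiplied by e^C
    shifted_e = theta(m, DG1, E_DIFF + C, N_TF)           # E_diff increased by C
    assert math.isclose(scaled_n, shifted_e)

# Occupancy decreases monotonically with m, tracing the sigmoid G-P map
curve = [theta(m, DG1, E_DIFF, N_TF) for m in range(13)]
assert all(a > b for a, b in zip(curve, curve[1:]))
print([round(x, 3) for x in curve])
```

Because only the combination ln NTF + Ediff enters θ, holding NTF fixed and varying Ediff explores the same family of curves.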

Figure 2.

Effects of the bioenergetic parameters ΔG1 and Ediff on the genotype–phenotype (G–P) map and the corresponding fitness landscape. ΔG1 and Ediff are in units of kbT. (A) Effect on the G–P map. Horizontal axis: the number of mismatched bits between the binding site and the transcription factor’s binding motif. Vertical axis: the phenotype, which in this case is the expression level normalized to a scale of zero to one. ΔG1 values, in steps of 0.0825 kbT, were chosen for each Ediff so as to hold constant the expression produced by n = 12 substitutions. (See Figure S1 for the independent effects of these parameters.) (B) The fitness landscapes for the bioenergetic parameter combinations in A, in this case with environmental fitness parameters set to Popt = 1 and σs = 0.05. Horizontal axis: Δ mismatches is the number of mismatches responsible for the difference between an individual’s phenotype P and the optimal phenotype Popt. The bioenergetic parameters that determine expression level, and therefore the transition from genotype to phenotype, extend further to drive the relationship between genotype and fitness under a given environmental selection regime.

It is useful to adopt a heuristic shorthand to describe qualitative differences between the bioenergetically determined shapes. We use the slope near the top of the G–P map curve, where P = 1, unless otherwise specified. We therefore refer to the “slope” of the G–P map as the difference between phenotypes at m = 0 and m = 1 (Figure 2A), i.e., the phenotypic effect of the first mismatch, such that G–P maps with shallower slopes are broader and flatter at the top. We use the same language to compare fitness landscapes, below.

Fitness and the fitness landscape

Fitness is a function of an organism’s deviation from the optimal expression level, as per Johnson and Porter (2000, 2007),

$$W = \exp\left[-\frac{(P - P_{opt})^2}{2\sigma_s^2}\right] \qquad (4)$$

where Popt is the optimal expression level as determined by the environment, and σs² is the variance of the fitness function around the optimum, a measure of the degree to which a suboptimal phenotype is tolerated by selection.
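Equation 4 is a Gaussian in the phenotype. In this Python sketch the defaults Popt = 1 and σs = 0.05 are the values used in the article; a phenotype one σs below the optimum retains exp(−1/2), about 61% of maximal fitness, and one two σs below retains exp(−2), about 14%.

```python
import math


def fitness(p, p_opt=1.0, sigma_s=0.05):
    """Equation 4: Gaussian viability fitness of phenotype p around the optimum p_opt."""
    return math.exp(-((p - p_opt) ** 2) / (2.0 * sigma_s ** 2))


for p in (1.0, 0.95, 0.90, 0.80):
    print(p, round(fitness(p), 4))
```

Because W depends only on P, any genotype change that leaves P unchanged is neutral; this is why the G–P map, and hence the bioenergetic parameters, fix the shape of the fitness landscape.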

The fitness landscape describes the relationship between genotype and fitness across a spectrum of environments (Svensson and Calsbeek 2012). Inasmuch as an organism’s fitness depends on its phenotype, the fitness landscape is mediated through the G–P map and, ultimately, the bioenergetic properties of the molecules that underlie this map (Watt and Dean 2000; Watt et al. 2003; Watt 2013). Figure 2B shows the fitness landscapes corresponding to the G–P maps of Figure 2A when the environmentally determined fitness parameters of Equation 4 are set to Popt = 1 and σs = 0.05. Changes in the bioenergetic parameters have the same qualitative effects on the slope of the fitness landscape as they have on the G–P map; the fitness landscapes around other values of Popt share that isomorphism. Fitness landscapes with flatter slopes (such as the top curve in Figure 2B) can result from G–P maps with flatter slopes under a constant value of σs, or from a G–P map with a steeper slope in a more permissive environment, that is, a higher value of σs. Because of this relationship, we did not study the effect of varying σs alone.

Simulations

We used simulations to examine misregulation of the hybrid phenotype in cases where two allopatric populations of diploids were subject to either parallel directional or parallel stabilizing selection for high gene expression. Although divergent selection would also result in HI under our model, we focus on parallel selection to study incompatibility resulting from gene interactions alone, excluding the effect of adaptation to diverging environments. We study selection to intermediate gene expression in File S1.

Each generation consisted of viability selection, using a fitness function with a standard deviation (σs) of 0.05, followed by random mating. Population size was kept constant each generation, with no overlap of generations. Mutations occurred in the offspring at a rate of 0.001 per locus, with each mutation changing one bit of information. A high mutation rate was chosen so that populations would be able to respond reliably to selection. Varying the mutation rate should have little effect on hybrid incompatibility under directional selection as long as populations are large enough that sufficient new mutations are available for selection to act upon (Johnson and Porter 2000). Under stabilizing selection, the mutation rate is expected to affect the rate of divergence due to drift, but since this effect is relative to population size and divergence time, we do not test it separately here.
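The generation cycle described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of the authors' C program: viability selection followed by random mating is approximated by drawing parents with probability proportional to Equation 4 fitness (a common Wright–Fisher shortcut), the 0.001 per-locus rate is applied independently to each allele copy, and the genotype encoding and helper names are hypothetical.

```python
import math
import random

N_BITS = 12
MU = 0.001                  # mutation rate per locus (applied per allele copy: an assumption)
SIGMA_S = 0.05              # width of the fitness function, Equation 4
DG1, E_DIFF = -0.610, -1.0  # one parameter pair used in the article (kbT)


def mismatches(a, b):
    return bin((a ^ b) & ((1 << N_BITS) - 1)).count("1")


def theta_prime(m, m_c, n_tf=100.0):
    # Equation 3 with N_TFc = N_TF and E_diff,c = E_diff
    alpha = 1.0 + n_tf * math.exp(m_c * DG1 + E_DIFF)
    return n_tf / (n_tf + alpha * math.exp(-m * DG1 - E_DIFF))


def occupancy(ind):
    (t1, t2), sites = ind
    return sum(theta_prime(mismatches(a, s), mismatches(b, s))
               for s in sites for a, b in ((t1, t2), (t2, t1)))


THETA_MAX = occupancy(((0, 0), (0, 0)))  # perfectly matched homozygote


def phenotype(ind):
    return occupancy(ind) / THETA_MAX


def mutate(allele, rng):
    if rng.random() < MU:
        allele ^= 1 << rng.randrange(N_BITS)  # a mutation flips one bit
    return allele


def next_generation(pop, p_opt, rng):
    # Viability selection approximated by fitness-weighted choice of parents
    w = [math.exp(-(phenotype(i) - p_opt) ** 2 / (2 * SIGMA_S ** 2)) for i in pop]
    offspring = []
    for _ in range(len(pop)):  # constant population size, no overlapping generations
        mom, dad = rng.choices(pop, weights=w, k=2)  # selfing is not excluded here
        # Unlinked loci: each gamete carries one random TF and one random site allele
        tf_m, site_m = rng.choice(mom[0]), rng.choice(mom[1])
        tf_d, site_d = rng.choice(dad[0]), rng.choice(dad[1])
        # Mutation occurs in the offspring
        offspring.append(((mutate(tf_m, rng), mutate(tf_d, rng)),
                          (mutate(site_m, rng), mutate(site_d, rng))))
    return offspring


rng = random.Random(1)
pop = [((0, 0), (0, 0)) for _ in range(50)]  # start at the initial optimum P = 1
for _ in range(20):
    pop = next_generation(pop, p_opt=1.0, rng=rng)
print(len(pop), sum(map(phenotype, pop)) / len(pop))
```

Directional selection would be modeled by increasing p_opt by ΔPopt each generation; the sketch holds it fixed (stabilizing selection) for brevity.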

We used n = 12 bits and σs = 0.05 throughout the article because these values permit adequate response to selection under the range of bioenergetic parameters and population sizes we studied. The sensitivity of the results to σs also depends on the mutation effect size (Johnson and Porter 2000), which, for a specified set of parameter values, is reflected in the number of bits in the binding motif linking the TF to its binding site. Tulchinsky (2013) found that doubling n (and simultaneously halving ΔG1 to restore the sigmoid shape of the G–P map) had negligible effect on the results in the two-locus interaction, so we do not report those results here.

To assess misregulation of the hybrid phenotype for each replicate, we randomly created 50 F1 hybrids, and F2 hybrids from those (without viability selection on the F1’s), and calculated total hybrid misregulation as the mean absolute deviation from the optimal parental phenotype, Popt. This total misregulation includes a contribution from residual polymorphism in the parental populations, so we calculated net hybrid misregulation by subtracting the mean parental deviation. Although the evolution of net hybrid misregulation is the central issue in explaining the evolution of BDM incompatibilities, we also report the fitness of hybrids under the assumption that they experience the same viability selection (Equation 4) as do the parental populations. Results are based on 200 replicates of the simulation. To compare stabilizing selection to directional selection, the optimal phenotype of the conserved trait was held constant at a value of Popt = 1.0 for 40,000 generations and that of the directionally selected trait changed at a rate of ΔPopt = 1/40,000, from an initial value near zero (corresponding to 12 mismatches) to a final value of 1.0 (corresponding to zero mismatches) over the same time period. (Sensitivity analyses in Tulchinsky 2013 indicate that the choice of ΔPopt has a small effect unless it is large enough relative to σs and the mutation rate that it drives the population toward extinction.) In all cases, the initial genotypes of all individuals were set to identical values such that each phenotype started at the initial optimum.
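The hybrid assay can be sketched the same way. In this hypothetical Python example, F1s are formed from random pairs of parents drawn from the two populations, F2s from random pairs of F1s without selection, and net misregulation is the mean absolute deviation from Popt minus the mean parental deviation (averaged over the two parental populations here, an assumption). The toy parental populations are fixed for opposite but perfectly matched TF and binding-site sequences, so both express at P = 1.

```python
import math
import random

# Hypothetical illustration; helper names are not from the article.
N_BITS = 12
DG1, E_DIFF = -0.610, -1.0


def mismatches(a, b):
    return bin((a ^ b) & ((1 << N_BITS) - 1)).count("1")


def theta_prime(m, m_c, n_tf=100.0):
    alpha = 1.0 + n_tf * math.exp(m_c * DG1 + E_DIFF)  # Equation 3
    return n_tf / (n_tf + alpha * math.exp(-m * DG1 - E_DIFF))


def phenotype(ind):
    def occ(i):
        (t1, t2), sites = i
        return sum(theta_prime(mismatches(a, s), mismatches(b, s))
                   for s in sites for a, b in ((t1, t2), (t2, t1)))
    return occ(ind) / occ(((0, 0), (0, 0)))


def cross(mom, dad, rng):
    # Unlinked loci: one TF and one site allele from each parent, chosen independently
    return ((rng.choice(mom[0]), rng.choice(dad[0])),
            (rng.choice(mom[1]), rng.choice(dad[1])))


def mean_dev(pop, p_opt):
    return sum(abs(phenotype(i) - p_opt) for i in pop) / len(pop)


def net_misregulation(pop_a, pop_b, p_opt, rng, n_hybrids=50):
    f1 = [cross(rng.choice(pop_a), rng.choice(pop_b), rng) for _ in range(n_hybrids)]
    f2 = [cross(rng.choice(f1), rng.choice(f1), rng) for _ in range(n_hybrids)]  # no selection on F1
    parental = (mean_dev(pop_a, p_opt) + mean_dev(pop_b, p_opt)) / 2
    return mean_dev(f1, p_opt) - parental, mean_dev(f2, p_opt) - parental


# Two parental populations fixed for different, equally good solutions (P = 1)
pop_a = [((0, 0), (0, 0))] * 10
pop_b = [((0xFFF, 0xFFF), (0xFFF, 0xFFF))] * 10
f1_net, f2_net = net_misregulation(pop_a, pop_b, 1.0, random.Random(0))
print(round(f1_net, 3), round(f2_net, 3))
```

Because high expression is dominant at these parameter values, every F1 stays close to the optimum, whereas F2 recombinants that pair a TF with a mismatched binding site are strongly misregulated, the BDM pattern described in the text.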

We tested the effect of the fitness landscape on evolutionary outcomes by varying the bioenergetic parameters that specify the G–P map. For regulatory interactions under directional selection, we performed a series of simulations to test the effect of genetic divergence and the slope of the genotype–phenotype map. (Further tests of the separate effects of ΔG1 and Ediff can be found in Tulchinsky 2013.) To test the effect of genetic divergence on hybrid incompatibility while holding the G–P map constant, we held NTF at 100, ΔG1 at −0.610 kbT, and Ediff at −1.0 kbT (corresponding to the second curve from the left in Figure 2A) and varied the starting phenotype so that between one and six substitutions would be required in each parent population to reach the final optimal phenotype after 4000 generations. To test the effect of the slope of the G–P map and fitness landscape while holding genetic divergence constant, we evolved populations toward a phenotype of 1.0, held NTF at 100, and adjusted ΔG1 and Ediff to vary the slope at the final optimal phenotype while ensuring that exactly n = 12 substitutions separated the initial and final optima (Figure 2A). The slope of the G–P map at the endpoint phenotype of Popt = 1.0, represented by the phenotypic difference between zero and one mismatches in Figure 2A, is related to our parameter values as follows: for slope 0.002, ΔG1 = −0.858 and Ediff = 2.0; for slope 0.004, ΔG1 = −0.774 and Ediff = 1.0; for slope 0.010, ΔG1 = −0.692 and Ediff = 0.0; for slope 0.022, ΔG1 = −0.610 and Ediff = −1.0; for slope 0.046, ΔG1 = −0.530 and Ediff = −2.0. Under stabilizing selection, the genotypic basis of the regulatory interaction can evolve when one locus compensates in response to a weakly deleterious mutation in the other. We examined the effect on this compensatory evolution of population size crossed with the slope of the G–P map.
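The slope values quoted above can be checked against the occupancy formula. The Python sketch below assumes, as our interpretation rather than a statement in the text, that the slope is computed from the single-TF occupancy of Equation 2 with NTF = 100, normalized so that expression at m = 0 equals 1. Under that assumption the five parameter pairs give first-mismatch effects that round to the listed slopes, and normalized expression at m = 12 stays near 0.024 to 0.025 for all five, consistent with ΔG1 having been chosen to hold the 12-substitution expression level constant.

```python
import math

N_TF = 100.0
# (slope quoted in the text, dG1, E_diff), energies in kbT
PARAMS = [
    (0.002, -0.858, 2.0),
    (0.004, -0.774, 1.0),
    (0.010, -0.692, 0.0),
    (0.022, -0.610, -1.0),
    (0.046, -0.530, -2.0),
]


def theta(m, dG1, e_diff, n_tf=N_TF):
    """Equation 2: single-TF fractional occupancy with m mismatches."""
    return n_tf / (n_tf + math.exp(-m * dG1 - e_diff))


def slope_at_top(dG1, e_diff):
    """Phenotypic effect of the first mismatch, with expression normalized to 1 at m = 0."""
    return 1.0 - theta(1, dG1, e_diff) / theta(0, dG1, e_diff)


def floor_expression(dG1, e_diff, n=12):
    """Normalized expression when all n bits are mismatched."""
    return theta(n, dG1, e_diff) / theta(0, dG1, e_diff)


for quoted, dG1, e_diff in PARAMS:
    print(f"dG1={dG1:+.3f} Ediff={e_diff:+.1f} "
          f"slope={slope_at_top(dG1, e_diff):.4f} (quoted {quoted}) "
          f"P(m=12)={floor_expression(dG1, e_diff):.3f}")
```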

Dominance is inherent to bioenergetic models, and it allows for the possibility that when parental populations have the same optimal phenotypes, F1 hybrids may show less net misregulation than F2’s in proportion to the level of dominance. To examine the effect of allelic dominance, we manipulated competitive binding between alleles by changing the availability of TFs at their binding sites. We express the degree of dominance as the effect that removing one of the two allele copies of the TF would produce on an individual’s phenotype. We held NTF at 100 and stepped Ediff from −7.5 kbT to −1.0 kbT in increments of 0.5 kbT. At each value of Ediff, the value of ΔG1 was chosen such that exactly n = 12 substitutions separated ancestral from derived phenotypes. These parameter settings vary the degree of dominance in the following way. At Ediff = −7.5 kbT, the attraction of the TF to the nonspecific background is high, so its availability at the binding site is low and competitive binding between alleles is negligible. A complete loss of binding of one TF allele reduces expression by an average of 49%, so at these parameter values, strongly and weakly binding TF alleles approach codominance. At Ediff = −1.0 kbT, the attraction of the TF to the nonspecific background is low, an excess is available at the binding site, competitive binding is high, and the stronger-binding TF approaches complete dominance. At these parameter values, a complete loss of binding of one TF allele reduces expression by only 3% (Equation 2). These simulations were run at parental population sizes of 400 for t = 4000 generations, such that ΔPopt = 1/4000. In File S1, we also study the extent to which misregulation apparent in the F2 generation remains cryptic in F1 hybrids, obscured by regulatory combinations from parental genotypes.
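The dominance measure can be illustrated for the simplest case, a perfectly matched homozygote that loses one TF allele. At m = 0 the exponent in Equations 2 and 3 reduces to −Ediff, so ΔG1 drops out and only Ediff and NTF = 100 matter. For this single genotype the Python sketch gives a loss of roughly 47% at Ediff = −7.5 kbT and roughly 1.3% at Ediff = −1.0 kbT; the 49% and 3% quoted in the text are averages and need not match this one-genotype calculation exactly. The function name is hypothetical.

```python
import math

N_TF = 100.0


def dominance_loss(e_diff, n_tf=N_TF):
    """Fractional drop in expression when a perfectly matched homozygote loses one TF allele.

    At m = 0 the exponent reduces to -E_diff, so dG1 drops out.
    """
    k0 = math.exp(-e_diff)
    both = 2 * n_tf / (2 * n_tf + k0)  # two identical competing alleles (Equation 3, summed)
    one = n_tf / (n_tf + k0)           # single remaining allele, no competitor (Equation 2)
    return 1.0 - one / both


for e_diff in (-7.5, -1.0):
    print(e_diff, round(dominance_loss(e_diff), 3))
```

A strongly attractive background (very negative Ediff) leaves the site mostly empty, so each allele contributes almost additively (codominance); a weak background saturates the site, so one good allele suffices (dominance).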

All simulations were written in C in the Xcode environment and run under the Mac OS X operating system, and graphics were produced using the statistical package R.

Results

Under all parameter sets, each parent population tracked the optimal phenotype so that the final deviation from the optimum was <0.05 in at least 99.5% of replicates. HI occurred in the F1 or F2 generation under both directional and stabilizing selection, but was more severe and evolved in less time under directional selection. Residual polymorphism remained in the parental populations at the end of each run, but contributed only a very small amount to misregulation in hybrids (File S1). We found that the bioenergetic parameters affected the extent of HI differently depending on the type of selection.

Type of selection

Under parallel directional selection to new optimal phenotypes, how much the hybrid genotype deviated, on average, from the optimal number of TF-to-binding-site matches depended on how far genotypes of the parent populations had diverged from one another, which in turn depended on the number of substitutions separating ancestral from derived phenotypes. The shape of the G–P map determined the phenotypic and fitness effects of too few or too many matches in the hybrid. The effect of changing the slope of the G–P map and fitness landscape can be seen in Figure 3, open bars. In this case, the ancestral and derived phenotypes were kept constant and the G–P map varied as shown in Figure 2A. Median HI increased with increasing slope around the derived phenotype of 1.0. HI occurred primarily in the F2 generation, because the parameters that determine the shape of the G–P map in our model also determine the dominance of high expression (see dominance results below).

Figure 3.

The effects of the slope of the G–P map (phenotypic effect of one mutation) and population size on median F2 misregulation and corresponding fitness. Open boxes correspond to directional selection from minimal expression (12 mismatches) toward maximal expression (zero mismatches). Shaded boxes correspond to stabilizing selection at maximal expression (zero mismatches). Box plots show median, quartiles, and full ranges. Population size has no effect under directional selection, but smaller populations are more likely to evolve misregulation under stabilizing selection. The slope of the G–P map is represented by the effect of one mutation, which here is the phenotypic difference between a genotype with zero mismatches and a genotype with one mismatch (visible in Figure 2A). Steeper slopes increase hybrid misregulation under directional selection, but decrease misregulation under stabilizing selection. The effect is the same for directional selection toward intermediate expression (not shown). Hybrid fitness follows Equation 4.

Under stabilizing selection, HI can evolve as a consequence of compensatory evolution once a deleterious mutation at either locus becomes common. We found that HI evolved only when the population size was low and the phenotypic effect of a single mutation was small (Figure 3). The latter was possible only when the phenotype was conserved at high or low expression, because the phenotypic and fitness effects of a single mismatch are strongest at intermediate expression (Figure 2). If the effect of a single mutation was too large or the population size was relatively large, parental genotypes did not diverge enough for HI to evolve (see also Fierst and Hansen 2010), because nearly all mutations were eliminated by selection before a compensatory mutation could arise. Here again, HI occurred primarily in the F2 generation due to dominance in the F1.
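Why small populations favor this compensatory route can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, u = (1 − e^(−2s)) / (1 − e^(−4Ns)), used here as a rough proxy for a deleterious mutation becoming common. This textbook approximation is for illustration only; it is not the individual-based simulation behind Figure 3, and the values of s and N below are hypothetical.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation: new mutation at frequency 1/(2n), selection coefficient s."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    # expm1 keeps precision for small |s|; valid while 4*n*|s| stays below ~700.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability scaled by the neutral value 1/(2n)."""
    return fixation_probability(s, n) * 2 * n

if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, relative_to_neutral(-0.001, n))
```

For s = −0.001 the scaled probability is about 0.9 when N = 50 but below 10^−7 when N = 5000: in the small population a mildly deleterious mismatch behaves almost neutrally, leaving time for a compensatory mutation to arise.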

Direction of selection

In File S1, we report the effects of the direction of selection and evolutionary distance (the number of substitutions needed to reach the final phenotype on a given G–P map) on the evolution of HI. We find that selection from intermediate toward extreme phenotypic expression (from Popt = 0.5 to Popt = 1.0) yields much lower values of HI than does selection toward intermediate expression. This effect occurs because the slope of the G–P map at P = 0.5 is steeper, such that each mismatch has a greater effect on the hybrid phenotype and fitness than at P = 1.0. The absolute magnitude of the effect is higher over longer evolutionary distances, but is proportionally stronger over short evolutionary distances.
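Why each mismatch matters more near P = 0.5 than near P = 1.0 follows from differentiating a sigmoid G–P map. Assuming, for illustration, the two-state occupancy form P = NTF / (NTF + e^(mΔG1 − Ediff)) and treating the mismatch count m as continuous:

```latex
P(m) = \frac{N_{TF}}{N_{TF} + e^{m\,\Delta G_1 - E_{\mathrm{diff}}}}
     = \frac{1}{1 + e^{x}},
\qquad x = m\,\Delta G_1 - E_{\mathrm{diff}} - \ln N_{TF}

\frac{dP}{dm} = \frac{dP}{dx}\,\frac{dx}{dm}
  = -\frac{e^{x}}{\left(1 + e^{x}\right)^{2}}\,\Delta G_1
  = -\Delta G_1\, P\,(1 - P)

\left|\frac{dP}{dm}\right|_{P = 1/2} = \frac{\Delta G_1}{4},
\qquad
\left|\frac{dP}{dm}\right| \to 0 \quad \text{as } P \to 1
```

Under this assumed form, the per-mismatch effect is largest, ΔG1/4, at half-maximal expression and vanishes as expression saturates.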

Allelic dominance

In the above results, bioenergetic parameter settings were such that higher expression was dominant over lower expression (i.e., a single allele of the TF with good fit to the binding site was sufficient to drive expression). As dominance was decreased by altering those parameter values, selection for higher expression shifted median HI in the F1 upward to match that in the F2 (Figure 4). Under directional selection for reduced expression, dominance did not change the generation in which HI occurred (data not shown), because our model has no combination of bioenergetic parameter values under which low expression is dominant.

Figure 4.

Effect of allelic dominance on median net phenotypic misregulation in F1 (open bars) and F2 (shaded bars) crosses following directional selection from minimal expression (12 mismatches) toward maximal expression (0 mismatches). Dominance is manipulated by changing Ediff and ΔG1 values and expressed as the average effect on an individual’s phenotype of removing one of its TF allele copies. Dominance is highest on the left. Hybrid fitness follows Equation 4. Box plots show median, quartiles, and full ranges. See Model for details.

Under stabilizing selection for high expression, reducing the degree of dominance reduced HI in both F1 and F2 generations because compensatory evolution did not occur when the mutation effect size was too large. For the F2 generation, this can be seen indirectly in Figure 3, and results are comparable in the F1 (not shown). A property of the bioenergetic model is that varying the parameters Ediff and ΔG1 to modify dominance also changes the slopes of the G–P map and fitness landscape (Figure 2). In equivalent biological terms, decreasing the availability of the TF at the binding site decreases dominance and simultaneously increases the effect of a single mismatch on expression; that is, it increases the slope of the G–P map. Figure 3 (shaded boxes) shows the effect of the slope of the G–P map on HI under stabilizing selection for high expression; HI increases as the slope decreases. Dominance, in itself, is likely to enhance this effect, because a mildly deleterious mutation of a given effect size is more likely to reach high frequency if it is recessive.

Cryptic misregulation in F1 hybrids

Even when the phenotypes of F1’s tend to resemble their parents’ and HI is low, underlying regulatory divergence can still be detected if expression levels differ between orthologous combinations of TF/binding-site pairs (Wittkopp et al. 2004; Landry et al. 2005; Graze et al. 2012; Maheshwari and Barbash 2012). We find that the magnitude of these asymmetries depends on whether selection is toward the extreme or an intermediate phenotype (Figure S3): asymmetry is strongly constrained when evolving toward Popt = 1, and it occurs in F1 hybrids whether or not they exhibit HI. Asymmetry is strongest near P = 0.5 because mismatches there have the greatest effect on expression (Figure 2). These cryptic asymmetries in the F1 are manifested as HI in the F2 generation as parental combinations are dissociated (see Figure S2).

Discussion

Postzygotic isolation generally results from the accumulation of incompatibilities that arise as a consequence of genetic divergence (Orr 1995; Maheshwari and Barbash 2011; Nosil and Flaxman 2011). Transcriptional regulation contributes substantially to phenotypic divergence (Wray et al. 2003; Wittkopp and Kalay 2012), and abnormal expression levels observed in hybrids suggest that cis–trans regulatory interactions likely play a role in speciation (Ortíz-Barrientos et al. 2007). In addition to diversification of cis-regulatory regions (Wray 2007), evidence of TF sequence evolution that affects affinity for DNA binding sites is emerging from comparisons within and among TF gene families (Jovelin 2009; Nakagawa et al. 2013). Because phenotypic divergence often occurs under selection (Schluter 2009; Sobel et al. 2010), it is influenced by the adaptive landscape of the genes involved, including how those genes interact to produce the phenotype (Hansen and Wagner 2001; Gavrilets 2004; Palmer and Feldman 2009). The outcomes of those interactions, at the molecular level, are determined in turn by their local bioenergetic milieu. Thus, to understand at the molecular level how genetic incompatibility evolves between populations, we need a class of models that incorporate the relationship between genotype and phenotype in its bioenergetic context. To this end, we extended the gene-network speciation model of Johnson and Porter (2000, 2007) by incorporating an information-based statistical physics model of transcriptional regulation (Gerland et al. 2002; Mustonen et al. 2008). We show that the evolution of hybrid incompatibility depends on the bioenergetic properties of transcription factors and their binding sites.

The bioenergetic properties of TF–binding site interactions relate genotype to expression (Gertz et al. 2009; Segal and Widom 2009) and therefore affect the potential for adaptation and speciation by shaping the fitness landscape (Gavrilets 2004). Accordingly, we found that the degree to which a regulatory interaction under directional selection produced HI depended primarily on two factors: the number of substitutions separating ancestral and derived populations and the slope of the genotype–phenotype map and corresponding fitness landscape near the derived parental phenotype. HI was stronger when genetic divergence was high and when the G–P map around the derived phenotype was steep. Under stabilizing selection, HI occurred due to compensatory evolution, and its strength depended on the slope of the G–P map and population size. HI in a stabilized phenotype was stronger at low population sizes and, in contrast to the directional-selection case, when the G–P map around the phenotype was shallow. Similar population size effects under stabilizing selection appear in an analytical bioenergetic model (Khatri and Goldstein 2013).

Genetic divergence under selection

The number of substitutions separating ancestral and derived populations correlated with the genetic divergence between parent populations that evolved in parallel to new optimal phenotypes, and therefore with the severity of HI. With selection for increased expression, each newly evolved matching position in the TF–binding site interaction in one population had a 50% chance per bit of information of being mismatched with the corresponding position in the other population, because each population could create the match by changing either the TF or the binding site. Thus, increasing the genetic divergence accumulated by each evolving population increased the expected number of extraneous mismatches in the cross-species molecular interactions. Similarly, with selection for decreased expression, each favored substitution that produced a beneficial mismatch in a parent had the potential to produce a spurious match in the hybrid, through the recreation of ancestral matches (see Figure 5).
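The 50% figure can be checked with a small Monte Carlo sketch. It assumes that each position is a single bit, that a "match" means equal TF and binding-site bits, that the ancestor is mismatched at all 12 positions (minimal expression), and that each population repairs every position by flipping either the TF bit or the binding-site bit with equal probability. These encoding choices are simplifications for illustration, not the paper's simulation code.

```python
import random

def repair_all_mismatches(tf, site, rng):
    """Evolve one population: fix every mismatch by flipping the TF bit or the site bit (50:50)."""
    tf, site = list(tf), list(site)
    for i in range(len(tf)):
        if tf[i] != site[i]:
            if rng.random() < 0.5:
                tf[i] ^= 1
            else:
                site[i] ^= 1
    return tf, site

def mean_hybrid_mismatches(length=12, trials=20000, seed=1):
    """Average mismatches between population A's TF and population B's binding site."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        ancestor_tf = [rng.randint(0, 1) for _ in range(length)]
        ancestor_site = [1 - b for b in ancestor_tf]  # every position mismatched
        tf_a, _site_a = repair_all_mismatches(ancestor_tf, ancestor_site, rng)
        _tf_b, site_b = repair_all_mismatches(ancestor_tf, ancestor_site, rng)
        total += sum(a != b for a, b in zip(tf_a, site_b))
    return total / trials

if __name__ == "__main__":
    print(mean_hybrid_mismatches())  # close to 6 = 12 positions x 0.5
```

The hybrid pairing is mismatched at a position exactly when the two populations repaired it on different partners, so the mean approaches 12 × 0.5 = 6 even though both parents are perfectly matched.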

Figure 5.

(A) Evolution toward lower gene expression: each parent population evolves two mismatches between transcription factor and binding site in response to selection for reduced expression. F1 hybrids may have higher gene expression than either parent population due to reconstructed ancestral matches and accidental matches at derived sites. (B) Evolution toward higher gene expression: each parent evolves new matches between transcription factor and binding site in response to selection for increased expression. Incompatible alleles arise in the F1 because each parent evolved a different fit between transcription factor and binding site.

Our results differ from those of Johnson and Porter (2000), where HI evolved readily only when reduced phenotypic expression was favored. This is a consequence of how alleles were modeled. In that model, each interacting allele was represented as a vector of one or more real numbers, and binding strength decreased with the Euclidean distance between TF and binding site. Reduced binding could be produced by at least two incompatible evolutionary trajectories; e.g., with one-dimensional alleles, each ortholog could evolve in either of two directions that each increased its distance from its binding partner. However, increased binding could result only from reducing the Euclidean distance between alleles toward zero, so little incompatibility could arise between populations. Because alleles are represented in the bioenergetic model in terms of their information content as unique bit sequences (lacking directionality), HI evolved whether selection was for reduced or increased expression. In addition, the directionality of alleles in the Johnson and Porter (2000) model is responsible for their result that HI increased with the number of dimensions used to represent an allele. This outcome did not appear in the results of the bioenergetic model.

Studies of gene expression in hybrids often find asymmetric expression of parental orthologs (Wittkopp et al. 2004; Landry et al. 2005; Graze et al. 2012). This can be indicative of cryptic cis-by-trans regulatory divergence, as was recently revealed by asymmetric expression of the hybrid lethality gene Lhr in Drosophila (Maheshwari and Barbash 2012). Such regulatory divergence may produce asymmetry if, for example, spurious binding occurs between the TF of one species and the regulatory region from the other, but not the reverse. We found this phenomenon (Figure S3) if the relative amount of divergence in cis compared to trans differed between derived parental populations (Figure 6). Asymmetric expression occurred more frequently in our data than symmetric expression (data not shown), because for a given number of mismatches, there are more allelic states that produce asymmetry. Depending on dominance, the overall expression level in F1 hybrids may be similar to parental expression levels. As in Maheshwari and Barbash (2012), cryptic cis-by-trans divergence may be revealed in the F1 generation only by asymmetric expression.
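Allele-specific expression in an F1 can be sketched by letting both TF alleles compete for each binding-site ortholog. The Python below assumes a two-state occupancy extended to competitive binding, with the TF pool split evenly between the two alleles; the functional form, parameter values, and mismatch counts are hypothetical.

```python
import math

def site_occupancy(mismatch_by_tf, n_tf, e_diff, dg1):
    """Occupancy of one binding-site allele competed for by several TF alleles.

    mismatch_by_tf[i] is the mismatch count between this site and TF allele i;
    TF molecules are assumed to be split evenly among the alleles present.
    """
    per_allele = n_tf / len(mismatch_by_tf)
    weights = [per_allele * math.exp(e_diff - m * dg1) for m in mismatch_by_tf]
    return sum(weights) / (1 + sum(weights))

def f1_allele_specific(m, n_tf=100, e_diff=0.0, dg1=1.0):
    """Return occupancies of the species-1 and species-2 binding sites in an F1.

    m[i][j] is the mismatch count between the TF of species i+1 and the site of species j+1.
    """
    site1 = site_occupancy([m[0][0], m[1][0]], n_tf, e_diff, dg1)
    site2 = site_occupancy([m[0][1], m[1][1]], n_tf, e_diff, dg1)
    return site1, site2

if __name__ == "__main__":
    print(f1_allele_specific([[4, 5], [2, 4]]))  # asymmetric cis-by-trans divergence
    print(f1_allele_specific([[4, 3], [3, 4]]))  # symmetric divergence
```

With both parents at four mismatches, a hybrid in which TF2 makes more spurious matches with site 1 than TF1 makes with site 2 expresses the species-1 ortholog more strongly (about 0.88 vs. 0.56 occupancy), whereas equal cross mismatches give equal expression of both orthologs.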

Figure 6.

Asymmetric expression of parental orthologs in the F1 hybrid resulting from cis-by-trans regulatory divergence, when the relative amount of divergence in cis compared to trans differs between species. In the above example, divergence occurred under selection for reduced expression. More spurious matches resulted between the binding site of species 1 and the transcription factor of species 2 (top right interaction in the hybrid) than in the inverse interaction (bottom right). Depending on allelic dominance, the overall expression level in the F1 generation may be nearly the same in hybrids as in the parent species.

Effect of the fitness landscape

The fitness landscape of evolving populations is determined by the relationship between genotype and phenotype (the G–P map) and the relationship between phenotype and fitness (Gavrilets 2004), and thereby by the bioenergetic properties affecting gene expression. Because the phenotype-to-fitness relationship was invariant among our populations at any given time, variation in the fitness landscape in our simulations was determined solely by the G–P map and the underlying bioenergetic parameters that define it. Thus, we can consider the effect of the G–P map on adaptation in evolving populations, as well as on hybrid fitness directly.

The slope of the G–P map and fitness landscape around a given phenotype determines the amount of phenotypic change produced by an additional match (or mismatch) in the TF–binding site interaction. Thus, it determines the robustness of a TF–binding site interaction to mutation, as well as the effect of excess matches or mismatches in hybrids. Under directional selection, a steeper slope near the derived parental phenotype resulted in higher median HI because fewer incompatible positions are needed to produce a given level of misregulation in the hybrid (Figure 3). Likewise, if parent populations evolved toward a highly robust TF–binding site interaction, the result was less HI, because the interaction is also robust to incompatibilities in the hybrid.

The observation that hybrid misregulation occurs in conserved traits (True and Haag 2001) suggests that HI is possible under stabilizing selection (see also Palmer and Feldman 2009; Fierst and Hansen 2010), although it may be less likely (Gavrilets 2004; Schluter 2009). Under stabilizing selection, we found that the amount of HI was determined primarily by the slopes of the G–P map and fitness landscape near the conserved phenotype and by population size (Figure 3). In contrast to directional selection, under stabilizing selection HI was greater when the slope near the parental phenotype was shallow. This is because divergence of parental genotypes under stabilizing selection depends on compensatory evolution, which is less likely to occur when the average fitness effect of a mutation is high; such mutations are eliminated by selection before a compensatory mutation can arise (Haag 2007; Fierst and Hansen 2010). Low population size reduces the efficiency of selection (Wright 1931), increasing the parameter range under which compensatory evolution can occur (Lynch 2007).

Dependence of compensatory evolution on the shape of the fitness landscape is also found in multilocus models where the shape is determined by the strength of epistasis among loci (Palmer and Feldman 2009; Fierst and Hansen 2010). In general, a flatter plateau of near neutrality around a conserved phenotype is expected to increase divergence at the underlying loci (Haag 2007; Palmer and Feldman 2009); in our results, this produced HI when the hybrid genotype fell outside the nearly neutral plateau. This aspect of our model bears similarity to the “holey adaptive landscape” model of Gavrilets (1999, 2004), in which fitness is a step function with a perfectly flat plateau and an infinitely steep threshold. Apart from the fact that accounting for bioenergetic parameters results in a sigmoid fitness landscape, the main difference is that the holey-landscape model considers the cumulative effect of multiple independent loci, whereas we consider the cumulative effect of multiple independent mutations within pairs of epistatically interacting loci.
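The width of such a nearly neutral plateau can be counted for a sigmoid map. The sketch below uses an assumed two-state occupancy, treats fitness as proportional to expression relative to the optimum at zero mismatches, and calls a genotype nearly neutral when its fitness loss is below 1/(2N). The fitness mapping, the 1/(2N) cutoff, and all parameter values are rough assumptions chosen for illustration; the paper's hybrid fitness follows its Equation 4, which is not reproduced here.

```python
import math

def occupancy(m, n_tf, e_diff, dg1):
    """Assumed two-state occupancy; energies in kT, m = TF-site mismatches."""
    return n_tf / (n_tf + math.exp(m * dg1 - e_diff))

def plateau_width(pop_size, n_tf, e_diff, dg1, max_m=12):
    """Number of mismatches tolerated before the fitness loss reaches 1/(2N).

    Fitness is assumed proportional to expression relative to zero mismatches.
    """
    best = occupancy(0, n_tf, e_diff, dg1)
    width = 0
    for m in range(1, max_m + 1):
        loss = 1 - occupancy(m, n_tf, e_diff, dg1) / best
        if loss >= 1 / (2 * pop_size):
            break
        width = m
    return width

if __name__ == "__main__":
    for pop_size, e_diff in ((100, 4.0), (100, 8.0), (20, 4.0)):
        print(pop_size, e_diff, plateau_width(pop_size, 100, e_diff, 2.0))
```

With NTF = 100 and ΔG1 = 2, raising Ediff from 4 to 8 widens the plateau from one to three tolerated mismatches at N = 100, and lowering N to 20 widens it to two at Ediff = 4, so both a flatter landscape and a smaller population enlarge the space of nearly neutral genotypes.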

Under the model of Johnson and Porter (2000), no HI was observed under stabilizing selection. We found this to be a consequence of the way diploid loci were handled, which intentionally excluded the effects of dominance. When we permitted dominance in their model, a high amount of HI evolved under stabilizing selection (A. Tulchinsky, N. Johnson, A. Porter, unpublished results). By contrast, the present model resulted in more limited evolution of HI under stabilizing selection (Figure 3). As above, this is a consequence of how alleles are represented in the two models. Compensatory evolution is more likely when alleles are restricted to evolve in a limited number of dimensions; for example, with one-dimensional alleles in the earlier model, half of all possible mutations were at least partially compensatory for a deleterious substitution. In this model, a deleterious mismatch at a given position can be compensated only by a mutation at that same position (or another mismatched position, if any are available). We believe that this reduced probability of compensation is more realistic and consistent with the theoretical difficulties of compensatory evolution (Haag 2007).
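The contrast in compensatory opportunity can be made concrete by enumeration. The sketch below assumes one bit per position, 12 positions (the number of mismatches at minimal expression in the simulations above), and that a point mutation is equally likely to hit any bit of either the TF or the binding site. It counts the single mutations that reduce the number of mismatches after one deleterious substitution.

```python
from fractions import Fraction

def mismatches(tf, site):
    return sum(a != b for a, b in zip(tf, site))

def compensatory_fraction(tf, site):
    """Fraction of all single-bit mutations (in TF or site) that reduce the mismatch count."""
    base = mismatches(tf, site)
    helpful = 0
    for which in ("tf", "site"):
        for i in range(len(tf)):
            new_tf, new_site = list(tf), list(site)
            target = new_tf if which == "tf" else new_site
            target[i] ^= 1  # flip one bit
            if mismatches(new_tf, new_site) < base:
                helpful += 1
    return Fraction(helpful, 2 * len(tf))

if __name__ == "__main__":
    tf = [0] * 12
    site = [0] * 11 + [1]  # one deleterious mismatch at the last position
    print(compensatory_fraction(tf, site))  # prints 1/12
```

More generally, with k mismatched positions among L, the fraction is k/L, so a single deleterious mismatch in a 12-position interaction is compensated by only 1 in 12 random mutations, compared with the roughly one-half reported above for one-dimensional alleles in the earlier model.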

Our model includes an assumption (implicit in the ΔG1 parameter) that each position in the TF–binding site interaction has an equal and additive effect on binding energy (Von Hippel and Berg 1986; Gerland et al. 2002; see also Khatri et al. 2009). Relaxing this assumption may result in increased HI under stabilizing selection, if nonadditivity allows TF–binding site interactions to diverge without any reduction in fitness. Likewise, it may result in decreased HI if TF mutations affect binding to multiple nucleotides, making compensatory evolution more difficult.

Dominance and F1 vs. F2 hybrid breakdown

In the absence of any other modifiers of allelic dominance, the same bioenergetic parameters that determine the effect of mismatches on expression (ΔG1, Ediff, and NTF) also determine the extent of dominance between TF alleles. Strong binding is dominant over weak binding when NTF ≫ e^(mΔG1 − Ediff); in this case a reduction in binding of a single allele of the TF has little effect on expression. In biological terms, the reason is that when Ediff or NTF is high, an excess of transcription factor is available at the binding site; if one allele of the TF binds poorly, the other allele is sufficient to drive expression. Dominance decreases, asymptotically approaching codominance, as Ediff and NTF become lower, because fewer TF molecules are available at the binding site and expression is sensitive to further reduction in TF availability; introducing mismatches into just one allele of the TF is then enough to reduce the expression level.
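The dominance condition can be explored numerically by halving NTF, which mimics knocking out one of the two TF allele copies in a homozygote. The sketch assumes the two-state occupancy P = NTF / (NTF + e^(mΔG1 − Ediff)) with energies in kT; the parameter values are hypothetical.

```python
import math

def occupancy(m, n_tf, e_diff, dg1):
    """Assumed two-state occupancy; energies in kT, m = TF-site mismatches."""
    return n_tf / (n_tf + math.exp(m * dg1 - e_diff))

def knockout_effect(m, n_tf, e_diff, dg1):
    """Drop in expression when one of two TF allele copies is removed (NTF halved)."""
    return occupancy(m, n_tf, e_diff, dg1) - occupancy(m, n_tf / 2, e_diff, dg1)

if __name__ == "__main__":
    for e_diff in (4.0, -6.0, -10.0):
        full = occupancy(0, 100, e_diff, 2.0)
        half = occupancy(0, 50, e_diff, 2.0)
        print(f"Ediff={e_diff:+.0f}  knockout effect={full - half:.4f}  ratio={half / full:.3f}")
```

At Ediff = 4 the knockout changes expression by less than 0.001 (strong dominance), at Ediff = −6 it lowers expression by almost 0.09, and at Ediff = −10 expression roughly halves, the codominant limit in which occupancy is proportional to NTF.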

Under the parameter combinations that yielded the above results, F2 hybrids showed much higher HI than did F1’s (Figure 4, left). These bioenergetic parameter values promote strong allelic dominance: knocking out one allele copy of a TF has little average effect. Under these conditions, the availability of TF was not limiting and competitive binding among TF alleles for the regulatory sites was strong, so that phenotypic expression in heterozygotes was driven by the allele combination with the tightest binding. F1 individuals all carry one TF allele and one binding-site allele from each parent population, so each binding-site ortholog can still be bound by the TF from its own parent. Knocking out either TF ortholog eliminates competitive binding, so that the hybrid exhibits one parental phenotype or the other; overall, F1’s and their two parents had the same phenotypes, and HI was minimal. In F2’s, however, TF’s homozygous for one parental line often combine with binding-site orthologs of the other: one-eighth of F2’s are homozygous at both loci for alleles from opposite parents, and another quarter carry a homozygous TF with a heterozygous binding site. Average misexpression in F2’s was therefore high.
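The F1 versus F2 contrast can be reproduced in a toy calculation. The sketch assumes strong-dominance parameters (NTF = 100 from the text; Ediff = 4 and ΔG1 = 2 hypothetical), perfectly matched parental pairs, six mismatches in every cross-parent pairing, competitive binding with the TF pool split evenly between alleles, unlinked TF and binding-site loci, and expression equal to the mean occupancy of the two binding-site copies. All of these are illustrative assumptions.

```python
import math
from fractions import Fraction
from itertools import product

def site_occupancy(mismatch_by_tf, n_tf, e_diff, dg1):
    """Competitive occupancy of one binding-site copy; TF pool split evenly among alleles."""
    per_allele = n_tf / len(mismatch_by_tf)
    weights = [per_allele * math.exp(e_diff - m * dg1) for m in mismatch_by_tf]
    return sum(weights) / (1 + sum(weights))

def expression(tf_alleles, site_alleles, cross_mm, n_tf, e_diff, dg1):
    """Mean occupancy over binding-site copies; same-parent pairs match perfectly."""
    occ = [site_occupancy([0 if tf == site else cross_mm for tf in tf_alleles], n_tf, e_diff, dg1)
           for site in site_alleles]
    return sum(occ) / len(occ)

# Genotype classes at each unlinked locus ("A"/"B" = parental origin) with F2 frequencies.
F2_CLASSES = {("A", "A"): Fraction(1, 4), ("A", "B"): Fraction(1, 2), ("B", "B"): Fraction(1, 4)}

def mean_misexpression(n_tf=100, e_diff=4.0, dg1=2.0, cross_mm=6):
    """Return (F1, F2) mean absolute deviation from the parental expression level."""
    parent = expression(("A", "A"), ("A", "A"), cross_mm, n_tf, e_diff, dg1)  # same for parent B
    f1 = abs(parent - expression(("A", "B"), ("A", "B"), cross_mm, n_tf, e_diff, dg1))
    f2 = sum(float(p_tf * p_site) * abs(parent - expression(tf, site, cross_mm, n_tf, e_diff, dg1))
             for (tf, p_tf), (site, p_site) in product(F2_CLASSES.items(), repeat=2))
    return f1, f2

def opposite_homozygote_fraction():
    """Share of F2's homozygous at both loci for alleles from different parents."""
    return sum(p_tf * p_site
               for (tf, p_tf), (site, p_site) in product(F2_CLASSES.items(), repeat=2)
               if tf[0] == tf[1] and site[0] == site[1] and tf[0] != site[0])

if __name__ == "__main__":
    print(mean_misexpression())
    print(opposite_homozygote_fraction())  # prints 1/8
```

Under these assumptions F1 misexpression is below 0.001 while mean F2 misexpression is about 0.24, and the enumeration confirms that one-eighth of F2’s are homozygous at both loci for alleles from opposite parents.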

Hybrid incompatibility in the F1 generation reaches that of the F2 (Figure 4, right) under bioenergetic parameter values where TF’s become increasingly scarce at the binding sites (Ediff or NTF decreases). This is because competitive binding becomes less important and phenotypic expression in individuals approaches the expression averaged over all TF-to-binding-site pairs. Codominance then occurs in parental, F1, and F2 individuals, and, averaged over individuals, F1’s and F2’s are equally misregulated.

This has ramifications when the two-locus regulatory interaction occurs upstream of further regulatory steps (see Ronen et al. 2002). Because dominance is partially dependent on concentration (NTF), a pathway with more than one step would exhibit evolutionary change in dominance at downstream loci as a consequence of expression changes upstream. This dependence of dominance on the genetic background is consistent with existing theory that indicates that, in general, the contribution of one locus to the phenotype can evolve solely due to changes at other loci (Wagner and Mezey 2000). We did not introduce other determinants of dominance (e.g., dimerization) as additional model parameters, because in our two-locus pathway, the effect of dominance on speciation would be the same regardless of how that dominance was generated. If weak binding were made dominant over strong binding, unfit low-expression hybrids would appear in the F1 generation but the degree of HI would be unchanged overall.

Parameterization of the model in empirical studies

The parameters of the bioenergetic model represent empirically measurable properties of transcription factors and the genomic background with which they interact (Gerland et al. 2002; Mustonen et al. 2008). Simicevic et al. (2013) document fivefold differences in NTF among TF loci, roughly corresponding to the range of Ediff values we investigated here. The concentration of TF’s, NTF, is readily manipulated in the laboratory, so its effect on gene expression can be measured directly. The component of Ediff describing the relative competition for the TF from the genomic background is readily estimated by adding increasing concentrations of random genomic DNA to a solution of TF and specific binding-site fragments and measuring the effect on specific binding (Raumann et al. 1995). Binding affinities (corresponding to mΔG1) of TF and cis-site variants have been measured indirectly in several studies (e.g., Man et al. 2004; Shultzaberger et al. 2012; Samee and Sinha 2013). Empirical measures of these parameters within and between populations showing HI will yield considerable insight into the relationship between the evolution of expression misregulation and the resulting Bateson–Dobzhansky–Muller (BDM) incompatibilities.

Biological context of the two-locus interaction and HI in F1’s vs. F2’s

Because gene networks connect genotype to complex phenotypes via G–P maps, ultimately network-based models are needed for making inferences about the initiation and evolutionary dynamics of speciation (Johnson and Porter 2001; see also Wilkins 2002, 2007). The two-locus regulatory interaction is elemental to all regulatory networks and provides a necessary basis for interpreting the evolution of HI in network settings. In a companion study (A. Tulchinsky, N. Johnson, A. Porter, unpublished results), we examine a bioenergetic model of a simple, three-locus pleiotropic network motif, another ubiquitous element of regulatory networks (Gibson 1996; Wray et al. 2003). Information-based bioenergetic models of other motifs have been studied in the context of the evolution of gene expression per se. These include cases where multiple independent TF’s regulate expression at a single locus (Stewart and Plotkin 2014; Ezer et al. 2014), where single TF’s have multiple binding sites in a single promoter region (He et al. 2012), and where a TF may be inhibited by nucleosomes at the binding site (Raveh-Sadka et al. 2009).

We found that HI evolves more readily in the F2 than in the F1 across a broad range of bioenergetic parameter values and that this difference depends on the degree of dominance imposed by those parameters (File S1). Our results apply to cases of parallel selection, where hybrid fitness doesn’t depend on environmental differences between parental populations. To what extent under these conditions, then, do we predict HI in natural systems to be stronger in F2’s? In this study we held constant the expression level of the upstream TF at NTF = 100 and instead varied Ediff, the difference in the free energies of binding of the TF to the target site vs. the nonspecific background. The effects on the phenotype of varying Ediff and NTF are isomorphic, with the effects of varying Ediff on the linear scale equivalent to varying NTF on the exponential scale (via Equation 4; illustrated in Figure S1). Inasmuch as phenotypic expression evolves in broader network settings, a problem well beyond the scope of this study, we expect the expression level of our upstream TF to respond to phenotypic selection as well. This upstream change in NTF will alter the baseline bioenergetic properties of the downstream two-locus interaction we model here, affecting its degree of dominance (Figure 6) and the shapes of the G–P map and fitness landscape (Figure S1). We expect this to have downstream effects on the evolutionary dynamics of HI, including the extent that HI can evolve in F1 relative to F2 crosses (Figure 4). Nevertheless, this bioenergetic model implies that HI in the F1 should be equal to or lower than that in the F2 regardless of the parameter values.
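The isomorphism between Ediff and NTF can be checked under an assumed two-state occupancy P = NTF / (NTF + e^(mΔG1 − Ediff)), with energies in kT: multiplying NTF by a factor f has exactly the same effect as adding ln f to Ediff. On that assumption, a fivefold range of NTF such as that reported by Simicevic et al. (2013) corresponds to a shift of ln 5 ≈ 1.6 kT in Ediff. The functional form and parameter values are illustrative, not a restatement of the paper's Equation 4.

```python
import math

def occupancy(m, n_tf, e_diff, dg1):
    """Assumed two-state occupancy; energies in kT, m = TF-site mismatches."""
    return n_tf / (n_tf + math.exp(m * dg1 - e_diff))

def equivalent_ediff_shift(fold_change_in_ntf):
    """Change in Ediff (kT) that mimics multiplying NTF by the given factor."""
    return math.log(fold_change_in_ntf)

if __name__ == "__main__":
    shift = equivalent_ediff_shift(5)
    print(f"fivefold NTF change ~ Ediff shift of {shift:.3f} kT")
    for m in (0, 4, 8, 12):
        print(m, occupancy(m, 500, 1.0, 1.5), occupancy(m, 100, 1.0 + shift, 1.5))
```

Each printed pair agrees to floating-point precision, which is the sense in which Ediff on a linear scale and NTF on a logarithmic scale are interchangeable under this form.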

In nature, it is very plausible that the selection imposed by a changing environment will be multifarious, affecting several developmentally independent traits at once (Nosil et al. 2008). Although the bioenergetic bases of these traits are likely to differ and with them, their fitness landscapes, the fitness effects of evolving incompatibilities are likely to be multiplicative in hybrids. The more traits respond to selection, the smaller the incompatibilities need to be in each for significant hybrid incompatibilities to evolve. These multiplicative effects, however, will tend to magnify differences in HI between F1 and F2 generations to the extent that dominance effects emerge from the bioenergetic properties that determine expression of those traits. We expect regulatory interactions to exhibit bioenergetic properties favoring allelic dominance in cases where F1 hybrids resemble their parents more than do F2’s.
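The multiplicative argument can be put in numbers. Assuming, as the paragraph suggests, that fitness effects across k independent traits multiply, the per-trait reduction s that yields a target hybrid fitness W satisfies (1 − s)^k = W, so s = 1 − W^(1/k). The targets and per-trait values below are hypothetical.

```python
def per_trait_incompatibility(target_fitness: float, n_traits: int) -> float:
    """Per-trait fitness reduction s with (1 - s) ** n_traits == target_fitness."""
    return 1 - target_fitness ** (1 / n_traits)

def multiplicative_fitness(per_trait_fitness: float, n_traits: int) -> float:
    """Hybrid fitness when each of n_traits independent traits contributes the same factor."""
    return per_trait_fitness ** n_traits

if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, round(per_trait_incompatibility(0.5, k), 3))
    # Hypothetical per-trait fitness of 0.99 in F1's vs. 0.9 in F2's.
    for k in (1, 10):
        print(k, multiplicative_fitness(0.99, k), multiplicative_fitness(0.9, k))
```

Halving hybrid fitness requires a 50% reduction from a single trait but only about 13% per trait across five traits and about 7% across ten. Likewise, a small per-trait F1 vs. F2 difference (0.99 vs. 0.9) grows from a gap of 0.09 with one trait to about 0.90 vs. 0.35 with ten.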

Supplementary Material

Supporting Information

Acknowledgments

We thank Patricia Wittkopp, David Garfield, Jeffrey Blanchard, Ana Caicedo, Benjamin Normark, Courtney Babbitt, Brett Payseur, Marcy Uyenoyama, and anonymous reviewers for valuable comments and discussion on the manuscript.

Footnotes

Communicating editor: B. A. Payseur

Literature Cited

  1. Bateson W., 1909.  Heredity and variation in modern lights, Darwin and Modern Science, edited by Seward A. C. Cambridge University Press, Cambridge, UK. [Google Scholar]
  2. Coyne J. A., Orr H. A., 2004.  Speciation. Sinauer, Sunderland, MA. [Google Scholar]
  3. Dobzhansky T., 1937.  Genetics and the Origin of Species. Columbia University Press, New York. [Google Scholar]
  4. Ezer D., Zabet N. R., Adryan B., 2014.  Physical constraints determine the logic of bacterial promoter architectures. Nucleic Acids Res 42: 4196–4207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Fierst J. L., Hansen T. F., 2010.  Genetic architecture and postzygotic reproductive isolation: evolution of Bateson–Dobzhansky–Muller incompatibilities in a polygenic model. Evolution 64: 675–693. [DOI] [PubMed] [Google Scholar]
  6. Gavrilets S., 1999.  A dynamical theory of speciation on holey adaptive landscapes. Am. Nat. 154: 1–22. [DOI] [PubMed] [Google Scholar]
  7. Gavrilets S., 2004.  Fitness Landscapes and the Origin of Species. Princeton University Press, Princeton, NJ. [Google Scholar]
  8. Gerland U., Moroz J. D., Hwa T., 2002.  Physical constraints and functional characteristics of transcription factor-DNA interaction. Proc. Natl. Acad. Sci. USA 99: 12015–12020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Gertz J., Siggia E. D., Cohen B. A., 2009.  Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457: 215–218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Gibson G., 1996.  Epistasis and pleiotropy as natural properties of transcriptional regulation. Theor. Popul. Biol. 49: 58–89. [DOI] [PubMed] [Google Scholar]
  11. Graze R. M., Novelo L. L., Amin V., Fear J. M., Casella G., et al. , 2012.  Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol. Biol. Evol. 29: 1521–1532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Haag E. S., 2007.  Compensatory vs. pseudocompensatory evolution in molecular and developmental interactions. Genetica 129: 45–55. [DOI] [PubMed] [Google Scholar]
  13. Hansen T. T., Wagner G. P., 2001.  Modeling genetic architecture: a multilinear theory of gene interaction. Theor. Popul. Biol. 59: 61–86. [DOI] [PubMed] [Google Scholar]
  14. He X., Duque T. S. P. C., Sinha S., 2012.  Evolutionary origins of transcription factor binding site clusters. Mol. Biol. Evol. 29: 1059–1070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Johnson N. A., 2010.  Hybrid incompatibility genes: Remnants of a genomic battlefield? Trends Genet. 26: 317–325. [DOI] [PubMed] [Google Scholar]
  16. Johnson N. A., Porter A. H., 2000.  Rapid speciation via parallel, directional selection on regulatory genetic pathways. J. Theor. Biol. 205: 527–542. [DOI] [PubMed] [Google Scholar]
  17. Johnson N. A., Porter A. H., 2001.  Toward a new synthesis: population genetics and evolutionary developmental biology. Genetica 112–113: 45–58. [PubMed] [Google Scholar]
  18. Johnson N. A., Porter A. H., 2007.  Evolution of branched regulatory genetic pathways: directional selection on pleiotropic loci accelerates developmental system drift. Genetica 129: 57–70. [DOI] [PubMed] [Google Scholar]
  19. Jovelin R., 2009.  Rapid sequence evolution of transcription factors controlling neuron differentiation in Caenorhabditis. Mol. Biol. Evol. 26: 2373–2386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Khatri B. S., Goldstein R. A., 2013.  Evolutionary stochastic dynamics of speciation and a simple genotype-phenotype map for protein binding DNA. ArXiv:1303.7006. [Google Scholar]
  21. Khatri B. S., McLeish T. C. B., Sear R. P., 2009.  Statistical mechanics of convergent evolution in spatial patterning. Proc. Natl. Acad. Sci. USA 106: 9564–9569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Landry C. R., Wittkopp P. J., Taubes C. H., Ranz J. M., Clark A. G., et al. , 2005.  Compensatory cistrans evolution and the dysregulation of gene expression in interspecific hybrids of Drosophila. Genetics 171: 1813–1822. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Lynch M., 2007.  The Origins of Genome Architecture. Sinauer Associates, Sunderland, MA. [Google Scholar]
  24. Maheshwari S., Barbash D. A., 2011.  The genetics of hybrid incompatibilities. Annu. Rev. Genet. 45: 331–355. [DOI] [PubMed] [Google Scholar]
  25. Maheshwari S., Barbash D. A., 2012.  Cis-by-trans regulatory divergence causes the asymmetric lethal effects of an ancestral hybrid incompatibility gene. PLoS Genet. 8: e1002597. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Man T.-K., Yang J. S.-W., Stormo G. D., 2004.  Quantitative modeling of DNA–protein interactions: affects of amino acid substitutions on binding specificity of the Mnt repressor. Nucleic Acids Res. 32: 4026–4032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Michaelis, L., and M. L. Menten, 1913.  Die kinetik der invertinwirkung. Biochem. Z. 49: 333–369. [Google Scholar]
  28. Morowitz H. J., 1978.  Foundations of Bioenergetics. Academic Press, New York. [Google Scholar]
  29. Muller H. J., 1942.  Isolating mechanisms, evolution and temperature. Biol. Symp. 6: 71–125. [Google Scholar]
  30. Mustonen V., Kinney J., Callahan C. G., Jr, Lässig M., 2008.  Energy-dependent fitness: a quantitative model for the evolution of yeast transcription factor binding sites. Proc. Natl. Acad. Sci. USA 105: 12376–12381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Nakagawa S., Gisselbrecht S. S., Rogers J. M., Hartl D. L., Bulyk M. L., 2013. DNA binding specificity changes in the evolution of forkhead transcription factors. Proc. Natl. Acad. Sci. USA 110: 12349–12354.
  32. Nosil P., Flaxman S. M., 2011. Conditions for mutation-order speciation. Proc. Biol. Sci. 278: 399–407.
  33. Nosil P., Harmon L. J., Seehausen O., 2008. Ecological explanations for (incomplete) speciation. Trends Ecol. Evol. 24: 145–156.
  34. Orr H. A., 1995. The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics 139: 1805–1813.
  35. Ortíz-Barrientos D., Counterman B. A., Noor M. A. F., 2007. Gene expression divergence and the origin of hybrid dysfunctions. Genetica 129: 71–81.
  36. Palmer M. E., Feldman M. W., 2009. Dynamics of hybrid incompatibility in gene networks in a constant environment. Evolution 63: 418–431.
  37. Presgraves D. C., 2010. The molecular evolutionary basis of species formation. Nat. Rev. Genet. 11: 175–180.
  38. Raumann B. E., Knight K. L., Sauer R. T., 1995. Dramatic changes in DNA-binding specificity caused by single residue substitutions in an Arc/Mnt hybrid repressor. Nat. Struct. Biol. 2: 1115–1122.
  39. Raveh-Sadka T., Levo M., Segal E., 2009. Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 19: 1480–1496.
  40. Ronen M., Rosenberg R., Shraiman B. I., Alon U., 2002. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc. Natl. Acad. Sci. USA 99: 10555–10560.
  41. Samee A. H., Sinha S., 2013. Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data. Methods 62: 79–90.
  42. Schluter D., 2009. Evidence for ecological speciation and its alternative. Science 323: 737–741.
  43. Segal E., Widom J., 2009. From DNA sequence to transcriptional behaviour: a quantitative approach. Nat. Rev. Genet. 10: 443–456.
  44. Segal E., Raveh-Sadka T., Schroeder M., Unnerstall U., Gaul U., 2008. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451: 535–540.
  45. Shultzaberger R. K., Maerkl S. J., Kirsch J. F., Eisen M. B., 2012. Probing the informational and regulatory plasticity of a transcription factor DNA-binding domain. PLoS Genet. 8: e1002614.
  46. Simicevic J., Schmid A. W., Gilardoni P. A., Zoller B., Raghav S. K., et al., 2013. Absolute quantification of transcription factors during cellular differentiation using multiplexed target proteomics. Nat. Methods 10: 570–576.
  47. Sobel J. M., Chen G. F., Watt L. R., Schemske D. W., 2010. The biology of speciation. Evolution 64: 295–315.
  48. Stewart A. J., Plotkin J. B., 2014. The evolution of complex gene regulation by low-specificity binding sites. Proc. R. Soc. B 280: 20131313.
  49. Svensson E. I., Calsbeek R. (Editors), 2012. The Adaptive Landscape in Evolutionary Biology. Oxford University Press, Oxford.
  50. Travisano M., Shaw R. G., 2012. Lost in the map. Evolution 67: 305–314.
  51. True J. R., Haag E. S., 2001. Developmental system drift and flexibility in evolutionary trajectories. Evol. Dev. 3: 109–119.
  52. Tulchinsky A. Y., 2013. Evolution of Hybrid Incompatibilities in Gene Regulatory Networks. Ph.D. Dissertation, University of Massachusetts, Amherst, MA.
  53. von Hippel P. H., Berg O. G., 1986. On the specificity of DNA–protein interactions. Proc. Natl. Acad. Sci. USA 83: 1608–1612.
  54. Wagner G. P., Mezey J., 2000. Modeling the evolution of genetic architecture: a continuum of alleles model with pairwise AxA epistasis. J. Theor. Biol. 203: 163–175.
  55. Watt W. B., 2013. Specific-gene studies of evolutionary mechanisms in an age of genome-wide surveying. Ann. N. Y. Acad. Sci. 1289: 1–17.
  56. Watt W. B., Dean A. M., 2000. Molecular-functional studies of adaptive genetic variation in prokaryotes and eukaryotes. Annu. Rev. Genet. 34: 593–622.
  57. Watt W. B., Wheat C. W., Meyer E. H., Martin J.-F., 2003. Adaptation at specific loci. VII. Natural selection, dispersal and the diversity of molecular-functional variation patterns among butterfly species complexes (Colias: Lepidoptera, Pieridae). Mol. Ecol. 12: 1265–1275.
  58. Wilkins A. S., 2002. The Evolution of Developmental Pathways. Sinauer Associates, Sunderland, MA.
  59. Wilkins A. S., 2007. Between “design” and “bricolage”: genetic networks, levels of selection, and adaptive evolution. Proc. Natl. Acad. Sci. USA 104(Suppl. 1): 8590–8596.
  60. Wittkopp P. J., Kalay G., 2012. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13: 59–69.
  61. Wittkopp P. J., Haerum B. K., Clark A. G., 2004. Evolutionary changes in cis and trans gene regulation. Nature 430: 85–88.
  62. Wray G. A., 2007. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8: 206–216.
  63. Wray G. A., Hahn M. W., Abouheif E., Balhoff J. P., Pizer M., et al., 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20: 1377–1419.
  64. Wright S., 1931. Evolution in Mendelian populations. Genetics 16: 97–159.

Supplementary Materials

Supporting Information

Articles from Genetics are provided here courtesy of Oxford University Press