Abstract
We report a new mechanism for allelic dominance in regulatory genetic interactions that we call binding dominance. We investigated a biophysical model of gene regulation, where the fractional occupancy of a transcription factor (TF) on the cis-regulated promoter site it binds to is determined by binding energy (–ΔG) and TF dosage. Transcription and gene expression proceed when the TF is bound to the promoter. In diploids, individuals may be heterozygous at the cis-site, at the TF’s coding region, or at the TF’s own promoter, which determines allele-specific dosage. We find that when the TF’s coding region is heterozygous, TF alleles compete for occupancy at the cis-sites and the tighter-binding TF is dominant in proportion to the difference in binding strength. When the TF’s own promoter is heterozygous, the TF produced at the higher dosage is also dominant. Cis-site heterozygotes have additive expression and therefore codominant phenotypes. Binding dominance propagates to affect the expression of downstream loci and it is sensitive in both magnitude and direction to genetic background, but its detectability often attenuates. While binding dominance is inevitable at the molecular level, it is difficult to detect in the phenotype under some biophysical conditions, more so when TF dosage is high and allele-specific binding affinities are similar. A body of empirical research on the biophysics of TF binding demonstrates the plausibility of this mechanism of dominance, but studies of gene expression under competitive binding in heterozygotes in a diversity of genetic backgrounds are needed.
Keywords: competitive binding, epistasis, fractional occupancy, genotype–phenotype map, thermodynamics
Mendel (1866) coined the terms dominant and recessive to describe variants that respectively appear in 3:1 ratios in second-generation hybrid crosses. Wright (1934) proposed a plausible mechanism, demonstrating theoretically that dominance can arise as a natural consequence of functional allelic differences among enzymes that play roles in metabolic pathways. Alleles with reduced function tended to be recessive, and variation in the genetic background could modify the degree of dominance. Kacser and Burns (1981) cast Wright’s mechanism into the language of enzyme kinetics and metabolic flux, a mechanism we will call flux dominance, and several studies have extended and modified it (e.g., Keightley and Kacser 1987; Keightley 1996; Bagheri and Wagner 2004). Since then, several other mechanisms have been found to produce dominance, including negative regulatory feedback (Omholt et al. 2000), threshold-based reaction-diffusion systems (Gilchrist and Nijhout 2001), protein–protein interactions (Veitia et al. 2013), and epigenetic modifications (Li et al. 2012; Bond and Baulcombe 2014). In general, dominance arises because the relationship between the genotype and the phenotype it produces is nonlinear (Gilchrist and Nijhout 2001; Veitia et al. 2013).
Empirical studies have shown that dominance is commonly found in loci involved in gene regulation. In particular, trans-acting alleles [e.g., transcription factors (TFs)] commonly show dominance, whereas alleles of the cis-acting sites they regulate only rarely do (Hughes et al. 2006; Stupar and Springer 2006; Wray 2007; Li et al. 2012; Bond and Baulcombe 2014). The mechanism is unknown. We propose that this dominance is an inevitable consequence of differences in binding dynamics between trans-acting gene products as they compete for access to the cis-sites they regulate. The degree of dominance thereby depends on allele-specific differences in concentration and binding affinity of the trans-acting allelic variants. Such competitive binding interactions are integral to models of multifactorial gene regulation (Bintu et al. 2005b), including nucleosome–TF interactions (Teif et al. 2010, 2012), and repressor (Browning and Busby 2004) and microRNA function (Thomson and Dinger 2016), but have not been applied to allelic interactions. This form of Mendelian dominance, which we term binding dominance, propagates through regulatory pathways and is modified by polymorphism at other loci in the pathway. Our findings apply to any trans-acting regulatory molecules interacting with cis-acting regulatory sites. TF/promoter interactions meet these criteria well and we will develop the model using that language.
Methods
Model overview
Biophysical models have long been used to study molecular interactions between DNA and molecules that bind to it (e.g., Gerland et al. 2002; Bintu et al. 2005a; Phillips et al. 2012; Tulchinsky et al. 2014; Khatri and Goldstein 2015). The central premise of these models is that interactions between regulatory molecules and the sites they regulate behave according to the thermodynamic and kinetic principles that drive all molecular interactions. Consistent with empirical data (reviewed in Mueller et al. 2013), gene expression in these models only ensues while a TF molecule is physically bound to the promoter of the regulated gene.
In our model, binding is a stochastic process determined by the free energy of association (–ΔG, in units of 1/kBT, the Boltzmann constant × the temperature in kelvin; –ΔG is negative by definition) between a TF molecule and promoter, which we will call “binding energy.” The fractional occupancy θ (the proportion of time a promoter is occupied by a TF molecule, and therefore the gene expression level) depends on –ΔG, and also on dosage [TF], the concentration of free TF molecules available in the nucleus to bind when the promoter is unoccupied.
The biophysical model represents interacting TF molecules and the promoter sequence as strings of bits of arbitrary length (Figure 1A), an approach based in statistical physics and information theory (Gerland et al. 2002). This method of abstraction permits characterization of molecular interactions at arbitrary scales, from the state space of electrostatic interaction among atoms to amino acid and nucleotide variation, and ultimately, to the genetic basis of variation in those molecules. The binding energy decays to 0 in steps of –ΔG1 as m, the proportion of mismatched bits over the length of the bitstring, increases (Teif et al. 2010). Specifically, m is the Hamming distance between the bitstrings, scaled to the bitstring length (Figure 1A). The binding energy is related to the dissociation constant K as K = exp[–ΔG]. Here, we present the model at the physiological scale, in terms of [TF] and K, which may be more accessible to readers. Its analog at the scale of physically interacting molecules is presented in Supplemental Material, File S1. The haploid model, a parameter-reduced form of our model in Tulchinsky et al. (2014) (see File S1), is
$$\theta = \frac{[\mathrm{TF}]}{[\mathrm{TF}] + \Delta K^{m}} \tag{1}$$
where ΔK = exp[–ΔG1] is the stepwise change in the dissociation constant, such that K = (ΔK)^m.
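As a minimal sketch of the haploid model, the snippet below assumes the standard single-site occupancy form θ = [TF]/([TF] + K) with K = (ΔK)^m. That form is an assumption here, chosen because it reproduces the θmax and θmin expressions given in the diploid section. The function names are illustrative and are not taken from the authors' Mathematica code.

```python
def dissociation_constant(m: float, delta_k: float) -> float:
    """K = (delta_K)^m, where m in [0, 1] is the scaled bitstring mismatch."""
    return delta_k ** m


def haploid_occupancy(tf: float, m: float, delta_k: float) -> float:
    """Assumed single-site form: theta = [TF] / ([TF] + K)."""
    return tf / (tf + dissociation_constant(m, delta_k))


tf_sat = 100.0
delta_k = tf_sat ** 2  # scaling adopted in the Analysis section

theta_max = haploid_occupancy(tf_sat, 0.0, delta_k)  # perfect match, K = 1
theta_mid = haploid_occupancy(tf_sat, 0.5, delta_k)  # half mismatch, K = tf_sat
theta_min = haploid_occupancy(tf_sat, 1.0, delta_k)  # full mismatch, K = delta_k

print(theta_max, theta_mid, theta_min)
```

Under these assumptions, occupancy falls monotonically as m rises from 0 to 1 at a fixed [TF].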
We use the following notational conventions throughout. Interacting loci are labeled with letters A and B, with C included for three-locus pathways. Subscripts indicate allelic variants as in Figure 1, panels B and C; those before the letter (e.g., 1A) refer to promoter alleles and those after the letter (e.g., A1) indicate coding region alleles (therefore, TF structural variants). Subscripts are dropped for homozygotes (e.g., AA), and both subscripts are used when both sites vary for an allele (e.g., 1A1 and 2A2). Arrows indicate allele-specific regulatory interactions, e.g., mA1→1B represents bitstring mismatches between TF allele A1 and cis-site allele 1B. We use brackets for concentration, e.g., [TF] represents the concentration of a generic TF with [TF]sat as its saturating concentration, and [A1] represents the concentration of the TF structural variant coded by allele copy 1 of locus A.
Diploid model
In diploids, TF variants A1 and A2 (Figure 1B) compete for occupancy at both promoter sites 1B and 2B independently (Tulchinsky et al. 2014) and the concentration of TF molecules is the sum of those from each TF allele copy, i.e., [A1] + [A2] = [TF]. Under TF competition, the fractional occupancy of A1 on promoter site 1B in the presence of A2 is
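$$\theta_{A_1 \to {}_1B} = \frac{[A_1]\,\Delta K^{-m_{11}}}{1 + [A_1]\,\Delta K^{-m_{11}} + [A_2]\,\Delta K^{-m_{21}}} \tag{2a}$$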
where m11 (= mA1→1B) and m21 (= mA2→1B) are the proportions of mismatches between the bit strings of A1 and A2 to that of 1B, respectively (see File S1). Fractional occupancies of the other three interactions are calculated analogously,
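$$\theta_{A_2 \to {}_1B} = \frac{[A_2]\,\Delta K^{-m_{21}}}{1 + [A_1]\,\Delta K^{-m_{11}} + [A_2]\,\Delta K^{-m_{21}}} \tag{2b}$$
$$\theta_{A_1 \to {}_2B} = \frac{[A_1]\,\Delta K^{-m_{12}}}{1 + [A_1]\,\Delta K^{-m_{12}} + [A_2]\,\Delta K^{-m_{22}}} \tag{2c}$$
$$\theta_{A_2 \to {}_2B} = \frac{[A_2]\,\Delta K^{-m_{22}}}{1 + [A_1]\,\Delta K^{-m_{12}} + [A_2]\,\Delta K^{-m_{22}}} \tag{2d}$$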
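where m12 = mA1→2B and m22 = mA2→2B. The final expression level (φ) is based on the sum of the fractional occupancies of the four TF–promoter pairs, averaged over the two promoter copies,
$$\varphi^{*} = \tfrac{1}{2}\left(\theta_{A_1 \to {}_1B} + \theta_{A_2 \to {}_1B} + \theta_{A_1 \to {}_2B} + \theta_{A_2 \to {}_2B}\right) \tag{3}$$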
where φ* is the unscaled expression. When both TF variants are at saturating concentration, i.e., [A1] + [A2] = [TF]sat, then maximum fractional occupancy θmax = [TF]sat/(1 + [TF]sat) occurs when m = 0, and minimum fractional occupancy θmin = [TF]sat/(ΔK + [TF]sat) occurs when m = 1. We scale expression to the range [0,1] using φ = (φ* – θmin)/(θmax – θmin). However, because φ* < θmin when m = 1 and [TF] < [TF]sat, we also set an expression floor at φ = 0, such that
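$$\varphi = \max\!\left(0,\ \frac{\varphi^{*} - \theta_{\min}}{\theta_{\max} - \theta_{\min}}\right) \tag{4}$$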
This constraint is necessary in the A→B step of the three-locus model described below, and for generality we apply it to both models. As a baseline for scaling purposes, we use TF dosages [A1]sat = [A2]sat = [TF]sat/2 as the allele-specific saturating concentrations.
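The diploid calculation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the textbook competitive-binding form for two ligands sharing one site, and it averages occupancy over the two promoter copies so that a homozygote recovers the haploid θ, which the scaling by θmin and θmax appears to require. All function and parameter names are hypothetical.

```python
def competitive_occupancy(a_self, m_self, a_other, m_other, delta_k):
    """Occupancy of one TF variant on one promoter copy while a second variant competes.

    Assumed form: w_self / (1 + w_self + w_other), with w = [A] / delta_K^m.
    """
    w_self = a_self / delta_k ** m_self
    w_other = a_other / delta_k ** m_other
    return w_self / (1.0 + w_self + w_other)


def unscaled_expression(a1, a2, m11, m21, m12, m22, delta_k):
    """phi*: the four TF-promoter occupancies, averaged over the two promoter copies (assumption)."""
    site_1b = (competitive_occupancy(a1, m11, a2, m21, delta_k)
               + competitive_occupancy(a2, m21, a1, m11, delta_k))
    site_2b = (competitive_occupancy(a1, m12, a2, m22, delta_k)
               + competitive_occupancy(a2, m22, a1, m12, delta_k))
    return 0.5 * (site_1b + site_2b)


def scaled_expression(phi_star, tf_sat, delta_k):
    """phi: rescale phi* to [0, 1] and apply the expression floor at 0."""
    theta_max = tf_sat / (1.0 + tf_sat)
    theta_min = tf_sat / (delta_k + tf_sat)
    return max(0.0, (phi_star - theta_min) / (theta_max - theta_min))


tf_sat, delta_k = 10.0, 100.0
half = tf_sat / 2.0  # allele-specific saturating dosage

phi_perfect = scaled_expression(
    unscaled_expression(half, half, 0, 0, 0, 0, delta_k), tf_sat, delta_k)
phi_low_dose = scaled_expression(
    unscaled_expression(0.5, 0.5, 1, 1, 1, 1, delta_k), tf_sat, delta_k)
print(phi_perfect, phi_low_dose)
```

The second call illustrates why the floor is needed: at low dosage and full mismatch, φ* falls below θmin and the rescaled value would otherwise be negative.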
Genotype–phenotype (G–P) map
We treat the phenotype, P, as being proportional to the expression level of the cis-regulated locus, such that P = λφ, and without loss of generality, treat that proportionality constant as λ = 1, such that P = φ.
In the biophysical model, the bit strings are abstract representations of information content that can characterize underlying genetic differences in the interacting molecules. Equation 1, Equation 2, Equation 3, and Equation 4 therefore characterize the G–P map, the rules by which the phenotype is generated from the underlying genotype as a function of binding energy and TF concentration.
Dominance
Competition between TF alleles for binding to their cis-regulated sites creates conditions for allelic dominance (Tulchinsky et al. 2014). Following Wright (1934), we use d = (P11 – P12)/(P11 – P22) as the dominance coefficient, where P12 is the heterozygote phenotype and P11 and P22 are homozygote phenotypes. Allele “1” of the respective locus is thereby the reference allele for which dominance is assessed. Allele 1 is codominant when d = 1/2, completely dominant at d = 0, and completely recessive at d = 1. When polymorphism occurs at more than one site, these P’s represent marginal phenotypes with respect to the reference allele, and d is the marginal dominance, holding the rest of the genetic background constant.
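For concreteness, the dominance coefficient defined above can be written as a small helper. The phenotype values in the example calls are arbitrary numbers chosen for illustration, not model output.

```python
def dominance_coefficient(p11: float, p12: float, p22: float) -> float:
    """d = (P11 - P12) / (P11 - P22), with allele 1 as the reference allele."""
    if p11 == p22:
        # The two homozygotes are indistinguishable, so d is undefined.
        raise ValueError("d is undefined when P11 equals P22")
    return (p11 - p12) / (p11 - p22)


# Heterozygote matches the allele-1 homozygote: allele 1 is completely dominant.
print(dominance_coefficient(1.0, 1.0, 0.2))
# Heterozygote is exactly intermediate: codominance.
print(dominance_coefficient(1.0, 0.6, 0.2))
# Heterozygote matches the allele-2 homozygote: allele 1 is completely recessive.
print(dominance_coefficient(1.0, 0.2, 0.2))
```

Raising an error when the homozygotes are equal is a design choice of this sketch. It mirrors the point made in the next paragraph that dominance cannot be assessed phenotypically when the homozygotes do not differ.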
If fractional occupancy cannot be measured separately for each allele, then d must be assessed phenotypically. Even strong dominance becomes increasingly difficult to detect as φ’s for homozygotes and heterozygotes of both alleles approach equality because the three genotypes will have very similar phenotypes; the trait will appear to be unaffected by these loci, or the degree of dominance will be obscured by sampling and measurement error. Detectability (t) is proportional to the absolute difference between the two homozygote phenotypes, such that t = κ |P11 – P22| with proportionality function κ. In a constant genetic background, κ is some increasing function of the accuracy in the measurement of P (or φ) and the sample size of the study.
Three-locus pathways: propagation and genetic background
In a linear three-locus pathway (Figure 1C), locus B codes for a second TF that binds to the promoter of locus C, such that there are two regulatory steps, A→B and B→C. The final phenotype is the expression level at locus C (P = φC). The promoter and product sites of locus B together comprise a single allele (in this three-locus, two-allele model), and the doubly heterozygous B genotype is denoted 1B12B2. Competitive binding of the A alleles onto the two B alleles proceeds independently, creating two allele-specific expression terms, φ1B1 and φ2B2, based on Equation 2. Expression of these B alleles yields separate [1B1] and [2B2] values, which we calculate as [1B1] = φ1B1[B]sat and [2B2] = φ2B2[B]sat, such that maximal expression of the B locus yields [1B1]sat + [2B2]sat = [B]sat.
Analysis
We considered cases where fractional occupancy and therefore gene expression is maximal (φ = P = 1) when binding is maximal (m = 0) and TF concentration is saturating, and that φ = P = 0.5 when m = 0.5 at the same [TF]sat. Analysis of the role of TF concentration requires scaling ΔK to [TF]sat in order to meet these constraints. This results in no loss of generality because m is in units of abstract, arbitrarily scalable “information” about properties that affect ΔK in the binding interaction. For example, bits in the TF and cis-site bitstrings (Figure 1A) are not intended to correspond one-to-one to single amino acid or nucleotide positions, and stepwise changes in m are not literal representations of substitutions. Rather, the bitstrings are abstract representations of shape and charge, and m informs us of the extent to which they are compatible. Substituting m = 1/2 into Equation 2 and Equation 3, and solving Equation 4 for ΔK, we used ΔK = ([TF]sat)^2.
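The scaling can be checked by hand in the homozygous case. The short verification below assumes the single-site occupancy form θ(m) = T/(T + ΔK^m), writing T for [TF]sat. That form is an assumption consistent with the θmax and θmin expressions in the Methods, and the check holds for T ≠ 1.

```latex
% Assumed occupancy: \theta(m) = T / (T + \Delta K^{m}), with T \equiv [\mathrm{TF}]_{\mathrm{sat}}
\begin{align*}
\Delta K = T^{2} \;&\Rightarrow\;
  \theta(\tfrac12) = \frac{T}{T + \Delta K^{1/2}} = \frac{T}{2T} = \tfrac12, \\
\theta_{\max} &= \frac{T}{1+T}, \qquad
\theta_{\min} = \frac{T}{T^{2}+T} = \frac{1}{1+T}, \\
\varphi &= \frac{\theta(\tfrac12) - \theta_{\min}}{\theta_{\max} - \theta_{\min}}
         = \frac{\tfrac12 - \tfrac{1}{1+T}}{\tfrac{T-1}{1+T}}
         = \frac{\tfrac{T-1}{2(1+T)}}{\tfrac{T-1}{1+T}}
         = \tfrac12 .
\end{align*}
```

So with ΔK equal to the square of [TF]sat, half-mismatched binding gives half-maximal scaled expression for any saturating concentration, as the constraints require.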
We report results from the cases where [TF]sat takes the values 10, 100, and 1000. In many cases, we found heterozygote and homozygote phenotypes to be so similar that their differences could be hard to detect in empirical studies. To graphically illustrate the effects of detectability, we overlay the genotype-dominance maps with white opacity masks, grading from opaque at t = 0 through translucency to transparency at t = 1, where t is the detectability parameter, with the effect of making the underlying genotype-dominance map increasingly visible as t increases. As a heuristic, we treat scaling function κ as a constant arbitrarily set to 4 with a maximum of t = 1; i.e., dominance is undetectable when homozygote phenotypes are equal and always detectable when their difference equals or exceeds 1/4.
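The detectability heuristic amounts to a clipped linear function of the homozygote difference. The sketch below uses κ = 4 as stated above. The linear mapping from t to mask opacity is an assumption made for illustration, since the text specifies only the endpoints (opaque at t = 0, transparent at t = 1).

```python
def detectability(p11: float, p22: float, kappa: float = 4.0) -> float:
    """t = kappa * |P11 - P22|, capped at 1."""
    return min(1.0, kappa * abs(p11 - p22))


def mask_opacity(t: float) -> float:
    """Opacity of the white mask: 1 (opaque) at t = 0, 0 (transparent) at t = 1.

    The linear grading between the endpoints is assumed, not taken from the source.
    """
    return 1.0 - t


for p11, p22 in [(0.50, 0.50), (0.90, 0.80), (1.00, 0.75), (1.00, 0.10)]:
    t = detectability(p11, p22)
    print(p11, p22, t, mask_opacity(t))
```

With κ = 4, any homozygote difference of 1/4 or more is treated as fully detectable, and smaller differences are masked in proportion.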
In three-locus pathways, we used Equation 2 and Equation 4, with appropriate subscripts, to calculate φC. For simplicity, we assume [TF]sat is the same for both regulatory steps, i.e., [A]sat = [B]sat = [TF]sat. All analyses were done using Mathematica (Wolfram Research, Inc., 2015) and code used to generate the figures is available on GitHub at https://github.com/adamhporter/Mendelian-dominance-via-transcription-factor-binding.git.
Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.
Results
We compare three types of polymorphism (Figure 1B). Polymorphism in the cis-regulated B locus is represented as AA→1B2B; that in the TF protein-coding region is A1A2→BB; and variation in TF dosage (i.e., allele-specific [TF] as determined by upstream expression) is 1A2A→BB. In the three-locus AA→BB→CC pathway, we consider the propagation and detectability of dominance at locus A with respect to expression at downstream locus C (φC) and explore genetic background effects when loci B and C are polymorphic or have imperfect binding.
Genotype–phenotype maps
The shapes of the G–P maps differ depending on which site is polymorphic. In the 1A2A→BB case (Figure 2, A–C) with maximal TF binding (m = 0), φ is low when both alleles are at low dosage ([TF]), climbing toward high expression as [TF] of both alleles rises to saturating concentration [TF]sat. The effect is very sensitive to [TF]sat, such that the region of detectably lower φ is confined to the very bottom left corner of Figure 2C when [TF]sat is high. The drop-off in φ is proportional to their sum, [TF], therefore perpendicular to the [1A] = [2A] diagonal.
In the A1A2→BB case (Figure 2, D–F) at [TF]sat, φ depends on competitive binding of the TF variants to the cis-sites they regulate (Equation 2 and Equation 4). φ is high as long as either TF binds tightly (mA1→B or mA2→B is low), yielding a characteristic L-shaped ridge on the density plot, indicating dominance of the tighter-binding allele. Increasing [TF]sat (Figure 2, E and F) broadens and flattens the ridge.
In the AA→1B2B case (Figure 2, G–I), the expression of the two B-allele copies is additive (Equation 3) and at [TF]sat, peak φ occurs when both alleles perfectly match the TF (mA→1B = mA→2B = 0). Expression falls away on both axes, leaving a characteristic arc on the density plot (Figure 2G), curving opposite the direction of the A1A2→BB case. Increasing [TF]sat produces a more plateaued ridge that extends further out along the mA→1B = mA→2B diagonal, visible as a more squared-off arc on the density plot (Figure 2, H and I).
Dominance in expression level φ
Dominance at the A locus with respect to φ emerges when variation occurs in the TF (the 1A2A→BB and A1A2→BB cases), with different patterns (Figure 3, A–F). However, when variation occurs at the cis site (the AA→1B2B case) expression is always codominant (d = 0.5; not illustrated) due to the additivity of the products of locus B (Equation 3).
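The codominance of cis-site heterozygotes follows directly from additivity, as the sketch below illustrates. It assumes the single-site occupancy form θ = [TF]/([TF] + (ΔK)^m) and assumes that each promoter copy contributes half of the unscaled expression. The mismatch values are arbitrary and the function names are hypothetical.

```python
def occupancy(tf: float, m: float, delta_k: float) -> float:
    """Assumed single-site occupancy of a homozygous TF on one promoter copy."""
    return tf / (tf + delta_k ** m)


def cis_expression(m_1b: float, m_2b: float, tf_sat: float) -> float:
    """Unscaled expression with a homozygous TF and two cis-site alleles (1B, 2B)."""
    delta_k = tf_sat ** 2
    # The two promoter copies are occupied independently, so their contributions add.
    return 0.5 * (occupancy(tf_sat, m_1b, delta_k) + occupancy(tf_sat, m_2b, delta_k))


def cis_dominance(m_1b: float, m_2b: float, tf_sat: float) -> float:
    p11 = cis_expression(m_1b, m_1b, tf_sat)
    p22 = cis_expression(m_2b, m_2b, tf_sat)
    p12 = cis_expression(m_1b, m_2b, tf_sat)
    return (p11 - p12) / (p11 - p22)


for tf_sat in (10.0, 100.0, 1000.0):
    print(tf_sat, cis_dominance(0.1, 0.6, tf_sat))
```

Because the heterozygote is the arithmetic mean of the two homozygotes, d equals 1/2 regardless of [TF]sat or the mismatch values, provided the expression floor is not reached.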
When TF binding varies (the A1A2→BB case; Figure 3, A–C), the TF allele with higher binding affinity (lower m) has a competitive advantage and dominant expression. The isoclines follow the diagonal when m is low but flare at higher m such that the competitive binding effect becomes much weaker. In this range, the occupancy of each allele is so low that the TFs effectively cease to compete and the phenotype approaches additivity (i.e., diploid φ* of Equation 3 approaches haploid θ of Equation 1 as m goes to 1). [TF]sat has a strong effect on dominance due to its effect on competition. When [TF]sat is high (Figure 3C), small changes in binding affinity can produce large changes in d, particularly when m < 0.5, whereas much larger changes in m are required for the same effect at [TF]sat = 10 (Figure 3A). Polymorphism in the B locus has no effect on the dominance of A1 in the A1A2→1B2B case.
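To make the dependence on [TF]sat concrete, the sketch below computes d for a coding-region heterozygote under an assumed competitive-binding form, in which each variant's weight is its dosage divided by (ΔK)^m. Both alleles are held at [TF]sat/2, and the mismatch values 0.1 and 0.2 are chosen to match the combination discussed for Figure 3B. The code is illustrative and may differ in detail from the authors' implementation.

```python
def expression(a1: float, a2: float, m1: float, m2: float, delta_k: float) -> float:
    """Total promoter occupancy by two competing TF variants (assumed form)."""
    w1 = a1 / delta_k ** m1
    w2 = a2 / delta_k ** m2
    return (w1 + w2) / (1.0 + w1 + w2)


def binding_dominance(m1: float, m2: float, tf_sat: float) -> float:
    """d for allele A1 in the A1A2 -> BB case at saturating, equal dosages."""
    delta_k = tf_sat ** 2
    half = tf_sat / 2.0
    p11 = expression(half, half, m1, m1, delta_k)  # A1A1 homozygote
    p22 = expression(half, half, m2, m2, delta_k)  # A2A2 homozygote
    p12 = expression(half, half, m1, m2, delta_k)  # A1A2 heterozygote
    return (p11 - p12) / (p11 - p22)


for tf_sat in (10.0, 100.0, 1000.0):
    print(tf_sat, round(binding_dominance(0.1, 0.2, tf_sat), 3))
```

Under these assumptions, the same difference in mismatch gives d further below 1/2 (stronger dominance of the tighter-binding allele) as [TF]sat increases. At [TF]sat = 100 the value is roughly 0.29, close to the d = 0.291 that the text reports for this combination.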
When TF dosage varies (the 1A2A→BB case; Figure 3, D–F), the allele with higher [TF] is dominant. The isoclines spread linearly from the bottom left corner of the density plot, where [TF] is low for both alleles, continuing into the region beyond the dotted line where total TF concentration is saturating ([1A] + [2A] ≥ [TF]sat). This dominance pattern is not substantially altered by [TF]sat, nor is it by m < 1 provided that the TF coding region and the cis-site are homozygous. These plots are therefore not shown.
When TF dosage and binding affinity both vary (the 1A12A2→BB case), the two sources of dominance interact cooperatively. Figure 3H shows the effect of allelic variation [1A1] and [2A2] under conditions where mA1→B = 0.1, mA2→B = 0.2, and [TF]sat = 100. For orientation, Figure 3H represents the effects of varying dosage [TF] for the binding strength combination lying at the position of the circle in Figure 3B; the circles in the centers of Figure 3B and Figure 3H represent the same conditions. At this saturating concentration (i.e., [A1]sat = [A2]sat = [TF]sat/2 at both circles), allele 1A1 is dominant with d = 0.291. Along the x-axis in Figure 3H, increasing the dosage of the more tightly binding 1A1 allele above [A1]sat increases its dominance, whereas decreasing its concentration pushes d back toward codominance until dominance is ultimately reversed and 1A1 becomes recessive. Along the y-axis, increasing the dosage of the 2A2 allele also counteracts dominance of the 1A1 allele, but the rate of change is much slower, and is only able to reverse the direction of dominance if [A1] and [A2] start well below [TF]sat/2.
Dominance is more sensitive to binding affinity than to differences in dosage. Figure 3G reflects the same conditions as Figure 3E but with a fivefold difference in allele-specific dosages, [A1] = [TF]sat/2 and [A2] = [TF]sat/10. For orientation, the orange crosses in Figure 3, E and G share common parameter settings. Under these maximum-binding conditions, 1A1 is dominant with d = 0.17. In Figure 3G, codominance is restored when binding of 1A1 is reduced by ∼20%, becoming recessive beyond that.
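The dosage example can be checked with a few lines of arithmetic. The sketch assumes that, at maximal binding (m = 0, so K = 1), occupancy depends only on total TF concentration, and that a homozygote carries twice the allele-specific dosage. Both are assumptions of this illustration, and the function names are hypothetical.

```python
def occupancy(total_tf: float, k: float = 1.0) -> float:
    """Assumed single-site occupancy; K = 1 corresponds to a perfect match (m = 0)."""
    return total_tf / (total_tf + k)


def dosage_dominance(a1: float, a2: float) -> float:
    """d for promoter allele 1A when the two alleles differ only in dosage."""
    p11 = occupancy(2.0 * a1)  # 1A1A homozygote: two copies at dosage a1
    p22 = occupancy(2.0 * a2)  # 2A2A homozygote: two copies at dosage a2
    p12 = occupancy(a1 + a2)   # heterozygote: one copy of each
    return (p11 - p12) / (p11 - p22)


tf_sat = 100.0
d = dosage_dominance(tf_sat / 2.0, tf_sat / 10.0)
print(round(d, 2))
```

Under these assumptions the result rounds to the d = 0.17 quoted above for the fivefold dosage difference. The three phenotypes also differ by less than 0.04 in unscaled occupancy, which is why dosage-based dominance is hard to detect at this [TF]sat.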
Detectability of dominance in the phenotype
Figure 4 shows the dominance maps of Figure 3, A–F overlaid by white opacity masks that obscure d in proportion to the similarity of the expression levels in homozygotes. Existing dominance due to dosage differences in the 1A2A→BB case is likely to be hard to detect unless [TF]sat is low and the dosages differ strongly (Figure 4D), and is likely to be detectable only in loss-of-expression alleles when [TF]sat is high (Figure 4, E and F). Detectability is higher in the A1A2→BB case, especially when [TF]sat is low (Figure 4A). As [A]sat increases (Figure 4, B and C), the region of low detectability of dominance broadens in the high- and low-expression regions of the corresponding G–P maps (Figure 2, E and F).
Three-locus pathways
Using a three-locus linear pathway (Figure 1C), we assessed the G–P maps and the dominance of the dosage (1A) and binding (A1) sites with respect to expression of locus C (φC). We will call this dominance dAC. We also examined the effects of genetic background by varying binding in the B→C step.
The G–P map of the TF-dosage case (the 1A2A→BB→CC case) with [TF]sat = 10 (Figure 5A) is a steeper version of the 1A2A→BB map (Figure 2A), such that φC is nearly maximal unless [A] is very low for both A alleles. Higher values of [TF]sat yield such steep G–P maps at low dosage that only virtual double-knockout 1A2A genotypes are able to appreciably reduce locus C’s expression (not illustrated). The G–P map for the TF-binding case (the A1A2→BB→CC case; Figure 5B shows [TF]sat = 10) takes the same general form as the A1A2→BB map (Figure 2D), but has a broad, high-expression plateau such that far greater A→B mismatch is required for an equivalent reduction of φC. At higher [TF]sat (not illustrated), the region of low expression becomes increasingly confined to the top right corner such that only very weak A→B binding affects φC at maximal B→C binding. The plateau becomes even broader and the shape squares off as it does for the A1A2→BB maps of Figure 2, D–F.
Dominance at the 1A and A1 sites propagates down the pathway to yield dominance with respect to φC. In the A1A2→BB→CC case, the transition of dAC from dominant to recessive lies parallel to the mA1→B = mA2→B diagonal when mB→C = 0 (Figure 5C and Figure 6A), and increasing [TF]sat steepens the transition (Figure 5D and Figure 6B). dAC is weaker and more sensitive to binding strength when mB→C = 0.5 but is slightly more detectable (Figure 6, C and D). dAC drops rapidly between 0.5 ≤ mB→C ≤ 1 and becomes very hard to detect, especially when [TF]sat is high (not shown). Here, without sensitive assays of expression, even unexpressed, completely recessive A alleles may go undetected.
Despite the differences in their G–P maps, dominance in the 1A2A→BB→CC case is almost identical to that of the 1A2A→BB case seen in Figure 3, D–F. However, its detectability (Figure 7, A–F) is much weaker (e.g., compare Figure 4D to Figure 7A). It increases slightly when mB→C = 0.5 (Figure 7C), but drops to become negligible beyond that (not shown). For higher levels of [TF]sat, dominance will only be detectable when one of the A alleles is unexpressed (Figure 7, B and D) unless assays are extremely sensitive.
Polymorphism in the genetic background can modify dAC, but the magnitude of the effect depends on the background type. The effect is greatest in the 1A2A→1B2B→CC case, where dosage differences in TF locus A coexist with binding site variation in the cis site of locus B. For illustration, we have chosen a combination where dosage of the 1A allele is maximal ([1A] = [TF]sat/2) and that of the 2A allele is low ([2A] = [TF]sat/10), at the position of the square in Figure 7A, such that dominance is strong and relatively easy to detect. Figure 6G shows the effect of binding variation in the A→B step, due to variation in the B-locus promoter (mA→1B vs. mA→2B; the coding region of TF A is monomorphic) at this dosage combination. As overall A→B binding decreases (mA→B increases), dAC increases (and becomes more detectable) until, ultimately, allele 1A becomes recessive. This effect is less pronounced as [TF]sat increases (Figure 6H), and also as the dosage differences decrease (not shown). Discontinuities in the isoclines of Figure 6, G and H, shown as dashed lines, are explained in Figure S1.
Other foreground/background combinations have weaker effects or none at all, and they mostly affect detectability. In Figure 6, E and F, we show an example for the A1A2→B1B2→CC case, where the genetic background consists of a high-functioning B1 allele (mB1→C = 0) and a low-functioning B2 allele (mB2→C = 0.9). Detectability is somewhat higher relative to the A1A2→BB→CC cases (Figure 6, panels A and B, respectively) but the effect on dAC is negligible. In the 1A2A→B1B2→CC case, detectability of dAC is largely determined by the dominant B allele in the B→C step, such that the genotype-dominance maps (not shown) are virtually indistinguishable from the 1A2A→BB→CC cases of Figure 7, A and B. There is no effect on dAC of variation in the C-locus promoter (the A1A2→BB→1C2C and 1A2A→BB→1C2C cases; not illustrated), but it reduces detectability by reducing φC.
Discussion
We find that dominance emerges in regulatory genetic pathways due to competitive molecular interactions between TF variants in heterozygotes as they bind to their shared promoters. Alleles with higher competitive ability are inevitably dominant with respect to their contributions to fractional occupancy. However, between cis-acting alleles, dominance cannot occur because the corresponding transcripts are independently expressed, such that overall expression is their sum. Dominance effects extend to expression of downstream loci in multi-step pathways, and polymorphism therein can generate genetic background effects. However, this form of dominance is likely to be phenotypically detectable only when TF dosages or binding strengths are in the range where overall gene expression levels differ measurably between homozygotes. We discuss each of these properties and their implications.
Binding dominance: a new mechanism for Mendelian dominance
Competition occurring between allelic TF variants for binding to the promoter sites they regulate (Equation 2; the A1A2→BB and 1A2A→BB interactions) represents a novel mechanism of dominance at the molecular level. The strength of the dominance depends on the biophysical properties of the interaction between TF molecules and the promoter sites to which they bind. When TF variants differ in their binding affinities (–ΔG), the variant with higher affinity is dominant (Figure 3, A–C). Dominance of the competing TF variants is also sensitive to TF availability ([TF]; Figure 3, D–F). This is because, when [TF] is low, fractional occupancy is likewise low and there is little competition at the binding site; the allelic effects approach additivity. Conversely, at high [TF], the more abundant TF allele more often occupies the promoter sequence, driving expression in the heterozygote. In contrast, polymorphism at the downstream cis-regulatory site (AA→1B2B) cannot contribute to dominance. This is because expression of the cis-regulated gene product, or respectively the TF variant, proceeds independently for each allele and overall expression is their sum. In the three-locus pathway, dominance in locus A can propagate down the pathway, such that A alleles can show dominance with respect to expression of locus C (φC; Figure 5, C and D, Figure 6, and Figure 7) as well as locus B (φB).
Binding dominance differs from the type of dominance that arises in metabolic pathways, which we call flux dominance, though the mechanisms of both are rooted in the biophysics of molecular interactions. In enzymes embedded in metabolic pathways, dominant alleles have higher rates of catalysis (kcat), thus producing a higher flux from substrate to product, and the degree of dominance is proportional to the difference in kcat values (Kacser and Burns 1981; Keightley and Kacser 1987; Keightley 1996). Flux dominance is sensitive to substrate saturation of the enzyme (Bagheri-Chaichian et al. 2003), analogous to the way [TF]sat affects the degree of binding dominance through fractional occupancy. Flux dominance does not explain the effects of mutations at regulatory loci (Keightley 1996) because regulatory genetic pathways do not experience flux.
Protein assembly dominance occurs when some subunits of complex proteins are expressed in inappropriate concentrations or have defective structures, disrupting the stoichiometry of protein assembly (Veitia 2003; Veitia et al. 2013). These represent downstream effects in the binding-dominance model, where subunit concentrations are determined by allele-specific φ1B1 and φ2B2, the expression levels of the B1 and B2 structural variants. The phenotype has a nonlinear relationship to gene expression, or in our notation, P = λφ becomes P = λ(φ1B1, φ2B2), where λ is now a function of the expression levels and binding properties of the other subunits in the complex.
Feedback dominance results from cases where a gene product autoregulates its expression. Omholt et al. (2000) analyzed feedback dominance using the biophysically relevant Hill (1910) equation, which permits serially repeated promoter site sequences; they considered only cases that lacked polymorphism in the TF coding region. Gene products could regulate either their own promoters (in our notation, 1A2A→1A2A) or the promoters of an upstream TF (1A2A→1B2B→1A2A). These pathways resemble the 1A2A→1B2B and 1A2A→1B2B→CC cases for which we find dominance, suggesting that feedback dominance may ultimately prove to be a special case of binding dominance. To our knowledge, the effects of polymorphism in the coding regions, thus competitive binding, on feedback dominance remain unexplored.
Diffusion dominance arises in network-based regulation of ontogenetic diffusion gradients, including morphogen concentrations, their diffusion and decay rates, and the threshold concentrations necessary to initiate a phenotypic response (Gilchrist and Nijhout 2001). Allelic variation affecting any of these components can show dominance in network output. While we have presented our model in the context of TF–promoter interactions, its principles apply broadly to interactions between any genetically determined, interacting regulatory molecules. Our simple regulatory pathways represent elements in these more complex diffusion-based networks, and we expect that dominance due to competitive binding will be inherent in them.
Detectability and cryptic dominance
Biophysical conditions that lead to especially high or low fractional occupancies, determining respectively the bottom left and top right corners of the G–P maps (Figure 2 and Figure 5, A and B), can mask dominance because the two homozygotes have very similar phenotypes. This can occur when m is similar for both alleles, or when allele-specific dosage [TF] is either high enough to saturate the binding site, or low enough that the binding site is rarely occupied by either allele. Even strong dominance at the level of molecular interactions can remain cryptic (e.g., compare Figure 3, D–F to Figure 4, D–F). When [TF]sat is high, only completely unexpressed 1A or 2A alleles will be detectable as recessive (Figure 4, E and F) and moderate to strong dominance will likely go undetected. Likewise, when both TF alleles have similar binding affinities or dosages, the alleles will be nearly codominant, lying along the region of the diagonals of Figure 3, A–F, but all individuals will also have nearly identical phenotypes. There, even polymorphism will be difficult to detect without genotyping; the degree of dominance may be of little practical importance in these cases anyway. Nevertheless, we predict that cryptic dominance will become apparent in assays of allele-specific expression levels (Mueller et al. 2013) in association with dosage and binding strength variation.
Detectability of dominance in the three-locus pathway (Figure 6 and Figure 7) is lower than in the two-locus pathway (Figure 4), because the signal is successively attenuated as it passes through the [TF] of downstream loci. In the three-locus pathway, the A→B step determines [B]. In general, [TF] must be low for differences in [TF] to affect expression (Figure 3, A–C; this is also why low detectability is widespread in the 1A2A→BB case of Figure 4, D–F). It takes relatively large changes in expression in the A→B step to appreciably change [B], and therefore to detect differences in expression at loci further downstream.
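The attenuation argument can be sketched with a toy calculation. This is an illustration under stated assumptions rather than the article's three-locus model: each step is treated as a simple saturating occupancy x/(1 + x), the B-locus expression values and the scaling constants are hypothetical, and the "relative gap" between homozygotes is an ad hoc stand-in for detectability.

```python
# Toy illustration (assumed forms and values, not the article's model):
# an A -> B -> C pathway in which the B -> C step is a saturating occupancy function.

def occupancy(x):
    """Fractional occupancy for a scaled TF concentration x = [TF]/K."""
    return x / (1.0 + x)

def relative_gap(hi, lo):
    """Homozygote difference as a fraction of the larger homozygote value."""
    return (hi - lo) / hi

# Hypothetical B-locus expression (A -> B occupancy) in the two A-locus homozygotes.
phi_b = {"11": 0.50, "22": 0.20}
gap_b = relative_gap(phi_b["11"], phi_b["22"])

gaps = {}
for b_scale in (0.1, 10.0):  # assumed [B]/K at full B expression: low vs. high
    # Expression of C is driven by the amount of B produced upstream.
    phi_c = {g: occupancy(b_scale * phi) for g, phi in phi_b.items()}
    gaps[b_scale] = relative_gap(phi_c["11"], phi_c["22"])
    print(f"[B]/K scale={b_scale:5.1f}  gap at B={gap_b:.2f}  gap at C={gaps[b_scale]:.2f}")
```

With these assumed numbers, the relative difference between homozygotes is nearly preserved at C when [B] is low and is strongly reduced when [B] approaches saturation, in line with the statement that [TF] must be low for differences in [TF] to affect expression.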
Effects of genetic background
Polymorphism in the genetic background can enhance, obscure, or even reverse binding dominance. There are two types of background effects in the two-locus regulatory interaction and several more in the three-locus pathway. In the two-locus pathway, dominance of coding site (A1 and A2) alleles at the TF locus is unaffected by polymorphism in the cis-regulated locus (i.e., dA1A2→1B2B = dA1A2→BB). However, when allele-specific TF dosages and binding affinities ([TF] and m) are permitted to vary in the 1A12A2→BB case, dominance of coding site TF variants is affected by polymorphism in their promoters (Figure 3G) and vice versa (Figure 3H). For a given TF coding region (A1A2) heterozygote, dominance modification is asymmetrical, being more effective when the dosage of the tighter-binding A allele is varied (Figure 3H). In contrast, for a given dosage (1A2A) heterozygote, changes in binding affinities of either allele have effects of similar magnitude (Figure 3G).
In the three-locus pathway, detectability of dAC is further modified by binding strength in the B→C step, such that it is least attenuated when mB→C = 0.5 (for [TF]sat = 10, compare Figure 6, A and C and also Figure 7, A and C; for [TF]sat = 100, compare Figure 6, B and D and also Figure 7, B and D). This is where the G–P map for the B-locus TF coding region is steepest (Figure 2, D–F), therefore where |φC.11–φC.22| (the denominator of dAC) is greatest. dAC becomes almost undetectable when mB→C is high because G–P maps are nearly flat there (Figure 2, D–F), such that the underlying two-locus dominance is nearly undetectable (Figure 4, B and C). Polymorphism at the coding site of locus B (the A1A2→B1B2→CC and 1A2A→B1B2→CC cases) modifies detectability only negligibly (Figure 6, G and H and Figure 7, G and H), because expression at the B→C step incorporates dominance of the tighter-binding allele. Modifying binding strength mB→C by changing the C-locus promoter has the same effect on dAC as does changing the B-locus coding region, but does not affect dominance in the BB→1C2C case because expression there is additive.
Flux dominance is similarly sensitive to allelic substitutions that occur up to several steps removed along a metabolic pathway (Kacser and Burns 1981; Keightley 1996). Bagheri-Chaichian et al. (2003) show that the downstream dominance effects are sensitive to enzyme saturation at intermediate steps, much as we see in binding site saturation in regulatory pathways (Figure 6 and Figure 7). Feedback dominance likewise shows downstream effects (Omholt et al. 2000) in pathways with the structure 1A2A→BB→(1A2A and CC), i.e., where the product of locus B coregulates a downstream locus C as well as an upstream locus A. In this case, dominance of the A1 allele is detectable in the expression of locus C. Omholt et al. (2000) did not directly assess attenuation of the signal due to saturation at intermediate steps; rather, they noticed and excluded it by considering only cases where homozygotes showed differences >25%.
Binding dominance is likely to interact with flux dominance. When locus B codes for a metabolic enzyme, flux dominance of allele B1 can be modified in 1A2A→B1B2 or A1A2→B1B2 interactions, provided that regulatory changes in B’s expression levels affect enzyme saturation in the three B-locus genotypes. Polymorphism in both the promoter and product site of the B locus, i.e., the 1A2A→1B12B2 and A1A2→1B12B2 cases, should further influence B1’s flux dominance by further changing relative allozyme concentrations. Conversely, we expect changes in allozyme concentration or kcat due to variation in 2B or B2 to modify, mask, or expose binding dominance at 1A or A1 when dA1 is assessed using genotype-specific fluxes in the metabolic pathway.
Beyond the regulatory pathway, TFs interact with other molecules in the cell that may be influenced by genetic background. These include direct interactions with proteins that regulate TF availability, spurious DNA, RNA, or protein binding, and indirect effects of physiological conditions such as pH (Mueller et al. 2013). These affect the [TF]/[TF]sat ratio but have negligible effect on dominance and its detectability: the isoclines of Figure 3, D–F and the detectability gradients of Figure 4, D–F are linear, therefore constant with respect to this ratio. However, dominance may be modified in cases where TF variants differ in their responses to the nonspecific background or are regulated differently (i.e., A1A2 cases with properties closer to the 1A12A2 case). For analytical convenience in this study, these secondary binding effects are subsumed into [TF] (see parameter reduction in File S1). The unreduced model of Tulchinsky et al. (2014) may be necessary in the design and interpretation of experiments.
Empirical studies
Consistent with the competitive binding model, cis-site heterozygotes typically show additive expression whereas trans heterozygotes commonly show dominance (Wray 2007; Guo et al. 2008; Tirosh et al. 2010; Zhang et al. 2011; Gruber et al. 2012; Meiklejohn et al. 2014), although some cis-site polymorphisms show patterns of dominance as well (Guo et al. 2008; Lemos et al. 2008). Our modeling suggests the possibility that unidentified polymorphism in regulatory loci upstream, in disequilibrium with the cis site, may be involved in at least some of these exceptions. Motifs with variable numbers of binding site repeats in the promoter region could also potentially produce binding dominance and even overdominance, as they do in feedback dominance (Omholt et al. 2000).
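The contrast between cis and trans heterozygotes can be made concrete with a small sketch. It is not the article's model: it assumes the generic occupancy form x/(1 + x), hypothetical dissociation constants K1 and K2, a TF pool that is not depleted by binding, and an illustrative dominance index in which 0 denotes additivity.

```python
# Toy contrast (assumed forms and values): additive cis-site heterozygote
# vs. dominant trans (TF coding-region) heterozygote.

def occupancy(x):
    """Fractional occupancy for a scaled TF concentration x = [TF]/K."""
    return x / (1.0 + x)

def dominance(phi11, phi12, phi22):
    """Hypothetical index: 0 = additive, 1 = heterozygote equals the higher homozygote."""
    return (2.0 * phi12 - phi11 - phi22) / abs(phi11 - phi22)

TF = 1.0            # total TF dosage, arbitrary units (assumed)
K1, K2 = 1.0, 4.0   # hypothetical dissociation constants for the two alleles

phi11 = occupancy(TF / K1)
phi22 = occupancy(TF / K2)

# Cis heterozygote: the two promoter copies differ in affinity and are bound
# independently from the same TF pool, so expression is the mean of the copies.
phi12_cis = 0.5 * (phi11 + phi22)

# Trans heterozygote: two TF variants, each at half dosage, compete for
# every promoter copy.
phi12_trans = occupancy(0.5 * TF / K1 + 0.5 * TF / K2)

d_cis = dominance(phi11, phi12_cis, phi22)
d_trans = dominance(phi11, phi12_trans, phi22)
print(f"d (cis heterozygote)   = {d_cis:.3f}")
print(f"d (trans heterozygote) = {d_trans:.3f}")
```

In this sketch the cis heterozygote falls at the homozygote midpoint because the two promoter copies do not compete with each other, whereas the trans heterozygote is shifted toward the tighter-binding homozygote because the TF variants compete for the same sites.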
Mueller et al. (2013) review empirical work on the biophysics of fractional occupancy in regulatory interactions. Gene expression is highly correlated with fractional occupancy of TFs on their binding sites, as our model assumes. Site-specific mutagenesis, using a variety of techniques for measuring binding affinity at primary vs. secondary (likely to be spurious background) binding sites, reveals strong differences in binding affinity among artificial promoter region alleles (1B and 2B alleles, in our notation). Some of these techniques are themselves based on measures of competitive binding among sites. Gaur et al. (2013) review studies demonstrating that TF and promoter region alleles show significant patterns of allele-specific gene expression in diverse model organisms. Nevertheless, to our knowledge, the effects of allelic variation in TF binding affinity and concentration on competitive binding and heterozygote gene expression, in diverse genetic backgrounds, remain to be studied.
Concluding remarks
In the discovery and documentation of regulatory architectures that drive gene expression, it has been necessary and appropriate to use inbred lines and careful breeding designs in model organisms to control for heterozygosity and to homogenize the genetic background. Outside of the laboratory, polymorphism is ubiquitous. Our understanding of gene regulation must account for it as we learn to predict and manipulate gene expression in the face of multilocus heterozygosity and, ultimately, as we design and implement new regulatory architectures, in diverse systems of importance in medical, agricultural, and fundamental research. A comprehensive, quantitative, and mechanistically robust theory of Mendelian dominance will likely be required, and binding dominance is likely to be a significant component of it.
Acknowledgments
We thank C. Babbitt and anonymous reviewers for valuable comments on the manuscript, and J. Birchler for an enjoyable and insightful conversation.
Footnotes
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.195255/-/DC1.
Communicating editor: E. A. Stone
Literature Cited
- Bagheri H. C., Wagner G. P., 2004. Evolution of dominance in metabolic pathways. Genetics 168: 1713–1735.
- Bagheri-Chaichian H., Hermisson J., Vaisnys J. R., Wagner G. P., 2003. Effect of epistasis on phenotypic robustness in metabolic pathways. Math. Biosci. 184: 27–51.
- Bintu L., Buchler N. E., Garcia H. G., Gerland U., Hwa T., et al., 2005a. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15: 116–124.
- Bintu L., Buchler N. E., Garcia H. G., Gerland U., Hwa T., et al., 2005b. Transcriptional regulation by the numbers: applications. Curr. Opin. Genet. Dev. 15: 125–135.
- Bond D. M., Baulcombe D. C., 2014. Small RNAs and heritable epigenetic modification in plants. Trends Cell Biol. 24: 100–107.
- Browning D. F., Busby S. J. W., 2004. The regulation of bacterial transcription initiation. Nat. Rev. Microbiol. 2: 1–9.
- Gaur U., Li K., Mai S., Liu G., 2013. Research progress in allele-specific expression and its regulatory mechanisms. J. Appl. Genet. 54: 271–283.
- Gerland U., Moroz J. D., Hwa T., 2002. Physical constraints and functional characteristics of transcription factor-DNA interaction. Proc. Natl. Acad. Sci. USA 99: 12015–12020.
- Gilchrist M. A., Nijhout H. F., 2001. Nonlinear developmental processes as sources of dominance. Genetics 159: 423–432.
- Gruber J. D., Vogel K., Kalay G., Wittkopp P. J., 2012. Contrasting properties of gene-specific regulatory, coding, and copy number mutations in Saccharomyces cerevisiae: frequency, effects, and dominance. PLoS Genet. 8: e1002497.
- Guo M., Yang S., Rupe M., Hu B., Bickel D. R., et al., 2008. Genome-wide allele-specific expression analysis using massively parallel signature sequencing (MPSS) reveals cis- and trans-effects of gene expression in maize hybrid meristem tissue. Plant Mol. Biol. 66: 551–563.
- Hill A. V., 1910. The possible effect of the aggregation of the molecules of hemoglobin. J. Physiol. 40: iv–viii.
- Hughes K. A., Ayroles J. F., Reedy M. M., Drnevich J. M., Rowe K. C., et al., 2006. Segregating variation in the transcriptome: cis regulation and additivity of effects. Genetics 173: 1347–1355.
- Kacser H., Burns J. A., 1981. The molecular basis of dominance. Genetics 97: 639–666.
- Keightley P. D., 1996. A molecular basis for dominance and recessivity. Genetics 143: 621–625.
- Keightley P. D., Kacser H., 1987. Dominance, pleiotropy and metabolic structure. Genetics 117: 319–329.
- Khatri B. S., Goldstein R. A., 2015. A coarse-grained model of sequence evolution and the population size dependence of the speciation rate. J. Theor. Biol. 378: 56–64.
- Lemos B., Araripe L. O., Fontanillas P., Hartl D. L., 2008. Dominance and the evolutionary accumulation of cis- and trans-effects on gene expression. Proc. Natl. Acad. Sci. USA 105: 14471–14476.
- Li Y., Varala K., Moose S. P., Hudson M. E., 2012. Inheritance pattern of 24 nt siRNA clusters in Arabidopsis hybrids is influenced by proximity to transposable elements. PLoS One 7(10): e47043.
- Meiklejohn C. D., Coolon J. D., Hartl D. L., Wittkopp P. J., 2014. The roles of cis- and trans-regulation in the evolution of regulatory incompatibilities and sexually dimorphic gene expression. Genome Res. 24: 84–95.
- Mendel G. J., 1866. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereines in Brünn 4: 3–47. Reprinted as Experiments in plant hybridization (in English) on MendelWeb, Ed. 97.1, edited by R. B. Blumberg. Available at: http://www.mendelweb.org/Mendel.html. Accessed: July 5, 2016.
- Mueller F., Stasevich T. J., Mazza D., McNally J. G., 2013. Quantifying transcription factor kinetics: at work or at play? Crit. Rev. Biochem. Mol. Biol. 48: 492–514.
- Omholt S. W., Plahte E., Øyehaug L., Xiang K., 2000. Gene regulatory networks generating the phenomena of additivity, dominance and epistasis. Genetics 155: 969–990.
- Phillips R., Kondev J., Theriot J., Garcia H., 2012. Physical Biology of the Cell, Ed. 2. Garland Science, New York.
- Stupar R. M., Springer N. M., 2006. Cis-transcriptional variation in maize inbred lines B73 and Mo17 leads to additive expression patterns in the F1 hybrid. Genetics 173: 2199–2210.
- Teif V. B., Ettig R., Rippe K., 2010. A lattice model for transcription factor access to DNA. Biophys. J. 99: 2597–2607.
- Teif V. B., Shkrabkou A. V., Egorova V. P., Krot V. I., 2012. Nucleosomes in gene regulation: theoretical approaches. Mol. Biol. 46: 1–10.
- Thomson D. W., Dinger M. E., 2016. Endogenous microRNA sponges: evidence and controversy. Nat. Rev. Genet. 17: 273–283.
- Tirosh I., Reikhav S., Segal N., Assia Y., Barkai N., 2010. Chromatin regulators as capacitors of interspecies variations in gene expression. Mol. Syst. Biol. 6: 435.
- Tulchinsky A. Y., Johnson N. A., Watt W. B., Porter A. H., 2014. Hybrid incompatibility arises in a sequence-based bioenergetic model of transcription factor binding. Genetics 198: 1155–1166.
- Veitia R. A., 2003. Nonlinear effects in macromolecular assembly and dosage sensitivity. J. Theor. Biol. 220: 19–25.
- Veitia R. A., Bottani S., Birchler J. A., 2013. Gene dosage effects: nonlinearities, genetic interactions, and dosage compensation. Trends Genet. 29: 385–393.
- Wolfram Research, Inc., 2015. Mathematica v10.3. Wolfram Research, Champaign, IL.
- Wray G. A., 2007. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8: 206–216.
- Wright S., 1934. Physiological and evolutionary theories of dominance. Am. Nat. 68: 24–53.
- Zhang X., Cal A. J., Borevitz J. O., 2011. Genetic architecture of regulatory variation in Arabidopsis thaliana. Genome Res. 21: 725–733.
Data Availability Statement
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.