Abstract
Promoters are thought to play a major role in adaptive evolution, yet little is known about the regulatory diversity within species, where microevolutionary processes take place. To investigate the potential for evolutionary change in the promoter of a gene, we examined nucleotide and functional variation of the Chalcone Synthase (CHS) cis-regulatory region in Arabidopsis thaliana. CHS is the branch point enzyme of a biosynthetic pathway that leads to the production of secondary metabolites influencing the interaction between the plant and its environment. We found that nucleotide diversity in the intergenic region encompassing the CHS promoter (π = 0.003) is compatible with neutral expectations. To quantify functional variation specifically as a result of cis-regulation of CHS mRNA levels, we developed an assay using F1 individuals in which distinct promoter alleles are compared within a common trans-regulatory background. We examined functional cis-regulatory variation in response to different stimuli representing a variety of CHS transcriptional environments (dark, light, and insect feeding). We observed extensive functional variation, some of which appeared to be independent of the trans-regulatory background. Comparison of functional and nucleotide diversity suggested a candidate point mutation that may explain cis-regulatory differences in light response. Our results indicate that functional changes in promoters can arise from a few mutations, pointing to promoter regions as a fundamental determinant of functional genetic variation.
INTRODUCTION
cis-Regulatory regions have been hypothesized to facilitate adaptive innovations because subtle nucleotide changes may generate novel phenotypes while preserving existing functions (Wray et al., 2003). Promoters and other cis-regulatory regions form protein/DNA complexes with trans-regulatory proteins (transcription factors), thereby integrating multiple inputs into the control of expression. The functional architecture of these regions consists of short and often redundant transcription factor binding sites interspersed within a background of apparently nonfunctional sequence. In some cases, the loss of a binding site through point mutation may be easily compensated by the remaining binding sites (Piano et al., 1999; Ludwig et al., 2000; Dermitzakis et al., 2003). Because near-perfect binding sites are often found in the background sequence of promoter regions, new binding sites can arise randomly through point mutation and permit a gain of function (Stone and Wray, 2001; Dermitzakis et al., 2003). Gene duplication has also been shown to promote gene expression variation, further supporting the idea that new functions can evolve readily from cis-regulatory changes (Gu et al., 2004).
The evolutionary dynamics of cis- and trans-regulatory regions remain poorly understood. Several studies suggest abundant neutral expression variation, most of which results from variation at the trans-regulatory level (Brem et al., 2002; Enard et al., 2002; Von Dassow and Odell, 2002; Khaitovich et al., 2004). In addition, a comparative study of cis-regulatory activity between two closely related Drosophila species, D. melanogaster and D. simulans, reveals extensive divergence at the cis-regulatory level, with most genes having undergone more cis- than trans-regulatory changes (Wittkopp et al., 2004). Thus, cis-regulatory evolution may be the prime determinant of expression changes between species, although trans-regulatory polymorphism is more important within species (Wittkopp et al., 2004). Studies in human, fish, Drosophila, and maize (Zea mays) document the role of specific cis-regulatory variants in adaptive evolution (Crawford et al., 1999; Wang et al., 1999; Schulte et al., 2000; Michalak et al., 2001; Bamshad et al., 2002; Lerman et al., 2003; Rockman et al., 2003). However, the fraction of naturally segregating cis-regulatory polymorphisms that are adaptive remains unknown.
A major hurdle in this field is that cis-regulatory function cannot be predicted from nucleotide sequence alone. Phylogenetic footprinting is currently the most widely used approach for identifying functional cis-regulatory regions. This method assumes that sequence conservation in noncoding regions indicates function, and it has proven useful in identifying some functionally important elements in promoter sequences (Koch et al., 2001; Boffelli et al., 2003; Cliften et al., 2003). However, the approach assumes that function has been conserved across lineages and therefore ignores the adaptive innovations that may accompany speciation. Understanding the short-term evolutionary dynamics of cis-regulatory regions thus requires a focus on functional change rather than on conservation.
Here, the within-species variation in a cis-regulatory region is characterized at both nucleotide and functional levels to evaluate the reservoir of cis-regulatory variability upon which natural selection could act. We developed an assay of cis-regulatory function that enables us to systematically evaluate within-species cis-regulatory variation, and we combined this approach with a survey of sequence diversity in the intergenic region containing a known promoter. This assay employs single nucleotide polymorphisms (SNPs) in the coding region of target genes to discriminate allelic cis-regulatory differences within F1 individuals, where allelic cis-variants operate in a common trans-regulatory environment (Cowles et al., 2002). We evaluated the relative proportions of expressed parental alleles present in F1 cDNA pools using pyrosequencing technology, similar to Wittkopp et al. (2004). Because both alleles are expressed in the same cells, this method efficiently controls for environmental variation, allowing sensitive detection of differences in expression.
We focused on the expression of the Chalcone Synthase gene (CHS) because it codes for the branch point enzyme of the flavonoid pathway. This pathway produces secondary metabolites directly involved in the interaction between the organism and its environment (Winkel-Shirley, 2001). For example, Arabidopsis thaliana mutants with enhanced flavonoid biosynthesis show higher insect resistance, in association with a physiological cost of resistance; hence, genes controlling flavonoid production may play a role in adaptive evolution (Johnson and Dowd, 2004). CHS is known to be upregulated in leaves by multiple biotic and abiotic environmental cues, such as insect feeding or light (Reymond et al., 2000; Jenkins et al., 2001; Wade et al., 2001; H. Vogel, personal communication). It is also developmentally regulated, as flavonoid production is elevated in flowers (Burbulis et al., 1996). cis-regulatory variation of CHS expression was examined in multiple conditions covering the diversity of CHS expression environments. Concomitantly, we have assessed molecular diversity in the intergenic region containing the well-characterized CHS promoter (Hartmann et al., 1998; Logemann and Hahlbrock, 2002).
Combining approaches at both nucleotide and functional levels, we investigate the evolution of the cis-regulatory region governing CHS expression in A. thaliana and address the following questions: (1) Is there within-species nucleotide variation in the CHS promoter? (2) Does the pattern of diversity contain the footprint of selection? (3) Is there functional CHS cis-regulatory variation segregating within A. thaliana? (4) What is the putative genetic basis of naturally occurring cis-regulatory variation?
RESULTS
Sequence Diversity
Approximately 1339 bp of the 5′ flanking region at the CHS gene (henceforth referred to as intergenic region) were sequenced in 28 A. thaliana accessions originating from different parts of the world (Figure 1). This region is known to contain the CHS promoter. Detailed serial deletion constructs have delimited the upstream region involved in the activation of the gene, and multiple cues have been identified that induce its expression (Hartmann et al., 1998; Logemann and Hahlbrock, 2002).
Figure 1.
Summary of Polymorphism Location and Frequencies in the 5′ Flanking Region of CHS in A. thaliana.
(A) Phylogenetic footprints identified by Koch et al. (2001) are outlined by black boxes along the bottom line. The gray box indicates a region conserved across multiple Arabidopsis species (J. de Meaux, unpublished data). Polymorphisms are indicated by black and gray triangles along the middle line according to their position on the sequence. Upper and lower triangles indicate segregating sites and indels, respectively. Gray triangles indicate singleton polymorphisms. Two indels are longer than one nucleotide. Black bars in the top graph show the frequency of each mutation in our sample (28 ecotypes). Black arrows highlight those mutations that are found within a phylogenetic footprint.
(B) Summary of DNA variation in the 5′ intergenic region upstream from CHS and in the CHS transcribed region (encompassing exon 1, intron 1, and part of exon 2) in A. thaliana. Numbers indicate the position in the sequence relative to the first nucleotide of the CHS start codon. Positions used for the pyrosequencing assay are indicated with bold letters. Functional cis-regulatory groups identified by our assay are indicated in the last column. *, Undetermined positions; $, SNP780; §, SNP840; #, mutation situated in an intron; (a), sequence from Ramos-Onsins et al. (2004).
Fourteen nucleotide polymorphisms and seven indels were distributed across the intergenic region (Figures 1A and 1B; see supplemental data online for alignment). Five of the seven indels were 1 bp long. Of the 21 polymorphisms, four are singletons (one nucleotide polymorphism and three indels). Fourteen haplotypes were observed in 28 accessions, and at least one recombination event was detected. We calculated π (nucleotide diversity, the average number of pairwise differences per site) separately for nucleotide polymorphisms and indels and obtained 0.0032 and 0.001, respectively (Table 1). Tajima's D test examines within-species polymorphism and evaluates whether its frequency distribution is compatible with the neutral-equilibrium model (Tajima, 1989). The patterns of polymorphism were not significantly different from neutral expectations, as indicated by nonsignificant Tajima's D values (D = 0.6052 and D = −0.356, respectively; both P > 0.1). Genome-wide estimates of diversity that incorporate the demographic history of A. thaliana into neutral expectations are available (Schmid et al., 2005). Of 195 loci analyzed by Schmid et al. (2005), 37 (19%) showed a Tajima's D value greater than that observed at the CHS intergenic region (S.E. Ramos-Onsins, personal communication). Thus, the pattern of polymorphism in the intergenic region is consistent with neutral expectations derived from either theoretical or empirical predictions. Fay and Wu's H was estimated using the intergenic sequence of Arabidopsis croatica as an outgroup (Table 1). This neutrality test uses an outgroup sequence to analyze the frequency distribution of derived mutations. A significant negative deviation from the neutral model (i.e., an excess of high-frequency derived mutations) has been associated with a possible episode of directional selection (Fay and Wu, 2000). In the CHS intergenic region, H was not significantly different from zero (H = 1.62; P > 0.1; Table 1).
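The statistics above all contrast estimators of the population mutation parameter θ computed from the same segregating sites. The Python sketch below illustrates how π, Watterson's θ, Tajima's D, and Fay and Wu's (unnormalized) H can be computed from derived-allele counts. It is a minimal sketch under stated assumptions: sites are biallelic and already polarized with an outgroup (such as A. croatica), indels are treated as substitutions as in Table 1, and significance testing, which normally relies on coalescent simulations, is omitted. The function name `neutrality_stats` is hypothetical and does not correspond to the software used in this study.

```python
import math


def neutrality_stats(derived_counts, n, length):
    """Compute pi, Watterson's theta, Tajima's D, and Fay and Wu's H.

    derived_counts: for each segregating site, how many of the n sampled
        sequences carry the derived allele (polarized with an outgroup).
    n: number of sequences sampled.
    length: number of sites surveyed, used to express diversity per site.
    """
    S = len(derived_counts)
    denom = n * (n - 1)
    # Nucleotide diversity: average pairwise differences, summed over sites.
    pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    # Watterson's estimator scales S by the harmonic number a1.
    a1 = sum(1 / j for j in range(1, n))
    a2 = sum(1 / j**2 for j in range(1, n))
    theta_w = S / a1
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    D = (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1)) if S else float("nan")
    # Fay and Wu's theta_H weights sites by the squared derived-allele count,
    # so high-frequency derived variants inflate it relative to pi.
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return {
        "pi_per_site": pi / length,
        "theta_w_per_site": theta_w / length,
        "D": D,
        "H": pi - theta_h,
    }
```

An excess of high-frequency derived variants makes θ_H exceed π and drives H negative, the signature Fay and Wu (2000) associate with hitchhiking, whereas an excess of rare variants pushes D below zero.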
Table 1.
Summary Statistics of DNA Variation in the CHS Region
Measures of Diversity | 5′ Upstream Intergenic Region (1339 bp, 28 Sequences): Segregating Sites | 5′ Upstream Intergenic Region: Indels (a) | Exon 1, Intron 1, and Exon 2 (1216 bp, 20 Sequences): Segregating Sites |
---|---|---|---|
Informative sites | 13 | 5 (four affecting a single position) | 2 (all silent) |
Singletons | 1 | 2 (one affecting one position) | 4 (all silent but one) |
Haplotypes | 9 | 7 | 6 |
Haplotype diversity | 0.897 | 0.755 | 0.636 |
Average number of nucleotide differences | 4.241 | 1.368 | 0.857 |
π | 0.0032 | 0.001 | 0.0007 |
θ | 0.0027 | 0.0012 | 0.0013 |
Total number of observed polymorphisms (expected number) | 14 (10.06) (b) | 5 (8.94) (b) | |
Tajima's D | 0.6052 (P > 0.1) | −0.356 (P > 0.1) | −1.24 (P > 0.1) |
Average number of observed differences from A. croatica (expected number) | 41.741 (45.68) (b) | 59.94 (56.62) (b) | |
Number of base pairs compared | 910 | 345.469 | |
H (outgroup A. croatica) | 1.6267 (P > 0.1) | 0.726 (P > 0.1) | |
(a) For the calculations, indels were coded as substitutions.
(b) Nonsignificant HKA test (χ2 = 1.531, P = 0.2160).
The comparison of polymorphism-to-divergence ratios across two loci (Hudson, Kreitman, and Aguadé [HKA] test) provides another way of evaluating whether the evolutionary history of a given genomic region departs from neutral expectations (Hudson et al., 1987). We sequenced the CHS coding region in 22 A. thaliana accessions. Polymorphism in the coding region was lower than in the intergenic region (π = 0.0007), with five silent polymorphisms observed, three of which are singletons (Figure 1B). The HKA test comparing promoter and coding sequence detected no significant difference (χ2 = 1.531; P > 0.1; Table 1).
Polymorphism in the CHS promoter region was further compared with two A. thaliana noncoding regions flanking different members of a trypsin inhibitor gene family (ATTI2 and ATTI4) that had contrasting patterns of gene expression variation (Clauss and Mitchell-Olds, 2004). Polymorphism found in the CHS intergenic region and in the noncoding region upstream from ATTI2 was comparable. By contrast, nucleotide diversity found in the noncoding region upstream from ATTI4 was an order of magnitude greater (nucleotide diversity, π, reached 0.00364 and 0.02686, respectively, for ATTI2 and ATTI4, M.J. Clauss, personal communication). Within-species expression level differences were shown to segregate within A. thaliana for ATTI4 but not ATTI2 (Clauss and Mitchell-Olds, 2004).
We also compared the polymorphism-to-divergence ratio of the CHS promoter region with that of the ADH promoter region studied by Miyashita (2001), using sequences from Arabidopsis gemmifera as an outgroup. The ADH promoter region harbors experimentally verified regulatory elements (Hoeren et al., 1998). No significant difference was detected (Table 2). Of the A. thaliana upstream flanking regions, 52% (CHS) and 84% (ADH) aligned with the corresponding A. gemmifera sequences, allowing us to compare 696 bp of the 1339-bp CHS intergenic region and 2049 bp of the 2366-bp ADH upstream region (Miyashita, 2001). Within A. thaliana, indel events were more common in the ADH intergenic region than in the CHS region (38 indels in 2366 bp versus 7 indels in 1339 bp). By contrast, the poorer alignment of the CHS region suggests that more indel events have accumulated at CHS than in the ADH flanking region since divergence from A. gemmifera.
Table 2.
Polymorphism to Divergence Ratios in CHS and ADH Intergenic Regions
Characteristics of Variation | Promoter CHS | Promoter ADH (a) |
---|---|---|
Number of sequences | 27 | 14 |
Number of polymorphic sites in A. thaliana | 14 | 42 |
Expected number of polymorphic sites | 17.88 | 38.12 |
Number of base pairs | 1335 | 2366 |
Average number of differences from A. gemmifera | 52.00 | 203.00 |
Expected number of differences | 48.20 | 207.00 |
Number of base pairs compared | 696 | 2049 |
HKA χ2 | 0.669 | |
P value | 0.413 | |
(a) Data from Miyashita (2001).
Phylogenetic footprinting throughout the Brassicaceae has identified conserved fragments in the intergenic region that were shown to promote light-induced expression (Koch et al., 2001). Two substitutions and one indel were detected in these conserved fragments. The two substitutions affected an H-box–like motif and a G-box motif (positions −248 and −515, respectively; see Figure 1). The G-box motif also contained a 3-bp indel (position −520).
To evaluate whether functional constraints may have prevented polymorphisms from arising in binding sites, we screened the intergenic region for putative binding sites using the PLACE database of plant cis-acting regulatory DNA elements (Higo et al., 1999). For each haplotype, approximately half of the 1339 positions were covered by a predicted binding site. Of the 21 polymorphisms, 10 (48%) affected a predicted binding site, a proportion close to the fraction of positions covered by predicted sites; polymorphic positions therefore appear to be distributed independently of predicted binding sites. The 5′ portion of the intergenic region (∼500 bp) has been shown not to affect the light response of the promoter (Hartmann et al., 1998). Diversity in this portion, however, did not appear to differ from that in the 3′ portion (Figure 1).
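Whether 10 of 21 polymorphisms falling in predicted binding sites departs from chance can be checked with an exact binomial test. The sketch below assumes that each polymorphism independently has probability p = 0.5 of falling in a predicted site (approximating "approximately half of the positions"), which ignores mutation-rate heterogeneity and differences among haplotypes. `binom_two_sided` is a hypothetical helper that uses the common definition of an exact two-sided P value: the summed probability of all outcomes no more likely than the observed one.

```python
from math import comb


def binom_two_sided(k, n, p):
    """Exact two-sided binomial P value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # tolerance so tied outcomes are included
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# 10 of 21 polymorphisms fell in predicted sites; p = 0.5 is an assumption
# standing in for "approximately half of the positions".
p_value = binom_two_sided(10, 21, 0.5)
```

With k = 10, n = 21, and p = 0.5, the observed outcome is the most probable one, so the P value is essentially 1, consistent with no depletion of polymorphisms in predicted sites.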
Expression Diversity
In F1 individuals, both parental cis-regulatory regions experience the same trans-regulatory environment; thus, the relative amount of parental mRNAs reflects their relative cis-regulatory activity. F1 progenies from crosses between Col-0 and each of seven other accessions (Cvi-0, Da1-12, Ei-2, Kas-1, Tsu-1, Lip-0, and Mrk-0) were analyzed. Col-0 harbored a mutation in the CHS coding region that allowed us to distinguish its CHS mRNA allele. The seven other accessions were chosen to represent the nucleotide variation in the promoter region. F1 seeds were grown in controlled conditions, and plants were examined in various CHS transcription environments. In total, eight CHS transcription environments were assessed (48 h dark, 8 h light, 24 h light, organ-specific expression in flowers, 9 h insect feeding and its control, and 24 h insect feeding and its control). To evaluate the relative amount of parental mRNAs, PCR was performed on the cDNA pool using primers that anneal to conserved regions flanking the SNPs that differentiate the parental CHS copies. Each parental allele was then amplified in proportion to its relative concentration in the template. The relative amount of parental alleles in the PCR product was quantified by pyrosequencing. In DNA extracted from a heterozygous diploid individual, the two parental genomes, and hence the two parental copies, are present at equal concentrations. DNA extracted from a leaf of one heterozygous F1 individual from each cross was therefore used to calibrate the null hypothesis of no allelic difference in promoter activity, following Cowles et al. (2002). A total of six to eight independent F1 individuals were analyzed for each treatment and each parental combination. An analysis of variance (ANOVA) was performed to evaluate the effects of both experimental and biological factors (see below). Separate analyses were performed for the light treatments and for each insect feeding experiment with its corresponding control.
No parent-of-origin effect was detected, ruling out the possibility that our results reflect imprinting or maternal effects (data not shown).
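The logic of the assay can be illustrated with a short sketch. It assumes that pyrosequencing yields one peak height per allele at the distinguishing SNP, and it expresses the cis effect as the allelic ratio in cDNA divided by the ratio in heterozygous DNA, where a 1:1 ratio is expected. The helper names and the use of Welch's t statistic are illustrative assumptions; the study itself evaluated these effects within the ANOVA framework described below.

```python
import math
from statistics import mean, variance


def allele_fraction(peak_a, peak_b):
    """Fraction of the pyrosequencing signal contributed by allele A."""
    return peak_a / (peak_a + peak_b)


def odds(p):
    """Convert an allele fraction into an A:B ratio."""
    return p / (1 - p)


def cis_log2_ratio(cdna_fracs, dna_fracs):
    """log2 of the cDNA allelic ratio relative to the heterozygous-DNA ratio.

    Heterozygous DNA carries both alleles at 1:1, so dividing by its ratio
    removes assay bias, assuming that bias is the same multiplicative factor
    in DNA and cDNA. A value of 0 means equal activity of both promoters.
    """
    return math.log2(odds(mean(cdna_fracs)) / odds(mean(dna_fracs)))


def welch_t(x, y):
    """Welch's t statistic for a difference in mean allele fraction."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se
```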
Technical Variation
Pyrosequencing measurements were repeated with two independent primer pairs, yielding highly correlated results (R2 = 0.87; P < 0.0001; Figure 2). Primer pairs nonetheless significantly affected the measurements, as shown by the significant primer-pair effect in the ANOVA (Table 3). Similarly, there was a significant effect of the pyrosequencing technical covariate, measured as monomorphic peak height (see Methods). A significant effect of PCR plates was also detected. We also investigated column and row effects by measuring the relative allelic signal of a single sample on two plates and found a significant column effect that varied among plates (data not shown). Complete randomization of samples before PCR amplification and pyrosequencing was therefore necessary to control for biologically irrelevant variation.
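Randomization of this kind can be scripted so that sample order is decoupled from treatment, cross, plate, row, and column. The sketch below assumes 96-well plates (8 rows × 12 columns) and a reproducible seed; `randomize_plate_layout` is a hypothetical helper, and a blocked design that balances treatments and crosses across plates may be preferable to a fully random one.

```python
import random
from string import ascii_uppercase


def randomize_plate_layout(sample_ids, seed=0, rows=8, cols=12):
    """Assign samples to random wells, filling as many plates as needed.

    Returns a dict mapping each sample ID to (plate number, well label).
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # reproducible random order
    per_plate = rows * cols
    layout = {}
    for k, sample in enumerate(ids):
        plate, well = divmod(k, per_plate)
        row, col = divmod(well, cols)
        layout[sample] = (plate + 1, f"{ascii_uppercase[row]}{col + 1}")
    return layout
```

Recording the seed keeps the layout reproducible, so plate, row, and column can later be entered as covariates in the ANOVA.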
Figure 2.
Correlated Pyrosequencing Measurements with Independent Primer Pairs.
Regression between pyrosequencing measurements on SNP780 obtained for ∼70 cDNA samples with two independent primer pairs (Set 719 and Set 700) (R = 0.93, P < 0.001).
Table 3.
ANOVA of Allele-Specific CHS Transcript Levels
Source | SS (Type III) (a) | df | Mean Square | F Ratio | P |
---|---|---|---|---|---|
Relative allele expression in response to light treatments (model R2 = 0.78) | | | | | |
Trial (b) | 0.17249 | 1 | 0.17249 | 4.43 | 0.0819 |
Primers | 0.01745 | 1 | 0.01745 | 15.71 | 0.0001 |
Treatment (c) | 0.42930 | 4 | 0.10733 | 33.71 | 0.0019 |
Cross | 0.07455 | 6 | 0.01242 | 16.17 | 0.0011 |
Technical covariate | 0.09176 | 1 | 0.09176 | 86.62 | <0.0001 |
Treatment × cross | 0.13544 | 24 | 0.00564 | 5.08 | <0.0001 |
Trial × treatment (d) | 0.01341 | 4 | 0.00335 | 3.02 | 0.0177 |
Trial × cross | 0.00454 | 6 | 0.00076 | 0.68 | 0.6659 |
Plate within trial | 0.23077 | 6 | 0.03846 | 34.63 | <0.0001 |
Error | 0.44093 | 483 | 0.00093 | | |
Relative allele expression after 9 h of insect feeding (model R2 = 0.57) | | | | | |
Trial (b) | 0.01562 | 1 | 0.01562 | 0.44 | 0.5214 |
Primer | 0.01191 | 1 | 0.01191 | 6.09 | 0.0143 |
Treatment (c) | 0.07948 | 2 | 0.03974 | 5.24 | 0.1547 |
Cross | 0.15651 | 6 | 0.02609 | 2.78 | 0.1149 |
Technical covariate | 0.02207 | 1 | 0.02207 | 11.28 | 0.0009 |
Treatment × cross | 0.14858 | 12 | 0.01238 | 6.33 | <0.0001 |
Trial × treatment | 0.01603 | 2 | 0.00801 | 4.1 | 0.0177 |
Trial × cross | 0.06096 | 6 | 0.01016 | 5.19 | <0.0001 |
Plate within trial | 0.14189 | 6 | 0.02365 | 12.09 | <0.0001 |
Error | 0.51643 | 264 | 0.00196 | | |
Relative allele expression after 24 h of insect feeding (model R2 = 0.62) | | | | | |
Trial (b) | 0.01836 | 1 | 0.01836 | 0.61 | 0.4697 |
Primer | 0.01983 | 1 | 0.01983 | 9.17 | 0.0027 |
Treatment (c) | 0.09018 | 2 | 0.04509 | 79.31 | <0.0001 |
Cross (e) | 0.04108 | 5 | 0.00822 | 95.13 | <0.0001 |
Technical covariate | 0.12713 | 1 | 0.12713 | 58.77 | <0.0001 |
Treatment × cross | 0.11060 | 10 | 0.01106 | 5.11 | <0.0001 |
Trial × treatment | 0.00059 | 2 | 0.00030 | 0.14 | 0.8727 |
Trial × cross | 0.00045 | 5 | 0.00009 | 0.04 | 0.9990 |
Plate within trial | 0.23549 | 6 | 0.03925 | 18.14 | <0.0001 |
Error | 0.51049 | 236 | 0.00216 | | |
(a) SS, sum of squares.
(b) Trial effect corresponds to independent repetition of the experimental gene induction treatments. This effect was treated as a random source of variation.
(c) The null expectation of no cis-regulatory allelic differences, modeled by amplifications from heterozygous DNA samples, is included in the treatments.
(d) Trial × treatment effect and cross effect are not significant if data from the Kas-1 × Col-0 cross are deleted from the analysis.
(e) Because the Kas-1 × Col-0 cross differed markedly from the other crosses, it was deleted from the analysis of the 24 h insect feeding trial.
In the allelic mixture, one or the other allele may be preferentially amplified. To account for this bias, raw data were calibrated using a K-corrected standard curve (see Methods), following Wittkopp et al. (2004). The treatments we used presumably induced distinct suites of genes, and a high PCR cycle number was required to obtain comparable amounts of PCR product across samples because CHS expression is low in the dark. We therefore wanted to exclude any treatment-dependent PCR bias that the K correction might not account for, and we investigated whether the composition of the cDNA pool could affect our measurements. We prepared serial volumetric RNA mixtures from homozygous genotypes placed under the different light treatments (dark, 8 h light, and 24 h light). Three independent reverse transcription reactions (replicates) were performed, and pyrosequencing measurements were corrected as described in Methods. Figure 3 shows the regressions obtained for the different light treatments. An analysis of covariance detected no significant interaction between light treatment and the levels of serial mixtures (F4,153 = 1.82, P > 0.1), indicating that our method accurately measures the relative amounts of parental alleles regardless of the environmental conditions in which cDNA pools were generated. In addition, there was no significant effect of experimental replicates (F2,4.02 = 2.02, P > 0.1).
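The calibration step can be sketched as an ordinary least-squares fit of the pyrosequencing measurement on the known allele fraction in serial mixtures, followed by inversion of the fitted line for unknown samples. This is a simplified stand-in under stated assumptions: the K correction of Wittkopp et al. (2004) and the analysis of covariance across treatments are not reproduced, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(measured, slope, intercept):
    """Invert the standard curve to estimate the true allele fraction.

    Estimates are clipped to [0, 1] because fractions outside that range
    are impossible and only reflect measurement noise.
    """
    estimate = (measured - intercept) / slope
    return min(1.0, max(0.0, estimate))
```

Fitting one such curve per light treatment and comparing their slopes is, in essence, what the treatment × mixture interaction in the analysis of covariance tests.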
Figure 3.
Standard Curves for Calibration of the Raw Data.
Regression obtained between the proportion of the Col-0 allele in serial volumetric mixtures and the pyrosequencing measurement. Serial volumetric mixtures were performed using cDNA pools resulting from different light treatments as well as genomic DNA. The regression coefficients (R) were 0.99, 0.90, 0.94, and 0.97 for DNA, dark, light 8 h, and light 24 h, respectively. ANOVA indicated no interaction between allelic proportion and treatment effect (P > 0.1).
Light Treatments
CHS cis-regulatory haplotypes were found to respond differently to the light environments as indicated by a significant effect of light treatments upon relative expression of CHS parental alleles within F1 heterozygotes (Table 3). Crosses were also shown to differ, and a significant interaction between light response and crosses was found. Homogeneous results were obtained in both experimental trials, as indicated by a nonsignificant trial effect. Relative to the Col-0 allele, six of seven parental alleles were less expressed in the dark and more expressed in both 8 h and 24 h light (Figure 4).
Figure 4.
Relative CHS cis-Regulatory Activity in F1 Individuals from Seven Parental Combinations in Response to Light Environment.
F1 individuals obtained from crosses between Col-0 and seven other ecotypes were analyzed. For each cross, relative levels of the Col-0 CHS mRNA allele are indicated for each of the light treatments (dark, light 8 h, and light 24 h). The bold black line indicates the expected value for equal promoter activity of both parental haplotypes in the cross. In the dark, the Col-0 parental allele is generally more expressed than its counterpart. The pattern is reversed in light 8 h and light 24 h. Only individuals from the Kas-1 × Col-0 cross depart from this pattern. Error bars show standard error.
Results obtained for the cross between Kas-1 and Col-0 were responsible for a large part of the genetic differences between crosses. In this cross, Col-0 and Kas-1 CHS mRNA levels in dark-maintained leaves did not differ significantly from the heterozygous DNA control. Although fewer repetitions were available for the cross between Ler-0 and Col-0, a similar trend toward equal allelic expression was observed. For the other light treatments, all parental combinations gave similar results.
We further tested whether the differential promoter activity between Col-0 and the other accessions depended on genetic background. We crossed Ei-2, Lip-0, and Cvi-0 with both Lz-0 and Ag-0, two accessions with promoter and coding sequences similar to those of Col-0, and evaluated relative CHS mRNA allelic abundance in three light treatments (six crosses in total; Table 4). Compared with the Ei-2, Lip-0, or Cvi-0 alleles, mRNA alleles from Lz-0 and Ag-0 were more abundant in the dark and less abundant after 8 h or 24 h of light. The Lz-0 and Ag-0 cis-regulatory alleles thus behaved similarly to the Col-0 cis-regulatory allele (significant treatment effect, F2,90 = 17.55, P < 0.001; Table 4). No significant cross effect was detected, indicating that these results were independent of genetic background. We also took advantage of an additional SNP (SNP840) distinguishing the Lip-0 CHS coding sequence from those of Ei-2, Ag-0, and Lz-0 and examined relative expression levels in response to light changes. The cross between Lip-0 and Ei-2 gave significantly different results, whereas crosses between Lip-0 and either Lz-0 or Ag-0 were comparable to the cross between Lip-0 and Col-0. The relative expression levels of the Lip-0 and Ei-2 alleles were independent of light treatment (F2,15 = 0.026, P = 0.97), in contrast with the other two crosses (Lip-0 with either Lz-0 or Ag-0; F3,31 = 10.86, P < 0.001). The Lip-0 allele was significantly more expressed than the Ei-2 allele in all three light conditions (F3,16 = 48.62, P < 0.001; minimum P < 0.001 for comparisons between the DNA measurement and the light treatments).
Table 4.
ANOVA of Allele-Specific CHS Transcript Levels in Crosses Not Involving the Col-0 Genotype
Source (Model R2 = 0.79) | SS (a) | df | MS (b) | F Ratio | P |
---|---|---|---|---|---|
Treatment (c) | 0.085 | 2 | 0.042 | 17.504 | <0.001 |
Cross (d) | 0.011 | 5 | 0.002 | 0.887 | 0.493 |
Technical covariate | 0.011 | 1 | 0.011 | 4.594 | 0.035 |
Cross × treatment | 0.032 | 10 | 0.003 | 1.323 | 0.23 |
Plate | 0.108 | 1 | 0.108 | 44.319 | <0.001 |
Plate × genotype | 0.031 | 5 | 0.006 | 2.546 | 0.033 |
Plate × treatment | 0.036 | 2 | 0.018 | 7.320 | 0.001 |
Error | 0.218 | 90 | 0.002 | | |
(a) SS, sum of squares type III.
(b) MS, mean square.
(c) Three treatments were compared (dark, 8 h light, and 24 h light).
(d) Six crosses were analyzed.
Thus, the light treatments allowed us to distinguish at least two functionally distinct groups of promoter alleles: group 1 (Col-0, Ag-0, and Lz-0) and group 2 (Lip-0, Ei-2, Mrk-0, Cvi-0, Tsu-1, and Da1-12). The Kas-1 and Ler-0 promoter alleles appear to form a third functional group (group 3) that does not differ from the Col-0 allele in its response to dark, although differences were detected in the insect feeding trial (Figure 1B; see below). The cross between Ei-2 and Lip-0 indicates that additional cis-regulatory diversity exists within these broadly defined groups.
The mean percentage of Col-0 mRNA allele reached a maximum of 60% in samples collected after 48 h in the dark and a minimum of 40% in samples collected after either 8 or 24 h of intense illumination (Figure 4). Thus, in the dark, mRNA from group 1 accessions (Col-0, Ag-0, and Lz-0) is up to 50% (1.5-to-1 ratio) more abundant than mRNA from group 2 accessions. By contrast, in light, mRNA from group 2 is ∼50% more abundant than mRNA from group 1. Given that CHS is induced by light and repressed in the dark (Jenkins et al., 2001), this suggests that the expression of group 2 CHS alleles responds more strongly to the light environment, whereas the expression of group 1 alleles is less strongly regulated by light.
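The conversion between allele fraction and fold difference used above is simple arithmetic, sketched here assuming the fractions are already calibrated so that 0.5 corresponds to equal activity of both promoters: a Col-0 fraction of 0.6 gives a 1.5-to-1 ratio in favor of Col-0, and a fraction of 0.4 gives 1.5-to-1 in favor of the other allele. The function name `allelic_ratio` is hypothetical.

```python
def allelic_ratio(p_col):
    """Col-0 allele expression relative to the other parental allele.

    p_col: calibrated fraction of Col-0 transcripts in an F1 cDNA pool.
    """
    if not 0 < p_col < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return p_col / (1 - p_col)


for label, p in [("dark", 0.6), ("light", 0.4)]:
    r = allelic_ratio(p)
    # Report the fold difference in favor of whichever allele is more abundant.
    favored = "Col-0" if r >= 1 else "other allele"
    print(f"{label}: {max(r, 1 / r):.2f}-fold in favor of {favored}")
```

The same arithmetic turns a Col-0 fraction of ∼0.3 into a roughly 2.3-fold excess of the other allele.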
In samples collected from flowers, group 1 cis-regulatory alleles showed weaker activity than the other alleles to which they were compared. Results for organ-specific expression in flowers were thus similar to those for leaf tissue under 8 h or 24 h of light induction. Because flower development is also regulated by light, organ-specific expression in flowers is not independent of the light environment, and the specific effect of the developmentally regulated CHS transcription environment upon relative cis-regulatory activity is difficult to assess. However, our results indicate that the flower-specific transcriptional environment has the same effect as the (very stressful) 8 h or 24 h strong light environments.
Insect Treatments
Leaves were collected after either 9 h or 24 h of insect feeding, together with control leaves from individuals upon which no larva had been placed. We analyzed results obtained after 9 h and 24 h separately, in comparison with their corresponding controls. For 9 h of insect feeding, no significant treatment effect was found, but a significant interaction between crosses and treatments was observed (Table 3). The SLICE analysis (see Methods) detected significant heterogeneity among crosses for each of the two treatments (9 h insect feeding, F6,264 = 11.09, P < 0.0001; 9 h insect control, F6,264 = 9.42, P < 0.0001). Three of seven crosses showed significant heterogeneity across treatments (DNA, 9 h insect feeding, and 9 h control). Both control and insect-damaged leaves showed an excess of the Col-0 mRNA allele relative to the mRNA of the Cvi-0, Da1-12, and Ei-2 genotypes (least squares mean comparisons; maximum P = 0.0097). No effect of insect feeding on the differential response of parental alleles could be detected (P = 0.07; Scheffé's multiple mean comparison test between 9 h insect and control). Although no significant trial effect was detected, the trial × treatment and trial × cross interactions were significant. These interactions probably reflect our experimental conditions: for insect feeding trials, plants were placed in the shade under reduced light intensity, but temperature and light levels were not strictly controlled.
For 24 h insect feeding, the offspring of Kas-1 × Col-0 differed markedly (Figure 5). We therefore conducted the ANOVA without this parental genotype combination. A significant interaction between treatments and crosses was observed, although the treatment effect alone was not significant (Table 3). As for the 9 h insect treatment, no consistent effect of insect feeding on the differential response of parental alleles was detected. Nonetheless, Kas-1 and Col-0 showed significantly different responses in control relative to insect-damaged leaves. In insect-damaged leaves, both parental alleles were expressed at the same level, but in control leaves the Kas-1 allele was more highly expressed than the Col-0 allele (∼0.7 versus 0.3, i.e., more than a twofold expression difference). A comparable trend was observed for the Ler-0 genotype, although missing data prevented its inclusion in the ANOVA. With the Col-0 × Kas-1 cross excluded, the SLICE analysis detected significant heterogeneity among crosses within each treatment (24 h insect feeding: F(5,236) = 7.52, P < 0.0001; 24 h insect control: F(5,236) = 4.08, P = 0.0014).
Figure 5.
Relative CHS cis-Regulatory Activity in F1 Individuals from Seven Parental Combinations in Response to 24 h of Insect Feeding.
F1 individuals obtained from crosses between Col-0 and seven other ecotypes were analyzed. For each cross, relative levels of the Col-0 CHS mRNA allele in a 24 h insect-damaged plant and corresponding control are indicated. The bold black line indicates the expected value for equal promoter activity of both parental haplotypes, as measured on DNA from heterozygous individuals. Error bars show standard error.
Molecular Basis of Functional Change
At the nucleotide level, comparison of the Col-0 and Tsu-1 sequences reveals that group 1 promoter alleles differ from the other alleles by at least one mutation, located in the intergenic region at position −674 (Figure 1B). At this position, an adenine replaces the ancestral guanine. The mutation affects a cis-element involved in the light repression of Asparagine Synthetase expression in pea (Pisum sativum) (Ngai et al., 1997). This element lies within a fragment spanning positions −688 to −641 that is conserved in the genus Arabidopsis (J. de Meaux, unpublished data). The putative functional consequences of this mutation are discussed below.
At alignment position −739, the sequence TTGGCA has changed to TTGACA in 9 of 28 accessions. TTGACA contains the W-box core TGAC, to which WRKY transcription factors bind, and occurs at elevated frequencies in defense-related genes (Maleck et al., 2000). This potential gain-of-function polymorphism lies in a region where the AT content alternates in phase with the helical repeat of DNA, a possible nucleosome-positioning signal (Thastrom et al., 1999). If the DNA is wrapped in a nucleosome, the TTGACA site is likely to be spatially close to a region ∼80 bp away that is highly conserved across Arabidopsis species, and the two could participate in interactions among the transcription factors controlling CHS expression. However, the two accessions used in the functional assay that carry this newly evolved motif (Cvi-0 and Kas-1) did not show correlated cis-regulatory characteristics.
Finally, the Ler-0 and Kas-1 functional group (group 3) could not be correlated with the specific occurrence of a single mutation in the intergenic region. No difference from the Col-0 haplotype could be identified that was simultaneously shared by Ler-0 and Kas-1 and absent from the other accessions used in the functional study. This indicates that either several mutations compensate for each other to produce a similar functional phenotype or that the molecular basis of the group 3 functional alleles lies outside the region we have sequenced.
DISCUSSION
Functional, genetically based variation within species is a prerequisite for adaptive evolution. However, within-species surveys of cis-regulatory diversity are scarce, and little is known about the adaptive importance and fate of this diversity. We combined an analysis of nucleotide variation in the CHS promoter region with a robust assay of CHS cis-regulatory variation to evaluate the reservoir of diversity in this region that may contribute to evolution in A. thaliana. Although selection has not left any detectable signature in the CHS promoter region, we demonstrate that functional cis-regulatory polymorphisms segregate within A. thaliana. In addition, our study indicates that functional cis-regulatory variation can arise from a small number of mutations.
Neutral Pattern of Diversity in the Promoter Region
We surveyed nucleotide variation in the intergenic region upstream from the CHS coding region. The 3′ part of this region has been shown experimentally to drive gene expression in response to light and pathogen elicitor molecules (Hartmann et al., 1998; Logemann and Hahlbrock, 2002). The pattern of polymorphism segregating in this region is compatible with theoretical and empirically based neutral expectations, as indicated by the various summary statistics available to test for deviations from the neutral equilibrium model (Table 1). Neutral evolution is also indicated by the fact that the ratio of divergence to polymorphism in the CHS promoter region is statistically indistinguishable from that of the CHS transcribed region and from that of the ADH 5′ upstream region (Miyashita, 2001; Ramos-Onsins et al., 2004). Nonetheless, it remains possible that selection has acted locally in some populations; our sampling does not allow us to test this hypothesis. The CHS promoter region differs from the ADH promoter region in the extent to which A. gemmifera and A. thaliana sequences can be aligned: the alignable region between the two species comprises ∼50% of the CHS upstream region but >80% of the ADH region. Similar results are found when comparing the A. thaliana CHS promoter region to Arabidopsis croatica. Intraspecific data from additional loci are needed to evaluate how typical the CHS intergenic region is of the functional constraints acting on cis-regulatory regions and, thus, of the levels of polymorphism and divergence that can be expected in functional noncoding regions. As a first approximation, our comparison with the ATTI gene family indicates that the levels of diversity in the CHS intergenic region are comparable with those found in regions showing less expression variation (Clauss and Mitchell-Olds, 2004).
Functional Variation
To assess functional cis-regulatory variation, we paired distinct parental promoter alleles within F1 heterozygotes and assessed the relative expression levels of parental mRNA alleles that are distinguishable by at least one nucleotide polymorphism. This approach has been used previously to study both within- and among-species cis-regulatory polymorphisms (Cowles et al., 2002; Yan et al., 2002; Wittkopp et al., 2004). Here, we adapted the method to assess cis-regulatory variation in a broad range of inducing or repressing transcriptional environments. By controlling for multiple sources of experimental bias (PCR, technical covariates, and experimental error) and ruling out possible bias unrelated to our biological purpose (e.g., no cDNA pool–dependent bias and no maternal effect), we have developed a robust assay for identifying significant functional differences among alleles. This assay efficiently controls for changes in expression as a result of trans-regulatory and environmental differences. Indeed, the standard deviation for pyrosequencing measurements performed on cDNA samples is comparable in magnitude to that for measurements on DNA samples from heterozygous individuals. Most of the variation in our assay is actually due to PCR rather than to environmental differences affecting expression in different F1 individuals. This contrasts with methods evaluating variation in total gene expression, such as cDNA microarrays or real-time PCR, where plant-to-plant variation in total expression levels complicates intraspecific evaluation of expression variation (Townsend, 2004).
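The core comparison of the allele-specific assay can be sketched as follows, assuming that for one F1 cross in one treatment we have pyrosequencing proportions of one parental allele measured on cDNA and on genomic DNA from heterozygotes. The function names and the use of Welch's t statistic are illustrative assumptions, a much simpler stand-in for the mixed-model ANOVA described in Methods; the point is only that the DNA measurements define the value expected under equal promoter activity and the cDNA mean is compared against that baseline.

```python
from math import sqrt
from statistics import mean, variance


def relative_cis_activity(cdna, gdna):
    """Shift of the cDNA allele proportion away from the genomic-DNA baseline.

    cdna, gdna: proportions of one parental allele (e.g. Col-0) measured by
    pyrosequencing on cDNA and on genomic DNA from heterozygous F1 plants.
    Both alleles share one trans-regulatory background in each F1, so a
    shift away from the DNA baseline is read as a cis-regulatory difference.
    """
    return mean(cdna) - mean(gdna)


def welch_t(cdna, gdna):
    """Welch's t statistic for that shift (sample variances, unequal n allowed)."""
    se = sqrt(variance(cdna) / len(cdna) + variance(gdna) / len(gdna))
    return relative_cis_activity(cdna, gdna) / se


if __name__ == "__main__":
    # Hypothetical measurements for one cross in one treatment.
    cdna = [0.58, 0.62, 0.60, 0.61]
    gdna = [0.50, 0.49, 0.51, 0.50]
    print(round(relative_cis_activity(cdna, gdna), 4), round(welch_t(cdna, gdna), 2))
```

Because each F1 plant carries both alleles, plant-to-plant differences in total expression largely cancel out of the proportion, which is why a small DNA-like spread can be expected on the cDNA side.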
CHS expression is repressed in the dark and induced in light (Jenkins et al., 2001). Its expression is also regulated by the circadian clock, is induced by herbivory or elicitors (Reymond et al., 2000; Logemann and Hahlbrock, 2002), and is upregulated in flowers (Burbulis et al., 1996). We evaluated CHS cis-regulatory variation in all these transcription environments in A. thaliana and thus identified at least three functional groups of light-sensitive promoter alleles. Group 1 cis-regulatory alleles are loosely light regulated, whereas group 2 cis-regulatory alleles are tightly light regulated, being relatively more repressed in the dark and more induced in light. The third group of cis-regulatory alleles is found in the Kas-1 and Ler-0 genotypes, where CHS mRNA alleles are relatively more induced in the light but less repressed in the dark, as indicated by the higher level of Kas-1 mRNA allele measured in the early morning on control plants of the insect trial. These cis-regulatory differences are apparently independent of the genetic background in which they were observed because similar light responses were detected in multiple crosses. Our results demonstrate that extensive cis-regulatory variation exists within species and that the analysis of expression in a single environment underestimates cis-regulatory polymorphism.
The functional cis-regulatory differences we observe are small in magnitude and reach a maximum ratio of 1.5 to 1, as in the case of Col-0 versus Cvi-0 in the dark. Nevertheless, they are likely to substantially influence flavonoid production. CHS is a key enzyme in the flavonoid pathway, and its activity is thought to be regulated primarily at the transcriptional level (Mol et al., 1996; Noh and Spalding, 1998). In addition, it has been shown to form enzymatic complexes by interacting directly with other enzymes of the pathway (Burbulis and Winkel-Shirley, 1999). Thus, a 50% difference is likely to significantly modify the metabolite flux through the flavonoid pathway (Borevitz et al., 2000; Liu et al., 2002).
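Allele proportions and fold ratios are two views of the same quantity: if one parental allele accounts for a share p of CHS mRNA in an F1, the ratio between the two alleles is p/(1 − p). The 1.5:1 maximum therefore corresponds to a 0.6:0.4 split, and the ∼0.7:0.3 split reported for Kas-1 × Col-0 control leaves to roughly a 2.3-fold difference. A minimal sketch of the conversion (function names are illustrative):

```python
def allele_fold_difference(p):
    """Expression ratio of allele A to allele B when A accounts for a share p of the mRNA."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def allele_share(ratio):
    """Inverse conversion: share of allele A given an A:B expression ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    print(round(allele_share(1.5), 2))            # share implied by a 1.5:1 ratio
    print(round(allele_fold_difference(0.7), 2))  # the ~0.7 vs 0.3 split in Kas-1 controls
```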
Our results could also be explained by allele-specific differences in mRNA stability rather than in transcription. A genome-wide study of mRNA stability in A. thaliana indicates that unstable mRNAs are rare (1%), whereas allelic variation of expression levels seems more common, as indicated by studies in humans and maize (Bray et al., 2003; Guo et al., 2004). The CHS mRNA is not unstable (Gutierrez et al., 2002), and the SNPs distinguishing the alleles do not seem to affect the minimum free energy of mRNA folding, indicating that global pairing patterns are similar for all three mRNA variants (Hofacker, 2003). Together, these observations suggest that stability differences among these mRNA variants are unlikely.
The Molecular Basis of Functional Variation
Because the intergenic region was shown to influence CHS expression (Hartmann et al., 1998; Logemann and Hahlbrock, 2002), it is relevant to ask which promoter polymorphisms may explain the observed functional variation. Functional cis-regulatory changes can result from allele-specific differences in chromatin decondensation. However, the functional polymorphism uncovered in this study does not correlate with molecular changes in regions known to be associated with nucleosome positioning (Lomvardas and Thanos, 2002). Alternatively, allele-specific variation in methylation susceptibility has been shown to affect expression of the SUPERMAN gene in A. thaliana (Jacobsen and Meyerowitz, 1997). The genetic basis of that difference nonetheless remains unclear, and the phenomenon does not seem to be widespread, as it has not been observed at other loci. Our results instead suggest that variation in transcription-factor binding specificity is responsible for the functional cis-regulatory variation of CHS expression observed in A. thaliana.
We did not detect functional constraints indicative of purifying selection against new polymorphisms within putative transcription factor binding sites. Most of these putative binding sites are neither overrepresented nor relevant for CHS expression and, hence, may be nonfunctional (Sandelin and Wasserman, 2004). Nonetheless, for 3 out of 10 polymorphic putative binding sites, a prediction regarding potential functional relevance can be made using information from our functional assay, combined with a careful scrutiny of the binding site sequence context (see below).
We identified one point mutation at alignment position −674 that strictly correlates with the functional group of promoter alleles that are loosely light regulated. An experimental confirmation of the specific role of this mutation is not currently feasible. The changes in activity are likely to be too small to be reliably detected with a transient expression system using a reporter gene. Further allele-based methods are needed for testing the genetic basis of small changes in promoter activity. Despite this limitation, several lines of evidence strongly suggest that this mutation has functional consequences. This mutation occurs in a nucleotide island that is conserved among Arabidopsis species (J. de Meaux, unpublished data), within a nucleotide box that was first observed in the promoter of the pea Asparagine Synthetase, for which transcription is negatively regulated by light (Ngai et al., 1997). The tightly light-regulated promoter alleles, which harbor the wild-type box, are more upregulated in light than the loosely light-regulated alleles, which harbor the mutated box. Because the wild-type box characterized in pea participates in light-mediated repression of gene expression, this suggests that the mutated box causes stronger repression than the wild-type box. It is interesting to note that the G-to-A change, which presumably alters transcription-factor binding, is likely to change the structural configuration of DNA in the box (Brukner et al., 1995) and occurs in a position of less stringent nucleotide conservation (Ngai et al., 1997). This mutation suggests that functional cis-regulatory variation can result from relatively few changes; thus, the potential for adaptive evolution can readily be generated in promoter regions (Wray et al., 2003).
Three mutations occur in phylogenetic footprints; however, it was not possible to relate them to any functional allelic group. Two of these mutations are singletons carried by the Lz-0 or Ler-0 accessions and occur within a G-box. Although highly conserved in the Brassicaceae (Koch et al., 2001), this box was shown to be nonfunctional with respect to light regulation in A. thaliana (Hartmann et al., 1998). Consistent with this previous study, we could not detect a cis-regulatory difference correlating with these changes. A third mutation is carried by several Indo-Asian accessions (Kas-1, Sorbo, Shakhdara, and Hodja) and affects a highly conserved H-like box (Koch et al., 2001). Because Kas-1 and Ler-0 seemed to be similarly affected in their response to the biological clock but do not share this mutation, it is unclear whether this third mutation significantly affects expression in A. thaliana. Our result has consequences for the interpretation of phylogenetic footprints. Indeed, even if conservation indicates functional constraints at a macroevolutionary scale, we cannot exclude the possibility that such constraints have recently been relaxed at a microevolutionary scale, resulting in neutral variation within phylogenetic footprints.
We found a newly evolved WRKY transcription factor binding site (W-box) at relatively high frequency. Such boxes are usually overrepresented in defense-regulated promoters (Maleck et al., 2000). However, a careful analysis of the sequence context of this box indicates that, if functional, this new W-box is expected to interact with the nearby functional region because of possible nucleosome positioning. It is interesting to note that sequences carrying the newly evolved W-box never carry the mutations at any of three polymorphic positions, spaced by 9 and 7 bp, respectively, that immediately precede the highly conserved region starting at alignment position −688 (see above). All three of these mutations affect a TGA trinucleotide, and TGA is the most flexible trinucleotide on the bendability scale of Brukner et al. (1995). This W-box was not associated with functional cis-regulatory variation in our assay. Nonetheless, it illustrates how our understanding of promoter function may benefit from investigating the sequence context surrounding putative binding sites. In the future, characterizing naturally segregating cis-regulatory diversity should help elucidate promoter function because it generates molecular hypotheses for functional variation in promoter regions.
METHODS
Sequencing
All Arabidopsis thaliana seeds were obtained from the Nottingham Arabidopsis Stock Centre. Young leaves from each accession were ground in liquid nitrogen, and DNA was subsequently purified following a standard cetyltrimethylammonium bromide (CTAB) protocol. CHS is single copy in the A. thaliana genome (Koch et al., 2000). To amplify the CHS intergenic region, we designed a forward primer in the closest adjacent putative open reading frame 5′ upstream from CHS (5′-TCTCCGGTCTGCATTGTGC-3′) and a reverse primer in the first CHS exon (5′-GTAGTCAGGATACTCCGC-3′). The adjacent open reading frame is of unknown function and is annotated Mac12.12 in the A. thaliana genome (http://www.arabidopsis.org). PCR was conducted in a total volume of 25 μL containing 2 mM MgCl2, 1 unit of Taq polymerase (Qiagen, Valencia, CA), 0.2 mM deoxynucleotide triphosphates, and 2 pmol of each primer, buffered according to the manufacturer's recommendations. PCR products were obtained with 35 cycles as follows: 94°C, 30 s/52°C, 30 s/68°C, 3 min. Two independent PCRs were performed for each accession, and their pooled products were sequenced directly on both strands with an ABI3700 capillary sequencer (Applied Biosystems, Foster City, CA) using primers placed approximately every 500 bp. In case of ambiguous sequence results, the whole procedure was repeated. Sequences were assembled with Seqman 5.0 (DNASTAR, Madison, WI), and each variable site was checked by examining sequence chromatograms. The orthologous region was also sequenced from Arabidopsis croatica and Arabidopsis gemmifera, two species closely related to A. thaliana, using primers 5′-AGGACAATCGTTGATCCAG-3′ and 5′-GTAGTCAGGATACTCCGC-3′. PCR was conducted as described above, except that the annealing temperature was 54°C. Two independent PCRs were performed, and products were cloned using the TOPO TA cloning kit (Invitrogen Life Technologies, Paisley, UK).
Six clones per PCR were sequenced as above. The CHS exons 1 and 2 were sequenced in 20 A. thaliana accessions as well as in A. croatica following Ramos-Onsins et al. (2004).
Expression Analysis
Crosses
A first set of eight crosses was performed between accession Col-0 and each of the following accessions: Lip-0, Mrk-0, Kas-1, Cvi-0, Ler-0, Tsu-0, Da(1)-12, and Ei-2. These nine accessions were chosen to represent the sequence diversity found in the intergenic region. Reciprocal crosses were performed to control for maternal effects. Seeds from each reciprocal cross were sown in a sterilized potting soil/vermiculite mix and stratified in the dark at 4°C for 7 d, followed by 7 d in Voetsch reach-in chambers (12 h day, 20°C day temperature, 16°C night temperature, 70% humidity) for germination. Germinated seedlings were then transplanted into single pots and assigned to random positions in the reach-in chambers under the same conditions. For each reciprocal cross, 20 seedlings (40 seedlings in total for each parental combination) were grown for 5 weeks. A second set of seven crosses was performed among Ag-0, Lz-0, Cvi-0, Ei-2, and Lip-0 (Ag-0 × Cvi-0, Ag-0 × Ei-2, Ag-0 × Lip-0, Lz-0 × Cvi-0, Lz-0 × Ei-2, Lz-0 × Lip-0, and Lip-0 × Ei-2); these crosses were analyzed only in the light treatments (see below).
Light Treatments
CHS gene expression is repressed in the dark and strongly induced in light (Jenkins et al., 2001). Plants were placed in the dark for 48 h at 20°C followed by 24 h of strong white light at 15°C (700 mmol/cm2), and samples were collected at the end of the dark phase, after 8 h of light, and at the end of the light phase. We thus analyzed three independent light environments: dark, 8 h strong light, and 24 h strong light. Soil temperature in the pots reached 25°C as a consequence of illumination. One newly expanded leaf of ∼1 cm in length was harvested from each plant. Leaves were immediately put into liquid nitrogen either after 48 h in the dark or after 8 h or 24 h in strong light. To control for biological clock effects on CHS expression levels between trials, the experiment was always started at the same time of day (12:00 pm). For each light treatment, leaves from four distinct plants per cross (two from each reciprocal cross) were harvested. Two independent trials separated by a 5-month interval were conducted, for a total of eight plants per cross and per treatment. For the above-mentioned second set of crosses, eight F1 individuals per cross were analyzed for their response to light in a single trial.
Organ-Specific Expression in Flowers
CHS is upregulated in flowers, where flavonoids are produced abundantly (Burbulis et al., 1996). Plants from which a leaf had been harvested after 48 h in the dark were grown further for several weeks under standard light conditions in a reach-in chamber until they flowered. A single flower per plant was harvested for RNA extraction. Flowers from four distinct plants per cross (two from each reciprocal cross) were harvested. Two independent trials separated by a 5-month interval were conducted, for a total of eight plants per cross.
Insect Feeding Trials
CHS expression is induced upon feeding by Plutella xylostella larvae (H. Vogel, personal communication). For insect induction, 4th instar larvae were starved overnight. In the early morning, two larvae were placed on each plant assigned to the insect treatment. After 9 and 24 h, herbivore-damaged leaves, as well as leaves from insect-free control plants, were harvested. For each insect treatment and each associated control, leaves from four distinct plants per cross (two from each reciprocal cross) were harvested. Two independent trials separated by a 5-month interval were conducted, for a total of eight plants per cross and per treatment. Insect feeding trials resulted in four independent treatments: 9 h insect feeding, 9 h control, 24 h insect feeding, and 24 h control.
RNA Extraction, cDNA Synthesis, and Quantitative Pyrosequencing
Immediately after collection, 1 to 2 cm² of sampled leaf material was placed into liquid nitrogen and stored at −80°C. RNA was extracted using 500 μL of Trizol (Invitrogen Life Technologies) following the manufacturer's protocol. Precipitated RNA was resuspended in RNA storage solution (Ambion, Austin, TX) after 20 min of air drying. Total RNA concentrations were evaluated by spectrophotometry, and ∼4 μg of total RNA was used for each cDNA synthesis. cDNA was synthesized using Superscript III RT (Invitrogen Life Technologies) following the manufacturer's protocol with the following modifications: 100 units of Superscript III-RT and 20 units of RNaseOUT (Invitrogen Life Technologies) were used per 20-μL reaction, and the incubation time for retrotranscription was extended to 2 h. To ensure that RNA samples were free of DNA contamination, a PCR was also performed on several RNA samples before the RT, following the protocol described below. No amplification product was observed.
Pyrosequencing is a method based on the stoichiometric photochemical reaction triggered by the nucleotide-by-nucleotide extension of a sequencing primer. The light signal allows quantification of nucleotides at polymorphic positions relative to neighboring monomorphic positions, hence estimating allele-specific relative expression levels (Ahmadian et al., 2000; Neve et al., 2002).
To control for possible position effects in the thermocycler, cDNA samples together with DNA extracted from heterozygous plants were randomly distributed across 96-well plates before PCR. A 1-μL volume of the cDNA solution was used for PCR with 5 pmol of each primer, 5 pmol of ROTI-Mix (Carl Roth, Karlsruhe, Germany) mixed nucleotides, 25 pmol MgCl2, 1 unit of Taq polymerase (Qiagen, Valencia, CA), and the PCR buffer provided by the manufacturer. Two sets of primers were used to assess SNP 780, which differentiates Col-0, Lz-0, and Ag-0 from all other accessions: Set 719 (5′Biotin-CAAGGTTGCTTCGCCGGC-3′/5′-GGTAACGGCTGTGATCTC-3′) and Set 700 (5′Biotin-AGCGTCTCATGATGTACC-3′/5′-TTTCTCTCCGACAGATGTG-3′). Primer set 700 was also used to assess SNP840 differentiating Lip-0 from the other accessions.
SNP quantity was assessed using the PyrosequencerAB device (Biotage, Uppsala, Sweden), following the manufacturer's protocol for vacuum sample preparation and pyrosequencing reactions. Sequencing primers used for SNP780 and SNP840 were 5′-GAGGACACGTGCTCCA-3′ and 5′-TGTGTCAGGGTCCG-3′, respectively. For both SNPs, relative SNP concentration was measured as the ratio of a polymorphic peak to a monomorphic peak. Polymorphic peaks corresponding to more than one position (i.e., a monomorphic and a polymorphic position or two monomorphic positions) were not used in the analysis. Genotypes Lip-0 and Col-0 differ in the CHS coding sequence at two nucleotide positions (SNP780 and SNP840), permitting experimental replication with two independent SNPs. Both SNPs were assessed on PCR fragments obtained with primer set 700, and the expression levels they yielded were significantly correlated (R² = 0.84, P = 0.003). To calibrate the pyrosequencing measurements, serial volumetric mixtures of homozygote DNA or RNA were prepared (0.2:0.8, 0.3:0.7… up to 0.8:0.2). Volumetric proportions were corrected for concentration and technical bias using the K correction proposed by Wittkopp et al. (2004), in which K is the ratio of the technical bias (measured as the deviation of the heterozygote DNA measurement from 50%) to the volumetric bias (measured as the deviation of the 50/50 volumetric mixtures from 50%). The technical bias accounts for possible preferential amplification of a parental allele, and the volumetric bias accounts for inaccuracy in the estimation of total DNA (or RNA) levels before mixing. True proportions were obtained by transforming volumetric proportions (VP) with the following function: K * VP/(1 − VP + K * VP).
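The K transformation and its use in calibration can be sketched as below. This assumes K has already been estimated as described (its estimation is not reproduced here), and the linear calibration line fitted between corrected proportions and pyrosequencing readouts, together with the readout values in the example, are hypothetical illustrations rather than the authors' exact procedure.

```python
def corrected_proportion(vp, k):
    """Transform a volumetric proportion VP with the K correction given in the text:
    true proportion = K * VP / (1 - VP + K * VP).

    K is assumed to have been estimated beforehand (Wittkopp et al., 2004);
    K == 1 leaves VP unchanged.
    """
    if not 0.0 <= vp <= 1.0:
        raise ValueError("vp must be a proportion between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * vp / (1.0 - vp + k * vp)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def measured_to_proportion(measured, slope, intercept):
    """Invert the (hypothetical) linear calibration to get an allele proportion."""
    return (measured - intercept) / slope


if __name__ == "__main__":
    k = 1.2  # hypothetical correction factor
    volumetric = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    true_props = [corrected_proportion(vp, k) for vp in volumetric]
    # Hypothetical readouts standing in for real pyrosequencing ratios.
    readouts = [0.05 + 0.9 * p for p in true_props]
    slope, intercept = fit_line(true_props, readouts)
    print(round(measured_to_proportion(0.5, slope, intercept), 3))
```

The transform maps 0 to 0 and 1 to 1 for any positive K and simply shifts intermediate proportions toward the allele favored by the bias.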
Data Analysis
Population Genetics
Sequences were aligned with Megalign 5.03 (DNASTAR). The DnaSP 3.84 program (Rozas and Rozas, 1999) was used for both intraspecific and interspecific analyses of nucleotide polymorphism (Table 1). Nucleotide diversity was calculated as π, the average number of nucleotide differences among pairs of sequences, and as θ, estimated from the proportion of segregating sites (Watterson, 1975; Tajima, 1983; Nei, 1987). To compute the indel diversity statistics, each indel was coded as a substitution, the rest of the sequences being equal. Patterns of nucleotide polymorphism were summarized by the following test statistics: Tajima's D, based on the difference between two estimators of intraspecific diversity, and Fay and Wu's H, which uses an outgroup sequence to analyze the frequency of derived polymorphisms (Tajima, 1989; Fay and Wu, 2000). The HKA test is based on the prediction that, for a particular region of the genome, the rate of divergence between species is proportional to the level of polymorphism within species (Hudson et al., 1987). This test compares the ratio of intraspecific polymorphism to interspecific divergence at two loci. HKA tests were performed for silent positions using silent segregating sites and the silent divergence value (Nei, 1987) to compare the intergenic region and the CHS coding region. For comparison of multiple intergenic regions, all positions were considered to be silent. These three neutrality tests (D, H, and HKA) focus on different characteristics of nucleotide polymorphism and thus effectively summarize the evolutionary history of the CHS intergenic region.
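For readers who want to see how π, Watterson's θ, and Tajima's D relate, here is a minimal sketch using the standard formulas (Watterson, 1975; Tajima, 1989). It is not DnaSP: it assumes equal-length uppercase aligned sequences, simply drops any column containing a gap or ambiguous base (whereas the authors coded indels as substitutions for the indel statistics), and omits Fay and Wu's H and the HKA test.

```python
from itertools import combinations
from math import sqrt


def usable_columns(seqs):
    """Alignment columns with only A, C, G, T (gapped/ambiguous columns dropped)."""
    return [col for col in zip(*(s.upper() for s in seqs))
            if all(b in "ACGT" for b in col)]


def diversity_stats(seqs):
    """Per-site pi, per-site Watterson theta, S, and Tajima's D."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    cols = usable_columns(seqs)
    if not cols:
        raise ValueError("no usable alignment columns")
    n_sites = len(cols)
    s = sum(1 for col in cols if len(set(col)) > 1)  # segregating sites
    # k: mean number of differences over all pairs of sequences
    pairs = list(combinations(range(n), 2))
    k = sum(sum(col[i] != col[j] for col in cols) for i, j in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    tajima_d = None
    if s > 0:
        tajima_d = (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return {"pi": k / n_sites, "theta_w": s / a1 / n_sites, "S": s,
            "tajima_D": tajima_d}


if __name__ == "__main__":
    print(diversity_stats(["ACGT", "ACGT", "ACGA", "TCGA"]))
```

A positive D indicates more intermediate-frequency variants than expected under the neutral equilibrium model, a negative D an excess of rare variants; significance is normally judged by simulation or published critical values, not shown here.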
Promoter Analysis
Sequences were searched for known transcription factor binding sites using version 16.0 of the PLACE database (Higo et al., 1999). This database was chosen because it is publicly available and is regularly updated (403 binding site entries on April 13, 2004). We wrote a Perl script that reads a sequence alignment, automatically queries PLACE with each sequence in the alignment (gaps removed), and outputs those binding sites that are affected by a polymorphic site. The script is available from the authors upon request.
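The authors' Perl script is available only on request, so the following Python sketch merely illustrates the logic described: scan each ungapped sequence for motifs on both strands, map matches back to alignment columns, and report matches that overlap a polymorphic column and are not shared by all sequences. PLACE is not queried here; the two-entry motif dictionary (the W-box-containing TTGACA and the G-box CACGTG, both discussed in this article) is a hypothetical stand-in for the database, and all function names are illustrative.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIFS = {"W-box (TTGACA)": "TTGACA", "G-box": "CACGTG"}  # illustrative subset


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(aligned, motif):
    """Yield (first_col, last_col) alignment columns of each match on either strand."""
    cols = [i for i, b in enumerate(aligned) if b != "-"]
    ungapped = "".join(aligned[i] for i in cols)
    for pattern in {motif, revcomp(motif)}:  # a set, so palindromes count once
        start = ungapped.find(pattern)
        while start != -1:
            yield cols[start], cols[start + len(pattern) - 1]
            start = ungapped.find(pattern, start + 1)


def sites_affected_by_polymorphism(alignment, motifs=MOTIFS):
    """alignment: dict name -> aligned sequence (equal lengths).

    Returns {(motif name, first column): sorted carrier names} for matches that
    overlap a polymorphic column and are absent from at least one sequence.
    """
    seqs = list(alignment.values())
    poly = {i for i, col in enumerate(zip(*seqs)) if len(set(col)) > 1}
    carriers = defaultdict(set)
    for name, seq in alignment.items():
        for mname, motif in motifs.items():
            for first, last in motif_hits(seq, motif):
                if any(first <= c <= last for c in poly):
                    carriers[(mname, first)].add(name)
    return {key: sorted(names) for key, names in carriers.items()
            if len(names) < len(alignment)}


if __name__ == "__main__":
    toy = {"a": "AATTGGCAAA", "b": "AATTGACAAA"}  # toy TTGGCA -> TTGACA change
    print(sites_affected_by_polymorphism(toy))
```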
Statistical Analysis of Expression Data
A significant correlation was detected between the pyrosequencing measurement and the signal-to-noise difference, estimated by comparing the height of a monomorphic peak to the basal signal. Following the manufacturer's advice, we discarded data where monomorphic peaks showed a signal below 10 units. We subsequently incorporated the signal-to-noise difference as a technical covariate in the analysis of pyrosequencing measurements. We performed separate analyses for the light induction treatments (including organ-specific expression in flowers) and for each insect treatment. In all analyses, measurements obtained for DNA samples of heterozygous individuals were used to calibrate the null hypothesis in which both promoter alleles have equal activity.
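The filtering and covariate steps described above can be sketched as follows. The 10-unit threshold comes from the text; the record layout, the field names, and the choice to express the signal-to-noise difference as monomorphic peak height minus basal signal are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_MONOMORPHIC_SIGNAL = 10.0  # threshold from the text (instrument signal units)


@dataclass
class PyroRead:
    sample: str
    polymorphic_peak: float  # height of the peak at the SNP position
    monomorphic_peak: float  # height of a neighboring monomorphic peak
    basal_signal: float      # background level (hypothetical field name)


def ratio_and_covariate(read: PyroRead) -> Optional[Tuple[float, float]]:
    """Return (polymorphic/monomorphic ratio, signal-to-noise difference),
    or None if the monomorphic peak is below the quality threshold."""
    if read.monomorphic_peak < MIN_MONOMORPHIC_SIGNAL:
        return None
    ratio = read.polymorphic_peak / read.monomorphic_peak
    signal_to_noise = read.monomorphic_peak - read.basal_signal
    return ratio, signal_to_noise


def usable_reads(reads: List[PyroRead]) -> List[Tuple[str, float, float]]:
    """Keep reads passing the threshold as (sample, ratio, covariate) rows."""
    rows = []
    for read in reads:
        result = ratio_and_covariate(read)
        if result is not None:
            rows.append((read.sample, *result))
    return rows


if __name__ == "__main__":
    reads = [PyroRead("F1_a", 12.0, 20.0, 1.0),  # hypothetical values
             PyroRead("F1_b", 5.0, 8.0, 1.0)]    # fails the 10-unit threshold
    print(usable_reads(reads))
```

The retained ratio would be the response in the ANOVA below, and the second value would enter the model as the technical covariate C.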
An ANOVA was performed with the GLM procedure implemented in SAS using the following statistical model:
$$X_{ijklm} = \mu + G_i + I_j + T_k + P_{kl} + N_m + C + GI_{ij} + IT_{jk} + GT_{ik} + \varepsilon_{ijklm}$$
where μ is the grand mean, Gi is the effect of the ith cross, Ij indicates the jth induction treatment, Tk represents the kth trial, Pkl is the effect of the lth plate in the kth trial, Nm indicates the mth primer set, C is a technical covariate, GIij, ITjk, and GTik represent the cross × treatment, treatment × trial, and cross × trial interactions, respectively, and εijklm is the residual error. Because of missing data in some cells of our data set, it was not possible to include the effect of the maternal genotype in this model. This effect was thus analyzed separately using a similar model that did not incorporate the interactions between factors.
In each well, Xijklm is the ratio of a polymorphic peak over a monomorphic peak obtained by pyrosequencing. In the GLM model, trial, PCR plate within trial, and interactions involving the trial factor were treated as random effects. To dissect significant interaction effects, we used the SLICE procedure in SAS to test for heterogeneity among treatments within a genotype or among genotypes within treatment. For heterozygous DNA, no heterogeneity among crosses was detected. When expression levels were significantly heterogeneous, we performed a separate ANOVA for each cross or each treatment followed by a least squares mean comparison test (Tukey's test) to identify statistically differentiated treatments or genotypes. For the additional crosses, given that we had only one trial and two plates, all effects were handled as fixed effects in the ANOVA.
For the analysis of the effect of cDNA pool upon pyrosequencing measurement, the following linear model was tested:
$$X_{ijk} = \mu + A_i + I_j + R_k + C + AR_{ik} + \varepsilon_{ijk}$$
where Xijk is the pyrosequencing measurement, μ is the grand mean, Ai is a continuous covariate corresponding to the ith proportion of the Col-0 allele, Ij indicates the jth induction treatment, Rk represents the kth retrotranscription batch, C is a technical covariate, ARik represents the proportion × retrotranscription batch interaction, and εijk is the residual error. Rk and ARik are random effects.
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers AJ867819 to AJ867845. Accession numbers for the A. gemmifera and A. croatica sequences are AJ868239 and AJ868240, respectively.
Supplementary Material
Acknowledgments
We thank J. Bishop, M.J. Clauss, A. Lawton-Rauh, S.E. Ramos-Onsins, P. Wittkopp, J. Zavala, and two anonymous reviewers for helpful discussions and/or comments on the manuscript. This work was supported by the Max Planck Gesellschaft and by the Bundesministerium fuer Bildung und Forschung/Jena Centre for Bioinformatics Initiative 0312704F.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Juliette de Meaux (jdemeaux@ice.mpg.de).
Online version contains Web-only data.
Article, publication date, and citation information can be found at www.plantcell.org/cgi/doi/10.1105/tpc.104.027839.
References
- Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyren, P., Uhlen, M., and Lundeberg, J. (2000). Single-nucleotide polymorphism analysis by pyrosequencing. Anal. Biochem. 280, 103–110.
- Bamshad, M.J., Mummidi, S., Gonzalez, E., Ahuja, S.S., Dunn, D.M., Watkins, W.S., Wooding, S., Stone, A.C., Jorde, L.B., Weiss, R.B., and Ahuja, S.K. (2002). A strong signature of balancing selection in the 5′ cis-regulatory region of CCR5. Proc. Natl. Acad. Sci. USA 99, 10539–10544.
- Boffelli, D., McAuliffe, J., Ovcharenko, D., Lewis, K.D., Ovcharenko, I., Pachter, L., and Rubin, E.M. (2003). Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394.
- Borevitz, J.O., Xia, Y., Blount, J.W., Dixon, R.A., and Lamb, C. (2000). Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 12, 2383–2393.
- Bray, N.J., Buckland, P.R., Owen, M.J., and O'Donovan, M.C. (2003). Cis-acting variation in the expression of a high proportion of genes in human brain. Hum. Genet. 113, 149–153.
- Brem, R.B., Yvert, G., Clinton, R., and Kruglyak, L. (2002). Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755.
- Brukner, I., Sanchez, R., Suck, D., and Pongor, S. (1995). Sequence-dependent bending propensity of DNA as revealed by DNase-I: Parameters for trinucleotides. EMBO J. 14, 1812–1818.
- Burbulis, I.E., Iacobucci, M., and Shirley, B.W. (1996). A null mutation in the first enzyme of flavonoid biosynthesis does not affect male fertility in Arabidopsis. Plant Cell 8, 1013–1025.
- Burbulis, I.E., and Winkel-Shirley, B. (1999). Interactions among enzymes of the Arabidopsis flavonoid biosynthetic pathway. Proc. Natl. Acad. Sci. USA 96, 12929–12934.
- Clauss, M.J., and Mitchell-Olds, T. (2004). Functional divergence in tandemly duplicated Arabidopsis thaliana trypsin inhibitor genes. Genetics 166, 1419–1436.
- Cliften, P., Sudarsanam, P., Desikan, A., Fulton, L., Fulton, B., Majors, J., Waterston, R., Cohen, B.A., and Johnston, M. (2003). Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301, 71–76.
- Cowles, C.R., Hirschhorn, J.N., Altshuler, D., and Lander, E.S. (2002). Detection of regulatory variation in mouse genes. Nat. Genet. 32, 432–437.
- Crawford, D.L., Segal, J.A., and Barnett, J.L. (1999). Evolutionary analysis of TATA-less proximal promoter function. Mol. Biol. Evol. 16, 194–207.
- Dermitzakis, E.T., Bergman, C.M., and Clark, A.G. (2003). Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. Mol. Biol. Evol. 20, 703–714.
- Enard, W., et al. (2002). Intra- and interspecific variation in primate gene expression patterns. Science 296, 340–343.
- Fay, J.C., and Wu, C.I. (2000). Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413.
- Gu, Z.L., Rifkin, S.A., White, K.P., and Li, W.H. (2004). Duplicate genes increase gene expression diversity within and between species. Nat. Genet. 36, 577–579.
- Guo, M., Rupe, M.A., Zinselmeier, C., Habben, J., Bowen, B.A., and Smith, O.S. (2004). Allelic variation of gene expression in maize hybrids. Plant Cell 16, 1707–1716.
- Gutierrez, R.A., Ewing, R.M., Cherry, J.M., and Green, P.J. (2002). Identification of unstable transcripts in Arabidopsis by cDNA microarray analysis: Rapid decay is associated with a group of touch- and specific clock-controlled genes. Proc. Natl. Acad. Sci. USA 99, 11513–11518.
- Hartmann, U., Valentine, W.J., Christie, J.M., Hays, J., Jenkins, G.I., and Weisshaar, B. (1998). Identification of UV/blue light-response elements in the Arabidopsis thaliana chalcone synthase promoter using a homologous protoplast transient expression system. Plant Mol. Biol. 36, 741–754.
- Higo, K., Ugawa, Y., Iwamoto, M., and Korenaga, T. (1999). Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 27, 297–300.
- Hoeren, F.U., Dolferus, R., Wu, Y.R., Peacock, W.J., and Dennis, E.S. (1998). Evidence for a role for AtMYB2 in the induction of the Arabidopsis alcohol dehydrogenase gene (ADH1) by low oxygen. Genetics 149, 479–490.
- Hofacker, I.L. (2003). Vienna RNA secondary structure server. Nucleic Acids Res. 31, 3429–3431.
- Hudson, R.R., Kreitman, M., and Aguade, M. (1987). A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159.
- Jacobsen, S.E., and Meyerowitz, E.M. (1997). Hypermethylated SUPERMAN epigenetic alleles in Arabidopsis. Science 277, 1100–1103.
- Jenkins, G.I., Long, J.C., Wade, H.K., Shenton, M.R., and Bibikova, T.N. (2001). UV and blue light signalling: Pathways regulating chalcone synthase gene expression in Arabidopsis. New Phytol. 151, 121–131.
- Johnson, E.T., and Dowd, P.F. (2004). Differentially enhanced insect resistance, at a cost, in Arabidopsis thaliana constitutively expressing a transcription factor of defensive metabolites. J. Agric. Food Chem. 52, 5135–5138.
- Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W., and Paabo, S. (2004). A neutral model of transcriptome evolution. PLoS Biol. 2, 682–689.
- Koch, M.A., Haubold, B., and Mitchell-Olds, T. (2000). Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17, 1483–1498.
- Koch, M.A., Weisshaar, B., Kroymann, J., Haubold, B., and Mitchell-Olds, T. (2001). Comparative genomics and regulatory evolution: Conservation and function of the Chs and Apetala3 promoters. Mol. Biol. Evol. 18, 1882–1891.
- Lerman, D.N., Michalak, P., Helin, A.B., Bettencourt, B.R., and Feder, M.E. (2003). Modification of heat-shock gene expression in Drosophila melanogaster populations via transposable elements. Mol. Biol. Evol. 20, 135–144.
- Liu, C.-J., Blount, J.W., Steele, C.L., and Dixon, R.A. (2002). Bottlenecks for metabolic engineering of isoflavone glycoconjugates in Arabidopsis. Proc. Natl. Acad. Sci. USA 99, 14578–14583.
- Logemann, E., and Hahlbrock, K. (2002). Crosstalk among stress responses in plants: Pathogen defense overrides UV protection through an inversely regulated ACE/ACE type of light-responsive gene promoter unit. Proc. Natl. Acad. Sci. USA 99, 2428–2432.
- Lomvardas, S., and Thanos, D. (2002). Modifying gene expression programs by altering core promoter chromatin structure. Cell 110, 261–271.
- Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. (2000). Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567.
- Maleck, K., Levine, A., Eulgem, T., Morgan, A., Schmid, J., Lawton, K.A., Dangl, J.L., and Dietrich, R.A. (2000). The transcriptome of Arabidopsis thaliana during systemic acquired resistance. Nat. Genet. 26, 403–410.
- Michalak, P., Minkov, I., Helin, A., Lerman, D.N., Bettencourt, B.R., Feder, M.E., Korol, A.B., and Nevo, E. (2001). Genetic evidence for adaptation-driven incipient speciation of Drosophila melanogaster along a microclimatic contrast in “Evolution Canyon,” Israel. Proc. Natl. Acad. Sci. USA 98, 13195–13200.
- Miyashita, N.T. (2001). DNA variation in the 5′ upstream region of the Adh locus of the wild plants Arabidopsis thaliana and Arabis gemmifera. Mol. Biol. Evol. 18, 164–171.
- Mol, J., Jenkins, G., Schafer, E., and Weiss, D. (1996). Signal perception, transduction, and gene expression involved in anthocyanin biosynthesis. Crit. Rev. Plant Sci. 15, 525–557.
- Nei, M. (1987). Molecular Evolutionary Genetics. (New York: Columbia University Press).
- Neve, B., Froguel, P., Corset, L., Vaillant, E., Vatin, V., and Boutin, P. (2002). Rapid SNP allele frequency determination in genomic DNA pools by pyrosequencing. Biotechniques 32, 1138–1142.
- Ngai, N., Tsai, F.Y., and Coruzzi, G. (1997). Light-induced transcriptional repression of the pea AS1 gene: Identification of cis-elements and transfactors. Plant J. 12, 1021–1034.
- Noh, B., and Spalding, E.P. (1998). Anion channels and the stimulation of anthocyanin accumulation by blue light in Arabidopsis seedlings. Plant Physiol. 116, 503–509.
- Piano, F., Parisi, M.J., Karess, R., and Kambysellis, M.P. (1999). Evidence for redundancy but not trans factor-cis element coevolution in the regulation of Drosophila Yp genes. Genetics 152, 605–616.
- Ramos-Onsins, S.E., Stranger, B.E., Mitchell-Olds, T., and Aguade, M. (2004). Multilocus analysis of variation and speciation in the closely related species Arabidopsis halleri and A. lyrata. Genetics 166, 373–388.
- Reymond, P., Weber, H., Damond, M., and Farmer, E.E. (2000). Differential gene expression in response to mechanical wounding and insect feeding in Arabidopsis. Plant Cell 12, 707–719.
- Rockman, M.V., Hahn, M.W., Soranzo, N., Goldstein, D.B., and Wray, G.A. (2003). Positive selection on a human-specific transcription factor binding site regulating IL4 expression. Curr. Biol. 13, 2118–2123.
- Rozas, J., and Rozas, R. (1999). DnaSP version 3: An integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15, 174–175.
- Sandelin, A., and Wasserman, W.W. (2004). Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J. Mol. Biol. 338, 207–215.
- Schmid, K.J., Ramos-Onsins, S.E., Ryngis-Beckstein, H., Weisshaar, B., and Mitchell-Olds, T. (January 16, 2005). A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from the standard neutral model of sequence evolution. Genetics, doi:10.1534/genetics.104.033795.
- Schulte, P.M., Glemet, H.C., Fiebig, A.A., and Powers, D.A. (2000). Adaptive variation in lactate dehydrogenase-B gene expression: Role of a stress-responsive regulatory element. Proc. Natl. Acad. Sci. USA 97, 6597–6602.
- Stone, J.R., and Wray, G.A. (2001). Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18, 1764–1770.
- Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460.
- Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595.
- Thastrom, A., Lowary, P.T., Widlund, H.R., Cao, H., Kubista, M., and Widom, J. (1999). Sequence motifs and free energies of selected natural and non-natural nucleosome positioning DNA sequences. J. Mol. Biol. 288, 213–229.
- Townsend, J.P. (2004). Resolution of large and small differences in gene expression using models for the Bayesian analysis of gene expression levels and spotted DNA microarrays. BMC Bioinformatics 5, 54.
- Von Dassow, G., and Odell, G.M. (2002). Design and constraints of the Drosophila segment polarity module: Robust spatial patterning emerges from intertwined cell state switches. J. Exp. Zool. 294, 179–215.
- Wade, H.K., Bibikova, T.N., Valentine, W.J., and Jenkins, G.I. (2001). Interactions within a network of phytochrome, cryptochrome and UV-B phototransduction pathways regulate chalcone synthase gene expression in Arabidopsis leaf tissue. Plant J. 25, 675–685.
- Wang, R.L., Stec, A., Hey, J., Lukens, L., and Doebley, J. (1999). The limits of selection during maize domestication. Nature 398, 236–239.
- Watterson, G.A. (1975). Number of segregating sites in genetic models without recombination. Theor. Popul. Biol. 7, 256–276.
- Winkel-Shirley, B. (2001). Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol. 126, 485–493.
- Wittkopp, P.J., Haerum, B.K., and Clark, A.G. (2004). Evolutionary changes in cis and trans gene regulation. Nature 430, 85–88.
- Wray, G.A., Hahn, M.W., Abouheif, E., Balhoff, J.P., Pizer, M., Rockman, M.V., and Romano, L.A. (2003). The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20, 1377–1419.
- Yan, H., Yuan, W.S., Velculescu, V.E., Vogelstein, B., and Kinzler, K.W. (2002). Allelic variation in human gene expression. Science 297, 1143.