Significance
Differences among individuals and species originate from changes to the genome. Yet our knowledge of the principles that might allow prediction of the effects of any particular mutation is limited. One such prediction might be that duplicating a gene would double the gene’s output. We show that this is actually not the case in Drosophila flies. Instead, in almost all of the cases we tested (using a naturally occurring and an artificially constructed tandem duplicate gene), we observed that the output of the duplicated genes was greater than double the output of single copies—as much as five times greater. This finding suggests that tandem duplicate genes could have disproportionate effects when they occur.
Keywords: tandem duplication, gene expression, position effect, gene structure, genome evolution
Abstract
Tandem gene duplication is an important mutational process in evolutionary adaptation and human disease. Hypothetically, two tandem gene copies should produce twice the output of a single gene, but this expectation has not been rigorously investigated. Here, we show that tandem duplication often results in more than double the gene activity. A naturally occurring tandem duplication of the Alcohol dehydrogenase (Adh) gene exhibits 2.6-fold greater expression than the single-copy gene in transgenic Drosophila. This tandem duplication also exhibits greater activity than two copies of the gene in trans, demonstrating that it is the tandem arrangement and not copy number that is the cause of overactivity. We also show that tandem duplication of an unrelated synthetic reporter gene is overactive (2.3- to 5.1-fold) at all sites in the genome that we tested, suggesting that overactivity could be a general property of tandem gene duplicates. Overactivity occurs at the level of RNA transcription, and therefore tandem duplicate overactivity appears to be a previously unidentified form of position effect. The increment of surplus gene expression observed is comparable to many regulatory mutations fixed in nature and, if typical of other genomes, would shape the fate of tandem duplicates in evolution.
Evolutionarily and medically relevant phenotypes often derive from quantitative changes in gene expression. It is becoming increasingly appreciated that relatively modest changes in gene expression or protein activity can have meaningful effects. For example, alleles with 1.1- to 1.6-fold effects on transcription or enzyme activity (1) have been identified that show evidence for selection, including Adh in Drosophila melanogaster and Lactase and Prodynorphin in humans (1–3). Furthermore, most transcriptional variation in Drosophila species is on the order of twofold or less (4). Understanding the mutational basis of these activity changes is a necessary step to predict phenotypes based on genomic sequences.
One simple way for gene activity to double is through tandem gene duplication. Gene duplication is a common mutational process, occurring with estimated rates of 10^−9 to 10^−7 new duplicates per gene per generation in flies, worms, and yeast (5, 6). Gene duplication has been of long-standing interest in evolution because, once genes have duplicated, one copy may acquire a novel function (7, 8), and many genes involved in physiological and developmental diversification occur as tandem duplicates in gene complexes. However, relatively little is known empirically about the first step in this process: the immediate phenotypic consequences of a single gene duplication. This may be due to the difficulty of isolating the effects of increased copy number from any potential contribution of subsequent sequence divergence to gene expression of a duplicate pair. Here, we uncovered an effect of tandem duplications on gene activity in the Drosophila melanogaster genome that is greater than twofold. We suggest that this phenomenon, which we refer to as “tandem duplicate overactivity,” may be a previously unidentified type of position effect on gene expression.
Results and Discussion
Tandem Duplication of Adh Is Overactive.
We encountered the possibility that tandem gene duplicates might not simply produce a twofold increase in gene output in the course of pursuing the genetic basis of the sixfold greater ADH enzyme activity in brewery-adapted Drosophila virilis relative to its sibling Drosophila americana (Fig. 1A). Two copies of the entire D. virilis gene, including all known regulatory elements, occur within a 7-kb tandem duplication, whereas the orthologous sequence in D. americana is single copy (9). We cloned the duplicated Adh region from D. virilis and found that the two duplicate copies in our laboratory strain were nearly identical, with only three distinguishing single-nucleotide changes located distal to the transcription unit (Fig. 1B). We therefore presumed that the tandem duplication would account for twofold higher activity, with the remaining threefold change in activity accounted for by subsequent changes in regulatory or coding sequences.
Fig. 1.
Tandem duplication of Adh from D. virilis is overactive. (A) ADH enzyme activity is sixfold higher in D. virilis than D. americana. Boxplots show median and interquartile range with thin lines extending to the lesser of 1.5× the interquartile range or the data extremes. n = 15 samples. (B) Schematic of the tandem duplicated Adh locus in D. virilis (“Dup”). Vertical bars delimit the duplicated region. Ovals mark the three nucleotides that distinguish the left copy from the right copy. Also shown are engineered constructs with SNPs removed (“Ident_dup”) and the isolated single copy (“Single”). (C) ADH activity of D. melanogaster flies (ZH-86Fb attP site, Adh-null) transformed with D. virilis Single and Dup constructs. Dashed line shows predicted twofold mean activity of the Single construct. Error bars show 95% confidence interval of means (Tables S1–S14). Sample sizes for this and subsequent plots are in Tables S1–S14. We verified that assay measurements scaled one-to-one with homogenate concentration (Fig. S1).
We tested this presumption by inserting duplicate and single-copy D. virilis Adh transgenes (Fig. 1B) into an inbred Adh-null D. melanogaster recipient line at a specific chromosomal insertion site (attP ZH-86Fb), followed by measurement of ADH activity from whole-fly homogenates. We were surprised to observe 2.6-fold higher ADH enzyme activity from the duplicates than from the single copy of D. virilis Adh (Fig. 1C). The difference between single and duplicate was significantly greater than expected (t test, P = 0.0005; see Tables S1–S14 for details of underlying mixed-effects models). In addition, we tested for any effect of the between-copy nucleotide changes by engineering a construct where the left and the right genes were identical (Fig. 1B). ADH activity from this identical-duplicate construct was indistinguishable from the original cloned duplicate (t test, P = 0.64; Fig. 1C). These results suggested that duplication of the Adh gene itself might be the source of the excess 60% activity.
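The fold change quoted above can be recovered from the fixed-effect estimates in Table S1. The following is a minimal sketch, not the authors' analysis code: the function name `fold_change` is hypothetical, and it assumes the R-style parameterization described in the Table S1 legend, in which the intercept is the Single mean and the Dup coefficient is the Dup mean minus the intercept.

```python
def fold_change(intercept, dup_coefficient):
    """Return the Dup/Single ratio of means under treatment coding.

    Assumes (per the Table S1 legend) that `intercept` is the Single mean
    and `dup_coefficient` is the Dup mean minus the Single mean.
    """
    dup_mean = intercept + dup_coefficient
    return dup_mean / intercept


# Estimates copied from Table S1 (ADH activity at ZH-86Fb).
single_mean = 0.436
dup_effect = 0.676

ratio = fold_change(single_mean, dup_effect)
# Activity beyond the twofold (additive) expectation, as a fraction of Single.
excess_over_twofold = ratio - 2.0

print(f"Dup/Single = {ratio:.2f}")
print(f"excess over twofold = {excess_over_twofold:.2f} x Single")
```

With the rounded Table S1 values the ratio comes out near 2.55, consistent with the reported 2.6-fold difference once rounded to one decimal place.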
Table S1.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 1C, ADH activity at ZH-86Fb
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 10 | Single | 3 | 3 | 72 | Day | 0.083 | Dup | 3.03 | (Intercept) | 0.436 | 0.027 | (Dup – Single) vs. Single | 8.17 | 5 | 0.00045 |
| | Dup | 4 | 3 | 90 | Line | 0.0087 | Ident_Dup | 2.61 | Dup | 0.676 | 0.016 | Dup vs. Ident_Dup | 0.49 | 5 | 0.64 |
| | Ident_Dup | 3 | 3 | 75 | Residual | 0.124 | Single | 1 | Ident_dup | 0.687 | 0.015 | | | | |
Model: lme(relative_ADH_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1)), weights = varIdent(form=∼1|construct)). Units for response variables are as follows: ADH activity, ΔAbs340 per minute per milligram of protein; Adh mRNA, level relative to control gene RP49, i.e., 2^−(Adh Cq – RP49 Cq); β-galactosidase activity, ΔAbs574 per hour × 1,000. “Estimate” denotes the estimate of the response variable using standard R style, such that (Intercept) refers to the mean of the first factor level (always set to Single) and the subsequent values (e.g., Dup) are the mean of the next factor level minus the intercept. For models with no intercept, means are shown. These parameter estimates were used to compute the planned t tests. Model formulas are given in nlme style. Experiments with a vial random effect (vial nested within line and day) measured multiple fly homogenates per culture vial; experiments without this term measured a single homogenate per vial. “Total samples” is often less than the product of the grouping factors because (i) in some experiments, not all lines were available to measure on each day replicate, or (ii) occasional vials did not produce flies.
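The mRNA unit defined above, 2^−(Adh Cq – RP49 Cq), can be illustrated with a short sketch. The function name and the Cq values below are hypothetical and chosen only to show the arithmetic; the formula assumes, as the base of 2 implies, that template doubles with every PCR cycle.

```python
def relative_expression(cq_target, cq_reference):
    """Target level relative to a reference gene: 2 ** -(Cq_target - Cq_ref).

    Assumes perfect doubling per PCR cycle, as implied by the base of 2.
    """
    return 2.0 ** -(cq_target - cq_reference)


# Hypothetical quantification-cycle values, not taken from the paper.
adh_cq = 22.0
rp49_cq = 20.0

level = relative_expression(adh_cq, rp49_cq)
print(level)  # Adh crosses threshold 2 cycles after RP49 -> 0.25
```

A target that reaches threshold later than the reference gets a relative level below 1, and each additional cycle of delay halves it.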
Table S14.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 5B, D. virilis Adh mRNA at attP40
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 2 | Single | 2 | 3 | 12 | Day | 0.098 | Single | 1 | (Intercept) | 2.36 | 0.17 | (Dup – Single) vs. Single | 0.43 | 3 | 0.70 |
| | Dup | 3 | 3 | 18 | Line | 0.16 | Dup | 2.42 | Dup | 2.20 | 0.28 | | | | |
| | | | | | Residual | 0.37 | | | | | | | | | |
Model: lme(relative_ADH_mRNA ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S2.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 2, ADH activity at ZH-86Fb from F2s using high throughput method and no-intercept model
| Day reps | Construct-zygosity | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 1 | Dup_hom | 3 | 3 | 16 | Line | 0.009 | Dup_hom | 7.86 | Dup_hom | 1.264 | 0.042 | Dup_het vs. Single_hom | 7.99 | 4 | 0.0013 |
| | Dup_het | 3 | 3 | 27 | Vial | 0.020 | Dup_het | 4.02 | Dup_het | 0.743 | 0.018 | | | | |
| | Single_hom | 3 | 3 | 22 | Residual | 0.021 | Single_hom | 4.65 | Single_hom | 0.515 | 0.022 | | | | |
| | Single_het | 3 | 3 | 17 | | | Single_het | 1 | Single_het | 0.281 | 0.009 | | | | |
| | Uninserted (pooled) | | | 6 | | | Uninserted | 0.4 | Uninserted | 0.076 | 0.009 | | | | |
Model: lme(relative_ADH_activity ∼ -1 + construct_zygosity, random = list(one1 = pdIdent(∼line-1), one2 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct_zygosity)). For details, see the legend of Table S1.
Table S3.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. S2A, ADH activity at ZH-86Fb from F2s using manual homogenization and no-intercept model
| Day reps | Construct-zygosity | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 3 | Dup_hom | 3 | 3 | 35 | Day | 0.011 | Dup_hom | 3.81 | Dup_hom | 1.264 | 0.031 | Dup_het vs. Single_hom | 6.36 | 4 | 0.0031 |
| | Dup_het | 3 | 3 | 36 | Line | 0.014 | Dup_het | 2.00 | Dup_het | 0.689 | 0.018 | | | | |
| | Single_hom | 3 | 3 | 34 | Residual | 0.091 | Single_hom | 1.41 | Single_hom | 0.460 | 0.015 | | | | |
| | Single_het | 3 | 3 | 36 | | | Single_het | 1 | Single_het | 0.265 | 0.013 | | | | |
Model: lme(relative_ADH_activity ∼ -1 + construct_zygosity, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct_zygosity)). For details, see the legend of Table S1.
Table S4.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3A, β-galactosidase activity at ZH-86Fb
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 6 | Single | 4 | 3 | 72 | Day | 5.48 | Single | 1 | (Intercept) | 100.38 | 4.72 | (Dup – Single) vs. Single | 3.68 | 6 | 0.0103 |
| | Dup | 4 | 3 | 72 | Line | 7.11 | Dup | 2.52 | Dup | 133.58 | 7.69 | | | | |
| | | | | | Residual | 45.88 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S5.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3B, β-galactosidase activity at ZH-22A
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 5 | Single | 2 | 3 | 60 | Day | 3.55 | Single | 1 | (Intercept) | 54.90 | 1.83 | (Dup – Single) vs. Single | 6.62 | 2 | 0.022 |
| | Dup | 2 | 3 | 59 | Line | 0.00038 | Dup | 2.80 | Dup | 76.85 | 2.76 | | | | |
| | | | | | Vial | 0.00064 | | | | | | | | | |
| | | | | | Residual | 7.14 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S6.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3C, β-galactosidase activity at ZH-68E
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 6 | Single | 3 | 3 | 84 | Day | 6.94 | Single | 1 | (Intercept) | 59.25 | 10.24 | (Dup – Single) vs. Single | 9.17 | 4 | 0.00078 |
| | Dup | 3 | 3 | 84 | Line | 16.76 | Dup | 3.76 | Dup | 224.65 | 14.83 | | | | |
| | | | | | Vial | 4.50 | | | | | | | | | |
| | | | | | Residual | 12.00 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S7.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3D, β-galactosidase activity at ZH-51D
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 4 | Single | 3 | 3 | 72 | Day | 5.18 | Single | 1 | (Intercept) | 44.50 | 3.75 | (Dup – Single) vs. Single | 12.34 | 4 | 0.00025 |
| | Dup | 3 | 3 | 65 | Line | 4.13 | Dup | 3.17 | Dup | 118.50 | 4.68 | | | | |
| | | | | | Vial | 4.60 | | | | | | | | | |
| | | | | | Residual | 24.40 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S8.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3E, β-galactosidase activity at VK00037
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 6 | Single | 2 | 3 | 72 | Day | 7.83 | Single | 1 | (Intercept) | 83.92 | 6.86 | (Dup – Single) vs. Single | 4.51 | 2 | 0.046 |
| | Dup | 2 | 3 | 72 | Line | 8.01 | Dup | 3.00 | Dup | 136.41 | 9.40 | | | | |
| | | | | | Vial | 10.20 | | | | | | | | | |
| | | | | | Residual | 11.46 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S9.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3F, β-galactosidase activity at VK00033
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 4 | Single | 2 | 3 | 48 | Day | 0.00063 | Single | 1 | (Intercept) | 71.83 | 1.05 | (Dup – Single) vs. Single | 7.18 | 2 | 0.019 |
| | Dup | 2 | 3 | 48 | Line | 0.79 | Dup | 3.10 | Dup | 94.74 | 3.01 | | | | |
| | | | | | Vial | 3.3E-05 | | | | | | | | | |
| | | | | | Residual | 6.18 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S10.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 3G, β-galactosidase activity at attP40
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 4 | Single | 4 | 3 | 96 | Day | 2.82 | Single | 1 | (Intercept) | 99.61 | 2.91 | (Dup – Single) vs. Single | 6.90 | 6 | 0.00046 |
| | Dup | 4 | 3 | 96 | Line | 4.23 | Dup | 2.75 | Dup | 137.10 | 4.60 | | | | |
| | | | | | Vial | 5.43 | | | | | | | | | |
| | | | | | Residual | 30.56 | | | | | | | | | |
Model: lme(betagal_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S11.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 4A, ADH activity at attP40
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 5 | Single | 3 | 3 | 45 | Day | 0.035 | Single | 1 | (Intercept) | 0.514 | 0.025 | (Dup – Single) vs. Single | 1.57 | 4 | 0.19 |
| | Dup | 3 | 3 | 45 | Line | 0.024 | Dup | 1.15 | Dup | 0.469 | 0.029 | | | | |
| | | | | | Residual | 0.106 | | | | | | | | | |
Model: lme(relative_ADH_activity ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Table S12.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 4B, ADH activity at attP40 from F2s using high throughput method and no-intercept model
| Day reps | Construct-zygosity | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 2 | Dup_hom | 3 | 3 | 35 | Day | 6.7E-09 | Dup_hom | 3.99 | Dup_hom | 0.641 | 0.025 | Dup_het vs. Single_hom | 0.48 | 4 | 0.66 |
| | Dup_het | 3 | 3 | 36 | Line | 9.8E-07 | Dup_het | 2.07 | Dup_het | 0.357 | 0.013 | | | | |
| | Single_hom | 3 | 3 | 36 | Vial | 0.007 | Single_hom | 2.24 | Single_hom | 0.348 | 0.014 | | | | |
| | Single_het | 3 | 3 | 36 | Residual | 0.076 | Single_het | 1 | Single_het | 0.182 | 0.006 | | | | |
| | Uninserted (pooled) | | | 70 | | | Uninserted | 0.46 | Uninserted | 0.039 | 0.002 | | | | |
Model: lme(relative_ADH_activity ∼ -1 + construct_zygosity, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1), one3 = pdIdent(∼vial-1)), weights = varIdent(form=∼1|construct_zygosity)). For details, see the legend of Table S1.
Table S13.
Sample size, mixed-effects model parameters, and estimated effects for data presented in Fig. 5A, D. virilis Adh mRNA at ZH-86Fb
| Day reps | Construct | N (lines) | Vials per line | Total samples | Random effect | SD | Variance group | Unequal variance multiplier | Fixed effect | Estimate | SE | Planned comparison | t value | df | P |
| 3 | Single | 3 | 3 | 21 | Day | 0.11 | Single | 1 | (Intercept) | 1.25 | 0.10 | (Ident_Dup – Single) vs. Single | 6.85 | 4 | 0.0024 |
| | Ident_Dup | 3 | 3 | 18 | Line | 0.00001 | Ident_Dup | 3.85 | Ident_Dup | 3.43 | 0.30 | | | | |
| | | | | | Residual | 1.24 | | | | | | | | | |
Model: lme(relative_ADH_mRNA ∼ construct, random = list(one1 = pdIdent(∼day-1), one2 = pdIdent(∼line-1)), weights = varIdent(form=∼1|construct)). For details, see the legend of Table S1.
Overactivity Depends on Tandem Arrangement.
This unexpected observation prompted us to examine whether the surplus activity could be due to nonadditive scaling of gene expression. Specifically, the duplication-bearing flies contain four copies of the D. virilis Adh gene per cell, whereas the singletons carry two copies. We reasoned that comparing ADH activity in flies with an equal number of gene copies per cell but in different configurations would control for nonadditive scaling of gene expression. We crossed single and duplicate inserted flies back to the Adh-null transgene insertion line, crossed F1 siblings, and compared F2 flies that were singleton homozygotes (two copies per cell) with flies that were duplicate heterozygotes (two copies per cell) (Fig. 2). Duplicate heterozygotes had 44% higher ADH activity than singleton homozygotes (estimated means 0.743 vs. 0.515; Table S2), significantly different from the null expectation of equal activity (t test, P = 0.0013). This demonstrates that two gene copies arranged in tandem behave differently than two copies in trans.
Fig. 2.
Excessive ADH activity is due to the tandem duplication, not copy number per cell. F2 homozygotes and heterozygotes from crosses back to the Adh-null transgene insertion line were extracted using the high-throughput procedure. Compare the Single homozygote and the Dup heterozygote, each of which bear two copies of the Adh but in different configurations.
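The copy-number-matched comparison can be made explicit from the no-intercept means in Table S2. This sketch is an illustration rather than the authors' analysis: the dictionary names are hypothetical, no background correction is applied, and the copies-per-cell values follow from the genotype definitions in the text (a Single insert carries one gene copy per chromosome, a Dup insert two).

```python
# Mean relative ADH activity per genotype, copied from Table S2 (ZH-86Fb).
mean_activity = {
    "Single_het": 0.281,
    "Single_hom": 0.515,
    "Dup_het": 0.743,
    "Dup_hom": 1.264,
}

# Transgene copies per diploid cell implied by each genotype.
copies_per_cell = {
    "Single_het": 1,
    "Single_hom": 2,
    "Dup_het": 2,
    "Dup_hom": 4,
}

# Two copies in tandem (Dup_het) versus two copies in trans (Single_hom).
tandem_vs_trans = mean_activity["Dup_het"] / mean_activity["Single_hom"]

# Output per gene copy; equal values would indicate simple additivity.
per_copy = {g: mean_activity[g] / copies_per_cell[g] for g in mean_activity}

print(f"tandem / trans = {tandem_vs_trans:.2f}")
for genotype, value in per_copy.items():
    print(f"{genotype}: {value:.3f} per copy")
```

Holding copies per cell fixed at two isolates the effect of arrangement: a ratio above 1 between Dup_het and Single_hom cannot be attributed to copy number alone.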
Tandem Duplicates of a Synthetic Reporter Gene Are Overactive at All Sites Tested.
We next considered whether this overactivity could be a general property of tandem duplicates. If so, overactivity would not be limited to the Adh gene, and it should not be limited to one chromosomal location. We tested the first hypothesis by constructing duplications of an unrelated gene, the well-studied synthetic reporter gene vgQ-lacZ that consists of the Escherichia coli β-galactosidase reporter gene linked to the ∼800-bp quadrant enhancer of the D. melanogaster vestigial gene (10). We inserted single and duplicate constructs into the same insertion site used above and then measured β-galactosidase activity in third-instar wing imaginal disk cells. The activity of duplicate transgenes relative to singletons was again significantly greater than twofold (t test, P = 0.01; Fig. 3A), even though the gene, tissue, and measurement assay used were completely different.
Fig. 3.
Duplicate overactivity is not limited to Adh and varies with genomic position. vgQ-lacZ “Single” and tandem duplicate (“Dup”) constructs were inserted in the following attP sites: (A) ZH-86Fb (same site and genetic background as Fig. 1); (B) ZH-22A; (C) ZH-68E; (D) ZH-51D; (E) VK00037; (F) VK00033; and (G) attP40 (same site and genetic background as Fig. 4). β-Galactosidase activity was measured from wing imaginal discs. Dashed line shows predicted twofold activity of Single construct at each site. (H) Summary of preceding panels. Each point represents mean β-galactosidase activity from Dup and Single inserts at each site. Dashed line indicates a twofold activity difference.
We examined whether duplicate overactivity was dependent on chromosomal position by inserting single and duplicate vgQ-lacZ transgenes at six additional sites. We selected attP insertion sites that are commonly used by Drosophila researchers because of their faithful expression of transgenes. The vgQ-lacZ duplicates were significantly overactive at all sites (Fig. 3 and Tables S1–S14), with duplicate activity ranging from 2.3-fold (in four of seven sites) to 5.1-fold higher than singletons. Even considering the possibility that the insertion sites selected are a biased sample of the genome, this result suggests that overactivity is common and could have a typical value. It also suggests that the degree of overactivity is influenced by chromosome location.
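The per-site comparison can be sketched from the fixed-effect estimates in Tables S4–S10. This is an illustration only, with hypothetical variable names; it assumes the intercept is the Single mean and the Dup coefficient is the Dup mean minus the intercept, as stated in the Table S1 legend. Because the table entries are rounded model estimates, ratios recomputed this way need not reproduce the fold values quoted in the text exactly.

```python
# (Intercept, Dup coefficient) for beta-galactosidase activity, copied from
# Tables S4-S10. The intercept is the Single mean; Dup is the Dup mean minus it.
estimates = {
    "ZH-86Fb": (100.38, 133.58),
    "ZH-22A": (54.90, 76.85),
    "ZH-68E": (59.25, 224.65),
    "ZH-51D": (44.50, 118.50),
    "VK00037": (83.92, 136.41),
    "VK00033": (71.83, 94.74),
    "attP40": (99.61, 137.10),
}

# Dup/Single ratio of means at each insertion site.
fold = {site: (single + dup) / single for site, (single, dup) in estimates.items()}

# Sites where Dup exceeds the additive (twofold) expectation.
overactive = [site for site, ratio in fold.items() if ratio > 2.0]

for site, ratio in sorted(fold.items(), key=lambda item: item[1]):
    print(f"{site}: {ratio:.2f}-fold")
print(f"{len(overactive)} of {len(fold)} sites exceed twofold")
```

Sorting by ratio makes the spread across sites easy to see, which is the basis for the statement that the degree of overactivity depends on chromosomal location.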
Although overactivity was common in our observations, we note that it was not universal across chromosome locations. When the Adh Single and Dup constructs were inserted in the attP40 site (in an Adh-null background), Adh duplicate activity relative to singleton activity was not significantly different from twofold (t test, P = 0.19; Fig. 4A). If Adh duplicates in this site are merely additive, two copies of the gene should produce the same output regardless of whether or not the genes are in tandem configuration. To test this hypothesis, we crossed single and duplicate inserted flies back to the Adh-null transgene insertion line, crossed F1 siblings, and compared the activity of F2 duplicate heterozygotes to singleton homozygotes. These genotypes were indistinguishable from one another (t test, P = 0.66; Fig. 4B), indicating additivity. In contrast, however, the vgQ-lacZ duplicate insertions in this site were overactive (Fig. 3G). Therefore, duplicates do have the capacity to behave additively, but this appears to be influenced by both chromosomal location and some aspect of the duplicated sequence.
Fig. 4.
Tandem duplicate activity is simply additive for Adh in one genomic position. (A) ADH activity for virilis Adh Single and Dup constructs in the attP40 site in Adh-null background. (B) Differences in ADH activity are proportional to copy number per cell. F2 heterozygous and homozygous males from crosses back to the Adh-null transgene insertion line were generated as in Fig. 2.
Duplicate Overactivity Is Transcriptional.
Our experiments indicate that duplicate overactivity is not the result of raw scaling of gene number per cell and is influenced by chromosome position. These observations are not consistent with a posttranscriptional mechanism. Instead, they suggest that overactivity should be manifest at the transcript level. To test this prediction, we isolated RNA from single and duplicate inserted flies and conducted quantitative real-time PCR measurements calibrated with a standard curve and a control gene RP49 (Fig. S1). In the overactive ZH-86Fb site, duplicate flies expressed virilis Adh RNA transcript levels that were 3.7-fold higher than singleton flies, significantly greater than the additive expectation of twofold (t test, P = 0.002; Fig. 5A). In contrast, in the additive attP40 site, the difference in Adh transcript levels was not significantly different from twofold (t test, P = 0.70; Fig. 5B). Duplicate overactivity (and its absence) therefore manifests at both the protein and transcript levels.
Fig. S1.
Standard curves. To ensure linearity of assay conditions, standard curves were constructed. Regression equations and R^2 values for linear regressions on log2-transformed values are shown on each chart. (A) ADH activity of twofold serial dilution of fly homogenates. Circles: Ident_dup flies. Triangles: Single construct flies. Open symbols denote data points that we excluded from linear regression due to loss of linear response at low concentrations. Arrowhead marks the fly concentration used in enzyme assay experiments. (B) β-Galactosidase activity of twofold serial dilution of purified β-galactosidase enzyme under assay conditions. Arrowheads mark the range of activity values observed from wing imaginal discs in experiments. (C and D) qRT-PCR of fourfold serial dilution of virilis Adh (C) or RP49 (D) plasmids. Arrowheads mark the range of quantification cycle (Cq) values observed from fly cDNA in experiments.
Fig. 5.
Overactivity is associated with increased transcription. Expression of Adh relative to control gene RP49 was measured with quantitative real-time PCR. (A) Adh transcription from duplicates in the ZH-86Fb site is overactive. (B) Adh transcription from duplicates in the attP40 site is additive. Dashed line shows predicted twofold expression of Single construct.
The Biological Significance and Potential Mechanisms Underlying Overactivity.
It may be asked whether it is biologically significant that a mutation changes activity by 2.6-fold rather than 2.0-fold. Evidence from functional and population studies in flies and humans suggests that fractional differences in gene expression of this magnitude (60%) can have phenotypic effects and show signatures of selection (1–3). It is therefore likely that duplicate overactivity can contribute meaningfully to phenotypes when large changes in activity are advantageous. The sixfold difference in ADH activity between alcohol-resistant D. virilis and alcohol-sensitive D. americana is among the largest seen between sibling Drosophila species (9, 11). Gene duplication and duplicate overactivity appear to be able to account for a portion of this difference, but we caution that we did not measure the level of overactivity at the native D. virilis Adh locus. Additional sequence divergence at Adh or in trans may also contribute to the difference in ADH levels.
At the population scale, however, most duplicates occur at low allele frequencies, suggesting that there is generally negative selection against large changes in gene activity (5, 6). When functionally redundant duplicated genes are retained (12), their joint expression levels often evolve to be comparable to that of single-copy genes. These observations suggest that duplicate overactivity might often be suppressed or masked by selection for mutations that reduce gene activity.
There are hints in the literature that gene duplicate overactivity may occur in other contexts. In the first-described case of gene duplication, Sturtevant (13) observed that Bar duplicate heterozygotes suppress eye facet formation 1.5-fold more than singleton homozygotes, a similar ratio to what we observed here with Adh. In mosquitoes, a tandemly duplicated block of P450 genes exhibits 25- to 50-fold higher transcription (14). In addition, in tumor cells as well as human populations, some duplicated genes also show possible nonadditive expression relative to single copies (15–17). However, we note that detection of duplicate overactivity requires that one control for additional potential regulatory substitutions in cis and in trans, which may impose a practical limit on studies of duplicate overactivity to fresh duplicates or to transgenic experiments.
Tandem duplicate overactivity appears to be a previously unknown form of position effect, in this case one in which gene expression levels are affected by the presence of an adjacent duplicate gene. The greater than twofold increase in transcription from a tandem duplicate could arise from aspects of various known regulatory mechanisms. Some of the possibilities we can envision include the following: (i) more frequent rebinding of transcription factors because the local concentration of binding sites is higher in tandemly arranged duplicates (18); (ii) more efficient looping of DNA due to clusters of transcription factors binding to identical sites on both gene copies (18); or (iii) more effective remodeling of chromatin to a favorable state for transcription (19). Any of these mechanisms could be enhanced or attenuated by neighboring sequences (i.e., by classical position effects), which could account for the observed dependence of the degree of overactivity on chromosomal position. However, the nearly universal positive overactivity observed here suggests that this is not just the influence of classical position effects, which we would expect to affect both single and duplicate genes in similar ways. Instead, some aspect of the duplicated sequence itself appears to generate a synergistic effect on expression when two identical genes are adjacent to each other.
Conclusion
The discovery of the overactivity of tandem duplicates in Drosophila, despite many decades of the study of gene duplication, underscores how our understanding of the quantitative factors that govern gene expression is incomplete. We hope that this study will prompt similar quantitative analyses of gene duplicates in other genomes to ascertain to what degree overactivity is a general phenomenon. Uncovering such potential general principles is a necessary step toward the goal of using genome sequences to understand and predict phenotypes.
Methods
We investigated the contribution of tandem duplications to phenotypes (enzyme activity and mRNA levels) using transgenes in Drosophila. Transgenic lines with Adh and vgQ-lacZ single or tandem duplicate insertions were produced using the PhiC31-attP system as described in SI Methods and Table S15. This transgenic system allows different transgenes (e.g., Single and Duplicate) to be inserted into the same chromosomal site in an identical genetic background. ADH enzyme activity and mRNA levels were measured from homogenates of whole flies, whereas β-galactosidase activity was measured from dissected wandering third-instar wing imaginal discs using protocols described in SI Methods and Fig. S2. Assays were checked for linearity and one-to-one scaling using standard curves shown in Fig. S1. The experimental design had a nested structure: enzyme activity and mRNA levels were measured from a large number of samples (i.e., 12–96) from a small number (i.e., 2–4) of replicate transgenic lines. Therefore, we analyzed the data with a mixed-effects model (described in more detail in SI Methods and with details on the sample size, model parameters, and estimated effects for each experiment presented in Tables S1–S14). Tests of the null hypothesis of twofold difference were calculated with t tests, using the SEs from the mixed-effects models and degrees of freedom corresponding to the number of transgenic lines.
Table S15.
Oligonucleotide primers used
| Name | Sequence | Purpose |
| pCaryP-attP-F2 | GCGGCAACCCTCAGCG | Construction of line pf40 |
| pCaryP-attP-R1 | ACGTGTCCACCCCGGTCA | Construction of line pf40 |
| Adh-CDS-F2 | AAGCAAAAAAGAAGTCACCATGTC | Construction of line pf40 |
| Adh-exon2-R2 | CAGGTTCTAGGATTGAATACACGA | Construction of line pf40 |
| Adh_vir_AF1 | TTggcgcgcCTTATCAGTAAATTTACGAGTGGTTTGT | Amplify vir Adh LHS |
| Adh_vir_il3F | GCAAACCGTAAGTCCATTGTCTAC | Amplify vir Adh center |
| Adh_vir_il3R | GTAGACAATGGACTTACGGTTTGC | Amplify vir Adh LHS |
| Adh_vir_ir1F | TAGCCGAGACATTGCTGTTGAG | Amplify vir Adh RHS |
| Adh_vir_ir1R | CTCAACAGCAATGTCTCGGCTA | Amplify vir Adh center |
| Adh_vir_NR1 | ATGTTAgcggccgCTGTTTTGCTGTCTGAATTTTGTG | Amplify vir Adh RHS |
| Adh_vir_s3aAF1 | CCCGGGCGAATTCGCCggcgcgccTTATCAGTAAATTTACGAGTGGTTTGT | Amplify vir Adh with Gibson ends for pS3aG |
| Adh_vir_s3aNR1 | CTGATTATGATCTAGAGTCgcggccgcTGTTTTGCTGTCTGAATTTTGTG | Amplify vir Adh with Gibson ends for pS3aG |
| csr34F | GCGCTGACTTTGAGTGGAATGTC | Amplify vgQ-lacZ |
| pGBQ_S3aAcsr34F | TGGCGGCCGCGGGAATTCGATGCTCTTCCACGGACATGCTAAGGGTTAATCAACGCGCTGACTTTGAGTGGAATGTC | Amplify vgQ-lacZ with Gibson ends for pGem-T-Easy |
| pGBQ_S3aNsv40R2 | CCGCGAATTCACTAGTGATGCTCTTCGACATTCCACTCAAAGTCAGCGCGTTGATTAACCCTTAGCATGTCCGTG | Amplify vgQ-lacZ with Gibson ends for pGem-T-Easy |
| qrAdhvir-2F | CTATTGCGGTAAACTTTACGGGCACG | qRT-PCR vir Adh |
| qrAdhvir-2R | TACACCAGTAATAGGCGCCAGTCTG | qRT-PCR vir Adh |
| Rp49-dl2F | CGCCCAGCATACAGGCCCAAGA | qRT-PCR Rp49 (RPL32) |
| Rp49-dl2R | ACCAGGAACTTCTTGAATCCGGTGG | qRT-PCR Rp49 (RPL32) |
| S3a_F2 | CACATGTGCAAGAGAACCCAGTG | Sequence inserts in pS3aG |
| S3a_R5 | GCATTCATTTTATGTTTCAGGTTCA | Sequence inserts in pS3aG |
| sv40-R2 | GTTGATTAACCCTTAGCATGTCCGTG | Amplify vgQ-lacZ |
| va_S01_R | ACTGATGACTTTTGGTTTAAGTATTTCTA | Sequence vir Adh |
| va_S02_R | CAGCCGAACGGCATTGATT | Sequence vir Adh |
| va_S03_F | TACACAAGAGCATTACATTACCTAGACAC | Sequence vir Adh |
| va_S04_F | ATAGCTGCGCATGCATGATA | Sequence vir Adh |
| va_S05_R | CGTGTGCTTTGTGTGTTGGTAGA | Sequence vir Adh |
| va_S06_F | CGGCAAGGGAACACTAAAAA | Sequence vir Adh |
| va_S07_R | GACAATCTCCCGACTGGTGT | Sequence vir Adh |
| va_S08_F | TTGCCAAGCTGAAGACTGTG | Sequence vir Adh |
| va_S09_R | GATCGTCCAAAATGCCAGC | Sequence vir Adh |
| va_S10_F | TGGGCGAGCTACTTCTTGAG | Sequence vir Adh |
| va_S11_R | GCCTCAATGGCCTTAACAAA | Sequence vir Adh |
| va_S12_F | AAATGTGGTTGGTTGCTTTTCA | Sequence vir Adh |
| va_S13_R | TGGAGTAATCGAATTTGCAACG | Sequence vir Adh |
| va_S14_F | AATAAAATCTTCCTTTTGCAGGTTAC | Sequence vir Adh |
| va_S15_R | AACCTGCAAAAGGAAGATTTTATTAC | Sequence vir Adh |
| va_S16_F | GAGCATTTATTATTCAAGCAAGAT | Sequence vir Adh |
| va_S17_R | TATCTTGCTTGAATAATAAATGCTC | Sequence vir Adh |
| va_S18_F | TAGACATAGGTCAACATTTCCGATT | Sequence vir Adh |
| va_S19_R | AATCGGAAATGTTGACCTATGTCT | Sequence vir Adh |
| va_S20_F | CCAATCAAAAGTTGCGTGTG | Sequence vir Adh |
| va_S21_R | CACACGCAACTTTTGATTGG | Sequence vir Adh |
| va_S22_F | AAATGGCGTATACCGCAAAA | Sequence vir Adh |
| va_S23_R | CAGCAGAAGCCAGCAAACG | Sequence vir Adh |
| va_S24_R | GCTGTGCTTGTTGTTCTTGC | Sequence vir Adh |
| va_S25_F | GATACGGATGGATTTGCTACGA | Sequence vir Adh |
| va_S30_F | GCCGACGAGTTGCCAATG | Sequence vir Adh |
| va_S31_R | GACTCAATCCGCTTGTTCTGTG | Sequence vir Adh |
| va_S32_F | GTCACACCTCTGGGATGTCAAT | Sequence vir Adh |
| vgQ-F1 | CGATTGTACTTTGTCGTTTCTAATTG | Sequence vgQ-lacZ |
| vgQ-R1 | CCCTCCGGAGACCGGGGGCCCAAAAATAG | Sequence vgQ-lacZ |
| vgQ-seqout-F1 | ACGCATAGTGCGGTCCTGCAC | Sequence vgQ-lacZ |
| vgQ-seqout-R1 | CTCCAGTTGTTGGATATTTTTCTCTCG | Sequence vgQ-lacZ |
| vgQseq_10200R | CAGACGCCACTGCTGCCAGG | Sequence vgQ-lacZ |
| vgQseq_10500F | TAACGCCTGGGTCGAACGCTGG | Sequence vgQ-lacZ |
| vgQseq_10800R | GCAGTAAGGCGGTCGGGATAGT | Sequence vgQ-lacZ |
| vgQseq_11000F | TGGCTGAATATCGACGGTTTCCATATGG | Sequence vgQ-lacZ |
| vgQseq_11300R | TACCCGTATCACTTTTGCTGATATGG | Sequence vgQ-lacZ |
| vgQseq_11600F | ATGAATGGGAGCAGTGGTGGAATGC | Sequence vgQ-lacZ |
| vgQseq_11900R | AGACACTCTATGCCTGTGTGGAG | Sequence vgQ-lacZ |
| vgQseq_12200F | TCACTGCATTCTAGTTGTGGTTTGTCC | Sequence vgQ-lacZ |
| vgQseq_7300F | TGACGACTTCTGGCTTCTGGTACG | Sequence vgQ-lacZ |
| vgQseq_7700F | TATAAATAGAGGCGCTTCGTCTACGGA | Sequence vgQ-lacZ |
| vgQseq_7900R | GATTCACTTTAACTTGCACTTTACTGCAGA | Sequence vgQ-lacZ |
| vgQseq_8300F | GTTACGATGCGCCCATCTACACCA | Sequence vgQ-lacZ |
| vgQseq_8500R | ATGAAACGCCGAGTTAACGCCATCA | Sequence vgQ-lacZ |
| vgQseq_8800F | TTGCGTGACTACCTACGGGTAACAG | Sequence vgQ-lacZ |
| vgQseq_9000R | GTGTGCAGTTCAACCACCGCA | Sequence vgQ-lacZ |
| vgQseq_9300F | TGGATGAAGCCAATATTGAAACCCACG | Sequence vgQ-lacZ |
| vgQseq_9600R | TTTGATGGACCATTTCGGCACAGCC | Sequence vgQ-lacZ |
| vgQseq_9900F | CGCATCCAGCGCTGACGGAAGC | Sequence vgQ-lacZ |
| vgQstapleL-F2 | CACGGACATGCTAAGGGTTAATCAACGCGCTGACTTTGAGTGGAATGTC | Amplify staple fragments to make vgQ-lacZ duplicate |
| vgQstapleL-R2 | GCCGCGGGAATTCGATTCGTGGCGATTTATGACCCGATAACG | Amplify staple fragments to make vgQ-lacZ duplicate |
| vgQstapleR-F2 | GCGAATTCACTAGTGATTCTCGAATTGGTCGACCTGCAGCCA | Amplify staple fragments to make vgQ-lacZ duplicate |
| vgQstapleR-R1 | GACATTCCACTCAAAGTCAGCGCGTTGATTAACCCTTAGCATGTCCGTG | Amplify staple fragments to make vgQ-lacZ duplicate |
| vir_S26_F | ACTAGAAGTCCGTGTCAGGAATCTAA | Sequence vir Adh |
| vir_S29_R | GTACACTGATTCATAAGAGACATACATCC | Sequence vir Adh |
Fig. S2.
Comparison of homogenization methods. F2 homozygotes and heterozygotes from crosses to the null pf86 line give qualitatively similar results when extracted using (A) Potter–Elvehjem homogenizers or (B) the miniG ball bearing grinder (same data as Fig. 2).
SI Methods
Stocks and Insertion Lines.
The D. virilis line used in this study has been maintained in the laboratory for over a decade and is derived from Bowling Green stock #15010-1051.0 (20). D. americana #15010-0951.00 was obtained from the University of California, San Diego Drosophila Species Stock Center. The vgQ-lacZ reporter gene was isolated from the transgenic D. melanogaster line vgQ-lacZ[hope40] (10).
To compare the effects of Adh loci, we created two Adh-null lines containing the machinery for phiC31 site-specific transgenesis:
- i) Line pf86 has genotype y[1] M{vas-int.Dm}ZH-2A w[*]; Adh[fn6] cn[1]; M{3xP3-RFP.attP}ZH-86Fb. It was derived from Bloomington Drosophila Stock Center (BDSC) #24749 (y[1] M{vas-int.Dm}ZH-2A w[*]; M{3xP3-RFP.attP}ZH-86Fb), BDSC #1983 (Adh[fn6] cn[1]; ry[506]), and the balancer line w[1118]; CyO/Sco; MKRS/Tm6b,Tb. The following crossing scheme was used to create the line: (P) #24749 females were crossed to balancer males. (F1a) Sco;Tb males were crossed to #24749 females. (F1b) #1983 males were crossed to balancer females. (F1c) #24749 females were crossed to balancer males. (F2a) Sco;+ females from cross F1a were crossed to Cy;Tb males from cross F1b. (F2b) Cy;Tb males from F1c were crossed to #24749 females. (F3) Sco;Tb males from F2a were crossed to Cy;+ females from F2b. (F4) Cy;+ males and females were sib-mated. (F5) Non-Cy males and females were sib-mated to form the homozygous line.
- ii) Line pf40 has genotype y[1] M{vas-int.Dm}ZH-2A w[*]; P{CaryP}attP40 Adh[fn6] cn[1]; +. It was derived from the line “attp40” (attP40 y[1] w[67c23] M{vas-int.Dm}ZH-2A; P{CaryP}attP40), BDSC #1983, and the balancer line w[1118]; CyO/Sco; MKRS/Tm6b,Tb. The following crossing scheme was used to create the line: (Pa) #1983 females were crossed with attP40 males. (Pb) attP40 females were crossed with balancer males. (F1a) Females from cross Pa were crossed to balancer males. (F1b) Cy;MKRS males from cross Pb were crossed with attP40 females. (F1c) attP40 females were crossed with balancer males. (F2a) Sco;MKRS males from cross F1a were crossed with Cy;+ females from cross F1b. DNA was extracted from wings of males in this cross, and PCR with sequencing was carried out to detect males carrying both Adh[fn6] and P{CaryP}attP40 using primers pCaryP-attP-F2/pCaryP-attP-R1 and Adh-CDS-F2/Adh-exon2-R2. Crosses with nonrecombinant males (i.e., not positive for both Adh[fn6] and P{CaryP}attP40) were discarded. (F2b) Sco;MKRS males from cross F1c were crossed with attP40 females. (F3) Cy;MKRS males from cross F2a were crossed to Sco;+ females from cross F2b. (F4) Sco;+ males and females were sib-mated. (F5) Non-Sco males and females were sib-mated to form the homozygous line.
Other PhiC31-attP insertion lines used in this study are VK00037 (BDSC #9752), VK00033 (BDSC #9750), ZH-22A (BDSC #24481), ZH-51D (BDSC #24483), and ZH-68E (BDSC #24485).
Cloning and Transgenics.
Genomic DNA for cloning was extracted using the Qiagen Genomic-tip 20/G kit. Insert sequence was verified using Sanger sequencing after every cloning step. Duplicates were checked for double peaks. Primer sequences are in Table S15.
The D. virilis locus was amplified in three fragments (“left,” “center,” “right”) using Phusion or Q5 polymerase (New England Biolabs) and primers Adh_vir_AF1/Adh_vir_il3R, Adh_vir_il3F/Adh_vir_ir1R, and Adh_vir_ir1F/Adh_vir_NR1, respectively. Fragments were gel purified, cloned into pGEM-T-Easy (Promega), and sequenced with primers shown in Table S15. One SNP, the leftmost of three SNPs in Fig. 1B, was introduced into the junction of the left and center fragments by the il3R/il3F primers (which match the reference genome sequence, but not the haplotype used here, as determined by PCR and sequencing from genomic DNA). Fragments were reamplified using primers with vector ends (Adh_vir_s3aAF1 or Adh_vir_s3aNR1) and assembled (21) with Gibson Assembly Master Mix (New England Biolabs) into the transformation vector pS3aG (22), which had been cut with AscI and NotI-HF (New England Biolabs) to remove the GFP fragment, but preserve the flanking gypsy and Sf-1b insulators. To make the virilis single-copy locus (“Single”), which matches the right-side duplicate, fragments were amplified from the left and right clones using primers s3aAF1/ir1R and ir1F/s3aNR1, and Gibson assembled into pS3aG. To make the identical duplicate construct, we first digested the 6,900-bp NdeI–NotI fragment from Single and cloned it into pGem-T-Easy. This fragment was redigested out and ligated with the following: the 1,000-bp XhoI–NdeI fragment from center, the 6,900-bp AscI–XhoI fragment from Single, and pS3aG backbone (AscI–NotI-HF).
The vgQ-lacZ gene was retrieved from transformed flies using the primers csr34F/sv40-R2, which amplify the entire enhancer, lacZ gene, and SV40 terminator. This was TA cloned into pGem-T-Easy and sequenced. Restriction site ends were added by reamplifying using primers pGBQ_S3aAcsr34F/pGBQ_S3aNsv40R2 and Gibson assembly into pGem-T-Easy to make the “pgBQ” construct. The “vgQ-lacZ Single” fragment was excised from pgBQ using AscI and NotI-HF and ligated into pS3aG. To make a duplicate, a “staple” was first produced by Gibson assembly into pGem-T-Easy of PCR products of the left and right ends of the gene, produced with primers vgQstapleL-F2/vgQstapleL-R2 and vgQstapleR-F2/vgQstapleR-R1. The duplicate was ligated from the following: (i) the AscI–HindIII fragment from pgBQ; (ii) the HindIII–AclI fragment from staple; (iii) the AclI–NotI-HF fragment from pgBQ; and (iv) pS3aG (digested with AscI and NotI-HF).
Plasmid DNA was prepared to 1 µg/µL with the Qiagen Midi Spin kit or the Macherey-Nagel NucleoBond Xtra Midi Plus EF kit and injected into embryos. For injections in pf86, pf40, and ZH-51D, positive transformants were backcrossed to the injection line and made homozygous. Injections into VK00037, VK00033, ZH-22A, and ZH-68E were carried out by BestGene. Transformants in these lines were crossed out to yw to remove the phiC31 source and then made homozygous.
ADH Enzyme Assay.
Flies for assays were reared in low-crowding conditions: five females and two males were allowed to lay eggs in sugar food vials without yeast for 2–3 d. For ADH assays, 0- to 24-h-old male flies were collected and aged in fresh vials for 4 d. Flies were picked under CO2 and transferred to microcentrifuge tubes on ice before homogenization. For β-galactosidase assays, wandering third-instar larvae were collected and dissected immediately.
The ADH enzyme assay was a modification of Mercot et al. (11). For most experiments, flies were homogenized at a concentration of five flies per milliliter in NP buffer (cold 0.1 M NaH2PO4 plus Na2HPO4, pH 8.6), using 2-mL Potter–Elvehjem Teflon–glass homogenizers on ice. Because the number of flies produced per vial was variable, we always homogenized two flies in 400 µL of NP buffer. Homogenate was transferred to a microcentrifuge tube and centrifuged at 4 °C for 5 min at 21,000 × g. Sample order was then randomized and 180 µL of supernatant was then transferred to PCR strip tubes at room temperature (21 °C) so that a multichannel pipet could be used to load samples into the assay.
All ADH enzyme assays were carried out as follows. Forty microliters of homogenate was added to 60 µL of assay solution in flat-bottom 96-well clear polystyrene plates with three technical replicates per homogenate. Assay solution consisted of 4 mM freshly reconstituted NAD+ (Sigma), 800 mM ethanol, and 100 mM Tris⋅HCl, pH 8.6. The plate was loaded in a Multiskan GO plate reader (Thermo Fisher) at 25 °C, shaken 15 s, paused 15 s, and absorbance was then read at 340 nm at 25 °C over 5 min in 9-s intervals. These conditions were chosen such that the highest observed sample was in Vmax conditions, based on serial dilution of each substrate. Activity measurements of serial dilutions of fly homogenates were observed to scale linearly over the concentrations used in experiments (Fig. S1A), suggesting that concentration-dependent change in enzyme cooperativity did not contribute to the observed overactivity. Fly homogenates display a low background rate of NAD+ reduction (e.g., uninserted flies in Fig. 2). The occurrence of a background rate suggests that the magnitude of overactivity is slightly higher than reported here (for ADH enzyme activity from site ZH-86Fb). However, as we did not measure background rates in all experiments, we report uncorrected enzyme activity values.
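As an illustration of how a rate in ΔAbs340 per minute could be obtained from a kinetic read like the one described above, the following sketch fits a least-squares line to absorbance versus time and converts the slope to per-minute units. The slope-based estimate, the function name `kinetic_rate_per_min`, and the synthetic readings are assumptions made for illustration; the source does not state how rates were derived from the plate-reader output.

```python
def kinetic_rate_per_min(times_s, absorbances):
    """Least-squares slope of absorbance vs. time, in absorbance units per minute.

    times_s: read times in seconds; absorbances: matching Abs340 readings.
    """
    n = len(times_s)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired time/absorbance readings")
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    return 60.0 * sxy / sxx  # slope per second -> per minute


# Hypothetical well: one reading every 9 s across roughly 5 min (synthetic, perfectly linear)
times = [9 * i for i in range(34)]
abs340 = [0.100 + 0.0005 * t for t in times]
rate = kinetic_rate_per_min(times, abs340)
print(round(rate, 4))
```

A regression slope uses every reading in the window, so it is less sensitive to a single noisy read than an endpoint difference would be.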
Soluble protein was measured for each sample to account for variation in homogenization efficiency. Protein was quantified from ADH assay samples using the Quant-IT protein assay kit (Thermo Fisher). Five microliters of homogenate or standards were added to 95 µL of assay solution in black round-bottom 96-well plates. Three technical replicates were run per homogenate, with standards run on each plate. Plates were shaken, incubated for 30 min to 2 h at room temperature, and fluorescence was measured on a Perkin-Elmer Victor5 plate reader. Soluble protein of each technical replicate was then estimated using the slope and intercept of a linear regression through the 300–100 ng/µL protein standards for that plate.
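The per-plate standard-curve step can be sketched as follows: fit a line to fluorescence versus known concentration for the standards on a plate, then invert that line for each sample reading. The function names, the standard concentrations, and the fluorescence values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_from_fluorescence(fluor, slope, intercept):
    """Invert the standard curve to estimate protein concentration."""
    return (fluor - intercept) / slope


# Hypothetical standards for one plate (ng/uL) and their synthetic fluorescence readings
standards_ng_per_ul = [100.0, 200.0, 300.0]
standard_fluor = [1050.0, 2050.0, 3050.0]
slope, intercept = fit_line(standards_ng_per_ul, standard_fluor)

# Estimate one technical replicate of a sample from its fluorescence
sample_conc = protein_from_fluorescence(1550.0, slope, intercept)
print(sample_conc)
```

Fitting the curve separately for each plate, as the text describes, keeps plate-to-plate differences in reader gain or incubation time out of the protein estimates.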
For the crossing experiments (Figs. 2 and 3C), we developed a high-throughput homogenization procedure that uses a Spex Mini-G vertical ball-bearing grinder (Spex Sample Prep). A 1-mL 96-well Masterblock titer plate (Greiner) was loaded into a Spex Kryotech aluminum plate insert. Each well of the plate was filled with 5/32” stainless-steel ball bearings (Spex) and 250 µL of NP buffer and chilled. Single 4-d-old male flies were picked by eye color under CO2 and loaded individually into wells of the plate (kept on ice), following a randomized plate layout. The plate was sealed with a cap mat and homogenized in the Mini-G at 1,000 rpm for 15 s followed by 2 min on ice, repeated for five cycles. The aluminum insert was then removed. The plate was centrifuged at 4,000 × g at 4 °C for 10 min. A volume of 180 µL of supernatant was transferred to PCR strip tubes at room temperature and assayed as described above. We compared this method to the manual homogenizers in pilot assays and recovered similar ADH activities but higher soluble protein, which probably accounts for the scale differences between the two methods, e.g., dup homozygotes measured in Fig. 1C vs. Fig. 2. Comparison of the two homogenization protocols on flies from the cross experiment (Fig. S2) suggested no qualitative effect on the relative difference among genotypes and heterozygotes vs. homozygotes.
For analysis, because the technical replicates of ADH activity and total protein were not paired and could not be easily incorporated into our statistical model, we computed the median activity and median protein for each set of technical replicates. The response variable used in analysis, with one data point per homogenate, was [median(ADH activity of three technical replicates)/median(soluble protein of three technical replicates)], with units of ΔAbs340 per minute per milligram of soluble protein.
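The response variable described above reduces to a ratio of two medians per homogenate. A minimal sketch, assuming the activity values are already in ΔAbs340 per minute and the protein values have already been converted to milligrams (the conversion itself is not specified in the text); the function name and numbers are illustrative only.

```python
from statistics import median


def normalized_adh_activity(activity_reps, protein_mg_reps):
    """median(ADH activity replicates) / median(soluble protein replicates)."""
    return median(activity_reps) / median(protein_mg_reps)


# Hypothetical technical replicates for a single homogenate
activity = [0.030, 0.032, 0.029]   # dAbs340 per minute
protein = [0.010, 0.012, 0.011]    # mg soluble protein
response = normalized_adh_activity(activity, protein)
print(round(response, 3))
```

Because the activity and protein replicates are unpaired, the medians are taken separately before dividing rather than dividing replicate by replicate.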
β-Galactosidase Assay.
β-Galactosidase assays were heavily modified from Ashburner (23) as follows. Wandering third-instar larvae were dissected in KPM buffer (50 mM KH2PO4 plus K2HPO4, pH 7.5, plus 1 mM MgCl2). Both wing imaginal discs were separated out and pipetted in 1-µL volume into a microcentrifuge tube containing 4 µL of 1% (vol/vol) Nonidet-P40 detergent in KPM buffer to lyse. Samples were incubated in ice for 1–3 h and then brought to a volume of 400 µL using KPM buffer, vortexed, and centrifuged for 5 min at 4 °C at 21,000 × g. Samples were randomized and 180 µL of supernatant was transferred to PCR strip tubes at room temperature for the assay.
For the β-galactosidase assay, 50 µL of sample (i.e., one-fourth of a disc) was combined with 150 µL of assay solution in clear 96-well plates. To make the assay solution, 4 mM chlorophenol red β-d-galactopyranoside (CPRG; Roche) was first prepared in sterile deionized water. A 1/9 volume of 10× KPM buffer was added to this and immediately vortexed. Plates were loaded into the Multiskan GO, shaken 15 s, paused 15 s, and absorbance at 574 nm measured over 1 h in 1-min intervals at 25 °C. The response variable used for analysis was the median of two to three technical replicates of β-galactosidase activity, in units of ΔAbs574 per hour. These assay conditions were chosen to ensure linearity of the response. Serial dilution of CPRG substrate did not yield a flat-asymptoting Michaelis-Menten–like curve at substrate concentrations greater than 4 mM in preliminary tests, perhaps due to the solubility of the substrate in the assay buffer. Therefore, we confirmed that the assay responded linearly over the range of activity observed in our fly samples, using serial dilution of purified E. coli β-galactosidase (Sigma) in a buffer containing the same concentrations of KPM and Nonidet as the fly disc samples (Fig. S1C). Unlike the ADH assays, we did not quantify soluble protein, owing to interference from the detergent and to the much lower protein concentration of the imaginal disc samples.
Quantitative PCR.
Flies were prepared as in the ADH enzyme assay, with the following modifications. Four to five males were homogenized in 800–1,000 µL of NP on ice in Potter–Elvehjem homogenizers. Immediately, 200 µL of homogenate was added to 600 µL of TRIzol LS (Sigma) and vortexed. To extract RNA, TRIzol lysates were run through the Direct-Zol kit (Zymo Research) with DNase treatment. RNA concentration was quantified using the Qubit HS or BR RNA kits (Thermo Fisher). cDNA was synthesized using the SuperScript IV first-strand synthesis kit (Thermo Fisher) from a fixed amount of RNA (e.g., 455 ng) per day replicate.
For quantitative real-time PCR, cDNA samples were diluted to 80 µL, and 4 µL of this was run in 10-µL reactions containing PowerUp SYBR Green Master mix (Applied Biosystems) and each primer at 500 nM in an ABI 7900HT machine (Applied Biosystems) under normal conditions (50 °C for 2 min; 95 °C for 2 min; 40 cycles of 95 °C for 15 s, 60 °C for 15 s, 72 °C for 60 s). The quantification cycle (Cq) threshold was set manually above observed noise, between 0.1 and 1. Three to four technical replicates were run per sample, and the median quantification cycle was used for analysis. Technical replicates that produced undetectable levels were dropped before analysis. D. virilis Adh transcripts were amplified using primers qrAdhvir-2F and qrAdhvir-2R (whose 3′ end spans an intron). Control gene RP49 (also known as RpL32) was amplified using Rp49-dl2F and Rp49-dl2R, which produce an intron-spanning product. Control samples were prepared from uninserted flies or by omitting reverse transcriptase or DNase; in these, the quantification cycle threshold was reached >10 cycles later or not at all, suggesting that contaminants were at 0.001-fold or lower. Melt curves produced single peaks. A standard curve (Fig. S1 C and D) was run using a fourfold serial dilution of PCR products cloned in pGem-T-Easy and diluted in E. coli tRNA as recommended by Bustin et al. (24). Four technical replicates were run per dilution step. Three technical replicate measurements in the Adh standard curve appeared to be low outliers (i.e., the quantification cycle threshold was reached 8+ cycles after the median) and were dropped. Standard curves were linear and one-to-one (e.g., 4.0-fold dilution produced a quantification cycle difference of 2.0, suggesting a 4.0-fold lower concentration of product).
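The two numerical statements in this paragraph follow from the assumption that product doubles every cycle. The short sketch below works through both: the Cq shift expected from a given dilution, and the relative template abundance implied by a given Cq delay. The function names are hypothetical, and perfect doubling per cycle (100% amplification efficiency) is an assumption of the sketch.

```python
import math


def expected_cq_shift(dilution_factor):
    """Cycles of delay expected from a dilution, assuming doubling every cycle."""
    return math.log2(dilution_factor)


def relative_abundance_from_delay(delta_cq):
    """Template abundance relative to a reference that crossed threshold delta_cq cycles earlier."""
    return 2.0 ** (-delta_cq)


# A fourfold dilution step should delay the threshold crossing by two cycles
shift = expected_cq_shift(4.0)

# A control crossing threshold 10 cycles late carries about 0.001-fold as much template
contaminant_fraction = relative_abundance_from_delay(10)
print(shift, contaminant_fraction)
```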
For analysis, we used the delta-delta-Ct method, which gives values in fold-change of expression, i.e., $2^{-(\mathrm{Cq}_{Adh} - \mathrm{Cq}_{RP49})}$ for each sample. For presentation, all values were normalized such that Single constructs had a value of 1.
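A minimal sketch of this calculation: each sample's relative expression is 2 raised to the negative difference between the median Adh Cq and the median RP49 Cq, and values are then rescaled so that Single equals 1. Dividing by the mean of the Single samples is an assumption about how the normalization was done, and the function names and Cq values are hypothetical.

```python
from statistics import mean, median


def relative_expression(adh_cq_reps, rp49_cq_reps):
    """2^-(Adh Cq - RP49 Cq), using the median of technical replicates for each gene."""
    delta_cq = median(adh_cq_reps) - median(rp49_cq_reps)
    return 2.0 ** (-delta_cq)


def normalize_to_single(expr_by_construct, reference="Single"):
    """Rescale per-construct sample values so the reference construct averages 1."""
    ref = mean(expr_by_construct[reference])
    return {name: [v / ref for v in vals] for name, vals in expr_by_construct.items()}


# Hypothetical Cq values for one Single sample and one Dup sample
single = relative_expression([22.0, 22.1, 21.9], [20.0, 20.0, 20.0])  # delta Cq = 2
dup = relative_expression([21.0, 21.2, 20.9], [20.0, 20.0, 20.0])     # delta Cq = 1
normalized = normalize_to_single({"Single": [single], "Dup": [dup]})
print(normalized)
```

In this synthetic case the duplicate sample crosses threshold one cycle earlier than the single, which corresponds to twofold higher relative expression.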
Statistical Analysis.
Each experiment had a similar structure. For each construct, two to four transgenic lines were measured. Lines were considered as the experimental units for determining degrees of freedom. Preliminary measurements suggested that we had adequate statistical power to detect the observed deviations from twofold differences with two or more transgene lines per construct, so long as multiple measurements within lines were conducted. Flies were collected from each of three vials per transgenic line, with one sample per vial for ADH assays (or one heterozygote and one homozygote for the cross experiments) and one to two samples per vial for β-galactosidase assays. To increase our sampling precision, we repeated the same measurements across multiple days using fresh samples each time. In addition, two to three technical replicate measurements were conducted on each sample. Technical replication was condensed into a single measurement per sample by taking the median, to simplify the statistical analysis and to systematically reduce the influence of occasional large outliers (e.g., due to opaque particles). The response variables used in the statistical model (ADH activity, β-galactosidase activity, and Adh mRNA relative expression) are described in their respective sections above.
The goal of this experimental design was to determine the contribution of genotype to the observed differences in gene expression. Initial tests had suggested that there might be variation deriving from individual fly cultures and from performance of the assays on different days; we also considered it possible that individual transgenic lines contributed additional variation. Replicating the experiment at these levels was necessary for the detection of small differences from 2.0-fold between duplicate and single constructs in enzyme assays with their associated variance. To analyze data with this structure, we used a statistical model (mixed-effects model) that partitions unwanted sources of variation into random effects, similar to a traditional nested analysis of variance. The model we used was additionally structured to account for the nesting factors day and line not being nested within one another, but “crossed” (25). Also, because variance appeared to increase with gene copy number, the model also corrects for different variance among genotypes, which would otherwise lead to a violation of the equal variance assumption.
Specifically, statistical analysis was conducted in R 3.2.2 using packages nlme, lme4, and ggplot2 (ref. 26 and www.R-project.org/). A typical model was response ~ construct with crossed-factor random effects 1|line and 1|day (the lowest level nesting factor vial is included implicitly). Examination of residuals suggested unequal variance, with duplicate constructs showing much higher variance, and thus analysis with lmer() was not reasonable. We therefore fit models with crossed-factor random effects and floating variance in lme() using the modified syntax suggested by Galecki and Burzykowski (24), e.g., lme(activity ~ construct, random = list(one1 = pdIdent(~day - 1), one2 = pdIdent(~line - 1)), weights = varIdent(form = ~1|construct), data = data), where one1 and one2 are vectors of the value 1L. The construct factor was ordered such that single-copy constructs were in the first (intercept) position, so that the intercept estimates the mean of Single and the construct term estimates the difference between Dup and Single. All models used restricted maximum likelihood (REML). Residuals and random effects were checked for normality and lack of structure when plotted against fitted values and grouping factors. Model parameter estimates are shown in Tables S1–S14.
Constructs were compared using t tests based on the model estimates of mean and SE. This was done manually to test the null hypothesis of twofold difference, and because the current version of lme() overestimates the degrees of freedom when random effects are specified as a list. We considered sample size to be the number of transgenic lines (i.e., two to four). We tested the null hypothesis that the duplicate gave twice the response of the single using a two-sided t test that compares the effect of the single to the difference between single and duplicate (this difference term is a direct output of the model). Specifically,
$$t = \frac{\beta_1 - \beta_0}{\sqrt{s_p^2\left(\frac{1}{n_0} + \frac{1}{n_1}\right)}}, \qquad s_p^2 = \frac{(n_0 - 1)\,n_0\,SE_0^2 + (n_1 - 1)\,n_1\,SE_1^2}{n_0 + n_1 - 2}, \qquad \mathrm{df} = n_0 + n_1 - 2,$$

where $\beta_0$ is the fixed-effect parameter “(Intercept)” (i.e., the mean of Single), $\beta_1$ is the fixed-effect parameter “Dup” or “Ident_Dup” (i.e., the mean of Dup minus the Intercept), $SE_0$ and $SE_1$ are the respective SEs from the mixed-effects model, and $n_0$ and $n_1$ are the respective sample sizes. $\beta_1 = \beta_0$ if the duplicate has exactly twice the activity of the single. All of these values are given in Tables S1–S14. We conducted this as an equal-variance t test because we had already corrected for the effects of unequal variance among genotypes in the mixed-effects model. The 95% confidence intervals of the mean shown in the figures were computed from the t distribution using the SEs obtained when we refit the model with no intercept.
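A test of this kind can be sketched from summary statistics alone. The example below assumes the textbook pooled-variance form of the equal-variance two-sample t test, with each group's variance recovered from its SE and sample size; whether this matches the authors' exact computation is an assumption. The function name and input numbers are hypothetical and are not taken from Tables S1–S14.

```python
import math


def twofold_null_t(beta_single, beta_diff, se_single, se_diff, n_single, n_dup):
    """t statistic and degrees of freedom for the null hypothesis Dup = 2 x Single.

    beta_single: intercept estimate (mean of Single)
    beta_diff:   estimate of (mean of Dup - mean of Single)
    Under the null hypothesis, beta_diff equals beta_single.
    """
    df = n_single + n_dup - 2
    # Recover per-group variances from the standard errors (var = n * SE^2)
    var_single = n_single * se_single ** 2
    var_diff = n_dup * se_diff ** 2
    pooled_var = ((n_single - 1) * var_single + (n_dup - 1) * var_diff) / df
    se_pooled = math.sqrt(pooled_var * (1.0 / n_single + 1.0 / n_dup))
    return (beta_diff - beta_single) / se_pooled, df


# Hypothetical estimates: Single mean 1.0, Dup mean 2.6 (difference 1.6), three lines each
t_stat, df = twofold_null_t(1.0, 1.6, 0.1, 0.2, 3, 3)
print(round(t_stat, 3), df)
```

With equal sample sizes the pooled denominator reduces to the square root of the sum of the squared SEs, so the choice of form only matters when the number of lines differs between constructs. The two-sided P value would then come from a t distribution with the returned degrees of freedom.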
For the cross experiments, we were most interested in the planned comparison between dup heterozygous and singleton homozygous. For this comparison, we compared the mean estimates from a no-intercept model using a two-sample, two-sided t test.
Acknowledgments
We thank Kathy Vaccaro for superb technical support in producing transgenic lines; Nicholas Keuler for guidance on statistical analysis; Greg Wray for discussion; Fiona Ukken, Victoria Kassner, and Jane Selegue for technical advice; and Henry Chung, Matt Giorgianni, and Noah Dowell for advice and helpful comments on the manuscript. D.W.L. is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation. S.B.C. is a Howard Hughes Medical Institute Investigator.
Footnotes
The authors declare no conflict of interest.
Data deposition: The DNA sequences reported in this paper have been deposited in the GenBank database [accession no. KU559568 (Drosophila virilis Adh locus)].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605886113/-/DCSupplemental.
References
- 1. Stam LF, Laurie CC. Molecular dissection of a major gene effect on a quantitative trait: The level of alcohol dehydrogenase expression in Drosophila melanogaster. Genetics. 1996;144(4):1559–1564. doi: 10.1093/genetics/144.4.1559.
- 2. Tishkoff SA, et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 2007;39(1):31–40. doi: 10.1038/ng1946.
- 3. Babbitt CC, et al. Multiple functional variants in cis modulate PDYN expression. Mol Biol Evol. 2010;27(2):465–479. doi: 10.1093/molbev/msp276.
- 4. Coolon JD, McManus CJ, Stevenson KR, Graveley BR, Wittkopp PJ. Tempo and mode of regulatory evolution in Drosophila. Genome Res. 2014;24(5):797–808. doi: 10.1101/gr.163014.113.
- 5. Katju V, Bergthorsson U. Copy-number changes in evolution: Rates, fitness effects and adaptive significance. Front Genet. 2013;4:273.
- 6. Rogers RL, et al. Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans. PLoS One. 2015;10(7):e0132184. doi: 10.1371/journal.pone.0132184.
- 7. Ohno S. Evolution by Gene Duplication. Springer; New York: 1970.
- 8. Force A, et al. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151(4):1531–1545. doi: 10.1093/genetics/151.4.1531.
- 9. Nurminsky DI, Moriyama EN, Lozovskaya ER, Hartl DL. Molecular phylogeny and genome evolution in the Drosophila virilis species group: Duplications of the alcohol dehydrogenase gene. Mol Biol Evol. 1996;13(1):132–149. doi: 10.1093/oxfordjournals.molbev.a025551.
- 10. Kim J, et al. Integration of positional signals and regulation of wing formation and identity by Drosophila vestigial gene. Nature. 1996;382(6587):133–138. doi: 10.1038/382133a0.
- 11. Mercot H, Defaye D, Capy P, Pla E, David JR. Alcohol tolerance, ADH activity, and ecological niche of Drosophila species. Evolution. 1994;48(3):746–757. doi: 10.1111/j.1558-5646.1994.tb01358.x.
- 12. Qian W, Liao B-Y, Chang AY-F, Zhang J. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 2010;26(10):425–430. doi: 10.1016/j.tig.2010.07.002.
- 13. Sturtevant AH. The effects of unequal crossing over at the Bar locus in Drosophila. Genetics. 1925;10(2):117–147. doi: 10.1093/genetics/10.2.117.
- 14. Wondji CS, et al. Two duplicated P450 genes are associated with pyrethroid resistance in Anopheles funestus, a major malaria vector. Genome Res. 2009;19(3):452–459. doi: 10.1101/gr.087916.108.
- 15. Faust JB, Meeker TC. Amplification and expression of the bcl-1 gene in human solid tumor cell lines. Cancer Res. 1992;52(9):2460–2463.
- 16. Perry GH, et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007;39(10):1256–1260. doi: 10.1038/ng2123.
- 17. Handsaker RE, et al. Large multiallelic copy number variations in humans. Nat Genet. 2015;47(3):296–303. doi: 10.1038/ng.3200.
- 18. Feuerborn A, Cook PR. Why the activity of a gene depends on its neighbors. Trends Genet. 2015;31(9):483–490. doi: 10.1016/j.tig.2015.07.001.
- 19. Gross DS, Chowdhary S, Anandhakumar J, Kainth AS. Chromatin. Curr Biol. 2015;25(24):R1158–R1163. doi: 10.1016/j.cub.2015.10.059.
- 20. Wittkopp PJ, Vaccaro K, Carroll SB. Evolution of yellow gene regulation and pigmentation in Drosophila. Curr Biol. 2002;12(18):1547–1556. doi: 10.1016/s0960-9822(02)01113-2.
- 21. Gibson DG, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 2009;6(5):343–345. doi: 10.1038/nmeth.1318.
- 22. Ordway AJ, Hancuch KN, Johnson W, Williams TM, Rebeiz M. The expansion of body coloration involves coordinated evolution in cis and trans within the pigmentation regulatory network of Drosophila prostipennis. Dev Biol. 2014;392(2):431–440. doi: 10.1016/j.ydbio.2014.05.023.
- 23. Ashburner M. Drosophila: A Laboratory Manual. Cold Spring Harbor Lab Press; Cold Spring Harbor, NY: 1989. pp. 317–318.
- 24. Bustin SA, et al. The MIQE guidelines: Minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009;55(4):611–622. doi: 10.1373/clinchem.2008.112797.
- 25. Galecki A, Burzykowski T. Linear Mixed-Effects Models Using R: A Step-By-Step Approach. Springer; New York: 2013. pp. 478–480.
- 26. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; New York: 2009.







