Genetics. 2017 Sep 20;207(3):1157–1166. doi: 10.1534/genetics.117.300306

Variation in Position Effect Variegation Within a Natural Population

Keegan J P Kelsey 1,1, Andrew G Clark 1
PMCID: PMC5676239  PMID: 28931559

Abstract

Changes in chromatin state may drive changes in gene expression, and it is of growing interest to understand the population genetic forces that drive differences in chromatin state. Here, we use the phenomenon of position effect variegation (PEV), a well-studied proxy for chromatin state, to survey variation in PEV among a naturally derived population. Further, we explore the genetic architecture of natural variation in factors that modify PEV. While previous mutation screens have identified over 150 suppressors and enhancers of PEV, it remains unknown to what extent allelic variation in these modifiers mediates interindividual variation in PEV. Is natural variation in PEV mediated by segregating genetic variation in known Su(var) and E(var) genes, or is the trait polygenic, with many variants mapping elsewhere in the genome? We designed a dominant mapping study that directly answers this question and suggests that the bulk of the variance in PEV does not map to genes with prior annotated impact on PEV. Instead, we find enrichment of top P-value ranked associations that suggest impact on active promoter and transcription start site proximal regions. This work highlights extensive variation in PEV within a population, and provides a quantitative view of the role naturally segregating autosomal variants play in modifying PEV, a phenomenon that continues to shape our understanding of chromatin state and epigenetics.

Keywords: Position Effect Variegation (PEV), chromatin, natural variation, phenotypic variance, suppressor of variegation, enhancer of variegation


CHROMATIN states, defined broadly as the combination of chromatin and bound factors that together impact chromatin accessibility and gene expression, have clear evolutionary importance (Schulze and Wallrath 2007). The ModEncode project has generated valuable data for use in understanding determinants and classification of chromatin states and correlation with functional consequences (modENCODE Consortium et al. 2010; Kharchenko et al. 2011; Ernst and Kellis 2012; Filion et al. 2016). Despite rapid progress in chromatin biology, we still know little about variation in chromatin states among individuals and naturally segregating genetic variants that may be involved in generating these differences (Richards 2006, 2008; Bossdorf et al. 2008; Hu and Barrett 2017). Any natural variant that serves to impact chromatin state may be a target of natural selection, and it is of great interest to understand population genetic forces that drive differences between chromatin and genome accessibility of individuals.

Originally discovered by H. J. Muller, position effect variegation (PEV) has long been studied in Drosophila melanogaster and is widely accepted as a valuable tool in understanding dynamics of chromatin state and gene expression, especially with regards to the boundary between heterochromatin and euchromatin. The most commonly used form of PEV is the result of a specific X chromosome inversion whitemottled-4, or simply wm4 (Muller 1930), which relocated the normally euchromatic white gene next to pericentromeric heterochromatin. The molecular impact of this relocation is now understood to be a spreading of chromatin-associating factors that results in altered chromatin structure and gene silencing of white without known modification to coding sequence (Wallrath and Elgin 1995). Depending on the genomic background, the phenotypic manifestation of this particular silencing is mosaic across the facets of the eye, resulting in a “mottled” eye phenotype with local clones of cells in each eye showing an apparently stochastic response. Of significance, PEV is triggered by gene-chromosome rearrangements in many organisms, producing modified gene expression as a result of modified chromatin environment (Girton and Johansen 2008; Elgin and Reuter 2013).

Manifestation of PEV in the Drosophila eye allows for ease of visible phenotyping and has generated a large body of work leading to the discovery of numerous PEV modifying factors, both genetic and environmental, that also impact chromatin and genome accessibility. These modifiers are typically described as positively influencing gene expression at the wm4 locus, termed suppressors of variegation, or Su(var), or negatively influencing gene expression at wm4, termed enhancers of variegation, or E(var). Screens for modifiers have identified over 150 loci that modify PEV (Ebert et al. 2004). Examples of modifiers of PEV and chromatin include Su(var)3-9, Su(var)2-5 and piwi. Su(var)3-9, Su(var)2-5, or HP1a, and maternal Piwi help to establish heterochromatin (Schotta et al. 2002; Gu and Elgin 2013). HP1a interacts with Su(var)3-9, and is also required for the spreading of heterochromatin (Hines et al. 2009). It is important to understand, however, that the vast majority of these modifiers have been identified through mutation screens, and may or may not represent loci harboring natural variation that impact the PEV phenotype. Despite the extensive literature on modifiers of PEV, there is currently little information on the range of PEV variation within a natural population—an important first step to assessing any determinant of PEV in a population.

Genome-wide association studies (GWAS) initially gained favor in human studies as a way to perform a relatively unbiased search for common natural variants involved in a phenotype of interest. Using sets of inbred reference lines of D. melanogaster, GWAS has proven to be a powerful resource for generating unbiased, data-driven hypotheses, where a genome-wide search of candidate loci may be coupled with the extensive functional annotation and genetic tools already available. The Drosophila Genetic Reference Panel (DGRP)—a collection of genome-sequenced inbred lines of D. melanogaster (Mackay et al. 2012; Huang et al. 2014)—has become a handy resource for initial GWAS screens with Drosophila. Several groups have already successfully used the resource to identify novel variants involved in a wide range of traits, including sleep (Harbison et al. 2013), leg development (Grubbs et al. 2013), sperm competition (Chow et al. 2012), host–microbiota interaction (Dobson et al. 2015), fecundity and fitness (Durham et al. 2014), and nutritional indices (Unckless et al. 2015).

To better understand variation in PEV within a population, and the underlying genetic architecture of segregating natural variants involved in heterochromatin dynamics, we performed GWAS on F1 progeny of DGRP lines crossed to wm4, a line bearing an X-linked inversion that displays PEV of the white eye phenotype. PEV was quantified by novel digital image analysis of visible images captured with a dissecting microscope. We found extensive variation in PEV, our proxy for chromatin state, in the DGRP population. Despite this detailed work, we find little evidence of association for segregating variants within known Su(var) and E(var) genes. However, we do find variants having association with PEV to be over-represented in regions having a chromatin state indicative of active promoter and transcription start site (TSS)-proximal features. Furthermore, a comprehensive search across binding sites for factors that modify chromatin accessibility links numerous binding sites to PEV-associated variants, emphasizing regions with bistable chromatin states. Altogether, the evidence suggests that autosomal dominant natural variation interacts with PEV through numerous, small-effect loci that are enriched for transcription factor (TF) binding and sites of open chromatin, implying influence through gene expression or subtle changes to chromatin balance.

Materials and Methods

Drosophila stocks

Lines from the DGRP (Mackay et al. 2012) were a gift from the Mackay laboratory. Line 1712 (Bloomington), which harbors the whitemottled-4, In(1)wm4 (or simply wm4) locus on the X chromosome and a second chromosome deletion and balancer, Df(2L)2802/CyO, was used to assess variegation across the DGRP population. Canton-S and mutant eye color stocks, 245 (bw1) and 3605 (w1118) (Bloomington) provided biological context in our eye color phenotype assay. All flies were maintained on a standard cornmeal-molasses-sucrose-yeast medium and kept at 25° on a 12-hr light/dark cycle.

Experimental cross

In each of two replicate vials, 10 males from each of 124 DGRP lines were crossed to three virgin females of the 1712 stock and allowed to mate for 1 day. Mated 1712 females were then transferred to new food, allowed to lay eggs, and removed from the vial after 5 days. Variegating male progeny segregated into two phenotypic classes based on the second chromosome, curly winged (CyO), and noncurly winged (Df(2L)2802), and were aged at least 4–8 days before imaging.

Image capture and eye color quantification

Pigments in the eye of D. melanogaster are synthesized by two, well-characterized metabolic pathways (Summers et al. 1982). These pigments are typically quantified through separate extractions based on chemical properties (Ephrussi and Herold 1944). Often, only a single extracted pigment is used to describe eye color, and, although useful for detecting general differences, this method results in considerable loss of information. Even casual inspection of eye color patterns that manifest PEV reveals a far more complex range of differences, including pigment intensity, different hues of pigmentation including yellow, orange, brown, and red, and variation in patch size and morphology. To better capture the multidimensional aspects of eye color and PEV, and improve mapping, we developed an imaging method that retains and fully describes eye color within a single assay.

The left or right eye was randomly selected from each adult male and imaged using an Olympus SMZ-10 dissecting microscope with an attached Canon Rebel 6 megapixel digital camera in a windowless room. Images were captured and stored using software from the camera manufacturer. After removing outliers and lines with fewer than three images, an average of 17.5 eyes (SD = 9.2) were imaged from each line, resulting in a total of 3966 images across all lines and second chromosome combinations. A standard gray card (Kodak, 18% gray) was imaged before and after each set of conditions to normalize against fluctuating light conditions. Ommatidia from images were isolated using a pipeline built in Cell Profiler 2.0 (Lamprecht et al. 2007) and visually inspected. Images that failed to process were individually assessed and isolated in Photoshop. Isolated ommatidia and standard gray card images were then processed using custom scripts in R (R Development Core Team 2008). Ommatidia image files were separated into red, green, and blue color channels and values were bounded between 0 and 1. Color channel values from each pixel of every image were normalized against the mean value of individual color channels from matched gray card pairs according to a generalized gamma adjustment using the formula,

$$V^{\gamma} = V_{\mathrm{out}}, \quad \text{where } \gamma = \frac{\log 0.82}{\log \bar{s}} \tag{1}$$

where s̄ is the individual mean color channel (red, green, or blue) for the gray card imaged before and after samples; 0.82 refers to the idealized 18% gray card value in RGB color space. V is the color channel value for an individual pixel within the image, γ is the normalizing exponent, and Vout is the normalized color value. The final summarized output for each eye and image consisted of three values: the means of the normalized red, green, and blue color channels.
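The gray-card adjustment in Equation 1 can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' R scripts, which are not shown here; the function names, the pixel values, and the gray-card readings are hypothetical, and averaging the before and after gray-card values into a single mean is an assumption about how the matched pair is combined.

```python
import math

GRAY_TARGET = 0.82  # gray-card target value quoted in the text


def gamma_exponent(gray_mean, target=GRAY_TARGET):
    """Return gamma such that gray_mean ** gamma == target (Equation 1)."""
    if not 0.0 < gray_mean < 1.0:
        raise ValueError("gray-card mean must lie strictly between 0 and 1")
    return math.log(target) / math.log(gray_mean)


def normalize_channel(pixels, gray_before, gray_after, target=GRAY_TARGET):
    """Apply V_out = V ** gamma to each pixel value of one color channel."""
    # Assumption: the matched gray-card pair is summarized by its mean.
    gray_mean = (gray_before + gray_after) / 2.0
    gamma = gamma_exponent(gray_mean, target)
    return [v ** gamma for v in pixels]


# Hypothetical red-channel values for one eye and its gray-card pair.
red_pixels = [0.10, 0.45, 0.80]
normalized_red = normalize_channel(red_pixels, gray_before=0.74, gray_after=0.76)
mean_red = sum(normalized_red) / len(normalized_red)  # per-image summary value
```

By construction, a pixel equal to the gray-card mean maps to 0.82, so images taken under different lighting are placed on a common scale before the channel means are computed.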

Image assay validation

Eye-color stocks (Figure 1A) were imaged, and Principal Component Analysis (PCA) using the values from the individual red, green, and blue color channels of each image sufficiently described the multivariate data. Nonpigmented (white) and pigmented stocks (bw1, wm4, and Canton-S) exhibited large differences described by principal component 1 (PC1), and pigmented stocks (bw1, wm4, and Canton-S) exhibited differences primarily described across principal component 2 (PC2) (Figure 1B). PC1 accounted for 87.4% (SD = 24.4%) of variance in the data, and PC2 accounted for 12.5% (SD = 9.2%). Component loadings provide detail on how each color channel contributes to the dispersion of the data, where blue and green channels have a similar impact on PC1 (−0.72 and −0.68, respectively), while the red channel has a minor impact (−0.12). This is in contrast to PC2, where the red channel is the primary driver (0.94), and blue and green channels have minor roles (−0.30 and 0.15, respectively). MANOVA using PC1 and PC2 from each individual image (Equation 2) highlights the ability of this approach to discriminate across eye groups of the four stocks (P-value < 2.2 × 10^−16).
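To make the loadings and variance proportions above concrete, the following is a minimal sketch of PCA on per-image channel means, written in plain Python with power iteration instead of the R routines the authors used. The image values and all function names are hypothetical, and the sign of each loading vector is arbitrary, as in any PCA.

```python
import math


def covariance(rows):
    """Sample covariance matrix; one row per image (R, G, B channel means)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
            for a in range(k)]


def matvec(mat, vec):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def leading_eigenpair(mat, iterations=1000):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    start = [1.0 / (j + 1) for j in range(len(mat))]  # arbitrary start vector
    norm = math.sqrt(sum(x * x for x in start))
    vec = [x / norm for x in start]
    for _ in range(iterations):
        nxt = matvec(mat, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(v * w for v, w in zip(vec, matvec(mat, vec)))
    return value, vec


def top_two_components(rows):
    """Loadings and share of total variance for PC1 and PC2."""
    cov = covariance(rows)
    k = len(cov)
    total = sum(cov[j][j] for j in range(k))  # trace equals total variance
    val1, vec1 = leading_eigenpair(cov)
    # Deflation: subtract the PC1 direction, then find the next component.
    deflated = [[cov[a][b] - val1 * vec1[a] * vec1[b] for b in range(k)]
                for a in range(k)]
    val2, vec2 = leading_eigenpair(deflated)
    return (vec1, val1 / total), (vec2, val2 / total)


# Hypothetical per-image (red, green, blue) channel means.
images = [(0.90, 0.90, 0.90), (0.80, 0.50, 0.50), (0.70, 0.30, 0.20),
          (0.60, 0.20, 0.10), (0.75, 0.40, 0.35), (0.85, 0.60, 0.70)]
(pc1_loadings, pc1_share), (pc2_loadings, pc2_share) = top_two_components(images)
```

Each loading vector has unit length and the two vectors are orthogonal, which is also true of the loadings reported in the text. Power iteration is used only to keep the sketch dependency-free; a standard eigendecomposition would be the usual choice.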

Figure 1.


(A) Examples of imaged eyes from various D. melanogaster alleles: (a) w1118, (b) wm4 with the Df(2L)2802 second chromosome, (c) wm4 with the CyO second chromosome, (d) bw1, (e) wm4 with Df(2L)2802/CyO, and (f) Canton-S. (B) Scatter plot of PC1 and PC2 values for mutant and experimental PEV eyes. Black and white points represent individual images and average values of eye colors from stocks: (a) w1118, (d) bw1, (e) wm4 with Df(2L)2802/CyO, (f) Canton-S. Experimental individuals with just the Df(2L)2802 (periwinkle), in general, show greater variegation and eyes that are closer to the w1118 allele (a), an eye that lacks pigmentation. Experimental siblings with the CyO allele (red) show less variegation, or more pigmentation. (C) Boxplot of PEV summarized by line (x-axis) and separated by second chromosome, Df(2L)2802 (periwinkle) vs. CyO (red). PEV is represented by PC1 (y-axis), where lower values indicate less pigmented eyes, and higher values indicate more pigmented eyes. Among-line variance is greater than within-line variance, suggesting natural genetic variation is involved in observed differences in PEV.

Statistical analysis of phenotype

All subsequent analysis was performed in R (R Development Core Team 2008). The mean red, green, and blue color values for each image were used as input for PCA. Eye groups were assessed using MANOVA and the formula,

$$Y_{\mathrm{PC1,PC2,PC3}} = \mu + S + e \tag{2}$$

where Y is PC1, PC2, and PC3 from each image, and S is the stock of origin (1712, 245, 3605, or Canton-S). Differences between experimental groups were assessed using ANOVA and fit using separate principal components from each image with the formula,

$$Y = \mu + L + C + V + L \times C + e \tag{3}$$

where Y is either PC1, PC2, or PC3, L is the DGRP line of origin, C is the second chromosome background (CyO or Df(2L)2802), and V is the replicate vial (A or B). Although the experimental design contained fully crossed factors, interaction terms LxV, CxV, and LxCxV were not statistically significant and were dropped from the model. Proportion of variance for each effect is calculated as,

$$\eta^{2} = \frac{SS_{\mathrm{effect}}}{SS_{\mathrm{total}}} \tag{4}$$

where SS is the sum of squares. Broad-sense heritability (H2) was calculated using

$$H^{2} = \frac{V_{g}}{V_{p}} \tag{5}$$

and based on the linear model,

$$Y = \mu + L + e, \tag{6}$$

where Y is a single principal component and single second chromosome background combination, and L is the DGRP line of origin. The proportion of variance explained by line differences was considered the variance attributed to genetic components (Vg), and the total variance of the sample was considered the variance attributed to phenotype (Vp). For each second chromosome background, H2 was summed across the three PCs and weighted according to the proportion of variance explained by each PC (Supplemental Material, Table S1).
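As a worked illustration of Equations 4 to 6, the sketch below computes the proportion of variance explained by line in a one-way layout and then combines per-PC values with variance-share weights. It is written in Python with hypothetical scores and function names; reading "summed across the three PCs and weighted" as a weighted average is an assumption about the authors' calculation.

```python
def eta_squared_by_line(values_by_line):
    """SS_line / SS_total for the one-way model Y = mu + L + e."""
    all_values = [v for vals in values_by_line.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_line = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
                  for vals in values_by_line.values())
    return ss_line / ss_total


def weighted_h2(h2_per_pc, variance_share_per_pc):
    """Weighted average of per-PC H2 values (assumed weighting scheme)."""
    total_share = sum(variance_share_per_pc)
    weighted = sum(h * w for h, w in zip(h2_per_pc, variance_share_per_pc))
    return weighted / total_share


# Hypothetical PC1 scores for three lines in one second chromosome background.
pc1_by_line = {"line_A": [0.0, 2.0], "line_B": [2.0, 4.0], "line_C": [1.0, 3.0]}
h2_pc1 = eta_squared_by_line(pc1_by_line)  # SS_line = 4, SS_total = 10

# Hypothetical per-PC H2 values combined with illustrative variance shares.
h2_combined = weighted_h2([0.6, 0.2], [3.0, 1.0])
```

In this toy data set the line means are 1, 3, and 2 around a grand mean of 2, so four of the ten units of total sum of squares are attributable to line.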

Genotypes and association testing

Genotypes and annotation for DGRP lines were downloaded from the website, dgrp.gnets.ncsu.edu, and all variants and findings are reported using build BDGPR5/dm3. As others have observed, the DGRP lines display small amounts of cryptic genetic relatedness (He et al. 2014; Huang et al. 2014). Here, we used GEMMA (Zhou and Stephens 2012) to both estimate a centered genetic relatedness matrix (GRM), accounting for cryptic relatedness, and implement the univariate mixed linear model (MLM). Individual genetic variants were treated as a fixed effect, and the GRM was included as a random effect. Each of the two experimental populations [CyO and Df(2L)2802 second chromosome backgrounds] was used as separate input, providing two independent sources of association values. Response variables, PC1 line means from males only, were regressed against each variant using single marker association (SMA). GRMs were estimated separately for each background based on lines with observed phenotypes. When selecting source variants for estimating GRMs, PLINK v1.07 (Purcell et al. 2007) was used to prune SNPs for Linkage Disequilibrium (indep function with parameters 50, 10, and 2), and to remove X chromosome SNPs. Effect sizes (Cohen’s d) and 95% upper and lower confidence limits were based on SMA P-values and respective allele sample sizes, and determined using the function, pes, within the R package compute.es (Del Re 2013). In all, 107 lines were used with a CyO second chromosome background and 109 lines were used with a Df(2L)2802 second chromosome background. Testing was performed across 775,689 (CyO) and 928,587 (Df(2L)2802) biallelic variants (SNPs and indels) with a MAF of 0.05 or greater. Due to the high correlation between phenotype and top associations of the two experimental populations, subsequent analysis was performed using only the Df(2L)2802 GWA data. Bootstrap analysis was used to generate expected site class frequencies. 
Site classes were counted from 1000 randomly selected common variants, and an expected distribution was achieved through 10,000 iterations of resampling. For variants with more than one annotated class, only one class was selected, and priority was ranked as follows: nonsynonymous > ncRNA > synonymous > UTR > intronic > intergenic. The prop.test in R was used to compare the ratio of nonsynonymous to synonymous SNP counts within exonic variants between observed counts from top GWA SNPs and mean counts from randomly selected SNPs of the above bootstrap analysis. P-value enrichment was assessed for the Df(2L)2802 ∼200 kb deletion. Enrichment was quantified as the proportion of variants within the region having P-values <0.1. Expected distributions for each chromosome arm were generated by randomly sampling linked segments of identical length 1000 times and similarly calculating P-value enrichment.
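The site-class bootstrap described above can be sketched as follows. This is an illustrative Python version with a hypothetical variant pool and hypothetical function names, not the authors' code; drawing without replacement within each iteration is an assumption, and the sample size and iteration count are reduced from the 1000 variants and 10,000 iterations stated in the text so that the example runs quickly.

```python
import random
from collections import Counter

# Priority order given in the text, highest priority first.
PRIORITY = ["nonsynonymous", "ncRNA", "synonymous", "UTR", "intronic", "intergenic"]


def pick_class(annotations):
    """Return the single highest-priority class among a variant's annotations."""
    for site_class in PRIORITY:
        if site_class in annotations:
            return site_class
    raise ValueError("no recognized site class")


def expected_class_counts(variant_classes, sample_size, iterations, seed=0):
    """Mean count per site class over repeated random draws of variants."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(iterations):
        # Assumption: variants are drawn without replacement within a draw.
        totals.update(rng.sample(variant_classes, sample_size))
    return {c: totals[c] / iterations for c in PRIORITY}


# Hypothetical pool of annotated common variants (one set of classes each).
pool = [pick_class(a) for a in (
    [{"intronic"}] * 50 + [{"intergenic"}] * 30 +
    [{"synonymous", "intronic"}] * 12 + [{"nonsynonymous", "UTR"}] * 8)]
expected = expected_class_counts(pool, sample_size=20, iterations=200)
```

Observed class counts among the top-ranked associations would then be compared against these expected counts, class by class.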

Candidate gene analysis

A total of 105 candidate genes was selected based on known involvement in variegation. Genes were identified in FlyBase using the term “Modifier of Variegation.” The location of each gene was extracted from FlyBase, and SNPs within the gene and ±2 kb of the gene were examined using the above MLM for each SNP having a MAF of 0.05 or greater. Bootstrap analysis was used to generate an expected Site Frequency Spectrum (SFS) for comparison against the SFS of the 105 candidate genes. First, 11,273 genes with known locations on chromosomes 2, 3, and 4 were downloaded from FlyBase. If genes contained multiple transcripts, only one transcript was randomly selected. Next, 105 genes were randomly sampled from the full gene set and proportions of SNP allele frequencies were binned into MAF groups of 0.05. All SNPs sampled were from the experimental population having the Df(2L)2802 second chromosome background. The process of selecting 105 genes, at random, was repeated 10,000 times using sampling with replacement. The observed SFS was generated by querying the 105 candidate genes with known involvement in variegation for all SNP allele frequencies. Observed proportions were compared to the mean and SD for each expected MAF grouping. Similarly, bootstrap analysis was used to generate an expected distribution of counts of segregating sites within sets of genes. The total number of segregating sites (within ±2 kb) was counted within the sets of 105 randomly selected genes throughout the autosomal genome. Counts were normalized by the total number of base pairs summed across all 105 randomly selected genes. This process was again repeated 10,000 times using sampling with replacement to achieve an expected distribution. The observed proportion of segregating sites in the 105 candidate genes was then compared to the expected distribution of proportions.
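The binning step behind the SFS comparison can be illustrated with a short Python sketch. The allele frequencies and function names are hypothetical, and treating each bin as closed on the left and open on the right (with 0.5 placed in the last bin) is an assumption, since the text does not state the bin convention.

```python
from collections import Counter

BIN_WIDTH = 0.05
N_BINS = 10  # minor allele frequency ranges from 0 to 0.5


def maf_bin(maf):
    """Index of the 0.05-wide bin containing a minor allele frequency."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    # The small offset guards against floating-point error at bin edges.
    return min(int(maf / BIN_WIDTH + 1e-9), N_BINS - 1)


def sfs_proportions(mafs):
    """Proportion of variants falling in each MAF bin."""
    counts = Counter(maf_bin(m) for m in mafs)
    return [counts[i] / len(mafs) for i in range(N_BINS)]


# Hypothetical minor allele frequencies for SNPs in one gene set.
example_mafs = [0.01, 0.02, 0.04, 0.07, 0.15, 0.22, 0.31, 0.50]
observed_sfs = sfs_proportions(example_mafs)
```

Applying the same function to each randomly drawn gene set gives the per-bin mean and SD against which the candidate-gene proportions are compared.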

GCTA v1.25.2 (Yang et al. 2011) was iteratively called using phenotypes and GRMs generated from lines containing the Df(2L)2802 background chromosome. The options --reml and --reml-alg 2 were used to estimate statistical explanation of among-line phenotypic variance attributed to top rank and random SNPs. Attempts to set --grm-cutoff resulted in an imperfect comparison between the three SNP groups (top hits, PEV modifiers only, and random sets) as, across SNP counts, different lines were dropped given the different GRMs. Therefore, to maintain an equal comparison between all groups, --grm-cutoff was not set. SE were reported for top hits and top hits within PEV modifying genes; however, this information was not reported for permuted sets of random SNPs. Instead, 100 iterations of random permutations were used to generate lists of SNPs for further generating GRMs. The mean of expected attributed variance was reported for the randomly permuted lists, along with a SD of means.

Genomic analysis

The nine-state genome-wide combinatorial chromatin state annotation is described in Kharchenko et al. (2011), and annotation files were sourced from www.modencode.org. Expected nine-state distributions were generated in the experimental population through sampling autosomal variants for state assignment. 1000 variants were randomly selected and chromatin states were counted. This process was repeated 100 times. ChIP-chip and ChIP-seq files were also downloaded from ModENCODE (www.modencode.org). If replicate samples existed, only one file was randomly selected for analysis and composite files, if they were made available, were used instead of individual samples. Comparison between the expected distribution of variants within binding sites and variants enriched with associations to PEV was performed as described above; 1000 autosomal variants were selected at random, and variants within binding sites were counted. The distribution of counts, as generated through 10,000 iterations, was then compared to observed counts from 1000 of the top P-value ranked associating variants. A full list of factors with respective ModENCODE IDs and observed and expected counts has been made available (File S3).
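The comparison of observed and expected binding-site counts can be sketched as an interval lookup followed by resampling. The Python example below is a simplified illustration restricted to a single chromosome arm; the coordinates, function names, and the one-sided empirical fraction are hypothetical choices, not a description of the authors' implementation.

```python
import bisect
import random


def in_any_interval(position, starts, ends):
    """True if position lies in one of the sorted, non-overlapping intervals."""
    i = bisect.bisect_right(starts, position) - 1
    return i >= 0 and position <= ends[i]


def count_hits(positions, starts, ends):
    """Number of variant positions that fall inside any binding site."""
    return sum(in_any_interval(p, starts, ends) for p in positions)


def empirical_enrichment(top_positions, all_positions, starts, ends,
                         iterations, seed=0):
    """Observed hit count and fraction of random draws matching or exceeding it."""
    rng = random.Random(seed)
    observed = count_hits(top_positions, starts, ends)
    as_extreme = 0
    for _ in range(iterations):
        draw = rng.sample(all_positions, len(top_positions))
        if count_hits(draw, starts, ends) >= observed:
            as_extreme += 1
    return observed, as_extreme / iterations


# Hypothetical binding sites (closed intervals) and variant positions.
site_starts, site_ends = [100, 500, 900], [150, 560, 950]
background = list(range(0, 1000, 7))
top_hits = [105, 120, 510, 555, 910, 300]
observed, p_empirical = empirical_enrichment(
    top_hits, background, site_starts, site_ends, iterations=500)
```

A genome-wide version would key the intervals by chromosome arm and draw 1000 variants per iteration, as described in the text.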

Data availability

Original eye images are available upon request. File S1 contains PEV phenotype values. File S2 contains a full list of variants with respective association P-values. File S3 contains observed and expected counts of top PEV associations within annotated chromatin features.

Results

Natural variation in background genetic effects on wm4 expression

To quantify natural variation in PEV, we made use of the DGRP. The PEV phenotype was expressed by crossing males from inbred DGRP lines to virgin females carrying the wm4 allele on the X chromosome. F1 variegating males were identical across a single X, third, and fourth chromosome, segregating according to one of two second chromosomes, and varying with respect to a full haplotype from each of the DGRP lines assayed (Figure S1). The two second chromosomes differ primarily with respect to a ∼200 kb deletion in 25F2–25F5 on the nonbalancing chromosome, Df(2L)2802, and inversions on the balancing chromosome, CyO. Experimental F1 progeny exhibited a wide range of eye pigmentation differences, showing variation that spanned from a complete lack of pigmentation to eyes that were heavily pigmented (Figure S2). The quantitative image assay further detailed a broad phenotypic spread in PEV, with PC1 and PC2 values spanning between an eye mutant that lacks pigmentation (white) and mutants of known pigment deficiencies (Figure 1B). Reapplying PCA to just the mean red, green, and blue color values of images from each of the F1 variegating males indicates that PC1 captures the vast majority of variance in the experimental data, 97.4% (SD = 18.6%), while PC2 and PC3 describe only a small proportion of variance, 2.4% (SD = 2.9%) and 0.2% (SD = 0.9%). Using PCA on color images of PEV individuals effectively allows the simplification of a multivariate data source to a single describing variable, PC1, with minimal (2.6%) loss of information, and provides a robust univariate phenotype for association mapping.

ANOVA using individual PCs provides an assessment of importance ascribed to each of three experimental variables; among-line differences, second chromosome differences, and replicate environments (Equation 3 and Table 1). Combined genetic components explain 80.5% of the phenotypic variance attributed to PC1, where second chromosome differences separately explained 50.9% of the variance, among-line variation (our source of natural variation) explained 27.7% of the variance, and 1.9% of the variance was explained through genetic interactions of second chromosome background and individual lines. Partitioning the sample by presence/absence of the second chromosome deficiency provides two separate measures of broad-sense heritability (H2) of PEV within the DGRP population. Among-line differences explained over half of the phenotypic variance, 59.4 and 57.4%, for the DGRP populations within CyO and Df(2L)2802 second chromosome backgrounds (Table S1). These data suggest that a large portion of the observed variation in PEV, within respective second chromosome backgrounds, is attributed to segregating genetic variants among the naturally derived DGRP haplotypes.

Table 1. Partitioning the variance in PEV attributed to genetic and environmental factors.

| Variable | PC1: proportion of variance (%) | PC1: P-value | PC2: proportion of variance (%) | PC2: P-value |
| --- | --- | --- | --- | --- |
| Among line | 27.7 | <1.0 × 10^−22 | 15.3 | <1.0 × 10^−22 |
| Second chromosome | 50.9 | <1.0 × 10^−22 | <0.1 | 0.88 |
| Vial | <0.1 | 0.33 | <0.33 | 1.6 × 10^−6 |
| Among line × second chromosome | 1.9 | <1.0 × 10^−22 | 31.2 | <1.0 × 10^−22 |
| Within line and residuals | 19.5 | | 53.2 | |

When comparing second chromosome backgrounds, lines showed strong positive correlation in PEV between PC1 values (Pearson correlation of 0.84), where the Df(2L)2802 second chromosome acts as a clear E(var) with respect to the CyO second chromosome balancer, showing greater variegation, or less pigmented eyes, in nearly all lines (Figure S3). Although not the focus of this study, it is important to recognize that individuals differing only by second chromosome accounted for roughly half (50.9%; Table 1) of the phenotypic variance observed, considerably more than explained through natural variation. The consequences of this highlight two potential scenarios. First is the possibility of an unannotated mutation in either a Su(var) or E(var) between the second chromosomes. A second possibility is that the totality of the deletion on Df(2L)2802, a deletion of ∼200 kb, acts to enhance variegation through a sponge or sink model, similar to what is proposed to occur with the Y chromosome (Francisco and Lemos 2014); we cannot distinguish between these possibilities.

Genome-wide association testing

To assess the contribution of segregating variants to PEV, SMA was performed genome-wide using variegating F1 males from crosses between DGRP lines and the wm4 reporter. Full haplotypes from each of the distinct DGRP lines provided source variation for association mapping. PC1 values from each of the two second chromosome populations were used as separate input into a univariate MLM accounting for cryptic relatedness (He et al. 2014). Testing was performed across common biallelic variants (SNPs and indels, MAF ≥ 0.05) using haplotypes extracted from DGRP chromosomes 2, 3, and 4. The X chromosome, carrying the wm4 reporter, was invariant across experimental populations. Quantile-quantile (Q-Q) plots indicate that P-values from each experimental population conform well overall to the null distribution (Figure S4). Effect sizes follow a trend where Cohen’s d increases as MAF decreases (Figure S5). A comparison of the P-value rank ordered 1000 top associations shows 75.3% overlap between the two experimental populations, consistent with a strong correlation in PEV between the two groups.
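The overlap statistic quoted above is a comparison of two ranked lists. A minimal Python sketch, using hypothetical variant identifiers and P-values and a hypothetical function name, might look like this:

```python
def top_overlap(pvalues_a, pvalues_b, n_top):
    """Fraction of the n_top smallest-P variants shared by two association runs."""
    top_a = set(sorted(pvalues_a, key=pvalues_a.get)[:n_top])
    top_b = set(sorted(pvalues_b, key=pvalues_b.get)[:n_top])
    return len(top_a & top_b) / n_top


# Hypothetical variant IDs and P-values from the two second chromosome backgrounds.
cyo = {"v1": 1e-6, "v2": 1e-5, "v3": 0.02, "v4": 0.30, "v5": 0.75}
df2802 = {"v1": 2e-6, "v2": 0.40, "v3": 1e-4, "v4": 0.01, "v5": 0.90}
shared = top_overlap(cyo, df2802, n_top=3)  # v1 and v3 are shared
```

With the study's data, `n_top` would be 1000 and the two dictionaries would hold the genome-wide P-values from the CyO and Df(2L)2802 analyses.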

The top-ranked associations are enriched for variants that are located in exons, UTRs, and ncRNAs, with reduced representation within intronic and intergenic regions (Figure S6). From this set of 1000 top associations, a comparison of synonymous to nonsynonymous SNPs within exonic sites shows no significant difference between ratios of observed counts when compared to randomly drawn sets of SNPs (proportion test, P-value = 1). Among the smallest P-values, only two variants were identified as resulting in a missense mutation, and both showed reduced strength in association across independent GWAS samples (Table S2). Although we note general enrichment of top-ranked associations in exons, we see no data to indicate a strong bias toward sites that would result in change to protein function. We also note no P-value enrichment within the Df(2L)2802 second chromosome deficiency region (Figure S7).

Despite many variants identified as having an effect size ≥1, and with enrichment proximal to functional regions, our small sample sizes, substantial background effect on phenotype, as noted through second chromosome differences, and abundance of multiple variant classes in top associations reduce confidence in traditional functional follow-up. A full list of variants with association P-values has been made available (File S2).

Dominant, common variants within known autosomal PEV modifiers fail to fully account for among-line differences

Despite low power to identify individual causal variants with high confidence, an extensive literature on known genic modifiers of PEV affords the opportunity to assess significance of ensembles of variants. To quantify the impact of naturally occurring dominant polymorphism in known modifiers of PEV, we identified variants in Su(var) and E(var) genes. The term “Modifier of Variegation” was searched within FlyBase, and over 200 genes satisfied this criterion. As our experimental setup resulted in individuals sharing a common X chromosome, the set of modifiers was reduced to 105 autosomal genes (Table S3). Variants within, or extending ±2 kb of, identified autosomal modifiers were grouped and results from the above SMA were used. A total of 16,640 variants (6153 having MAF ≥ 0.05) was identified in autosomal modifiers of PEV within the 109 lines having the Df(2L)2802 second chromosome background. Importantly, of common variants (MAF ≥ 0.05) in this reduced set of PEV genes, only five were also identified in the top 1000 genome-wide SMA hits, making up 0.5% of top associated variants. These five variants hold overall P-value ranks of 379, 553, 772, 820, and 821, indicating that a majority of natural variants with likely impact on phenotypic variance of PEV do not reside in, or near, genes with known impact on PEV. GCTA (Yang et al. 2011) was used to compare the cumulative statistical explanation of the among-line phenotypic variance between classes of variants as grouped by top GWA variants and top variants within PEV modifiers. Top variants identified through GWA, rank-ordered by P-value, consistently explained a greater proportion of among-line variance than variants within prior known genic PEV modifiers (Figure 2 and Figure S8). The 1000 top ranking overall variants statistically explained 93.7% (SE = 24.8%) of variance due to genetic differences, and the 1000 top ranking variants in known PEV genes explained considerably less at 56.0% (SE = 22.2%). 
These sets are both compared to variants randomly drawn from the autosomal genome which explained, on average, 2.3% (SD = 2.7%) of phenotypic variance.

Figure 2.

Proportion of among-line phenotypic variance explained within the Df(2L)2802 second chromosome population using GCTA. Comparisons between SNP groupings include the most significant GWA variants (black line), variants within known PEV modifiers only (blue line), and randomly selected autosomal variants (gray line). Shading represents SE.

Although there is strong evidence for involvement of genes from Table S3 in PEV, there are few data indicating that natural variation within these genes is responsible for differences in PEV among lines. This is not completely surprising, however, as many known Su(var) and E(var) genes show conservation across species (Fodor et al. 2010), suggesting little room for variation in coding sequence. Indeed, two additional pieces of data further explain the lack of association with variants in known PEV modifiers, and suggest purifying selection within the modifiers. First, the observed SFS within the experimental population shows an increase in low MAF variants, and a decrease in variants with MAF ≥ 0.05, within known PEV genes compared to sets of genes randomly selected from the autosomal genome (Figure 3A). Second, known PEV modifying genes exhibit a paucity of segregating sites compared to an expected distribution, having fewer segregating sites than 99.7% of gene sets randomly selected from autosomes (Figure 3B).
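The comparison against randomly selected gene sets can be illustrated with a simple resampling sketch. The code below is an assumption-laden illustration rather than the procedure used in the study: the function name, the per-gene statistic (for example, segregating sites per gene, or an indicator of whether a gene contains any), and the use of the set mean as the summary are all hypothetical choices.

```python
import random


def fraction_of_random_sets_greater(observed, gene_values, set_size,
                                    n_draws=10000, seed=0):
    """Fraction of randomly drawn gene sets whose mean per-gene statistic
    exceeds the value observed for the gene set of interest.

    gene_values: one number per autosomal gene (hypothetical input).
    set_size:    number of genes per random set (105 in the text).
    """
    rng = random.Random(seed)
    greater = 0
    for _ in range(n_draws):
        sample = rng.sample(gene_values, set_size)  # draw without replacement
        if sum(sample) / set_size > observed:
            greater += 1
    return greater / n_draws
```

A returned value near 1 would correspond to the situation described in the text, where nearly all random gene sets carry more segregating sites than the known PEV modifiers.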

Figure 3.

Variants within known autosomal PEV modifiers compared to expected distributions. (A) SFS of variants within known autosomal modifiers of PEV (black), and variants within sets of randomly selected autosomal genes (gray). Error bars reflect SD. (B) Proportion of known autosomal PEV modifiers that contain segregating sites compared to sets of genes drawn randomly 10,000 times. The shaded area highlights 95% of the expected distribution; 99.7% of randomly selected gene sets contain a greater proportion of segregating sites than the 105 known genic PEV modifiers.

General feature enrichment of top associations

As we see little evidence for association within known Su(var)s and E(var)s, we then asked whether other genomic features show an over-representation of association with PEV. Regulatory regions are a logical next set of features to query given that we find no strong evidence linking natural PEV variation to protein coding variation; we hypothesized instead that PEV-associated variants primarily impact regulation of protein quantities and not protein function and quality. We made use of extensive public data from modENCODE to classify genomic regulatory regions. Feature-predictive combinations of specific histone modifications have been shown to strongly correlate with functional elements of the genome (modENCODE Consortium et al. 2010), and are an excellent source of labeled data for quick surveys. We next asked if particular signatures were over- or under-represented in the top P-value ranked associations. Using a previously built combinatorial 9-state (c1–c9) assignment from S2 and BG3 cells (Kharchenko et al. 2011), we queried the genome labeled with discrete, nonoverlapping c1–c9 states for an over-representation of PEV associations. First, we generated an expected proportion of labels using all autosomal variants (Figure S9). We then selected the top 1000 SMA variants, as rank-ordered by P-value, and compared the observed counts to an expected distribution. We found an over-representation of signature c1, consistent across BG3 and S2 cells (>1 SD above expectation), and lowered representation of signatures c4, c6, and c7 (>1 SD below expectation). Signature c1 is described as representing active promoter and TSS-proximal regions.
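One way to compare observed chromatin-state counts with an expected distribution is to resample variant sets of the same size from all autosomal variants. The sketch below assumes that approach; the text does not specify how the expected distribution was built, and the function name, inputs, and the standardized score are illustrative choices only.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def state_enrichment(all_labels, top_labels, n_draws=1000, seed=0):
    """Compare chromatin-state counts among top variants with counts in
    random variant sets of the same size.

    all_labels: state label (e.g. "c1".."c9") for every autosomal variant.
    top_labels: state labels of the top-ranked variants.
    Returns {state: (observed, expected_mean, expected_sd, z)}.
    """
    rng = random.Random(seed)
    states = sorted(set(all_labels))
    observed = Counter(top_labels)
    draws = {s: [] for s in states}
    for _ in range(n_draws):
        counts = Counter(rng.sample(all_labels, len(top_labels)))
        for s in states:
            draws[s].append(counts[s])
    result = {}
    for s in states:
        mu, sd = mean(draws[s]), pstdev(draws[s])
        # z is the deviation from expectation in units of SD (0 if SD is 0).
        z = (observed[s] - mu) / sd if sd > 0 else 0.0
        result[s] = (observed[s], mu, sd, z)
    return result
```

Under this sketch, a state with `z` above 1 would be reported as over-represented by more than 1 SD, and a state with `z` below -1 as under-represented.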

We next hypothesized that enrichment of top P-value ranked associations at active promoter and TSS regions may be the result of a small number of factors driving a signal, i.e., altered binding of a single TF at multiple regions in the genome is the causative force driving natural differences in PEV. To test for enrichment of top P-value ranked associations within binding sites of individual chromatin-binding factors, the analysis was extended to survey ChIP-chip and ChIP-seq data, also made available through the modENCODE project (modENCODE Consortium et al. 2010; Kharchenko et al. 2011). This resource comprises hundreds of experiments, sampled across various developmental states and cell types, and contains observed binding sites of factors such as histone modifications, TFs, and non-TFs. Again, for each factor, we searched for enrichment of top rank-ordered SMA variants within the observed binding sites (Figure S10). Instead of finding one or two factors having variants with enriched P-values located within binding sites, we note that several factors displayed a strong enrichment with PEV-associated variants (Figure 4). We observed enrichment within chromatin binding sites of known PEV modifiers, such as JIL-1 (Lerach et al. 2006), LSD1 (Di Stefano et al. 2007), and BEAF-32 (Gilbert et al. 2006), among others. Further, we note enrichment of top ranked associations in sites that suggest natural variation has a particular impact on TSS regions that show a “balanced” or “bistable” chromatin state. Bistable chromatin sites are sites that may be influenced to exhibit either active or repressed gene expression. We note statistically significant differences in nearly all bound factors that strongly characterize bistable sites: ASH1, H3K4me1, H3K4me2, H3K4me3, and RNA Pol II, including depletion of H3K27me3 (Kharchenko et al. 2011). Although purely statistical, the over-representation of associations in sites bound by known PEV modifiers further suggests that natural autosomal genetic variation primarily modifies PEV through influencing genome-wide expression rates or chromatin state occupancy and balance, not through altering protein function of individual genic modifiers.

Figure 4.

Observed enrichment of variants with association to PEV and chromatin features from S2 cells only. Black bars represent 95% of the expected distribution and gray bars represent the left and right 2.5% tails of the distribution. Numeric values are the average number of variants expected to fall within each chromatin feature. Observed counts from variants within the 1000 top P-value ranked associations are represented in red. Only features with observed values in the 5% tails of the expected distribution are shown. False discovery rate was used to assess statistical significance across the multiple tests, and significant values are noted.
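The tail-based test and multiple-testing correction summarized in this caption can be sketched as follows. The caption does not name the false discovery rate procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the add-one form of the empirical P-value.

```python
def two_sided_empirical_p(observed, null_counts):
    """Two-sided empirical P-value of an observed count against counts
    from random variant sets (add-one correction avoids P = 0)."""
    n = len(null_counts)
    lower = sum(c <= observed for c in null_counts)
    upper = sum(c >= observed for c in null_counts)
    return min(1.0, 2 * (min(lower, upper) + 1) / (n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each chromatin feature would contribute one empirical P-value, and features whose adjusted value falls below a chosen threshold would be the ones marked as significant.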

Discussion

We designed an assay to identify autosomal nonrecessive variants involved in differences in PEV between naturally derived lines of D. melanogaster. We identified a wide range of PEV in the population, indicating presence of natural variation in chromatin state and epigenetic features. Despite large PEV-induced pigmentation differences between phenotyped lines, we find little evidence for involvement of polymorphic sites within known Su(var) and E(var) genes that contribute to these differences in PEV. Our top SMA associations further indicate that natural differences in response to PEV are not the primary result of changes in protein function among lines. We instead find that regions of enriched association to PEV are over-represented for promoter and TSS-proximal regions with an additional emphasis on sites that display bistable chromatin features. This suggests that autosomal interactions with differences in PEV, i.e., differences in heterochromatin formation and/or maintenance, are the combined result of many small effect loci that accumulate differences, and are linked to modified rates of transcription, either through small changes to specific TF binding sites or broad changes in chromatin state and chromatin mark distributions.

Furthermore, our data fit the Site Exposure Model of Variegated Silencing (Ahmad and Henikoff 2001), where a variegated state is the result of bistable features between TF binding and chromatin features. Remarkably, these findings have precedent in the PEV system and fit extremely well with prior findings indicating that a key driver behind mosaic features of variegation is a bistable equilibrium between TF binding and heterochromatin content (Ahmad and Henikoff 2001). In that study, it was found that simply varying levels of a GAL4 transcriptional activator at a heterochromatin-embedded promoter could disrupt heterochromatin state. In the proposed model, termed the Site Exposure Model of Variegated Silencing (Widom 1999; Ahmad and Henikoff 2001), the kinetics of DNA-histone contact dictate the ability of a TF to bind a promoter or enhancer feature, and thus influence gene expression. Features that increase contact between TF binding and activator include changes to TF abundance, changes to TF binding efficiency (such as through a mutated underlying binding sequence), changes to nucleosome occupancy (observed through histone and chromatin marks), or changes to abundance of TF guide molecules. Applied to our study, this suggests that individuals differ in PEV because of a large number of sites that impact expression rates of factors that, in turn, impact binding efficiency of TFs at the wm4 locus. This model also predicts that changes to chromatin content, i.e., an increase or decrease in heterochromatin, can, in turn, impact sensitive loci throughout the genome and influence gene expression. Indeed, this fits with observations that differing natural Y chromosomes, a large source of heterochromatin, impact gene expression in autosomes (Lemos et al. 2008, 2010).

However, as the effect sizes for all associated variants are small, the combined set of variants likely does not fully explain differences in PEV across our sample set, indicating that there is as yet unobserved genetic variation that accounts for differences among lines. It is important to note that this query of natural variation was not exhaustive and only considered autosomal dominant variants with a particular focus on common variants. We reasoned that a heterozygous screen would be informative, because most Su(var) and E(var) allelic effects are dominant, but we note there is every reason to believe that recessive genetic modifiers of PEV also exist and were missed in our screen. Importantly, we did not query G × G interactions or Y-linked variants, known contributors to PEV and gene expression differences in natural populations (Lemos et al. 2008, 2010). Finally, it is important to consider that the autosomal loci identified here only show correlation with differences in PEV; it is not known at this time if these loci represent causal drivers of differences in PEV.

Supplementary Material

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.117.300306/-/DC1.

Acknowledgments

We thank Dan Barbash and Jason Mezey for thoughtful critique and comments throughout all stages of the work, and Sally Elgin for important personal communications. Several members of the Clark laboratory provided support through general discussion and expertise, including Jen Grenier, Clement Chow, Rob Unckless, Julien Ayroles, Roman Arguello, Margarida Cardoso Moreira, Tim Connallon, Angela Early, and Grace Chi. Finally, we thank the reviewers for their time and effort, detailed comments, and helpful critique. This work was supported by R01 GM119125.

Footnotes

Communicating editor: S. Chenoweth

Literature Cited

  1. Ahmad K., Henikoff S., 2001. Modulation of a transcription factor counteracts heterochromatic gene silencing in Drosophila. Cell 104: 839–847.
  2. Bossdorf O., Richards C. L., Pigliucci M., 2008. Epigenetics for ecologists. Ecol. Lett. 11: 106–115.
  3. Chow C. Y., Wolfner M. F., Clark A. G., 2012. A large neurological component to genetic differences underlying biased sperm use in Drosophila. Genetics 193: 177–185.
  4. Del Re A. C., 2013. compute.es: compute effect sizes. R Package Version 0.2-2. Available at: http://cran.r-project.org/web/packages/compute.es. Accessed: September 1, 2015.
  5. Di Stefano L., Ji J.-Y., Moon N.-S., Herr A., Dyson N., 2007. Mutation of Drosophila Lsd1 disrupts H3–K4 methylation, resulting in tissue-specific defects during development. Curr. Biol. 17: 808–812.
  6. Dobson A. J., Chaston J. M., Newell P. D., Donahue L., Hermann S. L., et al., 2015. Host genetic determinants of microbiota-dependent nutrition revealed by genome-wide analysis of Drosophila melanogaster. Nat. Commun. 6: 6312.
  7. Durham M. F., Magwire M. M., Stone E. A., Leips J., 2014. Genome-wide analysis in Drosophila reveals age-specific effects of SNPs on fitness traits. Nat. Commun. 5: 4338.
  8. Ebert A., Schotta G., Lein S., Kubicek S., Krauss V., et al., 2004. Su(var) genes regulate the balance between euchromatin and heterochromatin in Drosophila. Genes Dev. 18: 2973–2983.
  9. Elgin S. C., Reuter G., 2013. Position-effect variegation, heterochromatin formation, and gene silencing in Drosophila. Cold Spring Harb. Perspect. Biol. 5: a017780.
  10. Ephrussi B., Herold J. L., 1944. Studies of eye pigments of Drosophila. I. Methods of extraction and quantitative estimation of the pigment components. Genetics 29: 148.
  11. Ernst J., Kellis M., 2012. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9: 215–216.
  12. Filion G. J., van Bemmel J. G., Braunschweig U., Talhout W., Kind J., et al., 2016. Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143: 212–224.
  13. Fodor B. D., Shukeir N., Reuter G., Jenuwein T., 2010. Mammalian Su(var) genes in chromatin control. Annu. Rev. Cell Dev. Biol. 26: 471–501.
  14. Francisco F. O., Lemos B., 2014. How do Y-chromosomes modulate genome-wide epigenetic states: genome folding, chromatin sinks, and gene expression. J Genomics 1: 94–103.
  15. Gilbert M. K., Tan Y. Y., Hart C. M., 2006. The Drosophila boundary element-associated factors BEAF-32A and BEAF-32B affect chromatin structure. Genetics 173: 1365–1375.
  16. Girton J. R., Johansen K. M., 2008. Chromatin structure and the regulation of gene expression: the lessons of PEV in Drosophila. Adv. Genet. 61: 1–43.
  17. Grubbs N., Leach M., Su X., Petrisko T., Rosario J. B., et al., 2013. New components of Drosophila leg development identified through genome wide association studies. PLoS One 8: e60261.
  18. Gu T., Elgin S. C. R., 2013. Maternal depletion of Piwi, a component of the RNAi system, impacts heterochromatin formation in Drosophila. PLoS Genet. 9: e1003780.
  19. Harbison S. T., McCoy L. J., Mackay T. F., 2013. Genome-wide association study of sleep in Drosophila melanogaster. BMC Genomics 14: 281.
  20. He B. Z., Ludwig M. Z., Dickerson D. A., Barse L., Arun B., et al., 2014. Effect of genetic variation in a Drosophila model of diabetes-associated misfolded human proinsulin. Genetics 196: 557–567.
  21. Hines K. A., Cryderman D. E., Flannery K. M., Yang H., Vitalini M. W., et al., 2009. Domains of heterochromatin protein 1 required for Drosophila melanogaster heterochromatin spreading. Genetics 182: 967–977.
  22. Hu J., Barrett R. D. H., 2017. Epigenetics in natural animal populations. J. Evol. Biol. 30: 1612–1632.
  23. Huang W., Massouras A., Inoue Y., Peiffer J., Ràmia M., et al., 2014. Natural variation in genome architecture among 205 Drosophila melanogaster genetic reference panel lines. Genome Res. 24: 1193–1208.
  24. Kharchenko P. V., Alekseyenko A. A., Schwartz Y. B., Minoda A., Riddle N. C., et al., 2011. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471: 480–485.
  25. Lamprecht M. R., Sabatini D. M., Carpenter A. E., 2007. Cellprofiler: free, versatile software for automated biological image analysis. Biotechniques 42: 71–75.
  26. Lemos B., Araripe L. O., Hartl D. L., 2008. Polymorphic Y chromosomes harbor cryptic variation with manifold functional consequences. Science 319: 91–93.
  27. Lemos B., Branco A. T., Hartl D. L., 2010. Epigenetic effects of polymorphic Y chromosomes modulate chromatin components, immune response, and sexual conflict. Proc. Natl. Acad. Sci. USA 107: 15826–15831.
  28. Lerach S., Zhang W., Bao X., Deng H., Girton J., et al., 2006. Loss-of-function alleles of the JIL-1 kinase are strong suppressors of position effect variegation of the wm4 allele in Drosophila. Genetics 173: 2403–2406.
  29. Mackay T. F., Richards S., Stone E. A., Barbadilla A., Ayroles J. F., et al., 2012. The Drosophila melanogaster genetic reference panel. Nature 482: 173–178.
  30. modENCODE Consortium, Roy S., Ernst J., Kharchenko P. V., Kheradpour P., et al., 2010. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330: 1787–1797.
  31. Muller H. J., 1930. Types of visible variations induced by x-rays in Drosophila. J. Genet. 22: 299–334.
  32. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., et al., 2007. Plink: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81: 559–575.
  33. R Development Core Team, 2008. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  34. Richards E. J., 2006. Inherited epigenetic variation—revisiting soft inheritance. Nat. Rev. Genet. 7: 395–401.
  35. Richards E. J., 2008. Population epigenetics. Curr. Opin. Genet. Dev. 8: 221–226.
  36. Schotta G., Ebert A., Krauss V., Fischer A., Hoffmann J., et al., 2002. Central role of Drosophila Su(var)3–9 in histone H3–K9 methylation and heterochromatic gene silencing. EMBO J. 21: 1121–1131.
  37. Schulze S. R., Wallrath L. L., 2007. Gene regulation by chromatin structure: paradigms established in Drosophila melanogaster. Annu. Rev. Entomol. 52: 171–192.
  38. Summers K. M., Howells A. J., Pyliotis N. A., 1982. Biology of eye pigmentation in insects. Adv. Insect Physiol. 16: 119–166.
  39. Unckless R. L., Rottschaefer S. M., Lazzaro B. P., 2015. A genome-wide association study for nutritional indices in Drosophila. G3 (Bethesda) 5: 417–425.
  40. Wallrath L. L., Elgin S. C., 1995. Position effect variegation in Drosophila is associated with an altered chromatin structure. Genes Dev. 9: 1263–1277.
  41. Widom J., 1999. Equilibrium and dynamic nucleosome stability. Methods Mol. Biol. 119: 61–77.
  42. Yang J., Lee S. H., Goddard M. E., Visscher P. M., 2011. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88: 76–82.
  43. Zhou X., Stephens M., 2012. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44: 821–824.

Data Availability Statement

Original eye images are available upon request. File S1 contains PEV phenotype values. File S2 contains a full list of variants with respective association P-values. File S3 contains observed and expected counts of top PEV associations within annotated chromatin features.


Articles from Genetics are provided here courtesy of Oxford University Press
