Skip to main content
Genetics logoLink to Genetics
. 2015 Aug 12;201(2):487–497. doi: 10.1534/genetics.115.180273

Genetic Control of Environmental Variation of Two Quantitative Traits of Drosophila melanogaster Revealed by Whole-Genome Sequencing

Peter Sørensen *, Gustavo de los Campos , Fabio Morgante , Trudy F C Mackay , Daniel Sorensen *,1
PMCID: PMC4596664  PMID: 26269504

Abstract

Genetic studies usually focus on quantifying and understanding the existence of genetic control on expected phenotypic outcomes. However, there is compelling evidence suggesting the existence of genetic control at the level of environmental variability, with some genotypes exhibiting more stable and others more volatile performance. Understanding the mechanisms responsible for environmental variability not only informs medical questions but is relevant in evolution and in agricultural science. In this work fully sequenced inbred lines of Drosophila melanogaster were analyzed to study the nature of genetic control of environmental variance for two quantitative traits: starvation resistance (SR) and startle response (SL). The evidence for genetic control of environmental variance is compelling for both traits. Sequence information is incorporated in random regression models to study the underlying genetic signals, which are shown to be different in the two traits. Genomic variance in sexual dimorphism was found for SR but not for SL. Indeed, the proportion of variance captured by sequence information and the contribution to this variance from four chromosome segments differ between sexes in SR but not in SL. The number of studies of environmental variation, particularly in humans, is limited. The availability of full sequence information and modern computationally intensive statistical methods provides opportunities for rigorous analyses of environmental variability.

Keywords: Bayesian inference, environmental sensitivity, genetic control of environmental variance, genomic models, random regression models


ENVIRONMENTAL sensitivity can be defined either as mean phenotypic changes of a given genotype in different environments or as differences in the environmental variance of different genotypes in the same environment (Jinks and Pooni 1988). The first definition gives rise to genotype-by-environment interaction at the level of the mean, a topic that has a long history due to its relevance in livestock and plant breeding as well as in medical genetics. Recent articles on specific areas are Huquet et al. (2012) in livestock, El-Soda et al. (2014) in plants, and Hutter et al. (2013) in humans.

The second definition implies that environmental variance is under genetic control and is the subject of the present work. Genetic control of environmental variation has implications in evolutionary biology, in animal and plant improvement, and in medicine. In evolutionary biology, a fundamental problem is to understand the forces that maintain phenotypic variation. Most of the models assume that environmental variance is constant and explain the observed levels of variation by invoking a balance between a gain of genetic variance by mutation and a loss by different forms of selection and drift. Zhang and Hill (2005) discuss models where environmental variance is partly under genetic control and study conditions for its maintenance under stabilizing selection. From a breeding point of view, if environmental variance is under genetic control, it would be possible to decrease variation by selection leading to more homogeneous products (Mulder et al. 2008). In human health the study of variation has shown a recent revival due to the role it may play in understanding complex diseases (Geiler-Samerotte et al. 2013). For example, the question of how a phenotype (e.g., blood pressure) varies over time within an individual and whether this variability is subject to genetic control can be of clinical relevance. Further, many traits such as cancer originate from rare events taking place in a few cells. Such events could be the result of stochastic cell-to-cell variation that has been shown to be under genetic control in Saccharomyces cerevisiae (Ansel et al. 2008) and in Arabidopsis thaliana (Jimenez-Gomez et al. 2011; Shen et al. 2012). Feinberg and Irizarry (2010) provide evidence from human and mouse tissues that differentially methylated regions could be one explanation for this genetically controlled variation in gene expression.

The literature on the subject can be distinguished between attempts at documenting the existence of genetic factors influencing variation and those aimed at finding the specific genes involved. The latter are more recent and involve fewer studies (Ansel et al. 2008; Jimenez-Gomez et al. 2011; Shen et al. 2012; Hutter et al. 2013) and the focus in some is on analysis of the marginal phenotypic variation (Yang et al. 2012a). This is different from the subject of the present study, where the focus is on the conditional variance, given the genotype. This distinction is important. For example, with repeated measurements, it amounts to studying the variance either between or within individuals.

In domestic animals and plants, statistical documentation for genetic control of environmental variance stems from analyses of outbred populations (reviewed in Hill and Mulder 2010). The evidence includes litter size in pigs (Sorensen and Waagepetersen 2003), adult weight in snails (Ros et al. 2004), uterine capacity in rabbits (Ibáñez et al. 2008), body weight in poultry (Rowe et al. 2006; Wolc et al. 2009), slaughter weight in pigs (Ibáñez et al. 2007), and litter size and weight at birth in mice (Gutierrez et al. 2006). Usually, support for the model is based on reporting estimates of the model parameters and on comparisons involving the quality of fit of the genetically heterogeneous variance model with that of models posing homogeneous variances. A better fit of the heterogeneous variance model can be due to its flexibility to account for specific features of the data not necessarily related to variance heterogeneity, such as unaccounted lack of normality (Yang et al. 2011). Spurious results can never be excluded using a model-based approach for the assessment of genetic control of environmental variances, particularly when a properly conducted analysis involves complex, highly parameterized hierarchical models.

More direct evidence for genetic control of environmental variation comes from analyses of genetically homogeneous populations. These data, consisting of pure lines that are genotypically different, with replicated genotypes within lines, are well suited to investigate genetic heterogeneity of environmental variation. The variance between genetically identical individuals within lines reflects environmental variation, and a different environmental variance across lines indicates that a genetic component is operating. Such results were reported for dry matter grain yield per plot from three genetically homogeneous single-cross maize hybrids by Yang et al. (2012b). More compelling evidence using bristle number in isogenic lines of Drosophila melanogaster reared in a common macroenvironment under controlled laboratory conditions was provided by Mackay and Lyman (2005). This design has the added advantage that the statistical method involved is straightforward: a comparison of within-line sampling variances across lines. A similar approach was followed by Morgante et al. (2015) who used inbred lines of D. melanogaster reared in a common macroenvironment under controlled laboratory conditions. Three quantitative traits were analyzed and the results provided a strong indication of environmental variance heterogeneity between inbred lines, with a pattern of behavior that differed among traits. In this work we extend the work of Morgante et al. (2015) by incorporating sequence information in the analysis. This allows a separation of the variation (in environmental variance) between lines (in principle, entirely of genetic origin) into a fraction captured by sequence information and a remainder. Here we restrict the analysis to two traits: starvation resistance and startle response. For each trait the environmental variances of males and females were analyzed as two correlated traits to investigate the existence of genomic variation in sexual dimorphisms. The total genomic variance is also partitioned into contributions from four chromosome segments.

This article is organized as follows. Material and Methods provides a brief description of the data, of the model for the variance between lines, and of the two sets of statistical models employed, both of which treat records in males and females as two correlated traits. In Results we report and compare inferences based on the two sets of models. The models are implemented using empirical Bayesian methods. The Discussion provides a final overview and implications of the results obtained. A number of technical details are relegated to supporting information, File S1. These include a formal derivation of the probability model for the variance between lines and details of a spectral decomposition that plays an important computational role in the Markov chain Monte Carlo strategy implemented, as well as a full description of the Bayesian models. Finally, File S1 also provides an overview of the Monte Carlo implementation of the Bartlett test to study overall heterogeneity of within-line variance, between lines.

Materials and Methods

The data

The traits studied are starvation resistance (SR) and startle response (SL). The data belong to the D. melanogaster Genetics Reference Panel (DGRP) (Mackay et al. 2012). The inbred lines were obtained by 20 generations of full-sib mating from isofemale lines collected from the Raleigh, North Carolina population, which have full genome sequences.

The DGRP lines are not completely homozygous. Most lines remained segregating for ≤2% sites after 20 generations of full-sib inbreeding; however, a number of lines had ∼20% of markers segregating on one or more autosomal arms due to the segregation of large polymorphic inversions. In previous work we explicitly tested whether the log-sampling variances may show an association with heterozygosity and no such tendency was found (Morgante et al. 2015). Therefore, rather than excluding lines, all were included in the analysis without accounting for the level of heterozygosity.

The experimental design involves replicates within line for each sex. For each sex, there are n lines (n = 197 for SR and n = 167 for SL), nr replicates per line (nr = 5 for SR, nr = 2 for SL). The phenotypes analyzed are the log-sampling variances within each line, sex, and replicate of SR and SL. Each of the nt=nnr log-sampling variances from each sex are computed from ns phenotypes (ns = 10 for SR; ns = 20 for SL) collected within each replicate. Resistance to starvation was quantified by placing two 1-day-old flies in culture vials containing nonnutritive medium (1.5% agar and 5 ml water) and scoring survival every 8 hr until all flies were dead. SL was assessed by placing single 3- to 7-day-old adult flies, collected under carbon dioxide exposure, into vials containing 5 ml culture medium and leaving them overnight to acclimate to their new environment. On the next day, between 8 am and 12 pm (2–6 hr after lights on), each fly was subjected to a gentle mechanical disturbance, and the total amount of time the fly was active in the 45 sec immediately following the disturbance was recorded. A similar data set and the same traits were also used by Ober et al. (2012) to study predictive ability, using whole-genome sequences.

Genetic marker data:

Briefly, the genome of D. melanogaster contains four pairs of chromosomes: an X/Y pair (males are XY and females are XX) and three autosomes labeled 2, 3, and 4. The fourth chromosome is very tiny and it is ignored. The “genetic markers” used in the present analyses were called from raw sequence data as described in Huang et al. (2014). Markers were filtered such that minor allele frequencies were >0.05 and were called in at least 80% of the lines, leaving a total of 1,493,351 SNPs from four chromosome arms (2L, 2R, 3L, and 3R). The numbers of genetic markers within each of these sets were 406,577, 327,967, 390,711, and 368,096, respectively.

One of the objectives of this study was to study whether sexual dimorphism could be detected at the autosomal level. Therefore sex chromosomes were excluded. As a check some of the analyses were repeated including the sex chromosomes. These are reported briefly and we show that the results of both analyses are almost identical.

Sexual dimorphism:

The term genetic (or genomic) variance in sexual dimorphism is used throughout this article. It should be understood as differences in the genetic (or genomic) variances between males and females (of log-sampling variances) and a genetic (or genomic) correlation between log-sampling variances of males and females different from 1.

Sexual dimorphism itself is manifested in the significance of the effect of sex. This would result in different means of log-sampling variances in males and females. Despite this difference, the variance in log-sampling variance in the two sexes may be equal, or the correlation between log-sampling variances in the two sexes may be equal to one. This would give rise to sexual dimorphism but no variance in sexual dimorphism. The converse is also true. In this work we focus mainly on the variance in sexual dimorphism.

Statistical models

The data are analyzed with two sets of two-trait models, which regard records in males and females as two correlated traits. The first set comprises relatively simple two-trait models, which allocate the total variance in log-sampling variance into components between and within lines. These models are appealing in the sense that they are free from assumptions regarding gene action and lead directly to an estimate of broad sense heritability of the log-sampling variance of SR or SL, defined as the ratio of the between-line component relative to the total variance. This set of models provides also a description of broad sense genetic correlation between sexes. Differences in these parameters between sexes provide a first indication of genetic variance in sexual dimorphism.

The data are analyzed subsequently using a more parameterized two-trait model. Here the component in log-sampling variance between lines is studied in more detail, by quantifying the proportion of this variance captured by genetic marker information present in each of four chromosome arms and a remainder, not captured by genetic marker information. For each trait, we study whether chromosome arms contribute differently to the total genetic variance (variance between lines) and whether this contribution varies between sexes. The general two-trait model provides estimates of within-chromosome correlation of genomic values between sexes. Differences in these parameters between sexes provide a more detailed picture and disclose what we call genomic variance in sexual dimorphism.

The dependent variable analyzed is the logarithm of the sampling variance in a particular sex, line, and replicate. This is a measure of residual variance that is interpreted as environmental variance. The justification for this interpretation is that lines are highly homozygous, and, as indicated above, the small amount of heterozygosity remaining after 20 generations of full-sib mating was found not to be related to residual variability. The sample variance in a particular sex, line, and replicate (ignoring here the subscripts for sex, line, and replicate) is the scalar

S=i=1ns(yiy¯)2ns1, (1)

where yi is the phenotype and y¯ is the mean of the ns observations. It can be shown (see File S1) that the distribution of zmjk=lnSmjk can be approximated by

zmjk|σmjk2N(lnσmjk2,2ns1), (2)

where m is a subscript for males (f is a subscript for females), j is a subscript for the line, k is subscript for the replicate, and σmjk2 is the variance of yi for the specific line, sex, and replicate subclass. This normal distribution has known variance and unknown mean. Hereinafter we refer to the dependent variable z as the log-sampling variance.

The general two-trait model

We begin with a description of the general two-trait model since the simpler models are special cases. The general two-trait model partitions the line effects into a component explained by genetic marker effects, g, and a component due to genetic effects that are not associated with markers, h. Additionally, the component due to genetic markers g is subdivided into contributions from the C = 4 chromosome arms gi,i=1,,C. The model also includes replicate effects r that contribute to within-line variation.

In (2), the model for the mean in males (m) is assumed to be

lnσmjk2=μm+i=1Cgimj+hmj+rmjk (3)

and in females (f)

lnσfjk2=μf+i=1Cgifj+hfj+rfjk. (4)

In these expressions, the μ’s are scalar means for each sex, the g’s are contributions to line effects from each of C = 4 chromosome arms that can be associated with genetic marker information (genomic effects or genomic values), the h’s represent a component of the line mean that cannot be captured by regression on markers, and the r’s are the contribution to within-line effects from replicates.

The genomic value for chromosome arm k (a scalar) is defined as the sum of the effects of all the markers in chromosome arm k; that is, for line j, chromosome arm k, in males,

gmjk=i=1pkwijkbmik=wjkbmk, (5)

with an equivalent expression for females. In (5), pk is the number of markers in chromosome arm k, the scalar wijk is the observed (centered and scaled) label for marker i in chromosome arm k in line j, and bmik is the effect of marker i in chromosome arm k in males. The n × 1 vectors of genomic effects for males and females for chromosome arm k are gmk=Wkbmk and gfk=Wkbfk, respectively, where Wk = {wijk} is the observed n × pk matrix of marker genotypes of chromosome arm k. The joint distribution of vectors bmk and bfk is assumed to be

[bmkbfk]N([00],[Iσbmk2IσbmkbfkIσbmkbfkIσbfk2]), (6)

where the I’s represent pk × pk identity matrices, σbmk2(σbfk2) is the prior uncertainty variance of marker effects for males (females) in chromosome arm k, and σbmkbfk is their prior covariance.

Due to the centering the rank of Wk is n1. It follows from these assumptions and from standard properties of the multivariate normal distribution that the model for the joint distribution of genomic values in males and females is the singular multinormal (SN) distribution,

[gmkgfk]SN([00],[Gkσgmk2GkσgmkgfkGkσgmkgfkGkσgfk2]),k=1,,C. (7)

Above, σgmk2 is the genomic variance in males in chromosome arm k, σgfk2 is the genomic variance in females in chromosome arm k, σgmkgfk is the genomic covariance between males and females in chromosome arm k, Gk=(1/pk)WkWk is the genomic relationship matrix of rank n1 of chromosome arm k, and SN denotes the singular normal distribution (details in File S1). A more compact notation for (7) is

gk=(gmk,gfk)SN(0,GkVgk), (8)

where

Vgk=[σgmk2σgmkgfkσgmkgfkσgfk2]. (9)

Similarly, collecting the h’s in vectors with n elements for each sex and the r’s in vectors of nt = (n × nr) elements for each sex, their joint distribution is assumed to be

[hmhf]N([00],[Iσhm2IσhmhfIσhmhfIσhf2]), (10)

and

[rmrf]N([00],[Iσrm200Iσrf2]). (11)

The identity matrix I is of order n × n in (10) and of order (nnr) × (nnr) in (11). The vectors of log-sample variances for males and females, zm and zf, each with nt = nnr elements, are conditionally normally distributed, given g, h, and r, and take the form

zm|μm,gm,hm,rmN(1μm+i=1CZgim+Zhm+rm,Iσe2), (12a)
zf|μf,gf,hf,rfN(1μf+i=1CZgif+Zhf+rf,Iσe2), (12b)

where σe2=2/9 for starvation resistance and σe2=2/19 for startle response. In these expressions 1 is an nt × 1 vector of ones, and Z is an observed nt × n incidence matrix (of ones and zeros) that associates each of the nr log-sampling variances of a given sex and line to a common element g or h. The vectors r contain the effects of the nt = nnr replicates.

For males, inferences are reported in terms of ratios

hgmi2=σgmi2σgm2+σhm2,i=1,,C, (13)

with σgm2=Var(i=1Cgmi) and a similar expression (hgfi2) for females. When the elements of W are scaled so that the average of the diagonal elements of WW′ is 1, (13) quantifies the proportion of the variance in log-sampling variance between lines (total genetic variance), captured by genetic marker information associated with chromosome arm i. The ratio involving marker effects from all the chromosome arms is

hgm2=σgm2σgm2+σhm2 (14)

with a similar expression hgf2 for females.

We also report within-chromosome arm genomic correlations

rgi=σgmigfiσgmiσgfi,i=1,,C, (15)

and broad sense heritabilities for each sex, here defined as

Hm2=σgm2+σhm2σgm2+σhm2+σrm2, (16a)
Hf2=σgf2+σhf2σgf2+σhf2+σrf2. (16b)

These quantify the variance in log-sample variance observed between lines (which is the total genetic variance), as a proportion of the total variance (sum of the between- and within-line components). In the denominator of (16), the conditional variance in expression (2) is excluded because it is data dependent: the number of phenotypes per replicate used to compute z. Therefore, expressions (16a) and (16b) are interpreted as broad sense heritability of sample variances computed with a large number of replicates.

Finally, we also compute the total genomic correlation between sexes, defined as

rgi=Cov(i=1Cgmi,i=1Cgfi)σgm2σgf2, (17)

and the broad sense genetic correlation between sexes, defined as

rB=Cov(i=1Cgmi,i=1Cgfi)+Cov(hm,hf)σgm2+σhm2σgf2+σhf2. (18)

The genomic variance of chromosome-specific components

The models specified by (3), (4), and (7) quantify the relative contribution from each of the chromosome arms 2L, 2R, 3L, and 3R by fitting separate variance components for each of those segments. Here we describe the basis for this partitioning. The point of departure is to assume that the 2n × 1 vector of genomic effects g=(gmgf) is equal to the sum of the vectors of genomic effects belonging to C chromosome components. Then

g=i=1Cgi,Var(g|W)=i=1CVar(gi|Wi)=i=1C1piWiWiVgi, (19)

where gi is the 2n × 1 vector of genomic values for males and females, associated with component i, i = 1,…, C; pi is the number of marker genotypes in component i; and Vgi is the 2 × 2 genomic covariance matrix from chromosome component i given by (9). Expression (19) holds under the model assumption that the components of the vector of marker effects b=(b1,,bi,,bC) are realizations from biN (0, IVbi), with Cov(bi,bj)=0, i,j=1,,C;ij. Then Cov(gi,gj|W)=Cov(Wibi,bjWj)=0. In these expressions, bi=(bmi,bfi) is the vector of marker effects of chromosome component i, and

Vbi=[σbmi2σbmibfiσbmibfiσbfi2].

Analyses based on simple models

Two simple two-trait models are implemented that partition the total variance in log-sampling variance into a between-line and a within-line component. The difference between the two models is whether marker information is included or not in the analysis.

Model H:

This model does not include marker information. Dropping the regression on markers (and therefore all the genetic marker effects gi,i=1,,C) results in a model with line and replicate effects only. The line effects in model H are assumed to be identically and independently distributed. The variance–covariance structure of marginal distribution of the observations z (with respect to lines) is block diagonal, where the elements within a block = 1 and specify the covariance between observations within lines. Due to the balanced nature of the data, maximum-likelihood estimates of variance components under Gaussian assumptions are equal to method of moments estimates that can be derived from standard ANOVA.

Model G:

This model includes marker information. Dropping the effects h and including an overall g effect leads to a model with replicate and line effects also, but in contrast to model H, here line effects have a correlated structure given by the genomic relationship matrix. In this model, the g’s are realizations from a common distribution SN(0,GVg). In model G, the variance–covariance structure of marginal distribution of the observations z includes a dominating block-diagonal structure, with elements given by the diagonals of the genomic relationship matrix (which, on average, = 1), and small off-block diagonals that describe the weak genetic marker-based measure of covariation among lines defined by the genomic relationship matrix.

Inferences from model G are reported as

HGm2=σGm2σGm2+σrm2 (20)

for males, with a similar expression for females, and as

rG=σGmfσGmσGf. (21)

The ratio (20) describes the broad sense heritability or variance between lines as a proportion of the total variance, when only marker information is included in the model. The correlation (21) can be interpreted as the total genetic correlation between sexes.

Similarly, inferences from model H are reported as

HHm2=σHm2σHm2+σrm2 (22)

for males, with a similar expression (HHf2) for females, and as

rH=σHmfσHmσHf. (23)

The ratio (22) describes the broad sense heritability or variance between lines as a proportion of the total variance, when marker information is excluded from the model. The correlations (23) and (21) can be interpreted as the total (broad sense) genetic correlation between sexes. The ratios (20) and (22) correspond to (16).

Implementation of the models

The models were implemented using a Bayesian, Markov chain Monte Carlo approach, whereby the hyperparameters of the dispersion parameters of the Bayesian model were estimated by maximum likelihood, and conditional on these, the model was fitted using Markov chain Monte Carlo. To illustrate, consider determining a value of the scale parameter of an inverse chi-square distribution, which is the prior assumed for the variance components associated with r. This is achieved by equating the mode of the inverse chi-square prior density, for a fixed value of the degrees of freedom, to the maximum-likelihood estimate and solving for the scale parameter. The decision to use this approach rather than classical likelihood is based on the need to obtain measures of uncertainty that account as much as possible for the limited amount of information in the data without resorting to large sample theory, which is inherent in classical likelihood. This is relevant in the present study where the data are of limited size and results in marginal posterior distributions that are expected to be asymmetric, particularly in the case of the more complex, general two-trait model. The absence of symmetry of posterior distributions is well captured by the Bayesian MCMC methods, as a glance at results in Table 2 reveals. In a situation like this, summarizing results in the form of posterior means and standard deviations would be misleading. In the presence of asymmetry, posterior means are poor indicators of points of high probability mass. We therefore summarize inferences in terms of posterior modes and posterior intervals.

Table 2. MCMC estimates of posterior modes and 95% posterior intervals of the proportion of the total genetic variance (variance between lines) captured by genetic marker information from each of four chromosome arms (Equation 13) and of the within-chromosome arms genomic correlations (Equation 15).

SR SL
Chromosome Sex Genetic heritability Genetic correlation Genetic heritability Genetic correlation
2L M 0.008 (0.003; 0.170) 0.87 (−0.57; 0.95) 0.38 (0.16; 0.71) 0.77 (0.21; 0.87)
F 0.006 (0.003; 0.092) 0.29 (0.12; 0.66)
2R M 0.007 (0.003; 0.134) 0.32(−0.47; 0.68) 0.009 (0.003; 0.176) 0.93(−0.40; 0.98)
F 0.30 (0.16; 0.52) 0.008 (0.003; 0.169)
3L M 0.01 (0.004; 0.130) 0.90(−0.59; 0.97) 0.52 (0.22; 0.76) 0.76(0.39; 0.86)
F 0.007 (0.003; 0.120) 0.57 (0.26; 0;80)
3R M 0.15 (0.08; 0.46) 0.57(−0.18; 0.79) 0.008 (0.003; 0.124) 0.91(−0.52; 0.98)
F 0.20 (0.10; 0.44) 0.007 (0.003; 0.128)
Total M 0.27 (0.14; 0.56) 0.36(−0.03; 0.60) 0.99 (0.83; → 1) 0.72(0.57; 0.80)
F 0.63 (0.40; 0.80) 0.99 (0.81; → 1)

The last row shows the proportion captured by the sum of all genomic effects for the four chromosome arms (Equation 14) and the total genomic correlation between sexes (Equation 17).

Details of the prior distributions of the parameters of the Bayesian models and of the Markov chain Monte Carlo algorithm can be found in File S1.

Data availability

The data used in the study is available at http://dgrp2.gnets.ncsu.edu/data.html.

Results

The first step of the analysis consisted of computing descriptive statistics and graphs (boxplots) based on which, by simple graphical inspection, we assessed the existence of differences in log-sample variance between lines and sexes. Subsequently we tested more formally the existence of variance heterogeneity between lines, using a Monte Carlo implementation of Bartlett’s test. This led to rejection of the null hypothesis of homogeneity of variance for both traits.

In a second step we performed more formal analyses, using two sets of models. The first set consists of simple models that partition the total variance in log-sampling variance into between- and within-line components. The second set employs more complex models whereby the variance between lines (in principle of genetic origin) is partitioned into components from four chromosome segments that capture variation due to linear regression on genetic markers and a remainder.

Descriptive analyses

For starvation resistance, raw means (variances) of zj=lnSj are, for males, 3.92 (0.53) and for females 4.47 (0.57). The raw value of the correlation of zj between sexes is 0.43. For startle response, these values are, for males, 3.09 (0.41) and for females 3.02 (0.44). The raw correlation is 0.73.

Figure 1 provides a summary of the empirical distribution of SR records. Figure 1A displays boxplots of the records by line sorted in decreasing order of log-sampling variance within lines, from left to right. The corresponding plots for SL are shown in Figure 2A. The results for males are similar and are not shown. Figure 1B and Figure 2B show scatter plots of the within-line log-sampling variance of males vs. females for SR and SL, respectively. Figure 1 and Figure 2 indicate marked heterogeneity of within-line variance across lines for both traits. Figure 1B and Figure 2B suggest that there is a poor linear relationship of within-line log-sample variance between sexes for SR but not for SL. In the case of SR, this is indicative of sexual dimorphism at the level of the mean (as indicated above, the mean is larger in females than in males).

Figure 1.

Figure 1

(A) Boxplots of the distribution of phenotypic records for starvation resistance (SR) within lines in females, for all the lines in the study. (B) Relationship of the log-sampling variances within lines of females vs. males.

Figure 2.

Figure 2

(A) Boxplots of the distribution of phenotypic records for startle response (SL) within lines in females, for all the lines in the study. (B) Relationship of the log-sampling variances within lines of females vs. males.

Testing for heterogeneity of environmental variance

Before fitting the parameterized models a Monte Carlo implementation of the Bartlett test was carried out within sex (technical details in File S1) to test for differences in the sampling variances (Equation 1). The P-values in both traits are of the order of 10−16, leading to a clear rejection of the hypothesis of variance homogeneity between lines.

Analyses based on model H and model G

MCMC estimates of posterior modes and 95% posterior intervals are shown in Table 1. These simple models lead to symmetric posterior distributions, as a glance at the numbers reveals. Posterior means and medians are almost indistinguishable from the posterior modes (not shown). Both models lead to very similar inferences. For SR, males show a little higher broad sense heritability than females. The total genetic correlation between sexes departs clearly from 1. These two sources of evidence provide a first indication of genetic variance in sexual dimorphism for this trait.

Table 1. Analyses based on model H and model G.

Model H Model G
Trait Sex HH2 rH HG2 rG
SR Males 0.66 (0.55; 0.74) 0.58(0.41; 0.70) 0.65 (0.56; 0.75) 0.59(0.43; 0.72)
Females 0.60 (0.50; 0.71) 0.60 (0.51; 0.68)
SL Males 0.71 (0.59; 0.80) 0.96(0.90; 0.98) 0.71 (0.59; 0.81) 0.97(0.91; 0.98)
Females 0.69 (0.57; 0.78) 0.67 (0.55; 0.72)

Shown are MCMC estimates of posterior modes of broad sense heritabilities HH2 and HG2, given by (22) and (20), respectively, and of the broad sense genetic correlation between sexes rH and rG, given by (23) and (21), respectively. Ninety-five percent posterior intervals are in parentheses.

For SL the results indicate very similar broad sense heritabilities in the two sexes and total genetic correlations that are very close to 1. There is no indication of genetic variance in sexual dimorphism for this trait.

Inclusion of sex chromosomes

As a small check, the analysis based on model G was repeated, including genetic marker information from the sex chromosomes. For SR, the modal values of the variance components between and within lines, and the between-line covariance between males and females, were as follows: including sex chromosomes, the estimates were 0.2278, 0.1966, and 0.1240, respectively. Excluding sex chromosomes, the estimates were 0.2308, 0.2009, and 0.1266, respectively. The modal values of the genetic correlation between sexes, including and excluding sex chromosomes, are 0.5857 and 0.5880, respectively. The differences are statistically indistinguishable. The same occurred for SL (not shown): inclusion or exclusion of sex chromosomes led to virtually identical results.

Analysis based on the general two-trait model

Results summarized in terms of estimates of posterior modes and 95% posterior intervals of the proportion of the total genetic variance captured by genetic marker information from each of four chromosome arms (Equation 13), and of the within-chromosome arms genomic correlations (Equation 15), are displayed in Table 2. SR shows clear signals from chromosome arm 2R only in females and from chromosome 3R in both sexes. Judging by the 95% posterior intervals, the only within-chromosome genomic correlation that deserves mention is that associated with chromosome 3R (although the value of zero is included in the posterior interval). The remaining ones involve association between variables that hardly display variability at all. In this case the correlation does not contribute to phenotypic similarity and is very difficult to estimate. This is well captured by the Bayesian MCMC approach that results in wide posterior intervals reflecting large posterior uncertainty.

In contrast to SR, SL shows a very similar pattern in both sexes. There are clear signals from chromosome arms 2L and 3L in males and females. The genomic correlations between sexes in these chromosome arms are high and the posterior intervals do not include the value of zero.

The last row of Table 2 shows the overall proportion of the variance between lines captured by all the genetic marker information, as well as the overall genomic correlation. In the case of SR, this proportion is rather small in males (27%), it is 2.3 times larger in females (63%), and the total genomic correlation between sexes is moderate to small (0.36). However, for SL, the pattern is again very different. The total proportion of the genetic variance (variance between lines) captured by the markers is almost 100% in both sexes and the total genomic correlation between sexes is high (0.72). Therefore the general two-trait model substantiates the results suggested by the analyses based on model H and model G, indicating the existence of genomic variance in sexual dimorphism for SR and the lack of it in the case of SL.

In contrast to the results from Table 1, the posterior distributions in Table 2 show clear signs of asymmetry, which is attenuated when the signal is very strong. For example, the posterior mode, mean, and median in chromosome arm 2L for SR in males are 0.008, 0.035, and 0.019, respectively. For chromosome arm 3R for SR in females, which shows a moderate value, these numbers are 0.20, 0.24, and 0.23. However, for SL, chromosome arm 3L in females has a strong effect, and mode, mean, and median are 0.57, 0.55, and 0.55.

From the general two-trait model we also computed posterior modes and 95% posterior intervals of broad sense heritabilities and correlation defined in (16) and (18). These parameters have the same interpretation as those displayed in Table 1, inferred with models H and G. For SR, the estimates of (16a) and (16b) are 0.66 (0.51; 0.73) and 0.61 (0.48; 0.69), respectively. For the broad sense genetic correlation (Equation 18), the number is 0.52 (0.42; 0.65). These numbers are in good agreement with those in Table 1. For SL, the modal estimates of (16a) and (16b) are 0.87 (0.74; 0.94) and 0.85 (0.71; 0.93), respectively, and the number for (18) is 0.77 (0.57; 0.96). These broad sense heritabilities are a little higher than those reported in Table 1, and the broad sense genetic correlation is lower.

Discussion

In previous work (Morgante et al. 2015) we produced evidence of genetic control of environmental variance for quantitative traits, using Drosophila inbred lines. Due to the design of the experiment and the possibility to control macroenvironmental variation in the laboratory, this evidence must be regarded as substantial, because differences between lines in environmental variance are a reflection of genetic variation. Here we extend this work by incorporating full sequence information and modern computationally intensive statistical methods. This provides opportunities for a more detailed investigation of environmental variability. This study revealed different mechanisms operating in the two traits and sexes. Specifically, for starvation resistance, the analyses indicated that the total variance in variance between lines is a little smaller in females than in males and the broad sense genetic correlation between sexes is markedly <1 (0.58). The fraction of the between-line variance captured by genomic markers is almost 2.3 times larger in females than in males. Further, the strength of the genomic signals across the four chromosome segments differs in the two sexes. On the other hand there is no indication of genetic or genomic variance in sexual dimorphism in startle response, where total variance in variance between lines is almost the same in both sexes, the broad sense genetic correlation between sexes is almost 1, and practically all the variance in environmental variance between lines is captured by marker information. In this trait, in agreement with the absence of genetic and genomic variance in sexual dimorphism, the genomic signals in the four chromosome segments are very similar in males and females.

The marker-based model (G) and the marker-free model (H) lead to the same partitioning of the total variance into a between-line and a within-line component and the same broad sense genetic correlation

The analyses based on models H and G confirm the existence of a genetic component acting on environmental variance. Both models yielded very similar results for the estimates of broad sense heritabilities (∼0.65 for SR in males and 0.60 for SR in females and ∼0.70 for SL in both sexes) and for the broad sense genetic correlations (0.58 for SR and 0.97 for SL). These results are indicative of genetic variance in sexual dimorphism for SR and lack of it for SL. Sexual dimorphism has also been reported at the level of the mean by, for example, Mackay et al. (2012) and Zhou et al. (2012) who observed very different patterns for the number of dead flies vs. starvation time in males and females.

Genetic marker information, included in model G and excluded in model H, has an undetectable effect on the subdivision of the total variance. This is due to the presence of replicates within lines (genotypes) and the fact that lines have very small relationships among them. In the case of model H, the between-line component is completely defined by the covariation of the elements of z within lines. On the other hand, model G leads to a covariance structure of z that has dominating diagonal blocks, consisting of the diagonal elements of the (scaled) genomic relationship matrix, and off-diagonals that describe the weak genetic marker-based relationships among lines. In the case of model G, the elements of the diagonal blocks are 1 on average, in contrast to model H, where they are all exactly 1. In model G, the off-diagonal elements, in principle, contribute to additive genetic variance or to narrow sense heritability. Therefore estimates of the variance between lines or of broad sense heritability based on model G may be difficult to interpret and are expected to differ from estimates based on model H. However, our analyses do not detect such differences, presumably due to the very dominating block-diagonal structure, very similar in both models, in relation to the amount of data available. This situation is similar to what is encountered in the analysis of outbred populations, using genomic best linear unbiased prediction. In pedigreed populations with strong family structures, estimates of heritability obtained using pedigree-based or genetic marker-based covariance matrices lead to similar results (de los Campos et al. 2013). In the present data, the “strong family structure” is represented by the replicated genotypes within lines.

Partition of variance between lines using genetic markers from different chromosome arms reveals further differences between traits and sexes

The results from models H and G are further supported by the general two-trait model that allows a partitioning of the total variance between lines into contributions from four chromosomal segments and a remainder not captured by markers. For SR, the partitioning indicates a strong signal generated by chromosomal segment 2R in females and clearly detectable signals from chromosome arm 3R in both sexes. In contrast, for SL, segments 2L and 3L generate strong signals in both sexes. We conclude that sexes show different signal behavior in the case of SR and a similar one in the case of SL. Our data do not allow an investigation of the nature of the differences of the signals between males and females.

Another line of evidence supporting these results was sought by carrying out an informal genome-wide association analysis for each trait, within sexes. Even though the experiment is underpowered for SNP effect detection, we simply counted the number of markers whose effects were associated with P-values arbitrarily chosen to be <0.01 and the results are shown in Table 3. A glance at the numbers in Table 3 shows that the genomic regions enriched with genetic markers with stronger signals agree well with the pattern displayed in Table 2. That is, for SR, males and females show the largest numbers in 3R followed by another large number (4954) only for females in arm 2R. On the other hand, for SL, both males and females show the largest numbers in Table 3 in relation to arms 2L and 3L. Arriving at the same conclusion using different approaches provides stronger support for our findings.

Table 3. Number of markers whose P-values are <0.01.

Chromosome segment
Sex 2L 2R 3L 3R
Males 3475 3275 2649 5844
Females 3267 4954 3930 6092
Males 6302 4058 4969 3898
Females 4948 3187 5170 4183

Starvation resistance is shown in first two rows and startle response in last two rows.

The nature of the variance explained by genetic markers

In the present experiment the observed variance between inbred lines provides an adequate description of the total genetic variance of the trait, which can be inferred directly from models H and G. The analysis based on the general two-trait model indicates that the linear regression of the log-sampling variances on genetic markers captures only a fraction of this genetic variation in the case of SR and practically all of it in the case of SL. Further, in the case of SR this fraction is less than half as large in males as in females (27% vs. 63%).

Possible explanations of the difference between the estimated broad sense heritability and the proportion of variance explained by linear regression on sequence-derived SNPs may include rare variants not captured by the sequence-derived SNPs, structural variation, and epistatic interactions (the latter was found to be an important mechanism acting at the level of the mean for both SR and SL by Huang et al. 2012). Dominance is not a plausible explanation because the great majority of the data involve homozygotes only. Further work is clearly needed to elucidate the factors that can explain why the linear regression on genetic markers accounts for different proportions of the total genetic variance (variance between lines), depending on the trait and the sex.

Supplementary Material

Supporting Information

Acknowledgments

The study was funded by the Danish Strategic Research Council (GenSAP: Centre for Genomic Selection in Animals and Plants, contract no. 12-132452). GDLC and DS acknowledge financial support from NIH grants GMR01101219 and GMR01099992.

Footnotes

Communicating editor: G. A. Churchill

Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180273/-/DC1.

Literature Cited

  1. Ansel J., Bottin H., Rodriguez-Beltran C., Damon C., Nagarajan M., et al. , 2008.  Cell-to-cell stochastic variation in gene expression is a complex genetic trait. PLoS Genet. 4: e1000049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. de los Campos G., Vazquez A. I., Fernando R., Klimentidis Y. C., Sorensen D., 2013.  Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 9: e1003608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. El-Soda M., Malosetti M., Zwaan B. J., Koornneef M., Aarts M. G. M., 2014.  Genotype x environment interaction QTL mapping in plants: lessons from Arabidopsis. Trends Plant Sci. 19: 390–398. [DOI] [PubMed] [Google Scholar]
  4. Feinberg A. P., Irizarry R. A., 2010.  Stochastic epigenetic variation as a driving force of development, evolutionary adaptation and disease. Proc. Natl. Acad. Sci. USA 102: 1757–1764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Geiler-Samerotte K. A., Bauer C. R., Li S., Ziv N., Gresham D., et al. , 2013.  The details in the distributions: why and how to study phenotypic variability. Curr. Opin. Biotechnol. 24: 752–759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Gutierrez J. P., Nieto B., Piqueras P., Ibáñez N., Salgado C., 2006.  Genetic parameters for canalisation analysis of litter size and litter weight at birth in mice. Genet. Sel. Evol. 38: 445–462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Hill W. G., Mulder H. A., 2010.  Genetic analysis of environmental variation. Genet. Res. 92: 381–395. [DOI] [PubMed] [Google Scholar]
  8. Huang W., Richards S., Carbone M. A., Zhu D., Anholt R. R. H., et al. , 2012.  Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc. Natl. Acad. Sci. USA 24: 15553–15559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Huang W., Massouras A., Inoue Y., Peiffer J., Ramia M., et al. , 2014.  Natural variation in genome architecture among 205 Drosophila melanogaster genetic reference panel lines. Genome Res. 24: 1193–1208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Huquet B., Leclrec H., Ducrocq V., 2012.  Modelling and estimation of genotype by environment interaction for production traits in French dairy cattle. Genet. Sel. Evol. 44: 35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Hutter C. M., Mechanic L. E., Chatterjee P., Kraft N., Gillanders E. M., 2013.  Gene-environment interactions in cancer epidemiology: a National Cancer Institute think tank report. Genet. Epidemiol. 37: 643–657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Ibáñez N., Varona L., Sorensen D., Noguera J. L., 2007.  A study of heterogeneity of environmental variance for slaughter weight in pigs. Animal 2: 19–26. [DOI] [PubMed] [Google Scholar]
  13. Ibáñez N., Sorensen D., Waagepetersen R., Blasco A., 2008.  Selection for environmental variation: a statistical analysis and power calculations to detect response. Genetics 180: 2209–2226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Jimenez-Gomez J. M., Corwin J. A., Joseph B., Maloof J. N., Kliebenstein D. J., 2011.  Genomic analysis of QTLs and genes altering natural variation in stochastic noise. PLoS Genet. 7: e1002295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Jinks J. L., Pooni H. S., 1988.  The genetic basis of environmental sensitivity, pp. 505–522 in Proceedings of the 2nd International Conference on Quantitative Genetics, edited by Weir B. S., Eisen E. J., Goodman M. M., Namkoong G., pp. 505–522, Sinauer Associates, Sunderland, Massachusetts. [Google Scholar]
  16. Mackay T. F. C., Lyman R. F., 2005.  Drosophila bristles and the nature of quantitative genetic variation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360: 1513–1527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Mackay T. F. C., Richards S., Stone A. A., Barbadilla A., Ayroles J. F., et al. , 2012.  The Drosophila melanogaster genetics reference panel. Nature 482: 173–178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Morgante F., Sørensen P., Sorensen D., Maltecca C., Mackay T. F. C., 2015.  Genetic architecture of micro-environmental plasticity in Drosophila melanogaster. Sci. Rep. 5: 09785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Mulder H. A., Bijma P., Hill W. G., 2008.  Selection for uniformity in livestock by exploiting genetic heterogeneity of residual variance. Genet. Sel. Evol. 40: 37–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Ober U., Ayroles J. F., Stone E. A., Richards S., Zhu D., et al. , 2012.  Using whole-genome sequence to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genet. 8: e1002685. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Ros M., Sorensen D., Waagepetersen R., Dupont-Nivet M., SanCristobal M., et al. , 2004.  Evidence for genetic control of adult weight plasticity in the snail Helix aspersa. Genetics 168: 2089–2097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Rowe S., White I. M. S., Avendano S., Hill W. G., 2006.  Genetic heterogeneity of residual variance in broiler chickens. Genet. Sel. Evol. 38: 617–635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Shen X., Pettersson M., Rönegård L., Carlborg Ö., 2012.  Inheritance beyond plain heritability: variance controlling genes in Arabidopsis thaliana. PLoS Genet. 4: e1002839. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Sorensen D., Waagepetersen R., 2003.  Normal linear models with genetically structured residual variance heterogeneity: a case study. Genet. Res. 82: 207–222. [DOI] [PubMed] [Google Scholar]
  25. Wolc A., White I. M. S., Avendano S., Hill W. G., 2009.  Genetic variability in residual variation in body weight and conformation scores in broiler chicken. Poult. Sci. 88: 1156–1161. [DOI] [PubMed] [Google Scholar]
  26. Yang Y., Christensen O. F., Sorensen D., 2011.  Analysis of a genetically structured variance heterogeneity model using the Box-Cox transformation. Genet. Res. 93: 33–46. [DOI] [PubMed] [Google Scholar]
  27. Yang J., R. J. Loos, J. E. Powell, S. E. Medland, E. K. Spelioteset al. , 2012a FTO genotype is associated with the phenotypic variability of body mass index. Nature 490: 267–273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Yang Y., Schön C. C., Sorensen D., 2012b The genetics of environmental variation for dry matter grain yield in maize. Genet. Res. 94: 113–119. [DOI] [PubMed] [Google Scholar]
  29. Zhang X. S., Hill W. G., 2005.  Evolution of the environmental component of phenotypic variance: stabilizing selection in changing environments and the cost of homogeneity. Evolution 59: 1237–1244. [PubMed] [Google Scholar]
  30. Zhou S., Campbell T. G., Stone E. A., Mackay T. F. C., Anholt R. R. H., 2012.  Phenotypic plasticity of the Drosophila transcriptome. PLoS Genet. 8: e1002593. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Data Availability Statement

The data used in the study is available at http://dgrp2.gnets.ncsu.edu/data.html.


Articles from Genetics are provided here courtesy of Oxford University Press

RESOURCES