Abstract
Assessing the impact of the social environment on health and disease is challenging. As social effects are in part determined by the genetic makeup of social partners, they can be studied from associations between genotypes of one individual and phenotype of another (social genetic effects, SGE, also called indirect genetic effects). For the first time we quantified the contribution of SGE to more than 100 organismal phenotypes and genome-wide gene expression measured in laboratory mice. We find that genetic variation in cage mates (i.e. SGE) contributes to variation in organismal and molecular measures related to anxiety, wound healing, immune function, and body weight. Social genetic effects explained up to 29% of phenotypic variance, and for several traits their contribution exceeded that of direct genetic effects (effects of an individual’s genotypes on its own phenotype). Importantly, we show that ignoring SGE can severely bias estimates of direct genetic effects (heritability). Thus SGE may be an important source of “missing heritability” in studies of complex traits in human populations. In summary, our study uncovers an important contribution of the social environment to phenotypic variation, sets the basis for using SGE to dissect social effects, and identifies an opportunity to improve studies of direct genetic effects.
Author Summary
Daily interactions between individuals can influence their health both in positive and negative ways. Often the mechanisms mediating social effects are unknown, so current approaches to study social effects are limited to a few phenotypes for which the mediating mechanisms are known a priori or suspected. Here we propose to leverage the fact that most traits are genetically controlled to investigate the influence of the social environment. To do so, we study associations between genotypes of one individual and phenotype of another individual (social genetic effects, SGE, also called indirect genetic effects). Importantly, SGE can be studied even when the traits that mediate the influence of the social environment are not known. For the first time we quantified the contribution of SGE to more than 100 organismal phenotypes and genome-wide gene expression measured in laboratory mice. We find that genetic variation in cage mates (i.e. SGE) explains up to 29% of the variation in anxiety, wound healing, immune function, and body weight. Hence our study uncovers an unexpectedly large influence of the social environment. Additionally, we show that ignoring SGE can severely bias estimates of direct genetic effects (effects of an individual’s genotypes on its own phenotype), which has important implications for the study of the genetic basis of complex traits.
Introduction
Social interactions contribute to health and disease (e.g. peer smoking increases one’s risk of taking up smoking). So far, quantifying social effects has required a clear hypothesis about the mechanisms mediating the influence of the social environment (in the example above peer smoking is the trait that mediates the social influence). For many phenotypes however, such hypotheses do not exist. Therefore we propose an alternative strategy to study social effects: we investigate effects on an individual's phenotype that arise from genotypes of social partners (social genetic effects, SGE, also called indirect genetic effects[1, 2]). SGE constitute the genetic basis of social effects and can be detected without prior knowledge of the phenotypes through which the social influence is exerted. SGE have been reported for interactions between mothers and offspring (maternal genotypes indirectly affect offspring phenotypes)[3–8] and more recently for interactions between adult individuals, in livestock and wild animals[2, 9–17]. For example, growth rate in farm pigs has been found to be in part determined by the genetic makeup of the other pigs in the pen[2]. However, the extent to which SGE explain variation in biomedical traits is largely unknown.
If SGE do contribute to such traits, they offer a promising way to quantify effects of the social environment. Additionally, they provide an anchor to investigate causal paths and dissect the mechanisms underlying social effects. Finally, in studies of direct genetic effects carried out by the broad community, SGE may be used to account for social environmental effects.
Our study aimed at quantifying the contribution of SGE to multiple biomedical traits. We uncover unexpectedly large social genetic effects on multiple organismal and molecular phenotypes.
Results
To investigate whether SGE explain variation in biomedical traits, we considered two experiments in laboratory mice involving complementary genetic designs, and assessed both organismal and gene expression traits.
Experiment with two inbred strains
We first carried out an experiment with two inbred strains. We chose C57BL/6J (B6) and DBA/2J (D2), the progenitor strains of the largest mouse reference population, the BxD recombinant inbred panel[18]. We co-housed 86 mice at weaning as B6/B6, B6/D2 or D2/D2 pairs. After a period of six weeks during which the mice interacted undisturbed in their home cages, we collected 50 organismal phenotypes relevant to unconditioned anxiety, helplessness (a measure of depressed mood), general locomotor activity, stress, social dominance, wound healing and body weight (Fig 1A, S1 Table). We also profiled genome-wide gene expression in the prefrontal cortex (PFC). The PFC was selected because it is involved in coordinating behavioural responses based on sensorimotor information, motivation and affect, all of which may be affected by the social environment; as a result individual differences in PFC expression levels may reflect behavioural responses to the social environment [19–21].
In this design with two inbred strains there are three potential genetic sources of phenotypic variation: differences in the strain of the focal (i.e. phenotyped) mice (DGE), differences in the strain of their cage mates (SGE) and an interaction between the two, whereby the effect of the strain of the cage mate depends on the strain of the focal mouse (Fig 1B, S1 Fig and Methods). Variance partitioning and model selection (see Methods) provided evidence that interactions between DGE and SGE are common across traits (S2 Fig), with SGE typically affecting a specific trait in one strain (B6 or D2) but not the other (e.g. Fig 1B, S3 Fig). Thus, for each trait, we modeled the measurements collected in B6 mice and D2 focal mice separately (S1 Fig).
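The per-strain analysis described above can be illustrated with a minimal sketch. It assumes a plain one-way decomposition within one focal strain, where the share of variance between cage-mate strains stands in for the SGE contribution. This is an illustration only, not the authors' model (described in their Methods), and the function name and toy values are hypothetical. It also ignores the non-independence of two focal mice sharing a cage.

```python
from statistics import mean

def sge_variance_share(values, cage_mate_strain):
    """Share of phenotypic variance explained by cage-mate strain (one-way R^2).

    values: phenotypes of focal mice from ONE strain (e.g. all B6 focal mice).
    cage_mate_strain: strain label of each focal mouse's cage mate.
    """
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no phenotypic variation at all
    groups = {}
    for v, s in zip(values, cage_mate_strain):
        groups.setdefault(s, []).append(v)
    # Between-group sum of squares: variation attributable to cage-mate strain
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss

# Hypothetical toy data: B6 focal mice housed with a B6 or a D2 cage mate
b6_values = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
b6_mates = ["B6", "B6", "B6", "D2", "D2", "D2"]
share = sge_variance_share(b6_values, b6_mates)
print(f"variance share explained by cage-mate strain: {share:.2f}")
```

Running the same function on the D2 focal mice would give the strain-specific counterpart, which is how an SGE present in one strain but absent in the other would show up.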
For eleven out of 50 organismal phenotypes we found significant SGE in either B6 or D2 mice (P < 0.05, FDR < 43%; see Methods; Table 1, S2 and S3 Tables). The phenotypes affected by SGE differed between B6 and D2 mice. Of these eleven phenotypes, two were measures of stress and six were measures of anxiety, providing evidence that variation in the genetic makeup of cage mates causes variation in stress-related phenotypes. The direction of effect was consistent across the six measures of anxiety (S3 Fig). We also detected strong SGE on the rate of wound healing (measured from an ear punch, P = 6.6 × 10−3, Q = 0.36; Table 1), showing that SGE are not limited to behaviors. Importantly, SGE explained a considerable proportion of phenotypic variance (up to 18%), showing that social effects of genetic origin are important contributors to phenotypic variation in these two strains.
Table 1. Organismal phenotypes significantly affected by SGE in the experiment with inbred strains (P < 0.05).
Focal mice | Phenotype | Measure of | SGE P value | SGE Q value | SGE variance (%) |
---|---|---|---|---|---|
B6 | Wound area | Wound healing | 6.6 × 10−3 | 0.36 | 18 +/- 12 |
B6 | Midbrain concentration of noradrenaline | Stress | 7.2 × 10−3 | 0.36 | 15 +/- 10 |
B6 | Midbrain concentration of dopamine* | Stress | 2.6 × 10−2 | 0.38 | 10 +/- 9 |
D2 | Time spent not moving in center first min (AC) | Anxiety | 1.1 × 10−2 | 0.38 | 15 +/- 11 |
D2 | Total arm entries (EPM) | Locomotor activity | 1.6 × 10−2 | 0.38 | 12 +/- 10 |
D2 | Ratio time spent not moving periphery/center min 2 to 5 (AC) | Anxiety | 2.6 × 10−2 | 0.38 | 14 +/- 11 |
D2 | Ratio time spent not moving periphery/center min 6 to 30 (AC) | Anxiety | 3.2 × 10−2 | 0.38 | 11 +/- 9 |
D2 | Number immobility bouts first 2 min (FST) | Helplessness | 3.4 × 10−2 | 0.38 | 8 +/- 8 |
D2 | Ratio time spent periphery/center min 2 to 5 (AC) | Anxiety | 3.5 × 10−2 | 0.38 | 12 +/- 10 |
D2 | Ratio ambulatory time periphery/center min 6 to 30 (AC) | Anxiety | 3.8 × 10−2 | 0.38 | 9 +/- 8 |
D2 | Proportion of entries in open arms (EPM) | Anxiety | 4.8 × 10−2 | 0.43 | 10 +/- 9 |
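Table 1 reports a Q value alongside each P value. The procedure used to obtain them is given in the Methods and is not restated here, so the sketch below assumes the Benjamini-Hochberg step-up adjustment purely to illustrate how a Q value relates to a ranked list of P values. The function name and the input values are hypothetical, and the output is not expected to reproduce the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q values), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic Q values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical P values for four phenotypes
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(q, 4) for q in q_values])
```

Under this adjustment a nominally significant P value can carry a large Q value when many phenotypes are tested and few are strongly associated, which is the situation the FDR figure in the text describes.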
We also found evidence that SGE affect gene expression in the PFC. A gene set enrichment analysis based on the contribution of SGE to gene expression levels (see Methods) revealed “integrin-mediated signaling pathway” (P = 2.5 × 10−6, Q = 0.024) and “regulation of dopamine metabolic process” (P = 7.0 × 10−4, Q = 1) as the most significant Gene Ontology (GO) terms in D2 and B6 mice respectively (Table 2). The latter enrichment, although statistically less robust, is strikingly consistent with our finding that SGE affect dopamine levels measured by HPLC in B6 mice (Table 1). Our results therefore converge to show that variation in the genetic makeup of cage mates causes variation in behavioral, biochemical, and gene expression traits relevant to stress. Although the large number of tests limits the statistical power to detect SGE on individual genes (12,898 genes tested), we identified three genes with significant effects (Srsf2, Phlpp2, and Ppid; FDR < 33%; S4 Table).
Table 2. Gene set enrichment analysis based on the contribution of SGE to gene expression in the prefrontal cortex (PFC) in the experiment with inbred strains.
Focal mice | GO term ID | GO term | # annotated genes | P value | Q value |
---|---|---|---|---|---|
B6 | GO:0042053 | regulation of dopamine metabolic process* | 12 | 7.0 × 10−4 | 1 |
B6 | GO:0045216 | cell-cell junction organization | 127 | 2.8 × 10−3 | 1 |
B6 | GO:0072376 | protein activation cascade | 20 | 2.8 × 10−3 | 1 |
B6 | GO:1902017 | regulation of cilium assembly | 15 | 3.3 × 10−3 | 1 |
B6 | GO:0042987 | amyloid precursor protein catabolic process | 19 | 3.4 × 10−3 | 1 |
D2 | GO:0007229 | integrin-mediated signaling pathway | 57 | 2.5 × 10−6 | 0.024 |
D2 | GO:0043269 | regulation of ion transport | 431 | 1.7 × 10−4 | 0.81 |
D2 | GO:0030001 | metal ion transport | 543 | 1.4 × 10−3 | 1 |
D2 | GO:1900006 | positive regulation of dendrite development | 59 | 1.5 × 10−3 | 1 |
D2 | GO:0008542 | visual learning | 43 | 1.9 × 10−3 | 1 |
Reanalysis of a large outbred mice dataset
We next investigated SGE in a dataset from outbred (Heterogeneous Stock) mice[22–24]. This design better represents traditional mouse housing conditions (mostly groups of four, five or six mice rather than pairs; S4 Fig) and genetic variation in natural populations (high genetic diversity and genetically unique individuals rather than two inbred strains). The dataset comprises more than 100 organismal phenotypes measured in 2,448 mice (S5 Table), and gene expression in hippocampus for a subset of 457 mice. To accommodate the genetic design of this study, we fitted random effects models with variance components for DGE, SGE and their covariance. Our models are inspired by models published in the literature[2, 25] yet differ in some important ways, which we explain and justify in S1 Note. Because genome-wide genotypes were not available for a subset of mice (526 out of 2,448), we used pedigree information to estimate pairwise genetic covariance for the mice with no genotype information (see Methods). We found that the results obtained using both pedigree and genotype data for estimation of genetic similarity were in agreement with those obtained using genotype data only from the subset of mice that were in cages where all mice had been genotyped (S5 Fig). All models were fitted using LIMIX[26, 27].
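To make the structure of such a random effects model concrete, the sketch below builds the phenotypic covariance implied by a generic DGE/SGE model of the kind found in the literature: y = d + Zs + e, with direct effects d and social effects s sharing a kinship matrix K. It is an assumption-laden illustration, not the authors' model (which differs as explained in S1 Note and also includes social environmental and cage effects) and not LIMIX code. All function names and parameter values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def cage_mate_matrix(cages):
    """Z[i][j] = 1 if mouse j is a cage mate of mouse i (and j != i)."""
    n = len(cages)
    return [[1.0 if i != j and cages[i] == cages[j] else 0.0
             for j in range(n)] for i in range(n)]

def phenotypic_covariance(K, cages, var_d, var_s, cov_ds, var_e):
    """Cov(y) = var_d*K + cov_ds*(K Z' + Z K) + var_s*Z K Z' + var_e*I."""
    n = len(cages)
    Z = cage_mate_matrix(cages)
    ZK = matmul(Z, K)
    KZt = matmul(K, transpose(Z))
    ZKZt = matmul(ZK, transpose(Z))
    return [[var_d * K[i][j]
             + cov_ds * (KZt[i][j] + ZK[i][j])
             + var_s * ZKZt[i][j]
             + (var_e if i == j else 0.0)
             for j in range(n)] for i in range(n)]

# Hypothetical toy example: four unrelated mice housed in two cages of two
K = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
cages = ["cage1", "cage1", "cage2", "cage2"]
V = phenotypic_covariance(K, cages, var_d=0.3, var_s=0.1, cov_ds=0.05, var_e=0.6)
```

Even with unrelated mice, cage mates covary through the DGE-SGE covariance term; with related cage mates (a non-identity K), the DGE and SGE terms overlap further, which is why the components have to be estimated jointly.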
Simulations showed that our estimates of DGE and SGE were unbiased (S6 Fig and S7 Fig). Of the 117 organismal phenotypes available in this dataset, 43 were significantly affected by SGE (P < 0.05, FDR < 5.7%; Table 3, S5 Table). SGE explained up to 29% of phenotypic variation and an average of 8.9% across the 43 significantly affected traits. Importantly, the estimated contribution of SGE was greater than that of DGE for 15 of the 43 traits (Table 3).
Table 3. Organismal phenotypes significantly affected by SGE in the outbred mice dataset (P < 0.05).
Measure | Measure of | SGE P value | SGE Q value | SGE variance (%) | DGE variance (%) |
---|---|---|---|---|---|
CD4 fluorescence intensity | Lymphocyte function | <1.00 × 10−16 | <1.00 × 10−16 | 29 +/- 5 | 21 +/- 2 |
Glycemia before glucose injection | Basal glycemia | 1.30 × 10−10 | 3.20 × 10−09 | 11 +/- 4 | 11 +/- 2 |
Size of CD4+ cells (forward scatter) | Lymphocyte function | 2.50 × 10−08 | 4.40 × 10−07 | 27 +/- 6 | 5 +/- 1 |
Size of B cells (forward scatter) | Lymphocyte function | 4.00 × 10−08 | 5.20 × 10−07 | 25 +/- 6 | 5 +/- 1 |
Size of CD8+ cells (forward scatter) | Lymphocyte function | 8.00 × 10−08 | 8.20 × 10−07 | 27 +/- 6 | 4 +/- 1 |
Chloride | Blood biochemistry | 7.70 × 10−07 | 6.60 × 10−06 | 17 +/- 5 | 4 +/- 2 |
Calcium | Blood biochemistry | 9.90 × 10−07 | 7.30 × 10−06 | 18 +/- 5 | 5 +/- 2 |
Absolute number of lymphocytes | Lymphocyte function | 3.60 × 10−06 | 2.30 × 10−05 | 7 +/- 4 | 9 +/- 3 |
Weight at 6 weeks of age | Body weight | 5.30 × 10−06 | 3.00 × 10−05 | 7 +/- 4 | 11 +/- 2 |
Sodium | Blood biochemistry | 5.90 × 10−06 | 3.00 × 10−05 | 17 +/- 5 | 2 +/- 2 |
Alanine transaminase | Liver function | 1.00 × 10−05 | 4.70 × 10−05 | 11 +/- 4 | 2 +/- 2 |
White blood cells | Lymphocyte function | 3.50 × 10−05 | 0.00014 | 6 +/- 4 | 7 +/- 3 |
Mean startle (following bang only) before conditioning with foot shock | Unconditioned anxiety | 0.00011 | 4.00 × 10−04 | 15 +/- 4 | 24 +/- 4 |
Weight before food hyponeophagia test (7 weeks) | Body weight | 0.00015 | 0.00052 | 12 +/- 4 | 6 +/- 2 |
Glycemia 15 minutes after glucose injection | Glucose tolerance | 0.00035 | 0.0011 | 9 +/- 4 | 9 +/- 3 |
High density lipoprotein | Blood biochemistry | 0.00046 | 0.0014 | 8 +/- 4 | 35 +/- 3 |
Proportion of CD3+ T-cells that are CD4+ | Lymphocyte function | 0.00049 | 0.0014 | 7 +/- 5 | 9 +/- 3 |
Red blood cell distribution width | Red blood cell function | 0.00053 | 0.0014 | 6 +/- 4 | 28 +/- 4 |
Weight after food hyponeophagia test | Body weight | 0.00093 | 0.0024 | 10 +/- 4 | 7 +/- 2 |
Percentage of B220+ cells | Lymphocyte function | 0.0011 | 0.0027 | 10 +/- 5 | 32 +/- 4 |
Wound (ear hole) area | Wound healing | 0.0016 | 0.0037 | 6 +/- 3 | 32 +/- 3 |
Expiratory time (after metacholine) | Lung function | 0.0018 | 0.0039 | 3 +/- 3 | 11 +/- 3 |
Enhanced pause (before metacholine) | Lung function | 0.0019 | 0.0041 | 10 +/- 4 | 9 +/- 3 |
Distance traveled in the center of the EPM | Unconditioned anxiety | 0.0021 | 0.0043 | 4 +/- 3 | 27 +/- 4 |
Mean startle (following tone + bang) before conditioning with foot shock | Unconditioned anxiety | 0.0032 | 0.0063 | 9 +/- 4 | 25 +/- 4 |
Percentage of CD8+ cells | Lymphocyte function | 0.0039 | 0.0073 | 4 +/- 3 | 48 +/- 4 |
Distance traveled in the closed arms of the EPM | Locomotor activity | 0.0042 | 0.0077 | 2 +/- 3 | 18 +/- 3 |
Distance traveled in the open arms of the EPM | Unconditioned anxiety | 0.0057 | 0.01 | 2 +/- 3 | 19 +/- 3 |
Number of entries into the open arms of the EPM | Unconditioned anxiety | 0.0063 | 0.011 | 2 +/- 3 | 19 +/- 3 |
Albumin | Blood biochemistry | 0.0066 | 0.011 | 5 +/- 4 | 4 +/- 2 |
Number of KI67+ cells | Adult neurogenesis | 0.01 | 0.017 | 4 +/- 6 | 38 +/- 7 |
Measured mean cell hemoglobin concentration | Red blood cell function | 0.012 | 0.019 | 7 +/- 4 | 15 +/- 3 |
Beam breaks from ambulation in home cage during last 5 minutes | Locomotor activity | 0.013 | 0.02 | 2 +/- 3 | 11 +/- 3 |
Faecal steroids in males and females | Stress | 0.015 | 0.022 | 6 +/- 5 | 8 +/- 4 |
Time spent in the open arms of the EPM | Unconditioned anxiety | 0.015 | 0.022 | 1 +/- 3 | 17 +/- 3 |
Aspartate transaminase | Blood biochemistry | 0.023 | 0.032 | 4 +/- 4 | 2 +/- 2 |
Alkaline phosphatase | Blood biochemistry | 0.026 | 0.036 | 1 +/- 3 | 44 +/- 3 |
Creatinine | Blood biochemistry | 0.027 | 0.036 | 5 +/- 5 | 2 +/- 3 |
Glucose | Basal glycemia | 0.032 | 0.041 | 11 +/- 4 | 9 +/- 3 |
Time freezing during cue | Conditioned anxiety | 0.034 | 0.042 | 7 +/- 4 | 15 +/- 4 |
Percentage of CD3+ cells | Lymphocyte function | 0.034 | 0.042 | 5 +/- 4 | 31 +/- 4 |
Respiratory rate (after metacholine) | Lung function | 0.04 | 0.048 | 1 +/- 3 | 17 +/- 4 |
Expiratory time (before metacholine) | Lung function | 0.049 | 0.057 | 2 +/- 3 | 15 +/- 3 |
Among the organismal phenotypes most significantly and strongly affected by SGE in this dataset were measures of lymphocyte activation (in particular size and number of CD4+ T cells and B cells, collected by fluorescence-activated cell sorting and full blood count, Table 3). In contrast, measures related to other leucocytes (neutrophils, basophils and monocytes) and natural killer T cells were not significantly affected by SGE. Altogether, these results indicate that genotypes of cage mates influence humoral immunity, although it is unlikely that this reflects the spread of a disease, as the mice were kept in a clean mouse facility.
In addition, rate of wound healing, the measure most significantly and strongly associated with SGE in the experiment with two inbred strains, was also significantly affected by SGE in the outbred dataset (P = 1.6 × 10−3, Q = 3.7 × 10−3).
Three measures of body weight (collected on weeks 6 and 7) also figured among the traits significantly affected by SGE. Other measures of body weight, collected at weeks 9 and 10, showed no SGE (S5 Table). The two sets of measures likely reflect a different phenotype as the normal physiology of the mice was disrupted between weeks 7 and 9 by aggressive phenotyping (tests of conditioned anxiety involving foot shocks, airway sensitization by allergen, intraperitoneal injection of glucose). Another potential but unlikely explanation for the higher contribution of SGE on earlier body weight measures is a sharp decrease in social effects between weeks 7 and 9.
Finally, a subset of measures of anxiety, blood biochemistry, and lung function were also affected by the genotypes of cage mates.
In contrast, there was no statistical evidence for SGE on gene expression levels in the hippocampus (smallest nominal P value 4.1 × 10−5, corresponding to a Q value of 64%, S6 Table). This is most likely the result of a much smaller sample size for expression traits (457 mice, or about 19% of the 2,448 mice), and is consistent with published power analyses[28].
We next explored whether studies focused on DGE can safely ignore SGE, focusing on the estimation of the collective effect of additive DGE on phenotypic variation (narrow-sense heritability). In the outbred dataset, cage mates are genetically more similar to each other than average (S8 Fig). As a result, DGE and SGE are correlated (an example of gene-environment correlation). Thus, we hypothesized that failing to account for SGE would bias estimates of DGE. We also hypothesized that fitting cage effects might be sufficient to eliminate this bias. To investigate both hypotheses, we compared DGE estimates obtained using three linear mixed models: a model for DGE that accounts for neither cage effects nor SGE, a model that accounts for DGE and cage effects, and a model that accounts for DGE, SGE, corresponding social environmental effects, and cage effects (“full model”). Models that did not account for SGE led to substantially larger DGE estimates, and estimates from the model with DGE and cage effects were intermediate between those from the model with DGE only and those from the full model. In simulated traits based on the real genotypes and generated from DGE, SGE and cage effects (see Methods), we found that models that did not account for SGE yielded inflated DGE estimates, whereas joint modeling of DGE, SGE and cage effects resulted in unbiased estimates (Fig 2C and S6 Fig). Importantly, fitting cage effects but no SGE did not eliminate the bias. The simulation results strongly suggest that, in the real data, the estimates obtained from the full model are most accurate, and that models that ignore SGE overestimate heritability. This problem is particularly acute when direct and social random genetic effects are positively correlated (see Methods; Fig 2A and 2B). The problem we highlight here is general and likely to affect other studies in which social partners are related, including twin and family studies used to estimate heritability in humans (see Discussion).
Discussion
Using two complementary genetic designs (one using two inbred mouse strains and one using outbred mice), we estimated the contribution of social genetic effects to a variety of organismal phenotypes and gene expression traits. The experiment with two inbred strains was designed to investigate SGE and focused on behaviours (anxiety and helplessness), as there is strong evidence that behaviours are socially affected[29–33]. To test whether SGE can be detected in outbred populations and survey a broader range of phenotypes, we re-analysed a large dataset from outbred mice and quantified the contribution of SGE to more than 100 phenotypes. The design of our study raises important questions: are positive results (i.e. evidence of SGE) in the experiment with two inbred strains expected to replicate in the outbred dataset? Are some phenotypes expected to be affected by SGE and some not? We now discuss these points.
Some phenotypes were measured in both experiments with similar protocols. Wound area, the measure of wound healing, was collected in both experiments using the exact same protocol. It is significantly affected by SGE in both experiments. The protocols for measuring body and adrenal gland weight are fairly simple, thus reducing technical variation between experiments, and body weight was measured at about the same time point (around 50 days of age) in both experiments. There was strong evidence for SGE on body weight in the outbred dataset but no evidence in the experiment with two inbred strains. No SGE on adrenal gland weight were detected in either dataset. Finally, a partially overlapping set of measures of unconditioned anxiety was significantly affected by SGE in both experiments. While reviewing results from the two experiments in parallel is informative, positive results in one experiment are not strictly expected to replicate in the other. Indeed, although the variants that give rise to SGE in the experiment with two inbred strains also segregate in the outbred population (the two strains used in our experiment were among the eight founders of the outbred population), they have recombined with many additional variants from the six other founders. Moreover, the housing conditions were very different in the experiment with inbred strains and the outbred experiment (group size of 2 vs. 2 to 7 respectively, and unfamiliar mice vs. familiar mice housed together). Therefore, one should not expect the overall contribution of SGE to be the same in the two experiments. Rather, combining the two experiments provides a first hint at the generalizability of our results. Published studies of SGE provide additional information on this matter, and suggest that SGE may contribute to variation in body weight across species[2].
That social genetic effects contribute to variation in anxiety probably does not come as a surprise, but their contribution to wound healing may be more so. This result is however supported by significant p-values in both experiments of our study and large effect sizes (18 and 6%). When interpreting this result, it is important to bear in mind that social effects on wound healing can (and will, necessarily) be mediated by traits of cage mates that are different from wound healing. For example, social effects on wound healing could be mediated by social grooming, which could either mechanically disrupt the healing process or chemically enhance it[34]. Any traits of cage mates that may induce a systemic stress response in the focal animal could also mediate social effects on wound healing[35, 36]. Thus, social effects on wound healing are not unlikely, and, similarly, social effects may affect any phenotype (e.g. through the induction of a systemic stress response).
Because any phenotype may a priori be affected by social effects and the mechanisms at play are rarely known, SGE offer an attractive alternative to investigate social effects. First, as we have shown, they can be used to quantify social effects, effectively providing a lower bound estimate of social effects (as only the genetic component is captured). Second, SGE can be used to test whether a particular trait of social partners has an effect on a phenotype of interest. Establishing a causal relationship between two phenotypes is always difficult because of the risk of reverse causation and independent action of hidden confounders on both traits; SGE provide an anchor to test causality.
Independent of their relevance for studying social effects, we show that ignoring SGE can lead to biased estimates of heritability (i.e. the collective effect of DGE). In our study (outbred dataset), DGE and SGE are correlated by design (mice that share a cage are more genetically similar than average), and we show that this correlation leads to biased estimates of heritability if unaccounted for. Fitting cage effects, which has the primary goal of accounting for environmental effects shared by cage mates (e.g. noise levels), does not eliminate the bias. Our results are of interest to the broad genetics community as DGE and SGE are correlated in most if not all experimental designs traditionally used to estimate heritability in humans and model organisms, and SGE may thus have caused widespread bias. For example, in twin designs, MZ twins not only share 100% of their genotypes but they also share 100% of the genotypes of their sibling; DZ twins in comparison share both 50% of their genotypes and 50% of the genotypes of their sibling. Thus, SGE can contribute to increased concordance between MZ twins compared to DZ twins. If SGE are not modelled, heritability may be overestimated and appear “missing” when compared to genome-wide association results obtained from unrelated individuals[37]. Note that when the covariance between direct and social random genetic effects is negative (competition effects), ignoring SGE may lead to underestimating heritability. SGE in humans were considered once before (“sibling effects” [38]) but were never, to the best of our knowledge, modelled in heritability studies. Because we found that fitting cage effects was not sufficient to eliminate the bias due to SGE, we suspect that accounting for a “common environment” shared by family members, as is commonly done in human studies[39–41], will not eliminate SGE-induced bias.
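The twin argument above can be made concrete with a small worked example. It assumes a deliberately simplified model that is not taken from this study: each twin's phenotype is the sum of its own additive effect, a social effect of its co-twin's genotype, and noise, with no covariance between direct and social effects and equal environments for MZ and DZ pairs. Heritability is then estimated with the classical Falconer formula, 2(r_MZ − r_DZ). The function name and variance values are hypothetical.

```python
def falconer_with_sge(var_a, var_s, var_e):
    """Compare true heritability with the Falconer estimate when SGE exist.

    Simplified model (an assumption): y1 = a1 + s2 + e1 and y2 = a2 + s1 + e2,
    where a is a direct additive effect, s a social genetic effect of the
    co-twin, and Cov(a, s) = 0.
    """
    total = var_a + var_s + var_e
    true_h2 = var_a / total
    # Twin covariance scales with relatedness for both direct and social effects
    r_mz = 1.0 * (var_a + var_s) / total
    r_dz = 0.5 * (var_a + var_s) / total
    falconer_h2 = 2.0 * (r_mz - r_dz)
    return true_h2, falconer_h2

# Hypothetical variance components
true_h2, falconer_h2 = falconer_with_sge(var_a=0.3, var_s=0.1, var_e=0.6)
print(f"true h2 = {true_h2:.2f}, Falconer estimate = {falconer_h2:.2f}")
```

Under these assumptions the Falconer estimate equals the direct and social genetic variances combined, so the social component is silently counted as heritability. Setting `var_s` to zero recovers the true value.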
It is not the first time that unaccounted for gene-environment correlations are put forward as potential causes of bias (e.g. Conley et al. investigated the correlation between genetics and urban setting [42]). However, the impact of the correlation between DGE and SGE is likely to be particularly severe as we have shown that SGE affect a wide range of phenotypes and DGE and SGE are correlated in most experimental designs used to estimate heritability.
Our study sheds light on an important component of the genetic architecture of complex traits, one that lies outside the individual, in social partners. Social genetic effects have already been shown to play an important role in artificial selection of livestock[43] and have important evolutionary consequences[44, 45]. Our results provide evidence that SGE are also an important component of health and disease.
Methods
Experiment with inbred strains
Animals and experimental design
Forty-five C57BL/6J (B6) and 45 DBA/2J (D2) mice, about 3 weeks old (±3 days), were shipped from The Jackson Laboratory. Mice were weaned on the day they were shipped and littermates were shipped together so as to minimise novel social interactions prior to experiments. Three D2 mice died during shipment and to maintain balance one B6 mouse was excluded. On arrival at UTHSC, mice were transferred to standard (75 square inches) mouse cages and allowed to recover overnight with their original littermates. Ears were punched for identification, and mice were transferred to new cages. Mice were co-housed in pairs to make up 16 B6/D2 (BD) cages, 14 B6/B6 (BB) cages, and 13 D2/D2 (DD) cages assembled using mice from 15 B6 and 15 D2 litters (3 mice per litter originally). Littermates were never paired in BB and DD cages so cage mates were always unfamiliar with each other at the beginning of the experiment. Littermates were distributed across the three groups (BB, BD, and DD), and the cages ordered as successive BB, BD, and DD triplets on the shelves. Mice were housed in standard mouse cages with corncob bedding and a cardboard shelter on a fixed 12:12 hour light:dark cycle with ad libitum access to food and water. Phenotypes were collected after six weeks of co-housing.
Phenotyping
All protocols were approved by the UTHSC Institutional Animal Care and Use Committee. Mice were phenotyped between 7am and 2pm, in a random order different for each test. The order in which tests were performed, as well as the measures collected in each test, are summarised in S1 Table.
Elevated plus maze: We used an elevated plus maze (Columbus Instruments) with a surface ~30 cm above the floor that contained four 51-cm long, 11.5 cm-wide arms arranged at right angles. Closed arms had opaque walls 30 cm high, extending the length of the arm. Each mouse was placed in the center of the maze facing an open arm and allowed 5 minutes to explore the maze. Each cage mate was tested in turn and precaution was taken so that cage mates did not interact just after the first cage mate had finished the test and before the second one was taken to the maze.
Activity chamber: We used six 17" X 17" Plexiglas activity chambers (Med Associates Inc., ENV-510) with photobeams around the edge to measure movement. The beams were interfaced to a computer to record movement. Each activity chamber was located in a closed section of a cupboard, with a white light on. Each mouse was placed in the centre of the chamber and given 30 minutes to explore the activity chamber. Cage mates were tested at the same time.
Tail suspension test: Adhesive tape was wrapped around each tail three quarters of the distance from the base, and used to attach a suspension hook. Animals were suspended for six minutes and video recorded from the side. Cage mates were tested at the same time.
Forced swim test: Mice were placed in six cylinders (height: 20 cm, diameter: 15 cm) with water at about 25°C for 6 minutes and their behaviour recorded from the side. Upon completion mice were dried. Cage mates were tested at the same time.
Body weight: Mice were weighed on the day after arrival at UTHSC and every week thereafter.
Sacrifice, tissue collection, and gland weights: Mice were euthanized by injection of 250 mg/kg Avertin (20 mg/ml; tribromoethanol) followed by cardiac puncture and exsanguination under deep anaesthesia. The blood was collected in lithium heparin tubes, and an aliquot was taken in another lithium heparin tube for serotonin quantification. The aliquot was immediately snap frozen in liquid nitrogen and then stored at -80°C.
The brain was dissected and placed in an ice-cold coronal mouse brain matrix. A slab containing the prefrontal cortex (PFC) was removed, and the PFC taken, snap frozen and stored at -80°C. The part of the brain caudal to the amygdala was dissected, snap frozen and stored at -80°C. Then the midbrain was dissected by letting the brain soften on ice and removing the cerebellum. The midbrain was stored at -80°C and shipped on dry ice for analysis by high-pressure liquid chromatography.
Both preputial glands were dissected and weighed together, as were both gonads. Right and left adrenal glands were weighed separately.
High-pressure liquid chromatography: Brain samples were weighed frozen. They were then crushed in 1.5 ml of 10⁻³ M HCl containing sodium metabisulfite and ascorbic acid (antioxidants), and centrifuged at 22,000 g for 20 min at 5°C. The supernatants were collected and filtered through a 10-kDa membrane (Nanosep, Pall) by centrifugation at 7,000 g. Then, a 20 μl aliquot of sample was analyzed for serotonin by fluorometric detection[46]. The amounts of catecholamines (dopamine and noradrenaline) and their metabolites (DOPAC, HVA, VMA) as well as the serotonin metabolite 5HIAA were measured by electrochemical detection on a serial array of coulometric flow-through graphite electrodes (CoulArray, ESA)[47]. The analysis, data reduction and peak identification were fully automated.
Ten μl of blood were used for serotonin quantification. Serotonin was extracted by adding 400 μl of 10⁻³ M HCl, followed by 2 centrifuge runs including one on a Nanosep 10-kDa membrane. Quantification was carried out as for brain serotonin.
Size of ear hole: At sacrifice the ear that was clipped on arrival was cut and placed in formalin. It was then mounted on a glass slide, scanned, and the resulting image analysed with ImageJ. The scale was set once using the size of the slide. The hole was delimited using the freeform drawing tool.
Pre-processing of organismal phenotypes
All collected phenotypes were quantitative. Individual phenotypes were normalized using the parametric Box-Cox power transformation[48], using R’s MASS package[49] and a model where strain of the animal, strain of the cage-mate, and an interaction between the two were included as fixed effects to inform the Box-Cox fit. Note that this approach does not regress out these genetic effects.
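The covariate-aware Box-Cox fit described above can be illustrated with a minimal sketch. It assumes a grid search over the power parameter, similar in spirit to what the boxcox function in MASS does, and exploits the fact that a model with strain, cage-mate strain, and their interaction is saturated in a 2 x 2 design, so the fitted values are simply the four group means. The function names, the grid, and the toy data are illustrative assumptions, not the authors' code.

```python
import math

def boxcox(y, lam):
    """Box-Cox power transform of a single positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(values, groups, lam):
    """Profile log-likelihood of lam when the linear model is one mean per group."""
    n = len(values)
    transformed = [boxcox(v, lam) for v in values]
    sums, counts = {}, {}
    for t, g in zip(transformed, groups):
        sums[g] = sums.get(g, 0.0) + t
        counts[g] = counts.get(g, 0) + 1
    # Residual sum of squares around the group means (saturated 2 x 2 design).
    rss = sum((t - sums[g] / counts[g]) ** 2 for t, g in zip(transformed, groups))
    # The Jacobian term makes likelihoods comparable across values of lam.
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(rss / n) + jacobian

def best_lambda(values, groups, grid=None):
    """Return the grid value of lam with the highest profile log-likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: profile_loglik(values, groups, lam))

# Toy data (hypothetical): log-scale values symmetric within each of four groups,
# labelled by focal strain and cage-mate strain.
shifts = {"BB": 0.5, "DD": -0.5, "BD": 0.2, "DB": -0.2}
values, groups = [], []
for group, shift in shifts.items():
    for k in range(-8, 9):
        values.append(math.exp(shift + k / 4))
        groups.append(group)

lam_hat = best_lambda(values, groups)
transformed = [boxcox(v, lam_hat) for v in values]
```

Because the group structure only informs the choice of the power parameter, the transformed values still contain the strain and cage-mate effects, which is the point made in the note above.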
On the Box-Cox transformed data for each phenotype, we tested for association with experimental covariates and regressed out significantly associated covariates that were unrelated to the measure of interest. The covariates that were accounted for are reported in S1 Table. The significance of SGE calculated by including covariates in the models rather than regressing them out is reported in S3 Table.
RNA extraction and sequencing
Total RNA was extracted using a Qiagen Qiacube and RNeasy mini kit according to the manufacturer's protocols and was loaded on a 96-well plate, with samples randomly allocated to wells, for shipment to the sequencing centre. The quantity and quality of the samples were assessed using Agilent high sensitivity R6K screen tape. One sample (B42) with high degradation was excluded. Two samples (B39, D30) had very low RNA integrity numbers (RIN, 3.7 and 4.2 respectively), and one sample (B9) had a low 28S/18S peak height ratio, but they were included in library preparation and sequencing. All libraries were prepared according to the dUTP strand-specific protocol, multiplexed in one 85-plex and sequenced in 5 HiSeq 2500 rapid runs to obtain paired-end, 51 bp reads.
Alignment of sequencing reads and quantification of gene expression
We used SNPs and indels identified in D2 (Sanger Mouse Project[50], files mgp.v3.snps.rsIDdbSNPv137.vcf and mgp.v3.indels.rsIDdbSNPv137.vcf, both on GRCm38) to construct a D2 genome and the corresponding gene annotation file using the software SeqNature[51] (version 1.2). Reads were aligned using Tophat[52] (Tophat2 version 2.0.11, Bowtie version 2.2.2.0, and Samtools version 0.1.19.0), considering either the B6 genome (GRCm38) or the D2 genome. High quality alignments were selected using the following criteria: both reads of the pair had to uniquely map to the same chromosome and within 2.3 Mb of each other (length of the longest gene). Expression levels were quantified at the gene level using HTSeq[53] (version 0.6.1), considering high quality alignments only and UCSC annotations (GRCm38/mm10).
Pre-processing and quality control
Raw expression counts were adjusted for library size using the R package DESeq2[54] (version 1.2.10). Out of 23,420 genes, we considered 13,271 genes with at least 10 (library size normalized) reads in at least 40% of the samples. Gene expression levels were normalized using the Box-Cox transformation as done for the organismal phenotypes.
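The expression filter above (at least 10 normalized reads in at least 40% of samples) can be written as a short sketch. The function name and the toy counts are hypothetical, and the counts are assumed to be already adjusted for library size.

```python
def expressed_genes(counts, min_reads=10, min_fraction=0.4):
    """Return the genes with >= min_reads in >= min_fraction of the samples.

    counts maps a gene name to its list of library-size-normalized counts.
    """
    kept = []
    for gene, values in counts.items():
        n_pass = sum(1 for v in values if v >= min_reads)
        if n_pass >= min_fraction * len(values):
            kept.append(gene)
    return kept

# Toy normalized counts for three genes across five samples (hypothetical).
toy_counts = {
    "geneA": [12, 30, 0, 15, 11],
    "geneB": [0, 3, 25, 1, 2],
    "geneC": [10, 10, 10, 9, 9],
}
kept_genes = expressed_genes(toy_counts)
```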
We identified and excluded genes whose expression was affected by the order of sacrifice: at the end of our experiment, we sacrificed each cage mate in turn with an interval of about 15 minutes between the first and the second cage mate. The cage mate that was sacrificed second was in the room and we found evidence that the expression of a small number of genes was activated during this interval (i.e. their expression was correlated with order of sacrifice within cage mate pair). Some of the affected genes were well-known immediate early genes. In our experiment, this effect was confounded with potential social genetic effects as B6 mice in BD cages were always sacrificed first and D2 mice second, while in BB cages some of the B6 mice were necessarily sacrificed first and others second, and the same applies for D2 mice in DD cages. To avoid possible false positive SGE, we took a conservative approach and excluded all genes for which the order of sacrifice within pair, in B6 mice from BB cages and D2 mice from DD cages, was associated with gene expression levels (P < 0.05, > 1% variance explained; 375 genes in total).
We used a principal component analysis (PCA) to identify outlier samples. The first and second principal components explained 22% and 9% of variation in gene expression, and were used to define outliers (S9 Fig). A total of 7 samples were excluded (B39, D13, D30, D9, B30, D16, D39), two of which also had very low RIN. The final gene expression dataset consisted of 11 pairs of B6 cage mates, 3 B6 mice whose B6 cage mates were not included in the final sample, 15 pairs of B6/D2 cage mates, 1 B6 mouse whose D2 cage mate was not included in the final sample, 9 pairs of D2 cage mates, and 4 D2 mice whose cage mates were not included in the final sample. This amounts to 25, 16, 15, and 22 mice in the BB, BD, DB, and DD groups respectively.
In addition to removing outlier samples, we used two approaches to adjust for observed and hidden covariates affecting the expression profiles. First, for significance tests of SGE on gene expression levels, we used PEER to account for hidden sources of variation[55]; we ran PEER while including strain, strain of the cage mate, and interaction between them as observed covariates so as not to remove the genetic signal (see below), and used the expression residuals calculated from the first 5 PEER factors for statistical tests. The p-values obtained with PEER residuals were calibrated whereas other normalization strategies led to severely deflated test statistics, supporting that the PEER factors accounted for unwanted variation such as hidden batch effects.
Second, for variance partitioning and model selection, we regressed out observed technical covariates only as removal of hidden factors can reduce the residual noise. We considered several technical covariates: initial RNA concentration (as an indicator of the size of tissue dissected, since the same volume of buffer was added to all), RIN, ratio between heights of 28S and 18S peaks, proportion of first reads mapping, as they should, to the strand opposite to the feature (i.e. strand specificity of the protocol, calculated by running htseq-count a second time with the --stranded=yes option), insert size, yield, proportion of reads mapped to the (B6 or D2) genome, proportion of mapped reads that are of high quality (as defined above), proportion of high quality reads that are exonic, number of alignments with 0, 1, or 2 errors (as calculated by SAMStat[56] version 1.09), and the following measures calculated from the output of FastQC[57] (version 0.10.1): average per base sequence quality, average per base sequence (G nucleotide used) content, average per base GC content, average per sequence GC content, average per base N content, average per sequence quality scores, average sequence duplication levels, presence or absence of overrepresented sequences. We calculated the correlation of these covariates with the first principal components, and found that the proportion of selected read pairs that are exonic was significantly correlated with the first principal component (Pearson correlation, R² = 0.66, P = 8 × 10⁻¹²). Therefore we regressed the effect of this covariate out of the expression of each gene, and used the residuals for variance partitioning and model selection.
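As an illustration of regressing a single technical covariate out of the expression of one gene, the sketch below fits an ordinary least-squares line with an intercept and returns the residuals. The function name and the toy values are hypothetical; in practice this would be applied to each gene in turn.

```python
def regress_out(y, x):
    """Residuals of y after a simple linear regression (with intercept) on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]

# Toy example: expression of one gene and the proportion of exonic read pairs.
exonic = [0.61, 0.64, 0.58, 0.70, 0.66, 0.63]
expr = [5.1, 5.6, 4.7, 6.4, 5.9, 5.2]
residuals = regress_out(expr, exonic)
```

By construction the residuals have mean zero and are uncorrelated with the covariate, so any remaining variation cannot be attributed to it.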
Statistical modelling
Conceptually we distinguish between the focal individual (f), whose phenotype is the phenotype of interest and the dependent variable in the model, and the focal individual’s social partners–here its cage mate (cm)–which may influence the phenotype of the focal individual through a number of unknown mediating phenotypes. No prior knowledge or measurement of these mediating phenotypes is required as they are captured through their genetic and environmental components (social genetic effects, SGE, and social environmental effects, SEE, respectively).
Genetic effects considered in this experiment include:
direct genetic effects (DGE, effects of the strain of the focal animal on its own phenotype–the phenotype of interest). This component is only relevant when both B6 and D2 focal mice are analysed together.
social genetic effects (SGE, effects of the strain of the cage mate on the focal animal’s phenotype; SGE act through mediating phenotypes of the cage mate).
an interaction between DGE and SGE (whereby the effect of the strain of the cage mate depends on the strain of the focal individual). This component is only relevant when both B6 and D2 focal mice are analysed together.
Genetic effects were modelled as fixed effects.
“Environmental” (i.e. unexplained or residual) variance may arise from:
direct environmental effects (DEE, effects of the “environment” of the focal animal on its own phenotype)
social environmental effects (SEE, effects of the environment of the cage mate on the focal animal’s phenotype; SEE are the environmental component of the mediating phenotypes, so whenever social effects are considered both SGE and SEE need to be accounted for)
cage effects (which affect the phenotype of interest of all mice sharing a cage in the same way)
Environmental effects were modelled as random effects.
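The model of phenotypic variation that includes all of the genetic and environmental effects listed above (note: a simpler model was effectively used for analysis, as explained next) is the following:

$$y_f = A_{D,f} + A_{S,cm} + A_{D,f}{:}A_{S,cm} + e_{D,f} + Z_f e_S + W_f c$$

where $y_f$ is the phenotype of interest of the focal animal $f$; $A_{D,f}$ and $A_{S,cm}$ respectively refer to additive DGE of the focal individual and additive SGE from the cage mate $cm$; $A_{D,f}{:}A_{S,cm}$ refers to the interaction between $A_{D,f}$ and $A_{S,cm}$; $Z_f$ is a row of the matrix $Z$ that indicates cage mates (importantly $Z_{i,i} = 0$); $e_D$ is the vector of random DEE and $e_S$ the vector of random SEE. $W_f$ is a row of the matrix $W$ that indicates cage assignment and $c$ the vector of cage effects.

The joint distribution of the environmental random effects is defined as:

$$\begin{bmatrix} e_D \\ e_S \\ c \end{bmatrix} \sim MVN\left(0, \begin{bmatrix} \sigma^2_{E_D} I & \sigma_{E_{DS}} I & 0 \\ \sigma_{E_{DS}} I & \sigma^2_{E_S} I & 0 \\ 0 & 0 & \sigma^2_C I \end{bmatrix}\right)$$

The covariance between the random direct and social environmental effects exists because an environmental factor may influence both the phenotype of interest and the mediating traits, in which case it will influence the phenotype of interest both in a direct ($e_D$) and an indirect ($e_S$) manner. Depending on 1) the direction of the environmental effect on the phenotype of interest, 2) the directions of the effects on the mediating traits, and 3) the signs of the correlations between mediating traits and phenotype of interest, $\sigma_{E_{DS}}$ may be positive or negative.

The environmental covariance, writing $e_i = e_{D,i} + Z_i e_S + W_i c$ for the total environmental term of individual $i$, is:

$$cov(e_i, e_j) = \begin{cases} \sigma^2_{E_D} + \sigma^2_{E_S} + \sigma^2_C & \text{if } i = j \\ 2\sigma_{E_{DS}} + \sigma^2_C & \text{if } i \text{ and } j \text{ are cage mates} \\ 0 & \text{otherwise} \end{cases}$$

Note that $cov(e_i, e_j)$ may be negative as a result of $\sigma_{E_{DS}}$ being negative.

When group size is constant, as is the case in this dataset (group size = 2), $e_D$, $e_S$, and $c$ are not identifiable. Thus we instead used a single random environmental term, whose correlation structure is described below, as has been done in other studies of social genetic effects[25]:

$$e \sim MVN(0, \sigma^2_E R), \qquad R_{i,j} = \begin{cases} 1 & \text{if } i = j \\ \rho & \text{if } i \text{ and } j \text{ are cage mates} \\ 0 & \text{otherwise} \end{cases}$$

From the expressions of $cov(e_i, e_j)$ above we see that $\sigma^2_E = \sigma^2_{E_D} + \sigma^2_{E_S} + \sigma^2_C$ and $\rho = (2\sigma_{E_{DS}} + \sigma^2_C) / (\sigma^2_{E_D} + \sigma^2_{E_S} + \sigma^2_C)$. As the determinant of the environmental covariance matrix is necessarily positive, $-1 \le \rho \le 1$.

Note that under the assumption $0 \le \rho \le 1$ the environmental component could be rewritten as a sum of cage effects and iid residuals. This assumption is often made when group size is large (e.g. [2]) because

$$\rho = \frac{2\sigma_{E_{DS}} + (n-2)\sigma^2_{E_S} + \sigma^2_C}{\sigma^2_{E_D} + (n-1)\sigma^2_{E_S} + \sigma^2_C}$$

where $n$ is the group size. The numerator is likely to be positive and large for large groups, so $\rho$ is likely to be positive for large groups. However, in this experiment group size is 2 so $(n-2)\sigma^2_{E_S} = 0$ and $\rho$ may well be negative. Thus we cannot rewrite the residual term as a sum of cage effects and iid residuals.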
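The dependence of ρ on group size can be checked numerically with a small sketch. It assumes, following the model above, that the total environmental term of an individual is its own DEE plus the SEE of each of its n − 1 cage mates plus the cage effect, which gives a variance of σ²(DEE) + (n − 1)σ²(SEE) + σ²(cage) and a covariance between cage mates of 2cov(DEE, SEE) + (n − 2)σ²(SEE) + σ²(cage). The function name and the parameter values are hypothetical.

```python
def environmental_rho(var_ed, var_es, cov_eds, var_c, n):
    """Correlation between the total environmental terms of two cage mates.

    var_ed, var_es, var_c: variances of DEE, SEE and cage effects.
    cov_eds: covariance between the DEE and SEE of the same individual.
    n: group size (number of mice per cage).
    """
    total = var_ed + (n - 1) * var_es + var_c
    shared = 2 * cov_eds + (n - 2) * var_es + var_c
    return shared / total

# With a negative DEE-SEE covariance, rho is negative for pairs
# but becomes positive once the groups are large enough.
rho_pairs = environmental_rho(1.0, 1.0, -0.5, 0.2, n=2)
rho_large = environmental_rho(1.0, 1.0, -0.5, 0.2, n=7)
```

With these illustrative values ρ is about −0.36 for pairs and about 0.58 for groups of seven, which is why a "cage effect plus iid residual" parameterisation can be adequate for large groups but not for pairs.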
To investigate interactions between DGE and SGE (see Methods sections Model selection and Variance partitioning below), we considered all mice (B6 and D2) as focal individuals. For all other analyses we analysed B6 focal mice and D2 focal mice separately. We did so because there was evidence that interactions exist between DGE and SGE (results reported in the columns under “All mice” in S2 Table and shown in S2 Fig) and analysing B6 and D2 focal mice separately in that case simplified results interpretation.
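When analysing B6 and D2 focal mice separately, we used the following simpler model:

$$y_f = \beta_S X_{cm} + e_f \qquad (1)$$

where $\beta_S$ is the regression coefficient of SGE, $X_{cm}$ the strain of the cage mate $cm$, with strain encoded as 0 for B6 and 1 for D2, and $e_f$ the single random environmental term described above.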
Model selection
To investigate whether interactions between DGE and SGE might contribute to phenotypic variation, we sought to identify the combination of genetic and environmental terms from {DGE, SGE, interaction between DGE and SGE, DEE, SEE, covariance between DEE and SEE, and cage effects} that best modelled phenotypic variation. All mice (B6 and D2) were included as focal individuals for this analysis. Constraints were imposed on the environmental correlation parameter ρ when SEE and/or cage effects were not included in the model. Thus, the models were fitted in R using either the functions lm (stats package), lme (nlme package), or gls (nlme package, corCompSymm correlation structure for the residuals), as appropriate. The best model was selected using the Akaike information criterion (AIC) calculated based on the maximum-likelihood (ML) fit using the AIC function in the R stats package. The results of this analysis are reported in S2 Fig and the columns under “All mice” in S2 Table.
Variance partitioning
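For S2 Fig and the columns under “All mice” in S2 Table, both B6 and D2 mice were considered as focal individuals and we used the following model:

$$y_f = \beta_D X_f + \beta_S X_{cm} + \beta_{D:S} X_f.X_{cm} + e_f \qquad (2)$$

where $\beta_D$ is the regression coefficient of DGE, $X_f$ the strain of $f$, $\beta_S$ the regression coefficient of SGE, $X_{cm}$ the strain of the cage mate $cm$, and $\beta_{D:S}$ the regression coefficient of the interaction between DGE and SGE. $X_f.X_{cm}$ is the product of the numerically encoded strains of $f$ and $cm$ (0 for B6, 1 for D2), and therefore was 0 except for D2 focal mice with D2 cage mates.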
All other results were obtained by analysing B6 and D2 focal mice separately using Model (1).
All models were fitted using restricted maximum likelihood (REML). The proportion of phenotypic variance explained by each fixed effect in the model (Model (1) or Model (2)) was calculated as the ratio of the sample variance of that term (i.e. the expected value of the variance of that term, which is for SGE) and the sample phenotypic variance. We obtained the sample phenotypic variance by drawing from the fitted model and calculating the average variance of the draws.
Standard errors were obtained using Gaussian propagation of uncertainty.
Statistical significance tests
Statistical significance analyses for SGE were carried out in B6 and D2 focal mice separately (S1 Fig), based on Model (1). Significance of the SGE covariate was assessed using a likelihood ratio test. Q values[58] implemented in the R package qvalue [59] were used to adjust for multiple testing across traits and focal strain (43 traits x 2 focal strains, 86 tests in total).
Gene set enrichment analysis
We considered B6 and D2 focal mice separately for this analysis, using the R package topGO[60] and the annotation package org.Mm.eg.db to perform gene set enrichment analyses. We used the residuals from PEER and considered the variance explained by SGE in Model (1) as the score for each gene. Enrichment P values were obtained with the Kolmogorov-Smirnov test and the elim algorithm that decorrelates the GO graph structure[61]. Q values[58] were used for multiple testing correction, and obtained using the R package qvalue[59].
Outbred mice dataset
Description of the dataset
We used the raw data from Valdar et al.[22] (genotypes and organismal phenotypes), Huang et al.[23] (organismal phenotype: cellular proliferation in the subgranular zone of the dentate gyrus, a measure of adult neurogenesis) and Huang et al.[24] (gene expression in the hippocampus), which were collected on Heterogeneous Stock (HS) mice. HS mice are descended from eight inbred progenitor strains through more than 50 generations of circular breeding with 24 families[62]. As a result, each HS mouse is genetically unique. The mice included in this dataset are from a colony established at Oxford for a few generations, and for which pedigree data are available (3,149 mice). Cage information was available for a subset of 2,448 mice, filling 549 cages in groups of 2 to 7 mice (S4 Fig). Genotypes at 13,459 single nucleotide polymorphisms were available for a subset of 1,940 of those 2,448 mice, and a pair-wise genetic similarity matrix shows that they were related at different levels (S8 Fig), including (but not only) many full siblings. Some phenotype data were available for all 2,448 mice, but the number of mice phenotyped for each organismal phenotype is reported in S5 Table.
Inference of relationship coefficients using the single-step or H matrix
Our analysis of this dataset is based on linear mixed models, which use a genetic covariance matrix made of pair-wise relationship coefficients. These coefficients can be estimated from genome-wide genotypes, however as 508 cage mates of phenotyped mice had not been genotyped, we estimated relationship coefficients for all 3,149 mice based on both genotype and pedigree data.
We discarded the genotypes of 18 mice because they represented likely duplicates.
We inferred the pair-wise relationship coefficient of the 1,227 non-genotyped mice with all 3,149 mice using the single-step or H matrix[63–65], which was constructed, on the inverse scale, as:

$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{(*)-1} - A_{gg}^{-1} \end{bmatrix}$$

where $A^{-1}$ is the inverse of the pedigree-based relationship matrix (Wright, 1921), and $A_{gg}$ is the pedigree-based relationship submatrix across genotyped individuals. The pedigree consisted of the genotyped individuals and their parents, some of them common across sibships. To ensure that both $G^{(*)}$ and $A^{-1}$ refer to the same base population and that $G^{(*)}$ is a valid covariance matrix, $G^{(*)}$ was constructed as [66, 67]

$$G^{(*)} = a + b \, \frac{M M^T}{2 \sum_i p_i q_i}$$

where $a$ and $b$ are constants (respectively 0.015 and 0.982) based on statistics of $A$ and $G$ so that average relationship and average inbreeding agree for both matrices. These constants correct for random drift and the loss of heterozygosity across generations. Matrix $M$ contains centered genotypes based on observed allelic frequencies $p$ and $q = 1 − p$ for each locus as in [68]. The matrix $H$ was obtained by inversion of $H^{-1}$. Program pregsf90 ([69]; available at http://nce.ads.uga.edu/wiki/doku.php) was used to construct H. Using this matrix, all estimates of genetic variance refer to the pedigree base populations, i.e. the parents of the animals in the study. Because this is only one generation back in time, we can assume that this base genetic variance is concordant with the genetic variance in the phenotyped population.
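The construction of the adjusted genomic relationship matrix can be sketched as follows, assuming genotypes coded as allele counts (0, 1, or 2), centering by twice the observed allele frequency, and scaling by 2Σpq, with the constants a and b applied as an additive shift and a multiplier. This toy function is not the pregsf90 implementation, it does not build H itself, and the genotypes are hypothetical.

```python
def adjusted_grm(genotypes, a, b):
    """Genomic relationship matrix from allele counts, adjusted as a + b * G.

    genotypes: one list per individual, each holding allele counts (0, 1 or 2)
    at every SNP. Allele frequencies are estimated from the genotypes themselves.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Center each SNP by twice its observed allele frequency.
    centered = [[ind[j] - 2.0 * p[j] for j in range(m)] for ind in genotypes]
    grm = [[sum(ci[j] * ck[j] for j in range(m)) / scale for ck in centered]
           for ci in centered]
    return [[a + b * grm[i][k] for k in range(n)] for i in range(n)]

# Four hypothetical individuals genotyped at three SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
g_raw = adjusted_grm(toy_genotypes, a=0.0, b=1.0)
g_star = adjusted_grm(toy_genotypes, a=0.015, b=0.982)
```

Because the genotypes are centered with observed frequencies, each row of the unadjusted matrix sums to zero; the constants a and b then shift and rescale it towards the pedigree-based matrix.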
Processing of organismal phenotypes
We considered a subset of the experimental variables recorded by Valdar et al.[22] as covariates for our analysis (S5 Table). Each phenotype was normalized using a covariate-aware Box-Cox transformation[49], and the covariates were fitted as fixed effects in the models used in our analyses.
Re-processing of expression data
Gene expression in the hippocampus was assessed by hybridization of mRNA to the Illumina Mouse WG-6 v1 BeadArray platform. The data, as originally pre-processed by Huang et al. had critical shortcomings: S10A and S10B Fig show that the higher correlation that can be seen in the raw data between probes mapping to the same gene compared to random probe pairs is lost in the pre-processed data from Huang et al.[24]. We re-processed the data as detailed in Section 2.6 of Krohn[70]. Briefly, the BeadArray images were imported into the Gene Expression module (V 1.6.0) of the Illumina GenomeStudio (V 2010.1) without invoking any data adjustment procedures. These raw data were transferred to R using the Bioconductor package lumi[71]. A cluster dendrogram enabled identification of four outlier samples, which were excluded from further pre-processing. Two thousand control probes scattered across the Illumina chip enabled subtraction of background noise levels via lumi. Standardisation of variance and mean were carried out via lumi’s variance stabilising transformation and robust spline normalisation. Then a series of four filters were applied to eliminate unreliable probes: following the pipeline developed by Barbosa-Morais et al.[72], probes not rated as “perfect” (50 out of 50 nucleotides match the reference genome) or “good” (48 to 49 out of 50 matches to the genome) were removed; so were probes that mapped to more than one genomic location according to BLAST queries[73] and probes that included a SNP (NCBI dbSNP Build 137); finally, only probes detected in at least 5% of the mice at a 0.95 detection level (as per GenomeStudio) were retained. Following filter application, 15,736 of the original 47K probes, mapping to 11,996 genes, were kept. Finally, ComBat[74] was used to control for batch effects (the data were assayed over several months). S10C Fig shows that our processing of the data kept the biological correlation that exists between probes mapping to the same gene.
Statistical models
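We considered a linear additive model,

$$y_f = X_f b + a_{D,f} + Z_f a_S + e_{D,f} + Z_f e_S + W_f c \qquad (3)$$

Here, $y_f$ is the phenotypic value of the focal mouse $f$, $X_f$ is a row of the matrix $X$ of covariate values and $b$ a column vector of corresponding estimated coefficients. $a_{D,f}$ is the additive direct genetic effect (DGE), also called breeding value, of $f$. $Z_f$ is a row of the matrix $Z$ that indicates cage mates (importantly $Z_{i,i} = 0$) and $a_S$ the column vector of additive social genetic effects (SGE). $e_{D,f}$ refers to direct environmental effects (DEE) and $e_S$ to social environmental effects (SEE). $W_f$ is a row of the matrix $W$ that indicates cage assignment and $c$ the column vector of cage effects.

Because the genetic setup is not binary (B6 / D2 in the experiment with inbred strains) but individuals are genetically unique and related to varying degrees, we modeled the components in Model (3) as random effects, while accounting for covariance between different effects. The joint distribution of all random effects is defined as:

$$\begin{bmatrix} a_D \\ a_S \\ e_D \\ e_S \\ c \end{bmatrix} \sim MVN\left(0, \begin{bmatrix} \sigma^2_{A_D} H_1 & \sigma_{A_{DS}} H_2 & 0 & 0 & 0 \\ \sigma_{A_{DS}} H_2 & \sigma^2_{A_S} H_3 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2_{E_D} I_1 & \sigma_{E_{DS}} I_2 & 0 \\ 0 & 0 & \sigma_{E_{DS}} I_2 & \sigma^2_{E_S} I_3 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2_C I_4 \end{bmatrix}\right)$$

and the phenotypic covariance is:

$$cov(y) = \sigma^2_{A_D} H_1 + \sigma_{A_{DS}} (H_2 Z^T + Z H_2) + \sigma^2_{A_S} Z H_3 Z^T + \sigma^2_{E_D} I_1 + \sigma_{E_{DS}} (I_2 Z^T + Z I_2) + \sigma^2_{E_S} Z I_3 Z^T + \sigma^2_C W I_4 W^T \qquad (4)$$

$H_1$, $H_2$, and $H_3$ were calculated by dividing the (single-step) H matrix by, respectively, sampleVar($H$), sampleVar($HZ^T + ZH$), and sampleVar($ZHZ^T$), where sampleVar is the sample variance of the corresponding covariance matrix: suppose that we have a vector of random variables with covariance matrix $M$, the sample variance of $M$ is calculated as

$$sampleVar(M) = \frac{Tr(PMP)}{n - 1}$$

Tr denotes the trace, $n$ is the sample size, and $P = I - \frac{1}{n} 1 1^T$ [75, 76].

As a result of this scaling, $\sigma^2_{A_D}$, $\sigma_{A_{DS}}$, and $\sigma^2_{A_S}$ directly reflect the relative contribution of DGE, the covariance between DGE and SGE, and SGE.

Similarly, $I_1$, $I_2$, and $I_3$ were calculated by dividing the identity matrix $I$ by, respectively, sampleVar($I$) = 1, sampleVar($IZ^T + ZI$), and sampleVar($ZIZ^T$), so that $\sigma^2_{E_D}$, $\sigma_{E_{DS}}$, and $\sigma^2_{E_S}$ directly reflect the phenotypic variance explained by DEE, the covariance between DEE and SEE, and SEE.

$I_4$ was calculated by dividing the identity matrix $I$ by sampleVar($WIW^T$).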
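The scaling by the sample variance of a covariance matrix can be illustrated with a short sketch. It assumes the common definition sampleVar(M) = Tr(PMP)/(n − 1) with the centering matrix P = I − 11ᵀ/n, which is consistent with sampleVar(I) = 1 as stated above; since P is idempotent, Tr(PMP) equals Tr(M) minus the sum of all entries of M divided by n. The function names and the toy matrix are hypothetical.

```python
def sample_var(matrix):
    """Sample variance of a covariance matrix: Tr(PMP) / (n - 1).

    Uses the identity Tr(PMP) = Tr(M) - sum(M) / n for P = I - 11^T / n.
    """
    n = len(matrix)
    trace = sum(matrix[i][i] for i in range(n))
    total = sum(sum(row) for row in matrix)
    return (trace - total / n) / (n - 1)

def scale_to_unit_sample_var(matrix):
    """Divide a covariance matrix by its sample variance."""
    s = sample_var(matrix)
    return [[value / s for value in row] for row in matrix]

identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
toy_cov = [[4.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
toy_scaled = scale_to_unit_sample_var(toy_cov)
```

After scaling, a matrix has a sample variance of 1, so the variance parameter that multiplies it can be read directly as the variance contributed by that term.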
See S1 Note for a discussion of how these models differ from published SGE models.
Variance partitioning
For each trait, Eq (4) was used to partition phenotypic variance into its individual components. All models were fitted using LIMIX[26, 27], which we extended to handle models with correlated variance components. The model was fitted using restricted maximum likelihood (REML). We report the contribution of DGE and SGE to phenotypic variation as the proportions of phenotypic variance explained by DGE and SGE, i.e. the ratio between the sample variance explained by DGE (variance component $\sigma^2_{A_D}$) or SGE (variance component $\sigma^2_{A_S}$) and the sample variance of the overall phenotype covariance matrix (defined by Eq 4, with all variance components estimated).
Standard errors on variance components are estimated as the square root of the corresponding diagonal entries of the inverse of the Fisher information, which is the negative expected value of the Hessian[77].
Simulation experiments
Phenotypes were simulated based on the genotypes and the cage relationships of the full set of 2,448 mice. Phenotypes were drawn from model (3). We denote by $\rho_{A_{DS}}$ the correlation between $a_D$ and $a_S$: $\rho_{A_{DS}} = \sigma_{A_{DS}} / (\sigma_{A_D} \sigma_{A_S})$, and analogously by $\rho_{E_{DS}}$ the correlation between $e_D$ and $e_S$.
We present several sets of simulations:
one based on the estimated contributions of all parameters averaged over all organismal phenotypes: $\sigma^2_{A_D}$ = 15%, $\sigma^2_{A_S}$ = 4%, $\rho_{A_{DS}}$ = 0.5, $\sigma^2_{E_D}$ = 52%, $\sigma^2_{E_S}$ = 0, $\rho_{E_{DS}}$ = 0, $\sigma^2_C$ = 22% (S6B Fig and S7B Fig)
one based on the estimated contribution of all parameters averaged over the six phenotypes with highest SGE: $\sigma^2_{A_D}$ = 5%, $\sigma^2_{A_S}$ = 27%, $\rho_{A_{DS}}$ = 0.92, $\sigma^2_{E_D}$ = 5%, $\sigma^2_{E_S}$ = 0, $\rho_{E_{DS}}$ = 0, $\sigma^2_C$ = 51% (S6C Fig and S7C Fig)
$\sigma^2_{A_D}$ taking values {0,10,20,30,40,50} with all other parameters set to their average estimated contribution across all phenotypes (so $\sigma^2_{A_S}$ = 4%, $\rho_{A_{DS}}$ = 0.5, $\sigma^2_{E_D}$ = 52%, $\sigma^2_{E_S}$ = 0, $\rho_{E_{DS}}$ = 0, $\sigma^2_C$ = 22%) (S6A Fig and S7A Fig)
$\sigma^2_{A_S}$ taking values {0,10,20,30,40,50} with all other parameters set to their average estimated contribution across all phenotypes (Fig 2C, S6A Fig and S7A Fig)
$\rho_{A_{DS}}$ taking values {-0.8,-0.4,-0.2,0.2,0.4,0.8} with all other parameters set to their average estimated contribution across all phenotypes (S6A Fig and S7A Fig)
$\sigma^2_{E_D}$ taking values {0,10,20,30,40,50} with all other parameters set to their average estimated contribution across all phenotypes (S6A Fig and S7A Fig)
$\sigma^2_{E_S}$ taking values {0,10,20,30,40,50} with all other parameters set to their average estimated contribution across all phenotypes (S6A Fig and S7A Fig)
$\rho_{E_{DS}}$ taking values {-0.8,-0.4,-0.2,0.2,0.4,0.8} with all other parameters set to their average estimated contribution across all phenotypes (S6A Fig and S7A Fig)
$\sigma^2_C$ taking values {0,10,20,30,40,50} with all other parameters set to their average estimated contribution across all phenotypes (S6A Fig and S7A Fig)
500 simulations were drawn for each combination of parameters.
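To make the structure of such simulations concrete, the sketch below draws phenotypes under a deliberately simplified version of the model. It assumes cages of two unrelated mice (so genetic relatedness is ignored, unlike in the actual simulations, which used the real genotypes and cage assignments), no covariates, and no social environmental effects. All names and parameter values are illustrative.

```python
import math
import random

def simulate_pairs(n_cages, var_ad, var_as, rho_a, var_ed, var_c, seed=0):
    """Simulate phenotypes for cages of two unrelated mice.

    var_ad, var_as: variances of direct and social genetic effects.
    rho_a: correlation between the direct and social genetic effects of a mouse.
    var_ed, var_c: variances of direct environmental effects and cage effects.
    """
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_cages):
        effects = []
        for _ in range(2):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            a_d = math.sqrt(var_ad) * z1
            # a_s is constructed to have correlation rho_a with a_d.
            a_s = math.sqrt(var_as) * (rho_a * z1 + math.sqrt(1 - rho_a ** 2) * z2)
            e_d = rng.gauss(0, math.sqrt(var_ed))
            effects.append((a_d, a_s, e_d))
        cage = rng.gauss(0, math.sqrt(var_c))
        (ad0, as0, ed0), (ad1, as1, ed1) = effects
        # Each mouse gets its own direct effects plus its cage mate's social effect.
        phenotypes.append((ad0 + as1 + ed0 + cage, ad1 + as0 + ed1 + cage))
    return phenotypes

sim = simulate_pairs(5000, var_ad=0.15, var_as=0.04, rho_a=0.5,
                     var_ed=0.52, var_c=0.22, seed=1)
```

In this simplified setting the phenotypic variance is the sum of the four variances, and the covariance between cage mates is the cage variance plus twice the covariance between direct and social genetic effects.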
Statistical significance tests
Statistical significance was assessed using restricted likelihood ratio tests, comparing the full model in Eq (3) to a null model without the SGE term. Because both SGE and the covariance between DGE and SGE are constrained to zero in the null model, we calculated significance by comparing the log likelihood ratio statistics to the χ2 distribution with 2 degrees of freedom, which is conservative. Q values[58] were used for multiple testing correction, and obtained using the R package qvalue [59].
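The comparison of the likelihood ratio statistic to a χ2 distribution can be sketched with the standard library alone, using the closed-form upper tail of the χ2 distribution for 1 and 2 degrees of freedom. The function name and the log-likelihood values are hypothetical, and the model fitting itself is not shown.

```python
import math

def lrt_pvalue(loglik_full, loglik_null, df):
    """P value of a likelihood ratio test against a chi-square with df = 1 or 2."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    if df == 1:
        # P(chi2_1 > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # P(chi2_2 > x) = exp(-x / 2)
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical restricted log-likelihoods of the full and null models.
p_two_df = lrt_pvalue(-1204.3, -1209.9, df=2)
```

Using 2 degrees of freedom when the tested variance is constrained to be non-negative gives P values that are larger than necessary, which is the sense in which the test is conservative.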
Accession numbers
Gene expression data from the experiment with inbred strains are available from ArrayExpress E-MTAB-5276. Phenotype data for the same experiment are provided as S7 Table.
Acknowledgments
We thank the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics for the generation of the RNA Sequencing data.
Funding Statement
The High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics is funded by Wellcome Trust grant reference 090532/Z/09/Z and MRC Hub grant G0900747 91070. AB was supported by fellowships from the EMBL Interdisciplinary Postdoc Programme under Marie Curie COFUND Actions and the Wellcome Trust (105941/Z/14/Z). MKM, JFI, CJB, and RWW are supported in part by NIAAA grants (U01 AA016662, U01 AA013499, U01 AA014425) and the UTHSC Center for Integrative and Translational Genomics. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Moore AJ, Brodie ED III, Wolf JB. Interacting phenotypes and the evolutionary process: I. Direct and indirect genetic effects of social interactions. Evolution; international journal of organic evolution. 1997:1352–62. [DOI] [PubMed] [Google Scholar]
- 2.Bergsma R, Kanis E, Knol EF, Bijma P. The contribution of social effects to heritable variation in finishing traits of domestic pigs (Sus scrofa). Genetics. 2008;178(3):1559–70. Epub 2008/02/05. PubMed Central PMCID: PMC2391867. 10.1534/genetics.107.084236 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Wolf JB, Wade MJ. What are maternal effects (and what are they not)? Philosophical Transactions of the Royal Society B: Biological Sciences. 2009;364(1520):1107–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Wilson A, Coltman D, Pemberton J, Overall A, Byrne K, Kruuk L. Maternal genetic effects set the potential for evolution in a free‐living vertebrate population. Journal of evolutionary biology. 2005;18(2):405–14. 10.1111/j.1420-9101.2004.00824.x [DOI] [PubMed] [Google Scholar]
- 5.de Albuquerque G. Estimates of direct and maternal genetic effects for weights from birth to 600 days of age in Nelore cattle. Journal of Animal breeding and Genetics. 2001;118(2):83–92. [Google Scholar]
- 6.Department of Biological Sciences University of South Carolina Timothy A. Mousseau Associate Professor C, University CWFAPF. Maternal Effects As Adaptations: Oxford University Press, USA; 1998.
- 7. Strandberg E, Jacobsson J, Saetre P. Direct genetic, maternal and litter effects on behaviour in German shepherd dogs in Sweden. Livestock Production Science. 2005;93(1):33–42.
- 8. Champagne FA, Meaney MJ. Stress during gestation alters postpartum maternal care and the development of the offspring in a rodent model. Biological psychiatry. 2006;59(12):1227–35. doi:10.1016/j.biopsych.2005.10.016
- 9. Brinker T, Bijma P, Visscher J, Rodenburg TB, Ellen ED. Plumage condition in laying hens: genetic parameters for direct and indirect effects in two purebred layer lines. Genetics, selection, evolution: GSE. 2014;46:33. Epub 2014/06/03. PubMed Central PMCID: PMC4073196. doi:10.1186/1297-9686-46-33
- 10. Peeters K, Eppink TT, Ellen ED, Visscher J, Bijma P. Indirect genetic effects for survival in domestic chickens (Gallus gallus) are magnified in crossbred genotypes and show a parent-of-origin effect. Genetics. 2012;192(2):705–13. doi:10.1534/genetics.112.142554
- 11. Duijvesteijn N, Knol E, Bijma P. Direct and associative effects for androstenone and genetic correlations with backfat and growth in entire male pigs. Journal of animal science. 2012;90(8):2465–75. doi:10.2527/jas.2011-4625
- 12. Arango J, Misztal I, Tsuruta S, Culbertson M, Herring W. Estimation of variance components including competitive effects of Large White growing gilts. Journal of animal science. 2005;83(6):1241–6.
- 13. Khaw H. Indirect genetic effects for harvest weight in Nile tilapia (Oreochromis niloticus). Age. 2014;6330(166):58.97.
- 14. Petfield D, Chenoweth SF, Rundle HD, Blows MW. Genetic variance in female condition predicts indirect genetic variance in male sexual display traits. Proceedings of the National Academy of Sciences. 2005;102(17):6045–50.
- 15. Alemu SW, Bijma P, Møller SH, Janss L, Berg P. Indirect genetic effects contribute substantially to heritable variation in aggression-related traits in group-housed mink (Neovison vison). Genet Sel Evol. 2014;46:30. doi:10.1186/1297-9686-46-30
- 16. Wilson A, Morrissey M, Adams M, Walling C, Guinness F, Pemberton J, et al. Indirect genetics effects and evolutionary constraint: an analysis of social dominance in red deer, Cervus elaphus. Journal of evolutionary biology. 2011;24(4):772–83. doi:10.1111/j.1420-9101.2010.02212.x
- 17. Sartori C, Mantovani R. Indirect genetic effects and the genetic bases of social dominance: evidence from cattle. Heredity. 2013;110(1):3–9. doi:10.1038/hdy.2012.56
- 18. Peirce JL, Lu L, Gu J, Silver LM, Williams RW. A new set of BXD recombinant inbred lines from advanced intercross populations in mice. BMC genetics. 2004;5(1):7.
- 19. Bredy TW, Wu H, Crego C, Zellhoefer J, Sun YE, Barad M. Histone modifications around individual BDNF gene promoters in prefrontal cortex are associated with extinction of conditioned fear. Learning & memory. 2007;14(4):268–76.
- 20. Caldji C, Diorio J, Anisman H, Meaney MJ. Maternal behavior regulates benzodiazepine/GABAA receptor subunit expression in brain regions associated with fear in BALB/c and C57BL/6 mice. Neuropsychopharmacology. 2004;29(7):1344–52. doi:10.1038/sj.npp.1300436
- 21. Kerns RT, Ravindranathan A, Hassan S, Cage MP, York T, Sikela JM, et al. Ethanol-responsive brain region expression networks: implications for behavioral responses to acute ethanol in DBA/2J versus C57BL/6J mice. The Journal of neuroscience. 2005;25(9):2255–66. doi:10.1523/JNEUROSCI.4372-04.2005
- 22. Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, et al. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nature genetics. 2006;38(8):879–87. doi:10.1038/ng1840
- 23. Huang G-J, Smith AL, Gray DH, Cosgrove C, Singer BH, Edwards A, et al. A genetic and functional relationship between T cells and cellular proliferation in the adult hippocampus. 2010.
- 24. Huang G-J, Shifman S, Valdar W, Johannesson M, Yalcin B, Taylor MS, et al. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome research. 2009;19(6):1133–40. doi:10.1101/gr.088120.108
- 25. Bijma P, Muir WM, Ellen ED, Wolf JB, Van Arendonk JA. Multilevel selection 2: estimating the genetic parameters determining inheritance and response to selection. Genetics. 2007;175(1):289–99. doi:10.1534/genetics.106.062729
- 26. Lippert C, Casale FP, Rakitsch B, Stegle O. LIMIX: genetic analysis of multiple traits. bioRxiv. 2014.
- 27. Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nature methods. 2015.
- 28. Bijma P. Estimating indirect genetic effects: precision of estimates and optimum designs. Genetics. 2010;186(3):1013–28. doi:10.1534/genetics.110.120493
- 29. Gardner M, Steinberg L. Peer influence on risk taking, risk preference, and risky decision making in adolescence and adulthood: an experimental study. Developmental psychology. 2005;41(4):625–35. doi:10.1037/0012-1649.41.4.625
- 30. Groh DR, Jason LA, Keys CB. Social network variables in alcoholics anonymous: a literature review. Clinical psychology review. 2008;28(3):430–50. PubMed Central PMCID: PMC2289871. doi:10.1016/j.cpr.2007.07.014
- 31. Haeffel GJ, Hames JL. Cognitive Vulnerability to Depression Can Be Contagious. Clinical Psychological Science. 2014;2(1):75–85.
- 32. Kumpulainen K. Psychiatric conditions associated with bullying. International journal of adolescent medicine and health. 2008;20(2):121–32.
- 33. Russek LG, Schwartz GE. Feelings of parental caring predict health status in midlife: A 35-year follow-up of the Harvard Mastery of Stress Study. J Behav Med. 1997;20(1):1–13.
- 34. Hutson J, Niall M, Evans D, Fowler R. Effect of salivary glands on wound contraction in mice. 1979.
- 35. Detillion CE, Craft TK, Glasper ER, Prendergast BJ, DeVries AC. Social facilitation of wound healing. Psychoneuroendocrinology. 2004;29(8):1004–11. doi:10.1016/j.psyneuen.2003.10.003
- 36. Ebrecht M, Hextall J, Kirtley L-G, Taylor A, Dyson M, Weinman J. Perceived stress and cortisol levels predict speed of wound healing in healthy male adults. Psychoneuroendocrinology. 2004;29(6):798–809. doi:10.1016/S0306-4530(03)00144-6
- 37. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53. Epub 2009/10/09. PubMed Central PMCID: PMC2831613. doi:10.1038/nature08494
- 38. Eaves L. A model for sibling effects in man. Heredity. 1976;36(2):205–14.
- 39. Silventoinen K, Sammalisto S, Perola M, Boomsma DI, Cornes BK, Davis C, et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin research. 2003;6(5):399–408.
- 40. Hyttinen V, Kaprio J, Kinnunen L, Koskenvuo M, Tuomilehto J. Genetic Liability of Type 1 Diabetes and the Onset Age Among 22,650 Young Finnish Twin Pairs: A Nationwide Follow-Up Study. Diabetes. 2003;52(4):1052–5.
- 41. Wirdefeldt K, Gatz M, Reynolds CA, Prescott CA, Pedersen NL. Heritability of Parkinson disease in Swedish twins: a longitudinal study. Neurobiology of aging. 2011;32(10):1923.e1–e8.
- 42. Conley D, Siegal ML, Domingue BW, Harris KM, McQueen MB, Boardman JD. Testing the key assumption of heritability estimates based on genome-wide genetic relatedness. Journal of human genetics. 2014;59(6):342–5. doi:10.1038/jhg.2014.14
- 43. Muir WM. Incorporation of competitive effects in forest tree or animal breeding programs. Genetics. 2005;170(3):1247–59. doi:10.1534/genetics.104.035956
- 44. Wolf JB, Brodie ED III, Cheverud JM, Moore AJ, Wade MJ. Evolutionary consequences of indirect genetic effects. Trends in ecology & evolution. 1998;13(2):64–9.
- 45. Dawkins R. The Extended Phenotype: The Long Reach of the Gene. Oxford University Press; 1999.
- 46. Kema IP, Schellings AM, Hoppenbrouwers CJ, Rutgers HM, de Vries EG, Muskiet FA. High performance liquid chromatographic profiling of tryptophan and related indoles in body fluids and tissues of carcinoid patients. Clinica chimica acta; international journal of clinical chemistry. 1993;221(1–2):143–58. Epub 1993/11/30.
- 47. Gamache P, Ryan E, Svendsen C, Murayama K, Acworth IN. Simultaneous measurement of monoamines, metabolites and amino acids in brain tissue and microdialysis perfusates. Journal of chromatography. 1993;614(2):213–20. Epub 1993/05/05.
- 48. Box GE, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B (Methodological). 1964:211–52.
- 49. Venables W, Ripley B. Modern applied statistics with S. 4th ed. Springer; 2002.
- 50. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477(7364):289–94. Epub 2011/09/17. PubMed Central PMCID: PMC3276836. doi:10.1038/nature10413
- 51. Munger SC, Raghupathy N, Choi K, Simons AK, Gatti DM, Hinerfeld DA, et al. RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics. 2014;198(1):59–73. Epub 2014/09/23. PubMed Central PMCID: PMC4174954. doi:10.1534/genetics.114.165886
- 52. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology. 2013;14(4):R36. Epub 2013/04/27. PubMed Central PMCID: PMC4053844. doi:10.1186/gb-2013-14-4-r36
- 53.Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–9. Epub 2014/09/28. PubMed Central PMCID: PMC4287950. 10.1093/bioinformatics/btu638 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15(12):550. Epub 2014/12/18. PubMed Central PMCID: PMC4302049. doi:10.1186/s13059-014-0550-8
- 55. Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature protocols. 2012;7(3):500–7. doi:10.1038/nprot.2011.457
- 56. Lassmann T, Hayashizaki Y, Daub CO. SAMStat: monitoring biases in next generation sequencing data. Bioinformatics. 2011;27(1):130–1. Epub 2010/11/20. PubMed Central PMCID: PMC3008642. doi:10.1093/bioinformatics/btq614
- 57. Andrews S. FastQC: A quality control tool for high throughput sequence data. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- 58. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of statistics. 2003:2013–35.
- 59. Storey JD, with contributions from Bass AJ, Dabney A, Robinson D. qvalue: Q-value estimation for false discovery rate control. R package version 2.2.0. 2015.
- 60. Alexa A, Rahnenführer J. topGO: enrichment analysis for gene ontology. R package version 2.0. 2010.
- 61. Alexa A, Rahnenführer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006;22(13):1600–7. doi:10.1093/bioinformatics/btl140
- 62. Demarest K, Koyner J, McCaughran J Jr, Cipp L, Hitzemann R. Further characterization and high-resolution mapping of quantitative trait loci for ethanol-induced locomotor activity. Behavior genetics. 2001;31(1):79–91.
- 63. Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92(9):4656–63. doi:10.3168/jds.2009-2061
- 64. Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genetics Selection Evolution. 2010;42(1):1.
- 65. Aguilar I, Misztal I, Johnson D, Legarra A, Tsuruta S, Lawlor T. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science. 2010;93(2):743–52. doi:10.3168/jds.2009-2730
- 66. Vitezica ZG, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res (Camb). 2011;93(5):357–66.
- 67. Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G. Single-step methods for genomic evaluation in pigs. Animal. 2012;6(10):1565–71. doi:10.1017/S1751731112000742
- 68. VanRaden P. Efficient methods to compute genomic predictions. Journal of dairy science. 2008;91(11):4414–23. doi:10.3168/jds.2007-0980
- 69. Aguilar I, Misztal I, Legarra A, Tsuruta S. Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation. Journal of Animal Breeding and Genetics. 2011;128(6):422–8. doi:10.1111/j.1439-0388.2010.00912.x
- 70. Krohn J. Genes contributing to variation in fear-related behaviour. Oxford, UK; 2013.
- 71. Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008;24(13):1547–8. Epub 2008/05/10. doi:10.1093/bioinformatics/btn224
- 72. Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, Lynch AG, et al. A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic acids research. 2010;38(3):e17. Epub 2009/11/20. PubMed Central PMCID: PMC2817484. doi:10.1093/nar/gkp942
- 73. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research. 1997;25(17):3389–402. Epub 1997/09/01. PubMed Central PMCID: PMC146917.
- 74. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–27. doi:10.1093/biostatistics/kxj037
- 75. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-y, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nature genetics. 2010;42(4):348–54. doi:10.1038/ng.548
- 76. Searle SR. Matrix algebra useful for statistics. Wiley; 1982.
- 77. Murphy KP. Machine Learning: A Probabilistic Perspective. The MIT Press; 2012. 1096 p.
Data Availability Statement
Gene expression data from the experiment with inbred strains are available from ArrayExpress E-MTAB-5276. Phenotype data for the same experiment are provided as S7 Table.