Abstract
Phenotypic variance heterogeneity across genotypes at a single nucleotide polymorphism (SNP) may reflect underlying gene-environment (G×E) or gene-gene interactions. We modeled variance heterogeneity for blood lipids and BMI in up to 44,211 participants and investigated relationships between variance effects (Pv), G×E interaction effects (with smoking and physical activity), and marginal genetic effects (Pm). Correlations between Pv and Pm were stronger for SNPs with established marginal effects (Spearman’s ρ = 0.401 for triglycerides, and ρ = 0.236 for BMI) compared to all SNPs. When Pv and Pm were compared for all pruned SNPs, only BMI was statistically significant (Spearman’s ρ = 0.010). Overall, SNPs with established marginal effects were overrepresented in the nominally significant part of the Pv distribution (Pbinomial <0.05). SNPs from the top 1% of the Pm distribution for BMI had more significant Pv values (PMann–Whitney = 1.46×10−5), and the odds ratio of SNPs with nominally significant (<0.05) Pm and Pv was 1.33 (95% CI: 1.12, 1.57) for BMI. Moreover, BMI SNPs with nominally significant G×E interaction P-values (Pint<0.05) were enriched with nominally significant Pv values (Pbinomial = 8.63×10−9 and 8.52×10−7 for SNP × smoking and SNP × physical activity, respectively). We conclude that some loci with strong marginal effects may be good candidates for G×E, and variance-based prioritization can be used to identify them.
Author summary
Most contemporary studies of gene-environment interactions focus on gene variants that are known to bear strong and reliable associations with the traits of interest. The strategy is intuitive because it helps limit the number of tests performed by focusing on a relatively small number of gene variants. However, this approach is predicated on an implicit assumption that these loci are strong candidates for interactions owing to their established relationships with the index traits. The counter-argument is that, because these loci have highly consistent signals within and between populations that vary by environmental characteristics, the probability that these variants interact with other factors is low. The current analysis tests whether variants with strong marginal effects signals (i.e., those prioritized through conventional genome-wide association analyses) are strong or weak candidates for gene-environment interactions. Here we describe analyses focused on lipids and BMI that test this hypothesis by comparing marginal effect signals with variance effect signals and those derived from explicit genome-wide, gene-environment interaction analyses. We conclude that for BMI, there are features of the top-ranking marginal effect loci that render them stronger candidates for interactions than is true of variants with weaker marginal effects signals. These findings are likely to help optimize the efficiency of future gene-environment interaction analyses by providing evidence-based rankings for strong candidate loci.
Introduction
Gene-environment (G×E) interactions may contribute to complex diseases, but their detection has proven challenging; hence, a variety of approaches have been developed to enhance power. Most G×E analyses focus on loci that are strong biological candidates [1] or those with highly significant marginal effects [2]. The latter approach is attractive because these loci are available in many large cohorts, and can be conveniently followed up with interaction analyses if environmental data are accessible. Moreover, selecting SNPs with strong and reproducible marginal effect signals is a pragmatic data-reduction step that may improve power [3], although this approach risks omitting other promising candidates [4].
In a linear regression setting, the presence of interaction effects drives phenotypic variance heterogeneity by genotype [3,5]. Exploiting variance heterogeneity as a signature of interactions is appealing because, unlike standard approaches for assessing G×E interactions, no explicit information about environmental exposures is needed [6] and multiple exposures can be simultaneously considered.
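The link between interaction and variance heterogeneity can be illustrated with a minimal worked example. The assumptions here are ours and are not taken from the cited work: a single exposure $E$ that is independent of genotype $G$ (coded 0, 1, 2), and a residual $\varepsilon$ with constant variance $\sigma^2$ that is independent of both.

$$
Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E + \varepsilon
$$

$$
\operatorname{Var}(Y \mid G = g) = (\beta_E + \beta_{GE}\, g)^2 \operatorname{Var}(E) + \sigma^2
$$

Under these assumptions the conditional variance is the same for all three genotype groups when $\beta_{GE} = 0$, and differs for at least one group whenever $\beta_{GE} \neq 0$ and $\operatorname{Var}(E) > 0$, which is the signature a variance heterogeneity test looks for without observing $E$.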
Here we explored whether loci identified in large-scale genome-wide association studies (GWAS) of blood lipids and body mass index (BMI) are strong candidates for G×E interactions by comparing genome-wide variance heterogeneity P-value distributions generated using Levene’s test against P-value distributions for marginal effects and explicit G×E interaction effects (for smoking and physical activity).
Results
We assessed between-genotype variance heterogeneity for up to 1,927,671 directly genotyped or imputed SNPs (HapMap II CEU reference panel [7]) that passed quality control (QC). Meta-analyses of Levene’s test summary statistics [8] were performed for BMI (n≤44,211 participants), and blood concentrations of high-density lipoprotein cholesterol (HDL-C) (n≤34,315), low-density lipoprotein cholesterol (LDL-C) (n≤34,180), total cholesterol (TC) (n≤34,318) and triglycerides (TG) (n≤34,110). We then obtained marginal effects results for the same index traits and SNPs from publicly available GWAS summary data from the GIANT (Genetic Investigation of ANthropometric Traits) Consortium [9] and GLGC (Global Lipids Genetics Consortium) [10,11].
We compared the genome-wide marginal effects with between-genotype variance heterogeneity results for each of the five cardiometabolic traits by calculating the association between marginal effects (Pm) and variance heterogeneity (Pv) P-values using the rank-based Spearman correlation (ρ). This was done using a set of 42,710 pruned SNPs produced using the --indep-pairwise command in PLINK (see Materials and Methods) to account for linkage disequilibrium (LD) among variants.
As shown in Table 1 (see also Fig 1A and S1 Table), the Spearman’s ρ for the association between Pm and Pv for all pruned SNPs was of very small magnitude and only statistically significant for BMI. Restricting the analysis to SNPs passing progressively more conservative Pm thresholds (Pm<0.05; Pm<10−4; previously established loci with Pm<5×10−8 in external datasets) generally increased the magnitude of these correlations, which were statistically significant for TG and BMI when focusing on previously established loci. The BMI correlation at the Pm<0.05 threshold, as well as the test of equality with ρ for all SNPs, was statistically significant, suggesting concordance between marginal and variance signals at a nominal level of significance. The odds ratio (OR) for a SNP with Pm<0.05 to have Pv<0.05 as compared to Pv≥0.05 was 1.33 (95% CI: 1.12, 1.57) for BMI, while the 95% CIs of ORs for other traits included 1. On the other hand, the P-value for a non-zero ρ for TG was statistically significant when focusing on the established loci and at Pm<10−4, suggesting concordance between marginal and variance signals at more conservative Pm thresholds.
Table 1. Spearman correlations between marginal effects Pm and heterogeneity of variance from Levene's test Pv.
Trait | Max sample size | All SNPs: # SNPs | All SNPs: Spearman ρ | All SNPs: P-value | Pm<0.05: # SNPs | Pm<0.05: Spearman ρ | Pm<0.05: P-value | Pm<0.05: P-value for equality test with ρ for all SNPs | Pm<10−4: # SNPs | Pm<10−4: Spearman ρ | Pm<10−4: P-value | Pm<10−4: P-value for equality test with ρ for all SNPs | Known loci: # SNPs | Known loci: Spearman ρ | Known loci: P-value | Known loci: P-value for equality test with ρ for all SNPs | OR (95% CI) for SNPs with Pm<0.05 and Pv<0.05 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TC | 34 318 | 41 328 | 0.001 | 0.89 | 2190 | 0.026 | 0.22 | 0.24 | 126 | 0.062 | 0.49 | 0.50 | 69 | 0.188 | 0.12 | 0.13 | 0.97 (0.78–1.19) |
TG | 34 110 | 41 206 | 0.003 | 0.51 | 2 079 | -0.006 | 0.80 | 0.69 | 83 | 0.230 | 3.61×10−2 | 3.87×10−2 | 40 | 0.401 | 1.03×10−2 | 1.00×10−2 | 1.20 (0.99–1.44) |
HDL-C | 34 315 | 41 332 | 0.006 | 0.24 | 2 146 | -0.001 | 0.97 | 0.77 | 95 | -0.074 | 0.48 | 0.45 | 68 | 0.200 | 0.10 | 9.54×10−2 | 1.12 (0.92–1.35) |
LDL-C | 34 180 | 41 207 | 0.005 | 0.29 | 2 164 | 0.013 | 0.55 | 0.73 | 100 | 0.055 | 0.59 | 0.62 | 53 | 0.258 | 6.18×10−2 | 6.58×10−2 | 1.06 (0.87–1.28) |
BMI | 44 211 | 42 710 | 0.010 | 4.56×10−2 | 1 900 | 0.066 | 3.82×10−3 | 1.56×10−2 | 68 | 0.201 | 9.98×10−2 | 0.12 | 71 | 0.236 | 4.76×10−2 | 6.38×10−2 | 1.33 (1.12–1.57) |
BMI: body mass index; HDL-C: high-density lipoprotein cholesterol; LDL-C: low-density lipoprotein cholesterol; SNP: single nucleotide polymorphism; TC: total cholesterol; TG: triglycerides
We further compared Pm with interaction P-values from exposure-specific (smoking and physical activity) genome-wide interaction tests for BMI (Pint); this was only done for BMI owing to the requirement for an adequately powered external dataset (such a dataset was accessible through the GIANT consortium) (Table 2). Marginal effects GWAS were performed by strata of smokers vs. non-smokers and physically active vs. inactive participants (n = 210,316 European-ancestry adults [12]) respectively, and a heterogeneity test [12] was used to generate exposure specific Pint distributions. Spearman ρ for the pruned set of SNPs in the SNP × physical activity and the SNP × smoking analyses were low and not statistically significant (Table 2). We also compared Pint values and Pv values for BMI. Spearman’s ρ for the pruned set of SNPs were low and not statistically significant.
Table 2. Spearman correlations between Pint in SNP × Physical Activity and SNP × Smoking on BMI analyses and marginal effects Pm or heterogeneity of variance from Levene's test Pv.
Characteristic | Max sample size | Max sample size PA/Smoking | All SNPs: # SNPs | All SNPs: Spearman ρ | All SNPs: P-value | Pm<0.05: # SNPs | Pm<0.05: Spearman ρ | Pm<0.05: P-value | Known SNPs: # SNPs | Known SNPs: Spearman ρ | Known SNPs: P-value |
---|---|---|---|---|---|---|---|---|---|---|---|
Marginal effects Pm | |||||||||||
PA × SNP | 322,144 | 180,271 | 41838 | 0.001 | 0.761 | 2142 | 0.029 | 0.176 | 71 | -0.003 | 0.978 |
Smoking × SNP | 322,144 | 210,306 | 41371 | -0.004 | 0.429 | 2351 | 0.010 | 0.619 | 71 | 0.205 | 0.0863 |
Levene's test for homogeneity of variance Pv | |||||||||||
PA × SNP | 44,211 | 180,271 | 41838 | 0.005 | 0.35 | 2142 | -0.003 | 0.884 | 71 | 0.052 | 0.669 |
Smoking × SNP | 44,211 | 210,306 | 41371 | 0.004 | 0.401 | 2351 | -0.023 | 0.265 | 71 | 0.110 | 0.360 |
PA: physical activity; BMI: body mass index; SNP: single nucleotide polymorphism; Pv: Variance (Levene’s) test P-value; Pm: Marginal (linear regression) test P-value
We next tested if the number of previously established marginal effect SNPs (Pm<5×10−8) that were also nominally significant (Pv<0.05) for variance heterogeneity was greater than expected by chance (Tables 3 and 4, Fig 1). For 4 out of the 5 index traits, we observed enrichment at the lower end of the Pv distribution (Pv<0.05) for the established GWAS-derived lead SNPs. Thus, the nominally significant regions of the Pv distributions were generally enriched for GWAS-derived loci.
Table 3. Enrichment of variance and gene × environment interaction nominally significant results with GWAS-derived loci.
Trait | Analysis | Total SNPs/ Observed SNPs with P<0.05 (Expected) | Pbinomial |
---|---|---|---|
BMI | Levene's | 71/10 (3.6) | 3×10−3 |
BMI | SNP × PA | 71/4 (3.6) | 0.48 |
BMI | SNP × Smoking | 71/5 (3.6) | 0.28 |
BMI | Average for SNP × PA & SNP × Smoking | 71/2 (3.6) | 0.88 |
TG | Levene's | 40/9 (2) | 1×10−4 |
LDL-C | Levene's | 53/8 (2.7) | 5×10−3 |
HDL-C | Levene's | 68/6 (3.4) | 0.12 |
TC | Levene's | 69/9 (3.5) | 7×10−3 |
PA: physical activity; BMI: body mass index; GWAS: genome-wide association study; HDL-C: high-density lipoprotein cholesterol; LDL-C: low-density lipoprotein cholesterol; SNP: single nucleotide polymorphism; TC: total cholesterol; TG: triglycerides
Table 4. Enrichment of SNPs with nominally significant Pint for test of SNP × Smoking and SNP × Physical Activity interaction for BMI (Pint<0.05) by SNPs with nominally significant Levene's test (Pv<0.05).
Analysis | Total SNPs with Pint<0.05/ Observed SNPs with Pint<0.05 & Pv<0.05 (Expected) | Pbinomial |
---|---|---|
SNP × PA | 2142/159 (107.1) | 8.52×10−7 |
SNP × Smoking | 2351/182 (117.6) | 8.63×10−9 |
BMI: body mass index; PA: physical activity; SNP: single nucleotide polymorphism; Pv = Variance (Levene’s) test P-value; Pint = G×E interaction (heterogeneity) test P-value; Pbinomial = significance of observing Pv<0.05 more than expected by chance
We also performed enrichment analyses to test if previously established marginal effects SNPs (Pm<5×10−8) are enriched for nominally significant (Pint<0.05) interactions in the SNP × physical activity or SNP × Smoking analyses, but no enrichment was observed (Table 3; Fig 1B). By contrast, for the physical activity and smoking interaction tests (using all pruned SNPs), the lower end of the Pint distribution (Pint<0.05) was enriched with SNPs that were nominally significant in the Levene’s test analysis (Pv<0.05) (Table 4). This enrichment translated into an OR of 1.08 (95% CI: 1.01, 1.14) for a SNP to have Pint<0.05 given Pv<0.05 vs. Pv≥0.05 for SNP × physical activity interaction. The corresponding OR for the SNP × smoking interaction test was not significant (OR = 1.02; 95% CI: 0.96, 1.08).
Finally, in the pruned SNP-set we used the Mann–Whitney U test to probe for systematic differences in Pv and Pm ranks. P-values were ordered from least significant to most significant, and the lowest 100th centile (i.e. the most significantly associated SNPs) was compared to the remaining centiles (99th to 1st) for each of the five traits. For BMI, SNPs in the lowest 100th centile of the Pm distribution had markedly higher Pv ranks (i.e. more significant Pv) than the remaining SNPs (PMann–Whitney = 1.46×10−5; Table 5). Even when excluding previously established lead SNPs (Pm<5×10−8) for BMI (or SNPs within ±500 kb of them), SNPs from the lowest 100th centile of the Pm rank-ordered distribution had higher Pv ranks than the remaining SNPs (PMann–Whitney = 4.30×10−4; Table 5). Conversely, no difference in Pv ranks was observed for SNPs from the lowest 100th centile of the Pm rank-ordered distribution for the four blood lipid traits; this may reflect trait-specific G×E effects or differences in statistical power by trait. No differences in Pv ranks between SNPs from the 99th centile of the Pm rank-ordered distribution compared to SNPs from the 98th to 1st centiles of the distribution were observed for any trait (PMann–Whitney>0.05; Table 5). Similarly, no difference in Pm ranks was observed for SNPs from the lowest 100th centile of the Pv rank-ordered distribution for any trait (PMann–Whitney>0.05; Table 6).
Table 5. Comparison of Levene's test Pv ranks from different centiles of the Pm rank-ordered distribution for the index traits.
Trait | Known SNPs | Min Pm from 100th centile | Max Pm from 100th centile | Median Pv rank for 100th centile | Median Pv rank for 99th-1st centiles | Mann-Whitney P-value | Min Pm from 99th centile | Max Pm from 99th centile | Median Pv rank for 99th centile | Median Pv rank for 98th-1st centiles | Mann-Whitney P-value |
---|---|---|---|---|---|---|---|---|---|---|---|
BMI | Included | 4.78×10−91 | 5.82×10−3 | 58.82 | 49.93 | 1.46×10−5 | 5.86×10−3 | 1.85×10−2 | 52.79 | 49.91 | 0.42 |
BMI | Excluded | 3.59×10−6 | 8.56×10−3 | 55.78 | 49.95 | 4.30×10−4 | 8.73×10−3 | 2.18×10−2 | 52.60 | 49.93 | 0.36 |
HDL-C | Included | 3.56×10−573 | 6.48×10−3 | 51.49 | 49.99 | 0.47 | 6.48×10−3 | 1.67×10−2 | 50.49 | 49.98 | 0.92 |
HDL-C | Excluded | 6.68×10−11 | 9.94×10−3 | 51.45 | 49.99 | 0.77 | 9.95×10−3 | 2.09×10−2 | 51.06 | 49.98 | 0.47 |
LDL-C | Included | 3.80×10−143 | 7.14×10−3 | 53.11 | 49.98 | 0.52 | 7.18×10−3 | 1.75×10−2 | 48.44 | 49.99 | 0.85 |
LDL-C | Excluded | 2.03×10−11 | 9.88×10−3 | 53.42 | 49.97 | 0.38 | 9.90×10−3 | 2.09×10−2 | 48.37 | 49.99 | 1.00 |
TG | Included | 2.23×10−113 | 8.18×10−3 | 53.73 | 49.98 | 0.32 | 8.19×10−3 | 1.92×10−2 | 52.42 | 49.95 | 0.63 |
TG | Excluded | 1.00×10−10 | 1.06×10−2 | 51.27 | 49.99 | 0.64 | 1.06×10−2 | 2.21×10−2 | 53.23 | 49.95 | 0.41 |
TC | Included | 1.41×10−107 | 5.85×10−3 | 52.03 | 49.98 | 0.32 | 5.87×10−3 | 1.49×10−2 | 51.21 | 49.97 | 0.62 |
TC | Excluded | 3.11×10−11 | 9.14×10−3 | 49.43 | 50.01 | 0.66 | 9.15×10−3 | 1.91×10−2 | 50.12 | 50.01 | 0.93 |
BMI: body mass index; HDL-C: high-density lipoprotein cholesterol; LDL-C: low-density lipoprotein cholesterol; SNP: single nucleotide polymorphism; TC: total cholesterol; TG: triglycerides; Pv: Variance (Levene’s) test P-value; Pm: marginal (linear regression) test P-value
Table 6. Comparison of marginal effects Pm ranks from different centiles of the Levene's test Pv rank-ordered distribution for the index traits.
Trait | Known SNPs | Min Pv from 100th centile | Max Pv from 100th centile | Median Pm rank for 100th centile | Median Pm rank for 99th-1st centiles | Mann-Whitney P-value | Min Pv from 99th centile | Max Pv from 99th centile | Median Pm rank for 99th centile | Median Pm rank for 98th-1st centiles | Mann-Whitney P-value |
---|---|---|---|---|---|---|---|---|---|---|---|
BMI | Included | 2.95×10−7 | 6.31×10−3 | 51.28 | 49.53 | 0.51 | 6.33×10−3 | 1.30×10−2 | 53.57 | 49.53 | 0.13 |
BMI | Excluded | 2.95×10−7 | 6.38×10−3 | 51.40 | 49.48 | 0.42 | 6.38×10−3 | 1.30×10−2 | 53.50 | 49.44 | 0.17 |
HDL-C | Included | 2.04×10−5 | 9.44×10−3 | 46.28 | 50.04 | 0.52 | 9.45×10−3 | 1.90×10−2 | 53.06 | 50.01 | 0.44 |
HDL-C | Excluded | 2.04×10−5 | 9.45×10−3 | 46.42 | 50.05 | 0.37 | 9.47×10−3 | 1.89×10−2 | 53.37 | 50.01 | 0.31 |
LDL-C | Included | 1.06×10−8 | 9.12×10−3 | 52.96 | 49.98 | 0.19 | 9.15×10−3 | 1.88×10−2 | 50.78 | 49.96 | 0.99 |
LDL-C | Excluded | 1.44×10−5 | 9.37×10−3 | 50.39 | 49.99 | 0.64 | 9.37×10−3 | 1.92×10−2 | 51.85 | 49.97 | 0.68 |
TG | Included | 2.45×10−6 | 8.39×10−3 | 48.93 | 50.01 | 0.60 | 8.39×10−3 | 1.78×10−2 | 51.75 | 50.01 | 0.53 |
TG | Excluded | 2.45×10−6 | 8.37×10−3 | 49.23 | 50.01 | 0.66 | 8.39×10−3 | 1.78×10−2 | 51.92 | 50.00 | 0.51 |
TC | Included | 3.28×10−5 | 1.08×10−2 | 51.61 | 49.98 | 0.16 | 1.08×10−2 | 2.09×10−2 | 50.29 | 49.98 | 0.92 |
TC | Excluded | 3.28×10−5 | 1.10×10−2 | 51.23 | 50.00 | 0.33 | 1.10×10−2 | 2.10×10−2 | 49.92 | 50.00 | 0.93 |
BMI: body mass index; HDL-C: high-density lipoprotein cholesterol; LDL-C: low-density lipoprotein cholesterol; SNP: single nucleotide polymorphism; TC: total cholesterol; TG: triglycerides; Pv: Variance (Levene’s) test P-value; Pm: marginal (linear regression) test P-value
To assess whether a trait with a non-normal distribution (e.g. BMI) or strong marginal associations could cause spurious association between the marginal and variance signals, we recapitulated the analysis pipeline (correlation analysis, enrichment analysis, comparisons of rank Pm and Pv values) in simulations described in the Materials and Methods. Careful assessment of results emanating from these simulations did not reveal evidence of type I error inflation caused by the non-normal distribution of an outcome trait nor strong marginal effects. For instance, we extracted correlation P-values of Pm, Pv and Pint generated from 5,000 simulations. QQ-plots of the 5,000 correlation P-values, 2,500 binomial P-values, and 2,500 Mann-Whitney U test P-values revealed no inflation (S1A–S1C Fig, S2A and S2B Fig and S3A and S3B Fig, respectively). Repeating these analyses on subsets of SNPs with low Pm values did not materially change the results.
Discussion
Collectively, our analyses highlight a few variants with genome-wide significant marginal effects that may be strong candidates for G×E interactions owing to their strong concurrent variance heterogeneity P-values. For BMI, such SNPs are also overrepresented in the nominally significant part of the Pv distribution. FTO is an excellent example, as it conveys strong marginal effects [13], exhibits high between-genotype heterogeneity here (Tables 2 and 3 and Fig 1B) and elsewhere [5], and reportedly interacts with physical activity, diet and other lifestyle exposures [2,14,15] and is associated with macronutrient intake [16,17].
Although variance heterogeneity tests are potentially powerful screening tools for G×E interactions, like most interaction tests, they may be bias prone. For example, apparent differences in phenotypic variances across genotypes may be caused by scaling, particularly when the phenotypic means also differ substantially [18], such that the per-genotype means and variances for index traits are correlated. However, where necessary we transformed variables, and the correlations between Pm and Pv were generally weak, excluding this as a likely source of bias. Using simulated data, we investigated whether the non-normal distribution of a trait can cause a spurious association between marginal and variance signals, which we show is highly improbable. Through further simulations, we assessed whether SNPs with large marginal effects inflate Pv, but observed no inflation, indicating that large genetic marginal effects do not artificially inflate variance heterogeneity to a meaningful extent, and SNPs with low Pm and low Pv-values are thus likely to be strong candidates for G×E interactions, at least in the case of BMI. It might also be that combining populations from ancestral (e.g., hunter-gatherers) and contemporary environments increases variance heterogeneity owing to diversity in population substructure rather than G×E interactions per se [19]. However, this seems unlikely here, as the cohorts examined are from Westernized European-ancestry populations.
There are several additional explanations for between-genotype variance heterogeneity, such as variance misclassification that can occur when the index variant is located within a haplotype containing rare functional variants that convey strong marginal effects [5]. Hence, although variance heterogeneity tests represent a useful data-reduction step, before conclusions are drawn about the presence or absence of G×E interactions, index variants should be validated by testing their interactions with explicit environmental exposures, as we did here with smoking and physical activity. However, genome-wide G×E interactions datasets are not comprised of functionally validated G×E interactions, as no such resource is currently available for human complex traits. This limitation inhibits the extent to which causal effects can be attributed to the top-ranking loci and their interactions with smoking or physical activity.
We conclude that the common approach of prioritizing loci with established genome-wide significant association signals without further discrimination for G×E interaction analyses might be useful, but the efficiency of such analyses could be substantially improved by focusing on variants with low P-values for both variance heterogeneity and marginal effects. We provide these rankings here to facilitate this approach.
Materials and methods
A detailed project flow-chart is shown in Fig 2.
Study sample
We performed a genome-wide search for SNPs whose associations with the following traits are characterized by high between-genotype variance heterogeneity: BMI, TC, TG, HDL-C and LDL-C. The variance heterogeneity analyses were performed using Levene’s test [20] in up to 44,211 participants of European descent from seven population-based cohorts. Descriptions of these cohorts are presented in S2 Table. To minimize bias that might result from unequal sample sizes between SNPs when calculating the correlations between the P-values from the marginal (Pm) and variance heterogeneity (Pv) meta-analyses, we restricted the sample size for analyses to 26,000 participants for BMI and to 24,000 participants for lipid traits (S4 Fig).
Genotyping and imputation
A detailed summary of sample sizes, genotyping platforms, genotype calling algorithms, sample and SNP quality control filters, and analysis software for all participating cohorts is provided in S2 and S3 Tables. For each individual, SNPs were imputed using the CEU reference panel of HapMap II [7] (S2 Table). We excluded SNPs with low imputation quality (below 0.3 for MACH, 0.4 for IMPUTE, and 0.8 for PLINK imputed data), Hardy-Weinberg equilibrium P<10−6, directly genotyped SNP call rate <95%, and minor allele frequency (MAF) <1%.
Selection of SNPs identified through GWAS
We identified SNPs that have been robustly associated (P<5×10−8) with the five cardiometabolic traits in European ancestry populations: 77 SNPs associated with BMI discovered by GIANT [9]; and 58 SNPs associated with LDL-C, 71 SNPs associated with HDL-C, 74 SNPs associated with TC, and 40 SNPs associated with TG [10,11] discovered by GLGC.
Variance heterogeneity analyses
We used Levene’s test [20] to identify SNPs that show heterogeneity of phenotypic variances ($\sigma_i^2$) across the three genotype groups at each SNP locus ($i$ = 0, 1, or 2). We first log10-transformed all five traits, then applied a z-score transformation by subtracting the sample mean and dividing by the sample standard deviation (SD), and further Winsorized the z-score values at 4 SD. The transformed phenotype Y was then used to calculate Z, defined as the absolute deviation of each participant’s phenotype from the sample mean of his or her respective genotype group at a given SNP locus. For each trait, participating cohorts provided the necessary summary statistics for each genotype at each marker [8]. Specifically, the per genotype group counts ($n_{0s}, n_{1s}, n_{2s}$), per genotype group means of Z ($\bar{Z}_{0s}, \bar{Z}_{1s}, \bar{Z}_{2s}$), and per genotype group variances of Z ($\sigma_{0s}^2, \sigma_{1s}^2, \sigma_{2s}^2$) were centrally collected and meta-analyzed. A minimum of 30 participants per genotype group per cohort was required.
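The per-cohort preparation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the cohorts' analysis code; the function names are hypothetical, and the Winsorizing step is assumed to clip z-scores at ±4.

```python
import math
from statistics import mean, stdev, variance

def transform_phenotype(values, winsor_sd=4.0):
    """log10-transform, z-score, then clip (Winsorize) at +/- winsor_sd."""
    logged = [math.log10(v) for v in values]
    mu, sd = mean(logged), stdev(logged)
    z_scores = [(x - mu) / sd for x in logged]
    return [max(-winsor_sd, min(winsor_sd, x)) for x in z_scores]

def levene_deviations(y, genotypes):
    """Z: absolute deviation of each Y from the mean of its genotype group."""
    groups = {}
    for yi, g in zip(y, genotypes):
        groups.setdefault(g, []).append(yi)
    group_mean = {g: mean(vals) for g, vals in groups.items()}
    return [abs(yi - group_mean[g]) for yi, g in zip(y, genotypes)]

def per_genotype_summary(z, genotypes):
    """Count, mean and sample variance of Z for each genotype group."""
    summary = {}
    for g in sorted(set(genotypes)):
        vals = [zi for zi, gi in zip(z, genotypes) if gi == g]
        summary[g] = {"n": len(vals), "mean": mean(vals), "var": variance(vals)}
    return summary
```

The dictionary returned by `per_genotype_summary` holds the three quantities per genotype group that each cohort would share for meta-analysis; individual-level data never leave the cohort.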
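Meta-analyses were performed using the following formula, derived previously [8]:

$$
L = \frac{N-3}{2} \cdot \frac{\sum_{i=0}^{2} n_i \left( \bar{Z}_i - \sum_{j=0}^{2} \gamma_j \bar{Z}_j \right)^2}{\sum_{i=0}^{2} \sum_{s} \left[ (n_{is}-1)\,\sigma_{is}^2 + n_{is} \left( \bar{Z}_{is} - \bar{Z}_i \right)^2 \right]}, \qquad \bar{Z}_i = \sum_{s} \omega_{is} \bar{Z}_{is}
$$

where N is the combined sample size, and $\bar{Z}_{is}$ and $\sigma_{is}^2$ are the sample mean and variance of Z in the ith genotype group of the sth study, respectively. When combining summary-level data to calculate the Levene’s test statistic L, the following natural weights $\omega_{is}$ and $\gamma_i$ were calculated: $\omega_{is} = n_{is}/n_i$ and $\gamma_i = n_i/N$, where $n_i$ is the sum of genotype counts in the ith genotype group across all participating cohorts. These weights are determined by the frequency of the marker amongst the cohorts, such that each set of weights sums to 1, i.e. $\sum_s \omega_{is} = 1$ and $\sum_i \gamma_i = 1$. The meta-analysis Levene’s test P-value is obtained by comparing L to an F-distribution with df1 = 2 and df2 = N−3.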
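As a hedged illustration of how such a statistic can be assembled from summary data alone, the sketch below computes the standard Levene (one-way ANOVA on Z) statistic from per-study counts, means and sample variances. It is our own arrangement of the textbook statistic, so the published formula in [8] may be written differently; the function and variable names are hypothetical.

```python
from statistics import mean, variance

def summarise(z_by_genotype):
    """Per-genotype (n, mean, sample variance) of Z for one study."""
    return {i: (len(v), mean(v), variance(v)) for i, v in z_by_genotype.items()}

def levene_meta(summaries):
    """Levene statistic from a list of per-study summaries.

    summaries[s][i] = (n_is, mean_is, var_is) for study s, genotype i.
    """
    genotypes = sorted({i for study in summaries for i in study})
    k = len(genotypes)
    # n_i: genotype counts summed over studies
    n_i = {i: sum(st[i][0] for st in summaries if i in st) for i in genotypes}
    total_n = sum(n_i.values())
    # Pooled group means, i.e. weights n_is / n_i applied to the study means
    zbar_i = {
        i: sum(st[i][0] * st[i][1] for st in summaries if i in st) / n_i[i]
        for i in genotypes
    }
    # Grand mean, i.e. weights n_i / N applied to the pooled group means
    zbar = sum(n_i[i] * zbar_i[i] for i in genotypes) / total_n
    between = sum(n_i[i] * (zbar_i[i] - zbar) ** 2 for i in genotypes)
    # Within-group sum of squares rebuilt from study variances and means
    within = sum(
        (n - 1) * var + n * (m - zbar_i[i]) ** 2
        for st in summaries
        for i, (n, m, var) in st.items()
    )
    return (total_n - k) / (k - 1) * between / within
```

With three genotype groups the leading factor is (N−3)/2, matching the degrees of freedom stated above. Converting the statistic to a P-value needs an F-distribution survival function, which is not in the standard library and is therefore left out of this sketch.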
Comparison between marginal effects and variance heterogeneity P-values
Marginal effects P-values for BMI and the relevant lipid traits were obtained from publicly available GWAS summary data from the GIANT [9] and GLGC [10,11] consortia, respectively (all cohorts included here in the Levene’s meta-analysis were also included in the GIANT and GLGC datasets).
To illustrate our findings, we rank-ordered the P-values (from lowest to highest) from both marginal effects and variance effects analyses for all 1,927,671 SNPs so that the lowest P-value for a given trait was assigned a rank equal to the lowest 100th centile. These rank-scaled distributions for Pm for all five traits are presented in Fig 1.
We calculated Spearman’s correlations for each of the five cardiometabolic traits between Pm and Pv. This was done using a pruned set of SNPs. Pruning was performed in the TwinGene cohort using the --indep-pairwise 50 5 0.1 command in PLINK [21] by calculating LD (r2) for each pair of SNPs within a window of 50 SNPs, removing one of a pair of SNPs if r2>0.1; we proceeded by shifting the window 5 SNPs forwards and repeating the procedure. Spearman’s correlations were computed for categories of SNPs: i) all pruned SNPs, ii) the subset of SNPs that was nominally significant (Pm<0.05) in the marginal effects analysis, iii) the subset of SNPs with Pm<10−4 in the marginal effects analysis, and iv) SNPs that were previously established in conventional marginal effects GWAS meta-analyses (Pm<5×10−8). We also compared Spearman’s correlations between these categories of SNPs using the test for equality of two correlations [22].
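The correlation step can be sketched in a few lines. This is an illustrative standard-library implementation rather than the Stata routine used in the analysis; the function names are hypothetical, and tied P-values are assumed to share their average rank.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        tied_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = tied_rank
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def rho_below_threshold(pm, pv, threshold):
    """Number of SNPs with Pm below threshold and their Pm-Pv Spearman rho."""
    keep = [i for i, p in enumerate(pm) if p < threshold]
    return len(keep), spearman_rho([pm[i] for i in keep], [pv[i] for i in keep])
```

Calling `rho_below_threshold(pm, pv, 0.05)` and `rho_below_threshold(pm, pv, 1e-4)` on the pruned SNP set would correspond to categories ii) and iii) above; the test for equality of two correlations [22] is not reproduced here.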
Next, we performed enrichment analyses to test if there was a higher number of established SNPs in the nominally significant variance P-value (Pv<0.05) distribution than expected by chance under the binomial distribution.
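A sketch of this enrichment calculation is given below, assuming a one-sided (upper-tail) exact binomial test with a null probability of 0.05 per SNP. That reading is our assumption, although it is consistent with the expected counts reported in Table 3 (for example 71 × 0.05 ≈ 3.6). The function names are hypothetical, and probabilities are summed in log space so that large SNP counts do not overflow.

```python
from math import exp, lgamma, log, log1p

def log_binom_pmf(j, n, p):
    """log P(X = j) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
    return log_choose + j * log(p) + (n - j) * log1p(-p)

def binomial_upper_tail(k, n, p=0.05):
    """P(X >= k): chance of observing k or more nominally significant SNPs."""
    return sum(exp(log_binom_pmf(j, n, p)) for j in range(k, n + 1))

def expected_count(n, p=0.05):
    """Number of SNPs expected below the threshold under the null."""
    return n * p
```

For instance, `binomial_upper_tail(10, 71)` evaluates the BMI Levene's row of Table 3, where 10 of 71 established SNPs had Pv<0.05.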
We also tested if there is a difference in Pv ranks for SNPs from the lowest 100th centile of the Pm rank-ordered distribution for all five traits and the rest of SNPs in the pruned set of SNPs using the Mann–Whitney U test, including and excluding established SNPs (or SNPs that were +/-500kb from the reported lead SNP). This analysis was repeated for SNPs from the 99th centile vs SNPs from 1st to 98th centiles of the Pm rank-ordered distribution. The same Mann–Whitney U tests were used to study differences in Pm ranks for SNPs from the lowest 100th and 99th centiles of the Pv rank-ordered distribution and the rest of SNPs in the pruned set of SNPs.
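The centile comparison can be sketched as follows. This is an illustration under stated assumptions, not the Stata implementation: the P-value uses the normal approximation with a tie correction and no continuity correction, the top centile is taken as the 1% of SNPs with the smallest selection P-value, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def top_centile_split(p_select, p_compare, fraction=0.01):
    """Split p_compare by whether the SNP is in the top fraction of p_select."""
    order = sorted(range(len(p_select)), key=lambda i: p_select[i])
    n_top = max(1, round(len(order) * fraction))
    top = [p_compare[i] for i in order[:n_top]]
    rest = [p_compare[i] for i in order[n_top:]]
    return top, rest

def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a and a two-sided normal-approximation P-value."""
    n_a, n_b = len(sample_a), len(sample_b)
    combined = list(sample_a) + list(sample_b)
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    n = n_a + n_b
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n_a * n_b / 2) / math.sqrt(var_u)
    return u_a, math.erfc(abs(z) / math.sqrt(2))
```

Because the test is rank-based, passing raw Pv values or their ranks to `mann_whitney_u` gives the same result; `top_centile_split(pm, pv)` followed by `mann_whitney_u(top, rest)` mirrors the Pm-selected comparison, and swapping the arguments mirrors the Pv-selected one.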
All analyses were performed using Stata 12 (StataCorp LP, TX, USA), unless specified otherwise.
SNP × Physical activity and SNP × Smoking interaction analyses for the outcome of BMI
We used now-published data from 210,316 European-ancestry adults (from the GIANT consortium) pertaining to marginal effects meta-analyses for BMI that had been performed separately by strata of smoking (45,968 smokers vs. 164,355 non-smokers) [23]. The genetic marginal effect estimates, calculated separately within each of the two strata, were compared using a heterogeneity test [12] to infer the presence or absence of SNP × smoking interaction effects. The same analyses were performed using physical activity as a binary stratifying variable in up to 180,287 European-ancestry adults (42,065 physically active vs. 138,222 physically inactive) [24]. We calculated Spearman correlations between the P-values derived from the marginal effects meta-analysis and the Pint from the interaction effects meta-analysis (i.e., the between-strata heterogeneity test for SNP × smoking and SNP × physical activity interactions from the GIANT consortium); these tests were undertaken for all SNPs and those SNPs that were nominally significant (Pm<0.05) in the marginal effects analysis. We then performed enrichment analyses to test if the numbers of nominally significant (Pint<0.05) GWAS-derived SNPs from both SNP × physical activity and SNP × smoking analyses were greater than expected by chance under the binomial distribution. We further calculated the OR of having Pint<0.05 given Pv<0.05 versus Pv≥0.05 for both SNP × physical activity and SNP × smoking interaction analyses in a pruned set of TwinGene SNPs produced using the --indep-pairwise 50 5 0.8 command in PLINK [21].
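A between-strata heterogeneity test of this kind can be sketched as a difference of two effect estimates scaled by its standard error. The exact form used in [12] is not given in the text, so the version below is an assumption: it treats the two strata as independent and ignores any correlation between the stratum estimates. The function name is hypothetical.

```python
import math

def stratum_heterogeneity(beta_1, se_1, beta_2, se_2):
    """z statistic and two-sided P-value for a difference between strata.

    Assumes the two stratum-specific estimates are independent.
    """
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p_int = math.erfc(abs(z) / math.sqrt(2))
    return z, p_int
```

Applied SNP by SNP to the stratum-specific effect estimates (for example smokers versus non-smokers), the returned P-value plays the role of Pint.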
Thereafter, we calculated the average rank for each SNP’s ranking on the Pint rank-ordered distributions from the SNP × smoking and SNP × physical activity interaction analyses and performed enrichment analysis using these average ranks with >95th centile instead of Pint<0.05 as the cut-off.
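For the odds ratio of having Pint<0.05 given Pv<0.05 versus Pv≥0.05, a minimal sketch from a 2×2 table of SNP counts is shown below. The confidence interval uses the log-odds (Woolf) standard error, which is our assumption because the text does not state how the interval was obtained; the function name and cell labels are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.959964):
    """Odds ratio and approximate 95% CI from a 2x2 table of SNP counts.

    a: Pv<0.05  and Pint<0.05     b: Pv<0.05  and Pint>=0.05
    c: Pv>=0.05 and Pint<0.05     d: Pv>=0.05 and Pint>=0.05
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log)
    return odds_ratio, lower, upper
```

An interval that excludes 1 corresponds to nominally significant Pv values being over- or under-represented among SNPs with nominally significant Pint.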
Simulations
We simulated genetic data for 44,000 individuals from a pruned set of 50,335 SNPs with allele frequencies, effect estimates and Pm values drawn from the GIANT consortium. We generated an outcome trait by summing the products of the simulated allele counts and effect estimates over all SNPs for each individual, and subsequently added a randomly generated non-normal error term such that the trait resembles the observed distribution of the transformed BMI trait used in the main (real data) analyses. We also simulated a fixed binary interacting factor with 30% prevalence. Using this simulated dataset, we calculated Pm, Pv and Pint values for each SNP and undertook i) pairwise Spearman correlation analyses between Pm, Pv and Pint values (5,000 simulations), ii) enrichment analysis using binomial tests (2,500 simulations) and iii) Mann-Whitney U tests to determine systematic differences in Pv and Pm ranks (2,500 simulations). Following the same pipeline, we created additional simulated datasets narrowing down SNPs to i) those with Pm values from the lowest percentile (n = 504; highest Pm = 5×10−3) and to ii) genome-wide significant SNPs (n = 71; Pm<5×10−8), and tested the pairwise Spearman correlation for Pm, Pv and Pint values (1,000 simulations for both sets). Simulations were run using the statistical software R (v. 3.3.2) [25].
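The data-generating step can be sketched as below. This is a toy illustration in Python rather than the R code used for the simulations, and several choices are our assumptions: genotypes are drawn under Hardy-Weinberg equilibrium, SNPs are independent, and the non-normal error is a mean-centred exponential variable standing in for whatever distribution was fitted to the transformed BMI trait. The function name is hypothetical.

```python
import random

def simulate_dataset(n_individuals, allele_freqs, effects,
                     exposure_prevalence=0.30, seed=1):
    """Simulate genotypes, a binary exposure and a skewed additive trait."""
    rng = random.Random(seed)
    # Allele counts (0, 1 or 2): two independent draws per SNP
    genotypes = [
        [(rng.random() < f) + (rng.random() < f) for f in allele_freqs]
        for _ in range(n_individuals)
    ]
    # Binary interacting factor; it has no effect on the trait (null model)
    exposure = [
        1 if rng.random() < exposure_prevalence else 0
        for _ in range(n_individuals)
    ]
    trait = []
    for row in genotypes:
        genetic = sum(g * b for g, b in zip(row, effects))
        noise = rng.expovariate(1.0) - 1.0  # skewed, mean-zero error term
        trait.append(genetic + noise)
    return genotypes, exposure, trait
```

Because the simulated exposure does not enter the trait, any association between Pm, Pv and Pint computed on such data reflects the behaviour of the tests under the null, which is what the inflation checks are designed to examine.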
Supporting information
Data Availability
Pm values were obtained from the Genetic Investigation of ANthropometric Traits (GIANT) and the Global Lipids Genetics Consortium (GLGC). Association statistics from GIANT and GLGC are available here: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium and http://csg.sph.umich.edu/abecasis/public/lipids2013/. Pv values were calculated as explained in the Methods. Pv values are made publicly available on Dryad at doi:10.5061/dryad.q1m7t. Pint values are drawn from GIANT and are contained in the following articles: "Genome-wide physical activity interactions in adiposity--A meta-analysis of 200,452 adults" (10.1371/journal.pgen.1006528) and "Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits" (10.1038/ncomms14977).
Funding Statement
This research was undertaken as part of a research program supported by the European Commission (CoG-2015_681742_NASCENT), Swedish Research Council (Distinguished Young Researchers Award in Medicine), Swedish Heart-Lung Foundation, and the Novo Nordisk Foundation, all grants to PWF. DS is supported by the Swedish Research Council International Postdoc Fellowship (4.1-2016-00416). TVV is supported by the Novo Nordisk Foundation Postdoctoral Fellowship within Endocrinology/Metabolism at International Elite Research Environments via NNF16OC0020698. TWW was supported by the grants "Bundesministerium für Bildung und Forschung": BMBF-01ER1206, BMBF-01ER1507. APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant WT098017). LAC acknowledges funding for the Framingham Heart Study: This research was conducted in part using data and resources from the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine. The analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project. This work was partially supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contract No. N01-HC-25195 and Contract No. HHSN268201500001I) and its contract with Affymetrix, Inc for genotyping services (Contract No. N02-HL-6-4278). A portion of this research utilized the Linux Cluster for Genetic Analysis (LinGA-II) funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center. This research was partially supported by grant R01-DK089256 from the National Institute of Diabetes and Digestive and Kidney Diseases (MPIs: I.B. Borecki, LAC, K. North). TOK was supported by the Danish Council for Independent Research (DFF—1333-00124) and Sapere Aude program grant (DFF—1331-00730B). 
RM would like to acknowledge the High Performance Computing Center of University of Tartu. EGCUT was supported by EU H2020 grants 692145, 676550, 654248, 692065, Estonian Research Council Grant IUT20-60, and PerMed I, NIASC, EIT-Health and European Union through the European Regional Development Fund (Project No. 2014-2020.4.01.15-0012 GENTRANSMED). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Franks PW, Mesa JL, Harding AH, Wareham NJ (2007) Gene-lifestyle interaction on risk of type 2 diabetes. Nutr Metab Cardiovasc Dis 17: 104–124. doi: 10.1016/j.numecd.2006.04.001
- 2. Kilpelainen TO, Qi L, Brage S, Sharp SJ, Sonestedt E, et al. (2011) Physical activity attenuates the influence of FTO variants on obesity risk: a meta-analysis of 218,166 adults and 19,268 children. PLoS Med 8: e1001116. doi: 10.1371/journal.pmed.1001116
- 3. Deng WQ, Pare G (2011) A fast algorithm to optimize SNP prioritization for gene-gene and gene-environment interactions. Genet Epidemiol 35: 729–738. doi: 10.1002/gepi.20624
- 4. Scott RA, Chu AY, Grarup N, Manning AK, Hivert MF, et al. (2012) No interactions between previously associated 2-hour glucose gene variants and physical activity or BMI on 2-hour glucose levels. Diabetes 61: 1291–1296. doi: 10.2337/db11-0973
- 5. Yang J, Loos RJ, Powell JE, Medland SE, Speliotes EK, et al. (2012) FTO genotype is associated with phenotypic variability of body mass index. Nature 490: 267–272. doi: 10.1038/nature11401
- 6. Pare G, Cook NR, Ridker PM, Chasman DI (2010) On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study. PLoS Genet 6: e1000981. doi: 10.1371/journal.pgen.1000981
- 7. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861. doi: 10.1038/nature06258
- 8. Deng WQ, Asma S, Paré G (2014) Meta-analysis of SNPs involved in variance heterogeneity using Levene’s test for equal variances. Eur J Hum Genet 22: 427–430. doi: 10.1038/ejhg.2013.166
- 9. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, et al. (2015) Genetic studies of body mass index yield new insights for obesity biology. Nature 518: 197–206. doi: 10.1038/nature14177
- 10. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, et al. (2013) Discovery and refinement of loci associated with lipid levels. Nat Genet 45: 1274–1283. doi: 10.1038/ng.2797
- 11. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713. doi: 10.1038/nature09270
- 12. Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, et al. (2013) Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet 9: e1003500. doi: 10.1371/journal.pgen.1003500
- 13. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889–894. doi: 10.1126/science.1141634
- 14. Ahmad S, Rukh G, Varga TV, Ali A, Kurbasic A, et al. (2013) Gene x physical activity interactions in obesity: combined analysis of 111,421 individuals of European ancestry. PLoS Genet 9: e1003607. doi: 10.1371/journal.pgen.1003607
- 15. Young AI, Wauthier F, Donnelly P (2016) Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index. Nat Commun 7: 12724. doi: 10.1038/ncomms12724
- 16. Qi Q, Kilpelainen TO, Downer MK, Tanaka T, Smith CE, et al. (2014) FTO genetic variants, dietary intake and body mass index: insights from 177,330 individuals. Hum Mol Genet 23: 6961–6972. doi: 10.1093/hmg/ddu411
- 17. Tanaka T, Ngwa JS, van Rooij FJ, Zillikens MC, Wojczynski MK, et al. (2013) Genome-wide meta-analysis of observational studies shows common genetic variants associated with macronutrient intake. Am J Clin Nutr 97: 1395–1402. doi: 10.3945/ajcn.112.052183
- 18. Sun X, Elston R, Morris N, Zhu X (2013) What is the significance of difference in phenotypic variability across SNP genotypes? Am J Hum Genet 93: 390–397. doi: 10.1016/j.ajhg.2013.06.017
- 19. Marigorta UM, Gibson G (2014) A simulation study of gene-by-environment interactions in GWAS implies ample hidden effects. Front Genet 5: 225. doi: 10.3389/fgene.2014.00225
- 20. Levene H (1960) Robust tests for equality of variances. In: Olkin I, editor. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford, CA: Stanford University Press; pp. 278–292.
- 21. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. doi: 10.1086/519795
- 22. Kleinbaum DG (2007) Applied regression analysis and other multivariable methods. Australia; Belmont, CA: Brooks/Cole; xxi, 906 p.
- 23. Justice AE, et al. (2017) Genome-wide meta-analysis of 241,258 adults accounting for smoking behavior identifies novel loci for obesity traits. Nat Commun 8: 14977. doi: 10.1038/ncomms14977
- 24. Graff M, et al. (2017) Genome-wide physical activity interactions in adiposity - A meta-analysis of 200,452 adults. PLoS Genet 13: e1006528. doi: 10.1371/journal.pgen.1006528
- 25. R Core Team (2016) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/