Abstract
Spousal comparisons have been proposed as a design that can both reduce confounding and estimate effects of the shared adulthood environment. However, assortative mating, the process by which individuals select phenotypically (dis)similar mates, could distort associations when comparing spouses. We evaluated the use of spousal comparisons, as in the within-spouse pair (WSP) model, for aetiological research such as genetic association studies. We demonstrated that the WSP model can reduce confounding but may be susceptible to collider bias arising from conditioning on assorted spouse pairs. Analyses using UK Biobank spouse pairs found that WSP genetic association estimates were smaller than estimates from random pairs for height, educational attainment, and BMI variants. Within-sibling pair estimates, robust to demographic and parental effects, were also smaller than random pair estimates for height and educational attainment, but not for BMI. WSP models, like other within-family models, may reduce confounding from demographic factors in genetic association estimates, and so could be useful for triangulating evidence across study designs to assess the robustness of findings. However, WSP estimates should be interpreted with caution due to potential collider bias.
Author summary
There is growing evidence that genome-wide association studies capture associations relating to environmental factors, such as indirect effects from parental genotypes. Within-family models such as sibling comparisons can be used to disentangle these different sources of association but are limited by the paucity of sibling data in large biobanks. Within-spouse pair models are a potentially tractable alternative because spouses share environmental factors in adulthood and may also share early-life environmental factors. Here, we evaluated the application of within-spouse models in genetic association studies, specifically considering assortative mating, a phenomenon whereby individuals may select a phenotypically similar partner. We found that within-spouse pair models can detect genuine confounding in genetic association estimates but are potentially susceptible to collider bias induced by comparing assorted pairs. Within-spouse pair estimates could be useful when combining evidence from different study designs.
Introduction
Within-sibship models have been widely used in genetic association studies for many decades [1–6]. Genotypic differences between siblings are the consequence of random segregation at meiosis, rather than parental or ancestral differences, and so within-sibship genetic association models control for demography (assortative mating, population stratification) and indirect genetic effects of parents [4,5,7]. However, there is a paucity of genetic data from siblings with limited availability for many phenotypes. Furthermore, while siblings are matched on the early-life environment, their environments in adulthood, when many phenotypes are measured, may differ.
In contrast, spouses may have different early-life environments but are likely to share an environment for much of adulthood while cohabiting [8]. This may act to increase phenotypic similarity, such as for behavioural (e.g., physical activity and alcohol use) or personality traits [9,10]. The shared adulthood environment between spouses has prompted their use in a variety of contexts in genetic and epidemiological research using a model that we refer to as the “within-spouse pair” (WSP) model. The WSP model involves modelling the similarities and differences of spouses, either by analysing the differences between each pair or by modelling spousal relationships as a covariate in a fixed-effect model. For example, previous studies have used the WSP model to estimate phenotypic variance explained by the shared adulthood environment [11–17]. The WSP model has also been proposed as an approach to reduce confounding in aetiological research, with environmental confounders likely to be strongly correlated between spouses [18]. Here, we describe the strengths and limitations of within-spouse (WSP) designs for genetic studies.
A caveat of the WSP model is that spousal similarities are not just consequences of sharing an adulthood environment. There is evidence that for some phenotypes, spouses do not become much more similar during a relationship [19]. Another cause of spousal similarities is assortative mating–a phenomenon where humans are generally more likely to select a phenotypically similar [9,10,20–26] or, in some instances [27], dissimilar [28,29] mate. For example, height and years in schooling are often fixed prior to partnership formation, suggesting that spousal similarities for these phenotypes reflect assortment rather than effects of the shared adulthood environment. Furthermore, geographical, ancestral, and cultural factors often have strong influences on both phenotypic variation and partner selection patterns, as illustrated by the ancestral similarities of spouses [30]. Therefore, some degree of spousal phenotypic similarities is likely to be explained by spousal assortment on factors not typically defined as phenotypes, such as place of birth or religion.
The WSP model may be susceptible to collider bias, which can occur when conditioning on a variable which is influenced by two or more upstream factors. Collider bias can induce spurious associations between these factors where the collider variable is conditioned on in analysis either by analytical model design or sample selection. For example, if a school grants scholarships to either individuals who are exceptional at sport or exceptional academically, then sporting and academic ability will be negatively correlated amongst individuals with scholarships. Similarly, spousal samples by definition condition on spousal compatibility, a pairwise measure of how likely two individuals are to enter a relationship. If several phenotypes influence spousal compatibility, then collider bias could potentially arise in the WSP model [31–33]. For example, if similarities for age and educational attainment influence compatibility then spouses with larger age differences are more likely to have similar educational attainment (Fig 1). Previous spousal studies have acknowledged assortative mating, but whether assortment could distort WSP comparisons has not been investigated in detail. For example, the possibility of collider bias has been little discussed. We aimed to investigate the utility of the WSP model in genetic epidemiology and assess its robustness to collider bias.
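The scholarship example can be checked numerically. The following is a minimal sketch and not part of the study's analyses: it assumes two independent standard-normal abilities and a hypothetical rule that awards a scholarship to anyone in roughly the top 10% for either ability. The cutoff, sample size and variable names are illustrative assumptions.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n = 50_000
# Assumption: sporting and academic ability are independent in the population.
sport = [rng.gauss(0, 1) for _ in range(n)]
academic = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical selection rule: scholarship if exceptional at either ability.
cutoff = 1.28  # roughly the top 10% of a standard normal
scholars = [(s, a) for s, a in zip(sport, academic) if s > cutoff or a > cutoff]

r_all = pearson(sport, academic)
r_scholars = pearson([s for s, _ in scholars], [a for _, a in scholars])
print(f"correlation in everyone:        {r_all:+.2f}")
print(f"correlation among scholarships: {r_scholars:+.2f}")
```

Restricting to scholarship holders conditions on the collider, so the two abilities become negatively correlated in that subsample even though they are independent overall.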
We used causal diagrams (allowing double-headed arrows signifying correlated variables that may be influenced by variables outside the model [34]) and simulated data to illustrate two important characteristics of the WSP model. First, the WSP model can reduce confounding if spouses are correlated for confounders. Second, the WSP model is susceptible to collider bias induced by conditioning on spousal compatibility. We then applied the WSP model using 47,435 spouse-pairs in UK Biobank [35] to estimate associations between genetic variants and phenotypes (e.g. height). We then estimated effect size shrinkage (% decrease) in the WSP estimates compared to within-pair estimates from random non-assorted pairs, which were derived by reordering the spouse-pair sample. For comparison, we also estimated within-sibship shrinkage using 19,523 sibling pairs from UK Biobank. Finally, as a negative control analysis, we used the WSP model to estimate the effects of age on systolic blood pressure (SBP) and coronary artery disease (CAD).
Results
Within-spouse pair model: Assortative mating, spousal correlations and collider bias
Here, we present results from simulations evaluating the WSP model under assortative mating. In the first simulation model (A), the relationship between an exposure and an outcome is confounded by an unmeasured factor. Spouses are positively correlated for the unmeasured confounder, either because of assortative mating or because of shared environmental factors during cohabitation (Fig 2A). Simulations demonstrated that under this model, WSP estimates of the effect of the exposure on the outcome are less biased and converge to the simulated unbiased estimate as the spousal correlation for the confounder tends to 1 (S1 Fig and S1 Table).
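The behaviour described for model (A) can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the simulation code used in the study: the effect sizes, unit variances, the way the spousal confounder correlation `rho` is generated, and the function names are all hypothetical choices.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(rho, n_pairs=20_000, true_effect=0.5, seed=0):
    """Return (individual-level slope, within-spouse pair slope).

    rho is the assumed spousal correlation for the unmeasured confounder.
    """
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n_pairs):
        c_m = rng.gauss(0, 1)
        c_f = rho * c_m + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        couple = []
        for c in (c_m, c_f):
            x = c + rng.gauss(0, 1)                    # exposure, confounded by c
            y = true_effect * x + c + rng.gauss(0, 1)  # outcome
            couple.append((x, y))
            x_all.append(x)
            y_all.append(y)
        # Within-pair differences (male minus female).
        dx.append(couple[0][0] - couple[1][0])
        dy.append(couple[0][1] - couple[1][1])
    return ols_slope(x_all, y_all), ols_slope(dx, dy)

for rho in (0.0, 0.5, 0.9, 0.99):
    pop, wsp = simulate(rho)
    print(f"rho={rho:.2f}  individual-level={pop:.2f}  WSP={wsp:.2f}")
```

Under these particular assumptions the expected WSP slope is `true_effect + (1 - rho) / (2 - rho)`, so it approaches the simulated effect of 0.5 as `rho` tends to 1, while the individual-level slope stays confounded.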
In the second simulation model (B), two independent exposures influence an outcome. These exposures could be two phenotypes, a phenotype and a genetic score, or two independent genetic scores such as height genetic scores constructed from odd and even chromosomes. Since there is assortment on the two exposures, assortment acts as a collider, which induces associations between variables that would otherwise be independent in the population. For example, in the WSP model, positive assortment on height and educational attainment phenotypes could induce a negative correlation between height and educational attainment genetic scores. The strength of this spurious correlation will depend on the underlying data generating process and the degree of assortment on the exposures. Indeed, assortment on height across multiple generations has resulted in positive correlations between height increasing genetic variants on different chromosomes [22]. These correlations could lead to bias in WSP estimates of either exposure on the outcome. However, WSP estimates will only be affected by collider bias if both exposures influence the outcome and if the effects of the exposures on the collider are not perfectly multiplicative (Fig 2B).
Simulations showed that the degree of bias in the effect estimate is a function of the degree of assortment on the two exposures, with more bias when spouses assort strongly on both traits. For example, under this model and using plausible assortment estimates for educational attainment (spousal phenotypic correlation: 0.5) and height (spousal phenotypic correlation: 0.2) [25], the expected bias would be around 13% when estimating the effect of education on a trait which is also influenced by height. If both exposures have the same direction of effect on the outcome and assortment, then the WSP estimate would be biased downwards (S1 Fig and S2 Table).
Empirical analyses using spouse pairs in UK Biobank
Within-spouse pair: genetic and phenotypic associations
Although the WSP model is susceptible to collider bias, the model could be useful in re-calibrating associations that might be biased due to confounding. Genetic data are particularly useful for evaluating aetiological models because genotypes are measured accurately and fixed from conception, largely removing the possibility of reverse causation. Here we aimed to evaluate if WSP genetic association estimates are less confounded than estimates from population studies of unrelated individuals.
Estimates of genetic associations using unrelated individuals can be distorted by demography (e.g. assortative mating, fine-scale population structure) and indirect effects of parents [6,7,36]. WSP estimates may be less affected by these sources of association, particularly population structure, because of environmental and ancestral similarities between spouses [30].
Using 47,435 spouse-pairs from UK Biobank (S3 Table and S2, S3 and S4 Figs), previously derived using household sharing information [24], we first explored the extent to which spouse and sibling pairs are correlated for the first 10 principal components and birth coordinates (north-south, east-west) to inform the extent of pairwise spousal ancestral similarities. Spouse pairs were correlated for both birth coordinates and the first 10 principal components with correlations ranging from 0.10 for PC6 to 0.32 for PC4 across the principal components and strong correlations observed for both north-south (0.62; 95% C.I. 0.61, 0.62) and east-west (0.46; 95% C.I. 0.46, 0.47) birth coordinates. As expected, sibling pairs were very strongly correlated for birth coordinates and the first 10 principal components with correlations ranging from 0.74 for PC10 to 0.98 for PC4 (S4 Table). The spousal correlations for birth coordinates and principal components illustrate how assortative mating and social homogamy induce ancestral similarities between spouses.
We then estimated the effects of genetic variants on six different traits using the WSP design and between unrelated non-spouse pairs. We then estimated the shrinkage (% attenuation) from the non-spouse pair genetic association estimates to the WSP estimates. For comparison, we also applied the same approach to a sample of 19,523 sibling pairs (i.e. within-sibship model). Within-sibship models are a gold standard within family design for estimating genetic associations because they control for demographic and parental effects [4,6,36,37]. Comparisons between WSP and within-sibship shrinkage estimates would provide insight into the accuracy of WSP genetic association estimates.
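The shrinkage comparison can be sketched as follows. This is an illustration only: it assumes shrinkage is computed as 100 × (1 − WSP estimate / random-pair estimate), uses a single reordering of the spouse-pair sample rather than repeated reorderings, and runs on toy data in which a hypothetical shared `region` factor confounds the genotype-phenotype association. The function names and data-generating values are assumptions.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_pair_slope(pairs):
    """pairs: list of ((g_m, p_m), (g_f, p_f)); regress phenotype difference on genotype difference."""
    dg = [m[0] - f[0] for m, f in pairs]
    dp = [m[1] - f[1] for m, f in pairs]
    return ols_slope(dg, dp)

def random_pairs(pairs, rng):
    """Break up spouse pairs by reordering the female partners."""
    females = [f for _, f in pairs]
    rng.shuffle(females)  # a handful may, by chance, keep their own spouse
    return [(m, f) for (m, _), f in zip(pairs, females)]

def shrinkage_percent(pairs, rng):
    b_wsp = within_pair_slope(pairs)
    b_random = within_pair_slope(random_pairs(pairs, rng))
    return 100 * (1 - b_wsp / b_random)

# Toy data: spouses share a region that influences both genotype and phenotype.
rng = random.Random(2)
pairs = []
for _ in range(20_000):
    region = rng.gauss(0, 1)
    couple = []
    for _ in range(2):
        g = 0.5 * region + rng.gauss(0, 1)      # genotype score, stratified by region
        p = 1.0 * g + region + rng.gauss(0, 1)  # phenotype; true genetic effect is 1.0
        couple.append((g, p))
    pairs.append(tuple(couple))

shrinkage = shrinkage_percent(pairs, rng)
print(f"WSP slope: {within_pair_slope(pairs):.2f}, shrinkage: {shrinkage:.0f}%")
```

In this toy setting the shrinkage reflects removal of confounding, because the shared region cancels within spouse pairs but not within reordered pairs.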
We found strong evidence of smaller effect sizes in the WSP model for height (shrinkage: 19%; 95% C.I. 17%, 22%), educational attainment (shrinkage: 72%; 95% C.I. 64%, 79%) and BMI (shrinkage: 16%; 95% C.I. 6%, 25%). In contrast, there was limited evidence of shrinkage for SBP and CAD variants. The within-sibship analysis provided strong evidence of shrinkage for height (shrinkage: 15%; 95% C.I. 11%, 20%) and educational attainment (shrinkage: 53%; 95% C.I. 35%, 71%) variants, but limited evidence for BMI variants (shrinkage: 5%; 95% C.I. -12%, 22%). WSP shrinkage estimates were generally higher than within-sibship estimates, but imprecision prevented stronger conclusions regarding heterogeneity. Including principal components in the random-pair models did not greatly affect results except in the alcohol analysis, where there was evidence for shrinkage only in the unadjusted models. This suggests that population stratification is unlikely to entirely explain the observed shrinkage in these estimates (Table 1). We note that the alcohol analysis included only a single SNP which is known to be strongly associated with population structure in UK Biobank [24].
Table 1. Estimates of genetic association shrinkage from within-spouse pair and within-sibship models.
Phenotype | Number of SNPs | Covariates | Within-spouse pair shrinkage: % (95% C.I.) | Within-sibship shrinkage: % (95% C.I.) | Heterogeneity P for spouse and sibling shrinkage estimates |
---|---|---|---|---|---|
Height | 381 | No PC | 19% (17%, 22%) | 15% (11%, 20%) | 0.18 |
Height | 381 | PC1-10 | 17% (14%, 20%) | 13% (8%, 18%) | 0.19 |
Educational attainment | 69 | No PC | 72% (64%, 79%) | 53% (35%, 71%) | 0.06 |
Educational attainment | 69 | PC1-10 | 71% (62%, 79%) | 51% (33%, 70%) | 0.06 |
Body mass index | 68 | No PC | 16% (6%, 25%) | 5% (-12%, 22%) | 0.28 |
Body mass index | 68 | PC1-10 | 16% (6%, 25%) | 5% (-12%, 22%) | 0.28 |
Coronary artery disease | 41 | No PC | -4% (-23%, 15%) | -1% (-36%, 34%) | 0.90 |
Coronary artery disease | 41 | PC1-10 | -4% (-23%, 16%) | -1% (-35%, 34%) | 0.83 |
Systolic blood pressure | 242 | No PC | 0% (-7%, 8%) | 5% (-7%, 18%) | 0.53 |
Systolic blood pressure | 242 | PC1-10 | 0% (-8%, 7%) | 5% (-7%, 18%) | 0.50 |
Alcohol consumption | 1 (A) | No PC | 29% (14%, 43%) | 20% (-20%, 59%) | 0.40 |
Alcohol consumption | 1 (A) | PC1-10 | 14% (-5%, 33%) | 4% (-46%, 54%) | 0.42 |
A: rs1229984 in ADH1B
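How the heterogeneity P-values in Table 1 were computed is not stated in this section. One common approach, sketched here as an assumption, is a two-sided z-test on the difference between two estimates, with standard errors recovered from the 95% confidence intervals and the two estimates treated as independent. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)

def heterogeneity_p(est1, ci1, est2, ci2):
    """Two-sided P-value for a difference between two estimates.

    Assumes both estimates are approximately normal and independent.
    """
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Educational attainment, no PC adjustment (values from Table 1).
p_education = heterogeneity_p(72, (64, 79), 53, (35, 71))
print(f"heterogeneity P = {p_education:.2f}")
```

For the educational attainment row this gives a value that rounds to 0.06, in line with the table. Because the table reports rounded estimates and intervals, this back-calculation will not reproduce every row exactly.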
We compared the within-sibship and WSP shrinkage estimates from this study (using UK Biobank data only) to within-sibship shrinkage estimates from a recent within-sibship GWAS of 17 cohorts [5], which included over 4x as many siblings as this study. The within-sibship shrinkage estimates from the multi-cohort GWAS were highly consistent with the within-sibship shrinkage estimates from UK Biobank only but were much more precise. The multi-cohort within-sibship shrinkage estimates were smaller than the WSP shrinkage estimates for height, BMI, and educational attainment, with non-overlapping confidence intervals, providing some evidence that WSP shrinkage is larger than within-sibship shrinkage for these phenotypes (S5 Table).
Within-spouse pair: age, SBP and CAD
As a negative control analysis, we next used the WSP model to estimate the effects of increasing age on outcomes known to be related to age (CAD and SBP), using random pair estimates for comparison. Age cannot be influenced by other phenotypes, so analyses are unlikely to be susceptible to reverse causation or confounding. However, collider bias in the WSP model with age is plausible because spousal compatibility is influenced by age similarities. For example, couples with large age differences may systematically differ from couples with smaller age differences. It follows that differences between WSP and random pair estimates (with age as the exposure) are likely to reflect collider bias. This is a similar premise to autosomal GWAS of sex, where genetic associations are likely to reflect participation bias because autosomal genetic variation cannot influence sex [38].
Pairwise age differences were found to be greater between random pairs, consistent with individuals preferring a partner of a similar age. We did not find strong evidence for differences in age effect estimates on CAD and SBP between the spouse and random pair samples suggesting that any collider bias effects are modest in this context (Table 2).
Table 2. Within-spouse pair estimates of the effect of age on SBP and CAD.
Phenotype | Spouse-pairs (N = 47,435) | Random pairs (N = 47,435): Median estimate from 100 simulations |
---|---|---|
Average age difference (years); Median (Q1, Q3) | 2.0 (1.0, 4.0) | 7.0 (3.0, 13.0) |
Systolic blood pressure (Change in mmHg per 1-year increase in age; 95% C.I.) | 0.74 (0.69, 0.80) | 0.80 (0.78, 0.83) |
Coronary artery disease (OR per 1-year increase in age; 95% C.I.) | 1.05 (1.04, 1.05) | 1.05 (1.04, 1.05) |
All analyses were adjusted for sex of the index individual.
Discussion
In this study, we used causal diagrams, simulations, and empirical data to evaluate the use of the WSP model in genetic epidemiology. We showed that the WSP model can account for unmeasured confounding if spouses are correlated for the confounder but that comparing assorted spouses can induce collider bias. Using empirical data, we found evidence that genetic association estimates for height, educational attainment, and BMI shrink in the WSP model when compared to a within-pair model using random individuals.
Within-sibship models in UK Biobank, which control for demographic and parental effects [6,36], also provided evidence of shrinkage for height and educational attainment variants but not for BMI, consistent with previous studies [4,5,37]. WSP shrinkage point estimates for height, BMI and education were larger than the UK Biobank within-sibship shrinkage estimates although confidence intervals overlapped. However, there was strong statistical evidence that WSP shrinkage is greater than within-sibship shrinkage for height, BMI and educational attainment when using more precise within-sibship shrinkage estimates from a recent within-sibship GWAS [5]. The consistent evidence of shrinkage between the two models for height and education suggests that WSP models may be removing associations relating to demography.
Simulated data illustrated that if spouses assort on a confounder of the exposure and outcome, then the WSP association provides a less biased estimate of the causal effect than a conventional model unadjusted for the confounder. An example of a potential confounder in genetic association studies is ancestry, which we showed to be more correlated between spouses than for non-spouse pairs by illustrating birth coordinate and principal component correlations between spouses. However, we note that including principal components as covariates did not greatly affect shrinkage estimates except for alcohol consumption where, as noted earlier, the single variant used is known to be strongly associated with population structure [24]. The WSP shrinkage estimates being higher than the sibling estimates suggests that the shrinkage cannot be explained by adjustment for confounding alone. If the only source of shrinkage is adjustment for confounding, then WSP shrinkage estimates should be smaller than within-sibship estimates because spouse models are unlikely to fully control for demographic or family-level (e.g. parental nurture) effects. Collider bias induced by comparing assorted pairs is one potential explanation for the observed WSP shrinkage.
Collider bias could contribute to the observed shrinkage, depending on how the colliding effects interact and on the degree of assortment. Assuming a linear additive model, collider bias is likely to shrink rather than inflate genetic associations because assortment would induce negative correlations in the WSP model between factors influencing the assorted trait in the same direction. For example, assortment could induce negative correlations between height-increasing genetic variants and height-increasing environmental factors in the WSP model, leading to shrinkage when estimating WSP genetic associations. This is in contrast to the population-level effects of assortative mating, which inflate associations because of induced positive correlations between trait-increasing variants on different chromosomes [22]. However, in the negative control example of age on health outcomes, we found little evidence of collider bias; within-pair effect estimates of age on CAD and SBP were consistent between spouse and non-spouse pair samples. Another potential source of WSP shrinkage is the spousal environment. The spousal environment could be influenced by individuals’ genotypes, leading to reduced spousal phenotypic differences. For example, if an individual has high genetic liability to increased alcohol consumption, this could lead to their partner consuming similar amounts of alcohol independent of the partner’s own genotype.
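The collider mechanism described above, in which assortment on a trait induces a negative within-pair correlation between its genetic and environmental components, can be sketched with a toy simulation. This is not the study's simulation model: it assumes a trait that is the sum of an independent genetic score and environmental term, and a hypothetical matching rule in which a candidate pair becomes a couple with a probability that falls as their trait difference grows. The `scale` parameter and all names are illustrative.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
scale = 2.0  # hypothetical: larger values mean weaker assortment on the trait

def draw_person():
    g = rng.gauss(0, 1)  # genetic score
    e = rng.gauss(0, 1)  # environmental factor, independent of g
    return g, g + e      # (genetic score, trait); true effect of g on the trait is 1

random_dg, random_dt, spouse_dg, spouse_dt = [], [], [], []
for _ in range(40_000):
    g_m, t_m = draw_person()
    g_f, t_f = draw_person()
    dg, dt = g_m - g_f, t_m - t_f
    random_dg.append(dg)
    random_dt.append(dt)
    # Assortment: similar pairs are more likely to become couples.
    if rng.random() < math.exp(-dt ** 2 / (2 * scale ** 2)):
        spouse_dg.append(dg)
        spouse_dt.append(dt)

b_random = ols_slope(random_dg, random_dt)
b_wsp = ols_slope(spouse_dg, spouse_dt)
print(f"random pairs: {b_random:.2f}  assorted pairs: {b_wsp:.2f}  "
      f"shrinkage: {100 * (1 - b_wsp / b_random):.0f}%")
```

Here there is no confounding at all, yet the within-pair estimate among assorted couples is attenuated, because selecting pairs with similar trait values makes a larger genetic difference more likely to be offset by an opposing environmental difference.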
A key implication of these analyses is that spousal similarities and differences are not necessarily random or attributable solely to the shared adulthood environment. WSP similarities are likely to reflect a combination of social homogamy, assortative mating and the shared adulthood environment. Amidst growing evidence that genetic epidemiological studies can capture effects of fine-scale population structure, parental nurture and assortative mating [4,6,39–46], there is considerable interest in using genotype data from pedigrees to more accurately estimate direct genetic effects and trait heritability as well as to explore parental effects on offspring phenotypes [11–14,39–41,44,45,47–51]. Family designs such as the transmission disequilibrium test [52] and within-sibship models are protected from many of these biases by random segregation at meiosis [53,54]. In contrast, inferences from spousal analyses are not as robust, so it is important to understand and model the assortment in spousal designs. A further implication is that assortative mating is likely to contribute to the phenotypic and genetic structure of epidemiological studies. Large studies such as UK Biobank often incidentally sample participants who are partnered with another study participant [24]. The non-randomness of study participation in UK Biobank has been previously discussed as a possible cause of selection (participation) bias [31]. Our findings illustrate that assortative mating is likely to contribute to the non-random distribution of phenotypes (and genotypes) in population biobanks.
Our study has several important limitations. First, as described in our previous study [24], derived spouse-pairs were identified using household sharing information so may be susceptible to a degree of classification error with non-spouse pairs being incorrectly identified as spouses. Second, the mechanisms by which spouses jointly participate in UK Biobank may have induced selection bias into empirical analyses as these pairs could be more similar than pairs that did not jointly participate. Third, given that the exact mechanisms of assortment are not widely understood, our simulations and assumptions may not accurately capture the mechanisms underlying spousal assortment. In simulations we assumed that factors influencing assortment are independent across the population but in practice, factors influencing assortment are often correlated (e.g. height and education). Future research could use more complex simulations to evaluate models that can distinguish the effects of social homogamy, migration and measurement error. Fourth, it is important to note that educational attainment as defined by qualifications when study participants are aged over 40 will also capture individuals with degrees obtained during adulthood, suggesting that educational similarities could also plausibly relate to the shared adulthood environment.
To conclude, the WSP model can reduce confounding from environmental factors but may also be susceptible to collider bias. An empirical example using genetic associations suggested that WSP estimates may be downwardly biased. In contrast, WSP estimates for effects of age did not seem to be affected by collider bias. An advantage of WSP models is that they may have increased power for genetic studies relative to other within-family designs because (non-consanguineous) spouses are less likely than first-degree relatives to share long segments of the genome identical by descent. The WSP model could be a complementary orthogonal design to other within-family models when triangulating evidence from different study designs [33].
Methods
Data sources
UK Biobank
Study description
UK Biobank is a large-scale prospective cohort study which sampled 503,325 individuals aged 38–73 years at baseline, recruited between 2006 and 2010 from across the United Kingdom. The cohort has been described in detail previously [35,55]. For the purposes of this study, we used two subsamples of the cohort: spouse-pairs [24] and full-sibling pairs [6].
Potential spouses were estimated using household sharing information in a previous publication [24]. We started with a European subsample of UK Biobank, consisting of 463,827 individuals based on a k-means cluster analysis on the first 4 genetic principal components. We then used phenotype data to extract pairs of individuals who reported (a) living with their spouse (field ID: 6141–0.0), (b) the same length of time living in the house (field ID: 699–0.0), (c) the same number of occupants in the household (field ID: 709–0.0), (d) the same number of vehicles (field ID: 728–0.0), (e) the same accommodation type and rental status (field IDs: 670–0.0, 680–0.0), (f) identical home coordinates (rounded to the nearest km) (field IDs: 20074–0.0, 20075–0.0) and (g) are registered with the same UK Biobank recruitment centre (field ID: 54–0.0) and (h) both have available genotype data. We considered pairs with identical information across all household variables as putative spouses. When more than two individuals shared identical information (observed in 18,145 instances), then these individuals were removed. 53 closely related pairs (IBD > 0.1) were identified and removed using a genetic relationship matrix. We excluded 4,866 potential couples who were the same sex (9.3% of the sample) as they were deemed to be more likely to be false positives and because of possible heterogeneity in same-sex assortment patterns. The original paper identified 47,549 male-female pairs believed to be cohabitating spouses. In this study, we used an updated version of the genetic data after removing individuals who had opted out of the study resulting in a slightly reduced sample of 47,435 complete pairs.
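The household-matching steps above can be sketched as a grouping operation. This is an illustrative outline, not the code used to derive the sample: the record layout, key names and helper functions are hypothetical, and the relatedness filter (which needs a genetic relationship matrix) is left out.

```python
from collections import defaultdict

# Hypothetical names for the household variables that must match exactly.
HOUSEHOLD_KEYS = ("years_at_address", "n_occupants", "n_vehicles",
                  "accommodation_type", "rental_status",
                  "home_east_km", "home_north_km", "centre")

def putative_spouse_pairs(people):
    """Return (id, id) tuples for opposite-sex pairs with identical household data."""
    groups = defaultdict(list)
    for p in people:
        # Keep only people who report living with a spouse and have genotype data.
        if p["lives_with_spouse"] and p["has_genotypes"]:
            groups[tuple(p[k] for k in HOUSEHOLD_KEYS)].append(p)
    pairs = []
    for members in groups.values():
        if len(members) != 2:   # unmatched, or more than two identical records
            continue
        a, b = members
        if a["sex"] == b["sex"]:  # same-sex pairs were excluded in the study
            continue
        pairs.append((a["id"], b["id"]))
    return pairs

def household(years, east_km, north_km):
    return {"years_at_address": years, "n_occupants": 2, "n_vehicles": 1,
            "accommodation_type": "house", "rental_status": "own",
            "home_east_km": east_km, "home_north_km": north_km, "centre": "A"}

def person(pid, sex, home, lives_with_spouse=True, has_genotypes=True):
    return {"id": pid, "sex": sex, "lives_with_spouse": lives_with_spouse,
            "has_genotypes": has_genotypes, **home}

toy = [
    person(1, "M", household(10, 400, 300)), person(2, "F", household(10, 400, 300)),
    person(3, "M", household(5, 410, 310)), person(4, "M", household(5, 410, 310)),
    person(5, "F", household(7, 420, 320)), person(6, "M", household(7, 420, 320)),
    person(7, "F", household(7, 420, 320)),
    person(8, "F", household(2, 430, 330)),
]
pairs = putative_spouse_pairs(toy)
print(pairs)
```

In the toy data only individuals 1 and 2 are retained: 3 and 4 are the same sex, 5 to 7 form a group of three, and 8 has no match.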
Full-sibling relationships were derived using UK Biobank provided estimates of pairwise identical by state (IBS) kinships (>0.5–21*IBS0, <0.7) and IBS0 (>0.001, <0.008), the proportion of unshared loci [6]. This approach identified 40,275 siblings from 19,523 families. For the purposes of within-sibship analyses, we restricted the sample to 2 siblings from each family, selecting siblings at random. The analysis sample included 39,046 individuals from 19,523 families.
Phenotype data
At baseline, the height of study participants was measured using a Seca 202 device at the assessment centre (field ID: 12144–0.0), body mass index was derived manually from measures of standing height and weight (field ID: 21001–0.0), and systolic blood pressure was measured using an automated reading from an Omron Digital blood pressure monitor (field ID: 4080–0.0). Educational attainment was defined as in a previous study [56], using questionnaire data on qualifications to estimate the number of years spent in full-time education (field ID: 6138). Coronary artery disease cases were identified using International Classification of Disease (10th edition) (ICD10) and Operating Procedure System (OPS) codes from either hospital events (Hospital Episode Statistics) or underlying cause of death from the death register. The following ICD10 (I21, I22, I23, I24, I25, Z955) and OPS codes (K40-K46, K471, K49, K50, K75) [57] were used to classify cases. North-south (field ID: 129) and east-west (field ID: 130) birth coordinates were derived from self-reported town of birth.
Alcohol consumption was defined as in a previous study [24]. In brief, participants were asked to estimate their current alcohol intake frequency (daily or almost daily, three or four times a week, once or twice a week, one to three times a month, special occasions only, never, prefer not to say) (ID: 1558–0.0). Individuals reporting a current intake frequency of at least once or twice a week were asked to estimate their average weekly intake of a range of different alcoholic beverages (red wine, white wine, champagne, beer, cider, spirits, fortified wine) (ID: 1568–0.0, 1578–0.0, 1588–0.0, 1598–0.0, 1608–0.0). We converted these reported intakes to weekly alcohol consumption in units: measures for spirits (1 unit), glasses for wines (2 units) and pints for beer/cider (2.5 units). Individuals reporting a current intake frequency of “one to three times a month”, “special occasions only” or “never” (for whom this phenotype was not collected) were assumed to have a weekly alcohol consumption volume of 0. We removed 189 pairs with outlying values (>5 S.D. from the mean) from one or more members.
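The unit conversion described above amounts to a weighted sum. The sketch below is illustrative only: the dictionary keys and function name are hypothetical, and it does not attempt to say which unit value applies to champagne or fortified wine, or how "prefer not to say" responses were handled, since neither is stated here.

```python
# Units per reported drink, as given in the text.
UNITS_PER_DRINK = {
    "spirits_measure": 1.0,
    "wine_glass": 2.0,
    "beer_cider_pint": 2.5,
}

# Intake frequencies for which weekly consumption was assumed to be zero.
ZERO_FREQUENCIES = {"one to three times a month", "special occasions only", "never"}

def weekly_units(frequency, weekly_drinks):
    """Weekly alcohol units from intake frequency and per-beverage weekly counts."""
    if frequency in ZERO_FREQUENCIES:
        return 0.0
    return sum(UNITS_PER_DRINK[kind] * count for kind, count in weekly_drinks.items())

example = weekly_units(
    "once or twice a week",
    {"wine_glass": 3, "beer_cider_pint": 2, "spirits_measure": 1},
)
print(example)  # 3*2 + 2*2.5 + 1*1 = 12.0
```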
Genotyping
UK Biobank study participants (N = 488,377) were assayed using the UK BiLEVE Axiom Array by Affymetrix (N = 49,950) and the UK Biobank Axiom Array (N = 438,427). Directly genotyped variants were pre-phased using SHAPEIT3 [58] and then imputed using Impute4 using the UK10K [59], Haplotype Reference Consortium [60] and 1000 Genomes Phase 3 [61] reference panels. Post-imputation, data were available for approximately 96 million genetic variants. More detail is contained in previous publications [35,62].
Genome-wide association studies
Summary statistics from previous published GWAS, independent from UK Biobank, were used for information on SNPs associated with coronary artery disease [63], body mass index [64], educational attainment [56] and height [65].
Genome-wide summary data were not available for a recent systolic blood pressure GWAS [66], so we performed a GWAS of systolic blood pressure using UK Biobank. To remove sample overlap, we excluded the 47,435 spouse pairs from the analysis and used the remaining sample of 367,963 individuals of self-reported European descent. A GWAS was conducted on this sample using a linear mixed model (LMM) association method as implemented in BOLT-LMM (v2.3) [67]. To model population structure in the sample we used 143,006 directly genotyped SNPs obtained after filtering on MAF > 0.01; genotyping rate > 0.015; Hardy-Weinberg equilibrium p-value < 0.0001 and LD pruning to an r² threshold of 0.1 using PLINK v2.0 [68]. We included the age and sex of participants as covariates in the model.
A set of genome-wide significant SNPs was generated for each trait by LD clumping relevant summary statistics (P < 5×10⁻⁸, r² < 0.001, clumping distance = 10000 kb) using the 1000 Genomes Phase 3 GBR samples [61] as the reference panel. For alcohol consumption, we used a missense variant (rs1229984) in ADH1B strongly associated with alcohol behaviour, as in a previous study [24].
Theory of within-spouse pair comparisons
The phenotype P of individual I can be modelled as a function of independent factors: genetics G, the environment E, age, sex and a stochastic error term ε.
When considering male-female spouse pairs, we can decompose the influence of the environment E on P into effects of the shared environment between spouses SE (e.g. during cohabitation) and effects of the non-shared environment NSE. For example, for the male M and female F in pair K:
We then define the WSP model across spouse pairs as:
where the differences between the spouses for each factor are included in the model (e.g. for pair K, P* = PKM − PKF and G* = GKM − GKF). The shared environmental terms are by definition equal for men and women and drop out of the model.
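One way to write out the model described in this section is given below. It is a sketch under stated assumptions: effects are taken to be linear and additive with the same coefficients in men and women, sex is coded 1 for the male and 0 for the female, and the coefficient symbols (β) are introduced here for illustration.

```latex
\begin{align}
P_{KM} &= \beta_0 + \beta_G G_{KM} + \beta_{SE}\, SE_{K} + \beta_{NSE}\, NSE_{KM}
          + \beta_{age}\, Age_{KM} + \beta_{sex} + \epsilon_{KM} \\
P_{KF} &= \beta_0 + \beta_G G_{KF} + \beta_{SE}\, SE_{K} + \beta_{NSE}\, NSE_{KF}
          + \beta_{age}\, Age_{KF} + \epsilon_{KF} \\
P^{*}_{K} = P_{KM} - P_{KF}
       &= \beta_{sex} + \beta_G G^{*}_{K} + \beta_{NSE}\, NSE^{*}_{K}
          + \beta_{age}\, Age^{*}_{K} + \epsilon^{*}_{K}
\end{align}
```

Under these assumptions the shared environment term cancels in the difference, and the sex effect is absorbed into the intercept because every pair contains one male and one female.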
For the WSP model to generate an unconfounded estimate of the causal effect of G on P, we require that the genetic and environmental difference terms in the between-spouse model are independent, i.e. Corr(G*, E*) = 0. This assumption could be violated by several factors including assortative mating and indirect genetic effects. For example, if parental genotypes influence their offspring phenotype, then the offspring’s genotype would be positively correlated with their parental environment.
Random and non-random mating
Consider the WSP model applied to three distinct sets of pairs: a) a random set of males and females (non-spouses), b) spouse pairs under random mating (random spouses), and c) spouse-pairs under assortative mating (assorted spouses). In theory, the environmental differences between pairs would decrease with cohabitation and under assortment on environmental factors such as place of birth and socio-economic status:
Note that as the environmental differences between the members of a pair tend to zero (E*→0), the bias in the estimated association between P and G will also tend to zero (bias(P~G)→0), even if G* and E* are correlated in the WSP model (Corr(G*, E*)≠0), because the pair would be matched for the confounder. This suggests that comparing assorted pairs could reduce the effect of environmental biases.
We define the mechanism by which spouses assort as spousal compatibility A, a pairwise measure of the likelihood that two individuals enter a relationship. If several phenotypes influence assortment, then assortative mating can induce collider bias. For example, assortment on a phenotype influenced by genetic and environmental factors could induce spousal correlations in both genetic and environmental determinants of the phenotype, i.e. Corr(GKM, GKF)>0 and Corr(EKM, EKF)>0. It follows that in the WSP model, spousal genetic differences could be inversely associated with spousal environmental differences, i.e. Corr(G*, E*)<0.
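The limiting argument for E*→0 can be made explicit with the standard omitted-variable expression. This is a sketch assuming a linear WSP model in which E* is the only omitted term; β_E, the effect of E on P, is a symbol introduced here.

```latex
\begin{align}
\operatorname{bias}(\hat{\beta}_G)
  &= \beta_E \, \frac{\operatorname{Cov}(G^{*}, E^{*})}{\operatorname{Var}(G^{*})}, \\
\left| \operatorname{bias}(\hat{\beta}_G) \right|
  &\le |\beta_E| \sqrt{\frac{\operatorname{Var}(E^{*})}{\operatorname{Var}(G^{*})}}
  \;\longrightarrow\; 0
  \quad \text{as } \operatorname{Var}(E^{*}) \to 0 .
\end{align}
```

The inequality follows from the Cauchy-Schwarz bound on the covariance, so the bias vanishes whatever the value of Corr(G*, E*).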
Statistical methods
Simulations
Model A: Within-spouse pair: spousal correlation for confounders
In model A, an exposure X influences an outcome Y but the relationship is confounded by life-course exposure to an environmental factor E which influences both X and Y. We evaluated the effect of spousal correlations for E on the WSP estimates of the effect of X on Y.
Spousal correlations for E were generated by simulating E and a spousal assortment measure A such that Corr(E, A) = C. Male-female pairs were defined by ordering A such that AM1 ≥ AM2 ≥ … ≥ AM1000 and AF1 ≥ AF2 ≥ … ≥ AF1000 and matching respective males and females, i.e. AM1 with AF1. This matching induces a spousal correlation for E which converges to C as the sample size increases to infinity.
Using 2,000 simulated individuals (1,000 males and 1,000 females), we generated WSP estimates at a range of values of C (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). Code for model (A) can be found at https://github.com/LaurenceHowe/Between-spouse/blob/master/simulations.R.
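A minimal Python sketch of a simulation in the spirit of model A is shown below. It is not the authors' R code: the function names, the normal distributions, the seed and the effect sizes of 0.5 are all assumptions chosen for illustration. The function also returns the induced spousal correlation for E so that it can be inspected directly.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_group(n, c, rng, effect_xy, effect_e):
    """Simulate n individuals of one sex, sorted by assortment measure A."""
    people = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)                      # confounder E
        a = c * e + math.sqrt(1.0 - c * c) * rng.gauss(0.0, 1.0)  # Corr(E, A) = c
        x = effect_e * e + rng.gauss(0.0, 1.0)       # exposure X
        y = effect_xy * x + effect_e * e + rng.gauss(0.0, 1.0)    # outcome Y
        people.append((a, e, x, y))
    people.sort(key=lambda person: person[0], reverse=True)
    return people


def model_a(c, n_pairs=1000, effect_xy=0.5, effect_e=0.5, seed=1):
    """Return (spousal correlation for E, WSP estimate of X on Y)."""
    rng = random.Random(seed)
    males = simulate_group(n_pairs, c, rng, effect_xy, effect_e)
    females = simulate_group(n_pairs, c, rng, effect_xy, effect_e)
    # Matching by rank of A: the i-th male is paired with the i-th female.
    spousal_corr_e = corr([m[1] for m in males], [f[1] for f in females])
    x_star = [m[2] - f[2] for m, f in zip(males, females)]
    y_star = [m[3] - f[3] for m, f in zip(males, females)]
    return spousal_corr_e, ols_slope(x_star, y_star)
```

Calling `model_a(c)` across the grid of C values used above shows how the WSP estimate moves towards the simulated effect of X on Y as spouses become more similar for E.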
Model B: Within-spouse pair: assortative mating and collider bias
In model B, individuals assort on two independent phenotypes X1 and X2 that also influence an outcome Y such that Y~X1+X2+ε. We evaluated the effects of assortment on X1 and X2 on the WSP estimate of the effect of X1 on Y. We simulated A, X1 and X2 such that Corr(X1, A) = C1 and Corr(X2, A) = C2. As above, we then defined pairs by ordering A, which induces spousal correlations for X1 and X2.
Using 2,000 simulated individuals (1,000 males and 1,000 females), we generated WSP estimates using varying degrees of spousal assortment (Ci = 0, 0.1, 0.2, 0.3, 0.4, 0.5; i ∈ {1, 2}). The WSP regression model is defined as Y*~X1* where Y* = YKM − YKF and X1* = X1KM − X1KF for each assorted pair. Code for model (B) can be found at https://github.com/LaurenceHowe/Between-spouse/blob/master/simulations.R.
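The collider mechanism in model B can be illustrated with a short Python sketch. As with any such sketch, this is not the authors' R code: the function names, the normal distributions, the seed and the effect sizes of 0.5 are assumptions, and the construction of A requires C1² + C2² ≤ 1.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_group(n, c1, c2, rng, effect_x1, effect_x2):
    """Simulate n individuals of one sex, sorted by assortment measure A."""
    resid_sd = math.sqrt(1.0 - c1 * c1 - c2 * c2)
    people = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)                 # independent of x1
        # A has unit variance, so Corr(X1, A) = c1 and Corr(X2, A) = c2.
        a = c1 * x1 + c2 * x2 + resid_sd * rng.gauss(0.0, 1.0)
        y = effect_x1 * x1 + effect_x2 * x2 + rng.gauss(0.0, 1.0)
        people.append((a, x1, y))
    people.sort(key=lambda person: person[0], reverse=True)
    return people


def model_b(c1, c2, n_pairs=1000, effect_x1=0.5, effect_x2=0.5, seed=1):
    """Return the WSP estimate of the effect of X1 on Y (Y* ~ X1*)."""
    rng = random.Random(seed)
    males = simulate_group(n_pairs, c1, c2, rng, effect_x1, effect_x2)
    females = simulate_group(n_pairs, c1, c2, rng, effect_x1, effect_x2)
    # Matching by rank of A: the i-th male is paired with the i-th female.
    x1_star = [m[1] - f[1] for m, f in zip(males, females)]
    y_star = [m[2] - f[2] for m, f in zip(males, females)]
    return ols_slope(x1_star, y_star)
```

In this construction, pairing on A makes the within-pair differences in X1 and X2 negatively correlated even though X1 and X2 are independent in the population, so the estimate from Y*~X1* falls below the simulated effect when both C1 and C2 are positive.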
Empirical analysis in the UK Biobank
Within-spouse pair: Genetic and phenotypic differences
We estimated the correlations between spouses for birth coordinates (north-south, east-west) and principal components using a linear regression model in R. Given that the regression model includes the same variable from different individuals, the association estimates are approximately equivalent to correlations.
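The approximate equivalence between the regression estimate and the correlation follows from the usual identity for a simple linear regression slope. In the sketch below, x_M and x_F denote the same variable measured in the male and female spouse; these symbols are introduced here for illustration.

```latex
\begin{equation}
\hat{\beta}
  = \frac{\operatorname{Cov}(x_M, x_F)}{\operatorname{Var}(x_F)}
  = r \, \frac{\operatorname{SD}(x_M)}{\operatorname{SD}(x_F)}
  \approx r
  \quad \text{when } \operatorname{SD}(x_M) \approx \operatorname{SD}(x_F).
\end{equation}
```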
We defined the genotypic differences at a variant for spouse pair K with individuals A and B as:
WSP effect estimates of each genetic variant on the relevant phenotype of interest (height, body mass index, systolic blood pressure, educational attainment, coronary artery disease or alcohol consumption) were generated using linear or logistic regression. In the context of binary outcomes, the pair were rearranged so that the phenotypic difference could take the value of either 0 or 1 (for logistic regression), with other variables rearranged accordingly. The sex of the reference individual and the age difference between the spouses were included as covariates:
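One way to write the genotypic difference and the WSP regression described above is sketched below. The linear additive form, the intercept α and the coefficient symbols are assumptions introduced for illustration, with individual A taken as the reference individual.

```latex
\begin{align}
G^{*}_{K} &= G_{KA} - G_{KB} \\
P_{KA} - P_{KB} &= \alpha + \beta_G\, G^{*}_{K} + \beta_{sex}\, Sex_{KA}
  + \beta_{age} \left( Age_{KA} - Age_{KB} \right) + \epsilon_{K}
\end{align}
```

For binary outcomes the same linear predictor would be used on the logit scale, with the pair ordered so that the phenotypic difference is 0 or 1.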
Using the models described above, we generated associations using the WSP model with the spouse pairs. For comparison, we generated 100 distinct datasets of random male-female pairs by randomly rearranging the 47,435 spouse pairs while ensuring that the members of each pair were of different sex. We applied the same within-pair models to the random male-female pairs, taking the median effect estimate and standard error for each variant from the 100 random-pair estimates. To compare WSP and random-pair genetic association estimates, we used an inverse-variance weighted (IVW) approach [69,70]. The IVW approach uses summary data to estimate the effect on the phenotype, in both models, of a polygenic score from the discovery GWAS from which the genetic variants were selected. Using betas from the discovery GWAS as “weights” and betas and standard errors from the WSP and random-pair models, the IVW estimates are calculated across N variants as follows.
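The random re-pairing and the IVW calculation can be sketched in Python as follows. The IVW function assumes the standard fixed-effect form for summarized data, which may differ in detail from the expression used in the study; the function names and input layout are hypothetical.

```python
import math
import random


def random_pairs(males, females, rng):
    """Pair each male with a randomly chosen female (one dataset of random pairs)."""
    shuffled = list(females)
    rng.shuffle(shuffled)
    return list(zip(males, shuffled))


def ivw(weights, betas, ses):
    """Fixed-effect IVW estimate and standard error across variants.

    weights: discovery GWAS betas for each variant.
    betas, ses: within-pair estimates and standard errors for the same variants.
    """
    numerator = sum(w * b / s ** 2 for w, b, s in zip(weights, betas, ses))
    denominator = sum(w ** 2 / s ** 2 for w, s in zip(weights, ses))
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

An IVW estimate of 1 would indicate that the within-pair associations match the discovery GWAS weights on average, and values below 1 indicate smaller within-pair associations.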
Shrinkage in genetic associations for each phenotype, defined as the percentage difference between the random pair IVW estimate and the WSP estimate, was calculated using the delta method assuming no covariance between the estimators.
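A minimal Python sketch of the shrinkage calculation is given below. It assumes that shrinkage is expressed relative to the random-pair estimate, i.e. 100 × (1 − WSP / random pair), and applies the first-order delta method to the ratio with zero covariance; the function and argument names are hypothetical.

```python
import math


def shrinkage_percent(b_random, se_random, b_wsp, se_wsp):
    """Percentage shrinkage of the WSP estimate relative to the random-pair
    estimate, with a delta-method standard error (no covariance assumed)."""
    ratio = b_wsp / b_random
    shrinkage = 100.0 * (1.0 - ratio)
    # First-order variance of a ratio of two independent estimators.
    var_ratio = (se_wsp ** 2 / b_random ** 2
                 + b_wsp ** 2 * se_random ** 2 / b_random ** 4)
    return shrinkage, 100.0 * math.sqrt(var_ratio)
```

For instance, a random-pair estimate of 1.0 and a WSP estimate of 0.8 correspond to 20% shrinkage.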
As we investigated only a single genetic variant for alcohol consumption, we were unable to investigate a trend across genetic variants. Instead we tested for a difference between two means for the WSP and median random-pair estimate [71].
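The difference between two estimates can be tested with a simple z-test, sketched here in Python under the assumption that the two estimators are independent and approximately normal. The function name is hypothetical.

```python
import math


def difference_test(b1, se1, b2, se2):
    """z statistic and two-sided P value for the difference between two
    independent estimates."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p
```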
Within-sibship birth coordinate correlations, principal component correlations and shrinkage estimates were generated using very similar methods to the spousal analyses [4,6,36]. Unlike the male-female spouse-pairs, siblings can be different sexes, so we included a sex difference term in the regression models for the shrinkage analysis. Within-sibship estimates were compared with random-pair estimates as in the spousal analysis. Shrinkages in genetic associations for each phenotype were estimated as above.
We investigated heterogeneity between WSP and within-sibship shrinkage estimates using the difference for two means test [71] assuming no covariance.
As a sensitivity analysis, we included an analysis adjusting for principal components in the random-pair samples to account for population structure differences. We included differences for the first 10 principal components in the random pair models as below. Principal component differences were not included in the WSP or within-sibship models.
For comparison, we also considered within-sibship shrinkage estimates from a recent within-sibship GWAS preprint [5]. This preprint reported shrinkage for genetic variants at genome-wide significance (P < 5×10⁻⁸) and at a more liberal threshold (P < 1×10⁻⁵) for height, BMI, educational attainment, SBP and alcohol consumption. Coronary heart disease was not analysed in this study. As the shrinkage estimates were broadly similar between the two thresholds for the 5 phenotypes in this preprint, we considered the shrinkage estimates from the liberal threshold.
Within-spouse pair: age, SBP and CAD
The WSP effect estimates of age on CAD and SBP were estimated using the following regression model (linear or logistic depending on the outcome of interest), including the sex of the reference individual and the age difference between spouses as covariates:
As above, we repeated analyses using the datasets of random male-female pairs, reporting the median effect size and standard error across the 100 simulated datasets for each model.
Supporting information
Data Availability
In this study, we used individual participant data from UK Biobank. Interested researchers can obtain the same dataset from UK Biobank. Reference: https://www.nature.com/articles/s41586-018-0579-z Contact: access@ukbiobank.ac.uk. Relevant code for the simulation models is available at the following repository: https://github.com/LaurenceHowe/Between-spouse.
Funding Statement
LJH, TB, TTM, GH, NMD and GDS are members of the MRC Integrative Epidemiology Unit, which is supported by the Medical Research Council (MRC) [MC_UU_00011/1] and the University of Bristol (principal investigator: GDS). NMD is supported by the Economic and Social Research Council (ESRC) via a Future Research Leaders grant [ES/N000757/1], a Norwegian Research Council grant (number 295989) and by the Health Foundation’s Efficiency Research Programme (Award 807293). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Curtis D, Miller MB, Sham PC. Combining the sibling disequilibrium test and transmission/disequilibrium test for multiallelic markers. American journal of human genetics. 1999;64(6):1785. doi: 10.1086/302421
- 2. Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66(1):279–92. Epub 2000/01/13. doi: 10.1086/302698; PubMed Central PMCID: PMC1288332.
- 3. Fulker DW, Cherny SS, Sham PC, Hewitt JK. Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet. 1999;64(1):259–67. Epub 1999/01/23. doi: 10.1086/302193; PubMed Central PMCID: PMC1377724.
- 4. Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018;50(8):1112–21. Epub 2018/07/25. doi: 10.1038/s41588-018-0147-3; PubMed Central PMCID: PMC6393768.
- 5. Howe LJ, Nivard MG, Morris TT, Hansen AF, Rasheed H, Cho Y, et al. Within-sibship GWAS improve estimates of direct genetic effects. bioRxiv. 2021:2021.03.05.433935. doi: 10.1101/2021.03.05.433935
- 6. Brumpton B, Sanderson E, Hartwig FP, Harrison S, Vie GÅ, Cho Y, et al. Within-family studies for Mendelian randomization: avoiding dynastic, assortative mating, and population stratification biases. Nature Communications. 2020:602516.
- 7. Young AI, Benonisdottir S, Przeworski M, Kong A. Deconstructing the sources of genotype-phenotype associations in humans. Science. 2019;365(6460):1396–400. doi: 10.1126/science.aax3710
- 8. Davey Smith G. Epidemiology, epigenetics and the ‘Gloomy Prospect’: embracing randomness in population health research and practice. International Journal of Epidemiology. 2011;40(3):537–62. doi: 10.1093/ije/dyr117
- 9. Ask H, Rognmo K, Torvik FA, Roysamb E, Tambs K. Non-random mating and convergence over time for alcohol consumption, smoking, and exercise: the Nord-Trondelag Health Study. Behav Genet. 2012;42(3):354–65. Epub 2011/10/19. doi: 10.1007/s10519-011-9509-7
- 10. Ask H, Idstad M, Engdahl B, Tambs K. Non-random Mating and Convergence Over Time for Mental Health, Life Satisfaction, and Personality: The Nord-Trøndelag Health Study. Behavior Genetics. 2013;43(2):108–19. doi: 10.1007/s10519-012-9578-2
- 11. Xia C, Amador C, Huffman J, Trochet H, Campbell A, Porteous D, et al. Pedigree-and SNP-associated genetics and recent environment are the major contributors to anthropometric and cardiometabolic trait variation. PLoS Genetics. 2016;12(2):e1005804. doi: 10.1371/journal.pgen.1005804
- 12. Hill WD, Arslan RC, Xia C, Luciano M, Amador C, Navarro P, et al. Genomic analysis of family data reveals additional genetic effects on intelligence and personality. Molecular Psychiatry. 2018;23(12):2347. doi: 10.1038/s41380-017-0005-1
- 13. Amador C, Xia C, Nagy R, Campbell A, Porteous D, Smith BH, et al. Regional variation in health is predominantly driven by lifestyle rather than genetics. Nature Communications. 2017;8(1):801. doi: 10.1038/s41467-017-00497-5
- 14. Wang K, Gaitsch H, Poon H, Cox NJ, Rzhetsky A. Classification of common human diseases derived from shared genetic and environmental determinants. Nature Genetics. 2017;49(9):1319. doi: 10.1038/ng.3931
- 15. Barbu MC, Shen X, Walker RM, Howard DM, Evans KL, Whalley HC, et al. Epigenetic prediction of major depressive disorder. Molecular Psychiatry. 2020. doi: 10.1038/s41380-020-0808-3
- 16. Zeng Y, Amador C, Xia C, Marioni R, Sproul D, Walker RM, et al. Parent of origin genetic effects on methylation in humans are common and influence complex trait variation. Nature Communications. 2019;10(1):1383. doi: 10.1038/s41467-019-09301-y
- 17. Zeng Y, Navarro P, Xia C, Amador C, Fernandez-Pujals AM, Thomson PA, et al. Shared Genetics and Couple-Associated Environment Are Major Contributors to the Risk of Both Clinical and Self-Declared Depression. EBioMedicine. 2016;14:161–7. Epub 2016/11/14. doi: 10.1016/j.ebiom.2016.11.003; PubMed Central PMCID: PMC5161419.
- 18. Bjorngaard JH, Vie GA, Krokstad S, Janszky I, Romundstad PR, Vatten LJ. Cardiovascular mortality—Comparing risk factor associations within couples and in the total population—The HUNT Study. International journal of cardiology. 2017;232:127–33. Epub 2017/01/14. doi: 10.1016/j.ijcard.2017.01.041
- 19. Tambs K, Moum T. No Large Convergence during Marriage for Health, Lifestyle, and Personality in a Large Sample of Norwegian Spouses. Journal of Marriage and Family. 1992;54(4):957–71. doi: 10.2307/353175
- 20. Robinson MR, Kleinman A, Graff M, Vinkhuyzen AA, Couper D, Miller MB, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour. 2017;1(1):0016.
- 21. Tenesa A, Rawlik K, Navarro P, Canela-Xandri O. Genetic determination of height-mediated mate choice. Genome biology. 2015;16(1):269.
- 22. Yengo L, Robinson MR, Keller MC, Kemper KE, Yang Y, Trzaskowski M, et al. Imprint of assortative mating on the human genome. Nature Human Behaviour. 2018:300020.
- 23. Mare RD. Five decades of educational assortative mating. American Sociological Review. 1991:15–32.
- 24. Howe LJ, Lawson DJ, Davies NM, Pourcain BS, Lewis SJ, Davey Smith G, et al. Genetic evidence for assortative mating on alcohol consumption in the UK Biobank. Nature Communications. 2019. doi: 10.1038/s41467-019-12424-x
- 25. Robinson MR, Kleinman A, Graff M, Vinkhuyzen AA, Couper D, Miller MB, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour. 2017;1:0016.
- 26. Buss DM. Human mate selection: Opposites are sometimes said to attract, but in fact we are likely to marry someone who is similar to us in almost every variable. American Scientist. 1985;73(1):47–51.
- 27. Jiang Y, Bolnick DI, Kirkpatrick M. Assortative mating in animals. The American naturalist. 2013;181(6):E125–38. Epub 2013/05/15. doi: 10.1086/670160
- 28. Kromer J, Hummel T, Pietrowski D, Giani AS, Sauter J, Ehninger G, et al. Influence of HLA on human partnership and sexual satisfaction. Sci Rep. 2016;6:32550. Epub 2016/09/01. doi: 10.1038/srep32550; PubMed Central PMCID: PMC5006172.
- 29. Chaix R, Cao C, Donnelly P. Is mate choice in humans MHC-dependent? PLoS Genet. 2008;4(9):e1000184. Epub 2008/09/13. doi: 10.1371/journal.pgen.1000184; PubMed Central PMCID: PMC2519788.
- 30. Sebro R, Hoffman TJ, Lange C, Rogus JJ, Risch NJ. Testing for non-random mating: evidence for ancestry-related assortative mating in the Framingham heart study. Genet Epidemiol. 2010;34(7):674–9. Epub 2010/09/16. doi: 10.1002/gepi.20528; PubMed Central PMCID: PMC3775670.
- 31. Munafò MR, Tilling K, Taylor AE, Evans DM, Davey Smith G. Collider scope: when selection bias can substantially influence observed associations. International Journal of Epidemiology. 2017.
- 32. Catalogue of bias collaboration, Lee H, Aronson JK, D N. Collider bias. In Catalogue of Bias. 2019. Available from: https://catalogofbias.org/biases/collider-bias/.
- 33. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. International Journal of Epidemiology. 2016;45(6):1866–86. doi: 10.1093/ije/dyw314
- 34. Davey Smith G, Phillips AN. Correlation without a cause: an epidemiological odyssey. International Journal of Epidemiology. 2020;49(1):4–14. doi: 10.1093/ije/dyaa016
- 35. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203. doi: 10.1038/s41586-018-0579-z
- 36. Davies NM, Howe LJ, Brumpton B, Havdahl A, Evans DM, Davey Smith G. Within family Mendelian randomization studies. Hum Mol Genet. 2019;28(R2):R170–r9. Epub 2019/10/28. doi: 10.1093/hmg/ddz204
- 37. Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. Variable prediction accuracy of polygenic scores within an ancestry group. eLife. 2020;9. Epub 2020/01/31. doi: 10.7554/eLife.48376; PubMed Central PMCID: PMC7067566.
- 38. Pirastu N, Cordioli M, Nandakumar P, Mignogna G, Abdellaoui A, Hollis B, et al. Genetic analyses identify widespread sex-differential participation bias. Nature Genetics. 2021;53(5):663–71. doi: 10.1038/s41588-021-00846-7
- 39. Barton N, Hermisson J, Nordborg M. Why structure matters. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.45380
- 40. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.39725
- 41. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.39702
- 42. Ruby JG, Wright KM, Rand KA, Kermany A, Noto K, Curtis D, et al. Estimates of the Heritability of Human Longevity Are Substantially Inflated due to Assortative Mating. Genetics. 2018;210(3):1109–24. doi: 10.1534/genetics.118.301613
- 43. Haworth S, Mitchell R, Corbin L, Wade KH, Dudding T, Budu-Aggrey A, et al. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications. 2019;10(1):333. doi: 10.1038/s41467-018-08219-1
- 44. Kong A, Thorleifsson G, Frigge ML, Vilhjalmsson BJ, Young AI, Thorgeirsson TE, et al. The nature of nurture: Effects of parental genotypes. Science. 2018;359(6374):424–8. doi: 10.1126/science.aan6877
- 45. Young AI, Frigge ML, Gudbjartsson DF, Thorleifsson G, Bjornsdottir G, Sulem P, et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nature Genetics. 2018;50(9):1304. doi: 10.1038/s41588-018-0178-9
- 46. Hartwig FP, Davies NM, Davey Smith G. Bias in Mendelian randomization due to assortative mating. Genet Epidemiol. 2018;42(7):608–20. Epub 2018/07/05. doi: 10.1002/gepi.22138; PubMed Central PMCID: PMC6221130.
- 47. Evans DM, Moen GH, Hwang LD, Lawlor DA, Warrington NM. Elucidating the role of maternal environmental exposures on offspring health and disease using two-sample Mendelian randomization. Int J Epidemiol. 2019. Epub 2019/03/01. doi: 10.1093/ije/dyz019
- 48. Moen GH, Hemani G, Warrington NM, Evans DM. Calculating Power to Detect Maternal and Offspring Genetic Effects in Genetic Association Studies. Behav Genet. 2019. Epub 2019/01/03. doi: 10.1007/s10519-018-9944-9
- 49. Warrington NM, Freathy RM, Neale MC, Evans DM. Using structural equation modelling to jointly estimate maternal and fetal effects on birthweight in the UK Biobank. Int J Epidemiol. 2018;47(4):1229–41. Epub 2018/02/16. doi: 10.1093/ije/dyy015; PubMed Central PMCID: PMC6124616.
- 50. Beaumont RN, Warrington NM, Cavadino A, Tyrrell J, Nodzenski M, Horikoshi M, et al. Genome-wide association study of offspring birth weight in 86 577 women identifies five novel loci and highlights maternal genetic effects that are independent of fetal genetics. Hum Mol Genet. 2018;27(4):742–56. Epub 2018/01/09. doi: 10.1093/hmg/ddx429; PubMed Central PMCID: PMC5886200.
- 51. Muñoz M, Pong-Wong R, Canela-Xandri O, Rawlik K, Haley CS, Tenesa A. Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. Nature Genetics. 2016. doi: 10.1038/ng.3618
- 52. Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. American journal of human genetics. 1996;59(5):983.
- 53. Cordell HJ, Clayton DG. Genetic association studies. The Lancet. 2005;366(9491):1121–31. doi: 10.1016/S0140-6736(05)67424-7
- 54. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003;32(1):1–22. doi: 10.1093/ije/dyg070
- 55. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine. 2015;12(3):e1001779. doi: 10.1371/journal.pmed.1001779
- 56. Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533(7604):539. doi: 10.1038/nature17671
- 57. Denaxas SC, George J, Herrett E, Shah AD, Kalra D, Hingorani AD, et al. Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). International Journal of Epidemiology. 2012;41(6):1625–38. doi: 10.1093/ije/dys188
- 58. O’Connell J, Sharp K, Shrine N, Wain L, Hall I, Tobin M, et al. Haplotype estimation for biobank-scale data sets. Nature Genetics. 2016;48(7):817–20. doi: 10.1038/ng.3583
- 59. UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature. 2015;526(7571):82–90. doi: 10.1038/nature14962
- 60. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48(10):1279. doi: 10.1038/ng.3643
- 61. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393
- 62. Allen NE, Sudlow C, Peakman T, Collins R. UK biobank data: come and get it. American Association for the Advancement of Science; 2014.
- 63. Nikpay M, Goel A, Won H-H, Hall LM, Willenborg C, Kanoni S, et al. A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease. Nature Genetics. 2015;47(10):1121. doi: 10.1038/ng.3396
- 64. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197. doi: 10.1038/nature14177
- 65. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46(11):1173. doi: 10.1038/ng.3097
- 66. Ehret GB, Ferreira T, Chasman DI, Jackson AU, Schmidt EM, Johnson T, et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nature Genetics. 2016;48(10):1171. doi: 10.1038/ng.3667
- 67. Loh P-R, Tucker G, Bulik-Sullivan BK, Vilhjalmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nature Genetics. 2015;47(3):284–90. doi: 10.1038/ng.3190
- 68. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81(3):559–75. doi: 10.1086/519795
- 69. Palla L, Dudbridge F. A fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. The American Journal of Human Genetics. 2015;97(2):250–9. doi: 10.1016/j.ajhg.2015.06.005
- 70. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37(7):658–65. doi: 10.1002/gepi.21758
- 71. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003;326(7382):219. doi: 10.1136/bmj.326.7382.219