PLOS Genetics. 2021 Nov 4;17(11):e1009883. doi: 10.1371/journal.pgen.1009883

Assortative mating and within-spouse pair comparisons

Laurence J Howe 1,2,*, Thomas Battram 1,2, Tim T Morris 1,2, Fernando P Hartwig 1,3, Gibran Hemani 1,2, Neil M Davies 1,2,4, George Davey Smith 1,2
Editor: Samuli Ripatti 5
PMCID: PMC8594845  PMID: 34735433

Abstract

Spousal comparisons have been proposed as a design that can both reduce confounding and estimate effects of the shared adulthood environment. However, assortative mating, the process by which individuals select phenotypically (dis)similar mates, could distort associations when comparing spouses. We evaluated the use of spousal comparisons, as in the within-spouse pair (WSP) model, for aetiological research such as genetic association studies. We demonstrated that the WSP model can reduce confounding but may be susceptible to collider bias arising from conditioning on assorted spouse pairs. Analyses using UK Biobank spouse pairs found that WSP genetic association estimates were smaller than estimates from random pairs for height, educational attainment, and BMI variants. Within-sibling pair estimates, robust to demographic and parental effects, were also smaller than random pair estimates for height and educational attainment, but not for BMI. WSP models, like other within-family models, may reduce confounding from demographic factors in genetic association estimates, and so could be useful for triangulating evidence across study designs to assess the robustness of findings. However, WSP estimates should be interpreted with caution due to potential collider bias.

Author summary

There is growing evidence that genome-wide association studies capture associations relating to environmental factors, such as indirect effects from parental genotypes. Within-family models such as sibling comparisons can be used to disentangle these different sources of association but are limited by the paucity of sibling data in large biobanks. Within-spouse pair models are potentially tractable because spouses share environmental factors in adulthood and may also share early-life environmental factors. Here, we evaluated the application of within-spouse pair models in genetic association studies, specifically considering assortative mating, a phenomenon whereby individuals may select a phenotypically similar partner. We found that within-spouse pair models can detect genuine confounding in genetic association estimates but are potentially susceptible to collider bias induced by comparing assorted pairs. Within-spouse pair estimates could be useful when combining evidence from different study designs.

Introduction

Within-sibship models have been widely used in genetic association studies for many decades [1–6]. Genotypic differences between siblings are the consequence of random segregation at meiosis, rather than parental or ancestral differences, and so within-sibship genetic association models control for demography (assortative mating, population stratification) and indirect genetic effects of parents [4,5,7]. However, there is a paucity of genetic data from siblings, with limited availability for many phenotypes. Furthermore, while siblings are matched on the early-life environment, their environments in adulthood, when many phenotypes are measured, may differ.

In contrast, spouses may have different early-life environments but are likely to share an environment for much of adulthood while cohabiting [8]. This may act to increase phenotypic similarity, such as for behavioural (e.g., physical activity and alcohol use) or personality traits [9,10]. The shared adulthood environment between spouses has prompted their use in a variety of contexts in genetic and epidemiological research using a model that we refer to as the “within-spouse pair” (WSP) model. The WSP model involves modelling the similarities and differences of spouses, either by analysing the differences between each pair or by modelling spousal relationships as a covariate in a fixed-effect model. For example, previous studies have used the WSP model to estimate phenotypic variance explained by the shared adulthood environment [11–17]. The WSP model has also been proposed as an approach to reduce confounding in aetiological research, with environmental confounders likely to be strongly correlated between spouses [18]. Here, we describe the strengths and limitations of within-spouse pair (WSP) designs for genetic studies.

A caveat of the WSP model is that spousal similarities are not just consequences of sharing an adulthood environment. There is evidence that for some phenotypes, spouses do not become much more similar during a relationship [19]. Another cause of spousal similarities is assortative mating, a phenomenon where humans are generally more likely to select a phenotypically similar [9,10,20–26] or, in some instances [27], dissimilar [28,29] mate. For example, height and years of schooling are often fixed prior to partnership formation, suggesting that spousal similarities for these phenotypes reflect assortment rather than effects of the shared adulthood environment. Furthermore, geographical, ancestral, and cultural factors often have strong influences on both phenotypic variation and partner selection patterns, as illustrated by the ancestral similarities of spouses [30]. Therefore, some degree of spousal phenotypic similarity is likely to be explained by spousal assortment on factors not typically defined as phenotypes, such as place of birth or religion.

The WSP model may be susceptible to collider bias, which can occur when conditioning on a variable which is influenced by two or more upstream factors. Collider bias can induce spurious associations between these factors where the collider variable is conditioned on in analysis, either by analytical model design or by sample selection. For example, if a school grants scholarships to individuals who are exceptional either at sport or academically, then sporting and academic ability will be negatively correlated amongst individuals with scholarships. Similarly, spousal samples by definition condition on spousal compatibility, a pairwise measure of how likely two individuals are to enter a relationship. If several phenotypes influence spousal compatibility, then collider bias could potentially arise in the WSP model [31–33]. For example, if similarities for age and educational attainment influence compatibility, then spouses with larger age differences are more likely to have similar educational attainment (Fig 1). Previous spousal studies have acknowledged assortative mating, but whether assortment could distort WSP comparisons has not been investigated in detail. For example, the possibility of collider bias has been little discussed. We aimed to investigate the utility of the WSP model in genetic epidemiology and assess its robustness to collider bias.
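The scholarship example can be checked numerically. The following is a minimal sketch, not part of the original analysis: sporting and academic ability are simulated as independent standard normal variables, and the selection cutoff of 1.5 and the function names are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_scholarships(n=50_000, cutoff=1.5, seed=1):
    """Return (correlation in everyone, correlation among scholarship holders)."""
    rng = random.Random(seed)
    sport = [rng.gauss(0, 1) for _ in range(n)]
    academic = [rng.gauss(0, 1) for _ in range(n)]  # independent of sport by construction
    # The scholarship is the collider: it is awarded for exceptional
    # sporting ability OR exceptional academic ability (assumed cutoff).
    holders = [(s, a) for s, a in zip(sport, academic) if s > cutoff or a > cutoff]
    r_all = pearson(sport, academic)
    r_holders = pearson([s for s, _ in holders], [a for _, a in holders])
    return r_all, r_holders


r_all, r_holders = simulate_scholarships()
print(f"whole population:         r = {r_all:.3f}")
print(f"scholarship holders only: r = {r_holders:.3f}")
```

Restricting to scholarship holders is a form of conditioning on the collider, so the two abilities become negatively correlated in that subsample even though they are independent in the full population.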

Fig 1. Causal diagram illustrating collider bias in within-spouse pair and within-sibship models.

A) To illustrate collider bias in the context of spouses, consider a model with age and educational attainment (E), which are assumed here to be independent. Assuming that spouses assort on similarities for age and education, it follows that spousal assortment (A) is a common effect of age and education similarities. In a within-spouse pair model, adjusting or accounting for A would induce associations between age and education similarities. For example, if a spouse pair has a large difference in age, then the pair is more likely to be similar for education. B) In contrast, for a within-sibship model, it is less plausible that age and education influence the sibling’s family F, as age and education are post-birth phenotypes. Therefore, adjusting for F is unlikely to induce an association between age and education.

We used causal diagrams (allowing double-headed arrows signifying correlated variables that may be influenced by variables outside the model [34]) and simulated data to illustrate two important characteristics of the WSP model. First, the WSP model can reduce confounding if spouses are correlated for confounders. Second, the WSP model is susceptible to collider bias induced by conditioning on spousal compatibility. We then applied the WSP model using 47,435 spouse-pairs in UK Biobank [35] to estimate associations between genetic variants and phenotypes (e.g. height). We then estimated effect size shrinkage (% decrease) in the WSP estimates compared to within-pair estimates from random non-assorted pairs, which were derived by reordering the spouse-pair sample. For comparison, we also estimated within-sibship shrinkage using 19,523 sibling pairs from UK Biobank. Finally, as a negative control analysis, we used the WSP model to estimate the effects of age on systolic blood pressure (SBP) and coronary artery disease (CAD).

Results

Within-spouse pair model: Assortative mating, spousal correlations and collider bias

Here, we present results from simulations evaluating the WSP model under assortative mating. In the first simulation model (A), the relationship between an exposure and an outcome is confounded by an unmeasured factor. Spouses are positively correlated for the unmeasured confounder, either because of assortative mating or because of shared environmental factors during cohabitation (Fig 2A). Simulations demonstrated that under this model, WSP estimates of the effect of the exposure on the outcome are less biased and converge to the simulated unbiased estimate as the spousal correlation for the confounder tends to 1 (S1 Fig and S1 Table).
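The behaviour of model (A) can be sketched with a small simulation. This is an illustrative toy model rather than the simulation used in the study: the confounder, exposure noise and outcome noise all have unit variance, the causal effect `beta` is set to 0.3, and `rho` is the spousal correlation for the confounder. Under these assumptions the population estimate is biased by 0.5, and the within-pair estimate is biased by (1 − rho) / (2 − rho), which tends to 0 as rho tends to 1.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_model_a(rho, beta=0.3, n=40_000, seed=2):
    """Return (population slope, within-spouse pair slope) for n simulated couples."""
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n):
        # Unmeasured confounder E, correlated rho between the two spouses.
        e_m = rng.gauss(0, 1)
        e_f = rho * e_m + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        pair = []
        for e in (e_m, e_f):
            x = e + rng.gauss(0, 1)             # exposure, confounded by E
            y = beta * x + e + rng.gauss(0, 1)  # outcome, confounded by E
            pair.append((x, y))
            x_all.append(x)
            y_all.append(y)
        # WSP model: each couple contributes one pairwise difference.
        dx.append(pair[0][0] - pair[1][0])
        dy.append(pair[0][1] - pair[1][1])
    return slope(x_all, y_all), slope(dx, dy)


for rho in (0.0, 0.5, 0.9):
    pop, wsp = simulate_model_a(rho)
    print(f"rho = {rho:.1f}: population = {pop:.3f}, WSP = {wsp:.3f}, truth = 0.300")
```

With `rho = 0` the within-pair estimate is as biased as the population estimate; the within-pair bias falls as the spouses become more alike on the confounder.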

Fig 2. Causal diagrams of simulated models for assortative mating, spousal correlations, and collider bias.

The WSP design uses pairwise spousal differences (e.g. X_M1 − X_F1 and Y_M1 − Y_F1) in regression models, fitting each spouse pair as a single observation. A) Within-spouse pair: spousal correlations for confounders. Exposure X; Outcome Y; Unmeasured confounder E; Spousal assortment A; WSP exposure X* (X* = X_M − X_F); WSP outcome Y* (Y* = Y_M − Y_F); WSP environmental confounder (the non-shared portion of the set of confounders) E* (E* = E_M − E_F). This figure illustrates the effect of an exposure on an outcome in the presence of an unmeasured confounder. Here, spousal pairing is determined by an assortment variable correlated with the confounder (indicated by A, a child of the confounder E). It follows that the values of spouses’ confounders will be correlated. In this example, a WSP model will reduce simulated bias in the estimate of the effect of X on Y (S1 Fig). Here we assume that spousal correlations for the confounder reflect assortment, but in practice they could also relate to the shared spousal environment. B) Within-spouse pair: assortative mating and collider bias. Exposures X1, X2; Outcome Y; Spousal assortment A; WSP exposures X1*, X2* (Xi* = Xi_M − Xi_F); WSP outcome Y* (Y* = Y_M − Y_F). This figure illustrates the effect of an exposure on an outcome when two otherwise independent exposures influence both the outcome and spousal assortment. It follows that associations will be present in the WSP model between the two exposures, which will distort the WSP estimated effect of the exposure on the outcome. We quantify the effect of potential collider bias in the WSP model at different levels of assortment on the two exposures. Dashed lines indicate associations induced by spousal assortment.

In the second simulation model (B), two independent exposures influence an outcome. These exposures could be two phenotypes, a phenotype and a genetic score, or two independent genetic scores such as height genetic scores constructed from odd and even chromosomes. Since there is assortment on the two exposures, assortment acts as a collider, which induces associations between variables that would otherwise be independent in the population. For example, in the WSP model, positive assortment on height and educational attainment phenotypes could induce a negative correlation between height and educational attainment genetic scores. The strength of this spurious correlation will depend on the underlying data generating process and the degree of assortment on the exposures. Indeed, assortment on height across multiple generations has resulted in positive correlations between height increasing genetic variants on different chromosomes [22]. These correlations could lead to bias in WSP estimates of either exposure on the outcome. However, WSP estimates will only be affected by collider bias if both exposures influence the outcome and if the effects of the exposures on the collider are not perfectly multiplicative (Fig 2B).

Simulations showed that the degree of bias in the effect estimate is a function of the degree of assortment on the two exposures, with more bias when spouses assort strongly on both traits. For example, under this model and using plausible assortment estimates for educational attainment (spousal phenotypic correlation: 0.5) and height (spousal phenotypic correlation: 0.2) [25], the expected bias would be around 13% when estimating the effect of education on a trait which is also influenced by height. If both exposures have the same direction of effect on the outcome and assortment, then the WSP estimate would be biased downwards (S1 Fig and S2 Table).
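One way such bias could arise is sketched below. This is a toy mechanism chosen for illustration and is not the data-generating process used in the study: each person has a hypothetical matching score equal to the sum of the two exposures plus noise, and couples are formed by pairing men and women of equal rank on that score. All effect sizes and variances are assumptions.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def pearson(xs, ys):
    """Pearson correlation, computed from the two regression slopes."""
    b_yx, b_xy = slope(xs, ys), slope(ys, xs)
    sign = 1 if b_yx >= 0 else -1
    return sign * (b_yx * b_xy) ** 0.5


def simulate_model_b(assort, n=20_000, b1=1.0, b2=1.0, seed=3):
    """Return (WSP slope of Y on X1, correlation of X1 and X2 pair differences)."""
    rng = random.Random(seed)

    def person():
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent exposures
        y = b1 * x1 + b2 * x2 + rng.gauss(0, 1)    # both influence the outcome
        score = x1 + x2 + rng.gauss(0, 1)          # hypothetical matching score
        return x1, x2, y, score

    men = [person() for _ in range(n)]
    women = [person() for _ in range(n)]
    if assort:
        # Assortment: pair the k-th ranked man with the k-th ranked woman.
        men.sort(key=lambda p: p[3])
        women.sort(key=lambda p: p[3])
    dx1 = [m[0] - w[0] for m, w in zip(men, women)]
    dx2 = [m[1] - w[1] for m, w in zip(men, women)]
    dy = [m[2] - w[2] for m, w in zip(men, women)]
    return slope(dx1, dy), pearson(dx1, dx2)


for label, flag in (("random pairs", False), ("assorted pairs", True)):
    est, r = simulate_model_b(flag)
    print(f"{label}: WSP estimate for X1 = {est:.2f} (truth 1.00), corr(dX1, dX2) = {r:.2f}")
```

In this sketch the pairwise differences in the two exposures become negatively correlated among assorted couples, so the within-pair estimate for X1 is biased downwards when both exposures affect the outcome in the same direction. The size of the effect here reflects the deliberately strong assortment in the toy model and should not be read as an estimate for any real trait.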

Empirical analyses using spouse pairs in UK Biobank

Within-spouse pair: genetic and phenotypic associations

Although the WSP model is susceptible to collider bias, the model could be useful in re-calibrating associations that might be biased due to confounding. Genetic data are particularly useful for evaluating aetiological models because genotypes are measured accurately and fixed from conception, largely removing the possibility of reverse causation. Here we aimed to evaluate if WSP genetic association estimates are less confounded than estimates from population studies of unrelated individuals.

Estimates of genetic associations using unrelated individuals can be distorted by demography (e.g. assortative mating, fine-scale population structure) and indirect effects of parents [6,7,36]. WSP estimates may be less affected by these sources of association, particularly population structure, because of environmental and ancestral similarities between spouses [30].

Using 47,435 spouse-pairs from UK Biobank (S3 Table and S2, S3 and S4 Figs), previously derived using household sharing information [24], we first explored the extent to which spouse and sibling pairs are correlated for the first 10 principal components and birth coordinates (north-south, east-west) to inform the extent of pairwise spousal ancestral similarities. Spouse pairs were correlated for both birth coordinates and the first 10 principal components with correlations ranging from 0.10 for PC6 to 0.32 for PC4 across the principal components and strong correlations observed for both north-south (0.62; 95% C.I. 0.61, 0.62) and east-west (0.46; 95% C.I. 0.46, 0.47) birth coordinates. As expected, sibling pairs were very strongly correlated for birth coordinates and the first 10 principal components with correlations ranging from 0.74 for PC10 to 0.98 for PC4 (S4 Table). The spousal correlations for birth coordinates and principal components illustrate how assortative mating and social homogamy induce ancestral similarities between spouses.

We then estimated the effects of genetic variants on six different traits using the WSP design and, for comparison, using unrelated non-spouse pairs. We then estimated the shrinkage (% attenuation) from the non-spouse pair genetic association estimates to the WSP estimates. We also applied the same approach to a sample of 19,523 sibling pairs (i.e. the within-sibship model). Within-sibship models are a gold-standard within-family design for estimating genetic associations because they control for demographic and parental effects [4,6,36,37]. Comparisons between WSP and within-sibship shrinkage estimates would provide insight into the accuracy of WSP genetic association estimates.
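The shrinkage comparison can be sketched as follows. This is a simplified illustration on synthetic data, not the study's estimation procedure: shrinkage is assumed here to be 100 × (1 − WSP estimate / random-pair estimate), random pairs are formed by a single shuffle of partners, and the toy data generator (a couple-level background factor that raises both a genetic score and the phenotype) is hypothetical.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def pair_difference_slope(g_a, y_a, g_b, y_b):
    """Regress pairwise phenotype differences on pairwise genotype differences."""
    dg = [a - b for a, b in zip(g_a, g_b)]
    dy = [a - b for a, b in zip(y_a, y_b)]
    return slope(dg, dy)


def shrinkage_percent(g_m, y_m, g_f, y_f, seed=0):
    """Percentage decrease from the random-pair estimate to the WSP estimate."""
    beta_wsp = pair_difference_slope(g_m, y_m, g_f, y_f)
    # Random pairs: reorder the partners so couples are broken up.
    # (A shuffle can leave a few true couples intact; negligible for large n.)
    order = list(range(len(g_f)))
    random.Random(seed).shuffle(order)
    g_r = [g_f[i] for i in order]
    y_r = [y_f[i] for i in order]
    beta_random = pair_difference_slope(g_m, y_m, g_r, y_r)
    return 100 * (1 - beta_wsp / beta_random)


def toy_couples(n=20_000, seed=4):
    """Synthetic couples sharing a background factor c (illustrative only)."""
    rng = random.Random(seed)
    g_m, y_m, g_f, y_f = [], [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)  # shared by both spouses
        for g_list, y_list in ((g_m, y_m), (g_f, y_f)):
            g = c + rng.gauss(0, 1)
            g_list.append(g)
            y_list.append(0.5 * g + c + rng.gauss(0, 1))
    return g_m, y_m, g_f, y_f


print(f"shrinkage: {shrinkage_percent(*toy_couples()):.0f}%")
```

In the toy data the shared factor cancels within couples but not within random pairs, so the random-pair estimate is inflated and the shrinkage is large. In practice the reordering would be repeated many times and the estimates summarised, as in the random-pair column of Table 2.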

We found strong evidence of smaller effect sizes in the WSP model for height (shrinkage: 19%; 95% C.I. 17%, 22%), educational attainment (shrinkage: 72%; 95% C.I. 64%, 79%) and BMI (shrinkage: 16%; 95% C.I. 6%, 25%). In contrast, there was limited evidence of shrinkage for SBP and CAD variants. The within-sibship analysis provided strong evidence of shrinkage for height (shrinkage: 15%; 95% C.I. 11%, 20%) and educational attainment (shrinkage: 53%; 95% C.I. 35%, 71%) variants, but limited evidence for BMI variants (shrinkage: 5%; 95% C.I. -12%, 22%). WSP shrinkage estimates were generally higher than within-sibship estimates, but imprecision prevented stronger conclusions regarding heterogeneity. Including principal components in the random-pair models did not greatly affect results except in the alcohol analysis, where there was only evidence for shrinkage in the unadjusted models. This suggests that population stratification is unlikely to entirely explain the observed shrinkage in these estimates (Table 1). We note that the alcohol analysis included only a single SNP, which is known to be strongly associated with population structure in UK Biobank [24].

Table 1. Estimates of genetic association shrinkage from within-spouse pair and within-sibship models.
| Phenotype | Number of SNPs | Covariates | Within-spouse pair shrinkage: % (95% C.I.) | Within-sibship shrinkage: % (95% C.I.) | Heterogeneity P for spouse and sibling shrinkage estimates |
|---|---|---|---|---|---|
| Height | 381 | No PC | 19% (17%, 22%) | 15% (11%, 20%) | 0.18 |
| Height | 381 | PC1-10 | 17% (14%, 20%) | 13% (8%, 18%) | 0.19 |
| Educational attainment | 69 | No PC | 72% (64%, 79%) | 53% (35%, 71%) | 0.06 |
| Educational attainment | 69 | PC1-10 | 71% (62%, 79%) | 51% (33%, 70%) | 0.06 |
| Body mass index | 68 | No PC | 16% (6%, 25%) | 5% (-12%, 22%) | 0.28 |
| Body mass index | 68 | PC1-10 | 16% (6%, 25%) | 5% (-12%, 22%) | 0.28 |
| Coronary artery disease | 41 | No PC | -4% (-23%, 15%) | -1% (-36%, 34%) | 0.90 |
| Coronary artery disease | 41 | PC1-10 | -4% (-23%, 16%) | -1% (-35%, 34%) | 0.83 |
| Systolic blood pressure | 242 | No PC | 0% (-7%, 8%) | 5% (-7%, 18%) | 0.53 |
| Systolic blood pressure | 242 | PC1-10 | 0% (-8%, 7%) | 5% (-7%, 18%) | 0.50 |
| Alcohol consumption | 1 (A) | No PC | 29% (14%, 43%) | 20% (-20%, 59%) | 0.40 |
| Alcohol consumption | 1 (A) | PC1-10 | 14% (-5%, 33%) | 4% (-46%, 54%) | 0.42 |

A: rs1229984 in ADH1B
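As a rough guide to how two shrinkage estimates can be compared, the sketch below applies a standard two-sample z-test to the point estimates and confidence intervals printed in Table 1. This is an assumption about a reasonable approach, not a statement of how the heterogeneity P values in the table were derived: it treats the spouse and sibling estimates as independent and approximately normal, and recovers standard errors from the rounded 95% intervals, so results will only approximate the published values.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def heterogeneity_p(est1, ci1, est2, ci2):
    """Two-sided P value for the difference between two independent estimates."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail probability of a standard normal deviate.
    return math.erfc(abs(z) / math.sqrt(2))


# Educational attainment, no principal components (values from Table 1).
p_education = heterogeneity_p(72, (64, 79), 53, (35, 71))
print(f"educational attainment: P = {p_education:.2f}")
```

Because the inputs are rounded to whole percentages, small discrepancies from the tabulated P values are expected.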

We compared the within-sibship and WSP shrinkage estimates from this study (using UK Biobank data only) to within-sibship shrinkage estimates from a recent within-sibship GWAS of 17 cohorts [5], which included more than four times as many siblings as this study. The within-sibship shrinkage estimates from the multi-cohort GWAS were highly consistent with the within-sibship shrinkage estimates from UK Biobank only but were much more precise. The multi-cohort within-sibship shrinkage estimates were smaller than the WSP shrinkage estimates for height, BMI, and educational attainment, with non-overlapping confidence intervals, providing some evidence that WSP shrinkage is larger than within-sibship shrinkage for these phenotypes (S5 Table).

Within-spouse pair: age, SBP and CAD

As a negative control analysis, we next used the WSP model to estimate the effects of increasing age on outcomes known to be related to age (CAD and SBP), using random pair estimates for comparison. Age cannot be influenced by other phenotypes, so analyses are unlikely to be susceptible to reverse causation or confounding. However, collider bias in the WSP model with age is plausible because spousal compatibility is influenced by age similarities. For example, couples with large age differences may differ systematically from couples with smaller age differences. It follows that differences between WSP and random pair estimates (with age as the exposure) are likely to reflect collider bias. This is a similar premise to autosomal GWAS of sex, where genetic associations are likely to reflect participation bias because autosomal genetic variation cannot influence sex [38].

Pairwise age differences were found to be greater between random pairs, consistent with individuals preferring a partner of a similar age. We did not find strong evidence for differences in age effect estimates on CAD and SBP between the spouse and random pair samples suggesting that any collider bias effects are modest in this context (Table 2).

Table 2. Within-spouse pair estimates of the effect of age on SBP and CAD.
| Phenotype | Spouse-pairs (N = 47,435) | Random pairs (N = 47,435): median estimate from 100 simulations |
|---|---|---|
| Average age difference (years); median (Q1, Q3) | 2.0 (1.0, 4.0) | 7.0 (3.0, 13.0) |
| Systolic blood pressure (change in mmHg per 1-year increase in age; 95% C.I.) | 0.74 (0.69, 0.80) | 0.80 (0.78, 0.83) |
| Coronary artery disease (OR per 1-year increase in age; 95% C.I.) | 1.05 (1.04, 1.05) | 1.05 (1.04, 1.05) |

All analyses were adjusted for sex of the index individual.

Discussion

In this study, we used causal diagrams, simulations, and empirical data to evaluate the use of the WSP model in genetic epidemiology. We showed that the WSP model can account for unmeasured confounding if spouses are correlated for the confounder but that comparing assorted spouses can induce collider bias. Using empirical data, we found evidence that genetic association estimates for height, educational attainment, and BMI shrink in the WSP model when compared to a within-pair model using random individuals.

Within-sibship models in UK Biobank, which control for demographic and parental effects [6,36], also provided evidence of shrinkage for height and educational attainment variants but not for BMI, consistent with previous studies [4,5,37]. WSP shrinkage point estimates for height, BMI and education were larger than the UK Biobank within-sibship shrinkage estimates although confidence intervals overlapped. However, there was strong statistical evidence that WSP shrinkage is greater than within-sibship shrinkage for height, BMI and educational attainment when using more precise within-sibship shrinkage estimates from a recent within-sibship GWAS [5]. The consistent evidence of shrinkage between the two models for height and education suggests that WSP models may be removing associations relating to demography.

Simulated data illustrated that if spouses assort on a confounder of the exposure and outcome, then the WSP association provides a less biased estimate of the causal effect than a conventional model unadjusted for the confounder. An example of a potential confounder in genetic association studies is ancestry, which we showed to be more correlated between spouses than for non-spouse pairs by illustrating birth coordinate and principal component correlations between spouses. However, we note that including principal components as covariates did not greatly affect shrinkage estimates except for alcohol consumption where, as noted earlier, the single variant used is known to be strongly associated with population structure [24]. The WSP shrinkage estimates being higher than the sibling estimates suggests that the shrinkage cannot be explained by adjustment for confounding alone. If the only source of shrinkage is adjustment for confounding, then WSP shrinkage estimates should be smaller than within-sibship estimates because spouse models are unlikely to fully control for demographic or family-level (e.g. parental nurture) effects. Collider bias induced by comparing assorted pairs is one potential explanation for the observed WSP shrinkage.

Collider bias could contribute to the observed shrinkage depending on the interactive model between the colliding effects and the degree of assortment. Assuming a linear additive model, collider bias is likely to shrink rather than inflate genetic associations because assortment would induce negative correlations in the WSP model between factors influencing the assorted trait in the same direction. For example, assortment could induce negative correlations between height-increasing genetic variants and height-increasing environmental factors in the WSP model, leading to shrinkage when estimating WSP genetic associations. This is in contrast to the population-level effects of assortative mating, which inflate associations because of induced positive correlations between trait-increasing variants on different chromosomes [22]. However, in the negative control example of age on health outcomes, we found little evidence of collider bias; within-pair effect estimates of age on CAD and SBP were consistent between spouse and non-spouse pair samples. Another potential source of WSP shrinkage is the spousal environment. The spousal environment could be influenced by individuals’ genotypes, leading to reduced spousal phenotypic differences. For example, if an individual has high genetic liability to increased alcohol consumption, this could lead to their partner consuming similar amounts of alcohol independent of the partner’s own genotype.

A key implication of these analyses is that spousal similarities and differences are not necessarily random or attributable solely to the shared adulthood environment. WSP similarities are likely to reflect a combination of social homogamy, assortative mating and the shared adulthood environment. Amidst growing evidence that genetic epidemiological studies can capture effects of fine-scale population structure, parental nurture and assortative mating [4,6,39–46], there is considerable interest in using genotype data from pedigrees to more accurately estimate direct genetic effects and trait heritability as well as to explore parental effects on offspring phenotypes [11–14,39–41,44,45,47–51]. Family designs such as the transmission disequilibrium test [52] and within-sibship models are protected from many of these biases by random segregation at meiosis [53,54]. In contrast, inferences from spousal analyses are not as robust, so it is important to understand and model the assortment in spousal designs. A further implication is that assortative mating is likely to contribute to the phenotypic and genetic structure of epidemiological studies. Large studies such as UK Biobank frequently and incidentally sample participants who are partnered with another study participant [24]. The non-randomness of study participation in UK Biobank has been previously discussed as a possible cause of selection (participation) bias [31]. Our findings illustrate that assortative mating is likely to contribute to the non-random distribution of phenotypes (and genotypes) in population biobanks.

Our study has several important limitations. First, as described in our previous study [24], derived spouse-pairs were identified using household sharing information so may be susceptible to a degree of classification error with non-spouse pairs being incorrectly identified as spouses. Second, the mechanisms by which spouses jointly participate in UK Biobank may have induced selection bias into empirical analyses as these pairs could be more similar than pairs that did not jointly participate. Third, given that the exact mechanisms of assortment are not widely understood, our simulations and assumptions may not accurately capture the mechanisms underlying spousal assortment. In simulations we assumed that factors influencing assortment are independent across the population but in practice, factors influencing assortment are often correlated (e.g. height and education). Future research could use more complex simulations to evaluate models that can distinguish the effects of social homogamy, migration and measurement error. Fourth, it is important to note that educational attainment as defined by qualifications when study participants are aged over 40 will also capture individuals with degrees obtained during adulthood, suggesting that educational similarities could also plausibly relate to the shared adulthood environment.

To conclude, the WSP model can reduce confounding from environmental factors but may also be susceptible to collider bias. An empirical example using genetic associations suggested that WSP estimates may be downwardly biased. In contrast, WSP estimates for effects of age did not seem to be affected by collider bias. An advantage of WSP models is that they may have increased power for genetic studies relative to other within-family designs because (non-consanguineous) spouses are less likely than first-degree relatives to share long segments of the genome identical by descent. The WSP model could be a complementary orthogonal design to other within-family models when triangulating evidence from different study designs [33].

Methods

Data sources

UK Biobank

Study description

UK Biobank is a large-scale prospective cohort study which sampled 503,325 individuals aged 38–73 years at baseline, recruited between 2006 and 2010 from across the United Kingdom. The cohort has been described in detail previously [35,55]. For the purposes of this study, we used two subsamples of the cohort: spouse-pairs [24] and full-sibling pairs [6].

Potential spouses were identified using household sharing information in a previous publication [24]. We started with a European subsample of UK Biobank, consisting of 463,827 individuals based on a k-means cluster analysis on the first 4 genetic principal components. We then used phenotype data to extract pairs of individuals who reported (a) living with their spouse (field ID: 6141–0.0), (b) the same length of time living in the house (field ID: 699–0.0), (c) the same number of occupants in the household (field ID: 709–0.0), (d) the same number of vehicles (field ID: 728–0.0), (e) the same accommodation type and rental status (field IDs: 670–0.0, 680–0.0) and (f) identical home coordinates (rounded to the nearest km) (field IDs: 20074–0.0, 20075–0.0), and who (g) were registered with the same UK Biobank recruitment centre (field ID: 54–0.0) and (h) both had available genotype data. We considered pairs with identical information across all household variables as putative spouses. When more than two individuals shared identical information (observed in 18,145 instances), these individuals were removed. 53 closely related pairs (IBD > 0.1) were identified and removed using a genetic relationship matrix. We excluded 4,866 potential couples who were the same sex (9.3% of the sample) as they were deemed to be more likely to be false positives and because of possible heterogeneity in same-sex assortment patterns. The original paper identified 47,549 male-female pairs believed to be cohabiting spouses. In this study, we used an updated version of the genetic data after removing individuals who had opted out of the study, resulting in a slightly reduced sample of 47,435 complete pairs.
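The household-matching logic can be sketched as below. This is an illustration of the grouping rules described above, not the original derivation code: the record layout, the field names in `HOUSEHOLD_KEYS` and the `person` helper are hypothetical, and the relatedness filter (IBD > 0.1) is omitted because it requires genetic data.

```python
from collections import defaultdict

# Hypothetical names for the household variables that must match exactly.
HOUSEHOLD_KEYS = (
    "years_at_address", "n_occupants", "n_vehicles", "accommodation_type",
    "rental_status", "home_east_km", "home_north_km", "centre",
)


def person(pid, sex, **overrides):
    """Build a toy participant record with default household values."""
    record = {
        "id": pid, "sex": sex, "lives_with_spouse": True, "has_genotypes": True,
        "years_at_address": 20, "n_occupants": 2, "n_vehicles": 1,
        "accommodation_type": "house", "rental_status": "own",
        "home_east_km": 358, "home_north_km": 173, "centre": "Bristol",
    }
    record.update(overrides)
    return record


def putative_spouse_pairs(people):
    """Return (id, id) tuples for opposite-sex pairs with identical household data."""
    households = defaultdict(list)
    for p in people:
        if p["lives_with_spouse"] and p["has_genotypes"]:
            households[tuple(p[k] for k in HOUSEHOLD_KEYS)].append(p)
    pairs = []
    for members in households.values():
        if len(members) != 2:
            continue  # unmatched individuals, or more than two sharing all values
        a, b = members
        if a["sex"] == b["sex"]:
            continue  # same-sex pairs were excluded in the study
        pairs.append((a["id"], b["id"]))
    return pairs


cohort = [
    person(1, "M"), person(2, "F"),                            # kept
    person(3, "M", n_vehicles=3), person(4, "F", n_vehicles=3),
    person(5, "F", n_vehicles=3),                              # three-way match: removed
    person(6, "M", centre="Leeds"), person(7, "M", centre="Leeds"),  # same sex: removed
]
print(putative_spouse_pairs(cohort))
```

Grouping on the full tuple of household variables means that any single mismatch separates two individuals, which mirrors the requirement for identical information across all variables.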

Full-sibling relationships were derived using UK Biobank provided estimates of pairwise identical by state (IBS) kinships (>0.5–21*IBS0, <0.7) and IBS0 (>0.001, <0.008), the proportion of unshared loci [6]. This approach identified 40,275 siblings from 19,523 families. For the purposes of within-sibship analyses, we restricted the sample to 2 siblings from each family, selecting siblings at random. The analysis sample included 39,046 individuals from 19,523 families.

Phenotype data

At baseline, the height of study participants was measured using a Seca 202 device at the assessment centre (field ID: 12144–0.0); body mass index was derived manually from measures of standing height and weight (field ID: 21001–0.0); and systolic blood pressure was measured using an automated reading from an Omron Digital blood pressure monitor (field ID: 4080–0.0). Educational attainment was defined as in a previous study [56], using questionnaire data on qualifications to estimate the number of years spent in full-time education (field ID: 6138). Coronary artery disease cases were identified using International Classification of Disease (10th edition) (ICD10) and Operating Procedure System (OPS) codes from either hospital events (Hospital Episode Statistics) or underlying cause of death from the death register. The following ICD10 (I21, I22, I23, I24, I25, Z955) and OPS codes (K40-K46, K471, K49, K50, K75) [57] were used to classify cases. North-south (field ID: 129) and east-west (field ID: 130) birth coordinates were derived from self-reported town of birth.

Alcohol consumption was defined as in a previous study [24]. In brief, participants were asked to estimate their current alcohol intake frequency (daily or almost daily, three or four times a week, once or twice a week, one to three times a month, special occasions only, never, prefer not to say) (ID: 1558–0.0). Individuals reporting a current intake frequency of at least once or twice a week were asked to estimate their average weekly intake of a range of different alcoholic beverages (red wine, white wine, champagne, beer, cider, spirits, fortified wine) (ID: 1568–0.0, 1578–0.0, 1588–0.0, 1598–0.0, 1608–0.0). We converted the reported weekly intakes to weekly alcohol consumption in units using the following conversions: measures of spirits (1 unit), glasses of wine (2 units) and pints of beer/cider (2.5 units). Individuals reporting a current intake frequency of “one to three times a month”, “special occasions only” or “never” (for whom this phenotype was not collected) were assumed to have a weekly alcohol consumption volume of 0. We removed 189 pairs in which one or both members had outlying values (>5 SD from the mean).
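The unit conversion described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the dictionary keys and the grouping of beverages are hypothetical, fortified wine is assumed to count as a wine (2 units per glass), and "prefer not to say" is assumed to be treated as missing.

```python
# Units per reported drink, following the conversions stated in the text.
# Treating fortified wine as 2 units per glass is an assumption.
UNITS_PER_DRINK = {
    "red_wine": 2.0,
    "white_wine_champagne": 2.0,
    "fortified_wine": 2.0,
    "beer_cider": 2.5,
    "spirits": 1.0,
}

# Intake frequencies for which weekly intake was not collected and
# consumption was assumed to be zero.
LOW_FREQUENCY = {
    "one to three times a month",
    "special occasions only",
    "never",
}


def weekly_units(frequency, weekly_drinks):
    """Return weekly alcohol units, or None when frequency is not reported.

    frequency     -- the reported intake frequency (string)
    weekly_drinks -- dict mapping beverage key to number of drinks per week
    """
    if frequency == "prefer not to say":
        return None  # assumption: treated as missing
    if frequency in LOW_FREQUENCY:
        return 0.0
    return sum(UNITS_PER_DRINK[drink] * count
               for drink, count in weekly_drinks.items())
```

For example, `weekly_units("once or twice a week", {"red_wine": 3, "beer_cider": 2, "spirits": 1})` sums 6, 5 and 1 units.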

Genotyping

UK Biobank study participants (N = 488,377) were assayed using the UK BiLEVE Axiom Array by Affymetrix (N = 49,950) and the UK Biobank Axiom Array (N = 438,427). Directly genotyped variants were pre-phased using SHAPEIT3 [58] and then imputed with Impute4 using the UK10K [59], Haplotype Reference Consortium [60] and 1000 Genomes Phase 3 [61] reference panels. Post-imputation, data were available for approximately 96 million genetic variants. More detail is contained in previous publications [35,62].

Genome-wide association studies

Summary statistics from previous published GWAS, independent from UK Biobank, were used for information on SNPs associated with coronary artery disease [63], body mass index [64], educational attainment [56] and height [65].

Genome-wide summary data were not available for a recent systolic blood pressure GWAS [66], so we performed a GWAS of systolic blood pressure using UK Biobank. To remove sample overlap, we excluded the 47,435 spouse pairs from the analysis and used the remaining sample of 367,963 individuals of self-reported European descent. A GWAS was conducted on this sample using a linear mixed model (LMM) association method as implemented in BOLT-LMM (v2.3) [67]. To model population structure in the sample, we used 143,006 directly genotyped SNPs obtained after filtering on MAF > 0.01; genotyping rate > 0.015; Hardy-Weinberg equilibrium p-value < 0.0001 and LD pruning to an r² threshold of 0.1 using PLINK v2.0 [68]. We included the age and sex of participants as covariates in the model.

A set of genome-wide significant SNPs was generated for each trait by LD clumping the relevant summary statistics (P < 5×10⁻⁸, r² < 0.001, clumping distance = 10,000 kb) using the 1000 Genomes Phase 3 GBR samples [61] as the reference panel. For alcohol consumption, we used a missense variant (rs1229984) in ADH1B strongly associated with alcohol behaviour, as in a previous study [24].

Theory of within-spouse pair comparisons

The phenotype P of individual I can be modelled as a function of independent factors: genetics G, the environment E, age, sex and a stochastic variance term ε.

$$P_I = G_I + E_I + \mathrm{Age}_I + \mathrm{Sex}_I + \epsilon_I$$

When considering male-female spouse pairs, we can decompose the influence of the environment E on P into effects of the shared environment between spouses SE (e.g. during cohabitation) and effects of the non-shared environment NSE. For example, for the male M and female F in pair K:

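$$P_{KM} = G_{KM} + (SE_K + NSE_{KM}) + \mathrm{Age}_{KM} + \mathrm{Sex}_{KM} + \epsilon_{KM}$$
$$P_{KF} = G_{KF} + (SE_K + NSE_{KF}) + \mathrm{Age}_{KF} + \mathrm{Sex}_{KF} + \epsilon_{KF}$$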

We then define the WSP model across spouse pairs as:

$$P^* = G^* + E^* + \mathrm{Age}^* + \mathrm{Sex}^* + \epsilon^*$$

where the differences between the spouses for each factor are included in the model (e.g. for pair K, $P_K^* = P_{KM} - P_{KF}$, $G_K^* = G_{KM} - G_{KF}$, $E_K^* = NSE_{KM} - NSE_{KF}$). The shared environmental terms are by definition equal for men and women and drop out of the model.
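As a worked step under the definitions above, subtracting the female equation from the male equation for pair K makes the cancellation of the shared environment explicit. The symbol $\epsilon_K^*$ for the difference in the stochastic terms is introduced here for illustration only.

$$
\begin{aligned}
P_K^* &= P_{KM} - P_{KF} \\
      &= (G_{KM} - G_{KF}) + (SE_K - SE_K) + (NSE_{KM} - NSE_{KF}) + (\mathrm{Age}_{KM} - \mathrm{Age}_{KF}) + (\mathrm{Sex}_{KM} - \mathrm{Sex}_{KF}) + (\epsilon_{KM} - \epsilon_{KF}) \\
      &= G_K^* + E_K^* + \mathrm{Age}_K^* + \mathrm{Sex}_K^* + \epsilon_K^*
\end{aligned}
$$

The term $SE_K - SE_K$ is zero for every pair, so only the non-shared environmental difference $E_K^* = NSE_{KM} - NSE_{KF}$ remains.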

For the WSP model to generate an unconfounded estimate of the causal effect of G on P, we require that the genetic and environmental difference terms in the between-spouse model are independent, i.e. Corr(G*, E*) = 0. This assumption could be violated by several factors including assortative mating and indirect genetic effects. For example, if parental genotypes influence their offspring phenotype, then the offspring’s genotype would be positively correlated with their parental environment.

Random and non-random mating

Consider the WSP model applied to three distinct sets of pairs: a) a random set of males and females (non-spouses), b) spouse pairs under random mating (random spouses), and c) spouse-pairs under assortative mating (assorted spouses). In theory, the environmental differences between pairs would decrease with cohabitation and under assortment on environmental factors such as place of birth and socio-economic status:

$$E^*_{\mathrm{NonSpouse}} > E^*_{\mathrm{RandomSpouse}} > E^*_{\mathrm{AssortedSpouse}}$$

Note that as the environmental differences between pairs tend to zero (E* → 0), the bias in the estimated association between P and G will also tend to zero (bias(P~G) → 0), even if G* and E* are correlated in the WSP model (Corr(G*, E*) ≠ 0), because the pairs would be matched for the confounder. This suggests that comparing assorted pairs could reduce the effect of environmental biases.

We define the mechanism by which spouses assort as spousal compatibility A, a pairwise measure of the likelihood that two individuals enter a relationship. If several phenotypes influence assortment, then assortative mating can induce collider bias. For example, assortment on a phenotype influenced by genetic and environmental factors could induce spousal correlations in both genetic and environmental determinants of the phenotype, i.e. $\mathrm{Corr}(G_{KM}, G_{KF}) > 0$ and $\mathrm{Corr}(E_{KM}, E_{KF}) > 0$. It follows that in the WSP model, spousal genetic differences could be inversely associated with spousal environmental differences, i.e. Corr(G*, E*) < 0.

Statistical methods

Simulations

Model A: Within-spouse pair: spousal correlation for confounders

In model A, an exposure X influences an outcome Y but the relationship is confounded by life-course exposure to an environmental factor E which influences both X and Y. We evaluated the effect of spousal correlations for E on the WSP estimates of the effect of X on Y.

Spousal correlations for E were generated by simulating E and a spousal assortment measure A such that Corr(E, A) = C. Male-female pairs were defined by ordering A such that $A_{M1} \geq A_{M2} \geq \dots \geq A_{M1000}$ and $A_{F1} \geq A_{F2} \geq \dots \geq A_{F1000}$ and matching respective males and females, i.e. $A_{M1}$ with $A_{F1}$. This matching induces a spousal correlation for E which converges to C as the sample size increases to infinity.

Using 2,000 simulated individuals (1,000 males and 1,000 females), we generated WSP estimates at a range of values of C (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). Code for model (A) can be found at https://github.com/LaurenceHowe/Between-spouse/blob/master/simulations.R.
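The design of model A can be sketched in a few lines of standard-library Python. This is an illustrative re-implementation under stated assumptions, not the authors' R code: the function names are hypothetical, all variables are drawn from standard normal distributions, and the effect sizes (E on X of 1.0, E on Y of 0.3, X on Y of 0.3) are assumptions chosen so that the population-level confounded estimate is about 0.45 against a true effect of 0.30.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_group(n, c, rng):
    """Simulate n individuals as (A, X, Y) tuples, sorted by descending A."""
    people = []
    for _ in range(n):
        e = rng.gauss(0, 1)                                # confounder E
        a = c * e + (1 - c * c) ** 0.5 * rng.gauss(0, 1)   # Corr(E, A) = c
        x = e + rng.gauss(0, 1)                            # assumed E -> X effect
        y = 0.3 * x + 0.3 * e + rng.gauss(0, 1)            # assumed effects on Y
        people.append((a, x, y))
    people.sort(key=lambda person: person[0], reverse=True)
    return people


def wsp_estimate_model_a(c, n_pairs=1000, seed=1):
    """WSP estimate of X on Y, unadjusted for E, after rank-matching on A."""
    rng = random.Random(seed)
    males = simulate_group(n_pairs, c, rng)
    females = simulate_group(n_pairs, c, rng)
    # Matching the i-th ranked male with the i-th ranked female on A.
    x_diff = [m[1] - f[1] for m, f in zip(males, females)]
    y_diff = [m[2] - f[2] for m, f in zip(males, females)]
    return ols_slope(x_diff, y_diff)


if __name__ == "__main__":
    for c in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(c, round(wsp_estimate_model_a(c), 3))
```

Under these assumptions, the estimate should move from the confounded value towards the true effect as the assortment on E strengthens, because spouses become increasingly matched on the confounder.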

Model B: Within-spouse pair: assortative mating and collider bias

In model B, individuals assort on two independent phenotypes X1 and X2 that also influence an outcome Y such that Y ~ X1 + X2 + ε. We evaluated the effects of assortment on X1 and X2 on the WSP estimate of the effect of X1 on Y. We simulated A, X1 and X2 such that Corr(X1, A) = C1 and Corr(X2, A) = C2. As above, we then defined pairs by ordering A, which induces spousal correlations for X1 and X2.

Using 2,000 simulated individuals (1,000 males and 1,000 females), we generated WSP estimates using varying degrees of spousal assortment (Ci = 0, 0.1, 0.2, 0.3, 0.4, 0.5; i ∈ {1, 2}). The WSP regression model is defined as Y* ~ X1*, where $Y^* = Y_{KM} - Y_{KF}$ and $X1^* = X1_{KM} - X1_{KF}$ for each assorted pair. Code for model (B) can be found at https://github.com/LaurenceHowe/Between-spouse/blob/master/simulations.R.
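A minimal standard-library Python sketch of model B follows. It is an illustration under stated assumptions rather than the authors' R code: function names are hypothetical, X1 and X2 are independent standard normal variables with assumed unit effects on Y, and A is constructed as a weighted sum so that Corr(X1, A) = C1 and Corr(X2, A) = C2 (which requires C1² + C2² ≤ 1).

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_group(n, c1, c2, rng):
    """Simulate n individuals as (A, X1, Y) tuples, sorted by descending A."""
    noise_sd = (1 - c1 ** 2 - c2 ** 2) ** 0.5
    people = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rng.gauss(0, 1)
        # Assortment measure with Corr(X1, A) = c1 and Corr(X2, A) = c2.
        a = c1 * x1 + c2 * x2 + noise_sd * rng.gauss(0, 1)
        y = x1 + x2 + rng.gauss(0, 1)  # assumed unit effects of X1 and X2
        people.append((a, x1, y))
    people.sort(key=lambda person: person[0], reverse=True)
    return people


def wsp_estimate_model_b(c1, c2, n_pairs=1000, seed=1):
    """WSP estimate of X1 on Y (Y* ~ X1*) after rank-matching on A."""
    rng = random.Random(seed)
    males = simulate_group(n_pairs, c1, c2, rng)
    females = simulate_group(n_pairs, c1, c2, rng)
    x1_diff = [m[1] - f[1] for m, f in zip(males, females)]
    y_diff = [m[2] - f[2] for m, f in zip(males, females)]
    return ols_slope(x1_diff, y_diff)


if __name__ == "__main__":
    for c in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(c, round(wsp_estimate_model_b(c, c), 3))
```

In this sketch the true effect of X1 on Y is 1. Matching on A makes the spousal differences in X1 and X2 negatively correlated, so the WSP estimate that omits X2 should fall below 1 as assortment strengthens.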

Empirical analysis in the UK Biobank

Within-spouse pair: Genetic and phenotypic differences

We estimated the correlations between spouses for birth coordinates (north-south, east-west) and principal components using a linear regression model in R. Given that the regression model includes the same variable from different individuals, the association estimates are approximately equivalent to correlations.

We defined the genotypic differences at a variant for spouse pair K with individuals A and B as:

$$\mathrm{GenotypeDif}_K = \mathrm{Genotype}_{KA} - \mathrm{Genotype}_{KB}$$

WSP effect estimates of each genetic variant on the relevant phenotype of interest (height, body mass index, systolic blood pressure, educational attainment, coronary artery disease or alcohol consumption) were generated using linear or logistic regression. In the context of binary outcomes, the pair were rearranged so that the phenotypic difference could take the value of either 0 or 1 (for logistic regression), with other variables rearranged accordingly. The sex of the reference individual and the age difference between the spouses were included as covariates:

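$$\mathrm{PhenotypeDif}_K \sim \mathrm{GenotypeDif}_K + \mathrm{AgeDif}_K + \mathrm{Sex}_{KA}$$
where $\mathrm{PhenotypeDif}_K = \mathrm{Phenotype}_{KA} - \mathrm{Phenotype}_{KB}$
and $\mathrm{AgeDif}_K = \mathrm{Age}_{KA} - \mathrm{Age}_{KB}$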

Using the models described above, we generated associations using the WSP model with the spouse pairs. For comparison, we generated 100 distinct datasets of random male-female pairs by randomly rearranging the 47,435 spouse pairs while ensuring that the members of each pair were of different sex. We applied the same within-pair models to the random male-female pairs, taking the median effect estimate and standard error for each variant from the 100 random-pair estimates. To compare WSP and random-pair genetic association estimates, we used an inverse-variance weighted (IVW) approach [69,70]. The IVW approach uses summary data to estimate the effect on the phenotype, in both models, of a polygenic score constructed from the discovery GWAS from which the genetic variants were selected. Using betas from the discovery GWAS as “weights” and betas and standard errors from the WSP and random-pair models, the IVW estimates are calculated across n variants as follows.

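$$\mathrm{Beta}_{IVW} = \frac{\sum_{j=1}^{n} \mathrm{Weight}_j \cdot \mathrm{Beta}_j / \mathrm{SE}_j^2}{\sum_{j=1}^{n} \mathrm{Weight}_j^2 / \mathrm{SE}_j^2}$$
$$\mathrm{SE}_{IVW} = \sqrt{\frac{1}{\sum_{j=1}^{n} \mathrm{Weight}_j^2 / \mathrm{SE}_j^2}}$$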
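The IVW calculation above can be expressed as a short Python function. This is a sketch for illustration: the function and argument names are hypothetical, and the inputs are assumed to be equal-length lists with one entry per variant.

```python
import math


def ivw_estimate(weights, betas, ses):
    """Inverse-variance weighted estimate and standard error.

    weights -- discovery GWAS betas, one per variant
    betas   -- within-pair (WSP or random-pair) betas, one per variant
    ses     -- standard errors of the within-pair betas
    """
    numerator = sum(w * b / se ** 2 for w, b, se in zip(weights, betas, ses))
    denominator = sum(w ** 2 / se ** 2 for w, se in zip(weights, ses))
    return numerator / denominator, math.sqrt(1 / denominator)
```

Calling `ivw_estimate` once with the WSP betas and once with the median random-pair betas, using the same discovery weights, gives the two quantities that are compared in the shrinkage calculation.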

Shrinkage in genetic associations for each phenotype, defined as the percentage difference between the random pair IVW estimate and the WSP estimate, was calculated using the delta method assuming no covariance between the estimators.
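One way to compute such a shrinkage estimate is sketched below. The exact expression is an assumption: shrinkage is taken here as 100 × (1 − WSP estimate / random-pair estimate), and the standard error comes from a first-order delta-method approximation for a ratio with zero covariance between the two estimators. The function and argument names are hypothetical.

```python
import math


def shrinkage_percent(beta_random, se_random, beta_wsp, se_wsp):
    """Percentage shrinkage of the WSP estimate relative to the random-pair
    estimate, with a delta-method standard error (no covariance assumed)."""
    ratio = beta_wsp / beta_random
    shrinkage = 100 * (1 - ratio)
    # First-order delta method for Var(beta_wsp / beta_random).
    ratio_var = (se_wsp ** 2 / beta_random ** 2
                 + beta_wsp ** 2 * se_random ** 2 / beta_random ** 4)
    return shrinkage, 100 * math.sqrt(ratio_var)
```

The approximation relies on the random-pair estimate being well away from zero; it becomes unreliable when that estimate is imprecise relative to its magnitude.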

As we investigated only a single genetic variant for alcohol consumption, we were unable to investigate a trend across genetic variants. Instead we tested for a difference between two means for the WSP and median random-pair estimate [71].
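A test for a difference between two estimates of this kind is commonly computed as a z-statistic on the difference, with the two variances summed. The sketch below assumes that form, independent estimators and a two-sided normal p-value; the function name is hypothetical and the procedure in [71] may differ in detail.

```python
import math


def difference_of_two_means(beta_1, se_1, beta_2, se_2):
    """z-statistic and two-sided p-value for beta_1 - beta_2,
    assuming independent, approximately normal estimators."""
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value
```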

Within-sibship birth coordinate correlations, principal component correlations and shrinkage estimates were generated using very similar methods to the spousal analyses [4,6,36]. Unlike the male-female spouse-pairs, siblings can be different sexes, so we included a sex difference term in the regression models for the shrinkage analysis. Within-sibship estimates were compared with random-pair estimates as in the spousal analysis. Shrinkages in genetic associations for each phenotype were estimated as above.

$$\mathrm{PhenotypeDif}_K \sim \mathrm{GenotypeDif}_K + \mathrm{AgeDif}_K + \mathrm{SexDif}_K$$
where $\mathrm{SexDif}_K = \mathrm{Sex}_{KA} - \mathrm{Sex}_{KB}$

We investigated heterogeneity between WSP and within-sibship shrinkage estimates using the difference for two means test [71] assuming no covariance.

As a sensitivity analysis, we included an analysis adjusting for principal components in the random-pair samples to account for population structure differences. We included differences for the first 10 principal components in the random pair models as below. Principal component differences were not included in the WSP or within-sibship models.

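$$\mathrm{PhenotypeDif}_K \sim \mathrm{GenotypeDif}_K + \mathrm{AgeDif}_K + \mathrm{SexDif}_K + \mathrm{PC1Dif}_K + \dots + \mathrm{PC10Dif}_K$$
where $\mathrm{PC1Dif}_K = \mathrm{PC1}_{KA} - \mathrm{PC1}_{KB}$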

For comparison, we also considered within-sibship shrinkage estimates from a recent within-sibship GWAS preprint [5]. This preprint reported shrinkage for genetic variants at genome-wide significance (5×10⁻⁸) and a more liberal threshold (1×10⁻⁵) for height, BMI, educational attainment, SBP and alcohol consumption. Coronary heart disease was not analysed in this study. As the shrinkage estimates were broadly similar between the two thresholds for the 5 phenotypes in this preprint, we considered the shrinkage estimates from the liberal threshold.

Within-spouse pair: age, SBP and CAD

The WSP effect estimates of age on CAD and SBP were estimated using the following regression model (linear or logistic depending on the outcome of interest), including the sex of the reference individual and the age difference between spouses as covariates:

$$\mathrm{PhenotypeDif}_K \sim \mathrm{AgeDif}_K + \mathrm{Sex}_{KA}$$

As above, we repeated analyses using the datasets of random male-female pairs, reporting the median effect size and standard error across the 100 simulated datasets for each model.

Supporting information

S1 Table. Model 1: Spousal correlations controlling for confounding.

Results from simulation analyses investigating how the WSP model can control for confounding if spouses assort on the confounder.

(DOCX)

S2 Table. Model 2: Assortment and collider bias.

Results from simulation analyses investigating how the WSP model may be susceptible to collider bias induced by spousal assortment.

(DOCX)

S3 Table. Characteristics of the spouse sample (N≤94,870).

A table containing summary-level phenotype information on the characteristics of the UK Biobank spouses, stratified by sex.

(DOCX)

S4 Table. Spouse and sibling pair correlations for birth coordinates and principal components.

A table containing within-pair correlations for spouses and siblings for north-south and east-west birth coordinates as well as the first 10 principal components.

(DOCX)

S5 Table. Comparisons of WSP and within-sibship shrinkage estimates.

A table containing WSP and within-sibship shrinkage estimates from this study for height, educational attainment, BMI, SBP and alcohol consumption as well as within-sibship shrinkage estimates from an external preprint.

(DOCX)

S1 Fig. Simulation results for Within spouse-pair models.

A–Simulations for model (A): Spousal correlations controlling for confounding. As the strength of spousal assortment (spousal correlation) on the confounder (E) increases, the within-spouse pair (WSP) estimate of X on Y unadjusted for E (in blue) moves from the confounded unadjusted estimate of 0.45 to the unbiased estimate of 0.30. B–Simulations for model (B): Within spouse-pair: assortment and collider bias. Spousal assortment can induce collider bias in WSP estimates. If spouses assort on two phenotypes X1 and X2 which both affect outcome Y, then the association of X1 and Y (or X2 and Y) estimated from the WSP model is a biased estimate of the causal effect of X1 on Y (or X2 on Y). This bias monotonically increases in the degree of assortment on either X1 or X2.

(PNG)

S2 Fig. Height of UK Biobank spouse pairs.

Scatter plot showing male spouse height on the X axis and female spouse height on the Y axis for each spouse-pair.

(PNG)

S3 Fig. BMI of UK Biobank spouse pairs.

Scatter plot showing male spouse BMI on the X axis and female spouse BMI on the Y axis for each spouse-pair.

(PNG)

S4 Fig. SBP of UK Biobank spouse pairs.

Scatter plot showing male spouse SBP on the X axis and female spouse SBP on the Y axis for each spouse-pair.

(PNG)

Data Availability

In this study, we used individual participant data from UK Biobank. Interested researchers would be able to obtain the same data-set from UK Biobank. Reference: https://www.nature.com/articles/s41586-018-0579-z Contact: access@ukbiobank.ac.uk. Relevant code for simulation models is available at the following repository https://github.com/LaurenceHowe/Between-spouse.

Funding Statement

LJH, TB, TTM, GH, NMD and GDS are members of MRC Integrative Epidemiology Unit which is supported by the Medical Research Council (MRC) [MC_UU_00011/1] and the University of Bristol (principal investigator: GDS). NMD is supported by The Economics and Social Research Council (ESRC) via a Future Research Leaders grant [ES/N000757/1], a Norwegian Research Council Grant number 295989 and by the Health Foundation’s Efficiency Research Programme (Award 807293). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Curtis D, Miller MB, Sham PC. Combining the sibling disequilibrium test and transmission/disequilibrium test for multiallelic markers. American journal of human genetics. 1999;64(6):1785. doi: 10.1086/302421
  • 2. Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66(1):279–92. Epub 2000/01/13. doi: 10.1086/302698; PubMed Central PMCID: PMC1288332.
  • 3. Fulker DW, Cherny SS, Sham PC, Hewitt JK. Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet. 1999;64(1):259–67. Epub 1999/01/23. doi: 10.1086/302193; PubMed Central PMCID: PMC1377724.
  • 4. Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018;50(8):1112–21. Epub 2018/07/25. doi: 10.1038/s41588-018-0147-3; PubMed Central PMCID: PMC6393768.
  • 5. Howe LJ, Nivard MG, Morris TT, Hansen AF, Rasheed H, Cho Y, et al. Within-sibship GWAS improve estimates of direct genetic effects. bioRxiv. 2021:2021.03.05.433935. doi: 10.1101/2021.03.05.433935
  • 6. Brumpton B, Sanderson E, Hartwig FP, Harrison S, Vie GÅ, Cho Y, et al. Within-family studies for Mendelian randomization: avoiding dynastic, assortative mating, and population stratification biases. Nature Communications. 2020:602516.
  • 7. Young AI, Benonisdottir S, Przeworski M, Kong A. Deconstructing the sources of genotype-phenotype associations in humans. Science. 2019;365(6460):1396–400. doi: 10.1126/science.aax3710
  • 8. Davey Smith G. Epidemiology, epigenetics and the ‘Gloomy Prospect’: embracing randomness in population health research and practice. International Journal of Epidemiology. 2011;40(3):537–62. doi: 10.1093/ije/dyr117
  • 9. Ask H, Rognmo K, Torvik FA, Roysamb E, Tambs K. Non-random mating and convergence over time for alcohol consumption, smoking, and exercise: the Nord-Trondelag Health Study. Behav Genet. 2012;42(3):354–65. Epub 2011/10/19. doi: 10.1007/s10519-011-9509-7.
  • 10. Ask H, Idstad M, Engdahl B, Tambs K. Non-random Mating and Convergence Over Time for Mental Health, Life Satisfaction, and Personality: The Nord-Trøndelag Health Study. Behavior Genetics. 2013;43(2):108–19. doi: 10.1007/s10519-012-9578-2
  • 11. Xia C, Amador C, Huffman J, Trochet H, Campbell A, Porteous D, et al. Pedigree-and SNP-associated genetics and recent environment are the major contributors to anthropometric and cardiometabolic trait variation. PLoS Genetics. 2016;12(2):e1005804. doi: 10.1371/journal.pgen.1005804
  • 12. Hill WD, Arslan RC, Xia C, Luciano M, Amador C, Navarro P, et al. Genomic analysis of family data reveals additional genetic effects on intelligence and personality. Molecular Psychiatry. 2018;23(12):2347. doi: 10.1038/s41380-017-0005-1
  • 13. Amador C, Xia C, Nagy R, Campbell A, Porteous D, Smith BH, et al. Regional variation in health is predominantly driven by lifestyle rather than genetics. Nature Communications. 2017;8(1):801. doi: 10.1038/s41467-017-00497-5
  • 14. Wang K, Gaitsch H, Poon H, Cox NJ, Rzhetsky A. Classification of common human diseases derived from shared genetic and environmental determinants. Nature Genetics. 2017;49(9):1319. doi: 10.1038/ng.3931
  • 15. Barbu MC, Shen X, Walker RM, Howard DM, Evans KL, Whalley HC, et al. Epigenetic prediction of major depressive disorder. Molecular Psychiatry. 2020. doi: 10.1038/s41380-020-0808-3
  • 16. Zeng Y, Amador C, Xia C, Marioni R, Sproul D, Walker RM, et al. Parent of origin genetic effects on methylation in humans are common and influence complex trait variation. Nature Communications. 2019;10(1):1383. doi: 10.1038/s41467-019-09301-y
  • 17. Zeng Y, Navarro P, Xia C, Amador C, Fernandez-Pujals AM, Thomson PA, et al. Shared Genetics and Couple-Associated Environment Are Major Contributors to the Risk of Both Clinical and Self-Declared Depression. EBioMedicine. 2016;14:161–7. Epub 2016/11/14. doi: 10.1016/j.ebiom.2016.11.003; PubMed Central PMCID: PMC5161419.
  • 18. Bjorngaard JH, Vie GA, Krokstad S, Janszky I, Romundstad PR, Vatten LJ. Cardiovascular mortality—Comparing risk factor associations within couples and in the total population—The HUNT Study. International journal of cardiology. 2017;232:127–33. Epub 2017/01/14. doi: 10.1016/j.ijcard.2017.01.041.
  • 19. Tambs K, Moum T. No Large Convergence during Marriage for Health, Lifestyle, and Personality in a Large Sample of Norwegian Spouses. Journal of Marriage and Family. 1992;54(4):957–71. doi: 10.2307/353175
  • 20. Robinson MR, Kleinman A, Graff M, Vinkhuyzen AA, Couper D, Miller MB, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour. 2017;1(1):0016.
  • 21. Tenesa A, Rawlik K, Navarro P, Canela-Xandri O. Genetic determination of height-mediated mate choice. Genome biology. 2015;16(1):269.
  • 22. Yengo L, Robinson MR, Keller MC, Kemper KE, Yang Y, Trzaskowski M, et al. Imprint of assortative mating on the human genome. Nature Human Behaviour. 2018:300020.
  • 23. Mare RD. Five decades of educational assortative mating. American Sociological Review. 1991:15–32.
  • 24. Howe LJ, Lawson DJ, Davies NM, Pourcain BS, Lewis SJ, Davey Smith G, et al. Genetic evidence for assortative mating on alcohol consumption in the UK Biobank. Nature Communications. 2019. doi: 10.1038/s41467-019-12424-x
  • 25. Robinson MR, Kleinman A, Graff M, Vinkhuyzen AA, Couper D, Miller MB, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour. 2017;1:0016.
  • 26. Buss DM. Human mate selection: Opposites are sometimes said to attract, but in fact we are likely to marry someone who is similar to us in almost every variable. American Scientist. 1985;73(1):47–51.
  • 27. Jiang Y, Bolnick DI, Kirkpatrick M. Assortative mating in animals. The American naturalist. 2013;181(6):E125–38. Epub 2013/05/15. doi: 10.1086/670160.
  • 28. Kromer J, Hummel T, Pietrowski D, Giani AS, Sauter J, Ehninger G, et al. Influence of HLA on human partnership and sexual satisfaction. Sci Rep. 2016;6:32550. Epub 2016/09/01. doi: 10.1038/srep32550; PubMed Central PMCID: PMC5006172.
  • 29. Chaix R, Cao C, Donnelly P. Is mate choice in humans MHC-dependent? PLoS Genet. 2008;4(9):e1000184. Epub 2008/09/13. doi: 10.1371/journal.pgen.1000184; PubMed Central PMCID: PMC2519788.
  • 30. Sebro R, Hoffman TJ, Lange C, Rogus JJ, Risch NJ. Testing for non-random mating: evidence for ancestry-related assortative mating in the Framingham heart study. Genet Epidemiol. 2010;34(7):674–9. Epub 2010/09/16. doi: 10.1002/gepi.20528; PubMed Central PMCID: PMC3775670.
  • 31. Munafò MR, Tilling K, Taylor AE, Evans DM, Davey Smith G. Collider scope: when selection bias can substantially influence observed associations. International Journal of Epidemiology. 2017.
  • 32. Catalogue of bias collaboration, Lee H, Aronson JK, D N. Collider bias. In Catalogue of Bias. 2019. Available from: https://catalogofbias.org/biases/collider-bias/.
  • 33. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. International Journal of Epidemiology. 2016;45(6):1866–86. doi: 10.1093/ije/dyw314
  • 34. Davey Smith G, Phillips AN. Correlation without a cause: an epidemiological odyssey. International Journal of Epidemiology. 2020;49(1):4–14. doi: 10.1093/ije/dyaa016
  • 35. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203. doi: 10.1038/s41586-018-0579-z
  • 36. Davies NM, Howe LJ, Brumpton B, Havdahl A, Evans DM, Davey Smith G. Within family Mendelian randomization studies. Hum Mol Genet. 2019;28(R2):R170–r9. Epub 2019/10/28. doi: 10.1093/hmg/ddz204.
  • 37. Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. Variable prediction accuracy of polygenic scores within an ancestry group. eLife. 2020;9. Epub 2020/01/31. doi: 10.7554/eLife.48376; PubMed Central PMCID: PMC7067566.
  • 38. Pirastu N, Cordioli M, Nandakumar P, Mignogna G, Abdellaoui A, Hollis B, et al. Genetic analyses identify widespread sex-differential participation bias. Nature Genetics. 2021;53(5):663–71. doi: 10.1038/s41588-021-00846-7
  • 39. Barton N, Hermisson J, Nordborg M. Why structure matters. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.45380.
  • 40. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.39725.
  • 41. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8. Epub 2019/03/22. doi: 10.7554/eLife.39702.
  • 42. Ruby JG, Wright KM, Rand KA, Kermany A, Noto K, Curtis D, et al. Estimates of the Heritability of Human Longevity Are Substantially Inflated due to Assortative Mating. Genetics. 2018;210(3):1109–24. doi: 10.1534/genetics.118.301613
  • 43. Haworth S, Mitchell R, Corbin L, Wade KH, Dudding T, Budu-Aggrey A, et al. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications. 2019;10(1):333. doi: 10.1038/s41467-018-08219-1
  • 44. Kong A, Thorleifsson G, Frigge ML, Vilhjalmsson BJ, Young AI, Thorgeirsson TE, et al. The nature of nurture: Effects of parental genotypes. Science. 2018;359(6374):424–8. doi: 10.1126/science.aan6877
  • 45. Young AI, Frigge ML, Gudbjartsson DF, Thorleifsson G, Bjornsdottir G, Sulem P, et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nature Genetics. 2018;50(9):1304. doi: 10.1038/s41588-018-0178-9
  • 46. Hartwig FP, Davies NM, Davey Smith G. Bias in Mendelian randomization due to assortative mating. Genet Epidemiol. 2018;42(7):608–20. Epub 2018/07/05. doi: 10.1002/gepi.22138; PubMed Central PMCID: PMC6221130.
  • 47. Evans DM, Moen GH, Hwang LD, Lawlor DA, Warrington NM. Elucidating the role of maternal environmental exposures on offspring health and disease using two-sample Mendelian randomization. Int J Epidemiol. 2019. Epub 2019/03/01. doi: 10.1093/ije/dyz019.
  • 48. Moen GH, Hemani G, Warrington NM, Evans DM. Calculating Power to Detect Maternal and Offspring Genetic Effects in Genetic Association Studies. Behav Genet. 2019. Epub 2019/01/03. doi: 10.1007/s10519-018-9944-9.
  • 49. Warrington NM, Freathy RM, Neale MC, Evans DM. Using structural equation modelling to jointly estimate maternal and fetal effects on birthweight in the UK Biobank. Int J Epidemiol. 2018;47(4):1229–41. Epub 2018/02/16. doi: 10.1093/ije/dyy015; PubMed Central PMCID: PMC6124616.
  • 50. Beaumont RN, Warrington NM, Cavadino A, Tyrrell J, Nodzenski M, Horikoshi M, et al. Genome-wide association study of offspring birth weight in 86 577 women identifies five novel loci and highlights maternal genetic effects that are independent of fetal genetics. Hum Mol Genet. 2018;27(4):742–56. Epub 2018/01/09. doi: 10.1093/hmg/ddx429; PubMed Central PMCID: PMC5886200.
  • 51. Muñoz M, Pong-Wong R, Canela-Xandri O, Rawlik K, Haley CS, Tenesa A. Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. Nature Genetics. 2016. doi: 10.1038/ng.3618
  • 52. Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. American journal of human genetics. 1996;59(5):983.
  • 53. Cordell HJ, Clayton DG. Genetic association studies. The Lancet. 2005;366(9491):1121–31. doi: 10.1016/S0140-6736(05)67424-7
  • 54. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003;32(1):1–22. doi: 10.1093/ije/dyg070
  • 55. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine. 2015;12(3):e1001779. doi: 10.1371/journal.pmed.1001779
  • 56. Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533(7604):539. doi: 10.1038/nature17671
  • 57. Denaxas SC, George J, Herrett E, Shah AD, Kalra D, Hingorani AD, et al. Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). International Journal of Epidemiology. 2012;41(6):1625–38. doi: 10.1093/ije/dys188
  • 58. O’Connell J, Sharp K, Shrine N, Wain L, Hall I, Tobin M, et al. Haplotype estimation for biobank-scale data sets. Nature Genetics. 2016;48(7):817–20. doi: 10.1038/ng.3583
  • 59. UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature. 2015;526(7571):82–90. doi: 10.1038/nature14962
  • 60. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48(10):1279. doi: 10.1038/ng.3643
  • 61. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393
  • 62. Allen NE, Sudlow C, Peakman T, Collins R. UK biobank data: come and get it. American Association for the Advancement of Science; 2014.
  • 63. Nikpay M, Goel A, Won H-H, Hall LM, Willenborg C, Kanoni S, et al. A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease. Nature Genetics. 2015;47(10):1121. doi: 10.1038/ng.3396
  • 64. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197. doi: 10.1038/nature14177
  • 65. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014;46(11):1173. doi: 10.1038/ng.3097
  • 66. Ehret GB, Ferreira T, Chasman DI, Jackson AU, Schmidt EM, Johnson T, et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nature Genetics. 2016;48(10):1171. doi: 10.1038/ng.3667
  • 67. Loh P-R, Tucker G, Bulik-Sullivan BK, Vilhjalmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nature Genetics. 2015;47(3):284–90. doi: 10.1038/ng.3190
  • 68. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81(3):559–75. doi: 10.1086/519795
  • 69.Palla L, Dudbridge F. A fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. The American Journal of Human Genetics. 2015;97(2):250–9. doi: 10.1016/j.ajhg.2015.06.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37(7):658–65. doi: 10.1002/gepi.21758 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003;326(7382):219. doi: 10.1136/bmj.326.7382.219 BMJ. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

David Balding, Samuli Ripatti

25 Mar 2021

Dear Dr Howe,

Thank you very much for submitting your Research Article entitled 'Assortative mating and within-spouse pair comparisons' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Samuli Ripatti

Associate Editor

PLOS Genetics

David Balding

Section Editor: Methods

PLOS Genetics

As you can see from the reviewers' comments, they found the work interesting and potentially important, but the manuscript particularly needs to be made a better fit for a broader genetics audience. In its current format, the paper would be a better fit for an epidemiological journal. However, the reviewers give several suggestions on how to make it better suited to a genetics audience, with better examples from genetic analyses and an examination of the behaviour of the model across a range of circumstances in simulated and/or empirical data.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review has been attached

Reviewer #2: This study presents a comparison of two different “within-family” models for testing genetic associations. The concept is clear that a within-spouse pair analysis may account for shared environment as the shared environment among spouses is the same. However, I have a few concerns:

1. I do not see how a WSP model controls for ancestry confounding if spousal differences correlate with their ancestry differences. Yes, pairs may be matched for the home environment, but a correlation between delta_phenotype and delta_genotype could certainly be driven by delta_ancestry among spousal pairs, perhaps even more so as all other environmental similarity may be controlled for. If one fits the model in a mixed model framework, does it help control for stratification? If there is a phenotypic correlation with PC1 and mating is random with respect to PC1, is there covariance between the effect size estimates and the PC loadings? The sibling pair analysis, owing to Mendelian segregation, should be free of this bias, but I do not see a strong consideration of genetic stratification effects in this work, which I feel leaves it lacking.

2. Why use summary statistics from previously published non-UKB meta-analyses? Why not compare to directly estimating the effects using a mixed-effects association model in the UK Biobank? Non-UKB meta-analysis estimates are obtained from very heterogeneous cohorts using the worst type of association study methodology (single-marker marginal associations with some arbitrarily determined number of PCs). The current comparison is valid, but I think a comparison of analysing the same data in a different way is also warranted. Note that systolic blood pressure was analysed in this way (within-UKB analysis) and the shrinkage, especially from the spouse pair model, is much less than for some of the other traits.

3. I would also suggest that, rather than taking the most statistically significant estimates, which ascertains a very specific subset of variants with specific LD and MAF properties, comparisons are done across a range of markers. Perhaps prune the data for LD beforehand, analyse all the markers, and then observe the shrinkage genome-wide and as a function of the association test statistic values?
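Reviewer #2's first point can be made concrete with a small simulation. The following is a minimal sketch only, not the authors' analysis: the two ancestry groups, their allele frequencies (0.3 and 0.5), the variant effect (0.1), the ancestry effect on the phenotype (0.5) and the function names are all hypothetical choices made for illustration. It compares a population estimate, a WSP estimate from regressing delta_phenotype on delta_genotype, and a WSP estimate that also includes delta_ancestry as a covariate.

```python
import random


def centred_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_ancestry(match_prob, n_pairs=200_000, seed=1):
    """Return (population, WSP, ancestry-adjusted WSP) variant effect estimates.

    match_prob is the probability that two spouses share an ancestry group;
    0.5 corresponds to random mating with respect to ancestry.
    """
    rng = random.Random(seed)
    allele_freq = {0: 0.3, 1: 0.5}   # hypothetical group allele frequencies
    beta, gamma = 0.1, 0.5           # assumed variant effect; ancestry effect on Y

    def person(group):
        p = allele_freq[group]
        g = (rng.random() < p) + (rng.random() < p)   # genotype coded 0/1/2
        y = beta * g + gamma * group + rng.gauss(0, 1)
        return g, y

    g_all, y_all, dg, dy, da = [], [], [], [], []
    for _ in range(n_pairs):
        a_m = rng.randrange(2)
        a_f = a_m if rng.random() < match_prob else 1 - a_m
        (g_m, y_m), (g_f, y_f) = person(a_m), person(a_f)
        g_all += [g_m, g_f]
        y_all += [y_m, y_f]
        dg.append(g_m - g_f)
        dy.append(y_m - y_f)
        da.append(a_m - a_f)

    population = centred_slope(g_all, y_all)

    # WSP model: regress the within-pair phenotype difference on the
    # within-pair genotype difference (regression through the origin).
    sgg = sum(g * g for g in dg)
    sgy = sum(g * y for g, y in zip(dg, dy))
    wsp = sgy / sgg

    # Same model with the within-pair ancestry difference as a covariate,
    # solved from the 2x2 normal equations.
    saa = sum(a * a for a in da)
    sga = sum(g * a for g, a in zip(dg, da))
    say = sum(a * y for a, y in zip(da, dy))
    det = sgg * saa - sga ** 2
    wsp_adjusted = (sgy * saa - sga * say) / det if det > 0 else wsp
    return population, wsp, wsp_adjusted
```

Under these assumed parameters, calling `simulate_ancestry(1.0)` (spouses always matched on ancestry) should give a WSP estimate close to the assumed effect of 0.1, whereas `simulate_ancestry(0.9)` should leave some residual confounding in the unadjusted WSP estimate. A shrinkage value can then be formed as `1 - wsp / population`.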

Reviewer #3: Please see attachment

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Attachment

Submitted filename: Review.docx

Attachment

Submitted filename: HoweEtAl_PLOSGenet_2021.docx

Decision Letter 1

David Balding, Samuli Ripatti

11 Aug 2021

Dear Dr Howe,

Thank you very much for submitting your Research Article entitled 'Assortative mating and within-spouse pair comparisons' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. Both reviewers find that the revised article has improved considerably and has substantial merit, but Reviewer 3 raises a fundamental question about the role of collider bias in the WSP design and how it impacts the control of confounding in an empirical setting. The editors would like to see this addressed. The reviewer suggests two possible ways to quantify the problem. Either (or both) of these additional analyses would potentially strengthen the paper and help drive home its main argument. Alternatively, the authors may find a better approach to address Reviewer 3's concerns.

Therefore we are sending it back to you for major revision, but this time with a narrower focus on the issue to be addressed and we look forward to receiving a revised submission.  As usual, please detail responses to the reviewers in a letter and describe the changes you have made in the manuscript.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Samuli Ripatti

Associate Editor

PLOS Genetics

David Balding

Section Editor: Methods

PLOS Genetics

Reviewer's Responses to Questions

Reviewer #2: I believe that the authors have argued for the approach taken in this manuscript very well and in doing so have alleviated my concerns sufficiently for me to support publication. While I think more work needs to be done on this topic, what is presented here is very interesting and represents an ideal starting point for future research.

Reviewer #3: I appreciate the authors significantly reworked the presentation of the manuscript, which in my view is now much easier to follow. I think both the Introduction and Discussion sections are now much improved.

My main concern is the following. The manuscript strives to drive home two points (based on the abstract): (1) the WSP design can reduce confounding, but (2) the WSP design is susceptible to collider bias. Both confounding and collider bias are addressed in simulation, but there is no direct evidence that collider bias is at play in the empirical data analysis. Rather, the authors inferred that collider bias is happening because shrinkage of the effect size estimates in the WSP design is larger than in the within-sibship design; since the within-sibship design perfectly controls for confounding due to ancestry/demography/family, the additional shrinkage is attributed to collider bias that induces a negative correlation. But ultimately, this is still only an indirect inference by deduction. Collider bias is thus, in my view, still not strongly supported by the empirical data analysis. Would some approach like that published in Day et al., AJHG 2016 (PMID 26849114) be feasible in this setting to provide a direct demonstration of collider bias?

In fact, by the authors' argument, shrinkage in the WSP design need not be greater than that in the within-sibship design for collider bias to be at play, since the correlation between spouses in principal component space is quite a bit less than the correlation between sibs, suggesting that the WSP design is much less efficacious at controlling for confounding (at least due to ancestry) than the within-sibship design. Yet it is not clear to what degree the WSP design could be controlling for confounding, while additional shrinkage could be attributed to collider bias. So some form of partitioning of the effects attributed to correction for confounding could contribute to evidence that collider bias is operating (though still indirectly).

Lastly, a more minor concern is that the authors assumed that spouses, compared with non-spousal pairs, assort on ancestry (e.g. paragraph 2 of the Discussion). I think this comes from the belief that ancestry is the most common confounder in genetic association studies, and there are correlations in PC space between spouses, but it is not actually demonstrated that the correlation in PC space is related to ancestry, considering that only the white British UKB individuals were used in the analysis. (And note that PCA would be sensitive to the choice of individuals used in the analysis, and thus subject to all upstream ascertainment practices. In other words, demonstration would need to occur in-analysis, rather than by citing publications investigating population structure in UKB in general.)
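The collider bias mechanism that Reviewer #3 discusses can be illustrated with a toy simulation. This is a sketch under stated assumptions rather than the authors' simulation model: spouses are paired by rank on a hypothetical mating score (X1 + X2 + noise), both effects on Y are set to 0.3, and the function names are invented for illustration. X1 and X2 are independent in the population, so any attenuation of the within-pair estimate arises from conditioning on assorted pairs.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_collider(assort_noise_sd=1.0, n_pairs=100_000, seed=1):
    """Return (population, WSP) estimates of the effect of X1 on Y.

    Smaller assort_noise_sd means stronger assortment on X1 + X2.
    """
    rng = random.Random(seed)

    def draw_people():
        people = []
        for _ in range(n_pairs):
            x1 = rng.gauss(0, 1)
            x2 = rng.gauss(0, 1)                       # independent of x1
            y = 0.3 * x1 + 0.3 * x2 + rng.gauss(0, 1)  # assumed causal effects
            score = x1 + x2 + rng.gauss(0, assort_noise_sd)
            people.append((score, x1, y))
        people.sort()                                  # rank by mating score
        return people

    males, females = draw_people(), draw_people()

    # Population model: ignore pairing and regress Y on X1.
    everyone = males + females
    population = ols_slope([p[1] for p in everyone], [p[2] for p in everyone])

    # WSP model: pair the i-th ranked male with the i-th ranked female and
    # regress the within-pair difference in Y on the difference in X1.
    dx1 = [m[1] - f[1] for m, f in zip(males, females)]
    dy = [m[2] - f[2] for m, f in zip(males, females)]
    wsp = ols_slope(dx1, dy)
    return population, wsp
```

In this toy setting the population estimate should stay near 0.3, while within pairs the differences in X1 and X2 become negatively correlated, pulling the WSP estimate below 0.3. With the default noise standard deviation of 1, the expected WSP slope under these assumptions is roughly 0.15, and increasing `assort_noise_sd` (weaker assortment) moves it back towards 0.3.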

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

Decision Letter 2

David Balding, Samuli Ripatti

15 Oct 2021

Dear Dr Howe,

We are pleased to inform you that your manuscript entitled "Assortative mating and within-spouse pair comparisons" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Samuli Ripatti

Associate Editor

PLOS Genetics

David Balding

Section Editor: Methods

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

-----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-00140R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

David Balding, Samuli Ripatti

29 Oct 2021

PGENETICS-D-21-00140R2

Assortative mating and within-spouse pair comparisons

Dear Dr Howe,

We are pleased to inform you that your manuscript entitled "Assortative mating and within-spouse pair comparisons" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Agnes Pap

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Model 1: Spousal correlations controlling for confounding.

    Results from simulation analyses investigating how the WSP model can control for confounding if spouses assort on the confounder.

    (DOCX)

    S2 Table. Model 2: Assortment and collider bias.

    Results from simulation analyses investigating how the WSP model may be susceptible to collider bias induced by spousal assortment.

    (DOCX)

    S3 Table. Characteristics of the spouse sample (N≤94,870).

    A table containing summary-level phenotype information on the characteristics of the UK Biobank spouses, stratified by sex.

    (DOCX)

    S4 Table. Spouse and sibling pair correlations for birth coordinates and principal components.

    A table containing within-pair correlations for spouses and siblings for north-south and east-west birth coordinates as well as the first 10 principal components.

    (DOCX)

    S5 Table. Comparisons of WSP and within-sibship shrinkage estimates.

    A table containing WSP and within-sibship shrinkage estimates from this study for height, educational attainment, BMI, SBP and alcohol consumption as well as within-sibship shrinkage estimates from an external preprint.

    (DOCX)

    S1 Fig. Simulation results for within-spouse pair models.

    (A) Simulations for model (A): spousal correlations controlling for confounding. As the strength of spousal assortment (spousal correlation) on the confounder (E) increases, the within-spouse pair (WSP) estimate of X on Y unadjusted for E (in blue) moves from the confounded unadjusted estimate of 0.45 to the unbiased estimate of 0.30. (B) Simulations for model (B): within-spouse pair assortment and collider bias. Spousal assortment can induce collider bias in WSP estimates. If spouses assort on two phenotypes X1 and X2 which both affect outcome Y, then the association of X1 and Y (or X2 and Y) estimated from the WSP model is a biased estimate of the causal effect of X1 on Y (or X2 on Y). This bias monotonically increases in the degree of assortment on either X1 or X2.

    (PNG)
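The behaviour described for model (A) can be reproduced qualitatively with a few lines of simulation. This is a minimal sketch, not the code from the authors' repository: the coefficients below (0.5 for E on X, 0.3 for E on Y, unit total variance for X) are assumptions chosen only so that the unadjusted population estimate is about 0.45 and the causal effect is 0.30, matching the values quoted in the caption.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_model_a(spousal_corr, n_pairs=100_000, seed=1):
    """Return (population, WSP) estimates of the effect of X on Y.

    spousal_corr is the correlation between spouses on the confounder E.
    """
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n_pairs):
        # Confounder E for each spouse, correlated at spousal_corr.
        e_m = rng.gauss(0, 1)
        e_f = spousal_corr * e_m + (1 - spousal_corr ** 2) ** 0.5 * rng.gauss(0, 1)
        pair = []
        for e in (e_m, e_f):
            x = 0.5 * e + rng.gauss(0, 0.75 ** 0.5)   # so that Var(X) = 1
            y = 0.3 * x + 0.3 * e + rng.gauss(0, 1)   # assumed causal effect 0.30
            pair.append((x, y))
            x_all.append(x)
            y_all.append(y)
        (x_m, y_m), (x_f, y_f) = pair
        dx.append(x_m - x_f)
        dy.append(y_m - y_f)

    population = ols_slope(x_all, y_all)   # unadjusted for E
    wsp = ols_slope(dx, dy)                # within-spouse pair, unadjusted for E
    return population, wsp
```

With these assumed parameters, `simulate_model_a(0.0)` should return a WSP estimate near the confounded value of 0.45, and `simulate_model_a(1.0)` should return one near 0.30, with intermediate spousal correlations falling in between.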

    S2 Fig. Height of UK Biobank spouse pairs.

    Scatter plot showing male spouse height on the X axis and female spouse height on the Y axis for each spouse-pair.

    (PNG)

    S3 Fig. BMI of UK Biobank spouse pairs.

    Scatter plot showing male spouse BMI on the X axis and female spouse BMI on the Y axis for each spouse-pair.

    (PNG)

    S4 Fig. SBP of UK Biobank spouse pairs.

    Scatter plot showing male spouse SBP on the X axis and female spouse SBP on the Y axis for each spouse-pair.

    (PNG)

    Attachment

    Submitted filename: Review.docx

    Attachment

    Submitted filename: HoweEtAl_PLOSGenet_2021.docx

    Attachment

    Submitted filename: withinSpouseManuscript_reviewer_response_060721.docx

    Attachment

    Submitted filename: WithinSpouse_Response_September2021.docx

    Data Availability Statement

    In this study, we used individual participant data from UK Biobank. Interested researchers would be able to obtain the same data-set from UK Biobank. Reference: https://www.nature.com/articles/s41586-018-0579-z Contact: access@ukbiobank.ac.uk. Relevant code for simulation models is available at the following repository https://github.com/LaurenceHowe/Between-spouse.


    Articles from PLoS Genetics are provided here courtesy of PLOS
