Abstract
The observation of genetic correlations between disparate human traits has been interpreted as evidence of widespread pleiotropy. Here, we introduce cross-trait assortative mating (xAM) as an alternative explanation. We observe that xAM affects many phenotypes and that phenotypic cross-mate correlation estimates are strongly associated with genetic correlation estimates (R2 = 74%). We demonstrate that existing xAM plausibly accounts for substantial fractions of genetic correlation estimates and that previously reported genetic correlation estimates between some pairs of psychiatric disorders are congruent with xAM alone. Finally, we provide evidence for a history of xAM at the genetic level using cross-trait even/odd chromosome polygenic score correlations. Together, our results demonstrate that previous reports have likely overestimated the true genetic similarity between many phenotypes.
One-Sentence Summary:
Statistical artifacts due to non-random mating, rather than shared biology, may explain reported genetic correlations.
Methods that use summary statistics from genome-wide association studies (GWAS) to investigate genetic overlap across phenotypes have become a fundamental statistical tool across many domains of human complex trait genetics (1–5). The results of these analyses have been striking: many trait pairs, even those with limited phenotypic similarity, display nontrivial genetic correlations (for example, 0.209 [se=0.042] for attention-deficit hyperactivity disorder [ADHD] and body mass index [BMI] in (1)). These findings have been broadly interpreted as evidence for widespread pleiotropy across the phenome (6–8), and, in the case of psychiatric disorders, have raised concerns about the suitability of the existing nosology given evidence for shared genetic bases (1, 9).
Here, we consider an overlooked source of potential bias in these findings: cross-trait assortative mating (xAM), the phenomenon whereby mates display cross-correlations across distinct traits. There are several reasons to be concerned with this potential oversight: First, the single-trait linear mixed model, which genetic correlation estimators generalize, is misspecified under single-trait assortative mating (sAM) and overestimates SNP heritability (10). Second, sAM is widespread across multiple domains for which substantial genetic correlations have been observed, including anthropometric, psychosocial, and disease traits (1, 7, 8). Third, recent work has provided genetic-level evidence for a history of sAM with respect to some of these same phenotypes (11). Fourth, xAM is known to generate spurious results for other marker-based inference procedures, including Mendelian randomization (12) and association studies (13).
We set out to systematically assess the impact of xAM on genetic correlation estimates, first compiling a large atlas of cross-mate correlations across a broad array of previously-studied phenotypes utilizing two large population-based samples (n=81,394; n=746,566). We find that these phenotypic cross-mate correlations explain a major portion of empirical marker-based genetic correlation estimates for the same trait pairs (R2=74% across samples). We next demonstrate that xAM biases genetic correlation estimates and yields non-trivial estimates even among traits with uncorrelated genetic effects. We use a simulation-based approach to evaluate the extent to which empirical levels of xAM alone might plausibly explain genetic correlation estimates among previously-studied traits, finding that, for many trait pairs, substantial fractions of empirical genetic correlation estimates are congruent with expectations for etiologically independent traits subject to xAM. At the same time, we observe that particular phenotype pairs, such as schizophrenia and bipolar disorders, evidence substantially larger genetic correlation estimates than can be plausibly attributed to xAM-induced artifact. Lastly, we utilize correlations between even versus odd chromosome-specific polygenic scores (PGS) to detect genetic signatures of xAM, extending a previous approach (11). We find that cross-trait even/odd PGS correlations mirror cross-mate phenotypic correlation patterns and, through this association, explain substantial variation in empirical genetic correlation estimates.
Results
Genetic correlation estimates mirror cross-mate phenotypic correlations
We begin by quantifying the extent to which empirical genetic correlation estimates align with cross-trait spousal correlations across a broad array of phenotypes: a set of 20 previously-studied traits measured in the UK Biobank (UKB) (14) and a collection of six psychiatric disorder diagnoses ascertained from Danish civil registry data (15). We estimated cross-mate correlations for 40,697 spousal pairs within the UKB sample and 373,283 mate pairs randomly selected from the Danish population. For a pair of phenotypes Y and Z, there are three cross-mate correlation parameters: ryy and rzz the correlations between mates on Y and Z, respectively, and ryz, the cross-mate cross-trait correlation; we generically denote these quantities rmate, and present these estimates in the diagonal and sub-diagonal entries of Figs. 1A and 1B. We also compiled LD score regression (LDSC) genetic correlation estimates, denoted , for each pair of phenotypes, which we present in the super-diagonal entries of Figs. 1A and 1B. All pairwise estimates are provided in Table S1.
Cross-mate correlation structures were diverse across the trait pairs we examined (Fig. S1). Whereas cross-mate single-trait and cross-mate cross-trait correlations were similar for some trait pairs (for example, , , and for BMI and hip circumferences), these quantities were of opposing signs for others (for example, and versus for years of education and regular smoking). In general, cross-mate correlation structures were not consistent with sAM alone. When the cross-mate correlation rzz for as secondary trait Z is fully mediated through sAM on Y, we expect that ; this model fit the data poorly (Fig. S2).
Estimates of were strongly associated with rmate estimates across both samples (Fig. 1C; meta-analytic R2=74.32%, 95% CI: 67.02%–81.62% in a linear model; R2=76.69%, 95% CI: 73.94%–79.45% in a Bayesian model accounting for heteroskedasticity and estimation error). The regression slope did not significantly differ across the UKB and psychiatric phenotypes in either model (for example, p=0.16 for a sample-by-rmate interaction term in the linear model). The strength of this association largely persisted when excluding trait pairs with large genetic correlation estimates: considering only trait pairs with estimated genetic correlations below 0.50 in magnitude yielded R2=70.94%; further restricting to those below 0.30 in magnitude, yielded R2=67.88%. This suggests that the observed association does not merely reflect sAM on genetically homogenous factors.
Defining genetic correlation
Having established that a large degree of the variance in genetic correlation estimates can be predicted from phenotypic mating correlations, we now provide theoretical intuition as to why this might occur (see Supplementary Text for further details; (16)). We start by defining three distinct notions of genetic similarity between phenotypes. These definitions are summarized in Table 1.
Table 1.
Metric of genetic similarity | Relation to shared etiology and xAM |
---|---|
Pleiotropy is present at a particular locus when it influences both phenotypes. | Reflects shared etiology. Substantial numbers of pleiotropic loci imply that overlapping genetic variants affect both traits, though their effects may not be consistent. |
Effect correlation (ρβ) refers to the correlation between standardized genetic effects. | Reflects shared etiology. ρβ > 0 implies an overlapping set of variants (at pleiotropic loci) influence both traits with similar effects on average. |
Score correlation (ρℓ) refers to the correlation between true polygenic scores. | Reflects shared etiology, or xAM-induced population structure, or both. Roughly equal to ρβ under random mating but larger than ρβ under xAM due to long-range sign-consistent LD. ρℓ > 0 does not necessarily imply biological similarity or even the existence of pleiotropic loci. |
Genetic correlation estimators (), such as bivariate LD score regression, are commonly interpreted as estimates of the effect correlation. | Reflect shared etiology under random mating but produces estimates substantially greater than both ρβ and ρℓ under xAM, even when ρβ = 0 or in the complete absence of pleiotropy. |
Under random mating, score correlations and effect correlations are equal in expectation, imply the existence of pleiotropic loci, and are well-captured by widely-used genetic correlation estimators. Under xAM, however, substantial score correlations can arise in the absence of effect correlation or even pleiotropy, and genetic correlation estimators overestimate both ρβ and ρℓ.
We consider a pair of phenotypes Y, Z, with heritable components ℓy, ℓz reflecting the additive effects of m standardized haploid variants X1, … Xm with phenotype-specific effect vectors βy, βz. For simplicity, we assume that causal variants are initially unlinked and that both phenotypes have unit variance under random mating (panmixis), such that the panmictic heritabilities are and .
Pleiotropy is present when a locus influences two or more phenotypes. Thus, locus Xi is pleiotropic with respect to Y and Z when both βy;i ≠ 0 and βz;i ≠ 0, though these effects might differ substantially in magnitude or direction. On the other hand, the correlation between effects, which we refer to as the effect correlation ρβ, indexes the similarity of variant effects on two phenotypes:
where the β vectors include all variants, causal or otherwise, and thus may contain elements equal to zero. A value of ρβ > 0 implies both the existence of pleiotropic loci and that said loci have similar effects on average, and we term a pair of traits genetically orthogonal when ρβ = 0. Effect correlation is distinct from the classical definition of genetic correlation as the correlation between the heritable components of two traits (17), which we refer to as the score correlation:
as it reflects the correlation between the true PGS.
Within the standard linear mixed model framework, ρℓ and ρβ are equivalent and hence seldom discussed as separate quantities (though genetic correlation estimates are commonly interpreted as estimates of ρβ (4, 18, 19)). However, traits with uncorrelated effects can unintuitively have correlated PGS. Under xAM, all causal variants affecting trait Y become correlated with all causal variants affecting trait Z, and these correlations are directionally consistent with their respective effects (see Supplementary Text (16)). As we will demonstrate in the following section, this results in non-zero score correlations in the direction of the cross-mate cross-trait phenotypic correlation, even for genetically orthogonal traits.
The impact of xAM in simulations
We ran a series of forward-time simulations using realistic genotype data to investigate the impact of xAM on multiple measures of genetic correlation. At each generation, individuals (consisting of a set of genotypes together with two phenotypes) were matched to achieve target cross-mate correlation parameters, after which we estimated genetic correlations using LDSC (denoted ; (6)), Haseman-Elston regression (HE, denoted ; (20)), and residual maximum likelihood (REML; ; (18)). We also computed true score correlations (ρℓ), which is possible when the true genetic effects are known. We performed sensitivity analyses to confirm that our results did not depend on simulation parameters, including the number of causal variants (Fig. S3), mate selection algorithm (Fig. S4), recombination scheme (Fig. S5), and whether causal variants with orthogonal genetic effects arose on overlapping loci (Fig. S6). We additionally investigated the impact of xAM on GWAS effect estimates and GWAS-based methods for identifying pleiotropic SNPs (Figs. S7 to S10), genetic correlation estimates for binary phenotypes subject to misdiagnosis (Fig. S11), partitioned genetic correlation estimates (Fig. S12), and genetic covariance estimators (Fig. S13).
xAM induces nonzero score correlations among genetically orthogonal traits
We confirmed that xAM induces substantial score correlations across a broad array of simulation parameters. This is perhaps most striking for traits with orthogonal effects: Fig. 2A demonstrates the increase in the true score correlation across multiple generations of xAM for a pair of traits with ρβ = 0, rmate = 0.5, and . Across simulation replicates, the average score correlation was 0.11 after a single generation of xAM, which increased to 0.24 after three generations.
Importantly, this increase in score correlation induced by xAM does not represent bias: the population-level correlation between the heritable components of the phenotypes truly does increase under xAM. On the other hand, as we demonstrate next, genetic correlation estimators become misspecified under xAM and yield biased estimates. Still, even unbiased estimates of score correlation can be driven by either shared biology, or xAM, or both, further complicating interpretation (Table 1).
Effect correlation estimates are biased upwards
For genetically orthogonal traits, after a single generation of xAM, the REML estimator yielded and the HE and LDSC estimators, which are closely related (21, 22), both yielded average estimates of , all of which were greater than the true value of ρβ=0.00. After three generations of xAM, this upward bias became more pronounced, with REML and LDSC yielding estimates of and , respectively.
The quantities ρ ℓ , and are monotonically related to ρ β , r mate , and
Many trait pairs subject to xAM will truly have correlated genetic effects. Figure 2B illustrates the relationship between ρβ, ρℓ, and for two traits with . Excepting the case of ρβ=1.0 (genetically identical phenotypes), results remained consistent with the genetically orthogonal case: ρβ was lower than ρℓ, which was in turn lower than the upwardly biased estimates provided by REML, HE, and LDSC. For example, when ρβ=0.25, LDSC yielded after three generations of xAM. We note that whereas the true effect correlation varies in Fig. 2B, the cross-mate correlations remain fixed, demonstrating that the potential for xAM-induced bias is present even when cross-mate cross-trait correlations partially reflect shared genetic bases. The impact of xAM on both ρℓ and was greater for traits under stronger xAM (Fig. 2C) and for traits with greater heritabilites (Fig. 2D).
xAM biases annotation- and locus-level analyses
Partitioned genetic correlation estimators evidenced similar biases as genome-wide estimators under xAM, even when supplied with annotations directly relevant to bivariate genetic architecture. Further, this bias was greatest at regions relevant to only one of the two phenotypes (Fig. S12).
In association studies, GWAS effect estimates for SNPs causal for the focal trait were biased upwards in magnitude whereas those causal for a secondary, unrelated trait under xAM with the first were biased toward their effects on that trait (Figs. S7 to S9). These biases were asymptotically non-negligible (Fig. S9). As a result, xAM increased the likelihood of rejecting the null-hypothesis of no association at all SNPs causal for either trait, increasing both statistical power and false positives rates (Fig. S10). Eventually, all variants affecting a secondary phenotype subject to xAM with the GWAS phenotype will reach genome-wide significance as sample size becomes large. However, spurious effect estimates will remain attenuated (Figs. S7 to S9), implying that methods for identifying cross-trait heterogeneity in GWAS estimates may have the potential to differentiate trait-specific signal from xAM-induced artifacts.
xAM alone can plausibly explain substantial variance in empirical genetic correlation estimates
We next sought to quantify the extent to which empirical estimates for previously-studied trait pairs could be explained by xAM alone, assuming genetic orthogonality. We proceeded with a simulation-based approach, which we present for the UKB and psychiatric phenotypes in turn.
Within each sample we identified mate pairs and estimated phenotypic cross-mate correlations. We then used these estimates, together with empirical heritability estimates, as inputs to a forward-time simulation where separate, non-overlapping collections of causal variants were assigned to each phenotype, such that ρβ=0.0. At each generation, we estimated the effect correlation, ρβ, using method-of-moments. These projected effect correlation estimates, which we denote , can be interpreted as expected LDSC genetic correlation estimates for genetically orthogonal traits under xAM consistent with empirical spousal correlations.
We next compared to empirical LDSC estimates derived in real data, which we denote . To simplify discussion, we define the ratio:
which measures the projected LDSC effect correlation estimate due to xAM-induced artifact relative to the empirical LDSC effect correlation estimate for a given phenotype pair (Fig. 3A).
Expected effect correlation estimates for UKB phenotypes in the absence of pleiotropy
We restricted our attention to 132 (of 190 possible) pairs of UKB phenotypes with nominally significant (p < 0.05) LDSC genetic correlation estimates (Table S1). We first obtained pedigree-based heritability estimates for each of the traits of interest from the literature, using estimates derived in demographically-comparable (Table S2). Together with the phenotypic mating correlations (Fig. 1A), these comprised inputs to forward time simulations used to compute .
Across 132 trait pairs, 42 evidenced values significantly greater than zero (their 95% credible intervals did not include zero) after a single generation of xAM, which increased to 74 trait pairs after three generations. Across all trait pairs (including those not significantly different from zero), the inverse variance weighted average estimate was 0.25 (se=0.005). Figure 3B presents the first 20 pairs in descending order of and Fig. 3C presents the raw projected and empirical effect correlation estimates across all 132 pairs (see Table S3 and Figs. S14 to S15 for detailed results spanning five generations of xAM). Finally, Fig. 3D displays average values within and between qualitative phenotypic domains.
Expected effect correlation estimates among psychiatric disorders in the absence of pleiotropy
We next estimated for a collection of six psychiatric disorders, using correlations estimated in spousal pairs randomly selected from the Danish population (Table S4). We then compared these projections to the LDSC genetic correlation estimates reported by Grotzinger and colleagues (23).
Across all pairwise combinations of disorders, we observed an average ratio of (se=0.016; Figs. 4A and 4B) after five generations of xAM. Some trait pairs evidenced considerably greater empirical genetic correlation estimates than might be explained by xAM alone (for anxiety disorders and major depression, , 95% CI: 0.17–0.25), whereas for other pairs this discrepancy was modest (for alcohol use disorder and schizophrenia, , 95% CI: 0.59–1.24; see Table S5 and Fig. S16 for complete results).
xAM exacerbates bias due to misdiagnosis
After additional simulations demonstrated that xAM and further inflates genetic correlation estimates in the context diagnostic errors (Fig. S11), we extended our method for estimating to incorporate misdiagnosis. Results were heterogenous across disorder pairs (Fig. S17). For example, whereas moderate rates of diagnostic errors (5%) together with three generations of xAM yielded genetic correlation estimates for ADHD and major depression on par with published estimates (, 95% CI: 0.73–1.22), substantial diagnostic errors (15%) after five generations of xAM yielded estimates well below previously published estimates for bipolar disorders and major depression (, 95% CI: 0.12–0.62). Figure 4C highlights the potential impacts of xAM and diagnostic errors on four selected trait pairs and Fig. S17 presents results for all pairs.
Genetic evidence for xAM recapitulates empirical cross-mate correlations
Cross-mate phenotypic correlation estimates (rmate) explained substantial variance in the cross-chromosome even/odd PGS correlations in a linear model (; R2=47.66%; Fig. 5A). This association, which is congruent with expectations under phenotypically-mediated xAM (Supplementary Text; (16)), persisted when accounting for measurement error and heteroskedasticity, and across PGS p-value thresholds (Fig. S18).
Additionally, cross-trait even/odd chromosome PGS correlations were positively associated with empirical LDSC genetic correlation estimates (; R2=34.81%; Fig. 5B). This is consistent with the hypothesis that empirical effect correlation estimates are capturing additional structure beyond the signatures of biological overlap. Further, regressing on and simultaneously revealed that the association between and is mediated via (ΔR2<0.001; partial effect p=0.48 for versus p<5e-8 for ). Thus, alternative sources of structure independent from xAM do not appear to explain the positive association between and .
Discussion
Nonzero effect correlation estimates have been widely interpreted as evidence for overlapping genetic bases. It is therefore surprising that substantial variation in genetic correlation estimates can be explained by cross-mate phenotypic correlations. Given the strength of this association, the consequences of the random mating assumption implicit in all commonly used genetic correlation estimators warrant critical attention.
Our results show that cross-mate phenotypic correlations among many pairs of phenotypes are strong enough that one or more generations of xAM would substantially inflate genetic correlation estimates. Further, the correlation structures of even/odd chromosome PGS coincide with expectations after one or more generations of xAM. Wes conclude that xAM comprises a source of systematic bias in the study of genetic similarity across complex traits, one which subverts the widespread interpretation of genetic correlation as a direct index of biological similarity.
Our findings mirror recent results regarding the potential impacts of assortative mating across other areas of statistical genetics, including marker-based heritability estimation (10), and Mendelian randomization (12). Our results also complicate the interpretation of a number of multivariate analytic frameworks. For example, genomic structural equation modeling (24), which takes marker-based genetic correlation estimates as inputs, will propagate xAM induced biases. This does not mean such methods are fundamentally flawed, but instead demonstrates the importance of developing unbiased effect correlation estimators given the centrality of genetic correlation estimates in modern statistical genetics.
With this in mind, we comment on potential approaches to disentangling true effect correlation from xAM-induced artifact. First, family-based designs for addressing xAM (25) are increasingly being applied to molecular genetic data with promising results (26). Second, we conjecture that approaches aimed at characterizing effect heterogeneity across multiple phenotypes may provide a viable means for identifying trait-specific loci: though all trait-specific loci for either of two traits will achieve genome-wide significance in large sample GWAS of either trait, effect estimates will remain substantially larger at causal loci. Finally, we propose that directly modeling the dependence between genotypes and their effects will allow the differentiation of effect correlations and score correlations in samples of unrelated individuals.
There are several limitations to the current investigation. Foremost among these are the numerous assumptions about population dynamics required to model xAM, including but not limited to: panmictic heritabilities, stability of cross-mate phenotypic correlation structures over succeeding generations, stability and transmissability of environmental factors, and the extent to which mating patterns reflect social versus genetic homogamy. We proceeded under the tractable dynamical framework of two additive phenotypes subject to primary-phenotypic xAM with constant cross-mate correlations, stable non-heritable sources of variation, and no vertical transmission. Though each of these assumptions is likely untenable for particular trait pairs, thereby compromising the accuracy of our projections, we hypothesize that the qualitative phenomenon whereby xAM inflates genetic correlation estimates will persist for many traits. Nonetheless, we caution that these projections are contingent upon multiple consequential decisions. Constructing a generative model that reconciles the association between empirical mating patterns and genetic correlation estimates is an ill-posed inverse problem for which there are multiple solutions, and of which we have only explored a subset. At the same time, existing methods are only able to sidestep these decisions by making the strong (and often incorrect) assumption that mating is random.
Lastly, we remark that xAM is, in essence, a form of population structure not captured by conventional principal component or mixed-model based correction. Given the increasing evidence that existing methods fail to completely address structural factors, even in ostensibly ancestrally homogenous groups (27), a broader characterization of population structure and methods for addressing such structure will likely be necessary to generate results that are maximally clinically relevant and can be applied equitably.
Supplementary Material
Acknowledgments
The authors would like to thank the editors and referees as well as Professors Alkes L. Price and Matthew C. Keller for their thorough and helpful feedback.
Funding
Chan Zuckerberg Initiative grant CZF2019-002449 (NAZ, RB)
National Institutes of Health grant R01AG042568 (AIY)
National Institutes of Health grant R01CA227237 (NAZ, RB)
National Institutes of Health grant R01CA227466 (NAZ, RB)
National Institutes of Health grant R01ES029929 (NAZ, RB)
National Institutes of Health grant R01GM142112 (NAZ, RB)
National Institutes of Health grant R01HG006399 (NAZ, RB)
National Institutes of Health grant R01HG011345 (NAZ, RB)
National Institutes of Health grant R01HL151152 (NAZ, RB)
National Institutes of Health grant R01HL155024 (NAZ, RB)
National Institutes of Health grant R01MH125252 (NAZ, RB)
National Institutes of Health grant R01MH122688 (NAZ, RB)
National Institutes of Health grant R01MH130581 (AWD, JF, KSK, NAZ, NC)
National Institutes of Health grant R24AG065184 (AIY)
National Institutes of Health grant R35GM125055 (SS)
National Institutes of Health grant R35GM133531 (NAZ, RB)
National Institutes of Health grant T32NS048004 (RB)
National Institutes of Health grant U01MH126798 (NAZ, RB)
National Institutes of Health grant U01HG009080 (NAZ, RB)
National Science Foundation award CAREER-1943497 (SS)
Open Philanthropy grant 010623-00001 (AIY)
Wellcome Trust grant 200176/Z/15/Z (JF, KSK)
Footnotes
Competing interests: Authors declare that they have no competing interests.
This manuscript has been accepted for publication in Science. This version has not undergone final editing. Please refer to the complete version of record at http://www.sciencemag.org/. The manuscript may not be reproduced or used in any manner that does not fall within the fair use provisions of the Copyright Act without the prior, written permission of AAAS.
Supplementary Materials:
Data and materials availability:
Cross-mate correlation estimates and all other results are provided in Tables S1 to S6 (16). Code used to perform simulations is publicly available (28). UK Biobank data were accessed under application 33127 and are available through the UK Biobank Access Management System. Danish data are stored in a national high performance computing facility in Denmark. Access to data may be granted upon application to Statistics Denmark in accordance with Danish Law as detailed online (29).
References and Notes
- 1.Anttila V et al. , Analysis of shared heritability in common disorders of the brain. Science. 360 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Lee PH, Feng Y-CA, Smoller JW, Pleiotropy and Cross-Disorder Genetics Among Psychiatric Disorders. Biol. Psychiatry. 89, 20–31 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Weissbrod O, Flint J, Rosset S, Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics. Am. J. Hum. Genet. 103, 89–99 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Bulik-Sullivan BK et al. , LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Wu Y et al. , Genome-wide association study of medication-use and associated disease in the UK Biobank. Nat. Commun. 10, 1891 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bulik-Sullivan B et al. , An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Zheng J et al. , LD Hub: A centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinforma. Oxf. Engl. 33, 272–279 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR, Genetic correlations of polygenic disease traits: From theory to practice. Nat. Rev. Genet. 20, 567–581 (2019). [DOI] [PubMed] [Google Scholar]
- 9.Lee PH et al. , Genomic Relationships, Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders. Cell. 179, 1469–1482.e11 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Border R et al. , Assortative mating biases marker-based heritability estimators. Nat. Commun. 13, 660 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Yengo L et al. , Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Hartwig FP, Davies NM, Smith GD, Bias in Mendelian randomization due to assortative mating. Genet. Epidemiol. 42, 608–620 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Kong A et al. , The nature of nurture: Effects of parental genotypes. Science. 359, 424–428 (2018). [DOI] [PubMed] [Google Scholar]
- 14.Bycroft C et al. , The UK Biobank resource with deep phenotyping and genomic data. Nature. 562, 203–209 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Plana-Ripoll O et al. , Exploring comorbidity within mental disorders among a Danish national population. JAMA Psychiatry. 76, 259–270 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Materials and methods are available as supplementary materials at the Science website.
- 17.Falconer DS, Introduction to quantitative genetics (Oliver And Boyd; Edinburgh; London, 1960). [Google Scholar]
- 18.Yang J, Lee SH, Goddard ME, Visscher PM, GCTA: A Tool for Genome-wide Complex Trait Analysis. Am. J. Hum. Genet. 88, 76–82 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Loh P-R, Kichaev G, Gazal S, Schoech AP, Price AL, Mixed-model association for biobank-scale datasets. Nat. Genet, 1 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Haseman JK, Elston RC, The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2, 3–19 (1972). [DOI] [PubMed] [Google Scholar]
- 21.Bulik-Sullivan B, “Relationship between LD Score and Haseman-Elston Regression” (preprint, Genetics, 2015), , doi: 10.1101/018283. [DOI] [Google Scholar]
- 22.Zhou X, A unified framework for variance component estimation with summary statistics in genome-wide association studies. Ann. Appl. Stat. 11, 2027 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Grotzinger AD et al. , Genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. Nat. Genet. 54, 548–559 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Grotzinger AD et al. , Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Keller MC et al. , The Genetic Correlation between Height and IQ: Shared Genes or Assortative Mating? PLOS Genet. 9, e1003451 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Howe LJ et al. , Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nat. Genet. 54, 581–592 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Berg JJ et al. , Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 8, e39725 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Border R, xAM_and_gen_corr (2022), doi: 10.5281/zenodo.7065696. [DOI] [Google Scholar]
- 29.Statistics Denmark: Data for research, (available at https://www.dst.dk/en/TilSalg/Forskningsservice).
- 30.Pedersen CB, Gøtzsche H, Møller JØ, Mortensen PB, The Danish civil registration system. Dan Med Bull. 53, 441–449 (2006). [PubMed] [Google Scholar]
- 31.Pedersen CB, The Danish civil registration system. Scand. J. Public Health. 39, 22–25 (2011). [DOI] [PubMed] [Google Scholar]
- 32.Schmidt M et al. , The Danish National Patient Registry: a review of content, data quality, and research potential. Clin. Epidemiol. 7, 449 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Mors O, Perto GP, Mortensen PB, The Danish psychiatric central research register. Scand. J. Public Health. 39, 54–57 (2011). [DOI] [PubMed] [Google Scholar]
- 34.Gandal MJ et al. , Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science. 359, 693–697 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Howe LJ et al. , Genetic evidence for assortative mating on alcohol consumption in the UK Biobank. Nat. Commun. 10, 1–10 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Rosseel Y, lavaan: An R package for structural equation modeling. J. Stat. Softw. 48, 1–36 (2012). [Google Scholar]
- 37.R Core Team, “R: A language and environment for statistical computing” (manual, Vienna, Austria, 2021), (available at https://www.R-project.org/). [Google Scholar]
- 38.The 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature. 526, 68–74 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Chang CC et al. , Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 4, 7 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Ho DE, Imai K, King G, Stuart EA, MatchIt: Nonparametric preprocessing for parametric causal inference. J. Stat. Softw. 42, 1–28 (2011). [Google Scholar]
- 41.Pazokitoroudi A et al. , Efficient variance components analysis across millions of genomes. Nat. Commun. 11, 4020 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Wu Y et al. , Fast estimation of genetic correlation for biobank-scale data. Am. J. Hum. Genet. 109, 24–32 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Hagberg A, networkx: Python package for creating and manipulating graphs and networks, (available at https://networkx.org/).
- 44.Eaves L, The use of twins in the analysis of assortative mating. Heredity. 43, 399–409 (1979). [DOI] [PubMed] [Google Scholar]
- 45.Bürkner P-C, brms: An R Package for Bayesian Multilevel Models Using Stan. J. Stat. Softw. 80, 1–28 (2017). [Google Scholar]
- 46.Nagylaki T, Assortative mating for a quantitative character. J. Math. Biol. 16, 57–74 (1982). [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Cross-mate correlation estimates and all other results are provided in Tables S1 to S6 (16). Code used to perform simulations is publicly available (28). UK Biobank data were accessed under application 33127 and are available through the UK Biobank Access Management System. Danish data are stored in a national high performance computing facility in Denmark. Access to data may be granted upon application to Statistics Denmark in accordance with Danish Law as detailed online (29).