Abstract
We interrogate the joint genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic, and molecular genetic levels of analysis. We identify four broad factors (Neurodevelopmental, Compulsive, Psychotic, and Internalizing) that underlie genetic correlations among the disorders, and test whether these factors adequately explain their genetic correlations with biobehavioral traits. We introduce Stratified Genomic Structural Equation Modeling, which we use to identify gene sets that disproportionately contribute to genetic risk sharing. This includes protein-truncating variant–intolerant genes expressed in excitatory and GABAergic brain cells that are enriched for genetic overlap across disorders with psychotic features. Multivariate association analyses detect 152 (20 novel) independent loci that act on the individual factors and identify nine loci that act heterogeneously across disorders within a factor. Despite moderate-to-high genetic correlations across all 11 disorders, we find little utility of a single dimension of genetic risk across psychiatric disorders either at the level of biobehavioral correlates or at the level of individual variants.
Psychiatric disorders aggregate both within individuals and within families. Offspring of parents with psychiatric illness are at higher risk for developing a broad range of psychiatric disorders, not just the specific parental disorder1-3. Moreover, approximately half of individuals with a psychiatric illness will concurrently meet criteria for a second disorder4. Comorbidity is the norm, rather than the exception. Factor analyses that have modeled these comorbidity patterns consistently identify a transdiagnostic p-factor representing general risk across psychiatric disorders, along with several intermediate factors representing more specific clusters of psychiatric risk (e.g., psychotic disorders, mood disorders)5-7. Modern genomics has built on these findings to begin to elucidate the genetic basis for shared risk across disorders8,9, with new statistical tools paired with genome-wide association study (GWAS) data being used to identify variants associated with multiple disorders10,11. Most recently, Lee et al.12 identified three major dimensions of genetic risk sharing (Neurodevelopmental, Compulsive and Psychotic) across eight psychiatric disorders, raising the possibility that key mechanisms of individual disorder risk may operate through these more general factors. Importantly, however, neither phenotypic comorbidity nor genetic correlations among disorders are by themselves sufficient to establish the etiological, diagnostic, or therapeutic utility of the identified factors.
Here we apply Genomic Structural Equation Modelling (Genomic SEM)13 to GWAS data to examine the genetic architecture of 11 major psychiatric disorders (average total sample size per disorder = 156,771 participants; range = 9,725–802,939) across biobehavioral, functional genomic, and molecular genetic levels of analysis. Genomic SEM is able to investigate the multivariate genetic architecture across disorders that were not measured in the same sample, thereby offering novel insights across the diagnostic spectrum. We begin by estimating several potential genomic factor models and identify four broad factors that index shared genetic liability within and across constellations of disorders. We then evaluate the utility of these factors using a multi-step approach. First, we test the extent to which the factors adequately explain the patterns of genetic correlation between psychiatric disorders and a wide range of external biobehavioral traits. Second, we introduce Stratified Genomic SEM, which we apply to identify gene sets and categories (e.g., protein-truncating variant–intolerant genes, low minor allele frequency (MAF) SNPs) for which genetic risk sharing across subclusters of disorders, as indexed by each of the factors, and genetic differentiation, as indexed by disorder-specific residuals, is enriched. Finally, we capitalize on Genomic SEM for multivariate GWAS to identify loci that confer risk to multiple disorders via the factors, along with loci that operate heterogeneously across disorders within a given factor. Collectively, these results offer key insights into the shared and disorder-specific mechanisms of genetic risk for psychiatric disease.
Results
Genomic factor analysis across 11 psychiatric traits.
We curated the most recent European ancestry GWAS summary data for 11 major psychiatric disorders: attention-deficit/hyperactivity disorder (ADHD)14, problematic alcohol use (ALCH)15, anorexia nervosa (AN)16, autism spectrum disorder (AUT)17, anxiety disorders (ANX)18,19, bipolar disorder (BIP)20, major depressive disorder (MDD)21,22, obsessive compulsive disorder (OCD)23, post-traumatic stress disorder (PTSD)24,25, schizophrenia (SCZ)26, and Tourette syndrome (TS)27 (Table 1 and Supplementary Table 1). A heatmap of genetic correlations estimated using LD Score regression (LDSC)8 indicates pervasive overlap across the 11 disorders, with more pronounced clustering observed among certain constellations of disorders (Fig. 1a and Supplementary Table 2).
Table 1 ∣.
Contributing univariate GWAS | Population prevalence | Cases / Controls | SNP-heritability (SE) | Mean χ2(1) | LDSC univariate intercept | Independent hits (LD with Q hits) | LD with factor hits (LD with Q hits) | Unique from factor hits (LD with Q hits) |
---|---|---|---|---|---|---|---|---|
AN | .009 | 16,992 / 55,525 | .138 (.010) | 1.297 | 1.020 | 8 (0) | 1 (0) | 7 (0) |
OCD | .02 | 2,688 / 7,037 | .265 (.019) | 1.062 | 0.993 | 0 (0) | 0 (0) | 0 (0) |
TS | .007 | 4,819 / 9,488 | .207 (.026) | 1.123 | 1.014 | 1 (0) | 0 (0) | 1 (0) |
SCZ | .01 | 53,386 / 77,258 | .208 (.005) | 2.118 | 1.077 | 179 (2) | 89 (2) | 90 (0) |
BIP | .01 | 20,352 / 31,358 | .202 (.009) | 1.396 | 1.020 | 16 (0) | 9 (0) | 7 (0) |
ALCH | .12 | 176,024 | .060 (.040) | 1.199 | 0.994 | 6 (3) | 2 (1) | 4 (2) |
ADHD | .087 | 24,116 / 91,557 | .250 (.025) | 1.221 | 0.969 | 6 (0) | 3 (0) | 3 (0) |
AUT | .02 | 18,382 / 27,969 | .133 (.011) | 1.198 | 1.008 | 3 (1) | 0 (0) | 3 (1) |
PTSD | .068 | 12,255 / 26,338 | .264 (.008) | 1.119 | 0.991 | 0 (0) | 0 (0) | 0 (0) |
MDD | .21 | 249,227 / 553,712 | .087 (.014) | 1.957 | 1.024 | 109 (0) | 43 (0) | 66 (0) |
ANX | .311 | 30,993 / 69,883 | .294 (.003) | 1.194 | 0.998 | 2 (0) | 2 (0) | 0 (0) |
Independent hits were defined using a pruning window of 250 kb and r2 < 0.1. Hits are considered in LD if they had r2 > 0.10 or were within a 250-kb window of one another. Reported population prevalences were used for the liability scale conversion. We note that, for the five traits (ALCH, ADHD, PTSD, MDD, and ANX) for which the finalized univariate summary statistics were produced by applying Genomic SEM to meta-analyze two summary statistics of the same or similar phenotypes, the liability scale conversion was used for the univariate meta-analysis only. The meta-analyzed outcome for these five traits was subsequently treated as continuous in all downstream analyses, as these then reflected summary statistics for a latent factor defined by the two indicators in the prior, meta-analytic stage. Values in parentheses indicate how many of the hits were in LD with factor-specific QSNP hits from the respective model. To facilitate comparison across mean χ2 values reported in each row, all χ2 statistics with df > 1 were converted to χ2(1) statistics before taking their means. For all GWAS analyses, we correct for multiple testing by employing the field standard significance threshold of P < 5 × 10−8.
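The liability scale conversion mentioned in the note above can be illustrated with a minimal sketch. It assumes the standard transformation commonly applied to case-control heritability estimates, in which the observed-scale estimate is rescaled using the population prevalence (K), the sample prevalence (P), and the standard normal density at the liability threshold. The function name and the observed-scale input are hypothetical and are not taken from the analyses reported here.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, sample_prevalence, population_prevalence):
    """Rescale an observed-scale SNP heritability to the liability scale.

    Assumes the standard case-control transformation:
    h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)),
    where z is the standard normal density at the liability threshold.
    """
    k = population_prevalence
    p = sample_prevalence
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)  # liability threshold for prevalence K
    z = std_normal.pdf(threshold)            # normal density at that threshold
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Sample prevalence implied by the AN row of Table 1 (cases / controls).
cases, controls = 16_992, 55_525
sample_prev = cases / (cases + controls)

# Hypothetical observed-scale estimate, for illustration only.
example = liability_scale_h2(0.10, sample_prev, 0.009)
```

Under this formula the liability-scale estimate is proportional to the observed-scale estimate, so the choice of population prevalence acts as a fixed multiplier for a given sample.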
We formally modeled this genetic covariance structure in Genomic SEM13, finding that a four correlated factors model fit the data well (Fig. 1b). Factor 1 consists of disorders characterized largely by compulsive behaviors (AN, OCD, TS). Factor 2 is characterized by disorders that may have psychotic features (SCZ, BIP). Factor 3 is characterized primarily by childhood-onset neurodevelopmental disorders (ADHD, AUT). Factor 4 is characterized by internalizing disorders (ANX, MDD). In line with prior evidence for a higher-order transdiagnostic “p-factor”5-7, we find that a hierarchical model also fit the genetic covariance structure well (Fig. 1c). We retained these two models—the four correlated factors model and the hierarchical factor model—to examine the utility of the genomic factors at biobehavioral, functional genomic, and molecular levels of analysis. We discuss a post-hoc bifactor model (Fig. 1d) at the end of the Results section.
Psychiatric genetics factors and biobehavioral traits.
We examined patterns of correlations across the psychiatric factors and 49 biobehavioral traits28, 101 metrics of brain morphology29, and circadian activity across 24 hours30. Results for brain morphology are presented in Supplementary Figures 3 and 4 and Supplementary Table 4, as none of these associations were significant at a Bonferroni-corrected threshold for 174 tests (P < 2.87 × 10−4). To evaluate the extent to which external traits operated through a given factor, we calculated χ2 difference tests comparing a model in which the trait predicted the factor only to one in which it predicted the individual disorders of a given factor (or the first-order factors in the case of analyses using the p-factor model; Supplementary Fig. 5). We term the χ2 difference across these two models the Qtrait heterogeneity index (Fig. 2). A significant Qtrait index indicates that the pattern of associations between the individual disorders and the external trait is not well accounted for by the factor.
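The arithmetic behind the corrected threshold, and the nested-model comparison that defines Qtrait, can be sketched as follows. This is an illustration rather than the analysis code: the family-wise error rate of 0.05 and the reading of the accelerometer data as one test per hour are assumptions, and the function name `q_trait` is hypothetical.

```python
# Number of external traits per category, as reported in the text.
N_BIOBEHAVIORAL = 49
N_BRAIN_MORPHOLOGY = 101
N_ACCELEROMETER_HOURS = 24  # assumed: one test per hour of the 24-hour cycle

ALPHA = 0.05  # assumed family-wise error rate; not stated in the text

n_tests = N_BIOBEHAVIORAL + N_BRAIN_MORPHOLOGY + N_ACCELEROMETER_HOURS
bonferroni_threshold = ALPHA / n_tests


def q_trait(chisq_factor_only, df_factor_only, chisq_disorders, df_disorders):
    """Return (chi-square difference, df difference) for two nested models.

    The factor-only model is the more constrained of the two, so its
    chi-square and degrees of freedom are expected to be at least as large.
    """
    return chisq_factor_only - chisq_disorders, df_factor_only - df_disorders


print(n_tests)                        # 174
print(f"{bonferroni_threshold:.2e}")  # 2.87e-04
```

The difference returned by `q_trait` would then be referred to a χ2 distribution with the returned degrees of freedom and compared against the corrected threshold.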
Using a Bonferroni correction for 174 tests, 7/49 correlations with biobehavioral traits were significant for Qtrait for the Compulsive factor, 18/49 for the Psychotic factor, 39/49 for the Neurodevelopmental factor, 17/49 for the Internalizing factor, and 38/49 for the p-factor (Fig. 3 and Supplementary Table 5). Excluding genetic correlations significant for Qtrait, and using the same Bonferroni correction, 17 genetic correlations were significant for the Compulsive factor, 12 for the Psychotic factor, five for the Neurodevelopmental factor, 20 for the Internalizing factor, and three for the p-factor. We provide a more detailed assessment of significant correlations in the Supplementary Note.
Atypical patterns of physical movement throughout the 24-hour cycle may reflect disturbances in basic homeostatic processes that confer transdiagnostic psychiatric risk31. Using accelerometer data from UK Biobank30, we examined genetic correlations between the individual psychiatric traits and factors and physical movement across a 24-hour period (Fig. 4 and Supplementary Table 6). One correlation was significant for Qtrait for the Compulsive factor, two for the Psychotic factor, 12 for the Neurodevelopmental factor, seven for the Internalizing factor, and 18 for the p-factor. Excluding significant Qtrait correlations, eight correlations were significant for the Compulsive factor, four for the Psychotic factor, one for the Neurodevelopmental factor, six for the Internalizing factor, and two for the p-factor.
Compulsive disorders were positively genetically correlated with physical movement throughout the daylight hours and into the evening. Psychotic disorders were positively genetically correlated with excess movement in the early morning hours. The pattern of associations deviated from the factor structure largely in the daylight and evening hours, with larger positive genetic correlations observed for BIP. Genetic correlations with movement throughout the day were heterogeneous across disorders that load on the Neurodevelopmental disorders factor. Internalizing disorders were negatively genetically correlated with movement throughout the daylight and earlier evening hours.
Stratified Genomic SEM.
Overview and validation via simulation.
We developed Stratified Genomic SEM to allow the basic principles of Genomic SEM to be applied to genetic covariance matrices estimated within different gene sets and categories (Methods). These gene sets and categories, collectively referred to as annotations, can be constructed based on a variety of sources, such as collateral gene expression data obtained from single-cell RNA sequencing. Such an analysis goes beyond methods such as Stratified LDSC (S-LDSC)32 that estimate enrichment of heritability for particular traits within functional annotations. Rather, Stratified Genomic SEM utilizes a multivariate framework to ask whether shared and unique genetic signal across a set of traits is enriched within particular annotations. Enrichment is defined as the ratio of the proportion of genome-wide risk sharing indexed by the annotation to that annotation’s size as a proportion of the genome (Methods). The null, corresponding to no enrichment, is a ratio of 1.0, with values above 1.0 indicating enriched signal within a functional annotation.
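The enrichment ratio defined above can be written out as a small sketch. The function and argument names are hypothetical, and annotation size is expressed here as a SNP count relative to the total number of SNPs, which is an assumption about how "proportion of the genome" is operationalized.

```python
def enrichment(annotation_signal, genome_wide_signal, annotation_snps, total_snps):
    """Ratio of the share of signal in an annotation to the annotation's share of SNPs.

    A value of 1.0 corresponds to the null of no enrichment; values above 1.0
    indicate that the annotation carries more signal than its size implies.
    """
    proportion_of_signal = annotation_signal / genome_wide_signal
    proportion_of_genome = annotation_snps / total_snps
    return proportion_of_signal / proportion_of_genome


# Hypothetical numbers: an annotation covering 10% of SNPs that carries
# 30% of the genome-wide signal is enriched roughly threefold.
example = enrichment(0.03, 0.10, 10_000, 100_000)
```

In Stratified Genomic SEM the signal in the numerator would be a model parameter such as factor variance estimated within the annotation, rather than a single trait's heritability as in S-LDSC.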
In order to validate the key statistical properties of Stratified Genomic SEM, we began by simulating genetically correlated phenotypes that were enriched in six annotations. We then show that our multivariate extension of S-LDSC produces accurate estimates of stratified genetic covariance along with unbiased standard errors (Supplementary Figs. 8-10 and Supplementary Tables 7-9). Finally, we demonstrate that these stratified genetic covariance matrices can be used as input to Stratified Genomic SEM to produce unbiased factor loadings and unbiased standard errors (Supplementary Fig. 11).
Genetic enrichment of psychiatric factors.
We fit Stratified Genomic SEM models to examine whether the degree of risk sharing and differentiation is enriched across disorders. In total, enrichment analyses were based on 168 binary annotations. This included 29 annotations created to examine the interaction between expression patterns for protein-truncating variant (PTV)–intolerant (PI) genes (obtained from the Genome Aggregation Database; gnomAD33) and human brain cells in the hippocampus and prefrontal cortex (obtained from GTEx34). Using a Bonferroni correction for 168 tests, we identify 40 annotations significantly enriched for the Psychotic disorders factor, one annotation (conserved primate) for the Neurodevelopmental disorders factor, four annotations for the Internalizing disorders factor, and 38 annotations for the p-factor (Supplementary Table 10 and Supplementary Figs. 12-18).
PI results revealed that these annotations were particularly enriched for the Psychotic disorders factor, with 5 out of the 10 most significantly enriched gene sets falling in this category (Fig. 5). The most enriched annotations for the Neurodevelopmental and Internalizing disorders factors were fetal female brain DNase and fetal male brain H3K4me1, respectively. For specific tissues, brain regions were generally enriched, as was also observed for other complex traits35, but were most enriched for the Psychotic disorders factor. Genetic sharing across disorders, as estimated by a higher order p-factor, was enriched in conserved annotations, and enrichment increased from low to high MAF alleles (Supplementary Figs. 19-24).
We went on to examine enrichment of residual (i.e., unique) variance for the individual disorders in the correlated factors model and the residuals of the psychiatric factors in the hierarchical model (Supplementary Table 10). Results for the individual disorders revealed 17 significant residual enrichment estimates at a Bonferroni-corrected threshold. This included 13 significant estimates for various disorders within evolutionarily conserved annotations (e.g., conserved primate), along with significant enrichment unique to MDD for coding regions and the PI × excitatory dentate gyrus neurons annotation.
Multivariate GWAS.
Simulations.
We conducted a series of simulations to further validate the calibration of Genomic SEM for multivariate GWAS in the specific context of the analyses presented here. For each simulation, we used Genomic SEM to estimate factor-specific SNP effects and factor-specific indices of heterogeneity, as indexed by QSNP13. QSNP indexes violation of the null hypothesis that the SNP acts on the individual disorders entirely via the factor on which they load (Fig. 2; see Methods). As expected, simulation results revealed that the power to detect multivariate SNP effects and QSNP decreased and increased, respectively, as population SNP effects increasingly deviated from those implied by the factor structure (Supplementary Figs. 25-28 and Supplementary Table 11). These simulations additionally illustrated that SNP effects on factors, as estimated with Genomic SEM, are not simply the reflection of the most highly powered univariate GWAS that defines the factor, that there is null signal when the population of SNP effects is set to 0, and that power for QSNP is particularly high when there are directionally discordant SNP effects across the factor indicators.
We present additional results benchmarking Genomic SEM against existing methods— Multi-trait Analysis of GWAS (MTAG)36, Model Averaging Genome-wide Association Meta-analysis (MA-GWAMA), and N-weighted Multivariate GWAMA (N-GWAMA)37—in the Supplementary Note. In addition, we examined the performance of multivariate GWAS in Genomic SEM when specified as an unstructured model that computes an omnibus index of association across all 11 disorders. Unstructured model results were obtained by comparing a maximally complex model in which the SNP is allowed to have direct regression relations with each of the 11 disorders against a null model in which the SNP is associated with none of the disorders. This is in contrast to the multivariate GWAS specified as a factor model discussed initially that estimates SNP effects on the factors, as this defines a structure of the relationship between the SNP and the 11 disorders. Briefly, we find the unstructured model is particularly well suited when the aim is to identify an exhaustive set of SNPs relevant to psychiatric risk, but does little to elucidate the specific patterning of associations. In contrast, the factor model allows us to systematically probe the genetic underpinnings of convergence and divergence across clusters of psychiatric disorders.
Empirical results.
Using the 4,775,763 SNPs present across the 11 disorders, the unstructured multivariate GWAS identified 184 associated loci at a conventional genome-wide significance threshold (P < 5 × 10−8)38, 39 of which were not in LD with any of the univariate associations (see Fig. 6 for Miami plots, Supplementary Fig. 32 for QQ-plots, and Supplementary Table 12 for individual hits).
We went on to perform the structured multivariate GWAS analyses using two factor models: the correlated factors model (with Factors 1-4 as the GWAS target) and the hierarchical factor model (with the higher order p-factor as the GWAS target; Fig. 6, Supplementary Fig. 32 for QQ-plots, and Supplementary Fig. 33 for bar plots of individual variants estimated as genome-wide significant). We also estimate QSNP specific to each factor used as a GWAS target (see Methods for details).
We identified one hit for the Compulsive disorders factor, a locus also associated with AN16 (Supplementary Tables 14 and 15). We identified two loci for the Compulsive disorders factor-specific QSNP statistic (Supplementary Table 16), including a locus (rs1906252) with strong opposing effects on AN and TS. We identified 108 hits for the Psychotic disorders factor, 96 of which were in LD with previously reported associations for BIP and SCZ (Supplementary Table 17), and 12 of which were novel relative to the contributing univariate GWASs. The Psychotic disorders factor-specific QSNP statistic revealed six hits, three of which were in LD with hits for ALCH (Supplementary Table 19), including a locus in the well-described Alcohol Dehydrogenase 1B (ADH1B) gene that was significant for factor-specific QSNP for all four factors.
We identified nine hits for the Neurodevelopmental disorders factor (Supplementary Table 20), seven of which were in LD with hits for ADHD or MDD, and two of which were novel relative to the contributing univariate GWASs. There were seven hits for the Neurodevelopmental QSNP statistic, many of which appeared to be specific to AUT (Supplementary Table 22). We identified 44 independent hits for the Internalizing disorders factor, six of which were distinct from hits from the contributing univariate GWASs (Supplementary Table 23). Three loci were identified for the Internalizing factor-specific QSNP statistic, all three of which were in LD with hits for ALCH (Supplementary Table 25). We note that the discrepancy in the number of univariate MDD hits (109) relative to the number of Internalizing factor hits (44) can be attributed to a combination of signal specific to MDD and splitting the MDD signal across two factors (Supplementary Fig. 38).
Nine hits from the correlated factors model were in LD across the factors, and one hit was in LD with a QSNP hit. In total, our structured GWAS discovered 152 independent loci that are likely to operate broadly within constellations of phenotypes, 20 of which were novel relative to the univariate traits. We identify nine independent QSNP hits that do not conform to the identified factor structure (Table 2), a third of which appeared to operate through pathways unique to ALCH.
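The locus counts in this paragraph can be reconciled with the per-factor counts using simple arithmetic. The sketch below is one reading of the reported numbers, not a reproduction of the authors' procedure: it assumes that each of the nine cross-factor overlaps removes exactly one locus from the combined total, and that the single hit in LD with a QSNP hit is then set aside.

```python
# Independent hits per factor from the correlated factors model, as reported.
factor_hits = {
    "Compulsive": 1,
    "Psychotic": 108,
    "Neurodevelopmental": 9,
    "Internalizing": 44,
}

raw_total = sum(factor_hits.values())

# Assumed: each of the nine hits in LD across factors is one duplicate locus.
hits_in_ld_across_factors = 9
independent_total = raw_total - hits_in_ld_across_factors

# One remaining hit was in LD with a factor-specific QSNP hit.
hits_in_ld_with_qsnp = 1
loci_consistent_with_factors = independent_total - hits_in_ld_with_qsnp

print(raw_total, independent_total, loci_consistent_with_factors)  # 162 153 152
```

On this reading, 153 is the number of independent factor loci and 152 is the number that remain after excluding the locus that also shows QSNP heterogeneity.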
Table 2 ∣.
Multivariate GWAS target | Effective n | Mean χ2(1) | LDSC univariate intercept | Independent hits (LD with Q hits) | LD with univariate trait hits (LD with Q hits) | Unique from univariate trait hits (LD with Q hits) |
---|---|---|---|---|---|---|
Multivariate GWAS | ||||||
Factor 1 (Compulsive) | 19,108 | 1.209 | 0.973 | 1 (0) | 1 (0) | 0 (0) |
Factor 2 (Psychotic) | 87,138 | 1.869 | 0.975 | 108 (1) | 96 (1) | 12 (0) |
Factor 3 (Neurodevelopmental) | 55,932 | 1.301 | 1.022 | 9 (0) | 7 (0) | 2 (0) |
Factor 4 (Internalizing) | 455,340 | 1.635 | 0.997 | 44 (0) | 38 (0) | 6 (0) |
Total hits across Factors 1-4 | - | - | - | 153 (1) | 133 (1) | 20 (0) |
Hierarchical p factor | 667,343 | 1.795 | 0.955 | 2 (1) | 2 (1) | 0 (0) |
Bifactor p factor | 666,557 | 1.985 | 0.982 | 66 (8) | 58 (8) | 8 (0) |
Unstructured meta-analysis | - | 2.216 | 0.883 | 184 | 145 (−) | 39 (−) |
Heterogeneity index (QSNP) | ||||||
Factor 1 (Compulsive) QSNP | - | 1.113 | 1.001 | 2 | 1 | 1 |
Factor 2 (Psychotic) QSNP | - | 1.251 | 0.994 | 6 | 4 | 2 |
Factor 3 (Neurodevelopmental) QSNP | - | 1.246 | 0.980 | 7 | 4 | 3 |
Factor 4 (Internalizing) QSNP | - | 1.142 | 0.977 | 3 | 3 | 0 |
Total QSNP hits across Factors 1-4 | - | - | - | 9 | 5 | 4 |
Hierarchical p factor QSNP | - | 1.667 | 0.928 | 69 | 58 | 11 |
Bifactor p factor QSNP | - | 1.645 | 0.936 | 76 | 59 | 17 |
Independent hits were defined using a pruning window of 250 kb and r2 < 0.1. Hits are considered in LD if they had r2 > 0.10 or were within a 250-kb window of one another. Values in parentheses indicate how many of the hits were in LD with factor-specific QSNP hits from the respective model. Factor-specific QSNP indexes whether a particular SNP is unlikely to operate through the identified factor structure, as will often be the case when a SNP effect is highly specific to an individual disorder. To facilitate comparison across mean χ2 values reported in each row, all χ2 statistics with df > 1 (i.e. those for QSNP and those for the unstructured multivariate GWAS) were converted to χ2(1) statistics before taking their means. For all GWAS analyses, we correct for multiple testing by employing the field standard significance threshold of P < 5 × 10−8.
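The rule used to decide whether two hits are in LD can be expressed as a short sketch. The `Hit` class, the function name, and the treatment of the window boundary as inclusive are assumptions made for illustration; the pairwise r2 value is taken as an optional input that would come from a reference panel.

```python
from dataclasses import dataclass

WINDOW_BP = 250_000   # 250-kb window from the table note
R2_THRESHOLD = 0.10   # r2 threshold from the table note


@dataclass(frozen=True)
class Hit:
    label: str   # placeholder identifier, not a real variant ID
    chrom: int
    pos: int     # base-pair position


def hits_in_ld(a, b, r2=None):
    """Return True if two hits count as overlapping under the stated rule.

    Hits on different chromosomes are never treated as overlapping. On the
    same chromosome, hits overlap if they lie within the window or if a
    supplied pairwise r2 exceeds the threshold.
    """
    if a.chrom != b.chrom:
        return False
    if abs(a.pos - b.pos) <= WINDOW_BP:
        return True
    return r2 is not None and r2 > R2_THRESHOLD


near = hits_in_ld(Hit("hit_a", 1, 1_000_000), Hit("hit_b", 1, 1_200_000))
far = hits_in_ld(Hit("hit_a", 1, 1_000_000), Hit("hit_c", 1, 2_000_000))
```

Applying this check between each factor hit and every univariate or QSNP hit yields the "LD with" and "Unique from" columns as complementary counts.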
We identified only two genome-wide hits for the higher-order p-factor, both of which were in LD with univariate hits for MDD and SCZ (Supplementary Table 26), and have been described in multiple external GWAS of psychiatric traits (Supplementary Table 27). The p-factor was characterized by the highest level of heterogeneity, with 69 loci identified for QSNP (Supplementary Table 28), 49 of which were in LD with hits on the four psychiatric factors from the correlated factors model. Despite few hits for p, its considerable mean χ2 (1.795) may be attributable to the aggregation of heterogeneous signal across factors 1-4 in the hierarchical factor GWAS.
In a post-hoc analysis, we specified the p-factor in the context of a bifactor model5,6 in which the p-factor and four domain-specific factors are orthogonal to one another and directly predict the 11 disorders (Fig. 1d). In contrast to the hierarchical model, the bifactor model allows for direct associations between p and the 11 disorders. We identified 66 independent hits on the bifactor p-factor, including the two hits for the hierarchical p-factor (Supplementary Table 29). Among these 66 hits, 38 were in LD with hits from the correlated factors model, eight were novel relative to univariate hits, and seven were novel relative to both univariate and correlated factors hits. We identified 76 QSNP hits, 50 of which were in LD with QSNP hits for the hierarchical p-factor (Supplementary Table 31). Although the bifactor specification of p produced more factor hits than did the hierarchical specification, the pattern of results with respect to the large number of QSNP hits and high overall mean χ2 of QSNP was similar, and the LDSC genetic correlation across these two specifications of p was > 0.99. Collectively, these results indicate low utility of either specification of the p-factor at the level of individual genetic variants.
Estimating causal effects of problematic alcohol use.
One third of the QSNP discoveries from the correlated factors model appeared to operate through pathways unique to ALCH. This motivated an examination of the causal effects of ALCH on the disorders and factors using a form of multi-trait Mendelian randomization (MR) within the Genomic SEM framework. We ran two types of MR models: one using the QSNP variant in the ADH1B gene as a single instrumental variable for ALCH, and a second multi-variant MR approach using eight loci identified from an independent ALCH discovery GWAS as instrumental variables39. The multi-variant approach allowed for additional effects of the loci on other disorders or factors where appropriate (Supplementary Note). Results from the ADH1B and multi-variant Genomic SEM-MR approaches tentatively supported a causal effect of ALCH on MDD and BIP (Supplementary Note and Supplementary Figs. 39 and 40). In these models, ALCH loadings on factors 2-4 were no longer significant, but the remaining disorders continued to load significantly on their respective factors. Multiple causation by ALCH is thus insufficient to fully account for widespread genetic overlap observed across disorders.
Discussion
We used Genomic SEM to identify four broad factors (Neurodevelopmental, Compulsive, Psychotic, and Internalizing) that provide a reasonable model of the genetic correlations among 11 major psychiatric disorders. We find that the Compulsive, Psychotic, and Internalizing factors are generally effective at describing the genetic relationships between psychiatric disorders at biobehavioral, functional genomic, and molecular levels of analysis. At the biobehavioral level, the pattern of associations with external correlates was informative with respect to the shared and distinct characteristics across the disorders. For example, the accelerometer results displayed both divergent patterns of findings across the factors and convergent patterns for the disorders within a factor. This provides evidence for both the validity and the utility of the genetic factor model for characterizing genetic associations with basic aspects of everyday functioning that may be, at face, relatively distal from the biological mechanisms of the disorders themselves. Results were less consistent with respect to the utility of a Neurodevelopmental disorders factor. For example, the Neurodevelopmental disorders factor exhibited much higher degrees of heterogeneity with respect to relationships with external correlates and with respect to effects of individual variants, a finding that seemed to be largely driven by divergent patterns for AUT.
Although the genetic correlations among the 11 disorders were somewhat consistent with the concept of a general p-factor, a hierarchical factor model that specified such a p-factor was found to offer limited biological insight, obscuring patterns of genetic correlations with external biobehavioral traits, enrichment within specific biological annotations, and associations with individual variants. Compared to the hierarchical model, a bifactor model identified a larger number of GWAS hits for p, but continued to exhibit a great deal of SNP-level heterogeneity. Given that a p-factor was found to be insufficient for accounting for patterns of multivariate associations at biobehavioral and variant levels of analysis, the question arises: what processes give rise to the moderate genetic correlations observed among the four first-order factors? One possibility is that genetic correlations among the four factors originate from shared biology underlying pairwise combinations of factors and not from any biology that is shared across all factors. Similarly, genetic correlations among the factors themselves may reflect combinations of shared biology among subsets of disorders spanning factors that are not shared across all disorders within the corresponding factors.
In some circumstances, genetic correlations across disorders may arise from direct, potentially mutual, causation between the factor or disorder-specific liabilities and one another40 or reflect causation directly between the symptoms of different disorders41. Based on significant locus-specific violations of the four factor model at loci relevant to ALCH, we incorporated MR into Genomic SEM models, with both single and multi-variant MR indicating causal effects of ALCH on MDD and BIP.
In order to identify gene sets and categories in which shared and unique genetic signal for multiple disorders is disproportionally localized, we developed and validated both a multivariate extension of S-LDSC and Stratified Genomic SEM. In line with prior findings linking SCZ and BIP to excitatory hippocampal CA142,43 and CA344,45 neurons and GABAergic neurons46,47, we observe that the intersection between PI genes and genes expressed in both excitatory and GABAergic neurons explained an outsized proportion of the genetic variance in the Psychotic disorders factor. These results converge with considerable evidence from prior univariate work in indicating shared risk pathways for SCZ and BIP. Enrichment of variance unique to MDD, rather than shared across internalizing disorders, in excitatory dentate gyrus (DG) neurons is consistent with prior findings on the anti-depressive effects of DG stimulation in mouse models48 and the observation that anti-depressants increase neurogenesis in this region49.
We provide a more detailed account of limitations in the Supplementary Note, but highlight limitations particularly relevant to future work here. Summary statistics from well-powered GWASs spanning the wide range of psychiatric disorders investigated here were only consistently available for individuals of European ancestry. A major priority for continued work in this area will be to increase the diversity of populations for which psychiatric GWAS are available. Recently developed methods for the stratified analysis of genetic correlations across ancestral populations will be invaluable for the analysis of such data50. Moreover, our results may have been influenced by the phenotyping and case-ascertainment methods used. Cai et al.51 have specifically reported that psychiatric phenotypes derived using minimal phenotyping (defined as “individuals’ self-reported symptoms, help seeking, diagnoses or medication”) may produce GWAS signals of low specificity. Although our sensitivity analyses suggested minimal differences when excluding GWAS that used self-report cohorts, this issue should continue to be explored in future work. It will also be informative for future research to examine further the effect of heterogeneity in how samples are ascertained and disorders are assessed on genetic relationships among disorders52.
The current analyses revealed four correlated psychiatric factors that account for extensive genetic overlap across disorders. We elucidate the composition of these factors by demonstrating patterns of correlations with external biobehavioral traits, develop and apply Stratified Genomic SEM to identify classes of genes that explain disproportionate levels of genetic risk sharing and uniqueness, and distinguish pleiotropic loci with directionally concordant effects on the individual factors from those acting heterogeneously across disorders within a factor. Our results offer critical insights into shared and disorder-specific mechanisms of genetic risk and suggest possible avenues for revising a psychiatric nosology currently defined largely by clinical observation. Evidence derived from multivariate genetic analysis, alongside evidence at other levels of explanation (e.g., cognitive neuroscience, environmental stressors), could guide the development of novel treatments and revision of established diagnostic taxonomies.
Methods
The section directly below gives an overview of Genomic SEM, followed by the validation and application of the novel method introduced here, Stratified Genomic SEM. The Supplementary Note provides additional details about the curation of the psychiatric phenotypes, model fitting procedure, results excluding self-report GWAS, comparison to results from the second iteration of the PGC cross-disorder group analyses (i.e., CDG2)12, genetic correlations with external traits, interpretation and estimation of the Q metrics (QTrait and QSNP), multivariate GWAS simulations, multivariate MR analyses, S-LDSC results, quality control procedures, and an extended account of the limitations outlined in the Discussion.
Overview of Genomic SEM.
Genomic SEM is a two-stage Structural Equation Modelling approach. In the first stage, a genetic covariance matrix (S) and its associated sampling covariance matrix (VS) are estimated with a multivariate version of LD Score regression (LDSC). S consists of heritabilities on the diagonal and genetic covariances (co-heritabilities) on the off-diagonal. VS consists of squared standard errors of S on the diagonal and sampling covariances on the off-diagonal, which capture dependencies between estimation errors that will arise in situations such as participant sample overlap across GWAS phenotypes. In the second stage, a structural equation model is fit to S by optimizing a fit function that minimizes the discrepancy between the model-implied genetic covariance matrix (Σ(θ)) and S, weighted by the elements within VS. We use the diagonally weighted least squares (WLS) fit function described in Grotzinger et al.13:
$$F_{\mathrm{WLS}}(\theta) = \left(s - \sigma(\theta)\right)' D_S^{-1} \left(s - \sigma(\theta)\right)$$
where S and Σ(θ) have been half-vectorized to produce s and σ(θ), respectively, and DS is VS with its off-diagonal elements set to 0. The sampling covariance matrix of the stage 2 Genomic SEM parameter estimates (Vθ) is obtained using a sandwich correction described in Grotzinger et al.13:
$$V_\theta = \left(\hat{\Delta}' \Gamma^{-1} \hat{\Delta}\right)^{-1} \hat{\Delta}' \Gamma^{-1} V_S \Gamma^{-1} \hat{\Delta} \left(\hat{\Delta}' \Gamma^{-1} \hat{\Delta}\right)^{-1}$$
where $\hat{\Delta}$ is the matrix of model derivatives evaluated at the parameter estimates, Γ is the stage 2 weight matrix, DS, and VS is the sampling covariance matrix of S. Validation of Genomic SEM in Grotzinger et al.13 demonstrated that the framework produces unbiased standard errors, appropriately accounts for sample overlap in multivariate GWAS, and produces accurate point estimates for different population generating models. In addition, polygenic scores derived from Genomic SEM summary statistics were found to better predict the individual traits that define the factor than polygenic scores constructed from the summary statistics for the individual traits. As part of the current analyses, we sought to further validate Genomic SEM via a series of simulations based directly on the factor structure identified here and additionally benchmark Genomic SEM against existing multivariate methods.
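As an illustration of the stage 2 discrepancy described above, the following minimal Python sketch evaluates the diagonally weighted fit function for a single common factor model. It is not the GenomicSEM implementation: the function names (`vech`, `implied_cov`, `dwls_discrepancy`), the three-trait toy matrix, and all numeric values are assumptions chosen for illustration, and no optimization or sandwich correction is performed.

```python
def vech(mat):
    """Half-vectorize a symmetric matrix (lower triangle, column by column)."""
    k = len(mat)
    return [mat[i][j] for j in range(k) for i in range(j, k)]


def implied_cov(loadings, residuals):
    """Sigma(theta) for one common factor with unit variance:
    lambda * lambda' plus a diagonal matrix of residual variances."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (residuals[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def dwls_discrepancy(S, sigma, d_diag):
    """(s - sigma(theta))' D_S^{-1} (s - sigma(theta)), with D_S diagonal."""
    s, sig = vech(S), vech(sigma)
    return sum((a - b) ** 2 / d for a, b, d in zip(s, sig, d_diag))


# Toy "observed" genetic covariance matrix built from known (hypothetical) parameters.
true_loadings = [0.6, 0.5, 0.4]
true_residuals = [0.10, 0.15, 0.20]
S = implied_cov(true_loadings, true_residuals)

# Diagonal of V_S: squared standard errors for each element of vech(S) (hypothetical).
d_diag = [0.01 ** 2] * 6

# Discrepancy at the generating parameters and at a misspecified first loading.
f_true = dwls_discrepancy(S, implied_cov(true_loadings, true_residuals), d_diag)
f_off = dwls_discrepancy(S, implied_cov([0.5, 0.5, 0.4], true_residuals), d_diag)
print(f_true, f_off)
```

The discrepancy is zero at the generating parameters and grows as the model-implied matrix departs from S, with each element's squared deviation weighted by the inverse of its squared standard error.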
Overview of Stratified Genomic SEM.
Stratified Genomic SEM extends the overall Genomic SEM framework by allowing potentially different structural equation models to be fit to genetic covariance matrices estimated in different gene sets and categories. These gene sets and categories, collectively referred to as annotations, can be constructed based on a variety of sources, such as collateral gene expression data obtained from single-cell RNA sequencing. We develop a multivariate extension of Stratified LD Score Regression (S-LDSC)32 below to estimate these annotation-specific genetic covariance matrices and their associated sampling covariance matrices. We describe two types of annotation-specific genetic covariance matrices, S0 and Sτ. S0 contains estimates of genetic covariance within a specific annotation without controlling for overlap with other annotations. In other words, it is composed of the zero-order coefficients implied by the multivariate S-LDSC model. Sτ contains estimates of genetic covariance controlling for annotation overlap. In other words, it is composed of multiple regression coefficients estimated by the multivariate S-LDSC model. The distinction between S0 and Sτ directly parallels the distinction made in univariate S-LDSC32 between overall heritability explained by an annotation and the incremental contribution of an annotation to heritability beyond all other annotations considered. Note that the estimates required to populate elements of an overall genome-wide S matrix can be produced either from the zero-order annotation that includes all SNPs or by aggregating parameters corresponding to each annotation from the multivariate S-LDSC model used to estimate Sτ.
Below, we validate via simulation that Stratified Genomic SEM produces unbiased model parameter estimates and standard errors, and that model fit indices appropriately favor the population generating model within a given annotation. There is a wide array of research questions that can be asked using Stratified Genomic SEM. In this paper, we examine genetic enrichment of variance in psychiatric genetic factors across a broad range of annotations.
Multivariate Stratified LDSC.
Under a multivariate extension of the S-LDSC model, the expected value of the product of z statistics for each pairwise combination of phenotypes for SNP j equals:
$$E[z_{1j} z_{2j}] = \sqrt{N_1 N_2} \sum_c \frac{\ell(j,c)}{M_c} \tau_c + \frac{\rho N_s}{\sqrt{N_1 N_2}} + a$$
where Ni is the sample size for study i, c indexes a genomic annotation, Mc is the number of SNPs in annotation c, ℓ(j,c) is the LD score of SNP j with respect to annotation c (that is, the sum of squared LD this SNP has with all SNPs in the annotation), τc is a vector of free parameters used to compute the conditional contribution to heritability or coheritability (genetic covariance) in annotation c, Ns is the number of individuals included in both GWAS samples, ρ is the phenotypic correlation within the overlapping samples, and a is a term representing unmeasured sources of confounding such as shared population stratification across GWASs53. The inclusion of the term Mc in the above equation produces LD scores ($\ell(j,c)/M_c$) that are scaled relative to the size of the respective annotations, thereby allowing τc to be interpreted on the same scale as genome-wide estimates of heritability and coheritability, rather than on a per SNP scale. Note that when the z statistic for the same phenotype is double entered on the left-hand side of the above equation, such that E[z1j z2j] becomes $E[z_{1j}^2]$, the equation reduces to the univariate S-LDSC model8.
Following Finucane et al.32, the multivariate S-LDSC model is estimated by regressing the product of z statistics against the annotation-specific LD scores using a weighted regression model (see online supplement of Finucane et al.32 for a description of how weights are calculated). Standard errors and dependencies among estimation errors (i.e., sampling covariances) are estimated using a multivariate block jackknife. As sample overlap creates a dependency between z statistics for the two traits, thus increasing their products, the S-LDSC intercept (ρNs/√(N1N2) + a) is affected, but the regression slope is unaffected, and the estimates of partitioned genetic covariance and their standard errors are not biased.
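The expectation that this regression targets can be written out as a short Python sketch. The function name `expected_z_product` and every numeric value below are hypothetical; the sketch only evaluates the model-implied expectation for one SNP and does not perform the weighted regression or the block jackknife.

```python
import math


def expected_z_product(tau, ld_scores, m_sizes, n1, n2, n_overlap=0, rho=0.0, a=0.0):
    """Model-implied E[z1j * z2j] for one SNP under the multivariate S-LDSC model.

    tau       : conditional contribution of each annotation (total scale, not per SNP)
    ld_scores : LD score of the SNP with respect to each annotation
    m_sizes   : number of SNPs in each annotation
    """
    slope_part = math.sqrt(n1 * n2) * sum(
        t * l / m for t, l, m in zip(tau, ld_scores, m_sizes)
    )
    # Intercept: sample overlap term plus unmeasured confounding.
    intercept = rho * n_overlap / math.sqrt(n1 * n2) + a
    return slope_part + intercept


# Two hypothetical annotations for a single SNP.
ld_scores = [120.0, 30.0]
m_sizes = [1_000_000, 200_000]

# Bivariate case with partial sample overlap.
bivariate = expected_z_product([0.10, 0.05], ld_scores, m_sizes,
                               n1=100_000, n2=80_000, n_overlap=20_000, rho=0.3)

# Univariate case: same phenotype entered twice, so n1 = n2 = n_overlap and rho = 1,
# which gives an intercept of 1 + a.
univariate = expected_z_product([0.20, 0.10], ld_scores, m_sizes,
                                n1=100_000, n2=100_000, n_overlap=100_000, rho=1.0)
print(bivariate, univariate)
```

In the univariate call the sample overlap term equals 1, which matches the familiar intercept of the univariate LD Score regression model when confounding is absent.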
Derivation of Sτ and S0.
Sτ,c is a matrix containing estimates of genetic variance and covariance in annotation c, controlling for overlap with other annotations. Each of its cells is populated with the corresponding multiple regression coefficient, τc, estimated directly with the multivariate S-LDSC model.
S0,c is a matrix containing estimates of genetic covariance in annotation c, without controlling for overlap with other annotations. The elements ζc composing S0,c can be derived from the τc estimates from the multivariate S-LDSC model in combination with knowledge of annotation overlap. Thus, the zero-order contribution of target annotation t to heritability or co-heritability is written as:
$$\zeta_t = \sum_c \frac{|C_c \cap C_t|}{|C_c|} \tau_c$$
where ∣Cc ∩ Ct∣ is the number of SNPs in annotation c that are also in target annotation t, and ∣Cc∣ is the total number of SNPs in annotation c (alternatively expressed as Mc), such that ∣Cc ∩ Ct∣ / ∣Cc∣ reflects the proportion of SNPs in annotation c that are also in target annotation t. This proportion is used to weight the term τc for each annotation in deriving the zero-order contribution of target annotation t to heritability or coheritability.
When the multivariate S-LDSC model is correct, Sτ is expected to produce unbiased estimates of the conditional contribution of an annotation to genetic covariance, after controlling for the effects of variants in all other annotations (i.e., accounting for the fact that variants can reside in multiple annotations). In comparison, S0 is expected to produce unbiased estimates of the total contribution of all genetic variants in an annotation to genetic covariance (i.e., irrespective of its overlap with the other annotations). S0 has two desirable properties. First, its estimate is not as directly contingent on which other annotations are included in the multivariate S-LDSC model. Second, because it does not decompose contributions of an annotation into those that are shared with vs. unique from other annotations, it is expected to produce more stable estimates at small and moderate sample sizes. For these reasons, the empirical Stratified Genomic SEM analyses reported here employ S0 matrices, and should be interpreted accordingly.
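The relationship between the conditional (τ) and zero-order (ζ) contributions can be illustrated with a small Python sketch for a single cell of the covariance matrix; in practice the same weighting would be applied cell by cell. The function name `zero_order`, the three toy annotations, their SNP counts, and the τ values are all hypothetical.

```python
def zero_order(tau, overlap, sizes, target):
    """Zero-order contribution of a target annotation:
    the sum over annotations c of tau_c * |C_c intersect C_target| / |C_c|."""
    return sum(
        tau[c] * overlap[c][target] / sizes[c]
        for c in range(len(tau))
    )


# Annotation 0 contains all SNPs; annotations 1 and 2 are overlapping subsets.
sizes = [1000, 200, 100]
# overlap[c][t] = number of SNPs in annotation c that are also in annotation t.
overlap = [
    [1000, 200, 100],
    [200, 200, 50],
    [100, 50, 100],
]
# Conditional contributions to one cell of the matrix (hypothetical values).
tau = [0.30, 0.10, 0.05]

zeta = [zero_order(tau, overlap, sizes, t) for t in range(3)]
print(zeta)
```

Because every annotation lies entirely within the all-SNP annotation, the zero-order value for annotation 0 is the simple sum of the τ values, which mirrors the statement that genome-wide estimates can be obtained by aggregating the annotation-specific parameters.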
Simulations of stratified genetic covariance.
Simulation procedure.
Using simulations based on raw, individual-level genotype data, we sought to validate the point estimates and standard errors (SEs) produced by Stratified Genomic SEM. We compare results for S0 and Sτ. We began by generating 100 sets of 45 100% heritable phenotypes (“orthogonal genotypes”) using the GCTA package54. Each 100% heritable phenotype was specified to have 10,000 randomly selected causal variants from within a particular annotation. These phenotypes were paired with genotypic data for 100,000 randomly selected, unrelated individuals of European descent from UK Biobank data for the 1,209,498 SNPs present in HapMap3.
The simulated genotypes were used to construct six different factor structures for six causal annotations. All orthogonal genotypes were scaled to M = 0, SD = 1. For three of the causal annotations (DHS Peaks, H3K27ac, and PromoterUSC), seven genotypes for each annotation were used to construct six new correlated genotypes, each as the weighted linear combination of a domain-specific genetic factor and a general genetic factor, which was constructed from the seventh genotype. For the remaining three causal annotations (FetalDHS, H3K9ac, and TFBS), eight genotypes for each annotation were used to construct two sets of three correlated genotypes for two correlated general genetic factors, constructed from the seventh and eighth genotypes. A set of six “total” genotypes was created by summing a factor indicator genotype from each of the six causal annotations. As each genotype within each annotation was specified to have 10,000 causal SNPs, the “total” genotypes created as the sum of six annotations had 60,000 causal SNPs in the population generating model.
Phenotypes were subsequently constructed as the weighted linear combination of one of the six “total” genotypes and domain-specific environmental factors (randomly sampled from a normal distribution with M = 0, SD = 1). Heritabilities for phenotypes 1-6 were all set to , such that the weights for the genotypes were and the weights for the environmental factors were . Each of the 600 phenotypes (100 sets of 6 phenotypes) was then analyzed as a univariate GWAS in PLINK55 to produce univariate GWAS summary statistics. The summary statistics were then munged, and Stratified Genomic SEM using the 1000 Genomes Phase 3 BaselineLD Version 2.2 model was used to construct 100 sets of 6 × 6 stratified zero-order genetic covariance matrices (S0), τ covariance matrices (Sτ), and their corresponding sampling covariance matrices (VS0 and VSτ).
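The general logic of building a phenotype with a chosen heritability from a standardized genetic value and a standardized environmental factor can be sketched as follows. This is only an illustration under stated assumptions: the heritability of 0.5 is a hypothetical value rather than the value used in the simulations, the genetic values are drawn from a normal distribution instead of being computed from real genotypes, and the weights are assumed to be the square roots of the genetic and environmental variance shares.

```python
import math
import random
import statistics

random.seed(1)
h2 = 0.5   # hypothetical heritability, not taken from the paper
n = 20000  # hypothetical number of simulated individuals

w_g = math.sqrt(h2)        # assumed weight on the standardized genetic value
w_e = math.sqrt(1.0 - h2)  # assumed weight on the standardized environmental factor

# Stand-ins for a standardized "total" genotype and an environmental factor.
genetic = [random.gauss(0.0, 1.0) for _ in range(n)]
environment = [random.gauss(0.0, 1.0) for _ in range(n)]
phenotype = [w_g * g + w_e * e for g, e in zip(genetic, environment)]

# Share of phenotypic variance carried by the weighted genetic component.
genetic_part = [w_g * g for g in genetic]
h2_empirical = statistics.variance(genetic_part) / statistics.variance(phenotype)
print(round(h2_empirical, 2))
```

With both components scaled to unit variance and uncorrelated, the squared weights sum to 1, so the phenotype also has unit variance in expectation and the squared genetic weight equals the heritability.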
Validating S0 and VS0.
For the zero-order genetic covariance matrix, we would expect the annotation including all SNPs—i.e. the genome-wide annotation—to reflect the weighted linear combination of the generating covariance matrices specified for the six causal annotations, with weights equal to the proportion of all SNPs contained in each of the corresponding causal annotations. For each of the six causal annotations, we expect the zero-order covariance matrix for the corresponding annotation to be a linear combination of that annotation’s population-generating matrix and the remaining annotations’ population-generating matrices weighted by the proportion of SNPs overlapping across the annotations. To test these expectations, we created average observed covariance matrices across the 100 simulations for the genome-wide annotation and six causal annotations. The estimated S0 genome-wide covariance matrix approximately reflected an additive mixture of the six population generating covariance matrices, and was estimated with minimal bias (absolute value of mean discrepancy = 0.004; Supplementary Fig. 8). In addition, the observed covariance matrices for each of the causal annotations were minimally biased relative to the generating population (Supplementary Table 7).
In order to evaluate the accuracy of the SEs, we analyzed the ratio of the mean SE estimate across the 100 simulations over the empirical SE (calculated as the standard deviation of the parameter estimates across the 100 simulations). A value above 1 for this ratio indicates conservative SE estimates. This ratio was calculated within each of the annotations and for each cell of the covariance matrix. The average ratio across annotations and cells of the covariance matrix was 1.030 (see Supplementary Fig. 9 for distribution across all annotations and Supplementary Table 7 for ratio within causal annotations). Thus, we have produced an SE estimate for stratified heritability and covariance that performs as expected. In fact, our estimates are very slightly conservative, as the mean SE was slightly larger than the empirical SE. Moreover, the average z statistics for heritability and covariance estimates within the causal annotations were all highly significant, suggesting more than adequate power under the conditions of the current simulation.
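The SE calibration check described above amounts to a simple ratio, sketched here in Python for one cell of one annotation. The five estimates and reported SEs are made-up toy values (the simulations in the text use 100 replicates), and the variable names are hypothetical.

```python
import statistics

# Toy replicate results for one parameter (hypothetical values).
estimates = [0.28, 0.31, 0.30, 0.33, 0.28]
reported_se = [0.022, 0.021, 0.023, 0.022, 0.022]

# Empirical SE: standard deviation of the estimates across replicates.
empirical_se = statistics.stdev(estimates)
# Mean of the SEs reported by the method across replicates.
mean_se = statistics.mean(reported_se)

# Values above 1 indicate conservative (slightly too large) reported SEs.
se_ratio = mean_se / empirical_se
print(round(se_ratio, 3))
```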
Validating Sτ and VSτ.
The expectation for the genetic Sτ covariance matrices is that the observed covariance matrices will reflect the generating model within only that annotation. Indeed, the causal annotations closely matched their respective population generating covariance matrices and bias was minimal (Supplementary Fig. 10 and Supplementary Table 7). We then analyzed the ratio of the mean SE estimate across the 100 runs over the empirical SE (calculated as the standard deviation of the parameter estimates across the 100 runs). The average ratio of SE estimates was 1.014 across all annotations (Supplementary Fig. 9) and, importantly, was also close to 1 for the causal annotations. Results for 4,459 of the total 5,300 covariance matrices produced negative heritability estimates. This included some of the causal annotations (Supplementary Table 30), but was largely true for the non-causal annotations. Negative heritability estimates are unsurprising for the non-causal annotations as their population generating effect is 0. The z statistics for the Sτ heritabilities and covariances were, on average, smaller relative to the S0 covariance matrices. This is to be expected as the S0 covariance matrices include power gained from variance shared with overlapping annotations.
The Sτ covariance matrices for the causal annotations were then used as input for Genomic SEM models. The two types of population generating models—a common factor and correlated factors model—were run for each annotation. For all causal annotations, Genomic SEM estimates closely matched the parameters specified in the generating population (Supplementary Table 8 and Supplementary Fig. 11). In addition, the ratio of the mean model SEs over the empirical SEs was near 1. Model fit statistics (CFI, AIC, and model χ2) also generally favored the generating model for a particular annotation (Supplementary Table 9). This was least true for the H3K27ac annotation. This is unsurprising as the population-generating model for the H3K27ac annotation—a correlated factors model with a factor correlation of 0.7—most closely matched the competing common factor model. Collectively, these results indicate that Stratified Genomic SEM produces unbiased parameter estimates and standard errors for S0 and Sτ, that Sτ shows specificity to the causal annotations of interest, and that model fit indices generally favor the appropriate model.
Estimating genetic enrichment of model parameters.
We can examine whether the proportional contribution of an annotation to a given genome-wide parameter in Stratified Genomic SEM is different than would be expected on the basis of the relative size of that annotation, so long as the parameter is scaled comparably across all annotations considered56. This is formalized by testing the null hypothesis,
$$\frac{\theta_c}{\theta} = \frac{M_c}{M}$$
where θc is the parameter estimate in annotation c, as estimated from a Genomic SEM model applied to S0,c; θ is the genome-wide parameter estimate, as estimated from a Genomic SEM model applied to the genome-wide S matrix derived via aggregating the conditional contributions of all annotations included in the multivariate S-LDSC model; Mc is the number of SNPs in annotation c; and M is the total number of SNPs used to compute the LD scores. This formula can be rearranged to produce a ratio of ratios (the so-called enrichment ratio) that indexes the magnitude of enrichment:
$$\frac{\theta_c / \theta}{M_c / M}$$
with a value of 1.0 corresponding to the null of no enrichment, values greater than 1.0 corresponding to enrichment (overrepresentation of signal in the annotation relative to its size), and values below 1.0 corresponding to depletion (underrepresentation of signal in the annotation relative to its size).
In the current application, we are interested in enrichment of genetic signal shared across subclusters of disorders and disorder-specific signal, as indexed by a factor model that allows the estimates of factor variances and disorder-specific uniquenesses, respectively, to vary across annotations, while holding all factor loadings invariant across annotations. We use a two-step model-fitting procedure to estimate the enrichment ratio in order to directly obtain an estimate of its SE. In Step 1, we estimate the factor loadings needed to scale the total genome-wide variances of the factors to 1.0. This is achieved by fitting a model to the genome-wide S-LDSC matrix in which unit variance identification is used. In Step 2, the loading estimates from the prior Step 1 model are fixed and the factor variance is freely estimated separately in each annotation using the S0,c matrices. Thus, the estimated factor variances in Step 2 are scaled proportionally relative to the genome-wide factor variance (i.e., the numerator of the enrichment ratio). This estimate and its SE are subsequently divided by the proportion of SNPs in the corresponding annotation (i.e., the denominator of the enrichment ratio). For clarification, we note that genome-wide enrichment across all SNPs is exactly equal to 1. That is, for Step 2, if the genome-wide S-LDSC matrix is used as input, this produces a parameter estimate of 1, which is then divided by a proportion of 1.0, which reflects the ratio of M/M (i.e., all SNPs over all SNPs).
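The final scaling step of this procedure can be sketched in a few lines of Python. The function name `enrichment`, the SNP counts, and the factor variance estimate are hypothetical; the sketch assumes the genome-wide factor variance has already been scaled to 1.0 as in Step 1, so the Step 2 estimate is itself the numerator of the enrichment ratio.

```python
def enrichment(theta_c, se_c, m_c, m_total, theta_gw=1.0):
    """Enrichment ratio (theta_c / theta_gw) / (m_c / m_total) and its SE.

    The SE is obtained by dividing the SE of the annotation-specific
    estimate by the same constants as the estimate itself.
    """
    prop_snps = m_c / m_total
    ratio = (theta_c / theta_gw) / prop_snps
    ratio_se = (se_c / theta_gw) / prop_snps
    return ratio, ratio_se


# Hypothetical annotation holding 2% of SNPs and 8% of the factor variance.
ratio, ratio_se = enrichment(theta_c=0.08, se_c=0.01, m_c=20_000, m_total=1_000_000)

# Genome-wide check: a variance of 1 over a SNP proportion of 1 gives exactly 1.
gw_ratio, _ = enrichment(theta_c=1.0, se_c=0.0, m_c=1_000_000, m_total=1_000_000)
print(ratio, ratio_se, gw_ratio)
```

In this toy case the annotation carries roughly four times the factor variance expected from its size, and the SE is rescaled by the same proportion.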
Selection and creation of annotations.
In order to construct the genome-wide S-LDSC matrix and estimate stratified genetic covariance, we utilized pre-computed annotation files provided by the original S-LDSC authors32. In line with recommendations, we utilized all annotations from the most recent 1000 Genomes Phase 3 BaselineLD Version 2.257, which comprises a total of 97 annotations spanning coding, UTR, promoter, and flanking window categories. For tissue-specific histone marks, we included annotations constructed based on data from the Roadmap Epigenetics Project58 for narrowly defined peaks for DNase hypersensitivity, H3K27ac, H3K4me1, H3K4me3, H3K9ac, and H3K36me3 chromatin. For tissue-specific gene expression, we included annotations constructed based on RNA sequencing data from human tissues from Genotype-Tissue Expression (GTEx)59 and annotations constructed from human, mouse, and rat microarray experiments from the Franke Lab (i.e., DEPICT)60. For both tissue-specific histone/chromatin marks and gene expression, we utilized only brain- and endocrine-relevant regions in addition to 5 randomly selected control regions from each (i.e., 10 controls total).
We also created 29 annotations to examine the interaction between protein-truncating variant (PTV)–intolerant (PI) genes and human brain cells. PI genes were obtained from the Genome Aggregation Database (gnomAD), and ascertained using the probability of loss-of-function intolerance (pLI) metric. We selected genes with pLI > 0.9, producing a list of 3,063 genes33. Human brain cell gene sets were based on single-nucleus RNA-seq (sNuc-seq) data generated from GTEx project brain tissues in the hippocampus and prefrontal cortex34. Excluding sporadic genes and genes with low expression, for the 14 cell types we selected the top 1,600 (~15%) differentially expressed genes in each cell type, which likely cover all genes that are important for a specific cell type. PI × human brain cell gene sets contained the intersection of genes that are PTV-intolerant and each human brain cell gene set. Annotations were created using a 100-kb window and LD information from the European subsample of 1000 Genomes Phase 3.
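The construction of a PI × cell-type gene set reduces to a threshold, a ranking, and a set intersection, as in the following Python sketch. The gene labels, pLI values, and differential expression scores are invented for illustration, and the toy cut-off of the top 3 genes stands in for the top 1,600 genes described above; only the pLI > 0.9 threshold is taken from the text.

```python
# Hypothetical pLI values per gene (in practice read from the gnomAD constraint file).
pli = {"GENE_A": 0.99, "GENE_B": 0.95, "GENE_C": 0.40, "GENE_D": 0.92, "GENE_E": 0.10}

# Hypothetical differential expression scores for one brain cell type.
expression_score = {"GENE_A": 3.1, "GENE_B": 0.2, "GENE_C": 2.7, "GENE_D": 2.9, "GENE_E": 1.0}

PLI_THRESHOLD = 0.9  # threshold stated in the text
TOP_N = 3            # toy value; the text uses the top 1,600 genes per cell type

# PTV-intolerant (PI) genes.
pi_genes = {gene for gene, value in pli.items() if value > PLI_THRESHOLD}

# Top differentially expressed genes for the cell type.
ranked = sorted(expression_score, key=expression_score.get, reverse=True)
cell_type_genes = set(ranked[:TOP_N])

# PI x cell-type gene set: genes that satisfy both criteria.
pi_by_cell_type = pi_genes & cell_type_genes
print(sorted(pi_by_cell_type))
```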
We do not estimate enrichment of psychiatric factors for continuous or flanking window annotations, yielding a total of 168 binary annotations across the baseline model, gene expression, histone marks, PI, and brain cell annotations. For a Bonferroni-corrected significance threshold of 0.05, this corresponds to P < 2.98 × 10−4. We note that continuous and flanking window annotations were retained for construction of the genome-wide S-LDSC matrix.
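The reported threshold follows directly from dividing the family-wise error rate by the number of binary annotations tested, as this short Python check shows (variable names are illustrative):

```python
alpha = 0.05         # family-wise error rate
n_annotations = 168  # binary annotations tested for enrichment

threshold = alpha / n_annotations
print(f"{threshold:.2e}")  # prints 2.98e-04
```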
Supplementary Material
Acknowledgements
The work presented here would not have been possible without the enormous efforts put forth by the investigators and participants from the Psychiatric Genetics Consortium, iPSYCH, UK Biobank, and 23andMe. The work from these contributing groups was supported by numerous grants from governmental and charitable bodies as well as philanthropic donation. Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number R01MH120219. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. A.D.G. was additionally supported by NIH Grant R01HD083613. E.M.T.-D. was additionally supported by NIH grants R01AG054628 and R01HD083613 and the Jacobs Foundation. E.M.T.-D. is a faculty associate of the Population Research Center at the University of Texas, which is supported by NIH grant P2CHD042849, and the Center on Aging and Population Sciences, which is supported by NIH grant P30AG066614. M.G.N. is additionally supported by ZonMW grants 849200011 and 531003014 from The Netherlands Organisation for Health Research and Development, a VENI grant awarded by NWO (VI.Veni.191G.030) and is a Jacobs Foundation Fellow. W.A.A. is supported by the "European Union’s Horizon 2020 research and innovation programme, Marie Sklodowska Curie Actions – MSCA-ITN-2016 – Innovative Training Networks under grant agreement No [721567]". H.F.I. is supported by the "Aggression in Children: unraveling gene-environment interplay to inform Treatment and InterventiON strategies" (ACTION) project. ACTION receives funding from the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no 602768. C.M.L. is supported by the National Institute for Health Research Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. A.M.M. is supported by the Wellcome Trust (104036/Z/14/Z, 216767/Z/19/Z), UKRI MRC (MC_PC_17209, MR/S035818/1). K.-P.L. is supported by the Deutsche Forschungsgemeinschaft (DFG: CRU 125, CRC TRR 58 A1/A5, No. 44541416), the European Union’s Seventh Framework Programme under Grant No. 602805 (Aggressotype), the Horizon 2020 Research and Innovation Programme under Grant No. 728018 (Eat2beNICE) and 643051 (MiND), Fritz Thyssen Foundation (No. 10.13.1185), ERA-Net NEURON/RESPOND, No. 01EW1602B, ERA-Net NEURON/DECODE, No. FKZ01EW1902 and 5-100 Russian Academic Excellence Project. G.B. is supported by the National Institute for Health Research Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. P.H.L. is supported by NIH R01MH119243 and R00MH101367. The iPSYCH team was supported by grants from the Lundbeck Foundation (R102-A9118, R155-2014-1724 and R248-2017-2003), the EU FP7 Program (Grant No. 602805, “Aggressotype”) and H2020 Program (Grant No. 667302, “CoCA”), NIMH (1U01MH109514-01 to ADB) and the universities and university hospitals of Aarhus and Copenhagen. The Danish National Biobank resource was supported by the Novo Nordisk Foundation. High-performance computer capacity for handling and statistical analysis of iPSYCH data on the GenomeDK HPC facility was provided by the Center for Genomics and Personalized Medicine and the Centre for Integrative Sequencing, iSEQ, Aarhus University, Denmark (grant to A.D.B.).
Appendix
Consortia
iPSYCH
Jakob Grove9-12, Manuel Mattheisen10,17,19-22, Anders D. Børglum9-11, and Ole Mors9,23
Tourette Syndrome and Obsessive Compulsive Disorder Working Group of the Psychiatric Genetics Consortium
Manuel Mattheisen10,17,19-22
Bipolar Disorder Working Group of the Psychiatric Genetics Consortium
Cathryn M. Lewis7,8, Andrew M. McIntosh6, Jakob Grove9-12, Manuel Mattheisen10,17,19-22, Anders D. Børglum9-11, Ole Mors9,23, Gerome Breen7,8, Phil H. Lee24,25, and Jordan W. Smoller24,25
Major Depressive Disorder Working Group of the Psychiatric Genetics Consortium
Mark J. Adams6, Cathryn M. Lewis7,8, Andrew M. McIntosh6, Jakob Grove9-12, Manuel Mattheisen10,17,19-22, Anders D. Børglum9-11, Ole Mors9,23, Gerome Breen7,8, Kenneth S. Kendler26, Jordan W. Smoller24,25, and Michel G. Nivard4,28
Schizophrenia Working Group of the Psychiatric Genetics Consortium
Andrew M. McIntosh6, Sandra M. Meier10,20, Manuel Mattheisen10,17,19-22, Anders D. Børglum9-11, Ole Mors9,23, Phil H. Lee24,25, Kenneth S. Kendler26, and Jordan W. Smoller24,25
Footnotes
Competing Interests
J.W.S. is an unpaid member of the Bipolar/Depression Research Community Advisory Panel of 23andMe. C.M.L. is on the SAB for Myriad Neuroscience. G.B. is a scientific advisor for COMPASS Pathways. The other authors declare no competing interests.
Code Availability
GenomicSEM software (which now includes the Stratified GenomicSEM extension) is an R package that is available from GitHub at the following URL: https://github.com/GenomicSEM/GenomicSEM
Directions for installing the GenomicSEM R package can be found at: https://github.com/GenomicSEM/GenomicSEM/wiki
Data Availability
The data that support the findings of this study are all publicly available or can be requested for access. Specific download links for various datasets are directly below.
Summary statistics for data from the PGC can be downloaded or requested here: https://www.med.unc.edu/pgc/download-results/
Summary statistics for the Anxiety phenotype in UKB (TotANX_OR) can be downloaded here: https://drive.google.com/drive/folders/1fguHvz7l2G45sbMI9h_veQun4aXNTy1v
23andMe summary statistics are made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of 23andMe participants. Please visit research.23andme.com/collaborate/#publication for more information.
Summary statistics for the volume-based neuroimaging phenotypes were downloaded from: https://github.com/BIG-S2/GWAS
Summary statistics for the health and well-being complex trait correlations can be downloaded from: https://atlas.ctglab.nl/
Summary statistics for the circadian rhythm correlations across 24-hours can be downloaded from: https://cnsgenomics.com/software/gcta/#DataResource
Data from gnomAD used to identify PI genes for creation of annotations can be downloaded here: https://storage.googleapis.com/gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz
Gene count data per cell for creation of annotations were obtained from: https://storage.googleapis.com/gtex_additional_datasets/single_cell_data/GTEx_droncseq_hip_pcf.tar
Data which map individual cells to cell types (e.g. neuron, astrocyte etc.) were obtained from: https://static-content.springer.com/esm/art%3A10.1038%2Fnmeth.4407/MediaObjects/41592_2017_BFnmeth4407_MOESM10_ESM.xlsx
Links to the LD-scores, reference panel data, and the code used to produce the current results can all be found at: https://github.com/MichelNivard/GenomicSEM/wiki
Links to the BaselineLD v2.2 annotations can be found here: https://data.broadinstitute.org/alkesgroup/LDSCORE/
References
- 1.Martel MM et al. A general psychopathology factor (P factor) in children: structural model analysis and external validation through familial risk and child global executive function. J. Abnorm. Psychol 126, 137–148 (2017).
- 2.Dean K et al. The impact of parental mental illness across the full diagnostic spectrum on externalising and internalising vulnerabilities in young offspring. Psychol. Med 48, 2257–2263 (2018).
- 3.McLaughlin KA et al. Parent psychopathology and offspring mental disorders: results from the WHO World Mental Health Surveys. Br. J. Psychiatry 200, 290–299 (2012).
- 4.Kessler RC, Chiu WT, Demler O & Walters EE Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 62, 617–627 (2005).
- 5.Caspi A et al. The p factor: one general psychopathology factor in the structure of psychiatric disorders? Clin. Psychol. Sci 2, 119–137 (2014).
- 6.Lahey BB et al. Is there a general factor of prevalent psychopathology during adulthood? J. Abnorm. Psychol 121, 971–977 (2012).
- 7.Pettersson E, Larsson H & Lichtenstein P Common psychiatric disorders share the same genetic origin: a multivariate sibling study of the Swedish population. Mol. Psychiatry 21, 717–721 (2016).
- 8.Bulik-Sullivan BK et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet 47, 291–295 (2015).
- 9.Selzam S, Coleman JR, Caspi A, Moffitt TE & Plomin R A polygenic p factor for major psychiatric disorders. Transl. Psychiatry 8, 205 (2018).
- 10.Lee SH et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet 45, 984–994 (2013).
- 11.Anttila V et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
- 12.Lee PH et al. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell 179, 1469–1482.e11 (2019).
- 13.Grotzinger AD et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav 3, 513–525 (2019).
- 14.Demontis D et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet 51, 63–75 (2019).
- 15.Walters RK et al. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat. Neurosci 21, 1656–1669 (2018).
- 16.Watson HJ et al. Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nat. Genet 51, 1207–1214 (2019).
- 17.Grove J et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet 51, 431–444 (2019).
- 18.Otowa T et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol. Psychiatry 21, 1391–1399 (2016).
- 19.Purves KL et al. A major role for common genetic variation in anxiety disorders. Mol. Psychiatry 25, 3292–3303 (2020).
- 20.Stahl EA et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet 51, 793–803 (2019).
- 21.Wray NR et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet 50, 668–681 (2018).
- 22.Howard DM et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat. Commun 9, 1470 (2018).
- 23.International Obsessive Compulsive Disorder Foundation Genetic Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive–compulsive disorder using meta-analysis. Mol. Psychiatry 23, 1181–1188 (2018).
- 24.Meier SM et al. Genetic variants associated with anxiety and stress-related disorders: a genome-wide association study and mouse-model study. JAMA Psychiatry 76, 924–932 (2019).
- 25.Duncan LE et al. Largest GWAS of PTSD (N = 20 070) yields genetic overlap with schizophrenia and sex differences in heritability. Mol. Psychiatry 23, 666–673 (2018).
- 26.Ripke S, Walters JT & O'Donovan MC Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia. medRxiv 2020.09.12.20192922 (2020). doi: 10.1101/2020.09.12.20192922
- 27.Yu D et al. Interrogating the genetic determinants of Tourette’s syndrome and other tic disorders through genome-wide association studies. Am. J. Psychiatry 176, 217–227 (2019).
- 28.Watanabe K et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet 51, 1339–1348 (2019).
- 29.Zhao B et al. Genome-wide association analysis of 19,629 individuals identifies novel genetic variants for regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits. Nat. Genet 51, 1637–1644 (2019).
- 30.Jiang L et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet 51, 1749–1755 (2019).
- 31.Karatsoreos IN Links between circadian rhythms and psychiatric disease. Front. Behav. Neurosci 8, 162 (2014).
- 32.Finucane HK et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet 47, 1228–1235 (2015).
- 33.Karczewski KJ et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
- 34.Habib N et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat. Methods 14, 955–958 (2017).
- 35.Finucane HK et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet 50, 621–629 (2018).
- 36.Turley P et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet 50, 229–237 (2018).
- 37.Baselmans BML et al. Multivariate genome-wide analyses of the well-being spectrum. Nat. Genet 51, 445–451 (2019).
- 38.Pe'er I, Yelensky R, Altshuler D & Daly MJ Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol 32, 381–385 (2008).
- 39.Kranzler HR et al. Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations. Nat. Commun 10, 1499 (2019).
- 40.Epskamp S, Rhemtulla M & Borsboom D Generalized network psychometrics: combining network and latent variable models. Psychometrika 82, 904–927 (2017).
- 41.Borsboom D A network theory of mental disorders. World Psychiatry 16, 5–13 (2017).
- 42.Liu L, Schulz SC, Lee S, Reutiman TJ & Fatemi SH Hippocampal CA1 pyramidal cell size is reduced in bipolar disorder. Cell. Mol. Neurobiol 27, 351–358 (2007).
- 43.Ho NF et al. Progressive decline in hippocampal CA1 volume in individuals at ultra-high-risk for psychosis who do not remit: findings from the Longitudinal Youth at Risk Study. Neuropsychopharmacology 42, 1361–1370 (2017).
- 44.Konradi C et al. Hippocampal interneurons in bipolar disorder. Arch. Gen. Psychiatry 68, 340–350 (2011).
- 45.Li W et al. Synaptic proteins in the hippocampus indicative of increased neuronal activity in CA3 in schizophrenia. Am. J. Psychiatry 172, 373–382 (2015).
- 46.Volk DW, Sampson AR, Zhang Y, Edelson JR & Lewis DA Cortical GABA markers identify a molecular subtype of psychotic and bipolar disorders. Psychol. Med 46, 2501–2512 (2016).
- 47.de Jonge JC, Vinkers CH, Hulshoff Pol HE & Marsman A GABAergic mechanisms in schizophrenia: linking postmortem and in vivo studies. Front. Psychiatry 8, 118 (2017).
- 48.Yun S et al. Stimulation of entorhinal cortex–dentate gyrus circuitry is antidepressive. Nat. Med 24, 658–666 (2018).
- 49.Boldrini M et al. Antidepressants increase neural progenitor cells in the human hippocampus. Neuropsychopharmacology 34, 2376–2389 (2009).
- 50.Shi H et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun 12, 1098 (2021).
- 51.Cai N et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat. Genet 52, 437–447 (2020).
- 52.Newson JJ, Hunter D & Thiagarajan TC The heterogeneity of mental health assessment. Front. Psychiatry 11, 76 (2020).
Methods-only References
- 53.Yengo L, Yang J & Visscher PM Expectation of the intercept from bivariate LD score regression in the presence of population stratification. bioRxiv 310565 (2018).
- 54.Yang J, Lee SH, Goddard ME & Visscher PM GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet 88, 76–82 (2011).
- 55.Purcell S et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet 81, 559–575 (2007).
- 56.Meredith W Measurement invariance, factor analysis and factorial invariance. Psychometrika 58, 525–543 (1993).
- 57.Hujoel ML, Gazal S, Hormozdiari F, van de Geijn B & Price AL Disease heritability enrichment of regulatory elements is concentrated in elements with ancient sequence age and conserved function across species. Am. J. Hum. Genet 104, 611–624 (2019).
- 58.Kundaje A et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
- 59.GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
- 60.Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun 6, 5890 (2015).