Quantifying concordant genetic effects of de novo mutations on multiple disorders

Hanmin Guo; Lin Hou; Yu Shi; Sheng Chih Jin; Xue Zeng; Boyang Li; Richard P Lifton; Martina Brueckner; Hongyu Zhao; Qiongshi Lu

doi:10.7554/eLife.75551

eLife. 2022 Jun 6;11:e75551. doi: 10.7554/eLife.75551

Editors: Alexander Young, Molly Przeworski

PMCID: PMC9217133 PMID: 35666111

Abstract

Exome sequencing on tens of thousands of parent-proband trios has identified numerous deleterious de novo mutations (DNMs) and implicated risk genes for many disorders. Recent studies have suggested shared genes and pathways are enriched for DNMs across multiple disorders. However, existing analytic strategies only focus on genes that reach statistical significance for multiple disorders and require large trio samples in each study. As a result, these methods are not able to characterize the full landscape of genetic sharing due to polygenicity and incomplete penetrance. In this work, we introduce EncoreDNM, a novel statistical framework to quantify shared genetic effects between two disorders characterized by concordant enrichment of DNMs in the exome. EncoreDNM makes use of exome-wide, summary-level DNM data, including genes that do not reach statistical significance in single-disorder analysis, to evaluate the overall and annotation-partitioned genetic sharing between two disorders. Applying EncoreDNM to DNM data of nine disorders, we identified abundant pairwise enrichment correlations, especially in genes intolerant to pathogenic mutations and genes highly expressed in fetal tissues. These results suggest that EncoreDNM improves current analytic approaches and may have broad applications in DNM studies.

Research organism: Human

Introduction

De novo mutations (DNMs) can be highly deleterious and provide important insights into the genetic cause for disease (Veltman and Brunner, 2012). As the cost of sequencing continues to drop, whole-exome sequencing (WES) studies conducted on tens of thousands of family trios have pinpointed numerous risk genes for a variety of disorders (Lelieveld et al., 2016; Kaplanis et al., 2020; Satterstrom et al., 2020). In addition, accumulating evidence suggests that risk genes enriched for pathogenic DNMs may be shared by multiple disorders (Hoischen et al., 2014; Fromer et al., 2014; Homsy et al., 2015; Li et al., 2016; Nguyen et al., 2020). These shared genes could reveal biological pathways that play prominent roles in disease etiology and shed light on clinically heterogeneous yet genetically related diseases (Homsy et al., 2015; Li et al., 2016; Nguyen et al., 2020).

Most efforts to identify shared risk genes directly compare genes that are significantly associated with each disorder (Nguyen et al., 2017; Willsey et al., 2018). There have been some successes with this approach in identifying shared genes and pathways (e.g. chromatin modifiers) underlying developmental disorder (DD), autism spectrum disorder (ASD), and congenital heart disease (CHD), thanks to the large trio samples in these studies (Kaplanis et al., 2020; Satterstrom et al., 2020; Jin et al., 2017), whereas findings in smaller studies remain suggestive (Allen et al., 2013; Jin et al., 2020b). Even in the largest studies to date, statistical power remains moderate for risk genes with weaker effects (Kaplanis et al., 2020; Howrigan et al., 2020). It is estimated that more than 1000 haploinsufficient genes contributing to developmental disorder risk have not yet achieved statistical significance in large WES studies (Kaplanis et al., 2020). Therefore, analytic approaches that only account for top significant genes cannot capture the full landscape of genetic sharing in multiple disorders. Recently, a Bayesian framework named mTADA was proposed to jointly analyze DNM data of two diseases and improve risk gene mapping (Nguyen et al., 2020). Although mTADA produces estimates for the proportion of shared risk genes, the statistical property of these parameter estimates has not been studied. There is a pressing need for powerful, robust, and interpretable methods that quantify concordant DNM association patterns for multiple disorders using exome-wide DNM counts.

Recent advances in estimating genetic correlations using summary data from genome-wide association studies (GWAS) may provide a blueprint for approaching this problem in DNM research (Zhang et al., 2021a). Modeling ‘omnigenic’ associations as independent random effects, linear mixed-effects models leverage genome-wide association profiles to quantify the correlation between additive genetic components of multiple complex traits (Lee et al., 2012; Bulik-Sullivan et al., 2015; Lu et al., 2017; Ning et al., 2020). These methods have identified ubiquitous genetic correlations across many human traits and revealed significant and robust genetic correlations that could not be inferred from significant GWAS associations alone (Shi et al., 2017; Brainstorm, 2018; Guo et al., 2021; Zhang et al., 2021b).

Here, we introduce EncoreDNM (Enrichment correlation estimator for De Novo Mutations), a novel statistical framework that leverages exome-wide DNM counts, including genes that do not reach exome-wide statistical significance in single-disorder analysis, to estimate concordant DNM associations between disorders. EncoreDNM uses a generalized linear mixed-effects model to quantify the occurrence of DNMs while accounting for de novo mutability of each gene and technical inconsistencies between studies. We demonstrate the performance of EncoreDNM through extensive simulations and analyses of DNM data of nine disorders.

Results

Method overview

DNM counts in the exome deviate from the null (i.e. expected counts based on de novo mutability) when mutations play a role in disease etiology. Disease risk genes will show enrichment for deleterious DNMs in probands and non-risk genes may be slightly depleted for DNM counts. Our goal is to estimate the correlation of such deviation between two disorders, which we refer to as the DNM enrichment correlation. More specifically, we use a pair of mixed-effects Poisson regression models (Munkin and Trivedi, 1999) to quantify the occurrence of DNMs in two studies.

$$\begin{bmatrix} Y_{i1} \\ Y_{i2} \end{bmatrix} \sim \mathrm{Poisson}\left(\begin{bmatrix} λ_{i1} \\ λ_{i2} \end{bmatrix}\right),$$

$$\log\left(\begin{bmatrix} λ_{i1} \\ λ_{i2} \end{bmatrix}\right) = \begin{bmatrix} β_{1} \\ β_{2} \end{bmatrix} + \log\left(\begin{bmatrix} 2 N_{1} m_{i} \\ 2 N_{2} m_{i} \end{bmatrix}\right) + \begin{bmatrix} ϕ_{i1} \\ ϕ_{i2} \end{bmatrix},$$

$$\begin{bmatrix} ϕ_{i1} \\ ϕ_{i2} \end{bmatrix} \sim \mathrm{MVN}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} σ_{1}^{2} & ρ σ_{1} σ_{2} \\ ρ σ_{1} σ_{2} & σ_{2}^{2} \end{bmatrix}\right).$$

Here, $Y_{i1}, Y_{i2}$ are the DNM counts for the i-th gene and $N_{1}, N_{2}$ are the number of parent-proband trios in two studies, respectively. The log Poisson rates of DNM occurrence are decomposed into three components: the elevation component, the background component, and the deviation component. The elevation component $β_{k}$ ($k = 1, 2$) is a fixed effect term adjusting for systematic, exome-wide bias in DNM counts. One example of such bias is the batch effect caused by different sequencing and variant calling pipelines in two studies. The elevation parameter $β_{k}$ tends to be positive when DNMs are over-called with higher sensitivity and negative when DNMs are under-called with higher specificity (Wei et al., 2015). The background component $\log(2 N_{k} m_{i})$ is a gene-specific fixed effect that reflects the expected mutation counts determined by the genomic sequence context under the null (Samocha et al., 2014). $m_{i}$ is the de novo mutability for the i-th gene, and $2 N_{1} m_{i}$ and $2 N_{2} m_{i}$ are the expected DNM counts in the i-th gene under the null in two studies. The deviation component $ϕ_{ik}$ is a gene-specific random effect that quantifies the deviation of DNM profile from what is expected under the null (i.e. no risk genes for the disorder). $ϕ_{i1}$ and $ϕ_{i2}$ follow a multivariate normal distribution with dispersion parameters $σ_{1}$ and $σ_{2}$ and a correlation $ρ$. A larger value of the dispersion parameter $σ_{k}$ indicates a more substantial deviation from the null. That is, DNM counts show strong enrichment in some genes and depletion in other genes compared to the expectation based on de novo mutability. A smaller value of $σ_{k}$ suggests that the DNM count data is well in line with what is expected based on the null model. DNM enrichment correlation is denoted by $ρ$ and is our main parameter of interest. It quantifies the concordance of DNM burden in two disorders.
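To make the three components concrete, the following minimal sketch draws DNM counts for two studies from the model above. It is an illustration only, not the EncoreDNM implementation: the function names, the seed handling, and the use of Knuth's multiplication method for Poisson draws (adequate for the small per-gene rates typical of DNM data) are all assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Assumption: rates are small (as for per-gene DNM counts), so
    exp(-rate) does not underflow and the loop stays short.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_pair(mutability, n1, n2, beta1, beta2, sigma1, sigma2, rho, seed=0):
    """Simulate (Y_i1, Y_i2) for every gene under the bivariate model."""
    rng = random.Random(seed)
    counts = []
    for m in mutability:
        # Correlated deviation components built from two independent normals.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        phi1 = sigma1 * z1
        phi2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # log rate = elevation + background + deviation
        lam1 = math.exp(beta1 + math.log(2 * n1 * m) + phi1)
        lam2 = math.exp(beta2 + math.log(2 * n2 * m) + phi2)
        counts.append((sample_poisson(lam1, rng), sample_poisson(lam2, rng)))
    return counts


if __name__ == "__main__":
    # Hypothetical inputs: 2000 genes sharing one mutability value.
    genes = [1e-5] * 2000
    pairs = simulate_pair(genes, 5000, 5000, -0.25, -0.25, 0.75, 0.75, 0.5)
    print(sum(a for a, _ in pairs), sum(b for _, b in pairs))
```

Setting `rho` to zero yields two independent single-study datasets, while setting `sigma1` and `sigma2` close to zero recovers counts that follow the null expectation scaled by the elevation term.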

Parameters in this model can be estimated using a Monte Carlo maximum likelihood estimation (MLE) procedure. Standard errors of the estimates are obtained through inversion of the observed Fisher information matrix. In practice, we use annotated DNM data as input and fit mixed-effects Poisson models for each variant class separately: loss of function (LoF), deleterious missense (Dmis, defined as MetaSVM-deleterious), tolerable missense (Tmis, defined as MetaSVM-tolerable), and synonymous (Figure 1). More details about model setup and parameter estimation are discussed in Materials and methods.
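The idea behind a Monte Carlo likelihood can be sketched for the single-study model: the random effect is integrated out by averaging the Poisson likelihood over simulated draws. The code below is a simplified illustration under stated assumptions (plain Monte Carlo averaging with a shared set of standard normal draws, hypothetical function names); the procedure actually used by EncoreDNM is described in Materials and methods and may differ.

```python
import math
import random


def poisson_logpmf(y, rate):
    """Log of the Poisson probability mass function."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def mc_loglik(counts, mutability, n_trios, beta, sigma, n_draws=500, seed=0):
    """Monte Carlo approximation of the marginal log-likelihood.

    For each gene, the likelihood is the average of Poisson(y | lambda)
    over draws of the random effect phi = sigma * z, z ~ N(0, 1).
    The same z draws are reused for all genes (common random numbers),
    which keeps the approximation smooth in (beta, sigma).
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    total = 0.0
    for y, m in zip(counts, mutability):
        base = beta + math.log(2 * n_trios * m)
        logs = [poisson_logpmf(y, math.exp(base + sigma * zi)) for zi in z]
        peak = max(logs)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(v - peak) for v in logs) / n_draws)
    return total
```

Maximizing this function over `beta` and `sigma` with a general-purpose optimizer would give approximate maximum likelihood estimates, and the inverse of the negative Hessian at the maximum would approximate their covariance, mirroring the observed Fisher information step described above.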

Figure 1. — The inputs of EncoreDNM are de novo mutability of each gene and exome-wide, annotated DNM counts from two studies. We fit a mixed-effects Poisson model to estimate the DNM enrichment correlation between two disorders for each variant class.

Simulation results

We conducted simulations to assess the parameter estimation performance of EncoreDNM in various settings. We focused on two variant classes, that is, Tmis and LoF variants, since they have the highest and lowest median mutabilities in the exome. We used EncoreDNM to estimate the elevation parameter $β$, dispersion parameter $σ$, and enrichment correlation $ρ$ (Materials and methods). Under all parameter settings considered, EncoreDNM provided unbiased estimation of the parameters (Figure 2 and Figure 2—figure supplements 1–2). Furthermore, the 95% Wald confidence intervals achieved coverage rates close to 95% under all simulation settings, demonstrating that EncoreDNM provides accurate statistical inference.
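For reference, a Wald interval and its empirical coverage across simulation replicates can be computed as in the sketch below. The function names are hypothetical, and the inputs stand for the estimates and standard errors returned by each replicate.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Wald confidence interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def coverage_rate(estimates, std_errors, true_value, level=0.95):
    """Fraction of replicates whose interval contains the true value."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = wald_interval(est, se, level)
        hits += lower <= true_value <= upper
    return hits / len(estimates)
```

A coverage rate near the nominal level across replicates indicates that both the point estimates and their standard errors are well calibrated.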

Figure 2. — (a) Boxplot of $β$ estimates in single-trait analysis with $σ$ fixed at 0.75. (b) Boxplot of $σ$ estimates in single-trait analysis with $β$ fixed at –0.25. (c) Boxplot of $ρ$ estimates in cross-trait analysis with $β$ and $σ$ fixed at –0.25 and 0.75. True parameter values are marked by dashed lines. The number above each box represents the coverage rate of 95% Wald confidence intervals. Each simulation setting was repeated 100 times.

Next, we compared the performance of EncoreDNM with mTADA (Nguyen et al., 2020), a Bayesian framework that estimates the proportion of shared risk genes for two disorders. First, we simulated DNM data under the mixed-effects Poisson model. We evaluated the two methods across a range of combinations of elevation parameter, dispersion parameter, and sample size for two disorders. The false positive rates for our method were well-calibrated in all parameter settings, but mTADA produced false positive findings when the observed DNM counts were relatively small (e.g. due to reduced elevation or dispersion parameters or a lower sample size; Figure 3a). We also assessed the statistical power of the two approaches under a baseline setting where false positives for both methods were controlled. As enrichment correlation increased, EncoreDNM consistently achieved greater statistical power than mTADA (Figure 3b).

Figure 3. — (a) False positive rates under a mixed-effects Poisson regression model. (b) Statistical power of two methods under a mixed-effects Poisson regression model as the enrichment correlation increases. Parameters ($β, σ, N$) were fixed at (–0.25, 0.75, 5000) for both disorders. (c) False positive rates under a multinomial model. (d) Statistical power under a multinomial model with varying proportion of shared causal genes. Parameters ($u, p, N$) were fixed at (0.95, 0.25, 5000) for both disorders. Each simulation setting was repeated 100 times.

To ensure a fair comparison, we also considered a mis-specified model setting where we randomly distributed the total DNM counts for each disorder into all genes with an enrichment in causal genes (Materials and methods). EncoreDNM showed well-controlled false positive rates across all simulation settings, whereas severe inflation of false positives arose for mTADA when the total mutation count, the proportion of probands that can be explained by DNMs, or the sample size was small (Figure 3c). Furthermore, we compared the statistical power of the two methods under this model in a baseline setting where the false positive rate was controlled. EncoreDNM showed higher statistical power than mTADA as the fraction of shared causal genes increased (Figure 3d).

Pervasive enrichment correlation of damaging DNMs among developmental disorders

We applied EncoreDNM to DNM data of nine disorders (Supplementary file 1-STable 1; Materials and methods): developmental disorder (n=31,058; Kaplanis et al., 2020), autism spectrum disorder (n=6430; Satterstrom et al., 2020), schizophrenia (SCZ; n=2772; Howrigan et al., 2020), congenital heart disease (n=2645; Jin et al., 2017), intellectual disability (ID; n=820; Lelieveld et al., 2016), Tourette disorder (TD; n=484; Willsey et al., 2017), epileptic encephalopathies (EP; n=264; Allen et al., 2013), cerebral palsy (CP; n=250; Jin et al., 2020b), and congenital hydrocephalus (CH; n=232; Jin et al., 2020a), where n denotes the number of trios. In addition, we included 1789 trios comprising healthy parents and unaffected siblings of autism probands as controls (Krumm et al., 2015).

We first performed single-trait analysis under the mixed-effects Poisson model for each disorder. The estimated elevation parameters (i.e. $β$) were negative for almost all disorders and variant classes (Figure 4a), with LoF variants showing particularly low parameter estimates. This may be explained by more stringent quality control in LoF variant calling (Jin et al., 2017) and potential survival bias (Lek et al., 2016). It is also consistent with a depletion of LoF DNMs in healthy control trios (Homsy et al., 2015). The dispersion parameter estimates (i.e. $σ$) were higher for LoF variants than other variant classes (Figure 4b), which is consistent with our expectation that LoF variants have stronger effects on disease risk and should show a larger deviation from the null mutation rate in disease probands. We also compared the goodness of fit of our proposed mixed-effects Poisson model to a simpler fixed-effects model without the deviation component (Materials and methods). The expected distribution of recurrent DNM counts showed substantial and statistically significant improvement under the mixed-effects Poisson model (Figure 4c–f, Figure 4—figure supplement 1, and Supplementary file 1-STable 2).
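The comparison of expected gene frequencies under the two models can be illustrated as follows. This is a sketch under assumptions, not the authors' goodness-of-fit procedure: the random effect is integrated with a simple equal-probability quantile grid, the function names are hypothetical, and setting `sigma` to zero recovers the fixed-effects model.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, rate):
    """Poisson probability of observing exactly k events."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def expected_gene_counts(mutability, n_trios, beta, sigma, max_k=3, n_nodes=200):
    """Expected number of genes carrying k = 0..max_k DNMs.

    The normal random effect is approximated by n_nodes equally
    weighted quantiles of N(0, 1); sigma = 0 gives the fixed-effects model.
    """
    std_normal = NormalDist()
    nodes = [std_normal.inv_cdf((j + 0.5) / n_nodes) for j in range(n_nodes)]
    expected = [0.0] * (max_k + 1)
    for m in mutability:
        base = beta + math.log(2 * n_trios * m)
        for k in range(max_k + 1):
            expected[k] += sum(
                poisson_pmf(k, math.exp(base + sigma * z)) for z in nodes
            ) / n_nodes
    return expected
```

With the same elevation parameter, a positive dispersion parameter shifts expected mass toward genes with two or more DNMs, which is the feature of the observed data that the fixed-effects model cannot reproduce.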

Figure 4. — (**a, b**) Estimation results of $β$ and $σ$ for nine disorders and four variant classes. Error bars represent 1.96*standard errors. Sample sizes of DNM datasets for each disorder are provided in Supplementary file 1-STable 1. (**c–f**) Distribution of DNM events per gene in four variant classes for developmental disorder. Red and green bars represent the expected frequency of genes under the fixed-effects and mixed-effects Poisson regression models, respectively. Blue bars represent the observed frequency of genes.

Next, we estimated pairwise DNM enrichment correlations for the nine disorders. In total, we identified 25 pairs of disorders with significant correlations at a false discovery rate (FDR) cutoff of 0.05 (Figure 5 and Figure 5—figure supplement 1), including 12 significant correlations for LoF variants, 7 for Dmis variants, 5 for Tmis variants, and only 1 significant correlation for synonymous variants. Notably, all significant correlations are positive (Supplementary file 1-STable 3). No significant correlation was identified between any disorder and healthy controls (Figure 5—figure supplement 2). This is consistent with our expectation, since DNMs in the control groups will distribute proportionally according to the de novo mutability without showing enrichment in certain genes. The number of identified significant correlations for each disorder was positively correlated with the sample size in each study (Spearman correlation = 0.70), with controls being a notable outlier (Figure 5—figure supplement 3).
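The FDR procedure is not named in this section. Assuming the widely used Benjamini-Hochberg step-up procedure, adjusted p-values for a set of pairwise tests could be computed as in this sketch (the function name is hypothetical).

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A disorder pair would then be declared significant when its adjusted p-value falls below 0.05.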

Figure 5. — (a) Sample size (i.e. number of trios) for each disorder. X-axis denotes sample size on the log scale. (b) Heatmap of enrichment correlations for LoF (upper triangle) and synonymous (lower triangle) DNMs among nine disorders. Larger squares represent more significant p-values, and deeper color represents stronger correlations. Significant correlations (FDR <0.05) are shown as full-sized squares marked by asterisks.

We identified highly concordant and significant LoF DNM enrichment among developmental disorder, autism, intellectual disability, and congenital heart disease, which is consistent with previous reports (Li et al., 2016; Nguyen et al., 2020; Nguyen et al., 2017; Hormozdiari et al., 2015). Schizophrenia shows highly significant LoF correlations with developmental disorder (p=2.0e-3) and intellectual disability (p=3.7e-5). The positive enrichment correlation between autism and cerebral palsy in LoF variants ($ρ$=0.81, p=3.3e-3) is consistent with their co-occurrence (Christensen et al., 2014). The high enrichment correlation between intellectual disability and cerebral palsy in LoF variants ($ρ$=0.68, p=1.0e-4) is consistent with the associations between intellectual disability and motor or non-motor abnormalities caused by cerebral palsy (Reid et al., 2018). A previous study also suggested significant genetic sharing of intellectual disability and cerebral palsy by overlapping genes harboring rare damaging variants (Jin et al., 2020b). Here, we obtained consistent results after accounting for de novo mutabilities and potential confounding bias.

Some significant correlations identified in our analysis are consistent with phenotypic associations in epidemiological studies, but have not been reported using genetic data to the best of our knowledge. For example, the LoF enrichment correlation between congenital heart disease and cerebral palsy ($ρ$=0.88, p=1.7e-3) is consistent with findings that reduced supply of oxygenated blood in fetal brain due to cardiac malformations may be a risk factor for cerebral palsy (Garne et al., 2008). The enrichment correlation between intellectual disability and congenital hydrocephalus in LoF variants ($ρ$=0.63, p=2.4e-3) is consistent with lower intellectual performance in a proportion of children with congenital hydrocephalus (Lumenta and Skotarczak, 1995).

Genes showing pathogenic DNMs in multiple disorders may shed light on the mechanisms underlying enrichment correlations (Supplementary file 1-STable 4). We identified five genes (CTNNB1, NBEA, POGZ, SPRED2, and KMT2C) with LoF DNMs in five different disorders and 21 genes with LoF DNMs in four disorders (Supplementary file 1-STable 5). These 26 genes with LoF variants in at least four disorders were significantly enriched for 63 gene ontology (GO) terms with FDR <0.05 (Supplementary file 1-STable 6). Chromatin organization (p=7.8e-11), nucleoplasm (p=2.8e-10), chromosome organization (p=6.8e-10), histone methyltransferase complex (p=1.4e-9), and positive regulation of gene expression (p=2.2e-9) were the most significantly enriched GO terms. One notable example consistently included in these gene sets is CTNNB1 (Figure 5—figure supplement 4). It encodes β-catenin, is one of only two genes reaching genome-wide significance in a recent WES study for cerebral palsy (Jin et al., 2020b), and also harbors multiple LoF variants in developmental disorder, intellectual disability, autism, and congenital heart disease. It is a fundamental component of the canonical Wnt signaling pathway, which is known to confer genetic risk for autism (O’Roak et al., 2012). Genes with recurrent damaging DNMs in multiple disorders also revealed shared biological function across these disorders (Rees et al., 2021). We identified 30 recurrent cross-disorder LoF mutations that were not recurrent in developmental disorder alone (Supplementary file 1). FBXO11, encoding F-box only protein 11, shows two recurrent p.Ser831fs LoF variants in autism and congenital hydrocephalus (Figure 5—figure supplement 5; p=1.9e-3; Materials and methods). The F-box protein constitutes a substrate-recognition component of the SCF (SKP1-cullin-F-box) complex, an E3-ubiquitin ligase complex responsible for ubiquitination and proteasomal degradation (Cardozo and Pagano, 2004). DNMs in FBXO11 have been previously implicated in individuals with severe intellectual disability and autistic behavior problems (Jansen et al., 2019) and in neurodevelopmental disorder (Gregor et al., 2018).

For comparison, we also applied mTADA to the same nine disorders and control trios. In total, mTADA identified 117 disorder pairs with significant genetic sharing at an FDR cutoff of 0.05 (Supplementary file 1-STable 8 and Figure 5—figure supplement 6). Notably, mTADA identified significant synonymous DNM correlations for all 36 disorder pairs and between all disorders and healthy controls (Figure 5—figure supplement 7). These results are consistent with the simulation results and suggest a substantially inflated false positive rate in mTADA.

Partitioning DNM enrichment correlation by gene set

To gain biological insights into the shared genetic architecture of nine disorders, we repeated EncoreDNM correlation analysis in several gene sets. First, we defined genes with high/low probability of intolerance to LoF variants using pLI scores (Karczewski et al., 2020), and identified genes with high/low brain expression (HBE/LBE) (Werling et al., 2020; Materials and methods; Supplementary file 1-STable 9). We identified 11 and 12 disorder pairs showing significant enrichment correlations for LoF DNMs in high-pLI genes and HBE genes, respectively (Figure 6a–b). We observed fewer significant correlations for Dmis and Tmis variants in these gene sets (Figure 6—figure supplements 1–2). All identified significant correlations were positive (Supplementary file 1-STables 10–11). No significant correlations were identified for synonymous variants (Figure 6—figure supplements 1–2) or between disorders and controls (Figure 6—figure supplements 3–4).

Figure 6. — (a) Enrichment correlations in high-pLI genes (upper triangle) and low-pLI genes (lower triangle) for LoF variants. Here, pLI is the probability of being loss-of-function intolerant (see Materials and methods). (b) Enrichment correlations in HBE genes (upper triangle) and LBE genes (lower triangle) for LoF variants. (c) Enrichment correlations in HHE genes (upper triangle) and LHE genes (lower triangle) for LoF variants. (d) Enrichment correlations in CHD-related pathways for LoF and synonymous variants. Larger squares represent more significant p-values, and deeper color represents stronger correlations. Significant correlations (FDR <0.05) are shown as full-sized squares marked by asterisks.

We observed a clear enrichment of significant correlations in disease-relevant gene sets. Overall, high-pLI genes showed substantially stronger correlations across disorders than genes with low pLI (one-sided Kolmogorov-Smirnov test; p=2.3e-6). Similarly, enrichment correlations were stronger in HBE genes than in LBE genes (p=8.8e-7). Among the 11 disorder pairs showing significant enrichment correlations in high-pLI genes, two pairs, that is, autism-schizophrenia ($ρ$=0.68, p=2.4e-3) and developmental disorder-congenital hydrocephalus ($ρ$=0.43, p=1.5e-3), were not identified in the exome-wide analysis. We also identified four novel disorder pairs with significant correlations in HBE genes, including developmental disorder-cerebral palsy ($ρ$=0.80, p=9.5e-5), developmental disorder-congenital hydrocephalus ($ρ$=0.67, p=1.4e-3), autism-congenital hydrocephalus ($ρ$=0.82, p=4.7e-4), and schizophrenia-epileptic encephalopathies ($ρ$=0.66, p=2.0e-3). These novel enrichment correlations are consistent with known comorbidities between these disorders (Kielinen et al., 2004; Kilincaslan and Mukaddes, 2009) and findings based on significant risk genes (Li et al., 2016; Jin et al., 2020a; Kume et al., 1998; Cao and Wu, 2015).
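A one-sided two-sample Kolmogorov-Smirnov comparison of two sets of correlation estimates can be sketched as below. The inputs are hypothetical lists of estimates (for example, one value per disorder pair in each gene set), and the p-value uses the classical large-sample Smirnov approximation, which is an assumption here and may differ from the exact or corrected p-values reported by statistical packages.

```python
import math


def one_sided_ks(larger, smaller):
    """Test whether `larger` tends to take greater values than `smaller`.

    Returns the one-sided statistic D+ = max(F_smaller - F_larger) and
    the asymptotic p-value exp(-2 * D+^2 * n * m / (n + m)).
    """
    n, m = len(larger), len(smaller)
    points = sorted(set(larger) | set(smaller))
    d_plus = 0.0
    for x in points:
        cdf_larger = sum(v <= x for v in larger) / n
        cdf_smaller = sum(v <= x for v in smaller) / m
        d_plus = max(d_plus, cdf_smaller - cdf_larger)
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value
```

A small p-value indicates that the empirical distribution of the first sample lies to the right of the second, which corresponds to stronger correlations in the first gene set.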

Furthermore, we estimated DNM enrichment correlations in genes with high/low expression in mouse developing heart (HHE/LHE) (Homsy et al., 2015; Materials and methods; Supplementary file 1-STable 9). We identified 9 significant enrichment correlations for LoF variants in HHE genes (Figure 6c). Strength of enrichment correlations did not show a significant difference between HHE and LHE genes (p=0.846), possibly due to a lack of cardiac disorders in our analysis. Finally, we estimated enrichment correlations between congenital heart disease and other disorders in known pathways for congenital heart disease (Zaidi and Brueckner, 2017; Materials and methods; Supplementary file 1-STable 9). We identified five significant correlations for LoF variants (Figure 6d), including a novel correlation between congenital heart disease and Tourette disorder ($ρ$=0.93, p=3.3e-9). Of note, arrhythmia caused by congenital heart disease is a known risk factor for Tourette disorder (Gulisano et al., 2011). In these analyses, all significant enrichment correlations were positive (Supplementary file 1) and other variant classes showed generally weaker correlations than LoF variants (Figure 6—figure supplements 5–6). We did not observe significant correlations in these gene sets between disorders and controls (Figure 6—figure supplements 7–8).

Discussion

In this paper, we introduced EncoreDNM, a novel statistical framework to quantify correlated DNM enrichment between two disorders. Through extensive simulations and analyses of DNM data for nine disorders, we demonstrated that our proposed mixed-effects Poisson regression model provides unbiased parameter estimates, shows well-controlled false positive rate, and is robust to exome-wide technical biases. Leveraging exome-wide DNM counts and genomic context-based mutability data, EncoreDNM achieves superior fit for real DNM datasets compared to simpler models and provides statistically powerful and computationally efficient estimation of DNM enrichment correlation. Further, EncoreDNM can quantify concordant genetic effects for user-defined variant classes within pre-specified gene sets, and is thus suitable for exploring diverse types of hypotheses and can provide crucial biological insights into the shared genetic etiology in multiple disorders. In comparison, the Bayesian approach implemented in mTADA can produce false positive findings, especially when the DNM count is low, possibly due to the overestimated proportion of risk genes. We still observed inflation in false positive rates under a more stringent significance cutoff or using a posterior probability threshold strategy (Supplementary file 1-STables 14–17).

Multi-trait analyses of GWAS data have revealed shared genetic architecture among many neuropsychiatric traits (Brainstorm, 2018; Lee et al., 2013; Gratten et al., 2014; Abdellaoui and Verweij, 2021). These findings have led to the identification of pleiotropic variants, genes, and hub genomic regions underlying many traits and have revealed multiple psychopathological factors jointly affecting human neurological phenotypes (Lee, 2019; Wang et al., 2015). Although emerging evidence suggests that causal DNMs underlying several disorders with well-powered studies (e.g. congenital heart disease and neurodevelopmental disorders; Homsy et al., 2015) may be shared, our understanding of the extent and the mechanism underlying such sharing remains incomplete. Applied to DNM data for nine disorders, EncoreDNM identified pervasive enrichment correlations of DNMs. We observed particularly strong correlations in pathogenic variant classes (e.g. LoF and Dmis variants) and disease-relevant genes (e.g. genes with high pLI and genes highly expressed in relevant tissues). Genes underlying these correlations were significantly enriched in pathways involved in chromatin organization and modification and gene expression regulation. The DNM correlations were substantially attenuated in genes with lower expression and genes with frequent occurrences of LoF variants in the population. A similar attenuation was observed in less pathogenic variant classes (e.g., synonymous variants). Further, no significant correlations were identified between any disorder and healthy controls. We also compared DNM enrichment correlations of five disorders with genetic correlations estimated from GWAS summary statistics (Supplementary file 1-STable 18). We had consistent findings from GWAS and DNM data (Spearman correlation = 0.70; Figure 5—figure supplement 8 and Supplementary file 1-STable 19). These results lay the groundwork for future investigations of pleiotropic mechanisms of DNMs.

Our study has some limitations. First, EncoreDNM assumes probands from different input studies to be independent. In rare cases when two studies have overlapping proband samples, enrichment correlation estimates may be inflated and must be interpreted with caution. Second, genetic correlation methods based on GWAS summary data provided key motivations for the mixed-effects Poisson regression model in our study. Built upon genetic correlations, a plethora of methods have been developed in the GWAS literature to jointly model more than two GWAS (Turley et al., 2018), identify and quantify common factors underlying multiple traits (Grotzinger et al., 2019; Grotzinger et al., 2020), estimate causal effects among different traits (Pickrell et al., 2016), and identify pleiotropic genomic regions through hypothesis-free scans (Guo et al., 2021). Future directions of EncoreDNM include using enrichment correlation to improve gene discovery, learning the directional effects and the causal structure underlying multiple disorders, and dynamically searching for gene sets and annotation classes with shared genetic effects without pre-specifying the hypothesis.

Taken together, we provide a new analytic approach to an important problem in DNM studies. We believe EncoreDNM improves the statistical rigor in multi-disorder DNM modeling and opens up many interesting future directions in both method development and follow-up analyses in WES studies. As trio sample size in WES studies continues to grow, EncoreDNM will have broad applications and can greatly benefit DNM research.

Materials and methods

Statistical model

For a single study, we assume that DNM counts in a given variant class (for example, synonymous variants) follow a mixed-effects Poisson model:

$$Y_{i} \sim \mathrm{Poisson} (λ_{i}),$$

$$\log (λ_{i}) = β + \log (2 N m_{i}) + ϕ_{i},$$

$$ϕ_{i} \sim N (0, σ^{2}), \quad \text{for } i = 1, \dots, G,$$

where $Y_{i}$ is the DNM count in the i-th gene, $N$ is the number of trios, $m_{i}$ is the de novo mutability for the i-th gene (for example, mutation rate per chromosome per generation), which is known a priori (Samocha et al., 2014), and $G$ is the total number of genes in the study. The elevation parameter $β$ quantifies the global elevation of mutation rate compared to mutability estimates based on genomic sequence alone. Gene-specific deviation from the expected DNM rate is quantified by the random effect $ϕ_{i}$ with a dispersion parameter $σ$. Here, the $ϕ_{i}$ are assumed to be independent across different genes, in which case the observed DNM counts of different genes are independent. There is no constraint on the value of $β$, and the dispersion parameter $σ$ can be any positive value.
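As an illustration of the single-study model, the following minimal sketch draws DNM counts from the mixed-effects Poisson model with NumPy. The function name `simulate_single_trait`, the constant mutability of 1e-5, and the gene count of 18,000 are hypothetical choices made only for this example; they are not taken from the EncoreDNM software.

```python
import numpy as np

def simulate_single_trait(m, N, beta, sigma, rng):
    """Draw Y_i ~ Poisson(2 * N * m_i * exp(beta + phi_i)) with phi_i ~ N(0, sigma^2)."""
    phi = rng.normal(0.0, sigma, size=m.shape[0])   # gene-specific random effects
    lam = 2.0 * N * m * np.exp(beta + phi)          # Poisson rate for each gene
    return rng.poisson(lam)

rng = np.random.default_rng(0)
G = 18000                                           # hypothetical number of genes
m = np.full(G, 1e-5)                                # hypothetical constant mutability
y = simulate_single_trait(m, N=5000, beta=-0.25, sigma=0.75, rng=rng)

# Marginal mean of the counts: E[Y_i] = 2 * N * m_i * exp(beta + sigma^2 / 2).
expected_total = float((2 * 5000 * m * np.exp(-0.25 + 0.75**2 / 2)).sum())
```

Because the random effect is log-normal, the marginal mean of each count includes the factor exp(σ²/2), so the total simulated count should be close to `expected_total`.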

Next, we describe how we expand this model to quantify the shared genetics of two disorders. We adopt a flexible Poisson-lognormal mixture framework that can accommodate both overdispersion and correlation (Munkin and Trivedi, 1999). We assume DNM counts in a given variant class for two diseases follow:

$$\begin{bmatrix} Y_{i 1} \\ Y_{i 2} \end{bmatrix} \sim \mathrm{Poisson} \left( \begin{bmatrix} λ_{i 1} \\ λ_{i 2} \end{bmatrix} \right),$$

$$\log \left( \begin{bmatrix} λ_{i 1} \\ λ_{i 2} \end{bmatrix} \right) = \begin{bmatrix} β_{1} \\ β_{2} \end{bmatrix} + \log \left( \begin{bmatrix} 2 N_{1} m_{i} \\ 2 N_{2} m_{i} \end{bmatrix} \right) + \begin{bmatrix} ϕ_{i 1} \\ ϕ_{i 2} \end{bmatrix},$$

$$\begin{bmatrix} ϕ_{i 1} \\ ϕ_{i 2} \end{bmatrix} \sim \mathrm{MVN} \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} σ_{1}^{2} & ρ σ_{1} σ_{2} \\ ρ σ_{1} σ_{2} & σ_{2}^{2} \end{bmatrix} \right),$$

where $Y_{i 1}, Y_{i 2}$ are the DNM counts for the i-th gene and $N_{1}, N_{2}$ are the numbers of trios in the two studies, respectively. As in the single-trait model, $m_{i}$ is the mutability for the i-th gene. $β_{1}, β_{2}$ are the elevation parameters, and $ϕ_{i 1}, ϕ_{i 2}$ are the gene-specific random effects with dispersion parameters $σ_{1}, σ_{2}$ for the two disorders, respectively. $ρ$ is the enrichment correlation, which quantifies the concordance of the gene-specific DNM burden between the two disorders. Here, $β_{1}, β_{2}, σ_{1}, σ_{2}, ρ$ are unknown parameters. The gene-specific effects for the two disorders are assumed to be independent across genes. We also assume that the two disorders share no samples, in which case $Y_{i 1}$ is independent of $Y_{i 2}$ given $\begin{bmatrix} λ_{i 1} \\ λ_{i 2} \end{bmatrix}$.
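The two-disorder model can be simulated in the same spirit. The sketch below builds correlated random effects from two independent standard normal draws, which is the same construction the parameter estimation section uses for its Monte Carlo samples. The function name `simulate_two_traits` and all numeric settings are hypothetical and serve only to illustrate the model.

```python
import numpy as np

def simulate_two_traits(m, N1, N2, beta1, beta2, sigma1, sigma2, rho, rng):
    """Simulate DNM counts for two disorders with enrichment correlation rho."""
    G = m.shape[0]
    xi1 = rng.standard_normal(G)
    xi2 = rng.standard_normal(G)
    phi1 = sigma1 * xi1                                          # Var = sigma1^2
    phi2 = sigma2 * (rho * xi1 + np.sqrt(1.0 - rho**2) * xi2)    # Corr(phi1, phi2) = rho
    y1 = rng.poisson(2.0 * N1 * m * np.exp(beta1 + phi1))        # counts, disorder 1
    y2 = rng.poisson(2.0 * N2 * m * np.exp(beta2 + phi2))        # counts, disorder 2
    return y1, y2, phi1, phi2

rng = np.random.default_rng(0)
m = np.full(18000, 1e-5)                                         # hypothetical mutability
y1, y2, phi1, phi2 = simulate_two_traits(
    m, N1=5000, N2=5000, beta1=-0.25, beta2=-0.25,
    sigma1=0.75, sigma2=0.75, rho=0.6, rng=rng)
empirical_rho = float(np.corrcoef(phi1, phi2)[0, 1])
```

Given the rates, the two count vectors are drawn independently, which mirrors the assumption that the two studies share no samples.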

Parameter estimation

We implement an MLE procedure to estimate unknown parameters. For single-trait analysis, the log-likelihood function can be expressed as follows:

$$l (β, σ \mid Y) = \sum_{i = 1}^{G} \log \left[ \int \exp (- λ_{i}) λ_{i}^{Y_{i}} \cdot f (ϕ_{i}) \, d ϕ_{i} \right] + C,$$

where $Y = {[Y_{1}, \dots, Y_{G}]}^{T}$, $λ_{i} = 2 N m_{i} \exp (β + ϕ_{i})$, $C = - \sum_{i = 1}^{G} \log (Y_{i}!)$, and $f (ϕ_{i}) = \frac{1}{\sqrt{2 π} σ} \exp (- \frac{ϕ_{i}^{2}}{2 σ^{2}})$. Note that there is no closed form for the integral in the log-likelihood function. Therefore, we use Monte Carlo integration to evaluate the log-likelihood function. Let $ϕ_{i j} = σ ξ_{i j}$, where the $ξ_{i j}$ are independently and identically distributed random variables following a standard normal distribution. We have

$$l (β, σ \mid Y) \approx l' (β, σ \mid Y) = \sum_{i = 1}^{G} \log \left[ \sum_{j = 1}^{M} \exp (- λ_{i j}) λ_{i j}^{Y_{i}} \right] + C,$$

where $λ_{i j} = 2 N m_{i} \exp (β + σ ξ_{i j})$, and $M$ is the Monte Carlo sample size, which is set to 1,000. We then obtain the MLE of $β, σ$ by maximizing $l' (β, σ \mid Y)$. We obtain the standard error of the MLE through inversion of the observed Fisher information matrix. However, when the DNM count is small, the Fisher information matrix may be non-invertible and the parameter vector is not numerically identifiable. In this case, we employ a group-wise jackknife using 100 randomly partitioned gene groups to obtain standard errors for parameter estimates. This approach produces standard errors consistent with those from the Fisher information approach (Figure 5—figure supplement 9).
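The Monte Carlo approximation of the single-trait log-likelihood can be sketched as follows. This is an illustration under stated assumptions rather than the EncoreDNM implementation: the function name `mc_loglik`, the grid search used in place of a numerical optimizer, and all numeric settings are hypothetical. The sketch averages rather than sums the $M$ terms, which shifts the value by the constant $G \log M$ and leaves the maximizer unchanged, and it works on the log scale to avoid underflow.

```python
import numpy as np
from math import lgamma

def mc_loglik(beta, sigma, y, N, m, xi):
    """Monte Carlo log-likelihood of the single-trait model.

    xi is a (G, M) array of standard normal draws; reusing the same draws for
    every (beta, sigma) keeps the approximated surface smooth.
    """
    log_lam = np.log(2.0 * N * m)[:, None] + beta + sigma * xi   # log lambda_ij
    log_terms = -np.exp(log_lam) + y[:, None] * log_lam          # log[exp(-lam) * lam^y]
    top = log_terms.max(axis=1, keepdims=True)                   # shift for log-sum-exp
    log_avg = top[:, 0] + np.log(np.exp(log_terms - top).mean(axis=1))
    const = -sum(lgamma(v + 1.0) for v in y)                     # C = -sum_i log(Y_i!)
    return float(log_avg.sum() + const)

rng = np.random.default_rng(1)
G, M, N = 2000, 200, 5000            # hypothetical sizes (the text uses M = 1,000)
m = np.full(G, 2e-5)                 # hypothetical mutability
y = rng.poisson(2.0 * N * m * np.exp(-0.25 + 0.75 * rng.standard_normal(G)))
xi = rng.standard_normal((G, M))

# Crude grid search standing in for a numerical optimizer.
grid = [(b, s) for b in np.linspace(-1.0, 0.5, 7) for s in np.linspace(0.25, 1.25, 5)]
beta_hat, sigma_hat = max(grid, key=lambda bs: mc_loglik(bs[0], bs[1], y, N, m, xi))
```

With sparse counts, $β$ and $σ$ are only weakly separable because the marginal mean depends on $β + σ^{2}/2$, so a coarse grid like this one is useful for illustration only.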

The estimation procedure can be generalized to multi-trait analysis. The log-likelihood function can be expressed as follows:

$$l (β_{1}, β_{2}, σ_{1}, σ_{2}, ρ \mid Y_{1}, Y_{2}) = \sum_{i = 1}^{G} \log \left[ \iint \exp (- λ_{i 1} - λ_{i 2}) λ_{i 1}^{Y_{i 1}} λ_{i 2}^{Y_{i 2}} \cdot f (ϕ_{i 1}, ϕ_{i 2}) \, d ϕ_{i 1} \, d ϕ_{i 2} \right] + C,$$

where $Y_{1} = {[Y_{11}, \dots, Y_{G 1}]}^{T}$, $Y_{2} = {[Y_{12}, \dots, Y_{G 2}]}^{T}$, $λ_{i 1} = 2 N_{1} m_{i} \exp (β_{1} + ϕ_{i 1})$, $λ_{i 2} = 2 N_{2} m_{i} \exp (β_{2} + ϕ_{i 2})$, $C = - \sum_{i = 1}^{G} [\log (Y_{i 1}!) + \log (Y_{i 2}!)]$, and $f (ϕ_{i 1}, ϕ_{i 2}) = \frac{1}{2 π σ_{1} σ_{2} \sqrt{1 - ρ^{2}}} \exp [- \frac{1}{2 (1 - ρ^{2})} (\frac{ϕ_{i 1}^{2}}{σ_{1}^{2}} + \frac{ϕ_{i 2}^{2}}{σ_{2}^{2}} - \frac{2 ρ ϕ_{i 1} ϕ_{i 2}}{σ_{1} σ_{2}})]$. We use Monte Carlo integration to evaluate the log-likelihood function. Let $ϕ_{i 1 j} = σ_{1} ξ_{i 1 j}$ and $ϕ_{i 2 j} = σ_{2} (ρ ξ_{i 1 j} + \sqrt{1 - ρ^{2}} ξ_{i 2 j})$, where the $ξ_{i 1 j}$ and $ξ_{i 2 j}$ are independently and identically distributed random variables following a standard normal distribution. We have

$$l (β_{1}, β_{2}, σ_{1}, σ_{2}, ρ \mid Y_{1}, Y_{2}) \approx l' (β_{1}, β_{2}, σ_{1}, σ_{2}, ρ \mid Y_{1}, Y_{2}) = \sum_{i = 1}^{G} \log \left[ \sum_{j = 1}^{M} \exp (- λ_{i 1 j} - λ_{i 2 j}) λ_{i 1 j}^{Y_{i 1}} λ_{i 2 j}^{Y_{i 2}} \right] + C,$$

where $λ_{i 1 j} = 2 N_{1} m_{i} \exp (β_{1} + σ_{1} ξ_{i 1 j})$ and $λ_{i 2 j} = 2 N_{2} m_{i} \exp [β_{2} + σ_{2} (ρ ξ_{i 1 j} + \sqrt{1 - ρ^{2}} ξ_{i 2 j})]$. We then obtain the MLE of $β_{1}, β_{2}, σ_{1}, σ_{2}, ρ$ by maximizing $l' (β_{1}, β_{2}, σ_{1}, σ_{2}, ρ \mid Y_{1}, Y_{2})$. The standard error of the MLE is obtained through inversion of the observed Fisher information matrix, or through the group-wise jackknife if the matrix is non-invertible.
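A corresponding sketch for the two-trait Monte Carlo log-likelihood is given below. It is an illustration only: `mc_loglik_pair` is a hypothetical name, the constant $C$ is dropped because it does not depend on the parameters, the $M$ terms are averaged rather than summed (a constant shift), and the mutability is set deliberately high so that a small simulated data set carries a clear signal. The profile over a coarse grid of $ρ$ stands in for a proper five-parameter optimization.

```python
import numpy as np

def mc_loglik_pair(params, y1, y2, N1, N2, m, xi1, xi2):
    """Monte Carlo log-likelihood (up to the constant C) of the two-trait model."""
    beta1, beta2, sigma1, sigma2, rho = params
    phi1 = sigma1 * xi1
    phi2 = sigma2 * (rho * xi1 + np.sqrt(1.0 - rho**2) * xi2)
    log_lam1 = np.log(2.0 * N1 * m)[:, None] + beta1 + phi1
    log_lam2 = np.log(2.0 * N2 * m)[:, None] + beta2 + phi2
    log_terms = (-np.exp(log_lam1) - np.exp(log_lam2)
                 + y1[:, None] * log_lam1 + y2[:, None] * log_lam2)
    top = log_terms.max(axis=1, keepdims=True)            # shift for log-sum-exp
    log_avg = top[:, 0] + np.log(np.exp(log_terms - top).mean(axis=1))
    return float(log_avg.sum())

rng = np.random.default_rng(2)
G, M, N = 1000, 300, 5000                  # hypothetical sizes
m = np.full(G, 2e-4)                       # hypothetical, deliberately large mutability
true_rho = 0.8
z1, z2 = rng.standard_normal(G), rng.standard_normal(G)
y1 = rng.poisson(2.0 * N * m * np.exp(-0.25 + 0.75 * z1))
y2 = rng.poisson(2.0 * N * m * np.exp(
    -0.25 + 0.75 * (true_rho * z1 + np.sqrt(1.0 - true_rho**2) * z2)))

xi1, xi2 = rng.standard_normal((G, M)), rng.standard_normal((G, M))
rho_grid = [-0.8, -0.4, 0.0, 0.4, 0.8]
profile = [mc_loglik_pair((-0.25, -0.25, 0.75, 0.75, r), y1, y2, N, N, m, xi1, xi2)
           for r in rho_grid]
rho_hat = rho_grid[int(np.argmax(profile))]
```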

Computation time

Analysis of a typical pair of disorders with 18,000 genes takes about 10 min on a 2.5 GHz cluster with 1 core.

DNM data and variant annotation

We obtained DNM data from published studies (Supplementary file 1-STable 1). DNM data for epileptic encephalopathies from the original release (Allen et al., 2013) were not in an editable format and were instead collected from denovo-db (Turner et al., 2017). We used ANNOVAR (Wang et al., 2010) to annotate all DNMs. Synonymous variants were determined based on the ‘synonymous SNV’ annotation in ANNOVAR; variants with ‘startloss’, ‘stopgain’, ‘stoploss’, ‘splicing’, ‘frameshift insertion’, ‘frameshift deletion’, or ‘frameshift substitution’ annotations were classified as LoF; Dmis variants were defined as nonsynonymous SNVs predicted to be deleterious by MetaSVM (Dong et al., 2015); nonsynonymous SNVs predicted to be tolerable by MetaSVM were classified as Tmis. Other DNMs that did not fall into these categories were removed from the analysis. For each variant class, we estimated the mutability of each gene using a sequence-based mutation model (Samocha et al., 2014) while adjusting for the sequencing coverage factor based on control trios as previously described (Jin et al., 2017; Supplementary file 1-STable 20). We included 18,454 autosomal protein-coding genes in our analysis. TTN was removed due to its substantially larger size.
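The classification rules above can be written as a small lookup. This is a sketch only: the function name `classify_dnm` and the boolean `metasvm_deleterious` argument are hypothetical conveniences, and the label ‘nonsynonymous SNV’ is assumed to be the ANNOVAR annotation corresponding to the nonsynonymous SNVs mentioned in the text.

```python
# Annotations that the text assigns to the loss-of-function (LoF) class.
LOF_ANNOTATIONS = {
    "startloss", "stopgain", "stoploss", "splicing",
    "frameshift insertion", "frameshift deletion", "frameshift substitution",
}

def classify_dnm(annotation, metasvm_deleterious=None):
    """Map an annotation (plus a MetaSVM call for missense) to a variant class.

    Returns 'synonymous', 'LoF', 'Dmis', 'Tmis', or None for variants that
    fall outside these classes and would be removed from the analysis.
    """
    if annotation == "synonymous SNV":
        return "synonymous"
    if annotation in LOF_ANNOTATIONS:
        return "LoF"
    if annotation == "nonsynonymous SNV" and metasvm_deleterious is not None:
        return "Dmis" if metasvm_deleterious else "Tmis"
    return None

classes = [
    classify_dnm("stopgain"),
    classify_dnm("nonsynonymous SNV", metasvm_deleterious=True),
    classify_dnm("nonsynonymous SNV", metasvm_deleterious=False),
    classify_dnm("nonframeshift deletion"),
]
```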

Description and implementation of mTADA

The method mTADA employs a Bayesian framework and estimates the proportion of shared risk genes. Specifically, mTADA assigns all genes into four groups: genes that are not relevant for either disorder, risk genes for the first disorder alone, risk genes for the second disorder alone, and risk genes shared by both disorders. The proportions of these groups are parametrized as $π_{0}, π_{1}, π_{2}, π_{3}$, respectively. In particular, parameter $π_{3}$ quantifies the extent of genetic sharing between two disorders, with a larger value indicating stronger genetic overlap (Nguyen et al., 2020). The 95% credible interval constructed through MCMC is used to measure the uncertainty in $π_{3}$ estimates.

The software mTADA requires the following parameters as inputs: proportion of risk genes ( $π_{1}^{S}, π_{2}^{S}$ ), mean relative risks ( ${\bar{γ}}_{1}^{S}, {\bar{γ}}_{2}^{S}$ ), and dispersion parameters ( ${\bar{β}}_{1}^{S}, {\bar{β}}_{2}^{S}$ ) for both disorders. We used extTADA (Nguyen et al., 2017) to estimate these parameters as suggested by the mTADA paper (Nguyen et al., 2020). mTADA reported the estimated proportion of shared risk genes $π_{3}$ (the posterior mode of $π_{3}$) and its corresponding 95% credible interval $[L B, U B]$. We considered $π_{1}^{S} * π_{2}^{S}$ as the expected proportion of shared risk genes, and we declared significant genetic sharing between two disorders when $L B > π_{1}^{S} * π_{2}^{S}$. We quantified the statistical evidence for genetic sharing by comparing the posterior distribution of $π_{3}$ with $π_{1}^{S} * π_{2}^{S}$,

$$p = 2 * \frac{\sum_{i = 1}^{N_{\mathrm{MCMC}}} I (π_{3}^{i} < π_{1}^{S} * π_{2}^{S})}{N_{\mathrm{MCMC}}},$$

where $π_{3}^{i}$ is the sample from the i-th MCMC iteration, $N_{\mathrm{MCMC}}$ is the number of iterations, and $I ()$ is the indicator function. This is equivalent to performing two-sided inference using the posterior probability $P (π_{3} > π_{1}^{S} * π_{2}^{S})$. The number of MCMC chains was set to 2 and the number of iterations to 10,000.
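The p-value defined above is simple to compute from posterior draws. The helper below is a hypothetical illustration (it is not part of the mTADA software) and uses a toy list of four draws in place of real MCMC output.

```python
def mtada_sharing_p(pi3_samples, pi1_s, pi2_s):
    """Two-sided p-value: twice the fraction of pi_3 draws below pi_1^S * pi_2^S."""
    threshold = pi1_s * pi2_s
    below = sum(1 for draw in pi3_samples if draw < threshold)
    return 2.0 * below / len(pi3_samples)

# Toy posterior draws of pi_3; one of four falls below 0.1 * 0.1 = 0.01.
p = mtada_sharing_p([0.02, 0.03, 0.005, 0.04], pi1_s=0.1, pi2_s=0.1)
```

As written, the formula can exceed 1 when more than half of the draws fall below the threshold; the sketch follows the formula and does not truncate.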

Simulation settings

We assessed the performance of EncoreDNM under the mixed-effects Poisson model. We performed simulations for two variant classes, Tmis and LoF variants, which have the largest and the smallest median mutability values across all genes, respectively. First, we performed single-trait simulations to assess estimation precision of elevation parameter $β$ and dispersion parameter $σ$. We set the true values of $β$ to be −0.5, −0.25, and 0, and the true values of $σ$ to be 0.5, 0.75, and 1. These values were chosen based on the estimated parameters in real DNM data analyses so that the simulation settings were realistic. Next, we performed simulations for cross-trait analysis to assess estimation precision of enrichment correlation $ρ$, whose true values were set to be 0, 0.2, 0.4, 0.6, and 0.8. Sample size for each disorder was set to be 5000. Coverage rate was calculated as the percentage of simulations in which the 95% Wald confidence interval covered the true parameter value. Each parameter setting was repeated 100 times.
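The coverage rate can be computed from the per-replicate estimates and standard errors as in the following sketch. The function name `wald_coverage` and the three toy replicates are hypothetical; the interval is the usual estimate ± 1.96 standard errors.

```python
def wald_coverage(estimates, std_errors, truth, z=1.959963984540054):
    """Fraction of replicates whose 95% Wald interval contains the true value."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - z * se <= truth <= est + z * se)
    return hits / len(estimates)

# Toy replicates for a true enrichment correlation of 0.4; the third interval
# (roughly 0.42 to 0.82) misses the truth, the first two cover it.
coverage = wald_coverage([0.38, 0.45, 0.62], [0.05, 0.04, 0.10], truth=0.4)
```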

We also carried out simulations to compare the performance of EncoreDNM and mTADA. False positive rate and statistical power for EncoreDNM were calculated as the proportion of simulation repeats in which the p-value for enrichment correlation $ρ$ was smaller than 0.05; for mTADA, we used the proportion of simulation repeats in which the p-value for the estimated proportion of shared risk genes $π_{3}$ was smaller than 0.05. We aggregated all variant classes together, so mutability for each gene was determined as the sum of mutabilities across four variant classes (i.e. LoF, Dmis, Tmis, and synonymous).

First, we simulated DNM data under the mixed-effects Poisson model. To see whether the two methods would produce false positive findings, we performed simulations under the null hypothesis that the enrichment correlation $ρ$ is zero. We compared the two methods under a range of parameter combinations of ( $β, σ, N$ ) for both disorders: (–0.25, 0.75, 5000) for the baseline setting, (–1, 0.75, 5000) for a setting with small $β$, (–0.25, 0.5, 5000) for a setting with small $σ$, and (–0.25, 0.75, 1000) for a setting with small sample size. We also assessed the statistical power of the two methods under the alternative hypothesis. The true value of enrichment correlation $ρ$ was set to be 0.05, 0.1, 0.15, and 0.2. In the power analysis, parameters ( $β, σ, N$ ) were fixed at (–0.25, 0.75, 5000) as in the baseline setting, under which both methods had well-controlled false positive rates.

To ensure a fair comparison, we also compared EncoreDNM and mTADA under a multinomial model, which is different from the data generation processes for the two approaches. For each disorder ( $k = 1,2$ ), we randomly selected causal genes of proportion $π_{k}^{S}$. A proportion (i.e. $π_{3}$) of causal genes overlap between two disorders. We assumed that the total DNM count follows a Poisson distribution: $C_{k} \sim \mathrm{Poisson} (u_{k} * 2 N_{k} \sum_{i = 1}^{G} m_{i})$, where $u_{k}$ is an elevation factor representing systematic bias in the data. Let $Y_{k}$ denote the vector of DNM counts in the exome, $m$ denote the vector of mutability values for all genes, and $m_{\mathrm{causal}, k}$ denote the vector of mutability with values set to be 0 for non-causal genes of disorder $k$. We assumed that a proportion $p_{k}$ of the probands could be attributed to DNM burden in causal genes, and $1 - p_{k}$ of the probands obtained DNMs by chance:

$$Y_{k} = Y_{\mathrm{causal}, k} + Y_{\mathrm{background}, k},$$

$$Y_{\mathrm{causal}, k} \sim \mathrm{Multinomial} (p_{k} C_{k}, m_{\mathrm{causal}, k}),$$

$$Y_{\mathrm{background}, k} \sim \mathrm{Multinomial} ((1 - p_{k}) C_{k}, m).$$

To check whether false positive findings could arise, we performed simulations under the null hypothesis that $π_{3} = π_{1}^{S} * π_{2}^{S}$ across a range of parameter combinations of ( $u, p, N$ ) for both disorders: (0.95, 0.25, 5000) for the baseline setting, (0.75, 0.25, 5000) for a setting with small $u$ (i.e., reduced total mutation count), (0.95, 0.15, 5000) for a setting with small $p$ (fewer probands explained by DNMs), and (0.95, 0.25, 1000) for a setting with smaller sample size. $π_{1}^{S}$ and $π_{2}^{S}$ were set as 0.1. We also assessed the statistical power of the two methods under the alternative hypothesis that $π_{3} > π_{1}^{S} * π_{2}^{S}$. In the power analysis, ( $u, p, N$ ) were fixed at (0.95, 0.25, 5000) as in the baseline setting, under which the false positive rates of both methods were well calibrated.
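The multinomial data-generating process can be sketched with NumPy as follows. Several details are assumptions made for this illustration and are not stated in the text: $π_{3}$ is read as the proportion of all genes that are causal for both disorders, $p_{k} C_{k}$ is rounded to the nearest integer, and mutability vectors are rescaled to sum to one before being used as multinomial probabilities. The function names and the uniform mutability range are hypothetical.

```python
import numpy as np

def choose_causal(G, pi1, pi2, pi3, rng):
    """Pick causal gene masks for two disorders that share round(pi3 * G) genes."""
    n_shared = int(round(pi3 * G))
    n1_only = int(round(pi1 * G)) - n_shared
    n2_only = int(round(pi2 * G)) - n_shared
    order = rng.permutation(G)
    causal1 = np.zeros(G, dtype=bool)
    causal2 = np.zeros(G, dtype=bool)
    causal1[order[:n_shared + n1_only]] = True                 # shared + disorder-1 only
    causal2[order[:n_shared]] = True                           # shared
    causal2[order[n_shared + n1_only:n_shared + n1_only + n2_only]] = True
    return causal1, causal2

def simulate_multinomial_counts(m, causal, N, u, p, rng):
    """Y = Y_causal + Y_background under the multinomial model."""
    total = rng.poisson(u * 2 * N * m.sum())                   # C_k
    n_causal = int(round(p * total))                           # assumption: rounded
    m_causal = np.where(causal, m, 0.0)                        # zero for non-causal genes
    y_causal = rng.multinomial(n_causal, m_causal / m_causal.sum())
    y_background = rng.multinomial(total - n_causal, m / m.sum())
    return y_causal + y_background

rng = np.random.default_rng(3)
G, N = 1000, 5000                                              # hypothetical sizes
m = rng.uniform(0.5e-5, 2e-5, size=G)                          # hypothetical mutability
causal1, causal2 = choose_causal(G, pi1=0.1, pi2=0.1, pi3=0.01, rng=rng)
y1 = simulate_multinomial_counts(m, causal1, N, u=0.95, p=0.25, rng=rng)
y2 = simulate_multinomial_counts(m, causal2, N, u=0.95, p=0.25, rng=rng)
```

The example uses $π_{3} = 0.01 = π_{1}^{S} * π_{2}^{S}$, which corresponds to the null setting described above.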

Comparison to the fixed-effects Poisson model

For single-trait analysis, the fixed-effects Poisson model assumes that

$$Y_{i} \sim \mathrm{Poisson} (λ_{i}),$$

$$\log (λ_{i}) = β + \log (2 N m_{i}), \quad \text{for } i = 1, \dots, G.$$

Note that the fixed-effects Poisson model is a special case of our proposed mixed-effects Poisson model when $σ = 0$. We compared the two models using a likelihood ratio test. Under the null hypothesis that $σ = 0$, $2 (l_{\mathrm{alt}} - l_{\mathrm{null}}) \sim \frac{1}{2} χ_{1}^{2}$ asymptotically, where $l_{\mathrm{alt}}$ and $l_{\mathrm{null}}$ represent the log-likelihoods of the fitted mixed-effects and fixed-effects Poisson models, respectively.
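A p-value for this likelihood ratio test can be computed with the standard library alone. The sketch assumes that the $\frac{1}{2} χ_{1}^{2}$ reference distribution denotes the usual boundary result, that is, an equal mixture of a point mass at zero and a $χ_{1}^{2}$ distribution; the function name is hypothetical. It uses the identity $P (χ_{1}^{2} > x) = \mathrm{erfc} (\sqrt{x / 2})$.

```python
from math import erfc, sqrt

def lrt_pvalue_boundary(loglik_alt, loglik_null):
    """P-value for testing sigma = 0 under a 50:50 mixture of 0 and chi-square(1)."""
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)   # guard against numerical noise
    if stat == 0.0:
        return 1.0                                      # the point mass at zero
    return 0.5 * erfc(sqrt(stat / 2.0))                 # half the chi-square(1) tail

# A statistic equal to the 95th percentile of chi-square(1) gives p close to 0.025.
p_value = lrt_pvalue_boundary(-100.0, -100.0 - 3.841458820694124 / 2.0)
```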

Recurrent genes and DNMs

We used FUMA (Watanabe et al., 2017) to perform GO enrichment analysis for genes harboring LoF DNMs in multiple disorders. Due to potential sample overlap between the studies of developmental disorder (Kaplanis et al., 2020) and intellectual disability (Lelieveld et al., 2016), we excluded intellectual disability from the analysis of recurrent DNMs. We calculated the probability of observing two identical DNMs in two disorders using a Monte Carlo simulation method. For each disorder, we simulated an exome-wide DNM profile from a multinomial distribution, where the size was fixed at the observed DNM count and the per-base mutation probability was determined by the tri-nucleotide base context. We repeated the simulation procedure 100,000 times to evaluate the significance of recurrent DNMs. Lollipop plots for recurrent mutations were generated using MutationMapper on the cBio Cancer Genomics Portal (Cerami et al., 2012).

Implementation of cross-trait LD score regression

We used cross-trait LDSC (Bulik-Sullivan et al., 2015) to estimate genetic correlations between disorders. LD scores were computed using European samples from the 1000 Genomes Project Phase 3 data (Auton et al., 2015). Only HapMap 3 SNPs were used as observations in the explanatory variable with the --merge-alleles flag. Intercepts were not constrained in the analyses.

Estimating enrichment correlation in gene sets

Genes with a high/low probability of intolerance to LoF variants (high-pLI/low-pLI) were defined as the 4,614 genes in the upper/lower quartiles of pLI scores (Karczewski et al., 2020). Genes with high/low brain expression (HBE/LBE) were defined as the 4,614 genes in the upper/lower quartiles of expression in the human fetal brain (Werling et al., 2020). Genes with high/low heart expression (HHE/LHE) were defined as the 4,614 genes in the upper/lower quartiles of expression in the developing heart of embryonic mouse (Zaidi et al., 2013). Five biological pathways have been reported to be involved in congenital heart disease: chromatin remodeling, Notch signaling, cilia function, sarcomere structure and function, and RAS signaling (Zaidi and Brueckner, 2017). We extracted 1730 unique genes that belong to these five pathways from the gene ontology database (Ashburner et al., 2000) and referred to the union set as CHD-related genes. We repeated the EncoreDNM enrichment correlation analysis in these gene sets. A one-sided Kolmogorov-Smirnov test was used to assess the difference in enrichment correlation signal strength between gene sets.
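Quartile-based gene sets of this kind can be derived from any per-gene score, as in the sketch below. The function name, the gene labels, and the scores are hypothetical, and ties are broken by sort order, which is a simplification that the text does not specify.

```python
def quartile_gene_sets(scores):
    """Return (upper-quartile genes, lower-quartile genes) for a {gene: score} dict."""
    ranked = sorted(scores, key=scores.get)      # genes ordered by ascending score
    k = max(1, round(len(ranked) / 4))           # size of each quartile set
    return set(ranked[-k:]), set(ranked[:k])

# Toy scores for eight hypothetical genes, so each quartile holds two genes.
scores = {"geneA": 0.99, "geneB": 0.01, "geneC": 0.50, "geneD": 0.75,
          "geneE": 0.20, "geneF": 0.95, "geneG": 0.05, "geneH": 0.60}
high_set, low_set = quartile_gene_sets(scores)
```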

URLs

GWAS summary statistics data of autism spectrum disorder, schizophrenia, and Tourette disorder were downloaded on the PGC website, https://www.med.unc.edu/pgc/download-results/; Summary statistics of cognitive performance were downloaded on the SSGAC website, https://thessgac.com/; Summary statistics of epilepsy were downloaded on the epiGAD website, https://www.epigad.org/; pLI scores were downloaded from gnomAD v3.1 repository https://gnomad.broadinstitute.org/downloads; mTADA, https://github.com/hoangtn/mTADA, Nguyen et al., 2021; denovo-db, https://denovo-db.gs.washington.edu/denovo-db/; MutationMapper on cBioPortal, https://www.cbioportal.org/mutation_mapper; LDSC, https://github.com/bulik/ldsc; Schorsch, 2020.

Code availability

EncoreDNM software is available at https://github.com/ghm17/EncoreDNM; Guo, 2022.

Acknowledgements

LH acknowledges research support from the National Science Foundation of China (Grant No. 12071243) and Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01). QL acknowledges research support from the University of Wisconsin-Madison Office of the Chancellor and the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation and the Waisman Center pilot grant program at University of Wisconsin-Madison. HZ acknowledges research support from the National Institutes of Health (Grant No. R03HD100883 and R01GM134005) and the National Science Foundation (DMS 1902903).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Hongyu Zhao, Email: Hongyu.Zhao@yale.edu.

Qiongshi Lu, Email: qlu@biostat.wisc.edu.

Alexander Young, University of California, Los Angeles, United States.

Molly Przeworski, Columbia University, United States.

Funding Information

This paper was supported by the following grants:

National Science Foundation of China No. 12071243 to Lin Hou.
Shanghai Municipal Science and Technology Major Project No. 2017SHZDZX01 to Lin Hou.
Wisconsin Alumni Research Foundation to Qiongshi Lu.
Waisman Center pilot grant program at University of Wisconsin-Madison to Qiongshi Lu.
National Institutes of Health No. R03HD100883 and R01GM134005 to Hongyu Zhao.
National Science Foundation DMS 1902903 to Hongyu Zhao.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review and editing.

Conceptualization, Methodology, Project administration, Supervision, Writing - original draft, Writing - review and editing.

Formal analysis.

Data curation, Writing - review and editing.

Data curation.

Validation.

Methodology, Validation, Writing - review and editing.

Conceptualization, Methodology, Project administration, Supervision, Writing - original draft, Writing - review and editing.

Additional files

Supplementary file 1. Supplementary Tables 1-20.

elife-75551-supp1.xlsx (1.8 MB, xlsx)

MDAR checklist

elife-75551-mdarchecklist1.pdf (200.1 KB, pdf)

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript.

References

Abdellaoui A, Verweij KJH. Dissecting polygenic signals from genome-wide association studies on human behaviour. Nature Human Behaviour. 2021;5:686–694. doi: 10.1038/s41562-021-01110-y.
Allen AS, Berkovic SF, Cossette P, Delanty N, Dlugos D, Eichler EE, Epstein MP, Glauser T, Goldstein DB, Han Y, Heinzen EL, Hitomi Y, Howell KB, Johnson MR, Kuzniecky R, Lowenstein DH, Lu Y-F, Madou MRZ, Marson AG, Mefford HC, Esmaeeli Nieh S, O’Brien TJ, Ottman R, Petrovski S, Poduri A, Ruzzo EK, Scheffer IE, Sherr EH, Yuskaitis CJ, Abou-Khalil B, Alldredge BK, Bautista JF, Berkovic SF, Boro A, Cascino GD, Consalvo D, Crumrine P, Devinsky O, Dlugos D, Epstein MP, Fiol M, Fountain NB, French J, Friedman D, Geller EB, Glauser T, Glynn S, Haut SR, Hayward J, Helmers SL, Joshi S, Kanner A, Kirsch HE, Knowlton RC, Kossoff EH, Kuperman R, Kuzniecky R, Lowenstein DH, McGuire SM, Motika PV, Novotny EJ, Ottman R, Paolicchi JM, Parent JM, Park K, Poduri A, Scheffer IE, Shellhaas RA, Sherr EH, Shih JJ, Singh R, Sirven J, Smith MC, Sullivan J, Lin Thio L, Venkat A, Vining EPG, Von Allmen GK, Weisenberg JL, Widdess-Walsh P, Winawer MR, Epi4K Consortium, Epilepsy Phenome/Genome Project. De novo mutations in epileptic encephalopathies. Nature. 2013;501:217–221. doi: 10.1038/nature12439.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556.
Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
Brainstorm C. Analysis of shared heritability in common disorders of the brain. Science (New York, N.Y.) 2018;360:aap875. doi: 10.1126/science.aap875.
Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, Duncan L, Perry JRB, Patterson N, Robinson EB, Daly MJ, Price AL, Neale BM, ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3. An atlas of genetic correlations across human diseases and traits. Nature Genetics. 2015;47:1236–1241. doi: 10.1038/ng.3406.
Cao M, Wu JI. Camk2a-Cre-mediated conditional deletion of chromatin remodeler Brg1 causes perinatal hydrocephalus. Neuroscience Letters. 2015;597:71–76. doi: 10.1016/j.neulet.2015.04.041.
Cardozo T, Pagano M. The SCF ubiquitin ligase: insights into a molecular machine. Nature Reviews. Molecular Cell Biology. 2004;5:739–751. doi: 10.1038/nrm1471.
Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, Antipin Y, Reva B, Goldberg AP, Sander C, Schultz N. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
Christensen D, Van Naarden Braun K, Doernberg NS, Maenner MJ, Arneson CL, Durkin MS, Benedict RE, Kirby RS, Wingate MS, Fitzgerald R, Yeargin-Allsopp M. Prevalence of cerebral palsy, co-occurring autism spectrum disorders, and motor functioning - Autism and Developmental Disabilities Monitoring Network, USA, 2008. Developmental Medicine and Child Neurology. 2014;56:59–65. doi: 10.1111/dmcn.12268.
Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, Liu X. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Human Molecular Genetics. 2015;24:2125–2137. doi: 10.1093/hmg/ddu733.
Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, Georgieva L, Rees E, Palta P, Ruderfer DM, Carrera N, Humphreys I, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Grant SG, Hannon E, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Kirov G, Palotie A, McCarroll SA, Holmans P, Sklar P, Owen MJ, Purcell SM, O’Donovan MC. De novo mutations in schizophrenia implicate synaptic networks. Nature. 2014;506:179–184. doi: 10.1038/nature12929.
Garne E, Dolk H, Krägeloh-Mann I, Holst Ravn S, Cans C, SCPE Collaborative Group. Cerebral palsy and congenital malformations. European Journal of Paediatric Neurology. 2008;12:82–88. doi: 10.1016/j.ejpn.2007.07.001.
Gratten J, Wray NR, Keller MC, Visscher PM. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nature Neuroscience. 2014;17:782–790. doi: 10.1038/nn.3708.
Gregor A, Sadleir LG, Asadollahi R, Azzarello-Burri S, Battaglia A, Ousager LB, Boonsawat P, Bruel A-L, Buchert R, Calpena E, Cogné B, Dallapiccola B, Distelmaier F, Elmslie F, Faivre L, Haack TB, Harrison V, Henderson A, Hunt D, Isidor B, Joset P, Kumada S, Lachmeijer AMA, Lees M, Lynch SA, Martinez F, Matsumoto N, McDougall C, Mefford HC, Miyake N, Myers CT, Moutton S, Nesbitt A, Novelli A, Orellana C, Rauch A, Rosello M, Saida K, Santani AB, Sarkar A, Scheffer IE, Shinawi M, Steindl K, Symonds JD, Zackai EH, University of Washington Center for Mendelian Genomics, DDD Study, Reis A, Sticht H, Zweier C. De Novo Variants in the F-Box Protein FBXO11 in 20 Individuals with a Variable Neurodevelopmental Disorder. American Journal of Human Genetics. 2018;103:305–316. doi: 10.1016/j.ajhg.2018.07.003.
Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, Mallard TT, Hill WD, Ip HF, Marioni RE, McIntosh AM, Deary IJ, Koellinger PD, Harden KP, Nivard MG, Tucker-Drob EM. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nature Human Behaviour. 2019;3:513–525. doi: 10.1038/s41562-019-0566-x.
Grotzinger AD, Mallard TT, Akingbuwa WA, Ip HF, Adams MJ, Lewis CM, McIntosh AM, Grove J, Dalsgaard S, Lesch KP, Strom N, Meier SM, Mattheisen M, Børglum AD, Mors O, Breen G, Lee PH, Kendler KS, Smoller JW, Tucker-Drob EM, Nivard MG, iPSYCH, Tourette Syndrome and Obsessive Compulsive Disorder Working Group of the Psychiatric Genetics Consortium, Bipolar Disorder Working Group of the Psychiatric Genetics Consortium, Major Depressive Disorder Working Group of the Psychiatric Genetics Consortium, Schizophrenia Working Group of the Psychiatric Genetics Consortium. Genetic Architecture of 11 Major Psychiatric Disorders at Biobehavioral, Functional Genomic, and Molecular Genetic Levels of Analysis. medRxiv. 2020. doi: 10.1101/2020.09.22.20196089.
Gulisano M, Calì PV, Cavanna AE, Eddy C, Rickards H, Rizzo R. Cardiovascular safety of aripiprazole and pimozide in young patients with Tourette syndrome. Neurological Sciences. 2011;32:1213–1217. doi: 10.1007/s10072-011-0678-1.
Guo H, Li JJ, Lu Q, Hou L. Detecting local genetic correlations with scan statistics. Nature Communications. 2021;12:2033. doi: 10.1038/s41467-021-22334-6.
Guo H. EncoreDNM. swh:1:rev:44ec5903b4c34e7b73ed7791f30d0b3544bafcd1. GitHub. 2022. https://github.com/ghm17/EncoreDNM
Hoischen A, Krumm N, Eichler EE. Prioritization of neurodevelopmental disease genes by discovery of new mutations. Nature Neuroscience. 2014;17:764–772. doi: 10.1038/nn.3703.
Homsy J, Zaidi S, Shen Y, Ware JS, Samocha KE, Karczewski KJ, DePalma SR, McKean D, Wakimoto H, Gorham J, Jin SC, Deanfield J, Giardini A, Porter GA, Jr, Kim R, Bilguvar K, López-Giráldez F, Tikhonova I, Mane S, Romano-Adesman A, Qi H, Vardarajan B, Ma L, Daly M, Roberts AE, Russell MW, Mital S, Newburger JW, Gaynor JW, Breitbart RE, Iossifov I, Ronemus M, Sanders SJ, Kaltman JR, Seidman JG, Brueckner M, Gelb BD, Goldmuntz E, Lifton RP, Seidman CE, Chung WK. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science (New York, N.Y.) 2015;350:1262–1266. doi: 10.1126/science.aac9396.
Hormozdiari F, Penn O, Borenstein E, Eichler EE. The discovery of integrated gene networks for autism and related disorders. Genome Research. 2015;25:142–154. doi: 10.1101/gr.178855.114.
Howrigan DP, Rose SA, Samocha KE, Fromer M, Cerrato F, Chen WJ, Churchhouse C, Chambert K, Chandler SD, Daly MJ, Dumont A, Genovese G, Hwu H-G, Laird N, Kosmicki JA, Moran JL, Roe C, Singh T, Wang S-H, Faraone SV, Glatt SJ, McCarroll SA, Tsuang M, Neale BM. Exome sequencing in schizophrenia-affected parent-offspring trios reveals risk conferred by protein-coding de novo mutations. Nature Neuroscience. 2020;23:185–193. doi: 10.1038/s41593-019-0564-3.
Jansen S, van der Werf IM, Innes AM, Afenjar A, Agrawal PB, Anderson IJ, Atwal PS, van Binsbergen E, van den Boogaard M-J, Castiglia L, Coban-Akdemir ZH, van Dijck A, Doummar D, van Eerde AM, van Essen AJ, van Gassen KL, Guillen Sacoto MJ, van Haelst MM, Iossifov I, Jackson JL, Judd E, Kaiwar C, Keren B, Klee EW, Klein Wassink-Ruiter JS, Meuwissen ME, Monaghan KG, de Munnik SA, Nava C, Ockeloen CW, Pettinato R, Racher H, Rinne T, Romano C, Sanders VR, Schnur RE, Smeets EJ, Stegmann APA, Stray-Pedersen A, Sweetser DA, Terhal PA, Tveten K, VanNoy GE, de Vries PF, Waxler JL, Willing M, Pfundt R, Veltman JA, Kooy RF, Vissers LELM, de Vries BBA. De novo variants in FBXO11 cause a syndromic form of intellectual disability with behavioral problems and dysmorphisms. European Journal of Human Genetics. 2019;27:738–746. doi: 10.1038/s41431-018-0292-2.
Jin SC, Homsy J, Zaidi S, Lu Q, Morton S, DePalma SR, Zeng X, Qi H, Chang W, Sierant MC, Hung W-C, Haider S, Zhang J, Knight J, Bjornson RD, Castaldi C, Tikhonova IR, Bilguvar K, Mane SM, Sanders SJ, Mital S, Russell MW, Gaynor JW, Deanfield J, Giardini A, Porter GA, Jr, Srivastava D, Lo CW, Shen Y, Watkins WS, Yandell M, Yost HJ, Tristani-Firouzi M, Newburger JW, Roberts AE, Kim R, Zhao H, Kaltman JR, Goldmuntz E, Chung WK, Seidman JG, Gelb BD, Seidman CE, Lifton RP, Brueckner M. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nature Genetics. 2017;49:1593–1601. doi: 10.1038/ng.3970.
Jin SC, Dong W, Kundishora AJ, Panchagnula S, Moreno-De-Luca A, Furey CG, Allocco AA, Walker RL, Nelson-Williams C, Smith H, Dunbar A, Conine S, Lu Q, Zeng X, Sierant MC, Knight JR, Sullivan W, Duy PQ, DeSpenza T, Reeves BC, Karimy JK, Marlier A, Castaldi C, Tikhonova IR, Li B, Peña HP, Broach JR, Kabachelor EM, Ssenyonga P, Hehnly C, Ge L, Keren B, Timberlake AT, Goto J, Mangano FT, Johnston JM, Butler WE, Warf BC, Smith ER, Schiff SJ, Limbrick DD, Heuer G, Jackson EM, Iskandar BJ, Mane S, Haider S, Guclu B, Bayri Y, Sahin Y, Duncan CC, Apuzzo MLJ, DiLuna ML, Hoffman EJ, Sestan N, Ment LR, Alper SL, Bilguvar K, Geschwind DH, Günel M, Lifton RP, Kahle KT. Exome sequencing implicates genetic disruption of prenatal neuro-gliogenesis in sporadic congenital hydrocephalus. Nature Medicine. 2020a;26:1754–1765. doi: 10.1038/s41591-020-1090-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jin SC, Lewis SA, Bakhtiari S, Zeng X, Sierant MC, Shetty S, Nordlie SM, Elie A, Corbett MA, Norton BY, van Eyk CL, Haider S, Guida BS, Magee H, Liu J, Pastore S, Vincent JB, Brunstrom-Hernandez J, Papavasileiou A, Fahey MC, Berry JG, Harper K, Zhou C, Zhang J, Li B, Zhao H, Heim J, Webber DL, Frank MSB, Xia L, Xu Y, Zhu D, Zhang B, Sheth AH, Knight JR, Castaldi C, Tikhonova IR, López-Giráldez F, Keren B, Whalen S, Buratti J, Doummar D, Cho M, Retterer K, Millan F, Wang Y, Waugh JL, Rodan L, Cohen JS, Fatemi A, Lin AE, Phillips JP, Feyma T, MacLennan SC, Vaughan S, Crompton KE, Reid SM, Reddihough DS, Shang Q, Gao C, Novak I, Badawi N, Wilson YA, McIntyre SJ, Mane SM, Wang X, Amor DJ, Zarnescu DC, Lu Q, Xing Q, Zhu C, Bilguvar K, Padilla-Lopez S, Lifton RP, Gecz J, MacLennan AH, Kruer MC. Mutations disrupting neuritogenesis genes confer risk for cerebral palsy. Nature Genetics. 2020b;52:1046–1056. doi: 10.1038/s41588-020-0695-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kaplanis J, Samocha KE, Wiel L, Zhang Z, Arvai KJ, Eberhardt RY, Gallone G, Lelieveld SH, Martin HC, McRae JF, Short PJ, Torene RI, de Boer E, Danecek P, Gardner EJ, Huang N, Lord J, Martincorena I, Pfundt R, Reijnders MRF, Yeung A, Yntema HG, Deciphering Developmental Disorders Study, Vissers LELM, Juusola J, Wright CF, Brunner HG, Firth HV, FitzPatrick DR, Barrett JC, Hurles ME, Gilissen C, Retterer K. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature. 2020;586:757–762. doi: 10.1038/s41586-020-2832-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Genome Aggregation Database Consortium, Neale BM, Daly MJ, MacArthur DG. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kielinen M, Rantala H, Timonen E, Linna SL, Moilanen I. Associated medical disorders and disabilities in children with autistic disorder: a population-based study. Autism. 2004;8:49–60. doi: 10.1177/1362361304040638. [DOI] [PubMed] [Google Scholar]
Kilincaslan A, Mukaddes NM. Pervasive developmental disorders in individuals with cerebral palsy. Developmental Medicine and Child Neurology. 2009;51:289–294. doi: 10.1111/j.1469-8749.2008.03171.x. [DOI] [PubMed] [Google Scholar]
Krumm N, Turner TN, Baker C, Vives L, Mohajeri K, Witherspoon K, Raja A, Coe BP, Stessman HA, He Z-X, Leal SM, Bernier R, Eichler EE. Excess of rare, inherited truncating mutations in autism. Nature Genetics. 2015;47:582–588. doi: 10.1038/ng.3303. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kume T, Deng KY, Winfrey V, Gould DB, Walter MA, Hogan BL. The forkhead/winged helix gene Mf1 is disrupted in the pleiotropic mouse mutation congenital hydrocephalus. Cell. 1998;93:985–996. doi: 10.1016/s0092-8674(00)81204-0. [DOI] [PubMed] [Google Scholar]
Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics (Oxford, England). 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, Perlis RH, Mowry BJ, Thapar A, Goddard ME, Witte JS, Absher D, Agartz I, Akil H, Amin F, Andreassen OA, Anjorin A, Anney R, Anttila V, Arking DE, Asherson P, Azevedo MH, Backlund L, Badner JA, Bailey AJ, Banaschewski T, Barchas JD, Barnes MR, Barrett TB, Bass N, Battaglia A, Bauer M, Bayés M, Bellivier F, Bergen SE, Berrettini W, Betancur C, Bettecken T, Biederman J, Binder EB, Black DW, Blackwood DHR, Bloss CS, Boehnke M, Boomsma DI, Breen G, Breuer R, Bruggeman R, Cormican P, Buccola NG, Buitelaar JK, Bunney WE, Buxbaum JD, Byerley WF, Byrne EM, Caesar S, Cahn W, Cantor RM, Casas M, Chakravarti A, Chambert K, Choudhury K, Cichon S, Cloninger CR, Collier DA, Cook EH, Coon H, Cormand B, Corvin A, Coryell WH, Craig DW, Craig IW, Crosbie J, Cuccaro ML, Curtis D, Czamara D, Datta S, Dawson G, Day R, De Geus EJ, Degenhardt F, Djurovic S, Donohoe GJ, Doyle AE, Duan J, Dudbridge F, Duketis E, Ebstein RP, Edenberg HJ, Elia J, Ennis S, Etain B, Fanous A, Farmer AE, Ferrier IN, Flickinger M, Fombonne E, Foroud T, Frank J, Franke B, Fraser C, Freedman R, Freimer NB, Freitag CM, Friedl M, Frisén L, Gallagher L, Gejman PV, Georgieva L, Gershon ES, Geschwind DH, Giegling I, Gill M, Gordon SD, Gordon-Smith K, Green EK, Greenwood TA, Grice DE, Gross M, Grozeva D, Guan W, Gurling H, De Haan L, Haines JL, Hakonarson H, Hallmayer J, Hamilton SP, Hamshere ML, Hansen TF, Hartmann AM, Hautzinger M, Heath AC, Henders AK, Herms S, Hickie IB, Hipolito M, Hoefels S, Holmans PA, Holsboer F, Hoogendijk WJ, Hottenga J-J, Hultman CM, Hus V, Ingason A, Ising M, Jamain S, Jones EG, Jones I, Jones L, Tzeng J-Y, Kähler AK, Kahn RS, Kandaswamy R, Keller MC, Kennedy JL, Kenny E, Kent L, Kim Y, Kirov GK, Klauck SM, Klei L, Knowles JA, Kohli MA, Koller DL, Konte B, Korszun A, Krabbendam L, Krasucki R, Kuntsi J, Kwan P, Landén M, Långström N, Lathrop M, Lawrence J, Lawson WB, Leboyer M, Ledbetter DH, Lee PH, Lencz T, Lesch K-P, Levinson DF, Lewis CM, 
Li J, Lichtenstein P, Lieberman JA, Lin D-Y, Linszen DH, Liu C, Lohoff FW, Loo SK, Lord C, Lowe JK, Lucae S, MacIntyre DJ, Madden PAF, Maestrini E, Magnusson PKE, Mahon PB, Maier W, Malhotra AK, Mane SM, Martin CL, Martin NG, Mattheisen M, Matthews K, Mattingsdal M, McCarroll SA, McGhee KA, McGough JJ, McGrath PJ, McGuffin P, McInnis MG, McIntosh A, McKinney R, McLean AW, McMahon FJ, McMahon WM, McQuillin A, Medeiros H, Medland SE, Meier S, Melle I, Meng F, Meyer J, Middeldorp CM, Middleton L, Milanova V, Miranda A, Monaco AP, Montgomery GW, Moran JL, Moreno-De-Luca D, Morken G, Morris DW, Morrow EM, Moskvina V, Muglia P, Mühleisen TW, Muir WJ, Müller-Myhsok B, Murtha M, Myers RM, Myin-Germeys I, Neale MC, Nelson SF, Nievergelt CM, Nikolov I, Nimgaonkar V, Nolen WA, Nöthen MM, Nurnberger JI, Nwulia EA, Nyholt DR, O’Dushlaine C, Oades RD, Olincy A, Oliveira G, Olsen L, Ophoff RA, Osby U, Owen MJ, Palotie A, Parr JR, Paterson AD, Pato CN, Pato MT, Penninx BW, Pergadia ML, Pericak-Vance MA, Pickard BS, Pimm J, Piven J, Posthuma D, Potash JB, Poustka F, Propping P, Puri V, Quested DJ, Quinn EM, Ramos-Quiroga JA, Rasmussen HB, Raychaudhuri S, Rehnström K, Reif A, Ribasés M, Rice JP, Rietschel M, Roeder K, Roeyers H, Rossin L, Rothenberger A, Rouleau G, Ruderfer D, Rujescu D, Sanders AR, Sanders SJ, Santangelo SL, Sergeant JA, Schachar R, Schalling M, Schatzberg AF, Scheftner WA, Schellenberg GD, Scherer SW, Schork NJ, Schulze TG, Schumacher J, Schwarz M, Scolnick E, Scott LJ, Shi J, Shilling PD, Shyn SI, Silverman JM, Slager SL, Smalley SL, Smit JH, Smith EN, Sonuga-Barke EJS, St Clair D, State M, Steffens M, Steinhausen H-C, Strauss JS, Strohmaier J, Stroup TS, Sutcliffe JS, Szatmari P, Szelinger S, Thirumalai S, Thompson RC, Todorov AA, Tozzi F, Treutlein J, Uhr M, van den Oord EJCG, Van Grootheest G, Van Os J, Vicente AM, Vieland VJ, Vincent JB, Visscher PM, Walsh CA, Wassink TH, Watson SJ, Weissman MM, Werge T, Wienker TF, Wijsman EM, Willemsen G, Williams N, 
Willsey AJ, Witt SH, Xu W, Young AH, Yu TW, Zammit S, Zandi PP, Zhang P, Zitman FG, Zöllner S, Devlin B, Kelsoe JR, Sklar P, Daly MJ, O’Donovan MC, Craddock N, Sullivan PF, Smoller JW, Kendler KS, Wray NR, Cross-Disorder Group of the Psychiatric Genomics Consortium, International Inflammatory Bowel Disease Genetics Consortium (IIBDGC). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genetics. 2013;45:984–994. doi: 10.1038/ng.2711. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lee PH. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell. 2019;179:1469–1482. doi: 10.1016/j.cell.2019.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won H-H, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lelieveld SH, Reijnders MRF, Pfundt R, Yntema HG, Kamsteeg E-J, de Vries P, de Vries BBA, Willemsen MH, Kleefstra T, Löhner K, Vreeburg M, Stevens SJC, van der Burgt I, Bongers EMHF, Stegmann APA, Rump P, Rinne T, Nelen MR, Veltman JA, Vissers LELM, Brunner HG, Gilissen C. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nature Neuroscience. 2016;19:1194–1196. doi: 10.1038/nn.4352. [DOI] [PubMed] [Google Scholar]
Li J, Cai T, Jiang Y, Chen H, He X, Chen C, Li X, Shao Q, Ran X, Li Z, Xia K, Liu C, Sun ZS, Wu J. Genes with de novo mutations are shared by four neuropsychiatric disorders discovered from NPdenovo database. Molecular Psychiatry. 2016;21:290–297. doi: 10.1038/mp.2015.40. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, Hu Y, Chang D, Jin C, Dai W, He Q, Liu Z, Mukherjee S, Crane PK, Zhao H. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. American Journal of Human Genetics. 2017;101:939–964. doi: 10.1016/j.ajhg.2017.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lumenta CB, Skotarczak U. Long-term follow-up in 233 patients with congenital hydrocephalus. Child’s Nervous System. 1995;11:173–175. doi: 10.1007/BF00570260. [DOI] [PubMed] [Google Scholar]
Munkin MK, Trivedi PK. Simulated maximum likelihood estimation of multivariate mixed‐Poisson regression models, with application. The Econometrics Journal. 1999;2:29–48. doi: 10.1111/1368-423X.00019. [DOI] [Google Scholar]
Nguyen HT, Bryois J, Kim A, Dobbyn A, Huckins LM, Munoz-Manchado AB, Ruderfer DM, Genovese G, Fromer M, Xu X, Pinto D, Linnarsson S, Verhage M, Smit AB, Hjerling-Leffler J, Buxbaum JD, Hultman C, Sklar P, Purcell SM, Lage K, He X, Sullivan PF, Stahl EA. Integrated Bayesian analysis of rare exonic variants to identify risk genes for schizophrenia and neurodevelopmental disorders. Genome Medicine. 2017;9:114. doi: 10.1186/s13073-017-0497-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nguyen T-H, Dobbyn A, Brown RC, Riley BP, Buxbaum JD, Pinto D, Purcell SM, Sullivan PF, He X, Stahl EA. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. Nature Communications. 2020;11:2929. doi: 10.1038/s41467-020-16487-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nguyen TH, Dobbyn A, Brown RC, Riley BP, Buxbaum J, Pinto D, Purcell SM, Sullivan PF, He X, Eli A. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. GitHub. 7630c4b. 2021. doi: 10.1038/s41467-020-16487-z. https://github.com/hoangtn/mTADA [DOI] [PMC free article] [PubMed]
Ning Z, Pawitan Y, Shen X. High-definition likelihood inference of genetic correlations across human complex traits. Nature Genetics. 2020;52:859–864. doi: 10.1038/s41588-020-0653-y. [DOI] [PubMed] [Google Scholar]
O’Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, Levy R, Ko A, Lee C, Smith JD, Turner EH, Stanaway IB, Vernot B, Malig M, Baker C, Reilly B, Akey JM, Borenstein E, Rieder MJ, Nickerson DA, Bernier R, Shendure J, Eichler EE. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485:246–250. doi: 10.1038/nature10989. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nature Genetics. 2016;48:709–717. doi: 10.1038/ng.3570. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rees E, Creeth HDJ, Hwu H-G, Chen WJ, Tsuang M, Glatt SJ, Rey R, Kirov G, Walters JTR, Holmans P, Owen MJ, O’Donovan MC. Schizophrenia, autism spectrum disorders and developmental disorders share specific disruptive coding mutations. Nature Communications. 2021;12:5353. doi: 10.1038/s41467-021-25532-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Reid SM, Meehan EM, Arnup SJ, Reddihough DS. Intellectual disability in cerebral palsy: a population-based retrospective study. Developmental Medicine and Child Neurology. 2018;60:687–694. doi: 10.1111/dmcn.13773. [DOI] [PubMed] [Google Scholar]
Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, Kosmicki JA, Rehnström K, Mallick S, Kirby A, Wall DP, MacArthur DG, Gabriel SB, DePristo M, Purcell SM, Palotie A, Boerwinkle E, Buxbaum JD, Cook EH Jr, Gibbs RA, Schellenberg GD, Sutcliffe JS, Devlin B, Roeder K, Neale BM, Daly MJ. A framework for the interpretation of de novo mutation in human disease. Nature Genetics. 2014;46:944–950. doi: 10.1038/ng.3050. [DOI] [PMC free article] [PubMed] [Google Scholar]
Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An J-Y, Peng M, Collins R, Grove J, Klei L, Stevens C, Reichert J, Mulhern MS, Artomov M, Gerges S, Sheppard B, Xu X, Bhaduri A, Norman U, Brand H, Schwartz G, Nguyen R, Guerrero EE, Dias C, Autism Sequencing Consortium, iPSYCH-Broad Consortium, Betancur C, Cook EH, Gallagher L, Gill M, Sutcliffe JS, Thurm A, Zwick ME, Børglum AD, State MW, Cicek AE, Talkowski ME, Cutler DJ, Devlin B, Sanders SJ, Roeder K, Daly MJ, Buxbaum JD. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell. 2020;180:568–584. doi: 10.1016/j.cell.2019.12.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schorsch E. LDSC (LD SCore) v1.0.1. GitHub. aa33296. 2020. https://github.com/bulik/ldsc
Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. American Journal of Human Genetics. 2017;101:737–751. doi: 10.1016/j.ajhg.2017.09.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, Nguyen-Viet TA, Wedow R, Zacher M, Furlotte NA, 23andMe Research Team, Social Science Genetic Association Consortium, Magnusson P, Oskarsson S, Johannesson M, Visscher PM, Laibson D, Cesarini D, Neale BM, Benjamin DJ. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nature Genetics. 2018;50:229–237. doi: 10.1038/s41588-017-0009-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turner TN, Yi Q, Krumm N, Huddleston J, Hoekzema K, Stessman HAF, Doebley A-L, Bernier RA, Nickerson DA, Eichler EE. denovo-db: A compendium of human de novo variants. Nucleic Acids Research. 2017;45:D804–D811. doi: 10.1093/nar/gkw865. [DOI] [PMC free article] [PubMed] [Google Scholar]
Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nature Reviews. Genetics. 2012;13:565–575. doi: 10.1038/nrg3241. [DOI] [PubMed] [Google Scholar]
Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang Q, Yang C, Gelernter J, Zhao H. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. Human Genetics. 2015;134:1195–1209. doi: 10.1007/s00439-015-1596-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nature Communications. 2017;8:1–11. doi: 10.1038/s41467-017-01261-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wei Q, Zhan X, Zhong X, Liu Y, Han Y, Chen W, Li B. A Bayesian framework for de novo mutation calling in parents-offspring trios. Bioinformatics (Oxford, England). 2015;31:1375–1381. doi: 10.1093/bioinformatics/btu839. [DOI] [PMC free article] [PubMed] [Google Scholar]
Werling DM, Pochareddy S, Choi J, An JY, Sheppard B, Peng M, Li Z, Dastmalchi C, Santpere G, Sousa AMM, Tebbenkamp ATN, Kaur N, Gulden FO, Breen MS, Liang L, Gilson MC, Zhao X, Dong S, Klei L, Cicek AE, Buxbaum JD, Adle-Biassette H, Thomas JL, Aldinger KA, O’Day DR, Glass IA, Zaitlen NA, Talkowski ME, Roeder K, State MW, Devlin B, Sanders SJ, Sestan N. Whole-Genome and RNA Sequencing Reveal Variation and Transcriptomic Coordination in the Developing Human Prefrontal Cortex. Cell Reports. 2020;31:e107489. doi: 10.1016/j.celrep.2020.03.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
Willsey AJ, Fernandez TV, Yu D, King RA, Dietrich A, Xing J, Sanders SJ, Mandell JD, Huang AY, Richer P, Smith L, Dong S, Samocha KE, Tourette International Collaborative Genetics (TIC Genetics), Tourette Syndrome Association International Consortium for Genetics (TSAICG), Neale BM, Coppola G, Mathews CA, Tischfield JA, Scharf JM, State MW, Heiman GA. De Novo Coding Variants Are Strongly Associated with Tourette Disorder. Neuron. 2017;94:486–499. doi: 10.1016/j.neuron.2017.04.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
Willsey AJ, Morris MT, Wang S, Willsey HR, Sun N, Teerikorpi N, Baum TB, Cagney G, Bender KJ, Desai TA, Srivastava D, Davis GW, Doudna J, Chang E, Sohal V, Lowenstein DH, Li H, Agard D, Keiser MJ, Shoichet B, von Zastrow M, Mucke L, Finkbeiner S, Gan L, Sestan N, Ward ME, Huttenhain R, Nowakowski TJ, Bellen HJ, Frank LM, Khokha MK, Lifton RP, Kampmann M, Ideker T, State MW, Krogan NJ. The Psychiatric Cell Map Initiative: A Convergent Systems Biological Approach to Illuminating Key Molecular Pathways in Neuropsychiatric Disorders. Cell. 2018;174:505–520. doi: 10.1016/j.cell.2018.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zaidi S, Choi M, Wakimoto H, Ma L, Jiang J, Overton JD, Romano-Adesman A, Bjornson RD, Breitbart RE, Brown KK, Carriero NJ, Cheung YH, Deanfield J, DePalma S, Fakhro KA, Glessner J, Hakonarson H, Italia MJ, Kaltman JR, Kaski J, Kim R, Kline JK, Lee T, Leipzig J, Lopez A, Mane SM, Mitchell LE, Newburger JW, Parfenov M, Pe’er I, Porter G, Roberts AE, Sachidanandam R, Sanders SJ, Seiden HS, State MW, Subramanian S, Tikhonova IR, Wang W, Warburton D, White PS, Williams IA, Zhao H, Seidman JG, Brueckner M, Chung WK, Gelb BD, Goldmuntz E, Seidman CE, Lifton RP. De novo mutations in histone-modifying genes in congenital heart disease. Nature. 2013;498:220–223. doi: 10.1038/nature12141. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zaidi S, Brueckner M. Genetics and Genomics of Congenital Heart Disease. Circulation Research. 2017;120:923–940. doi: 10.1161/CIRCRESAHA.116.309140. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang Y, Cheng Y, Jiang W, Ye Y, Lu Q, Zhao H. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Briefings in Bioinformatics. 2021a;22:bbaa442. doi: 10.1093/bib/bbaa442. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang Y, Lu Q, Ye Y, Huang K, Liu W, Wu Y, Zhong X, Li B, Yu Z, Travers BG, Werling DM, Li JJ, Zhao H. SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biology. 2021b;22:1–30. doi: 10.1186/s13059-021-02478-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife. doi: 10.7554/eLife.75551.sa0

Editor's evaluation

Alexander Young ¹

Lu et al. provide a powerful statistical method that measures excess sharing of de novo mutations between pairs of disorders. This method extends the concept of 'genetic correlation' to disorders caused by de-novo mutations, measuring the correlation in excess de-novo mutations in genome-wide genes for different classes of mutations. The authors apply the method to nine disorders including a developmental disorder, autism spectrum disorder, congenital heart disease, schizophrenia, and intellectual disability, finding a statistically significant overlap between 12 pairs of disorders in de novo mutations that cause a loss of gene function. This method will be of interest to researchers working on disorders caused by de-novo mutations.

eLife. doi: 10.7554/eLife.75551.sa1

Decision letter

Editor: Alexander Young¹

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Quantifying concordant genetic effects of de novo mutations on multiple disorders" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Molly Przeworski as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The manuscript needs to better explain the different approaches taken by EncoreDNM and mTADA, and their corresponding strengths and weaknesses. The authors focus on power in frequentist hypothesis testing for non-zero genetic overlap between disorders, but mTADA takes a Bayesian approach. Clarification is needed here on whether the statistical basis of the comparison is fair.

2) Greater clarification of the model fitting procedure is needed: how the de-novo mutability parameter is used and the computational complexity of optimizing the likelihood function.

3) The heatmaps that present the results of the empirical analysis show only statistical significance levels. Since the authors' method's primary parameter is the correlation parameter, it would be helpful to display estimates of this parameter in the main text and to focus discussion on that parameter more and on statistical significance less.
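
Point 2 concerns the cost of optimizing the likelihood. As a rough illustration, and not the authors' implementation, the sketch below approximates the marginal log-likelihood of a single-disorder Poisson model with a normally distributed random effect by Monte Carlo integration. The function names, the parameters `beta` and `sigma`, the per-gene expected counts, and the example numbers are all assumptions made for illustration. Under these assumptions, one likelihood evaluation costs on the order of (number of genes) × (number of draws) Poisson density evaluations, and a numerical optimizer pays that cost at every iteration.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log probability of observing count y under a Poisson rate lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def mc_log_likelihood(counts, expected, beta, sigma, n_draws=2000, seed=0):
    """Monte Carlo approximation of the marginal log-likelihood.

    Assumed (hypothetical) model for gene i:
        phi_i ~ Normal(0, sigma^2)
        Y_i | phi_i ~ Poisson(expected_i * exp(beta + phi_i))
    The integral over phi_i has no closed form, so it is replaced by an
    average over standard normal draws that are reused for every gene.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    total = 0.0
    for y, e in zip(counts, expected):
        log_terms = [
            poisson_logpmf(y, e * math.exp(beta + sigma * z)) for z in draws
        ]
        # log-mean-exp for numerical stability
        top = max(log_terms)
        mean_exp = sum(math.exp(t - top) for t in log_terms) / n_draws
        total += top + math.log(mean_exp)
    return total


# Hypothetical toy data: observed DNM counts and expected counts per gene.
toy_counts = [0, 1, 3]
toy_expected = [0.5, 1.0, 2.0]
print(mc_log_likelihood(toy_counts, toy_expected, beta=0.2, sigma=0.8))
```

With `sigma=0` the random effect vanishes and the approximation reduces to the ordinary Poisson log-likelihood, which gives a simple sanity check on the code.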

Reviewer #1 (Recommendations for the authors):

Here, I have the following comments.

1. The authors proposed a mixed-effect Poisson model with overdispersion. I have a few questions regarding this model. First, it is not clear whether the de novo mutability m_i is a parameter or a given estimate in the model. The authors mention on line 110 that the background component is a gene-specific fixed effect, but in the Methods they do not treat it as a parameter in the section on parameter estimation. Second, the negative binomial distribution is usually applied to handle overdispersion in a Poisson model. Here, the authors instead added an overdispersion variable \phi_i to the model. It would be helpful to discuss this point. Third, an MCMC algorithm was used to fit the EncoreDNM model. How computationally efficient is it? On line 431, the authors used the inverse of the Fisher information matrix to obtain the covariance of the parameter estimates. Is there any non-invertibility problem here?

2. In the data analysis, the authors used Figure 4c-f to suggest a better fit of the mixed-effect model. It would be better to use a table to summarize the goodness-of-fit tests instead of the bar plots in Supplementary Figure 3. On line 197, it is not clear until one has read the whole paragraph whether single-trait analysis refers to the mixed-effect or the fixed-effect model. Next, the authors found that no significant correlation was identified between any disorder and the control groups. A discussion of this would be helpful.

3. In Figure 5, panels b and c can be simplified, with the upper triangle for LoF and the lower triangle for synonymous variants. Moreover, the heatmaps here only reflect the significance of correlations and give no information about the correlation magnitude itself. It would be helpful to include this in the same figure. This point also applies to the other heatmaps in the supplement. In addition, Figure 5b shows more significant correlations for diseases with larger sample sizes. This is consistent with common sense, so including correlation magnitudes in this figure would be more informative.
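
The overdispersion question in point 1 can be made concrete with a small simulation. The negative binomial arises when a Poisson rate is mixed over a gamma distribution; adding a normal random effect on the log scale mixes the rate over a log-normal distribution instead, and both inflate the variance above the mean. The sketch below is a minimal illustration under assumed settings (a common expected count for every gene, hypothetical values of `beta` and `sigma`, illustrative function names), not the model as fitted in the manuscript.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_genes, expected, beta, sigma, seed):
    """Draw one count per gene from Poisson(expected * exp(beta + phi)),
    with phi ~ Normal(0, sigma^2). Setting sigma = 0 gives a plain Poisson."""
    rng = random.Random(seed)
    return [
        sample_poisson(expected * math.exp(beta + rng.gauss(0.0, sigma)), rng)
        for _ in range(n_genes)
    ]


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


# Hypothetical settings: 5000 genes, expected count of 1 per gene.
fixed = simulate_counts(5000, expected=1.0, beta=0.0, sigma=0.0, seed=1)
mixed = simulate_counts(5000, expected=1.0, beta=0.0, sigma=1.0, seed=1)
print("fixed-effect dispersion index:", dispersion_index(fixed))
print("mixed-effect dispersion index:", dispersion_index(mixed))
```

In this toy setting the fixed-effect counts have a variance close to their mean, while the random effect pushes the variance well above the mean, which is the behavior a negative binomial would also capture through a different mixing distribution.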

Reviewer #2 (Recommendations for the authors):

1. I felt the introduction and discussion lacked a more in-depth comparison between EncoreDNM and other similar methods (specifically mTADA). In particular, what are the differences in parameters between these two methods and why do you see such an increase in false positives (Figure 3) for mTADA over EncoreDNM?

2. L66: The technical wording of this statement could be improved. The paper cited was commenting on remaining haploinsufficient genes yet to be discovered. I also do not believe "undetected" is the correct word here, as this refers to a statistical test for enrichment rather than a clinical association. There are several genes (e.g. in Kaplanis et al. Supp Table 2) that are known to cause developmental disorders clinically but do not pass the p-value thresholds that would indicate statistical enrichment in studies like DDD.

3. In Figure 1, is there a vertical line missing? On the left, there are squares representing genes, but there is also one rectangle.

4. L103-117: Could the derivation of the dispersion parameter Φ be explained in lay terms somewhere? I understand that it is attempting to quantify the random nature of DNM counts one expects when sequencing a subset of individuals with a given disorder, but the maths in the methods section is a bit impenetrable for somebody who is not well-versed in calculus. This is especially crucial considering the major role it plays in interpreting the various results presented, and in the comparison to mTADA. It will be difficult for some readers of a journal with a broad range of topics such as eLife to understand how this parameter, particularly σ, was estimated.

5. L108: Further to point 4, what are reasonable values of the parameters used to fit your model? Additionally, the authors state that β "tends to be larger… and smaller…" under various conditions, but based on Figure 3, it appears that it shifts between positive and negative?

6. L151: The phrase "could estimate" is loaded and represents an opinion; it should be replaced with "estimates".

7. L184-187: The excessive use of acronyms for various disorders makes for difficult reading. While I am familiar with the acronyms for developmental disorders, autism spectrum, and schizophrenia because these are my field, I constantly had to refer back to the definitions. Is it possible to use the actual disorder names throughout the manuscript where possible?

8. Figure 3: I think it is relevant to put the various simulated parameters that constitute Φ (e.g. σ, ρ) into context with actual data as presented in Figure 4, i.e. do the simulated parameters used here represent reasonable assumptions of these values, and are they within the expectations of mTADA? Could the range of parameters have an outsized effect on the differences between mTADA and EncoreDNM? Or does it have to do with p-value thresholding (see below)? Furthermore, I would appreciate it if the x-axis labels included the actual parameter setting rather than the less descriptive "small", and if plots A and C had labels describing which parameters were fixed in those respective analyses.

9. L246: "i.e." should be removed. There are five genes and you listed all five.

10. L267-272: Are the authors certain the thresholding on the mTADA results is reasonable? Whilst the FDR cutoff the authors have applied may suggest "significance", the p-values for the mTADA synonymous variant analysis seem very similar across all disorder combinations. Furthermore, the general pattern of p-values for LoFs seems similar to that shown for EncoreDNM. I suspect that the correlation between the p-values of EncoreDNM and mTADA will be high and one could generate similar results by adjusting significance thresholds independently for each tool.

Additionally, as mTADA is based on a Bayesian approach, shouldn't the authors threshold based on posterior probabilities rather than p-values? I am unsure whether comparing the π values from the different mTADA results, as described in the methods, is a valid approach. How do the authors' conclusions change when using thresholds based on those the authors of mTADA suggest (e.g. PP > 0.8)?

11. Figure 6: Could a visual cue/text be added to differentiate between the analyses in the upper and lower triangles?

12. What is the overall rank of genes identified between different disorders when comparing between mTADA and EncoreDNM? Could the authors plot relative p/PP values for genes identified to be significantly enriched between disorders?

13. L228-229: Do not use the terms "hints at correlation" or "correlate strongly". It either does or does not meet your p-value threshold and/or correlate.

14. L257-260: Are the recurrent LoF mutations found in developmental disorders any different than those already identified by Kaplanis et al.? If so, how do your results increase understanding of the mechanisms underlying developmental disorders? I would perhaps shift the focus of this section to identifying recurrent mutations across disorders (and perhaps cite/refer to Rees et al. in Nature Communications; PMID: 34504065).

15. L274-280: I do not feel the LD score analysis constitutes a new analysis, and it should be moved to the discussion. A similar result covering many of these traits was recently published by Abdellaoui and Verweij in Nature Human Behaviour (PMID: 33986517).

16. Throughout: Could the term "Type I Error" be avoided? I would prefer false-discovery/false-positive be used as it is a much clearer term and immediately recognisable by the majority of readers.

17. L365-366: I do not think identifying gene-disease associations is a goal of EncoreDNM so I do not consider it a limitation of the study.

18. L584: I think the hyperlink is broken? I was able to find the repository for EncoreDNM in ghm17's github account, so this was not a major issue.

eLife. 2022 Jun 6;11:e75551. doi: 10.7554/eLife.75551.sa2

Author response

Essential revisions:

1) The manuscript needs to explain better the different approaches taken by encoreDNM and mTADA, and the corresponding strengths and weaknesses. The authors focus on power in frequentist hypothesis testing for non-zero genetic overlap between disorders, but mTADA takes a Bayesian approach. Clarification is needed here on whether the statistical basis of the comparison is fair.

2) Greater clarification of the model fitting procedure is needed: how the de-novo mutability parameter is used and the computational complexity of optimizing the likelihood function.

3) The heatmaps that present the results of the empirical analysis show only statistical significance levels. Since the authors' method's primary parameter is the correlation parameter, it would be helpful to display estimates of this parameter in the main text and to focus discussion on that parameter more and on statistical significance less.

We really appreciate the thoughtful and constructive comments from the editor and both reviewers. In this revision, we have provided additional justifications for the comparison between EncoreDNM and mTADA, and clarified statistical details in the model fitting and parameter estimation procedure. We also present the enrichment correlation estimates in addition to significance levels in the updated heatmaps as suggested. The new analyses have produced highly consistent results compared to our initial submission and have strengthened the manuscript. We provide details of these analyses in the point-by-point response below.

Reviewer #1 (Recommendations for the authors):

Here, I have the following comments.

1. Authors proposed a mixed-effect Poisson model with overdispersion. I have a few questions regarding this model. First, it is not clear whether the de novo mutability m_i is a parameter or a given estimation in the model. Authors mentioned on line 110 that the background component is a gene-specific fixed effect but in the Methods they do not treat it as a parameter in the section of parameter estimation.

Thank you for the comment. The de novo mutability $m_{i}$ is given a priori rather than being a parameter to be estimated. There is extensive literature on estimating de novo mutability from genomic sequence context. In this paper, we used mutability values estimated from a tri-nucleotide-based model proposed by Samocha and colleagues¹.

Second, to treat overdispersion in Poisson model, negative binomial distribution is usually applied. Here, authors added an overdispersion variable \phi_i in the model. It will be helpful to discuss on this point.

We thank the reviewer for pointing this out. It is true that the negative binomial distribution, as a Poisson-gamma mixture, is commonly used for handling overdispersion in count data. This is because the Poisson and gamma distributions are a conjugate pair, and their compound model can be expressed in closed form. However, the main goal of our study is to quantify the shared genetic component of two disorders, and there is no trivial way to parametrize the bivariate extension of the negative binomial distribution for this purpose. Marshall and Olkin proposed a bivariate Poisson-gamma mixture (negative binomial) model with the restriction that both count variables share the same gamma distribution component². The mixture model has a closed-form solution but assumes the dispersion of the two count variables to be identical and limits the unconditional correlation of the two count variables to be positive.
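For reference, the closed form mentioned above can be written out for the univariate case. The symbols $μ$ (mean count) and $k$ (shape of the gamma mixing distribution) are notation assumed here for illustration and are not taken from the manuscript.

```latex
\begin{aligned}
Y \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}(\text{shape } k,\ \text{rate } k/\mu), \\
P(Y = y) &= \frac{\Gamma(y + k)}{\Gamma(k)\, y!}
  \left(\frac{k}{k + \mu}\right)^{k} \left(\frac{\mu}{k + \mu}\right)^{y}, \\
\mathrm{E}[Y] &= \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{k}.
\end{aligned}
```

The variance exceeds the mean by $μ^{2}/k$, which is the overdispersion the negative binomial accommodates. The difficulty described in the response concerns the bivariate extension, not this univariate form.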

In this paper, we adopted a Poisson-lognormal mixture framework, which is another commonly used Poisson mixture model that accommodates overdispersion (see Chapter 4.2 in Cameron, A.C. et al.³). Unlike the Poisson-gamma mixture, this model does not have the computational convenience of closed-form solutions. Instead, we implemented a Monte Carlo integration method to calculate the likelihood, which is computationally more intensive but still yields accurate and robust results. A major reason why we chose the Poisson-lognormal mixture is that it can be easily generalized to a bivariate count model by replacing its univariate normal component with a bivariate normal distribution. This flexible bivariate count model was initially proposed by Munkin and Trivedi in econometrics, and has been demonstrated to accommodate both overdispersion and correlation, which suits our main goal very well⁴. In general, finding a computationally simpler approach to parametrize correlation in bivariate count data remains an interesting methodological question, but we are content with the excellent empirical performance of the Poisson-lognormal mixture model in our analyses. We have added some clarifications and discussions into the Methods-statistical model section of our revised manuscript.
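The idea of Monte Carlo integration for a Poisson-lognormal likelihood can be illustrated with a minimal univariate sketch. This is not the EncoreDNM implementation, which handles the bivariate model; the function names, the number of draws, and the example gene (elevation -0.5, 5,000 trios, mutability 2e-5, dispersion 0.75) are all assumptions made for illustration.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass function."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def mc_poisson_lognormal_pmf(y, log_mean, sigma, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(Y = y) when
    Y | phi ~ Poisson(exp(log_mean + phi)) and phi ~ Normal(0, sigma^2).

    The integral over phi has no closed form, so it is approximated by
    averaging the Poisson pmf over random draws of phi.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        phi = rng.gauss(0.0, sigma)
        total += math.exp(poisson_logpmf(y, math.exp(log_mean + phi)))
    return total / n_draws


# Hypothetical gene (all values are illustrative assumptions).
beta, n_trios, mutability, sigma = -0.5, 5000, 2e-5, 0.75
log_mean = beta + math.log(2 * n_trios * mutability)

# Estimated probabilities of observing 0, 1, ..., 19 DNMs in this gene.
pmf = [mc_poisson_lognormal_pmf(y, log_mean, sigma) for y in range(20)]
```

Summing the log of such per-gene probabilities over all genes would give an approximate log-likelihood that a numerical optimizer could maximize. The cost grows with the number of draws, which is the computational burden referred to above.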

Third, a MCMC algorithm was used to solve the EncoreDNM model. How is the computational efficiency?

Our Monte Carlo integration method is computationally efficient. Analysis of a typical trait pair with 18,000 genes takes about 10 minutes on a 2.5GHz cluster with 1 core. We have added these details into the Methods-computation time section in the revised manuscript.

On line 431, authors used the inversion of Fisher information matrix to obtain the covariance for parameters. Is there any non-invertibility problem here?

Thank you for raising this important point. We double-checked our previous analyses and found that the non-invertibility issue did occur when the DNM count was small (mostly when analyzing synonymous variants). In this revision, we implemented a group-wise jackknife method to obtain standard errors for parameter estimates when the Fisher information matrix is non-invertible. More specifically, we randomly partitioned all genes into 100 groups of equal size. Each time, we left one group out and estimated the parameters using the DNM data of the remaining genes. We then repeated the procedure 100 times and calculated the jackknife standard errors. This approach produced standard error estimates similar to those from the Fisher information approach in our analysis, as illustrated in Figure 5—figure supplement 9 in the revised manuscript. We have also incorporated the new method details into the Methods-parameter estimation section in our revised manuscript.
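The group-wise jackknife described above can be sketched as follows. The response does not state the variance formula, so the standard delete-one-group formula is assumed here. The estimator `log_rate_ratio` is a simple stand-in (the log of observed over expected DNMs), not the EncoreDNM maximum likelihood estimator, and the simulated genes are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def jackknife_se(genes, estimator, n_groups=100, seed=0):
    """Delete-one-group jackknife standard error of estimator(genes)."""
    rng = random.Random(seed)
    order = list(range(len(genes)))
    rng.shuffle(order)
    # Assign shuffled gene indices to groups of (nearly) equal size.
    groups = [order[g::n_groups] for g in range(n_groups)]
    leave_one_out = []
    for g in range(n_groups):
        held_out = set(groups[g])
        kept = [genes[i] for i in range(len(genes)) if i not in held_out]
        leave_one_out.append(estimator(kept))
    mean_est = sum(leave_one_out) / n_groups
    var = (n_groups - 1) / n_groups * sum((t - mean_est) ** 2 for t in leave_one_out)
    return math.sqrt(var)


def log_rate_ratio(genes):
    """Stand-in estimator: log(total observed DNMs / total expected DNMs)."""
    observed = sum(y for y, _ in genes)
    expected = sum(e for _, e in genes)
    return math.log(observed / expected)


# Hypothetical data: 2,000 genes, each with 0.2 expected DNMs and a true
# log rate ratio of -0.5. Each record is (observed count, expected count).
rng = random.Random(1)
genes = [(sample_poisson(rng, 0.2 * math.exp(-0.5)), 0.2) for _ in range(2000)]
se = jackknife_se(genes, log_rate_ratio)
```

Because the jackknife only requires refitting the model on subsets of genes, it avoids inverting the Fisher information matrix, at the cost of 100 additional model fits.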

2. In the data analysis, authors used Figure 4c-f to suggest better fit of mixed-effect model. It is better to use a table to summarize the goodness of fit test instead of using bar plots in Supplementary Figure 3.

Thank you for your great suggestion. We have added Supplementary Table 2 in our revised manuscript to show the results of likelihood ratio tests.

On line 197, before reading the whole paragraph, it is not clear at the beginning whether single-trait analysis refers to mixed-effect or fixed-effect.

Thank you for your comment. We have added the phrase “under the mixed-effects Poisson model” for clarification.

Next, authors discovered that no significant correlation was identified between any disorders and control groups. Discussions about this should be helpful.

For disorders, DNMs will be enriched in risk genes and slightly depleted in non-risk genes. For control groups (these are healthy siblings recruited in a study for autism), DNMs are expected to distribute proportionally according to the de novo mutability (determined by the genomic sequence context) without showing enrichment in certain genes. Therefore, we expect the enrichment correlation, characterized by concordant enrichment of DNMs in the exome, to be near zero between disorders and the control group. Our results are consistent with our expectation. We have added some related discussions into the Results section of our revised manuscript.

3. In Figure 5, panels b and c can be simplified with upper triangle for LoF and lower triangle for synonymous. Moreover, the heatmaps here only reflect the significance of correlations but no information regarding the correlation magnitude itself. It would be helpful to include this in the same figure. This point could be applied to other heatmaps in the supplement. In addition, Figure 5b shows more significant correlations for diseases with larger sample sizes. This is consistent with common sense. So including correlation magnitudes in this figure would be more informative.

Thank you for this great suggestion. We have incorporated the enrichment correlation estimates into heatmaps (Figures 5-6). Figure 6—figure supplements 1-8 in the revised manuscript have also been updated accordingly. We also included the updated Figure 5 below for your convenience.

Reviewer #2 (Recommendations for the authors):

1. I felt the introduction and discussion lacked a more in-depth comparison between EncoreDNM and other similar methods (specifically mTADA). In particular, what are the differences in parameters between these two methods and why do you see such an increase in false positives (Figure 3) for mTADA over EncoreDNM?

Thank you for the comment. To quantify shared genetic effects between two disorders, EncoreDNM assumes a mixed-effects Poisson model and estimates the correlation of deviation components across two disorders, whereas mTADA employs a Bayesian framework and estimates the proportion of shared risk genes. We have provided statistical details of EncoreDNM in the Methods section of our manuscript. Here, we briefly introduce mTADA.

The mTADA method assigns all genes into four groups: genes that are not relevant for either disorder, risk genes for the first disorder alone, risk genes for the second disorder alone, and risk genes shared by both disorders. The proportion of these groups are parametrized as $π_{0}, π_{1}, π_{2}, π_{3}$ , respectively. In particular, parameter $π_{3}$ quantifies the extent of genetic sharing between two disorders, with a larger value indicating stronger genetic overlap (for example, see Figures 4a-b in the mTADA paper⁵). The 95% credible interval constructed through MCMC is used to measure the uncertainty in $π_{3}$ estimates.

Through extensive simulations and analysis of nine disorders, we demonstrated that EncoreDNM provides accurate statistical inference, but mTADA can produce false positive findings when following the author-recommended procedure, especially when the DNM count is small. One possible reason is that when there are not sufficient DNM counts, mTADA tends to overestimate $π_{3}$ . Nguyen et al. also reported this phenomenon in their simulation settings with small mean relative risks⁵ (see Supplementary Figure 3 in the mTADA paper). Further, although the inflation of false positive rate for mTADA may be alleviated by imposing a more stringent significance threshold, researchers will not know how to select such a threshold in practice. We provide more details on this in our response to Comment #10 below. Related discussions have also been incorporated into the revised manuscript.

2. L66: The technical wording of this statement could be improved. The paper cited was commenting on remaining haplo-insufficient genes yet to be discovered. I also do not believe "undetected" to be the correct word here as this refers to a statistical test for enrichment rather than a clinical association. There are several genes (e.g. in Kaplanis et al. Supp Table 2) that are known to cause developmental disorders clinically but do not pass p. value thresholds suggesting a statistical enrichment in studies like DDD.

We have replaced the phrase “more than 1,000 genes associated with DD remain undetected” with “more than 1,000 haploinsufficient genes contributing to DD risk have not yet achieved statistical significance”.

3. In Figure 1, is there a vertical line missing? On the left, there are squares representing genes, but there is also one rectangle.

We use squares to represent different genes. The rectangle in the middle represents many other genes that are omitted due to limited space.

4. L103-117: Could the derivation of the dispersion parameter Φ be explained in better lay-terms somewhere? I understand that it is attempting to quantify the random nature of DNM counts one expects when sequencing a subset of individuals with a given disorder, but the maths in the methods section is a bit impenetrable for somebody who is not well-versed in calculus. This is especially crucial considering the major role it plays in interpreting the various results presented, and in comparison, to mTADA. It will be difficult for some readers of a journal with a broad range of topics such as eLife to understand how this parameter, particularly σ, was estimated.

We appreciate the suggestion. The main goal of EncoreDNM is to quantify correlated DNM enrichment between two disorders. The statistical framework is parametrized as follows.

$$\begin{bmatrix} Y_{i1} \\ Y_{i2} \end{bmatrix} \sim \mathrm{Poisson}\left(\begin{bmatrix} λ_{i1} \\ λ_{i2} \end{bmatrix}\right),$$

$$\log\left(\begin{bmatrix} λ_{i1} \\ λ_{i2} \end{bmatrix}\right) = \begin{bmatrix} β_{1} \\ β_{2} \end{bmatrix} + \log\left(\begin{bmatrix} 2 N_{1} m_{i} \\ 2 N_{2} m_{i} \end{bmatrix}\right) + \begin{bmatrix} ϕ_{i1} \\ ϕ_{i2} \end{bmatrix},$$

$$\begin{bmatrix} ϕ_{i1} \\ ϕ_{i2} \end{bmatrix} \sim \mathrm{MVN}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} σ_{1}^{2} & ρ σ_{1} σ_{2} \\ ρ σ_{1} σ_{2} & σ_{2}^{2} \end{bmatrix}\right).$$

We have described the statistical details of this framework in the Methods section of our manuscript. Briefly, in this model, DNM rate is affected by the elevation component $β_{k}$ , the background component $\log (2 N_{k} m_{i})$ , and the deviation component $ϕ_{ik}$ . The component in question, $ϕ_{ik}$ , is modeled as a random effect that follows a bivariate normal distribution. More specifically, $ϕ$ quantifies the degree to which DNM counts look different from what we expect to see under the null (i.e., no risk genes for the disorder). A larger value of the dispersion parameter $σ$ indicates a more substantial deviation from the null. That is, DNM counts show strong enrichment in some genes and depletion in other genes compared to the expectation based on de novo mutability. If $σ$ has a small value, it means the DNM count data is largely consistent with the null. Parameter $ρ$ further allows such deviation of two disorders to be correlated and is a key parameter of interest in our framework. We have added some clarifications into the Results-method overview section in the revised manuscript.
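To make the generative model above concrete, the following sketch simulates per-gene DNM counts for two disorders. It is an illustration only: the function names, the constant mutability of 1e-5, the sample sizes, and the parameter values are assumptions, not settings from the manuscript.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_pair(mutability, n1, n2, beta1, beta2, sigma1, sigma2, rho, seed=0):
    """Simulate counts for two disorders; returns (counts, deviations)."""
    rng = random.Random(seed)
    counts, deviations = [], []
    for m in mutability:
        # Correlated deviations built from two independent standard normals.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        phi1 = sigma1 * z1
        phi2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # log(rate) = elevation + background + deviation
        lam1 = math.exp(beta1 + math.log(2 * n1 * m) + phi1)
        lam2 = math.exp(beta2 + math.log(2 * n2 * m) + phi2)
        counts.append((sample_poisson(rng, lam1), sample_poisson(rng, lam2)))
        deviations.append((phi1, phi2))
    return counts, deviations


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inputs: 5,000 genes with equal mutability, 5,000 trios per disorder.
mutability = [1e-5] * 5000
counts, deviations = simulate_pair(
    mutability, 5000, 5000, -0.5, -0.5, 0.75, 0.75, rho=0.5
)
r_phi = pearson([d[0] for d in deviations], [d[1] for d in deviations])
```

In this sketch `r_phi` recovers the simulated $ρ$ because the deviations are observed directly. In real data only the counts are observed, which is why $ρ$ has to be estimated through the likelihood.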

5. L108: Further to point 4, what are reasonable values of the parameters used to fit your model? Additionally, the authors state that β "tends to be larger… and smaller…" under various conditions, but based on Figure 3, it appears that it shifts between positive and negative?

In our mixed-effects Poisson regression model, there is no constraint on what value the elevation parameter $β$ can be. A positive value of $β$ represents over-calling DNMs while a negative value represents under-calling. The dispersion parameter $σ$ can be any positive value. A larger $σ$ indicates that DNM counts show a strong deviation compared to the expectation rate determined by de novo mutability. The enrichment correlation $ρ$ quantifies the concordance of DNM enrichments between two disorders and can be any value between -1 and 1. Applying EncoreDNM to nine disorders, we found that the estimates of $β$ were almost always negative across variant classes, which may be explained by strict quality control in DNM calling pipelines. We also note that the empirical estimates of these parameters are affected by noise in the data, especially when the sample size is small.
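The roles of $β$ , $σ$ , and $ρ$ can also be read off the marginal moments implied by the model. The expressions below follow from standard lognormal moment identities; the shorthand $μ_{ik}$ is introduced here for convenience and is not notation from the manuscript.

```latex
\begin{aligned}
\mu_{ik} &:= \mathrm{E}[Y_{ik}] = 2 N_{k} m_{i} \exp\!\left(\beta_{k} + \tfrac{1}{2}\sigma_{k}^{2}\right), \\
\mathrm{Var}(Y_{ik}) &= \mu_{ik} + \mu_{ik}^{2}\left(\exp(\sigma_{k}^{2}) - 1\right), \\
\mathrm{Cov}(Y_{i1}, Y_{i2}) &= \mu_{i1}\,\mu_{i2}\left(\exp(\rho\,\sigma_{1}\sigma_{2}) - 1\right).
\end{aligned}
```

When $σ_{k} = 0$ the variance equals the mean, as in a plain Poisson model, and when $ρ = 0$ the covariance between the two disorders' counts vanishes. The sign of the covariance follows the sign of $ρ$ .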

6. L151: The word "could estimate" is loaded and represents an opinion and should be removed in favour of "estimates".

Thank you for the suggestion. We have replaced the word "could estimate" with “estimates”.

7. L184-187: The excessive use of acronyms for various disorders makes for difficult reading. While I am familiar with acronyms for developmental disorders, autism spectrum, and schizophrenia because these are my field, I constantly had to refer back to the definitions. Is it possible to just use the actual disorder names throughout the manuscript where possible?

We have replaced the acronyms with the actual names of disorders throughout the revised manuscript.

8. Figure 3: I think it is relevant to put the various simulated parameters that constitute Φ (e.g. σ, ρ) into context with actual data as presented in Figure 4. i.e. do the simulated parameters used here represent reasonable assumptions of these values and are they within the expectations of mTADA? Could the range of parameters have an outsized effect on the differences between mTADA and EncoreDNM? Or does it have to do with p. value thresholding (see below). Furthermore, I would appreciate it if x-axis labels included the actual parameter setting rather than the less descriptive "small" and plots A and C had labels which described which parameters were fixed in those respective analyses.

In the real DNM data analysis using EncoreDNM, the mean (SD) of the estimated $β$ and $σ$ were -0.54 (0.25) and 0.99 (0.36), respectively. In our simulations, $β$ was chosen as -1, -0.5, or 0, and $σ$ was chosen as 0.5, 0.75, or 1. Therefore, the parameter values used in simulations are reasonable and should not have outsized effects on the performance of the two methods. We have also added the parameter values into the x-axis labels of Figure 3.

9. L246: "i.e." should be removed. There are five genes and you listed all five.

We have removed “i.e.” from the sentence.

10. L267-272: Are the authors certain the thresholding on the mTADA results is reasonable? Whilst the FDR cutoff the authors have applied may suggest "significance", the p. values for the mTADA synonymous variant analysis seem very similar across all disorder combinations. Furthermore, the general pattern of p. values for LoFs seems similar to that shown for EncoreDNM. I suspect that the correlation between p. values of EncoreDNM and mTADA will be high and one could generate similar results by adjusting significance thresholds independently for each tool.

Additionally, as mTADA is based on a Bayesian approach, shouldn't the authors threshold based on posterior probabilities rather than p. values? I am unsure if comparing the π values from the different mTADA results is a valid approach as described in the methods? How do the authors conclusions change when using thresholds based on those the authors of mTADA suggest (e.g. PP > 0.8)?

This is an important point, and we appreciate the comment. In the paper that introduced the mTADA approach, Nguyen et al. used the proportion of shared risk genes $π_{3}$ to assess genetic overlaps between disorders⁵. They argued that a larger value of $π_{3}$ compared to 0 indicates stronger genetic sharing (for example, see Figures 4a-b in the mTADA paper⁵), and importantly, used credible intervals for $π_{3}$ to quantify the imprecision in their estimates. It is true that a PP cutoff of 0.8 was used to identify disease risk genes but this was not the approach for studying genetic overlaps in their paper.

To compare the performance of mTADA with EncoreDNM, we followed Nguyen et al. and performed posterior inference for $π_{3}$ . We used the same 95% credible interval approach to quantify the imprecision in $π_{3}$ estimates, and compared it with $π_{1}^{S} * π_{2}^{S}$ which quantifies the expected proportion of shared risk genes if two disorders are genetically independent. Here, $π_{1}^{S}$ and $π_{2}^{S}$ are the estimated proportions of risk genes for two disorders respectively. We compared the posterior distribution of $π_{3}$ with $π_{1}^{S} * π_{2}^{S}$ and used the following metric to quantify the statistical evidence:

$p = 2 * \frac{\sum_{i = 1}^{10000} I (π_{3}^{i} < π_{1}^{S} * π_{2}^{S})}{10000}$ .

Here, $π_{3}^{i}$ is the $i$ -th MCMC iteration sample and $I ()$ is the indicator function. If the lower bound of the 95% credible interval for $π_{3}$ exactly equals $π_{1}^{S} * π_{2}^{S}$ , then the corresponding p will be 0.05. This is also equivalent to using a (two-sided) 0.95 posterior probability cutoff on P( $π_{3} > π_{1}^{S} * π_{2}^{S}$ ) to claim statistical significance. Importantly, we once again highlight an adjustment we made in the posterior inference. Instead of comparing $π_{3}$ with 0 (this is what Nguyen et al. did in the mTADA paper), comparing with the expected proportion $π_{1}^{S} * π_{2}^{S}$ will lead to more conservative inference results for $π_{3}$ since $π_{1}^{S} * π_{2}^{S}$ is always greater than 0. Therefore, the posterior inference approach we used for mTADA is largely based on but statistically more conservative than what Nguyen et al. used for analyzing shared genetics between disorders.
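The metric above is straightforward to compute from posterior samples. The sketch below assumes the MCMC samples of $π_{3}$ are already available as a list; the function name `overlap_p_value` and the evenly spaced example samples are hypothetical and are not part of the mTADA software.

```python
def overlap_p_value(pi3_samples, pi1_single, pi2_single):
    """Twice the fraction of posterior pi3 samples that fall below the
    proportion of shared risk genes expected under independence."""
    threshold = pi1_single * pi2_single
    below = sum(1 for s in pi3_samples if s < threshold)
    return 2 * below / len(pi3_samples)


# Hypothetical posterior: 10,000 evenly spaced values between 0 and 0.1,
# with single-disorder risk gene proportions of 0.05 each (threshold 0.0025).
pi3_samples = [(i + 0.5) / 100000 for i in range(10000)]
p = overlap_p_value(pi3_samples, 0.05, 0.05)
```

Here 250 of the 10,000 samples fall below the threshold, so the 2.5% posterior quantile sits at the threshold and `p` equals 0.05, matching the boundary case described above.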

Further, we investigated whether varying the significance threshold for mTADA would substantially change its performance in simulations under a mixed-effects Poisson regression model. At the significance cutoff of 0.05, mTADA produced a substantial proportion of false positive findings in the small $β$ , small $σ$ , and small $N$ settings (Supplementary Table 14). Under a more stringent significance cutoff of 0.01, mTADA still produced a substantial inflation in false positive rates when $β$ and $σ$ are small. We also employed a multinomial model instead of Poisson regression to simulate DNM counts and obtained consistent results (Supplementary Table 15).

Although it is not the analytic approach used in the mTADA paper, we also investigated an alternative strategy which makes inference based on whether two disorders share at least one risk gene with PP>0.8. This strategy produced substantial inflation in false positive rates in the baseline setting (Supplementary Tables 16-17).

The discussions above have been incorporated into the revised manuscript.

11. Figure 6: Could a visual cue/text be added to differentiate between the analyses in the upper and lower triangles?

Thank you for the constructive suggestion. We have added the text for different gene sets in Figure 6. As suggested by Comment 3 from reviewer #1, we depicted correlation estimates rather than significance levels. Figure 6—figure supplements 1-8 have also been updated accordingly.

12. What is the overall rank of genes identified between different disorders when comparing between mTADA and EncoreDNM? Could the authors plot relative p/PP values for genes identified to be significantly enriched between disorders?

EncoreDNM does not prioritize disease risk genes, but instead estimates the enrichment correlation which quantifies concordant DNM effects between two disorders. This is conceptually similar to genetic correlation which can be estimated from GWAS data. Similarly, genetic correlation quantifies the overall shared additive genetic components between two traits but does not prioritize specific SNP associations. Therefore, we are not able to compare the rank of genes between mTADA and EncoreDNM.

13. L228-229: Do not use the terms "hints at correlation" or "correlate strongly". It either does or does not meet your p. value threshold and/or correlate.

We have deleted the statement to avoid confusion.

14. L257-260: Are the recurrent LoF mutations found in developmental disorders any different than those already identified by Kaplanis et al.? If so, how do your results increase understanding of the mechanisms underlying developmental disorders? I would perhaps shift the focus of this section to identifying recurrent mutations across disorders (and perhaps cite/refer to Rees et al. in Nature Communications; PMID: 34504065).

This is a great suggestion. We identified 30 recurrent cross-disorder LoF mutations that were not recurrent in developmental disorder alone (see Supplementary Table 7 ). We have now shifted the focus of this section to cross-disorder findings. In particular, we have highlighted the gene FBXO11 that shows recurrent LoF variants in autism and congenital hydrocephalus as an example. The Rees et al. paper has also been added as a reference in our revised manuscript.

15. L274-280: I do not feel the LD score analysis constitutes a new analysis and should be moved to the discussion. A similar result was recently published by Abdellaoui and Verweij in Nature Human Behaviour (PMID: 33986517) that covers many of these traits.

We have moved the LD score regression analysis into the Discussion section.

16. Throughout: Could the term "Type I Error" be avoided? I would prefer false-discovery/false-positive be used as it is a much clearer term and immediately recognisable by the majority of readers.

We have replaced the word “type-I error” with “false positive rate” throughout the manuscript.

17. L365-366: I do not think identifying gene-disease associations is a goal of EncoreDNM so I do not consider it a limitation of the study.

We have removed this point from the Discussion section.

18. L584: I think the hyperlink is broken? I was able to find the repository for EncoreDNM in ghm17's github account, so this was not a major issue.

Thank you for pointing this out. The hyperlink has been fixed.

References

1. Samocha, K.E. et al. A framework for the interpretation of de novo mutation in human disease. Nature genetics 46, 944-950 (2014).

2. Marshall, A.W. & Olkin, I. Multivariate distributions generated from mixtures of convolution and product families. Lecture Notes-Monograph Series, 371-393 (1990).

3. Cameron, A.C. & Trivedi, P.K. Regression analysis of count data (Cambridge University Press, 2013).

4. Munkin, M.K. & Trivedi, P.K. Simulated maximum likelihood estimation of multivariate mixed‐Poisson regression models, with application. The Econometrics Journal 2, 29-48 (1999).

5. Nguyen, T.-H. et al. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. Nature Communications 11, 2929 (2020).

Associated Data

Supplementary Materials

Supplementary file 1. Supplementary Tables 1-20.

elife-75551-supp1.xlsx (1.8 MB, xlsx)

MDAR checklist

elife-75551-mdarchecklist1.pdf (200.1 KB, pdf)

Data Availability Statement

The current manuscript is a computational study, so no data have been generated for this manuscript.

[bib1] Abdellaoui A, Verweij KJH. Dissecting polygenic signals from genome-wide association studies on human behaviour. Nature Human Behaviour. 2021;5:686–694. doi: 10.1038/s41562-021-01110-y.

[bib2] Allen AS, Berkovic SF, Cossette P, Delanty N, Dlugos D, Eichler EE, Epstein MP, Glauser T, Goldstein DB, Han Y, Heinzen EL, Hitomi Y, Howell KB, Johnson MR, Kuzniecky R, Lowenstein DH, Lu Y-F, Madou MRZ, Marson AG, Mefford HC, Esmaeeli Nieh S, O’Brien TJ, Ottman R, Petrovski S, Poduri A, Ruzzo EK, Scheffer IE, Sherr EH, Yuskaitis CJ, Abou-Khalil B, Alldredge BK, Bautista JF, Berkovic SF, Boro A, Cascino GD, Consalvo D, Crumrine P, Devinsky O, Dlugos D, Epstein MP, Fiol M, Fountain NB, French J, Friedman D, Geller EB, Glauser T, Glynn S, Haut SR, Hayward J, Helmers SL, Joshi S, Kanner A, Kirsch HE, Knowlton RC, Kossoff EH, Kuperman R, Kuzniecky R, Lowenstein DH, McGuire SM, Motika PV, Novotny EJ, Ottman R, Paolicchi JM, Parent JM, Park K, Poduri A, Scheffer IE, Shellhaas RA, Sherr EH, Shih JJ, Singh R, Sirven J, Smith MC, Sullivan J, Lin Thio L, Venkat A, Vining EPG, Von Allmen GK, Weisenberg JL, Widdess-Walsh P, Winawer MR, Epi4K Consortium. Epilepsy Phenome/Genome Project De novo mutations in epileptic encephalopathies. Nature. 2013;501:217–221. doi: 10.1038/nature12439.

[bib3] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556.

[bib4] Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.

[bib5] Brainstorm C. Analysis of shared heritability in common disorders of the brain. Science (New York, N.Y.) 2018;360:aap875. doi: 10.1126/science.aap875.

[bib6] Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, Duncan L, Perry JRB, Patterson N, Robinson EB, Daly MJ, Price AL, Neale BM, ReproGen Consortium. Psychiatric Genomics Consortium. Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3 An atlas of genetic correlations across human diseases and traits. Nature Genetics. 2015;47:1236–1241. doi: 10.1038/ng.3406.

[bib7] Cao M, Wu JI. Camk2a-Cre-mediated conditional deletion of chromatin remodeler Brg1 causes perinatal hydrocephalus. Neuroscience Letters. 2015;597:71–76. doi: 10.1016/j.neulet.2015.04.041.

[bib8] Cardozo T, Pagano M. The SCF ubiquitin ligase: insights into a molecular machine. Nature Reviews. Molecular Cell Biology. 2004;5:739–751. doi: 10.1038/nrm1471.

[bib9] Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, Antipin Y, Reva B, Goldberg AP, Sander C, Schultz N. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.

[bib10] Christensen D, Van Naarden Braun K, Doernberg NS, Maenner MJ, Arneson CL, Durkin MS, Benedict RE, Kirby RS, Wingate MS, Fitzgerald R, Yeargin-Allsopp M. Prevalence of cerebral palsy, co-occurring autism spectrum disorders, and motor functioning - Autism and Developmental Disabilities Monitoring Network, USA, 2008. Developmental Medicine and Child Neurology. 2014;56:59–65. doi: 10.1111/dmcn.12268.

[bib11] Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, Liu X. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Human Molecular Genetics. 2015;24:2125–2137. doi: 10.1093/hmg/ddu733.

[bib12] Fromer M, Pocklington AJ, Kavanagh DH, Williams HJ, Dwyer S, Gormley P, Georgieva L, Rees E, Palta P, Ruderfer DM, Carrera N, Humphreys I, Johnson JS, Roussos P, Barker DD, Banks E, Milanova V, Grant SG, Hannon E, Rose SA, Chambert K, Mahajan M, Scolnick EM, Moran JL, Kirov G, Palotie A, McCarroll SA, Holmans P, Sklar P, Owen MJ, Purcell SM, O’Donovan MC. De novo mutations in schizophrenia implicate synaptic networks. Nature. 2014;506:179–184. doi: 10.1038/nature12929.

[bib13] Garne E, Dolk H, Krägeloh-Mann I, Holst Ravn S, Cans C, SCPE Collaborative Group Cerebral palsy and congenital malformations. European Journal of Paediatric Neurology. 2008;12:82–88. doi: 10.1016/j.ejpn.2007.07.001.

[bib14] Gratten J, Wray NR, Keller MC, Visscher PM. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nature Neuroscience. 2014;17:782–790. doi: 10.1038/nn.3708.

[bib15] Gregor A, Sadleir LG, Asadollahi R, Azzarello-Burri S, Battaglia A, Ousager LB, Boonsawat P, Bruel A-L, Buchert R, Calpena E, Cogné B, Dallapiccola B, Distelmaier F, Elmslie F, Faivre L, Haack TB, Harrison V, Henderson A, Hunt D, Isidor B, Joset P, Kumada S, Lachmeijer AMA, Lees M, Lynch SA, Martinez F, Matsumoto N, McDougall C, Mefford HC, Miyake N, Myers CT, Moutton S, Nesbitt A, Novelli A, Orellana C, Rauch A, Rosello M, Saida K, Santani AB, Sarkar A, Scheffer IE, Shinawi M, Steindl K, Symonds JD, Zackai EH, University of Washington Center for Mendelian Genomics. DDD Study. Reis A, Sticht H, Zweier C. De Novo Variants in the F-Box Protein FBXO11 in 20 Individuals with a Variable Neurodevelopmental Disorder. American Journal of Human Genetics. 2018;103:305–316. doi: 10.1016/j.ajhg.2018.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, Mallard TT, Hill WD, Ip HF, Marioni RE, McIntosh AM, Deary IJ, Koellinger PD, Harden KP, Nivard MG, Tucker-Drob EM. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nature Human Behaviour. 2019;3:513–525. doi: 10.1038/s41562-019-0566-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] Grotzinger AD, Mallard TT, Akingbuwa WA, Ip HF, Adams MJ, Lewis CM, McIntosh AM, Grove J, Dalsgaard S, Lesch KP, Strom N, Meier SM, Mattheisen M, Børglum AD, Mors O, Breen G, Lee PH, Kendler KS, Smoller JW, Tucker-Drob EM, Nivard MG, iPSYCH, Tourette Syndrome and Obsessive Compulsive Disorder Working Group of the Psychiatric Genetics Consortium. Bipolar Disorder Working Group of the Psychiatric Genetics Consortium. Major Depressive Disorder Working Group of the Psychiatric Genetics Consortium. Schizophrenia Working Group of the Psychiatric Genetics Consortium Genetic Architecture of 11 Major Psychiatric Disorders at Biobehavioral, Functional Genomic, and Molecular Genetic Levels of Analysis. medRxiv. 2020 doi: 10.1101/2020.09.22.20196089. [DOI] [PMC free article] [PubMed]

[bib18] Gulisano M, Calì PV, Cavanna AE, Eddy C, Rickards H, Rizzo R. Cardiovascular safety of aripiprazole and pimozide in young patients with Tourette syndrome. Neurological Sciences. 2011;32:1213–1217. doi: 10.1007/s10072-011-0678-1. [DOI] [PubMed] [Google Scholar]

[bib19] Guo H, Li JJ, Lu Q, Hou L. Detecting local genetic correlations with scan statistics. Nature Communications. 2021;12:2033. doi: 10.1038/s41467-021-22334-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Guo H. EncoreDNM. swh:1:rev:44ec5903b4c34e7b73ed7791f30d0b3544bafcd1GitHub. 2022 https://github.com/ghm17/EncoreDNM

[bib21] Hoischen A, Krumm N, Eichler EE. Prioritization of neurodevelopmental disease genes by discovery of new mutations. Nature Neuroscience. 2014;17:764–772. doi: 10.1038/nn.3703. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib22] Homsy J, Zaidi S, Shen Y, Ware JS, Samocha KE, Karczewski KJ, DePalma SR, McKean D, Wakimoto H, Gorham J, Jin SC, Deanfield J, Giardini A, Porter GA, Jr, Kim R, Bilguvar K, López-Giráldez F, Tikhonova I, Mane S, Romano-Adesman A, Qi H, Vardarajan B, Ma L, Daly M, Roberts AE, Russell MW, Mital S, Newburger JW, Gaynor JW, Breitbart RE, Iossifov I, Ronemus M, Sanders SJ, Kaltman JR, Seidman JG, Brueckner M, Gelb BD, Goldmuntz E, Lifton RP, Seidman CE, Chung WK. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science (New York, N.Y.) 2015;350:1262–1266. doi: 10.1126/science.aac9396. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Hormozdiari F, Penn O, Borenstein E, Eichler EE. The discovery of integrated gene networks for autism and related disorders. Genome Research. 2015;25:142–154. doi: 10.1101/gr.178855.114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Howrigan DP, Rose SA, Samocha KE, Fromer M, Cerrato F, Chen WJ, Churchhouse C, Chambert K, Chandler SD, Daly MJ, Dumont A, Genovese G, Hwu H-G, Laird N, Kosmicki JA, Moran JL, Roe C, Singh T, Wang S-H, Faraone SV, Glatt SJ, McCarroll SA, Tsuang M, Neale BM. Exome sequencing in schizophrenia-affected parent-offspring trios reveals risk conferred by protein-coding de novo mutations. Nature Neuroscience. 2020;23:185–193. doi: 10.1038/s41593-019-0564-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Jansen S, van der Werf IM, Innes AM, Afenjar A, Agrawal PB, Anderson IJ, Atwal PS, van Binsbergen E, van den Boogaard M-J, Castiglia L, Coban-Akdemir ZH, van Dijck A, Doummar D, van Eerde AM, van Essen AJ, van Gassen KL, Guillen Sacoto MJ, van Haelst MM, Iossifov I, Jackson JL, Judd E, Kaiwar C, Keren B, Klee EW, Klein Wassink-Ruiter JS, Meuwissen ME, Monaghan KG, de Munnik SA, Nava C, Ockeloen CW, Pettinato R, Racher H, Rinne T, Romano C, Sanders VR, Schnur RE, Smeets EJ, Stegmann APA, Stray-Pedersen A, Sweetser DA, Terhal PA, Tveten K, VanNoy GE, de Vries PF, Waxler JL, Willing M, Pfundt R, Veltman JA, Kooy RF, Vissers LELM, de Vries BBA. De novo variants in FBXO11 cause a syndromic form of intellectual disability with behavioral problems and dysmorphisms. European Journal of Human Genetics. 2019;27:738–746. doi: 10.1038/s41431-018-0292-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Jin SC, Homsy J, Zaidi S, Lu Q, Morton S, DePalma SR, Zeng X, Qi H, Chang W, Sierant MC, Hung W-C, Haider S, Zhang J, Knight J, Bjornson RD, Castaldi C, Tikhonoa IR, Bilguvar K, Mane SM, Sanders SJ, Mital S, Russell MW, Gaynor JW, Deanfield J, Giardini A, Porter GA, Jr, Srivastava D, Lo CW, Shen Y, Watkins WS, Yandell M, Yost HJ, Tristani-Firouzi M, Newburger JW, Roberts AE, Kim R, Zhao H, Kaltman JR, Goldmuntz E, Chung WK, Seidman JG, Gelb BD, Seidman CE, Lifton RP, Brueckner M. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nature Genetics. 2017;49:1593–1601. doi: 10.1038/ng.3970. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Jin SC, Dong W, Kundishora AJ, Panchagnula S, Moreno-De-Luca A, Furey CG, Allocco AA, Walker RL, Nelson-Williams C, Smith H, Dunbar A, Conine S, Lu Q, Zeng X, Sierant MC, Knight JR, Sullivan W, Duy PQ, DeSpenza T, Reeves BC, Karimy JK, Marlier A, Castaldi C, Tikhonova IR, Li B, Peña HP, Broach JR, Kabachelor EM, Ssenyonga P, Hehnly C, Ge L, Keren B, Timberlake AT, Goto J, Mangano FT, Johnston JM, Butler WE, Warf BC, Smith ER, Schiff SJ, Limbrick DD, Heuer G, Jackson EM, Iskandar BJ, Mane S, Haider S, Guclu B, Bayri Y, Sahin Y, Duncan CC, Apuzzo MLJ, DiLuna ML, Hoffman EJ, Sestan N, Ment LR, Alper SL, Bilguvar K, Geschwind DH, Günel M, Lifton RP, Kahle KT. Exome sequencing implicates genetic disruption of prenatal neuro-gliogenesis in sporadic congenital hydrocephalus. Nature Medicine. 2020a;26:1754–1765. doi: 10.1038/s41591-020-1090-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Jin SC, Lewis SA, Bakhtiari S, Zeng X, Sierant MC, Shetty S, Nordlie SM, Elie A, Corbett MA, Norton BY, van Eyk CL, Haider S, Guida BS, Magee H, Liu J, Pastore S, Vincent JB, Brunstrom-Hernandez J, Papavasileiou A, Fahey MC, Berry JG, Harper K, Zhou C, Zhang J, Li B, Zhao H, Heim J, Webber DL, Frank MSB, Xia L, Xu Y, Zhu D, Zhang B, Sheth AH, Knight JR, Castaldi C, Tikhonova IR, López-Giráldez F, Keren B, Whalen S, Buratti J, Doummar D, Cho M, Retterer K, Millan F, Wang Y, Waugh JL, Rodan L, Cohen JS, Fatemi A, Lin AE, Phillips JP, Feyma T, MacLennan SC, Vaughan S, Crompton KE, Reid SM, Reddihough DS, Shang Q, Gao C, Novak I, Badawi N, Wilson YA, McIntyre SJ, Mane SM, Wang X, Amor DJ, Zarnescu DC, Lu Q, Xing Q, Zhu C, Bilguvar K, Padilla-Lopez S, Lifton RP, Gecz J, MacLennan AH, Kruer MC. Mutations disrupting neuritogenesis genes confer risk for cerebral palsy. Nature Genetics. 2020b;52:1046–1056. doi: 10.1038/s41588-020-0695-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Kaplanis J, Samocha KE, Wiel L, Zhang Z, Arvai KJ, Eberhardt RY, Gallone G, Lelieveld SH, Martin HC, McRae JF, Short PJ, Torene RI, de Boer E, Danecek P, Gardner EJ, Huang N, Lord J, Martincorena I, Pfundt R, Reijnders MRF, Yeung A, Yntema HG, Deciphering Developmental Disorders Study. Vissers LELM, Juusola J, Wright CF, Brunner HG, Firth HV, FitzPatrick DR, Barrett JC, Hurles ME, Gilissen C, Retterer K. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature. 2020;586:757–762. doi: 10.1038/s41586-020-2832-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Genome Aggregation Database Consortium. Neale BM, Daly MJ, MacArthur DG. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Kielinen M, Rantala H, Timonen E, Linna SL, Moilanen I. Associated medical disorders and disabilities in children with autistic disorder: a population-based study. Autism. 2004;8:49–60. doi: 10.1177/1362361304040638. [DOI] [PubMed] [Google Scholar]

[bib32] Kilincaslan A, Mukaddes NM. Pervasive developmental disorders in individuals with cerebral palsy. Developmental Medicine and Child Neurology. 2009;51:289–294. doi: 10.1111/j.1469-8749.2008.03171.x. [DOI] [PubMed] [Google Scholar]

[bib33] Krumm N, Turner TN, Baker C, Vives L, Mohajeri K, Witherspoon K, Raja A, Coe BP, Stessman HA, He Z-X, Leal SM, Bernier R, Eichler EE. Excess of rare, inherited truncating mutations in autism. Nature Genetics. 2015;47:582–588. doi: 10.1038/ng.3303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Kume T, Deng KY, Winfrey V, Gould DB, Walter MA, Hogan BL. The forkhead/winged helix gene Mf1 is disrupted in the pleiotropic mouse mutation congenital hydrocephalus. Cell. 1998;93:985–996. doi: 10.1016/s0092-8674(00)81204-0. [DOI] [PubMed] [Google Scholar]

[bib35] Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics (Oxford, England) 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Lee PH. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell. 2019;179:1469–1482. doi: 10.1016/j.cell.2019.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won H-H, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib39] Lelieveld SH, Reijnders MRF, Pfundt R, Yntema HG, Kamsteeg E-J, de Vries P, de Vries BBA, Willemsen MH, Kleefstra T, Löhner K, Vreeburg M, Stevens SJC, van der Burgt I, Bongers EMHF, Stegmann APA, Rump P, Rinne T, Nelen MR, Veltman JA, Vissers LELM, Brunner HG, Gilissen C. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nature Neuroscience. 2016;19:1194–1196. doi: 10.1038/nn.4352. [DOI] [PubMed] [Google Scholar]

[bib40] Li J, Cai T, Jiang Y, Chen H, He X, Chen C, Li X, Shao Q, Ran X, Li Z, Xia K, Liu C, Sun ZS, Wu J. Genes with de novo mutations are shared by four neuropsychiatric disorders discovered from NPdenovo database. Molecular Psychiatry. 2016;21:290–297. doi: 10.1038/mp.2015.40. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, Hu Y, Chang D, Jin C, Dai W, He Q, Liu Z, Mukherjee S, Crane PK, Zhao H. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. American Journal of Human Genetics. 2017;101:939–964. doi: 10.1016/j.ajhg.2017.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] Lumenta CB, Skotarczak U. Long-term follow-up in 233 patients with congenital hydrocephalus. Child’s Nervous System. 1995;11:173–175. doi: 10.1007/BF00570260. [DOI] [PubMed] [Google Scholar]

[bib43] Munkin MK, Trivedi PK. Simulated maximum likelihood estimation of multivariate mixed‐Poisson regression models, with application. The Econometrics Journal. 1999;2:29–48. doi: 10.1111/1368-423X.00019. [DOI] [Google Scholar]

[bib44] Nguyen HT, Bryois J, Kim A, Dobbyn A, Huckins LM, Munoz-Manchado AB, Ruderfer DM, Genovese G, Fromer M, Xu X, Pinto D, Linnarsson S, Verhage M, Smit AB, Hjerling-Leffler J, Buxbaum JD, Hultman C, Sklar P, Purcell SM, Lage K, He X, Sullivan PF, Stahl EA. Integrated Bayesian analysis of rare exonic variants to identify risk genes for schizophrenia and neurodevelopmental disorders. Genome Medicine. 2017;9:114. doi: 10.1186/s13073-017-0497-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] Nguyen T-H, Dobbyn A, Brown RC, Riley BP, Buxbaum JD, Pinto D, Purcell SM, Sullivan PF, He X, Stahl EA. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. Nature Communications. 2020;11:2929. doi: 10.1038/s41467-020-16487-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Nguyen TH, Dobbyn A, Brown RC, Riley BP, Buxbaum J, Pinto D, Purcell SM, Sullivan PF, He X, Eli A. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. 7630c4bGitHub. 2021 doi: 10.1038/s41467-020-16487-z. https://github.com/hoangtn/mTADA [DOI] [PMC free article] [PubMed]

[bib47] Ning Z, Pawitan Y, Shen X. High-definition likelihood inference of genetic correlations across human complex traits. Nature Genetics. 2020;52:859–864. doi: 10.1038/s41588-020-0653-y. [DOI] [PubMed] [Google Scholar]

[bib48] O’Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, Levy R, Ko A, Lee C, Smith JD, Turner EH, Stanaway IB, Vernot B, Malig M, Baker C, Reilly B, Akey JM, Borenstein E, Rieder MJ, Nickerson DA, Bernier R, Shendure J, Eichler EE. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485:246–250. doi: 10.1038/nature10989. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nature Genetics. 2016;48:709–717. doi: 10.1038/ng.3570. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Rees E, Creeth HDJ, Hwu H-G, Chen WJ, Tsuang M, Glatt SJ, Rey R, Kirov G, Walters JTR, Holmans P, Owen MJ, O’Donovan MC. Schizophrenia, autism spectrum disorders and developmental disorders share specific disruptive coding mutations. Nature Communications. 2021;12:5353. doi: 10.1038/s41467-021-25532-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib51] Reid SM, Meehan EM, Arnup SJ, Reddihough DS. Intellectual disability in cerebral palsy: a population-based retrospective study. Developmental Medicine and Child Neurology. 2018;60:687–694. doi: 10.1111/dmcn.13773. [DOI] [PubMed] [Google Scholar]

[bib52] Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, Kosmicki JA, Rehnström K, Mallick S, Kirby A, Wall DP, MacArthur DG, Gabriel SB, DePristo M, Purcell SM, Palotie A, Boerwinkle E, Buxbaum JD, Cook EH, Jr, Gibbs RA, Schellenberg GD, Sutcliffe JS, Devlin B, Roeder K, Neale BM, Daly MJ. A framework for the interpretation of de novo mutation in human disease. Nature Genetics. 2014;46:944–950. doi: 10.1038/ng.3050. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An J-Y, Peng M, Collins R, Grove J, Klei L, Stevens C, Reichert J, Mulhern MS, Artomov M, Gerges S, Sheppard B, Xu X, Bhaduri A, Norman U, Brand H, Schwartz G, Nguyen R, Guerrero EE, Dias C, Autism Sequencing Consortium. iPSYCH-Broad Consortium. Betancur C, Cook EH, Gallagher L, Gill M, Sutcliffe JS, Thurm A, Zwick ME, Børglum AD, State MW, Cicek AE, Talkowski ME, Cutler DJ, Devlin B, Sanders SJ, Roeder K, Daly MJ, Buxbaum JD. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell. 2020;180:568–584. doi: 10.1016/j.cell.2019.12.036. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] Schorsch E. LDSC (LD SCore) v1.0.1. aa33296GitHub. 2020 https://github.com/bulik/ldsc

[bib55] Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. American Journal of Human Genetics. 2017;101:737–751. doi: 10.1016/j.ajhg.2017.09.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, Nguyen-Viet TA, Wedow R, Zacher M, Furlotte NA, 23andMe Research Team. Social Science Genetic Association Consortium. Magnusson P, Oskarsson S, Johannesson M, Visscher PM, Laibson D, Cesarini D, Neale BM, Benjamin DJ. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nature Genetics. 2018;50:229–237. doi: 10.1038/s41588-017-0009-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] Turner TN, Yi Q, Krumm N, Huddleston J, Hoekzema K, F Stessman HA, Doebley A-L, Bernier RA, Nickerson DA, Eichler EE. denovo-db: A compendium of human de novo variants. Nucleic Acids Research. 2017;45:D804–D811. doi: 10.1093/nar/gkw865. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nature Reviews. Genetics. 2012;13:565–575. doi: 10.1038/nrg3241. [DOI] [PubMed] [Google Scholar]

[bib59] Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib60] Wang Q, Yang C, Gelernter J, Zhao H. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. Human Genetics. 2015;134:1195–1209. doi: 10.1007/s00439-015-1596-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nature Communications. 2017;8:1–11. doi: 10.1038/s41467-017-01261-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Wei Q, Zhan X, Zhong X, Liu Y, Han Y, Chen W, Li B. A Bayesian framework for de novo mutation calling in parents-offspring trios. Bioinformatics (Oxford, England) 2015;31:1375–1381. doi: 10.1093/bioinformatics/btu839. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] Werling DM, Pochareddy S, Choi J, An JY, Sheppard B, Peng M, Li Z, Dastmalchi C, Santpere G, Sousa AMM, Tebbenkamp ATN, Kaur N, Gulden FO, Breen MS, Liang L, Gilson MC, Zhao X, Dong S, Klei L, Cicek AE, Buxbaum JD, Adle-Biassette H, Thomas JL, Aldinger KA, O’Day DR, Glass IA, Zaitlen NA, Talkowski ME, Roeder K, State MW, Devlin B, Sanders SJ, Sestan N. Whole-Genome and RNA Sequencing Reveal Variation and Transcriptomic Coordination in the Developing Human Prefrontal Cortex. Cell Reports. 2020;31:e107489. doi: 10.1016/j.celrep.2020.03.053. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] Willsey AJ, Fernandez TV, Yu D, King RA, Dietrich A, Xing J, Sanders SJ, Mandell JD, Huang AY, Richer P, Smith L, Dong S, Samocha KE, Tourette International Collaborative Genetics (TIC Genetics) Tourette Syndrome Association International Consortium for Genetics (TSAICG) Neale BM, Coppola G, Mathews CA, Tischfield JA, Scharf JM, State MW, Heiman GA. De Novo Coding Variants Are Strongly Associated with Tourette Disorder. Neuron. 2017;94:486–499. doi: 10.1016/j.neuron.2017.04.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] Willsey AJ, Morris MT, Wang S, Willsey HR, Sun N, Teerikorpi N, Baum TB, Cagney G, Bender KJ, Desai TA, Srivastava D, Davis GW, Doudna J, Chang E, Sohal V, Lowenstein DH, Li H, Agard D, Keiser MJ, Shoichet B, von Zastrow M, Mucke L, Finkbeiner S, Gan L, Sestan N, Ward ME, Huttenhain R, Nowakowski TJ, Bellen HJ, Frank LM, Khokha MK, Lifton RP, Kampmann M, Ideker T, State MW, Krogan NJ. The Psychiatric Cell Map Initiative: A Convergent Systems Biological Approach to Illuminating Key Molecular Pathways in Neuropsychiatric Disorders. Cell. 2018;174:505–520. doi: 10.1016/j.cell.2018.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] Zaidi S, Choi M, Wakimoto H, Ma L, Jiang J, Overton JD, Romano-Adesman A, Bjornson RD, Breitbart RE, Brown KK, Carriero NJ, Cheung YH, Deanfield J, DePalma S, Fakhro KA, Glessner J, Hakonarson H, Italia MJ, Kaltman JR, Kaski J, Kim R, Kline JK, Lee T, Leipzig J, Lopez A, Mane SM, Mitchell LE, Newburger JW, Parfenov M, Pe’er I, Porter G, Roberts AE, Sachidanandam R, Sanders SJ, Seiden HS, State MW, Subramanian S, Tikhonova IR, Wang W, Warburton D, White PS, Williams IA, Zhao H, Seidman JG, Brueckner M, Chung WK, Gelb BD, Goldmuntz E, Seidman CE, Lifton RP. De novo mutations in histone-modifying genes in congenital heart disease. Nature. 2013;498:220–223. doi: 10.1038/nature12141. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] Zaidi S, Brueckner M. Genetics and Genomics of Congenital Heart Disease. Circulation Research. 2017;120:923–940. doi: 10.1161/CIRCRESAHA.116.309140. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] Zhang Y, Cheng Y, Jiang W, Ye Y, Lu Q, Zhao H. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Briefings in Bioinformatics. 2021a;22:bbaa442. doi: 10.1093/bib/bbaa442. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib69] Zhang Y, Lu Q, Ye Y, Huang K, Liu W, Wu Y, Zhong X, Li B, Yu Z, Travers BG, Werling DM, Li JJ, Zhao H. SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biology. 2021b;22:1–30. doi: 10.1186/s13059-021-02478-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

