Significance
Essential genes (EGs) are necessary for the survival and development of an organism. Our study investigates the role of EGs in autism spectrum disorder (ASD). Using a comprehensive catalog of 3,915 mammalian EGs, we show both an elevated burden of damaging mutations in EGs in ASD probands and an enrichment of EGs among known ASD risk genes. Moreover, analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Overall, we provide evidence that genes essential for survival and fitness also contribute to ASD risk and to the disruption of normal social behavior.
Keywords: essential genes, mouse knockouts, mutational burden, autism spectrum disorder, coexpression modules
Abstract
Autism spectrum disorder (ASD) is a heterogeneous, highly heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior. It is estimated that hundreds of genes contribute to ASD. We asked if genes with a strong effect on survival and fitness contribute to ASD risk. Human orthologs of genes with an essential role in pre- and postnatal development in the mouse [essential genes (EGs)] are enriched for disease genes and under strong purifying selection relative to human orthologs of mouse genes with a known nonlethal phenotype [nonessential genes (NEGs)]. This intolerance to deleterious mutations, commonly observed haploinsufficiency, and the importance of EGs in development suggest a possible cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders. With a comprehensive catalog of 3,915 mammalian EGs, we provide compelling evidence for a stronger contribution of EGs to ASD risk compared with NEGs. By examining the exonic de novo and inherited variants from 1,781 ASD quartet families, we show a significantly higher burden of damaging mutations in EGs in ASD probands compared with their non-ASD siblings. The analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Finally, we suggest a high-priority list of 29 EGs with potential ASD risk as targets for future functional and behavioral studies. Overall, we show that large-scale studies of gene function in model organisms provide a powerful approach for prioritization of genes and pathogenic variants identified by sequencing studies of human disease.
Autism spectrum disorder (ASD) is a heterogeneous, heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior (1, 2). The highly polygenic nature of ASD (3–5) suggests that analysis of the full spectrum of sequence variants in hundreds of genes will be necessary for a deeper understanding of disrupted neuronal function. Prioritization of ASD risk genes initially focused on known pathways with recognized relevance to the pathogenesis of ASD, such as synaptic function and neuronal development (6). However, combined analyses of de novo, inherited, and case–control variation in over 2,500 ASD parent–child nuclear families identified around 100 genes contributing to ASD risk (7–9), converging on pathways implicated in transcriptional regulation and chromatin remodeling in addition to synaptic function.
The main challenge in understanding the genetic architecture of ASD is the need to study the interplay between variants with a large effect (for example, recurrent de novo variants) and a background of variants with an intermediate effect that nevertheless disrupt proper neuronal development. Essential genes (EGs), or genes that are necessary for the successful completion of pre- and postnatal development, are prime candidates for the source of this background load of variants with a cumulative intermediate effect. EGs are highly enriched for human disease genes and are under strong purifying selection (10–14). Beyond their intolerance to loss-of-function and other deleterious mutations, the functional impact of EGs is reflected in the haploinsufficiency commonly observed in heterozygous mutants (11, 15). EGs also help define a “minimal gene set” (16, 17) and tend to play important roles in protein interaction networks (18). Therefore, EGs may be involved in rate-limiting steps that affect a range of disease pathways (19).
Recently, three large-scale screens (gene trap and CRISPR-Cas9) assessed the effect of single-gene mutations on the viability or survival of haploid human cancer cell lines (“cell-based essentiality”) (20–22). These studies identified an overlapping core set of genes that were essential in the majority of cell lines tested (n = 956), although a subset of genes were essential only in specific cell lines. In an alternative and complementary approach, we assembled a catalog of human orthologs of EGs in the mouse (n = 3,326) (14) based on the organismal-level phenotypes of loss-of-function mouse mutants from the Mouse Genome Informatics (MGI) database (23) and the International Mouse Phenotyping Consortium (IMPC) web portal (24). According to these data, homozygous loss-of-function mutations in 3,326 genes lead to prenatal or preweaning lethality, with a significant overlap between the core set of human cell EGs and human orthologs of EGs in the mouse (14). These studies suggest that roughly 30% (∼6,000) of protein-coding genes are essential for pre- and postnatal survival (14, 25).
A deeper understanding of the mutational spectrum of EGs in a neurodevelopmental disorder such as ASD is important because EGs are less likely to be redundant, are more likely to have functional consequences when mutated, and may produce a gradation of phenotypes (25). Our previous work reported an enrichment of EGs among genes with de novo mutations in ASD patients (11). Several groups reported an enrichment of de novo and rare inherited single-nucleotide loss-of-function variants in ASD probands (8, 26), and others reported a depletion of damaging mutations in ASD risk genes in population controls (12, 27, 28). In this report, we compiled, to our knowledge, the most comprehensive list of human EGs and extended the analysis to both de novo and inherited damaging variants in 1,781 ASD families. In addition to disease status, we examined the effect of damaging variants in EGs on ASD-related traits, such as social skill measurements in 2,348 ASD probands. Finally, we performed coexpression analysis of EGs in the developing human brain to identify clusters of interacting EGs that contribute to ASD risk and to suggest ASD candidate genes.
Results
To identify the most comprehensive set of EGs in mammals, we combined the set of human orthologs of EGs in the mouse (n = 3,326) (14) with a set of human “core EGs” (n = 956) found to be essential in cell-based assays (20–22). Based on a significant overlap between mouse and human EGs (14), we expanded our original set of 3,326 EGs by adding 589 nonoverlapping EGs identified only in human cell lines, for a total of 3,915 EGs (SI Materials and Methods and Dataset S1). In subsequent analyses, we compared the features and genetic variation of these EGs with those of 4,919 human orthologs of genes with reported nonlethal phenotypes in the mouse [nonessential genes (NEGs)].
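The catalog construction above is a set union followed by a three-way classification. The sketch below is illustrative only. It uses placeholder gene names, and it assumes that a gene essential by either criterion is never counted as an NEG, a rule the text does not state explicitly. With the counts reported above, the union has 3,326 + 589 = 3,915 genes, which implies that 956 − 589 = 367 of the core cell EGs were already in the mouse-derived set.

```python
def build_catalog(mouse_egs, cell_core_egs, mouse_nonlethal):
    """Combine gene sets into EG and NEG catalogs (illustrative sketch).

    mouse_egs       -- human orthologs of mouse genes with lethal knockouts
    cell_core_egs   -- core essential genes from human cell-line screens
    mouse_nonlethal -- human orthologs of mouse genes with nonlethal phenotypes
    Returns (egs, negs, cell_only).
    """
    egs = set(mouse_egs) | set(cell_core_egs)
    # Assumption: a gene essential by either criterion is not counted as an NEG.
    negs = set(mouse_nonlethal) - egs
    # Genes added only because of the cell-based screens.
    cell_only = set(cell_core_egs) - set(mouse_egs)
    return egs, negs, cell_only


def essentiality(gene, egs, negs):
    """Label a gene as 'EG', 'NEG', or 'unknown' (phenotypically uncharacterized)."""
    if gene in egs:
        return "EG"
    if gene in negs:
        return "NEG"
    return "unknown"


# Toy example with placeholder gene names (not real data).
egs, negs, cell_only = build_catalog({"G1", "G2", "G3"}, {"G3", "G4"}, {"G4", "G5", "G6"})
print(sorted(egs), sorted(negs), sorted(cell_only))
# ['G1', 'G2', 'G3', 'G4'] ['G5', 'G6'] ['G4']
```

The "unknown" label corresponds to the phenotypically uncharacterized genes analyzed separately in Table S1.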
Homozygous loss-of-function mutations in EGs lead to lethality (or miscarriage in humans) and, as such, cannot contribute to disease. Although we and others reported a depletion of loss-of-function mutations in EGs in humans (11, 12, 14), heterozygous loss-of-function mutations or other “milder” alleles in EGs may contribute to both dominant and recessive diseases. We illustrated this point using a catalog of disease-linked genes in Online Mendelian Inheritance in Man (29) (SI Materials and Methods). Relative to NEGs, EGs were enriched among 1,000 genes underlying dominant diseases (odds ratio = 1.95, P value = 3.17 × 10−19; two-sided Fisher’s exact test) and among 1,645 genes underlying recessive diseases (odds ratio = 1.52, P value = 4.94 × 10−11; two-sided Fisher’s exact test) (Fig. 1A). The stronger enrichment of EGs among genes underlying dominant diseases suggests that dominant negative alleles and haploinsufficiency play an important role. We provide multiple lines of evidence for a higher probability of haploinsufficiency in EGs (Fig. 1A and SI Materials and Methods). First, using the systematically rated dosage-sensitive genes from ClinGen (30), we found that EGs were significantly enriched compared with NEGs and that the level of EG enrichment increased with the level of evidence supporting the dosage sensitivity of rated genes (odds ratio = 3.94, P value = 5.07 × 10−20 for “sufficient evidence”; odds ratio = 5.26, P value = 7.08 × 10−5 for “some evidence”; odds ratio = 2.52, P value = 0.0106 for “little evidence”; odds ratio = 1.14, P value = 0.608 for “not dosage sensitive”; two-sided Fisher’s exact test). Second, extending the earlier findings of Georgi et al. (11), we confirmed the enrichment of EGs relative to NEGs among 262 human haploinsufficient genes (31) using the updated EG and NEG lists (183 EGs vs. 62 NEGs; P value = 1.64 × 10−22, odds ratio = 3.84; two-sided Fisher’s exact test).
Third, EGs are significantly overrepresented among 313 human orthologs of mouse genes whose heterozygous alleles are associated with mutant phenotypes in MGI (23) (odds ratio = 3.43, P value = 2.74 × 10−23; two-sided Fisher’s exact test). Fourth, using two genome-wide prediction models of haploinsufficient genes in the human genome (32, 33), we observed that EGs have a significantly higher probability of haploinsufficiency than NEGs (P value < 2.2 × 10−16 for both models; two-sided Wilcoxon rank-sum test) (Fig. 1 B and C and SI Materials and Methods). Based on our finding that EGs linked to Mendelian disease are overwhelmingly dosage-sensitive, we explored the possibility that a cumulative effect of pathogenic variants in multiple EGs may underlie the genetic basis of a complex disease with early postnatal onset, such as ASD.
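The EG vs. NEG enrichment tests above are Fisher's exact tests on 2 × 2 tables. The following minimal sketch is written in plain Python so that it is self-contained. It assumes the table layout rows = in gene set / not in gene set and columns = EG / NEG. It reports the sample odds ratio (ad/bc), which is also what `scipy.stats.fisher_exact` reports. R's `fisher.test` reports a conditional maximum-likelihood estimate instead, so the odds ratios may differ slightly depending on the tool the authors used. If the EG and NEG totals (3,915 and 4,919) are used as column margins, the haploinsufficient-gene counts give a sample odds ratio of about 3.84, matching the value reported above.

```python
import math


def _log_comb(n, k):
    """Natural log of C(n, k), computed with lgamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = in gene set / not in gene set; columns = EG / NEG.
    Returns (sample odds ratio, P value). 'greater' tests for enrichment of EGs.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):
        # Hypergeometric probability of x EGs in the gene set, with all margins fixed.
        return math.exp(_log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1))

    if alternative == "greater":
        p = sum(prob(x) for x in range(a, hi + 1))
    elif alternative == "two-sided":
        p_obs = prob(a)
        # Sum all tables at least as unlikely as the observed one (small relative tolerance).
        p = sum(px for px in (prob(x) for x in range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return odds_ratio, min(1.0, p)


# Haploinsufficient genes: 183 EGs and 62 NEGs in the set; 3,915 EGs and 4,919 NEGs overall.
or_hi, p_hi = fisher_exact_2x2(183, 62, 3915 - 183, 4919 - 62)
print(round(or_hi, 2))  # 3.84
```

The one-sided tests used later for the TADA gene sets correspond to `alternative="greater"` in this sketch.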
Fig. 1.
Haploinsufficiency of EGs. (A) For each class of genes with different essentiality status (EG in red, NEG in turquoise, and unknown in gray), the proportion of genes in each gene set of interest is plotted in Left. Dosage-sensitive genes from ClinGen (30) were classified into five categories (1, sufficient evidence; 2, some evidence; 3, little evidence; 4, no evidence; and 5, not sensitive/recessive). A two-sided Fisher’s exact test was performed to assess the enrichment of EGs vs. NEGs, and the P values are indicated. The odds ratios for enrichment of EGs compared with NEGs and their 95% confidence intervals are plotted in Right. OMIM, Online Mendelian Inheritance in Man (29). (B and C) Histograms and estimated density curves showing the distribution of (B) the Haploinsufficiency Score (HIS) (32) and (C) the Genome-Wide Haploinsufficiency Score (GHIS) (33) across three gene sets: EGs (red), NEGs (turquoise), and all protein-coding genes (56) (gray). EGs have a significantly higher probability of haploinsufficiency than NEGs (P value < 2.2 × 10−16 for both models; two-sided Wilcoxon rank-sum test).
To address a possible cumulative effect of variants in EGs on ASD in a larger cohort of 1,781 ASD quartet families (1,781 probands and 1,781 siblings) from the Simons Simplex Collection (34), we obtained de novo and rare inherited mutations from the exome sequencing data of these families (8, 26). We examined the individual mutational burden, defined as the number of de novo loss-of-function (dnLoF), de novo nonsynonymous damaging (dnNSD), and inherited rare damaging (inhRD) mutations per individual (Fig. S1, SI Materials and Methods, and Datasets S2–S4). On average, an ASD proband carried 0.06 dnLoF, 0.21 dnNSD, and 10.74 inhRD mutations in EGs. The mutational burden in EGs was significantly elevated in ASD probands compared with unaffected siblings for all three classes of variants (P value = 4.75 × 10−7 for dnLoF, P value = 3.41 × 10−4 for dnNSD, and P value = 0.017 for inhRD; one-sided Wilcoxon signed-rank test) (Fig. 2A and Table S1). In contrast, no significant difference in mutational burden in NEGs was observed (P value = 0.10 for dnLoF, P value = 0.069 for dnNSD, and P value = 0.75 for inhRD) (Table S1). Interestingly, the 10,823 genes currently assigned as neither EG nor NEG (i.e., phenotypically uncharacterized in mouse knockouts and human cell-based assays) showed a moderately elevated burden of dnLoF variants, but not of dnNSD or inhRD variants, in ASD probands (P value = 0.0042) (Table S1). Notably, the EG effect sizes for each variant type are consistent with the expected severity of that variant type. De novo mutations, which are expected to have a larger functional impact, display the strongest difference between ASD probands and unaffected siblings (Cohen’s d = 0.117 for dnLoF and 0.079 for dnNSD). In contrast, inherited mutations are expected to have a moderate functional impact, and a smaller difference is observed between probands and siblings (Cohen’s d = 0.042 for inhRD).
Although we observed a marginally increased burden of dnLoF and dnNSD mutations in EGs in female (n = 325) compared with male (n = 2,043) probands (Table S2), analysis of families divided by the gender of proband–sibling pairs (female–female, male–female, female–male, and male–male) showed that gender bias does not underlie the observed differences in mutational burden between probands and siblings (Table S3).
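The burden and effect-size definitions above can be written down directly. The sketch below assumes a hypothetical variant record format of `(individual_id, gene, variant_class)` tuples; the Simons Simplex Collection files use their own formats. It follows the table footnotes: paired Cohen's d is the mean of proband minus sibling differences divided by the SD of those differences (Table S1), and the pooled-SD version is used for the male vs. female comparison (Table S2). The paired one-sided test itself could be run with `scipy.stats.wilcoxon(proband, sibling, alternative="greater")`. How the authors handled the many zero differences (for example, through `zero_method`) is not stated here, so treat that choice as an open assumption.

```python
import math
from collections import Counter
from statistics import mean, stdev


def individual_burden(variants, gene_set, variant_class):
    """Count qualifying variants per individual.

    variants: iterable of (individual_id, gene, variant_class) tuples (hypothetical format).
    Individuals with no qualifying variant get 0 when looked up in the returned Counter.
    """
    return Counter(ind for ind, gene, cls in variants
                   if cls == variant_class and gene in gene_set)


def paired_cohens_d(proband, sibling):
    """Mean of paired differences divided by the sample SD of those differences."""
    diffs = [p - s for p, s in zip(proband, sibling)]
    return mean(diffs) / stdev(diffs)


def pooled_cohens_d(x, y):
    """Difference in means divided by the pooled sample SD (two independent groups)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Toy data (placeholder IDs and genes): count dnLoF variants in a small "EG" set.
variants = [("p1", "A", "dnLoF"), ("p1", "B", "dnLoF"), ("s1", "A", "dnNSD"), ("p1", "X", "dnLoF")]
burden = individual_burden(variants, {"A", "B"}, "dnLoF")
families = [("p1", "s1")]
proband_counts = [burden[p] for p, _ in families]
sibling_counts = [burden[s] for _, s in families]
print(proband_counts, sibling_counts)  # [2] [0]
```

In a real analysis, the per-family counts from all 1,781 families would be passed to `paired_cohens_d`. The toy example has only one family, which is too few for an SD.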
Fig. S1.
Variant filtering steps for the mutational burden analysis. EVS, Exome Variant Server (release ESP 6500); MAF, minor allele frequency.
Fig. 2.
Assessment of the contribution of EGs to ASD risk. (A) Individual mutational burden analysis in 1,781 pairs of ASD probands and unaffected siblings (Table S1). The analyses were performed separately for 3,915 EGs (red) and 4,919 NEGs (turquoise). The individual mutational burden is defined as the number of dnLoF, dnNSD, and inhRD mutations per individual. Effect sizes were measured by Cohen’s d, defined as the difference between the two means divided by the SD of the paired differences. The estimated 95% confidence intervals of effect sizes are plotted (SI Materials and Methods). P values were obtained from a one-sided Wilcoxon signed-rank test. *P value < 0.05. (B) ASD candidate genes categorized by SFARI gene scores (S, syndromic; 1, high confidence; 2, strong candidate; 3, suggestive evidence; 4, minimal evidence; 5, hypothesized; and 6, not supported) (37) and by essentiality status (EG in red, NEG in turquoise, and unknown in gray). ***P value < 0.001 (two-sided Fisher’s exact test, EG vs. NEG). (C) The distribution of TADA FDR q values of EGs and NEGs. The TADA FDR q value evaluates ASD association based on combined evidence from de novo SNVs and small deletions, rare inherited variants, and case–control variants (9). The observed negative log10 (q) values of 3,915 EGs (red) and 4,919 NEGs (turquoise) are compared with their expected counterparts under the null hypothesis. The dashed lines indicate the FDR thresholds (FDR = 0.1 in red and FDR = 0.5 in blue) for identification of ASD risk genes. The 95% confidence intervals of the expected negative log10 (q) values are shaded in gray.
Table S1.
Mutational burden analysis in 1,781 ASD quartet families
| Variant type and gene set | No. of genes | Proband average | Sibling average | Effect size | Effect size 95% CI low | Effect size 95% CI high | P value | 
| dnLoF | |||||||
| EG: this work | 3,915 | 0.0640 | 0.0286 | 0.1170 | 0.0715 | 0.1596 | 4.75 × 10−7 | 
| EG: Dickinson et al. (14) | 3,326 | 0.0595 | 0.0253 | 0.1176 | 0.0730 | 0.1603 | 4.16 × 10−7 | 
| EG: Georgi et al. (11) | 2,472 | 0.0494 | 0.0168 | 0.1254 | 0.0820 | 0.1671 | 7.82 × 10−8 | 
| Human cell EGs (20, 21, 22) | 956 | 0.0079 | 0.0056 | 0.0193 | −0.0269 | 0.0637 | 0.2118 | 
| NEG | 4,919 | 0.0387 | 0.0309 | 0.0300 | −0.0157 | 0.0774 | 0.1028 | 
| Phenotypically uncharacterized genes | 10,823 | 0.0752 | 0.0533 | 0.0606 | 0.0143 | 0.1084 | 0.004257 | 
| dnNSD | |||||||
| EG: this work | 3,915 | 0.2061 | 0.1589 | 0.0794 | 0.0324 | 0.1274 | 3.41 × 10−4 | 
| EG: Dickinson et al. (14) | 3,326 | 0.1875 | 0.1376 | 0.0892 | 0.0429 | 0.1353 | 8.13 × 10−5 | 
| EG: Georgi et al. (11) | 2,472 | 0.1505 | 0.1050 | 0.0895 | 0.0435 | 0.1366 | 7.36 × 10−5 | 
| Human cell EGs (20, 21, 22) | 956 | 0.0371 | 0.0365 | 0.0021 | −0.0435 | 0.0499 | 0.4696 | 
| NEG | 4,919 | 0.1611 | 0.1404 | 0.0374 | −0.0100 | 0.0827 | 0.0691 | 
| Phenotypically uncharacterized genes | 10,823 | 0.2471 | 0.2791 | −0.0419 | −0.0884 | 0.0044 | 0.9636 | 
| inhRD | |||||||
| EG: this work | 3,915 | 10.7428 | 10.6042 | 0.0420 | −0.0041 | 0.0887 | 0.01688 | 
| EG: Dickinson et al. (14) | 3,326 | 9.3257 | 9.2358 | 0.0287 | −0.0185 | 0.0757 | 0.04139 | 
| EG: Georgi et al. (11) | 2,472 | 7.0236 | 6.9163 | 0.0402 | −0.0053 | 0.0867 | 0.02622 | 
| Human cell EGs (20, 21, 22) | 956 | 2.3745 | 2.3779 | −0.0022 | −0.0485 | 0.0435 | 0.5935 | 
| NEG | 4,919 | 12.7816 | 12.8355 | −0.0150 | −0.0618 | 0.0308 | 0.7456 | 
| Phenotypically uncharacterized genes | 10,823 | 20.3947 | 20.4559 | −0.0133 | −0.0592 | 0.0342 | 0.5404 | 
Effect sizes were measured by Cohen's d, defined as the difference between the two means divided by the SD of the paired differences. P values were obtained from a one-sided Wilcoxon signed-rank test. 95% CI, 95% confidence interval.
Table S2.
Difference in individual mutational burden between male and female probands
| Variant type and gene set | Female proband average | Male proband average | Effect size | P value | 
| dnLoF | ||||
| EG | 0.0862 | 0.0597 | 0.1042 | 0.0355* | 
| NEG | 0.0462 | 0.0357 | 0.0551 | 0.1782 | 
| dnNSD | ||||
| EG | 0.2400 | 0.1948 | 0.1014 | 0.0388* | 
| NEG | 0.2000 | 0.1596 | 0.0993 | 0.0742 | 
| inhRD | ||||
| EG | 11.0523 | 10.9633 | 0.0151 | 0.4711 | 
| NEG | 13.2677 | 13.0113 | 0.0360 | 0.5271 | 
Effect sizes were measured by Cohen's d, defined as the difference between the two means divided by the pooled SD.
*Statistically significant P value (P value < 0.05).
Table S3.
Mutational burden analysis in 1,781 ASD quartet families (dissected by genders of proband–sibling pairs)
| Variant type, gene set, and proband gender | Sibling gender | No. of families | Proband average | Sibling average | Effect size | P value | 
| dnLoF | ||||||
| EG | ||||||
| All | All | 1,781 | 0.0640 | 0.0286 | 0.1170 | 4.75 × 10−7 | 
| Female | Male | 101 | 0.0891 | 0.0099 | 0.2588 | 0.0067 | 
| Male | Female | 826 | 0.0593 | 0.0327 | 0.0893 | 0.0053 | 
| Male | Male | 732 | 0.0615 | 0.0246 | 0.1228 | 0.0005 | 
| Female | Female | 122 | 0.0902 | 0.0410 | 0.1461 | 0.0600 | 
| NEG | ||||||
| All | All | 1,781 | 0.0387 | 0.0309 | 0.0300 | 0.1028 | 
| Female | Male | 101 | 0.0396 | 0.0297 | 0.0374 | 0.3884 | 
| Male | Female | 826 | 0.0412 | 0.0266 | 0.0558 | 0.0549 | 
| Male | Male | 732 | 0.0369 | 0.0314 | 0.0213 | 0.2838 | 
| Female | Female | 122 | 0.0328 | 0.0574 | 0.0818 | 0.8302 | 
| dnNSD | ||||||
| EG | ||||||
| All | All | 1,781 | 0.2061 | 0.1589 | 0.0794 | 0.0003 | 
| Female | Male | 101 | 0.2178 | 0.1683 | 0.0724 | 0.2392 | 
| Male | Female | 826 | 0.2094 | 0.1755 | 0.0552 | 0.0454 | 
| Male | Male | 732 | 0.1885 | 0.1270 | 0.1136 | 0.0013 | 
| Female | Female | 122 | 0.2787 | 0.2295 | 0.0725 | 0.2157 | 
| NEG | ||||||
| All | All | 1,781 | 0.1611 | 0.1404 | 0.0374 | 0.0691 | 
| Female | Male | 101 | 0.1881 | 0.1980 | 0.0155 | 0.5696 | 
| Male | Female | 826 | 0.1465 | 0.1477 | 0.0022 | 0.5515 | 
| Male | Male | 732 | 0.1667 | 0.1175 | 0.0904 | 0.0080 | 
| Female | Female | 122 | 0.2049 | 0.1803 | 0.0379 | 0.3817 | 
| inhRD | ||||||
| EG | ||||||
| All | All | 1,781 | 10.7428 | 10.6042 | 0.0420 | 0.0169 | 
| Female | Male | 101 | 10.3762 | 10.6436 | 0.0778 | 0.8260 | 
| Male | Female | 826 | 10.8341 | 10.7034 | 0.0401 | 0.1120 | 
| Male | Male | 732 | 10.5765 | 10.4372 | 0.0417 | 0.0449 | 
| Female | Female | 122 | 11.4262 | 10.9016 | 0.1619 | 0.0430 | 
| NEG | ||||||
| All | All | 1,781 | 12.7816 | 12.8355 | 0.0150 | 0.7456 | 
| Female | Male | 101 | 12.5050 | 13.0792 | 0.1398 | 0.9143 | 
| Male | Female | 826 | 12.8693 | 13.0182 | 0.0424 | 0.7802 | 
| Male | Male | 732 | 12.6134 | 12.5546 | 0.0165 | 0.5576 | 
| Female | Female | 122 | 13.4262 | 13.0820 | 0.0907 | 0.1327 | 
Effect sizes were measured by Cohen's d, defined as the difference between the two means divided by the SD of the paired differences. P values were obtained from a one-sided Wilcoxon signed-rank test.
To evaluate the effect of rare damaging mutations in EGs on ASD-associated traits, we used the available quantitative phenotype data on social and cognitive impairments in ∼2,500 ASD families from the Simons Simplex Collection (8, 26) (Dataset S2). As a measure of sociability, we used the total raw score from the Social Responsiveness Scale (SRS) (35), in which higher scores indicate greater social impairment. As cognitive measures, we used three intelligence quotient (IQ) scores (full-scale IQ, verbal IQ, and nonverbal IQ). As previously reported (36), SRS scores were unrelated to IQ, especially in subjects with IQ of 50 or higher (Fig. S2). In male probands, the mutational burden in EGs was positively correlated with the SRS total raw score (P value = 1.08 × 10−6; Poisson regression) (Table 1). The effect was not significant for NEGs (P value = 0.21). In female probands, the mutational burden in NEGs, but not in EGs, was negatively correlated with the SRS total raw score (P value = 0.085 for EGs and P value = 6.06 × 10−6 for NEGs). In addition, the mutational burden in both EGs and NEGs had a significant effect (P value < 2.2 × 10−16) on verbal and nonverbal IQ scores, with comparable effect sizes for EGs and NEGs (Table S4). These results suggest that, in ASD probands, deleterious variants in EGs contribute to decreased social skills in males, whereas deleterious variants in both EGs and NEGs are associated with decreased IQ.
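The trait associations above are Poisson regressions. The sketch below is a minimal version that assumes a single predictor (the individual's total rare damaging burden) and an intercept, with no covariates or offsets. The actual model specification is in the SI and may differ. It fits log E[SRS] = b0 + b1 × burden by Newton-Raphson, which coincides with IRLS for the canonical log link, and it reports a two-sided Wald P value for b1. In practice, `statsmodels.api.GLM(y, X, family=sm.families.Poisson())` would be the usual tool. As a consistency check, the Wald formula applied to the male EG row of Table 1 (estimate 0.001860, SE 0.000381) gives a P value of roughly 1 × 10−6, in line with the reported value.

```python
import math


def wald_p_two_sided(estimate, se):
    """Two-sided P value for a Wald z statistic (normal approximation)."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


def poisson_regression(x, y, max_iter=100, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson (equivalent to IRLS for the log link).

    x: predictor per individual (e.g., mutational burden); y: count response (e.g., SRS raw score).
    Returns (b0, b1, se_b1, two-sided Wald P value for b1).
    """
    def loglik(c0, c1):
        # Poisson log-likelihood, dropping the constant -sum(log(y_i!)).
        return sum(yi * (c0 + c1 * xi) - math.exp(c0 + c1 * xi) for xi, yi in zip(x, y))

    def score_and_info(c0, c1):
        mu = [math.exp(c0 + c1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        return g0, g1, i00, i01, i11

    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_info(b0, b1)
        det = i00 * i11 - i01 * i01  # positive whenever x is not constant
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        step, current = 1.0, loglik(b0, b1)
        # Step-halving: shrink the Newton step until the log-likelihood does not decrease.
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1) < current:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    _, _, i00, i01, i11 = score_and_info(b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # from the inverse Fisher information
    return b0, b1, se_b1, wald_p_two_sided(b1, se_b1)
```

A positive b1 for SRS corresponds to greater social impairment with higher burden. Negative coefficients for IQ, as in Table S4, correspond to lower scores with higher burden.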
Fig. S2.
Correlation between SRS and IQ. For 2,368 ASD probands from the Simons Simplex Collection, the Pearson correlation between SRS total raw scores and each of three IQ scores (full-scale IQ, verbal IQ, and nonverbal IQ) is plotted. The probands were divided by IQ score: (A, C, and E) IQ < 50 and (B, D, and F) IQ ≥ 50.
Table 1.
Relationship between individual mutational burden and SRS in ASD probands
| Group and gene set | Estimate | Standard error | P value | 
| 2,031 Male probands | |||
| EG (3,915 genes) | 0.001860 | 0.000381 | 1.08 × 10−6* | 
| NEG (4,919 genes) | 0.000407 | 0.000324 | 0.209 | 
| 317 Female probands | |||
| EG (3,915 genes) | −0.001511 | 0.000877 | 0.085 | 
| NEG (4,919 genes) | −0.003084 | 0.000682 | 6.04 × 10−6 | 
Coefficients for Poisson regression are shown, which model the relationship between SRS total raw score and individual burden of all rare damaging mutations (including dnLoF, dnNSD, and inhRD mutations).
*Statistically significant P value with a positive estimated effect (P value < 0.05; estimate > 0).
Table S4.
Relationship between individual mutational burden and IQ in ASD probands
| Trait and gene set | Estimate | SE | P value | 
| Verbal IQ | |||
| EG (3,915 genes) | −0.007279 | 0.000400 | <2.2 × 10−16 | 
| NEG (4,919 genes) | −0.005307 | 0.000383 | <2.2 × 10−16 | 
| Nonverbal IQ | |||
| EG (3,915 genes) | −0.007172 | 0.000336 | <2.2 × 10−16 | 
| NEG (4,919 genes) | −0.004906 | 0.000320 | <2.2 × 10−16 | 
Coefficients for Poisson regression are shown, which model the relationship between verbal/nonverbal IQ and individual burden of all rare damaging mutations (including dnLoF, dnNSD, and inhRD mutations).
To explore the overlap between EGs and known ASD genes, we first examined the essentiality status of ∼500 ASD candidate genes from the Simons Foundation Autism Research Initiative (SFARI) AutDB database (updated December 2015) (37) (Fig. 2B). Compared with NEGs, EGs were enriched among ASD candidates categorized as “syndromic” (category S: odds ratio = 3.95, P value = 0.0003; two-sided Fisher’s exact test), candidates with “high confidence” (category 1: odds ratio = 15.12, P value = 0.0004), and candidates with “suggestive evidence” (category 3: odds ratio = 2.14, P value = 0.0006). A nonsignificant trend toward enrichment of EGs was also observed for “strong candidates” (category 2: odds ratio = 1.62, P value = 0.21). We did not observe enrichment of EGs among candidate genes with weaker supporting evidence (categories 4–6).
To further address whether EGs contribute to ASD risk, we compared the strength of ASD association signals between EGs and NEGs using data from a recent comprehensive analysis of ASD genomic architecture (9). In that analysis, the transmission and de novo association (TADA) test (38) was used to evaluate ASD association based on combined evidence from de novo single-nucleotide variants (SNVs), de novo small deletions, and rare inherited variants from Simons Simplex Collection cohorts, as well as case–control data from Autism Sequencing Consortium (ASC) cohorts (39). EGs were significantly enriched compared with NEGs among 65 high-confidence TADA ASD genes [TADA false discovery rate (FDR) q value < 0.1] identified by Sanders et al. (9) (36 EGs vs. 15 NEGs; odds ratio = 3.03, P value = 1.82 × 10−4; one-sided Fisher’s exact test). Among a broader set of 441 “potential” TADA ASD genes (TADA FDR < 0.5), EGs were also enriched compared with NEGs (132 EGs vs. 117 NEGs; odds ratio = 1.43, P value = 0.00537). Furthermore, comparing the observed with the expected TADA FDR values revealed a strong deviation from the null distribution for EGs, especially for the 132 EGs with potential ASD association (TADA FDR < 0.5) (Fig. 2C). In contrast, NEGs were not enriched for association relative to the background expectation, suggesting that the association signals between EGs and ASD were stronger and less likely to be false positives than those of NEGs.
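The observed-versus-expected comparison in Fig. 2C is a quantile–quantile (QQ) construction. The sketch below pairs each gene set's observed −log10(q) values, sorted from largest to smallest, with expected values from uniform order statistics, where the i-th smallest of n values is expected near i/(n + 1). How the authors derived the null expectation for TADA q values is described in the SI and may differ, so this uniform null is an assumption. For the gray band, note that the i-th smallest of n independent uniforms follows a Beta(i, n − i + 1) distribution, whose 2.5% and 97.5% quantiles give a pointwise 95% interval. The `floor` parameter is a hypothetical guard against q values of exactly 0.

```python
import math


def qq_points(q_values, floor=1e-300):
    """Return (expected, observed) pairs of -log10 values, largest first.

    Assumption: under the null, the i-th smallest of n values is expected
    to be about i / (n + 1), as for uniform order statistics.
    """
    n = len(q_values)
    observed = sorted((-math.log10(max(q, floor)) for q in q_values), reverse=True)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def count_below(q_values, threshold):
    """Number of genes passing an FDR threshold, e.g., 0.1 (high confidence) or 0.5 (potential)."""
    return sum(q < threshold for q in q_values)
```

Computing `qq_points` separately for the EG and NEG q values and plotting both sets of points against the identity line reproduces the layout of Fig. 2C.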
We hypothesize that a cumulative effect of deleterious variants in several EGs, within the same pathway or across pathways, may underlie impaired brain development and an individual’s ASD risk. To identify clusters of potentially interacting genes, we evaluated the spatiotemporal expression of EGs and NEGs using RNA sequencing (RNA-seq) data from BrainSpan (40). We identified 41 coexpression modules with distinct expression patterns across 16 brain regions and 31 pre- and postnatal time points (Fig. S3 and SI Materials and Methods). The majority of EG-enriched modules (11 of 14; FDR < 0.1; two-sided Fisher’s exact test) (Fig. 3A, Fig. S3, and Table S5) exhibited an “early-expression” pattern, in which expression levels were higher at early fetal stages (starting from 8 postconceptual weeks) and gradually declined before birth. In contrast, the majority of the NEG-enriched modules (15 of 18) exhibited a “later-expression” pattern, in which expression levels were lower at early fetal stages and gradually increased until birth.
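The module expression trajectories in Fig. S3 and Fig. 3B are summarized by the first principal component of each module's expression profiles, often called a module eigengene. The sketch below is a minimal, dependency-free version. It assumes that each gene's profile is standardized across samples, that PC1 is found by power iteration, and that the sign is oriented toward the average standardized profile. The WGCNA R package provides `moduleEigengenes` for this purpose, but whether the authors used it, or which normalization they applied, is not stated in this section.

```python
import math


def standardize(row):
    """Center and scale one gene's expression profile across samples."""
    m = sum(row) / len(row)
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
    return [(v - m) / sd for v in row] if sd > 0 else [0.0] * len(row)


def module_eigengene(expr, n_iter=500, tol=1e-12):
    """First principal component (over samples) of a genes x samples matrix, unit length."""
    z = [standardize(r) for r in expr]
    n = len(z[0])
    v = next((r[:] for r in z if any(r)), None)  # start from a non-constant gene
    if v is None:
        raise ValueError("all genes are constant")
    for _ in range(n_iter):
        # One power-iteration step: w = Z^T (Z v), then normalize.
        scores = [sum(a * b for a, b in zip(r, v)) for r in z]
        w = [sum(s * r[j] for s, r in zip(scores, z)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # A principal component's sign is arbitrary; align it with the average standardized profile.
    avg = [sum(r[j] for r in z) / len(z) for j in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v
```

Applied to each module's genes × (region, time point) matrix, the returned vector traces an early- or late-expression trajectory of the kind shown in Fig. 3B.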
Fig. S3.
Expression profiles of 41 coexpression modules in the brain. Expression profiles of genes from 41 coexpression modules based on the RNA-seq data from BrainSpan (40) are shown. The y axis represents the first principal component of the module-level expression profiles in each brain tissue type. The x axis represents developmental stages in chronological order (Fig. 3B shows the labels of the time points). The vertical dashed lines indicate the time of birth. The total number of protein-coding genes in each module (n) is indicated along with the module name.
Fig. 3.
Coexpression analysis of EGs in the developing human brain. (A) Coexpression modules enriched in EGs and NEGs. The upper barplot displays the level of enrichment of EGs vs. NEGs for each of 41 coexpression modules based on BrainSpan RNA-seq data. The lower barplot displays the level of enrichment (green) of 441 potential ASD genes among EGs from the 41 coexpression modules. The heights of the bars represent negative log10 (FDR q value). The upper and lower red dashed lines indicate the FDR q value threshold of 0.1. (B) The brain expression trajectories of genes from three coexpression modules implicated in ASD. The expression trajectories in the brain for 1,601 genes in M01 (orange), 1,150 genes in M02 (purple), and 347 genes in M16 (green) were fitted based on the first principal components of the module-level expression profiles (y axis). The x axis represents developmental stages in chronological order. The vertical dashed line indicates the time of birth. pcw, Postconceptual week. (C) Coexpression network of 973 EGs from M01 (orange), M02 (purple), and M16 (green). Edges indicate coexpression between gene pairs.
Table S5.
Coexpression modules in the developing brain
| Module | No. of genes | Expression pattern | Enrichment | No. of EGs | No. of NEGs | Odds ratio (EG/NEG) | FDR q value (EG/NEG) | No. of potential ASD genes | Odds ratio (ASD genes) | FDR q value (ASD genes) | 
| M01 | 1,601 | Early expressed | EG enriched | 501 | 251 | 2.73 | 7.38 × 10−38* | 55 | 1.52 | 0.004* | 
| M02 | 1,150 | Early expressed | EG enriched | 367 | 208 | 2.34 | 2.80 × 10−22* | 53 | 2.13 | 2.58 × 10−6* | 
| M03 | 1,054 | Mixed | NEG enriched | 204 | 340 | 0.74 | 9.67 × 10−4* | 18 | 0.72 | 0.934 | 
| M04 | 810 | Late expressed | NEG enriched | 122 | 326 | 0.45 | 3.19 × 10−14* | 19 | 1.00 | 0.529 | 
| M05 | 781 | Late expressed | NEG enriched | 156 | 239 | 0.81 | 0.0491* | 24 | 1.32 | 0.122 | 
| M06 | 702 | Late expressed | NEG enriched | 129 | 254 | 0.63 | 1.55 × 10−5* | 11 | 0.65 | 0.948 | 
| M07 | 663 | Early expressed | EG enriched | 251 | 141 | 2.32 | 1.23 × 10−15* | 8 | 0.50 | 0.989 | 
| M08 | 580 | Early expressed | EG enriched | 193 | 114 | 2.19 | 3.62 × 10−11* | 13 | 0.95 | 0.613 | 
| M09 | 559 | Late expressed | NEG enriched | 104 | 206 | 0.62 | 9.26 × 10−5* | 16 | 1.23 | 0.246 | 
| M10 | 503 | Early expressed | EG enriched | 126 | 114 | 1.40 | 0.0102* | 9 | 0.74 | 0.847 | 
| M11 | 457 | Late expressed | NEG enriched | 79 | 178 | 0.55 | 7.33 × 10−6* | 9 | 0.83 | 0.753 | 
| M12 | 420 | Late expressed | NEG enriched | 62 | 163 | 0.47 | 1.90 × 10−7* | 7 | 0.69 | 0.874 | 
| M13 | 418 | Late expressed | NEG enriched | 97 | 193 | 0.62 | 1.46 × 10−4* | 7 | 0.69 | 0.877 | 
| M14 | 370 | Late expressed | EG enriched | 81 | 58 | 1.77 | 0.00102* | 4 | 0.45 | 0.977 | 
| M15 | 368 | Mixed | EG enriched | 104 | 95 | 1.39 | 0.0251* | 5 | 0.57 | 0.934 | 
| M16 | 347 | Early expressed | EG enriched | 106 | 90 | 1.49 | 0.00570* | 20 | 2.57 | 2.80 × 10−4* | 
| M17 | 339 | Early expressed | EG enriched | 102 | 59 | 2.20 | 1.20 × 10−6* | 16 | 2.05 | 0.008 | 
| M18 | 306 | Late expressed | | 66 | 61 | 1.37 | 0.0874 | 5 | 0.67 | 0.861 | 
| M19 | 299 | Late expressed | NEG enriched | 31 | 118 | 0.32 | 1.81 × 10−9* | 2 | 0.28 | 0.994 | 
| M20 | 296 | Late expressed | NEG enriched | 51 | 91 | 0.70 | 0.0498* | 5 | 0.72 | 0.823 | 
| M21 | 291 | Early expressed | | 54 | 73 | 0.93 | 0.719 | 5 | 0.72 | 0.818 | 
| M22 | 278 | Early expressed | EG enriched | 83 | 25 | 4.24 | 6.17 × 10−12* | 2 | 0.29 | 0.991 | 
| M23 | 272 | Late expressed | NEG enriched | 41 | 84 | 0.61 | 0.0108* | 2 | 0.31 | 0.988 | 
| M24 | 258 | Early expressed | | 51 | 49 | 1.31 | 0.189 | 11 | 1.84 | 0.047 | 
| M25 | 244 | Early expressed | EG enriched | 86 | 49 | 2.23 | 6.66 × 10−6* | 11 | 1.98 | 0.031 | 
| M26 | 239 | Early expressed | EG enriched | 79 | 18 | 5.61 | 8.28 × 10−14* | 4 | 0.70 | 0.821 | 
| M27 | 213 | Late expressed | NEG enriched | 45 | 85 | 0.66 | 0.0261* | 6 | 1.19 | 0.399 | 
| M28 | 197 | Late expressed | | 32 | 41 | 0.98 | 1 | 1 | 0.21 | 0.991 | 
| M29 | 193 | Late expressed | NEG enriched | 33 | 69 | 0.60 | 0.0158* | 2 | 0.43 | 0.943 | 
| M30 | 188 | Late expressed | NEG enriched | 11 | 43 | 0.32 | 2.92 × 10−4* | 3 | 0.69 | 0.808 | 
| M31 | 187 | Late expressed |  | 41 | 64 | 0.80 | 0.323 | 6 | 1.38 | 0.279 | 
| M32 | 172 | Late expressed | NEG enriched | 24 | 60 | 0.50 | 0.00388* | 3 | 0.75 | 0.766 | 
| M33 | 170 | Late expressed |  | 41 | 40 | 1.29 | 0.263 | 4 | 1.00 | 0.568 | 
| M34 | 163 | Mixed | EG enriched | 48 | 22 | 2.76 | 5.06 × 10−5* | 2 | 0.51 | 0.904 | 
| M35 | 151 | Mixed | NEG enriched | 21 | 48 | 0.55 | 0.0207* | 6 | 1.73 | 0.147 | 
| M36 | 151 | Late expressed |  | 22 | 44 | 0.63 | 0.0815 | 3 | 0.82 | 0.707 | 
| M37 | 146 | Early expressed | EG enriched | 38 | 9 | 5.35 | 3.81 × 10−7* | 2 | 0.57 | 0.862 | 
| M38 | 128 | Late expressed | NEG enriched | 17 | 63 | 0.34 | 2.11 × 10−5* | 4 | 1.36 | 0.347 | 
| M39 | 115 | Early expressed |  | 29 | 42 | 0.87 | 0.632 | 4 | 1.47 | 0.298 | 
| M40 | 99 | Unknown |  | 4 | 13 | 0.39 | 0.0926 | 1 | 0.45 | 0.890 | 
| M41 | 74 | Unknown | NEG enriched | 4 | 16 | 0.31 | 0.0400* | 1 | 0.59 | 0.816 | 
*Statistically significant P values.
We found that the EGs in three modules (M01, M02, and M16) were significantly enriched (FDR < 0.1; one-sided Fisher's exact test) for the 441 potential ASD genes identified by TADA (Fig. 3A). Notably, all three modules were also EG-enriched and expressed early across fetal brain regions (Fig. 3 A and B). Pathway enrichment analysis of these modules in the Reactome database (41, 42) showed that the top enriched pathways included "transcription" (M01), "chromatin modifying enzymes and chromatin organization" (M02), and "axon guidance" (M16) (Table S6). These results agree with recent large-scale autism studies showing that genes involved in synapse formation, transcriptional regulation, and chromatin remodeling are disrupted in autism (7–9). This combined analysis identified 974 EGs from three modules that are coexpressed with known ASD candidate genes at distinct stages of brain development.
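A one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution, so the module-level test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that FDR values were obtained with the Benjamini–Hochberg procedure and that the background is the set of genes assigned to modules. Neither detail is stated in this section, and the module names and counts below are toy values.

```python
from math import comb


def fisher_greater(k, module_size, n_hits, n_total):
    """One-sided Fisher's exact test for over-representation.

    k           -- ASD genes inside the module
    module_size -- genes in the module
    n_hits      -- ASD genes in the background
    n_total     -- genes in the background
    Returns P(X >= k) for X ~ Hypergeometric(n_total, n_hits, module_size).
    """
    upper = min(module_size, n_hits)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(n_hits, x) * comb(n_total - n_hits, module_size - x)
               for x in range(k, upper + 1))
    return tail / comb(n_total, module_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: three hypothetical modules of 100 genes in a background of
# 1,000 genes, 50 of which are ASD genes (about 5 expected per module).
modules = {"modA": (12, 100), "modB": (5, 100), "modC": (2, 100)}
raw = [fisher_greater(k, size, 50, 1000) for k, size in modules.values()]
fdr = benjamini_hochberg(raw)
significant = [name for name, q in zip(modules, fdr) if q < 0.1]
```

For a single module, `scipy.stats.fisher_exact` with `alternative="greater"` applied to the 2 × 2 table `[[k, module_size - k], [n_hits - k, n_total - n_hits - module_size + k]]` gives the same one-sided P value.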
Table S6.
Reactome pathways enriched in three EG-enriched modules implicated in ASD
| Module and term | Overlap | P value | Adjusted P value | Genes | 
| M01 | ||||
| Transcription | 25/202 | 2.40 × 10−6 | 6.79 × 10−4* | GTF3C3; HDAC2; CCNT2; GTF3C4; RRN3; CSTF3; GTF2E1; CLP1; PCF11; POLR2B; SNAPC3; CSTF1; RNGTT; TBP; NCBP1; NCBP2; GTF2H3; NFIA; POLR3B; NFIB; POLR3C; POLR1B; POLR1E; TFAM; TAF5 | 
| Processing of capped intron-containing pre-mRNA | 22/144 | 4.34 × 10−7 | 3.67 × 10−4* | NCBP1; NUP133; DHX9; NCBP2; CSTF3; CDC5L; HNRNPU; PLRG1; YBX1; NUP160; EFTUD2; PRPF4; CLP1; HNRNPH1; PCF11; POLR2B; NUP50; CSTF1; NUPL1; RAE1; SF3B1; CTNNBL1 | 
| Folding of actin by CCT/TriC | 7/9 | 1.11 × 10−6 | 4.70 × 10−4* | CCT3; CCT6A; CCT2; TCP1; CCT7; CCT5; CCT4 | 
| mRNA splicing | 17/113 | 1.11 × 10−5 | 0.00188* | NCBP1; DHX9; NCBP2; CSTF3; CDC5L; HNRNPU; PLRG1; YBX1; EFTUD2; PRPF4; CLP1; HNRNPH1; PCF11; POLR2B; CSTF1; SF3B1; CTNNBL1 | 
| HIV infection | 23/218 | 6.34 × 10−5 | 0.00589* | CCNT2; PSMD11; RNGTT; TBP; TSG101; NCBP1; NUP133; NCBP2; XRCC5; HMGA1; NEDD4L; GTF2H3; GTF2E1; NUP160; AP1G1; POLR2B; NUP50; PSMD2; TAF5; NUPL1; PAK2; RAE1; KPNB1 | 
| HIV lifecycle | 18/137 | 3.19 × 10−5 | 0.00451* | CCNT2; RNGTT; TBP; TSG101; NCBP1; NUP133; NCBP2; XRCC5; HMGA1; NEDD4L; GTF2H3; GTF2E1; NUP160; POLR2B; NUP50; TAF5; NUPL1; RAE1 | 
| snRNP assembly | 10/49 | 8.30 × 10−5 | 0.00589* | NCBP1; NUP133; NCBP2; NUP50; TGS1; DDX20; NUPL1; RAE1; NUP160; WDR77 | 
| Formation of tubulin-folding intermediates by TriC/CCT | 7/20 | 5.94 × 10−5 | 0.00589* | CCT3; CCT6A; CCT2; TCP1; CCT7; CCT5; CCT4 | 
| Association of TriC/CCT with target proteins during biosynthesis | 8/29 | 7.19 × 10−5 | 0.00589* | CCT3; CCT6A; CCT2; TCP1; XRN2; CCT7; CCT5; CCT4 | 
| Regulation of cholesterol biosynthesis by SREBP | 10/53 | 1.47 × 10−4 | 0.00890* | SQLE; SEC24B; GGPS1; NFYA; TGS1; CYP51A1; HMGCR; SEC24D; KPNB1; FDFT1 | 
| M02 | ||||
| Chromatin organization | 35/208 | 3.76 × 10−15 | 1.41 × 10−12* | PHF2; KDM5C; SMARCB1; TRRAP; EHMT2; EHMT1; CHD4; ACTB; PHF21A; NSD1; SAP130; EP400; WDR5; EP300; BRD8; WHSC1; MTA2; KDM6B; BRD1; CREBBP; KDM4B; SMARCC2; KDM2B; SETDB1; SETD1B; USP22; DNMT3A; ARID1A; GATAD2A; HCFC1; SMARCA4; NCOR1; KAT6B; KAT6A; RCOR1 | 
| Processing of capped intron-containing pre-mRNA | 19/144 | 3.55 × 10−7 | 8.87 × 10−5* | NUP214; SF3A1; SF3B2; SF3B3; NUP155; FUS; DDX23; SMC1A; PRPF8; SRRM1; NUP93; PRPF6; U2AF2; NUP62; POLR2D; TPR; DHX38; NUP98; SNRNP200 | 
| Transcription | 18/202 | 1.02 × 10−4 | 0.00660* | GTF3C1; NFIX; POU2F1; EHMT2; CHD4; SSRP1; GATAD2A; SRRM1; POLR3A; POLR1A; U2AF2; POLR2D; TCEB3; UBTF; DHX38; MTA2; TAF4; TAF1 | 
| PKMTs methylate histone lysines | 7/29 | 8.03 × 10−5 | 0.00660* | SETDB1; EHMT2; NSD1; SETD1B; WDR5; EHMT1; WHSC1 | 
| Transport of mature mRNA derived from an intron-containing transcript | 9/50 | 5.80 × 10−5 | 0.00660* | NUP214; NUP93; NUP155; U2AF2; NUP62; TPR; DHX38; NUP98; SRRM1 | 
| HATs acetylate histones | 13/105 | 4.91 × 10−5 | 0.00660* | BRD1; CREBBP; TRRAP; USP22; ACTB; HCFC1; KAT6B; KAT6A; SAP130; EP400; WDR5; EP300; BRD8 | 
| Transport of mature transcript to cytoplasm | 9/54 | 9.84 × 10−5 | 0.00660* | NUP214; NUP93; NUP155; U2AF2; NUP62; TPR; DHX38; NUP98; SRRM1 | 
| mRNA splicing | 13/113 | 9.74 × 10−5 | 0.00660* | SF3A1; SF3B2; SF3B3; FUS; DDX23; SMC1A; PRPF8; SRRM1; PRPF6; U2AF2; POLR2D; DHX38; SNRNP200 | 
| Regulation of lipid metabolism by peroxisome proliferator-activated receptor alpha | 13/114 | 1.06 × 10−4 | 0.00660* | ABCA1; MED1; CREBBP; NCOA6; NRF1; MED26; SREBF2; MED12; MED14; MED24; NCOR1; SIN3A; EP300 | 
| Transcriptional regulation of white adipocyte differentiation | 11/78 | 6.57 × 10−5 | 0.00660* | MED12; MED1; CREBBP; MED14; MED24; NCOR1; NCOA6; EP300; LPL; MED26; SREBF2 | 
| M16 | ||||
| Axon guidance | 11/327 | 2.24 × 10−4 | 0.0740 | GSK3B; ARHGEF12; ROCK2; RASA1; KCNQ3; ANK2; ANK3; ARHGEF7; GRIN2B; MYH10; ITGA9 | 
| Synthesis of PIPs at the early endosome membrane | 3/13 | 3.84 × 10−4 | 0.0740 | INPP4A; PIKFYVE; PIK3C3 | 
| CREB phosphorylation through the activation of Ras | 3/27 | 0.00254 | 0.122 | PDPK1; BRAF; GRIN2B | 
| Insulin receptor signaling cascade | 5/92 | 0.00191 | 0.122 | PDPK1; GRB10; PIK3C3; TSC1; MTOR | 
| Eph-ephrin signaling | 5/94 | 0.00209 | 0.122 | ROCK2; RASA1; ARHGEF7; GRIN2B; MYH10 | 
| Sema4D-induced cell migration and growth cone collapse | 3/24 | 0.00187 | 0.122 | ARHGEF12; ROCK2; MYH10 | 
| Interaction between L1 and ankyrins | 3/29 | 0.00306 | 0.131 | KCNQ3; ANK2; ANK3 | 
| Post NMDA receptor activation events | 3/35 | 0.00501 | 0.143 | PDPK1; BRAF; GRIN2B | 
| Signaling by insulin receptor | 5/116 | 0.00497 | 0.143 | PDPK1; GRB10; PIK3C3; TSC1; MTOR | 
| PI3K cascade | 4/68 | 0.00423 | 0.143 | PDPK1; PIK3C3; TSC1; MTOR | 
CREB, cAMP response element binding protein; HATs, histone acetyltransferases; NMDA, N-methyl-D-aspartate; PKMTs, protein lysine methyltransferases; PIPs, phosphatidylinositol phosphates; PI3K, phosphoinositide 3-kinase; snRNP, small nuclear ribonucleoprotein; SREBP, sterol regulatory element-binding proteins; TriC/CCT, TCP1-ring complex or chaperonin containing TCP1.
*Statistically significant adjusted P values.
To further prioritize known EGs as candidates for ASD, we constructed a coexpression network for the 974 EGs from the three modules enriched for potential ASD genes (Fig. 3C and SI Materials and Methods). Of these 974 genes, 844 interact closely with high-confidence ASD genes (connected to at least two genes with TADA FDR < 0.1), and 370 harbor de novo or inherited loss-of-function mutations in ASD individuals from the Simons Simplex Collection or ASC cohorts. Of these, 52 have a TADA FDR below 0.5, and 23 of the 52 have previously been shown to contribute to ASD risk [SFARI categories syndromic (S), 1, 2, 3, and 4]. The remaining 29 EGs have not yet been linked to ASD risk. We argue that these 29 EGs represent the strongest candidates for further investigation of their role in ASD (Fig. S4 and Table S7), based on (i) the importance of EGs in ASD etiology, as shown by their role in critical developmental stages and the increased burden of rare, damaging mutations in ASD individuals; (ii) their coexpression with high-confidence ASD genes in the brain; and (iii) the suggestive genetic evidence from the TADA analysis. According to available mouse phenotypes from the MGI (23) and the IMPC (24), 11 of these 29 EGs have reported heterozygous phenotypes in mice (Table S7). Among them, four EGs (CHD1, FBXO11, KDM4B, and VCP) have been associated with abnormal neural development and/or behavioral phenotypes in heterozygotes.
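The prioritization filters above can be expressed as a simple predicate over per-gene annotations. The sketch below assumes that all filters are applied jointly to each EG (the text does not say whether the 370 LoF-carrying genes are counted within the 844 network-linked genes), and the `CandidateEG` record, its fields, and the gene names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateEG:
    """Per-gene evidence; field names are illustrative, not from the original pipeline."""
    gene: str
    n_linked_hc_asd: int  # linked high-confidence ASD genes (TADA FDR < 0.1)
    has_asd_lof: bool     # de novo or inherited LoF mutation seen in ASD individuals
    tada_fdr: float       # TADA FDR q value
    in_sfari: bool        # already in SFARI categories S, 1, 2, 3, or 4


def prioritize(candidates, min_links=2, max_tada_fdr=0.5):
    """Apply the three filters from the text, then split by prior SFARI support.

    Returns (known, novel): genes already linked to ASD, and new candidates.
    """
    passing = [c for c in candidates
               if c.n_linked_hc_asd >= min_links
               and c.has_asd_lof
               and c.tada_fdr < max_tada_fdr]
    known = [c.gene for c in passing if c.in_sfari]
    novel = [c.gene for c in passing if not c.in_sfari]
    return known, novel
```

With the counts reported here, `known` would hold the 23 SFARI genes and `novel` the 29 candidates listed in Table S7.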
Fig. S4.
Chromosomal distribution of 29 EGs suggested as strong ASD candidate genes. The location of each gene along the chromosomes is shown in red.
Table S7.
Priority list of 29 EGs as strong ASD candidates
| Gene | Chromosome | Start | End | Module | TADA FDR q value | No. of high-confidence ASD genes that are coexpressed | Disease associations | 
| BIRC6 | 2 | 32357028 | 32618899 | M02 | 0.47 | 15 | — | 
| CHD1 | 5 | 98853985 | 98928957 | M01 | 0.17 | 15 | CHD8, another member of the chromodomain helicase DNA-binding family, has been previously associated with autism | 
| CUL1 | 7 | 148697914 | 148801036 | M01 | 0.49 | 12 | — | 
| DHX29 | 5 | 55256245 | 55307722 | M01 | 0.40 | 17 | — | 
| DVL3 | 3 | 184155388 | 184173610 | M02 | 0.33 | 10 | Robinow syndrome-3 characterized by skeletal abnormalities | 
| EP300 | 22 | 41091786 | 41180077 | M02 | 0.45 | 13 | Rubinstein–Taybi syndrome characterized by short stature, moderate to severe learning difficulties, distinctive facial features, and broad thumbs and first toes | 
| EP400 | 12 | 131949920 | 132081102 | M02 | 0.43 | 9 | — | 
| FBXO11 | 2 | 47789316 | 47905793 | M01 | 0.15 | 17 | Associated with chronic otitis media with effusion and recurrent otitis media (middle-ear inflammation that can cause hearing loss); an N-ethyl-N-nitrosourea (ENU)-induced mutation in the homologous mouse gene gives rise to the deaf mouse mutant Jeff | 
| KDM4B | 19 | 4969113 | 5153595 | M02 | 0.30 | 14 | — | 
| LDB1 | 10 | 102107560 | 102120453 | M02 | 0.42 | 14 | — | 
| LTN1 | 21 | 28928144 | 28992956 | M16 | 0.37 | 3 | — | 
| MORC3 | 21 | 36320189 | 36386148 | M01 | 0.50 | 10 | — | 
| MYH10 | 17 | 8474205 | 8630761 | M16 | 0.13 | 3 | Essential for normal dendritic spine morphology and dynamics; pharmacologic or genetic inhibition of Myh10 altered protrusive motility of spines, destabilized their mushroom head morphology, and impaired excitatory synaptic transmission | 
| NFIB | 9 | 14081843 | 14398983 | M01 | 0.45 | 15 | — | 
| PBX1 | 1 | 164555584 | 164899296 | M01 | 0.46 | 16 | — | 
| PHF21A | 11 | 45929323 | 46121178 | M02 | 0.48 | 11 | — | 
| RFX7 | 15 | 56087280 | 56243266 | M01 | 0.25 | 17 | — | 
| RNF38 | 9 | 36336396 | 36487548 | M01 | 0.41 | 18 | — | 
| SMARCE1 | 17 | 40624962 | 40648508 | M01 | 0.41 | 12 | Meningiomas (tumors of the meninges surrounding the brain and spinal cord) | 
| SNW1 | 14 | 77717599 | 77761207 | M01 | 0.44 | 12 | — | 
| STXBP5 | 6 | 147204425 | 147390476 | M16 | 0.37 | 2 | — | 
| SUFU | 10 | 102503987 | 102633535 | M02 | 0.47 | 14 | Familial meningioma, medulloblastoma | 
| TAF4 | 20 | 61953469 | 62065810 | M02 | 0.30 | 10 | Interference with transcription through binding of TAF4 to expanded polyglutamine stretches is involved in the pathogenic mechanisms underlying neurodegeneration | 
| TANC2 | 17 | 63009556 | 63427699 | M02 | 0.32 | 14 | — | 
| TNPO3 | 7 | 128954180 | 129055173 | M01 | 0.19 | 17 | Mutations found in patients with muscular dystrophy | 
| UTP6 | 17 | 31860899 | 31901765 | M01 | 0.19 | 12 | — | 
| VCP | 9 | 35056064 | 35073249 | M02 | 0.49 | 9 | Inclusion body myopathy with Paget disease of bone and frontotemporal dementia, amyotrophic lateral sclerosis, Charcot–Marie–Tooth disease type 2Y | 
| WHSC1 | 4 | 1871424 | 1982207 | M02 | 0.27 | 13 | Located in the Wolf–Hirschhorn syndrome critical region | 
| YTHDC1 | 4 | 68310387 | 68350089 | M01 | 0.48 | 11 | — | 
SI Materials and Methods
Identification of EGs.
We identified 3,023 protein-coding EGs annotated with 50 Mammalian Phenotype (MP) terms, including prenatal, perinatal, and postnatal lethal phenotypes, from the MGI (23) (Table S8). The MGI database was also used to extract 4,995 protein-coding NEGs with nonlethal phenotypes in the mouse. Phenotype data from the IMPC portal (24) expanded the lethal gene list with 252 lethal genes and 101 genes with subviable phenotypes, and we further supplemented the nonlethal gene list with 701 genes with viable phenotypes from the IMPC. Where the MGI and the IMPC disagreed on lethality status, we deferred to the phenotypes reported by the IMPC, because these mouse lines were generated on a defined C57BL/6N background and phenotypically characterized using a standardized pipeline. One-to-one mouse–human orthology of lethal and nonlethal genes was established based on MGI annotation and manual curation, resulting in 3,326 essential and 4,919 nonessential human orthologs.
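The MGI/IMPC reconciliation rule can be sketched as a dictionary merge in which IMPC calls overwrite MGI calls. The three-level labels ("lethal", "subviable", "viable"), the helper names, and the ortholog-mapping format are assumptions for illustration; the real annotations come from MP terms (Table S8) and manual curation.

```python
def merge_lethality(mgi, impc):
    """Combine per-gene viability calls, letting IMPC override MGI on disagreement.

    mgi, impc -- dicts: mouse gene symbol -> "lethal", "subviable", or "viable"
    Returns (essential, nonessential) sets of mouse genes. Subviable genes are
    counted as essential, as the IMPC subviable genes were added to the lethal list.
    """
    merged = {**mgi, **impc}  # later dict wins, so IMPC calls take precedence
    essential = {g for g, call in merged.items() if call in ("lethal", "subviable")}
    nonessential = {g for g, call in merged.items() if call == "viable"}
    return essential, nonessential


def to_human(mouse_genes, one_to_one):
    """Map mouse genes to human orthologs, keeping only one-to-one pairs."""
    return {one_to_one[g] for g in mouse_genes if g in one_to_one}
```

Genes without a one-to-one human ortholog drop out in `to_human`, which is one reason the human ortholog counts are smaller than the mouse gene counts.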
Table S8.
MP terms for lethal phenotypes
| MP identification | Lethality type | Lethality description | 
| MP:0002058 | Neonatal lethality | Death within the neonatal period after birth (Mus: P0) | 
| MP:0002080 | Prenatal lethality | Death anytime between fertilization and birth (Mus: approximately E18.5) | 
| MP:0002081 | Perinatal lethality | Death anytime within the perinatal period (Mus: E18.5 through postnatal day 1) | 
| MP:0002082 | Postnatal lethality | Premature death anytime between the neonatal period and weaning age (Mus: P1 to ∼3 wk of age) | 
| MP:0006204 | Embryonic lethality before implantation | Death anytime between fertilization and implantation (Mus: E0 to less than E4.5) | 
| MP:0006205 | Embryonic lethality between implantation and somite formation | Death anytime between the point of implantation and somite formation (Mus: E4.5 to less than E8) | 
| MP:0006206 | Embryonic lethality between somite formation and embryo turning | Death anytime between somite formation and the initiation of embryo turning (Mus: E8 to less than E9) | 
| MP:0006207 | Embryonic lethality during organogenesis | Death anytime between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14) | 
| MP:0006208 | Lethality throughout fetal growth and development | Death anytime between the completion of organogenesis and birth (Mus: E14 to approximately E18.5) | 
| MP:0008527 | Embryonic lethality at implantation | Death because of failure of implantation (Mus: E4.5) | 
| MP:0008569 | Lethality at weaning | Premature death at weaning age, often caused by the inability to make the transition to solid food | 
| MP:0008762 | Embryonic lethality | Death of an animal within the embryonic period before organogenesis (Mus: before E14) | 
| MP:0009850 | Embryonic lethality between implantation and placentation | Death anytime between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9) | 
| MP:0010770 | Preweaning lethality | Death anytime between fertilization and weaning age (Mus: ∼3–4 wk of age) | 
| MP:0010831 | Partial lethality | The appearance of lower than Mendelian ratios of offspring of a given genotype because of death of some but not all of the organisms | 
| MP:0010832 | Lethality during fetal growth through weaning | Death anytime between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age) | 
| MP:0011083 | Complete lethality at weaning | Premature death at weaning age of all organisms of a given genotype in a population, often because of the inability to make the transition to solid food | 
| MP:0011084 | Partial lethality at weaning | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms at weaning age | 
| MP:0011085 | Complete postnatal lethality | Premature death anytime between the neonatal period and weaning age of all organisms of a given genotype in a population (Mus: P1 to ∼3 wk of age) | 
| MP:0011086 | Partial postnatal lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms anytime between the neonatal period and weaning age (Mus: P1 to ∼3 wk of age) | 
| MP:0011087 | Complete neonatal lethality | Death of all organisms of a given genotype in a population within the neonatal period after birth (Mus: P0) | 
| MP:0011088 | Partial neonatal lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the neonatal period after birth (Mus: P0) | 
| MP:0011089 | Complete perinatal lethality | Death of all organisms of a given genotype in a population within the perinatal period (Mus: E18.5 through postnatal day 1) | 
| MP:0011090 | Partial perinatal lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the perinatal period (Mus: E18.5 through postnatal day 1) | 
| MP:0011091 | Complete prenatal lethality | Death of all organisms of a given genotype in a population between fertilization and birth (Mus: approximately E18.5) | 
| MP:0011092 | Complete embryonic lethality | Death of all organisms of a given genotype in a population within the embryonic period before organogenesis (Mus: before E14) | 
| MP:0011093 | Complete embryonic lethality at implantation | Death of all organisms of a given genotype in a population at the point of implantation (Mus: E4.5) | 
| MP:0011094 | Complete embryonic lethality before implantation | Death of all organisms of a given genotype in a population between fertilization and implantation (Mus: E0 to less than E4.5) | 
| MP:0011095 | Complete embryonic lethality between implantation and placentation | Death of all organisms of a given genotype in a population between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9) | 
| MP:0011096 | Complete embryonic lethality between implantation and somite formation | Death of all organisms of a given genotype in a population between the point of implantation and somite formation (Mus: E4.5 to less than E8) | 
| MP:0011097 | Complete embryonic lethality between somite formation and embryo turning | Death of all organisms of a given genotype in a population between somite formation and the initiation of embryo turning (Mus: E8 to less than E9) | 
| MP:0011098 | Complete embryonic lethality during organogenesis | Death of all organisms of a given genotype in a population between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14) | 
| MP:0011099 | Complete lethality throughout fetal growth and development | Death of all organisms of a given genotype in a population between the completion of organogenesis and birth (Mus: E14 to approximately E18.5) | 
| MP:0011100 | Complete preweaning lethality | Death of all organisms of a given genotype in a population between fertilization and weaning age (Mus: ∼3–4 wk of age) | 
| MP:0011101 | Partial prenatal lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and birth (Mus: approximately E18.5) | 
| MP:0011102 | Partial embryonic lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the embryonic period before organogenesis (Mus: before E14) | 
| MP:0011103 | Partial embryonic lethality at implantation | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms at the point of implantation (Mus: E4.5) | 
| MP:0011104 | Partial embryonic lethality before implantation | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and implantation (Mus: E0 to less than E4.5) | 
| MP:0011105 | Partial embryonic lethality between implantation and placentation | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9) | 
| MP:0011106 | Partial embryonic lethality between implantation and somite formation | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the point of implantation and somite formation (Mus: E4.5 to less than E8) | 
| MP:0011107 | Partial embryonic lethality between somite formation and embryo turning | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between somite formation and the initiation of embryo turning (Mus: E8 to less than E9) | 
| MP:0011108 | Partial embryonic lethality during organogenesis | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14) | 
| MP:0011109 | Partial lethality throughout fetal growth and development | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the completion of organogenesis and birth (Mus: E14 to approximately E18.5) | 
| MP:0011110 | Partial preweaning lethality | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and weaning age (Mus: ∼3–4 wk of age) | 
| MP:0011111 | Complete lethality during fetal growth through weaning | Death of all organisms of a given genotype in a population between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age) | 
| MP:0011112 | Partial lethality during fetal growth through weaning | The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age) | 
| MP:0011400 | Complete lethality | All individuals of a given genotype in a population die before the end of the normal lifespan, but time(s) of death are unspecified | 
| MP:0013292 | Embryonic lethality before organogenesis | Death before the completion of embryo turning (Mus: E9–E9.5) | 
| MP:0013293 | Embryonic lethality before tooth bud stage | Death before the appearance of tooth buds (Mus: E12–E12.5) | 
| MP:0013294 | Prenatal lethality before heart atrial septation | Death before the completion of heart atrial septation (Mus: E14.5–E15.5) | 
E, embryonic day; P, postnatal day; Mus, Mus musculus.
The catalog of EGs was further augmented with cellular EGs from three recent studies (20–22) that characterized EGs in human cell lines. We obtained 1,580 core EGs (genes above the essentiality threshold in at least three of the five cell lines in the study) from Hart et al. (22), 1,739 core EGs (above the threshold in at least two of four cell lines) from Wang et al. (21), and 1,734 core EGs (above the threshold in at least one of two cell lines) from Blomen et al. (20). Intersecting the three sets of core EGs yielded 956 high-confidence human EGs. Of these 956 EGs in human cell lines, 348 (36.4%) are also human orthologs of EGs in the mouse, 19 (2.0%) are human orthologs of NEGs in the mouse, and 589 (61.6%) have not been tested in the mouse (14).
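The set operations in this paragraph can be sketched in Python as follows. The final step, taking the union of the mouse-derived EG orthologs with the cell EGs never tested in the mouse, is an assumption in this sketch. It is consistent with the reported counts (3,326 + 589 = 3,915, the size of the EG catalog used throughout), but this section does not state it explicitly.

```python
def high_confidence_cell_egs(hart, wang, blomen):
    """Intersect the per-study core EG sets (each already thresholded by cell-line count)."""
    return set(hart) & set(wang) & set(blomen)


def split_by_mouse_status(cell_egs, mouse_eg_orthologs, mouse_neg_orthologs):
    """Partition cell EGs by the essentiality status of their mouse orthologs."""
    cell_egs = set(cell_egs)
    mouse_eg_orthologs = set(mouse_eg_orthologs)
    mouse_neg_orthologs = set(mouse_neg_orthologs)
    also_mouse_eg = cell_egs & mouse_eg_orthologs
    mouse_neg = cell_egs & mouse_neg_orthologs
    untested = cell_egs - mouse_eg_orthologs - mouse_neg_orthologs
    return also_mouse_eg, mouse_neg, untested


def eg_catalog(mouse_eg_orthologs, untested_cell_egs):
    """Assumed final catalog: mouse-derived EGs plus cell EGs untested in the mouse."""
    return set(mouse_eg_orthologs) | set(untested_cell_egs)
```

The three parts returned by `split_by_mouse_status` correspond to the 348, 19, and 589 genes reported above.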
Analysis of Haploinsufficiency of EGs.
We collected gene sets from multiple studies and resources to analyze patterns of inheritance and haploinsufficiency of EGs. First, a catalog of human disease genes was obtained from Online Mendelian Inheritance in Man (OMIM; downloaded on July 12, 2016) (29). From the OMIM catalog, we identified 1,411 genes annotated with "autosomal dominant" or "X-linked dominant" disorders and 2,056 genes annotated with "autosomal recessive" or "X-linked recessive" disorders. Comparing the two lists, we obtained 1,000 genes underlying only dominant diseases, 1,645 genes underlying only recessive diseases, and 411 genes linked to both dominant and recessive disorders. Second, a list of 616 protein-coding genes that were systematically assessed for evidence of dosage sensitivity was obtained from the ClinGen Dosage Sensitivity Map (30). Of these 616 genes, 239 were dosage-sensitive with sufficient evidence, 41 with some evidence, and 47 with little evidence; 200 had no evidence for dosage pathogenicity so far; and 89 were either not dosage-sensitive or associated with an autosomal recessive phenotype. Third, a list of 262 haploinsufficient genes based on text mining of PubMed and OMIM was obtained from Dang et al. (31). Fourth, from the MGI, we identified 313 human orthologs of mouse genes associated with heterozygous phenotypes. For each gene set, we evaluated the enrichment of EGs compared with NEGs using Fisher's exact test.
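The OMIM partition and the input to each enrichment test can be sketched with Python sets. The helper names are hypothetical, and the 2 × 2 layout (rows EG/NEG, columns in/not in the gene set) is one reasonable choice. The resulting table can be passed to a standard implementation such as `scipy.stats.fisher_exact`.

```python
def partition_by_inheritance(dominant, recessive):
    """Split OMIM genes into dominant-only, recessive-only, and both."""
    dominant, recessive = set(dominant), set(recessive)
    return dominant - recessive, recessive - dominant, dominant & recessive


def eg_neg_table(gene_set, egs, negs):
    """2 x 2 table: rows EG / NEG, columns in gene_set / not in gene_set."""
    gene_set, egs, negs = set(gene_set), set(egs), set(negs)
    eg_in = len(egs & gene_set)
    neg_in = len(negs & gene_set)
    return [[eg_in, len(egs) - eg_in], [neg_in, len(negs) - neg_in]]


def odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c); returns inf when b*c is zero (a simplification)."""
    (a, b), (c, d) = table
    return float("inf") if b * c == 0 else (a * d) / (b * c)
```

By construction the partition satisfies |dominant| = |dominant-only| + |both| and |recessive| = |recessive-only| + |both|, which is a quick consistency check on reported counts.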
We acquired the Haploinsufficiency Scores (32) and the Genome-Wide Haploinsufficiency Scores (33) for genome-wide prediction of the probability of haploinsufficiency. For each prediction model, the raw scores were ranked and converted to percentiles. The histograms and estimated density curves were plotted using ggplot2 (geom_histogram and geom_line) in R.
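Converting raw scores to percentiles places the two predictors on a common scale. The sketch below assumes percentile = rank / n × 100, with tied scores sharing their average rank; the exact convention used in the original analysis is not stated.

```python
def to_percentiles(scores):
    """Convert raw scores to percentile ranks in (0, 100]; ties share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tied scores starting at sorted position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    return [100.0 * r / n for r in ranks]
```

Applying the function separately to each score type makes the two distributions directly comparable in the histograms. Whether a high percentile indicates a higher or lower predicted probability of haploinsufficiency depends on the direction of each score.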
Burden Analysis of Mutations in EGs in ASD Families.
The Simons Simplex Collection contains genetic and phenotypic information from 2,600 ASD families, each of which has one child affected with ASD and unaffected parents and siblings (34). ASD probands were defined by clinical consensus from the Autism Diagnostic Interview–Revised (57) and the Autism Diagnostic Observation Schedule (58). Multiple individual phenotypic measures, including the Social Responsiveness Scale (SRS) (35) and IQ, were available (8, 26).
We aimed to investigate the impact of both de novo and rare inherited variants in EGs on ASD risk. We acquired a list of 5,648 de novo variants from an exome sequencing study of 2,517 ASD families from the Simons Simplex Collection (8) and an additional list of 1,544 de novo variants from a reanalysis of the same cohort (2,377 ASD families) with a different pipeline (26). Of the 7,192 de novo variants, 674 were loss-of-function mutations (i.e., stop-gain, stop-loss, start-loss, and splice donor or acceptor SNVs, plus frameshift indels), and 3,462 were nonsynonymous mutations (i.e., missense SNVs and nonframeshift indels). Deleterious de novo nonsynonymous mutations were selected using a Combined Annotation-Dependent Depletion (CADD) (59) phred-scaled score above 10. In addition, we obtained 249,729 rare inherited mutations from 2,377 ASD families (26). From the variants called by both GATK (60) and FreeBayes (61), we extracted loss-of-function and nonsynonymous mutations with a minor allele frequency below 0.01 in the Exome Variant Server (European ancestry) (62) and a CADD phred-scaled score above 10. After these filtering steps, we obtained 372 dnLoF variants, 1,497 dnNSD variants, and 77,891 inhRD variants in EGs or NEGs for the mutational burden analysis (Fig. S1 and Datasets S3 and S4).
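The variant filters can be sketched as a single classification function. The consequence labels, the `Variant` record, and its field names are hypothetical stand-ins for annotator output. The text is ambiguous about whether the CADD cutoff also applies to inherited loss-of-function variants, so the sketch applies it to all inherited variants and marks this as an assumption.

```python
from dataclasses import dataclass

# Consequence labels are hypothetical; real pipelines use annotator-specific terms.
LOF_EFFECTS = {"stop_gain", "stop_loss", "start_loss",
               "splice_donor", "splice_acceptor", "frameshift_indel"}
NONSYN_EFFECTS = {"missense", "nonframeshift_indel"}


@dataclass
class Variant:
    gene: str
    effect: str
    de_novo: bool
    cadd_phred: float
    eur_maf: float = 0.0              # minor allele frequency, EVS European ancestry
    callers: frozenset = frozenset()  # variant callers that reported the variant


def classify(v, cadd_min=10.0, maf_max=0.01):
    """Return "dnLoF", "dnNSD", "inhRD", or None, following the filters in the text."""
    if v.de_novo:
        if v.effect in LOF_EFFECTS:
            return "dnLoF"
        if v.effect in NONSYN_EFFECTS and v.cadd_phred > cadd_min:
            return "dnNSD"
        return None
    # Inherited variants must be called by both GATK and FreeBayes.
    if not {"GATK", "FreeBayes"} <= v.callers:
        return None
    damaging = v.effect in LOF_EFFECTS or v.effect in NONSYN_EFFECTS
    # Assumption: the CADD cutoff applies to inherited LoF as well as nonsynonymous variants.
    if damaging and v.eur_maf < maf_max and v.cadd_phred > cadd_min:
        return "inhRD"
    return None
```

Restricting the classified variants to genes in the EG or NEG lists is the last step before the burden analysis.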
The individual mutational burden was defined as the number of mutations carried by each subject in the gene sets of interest (i.e., 3,915 EGs and 4,919 NEGs) for each class of variants (dnLoF, dnNSD, and inhRD). Among all Simons Simplex Collection ASD families, there were 1,781 ASD quartets with exome sequence data available for both an affected proband and an unaffected sibling. The individual mutational burden in the 1,781 ASD probands was compared with the burden in their unaffected siblings using a one-sided Wilcoxon signed-rank test.
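The burden definition and the paired comparison can be sketched in plain Python. The mutation tuple layout is hypothetical. The signed-rank test uses the large-sample normal approximation with average ranks for ties and a tie-corrected variance (no continuity correction), which is reasonable for 1,781 pairs. `scipy.stats.wilcoxon(x, y, alternative="greater")` is a fuller implementation whose default `zero_method="wilcox"` likewise drops zero differences.

```python
import math
from collections import Counter


def individual_burden(mutations, gene_set, variant_class):
    """Count, per subject, mutations of one class that fall in gene_set.

    mutations -- iterable of (subject_id, gene, variant_class) tuples (hypothetical layout)
    Returns a Counter; subjects with no qualifying mutation read as 0.
    """
    counts = Counter()
    for subject, gene, vclass in mutations:
        if gene in gene_set and vclass == variant_class:
            counts[subject] += 1
    return counts


def signed_rank_greater(x, y):
    """One-sided paired Wilcoxon signed-rank test that x tends to exceed y.

    Normal approximation: zero differences dropped, tied |d| get average ranks,
    variance corrected for ties, no continuity correction. Returns (W+, P value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
```

For each quartet, the proband and sibling burdens are read from the same `Counter` (for example, `[counts[p] for p, s in quartets]` and `[counts[s] for p, s in quartets]`) and passed to `signed_rank_greater` in that order.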
We acquired SRS total raw scores for 2,348 probands and 1,678 siblings, as well as verbal and nonverbal IQ scores for 2,359 probands, from 1,781 ASD quartets and 587 ASD trios in Simons Simplex Collection families (Dataset S2). Separate Poisson regression models were fitted with each trait (SRS total raw score, verbal IQ, or nonverbal IQ) as the dependent variable and the individual burden of all rare damaging mutations (dnLoF, dnNSD, and inhRD) in EGs or NEGs as the independent variable.
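For a single predictor, the Poisson regression described here can be fitted with a few Newton–Raphson steps on the log-likelihood (for the canonical log link this coincides with iteratively reweighted least squares). The sketch is a self-contained stand-in for a standard GLM routine such as the statsmodels `GLM` class with a Poisson family; the function name and toy data are hypothetical.

```python
import math


def poisson_regression(x, y, max_iter=100, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 comes from the inverse Fisher information.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T diag(mu) X = [[a, b], [b, c]]
        a = sum(mu)
        b = sum(mi * xi for xi, mi in zip(x, mu))
        c = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a * c - b * b
        if det <= 0:
            raise ValueError("predictor has no variation; model is not identifiable")
        step0 = (c * u0 - b * u1) / det
        step1 = (a * u1 - b * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    a = sum(mu)
    b = sum(mi * xi for xi, mi in zip(x, mu))
    c = sum(mi * xi * xi for xi, mi in zip(x, mu))
    se_b1 = math.sqrt(a / (a * c - b * b))
    return b0, b1, se_b1
```

With burdens as `x` and, for example, SRS total raw scores as `y`, the sign of `b1` indicates the direction of the association, and `b1 / se_b1` is the Wald z statistic.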
Construction of Coexpression Modules and Coexpression Network in Brain.
Coexpression analysis in the human brain was conducted using RNA-seq data from BrainSpan: Atlas of the Developing Human Brain (40). We used the Weighted Correlation Network Analysis (WGCNA) R package (63) for data quality control and identification of modules of coexpressed genes. Expression data for 52,376 Ensembl genes (56) (including protein-coding genes, noncoding genes, and pseudogenes) across 525 samples were obtained. Of these, 1,716 genes with too many missing entries or zero variance in expression were removed with the WGCNA function "goodSamplesGenes," and 12,613 genes with very low expression [maximum reads per kilobase of transcript per million mapped reads (RPKM) below 0.5 across samples] were removed. As a final gene-level cleaning step, only protein-coding genes were retained for further analysis. For sample-level cleaning, three outlier subjects (300, 303, and 306) were removed on the basis of subject-level clustering. Ten brain tissue types with data from fewer than 10 developmental stages (caudal ganglionic eminence, cerebellum, dorsal thalamus, lateral ganglionic eminence, medial ganglionic eminence, occipital neocortex, parietal neocortex, primary motor sensory cortex, temporal neocortex, and upper rhombic lip) were also removed. The final quality-controlled dataset consisted of expression levels of 15,952 protein-coding genes in 16 brain tissue types across 31 pre- and postnatal developmental stages (495 samples in total). For module detection, we used the WGCNA function "blockwiseModules" with power = 6, deepSplit = 2, and networkType = "signed," leaving all other parameters at their defaults. In the signed network, only positively correlated genes are treated as strongly connected, whereas negatively correlated genes receive low adjacency.
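Two of the quality-control filters and the signed adjacency can be sketched in Python. The thresholds come from the text; the data layouts and function names are hypothetical. The adjacency ((1 + r)/2)^β is the standard WGCNA signed adjacency, here with β = 6 as in the power used above. In WGCNA, module detection then builds on this adjacency (through topological overlap and hierarchical clustering), which the sketch does not reproduce.

```python
import math


def keep_expressed(rpkm_by_gene, min_max_rpkm=0.5):
    """Keep genes whose maximum RPKM across samples reaches the threshold."""
    return {g: v for g, v in rpkm_by_gene.items() if max(v) >= min_max_rpkm}


def keep_tissues(stages_by_tissue, min_stages=10):
    """Keep tissue types sampled at no fewer than min_stages developmental stages."""
    return {t for t, stages in stages_by_tissue.items() if len(set(stages)) >= min_stages}


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(x, y, power=6):
    """Signed adjacency ((1 + r) / 2) ** power: r = 1 gives 1, r = -1 gives 0."""
    return ((1 + pearson(x, y)) / 2) ** power


def unsigned_adjacency(x, y, power=6):
    """Unsigned alternative |r| ** power, which treats r = -1 like r = 1."""
    return abs(pearson(x, y)) ** power
```

Comparing the two adjacency functions on an anticorrelated pair shows the practical difference: the unsigned network links such genes strongly, and the signed network does not.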
Coexpression between gene pairs was calculated from the quality-controlled BrainSpan RNA-seq data (495 brain samples). Two genes were defined as "coexpressed" in the brain if the Pearson correlation of their expression levels across the 495 brain samples was greater than or equal to 0.8. In total, there were 8,600,150 coexpression links among protein-coding genes. The coexpression network was created using the GeneMANIA plugin (64) within Cytoscape 3.2.1 (65). Of the 974 EGs from the three modules (M01, M02, and M16) implicated in ASD, coexpression data were available for 973 genes, which were used as the input gene set for network construction. The coexpression network consists of a main connected component with 963 nodes and 187,443 edges, plus 10 isolated nodes.
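The thresholding rule and the component structure of the resulting network can be sketched in plain Python, as a stand-in for the GeneMANIA/Cytoscape workflow rather than a reproduction of it. The sketch assumes pairwise correlations have already been computed, and the dictionary layout for them is hypothetical.

```python
def build_edges(genes, correlations, threshold=0.8):
    """Undirected edges between input genes whose correlation is >= threshold.

    correlations -- dict mapping (gene_a, gene_b) -> Pearson r (hypothetical layout)
    """
    genes = set(genes)
    return {frozenset((a, b)) for (a, b), r in correlations.items()
            if a != b and a in genes and b in genes and r >= threshold}


def connected_components(genes, edges):
    """Union-find over the edge set; returns components sorted largest first."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=len, reverse=True)
```

Applied to the 973 input EGs, the first component and the count of single-gene components would be the analogues of the 963-node main component and the 10 isolated nodes. Edges inside the main component can be counted with `sum(1 for e in edges if e <= main)`.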
Discussion
We provide multiple lines of evidence suggesting that deleterious variants in EGs have a cumulative effect on ASD risk. Using the most comprehensive list of EGs established to date (3,915 genes), we show both an elevated burden of damaging mutations in EGs in ASD probands and an enrichment of EGs among the recently identified high-confidence ASD-associated genes. Moreover, the analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD, including 29 EGs functionally related to previously identified ASD risk genes.
We find that ASD individuals have a higher burden of mutations in EGs than their unaffected siblings. It is notable but not surprising that this effect is particularly pronounced for de novo mutations, because this class of mutations is subject to selection pressure only after arising in the individual and has shown some of the most prominent associations with ASD risk (8, 43–45). A moderately increased burden of dnLoF variants in ASD probands was also detected in a group of 10,823 phenotypically uncharacterized genes. Based on current estimates, about one-fifth of these uncharacterized genes (∼2,000) are expected to be EGs, which may explain this increased dnLoF burden in ASD probands. Recent studies have begun to show that additional genetic factors, such as rare and common inherited variation, also contribute to ASD (26, 46). Our results support this finding, showing that inherited, rare, damaging mutations in EGs also have a significant effect on ASD risk. Furthermore, we show an EG-specific effect on social responsiveness, a measure of the social aspects of ASD. In contrast, mutational burden in both EGs and NEGs affects IQ measures. Complex social behaviors result from a range of cognitive processes; however, in ASD subjects, there is a striking dissociation between the level of impairment in social interaction or communication and general cognitive ability (as measured by IQ) (36) (Fig. S2). Moreover, studies in model organisms clearly show a fetal origin for social behavior deficits (47). Our results are in line with these findings and suggest that, although a higher mutational burden across all genes may affect IQ, mutational burden in a set of genes with a role at critical early developmental stages influences the development of social behavior.
Our findings are further supported by a recent report that genomic regions under accelerated evolution in humans have essential functions in human brain development and, when mutated, may increase the risk for autism (48). Therefore, understanding the regulatory landscape of dosage-sensitive EGs expressed at critical stages of brain development may reveal risk alleles for many neurodevelopmental and psychiatric disorders.
Analyses of the overlapping set of Simons Simplex Collection ASD families by several groups using complementary approaches identified around 100 ASD risk genes and revealed a depletion of damaging mutations in ASD risk genes (12, 27, 28). We show that a significant number of reported ASD risk genes are essential for survival and fitness and therefore have a distinctive mutational spectrum, providing a biological basis for this intolerance to damaging mutations. Across the spectrum of existing alleles in EGs, homozygosity or compound heterozygosity for loss-of-function alleles will never be observed. In addition, because of synthetic lethality, some combinations of mutations in EGs are eliminated. Individuals will therefore carry only a subset of “milder” coding or regulatory alleles. The current list of candidate genes ranges from 100 (high-confidence ASD genes) to 400 genes (potential ASD genes) (9). Strikingly, in contrast to this relatively short list, our study provides strong statistical evidence for an aggregate effect across 3,915 EGs on risk for this neurodevelopmental disorder. A recent SNP-based heritability study reported the extreme polygenicity of schizophrenia, with 70% of 1-Mb genomic regions harboring schizophrenia risk alleles (49). Assuming a similar genetic architecture for ASD and schizophrenia, genomic maps of EGs carrying “surviving” deleterious and regulatory variants in ASD probands represent a complementary approach for analyzing combinations of culprit genes or alleles.
Because of the fundamental functional role of EGs in an organism, genetic variants in these genes are likely to contribute to many traits and diseases, as reflected by the previous finding that EGs are enriched for human disease genes (11, 13, 14). Our study focuses on a specific neurodevelopmental disorder, ASD, because it has been suggested that ASD has its roots in abnormal prenatal brain development (50–52). Specifically, our analysis of the temporal expression patterns of coexpressed gene modules in the developing brain shows that genes in three EG-enriched coexpression modules implicated in ASD are highly expressed at the earliest stages of brain development, as early as 8 weeks after conception. In contrast, at later stages of brain development, the expression of genes in these EG-enriched modules decreases, whereas the expression of genes in NEG-enriched modules increases. This finding suggests that EGs have a distinctive influence at some of the earliest stages of brain development, as previously reported for constrained genes (53) and for genes in functional networks perturbed in ASD (54). However, it is not clear whether the contribution of EGs is specific to ASD or common to disorders with various underlying mechanisms. A comparison of the burden of deleterious variants in EGs across other complex disorders, including those with a later onset, is warranted.
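The temporal expression pattern of a coexpression module is commonly summarized by its module eigengene, defined in the WGCNA framework (63) as the first principal component of the module's standardized expression matrix; Table S9 lists an eigengene plotting script (FigS1_Fig3B_plotEigengenes.r). The minimal NumPy sketch below assumes a hypothetical genes × samples matrix with samples ordered by developmental age and orients the eigengene along the average module expression (WGCNA's `moduleEigengenes` offers a comparable sign alignment). It is an illustration, not the authors' code.

```python
import numpy as np

def module_eigengene(expr):
    """Summarize a module's expression trajectory by its first principal component.

    expr: genes x samples matrix (samples ordered by developmental age); every
    gene is assumed to vary across samples so it can be standardized.
    Returns one value per sample, scaled to mean 0 and SD 1.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene across samples
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    # The first right singular vector is the leading principal axis in sample space
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # SVD signs are arbitrary: orient the eigengene along the average module expression
    if np.dot(eigengene, z.mean(axis=0)) < 0:
        eigengene = -eigengene
    return (eigengene - eigengene.mean()) / eigengene.std()
```

Plotting the returned values against sample age gives a single trajectory per module, so an EG-enriched module whose genes are high early and low late would show a declining eigengene.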
Each individual can carry a number of deleterious mutations, each of which may have a small effect. Because brain function may be particularly sensitive to mutation accumulation, identifying a specific set of genes in which mutations have a behavioral effect will help us understand how mutation accumulation within an individual can result in a phenotype such as ASD. Hallmarks of ASD include phenotypic heterogeneity, frequent comorbidities, and the absence of any single brain region or cell type that is uniquely implicated (5); these features further support a role for genes with a global effect on embryonic and fetal development. Here, we provide evidence that genes that are essential for survival and fitness also contribute to ASD risk and lead to the disruption of normal social behavior.
Materials and Methods
Identification of EGs.
Mammalian Phenotype (MP) ontology terms used to annotate EGs are listed in Table S8. Further details on the identification of the catalog of EGs are given in SI Materials and Methods.
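As a simplified illustration of how knockout phenotype annotations can partition genes, the sketch below labels a gene as an EG if any of its annotations is a lethality term, as an NEG if it has at least one annotated phenotype but none is lethal, and as uncharacterized if no phenotype is recorded. The entries in `LETHAL_TERMS` and the gene names are hypothetical placeholders; the MP terms actually used are listed in Table S8, and the full procedure (including ortholog mapping) is described in SI Materials and Methods.

```python
from typing import Dict, Set

# Hypothetical placeholder labels; the real MP terms are listed in Table S8
LETHAL_TERMS = {"prenatal lethality", "postnatal lethality"}

def classify_genes(annotations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each gene 'EG', 'NEG', or 'uncharacterized' from its phenotype terms."""
    labels = {}
    for gene, terms in annotations.items():
        if terms & LETHAL_TERMS:
            labels[gene] = "EG"               # at least one lethal knockout phenotype
        elif terms:
            labels[gene] = "NEG"              # phenotyped, but no lethal phenotype
        else:
            labels[gene] = "uncharacterized"  # no knockout phenotype recorded
    return labels

# Hypothetical annotations for three genes
example = {
    "GeneA": {"prenatal lethality", "abnormal heart morphology"},
    "GeneB": {"abnormal behavior"},
    "GeneC": set(),
}
print(classify_genes(example))
```

Checking lethality first means a gene with both lethal and nonlethal phenotypes is counted as an EG, matching the definition of EGs as genes with an essential role in pre- and postnatal development.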
Analysis of Haploinsufficiency of EGs.
Details on the collection of gene sets for the analysis of haploinsufficiency of EGs are given in SI Materials and Methods.
Burden Analysis of Mutations in EGs in ASD Families.
Details on the collection of genetic and phenotypic data for ASD families and on the variant filtering process are given in SI Materials and Methods.
Comparison Between Observed and Expected TADA FDR q Values.
To compare the strength of ASD association signals between EGs and NEGs, FDR q values from the TADA test for 18,665 genes were obtained from the work by Sanders et al. (9). For each gene set of interest (i.e., 3,915 EGs or 4,919 NEGs), the null distribution of TADA FDR q values was generated by random resampling with replacement. In each iteration, one TADA FDR q value was drawn at random from the 18,665 tested genes for each gene in the gene set of interest, and the resampled q values were ranked from low to high. This procedure was repeated for 100,000 iterations. For each rank, the median of the 100,000 resampled q values with that rank was taken as the expected TADA FDR q value, and the 2.5th and 97.5th percentiles were taken as its estimated 95% confidence interval. The observed FDR q values, ranked from low to high, were then compared with the expected FDR q values.
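The resampling procedure above can be sketched in Python with NumPy. The sketch assumes that the TADA FDR q values of all tested genes are available as one array (`all_q`) and the observed q values of the gene set of interest as another (`set_q`); the function name, the reduced default iteration count, and the random seed are illustrative choices, not taken from the authors' script (Fig2C_getExpectedTADAFDR.py in Table S9).

```python
import numpy as np

def expected_q_values(all_q, set_q, n_iter=1_000, seed=0):
    """Rank-wise null distribution of q values for a gene set of a given size.

    Each iteration draws len(set_q) q values with replacement from all tested
    genes and sorts them from low to high. For every rank, the median across
    iterations is the expected q value, and the 2.5th and 97.5th percentiles
    give an approximate 95% interval.
    """
    rng = np.random.default_rng(seed)
    all_q = np.asarray(all_q, dtype=float)
    k = len(set_q)
    # Shape (n_iter, k): one resample per row, sorted within the row
    samples = np.sort(rng.choice(all_q, size=(n_iter, k), replace=True), axis=1)
    expected = np.median(samples, axis=0)
    lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)
    observed = np.sort(np.asarray(set_q, dtype=float))
    return observed, expected, lower, upper
```

Observed q values falling below the lower bound at a given rank indicate a stronger association signal than expected for a random gene set of the same size. This naive version keeps every resample in memory; at the 100,000 iterations used here and a 3,915-gene set, that is on the order of 3 GB of float64 values, so processing the iterations in chunks or using float32 may be preferable.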
Construction of Coexpression Modules and Coexpression Network in Brain.
Details on construction of coexpression modules and coexpression network in the developing human brain are in SI Materials and Methods.
Pathway Enrichment Analysis.
We performed pathway enrichment analysis against the Reactome database (42) using Enrichr (55) for the three EG-enriched modules (M01, M02, and M16) that were also enriched for potential ASD genes (Table S6). Enriched pathways were ranked by Fisher's exact test P values after Benjamini–Hochberg adjustment (FDR q values).
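To make the statistics concrete, the sketch below computes a one-sided Fisher's exact test for overlap between a module's genes and each pathway relative to a background gene universe, then applies the Benjamini–Hochberg procedure. It uses only the Python standard library (the one-sided Fisher p value is computed as the equivalent hypergeometric upper tail); the module, pathway, and background gene lists are hypothetical inputs, and Enrichr's own implementation (for example, its choice of background) may differ.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test p value for [[a, b], [c, d]].

    Equals the hypergeometric upper tail P(X >= a) when drawing a + b module
    genes from a + b + c + d background genes, a + c of which are in the pathway.
    """
    n_module, n_pathway, total = a + b, a + c, a + b + c + d
    top = min(n_module, n_pathway)
    hits = sum(comb(n_pathway, x) * comb(total - n_pathway, n_module - x)
               for x in range(a, top + 1))
    return hits / comb(total, n_module)

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values (FDR q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def pathway_enrichment(module, pathways, background):
    """Return (pathway, p, q) tuples ranked by BH-adjusted Fisher p values."""
    background = set(background)
    module = set(module) & background
    names, p_values = [], []
    for name, genes in pathways.items():
        genes = set(genes) & background
        a = len(module & genes)            # module genes in the pathway
        b = len(module - genes)            # module genes outside the pathway
        c = len(genes - module)            # pathway genes outside the module
        d = len(background) - a - b - c    # background genes in neither set
        names.append(name)
        p_values.append(fisher_greater(a, b, c, d))
    q_values = bh_adjust(p_values)
    return sorted(zip(names, p_values, q_values), key=lambda t: t[2])
```

Restricting both the module and each pathway to the same background keeps the 2 × 2 table consistent; with real data, the background would typically be all genes expressed in the developing brain or all genes annotated in Reactome.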
Code Availability.
Details on the availability of the code used to generate the reported results are given in Table S9.
Table S9.
Analysis code for figures and tables generated
| File name | Figure/table | Description |
| --- | --- | --- |
| Fig1A_Fig2B_plotGeneSetEnrichment.r | Figs. 1A and 2B | Plotting enrichment of EGs among haploinsufficient genes and ASD risk genes |
| Fig1BC_plotHIScoreDistribution.r | Fig. 1 B and C | Plotting the distribution of haploinsufficiency scores |
| Fig2A_getForestPlot_burdenAnalysis.r | Fig. 2A | Plotting the results of the mutational burden analysis |
| Fig2A_Table1_TableS1_S2_S3_S4_burdenAnalysis.r | Fig. 2A, Table 1, and Tables S1, S2, S3, and S4 | Performing the mutational burden analysis |
| Fig2C_getExpectedTADAFDR.py | Fig. 2C | Generating the null distribution of TADA FDR q values for a gene set |
| Fig2C_plotTADAfdrQQ.r | Fig. 2C | Plotting the observed vs. null distribution of TADA FDR q values for a gene set |
| Fig3A_plotModuleEnrichment.r | Fig. 3A | Plotting the enrichment of EGs/NEGs among coexpression modules |
| Fig3C_plotNetworkAttibutes.r | Fig. 3C | Plotting the coexpression network of gene modules implicated in ASD |
| FigS1_Fig3B_plotEigengenes.r | Fig. 3B and Fig. S1 | Plotting the expression trajectories of coexpression modules |
| FigS2_plotSRS_IQ.r | Fig. S2 | Plotting the correlation between SRS and IQ in ASD probands |
| DatasetS3_S4_getVariantList.py | Datasets S3 and S4 | Generating lists of variants for the mutational burden analysis |
The analysis code for the figures and tables listed above was deposited on GitHub (https://github.com/Bucanlab/Ji_PNAS_2016).
Acknowledgments
We thank Steve Murray and the International Mouse Phenotyping Consortium (IMPC) for help with generation of gene lists, and Benjamin Georgi, Benjamin Voight, Hakon Hakonarson, Steve Brown, Judith Miller, Edward Brodkin, and Lu Chen for discussions. X.J. was supported by a fellowship from Biomedical Graduate Studies at the University of Pennsylvania. This work was supported by the Pennsylvania Commonwealth Grant and NIH Grants R01MH101822 (to C.D.B.) and R01MH093415 (to M.B. and Steven M. Paul; multiple principal investigators).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613195113/-/DCSupplemental.
References
- 1. State MW, Levitt P. The conundrums of understanding genetic risks for autism spectrum disorders. Nat Neurosci. 2011;14(12):1499–1506. doi: 10.1038/nn.2924.
- 2. Huguet G, Ey E, Bourgeron T. The genetic landscapes of autism spectrum disorders. Annu Rev Genomics Hum Genet. 2013;14:191–213. doi: 10.1146/annurev-genom-091212-153431.
- 3. Willsey AJ, State MW. Autism spectrum disorders: From genes to neurobiology. Curr Opin Neurobiol. 2015;30:92–99. doi: 10.1016/j.conb.2014.10.015.
- 4. De Rubeis S, Buxbaum JD. Recent advances in the genetics of autism spectrum disorder. Curr Neurol Neurosci Rep. 2015;15(6):36. doi: 10.1007/s11910-015-0553-1.
- 5. de la Torre-Ubieta L, Won H, Stein JL, Geschwind DH. Advancing the understanding of autism disease mechanisms through genetics. Nat Med. 2016;22(4):345–361. doi: 10.1038/nm.4071.
- 6. Geschwind DH, Levitt P. Autism spectrum disorders: Developmental disconnection syndromes. Curr Opin Neurobiol. 2007;17(1):103–111. doi: 10.1016/j.conb.2007.01.009.
- 7. De Rubeis S, et al.; DDD Study; Homozygosity Mapping Collaborative for Autism; UK10K Consortium. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515(7526):209–215. doi: 10.1038/nature13772.
- 8. Iossifov I, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515(7526):216–221. doi: 10.1038/nature13908.
- 9. Sanders SJ, et al.; Autism Sequencing Consortium. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron. 2015;87(6):1215–1233. doi: 10.1016/j.neuron.2015.09.016.
- 10. Zhang M, Zhu C, Jacomy A, Lu LJ, Jegga AG. The orphan disease networks. Am J Hum Genet. 2011;88(6):755–766. doi: 10.1016/j.ajhg.2011.05.006.
- 11. Georgi B, Voight BF, Bućan M. From mouse to human: Evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 2013;9(5):e1003484. doi: 10.1371/journal.pgen.1003484.
- 12. Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9(8):e1003709. doi: 10.1371/journal.pgen.1003709.
- 13. Dickerson JE, Zhu A, Robertson DL, Hentges KE. Defining the role of essential genes in human disease. PLoS One. 2011;6(11):e27368. doi: 10.1371/journal.pone.0027368.
- 14. Dickinson ME, et al.; International Mouse Phenotyping Consortium; Jackson Laboratory; Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS); Charles River Laboratories; MRC Harwell; Toronto Centre for Phenogenomics; Wellcome Trust Sanger Institute; RIKEN BioResource Center. High-throughput discovery of novel developmental phenotypes. Nature. 2016;537(7621):508–514. doi: 10.1038/nature19356.
- 15. Deutschbauer AM, et al. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics. 2005;169(4):1915–1925. doi: 10.1534/genetics.104.036871.
- 16. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93(19):10268–10273. doi: 10.1073/pnas.93.19.10268.
- 17. Koonin EV. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol. 2003;1(2):127–136. doi: 10.1038/nrmicro751.
- 18. Hwang YC, et al. Predicting essential genes based on network and sequence analysis. Mol Biosyst. 2009;5(12):1672–1678. doi: 10.1039/B900611G.
- 19. Chakravarti A, Turner TN. Revealing rate-limiting steps in complex disease biology: The crucial importance of studying rare, extreme-phenotype families. BioEssays. 2016;38(6):578–586. doi: 10.1002/bies.201500203.
- 20. Blomen VA, et al. Gene essentiality and synthetic lethality in haploid human cells. Science. 2015;350(6264):1092–1096. doi: 10.1126/science.aac7557.
- 21. Wang T, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350(6264):1096–1101. doi: 10.1126/science.aac7041.
- 22. Hart T, et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015;163(6):1515–1526. doi: 10.1016/j.cell.2015.11.015.
- 23. Eppig JT, et al.; Mouse Genome Database Group. The Mouse Genome Database (MGD): From genes to mice--a community resource for mouse biology. Nucleic Acids Res. 2005;33(Database issue):D471–D475. doi: 10.1093/nar/gki113.
- 24. Koscielny G, et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. 2014;42(Database issue):D802–D809. doi: 10.1093/nar/gkt977.
- 25. White JK, et al.; Sanger Institute Mouse Genetics Project. Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell. 2013;154(2):452–464. doi: 10.1016/j.cell.2013.06.022.
- 26. Krumm N, et al. Excess of rare, inherited truncating mutations in autism. Nat Genet. 2015;47(6):582–588. doi: 10.1038/ng.3303.
- 27. Samocha KE, et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014;46(9):944–950. doi: 10.1038/ng.3050.
- 28. Iossifov I, et al. Low load for disruptive mutations in autism genes and their biased transmission. Proc Natl Acad Sci USA. 2015;112(41):E5600–E5607. doi: 10.1073/pnas.1516376112.
- 29. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database issue):D514–D517. doi: 10.1093/nar/gki033.
- 30. Rehm HL, et al.; ClinGen. ClinGen--the Clinical Genome Resource. N Engl J Med. 2015;372(23):2235–2242. doi: 10.1056/NEJMsr1406261.
- 31. Dang VT, Kassahn KS, Marcos AE, Ragan MA. Identification of human haploinsufficient genes and their genomic proximity to segmental duplications. Eur J Hum Genet. 2008;16(11):1350–1357. doi: 10.1038/ejhg.2008.111.
- 32. Huang N, Lee I, Marcotte EM, Hurles ME. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 2010;6(10):e1001154. doi: 10.1371/journal.pgen.1001154.
- 33. Steinberg J, Honti F, Meader S, Webber C. Haploinsufficiency predictions without study bias. Nucleic Acids Res. 2015;43(15):e101. doi: 10.1093/nar/gkv474.
- 34. Fischbach GD, Lord C. The Simons Simplex Collection: A resource for identification of autism genetic risk factors. Neuron. 2010;68(2):192–195. doi: 10.1016/j.neuron.2010.10.006.
- 35. Constantino J, Gruber C. The Social Responsiveness Scale Manual. Western Psychological Services; Los Angeles: 2005.
- 36. Constantino JN, et al. Validation of a brief quantitative measure of autistic traits: Comparison of the social responsiveness scale with the autism diagnostic interview-revised. J Autism Dev Disord. 2003;33(4):427–433. doi: 10.1023/a:1025014929212.
- 37. Abrahams BS, et al. SFARI Gene 2.0: A community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol Autism. 2013;4(1):36. doi: 10.1186/2040-2392-4-36.
- 38. He X, et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 2013;9(8):e1003671. doi: 10.1371/journal.pgen.1003671.
- 39. Buxbaum JD, et al.; Autism Sequencing Consortium. The autism sequencing consortium: Large-scale, high-throughput sequencing in autism spectrum disorders. Neuron. 2012;76(6):1052–1056. doi: 10.1016/j.neuron.2012.12.008.
- 40. BrainSpan (2011) BrainSpan: Atlas of the Developing Human Brain. Available at brainspan.org. Accessed October 4, 2013.
- 41. Croft D, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2014;42(Database issue):D472–D477. doi: 10.1093/nar/gkt1102.
- 42. Fabregat A, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44(D1):D481–D487. doi: 10.1093/nar/gkv1351.
- 43. Sanders SJ, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485(7397):237–241. doi: 10.1038/nature10945.
- 44. O’Roak BJ, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485(7397):246–250. doi: 10.1038/nature10989.
- 45. Iossifov I, et al. De novo gene disruptions in children on the autistic spectrum. Neuron. 2012;74(2):285–299. doi: 10.1016/j.neuron.2012.04.009.
- 46. Gaugler T, et al. Most genetic risk for autism resides with common variation. Nat Genet. 2014;46(8):881–885. doi: 10.1038/ng.3039.
- 47. Belinson H, et al. Prenatal β-catenin/Brn2/Tbr2 transcriptional cascade regulates adult social and stereotypic behaviors. Mol Psychiatry. 2016;21(10):1417–1433. doi: 10.1038/mp.2015.207.
- 48. Doan RN, et al. Mutations in human accelerated regions disrupt cognition and social behavior. Cell. 2016;167(2):341–354.e12. doi: 10.1016/j.cell.2016.08.071.
- 49. Loh PR, et al.; Schizophrenia Working Group of Psychiatric Genomics Consortium. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet. 2015;47(12):1385–1392. doi: 10.1038/ng.3431.
- 50. Willsey AJ, et al. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell. 2013;155(5):997–1007. doi: 10.1016/j.cell.2013.10.020.
- 51. Parikshak NN, et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell. 2013;155(5):1008–1021. doi: 10.1016/j.cell.2013.10.031.
- 52. Stoner R, et al. Patches of disorganization in the neocortex of children with autism. N Engl J Med. 2014;370(13):1209–1219. doi: 10.1056/NEJMoa1307491.
- 53. Choi J, Shooshtari P, Samocha KE, Daly MJ, Cotsapas C. Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 2016;12(6):e1006121. doi: 10.1371/journal.pgen.1006121.
- 54. Chang J, Gilman SR, Chiang AH, Sanders SJ, Vitkup D. Genotype to phenotype relationships in autism spectrum disorders. Nat Neurosci. 2015;18(2):191–198. doi: 10.1038/nn.3907.
- 55. Chen EY, et al. Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128. doi: 10.1186/1471-2105-14-128.
- 56. Flicek P, et al. Ensembl 2014. Nucleic Acids Res. 2014;42(Database issue):D749–D755. doi: 10.1093/nar/gkt1196.
- 57. Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord. 1994;24(5):659–685. doi: 10.1007/BF02172145.
- 58. Lord C, et al. The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. J Autism Dev Disord. 2000;30(3):205–223.
- 59. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–315. doi: 10.1038/ng.2892.
- 60. McKenna A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–1303. doi: 10.1101/gr.107524.110.
- 61. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. 2012. arXiv:1207.3907.
- 62. NHLBI Exome Sequencing Project (ESP) Exome Variant Server. Available at evs.gs.washington.edu/EVS/. Accessed November 11, 2015.
- 63. Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559.
- 64. Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: A real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9(Suppl 1):S4. doi: 10.1186/gb-2008-9-s1-s4.
- 65. Shannon P, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–2504. doi: 10.1101/gr.1239303.