Proceedings of the National Academy of Sciences of the United States of America
2016 Dec 12;113(52):15054–15059. doi: 10.1073/pnas.1613195113

Increased burden of deleterious variants in essential genes in autism spectrum disorder

Xiao Ji a,b, Rachel L Kember b, Christopher D Brown b,1, Maja Bućan b,c,1
PMCID: PMC5206557  PMID: 27956632

Significance

Essential genes (EGs) are necessary for survival and the development of an organism. Our study is focused on investigating the role of EGs in autism spectrum disorder (ASD). With a comprehensive catalog of 3,915 mammalian EGs, we show that there is both an elevated burden of damaging mutations in EGs in ASD probands and also, an enrichment of EGs in known ASD risk genes. Moreover, the analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Overall, we provide evidence that genes that are essential for survival and fitness also contribute to ASD risk and lead to the disruption of normal social behavior.

Keywords: essential genes, mouse knockouts, mutational burden, autism spectrum disorder, coexpression modules

Abstract

Autism spectrum disorder (ASD) is a heterogeneous, highly heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior. It is estimated that hundreds of genes contribute to ASD. We asked if genes with a strong effect on survival and fitness contribute to ASD risk. Human orthologs of genes with an essential role in pre- and postnatal development in the mouse [essential genes (EGs)] are enriched for disease genes and under strong purifying selection relative to human orthologs of mouse genes with a known nonlethal phenotype [nonessential genes (NEGs)]. This intolerance to deleterious mutations, commonly observed haploinsufficiency, and the importance of EGs in development suggest a possible cumulative effect of deleterious variants in EGs on complex neurodevelopmental disorders. With a comprehensive catalog of 3,915 mammalian EGs, we provide compelling evidence for a stronger contribution of EGs to ASD risk compared with NEGs. By examining the exonic de novo and inherited variants from 1,781 ASD quartet families, we show a significantly higher burden of damaging mutations in EGs in ASD probands compared with their non-ASD siblings. The analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD. Finally, we suggest a high-priority list of 29 EGs with potential ASD risk as targets for future functional and behavioral studies. Overall, we show that large-scale studies of gene function in model organisms provide a powerful approach for prioritization of genes and pathogenic variants identified by sequencing studies of human disease.


Autism spectrum disorder (ASD) is a heterogeneous, heritable neurodevelopmental syndrome characterized by impaired social interaction, communication, and repetitive behavior (1, 2). The highly polygenic nature of ASD (3–5) suggests that the analysis of the full spectrum of sequence variants in hundreds of genes will be necessary for deeper understanding of disrupted neuronal function. Prioritization of ASD risk genes initially focused on known pathways with recognized relevance to pathogenesis of ASD, such as synaptic function and neuronal development (6). However, combined analyses of de novo, inherited, and case–control variation in over 2,500 ASD parent–child nuclear families identified around 100 genes contributing to ASD risk (7–9), converging on pathways implicated in transcriptional regulation and chromatin modeling in addition to synaptic function.

The main challenge in the current understanding of genetic architecture of ASD comes from a need to study the interplay between variants with a high effect (for example, recurrent de novo variants) and a background of variants with an intermediate effect but that nevertheless still disrupt proper neuronal development. Essential genes (EGs) or genes that are necessary for successful completion of pre- and postnatal development are prime candidates for the source of this background or load of variants with a cumulative intermediate effect. EGs are highly enriched for human disease genes and under strong purifying selection (10–14). In addition to intolerance to loss-of-function and deleterious mutations, the functional impact of EGs is reflected by haploinsufficiency that is commonly observed in heterozygous mutations (11, 15). In addition to their role in defining a “minimal gene set” (16, 17), EGs tend to play important roles in protein interaction networks (18). Therefore, one may consider that EGs are involved in rate-limiting steps that affect a range of disease pathways (19).

Recently, three large-scale screens (gene trap and CRISPR-Cas9) have been performed to assess the effect of single-gene mutations on cell viability or survival of haploid human cancer cell lines (“cell-based essentiality”) (20–22). These studies identified an overlapping core set of genes that were essential in the majority of cell lines tested (n = 956), although a subset of genes were essential in specific cell lines. In an alternative and complementary approach, we assembled a catalog of human orthologs of EGs in the mouse (n = 3,326) (14) based on the organismal-level phenotypes of loss-of-function mouse mutants from the Mouse Genome Informatics (MGI) database (23) and the International Mouse Phenotyping Consortium (IMPC) web portal (24). Based on these data, homozygous loss-of-function mutations in 3,326 genes lead to prenatal or preweaning lethality, with a significant overlap between the core set of human cell EGs and human orthologs of EGs in the mouse (14). These studies are consistent with an estimate that 30% (or ∼6,000) of protein-coding genes are essential for pre- and postnatal survival (14, 25).

A deeper understanding of the mutational spectrum of EGs in a neurodevelopmental disorder, such as ASD, is important, because EGs are less likely to be redundant, are more likely to have functional consequences when mutated, and may produce a gradation of phenotypes (25). Our previous work reported an enrichment of EGs among genes with de novo mutations in ASD patients (11). Several groups reported an enrichment of de novo and rare inherited single-nucleotide loss-of-function variants in ASD probands (8, 26), although there is a depletion of damaging mutations in ASD risk genes in population controls (12, 27, 28). In this report, we compiled, to our knowledge, the most comprehensive list of human EGs and extended the analysis to both de novo and inherited damaging variants in 1,781 ASD families. In addition to disease status, we further showed the effect of damaging variants in EGs on ASD-related traits, such as the social skill measurement in 2,348 ASD probands. Finally, we performed coexpression analysis of EGs in the developing human brain to identify clusters of interacting EGs that contribute to ASD risk and suggest ASD candidate genes.

Results

To identify the most comprehensive set of EGs in mammals, we combined the set of human orthologs of EGs in the mouse (n = 3,326) (14) with a set of human “core EGs” (n = 956) that were found to be essential in cell-based assays (20–22). Based on a significant overlap between tested mouse and human EGs (14), we expanded our original set of 3,326 EGs with the addition of 589 nonoverlapping EGs identified only in human cell lines for a total of 3,915 EGs (SI Materials and Methods and Dataset S1). In our subsequent analyses, we compared features of and genetic variation in these EGs with 4,919 human orthologs of genes with reported nonlethal phenotypes in the mouse [nonessential genes (NEGs)].
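The bookkeeping behind the combined catalog can be sketched with set operations. The gene names below are placeholders invented for illustration, not entries from Dataset S1; only the counts come from the text. Since 589 of the 956 core EGs were added as nonoverlapping, the remaining 367 must already have been present among the 3,326 mouse-ortholog EGs.

```python
# Toy sets with hypothetical gene names; real gene lists are in Dataset S1.
mouse_ortholog_egs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
human_cell_core_egs = {"GENE_C", "GENE_D", "GENE_E"}

combined_egs = mouse_ortholog_egs | human_cell_core_egs   # union = full catalog
cell_only_egs = human_cell_core_egs - mouse_ortholog_egs  # added from cell screens
shared_egs = mouse_ortholog_egs & human_cell_core_egs     # essential in both

# The same arithmetic using the counts reported in the text.
n_mouse, n_cell_core, n_cell_only = 3326, 956, 589
n_combined = n_mouse + n_cell_only    # size of the combined EG catalog
n_shared = n_cell_core - n_cell_only  # core EGs already in the mouse-based set

print(n_combined, n_shared)
```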

Homozygous loss-of-function mutations in EGs lead to lethality (or miscarriages in humans) and as such, cannot contribute to disease. Although we and others reported a depletion of loss-of-function mutations in EGs in humans (11, 12, 14), heterozygosity for a loss-of-function mutation or other “milder” alleles in EGs may contribute to both dominant and recessive diseases. We illustrated this point using a catalog of disease-linked genes in Online Mendelian Inheritance in Man (29) (SI Materials and Methods); EGs were enriched relative to NEGs in 1,000 genes underlying dominant diseases (odds ratio = 1.95, P value = 3.17 × 10−19; two-sided Fisher’s exact test) and 1,645 genes underlying recessive disease (odds ratio = 1.52, P value = 4.94 × 10−11; two-sided Fisher’s exact test) (Fig. 1A). A stronger enrichment of EGs among genes underlying dominant disease implies that dominant negative alleles and haploinsufficiency play an important role. We provide multiple lines of evidence for higher probability of haploinsufficiency of EGs (Fig. 1A and SI Materials and Methods). First, using the systematically rated dosage-sensitive genes from ClinGen (30), we found that EGs were significantly enriched compared with NEGs and that the levels of EG enrichment positively correlated with levels of evidence supporting dosage sensitivity of rated genes (odds ratio = 3.94, P value = 5.07 × 10−20 for “sufficient evidence”; odds ratio = 5.26, P value = 7.08 × 10−5 for “some evidence”; odds ratio = 2.52, P value = 0.0106 for “little evidence”; odds ratio = 1.14, P value = 0.608 for “not dosage sensitive”; two-sided Fisher’s exact test). Second, as an extension of the earlier findings from the work by Georgi et al. (11), we confirmed the enrichment of EG relative to NEG for 262 human haploinsufficient genes (31) with the updated EG and NEG list (183 EGs vs. 62 NEGs; P value = 1.64 × 10−22, odds ratio = 3.84; two-sided Fisher’s exact test). 
Third, EGs are significantly overrepresented among 313 human orthologs of mouse genes with heterozygous alleles associated with mutant phenotypes from the MGI (23) (odds ratio = 3.43, P value = 2.74 × 10−23; two-sided Fisher’s exact test). Fourth, with two genome-wide prediction models of haploinsufficient genes in the human genome (32, 33), we observed that EGs have significantly higher probability of exhibiting haploinsufficiency compared with NEGs (P value < 2.2 × 10−16 for both models; two-sided Wilcoxon rank sum test) (Fig. 1 B and C and SI Materials and Methods). Based on our findings that EGs linked to Mendelian disease are overwhelmingly dosage-sensitive, we explored the possibility that a cumulative effect of pathogenic variants in multiple EGs may underlie the genetic basis of a complex disease with early postnatal onset, such as ASD.
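As an illustration of the enrichment tests used throughout this section, the following sketch recomputes the haploinsufficient-gene comparison (183 EGs vs. 62 NEGs). It assumes that the 2 × 2 table is built from the full catalogs of 3,915 EGs and 4,919 NEGs, which this section does not state explicitly, and it uses the sample (cross-product) odds ratio; some statistical packages report a conditional estimate instead, so small differences from published values are possible. The function names are illustrative.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). The P value sums the
    hypergeometric probabilities of all tables that are no more
    likely than the observed one, with margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_pmf(x):
        return (log_choose(row1, x) + log_choose(row2, col1 - x)
                - log_choose(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_observed = log_pmf(a)
    p_value = sum(exp(log_pmf(x)) for x in range(lowest, highest + 1)
                  if log_pmf(x) <= log_observed + 1e-7)
    return (a * d) / (b * c), min(p_value, 1.0)


# Rows: EG, NEG. Columns: haploinsufficient, not haploinsufficient.
n_eg, n_neg = 3915, 4919  # assumed denominators (full catalogs)
odds_ratio, p_value = fisher_two_sided(183, n_eg - 183, 62, n_neg - 62)
print(round(odds_ratio, 2), p_value)
```

Under these assumptions the cross-product odds ratio rounds to the reported 3.84, which suggests that the full catalogs were indeed the denominators for this comparison.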

Fig. 1.

Haploinsufficiency of EGs. (A) For each class of genes with different essentiality status (EG in red, NEG in turquoise, and unknown in gray), the proportion of genes among each gene set of interest is plotted in Left. Dosage-sensitive genes from ClinGen (30) were classified into five categories (1, sufficient evidence; 2, some evidence; 3, little evidence; 4, no evidence; and 5, not sensitive/recessive). Two-sided Fisher’s exact test was performed to assess the enrichment of EGs vs. NEGs, and the P values were indicated. The odds ratios for enrichment of EGs compared with NEGs and the 95% confidence intervals of odds ratios are plotted in Right. OMIM, Online Mendelian Inheritance in Man (29). (B and C) Histograms and estimated density curves indicating the distribution of (B) the Haploinsufficiency Score (HIS) (32) and (C) the Genome-Wide Haploinsufficiency Score (GHIS) (33) across three gene sets, including EGs (red), NEGs (turquoise), and all protein-coding genes (56) (gray). EGs have significantly higher probability of exhibiting haploinsufficiency compared with NEGs (P value < 2.2 × 10−16 for both models; two-sided Wilcoxon rank sum test).

To address a possible cumulative effect of variants in EGs in ASD in a larger cohort of 1,781 ASD quartet families (with 1,781 probands and 1,781 siblings) from the Simons Simplex Collection (34), we acquired de novo and rare inherited mutations from the exome sequencing data of these families (8, 26). We examined the individual mutational burden defined by the number of de novo loss-of-function (dnLoF), de novo nonsynonymous damaging (dnNSD), and inherited rare damaging (inhRD) mutations per individual (Fig. S1, SI Materials and Methods, and Datasets S2–S4). On average, an ASD proband carried 0.06 dnLoF, 0.21 dnNSD, and 10.74 inhRD mutations in EGs. The mutational burden in EGs was significantly elevated in ASD probands compared with unaffected siblings for the three classes of variants considered (P value = 4.75 × 10−7 for dnLoF, P value = 3.41 × 10−4 for dnNSD, and P value = 0.017 for inhRD; one-sided Wilcoxon signed-rank test) (Fig. 2A and Table S1). In contrast, no significant difference in mutational burden in NEGs was observed (P value = 0.10 for dnLoF, P value = 0.069 for dnNSD, and P value = 0.75 for inhRD) (Table S1). Interestingly, 10,823 genes that are currently not assigned as EG or NEG (i.e., phenotypically uncharacterized in mouse knockouts and human cell-based assays) have a moderately elevated burden of dnLoF but not dnNSD and inhRD variants in ASD probands (P value = 0.0042) (Table S1). Notably, the effect sizes of EG burden in each variant type correspond to our understanding of the severity of the variant type; de novo mutations, which are expected to have a larger functional impact, also display the strongest difference between ASD probands and unaffected siblings (effect size = 0.117 for dnLoF; effect size = 0.079 for dnNSD; Cohen’s d). In contrast, inherited mutations are expected to have a moderate functional impact, and a smaller difference is observed between probands and siblings (effect size = 0.042 for inhRD).
Although we observed marginally increased burden of dnLoF and dnNSD mutations in EGs in female (n = 325) compared with male (n = 2,043) probands (Table S2), the analysis of families divided by gender of proband–sibling pairs (female–female, male–female, female–male, and male–male) showed that gender bias does not underlie the observed differences in mutational burden between probands and siblings (Table S3).
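The paired effect size used here (the mean proband-minus-sibling difference divided by the SD of the paired differences, as defined in the Fig. 2 and Table S1 notes) can be sketched as follows. The per-family counts are invented toy values, and the use of the sample SD (n − 1 denominator) is an assumption, because the text does not specify it.

```python
from statistics import mean, stdev


def cohens_d_paired(proband_counts, sibling_counts):
    """Mean of paired differences divided by the SD of those differences."""
    differences = [p - s for p, s in zip(proband_counts, sibling_counts)]
    return mean(differences) / stdev(differences)  # sample SD: an assumption


# Hypothetical mutation counts for eight proband-sibling pairs.
toy_probands = [1, 0, 0, 2, 0, 1, 0, 0]
toy_siblings = [0, 0, 0, 1, 0, 0, 1, 0]
d_toy = cohens_d_paired(toy_probands, toy_siblings)

# Back-of-envelope reading of the reported dnLoF result in EGs:
# mean difference = 0.0640 - 0.0286 and d = 0.1170 together imply
# the SD of the paired differences.
implied_sd = (0.0640 - 0.0286) / 0.1170

print(round(d_toy, 4), round(implied_sd, 4))
```

Read this way, the reported dnLoF effect corresponds to an average excess of about 0.035 mutations per proband against a between-pair SD of roughly 0.30, which is why a small absolute difference can still be highly significant across 1,781 pairs.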

Fig. S1.

Variant filtering steps for the mutational burden analysis. EVS, Exome Variant Server (release ESP 6500); MAF, minor allele frequency.

Fig. 2.

Assessment of the contribution of EGs to ASD risk. (A) Individual mutational burden analysis in 1,781 pairs of ASD probands and unaffected siblings (Table S1). The analyses were performed separately for 3,915 EGs (red) and 4,919 NEGs (turquoise). The individual mutational burden is defined by the number of dnLoF, dnNSD, and inhRD mutations per individual. Effect sizes were measured by Cohen’s d, which is defined as the difference between both means divided by the SD of the paired differences. The estimated 95% confidence intervals of effect sizes were plotted (SI Materials and Methods). P values were obtained from one-sided Wilcoxon signed-rank test. *P value < 0.05. (B) ASD candidate genes categorized by SFARI genes scores (S, syndromic; 1, high confidence; 2, strong candidate; 3, suggestive evidence; 4, minimal evidence; 5, hypothesized; and 6, not supported) (37) and their essentiality status (EG in red, NEG in turquoise, and unknown in gray). ***The P value from two-sided Fisher’s exact test (EG vs. NEG) is less than 0.001. (C) The distribution of TADA FDR q values of EGs and NEGs. The FDR q value of the TADA test evaluates ASD association based on combined evidence from de novo SNVs and small deletions, rare inherited variants, and case–control variants (9). The observed negative log10 (q) values of 3,915 EGs (red) and 4,919 NEGs (turquoise) are compared with the expected counterparts under the null hypothesis. The dashed lines indicate the FDR thresholds (FDR = 0.1 in red and FDR = 0.5 in blue) for identification of ASD risk genes. The 95% confidence intervals of the expected negative log10 (q) values are shaded in gray.

Table S1.

Mutational burden analysis in 1,781 ASD quartet families

| Variant type | Gene set | No. of genes | Proband average | Sibling average | Effect size | Effect size 95% CI low | Effect size 95% CI high | P value |
|---|---|---|---|---|---|---|---|---|
| dnLoF | EG: this work | 3,915 | 0.0640 | 0.0286 | 0.1170 | 0.0715 | 0.1596 | 4.75 × 10−7 |
| dnLoF | EG: Dickinson et al. (14) | 3,326 | 0.0595 | 0.0253 | 0.1176 | 0.0730 | 0.1603 | 4.16 × 10−7 |
| dnLoF | EG: Georgi et al. (11) | 2,472 | 0.0494 | 0.0168 | 0.1254 | 0.0820 | 0.1671 | 7.82 × 10−8 |
| dnLoF | Human cell EGs (20–22) | 956 | 0.0079 | 0.0056 | 0.0193 | −0.0269 | 0.0637 | 0.2118 |
| dnLoF | NEG | 4,919 | 0.0387 | 0.0309 | 0.0300 | −0.0157 | 0.0774 | 0.1028 |
| dnLoF | Phenotypically uncharacterized genes | 10,823 | 0.0752 | 0.0533 | 0.0606 | 0.0143 | 0.1084 | 0.004257 |
| dnNSD | EG: this work | 3,915 | 0.2061 | 0.1589 | 0.0794 | 0.0324 | 0.1274 | 3.41 × 10−4 |
| dnNSD | EG: Dickinson et al. (14) | 3,326 | 0.1875 | 0.1376 | 0.0892 | 0.0429 | 0.1353 | 8.13 × 10−5 |
| dnNSD | EG: Georgi et al. (11) | 2,472 | 0.1505 | 0.1050 | 0.0895 | 0.0435 | 0.1366 | 7.36 × 10−5 |
| dnNSD | Human cell EGs (20–22) | 956 | 0.0371 | 0.0365 | 0.0021 | −0.0435 | 0.0499 | 0.4696 |
| dnNSD | NEG | 4,919 | 0.1611 | 0.1404 | 0.0374 | −0.0100 | 0.0827 | 0.0691 |
| dnNSD | Phenotypically uncharacterized genes | 10,823 | 0.2471 | 0.2791 | −0.0419 | −0.0884 | 0.0044 | 0.9636 |
| inhRD | EG: this work | 3,915 | 10.7428 | 10.6042 | 0.0420 | −0.0041 | 0.0887 | 0.01688 |
| inhRD | EG: Dickinson et al. (14) | 3,326 | 9.3257 | 9.2358 | 0.0287 | −0.0185 | 0.0757 | 0.04139 |
| inhRD | EG: Georgi et al. (11) | 2,472 | 7.0236 | 6.9163 | 0.0402 | −0.0053 | 0.0867 | 0.02622 |
| inhRD | Human cell EGs (20–22) | 956 | 2.3745 | 2.3779 | −0.0022 | −0.0485 | 0.0435 | 0.5935 |
| inhRD | NEG | 4,919 | 12.7816 | 12.8355 | −0.0150 | −0.0618 | 0.0308 | 0.7456 |
| inhRD | Phenotypically uncharacterized genes | 10,823 | 20.3947 | 20.4559 | −0.0133 | −0.0592 | 0.0342 | 0.5404 |

Effect sizes were measured by Cohen's d, which is defined as the difference between both means divided by the SD of the paired differences. P values were obtained from one-sided Wilcoxon signed-rank test. 95% CI, 95% confidence interval.

Table S2.

Difference in individual mutational burden between male and female probands

| Variant type | Gene set | Female proband average | Male proband average | Effect size | P value |
|---|---|---|---|---|---|
| dnLoF | EG | 0.0862 | 0.0597 | 0.1042 | 0.0355* |
| dnLoF | NEG | 0.0462 | 0.0357 | 0.0551 | 0.1782 |
| dnNSD | EG | 0.2400 | 0.1948 | 0.1014 | 0.0388* |
| dnNSD | NEG | 0.2000 | 0.1596 | 0.0993 | 0.0742 |
| inhRD | EG | 11.0523 | 10.9633 | 0.0151 | 0.4711 |
| inhRD | NEG | 13.2677 | 13.0113 | 0.0360 | 0.5271 |

Effect sizes were measured by Cohen's d, which is defined as the difference between both means divided by pooled SD.

*P values with statistical significance.

Table S3.

Mutational burden analysis in 1,781 ASD quartet families (dissected by genders of proband–sibling pairs)

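| Variant type | Gene set | Proband gender | Sibling gender | No. of families | Proband average | Sibling average | Effect size | P value |
|---|---|---|---|---|---|---|---|---|
| dnLoF | EG | All | All | 1,781 | 0.0640 | 0.0286 | 0.1170 | 4.75 × 10−7 |
| dnLoF | EG | Female | Male | 101 | 0.0891 | 0.0099 | 0.2588 | 0.0067 |
| dnLoF | EG | Male | Female | 826 | 0.0593 | 0.0327 | 0.0893 | 0.0053 |
| dnLoF | EG | Male | Male | 732 | 0.0615 | 0.0246 | 0.1228 | 0.0005 |
| dnLoF | EG | Female | Female | 122 | 0.0902 | 0.0410 | 0.1461 | 0.0600 |
| dnLoF | NEG | All | All | 1,781 | 0.0387 | 0.0309 | 0.0300 | 0.1028 |
| dnLoF | NEG | Female | Male | 101 | 0.0396 | 0.0297 | 0.0374 | 0.3884 |
| dnLoF | NEG | Male | Female | 826 | 0.0412 | 0.0266 | 0.0558 | 0.0549 |
| dnLoF | NEG | Male | Male | 732 | 0.0369 | 0.0314 | 0.0213 | 0.2838 |
| dnLoF | NEG | Female | Female | 122 | 0.0328 | 0.0574 | 0.0818 | 0.8302 |
| dnNSD | EG | All | All | 1,781 | 0.2061 | 0.1589 | 0.0794 | 0.0003 |
| dnNSD | EG | Female | Male | 101 | 0.2178 | 0.1683 | 0.0724 | 0.2392 |
| dnNSD | EG | Male | Female | 826 | 0.2094 | 0.1755 | 0.0552 | 0.0454 |
| dnNSD | EG | Male | Male | 732 | 0.1885 | 0.1270 | 0.1136 | 0.0013 |
| dnNSD | EG | Female | Female | 122 | 0.2787 | 0.2295 | 0.0725 | 0.2157 |
| dnNSD | NEG | All | All | 1,781 | 0.1611 | 0.1404 | 0.0374 | 0.0691 |
| dnNSD | NEG | Female | Male | 101 | 0.1881 | 0.1980 | 0.0155 | 0.5696 |
| dnNSD | NEG | Male | Female | 826 | 0.1465 | 0.1477 | 0.0022 | 0.5515 |
| dnNSD | NEG | Male | Male | 732 | 0.1667 | 0.1175 | 0.0904 | 0.0080 |
| dnNSD | NEG | Female | Female | 122 | 0.2049 | 0.1803 | 0.0379 | 0.3817 |
| inhRD | EG | All | All | 1,781 | 10.7428 | 10.6042 | 0.0420 | 0.0169 |
| inhRD | EG | Female | Male | 101 | 10.3762 | 10.6436 | 0.0778 | 0.8260 |
| inhRD | EG | Male | Female | 826 | 10.8341 | 10.7034 | 0.0401 | 0.1120 |
| inhRD | EG | Male | Male | 732 | 10.5765 | 10.4372 | 0.0417 | 0.0449 |
| inhRD | EG | Female | Female | 122 | 11.4262 | 10.9016 | 0.1619 | 0.0430 |
| inhRD | NEG | All | All | 1,781 | 12.7816 | 12.8355 | 0.0150 | 0.7456 |
| inhRD | NEG | Female | Male | 101 | 12.5050 | 13.0792 | 0.1398 | 0.9143 |
| inhRD | NEG | Male | Female | 826 | 12.8693 | 13.0182 | 0.0424 | 0.7802 |
| inhRD | NEG | Male | Male | 732 | 12.6134 | 12.5546 | 0.0165 | 0.5576 |
| inhRD | NEG | Female | Female | 122 | 13.4262 | 13.0820 | 0.0907 | 0.1327 |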

Effect sizes were measured by Cohen's d, which is defined as the difference between both means divided by the SD of the paired differences. P values were obtained from one-sided Wilcoxon signed-rank test.

To evaluate the effect of rare damaging mutations in EGs on ASD-associated traits, we used the available quantitative phenotype data on social and cognitive impairments in ∼2,500 ASD families from Simons Simplex Collection (8, 26) (Dataset S2). As a measure of sociability, we used the total raw score from the Social Responsiveness Scale (SRS) (35), and as cognitive measures, we used three different intelligence quotient (IQ) scores (full-scale IQ, verbal IQ, and nonverbal IQ). As previously reported (36), SRS scores were unrelated to IQ, especially in subjects with IQ higher than 50 (Fig. S2). In male probands, we observed that the mutational burden in EGs was positively correlated with the SRS total raw score (P value = 1.08 × 10−6; Poisson regression) (Table 1). The effect was not significant in NEGs (P value = 0.21). In female probands, mutational burden in NEGs but not EGs was negatively correlated with SRS total raw score (P value = 0.085 for EG and P value = 6.06 × 10−6 for NEG). In addition, we found that mutational burden in both EGs and NEGs had a significant effect (P value < 2.2 × 10−16) on verbal and nonverbal IQ scores and that the effect sizes of mutational burden in EGs and NEGs were comparable (Table S4). These results suggest that, in ASD probands, deleterious variants in EGs contribute to decreased social skills in males, whereas deleterious variants in both EGs and NEGs lead to decreased IQ.
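The Poisson regression coefficients in Table 1 can be read with a short calculation. The sketch below makes two assumptions that are not stated in this section: that the model uses the usual log link, so that exp(estimate) is the multiplicative change in expected SRS total raw score per additional rare damaging mutation, and that the reported P values are two-sided Wald tests of estimate/SE, as in standard regression summaries. The dictionary keys and function name are illustrative.

```python
from math import erfc, exp, sqrt


def wald_two_sided_p(estimate, std_error):
    """Return (z, two-sided normal P value) for a regression coefficient."""
    z = estimate / std_error
    return z, erfc(abs(z) / sqrt(2))


# (estimate, standard error) pairs copied from Table 1.
table1 = {
    ("male", "EG"): (0.001860, 0.000381),
    ("male", "NEG"): (0.000407, 0.000324),
    ("female", "EG"): (-0.001511, 0.000877),
    ("female", "NEG"): (-0.003084, 0.000682),
}

results = {}
for key, (estimate, std_error) in table1.items():
    z, p = wald_two_sided_p(estimate, std_error)
    # exp(estimate): fold change in expected score per extra mutation,
    # valid only under the assumed log link.
    results[key] = {"z": z, "p": p, "rate_ratio": exp(estimate)}

for key, values in results.items():
    print(key, round(values["z"], 2), values["p"], round(values["rate_ratio"], 5))
```

Under these assumptions the recomputed P values agree closely with Table 1, and the EG coefficient in males corresponds to an increase of roughly 0.19% in expected SRS score per additional mutation.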

Fig. S2.

Correlation between SRS and IQ. For each of 2,368 ASD probands from Simons Simplex Collection, the Pearson correlation between SRS total raw scores and three IQ scores (full-scale IQ, verbal IQ, and nonverbal IQ) was plotted. The probands were divided by IQ scores: (A, C, and E) IQ < 50 and (B, D, and F) IQ ≥ 50.

Table 1.

Relationship between individual mutational burden and SRS in ASD probands

| Group | Gene set | Estimate | Standard error | P value |
|---|---|---|---|---|
| 2,031 male probands | EG (3,915 genes) | 0.001860 | 0.000381 | 1.08 × 10−6* |
| 2,031 male probands | NEG (4,919 genes) | 0.000407 | 0.000324 | 0.209 |
| 317 female probands | EG (3,915 genes) | −0.001511 | 0.000877 | 0.085 |
| 317 female probands | NEG (4,919 genes) | −0.003084 | 0.000682 | 6.04 × 10−6 |

Coefficients for Poisson regression are shown, which model the relationship between SRS total raw score and individual burden of all rare damaging mutations (including dnLoF, dnNSD, and inhRD mutations).

*The P value with statistical significance with positive estimated effects (P value < 0.05; estimate > 0).

Table S4.

Relationship between individual mutational burden and IQ in ASD probands

| Trait | Gene set | Estimate | SE | P value |
|---|---|---|---|---|
| Verbal IQ | EG (3,915 genes) | −0.007279 | 0.000400 | <2.2 × 10−16 |
| Verbal IQ | NEG (4,919 genes) | −0.005307 | 0.000383 | <2.2 × 10−16 |
| Nonverbal IQ | EG (3,915 genes) | −0.007172 | 0.000336 | <2.2 × 10−16 |
| Nonverbal IQ | NEG (4,919 genes) | −0.004906 | 0.000320 | <2.2 × 10−16 |

Coefficients for Poisson regression are shown, which model the relationship between verbal/nonverbal IQ and individual burden of all rare damaging mutations (including dnLoF, dnNSD, and inhRD mutations).

To initially explore the overlap between EGs and known ASD genes, we examined the essentiality status of ∼500 ASD candidate genes from the Simons Foundation Autism Research Initiative (SFARI) AutDB database (updated December of 2015) (37) (Fig. 2B). Compared with NEGs, EGs were enriched among ASD candidates categorized as “syndromic” (category S: odds ratio = 3.95, P value = 0.0003; two-sided Fisher’s exact test), candidates with “high confidence” (category 1: odds ratio = 15.12, P value = 0.0004), and candidates with “suggestive evidence” (category 3: odds ratio = 2.14, P value = 0.0006). Trends of enrichment of EGs were also observed for “strong candidates” (category 2: odds ratio = 1.62, P value = 0.21). We did not observe enrichment of EGs among candidate genes with less supportive evidence (categories 4–6).

To further address whether EGs contribute to ASD risk, we compared the strength of ASD association signals between EGs and NEGs in data from a recent comprehensive analysis of ASD genomic architecture (9), where the transmission and de novo association (TADA) test (38) was used to evaluate ASD association based on combined evidence from de novo single-nucleotide variants (SNVs), de novo small deletions, and rare inherited variants from Simons Simplex Collection cohorts as well as case–control data from Autism Sequencing Consortium (ASC) cohorts (39). There was a significant enrichment of EGs compared with NEGs in 65 high-confidence TADA ASD genes [TADA false discovery rate (FDR) q values < 0.1] identified by Sanders et al. (9) (36 EGs vs. 15 NEGs; odds ratio = 3.03, P value = 1.82 × 10−4; one-sided Fisher’s exact test). In a broader set of 441 “potential” TADA ASD genes (TADA FDR < 0.5), EGs were also enriched compared with NEGs (132 EGs vs. 117 NEGs; odds ratio = 1.43, P value = 0.00537). Furthermore, by comparing the observed TADA FDR with the expected TADA FDR, we detected a strong deviation from the null distribution in EGs, especially in 132 EGs with potential ASD association (TADA FDR < 0.5) (Fig. 2C). In contrast, NEGs were not enriched for association relative to the background expectation, suggesting that the association signals between EGs and ASD were stronger and less likely to be false positive compared with NEGs.

It is our hypothesis that a cumulative effect of deleterious variants in several EGs, within the same pathway or across pathways, may underlie impaired brain development and an individual’s ASD risk. To identify clusters of potentially interacting genes, we evaluated the spatiotemporal expression of EGs and NEGs using RNA sequencing (RNA-seq) data from BrainSpan (40). We identified 41 coexpression modules with distinct expression patterns across 16 brain regions and 31 pre- and postnatal time points (Fig. S3 and SI Materials and Methods). We observed that the majority of EG-enriched modules (11 of 14; FDR < 0.1; two-sided Fisher’s exact test) (Fig. 3A, Fig. S3, and Table S5) exhibited an “early-expression” pattern, where the expression levels were higher at early fetal stages (starting from 8 postconceptual weeks) and gradually declined before birth. In contrast, the majority of the NEG-enriched modules (15 of 18) exhibited a “later-expression” pattern, with expression levels that were lower at early fetal stages and gradually increased until birth.

Fig. S3.

Expression profiles of 41 coexpression modules in the brain. Expression profiles of genes from 41 coexpression modules based on the RNA-seq data from BrainSpan (25) are shown. The y axis represents the first principal component of the module-level expression profiles in each brain tissue type. The x axis represents developmental stages in chronological order (Fig. 3B shows the labels of the time points). The vertical dashed lines indicate the time of birth. The total number of protein-coding genes in each module (n) is indicated along with the module name.
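To make concrete what a first principal component of a module's expression profile is, the following minimal sketch reduces a genes × samples matrix to one score per sample. It is an illustration only: the preprocessing (centering each gene, no scaling), the power-iteration solver, the function name, and the toy expression values are all assumptions, and the procedure actually used is described in SI Materials and Methods rather than in this section.

```python
from math import sqrt


def module_profile(expr, n_iter=200):
    """First principal component scores, one per sample.

    expr: list of genes; each gene is a list of expression values
    measured in the same ordered samples (e.g., developmental stages).
    """
    n_samples = len(expr[0])
    # Center each gene across samples (an assumed preprocessing choice).
    centered = []
    for row in expr:
        gene_mean = sum(row) / n_samples
        centered.append([value - gene_mean for value in row])

    # Power iteration for the leading right singular vector v of the
    # centered matrix X, starting from an arbitrary non-constant vector.
    v = [1.0 / (j + 1) for j in range(n_samples)]
    for _ in range(n_iter):
        u = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        xtu = [sum(centered[i][j] * u[i] for i in range(len(centered)))
               for j in range(n_samples)]  # X^T u
        norm = sqrt(sum(x * x for x in xtu))
        if norm == 0.0:
            raise ValueError("expression matrix has no variance")
        v = [x / norm for x in xtu]

    # Sample scores equal sigma * v, where sigma = |X v|.
    u = [sum(x * w for x, w in zip(row, v)) for row in centered]
    sigma = sqrt(sum(x * x for x in u))
    return [sigma * x for x in v]


# Hypothetical module: three genes sharing one declining trajectory.
toy_module = [
    [4.0, 3.0, 2.0, 1.0, 0.0],
    [9.0, 7.0, 5.0, 3.0, 1.0],
    [2.0, 1.5, 1.0, 0.5, 0.0],
]
scores = module_profile(toy_module)
print([round(s, 3) for s in scores])
```

The sign of a principal component is arbitrary, so only the shape of the trajectory (here a steady change from the first to the last sample) is meaningful, not its direction.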

Fig. 3.

Coexpression analysis of EGs in developing human brain. (A) Coexpressed modules enriched in EGs and NEGs. The upper barplot displays the level of enrichment of EGs vs. NEGs for each of 41 coexpression modules based on BrainSpan RNA-seq data. The lower barplot displays the level of enrichment (green) of 441 potential ASD genes in EGs from 41 coexpression modules. The heights of the bars represent negative log10 (FDR q value). The upper and lower red dashed lines indicate FDR q value threshold of 0.1. (B) The brain expression trajectories of genes from three coexpression modules implicated in ASD. The expression trajectories in brain for 1,601 genes in M01 (orange), 1,150 genes in M02 (purple), and 347 genes in M16 (green) were fitted based on the first principal components of the module-level expression profiles (y axis). The x axis represents developmental stages in chronological order. The vertical dashed line indicates the time of birth. pcw, Postconceptual week. (C) Coexpression network of 973 EGs from M01 (orange), M02 (purple), and M16 (green). Edges indicate coexpression between gene pairs.

Table S5.

Coexpression modules in the developing brain

| Module | No. of genes | Expression pattern | Enrichment | No. of EGs | No. of NEGs | Odds ratio (EG/NEG) | FDR q value (EG/NEG) | No. of potential ASD genes | Odds ratio (ASD genes) | FDR q value (ASD genes) |
|---|---|---|---|---|---|---|---|---|---|---|
| M01 | 1,601 | Early expressed | EG enriched | 501 | 251 | 2.73 | 7.38 × 10−38* | 55 | 1.52 | 0.004* |
| M02 | 1,150 | Early expressed | EG enriched | 367 | 208 | 2.34 | 2.80 × 10−22* | 53 | 2.13 | 2.58 × 10−6* |
| M03 | 1,054 | Mixed | NEG enriched | 204 | 340 | 0.74 | 9.67 × 10−4* | 18 | 0.72 | 0.934 |
| M04 | 810 | Late expressed | NEG enriched | 122 | 326 | 0.45 | 3.19 × 10−14* | 19 | 1.00 | 0.529 |
| M05 | 781 | Late expressed | NEG enriched | 156 | 239 | 0.81 | 0.0491* | 24 | 1.32 | 0.122 |
| M06 | 702 | Late expressed | NEG enriched | 129 | 254 | 0.63 | 1.55 × 10−5* | 11 | 0.65 | 0.948 |
| M07 | 663 | Early expressed | EG enriched | 251 | 141 | 2.32 | 1.23 × 10−15* | 8 | 0.50 | 0.989 |
| M08 | 580 | Early expressed | EG enriched | 193 | 114 | 2.19 | 3.62 × 10−11* | 13 | 0.95 | 0.613 |
| M09 | 559 | Late expressed | NEG enriched | 104 | 206 | 0.62 | 9.26 × 10−5* | 16 | 1.23 | 0.246 |
| M10 | 503 | Early expressed | EG enriched | 126 | 114 | 1.40 | 0.0102* | 9 | 0.74 | 0.847 |
| M11 | 457 | Late expressed | NEG enriched | 79 | 178 | 0.55 | 7.33 × 10−6* | 9 | 0.83 | 0.753 |
| M12 | 420 | Late expressed | NEG enriched | 62 | 163 | 0.47 | 1.90 × 10−7* | 7 | 0.69 | 0.874 |
| M13 | 418 | Late expressed | NEG enriched | 97 | 193 | 0.62 | 1.46 × 10−4* | 7 | 0.69 | 0.877 |
| M14 | 370 | Late expressed | EG enriched | 81 | 58 | 1.77 | 0.00102* | 4 | 0.45 | 0.977 |
| M15 | 368 | Mixed | EG enriched | 104 | 95 | 1.39 | 0.0251* | 5 | 0.57 | 0.934 |
| M16 | 347 | Early expressed | EG enriched | 106 | 90 | 1.49 | 0.00570* | 20 | 2.57 | 2.80 × 10−4* |
| M17 | 339 | Early expressed | EG enriched | 102 | 59 | 2.20 | 1.20 × 10−6* | 16 | 2.05 | 0.008 |
| M18 | 306 | Late expressed |  | 66 | 61 | 1.37 | 0.0874 | 5 | 0.67 | 0.861 |
| M19 | 299 | Late expressed | NEG enriched | 31 | 118 | 0.32 | 1.81 × 10−9* | 2 | 0.28 | 0.994 |
| M20 | 296 | Late expressed | NEG enriched | 51 | 91 | 0.70 | 0.0498* | 5 | 0.72 | 0.823 |
| M21 | 291 | Early expressed |  | 54 | 73 | 0.93 | 0.719 | 5 | 0.72 | 0.818 |
| M22 | 278 | Early expressed | EG enriched | 83 | 25 | 4.24 | 6.17 × 10−12* | 2 | 0.29 | 0.991 |
| M23 | 272 | Late expressed | NEG enriched | 41 | 84 | 0.61 | 0.0108* | 2 | 0.31 | 0.988 |
| M24 | 258 | Early expressed |  | 51 | 49 | 1.31 | 0.189 | 11 | 1.84 | 0.047 |
| M25 | 244 | Early expressed | EG enriched | 86 | 49 | 2.23 | 6.66 × 10−6* | 11 | 1.98 | 0.031 |
| M26 | 239 | Early expressed | EG enriched | 79 | 18 | 5.61 | 8.28 × 10−14* | 4 | 0.70 | 0.821 |
| M27 | 213 | Late expressed | NEG enriched | 45 | 85 | 0.66 | 0.0261* | 6 | 1.19 | 0.399 |
| M28 | 197 | Late expressed |  | 32 | 41 | 0.98 | 1 | 1 | 0.21 | 0.991 |
| M29 | 193 | Late expressed | NEG enriched | 33 | 69 | 0.60 | 0.0158* | 2 | 0.43 | 0.943 |
| M30 | 188 | Late expressed | NEG enriched | 11 | 43 | 0.32 | 2.92 × 10−4* | 3 | 0.69 | 0.808 |
| M31 | 187 | Late expressed |  | 41 | 64 | 0.80 | 0.323 | 6 | 1.38 | 0.279 |
| M32 | 172 | Late expressed | NEG enriched | 24 | 60 | 0.50 | 0.00388* | 3 | 0.75 | 0.766 |
| M33 | 170 | Late expressed |  | 41 | 40 | 1.29 | 0.263 | 4 | 1.00 | 0.568 |
| M34 | 163 | Mixed | EG enriched | 48 | 22 | 2.76 | 5.06 × 10−5* | 2 | 0.51 | 0.904 |
| M35 | 151 | Mixed | NEG enriched | 21 | 48 | 0.55 | 0.0207* | 6 | 1.73 | 0.147 |
| M36 | 151 | Late expressed |  | 22 | 44 | 0.63 | 0.0815 | 3 | 0.82 | 0.707 |
| M37 | 146 | Early expressed | EG enriched | 38 | 9 | 5.35 | 3.81 × 10−7* | 2 | 0.57 | 0.862 |
| M38 | 128 | Late expressed | NEG enriched | 17 | 63 | 0.34 | 2.11 × 10−5* | 4 | 1.36 | 0.347 |
| M39 | 115 | Early expressed |  | 29 | 42 | 0.87 | 0.632 | 4 | 1.47 | 0.298 |
| M40 | 99 | Unknown |  | 4 | 13 | 0.39 | 0.0926 | 1 | 0.45 | 0.890 |
| M41 | 74 | Unknown | NEG enriched | 4 | 16 | 0.31 | 0.0400* | 1 | 0.59 | 0.816 |

*P values with statistical significance.

We found that EGs in three EG-enriched modules (M01, M02, and M16) were significantly enriched (FDR < 0.1; one-sided Fisher’s exact test) for 441 potential TADA ASD genes (Fig. 3A). Notably, all of the three modules were also EG-enriched and early-expressed across fetal brain regions (Fig. 3 A and B). From the pathway enrichment analysis of these EG-enriched modules in the Reactome database (41, 42), we found that the top pathways enriched included “transcription” (M01), “chromatin modifying enzymes and chromatin organization” (M02), and “axon guidance” (M16) (Table S6), in agreement with the insights from recent large-scale autism studies showing that genes for synaptic formation, transcriptional regulation, and chromatin remodeling are disrupted in autism (7–9). This combined analysis identified 974 EGs from three modules that are coexpressed with known ASD candidate genes at distinct stages of brain development.

Table S6.

Reactome pathways enriched in three EG-enriched modules implicated in ASD

Module and term Overlap P value Adjusted P value Genes
M01
 Transcription 25/202 2.40 × 10−6 6.79 × 10−4* GTF3C3; HDAC2; CCNT2; GTF3C4; RRN3; CSTF3; GTF2E1; CLP1; PCF11; POLR2B; SNAPC3; CSTF1; RNGTT; TBP; NCBP1; NCBP2; GTF2H3; NFIA; POLR3B; NFIB; POLR3C; POLR1B; POLR1E; TFAM; TAF5
 Processing of capped intron-containing pre-mRNA 22/144 4.34 × 10−7 3.67 × 10−4* NCBP1; NUP133; DHX9; NCBP2; CSTF3; CDC5L; HNRNPU; PLRG1; YBX1; NUP160; EFTUD2; PRPF4; CLP1; HNRNPH1; PCF11; POLR2B; NUP50; CSTF1; NUPL1; RAE1; SF3B1; CTNNBL1
 Folding of actin by CCT/TriC 7/9 1.11 × 10−6 4.70 × 10−4* CCT3; CCT6A; CCT2; TCP1; CCT7; CCT5; CCT4
 mRNA splicing 17/113 1.11 × 10−5 0.00188* NCBP1; DHX9; NCBP2; CSTF3; CDC5L; HNRNPU; PLRG1; YBX1; EFTUD2; PRPF4; CLP1; HNRNPH1; PCF11; POLR2B; CSTF1; SF3B1; CTNNBL1
 HIV infection 23/218 6.34 × 10−5 0.00589* CCNT2; PSMD11; RNGTT; TBP; TSG101; NCBP1; NUP133; NCBP2; XRCC5; HMGA1; NEDD4L; GTF2H3; GTF2E1; NUP160; AP1G1; POLR2B; NUP50; PSMD2; TAF5; NUPL1; PAK2; RAE1; KPNB1
 HIV lifecycle 18/137 3.19 × 10−5 0.00451* CCNT2; RNGTT; TBP; TSG101; NCBP1; NUP133; NCBP2; XRCC5; HMGA1; NEDD4L; GTF2H3; GTF2E1; NUP160; POLR2B; NUP50; TAF5; NUPL1; RAE1
 snRNP assembly 10/49 8.30 × 10−5 0.00589* NCBP1; NUP133; NCBP2; NUP50; TGS1; DDX20; NUPL1; RAE1; NUP160; WDR77
 Formation of tubulin-folding intermediates by TriC/CCT 7/20 5.94 × 10−5 0.00589* CCT3; CCT6A; CCT2; TCP1; CCT7; CCT5; CCT4
 Association of TriC/CCT with target proteins during biosynthesis 8/29 7.19 × 10−5 0.00589* CCT3; CCT6A; CCT2; TCP1; XRN2; CCT7; CCT5; CCT4
 Regulation of cholesterol biosynthesis by SREBP 10/53 1.47 × 10−4 0.00890* SQLE; SEC24B; GGPS1; NFYA; TGS1; CYP51A1; HMGCR; SEC24D; KPNB1; FDFT1
M02
 Chromatin organization 35/208 3.76 × 10−15 1.41 × 10−12* PHF2; KDM5C; SMARCB1; TRRAP; EHMT2; EHMT1; CHD4; ACTB; PHF21A; NSD1; SAP130; EP400; WDR5; EP300; BRD8; WHSC1; MTA2; KDM6B; BRD1; CREBBP; KDM4B; SMARCC2; KDM2B; SETDB1; SETD1B; USP22; DNMT3A; ARID1A; GATAD2A; HCFC1; SMARCA4; NCOR1; KAT6B; KAT6A; RCOR1
 Processing of capped intron-containing pre-mRNA 19/144 3.55 × 10−7 8.87 × 10−5* NUP214; SF3A1; SF3B2; SF3B3; NUP155; FUS; DDX23; SMC1A; PRPF8; SRRM1; NUP93; PRPF6; U2AF2; NUP62; POLR2D; TPR; DHX38; NUP98; SNRNP200
 Transcription 18/202 1.02 × 10−4 0.00660* GTF3C1; NFIX; POU2F1; EHMT2; CHD4; SSRP1; GATAD2A; SRRM1; POLR3A; POLR1A; U2AF2; POLR2D; TCEB3; UBTF; DHX38; MTA2; TAF4; TAF1
 PKMTs methylate histone lysines 7/29 8.03 × 10−5 0.00660* SETDB1; EHMT2; NSD1; SETD1B; WDR5; EHMT1; WHSC1
 Transport of mature mRNA derived from an intron-containing transcript 9/50 5.80 × 10−5 0.00660* NUP214; NUP93; NUP155; U2AF2; NUP62; TPR; DHX38; NUP98; SRRM1
 HATs acetylate histones 13/105 4.91 × 10−5 0.00660* BRD1; CREBBP; TRRAP; USP22; ACTB; HCFC1; KAT6B; KAT6A; SAP130; EP400; WDR5; EP300; BRD8
 Transport of mature transcript to cytoplasm 9/54 9.84 × 10−5 0.00660* NUP214; NUP93; NUP155; U2AF2; NUP62; TPR; DHX38; NUP98; SRRM1
 mRNA splicing 13/113 9.74 × 10−5 0.00660* SF3A1; SF3B2; SF3B3; FUS; DDX23; SMC1A; PRPF8; SRRM1; PRPF6; U2AF2; POLR2D; DHX38; SNRNP200
 Regulation of lipid metabolism by peroxisome proliferator-activated receptor alpha 13/114 1.06 × 10−4 0.00660* ABCA1; MED1; CREBBP; NCOA6; NRF1; MED26; SREBF2; MED12; MED14; MED24; NCOR1; SIN3A; EP300
 Transcriptional regulation of white adipocyte differentiation 11/78 6.57 × 10−5 0.00660* MED12; MED1; CREBBP; MED14; MED24; NCOR1; NCOA6; EP300; LPL; MED26; SREBF2
M16
 Axon guidance 11/327 2.24 × 10−4 0.0740 GSK3B; ARHGEF12; ROCK2; RASA1; KCNQ3; ANK2; ANK3; ARHGEF7; GRIN2B; MYH10; ITGA9
 Synthesis of PIPs at the early endosome membrane 3/13 3.84 × 10−4 0.0740 INPP4A; PIKFYVE; PIK3C3
 CREB phosphorylation through the activation of Ras 3/27 2.54 × 10−3 0.122 PDPK1; BRAF; GRIN2B
 Insulin receptor signaling cascade 5/92 0.00191 0.122 PDPK1; GRB10; PIK3C3; TSC1; MTOR
 Eph-ephrin signaling 5/94 0.00209 0.122 ROCK2; RASA1; ARHGEF7; GRIN2B; MYH10
 Sema4D-induced cell migration and growth cone collapse 3/24 0.00187 0.122 ARHGEF12; ROCK2; MYH10
 Interaction between L1 and ankyrins 3/29 0.00306 0.131 KCNQ3; ANK2; ANK3
 Post NMDA receptor activation events 3/35 0.00501 0.143 PDPK1; BRAF; GRIN2B
 Signaling by insulin receptor 5/116 0.00497 0.143 PDPK1; GRB10; PIK3C3; TSC1; MTOR
 PI3K cascade 4/68 0.00423 0.143 PDPK1; PIK3C3; TSC1; MTOR

CREB, cAMP response element binding protein; HATs, histone acetyltransferases; NMDA, N-methyl-d-aspartate; PKMTs, protein lysine methyltransferases; PIPs, phosphatidylinositol phosphates; PI3K, phosphoinositide 3-kinase; snRNP, small nuclear ribonucleoproteins; SREBP, sterol regulatory element-binding proteins; TriC/CCT, TCP1-ring complex or chaperonin containing TCP1.

*Adjusted P values with statistical significance.

To further prioritize known EGs as candidates for ASD, we constructed a coexpression network for 974 EGs from three modules enriched for potential ASD genes (Fig. 3C and SI Materials and Methods); 844 genes among 974 have a close interaction with high-confidence ASD genes (connected to at least two genes with TADA FDR < 0.1), and 370 genes harbor de novo or inherited loss-of-function mutations in ASD individuals from Simons Simplex Collection or ASC cohorts. Of these, 52 have a TADA FDR less than 0.5. Among 52 genes, 23 have been previously shown to contribute to ASD risk [categories syndromic (S), 1, 2, 3, and 4 in SFARI]. For the remaining 29 EGs that have not yet been linked to ASD risk, we argue that, based on (i) the importance of EGs in ASD etiology as shown by their role in critical developmental stages and the increased burden of rare, damaging mutations in ASD individuals; (ii) their coexpression with high-confidence ASD genes in brain; and (iii) the suggestive genetic evidence from the TADA analysis, these 29 EGs represent the strongest candidates for additional investigation in their role in ASD (Fig. S4 and Table S7). According to available mouse phenotypes from the MGI (23) and the IMPC (24), 11 of these 29 EGs have reported heterozygous phenotypes in mice (Table S7). Among them, four EGs (CHD1, FBXO11, KDM4B, and VCP) have been associated with abnormal neural development and/or behavioral phenotypes in heterozygotes.
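
The prioritization steps in this paragraph amount to a chain of filters, which the following sketch makes explicit. It assumes the filters are applied successively in the order given; the record fields and the placeholder gene names are hypothetical and carry no data from the study.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    n_coexpressed_asd_genes: int  # linked high-confidence ASD genes (TADA FDR < 0.1)
    has_lof_in_asd: bool          # de novo or inherited LoF seen in an ASD individual
    tada_fdr: float
    in_sfari: bool                # already in SFARI categories S, 1, 2, 3, or 4

def prioritize(candidates):
    """Apply the filters one after another, as assumed from the text."""
    kept = [c for c in candidates if c.n_coexpressed_asd_genes >= 2]
    kept = [c for c in kept if c.has_lof_in_asd]
    kept = [c for c in kept if c.tada_fdr < 0.5]
    # Genes not yet linked to ASD risk form the final priority list.
    return [c.gene for c in kept if not c.in_sfari]

# Placeholder records, invented for illustration only.
toy = [
    Candidate("GENE_A", 12, True, 0.30, False),
    Candidate("GENE_B", 1, True, 0.20, False),   # too few coexpressed ASD genes
    Candidate("GENE_C", 9, False, 0.10, False),  # no LoF mutation observed
    Candidate("GENE_D", 15, True, 0.70, False),  # TADA FDR too high
    Candidate("GENE_E", 14, True, 0.05, True),   # already a known SFARI gene
]
priority_list = prioritize(toy)
```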

Fig. S4.

Chromosomal distribution of 29 EGs suggested as strong ASD candidate genes. The locations of each gene along the chromosomes are shown in red.

Table S7.

Priority list of 29 EGs as strong ASD candidates

Gene Chromosome Start End Module TADA FDR q value No. of high-confidence ASD genes that are coexpressed Disease associations
BIRC6 2 32357028 32618899 M02 0.47 15
CHD1 5 98853985 98928957 M01 0.17 15 CHD8 has been previously associated with autism
CUL1 7 148697914 148801036 M01 0.49 12
DHX29 5 55256245 55307722 M01 0.40 17
DVL3 3 184155388 184173610 M02 0.33 10 Robinow syndrome-3 characterized by skeletal abnormalities
EP300 22 41091786 41180077 M02 0.45 13 Rubinstein–Taybi syndrome characterized by short stature, moderate to severe learning difficulties, distinctive facial features, and broad thumbs and first toes
EP400 12 131949920 132081102 M02 0.43 9
FBXO11 2 47789316 47905793 M01 0.15 17 Associated with chronic otitis media with effusion and recurrent otitis media, a hearing loss disorder, and the N-ethyl-N-nitrosourea (ENU) knockout of the homologous mouse gene results in the deaf mouse mutant Jeff
KDM4B 19 4969113 5153595 M02 0.30 14
LDB1 10 102107560 102120453 M02 0.42 14
LTN1 21 28928144 28992956 M16 0.37 3
MORC3 21 36320189 36386148 M01 0.50 10
MYH10 17 8474205 8630761 M16 0.13 3 Essential for normal spine morphology and dynamics; pharmacologic or genetic inhibition of Myh10 altered protrusive motility of spines, destabilized their mushroom head morphology, and impaired excitatory synaptic transmission
NFIB 9 14081843 14398983 M01 0.45 15
PBX1 1 164555584 164899296 M01 0.46 16
PHF21A 11 45929323 46121178 M02 0.48 11
RFX7 15 56087280 56243266 M01 0.25 17
RNF38 9 36336396 36487548 M01 0.41 18
SMARCE1 17 40624962 40648508 M01 0.41 12 Meningiomas (brain and spinal cord tumors)
SNW1 14 77717599 77761207 M01 0.44 12
STXBP5 6 147204425 147390476 M16 0.37 2
SUFU 10 102503987 102633535 M02 0.47 14 Familial meningioma, medulloblastoma
TAF4 20 61953469 62065810 M02 0.30 10 Interference of transcription by the binding of TAF4 with expanded polyglutamine stretches is involved in the pathogenetic mechanisms underlying neurodegeneration
TANC2 17 63009556 63427699 M02 0.32 14
TNPO3 7 128954180 129055173 M01 0.19 17 Mutations found in patients with muscular dystrophy
UTP6 17 31860899 31901765 M01 0.19 12
VCP 9 35056064 35073249 M02 0.49 9 Inclusion body myopathy with Paget disease of bone and frontotemporal dementia, amyotrophic lateral sclerosis, Charcot–Marie–Tooth disease type 2Y
WHSC1 4 1871424 1982207 M02 0.27 13 Located in the Wolf–Hirschhorn syndrome critical region
YTHDC1 4 68310387 68350089 M01 0.48 11

SI Materials and Methods

Identification of EGs.

We identified 3,023 protein-coding EGs annotated with 50 Mouse Phenotype (MP) terms, including prenatal, perinatal, and postnatal lethal phenotypes from the MGI (23) (Table S8). The MGI database was also used to extract 4,995 protein-coding NEGs with nonlethal phenotypes in the mouse. Phenotype data from the IMPC database portal (24) expanded the lethal gene list with the addition of 252 lethal genes and 101 genes with subviable phenotypes. We further supplemented the nonlethal gene list with 701 genes with viable phenotypes from the IMPC. In the case of discrepancy in the reported lethality status between the MGI and the IMPC, we deferred to the phenotypes reported by the IMPC, because these mouse lines were generated on a defined C57BL/6N background and phenotypically characterized using a standardized pipeline. One to one mouse–human orthology of lethal and nonlethal genes was established based on MGI annotation and manual curation, resulting in 3,326 essential and 4,919 nonessential human orthologs.
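
The rule for reconciling the two databases can be sketched as a dictionary merge in which IMPC calls take precedence. The status labels, gene identifiers, and function names below are assumptions made for illustration; the sketch does not reproduce the orthology mapping or the manual curation.

```python
def merge_lethality(mgi, impc):
    """Combine per-gene calls; the IMPC call wins when the sources disagree."""
    merged = dict(mgi)
    merged.update(impc)  # IMPC entries override MGI entries for shared genes
    return merged

def split_gene_sets(status):
    """Lethal and subviable genes count as essential; the rest as nonessential."""
    essential = {g for g, s in status.items() if s in ("lethal", "subviable")}
    nonessential = {g for g, s in status.items() if s in ("nonlethal", "viable")}
    return essential, nonessential

# Hypothetical gene identifiers and calls.
mgi_calls = {"g1": "lethal", "g2": "nonlethal", "g3": "nonlethal"}
impc_calls = {"g3": "subviable", "g4": "viable", "g1": "viable"}
eg_set, neg_set = split_gene_sets(merge_lethality(mgi_calls, impc_calls))
```

In this toy case, g1 and g3 are reported differently by the two sources, and both end up classified according to the IMPC call.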

Table S8.

MP terms for lethal phenotypes

MP identification Lethality type Lethality description
MP:0002058 Neonatal lethality Death within the neonatal period after birth (Mus: P0)
MP:0002080 Prenatal lethality Death anytime between fertilization and birth (Mus: approximately E18.5)
MP:0002081 Perinatal lethality Death anytime within the perinatal period (Mus: E18.5 through postnatal day 1)
MP:0002082 Postnatal lethality Premature death anytime between the neonatal period and weaning age (Mus: P1 to ∼3 wk of age)
MP:0006204 Embryonic lethality before implantation Death anytime between fertilization and implantation (Mus: E0 to less than E4.5)
MP:0006205 Embryonic lethality between implantation and somite formation Death anytime between the point of implantation and somite formation (Mus: E4.5 to less than E8)
MP:0006206 Embryonic lethality between somite formation and embryo turning Death anytime between somite formation and the initiation of embryo turning (Mus: E8 to less than E9)
MP:0006207 Embryonic lethality during organogenesis Death anytime between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14)
MP:0006208 Lethality throughout fetal growth and development Death anytime between the completion of organogenesis and birth (Mus: E14 to approximately E18.5)
MP:0008527 Embryonic lethality at implantation Death because of failure of implantation (Mus: E4.5)
MP:0008569 Lethality at weaning Premature death at weaning age, often caused by the inability to make the transition to solid food
MP:0008762 Embryonic lethality Death of an animal within the embryonic period before organogenesis (Mus: before E14)
MP:0009850 Embryonic lethality between implantation and placentation Death anytime between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9)
MP:0010770 Preweaning lethality Death anytime between fertilization and weaning age (Mus: ∼3–4 wk of age)
MP:0010831 Partial lethality The appearance of lower than Mendelian ratios of offspring of a given genotype because of death of some but not all of the organisms
MP:0010832 Lethality during fetal growth through weaning Death anytime between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age)
MP:0011083 Complete lethality at weaning Premature death at weaning age of all organisms of a given genotype in a population, often because of the inability to make the transition to solid food
MP:0011084 Partial lethality at weaning The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms at weaning age
MP:0011085 Complete postnatal lethality Premature death anytime between the neonatal period and weaning age of all organisms of a given genotype in a population (Mus: P1 to ∼3 wk of age)
MP:0011086 Partial postnatal lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms anytime between the neonatal period and weaning age (Mus: P1 to ∼3 wk of age)
MP:0011087 Complete neonatal lethality Death of all organisms of a given genotype in a population within the neonatal period after birth (Mus: P0)
MP:0011088 Partial neonatal lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the neonatal period after birth (Mus: P0)
MP:0011089 Complete perinatal lethality Death of all organisms of a given genotype in a population within the perinatal period (Mus: E18.5 through postnatal day 1)
MP:0011090 Partial perinatal lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the perinatal period (Mus: E18.5 through postnatal day 1)
MP:0011091 Complete prenatal lethality Death of all organisms of a given genotype in a population between fertilization and birth (Mus: approximately E18.5)
MP:0011092 Complete embryonic lethality Death of all organisms of a given genotype in a population within the embryonic period before organogenesis (Mus: before E14)
MP:0011093 Complete embryonic lethality at implantation Death of all organisms of a given genotype in a population at the point of implantation (Mus: E4.5)
MP:0011094 Complete embryonic lethality before implantation Death of all organisms of a given genotype in a population between fertilization and implantation (Mus: E0 to less than E4.5)
MP:0011095 Complete embryonic lethality between implantation and placentation Death of all organisms of a given genotype in a population between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9)
MP:0011096 Complete embryonic lethality between implantation and somite formation Death of all organisms of a given genotype in a population between the point of implantation and somite formation (Mus: E4.5 to less than E8)
MP:0011097 Complete embryonic lethality between somite formation and embryo turning Death of all organisms of a given genotype in a population between somite formation and the initiation of embryo turning (Mus: E8 to less than E9)
MP:0011098 Complete embryonic lethality during organogenesis Death of all organisms of a given genotype in a population between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14)
MP:0011099 Complete lethality throughout fetal growth and development Death of all organisms of a given genotype in a population between the completion of organogenesis and birth (Mus: E14 to approximately E18.5)
MP:0011100 Complete preweaning lethality Death of all organisms of a given genotype in a population between fertilization and weaning age (Mus: ∼3–4 wk of age)
MP:0011101 Partial prenatal lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and birth (Mus: approximately E18.5)
MP:0011102 Partial embryonic lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms within the embryonic period before organogenesis (Mus: before E14)
MP:0011103 Partial embryonic lethality at implantation The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms at the point of implantation (Mus: E4.5)
MP:0011104 Partial embryonic lethality before implantation The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and implantation (Mus: E0 to less than E4.5)
MP:0011105 Partial embryonic lethality between implantation and placentation The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the point of implantation and the initiation of placentation (Mus: E4.5 to less than E9)
MP:0011106 Partial embryonic lethality between implantation and somite formation The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the point of implantation and somite formation (Mus: E4.5 to less than E8)
MP:0011107 Partial embryonic lethality between somite formation and embryo turning The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between somite formation and the initiation of embryo turning (Mus: E8 to less than E9)
MP:0011108 Partial embryonic lethality during organogenesis The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between embryo turning and the completion of organogenesis (Mus: E9–E9.5 to less than E14)
MP:0011109 Partial lethality throughout fetal growth and development The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the completion of organogenesis and birth (Mus: E14 to approximately E18.5)
MP:0011110 Partial preweaning lethality The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between fertilization and weaning age (Mus: ∼3–4 wk of age)
MP:0011111 Complete lethality during fetal growth through weaning Death of all organisms of a given genotype in a population between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age)
MP:0011112 Partial lethality during fetal growth through weaning The appearance of lower than Mendelian ratios of organisms of a given genotype because of death of some but not all of the organisms between the completion of organogenesis and weaning age (Mus: E14 to ∼3 wk of age)
MP:0011400 Complete lethality All individuals of a given genotype in a population die before the end of the normal lifespan, but time(s) of death are unspecified
MP:0013292 Embryonic lethality before organogenesis Death before the completion of embryo turning (Mus: E9–E9.5)
MP:0013293 Embryonic lethality before tooth bud stage Death before the appearance of tooth buds (Mus: E12–E12.5)
MP:0013294 Prenatal lethality before heart atrial septation Death before the completion of heart atrial septation (Mus: E14.5–E15.5)

E, embryonic day; Mus, Mus musculus.

The catalog of EGs was further augmented with the addition of cell EGs from three recent studies (20–22) aimed at the characterization of EGs in human cell lines. We obtained 1,580 core EGs (genes above essentiality threshold in at least three of five cell lines in the study) from the work by Hart et al. (22), 1,739 core EGs (genes above essentiality threshold in at least two of four cell lines in the study) from the work by Wang et al. (21), and 1,734 core EGs (genes above essentiality threshold in at least one of two cell lines in the study) from the work by Blomen et al. (20). By taking the overlap of three sets of core EGs, we obtained 956 high-confidence human EGs. Among 956 EGs in human cell lines, 348 genes (36.4%) are also human orthologs of EGs in the mouse, 19 genes (2.0%) are human orthologs of NEGs in the mouse, and 589 genes (61.6%) have not been tested in the mouse (14).
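
As a rough illustration of the "core EG" definition and the three-way overlap, the sketch below counts the number of cell lines in which each gene passes the essentiality threshold and then intersects the per-study core sets. The gene names and per-line hit lists are invented; only the thresholds (three of five, two of four, one of two) and the counts in the final check come from the text.

```python
def core_genes(hits_by_cell_line, min_lines):
    """Genes scored essential in at least `min_lines` of a study's cell lines."""
    counts = {}
    for hits in hits_by_cell_line:
        for gene in hits:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, n in counts.items() if n >= min_lines}

# Hypothetical per-cell-line hit sets for the three studies.
study_five_lines = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}, {"D"}]
study_four_lines = [{"A", "B"}, {"A"}, {"B", "C"}, {"C"}]
study_two_lines = [{"A", "C"}, {"D"}]

high_confidence = (core_genes(study_five_lines, 3)
                   & core_genes(study_four_lines, 2)
                   & core_genes(study_two_lines, 1))

# Consistency check on the breakdown of the 956 genes reported in the text.
shares = [round(100 * n / 956, 1) for n in (348, 19, 589)]
```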

Analysis of Haploinsufficiency of EGs.

We collected gene sets from multiple studies and resources for the analysis of patterns of inheritance and haploinsufficiency of EGs. First, a catalog of human disease genes was obtained from Online Mendelian Inheritance in Man (OMIM; downloaded on July 12, 2016) (29). From the OMIM catalog, we identified 1,411 genes annotated with genetic disorders that are “autosomal dominant” or “X-linked dominant” and 2,056 genes annotated with genetic disorders that are “autosomal recessive” or “X-linked recessive.” By dissecting the above two gene lists, we obtained 1,000 genes underlying only dominant diseases, 1,645 genes underlying only recessive diseases, and 411 genes that were linked to both dominant and recessive disorders. Second, a list of 616 protein-coding genes that were systematically assessed for evidence for dosage sensitivity was obtained from ClinGen Dosage Sensitivity Map (30). Among 616 genes, 239 genes were dosage-sensitive with sufficient evidence, 41 genes were dosage-sensitive with some evidence, 47 genes were dosage-sensitive with little evidence, 200 genes had no evidence for dosage pathogenicity so far, and 89 genes were not dosage-sensitive or with autosomal recessive phenotype. Third, a list of 262 haploinsufficient genes based on text-mining from PubMed and OMIM was obtained from the work by Dang et al. (31). Fourth, from the MGI, we identified 313 human orthologs of mouse genes associated with heterozygous phenotypes. For each of the gene sets, we evaluated the enrichment of EGs compared with NEGs using Fisher’s exact test.

We acquired the Haploinsufficiency Scores (32) and the Genome-Wide Haploinsufficiency Scores (33) for genome-wide prediction of the probability of haploinsufficiency. For each prediction model, the raw scores were ranked and converted to percentiles. The histograms and estimated density curves were plotted using ggplot2 (geom_histogram and geom_line) in R.
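
The conversion of raw scores to percentiles could look like the sketch below. The text does not specify how ties were handled or how the percentile was scaled, so the averaging of tied ranks and the 0 to 100 scale are assumptions, as are the gene labels and scores.

```python
def to_percentiles(scores):
    """Map each gene's raw score to a percentile rank, averaging tied ranks."""
    ordered = sorted(scores.values())
    n = len(ordered)
    first, last = {}, {}
    for pos, value in enumerate(ordered, start=1):
        first.setdefault(value, pos)  # lowest rank held by this value
        last[value] = pos             # highest rank held by this value
    return {g: 100.0 * (first[s] + last[s]) / 2 / n for g, s in scores.items()}

# Hypothetical raw haploinsufficiency scores.
toy_scores = {"G1": 0.1, "G2": 0.4, "G3": 0.4, "G4": 0.9}
toy_percentiles = to_percentiles(toy_scores)
```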

Burden Analysis of Mutations in EGs in ASD Families.

The Simons Simplex Collection contains genetic and phenotypic information from 2,600 ASD families, each of which has one child affected with ASD and unaffected parents and siblings (34). ASD probands were defined by clinical consensus from the Autism Diagnostic Interview–Revised (57) and the Autism Diagnostic Observation Schedule (58). Multiple individual phenotypic measures, including the SRS (35) and IQ, were available (8, 26).

We aimed to investigate the impact of both de novo and rare inherited variants in EGs on ASD risk. We acquired a list of 5,648 de novo variants from an exome sequencing study on 2,517 ASD families from the Simons Simplex Collection (8) and an additional list of 1,544 de novo variants from a reanalysis of the same cohort (2,377 ASD families) with a different pipeline (26). Among 7,192 de novo variants, 674 were loss-of-function mutations (i.e., SNVs that are frameshift, stop-loss, stop-gain, start-loss, splicing donor or acceptor, and frameshift indels), and 3,462 were nonsynonymous mutations (i.e., missense SNVs and nonframeshift indels). The deleterious de novo nonsynonymous mutations were selected using a threshold of the Combined Annotation-Dependent Depletion (CADD) (59) phred-scale score above 10. In addition, we obtained 249,729 rare inherited mutations from 2,377 ASD families (26). From the variants successfully called by both GATK (60) and FreeBayes (61), we extracted loss-of-function mutations and nonsynonymous mutations with minor allele frequency in Exome Variant Server (European ancestry) (62) less than 0.01 and CADD phred-scale score above 10. At the end of the variant filtering steps, we obtained 372 dnLoF variants, 1,497 dnNSD variants, and 77,891 inhRD variants in EGs or NEGs for mutational burden analysis (Fig. S1 and Datasets S3 and S4).
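
The filters applied to inherited variants can be summarized in a short predicate. This is a sketch under stated assumptions: the dictionary keys, the effect labels, and the toy variants are hypothetical, and the CADD cutoff is applied here to both loss-of-function and nonsynonymous variants, which is one reading of the sentence above.

```python
# Hypothetical effect labels standing in for the annotation classes in the text.
LOF = {"frameshift", "stop_gain", "stop_loss", "start_loss",
       "splice_donor", "splice_acceptor"}
NONSYN = {"missense", "inframe_indel"}

def keep_inherited(variant, max_maf=0.01, min_cadd=10.0):
    """True if an inherited variant passes the filters described above."""
    called_by_both = {"GATK", "FreeBayes"} <= set(variant["callers"])
    damaging_class = variant["effect"] in (LOF | NONSYN)
    rare = variant["evs_maf"] < max_maf
    return (called_by_both and damaging_class and rare
            and variant["cadd_phred"] > min_cadd)

# Invented variants for illustration.
toy_variants = [
    {"id": "v1", "callers": ["GATK", "FreeBayes"], "effect": "missense",
     "evs_maf": 0.002, "cadd_phred": 23.1},
    {"id": "v2", "callers": ["GATK"], "effect": "stop_gain",
     "evs_maf": 0.001, "cadd_phred": 35.0},   # called by only one tool
    {"id": "v3", "callers": ["GATK", "FreeBayes"], "effect": "missense",
     "evs_maf": 0.05, "cadd_phred": 25.0},    # too common
    {"id": "v4", "callers": ["GATK", "FreeBayes"], "effect": "synonymous",
     "evs_maf": 0.001, "cadd_phred": 12.0},   # not a damaging class
]
kept = [v["id"] for v in toy_variants if keep_inherited(v)]
```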

The individual mutational burden was defined as the number of mutations carried by each subject in the gene sets of interest (i.e., 3,915 EGs and 4,919 NEGs) for each class of variants (dnLoF, dnNSD, and inhRD). Among all Simons Simplex Collection ASD families, there were 1,781 ASD quartets where exome sequence data from an affected proband and an unaffected sibling were available. The individual mutational burden in 1,781 ASD probands was compared with the burden in their unaffected siblings using a one-sided Wilcoxon signed-rank test.
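
To make the burden comparison concrete, the sketch below counts per-subject mutations in a gene set and runs a paired one-sided Wilcoxon signed-rank test. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption: the text does not say which variant of the test or which software was used. All inputs are toy values.

```python
from math import erfc, sqrt

def burden(variant_genes, gene_set):
    """Number of a subject's qualifying variants that fall in the gene set."""
    return sum(1 for g in variant_genes if g in gene_set)

def signed_rank_greater(probands, siblings):
    """One-sided Wilcoxon signed-rank test that proband burdens exceed the
    paired sibling burdens. Returns (W+, approximate P value).
    Assumes at least one nonzero paired difference."""
    diffs = [p - s for p, s in zip(probands, siblings) if p != s]  # drop zeros
    n = len(diffs)
    ordered = sorted(diffs, key=abs)
    w_plus, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for this tie group
        w_plus += avg_rank * sum(1 for d in ordered[i:j] if d > 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, 0.5 * erfc(z / sqrt(2))

# Toy paired burdens (proband, sibling) for five hypothetical families.
w_toy, p_toy_burden = signed_rank_greater([2, 1, 3, 0, 4], [1, 1, 1, 1, 1])
```

With 1,781 pairs, the normal approximation should be adequate; an exact test would matter mainly for very small samples such as the toy example.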

We acquired SRS total raw scores for 2,348 probands and 1,678 siblings as well as verbal/nonverbal IQs for 2,359 probands from 1,781 ASD quartets and 587 ASD trios in Simons Simplex Collection families (Dataset S2). Poisson regression analysis was carried out separately for each trait (i.e., SRS total raw score, verbal IQ, and nonverbal IQ) as the dependent variable, with the individual burdens of all rare damaging mutations (including dnLoF, dnNSD, and inhRD) in EGs or NEGs as the independent variables.

Construction of Coexpression Modules and Coexpression Network in Brain.

Coexpression analysis in human brain was conducted based on RNA-seq data from BrainSpan: Atlas of the Developing Human Brain (40). We used the Weighted Correlation Network Analysis (WGCNA) package (63) for data quality control and identification of modules of coexpressed genes. The expression data for 52,376 Ensembl genes (56) (including protein-coding genes, noncoding genes, or pseudogenes) across 525 samples were obtained; 1,716 genes with too many missing entries or zero variance in expression levels were removed by the “goodSamplesGenes” function in the WGCNA, and 12,613 genes with very low expression levels [maximum reads per kilobase of transcript per million mapped reads (RPKM) less than 0.5 across samples] were removed. As a final step for gene-level data cleaning, only protein-coding genes were selected for additional analysis. For sample-level data cleaning, three outlier subjects (300, 303, and 306) were removed according to subject-level clustering result. Ten brain tissue types (caudal ganglionic eminence, cerebellum, dorsal thalamus, lateral ganglionic eminence, medial ganglionic eminence, occipital neocortex, parietal neocortex, primary motor sensory cortex, temporal neocortex, and upper rhombic lip) with data from fewer than 10 developmental stages were removed. The final quality-controlled dataset consisted of expression levels of 15,952 protein-coding genes in 16 brain tissue types across 31 pre- and postnatal developmental stages (495 samples in total). For module detection, we used the “blockwiseModules” function in the WGCNA with default parameters, except for the network type (power = 6, deepSplit = 2, and networkType = “signed”). We used the signed version of coexpression analysis that links two genes with positive correlation of expression levels.
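
Two of the quality-control rules above are simple enough to sketch directly: dropping genes with constant or very low expression, and dropping tissue types sampled at fewer than 10 developmental stages. The function names and toy data are hypothetical, and the handling of missing entries performed by WGCNA is not modeled here.

```python
def passes_gene_qc(rpkm_values, min_max_rpkm=0.5):
    """Keep a gene unless its maximum RPKM is below the cutoff or its
    expression is constant across samples (zero variance)."""
    if max(rpkm_values) < min_max_rpkm:
        return False
    return len(set(rpkm_values)) > 1

def tissues_to_keep(stages_by_tissue, min_stages=10):
    """Tissue types with data from at least `min_stages` distinct stages."""
    return {t for t, stages in stages_by_tissue.items()
            if len(set(stages)) >= min_stages}

# Invented expression profiles and stage labels.
toy_expression = {
    "low_gene": [0.0, 0.1, 0.4],
    "flat_gene": [2.0, 2.0, 2.0],
    "good_gene": [0.2, 3.5, 7.1],
}
qc_genes = [g for g, v in toy_expression.items() if passes_gene_qc(v)]
qc_tissues = tissues_to_keep({
    "tissue_x": list(range(12)),
    "tissue_y": list(range(4)),
})
```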

Coexpression between gene pairs was calculated based on the quality-controlled BrainSpan RNA-seq data with 495 brain samples. Two genes were defined as “coexpressed” in the brain if the Pearson correlation of the expression levels of both genes across 495 brain samples was greater than or equal to 0.8. In total, there were 8,600,150 coexpression links among protein-coding genes. The coexpression network was created using the GeneMania plugin (64) within Cytoscape 3.2.1 (65). Of 974 EGs from three modules (M01, M02, and M16) implicated in ASD, coexpression data were available for 973 genes, which were used as the input gene set for network construction. The coexpression network consists of a main connected component with 963 nodes and 187,443 edges as well as 10 isolated nodes.
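
The pairwise definition of coexpression given above (Pearson correlation of at least 0.8) can be written out as follows. This is an illustrative sketch with invented expression vectors; a naive all-pairs loop like this one would be slow for roughly 16,000 genes, and the network itself was built with GeneMania in Cytoscape rather than with code of this kind.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression_edges(expression, threshold=0.8):
    """Gene pairs whose expression correlation meets the threshold."""
    return [(g1, g2) for g1, g2 in combinations(sorted(expression), 2)
            if pearson(expression[g1], expression[g2]) >= threshold]

# Hypothetical expression levels across four samples.
toy_profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 9.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
toy_edges = coexpression_edges(toy_profiles)
```

Because only correlations at or above the threshold count, strongly anticorrelated pairs such as G1 and G3 are not linked, in line with the signed analysis described earlier.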

Discussion

We provide multiple lines of evidence suggesting that deleterious variants in EGs have a cumulative effect on ASD risk. Using the most comprehensive list of 3,915 EGs established to date, we show that there is both an elevated burden of damaging mutations in EGs in ASD probands and also, an enrichment of EGs in the recently identified high-confidence ASD-associated genes. Moreover, the analysis of EGs in the developing brain identified clusters of coexpressed EGs implicated in ASD, including 29 EGs functionally related to previously identified ASD risk genes.

We find that ASD individuals have a higher burden of mutations in EGs compared with their unaffected siblings. It is notable but not surprising that this effect is particularly pronounced when considering de novo mutations, because this class of mutations is only subject to selection pressure after originating in the individual and has exhibited some of the most prominent associations with the risk of ASD (8, 43–45). Similarly, a moderately increased burden of dnLoF variants in ASD probands was detected with a group of 10,823 phenotypically uncharacterized genes. Based on current estimates, one-fifth of these uncharacterized genes (∼2,000) are expected to be EGs, which may explain the higher mutational burden of dnLoF variants in ASD probands. Recent studies have begun to show that additional genetic factors, such as rare and common inherited variations, also contribute to ASD (26, 46). Our result supports this finding, showing that inherited, rare, damaging mutations in EGs also have a significant effect on ASD risk. Furthermore, we show an EG-specific effect on social responsiveness, a measure of the social aspects of ASD. In contrast, mutational burden in both EGs and NEGs has an effect on IQ measures. Complex social behaviors result from a range of different cognitive processes; however, in ASD subjects, there is a striking dissociation in the level of impairment in social interaction or communication and general cognitive abilities (as measured by IQ) (36) (Fig. S2). Moreover, studies in model organisms clearly show a fetal origin for social behavior deficits (47). Our results are in line with these findings and suggest that, although a higher mutational burden over all genes may have consequences on IQ, mutational burden in a set of genes with a role at critical early developmental stages influences the development of social behavior.
Our findings are further supported by the recent report that genomic regions under accelerated evolution have essential functions in human brain development and, when mutated, may cause increased risk for autism (48). Therefore, understanding the regulatory landscape of dosage-sensitive EGs expressed at critical stages of brain development may reveal risk alleles for many neurodevelopmental and psychiatric disorders.

The analysis of the overlapping set of Simons Simplex Collection ASD families by several groups using complementary approaches led to the identification of around 100 ASD risk genes and the finding of a depletion of damaging mutations in ASD risk genes (12, 27, 28). We show that a significant number of reported ASD risk genes are essential for survival and fitness and therefore, have a distinctive mutational spectrum, providing a biological foundation for this intolerance to damaging mutations. Of the spectrum of existing alleles, homozygosity or compound heterozygosity for loss-of-function alleles will never be observed. Also, because of synthetic lethality, some combinations of mutations in EGs are eliminated. Therefore, individuals will have only a subset of “milder” coding or regulatory alleles. The current list of candidate genes consists of 100 (high-confidence ASD genes) to 400 genes (potential ASD genes) (9). It is striking that our study provides strong statistical evidence for the aggregate effect across 3,915 EGs impacting risk for this neurodevelopmental disorder. A recent SNP-based heritability study reported the extreme polygenicity of schizophrenia, with 70% of 1-Mb genomic regions harboring schizophrenia risk alleles (49). Assuming a similar genetic architecture in ASD and schizophrenia, genomic maps of EGs with “surviving” deleterious and regulatory variants in ASD probands represent a complementary approach for the analysis of combinations of culprit genes or alleles.

Because of the fundamental functional role of EGs in an organism, genetic variants in these genes are likely to contribute to many traits and diseases as reflected by the previous finding that EGs are enriched for human disease genes (11, 13, 14). Our study is focused on a specific neurodevelopmental disorder, ASD, because it has been suggested that ASD has its roots in abnormalities in prenatal brain development (50–52). Specifically, our analysis of the temporal expression patterns of coexpressed gene modules in the developing brain shows that genes in three EG-enriched coexpression modules implicated in ASD are expressed at a high level at the earliest stages of brain development, as early as 8 weeks after conception. In contrast, at later stages of brain development, the expression levels of genes in these EG-enriched modules decrease, whereas the expression levels of genes in NEG-enriched modules increase. This finding suggests that EGs have a distinctive influence at some of the earliest brain developmental stages as previously reported for constrained genes (53) and genes in functional networks perturbed in ASD (54). However, it is not clear whether the contribution of EGs is specific to ASD or widespread across disorders with various underlying mechanisms. A comparison of the burden of deleterious variants in EGs across other complex disorders, including those with a later onset, is warranted.

Each individual can carry a number of deleterious mutations, each of which can have a small effect. Because brain function may be particularly sensitive to mutation accumulation, identifying a specific set of genes in which mutations have a behavioral effect will assist us in understanding how mutation accumulation within an individual can result in a phenotype, such as ASD. Hallmarks of ASD are phenotypic heterogeneity, frequent comorbidities, and the absence of any uniquely implicated brain region or cell type (5), further supporting the role of genes with a global effect on embryonic and fetal development. Here, we provide evidence that genes that are essential for survival and fitness also contribute to ASD risk and lead to the disruption of normal social behavior.

Materials and Methods

Identification of EGs.

Mouse Phenotype (MP) terms for the annotation of EGs are listed in Table S8. More details on identification of the catalog of EGs are in SI Materials and Methods.

Analysis of Haploinsufficiency of EGs.

Details on collection of gene sets for the analysis of haploinsufficiency of EGs are in SI Materials and Methods.

Burden Analysis of Mutations in EGs in ASD Families.

Details on the collection of genetic and phenotypic data of ASD families and on the variant filtering process are in SI Materials and Methods.

Comparison Between Observed and Expected TADA FDR q Values.

To compare the strength of association signals to ASD between EGs and NEGs, FDR q values for the TADA test of 18,665 genes were obtained from the work by Sanders et al. (9). For each gene set of interest (i.e., 3,915 EGs or 4,919 NEGs), the null distribution of TADA FDR q values was generated by randomly resampling with replacement. Within one iteration of the resampling procedure, the TADA FDR q value of a random gene from the tested 18,665 genes was obtained for each gene in the gene set of interest. The resampled TADA FDR q values were then ranked from low to high. The resampling procedure was repeated for 100,000 iterations. For each observed TADA FDR q value ranked from low to high, the median of 100,000 resampled q values with the same rank was considered the expected TADA FDR q value. The 2.5th and 97.5th percentiles of 100,000 resampled q values were considered the estimated 95% confidence intervals of each expected TADA FDR q value. The observed FDR q values were then compared with the expected FDR q values.
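The resampling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name `expected_sorted_q`, its parameters, the default iteration count, and the simple nearest-rank rule used for the 2.5th and 97.5th percentiles are all choices made for this example.

```python
import random
import statistics


def expected_sorted_q(all_q, set_size, n_iter=1000, seed=0):
    """Return one (median, lower, upper) tuple per rank, lowest rank first.

    all_q    : q values of all tested genes (the resampling pool)
    set_size : number of genes in the gene set of interest
    n_iter   : number of resampling iterations (hypothetical default;
               the text above describes 100,000 iterations)
    """
    rng = random.Random(seed)
    # Each iteration draws set_size q values with replacement
    # and ranks them from low to high.
    draws = [sorted(rng.choices(all_q, k=set_size)) for _ in range(n_iter)]

    lower_idx = int(0.025 * (n_iter - 1))  # nearest-rank 2.5th percentile
    upper_idx = int(0.975 * (n_iter - 1))  # nearest-rank 97.5th percentile

    result = []
    for rank in range(set_size):
        # All resampled q values that share this rank across iterations.
        column = sorted(row[rank] for row in draws)
        result.append(
            (statistics.median(column), column[lower_idx], column[upper_idx])
        )
    return result
```

The observed q values of the gene set, sorted from low to high, would then be compared rank by rank with the returned medians and intervals; observed values that fall below the lower bound at a given rank would indicate stronger association than expected for a random gene set of the same size.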

Construction of Coexpression Modules and Coexpression Network in Brain.

Details on construction of coexpression modules and coexpression network in the developing human brain are in SI Materials and Methods.

Pathway Enrichment Analysis.

We performed pathway enrichment analysis in the Reactome database (42) using Enrichr (55) for three EG-enriched modules (M01, M02, and M16) that were also enriched for potential ASD genes (Table S6). The enriched pathways were ranked by P values with Benjamini–Hochberg adjustment (FDR q values) from the Fisher’s exact test.
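As a rough illustration of the two statistical steps named above, the following Python sketch computes a one-sided Fisher's exact P value for gene set overlap and applies the Benjamini–Hochberg adjustment. It is not Enrichr's implementation; the function names, the choice of a one-sided (enrichment) test, and the background size passed by the caller are assumptions for this example.

```python
from math import comb


def fisher_enrichment_p(overlap, set_size, pathway_size, background):
    """One-sided Fisher's exact P value (hypergeometric upper tail).

    Probability of seeing at least `overlap` pathway genes when drawing
    `set_size` genes from `background` genes, of which `pathway_size`
    belong to the pathway.
    """
    max_k = min(set_size, pathway_size)
    total = comb(background, set_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, set_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

Pathways would then be ranked by the adjusted values returned from `benjamini_hochberg`, with one P value per tested pathway.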

Code Availability.

Details on availability of code used to generate reported results are in Table S9.

Table S9.

Analysis code for figures and tables generated

File name | Figure/table | Description
Fig1A_Fig2B_plotGeneSetEnrichment.r | Figs. 1A and 2B | Plotting enrichment of EGs among haploinsufficient genes and ASD risk genes
Fig1BC_plotHIScoreDistribution.r | Fig. 1 B and C | Plotting the distribution of haploinsufficiency scores
Fig2A_getForestPlot_burdenAnalysis.r | Fig. 2A | Plotting the results for mutational burden analysis
Fig2A_Table1_TableS1_S2_S3_S4_burdenAnalysis.r | Fig. 2A, Table 1, and Tables S1, S2, S3, and S4 | Performing mutational burden analysis
Fig2C_getExpectedTADAFDR.py | Fig. 2C | Generating the null distribution of TADA FDR q values for gene set
Fig2C_plotTADAfdrQQ.r | Fig. 2C | Plotting the observed vs. null distribution of TADA FDR q values for gene set
Fig3A_plotModuleEnrichment.r | Fig. 3A | Plotting the enrichment of EGs/NEGs among coexpression modules
Fig3C_plotNetworkAttibutes.r | Fig. 3C | Plotting the coexpression network of gene modules implicated in ASD
FigS1_Fig3B_plotEigengenes.r | Fig. 3B and Fig. S1 | Plotting the expression trajectories of coexpression modules
FigS2_plotSRS_IQ.r | Fig. S2 | Plotting the correlation between SRS and IQ in ASD probands
DatasetS3_S4_getVariantList.py | Datasets S3 and S4 | Generating lists of variants for mutational burden analysis

Analysis code for the generated figures and tables was deposited in GitHub (https://github.com/Bucanlab/Ji_PNAS_2016).

Supplementary Material

Supplementary File
Supplementary File
pnas.1613195113.sd02.xlsx (349.8KB, xlsx)
Supplementary File
pnas.1613195113.sd03.xlsx (472.5KB, xlsx)
Supplementary File

Acknowledgments

We thank Steve Murray and the International Mouse Phenotyping Consortium (IMPC) for help with generation of gene lists, and Benjamin Georgi, Benjamin Voight, Hakon Hakonarson, Steve Brown, Judith Miller, Edward Brodkin, and Lu Chen for discussions. X.J. was supported by a fellowship from Biomedical Graduate Studies at the University of Pennsylvania. This work was supported by the Pennsylvania Commonwealth Grant and NIH Grants R01MH101822 (to C.D.B.) and R01MH093415 (to M.B. and Steven M. Paul; multiple principal investigators).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613195113/-/DCSupplemental.

References

1. State MW, Levitt P. The conundrums of understanding genetic risks for autism spectrum disorders. Nat Neurosci. 2011;14(12):1499–1506. doi: 10.1038/nn.2924.
2. Huguet G, Ey E, Bourgeron T. The genetic landscapes of autism spectrum disorders. Annu Rev Genomics Hum Genet. 2013;14:191–213. doi: 10.1146/annurev-genom-091212-153431.
3. Willsey AJ, State MW. Autism spectrum disorders: From genes to neurobiology. Curr Opin Neurobiol. 2015;30:92–99. doi: 10.1016/j.conb.2014.10.015.
4. De Rubeis S, Buxbaum JD. Recent advances in the genetics of autism spectrum disorder. Curr Neurol Neurosci Rep. 2015;15(6):36. doi: 10.1007/s11910-015-0553-1.
5. de la Torre-Ubieta L, Won H, Stein JL, Geschwind DH. Advancing the understanding of autism disease mechanisms through genetics. Nat Med. 2016;22(4):345–361. doi: 10.1038/nm.4071.
6. Geschwind DH, Levitt P. Autism spectrum disorders: Developmental disconnection syndromes. Curr Opin Neurobiol. 2007;17(1):103–111. doi: 10.1016/j.conb.2007.01.009.
7. De Rubeis S, et al. DDD Study; Homozygosity Mapping Collaborative for Autism; UK10K Consortium. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515(7526):209–215. doi: 10.1038/nature13772.
8. Iossifov I, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515(7526):216–221. doi: 10.1038/nature13908.
9. Sanders SJ, et al. Autism Sequencing Consortium. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron. 2015;87(6):1215–1233. doi: 10.1016/j.neuron.2015.09.016.
10. Zhang M, Zhu C, Jacomy A, Lu LJ, Jegga AG. The orphan disease networks. Am J Hum Genet. 2011;88(6):755–766. doi: 10.1016/j.ajhg.2011.05.006.
11. Georgi B, Voight BF, Bućan M. From mouse to human: Evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 2013;9(5):e1003484. doi: 10.1371/journal.pgen.1003484.
12. Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9(8):e1003709. doi: 10.1371/journal.pgen.1003709.
13. Dickerson JE, Zhu A, Robertson DL, Hentges KE. Defining the role of essential genes in human disease. PLoS One. 2011;6(11):e27368. doi: 10.1371/journal.pone.0027368.
14. Dickinson ME, et al. International Mouse Phenotyping Consortium; Jackson Laboratory; Infrastructure Nationale PHENOMIN, Institut Clinique de la Souris (ICS); Charles River Laboratories; MRC Harwell; Toronto Centre for Phenogenomics; Wellcome Trust Sanger Institute; RIKEN BioResource Center. High-throughput discovery of novel developmental phenotypes. Nature. 2016;537(7621):508–514. doi: 10.1038/nature19356.
15. Deutschbauer AM, et al. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics. 2005;169(4):1915–1925. doi: 10.1534/genetics.104.036871.
16. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93(19):10268–10273. doi: 10.1073/pnas.93.19.10268.
17. Koonin EV. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol. 2003;1(2):127–136. doi: 10.1038/nrmicro751.
18. Hwang YC, et al. Predicting essential genes based on network and sequence analysis. Mol Biosyst. 2009;5(12):1672–1678. doi: 10.1039/B900611G.
19. Chakravarti A, Turner TN. Revealing rate-limiting steps in complex disease biology: The crucial importance of studying rare, extreme-phenotype families. BioEssays. 2016;38(6):578–586. doi: 10.1002/bies.201500203.
20. Blomen VA, et al. Gene essentiality and synthetic lethality in haploid human cells. Science. 2015;350(6264):1092–1096. doi: 10.1126/science.aac7557.
21. Wang T, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350(6264):1096–1101. doi: 10.1126/science.aac7041.
22. Hart T, et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015;163(6):1515–1526. doi: 10.1016/j.cell.2015.11.015.
23. Eppig JT, et al. Mouse Genome Database Group. The Mouse Genome Database (MGD): From genes to mice--a community resource for mouse biology. Nucleic Acids Res. 2005;33(Database issue):D471–D475. doi: 10.1093/nar/gki113.
24. Koscielny G, et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. 2014;42(Database issue):D802–D809. doi: 10.1093/nar/gkt977.
25. White JK, et al. Sanger Institute Mouse Genetics Project. Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell. 2013;154(2):452–464. doi: 10.1016/j.cell.2013.06.022.
26. Krumm N, et al. Excess of rare, inherited truncating mutations in autism. Nat Genet. 2015;47(6):582–588. doi: 10.1038/ng.3303.
27. Samocha KE, et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014;46(9):944–950. doi: 10.1038/ng.3050.
28. Iossifov I, et al. Low load for disruptive mutations in autism genes and their biased transmission. Proc Natl Acad Sci USA. 2015;112(41):E5600–E5607. doi: 10.1073/pnas.1516376112.
29. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database issue):D514–D517. doi: 10.1093/nar/gki033.
30. Rehm HL, et al. ClinGen. ClinGen--the Clinical Genome Resource. N Engl J Med. 2015;372(23):2235–2242. doi: 10.1056/NEJMsr1406261.
31. Dang VT, Kassahn KS, Marcos AE, Ragan MA. Identification of human haploinsufficient genes and their genomic proximity to segmental duplications. Eur J Hum Genet. 2008;16(11):1350–1357. doi: 10.1038/ejhg.2008.111.
32. Huang N, Lee I, Marcotte EM, Hurles ME. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 2010;6(10):e1001154. doi: 10.1371/journal.pgen.1001154.
33. Steinberg J, Honti F, Meader S, Webber C. Haploinsufficiency predictions without study bias. Nucleic Acids Res. 2015;43(15):e101. doi: 10.1093/nar/gkv474.
34. Fischbach GD, Lord C. The Simons Simplex Collection: A resource for identification of autism genetic risk factors. Neuron. 2010;68(2):192–195. doi: 10.1016/j.neuron.2010.10.006.
35. Constantino J, Gruber C. The Social Responsiveness Scale Manual. Western Psychological Services; Los Angeles: 2005.
36. Constantino JN, et al. Validation of a brief quantitative measure of autistic traits: Comparison of the social responsiveness scale with the autism diagnostic interview-revised. J Autism Dev Disord. 2003;33(4):427–433. doi: 10.1023/a:1025014929212.
37. Abrahams BS, et al. SFARI Gene 2.0: A community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol Autism. 2013;4(1):36. doi: 10.1186/2040-2392-4-36.
38. He X, et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 2013;9(8):e1003671. doi: 10.1371/journal.pgen.1003671.
39. Buxbaum JD, et al. Autism Sequencing Consortium. The autism sequencing consortium: Large-scale, high-throughput sequencing in autism spectrum disorders. Neuron. 2012;76(6):1052–1056. doi: 10.1016/j.neuron.2012.12.008.
40. BrainSpan. 2011. BrainSpan: Atlas of the Developing Human Brain. Available at brainspan.org. Accessed October 4, 2013.
41. Croft D, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2014;42(Database issue):D472–D477. doi: 10.1093/nar/gkt1102.
42. Fabregat A, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44(D1):D481–D487. doi: 10.1093/nar/gkv1351.
43. Sanders SJ, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485(7397):237–241. doi: 10.1038/nature10945.
44. O’Roak BJ, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012;485(7397):246–250. doi: 10.1038/nature10989.
45. Iossifov I, et al. De novo gene disruptions in children on the autistic spectrum. Neuron. 2012;74(2):285–299. doi: 10.1016/j.neuron.2012.04.009.
46. Gaugler T, et al. Most genetic risk for autism resides with common variation. Nat Genet. 2014;46(8):881–885. doi: 10.1038/ng.3039.
47. Belinson H, et al. Prenatal β-catenin/Brn2/Tbr2 transcriptional cascade regulates adult social and stereotypic behaviors. Mol Psychiatry. 2016;21(10):1417–1433. doi: 10.1038/mp.2015.207.
48. Doan RN, et al. Mutations in human accelerated regions disrupt cognition and social behavior. Cell. 2016;167(2):341–354.e12. doi: 10.1016/j.cell.2016.08.071.
49. Loh PR, et al. Schizophrenia Working Group of Psychiatric Genomics Consortium. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet. 2015;47(12):1385–1392. doi: 10.1038/ng.3431.
50. Willsey AJ, et al. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell. 2013;155(5):997–1007. doi: 10.1016/j.cell.2013.10.020.
51. Parikshak NN, et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell. 2013;155(5):1008–1021. doi: 10.1016/j.cell.2013.10.031.
52. Stoner R, et al. Patches of disorganization in the neocortex of children with autism. N Engl J Med. 2014;370(13):1209–1219. doi: 10.1056/NEJMoa1307491.
53. Choi J, Shooshtari P, Samocha KE, Daly MJ, Cotsapas C. Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation. PLoS Genet. 2016;12(6):e1006121. doi: 10.1371/journal.pgen.1006121.
54. Chang J, Gilman SR, Chiang AH, Sanders SJ, Vitkup D. Genotype to phenotype relationships in autism spectrum disorders. Nat Neurosci. 2015;18(2):191–198. doi: 10.1038/nn.3907.
55. Chen EY, et al. Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128. doi: 10.1186/1471-2105-14-128.
56. Flicek P, et al. Ensembl 2014. Nucleic Acids Res. 2014;42(Database issue):D749–D755. doi: 10.1093/nar/gkt1196.
57. Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord. 1994;24(5):659–685. doi: 10.1007/BF02172145.
58. Lord C, et al. The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. J Autism Dev Disord. 2000;30(3):205–223.
59. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–315. doi: 10.1038/ng.2892.
60. McKenna A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–1303. doi: 10.1101/gr.107524.110.
61. Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907.
62. NHLBI Exome Sequencing Project (ESP). Exome Variant Server. Available at evs.gs.washington.edu/EVS/. Accessed November 11, 2015.
63. Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi: 10.1186/1471-2105-9-559.
64. Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: A real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9(Suppl 1):S4. doi: 10.1186/gb-2008-9-s1-s4.
65. Shannon P, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–2504. doi: 10.1101/gr.1239303.
