Abstract
In a recent issue of Cell, Satterstrom et al. leverage de novo high-impact variants to identify 102 genes associated with autism spectrum disorder (ASD). Most of these genes have roles in regulation of gene expression or neuronal communication, implicating both developmental and functional changes in ASD.
Autism spectrum disorder (ASD) is a developmental disability characterized by persistent deficits in social communication and interaction as well as restricted, repetitive patterns of behavior. The prevalence of ASD is estimated to be >1%, highlighting the significant societal burden, with males four times as likely to be affected as females. Given that ASD is defined by clinical presentation (as opposed to a molecular diagnosis), with a wide range of symptom severity that may not be fully manifest until social demands exceed the reduced capacity, understanding the molecular and/or genetic basis of ASD is critical to the development of effective therapies. Moreover, the high comorbidity of ASD with intellectual disability (ID) and neurodevelopmental delay (NDD) (30%–50%) requires that symptoms must not be better explained by ID or global developmental delay, further complicating diagnosis. Despite the high heterogeneity of the ASD phenotype, studies have demonstrated a significant genetic component to ASD risk, with common variation explaining >50% of the heritability (Gaugler et al., 2014). However, despite increasingly large sample sizes, genome-wide association studies (GWASs) have had limited success identifying specific genes associated with ASD. A somewhat complementary approach focuses on rare mutations of large effect, often arising new in the offspring (de novo, i.e., not inherited from a parent). Although these mutations do not explain a great deal of the overall disease risk, this approach has been extremely successful in identifying ASD-associated genes and providing critical insight into ASD biology.
In a study recently published in Cell, a large consortium led by Stephan Sanders, Kathryn Roeder, Mark Daly, and Joseph Buxbaum (Satterstrom et al., 2020) describes a large-scale exome sequencing analysis that tackles many key challenges in autism genetics and represents the largest analysis undertaken of de novo mutations in ASD, leading to the identification of 102 ASD-associated genes. The authors assembled a cohort of ~35,500 samples (~12,000 with ASD), including ~21,000 family-based samples (6,430 ASD cases, 2,179 unaffected siblings, and both parents) and 14,365 case-control samples (5,556 cases and 8,809 controls), and identified a set of 9,345 rare de novo variants in protein-coding exons (allele frequency ≤0.1% in their dataset and in non-psychiatric subsets of the ExAC and gnomAD databases). Two key observations informed downstream analyses. First, recent studies have shown that measures of functional severity can delineate classes of variants that have different risk burdens for disease. Specifically, the authors relied on two measures of functional severity to classify genetic variants into risk tiers. The first is evolutionary constraint, captured by the probability of loss-of-function intolerance (pLI) score, which identifies genes likely to contribute to disease as evidenced by the paucity of functional variation observed in the general population (implying that functional variation is strongly selected against in these genes) (Kosmicki et al., 2017). The second is the missense badness, PolyPhen-2, constraint (MPC) score, which assesses the likely functional impact of a mutation (Samocha et al., 2017). The second key observation is that ASD is associated with reduced fecundity (Power et al., 2013), and thus, genetic variation associated with ASD is subject to natural selection.
In practice, this means that de novo variation, which is not subject to natural selection, should have a larger impact on ASD risk than inherited variation, even for the same class of genetic variant severity. This is precisely what the authors found. They observed a highly significant 3.5-fold enrichment of de novo protein-truncating variants (PTVs) (p = 4 × 10⁻¹⁷), the most severe class of mutation, as they are predicted to eliminate gene function, versus a non-significant 1.2-fold enrichment of inherited PTVs (p = 0.07). Likewise, mutation severity (defined by pLI and MPC) was associated with ASD risk burden (see Figure 1C in Satterstrom et al., 2020). Indeed, while overall de novo missense variants showed a marginal enrichment over the rate expected by chance, the most severe category, MPC ≥2, which occurs at a frequency similar to de novo PTVs, had a 2.1-fold case enrichment (p = 3 × 10⁻⁸) and explained more of the variance than PTVs with pLI <0.995. In total, all de novo exome variants identified in autosomes explained 1.92% of the variance of ASD, compared to >50% for common variants (Gaugler et al., 2014). While the explained variance may be small, the impact on understanding ASD biology, described in more detail below, is quite significant: de novo variation allows for the identification of specific genes, which can be quite challenging for common variation because it is often located in regulatory regions of the genome with unclear impact on gene function (Grove et al., 2019).
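The fold enrichments quoted above are ratios of per-person de novo variant rates between affected and unaffected children. As a minimal sketch of that arithmetic, the Python below computes a rate ratio and a one-sided exact binomial p-value conditional on the total number of variants observed. The sample sizes are the family-based numbers given earlier (6,430 ASD cases and 2,179 unaffected siblings); the variant counts are hypothetical placeholders, the function names are invented for illustration, and the test is a generic one that is not necessarily the one used by Satterstrom et al.

```python
from math import comb


def rate_ratio(case_count, n_cases, control_count, n_controls):
    """Per-person variant rate in cases divided by the rate in controls."""
    return (case_count / n_cases) / (control_count / n_controls)


def binomial_enrichment_p(case_count, n_cases, control_count, n_controls):
    """One-sided exact binomial p-value for an excess of variants in cases.

    Under the null of equal per-person rates, each observed variant falls in
    a case with probability n_cases / (n_cases + n_controls).
    """
    total = case_count + control_count
    p_null = n_cases / (n_cases + n_controls)
    # Probability of seeing at least `case_count` of the `total` variants in cases.
    return sum(
        comb(total, k) * p_null**k * (1 - p_null) ** (total - k)
        for k in range(case_count, total + 1)
    )


# Sample sizes from the family-based cohort described in the text.
N_CASES, N_SIBLINGS = 6430, 2179
# HYPOTHETICAL variant counts, chosen only to illustrate the calculation.
ptv_in_cases, ptv_in_siblings = 300, 30

fold = rate_ratio(ptv_in_cases, N_CASES, ptv_in_siblings, N_SIBLINGS)
p_value = binomial_enrichment_p(ptv_in_cases, N_CASES, ptv_in_siblings, N_SIBLINGS)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.2e}")
```

Normalizing by the number of individuals matters here because the cohort contains roughly three times as many cases as unaffected siblings, so raw counts alone would overstate the enrichment.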
The next challenge the authors tackled was to explain sex differences in ASD risk. Consistent with previous studies, the authors observed a 2-fold enrichment of de novo PTVs in highly constrained genes in affected females versus males (p = 3 × 10⁻⁶). The most popular explanation for the sex difference is a female protective effect, which, consistent with the observed data, would postulate that females need a higher mutation burden to manifest ASD (i.e., females tolerate mutations better, and thus, females who manifest the disease should harbor mutations with larger effects or have accumulated more mutations). The authors also tested an alternate hypothesis of sex-specific differences in mutation burden: that the same mutation has a larger effect (higher liability) in males than females, and thus, females would again need a higher mutation load to manifest disease. Because determining the liability of individual variants is not possible, since they are rare, the authors instead looked at the liability of classes of variants (e.g., PTVs) and estimated it separately in males and females. Across all classes of variants, the authors did not observe a significant sex difference in ASD liability, providing further evidence for the female protective model of ASD risk.
A key strength of exome sequencing analyses is the ability to identify specific genes associated with ASD risk. Using an updated version of their TADA software, which integrates protein-truncating and missense variants from both family-based and case-control studies, the authors identified 102 ASD risk genes at false discovery rate (FDR) ≤0.1, of which 26 were significant after Bonferroni correction. Of the 102 ASD genes, 60 were not discovered by previous exome sequencing analyses (Sanders et al., 2015), and the authors consider 30 as “truly novel,” having not been previously implicated in autosomal dominant neurodevelopmental disorders (ASD, developmental delay, epilepsy, and ID). It is important to note that all the identified genes were autosomal, largely due to reduced power for X chromosome analyses, since most de novo variants arise on the paternal chromosome, which is only transmitted to female offspring, who represent a minority of ASD cases. Looking at the patterns of mutations in ASD genes, it is interesting to note that some of the genes are identified based on de novo missense variants rather than on PTVs (which presumably knock out gene function). This suggests that some genes may predispose to autism risk through gain of function rather than loss of function, which would have clear clinical implications on potential treatment approaches.
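To see why an FDR threshold yields a longer gene list than a Bonferroni threshold (102 versus 26 genes here), the sketch below applies both criteria to a toy set of p-values. It is only an illustration under stated assumptions: TADA derives its FDR from a Bayesian model that is not reproduced here, so the frequentist Benjamini-Hochberg procedure stands in for it; the gene names and p-values are hypothetical; and the Bonferroni significance level of 0.05 is an assumed default rather than a value taken from the paper.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Genes whose p-value passes alpha divided by the number of tests."""
    m = len(p_values)
    return {gene for gene, p in p_values.items() if p <= alpha / m}


def benjamini_hochberg_hits(p_values, q=0.1):
    """Genes passing the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    cutoff_rank = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= q * k / m.
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= q * rank / m:
            cutoff_rank = rank
    # Every gene ranked at or above the cutoff is declared a discovery.
    return {gene for gene, _ in ranked[:cutoff_rank]}


# HYPOTHETICAL per-gene p-values, for illustration only.
toy_p_values = {
    "GENE_A": 1e-6,
    "GENE_B": 0.004,
    "GENE_C": 0.02,
    "GENE_D": 0.03,
    "GENE_E": 0.6,
}

strict = bonferroni_hits(toy_p_values)
lenient = benjamini_hochberg_hits(toy_p_values)
print(sorted(strict))   # genes passing the Bonferroni threshold
print(sorted(lenient))  # genes passing the FDR threshold
```

Bonferroni correction controls the chance of even one false positive across all tests, whereas an FDR threshold of 0.1 accepts that roughly one in ten reported genes may be a false discovery, which is the trade-off behind the larger list.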
A major question in the ASD genetics field has been whether studies focused on rare de novo large-effect variants shed light on the majority of ASD cases, many of which do not carry such variants and are presumably at risk due to an accumulation of common variants with small effect sizes. The authors note that among the five genome-wide significant ASD hits from the largest available GWAS (Grove et al., 2019), KMT2E is on the list of 102 ASD genes implicated by exome sequencing. This overlap suggests that the same genes may be affected by both common and rare ASD variants. However, a more formal test for enrichment of ASD-associated common variants among the 102 genes did not show any signal. To further test their hypothesis, the authors included GWAS datasets from 5 additional traits: schizophrenia, major depressive disorder, and attention deficit hyperactivity disorder (ADHD) (all positively correlated with ASD); educational attainment (positively correlated with ASD and negatively correlated with schizophrenia and ADHD); and height (negative control). Although the 102 genes showed no enrichment for ASD-associated common variants, they were significantly enriched for common variants associated with both schizophrenia and educational attainment. The authors argue that the lack of association with ASD is a function of power and that the positive associations with well-powered ASD-correlated traits are evidence for an overlap between rare-variant- and common-variant-implicated genes. It is a fair point, as the ASD GWAS has far less signal than either the schizophrenia or educational attainment GWASs, which each identify >100 loci, compared to 5 for ASD. However, these results still leave open the very real possibility that common and rare variants will identify distinct sets of genes that contribute differently to ASD risk.
Given the significant comorbidity observed between ASD and NDD, the authors asked whether they could gain insight into the relationship between these traits by comparing the relative frequency of disruptive de novo mutations in ASD- or NDD-ascertained trios. Of the 102 ASD-associated genes, 53 were classified as ASD-predominant (ASDP) and 49 as ASD with NDD (ASDNDD), with 13 genes showing statistically significant heterogeneity in mutation frequency between samples ascertained for ASD versus NDD (2 ASDP, 11 ASDNDD). The authors make a compelling case that the ASDNDD genes are functionally distinct from the ASDP genes: (1) they demonstrate a paucity of inherited PTVs for ASDNDD genes despite probands having no significant difference in PTV frequency, consistent with greater natural selection acting on ASDNDD genes, and (2) ASD subjects who carry disruptive de novo PTVs in ASDNDD genes walk 2.6 ± 1.2 months later and have an IQ 11.9 ± 6.0 points lower than those with mutations in ASDP genes. The authors are careful to point out that while there is a distinction between the ASDNDD and ASDP genes, disruptive de novo mutations in both reduced IQ and delayed age of walking, and that the excess burden of de novo variants is not limited to low-IQ ASD cases, supporting the idea that de novo disruptive mutations do not simply impair cognition (Robinson et al., 2014).
Functional characterization of the 102 ASD-associated genes was reassuringly consistent with previous studies, identifying a set of genes associated with gene expression regulation (GER; 58 genes) and neuronal communication (NC; 24 genes). The authors also identified a new associated category, “cytoskeleton organization” (9 genes). Using RNA sequencing (RNA-seq) data from the Genotype-Tissue Expression (GTEx) resource, the authors found enrichment of gene expression in 11 of 13 brain regions sampled, with the strongest enrichment in cortex and cerebellar hemisphere. Assessing relative prenatal versus postnatal expression, the 101 cortically expressed ASD-associated genes were enriched for prenatal expression (p = 8 × 10⁻⁸), with the ASDNDD genes showing a more pronounced prenatal bias. Interestingly, the GER and NC genes show differences in peak expression timing, and an unsupervised clustering approach found that GER genes co-cluster and peak during the mid-fetal epoch, while NC genes co-cluster separately and peak postnatally. The authors speculate that the differing expression patterns of GER and NC genes could indicate two different time points of ASD susceptibility or, alternatively, a single susceptibility period at which both gene sets are highly expressed (mid- to late fetal development). Single-cell analysis of prenatal human forebrain was then used to identify specific cell types implicated in ASD. Consistent with previous studies, enrichment for the 102 ASD-associated genes was observed in maturing and mature neurons of the excitatory and inhibitory lineages from mid-fetal development onward, but not in non-neuronal lineages. These results are consistent with results from gene enrichment in postmortem (largely adult) brains, where enrichment for GWAS-implicated genes was observed in neuronally expressed genes, despite dysregulation of gene expression in microglia (Gupta et al., 2014).
Finally, the authors used a combination of computational approaches to implicate additional genes in ASD risk and to clarify the regulatory relationship between GER and NC genes. Relying on the assumption that genes that are co-expressed with ASD-associated genes could provide additional insight into ASD etiology, the authors combined their genetic association results with gene expression data from mid-fetal human brain and identified 138 genes (FDR ≤ 0.005), including 83 not captured by genetic analyses alone. The impact of these additional genes on understanding ASD risk or neurobiology was not expanded upon and is thus a ripe area for further study. Protein-protein interaction (PPI) network analysis and results from chromatin and cross-linked immunoprecipitation sequence assays proved to be more insightful. First, the PPI networks, while showing an overall excess interaction among the 102 ASD-associated genes, did not show enriched interaction between GER and NC genes. More importantly, analyzing 26 GER genes with identified regulatory targets showed no evidence that GER genes regulate NC genes. The implication, consistent with previous data, is that GER and NC genes identify distinct pathways to autism risk, which may suggest the need for improved classification of diagnostic sub-types and distinct therapeutic approaches.
In summary, the strength of this manuscript was not simply the large sample size, which yielded 102 ASD-associated genes, of which 30 are novel, but the careful analyses that begin to tease out distinct genetic features that vary across ASD cases. First, the authors tackled a major question in ASD—why do males have higher risk for ASD? By separately looking at liability for classes of genetic variants in males and females, they provided further evidence for a female protective model in which females can tolerate more mutational burden before manifesting disease. Second, the breakdown into ASDP and ASDNDD genes could lead to improved molecular diagnosis, providing clinicians with early insights into prognosis and potentially informing therapeutic options. Third, as the authors highlighted in their discussion, despite the commonality of gene haploinsufficiency leading to ASD, the impact on other phenotypes can be dramatically different, ranging from global developmental delay with impaired cognitive, social, and gross motor skills to a much milder impact on developmental phenotypes. The ability to recognize these different classes of genes provides a hook by which researchers can begin to separate out core ASD features (social communication deficits and repetitive and restrictive behaviors) from more general neurodevelopmental impairment. Finally, the prenatal bias for GER gene expression (and no bias for NC genes), along with the observation that GER genes do not appear to regulate NC genes, leads to an intriguing hypothesis. Specifically, the authors speculated that NC ASD genes provide support for the role of excitatory/inhibitory imbalance in ASD (Rubenstein and Merzenich, 2003) through direct impact on neurotransmission (also supported by the single-cell data implicating maturing and mature neurons in both excitatory and inhibitory lineages). 
In the absence of GER genes regulating NC genes, the authors suggest that GER genes impact excitatory/inhibitory balance by altering the numbers of these neurons in given regions of the brain. Thus, despite evidence for distinct roles of the genes identified, the authors propose a phenotypic convergence that goes beyond the identification of genes and suggest that understanding the nature of this convergence is likely to “hold the key to understanding the neurobiology that underlies the ASD phenotype.”
Together, the results generated by this largest-to-date exome sequencing analysis of ASD provide new insights into the role of rare protein-disrupting variants in disease manifestation and help characterize the phenotypic presentation, functional consequences, and developmental time points critical to ASD risk. A key remaining question, likely to be answered in the near future as sample sizes increase, is whether genes harboring common variation identify additional distinct pathways to ASD or identify a single unified pathway underlying ASD risk, perhaps providing the missing link between GER and NC genes.
REFERENCES
- Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, Mahajan M, Manaa D, Pawitan Y, Reichert J, et al. (2014). Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885.
- Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, Pallesen J, Agerbo E, Andreassen OA, Anney R, et al.; Autism Spectrum Disorder Working Group of the Psychiatric Genomics Consortium; BUPGEN; Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium; 23andMe Research Team (2019). Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444.
- Gupta S, Ellis SE, Ashar FN, Moes A, Bader JS, Zhan J, West AB, and Arking DE (2014). Transcriptome analysis reveals dysregulation of innate immune response genes and neuronal activity-dependent genes in autism. Nat. Commun. 5, 5748.
- Kosmicki JA, Samocha KE, Howrigan DP, Sanders SJ, Slowikowski K, Lek M, Karczewski KJ, Cutler DJ, Devlin B, and Roeder K (2017). Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510.
- Power RA, Kyaga S, Uher R, MacCabe JH, Långström N, Landen M, McGuffin P, Lewis CM, Lichtenstein P, and Svensson AC (2013). Fecundity of patients with schizophrenia, autism, bipolar disorder, depression, anorexia nervosa, or substance abuse vs their unaffected siblings. JAMA Psychiatry 70, 22–30.
- Robinson EB, Samocha KE, Kosmicki JA, McGrath L, Neale BM, Perlis RH, and Daly MJ (2014). Autism spectrum disorder severity reflects the average contribution of de novo and familial influences. Proc. Natl. Acad. Sci. USA 111, 15161–15165.
- Rubenstein JLR, and Merzenich MM (2003). Model of autism: increased ratio of excitation/inhibition in key neural systems. Genes Brain Behav. 2, 255–267.
- Samocha KE, Kosmicki JA, Karczewski KJ, O’Donnell-Luria AH, Pierce-Hoffman E, MacArthur DG, Neale BM, and Daly MJ (2017). Regional missense constraint improves variant deleteriousness prediction. bioRxiv. 10.1101/148353.
- Sanders SJ, He X, Willsey AJ, Ercan-Sencicek AG, Samocha KE, Cicek AE, Murtha MT, Bal VH, Bishop SL, Dong S, et al.; Autism Sequencing Consortium (2015). Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci. Neuron 87, 1215–1233.
- Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An J-Y, Peng M, Collins R, Grove J, Klei L, et al. (2020). Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.
