Abstract
Genotype-phenotype relationships shape health and population fitness but remain difficult to predict and interpret. Here, we apply an evolutionary action method in mutational landscapes to unravel genes and pathways connected to autism spectrum disorder (ASD). Evolutionary action predicts the impact of missense variants on protein function by measuring motions in fitness landscapes, based on phylogenetic distances and substitution odds in homologous sequences. By examining 368 pathways across 2,384 individuals with ASD (probands), we found that 23 pathways, comprising a total of 398 genes, had de novo missense variants biased to higher evolutionary action scores than expected by random chance, including axonogenesis, synaptic transmission, and neurodevelopmental pathways. The predicted fitness impact of de novo and inherited missense variants in candidate genes correlated with the IQ of individuals with ASD, even when only the new gene candidates were considered. This approach demonstrates how the evolutionary action method can be applied in biology to integrate missense variants over a cohort to identify genes contributing to a shared phenotype. Using this approach, we have detected those missense variants most likely to contribute to ASD pathogenesis and have elucidated their phenotypic impact.
One Sentence Summary
An evolutionary action approach elucidates the genotype-phenotype relationship in autism.
Introduction
The relationship between genotype and phenotype can be difficult to predict and interpret. This presents particular challenges when interpreting mutations of complex diseases like autism spectrum disorder (ASD), which is both phenotypically and genetically heterogeneous. Some predictions place the number of genes involved in ASD pathogenesis in the hundreds (1) (2) bordering on thousands (3) (4) (5), and the highly multigenic nature of the disorder means that few causative genes can be identified through an excess of mutations. In the absence of any single gene responsible for the majority of ASD cases, the most commonly mutated genes only account for approximately 2% of cases each (6) (7). To explain additional cases, it is critical to expand analysis to interpret the collectively large number of variants in rarely mutated genes.
Although ASD has many implicated contributing factors, including environment (8) (9), common polymorphisms (10), and inherited rare variants (11), de novo variants in particular are suspected to be enriched as a class for causative mutations because they have not been subjected to generations of evolutionary selection. Analysis of de novo mutations in ASD has largely focused on copy number variants (CNVs) (12–14), single nucleotide variants (SNVs) resulting in an obvious loss of function (15–17), and genes with a detectably elevated mutation rate (18, 19). Far less attention has been paid to the role of missense variants on genes with low mutation rates, with such analyses limited to genes already implicated in ASD or in ASD-related pathways (20). The overall role of missense variants in driving phenotype severity has also remained unclear. Whereas strong links between mutation and lower patient IQ have been detected for loss-of-function (LOF) de novo variants (21) (16), as defined by the combined class of nonsense, frameshift, and splice-site mutations, studies have not yet been able to link missense mutations to the same patient presentations on a large scale and without prior knowledge of ASD-associated genes (16). However, individuals with ASD are more likely to carry a de novo missense variant than either a de novo LOF or a de novo copy number variant (16), so the prioritization and interpretation of these variants is paramount, especially if they are revealed to be an important and understudied source of driver events.
Here, we prioritized rarely mutated, potentially causative ASD genes by their de novo missense variants alone. Without making any a priori assumptions of which genes or pathways drive ASD, we tested whether groups of functionally related genes were biased toward high impact variants. Interpreting the variant effects on protein function is challenging (22, 23) and subject to disagreement between different methods of variant impact prediction (24). To estimate the impact of each variant, we first used the evolutionary action equation (25), a state-of-the-art prediction method that was consistently assessed to be one of the best methods in the objective, blind contests of the Critical Assessment of Genome Interpretation community (26, 27). Briefly, the evolutionary action equation models the genotype-to-phenotype relationship to first-order approximation with an equation that equates the functional impact of a mutation on fitness to the product of the functional importance of the mutated residue and the amino acid dissimilarity of the substitution. It is well suited for considering variants from groups of genes in aggregate because its scoring system for variants possesses built-in normalization for gene selection pressure. To quantify mutational bias in pathways, we considered the evolutionary action scores over the de novo missense mutations of functionally related genes across all patients. This integrative approach detected non-random mutational patterns indicative of proband-specific selection of missense variants associated with axonogenesis, synaptic transmission, and other neurodevelopmental pathways. Notably, in the genes prioritized by this approach, both missense de novo variants as well as rare inherited missense variants correlated with patient IQ, demonstrating a direct relationship to patient phenotype. 
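As a rough illustration of the first-order relationship described above, the sketch below multiplies a residue importance term by a substitution dissimilarity term. The function name `toy_action`, the 0 to 1 input scales, and the rescaling to 0 to 100 are assumptions made for illustration only; the published method derives both terms from evolutionary data and applies its own normalization, which this toy does not attempt to reproduce.

```python
def toy_action(importance: float, dissimilarity: float) -> float:
    """Toy first-order impact: residue importance times substitution dissimilarity.

    Both inputs are assumed (for this sketch) to lie in [0, 1];
    the product is rescaled to a 0-100 range.
    """
    if not (0.0 <= importance <= 1.0 and 0.0 <= dissimilarity <= 1.0):
        raise ValueError("inputs must lie in [0, 1]")
    return 100.0 * importance * dissimilarity


# Hypothetical inputs: a conservative substitution at an important residue
# versus a radical substitution at a tolerant residue.
conservative_at_key_site = toy_action(importance=0.9, dissimilarity=0.2)
radical_at_tolerant_site = toy_action(importance=0.1, dissimilarity=0.9)
print(conservative_at_key_site, radical_at_tolerant_site)
```

The point of the product form is that neither term alone determines the predicted impact: a radical substitution can score low if the residue is tolerant, and a mild substitution can score appreciably if the residue is functionally important.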
We concluded that evolutionary action integration in pathways detected some missense variants that could contribute to ASD pathogenesis, with implications for prioritizing genes and variants in ASD.
Results
Characterization of the de novo missense variant class in ASD probands
We first assessed whether de novo missense variants in individuals with ASD have, as a class, a distinct and more impactful variant profile compared to random expectation or to those in matched siblings without ASD. Across 2,384 individuals with ASD, we identified 1,418 missense variants affecting 1,269 unique genes and annotated the impact of the variants using evolutionary action scores (Table S1). Close to half of the probands (43.9%) carried a de novo missense variant, and the observed de novo missense mutation prevalence was 0.59/proband, similar to the rates reported by Neale et al. (43) (0.58/proband) and Sanders et al. (18) (0.55/proband). Across 1,792 unaffected siblings, we identified 976 missense variants affecting 911 unique genes (Table S2). Compared to their matched unaffected siblings, probands had more de novo missense variants, consistent with previous studies (16), and even after filtering to exclude genes with more than three variants this difference remained significant (difference of 6.25%, p = 0.016). The average predicted impact of all missense variants in probands was not significantly different from what would be expected by random mutagenesis (z-score, +0.13; Fig. 1A), indicating no evidence of selection. This finding is consistent with the very low fraction of expected ASD driver variants, as indicated by the fraction of excessive variants in probands compared to their healthy siblings and also suggested in previous studies (17). For the healthy siblings, the difference from random expectation was significant (p = 0.03), indicating selection against pathogenic variants, which may be due to the lack of disease driver variants in these individuals. Comparing evolutionary action distributions of de novo missense variants in probands with their healthy matched siblings showed a small difference that did not reach significance (p = 0.23; Fig. 1B), suggesting a broad spectrum of fitness effects for the variants that drive ASD, which agrees with previous conclusions that these drivers involve mild effects (17). These results suggest that the landscape of de novo missense variants over all individual genes in patients with ASD is similar to that of the matched siblings and dominated by mutations with relatively mild impact on protein fitness.
Figure 1. The impact of missense variants in probands and matched siblings.

(A) The impact of de novo variants in probands and matched siblings without ASD is compared to random nucleotide changes. The average evolutionary action score of all de novo missense variants is superimposed upon the distribution of averages produced by 10,000 simulations of 1,418 randomly selected coding missense variants for probands (upper plot) and 976 randomly selected coding missense variants for matched siblings (lower plot). (B) The impact of missense variants was compared between patients with ASD and matched siblings. The distributions of evolutionary action scores for missense variants in patients with ASD (black) and matched siblings (grey) are represented by violin plots, with the center dot indicating the median and the center bar indicating the 25th to 75th percentiles of the data. The distributions were compared statistically using a two-sample Kolmogorov-Smirnov test.
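The resampling comparison described for panel A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `background_scores` stands in for a hypothetical pool of evolutionary action scores for possible coding missense changes, sampling is done with replacement, and the z-score is taken against the mean and standard deviation of the simulated averages.

```python
import random
import statistics


def simulated_z_score(observed_scores, background_scores,
                      n_simulations=10_000, seed=0):
    """Z-score of the observed mean against means of random draws of equal size."""
    rng = random.Random(seed)
    k = len(observed_scores)
    # Each simulation draws k scores at random (with replacement, an assumption)
    # from the background pool and records their average.
    simulated_means = [
        statistics.fmean(rng.choices(background_scores, k=k))
        for _ in range(n_simulations)
    ]
    mu = statistics.fmean(simulated_means)
    sigma = statistics.stdev(simulated_means)
    return (statistics.fmean(observed_scores) - mu) / sigma


# Toy data only: a flat background on the 0-100 scale and a shifted sample.
background = [float(i) for i in range(101)]
shifted_sample = [90.0] * 50
print(simulated_z_score(shifted_sample, background, n_simulations=1000))
```

A z-score near zero, as reported for probands, would indicate that the observed average is indistinguishable from random draws; a clearly negative value would point toward depletion of high-impact variants.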
However, network analysis of the 1,269 genes in which missense de novo variants occur in the patient group exposed an underlying non-random signal within this class of variants. Affected genes had significantly more protein-protein interactions in the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database (28) than would be expected by chance (p = 7.3 × 10⁻¹²), and hundreds of Gene Ontology Biological Processes were significantly enriched. For the matched siblings, the protein-protein interaction enrichment was barely significant (p = 0.045). Yet, the vast majority of genes (1,195 out of 1,269 genes) under consideration exhibited these network features (Fig. S1A). A gene-centric interaction or enrichment approach is fundamentally limited in its ability to isolate the detected signal or stratify candidate genes; of the 1,269 genes affected by de novo missense variants, 86% interacted with another in the set compared to 79% expected by chance, and there was no way to identify which genes were the excess driving the significance (Fig. S1B). For these reasons, a complementary approach to evaluating events within the missense variant class was necessary in order to extricate a causative subset of genes and variants.
Prioritization of de novo missense variants using variant-centric pathway analysis
To pinpoint the source of the interaction signal within the de novo missense class and meaningfully prioritize a subset of the missense de novo variants and their associated genes, we therefore pursued a variant-centric approach in which we examined patterns of variant impact across functionally related groups of genes. Genes were grouped by ontology using the software tool GO2MSIG (29), producing 368 pathways encompassing 15,310 total genes (Table S3), and variant impact was annotated with the evolutionary action method, producing impact scores on a continuous scale between 0 (minimum predicted impact) and 100 (maximum predicted impact). For the 1,792 patients with matched siblings, 1,037 missense de novo variants across 960 genes in probands and 976 missense de novo variants across 911 genes in the healthy siblings were considered (after we focused on genes with three or fewer missense mutations in order to avoid false discovery of pathways due to a single important gene). From these, 860 missense de novo variants across 796 genes in probands and 776 missense de novo variants across 725 genes in healthy siblings were assigned to the pathways. For each pathway, the evolutionary action score distribution of the de novo variants within the pathway was compared to the evolutionary action distribution of all other de novo variants in patients with ASD. Pathways that displayed a bias toward high-impact variants and remained significant after multiple hypothesis testing were considered to be of interest, and genes that were affected by de novo variants and present in a significant pathway were considered prioritized genes. This approach revealed 23 significant pathways in the probands, with functions that demonstrated clear ties to nervous system development, including axonogenesis and synaptic transmission (Fig. 2A, Table S4). 
For example, in the synaptic transmission pathway, 49 mutations across 43 individual genes produced a variant impact distribution statistically (p = 6.95 × 10⁻⁴, q = 0.037) and visibly biased to higher evolutionary action scores (Fig. 2B). As a control, the same process was repeated using all 976 missense de novo variants from the matched siblings; no pathways exhibited significant bias toward high functional impact (Fig. 2C). For subsequent analysis, genes falling into pathways with significant evolutionary action bias toward high impact mutations were grouped together into a single set of 398 prioritized genes, and all other 562 genes with de novo missense variants were considered deprioritized (Table S5). An independent cohort of 5,134 individuals with ASD (DB6 release of MSSNG) had 933 de novo missense variant calls in any of the 23 prioritized pathways, and these variants were biased to higher evolutionary action impact compared to the 1,389 de novo missense variants in the rest of the genes (p < 0.0001). This result validated our conclusion that individuals with ASD have de novo missense variants that affected the function of the prioritized gene pathways.
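The per-pathway comparison followed by multiple-testing correction can be sketched as below. The specific statistical test used for each pathway is not stated in this section, so a one-sided permutation test on the difference of means is swapped in here purely as a stand-in; the Benjamini-Hochberg step corresponds to the FDR correction and the q = 0.1 threshold shown in Fig. 2C. Function names and toy scores are hypothetical.

```python
import random
import statistics


def permutation_p_value(pathway_scores, other_scores,
                        n_permutations=2000, seed=0):
    """One-sided p-value that pathway scores have a higher mean than the rest."""
    rng = random.Random(seed)
    observed = statistics.fmean(pathway_scores) - statistics.fmean(other_scores)
    pooled = list(pathway_scores) + list(other_scores)
    k = len(pathway_scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign pathway membership at random
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Toy example: two hypothetical pathways tested against all other variants.
toy = {
    "pathway_high": ([90, 95, 85, 92, 88, 91], [10, 20, 30, 40, 50, 15, 25, 35]),
    "pathway_flat": ([30, 60, 45, 20], [25, 55, 50, 35, 40, 65, 15, 70]),
}
p_vals = [permutation_p_value(inside, outside) for inside, outside in toy.values()]
q_vals = benjamini_hochberg(p_vals)
significant = [name for name, q in zip(toy, q_vals) if q < 0.1]
print(significant)
```

Testing each pathway against all remaining variants, rather than against a fixed reference, keeps the comparison internal to the cohort, which is why the same procedure can be rerun unchanged on the sibling variants as a control.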
Figure 2. Prioritization of de novo missense variants using variant impact on pathways.

(A) Hierarchical clustering of the 23 prioritized pathways is shown. For the 398 genes with missense variants that were associated with at least one relevant pathway, a matrix was created to denote whether the gene was (red) or was not (grey) a component of the pathway. Pathways were then grouped according to their patterns of affected genes via hierarchical clustering performed by GENE-E. (B) Evolutionary action score distribution for the synaptic transmission pathway is shown. Evolutionary action scores for the proband variants (red) and matched sibling variants (black) in this pathway were binned in deciles and represented as histograms. (C) Significance of all tested pathways in cohorts of patients with ASD and their matched siblings is presented. Each point represents one of the 368 tested pathways and is connected with a line to the same pathway in the matched cohort. The q = 0.1 significance threshold after False Discovery Rate (FDR) correction is represented as a dashed red line.
Evolutionary action burden of de novo missense variants in prioritized genes correlates with phenotypic severity
To determine whether prioritizing genes according to the evolutionary action distributions in pathways provides a meaningful stratification between causative and non-causative genes, the variants in the prioritized genes were tested for their relationship to patient presentation, defined here by IQ. This analysis was performed only for the probands due to the lack of IQ information for the unaffected siblings. The capacity of evolutionary action scores alone to predict patient phenotypic presentation within this prioritized gene set was tested by comparing the clinical presentation of male patients included in the initial analysis who were affected by different de novo missense variants in the same candidate gene. Although female probands with de novo missense mutations in prioritized genes contributed a minority of the data, they were highly disproportionately represented at low IQ and were analyzed separately from male patients to prevent confounding based on gender. When more than one phenotyped patient had a de novo missense variant in a given prioritized gene, the higher evolutionary action variant within the gene correctly predicted the patient with the lower IQ in 71.4% of paired comparisons (n = 28). Across all such cases, patients harboring the higher evolutionary action variant demonstrated significantly lower IQ overall, corresponding to a 15.2 point drop in IQ on average between the two groups (p = 0.023, paired t-test) (Fig. 3).
Figure 3. Predicted variant impact on IQ of ASD patient pairs with different de novo missense variants in the same candidate gene.

Pairs of patients with ASD affected by different de novo missense variants in the same prioritized gene were identified across the 398 prioritized genes (n = 28). Within each pair, the patient with the higher variant evolutionary action score was determined. Full-scale IQ scores were compared between the higher and lower evolutionary action groups using a paired t-test. Correctly prioritized pairs are shown linked by a solid black line; incorrectly prioritized pairs are shown linked by a dashed grey line.
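The paired comparison can be illustrated with a short sketch. The data structure and function name are hypothetical, the IQ and score values are toy numbers, and only the concordance fraction, mean IQ difference, and paired t statistic are computed; converting the statistic to a p-value would require the t distribution, which is left out to keep the example dependency-free.

```python
import math
import statistics


def paired_summary(pairs):
    """Summarize pairs of patients who carry different variants in the same gene.

    pairs: list of ((ea_a, iq_a), (ea_b, iq_b)).
    Returns (fraction of pairs where the higher-EA patient has the lower IQ,
             mean IQ difference as lower-EA minus higher-EA,
             paired t statistic of those differences).
    """
    diffs = []
    concordant = 0
    for (ea_a, iq_a), (ea_b, iq_b) in pairs:
        if ea_a == ea_b:
            continue  # equal scores give no ordering to test
        iq_high_ea, iq_low_ea = (iq_a, iq_b) if ea_a > ea_b else (iq_b, iq_a)
        diffs.append(iq_low_ea - iq_high_ea)
        if iq_high_ea < iq_low_ea:
            concordant += 1
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    t_stat = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))
    return concordant / n, mean_diff, t_stat


# Toy pairs only; each tuple is (evolutionary action score, full-scale IQ).
toy_pairs = [
    ((80, 60), (30, 95)),
    ((55, 102), (70, 71)),
    ((20, 88), (65, 90)),
    ((90, 45), (40, 85)),
]
print(paired_summary(toy_pairs))
```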
To further explore the relationship between these variants and patient presentation, all male patients with ASD were divided into three groups corresponding to phenotypic severity: high IQ of 100 or greater (i.e., greater than or equal to population average), low IQ of less than 70 (i.e., more than two standard deviations below population average, and consistent with a diagnosis of intellectual disability), and intermediate IQ. Prioritized genes were grouped together into a single set of candidate causative ASD genes, and the evolutionary action score burden (sum of evolutionary action scores) of mutations in these genes was calculated for each patient and considered across the three groups. Significant differences in total variant impact were found between the three IQ groups, with the lowest IQ patient group having the highest impact mutations in the prioritized genes (p = 0.048; Kruskal-Wallis test) (Fig. S2A). This relationship between IQ and mutation evolutionary action scores was not seen when applied to all genes affected by de novo mutations (p = 0.89) or to genes that were not prioritized by the method (p = 0.58) (Fig. S2A). This correlation can be explained by comparing the distributions of evolutionary action scores of prioritized genes between the three IQ groups: patients with the lowest IQ had more variants with high scores than expected by chance, in contrast to patients with the highest IQ, who had fewer variants with high scores than expected by chance (Fig. S2B). These data showed that the impact of the variant on the protein (as estimated by evolutionary action score) correlated with patient phenotype (IQ).
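The burden calculation and IQ grouping described above can be sketched as follows. The IQ cutoffs and the definition of burden as a sum of scores come from the text; the input layout, function names, gene labels, and the choice to omit patients with no variant in a prioritized gene are assumptions for illustration.

```python
from collections import defaultdict


def iq_group(iq):
    """Assign one of the three severity bins described in the text."""
    if iq >= 100:
        return "high"
    if iq < 70:
        return "low"
    return "intermediate"


def burden_by_group(variants, patient_iq, prioritized_genes):
    """Sum evolutionary action scores per patient, then bin patients by IQ.

    variants: iterable of (patient_id, gene, ea_score).
    Returns {group: [burden per patient]}; patients without a variant in a
    prioritized gene are left out (an assumption of this sketch).
    """
    burden = defaultdict(float)
    for patient_id, gene, ea_score in variants:
        if gene in prioritized_genes:
            burden[patient_id] += ea_score
    groups = defaultdict(list)
    for patient_id, total in burden.items():
        groups[iq_group(patient_iq[patient_id])].append(total)
    return dict(groups)


# Toy data with placeholder gene labels.
toy_variants = [
    ("p1", "GENE_A", 80.0), ("p1", "GENE_B", 40.0),
    ("p2", "GENE_A", 30.0), ("p3", "GENE_C", 95.0),
    ("p4", "GENE_B", 60.0),
]
toy_iq = {"p1": 55, "p2": 110, "p3": 62, "p4": 85}
print(burden_by_group(toy_variants, toy_iq, {"GENE_A", "GENE_B"}))
```

The resulting per-group lists are the inputs one would pass to a rank-based test such as Kruskal-Wallis.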
We next investigated the impact of the protein itself on human health, estimated here using RVIS (Residual Variation Intolerance Score) calculations of genic tolerance to mutation (30). Although genic tolerance to mutation was not on its own a significant predictor of phenotypic severity (Fig. S3), weighting the evolutionary action impact score to account for differences in genic tolerance to mutation (weighted EA, see Methods) further improved the ability of the evolutionary action score burden in prioritized genes to predict patient phenotype when binned (p = 0.0028; Fig. 4), and this relationship also became significant when unbinned (R = −0.14, p = 0.013, linear regression) (Tables S6 and S7). In addition, this correlation was stronger when RVIS was used to measure genic tolerance to mutation compared to pLI (probability of being loss of function intolerant) or Missense Constraint Metric (Table S8); the correlation was also significant when verbal or non-verbal IQ was defined as the primary outcome (Table S9). The correlation was generally reproducible with other variant impact prediction methods, such as PolyPhen-2 (Polymorphism Phenotyping v2), CADD (Combined Annotation Dependent Depletion), SIFT (Sorting Intolerant From Tolerant), MPC (Missense badness, PolyPhen-2, and Constraint), and BLOSUM62 (BLOcks SUbstitution Matrix using sequences with less than 62% similarity), but it was stronger and more robust to changes in the analysis when evolutionary action was used instead (Table S10). The correlation between evolutionary action score burden and IQ also could not be explained by confounding due to the presence of de novo nonsense variants or CNVs, which are known to affect IQ. There was no significant correlation between de novo CNV deletion size and evolutionary action score burden (Pearson R = −0.0001, p = 0.98), and patients with a concurrent nonsense variant did not have a higher evolutionary action burden in prioritized genes (p = 0.97). 
No significant relationship between patient IQ and evolutionary action score burden was found when the relevant gene set was instead considered to be all genes affected by de novo mutations (p = 0.21), genes that were not prioritized by the method (p = 0.74) (Fig. 4), or genes belonging to independent gene sets of interest a priori, such as those enriched for expression in the brain (31), proposed by orthogonal methods (32) (33) (34) (3), or connected to other candidates in a protein interaction network (Table S11). Whereas the correlation strength between missense variant burden and IQ is more modest than comparable continuous correlations that have been published connecting CNV deletion length to IQ (35), this is likely due to sample size and subsetting; the correlation we detect is at least as strong as the established connection between CNVs and IQ after restriction to the same patients used in our analysis (R = −0.09, p = 0.03).
Figure 4. De novo evolutionary action score burden and ASD patient IQ for prioritized and deprioritized gene groups.

Prioritized genes, deprioritized genes, and all genes with de novo missense variants were assessed for their relationship to the IQ of patients with ASD. For each male patient, the summed evolutionary action burden of de novo missense variants was calculated for each category with evolutionary action score weighting for genic intolerance to mutation (∑EAweighted). The patients were then split into three groups by their full-scale IQ score and the scores were compared using Kruskal-Wallis tests. Error bars reflect the 95% CI of the mean.
Furthermore, whereas the evolutionary action score burden accounted for cases in which more than one variant of interest was detected in a patient’s exome, the results could not be explained by an uneven distribution of patients affected by multiple de novo variants in prioritized genes (p = 0.51; chi square test) (Fig. S4A), and the genotype-phenotype relationship remained significant (p = 0.014) when considering only patients affected by a single variant of interest (Fig. S4B). Female patients were assessed separately, and while their variant impact profile across prioritized genes was equally biased to high action (Fig. S5A), the genotype-phenotype analysis was underpowered to detect a relationship of the magnitude present in male patients (Fig. S5B) and the correlation between IQ and evolutionary action burden was not significant (p = 0.40, linear regression).
Prioritized gene set demonstrates enrichment for genes linked to ASD
To determine whether prioritization using evolutionary action pathway distributions captured established knowledge, we next compared our prioritized gene set to the 2017 version of the manually curated genes for ASD by the Simons Foundation Autism Research Initiative (SFARI). We considered SFARI categories 1–3 (high confidence, strong candidate, and suggestive evidence) to be appropriate for comparison. We then quantified the overlap of our gene lists with the SFARI list and found that the prioritized genes were highly enriched for genes in the SFARI gene set compared to deprioritized genes (35/398 vs. 14/562; p = 10⁻⁵, Fisher’s exact test). Even better enrichment was obtained for the genes with the lowest RVIS scores (38/398 vs. 11/494; p = 10⁻⁶, Fisher’s exact test), which is orthogonal information to our gene prioritization. The prioritized genes were also significantly enriched in genes with higher pLI scores (p = 1.5 × 10⁻⁶; Fig. S6A), in brain-expressed genes (p = 10⁻⁷; Fig. S6B), and in 102 genes (p = 9 × 10⁻⁵) implicated in ASD risk according to a recent study (36), when compared to the non-prioritized genes. These positive control data show that prioritization using evolutionary action pathway distributions preferentially captures current knowledge. Furthermore, of our prioritized genes, 28 of 363 (7.71%) that were not recognized by SFARI as high confidence genes in 2017 were recognized as such in the most recent release of SFARI, whereas for the nonprioritized genes, only 11 of 548 (2%) went from unrecognized to recognized over the same time frame. This difference was highly significant in a chi-square analysis (p < 0.0001) and demonstrated the utility of the evolutionary action-based prioritization approach. 
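The SFARI overlap comparison reduces to a 2×2 contingency table, and a Fisher's exact test on such a table can be sketched with the standard library alone. The counts below are taken from the text (35 of 398 prioritized genes and 14 of 562 deprioritized genes in the SFARI set); whether the reported p-value was one-sided or two-sided is not stated, so the one-sided form here is an assumption and the printed value need not match the published figure exactly.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Rows: prioritized, deprioritized. Columns: in SFARI set, not in SFARI set.
p_value = fisher_exact_greater(35, 398 - 35, 14, 562 - 14)
print(p_value)
```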
To determine whether prioritized genes without known link to ASD were also contributing to the relationship between genotype and phenotype, we next tested the ability of evolutionary action burden in prioritized genes to predict patient phenotype when the gene was either supported by the high-confidence curated SFARI gene set or unscored by SFARI. For each subset of the prioritized genes, patients with de novo variants in these genes were split into two groups based on whether their burden was above or below the mean of all such patients. Across all prioritized genes, the patient group with above-average evolutionary action burden demonstrated significantly lower IQ scores corresponding to an ~8 point drop in IQ (p = 0.006) (Fig. 5). The difference became more pronounced when restricting to prioritized genes also in the SFARI gene set, with the average IQ a full 30 points lower in the patient group with a higher evolutionary action burden (90.3 vs 60.3, p = 0.0015), 5 points more than would be found when considering the SFARI gene set without the aid of prioritization (Fig. S7A). Moreover, considering prioritization status improved the unthresholded correlation value from R = −0.33 to R = −0.37 (p = 0.02) (Fig. S7B). However, the majority (84.5%) of prioritized genes were not placed into any category by SFARI curation, and a significant ~6.5 IQ point difference between the groups persisted when considering only these unannotated genes (p = 0.03). For all tests, statistical significance was maintained across a wide range of thresholds (Fig. S8).
Figure 5. Effect of SFARI curation confidence and prioritization status on the relationship between genotype and phenotype.

There were three gene sets of interest: all prioritized genes, prioritized genes overlapping with the SFARI categories 1–3 curated gene set, and prioritized genes without a SFARI confidence score. For each gene set, the evolutionary action burdens of all male patients with at least one de novo missense variant within the gene set were averaged. Patients were split into higher and lower evolutionary action burden groups based on whether their score was above or below the average burden, respectively. Groups were compared statistically with an unpaired t-test, and the mean and 95% CI of the mean for each group are displayed overlaying all IQ scores for patients in the group.
We next repeated the analyses more stringently, comparing our prioritized gene set to an uncurated assessment of the current published literature. We defined genes with at least one association in PubMed between the gene name and the term ‘autism’ as being supported by the literature, and found that prioritized genes were significantly enriched for literature support compared to deprioritized genes (p < 0.0001, Fig. 6A). Simply considering the genes with the lowest RVIS scores also yielded a significant, but weaker association (p = 3 × 10⁻¹² compared to p = 9 × 10⁻²¹; Fig. S9). Amongst all genes with literature support, those that were prioritized had a larger number of associations per gene (p = 0.007, Fig. 6B), indicating more extensive support. Moreover, while prioritized genes with literature support exhibited a significant relationship between IQ and evolutionary action burden both when binned (Fig. 6C) and unbinned (p = 0.019, linear regression), the deprioritized genes with literature support did not (Fig. 6D), suggesting that associations with deprioritized genes may have been false positives. We then tested the ability of evolutionary action burden in prioritized genes to predict patient phenotype when the gene was either supported or unsupported by the literature. When considering prioritized genes with literature support, patients with an above-average evolutionary action burden had IQ scores ~11 points lower than those with a below-average evolutionary action burden (p = 0.01, Fig. 6E). When considering prioritized genes with no literature associations with ASD, the same trend was seen with a significant decrease in IQ of over 7 points (p = 0.04, Fig. 6E). Again, statistical significance was maintained across a wide range of thresholds (Fig. S8).
Figure 6. Uncurated literature associations with ASD and the effect of literature and prioritization status on the relationship between genotype and phenotype.

(A) Enrichment of a prioritized gene set for associations with ASD in PubMed compared to a deprioritized gene set. (B) All genes with de novo variants and support in the literature were separated by prioritization status. The numbers of PubMed associations with ASD for the genes in each category were then compared using an unpaired t-test. (C) De novo missense variants in prioritized genes with literature support (n = 133) were assessed for their relationship with patient IQ. Male patients were split into three groups according to their full-scale IQ scores, and their evolutionary action burdens, weighted for genic intolerance to mutation (∑EAweighted), were compared using a Kruskal-Wallis test. Error bars reflect the 95% CI of the mean. (D) De novo missense variants in deprioritized genes with literature support (n = 66) were assessed for their relationship with patient IQ. Male patients were split into three groups according to their full-scale IQ scores, and their evolutionary action burdens, weighted for genic intolerance to mutation (∑EAweighted), were compared using a Kruskal-Wallis test. Error bars reflect the 95% CI of the mean. (E) For each of three gene sets of interest (all prioritized genes, prioritized genes with at least one PubMed association with ASD, and prioritized genes without a PubMed association with ASD), the gene set evolutionary action burdens of all male patients with at least one de novo missense variant within the gene set were averaged. Patients were split into higher and lower evolutionary action burden groups based on whether their score was above or below the average burden, respectively. Groups were compared statistically with an unpaired t-test, and the mean and 95% CI of the mean for each group are displayed overlaying all IQ scores for patients in the group.
Evolutionary action score burden of rare and low-frequency inherited variants in prioritized genes correlates with phenotype severity
Given that the impact of de novo mutations in the candidate causative genes correlated with patient presentation, we next tested whether rare inherited variations in these same genes exhibited a similar relationship with IQ. We considered rare and low-frequency inherited variants with minor allele frequency (MAF) less than 0.05 in male patients that were detected in at least one parent, but were not inherited by the healthy sibling. Across the cohort there were 25,042 variants in the candidate genes that met these criteria. For each patient, we calculated the inherited evolutionary action burden in the candidate genes as the summation of all evolutionary action scores in these variants after weighting for gene-specific tolerance to mutation. We found that there was a significant correlation between IQ and inherited variant evolutionary action burden as well, with patients with a high IQ having a lower inherited evolutionary action burden in the prioritized gene set (p = 0.0005; Fig. 7A). There was no relationship between evolutionary action burden and IQ when considering genes that were not prioritized (p = 0.26), or that were low-confidence SFARI genes (SFARI categories 4–6; p = 0.83); the same relationships could be found when limiting the MAF cutoff to more stringent definitions of rare variant status (Fig. 7B). Incorporation of the de novo variants into the evolutionary action burden increased significance further (p = 0.0003). These data show that within the prioritized gene set, rare inherited variants also linked genotype to phenotype.
Figure 7. Relationship between inherited evolutionary action score burden and patient IQ for prioritized and deprioritized gene groups.

(A) For each male patient with ASD, rare and low-frequency inherited variants (MAF < 0.05) that were detected in at least one parent, but not inherited by the healthy sibling, were identified across prioritized genes. The inherited evolutionary action burden was calculated as the summation of all evolutionary action scores of these variants after weighting for gene tolerance to mutation (∑EAweighted). The line indicates the linear regression across all points and the shaded grey area represents the 95% CI; the p-value corresponds to the significance of the regression. For visualization purposes, the patients were also sorted by IQ and divided into nine equal groups; the average burden and IQ of each group is overlaid upon the regression and error bars indicate the standard error of the mean. (B) Log(p-values) of the linear regression of IQ and inherited evolutionary action burden as the minor allele frequency (MAF) threshold is increasingly restricted to lower frequencies for prioritized genes, deprioritized genes, and lower-confidence SFARI genes.
Discussion
Our data show that the evolutionary action distributions of de novo missense variants can be used to elucidate causative pathways in a complex multigenic disease and prioritize variants that stratify disease severity. Here, using sequencing data from 2,384 individuals diagnosed with ASD, we hypothesized that affected cohort-specific selection for large variant fitness effects within a group of functionally related genes implied an association of those pathways and genes with ASD. We observed significant impact signatures in 23 pathways, including axonogenesis, neuron development, and synaptic transmission. The mutated genes from these pathways were enriched for literature associations with ASD and are highly consistent with pathways of importance derived from analyses of CNV and loss-of-function (LOF) variant data (34) (12) (37, 38), as well as pathways identified through recurrent missense mutations in patients with neurodevelopmental disorders (39). This suggested that the putative causative missense SNVs identified in this study may operate through mechanisms similar, rather than orthogonal, to well-documented processes involved in ASD etiology.
Our study directly links the evolutionary impact of missense variants to a measure of ASD phenotypic severity without a priori knowledge of ASD-associated genes. Although IQ cannot reflect all aspects of the phenotypic severity of patients with ASD, it correlates well with behavior-based observer-rating scales that encompass diverse areas of autistic symptomatology (40) and repetitive behaviors in patients (41), and therefore can provide a relevant index of ASD severity. Past work relating de novo variants to IQ in patients with ASD has focused almost exclusively on CNV and LOF variants, with studies finding a significant relationship between IQ and the de novo mutation rate of LOF variants (21) (16) as well as CNVs and truncating SNVs (42). However, when these same studies assessed missense variants no correlation with IQ was found, even after restricting analyses to recurrent missense variants (16) (42). Our initial assessments of the de novo missense class agreed with others who have reported that the overall impact of de novo missense variants in ASD does not differ substantially from expectations (43) (18). However, we found that this collective profile did not preclude the detection of gene and variant subsets with mutational signatures indicating a significant genotype-phenotype relationship. We observed a 30-point IQ decrease in patients with above-average missense variant impact burdens across the highest confidence gene candidates, and a 7.4-point IQ decrease in patients with above-average missense variant impact burdens across unexpected gene candidates. Additionally, we demonstrated that different variants within the same candidate gene could be linked to phenotypic outcomes through their predicted evolutionary action impact on protein fitness. 
Furthermore, a modest but highly significant correlation between rare inherited missense variant burden and IQ when considering the prioritized genes indicated that these genes may contribute to ASD etiology through pathways beyond de novo variation.
Our results suggest that de novo missense variants, especially those with high impact affecting important genes in neurological pathways, have the potential to influence the phenotypic presentation of patients with ASD even if they or the genes in which they occur have not been previously linked to ASD in the literature. However, lower-impact missense variants in a gene should not be assumed to produce a similar effect even if the gene or pathway has been previously associated with ASD. These findings have implications for clinical interpretation of de novo missense variants of unknown significance in patients diagnosed with ASD, which in turn could improve estimations of recurrence risk in siblings by helping to clarify whether a patient’s de novo missense variant influences their presentation or is merely incidental. In the future, larger cohorts and additional sequenced trios will enable refinement of the observed genotype-phenotype relationship into a clinically valuable outcome predictor, and will clarify whether missense variants in female patients with ASD demonstrate the same relationship to clinical presentation.
Our results also have implications for laboratory testing by suggesting which genes and variants to prioritize for experimental validation and inclusion in the SFARI gene set. One gene with a single missense variant in the cohort, CAMK2A, was not included in the SFARI Gene study when it was completed and had minimal literature support for an association with ASD but was prioritized by the pathway-evolutionary action integration as part of the synaptic transmission pathway. The detected variant in this gene has recently been shown to decrease excitatory synaptic transmission in cultured neurons and produce aberrant behavior including social deficits and increased repetitive behavior in mice with a knock-in of the variant (44). Now, CAMK2A has been linked to intellectual disability (45) and incorporated into SFARI Gene. Although this is a single example, pathway-evolutionary action can prospectively aid ongoing large-scale experimental efforts to test the functional effect of de novo missense mutations detected in major trio studies. In addition, as statistical power grows along with cohort sizes for ASD, genes suggested by pathway-evolutionary action can be further prioritized using results of frequency-based analyses to create short-lists for testing. Already 10 candidate genes, including several not in SFARI, overlap with a recent list of 35 genes identified through recurrent missense mutations in patients with neurodevelopmental disorders (39). Independent studies have reported 4 of our prioritized genes as new candidate ASD-associated genes (19, 20).
Several limitations may affect the sensitivity and specificity of the method. First, our approach to identify de novo variants (see Methods) is stringent and biased towards high specificity rather than towards finding more variants of lower confidence. Consequently, some genes with relevant de novo variants may not be prioritized for ASD association. Another limitation concerns the gene groups defined with Gene Ontology terms, which are far from a complete accounting of molecular functions and pathways. More genes are likely to be prioritized using alternative groupings. Also, given the polygenic character of ASD and the finite number of probands, small pathways may lack enough variants to achieve statistical significance. On the other hand, very large pathways that mostly include genes unrelated to ASD may be dominated by non-driver variants, leaving the very few driver genes deprioritized. A limitation specific to evolutionary action is that 5% of human genes were not scored because they had an insufficient number of homologous sequences. Although most of these genes are pseudogenes or functionally insignificant genes, it is possible that a small fraction of them are ASD driver genes. All these limitations decrease the sensitivity of our approach, but our prioritization could also include false positive genes. This is because any association with ASD is at the level of the gene groups and not at the level of single genes, since most genes have only one de novo variant. Finally, future work should use phenotypes other than IQ to account for ASD phenotype severity, as well as larger, independent cohorts. Despite these limitations, the results support our approach and shed light on the ASD genotype-phenotype relationship.
More broadly, the elucidation of the genotype-phenotype relationship through the integration of mutation impact and gene importance scores is an approach with implications for evolutionary theory and biology. The mathematics underpinning the use of evolutionary action distributions to identify pathways and genes of interest is founded on the assumption of an evolutionary fitness function that maps genotypes to phenotypes in the fitness landscape, but which is not directly calculable. Differentiation of this fitness function yields the evolutionary action equation to predict variant impact, in which the perturbation of the fitness landscape is equal to the product of the evolutionary fitness gradient, estimated by Evolutionary Trace (46), and the substitution log-odds of the amino acid change (25). These values are calculable from sequence data and predictions have been shown to correlate well with experimental assessments of protein fitness (47) (25), to consistently outperform machine learning methods (26, 27), and to enable stratification of patient morbidity (25) and mortality (48) in other disease contexts. This evolutionary action theory is extended in our study by considering the distribution of variant evolutionary action scores over a pathway. Such distributions are akin to integrating the evolutionary action equation across the pathway to recover the original genotype-phenotype relationship. Significant distributions indicate a nonrandom genotype-phenotype relationship. As we show here, this new evolutionary calculus in fitness landscapes can, in practice, identify candidate phenotypic driver genes and the relationship between variant impact and patient clinical outcome. The pathway evolutionary action approach should be generalizable beyond ASD to other multigenic diseases and phenotypes and can be applied to germline and de novo mutations alike.
Materials and Methods
Study design
In this study, we used whole exome sequencing data and associated phenotypic data from family trios and quartets of the Simons Simplex Collection (SSC) to suggest pathways and genes that may be associated with ASD and clarify the genotype to phenotype relationship of de novo missense variants in ASD. De novo missense variants in patients with ASD and their siblings were annotated with a computationally predicted impact on fitness (Evolutionary Action) and functionally related groups of genes were defined by Gene Ontology hierarchy and gene association data (GO2MSIG). Groups with a collective variant bias toward high impact were identified by examining the Evolutionary Action score distributions of their missense de novo variants. Genes that contributed variants to a biased score distribution were suggested as potentially involved in ASD. These genes were further examined for plausibility via genotype-phenotype correlations to IQ, comparisons to established knowledge, and time-stamped analyses of experimental verification. Data for generating the figures can be found in the supplementary tables and source data may be obtained from NDAR (Study 349).
Data acquisition
Variant call files (.vcfs) produced by the Simons Simplex Collection (SSC) were downloaded from NDAR (Study 349); this exome sequencing data encompassed 2,392 families and used FreeBayes SNV calling performed by Krumm et al. at the University of Washington. Phenotype data for the associated patients were obtained from the same source. CNV data for the correlation between evolutionary action score burden and IQ were downloaded from NDAR (Study 361) and restricted to patient de novo deletions. Approved researchers can also obtain the underlying SSC population dataset described in this study (https://sfari.org/resources/autism-cohorts/simons-simplex-collection) by applying at https://base.sfari.org.
De novo variant calling and quality assessment
Variants were called as de novo if the proband call was heterozygous with a depth higher than 10, alternate allele fraction of 0.3 or higher, and average alternate allele quality of 15 or higher; the same position was required in both parents to have a depth of at least 30, at least 95% of reads supporting a reference call, and no more than 5 reads supporting a non-reference call. These thresholds produced a set of de novo variants indicating high quality (Ti/Tv = 2.64) and an absence of negative selection (lambda = 0.009, where 0.01 indicates no selection pressure and 0.038 indicates the negative selection pressure seen in inherited human variants) (49). Together, the Ti/Tv ratio and the lambda value suggest that our set of de novo variants is consistent with de novo variants reported in other studies and distinct from inherited variants, somatic variants, and random sequencing errors (Fig. S10). More than 98% of the de novo variants were autosomal, while the remaining variants were on the X chromosome. Using this procedure, we identified de novo variants in both patients and siblings. Eight families were excluded from downstream analysis due to specific technical errors that resulted in an excessive number of apparent de novo sequence events in either the patient or sibling. In order to focus on genes that are infrequently mutated, we did not consider genes with more than three missense mutations, which notably included the well-documented ASD driver SCN2A and six more genes (HLA-B, MAGEC1, MUC4, MUC5B, PABPC1, and RBMX).
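The thresholds above can be read as a simple per-site filter. The following is a minimal sketch, not the pipeline used in the study; the dictionary field names (genotype, depth, alt_fraction, mean_alt_quality, ref_reads, alt_reads) are hypothetical and would need to be mapped onto the fields of the actual variant call files.

```python
def parent_is_reference(parent):
    """Parent must be confidently homozygous reference at this position."""
    return (
        parent["depth"] >= 30                                # depth of at least 30
        and parent["ref_reads"] / parent["depth"] >= 0.95    # >= 95% reference reads
        and parent["alt_reads"] <= 5                         # at most 5 non-reference reads
    )


def is_de_novo(proband, mother, father):
    """Apply the de novo thresholds stated in the text to one site.

    Each argument is a dict of per-sample metrics; the key names are
    assumptions made for this illustration.
    """
    proband_ok = (
        proband["genotype"] == "het"            # heterozygous call
        and proband["depth"] > 10               # depth higher than 10
        and proband["alt_fraction"] >= 0.3      # alternate allele fraction
        and proband["mean_alt_quality"] >= 15   # average alternate allele quality
    )
    return proband_ok and parent_is_reference(mother) and parent_is_reference(father)
```

Because the parental depth check comes first in `parent_is_reference`, the read-fraction division is never evaluated for a parent with zero depth.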
Network/Gene Set Enrichment analysis of genes affected by de novo variants in patients
Protein-protein interactions were defined by the Homo sapiens STRING v.10.0 network (28) using the aggregate score of all evidence types and were considered as interactions if they had ‘medium confidence’ or higher (interaction score ≥ 0.4). Enrichment tests for protein-protein interactions, as well as gene set enrichment analysis for Gene Ontology Biological Processes, were performed through the STRING graphical user interface. Gene sets were considered significantly enriched at the default q<0.05 threshold reported by STRING. STRING analysis accounts for gene length when assessing significance of enrichments.
Annotation of missense variants with Evolutionary Action
The impact of missense variants on protein fitness was computed with the evolutionary action equation, which has won multiple challenges of the Critical Assessment of Genome Interpretation community in 2017, 2015, 2013, and 2011 (26, 27). Briefly, this equation follows from viewing evolution as a differentiable mapping, f, of genotypes (γ) onto the fitness landscape (φ), so that:
$$f(\gamma) = \varphi \qquad (1).$$
Differentiation then leads to the evolutionary action equation:
$$\nabla f \cdot d\gamma = d\varphi \qquad (2),$$
where ∇f is the evolutionary gradient in the fitness landscape, dγ is a genotype perturbation such as a mutation, and dφ is the fitness effect. In practice, (2) is approximated to first order. For a substitution from amino acid type X to type Y at a protein residue, ri, the evolutionary gradient ∇f reduces to ∂f/∂ri, which is the mutational sensitivity at ri and equivalent to its evolutionary importance defined by the Evolutionary Trace method (46) (51). To estimate dγ, we use the odds of amino acid substitution from X to Y. This approach produced scores on a continuous scale between 0 and 100, where a higher value indicated a larger predicted impact on protein fitness resulting from the amino acid substitution. When a variant affected multiple isoforms of a protein, the impact score was averaged across all affected isoforms. Evolutionary Action calculations are described at greater length in the original publication of the method (25).
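As a hedged illustration of the first-order approximation described above, the fitness effect of a single substitution from X to Y at residue r_i can be written as a product of two terms. The symbol Δr_{i,X→Y} for the magnitude of the substitution is notation assumed here for illustration; the normalization that yields the final 0 to 100 scale is described in the original publication of the method (25) and is not reproduced.

$$\Delta\varphi \;\approx\; \frac{\partial f}{\partial r_i} \cdot \Delta r_{i,\,X \to Y}$$

Here the first factor is the evolutionary importance of residue r_i estimated by Evolutionary Trace, and the second factor is estimated from the odds of the X to Y amino acid substitution.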
Annotation of genes for genic tolerance to mutation
We used the Residual Variation Intolerance Score (RVIS) (30) as our main measure of genic sensitivity to mutation; RVIS scores were converted with the equation mutation intolerance score = ((100-RVIS%)/100) in order to lie on a scale from 0–1 with 1 indicating maximum intolerance to mutation. RVIS is based on mathematical determinations of mutation population frequencies and is unbiased by the state of scientific knowledge regarding ASD. For a small fraction of genes an RVIS score did not yet exist. If a variant without an RVIS score was the only variant in a prioritized gene in the patient, the patient was not included in the IQ correlation analysis because their burden could not be accurately assessed.
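The conversion stated above is a one-line rescaling. The sketch below assumes RVIS is supplied as a percentile between 0 and 100 and uses `None` (a convention chosen for this illustration) for genes that have no RVIS score.

```python
def mutation_intolerance(rvis_percentile):
    """Convert an RVIS percentile (0-100) to a 0-1 intolerance score.

    A result of 1 indicates maximum intolerance to mutation. Genes
    without an RVIS score are passed through as None (an assumption of
    this sketch) so that callers can exclude them explicitly.
    """
    if rvis_percentile is None:
        return None
    return (100.0 - rvis_percentile) / 100.0
```

For example, a gene at the 25th RVIS percentile receives an intolerance score of 0.75.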
Identification of gene groups with bias toward impactful missense variants
Gene groups were defined using Gene Ontology (GO) terms customized by GO2MSIG (29); customization was specific to Homo sapiens and ensured at least 500 genes in each group. This approach produced 368 pathways encompassing 15,310 total genes (Table S3). Gene groups with a collective variant bias toward high impact were identified by examining the evolutionary action score distributions of their missense de novo variants. For each pathway, the evolutionary action score distribution of the de novo variants within the pathway was compared to the evolutionary action distribution of all other de novo variants using a one-sided Kolmogorov-Smirnov test. Note that to account for properties that are unique to de novo variants in ASD patients, these comparisons were performed only within the relevant variant class. To account for multiple hypothesis testing across a large number of gene groups while maximizing discovery by limiting false negatives, groups which were significant after FDR with q<0.1 were considered significant (Table S4). This analysis was performed using missense de novo variants from 1,792 patients with matched siblings, and then repeated using missense de novo variants from the 1,792 matched siblings.
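The pathway test described above can be sketched as follows. This is an illustration rather than the authors' code: it uses a pure-Python one-sided two-sample Kolmogorov-Smirnov statistic with the leading-order asymptotic p-value, whereas a library routine (for example `scipy.stats.ks_2samp` with its `alternative` argument) would normally be used. The text does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the function names and input layout are hypothetical.

```python
import math
from bisect import bisect_right


def one_sided_ks_pvalue(pathway_scores, background_scores):
    """Approximate p-value that pathway scores are shifted toward higher values."""
    path = sorted(pathway_scores)
    back = sorted(background_scores)
    n, m = len(path), len(back)
    # Largest amount by which the background ECDF exceeds the pathway ECDF;
    # a pathway biased to high scores has an ECDF lying below the background.
    d_plus = 0.0
    for x in set(path) | set(back):
        f_path = bisect_right(path, x) / n
        f_back = bisect_right(back, x) / m
        d_plus = max(d_plus, f_back - f_path)
    # Leading-order asymptotic approximation for the one-sided statistic.
    return math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])
    qvalues = [0.0] * k
    running_min = 1.0
    for rank in range(k, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * k / rank)
        qvalues[i] = running_min
    return qvalues


def biased_pathways(variants, pathways, q_cutoff=0.1):
    """variants: list of (gene, ea_score); pathways: dict of name -> set of genes.

    Each pathway's scores are compared with the scores of all other variants.
    """
    names, pvals = [], []
    for name, genes in pathways.items():
        inside = [ea for gene, ea in variants if gene in genes]
        outside = [ea for gene, ea in variants if gene not in genes]
        if inside and outside:
            names.append(name)
            pvals.append(one_sided_ks_pvalue(inside, outside))
    qvals = benjamini_hochberg(pvals)
    return {name: q for name, q in zip(names, qvals) if q < q_cutoff}
```

Running the same function on sibling variants gives the matched control comparison described in the text.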
Relating evolutionary action scores in prioritized genes to patient phenotype
Although female probands with de novo missense mutations in prioritized genes contributed a very small fraction of the data (<1/7), they were highly disproportionately represented at low full-scale IQ scores (42% with IQ < 70 vs. 26% for male probands) and were analyzed separately from male patients to prevent confounding based on gender. ASD patients were divided into three groups by phenotype severity as defined by high full-scale IQ (greater than or equal to population average), low full-scale IQ (more than two standard deviations below population average, and consistent with a diagnosis of intellectual disability), and intermediate full-scale IQ. Genes falling into pathways with significant bias toward high impact mutations were grouped together into a single set of prioritized candidate ASD genes, and we considered the evolutionary action scores of mutations in these genes across the three groups for all binned analyses. For each patient, the sum of the evolutionary action scores of de novo variants in their affected candidate genes was calculated. For each gene, variant evolutionary action scores (EA) were then weighted by the RVIS-derived mutation intolerance score such that weighted EA = EA × mutation intolerance score, and the total patient burden was recalculated with weighted evolutionary action scores in place of the raw evolutionary action scores. For comparison, we also substituted raw ExAC LOF Constraint Metric (pLI) and ExAC Missense Constraint Metric scores as alternative measures of genic intolerance to mutation (52) (Table S8), nonverbal and verbal IQ scores as alternate measures of phenotypic severity (Table S9), other prioritization approaches (Table S11), and CADD (53), SIFT (54), BLOSUM62 (55), MPC (56), and Polyphen-2 (57) scores as alternate measures of variant severity (Table S10).
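The per-patient burden and the three IQ groups can be sketched as below. The population mean of 100 and standard deviation of 15 are the conventional full-scale IQ parameters and are assumed here, since the text gives only the definitions of the groups (the resulting low-IQ cutoff of 70 matches the IQ < 70 figure quoted above). Function names and the input layout are hypothetical.

```python
def weighted_ea_burden(variants):
    """Sum of EA x mutation intolerance over one patient's variants.

    variants: list of (ea_score, intolerance) pairs, with EA on the
    0-100 scale and intolerance on the 0-1 scale.
    """
    return sum(ea * intolerance for ea, intolerance in variants)


def iq_group(full_scale_iq, mean=100.0, sd=15.0):
    """Assign a patient to the high, intermediate, or low IQ group."""
    if full_scale_iq >= mean:
        return "high"          # at or above the population average
    if full_scale_iq < mean - 2 * sd:
        return "low"           # more than two SDs below the average
    return "intermediate"
```

A patient with two variants scored (80, 0.5) and (40, 0.25) has a weighted burden of 50.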
For analysis of inherited germline variants, we considered low-frequency inherited variants (MAF<0.05) in prioritized genes that were observed in at least one parent, but were not inherited by the healthy sibling (using the same thresholds to confirm absence of the variant as were used for parent calls when determining de novo variants). The inherited evolutionary action burden for each male patient was calculated as the summation of all evolutionary action scores of these variants after weighting for genic tolerance to mutation. The minor allele frequencies of variants were obtained from ExAC (52) using the entire population.
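The selection of inherited variants can be sketched as a filter followed by the same weighted sum used for de novo variants. The field names below (maf, in_mother, in_father, in_sibling, ea, intolerance) are hypothetical, and the sketch assumes presence or absence in each family member has already been decided using the read-level thresholds described above.

```python
def inherited_burden(variants, maf_cutoff=0.05):
    """Weighted EA burden of rare inherited variants for one male patient.

    variants: list of dicts describing the patient's inherited missense
    variants in prioritized genes (key names are assumptions).
    """
    total = 0.0
    for v in variants:
        in_parent = v["in_mother"] or v["in_father"]
        # Keep variants below the MAF cutoff that were seen in at least one
        # parent and were not inherited by the healthy sibling.
        if v["maf"] < maf_cutoff and in_parent and not v["in_sibling"]:
            total += v["ea"] * v["intolerance"]
    return total
```

Lowering `maf_cutoff` corresponds to the more stringent rare-variant definitions examined in Fig. 7B.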
Comparison of prioritized genes to published knowledge
Genes with at least one association in PubMed between the gene name and the term ‘autism’ were defined as being supported by the literature, while genes with no search results returned were defined as lacking literature support. These values were obtained automatically using a Biopython script on 10/4/2016. An updated PubMed search was performed on 01/13/2020, which resulted in 65 (33%) more genes associated with ASD, and this update reaffirmed our previous conclusion (Fig. S9). SFARI gene annotations were obtained from SFARI Gene and the SFARI Gene Scoring Module (4) on 1/13/2017. An updated version of the SFARI gene annotations was obtained on 9/15/2020 and this version was used to perform a timestamp analysis. All tests for enrichment of prioritized genes in the literature, either as defined by PubMed or SFARI, were performed using Fisher’s exact tests and in all cases the prioritized genes were compared against the other genes affected by de novo missense mutations in the patient cohort studied, such that all genes are derived from the same source dataset.
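The enrichment comparison can be illustrated with a two-sided Fisher's exact test on a 2x2 table (prioritized versus other genes, with versus without literature support). The sketch below is a pure-Python version written for illustration only; the study does not state which implementation was used for this test, and the function name and argument order are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows: prioritized genes, other genes with de novo missense variants.
    Columns: with literature support, without literature support.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x supported genes in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as unlikely as the observed one,
    # with a small relative tolerance for floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

Because both rows are drawn from genes with de novo missense variants in the same cohort, the test compares like with like, as the text emphasizes.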
Summary statistical analysis
Collective bias of a gene set toward variants with high computationally predicted fitness impact was assessed using one-sided Kolmogorov-Smirnov tests that compared to variants outside that gene set; False Discovery Rate (FDR) correction for multiple hypothesis testing was applied and gene sets with FDR <0.1 were considered for further analysis. Pearson’s r was calculated for correlation between continuous data. Data were additionally analyzed with paired t-tests, unpaired t-tests, Kruskal-Wallis tests, and 2-sample Kolmogorov-Smirnov tests as appropriate, with all of these additional tests being two-sided. p-values of <0.05 were considered statistically significant. Data are typically displayed as means with error bars representing the 95% CI of the mean, and descriptions of the statistical tests that were used for evaluation of the experiments are provided within the respective figure legends. Statistical analyses were conducted using scipy statistical packages for Python 2.7 and graphs were plotted using GraphPad Prism 6.0.
Supplementary Material
Figure S1. Stratification of de novo missense variants using gene-centric network analysis.
Figure S2. Relationship between de novo evolutionary action score burden and patient IQ for prioritized and deprioritized gene groups.
Figure S3. Relationship between patient IQ and genic tolerance to mutation (RVIS).
Figure S4. Relationship between evolutionary action burden and phenotype cannot be explained by the number of de novo missense variants of interest in a patient.
Figure S5. Comparison of prioritized de novo missense variants and genotype-phenotype analysis in male versus female probands.
Figure S6. Enrichment of prioritization status in genes with high pLI scores and in brain-expressed genes.
Figure S7. Effect of prioritization status on the relationship between genotype and phenotype in SFARI autism gene list.
Figure S8. Effect of threshold variation on high and low weighted EA burden group comparisons.
Figure S9. Updated PubMed search for associating our prioritization to ASD.
Figure S10. Quality control of de novo variant calls.
Table S1. Proband de novo missense variants.
Table S2. Healthy sibling de novo missense variants.
Table S3. Gene Ontology pathway definitions.
Table S4. Gene Ontology pathway bias in patients and matched siblings.
Table S5. Gene annotation with prioritization status.
Table S6. IQ and total weighted EA burden for male patients with at least one missense variant in a prioritized gene.
Table S7. Variant annotation with evolutionary action, RVIS, IQ, and gene prioritization status for variants in male patients with available phenotypic data.
Table S8. Genotype-phenotype relationship of de novo missense variants within gene sets, separated by prioritization group and metric used to estimate genic tolerance to mutation.
Table S9. Genotype-phenotype relationship of de novo missense variants within gene sets, separated by prioritization group and metric used to define phenotype.
Table S10. IQ correlation to mutation burden as measured by six prediction methods.
Table S11. IQ correlation to mutation burden as measured by evolutionary action for different gene sets.
Overline: Autism.
Autism spectrum disorder (ASD) is phenotypically and genetically heterogeneous, and the relevance of de novo missense variants to neurodevelopmental outcomes has been unclear. This study suggests pathways and genes that may be involved in ASD by assessing bias toward high predicted fitness impact amongst rare de novo missense variants in functionally related genes. For both established and unexpected genes, the computationally predicted evolutionary impact of these variants correlated with patient IQ. This study both supports a role for this rare variant class in ASD development and presents a generalizable approach toward elucidating the genotype-phenotype relationship in complex diseases.
Acknowledgements:
We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren, E. Wijsman). We appreciate obtaining access to phenotypic data from the SFARI Base.
Funding:
This work is supported by the National Institutes of Health (grant numbers GM079656-8, DE025181, GM066099 and AG061105 to O.L.); the Oskar Fischer Foundation (to O.L.); the National Science Foundation (grant number DBI1356569 to O.L.); and the Defense Advanced Research Projects Agency (grant number N66001-15-C-4042 to O.L.). A.K. is supported by the Baylor College of Medicine Comprehensive Cancer Training Program grant number RP160283, the Baylor Research Advocates for Student Scientists, and the McNair MD/PhD Scholars program.
Footnotes
Competing interests: The authors declare no competing interests.
Related patents: US10886005B2 Identifying genes associated with a phenotype. Lichtarge O, Hsu TK, Katsonis P, Koire AM.
Data and materials availability:
All data associated with the paper are in the main text or the Supplementary Materials.
References and Notes:
- 1.Iossifov I, Ronemus M, Levy D, Wang Z, Hakker I, Rosenbaum J, Yamrom B, Lee YH, Narzisi G, Leotta A, Kendall J, Grabowska E, Ma B, Marks S, Rodgers L, Stepansky A, Troge J, Andrews P, Bekritsky M, Pradhan K, Ghiban E, Kramer M, Parla J, Demeter R, Fulton LL, Fulton RS, Magrini VJ, Ye K, Darnell JC, Darnell RB, Mardis ER, Wilson RK, Schatz MC, McCombie WR, Wigler M, De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012); published online EpubApr 26 ( 10.1016/j.neuron.2012.04.009). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Betancur C, Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res 1380, 42–77 (2011); published online EpubMar 22 ( 10.1016/j.brainres.2010.11.078). [DOI] [PubMed] [Google Scholar]
- 3.Liu L, Lei J, Sanders SJ, Willsey AJ, Kou Y, Cicek AE, Klei L, Lu C, He X, Li M, Muhle RA, Ma’ayan A, Noonan JP, Sestan N, McFadden KA, State MW, Buxbaum JD, Devlin B, Roeder K, DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics. Mol Autism 5, 22 (2014); published online EpubMar 06 ( 10.1186/2040-2392-5-22). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Abrahams BS, Arking DE, Campbell DB, Mefford HC, Morrow EM, Weiss LA, Menashe I, Wadkins T, Banerjee-Basu S, Packer A, SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol Autism 4, 36 (2013); published online EpubOct 03 ( 10.1186/2040-2392-4-36). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Yin J, Schaaf CP, Autism genetics - an overview. Prenat Diagn, (2016); published online EpubOct 15 ( 10.1002/pd.4942). [DOI] [PubMed] [Google Scholar]
- 6.Abrahams BS, Geschwind DH, Advances in autism genetics: on the threshold of a new neurobiology. Nat Rev Genet 9, 341–355 (2008); published online EpubMay ( 10.1038/nrg2346). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.An JY, Claudianos C, Genetic heterogeneity in autism: From single gene to a pathway perspective. Neurosci Biobehav Rev 68, 442–453 (2016); published online EpubSep ( 10.1016/j.neubiorev.2016.06.013). [DOI] [PubMed] [Google Scholar]
- 8.Durkin MS, Maenner MJ, Newschaffer CJ, Lee LC, Cunniff CM, Daniels JL, Kirby RS, Leavitt L, Miller L, Zahorodny W, Schieve LA, Advanced parental age and the risk of autism spectrum disorder. Am J Epidemiol 168, 1268–1276 (2008); published online EpubDec 01 ( 10.1093/aje/kwn250). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Cheslack-Postava K, Liu K, Bearman PS, Closely spaced pregnancies are associated with increased odds of autism in California sibling births. Pediatrics 127, 246–253 (2011); published online EpubFeb ( 10.1542/peds.2010-2371). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, Mahajan M, Manaa D, Pawitan Y, Reichert J, Ripke S, Sandin S, Sklar P, Svantesson O, Reichenberg A, Hultman CM, Devlin B, Roeder K, Buxbaum JD, Most genetic risk for autism resides with common variation. Nat Genet 46, 881–885 (2014); published online EpubAug ( 10.1038/ng.3039). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Krumm N, Turner TN, Baker C, Vives L, Mohajeri K, Witherspoon K, Raja A, Coe BP, Stessman HA, He ZX, Leal SM, Bernier R, Eichler EE, Excess of rare, inherited truncating mutations in autism. Nat Genet 47, 582–588 (2015); published online EpubJun ( 10.1038/ng.3303). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Pinto D, Delaby E, Merico D, Barbosa M, Merikangas A, Klei L, Thiruvahindrapuram B, Xu X, Ziman R, Wang Z, Vorstman JA, Thompson A, Regan R, Pilorge M, Pellecchia G, Pagnamenta AT, Oliveira B, Marshall CR, Magalhaes TR, Lowe JK, Howe JL, Griswold AJ, Gilbert J, Duketis E, Dombroski BA, De Jonge MV, Cuccaro M, Crawford EL, Correia CT, Conroy J, Conceicao IC, Chiocchetti AG, Casey JP, Cai G, Cabrol C, Bolshakova N, Bacchelli E, Anney R, Gallinger S, Cotterchio M, Casey G, Zwaigenbaum L, Wittemeyer K, Wing K, Wallace S, van Engeland H, Tryfon A, Thomson S, Soorya L, Roge B, Roberts W, Poustka F, Mouga S, Minshew N, McInnes LA, McGrew SG, Lord C, Leboyer M, Le Couteur AS, Kolevzon A, Jimenez Gonzalez P, Jacob S, Holt R, Guter S, Green J, Green A, Gillberg C, Fernandez BA, Duque F, Delorme R, Dawson G, Chaste P, Cafe C, Brennan S, Bourgeron T, Bolton PF, Bolte S, Bernier R, Baird G, Bailey AJ, Anagnostou E, Almeida J, Wijsman EM, Vieland VJ, Vicente AM, Schellenberg GD, Pericak-Vance M, Paterson AD, Parr JR, Oliveira G, Nurnberger JI, Monaco AP, Maestrini E, Klauck SM, Hakonarson H, Haines JL, Geschwind DH, Freitag CM, Folstein SE, Ennis S, Coon H, Battaglia A, Szatmari P, Sutcliffe JS, Hallmayer J, Gill M, Cook EH, Buxbaum JD, Devlin B, Gallagher L, Betancur C, Scherer SW, Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. Am J Hum Genet 94, 677–694 (2014); published online EpubMay 01 ( 10.1016/j.ajhg.2014.03.018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Sanders SJ, Ercan-Sencicek AG, Hus V, Luo R, Murtha MT, Moreno-De-Luca D, Chu SH, Moreau MP, Gupta AR, Thomson SA, Mason CE, Bilguvar K, Celestino-Soper PB, Choi M, Crawford EL, Davis L, Wright NR, Dhodapkar RM, DiCola M, DiLullo NM, Fernandez TV, Fielding-Singh V, Fishman DO, Frahm S, Garagaloyan R, Goh GS, Kammela S, Klei L, Lowe JK, Lund SC, McGrew AD, Meyer KA, Moffat WJ, Murdoch JD, O’Roak BJ, Ober GT, Pottenger RS, Raubeson MJ, Song Y, Wang Q, Yaspan BL, Yu TW, Yurkiewicz IR, Beaudet AL, Cantor RM, Curland M, Grice DE, Gunel M, Lifton RP, Mane SM, Martin DM, Shaw CA, Sheldon M, Tischfield JA, Walsh CA, Morrow EM, Ledbetter DH, Fombonne E, Lord C, Martin CL, Brooks AI, Sutcliffe JS, Cook EH Jr., Geschwind D, Roeder K, Devlin B, State MW, Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011). doi:10.1016/j.neuron.2011.05.002
- 14.Leppa VM, Kravitz SN, Martin CL, Andrieux J, Le Caignec C, Martin-Coignard D, DyBuncio C, Sanders SJ, Lowe JK, Cantor RM, Geschwind DH, Rare Inherited and De Novo CNVs Reveal Complex Contributions to ASD Risk in Multiplex Families. Am J Hum Genet 99, 540–554 (2016). doi:10.1016/j.ajhg.2016.06.036
- 15.Wang T, Guo H, Xiong B, Stessman HA, Wu H, Coe BP, Turner TN, Liu Y, Zhao W, Hoekzema K, Vives L, Xia L, Tang M, Ou J, Chen B, Shen Y, Xun G, Long M, Lin J, Kronenberg ZN, Peng Y, Bai T, Li H, Ke X, Hu Z, Zhao J, Zou X, Xia K, Eichler EE, De novo genic mutations among a Chinese autism spectrum disorder cohort. Nat Commun 7, 13316 (2016). doi:10.1038/ncomms13316
- 16.Iossifov I, O’Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, Stessman HA, Witherspoon KT, Vives L, Patterson KE, Smith JD, Paeper B, Nickerson DA, Dea J, Dong S, Gonzalez LE, Mandell JD, Mane SM, Murtha MT, Sullivan CA, Walker MF, Waqar Z, Wei L, Willsey AJ, Yamrom B, Lee YH, Grabowska E, Dalkic E, Wang Z, Marks S, Andrews P, Leotta A, Kendall J, Hakker I, Rosenbaum J, Ma B, Rodgers L, Troge J, Narzisi G, Yoon S, Schatz MC, Ye K, McCombie WR, Shendure J, Eichler EE, State MW, Wigler M, The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014). doi:10.1038/nature13908
- 17.Buja A, Volfovsky N, Krieger AM, Lord C, Lash AE, Wigler M, Iossifov I, Damaging de novo mutations diminish motor skills in children on the autism spectrum. Proc Natl Acad Sci U S A 115, E1859–E1866 (2018). doi:10.1073/pnas.1715427115
- 18.Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, Ercan-Sencicek AG, DiLullo NM, Parikshak NN, Stein JL, Walker MF, Ober GT, Teran NA, Song Y, El-Fishawy P, Murtha RC, Choi M, Overton JD, Bjornson RD, Carriero NJ, Meyer KA, Bilguvar K, Mane SM, Sestan N, Lifton RP, Gunel M, Roeder K, Geschwind DH, Devlin B, State MW, De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012). doi:10.1038/nature10945
- 19.Lelieveld SH, Wiel L, Venselaar H, Pfundt R, Vriend G, Veltman JA, Brunner HG, Vissers L, Gilissen C, Spatial Clustering of de Novo Missense Mutations Identifies Candidate Neurodevelopmental Disorder-Associated Genes. Am J Hum Genet 101, 478–484 (2017). doi:10.1016/j.ajhg.2017.08.004
- 20.Chen S, Fragoza R, Klei L, Liu Y, Wang J, Roeder K, Devlin B, Yu H, An interactome perturbation framework prioritizes damaging missense mutations for developmental disorders. Nat Genet 50, 1032–1040 (2018). doi:10.1038/s41588-018-0130-z
- 21.Robinson EB, Samocha KE, Kosmicki JA, McGrath L, Neale BM, Perlis RH, Daly MJ, Autism spectrum disorder severity reflects the average contribution of de novo and familial influences. Proc Natl Acad Sci U S A 111, 15161–15165 (2014). doi:10.1073/pnas.1409204111
- 22.Miosge LA, Field MA, Sontani Y, Cho V, Johnson S, Palkova A, Balakishnan B, Liang R, Zhang Y, Lyon S, Beutler B, Whittle B, Bertram EM, Enders A, Goodnow CC, Andrews TD, Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci U S A 112, E5189–5198 (2015). doi:10.1073/pnas.1511585112
- 23.Katsonis P, Koire A, Wilson SJ, Hsu TK, Lua RC, Wilkins AD, Lichtarge O, Single nucleotide variations: biological impact and theoretical interpretation. Protein Sci 23, 1650–1666 (2014). doi:10.1002/pro.2552
- 24.Hicks S, Plon SE, Kimmel M, Statistical analysis of missense mutation classifiers. Hum Mutat 34, 405–406 (2013). doi:10.1002/humu.22243
- 25.Katsonis P, Lichtarge O, A formal perturbation equation between genotype and phenotype determines the Evolutionary Action of protein-coding variations on fitness. Genome Res 24, 2050–2058 (2014). doi:10.1101/gr.176214.114
- 26.Katsonis P, Lichtarge O, Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI blinded contests. Hum Mutat (2017). doi:10.1002/humu.23266
- 27.Katsonis P, Lichtarge O, CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation. Hum Mutat 40, 1436–1454 (2019). doi:10.1002/humu.23873
- 28.Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C, STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447–452 (2015). doi:10.1093/nar/gku1003
- 29.Powell JA, GO2MSIG, an automated GO based multi-species gene set generator for gene set enrichment analysis. BMC Bioinformatics 15, 146 (2014). doi:10.1186/1471-2105-15-146
- 30.Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB, Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9, e1003709 (2013). doi:10.1371/journal.pgen.1003709
- 31.Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Ponten F, Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015). doi:10.1126/science.1260419
- 32.Darnell JC, Van Driesche SJ, Zhang C, Hung KY, Mele A, Fraser CE, Stone EF, Chen C, Fak JJ, Chi SW, Licatalosi DD, Richter JD, Darnell RB, FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011). doi:10.1016/j.cell.2011.06.013
- 33.Parikshak NN, Luo R, Zhang A, Won H, Lowe JK, Chandran V, Horvath S, Geschwind DH, Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013). doi:10.1016/j.cell.2013.10.031
- 34.Gilman SR, Iossifov I, Levy D, Ronemus M, Wigler M, Vitkup D, Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907 (2011). doi:10.1016/j.neuron.2011.05.021
- 35.Girirajan S, Dennis MY, Baker C, Malig M, Coe BP, Campbell CD, Mark K, Vu TH, Alkan C, Cheng Z, Biesecker LG, Bernier R, Eichler EE, Refinement and discovery of new hotspots of copy-number variation associated with autism spectrum disorder. Am J Hum Genet 92, 221–237 (2013). doi:10.1016/j.ajhg.2012.12.016
- 36.Satterstrom FK, Kosmicki JA, Wang J, Breen MS, De Rubeis S, An JY, Peng M, Collins R, Grove J, Klei L, Stevens C, Reichert J, Mulhern MS, Artomov M, Gerges S, Sheppard B, Xu X, Bhaduri A, Norman U, Brand H, Schwartz G, Nguyen R, Guerrero EE, Dias C, Autism Sequencing Consortium, iPSYCH-Broad Consortium, Betancur C, Cook EH, Gallagher L, Gill M, Sutcliffe JS, Thurm A, Zwick ME, Borglum AD, State MW, Cicek AE, Talkowski ME, Cutler DJ, Devlin B, Sanders SJ, Roeder K, Daly MJ, Buxbaum JD, Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.e23 (2020). doi:10.1016/j.cell.2019.12.036
- 37.Glessner JT, Wang K, Cai G, Korvatska O, Kim CE, Wood S, Zhang H, Estes A, Brune CW, Bradfield JP, Imielinski M, Frackelton EC, Reichert J, Crawford EL, Munson J, Sleiman PM, Chiavacci R, Annaiah K, Thomas K, Hou C, Glaberson W, Flory J, Otieno F, Garris M, Soorya L, Klei L, Piven J, Meyer KJ, Anagnostou E, Sakurai T, Game RM, Rudd DS, Zurawiecki D, McDougle CJ, Davis LK, Miller J, Posey DJ, Michaels S, Kolevzon A, Silverman JM, Bernier R, Levy SE, Schultz RT, Dawson G, Owley T, McMahon WM, Wassink TH, Sweeney JA, Nurnberger JI, Coon H, Sutcliffe JS, Minshew NJ, Grant SF, Bucan M, Cook EH, Buxbaum JD, Devlin B, Schellenberg GD, Hakonarson H, Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459, 569–573 (2009). doi:10.1038/nature07953
- 38.Shohat S, Ben-David E, Shifman S, Varying Intolerance of Gene Pathways to Mutational Classes Explain Genetic Convergence across Neuropsychiatric Disorders. Cell Rep 18, 2217–2227 (2017). doi:10.1016/j.celrep.2017.02.007
- 39.Geisheker MR, Heymann G, Wang T, Coe BP, Turner TN, Stessman HAF, Hoekzema K, Kvarnung M, Shaw M, Friend K, Liebelt J, Barnett C, Thompson EM, Haan E, Guo H, Anderlid BM, Nordgren A, Lindstrand A, Vandeweyer G, Alberti A, Avola E, Vinci M, Giusto S, Pramparo T, Pierce K, Nalabolu S, Michaelson JJ, Sedlacek Z, Santen GWE, Peeters H, Hakonarson H, Courchesne E, Romano C, Kooy RF, Bernier RA, Nordenskjold M, Gecz J, Xia K, Zweifel LS, Eichler EE, Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains. Nat Neurosci (2017). doi:10.1038/nn.4589
- 40.Nishiyama T, Taniai H, Miyachi T, Ozaki K, Tomita M, Sumi S, Genetic correlation between autistic traits and IQ in a population-based sample of twins with autism spectrum disorders (ASDs). J Hum Genet 54, 56–61 (2009). doi:10.1038/jhg.2008.3
- 41.Richler J, Bishop SL, Kleinke JR, Lord C, Restricted and repetitive behaviors in young children with autism spectrum disorders. J Autism Dev Disord 37, 73–85 (2007). doi:10.1007/s10803-006-0332-6
- 42.O’Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, Levy R, Ko A, Lee C, Smith JD, Turner EH, Stanaway IB, Vernot B, Malig M, Baker C, Reilly B, Akey JM, Borenstein E, Rieder MJ, Nickerson DA, Bernier R, Shendure J, Eichler EE, Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012). doi:10.1038/nature10989
- 43.Neale BM, Kou Y, Liu L, Ma’ayan A, Samocha KE, Sabo A, Lin CF, Stevens C, Wang LS, Makarov V, Polak P, Yoon S, Maguire J, Crawford EL, Campbell NG, Geller ET, Valladares O, Schafer C, Liu H, Zhao T, Cai G, Lihm J, Dannenfelser R, Jabado O, Peralta Z, Nagaswamy U, Muzny D, Reid JG, Newsham I, Wu Y, Lewis L, Han Y, Voight BF, Lim E, Rossin E, Kirby A, Flannick J, Fromer M, Shakir K, Fennell T, Garimella K, Banks E, Poplin R, Gabriel S, DePristo M, Wimbish JR, Boone BE, Levy SE, Betancur C, Sunyaev S, Boerwinkle E, Buxbaum JD, Cook EH Jr., Devlin B, Gibbs RA, Roeder K, Schellenberg GD, Sutcliffe JS, Daly MJ, Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012). doi:10.1038/nature11011
- 44.Stephenson JR, Wang X, Perfitt TL, Parrish WP, Shonesy BC, Marks CR, Mortlock DP, Nakagawa T, Sutcliffe JS, Colbran RJ, A Novel Human CAMK2A Mutation Disrupts Dendritic Morphology and Synaptic Transmission, and Causes ASD-Related Behaviors. J Neurosci 37, 2216–2233 (2017). doi:10.1523/JNEUROSCI.2068-16.2017
- 45.Kury S, van Woerden GM, Besnard T, Proietti Onori M, Latypova X, Towne MC, Cho MT, Prescott TE, Ploeg MA, Sanders S, Stessman HAF, Pujol A, Distel B, Robak LA, Bernstein JA, Denomme-Pichon AS, Lesca G, Sellars EA, Berg J, Carre W, Busk OL, van Bon BWM, Waugh JL, Deardorff M, Hoganson GE, Bosanko KB, Johnson DS, Dabir T, Holla OL, Sarkar A, Tveten K, de Bellescize J, Braathen GJ, Terhal PA, Grange DK, van Haeringen A, Lam C, Mirzaa G, Burton J, Bhoj EJ, Douglas J, Santani AB, Nesbitt AI, Helbig KL, Andrews MV, Begtrup A, Tang S, van Gassen KLI, Juusola J, Foss K, Enns GM, Moog U, Hinderhofer K, Paramasivam N, Lincoln S, Kusako BH, Lindenbaum P, Charpentier E, Nowak CB, Cherot E, Simonet T, Ruivenkamp CAL, Hahn S, Brownstein CA, Xia F, Schmitt S, Deb W, Bonneau D, Nizon M, Quinquis D, Chelly J, Rudolf G, Sanlaville D, Parent P, Gilbert-Dussardier B, Toutain A, Sutton VR, Thies J, Peart-Vissers L, Boisseau P, Vincent M, Grabrucker AM, Dubourg C, Undiagnosed Diseases Network, Tan WH, Verbeek NE, Granzow M, Santen GWE, Shendure J, Isidor B, Pasquier L, Redon R, Yang Y, State MW, Kleefstra T, Cogne B, GEM HUGO, Deciphering Developmental Disorders Study, Petrovski S, Retterer K, Eichler EE, Rosenfeld JA, Agrawal PB, Bezieau S, Odent S, Elgersma Y, Mercier S, De Novo Mutations in Protein Kinase Genes CAMK2A and CAMK2B Cause Intellectual Disability. Am J Hum Genet 101, 768–788 (2017). doi:10.1016/j.ajhg.2017.10.003
- 46.Lichtarge O, Bourne HR, Cohen FE, An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257, 342–358 (1996). doi:10.1006/jmbi.1996.0167
- 47.Gallion J, Koire A, Katsonis P, Schoenegge AM, Bouvier M, Lichtarge O, Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling. Hum Mutat 38, 569–580 (2017). doi:10.1002/humu.23193
- 48.Neskey DM, Osman AA, Ow TJ, Katsonis P, McDonald T, Hicks SC, Hsu TK, Pickering CR, Ward A, Patel A, Yordy JS, Skinner HD, Giri U, Sano D, Story MD, Beadle BM, El-Naggar AK, Kies MS, William WN, Caulin C, Frederick M, Kimmel M, Myers JN, Lichtarge O, Evolutionary Action Score of TP53 Identifies High-Risk Mutations Associated with Decreased Survival and Increased Distant Metastases in Head and Neck Cancer. Cancer Res 75, 1527–1536 (2015). doi:10.1158/0008-5472.CAN-14-2735
- 49.Koire A, Katsonis P, Lichtarge O, Repurposing Germline Exomes of the Cancer Genome Atlas Demands a Cautious Approach and Sample-Specific Variant Filtering. Pac Symp Biocomput 21, 207–218 (2016).
- 50.Cai B, Li B, Kiga N, Thusberg J, Bergquist T, Chen YC, Niknafs N, Carter H, Tokheim C, Beleva-Guthrie V, Douville C, Bhattacharya R, Yeo HTG, Fan J, Sengupta S, Kim D, Cline M, Turner T, Diekhans M, Zaucha J, Pal LR, Cao C, Yu CH, Yin Y, Carraro M, Giollo M, Ferrari C, Leonardi E, Tosatto SCE, Bobe J, Ball M, Hoskins RA, Repo S, Church G, Brenner SE, Moult J, Gough J, Stanke M, Karchin R, Mooney SD, Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Hum Mutat (2017). doi:10.1002/humu.23265
- 51.Mihalek I, Res I, Lichtarge O, A family of evolution-entropy hybrid methods for ranking protein residues by importance. J Mol Biol 336, 1265–1282 (2004). doi:10.1016/j.jmb.2003.12.078
- 52.Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium, Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). doi:10.1038/nature19057
- 53.Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J, A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–315 (2014). doi:10.1038/ng.2892
- 54.Ng PC, Henikoff S, SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31, 3812–3814 (2003).
- 55.Henikoff S, Henikoff JG, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 89, 10915–10919 (1992).
- 56.Samocha KE, Kosmicki JA, Karczewski KJ, O’Donnell-Luria AH, Pierce-Hoffman E, MacArthur DG, Neale BM, Daly MJ, Regional missense constraint improves variant deleteriousness prediction. bioRxiv 148353 (2017). doi:10.1101/148353
- 57.Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR, A method and server for predicting damaging missense mutations. Nat Methods 7, 248–249 (2010). doi:10.1038/nmeth0410-248
- 58.Xu B, Ionita-Laza I, Roos JL, Boone B, Woodrick S, Sun Y, Levy S, Gogos JA, Karayiorgou M, De novo gene mutations highlight patterns of genetic and neural complexity in schizophrenia. Nat Genet 44, 1365–1369 (2012). doi:10.1038/ng.2446
- 59.EuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project, Epi4K Consortium, De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am J Hum Genet 95, 360–370 (2014). doi:10.1016/j.ajhg.2014.08.013
- 60.Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, Genome of the Netherlands Consortium, van Duijn CM, Swertz M, Wijmenga C, van Ommen G, Slagboom PE, Boomsma DI, Ye K, Guryev V, Arndt PF, Kloosterman WP, de Bakker PIW, Sunyaev SR, Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47, 822–826 (2015). doi:10.1038/ng.3292
Supplementary Materials
Figure S1. Stratification of de novo missense variants using gene-centric network analysis.
Figure S2. Relationship between de novo evolutionary action score burden and patient IQ for prioritized and deprioritized gene groups.
Figure S3. Relationship between patient IQ and genic tolerance to mutation (RVIS).
Figure S4. Relationship between evolutionary action burden and phenotype cannot be explained by the number of de novo missense variants of interest in a patient.
Figure S5. Comparison of prioritized de novo missense variants and genotype-phenotype analysis in male versus female probands.
Figure S6. Enrichment of prioritization status in genes with high pLI scores and in brain-expressed genes.
Figure S7. Effect of prioritization status on the relationship between genotype and phenotype in SFARI autism gene list.
Figure S8. Effect of threshold variation on high and low weighted EA burden group comparisons.
Figure S9. Updated PubMed search for associating our prioritization to ASD.
Figure S10. Quality control of de novo variant calls.
Table S1. Proband de novo missense variants.
Table S2. Healthy sibling de novo missense variants.
Table S3. Gene Ontology pathway definitions.
Table S4. Gene Ontology pathway bias in patients and matched siblings.
Table S5. Gene annotation with prioritization status.
Table S6. IQ and total weighted EA burden for male patients with at least one missense variant in a prioritized gene.
Table S7. Variant annotation with evolutionary action, RVIS, IQ, and gene prioritization status for variants in male patients with available phenotypic data.
Table S8. Genotype-phenotype relationship of de novo missense variants within gene sets, separated by prioritization group and metric used to estimate genic tolerance to mutation.
Table S9. Genotype-phenotype relationship of de novo missense variants within gene sets, separated by prioritization group and metric used to define phenotype.
Table S10. IQ correlation to mutation burden as measured by six prediction methods.
Table S11. IQ correlation to mutation burden as measured by evolutionary action for different gene sets.
Data Availability Statement
All data associated with the paper are in the main text or the Supplementary Materials.
