Abstract
Meningomyelocele (also known as spina bifida) is considered to be a genetically complex disease resulting from a failure of the neural tube to close. Individuals with meningomyelocele display neuromotor disability and frequent hydrocephalus, requiring ventricular shunting. A few genes have been proposed to contribute to disease susceptibility, but the genetic basis otherwise remains unexplained1. We postulated that de novo mutations under purifying selection contribute to the risk of developing meningomyelocele2. Here we recruited a cohort of 851 meningomyelocele trios who required shunting at birth and 732 control trios, and found that de novo likely gene disruption or damaging missense mutations occurred in approximately 22.3% of subjects, with 28% of such variants estimated to contribute to disease risk. The 187 genes with damaging de novo mutations collectively define networks including actin cytoskeleton and microtubule-based processes, Netrin-1 signalling and chromatin-modifying enzymes. Gene validation demonstrated partial or complete loss of function, impaired signalling and defective closure of the neural tube in Xenopus embryos. Our results indicate that de novo mutations make key contributions to meningomyelocele risk, and highlight critical pathways required for neural tube closure in human embryogenesis.
Meningomyelocele (MM) is the most common structural defect of the central nervous system in humans. It results from failed closure of the neural tube in the first six weeks of gestation. Folic acid supplementation has reduced the disease burden3, but the incidence of 1 in 3,000–10,000 live births, along with the associated life-long neuromotor disability and increased mortality, has focused attention on this condition. Timely diagnosis allows for prenatal counselling and informed management choices, including termination of pregnancy, fetal surgery or postnatal surgery for the almost fully concordant hydrocephalus accompanying the hindbrain Chiari type II malformation4,5. More than 20 million people worldwide live with a failure of neural tube closure (NTD), but the causes remain poorly defined.
Mouse knockout studies have established hundreds of genes associated with NTD, often with partial penetrance, implicating apical–basal polarity or the Wnt/PCP, Wnt/β-catenin and DNA transcription pathways6. Despite epidemiological heritability estimates of 60–70% for NTDs7, few genes have been implicated in human NTDs, potentially because of genetic heterogeneity. Additive effects of common SNPs have been proposed, especially in the folate-metabolism gene MTHFR (C677T, p.Ala222Val), but meta-analysis from 22 different association studies indicates only a modest risk (odds ratio 1.23, 95% confidence interval (CI), 1.07–1.42)8. Inherited mutations in VANGL1/VANGL2, TBXT, CCL2 and CELSR1 have been identified in cohort studies9–12 but represent only a small percentage of cases. Previous successes using de novo mutation (DNM) approaches in conditions such as autism and congenital heart disease, in which mutations are potentially under purifying selection13,14, prompted a trio approach in 43 families, identifying recurrent mutations in SHROOM3 (ref. 15), but left unanswered the question of whether larger studies would be better powered.
Establishing the MM cohort
A critical step in DNM discovery is assessment for an excess burden of damaging mutations in cases versus controls13. With conservative assumptions of a modest increase in damaging DNMs in cases, and an estimate of perhaps 50–100 total MM genetic risk loci, we calculated that a cohort of 300–500 trios would be required to identify 5–10 recurrently mutated genes (see Extended Data Fig. 1 and Supplementary Notes). In 2015, we therefore established the Spina Bifida Sequencing Consortium (SBSC) to aggregate previous trio recruitment efforts supported by the US Centers for Disease Control and Prevention (CDC) alongside targeted new recruitment from specialists in more than 12 countries and through social-media outreach. Recruitment was limited to diagnosis of MM probands with open neural tissue observed at birth and hydrocephalus requiring shunting, specifying the most severe form of MM compatible with long-term survival (Methods). Our ethnically diverse cohort was designed specifically to assess risk from DNMs using trio sequencing, matched with 732 control trios from the Simons Simplex Collection16. After five years of recruitment, we had enrolled 325 trios, performed trio whole-exome sequencing (WES) and DNM analysis, but found no recurrently mutated genes, so we extended the recruitment timeline for a further five years, to double the cohort size.
After ten years of recruitment, we assembled a cohort of 851 MM trios (839 trios and 6 quartets) consisting of exome sequence data from 2,541 individuals. The SBSC was populated by 15 sites across the world, incorporating both historic CDC-assembled cohorts and an aggregated earlier report of 39 NTD trios (35 trios and 2 quartets)15.
Damaging DNM burden in MM trios
After a series of strict quality controls and kinship analyses applied to trio exomes, we constructed a high-confidence call set of 2,592 DNMs (1,458 from 777 MM trios and 1,134 from 725 control trios) (Methods) and confirmed strong DNM signatures17 (Supplementary Fig. 2a,b). The rates of the total and functionally categorized DNMs (likely gene disrupting (LGD), damaging missense (D-mis), tolerant missense and synonymous) were calculated for the jointly covered consensus regions (Fig. 1a), wherein total DNM rates follow a Poisson distribution (Methods and Supplementary Fig. 2c,d). The average mutation rates of total DNMs were 1.22 × 10−8 (1.13 × 10−8–1.31 × 10−8, 95% confidence interval) in MM and 1.18 × 10−8 (1.09 × 10−8–1.28 × 10−8, 95% confidence interval) in controls, which did not differ significantly (Supplementary Fig. 2e; Wilcoxon rank-sum test, P = 0.56) and were also comparable to the rates from previous studies (1.08 × 10−8–1.32 × 10−8)18,19.
Fig. 1 |. Enrichment of damaging DNMs in MM versus control.
a, DNMs categorized by predicted functional impact: LGD, D-mis, D-mis-HC (highly constrained, called from a meta predictor), tolerant mis and synonymous. DNM rates per proband in MM (left sides of the triangle with strips) and control (right sides). Areas represent the ratio of DNM rates between MM and controls in each functional category. b,c, Variant rate (10−8, left y axis) and theoretical rate (DNM rate per child, right y axis) are shown for different categories. Statistical analysis of the ratio of DNM rates for MM and control cohorts, denoted by the ratio of the two Poisson rates, or rate ratio (RR), calculated for all genes (b, n = 19,658) and constrained genes (c, n = 3,060; pLI ≥ 0.9). The theoretical rate was calculated by normalization of the variant rate with the total size of the hg38 coding region (59,281,518 base pairs). P-values were calculated by a one-sided rate-ratio test. Error bars: 95% confidence interval of two sample t-test; ***P < 0.001, **P < 0.01, *P < 0.05; NS, not significant.
By contrast, we observed an excess rate of LGD DNMs (frameshift insertions and deletions (indels), stop–gain and splice donor/acceptor mutations) in MM (1.2 × 10−9) relative to controls (5.0 × 10−10) (Fig. 1b), with a ratio of the two Poisson rates (or rate ratio)20 of 2.10 (P = 9.0 × 10−6). Case-control comparison of DNM rate per proband (that is, theoretical DNM rate per proband; 0.12 in MM versus 0.05 in controls; Methods) indicated that 7.2% (95% confidence interval, 2.96%–11.36%) of MM probands display LGD DNMs implicated in MM risk (Table 1). Based on the excess burden, approximately 52.43% of LGD DNMs in MM probands were estimated to contribute to MM risk (Methods), which is comparable to rates in other complex diseases, such as Tourette’s syndrome (around 51.3%) and autism-spectrum disorders (42%)13,21. When LGD DNMs were considered alongside D-mis DNMs (LGD + D-mis, referred to as damaging DNMs), 28.26% (95% confidence interval, 7.75%–48.76%) were estimated to contribute to MM risk, whereas LGD DNMs considered with damaging missense of high confidence (D-mis-HC, a subset of D-mis DNMs predicted to have highly confident pathogenicity22) contributed approximately 80% of the total LGD risk burden (rate ratio, 1.86; P = 2.0 × 10−6). Paternal and maternal ages, which were expected to be correlated with DNM occurrence23,24, were not correlated with the excess DNM burden (Supplementary Fig. 3a). Moreover, there was no enrichment in tolerant missense or synonymous DNMs.
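The comparison of two Poisson rates described above can be sketched as follows. This is a minimal illustration, assuming an exact conditional test (given the total number of events, the case count is binomial under the null hypothesis); the study's own rate-ratio method (ref. 20) may differ in detail. The function name, the counts and the exposures are hypothetical.

```python
from math import comb


def rate_ratio_test(k_case, t_case, k_ctrl, t_ctrl):
    """One-sided exact test that the case Poisson rate exceeds the control rate.

    k_*: event counts (for example, DNMs); t_*: exposure (for example,
    number of trios multiplied by the number of covered bases).
    Conditional on the total count n, k_case ~ Binomial(n, p0) under H0.
    """
    n = k_case + k_ctrl
    p0 = t_case / (t_case + t_ctrl)
    rate_ratio = (k_case / t_case) / (k_ctrl / t_ctrl)
    # Upper tail: probability of observing k_case or more events in cases.
    p_value = sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(k_case, n + 1)
    )
    return rate_ratio, p_value


# Hypothetical counts and exposures, for illustration only.
rr, p = rate_ratio_test(k_case=40, t_case=1000.0, k_ctrl=20, t_ctrl=1000.0)
```

Because the exposure terms enter only through their ratio, cohorts with different numbers of trios or different covered regions can be compared directly.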
Table 1 |. Contribution of DNMs to MM risk
| Variant type | Theoretical rate per child, MM (n = 772) | Theoretical rate per child, control (n = 724) | Percentage of cases with DNMs mediating risk (±95% CI) | Percentage of DNMs carrying risk (±95% CI) |
|---|---|---|---|---|
| LGD | 0.12 (0.09–0.14) | 0.05 (0.03–0.08) | 7.16% (2.96–11.36%) | 52.43% (21.67–83.18%) |
| Damaging (HC) | 0.18 (0.14–0.21) | 0.09 (0.07–0.12) | 9.60% (4.45–14.75%) | 46.15% (21.38–70.92%) |
| Damaging (all) | 0.29 (0.24–0.33) | 0.21 (0.17–0.25) | 9.62% (2.64–16.59%) | 28.26% (7.75–48.76%) |
The DNM theoretical rate (the DNM rate per child) of MM (n = 772) and control (n = 724) trios is shown with the 95% confidence interval (CI) obtained using a one-sample t-test. The percentage of cases with DNMs mediating risk was calculated as the difference between the theoretical rates of the MM and control trios, with the 95% confidence interval obtained using a two-sample t-test (two-sided). The percentage of DNMs carrying risk was estimated by dividing the difference in theoretical rate between MM and control by the theoretical rate of MM trios, with the 95% confidence interval from a two-sample t-test. The theoretical rates were calculated on the basis of consensus regions for comparison of DNM rates. Damaging DNMs include LGD and D-mis DNMs. Damaging-HC (high confidence) includes LGD and D-mis-HC.
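The two derived quantities described in the table note can be sketched as below. The function name and the input rates are hypothetical; the published percentages were presumably computed from unrounded rates, so entering the rounded rates shown in the table is not expected to reproduce them exactly.

```python
def dnm_risk_contribution(rate_case, rate_ctrl):
    """Return (fraction of cases with risk-mediating DNMs,
    fraction of case DNMs carrying risk) from per-child DNM rates."""
    excess = rate_case - rate_ctrl  # excess DNMs per child in cases
    return excess, excess / rate_case


# Hypothetical per-child rates, for illustration only.
cases_frac, dnm_frac = dnm_risk_contribution(rate_case=0.20, rate_ctrl=0.15)
```

With these inputs, the excess of 0.05 DNMs per child corresponds to about 5% of cases, and one quarter of the DNMs seen in cases would be attributed to risk.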
In 3,060 genes considered to be highly constrained (that is, the probability of loss-of-function intolerance, pLI ≥ 0.9), the burden of D-mis DNMs was even more significant (rate ratio, 2.10; P = 0.003) (Fig. 1c). Furthermore, we observed a substantially increased burden of D-mis-HC DNMs (rate ratio, 7.7; P = 0.002). This contrasts with the DNM contributions of other complex diseases, for which LGDs have large roles compared with D-mis DNMs13.
To assess whether there is major missed heritability using WES relative to whole-genome sequencing (WGS), we recruited an extra 101 trios and 1 quartet, and performed WGS using standard discovery pipelines (Methods). We identified non-recurrent de novo single-nucleotide variants (SNVs) and indels at expected rates of 1.72 × 10−8 (versus 1.2 × 10−8–2.4 × 10−8 in previous publications19,25; Extended Data Table 1), encompassing 9 putative splice-disrupting and 41 in putative enhancer/promoter regions (Supplementary Data). Copy number variant (CNV) analysis revealed two non-recurrent deletions and two duplications (of more than 100 kilobase pairs), a rate comparable to other complex diseases26,27 (Supplementary Fig. 3b and Supplementary Table 1). De novo inversions or translocations were not observed. By contrast, a high LGD rate (0.13 per proband) in coding region was replicated in WGS data, with three overlapping genes observed in the WES cohort (KDM1A, ITPR3 and GRHL2). Overall, WGS-based analysis mirrored the findings from WES and did not identify an extra major class of mutation (see Supplementary Notes).
Functional convergence of MM DNM genes
Of 198 damaging DNMs (81 LGD and 117 D-mis) found in MM trios, Sanger sequencing was conducted on 89.4% (177 out of 198) for which sufficient DNA was available, yielding a 96.6% validation rate, which removed six false DNMs. The remaining 192 damaging DNMs (79 LGD and 114 D-mis) occurred in 187 unique genes, with only five genes mutated in two separate trios (a doubleton; PAX3, IRS1, ZSWIM6, BRSK2 and VWA8) (Fig. 2a and Supplementary Data). Notably, only PAX3 had previously been implicated in human NTDs15,28. Of the genes identified as singletons, only TBXT (splice donor) and CELSR1 (frameshift) had previously been implicated as inherited NTD risk factors9,11. The lack of more recurrently mutated genes indicates that there is a considerable gap in our current understanding of MM genetic risk factors.
Fig. 2 |. Functional convergence of genes implicated by damaging DNMs.
a, Proportion of singletons (DNM occurred in one trio) and doubletons (DNMs occurred in two independent trios) damaging (LGD or D-mis) genes in controls and MM. Five doubleton genes were annotated with variant functional categories (LGD, D-mis-HC or D-mis). b, A protein–protein interaction network composed of MM damaging DNM genes. Genes connected by at least one edge are shown; unconnected (orphan) genes are shown in Supplementary Fig. 4. Node colours denote the variant functional categories: LGD (purple), D-mis (light pink) and D-mis-HC (dark pink). Doubleton genes have green borders (three doubletons are shown). Edge thickness denotes the confidence score of the protein interaction, defined in the STRING database. c,d, The total number of edges (interactions) was counted in 100,000 networks randomly generated with bootstrapping by selecting gene set size of 108 (that is, 80% of the 135 control DNM genes) in MM and control cohorts. The density of the number of edges (c) is shown in 100,000 bootstrapped networks. Higher numbers of edges denote denser network interconnections. P-values were calculated by two-sided Wilcoxon rank-sum test. ***P < 2.2 × 10−16. The number of nodes with 0–10 edges is shown (d) in controls (green) and MM (yellow) gene sets. Points show the median and error bars denote the first and third quartiles. P-values were calculated by a one-sided Wilcoxon rank-sum test. ****P < 2.2 × 10−16; NS, not significant. e, Gene Ontology (GO) term network visualization for genes overlapping with 187 MM DNM genes with statistical significance (a GO enrichment analysis). GO terms of functional relevance are connected by an edge through commonly involved genes (small circles). Node size indicates the significance of the terms. Degree of connectivity between terms (edges), kappa statistics by ClueGO. NTRK1 is also known as TRKA.
Spatial transcriptomic analysis of 36 of the damaging DNM genes (multiplexed error robust in situ hybridization (MERFISH), described in the Methods) confirmed expression at mouse embryonic day 9.5, coinciding with neural tube closure (Extended Data Fig. 2). We found that most genes were expressed broadly in the embryo (78%; 28 of 36), but a minority (22%) showed some cell type-specific expression, such as Celsr1 in neural progenitors and Stab1 in neural-crest progenitors (Extended Data Fig. 3 and Supplementary Notes).
We next studied the protein interactions of MM DNM genes using the STRING database29 (Fig. 2b). Of the 187 genes, 107 (57%), including 4 doubletons, 33 LGDs and 74 D-mis were highly interconnected, compared with controls (38% interconnected) (Supplementary Fig. 4). Bootstrap analysis confirmed a significantly greater degree of network colocalization than for genes in the control group (Fig. 2c and Methods; two-sided Wilcoxon rank-sum test, P < 1.0 × 10−16). Genes exhibiting a higher node degree (n ≥ 6) were predominantly observed in the MM gene set (Wilcoxon rank-sum test, one-sided; Fig. 2d). Indeed, MM DNM genes were significantly enriched in biological pathways such as morphogenesis of polarized epithelium, neuronal cell adhesion, neural tube closure and signal transduction (Gene Ontology biological processes, adjusted P < 0.05; Fig. 2e and Supplementary Table 2), indicating functional convergence in neural tube closure.
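The bootstrap comparison of network density can be sketched as follows: repeatedly draw a fixed-size subset of a gene set and count the interactions that fall entirely within the subset. This is a minimal sketch, assuming sampling without replacement and an edge list of gene pairs; the function names and the toy network are hypothetical and do not use STRING data.

```python
import random


def count_edges(genes, edges):
    """Number of interactions whose two endpoints are both in `genes`."""
    chosen = set(genes)
    return sum(1 for a, b in edges if a in chosen and b in chosen)


def bootstrap_edge_counts(gene_set, edges, sample_size, n_iter, seed=0):
    """Edge counts of `n_iter` random sub-networks of `sample_size` genes."""
    rng = random.Random(seed)
    genes = sorted(gene_set)
    return [
        count_edges(rng.sample(genes, sample_size), edges) for _ in range(n_iter)
    ]


# Toy network, for illustration only.
toy_genes = [f"G{i}" for i in range(10)]
toy_edges = [("G0", "G1"), ("G1", "G2"), ("G2", "G3"), ("G7", "G8")]
counts = bootstrap_edge_counts(toy_genes, toy_edges, sample_size=8, n_iter=1000)
```

Running the same procedure on case and control gene sets with an identical sample size yields two distributions of edge counts that can then be compared with a rank-sum test.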
DNMs implicate functional modules in NTD
Using the 187 MM DNM genes as seeds, we applied network propagation to construct a comprehensive gene network associated with human MM risk30 (Extended Data Fig. 4 and Methods), by implicating functionally related genes that might not be directly observable in patients owing to, for instance, lethal mutations. The propagated network comprised 439 genes, including 257 propagated genes, and exhibited 2,447 interconnected edges. We found an over-representation of 374 experimentally proven mouse NTD causal genes31 (Supplementary Data) in both seed and propagated genes (11 and 17 of the mouse NTD genes before and after propagation, respectively; one-tailed hypergeometric P = 0.0015), corroborating functional relatedness to human MM. By contrast, no such relatedness was found with propagation in genes carrying DNMs in control trios (P = 0.295).
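A one-tailed hypergeometric test of over-representation can be sketched as below. The function name and the numbers in the example are hypothetical; in particular, the background gene universe used in the study is not stated here, so the example is not intended to reproduce the reported P values.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k), where X counts marked items among N draws without
    replacement from a population of M items containing n marked items."""
    upper = min(n, N)
    numerator = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return numerator / comb(M, N)


# Toy numbers: 1,000 background genes, 50 known causal genes,
# a network of 100 genes that contains 12 of the causal genes.
p_enrich = hypergeom_upper_tail(k=12, M=1000, n=50, N=100)
```

Here the expected overlap by chance is 5 genes (100 × 50 / 1,000), so an observed overlap of 12 falls in the upper tail of the distribution.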
Applying a clustering algorithm32 to the propagated network, we identified five subnetworks in which damaging DNMs and propagated genes were closely connected and enriched in predefined signalling pathways (referred to as submodules): GTPase-based actin cytoskeleton organization, microtubule-based processes, chromatin-modifying enzymes, netrin-1 signalling and metabolism of lipids6,33–39 (false discovery rate (FDR) < 10−5; Fig. 3a, Supplementary Data and Methods), indicating that disrupting these pathways can increase MM risk. Notably, a higher haploinsufficiency of the 51 propagated genes in the submodules (pLI = 0.98 versus 0.69 of 46 seed genes (median); see Supplementary Fig. 5) substantiated the utility of network propagation to further implicate potentially lethal MM-risk genes.
Fig. 3 |. Functional modules that contribute to MM risk.
a, Functional submodules clustered from the propagated network, with the 187 damaging DNM genes from the MM cohort used as seeds. They were clustered with the Leiden algorithm using co-expression values from the STRING database as attributes; the five submodules annotated with FDR < 10−5 by the GO biological process, KEGG or Reactome databases are shown. Functional terms annotated with FDR < 10−8 are shown with a light purple background. b, Most significantly enriched GO biological processes with FDR < 10−8 shown for GTPase-involved actin cytoskeleton and microtubule-based processes. FDR value: −log10 scale.
Common among the 12 most strongly associated signalling pathways with FDR below 10−8 were actin and microtubule organization processes (Fig. 3b). Functional relatedness to other submodules of higher-level processes40,41 and essential roles in neural fold adhesion and closure42,43 indicate that RHO/RAC1 GTPase-mediated actin and microtubule organization processes contribute to human MM risk.
Functional assessment of MM genes
We next explored the functional impact of DNMs. In the absence of key regulators, nine genes were selected based on involvement in the identified functional submodules: TIAM1, PLCE1, NOSTRIN and WHAMM (GTPase-involved actin cytoskeleton), KDM1A and SPEN (chromatin-modifying enzymes), TNK2 and MINK1 (netrin-1 signalling) and DNAH5 (microtubule-based process) (Fig. 4a). Two further doubleton genes, VWA8 and BRSK2, identified in two MM probands, were also included (Extended Data Table 2). Patient mutations in these genes were primarily D-mis DNMs located in domains predicted to interfere with protein function (Fig. 4a, yellow), except for LGD mutations in SPEN, MINK1, WHAMM and NOSTRIN, which are predicted to truncate the protein (Fig. 4a, red).
Fig. 4 |. Functional validation of damaging DNMs related to actin polymerization.
a, Damaging DNM genes highlight different functional submodules of nine mutated genes, including the D-mis genes (TNK2, TIAM1, PLCE1, KDM1A and DNAH5) and LGD genes (WHAMM, NOSTRIN, SPEN and MINK1). b, A TIAM1 H1149P patient mutation led to fewer lamellipodia (filamentous actin) than the wild type (WT) imaged with phalloidin. Experiment independently repeated three times. Scale bars, 10 μm. c, A TIAM1 H1149P patient mutation (n = 27) decreased RAC1 activation with constitutively active SRC, observed using Förster resonance energy transfer (FRET) compared with vector (n = 37) and WT TIAM1 (n = 32). Kruskal–Wallis test followed by a two-sided pairwise Wilcoxon test; P value adjusted with a Bonferroni correction. Data are shown with a Hampel filter. Error bars show s.e.m. d, Quantification of GTP-bound RHOA (n = 5) of E623Q PLCE1. PLCE1 contains a RAS-like small GTPase (RAS GEF) domain, so there is crosstalk between RAS and RHOA; PLCE1 seems to have an indirect effect on RHOA rather than a direct one. Each dot represents one independent experiment (n = 5). One-way analysis of variance (ANOVA) with a post hoc Bonferroni correction. Error bar shows s.d. a.u., arbitrary units. e,g,i, In X. laevis, knockdown of spen (e), mink1 (g) and nostrin and whamm (i) causes an open neural tube. Dorsal views of Xenopus embryos injected with MOs at the late neurula stage are shown. f,h,j, Quantification of the average distance between neural folds in Xenopus spen (f; control (n = 25), 10 ng (n = 17), 20 ng (n = 19)), mink1 (h; control (n = 13), 5 ng (n = 12), 10 ng (n = 10)), and nostrin (5 ng (n = 16), 10 ng (n = 17)) and whamm (5 ng (n = 16), 10 ng (n = 17)) (j) embryos injected with MOs with pax3 in situ hybridization (control (n = 14), 5 ng (n = 13) and 10 ng (n = 14)). Error bars show s.e.m. P-values calculated by one-way ANOVA, followed by Tukey’s multiple comparison test. ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05; NS, not significant.
In vitro assessment confirmed impaired or reduced protein function for most D-mis DNMs tested (Fig. 4a–d and Extended Data Figs. 5–10). TIAM1 is a SRC-activated RAC1-specific guanine nucleotide exchange factor that promotes the assembly of filamentous actin. We found that wild-type TIAM1, but not the mutant TIAM1 H1149P, co-localized with filamentous actin (Fig. 4b). Although basal RAC1 activation was unperturbed, the H1149P mutant had a significant reduction in RAC1 activation after co-expression with constitutively active SRC (Fig. 4c and Extended Data Fig. 5a,c,d). PLCE1 regulates RHOA activation, leading to actin polymerization and cell migration. The E623Q mutation, located in the RAS GEF domain, disrupted the active GTP-bound form of RHOA (Fig. 4d and Extended Data Fig. 5b,e). TNK2 is a Cdc42-activated kinase that phosphorylates WASP, among other targets. We found that the immunoprecipitated TNK2 P186L mutant DNM exhibited approximately 60% reduced kinase activity toward a WASP pseudo-substrate, relative to the wild type and kinase-dead versions (Extended Data Fig. 6a–c). Likewise, we identified the impact of patient mutations on protein homeostasis, causing alternative splicing in DNAH5 (Extended Data Fig. 6d–f), reduced expression in KDM1A (Extended Data Fig. 7a–c) and production failure in VWA8 (Extended Data Fig. 7d–f). Overall, six of the eight D-mis mutations tested showed clear functional disruption, yielding a validation rate of 75%. We further conducted computational analysis (including 3D modelling and binding simulation; Methods) on the genes with insufficient functional evidence (CLIP2 and BRSK2; Supplementary Figs. 6 and 7) or with limited options for functional assays (SCAP and DHCR7; involved in lipid metabolism). These predicted substantial alterations in protein structure and reduced binding affinity (Supplementary Fig. 8).
Finally, in vivo knockdown of the four genes with LGD mutations (SPEN, MINK1, NOSTRIN and WHAMM) was done in Xenopus laevis to test the phenotypic effects on neural tube closure. In all four genes, splice-blocking morpholinos (MOs; see Methods) targeting the orthologue (spen, mink1, nostrin and whamm) resulted in NTD (Fig. 4e–j), which was further validated in crispant mutants (Extended Data Figs. 8–10), a rescue experiment (Extended Data Fig. 9) and translation-blocking MOs (Extended Data Fig. 10). The severity of NTD was dose dependent in all genes; for instance, injection of 10 ng spen MO led to mild NTD, whereas 20 ng injection induced more severe open NTD in both anterior and posterior regions, as evidenced by pax3 in situ hybridization (Fig. 4e–j). Furthermore, combined targeting of nostrin and whamm, which jointly participate in actin polymerization processes, led to synergistic pathogenicity in the phenotypes (Fig. 4i,j). These findings confirm the role of LGD DNM genes in MM pathogenesis in vivo, underscoring the potential gene dosage-dependent and synergistic pathophysiology contributing to MM risk.
Expanding clinical phenotypes to MM
We noted that 82 of the MM DNM genes had previously been implicated in syndromic or non-syndromic phenotypes in the Online Mendelian Inheritance in Man (OMIM) database44, raising the question of whether our MM subjects might also express some of these OMIM phenotypes. By recontacting families, we found that 6% (5 of 83) of the MM subjects (that is, with mutations in ZSWIM6, PAX3 in two subjects, TCF12 and BICRA) had clinical features in addition to MM (Acromelic frontonasal dysostosis, Waardenburg, Craniosynostosis and Coffin-Siris syndromes, respectively), indicating that MM may represent a phenotypic expansion (Supplementary Table 3 and Supplementary Notes). Some subjects were lost to follow-up, meaning that 21.6% of the phenotypes could not be excluded, owing to a lack of full clinical information. A further two subjects (2.4%) shared some clinical features with OMIM phenotypes but not sufficient to make the clinical diagnoses. For most of these 82 subjects, the zygosity did not match the OMIM zygosity (dominant versus recessive), so it was not surprising that 70% of subjects lacked any OMIM clinical features. We conclude that MM can infrequently present as a partly penetrant phenotype in established OMIM disorders and suggest that specific alleles or environmental factors may determine expressivity.
Discussion
In this study, we present the first large-scale assessment of DNMs contributing to MM risk. Approximately 22% of MM probands had damaging DNMs, of which about 28% were expected to contribute to MM risk. This risk is comparable to other severe paediatric conditions that are probably under strong purifying selection, because the mutations are unlikely to propagate to offspring45–47. As well as the LGD DNMs found in other severe paediatric conditions, we found strong enrichment of D-mis DNMs, indicating that some LGD mutations in these genes may be lethal during embryogenesis. It would be interesting to assess the contribution of D-mis mutations in other severe paediatric conditions.
Although only a subset of the genes identified probably contribute to risk, we found that candidate MM genes were more interconnected than expected by chance. This lends support to the existence of a key network that profoundly contributes to MM, characterized by numerous genes with the potential to affect phenotype, interconnected by regulatory or protein-interaction networks, aligning with the omnigenic model48. Many of the networks highlight pathways previously implicated from mouse or frog models, including Wnt signalling, planar cell polarity, actin regulation and cell signalling. Surprisingly, there was little overlap of mutated genes in humans with animal NTD models. One difference is that mouse and frog NTD genes are often identified as lethal embryonic phenotypes49,50, whereas our ascertainment was limited to living subjects. Thus, although the genes implicated in laboratory animals versus human NTDs may be different, we expect the pathways to be highly similar, as evident in the propagated network. Future work could introduce human DNMs into vertebrate models or human neural tube stem-cell models, to bridge this divide.
Of the 187 genes with DNMs in MM subjects, 83 had previously been implicated in human phenotypes, according to OMIM. Only two of these OMIM conditions listed MM as part of the associated spectrum, and in only five subjects was the overlap sufficient for clinical diagnosis of the linked OMIM phenotype. This indicates that MM should be considered as part of the clinical continuum of a variety of OMIM conditions, which may be expressed in an allelic- or zygosity-specific fashion. Surprisingly, only 5 of the 187 genes were recurrently mutated in our cohort, indicating that thousands of genes could contribute to MM risk (Supplementary Notes).
Our findings do not resolve the pathogenesis of most MM cases. Cumulative risk of singleton inherited variants51, rare and de novo copy-number variants52, or environmental factors such as folic acid deficiency, could also contribute to risk by exceeding a threshold for disease as proposed2. It is also possible that regulatory mutations contribute, perhaps modulated by folic acid. Finally, evidence of gene dosage sensitivity and synergistic effects raises the possibility that environmental factors during critical developmental windows might amplify the effects of mutations.
Methods
Establishing the SBSC cohort
Inclusion criteria.
Strict inclusion criteria were set for subject enrolment into the Spina Bifida Sequencing Consortium (SBSC). Participants must be affected by lumbosacral MM with Arnold–Chiari malformation and hydrocephalus requiring surgical intervention, such as ventriculoperitoneal shunt or endoscopic third ventriculostomy. We excluded subjects with closed neural tube defects not requiring surgery at birth, with meningocele, or without hydrocephalus or not requiring surgery at birth. Enrolment required that both biological parents were available for sampling. In the case of fetal surgery to correct MM, which reduces the incidence of hydrocephalus, the requirement for inclusion of hydrocephalus at the time of enrolment was lifted. Subjects with known syndromes that would explain their conditions were excluded. Any families that did not meet the strict inclusion criteria were excluded.
Recruitment.
The SBSC used several concurrent recruitment approaches: first, identifying recruitable MM trios from local, national and international hospitals; second, social-media outreach through the Spina Bifida Association accounts on Twitter/X, Facebook and Instagram directly to families; third, recruiting from spina bifida multidisciplinary clinics around the world with high MM caseloads; fourth, leveraging historic CDC-supported neural tube defect cohorts; fifth, sharing sequencing data that have already been generated from trios with members of the SBSC and publicly with the US National Institutes of Health (NIH)-supported database of Genotypes and Phenotypes (dbGaP); and sixth, through follow-up when subjects and/or their families contacted the SBSC independently through our postings and advertisements on social media. The countries where participants were recruited include the United States, Mexico, Brazil, Italy, Georgia, Egypt, Canada, Venezuela, Pakistan, Guatemala and Nigeria. All families were provided with informed consent approved by the UCSD institutional review board S99075 protocol 140028. Recruitment processes were conducted in accordance with approval by the review boards of the University of California, San Diego supervising the study of subjects enrolled by each of the institutional members of the SBSC.
Study questionnaire.
All families prospectively recruited completed a standardized SBSC questionnaire, which confirmed the inclusion criteria and documented past medical history and current status by adopting previous questionnaire fields53, along with the place and date of birth of the patients and parents to control the DNM rate for parental age at the time of conception.
Subject sample and data handling.
DNA was extracted from blood or saliva using standard salt extraction protocols (Qiagen or Autogen), or was obtained from previous CDC-funded recruitment efforts; where recontact was not possible, these subjects were considered lost to follow-up for sample re-collection. Data from 37 families that had previous trio WES15 were incorporated into the cohort using the standardized bioinformatics pipeline.
Sequence data generation
In total, 2,541 samples from 851 MM trios (839 trios and 6 quartets) were eligible for WES using six target capture kits (Roche Exome v.2, Agilent xGen Exome v.1, Agilent SureSelect v.4, Agilent SureSelect v.4, IDT xGen Exome v.2 and Twist human comprehensive exome), and trios were sequenced using the same capture kit. Samples with a low concentration of DNA, that were gender discordant or seemed to be contaminated (45 samples) failed quality control, resulting in 794 trios and 6 quartets. As a control, we obtained 732 healthy trios from the SFARI cohort16, comprising WES data captured using NimbleGen EZ v.2.
Data preprocessing and quality control
Raw reads were aligned to the reference genome (GRCh38) using bwa mem (0.7.17) and preprocessed with PICARD (2.20.7) AddOrReplaceReadGroups and MarkDuplicates. Germline variants were called using GATK (v.4.11.0) HaplotypeCaller in the reference confidence model (-ERC GVCF), combined using CombineGVCFs to generate a joint VCF for all MM and control trios, and jointly genotyped using GenotypeGVCFs (v.4.11.0). Multiallelic variants were split (bcftools norm -m -any), indels were realigned and base quality was recalibrated using GATK (v.4.0.11) with known indel and population frequency information (dbSNP (146) and Mills_and_1000G_gold_standard.indels.hg38.vcf.gz from the GATK resource bundle). Variant quality was recalibrated by GATK VQSR using HapMap (hapmap_3.3.hg38.vcf.gz), Omni (1000G_omni2.5.hg38.vcf.gz), 1000G (1000G_phase1.snps.high_confidence.hg38.vcf.gz), Mills (Mills_and_1000G_gold_standard.indels.hg38.vcf.gz) and Axiom (Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz).
To confirm kinship integrity within trios, we used TRUFFLE54 (v.1.38), removing 29 and 7 trios from the MM and control cohorts, respectively. IBD1 values were computed from the VCF using only variants with allele frequencies higher than 0.05 (--maf 0.05), to restrict the analysis to confident calls. If a parent–child IBD1 value was smaller than 0.75, the IBD1 values for all possible sample pairs were examined. Samples with relations (IBD1 ≥ 0.75) to individuals outside their family were removed, because such relations possibly reflected low data quality or contamination.
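The kinship rule described above can be sketched in a few lines. This is a minimal illustration, not the TRUFFLE workflow itself: it assumes pairwise IBD1 values have already been exported into a dictionary, and the function names, sample IDs and the treatment of missing pairs as unrelated are hypothetical choices made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

IBD1_THRESHOLD = 0.75  # parent-child cut-off quoted in the text


def pair(a: str, b: str) -> FrozenSet[str]:
    """Order-independent key for a pair of sample IDs."""
    return frozenset((a, b))


def failed_trios(trios: Dict[str, Tuple[str, str, str]],
                 ibd1: Dict[FrozenSet[str], float]) -> List[str]:
    """Trios in which either parent-child IBD1 value falls below the threshold."""
    failed = []
    for trio_id, (child, father, mother) in trios.items():
        # Assumption: a pair missing from the table is treated as unrelated (IBD1 = 0).
        values = [ibd1.get(pair(child, parent), 0.0) for parent in (father, mother)]
        if min(values) < IBD1_THRESHOLD:
            failed.append(trio_id)
    return failed


def cross_family_related(trios: Dict[str, Tuple[str, str, str]],
                         ibd1: Dict[FrozenSet[str], float]) -> List[str]:
    """Samples with parent-child-like IBD1 to someone outside their own family."""
    family_of = {s: fam for fam, members in trios.items() for s in members}
    flagged = set()
    for key, value in ibd1.items():
        a, b = sorted(key)
        if value >= IBD1_THRESHOLD and family_of.get(a) != family_of.get(b):
            flagged.update((a, b))
    return sorted(flagged)


# Hypothetical toy data: family F2 has a failed mother-child pair and a
# suspicious link between its child and the mother of family F1.
example_trios = {"F1": ("c1", "f1", "m1"), "F2": ("c2", "f2", "m2")}
example_ibd1 = {
    pair("c1", "f1"): 0.98, pair("c1", "m1"): 0.97,
    pair("c2", "f2"): 0.99, pair("c2", "m2"): 0.10,
    pair("c2", "m1"): 0.90,
}
```

Flagged samples would still need manual review, because a cross-family match can reflect either a sample swap or contamination.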
De novo variant calling
De novo SNVs and indels were called by following GATK genotype-refinement steps; MM samples had a mean depth of 75× (95× for control samples). The genotypes of 777 MM and 725 control trios were gathered, and posterior probabilities of the variants were calculated using CalculateGenotypePosteriors (v.4.2.6.1). We selected only de novo variants that had genotype quality (GQ) ≥ 20, mapping quality (MQ) ≥ 30 and PASS in VQSR. Candidate DNMs with sequencing depth (DP) ≥ 12 in all three family members were kept to secure high-confidence calls. To remove candidates that deviated from the theoretical heterozygous state (AF = 0.5), a two-sided binomial test was applied to all positions using read depths and the number of alternative alleles, removing sites with P < 0.01. Genotypes of both parents were examined, and candidates were kept if both parents were genotyped as reference homozygous (GT = 0/0). To enhance the robustness of the high-confidence set of DNMs, a series of post-filters was applied by examining regional information and raw allele counts. First, clustered DNMs were filtered out when two or more DNMs of one proband were observed within 10 bp. Second, DNMs whose genotype was also observed in any parent, in either the MM or control cohort, were removed. For de novo SNVs, each DNM position was examined by collecting raw reads with base quality greater than 13 and MQ > 0 as criteria for each trio (bcftools mpileup v.1.9). Then, to remove artefacts derived from short-read misalignment, a position was filtered out when the same alternative allele was observed in either of the parents. Two-sided binomial tests were done on SNVs with raw read counts, with a strict cut-off of 0.05. All indels were manually inspected using Integrative Genomics Viewer. A schematic overview of the DNM detection pipeline is shown in Supplementary Fig. 9.
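The core per-site filters can be illustrated with a short sketch. It is not the authors' pipeline: the dictionary keys are hypothetical, the exact binomial test is implemented with the standard library rather than any particular statistics package, and the sketch assumes that a site is discarded when the test rejects the heterozygous expectation (P below the 0.01 cut-off).

```python
from math import exp, lgamma, log


def binom_two_sided_p(alt_reads: int, depth: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one (the convention used by common exact binomial tests).
    """
    def pmf(k: int) -> float:
        log_choose = lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
        return exp(log_choose + k * log(p) + (depth - k) * log(1 - p))

    observed = pmf(alt_reads)
    # Small relative tolerance so that symmetric outcomes count as ties.
    total = sum(q for q in map(pmf, range(depth + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def passes_filters(site: dict, p_cutoff: float = 0.01) -> bool:
    """Apply the quality, depth, parental-genotype and allele-balance filters."""
    if site["gq"] < 20 or site["mq"] < 30 or not site["vqsr_pass"]:
        return False
    if min(site["dp_child"], site["dp_father"], site["dp_mother"]) < 12:
        return False
    if site["gt_father"] != "0/0" or site["gt_mother"] != "0/0":
        return False
    # Assumption: sites that deviate significantly from AF = 0.5 are removed.
    return binom_two_sided_p(site["alt_child"], site["dp_child"]) >= p_cutoff


# Hypothetical candidate: 18 alternative reads out of 40 in the proband.
example_site = dict(gq=60, mq=60, vqsr_pass=True,
                    dp_child=40, dp_father=30, dp_mother=35,
                    alt_child=18, gt_father="0/0", gt_mother="0/0")
```

The clustering, cohort-recurrence and raw-pileup post-filters are left out here because they need access to neighbouring sites and to the aligned reads.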
VEP (v.106) was used to annotate de novo variants with 1000 Genomes (phase 3), COSMIC (92), ClinVar (202109), ESP (V2-SSA136), HGMD-PUBLIC (20204), dbSNP (154), GENCODE (human release 40), gnomAD (r2.1.1), PolyPhen (2.2.2) and SIFT (5.2.2). Mutational signatures were extracted using Mutalisk55 (February 2018) with linear regression.
DNM statistical confirmation
Consensus region for DNM rate calculation.
To compare DNM rates and burden among sequencing data generated using different platforms (such as exome capture libraries), genomic regions with sufficient coverage in common (consensus regions) were extracted. For each individual genome, coverage was calculated using bedtools56 (v.2.30.0) genomecov. Then, the coverage bedgraphs were merged into each trio using bedtools unionbedg, and only the regions with read depth of 12 or more in all three family members were kept. Each family bedgraph was merged with bedgraphs of other families in each batch, to check whether at least 70% of families covered that region. Finally, the batch bedgraphs were merged into a final consensus region, reaching a total of 36,553,428 bp.
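The consensus-region logic can be sketched as two steps: finding positions where all three trio members reach the depth threshold, then keeping positions covered by at least 70% of families. The sketch below is a simplified stand-in for the bedtools workflow. It works on one chromosome, takes per-base depth lists as input, and collapses the per-batch and cross-batch merges into a single pass; all of these are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def trio_callable_intervals(child: Sequence[int], father: Sequence[int],
                            mother: Sequence[int], min_depth: int = 12) -> List[Interval]:
    """Intervals where all three family members have depth >= min_depth."""
    n = min(len(child), len(father), len(mother))
    intervals, start = [], None
    for pos in range(n):
        ok = min(child[pos], father[pos], mother[pos]) >= min_depth
        if ok and start is None:
            start = pos
        elif not ok and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, n))
    return intervals


def consensus_intervals(families: Sequence[Sequence[Interval]],
                        min_percent: int = 70) -> List[Interval]:
    """Regions covered by at least min_percent of families (sweep-line)."""
    events = []
    for intervals in families:
        for start, end in intervals:
            events.append((start, 1))
            events.append((end, -1))
    events.sort()
    result, covered, region_start, i = [], 0, None, 0
    while i < len(events):
        pos = events[i][0]
        # Apply every event at this coordinate before testing the threshold.
        while i < len(events) and events[i][0] == pos:
            covered += events[i][1]
            i += 1
        enough = covered * 100 >= min_percent * len(families)
        if enough and region_start is None:
            region_start = pos
        elif not enough and region_start is not None:
            result.append((region_start, pos))
            region_start = None
    return result


# Hypothetical example: four families, so at least three must cover a base.
example_families = [[(0, 100)], [(50, 150)], [(60, 120)], [(200, 300)]]
example_consensus = consensus_intervals(example_families)
example_size_bp = sum(end - start for start, end in example_consensus)
```

Summing interval lengths over all chromosomes in this way is what yields a single consensus size in base pairs.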
DNM rates examined using a Poisson distribution.
We confirmed that the number of DNMs per proband in the consensus region followed the expected Poisson distribution. In brief, we considered cut-offs for the DNM count per proband from 1 to 20, generating a DNM set for each cut-off. An expected Poisson distribution was then generated per cut-off, with lambda set to the average count of DNMs per proband (R dpois). The observed distribution of DNMs per proband was compared with the expected distribution using a chi-square goodness-of-fit test (R chisq.test). A cut-off of 7 DNMs per proband was set (Supplementary Fig. 2c,d), removing five trios from the MM cohort and one from the controls, with DNMs ranging from 8 to 484, possibly owing to low data quality.
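The comparison of observed and expected counts for a given cut-off can be sketched as follows. This is an illustration in Python rather than the R code used in the study. The binning (one bin per count below the cut-off plus a final bin for the cut-off and above) is an assumption, because the binning is not stated in the text, and the sketch returns only the chi-square statistic, leaving the P value to a statistics package.

```python
from collections import Counter
from math import exp, lgamma, log
from typing import Sequence, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass, computed in log space for stability."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_chi_square(dnm_counts: Sequence[int], cutoff: int) -> Tuple[float, float, int]:
    """Chi-square statistic comparing per-proband DNM counts with a Poisson fit.

    Probands with more than `cutoff` DNMs are dropped first. Assumes at least
    one retained proband carries a DNM, so that lambda is positive.
    """
    kept = [c for c in dnm_counts if c <= cutoff]
    n = len(kept)
    lam = sum(kept) / n  # lambda = mean DNM count per retained proband
    observed = Counter(kept)
    stat = 0.0
    for k in range(cutoff):
        expected = n * poisson_pmf(k, lam)
        stat += (observed[k] - expected) ** 2 / expected
    # Final bin: cut-off or more (the tail is truncated after 200 terms).
    tail = n * sum(poisson_pmf(k, lam) for k in range(cutoff, cutoff + 200))
    stat += (observed[cutoff] - tail) ** 2 / tail
    return stat, lam, n


# Hypothetical counts, including one outlier proband that the cut-off removes.
example_counts = [0, 1, 1, 2, 2, 2, 3, 1, 0, 1, 484]
example_stat, example_lambda, example_n = poisson_chi_square(example_counts, cutoff=7)
```

Repeating the call for each cut-off from 1 to 20 would give one statistic per candidate cut-off.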
DNM burden analysis
DNM rates were calculated according to the number of DNMs observed in the consensus region. We used a rate ratio test, given that the number of DNMs per proband had a Poisson distribution. DNMs were categorized as LGD if they were frameshift indels, stop–gain or splice donor/acceptor mutations. Missense DNMs were annotated as D-mis if they were annotated as ‘probably damaging’ or ‘possibly damaging’ in PolyPhen or ‘deleterious’ in SIFT and, at the same time, were supported by a CADD prediction score of >20. If a DNM was annotated as deleterious by a highly strict meta predictor, MetaSVM22, we prioritized it as D-mis-HC. We replicated the high-consensus D-mis mutations and their burden with two other independent missense variant annotation predictors, MetaLR22 and REVEL57 (Supplementary Notes and Supplementary Fig. 9c,d). MetaSVM, MetaLR and REVEL annotations were based on dbNSFP (4.1a). Missense mutations that were not annotated as D-mis were categorized as ‘tolerant missense’. Constrained genes were defined as those with pLI ≥ 0.9. The DNM rate was calculated for the consensus region (36,553,428 bp) in each mutation category, with 95% confidence intervals (one-sample t-test). Two-sided Wilcoxon rank-sum tests were used to check the difference in the total DNM rates between MM (n = 772) and controls (n = 724). A one-sided rate ratio test was used to compare the two Poisson-distributed rates from each cohort. Theoretical DNM rates were estimated according to the total size of the coding region in hg38 (59,281,518 bp). The percentage of subjects with DNMs mediating risk was calculated as the difference between the theoretical rates (DNM rate per exome) of the MM and control cohorts (95% confidence interval calculated using a two-sided Wilcoxon rank-sum test). The percentage of DNMs carrying MM risk was estimated by dividing the difference in the theoretical rates by the theoretical rate of MM trios.
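The burden arithmetic can be made concrete with a small sketch. The two region sizes are taken from the text; the DNM counts and cohort sizes in the example are hypothetical round numbers, not study results. The one-sided test shown is one common exact formulation of a Poisson rate ratio test (conditioning on the total count); whether it matches the implementation used in the study is an assumption.

```python
from math import exp, lgamma, log
from typing import Tuple

CONSENSUS_BP = 36_553_428  # consensus region size quoted in the text
CODING_BP = 59_281_518     # hg38 coding region size quoted in the text


def per_exome_rate(dnm_count: int, n_probands: int) -> float:
    """Observed DNMs per proband, scaled from the consensus region to the exome."""
    return dnm_count / n_probands * CODING_BP / CONSENSUS_BP


def risk_contribution(mm_count: int, n_mm: int,
                      ctrl_count: int, n_ctrl: int) -> Tuple[float, float]:
    """Return (excess DNMs per exome, fraction of MM DNMs attributable to risk)."""
    mm_rate = per_exome_rate(mm_count, n_mm)
    ctrl_rate = per_exome_rate(ctrl_count, n_ctrl)
    excess = mm_rate - ctrl_rate
    return excess, excess / mm_rate


def rate_ratio_p_greater(x1: int, n1: int, x2: int, n2: int) -> float:
    """One-sided exact test that the rate x1/n1 exceeds x2/n2.

    Under equal rates, x1 given x1 + x2 is binomial with success
    probability n1 / (n1 + n2); the P value is the upper tail from x1.
    """
    total = x1 + x2
    p = n1 / (n1 + n2)

    def pmf(k: int) -> float:
        log_choose = lgamma(total + 1) - lgamma(k + 1) - lgamma(total - k + 1)
        return exp(log_choose + k * log(p) + (total - k) * log(1 - p))

    return min(1.0, sum(pmf(k) for k in range(x1, total + 1)))


# Hypothetical counts: 120 damaging DNMs in 400 cases, 80 in 400 controls.
example_excess, example_fraction = risk_contribution(120, 400, 80, 400)
example_p = rate_ratio_p_greater(120, 400, 80, 400)
```

Because both cohorts are scaled by the same factor, the attributable fraction does not depend on the choice of region size, whereas the excess rate per exome does.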
Whole-genome analysis
We also recruited 101 trios and 1 quartet with MM and generated WGS data using Illumina HiSeq2500 and NovaSeq. The reads were preprocessed and aligned with the same pipeline as the WES data, with a mean depth of 40×. The same quality controls were applied as for the WES data, removing four trios that failed kinship integrity and contamination analyses. SNVs and small indels were called using the same variant calling and filtering as for the WES analysis. As for WES, we applied a binomial test to check whether DNMs were derived from a heterozygous state, but with a stricter P-value cut-off of 0.05, because WGS data have more stable coverage with reduced capture bias and allelic imbalance. Further population allele frequency information (gnomAD genome v.3.1.2) was added and SpliceAI (v1.3.1)58 was used (score > 0.5) to predict splice-disrupting variants. We annotated DNMs in non-coding regions using GREEN-VARAN59 (GREEN-DB schema 2.5), including the type of regulatory element, prediction supporting level and prioritization scores. For CNV analysis, CNVpytor (v.1.3.1)60 was used with a window size of 100 kb, and calls were manually curated by confirming the read depths of all fathers and mothers. To detect structural variants, we used Manta61 (v.1.6.0), Delly62 (v.0.8.1) and smoove (https://github.com/brentp/smoove; v.0.2.6) and merged the structural variant calls from the three callers using SURVIVOR (v.1.0.7; parameters, dist = 50,000, callers = 0, type = 0, strands = 0, estimate = 0 and size = 50,000) to confirm that the detected structural variants were present only in probands. CNV analysis using WES data is not presented in this study owing to the highly uncertain confidence levels observed during manual inspection, which we attribute to capture biases and inconsistent coverage depths.
MERFISH
MERFISH was done at the UCSD Epigenomics Core, as described previously63. In brief, mouse embryos at day 9.5 were fixed in paraformaldehyde and cryopreserved in 30% sucrose, embedded on dry ice and sectioned at 12 μm in a cryostat. These tissue sections then underwent fluorescent in situ hybridization with a panel of oligonucleotide probes specific for 36 of the genes identified from the current human DNM cohort, along with 107 marker genes. The 36 damaging DNM genes were selected on the basis of the candidate genes in the first recruitment of 325 trios. The marker genes were chosen using previously published single-cell data to classify cells into seven cell types: neuron, neural progenitor, neural crest, pre-epithelial to mesenchymal transition neural crest progenitor, dorsal root ganglia, blood and mesoderm64–66. During downstream analysis, the marker genes were narrowed to the 46 most efficient markers specific for each cell type (see Supplementary Data for a full list of marker genes used). Probes were designed and manufactured using a standard pipeline by Vizgen (https://vizgen.com/gene-panel/). Tissue sections were prepared and processed using the Vizgen sample-preparation protocol (document 91600002 rev. A), then samples were placed in a Merscope for imaging and decoding (document 91600001, rev. G). Raw data were analysed using Scanpy (v.1.9.1)67. Cells were preprocessed and excluded if they met any of the following criteria: volume less than 100 μm³, volume larger than three times the median cell volume, fewer than four genes detected, fewer than ten transcripts detected, or total RNA count lower than the 2% quantile or higher than the 98% quantile. Preprocessed cells were clustered on the basis of gene expression using a Leiden algorithm (leidenalg v.0.10.2) and annotated using scType68 (v.1.0) based on marker genes as the reference set (see Supplementary Notes).
Clusters were annotated with the cell type that had the highest overall score from scType, which accounts for expression of all marker genes per cell type in a focal cluster compared with expression in other clusters. Clusters with prediction scores less than 0 or low-confidence scores (score less than 10% of the number of cells in each cluster) were classified as indeterminate because they could not be assigned to any of the seven cell types. Expression of each of the 36 candidate genes was then analysed in each cell from raw data, and the percentage of total average expression per cell type was calculated (Extended Data Fig. 3). A gene was defined as specifically expressed in a cell type when it was expressed in more than 35% of that cell type.
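The per-cell quality filters used before clustering can be sketched without any single-cell library. This is an illustration only. It assumes each cell is a dictionary with hypothetical keys, that the total RNA count equals the number of detected transcripts, and that the median volume and the quantiles are computed on the unfiltered cells; the order of these steps in the actual Scanpy workflow is not stated in the text.

```python
from statistics import median, quantiles
from typing import Dict, List


def filter_cells(cells: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Drop cells that meet any of the exclusion criteria listed in the text."""
    median_volume = median(c["volume"] for c in cells)
    counts = [c["n_transcripts"] for c in cells]
    # With n=50, the first and last cut points are the 2% and 98% quantiles.
    cut_points = quantiles(counts, n=50, method="inclusive")
    low, high = cut_points[0], cut_points[-1]

    kept = []
    for cell in cells:
        if cell["volume"] < 100 or cell["volume"] > 3 * median_volume:
            continue  # too small, or more than three times the median volume
        if cell["n_genes"] < 4 or cell["n_transcripts"] < 10:
            continue  # too few genes or transcripts detected
        if cell["n_transcripts"] < low or cell["n_transcripts"] > high:
            continue  # outside the 2% to 98% quantile range
        kept.append(cell)
    return kept


# Hypothetical cells: 100 otherwise identical cells with 100-199 transcripts,
# three of which are altered to violate one criterion each.
example_cells = [{"volume": 500.0, "n_genes": 10, "n_transcripts": 100 + i}
                 for i in range(100)]
example_cells[50]["volume"] = 50.0     # below 100 cubic micrometres
example_cells[51]["volume"] = 5000.0   # above three times the median volume
example_cells[52]["n_genes"] = 2       # fewer than four genes
example_kept = filter_cells(example_cells)
```

In this toy data set the quantile filter removes the two lowest and two highest transcript counts, and the three altered cells are removed by the volume and gene-count criteria.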
Network analysis (co-localization and degree)
Relationships between damaging DNMs in MM versus control cohorts and the human genome network were calculated using STRING (v.11.5) for gene interactions. We calculated the connectedness from the STRING database of damaging DNMs from MM (n = 187) and control (n = 135) among 19,699 annotated genes, randomly resampled with a size of 80% of the gene set from control (n = 108), for 100,000 iterations. The number of edges per iteration was compared using a Wilcoxon rank-sum test (two-sided). The number of edges per node was compared between the cohorts (Wilcoxon rank-sum test, one-sided).
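The resampling comparison can be sketched as follows. This is a minimal illustration that assumes the interaction network is available as a list of gene pairs; the function names, seed handling and toy network are hypothetical, and the rank-sum comparison of the two resulting distributions is left to a statistics package.

```python
import random
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

Edge = Tuple[str, str]


def count_edges(genes: Iterable[str], edges: Sequence[Edge]) -> int:
    """Number of network edges with both endpoints inside the gene set."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def resampled_edge_counts(genes: Iterable[str], edges: Sequence[Edge],
                          sample_size: int, iterations: int,
                          seed: int = 0) -> List[int]:
    """Edge counts for repeated random subsets of a fixed size.

    In the text the subset size is 80% of the control gene set (n = 108)
    and the number of iterations is 100,000.
    """
    rng = random.Random(seed)
    pool = sorted(set(genes))
    return [count_edges(rng.sample(pool, sample_size), edges)
            for _ in range(iterations)]


# Hypothetical toy network: one fully connected set and one unconnected set.
connected_genes = ["A", "B", "C", "D", "E"]
isolated_genes = ["V", "W", "X", "Y", "Z"]
toy_edges = list(combinations(connected_genes, 2))  # 10 edges
connected_counts = resampled_edge_counts(connected_genes, toy_edges, 4, 20)
isolated_counts = resampled_edge_counts(isolated_genes, toy_edges, 4, 20)
```

Sampling both gene sets at the same size keeps the edge counts comparable even though the MM and control sets differ in size.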
Pathway enrichment analysis
To estimate whether the MM or control DNM gene sets were functionally enriched in known biological pathways (GO biological processes, Reactome and KEGG pathways), we conducted a gene enrichment analysis using ClueGO (v.2.5.10) in Cytoscape (3.10.1), with parameters set as Evidence All_without_IEA (inferred from electronic annotation), GO tree interval ranging from 3 to 8, at least 5 genes and 4% of genes for GO term selection, and a kappa score of 0.4. We adjusted the P value using Bonferroni step-down with a significance criterion of 0.05. Detailed information on the GO enrichment test can be found in Supplementary Table 2. No significant term was observed with the control gene sets.
Sanger confirmation
Of 198 damaging DNMs in the MM cohort for which trio DNA samples were available, 171 (86%) had sufficient DNA for Sanger sequencing confirmation. For a subset of damaging DNMs, primer sets were generated using Primer3 and obtained from IDT. The primer lengths were configured to span a range of 18–23 bp, targeting an ideal size of 20 bp. The GC content was set to a minimum of 30% and a maximum of 70%, and the annealing temperature was established to range between 57 °C and 62 °C, with an optimal temperature of 59 °C. PCR was performed using a QIAGEN Taq DNA polymerase kit and primer concentrations of 500 nM. Four reactions were done per DNM, amplifying DNA from the father, mother, affected child and an unrelated healthy individual. PCR products were confirmed by agarose gel electrophoresis, purified by incubation with exonuclease I/shrimp alkaline phosphatase (1:2), diluted in double-distilled water and underwent Sanger sequencing (Genewiz/Azenta). Trace files were analysed in SnapGene Viewer and compared against the UCSC genome browser to confirm that the affected child was heterozygous for the detected mutation and that the father, mother and control were homozygous for the reference base at the same locus. Because the same pipeline was applied to both cohorts, the validation rate of the DNMs from the control cohort, using SAFARI16, was expected to be similar to that of the MM cohort.
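The primer design limits above can be written down as a settings table. The sketch below expresses them with Primer3's global tag names and adds a simple check of length and GC content for a candidate sequence. Mapping the stated annealing temperatures onto Primer3's melting-temperature tags is an assumption, and the checker is a hypothetical helper that does not compute melting temperature.

```python
# Design limits from the text, keyed by Primer3 global tag names.
# Assumption: the quoted temperature range maps onto the PRIMER_*_TM tags.
PRIMER3_SETTINGS = {
    "PRIMER_MIN_SIZE": 18,
    "PRIMER_OPT_SIZE": 20,
    "PRIMER_MAX_SIZE": 23,
    "PRIMER_MIN_GC": 30.0,
    "PRIMER_MAX_GC": 70.0,
    "PRIMER_MIN_TM": 57.0,
    "PRIMER_OPT_TM": 59.0,
    "PRIMER_MAX_TM": 62.0,
}


def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = sequence.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def within_design_limits(sequence: str, settings: dict = PRIMER3_SETTINGS) -> bool:
    """Check primer length and GC content against the design limits."""
    length_ok = settings["PRIMER_MIN_SIZE"] <= len(sequence) <= settings["PRIMER_MAX_SIZE"]
    gc_ok = settings["PRIMER_MIN_GC"] <= gc_percent(sequence) <= settings["PRIMER_MAX_GC"]
    return length_ok and gc_ok
```

A check of this kind is useful mainly as a sanity test on primers ordered from a spreadsheet; Primer3 itself applies the same limits during design.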
Network propagation and submodule identification
A propagated network was constructed using NetColoc (v.0.1.7)69. PCNet70 was used for the background human gene-interaction network, which contained 18,820 nodes and 2,693,109 edges. Of the 187 MM genes, two (SLCO1B3/SLCO1B7 and H3-4) could not be included as seeds because they were not found in PCNet. As implemented in NetColoc, w prime and w double prime values were calculated for each gene to generate z scores for proximity. We used a z-score threshold of 3 as the default. The entire list of the genes in the propagated network is in the Supplementary Data. Known mouse NTD genes (n = 374) were based on previous data, for which 205 genes were organized in 2010 (ref. 31) and others were then added (see Supplementary Data for the gene list). To examine the propagated network, we performed a hypergeometric test, using the total number of genes in PCNet (M = 18,820), the number of known NTD genes (n = 374), the number of nodes in the propagated network (N = 439) and the number of overlapping genes with propagated network and MM genes (k = 17): X ~ Hypergeometric(M, n, N).
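The hypergeometric tail probability can be computed exactly with integer arithmetic. The sketch below uses the four numbers quoted in the text and treats k as the observed overlap; it is an illustration of the test, and testing the upper tail P(X ≥ k) is an assumption about how enrichment was assessed.

```python
from math import comb


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: population size, n: number of marked items in the population,
    N: number of draws, k: observed number of marked items drawn.
    """
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    # Python integers are exact, so the ratio is computed without overflow.
    return favourable / comb(M, N)


# Values quoted in the text: 18,820 PCNet genes, 374 known NTD genes,
# 439 nodes in the propagated network and an overlap of 17 genes.
expected_overlap = 439 * 374 / 18_820
p_enrichment = hypergeom_upper_tail(17, 18_820, 374, 439)
```

With these inputs the overlap expected by chance is about nine genes, so an overlap of 17 lies in the upper tail of the distribution.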
Then, to identify small functional modules from the expanded network, the propagated network was clustered with a Leiden algorithm with a resolution of 0.5 and a beta of 0.01 over 45 iterations. To use the co-expression value as attributes, we transferred the propagated network to the STRING database (v.12) using the Homo sapiens full STRING network, which removed 114 nodes without edges in the network, indicating a discrepancy in the sources of the protein network. In total, 15 small clusters with more than five nodes were generated. Five were significantly enriched for functional biological terms (FDR < 10⁻⁵) from GO biological process, KEGG pathway (without disease-related terms) and Reactome, and had more than 30% of their nodes derived from damaging DNMs (Supplementary Data). Two clusters, annotated as GTPase-involved actin cytoskeleton and microtubule-based process, were enriched with functional terms with the most significant P values (FDR < 10⁻⁸), with eight and four terms, respectively. Each submodule was named on the basis of the reported biological terms to best represent the module (Supplementary Notes). The pLI of the genes in the five functional modules can be found in Supplementary Data.
Cloning
The DNMs from patients were mutagenized into wild-type cDNA using Gibson Assembly cloning. The mutations include R231G (R230G in mouse) in Vwa8b (mouse), E521Q and E623Q in PLCE1 (human), P168L in TNK2 (human), H1149P in TIAM1 (human), R620H in BRSK2 (human) and R349C in CLIP2 (rat). The non-human protein sequences for VWA8b and CLIP2 were confirmed to cover all the human protein sequences (Supplementary Table 4). For each of the plasmids, overlapping (40 bp) forward and reverse primers with 20 bp 5′ overhangs were designed complementary to the mutation site, with a single nucleobase difference corresponding to the mutation of interest (Supplementary Data). Plasmid DNA (15 ng) was amplified using an NEB Phusion High-Fidelity 2X Master Mix [NEB, M0531] for 25 cycles, after which the reaction was subjected to 1 h at 50 °C with NEBuilder HiFi DNA Assembly 2X Master Mix [NEB, E2621], and the reaction mixture was treated with DpnI [NEB, R0176] digestion at 37 °C for 30 min. Then 2 μl of the reaction mixture was transformed into NEB DH5-α competent E. coli [NEB, C2987], with successful mutation verified using Primordium whole-plasmid sequencing.
Cell transfection
Human embryonic kidney (HEK) 293T cells were passaged using TrypLE Express Enzyme (ThermoFisher Scientific, 12604013) in Dulbecco’s modified Eagle’s medium (DMEM) with 10% FBS, and 3.3 × 10⁶ cells were seeded per 10-cm dish. The next day, each 10-cm plate was transfected with 10 μg plasmid using Lipofectamine 2000 (ThermoFisher Scientific, 11668019) following the manufacturer’s recommended guidelines. The cells were collected 72 h after transfection and cell pellets were used for protein extraction.
Protein extraction
Cell pellets were lysed using modified RIPA buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS and 140 mM NaCl in distilled water) supplemented with 1× PMSF (Research Product International, P20270) on ice, with vortexing every 5 min over a 30-min period. The lysate was centrifuged at 10,000g at 4 °C for 10 min. The supernatant was transferred to a new pre-chilled tube. The concentration of collected protein was measured using a Pierce BCA assay (Thermo Scientific, 23225) following the manufacturer’s instructions.
Western blot
We added 4× SDS–PAGE loading buffer supplemented with 10% β-mercaptoethanol (Sigma, M3148) to the required volume of protein samples and then heated the samples at 95 °C for 5 min. The protein samples (10 μg) were loaded into each well of a Mini-PROTEAN TGX (Bio-Rad, 4569031) gel and run for 30–45 min at 100 V. The proteins were then transferred to a PVDF membrane (Sigma, IPVH00010) through a 90-min transfer step at 285 mV. The membrane was blocked with blocking buffer consisting of 5% BSA (Sigma, A9418) in TBST (0.1% Tween-20 (Sigma, 11332465001) in TBS) for 1 h at room temperature. This was followed by incubation with the appropriate primary antibody (diluted in blocking buffer) at 4 °C overnight with gentle rocking. The membrane was then washed four times for 10 min in TBST, followed by incubation for 1 h at room temperature in the appropriate HRP-conjugated secondary antibody diluted 1:10,000 in blocking buffer. The membrane was then washed again four times in TBST, wetted with Pierce ECL western substrate for HRP detection (Thermo Scientific, 32106) and imaged using a chemiluminescence detector (Biotechnique, 92–14860). Quantification of protein band intensity was performed using the ImageJ band densitometry plug-in, and final plots were created using Prism. For gel source data, see Supplementary Fig. 1.
TIAM1
The following constructs were as previously described: pCMV-EGFP71, pCMV-Flag-Tiam1 (WT-Tiam1)72, pRaichu-RaichuEV-Rac1 (2248X)68 and pcDNA3.1 vector73, with mRuby3 obtained from Addgene (127808). Cos-7 cells were grown in DMEM (Corning 10–013-CV) supplemented with 100 U ml⁻¹ penicillin/streptomycin (Invitrogen, 15140122) and 10% heat-inactivated fetal bovine serum (Atlanta Biologicals, S11150H) on cell-culture-treated plastic (VWR, 10062–880) maintained at 5% CO2 and 37 °C. Cells were passaged with trypsin (Invitrogen, 25200072) once a week. For experiments, 24 h before transfection, cells were plated on either nitric acid-washed glass cover slips (Bellco Glass, 1943–10012A) or glass-bottomed plates (Cellvis, P241.5HN) coated with 20 μg ml⁻¹ poly-d-lysine (Corning, 354210). Cells were transfected using Jetprime reagent (Polyplus, 101000046) according to the manufacturer’s recommendations. Then, 24 h after transfection, the cell medium was changed to serum-free DMEM. For live-cell experiments, phenol-red-free DMEM (Corning, 17–205-CV) was used. Experiments were done 48 h after transfection. For immunocytochemistry and microscopy, cells were fixed with 4% paraformaldehyde (Fisher Scientific, AC169650010), 4% sucrose (Sigma, S0389) in phosphate buffered saline (PBS). To confirm construct expression, after fixation, all cells were immunostained overnight at 4 °C with 1:1,000 rabbit anti-Flag (Cell Signaling Technology, 14793S) primary antibody diluted in 0.3% Triton X-100, 5% goat serum in PBS. Afterwards, cells were incubated with 1:500 Cy5 goat anti-rabbit secondary antibody (Jackson ImmunoResearch, 111–175-003) for 2 h at room temperature. For filamentous-actin experiments, cells were also incubated with Texas red phalloidin (Invitrogen, T7471) during this time. After fixation and staining, cells grown on glass coverslips were mounted using the aqueous mounting solution FluorSave (EMD Millipore, 345789).
For imaging, filamentous actin was labelled with Texas red phalloidin then imaged using a Zeiss AxioObserver.Z1 microscope with a 20× objective lens. RAC1 activation was measured in cells transfected with the RaichuEV-Rac1 probe, imaged on a Zeiss LSM 880 confocal microscope at 20×. Live-imaging experiments were done at 37 °C and 5% CO2. Single-plane images were acquired using FRET (excitation, 458 nm and emission, 522–569 nm), CFP (donor) (excitation, 458 nm and emission, 463–507 nm) and YFP (acceptor) (excitation, 514 nm and emission, 522–569 nm), and presented as normalized FRET/donor values using ImageJ (NIH). Only cells expressing constructs were selected for analysis. A GFP or mRuby3 fill was used to visualize cell morphology; cells were traced as ROIs, measured for mean intensity, corrected for background and normalized to the respective controls. Experiments were conducted with experimenters blinded to the conditions. Representative images were masked using ImageJ (NIH).
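The normalized FRET/donor readout amounts to simple arithmetic on per-cell mean intensities. The sketch below shows one plausible form of that calculation. The order of operations (background subtraction per channel, then the ratio, then division by the mean control ratio) is an assumption, and the function names and intensity values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def fret_donor_ratio(fret_mean: float, donor_mean: float,
                     fret_background: float, donor_background: float) -> float:
    """Background-corrected FRET/donor ratio for one cell ROI."""
    return (fret_mean - fret_background) / (donor_mean - donor_background)


def normalize_to_control(ratios: Sequence[float],
                         control_ratios: Sequence[float]) -> List[float]:
    """Express each ratio relative to the mean ratio of the control cells."""
    reference = mean(control_ratios)
    return [ratio / reference for ratio in ratios]


# Hypothetical intensities (arbitrary units) for two control cells and two
# cells expressing a construct of interest.
control = [fret_donor_ratio(300, 200, 100, 100), fret_donor_ratio(500, 300, 100, 100)]
treated = [fret_donor_ratio(400, 200, 100, 100), fret_donor_ratio(250, 150, 100, 100)]
treated_normalized = normalize_to_control(treated, control)
```

Normalizing within each experiment in this way allows values from different imaging sessions to be pooled on a common scale.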
PLCE1
Plasmids expressing PLCE1 (wild type, E521Q or E623Q) were transfected into HEK293 cells maintained in DMEM supplemented with 10% FBS and 1% penicillin/streptomycin. Transfection was performed using Lipofectamine (Invitrogen); cells were grown for 48 h and lysed in lysis buffer (1% Triton X-100, 50 mM Tris, pH 7.4, 10 mM MgCl2, 500 mM NaCl) with added protease inhibitors (Roche). The lysates were spun down at 20,000g at 4 °C for 15 min and the supernatants were incubated overnight on a rotator with 20 μl GST beads containing GST recombinant proteins. After the incubation, samples were washed with 800 μl lysis buffer, the GST beads were collected at 500g at 4 °C for 2 min and the supernatants were aspirated; this washing was repeated three times. Proteins were eluted with 20 μl 2× Laemmli buffer. For the pull-down assay, eluted proteins were diluted in 4× SDS–PAGE loading buffer supplemented with 10% β-mercaptoethanol (Sigma, M3148), loaded into each well of an Any kDa Bio-Rad Mini-PROTEAN TGX gel and run for 30 min at 150 V. Samples were transferred to PVDF membrane (Sigma, IPVH00010) and blocked with 5% BSA (Sigma, A9418) in TBST for 1 h at room temperature. Membranes were blotted with 1:1,000 mouse anti-RAC1 (BD Transduction Labs, 610650), 1:1,000 mouse anti-Cdc42 (BD Transduction Labs, 610928), 1:1,000 rabbit anti-RHOA (Cell Signaling Technology, 2117) and 1:1,000 rabbit anti-Myc (Cell Signaling Technology, 2278), washed in TBST for 10 min, then blotted with 1:1,000 horse anti-mouse HRP-conjugated (Cell Signaling Technology, 7076P2) and 1:10,000 goat anti-rabbit HRP-conjugated (Invitrogen, 31460) secondary antibodies and washed in TBST for 10 min. They were then developed in Pierce ECL Western solution (Thermo Scientific, 32106) for 5 min and imaged using a chemiluminescence detector (Biotechnique, 92–14860); band intensity was quantified using ImageJ band densitometry and plotted using Prism.
The GST–PAK1 beads (PAK02-A) and GST–rhotekin RBD beads (RT02-A) were from Cytoskeleton. For gel source data, see Supplementary Fig. 1.
TNK2
HEK293T cells (from American Type Culture Collection) were maintained in DMEM (Corning) supplemented with 10% fetal bovine serum (Sigma) and 1× penicillin and streptomycin. The cells were transfected according to the protocol supplied with TransIT reagent (Mirus) using 7.5 μg DNA. After 48 h, the cells were collected, washed with phosphate-buffered saline and lysed in a buffer containing 25 mM Tris–HCl, pH 7.5, 100 mM NaCl, 1 mM EDTA, 1% Nonidet P-40, 5 μg ml⁻¹ leupeptin, 5 μg ml⁻¹ aprotinin, 1 mM PMSF and 200 μM Na3VO4. Cell lysis was for 30 min at 4 °C with rocking. The cell suspensions were centrifuged at 15,000 r.p.m. for 10 min and protein concentrations of the soluble lysates were measured using a colorimetric Bradford assay. The lysates (100 μg) were analysed by SDS–PAGE and western blotting using the following antibodies: 1:1,000 rabbit anti-phospho-Ack1/Tnk2 (Millipore Sigma, 09–142), 1:1,000 mouse anti-Flag (Sigma-Aldrich, A8592) and 1:1,000 mouse anti-γ-tubulin (Sigma-Aldrich, T6557), with donkey anti-rabbit (Amersham, NA9340V) and donkey anti-mouse (Amersham, NA93IV) HRP-linked secondary antibodies. Immunoprecipitation (IP)-kinase assays were performed essentially as described with the following modifications: HEK293T lysates (1 mg) were incubated with 40 μl anti-Flag M2 affinity resin (Sigma, A2220) on a rotator at 4 °C overnight, then washed three times with Tris-buffered saline (TBS). The immunoprecipitates were divided into three portions. One portion of each sample was mixed with a gel-loading buffer, boiled for 3 min and analysed by 7.5% SDS–PAGE with anti-Flag western blotting. The remaining two portions were used for duplicate activity measurements using the phosphocellulose paper-binding assay. The reactions contained 20 mM Tris–HCl, pH 7.4, 10 mM MgCl2, 0.25 mM ATP, 1 mM WASP peptide (KVIYDFIEKKG) and 20–50 cpm pmol⁻¹ [γ-³²P]ATP. Reactions were done at 30 °C for 15 min. Incorporation of ³²P into peptide was measured by scintillation counting.
For gel source data, see Supplementary Fig. 1.
DNAH5
RNA was extracted from patient and control fibroblast cells using TRIzol reagent (ThermoFisher Scientific, 15596026) following the manufacturer’s guidelines. The extracted RNA was used to synthesize cDNA using the SuperScript III First-Strand Synthesis System (ThermoFisher Scientific, 18080051) and the provided Oligo(dT)20 primer set. cDNA (30 ng) was used to perform PCR with the following DNAH5 primers using GoTaq Green Master Mix (Promega, M7122). Primers spanning DNAH5 exons 46–48, 46–49 and 1–4 were designed using Primer3 (details are in the Supplementary Data).
KDM1A
HEK293T cells were transfected with pEGFP-C2-KDM1A-WT, pEGFP-C2-KDM1A-R332C and empty pEGFP-C2 vectors, and proteins were collected to perform western blot using 1:1,000 rabbit anti-GFP primary antibody (Cell Signaling Technology, 2555), 1:1,000 mouse anti-β-actin (Santa Cruz Biotechnology, sc-47778) and 1:10,000 donkey anti-mouse HRP-conjugated secondary (Jackson ImmunoResearch, 715–035-150) and 1:10,000 donkey anti-rabbit HRP-conjugated secondary (Jackson ImmunoResearch, 711–036-152) (see the sections on cell transfections, protein extraction and western blots in the Methods). For gel source data, see Supplementary Fig. 1.
VWA8
HEK293T cells were transfected with pCMV-mVwa8b-HA WT, R230G and empty pCMV plasmids. The cells were then collected to extract protein. Western blot was then performed using 1:1,000 rabbit anti-HA (Y-11) (Santa Cruz Biotechnology, sc-805) and 1:5,000 mouse anti-α-tubulin (Sigma-Aldrich, T6074) primary antibodies and 1:10,000 donkey anti-mouse HRP-conjugated secondary (Jackson Immuno Research, 715–035-150) and 1:10,000 donkey anti-rabbit HRP-conjugated secondary (Jackson ImmunoResearch, 711–036-152) (see the sections on cell transfections, protein extraction and western blot in the Methods). For gel source data, see Supplementary Fig. 1.
CLIP2
HEK293T cells were transfected with pEGFP-CLIP2 WT, R349C (TGC or TGT), either with constitutively active AKT pE17K (Addgene, 73050) or empty vector using Lipofectamine as previously described, lysed into sample buffer and analysed by western blot using mouse anti-GFP antibody 1:200 (Thermo Fisher Scientific, A-11121) and 1:10,000 donkey anti-mouse HRP-conjugated secondary antibody (Jackson ImmunoResearch, 715–035-150). For microtubule analysis, HuH7 cells cultured in DMEM with 10% fetal calf serum (Sigma) and 1% antibiotic–antifungal mixture (Gibco) were transfected with pEGFP-CLIP2 WT, R349C or S352A (Turbofect, ThermoFisher Scientific, R0531). Time-lapse sequences (Δt = 2.4 s, 80 frames) of the CLIP2 fluorescence signal were captured using a Leica DMLB microscope through a 100× 1.3 NA objective lens and a Scion CFW1312M camera. The parameters of microtubule dynamics were computed as previously described74. Statistical comparisons were made using one-factor ANOVA. For gel source data, see Supplementary Fig. 1.
BRSK2
HEK293T cells were transfected with the pEXP-CMV-BRSK2-Flag plasmid containing wild type, K48A kinase-dead mutant and R620H patient mutation BRSK2 sequence using PEI (Sigma Millipore, 919012) and OptiMEM (Gibco, 31985–070) according to the manufacturer’s guidelines. The protein concentration of lysates was measured using the Pierce BCA protein assay kit (Thermo Scientific, 23225), then 20 μg protein was loaded into each well of a 4–12% Bis-Tris gel (Invitrogen, NP0336), transferred to nitrocellulose membranes (Thermo, 88018), and blocked then blotted with one of the following primary antibodies: 1:1,000 anti-Flag rabbit (Cell Signaling Technology, 14793S), 1:1,000 β-actin rabbit (Sigma, A5316), 1:1,000 anti-Phospho-AMPK Substrate Motif rabbit (Cell Signaling Technology, 5759), 1:1,000 mTOR rabbit (Cell Signaling Technology, 2983), 1:1,000 P-mTOR Ser 2448 rabbit (Cell Signaling Technology, 5536), 1:1,000 p70 S6 kinase rabbit (Cell Signaling Technology, 34475), 1:1,000 Phospho-p70 S6 Kinase Thr 389 rabbit (Cell Signaling Technology, 9234), 1:1,000 4E-BP1 rabbit (Cell Signaling Technology, 9644) and 1:1,000 Phospho-4E-BP1 Thr 37/46 rabbit (Cell Signaling Technology, 2855) at 4 °C overnight, rinsed in TBST for 10 min, incubated with LI-COR IRDye 680 donkey anti-mouse (LI-COR, 926–68072) or 800 donkey anti-rabbit (LI-COR, 926–32213) 1:10,000, rinsed in TBST and imaged on an Odyssey CLx using Image Studio software for analysis of band intensity. For gel source data, see Supplementary Fig. 1.
Computational prediction of D-mis DNMs
The AlphaFold Server75 was used to predict the structures of both wild-type and mutant forms of CLIP2, BRSK2, SCAP and DHCR7. For CLIP2, the wild type and R349C mutant structures were predicted in complex with the tubulin β-1 chain (UniProt ID: Q9H4B7) and tubulin α-1A chain (UniProt ID: Q71U36). The input molecules included the full-length CLIP2 homodimer, two copies each of the tubulin β-1 and tubulin α-1A chains, and two GTP and two GDP molecules to approximate the physiological conformation. Five conformations were generated per prediction, and those with protein–protein interactions between the Cap-Gly domains and tubulin chains were selected as the most biologically plausible poses, measured with PLIP (v.2.3.0)76. For BRSK2, the wild type and R620H mutant structures (residues 532–631) were predicted and subsequently docked with phosphatidylethanolamine conformers from the human cell membrane cryo-EM structure (PDB ID: 6BAJ) using AutoDock-GPU (v.1.5.3)77. The docking scores were analysed to evaluate changes in lipid affinity for the wild type and mutant forms. SCAP structures were predicted using AlphaFold, focusing on wild type and P26S mutant forms of the amino-terminal region (residues 1–48). Five predictions were analysed for each variant to assess the conformational diversity, particularly in the angle between the helices, which was observed to differ between the wild type and mutant structures. DHCR7 structure predictions using AlphaFold included wild type and W459R mutant forms of the C-terminal domain (residues 406–475). Five predictions were generated to capture the conformational diversity, analysing the impact of the mutation on helical angles and structural stability.
Xenopus modelling
Manipulation of Xenopus embryos was performed according to established protocols78. Ethics oversight was approved by the Institutional Animal Care and Use Committee of the University of Texas at Houston. Female adult Xenopus were induced to ovulate by the injection of human chorionic gonadotropin, and in vitro fertilization was performed using a homogenized small piece of testis. Embryos were dejellied in 1/3× Marc’s Modified Ringer’s solution with 2.5% (w/v) cysteine (pH 7.9) at the two-cell stage. The Xenopus gene sequences for MOs or CRISPR designs, as well as for cloning the Whamm open reading frame, were obtained from Xenbase (http://www.xenbase.org). The open reading frame of Whamm was amplified from Xenopus cDNA and cloned into the pCS10R MCC vector containing a GFP tag. Capped mRNA was synthesized using the mMESSAGE mMACHINE SP6 transcription kit (Invitrogen Ambion, AM1340). MO and/or mRNA was microinjected into two dorsal blastomeres at the four-cell stage in 2% Ficoll (w/v) in 1/3× Marc’s Modified Ringer’s solution. The injected embryos were incubated at 18 °C until the late neurula stage and fixed with 1× MEMFA (0.1 M MOPS, 2 mM EGTA, 1 mM MgSO4, 3.7% formaldehyde, pH 7.4). The MOs were purchased from Gene Tools, and their sequences were as follows:
Whamm MO (splice blocking): 5′-AAAAGTAGGAAGAAGCCCCCACCCT-3′;
Nostrin S MO (splice blocking): 5′-ATAAAATTTACTTACGGTGGAGCCT-3′;
Nostrin S MO (translation blocking): 5′-ATCTGATCCATAGCCAGTAGGAAAT-3′;
Spen MO (splice blocking): 5′-GAGCAAACAGCCGCACTCAC-3′;
Mink1S.MO (splice blocking): 5′-TGAGTCAATCCCCCTCCTTAC-3′; and Mink1L.MO (splice blocking): 5′-CCCAAATCATTCCCCTTCTTACCC-3′.
To quantify the distance between the neural folds, the folds were visualized by in situ hybridization for Pax3, following a standard protocol78. Images were captured using a ZEISS Axio Zoom V16 stereomicroscope and associated Zen software. The distance between neural folds was measured between the medial sides of the Pax3 expression domains, perpendicular to the anterior–posterior axis, and the measurements were averaged from the anterior to the posterior regions of the neural folds. All analyses were performed using Fiji. For Spen, 25, 17 and 19 embryos were tested (control, 10 ng and 20 ng, respectively); for Mink1, 13, 12 and 10 embryos were examined (control, 5 ng and 10 ng, respectively); for Nostrin and Whamm, 14, 16, 16, 17 and 12 embryos were examined (control, Nostrin 5 ng, Whamm 5 ng, Nostrin 10 ng and Whamm 10 ng, respectively); for the combination of Whamm and Nostrin, 11 and 12 embryos were examined (5 ng + 5 ng and 10 ng + 10 ng, respectively); for the Whamm rescue analysis, 16, 20, 19 and 14 embryos were examined (control, Whamm MO, Whamm MO + normal mRNA and normal mRNA only, respectively). To verify the efficiency of the MOs, PCR with reverse transcription was performed. MOs were injected into all cells at the four-cell stage, and total RNA was extracted from four embryos per sample using the TRIZOL reagent (Invitrogen, 15596026). cDNAs were synthesized using M-MLV reverse transcriptase (Invitrogen, 28025013) and random primers (NEB, S1330S). cDNAs were amplified by Taq polymerase (NEB, M02735) or Phusion High-Fidelity DNA polymerase (NEB, M0530S) using the following primers:
Whamm: 5′-CGGTGCAGGACTTGGATTAT-3′ (212 forward), 5′-CCCATTTAGCTACCTCCTTCTG-3′ (806 reverse);
Nostrin S: 5′-CTCCACCTATGACAGGGTTTAC-3′ (45 forward), 5′-GGTTGTACCGAGTGAGGATATG-3′ (722 reverse);
Spen: 5′-GAAACCAGGCACCTCTGGGTG-3′ (10 forward), 5′-CCGTTCATAACGTCCTTCCCG-3′ (393 reverse); and Mink1 L and S: 5′-CCAGCTCGGAGTTTGGATGAC-3′ (16 forward), 5′-CAGTCTTCCTTCAGGGCATTTCC-3′ (386 reverse).
For the CRISPR assay, guide RNAs (gRNAs) were designed using CHOPCHOP software (https://chopchop.cbu.uib.no) and synthesized by Synthego (synthego.com). Recombinant Streptococcus pyogenes Cas9 nuclease (Alt-R S.p. Cas9 Nuclease V3, 1081058) was purchased from Integrated DNA Technologies. Then, 320 pg of each gRNA and 1.5 ng of Cas9 were injected into one-cell-stage Xenopus embryos, which were incubated at 18 °C until they reached the late neurula stage. Genomic DNA was extracted using the QIAwave DNA Blood & Tissue Kit (Qiagen, 69554), and CRISPR targeting regions were amplified using Phusion High-Fidelity DNA polymerase (NEB, M0530S) or Q5 High-Fidelity DNA polymerase (NEB, M0491) for genotyping. The gRNA target sequences were as follows; the PAM is the 3′-terminal NGG of each sequence:
Spen: 5′-GGGAGACAGAGACCTTCGAACGG-3′, 5′-ATCCTCGGTGTACCTCCCAGTGG-3′;
Whamm: 5′-CATCGTGGCCTGGAACCAGGTGG-3′, 5′-CCTGGAGCGTTACTTCGGGGCGG-3′;
Nostrin S: 5′-GGACGACAGCAGAAGTACACAGG-3′, 5′-CGGATTTCTGACAGATCAAGAGG-3′; and Mink1: 5′-AAAGGAAGAGCGCCGCCGAGTGG-3′, 5′-CGTCCATCTGTTGGAAGCGTCGG-3′.
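As an illustrative aid that is not part of the published methods, the following Python sketch splits each 23-nt target listed above into a 20-nt protospacer and a 3-nt PAM and checks that the PAM fits the NGG motif recognized by S. pyogenes Cas9. The function names `split_target` and `has_ngg_pam` are hypothetical; only the sequences are taken from the list above.

```python
# Minimal sketch: separate protospacer and PAM for the gRNA targets listed above.
# Function names are hypothetical; sequences are copied from the target list.

TARGETS = {
    "Spen": ["GGGAGACAGAGACCTTCGAACGG", "ATCCTCGGTGTACCTCCCAGTGG"],
    "Whamm": ["CATCGTGGCCTGGAACCAGGTGG", "CCTGGAGCGTTACTTCGGGGCGG"],
    "Nostrin S": ["GGACGACAGCAGAAGTACACAGG", "CGGATTTCTGACAGATCAAGAGG"],
    "Mink1": ["AAAGGAAGAGCGCCGCCGAGTGG", "CGTCCATCTGTTGGAAGCGTCGG"],
}


def split_target(site: str) -> tuple:
    """Return (20-nt protospacer, 3-nt PAM) for a 23-nt SpCas9 target site."""
    site = site.upper()
    if len(site) != 23:
        raise ValueError(f"expected 23 nt, got {len(site)}")
    return site[:20], site[20:]


def has_ngg_pam(site: str) -> bool:
    """True if the last three nucleotides match NGG (N = any of A, C, G, T)."""
    _, pam = split_target(site)
    return pam[0] in "ACGT" and pam[1:] == "GG"


if __name__ == "__main__":
    for gene, sites in TARGETS.items():
        for site in sites:
            protospacer, pam = split_target(site)
            print(gene, protospacer, pam, has_ngg_pam(site))
```

Every listed target is 23 nt long and ends in NGG, which is consistent with the PAM occupying the final three positions.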
The following PCR primers were used for genotyping:
Spen: 5′-CAAGAGGGGATCTGATGGTGG-3′, 5′-CCATCGAGTCTCCGTTCATAACG-3′;
Whamm: 5′-GTGTGCCTTGTAAGTCACGTG-3′, 5′-CCGGGTCACTTGCTCCTCTAG-3′;
Nostrin S: 5′-CTCACATTGCCATAGCTTCCCC-3′, 5′-GGCAGCAAATGGTACAGAATCG-3′;
Mink1 L: 5′-CAATAATGAATGTGCCGGGGG-3′, 5′-CATTGCAGGATCCACGTGAC-3′; and Mink1 S: 5′-GTTCAATAATGAACGTGCCGGG-3′, 5′-CTCATGAAGGTTCGCCGAAAC-3′.
Extended Data
Extended Data Fig. 1 |. Power calculation of estimating a cohort size for DNM detection.
Power calculation showing potential number of discovered genes compared with cohort size (350 trios), for two different v values (enrichment ratio of loss of function variants in case versus control) and two different k values (assumed number of risk genes). For instance, if there are 50 genes to discover (k=50), a cohort of 400 trios will identify 16 genes if LOF variants are 2.5x more common in affected (v=2.5). All calculations manage a conservative false discovery rate (FDR). Gray dash: FDR.
Extended Data Fig. 2 |. Spatial expression of DNM genes with MERFISH in E9.5 mouse embryos.

Spatial expression of the MM genes with damaging DNMs. a, Gene expression of marker genes for seven selected cell types (neuron, neural progenitor, pre-epithelial to mesenchymal transition neural crest progenitor (NC progenitor), neural crest, mesoderm, dorsal root ganglia, and blood), in two embryonic replicates. b, Six spatial expression patterns of damaging DNM genes with specific (left) and broad (right) expression. Full MERFISH images of the 36 genes can be found on GitHub (https://github.com/Gleeson-Lab/Publications/tree/main/MM_DNM).
Extended Data Fig. 3 |. Cell type expression of the DNM genes in MERFISH.
a, Expression at E9.5 of the 36 damaging DNMs in seven cell types: neuron, neural progenitor, pre-epithelial to mesenchymal transition neural crest progenitor (Pre-EMT-NCP), neural crest, mesoderm, dorsal root ganglia, and blood. Indeterminate refers to the cells that were not specified with the marker genes designed for the seven cell types. b, Expression of marker genes used for specifying the cell types in MERFISH. Marker genes are shown within the cell type category which they represent.
Extended Data Fig. 4 |. A human protein network constructed with damaging DNMs contributing to MM risk.
By using the 187 damaging MM DNM genes, a propagated network was generated with NetColoc69 with a background protein network PCNet70, incorporating 439 nodes and 2,447 edges. Big blue circle: damaging DNM genes, Small purple circle: propagated gene, green border: known mouse NTD genes. The network is visualized with Cytoscape with STRING database.
Extended Data Fig. 5 |. H1149P patient mutation impairs TIAM1 activity and PLCE1 patient mutation E623Q leads to diminished GTP-bound RhoA.
a, H1149P mutation is located within the Dbl homology (DH) domain responsible for GEF activity. TIAM1 contains an N-terminal pleckstrin homology (PH), coiled-coil (CC), extension (Ex), RAS binding (RBD), PDZ, Dbl-homology (DH) and PH domains, with the patient mutation falling within the DH domain. b, Schematic of PLCE1 protein with domains annotated. Patient E623Q mutation is located in the Ras GEF domain. PLCE1 contains a Guanine nucleotide exchange factor for Ras-like small GTPases (RAS GEF), Pleckstrin Homology (PH), Phospholipase C catalytic domain X (PLCX), Phospholipase C catalytic domain Y (PLCY), Protein Kinase C conserved region 2 (C2), RAS association domain 1 (RA1), and RAS association domain 2 (RA2). c, Construct expression H1149P (n = 76) is equivalent to wildtype (n = 85) in Phalloidin quantification. d, Construct expression H1149P in constitutive active (C.A.) (n = 53). Src Rac1 Förster resonance energy transfer (FRET) is equivalent to wildtype. Kruskal-Wallis followed by a two-sided pairwise Wilcoxon test, P value adjusted with Bonferroni. Data shown with Hampel filter. Error bar: standard error of the mean. P values: ns: not significant. e, Active GTP-bound form of RhoA precipitated from HEK293 expressing Myc-tagged PLCE1 using a GST-rhotekin pulldown assay. Overexpression of WT PLCE1 resulted in a substantial decrease in relative RhoA activity compared with mock cells. Compared to WT, cells transfected with variant forms of PLCE1 exhibited marked differences in GTP-bound RhoA.
Extended Data Fig. 6 |. The P168L patient mutation impairs TNK2 activity and intronic mutation at the splice acceptor site before exon 47 leads to alternative splicing of DNAH5.
a, P168L mutation is located within the kinase domain. TNK2 contains sterile alpha motif (SAM), Src homology 3 (SH3), CDC42 and RAC-interactive binding (CRIB), Mig6 homology region (MHR), and ubiquitin-associated domain (UBA). b, Blots for the A156T kinase dead, wildtype, and the patient mutation P168L. Lysates were probed with pY284-Ack1 (top), Ack1-flag (middle), and gamma-tubulin (bottom). Repeated independently with similar results four times. Ack1 refers to TNK2. c, TNK2 P168L patient mutation impaired WASP phosphorylation from immunoprecipitation (IP) kinase assay, compared to WT and kinase dead A156T. d, Location of chr5:13807727 T > G patient mutation in DNAH5 gene. e, Primer design for detecting altered splicing in DNAH5 cDNA: pair (i) spanning exons 46–48, pair (ii) spanning exons 46–49 and control pair (iii) spanning exons 1–4. f, RT-PCR results using primers listed in e showing altered splicing for exon 46–48 and 46–49 in patient cDNA, but not in controls.
Extended Data Fig. 7 |. KDM1A R332C patient mutation and VWA8b patient mutation R230G significantly reduce protein expression levels.
a, Schematic of KDM1A protein domains. R332C patient mutation is located in the amino-oxidase domain of KDM1A protein. b, Protein levels of WT and R332C KDM1A detected by western blot from HEK293T cells transfected with pEGFP-C2-KDM1A WT or R332C plasmids. c, Quantification of GFP-KDM1A band intensities from b normalized with β-actin loading control (n = 3). Bar: median, Error bar: interquartile range. Two-tailed unpaired t test with Welch’s correction, P value * = 0.0316. d, Schematic of protein domains for human VWA8b consisting of NTPase, Walker A (WA), ATP binding, and Walker B (WB) domains and patient mutation R230G in the NTPase domain. e, Protein levels of HA-tagged mVwa8b empty vector (EV), WT and R230G overexpressed in HEK293T cells detected using anti-HA antibody; alpha tubulin used as loading control. f, Quantification of HA band intensity from panel e normalized to loading control (n = 3). Bar: mean, Error bar: standard error of the mean (SEM). One-way ANOVA, ****: P value < 0.0001.
Extended Data Fig. 8 |. Validation of Spen and Mink1 knockdown.

a, Schematics of SPEN and MINK1 proteins with domains annotated and patient mutations. RRM, RNA recognition motif. MINT, Msx2-interacting protein. SPOC, Spen paralogue and orthologue C-terminal domain. CNH, Citron homology domain. b, Dorsal views of Xenopus laevis embryos subjected to in situ hybridization for Pax3 to visualize the neural folds in Spen morphants. c, RT-PCR confirmed that Spen MO reduced the amount of normally spliced Spen transcript. d, Dorsal views of Xenopus laevis embryos subjected to in situ hybridization for Pax3 in Mink1 morphants. e, Validation of Mink1 morpholinos by RT-PCR. c,e, Each experiment was performed independently at least twice with similar results. f, Dorsal views of embryos injected with Spen gRNAs only or gRNAs with Cas9, with the accompanying chromatogram showing Sanger sequencing at the CRISPR target site. Control embryos injected with gRNAs only (#1 and #2) developed normally and exhibited an intact sequence, while embryos injected with Spen gRNAs and Cas9 (#3-#6) displayed neural tube defects and mosaic mutations at the CRISPR target site. g, Dorsal views of embryos injected with Mink1 gRNAs only or Mink1 gRNAs with Cas9, with the accompanying chromatogram showing Sanger sequencing at the CRISPR target site. Mink1 crispants (#2-#4) exhibited neural tube defect phenotypes and mosaic mutations at the CRISPR target site in both the L and S alleles of Mink1.
Extended Data Fig. 9 |. Validation of Whamm knockdown.
a, Schematic of WHAMM protein with domains annotated and patient mutations. JMY, Junction-mediating and WASP homolog-associated domain. JMY_N, N-terminal of JMY. WH2, WASP-homology 2 domain. b, The neural tube closure defect phenotype induced by Whamm MO (10 ng) was rescued through the injection of Whamm mRNA (700 pg). Embryos injected only with mRNA showed no significant phenotype. c, Dorsal views of Xenopus embryos at Stage 19, subjected to in situ hybridization for Pax3 to visualize the neural folds. d, Quantification of the average distance between neural folds in Whamm MO with rescue Whamm mRNA. The rescue experiment was repeated independently with similar results; Control (n = 16), Whamm MO (n = 20), Whamm MO + mRNA (n = 19), mRNA (n = 14). Box plot indicates the median (center line) and the interquartile range (bounds of the box); the whiskers represent the minimum and maximum. P values by one-way ANOVA, followed by Tukey’s multiple comparison test: **** < 0.0001, ns: not significant. e, RT-PCR confirmed that Whamm MO reduced the amount of normally spliced Whamm transcript. f, Schematic showing gRNA regions designed to target the Whamm gene and primer sites for genotyping. g, Control embryos (#1-#5) developed normally, while crispants (#6-#10) displayed neural tube defects. h, Genotyping in the target area. PCR products from control embryos (#1-#5) are approximately 631 bp, while those from Whamm crispants (#6-#10, except #8) are around 331 bp, indicating a deletion of approximately 300 bp. e,h, Each experiment was performed independently at least twice with similar results. i, Comparison of sequence between control (#4 in panel g is shown) and crispant (#8 in panel g is shown) at the Whamm CRISPR target site. Although the #8 embryo did not exhibit the 300 bp deletion, the Sanger sequencing result shows that it has mosaic mutations.
Extended Data Fig. 10 |. Validation of Nostrin knockdown and synergistic effect with Whamm.

a, Schematic of NOSTRIN protein with domains annotated and patient mutations. FCH, Fes/CIP4, and EFC/F-BAR homology domain. F-BAR, Fes/CIP4 homology – Bin-Amphiphysin-Rvs domain. HR1, REM-1 domain. SH3, Src homology 3 domain. b, RT-PCR confirmed that the splice-blocking MO for Nostrin S reduced the amount of normally spliced Nostrin transcript. The experiment was performed independently at least twice with similar results. c, Dorsal views of embryos injected with Nostrin gRNAs only or Nostrin gRNAs with Cas9, with the accompanying chromatogram showing Sanger sequencing at the CRISPR target site. Control embryos injected with gRNAs only (#1) developed normally and exhibited intact sequences, while embryos injected with Nostrin gRNAs and Cas9 (#2-#4) displayed neural tube defects and mosaic mutations at the CRISPR target site. d, Neural folds visualized by in situ hybridization for Pax3 in Nostrin splice-blocking MO and/or Whamm MO. e, Dorsal views of late neurula embryos injected with Nostrin translation-blocking MO and/or Whamm MO.
Extended Data Table 1 |.
De novo SNV and indel rates (× 10⁻⁸) of WGS in coding and noncoding regions
| Mutation type | Category | Coding | Noncoding | Total |
|---|---|---|---|---|
| SNV + Indel | DNM count | 1,976 | 8,817 | 10,793 |
| | DNM count per proband | 20.16 | 89.97 | 110.13 |
| | Rate (× 10⁻⁸) | 0.315 | 1.4 | 1.72 |
| SNV | DNM count | 1,658 | 7,429 | 9,087 |
| | DNM count per proband | 16.92 | 75.81 | 92.72 |
| | Rate (× 10⁻⁸) | 0.26 | 1.18 | 1.45 |
| Indel | DNM count | 318 | 1,388 | 1,706 |
| | DNM count per proband | 3.24 | 14.16 | 17.41 |
| | Rate (× 10⁻⁸) | 0.05 | 0.22 | 0.27 |
Rates were calculated with the 98 trios that could be analyzed after removal of four trios that failed kinship checks.
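The per-proband rows of the table can be checked with a short Python sketch, assuming that each per-proband value is simply the DNM count divided by the number of analysed trios. The helper name `per_proband` is hypothetical, and `N_TRIOS = 98` is an assumption chosen because it is the divisor that reproduces the tabulated values to two decimal places.

```python
# Minimal sketch: recompute "DNM count per proband" from the DNM counts above.
# Assumption: per-proband value = DNM count / number of analysed trios.

N_TRIOS = 98  # assumed divisor; it reproduces every tabulated per-proband value

DNM_COUNTS = {
    "SNV + Indel": {"coding": 1976, "noncoding": 8817, "total": 10793},
    "SNV": {"coding": 1658, "noncoding": 7429, "total": 9087},
    "Indel": {"coding": 318, "noncoding": 1388, "total": 1706},
}


def per_proband(count: int, n_trios: int = N_TRIOS) -> float:
    """Average number of DNMs per proband, rounded to two decimal places."""
    return round(count / n_trios, 2)


if __name__ == "__main__":
    for mutation_type, counts in DNM_COUNTS.items():
        row = {region: per_proband(n) for region, n in counts.items()}
        print(mutation_type, row)
```

The coding and noncoding counts also sum to the totals in each row (for example, 1,976 + 8,817 = 10,793), which gives a second consistency check on the table.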
Extended Data Table 2 |.
Damaging DNM genes with functional validation
| Functional module | Gene | DNM category (patient mutation) | Functional validation (assay) | Interpretation |
|---|---|---|---|---|
| GTPase involved actin cytoskeleton | WHAMM | LGD (R541*) | in vivo (Knockdown with splice-blocking MO & CRISPR) | Open neural tube in Xenopus laevis |
| | NOSTRIN | LGD (R298*) | in vivo (Knockdown with splice-blocking and translation-blocking MO & CRISPR) | Open neural tube in Xenopus laevis |
| | TIAM1 | D-Mis-HC (H1149P) | in vitro FRET Assay | Reduced Rac1 activation with C.A. Src |
| | PLCE1 | D-Mis (E623Q) | in vitro GTP-Rho assay | Reduced the active GTP-bound RhoA |
| | VWA8 (doubleton) | LGD (L863*), D-Mis (R230G) | in vitro Protein expression analysis (western blot) | Protein production failure |
| | BRSK2 (doubleton) | D-Mis-HC (R620H) | Computational analysis (protein docking analysis) and in vitro Kinase assay | Reduced protein docking to membrane. Kinase assay revealed no changes in kinase activity. |
| Microtubule-based process | DNAH5 | LGD (R541*) | in vitro Splicing assay (RT-PCR) | Alternative splicing |
| | CLIP2 | D-Mis (R349C) | Computational analysis (AlphaFold) | Interaction domain switched to CLIP2 MTB domain 1 (CAP-Gly 1) and alpha subunit |
| Chromatin modifying enzyme | SPEN | LGD (Q665*) | in vivo (Knockdown with splice-blocking MO & CRISPR) | Open neural tube in Xenopus laevis |
| | KDM1A | D-Mis-HC (R332C) | in vitro Protein expression analysis (western blot) | Reduced expression |
| Netrin-1 signaling | MINK1 | LGD (Q301*) | in vivo (Knockdown with splice-blocking MO & CRISPR) | Open neural tube in Xenopus laevis |
| | TNK2 | D-Mis (P168L) | in vitro Kinase assay | Reduced kinase activity toward WASP pseudo-substrate |
| Lipid metabolism | SCAP | D-Mis-HC (P26S) | Computational analysis (Structure prediction) | Possibly obstructed protein interaction |
| | DHCR7 | D-Mis-HC (W427R) | Computational analysis (Structure prediction) | Reduced structural stability |
Fourteen MM damaging DNM genes are listed with functional submodules. The genes were functionally evaluated with in vivo, in vitro, or computational analysis. LGD: likely gene disrupting, D-Mis: damaging missense, D-Mis-HC: damaging missense of high confidence, MO: morpholino.
Supplementary Material
The online version contains supplementary material available at https://doi.org/10.1038/s41586-025-08676-x.
Acknowledgements
We thank the individuals with meningomyelocele and their families who participated in this study; K. James, R. George, B. Copeland, V. Stanley, C. Shen and J. Venneri from the Spina Bifida Sequencing Consortium for recruitment and data technical support; staff at the UCSD Laboratory for Pediatric Brain Disease for clinical and technical support; B. Rosenthal and K. Fisch for statistical modelling; staff at the Broad Institute, the Yale Genetic Center, the Regeneron Genetics Center, the UCSD Institute for Genomic Medicine, the UC Irvine Sequencing Center and the Rady Children’s Institute for Genomic Medicine for sequencing support; B. Craddock for functional analysis of TNK2; and the Spina Bifida Association for recruitment. This work was supported by the Center for Inherited Disease Research (grant HHSN268201700006I), the Yale Center for Genomic Analysis, the Broad Institute, the UC Irvine Genomics Core, the UCSD Institute for Genomic Medicine, the UCSD Imaging Core (grants X01HD100698, X01HD110998, HD114132, P01HD104436 and U54OD030187); the Howard Hughes Medical Institute, the Dickinson Foundation and the Rady Children’s Institute for Genomic Medicine to J.G.G.; the National Research Foundation of Korea, funded by the Ministry of Science and ICT (MSIT) (RS-2023–00278314) and the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare (RS-2024–00438443, RS-2024–00405260), to Y.-J.J.H. and S.K.; the Science and Technology Development Fund (STDF) of Egypt (33650) with ethical approval 20105 to M.M.N., A.M.S.S., MY.I., and a VA Merit Award (I01 BX006248) to W.T.M.
Competing interests
A. Alkelai and A.R.S. are full-time employees of Regeneron Genetics Center. S.K. is a cofounder of AIMA, which seeks to develop techniques for early cancer diagnosis based on circulating tumour DNA. R.H.F. previously led TeratOmic Consulting, which is now defunct, and received travel funds for Reproductive and Developmental Medicine editorial board meetings.
Spina Bifida Sequencing Consortium
Allison E. Ashley Koch17, Hal S. Meltzer19, Joan T. Le19, Kit Sing Au16, Hope Northrup16, Gyang Markus Bot15, Valeria Capra31, Richard H. Finnell18, Zoha Kibar14, Philip J. Lupo39, Helio R. Machado20, Camila Araújo20, Tony Magana40, Ahmed I. Marwan30, Gia Melikishvili29, Osvaldo M. Mutchinick26, Roger E. Stevenson21, Anna Yurrita22, Maha S. Zaki33, Sara Mumtaz23, José Ramón Medina-Bereciartu27, Caroline M. Kolvenbach41, Shirlee Shril41, Friedhelm Hildebrandt28, Mahmoud M. Noureldeen32, Aida M. S. Salem32, Yukitoshi Takahashi42, Hormos Salimi-Dafsari43, H. Westley Phillips44, Brian Hanak45, Bülent Kara46, Ayfer Sakarya Güneş46, David D. Gonda19, Salman Kirmani47, Tinatin Tkemaladze48 & Joseph G. Gleeson1,2
39Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA. 40Mekelle University, Mekelle, Ethiopia. 41Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA. 42NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan. 43Department of Pediatrics, University Hospital Cologne, University of Cologne, Cologne, Germany. 44Department of Pediatric Neurosurgery, Stanford University, Palo Alto, CA, USA. 45Children’s Hospital of Orange County, Orange, CA, USA. 46Department of Child Health and Diseases, Department of Child Neurology, Umuttepe Campus, Kocaeli University, Kocaeli, Turkey. 47Division of Woman and Child Health, Human Development Program, The Aga Khan University, Karachi, Pakistan. 48Department of Molecular and Medical Genetics, Tbilisi State Medical University, Tbilisi, Georgia.
Footnotes
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-025-08676-x.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The WES and WGS sequencing data used in this study are available in publicly accessible databases for the 1,146 subjects in the database of Genotypes and Phenotypes (phs003746.v1.p1 and phs002591.v1.p1). Pedigree information with database of Genotypes and Phenotypes identifiers is available in the Supplementary Data. Sequencing data for the remaining subjects cannot be deposited in public repositories because they were enrolled in the study with consent forms that did not conform to current data-sharing requirements. Summary data for these subjects are available from the corresponding author (J.G.G.) on reasonable request. Source data are provided with this paper.
Code availability
The computational codes used in this study are available at GitHub (https://github.com/Gleeson-Lab/Publications/tree/main/MM_DNM).
References
- 1. Iskandar BJ & Finnell RH. Spina bifida. N. Engl. J. Med 387, 444–450 (2022).
- 2. Lee S. & Gleeson JG. Closing in on mechanisms of open neural tube defects. Trends Neurosci. 43, 519–532 (2020).
- 3. MRC Vitamin Study Research Group. Prevention of neural tube defects: results of the Medical Research Council vitamin study. Lancet 338, 131–137 (1991).
- 4. Arnold JA. Myelocyste, transposition von gewebskeimen und sympodie. Beitr. Pathol. Anat 16, 1–28 (1894).
- 5. Chiari H. Uber veränderungen des kleinhirns infolge von hydrocephalie des grosshirns. Dtsch. Med. Wochenschr 17, 1172–1175 (1891).
- 6. Wilde JJ, Petersen JR & Niswander L. Genetic, epigenetic, and environmental contributions to neural tube closure. Annu. Rev. Genet 48, 583–611 (2014).
- 7. Carter CO & Evans K. Spina bifida and anencephalus in greater London. J. Med. Genet 10, 209–234 (1973).
- 8. Zhang T. et al. Genetic variants in the folate pathway and the risk of neural tube defects: a meta-analysis of the published literature. PLoS ONE 8, e59570 (2013).
- 9. Lei Y. et al. Identification of novel CELSR1 mutations in spina bifida. PLoS ONE 9, e92207 (2014).
- 10. Kibar Z. et al. Mutations in VANGL1 associated with neural-tube defects. N. Engl. J. Med 356, 1432–1437 (2007).
- 11. Morrison K. et al. Genetic mapping of the human homologue (T) of mouse T (Brachyury) and a search for allele association between human T and spina bifida. Hum. Mol. Genet 5, 669–674 (1996).
- 12. Jensen LE, Etheredge AJ, Brown KS, Mitchell LE & Whitehead AS. Maternal genotype for the monocyte chemoattractant protein 1 A(−2518)G promoter polymorphism is associated with the risk of spina bifida in offspring. Am. J. Med. Genet. A 140, 1114–1118 (2006).
- 13. Iossifov I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
- 14. Zaidi S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013).
- 15. Lemay P. et al. Loss-of-function de novo mutations play an important role in severe human neural tube defects. J. Med. Genet 52, 493–497 (2015).
- 16. Fischbach GD & Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
- 17. Rahbari R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet 48, 126–133 (2016).
- 18. Kessler MD et al. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc. Natl Acad. Sci. USA 117, 2560–2569 (2020).
- 19. Besenbacher S. et al. Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat. Commun 6, 5969 (2015).
- 20. Frome EL. The analysis of rates using Poisson regression models. Biometrics 39, 665–674 (1983).
- 21. Willsey AJ et al. De novo coding variants are strongly associated with Tourette disorder. Neuron 94, 486–499 (2017).
- 22. Dong C. et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum. Mol. Genet 24, 2125–2137 (2015).
- 23. Kong A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).
- 24. Goldmann JM, Veltman JA & Gilissen C. De novo mutations reflect development and aging of the human germline. Trends Genet. 35, 828–839 (2019).
- 25. Turner TN et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 (2017).
- 26. Martin J. et al. A brief report: de novo copy number variants in children with attention deficit hyperactivity disorder. Transl. Psychiatry 10, 135 (2020).
- 27. Sanders SJ et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011).
- 28. Hol FA et al. A frameshift mutation in the gene for PAX3 in a girl with spina bifida and mild signs of Waardenburg syndrome. J. Med. Genet 32, 52–56 (1995).
- 29. Szklarczyk D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
- 30. Cowen L., Ideker T., Raphael BJ. & Sharan R. Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet 18, 551–562 (2017).
- 31. Harris MJ & Juriloff DM. An update to the list of mouse mutants with neural tube closure defects and advances toward a complete genetic perspective of neural tube closure. Birth Defects Res. A Clin. Mol. Teratol 88, 653–669 (2010).
- 32. Traag VA, Waltman L. & van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep 9, 5233 (2019).
- 33. Rolo A, Escuin S, Greene NDE & Copp AJ. Rho GTPases in mammalian spinal neural tube closure. Small GTPases 9, 283–289 (2018).
- 34. Wallingford JB, Niswander LA, Shaw GM & Finnell RH. The continuing challenge of understanding, preventing, and treating neural tube defects. Science 339, 1222002 (2013).
- 35. Niederkofler V, Salie R, Sigrist M. & Arber S. Repulsive guidance molecule (RGM) gene function is required for neural tube closure but not retinal topography in the mouse visual system. J. Neurosci 24, 808–818 (2004).
- 36. Kee N, Wilson N, Key B. & Cooper HM. Netrin-1 is required for efficient neural tube closure. Dev. Neurobiol 73, 176–187 (2013).
- 37. Greene ND, Stanier P. & Moore GE. The emerging role of epigenetic mechanisms in the etiology of neural tube defects. Epigenetics 6, 875–883 (2011).
- 38. Akimova D. et al. Metabolite profiling of whole murine embryos reveals metabolic perturbations associated with maternal valproate-induced neural tube closure defects. Birth Defects Res. 109, 106–119 (2017).
- 39. Copp AJ, Stanier P. & Greene ND. Neural tube defects: recent advances, unsolved questions, and controversies. Lancet Neurol. 12, 799–810 (2013).
- 40. Schaar BT & McConnell SK. Cytoskeletal coordination during neuronal migration. Proc. Natl Acad. Sci. USA 102, 13652–13657 (2005).
- 41. Dent EW, Gupton SL & Gertler FB. The growth cone cytoskeleton in axon outgrowth and guidance. Cold Spring Harb. Perspect. Biol 3, a001800 (2011).
- 42. Geelen JA & Langman J. Closure of the neural tube in the cephalic region of the mouse embryo. Anat. Rec 189, 625–640 (1977).
- 43. Rolo A. et al. Regulation of cell protrusions by small GTPases during fusion of the neural folds. eLife 5, e13273 (2016).
- 44.Hamosh A, Scott AF, Amberger JS, Bocchini CA & McKusick VA Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Jin SC et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet 49, 1593–1601 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Halvorsen M. et al. De novo mutations in childhood cases of sudden unexplained death that disrupt intracellular Ca2+ regulation. Proc. Natl Acad. Sci. USA 118, e2115140118 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Li W. et al. De novo mutations contributes approximately 7% of pathogenicity in inherited eye diseases. Invest. Ophthalmol. Vis. Sci 64, 5 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Boyle EA, Li YI & Pritchard JK An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Lemos MC et al. Genetic background influences embryonic lethality and the occurrence of neural tube defects in Men1 null mice: relevance to genetic modifiers. J. Endocrinol 203, 133–142 (2009). [DOI] [PubMed] [Google Scholar]
- 50.Momb J. et al. Deletion of Mthfd1l causes embryonic lethality and neural tube and craniofacial defects in mice. Proc. Natl Acad. Sci. USA 110, 549–554 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Chen Z et al. Threshold for neural tube defect risk by accumulated singleton loss-of-function variants. Cell Res. 28, 1039–1041 (2018).
- 52. Bassuk AG et al. Copy number variation analysis implicates the cell polarity gene glypican 5 as a human spina bifida candidate gene. Hum. Mol. Genet. 22, 1097–1111 (2013).
- 53. Rendeli C et al. Assessment of health status in children with spina bifida. Spinal Cord 43, 230–235 (2005).
- 54. Dimitromanolakis A, Paterson AD & Sun L. Fast and accurate shared segment detection and relatedness estimation in un-phased genetic data via TRUFFLE. Am. J. Hum. Genet. 105, 78–88 (2019).
- 55. Lee J et al. Mutalisk: a web-based somatic MUTation AnaLyIS toolKit for genomic, transcriptional and epigenomic signatures. Nucleic Acids Res. 46, W102–W108 (2018).
- 56. Quinlan AR & Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
- 57. Ioannidis NM et al. REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
- 58. Jaganathan K et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).
- 59. Giacopuzzi E, Popitsch N & Taylor JC. GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data. Nucleic Acids Res. 50, 2522–2535 (2022).
- 60. Suvakov M, Panda A, Diesh C, Holmes I & Abyzov A. CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing. Gigascience 10, giab074 (2021).
- 61. Chen X et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
- 62. Rausch T et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
- 63. Chen KH, Boettiger AN, Moffitt JR, Wang S & Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
- 64. Delile J et al. Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. Development 146, dev173807 (2019).
- 65. Soldatov R et al. Spatiotemporal structure of cell fate decisions in murine neural crest. Science 364, eaas9536 (2019).
- 66. Simões-Costa M & Bronner ME. Establishing neural crest identity: a gene regulatory recipe. Development 142, 242–257 (2015).
- 67. Wolf FA, Angerer P & Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
- 68. Komatsu N et al. Development of an optimized backbone of FRET biosensors for kinases and GTPases. Mol. Biol. Cell 22, 4647–4656 (2011).
- 69. Rosenthal SB et al. Mapping the common gene networks that underlie related diseases. Nat. Protoc. 18, 1745–1759 (2023).
- 70. Huang JK et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 6, 484–495 (2018).
- 71. Duman RS, Sanacora G & Krystal JH. Altered connectivity in depression: GABA and glutamate neurotransmitter deficits and reversal by novel treatments. Neuron 102, 75–90 (2019).
- 72. Tolias KF et al. The Rac1-GEF Tiam1 couples the NMDA receptor to the activity-dependent development of dendritic arbors and spines. Neuron 45, 525–538 (2005).
- 73. Duman JG et al. The adhesion-GPCR BAI1 shapes dendritic arbors via Bcr-mediated RhoA activation causing late growth arrest. eLife 8, e47566 (2019).
- 74. Henrie H et al. Stress-induced phosphorylation of CLIP-170 by JNK promotes microtubule rescue. J. Cell Biol. 219, e201909093 (2020).
- 75. Abramson J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
- 76. Adasme MF et al. PLIP 2021: expanding the scope of the protein–ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 49, W530–W534 (2021).
- 77. Santos-Martins D et al. Accelerating AutoDock4 with GPUs and gradient-based local search. J. Chem. Theory Comput. 17, 1060–1073 (2021).
- 78. Sive H, Grainger RM & Harland RM. Early Development of Xenopus laevis: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2000).
Data Availability Statement
The WES and WGS data used in this study are publicly available for 1,146 subjects in the database of Genotypes and Phenotypes (dbGaP) under accessions phs003746.v1.p1 and phs002591.v1.p1. Pedigree information with dbGaP identifiers is available in the Supplementary Data. Sequencing data for the remaining subjects cannot be deposited in public repositories because these subjects were enrolled with consent forms that did not conform to current data-sharing requirements. Summary data for these subjects are available from the corresponding author (J.J.G.) on reasonable request. Source data are provided with this paper.











