American Journal of Human Genetics. 2025 Oct 6;112(11):2679–2692. doi: 10.1016/j.ajhg.2025.09.009

Distinguishing syndromic and nonsyndromic cleft palate through analysis of protein-altering de novo variants in 818 trios

Kelsey R Robinson 1, Sarah W Curtis 1, Justin E Paschall 2, Wasiu Lanre Adeyemo 3, Terri H Beaty 4, Azeez Butali 5, Carmen J Buxó 6, David J Cutler 1, Michael P Epstein 1, Lord JJ Gowans 7, Jacqueline T Hecht 8, Gary M Shaw 9, Lina Moreno Uribe 10, Jeffrey C Murray 11, Harrison Brand 12, Seth M Weinberg 13, Mary L Marazita 13, Kimberly F Doheny 2, Elizabeth J Leslie-Clarkson 1
PMCID: PMC12808955  NIHMSID: NIHMS2115151  PMID: 41056948

Summary

De novo variants (DNs) are sporadically occurring variants found in offspring but absent in both parents. DNs most commonly arise in the germline and are not under selective pressure; therefore, they may be enriched for disease-causing alleles. In fact, DNs have been implicated in multiple rare genetic disorders. Cleft palate (CP) is a craniofacial congenital anomaly occurring in ∼1 in 1,700 live births. Genome-wide association studies have found fewer than a dozen CP-specific loci, while exome and targeted sequencing studies in family-based and case-control cohorts often lack the statistical power to conclusively identify causal variants. We therefore hypothesized that CP probands would be enriched for protein-altering DNs, which may explain the relative dearth of discoveries. A complicating factor in understanding CP, however, is its phenotypic heterogeneity. To address this, we aggregated sequence data for 818 trios with CP representing a combination of subtypes and of isolated and syndromic presentations. We identified global enrichment of protein-altering DNs (1.48, p = 1.28 × 10⁻²⁸) and exome-wide-significant (p < 1.3 × 10⁻⁶) gene-specific enrichment for SATB2, MEIS2, COL2A1, ZC4H2, EFTUD2, KAT6B, and ANKRD11. Enrichment of protein-altering DNs was significantly higher in syndromic (1.70, p = 6.95 × 10⁻²⁶) than in nonsyndromic (1.31, p = 8.51 × 10⁻⁸) probands, but it did not differ between subtypes. We explored differences in gene-specific enrichment, finding some genes unique to syndromic probands (ZC4H2) or nonsyndromic probands (IRF6), as well as some shared between groups (SATB2). Altogether, we show that DNs contribute to CP risk and that combined analysis can reveal genetic associations that would otherwise go undetected.

Keywords: cleft palate, variation, phenotypic heterogeneity, genetic architecture, de novo


Cleft palate probands are globally enriched for protein-altering de novo variants, and gene-specific enrichment was found for seven cleft-related genes. Differences in individuals with syndromic versus nonsyndromic cleft palate were observed, although overlap in affected genes between groups suggests the existence of a phenotypic spectrum rather than distinct etiologies.

Introduction

Cleft palate (CP) is one of three major subtypes of orofacial clefts (OFCs). Together with those affecting the upper lip (cleft lip [CL] with or without cleft palate [CL/P]), OFCs are the most common craniofacial birth defect. CP occurs in approximately 1 in 1,700 live births, though there are differences in geographical and ancestral frequencies.1 In the United States, approximately 2,600 babies are born with overt CP each year, and up to 80,000 babies are born with the more subtle submucous CP (SMCP).2 The secondary palate, or the roof of the mouth, separates the oral and nasal cavities and is made up of two primary components: an anterior bony hard palate that facilitates normal feeding and a posterior muscular soft palate that elevates to close off the pharynx during swallowing and speech. Left untreated, CP is often fatal early in life due to aspiration and/or malnutrition from feeding difficulties,3 though the prognosis is much more favorable with surgical correction. However, individuals born with CP often go on to face speech and/or hearing problems, require advanced orthodontic care, and can experience additional comorbidities as they age.4,5 As such, CP creates both individual and public health burdens. Improving our understanding of its origin can lead to enhanced prevention, treatment, and prognosis for affected individuals.

Despite approximately 25% of CP-affected individuals having a positive family history,6,7 identification of genetic variants causing or increasing the risk for CP remains elusive. Collectively, fewer than a dozen associated loci have been discovered with genome-wide association studies (GWASs) for CP.8,9,10,11,12,13,14 Genes such as IRF6,8 GRHL3,9 and CTNNA2¹⁰ have been implicated, though these associations are relatively population specific. One study in a Chinese population identified 9 associated loci,12 but many were not replicated in a subsequent GWAS in an independent Chinese population.11 Given the relative lack of common-variant associations with CP, another avenue of discovery is the contribution of rare variants.

A rare variant hypothesis is further supported by the fact that CP occurs as part of a syndrome in about 50% of cases, compared with 30% of CL/P cases. We previously found that individuals with CP were more likely than individuals with other types of OFCs to have rare pathogenic or likely pathogenic (P/LP) variants in genes assembled from clinical genetic testing panels.15 Specifically, the diagnostic yield was 18% for 58 CP trios, compared with 9% for CLP and 3% for CL. This suggests that CP may more frequently have a monogenic cause than CL/P. In comparison, a more comprehensive evaluation of rare variants in 603 syndromic OFC probands by Wilson et al.16 found a diagnostic yield for P/LP variants of 36.5%, though the difference between subtypes was less dramatic: 33.3% for CL/P and 37.1% for CP. Among the P/LP variants with known inheritance information, 73% were de novo, indicating that such variants are strong candidates for further investigation.

Our previous work has shown nominal significance for coding de novo variants (DNs) in 58 CP trios (enrichment 1.39, p = 9.32 × 10⁻³) but lacked the power to identify new genes associated with CP,17 indicating a need for larger studies. In addition, we have shown that phenotypic heterogeneity can mask novel genes and hamper gene discovery.14 Therefore, we assembled a phenotypically and ancestrally diverse cohort of 818 CP trios from the CPSeq, Gabriella Miller Kids First (GMKF), and Deciphering Developmental Disorders (DDD) studies. We also explored the heterogeneous nature of CP by investigating our cohort stratified by proband sex, CP subtype (involvement of the hard versus soft palate), and the presence or absence of additional non-cleft phenotypic features.

Subjects and methods

CPSeq dataset

We assembled a collection of 473 trios (each consisting of an affected child and their mother and father) ascertained on proband affection status (i.e., CP) from the CPSeq (n = 429) and GMKF (n = 44) whole-genome sequencing (WGS) projects17 (hereafter referred to collectively as CPSeq). Fourteen samples had undergone WGS in both studies, so duplicates were removed prior to analysis to ensure that each individual was included only once. Trios represent all major ancestry groups affected by CP, including European (recruited from Spain, Turkey, Hungary, and the United States), Latin American (Puerto Rico and Argentina), Asian (China, Singapore, Taiwan, and the Philippines), and African (Nigeria and Ghana) ancestry. Recruitment and phenotypic assessment occurred at multiple domestic and international sites following institutional review board (IRB) approval for each local recruitment site and coordinating center (University of Iowa, University of Pittsburgh, and Emory University). All participants provided informed consent prior to study inclusion.

There were 278 female probands and 195 male probands. All probands and parents were assessed for the presence of CP, with ∼2/3 of the assembled samples undergoing additional phenotyping to assess the location and severity of the cleft. There were 140 probands with cleft hard and soft palate (CH&SP), 164 with cleft soft palate (CSP), 5 with cleft hard palate (CHP), 27 with SMCP, and 137 with an unspecified CP subtype. Although probands/trios were not excluded based on additional clinical features consistent with a syndromic diagnosis, only 33 trios were classified as possibly or probably syndromic based on a reported presence of additional major or minor clinical features. A further breakdown by genetic ancestry and subcategory is available in Table S1. We also did not exclude families with a history of CP; 8% (38 trios) had at least one affected parent. Given the complex etiology of CP,15 variable expressivity, reduced penetrance, and potential for phenocopies, we elected to include all probands in this evaluation.

WGS

The full description of the sequencing and variant-calling methodology is detailed in Robinson et al.14 for the CPSeq trios and in Bishop et al.17 for the GMKF trios. WGS for CPSeq was performed by the Center for Inherited Disease Research (CIDR) at Johns Hopkins University (Baltimore, MD). The DRAGEN Germline v.3.7.5 pipeline on the Illumina BaseSpace Sequence Hub platform was used for alignment, variant calling, and quality control, resulting in a single multisample VCF file. For GMKF, WGS for European samples was carried out by the McDonnell Genome Institute (MGI) at Washington University School of Medicine (St. Louis, MO), followed by realignment to hg38 and variant calling at the GMKF Data Resource Center at the Children’s Hospital of Philadelphia. WGS for Colombian and Taiwanese samples was carried out by the Broad Institute, with alignment to hg38 and variant calling by GATK pipelines.18,19,20 Sequencing data are available from the Database of Genotypes and Phenotypes (dbGaP) under dbGaP: phs002220.v1.p1 (CPSeq probands) and dbGaP: phs001168.v2.p2 (GMKF probands). Additional information on GMKF cohorts can be found at https://kidsfirstdrc.org/studies/.

Identification of DNs

The DRAGEN 3.7.5 aligner and variant caller were used to generate gVCF files for each CPSeq trio. Individual trio VCFs with DN tags were then generated by providing the gVCF files and pedigree information as input to the DRAGEN 3.7.5 joint caller. To be considered a DN, a variant had to have a quality score of at least 30 and a de novo quality score (DQ) of >2. The DN-calling pipeline for the GMKF cohort is detailed in Bishop et al.17 For both cohorts, genotypes were set to missing if the genotype quality (GQ) was <20 or the read depth was <10, and parental genotypes had to be confirmed homozygous reference (0/0), pass all filtering steps, and have an allele balance (AB) ratio of <0.05.
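The DN criteria above can be read as a single predicate over a trio. The Python sketch below is illustrative only: the `Call` record, its fields, and `is_candidate_dn` are hypothetical names rather than DRAGEN or VCF fields, and treating the quality score of 30 as a lower bound (QUAL ≥ 30) is an assumption.

```python
from dataclasses import dataclass

# Thresholds from the text; QUAL >= 30 as a lower bound is an assumption.
MIN_QUAL, MIN_DQ = 30.0, 2.0
MIN_GQ, MIN_DP = 20, 10
MAX_PARENT_AB = 0.05


@dataclass
class Call:
    """Hypothetical per-sample genotype record (not a DRAGEN or VCF schema)."""
    gt: str              # genotype, e.g. "0/0" or "0/1"
    gq: int              # genotype quality
    dp: int              # read depth
    ab: float            # allele balance: alt reads / total reads
    passed: bool = True  # passed all upstream filtering steps


def usable(call: Call) -> bool:
    """Genotypes with GQ < 20 or depth < 10 are treated as missing."""
    return call.gq >= MIN_GQ and call.dp >= MIN_DP


def is_candidate_dn(qual: float, dq: float, child: Call,
                    mother: Call, father: Call) -> bool:
    """Apply the site-, child-, and parent-level DN criteria."""
    if qual < MIN_QUAL or dq <= MIN_DQ:
        return False
    if not usable(child) or child.gt in ("0/0", "./."):
        return False
    for parent in (mother, father):
        # Parents must be confident homozygous reference with almost no alt reads.
        if not (usable(parent) and parent.passed):
            return False
        if parent.gt != "0/0" or parent.ab >= MAX_PARENT_AB:
            return False
    return True


kid = Call("0/1", gq=60, dp=35, ab=0.48)
mom = Call("0/0", gq=50, dp=30, ab=0.0)
dad = Call("0/0", gq=45, dp=28, ab=0.03)
print(is_candidate_dn(52.0, 5.1, kid, mom, dad))  # True
```

Keeping every threshold as a named constant makes it easy to rerun the same predicate with stricter or looser cutoffs in a sensitivity analysis.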

DDD dataset

Using the 2017-15-12 data freeze, we accessed the publicly available DDD data, which were primarily ascertained based on undiagnosed neurodevelopmental disorders (NDDs) and/or congenital anomalies, abnormal growth parameters, dysmorphic features, and unusual behavioral phenotypes.21 Details pertaining to sample assembly, exome sequencing, variant calling, and variant annotation have been described previously.22 We used a published list of DNs from 9,858 trios in DDD as detailed in Kaplanis et al.,23 where further details on DN filtering can also be found. Because DDD used whole-exome rather than whole-genome sequencing, we also confirmed that all genes represented in the exome capture targets from this study were present in the CPSeq dataset, and vice versa, to ensure that no genes were unevenly represented in either study. Using a list of Human Phenotype Ontology (HPO) terms related to clefting (Table S2), we identified 345 probands with CP and/or bifid uvula (BU): 192 males and 153 females. Results did not differ meaningfully with or without the 60 probands whose cleft phenotype was BU, so we kept them in our analysis. The DDD cohort was ascertained from regional genetics services in the UK and Ireland.

Variant annotation

Variants were annotated with ANNOVAR (v.201910), and those with coding consequences were selected based on their classification as “exonic” or “splicing.” We performed the initial exome-wide analysis using minor-allele frequency (MAF) cutoffs of <0.01% and <0.1% in either gnomAD exomes v.2.1.1 or gnomAD v.3.1.2. We elected to use a MAF of <0.1% for the full analysis to avoid overfiltering and applied this filter to all cohorts for consistency.
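A minimal sketch of the coding and MAF filters, assuming ANNOVAR-style Func values and per-release allele frequencies have already been extracted into tuples. The row layout and gene names are hypothetical, and reading "in either gnomAD" as "below the cutoff in every release that reports the variant" is also an assumption.

```python
from typing import Iterable, Optional

MAF_CUTOFF = 0.001  # 0.1%, the cutoff used for the full analysis


def is_coding(func: str) -> bool:
    """True if an ANNOVAR-style Func value includes 'exonic' or 'splicing'.

    Values such as 'exonic;splicing' are split on ';'; 'ncRNA_exonic' does not match.
    """
    return bool(set(func.split(";")) & {"exonic", "splicing"})


def passes_maf(afs: Iterable[Optional[float]], cutoff: float = MAF_CUTOFF) -> bool:
    """Require MAF < cutoff in each reference release (e.g. gnomAD v2.1.1 exomes, v3.1.2).

    None means the variant is absent from that release and therefore passes.
    MAF is taken as min(AF, 1 - AF).
    """
    return all(af is None or min(af, 1.0 - af) < cutoff for af in afs)


rows = [  # hypothetical annotated DNs: (gene, Func, AF v2.1.1 exomes, AF v3.1.2)
    ("GENE_A", "exonic", None, None),
    ("GENE_B", "splicing", 0.0004, None),
    ("GENE_C", "intronic", None, None),
    ("GENE_D", "exonic", 0.003, 0.002),
]
kept = [gene for gene, func, *afs in rows if is_coding(func) and passes_maf(afs)]
print(kept)  # ['GENE_A', 'GENE_B']
```

Treating a missing frequency as passing matters here, since most DNs in this cohort are absent from gnomAD altogether.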

DN enrichment

We evaluated coding DN enrichment using the R package “denovolyzeR” (v.0.2.0). The cohort was tested for an excess of DNs exome wide and per gene using the functions “denovolyzeByClass” and “denovolyzeByGene,” respectively. These functions use the mutation models described by Samocha et al.24 to determine whether a dataset contains more DNs than expected by chance. Under the null model of no association between a variant’s class and a phenotype, the number of DNs is expected to follow a Poisson distribution whose mean is set by the mutation rates.25 Given a fixed sample size and the expected mutation rate for a given sequence, the expected count M is a known constant; it is both the mean and the variance of the distribution, so its standard deviation is √M. Under the alternative model, the observed count A also follows a Poisson distribution, but its mean may differ from M. We report enrichment as A/M with 95% confidence intervals derived from the Poisson distribution. Because we combined multiple cohorts, we compared synonymous variant enrichment across groups and found observed-to-expected ratios close to 1 for each (Table S3), consistent with minimal calling bias.
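The Poisson framework described above can be sketched in standard-library Python. This illustrates a one-sided test, p = P(X ≥ A | M), with an exact Poisson confidence interval; it is not denovolyzeR's code. The `expected_count` helper assumes per-haploid-genome mutation rates, so that M = 2nμ for n trios, and all names are hypothetical.

```python
import math


def expected_count(n_trios: int, haploid_rate: float) -> float:
    """Expected DN count M = 2 * n * mu (assumes mu is a per-haploid-genome rate)."""
    return 2.0 * n_trios * haploid_rate


def _log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam), lam > 0."""
    return i * math.log(lam) - lam - math.lgamma(i + 1)


def upper_tail(k: int, lam: float) -> float:
    """P(X >= k), summed term by term so that very small p values stay accurate."""
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        term = math.exp(_log_pmf(i, lam))
        total += term
        # Past the mean the terms shrink; stop once they no longer change the sum.
        if i > lam and term <= total * 1e-16:
            return min(total, 1.0)
        i += 1


def lower_tail(k: int, lam: float) -> float:
    """P(X <= k)."""
    return min(sum(math.exp(_log_pmf(i, lam)) for i in range(k + 1)), 1.0)


def _bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi], with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def enrichment(observed: int, expected: float, alpha: float = 0.05):
    """Return (A/M, CI low, CI high, one-sided p = P(X >= A | M))."""
    p = upper_tail(observed, expected)
    # Exact Poisson bounds on the mean of A, then divided by M.
    lo = 0.0 if observed == 0 else _bisect(
        lambda lam: upper_tail(observed, lam) - alpha / 2, 1e-12, float(observed))
    hi = _bisect(lambda lam: alpha / 2 - lower_tail(observed, lam),
                 max(float(observed), 1e-12), observed + 20 * math.sqrt(observed + 1) + 20)
    return observed / expected, lo / expected, hi / expected, p


ratio, lo, hi, p = enrichment(10, 5.0)  # toy counts, not values from this study
print(f"{ratio:.2f} ({lo:.2f}-{hi:.2f}), p = {p:.4f}")  # 2.00 (0.96-3.68), p = 0.0318
```

Summing the upper tail directly, rather than computing 1 minus the CDF, is what keeps p values on the order of 10⁻²⁸ from collapsing to zero.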

For the denovolyzeByGene analysis, gene-specific enrichment was considered exome-wide significant at p < 1.30 × 10⁻⁶, which corrects for 19,618 genes with predicted mutation rates, each tested twice, for protein-altering (PA) and putative loss-of-function (pLoF) enrichment. QQ plots for all genes and by variant class are shown in Figure S1. In analyses of enrichment within specific gene sets, the “includeGenes” argument was used to restrict testing to those genes, and we corrected for the number of clusters tested.
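The threshold follows from a Bonferroni correction. Assuming a family-wise α of 0.05 (the text does not state α explicitly), the arithmetic is:

```python
n_genes = 19_618        # genes with predicted mutation rates
tests_per_gene = 2      # protein-altering and pLoF
alpha = 0.05            # assumed family-wise error rate

threshold = alpha / (n_genes * tests_per_gene)
print(f"{threshold:.3g}")  # 1.27e-06
```

That is, 0.05/39,236 ≈ 1.27 × 10⁻⁶, which rounds to the 1.3 × 10⁻⁶ threshold.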

We also tested for significant differences in DN enrichment between males and females using a Z test comparing the observed and expected numbers of variants in each sex, while taking into account the difference in prevalence between sexes. Assuming, as for a Poisson distribution, that the variance of the observed count equals the expected count, we computed Z = (observed − expected)/√expected for each variant class.
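A sketch of the per-class Z statistic, assuming Var(O) = E as stated. How the male and female statistics are combined into a single comparison is not specified, so the sketch stops at the per-group Z and its two-sided normal p value; the function names and counts are hypothetical.

```python
import math


def poisson_z(observed: int, expected: float) -> float:
    """Z = (O - E) / sqrt(E), using the Poisson assumption Var(O) = E."""
    return (observed - expected) / math.sqrt(expected)


def two_sided_p(z: float) -> float:
    """Two-sided standard normal p value, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts for one variant class in one group of probands.
z = poisson_z(16, 9.0)
print(round(z, 3))  # 2.333
```

Using E as the variance, instead of estimating it from the data, is what makes this a Poisson-based Z test rather than a generic one.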

Enrichment analyses and creation of gene sets

We evaluated our dataset for enrichment in several ways. First, we used the freely available web server gProfiler26 to perform a ranked query of genes with PA DNs, ordered by ascending p value from the gene-specific analysis in denovolyzeR. We set the background list to the 22,321 genes listed as exome capture targets. This method tests genes incrementally, starting from the top of the list, and is useful for detecting whether enriched functional terms are clustered near the top or evenly distributed throughout the list. Next, we created three sets of genes directly relevant to CP: an OFC-specific gene panel,15 a set of marker genes generated from single-nucleus RNA sequencing (snRNA-seq) of the secondary palate in mice at embryonic day 15.5 (E15.5),27 and a set of marker genes from human embryos at post-conception weeks 3–5.28 The OFC gene panel was curated from four sources: the National Health Service (NHS) Genomic Medicine Service cleft panel (v.2.2), the Prevention Genetics CL/P clinical genetic testing panel, genes in Online Mendelian Inheritance in Man (OMIM) for conditions that include OFCs with a known inheritance and molecular basis, and a manually curated list from recent research studies on OFC genetics. Additional details on curation have been published in Diaz Perez et al.15 The full details on the mouse snRNA-seq and marker gene generation can be found in Piña et al.27 We filtered marker genes to those with a false discovery rate (FDR) < 0.01 prior to enrichment testing for DNs. The full details of marker gene generation for the human embryo snRNA-seq can be found in Zeng et al.28 For each of these three gene sets, we tested enrichment in denovolyzeR using the “includeGenes” argument, which derives enrichment values from the observed versus expected number of variants within the restricted gene list. As in the exome-wide analysis, the same mutational models accounting for gene length and sequence content are used. All genes were included regardless of specificity: where genes were represented more than once (i.e., a marker gene for more than one cell type), all were tested together.
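Restricting the test to a gene set amounts to pooling observed and expected counts over the genes in the set before applying the same Poisson test. The sketch below uses toy per-gene values (hypothetical genes and numbers), drops genes without an expected rate, and counts duplicated genes once, which is one possible reading of how repeated marker genes are handled.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); 1 - CDF is adequate for moderate p values."""
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def gene_set_enrichment(observed_by_gene, expected_by_gene, gene_set):
    """Pool observed and expected DN counts over a restricted gene list.

    Genes without an expected rate are dropped, and a gene listed more than
    once (e.g. a marker for several cell types) is counted once (assumption).
    """
    genes = set(gene_set) & set(expected_by_gene)
    observed = sum(observed_by_gene.get(g, 0) for g in genes)
    expected = sum(expected_by_gene[g] for g in genes)
    return observed, expected, observed / expected, poisson_sf(observed, expected)


# Toy per-gene values (hypothetical genes and numbers).
obs = {"GENE_A": 2, "GENE_B": 1, "GENE_D": 5}
exp_ = {"GENE_A": 0.5, "GENE_B": 0.25, "GENE_C": 0.25, "GENE_D": 2.0}
print(gene_set_enrichment(obs, exp_, ["GENE_A", "GENE_B", "GENE_C", "GENE_A"]))
```

Here GENE_D is outside the set and is ignored, so the pooled counts are 3 observed against 1.0 expected.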

Results

CP probands are enriched for DNs

We evaluated exome-wide enrichment of DNs from a starting dataset of 818 CP trios. These were made up of 473 probands ascertained based on CP and 345 probands ascertained primarily based on undiagnosed developmental disorders as part of the Deciphering Developmental Disorders (DDD) study.16

We identified 1,101 coding DNs in 998 genes (Table S4), averaging 1.35 DNs per trio, which is consistent with reported rates.17 The number of DNs per trio followed a Poisson distribution (Figure S2A), with no significant deviation by a chi-squared goodness-of-fit test (p = 0.91). Although we applied a MAF cutoff of <0.1% using gnomAD (v.2.1.1 and v.3.1.2), the majority of variants (62%) were absent from these datasets (Figure S2B). DNs were, however, more frequent among syndromic probands: 79.6% of syndromic versus 70.9% of nonsyndromic probands had at least one coding DN. The mean DN rate therefore differed significantly, at 1.46 and 1.25 per syndromic and nonsyndromic trio, respectively (χ², p = 0.005).
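A minimal goodness-of-fit sketch for per-trio DN counts. The binning (pooling four or more DNs into one bin) is an assumption, as are the toy counts. The Poisson mean is estimated from the data, which costs one degree of freedom, and the chi-squared tail is computed from the regularized incomplete gamma series so that no third-party library is needed.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-squared distribution via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= y / (a + n)
        total += term
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def poisson_gof(dn_counts, top_bin: int = 4):
    """Chi-squared goodness of fit of per-trio DN counts to a Poisson distribution.

    Counts >= top_bin are pooled into one bin (the pooling choice is an assumption).
    """
    n = len(dn_counts)
    lam = sum(dn_counts) / n
    tally = Counter(min(c, top_bin) for c in dn_counts)
    pmf = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(top_bin)]
    probs = pmf + [1.0 - sum(pmf)]  # last bin is P(X >= top_bin)
    stat = sum((tally[k] - n * p) ** 2 / (n * p) for k, p in enumerate(probs))
    df = len(probs) - 1 - 1  # one df lost to the total, one to estimating lam
    return stat, df, chi2_sf(stat, df)


counts = [0] * 30 + [1] * 40 + [2] * 20 + [3] * 8 + [4] * 2  # toy per-trio DN counts
stat, df, p = poisson_gof(counts)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.2f}")
```

A large p value, as with these toy counts, means the data give no evidence against a Poisson model.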

We classified DNs by variant type and predicted function as follows: synonymous variants; missense variants (including single amino acid substitutions and in-frame insertions or deletions); pLoF variants (including nonsense, frameshift insertions or deletions, and splice acceptor or donor variants); and PA variants, a combined category comprising all missense and pLoF variants. By category, there were 222 synonymous, 713 missense, and 166 pLoF DNs, for a combined 879 PA DNs (Figure S2C). Because we used the mutation rates built into denovolyzeR, 8 genes with DNs (3 synonymous and 5 missense) lacked expected mutation rates. Although these genes cannot be tested for gene-specific enrichment, their DNs were still counted in the class totals.
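One way to express these class definitions in code, assuming ANNOVAR Func.refGene and ExonicFunc.refGene labels as input. The label-to-class mapping is an assumption: the text does not say how stop-loss variants were classed, so they fall into "other" here, and giving splice-site annotations priority over the exonic consequence is likewise a choice made for this sketch.

```python
from collections import Counter

# Assumed mapping from ANNOVAR ExonicFunc.refGene labels to the study's classes.
MISSENSE = {"nonsynonymous SNV", "nonframeshift insertion",
            "nonframeshift deletion", "nonframeshift substitution"}
PLOF = {"stopgain", "frameshift insertion", "frameshift deletion",
        "frameshift substitution"}


def variant_class(func: str, exonic_func: str) -> str:
    """Return 'synonymous', 'missense', 'pLoF', or 'other' for one DN."""
    if "splicing" in func.split(";"):
        return "pLoF"  # splice acceptor/donor; given priority (assumption)
    if exonic_func == "synonymous SNV":
        return "synonymous"
    if exonic_func in MISSENSE:
        return "missense"
    if exonic_func in PLOF:
        return "pLoF"
    return "other"  # e.g. stoploss or unknown, not assigned in the text


def is_protein_altering(cls: str) -> bool:
    """PA is the union of missense and pLoF."""
    return cls in ("missense", "pLoF")


# Toy DN annotations: (Func.refGene, ExonicFunc.refGene)
dns = [("exonic", "synonymous SNV"), ("exonic", "nonsynonymous SNV"),
       ("exonic", "stopgain"), ("splicing", "."), ("exonic", "nonframeshift deletion")]
counts = Counter(variant_class(f, e) for f, e in dns)
pa = sum(n for cls, n in counts.items() if is_protein_altering(cls))
print(dict(counts), pa)  # {'synonymous': 1, 'missense': 2, 'pLoF': 2} 4
```

Defining PA as a derived category, rather than a separate label, guarantees that the PA total always equals the missense plus pLoF totals.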

We first tested DN enrichment in all CP trios using denovolyzeR on an exome-wide basis. CP probands had significantly more coding DNs (1.34, p = 1.81 × 10⁻²⁰) than would be expected by chance based on mutational models.24 When split by variant class, there was no enrichment of synonymous variants (0.96, p = 0.76). This is expected because synonymous variants are rarely causal for disease, and it also indicates that there is no overall excess of variant calls in our samples. In contrast, there was significant enrichment of PA variants (1.49, p = 1.28 × 10⁻²⁸), driven by both missense (1.37, p = 5.33 × 10⁻¹⁶) and pLoF (2.32, p = 1.16 × 10⁻²¹) variants (Figure 1A; Table S5). We also performed this analysis using a stricter MAF cutoff of <0.01% (Figure S3; Table S6) and found similar results, with PA enrichment of 1.25 (p = 4.23 × 10⁻⁹); however, synonymous enrichment decreased to 0.732 (p = 1), so we performed the remainder of our analyses using the higher cutoff to avoid overfiltering.

Figure 1.


CP probands are enriched for protein-altering de novo variants with denovolyzeR

(A) Exome-wide enrichment for DNs in 818 CP probands.

(B) Comparison of enrichment for syndromic (n = 378) and nonsyndromic (n = 436) probands.

(C) Exome-wide enrichment for DNs in 387 male and 431 female CP probands.

(D) Enrichment for 204 male and 174 female syndromic CP probands.

(E) Enrichment for 181 male and 255 female nonsyndromic CP probands.

Error bars represent ± 2 SE. The horizontal dotted line at 1 represents no enrichment (observed = expected). Variant classes are represented by the following colors: gray = synonymous, blue = missense, red = pLoF, and purple = protein altering. ∗p = 1.91 × 10⁻³.

Because we did not restrict our cohort based on parental cleft status, we also performed a sensitivity analysis removing all probands with an affected parent. Enrichment and significance increased slightly for all variant classes (Table S7). This is in line with expectations, as CP in multiplex families is less likely to be caused by DNs. However, there is still evidence that DNs can contribute to disease risk even in multiplex families15,29; we therefore elected to retain these families to best represent the genetic landscape of CP.

We then evaluated differences between proband classifications. Comparing variant classes, the main difference between syndromic and nonsyndromic probands was in pLoF DNs, with enrichments of 3.12 (p = 1.96 × 10⁻²²) versus 1.65 (p = 1.39 × 10⁻⁴), respectively (Figure 1B). The difference in pLoF variants was statistically significant (χ², p = 0.002), but there were no significant differences for synonymous (χ², p = 0.12) or missense (χ², p = 0.36) variants.

We next compared male and female probands, as females are more frequently affected by CP. The combined CP cohort had a female bias (male:female [M:F] ratio of 0.90), which was more pronounced in nonsyndromic probands (M:F ratio of 0.71). Interestingly, syndromic probands were male biased (M:F ratio of 1.17). The M:F ratios of our full cohort and nonsyndromic probands did not differ significantly from the registry-based ratios reported by EUROCAT30 (0.83 and 0.78; χ², p = 0.33 and p = 0.46, respectively); however, the M:F ratio of our syndromic probands differed significantly from the EUROCAT ratio of 0.89 (χ², p = 0.009). Within the cohort, both sexes were enriched for PA DNs (male: 1.58, p = 2.65 × 10⁻¹⁹; female: 1.41, p = 8.91 × 10⁻¹²), with no significant differences between sexes for the full cohort or by syndromic status (Figures 1C–1E; Tables S5 and S8). Therefore, DNs do not appear to explain the sex bias typically observed in CP.
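The comparison with a registry ratio can be sketched as a one-sample chi-squared goodness-of-fit test with 1 degree of freedom, treating the reference M:F ratio as fixed. This is an assumed formulation, not necessarily the test used here, so it need not reproduce the reported p values; the function name and counts are hypothetical.

```python
import math


def sex_ratio_gof(males: int, females: int, ref_ratio: float):
    """1-df chi-squared test of observed M/F counts against a fixed reference M:F ratio."""
    n = males + females
    p_male = ref_ratio / (1.0 + ref_ratio)  # expected male fraction
    exp_m, exp_f = n * p_male, n * (1.0 - p_male)
    stat = (males - exp_m) ** 2 / exp_m + (females - exp_f) ** 2 / exp_f
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared survival function, 1 df
    return males / females, stat, p_value


ratio, stat, p = sex_ratio_gof(60, 40, 1.0)  # hypothetical counts
print(f"M:F = {ratio:.2f}, chi2 = {stat:.2f}, p = {p:.4f}")  # M:F = 1.50, chi2 = 4.00, p = 0.0455
```

Treating the registry ratio as exact ignores its sampling error; with registry-sized denominators that error is small, but a two-sample test would account for it.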

We compared enrichment by CP subtype for CH&SP (n = 140), CSP (n = 199), and SMCP (n = 86), reasoning that DNs may be more prevalent in more “severe” forms of CP (Figure 2; Table S5). Overall, PA enrichment was similar across subtypes (CH&SP: 1.44, p = 1.74 × 10⁻⁵; CSP: 1.44, p = 5.75 × 10⁻⁷; SMCP: 1.65, p = 1.64 × 10⁻⁶), and there were no differences in missense enrichment. This was not the case for pLoF variants (CH&SP: 1.47, p = 0.07; CSP: 2.13, p = 2.91 × 10⁻⁵; SMCP: 3.46, p = 1.12 × 10⁻⁷), which were significantly more enriched in SMCP than in both CSP (χ², p = 0.043) and CH&SP (χ², p = 0.003). However, this is most likely explained by ascertainment bias, as SMCP is more likely to be detected during a full clinical workup of patients ascertained for other conditions than to be identified as an isolated phenotype. Accordingly, the majority (70%) of our SMCP probands were in the syndromic group, which is itself significantly more enriched for pLoF DNs. Altogether, there does not appear to be a relationship between DN enrichment or DN class and CP severity.

Figure 2.


Exome-wide DN enrichment patterns across subtypes

There were no significant differences across any subtype for missense or protein-altering variants. For pLoF variants, SMCP probands were significantly more enriched than both CSP and CH&SP probands. The horizontal dotted line at 1 represents no enrichment (observed = expected). Error bars represent ± 2 SE. Variant classes are represented by the following colors: gray = synonymous, blue = missense, red = pLoF, and purple = protein altering. ∗p = 0.043 and ∗∗p = 2.70 × 10⁻³.

Gene-specific analyses unveil known and previously unassociated candidate genes for CP

We first performed a per-gene analysis to identify individual genes with a significant excess of DNs. In the full cohort, 4 genes reached exome-wide significance (p < 1.3 × 10⁻⁶) for both PA and pLoF DNs: SATB2 (PA p = 1.29 × 10⁻³⁰ and pLoF p = 1.68 × 10⁻²²), MEIS2 (PA p = 2.18 × 10⁻¹⁰ and pLoF p = 5.01 × 10⁻⁹), COL2A1 (PA p = 4.98 × 10⁻¹⁰ and pLoF p = 2.88 × 10⁻¹²), and ZC4H2 (PA p = 2.32 × 10⁻⁷ and pLoF p = 1.16 × 10⁻⁶). Three additional genes were exome-wide significant for pLoF DNs only: EFTUD2 (p = 4.27 × 10⁻⁸), KAT6B (p = 1.47 × 10⁻⁷), and ANKRD11 (p = 1.52 × 10⁻⁷) (Figure 3; Table S9). Although they did not reach our significance threshold, several genes with known OFC associations were also enriched: IRF6 (PA p = 3.14 × 10⁻⁶), MED13L (PA p = 7.55 × 10⁻⁶), and KMT2D (pLoF p = 2.00 × 10⁻⁶). Also nearing significance was PRKCI (PA p = 8.63 × 10⁻⁶), which had not previously been associated with CP or OFCs in general. Given the phenotypic specificity of our other findings in this dataset, we were particularly interested in the role of PRKCI in CP, and recent work illustrates its critical role in the periderm, similar to that of IRF6 and GRHL3.31

Figure 3.


Gene-specific analyses find exome-wide significance for putative loss-of-function and protein-altering DNs

The dotted line represents the exome-wide significance threshold at p = 1.3 × 10⁻⁶. Variant classes are represented by red = putative loss-of-function (pLoF) (top) and purple = protein altering (bottom).

As a comparison with unaffected individuals, we performed the by-gene analysis on a recently published list of coding DNs in 1,517 trios from the gnomAD v.4.1 exome dataset (Table S10).32 None of the top ten genes from our cohort carried DNs in this dataset, and only 123 genes overlapped between the two datasets. Of these, only 101 had PA DNs, again highlighting the specificity of our findings.

Comparing these findings by group, we saw that the enrichments in ZC4H2, MED13L, ANKRD11, and KAT6B were driven exclusively by DNs in syndromic probands, whereas IRF6 enrichment was driven exclusively by nonsyndromic probands. The remaining genes had at least 1 DN identified in each group (Table 1). In total, 26 genes had PA DNs in both groups, though each was enriched to varying degrees due to differences in sample size (Figure 4; Table S9). A further 393 and 371 genes with PA DNs were found only in syndromic or nonsyndromic probands, respectively.

Table 1.

Top 15 genes by significance in combined CPSeq and DDD cohorts with individual contributions from syndromic and nonsyndromic probands

Gene | All (n = 818): observed | p value | Syndromic (n = 378): observed | p value | Nonsyndromic (n = 436): observed | p value
SATB2 | 14 | 1.29 × 10⁻³⁰ [a] | 11 | 6.72 × 10⁻²⁷ [a] | 3 | 2.18 × 10⁻⁶
MEIS2 | 5 | 2.18 × 10⁻¹⁰ [a] | 4 | 1.65 × 10⁻⁹ [a] | 1 | 0.0162
COL2A1 | 6 | 4.98 × 10⁻¹⁰ [a] | 1 | 0.0387 | 5 | 1.56 × 10⁻⁹ [a]
ZC4H2 | 3 | 2.32 × 10⁻⁷ [a] | 3 | 2.30 × 10⁻⁸ [a] | – | –
IRF6 | 3 | 3.14 × 10⁻⁶ | – | – | 3 | 4.79 × 10⁻⁷ [a]
MED13L | 4 | 7.55 × 10⁻⁶ | 4 | 3.62 × 10⁻⁷ [a] | – | –
PRKCI | 3 | 8.63 × 10⁻⁶ | 2 | 1.49 × 10⁻⁴ | 1 | 0.0199
NEDD4L | 3 | 2.12 × 10⁻⁵ | 2 | 2.73 × 10⁻⁴ | 1 | 0.0268
EFTUD2 | 3 | 2.57 × 10⁻⁵ | 2 | 3.10 × 10⁻⁴ | 1 | 0.0285
ANKRD11 | 4 | 3.69 × 10⁻⁵ | 4 | 1.81 × 10⁻⁶ | – | –
STAG2 | 3 | 4.84 × 10⁻⁵ | 3 | 4.91 × 10⁻⁶ | – | –
CSNK2B | 2 | 7.77 × 10⁻⁵ | 1 | 0.0058 | 1 | 0.0067
PC | 3 | 8.64 × 10⁻⁵ | 1 | 0.0372 | 2 |
POLR1F | 2 | 0.000156 | – | – | 2 | 4.44 × 10⁻⁵

[a] Enrichment is significant with p < 1.3 × 10⁻⁶.

Figure 4.


Comparison of DN enrichment by gene in syndromic versus nonsyndromic probands shows shared and distinct patterns

The dotted lines represent significant enrichment (p = 1.3 × 10⁻⁶), and the solid line represents the expected value if enrichment were the same regardless of status.

We also assessed individual DNs in syndromic probands ascertained on CP (i.e., syndromic probands in CPSeq) (Table S4) and identified two variants that may explain the respective probands’ phenotypes. First, we identified a variant in Cbl proto-oncogene (CBL; c.1228-2A>G [GenBank: NM_005188]) in an individual with CP, developmental delay, growth concerns, epilepsy, and an enlarged medial ventricle. In ClinVar (accession: VCV000177959.25), this variant is classified as P/LP for CBL syndrome (MIM: 613563). Although epilepsy has not been reported for this syndrome, the remaining phenotypes are plausibly attributable to this variant. Similarly, we found a frameshift deletion in RPL5 (c.46_47del [GenBank: NM_000969.5] [p.Tyr16ProfsTer5]) in a proband with CP, vascular ring anomaly, myopia, and attention-deficit/hyperactivity disorder (ADHD). RPL5 is associated with Diamond-Blackfan anemia (DBA; MIM: 612561), which commonly features CP and congenital heart defects, including vascular ring anomalies.33,34 Although this specific variant has not been reported in ClinVar, it meets the PVS1 (null variant), PS2 (confirmed de novo), and PM2 (absent from gnomAD v.4.1.0) criteria, qualifying it for a pathogenic classification under ACMG guidelines. Although this variant explains some of the proband’s features, neither myopia nor ADHD is linked to DBA; these features may be due to variants at other loci. Though an in-depth investigation of each DN is beyond the scope of this study, our findings highlight the value of looking beyond aggregate enrichment within a cohort, particularly where the genotype supports the observed phenotypes.

There were also multiple genes with DNs found exclusively in specific CP subtypes (Table S11). For example, 2 DNs each were found in ARID1A (p = 1.91 × 10⁻⁴) and TGFBR2 (p = 1.53 × 10⁻⁵), all in probands with CH&SP (n = 140). Similarly, all 3 EFTUD2 (p = 3.81 × 10⁻⁷) and all 3 PRKCI (p = 1.27 × 10⁻⁷) DNs were found in probands with CSP (n = 199). Given the overall sample sizes, however, it is difficult to conclude whether these patterns reflect the biological function of these genes or would change with larger samples.

Genes associated with AD conditions featuring OFCs are significantly enriched for DNs

We next evaluated DN enrichment within a list of 418 genes with known associations with any OFC type15 (Table S12). Unsurprisingly, enrichment within this restricted gene list was higher and more significant for PA DNs (5.82, p = 1.08 × 10⁻⁴⁸), including missense variants (3.29, p = 4.20 × 10⁻¹⁴) and pLoF variants (23.9, p = 9.00 × 10⁻⁵⁸). When split by inheritance pattern, this list contained 178 genes associated with autosomal-dominant (AD), 170 with autosomal-recessive (AR), and 8 with X-linked conditions. The most pronounced enrichment was among genes related to AD conditions (10.0, p = 1.25 × 10⁻⁵⁷) compared with AR conditions (1.83, p = 0.017) (Figure 5), and there were no DNs in the limited number of X-linked genes. The difference between AD and AR OFC genes in the number of probands harboring DNs was statistically significant (χ² test) for all PA DNs (p = 6.58 × 10⁻¹⁶), with differences in both missense (p = 5.82 × 10⁻⁶) and pLoF (p = 7.58 × 10⁻¹¹) DNs. This pattern held regardless of syndromic status or subtype classification, though we did not statistically test these more stratified comparisons (Table S13; Figures S4 and S5).

Figure 5.


CP probands are strongly enriched for OFC-associated genes implicated in autosomal-dominant conditions

Enrichment is shown for all genes in the list and stratified by the mode of inheritance, when known, for disease-associated genes. The horizontal dotted line at 1 represents no enrichment (observed = expected value). Error bars represent ± 2 SE. Variant classes are represented by the following colors: gray = synonymous, blue = missense, red = pLoF, and purple = protein altering. ∗p = 5.82 × 10−6, ∗∗p = 7.58 × 10−11, and ∗∗∗p = 6.58 × 10−16.

In aggregate, 11.3% of all DNs (125/1,101) and 13.1% of PA DNs (115/879) belonged to genes within this list. When stratified by inheritance patterns, the distribution of these PA DNs was consistent with our enrichment findings: 79.1% (91/115) of PA DNs were associated with AD, 9.6% (11/115) with AR, and 6.1% (7/115) with unspecified conditions. Further, we found that 6.2% (51/818) of probands harbored a pLoF DN in a gene associated with AD OFCs, and 16 of these were found in nonsyndromic individuals. From a clinical standpoint, these findings support the benefit of genetic testing for any individual with CP regardless of syndromic status or family history.
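Counts such as 51/818 are proband-level, not variant-level. A proband with two pLoF DNs in AD OFC genes should count once. The sketch below shows that counting step with a hypothetical data layout (a `syndromic` flag and a list of DN records per proband) and placeholder gene names.

```python
def probands_with_ad_plof(probands, ad_ofc_genes):
    """Count probands (not variants) carrying >= 1 pLoF DN in an AD OFC gene.

    Each proband is a dict with hypothetical keys 'syndromic' (bool) and 'dns',
    a list of {'gene': str, 'class': str} records.
    Returns (n_probands, n_nonsyndromic, fraction_of_cohort).
    """
    hits = [
        p for p in probands
        if any(dn["class"] == "pLoF" and dn["gene"] in ad_ofc_genes for dn in p["dns"])
    ]
    n_nonsyndromic = sum(not p["syndromic"] for p in hits)
    return len(hits), n_nonsyndromic, len(hits) / len(probands)


# Toy cohort with placeholder genes; the first proband has two qualifying DNs but counts once.
ad_genes = {"GENE_A", "GENE_B"}
cohort = [
    {"syndromic": True, "dns": [{"gene": "GENE_A", "class": "pLoF"}, {"gene": "GENE_B", "class": "pLoF"}]},
    {"syndromic": False, "dns": [{"gene": "GENE_B", "class": "pLoF"}]},
    {"syndromic": False, "dns": [{"gene": "GENE_A", "class": "missense"}]},
    {"syndromic": True, "dns": [{"gene": "GENE_X", "class": "pLoF"}]},
]
print(probands_with_ad_plof(cohort, ad_genes))  # (2, 1, 0.5)
```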

Because some probands harbor multiple DNs, we checked those with PA DNs and found that none had more than one in an AD OFC gene. We also investigated probands with PA DNs in the AR OFC genes in the CPSeq dataset (those for whom we have access to full sequencing data). Because AR conditions require variants on both alleles, a heterozygous DN in an AR gene would be expected to contribute to CP only alongside a second, inherited variant. No such inherited variants were identified in any of these probands, suggesting a limited contribution of DNs in these AR OFC genes to CP risk.
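The second-hit screen for AR genes can be sketched as follows for a single proband. The field names (`gene`, `origin`) are assumptions. A flagged gene is only a candidate, because phasing would still be needed to confirm that the DN and the inherited variant lie on opposite alleles.

```python
from collections import defaultdict


def second_hit_genes(variants, ar_genes):
    """Return sorted AR genes in which one proband carries a DN plus an inherited variant.

    `variants` lists one proband's variants as dicts with hypothetical keys
    'gene' and 'origin' ('de_novo', 'maternal', or 'paternal'). A hit is only a
    candidate: phasing is still needed to confirm the two variants are in trans.
    """
    origins = defaultdict(set)  # gene -> set of variant origins seen in that gene
    for v in variants:
        if v["gene"] in ar_genes:
            origins[v["gene"]].add(v["origin"])
    return sorted(
        gene for gene, seen in origins.items()
        if "de_novo" in seen and seen & {"maternal", "paternal"}
    )


variants = [
    {"gene": "GENE_A", "origin": "de_novo"},
    {"gene": "GENE_A", "origin": "maternal"},
    {"gene": "GENE_B", "origin": "de_novo"},
    {"gene": "GENE_C", "origin": "paternal"},
]
print(second_hit_genes(variants, {"GENE_A", "GENE_B", "GENE_C"}))  # ['GENE_A']
```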

Gene Ontology suggests differences in underlying mechanisms of syndromic and nonsyndromic CP

To explore potential biological differences of DNs in syndromic and nonsyndromic CP, we performed a ranked (ordered) query in g:Profiler26 to look for patterns in Gene Ontology (GO) term enrichment. Using the list of genes with PA DNs ordered from most to least significant, we compared results for the biological process (BP), cellular component (CC), and molecular function (MF) categories. With this approach, a significant result indicates that genes annotated to a given term are concentrated at the top of the ranked list rather than evenly distributed (i.e., the most significant genes are likely driving the observed enrichment). Broadly, more terms were returned for syndromic probands (n = 11) than for nonsyndromic probands (n = 4) (Table S14). The top results for syndromic probands were histone-modifying activity (GO:MF), chromatin remodeling (GO:BP), and nucleoplasm (GO:CC), suggesting involvement in nuclear and chromatin-related functions. In contrast, the results for nonsyndromic probands were ATP hydrolysis activity (GO:MF), ATP-dependent activity (GO:MF), and roof-of-mouth development (GO:BP), with no terms enriched for GO:CC. Although the nonsyndromic results are less specific, they may suggest that disruption of typical ATP utilization plays a role in CP, though this would require further study. Still, these terms suggest that the genes with DNs in syndromic probands play a role in epigenetic mechanisms involving chromatin, which was not true for our nonsyndromic probands. These findings are consistent with previous reports,16 though further investigation into genes belonging to these pathways is warranted.
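According to its documentation, g:Profiler's ordered query runs the enrichment test on successively longer prefixes of the ranked list, starting from the most significant gene, and keeps the best result for each term. The sketch below mimics that idea for one GO term using an exact hypergeometric tail. It is not g:Profiler's implementation and omits its multiple-testing correction (g:SCS by default). The function names and the toy background size are assumptions.

```python
from math import comb


def hypergeom_upper_tail(hits, n_background, n_term, n_drawn):
    """P(X >= hits), X ~ Hypergeometric(n_background genes, n_term in the term, n_drawn drawn)."""
    upper = min(n_term, n_drawn)
    numerator = sum(
        comb(n_term, i) * comb(n_background - n_term, n_drawn - i) for i in range(hits, upper + 1)
    )
    return numerator / comb(n_background, n_drawn)


def ordered_query(ranked_genes, term_genes, n_background):
    """Best p value over all top-n prefixes of `ranked_genes` (most significant first).

    Returns (best_p, best_n): the minimum hypergeometric p value and the prefix
    length at which it occurs. No multiple-testing correction is applied here.
    """
    term_genes = set(term_genes)
    best_p, best_n, hits = 1.0, 0, 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in term_genes:
            hits += 1
        if hits == 0:
            continue
        p = hypergeom_upper_tail(hits, n_background, len(term_genes), n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


term = {"A", "B", "C"}
print(ordered_query(["A", "B", "C", "D", "E"], term, n_background=100))  # term genes ranked first
print(ordered_query(["D", "E", "A", "B", "C"], term, n_background=100))  # same genes, ranked last
```

With the same three term genes, ranking them first yields a best p value ten times smaller than ranking them last. This is the property that lets a ranked query single out terms driven by the most significant genes.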

Distinct cell types are enriched for syndromic versus nonsyndromic CP probands

snRNA-seq data allow for the identification of specific sets of genes, often called marker genes, that are highly expressed within cell populations. We next wanted to know if any specific cell types were overrepresented in our DN dataset, as this may allow a more granular understanding of the key cells at play in CP development. For simplicity, we only report enrichment for the PA group in the following section, though all DN class enrichment is available in the supplemental information.

We first examined marker genes (FDR < 0.01) for 10 clusters derived from the secondary palate of mice at E15.5 (the time of palatal shelf fusion) (Table S15).27 Of the 2,647 total marker genes, 188 unique genes overlapped our dataset, containing 235 total DNs. In the full cohort, 5 clusters were significantly enriched (p < 0.005, correcting for 10 clusters): early osteocyte progenitor cells (3.71, p = 3.24 × 10−7), endothelium (1.88, p = 2.39 × 10−5), chondrocyte progenitor cells (3.61, p = 3.02 × 10−5), epithelium (1.63, p = 1.23 × 10−3), and late osteocyte progenitor cells (1.90, p = 2.83 × 10−3) (Tables 2 and S16; Figure 6A). When stratified by syndromic status, chondrocyte progenitor cells and endothelium remained significantly enriched only in our nonsyndromic group, whereas early and late osteocyte progenitor cells remained significant only in the syndromic group (Figures 6A and S6A). Interestingly, the mesenchyme was significantly enriched only in syndromic probands, and the epithelium was not significantly enriched in either group alone.
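The significance cutoffs used for the cell clusters (0.005 for 10 mouse clusters and 3.57 × 10−3 for 14 human lineages) equal 0.05 divided by the number of clusters, i.e., a Bonferroni threshold. The sketch below shows a per-cluster test built on that threshold. It assumes that per-gene observed and expected PA DN counts are already available and that each cluster is tested with a one-sided Poisson test on its pooled marker genes. Both are assumptions, and the names are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """One-sided P(X >= observed) for X ~ Poisson(expected), summed upward from `observed`."""
    if observed <= 0:
        return 1.0
    total, k = 0.0, observed
    while True:
        term = math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        total += term
        if k > expected and term <= total * 1e-16:
            break
        k += 1
    return min(total, 1.0)


def cluster_enrichment(clusters, observed_by_gene, expected_by_gene, alpha=0.05):
    """Test each cluster's marker genes for excess DNs against a Bonferroni threshold.

    clusters: {cluster_name: set of marker genes}
    observed_by_gene: {gene: PA DNs observed in the cohort}
    expected_by_gene: {gene: PA DNs expected under a mutation-rate model (assumed given)}
    Returns {cluster_name: (enrichment, p, significant)}.
    """
    threshold = alpha / len(clusters)  # 0.05 / 10 = 0.005; 0.05 / 14 ≈ 3.57e-3
    results = {}
    for name, markers in clusters.items():
        obs = sum(observed_by_gene.get(g, 0) for g in markers)
        exp = sum(expected_by_gene.get(g, 0.0) for g in markers)  # must be > 0
        p = poisson_upper_tail(obs, exp)
        results[name] = (obs / exp, p, p < threshold)
    return results


# Toy inputs (not the study's data): two clusters, so the threshold is 0.025.
clusters = {"cluster_1": {"G1", "G2"}, "cluster_2": {"G3"}}
observed = {"G1": 5, "G2": 3}
expected = {"G1": 1.0, "G2": 1.0, "G3": 2.0}
res = cluster_enrichment(clusters, observed, expected)
print(res)
```

In this toy example, cluster_1 pools 8 observed DNs against 2.0 expected (enrichment 4.0) and passes the threshold, while cluster_2 has no DNs and does not.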

Table 2.

Enrichment and select genes for protein-altering DNs in snRNA-seq clusters

Cluster All
Syndromic
Nonsyndromic
Enr. Enr. Genes (no. of DNs if > 1) Enr. Genes (no. of DNs if > 1)
Mouse secondary palate (embryonic day 15.5)

Blood cells 1.11 (0.20–2.02) 1.60 (0.00–3.20) FOXP1, HDAC9 0.69 (−0.28–1.67) SIRPA, IZFKA
Chondrocyte progenitor cells 3.61 (1.77–5.45)b 2.61 (0.26–4.96) COL2A1, SOX5, FGFR3 4.52 (1.65–7.39)a COL2A1 (5), COL11A, PRICKLE1
Early osteocyte progenitor cells 3.71 (2.12–5.30)b 5.11 (2.34–7.88)b SATB2 (11), PTCH1 2.53 (0.76–4.30) SATB2 (3), FMOD, PRTG
Endothelium 1.88 (1.36–2.40)b 1.80 (1.05–2.55) HDAC7, CTNND1, CTNNA1 1.90 (1.18–2.62)a TGFBR2 (2), ACTB, DYNC1H1
Epithelium 1.63 (1.15–2.11)a 1.77 (1.03–2.51) MEIS2 (4), PRKCI (2), ARID1A 1.53 (0.89–2.17) IRF6 (3), MEIS2, DNAH11
Late osteocyte progenitor cells 1.90 (1.12–2.68)a 2.40 (1.11–3.69)a SATB2 (11), SLC9A2, FRY, SYNE1 1.48 (0.54–2.42) SATB2 (3), INPP4A, CDH2
Mesenchyme 1.90 (1.05–2.75) 2.46 (1.05–3.87)a KAT6B (3), SSBP3 (2), SOX11 1.42 (0.41–2.43) RPL12, MAB21L2, PRICKLE1
Muscle progenitor cells 1.51 (1.00–2.02) 1.40 (0.68–2.12) FOXP1, DLG2, SNX1 1.62 (0.89–2.35) MYH3 (2), HIPK3, CDHR3
Neural progenitor cells 0.92 (0.50–1.34) 1.57 (0.75–2.39) UBL3, CHD7, SOX5 0.36 (0.00–0.73) MSI1, SLC7A14
Pax9 mesenchyme 1.99 (0.84–3.14) 2.87 (0.85–4.89) FOXP1, RBMS1, ARHGAP20 1.24 (−0.01–2.49) COL11A, RBMS1, CDH11

Human whole embryo (post-conception weeks 3–5)

Cardiomyocyte 0.96 (0.01–1.91) 1.56 (−0.26–3.38) TTN 0.45 (−0.46–1.36) TTN
CNS lineage 2.04 (0.18–3.90) 2.65 (−0.5–5.8) ELAVL1, PBX1, SOX11 1.53 (−0.65–3.71) CDH2, KIF21A
Dermomyotome 5.61 (2.10–9.12)d 3.64 (−0.69–7.97) COL2A1, RPL2, RPS5 7.37 (1.49–13.25)c COL2A1 (5), RPL12, RPS6
Endoderm 1.77 (0.03–3.51) 1.91 (−0.92–4.74) CDKN1C, LAMB1 1.66 (−0.70–4.02) SERPINA1, DSC2
Endothelium 1.20 (−0.19–2.59) 0.87 (−0.8–2.53) SLC9A3R2 1.51 (−0.67–3.69) ANKRD37, ARPC1B
Epithelium 4.90 (2.13–7.67) 2.65 (−0.5–5.8) COL2A1, RPL2, RPS5 6.89 (2.27–11.51)d COL2A1 (5), RPL12, PRTG
Immune 2.74 (0.26–5.22) 3.56 (−0.77–7.89) CTSZ, FCGRT, MAF 2.06 (−0.77–4.89) APRBC1, ACTB
Intermediate mesoderm 4.30 (1.90–6.70)d 6.44 (2.15–10.73)d MEIS2 (4), CDKN1C, PAX8 2.48 (−0.02–4.98) MEIS2, RBMS1, PEG3
Lateral plate mesoderm 3.02 (1.02–5.02)c 4.36 (0.86–7.86)c MEIS2 (4), CDKN1C, FOXP1 1.89 (−0.28–4.06) MEIS2, DSC2, PEG3
Mesoderm precursor 1.71 (−0.21–3.63) 1.23 (−1.27–3.73) FBL 2.14 (−1.00–5.28) RPS6, SMC6
Neuromesodermal 0.53 (−0.52–1.59) 1.09 (−1.00–3.00) RPS6
PNS lineage 1.70 (0.03–3.37) 1.84 (−0.73–4.41) RPS2, BCAR3 1.59 (−0.59–3.77) YBX1, ARPC1B
Primitive erythroid 0.50 (−0.50–1.50) 1.09 (−1.13–3.31) HK1
Sclerotome 3.81 (1.38–6.24)c 2.47 (−0.42–5.36) CCD32, COL2A1, FBN2 5.00 (1.22–8.78)c COL2A1 (5), CDH11, SNAI1
a: Enrichment is significant with p < 5 × 10−3.
b: Enrichment is significant with p < 5 × 10−5.
c: Enrichment is significant with p < 3.6 × 10−3.
d: Enrichment is significant with p < 3.6 × 10−5.

Figure 6.


Cell clusters based on sets of marker genes are differentially enriched based on syndromic status and CP subtype

(A and B) Radar plots for −log10(p value) showing significant enrichment by cluster in snRNA-seq data from mouse palate at embryonic day 15.5 by (A) syndromic status and (B) CP subtype.

(C and D) Radar plots for −log10(p value) showing significant enrichment by cluster in snRNA-seq data from human embryos at post-conceptional weeks 3–5 by (C) syndromic status and (D) CP subtype.

For (A) and (B), the inner dotted ring represents p = 0.005, and the solid, outermost ring represents p = 1 × 10−10. For (C) and (D), the inner dotted ring represents p = 0.0036, and the solid, outermost ring represents p = 1 × 10−6.

When looking at the specific genes within these clusters, we found varied patterns of DN burden. There were five COL2A1 DNs within the nonsyndromic probands, accounting for 50% of the observed variants in the chondrocyte progenitor cluster. In contrast, among the 28 DNs found within genes of the endothelium, only two genes (TGFBR2 and DYNC1H1) harbored more than one DN, with two each. Similarly, in the syndromic group, SATB2 variants accounted for 79% (11/14) of DNs in both early and late osteocyte progenitors, whereas in the mesenchyme, only two of the genes harboring the 13 total DNs had more than one (KAT6B, n = 3; SSBP3, n = 2). Collectively, these results suggest that while some genes are individually enriched for DNs, such as COL2A1 or SATB2, disruption of broader processes (rather than a specific gene) also contributes to CP pathophysiology. For example, 24% (6/25) of the endothelium marker genes with DNs in nonsyndromic probands are part of the actin cytoskeleton; thus, deeper investigation into more general pathways and/or cell components may lead to more discoveries.
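Whether a cluster's enrichment rests on one driver gene or is spread across many genes can be summarized from the list of genes carrying its DNs. The sketch below takes one gene symbol per DN. The toy inputs reproduce the shapes described above (11 of 14 syndromic osteocyte-progenitor DNs in SATB2; 13 mesenchyme DNs with only KAT6B and SSBP3 repeated) using placeholder names for the remaining genes.

```python
from collections import Counter


def driver_summary(dn_genes):
    """Summarize how concentrated a cluster's DNs are.

    dn_genes: one gene symbol per DN falling in the cluster's marker genes.
    Returns (top_gene, top_fraction, genes_with_multiple_dns).
    """
    counts = Counter(dn_genes)
    top_gene, top_count = counts.most_common(1)[0]
    multiple = {gene: n for gene, n in counts.items() if n > 1}
    return top_gene, top_count / len(dn_genes), multiple


# Placeholder genes stand in for the singleton DNs.
osteo = ["SATB2"] * 11 + ["GENE_X", "GENE_Y", "GENE_Z"]
gene, frac, multi = driver_summary(osteo)
print(gene, round(100 * frac), multi)  # SATB2 79 {'SATB2': 11}

mesenchyme = ["KAT6B"] * 3 + ["SSBP3"] * 2 + [f"GENE_{i}" for i in range(8)]
print(driver_summary(mesenchyme)[2])  # {'KAT6B': 3, 'SSBP3': 2}
```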

As with other evaluations, we also compared cluster enrichment by CP subtype to identify specific cell types that may drive phenotypic heterogeneity. However, the only cluster with any significant enrichment was the epithelium in CSP (2.34, p = 1.95 × 10−3) (Figures 6B and S6B; Table S16). There were 16 DNs in these cluster marker genes: 3 were in PRKCI, and each of the remaining 13 fell in a different gene. Although we did not identify clear patterns by subtype, future studies with larger sample sizes may clarify the trends we observed.

We also considered that an investigation into earlier cell origins could reveal differences between CP subgroups, as the mouse data were derived from palatal tissue specifically at the time of osteogenesis. As such, we next evaluated DNs in marker genes from a publicly available single-cell RNA-seq (scRNA-seq) dataset derived from human embryos at post-conceptional weeks 3–5 and clustered into 14 main cell lineages.28 In this dataset, there were 952 total marker genes (Table S17), of which 51 unique genes contained 65 total DNs from our cohort. Four clusters were significantly enriched (p < 3.57 × 10−3, correction for 14 clusters), including epithelium (4.90, p = 1.04 × 10−5), intermediate mesoderm (4.30, p = 1.75 × 10−5), dermomyotome (5.61, p = 1.78 × 10−5), and sclerotome (3.81, p = 4.04 × 10−4), though the lateral plate mesoderm was barely shy of our cutoff (3.02, p = 3.65 × 10−3) (Tables 2 and S18; Figure S7). With this dataset, genes for the epithelium, dermomyotome, and sclerotome were enriched for DNs in nonsyndromic probands, whereas the intermediate mesoderm and lateral plate mesoderm were enriched for DNs in the syndromic probands (Figures 6C and S7A). However, as with the mouse data, COL2A1 was a key driver in all enriched clusters for nonsyndromic probands, making up 56% (5/9), 71% (5/7), and 71% (5/7) of the DNs in epithelium, dermomyotome, and sclerotome, respectively. A similar pattern for syndromic probands was found, with MEIS2 accounting for 44% (4/9) of intermediate mesoderm and 67% (4/6) of lateral plate mesoderm DNs. Unlike the mouse data, there were no enriched clusters that lacked major driver genes.

We found a single enriched cluster when stratified by CP subtype (Figures 6D and S7B): the sclerotome remained significant for CH&SP probands, with 2 DNs in COL2A1 and 1 each in CDH1 and SNAI1. We noted fewer overall DNs overlapping with the human data (5.3% human versus 7.1% mouse). This may suggest that DNs leading to CP are more specific to genes expressed in the palate, but it may also reflect nonbiological factors, such as differences in statistical analysis methods or other technical differences between datasets.

In summary, both datasets were enriched for DNs in this cohort, with distinct patterns emerging primarily between syndromic and nonsyndromic CP probands. Although larger sample sizes are needed for further validation, investigation into variants within other genes of the enriched cell clusters may provide additional insight into CP etiology.

Discussion

Here, we found exome-wide enrichment in a large-scale investigation of coding DNs in 818 CP trios representing multiple CP subtypes, a broad phenotypic spectrum, and varying family history. Our cohort reflects the heterogeneity of the genetic architecture of CP: collective evaluation increases our power to detect significantly enriched genes or groups of genes, while stratification helps identify more specific signals. Although previous work has found putative causal DNs in multiplex families, we observed little to no contribution of the DNs in these probands to our findings. Among the top twenty enriched genes, only one DN came from a multiplex proband (POLR1F). While there are likely individual DNs contributing to risk for CP in some probands, they are collectively not a main factor in this cohort.

There was a significantly higher burden of DNs in syndromic probands, which was not unexpected given that de novo variation is not under selective pressure and therefore can result in more severe phenotypes. Interestingly, although there was no significant difference in the DN rates between males and females within our cohort in any group, the M:F ratio in probands classified as syndromic deviated from the ratios reported by EUROCAT. We suspect that this observation is a result of ascertainment: the majority of our syndromic probands were ascertained on NDDs as part of the DDD study, and NDDs disproportionately affect males.35,36

A second possible effect of ascertainment differences between DDD and CPSeq was the lack of DNs in genes frequently associated with syndromic forms of CP that are not characterized by intellectual disability or neurological phenotypes. Of note, no syndromic probands harbored DNs in IRF6,37 GRHL3,38 or MYH339; these were found only in our presumed nonsyndromic individuals. These results likely indicate differing mechanisms underlying syndromic CP featuring NDDs versus other congenital anomalies, illustrating that care must be taken not to extrapolate the “syndromic” results presented here to all syndromic forms of CP. The presence of PA DNs in the nonsyndromic cohort in genes typically associated with syndromic OFCs raises other important issues facing human genetics, including distinguishing variable expressivity from the limitations of phenotyping. Some of these “syndromic” genes with DNs in the nonsyndromic cohort included COL2A1,40 SATB2,41 and MEIS2.42 While variable expressivity has been documented for IRF6,37,43 GRHL3,9,38 and COL2A1,44,45 this is less true for the other genes mentioned here. It is important to note, however, that our nonsyndromic probands are only presumed to be nonsyndromic: some were recruited in infancy, before additional features, particularly NDDs, would have become apparent. Therefore, we cannot rule out that a small subset of our cohort would later be classified as syndromic. Still, our findings could suggest an expansion of the phenotypic spectrum for SATB2, MYH3, and MEIS2 to include isolated CP.

Previous work by Wilson et al.16 found that P/LP variants in syndromic CP probands were overrepresented in genes involved in chromatin remodeling, a process historically implicated in NDDs. This is similar to our findings, where the most significant genes with PA DNs in syndromic probands were enriched for both chromatin remodeling and histone-modifying activity. It is important to note, however, that the two studies overlap substantially in their data, as many of their samples also came from DDD. Broadly, the distinctions between GO results for syndromic and nonsyndromic probands can provide clues to etiologic differences. For example, syndromic probands harbored DNs in genes enriched for epigenetic functions, including chromatin remodeling/organization and histone modification. Disruption of such broad processes could plausibly produce a wider range of effects, as observed in syndromic probands (e.g., multiple congenital anomalies or neurodevelopmental phenotypes). In contrast, there were fewer findings for nonsyndromic probands, which included the relatively nonspecific categories ATP hydrolysis and ATP-dependent activity. Yet, it remains possible that palatogenesis is particularly sensitive to perturbation of the energy stores required for cell migration or cytoskeletal rearrangement, leading to palatal defects.

Understanding the genetic underpinnings of CP subtypes remains a challenge. It could be hypothesized that CSP is a less severe version of CH&SP. The current data do not support this idea, as there were no differences in the rate of DNs in any class of variant between the two. In fact, although not significantly different, there was a higher enrichment of pLoF DNs for CSP than for CH&SP, suggesting a heterogeneous genetic architecture underlying similar phenotypic heterogeneity. We also aimed to identify differences in specific cell types or BPs and found two enriched clusters by subtype: CH&SP in the sclerotome from the human data and CSP in the mouse epithelium cluster. Although the sclerotome does not directly contribute to craniofacial development, it does similarly give rise to bony structures (the vertebral column) and associated soft tissues (intervertebral discs and meninges).46 Therefore, it would be unsurprising to observe an enrichment for CH&SP PA DNs if these genes play similar roles in palatal development, though further study is needed to substantiate such speculation. We also found that probands with CSP were enriched for DNs in genes from the mouse epithelium cluster. This may indicate a higher likelihood for CSP with disruption of genes in the epithelium, but we also know that genes highly expressed in this tissue, such as IRF6 and MEIS2, are not specifically associated with any CP subtype, so these associations must be interpreted cautiously.

Taken together, these data imply that there is not currently a single list of genes that represents all individuals with CP, and care should be taken when creating lists of “CP genes.” Although the underlying architecture of CP is heterogeneous, we highlight the utility of combined analysis of all classifications of CP to identify candidate genes for CP and to expand the phenotypic spectrum for others. In aggregate, we show there are differences between syndromic and nonsyndromic probands, but on an individual level, we find that these distinctions are less clear. Our main goal was to identify significant enrichment collectively, which can guide future investigation using these top candidate genes. As such, we recognize that this dataset contains both presumably causal and non-causal de novo variation: more detailed curation for variants is needed for better characterization of individual pathogenicity.

When considering this in a larger clinical picture, genetic testing for any individual born with a CP, regardless of the presence of additional clinical features, may be fruitful—in fact, we found that 6.4% (52/818) of probands had a pLoF DN in an AD OFC-associated gene, 16 of which were in presumed nonsyndromic individuals. This suggests that the clinical diagnostic yield for CP in this cohort may be similar to that of recent reports.15 Together, these findings highlight the heterogeneity of CP genetics and the continued need for larger cohorts. Future studies focused on the spectrum of phenotypes in genes associated with OFC syndromes, deeper exploration of subtype genetic risks, and the contribution of rare inherited variants are warranted to better understand palatogenesis and the genetic architecture of CP.

Acknowledgments

We are very thankful for the participants, their families, and colleagues who have made this research possible. Sequencing services for CPSeq were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health (NIH) to the Johns Hopkins University, contract number HHSN268201700006I. Additional WGS was funded by NIH grants X01-HG010835 (E.J.L.-C. and M.L.M.), X01-HL0132363 (M.L.M.), and X01-HD100701 (E.J.L.-C., J.C.M., and M.L.M.). Patient recruitment, assembly of phenotypic information, and data analysis were supported by NIH grants F31-DE032588 (K.R.R.), R01-DE027983 (E.J.L.-C.), R01-DE028342 (E.J.L.-C.), R01-DE030342 (E.J.L.-C.), R01-DE028300 (A.B.), R00-DE024571 (C.J.B.), U54GM133807 (C.J.B.), R01-DE008559 (J.C.M.), R01-DE016148 (M.L.M. and S.M.W.), R01-DE008559 (J.C.M. and M.L.M.), R01-DE032122 (M.L.M.), R01-DE0332319 (M.L.M., E.J.L.-C., and S.M.W.), R01-DE011931 (J.T.H.), and R01-DE031261 (H.B.). We thank the California Department of Public Health, Maternal Child and Adolescent Division, for providing data for these analyses. This work was supported by the Centers for Disease Control and Prevention, Centers of Excellence no. U01-DD001033 (G.M.S.).

Author contributions

This study was conceptualized by E.J.L.-C. and K.R.R. Resources were contributed by T.H.B., A.B., C.J.B., K.F.D., J.T.H., L.M.U., J.C.M., J.E.P., G.M.S., S.M.W., H.B., M.L.M., and E.J.L.-C. Data curation was performed by K.R.R., S.C., J.E.P., D.J.C., K.F.D., and H.B. Data analysis, investigation, and visualization were performed by K.R.R. under the supervision of D.J.C., M.P.E., and E.J.L.-C. The original manuscript was drafted by K.R.R. All authors contributed to critical review and approval of the final draft.

Declaration of interests

The authors declare no competing interests.

Published: October 6, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2025.09.009.

Web resources

Supplemental information

Document S1. Figures S1–S7 and Tables S1–S3, S5–S8, and S13
Data S1. Tables S4, S9–S12, and S14–S18
Document S2. Article plus supplemental information

References

  • 1. Mai C.T., Isenburg J.L., Canfield M.A., Meyer R.E., Correa A., Alverson C.J., Lupo P.J., Riehle-Colarusso T., Cho S.J., Aggarwal D., et al. National population-based estimates for major birth defects, 2010-2014. Birth Defects Res. 2019;111:1420–1435. doi: 10.1002/bdr2.1589.
  • 2. Stewart J.M., Ott J.E., Lagace R. Submucous cleft palate: prevalence in a school population. Cleft Palate J. 1972;9:246–250.
  • 3. Kang S.L., Narayanan C.S., Kelsall W. Mortality among infants born with orofacial clefts in a single cleft network. Cleft Palate. Craniofac. J. 2012;49:508–511. doi: 10.1597/10-179.
  • 4. Marazita M.L. The evolution of human genetic studies of cleft lip and cleft palate. Annu. Rev. Genomics Hum. Genet. 2012;13:263–283. doi: 10.1146/annurev-genom-090711-163729.
  • 5. Wehby G.L., Cassell C.H. The impact of orofacial clefts on quality of life and healthcare use and costs. Oral Dis. 2010;16:3–10. doi: 10.1111/j.1601-0825.2009.01588.x.
  • 6. Moreira T., Dias M., Von Hafe M., Curval A.R., Ramalho C., Maia A.M., Moura C.P., Orofacial Cleft Team of University Hospital Center of São João EPE. Orofacial clefts: Reflections on prenatal diagnosis and family history based on a series of cases of a tertiary children hospital. Congenit. Anom. 2023;63:195–199. doi: 10.1111/cga.12538.
  • 7. Trezena S., Machado R.A., de Almeida Reis S.R., Scariot R., Rangel A.L.C.A., de Oliveira F.E.S., Borges A.J., Silva A.T., Martelli D.R.B., Martelli Júnior H. Isolated nonsyndromic cleft palate: multicenter epidemiological study in the Brazil. BMC Oral Health. 2023;23:486. doi: 10.1186/s12903-023-03197-3.
  • 8. Rahimov F., Nieminen P., Kumari P., Juuri E., Nikopensius T., Paraiso K., German J., Karvanen A., Kals M., Elnahas A.G., et al. High incidence and geographic distribution of cleft palate in Finland are associated with the IRF6 gene. Nat. Commun. 2024;15:9568. doi: 10.1038/s41467-024-53634-2.
  • 9. Leslie E.J., Liu H., Carlson J.C., Shaffer J.R., Feingold E., Wehby G., Laurie C.A., Jain D., Laurie C.C., Doheny K.F., et al. A Genome-wide Association Study of Nonsyndromic Cleft Palate Identifies an Etiologic Missense Variant in GRHL3. Am. J. Hum. Genet. 2016;98:744–754. doi: 10.1016/j.ajhg.2016.02.014.
  • 10. Butali A., Mossey P.A., Adeyemo W.L., Eshete M.A., Gowans L.J.J., Busch T.D., Jain D., Yu W., Huan L., Laurie C.A., et al. Genomic analyses in African populations identify novel risk loci for cleft palate. Hum. Mol. Genet. 2019;28:1038–1051. doi: 10.1093/hmg/ddy402.
  • 11. Huang L., Jia Z., Shi Y., Du Q., Shi J., Wang Z., Mou Y., Wang Q., Zhang B., Wang Q., et al. Genetic factors define CPO and CLO subtypes of nonsyndromic orofacial cleft. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008357.
  • 12. He M., Zuo X., Liu H., Wang W., Zhang Y., Fu Y., Zhen Q., Yu Y., Pan Y., Qin C., et al. Genome-wide Analyses Identify a Novel Risk Locus for Nonsyndromic Cleft Palate. J. Dent. Res. 2020;99:1461–1468. doi: 10.1177/0022034520943867.
  • 13. Beaty T.H., Ruczinski I., Murray J.C., Marazita M.L., Munger R.G., Hetmanski J.B., Murray T., Redett R.J., Fallin M.D., Liang K.Y., et al. Evidence for gene-environment interaction in a genome wide study of nonsyndromic cleft palate. Genet. Epidemiol. 2011;35:469–478. doi: 10.1002/gepi.20595.
  • 14. Robinson K., Mosley T.J., Rivera-González K.S., Jabbarpour C.R., Curtis S.W., Adeyemo W.L., Beaty T.H., Butali A., Buxó C.J., Cutler D.J., et al. Trio-based GWAS identifies novel associations and subtype-specific risk factors for cleft palate. Human Genet. Genom. Adv. 2023;4 doi: 10.1016/j.xhgg.2023.100234.
  • 15. Diaz Perez K.K., Curtis S.W., Sanchis-Juan A., Zhao X., Head T., Ho S., Carter B., McHenry T., Bishop M.R., Valencia-Ramirez L.C., et al. Rare variants found in clinical gene panels illuminate the genetic and allelic architecture of orofacial clefting. Genet. Med. 2023;25 doi: 10.1016/j.gim.2023.100918.
  • 16. Wilson K., Newbury D.F., Kini U. Analysis of exome data in a UK cohort of 603 patients with syndromic orofacial clefting identifies causal molecular pathways. Hum. Mol. Genet. 2023;32:1932–1942. doi: 10.1093/hmg/ddad023.
  • 17. Bishop M.R., Diaz Perez K.K., Sun M., Ho S., Chopra P., Mukhopadhyay N., Hetmanski J.B., Taub M.A., Moreno-Uribe L.M., Valencia-Ramirez L.C., et al. Genome-wide Enrichment of De Novo Coding Mutations in Orofacial Cleft Trios. Am. J. Hum. Genet. 2020;107:124–136. doi: 10.1016/j.ajhg.2020.05.018.
  • 18. Conrad D.F., Keebler J.E.M., DePristo M.A., Lindsay S.J., Zhang Y., Casals F., Idaghdour Y., Hartl C.L., Torroja C., Garimella K.V., et al. Variation in genome-wide mutation rates within and between human families. Nat. Genet. 2011;43:712–714. doi: 10.1038/ng.862.
  • 19. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  • 20. Van der Auwera G.A., Carneiro M.O., Hartl C., Poplin R., Del Angel G., Levy-Moonshine A., Jordan T., Shakir K., Roazen D., Thibault J., et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43.
  • 21. Wright C.F., Fitzgerald T.W., Jones W.D., Clayton S., McRae J.F., van Kogelenberg M., King D.A., Ambridge K., Barrett D.M., Bayzetinova T., et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet. 2015;385:1305–1314. doi: 10.1016/S0140-6736(14)61705-0.
  • 22. McRae J.F., Clayton S., Fitzgerald T.W., Kaplanis J., Prigmore E., Rajan D., Sifrim A., Aitken S., Akawi N., Alvi M., et al. Prevalence and architecture of de novo mutations in developmental disorders. Nature. 2017;542:433–438. doi: 10.1038/nature21062.
  • 23. Kaplanis J., Samocha K.E., Wiel L., Zhang Z., Arvai K.J., Eberhardt R.Y., Gallone G., Lelieveld S.H., Martin H.C., McRae J.F., et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature. 2020;586:757–762. doi: 10.1038/s41586-020-2832-5.
  • 24. Samocha K.E., Robinson E.B., Sanders S.J., Stevens C., Sabo A., McGrath L.M., Kosmicki J.A., Rehnström K., Mallick S., Kirby A., et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 2014;46:944–950. doi: 10.1038/ng.3050.
  • 25. Hayat M.J., Higgins M. Understanding poisson regression. J. Nurs. Educ. 2014;53:207–215. doi: 10.3928/01484834-20140325-04.
  • 26. Kolberg L., Raudvere U., Kuzmin I., Adler P., Vilo J., Peterson H. g:Profiler—interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51:W207–W212. doi: 10.1093/nar/gkad347.
  • 27. Piña J.O., Raju R., Roth D.M., Winchester E.W., Chattaraj P., Kidwai F., Faucz F.R., Iben J., Mitra A., Campbell K., et al. Multimodal spatiotemporal transcriptomic resolution of embryonic palate osteogenesis. Nat. Commun. 2023;14:5687. doi: 10.1038/s41467-023-41349-9.
  • 28. Zeng B., Liu Z., Lu Y., Zhong S., Qin S., Huang L., Zeng Y., Li Z., Dong H., Shi Y., et al. The single-cell and spatial transcriptional landscape of human gastrulation and early brain development. Cell Stem Cell. 2023;30:851–866.e7. doi: 10.1016/j.stem.2023.04.016.
  • 29. More R.P., Warrier V., Brunel H., Buckingham C., Smith P., Allison C., Holt R., Bradshaw C.R., Baron-Cohen S. Identifying rare genetic variants in 21 highly multiplex autism families: the role of diagnosis and autistic traits. Mol. Psychiatry. 2023;28:2148–2157. doi: 10.1038/s41380-022-01938-4.
  • 30. Calzolari E., Bianchi F., Rubini M., Ritvanen A., Neville A.J., EUROCAT Working Group. Epidemiology of cleft palate in Europe: implications for genetic research. Cleft Palate. Craniofac. J. 2004;41:244–249. doi: 10.1597/02-074.1.
  • 31. Robinson K., Singh S.K., Walkup R.B., Fawwal D.V., Vilfort K.M., Koloskee A., Fashina A., Adeyemo W.L., Beaty T.H., Butali A., et al. Rare variants in PRKCI cause Van der Woude syndrome and other features of peridermopathy. Am. J. Hum. Genet. 2025 doi: 10.1016/j.ajhg.2025.08.008.
  • 32. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
  • 33. Gerrard G., Valgañón M., Foong H.E., Kasperaviciute D., Iskander D., Game L., Müller M., Aitman T.J., Roberts I., de la Fuente J., et al. Target enrichment and high-throughput sequencing of 80 ribosomal protein genes to identify mutations associated with Diamond-Blackfan anaemia. Br. J. Haematol. 2013;162:530–536. doi: 10.1111/bjh.12397.
  • 34. Gazda H.T., Sheen M.R., Vlachos A., Choesmel V., O'Donohue M.F., Schneider H., Darras N., Hasman C., Sieff C.A., Newburger P.E., et al. Ribosomal protein L5 and L11 mutations are associated with cleft palate and abnormal thumbs in Diamond-Blackfan anemia patients. Am. J. Hum. Genet. 2008;83:769–780. doi: 10.1016/j.ajhg.2008.11.004.
  • 35. Posserud M.-B., Skretting Solberg B., Engeland A., Haavik J., Klungsøyr K. Male to female ratios in autism spectrum disorders by age, intellectual disability and attention-deficit/hyperactivity disorder. Acta Psychiatr. Scand. 2021;144:635–646. doi: 10.1111/acps.13368.
  • 36. May T., Adesina I., McGillivray J., Rinehart N.J. Sex differences in neurodevelopmental disorders. Curr. Opin. Neurol. 2019;32:622–626. doi: 10.1097/WCO.0000000000000714.
  • 37. de Lima R.L.L.F., Hoper S.A., Ghassibe M., Cooper M.E., Rorick N.K., Kondo S., Katz L., Marazita M.L., Compton J., Bale S., et al. Prevalence and nonrandom distribution of exonic mutations in interferon regulatory factor 6 in 307 families with Van der Woude syndrome and 37 families with popliteal pterygium syndrome. Genet. Med. 2009;11:241–247. doi: 10.1097/GIM.0b013e318197a49a.
  • 38.Peyrard-Janvid M., Leslie E.J., Kousa Y.A., Smith T.L., Dunnwald M., Magnusson M., Lentz B.A., Unneberg P., Fransson I., Koillinen H.K., et al. Dominant mutations in GRHL3 cause Van der Woude Syndrome and disrupt oral periderm development. Am. J. Hum. Genet. 2014;94:23–32. doi: 10.1016/j.ajhg.2013.11.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Carapito R., Goldenberg A., Paul N., Pichot A., David A., Hamel A., Dumant-Forest C., Leroux J., Ory B., Isidor B., Bahram S. Protein-altering MYH3 variants are associated with a spectrum of phenotypes extending to spondylocarpotarsal synostosis syndrome. Eur. J. Hum. Genet. 2016;24:1746–1751. doi: 10.1038/ejhg.2016.84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Hoornaert K.P., Vereecke I., Dewinter C., Rosenberg T., Beemer F.A., Leroy J.G., Bendix L., Björck E., Bonduelle M., Boute O., et al. Stickler syndrome caused by COL2A1 mutations: genotype-phenotype correlation in a series of 100 patients. Eur. J. Hum. Genet. 2010;18:872–880. doi: 10.1038/ejhg.2010.23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Zarate Y.A., Bosanko K.A., Caffrey A.R., Bernstein J.A., Martin D.M., Williams M.S., Berry-Kravis E.M., Mark P.R., Manning M.A., Bhambhani V., et al. Mutation update for the SATB2 gene. Hum. Mutat. 2019;40:1013–1029. doi: 10.1002/humu.23771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Giliberti A., Currò A., Papa F.T., Frullanti E., Ariani F., Coriolani G., Grosso S., Renieri A., Mari F. MEIS2 gene is responsible for intellectual disability, cardiac defects and a distinct facial phenotype. Eur. J. Med. Genet. 2020;63 doi: 10.1016/j.ejmg.2019.01.017. [DOI] [PubMed] [Google Scholar]
  • 43.Leslie E.J., Koboldt D.C., Kang C.J., Ma L., Hecht J.T., Wehby G.L., Christensen K., Czeizel A.E., Deleyiannis F.W.B., Fulton R.S., et al. IRF6 mutation screening in non-syndromic orofacial clefting: analysis of 1521 families. Clin. Genet. 2016;90:28–34. doi: 10.1111/cge.12675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Nikopensius T., Jagomägi T., Krjutškov K., Tammekivi V., Saag M., Prane I., Piekuse L., Akota I., Barkane B., Krumina A., et al. Genetic variants in COL2A1, COL11A2, and IRF6 contribute risk to nonsyndromic cleft palate. Birth Defects Res. A Clin. Mol. Teratol. 2010;88:748–756. doi: 10.1002/bdra.20700. [DOI] [PubMed] [Google Scholar]
  • 45.Lace B., Pajusalu S., Livcane D., Grinfelde I., Akota I., Mauliņa I., Barkāne B., Stavusis J., Inashkina I. Monogenic Versus Multifactorial Inheritance in the Development of Isolated Cleft Palate: A Whole Genome Sequencing Study. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.828534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Carlson B.M. In: Human Embryology and Developmental Biology. Fifth Edition. Carlson B.M., editor. W.B. Saunders; 2014. pp. 92–116. [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S7 and Tables S1–S3, S5–S8, and S13 (mmc1.pdf, 1,012.6 KB)
Data S1. Tables S4, S9–S12, and S14–S18 (mmc2.xlsx, 847.4 KB)
Document S2. Article plus supplemental information (mmc3.pdf, 2.3 MB)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
