Human Mutation. 2025 Aug 28;2025:6824122. doi: 10.1155/humu/6824122

Functional Validation of Noncoding Variants Associated With Nonsyndromic Orofacial Cleft

Siying Zhu 1, Hongxu Tao 1, Robert A Cornell 2, Huan Liu 1,3,4
PMCID: PMC12411058  PMID: 40918684

Abstract

Over the past decade, genome-wide association studies (GWASs) have found genetic variants associated with elevated risk for nonsyndromic orofacial cleft (NSOFC). In the post-GWAS era of NSOFC genetic research, an important aim is to identify the pathogenic variants that influence craniofacial development processes, towards understanding how they lead to disease manifestation. However, two major challenges hinder the translation of GWAS results into a mechanistic understanding. Firstly, it is uncertain whether the variants pinpointed by GWAS represent the underlying pathogenic variants; secondly, the bulk of genetic variants identified through GWAS are situated in noncoding regions of the genome, complicating their biological interpretation. Presently, research on noncoding genetic variants associated with NSOFC predominantly centers on variants located in transcriptional regulatory elements. These variants modulate transcription, subsequently altering the expression of downstream target genes and disrupting gene regulatory networks. We provide a systematic summary of the recent NSOFC-associated GWAS findings for the first time. With a particular focus on variants located in noncoding regions, we delve into current statistical methods and functional approaches for identifying and validating causal variants, aiming to bridge the gap between genetic variants identified by GWAS and their underlying pathogenic mechanism responsible for NSOFC. Deciphering causal variants underlying NSOFC offers valuable clinical insights that may advance early diagnosis, enhance risk stratification, and facilitate the discovery of novel therapeutic targets.

Keywords: cisregulatory element, genome-wide association study, noncoding variant, nonsyndromic orofacial cleft

1. Introduction

Orofacial cleft (OFC) is among the most prevalent complex genetic disorders, with a global incidence of about 1 per 700 live births and considerable variation among ethnic groups [1]. Approximately 70% of OFC cases occur without other developmental anomalies and are classified as nonsyndromic orofacial cleft (NSOFC), which is attributed to the involvement of multiple genes and several environmental risk factors [2, 3]. NSOFC is frequently classified into two subtypes: nonsyndromic cleft palate only (NSCPO) and nonsyndromic cleft lip with or without cleft palate (NSCL/P), which includes nonsyndromic cleft lip only (NSCLO) and nonsyndromic cleft lip with cleft palate (NSCLP). Studies on twins and familial clustering have highlighted the significant genetic contributions to the etiology of NSOFC [4]. To date, a large number of susceptibility genes/loci achieving genome-wide significance have been identified through 15 independent genome-wide association studies (GWASs) [5–19], along with numerous subsequent meta-analyses and replication studies [20–36].

Despite the notable achievements of GWAS, the challenge of explaining the missing heritability and pinpointing causal variants persists. GWAS can account for approximately two-thirds of the heritability [37, 38]. One explanation for the missing heritability is the inadvertent exclusion of a myriad of common variants with marginal effects and of rare variants that fail to meet the stringent genome-wide significance threshold (typically set at p < 5 × 10⁻⁸) [39–41]. The quest for the missing fraction often necessitates sequencing of whole genomes or targeted genomic regions [42]. However, the precise identification of functional variants poses a formidable challenge. Owing to linkage disequilibrium (LD), the lead single nucleotide polymorphisms (SNPs) initially highlighted by GWAS may not be the causal variants within a haplotype block; the pathogenic variants may instead be other variants in strong LD with them [41]. Furthermore, the bulk of disease-associated SNPs uncovered by GWAS or sequencing are situated in noncoding regions of the genome, rendering their biological interpretation inherently challenging [39]. One subset of these SNPs is hypothesized to modulate the function of noncoding RNA (ncRNA) species (such as long ncRNA and microRNA [miRNA]) [43, 44] and another to affect cisregulatory elements (CREs) including enhancers, inhibitors, boundary elements, and promoters [45]. Consequently, diverse strategies encompassing statistical fine-mapping and expression quantitative trait locus (eQTL) analyses have been employed to sift through significant variants unearthed by GWAS in search of putative functional variants [46, 47]. While efforts have been made to prioritize candidate variants within noncoding regions, the identity of the downstream genes they modulate remains elusive. In addition, the interplay between variants and ncRNA remains poorly understood. CREs can be present at varying distances from the genes they regulate [48]. Furthermore, a single CRE may exert regulatory influence over multiple target genes [49, 50]. Moreover, promoters and enhancers sometimes overlap, and variants may concurrently influence the activity of both [51]. These observations underscore the need for functional experimentation to unravel the mechanisms through which alterations in noncoding sequences impact the activity of CREs and the expression or function of targeted genes.

Here, we first provided a synopsis of noncoding variants associated with NSOFC identified through GWAS and sequencing. Next, we summarized strategies for prioritizing candidate variants, including fine-mapping variants, analyzing eQTL results, sequencing genomes at high-throughput, primarily focusing on the identification of CREs, and applying machine learning techniques. Lastly, we scrutinized the current experimental pipelines used for validating the functional significance of noncoding variants.

2. Known Noncoding Variants Associated With NSOFC

Noncoding regions encompass introns, CREs, 5′ and 3′ untranslated regions (UTRs) of mRNAs, and a range of ncRNAs, and variation in any of these elements may influence disease pathogenesis (Figure 1). miRNAs are a subset of ncRNAs, and at least one, miRNA-140, regulates palatogenesis [53]. The UTRs and introns collectively constitute approximately 35% of the human genome [54]. Mutation of the UTRs bordering the coding sequence of an mRNA can affect the binding of miRNAs and RNA-binding proteins, contributing to the development of congenital diseases [52]. Introns are removed during pre-mRNA splicing, and abnormalities in this process can result in inherited diseases [55]. CREs include enhancers, promoters, silencers, and insulators. The bulk of this review will focus on how variation within CREs, identified in GWAS and high-throughput sequencing studies, can alter their interaction with transcription factors (TFs), thereby modulating the transcription of NSOFC-related genes [16, 45]. From this burgeoning field, we selected 87 articles related to NSOFC-associated noncoding variants for this review; our efforts were certainly incomplete, and additional studies are mentioned in the Supporting Information. Our literature screening process is illustrated in Figure 2.

Figure 1.

Impact of noncoding variants within cisregulatory elements (CREs) on gene regulation and disease pathogenesis. (a) Schematic representation of major regulatory elements in the genome, including promoters, enhancers, silencers/repressors, insulators, and noncoding RNA genes. These regions recruit TFs and cofactors, undergo epigenetic regulation (e.g., DNA methylation and histone modification), and coordinate gene expression [45]. (b) Effect of miRNA variants on posttranscriptional regulation. Canonical binding of a miRNA to the 3′UTR of a target mRNA leads to mRNA degradation (top panel). Variants within the 3′UTR (middle panel) or the miRNA seed region (bottom panel) can disrupt miRNA–mRNA interaction, thereby reducing repression efficiency and increasing mRNA expression [52]. (c) Effects of noncoding variants on TF binding. SNPs within CREs can disrupt existing TF binding sites, altering (elevating or reducing) mRNA expression (upper panel), or create novel TF binding sites, leading to altered regulation (lower panel) [45]. (d) Mechanisms of noncoding RNA–mediated regulation. lncRNAs regulate gene expression by blocking or sponging miRNAs, or by stabilizing target mRNAs (top panel). miRNAs are processed into duplexes and incorporated into the RNA-induced silencing complex (RISC) with Argonaute proteins to mediate translational repression or mRNA degradation (bottom panel) [52]. The figure was generated and refined in Adobe Illustrator. TF, transcription factor; 3′UTR, 3′ untranslated region; SNP, single nucleotide polymorphism; lncRNA, long noncoding RNA; miRNA, microRNA; RISC, RNA-induced silencing complex. Red star indicates the location of an SNP.

Figure 2.

Schematic flowchart illustrating criteria for study inclusion. Flow diagram illustrating the systematic literature search strategy, beginning with a broad MEDLINE search via PubMed to identify all potentially relevant articles. Text mining of article titles, abstracts, and metadata was used to narrow down the initial results, leading to the exclusion of 84 articles. A total of 107 articles passed the initial screening and were manually reviewed for eligibility. The final systematic review included 89 articles, which were classified into four major categories: GWAS and meta-analyses, next-generation sequencing (including WGS and targeted sequencing), variant prioritization, and functional validation. Reasons for exclusion at each stage are shown on the right.

2.1. GWAS on NSOFC

Since the first GWAS of NSCL/P was conducted by Birnbaum et al. in 2009 [5], there have been 32 NSOFC-associated GWAS and genome-wide meta-analyses; together, these studies identified 2068 lead SNPs achieving genome-wide significance for association with NSOFC, proximal to 192 genes in the genome (Table S1, Figure 3) [5–36]. Associations were considered genome-wide significant if they reached p ≤ 5 × 10⁻⁸, the commonly accepted threshold based on Bonferroni correction for multiple testing [57]. Genomic annotation with the Ensembl Variant Effect Predictor [56] indicates that approximately 7% of these SNPs are located in coding sequence (Figure 3a), a fraction similar to that of SNPs associated with cardiovascular disease [58]. The remaining NSOFC-associated SNPs are distributed among the different categories of noncoding regions: introns (48%), ncRNAs (28%), flanking regions (upstream and downstream, 12%), intergenic regions (3%), UTRs (1%), and CREs (~1%) (Figure 3a). Interestingly, among SNPs in CREs, about 88% are in CREs active in specific tissues, as opposed to CREs active in all cell types [59].

Figure 3.

Summary of lead SNPs identified by GWAS. (a) Genomic annotation of significant variants associated with NSOFC identified by GWAS using the Ensembl Variant Effect Predictor [56]. (b) Number of lead SNPs identified by GWAS and meta-analyses in NSCL/P, NSCLO, and NSCPO, respectively. (c) Number of genes (192 in total) proximal to the lead SNPs across chromosomes. The figure was initially generated using the Ensembl Variant Effect Predictor and GraphPad Prism (based on data from Table S1) and further refined in Adobe Illustrator.

It is important to recognize that NSOFC comprises several phenotypes with overlapping but distinct genetic underpinnings. While early GWAS predominantly targeted NSCL/P, subsequent research has progressively examined other phenotypes. In 2016, Leslie et al. pioneered a GWAS of NSCPO and pinpointed a missense variant in GRHL3 associated with NSCPO but not with NSCL/P, highlighting the genetic heterogeneity among the subtypes of NSOFC [11, 21]. Across all GWAS published to date, phenotypic stratification reveals differences in the numbers of lead SNPs among the three major subtypes, with 787, 913, and 843 lead SNPs reported for NSCL/P, NSCLO, and NSCPO, respectively (Figure 3b). Candidate genes for the three subtypes of NSOFC are summarized in Figure 3c.

In a GWAS, whether variant allele frequencies are evaluated by arrays or by sequencing [60, 61], often over 1 million tests of association between genetic variants and a phenotype are conducted, so the results must be corrected for multiple testing to exclude false positive variants. The Bonferroni correction is commonly used but is probably overly conservative because it treats each variant as an independent test when in fact many are in LD. Because this stringent correction will discard some true associations, variants that fall just short of genome-wide significance (5 × 10⁻⁸ < p < 1 × 10⁻⁶) are often considered potentially causative; we list such variants from multiple studies in Table S2 [5–16, 18–27, 30–32, 35, 36].
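The arithmetic behind the threshold can be made explicit with a minimal sketch. It assumes a family-wise error rate of 0.05 and exactly 1 million independent tests, which together give the conventional 5 × 10⁻⁸ cutoff; the SNP names and p values below are invented for illustration and do not come from any of the cited studies.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: alpha = 0.05 spread over 1,000,000 tests gives about 5e-08.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)
# Upper bound of the near-significant band described in the text.
SUGGESTIVE = 1e-6


def classify(p_value: float) -> str:
    """Place a p value in one of the three bands discussed above."""
    if p_value <= GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# Hypothetical p values, for illustration only.
example_p_values = {"snp_a": 3.2e-9, "snp_b": 4.0e-7, "snp_c": 2.0e-3}
labels = {snp: classify(p) for snp, p in example_p_values.items()}
```

Because many variants are in LD, the effective number of independent tests is smaller than the number of variants tested, which is why the correction is regarded as conservative.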

2.2. The Next-Generation Sequencing Studies on NSOFC

Because GWAS score only common variants, sequencing studies have been used to find rare and de novo variants contributing to the heritability of NSOFC. To date, there have been at least five whole-genome sequencing (WGS) studies [62–66] and nine targeted sequencing studies [67–75] focusing on NSOFC (summarized in Tables S3 and S4, respectively). Some functional variants within noncoding regions were found by targeted sequencing. For example, Leslie et al. found new variants through targeted sequencing of 13 GWAS-selected regions and, through functional analysis, confirmed the impact of noncoding variants such as rs227727, which disrupts enhancer activity, and a noncoding de novo mutation that affects FGFR2 enhancer activity [72]. Li et al. conducted targeted sequencing of an interval harboring IRF6, identifying rs12403599, an SNP within the IRF6 promoter, as a risk factor for NSCL/P [73]. The first large-scale WGS of OFCs in parent-offspring trios was carried out by Mukhopadhyay et al. in 2020 and identified a novel locus on Chromosome 21 as a suspected risk element for OFCs in Colombians [64]. Yu et al. identified a rare variant in PDGFRA (c.C2740T; p.R914W) via WGS and confirmed its biological function using zebrafish mutants [66]. Importantly, WGS of any individual will uncover thousands of variants relative to the reference genome [76]. Identifying causal variants among them is a challenge, and such efforts are guided by the list of known OFC-associated genes and by an understanding of the gene regulatory networks governing the development of relevant orofacial tissues (neural crest and oral epithelium).

3. Prioritization of Noncoding Variants of NSOFC

With the advancement of GWAS and next-generation sequencing (NGS), numerous genetic variants have been associated with NSOFC. However, most of these variants reside in noncoding regions, making it challenging to identify those that are truly causal and functionally relevant. Prioritization strategies have therefore become indispensable for refining large sets of associated variants into biologically meaningful candidates, reducing the complexity and cost of functional validation. To date, three major types of prioritization strategies have been widely employed in NSOFC studies: statistical fine-mapping, epigenetic fine-mapping, and machine learning–based models. Each offers distinct mechanisms for highlighting variants with regulatory or pathogenic potential (comparative features summarized in Table 1). The specific applications of prioritization strategies in NSOFC-associated variant studies are detailed in Table S5 [8, 9, 11–14, 16, 18, 19, 21–25, 28, 30, 36, 44, 67–69, 72–74, 77–83, 88–92, 94–97, 102–104].

Table 1.

Comparison of prioritization strategies for NSOFC-associated variants.

| Category | Method | Description | Advantages | Limitations | Ref | Application |
|---|---|---|---|---|---|---|
| Statistical fine-mapping | LD analysis | Identifies variants in strong LD with lead SNPs | Easy to implement; visualizes LD structure | Ignores SNP joint effects; affected by population history; arbitrary thresholds | [46] | [8, 9, 11–14, 16, 22–25, 28, 30, 36, 44, 69, 72–74, 77–83] |
| Statistical fine-mapping | Haplotype block analysis | Groups variants by LD-defined haplotype blocks | Reflects inheritance patterns | Block boundaries are arbitrary; ignores joint SNP effects | [84] | |
| Statistical fine-mapping | Conditional analysis | Detects secondary signals conditioned on lead SNPs | Identifies independent signals | High false positive rate with many tests; unstable with many SNPs | [85] | |
| Statistical fine-mapping | Bayesian refinement | Estimates posterior probabilities for causal SNPs | Less biased than threshold-based methods; quantifies uncertainty; detects weak signals | Computationally demanding; sensitive to prior choice; reduced PIP in high LD | [86, 87] | [13, 18, 24] |
| Statistical fine-mapping | eQTL integration | Links SNPs to gene expression to infer function | Provides gene-level interpretation; useful for noncoding variant prioritization | Requires relevant eQTL data; tissue-specific limitations | [47] | [19, 21, 28, 36, 67, 69, 88–92] |
| Epigenetic fine-mapping | ATAC-seq | Maps accessible chromatin regions | High throughput; sensitive to open regulatory elements | Requires high-quality samples; limited accessibility in clinical settings | [93] | [14, 16, 19, 24, 77, 81, 83, 91, 94–97] |
| Epigenetic fine-mapping | ChIP-seq | Maps histone marks or TF binding sites | Identifies active enhancers and TF targets | Cell- and condition-specific limitations; dependent on antibody quality | [98] | |
| Epigenetic fine-mapping | Hi-C/DLO-Hi-C | Captures 3D chromatin architecture | Maps long-range TADs and loops; reveals enhancer–promoter interaction | Low resolution; restricted by enzyme cutting; complex to analyze and interpret | [99] | |
| Epigenetic fine-mapping | DNA methylation profiling | Assesses CpG methylation levels to infer epigenetic regulation | Reflects stable gene regulation; captures repressive marks | Cell- and stage-specific; unclear causal direction | [100] | |
| Machine learning models | Logistic regression | Linear model leveraging genomic or sequence features | Easy to implement; interpretable; suitable for moderate-sized datasets | Cannot capture complex non-linear relationships; poor scalability to large datasets | [101] | [81, 97, 102–104] |
| Machine learning models | SVM/gkmSVM | Classifies data by maximizing margin in feature space (using k-mer sequence patterns) | High predictive accuracy; effective in enhancer modeling | Requires well-curated training data; black box nature hinders interpretability | [105] | |
| Machine learning models | Random forest | Tree-based ensemble learning method for classification | Handles nonlinear patterns; robust to overfitting | Less transparent; sensitive to imbalanced data | [106] | |
| Machine learning models | Neural networks | Deep learning model capturing complex data patterns | Captures nonlinear relationships; tolerant to noise; high modeling power | Requires large training datasets; computationally intensive | [107] | |
| Machine learning models | CADD | Integrates multiple annotations to score variant pathogenicity | Broadly adopted; comprehensive variant assessment; trained on unbiased large datasets | Not disease- or tissue-specific | [108] | [9, 67, 68, 72, 74, 96] |

Abbreviations: Application, studies that applied the corresponding method to prioritize NSOFC-associated variants; ATAC-seq, assay for transposase-accessible chromatin using sequencing; CADD, Combined Annotation–Dependent Depletion; ChIP-seq, chromatin immunoprecipitation sequencing; DLO-Hi-C, digestion-ligation-only Hi-C; eQTL, expression quantitative trait locus; gkmSVM, gapped k-mer support vector machine; Hi-C, high-throughput chromosome conformation capture; LD, linkage disequilibrium; PIP, posterior inclusion probability; Ref, reference; SNP, single nucleotide polymorphism; SVM, support vector machine; TAD, topologically associating domain.

3.1. Statistical Fine-Mapping for Variant Prioritization

Fine-mapping uses a variety of statistical methods to examine LD structures and haplotype blocks as a way to narrow down the list of potential causal variants [109, 110]. Heuristic fine-mapping methodologies focus on assessing the LD structure surrounding a lead SNP [46], combine the lead SNP with SNPs within the same haplotype block [84], or apply conditional analysis to uncover independent signals within the region [85]. However, these approaches overlook the collective impact of SNPs and lack objective criteria for screening causal variants. Penalized regression models are aimed at enhancing stability: they reduce the SNP data to a smaller subset closely associated with the trait in question by simultaneously estimating the effects of SNPs and shrinking coefficients towards zero [111]. Bayesian methods assume the presence of a single causal variant within a specific locus of interest and integrate prior probabilities concerning genetic architecture with observed GWAS data; these approaches yield posterior inclusion probabilities for each SNP [86, 112]. Ludwig et al. used a Bayesian refinement approach for each of the risk loci for NSCL/P (2p21, 8q24, 14q22, 15q24, and 19p13) identified from GWAS of the European population and revealed potential causal variants at each locus [24]. Butali et al. used a Bayesian method to fine-map the 8q24 region for NSCL/P, and by comparing samples among African, European, and Asian ancestries, they nominated a potential causal variant within this region [13]. Bayesian methods, unlike traditional methods relying solely on p values, enable direct comparison of posterior inclusion probabilities for SNPs, thereby enhancing the efficiency of fine-mapping. They also effectively control the influence of SNPs with larger effects by considering the collective effects of SNPs, consequently enhancing sensitivity in detecting SNPs with smaller effects. Compared to methods based solely on SNP correlation with the lead SNP, Bayesian methods tend to prioritize fewer SNPs that are more likely to be causal [87]. However, as they rely heavily on prior probabilities, improperly chosen priors may introduce bias into estimations and fine-mapping conclusions.
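To make the single-causal-variant Bayesian refinement concrete, the sketch below computes Wakefield-style approximate Bayes factors from GWAS effect sizes and standard errors, converts them to posterior inclusion probabilities (PIPs), and builds a 95% credible set. It is an illustration under stated assumptions, not the procedure used in the cited studies: every SNP receives the same prior probability of being causal, the prior standard deviation of the effect size (`prior_sd = 0.2`) is an arbitrary choice, and the summary statistics in `locus` are invented.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (association vs. null) for one SNP.

    Assumes a normal prior N(0, prior_sd**2) on the true effect size.
    """
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect
    r = w / (v + w)        # shrinkage factor
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def posterior_inclusion_probabilities(stats):
    """stats maps SNP id -> (beta, se); returns SNP id -> PIP.

    Assumes exactly one causal variant and equal priors across SNPs.
    """
    log_bfs = {snp: log_abf(b, s) for snp, (b, s) in stats.items()}
    top = max(log_bfs.values())  # subtract the maximum for numerical stability
    weights = {snp: math.exp(lb - top) for snp, lb in log_bfs.items()}
    total = sum(weights.values())
    return {snp: wt / total for snp, wt in weights.items()}


def credible_set(pips, coverage: float = 0.95):
    """Smallest set of top-ranked SNPs whose PIPs sum to at least `coverage`."""
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, pip in ranked:
        chosen.append(snp)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical summary statistics (beta, standard error) for one locus.
locus = {
    "snp_1": (0.30, 0.05),
    "snp_2": (0.28, 0.05),
    "snp_3": (0.10, 0.05),
    "snp_4": (0.02, 0.05),
}
pips = posterior_inclusion_probabilities(locus)
cs95 = credible_set(pips)
```

Changing `prior_sd` shifts the PIPs, which illustrates the sensitivity to prior choice noted above; methods that allow several causal variants per locus require the LD matrix and are not covered by this sketch.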

eQTL analyses, which reveal associations between DNA variant alleles and gene expression levels, have been used to identify disease susceptibility–associated genes [47, 89]. An eQTL analysis of orbicularis oris muscle mesenchymal stem cells identified the SNP rs1063588 and the gene MRPL53 associated with the risk of NSCL/P in a Brazilian sample [88]. Li et al. also produced an eQTL dataset from 40 lip tissues obtained from Chinese NSCL/P patients and further combined this eQTL dataset with risk SNPs from published GWAS data, revealing 243 SNPs that are associated with expression levels of 18 genes and with risk for NSCL/P [90]. The success of this approach is currently limited by a scarcity of eQTL datasets from tissues relevant to the pathogenesis of NSOFC.
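At its simplest, an eQTL test is a regression of expression on allele dosage. The sketch below fits an ordinary least-squares line and reports the slope and its t statistic. It is a minimal illustration with invented data: it assumes an additive genotype coding (0, 1, or 2 copies of the alternate allele) and omits the covariates that a real analysis would include, such as ancestry components, sex, and batch.

```python
import math


def eqtl_regression(dosages, expression):
    """Regress expression on allele dosage; return (slope, intercept, t_stat)."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation at this SNP")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, intercept, t_stat


# Hypothetical data: six samples, normalized expression of one gene.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 1.6, 2.1, 0.9, 1.4, 1.9]
slope, intercept, t_stat = eqtl_regression(dosages, expression)
```

A positive slope means the alternate allele is associated with higher expression. The t statistic would be compared against a t distribution with n − 2 degrees of freedom, with multiple-testing correction across all SNP–gene pairs tested.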

3.2. Epigenetic Fine-Mapping for Variant Prioritization

Recent advances in analyses of chromatin accessibility and architecture permit the prioritization of DNA variants based on their presence in DNA with the epigenetic features of regulatory elements [98]. For instance, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals open chromatin regions that are more accessible to TFs; variants within these regions are more likely to regulate gene expression [93]. Moreover, chromatin immunoprecipitation sequencing (ChIP-seq) enables the mapping of regulatory elements by selectively capturing DNA fragments harboring specific posttranslational histone modifications or TF binding sites [113]. In a post-GWAS analysis, researchers used chromatin immunoprecipitation (ChIP) followed by qPCR to demonstrate that the NSOFC risk-associated allele of rs2275035 altered the binding affinity of the TF KLF4 and the enhancer activity of the element containing the SNP [77]. ATAC-seq and ChIP-seq can be employed together to locate candidate regulatory elements. For instance, a set of enhancer candidates specific to zebrafish epithelial cells was identified by integrating ATAC-seq and H3K27Ac ChIP-seq from GFP-expressing periderm cells sorted from dissociated embryos [81]. High-throughput chromosome conformation capture (Hi-C) allows for efficient elucidation of chromatin interactions, which reveal topologically associating domains (TADs) and enhancer–promoter interactions [99]. Xiao et al. used digestion-ligation-only Hi-C (DLO-Hi-C) to identify 254 potentially functional SNPs associated with NSCPO within active enhancers of oral epithelial cells, which are physically associated with 1718 promoters in the human oral epithelial cell line [97].
A recent study showed that noncoding variants within regulatory elements can indirectly affect transcription through altering DNA methylation or modulating the three-dimensional (3D) conformation or accessibility of chromatin, and these epigenetic effects may vary in different cell types or environments, adding an additional challenge to identifying causal variants [100]. In summary, methods of mapping epigenetic features allow researchers to prioritize and interpret genetic variants based on their presence in elements likely to have a regulatory function.
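In practice, the first step of epigenetic prioritization is often a simple interval lookup: does a variant fall inside a called peak? The sketch below shows one way to do this with a sorted index and binary search. The peak coordinates and variant names are invented, coordinates are assumed to be 0-based and half-open (as in BED files), and peaks on the same chromosome are assumed not to overlap one another.

```python
from bisect import bisect_right


def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort by start."""
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_peak(index, chrom, pos):
    """Return True if position `pos` lies inside a peak on `chrom`.

    Assumes 0-based half-open intervals that do not overlap each other.
    """
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


# Hypothetical open-chromatin peaks and variant positions.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]
variants = {
    "var_a": ("chr1", 150),
    "var_b": ("chr1", 200),   # just past the end of the first peak
    "var_c": ("chr2", 79),
    "var_d": ("chr3", 10),    # chromosome with no peaks
}
index = build_index(peaks)
prioritized = {name: in_peak(index, chrom, pos)
               for name, (chrom, pos) in variants.items()}
```

Because such annotations are cell-type specific, the same variant can be inside a peak in one tissue and outside it in another, so the choice of peak set matters as much as the lookup itself.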

3.3. Machine Learning Models for Assessing Noncoding Variants

Machine learning techniques have focused on identifying shared sequence features of regulatory elements, in hopes of prioritizing disease-associated variants without the need for laborious biochemical approaches. Although researchers can assess epigenetic features of chromatin and determine whether disease-associated variants lie within active regulatory elements, the methods to evaluate such features are complex, and often, the relevant tissue is difficult to access (e.g., embryonic palate tissue). Machine learning models for evaluating shared sequence features of active regulatory elements range from relatively basic logistic regression and support vector machine (SVM) models to more complex models like random forests and neural networks [101, 106, 107]. Recently, Zhang et al. used 43 validated SNPs as training data, built seven models based on different machine learning methods, and used the models to measure the accuracy of SNP risk assessment in Chinese Han populations. They found that the logistic regression model had the best predictive performance, followed by SVM [102]. The gapped k-mer support vector machine (gkmSVM), based on the enrichment weights assigned to all 10-mers within a training dataset, has been shown to predict approximately 90% of enhancer sequences revealed by the ENCODE project [105, 114]. The gkmSVM evaluation model can efficiently assess the effect of risk SNPs on tissue-specific enhancer activity, prioritizing causal variants for further biological validation experiments. Liu et al. trained three gkmSVM classifiers on sets of putative enhancers, recognized by ATAC-seq and H3K27Ac signals, from (1) zebrafish periderm cells, (2) embryonic mouse palate epithelium, and (3) a human oral epithelium cell line, respectively, to predict the effects of 14 OFC-associated SNPs near KRT18 on enhancer activity; interestingly, all three classifiers predicted one SNP (rs2070875) to have the strongest impact on enhancer activity [81].
Although gkmSVM improves prioritization of noncoding variants, the accuracy of the prediction is highly dependent on the training sets, which may include false positives. Additionally, the “black box” quality of some machine learning algorithms hinders the interpretability of their results [115].
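One way that k-mer weights can be turned into a variant effect score is to sum the weights of all k-mers that overlap the variant, once for the reference allele and once for the alternate allele, and take the difference. The sketch below follows that idea, in the spirit of the deltaSVM score. It is an assumption about how such scoring can work rather than a description of the pipeline in [81]: the weights are invented, k is set to 3 for readability instead of 10, and k-mers absent from the weight table are treated as contributing zero.

```python
def kmers_overlapping(seq: str, pos: int, k: int):
    """Return every k-mer of `seq` that covers index `pos` (0-based)."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def delta_score(ref_seq: str, pos: int, alt_base: str, weights: dict, k: int) -> float:
    """Summed k-mer weight of the alternate allele minus the reference allele."""
    if not 0 <= pos < len(ref_seq):
        raise ValueError("variant position outside the sequence")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(weights.get(kmer, 0.0)
                    for kmer in kmers_overlapping(ref_seq, pos, k))
    alt_total = sum(weights.get(kmer, 0.0)
                    for kmer in kmers_overlapping(alt_seq, pos, k))
    return alt_total - ref_total


# Hypothetical 3-mer weights; a trained model would supply these.
toy_weights = {"GAT": 1.5, "ATA": 0.8, "TAA": 0.2, "GCT": -0.4, "CTA": -0.1}

# Reference sequence with an A>C substitution at index 2.
score = delta_score("AGATAAG", pos=2, alt_base="C", weights=toy_weights, k=3)
```

A negative score suggests that the alternate allele removes sequence features associated with enhancer activity in the training set; its reliability is bounded by the quality of that training set, as noted above.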

Combined Annotation–Dependent Depletion (CADD) is a widely used machine learning model for prioritizing causal variants by predicting variants' impacts. Compared to other approaches that are trained on single annotations or on a limited set of genomic variants with known pathogenic versus benign status, CADD is trained on less biased and much larger datasets, integrating over 60 genomic features into a single measure (i.e., the C score) for each variant [108, 116]. Because C scores reflect pathogenicity, allelic diversity, functional annotations, and known risk variants, their ability to prioritize functional and pathogenic variants exceeds that of other methods [116]. The application of CADD in the study of variants associated with NSOFC is a promising area for future investigation.

4. Functional Validation of Noncoding Variants

Acknowledging evidence that noncoding variants may in some cases contribute to the pathogenesis of NSOFC by altering ncRNAs [43, 44], we here focus on variants which disrupt enhancers and promoters to modulate gene expression, known as regulatory variants. Using text mining, we extracted and categorized methods of functional validation commonly employed in NSOFC studies into seven major classes: (1) protein binding, including ChIP, electrophoretic mobility shift assays (EMSAs), and flanking restriction enhanced pulldown (FREP); (2) in vitro reporter assays, including luciferase assays and massively parallel reporter assays (MPRAs); (3) in vivo reporter assays, including transient transgenic enhancer assays in zebrafish and lacZ reporter assays in mice; (4) in vitro gene regulation and perturbation assays, including clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9)–mediated enhancer or SNP editing, allele-specific genome editing, RNA interference, siRNA-mediated knockdown, gene overexpression, and cellular phenotype assays such as migration, proliferation, and apoptosis analyses; (5) in vivo or animal models; (6) spatial expression analyses of candidate genes in craniofacial tissues, including in situ hybridization, immunohistochemistry, and immunofluorescence staining; and (7) chromatin interaction profiling, including chromosome conformation capture (e.g., 3C, 3C-qPCR, and Hi-C). We further examined how many studies employed multiple validation strategies and found that 40% of all articles utilized three or more experimental approaches (Figure 4). These findings underscore the need for multifaceted validation frameworks to establish the functional relevance of noncoding variants.
The most frequently adopted validation strategies are elaborated below, while a systematic summary of all approaches employed in NSOFC-related studies is provided in Table S6 [7, 11–14, 16, 19, 21, 28, 34, 36, 44, 66, 67, 72, 73, 77, 81–83, 91, 94–97, 117–139].

Figure 4.

Experimental strategies for functional validation of NSOFC-associated genetic variants. Diverse experimental strategies are used to validate noncoding variants. We surveyed 48 studies and used text mining of abstracts and metadata to classify the experimental approaches used to functionally validate noncoding variants. Seven validation categories were defined: (1) protein binding, (2) in vitro reporter assays, (3) in vivo reporter assays, (4) gene regulation and perturbation, (5) in vivo or animal models, (6) spatial expression analyses of candidate genes, and (7) chromatin interaction. The UpSet plot summarizes the number of studies employing different combinations of these approaches. Each vertical bar represents the number of studies (intersection size) that used a specific combination of validation types, as indicated by the filled dots below. The color of the bars corresponds to the number of validation categories used in each combination: pink, 5; orange, 4; green, 3; blue, 2; and purple, 1. Horizontal bars on the right show the total number of studies employing each individual validation approach (set size). The figure was generated in R using the UpSetR package (based on data from Table S6) and further refined in Adobe Illustrator.

4.1. Validation of the Regulatory Variants Within Regulatory Element Regions

4.1.1. Protein Binding Assays

DNA variants situated in CREs have the potential to alter TF binding sites, either decreasing or increasing TF affinity, and thereby influencing gene expression [140]. EMSA and ChIP-seq (or ChIP-qPCR) have been used to investigate the effects of variants on the binding of transcription-associated proteins [11, 16, 19, 36, 72, 77, 83, 97, 122, 125, 126, 130, 132, 135]. For example, Rahimov et al. used EMSA to show that an NSCLO-associated SNP (rs642961, G > A) within an IRF6 enhancer region reduces binding of the TF TFAP2A, supporting the possibility that the SNP directly affects disease risk [122]. Li et al. recently introduced a refinement to the conventional EMSA and FREP, which utilizes restriction enzyme cleavage sites adjacent to the targeted DNA sequence, thereby mitigating nonspecific binding to DNA probes [141]. A limitation of EMSA however is that protein binding is tested on naked DNA in vitro instead of in the context of chromatin. ChIP circumvents this limitation but requires antibodies specific to proteins of interest, which are sometimes unavailable. Combining DNA pulldown assays and mass spectrometry avoids the need for antibodies [137]. Single nucleotide polymorphism sequencing (SNP-seq) detects SNPs that bind regulatory proteins using Type IIS restriction enzymes which cut DNA at a fixed distance (31 bp) to one side of the binding site, regardless of the sequence there. A library of constructs is synthesized of elements containing the SNP-of-interest flanked by a Type IIS restriction enzyme site 31 bp away. The constructs are incubated in nuclear extracts of a disease-relevant cell type. If the SNP is bound by a regulatory protein, the construction will be protected from digestion. DNA sequencing and mass spectrometry identify the protected constructs and the proteins that bind them [142]. 
Notably, certain variants proximal to the core binding sites of TFs may not disrupt the motif itself yet can still affect TF binding affinity. Such variants are unlikely to be predicted to disrupt binding, but protein binding assays can still reveal their effects [143, 144]. Additionally, in vitro assays such as EMSA and FREP do not capture the complex biological environment, including the dynamic interactions and signaling networks among cells and tissues. A persistent challenge in all protein binding assays is the availability of cell line models of the relevant embryonic cell type, usually embryonic oral epithelium or oral mesenchyme.
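
The point that motif-flanking variants escape sequence-based prediction can be illustrated with a toy position weight matrix (PWM) scan. This is a minimal sketch under stated assumptions: the 4-bp matrix, the uniform background, and the three sequences are hypothetical and do not represent any real TF or any NSOFC-associated SNP, and PWM scanning is used here only as one common example of a motif-based predictor.

```python
import math

# Hypothetical base probabilities for a 4-bp core motif (illustrative values only).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window):
    """Log-odds score of one motif-length window against the background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Best log-odds score over all motif-length windows in the sequence."""
    k = len(PFM)
    return max(window_score(seq[i:i + k]) for i in range(len(seq) - k + 1))


ref = "GGACGTCC"        # contains the consensus core ACGT
core_alt = "GGATGTCC"   # variant inside the core motif
flank_alt = "GAACGTCC"  # variant in the flank, core motif untouched

for name, seq in [("ref", ref), ("core_alt", core_alt), ("flank_alt", flank_alt)]:
    print(name, round(best_score(seq), 2))
```

In this toy case the core variant lowers the best motif score, whereas the flanking variant leaves it unchanged, so a score-difference filter would pass over the flanking variant even if it altered binding in an EMSA or pulldown experiment.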

4.1.2. Reporter Assays

A widely used approach to testing whether a disease-associated polymorphism is a regulatory variant is the reporter assay conducted in a disease-relevant cell type or in a model organism (such as mice or zebrafish) [145–147]. For example, researchers have found evidence that NSCL/P-associated variants are indeed regulatory through dual-luciferase assays in cell lines [11, 14, 36, 72, 77, 81, 95, 97, 122, 126, 129, 130, 135, 137]. High-throughput methods, including MPRA and self-transcribing active regulatory region sequencing (STARR-seq), permit evaluation of thousands of variants simultaneously [83, 146, 148]. Kumari et al. conducted an MPRA to screen OFC-associated loci by cloning candidate regulatory sequences harboring SNPs into a barcoded reporter library and measuring their allele-specific activity in a fetal oral epithelial cell line (GMSM-K) [83]. MPRA enables the simultaneous, quantitative assessment of thousands of regulatory variants in a high-throughput reporter system, but its current application is limited by labor-intensive library construction, high sequencing cost, and reliance on episomal in vitro models. Recently, van Arensbergen et al. developed a method that takes advantage of the fact that both enhancers and promoters generate transcripts. They cloned fragments of the entire genome, a few hundred base pairs each, together with barcodes into a promoterless vector. Using this method, which they called survey of regulatory elements (SuRE), the authors screened 5.9 million SNPs and identified over 30,000 SNPs affecting the activity of putative CREs in human K562 and HepG2 cell lines [149]. A limitation shared by all high-throughput reporter assays is the requirement for a disease-relevant cell line that is amenable to transfection.
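
To make the notion of allele-specific activity in a barcoded reporter library concrete, the sketch below computes a per-allele activity as the mean log2 ratio of RNA to DNA barcode counts and reports the difference between alleles. All counts are hypothetical, the pseudocount is an assumption, and the calculation is a simplification: it is not the analysis pipeline of any cited study, and published MPRA analyses typically use replicate-aware statistical models.

```python
import math
from statistics import mean

# Hypothetical barcode counts (NOT real data). Each tuple is (DNA count, RNA count)
# for one barcode linked to the given allele of a single candidate element.
counts = {
    "nonrisk": [(100, 220), (80, 160), (120, 250)],
    "risk": [(110, 120), (90, 95), (100, 105)],
}


def log2_activity(barcodes, pseudocount=1.0):
    """Mean log2(RNA/DNA) across barcodes; the pseudocount guards against zeros."""
    return mean(
        math.log2((rna + pseudocount) / (dna + pseudocount)) for dna, rna in barcodes
    )


def allelic_skew(allele_counts):
    """Risk-minus-nonrisk difference in log2 activity (negative: risk allele weaker)."""
    return log2_activity(allele_counts["risk"]) - log2_activity(allele_counts["nonrisk"])


print(f"nonrisk activity: {log2_activity(counts['nonrisk']):.2f}")
print(f"risk activity:    {log2_activity(counts['risk']):.2f}")
print(f"allelic skew:     {allelic_skew(counts):.2f}")
```

Normalizing RNA counts to DNA counts per barcode corrects for unequal representation of constructs in the library, which is why both are sequenced.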

While in vitro reporter assays are useful for indicating whether a disease-associated SNP is a regulatory variant, in vivo reporter assays can reveal the disease-relevant cell types in which the enhancer is active [21, 81, 122, 129]. For example, Rahimov et al. tested whether rs642961, in an evolutionarily conserved element 9.7 kb upstream of IRF6, was a regulatory variant by conducting a reporter assay in cultured foreskin keratinocytes. They found that the SNP is in an enhancer active in those cells and that there was a modest difference in activity between the risk and nonrisk alleles, although this did not reach statistical significance. In the same study, they generated transgenic mice and showed that an 876 bp element harboring rs642961 has enhancer activity in palate epithelium, which supports this SNP as a candidate functional variant [122]. However, rs642961 is in LD with another OFC-associated SNP with strong support for being functional [83]; it is possible that both SNPs contribute to the risk associated with this haplotype. Similarly, with luciferase reporter assays in vitro, Liu et al. found two NSCL/P-associated SNPs (rs11170342 and rs2070875), both near the KRT8 and KRT18 genes, to be located in enhancers active in an oral epithelium cell line, and found that the risk allele of rs2070875 significantly diminished enhancer activity in these cells. In mouse reporter assays, one but not the other of the elements harboring these SNPs had enhancer activity in periderm [81]. Fisher et al. devised a transgenic reporter vector incorporating the Tol2 transposon, which is injected into zebrafish embryos to reveal the regulatory potential of a noncoding sequence of interest [145]. This strategy was used to demonstrate tissue-specific enhancer activity of DNA elements containing SNPs relevant to NSCL/P [72, 77, 81, 94, 95, 97, 117, 129]. For instance, Liu et al. assessed potential allele-dependent effects of three such SNPs (rs2275035, rs4147828, and rs560426) using in vitro luciferase assays and then deployed transgenic zebrafish to test the tissue specificity of enhancers containing these SNPs. Reporter expression in F1 transgenic embryos revealed that chromatin elements containing rs2275035 or rs4147828 both have enhancer activity in craniofacial mesenchyme [77]. Of note, there has yet to be a report showing allele-dependent effects in an in vivo reporter assay, although this was attempted in transgenic mice using a safe harbor integration site [16]. Stable transgenic embryos may be necessary to reveal the modest effects of common variants on reporter expression in vivo.

4.2. Confirmation of the Targeted Genes and Validation of Their Biological Function

A persistent challenge in the post-GWAS era is to identify the gene that is regulated by a given functional SNP. The probable candidate genes are limited to those within the same TAD as the SNP [150]. TAD boundaries are recognized by high levels of CTCF binding [151], and the boundaries are similar across most cell types [152]. Ultimately, confirming the gene whose expression is affected by a noncoding SNP requires engineering the genotype of the SNP in a relevant cell line and measuring expression of the genes within the TAD, starting with genes implicated in craniofacial development. Currently, to engineer precise changes to the genome, most authors use CRISPR–Cas systems, which generate double-stranded breaks in DNA, together with a single-stranded oligonucleotide template [153, 154]. Liu et al. used this approach to engineer an oral epithelium cell line to be homozygous for either the risk or nonrisk allele of SNP rs4147828, which is located in an intron of ABCA4 [77]. Chromatin configuration analyses revealed that the risk allele disrupts the interaction between the enhancer containing this SNP and the promoter of the adjacent ARHGAP29 gene and reduces expression of ARHGAP29 but not of ABCA4, supporting ARHGAP29 as the targeted gene [77]. CRISPR/Cas9-mediated homology-directed repair was used to engineer, in human induced pluripotent stem cells, the genotype of an SNP associated with cleft palate only; the cells were subsequently differentiated into embryonic oral epithelium, and ChIP-qPCR and expression analyses showed that the SNP interferes with binding of IRF6 at an enhancer regulating expression of IRF6 [16]. In prime editing, an engineered guide RNA, capable of replacing targeted nucleotides up to 12 bases, hybridizes to the targeted DNA sequence; it both specifies the site and encodes the targeted base substitutions or deletions, which a catalytically impaired Cas9 endonuclease fused to a reverse transcriptase then installs without relying on donor DNA templates or double-stranded breaks [155]. This method addresses a limitation of base editing, which cannot readily install the eight types of transversion mutation, and thereby broadens the range of disease mechanisms that can be explored.
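
The first step described above, restricting candidate target genes to those sharing a TAD with the SNP, amounts to an interval lookup. The following sketch assumes half-open coordinates and uses hypothetical TAD and gene intervals; the names `TAD_1`, `GENE_A`, and so on are placeholders and not real annotations. It yields only a candidate list, which would still require the allele-engineering and expression experiments described in the text.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


# Hypothetical coordinates for illustration only; not real genome positions.
tads = [
    Interval("TAD_1", "chr1", 1_000_000, 1_800_000),
    Interval("TAD_2", "chr1", 1_800_000, 2_500_000),
]
genes = [
    Interval("GENE_A", "chr1", 1_100_000, 1_150_000),
    Interval("GENE_B", "chr1", 1_600_000, 1_700_000),
    Interval("GENE_C", "chr1", 1_900_000, 2_000_000),
]


def candidate_genes(snp_chrom, snp_pos, tads, genes):
    """Names of genes overlapping the TAD that contains the SNP."""
    tad = next(
        (t for t in tads if t.chrom == snp_chrom and t.start <= snp_pos < t.end),
        None,
    )
    if tad is None:
        return []  # SNP falls outside every annotated TAD
    return [
        g.name
        for g in genes
        if g.chrom == tad.chrom and g.start < tad.end and g.end > tad.start
    ]


print(candidate_genes("chr1", 1_650_000, tads, genes))
```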

5. Perspectives

Improvements in understanding the genetic underpinnings of NSOFC have come from the efforts of human geneticists searching for variants associated with risk for NSOFC and from those of developmental biologists striving to pinpoint the genes governing morphogenesis of the face. Advances in one realm have spurred advances in the other. For example, IRF6 was first identified as the gene associated with Van der Woude syndrome [156], and subsequent knockout studies in mice revealed its critical role in periderm function during palatogenesis [157, 158]. Fine-mapping and functional studies of a noncoding variant associated with OFC helped identify an enhancer region of IRF6 that is active in embryonic oral epithelium in the mouse [122]. Conversely, MSX1 was first recognized as being essential for palate fusion and tooth development in mice [159] and later discovered to be mutated in certain individuals with OFC or tooth agenesis [160]. Modern GWASs aim for ever greater scale to enhance the generalizability of findings. For instance, recent work on gout involving a GWAS across 2.6 million individuals, including 120,295 cases, uncovered new pathogenic pathways [161]. On the developmental biology side, efforts to identify regulatory variants are benefitting from the growing availability of multiomics data, including spatial multiomics data from embryonic tissues [162, 163], which allow for precise delineation of active CREs in relevant tissues. By combining these data with chromatin interaction maps, we can directly identify the regulatory targets of noncoding variants (illustrated in Figure 5). By analyzing variants within the subthreshold range, along with palatal epigenetic data and gene regulatory networks, we can create a refined “dictionary” of OFC risk loci that will serve as an invaluable reference following genetic association studies.

Figure 5.

Functional annotation for potential functional variants and subthreshold SNPs. By combining palate-specific multiomics data with chromatin accessibility maps and gene regulatory networks, potential functional variants and subthreshold SNPs can be systematically annotated and mapped to their putative regulatory targets [162, 163].
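
The annotation logic summarized in Figure 5 can be sketched as a two-step filter: assign each SNP to a significance tier, then ask whether it falls in a tissue-relevant CRE. The thresholds below follow those used for Tables S1 and S2 (p < 5e-08 for genome-wide significance; up to 1e-06 for suggestive association); the SNP identifiers, positions, p values, and CRE intervals are hypothetical, and the tier labels are our own.

```python
GENOME_WIDE = 5e-8  # genome-wide significance threshold (as in Table S1)
SUGGESTIVE = 1e-6   # upper bound of the suggestive range (as in Table S2)

# Hypothetical SNPs and palate CRE intervals (NOT real data); intervals are half-open.
snps = [
    {"id": "snp_1", "chrom": "chr1", "pos": 150, "p": 3e-9},
    {"id": "snp_2", "chrom": "chr1", "pos": 420, "p": 4e-7},
    {"id": "snp_3", "chrom": "chr1", "pos": 900, "p": 2e-7},
    {"id": "snp_4", "chrom": "chr1", "pos": 160, "p": 1e-3},
]
cres = [("chr1", 100, 200), ("chr1", 400, 500)]


def tier(p):
    """Classify a p value as significant, subthreshold, or other."""
    if p < GENOME_WIDE:
        return "significant"
    if p < SUGGESTIVE:
        return "subthreshold"
    return "other"


def in_cre(snp, cre_intervals):
    """True if the SNP position lies inside any CRE on the same chromosome."""
    return any(
        chrom == snp["chrom"] and start <= snp["pos"] < end
        for chrom, start, end in cre_intervals
    )


def annotate(snp_list, cre_intervals):
    return [
        {"id": s["id"], "tier": tier(s["p"]), "in_cre": in_cre(s, cre_intervals)}
        for s in snp_list
    ]


annotated = annotate(snps, cres)
prioritized = [a["id"] for a in annotated if a["tier"] != "other" and a["in_cre"]]
print(prioritized)
```

In this toy example a subthreshold SNP located in a CRE is retained alongside a genome-wide significant one, whereas a subthreshold SNP outside any CRE is not, which is the sense in which epigenomic data can rescue variants that fall short of genome-wide significance.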

Rapid advancements in artificial intelligence are set to revolutionize functional validation studies, particularly in the prioritization of candidate variants. Algorithms that integrate craniofacial epigenetic data, such as DeepFace [164], offer promising pathways to investigate the functions of OFC-associated variants and their potential regulatory mechanisms. With refined insights into the sequence features of CREs and other regulatory regions, powered by machine learning and deep learning models such as the gkmSVM-based DNA scoring system [165] and Malinois-based CRE effect prediction [166, 167], we can more precisely identify risk alleles that influence CRE activity or modulate target gene expression in a palatal cell-type–specific context; machine learning approaches to finding regulatory variants have been reviewed [168]. Recent advances have introduced tools like AlphaGenome, a deep learning model that interprets long genomic sequences to predict gene regulation and variant effects [169]. It has accurately predicted that certain noncoding mutations in leukemia indirectly activated nearby oncogenes. While powerful, AlphaGenome still struggles with predicting the effects of long-range regulatory variants and lacks training on cell-type–specific regulatory contexts. Future extensions incorporating craniofacial epigenomic datasets and cell-type–specific training regimes hold promise for adapting AlphaGenome to OFC-associated variant prediction.

Despite recent progress, several challenges remain. As most risk variants lie in noncoding regions, interpreting their function is nontrivial, given the intricate and cell-type–specific nature of gene regulation. The majority of experimental validation methods remain low-throughput and labor-intensive, and artificial intelligence approaches are still nascent; because they are trained on fixed datasets, they cannot yet accurately predict the functions of CREs across dynamic changes in cell state. Furthermore, while enhancer–promoter interactions have received the most attention, variants affecting ncRNAs, such as those disrupting miRNA binding sites, long noncoding RNA (lncRNA) function, or RNA secondary structures, also play essential roles in disease etiology but remain poorly characterized.

Finally, as most functional SNPs lie in noncoding regions, particularly in CREs, their effects are typically mediated through target gene modulation. Hence, mapping the relationships between regulatory variants and their target genes not only clarifies OFC pathogenesis but also serves as a framework to uncover pleiotropic effects. Tracing shared target genes across diseases may uncover candidate markers for comorbidities, offering new opportunities for cross-disease genetic screening.

In conclusion, identifying functional noncoding variants informs understanding of the molecular pathogenesis of OFCs. In turn, such an understanding will guide the discovery of biomarkers for diagnosis and targets for advanced therapy, ultimately improving clinical outcomes for affected individuals.

Acknowledgments

This work was supported by the National Natural Science Foundation of China to Huan Liu (grant 82322014) and the National Institutes of Health (R.A.C.; DE027362, DE033016, DE023575, and DE027983). Robert Cornell thanks Dr. Priyanka Kumari and Dr. Sunil Singh for discussions of functional tests of noncoding variants associated with orofacial cleft and of gene regulatory networks in oral periderm, respectively. He is also grateful for collaborations with human geneticists Elizabeth Leslie, Mary Marazita, and Jeffrey Murray.

Contributor Information

Robert A. Cornell, Email: cornellr@uw.edu.

Huan Liu, Email: liu.huan@whu.edu.cn.

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Siying Zhu and Hongxu Tao contributed equally.

Funding

This work was supported by the National Natural Science Foundation of China (10.13039/501100001809, 82322014) and National Institutes of Health (10.13039/100000002, DE027362, DE033016, DE023575, and DE027983).

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

Supporting Information 1

Table S1. Significant genetic variants associated with NSOFC identified through GWAS. Summary of 2068 lead SNPs significantly associated with NSOFC (p < 5e-08), compiled from 32 GWAS and whole-genome meta-analyses. The table includes information on SNP ID, genomic position, reference and alternative alleles, nearby gene(s), statistical significance (p value), ethnicity of the study population, and corresponding reference.

6824122.f1.xlsx (341.8KB, xlsx)
Supporting Information 2

Table S2. Suggestive genetic variants associated with NSOFC identified through GWAS. Summary of potentially causative variants associated with NSOFC that show suggestive associations (5e-08 < p < 1e-06), derived from 27 GWAS and meta-analyses. The table includes information on SNP ID, genomic position, reference and alternative alleles, nearby gene(s), statistical significance (p value), ethnicity of the study population, and corresponding reference.

6824122.f2.xlsx (174.1KB, xlsx)
Supporting Information 3

Table S3. Genetic variants associated with NSOFC identified by WGS. Summary of genetic variants associated with NSOFC identified through WGS. This table compiles data from five published WGS studies, including information on variant ID, genomic position, reference and alternative alleles, nearby gene(s), variant class, study population ethnicity, and corresponding reference.

6824122.f3.xlsx (125.6KB, xlsx)
Supporting Information 4

Table S4. Genetic variants associated with NSOFC identified by targeted sequencing. Summary of genetic variants identified in NSOFC-related genes through nine targeted sequencing studies, including information on variant ID, genomic position, reference and alternative alleles, nearby gene(s), variant class, statistical significance (p value), study population ethnicity, and corresponding reference.

6824122.f4.xlsx (29.8KB, xlsx)
Supporting Information 5

Table S5. Prioritization strategies for genetic variants associated with NSOFC. Overview of prioritization strategies applied in NSOFC-associated variant studies. This table summarizes how various prioritization strategies, including statistical fine-mapping, epigenetic annotation, and machine learning–based prediction models, have been utilized to refine candidate variants from GWAS and sequencing datasets. For each study, key information is provided, including the type of prioritization strategy employed, target variants or loci, and corresponding reference.

6824122.f5.xlsx (28.6KB, xlsx)
Supporting Information 6

Table S6. Functional validation of genetic variants and susceptibility genes associated with NSOFC. Overview of experimental approaches used for functional validation of NSOFC-associated variants. This table summarizes how various functional validation methods, including protein binding assays, in vitro and in vivo reporter assays, gene regulation and perturbation assays, in vivo or animal models, spatial gene expression analyses, and chromatin interaction profiling, have been utilized to functionally characterize candidate variants identified from GWAS and sequencing datasets. For each study, key information is provided, including the specific validation techniques employed, target variants or loci, and corresponding reference.

6824122.f6.xlsx (28.6KB, xlsx)

References

  • 1. Leslie E. J., Marazita M. L. Genetics of Cleft Lip and Cleft Palate. American Journal of Medical Genetics Part C: Seminars in Medical Genetics. 2013;163(4):246–258. doi: 10.1002/ajmg.c.31381.
  • 2. Carlson J. C., Shaffer J. R., Deleyiannis F., et al. Genome-Wide Interaction Study Implicates VGLL2 and Alcohol Exposure and PRL and Smoking in Orofacial Cleft Risk. Frontiers in Cell and Developmental Biology. 2022;10. doi: 10.3389/fcell.2022.621261. PMID: 35223824.
  • 3. Dixon M. J., Marazita M. L., Beaty T. H., Murray J. C. Cleft Lip and Palate: Understanding Genetic and Environmental Influences. Nature Reviews. Genetics. 2011;12(3):167–178. doi: 10.1038/nrg2933.
  • 4. Melnick M. Cleft Lip and Palate: From Origin to Treatment. American Journal of Human Genetics. 2003;72(2):503. doi: 10.1086/345997.
  • 5. Birnbaum S., Ludwig K. U., Reutter H., et al. Key Susceptibility Locus for Nonsyndromic Cleft Lip With or Without Cleft Palate on Chromosome 8q24. Nature Genetics. 2009;41(4):473–477. doi: 10.1038/ng.333.
  • 6. Grant S. F., Wang K., Zhang H., et al. A Genome-Wide Association Study Identifies a Locus for Nonsyndromic Cleft Lip With or Without Cleft Palate on 8q24. Journal of Pediatrics. 2009;155(6):909–913. doi: 10.1016/j.jpeds.2009.06.020.
  • 7. Beaty T. H., Murray J. C., Marazita M. L., et al. A Genome-Wide Association Study of Cleft Lip With and Without Cleft Palate Identifies Risk Variants Near MAFB and ABCA4. Nature Genetics. 2010;42(6):525–529. doi: 10.1038/ng.580.
  • 8. Mangold E., Ludwig K. U., Birnbaum S., et al. Genome-Wide Association Study Identifies Two Susceptibility Loci for Nonsyndromic Cleft Lip With or Without Cleft Palate. Nature Genetics. 2010;42(1):24–26. doi: 10.1038/ng.506.
  • 9. Sun Y., Huang Y., Yin A., et al. Genome-Wide Association Study Identifies a New Susceptibility Locus for Cleft Lip With or Without a Cleft Palate. Nature Communications. 2015;6(1). doi: 10.1038/ncomms7414. PMID: 25775280.
  • 10. Leslie E. J., Carlson J. C., Shaffer J. R., et al. A Multi-Ethnic Genome-Wide Association Study Identifies Novel Loci for Non-Syndromic Cleft Lip With or Without Cleft Palate on 2p24.2, 17q23 and 19q13. Human Molecular Genetics. 2016;25(13):2862–2872. doi: 10.1093/hmg/ddw104.
  • 11. Leslie E. J., Liu H., Carlson J. C., et al. A Genome-Wide Association Study of Nonsyndromic Cleft Palate Identifies an Etiologic Missense Variant in GRHL3. American Journal of Human Genetics. 2016;98(4):744–754. doi: 10.1016/j.ajhg.2016.02.014.
  • 12. Yu Y., Zuo X., He M., et al. Genome-Wide Analyses of Non-Syndromic Cleft Lip With Palate Identify 14 Novel Loci and Genetic Heterogeneity. Nature Communications. 2017;8(1):14364. doi: 10.1038/ncomms14364.
  • 13. Butali A., Mossey P. A., Adeyemo W. L., et al. Genomic Analyses in African Populations Identify Novel Risk Loci for Cleft Palate. Human Molecular Genetics. 2019;28(6):1038–1051. doi: 10.1093/hmg/ddy402.
  • 14. He M., Zuo X., Liu H., et al. Genome-Wide Analyses Identify a Novel Risk Locus for Nonsyndromic Cleft Palate. Journal of Dental Research. 2020;99(13):1461–1468. doi: 10.1177/0022034520943867.
  • 15. Mukhopadhyay N., Feingold E., Moreno-Uribe L., et al. Genome-Wide Association Study of Non-Syndromic Orofacial Clefts in a Multiethnic Sample of Families and Controls Identifies Novel Regions. Frontiers in Cell and Developmental Biology. 2021;9. doi: 10.3389/fcell.2021.621482. PMID: 33898419.
  • 16. Rahimov F., Nieminen P., Kumari P., et al. High Incidence and Geographic Distribution of Cleft Palate in Finland Are Associated With the IRF6 Gene. Nature Communications. 2024;15(1):9568. doi: 10.1038/s41467-024-53634-2.
  • 17. Li B., Yong L., Yu Y., et al. Genome-Wide Analyses of Nonsyndromic Cleft Lip With or Without Palate Identify 20 New Risk Loci in the Chinese Han Population. Journal of Genetics and Genomics. 2022;49(9):903–905. doi: 10.1016/j.jgg.2022.02.004.
  • 18. Robinson K., Mosley T. J., Rivera-González K. S., et al. Trio-Based GWAS Identifies Novel Associations and Subtype-Specific Risk Factors for Cleft Palate. Human Genetics and Genomics Advances. 2023;4(4):100234. doi: 10.1016/j.xhgg.2023.100234.
  • 19. Lou S., Zhu G., Xing C., et al. Transcriptome-Wide Association Identifies KLC1 as a Regulator of Mitophagy in Non-Syndromic Cleft Lip With or Without Palate. Imeta. 2024;3(6):e262. doi: 10.1002/imt2.262.
  • 20. Ludwig K. U., Mangold E., Herms S., et al. Genome-Wide Meta-Analyses of Nonsyndromic Cleft Lip With or Without Cleft Palate Identify Six New Risk Loci. Nature Genetics. 2012;44(9):968–971. doi: 10.1038/ng.2360.
  • 21. Ludwig K. U., Ahmed S. T., Böhmer A. C., et al. Meta-Analysis Reveals Genome-Wide Significance at 15q13 for Nonsyndromic Clefting of Both the Lip and the Palate, and Functional Analyses Implicate GREM1 as a Plausible Causative Gene. PLOS Genetics. 2016;12(3):e1005914. doi: 10.1371/journal.pgen.1005914.
  • 22. Wang Y., Sun Y., Huang Y., et al. Validation of a Genome-Wide Association Study Implied That SHTIN1 May Involve in the Pathogenesis of NSCL/P in Chinese Population. Scientific Reports. 2016;6(1). doi: 10.1038/srep38872.
  • 23. Leslie E. J., Carlson J. C., Shaffer J. R., et al. Genome-Wide Meta-Analyses of Nonsyndromic Orofacial Clefts Identify Novel Associations Between FOXE1 and All Orofacial Clefts, and TP63 and Cleft Lip With or Without Cleft Palate. Human Genetics. 2017;136(3):275–286. doi: 10.1007/s00439-016-1754-7.
  • 24. Ludwig K. U., Böhmer A. C., Bowes J., et al. Imputation of Orofacial Clefting Data Identifies Novel Risk Loci and Sheds Light on the Genetic Background of Cleft Lip ± Cleft Palate and Cleft Palate Only. Human Molecular Genetics. 2017;26(4):829–842. doi: 10.1093/hmg/ddx012.
  • 25. Mostowska A., Gaczkowska A., Zukowski K., et al. Common Variants in DLG1 Locus Are Associated With Non-Syndromic Cleft Lip With or Without Cleft Palate. Clinical Genetics. 2018;93(4):784–793. doi: 10.1111/cge.13141.
  • 26. Huang L., Jia Z., Shi Y., et al. Genetic Factors Define CPO and CLO Subtypes of Nonsyndromic Orofacial Cleft. PLOS Genetics. 2019;15(10):e1008357. doi: 10.1371/journal.pgen.1008357.
  • 27. van Rooij I. A. L. M., Ludwig K. U., Welzenbach J., et al. Non-Syndromic Cleft Lip With or Without Cleft Palate: Genome-Wide Association Study in Europeans Identifies a Suggestive Risk Locus at 16p12.1 and Supports SH3PXD2A as a Clefting Susceptibility Gene. Genes. 2019;10(12). doi: 10.3390/genes10121023.
  • 28. Ma L., Lou S., Miao Z., et al. Identification of Novel Susceptibility Loci for Non-Syndromic Cleft Lip With or Without Cleft Palate. Journal of Cellular and Molecular Medicine. 2020;24(23):13669–13678. doi: 10.1111/jcmm.15878. PMID: 33108691.
  • 29. Ludwig K. U., Böhmer A. C., Rubini M., et al. Strong Association of Variants Around FOXE1 and Orofacial Clefting. Journal of Dental Research. 2014;93(4):376–381. doi: 10.1177/0022034514523987.
  • 30. Yang Y., Suzuki A., Iwata J., Jun G. Secondary Genome-Wide Association Study Using Novel Analytical Strategies Disentangle Genetic Components of Cleft Lip and/or Cleft Palate in 1q32.2. Genes. 2020;11(11):1280. doi: 10.3390/genes11111280.
  • 31. Welzenbach J., Hammond N. L., Nikolić M., et al. Integrative Approaches Generate Insights Into the Architecture of Non-Syndromic Cleft Lip With or Without Cleft Palate. Human Genetics and Genomics Advances. 2021;2(3):100038. doi: 10.1016/j.xhgg.2021.100038.
  • 32. Ray D., Venkataraghavan S., Zhang W., et al. Pleiotropy Method Reveals Genetic Overlap Between Orofacial Clefts at Multiple Novel Loci From GWAS of Multi-Ethnic Trios. PLOS Genetics. 2021;17(7):e1009584. doi: 10.1371/journal.pgen.1009584.
  • 33. Yu Y., Zhen Q., Chen W., et al. Genome-Wide Meta-Analyses Identify Five New Risk Loci for Nonsyndromic Orofacial Clefts in the Chinese Han Population. Molecular Genetics & Genomic Medicine. 2023;11(10). doi: 10.1002/mgg3.2226.
  • 34. Jia Z., Mukhopadhyay N., Yang Z., et al. Multi-Ancestry Genome Wide Association Study Meta-Analysis of Non-Syndromic Orofacial Clefts. medRxiv; 2024.
  • 35. Alade A., Peter T., Busch T., et al. Shared Genetic Risk Between Major Orofacial Cleft Phenotypes in an African Population. Genetic Epidemiology. 2024;48(6):258–269. doi: 10.1002/gepi.22564.
  • 36. Lou S., Miao Z., Li X., et al. Functional Variant at 19q13.3 Confers Nonsyndromic Cleft Palate Susceptibility by Regulating HIF3A. iScience. 2025;28(2):111829. doi: 10.1016/j.isci.2025.111829.
  • 37. Yang J., Benyamin B., McEvoy B. P., et al. Common SNPs Explain a Large Proportion of the Heritability for Human Height. Nature Genetics. 2010;42(7):565–569. doi: 10.1038/ng.608.
  • 38. Yang J., Bakshi A., Zhu Z., et al. Genetic Variance Estimation With Imputed Variants Finds Negligible Missing Heritability for Human Height and Body Mass Index. Nature Genetics. 2015;47(10):1114–1120. doi: 10.1038/ng.3390.
  • 39. Hindorff L. A., Sethupathy P., Junkins H. A., et al. Potential Etiologic and Functional Implications of Genome-Wide Association Loci for Human Diseases and Traits. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(23):9362–9367. doi: 10.1073/pnas.0903103106.
  • 40. Wang K., Li M., Hakonarson H. Analysing Biological Pathways in Genome-Wide Association Studies. Nature Reviews. Genetics. 2010;11(12):843–854. doi: 10.1038/nrg2884.
  • 41. Tam V., Patel N., Turcotte M., Bossé Y., Paré G., Meyre D. Benefits and Limitations of Genome-Wide Association Studies. Nature Reviews. Genetics. 2019;20(8):467–484. doi: 10.1038/s41576-019-0127-1.
  • 42. Manolio T. A., Collins F. S., Cox N. J., et al. Finding the Missing Heritability of Complex Diseases. Nature. 2009;461(7265):747–753. doi: 10.1038/nature08494.
  • 43. Ma L., Xu M., Li D., et al. A miRNA-Binding-Site SNP of MSX1 Is Associated With NSOC Susceptibility. Journal of Dental Research. 2014;93(6):559–564. doi: 10.1177/0022034514527617.
  • 44. Yun L., Ma L., Wang M., et al. Rs2262251 in lnc RNA RP11-462G12.2 Is Associated With Nonsyndromic Cleft Lip With/Without Cleft Palate. Human Mutation. 2019;40(11):2057–2067. doi: 10.1002/humu.23859.
  • 45. Fuxman Bass J. I. Understanding the Logic and Grammar of Cis-Regulatory Elements. Nature Reviews. Genetics. 2025. doi: 10.1038/s41576-025-00847-w.
  • 46. Schaid D. J., Chen W., Larson N. B. From Genome-Wide Associations to Candidate Causal Variants by Statistical Fine-Mapping. Nature Reviews. Genetics. 2018;19(8):491–504. doi: 10.1038/s41576-018-0016-z.
  • 47. Zhu Z., Zhang F., Hu H., et al. Integration of Summary Data From GWAS and eQTL Studies Predicts Complex Trait Gene Targets. Nature Genetics. 2016;48(5):481–487. doi: 10.1038/ng.3538.
  • 48. Oudelaar A. M., Higgs D. R. The Relationship Between Genome Structure and Function. Nature Reviews. Genetics. 2021;22(3):154–168. doi: 10.1038/s41576-020-00303-x.
  • 49. Claussnitzer M., Dankel S. N., Kim K. H., et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. The New England Journal of Medicine. 2015;373(10):895–907. doi: 10.1056/NEJMoa1502214.
  • 50. Smemo S., Tena J. J., Kim K. H., et al. Obesity-Associated Variants Within FTO Form Long-Range Functional Connections With IRX3. Nature. 2014;507(7492):371–375. doi: 10.1038/nature13138.
  • 51. Hua J. T., Ahmed M., Guo H., et al. Risk SNP-Mediated Promoter-Enhancer Switching Drives Prostate Cancer Through lncRNA PCAT19. Cell. 2018;174(3):564–575.e18. doi: 10.1016/j.cell.2018.06.014.
  • 52. Steri M., Idda M. L., Whalen M. B., Orrù V. Genetic Variants in mRNA Untranslated Regions. Wiley Interdisciplinary Reviews: RNA. 2018;9(4):e1474. doi: 10.1002/wrna.1474.
  • 53. Eberhart J. K., He X., Swartz M. E., et al. MicroRNA Mirn140 Modulates Pdgf Signaling During Palatogenesis. Nature Genetics. 2008;40(3):290–298. doi: 10.1038/ng.82.
  • 54. Cenik C., Derti A., Mellor J. C., Berriz G. F., Roth F. P. Genome-Wide Functional Analysis of Human 5' Untranslated Region Introns. Genome Biology. 2010;11(3). doi: 10.1186/gb-2010-11-3-r29.
  • 55. Scotti M. M., Swanson M. S. RNA Mis-Splicing in Disease. Nature Reviews. Genetics. 2016;17(1):19–32. doi: 10.1038/nrg.2015.3.
  • 56. McLaren W., Gil L., Hunt S. E., et al. The Ensembl Variant Effect Predictor. Genome Biology. 2016;17(1):122. doi: 10.1186/s13059-016-0974-4.
  • 57. Panagiotou O. A., Ioannidis J. P. What Should the Genome-Wide Significance Threshold Be? Empirical Replication of Borderline Genetic Associations. International Journal of Epidemiology. 2012;41(1):273–286. doi: 10.1093/ije/dyr178.
  • 58. Heshmatzad K., Naderi N., Maleki M., et al. Role of Non-Coding Variants in Cardiovascular Disease. Journal of Cellular and Molecular Medicine. 2023;27(12):1621–1636. doi: 10.1111/jcmm.17762.
  • 59. Andersson R., Gebhard C., Miguel-Escalada I., et al. An Atlas of Active Enhancers Across Human Cell Types and Tissues. Nature. 2014;507(7493):455–461. doi: 10.1038/nature12787.
  • 60. Sazonovs A., Barrett J. C. Rare-Variant Studies to Complement Genome-Wide Association Studies. Annual Review of Genomics and Human Genetics. 2018;19(1):97–112. doi: 10.1146/annurev-genom-083117-021641.
  • 61. Gilissen C., Hehir-Kwa J. Y., Thung D. T., et al. Genome Sequencing Identifies Major Causes of Severe Intellectual Disability. Nature. 2014;511(7509):344–347. doi: 10.1038/nature13394.
  • 62. Bishop M. R., Diaz Perez K. K., Sun M., et al. Genome-Wide Enrichment of De Novo Coding Mutations in Orofacial Cleft Trios. American Journal of Human Genetics. 2020;107(1):124–136. doi: 10.1016/j.ajhg.2020.05.018.
  • 63. Awotoye W., Mossey P. A., Hetmanski J. B., et al. Whole-Genome Sequencing Reveals De-Novo Mutations Associated With Nonsyndromic Cleft Lip/Palate. Scientific Reports. 2022;12(1):11743. doi: 10.1038/s41598-022-15885-1.
  • 64. Mukhopadhyay N., Bishop M., Mortillo M., et al. Whole Genome Sequencing of Orofacial Cleft Trios From the Gabriella Miller Kids First Pediatric Research Consortium Identifies a New Locus on Chromosome 21. Human Genetics. 2020;139(2):215–226. doi: 10.1007/s00439-019-02099-1.
  • 65. Zieger H. K., Weinhold L., Schmidt A., et al. Prioritization of Non-Coding Elements Involved in Non-Syndromic Cleft Lip With/Without Cleft Palate Through Genome-Wide Analysis of De Novo Mutations. Human Genetics and Genomics Advances. 2023;4(1):100166. doi: 10.1016/j.xhgg.2022.100166.
  • 66.Yu Y., Alvarado R., Petty L. E., et al. Polygenic Risk Impacts PDGFRA Mutation Penetrance in Non-Syndromic Cleft Lip and Palate. Human Molecular Genetics . 2022;31(14):2348–2357. doi: 10.1093/hmg/ddac037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Li M. J., Shi J. Y., Zhang B. H., Chen Q. M., Shi B., Jia Z. L. Targeted Re-Sequencing on 1p22 Among Non-Syndromic Orofacial Clefts From Han Chinese Population. Frontiers in Genetics . 2022;13:p. 36061182. doi: 10.3389/fgene.2022.947126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Khandelwal K. D., Ishorst N., Zhou H., et al. Novel IRF6 Mutations Detected in Orofacial Cleft Patients by Targeted Massively Parallel Sequencing. Journal of Dental Research . 2017;96(2):179–185. doi: 10.1177/0022034516678829. [DOI] [PubMed] [Google Scholar]
  • 69.You Y., Shi J., Shi B., Jia Z. Target Sequencing Reveals the Association Between Variants in VAX1 and NSCL/P in Chinese Population. Oral Diseases . 2023;29(5):2130–2138. doi: 10.1111/odi.14210. [DOI] [PubMed] [Google Scholar]
  • 70.Brito L. A., Yamamoto G. L., Melo S., et al. Rare Variants in the Epithelial Cadherin Gene Underlying the Genetic Etiology of Nonsyndromic Cleft Lip With or Without Cleft Palate. Human Mutation . 2015;36(11):1029–1033. doi: 10.1002/humu.22827. [DOI] [PubMed] [Google Scholar]
  • 71.Tao H. X., Yang Y. X., Shi B., Jia Z. L. Identification of Putative Regulatory Single-Nucleotide Variants in NTN1 Gene Associated With NSCL/P. Journal of Human Genetics . 2023;68(7):491–497. doi: 10.1038/s10038-023-01137-1. [DOI] [PubMed] [Google Scholar]
  • 72.Leslie E. J., Taub M. A., Liu H., et al. Identification of Functional Variants for Cleft Lip With or Without Cleft Palate in or Near PAX7, FGFR2, and NOG by Targeted Sequencing of GWAS Loci. American Journal of Human Genetics . 2015;96(3):397–411. doi: 10.1016/j.ajhg.2015.01.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Li M. J., Kumari P., Lin Y. S., et al. A Variant in the IRF6 Promoter Associated With the Risk for Orofacial Clefting. Journal of Dental Research . 2023;102(7):806–813. doi: 10.1177/00220345231165210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Mangold E., Böhmer A. C., Ishorst N., et al. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate. American Journal of Human Genetics . 2016;98(4):755–762. doi: 10.1016/j.ajhg.2016.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Savastano C. P., Brito L. A., Faria Á. C., et al. Impact of Rare Variants in ARHGAP29 to the Etiology of Oral Clefts: Role of Loss-of-Function vs Missense Variants. Clinical Genetics . 2017;91(5):683–689. doi: 10.1111/cge.12823. [DOI] [PubMed] [Google Scholar]
  • 76.Lupski J. R., Reid J. G., Gonzaga-Jauregui C., et al. Whole-Genome Sequencing in a Patient With Charcot-Marie-Tooth Neuropathy. New England Journal of Medicine . 2010;362(13):1181–1191. doi: 10.1056/NEJMoa0908094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Liu H., Leslie E. J., Carlson J. C., et al. Identification of Common Non-Coding Variants at 1p22 That Are Functional for Non-Syndromic Orofacial Clefting. Nature Communications . 2017;8 doi: 10.1038/ncomms14759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Gaczkowska A., Żukowski K., Biedziak B., et al. Association of CDKAL1 Nucleotide Variants With the Risk of Non-Syndromic Cleft Lip With or Without Cleft Palate. Journal of Human Genetics . 2018;63(4):397–406. doi: 10.1038/s10038-017-0397-4. [DOI] [PubMed] [Google Scholar]
  • 79.Cura F., Palmieri A., Girardi A., et al. Possible Effect of SNAIL Family Transcriptional Repressor 1 Polymorphisms in Non-Syndromic Cleft Lip With or Without Cleft Palate. Clinical Oral Investigations . 2018;22(7):2535–2541. doi: 10.1007/s00784-018-2350-0. [DOI] [PubMed] [Google Scholar]
  • 80.Gaczkowska A., Biedziak B., Budner M., et al. PAX7 Nucleotide Variants and the Risk of Non-Syndromic Orofacial Clefts in the Polish Population. Oral Diseases . 2019;25(6):1608–1618. doi: 10.1111/odi.13139. [DOI] [PubMed] [Google Scholar]
  • 81.Liu H., Duncan K., Helverson A., et al. Analysis of Zebrafish Periderm Enhancers Facilitates Identification of a Regulatory Variant Near Human KRT8/18. Elife . 2020;9 doi: 10.7554/eLife.51325.e51325 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Li D., Zhu G., Lou S., et al. The Functional Variant of NTN1 Contributes to the Risk of Nonsyndromic Cleft Lip With or Without Cleft Palate. European Journal of Human Genetics . 2020;28(4):453–460. doi: 10.1038/s41431-019-0549-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Kumari P., Friedman R. Z., Curtis S. W., et al. Identification of Functional Non-Coding Variants Associated With Orofacial Cleft. Nature Communications . 2025;16(1):p. 6545. doi: 10.1038/s41467-025-61734-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Barrett J. C., Fry B., Maller J., Daly M. J. Haploview: Analysis and Visualization of LD and Haplotype Maps. Bioinformatics . 2005;21(2):263–265. doi: 10.1093/bioinformatics/bth457. [DOI] [PubMed] [Google Scholar]
  • 85.Yang J., Ferreira T., Morris A. P., et al. Conditional and Joint Multiple-SNP Analysis of GWAS Summary Statistics Identifies Additional Variants Influencing Complex Traits. Nature Genetics . 2012;44(4):369–375. doi: 10.1038/ng.2213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Maller J. B., McVean G., Byrnes J., et al. Bayesian Refinement of Association Signals for 14 Loci in 3 Common Diseases. Nature Genetics . 2012;44(12):1294–1301. doi: 10.1038/ng.2435. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.van de Bunt M., Cortes A., Brown M. A., Morris A. P., McCarthy M. I. Evaluating the Performance of Fine-Mapping Strategies at Common Variant GWAS Loci. PLoS Genetics . 2015;11(9) doi: 10.1371/journal.pgen.1005535.e1005535 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Masotti C., Brito L. A., Nica A. C., et al. MRPL53, a New Candidate Gene for Orofacial Clefting, Identified Using an eQTL Approach. Journal of Dental Research . 2018;97(1):33–40. doi: 10.1177/0022034517735805. [DOI] [PubMed] [Google Scholar]
  • 89.Yang J., Yu X., Zhu G., et al. Integrating GWAS and eQTL to Predict Genes and Pathways for Non-Syndromic Cleft Lip With or Without Palate. Oral Diseases . 2021;27(7):1747–1754. doi: 10.1111/odi.13699. [DOI] [PubMed] [Google Scholar]
  • 90.Li X., Tian Y., Qiu L., et al. Expression Quantitative Trait Locus Study of Non-Syndromic Cleft Lip With or Without Cleft Palate GWAS Variants in Lip Tissues. Cells . 2022;11(20):p. 3281. doi: 10.3390/cells11203281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Lou S., Yang J., Zhu G. R., et al. Integrative Multi-Omics Analysis Identifies Genetic Variants Contributing to Non-Syndromic Cleft Lip With or Without Cleft Palate. Chinese Journal of Dental Research . 2024;27(1):65–73. doi: 10.3290/j.cjdr.b5136745. [DOI] [PubMed] [Google Scholar]
  • 92.Cui X., Zhu G., Han M., et al. Genetic Variants inBCL‐2 Family Genes Influence the Risk of Non-Syndromic Cleft Lip With or Without Cleft Palate. Birth Defects Research . 2024;116(1) doi: 10.1002/bdr2.2288. [DOI] [PubMed] [Google Scholar]
  • 93.Grandi F. C., Modi H., Kampman L., Corces M. R. Chromatin Accessibility Profiling by ATAC-Seq. Nature Protocols . 2022;17(6):1518–1552. doi: 10.1038/s41596-022-00692-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Gordon C. T., Attanasio C., Bhatia S., et al. Identification of Novel Craniofacial Regulatory Domains Located Far Upstream of SOX9 and Disrupted in Pierre Robin Sequence. Human Mutation . 2014;35(8):1011–1020. doi: 10.1002/humu.22606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Liu H., Leslie E. J., Jia Z., et al. Irf6 Directly Regulates Klf17 in Zebrafish Periderm and Klf4 in Murine Oral Epithelium, and Dominant-Negative KLF4 Variants Are Present in Patients With Cleft Lip and Palate. Human Molecular Genetics . 2016;25(4):766–776. doi: 10.1093/hmg/ddv614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Thieme F., Henschel L., Hammond N. L., et al. Extending the Allelic Spectrum at Noncoding Risk Loci of Orofacial Clefting. Human Mutation . 2021;42(8):1066–1078. doi: 10.1002/humu.24219. [DOI] [PubMed] [Google Scholar]
  • 97.Xiao Y., Jiao S., He M., et al. Chromatin Conformation of Human Oral Epithelium Can Identify Orofacial Cleft Missing Functional Variants. International Journal of Oral Science . 2022;14(1):p. 43. doi: 10.1038/s41368-022-00194-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Orozco G. Fine Mapping With Epigenetic Information and 3D Structure. Seminars in Immunopathology . 2022;44(1):115–125. doi: 10.1007/s00281-021-00906-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Javierre B. M., Burren O. S., Wilder S. P., et al. Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters. Cell . 2016;167(5):1369–1384.e19. doi: 10.1016/j.cell.2016.09.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Lincoln M. R., Axisa P. P., Hafler D. A. Epigenetic Fine-Mapping: Identification of Causal Mechanisms for Autoimmunity. Current Opinion in Immunology . 2020;67:50–56. doi: 10.1016/j.coi.2020.09.002. [DOI] [PubMed] [Google Scholar]
  • 101.Nicholls H. L., John C. R., Watson D. S., Munroe P. B., Barnes M. R., Cabrera C. P. Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci. Frontiers in Genetics . 2020;11:p. 32351543. doi: 10.3389/fgene.2020.00350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Zhang S. J., Meng P., Zhang J., et al. Machine Learning Models for Genetic Risk Assessment of Infants With Non-Syndromic Orofacial Cleft. Genomics, Proteomics & Bioinformatics . 2018;16(5):354–364. doi: 10.1016/j.gpb.2018.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Machado R. A., de Oliveira Silva C., Martelli-Junior H., das Neves L. T., Coletta R. D. Machine Learning in Prediction of Genetic Risk of Nonsyndromic Oral Clefts in the Brazilian Population. Clinical Oral Investigations . 2021;25(3):1273–1280. doi: 10.1007/s00784-020-03433-y. [DOI] [PubMed] [Google Scholar]
  • 104.Kang G., Baek S. H., Kim Y. H., Kim D. H., Park J. W. Genetic Risk Assessment of Nonsyndromic Cleft Lip With or Without Cleft Palate by Linking Genetic Networks and Deep Learning Models. International Journal of Molecular Sciences . 2023;24(5):p. 36901988. doi: 10.3390/ijms24054557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Ghandi M., Lee D., Mohammad-Noori M., Beer M. A. Enhanced Regulatory Sequence Prediction Using Gapped K-Mer Features. PLoS Computational Biology . 2014;10(7) doi: 10.1371/journal.pcbi.1003711.e1003711 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Roshan U., Chikkagoudar S., Wei Z., Wang K., Hakonarson H. Ranking Causal Variants and Associated Regions in Genome-Wide Association Studies by the Support Vector Machine and Random Forest. Nucleic Acids Research . 2011;39(9):p. e62. doi: 10.1093/nar/gkr064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Zhou J., Theesfeld C. L., Yao K., Chen K. M., Wong A. K., Troyanskaya O. G. Deep Learning Sequence-Based Ab Initio Prediction of Variant Effects on Expression and Disease Risk. Nature Genetics . 2018;50(8):1171–1179. doi: 10.1038/s41588-018-0160-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Rentzsch P., Witten D., Cooper G. M., Shendure J., Kircher M. CADD: Predicting the Deleteriousness of Variants Throughout the Human Genome. Nucleic Acids Research . 2019;47(D1):D886–d894. doi: 10.1093/nar/gky1016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Novikova G., Andrews S. J., Renton A. E., Marcora E. Beyond Association: Successes and Challenges in Linking Non-Coding Genetic Variation to Functional Consequences That Modulate Alzheimer’s Disease Risk. Molecular Neurodegeneration . 2021;16(1):p. 27. doi: 10.1186/s13024-021-00449-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Spain S. L., Barrett J. C. Strategies for Fine-Mapping Complex Traits. Human Molecular Genetics . 2015;24(R1):R111–R119. doi: 10.1093/hmg/ddv260. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Hoggart C. J., Whittaker J. C., De Iorio M., Balding D. J. Simultaneous Analysis of all SNPs in Genome-Wide and Re-Sequencing Association Studies. PLoS Genetics . 2008;4(7):p. e1000130. doi: 10.1371/journal.pgen.1000130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Hormozdiari F., Kostem E., Kang E. Y., Pasaniuc B., Eskin E. Identifying Causal Variants at Loci With Multiple Signals of Association. Genetics . 2014;198(2):497–508. doi: 10.1534/genetics.114.167908. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature . 2012;489(7414):p. 57. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Ghandi M., Mohammad-Noori M., Ghareghani N., Lee D., Garraway L., Beer M. A. gkmSVM: An R Package for Gapped-kmer SVM. Bioinformatics . 2016;32(14):2205–2207. doi: 10.1093/bioinformatics/btw203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Beer M. A. Predicting Enhancer Activity and Variant Impact Using gkm-SVM. Human Mutation . 2017;38(9):1251–1258. doi: 10.1002/humu.23185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Kircher M., Witten D. M., Jain P., O'Roak B. J., Cooper G. M., Shendure J. A General Framework for Estimating the Relative Pathogenicity of Human Genetic Variants. Nature Genetics . 2014;46(3):310–315. doi: 10.1038/ng.2892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.de la Garza G., Schleiffarth J. R., Dunnwald M., et al. Interferon Regulatory Factor 6 Promotes Differentiation of the Periderm by Activating Expression of Grainyhead-Like 3. The Journal of Investigative Dermatology . 2013;133(1):68–77. doi: 10.1038/jid.2012.269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Ghassibe-Sabbagh M., Desmyter L., Langenberg T., et al. FAF1, a Gene That Is Disrupted in Cleft Palate and Has Conserved Function in Zebrafish. American Journal of Human Genetics . 2011;88(2):150–161. doi: 10.1016/j.ajhg.2011.01.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Leslie E. J., Mansilla M. A., Biggs L. C., et al. Expression and Mutation Analyses Implicate ARHGAP29 as the Etiologic Gene for the Cleft Lip With or Without Cleft Palate Locus Identified by Genome-Wide Association on Chromosome 1p22. Birth Defects Research. Part a, Clinical and Molecular Teratology . 2012;94(11):934–942. doi: 10.1002/bdra.23076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Paul B. J., Palmer K., Sharp J. C., Pratt C. H., Murray S. A., Dunnwald M. ARHGAP29 Mutation Is Associated With Abnormal Oral Epithelial Adhesions. Journal of Dental Research . 2017;96(11):1298–1305. doi: 10.1177/0022034517726079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Liu H., Busch T., Eliason S., et al. Exome Sequencing Provides Additional Evidence for the Involvement of ARHGAP29 in Mendelian Orofacial Clefting and Extends the Phenotypic Spectrum to Isolated Cleft Palate. Birth Defects Research . 2017;109(1):27–37. doi: 10.1002/bdra.23596. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Rahimov F., Marazita M. L., Visel A., et al. Disruption of an AP-2alpha Binding Site in an IRF6 Enhancer Is Associated With Cleft Lip. Nature Genetics . 2008;40(11):1341–1347. doi: 10.1038/ng.242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Rainger J. K., Bhatia S., Bengani H., et al. Disruption of SATB2 or Its Long-Range Cis-Regulation by SOX9 Causes a Syndromic Form of Pierre Robin Sequence. Human Molecular Genetics . 2014;23(10):2569–2579. doi: 10.1093/hmg/ddt647. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Stüssel L. G., Hollstein R., Laugsch M., et al. MiRNA-149 as a Candidate for Facial Clefting and Neural Crest Cell Migration. Journal of Dental Research . 2022;101(3):323–330. doi: 10.1177/00220345211038203. [DOI] [PubMed] [Google Scholar]
  • 125.McDade S. S., Henry A. E., Pivato G. P., et al. Genome-Wide Analysis of p63 Binding Sites Identifies AP-2 Factors as Co-Regulators of Epidermal Differentiation. Nucleic Acids Research . 2012;40(15):7190–7206. doi: 10.1093/nar/gks389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Antonini D., Sirico A., Aberdam E., et al. A Composite Enhancer Regulates p63 Gene Expression in Epidermal Morphogenesis and in Keratinocyte Differentiation by Multiple Mechanisms. Nucleic Acids Research . 2015;43(2):862–874. doi: 10.1093/nar/gku1396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Cvjetkovic N., Maili L., Weymouth K. S., et al. Regulatory Variant in FZD6 Gene Contributes to Nonsyndromic Cleft Lip and Palate in an African-American Family. Molecular Genetics & Genomic Medicine . 2015;3(5):440–451. doi: 10.1002/mgg3.155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Uslu V. V., Petretich M., Ruf S., et al. Long-Range Enhancers Regulating Myc Expression Are Required for Normal Facial Morphogenesis. Nature Genetics . 2014;46(7):753–758. doi: 10.1038/ng.2971. [DOI] [PubMed] [Google Scholar]
  • 129.Lidral A. C., Liu H., Bullard S. A., et al. A Single Nucleotide Polymorphism Associated With Isolated Cleft Lip and Palate, Thyroid Cancer and Hypothyroidism Alters the Activity of an Oral Epithelium and Thyroid Enhancer Near FOXE1. Human Molecular Genetics . 2015;24(14):3895–3907. doi: 10.1093/hmg/ddv047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Funato N., Twigg S. R. F. The Functional Impact of the Noncoding SNP rs3741442 on Orofacial Clefting. Journal of Dental Research . 2025 doi: 10.1177/00220345251334385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.He F., Xiong W., Wang Y., et al. Modulation of BMP Signaling by Noggin Is Required for the Maintenance of Palatal Epithelial Integrity During Palatogenesis. Developmental Biology . 2010;347(1):109–121. doi: 10.1016/j.ydbio.2010.08.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Benko S., Fantes J. A., Amiel J., et al. Highly Conserved Non-Coding Elements on Either Side of SOX9 Associated With Pierre Robin Sequence. Nature Genetics . 2009;41(3):359–364. doi: 10.1038/ng.329. [DOI] [PubMed] [Google Scholar]
  • 133.Kumari P., Singh S. K., Raman R. A Novel Non-Coding RNA Within an Intron of CDH2 and Association of Its SNP With Non-Syndromic Cleft Lip and Palate. Gene . 2018;658:123–128. doi: 10.1016/j.gene.2018.03.017. [DOI] [PubMed] [Google Scholar]
  • 134.Lansdon L. A., Darbro B. W., Petrin A. L., et al. Identification of Isthmin 1 as a Novel Clefting and Craniofacial Patterning Gene in Humans. Genetics . 2018;208(1):283–296. doi: 10.1534/genetics.117.300535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Fu X., Cheng Y., Yuan J., Huang C., Cheng H., Zhou R. Loss-of-Function Mutation in the X-Linked TBX22 Promoter Disrupts an ETS-1 Binding Site and Leads to Cleft Palate. Human Genetics . 2015;134(2):147–158. doi: 10.1007/s00439-014-1503-8. [DOI] [PubMed] [Google Scholar]
  • 136.Pauws E., Hoshino A., Bentley L., et al. Tbx22null Mice Have a Submucous Cleft Palate Due to Reduced Palatal Bone Formation and Also Display Ankyloglossia and Choanal Atresia Phenotypes. Human Molecular Genetics . 2009;18(21):4171–4179. doi: 10.1093/hmg/ddp368. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Letra A., Zhao M., Silva R. M., Vieira A. R., Hecht J. T. Functional Significance of MMP3 and TIMP2 Polymorphisms in Cleft Lip/Palate. Journal of Dental Research . 2014;93(7):651–656. doi: 10.1177/0022034514534444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Eshete M. A., Liu H., Li M., et al. Loss-of-Function GRHL3 Variants Detected in African Patients With Isolated Cleft Palate. Journal of Dental Research . 2018;97(1):41–48. doi: 10.1177/0022034517729819. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Cox L. L., Cox T. C., Moreno Uribe L. M., et al. Mutations in the Epithelial Cadherin-p120-Catenin Complex Cause Mendelian Non-Syndromic Cleft Lip With or Without Cleft Palate. American Journal of Human Genetics . 2018;102(6):1143–1157. doi: 10.1016/j.ajhg.2018.04.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Haberle V., Stark A. Eukaryotic Core Promoters and the Functional Basis of Transcription Initiation. Nature Reviews. Molecular Cell Biology . 2018;19(10):621–637. doi: 10.1038/s41580-018-0028-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Li G., Cunin P., Wu D., et al. The Rheumatoid Arthritis Risk Variant CCR6DNP Regulates CCR6 via PARP-1. PLoS Genetics . 2016;12(9):p. e1006292. doi: 10.1371/journal.pgen.1006292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Li G., Martínez-Bonet M., Wu D., et al. High-Throughput Identification of Noncoding Functional SNPs via Type IIS Enzyme Restriction. Nature Genetics . 2018;50(8):1180–1188. doi: 10.1038/s41588-018-0159-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Ulirsch J. C., Nandakumar S. K., Wang L., et al. Systematic Functional Dissection of Common Genetic Variation Affecting Red Blood Cell Traits. Cell . 2016;165(6):1530–1545. doi: 10.1016/j.cell.2016.04.048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.Levo M., Zalckvar E., Sharon E., et al. Unraveling Determinants of Transcription Factor Binding Outside the Core Binding Site. Genome Research . 2015;25(7):1018–1029. doi: 10.1101/gr.185033.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.Fisher S., Grice E. A., Vinton R. M., et al. Evaluating the Biological Relevance of Putative Enhancers Using Tol2 Transposon-Mediated Transgenesis in Zebrafish. Nature Protocols . 2006;1(3):1297–1305. doi: 10.1038/nprot.2006.230. [DOI] [PubMed] [Google Scholar]
  • 146.Trauernicht M., Martinez-Ara M., van Steensel B. Deciphering Gene Regulation Using Massively Parallel Reporter Assays. Trends in Biochemical Sciences . 2020;45(1):90–91. doi: 10.1016/j.tibs.2019.10.006. [DOI] [PubMed] [Google Scholar]
  • 147.Sethi A., Gu M., Gumusgoz E., et al. Supervised Enhancer Prediction With Epigenetic Pattern Recognition and Targeted Validation. Nature Methods . 2020;17(8):807–814. doi: 10.1038/s41592-020-0907-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.Arnold C. D., Gerlach D., Stelzer C., Boryń Ł. M., Rath M., Stark A. Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Science . 2013;339(6123):1074–1077. doi: 10.1126/science.1232542. [DOI] [PubMed] [Google Scholar]
  • 149.van Arensbergen J., Pagie L., FitzPatrick V. D., et al. High-Throughput Identification of Human SNPs Affecting Regulatory Element Activity. Nature Genetics . 2019;51(7):1160–1169. doi: 10.1038/s41588-019-0455-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Symmons O., Uslu V. V., Tsujimura T., et al. Functional and Topological Characteristics of Mammalian Regulatory Domains. Genome Research . 2014;24(3):390–400. doi: 10.1101/gr.163519.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 151.Kentepozidou E., Aitken S. J., Feig C., et al. Clustered CTCF Binding Is an Evolutionary Mechanism to Maintain Topologically Associating Domains. Genome Biology . 2020;21(1):p. 5. doi: 10.1186/s13059-019-1894-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.McArthur E., Capra J. A. Topologically Associating Domain Boundaries That Are Stable Across Diverse Cell Types Are Evolutionarily Constrained and Enriched for Heritability. American Journal of Human Genetics . 2021;108(2):269–283. doi: 10.1016/j.ajhg.2021.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Komor A. C., Kim Y. B., Packer M. S., Zuris J. A., Liu D. R. Programmable Editing of a Target Base in Genomic DNA Without Double-Stranded DNA Cleavage. Nature . 2016;533(7603):420–424. doi: 10.1038/nature17946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Gaudelli N. M., Komor A. C., Rees H. A., et al. Programmable Base Editing of A•T to G•C in Genomic DNA Without DNA Cleavage. Nature . 2017;551(7681):464–471. doi: 10.1038/nature24644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Anzalone A. V., Randolph P. B., Davis J. R., et al. Search-and-Replace Genome Editing Without Double-Strand Breaks or Donor DNA. Nature . 2019;576(7785):149–157. doi: 10.1038/s41586-019-1711-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Kondo S., Schutte B. C., Richardson R. J., et al. Mutations in IRF6 Cause Van der Woude and Popliteal Pterygium Syndromes. Nature Genetics . 2002;32(2):285–289. doi: 10.1038/ng985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Richardson R. J., Dixon J., Malhotra S., et al. Irf6 Is a Key Determinant of the Keratinocyte Proliferation-Differentiation Switch. Nature Genetics . 2006;38(11):1329–1334. doi: 10.1038/ng1894. [DOI] [PubMed] [Google Scholar]
  • 158.Richardson R. J., Hammond N. L., Coulombe P. A., et al. Periderm Prevents Pathological Epithelial Adhesions During Embryogenesis. The Journal of Clinical Investigation . 2014;124(9):3891–3900. doi: 10.1172/JCI71946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Satokata I., Maas R. Msx 1 Deficient Mice Exhibit Cleft Palate and Abnormalities of Craniofacial and Tooth Development. Nature Genetics . 1994;6(4):348–356. doi: 10.1038/ng0494-348. [DOI] [PubMed] [Google Scholar]
  • 160.van den Boogaard M. J., Dorland M., Beemer F. A., van Amstel H. K. MSX1 Mutation Is Associated With Orofacial Clefting and Tooth Agenesis in Humans. Nature Genetics . 2000;24(4):342–343. doi: 10.1038/74155. [DOI] [PubMed] [Google Scholar]
  • 161.Major T. J., Takei R., Matsuo H., et al. A Genome-Wide Association Analysis Reveals New Pathogenic Pathways in Gout. Nature Genetics . 2024;56(11):2392–2406. doi: 10.1038/s41588-024-01921-5. [DOI] [PubMed] [Google Scholar]
  • 162.Deng Y., Bartosovic M., Ma S., et al. Spatial Profiling of Chromatin Accessibility in Mouse and Human Tissues. Nature . 2022;609(7926):375–383. doi: 10.1038/s41586-022-05094-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Zhang D., Deng Y., Kukanja P., et al. Spatial Epigenome-Transcriptome Co-Profiling of Mammalian Tissues. Nature . 2023;616(7955):113–122. doi: 10.1038/s41586-023-05795-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Dai Y., Itai T., Pei G., et al. DeepFace: Deep-Learning-Based Framework to Contextualize Orofacial-Cleft-Related Variants During Human Embryonic Craniofacial Development. Human Genetics and Genomics Advances . 2024;5(3):p. 100322. doi: 10.1016/j.xhgg.2024.100322. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165.Lee D., Gorkin D. U., Baker M., et al. A Method to Predict the Impact of Regulatory Variants From DNA Sequence. Nature Genetics . 2015;47(8):955–961. doi: 10.1038/ng.3331. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Gosai S. J., Castro R. I., Fuentes N., et al. Machine-Guided Design of Cell-Type-Targeting Cis-Regulatory Elements. Nature . 2024;634(8036):1211–1220. doi: 10.1038/s41586-024-08070-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167.Zheng A., Shen Z., Glass C. K., Gymrek M. Deep Learning Predicts the Impact of Regulatory Variants on Cell-Type-Specific Enhancers in the Brain. Bioinformatics Advances . 2023;3(1) doi: 10.1093/bioadv/vbad002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Wang X., Li F., Zhang Y., et al. Deep Learning Approaches for Non-Coding Genetic Variant Effect Prediction: Current Progress and Future Prospects. Briefings in Bioinformatics . 2024;25(5) doi: 10.1093/bib/bbae446.39276327 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Callaway E. Deepmind’s New AlphaGenome AI Tackles The ‘Dark Matter’ in Our DNA. Nature . 2025;643(8070):17–18. doi: 10.1038/d41586-025-01998-w. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supporting Information 1

Table S1. Significant genetic variants associated with NSOFC identified through GWAS. Summary of 2068 lead SNPs significantly associated with NSOFC (p < 5e-08), compiled from 32 GWAS and whole-genome meta-analyses. The table includes information on SNP ID, genomic position, reference and alternative alleles, nearby gene(s), statistical significance (p value), ethnicity of the study population, and corresponding reference.

Supporting Information 2

Table S2. Suggestive genetic variants associated with NSOFC identified through GWAS. Summary of potentially causative variants associated with NSOFC that show suggestive associations (5e-08 < p < 1e-06), derived from 27 GWAS and meta-analyses. The table includes information on SNP ID, genomic position, reference and alternative alleles, nearby gene(s), statistical significance (p value), ethnicity of the study population, and corresponding reference.

Supporting Information 3

Table S3. Genetic variants associated with NSOFC identified by WGS. Summary of genetic variants associated with NSOFC identified through WGS. This table compiles data from five published WGS studies, including information on variant ID, genomic position, reference and alternative alleles, nearby gene(s), variant class, study population ethnicity, and corresponding reference.

Supporting Information 4

Table S4. Genetic variants associated with NSOFC identified by targeted sequencing. Summary of genetic variants identified in NSOFC-related genes through nine targeted sequencing studies, including information on variant ID, genomic position, reference and alternative alleles, nearby gene(s), variant class, statistical significance (p value), study population ethnicity, and corresponding reference.

6824122.f4.xlsx (29.8KB, xlsx)

Supporting Information 5

Table S5. Prioritization strategies for genetic variants associated with NSOFC. Overview of prioritization strategies applied in NSOFC-associated variant studies. This table summarizes how various prioritization strategies, including statistical fine-mapping, epigenetic annotation, and machine learning–based prediction models, have been utilized to refine candidate variants from GWAS and sequencing datasets. For each study, key information is provided, including the type of prioritization strategy employed, target variants or loci, and corresponding reference.

6824122.f5.xlsx (28.6KB, xlsx)

Supporting Information 6

Table S6. Functional validation of genetic variants and susceptibility genes associated with NSOFC. Overview of experimental approaches used for functional validation of NSOFC-associated variants. This table summarizes how various functional validation methods, including protein binding assays, in vitro and in vivo reporter assays, gene regulation and perturbation assays, in vivo or animal models, spatial gene expression analyses, and chromatin interaction profiling, have been utilized to functionally characterize candidate variants identified from GWAS and sequencing datasets. For each study, key information is provided, including the specific validation techniques employed, target variants or loci, and corresponding reference.

6824122.f6.xlsx (28.6KB, xlsx)

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.


Articles from Human Mutation are provided here courtesy of Wiley
