Summary
Non-syndromic cleft lip with or without cleft palate (nsCL/P) is a common congenital facial malformation with a multifactorial etiology. Genome-wide association studies (GWASs) have identified multiple genetic risk loci. However, functional interpretation of these loci is hampered by the underrepresentation in public resources of systematic functional maps representative of human embryonic facial development. To generate novel insights into the etiology of nsCL/P, we leveraged published GWAS data on nsCL/P as well as available chromatin modification and expression data on mid-facial development. Our analyses identified five novel risk loci, prioritized candidate target genes within associated regions, and highlighted distinct pathways. Furthermore, the results suggest the presence of distinct regulatory effects of nsCL/P risk variants throughout mid-facial development and shed light on its regulatory architecture. Our integrated data provide a platform to advance hypothesis-driven molecular investigations of nsCL/P and other human facial defects.
Keywords: cleft, GWAS, regulatory effects, chromatin mark, craniofacial development
To advance our understanding of facial disorders, we combined GWAS data with epigenetic and transcriptional data. We identified novel risk loci, prioritized candidate target genes, and highlighted distinct pathways. Our results suggest distinct effects of risk variants throughout mid-facial development and provide insights into the regulatory architecture of risk loci.
Introduction
Current research into the etiology of common disorders is focused on the identification of genetic susceptibility factors and the manner in which these risk variants interfere with biological function. Over the past decade, genome-wide association studies (GWASs) of common disorders have identified numerous risk loci. However, success in the translation of statistical associations from GWASs into functional mechanisms is only a very recent achievement.1, 2, 3, 4, 5, 6 A major driver of these advances has been the availability of large-scale genetic data and the systematic integration of genetic, transcriptional, epigenetic, and other -omics datasets from disease-relevant cell types and tissues.7
Facial disorders rank among the most common birth defects worldwide and represent a substantial burden for affected individuals, their families, and healthcare systems.8,9 The most frequent facial disorder is non-syndromic cleft lip with or without cleft palate (nsCL/P). This condition has a global incidence of ∼1 in 1,000 live births9 and is characterized by a multifactorial etiology that includes an overall genetic contribution of around 90%.9, 10, 11 On an epidemiological level, nsCL/P is associated with an increased risk for adverse health outcomes.12 However, this observation remains largely unexplained at both the clinical and molecular levels. To date, GWASs and other systematic approaches have identified at least 40 nsCL/P risk loci,13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 which explain up to 30% of the estimated heritability in European populations.21 Despite these successes, functional dissection of the associated regions has been limited to only a few loci.29, 30, 31, 32 This is mainly attributable to the systematic underrepresentation of embryonic facial data in public resources such as ENCODE,33 Roadmap Epigenome,34 and GTEx.35 To overcome this limitation, researchers have recently profiled multiple chromatin modifications in cell types and tissues of relevance to individual time points of mid-facial development, a process that is largely completed by week 10 of gestation (Figure 1A). 
These cell types and tissues include early human neural crest cells (hNCCs),37 lineage-specified human cranial NCCs (cNCCs),38 and embryonic mid-facial tissue samples encompassing the time period 4.5–10 weeks post-conception (craniofacial tissue [CT]; days 32–56 of gestation).39 Previous studies have demonstrated a significant enrichment of nsCL/P-GWAS variants in active chromatin regions from both hNCCs and CT.21,39 To date, however, the fact that these datasets have been generated from differing sources has precluded the integrative analyses required for a comprehensive assessment of variant function at different time points of mid-facial development.
Figure 1.
Human facial development and results of meta-analysis in clefting (MAiC)
(A) Schematic representation. The first phase of facial development (blue shading) is characterized by a substantial contribution of neural crest cells (NCCs): In early embryogenesis, NCCs arise in the ectoderm, undergo epithelial-to-mesenchymal transition, and begin to migrate from the dorsal neural tube. An NCC fraction (i.e., cranial NCCs) contributes to the pre-swellings of the face and populates the future frontonasal prominence as well as the first (purple) and second (green) pharyngeal arches.36 Subsequently, NCC-derived cells fuse to form those human facial structures that are finalized by the 10th week of embryogenesis.
(B) MAiC quantile-quantile plot. Observed statistical associations for non-syndromic cleft lip with/without cleft palate (nsCL/P) are plotted against the association statistics expected under the null hypothesis of no association. The contribution of different ethnicities in MAiC is shown using a pie chart.
(C) MAiC Manhattan plot. MAiC −log10(p) association results are plotted along their chromosomal distribution. Blue and red lines indicate suggestive (p < 10−5) and genome-wide (p < 5 × 10−8) significance, respectively. The lowest p value was observed for rs55658222 (p = 8.69 × 10−63), located at 8q24.27 Novel risk loci are highlighted in green (lead variant plus variants in linkage disequilibrium [LD] [r2 ≥ 0.6]). Gene names in subscript discriminate novel risk loci in situations where the respective chromosomal band is already listed among the 40 risk loci.
To generate novel insights into the etiology of nsCL/P, the present study leveraged both existing GWAS data on nsCL/P and epigenetic data on mid-facial development. The specific aims of the study were threefold (Figure S1). First, we generated one of the largest genome-wide genetic datasets for nsCL/P to date by combining three GWASs, which collectively encompassed European, Asian, and Latin American ethnicities. Using this resource, which we term MAiC (meta-analysis in clefting), we confirmed the vast majority of established risk regions and detected five novel loci (the strategy for identification of novel risk loci is described in the Supplemental Material and methods). To shed light on potential etiological overlaps between nsCL/P and other phenotypes, we then cross-referenced the lead variants at nsCL/P risk loci with GWAS data on >3,000 common traits and identified a set of loci with pleiotropic effects. Second, we compiled a comprehensive epigenetic map of mid-facial development through joint analyses of available data from hNCCs, cNCCs, and CT. This resource of chromatin segments across mid-facial development serves as a platform for the interpretation of genetic findings for facial disorders and traits. Finally, we aimed to generate systematic insights into nsCL/P biology by combining MAiC and epigenetic data and then adding further layers of data on gene expression in NCCs and on global and local three-dimensional (3D) genomic interactions (i.e., topologically associated domains [TADs],40 promoter-capture HiC [pCHi-C]41). This approach revealed tissue- and time-point-specific regulatory effects at GWAS risk loci, prioritized candidate target genes, and highlighted distinct pathways. To our knowledge, the present report is the first to describe the systematic integration of large-scale summary statistics in nsCL/P and data on the cis-regulatory landscape across several stages of human mid-facial development.
Material and methods
GWAS meta-analysis MAiC
Cohort description
The meta-analysis included data from three previously published individual GWASs on nsCL/P (Bonn case-control GWAS cohort,18 GENEVA trio cohort,20 POFC GWAS cohort;17 Table S1). We included all nsCL/P summary statistics that were publicly accessible until June 2018. Data from the Bonn cohort were available in-house, while both the GENEVA (dbGaP: phs000094) and POFC (dbGaP: phs000774) datasets were downloaded from dbGaP following approval of data access. Previously conducted meta-analyses included combinations of two of these studies (Bonn and GENEVA GWAS cohort in Ludwig et al., 201219 [genotyped variants] and 201721 [imputed variants], GENEVA and POFC in Leslie et al.26). In the present study, we combined the three GWAS cohorts to generate the largest nsCL/P meta-analysis to date. In accordance with previous studies,19,21,26 two meta-analyses were performed: (1) using all individuals with diverse population backgrounds (to increase statistical power by maximizing sample size; hereafter termed MAiC), and (2) using the European datasets only (MAiCEuro, to reduce genetic heterogeneity based on population differences). Data quality control (QC) included the detection and removal of overlapping individuals, confirmation of ethnicity, and data re-analysis. We call this new dataset MAiC to provide a clear distinction from the previous individual studies and meta-analyses of sub-cohorts. Further details on the cohort description and data QC can be found in the Supplemental information.
Statistical analyses
Statistical analyses were performed separately for case-control cohorts and case-parent trios. Imputed data were taken as provided by dbGaP (POFC) or generated as previously described (for Bonn and GENEVA),21 and best-guess genotypes were assigned based on a posteriori genotype probabilities of ≥0.6. In the case-control cohorts, association was tested by logistic regression in SNPTEST (option -method expected), incorporating five (Bonn and GENEVA cohorts) and 18 (POFC cohort) dimensions of the multi-dimensional-scaling coordinates,42 respectively. For the case-parent trios, a transmission disequilibrium test (TDT) was performed on the best-guess genotypes.43 After data cleaning procedures (Supplemental information), we meta-analyzed the GWAS data of all four sub-cohorts (Bonn case-control, GENEVA case-parent trios, POFC case-control, and POFC case-parent trios) using METAL.44
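As an illustration of how results from several sub-cohorts can be combined, the following minimal Python sketch implements a sample-size-weighted Z-score meta-analysis, which is one of the schemes offered by METAL. This section does not state which scheme or weights were used for MAiC, so the weighting, the function name `combine_z`, and all example numbers are assumptions made for illustration only.

```python
import math

def combine_z(z_scores, sample_sizes):
    """Sample-size-weighted (Stouffer) combination of per-cohort Z-scores.

    All Z-scores must be signed with respect to the same effect allele.
    Returns the combined Z-score and its two-sided p value.
    """
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need exactly one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_meta = numerator / denominator
    # Two-sided p value under the standard normal distribution
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2))
    return z_meta, p_meta

# Hypothetical Z-scores and effective sample sizes for four sub-cohorts
z_meta, p_meta = combine_z([2.1, 3.0, 1.5, 2.4], [1000, 1500, 800, 1200])
```

Weighting by the square root of the sample size lets test statistics from different designs (for example, logistic regression and TDT) be combined without requiring effect sizes on a common scale, which is why this scheme was chosen for the sketch.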
The final MAiC dataset (case-control plus case-parent trios) contained 6,825 individuals (including 3,946 affected; MAiCEuro: 3,568 individuals including 1,517 affected; Table S1). The maximum genomic inflation factor was 1.051 (GENEVA) and 1.056 (POFC case-control) for MAiC and MAiCEuro, respectively. All functional downstream analyses are based on MAiC because of its substantially greater statistical power. To estimate the single-nucleotide polymorphism (SNP)-based heritability (h2) for nsCL/P on the liability scale, we generated a European case-control-only dataset (Bonn, POFC, totaling 532 cases and 2,051 controls; Table S1) and performed linkage disequilibrium (LD) score regression as implemented in ldsc.45 Sample and population prevalence were set to 0.21 and 0.001, respectively.
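The role of the two prevalence values can be made concrete with the commonly used transformation from observed-scale to liability-scale heritability in an ascertained case-control sample. The sketch below is an illustration under that assumption; it is not taken from the LD score regression software, and the function name `observed_to_liability` is hypothetical.

```python
from statistics import NormalDist

def observed_to_liability(h2_observed, sample_prev, population_prev):
    """Convert observed-scale heritability to the liability scale.

    sample_prev: proportion of cases in the analyzed sample (P)
    population_prev: prevalence of the trait in the population (K)
    """
    std_normal = NormalDist()
    # Liability threshold above which individuals are affected
    threshold = std_normal.inv_cdf(1.0 - population_prev)
    # Height of the standard normal density at that threshold
    z = std_normal.pdf(threshold)
    k, p = population_prev, sample_prev
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))

# Conversion factor for the prevalence settings given in the text
factor = observed_to_liability(1.0, sample_prev=0.21, population_prev=0.001)
```

Because the transformation is linear in the observed-scale estimate, `factor` can be multiplied with any observed-scale heritability obtained under the same prevalence settings.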
Gene-based and pathway analyses
Gene-based analyses in MAiC and MAiCEuro were performed using MAGMA46 (v.1.06), implemented in FUMA. The input SNPs of MAiC were mapped to 17,911 protein-coding genes based on a distance of 0 kb upstream/downstream of the genes, resulting in a threshold of test-wide significance of p = 2.79 × 10−6 (i.e., 0.05/17,911). To annotate known and novel nsCL/P risk loci in their biological context, we investigated common expression patterns of the GWASTAD genes and their molecular functions (gene ontology [GO] terms, Kyoto Encyclopedia of Genes and Genomes [KEGG] pathways) using FUMA’s “GENE2FUNC” tool in (1) all GWASTAD genes, and (2) a subset of GWASTAD genes expressed in NCCs. This approach allowed us to pinpoint risk loci or genes that are functionally involved in the same pathways or molecular processes and might be useful for gene prioritization.
Analysis of pleiotropic effects using the GWAS ATLAS
For each of the 45 lead SNPs in MAiC, association signals from large-scale genetic studies (including p value, effect size, and effect direction) were retrieved from the GWAS ATLAS.47 At the time of analysis (November 2019), the database comprised 4,756 GWASs on 3,302 unique traits. Notably, the unique traits are split into 28 domains, of which we combined two (environment, activities) into one domain to reduce redundancy. SNP-trait associations were considered significant at p < 0.05 after correction for the number of GWASs and loci in the analysis.
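The multiple-testing thresholds used in the gene-based and pleiotropy analyses can be reproduced with a few lines of Python. The sketch assumes a Bonferroni correction in both cases and, for the GWAS ATLAS lookup, that the number of tests is the product of the number of GWASs and the number of lead SNPs; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Return the per-test significance threshold for n_tests tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Gene-based analysis: 17,911 protein-coding genes
gene_threshold = bonferroni_threshold(0.05, 17_911)

# GWAS ATLAS lookup: 4,756 GWASs times 45 lead SNPs (assumed test count)
atlas_threshold = bonferroni_threshold(0.05, 4_756 * 45)
```

Under these assumptions, `gene_threshold` is approximately 2.79 × 10−6 and `atlas_threshold` is approximately 2.3 × 10−7.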
Epigenetic datasets for mid-facial development
Identification of datasets relevant to mid-facial development
Human cell-type- and developmental-stage-specific data for mid-facial development are underrepresented (or not represented at all) in large consortia data such as ENCODE.33 However, available data in the Gene Expression Omnibus (GEO) covered mid-facial development from (1) early stages (hNCCs,37 accessed through GEO: GSE28874), (2) differentiated human cNCCs38 (accessed through GEO: GSE70751), and (3) embryonic craniofacial human tissue of different Carnegie stages (CS) (accessed through GEO: GSE97752).39 In each of these datasets, analyses of chromatin modifications were performed using chromatin immunoprecipitation followed by sequencing (chromatin immunoprecipitation sequencing [ChIP-seq]) or are available as imputed datasets. Detailed information, including the antibodies used in these studies, is shown in Table S3 and in the Supplemental information. For hNCCs and cNCCs, ChIP-seq had been performed for chromatin modifications H3K27ac, H3K4me1, H3K4me3, and H3K27me3. In CT, for samples of CS13–CS17, ChIP-seq was performed for H3K27ac, H3K4me1, H3K4me3, H3K27me3, and H3K36me3 (Table S4), and data for H3K9me3 were imputed. For CS20 and 10 wpc, H3K27ac ChIP-seq data were experimentally derived; all other marks were imputed (Table S3).
Data processing
For hNCCs and cNCCs, raw data were available in fastq format. A description of data QC is given in Rada-Iglesias et al.37 and Prescott et al.,38 respectively. ChIP-seq data from craniofacial tissue in Wilderman et al.39 are provided in processed formats, including imputed signals, peaks, and segmentation data. To ensure comparability among the three data sources, the computational processing of ChIP-seq data published in Wilderman et al.39 (QC, alignment, peak calling, epigenetic imputation, chromatin segmentation) was adopted for the hNCC/cNCC bioinformatics pipeline, as described in the Supplemental information and Table S5.
Chromatin imputation and segmentation
To obtain uniform datasets, chromatin imputation followed by chromatin state segmentation was performed. First, H3K9me3 and H3K36me3 marks in hNCCs/cNCCs were imputed using ChromImpute (v.1.0.1),48 based on 127 cell types from the Roadmap Epigenome Project.34
Imputed hNCC/cNCC signal files for each individual chromosome and each chromatin mark were binarized, and segmentation was performed using the core+K27ac 18-state chromatin model provided by Roadmap with ChromHMM49 to predict 18 chromatin states. Because of the low number of chromatin marks measured in the NCC samples, epigenetic imputation issues, and the higher risk of batch effects between hNCCs, cNCCs, and CT, we adopted a robust strategy and condensed the 18 generated states into eight states, based on the Roadmap definitions: three active states (transcription starting sites [TSS], transcribed sites, and enhancers [Enh]), one bivalent state (Poised Enh/bivalent TSS), three repressed states (Heterochromatin, Repressed PolyComb sites, Zinc finger genes/Repeats), and one quiescent state (Quies). Potential batch effects were analyzed using principal-component analysis (PCA) and hierarchical clustering of Pearson correlation coefficients.
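The condensation step can be sketched as a lookup from the 18 state mnemonics of the Roadmap core+K27ac model to eight categories, followed by merging of adjacent segments that receive the same category. The grouping in `COLLAPSE` is an illustrative assumption derived from the eight categories named above and is not the assignment actually used in the study; the shortened category labels and the function name `collapse_segments` are likewise hypothetical.

```python
# Assumed grouping of the 18 Roadmap states into eight categories
COLLAPSE = {
    "1_TssA": "TSS", "2_TssFlnk": "TSS", "3_TssFlnkU": "TSS", "4_TssFlnkD": "TSS",
    "5_Tx": "Tx", "6_TxWk": "Tx",
    "7_EnhG1": "Enh", "8_EnhG2": "Enh", "9_EnhA1": "Enh",
    "10_EnhA2": "Enh", "11_EnhWk": "Enh",
    "12_ZNF/Rpts": "ZNF_Rpts",
    "13_Het": "Het",
    "14_TssBiv": "Pois_TSS_Enh", "15_EnhBiv": "Pois_TSS_Enh",
    "16_ReprPC": "ReprPC", "17_ReprPCWk": "ReprPC",
    "18_Quies": "Quies",
}

def collapse_segments(segments):
    """Relabel segments and merge neighbors that share a collapsed state.

    segments: list of (chrom, start, end, state18) tuples, sorted by
    chromosome and start position, with half-open coordinates.
    """
    merged = []
    for chrom, start, end, state in segments:
        label = COLLAPSE[state]
        previous = merged[-1] if merged else None
        if (previous and previous[0] == chrom
                and previous[2] == start and previous[3] == label):
            # Extend the previous segment instead of adding a new one
            merged[-1] = (chrom, previous[1], end, label)
        else:
            merged.append((chrom, start, end, label))
    return merged

# Hypothetical input segments
example = [
    ("chr1", 0, 200, "1_TssA"),
    ("chr1", 200, 400, "2_TssFlnk"),
    ("chr1", 400, 1000, "18_Quies"),
]
collapsed = collapse_segments(example)
```

Merging neighbors after relabeling avoids artificially fragmented segments, which matters for any downstream step that counts or measures segments per state.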
Other datasets
To identify genome-wide regulatory genomic units, we used TADs from human embryonic stem cells (hESCs) (H1 cell line) as provided by the Ren Lab.40 Protein-coding genes were extracted from the UCSC Genome Browser (hg19) and were mapped to TADs using positional information. TADs containing an nsCL/P risk locus were defined as GWASTAD regions. Based on previous evidence for complex regulatory interactions within one TAD, we considered all genes from the GWASTAD region as potential candidate genes for downstream effects of the associated variants in the r2 ≥ 0.6 region. Expression data from NCCs (two replicates of day11hNCC [GEO: GSE121428] and three replicates of passage2hNCC [GEO: GSE108521]) were retrieved from Laugsch et al. (GEO: GSE108522).50 For the comparison of genes in TADs of nsCL/P risk loci and genes expressed in NCCs, we used the average RNA-seq expression in fragments per kilobase of transcript per million mapped reads (FPKM) across the five samples. To identify functional links between different regulatory features (e.g., DNA-DNA interactions of enhancers and TSS) at specific risk loci, we accessed pCHi-C cis-interaction data collected in hESCs (GEO: GSE86821).41
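The positional mapping of lead variants and genes to TADs can be sketched as a simple interval lookup. The following Python example is a minimal illustration: the TAD boundaries and gene records are hypothetical, and treating any overlap between a gene and a TAD as membership is an assumption, since the exact overlap rule is not specified here. Only the position of rs3091552 is taken from Table 1.

```python
def find_tad(variant, tads):
    """Return the TAD (chrom, start, end) containing the variant, or None.

    variant: (chrom, position); tads use half-open coordinates.
    """
    chrom, pos = variant
    for tad in tads:
        if tad[0] == chrom and tad[1] <= pos < tad[2]:
            return tad
    return None

def genes_in_tad(tad, genes):
    """Return names of genes that overlap the TAD by at least one base.

    genes: iterable of (name, chrom, start, end) tuples.
    """
    chrom, t_start, t_end = tad
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start < t_end and g_end > t_start
    ]

# Hypothetical TADs and genes; lead variant position from Table 1 (hg19)
tads = [("chr20", 45_000_000, 46_000_000), ("chr20", 46_000_000, 47_000_000)]
genes = [
    ("GENE_A", "chr20", 45_500_000, 45_800_000),
    ("GENE_B", "chr20", 46_500_000, 46_600_000),
    ("GENE_C", "chr1", 45_500_000, 45_800_000),
]
gwas_tad = find_tad(("chr20", 45_440_006), tads)
gwas_tad_genes = genes_in_tad(gwas_tad, genes)
```

For genome-wide use, a linear scan over all TADs per variant is sufficient at the scale of 45 loci; an interval tree would only be needed for much larger variant sets.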
Translation of genetic associations into tissue- and time-point-specific regulatory effects at a systematic level
Enrichment analyses using GREGOR
Based on chromatin segments obtained from hNCCs, cNCCs, and CT, we used GREGOR (Genomic Regulatory Elements and GWAS Overlap Algorithm)51 to evaluate the enrichment of significant SNPs from the MAiC data in the available regulatory features (i.e., eight predicted chromatin states). As described in the Supplemental information, a set of samples from the Roadmap Epigenomics project (comprising both fetal and adult tissue samples) was selected as an independent dataset for comparison. As input, we used MAiC nsCL/P variants with p ≤ 0.001 without additional variants in LD (n = 22,999); this threshold was selected to balance between adequate statistical power and true-positive association signals.
CT- and NCC-specific active chromatin sites
To examine specific effects in either NCCs or CT, we filtered the chromatin segmentation datasets for active chromatin sites (TSS, Enhancer, or transcribed sites) in NCCs that are repressed/quiescent (Quiescent, Biv_TSS_pois_enh, ReprPC, Heterochromatin) in CT and vice versa. For robust observations, we only accepted a chromatin state if it was present in both NCC samples (hNCCs, cNCCs) or in five of the six CT samples (CS13, CS14, CS15, CS17, CS20, 10wpc). To account for biases in length associated with batch effects, active sites were only retained if they had a distance of ≥500 bp to any chromatin segment of opposite activity status in the other cell system/tissue. We then combined the specific active chromatin sites with MAiC associations and TAD data to filter for TADs with a high density of strongly associated genetic variants (pMAiC ≤ 5 × 10−5) in specific active chromatin sites at new and known nsCL/P GWAS risk loci.
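The robustness rule for calling NCC- or CT-specific active sites can be expressed as a small decision function. The sketch below assumes that each genomic region already carries one collapsed state label per sample and that "five of the six" is read as "at least five"; the state labels, set names, and function names are hypothetical shorthand, and the ≥500 bp distance filter is deliberately left out.

```python
ACTIVE = {"TSS", "Tx", "Enh"}
INACTIVE = {"Quies", "Pois_TSS_Enh", "ReprPC", "Het"}

def is_robust(states, allowed, min_samples):
    """True if at least min_samples of the states fall in the allowed set."""
    return sum(state in allowed for state in states) >= min_samples

def ncc_specific_active(ncc_states, ct_states):
    """Active in both NCC samples and inactive in at least five CT samples."""
    return (is_robust(ncc_states, ACTIVE, 2)
            and is_robust(ct_states, INACTIVE, 5))

def ct_specific_active(ncc_states, ct_states):
    """Active in at least five CT samples and inactive in both NCC samples."""
    return (is_robust(ct_states, ACTIVE, 5)
            and is_robust(ncc_states, INACTIVE, 2))

# Hypothetical region: enhancer/TSS in hNCCs and cNCCs, mostly quiescent in CT
ncc = ["Enh", "TSS"]
ct = ["Quies", "Quies", "ReprPC", "Quies", "Het", "Enh"]
region_is_ncc_specific = ncc_specific_active(ncc, ct)
```

Requiring agreement across nearly all samples of a group trades sensitivity for specificity, which suits a filter meant to withstand batch effects between data sources.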
Characterization of nsCL/P risk variants and candidate gene prioritization in context of epigenetic mid-facial timeline
To gain comprehensive insights into regulatory mechanisms at nsCL/P risk loci, we finally integrated all available genetic and functional data (MAiC associations, GWASTAD- and r2 ≥ 0.6-region boundaries, NCC- and CT-specific active chromatin sites, chromatin segmentation tracks, and pCHi-C cis interactions). Based on this approach, we attempted to prioritize genetic variants with regulatory effects and potential downstream target genes and to detect relevant regulatory elements specific to early (hNCC/cNCC) or later (CT) mid-facial development.
Results
MAiC identifies five novel risk loci
The MAiC dataset was generated by combining GWAS data from three previous studies (Bonn,18 GENEVA,24 POFC17), following the exclusion of overlapping individuals and extensive QC. The final dataset comprised 1,247 nsCL/P cases, 2,879 controls, and 2,699 case-parent trios of multiple ethnicities, and ∼7.74 million SNPs. The p value distribution was consistent with a multifactorial inheritance (Figure 1B; lambda = 1.07). A set of 1,375 SNPs achieved genome-wide significance (p < 5 × 10−8; Figure 1C). Analysis of established nsCL/P risk loci in MAiC revealed genome-wide significant SNPs at 25 of the 40 regions. These 25 regions comprised 22/26 loci that were previously identified in GWASs based on largely European samples and 3/14 loci reported in individuals from the Chinese population.13,22 At all other nsCL/P risk loci (n = 15), nominal significance (p < 0.05) was observed for individual variants that were in strong LD (D′ > 0.8) with the respective lead SNP (Table S2).
Importantly, the MAiC analyses also identified five novel risk loci (p < 5 × 10−8), thus increasing the number of identified nsCL/P GWAS risk loci to 45. These novel loci were located at chromosomes 1p36.13 (sentinel variant rs34746930), 5p12FGF10 (rs60107710), 5q13.1PIK3R1 (rs6449957), 7p21.1 (rs62453366), and 20q13.12 (rs3091552; Table 1). Consistent with previous findings on risk variants for nsCL/P and other complex traits,29 these lead variants map to non-coding regions that are adjacent to candidate genes with functions during facial development, such as CAPZB52 and NBL153 (both at 1p36) and EYA254 (at 20q13; Supplemental text; Figures S2–S6). To identify population-specific effects, a sub-analysis was performed in individuals from Central Europe (MAiCEuro; n = 562 cases, 2,051 controls, and 955 case-parent trios). No additional risk loci were identified at the level of genome-wide significance (Figure S7; Table S2). Using this European case-control cohort and LD score regression,45 SNP-based heritability was estimated as h2 = 28% ± 0.1%. This confirmed previous heritability estimates obtained using the Bonn cohort only.21
Table 1.
Novel risk loci for nsCL/P identified in MAiC
Locus | Lead variant | Positiona | Allele 1/allele 2b | p value | RRc | 95% CI |
---|---|---|---|---|---|---|
1p36.13 | rs34746930 | 19,781,724 | C/G | 4.19 × 10−8 | 1.30 | 1.18–1.43 |
5p12FGF10 | rs60107710 | 44,577,755 | A/G | 3.50 × 10−8 | 1.39 | 1.24–1.57 |
5q13PIK3R1 | rs6449957 | 67,483,732 | T/C | 6.59 × 10−9 | 1.21 | 1.13–1.29 |
7p21.1 | rs62453366 | 20,747,107 | G/T | 7.83 × 10−9 | 0.77 | 0.70–0.84 |
20q13.12 | rs3091552 | 45,440,006 | C/G | 1.31 × 10−9 | 1.34 | 1.22–1.47 |
nsCL/P, non-syndromic cleft lip with or without cleft palate; MAiC, meta-analysis in clefting; RR, relative risk; CI, confidence interval. Gene names in subscript distinguish novel associated regions from independent risk loci at the same chromosomal band.
a Position according to hg19.
b Risk allele is underlined.
c RR provided for allele 1.
Gene-based analyses suggest nsCL/P candidate genes outside of GWAS risk loci
Using MAiC summary statistics and MAGMA,46 gene-based analyses yielded 1,357 genes with nominal significance (p < 0.05; Figure S8A). A total of 25 genes reached test-wide significance (p < 2.79 × 10−6; Table S6). Of these, 23 map to known GWAS risk loci. For some of these 23 genes, functional evidence strongly supports their involvement in nsCL/P (e.g., IRF6,55 TP6356). This analysis also suggested novel candidate genes at GWAS risk loci, such as ARID3B. In mice, the gene Arid3b is expressed in cranial mesenchyme structures and has been shown to interact with Mycn, which is encoded by a strong candidate gene at another nsCL/P risk locus.57,58 Two genes with a significant burden of common variants mapped outside all known GWAS risk loci. These genes, BTN3A3 (pgene = 6.96 × 10−7) and BTN3A1 (pgene = 2.44 × 10−6; Figure S9A), are both located at chromosome 6p22.2, and previous research found that BTN3A3 showed differential expression in the lip tissue of CL/P phenotypic subgroups.59 In MAiCEuro, the gene-based analysis revealed 11 genes with test-wide significance (Figure S8B; Table S7), including three novel candidate genes (LIMCH1, MSX2, and STRA13; Figures S9B–S9D). Overall, 41 genes yielded p < 10−5 in one of the two analyses.
We also analyzed a set of 13 previously identified nsCL/P candidate genes with: (1) a significant enrichment of low-frequency variants (four genes),60 (2) an autosomal-dominant inheritance pattern in multigenerational families (four genes),61 or (3) an enrichment of rare coding variants (five genes).62 Of these, 12 genes were present in the analysis set. Two of these 12 genes approached test-wide significance: PRTG (p = 8.44 × 10−5) and CTNND1 (p = 2.17 × 10−5; Table S8). These observations indicate that in at least a subset of genes, both common and rare variations contribute to nsCL/P.
Genes located in TAD regions of nsCL/P GWAS loci are enriched in developmental pathways
Accumulating evidence suggests that most regulatory interactions occur within TAD modules.63,64 Therefore, genes located within TADs represent candidates for the downstream effects of the associated common risk variants. To identify molecular processes of relevance to nsCL/P, GWASTAD regions were defined for each of the 45 risk loci, based on the extent of the respective TAD in hESC data.40 In total, 407 genes were identified within the respective TADs (GWASTAD genes, range 1 to 29 genes per locus; Table S9). Enrichment analysis using MAGMA yielded test-wide significant (padj ≤ 0.05) results for 287 GO terms (Table S10). The most significant enrichments were observed for “tissue development” (padj = 8.34 × 10−9), “epithelium development” (padj = 8.82 × 10−9), and “appendage development” (padj = 7.92 × 10−8; Figure S10). Together with additional significant terms, such as “embryo development,” “tube development,” and “ear development,” these observations suggest the existence of common pathways for nsCL/P and other processes of organogenesis during embryonic development.
We then prioritized genes expressed in NCCs by adding available RNA sequencing (RNA-seq) data from hNCCs.65 In total, 240 of the 407 GWASTAD genes were expressed in NCCs, with strong expression being observed for a subset of 12 genes (≥200 FPKM; Table S9). Of these, at least two have been previously implicated in NCC migration processes (CAPZB,52 TPM166). These 240 NCC-expressed genes showed a substantial overlap in significant GO terms compared with the analysis of all 407 GWASTAD genes (233 out of 287 pathways; Figure 2A; Table S11). Of those 233 pathways, 157 showed stronger enrichment in the subset of NCC-expressed GWASTAD genes, the strongest of which represent cellular processes (Figure S10; Table S12). Among pathways that were exclusive to GWASTAD genes expressed in NCCs (n = 106), both regulatory processes and metabolic pathways were enriched. In contrast, pathways specific to GWASTAD genes that were not expressed in NCCs (n = 54) included “keratinocyte proliferation” and “epidermis development,” a finding that is consistent with the substantial contribution of the epithelial lineage to nsCL/P.56
Figure 2.
Systematic assessment of 45 risk loci for nsCL/P
(A) Enrichment analyses of biological processes. Enrichment of genes located at risk loci identified by genome-wide association studies (GWASTAD genes, n = 407, gray) and the subset of genes expressed in neural crest cells (n = 240, blue) were calculated using MAGMA. Left panel: While most of the associated pathways overlapped both datasets, a subset of terms was distinctly enriched in one of the groups. Right panel: Bars represent the top 10 of each specific enrichment (padj ≤ 0.05). Numbers reflect nsCL/P risk genes/total number of genes in the respective gene ontology (GO) term. For the most strongly associated pathways, gene names are provided in the respective box.
(B) Pleiotropic effects of lead variants. For the lead variant of each of the 45 nsCL/P risk loci, associations with common traits were retrieved from the GWAS ATLAS. Associations with at least two risk loci were observed for 17 traits from 12 domains (y axis). Bar colors represent direction of effects. aIncluding birth weight. bIncluding multiple mass-related measurements. cLung function as measured by Forced expiratory volume (FEV)1 or FEV1/Forced vital capacity (FVC) ratio.
We next addressed the potential etiological overlap between nsCL/P and other common phenotypes that might contribute to the adverse health outcomes observed in nsCL/P. We retrieved association signals for each of the 45 lead SNPs in MAiC from large-scale genetic studies, using the GWAS ATLAS.47 At the time of analysis (November 22, 2019), this resource comprised 4,756 GWASs on 3,302 unique traits. While all of the 45 variants were available in the atlas, only 19 showed at least one significant SNP-trait association when corrected for the number of GWASs and loci (p < 2.33 × 10−7; overall number: n = 219; Table S13). These associations reflect 35 collapsed traits across 12 domains, including height, bone mineral density, hair color, and body mass index (Table S14). Eighteen traits showed associations with at least two distinct nsCL/P risk loci. Interestingly, for some traits, the direction of effect differed between individual loci (e.g., height and bone mineral density), while for other traits, the direction of effect was consistent (e.g., hypothyroidism, glomerular filtration rate, and hair color; Figure 2B).
NsCL/P-associated variants are enriched in multiple chromatin states of mid-facial development
Recent analyses in human embryonic CT39 demonstrated both a significant enrichment of lead SNPs from earlier nsCL/P GWASs in active enhancers and the presence of mid-facial specific regulatory elements. To extend this work, we incorporated data from two NCC states in order to generate a unified mid-facial development resource of chromatin modifications (Figure S1). We retrieved data on ChIP-seq from hNCCs37 and cNCCs38 and applied the data analysis pipeline used by previous authors for computational analyses of ChIP-seq data from CT.39 We observed strong inter-sample correlations between chromatin mark and developmental stage (Figures S11 and S12). The integration of 127 non-facial samples from Roadmap34 revealed local clustering of NCCs and CT along a hierarchical axis comprising hESCs, induced pluripotent stem cells (iPSCs), and iPSC-derived cells (Figure S13). Here, the most tissue-specific pattern was observed for H3K27ac (Figure S14). Similar to a previous finding for CT,39 non-facial fetal tissue samples (such as brain, kidney, and lung) clustered distinctly from NCCs (Figure S14), thus emphasizing the limited utility of many public resources for the interpretation of genetic findings in facial disorders.
Next, we generated robust chromatin segments in NCCs using ChromHMM.67 Together with segmentation data from CT and Roadmap, chromatin segments were condensed to eight categories in order to increase the robustness of the subsequent analyses (Figure S15; Table S15). We then analyzed the positional overlap of all variants with pMAiC < 0.001 in the eight chromatin states across NCCs and CT (SNP0.001_nsCL/P, n = 22,999), and compared this to a matched set of non-associated SNPs (SNPcontrol_nsCL/P, p > 0.1). The results showed that 23% of the nsCL/P variants (SNP0.001_nsCL/P) mapped to active chromatin states, while 14% mapped to either bivalent or repressed chromatin states (Figure 3A). Both fractions were significantly higher than those of the control SNPs, of which 16% and 11% mapped to active and to bivalent/repressed chromatin states, respectively (p < 10−16, Fisher’s exact test).
Figure 3.
Association of MAiC across epigenetic annotations
For all enrichment analyses, two sets of single-nucleotide polymorphisms (SNPs) were designed: (1) set of MAiC risk variants, at pMAiC ≤ 0.001 (n = 22,999), and (2) a size-matched control set, comprising non-associated SNPs (pMAiC > 0.1) with similar allele-frequency distribution.
(A) Overall enrichment analysis. For each group, the fraction of SNPs represented in different chromatin annotations of mid-facial development was assessed, without discriminating between NCCs and craniofacial tissue (CT).
(B) Overview of enrichment in NCCs and CT. Enrichment of nsCL/P risk variants in eight chromatin states for each sample (hNCCs, cNCCs, and CT, plus a set of 11 Roadmap samples). p values were calculated using GREGOR.51 Abbreviations: TSS, transcription start site; Enh, enhancer; ReprPC, repressed Polycomb; Tx, transcribed sites; Het, heterochromatin; TxFlnk, transcribed sites at gene 5′ and 3′ ends; Pois_TSS_Enh, poised enhancers and bivalent TSS; ZNF_Rpts, zinc finger genes and repeats; FE, fold enrichment. Abbreviations of tissues as provided by Roadmap.34
To delineate associations of specific chromatin states along the time series, enrichment was tested using GREGOR.51 For each of the two SNP sets, every hNCC/cNCC/CT sample was tested, together with 11 randomly selected Roadmap samples (both fetal and adult). A significant enrichment for SNP0.001_nsCL/P was observed in most of the samples/chromatin states (Figure 3B; Table S16), as compared to SNPcontrol_nsCL/P (Figure S16; Table S17). While the fold enrichment (FE) was similar for NCCs and CT in six of the eight chromatin states (such as those related to active transcription; Figures 4A–4D; Figure S17), considerable differences in enrichment between NCC and CT samples were observed in the chromatin states “active enhancers” and “poised enhancers/bivalent TSS.” In both states, NCCs displayed a stronger enrichment than CT samples. For enhancers, the mean FE (FEMean) in NCCs was 1.64 (pMean = 4.36 × 10−86, average of pGREGOR), compared with FEMean = 1.43 in CT (pMean = 8.09 × 10−22). For “poised enhancers/bivalent TSS,” the corresponding values were FEMean = 1.65, pMean = 3.39 × 10−20 in NCCs, compared with FEMean = 1.39, pMean = 4.74 × 10−4 in CT. These results may have been driven in part by the heterogeneous composition of the CT samples. However, the specific enrichment pattern observed in two out of eight chromatin states suggests a distinct biological underpinning. Overall, these data confirmed previous findings of an overrepresentation of nsCL/P lead variants in enhancer marks21,39 and extended this enrichment toward additional common variants and annotations.
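The notion of fold enrichment used here can be sketched as the observed number of risk SNPs overlapping an annotation divided by the mean overlap across matched control sets. The sketch below is a deliberately simplified illustration on a single chromosome, not a reimplementation of GREGOR, which additionally handles LD proxies and the matching of control SNPs. All function names and toy coordinates are hypothetical.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort and merge half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def overlaps(position, index):
    """True if the position lies inside any merged interval."""
    starts, ends = index
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < ends[i]


def fold_enrichment(risk_positions, control_sets, intervals):
    """Observed overlap count divided by the mean overlap of the control sets."""
    index = build_index(intervals)
    observed = sum(overlaps(p, index) for p in risk_positions)
    expected = sum(
        sum(overlaps(p, index) for p in control) for control in control_sets
    ) / len(control_sets)
    return observed / expected if expected > 0 else float("inf")


# Toy example (hypothetical coordinates for one chromatin state in one sample).
enhancers = [(100, 200), (150, 300), (500, 600)]
risk = [120, 250, 550, 700]
controls = [[10, 120, 400, 800], [550, 900, 50, 60]]
print(fold_enrichment(risk, controls, enhancers))
```

Repeating this calculation per sample and per chromatin state would yield a matrix of FE values comparable in layout to Figure 3B, although the published values rest on GREGOR's own matching procedure.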
Figure 4.
Association of MAiC variants in distinct chromatin states
(A–D) Individual enrichment results for MAiC risk variants in four chromatin states. p values represent the difference in enrichment between NCCs and CT. Abbreviations: TSS, transcription start site; Enh, enhancer; ReprPC, repressed Polycomb; Tx, transcribed sites; Het, heterochromatin; TxFlnk, transcribed sites at gene 5′ and 3′ ends; Pois_TSS_Enh, poised enhancers and bivalent TSS; ZNF_Rpts, zinc finger genes and repeats; FE, fold enrichment. Abbreviations of tissues as provided by Roadmap.34
A subset of nsCL/P-associated SNPs show distinct regulatory effects
To extend the investigation of the contribution of regions with differing regulatory profiles in NCCs and CT, we created genome-wide maps of active chromatin sites for both NCCs and CT. A total of 9,897 regions (encompassing 26.67 Mb) with active chromatin states in NCCs (TSS, enhancer, or transcribed sites) were inactive in CT (quiescent, repressed, or bivalent; termed NCC-specific active sites). Similarly, 6,189 regions (29.37 Mb) were active in CT but inactive in NCCs (CT-specific active sites). The integration of MAiC association data revealed 62,084 genetic variants that mapped to NCC-specific active sites. Of these, 4,022 had pMAiC ≤ 0.05. Similarly, 72,556 variants (4,834 of which had pMAiC ≤ 0.05) mapped to CT-specific active sites. For both NCC-specific and CT-specific active sites, the p value distribution differed significantly from that expected, with a significant enrichment of association signals at the lower tail of the distribution (Figure 5A).
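The definition of tissue-specific active sites given above can be sketched as an interval intersection between two chromatin segmentations: a region is kept where the first segmentation is in an active state and the second is in an inactive state. The state labels below follow the abbreviations used in the figures, but the grouping into the two sets, the "Quies" label, and the function name are assumptions made for illustration.

```python
# Assumed grouping of states, based on the description in the text.
ACTIVE = {"TSS", "Enh", "Tx"}
INACTIVE = {"Quies", "ReprPC", "Pois_TSS_Enh"}


def specific_active_sites(seg_a, seg_b, active=ACTIVE, inactive=INACTIVE):
    """Regions that are active in seg_a and inactive in seg_b.

    seg_a, seg_b: sorted, non-overlapping (start, end, state) tuples with
    half-open coordinates on the same chromosome.
    Adjacent qualifying pieces are merged into one region.
    """
    regions = []
    i = j = 0
    while i < len(seg_a) and j < len(seg_b):
        a_start, a_end, a_state = seg_a[i]
        b_start, b_end, b_state = seg_b[j]
        start, end = max(a_start, b_start), min(a_end, b_end)
        if start < end and a_state in active and b_state in inactive:
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
        # Advance whichever segment ends first.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return regions


# Toy segmentations (hypothetical coordinates).
ncc = [(0, 100, "Quies"), (100, 300, "Enh"), (300, 400, "Tx"), (400, 500, "ReprPC")]
ct = [(0, 150, "Quies"), (150, 250, "Enh"), (250, 350, "ReprPC"), (350, 500, "Tx")]

print("NCC-specific:", specific_active_sites(ncc, ct))
print("CT-specific:", specific_active_sites(ct, ncc))
```

Swapping the two arguments gives the complementary set, so the same routine serves for both NCC-specific and CT-specific sites.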
Figure 5.
Interpretation of MAiC association results
(A) Quantile-quantile plot of specific active sites. pMAiC values (as −log10) of SNPs located in NCC-specific (n = 62,084; blue) or CT-specific (n = 72,556; pink) active sites are plotted against expected p values. In both datasets, a significant enrichment of associated risk variants was observed.
(B) Distribution of risk variants in specific active sites. Variants located within NCC- and CT-specific regions were retrieved at different pMAiC cutoffs and aggregated per topologically associated domain (TAD, numbers in lower panel). TADs were classified according to whether the variants map uniquely to NCC-active elements (blue), CT-specific elements (pink), or both (purple). The distribution largely followed the expected logarithmic distribution. However, for a substantial number of loci, different associated SNPs (at p < 5 × 10−5) mapped to both NCC- and CT-specific sites within one TAD.
(C and D) Regulatory architecture at selected loci. Based on the extent of the TAD around the respective lead variant and variants in LD ≥ 0.6 (shown in gray framed box), different layers of data were aggregated and are represented for risk loci 5q13PIK3R1 (C) and 13q32.3 (D). Tracks include (top-down): MAiC p values with color code based on LD to respective top variants; extent of NCC-specific (blue) and CT-specific (pink) sites; chromatin segmentation data from hNCCs, cNCCs, CT (color code as in Figure 3), and selected samples from Roadmap; RefSeq gene positions; and promoter capture (pC) Hi-C cis-interactions collected in hESCs.
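The quantile-quantile comparison in (A) can be sketched as follows: observed p values are sorted and paired with the quantiles expected under a uniform null distribution, both on the −log10 scale. The plotting position (i + 0.5)/n is one common convention and is an assumption here, as is the function name; the p values must be greater than zero.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p values for a QQ plot.

    The i-th smallest of n p values is paired with the uniform quantile
    (i + 0.5) / n, counting i from zero.
    """
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-log10((i + 0.5) / n), -log10(p))
        for i, p in enumerate(observed)
    ]


# Toy p values (hypothetical).
for expected, observed in qq_points([0.5, 0.05, 0.005, 0.9]):
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points lying above the diagonal at the upper right of such a plot correspond to an excess of small p values relative to the null expectation.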
Filtering for the subset of SNPs with pMAiC ≤ 5 × 10−5 identified 112 SNPs that mapped to either NCC-specific (51 variants) or CT-specific (61 variants) active regions (Table S18). These were distributed over 39 TADs, which encompassed both known nsCL/P risk loci (n = 19; e.g., chromosomes 1p22 [Figure S18] and 2p24.2 [Figure S19]) and regions with suggestive evidence for association (n = 20; e.g., chromosome 4p13 [Figure S20]). Interestingly, at six loci (e.g., chromosomes 1q32.1 [Figure S21] and 15q24.1 [Figure S22]), at least two associated variants in LD were located in different specific elements (Table S19). This is significantly more than expected and suggests that individual variants of risk haplotypes might affect the regulatory architecture at different stages of craniofacial development (Figure 5B).
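The aggregation of variants per TAD can be illustrated with a short sketch that filters variants by p value, assigns each to the TAD containing it, and labels the TAD according to whether its variants lie in NCC-specific sites, CT-specific sites, or both. The data layout, the labels, and the function name are hypothetical and serve only to make the classification explicit.

```python
from collections import defaultdict


def classify_tads(variants, tads, p_cutoff=5e-5):
    """Label each TAD as 'NCC', 'CT', or 'both'.

    variants: iterable of (position, p_value, site_type), where site_type
              is 'NCC' or 'CT' (the specific active site the variant maps to).
    tads:     list of (tad_id, start, end), half-open, on one chromosome.
    Only variants with p_value <= p_cutoff are considered; TADs without
    any such variant are omitted from the result.
    """
    site_types = defaultdict(set)
    for position, p_value, site_type in variants:
        if p_value > p_cutoff:
            continue
        for tad_id, start, end in tads:
            if start <= position < end:
                site_types[tad_id].add(site_type)
                break
    return {
        tad_id: "both" if len(types) == 2 else next(iter(types))
        for tad_id, types in site_types.items()
    }


# Toy input (hypothetical coordinates and p values).
tads = [("TAD1", 0, 1000), ("TAD2", 1000, 2000), ("TAD3", 2000, 3000)]
variants = [
    (100, 1e-6, "NCC"),
    (900, 2e-5, "CT"),
    (1500, 1e-7, "CT"),
    (2500, 1e-3, "NCC"),  # fails the p value cutoff
]
print(classify_tads(variants, tads))
```

Running the same function at several values of `p_cutoff` would give counts per category at each threshold, in the spirit of Figure 5B.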
Finally, we assessed how novel hypotheses on nsCL/P pathogenesis can be generated from the systematic integration of data concerning: (1) statistical associations (MAiC), (2) chromatin modifications over time (mid-facial time-series), and (3) pCHi-C cis-interactions.41 Examples from two loci are described here. First, at 5q13PIK3R1, the lead variant (rs6449957, pMAiC = 6.59 × 10−10) is located within an active region upstream of PIK3R1. This region shows evidence of being transcribed but lacks any RefSeq annotation, which might point toward a transcribed enhancer or an as-yet-undetected transcript. The pCHi-C data indicate cis interactions with PIK3R1 and MAST4, both of which are expressed in hNCCs. In addition, another variant in strong LD (rs921792, pMAiC = 1.17 × 10−5) maps to a putative enhancer that is detected in both NCCs and CT (Figure 5C). As a second example, at 13q32.3 (lead variant rs2763950, pMAiC = 3.03 × 10−6, intronic in CLYBL), interactions were observed between the region around the lead variant and the genes ZIC2, ZIC5, and GGACT. While some variants (including rs2763934 with pMAiC = 6.53 × 10−7) map to a craniofacial active element near the CLYBL gene promoter, additional variants (including rs4525350 with pMAiC = 6.39 × 10−6) map to several more distantly located NCC-specific enhancers. The pCHi-C data suggest that, in NCCs, risk variants might affect ZIC2 and ZIC5 expression. This hypothesis is further supported by the finding of active transcription sites in NCCs and a bivalent state in embryonic and adult tissues. A plausible hypothesis is that, at later time points of development, additional variants mapping to other enhancer elements act on GGACT, as suggested by the presence of transcribed sites in CT. Notably, the transcript region of CLYBL itself has limited evidence for active transcription across all analyzed stages of mid-facial development, despite the presence of some active marks in the promoter region (Figure 5D).
At other loci, our data provide evidence for the presence of tissue-specific gene isoforms (e.g., at the 4p13 locus; Figure S20), or for a second, novel candidate gene at previously reported loci. For example, at chromosome 1p22, our data suggest that the previously identified gene ARHGAP2936 is a target gene with CT-specific expression and highlight ABCD3 as a novel candidate gene (Figure S18). The data also suggest complex promoter-promoter interactions involving all genes at this locus (ARHGAP29, ABCD3, and ABCA4). Interestingly, the MAiC top-associated variant at 1p22 (rs35298667, pMAiC = 6.86 × 10−16) has putative enhancer function and maps to the “E2” element, whose functional role in nsCL/P was confirmed in previous research.32 At another locus (1q32.1), we found that SERTAD4 is a CT-specific target gene, while the established causal gene IRF6 was marked as bivalent, which is consistent with its established activity in epithelial tissue55,68 (Figure S21). Taken together, these results will inform future functional studies into nsCL/P and underscore the importance of the thorough genomic annotation of relevant cells and tissues.
Discussion
Here, we report on a data-driven approach that generated novel insights into the etiology of nsCL/P. At the genetic level, we identified five novel risk loci via the large-scale meta-analysis of common genetic variation. This large genome-wide resource empowered systematic analyses at the gene and pathway levels and implicated novel molecular players in nsCL/P. Our analysis of pleiotropic effects on other common traits revealed a substantial positional overlap with traits such as height and bone mineral density. At some loci, associated variants showed opposite directions of effect, which indicates their contribution to distinct pathways. We have provided examples of how this resource is useful in terms of translating statistical associations into biological insights and illustrated its potential for further analyses of facial disorders and traits.
Although our results are based on a multiethnic cohort, individuals of European ancestry make up a substantial part of it. Nevertheless, we captured associations at all loci that had been previously reported in distant ethnicities, such as the Chinese population.13 Although these observations suggest that nsCL/P might show less locus heterogeneity than is the case for other common diseases, allelic heterogeneity is likely to contribute in part to the lack of replication observed at some loci in previous studies. Also, the integration of genetic and chromatin segmentation data might have been biased by the European background of both the genetic and epigenetic maps. Despite some initial evidence that methylation patterns show population-specific components,69,70 few studies to date have performed systematic analyses of how maps of chromatin accessibility (in particular in mid-facial development) vary across populations. Future studies are required to determine whether population-specific risk variants from non-European populations show differing enrichment patterns from those observed in the present study and to identify additional pleiotropic effects that are present at other risk haplotypes in other populations. Importantly, to address these issues, future meta-analyses should also include recent GWAS data (e.g., from Sub-Saharan Africans71 and Colombians72). In addition, our analyses were performed for nsCL/P as the central trait. Previous studies have generated evidence of an (albeit incomplete) etiological overlap between the various nsCL/P subtypes (e.g., cleft lip, and cleft lip with cleft palate) and the genetic heterogeneity of other types of orofacial clefting (e.g., cleft palate only).21,23,73,74 Application of our integrative approach to the investigation of cleft subtypes will facilitate understanding of their individual etiologies, an issue that was beyond the scope of the present study.
One major feature of our approach was that it combined previous individual data into one joint map of epigenetic chromatin segments of NCCs and CT. This will be highly useful in terms of the future interpretation of associations in facial disorders/traits. However, due to limited availability of datasets from other cell types, such as human embryonic epithelium, this map does not comprehensively capture all biological contributors to human craniofacial development. Furthermore, our joint analysis of the different CT stages may have overlooked some effects within single stages of CT. Nonetheless, the data obtained at individual loci add to increasing evidence that for nsCL/P development, risk loci have a complex regulatory architecture, and several genes at single loci might be of relevance across the different time points of craniofacial development. Notably, several of the genes prioritized by our systematic approach have received independent support from other studies, for instance from studies of clefting syndromes (e.g., TP63, EEC syndrome75), resequencing studies (e.g., ARHGAP2976,77 and IRF625), or experimental work (e.g., PAX778). While we here focused on an in silico approach, we hope that the results will empower further experimental investigations of specific risk variants that were highlighted among the set of associated variants. Using the joint pipeline, we will continue to update our resource as chromatin marks become available from additional human tissues and/or cell systems of relevance to mid-facial development. In addition, the map will be refined through the use of single-cell technologies in order to resolve the issue of tissue heterogeneity encountered in the present study. Finally, the integration of other layers of genetic information, such as rare variants identified by whole-exome or -genome sequencing in cleft cohorts, will further increase our understanding of the etiology of craniofacial development and disease.61,79
Declaration of interests
The authors declare no competing interests.
Acknowledgments
We thank Markus M. Nöthen and Andreas Buness for helpful discussions on the manuscript and Carlo Maj for data management. This work was supported by the German Research Foundation (DFG; LU 1944/3-1, to K.U.L.).
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2021.100038.
Data and code availability
Original data for genetic and functional analyses in the paper are available as follows: dbGaP (dbGaP: phs000094 and phs000774), GEO (GEO: GSE28874, GSE70751, and GSE97752), and Zenodo (DOI 10.5281/zenodo.3724148). The NCC- and CT-specific active sites generated during this study are available at Zenodo (DOI 10.5281/zenodo.3911187).
Web resources
ANNOVAR, https://wglab.org/software/9-annovar
Bowtie2, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
ChromHMM, http://compbio.mit.edu/ChromHMM/
ChromImpute, http://www.biolchem.ucla.edu/labs/ernst/ChromImpute/
core 15-state chromatin model, https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/
FastQC, https://www.bioinformatics.babraham.ac.uk/projects/fastqc
FUMA, https://fuma.ctglab.nl/
GREGOR, http://csg.sph.umich.edu/GREGOR/
GWAS Atlas, https://atlas.ctglab.nl/
Hi-C data from Bing Ren Lab, http://chromosome.sdsc.edu/mouse/hi-c/download.html
IMPUTE2, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
KING: Kinship-based Inference for GWAS, https://www.kingrelatedness.com/
PhantomPeakQualTools, https://github.com/kundajelab/phantompeakqualtools
Roadmap, https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/pval/
UCSC, http://genome.ucsc.edu/
References
- 1. Claussnitzer M., Dankel S.N., Kim K.-H., Quon G., Meuleman W., Haugen C., Glunk V., Sousa I.S., Beaudry J.L., Puviindran V., et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med. 2015;373:895–907. doi: 10.1056/NEJMoa1502214.
- 2. Xue A., Wu Y., Zhu Z., Zhang F., Kemper K.E., Zheng Z., Yengo L., Lloyd-Jones L.R., Sidorenko J., Wu Y., et al. eQTLGen Consortium. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 2018;9:2941. doi: 10.1038/s41467-018-04951-w.
- 3. Markunas C.A., Johnson E.O., Hancock D.B. Comprehensive evaluation of disease- and trait-specific enrichment for eight functional elements among GWAS-identified variants. Hum. Genet. 2017;136:911–919. doi: 10.1007/s00439-017-1815-6.
- 4. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
- 5. Roman T.S., Mohlke K.L. Functional genomics and assays of regulatory activity detect mechanisms at loci for lipid traits and coronary artery disease. Curr. Opin. Genet. Dev. 2018;50:52–59. doi: 10.1016/j.gde.2018.02.004.
- 6. Lu Q., Powles R.L., Abdallah S., Ou D., Wang Q., Hu Y., Lu Y., Liu W., Li B., Mukherjee S., et al. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease. PLoS Genet. 2017;13:e1006933. doi: 10.1371/journal.pgen.1006933.
- 7. Claussnitzer M., Cho J.H., Collins R., Cox N.J., Dermitzakis E.T., Hurles M.E., Kathiresan S., Kenny E.E., Lindgren C.M., MacArthur D.G., et al. A brief history of human disease genetics. Nature. 2020;577:179–189. doi: 10.1038/s41586-019-1879-7.
- 8. Jugessur A., Farlie P.G., Kilpatrick N. The genetics of isolated orofacial clefts: From genotypes to subphenotypes. Oral Dis. 2009;15:437–453. doi: 10.1111/j.1601-0825.2009.01577.x.
- 9. Mangold E., Ludwig K.U., Nöthen M.M. Breakthroughs in the genetics of orofacial clefting. Trends Mol. Med. 2011;17:725–733. doi: 10.1016/j.molmed.2011.07.007.
- 10. Mossey P.A., Modell B. Epidemiology of oral clefts 2012: An international perspective. Front. Oral Biol. 2012;16:1–18. doi: 10.1159/000337464.
- 11. Grosen D., Bille C., Pedersen J.K., Skytthe A., Murray J.C., Christensen K. Recurrence risk for offspring of twins discordant for oral cleft: A population-based cohort study of the Danish 1936-2004 cleft twin cohort. Am. J. Med. Genet. A. 2010;152A:2468–2474. doi: 10.1002/ajmg.a.33608.
- 12. Christensen K., Juel K., Herskind A.M., Murray J.C. Long term follow up study of survival associated with cleft lip and palate at birth. BMJ. 2004;328:1405. doi: 10.1136/bmj.38106.559120.7C.
- 13. Yu Y., Zuo X., He M., Gao J., Fu Y., Qin C., Meng L., Wang W., Song Y., Cheng Y., et al. Genome-wide analyses of non-syndromic cleft lip with palate identify 14 novel loci and genetic heterogeneity. Nat. Commun. 2017;8:14364. doi: 10.1038/ncomms14364.
- 14. van Rooij I.A., Ludwig K.U., Welzenbach J., Ishorst N., Thonissen M., Galesloot T.E., Ongkosuwito E., Bergé S.J., Aldhorae K., Rojas-Martinez A., et al. Non-syndromic cleft lip with or without cleft palate: Genome-wide association study in Europeans identifies a suggestive risk locus at 16p12.1 and supports SH3PXD2A as a clefting susceptibility gene. Genes (Basel) 2019;10:1023. doi: 10.3390/genes10121023.
- 15. Moreno L.M., Mansilla M.A., Bullard S.A., Cooper M.E., Busch T.D., Machida J., Johnson M.K., Brauer D., Krahn K., Daack-Hirsch S., et al. FOXE1 association with both isolated cleft lip with or without cleft palate, and isolated cleft palate. Hum. Mol. Genet. 2009;18:4879–4896. doi: 10.1093/hmg/ddp444.
- 16. Mostowska A., Gaczkowska A., Żukowski K., Ludwig K.U., Hozyasz K.K., Wójcicki P., Mangold E., Böhmer A.C., Heilmann-Heimbach S., Knapp M., et al. Common variants in DLG1 locus are associated with non-syndromic cleft lip with or without cleft palate. Clin. Genet. 2018;93:784–793. doi: 10.1111/cge.13141.
- 17. Leslie E.J., Carlson J.C., Shaffer J.R., Feingold E., Wehby G., Laurie C.A., Jain D., Laurie C.C., Doheny K.F., McHenry T., et al. A multi-ethnic genome-wide association study identifies novel loci for non-syndromic cleft lip with or without cleft palate on 2p24.2, 17q23 and 19q13. Hum. Mol. Genet. 2016;25:2862–2872. doi: 10.1093/hmg/ddw104.
- 18. Mangold E., Ludwig K.U., Birnbaum S., Baluardo C., Ferrian M., Herms S., Reutter H., de Assis N.A., Chawa T.A., Mattheisen M., et al. Genome-wide association study identifies two susceptibility loci for nonsyndromic cleft lip with or without cleft palate. Nat. Genet. 2010;42:24–26. doi: 10.1038/ng.506.
- 19. Ludwig K.U., Mangold E., Herms S., Nowak S., Reutter H., Paul A., Becker J., Herberz R., AlChawa T., Nasser E., et al. Genome-wide meta-analyses of nonsyndromic cleft lip with or without cleft palate identify six new risk loci. Nat. Genet. 2012;44:968–971. doi: 10.1038/ng.2360.
- 20. Beaty T.H., Murray J.C., Marazita M.L., Munger R.G., Ruczinski I., Hetmanski J.B., Liang K.Y., Wu T., Murray T., Fallin M.D., et al. A genome-wide association study of cleft lip with and without cleft palate identifies risk variants near MAFB and ABCA4. Nat. Genet. 2010;42:525–529. doi: 10.1038/ng.580.
- 21. Ludwig K.U., Böhmer A.C., Bowes J., Nikolić M., Ishorst N., Wyatt N., Hammond N.L., Gölz L., Thieme F., Barth S., et al. Imputation of orofacial clefting data identifies novel risk loci and sheds light on the genetic background of cleft lip ± cleft palate and cleft palate only. Hum. Mol. Genet. 2017;26:829–842. doi: 10.1093/hmg/ddx012.
- 22. Sun Y., Huang Y., Yin A., Pan Y., Wang Y., Wang C., Du Y., Wang M., Lan F., Hu Z., et al. Genome-wide association study identifies a new susceptibility locus for cleft lip with or without a cleft palate. Nat. Commun. 2015;6:6414. doi: 10.1038/ncomms7414.
- 23. Ludwig K.U., Ahmed S.T., Böhmer A.C., Sangani N.B., Varghese S., Klamt J., Schuenke H., Gültepe P., Hofmann A., Rubini M., et al. Meta-analysis Reveals Genome-Wide Significance at 15q13 for Nonsyndromic Clefting of Both the Lip and the Palate, and Functional Analyses Implicate GREM1 As a Plausible Causative Gene. PLoS Genet. 2016;12:e1005914. doi: 10.1371/journal.pgen.1005914.
- 24. Beaty T.H., Taub M.A., Scott A.F., Murray J.C., Marazita M.L., Schwender H., Parker M.M., Hetmanski J.B., Balakrishnan P., Mansilla M.A., et al. Confirming genes influencing risk to cleft lip with/without cleft palate in a case-parent trio study. Hum. Genet. 2013;132:771–781. doi: 10.1007/s00439-013-1283-6.
- 25. Zucchero T.M., Cooper M.E., Maher B.S., Daack-Hirsch S., Nepomuceno B., Ribeiro L., Caprau D., Christensen K., Suzuki Y., Machida J., et al. Interferon Regulatory Factor 6 (IRF6) Gene Variants and the Risk of Isolated Cleft Lip or Palate. N. Engl. J. Med. 2004;351:769–780. doi: 10.1056/NEJMoa032909.
- 26. Leslie E.J., Carlson J.C., Shaffer J.R., Butali A., Buxó C.J., Castilla E.E., Christensen K., Deleyiannis F.W.B., Leigh Field L., Hecht J.T., et al. Genome-wide meta-analyses of nonsyndromic orofacial clefts identify novel associations between FOXE1 and all orofacial clefts, and TP63 and cleft lip with or without cleft palate. Hum. Genet. 2017;136:275–286. doi: 10.1007/s00439-016-1754-7.
- 27. Birnbaum S., Ludwig K.U., Reutter H., Herms S., Steffens M., Rubini M., Baluardo C., Ferrian M., Almeida de Assis N., Alblas M.A., et al. Key susceptibility locus for nonsyndromic cleft lip with or without cleft palate on chromosome 8q24. Nat. Genet. 2009;41:473–477. doi: 10.1038/ng.333.
- 28. Rahimov F., Marazita M.L., Visel A., Cooper M.E., Hitchler M.J., Rubini M., Domann F.E., Govil M., Christensen K., Bille C., et al. Disruption of an AP-2α binding site in an IRF6 enhancer is associated with cleft lip. Nat. Genet. 2008;40:1341–1347. doi: 10.1038/ng.242.
- 29. Thieme F., Ludwig K.U. The Role of Noncoding Genetic Variation in Isolated Orofacial Clefts. J. Dent. Res. 2017;96:1238–1247. doi: 10.1177/0022034517720403.
- 30. Attanasio C., Nord A.S., Zhu Y., Blow M.J., Li Z., Liberton D.K., Morrison H., Plajzer-Frick I., Holt A., Hosseini R., et al. Fine tuning of craniofacial morphology by distant-acting enhancers. Science. 2013;342:1241006. doi: 10.1126/science.1241006.
- 31. Uslu V.V., Petretich M., Ruf S., Langenfeld K., Fonseca N.A., Marioni J.C., Spitz F. Long-range enhancers regulating Myc expression are required for normal facial morphogenesis. Nat. Genet. 2014;46:753–758. doi: 10.1038/ng.2971.
- 32. Liu H., Leslie E.J., Carlson J.C., Beaty T.H., Marazita M.L., Lidral A.C., Cornell R.A. Identification of common non-coding variants at 1p22 that are functional for non-syndromic orofacial clefting. Nat. Commun. 2017;8:14759. doi: 10.1038/ncomms14759.
- 33. Dunham I., Kundaje A., Aldred S.F., Collins P.J., Davis C.A., Doyle F., Epstein C.B., Frietze S., Harrow J., Kaul R., et al. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 34. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., et al. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
- 35. Aguet F., Brown A.A., Castel S.E., Davis J.R., He Y., Jo B., Mohammadi P., Park Y.S., Parsana P., Segrè A.V., et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
- 36. Leslie E.J., Mansilla M.A., Biggs L.C., Schuette K., Bullard S., Cooper M., Dunnwald M., Lidral A.C., Marazita M.L., Beaty T.H., et al. Expression and mutation analyses implicate ARHGAP29 as the etiologic gene for the cleft lip with or without cleft palate locus identified by genome-wide association on chromosome 1p22. Birth Defects Res. A Clin. Mol. Teratol. 2012;94:934–942. doi: 10.1002/bdra.23076.
- 37. Rada-Iglesias A., Bajpai R., Prescott S., Brugmann S.A., Swigut T., Wysocka J. Epigenomic annotation of enhancers predicts transcriptional regulators of human neural crest. Cell Stem Cell. 2012;11:633–648. doi: 10.1016/j.stem.2012.07.006.
- 38. Prescott S.L., Srinivasan R., Marchetto M.C., Grishina I., Narvaiza I., Selleri L., Gage F.H., Swigut T., Wysocka J. Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell. 2015;163:68–83. doi: 10.1016/j.cell.2015.08.036.
- 39. Wilderman A., VanOudenhove J., Kron J., Noonan J.P., Cotney J. High-Resolution Epigenomic Atlas of Human Embryonic Craniofacial Development. Cell Rep. 2018;23:1581–1597. doi: 10.1016/j.celrep.2018.03.129.
- 40. Dixon J.R., Selvaraj S., Yue F., Kim A., Li Y., Shen Y., Hu M., Liu J.S., Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–380. doi: 10.1038/nature11082.
- 41. Freire-Pritchett P., Schoenfelder S., Várnai C., Wingett S.W., Cairns J., Collier A.J., García-Vílchez R., Furlan-Magaril M., Osborne C.S., Fraser P., et al. Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cells. eLife. 2017;6:e21926. doi: 10.7554/eLife.21926.
- 42. Marchini J., Howie B., Myers S., McVean G., Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 2007;39:906–913. doi: 10.1038/ng2088.
- 43. Spielman R.S., McGinnis R.E., Ewens W.J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 1993;52:506–516.
- 44. Willer C.J., Li Y., Abecasis G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
- 45. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Corvin A., et al. Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 46. de Leeuw C.A., Mooij J.M., Heskes T., Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput. Biol. 2015;11:e1004219. doi: 10.1371/journal.pcbi.1004219.
- 47. Watanabe K., Stringer S., Frei O., Umićević Mirkov M., de Leeuw C., Polderman T.J.C., van der Sluis S., Andreassen O.A., Neale B.M., Posthuma D. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 2019;51:1339–1348. doi: 10.1038/s41588-019-0481-0.
- 48. Ernst J., Kellis M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 2015;33:364–376. doi: 10.1038/nbt.3157.
- 49. Ernst J., Kellis M. Chromatin-state discovery and genome annotation with ChromHMM. Nat. Protoc. 2017;12:2478–2492. doi: 10.1038/nprot.2017.124.
- 50. Laugsch M., Bartusel M., Alirzayeva H., Karaolidou A., Rehimi R., Crispatzu G., Nikolic M., Bleckwehl T., Kolovos P., van Ijcken W.F.J., et al. Disruption of the TFAP2A Regulatory Domain Causes Branchio-Oculo-Facial Syndrome (BOFS) and Illuminates Pathomechanisms for Other Human Neurocristopathies. Cell Stem Cell. 2018;24:736–752.e12.
- 51. Schmidt E.M., Zhang J., Zhou W., Chen J., Mohlke K.L., Chen Y.E., Willer C.J. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics. 2015;31:2601–2606. doi: 10.1093/bioinformatics/btv201.
- 52. Mukherjee K., Ishii K., Pillalamarri V., Kammin T., Atkin J.F., Hickey S.E., Xi Q.J., Zepeda C.J., Gusella J.F., Talkowski M.E., et al. Actin capping protein CAPZB regulates cell morphology, differentiation, and neural crest migration in craniofacial morphogenesis. Hum. Mol. Genet. 2016;25:1255–1270. doi: 10.1093/hmg/ddw006.
- 53. McLennan R., Bailey C.M., Schumacher L.J., Teddy J.M., Morrison J.A., Kasemeier-Kulesa J.C., Wolfe L.A., Gogol M.M., Baker R.E., Maini P.K., et al. DAN (NBL1) promotes collective neural crest migration by restraining uncontrolled invasion. J. Cell Biol. 2017;216:3339–3354. doi: 10.1083/jcb.201612169.
- 54. Matt N., Dupé V., Garnier J.M., Dennefeld C., Chambon P., Mark M., Ghyselinck N.B. Retinoic acid-dependent eye morphogenesis is orchestrated by neural crest cells. Development. 2005;132:4789–4800. doi: 10.1242/dev.02031.
- 55. Kousa Y.A., Fuller E., Schutte B.C. IRF6 and AP2A Interaction Regulates Epidermal Development. J. Invest. Dermatol. 2018;138:2578–2588. doi: 10.1016/j.jid.2018.05.030.
- 56. Lin-Shiao E., Lan Y., Welzenbach J., Alexander K.A., Zhang Z., Knapp M., Mangold E., Sammons M., Ludwig K.U., Berger S.L. p63 establishes epithelial enhancers at critical craniofacial development genes. Sci. Adv. 2019;5:eaaw0946. doi: 10.1126/sciadv.aaw0946.
- 57. Kobayashi K., Jakt L.M., Nishikawa S.I. Epigenetic regulation of the neuroblastoma genes, Arid3b and Mycn. Oncogene. 2013;32:2640–2648. doi: 10.1038/onc.2012.285.
- 58. Takebe A., Era T., Okada M., Martin Jakt L., Kuroda Y., Nishikawa S. Microarray analysis of PDGFR α+ populations in ES cell differentiation culture identifies genes involved in differentiation of mesoderm and mesenchyme including ARID3b that is essential for development of embryonic mesenchymal cells. Dev. Biol. 2006;293:25–37. doi: 10.1016/j.ydbio.2005.12.016.
- 59. Jakobsen L.P., Borup R., Vestergaard J., Larsen L.A., Lage K., Maroun L.L., Kjaer I., Niemann C.U., Andersen M., Knudsen M.A., et al. Expression analyses of human cleft palate tissue suggest a role for osteopontin and immune related factors in palatal development. Exp. Mol. Med. 2009;41:77–85. doi: 10.3858/emm.2009.41.2.010.
- 60. Leslie E.J., Carlson J.C., Shaffer J.R., Buxó C.J., Castilla E.E., Christensen K., Deleyiannis F.W.B., Field L.L., Hecht J.T., Moreno L., et al. Association studies of low-frequency coding variants in nonsyndromic cleft lip with or without cleft palate. Am. J. Med. Genet. A. 2017;173:1531–1538. doi: 10.1002/ajmg.a.38210.
- 61. Cox L.L., Cox T.C., Moreno Uribe L.M., Zhu Y., Richter C.T., Nidey N., Standley J.M., Deng M., Blue E., Chong J.X., et al. Mutations in the Epithelial Cadherin-p120-Catenin Complex Cause Mendelian Non-Syndromic Cleft Lip with or without Cleft Palate. Am. J. Hum. Genet. 2018;102:1143–1157. doi: 10.1016/j.ajhg.2018.04.009.
- 62. Marini N.J., Asrani K., Yang W., Rine J., Shaw G.M. Accumulation of rare coding variants in genes implicated in risk of human cleft lip with or without cleft palate. Am. J. Med. Genet. A. 2019;179:1260–1269. doi: 10.1002/ajmg.a.61183.
- 63. Lupiáñez D.G., Spielmann M., Mundlos S. Breaking TADs: How Alterations of Chromatin Domains Result in Disease. Trends Genet. 2016;32:225–237. doi: 10.1016/j.tig.2016.01.003.
- 64. Lupiáñez D.G., Kraft K., Heinrich V., Krawitz P., Brancati F., Klopocki E., Horn D., Kayserili H., Opitz J.M., Laxova R., et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161:1012–1025. doi: 10.1016/j.cell.2015.04.004.
- 65. Laugsch M., Bartusel M., Rehimi R., Alirzayeva H., Karaolidou A., Crispatzu G., Zentis P., Nikolic M., Bleckwehl T., Kolovos P., et al. Modeling the Pathological Long-Range Regulatory Effects of Human Structural Variation with Patient-Specific hiPSCs. Cell Stem Cell. 2019;24:736–752.e12. doi: 10.1016/j.stem.2019.03.004.
- 66. Vermillion K.L., Lidberg K.A., Gammill L.S. Expression of actin-binding proteins and requirement for actin-depolymerizing factor in chick neural crest cells. Dev. Dyn. 2014;243:730–738. doi: 10.1002/dvdy.24105.
- 67. Ernst J., Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906.
- 68. Richardson R.J., Hammond N.L., Coulombe P.A., Saloranta C., Nousiainen H.O., Salonen R., Berry A., Hanley N., Headon D., Karikoski R., Dixon M.J. Periderm prevents pathological epithelial adhesions during embryogenesis. J. Clin. Invest. 2014;124:3891–3900. doi: 10.1172/JCI71946.
- 69. Fraser H.B., Lam L.L., Neumann S.M., Kobor M.S. Population-specificity of human DNA methylation. Genome Biol. 2012;13:R8. doi: 10.1186/gb-2012-13-2-r8.
- 70. Liu J., Hutchison K., Perrone-Bizzozero N., Morgan M., Sui J., Calhoun V. Identification of genetic and epigenetic marks involved in population structure. PLoS ONE. 2010;5:e13209. doi: 10.1371/journal.pone.0013209.
- 71. Butali A., Mossey P.A., Adeyemo W.L., Eshete M.A., Gowans L.J.J., Busch T.D., Jain D., Yu W., Huan L., Laurie C.A., et al. Genomic analyses in African populations identify novel risk loci for cleft palate. Hum. Mol. Genet. 2019;28:1038–1051. doi: 10.1093/hmg/ddy402.
- 72. Mukhopadhyay N., Bishop M., Mortillo M., Chopra P., Hetmanski J.B., Taub M.A., Moreno L.M., Valencia-Ramirez L.C., Restrepo C., Wehby G.L., et al. Whole genome sequencing of orofacial cleft trios from the Gabriella Miller Kids First Pediatric Research Consortium identifies a new locus on chromosome 21. Hum. Genet. 2020;139:215–226. doi: 10.1007/s00439-019-02099-1.
- 73. Carlson J.C., Anand D., Butali A., Buxo C.J., Christensen K., Deleyiannis F., Hecht J.T., Moreno L.M., Orioli I.M., Padilla C., et al. A systematic genetic analysis and visualization of phenotypic heterogeneity among orofacial cleft GWAS signals. Genet. Epidemiol. 2019;43:704–716. doi: 10.1002/gepi.22214.
- 74. Huang L., Jia Z., Shi Y., Du Q., Shi J., Wang Z., Mou Y., Wang Q., Zhang B., Wang Q., et al. Genetic factors define CPO and CLO subtypes of nonsyndromic orofacial cleft. PLoS Genet. 2019;15:e1008357. doi: 10.1371/journal.pgen.1008357.
- 75. Ray A.K., Marazita M.L., Pathak R., Beever C.L., Cooper M.E., Goldstein T., Shaw D.F., Field L.L. TP63 mutation and clefting modifier genes in an EEC syndrome family. Clin. Genet. 2004;66:217–222. doi: 10.1111/j.1399-0004.2004.00287.x.
- 76. Liu H., Busch T., Eliason S., Anand D., Bullard S., Gowans L.J.J., Nidey N., Petrin A., Augustine-Akpan E.A., Saadi I., et al. Exome sequencing provides additional evidence for the involvement of ARHGAP29 in Mendelian orofacial clefting and extends the phenotypic spectrum to isolated cleft palate. Birth Defects Res. 2017;109:27–37. doi: 10.1002/bdra.23596.
- 77. Savastano C.P., Brito L.A., Faria Á.C., Setó-Salvia N., Peskett E., Musso C.M., Alvizi L., Ezquina S.A.M., James C., GOSgene, et al. Impact of rare variants in ARHGAP29 to the etiology of oral clefts: role of loss-of-function vs missense variants. Clin. Genet. 2017;91:683–689. doi: 10.1111/cge.12823.
- 78. Mansouri A. The role of Pax3 and Pax7 in development and cancer. Crit. Rev. Oncog. 1998;9:141–149. doi: 10.1615/critrevoncog.v9.i2.40.
- 79. Bishop M.R., Diaz Perez K.K., Sun M., Ho S., Chopra P., Mukhopadhyay N., Hetmanski J.B., Taub M.A., Moreno-Uribe L.M., Valencia-Ramirez L.C., et al. Genome-wide Enrichment of De Novo Coding Mutations in Orofacial Cleft Trios. Am. J. Hum. Genet. 2020;107:124–136. doi: 10.1016/j.ajhg.2020.05.018.
Data Availability Statement
Original data for the genetic and functional analyses in the paper are available as follows: dbGaP (phs000094 and phs000774), GEO (GSE28874, GSE70751, and GSE97752), and Zenodo (DOI 10.5281/zenodo.3724148). The NCC- and CT-specific active sites generated during this study are available at Zenodo (DOI 10.5281/zenodo.3911187).