Author manuscript; available in PMC: 2026 May 7.
Published in final edited form as: Genome Res. 2026 Apr 7;36(4):695–712. doi: 10.1101/gr.280838.125

Functional genomics analysis of developing zebrafish and human endoderm reveals highly conserved cis-regulatory modules acting during vertebrate organogenesis

Daniela M Riley 1,#, Randa Elsayed 1,#, Mark D Walsh 2,#, Simaran Johal 2, Ying Lin 3,4, Harry Walton 1, Till Bretschneider 5, Sascha Ott 1,6, Andrew C Nelson 2,6,*
PMCID: PMC7619044  EMSID: EMS213377  PMID: 41781333

Abstract

While vertebrate species are superficially diverse, they share key commonalities in terms of overall morphology, and organ configuration and function. Maintenance of these traits during evolution is partially explained by conservation of critical genes governing embryonic development. However, for conserved genes to deliver consistent developmental outcomes between species, similar gene regulatory programmes and gene expression patterns must also be maintained. The endoderm germ layer makes major contributions to the respiratory and gastrointestinal tracts, and associated organs including liver and pancreas. We used functional genomics approaches to identify highly conserved endodermal cis-regulatory modules (CRMs) functioning across the 400 million years of evolution separating zebrafish and humans. Our analyses suggest that there are few endoderm-specific CRMs, with many CRMs governing pancreas development also likely acting within the nervous system. Furthermore, these highly conserved CRMs are strongly enriched for binding sites of “neuro-pancreatic” transcription factors governing both pancreas and nervous system development, potentially suggesting function across these distinct organ systems. Additionally, we identify highly conserved CRMs potentially participating in endodermal patterning of adjacent craniofacial structures and sensory tissues. The highly conserved CRMs we identify are characterised by conserved patterns of transcription factor binding site co-occurrence. However, rigid arrangement of binding sites is not a common characteristic of the identified CRMs, suggesting more complex or individual grammatical rules. Overall, our analyses provide key insights into critical gene regulatory control during vertebrate endoderm organogenesis, and define a compendium of highly conserved CRMs that should be prioritised for analysis of neuro-pancreatic gene transcriptional control, and anterior embryonic patterning.

Introduction

While vertebrates exhibit remarkable phenotypic diversity, there are nevertheless many key commonalities of the vertebrate body plan. These include anterior-posterior polarity, similar internal and sensory organs of broadly conserved function, and similar configuration and integration of these organs. Conserved aspects of the vertebrate body plan are accepted to be largely controlled via selective pressures maintaining key gene sequences and functions. Indeed, though the last common ancestor of humans and zebrafish is estimated to have lived 400 million years ago, 70% of human genes have at least one obvious zebrafish orthologue (Howe et al. 2013). However, for conserved genes to mediate consistent phenotypic outcomes, those genes must also be regulated consistently by cis-regulatory modules (CRMs). CRMs consist of clusters of transcription factor binding sites (TFBSs) that regulate the expression of target genes through collective or competitive binding of operative transcription factors (TFs). Yet identification of conserved CRMs is often confounded by rearrangement and substitution of TFBSs, which can preserve similar functional capabilities without deep sequence conservation (reviewed in Nelson and Wardle 2013; Long et al. 2016; Jindal and Farley 2021). In spite of this, comparative genomics analyses indicate that short discrete non-coding regions of the genome display a high degree of sequence conservation across hundreds of millions of years of evolution (Elgar 2009; Vavouri and Lehner 2009; Nelson and Wardle 2013; Polychronopoulos et al. 2017). These so-called highly conserved non-coding elements (HCNEs) show non-random distribution in the genome, tending to cluster around genes controlling developmental transitions and cell fate decisions (Sandelin et al. 2004; Woolfe et al. 2005; Engstrom et al. 2008).
Many of the products of such genes are transcription factors; hence the putative targets of HCNE CRMs have been referred to as trans-dev genes (Woolfe et al. 2005; Elgar 2009; Nelson and Wardle 2013). That such genes appear to be regulated by HCNEs suggests particular constraints on the regulatory logic and architecture of trans-dev gene CRMs, though these constraints are currently poorly understood.

While the importance and complete range of functions of HCNEs remain to be fully understood, many have been shown to direct tissue-specific gene expression consistent with them acting as enhancers (Nelson and Wardle 2013). Moreover, disruption of specific HCNEs is associated with developmental disorders and diseases including cancer, further highlighting their importance (Polychronopoulos et al. 2017). Functional genomics studies have indicated that sequence conservation at active enhancers, and the use of equivalent enhancers between species, are maximal during the so-called phylotypic period – a developmental window exhibiting maximal interspecies similarity within a phylum (Von Baer 1828; Duboule 1994; Bogdanovic et al. 2012; Raff 2012; Tena et al. 2014; Martinez-Morales 2016). However, previous studies typically involved bulk analyses of whole embryos, with consequent loss of tissue-specific information and poor detection of signal from minor cell populations. An analysis of HCNEs within CRMs acting in specific germ layers and tissues across disparate vertebrate species is therefore generally lacking.

Identification of tissue-specific CRMs and their key operative TFs is pivotal to understanding how gene regulatory control of cell fate decisions is achieved during normal development. Furthermore, to understand the basis for developmental disorders, it will be necessary to decipher the gene regulatory networks (GRNs) underpinning normal development. Zebrafish is an excellent model organism for dissecting gene regulatory control of cell fate and behaviour (Figiel et al. 2021). Large clutch sizes, ex utero development and the availability of well-characterised fluorescent reporter lines allow straightforward enrichment of minor cell populations from dissociated embryos for functional genomics analysis. The transparency of embryos and larvae during early development, combined with ease of transgenesis, also allows live imaging analysis of putative CRM function. Furthermore, zebrafish have undergone an additional whole genome duplication relative to non-teleost species (Glasauer and Neuhauss 2014). Selective maintenance of HCNEs at only one copy of a duplicated gene can therefore potentially be used to predict HCNE function. While early functional genomics studies analysing whole embryos have been useful (e.g. Bogdanovic et al. 2012), new studies analysing enriched cell populations are necessary to gain germ layer-specific information.

The endoderm, one of the three primary germ layers of the vertebrate embryo, is induced in early development by Nodal signalling and makes major contributions to the formation of liver, pancreas, intestines, pharynx, swim bladder and other organs (Warga and Nusslein-Volhard 1999; Figiel et al. 2021). The dorsal forerunner cells (DFCs), which go on to form the zebrafish organ of laterality, Kupffer’s vesicle (KV), are also induced by Nodal and have been suggested to be a specialised dorsal subset of endodermal cells owing to their similar early developmental programme (Alexander and Stainier 1999; Warga and Kane 2018; Moreno-Ayala et al. 2021). The SOX family transcription factor Sox17 is a commonly used marker of endoderm across vertebrate species, its expression indicating definitive specification of endoderm during gastrulation (Hudson et al. 1997; Alexander and Stainier 1999; Kanai-Azuma et al. 2002). As well as being expressed in endoderm progenitors during gastrulation, sox17 is subsequently expressed in other progenitor populations and has key roles in blood formation and vasculature development throughout vertebrate evolution (e.g. Kanai-Azuma et al. 2002; Aamar and Dawid 2010; Chung et al. 2011; Saund et al. 2012; Viotti et al. 2012; Lilly et al. 2017; Figiel et al. 2021; Johal et al. 2025). While multiple recent studies have sought to identify CRMs and gene expression patterns in lineages derived from sox17-expressing progenitors (Quillien et al. 2017; Bonkhofer et al. 2019; Dobrzycki et al. 2020; Lopez-Perez et al. 2021; Xia et al. 2021; Trinh et al. 2023), substantial gaps persist in our knowledge of gene regulation in the developing endoderm.

Here, we aimed to characterise CRMs accessible in zebrafish endoderm during early organogenesis through comparative functional genomics analyses of cell populations arising from sox17-expressing endodermal progenitors, compared with sox17-expressing mesodermal and sox17-negative populations. We further aimed to identify and characterise HCNE CRMs acting in endoderm throughout jawed vertebrate evolution through integration of the resulting zebrafish data with functional genomics data from human embryonic stem cells (hESCs) that have undergone directed differentiation to represent distinct endoderm cell populations along the anterior-posterior axis.

Results

ATAC-seq reveals CRMs functioning in distinct sox17-expressing lineages during zebrafish embryogenesis

To enrich for endodermal cells during endoderm organogenesis we exploited sox17:GFP fish (Mizoguchi et al. 2008). While endogenous sox17 is rapidly downregulated at the end of gastrulation (Alexander and Stainier 1999), GFP protein persists throughout the endoderm for days after the endogenous gene has been silenced (Fig. 1A). However, sox17 is also expressed in erythroid and endothelial lineages, limiting the utility of sox17:GFP alone for enrichment of endoderm (Chung et al. 2011). We therefore crossed homozygous sox17:GFP fish with fish homozygous for both gata1a:dsRed and kdrl:mCherry transgenes (Fig. 1B). We then used fluorescence activated cell sorting (FACS) to separate sox17+ vascular and erythroid mesodermal lineages (termed sox17M) from endoderm-enriched sox17+ lineages (sox17E) and all other sox17- lineages (sox17N) at 28 and 48 hours post-fertilisation (hpf) (Fig. 1C, Supplemental Fig. S1). We chose 28 hpf as the earliest timepoint at which all three fluorescent proteins could be robustly detected. To verify that our sorting strategy enriched for endodermal cells in the sox17E population, we performed bulk RNA-seq on 28 hpf samples. This revealed strong enrichment for transcripts expressed in specific endoderm cell populations, including pancreatic, liver and intestinal markers (Fig. 1D, Supplemental Fig. S2, Supplemental Table 1). However, we also note enrichment of markers of fin epithelia and notochord, which we attribute to transdifferentiation of the sox17-expressing left-right organizer (Kupffer’s vesicle - KV); KV gives rise to posterior cell types once its role is complete (Ikeda et al. 2022; Supplemental Fig. S3). Nevertheless, the sox17E population shows strong enrichment for endoderm markers, thus validating our approach. We therefore proceeded to perform independent duplicate ATAC-seq analyses on the three distinct FACS populations at 28 and 48 hpf (Supplemental Figs S4-7).

Figure 1. Fluorescence activated cell sorting separates endodermal and mesodermal cell populations arising from sox17+ progenitors.


(A) Widefield images of sox17:GFP embryos at 28 and 48 hpf with distinct endodermal structures indicated. (B) Widefield images of coexpression of sox17:GFP with kdrl:mCherry and gata1a:dsRed transgenes in trunk and tail vasculature at 48 hpf. (C) Sorting strategy – also see Supplemental Figure S1. (D) RNA-seq heatmap of the 20 significantly enriched genes showing the strongest fold enrichment in sox17E over sox17N at 28 hpf.

ATAC-seq analysis revealed thousands of differentially accessible regions (DARs) distinguishing the three sorted populations both within and between timepoints (Fig. 2A,B, Supplemental Tables 2,3). Chromatin accessibility profiles were consistent with the predicted cell identities being enriched in each sorted cell population. For example, the embryonic haemoglobin gene cluster, erythroid marker gata1a, and endothelial marker fli1b show enhanced accessibility in the sox17M population (Fig. 2C, Supplemental Fig. S8), while markers of the posterior foregut (gata6 and foxa3), pancreas and duodenum (pdx1), and liver and intestine (fabp2) show enhanced accessibility in the sox17E population (Fig. 2D, Supplemental Fig. S8). However, we note that many more regions show enhanced accessibility in sox17E over sox17M, compared to sox17E over sox17N (Fig. 2A). This suggests that while there are differences in accessibility profiles that distinguish the sox17E population from sox17M, many regions of accessible chromatin in sox17E are nevertheless shared with cell types in the sox17N population rather than being unique to endoderm.
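Figure 2 reports ATAC-seq signal in counts per million reads (CPM). As a rough illustration of how library-size normalisation lets read densities be compared between sorted populations, the sketch below converts per-region read counts to CPM, averages duplicate libraries and computes a log2 fold change with a pseudocount. It is a minimal sketch under stated assumptions: the function names, the pseudocount of 1, the simple averaging of duplicates and the toy counts are all hypothetical, and it is not the statistical test used to call DARs at FDR ≤ 0.05, which would additionally need to model count variance across replicates.

```python
from math import log2
from statistics import mean


def cpm(counts):
    """Convert raw read counts per region to counts per million reads (CPM)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def mean_cpm(replicates):
    """Average CPM per region across replicate libraries (lists of equal length)."""
    normalised = [cpm(library) for library in replicates]
    return [mean(values) for values in zip(*normalised)]


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-region log2(mean CPM in A / mean CPM in B); the pseudocount avoids log(0)."""
    a, b = mean_cpm(group_a), mean_cpm(group_b)
    return [log2((x + pseudocount) / (y + pseudocount)) for x, y in zip(a, b)]


# Hypothetical counts for three regions in duplicate libraries of two sorted populations.
sox17E = [[300, 100, 600], [600, 200, 1200]]
sox17M = [[100, 100, 800], [200, 200, 1600]]
print([round(v, 2) for v in log2_fold_change(sox17E, sox17M)])
```

Because each library is scaled to one million reads before averaging, the second duplicate (sequenced twice as deeply in this toy example) contributes the same per-region CPM as the first.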

Figure 2. Sorted cell populations have distinct chromatin accessibility profiles indicative of constituent cell identities.


(A) K-means clustered heatmaps of relative ATAC-seq read densities at differentially accessible regions (DARs, FDR ≤ 0.05) for the comparisons indicated. Clusters are rank ordered from greatest to least accessibility in the sox17E population. (B) Venn diagrams indicating the number of sox17E > sox17N DARs overlapping sox17E > sox17M DARs at 28 and 48 hpf. Venn diagrams indicating overlaps of called peaks used in the analysis are depicted in Supplemental Figure S6, and the genomic distribution of DARs in Supplemental Figure S7. (C) Example loci indicating greater accessibility of erythrocyte marker genes (gata1a and haemoglobin gene cluster) and the endothelial marker fli1b in the sox17M population. (D) Example loci indicating greater accessibility of markers of the posterior foregut (gata6 and foxa3), pancreas and duodenum (pdx1), and liver and intestine (fabp2) in the sox17E population compared to sox17M and sox17N. Peak heights in counts per million reads (CPM) are indicated. Significant DARs in panel D are outlined in red. Also see Supplemental Figure S8 for zoomed-out tracks rescaled per condition to the strongest local peak, confirming the enrichment shown here.

To more globally ascertain whether the ATAC-seq profiles of the sorted populations are consistent with the expected constituent cell populations, we performed anatomical enrichment analysis on promoter regions that showed enhanced accessibility in the sox17E population. We specifically focused on promoter regions to avoid erroneous assignment of DARs to putative target genes. At the stages studied the endoderm predominantly consists of epithelial cells of the pharyngeal endoderm and intestinal rod, as well as the developing pancreas and liver primordia. As expected, we found strong enrichment for anatomical terms consistent with epithelial structures, gut, liver and pancreas for DARs more accessible in both sox17E>sox17M and sox17E>sox17N, especially at 48 hpf (Fig. 3A, Supplemental Tables 4-7), consistent with greater development of endodermal organs by this stage. We also found enrichment for notochord markers in sox17E>sox17M, consistent with transdifferentiated cells from KV incorporating into this structure (Supplemental Figs S3,9). Importantly, we find no such enrichment for endodermal terms for sox17N>sox17E or sox17M>sox17E DARs, confirming enrichment of endoderm in the sox17E populations (Supplemental Fig. S10). Conversely, we found strong enrichment for anatomical terms consistent with nervous system and eye development, but not endodermal tissues, for sox17N>sox17E promoter DARs (Supplemental Table 8). This is highly consistent with these structures being absent from the sox17E population, as expected. However, we also found strong enrichment for markers of neural structures in sox17E>sox17M, and of the lateral line system in both sox17E>sox17M and sox17E>sox17N (Fig. 3A).
This suggests that either cells arising from sox17+ endodermal progenitors and KV in the sox17E population exhibit gene accessibility (and consequently potential expression) signatures similar to those of the nervous and lateral line systems, or there is hitherto unrecognised sox17:GFP expression in these cell types. A caveat, however, is that promoter accessibility is not necessarily correlated with promoter activity (Starks et al. 2019). For example, a promoter may be accessible but actively silenced by transcriptional repressors. It is therefore possible that accessibility of neural gene promoters does not indicate the presence of neural cells in the sox17E population.

Figure 3. Promoter accessibility within the sox17E populations is consistent with presence of endodermal cell populations and lateral line neurons.


(A) Heatmap of -log10(adjusted P-value) from fishEnrichr Anatomy GeneRIF Predicted Z-score analyses of promoters showing greater accessibility in the comparisons indicated (FDR ≤ 0.05), as annotated by ChIPseeker. The key provides a continuous scale of -log10(adjusted P-value), with minimum and maximum significance indicated, along with the 50th percentile significance value. Selected terms are shown; complete fishEnrichr outputs are in Supplemental Tables 9-10. (B) Dorsal widefield images of a sox17:GFP embryo at 48 hpf. Nine focal layers have each been artificially coloured based on depth from dorsal to ventral as indicated. (C) 3D rendering of confocal images of a laterally orientated 48 hpf sox17:GFP embryo with head to the left and tail to the right. Scale bar = 100 µm. OC1 = organ of Corti 1, plg = posterior lateral line ganglion, primD = dorsal primordium, primII = second primordium, iv = intersegmental vessel, L = liver, P = pancreas, dC = cells within duct of Cuvier, pe = pharyngeal endoderm. Pharyngeal pouches are marked by white arrows.

To further explore whether there is GFP expression in the nervous and lateral line systems, we performed detailed imaging of sox17:GFP embryos. This revealed faint but detectable expression in a small set of cells with morphology consistent with a subset of lateral line neurons within the hindbrain and trunk (Fig. 3B,C). These include organ of Corti 1 (OC1), posterior lateral line ganglion (plg), dorsal primordium (primD) and second primordium (primII) (Pujol-Marti and Lopez-Schier 2013). We otherwise did not detect GFP expression in developing neural structures. Development of the posterior lateral line ganglion, nerve and projections has been shown to be affected by inhibition of retinoic acid (RA) signalling (Begemann et al. 2004). Consistent with expectations, we found that modulation of RA signalling through addition of RA itself, an inhibitor of Aldh1a-mediated RA synthesis (diethylaminobenzaldehyde – DEAB), or an inverse agonist of RA Receptor (BMS493) disrupted the formation of these sox17:GFP+ neurons in all cases (Supplemental Fig. S11). We therefore conclude that the sox17:GFP reporter is expressed in a subset of lateral line neurons, but not in other neural structures. However, to the best of our knowledge, endogenous sox17 expression has not been reported in the zebrafish lateral line, either in literature providing detailed in situ hybridisation analysis of sox17 (Chung et al. 2011) or in whole embryo single-cell RNA-seq timecourse datasets including Daniocell (Sur et al. 2023). These resources do, however, report endoderm and haematovascular sox17 expression. It is therefore possible that sox17:GFP reporter expression in the lateral line neurons is an artefact of the genomic integration site of the transgene rather than reflecting a true sox17 expression domain.

Overall, we conclude that we have profiled chromatin accessibility across two distinct sox17-expressing populations: sox17M, containing erythroid progenitors and a subset of endothelial cells, and sox17E, containing all endoderm plus KV-derived posterior cell types and lateral line neurons. Given that sox17E captures endodermal chromatin accessibility and thus active CRMs (although accessible sequences are not always active CRMs), we next wanted to determine which of these CRMs are likely to act during endoderm formation across the 400 million years of evolution separating zebrafish and humans.

Highly conserved CRMs functioning in both human and zebrafish endoderm cluster around genes controlling diverse aspects of endodermal and vertebrate-specific development

To determine which potential CRMs captured by our ATAC-seq data are recognisably functional in human endoderm, we compared our zebrafish data to existing ChIP-seq data for H3K27ac, a marker of active promoters and enhancers (Creyghton et al. 2010). Specifically, we utilised H3K27ac ChIP-seq data from human embryonic stem cells (hESCs) that had undergone efficient directed differentiation to either anterior foregut (AFG), posterior foregut (PFG) or midgut/hindgut (MHG) (Loh et al. 2014) (Supplemental Table 11). AFG principally gives rise to anterior structures including thyroid and lungs in humans, PFG to liver and pancreas, and MHG to small and large intestine.

We wanted to avoid eliminating DARs that are potentially functionally important in the endoderm, while enriching for genomic regions that are not likely to be controlling constitutively expressed genes. We therefore primarily focused our attention on DARs showing greater accessibility in sox17E relative to sox17M (sox17E>sox17M DARs). This is because our initial analyses indicate that relatively few DARs distinguish sox17E from the sox17N population, but that the majority that do also distinguish sox17E from sox17M (Fig. 2A,B).

To identify zebrafish sox17E DARs corresponding to human H3K27ac ChIP-seq peaks, we compared both datasets to highly conserved non-coding elements (HCNEs) exhibiting ≥70% sequence identity across ≥30 alignment columns (Engstrom et al. 2008). We identified a total of 5,000 HCNEs overlapping H3K27ac peaks in at least one of the AFG, PFG and MHG cell populations (Fig. 4A). In zebrafish, 7,435 HCNEs overlap sox17E>sox17M DARs while 236 HCNEs overlap sox17E>sox17N DARs, of which 112 are common to both sox17E>sox17M and sox17E>sox17N DARs (Fig. 4B). Of the 5,000 HCNEs overlapping human H3K27ac peaks, 1,701 also overlapped sox17E>sox17M DARs (Fig. 4C,D, Supplemental Figs S12-13, Supplemental Tables 12-13). These HCNEs occur proximal to genes with prominent roles in endoderm development, such as the regulator of gut development DLL1/dld (Pellegrinet et al. 2011; Troll et al. 2018) (Fig. 4E), the regulator of pancreas development FOXA2/foxa2 (Shin et al. 2008; Lee et al. 2019; Elsayed et al. 2021) (Fig. 4F) and the pancreatic and biliary regulator HES1/her6 (Fig. 4G) (Spence et al. 2009).
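The cross-species overlap logic in the paragraph above can be sketched in a few lines. This is an illustration under assumptions, not the pipeline used here: it assumes each HCNE carries one identifier with coordinates in both genomes (as in pairwise HCNE sets such as those of Engstrom et al. 2008), that human coordinates are tested against H3K27ac peaks and zebrafish coordinates against DARs, and that intervals are half-open. The function names and toy coordinates are hypothetical. An HCNE retained by both comparisons is then one whose identifier appears in both hit sets, analogous to the intersections in Fig. 4C.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(intervals):
    """Sort and merge (chrom, start, end) intervals into disjoint runs per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        runs = []
        for start, end in sorted(ivs):
            if runs and start <= runs[-1][1]:
                runs[-1][1] = max(runs[-1][1], end)  # extend the current run
            else:
                runs.append([start, end])
        merged[chrom] = ([s for s, _ in runs], [e for _, e in runs])
    return merged


def overlapping_ids(elements, peaks):
    """Return IDs of elements (id -> (chrom, start, end)) overlapping at least one peak."""
    merged = merge_by_chrom(peaks)
    hits = set()
    for element_id, (chrom, start, end) in elements.items():
        starts, ends = merged.get(chrom, ([], []))
        # Last merged run that starts before this element ends; runs are disjoint and sorted,
        # so it has the largest end of any run that could overlap.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits.add(element_id)
    return hits


# Toy HCNE set with (hypothetical) coordinates in each genome.
human = {"h1": ("chr1", 100, 150), "h2": ("chr1", 500, 560), "h3": ("chr2", 10, 40)}
zebrafish = {"h1": ("chr5", 1000, 1050), "h2": ("chr5", 2000, 2060), "h3": ("chr7", 300, 330)}
h3k27ac_peaks = [("chr1", 120, 200), ("chr2", 40, 80)]
sox17e_dars = [("chr5", 1040, 1100), ("chr7", 0, 310)]

shared = overlapping_ids(human, h3k27ac_peaks) & overlapping_ids(zebrafish, sox17e_dars)
print(sorted(shared))
```

Merging peaks first keeps each lookup to a single binary search; with half-open intervals, an element ending exactly where a peak starts (h3 in human) is not counted as overlapping.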

Figure 4. HCNE CRMs bearing the hallmarks of functionality in human and zebrafish endoderm are proximal to genes with deleterious pancreatic, neural and craniofacial phenotypes.


(A) Number of HCNEs overlapping human H3K27ac peaks in AFG, PFG and/or MHG patterned endoderm cell populations derived from hESCs. The number of HCNEs not overlapping H3K27ac peaks is shown at the bottom right. (B) Distribution of HCNEs overlapping sox17E>sox17M DARs vs sox17E>sox17N DARs at 28 and 48 hpf. The number of HCNEs not overlapping DARs is shown at the bottom right. (C) Distribution of HCNEs overlapping sox17E>sox17M DARs that also overlap human H3K27ac peaks in AFG, PFG and/or MHG patterned endoderm cell populations derived from hESCs. The number of HCNEs not overlapping H3K27ac peaks or DARs is shown at the bottom right. (D) Schematic of human HCNE subsets used for functional, anatomical and phenotypic enrichment analyses using GREAT. (E-G) Example orthologous loci bearing zebrafish HCNE DARs and human H3K27ac HCNE peaks. Normalized peak heights in counts per million reads are indicated. HCNEs overlapping peaks and DARs are displayed beneath and colour-coded to match the tracks. Grey boxes and lines indicate regions with HCNE matches between species. (H) Heatmap of enrichment of single gene mouse knockout phenotypes for collections of HCNEs as indicated in panel D. Significance values (-log10 HyperFdrQ) are shown both relative to the whole genome and relative to all HCNEs, to account for the non-random distribution of HCNEs. Discrete colour-coded values in the key indicate minimum and maximum significance, with 25th, 50th and 75th percentiles between, as indicated on the continuous scale. Terms are colour-coded according to anatomical categories as indicated.

To more globally analyse the targets of putative CRMs acting in both human and zebrafish endoderm, we used GREAT to perform functional and phenotypic annotation analyses on the 1,701 shared HCNEs (McLean et al. 2010). Putative target genes were annotated using the “basal plus extension” method, wherein each gene is assigned a basal regulatory domain of 5 kb upstream and 1 kb downstream of the transcription start site (regardless of other nearby genes). The gene regulatory domain is then extended in both directions to the nearest gene’s basal domain, but by no more than 1000 kb in each direction. To determine whether the HCNEs acting in AFG, PFG and MHG endoderm show distinct characteristic distribution patterns indicative of alternative target genes, we analysed subsets unique to each population as well as the complete combined set (Fig. 4D). Furthermore, since HCNEs are known to have non-random genomic distribution patterns with specific biases towards trans-dev genes, we tested whether the 1,701 HCNEs (hereafter referred to as “endoderm HCNEs”) showed enrichment around specific classes of gene (categorised according to function of gene products and their loss-of-function phenotypes), both overall and compared to all HCNEs (Fig. 4H, Supplemental Figs S14-18).
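The basal-plus-extension rule described above can be made concrete with a minimal single-chromosome sketch using the stated defaults (5 kb upstream, 1 kb downstream, extension of at most 1000 kb). It is an illustration, not GREAT's own code: it ignores chromosome ends and any curated regulatory domains, the gene names and coordinates are hypothetical, and it assumes that a region is associated with every gene whose regulatory domain it overlaps.

```python
from itertools import accumulate


def great_domains(genes, upstream=5_000, downstream=1_000, max_ext=1_000_000):
    """genes: (name, tss, strand) tuples on one chromosome.
    Returns name -> half-open (start, end) regulatory domain (basal plus extension)."""
    genes = sorted(genes, key=lambda g: g[1])
    basal = []
    for _, tss, strand in genes:
        if strand == "+":
            basal.append((tss - upstream, tss + downstream))
        else:  # on the minus strand, "upstream" lies at higher coordinates
            basal.append((tss - downstream, tss + upstream))
    # Nearest basal-domain edge to the left (running max of ends of earlier genes)
    # and to the right (suffix min of starts of later genes).
    prev_end = [None] + list(accumulate((b[1] for b in basal), max))[:-1]
    suffix_min = list(accumulate((b[0] for b in reversed(basal)), min))[::-1]
    next_start = suffix_min[1:] + [None]
    domains = {}
    for (name, tss, _), (b_start, b_end), left, right in zip(genes, basal, prev_end, next_start):
        start = tss - max_ext if left is None else max(tss - max_ext, left)
        end = tss + max_ext if right is None else min(tss + max_ext, right)
        # The basal domain is always kept, even if a neighbour's basal domain overlaps it.
        domains[name] = (max(0, min(b_start, start)), max(b_end, end))
    return domains


def associated_genes(region, domains):
    """Genes whose regulatory domain overlaps the half-open (start, end) region."""
    r_start, r_end = region
    return {name for name, (d_start, d_end) in domains.items() if r_start < d_end and r_end > d_start}


domains = great_domains([("geneA", 100_000, "+"), ("geneB", 200_000, "-"), ("geneC", 2_000_000, "+")])
print(domains)
print(sorted(associated_genes((150_000, 150_100), domains)))
```

In this toy layout a region between geneA and geneB falls within both extended domains and is assigned to both, which is how a single HCNE can contribute to the annotation of more than one putative target gene.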

Consistent with expectations, our analyses reveal a highly significant association between endoderm HCNEs and genes encoding DNA-binding and gene regulatory proteins, including HMG domain-containing and chromatin binding factors. Notably, however, the endoderm HCNEs appear to show a greater enrichment of such genes compared to all HCNEs (Supplemental Fig. S14). Furthermore, both Biological Process Gene Ontology terms, and mammalian phenotype terms associated with endoderm HCNE subsets from AFG, PFG and MHG cells are broadly consistent with their anterior-posterior location within the vertebrate body plan. For example, putative target genes of HCNEs shared between AFG H3K27ac peaks and sox17E DARs are significantly associated with GO Biological Process Terms, mouse phenotypes and human phenotypes focused on anterior structures including brain, ear and craniofacial structures (Fig. 4H, Supplemental Figs S15,17,18). Similarly, putative targets of PFG H3K27ac peaks and sox17E DARs are associated with pancreas development and pancreatic abnormalities in knock-out mice, and also notably cardiac defects in humans (Fig. 4H, Supplemental Fig. S18). Conversely, putative targets of MHG H3K27ac peaks and sox17E DARs appear to be largely associated with formation of tube and ductal structures such as neural tube closure, endolymphatic duct and kidney formation, as well as vertebral column formation (Fig. 4H, Supplemental Figs S15,17,18). This may be indicative of common gene regulatory programmes governing tube formation including the gut, and CRM accessibility potentially being broadly coordinated regionally along the anterior-posterior axis.
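The comparison against all HCNEs can be illustrated with a gene-based hypergeometric test followed by Benjamini-Hochberg correction, the kind of statistic summarised by the HyperFdrQ values in Fig. 4H. In this sketch the background population is assumed to be all genes associated with any HCNE rather than all genes in the genome; the function names and the counts in the usage line are hypothetical, and GREAT's exact implementation may differ.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)  # exact integer ratio converted to float


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical: 2,000 genes near any HCNE (background), 300 annotated with a term,
# 400 genes near endoderm HCNEs, of which 90 carry the term (60 expected).
p = hypergeom_sf(90, population=2_000, successes=300, draws=400)
print(p, benjamini_hochberg([p, 0.2, 0.03]))
```

Swapping the background from all genes to HCNE-associated genes is what distinguishes the two significance columns: a term can be enriched relative to the genome simply because HCNEs cluster near trans-dev genes, but remain enriched relative to all HCNEs only if the endoderm subset is further biased.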

We note that many of the putative targets of HCNE CRMs identified in human and zebrafish endoderm function in nervous system development (Fig. 4H, Supplemental Figs S15,17,18). For example, FOXA2/foxa2 functions both in pancreas development and in formation of the floor plate of the neural tube (Ang and Rossant 1994; Brand et al. 1996; Shin et al. 2008; Dal-Pra et al. 2011). Similarly, Hes1 knock-out mice exhibit nervous system defects including premature neurogenesis and severe neural tube defects, increased numbers of pulmonary neuroendocrine cells, and pancreatic hypoplasia (Ishibashi et al. 1995; Ito et al. 2000; Jensen et al. 2000), while her6 regulates cell proliferation in the zebrafish hindbrain (Coolen et al. 2012). As we discuss later, it is therefore highly likely that the significant enrichment for terms associated with the nervous system can be attributed to common neuro-pancreatic gene regulatory programmes operating across the developing nervous system and pancreas. Overall, we conclude that we have identified a compendium of HCNEs likely to control gene expression during early organogenesis across vertebrate evolution.

HCNEs bearing hallmarks of functionality in human and zebrafish endoderm are enriched for binding sites of endoderm transcription factors

To determine which transcription factors are likely to be acting via the endodermal HCNEs, we performed de novo and known TFBS enrichment analyses in the human and zebrafish sequences (Fig. 5A-C). We particularly focused on the PFG population given its role in pancreas development and the strong association between putative HCNE target genes and pancreas development in our previous analyses (Fig. 4H). TFBSs are often degenerate, with multiple TFs potentially able to bind the same site. Indeed, many of the TFBSs identified by our analyses are highly similar (Supplemental Fig. S19). We therefore also analysed expression of the candidate TFs corresponding to the enriched TFBSs in RNA-seq data from the human AFG, PFG and MHG populations (Loh et al. 2014). This revealed expression of some TFs in the same endoderm populations that exhibit enrichment of their corresponding TFBSs in HCNEs (Fig. 5C). For example, TFBSs for SOX21 and MEIS1 show greater enrichment in AFG and PFG HCNEs than MHG HCNEs, and SOX21 and MEIS1 are also more highly expressed in AFG and PFG. Similarly, HOXA11, HOXD11 and HOXA13 TFBSs show greater enrichment in MHG HCNEs, consistent with greater expression of these TFs in MHG endoderm. However, we also find strong enrichment for TFBSs of multiple TFs known to have key functions in the endoderm despite the TFs showing minimal or no expression in AFG, PFG or MHG RNA-seq datasets (Fig. 5C). For example, ASCL1, ASCL2, PTF1A, PDX1, NEUROD1 and NKX6.1 are all known or suggested to have roles in pancreatic endoderm development (Offield et al. 1996; Naya et al. 1997; Krapp et al. 1998; Sander et al. 2000; Yee et al. 2001; Kawaguchi et al. 2002; Dong et al. 2008; Binot et al. 2010; Flasse et al. 2013; Duque et al. 2022; Vanheer et al. 2023; Zhu et al. 2023), and their TFBSs are enriched in PFG HCNEs even though their expression is low or undetectable in PFG endoderm.
However, we find strong expression of these TF genes in RNA-seq data from hESCs further differentiated beyond the PFG stage to pancreatic progenitors (PPs). This suggests that the HCNE CRMs through which these TFs act bear functional marks before these TFs are expressed. Notably, TFBSs of multiple SOX family transcription factors, including SOX2, 3, 4, 6, 9, 15, 17 and 21, are enriched in endoderm HCNEs, and these TFs show appreciable expression in PFG and PP populations; the same holds for FOXA1 and FOXA2. Given prior evidence that SOX and FOX transcription factors often have pioneer activity (Kamachi and Kondoh 2013; Iwafuchi-Doi et al. 2016; Julian et al. 2017; Fuglerud et al. 2022), it is tempting to speculate that earlier expression of these TFs may render endodermal HCNEs accessible for subsequent binding by later-expressed pancreatic TFs such as PTF1A. Indeed, interrogation of published ChIP-seq datasets reveals a subset of endodermal HCNEs that are occupied by FOXA2 at multiple timepoints prior to PTF1A expression and subsequently by PTF1A. Notably, the summits of these ChIP-seq peaks lie directly at the HCNEs, strongly suggesting that the HCNEs are key to TF recruitment (Supplemental Fig. S20).
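Known-motif enrichment analyses of this kind rest on scoring sequences against position weight matrices (PWMs). The sketch below illustrates log-odds scoring and scanning of both strands; it is not HOMER's implementation, and the toy GATA-like count matrix, uniform background, pseudocount and score threshold are all assumptions chosen for illustration. Because degenerate or similar motifs (such as those in Supplemental Fig. S19) can score highly on the same k-mers, a site passing the threshold does not by itself identify which TF binds it.

```python
from math import log2

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """counts: one {base: count} dict per motif position -> per-position log2-odds scores."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: log2((column.get(b, 0) + pseudocount) / total / background[b]) for b in BASES})
    return pwm


def scan(pwm, sequence, threshold):
    """Score every window on both strands; return (offset, strand, score) hits >= threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - width + 1):
        forward = sequence[i:i + width]
        if any(b not in BASES for b in forward):
            continue  # skip windows containing N or other ambiguity codes
        reverse = forward.translate(COMPLEMENT)[::-1]
        for strand, kmer in (("+", forward), ("-", reverse)):
            score = sum(column[base] for column, base in zip(pwm, kmer))
            if score >= threshold:
                hits.append((i, strand, round(score, 3)))
    return hits


# Toy GATA-like motif: ten observations of G, A, T, A at successive positions.
gata = log_odds_pwm([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan(gata, "TTGATATT", threshold=6.0))  # forward-strand match at offset 2
print(scan(gata, "AATATCAA", threshold=6.0))  # reverse-strand match (TATC) at offset 2
```

With this matrix a single mismatch drops the score well below the threshold, so only exact GATA sites are reported; real analyses typically calibrate thresholds per motif against background sequences instead.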

Figure 5. Motif enrichment analysis reveals candidate transcription factors acting via endoderm HCNE CRMs.


(A-B) High-confidence de novo motifs enriched in HCNEs overlapping both human PFG H3K27ac peaks and sox17E>sox17M DARs at 28 and/or 48 hpf, for the human (A) and zebrafish (B) HCNE sequences. P-values and closest-match vertebrate transcription factors assigned by HOMER are indicated. (C) Hierarchical clustering of enrichment of known motifs in human and zebrafish endoderm HCNEs overlapping both sox17E>sox17M DARs at 28 and/or 48 hpf and human AFG, PFG and MHG H3K27ac peaks, as indicated. Enrichment is shown for both human and zebrafish HCNE sequences. RNA-seq expression of transcription factor genes corresponding to enriched motifs is shown as FPKM (Fragments Per Kilobase of transcript per Million mapped reads) for the human and zebrafish datasets, as indicated. AFG = anterior foregut; PFG = posterior foregut; MHG = mid/hindgut; PP = pancreatic progenitors. Zebrafish have multiple orthologues of mammalian TF genes owing to the additional whole-genome duplication and other smaller duplications during teleost evolution. Where present, a, b and c paralogues are depicted. *Although the enriched motif is FOXA2:EBOX, the FPKM heatmap indicates FOXA2 expression alone.
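For readers less familiar with the unit, FPKM as used in the Fig. 5C heatmaps normalises a gene's fragment count by both transcript length (per kilobase) and sequencing depth (per million mapped fragments). The minimal sketch below illustrates only that arithmetic; the gene names, lengths and counts are hypothetical, and the plotted values come from the cited RNA-seq datasets and their own processing pipelines, which may differ in details such as how transcript length is defined.

```python
# Minimal FPKM sketch (illustrative only; not the pipeline behind Fig. 5C).
# FPKM = fragments assigned to gene * 1e9 / (gene length in bp * total mapped fragments)


def fpkm(counts: dict[str, int], lengths_bp: dict[str, int], total_mapped: int) -> dict[str, float]:
    """Return FPKM per gene.

    counts       -- fragments assigned to each gene
    lengths_bp   -- transcript length of each gene in bp (assumed known)
    total_mapped -- total mapped fragments in the library (the "per million" denominator)
    """
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy numbers
    values = fpkm({"A": 500, "B": 300}, {"A": 2000, "B": 500}, total_mapped=2_000_000)
    print(values)  # {'A': 125.0, 'B': 300.0}
```

Gene B receives fewer fragments than gene A but, being four times shorter, ends up with the higher FPKM, which is why length normalisation matters when comparing TF genes of different sizes.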

PFG HCNEs have characteristic patterns of TFBS co-occurrence, but show relatively limited evidence for consistent rigid grammatical constraint

Given that we identify TFBSs of major regulators of pancreas development in PFG HCNEs in both species, we conclude that these highly conserved putative CRMs are likely to be important for pancreas development. Conservation of the HCNE sequences between humans and zebrafish indicates that arrangements of putative TFBSs have remained largely consistent across 400 million years of evolution. Constrained TFBS “grammar” (the arrangement, spacing and orientation of TFBSs) has historically been taken as support for the “enhanceosome” model of CRMs, in which rigid TFBS grammar is required for correct assembly of TF complexes (reviewed in Long et al. 2016; Jindal and Farley 2021). We wanted to determine whether PFG HCNEs contain consistent sets of grammatically constrained TFBSs, which might indicate that consistent TF complexes act across subsets of HCNEs. We therefore analysed both the co-occurrence of the enriched TFBSs within HCNEs and whether co-occurring TFBSs showed significantly consistent spacing patterns (Fig. 6, Supplemental Fig. S21).
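The pairwise co-occurrence analysis can be pictured as counting, for every pair of motifs, how many HCNEs contain at least one site for both. The minimal sketch below assumes that motif scanning (at either a permissive or a stringent score cut-off) has already reduced each HCNE to a set of motif names; the HCNE identifiers and motif sets are hypothetical, and the published analysis may additionally compare such counts against a background expectation.

```python
from collections import Counter
from itertools import combinations


def pairwise_cooccurrence(hcne_motifs: dict[str, set[str]]) -> Counter:
    """Count, for each unordered motif pair, the HCNEs containing both motifs.

    hcne_motifs maps an HCNE identifier to the set of motifs detected in it;
    the detection cut-off (permissive vs stringent) is decided upstream.
    """
    counts: Counter = Counter()
    for motifs in hcne_motifs.values():
        # sorted() makes each pair canonical, so (A, B) and (B, A) are tallied together
        for pair in combinations(sorted(motifs), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy input
    toy = {
        "hcne_1": {"PTF1A", "PBX2", "SOX9"},
        "hcne_2": {"PTF1A", "PBX2"},
        "hcne_3": {"SOX9"},
    }
    for pair, n in pairwise_cooccurrence(toy).most_common():
        print(pair, n)
```

Running the same count on human and zebrafish HCNE sets and comparing the resulting tables is one simple way to ask whether co-occurrence patterns are shared between species; note that co-occurrence alone says nothing about spacing or orientation.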

Figure 6. Consistent 12 bp spacing of homeodomain-bHLH TFBS pairs is significantly enriched for a small subset of orthologous HCNEs.


(A) Matrix of TFBS motifs from Figure 5 and Supplemental Figure S19 showing consistent spacing in SpaMo analysis. Cells in the upper matrices are colour-coded according to best orientation, i.e. the strand and position of the secondary motif relative to the primary motif for the most significant spacing; significant spacing may also be present in other orientations. Numbers in cells indicate the significantly enriched motif spacing in bp. Black numbers indicate significant spacing in only one of human and zebrafish; gold numbers indicate motif pairs significant for HCNEs of both species. Cells in the lower matrices are colour-coded based on significance of TFBS spacing. Numbers in cells indicate the number of occurrences of that spacing in significant orientations defined by SpaMo. Homeodomain-bHLH pairs are indicated in bold black boxes. (B) Distance and orientation plots of the two TFBS pairs identified as significant in both human and zebrafish HCNEs. Quadrants within individual plots are as labelled on the figure. Red lines within graphs indicate secondary TFBS locations contributing to significant spacing from the primary TFBS (depicted in the centre), while grey bars indicate all other instances of the secondary TFBSs. (C) Bar graphs indicating cumulative numbers of HCNEs containing homeodomain-bHLH TFBS pairs exhibiting 12 bp spacing. Bar segments are annotated with the panel A TFBS pairs through which the HCNEs were identified. The numbers of orthologous HCNEs shared between segments are indicated between the bars. The genomic locations of the 6 HCNEs indicated are shown in Supplemental Figure S22.

There are well-characterised examples of “suboptimisation” of TFBSs within developmental CRMs, where low affinity TFBSs tune target gene expression, preventing deleterious ectopic expression or overexpression (Farley et al. 2015; Farley et al. 2016). However, the position frequency matrices used for most motif analyses are derived from binding experiments (e.g. ChIP-seq) that are most indicative of high affinity TFBSs. To avoid excluding potentially important low affinity TFBSs, we analysed co-occurrence using both permissive and stringent cut-offs for identification of individual TFBSs. Our results indicate strong pairwise co-occurrence of specific TFBSs throughout PFG HCNEs, while some TFBS combinations are rare (Supplemental Fig. S21). As expected, patterns of TFBS co-occurrence are very similar between human and zebrafish PFG HCNEs. This suggests that the same sets of TFs are likely capable of acting via the same HCNE CRMs in both species. However, co-occurrence only indicates whether TFBSs consistently occur in the same HCNEs, not whether they show consistent spacing or orientation. We therefore next tested whether TFBS pairs have consistent arrangement and spacing, potentially indicative of constrained binding of TF complexes. While we find 16 TF pairs showing consistent arrangement in human PFG HCNEs and 15 in the homologous zebrafish HCNEs (P < 1 × 10⁻⁵), only PTF1A-PBX2 and MEIS1-ASCL2 were identified in both species (Fig. 6A,B). Notably, the significant spacing of both the PTF1A-PBX2 and MEIS1-ASCL2 TFBS pairs was 12 bp in both species. PTF1A and ASCL2 are both basic helix-loop-helix (bHLH) factors with very similar DNA-binding preferences, while PBX2 and MEIS1 are both homeodomain transcription factors that also bind sequences similar to each other (Supplemental Fig. S19). Furthermore, multiple TF pairs showing significantly consistent spacing patterns in only one of the two species are homeodomain-bHLH pairs exhibiting 12 bp spacing (Fig. 6A,B). Identification of the PFG HCNEs for each of the significant 12 bp-spaced homeodomain-bHLH TF pairs reveals substantial overlaps between HCNE sets both within and between species (Fig. 6C, Supplemental Fig. S22). This suggests that subtle differences between highly similar motifs may dictate whether significance is achieved in one or both species, and for which of the potential homeodomain-bHLH motif pairs. It is therefore possible that multiple different combinations of homeodomain-bHLH TF pairs are capable of binding these sites.
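To make the spacing analysis concrete, the sketch below tallies, across a set of sequences, the edge-to-edge gap, side (upstream or downstream) and relative strand of secondary-motif sites around each primary-motif site, which is the kind of summary displayed in the Fig. 6B distance and orientation plots. It is a deliberate simplification, not SpaMo: it uses exact consensus words rather than position frequency matrices, measures side along the input strand, and omits the statistical test that SpaMo applies to decide whether a given spacing is enriched. The toy sequences and the TGACAG homeodomain-like word are hypothetical.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, word: str) -> list[tuple[int, int, str]]:
    """Exact, possibly overlapping matches of `word` on both strands as (start, end, strand).

    A reverse-complement palindrome matches both strands at the same position;
    such a site is reported once, as '+'.
    """
    hits, seen = [], set()
    for strand, w in (("+", word), ("-", revcomp(word))):
        for m in re.finditer(f"(?={re.escape(w)})", seq):
            start, end = m.start(), m.start() + len(w)
            if (start, end) not in seen:
                seen.add((start, end))
                hits.append((start, end, strand))
    return hits


def spacing_counts(seqs: list[str], primary: str, secondary: str, max_gap: int = 30) -> Counter:
    """Tally (gap_bp, side, orientation) for secondary sites around each primary site.

    gap_bp counts the bases between the two sites' nearest edges; side is measured
    along the input sequence; overlapping pairs are skipped; no significance test.
    """
    counts: Counter = Counter()
    for seq in seqs:
        secondary_sites = find_sites(seq, secondary)
        for p_start, p_end, p_strand in find_sites(seq, primary):
            for s_start, s_end, s_strand in secondary_sites:
                if s_start >= p_end:
                    gap, side = s_start - p_end, "downstream"
                elif s_end <= p_start:
                    gap, side = p_start - s_end, "upstream"
                else:
                    continue
                if gap <= max_gap:
                    orientation = "same" if s_strand == p_strand else "opposite"
                    counts[(gap, side, orientation)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy HCNEs: a CAGCTG bHLH core with a homeodomain-like word 12 bp downstream
    toy = [
        "GG" + "CAGCTG" + "A" * 12 + "TGACAG" + "GG",
        "GG" + "CAGCTG" + "A" * 12 + "CTGTCA" + "GG",
    ]
    print(spacing_counts(toy, primary="CAGCTG", secondary="TGACAG"))
```

A spacing that recurs across many HCNEs (here, 12 bp downstream) would stand out as a peak in such a tally; deciding whether that peak is significant requires a background model, which is what dedicated tools provide.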

To determine whether TFs corresponding to 12 bp-spaced homeodomain-bHLH TFBS pairs are co-expressed in the developing endoderm, we analysed published zebrafish single-cell RNA-seq data (Farrell et al. 2018; Sur et al. 2023). This revealed that, while the homeodomain TFs are more broadly expressed than the bHLH TFs in the developing endoderm, multiple homeodomain-bHLH TF pairs are nevertheless clearly co-expressed in different endoderm subpopulations. These include meis1a/b with tcf12 in pharyngeal pouch endoderm, ptf1a with meis1a/b in exocrine pancreas progenitors, and the ASCL2 homologue ascl1a with pdx1, pbx2 and meis1a in intestinal secretory cell progenitors (Supplemental Fig. S23). It is therefore possible that these homeodomain-bHLH TF pairs co-regulate a subset of target genes in the developing endoderm. However, it is also possible that other TF pairs capable of binding the same sequences act via these sites.

The bHLH TFBSs in question have a core consensus of CAGCTG and are therefore essentially palindromic, while the homeodomain TFBSs have more definable directionality (Supplemental Fig. S19, Fig. 6B). Our analyses reveal that the orientation of the homeodomain TFBSs is typically (but not always) consistent relative to the bHLH TFBS (Fig. 6A,B). It is therefore possible that this orderly arrangement of TFBSs is critical to TF complex engagement or competition at these sites. However, to the best of our knowledge no physical interactions have yet been identified for the homeodomain-bHLH TF pairs showing significantly consistent spacing.
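Whether a site is palindromic in the sense used here can be checked by comparing it with its reverse complement. In the minimal sketch below, CAGCTG comes from the text; TGACAG is included only as an illustrative, hypothetical homeodomain-like core for contrast (the actual motifs are shown in Supplemental Fig. S19).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(site: str) -> str:
    """Reverse complement of an ACGT string."""
    return site.upper().translate(_COMPLEMENT)[::-1]


def is_revcomp_palindrome(site: str) -> bool:
    """True if the site reads identically on both DNA strands."""
    return site.upper() == revcomp(site)


if __name__ == "__main__":
    for site in ("CAGCTG", "TGACAG"):
        print(site, revcomp(site), is_revcomp_palindrome(site))
    # CAGCTG CAGCTG True
    # TGACAG CTGTCA False
```

Because the bHLH core reads the same on both strands, it carries no orientation of its own, so any consistent orientation in a homeodomain-bHLH pair can only be read from the homeodomain site.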

Overall, our analyses reveal consistent patterns of TFBS co-occurrence in PFG HCNEs between humans and zebrafish, potentially indicative of conserved cooperative gene regulatory programmes. However, there is only limited evidence for highly consistent CRM grammar either across sets of HCNEs or between species: consistent arrangement of TFBS pairs is apparent only for small subsets of HCNEs within each species, and only 12 bp-spaced homeodomain-bHLH TFBS pairs are observed in both species, in six orthologous HCNEs. This potentially suggests either that few of the TFs with individually enriched TFBSs act in complexes with each other, or that the complexes are flexible in terms of combined recognition of target TFBSs.

HNF1B HCNE CRMs identified in endoderm drive expression in the developing hindbrain

We next wanted to determine whether the HCNE CRMs we identified in human and zebrafish endoderm can drive expression in the predicted endodermal tissues. Given our observation that many HCNE CRMs identified in zebrafish and human endoderm are proximal to genes that govern both pancreas and nervous system development, we chose to focus on putative HCNE enhancers of such a gene to determine whether the HCNEs are similarly capable of driving endoderm and neural gene expression. Furthermore, increasing evidence suggests that some cases of many disorders, including diabetes and pancreatic agenesis, are likely to be driven by genetic alteration of CRMs (Weedon et al. 2014; Claringbould and Zaugg 2021; Miguel-Escalada et al. 2022). We therefore also wanted to examine HCNE CRM function at a gene where coding mutations are already known to cause diabetes. Notably, our analyses indicate HCNE CRMs at the monogenic diabetes gene HNF1B and its zebrafish orthologue hnf1ba, both of which are expressed in posterior foregut endoderm (Sun and Hopkins 2001; Loh et al. 2014). HNF1B haploinsufficiency causes maturity-onset diabetes of the young (MODY), with features including pancreatic atrophy, while homozygous disruption of hnf1ba in zebrafish leads to a failure of pancreas development during embryogenesis (Fajans et al. 2001; Sun and Hopkins 2001; Quilichini et al. 2021). However, while HNF1B haploinsufficiency in humans also causes renal defects, developmental defects in the zebrafish renal system in hnf1ba mutants are largely mitigated by co-expression of the hnf1ba paralogue hnf1bb in the developing pronephros (Naylor et al. 2013). This is in stark contrast to the developing endoderm, where hnf1bb is not substantially expressed and does not compensate for loss of hnf1ba (Sun and Hopkins 2001). Our sox17E ATAC-seq data indicate regions of accessibility containing HCNEs in both introns 4 and 5, while the homologous human HCNEs are within PFG H3K27ac peaks in the same introns (Fig. 7A).
These HCNEs have not been maintained in hnf1bb after duplication of the ancestral hnf1b gene (Supplemental Fig. S24). Differences in expression at the hnf1ba/b loci may therefore be due to the function of the HNF1B/hnf1ba HCNEs. Consequently, we prioritised HNF1B/hnf1ba intron 4 and 5 HCNEs for analysis, predicting that they would govern the more prominent expression of HNF1B/hnf1ba in posterior foregut and its derivatives, which hnf1bb lacks. However, hnf1ba also exhibits a much broader hindbrain expression domain than hnf1bb (Choe et al. 2008), potentially suggesting the HCNEs present in hnf1ba but not hnf1bb may also govern hindbrain expression domains. Nevertheless, interrogation of published ChIP-seq data from hESCs differentiated in vitro to pancreatic progenitors indicates the HNF1B intron 5 HCNE is bound by PDX1 – a key regulator of pancreatic development – suggesting a likely function in pancreas development (Supplemental Fig. S25). Furthermore, TF genes corresponding to co-occurring TFBSs in the intron 4 HCNEs show marked co-expression with hnf1ba in developing endoderm (Supplemental Fig. S26).

Figure 7. HNF1B/hnf1ba HCNE CRMs bearing the hallmarks of functionality in zebrafish endoderm and human PFG endoderm are sufficient to drive reporter expression in the developing hindbrain but not endoderm.


(A) H3K27ac ChIP-seq and ATAC-seq data at the human HNF1B and zebrafish hnf1ba loci, respectively. HCNEs overlapping H3K27ac peaks and DARs are indicated in corresponding colours beneath tracks. Normalized peak heights in counts per million reads are indicated. HCNE and putative enhancer sequences cloned for reporter assays are outlined. (B) Lateral view of mCherry reporter expression driven by the i4Enh +6-8kb enhancer at 48 hpf in sox17:GFP embryos injected with the reporter construct at the 1-cell stage. Maximum intensity z-projections of 27 z-slices from confocal images. (C) 3D rendering of confocal images showing 10 z-slices at a time, progressing through the embryo along the z-axis from the left to the right side. D = dorsal, R = right, P = posterior, V = ventral, L = left, A = anterior. Scale bars = 100 µm.

In addition to the intron 4 and 5 HCNEs, there is also a sox17E ATAC-seq peak 3 kb upstream of the transcription start site that does not harbour HCNEs. To determine whether these putative enhancer regions and the HCNEs are capable of driving expression in foregut endoderm, we tested their ability to drive mCherry reporter expression in sox17:GFP embryos. For intron 4, the tested regions included the entire accessible zebrafish region (i4Enh +6-8kb), discrete elements on the flanks of the intron 4 accessible region lacking HCNEs (i4Enh +6kb and i4Enh +8kb), and the HCNE cluster alone (i4zHCNE). We also tested the equivalent human intron 4 HCNE cluster (i4hHCNE). For intron 5, we tested discrete regions containing the zebrafish and human HCNEs (i5zHCNE and i5hHCNE, respectively). We also tested the upstream accessible region (Enh -3kb). Each reporter construct also contained the crystallin, alpha a (cryaa) promoter upstream of gfp, which drives GFP expression in the lens of the eye and thus provides a constitutive marker to control for injection. We imaged the embryos regularly up to and including 48 hpf, consistent with the final ATAC-seq timepoint. Except for Enh -3kb, all constructs drove expression in multiple tissues, with the most consistent mCherry expression seen in the hindbrain (Fig. 7B,C, Supplemental Figs S27-28, Supplemental Tables 14-15). None of the reporter constructs yielded mCherry co-expression with sox17:GFP, indicating that they are not individually capable of driving expression in the endoderm (or lateral line neurons). All constructs containing HCNEs drove expression in the developing hindbrain (i4zHCNE – 62.2% of embryos; i4hHCNE – 87.5%; i4Enh +6-8kb – 28.6%; i5zHCNE – 28.6%; i5hHCNE – 88.9%; Fig. 7B,C, Supplemental Fig. S28, Supplemental Tables 14-15). The hindbrain expression domain was typically discrete and did not overlap with the forebrain marker otx2b:Venus (Supplemental Fig. S29).
This hindbrain expression driven by HNF1B/hnf1ba HCNEs is consistent with known expression of the endogenous genes in developing rhombomeres (Pouilhe et al. 2007; Choe et al. 2008). However, other sequences appear to be required for endoderm expression.

We note that hnf1ba intron 5 exhibits chromatin accessibility in the sox17M population, and both i5zHCNE and i5hHCNE consistently drive reporter expression in the zebrafish heart as well as the hindbrain (Fig. 7A, Supplemental Fig. S27, Supplemental Table 15). We previously showed that sox17:GFP is co-expressed with kdrl:mCherry in the developing heart, consistent with known sox17 expression in cardiac precursors in the lateral plate mesoderm (Chung et al. 2011; Johal et al. 2025). Expression of hnf1ba has also been reported in the heart (Chi et al. 2008b). The intron 5 HCNE enhancer therefore potentially regulates both the cardiac and neural expression domains of hnf1ba, but is insufficient for pancreas expression despite bearing the hallmarks of functionality in endoderm.

To explore why the studied reporter constructs were incapable of reproducing the hnf1ba endoderm expression pattern, we further examined both our zebrafish ATAC-seq data and published human endoderm functional genomics data (Lee et al. 2019). This revealed binding of transcription factors FOXA2, GATA4/6, and HNF1B itself in non-HCNE genomic regions within HNF1B introns 1, 4, 5, 6 and 8, and also in proximal intergenic regions (Supplemental Fig. S30). We also note FOXA2 and HNF1B binding at a HCNE in intron 1 (Supplemental Fig. S30). However, this HCNE did not correspond to a significant DAR in the zebrafish ATAC-seq data (Fig. 7).

FOXA2 is required for enhancer priming during pancreatic differentiation from hESCs (Lee et al. 2019), while haploinsufficiency of GATA6 and GATA4 compromises pancreatic progenitor differentiation (Shi et al. 2017), consistent with GATA4 and GATA6 being implicated in neonatal diabetes and pancreatic agenesis (Allen et al. 2011; Shaw-Smith et al. 2014). The presence of these critical TFs at non-HCNE CRMs at HNF1B, even though the HCNEs coincide with H3K27ac and ATAC-seq peaks and PDX1 binding (Supplemental Fig. S30), suggests that its endodermal expression domain is regulated by additional CRMs that are not highly conserved between zebrafish and humans. Interestingly, zebrafish sox17E ATAC-seq peaks are also observed in intron 8 at a location similar to that of human FOXA2/GATA4/GATA6 binding, as well as in additional upstream regions (Supplemental Fig. S31). It is therefore possible that the locations of key CRMs have remained roughly conserved, while TFBS grammar within these CRMs has not.

Overall, we identify a subset of HCNEs bearing hallmarks of functionality in developing zebrafish and human endoderm during early organogenesis. These HCNEs are strongly enriched for TFBSs of factors critical during endodermal organ development and are strongly associated with genes controlling craniofacial and pancreas development. However, the putative HCNE target genes governing pancreas development also have key roles in the developing nervous system, and the HCNE enhancers at the HNF1B/hnf1ba locus are sufficient to drive expression only in the neural and cardiac, not the endodermal, expression domains of HNF1B/hnf1ba. The present evidence suggests that the HCNEs bear key hallmarks of functionality during endoderm development, including accessibility and H3K27ac, but must operate within a complex regulatory landscape to influence endodermal expression.

Discussion

Our initial intention was to identify putative CRMs functioning in zebrafish endoderm during early organogenesis, before identifying those with clear homology to CRMs functioning in human endoderm. A potential risk in our zebrafish endoderm enrichment strategy prior to ATAC-seq is the use of sox17:GFP reporter zebrafish. Endogenous sox17 expression in endoderm diminishes rapidly at the end of gastrulation, and our strategy therefore relies on persistence of GFP protein throughout the endoderm until the developmental timepoints we analysed. It is therefore possible that insufficient GFP remained in some endodermal cells for them to be sorted into the sox17E population. However, imaging of GFP+ endodermal structures (Figs 1, 3) and identification of strong sox17E-specific DARs at key endoderm markers (Fig. 2) suggest that adequate GFP remains in the endoderm. While our strategy of using transgenic reporter zebrafish to enrich for the developing endoderm was successful, we found that relatively few genomic regions are uniquely accessible in the endoderm. Rather, while many genomic regions exhibit enhanced accessibility in the sox17E population over the sox17M haematovascular cells, the majority of these regions are similarly accessible in the sox17N population. This suggests that any given putative endoderm CRM is likely to also be accessible in some other non-endodermal cell types. Consistent with this, amongst the enhancers identified in zebrafish and human endoderm, the HCNE enhancer in intron 4 of HNF1B/hnf1ba can drive hindbrain expression, while the intron 5 enhancer drives both hindbrain and cardiac expression. In general, endoderm CRMs therefore appear unlikely to be endoderm-specific. Despite being identified in endoderm, neither HNF1B/hnf1ba HCNE enhancer is individually capable of driving endodermal gene expression. This is most likely because additional regulatory sequences are required to permit endodermal gene expression.
Future analysis of HNF1B/hnf1ba regulation should therefore consider the combined actions of multiple CRMs, including analysis of whether the cognate HNF1B/hnf1ba promoters are required for the HCNE enhancers to influence endodermal gene expression. Analysis of chromatin accessibility and histone modifications at the single-cell level within developing tissues would also potentially allow identification of unique patterns of CRM accessibility indicative of combinatorial CRM action permitting target gene expression in specific cell types. Furthermore, it is possible that the identified HNF1B/hnf1ba HCNE CRMs may be necessary for HNF1B/hnf1ba endodermal expression even though they are not sufficient. Future studies should therefore involve production of CRISPR knockout zebrafish and/or human pluripotent stem cells followed by analysis of pancreas development.

Our selection of HCNEs at hnf1ba for analysis was partly due to their lack of conservation at hnf1bb, potentially explaining distinctions in the expression patterns of the two genes, including in endoderm. The hindbrain expression we observe in reporter assays (Fig. 7) is consistent with the broader expression observed for hnf1ba than for hnf1bb (Choe et al. 2008). Consideration of the whole-genome duplication that occurred during teleost evolution may offer further advantages in prioritising HCNEs for future analysis. A consistent challenge in the study of gene regulation is accurately linking putative CRMs to target genes, especially given potential long-range promoter-enhancer interactions. Methods based on proximity alone are therefore prone to inaccuracies. Consideration of the teleost genome duplication may offer a solution to this for some HCNE CRMs. The teleost genome duplication has permitted both degradation of individual duplicated genes due to redundancy, and loss of HCNEs at individual gene copies as the target gene degrades or expression domains are lost during subfunctionalisation of duplicate gene pairs (Kikuta et al. 2007). Tracking co-occurrence of HCNEs with specific genes within conserved syntenic gene blocks should therefore permit prediction of HCNE-gene relationships, and prioritisation of HCNEs for future analysis.

A common neuro-pancreatic gene regulatory and cis-regulatory programme?

Many similarities between endocrine cells of the pancreas and neurons have been noted over the years, including production of polypeptide hormones, neurotransmitters and their receptors, and also common chromatin methylation signatures (Pearse and Polak 1971; Fujita et al. 1980; Gonoi et al. 1994; Maechler and Wollheim 1999; van Arensbergen et al. 2010). Furthermore, many of the TFs required for development of the endocrine pancreas have key roles in the developing nervous system throughout vertebrates, including HNF1B, FOXA2, ISL1, PTF1A, ONECUT1, NEUROD1, HES1, PAX6, SOX4 and MEIS2 (Ang and Rossant 1994; Ishibashi et al. 1995; Pfaff et al. 1996; Miyata et al. 1999; Sun and Hopkins 2001; Haubst et al. 2004; Hutchinson and Eisen 2006; Nolte et al. 2006; Hori et al. 2008; Potzner et al. 2010; Dal-Pra et al. 2011; Coolen et al. 2012; Roy et al. 2012; Machon et al. 2015; Jin and Xiang 2019). This is consistent with HCNE CRMs identified by our analyses of zebrafish and human endoderm being associated with both pancreas and nervous system development. TFBSs significantly enriched within these HCNE CRMs include those of TFs that also operate during pancreas and nervous system development, including PTF1A, PDX1, FOXA2, ISL1, SOX4 and NEUROD1 (Fig. 5). Given that the HNF1B/hnf1ba endoderm HCNE CRMs drive expression in the developing nervous system, it is tempting to speculate that, as well as there being a common set of TFs governing endocrine pancreas and nervous system development (neuro-pancreatic TFs), these TFs may also act together within common CRMs that function in both the pancreas and nervous system. However, a broader analysis of HCNE CRMs performed at scale would be necessary to draw such conclusions. Since any given HCNE CRM is not necessarily individually sufficient to drive neural and pancreatic gene expression, future work will require both reporter assays and genetic deletion studies to test the effects on putative target genes.
Combinatorial analysis of CRMs in both reporter assays and deletion studies will also likely be required to fully characterise the roles of HCNE CRMs regulating any given gene.

Common gene regulatory programmes spanning pancreas and nervous system development potentially imply co-option of an ancient gene regulatory programme from a common ancestral cell type. Indeed, all deuterostomes have a nervous system, but only jawed vertebrates have a true, morphologically distinct pancreas. Thus, a common gene regulatory programme could logically have arisen during the evolution of the nervous system and subsequently been co-opted during pancreatic evolution. This is supported by analysis of larvae of the sea urchin Strongylocentrotus purpuratus, which identified a subset of neurons derived from cells with a putative “pre-pancreatic” signature consisting of expression of homologues of multiple neuro-pancreatic TFs, including SpSoxC (homologous to SOX4) and SpPtf1a (Perillo et al. 2018). However, of the 1,701 endodermal HCNE CRMs identified by our analysis of human and zebrafish data, only 24 are also HCNEs between the human and lamprey genomes by the same criteria of 70% identity across at least 30 alignment columns. It therefore seems likely that, although combined action of neuro-pancreatic TFs in the nervous system predates the emergence of the vertebrate lineage, the HCNE CRMs they are predicted to bind arose after the divergence of jawless and jawed vertebrates.
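The "70% identity across at least 30 alignment columns" criterion can be illustrated with a sliding window over a pairwise alignment. The minimal sketch below assumes a gapped pairwise alignment given as two equal-length strings, counts a column as identical only when both bases match and neither is a gap, and merges overlapping passing windows into elements. These are illustrative choices, and the HCNE-calling procedure used for the analyses may treat gaps, window extension or merging differently.

```python
def conserved_windows(aln_a: str, aln_b: str, window: int = 30,
                      min_identity_pct: int = 70) -> list[tuple[int, int]]:
    """Union of all `window`-column windows with >= min_identity_pct % identical columns.

    Returned as merged, half-open [start, end) alignment-column intervals.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    # 1 where both sequences carry the same base; gap columns never count as identical
    same = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    # integer ceiling of window * pct / 100, avoiding float rounding (21 for 30 x 70%)
    needed = (window * min_identity_pct + 99) // 100

    intervals: list[tuple[int, int]] = []
    matches = sum(same[:window])
    for start in range(n - window + 1):
        if start > 0:
            # slide one column: add the new right-hand column, drop the old left-hand one
            matches += same[start + window - 1] - same[start - 1]
        if matches >= needed:
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], start + window)  # extend current element
            else:
                intervals.append((start, start + window))
    return intervals


if __name__ == "__main__":
    # Hypothetical toy alignments: 21/30 identical columns passes, 20/30 does not
    print(conserved_windows("A" * 30, "A" * 21 + "C" * 9))
    print(conserved_windows("A" * 30, "A" * 20 + "C" * 10))
```

Applying the same thresholds to human-lamprey alignments of the 1,701 human-zebrafish elements is what allows the two sets to be compared on equal terms.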

Individual HCNE CRMs appear to have a distinct regulatory logic

Highly conserved CRMs have been suggested to be constrained by the action of TF complexes requiring consistent configuration of TFBSs (Guturu et al. 2013; Long et al. 2016; Jindal and Farley 2021). Our analysis of PFG HCNE CRMs suggests highly conserved patterns of TFBS co-occurrence (Supplemental Fig. S21). However, relatively few TF pairs show significant enrichment for specific TFBS spacing across different HCNEs within the same species (Fig. 6A). Indeed, although we identify significant incidence of 12 bp spacing of putative homeodomain-bHLH binding sites, this was only detected in a common set of 6 HCNEs. This suggests either that the selective pressure maintaining HCNEs is not based on the action of TF complexes requiring rigid TFBS spacing, or that TF complexes requiring such rigid grammar do not act widely across HCNE CRMs. It is also possible that, depending on the configuration of DNA-binding domains within TF complexes, the complexes are robust to variability in TFBS spacing. An alternative model to explain conserved patterns of TFBS co-occurrence without rigid spacing is the recently proposed “dependency grammar” (Jindal and Farley 2021). This model considers that multiple parameters, including TFBS affinity, order, spacing and orientation, are tuned by selective pressures to yield viable patterns and levels of expression from key enhancers. Thus, interplay between the spacing and affinity of sites dictates the functional capabilities of enhancers without either parameter needing to be rigidly set (Farley et al. 2015; Farley et al. 2016). This may be linked to suboptimisation of enhancers via hindrance of combined TF binding, preventing overactivation (Farley et al. 2015). This is especially important for trans-dev genes, whose overactivation or ectopic expression may lead to harmful alterations in cell fate assignment during development.
Enhancer configuration with genuine but less rigid grammatical logic may also offer an evolutionary advantage since such enhancers are more likely to be robust to minor sequence changes without evident alterations in function.

Given that our results suggest combinatorial rules in terms of TFBS co-occurrence without rigid spacing in PFG HCNE CRMs, they support the dependency grammar model, in which co-regulated genes are regulated by similar sets of TFs but the tuning of grammatical parameters varies to ensure correct functional outputs. This tuning via dependency grammar is likely to be critical, potentially explaining the high degree of conservation of the HCNEs.

Overall our analyses reveal a compendium of HCNE CRMs acting across the 400 million years of evolution separating zebrafish and humans. These HCNE CRMs should be prioritised for analysis of commonalities of gene regulatory control programmes controlling neural and pancreatic development. Our analyses reveal consistent patterns of TFBS co-occurrence in endoderm HCNE CRMs that should form the basis of future studies of TF combinatorial and competitive binding to reveal conserved mechanisms tuning the correct spatiotemporal expression of trans-dev genes.

Methods

Zebrafish strains and transgenics

ABix, Tg(-5.0sox17:GFP)ha01Tg (Mizoguchi et al. 2008), Tg(kdrl-HsHRAS:mCherry)s896 (Chi et al. 2008a) and Tg(gata1a:DsRed)sd2Tg (Traver et al. 2003) fish were reared as described (Westerfield 2007). All zebrafish studies fully complied with the UK Animals (Scientific Procedures) Act 1986 as implemented by the University of Warwick.

Preparation of cells for RNA-seq and ATAC-seq

Embryos were dechorionated using pronase and dissociated in a collagenase:trypsin blend as described in Supplemental Materials. Cell sorting was performed using a Becton Dickinson FACSAria Fusion Cell Sorter with a 100 μm nozzle and a sheath fluid pressure of 25 pounds per square inch (psi). Debris was excluded based on side scatter area (SSC-A) and forward scatter area (FSC-A), and singlet cells were selected based on FSC-A and forward scatter width (FSC-W). Single cells were then sorted based on fluorescence signal from enhanced green fluorescent protein (EGFP; B488-530) and the red fluorescent proteins mCherry and DsRed (YG561-610). Flow cytometry data were acquired and analysed using BD FACSDiva™ software.

RNA-seq

Total RNA was prepared from sorted cells using the QIAGEN RNeasy Mini kit with on-column DNase I treatment (QIAGEN) following the manufacturer’s protocol. Sequencing libraries were constructed from 10 ng total RNA per sample using the NEBNext Ultra II Directional RNA Library Prep kit for Illumina according to the manufacturer’s instructions, and sequenced on a NovaSeq 6000 S4 flow cell. Reads were mapped to the zebrafish genome (danRer11) using STAR (Dobin et al. 2013). HTSeq was used to count aligned reads per gene (Anders et al. 2015). The raw count matrix was imported into iDEP (Ge et al. 2018), and differentially expressed genes (DEGs) were identified using default parameters. Heatmaps of DEGs were produced using Morpheus (https://software.broadinstitute.org/morpheus).

ATAC-seq

50,000 sorted cells per sample were used for Omni-ATAC-seq, using methods adapted from Buenrostro et al. (2015) and Corces et al. (2017). Full details are provided in Supplemental Materials. Libraries were sequenced by the Genomics Facility at the University of Warwick on an Illumina NextSeq 500 using the High Output Kit v2.5 (FC-404-2002), and by Novogene (UK) Company Limited on a NovaSeq 6000 S4 flow cell. Each sample used a different barcoded reverse primer, allowing the samples to be multiplexed for sequencing. Reads were trimmed from the 3’ end to a uniform 75 bp using seqtk so that differences in read length between sequencing platforms did not affect mapping. Adapter sequences were trimmed using Trimmomatic version 0.39 (Bolger et al. 2014). Data quality control and confirmation of adapter removal were performed using FastQC (Andrews 2010). Reads were aligned to the Genome Reference Consortium zebrafish build 11 (GRCz11v97) using Bowtie 2 (Langmead and Salzberg 2012) with the “very sensitive” preset and a 2 kb maximum fragment length for paired-end alignments. SAMtools (Li et al. 2009) was used to remove reads mapping to the mitochondrial genome, duplicate reads, and low-quality reads with a MAPQ score < 22. Numbers of reads per sample used in subsequent analyses are shown in Supplemental Table 16.
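The read-level processing above amounts to two simple rules: truncate every read to its first 75 bases, and keep an alignment only if it is not on the mitochondrial genome, is not a duplicate, and has a MAPQ of at least 22. The sketch below expresses those rules in plain Python on a hypothetical, simplified alignment record; it is not a replacement for seqtk or SAMtools, and the mitochondrial contig name "MT" is an assumption based on Ensembl naming for GRCz11.

```python
from typing import NamedTuple

READ_LENGTH = 75    # uniform read length used in the text
MIN_MAPQ = 22       # alignments with MAPQ < 22 are discarded
MITO_CONTIG = "MT"  # assumption: Ensembl GRCz11 name of the mitochondrial genome


def trim_to_length(seq: str, qual: str, length: int = READ_LENGTH) -> tuple[str, str]:
    """Trim from the 3' end by keeping the first `length` bases and their qualities."""
    return seq[:length], qual[:length]


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not a pysam or SAM object)."""
    contig: str
    mapq: int
    is_duplicate: bool


def keep_alignment(aln: Alignment) -> bool:
    """Apply the mitochondrial, duplicate and MAPQ filters described in the text."""
    return (
        aln.contig != MITO_CONTIG
        and not aln.is_duplicate
        and aln.mapq >= MIN_MAPQ
    )


if __name__ == "__main__":
    seq, qual = trim_to_length("ACGT" * 25, "I" * 100)
    print(len(seq), len(qual))  # 75 75
    for aln in (Alignment("1", 40, False), Alignment("MT", 60, False), Alignment("1", 21, False)):
        print(aln, keep_alignment(aln))
```

Reads already shorter than 75 bases are left unchanged by this rule, so "uniform" here means a common maximum length.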

Accessible regions (peaks) were called per sample using MACS2 (Zhang et al. 2008; Gaspar 2018) with default parameters except for: broad peak calling, a mappable genome size of 1.4e9, a false discovery rate cut-off of 0.05, a bandwidth of 300 bp, and a high-confidence fold-enrichment range (mfold) of 5 to 50. To control for Tn5 sequence-specific insertion bias, peaks were called against an ATAC-seq control library produced from purified genomic DNA (Buenrostro et al. 2013).
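
For reference, the stated settings map onto `macs2 callpeak` options roughly as below. This hedged sketch only assembles an argument list; the BAM file names are hypothetical and input-format detection is left at its default. As we understand the MACS2 documentation, `-m 5 50` (mfold, also the default) sets the enrichment range of regions used to build the fragment-shift model rather than filtering the final peaks.

```python
import shlex
from typing import List


def macs2_broad_args(treatment_bam: str, control_bam: str, name: str) -> List[str]:
    """Build a MACS2 callpeak command mirroring the settings described above."""
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,
        "-c", control_bam,   # purified genomic DNA ATAC-seq control library
        "-n", name,
        "--broad",           # broad peak setting
        "-g", "1.4e9",       # mappable genome size
        "-q", "0.05",        # FDR (q-value) cut-off
        "--bw", "300",       # bandwidth in bp
        "-m", "5", "50",     # mfold: high-confidence enrichment range
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute.
    args = macs2_broad_args("sample_rep1.bam", "gDNA_control.bam", "sample_rep1")
    print(shlex.join(args))
```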

Differentially accessible regions (DARs) between cell populations and time points were identified in R version 4.2.1 (R Core Team, 2021) using DiffBind version 3.0.15 (Buenrostro et al. 2013) with the DESeq2 option (Love et al. 2014), applied to the broad peak files (.broadPeak) produced by MACS2 without merging biological replicates. The summits parameter was set to FALSE so that summit heights (read pileup) and locations were not calculated for each peak, allowing the whole peak to be considered. DARs with FDR ≤ 0.01 were used for subsequent analysis. DARs were assigned to the nearest TSS, and the distribution of DARs relative to genes was assessed, using ChIPseeker 1.32.1 (Yu et al. 2015) and the GRCz11 (danRer11) Ensembl gene annotation version 97.
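
Nearest-TSS assignment can be sketched with a sorted TSS list and a binary search. This illustrates the idea under stated assumptions and is not ChIPseeker's implementation: distances here are measured from each region's midpoint and signed by gene strand (negative meaning upstream), which may differ in detail from ChIPseeker's convention, and the coordinates are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open
Tss = Tuple[int, str, str]     # (position, gene_id, strand "+" or "-")


def assign_nearest_tss(
    regions: List[Region], tss_by_chrom: Dict[str, List[Tss]]
) -> List[Tuple[Region, Optional[str], Optional[int]]]:
    """Assign each region to its nearest TSS on the same chromosome."""
    # Sort TSSs once per chromosome so each lookup is a binary search.
    index = {chrom: sorted(tss) for chrom, tss in tss_by_chrom.items()}
    positions = {chrom: [t[0] for t in tss] for chrom, tss in index.items()}
    results = []
    for region in regions:
        chrom, start, end = region
        if not index.get(chrom):
            results.append((region, None, None))  # no annotated TSS on this contig
            continue
        mid = (start + end) // 2
        pos = positions[chrom]
        i = bisect.bisect_left(pos, mid)
        # The nearest TSS is the last one before mid or the first one at/after it.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pos)]
        best = min(candidates, key=lambda j: abs(pos[j] - mid))
        tss_pos, gene_id, strand = index[chrom][best]
        # Signed in the gene's orientation: negative means upstream of the TSS.
        distance = mid - tss_pos if strand == "+" else tss_pos - mid
        results.append((region, gene_id, distance))
    return results


if __name__ == "__main__":
    # Hypothetical annotation and regions.
    tss = {"1": [(1000, "geneA", "+"), (5000, "geneB", "-")]}
    for row in assign_nearest_tss([("1", 800, 900), ("1", 5200, 5400)], tss):
        print(row)
```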

Enrichment of Gene Ontology (GO) biological process and molecular function terms, and of pathways, was assessed by over-representation analysis (ORA) in WebGestalt 2019 (Liao et al. 2019). FishEnrichr (Chen et al. 2013; Kuleshov et al. 2016) was used to identify zebrafish anatomical structures associated with gene lists.
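
Over-representation analysis is commonly based on a one-sided hypergeometric test; a minimal sketch of that p-value is shown below. We assume this is the relevant test for WebGestalt's ORA mode, but the sketch omits its multiple-testing correction, gene ID mapping and term-size filters, and the numbers in the example are invented.

```python
from math import comb


def ora_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the reference (background) set
    K: background genes annotated to the term
    n: genes in the query list (e.g. genes near DARs)
    k: query genes annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k + 1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Invented numbers: 12 of 200 query genes hit a term covering
    # 150 of 20,000 background genes (1.5 expected by chance).
    print(f"{ora_pvalue(12, 200, 150, 20000):.3g}")
```

Exact integer arithmetic via `math.comb` avoids underflow for realistic gene-set sizes, at the cost of speed on very large backgrounds.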

The Integrative Genomics Viewer (IGV) version 2.5.0 (Robinson et al. 2011) was used to visualise mapped ATAC-seq and ChIP-seq reads, DARs and HCNEs. ATAC-seq alignments from biological replicates were merged using SAMtools (Li et al. 2009) to show representative peaks. Tracks were normalized to counts per million (CPM) using deepTools bamCoverage (Ramirez et al. 2016).
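
CPM scaling divides each bin's read count by the total number of mapped reads and multiplies by one million, which removes differences in sequencing depth between tracks. The sketch below shows only that arithmetic, assuming per-bin counts have already been computed; deepTools' own counting (bin size, read extension and filtering applied with `--normalizeUsing CPM`) is not reproduced, and the numbers are illustrative.

```python
from typing import List


def cpm(bin_counts: List[int], total_mapped_reads: int) -> List[float]:
    """Convert raw per-bin read counts to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


if __name__ == "__main__":
    # Two illustrative libraries with the same relative signal but a
    # four-fold difference in depth give the same CPM values.
    print(cpm([40, 80], 20_000_000))
    print(cpm([10, 20], 5_000_000))
```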

To produce heatmaps of ATAC-seq data, BAM files were downsampled using Picard to equalize read numbers between conditions. Heatmaps were produced using seqMINER v1.3.4 with the KMeans enrichment linear clustering normalization method (Ye et al. 2011).
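
Equalizing read numbers amounts to keeping each read with probability equal to the smallest library size divided by that sample's size. Picard's DownsampleSam accepts such a retention probability; the sketch below is not Picard's code but illustrates the idea, using a hash of the read name (our assumption) so that both mates of a pair share the same fate and the result is reproducible. Sample names and totals are hypothetical.

```python
import hashlib
from typing import Dict


def downsample_fractions(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-sample retention probability that matches the smallest library."""
    target = min(read_counts.values())
    return {sample: target / n for sample, n in read_counts.items()}


def keep_read(read_name: str, fraction: float, seed: int = 1) -> bool:
    """Deterministically keep about `fraction` of reads.

    Mates in a pair share a read name, so they are kept or dropped together.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Top 53 bits give a value that is exactly representable in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


if __name__ == "__main__":
    # Hypothetical read totals for two conditions.
    fractions = downsample_fractions({"condition_1": 30_000_000, "condition_2": 45_000_000})
    print(fractions)
```

Because only the largest libraries are thinned, the smallest sample keeps every read (fraction 1.0).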

ChIP-seq data analysis

Previously published H3K27ac, FOXA2, GATA4, GATA6, HNF1B, PDX1 and PTF1A ChIP-seq data were downloaded from GEO Series GSE52658, GSE183672 and GSE114102 (Loh et al. 2014; Lee et al. 2019; Miguel-Escalada et al. 2022). Reads were aligned to hg38 using Bowtie 2 (Langmead and Salzberg 2012) with default parameters apart from -N 1. ChIP-seq peaks were called using MACS2 (Zhang et al. 2008; Gaspar 2018) with default settings against cognate controls. A more stringent q-value cut-off of < 10⁻⁸, instead of the default < 0.05, was chosen qualitatively by inspecting called peaks around key genes known to be expressed in endodermal cell populations, as well as around genes expressed specifically in non-endodermal cell populations.
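
Because MACS2 reports q-values on a -log10 scale (column 9 of the narrowPeak file), a cut-off of q < 10⁻⁸ corresponds to keeping peaks whose column 9 exceeds 8. The sketch below applies such a post hoc filter; it is illustrative only, since the same threshold could instead be passed to MACS2 with `-q`, and the example peak lines are invented.

```python
import math
from typing import Iterable, List


def filter_by_qvalue(lines: Iterable[str], q_cutoff: float = 1e-8) -> List[str]:
    """Keep narrowPeak lines whose q-value is below `q_cutoff`.

    MACS2 narrowPeak column 9 holds -log10(q-value), so q < cutoff is
    equivalent to column 9 > -log10(cutoff) (about 8 for 1e-8).
    """
    threshold = -math.log10(q_cutoff)
    kept = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[8]) > threshold:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Invented peaks: only the first passes q < 1e-8.
    peaks = [
        "chr1\t100\t400\tpeak_1\t120\t.\t6.1\t14.2\t12.0\t150\n",
        "chr1\t900\t1100\tpeak_2\t40\t.\t2.3\t6.0\t4.1\t90\n",
    ]
    print(len(filter_by_qvalue(peaks)))
```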

Analysis of Highly Conserved Non-coding Elements (HCNEs) overlapping ATAC-seq and ChIP-seq datasets

HCNEs identified between the human (hg38) and zebrafish (danRer10) genome builds using a 30 bp window and identity thresholds of 70–100% were downloaded from ANCORA (http://ancora.genereg.net) (Engstrom et al. 2008). The danRer10/hg38 HCNEs were converted to danRer11/hg38 coordinates using liftOver (Hinrichs et al. 2006). BEDTools intersect (Quinlan and Hall 2010) was used to determine whether danRer11/hg38 HCNEs overlapped zebrafish danRer11 DARs from ATAC-seq and human hg38 ChIP-seq peaks.
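
The overlap test reduces to interval intersection on half-open BED coordinates: two intervals on the same chromosome overlap when each starts before the other ends. The sketch below reports each query interval once if it overlaps any target, similar in spirit to `bedtools intersect -u`; it is an illustration rather than BEDTools' algorithm, and the coordinates are invented.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def overlapping(query: List[Interval], targets: List[Interval]) -> List[Interval]:
    """Return each query interval that overlaps at least one target interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # prefix_max_end[k] = largest end among the first k + 1 sorted targets
        prefix_max_end, running = [], float("-inf")
        for _, e in ivs:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    hits = []
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        # Targets with target_start < end are exactly starts[:k].
        k = bisect.bisect_left(starts, end)
        # One of them overlaps iff some target_end > start.
        if k > 0 and prefix_max_end[k - 1] > start:
            hits.append((chrom, start, end))
    return hits


if __name__ == "__main__":
    # Invented coordinates: HCNEs tested against accessible regions.
    hcnes = [("1", 100, 200), ("1", 300, 400), ("2", 0, 50)]
    dars = [("1", 150, 160), ("1", 400, 500)]
    print(overlapping(hcnes, dars))
```

With half-open coordinates, intervals that merely touch (one ends at 400, the next starts at 400) do not count as overlapping.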

Functional and anatomical enrichment analyses were performed for hg38 HCNE sets using the Genomic Regions Enrichment of Annotations Tool (GREAT) version 4.0.4 with default settings (McLean et al. 2010; Tanigawa et al. 2022). Each set was analysed twice, with the background regions set either to the whole genome or to the list of all HCNEs. For heatmaps, the top 20 enriched terms per HCNE set were selected based on hypergeometric false discovery rate-corrected q-values (HyperFdrQ).
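
Selecting the top terms by FDR-corrected q-value can be sketched as a Benjamini–Hochberg adjustment followed by ranking. We assume HyperFdrQ is a Benjamini–Hochberg-style correction of the hypergeometric p-values; the GREAT documentation should be checked for the exact procedure and for how terms are filtered before correction. Term names and p-values are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def top_terms(term_pvalues: Dict[str, float], n: int = 20) -> List[Tuple[str, float]]:
    """Return the n terms with the smallest BH q-values."""
    terms = list(term_pvalues)
    qvalues = bh_qvalues([term_pvalues[t] for t in terms])
    return sorted(zip(terms, qvalues), key=lambda pair: pair[1])[:n]


if __name__ == "__main__":
    # Hypothetical terms and raw hypergeometric p-values.
    example = {"term_a": 1e-6, "term_b": 0.03, "term_c": 0.2, "term_d": 4e-4}
    print(top_terms(example, n=2))
```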

Motif enrichment analysis

De novo and known motif enrichment analyses were performed using HOMER v4.11.1 (Heinz et al. 2010). For known motif enrichment analysis, the top 30 most significant motifs in each HCNE set (P < 1 × 10⁻⁸) were considered, giving a combined set of 60 motifs. A heatmap indicating enrichment of these motifs in each HCNE set was produced and clustered using Morpheus (https://software.broadinstitute.org/morpheus). To annotate the expression of the transcription factors corresponding to these motifs, RPKM values from relevant human endoderm RNA-seq datasets were downloaded from GEO Series GSE52658 (Anterior Foregut, Posterior Foregut and Mid/Hindgut endoderm) (Loh et al. 2014) and GSE216266 (passage 0 pancreatic progenitors) (Jarc et al. 2024). Co-occurrence of selected motifs within HCNEs was analysed using the Paired Motif Enrichment Tool (PMET; implemented at https://pmet.online and https://github.com/duocang/PMET-Shiny-App) (Rich-Griffin et al. 2020) on genomic intervals using default parameters. PMET output was converted into a matrix using the acast function of the reshape2 package in R, and a heatmap was then produced using Conditional Formatting in Microsoft Excel. Pairwise analysis of the spacing of HOMER motifs was performed using SpaMo in the MEME Suite (Whitington et al. 2011; Bailey et al. 2015). All position frequency matrices from significant HOMER de novo and known motif enrichment analyses were provided as both primary and secondary motifs for pairwise comparison of spacing.
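
The motif-by-motif matrix used for the co-occurrence heatmap can be pictured with a simpler quantity: the number of HCNEs containing each pair of motifs. This sketch only counts co-occurrence and does not reproduce PMET's statistical test; the region IDs and motif names are hypothetical, and motif calls per region are assumed to be given.

```python
from itertools import combinations
from typing import Dict, Set


def cooccurrence_matrix(region_motifs: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count regions containing each pair of motifs.

    Diagonal entries count regions containing the motif at all; the matrix
    is symmetric, analogous to casting a long (motif1, motif2, value) table
    into a wide motif-by-motif matrix.
    """
    motifs = sorted(set().union(*region_motifs.values()))
    matrix = {a: {b: 0 for b in motifs} for a in motifs}
    for present in region_motifs.values():
        for a in present:
            matrix[a][a] += 1
        for a, b in combinations(sorted(present), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


if __name__ == "__main__":
    # Hypothetical motif calls in three HCNEs.
    toy = {
        "hcne_1": {"FOX", "GATA"},
        "hcne_2": {"FOX", "HNF1"},
        "hcne_3": {"FOX", "GATA", "HNF1"},
    }
    for motif, row in cooccurrence_matrix(toy).items():
        print(motif, row)
```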

Cloning of reporter constructs for reporter assays

Putative enhancer and HCNE elements of interest were PCR-amplified from zebrafish or human genomic DNA using Q5® High-Fidelity DNA Polymerase (NEB) and custom primers containing attB4 and attB1 sequences, and cloned by Gateway recombination into pDONR-P4-P1R (Thermo Fisher Scientific). All primers used are listed in Supplemental Table 17. To allow visualization of reporter activity in Tg(sox17:GFP) zebrafish, constructs were designed to express mCherry downstream of the putative enhancer. To do this, the published plasmids pENTRbasEGFP (Addgene #22453), pENTREGFP2 (Addgene #22450) and pDESTtol2pACrymCherry (Addgene #64023) (Villefranc et al. 2007; Berger and Currie 2013) were modified using NEBuilder HiFi DNA Assembly Master Mix (NEB) to swap EGFP and mCherry between the plasmids. To use the E1b promoter as the basal promoter in reporter assays, it was amplified from gata2a-i4-E1b-GFP-Tol2 (kindly gifted by Dr Rui Monteiro) and inserted into pENTR mCherry by HiFi cloning. Following sequence verification, the hnf1ba putative enhancer/HCNE, pENTR E1bP:mCherry, p3E-mcs1 (Addgene #49004) (Moore et al. 2013) and pDESTtol2pACryEGFP constructs were assembled by Gateway recombination (Thermo Fisher Scientific). This generated plasmids carrying putative hnf1ba enhancer elements upstream of the E1b promoter and mCherry, together with the cryaa promoter upstream of EGFP.

Embryo compound treatments, reporter assays and imaging

Embryos for imaging were incubated from 22 hpf in 0.003% PTU (1-phenyl-2-thiourea) to block pigmentation and improve optical transparency. Embryos were immobilised for imaging using 30 μg/ml tricaine (ethyl 3-aminobenzoate methanesulfonate; Merck E10521-10G), and mounted and orientated in 1% agarose moulds created using a 3D-printed stamp (Kleinhans and Lecaudey, 2019). Embryos were imaged and images processed as described in the Supplemental Materials.

To analyse the effects of retinoic acid (RA) signalling, embryos were treated from 5 hpf with RA (Sigma-Aldrich R2625), diethylaminobenzaldehyde (DEAB; Sigma-Aldrich D86256), BMS493 (APExBIO B7415) or vehicle (0.1% DMSO) as a control. Following treatment, embryos were incubated in the dark to prevent degradation of these light-sensitive compounds.

For reporter assays, Tg(sox17:GFP) embryos were microinjected at the one-cell stage with 25 pg of reporter construct combined with 25 pg of capped Tol2 transposase mRNA transcribed from pCS2-Tol2. At 48 hpf, embryos were screened by widefield fluorescence microscopy for cry:EGFP expression in the lens, as confirmation of reporter construct integration into the genome. The location of mCherry expression in these embryos was then observed by widefield fluorescence microscopy and scored. Embryos of interest were imaged further by confocal microscopy.
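
Because only lens-positive (cry:EGFP) embryos are taken as carrying an integrated construct, they form the natural denominator when scoring mCherry patterns. The sketch below tabulates, per tissue, the fraction of lens-positive embryos showing mCherry; the tissue categories and data layout are hypothetical, since the scoring scheme itself is not specified here.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each embryo: (lens cry:EGFP positive?, tissues scored as mCherry-positive).
Embryo = Tuple[bool, List[str]]


def score_reporter(embryos: List[Embryo]) -> Dict[str, float]:
    """Fraction of lens-positive embryos with mCherry in each tissue."""
    transgenic = [tissues for lens_positive, tissues in embryos if lens_positive]
    if not transgenic:
        return {}
    # set() so a tissue is counted at most once per embryo.
    counts = Counter(t for tissues in transgenic for t in set(tissues))
    return {tissue: n / len(transgenic) for tissue, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical scores; the last embryo lacks lens EGFP and is excluded.
    scored = [
        (True, ["endoderm"]),
        (True, ["endoderm", "notochord"]),
        (True, []),
        (False, ["endoderm"]),
    ]
    print(score_reporter(scored))
```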

Supplementary Material

Supplementary Materials
Supplementary Tables

Acknowledgements

This research was also funded, in part, by the Wellcome Trust through a Wellcome Seed Award in Science (210177/Z/18/Z) and an MRC New Investigator Research Grant (MR/S021531/1) to ACN. DMR, RE and HW were funded by the MRC Doctoral Training Partnership in Interdisciplinary Biomedical Research (MR/S021531/1, MR/N014294/1 and MR/W007053/1). MW was funded in part by the Quantitative Biomedicine Programme through Warwick’s Wellcome Institutional Strategic Support Fund (ISSF) award with match funding provided by the University of Warwick. SJ was funded by the BBSRC Midlands Integrative Biosciences Training Partnership (BB/M01116X/1). We thank Rui Monteiro for kindly gifting us the Tg(kdrl:mCherry) and Tg(gata1a:dsRed) fish used in this study, Fiona Wardle for Tg(sox17:GFP) and Michael Smutny for Tg(otx2b:Venus). pENTRbasEGFP, p3E-mcs1 and pENTREGFP2 were gifts from Nathan Lawson (Addgene plasmid # 22453; http://n2t.net/addgene:22453; RRID:Addgene_22453; Addgene plasmid # 49004; http://n2t.net/addgene:49004; RRID:Addgene_49004; Addgene plasmid # 22450; http://n2t.net/addgene:22450; RRID:Addgene_22450). pDESTtol2pACrymCherry was a gift from Joachim Berger & Peter Currie (Addgene plasmid # 64023; http://n2t.net/addgene:64023; RRID:Addgene_64023). We thank Fiona Wardle and Karuna Sampath for reagents. We thank Karuna Sampath, Andrea Zaucker and Andre Pires da Silva for valuable discussions, and Karuna Sampath, Jonathan Millar and Andre Pires da Silva for generous access to equipment. We also thank the University of Warwick Research Technology Platform (RTP) Aquatics facility for zebrafish care, and Ian Hands-Portman from the Warwick School of Life Sciences for imaging advice and support. We thank the Warwick Integrative Synthetic Biology Centre (WISB) for flow cytometry access, and Sarah Bennett for initial training. WISB is a BBSRC/EPSRC Synthetic Biology Research Centre (BB/M017982/1) funded under the UK Research Councils’ Synthetic Biology for Growth programme.
We thank the Warwick School of Life Sciences Genomics Facility for sequencing and technical support, and Kate Woolley-Allen, Pavle Vrljicak, Julia Lipecki, Jade Scott, Paul Brown and Warwick Bioinformatics RTP for bioinformatics advice and support. We thank Xuesong Wang for support with the use of PMET.

Footnotes

Competing interest statement

The authors declare no competing interests.

Author contributions: A.C.N. conceived the study. A.C.N., S.O. and T.B. supervised the study. A.C.N. and D.M.R. designed the experiments. D.M.R., R.E., M.W., S.J., Y.L. and A.C.N. performed all the experiments. D.M.R., R.E., M.W., S.J., H.W. and A.C.N. performed data analysis and visualization. A.C.N. and D.M.R. wrote the original draft. All authors read and approved the final manuscript.

Data access

All raw and processed sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE294761 (ATAC-seq) and GSE294799 (RNA-seq).

References

  1. Aamar E, Dawid IB. Sox17 and chordin are required for formation of Kupffer’s vesicle and left-right asymmetry determination in zebrafish. Dev Dyn. 2010;239:2980–2988. doi: 10.1002/dvdy.22431.
  2. Alexander J, Stainier DY. A molecular pathway leading to endoderm formation in zebrafish. Curr Biol. 1999;9:1147–1157. doi: 10.1016/S0960-9822(00)80016-0.
  3. Allen HL, Flanagan SE, Shaw-Smith C, De Franco E, Akerman I, Caswell R, Ferrer J, Hattersley AT, Ellard S, International Pancreatic Agenesis Consortium. GATA6 haploinsufficiency causes pancreatic agenesis in humans. Nat Genet. 2011;44:20–22. doi: 10.1038/ng.1035.
  4. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
  5. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010.
  6. Ang SL, Rossant J. HNF-3 beta is essential for node and notochord formation in mouse development. Cell. 1994;78:561–574. doi: 10.1016/0092-8674(94)90522-3.
  7. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res. 2015;43:W39–49. doi: 10.1093/nar/gkv416.
  8. Begemann G, Marx M, Mebus K, Meyer A, Bastmeyer M. Beyond the neckless phenotype: influence of reduced retinoic acid signaling on motor neuron development in the zebrafish hindbrain. Dev Biol. 2004;271:119–129. doi: 10.1016/j.ydbio.2004.03.033.
  9. Berger J, Currie PD. 503unc, a small and muscle-specific zebrafish promoter. Genesis. 2013;51:443–447. doi: 10.1002/dvg.22385.
  10. Binot AC, Manfroid I, Flasse L, Winandy M, Motte P, Martial JA, Peers B, Voz ML. Nkx6.1 and nkx6.2 regulate alpha- and beta-cell formation in zebrafish by acting on pancreatic endocrine progenitor cells. Dev Biol. 2010;340:397–407. doi: 10.1016/j.ydbio.2010.01.025.
  11. Bogdanovic O, Fernandez-Minan A, Tena JJ, de la Calle-Mustienes E, Hidalgo C, van Kruysbergen I, van Heeringen SJ, Veenstra GJ, Gomez-Skarmeta JL. Dynamics of enhancer chromatin signatures mark the transition from pluripotency to cell specification during embryogenesis. Genome Res. 2012;22:2043–2053. doi: 10.1101/gr.134833.111.
  12. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  13. Bonkhofer F, Rispoli R, Pinheiro P, Krecsmarik M, Schneider-Swales J, Tsang IHC, de Bruijn M, Monteiro R, Peterkin T, Patient R. Blood stem cell-forming haemogenic endothelium in zebrafish derives from arterial endothelium. Nat Commun. 2019;10:3577. doi: 10.1038/s41467-019-11423-2.
  14. Brand M, Heisenberg CP, Warga RM, Pelegri F, Karlstrom RO, Beuchle D, Picker A, Jiang YJ, Furutani-Seiki M, van Eeden FJ, et al. Mutations affecting development of the midline and general body shape during zebrafish embryogenesis. Development. 1996;123:129–142. doi: 10.1242/dev.123.1.129.
  15. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
  16. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol. 2015;109:21.29.1–21.29.9. doi: 10.1002/0471142727.mb2129s109.
  17. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, Ma’ayan A. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:1–14. doi: 10.1186/1471-2105-14-128.
  18. Chi NC, Shaw RM, De Val S, Kang G, Jan LY, Black BL, Stainier DY. Foxn4 directly regulates tbx2b expression and atrioventricular canal formation. Genes Dev. 2008a;22:734–739. doi: 10.1101/gad.1629408.
  19. Chi NC, Shaw RM, Jungblut B, Huisken J, Ferrer T, Arnaout R, Scott I, Beis D, Xiao T, Baier H, et al. Genetic and physiologic dissection of the vertebrate cardiac conduction system. PLoS Biol. 2008b;6:e109. doi: 10.1371/journal.pbio.0060109.
  20. Choe SK, Hirsch N, Zhang X, Sagerstrom CG. hnf1b genes in zebrafish hindbrain development. Zebrafish. 2008;5:179–187. doi: 10.1089/zeb.2008.0534.
  21. Chung MI, Ma AC, Fung TK, Leung AY. Characterization of Sry-related HMG box group F genes in zebrafish hematopoiesis. Exp Hematol. 2011;39:986–998.e5. doi: 10.1016/j.exphem.2011.06.010.
  22. Claringbould A, Zaugg JB. Enhancers in disease: molecular basis and emerging treatment strategies. Trends Mol Med. 2021;27:1060–1073. doi: 10.1016/j.molmed.2021.07.012.
  23. Coolen M, Thieffry D, Drivenes O, Becker TS, Bally-Cuif L. miR-9 controls the timing of neurogenesis through the direct inhibition of antagonistic factors. Dev Cell. 2012;22:1052–1064. doi: 10.1016/j.devcel.2012.03.003.
  24. Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, Satpathy AT, Rubin AJ, Montine KS, Wu B. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods. 2017;14:959–962. doi: 10.1038/nmeth.4396.
  25. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA, et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A. 2010;107:21931–21936. doi: 10.1073/pnas.1016071107.
  26. Dal-Pra S, Thisse C, Thisse B. FoxA transcription factors are essential for the development of dorsal axial structures. Dev Biol. 2011;350:484–495. doi: 10.1016/j.ydbio.2010.12.018.
  27. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  28. Dobrzycki T, Mahony CB, Krecsmarik M, Koyunlar C, Rispoli R, Peulen-Zink J, Gussinklo K, Fedlaoui B, de Pater E, Patient R, et al. Deletion of a conserved Gata2 enhancer impairs haemogenic endothelium programming and adult Zebrafish haematopoiesis. Commun Biol. 2020;3:71. doi: 10.1038/s42003-020-0798-3.
  29. Dong PD, Provost E, Leach SD, Stainier DY. Graded levels of Ptf1a differentially regulate endocrine and exocrine fates in the developing pancreas. Genes Dev. 2008;22:1445–1450. doi: 10.1101/gad.1663208.
  30. Duboule D. Temporal colinearity and the phylotypic progression: a basis for the stability of a vertebrate Bauplan and the evolution of morphologies through heterochrony. Dev Suppl. 1994:135–142.
  31. Duque M, Amorim JP, Bessa J. Ptf1a function and transcriptional cis-regulation, a cornerstone in vertebrate pancreas development. FEBS J. 2022;289:5121–5136. doi: 10.1111/febs.16075.
  32. Elgar G. Pan-vertebrate conserved non-coding sequences associated with developmental regulation. Brief Funct Genomic Proteomic. 2009;8:256–265. doi: 10.1093/bfgp/elp033.
  33. Elsayed AK, Younis I, Ali G, Hussain K, Abdelalim EM. Aberrant development of pancreatic beta cells derived from human iPSCs with FOXA2 deficiency. Cell Death Dis. 2021;12:103. doi: 10.1038/s41419-021-03390-8.
  34. Engstrom PG, Fredman D, Lenhard B. Ancora: a web resource for exploring highly conserved noncoding elements and their association with developmental regulatory genes. Genome Biol. 2008;9:R34. doi: 10.1186/gb-2008-9-2-r34.
  35. Fajans SS, Bell GI, Polonsky KS. Molecular mechanisms and clinical pathophysiology of maturity-onset diabetes of the young. N Engl J Med. 2001;345:971–980. doi: 10.1056/NEJMra002168.
  36. Farley EK, Olson KM, Zhang W, Brandt AJ, Rokhsar DS, Levine MS. Suboptimization of developmental enhancers. Science. 2015;350:325–328. doi: 10.1126/science.aac6948.
  37. Farley EK, Olson KM, Zhang W, Rokhsar DS, Levine MS. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers. Proc Natl Acad Sci U S A. 2016;113:6508–6513. doi: 10.1073/pnas.1605085113.
  38. Farrell JA, Wang Y, Riesenfeld SJ, Shekhar K, Regev A, Schier AF. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science. 2018;360 doi: 10.1126/science.aar3131.
  39. Figiel DM, Elsayed R, Nelson AC. Investigating the molecular guts of endoderm formation using zebrafish. Brief Funct Genomics. 2021 doi: 10.1093/bfgp/elab013.
  40. Flasse LC, Pirson JL, Stern DG, Von Berg V, Manfroid I, Peers B, Voz ML. Ascl1b and Neurod1, instead of Neurog3, control pancreatic endocrine cell fate in zebrafish. BMC Biol. 2013;11:78. doi: 10.1186/1741-7007-11-78.
  41. Fuglerud BM, Drissler S, Lotto J, Stephan TL, Thakur A, Cullum R, Hoodless PA. SOX9 reprograms endothelial cells by altering the chromatin landscape. Nucleic Acids Res. 2022;50:8547–8565. doi: 10.1093/nar/gkac652.
  42. Fujita T, Kobayashi S, Yui R. Paraneuron concept and its current implications. Adv Biochem Psychopharmacol. 1980;25:321–325.
  43. Gaspar JM. Improved peak-calling with MACS2. BioRxiv. 2018:496521
  44. Ge SX, Son EW, Yao R. iDEP: an integrated web application for differential expression and pathway analysis of RNA-seq data. BMC Bioinformatics. 2018;19:534. doi: 10.1186/s12859-018-2486-6.
  45. Glasauer SM, Neuhauss SC. Whole-genome duplication in teleost fishes and its evolutionary consequences. Mol Genet Genomics. 2014;289:1045–1060. doi: 10.1007/s00438-014-0889-2.
  46. Gonoi T, Mizuno N, Inagaki N, Kuromi H, Seino Y, Miyazaki J, Seino S. Functional neuronal ionotropic glutamate receptors are expressed in the non-neuronal cell line MIN6. J Biol Chem. 1994;269:16989–16992.
  47. Guturu H, Doxey AC, Wenger AM, Bejerano G. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philos Trans R Soc Lond B Biol Sci. 2013;368:20130029. doi: 10.1098/rstb.2013.0029.
  48. Haubst N, Berger J, Radjendirane V, Graw J, Favor J, Saunders GF, Stoykova A, Gotz M. Molecular dissection of Pax6 function: the specific roles of the paired domain and homeodomain in brain development. Development. 2004;131:6131–6140. doi: 10.1242/dev.01524.
  49. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
  50. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34:D590–598. doi: 10.1093/nar/gkj144.
  51. Hori K, Cholewa-Waclaw J, Nakada Y, Glasgow SM, Masui T, Henke RM, Wildner H, Martarelli B, Beres TM, Epstein JA, et al. A nonclassical bHLH Rbpj transcription factor complex is required for specification of GABAergic neurons independent of Notch signaling. Genes Dev. 2008;22:166–178. doi: 10.1101/gad.1628008.
  52. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496:498–503. doi: 10.1038/nature12111.
  53. Hudson C, Clements D, Friday RV, Stott D, Woodland HR. Xsox17alpha and -beta mediate endoderm formation in Xenopus. Cell. 1997;91:397–405. doi: 10.1016/s0092-8674(00)80423-7.
  54. Hutchinson SA, Eisen JS. Islet1 and Islet2 have equivalent abilities to promote motoneuron formation and to specify motoneuron subtype identity. Development. 2006;133:2137–2147. doi: 10.1242/dev.02355.
  55. Ikeda T, Inamori K, Kawanishi T, Takeda H. Reemployment of Kupffer’s vesicle cells into axial and paraxial mesoderm via transdifferentiation. Dev Growth Differ. 2022;64:163–177. doi: 10.1111/dgd.12774.
  56. Ishibashi M, Ang SL, Shiota K, Nakanishi S, Kageyama R, Guillemot F. Targeted disruption of mammalian hairy and Enhancer of split homolog-1 (HES-1) leads to upregulation of neural helix-loop-helix factors, premature neurogenesis, and severe neural tube defects. Genes Dev. 1995;9:3136–3148. doi: 10.1101/gad.9.24.3136.
  57. Ito T, Udaka N, Yazawa T, Okudela K, Hayashi H, Sudo T, Guillemot F, Kageyama R, Kitamura H. Basic helix-loop-helix transcription factors regulate the neuroendocrine differentiation of fetal mouse pulmonary epithelium. Development. 2000;127:3913–3921. doi: 10.1242/dev.127.18.3913.
  58. Iwafuchi-Doi M, Donahue G, Kakumanu A, Watts JA, Mahony S, Pugh BF, Lee D, Kaestner KH, Zaret KS. The Pioneer Transcription Factor FoxA Maintains an Accessible Nucleosome Configuration at Enhancers for Tissue-Specific Gene Activation. Mol Cell. 2016;62:79–91. doi: 10.1016/j.molcel.2016.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Jarc L, Bandral M, Zanfrini E, Lesche M, Kufrin V, Sendra R, Pezzolla D, Giannios I, Khattak S, Neumann K, et al. Regulation of multiple signaling pathways promotes the consistent expansion of human pancreatic progenitors in defined conditions. Elife. 2024;12 doi: 10.7554/eLife.89962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Jensen J, Pedersen EE, Galante P, Hald J, Heller RS, Ishibashi M, Kageyama R, Guillemot F, Serup P, Madsen OD. Control of endodermal endocrine development by Hes-1. Nat Genet. 2000;24:36–44. doi: 10.1038/71657. [DOI] [PubMed] [Google Scholar]
  61. Jin K, Xiang M. Transcription factor Ptf1a in development, diseases and reprogramming. Cell Mol Life Sci. 2019;76:921–940. doi: 10.1007/s00018-018-2972-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell. 2021;56:575–587. doi: 10.1016/j.devcel.2021.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Johal S, Elsayed R, Wang D, Talbot CD, Feuda R, Panfilio KA, Nelson AC. Molecular and Functional Divergence of Zebrafish Sox Paralogs Controlling Endoderm Formation and Left-Right Patterning. Genome Biol Evol. 2025;17 doi: 10.1093/gbe/evaf213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Julian LM, McDonald AC, Stanford WL. Direct reprogramming with SOX factors: masters of cell fate. Curr Opin Genet Dev. 2017;46:24–36. doi: 10.1016/j.gde.2017.06.005. [DOI] [PubMed] [Google Scholar]
  65. Kamachi Y, Kondoh H. Sox proteins: regulators of cell fate specification and differentiation. Development. 2013;140:4129–4144. doi: 10.1242/dev.091793. [DOI] [PubMed] [Google Scholar]
  66. Kanai-Azuma M, Kanai Y, Gad JM, Tajima Y, Taya C, Kurohmaru M, Sanai Y, Yonekawa H, Yazaki K, Tam PP, et al. Depletion of definitive gut endoderm in Sox17-null mutant mice. Development. 2002;129:2367–2379. doi: 10.1242/dev.129.10.2367. [DOI] [PubMed] [Google Scholar]
  67. Kawaguchi Y, Cooper B, Gannon M, Ray M, MacDonald RJ, Wright CV. The role of the transcriptional regulator Ptf1a in converting intestinal to pancreatic progenitors. Nat Genet. 2002;32:128–134. doi: 10.1038/ng959. [DOI] [PubMed] [Google Scholar]
  68. Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007;17:545–555. doi: 10.1101/gr.6086307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Krapp A, Knofler M, Ledermann B, Burki K, Berney C, Zoerkler N, Hagenbuchle O, Wellauer PK. The bHLH protein PTF1-p48 is essential for the formation of the exocrine and the correct spatial organization of the endocrine pancreas. Genes Dev. 1998;12:3752–3763. doi: 10.1101/gad.12.23.3752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research. 2016;44:W90–W97. doi: 10.1093/nar/gkw377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Lee K, Cho H, Rickert RW, Li QV, Pulecio J, Leslie CS, Huangfu D. FOXA2 Is Required for Enhancer Priming during Pancreatic Differentiation. Cell Rep. 2019;28:382–393.:e387. doi: 10.1016/j.celrep.2019.06.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic acids research. 2019;47:W199–W205. doi: 10.1093/nar/gkz401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Lilly AJ, Lacaud G, Kouskoff V. SOXF transcription factors in cardiovascular development. Semin Cell Dev Biol. 2017;63:50–57. doi: 10.1016/j.semcdb.2016.07.021. [DOI] [PubMed] [Google Scholar]
  76. Loh KM, Ang LT, Zhang J, Kumar V, Ang J, Auyeong JQ, Lee KL, Choo SH, Lim CY, Nichane M, et al. Efficient endoderm induction from human pluripotent stem cells by logically directing signals controlling lineage bifurcations. Cell Stem Cell. 2014;14:237–252. doi: 10.1016/j.stem.2013.12.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Long HK, Prescott SL, Wysocka J. Ever-Changing Landscapes: Transcriptional Enhancers in Development and Evolution. Cell. 2016;167:1170–1187. doi: 10.1016/j.cell.2016.09.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Lopez-Perez AR, Balwierz PJ, Lenhard B, Muller F, Wardle FC, Manfroid I, Voz ML, Peers B. Identification of downstream effectors of retinoic acid specifying the zebrafish pancreas by integrative genomics. Sci Rep. 2021;11:22717. doi: 10.1038/s41598-021-02039-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15:1–21. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Machon O, Masek J, Machonova O, Krauss S, Kozmik Z. Meis2 is essential for cranial and cardiac neural crest development. BMC Dev Biol. 2015;15:40. doi: 10.1186/s12861-015-0093-6.
  81. Maechler P, Wollheim CB. Mitochondrial glutamate acts as a messenger in glucose-induced insulin exocytosis. Nature. 1999;402:685–689. doi: 10.1038/45280.
  82. Martinez-Morales JR. Toward understanding the evolution of vertebrate gene regulatory networks: comparative genomics and epigenomic approaches. Brief Funct Genomics. 2016;15:315–321. doi: 10.1093/bfgp/elv032.
  83. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28:495–501. doi: 10.1038/nbt.1630.
  84. Miguel-Escalada I, Maestro MA, Balboa D, Elek A, Bernal A, Bernardo E, Grau V, Garcia-Hurtado J, Sebe-Pedros A, Ferrer J. Pancreas agenesis mutations disrupt a lead enhancer controlling a developmental enhancer cluster. Dev Cell. 2022;57:1922–1936.e9. doi: 10.1016/j.devcel.2022.07.014.
  85. Miyata T, Maeda T, Lee JE. NeuroD is required for differentiation of the granule cells in the cerebellum and hippocampus. Genes Dev. 1999;13:1647–1652. doi: 10.1101/gad.13.13.1647.
  86. Mizoguchi T, Verkade H, Heath JK, Kuroiwa A, Kikuchi Y. Sdf1/Cxcr4 signaling controls the dorsal migration of endodermal cells during zebrafish gastrulation. Development. 2008;135:2521–2529. doi: 10.1242/dev.020107.
  87. Moore JC, Sheppard-Tindell S, Shestopalov IA, Yamazoe S, Chen JK, Lawson ND. Post-transcriptional mechanisms contribute to Etv2 repression during vascular development. Dev Biol. 2013;384:128–140. doi: 10.1016/j.ydbio.2013.08.028.
  88. Moreno-Ayala R, Olivares-Chauvet P, Schafer R, Junker JP. Variability of an Early Developmental Cell Population Underlies Stochastic Laterality Defects. Cell Rep. 2021;34:108606. doi: 10.1016/j.celrep.2020.108606.
  89. Naya FJ, Huang HP, Qiu Y, Mutoh H, DeMayo FJ, Leiter AB, Tsai MJ. Diabetes, defective pancreatic morphogenesis, and abnormal enteroendocrine differentiation in BETA2/neuroD-deficient mice. Genes Dev. 1997;11:2323–2334. doi: 10.1101/gad.11.18.2323.
  90. Naylor RW, Przepiorski A, Ren Q, Yu J, Davidson AJ. HNF1beta is essential for nephron segmentation during nephrogenesis. J Am Soc Nephrol. 2013;24:77–87. doi: 10.1681/ASN.2012070756.
  91. Nelson AC, Wardle FC. Conserved non-coding elements and cis regulation: actions speak louder than words. Development. 2013;140:1385–1395. doi: 10.1242/dev.084459.
  92. Nolte C, Rastegar M, Amores A, Bouchard M, Grote D, Maas R, Kovacs EN, Postlethwait J, Rambaldi I, Rowan S, et al. Stereospecificity and PAX6 function direct Hoxd4 neural enhancer activity along the antero-posterior axis. Dev Biol. 2006;299:582–593. doi: 10.1016/j.ydbio.2006.08.061.
  93. Offield MF, Jetton TL, Labosky PA, Ray M, Stein RW, Magnuson MA, Hogan BL, Wright CV. PDX-1 is required for pancreatic outgrowth and differentiation of the rostral duodenum. Development. 1996;122:983–995. doi: 10.1242/dev.122.3.983.
  94. Pearse AG, Polak JM. Neural crest origin of the endocrine polypeptide (APUD) cells of the gastrointestinal tract and pancreas. Gut. 1971;12:783–788. doi: 10.1136/gut.12.10.783.
  95. Pellegrinet L, Rodilla V, Liu Z, Chen S, Koch U, Espinosa L, Kaestner KH, Kopan R, Lewis J, Radtke F. Dll1- and dll4-mediated notch signaling are required for homeostasis of intestinal stem cells. Gastroenterology. 2011;140:1230–1240.e1–e7. doi: 10.1053/j.gastro.2011.01.005.
  96. Perillo M, Paganos P, Mattiello T, Cocurullo M, Oliveri P, Arnone MI. New Neuronal Subtypes With a “Pre-Pancreatic” Signature in the Sea Urchin Stongylocentrotus purpuratus. Front Endocrinol (Lausanne) 2018;9:650. doi: 10.3389/fendo.2018.00650.
  97. Pfaff SL, Mendelsohn M, Stewart CL, Edlund T, Jessell TM. Requirement for LIM homeobox gene Isl1 in motor neuron generation reveals a motor neuron-dependent step in interneuron differentiation. Cell. 1996;84:309–320. doi: 10.1016/s0092-8674(00)80985-x.
  98. Polychronopoulos D, King JWD, Nash AJ, Tan G, Lenhard B. Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res. 2017;45:12611–12624. doi: 10.1093/nar/gkx1074.
  99. Potzner MR, Tsarovina K, Binder E, Penzo-Mendez A, Lefebvre V, Rohrer H, Wegner M, Sock E. Sequential requirement of Sox4 and Sox11 during development of the sympathetic nervous system. Development. 2010;137:775–784. doi: 10.1242/dev.042101.
  100. Pouilhe M, Gilardi-Hebenstreit P, Desmarquet-Trin Dinh C, Charnay P. Direct regulation of vHnf1 by retinoic acid signaling and MAF-related factors in the neural tube. Dev Biol. 2007;309:344–357. doi: 10.1016/j.ydbio.2007.07.003.
  101. Pujol-Marti J, Lopez-Schier H. Developmental and architectural principles of the lateral-line neural map. Front Neural Circuits. 2013;7:47. doi: 10.3389/fncir.2013.00047.
  102. Quilichini E, Fabre M, Nord C, Dirami T, Le Marec A, Cereghini S, Pasek RC, Gannon M, Ahlgren U, Haumaitre C. Insights into the etiology and physiopathology of MODY5/HNF1B pancreatic phenotype with a mouse model of the human disease. J Pathol. 2021;254:31–45. doi: 10.1002/path.5629.
  103. Quillien A, Abdalla M, Yu J, Ou J, Zhu LJ, Lawson ND. Robust Identification of Developmentally Active Endothelial Enhancers in Zebrafish Using FANS-Assisted ATAC-seq. Cell Rep. 2017;20:709–720. doi: 10.1016/j.celrep.2017.06.070.
  104. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  105. Raff RA. The shape of life: genes, development, and the evolution of animal form. University of Chicago Press; 2012.
  106. Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–165. doi: 10.1093/nar/gkw257.
  107. Rich-Griffin C, Eichmann R, Reitz MU, Hermann S, Woolley-Allen K, Brown PE, Wiwatdirekkul K, Esteban E, Pasha A, Kogel KH, et al. Regulation of Cell Type-Specific Immunity Networks in Arabidopsis Roots. Plant Cell. 2020;32:2742–2762. doi: 10.1105/tpc.20.00154.
  108. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative Genomics Viewer. Nat Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
  109. Roy A, Francius C, Rousso DL, Seuntjens E, Debruyn J, Luxenhofer G, Huber AB, Huylebroeck D, Novitch BG, Clotman F. Onecut transcription factors act upstream of Isl1 to regulate spinal motoneuron diversification. Development. 2012;139:3109–3119. doi: 10.1242/dev.078501.
  110. Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004;5:99. doi: 10.1186/1471-2164-5-99.
  111. Sander M, Sussel L, Conners J, Scheel D, Kalamaras J, Dela Cruz F, Schwitzgebel V, Hayes-Jordan A, German M. Homeobox gene Nkx6.1 lies downstream of Nkx2.2 in the major pathway of beta-cell formation in the pancreas. Development. 2000;127:5533–5540. doi: 10.1242/dev.127.24.5533.
  112. Saund RS, Kanai-Azuma M, Kanai Y, Kim I, Lucero MT, Saijoh Y. Gut endoderm is involved in the transfer of left-right asymmetry from the node to the lateral plate mesoderm in the mouse embryo. Development. 2012;139:2426–2435. doi: 10.1242/dev.079921.
  113. Shaw-Smith C, De Franco E, Lango Allen H, Batlle M, Flanagan SE, Borowiec M, Taplin CE, van Alfen-van der Velden J, Cruz-Rojo J, Perez de Nanclares G, et al. GATA4 mutations are a cause of neonatal and childhood-onset diabetes. Diabetes. 2014;63:2888–2894. doi: 10.2337/db14-0061.
  114. Shi ZD, Lee K, Yang D, Amin S, Verma N, Li QV, Zhu Z, Soh CL, Kumar R, Evans T, et al. Genome Editing in hPSCs Reveals GATA6 Haploinsufficiency and a Genetic Interaction with GATA4 in Human Pancreatic Development. Cell Stem Cell. 2017;20:675–688.e6. doi: 10.1016/j.stem.2017.01.001.
  115. Shin CH, Chung WS, Hong SK, Ober EA, Verkade H, Field HA, Huisken J, Stainier DY. Multiple roles for Med12 in vertebrate endoderm development. Dev Biol. 2008;317:467–479. doi: 10.1016/j.ydbio.2008.02.031.
  116. Spence JR, Lange AW, Lin SC, Kaestner KH, Lowy AM, Kim I, Whitsett JA, Wells JM. Sox17 regulates organ lineage segregation of ventral foregut progenitor cells. Dev Cell. 2009;17:62–74. doi: 10.1016/j.devcel.2009.05.012.
  117. Starks RR, Biswas A, Jain A, Tuteja G. Combined analysis of dissimilar promoter accessibility and gene expression profiles identifies tissue-specific genes and actively repressed networks. Epigenetics Chromatin. 2019;12:16. doi: 10.1186/s13072-019-0260-2.
  118. Sun Z, Hopkins N. vhnf1, the MODY5 and familial GCKD-associated gene, regulates regional specification of the zebrafish gut, pronephros, and hindbrain. Genes Dev. 2001;15:3217–3229. doi: 10.1101/gad.946701.
  119. Sur A, Wang Y, Capar P, Margolin G, Prochaska MK, Farrell JA. Single-cell analysis of shared signatures and transcriptional diversity during zebrafish development. Dev Cell. 2023;58:3028–3047.e12. doi: 10.1016/j.devcel.2023.11.001.
  120. Tanigawa Y, Dyer ES, Bejerano G. WhichTF is functionally important in your open chromatin data? PLOS Computational Biology. 2022;18:e1010378. doi: 10.1371/journal.pcbi.1010378.
  121. Tena JJ, Gonzalez-Aguilera C, Fernandez-Minan A, Vazquez-Marin J, Parra-Acero H, Cross JW, Rigby PW, Carvajal JJ, Wittbrodt J, Gomez-Skarmeta JL, et al. Comparative epigenomics in distantly related teleost species identifies conserved cis-regulatory nodes active during the vertebrate phylotypic period. Genome Res. 2014;24:1075–1085. doi: 10.1101/gr.163915.113.
  122. Traver D, Paw BH, Poss KD, Penberthy WT, Lin S, Zon LI. Transplantation and in vivo imaging of multilineage engraftment in zebrafish bloodless mutants. Nat Immunol. 2003;4:1238–1246. doi: 10.1038/ni1007.
  123. Trinh LT, Osipovich AB, Liu B, Shrestha S, Cartailler JP, Wright CVE, Magnuson MA. Single-Cell RNA Sequencing of Sox17-Expressing Lineages Reveals Distinct Gene Regulatory Networks and Dynamic Developmental Trajectories. Stem Cells. 2023;41:643–657. doi: 10.1093/stmcls/sxad030.
  124. Troll JV, Hamilton MK, Abel ML, Ganz J, Bates JM, Stephens WZ, Melancon E, van der Vaart M, Meijer AH, Distel M, et al. Microbiota promote secretory cell determination in the intestinal epithelium by modulating host Notch signaling. Development. 2018;145:dev155317. doi: 10.1242/dev.155317.
  125. van Arensbergen J, Garcia-Hurtado J, Moran I, Maestro MA, Xu X, Van de Casteele M, Skoudy AL, Palassini M, Heimberg H, Ferrer J. Derepression of Polycomb targets during pancreatic organogenesis allows insulin-producing beta-cells to adopt a neural gene activity program. Genome Res. 2010;20:722–732. doi: 10.1101/gr.101709.109.
  126. Vanheer L, Fantuzzi F, To SK, Schiavo A, Van Haele M, Ostyn T, Haesen T, Yi X, Janiszewski A, Chappell J, et al. Inferring regulators of cell identity in the human adult pancreas. NAR Genom Bioinform. 2023;5:lqad068. doi: 10.1093/nargab/lqad068.
  127. Vavouri T, Lehner B. Conserved noncoding elements and the evolution of animal body plans. Bioessays. 2009;31:727–735. doi: 10.1002/bies.200900014.
  128. Villefranc JA, Amigo J, Lawson ND. Gateway compatible vectors for analysis of gene function in the zebrafish. Dev Dyn. 2007;236:3077–3087. doi: 10.1002/dvdy.21354.
  129. Viotti M, Niu L, Shi SH, Hadjantonakis AK. Role of the gut endoderm in relaying left-right patterning in mice. PLoS Biol. 2012;10:e1001276. doi: 10.1371/journal.pbio.1001276.
  130. Von Baer KE. Über Entwickelungsgeschichte der Thiere; Beobachtung und Reflexion. Bei den Gebrüdern Bornträger; 1828.
  131. Warga RM, Kane DA. Wilson cell origin for Kupffer’s vesicle in the zebrafish. Dev Dyn. 2018;247:1057–1069. doi: 10.1002/dvdy.24657.
  132. Warga RM, Nusslein-Volhard C. Origin and development of the zebrafish endoderm. Development. 1999;126:827–838. doi: 10.1242/dev.126.4.827.
  133. Weedon MN, Cebola I, Patch AM, Flanagan SE, De Franco E, Caswell R, Rodriguez-Segui SA, Shaw-Smith C, Cho CH, Allen HL, et al. Recessive mutations in a distal PTF1A enhancer cause isolated pancreatic agenesis. Nat Genet. 2014;46:61–64. doi: 10.1038/ng.2826.
  134. Westerfield M. The Zebrafish Book: A guide for the laboratory use of zebrafish (Danio rerio). 2007.
  135. Whitington T, Frith MC, Johnson J, Bailey TL. Inferring transcription factor complexes from ChIP-seq data. Nucleic Acids Res. 2011;39:e98. doi: 10.1093/nar/gkr341.
  136. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005;3:e7. doi: 10.1371/journal.pbio.0030007.
  137. Xia J, Kang Z, Xue Y, Ding Y, Gao S, Zhang Y, Lv P, Wang X, Ma D, Wang L, et al. A single-cell resolution developmental atlas of hematopoietic stem and progenitor cell expansion in zebrafish. Proc Natl Acad Sci U S A. 2021;118:e2015748118. doi: 10.1073/pnas.2015748118.
  138. Ye T, Krebs AR, Choukrallah MA, Keime C, Plewniak F, Davidson I, Tora L. seqMINER: an integrated ChIP-seq data interpretation platform. Nucleic Acids Res. 2011;39:e35. doi: 10.1093/nar/gkq1287.
  139. Yee NS, Yusuff S, Pack M. Zebrafish pdx1 morphant displays defects in pancreas development and digestive organ chirality, and potentially identifies a multipotent pancreas progenitor cell. Genesis. 2001;30:137–140. doi: 10.1002/gene.1049.
  140. Yu G, Wang L-G, He Q-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics. 2015;31:2382–2383. doi: 10.1093/bioinformatics/btv145.
  141. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
  142. Zhu H, Wang G, Nguyen-Ngoc KV, Kim D, Miller M, Goss G, Kovsky J, Harrington AR, Saunders DC, Hopkirk AL, et al. Understanding cell fate acquisition in stem-cell-derived pancreatic islets using single-cell multiome-inferred regulomes. Dev Cell. 2023;58:727–743.e11. doi: 10.1016/j.devcel.2023.03.011.

Associated Data


Supplementary Materials

Supplementary Tables

Data Availability Statement

All raw and processed sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE294761 (ATAC-seq) and GSE294799 (RNA-seq).
