Abstract
Neurodevelopmental disorders are thought to arise from disrupted brain development at an early age. Genome-wide association studies (GWAS) have identified hundreds of loci associated with susceptibility to neurodevelopmental disorders; however, which noncoding variants regulate which genes at these loci is often unclear. To implicate neuronal GWAS effector genes, we performed an integrated analysis of transcriptomic, epigenomic and chromatin conformation changes during the differentiation of induced pluripotent stem cell (iPSC)-derived neuronal progenitor cells (NPCs) into neurons, using a combination of high-resolution promoter-focused Capture-C, ATAC-seq and RNA-seq. We observed that gene expression changes during the NPC-to-neuron transition were highly dependent on both promoter accessibility changes and long-range interactions that connect distal cis-regulatory elements (enhancers or silencers) to developmental-stage-specific genes. These genome-scale promoter-cis-regulatory-element atlases implicated 454 neurodevelopmental disorder-associated, putative causal variants mapping to 600 distal target genes. These putative effector genes were significantly enriched for pathways involved in the regulation of neuronal development and chromatin organization, with 27% expressed in a stage-specific manner. The intersection of open chromatin and chromatin conformation revealed development-stage-specific gene regulatory architectures during neuronal differentiation, providing a rich resource to aid characterization of the genetic and developmental basis of neurodevelopmental disorders.
Keywords: chromatin architecture, epigenomics, iPSC, neurodevelopmental disorders
INTRODUCTION
Neurodevelopmental disorders are a group of disorders arising from disrupted development of the central nervous system at an early age1. These phenotypes can manifest as a variety of neurological dysfunctions, including cognitive deficits and behavioral and communication impairments in childhood, as well as neuropsychiatric conditions that appear only later in adulthood. Common pediatric neurodevelopmental disorders include intellectual disability (ID), attention deficit hyperactivity disorder (ADHD) and autism spectrum disorders (ASD). Other neuropsychiatric diseases, such as schizophrenia (SCZ) and bipolar disorder (BIP), may also stem from impaired neuronal development early in life, i.e. during pregnancy, the perinatal period, or infancy/childhood2.
Like most common complex traits, most neurodevelopmental disorders are widely thought to result from the combination of, and interplay between, genetic and environmental factors. A genetic component is supported by the high heritability of many neurodevelopmental disorders; for instance, twin and sibling studies have estimated heritability above 70% for schizophrenia3, bipolar disorder4, ADHD5 and ASD6,7 across the lifespan, and approximately 40%-60% for anti-social behavior (ATS)8, mild ID9 and obsessive-compulsive symptoms (OCD)10. Some phenotypes, like ASD, can be highly genetically heterogeneous, and their pathogenesis can be driven by a variety of inherited and de novo variants11,12,13. Over the past 15 years, genome-wide association studies (GWAS) have revealed hundreds of loci associated with susceptibility to neurodevelopmental disorders14,15,16,17,18,19,20. Although the precise effector gene at each locus is often unknown, many of the genes residing close to these signals are enriched in pathways related to neuronal proliferation, differentiation, migration and maturation, e.g. glutamatergic neurotransmission and synaptic plasticity13,5,14,17. These reports therefore support the importance of neuronal development in conferring risk for these disorders, and thus warrant further investigation.
Given that the majority of GWAS-implicated variants for various traits are located in non-coding regions, identifying the true causal variants and their corresponding effector genes has been a considerable challenge. Many fine-mapping strategies have been developed, and they can generally be classified as either inferring variant function (e.g. overlapping variants with chromatin marks, chromatin accessibility quantitative trait loci21, TF binding disruption22, etc.) or prioritizing genes (cis-eQTL colocalization23, DEPICT24, MAGMA25, etc.). With the development of three-dimensional (3D) genomic technologies, an increasing number of studies have integrated spatial and functional genomic organization with GWAS data. Such approaches have driven the mapping of putatively causal variants associated with various traits to their corresponding effector gene(s) in relevant cell types26,27,28,29. For neurologically related traits, Rajarajan and colleagues (2018) used Hi-C to explore genome conformation changes at the level of topologically associating domains (TADs) during neuronal cell differentiation30. These data helped triple the candidate susceptibility gene list for schizophrenia and implicated cell-type-specific disease risk via spatial genome organization. Besides Hi-C, other 3D genomics techniques, including HindIII-based promoter capture Hi-C and proximity ligation-assisted ChIP-seq (PLAC-seq), have also been applied to various iPSC-derived or primary neural cell types, including excitatory neurons, hippocampal DG-like neurons, lower motor neurons, astrocytes28, microglia and oligodendrocytes31. Similarly, an integrative analysis combined promoter Capture Hi-C with ChIP-seq for regulatory histone marks to characterize changes in promoter contacts with regulatory elements during lineage specification from embryonic stem cells to neural progenitors32.
These studies emphasized the importance of 3D chromatin structure in gene regulation and in a wide range of neuropsychiatric disorders. However, the gene regulatory architecture driving neuronal cell differentiation has not been fully explored at the level of promoter-cis-regulatory-element interactions across a wide range of neurodevelopmental disorders.
We sought to extend these prior studies by applying high-resolution promoter-focused Capture-C, coupled with ATAC-seq and RNA-seq, to characterize genetic and regulatory architecture changes at the level of promoter-cis-regulatory-element interactions during differentiation of human iPSC-derived neuronal progenitor cells (NPCs) into neurons. We report that gene expression changes extensively during the transition from NPCs to neurons, but that these changes can only partially be explained by promoter-proximal epigenomic changes. By integrating long-range promoter-chromatin interactions in 3D, we observe that developmental-stage-specific gene expression is associated with stage-specific contacts with distal cis-regulatory elements, with the chromatin modifications of the connected cis-regulatory elements, and with specific transcription factor binding patterns. Finally, using our promoter interaction maps in neuronal cell types, we connect 454 putative neurodevelopmental disorder-associated causal variants to 600 candidate effector genes, 60% of which are expressed in a developmental-stage-specific manner. These promoter-cis-regulatory-element atlases reveal both shared and neural stage-specific gene regulatory architectures, and provide a rich resource to aid understanding of the genetic and developmental basis of neurodevelopmental disorders.
RESULTS
Modeling gene expression dynamics during differentiation from iPSC-derived neuronal progenitor cells
To model the differentiation of human neuronal cells in vitro, we generated both neuronal progenitor cells (NPCs) and neurons from iPSC lines derived from two healthy donors (CHOPWT10 and CHOPWT14)33 using an established protocol34 (Figure 1A). We assessed the expression of several cell type-specific marker genes using PCR and immunofluorescence microscopy to confirm that the iPSC-derived cells molecularly resemble progenitors and neuronal cells. As anticipated, NPCs expressed the neuronal progenitor markers Nestin, SOX1 and PAX6 (Supplemental Figure 1A, B), while neurons expressed the neuronal markers RBFOX3/NeuN, MAP2 and TUJ1 (Supplemental Figure 1C, D). To further validate the expression profiles of iPSC-derived neurons and NPCs, we performed RNA-seq on three replicates for each derived cell line. The difference between neurons and NPCs dominated the observed global expression variation, while differences between individual donors were subtle (Figure 1B). Beyond the few marker genes verified by PCR and immunofluorescence microscopy, 6,222 genes were significantly differentially expressed between neurons and NPCs (FDR < 0.05, |log2 fold change| > 1), including a handful of previously known marker genes (Figure 1C, Supplemental Table 1). Genes upregulated in NPCs included those encoding the Notch effector HES5, which regulates the onset and maintenance of neuronal progenitor cells, and POU3F2, which maintains neurogenesis in several neuronal progenitor populations including neuroepithelial and radial glial cells35. In contrast, genes involved in neuronal migration and axon formation, such as those encoding neurofilament light chain (NEFL)36, the neuronal cell adhesion molecule contactin 2 (CNTN2)37 and the axonal membrane protein GAP-43 (GAP43)38, were highly expressed in neurons.
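The differential expression cutoff used above (FDR < 0.05 and |log2 fold change| > 1) amounts to a simple two-sided filter. The sketch below is illustrative only: it assumes per-gene statistics have already been produced by a differential expression tool, assumes that positive log2 fold changes denote higher expression in neurons, and uses a hypothetical `DEResult` record and made-up gene names and values.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """Hypothetical per-gene differential expression record."""
    gene: str
    log2_fold_change: float  # assumed convention: > 0 means higher in neurons
    fdr: float               # multiple-testing adjusted p-value


def split_by_direction(results, fdr_max=0.05, min_abs_lfc=1.0):
    """Return (higher_in_neurons, higher_in_npcs) gene lists passing both cutoffs."""
    up, down = [], []
    for r in results:
        if r.fdr < fdr_max and abs(r.log2_fold_change) > min_abs_lfc:
            (up if r.log2_fold_change > 0 else down).append(r.gene)
    return up, down


# Toy input with made-up values, for illustration only.
toy = [
    DEResult("GENE_A", 5.2, 1e-10),   # passes, higher in neurons
    DEResult("GENE_B", -4.1, 1e-8),   # passes, higher in NPCs
    DEResult("GENE_C", 0.1, 0.5),     # fails both cutoffs
    DEResult("GENE_D", 1.5, 0.2),     # large change but not significant
]
up, down = split_by_direction(toy)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Requiring both a significance and an effect-size threshold keeps genes like the hypothetical GENE_D, with a large but noisy change, out of the differentially expressed set.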
Consistent with the known functions of these genes, pathways upregulated in NPCs (IPA z-score < 0, FDR < 0.05) included the Notch signaling pathway and cell cycle regulation pathways, while synaptogenesis signaling, CREB signaling and netrin signaling were significantly upregulated in neurons (IPA z-score > 0, FDR < 0.05) (Figure 1D, Supplemental Table 2).
Figure 1. Gene expression dynamics during iPSC-derived neuronal cell differentiation.

A. Schematic of the study design for generating NPCs and neurons and performing integrative analysis using promoter-focused Capture-C, ATAC-seq and RNA-seq to identify promoter-cis-regulatory element (cRE) looping and perform variant-to-gene mapping using GWAS data. B. Principal component analysis (PCA) of RNA-seq replicates in NPCs and neurons. C. Relative expression heatmap of 6,222 differentially expressed genes across all samples, with known marker genes labeled. D, E. Top enriched canonical pathways from Qiagen Ingenuity Pathway Analysis for protein-coding genes differentially expressed between NPCs and neurons. Enriched pathways are grouped by z-score into activated pathways (D, red) and suppressed pathways (E, blue) and ranked by their BH-corrected p-value (−log10BH). The percentage of genes in each pathway found in our differentially expressed gene sets is plotted as a ratio (orange line), and the z-score is represented by the transparency scale. Expanded lists of enriched pathways are available in Supplemental Table 2.
We further compared the global expression profiles of our iPSC-derived neuronal cells with those of primary tissues in the Genotype-Tissue Expression (GTEx) database39. The expression profiles of iPSC-derived neurons correlated more strongly with brain tissues than with non-neuronal tissues (Supplemental Figure 2A). In contrast, iPSC-derived NPCs appeared less differentiated, correlating most highly with cultured fibroblasts while still retaining a brain-like expression profile (Supplemental Figure 2B). Taken together, our data support iPSC-derived NPCs and neurons as an effective model system for analyzing the transcriptional mechanisms underlying neuronal cell differentiation.
Promoter chromatin accessibility patterns partially correlate with gene expression changes
To gain insight into the transcriptional mechanisms underlying neuronal cell differentiation, we performed ATAC-seq on both NPCs and neurons from the same donors to assess the chromatin accessibility landscape in these cells. Five technical replicates per donor for each cell type were sequenced, yielding more than 40 million paired-end reads per sample and enabling us to identify open chromatin regions (OCRs) that typify active regulatory elements in both cell types. Approximately 200,000 non-overlapping OCRs were identified independently in each of neurons and NPCs, and merging the two OCR sets yielded a consensus set of 277,605 OCRs. Pearson correlation between samples indicated both high data quality and reproducibility between replicates (Supplemental Figure 3A). As with RNA-seq, accessibility differences between NPCs and neurons accounted for the major variation in the data, although genetic variation between donors also contributed to differences in the chromatin accessibility landscape.
To assess global accessibility changes during neuronal cell differentiation, we performed differential analysis on the consensus OCRs and observed that 30,996 (11.2%) OCRs were significantly differentially accessible between NPCs and neurons, of which 72.5% (22,485) were more accessible in NPCs than in neurons (Supplemental Figure 3B). A global assessment of accessibility between the cell types confirmed that the chromatin landscape was generally more accessible in NPCs (two-sided paired Wilcoxon test, P < 2×10−16) (Figure 2A). Our data suggest that a more globally open chromatin state is actively maintained in progenitor cells to allow for transcriptional activation.
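Because each consensus OCR is measured in both cell types, a paired comparison of per-OCR log2FPKM values corresponds to the Wilcoxon signed-rank test. The following is a minimal standard-library sketch using the large-sample normal approximation, assuming accessibility vectors aligned by OCR; it omits the tie and continuity corrections that statistical packages usually apply, so it only illustrates the test and is not the software used for the reported P-value.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    x, y: paired measurements (e.g. log2FPKM of the same OCRs in NPCs and
    neurons). Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks; no tie or continuity
    correction is applied to the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p_value
```

With hundreds of thousands of OCRs, even a small but consistent shift toward higher NPC accessibility drives the normal-approximation P-value far below 2×10−16, which is why such results are usually reported as a bound.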
Figure 2. Chromatin accessibility patterns in NPCs and neurons.

A. Generally higher accessibility in NPCs. Log2FPKM of 277,605 OCRs is used as a proxy for general chromatin accessibility and plotted as violin plots for both NPCs (blue) and neurons (red). Boxplots indicate Q1 − 1.5 × interquartile range (IQR), 25th percentile (Q1), median, 75th percentile (Q3) and Q3 + 1.5 × IQR. Statistical significance was assessed with a Wilcoxon test (P < 2e-16). B. Composition of OCRs based on their distance relative to genes. prOCRs (blue) lie within −1500 bp to +500 bp of a transcription start site. The remaining non-prOCRs were grouped into intragenic OCRs (green), which overlap gene bodies, and intergenic OCRs (red), which lie outside gene bodies. C. Promoter accessibility change for genes differentially expressed between NPCs and neurons. The log2 fold change of FPKM was used to measure the accessibility change of the prOCR closest to the TSS of each upregulated (red, n = 3,163) and downregulated (blue, n = 2,032) gene. D. Examples of prOCR accessibility and gene expression change for neuron and NPC marker genes. The log2FPKM of prOCRs and log2TPM of gene expression were scaled and plotted as a heatmap for neuron (red font: SYN1, STMN4, RBFOX3, NEFL) and NPC (blue font: NES, HES5, PAX6, SOX1) marker genes. Yellow/red intensity represents relatively low/high expression or promoter accessibility. E. Enrichment of promoter chromatin openness among differentially expressed genes. Enrichment is calculated as the smoothed density of the prOCR accessibility fold changes (log2FPKM) for each differentially expressed gene set (upregulated: red; downregulated: blue), divided by the density of the global fold changes for all genes.
To characterize the regulatory activity of these OCRs, we first focused on OCRs that coincide with gene promoter regions (prOCRs; −1500 bp to +500 bp of the transcription start site (TSS)), presuming that they function as promoter regulatory elements and directly drive gene expression. We annotated 14.8% (41,111) of the OCRs to 35,215 genes (Figure 2B, Supplemental Table 4); of these, 6,075 prOCRs, corresponding to 7,749 genes, were differentially accessible between NPCs and neurons. Note that gene-prOCR assignments are not one-to-one, given the arbitrary definition of the promoter region (−1500 to +500 bp of the TSS) and the close proximity of some gene TSSs to one another. Among all genes with at least one prOCR, 5,195 were differentially expressed between the cell types, with 2,032 downregulated and 3,163 upregulated when NPCs were differentiated into neurons.
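The prOCR definition above (−1500 bp to +500 bp of the TSS) must be applied in a strand-aware way, and it naturally produces many-to-many gene-OCR assignments when TSSs lie close together. A minimal sketch follows, assuming half-open 0-based coordinates and hypothetical `Gene`/`OCR` records; it uses brute-force comparison, whereas a genome-scale run would need an interval index.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


class OCR(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open [start, end)
    end: int


def promoter_window(gene, upstream=1500, downstream=500):
    """Strand-aware promoter window around the TSS, as a half-open interval."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return gene.tss - downstream, gene.tss + upstream


def annotate_procrs(ocrs, genes, upstream=1500, downstream=500):
    """Map each OCR to every gene whose promoter window it overlaps.

    Brute force O(len(ocrs) * len(genes)); an interval index would be
    needed at genome scale.
    """
    hits = {}
    for ocr in ocrs:
        for gene in genes:
            if gene.chrom != ocr.chrom:
                continue
            lo, hi = promoter_window(gene, upstream, downstream)
            if ocr.start < hi and ocr.end > lo:
                hits.setdefault(ocr, []).append(gene.name)
    return hits


# Two hypothetical genes with nearby TSSs sharing one promoter OCR.
genes = [
    Gene("GENE_A", "chr1", 10_000, "+"),
    Gene("GENE_B", "chr1", 10_200, "-"),
    Gene("GENE_C", "chr1", 50_000, "+"),
]
ocrs = [OCR("chr1", 9_800, 10_100), OCR("chr1", 30_000, 30_500)]
print(annotate_procrs(ocrs, genes))
```

Here the first OCR is assigned to both GENE_A and GENE_B, while the second OCR, far from any TSS, remains unannotated, mirroring the non-promoter OCRs that only chromatin conformation data can later connect to genes.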
To determine whether changes in expression correlated with changes in promoter accessibility, we examined upregulated and downregulated genes separately. When expression increased during neuronal differentiation, the corresponding prOCRs were generally more open (Figure 2C). For example, prOCR accessibility for the neuron marker genes SYN1, STMN4, RBFOX3 and NEFL increased along with their expression during neuronal differentiation (Figure 2D). This trend was more pronounced for genes with larger expression changes (Figure 2E). However, among genes downregulated during neuronal differentiation, more than 50% of the corresponding prOCRs still gained accessibility, and prOCR accessibility was less concordant with gene expression change (Figure 2D, E). This observation suggests that changes in promoter accessibility only partially explain differences in gene expression during neuronal differentiation, and that promoter accessibility likely contributes more substantially to gene upregulation than to downregulation.
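The enrichment curve in Figure 2E is described as a smoothed density of prOCR fold changes for a differentially expressed gene set divided by the density for all genes. The legend does not state the smoothing method, so the sketch below assumes a Gaussian kernel density estimate with a fixed, arbitrarily chosen bandwidth; values above 1 mark fold-change ranges over-represented in the gene set.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate of `values`."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def density_enrichment(subset_fc, all_fc, grid, bandwidth=0.5):
    """Ratio of subset density to background density at each grid point.

    Values > 1 mean fold changes near that point are over-represented in the
    subset (e.g. prOCRs of upregulated genes) relative to all genes.
    The bandwidth of 0.5 is an assumption, not the study's setting.
    """
    f_sub = gaussian_kde(subset_fc, bandwidth)
    f_all = gaussian_kde(all_fc, bandwidth)
    out = []
    for x in grid:
        denom = f_all(x)
        out.append(f_sub(x) / denom if denom > 0 else float("nan"))
    return out
```

Dividing by the background density removes the shape of the global fold-change distribution, so a flat curve at 1 would mean the gene set's promoters behave like promoters genome-wide.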
Characterizing long-range chromatin interactions in NPCs and neurons
To investigate the impact of higher-order chromatin organization on gene expression regulation, we performed high-resolution (DpnII, a 4-cutter) promoter-focused Capture-C29,40 on both iPSC-derived NPCs and neurons from the same donors. Specifically, we enriched for promoter-involved interactions by hybridization with a set of 127,472 RNA probes (“baits”) targeting the promoters of 28,241 genes in GENCODE V19, along with an additional 13,877 lincRNA and sno/miRNA annotations from UCSC. As with ATAC-seq and RNA-seq, the promoter-focused Capture-C samples clustered principally by cell type, with an average stratum-adjusted correlation coefficient41 of contact frequency between biological replicates of 0.875 (0.929 between technical replicates) (Supplemental Figure 4A). Following initial pre-processing with HiCUP42 (Supplemental Table 5), we applied CHiCAGO43 to the captured read pairs to identify significant interactions at 36,691 uniquely baited DpnII fragments, using a CHiCAGO score threshold of 5. This threshold best balanced the sensitivity and specificity of detecting active regulatory elements and enhancer features across cell types (Supplemental Figure 5B). It yielded 246,206 and 442,456 interactions, with median interaction distances of 48 kb and 85 kb, for neurons and NPCs, respectively (Supplemental Table 6, Supplemental Figure 4C). This dramatic reduction in promoter contact number in neurons relative to NPCs is consistent with previous observations30. Of the 592,798 interactions detected in at least one cell type, 10% were differential between NPCs and neurons (Supplemental Table 7). Compared to random interactions, the interactions (CHiCAGO score ≥ 5) identified in both NPCs and neurons were significantly enriched for histone-marked regulatory features in the corresponding cell types (Supplemental Figure 4D).
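Once CHiCAGO scores are available, thresholding and summarizing interaction distances is straightforward. The sketch below assumes an inclusive cutoff of 5 and a hypothetical dictionary layout for each interaction (bait and other-end fragment coordinates plus score); it does not reproduce CHiCAGO's own output format, and distance is taken between fragment midpoints for cis interactions only.

```python
from statistics import median


def significant(interactions, min_score=5.0):
    """Keep interactions whose CHiCAGO score passes the cutoff (inclusive here)."""
    return [it for it in interactions if it["score"] >= min_score]


def midpoint_distance(it):
    """Distance (bp) between bait and other-end fragment midpoints (cis only)."""
    bait_mid = (it["bait_start"] + it["bait_end"]) / 2
    oe_mid = (it["oe_start"] + it["oe_end"]) / 2
    return abs(oe_mid - bait_mid)


def median_distance(interactions, min_score=5.0):
    """Median interaction distance among significant calls, or None if none pass."""
    kept = significant(interactions, min_score)
    return median(midpoint_distance(it) for it in kept) if kept else None


# Hypothetical calls for one bait; the second falls below the cutoff.
calls = [
    {"score": 6.1, "bait_start": 1000, "bait_end": 1200, "oe_start": 50000, "oe_end": 50200},
    {"score": 4.9, "bait_start": 1000, "bait_end": 1200, "oe_start": 900000, "oe_end": 900200},
    {"score": 5.0, "bait_start": 1000, "bait_end": 1200, "oe_start": 101000, "oe_end": 101200},
]
print(median_distance(calls))  # 74500.0
```

Applying the same function separately to neuron and NPC calls gives the kind of per-cell-type median distances reported above (48 kb versus 85 kb).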
On average, ~95% of these interactions resided within the same topologically associating domain (TAD), as defined in human embryonic stem cells44 (Supplemental Figure 4E), and 10% of interactions occurred between promoter-containing baits. To increase statistical power for detecting longer-distance contacts, for which fewer reads per fragment are available to call significant interactions, we also called promoter contacts at four-fragment resolution after in silico fragment concatenation29,45. This allowed us to detect 236,501 and 293,681 interactions, with median interaction distances of 192 kb and 227 kb, for neurons and NPCs, respectively (Supplemental Table 6, Supplemental Figure 4C, E), of which 8% were differential between cell types (Supplemental Table 7). Combining both call sets preserved the precision of the single-DpnII-fragment analysis while increasing sensitivity to longer-range interactions detected at lower resolution, allowing us to assemble comprehensive promoter contact maps across the genome.
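In silico concatenation trades resolution for read depth by pooling counts from neighboring DpnII fragments. The exact procedure of refs 29 and 45 is not reproduced here; this sketch assumes non-overlapping groups of four consecutive fragments per chromosome, merges only the other-end fragments, and uses hypothetical tuple layouts for fragments and read pairs.

```python
from collections import Counter


def concatenate_fragments(fragments, group_size=4):
    """Map each (chrom, start, end) fragment to the merged bin containing it.

    Fragments are sorted by position and grouped into non-overlapping runs of
    `group_size` consecutive fragments per chromosome (the last run may be shorter).
    """
    by_chrom = {}
    for chrom, start, end in sorted(fragments):
        by_chrom.setdefault(chrom, []).append((start, end))
    frag_to_bin = {}
    for chrom, frags in by_chrom.items():
        for i in range(0, len(frags), group_size):
            group = frags[i:i + group_size]
            merged = (chrom, group[0][0], group[-1][1])
            for start, end in group:
                frag_to_bin[(chrom, start, end)] = merged
    return frag_to_bin


def pool_counts(read_pairs, frag_to_bin):
    """Sum read-pair counts per (bait, merged other-end bin)."""
    pooled = Counter()
    for bait, other_end, n_reads in read_pairs:
        pooled[(bait, frag_to_bin[other_end])] += n_reads
    return pooled
```

Pooling lets sparse distal contacts, each with only a few reads per single fragment, accumulate enough counts to pass a significance threshold, at the cost of a larger, less precise other-end interval.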
To further explore the regulatory nature of spatial connections between gene promoters and distal genomic regions, we focused on promoter-interacting regions (PIRs) overlapping pre-called OCRs, where transcriptional regulation largely occurs46. In total, 88,576 (31.9%) OCRs overlapped PIRs on the same chromosome in at least one cell type. We termed these promoter-interacting OCRs putative “cis-regulatory elements” (cREs). Of these cREs, 71% (62,883) lay more than 2 kb from a TSS and had previously been annotated as non-promoter OCRs (nprOCRs) that could not be assigned to any gene (Supplemental Figure 5A). By incorporating promoter-focused Capture-C data, we could annotate an additional 22.7% of OCRs and connect them to a total of 37,435 distal genes, compared with just 14.8% of OCRs annotated by genomic proximity alone using ATAC-seq-defined prOCRs. On average, each gene was contacted by 6-7 cREs, with 90% of genes having fewer than 20 interacting cREs (Supplemental Figure 5B). In neurons and NPCs, 22,232 and 24,841 cREs, respectively, interacted with only one gene, while 4-6% of cREs interacted with more than 10 genes; the latter could act as regulatory hubs in one of these cell types (Supplemental Figure 5C, Supplemental Table 8, Supplemental Table 9).
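Defining cREs reduces to an interval-overlap join between OCRs and the other ends (PIRs) of promoter interactions, after which genes per cRE can be counted. A minimal sketch follows, assuming half-open coordinates, cis interactions only, and hypothetical dictionary keys (`gene`, `chrom`, `oe_start`, `oe_end`); `interactions` should be a list because it is scanned once per OCR.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def call_cres(ocrs, interactions):
    """Return {ocr: set_of_genes} for OCRs overlapping a promoter-interacting region.

    ocrs: list of (chrom, start, end).
    interactions: list of dicts with hypothetical keys 'gene', 'chrom',
    'oe_start', 'oe_end' describing the other end (PIR) of cis interactions.
    """
    cres = {}
    for ocr in ocrs:
        chrom, start, end = ocr
        for it in interactions:
            if it["chrom"] == chrom and overlaps(start, end, it["oe_start"], it["oe_end"]):
                cres.setdefault(ocr, set()).add(it["gene"])
    return cres


def genes_per_cre_summary(cres, hub_threshold=10):
    """Count cREs contacting a single gene versus more than `hub_threshold` genes."""
    single = sum(1 for genes in cres.values() if len(genes) == 1)
    hubs = sum(1 for genes in cres.values() if len(genes) > hub_threshold)
    return single, hubs
```

OCRs that touch no PIR simply do not appear in the result, which corresponds to the "non-cRE OCRs" compared against cREs in Figure 3A.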
Previous studies have shown that PIRs largely coincide with open chromatin regions and histone marks enriched at enhancers26–29 (Supplemental Figure 4C). Compared with OCRs not associated with any gene, cREs were significantly more accessible in both NPCs and neurons (two-sided Wilcoxon rank-sum test, P < 2×10−16; Figure 3A). We further examined the global chromatin signatures of our cREs by leveraging chromatin states inferred by ChromHMM in hESC-derived neurons and NPCs (Supplemental Table 10) from the Roadmap Epigenomics Project47. cREs from both NPCs and neurons were specifically enriched for enhancer and transcription start site (TSS) states, and conversely depleted in heterochromatin and repeat regions (Figure 3B). Further comparison with an experimentally validated enhancer database (VISTA) showed significant enrichment of brain-specific enhancers overlapping our cREs (binomial test, P < 0.01; Figure 3C and Supplemental Table 11). By grouping genes according to the number of cREs per gene, we observed a modest positive correlation between the number of cREs and mean gene expression within each group (Supplemental Figure 5D, E; linear regression F-test, NPCs: P = 4.72×10−5, R2 = 0.73; neurons: P = 9.64×10−4, R2 = 0.58), suggesting that cREs act additively on gene transcription. These results indicate that intersecting OCRs with promoter-focused Capture-C data specifically enriches for genomic elements actively engaged in gene regulation.
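The VISTA comparison is a one-tailed binomial test: among enhancers overlapping cREs, the number active in a given tissue is compared with the proportion active across all VISTA enhancers. A minimal sketch of the upper-tail probability follows, computed in log space so it stays finite for large counts; the arguments are generic placeholders, not counts from this study.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """One-tailed P(X >= k) for X ~ Binomial(n, p), computed in log space.

    k: cRE-overlapping enhancers active in the tissue of interest
    n: all cRE-overlapping enhancers
    p: fraction of all VISTA enhancers active in that tissue (background)
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)
```

Working with `lgamma` avoids converting very large binomial coefficients to floats, which would overflow once n reaches roughly a thousand elements.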
Figure 3. Regulatory nature of spatial connections between gene promoters and putative cis-regulatory elements.

A. cREs are enriched for OCRs with higher accessibility. The accessibility of cREs and of OCRs not involved in any promoter interaction (non-cRE OCRs) was measured by log2FPKM independently in NPCs and neurons. A Wilcoxon rank-sum test was performed to test the significance of the accessibility difference (P < 2e-16). B. Chromatin state enrichment of cREs. Fifteen chromatin states are defined by 5 histone marks (H3K27me3, H3K4me3, H3K4me1, H3K36me3, H3K9me3) from ESC-derived NPCs and neurons (left panel). Random genomic regions, all cREs and cREs far from TSSs (nprOCR_cRE) are compared with chromatin states for NPCs (middle panel) and neurons (right panel) independently, with the relative level of overlap enrichment represented by the heatmap scale (darker blue indicates higher enrichment). C. cREs are enriched for experimentally validated brain enhancers. Enhancers overlapping cREs (red) are compared with all enhancers (green) in the VISTA database in terms of tissue expression pattern. The proportion in each tissue is subjected to a one-tailed binomial test for statistical significance (Supplemental Table 10). D. cREs interact with the PAX6 promoter in a cell-type-specific manner. Significant interactions (CHiCAGO score > 5) between cREs and PAX6 are indicated by arcs in NPCs (blue) and neurons (red). Interactions with enhancers (red box) are lost, while interactions with Polycomb-repressed chromatin (blue box) are gained, during neuronal differentiation. E. Enhancers interacting with PAX6 show activity in the forebrain. LacZ staining of mouse embryos was obtained from VISTA. F. Gene expression change correlates positively with the change in the proportion of enhancer features and negatively with the change in the proportion of silencer features. The calculation of enhancer and silencer feature proportions is illustrated in Supplemental Figure 7. Blue numbers indicate the number of genes per group.
Linear regression was performed between the log2 expression fold change and the per-gene difference in feature proportion (enhancer: P = 2.18×10−11, beta = 0.374, R2 = 3.75×10−3; silencer: P = 0.08, beta = −0.169, R2 = 1.73×10−4). Boxplots indicate median, Q1, Q3, Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.
Long-range interacting cis-regulatory elements contribute to gene transcriptional changes during neuronal differentiation
Among the 359,490 unique gene-cRE interactions observed, 31% were shared between neurons and NPCs (Supplemental Figure 5F). To understand how long-range interactions between gene promoters and cREs contribute to transcriptional changes during differentiation, we focused on genes differentially expressed between the two cell types and characterized how the number and chromatin states of the cis-regulatory elements interacting with those genes differed.
For example, PAX6 is a transcription factor that exerts high-level control of cortical development and promotes neurogenesis in neuronal stem cells48–50. In our iPSC-derived cells, PAX6 expression was >4-fold higher in NPCs than in neurons (Figure 1C, Supplemental Table 1), consistent with its regulatory role in neuronal stem cells; however, when examining the 1D chromatin state and promoter accessibility, we did not observe significant differences between NPCs and neurons (Figure 2E). By incorporating the 3D chromatin interaction structure for PAX6, we detected 62 and 25 cRE interactions in NPCs and neurons, respectively (Figure 3D, Supplemental Tables 8 and 9). Twenty cREs were shared between NPCs and neurons, while the remaining 70% of interacting cREs were specific to one cell type. Notably, the cREs that lost interactions with PAX6 during neuronal differentiation coincided with enhancer chromatin states (Figure 3D). Two known enhancers from the VISTA enhancer database51, hs855 and hs1082, were physically contacted by the PAX6 promoter and show enhancer activity in the forebrain, where cortical neurons are located (Figure 3E). In contrast, the cREs that gained interactions with PAX6 (~0.6 Mb downstream of the TSS) when NPCs differentiated into neurons were mainly enriched in Polycomb-repressed chromatin (Figure 3D), which is known to downregulate spatially proximal genes via the formation of heterochromatic regions52. Our data therefore suggest that PAX6 transcription is regulated, at least in part, through changes in long-range chromatin interactions.
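The sharing statistics in this section (31% of the 359,490 unique gene-cRE pairs shared; about 70% of PAX6-interacting cREs cell type specific) are set operations over interaction lists. As a check on the PAX6 numbers, 62 + 25 − 20 = 67 unique cREs, of which 47 (42 NPC-only plus 5 neuron-only), about 70%, are specific to one cell type. A minimal sketch follows, treating each interaction as a hashable identifier such as a (gene, cRE) pair; the cRE identifiers below are placeholders sized to match the PAX6 counts.

```python
def interaction_sharing(npc_pairs, neuron_pairs):
    """Summarize overlap of gene-cRE interaction sets between two cell types."""
    npc, neuron = set(npc_pairs), set(neuron_pairs)
    union = npc | neuron
    shared = npc & neuron
    return {
        "unique": len(union),
        "shared": len(shared),
        "npc_only": len(npc - neuron),
        "neuron_only": len(neuron - npc),
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


# Placeholder cRE IDs sized to match the PAX6 counts in the text:
# 62 NPC cREs, 25 neuron cREs, 20 shared.
npc_cres = {f"cRE{i}" for i in range(0, 62)}
neuron_cres = {f"cRE{i}" for i in range(42, 67)}
summary = interaction_sharing(npc_cres, neuron_cres)
print(summary["unique"], round(1 - summary["shared_fraction"], 2))  # 67 0.7
```

Note that the shared fraction is taken over the union of both cell types, which matches how the 31% figure is phrased ("among the unique gene-cRE interactions").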
In addition to stem-cell-specific genes, we observed a similar pattern for the neuron-upregulated gene NEFL, which encodes the neurofilament light chain, a major component of the intermediate filaments found in neurons. An enhancer-like cRE ~500 kb downstream of NEFL gained an interaction with its promoter when NPCs were differentiated into neurons (Supplemental Figure 6A). Furthermore, we observed increased chromatin accessibility in neurons at interacting cREs ~50 kb downstream of the NEFL promoter (Supplemental Figure 6B).
For a more global evaluation of the relationship between differences in cRE number and changes in gene expression during differentiation, we first focused on genes contacted by at least one active enhancer-like cRE in either cell type. The number of interactions between promoters and active enhancers was generally positively correlated with gene expression change (Supplemental Figure 5G; linear regression P = 5.16×10−25, beta = 0.04, R2 = 9.22×10−3), suggesting that promoter interactions with active enhancers have additive effects on cell-type-specific expression levels. To further explore the impact of different chromatin features on gene regulation, we calculated the proportion of active enhancer and silencer regions per gene across all interacting cREs in either cell type (Supplemental Figure 7). Specifically, we modified the ChromDiff53 epigenomic feature definition: instead of integrating chromatin states over the gene body, we calculated the proportion of each chromatin state over the entire length of all interacting cREs per gene, and merged chromatin states into two “super-categorical” features, “enhancer” (Enh and EnhG) and “silencer” (ReprPCWk and ReprPC). As with enhancer-like cRE number, the change in the proportion of enhancer features between neurons and NPCs was positively correlated with gene expression change (Figure 3F; linear regression P = 2.18×10−11, beta = 0.374, R2 = 3.75×10−3). We also observed a weaker negative association between the change in the proportion of silencer features and gene expression change (Figure 3F; linear regression P = 0.08, beta = −0.169, R2 = 1.73×10−4). Genes with the most dramatic decreases in expression mainly gained interactions with silencer features. These data provide global evidence that combining 3D chromatin structure changes with chromatin state dynamics better explains gene expression changes during neuronal differentiation than promoter accessibility alone.
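The modified ChromDiff-style feature described above is a length-weighted proportion: the base pairs of a gene's interacting cREs falling in enhancer or silencer states, divided by the total interacting-cRE length, compared between cell types and then regressed against expression change. A minimal sketch follows, assuming cREs have already been intersected with ChromHMM states into hypothetical `(length_bp, state)` segments; the least-squares helper is a plain illustration, not the regression software used in the study.

```python
ENHANCER_STATES = {"Enh", "EnhG"}
SILENCER_STATES = {"ReprPC", "ReprPCWk"}


def feature_proportions(segments):
    """Fraction of total interacting-cRE length in enhancer and silencer states.

    segments: list of (length_bp, chromatin_state) pieces covering all cREs
    that interact with one gene in one cell type.
    """
    total = sum(length for length, _ in segments)
    if total == 0:
        return 0.0, 0.0
    enh = sum(length for length, state in segments if state in ENHANCER_STATES)
    sil = sum(length for length, state in segments if state in SILENCER_STATES)
    return enh / total, sil / total


def proportion_change(neuron_segments, npc_segments):
    """(enhancer change, silencer change), neurons minus NPCs."""
    enh_n, sil_n = feature_proportions(neuron_segments)
    enh_p, sil_p = feature_proportions(npc_segments)
    return enh_n - enh_p, sil_n - sil_p


def ols(x, y):
    """Simple least-squares fit y = a + b*x; returns (slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, 1.0 - ss_res / ss_tot
```

Across thousands of genes, a small slope can be highly significant while R2 remains tiny, which is the pattern reported above (for example, P = 2.18×10−11 with R2 = 3.75×10−3): the association is consistent but explains little of the total variance.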
Transcription factor binding changes are associated with gene expression dynamics during neuronal development
Gene regulation involves both cis- and trans-regulatory elements. The 3D structure and state of chromatin represent two aspects of cis-regulatory architecture, and are closely linked to trans-acting factors such as transcription factors (TFs), which bring distal chromatin together (chromatin structure) and modify histone and/or DNA methylation and acetylation status (chromatin state)54,55. To identify TFs involved in neuronal development, we first scanned our open chromatin landscape using PIQ56 to determine putative genomic binding sites for 657 TFs from the JASPAR2020 TF binding database44. In NPCs and neurons, 382 and 427 expressed TFs (TPM > 1), respectively, had at least one high-confidence binding site (PIQ purity score > 0.8) within either a prOCR or a cRE interacting with a gene promoter (termed here “gene-annotated open regions”).
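The TF selection above combines three criteria: expression (TPM > 1), footprint confidence (PIQ purity score > 0.8) and location within a gene-annotated open region. A minimal sketch follows; PIQ's actual output format is not reproduced, so the `(tf, chrom, position, purity)` tuples, the single-coordinate site positions and the region dictionary are all hypothetical.

```python
def candidate_tfs(sites, tf_tpm, regions, min_tpm=1.0, min_purity=0.8):
    """TFs expressed above `min_tpm` with >= 1 high-confidence site in a region.

    sites: list of (tf, chrom, position, purity) footprint calls (assumed format).
    tf_tpm: {tf: TPM in this cell type}.
    regions: {chrom: [(start, end), ...]} gene-annotated open regions (half-open).
    """
    selected = set()
    for tf, chrom, pos, purity in sites:
        # Both cutoffs are strict, matching "TPM > 1" and "purity > 0.8".
        if purity <= min_purity or tf_tpm.get(tf, 0.0) <= min_tpm:
            continue
        if any(start <= pos < end for start, end in regions.get(chrom, [])):
            selected.add(tf)
    return selected
```

Running the same function with NPC and neuron inputs separately would yield per-cell-type candidate sets analogous to the 382 and 427 TFs reported above.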
To evaluate TF motif enrichment at gene-annotated open regions in each cell type, we performed a Bias-free Footprint Enrichment Test (BiFET)58, which accounts for TF binding footprints, read depth and motif GC content to predict enriched TFs without these biases (see Methods). In total, 77 and 124 TFs were significantly enriched in neurons and NPCs (FDR < 0.05, TPM > 1), respectively (Supplemental Table 12), with 72 TFs shared across both cell types (Figure 4A). In line with previous reports28,32,40, CTCF was among the top enriched TFs in both neurons and NPCs, consistent with its role in mediating long-range genomic interactions (Figure 4A). Besides CTCF, we also observed a number of neuronal development-related TFs among the enriched TFs in both cell types (Figure 4A). ZIC3, previously found to be enriched in excitatory and motor neurons28, and NRF1, identified as a neuron-enriched de novo motif59, were both enriched in our iPSC-derived cortical neurons and NPCs. NPC regulatory TFs, such as E2F160, TCF361 and MYCN62, whose primary functions are to maintain neuronal stem cells and control cell proliferation, decreased in expression during neuronal differentiation; however, we found that some of these TF motifs were enriched not only in NPCs but also in neurons, indicating that they may play additional roles in neuronal differentiation. For example, GO pathway analysis of genes specifically targeted by MYCN in neurons suggests that it may also play a role in neuronal differentiation processes such as neurite morphology and synaptogenesis (Figure 4B). Conversely, motifs for TFAP2C63, ETV564 and KLF665, which are known as critical regulators of neuronal differentiation and are upregulated in iPSC-derived neurons compared with NPCs, were also enriched in both NPCs and neurons.
This footprinting enrichment analysis suggests that some TFs previously thought to be NPC- or neuron-specific based on gene expression alone may have additional functions in neural differentiation outside their previously defined stage, by binding to different downstream genes.
Figure 4. Transcription factor changes during neuronal differentiation.

A. Enriched TFs at gene annotated open regions and their differential expression between NPCs and neurons. The relative expression of TFs significantly enriched in gene annotated open regions is plotted as a heatmap, and the names of significantly differentially expressed genes are colored (up-regulation: red; down-regulation: blue). Enrichment was ranked independently in each cell type by BiFET enrichment FDR (Supplemental Table 11). TFs were clustered and labelled with different colors according to their TF family. B. Pathway enrichment of genes targeted by the transcription factor MYCN. Genes targeted by MYCN were defined as genes with either promoter-connected cREs or TSS-proximal prOCRs containing MYCN binding sites in NPCs (blue) or neurons (red). Hypergeometric enrichment was performed by overlapping cell-type-specific genes with the MSigDB GO Biological Process ontology collection. C. Differences in TF binding activity between NPCs and neurons. Motif score represents the average binding frequency of a TF in a given cell type (see Methods). Cell-type-enriched TFs are colored (green: both cell types; blue: NPCs only; red: neurons only). A TF is defined as showing cell-type-specific binding if it is enriched in a given cell type, with a motif score > 0.8 in that cell type but < 0.8 in the other.
To further distinguish TF binding activity between NPCs and neurons, we calculated motif scores to evaluate the global binding frequency of specific TF motifs in each cell type66. Although most motifs produced similar scores in NPCs and neurons, the motif scores of TFs enriched in at least one cell type shifted towards neurons (Figure 4C). HEY1 and MYC showed higher TF binding activity in neurons, corresponding to their elevated expression during differentiation. Conversely, ETS1 and ZNF423, which were enriched only in NPCs, showed higher TF binding activity in NPCs. Both TFs have previously been found to regulate NPCs: ETS1 regulates radial glia formation during vertebrate embryogenesis67, and Zfp423/ZNF423 regulates cell cycle progression in Purkinje neuron progenitors68. These data suggest that, in addition to TF expression, differences in TF binding activity contribute to gene expression changes during neuronal differentiation.
Variant to gene mapping for GWAS signals associated with neurodevelopmental disorders
One established application of chromatin accessibility and promoter interaction maps is to prioritize variants and assign them to their corresponding target genes. Assuming that neuronal involvement is important for brain development, we integrated our data from iPSC-derived neurons and NPCs with GWAS findings to annotate 380 unique lead sentinel variants from seven neurodevelopmental disorders (Supplemental Table 13): attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), bipolar disorder (BD), schizophrenia (SCZ), intellectual disability (ID), obsessive compulsive disorder (OCD) and anti-social behavior (ATS). ADHD-, ID-, BD- and SCZ-associated variants were significantly enriched at gene annotated open regions in both NPCs and neurons, whereas OCD-associated variants were enriched only in NPCs (LDSC enrichment P < 0.05; Figure 5A). ASD and ATS SNPs were enriched in neither cell type, suggesting either the involvement of other cell types in these disease etiologies or that GWAS efforts to date have been relatively underpowered for these disorders. As negative controls, we also performed heritability enrichment analysis on several non-neuronal traits with comparable GWAS sample sizes (eczema, allergy and inflammatory bowel disease); no enrichment was found in either NPCs or neurons (Figure 5A). Taken together, these results indicate that gene annotated open regions in iPSC-derived NPCs and neurons are specifically enriched for the genetic heritability of a number of neurodevelopmental disorders and provide a reliable resource for studying chromatin-mediated gene regulation in these diseases.
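Partitioned LD score regression reports enrichment as the proportion of SNP heritability attributable to an annotation divided by the proportion of SNPs in that annotation, with uncertainty estimated by a block jackknife over the genome. The sketch below illustrates only those two quantities on hypothetical inputs; it does not reimplement the LDSC regression itself, the function names are our own, and LDSC's reported enrichment P value comes from its own jackknife-based test rather than from this snippet.

```python
import math
from typing import Sequence


def h2_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Fold enrichment: share of SNP heritability explained by an annotation
    divided by the share of SNPs falling in it (1.0 means no enrichment)."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


def jackknife_se(leave_one_out: Sequence[float]) -> float:
    """Delete-one block jackknife standard error, given the estimate
    recomputed with each genomic block left out in turn."""
    n = len(leave_one_out)
    if n < 2:
        raise ValueError("need at least two jackknife blocks")
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out))


# Hypothetical numbers for illustration only (not values from this study):
# an annotation covering 3% of SNPs that explains 12% of SNP heritability.
print(h2_enrichment(0.12, 0.03))
```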
Figure 5. Genetic analysis of chromatin interactions with neurodevelopmental disorder-associated variants.

A. LD score regression enrichment for seven neurodevelopmental disorders. The color and size of each square represent the enrichment P value and the fold enrichment relative to baseline, respectively. Three non-brain traits (ECZ, eczema; ALG, allergy; IBD, inflammatory bowel disease) were used as negative controls. B. Number of causal variant-effector gene pairs in NPCs and neurons. A causal variant-effector gene pair is called when the causal variant is in high LD (r2 > 0.8) with the sentinel variant and resides in a cRE (FPKM > 1) that significantly interacts (CHiCAGO score > 5) with an expressed gene (TPM > 1) in the given cell type. C. Cell-type-specific interactions at the SEH1L locus. Three open proxies (LD r2 > 0.8) are identified as causal variants for the SEH1L locus (rs3809912). Significant interactions are labeled blue in the heatmap. D. Reactome pathway enrichment of genes implicated by neurodevelopmental disorder SNPs. Blue bars indicate enrichment significance (−log10 FDR); orange dots indicate the percentage of genes in each pathway that overlap SNP-implicated genes. E. The NCAN promoter interacts with a cRE containing a schizophrenia SNP specifically in NPCs. Significant interactions are indicated by arcs in NPCs (blue) and neurons (red). A cRE ~50 kb upstream of the NCAN promoter contains the causal variant rs10419245 (vertical dashed line). F. Read count support for the interaction between NCAN and cREs. The geometric mean of the read counts across all replicates is plotted as dots, with significant interactions labeled blue. The significant interaction between the NCAN bait and the rs10419245-containing fragment is highlighted with a red triangle. The expected level of Brownian collision background (solid line) and the upper limit of its 95% confidence interval (dashed line) are plotted. G. Disruption of a GLIS2 binding site by rs10419245 in the cRE contacting the NCAN promoter. The information content matrix (ICM) of the GLIS2 binding site nucleotide sequence is plotted. The square highlights the disrupted sequence, with the reference nucleotide sequence at the top and the alternative nucleotide sequence at the bottom.
To further identify the genes that directly interact with, and are potentially regulated by, cREs harboring disease-associated variants, we expanded our variant assessment to include all variants in 95% credible sets, as well as proxy variants in high linkage disequilibrium (LD, r2 > 0.8) with the lead sentinel SNPs from the corresponding GWAS reports. Among 21,654 variants (15,339 SNPs in 95% credible sets and 12,639 proxies, with some variants in both categories) from all examined neurodevelopmental disorders, 454 variants (307 SNPs in 95% credible sets and 258 proxies) were located in previously identified cREs and contacted at least one expressed gene in at least one cell type. These 454 putative causal variants corresponded to 111 original signals, accounting for 29% of all loci reported to date across the seven neurodevelopmental disorders, and physically contacted a total of 600 annotated genes (Supplemental Table 14). On average, each locus yielded 4-5 putative causal variants and contacted 6-7 candidate effector genes. Of these variant-gene connections, 40% were shared between NPCs and neurons, while 23% were specific to neurons (Figure 5B). For example, a 5′ UTR variant, rs3809912 (at the SEH1L locus), had three open proxies (rs11663049, rs1787000 and rs3809912 itself) contacting eight candidate effector genes (AFG3L2, CEP76, PSMG2, SEH1L, LDLRAD4, AP005482.1, PTPN2 and SPIRE1). The contacts between the non-coding regions harboring these variants and their putative effector genes differed between NPCs and neurons (Figure 5C): SPIRE1, LDLRAD4 and AP005482.1 were identified as candidate effector genes only in neurons, while the promoters of SEH1L and AFG3L2 contacted proxy-harboring regions only in NPCs. These findings suggest that a single locus can regulate different effector genes in a cell-type- and developmental-stage-specific manner.
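The filtering rules described above (95% credible-set membership or LD proxy with r2 > 0.8, location in a cRE, CHiCAGO score > 5 to the promoter of a gene with TPM > 1) can be sketched as a join between variants and promoter contacts. This is a minimal illustration under assumed inputs: the `Variant` and `CREContact` records are hypothetical structures, contacts are assumed to be already restricted to OCR-overlapping PIRs (cREs), and a real pipeline would use an interval index such as bedtools rather than a nested loop.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Set, Tuple


@dataclass(frozen=True)
class Variant:                   # hypothetical record layout
    rsid: str
    chrom: str
    pos: int                     # 1-based genomic position
    in_credible_set: bool        # member of a 95% credible set
    r2_with_sentinel: float      # LD with the lead SNP (0.0 if not a proxy)


@dataclass(frozen=True)
class CREContact:                # hypothetical record layout
    chrom: str
    start: int                   # 0-based, half-open cRE interval
    end: int
    gene: str                    # gene whose baited promoter contacts the cRE
    chicago_score: float


def map_variants_to_genes(variants: Iterable[Variant],
                          contacts: Iterable[CREContact],
                          tpm: Mapping[str, float],
                          min_r2: float = 0.8,
                          min_score: float = 5.0,
                          min_tpm: float = 1.0) -> Set[Tuple[str, str]]:
    """Return (rsid, gene) pairs passing the variant, contact and expression filters."""
    kept = [c for c in contacts
            if c.chicago_score > min_score and tpm.get(c.gene, 0.0) > min_tpm]
    pairs = set()
    for v in variants:
        # Keep credible-set members and high-LD proxies only.
        if not (v.in_credible_set or v.r2_with_sentinel > min_r2):
            continue
        for c in kept:
            # A 1-based position p lies in the 0-based [start, end) interval
            # exactly when start < p <= end.
            if c.chrom == v.chrom and c.start < v.pos <= c.end:
                pairs.add((v.rsid, c.gene))
    return pairs
```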
To further understand the functions of these implicated effector genes, we focused on the 477 protein-coding genes among them; pathway analyses showed that they were significantly enriched for developmental biology (FDR = 2.05e-10), nervous system development (FDR = 9.33e-09), chromatin modifying enzymes (FDR = 4.17e-07) and signaling by ROBO receptors (FDR = 8.15e-05; Figure 5D, Supplemental Table 15). Thirty-seven implicated protein-coding genes were shared by at least two disorders. Among these, FOXP169, SATB270 and ALDOA71 are involved in neurogenesis, while C12orf6572, SH2B173 and POU3F274 have previously been implicated in neuropsychiatric disorders. In addition, 27% (160) of the identified genes were differentially expressed between NPCs and neurons, a significant enrichment relative to the 22.9% of genes differentially expressed genome-wide (one-sided hypergeometric test, P = 1.35e-2). Furthermore, 31% (141) of the variants resided in open TF binding sites contacted by promoters, and the alleles of 43 of these variants were predicted by motifbreakR22 to have a strong impact on binding affinity (Supplemental Table 16).
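The one-sided hypergeometric test above asks whether finding 160 differentially expressed genes among the 600 implicated genes exceeds what the genome-wide DEG rate would predict. A minimal exact version using only the Python standard library is sketched below. The size of the background gene universe is not stated in this section, so the numbers in the usage line are hypothetical placeholders, and the original analysis may have used a different background.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background universe
    successes:  background genes that are DEGs
    draws:      implicated (effector) genes tested
    k:          implicated genes that are DEGs
    """
    lo = max(k, 0)
    hi = min(successes, draws)
    # Exact integer arithmetic; Python's int / int true division copes with
    # the very large binomial coefficients involved.
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lo, hi + 1)
    )
    return numerator / comb(population, draws)


# Hypothetical background of 20,000 genes with 22.9% DEGs (placeholder values).
background = 20_000
p = hypergeom_upper_tail(k=160, population=background,
                         successes=round(0.229 * background), draws=600)
print(p)
```

With SciPy available, the same tail probability is `scipy.stats.hypergeom.sf(k - 1, population, successes, draws)`; the standard-library version is shown only to keep the sketch dependency-free.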
For example, rs10419245 is a proxy (r2 = 1) for a schizophrenia-associated sentinel intergenic variant (rs2905432) situated between GATAD2A and MAU2, but we found that it contacted the promoter of NCAN (Figure 5E, F), which encodes neurocan, a protein known to be involved in modulating neuronal adhesion and neurite growth during development. This interaction was specific to NPCs, in which NCAN expression was significantly lower than in neurons (Supplemental Table 14). The location of rs10419245 close to a repressor complex chromatin state in NPCs (Figure 5E) suggests that this interaction may have a suppressive effect on NCAN expression in NPCs. Further examination of TF binding across the rs10419245-containing OCR revealed that a C2H2-type zinc finger TF, GLIS2, was predicted to bind there and that the allelic change could disrupt its binding affinity (Figure 5G). Taking this evidence together, we speculate that the effect allele of rs10419245 may trigger premature expression of NCAN in NPCs by reducing the suppressive binding of the transcription factor GLIS2, and in turn influence the timing of neuronal development.
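motifbreakR scores reference and alternative alleles against position weight matrices. The idea can be illustrated with a simplified log-odds PWM score computed over every window that overlaps the variant, on both strands. The motif counts below are a made-up 4-bp example, not the GLIS2 or ZFX JASPAR matrices, and the uniform background and pseudocount are assumptions; motifbreakR's own scoring methods and thresholds differ in detail.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, float]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts to log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0.0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def window_score(window: str, pwm: List[Dict[str, float]]) -> float:
    """Best of forward and reverse-complement scores (uppercase ACGT only)."""
    rc = window.translate(COMPLEMENT)[::-1]
    fwd = sum(pwm[i][window[i]] for i in range(len(pwm)))
    rev = sum(pwm[i][rc[i]] for i in range(len(pwm)))
    return max(fwd, rev)


def best_score_over(seq: str, var_idx: int, pwm: List[Dict[str, float]]) -> float:
    """Maximum PWM score among windows that contain position var_idx."""
    w = len(pwm)
    starts = range(max(0, var_idx - w + 1), min(var_idx, len(seq) - w) + 1)
    return max((window_score(seq[s:s + w], pwm) for s in starts), default=-math.inf)


def allele_effect(seq: str, var_idx: int, alt: str,
                  pwm: List[Dict[str, float]]) -> Tuple[float, float, float]:
    """Return (ref score, alt score, alt - ref); a negative delta means weaker predicted binding."""
    alt_seq = seq[:var_idx] + alt + seq[var_idx + 1:]
    ref = best_score_over(seq, var_idx, pwm)
    alt_score = best_score_over(alt_seq, var_idx, pwm)
    return ref, alt_score, alt_score - ref


# Hypothetical motif strongly preferring "GGGA" (illustrative counts only).
motif = [{"G": 10}, {"G": 10}, {"G": 10}, {"A": 10}]
pwm = log_odds_pwm(motif)
print(allele_effect("TTGGGATT", 3, "T", pwm))
```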
Another example is rs76324150, a proxy for the intelligence-associated locus tagged by rs17563986 (an intronic variant at the MAPT locus). The proxy resides in a region marked by enhancer-associated histone modifications, and the proxy-harboring OCR physically contacts the promoter of FMNL1, which encodes a formin-like protein (Supplemental Figure 8A, B). FMNL1 is upregulated in neurons (Supplemental Figure 8C) and plays an important role in regulating cortical actin filament dynamics and cell morphology. The rs76324150-FMNL1 contact was established specifically in neurons and not in NPCs, consistent with the increased accessibility of this region in neurons (Supplemental Figure 8D, E). A further search for associated TFs revealed that rs76324150 resides within a predicted binding site for the transcription factors KLF12 and ZFX (Supplemental Figure 8F). The allelic change is predicted to disrupt the binding of both TFs, with a particularly strong effect on ZFX. This evidence suggests that, in neuronal cells carrying the alternative allele, disrupted ZFX binding at this positive regulatory enhancer may reduce FMNL1 expression, which could in turn impair neuronal development.
DISCUSSION
Interpreting non-coding variants associated through GWAS efforts has been a considerable challenge when translating such loci into molecular insights into disease etiology. In this paper, we generated high-resolution maps of promoter interactions in human iPSC-derived NPCs and neurons, and connected neurodevelopmental disorder-associated signals to putative causal effector genes. We demonstrate that the implicated disease-associated genes are significantly enriched for neuronal development and chromatin organization, suggesting that chromatin 3D structure is functionally relevant to neurodevelopmental disease biology. By integrating genome-wide chromatin accessibility atlases and epigenetic histone modification maps, we were able to estimate the regulatory consequences of SNP-harboring regulatory elements on target gene expression, and we propose potential molecular mechanisms by which allelic changes may disrupt normal neuronal cell development. This comprehensive analysis provides a rich resource for interpreting GWAS signals in neurodevelopmental disorder studies and implicates potential disease-relevant genes and pathways for further experimental investigation.
In the post-GWAS era, many fine-mapping approaches have been developed to implicate causal variants and effector genes75. Here we pre-filtered variants using two approaches, high-LD proxies (r2 > 0.8) and 95% credible sets, before overlapping variants with regulatory elements and mapping causal variants to their corresponding effector genes using a 3D genomic approach. On the one hand, the high-LD proxy approach is highly dependent on the population reference panel and can therefore introduce bias into proxy calculations. On the other hand, although credible sets statistically prioritize a set of plausible causal variants, credible set analyses require full summary statistics and typically assume that a single causal variant drives a given association. Both approaches have been widely used in similar variant-to-gene mapping publications27,28,40,59. In our analysis, the two approaches yielded causal variant and effector gene sets with low overlap, but these sets were enriched for similar pathways, such as neuron development and chromatin modification. This suggests that, within the constraints of ATAC-seq and Capture-C data, fine-mapping could benefit from combining both variant filtering approaches to curate a more complete list of causal variant-effector gene pairs.
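For intuition on the credible-set side of this comparison, a 95% credible set under the single-causal-variant assumption mentioned above can be built from Wakefield approximate Bayes factors: convert each variant's z-score and standard error to a Bayes factor, normalize to posterior probabilities, and add variants in decreasing order of probability until 95% of the posterior mass is covered. This is a generic sketch, not the procedure used by the original GWAS reports; the prior standard deviation of 0.2 on the effect size is a common but assumed default.

```python
import math
from typing import List, Sequence, Tuple


def wakefield_log_abf(z: float, se: float, prior_sd: float = 0.2) -> float:
    """Natural-log approximate Bayes factor (association vs. null), Wakefield (2009)."""
    v = se ** 2                  # sampling variance of the effect estimate
    w = prior_sd ** 2            # prior variance of the true effect (assumed)
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def credible_set(z: Sequence[float], se: Sequence[float], coverage: float = 0.95,
                 prior_sd: float = 0.2) -> Tuple[List[int], List[float]]:
    """Indices of the smallest set reaching `coverage` posterior mass, plus all PPs."""
    log_abf = [wakefield_log_abf(zi, si, prior_sd) for zi, si in zip(z, se)]
    top = max(log_abf)           # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in log_abf]
    total = sum(weights)
    pp = [wt / total for wt in weights]   # single causal variant: PPs sum to 1
    selected, cumulative = [], 0.0
    for i in sorted(range(len(pp)), key=pp.__getitem__, reverse=True):
        selected.append(i)
        cumulative += pp[i]
        if cumulative >= coverage:
            break
    return selected, pp


# Hypothetical z-scores and standard errors for three variants at one locus.
print(credible_set([5.2, 4.9, 1.0], [0.03, 0.03, 0.03]))
```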
Neuronal development in the brain involves the transcriptional regulation of progenitors into a large number of molecularly and functionally diverse classes of neurons. Similar to previous work investigating the global reorganization of cRE connections during the lineage commitment of NPCs from embryonic stem cells32, we simplified the developmental model using iPSC-derived NPCs and neurons, and validated the cells with conventional molecular markers and transcriptomic profiles, supporting the validity of this in vitro model for representing neurons. Using this model, we characterized the chromatin reorganization that accompanies the loss of stemness during differentiation of iPSC-derived NPCs. However, we are aware that cultures of iPSC-derived neurons can be heterogeneous owing to donor genetic variation, cell culture protocols and neuronal maturity. Currently, detecting chromatin conformation changes at single-cell resolution is very challenging because of the difficulty of achieving the sequencing depth necessary to detect chromatin interactions. We therefore focused on differential analysis of chromatin conformation, accessibility and gene expression between two time points of neuronal differentiation. Future single-cell studies may address the role of chromatin contacts in regulating cell diversity in the nervous system.
In addition to evaluating global chromatin remodeling, we also highlight that developmental-stage-specific gene regulation is vulnerable to disease pathogenesis. We observed that 60% of variant-to-gene contacts were cell-type-specific, and found that differentially expressed genes were enriched among the implicated disease-associated genes. The high-resolution promoter-focused Capture-C approach increased specificity for the captured promoter regions, which in turn led to better annotation of interacting genes and higher-resolution interaction maps, and thus increased the power to distinguish chromatin interaction differences between cell types. However, we note the caveat that our promoter-focused Capture-C technique could generate false positives and false negatives. To mitigate issues arising from arbitrary CHiCAGO score cutoffs, we complemented the interaction map with raw read counts for direct visualization when claiming cell-type-specific interactions. Ultimately, however, the community will need additional statistical approaches for quantifying differential chromosomal interactions between high-resolution maps from different cell types76.
Although chromatin conformation plays a critical role in bringing regions harboring disease-associated variants into proximity with distal target effector genes, interpreting the consequences of these interactions for gene expression requires much more exploration. In our study, we observed that a given contact between a cis-regulatory element and a gene can result in entirely different gene regulatory outcomes, depending on factors such as which TF is bound and which epigenomic chromatin state the cis-regulatory element resides in. The formation of contacts as NPCs differentiate into neurons does not always produce an enhancer-promoter interaction that up-regulates gene expression; the establishment of long-range interactions may also contribute to gene silencing, as in the example of PAX6 in neurons (Figure 3D). Similarly, the loss of an interaction during differentiation does not always result in down-regulation of gene expression (Figure 5E). Although we partially inferred the regulatory consequences of these interactions using histone modification data, we anticipate that other mechanisms (such as TF cofactor binding, TF transcriptional activity and DNA methylation) also contribute to gene regulation. The causal relationship between these modifications and chromatin conformation changes in the context of neuronal cell development remains unclear and requires further investigation.
CONCLUSIONS
Overall, our systematic approach of leveraging three-dimensional genomic data to map variants to target genes enabled us to identify putative processes that are perturbed during neuronal development in the context of disease. Converging evidence from transcriptomic, epigenomic and chromatin conformation experiments supports the notion that disease-associated variants can alter gene expression by perturbing regulatory chromatin contacts that are present in normal states, which can result in abnormalities in neuronal cell proliferation, differentiation and migration.
METHODS
Cell culture
Frozen NPCs from donors CHOPWT10 and CHOPWT14 were obtained from the CHOP Stem Cell Core and thawed slowly in a 37°C water bath. The thawed cells were gently washed in Neuronal Expansion Media (49% Neurobasal Media (ThermoFisher, cat# 21103049), 49% Advanced DMEM/F12 (ThermoFisher, cat# 12634010) and 2% 50X Neuronal Induction Supplement) in a 15 ml conical tube, followed by centrifugation at 300 × g for 5 min. Cells were resuspended in 1 ml pre-warmed Neuronal Expansion Media containing ROCK inhibitor (Y-27632 compound, Stem Cell Technologies, cat# 72304) at a final concentration of 10 μM and counted. NPCs were seeded at a density of 150k cells/cm2 onto hESC-qualified Matrigel-coated plates (Corning, cat# 354277) in 2.5 ml/well Neuronal Expansion Media and cultured at 37°C in a humidified cell culture incubator with 5% CO2. The following day, the medium was changed to remove the Y-27632 compound. NPCs were expanded for 6-7 days in 2.5 ml Neuronal Expansion Media, exchanged every 48 hours, before harvesting.
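The seeding step above (150k cells/cm2, after counting the resuspended cells) reduces to simple arithmetic. The helper below is a hypothetical convenience sketch; the well surface area must be taken from the plate manufacturer's specifications, and the value of about 9.5 cm2 per well of a standard 6-well plate is assumed here only for illustration, as is the cell concentration.

```python
def seeding_plan(density_per_cm2: float, well_area_cm2: float, n_wells: int,
                 cells_per_ml: float) -> tuple:
    """Return (total cells needed, volume of counted suspension to take, in ml)."""
    if cells_per_ml <= 0:
        raise ValueError("cell concentration must be positive")
    total_cells = density_per_cm2 * well_area_cm2 * n_wells
    return total_cells, total_cells / cells_per_ml


# 150,000 cells/cm2 comes from the protocol; well area and concentration are assumptions.
cells, volume_ml = seeding_plan(150_000, well_area_cm2=9.5, n_wells=6, cells_per_ml=4.0e6)
print(f"{cells:.3g} cells, {volume_ml:.2f} ml of suspension")
```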
To differentiate NPCs into cortical neurons, we followed the protocol of Jones et al. (2017), starting by seeding expanded NPCs onto plates coated with hESC-qualified Matrigel. The NPCs were supplemented with a 1:1 mixture of N2:B27 media (250 ml N2 media: 240 ml DMEM/F12 with GlutaMAX (ThermoFisher, cat#10565-018), 2.5 ml 100X N2 supplement (ThermoFisher, cat#17502048), 2.5 ml 200 mM L-glutamine (Corning, cat#MT25-005-CI), 2.5 ml 100X MEM Non-Essential Amino Acids solution (ThermoFisher, cat#11140076), 2.5 μl 14.3 M beta-mercaptoethanol (Sigma-Aldrich, cat#M3148), 2.5 ml 100X Antibiotic-Antimycotic (ThermoFisher, cat#152440112), 0.25 ml 100 mg/ml insulin (Sigma-Aldrich, cat#I0516); 250 ml B27 media: 240 ml Neurobasal media (ThermoFisher, cat# 21103049), 5 ml 50X B27 supplement (ThermoFisher, cat#17504044)). The terminally differentiated neurons were maintained in N2:B27 medium for 30-40 days, with half of the medium replaced every other day, before harvesting.
PCR and Immunofluorescence Microscopy
PCR for selected markers was used to confirm cell identity. Primers were designed using the UCSC Genome Browser database (https://genome.ucsc.edu/) and the free Primer-BLAST design tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) (Supplemental Table 3). Total RNA was extracted using TRIzol Reagent (Invitrogen, Carlsbad, California) and Qiagen RNeasy columns (Qiagen, Valencia, California). Two micrograms of RNA were reverse transcribed into cDNA using SuperScript IV VILO Master Mix following the manufacturer’s instructions (Thermo Fisher Scientific, Carlsbad, California). PCR reactions were carried out using Platinum Hot Start PCR Master Mix (Invitrogen, Carlsbad, California), and samples were run on a 2% agarose gel.
For immunofluorescence, cells were cultured in 35 mm glass-bottom dishes until 50% confluence and fixed with cold 100% methanol for 5 minutes. Cells were then rinsed with cold PBS and incubated in blocking buffer (PBS-T, 1% BSA, 22 mg/ml glycine) for one hour at room temperature (RT). Blocking buffer was replaced by primary antibody diluted in PBS-T/1% BSA. Samples were incubated with primary antibodies (Supplemental Table 3) at 4 °C and washed three times with cold PBS-T. Secondary antibody was added and samples were incubated in the dark for one hour. Cells were then washed as described above and incubated with NucBlue for one minute. Finally, samples were mounted using ProLong Glass Antifade (Molecular Probes, cat#P36930) and imaged using a Zeiss LSM810 confocal microscope. Images were processed using Fiji-ImageJ20.
ATAC-seq library preparation
Five replicates of CHOPWT10 and CHOPWT14 NPCs or CHOPWT10 and CHOPWT14 neurons were harvested using Accutase, washed with DPBS and counted. For each sample, 50,000 cells were pelleted at 550 × g for 5 min at 4°C. The cell pellet was resuspended in 50 μl cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630) and immediately centrifuged at 550 × g for 10 min at 4°C. The nuclei were resuspended on ice in the transposition reaction mix (2x TD Buffer, 2.5 μl Tn5 Transposase and nuclease-free H2O; Illumina Cat#FC-121-1030, Nextera), and the transposition reaction was incubated at 37°C for 45 min. The transposed DNA was then purified using the MinElute Kit (Qiagen) and eluted in 10.5 μl elution buffer. The transposed DNA was converted into libraries using NEBNext High Fidelity 2x PCR Master Mix (NEB) and the Nextera Index Kit (Illumina) by PCR amplification for 12 cycles. The PCR reaction was subsequently cleaned up using AMPureXP beads (Agencourt), checked on a Bioanalyzer 2100 (Agilent) high sensitivity DNA chip (Agilent), and paired-end sequenced on the Illumina NovaSeq 6000 platform (51 bp read length) at the Center for Spatial and Functional Genomics at CHOP.
RNA-seq library preparation
RNA was isolated from each cell type in triplicate using TRIzol Reagent (Invitrogen). RNA was then purified using the Direct-zol RNA Miniprep Kit (Zymo) and depleted of contaminating genomic DNA using DNase I. Purified RNA was checked for quality on the Bioanalyzer 2100 using the Nano RNA Chip, and samples with a RIN above 7 were used for RNA-seq library synthesis. RNA samples were depleted of rRNA using the QIAseq FastSelect RNA Removal Kit and then processed into libraries using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB) according to the manufacturer’s instructions. The quality and quantity of the libraries were measured using the Bioanalyzer 2100 DNA-1000 chip and a Qubit fluorometer (Life Technologies). Completed libraries were pooled and sequenced on the NovaSeq 6000 platform using paired-end 51 bp reads at the Center for Spatial and Functional Genomics at CHOP.
Promoter focused Capture-C library preparation
We used standard methods to generate 3C libraries14. For each library, 10^7 fixed cells were thawed at room temperature, followed by centrifugation at RT for 5 min at 14,000 rpm. The cell pellet was resuspended in 1 mL of dH2O supplemented with 5 μL 200X protease inhibitor cocktail, incubated on ice for 10 min, then centrifuged. The cell pellet was resuspended in dH2O to a total volume of 650 μL. Of this suspension, 50 μL was set aside for pre-digestion QC, and the remainder was divided into 6 tubes. Both pre-digestion controls and samples underwent a pre-digestion incubation in a Thermomixer (BenchMark) with 0.3% SDS, 1x NEB DpnII restriction buffer and dH2O for 1 hr at 37°C, shaking at 1,000 rpm. A 1.7% solution of Triton X-100 was added to each tube and shaking was continued for another hour. After the pre-digestion incubation, 10 μL of DpnII (NEB, 50 U/μL) was added to each sample tube only, and shaking was continued, together with the pre-digestion control, until the end of the day. An additional 10 μL of DpnII was added to each digestion reaction, which was digested overnight. The next day, a further 10 μL of DpnII was added and shaking was continued for another 2-3 hours. Then, 100 μL of each digestion reaction was removed, pooled into two 1.5 mL tubes and set aside for digestion efficiency QC. The remaining samples were heat-inactivated by incubation at 65°C for 20 min at 1,000 rpm in a MultiTherm to inactivate the DpnII, and cooled on ice for 20 additional minutes. Digested samples were ligated with 8 μL of T4 DNA ligase (HC ThermoFisher, 30 U/μL) and 1X ligase buffer at 1,000 rpm overnight at 16°C in a MultiTherm. The next day, an additional 2 μL of T4 DNA ligase was spiked into each sample, followed by incubation for a few more hours. The ligated samples, along with the pre-digestion and digestion controls, were then de-crosslinked overnight at 65°C with Proteinase K (20 mg/mL, Denville Scientific).
The following morning, both controls and ligated samples were incubated for 30 min at 37°C with RNase A (Millipore), followed by phenol/chloroform extraction and ethanol precipitation at −20°C; the 3C libraries were then centrifuged at 3,000 rpm for 45 min at 4°C to pellet the DNA, while the controls were centrifuged at 14,000 rpm. The pellets were resuspended in 70% ethanol and centrifuged as described above. The pellets of 3C libraries and controls were resuspended in 300 μL and 20 μL dH2O, respectively, and stored at −20°C. Sample concentrations were measured by Qubit. Digestion and ligation efficiencies were assessed by gel electrophoresis on a 0.9% agarose gel and by quantitative PCR (SYBR green, Thermo Fisher).
Isolated DNA from 3C libraries was quantified using a Qubit fluorometer (Life Technologies), and 10 μg of each library was sheared in dH2O using a QSonica Q800R to an average fragment size of 350 bp. QSonica settings were 60% amplitude, 30 s on, 30 s off, 2 min intervals, for a total of 5 intervals at 4 °C. After shearing, DNA was purified using AMPureXP beads (Agencourt). DNA size was assessed on a Bioanalyzer 2100 using a DNA 1000 Chip (Agilent), and DNA concentration was checked via Qubit. SureSelect XT library prep kits (Agilent) were used to repair DNA ends and ligate adaptors following the manufacturer’s protocol. Excess adaptors were removed using AMPureXP beads. Size and concentration were checked again before hybridization by Bioanalyzer 2100 using a DNA 1000 Chip and by Qubit fluorometer. One microgram of adaptor-ligated library was used as input for the SureSelect XT capture kit, following the manufacturer’s protocol, with our custom-designed 41K promoter Capture-C probe set. The quantity and quality of the captured libraries were assessed by Bioanalyzer using a high sensitivity DNA Chip and by Qubit fluorometer. SureSelect XT libraries were then paired-end sequenced on the Illumina NovaSeq 6000 platform (51 bp read length) at the Center for Spatial and Functional Genomics at CHOP.
RNA-seq data pre-processing and expression differential analysis
The paired-end FASTQ files from each replicate were independently mapped to the hg19 genome assembly with STAR (v2.6.0c)77. GENCODE v19 annotation was used for gene features, and raw read counts per gene feature were calculated with htseq-count (v0.6.1)78 using the parameters -f bam -r pos -s reverse -t exon -m union. Gene features located on chrM or annotated as rRNAs were removed from the final sample-by-gene read count matrix.
Differential analysis was performed in R (v3.3.2) using the R package edgeR79 (v3.16.5). Briefly, raw read counts on gene features were converted to CPM (counts per million total reads). Gene features with a median of less than 0.7 CPM (10-18 reads per gene feature) across all samples were removed from the differential analysis. The trimmed mean of M-values (TMM) method was used to calculate normalization scaling factors, and the quasi-likelihood negative binomial generalized log-linear model approach (glmQLFit) was applied to the count data with the model ~individual + cell type. Differentially expressed genes (DEGs) between NPCs and neurons were identified using a cutoff of FDR < 0.05 and absolute logFC > 1.
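The expression filter and DEG thresholds described above can be illustrated outside R. This sketch is not edgeR: it computes plain CPM from raw library sizes (edgeR's `cpm()` would also apply TMM normalization factors once they have been calculated), applies the median ≥ 0.7 CPM filter, and calls DEGs from p-values using Benjamini-Hochberg FDR, the default adjustment in edgeR's `topTags`. The per-gene p-values and logFCs are assumed to come from the quasi-likelihood test, which is not reimplemented here.

```python
from statistics import median
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """counts[gene][sample] -> counts per million using raw library sizes (no TMM)."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c * 1e6 / lib_sizes[s] for s, c in enumerate(row)] for row in counts]


def passing_expression_filter(counts: Sequence[Sequence[int]],
                              min_median_cpm: float = 0.7) -> List[int]:
    """Indices of genes whose median CPM across samples is at least the cutoff."""
    return [g for g, row in enumerate(cpm(counts)) if median(row) >= min_median_cpm]


def bh_fdr(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, made monotone)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos                   # rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(logfc: Sequence[float], pvals: Sequence[float],
              fdr_cut: float = 0.05, lfc_cut: float = 1.0) -> List[int]:
    """DEGs: FDR < fdr_cut and |logFC| > lfc_cut, matching the thresholds above."""
    fdr = bh_fdr(pvals)
    return [i for i in range(len(pvals)) if fdr[i] < fdr_cut and abs(logfc[i]) > lfc_cut]
```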
Pathway enrichment analysis
Significantly differentially expressed genes were subjected to Ingenuity Pathway Analysis (IPA, QIAGEN). Significantly enriched canonical pathways (BH-corrected P < 0.05) were grouped into pathways activated or suppressed in neurons based on their z-scores, which were calculated from the log2 fold changes in gene expression between NPCs and neurons. Networks with relevant genes were exported directly from IPA, and the p-value, z-score and percentage of genes for each enriched pathway were plotted using ggplot2.
ATAC-seq peak calling
NPC and neuron ATAC-seq peaks were called using the ENCODE ATAC-seq pipeline (https://www.encodeproject.org/atac-seq/). Briefly, paired-end reads from all replicates of each cell type were aligned to the hg19 genome using bowtie2, and duplicate reads were removed from the alignment. Aligned tags were generated by shifting read alignments by +4 bp for reads aligned to the forward strand and -5 bp for reads aligned to the reverse strand. Narrow peaks were called independently for the pooled replicates of each cell type using macs2 (-p 0.01 --nomodel --shift -75 --extsize 150 -B --SPMR --keep-dup all --call-summits), and ENCODE blacklist regions (wgEncodeDacMapabilityConsensusExcludable.bed.gz) were removed from the called peaks. Finally, a consensus set of open chromatin regions (OCRs) was obtained by consolidating the peak sets across cell types using bedtools intersect (v2.25.0)80.
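The +4/-5 bp adjustment can be expressed on BED-style read records (0-based, half-open). The function below mirrors the common tagAlign convention of moving the start of forward-strand reads and the end of reverse-strand reads so that read ends approximate the centre of the Tn5 insertion; it is an illustrative sketch with a hypothetical tuple layout, not code taken from the ENCODE pipeline.

```python
from typing import Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand), 0-based half-open


def tn5_shift(read: Read) -> Read:
    """Shift forward-strand starts by +4 bp and reverse-strand ends by -5 bp."""
    chrom, start, end, strand = read
    if strand == "+":
        start += 4
    elif strand == "-":
        end -= 5
    else:
        raise ValueError(f"unknown strand {strand!r}")
    if start >= end:
        raise ValueError("read too short to shift")
    return chrom, start, end, strand


# Hypothetical reads for illustration.
for r in [("chr1", 100, 150, "+"), ("chr1", 100, 150, "-")]:
    print(r, "->", tn5_shift(r))
```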
Differential analysis of chromatin accessibility
To determine whether an OCR was differentially accessible between neurons and NPCs, de-duplicated read counts for consensus OCRs were calculated for each replicate and normalized against background (10 kb genomic bins) using the R package csaw81 (v1.8.1). OCRs with a median of less than 1.2 CPM (10-50 reads per OCR) across all replicates were removed from further differential analysis. As for the gene-level analysis, differential accessibility analysis was performed using the glmQLFit approach with the model ~individual + cell type in edgeR (v3.16.5), but with normalization scaling factors calculated by csaw. Differential OCRs between cell types were identified at FDR < 0.05 and absolute log2 fold change > 1.
Promoter Capture-C pre-processing and interaction calling
Paired-end reads from two donors for neurons (6 replicates per donor) and NPCs (3 replicates per donor) were pre-processed using the HiCUP pipeline42 (v0.5.9), with bowtie282 as the aligner and hg19 as the reference genome. Non-hybrid read counts from all baited promoters were used to call significant promoter interactions. Significant promoter interactions at 1-DpnII-fragment resolution were called using CHiCAGO83 (v1.1.8) with default parameters, except that binsize was set to 2500. Significant interactions at 4-DpnII-fragment resolution were also called using CHiCAGO. For this step we used artificial baitmap and rmap files in which DpnII fragments were concatenated in silico into groups of 4 consecutive fragments, with default parameters except that removeAdjacent was set to False. Interactions with a CHiCAGO score > 5 in at least one cell type at either 1-fragment or 4-fragment resolution were considered significant. Significant interactions were finally converted to ibed format, in which each line represents a physical interaction between fragments.
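Building the 4-fragment maps amounts to merging runs of consecutive DpnII fragments. The text does not specify the edge cases, so this sketch makes three assumptions: groups of four do not overlap, groups never span chromosomes, and any shorter trailing group is kept. It also encodes the significance rule (CHiCAGO score > 5 in at least one cell type at either resolution). Function names and dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[str, int, int]      # (chrom, start, end), sorted by position
RmapRow = Tuple[str, int, int, int]  # (chrom, start, end, fragment_id)


def concatenate_fragments(fragments: List[Fragment], k: int = 4) -> List[RmapRow]:
    """Merge each run of k consecutive fragments into one in-silico fragment."""
    groups: List[List[Fragment]] = []
    current: List[Fragment] = []
    for frag in fragments:
        # Start a new group when the chromosome changes.
        if current and frag[0] != current[0][0]:
            groups.append(current)
            current = []
        current.append(frag)
        if len(current) == k:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # shorter trailing group (an assumption)
    return [(g[0][0], g[0][1], g[-1][2], i + 1) for i, g in enumerate(groups)]


def is_significant(scores: Dict[Tuple[str, str], float], cutoff: float = 5.0) -> bool:
    """True if any (cell type, resolution) CHiCAGO score exceeds the cutoff."""
    return any(score > cutoff for score in scores.values())
```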
Differential analysis on Promoter Capture-C using Chicdiff
To compare interaction strength statistically between NPCs and neurons, we first merged the peaks called from triplicates in each cell type using the chicagoTools script makePeakMatrix and used the merged peaks as reference interactions. We then generated CHiCAGO R objects independently for each replicate and mapped replicate-wise read counts to the reference interactions. To compensate for the sparse read counts per DpnII fragment, differential analyses were performed in the Chicdiff pipeline (v0.4)76 by pooling neighboring fragments: 20 at 1-fragment resolution and 10 at 4-fragment resolution. Significant individual fragments were then prioritized using the getCandidateInteractions function, filtering for a minimum P-value < 0.05 and a minimum DeltaAsinhScore difference > 1.
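The following is a hedged sketch of the final prioritization step only, not the Chicdiff API. It assumes that DeltaAsinhScore is the absolute difference between the asinh-transformed CHiCAGO scores in the two cell types. The record type and its field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiffInteraction:
    """Hypothetical record for one bait/other-end pair after Chicdiff."""
    bait_id: int
    other_end_id: int
    pvalue: float        # Chicdiff P-value for the pooled region
    score_npc: float     # CHiCAGO score in NPCs
    score_neuron: float  # CHiCAGO score in neurons

    @property
    def delta_asinh(self) -> float:
        # asinh behaves like log for large scores but is defined at zero.
        return abs(math.asinh(self.score_neuron) - math.asinh(self.score_npc))


def candidate_interactions(rows: List[DiffInteraction], max_p: float = 0.05,
                           min_delta: float = 1.0) -> List[DiffInteraction]:
    """Keep fragments with P < max_p and an asinh-score difference > min_delta."""
    return [r for r in rows if r.pvalue < max_p and r.delta_asinh > min_delta]
```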
Annotating OCRs to corresponding genes
OCRs were annotated to genes at two levels: the linear genome and the three-dimensional genome. At the linear level, OCRs were annotated using ATAC-seq alone, by mapping them to genes according to their distance to the gene TSS. OCRs residing within −1000 bp to +500 bp of a TSS (defined as the gene promoter region) were defined as prOCRs and assumed to regulate expression of their cognate gene. A prOCR could be annotated to multiple genes if it overlapped more than one gene promoter region. At the three-dimensional level, ATAC-seq and promoter-focused Capture-C data were integrated so that OCRs were annotated to their corresponding genes based on significant physical interactions. An OCR was defined as a cRE if it overlapped a promoter-interacting region (PIR) by at least 50% of the length of either the PIR or the OCR. Each cRE was predicted to regulate the gene promoters represented by the bait fragments defined at probe design29.
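The two annotation rules can be sketched with interval arithmetic under the following assumptions:

- Intervals are 0-based, half-open and lie on a single chromosome.
- The −1000/+500 bp window is oriented by gene strand.
- Any overlap with that window counts as "residing within" it.

The 50% rule is applied to either interval, similar in spirit to `bedtools intersect -f 0.5 -F 0.5 -e`. All names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open


def promoter_window(tss: int, strand: str, upstream: int = 1000,
                    downstream: int = 500) -> Interval:
    """Promoter region from -1000 bp to +500 bp relative to the TSS."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    # On the minus strand, upstream lies at higher coordinates.
    return (tss - downstream, tss + upstream)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_prOCR(ocr: Interval, tss: int, strand: str) -> bool:
    """OCR overlaps the strand-aware promoter window of a gene."""
    return overlap_bp(ocr, promoter_window(tss, strand)) > 0


def is_cRE(ocr: Interval, pir: Interval, min_fraction: float = 0.5) -> bool:
    """OCR overlaps a PIR by >= 50% of the OCR length or of the PIR length."""
    shared = overlap_bp(ocr, pir)
    if shared == 0:
        return False
    return (shared >= min_fraction * (ocr[1] - ocr[0])
            or shared >= min_fraction * (pir[1] - pir[0]))
```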
Transcription factor analysis
JASPAR 2020 core vertebrate motifs were downloaded from JASPAR57 (http://jaspar.genereg.net/). This motif collection contains 746 motifs corresponding to 657 different transcription factors (TFs). Tn5 transposase binds as a dimer and inserts two adaptors separated by 9 bp. We therefore shifted the read alignments in the BAM files by +4 bp for reads aligned to the forward strand and −5 bp for reads aligned to the reverse strand. To prevent differences in sequencing depth from affecting TF binding site prediction, neuron reads were subsampled to match the number of mapped reads in the NPC BAM file. PIQ56 was then used to predict transcription factor binding sites (TFBS) from the assembly-gap-masked genome sequence, as described at https://github.com/orzechoj/piq-single. A purity score cutoff of 0.8 was used to define binding-site candidates. These candidates were further filtered against the blacklist (wgEncodeDacMapabilityConsensusExcludable.bed.gz), as described under ATAC-seq peak calling. TFs with gene expression TPM < 1 were excluded from further analysis.
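Depth matching and the expression filter are both simple operations. In practice, subsampling is usually done on the BAM file itself, for example with `samtools view -s`. This sketch instead works on an in-memory list of hypothetical read identifiers with a fixed random seed. The TF names and TPM values used with it are also hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")


def downsample(reads: Sequence[T], target: int, seed: int = 1) -> List[T]:
    """Randomly keep `target` reads (without replacement) to match another library."""
    if target >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(list(reads), target)


def expressed_tfs(tpm: Dict[str, float], min_tpm: float = 1.0) -> Set[str]:
    """Drop TFs whose gene expression is below 1 TPM."""
    return {tf for tf, value in tpm.items() if value >= min_tpm}


neuron_reads = [f"read{i}" for i in range(1000)]
npc_depth = 400
matched = downsample(neuron_reads, npc_depth)
```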
To determine which TFs are over-represented in gene-annotated OCRs, we performed the Bias-free Footprint Enrichment Test (BiFET v1.1.8)58. BiFET corrects for bias arising from imbalances in GC content and read counts between the target and background sets. Specifically, GC content for each OCR was pre-calculated using bedtools nuc, and raw read counts were merged across all replicates for each OCR. Target regions were defined as gene-annotated OCRs (prOCRs + cREs), and the remaining OCRs were labelled as background. P-values calculated by the calculate_enrich_p function for each motif in each cell type were corrected for multiple testing using false discovery rate (FDR) estimation.
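BiFET needs three inputs for every OCR: a GC fraction, a read count pooled over replicates, and a target/background label. This sketch computes them from hypothetical in-memory records under two assumptions. First, "merged" read counts are taken to mean summed counts. Second, GC is taken over unambiguous A/C/G/T bases, which may differ slightly from how `bedtools nuc` reports its GC column.

```python
from typing import Dict, List, Set, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N is ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def bifet_inputs(ocr_seqs: Dict[str, str], replicate_counts: Dict[str, List[int]],
                 annotated: Set[str]) -> Dict[str, Tuple[float, int, bool]]:
    """Per OCR: (GC fraction, reads summed over replicates, is_target)."""
    return {
        ocr: (gc_fraction(seq), sum(replicate_counts[ocr]), ocr in annotated)
        for ocr, seq in ocr_seqs.items()
    }
```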
To detect differences in TF binding activity between cell types, a motif score was calculated for each motif as previously described66. Only binding sites that lay within gene-annotated OCRs and passed the purity score cutoff were used. The motif score was defined as the probability that all predicted motif sites within cREs are occupied by the corresponding TF in a given sample. It is estimated as follows:
where Mt is the number of binding sites (bs) for motif t. By comparing motif scores between samples for each motif, we estimated differences in TF binding between NPCs and neurons and identified cell-type-specific TF binding.
Identifying chromatin state using chromHMM
BED files of histone mark ChIP-seq data for hESC-derived neuronal cells and NPCs were downloaded from the Roadmap Epigenomics project53,84 (Supplemental Table 9) and used as input for ChromHMM (v1.17). The hg19 assembly and a 15-state model were specified for model learning, which was performed independently for neurons and NPCs. Different sets of OCRs were compared with the 15 chromatin states in the corresponding cell type. Relative overlap enrichment was calculated for each state and plotted using the OverlapEnrichment function. Finally, chromatin states were manually annotated and color-coded with reference to the Roadmap Epigenomics project.
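As we understand the ChromHMM documentation, the relative overlap enrichment reported by OverlapEnrichment is a fold enrichment. The numerator is the fraction of a chromatin state's bases that fall in the OCR set. The denominator is the fraction of the genome covered by the OCR set. Below is a toy version on a single chromosome, using hypothetical intervals that do not overlap within each list.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based half-open; intervals in each list do not overlap


def total_bp(intervals: List[Interval]) -> int:
    """Total number of bases covered by a list of intervals."""
    return sum(end - start for start, end in intervals)


def shared_bp(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both lists (pairwise sum is exact for disjoint lists)."""
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def fold_enrichment(state: List[Interval], ocrs: List[Interval], genome_bp: int) -> float:
    """(C / A) / (B / D): state-OCR overlap relative to genome-wide OCR coverage."""
    c = shared_bp(state, ocrs)
    return (c / total_bp(state)) / (total_bp(ocrs) / genome_bp)


# Hypothetical 1 kb genome: the state covers 0-100 and the OCRs cover 50-150.
enrichment = fold_enrichment([(0, 100)], [(50, 150)], genome_bp=1000)
```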
Partitioned heritability LD score regression enrichment analysis
Partitioned heritability LD Score Regression85 (v1.0.0) was used to identify enrichment of GWAS summary statistics in gene-annotated open regions identified in NPCs and neurons. The baseline analysis was performed using LDSCORE data (https://data.broadinstitute.org/alkesgroup/LDSCORE) with LD scores, regression weights and allele frequencies from 1000G Phase 1. Summary statistics for 7 neurodevelopmental disorders were downloaded from three sources: the Psychiatric Genomics Consortium (PGC), the Complex Trait Genetics Lab (CTG) and the Broad Antisocial Behavior Consortium (BroadABC). The links and references are provided in Supplemental Table 5. As previously described, we generated partitioned LD score regression annotations for NPCs and neurons from the coordinates of gene-annotated open regions (prOCR + cRE). Finally, the cell-type-specific partitioned LD scores were compared with baseline LD scores to measure enrichment in NPCs and neurons independently.
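In partitioned LDSC, the enrichment for an annotation is the proportion of SNP heritability it explains divided by the proportion of SNPs it contains. This sketch recomputes that ratio from hypothetical values. The container class is hypothetical. The standard-error line treats the SNP proportion as fixed, which is our simplification rather than a statement of how ldsc computes it.

```python
from dataclasses import dataclass


@dataclass
class AnnotationResult:
    """Hypothetical container for one partitioned-LDSC annotation."""
    name: str
    prop_snps: float   # proportion of reference SNPs in the annotation
    prop_h2: float     # proportion of SNP heritability attributed to it
    prop_h2_se: float  # standard error of prop_h2

    @property
    def enrichment(self) -> float:
        """Heritability enrichment = Prop(h2) / Prop(SNPs)."""
        return self.prop_h2 / self.prop_snps

    @property
    def enrichment_se(self) -> float:
        # Simplification: treats prop_snps as a fixed constant.
        return self.prop_h2_se / self.prop_snps


# Hypothetical values for illustration only.
npc = AnnotationResult("NPC prOCR+cRE", prop_snps=0.02, prop_h2=0.10, prop_h2_se=0.03)
```

An enrichment above 1 means that the annotation carries more heritability than its share of SNPs would predict.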
Variant to gene mapping
Proxy SNPs in linkage disequilibrium with the sentinel SNPs reported in 7 GWAS (Supplemental Table 12) were identified using the online SNP annotator SNiPA (https://snipa.helmholtz-muenchen.de/snipa/) in the European population. We used the default settings: genome assembly GRCh37, variant set 1000 Genomes Phase 3 v5 and LD r2 cutoff 0.8. 95% credible sets were identified from GWAS summary statistic P-values (https://github.com/statgen/gwas-credible-sets/) using a 200 kb flanking region around each sentinel signal. Both proxy SNPs and 95% credible set SNPs were overlapped with cREs and annotated to their connected genes using the cRE-gene contact maps for each cell type (Supplemental Table 13). SNP-gene pairs were further filtered to retain genes expressed at > 1 TPM in the given cell type and were visualized using pyGenomeTracks86 (v3.0).
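A 95% credible set can be built from P-values alone if the region is assumed to contain a single causal variant. This sketch converts two-sided P-values to z-scores and uses exp(z²/2) as an approximate Bayes factor. That approximation is a common simplification and may not match the exact method of the statgen gwas-credible-sets tool. SNP IDs and P-values used with it are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, List


def credible_set(pvalues: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of SNPs whose summed posterior probability reaches `coverage`.

    Assumes a single causal variant in the region and approximates each SNP's
    Bayes factor as exp(z^2 / 2), with z taken from a two-sided P-value.
    """
    if not pvalues:
        return []
    # Two-sided P-value -> squared z-score.
    z2 = {snp: NormalDist().inv_cdf(p / 2) ** 2 for snp, p in pvalues.items()}
    top = max(z2.values())
    # Subtract the largest z^2 before exponentiating to avoid overflow.
    bf = {snp: math.exp((value - top) / 2) for snp, value in z2.items()}
    total = sum(bf.values())
    chosen: List[str] = []
    cumulative = 0.0
    for snp in sorted(bf, key=bf.get, reverse=True):
        chosen.append(snp)
        cumulative += bf[snp] / total
        if cumulative >= coverage:
            break
    return chosen
```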
To assess the effects of variants on transcription factor binding sites (TFBS), we applied motifbreakR22 to SNPs residing in gene-connected cREs, comparing both reference and alternative alleles against the JASPAR core motif database. Results were filtered to retain comparisons in which at least one allele reached a P-value below the 1e-4 threshold.
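To illustrate what comparing alleles against a motif means, this sketch scores the reference and alternative sequences with a log-odds position weight matrix. For each allele it takes the best window, on either strand, that covers the variant. The sketch is deliberately simplified. motifbreakR has its own scoring methods and converts scores to P-values (the 1e-4 threshold above), which this sketch does not do. The matrix and sequences are hypothetical.

```python
import math
from typing import Dict, List

PWM = List[Dict[str, float]]  # per-position base probabilities (no zeros)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def log_odds(kmer: str, pwm: PWM, background: float = 0.25) -> float:
    """Sum over positions of log2(p(base) / background)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, kmer))


def best_score_at_variant(seq: str, variant_index: int, pwm: PWM) -> float:
    """Best motif score over all windows (either strand) that include the variant."""
    k = len(pwm)
    first = max(0, variant_index - k + 1)
    last = min(variant_index, len(seq) - k)
    scores = []
    for start in range(first, last + 1):
        window = seq[start:start + k]
        scores.append(log_odds(window, pwm))
        scores.append(log_odds(revcomp(window), pwm))
    return max(scores)


# Hypothetical 3-bp motif preferring "GCG"; the variant is C>A at index 3.
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
ref_score = best_score_at_variant("TTGCGTT", 3, pwm)
alt_score = best_score_at_variant("TTGAGTT", 3, pwm)
```

A large drop from `ref_score` to `alt_score` is the kind of allelic difference that motifbreakR then evaluates statistically.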
Availability of data and materials
Our data are available from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) with accession numbers E-MTAB-9159 (promoter-Capture-C), E-MTAB-9087 (ATAC-seq), and E-MTAB-9085 (RNA-seq) respectively.
Supplementary Material
Supplemental Figure 1: Cell validation using cell markers via PCR and immunofluorescence microscopy
A. Validation of NPCs by PCR for NES (Nestin), SOX1 and PAX6. Validations were performed using 35 PCR cycles on DNA extracted from both donors. Primer sequences are available in Supplemental Table 3. B. Validation of NPCs by immunofluorescence against Nestin, PAX6 and SOX1. Images were taken at 63× magnification. C. Validation of neurons by PCR for βIII-tubulin, NeuN, MAP2 and TUJ1. Validations were performed using 35 PCR cycles on DNA extracted from both donors. Primer sequences are available in Supplemental Table 3. D. Validation of neurons by immunofluorescence against βIII-tubulin (left; green), MAP2 (right; green) and CTIP2 (right; magenta). Images were taken at 63× magnification.
Supplemental Figure 2: Gene expression correlation between iPSC-derived cells and primary tissues from GTEx
Spearman correlation of gene expression in iPSC-derived neurons (A) or NPCs (B) with primary tissues from GTEx. Spearman correlations were calculated between the mean TPM of genes in the iPSC-derived samples and gene TPM in the full individual-tissue RNA-seq dataset from GTEx (v8, 2017-06-05). The line within each box indicates the median correlation. The lower and upper hinges correspond to the 25th (Q1) and 75th (Q3) percentiles. The upper whisker extends to Q3 + 1.5 × interquartile range (IQR) and the lower whisker to Q1 − 1.5 × IQR. Brain and non-brain tissues are labeled in red and black, respectively.
Supplemental Figure 3: Quality control of ATAC-seq and differential accessibility between NPCs and neurons
A. Pairwise correlation among ATAC-seq samples from NPCs and neurons. Pearson correlations among 20 ATAC-seq samples (5 replicates × 2 donors × 2 cell types) were calculated using OCR log2 FPKM values and plotted as a heatmap, with darker blue representing higher correlation. Blue and red bars label samples from NPCs and neurons, respectively. B. More OCRs show higher accessibility in NPCs. OCRs differentially accessible between NPCs and neurons were grouped into those with higher accessibility in NPCs (blue) and those with higher accessibility in neurons (red). The numbers of OCRs, stratified by log2 fold change, are plotted as a bar graph.
Supplemental Figure 4. Quality control of promoter-focused capture-C
A. Pairwise correlation among promoter-focused Capture-C libraries. Raw read-count Hi-C matrices at 1 Mbp resolution from the 18 sequenced Capture-C libraries were normalized for read count and compared with each other using HiCRep (v1.8.0). The stratum-adjusted correlation coefficient (SCC) was calculated with smoothing parameter h = 1. Hierarchical clustering was performed on 1 − SCC using the "complete" method. The heatmap color scale indicates correlation, with blue as high. B. Recall rate and positive predictive value for detecting active regulatory elements and enhancer features across different CHiCAGO scores. C. Numbers of cis- and trans-interactions in NPCs and neurons. Intra-chromosomal (cis, red) and inter-chromosomal (trans, green) interactions were predicted by CHiCAGO (score > 5) at both 1-fragment and 4-fragment resolution. Intra-chromosomal interactions account for more than 90% of the total. D. Enrichment of regulatory features in promoter-interacting regions (PIRs) involved in significant interactions. PIR enrichment for genomic features was compared with distance-matched random regions using the CHiCAGO peakEnrichment4Features function. Mean ± 95% CI is shown across 100 draws of non-significant interactions. E. Proportion of intra-chromosomal interactions within and across TADs in NPCs and neurons. Intra-chromosomal interactions are grouped into "within TADs" (green) and "across TADs" (red), depending on whether both ends fall within the same TAD in human ESCs87. F. Distance between intra-chromosomal interacting regions in NPCs and neurons. Interaction distance is the linear distance between the midpoints of the two ends of a significant interaction. Interaction distances are significantly longer in NPCs (blue) than in neurons (red; Wilcoxon rank-sum test, P < 2e-16) at both resolutions. The 4-fragment resolution detects longer interactions than the 1-fragment resolution.
Supplemental Figure 5. Summary of gene-cRE interactions
A. Composition of putative cis-regulatory elements (cREs). cREs (red) are defined as OCRs that overlap a promoter-interacting region (PIR) by a minimum fraction of 50% in either NPCs or neurons. They partially intersect with the previously defined prOCRs but also re-annotate non-promoter OCRs (nprOCRs) to distal corresponding genes. Non-cREs (green) are the counterpart of cREs. B. Distribution of the number of interacting cREs per gene in NPCs and neurons. C. Distribution of the number of interacting genes per cRE in NPCs and neurons. D, E. Positive correlation between cRE number and gene expression in NPCs (D) and neurons (E). Genes are grouped by the number of interacting cREs. Boxplots indicate the median, IQR, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR. Linear regression was performed on mean log2 TPM values for n = 15 bins (F test; NPCs: P = 4.72e-5, R2 = 0.74; neurons: P = 9.64e-4, R2 = 0.58). F. Venn diagram of gene-cRE interaction numbers in NPCs and neurons. G. Changes in gene expression are positively correlated with changes in the number of active enhancer-like cREs during neuronal differentiation. Blue numbers indicate the number of genes per group. Linear regression was performed between log2 expression fold change and the per-gene difference in cRE number (P = 5.16 × 10−25; beta = 0.04; R2 = 9.22 × 10−3). Boxplots indicate the median, Q1, Q3, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.
Supplemental Figure 6. Spatial interaction between NEFL promoter and cREs
A. cREs interact with the NEFL promoter in a cell-type-specific manner. Significant interactions (CHiCAGO score > 5) between cREs and NEFL are indicated by black arcs in NPCs (blue peaks) and neurons (red peaks). B. Differential accessibility in neurons and NPCs of a cRE that interacts with the NEFL promoter. FPKM values of the cRE 50 kbp downstream of the NEFL promoter are plotted. Boxplots indicate the median, IQR, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.
Supplemental Figure 7. Chromatin state ratio of cREs that contact gene promoters
A. Illustration of how the chromatin state ratio is calculated across cREs that contact a gene promoter. All cREs interacting with a given gene and their corresponding chromatin states are aggregated. The chromatin state ratio is calculated per gene per state as the aggregated length of that state divided by the total cRE length. B. Heatmap of chromatin state ratios for genes differentially expressed between NPCs and neurons. The heatmap scale represents the ratio (high: blue; low: white).
Supplemental Figure 8. FMNL1 promoter interacts with a cRE containing an Intelligence Deficiency GWAS SNP specifically in neurons.
A. Interaction map between the FMNL1 promoter and cREs in NPCs and neurons. Significant interactions (CHiCAGO score > 5) between cREs and FMNL1 are indicated by black arcs in NPCs (top) and neurons (bottom, red). Chromatin state colors are as indicated in Figure 3B. Light blue bars indicate cREs. The cRE ~900 kbp upstream of the FMNL1 promoter contains the putative causal variant rs76324150, a proxy (LD r2 = 1) of the intelligence deficiency GWAS SNP rs17563986. B. Raw read counts supporting the interaction between FMNL1 and the cRE containing rs76324150. The geometric mean of the raw read counts supporting fragment interactions across all replicates is plotted as dots, with significant interactions labeled in blue. The significant interaction between the FMNL1 bait and the rs76324150-containing fragment is highlighted with a red triangle. The expected level of the Brownian collision background (solid line) and the upper limit of its 95% confidence interval (dashed line) are plotted. C. FMNL1 expression in NPCs and neurons, represented by TPM in RNA-seq across replicates. D. Accessibility of the cRE containing the putative causal variant rs76324150, represented by FPKM in ATAC-seq across replicates. E. Disruption of ZFX and KLF12 binding sites by rs76324150 at the cRE contacting the FMNL1 promoter. Information content matrices (ICMs) of the ZFX and KLF12 binding sites are plotted. The square highlights the disrupted position within the ZFX and KLF12 binding sites, with the reference sequence at top and the alternative sequence at bottom.
Supplemental Table 1: Differentially expressed genes between NPCs and neurons.
Supplemental Table 2: Ingenuity Pathway Enrichment for differentially expressed protein-coding genes.
Supplemental Table 3: PCR primers for cell validation
Supplemental Table 4: The accessibility of promoter OCR and expression level of their corresponding genes
Supplemental Table 5: Summary of promoter-focused capture-C libraries pre-processed by hicup
Supplemental Table 6: Summary statistics of significant interactions called by CHiCAGO
Supplemental Table 7: Differential interactions between NPCs and neurons detected by Chicdiff
Supplemental Table 8: Genome-wide interactions between gene and cRE in NPCs
Supplemental Table 9: Genome-wide interactions between gene and cRE in neurons
Supplemental Table 10: Roadmap epigenomics project resource for hESC-derived neurons and NPCs
Supplemental Table 11: Experimentally validated enhancers (VISTA) overlapping with cRE among different tissues.
Supplemental Table 12: BiFET transcription factors enrichment in NPCs and neurons
Supplemental Table 13: GWAS studies of neurodevelopmental disorders.
Supplemental Table 14: Variant to gene mapping for neurodevelopmental disorders.
Supplemental Table 15: Reactome pathway enrichment for genes implicated by variants associated with neurodevelopmental disorders
Supplemental Table 16: Variants associated with neurodevelopmental disorders that disrupt TF binding at cREs.
Highlights.
High-resolution atlases of chromatin reorganization during neuronal differentiation
Activity of promoter contacting cis-regulatory elements correlates with expression
Neurodevelopmental disorder-associated variants mapped to distal targets
Putative effector genes are enriched for pathways related to neuronal development
Acknowledgements
We acknowledge funding and support from the Children’s Hospital of Philadelphia, the National Human Genome Research Institute (NHGRI), the Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD), the National Heart, Lung, and Blood Institute (NHLBI), and the National Institute on Aging (NIA). We also thank the Psychiatric Genomics Consortium (PGC) for making GWAS summary statistics freely available.
Funding
This research was funded by the Children’s Hospital of Philadelphia and by NIH grants K99 HD099330, R01 HG010067, R01 HL143790 and R01 AG057516. Dr. Grant is funded by the Daniel B. Burke Endowed Chair for Diabetes Research.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
REFERENCES
- 1.Harris JC New classification for neurodevelopmental disorders in DSM-5. Current Opinion in Psychiatry (2014). doi: 10.1097/YCO.0000000000000042 [DOI] [PubMed] [Google Scholar]
- 2.Weinberger DR Implications of Normal Brain Development for the Pathogenesis of Schizophrenia. Arch. Gen. Psychiatry (1987). doi: 10.1001/archpsyc.1987.01800190080012 [DOI] [PubMed] [Google Scholar]
- 3.Sullivan PF, Kendler KS & Neale MC Schizophrenia as a Complex Trait: Evidence from a Meta-analysis of Twin Studies. Arch. Gen. Psychiatry (2003). doi: 10.1001/archpsyc.60.12.1187 [DOI] [PubMed] [Google Scholar]
- 4.McGuffin P et al. The heritability of bipolar affective disorder and the genetic relationship to unipolar depression. Arch. Gen. Psychiatry (2003). doi: 10.1001/archpsyc.60.5.497 [DOI] [PubMed] [Google Scholar]
- 5.Faraone SV et al. Molecular genetics of attention-deficit/hyperactivity disorder. Biological Psychiatry (2005). doi: 10.1016/j.biopsych.2004.11.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Colvert E et al. Heritability of autism spectrum disorder in a UK population-based twin sample. JAMA Psychiatry (2015). doi: 10.1001/jamapsychiatry.2014.3028 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Sandin S et al. The heritability of autism spectrum disorder. JAMA - J. Am. Med. Assoc. (2017). doi: 10.1001/jama.2017.12141 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Baker LA, Bezdjian S & Raine A Behavioral Genetics: The Science of Antisocial Behavior. in The Impact of Behavioral Sciences on Criminal Law (2009). doi: 10.1093/acprof:oso/9780195340525.003.0001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Reichenberg A et al. Discontinuity in the genetic and environmental causes of the intellectual disability spectrum. Proceedings of the National Academy of Sciences of the United States of America (2016). doi: 10.1073/pnas.1508093112 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Den Braber A et al. Obsessive–compulsive symptoms in a large population-based twin-family sample are predicted by clinically based polygenic scores and by genome-wide SNPs. Transl. Psychiatry (2016). doi: 10.1038/tp.2015.223 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Ruzzo EK et al. Inherited and De Novo Genetic Risk for Autism Impacts Shared Networks. Cell (2019). doi: 10.1016/j.cell.2019.07.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Sebat J et al. Strong association of de novo copy number mutations with autism. Science (80-. ). (2007). doi: 10.1126/science.1138659 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.lossifov I et al. De Novo Gene Disruptions in Children on the Autistic Spectrum. Neuron (2012). doi: 10.1016/j.neuron.2012.04.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Demontis D et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. (2019). doi: 10.1038/s41588-018-0269-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism (2017). doi: 10.1186/s13229-017-0137-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Savage JE et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. (2018). doi: 10.1038/s41588-018-0152-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Ripke S et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature (2014). doi: 10.1038/nature13595 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Stahl EA et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. (2019). doi: 10.1038/s41588-019-0397-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Arnold PD et al. Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol. Psychiatry (2018). doi: 10.1038/mp.2017.154 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Tielbeek JJ et al. Genome-wide association studies of a broad spectrum of antisocial behavior. JAMA Psychiatry (2017). doi: 10.1001/jamapsychiatry.2017.3069 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Waszak SM et al. Population Variation and Genetic Control of Modular Chromatin Architecture in Humans. Cell (2015). doi: 10.1016/j.cell.2015.08.001 [DOI] [PubMed] [Google Scholar]
- 22.Coetzee SG, Coetzee GA & Hazelett DJ MotifbreakR: An R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics (2015). doi: 10.1093/bioinformatics/btv470 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Hormozdiari F et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am. J. Hum. Genet. 99, 1245–1260 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Pers TH et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.de Leeuw CA, Mooij JM, Heskes T & Posthuma D MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLoS Comput. Biol. 11, 1–19 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Javierre BM et al. Lineage-Specific Genome Architecture Links Enhancers and Noncoding Disease Variants to Target Gene Promoters. Cell 167, 1369–1384.e19 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Montefiori LE et al. A promoter interaction map for cardiovascular disease genetics. Elite 7, 1–35 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Song M et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252–1262 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Chesi A et al. Genome-scale Capture C promoter interactions implicate effector genes at GWAS loci for bone mineral density. Nat. Commun. 10, 1260 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Rajarajan P et al. Neuron-specific signatures in the chromosomal connectome associated with schizophrenia risk. Science (80-. ). 362, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Hu R et al. Brain cell type – specific enhancer – promoter interactome maps and disease-risk association. Science (80-. ). 1139, 1134–1139 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Freire-Pritchett P et al. Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cells. Elife 6, (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Maguire JA et al. Generation of human control iPS cell line CHOPWT10 from healthy adult peripheral blood mononuclear cells. Stem Cell Res. (2016). doi: 10.1016/j.scr.2016.02.017 [DOI] [PubMed] [Google Scholar]
- 34.Jones VC, Atkinson-Dell R, Verkhratsky A & Mohamet L Aberrant iPSC-derived human astrocytes in Alzheimer’s disease. Cell Death Dis. 8, 1–11 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Ziller MJ et al. Dissecting neural differentiation regulatory networks through epigenetic footprinting. Nature 518, 355–359 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Friede RL & Samorajski T Axon caliber related to neurofilaments and microtubules in sciatic nerve fibers of rats and mice. Anat. Rec. (1970). doi: 10.1002/ar.1091670402 [DOI] [PubMed] [Google Scholar]
- 37.Freigang J et al. The crystal structure of the ligand binding module of axonin-1/TAG-1 suggests a zipper mechanism for neural cell adhesion. Cell (2000). doi: 10.1016/S0092-8674(00)80852-1 [DOI] [PubMed] [Google Scholar]
- 38.Benowitz LI & Routtenberg A GAP-43: An intrinsic determinant of neuronal development and plasticity. Trends Neurosci. (1997). doi: 10.1016/S0166-2236(96)10072-2 [DOI] [PubMed] [Google Scholar]
- 39.Ardlie KG et al. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science (80-. ). 348, 648–660 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Su C et al. Mapping effector genes at lupus GWAS loci using promoter Capture-C in follicular helper T cells. Nat. Commun. 11, 1–17 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Yang T et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Wingett S et al. HiCUP: Pipeline for mapping and processing Hi-C data. F1000Research (2015). doi: 10.12688/f1000research.7334.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Cairns J et al. CHiCAGO: Robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 1–17 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Dixon JR et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Su C et al. Human follicular helper T cell promoter connectomes reveal novel genes and regulatory elements at SLE GWAS loci. bioRxiv (2019). doi: 10.1101/2019.12.20.885426 [DOI] [Google Scholar]
- 46.Lee CK, Shibata Y, Rao B, Strahl BD & Lieb JD Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat. Genet. (2004). doi : 10.1038/ng1400 [DOI] [PubMed] [Google Scholar]
- 47.Ernst J & Kellis M Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. (2015). doi: 10.1038/nbt.3157 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Kallur T, Gisler R, Lindvall O & Kokaia Z Pax6 promotes neurogenesis in human neural stem cells. Mol. Cell. Neurosci. (2008). doi: 10.1016/j.mcn.2008.05.010 [DOI] [PubMed] [Google Scholar]
- 49.Osumi N, Shinohara H, Numayama-Tsuruta K & Maekawa M Concise Review: Pax6 Transcription Factor Contributes to both Embryonic and Adult Neurogenesis as a Multifunctional Regulator. Stem Cells (2008). doi: 10.1634/stemcells.2007-0884 [DOI] [PubMed] [Google Scholar]
- 50.Manuel MN, Mi D, Masonand JO & Price DJ Regulation of cerebral cortical neurogenesis by the Pax6 transcription factor. Frontiers in Cellular Neuroscience (2015). doi: 10.3389/fncel.2015.00070 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Visel A, Minovitsky S, Dubchak I & Pennacchio LA VISTA Enhancer Browser - A database of tissue-specific human enhancers. Nucleic Acids Res. (2007). doi: 10.1093/nar/gkl822
- 52.Golbabapour S et al. Gene silencing and polycomb group proteins: An overview of their structure, mechanisms and phylogenetics. OMICS A Journal of Integrative Biology (2013). doi: 10.1089/omi.2012.0105
- 53.Yen A & Kellis M Systematic chromatin state comparison of epigenomes associated with diverse properties including sex and tissue type. Nat. Commun. 6, 1–13 (2015).
- 54.Buck MJ & Lieb JD A chromatin-mediated mechanism for specification of conditional transcription factor targets. Nat. Genet. (2006). doi: 10.1038/ng1917
- 55.John S et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genetics (2011). doi: 10.1038/ng.759
- 56.Sherwood RI et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
- 57.Fornes O et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 1–6 (2019). doi: 10.1093/nar/gkz1001
- 58.Youn A, Marquez EJ, Lawlor N, Stitzel ML & Ucar D BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test. Nucleic Acids Res. 47, e11 (2019).
- 59.Nott A et al. Brain cell type–specific enhancer–promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).
- 60.Wang L, Wang R & Herrup K E2F1 works as a cell cycle suppressor in mature neurons. J. Neurosci. (2007). doi: 10.1523/JNEUROSCI.3681-07.2007
- 61.Kuwahara A et al. Tcf3 represses Wnt-β-catenin signaling and maintains neural stem cell population during neocortical development. PLoS One (2014). doi: 10.1371/journal.pone.0094408
- 62.Knoepfler PS, Cheng PF & Eisenman RN N-myc is essential during neurogenesis for the rapid expansion of progenitor cell populations and the inhibition of neuronal differentiation. Genes Dev. (2002). doi: 10.1101/gad.1021202
- 63.Pinto L et al. AP2γ regulates basal progenitor fate in a region- and layer-specific manner in the developing cortex. Nat. Neurosci. (2009). doi: 10.1038/nn.2399
- 64.Liu Y & Zhang Y ETV5 is Essential for Neuronal Differentiation of Human Neural Progenitor Cells by Repressing NEUROG2 Expression. Stem Cell Rev. Reports (2019). doi: 10.1007/s12015-019-09904-4
- 65.Wang Z et al. KLF6 and STAT3 co-occupy regulatory DNA and functionally synergize to promote axon growth in CNS neurons. Sci. Rep. (2018). doi: 10.1038/s41598-018-31101-5
- 66.Fullard JF et al. Open chromatin profiling of human postmortem brain infers functional roles for non-coding schizophrenia loci. Hum. Mol. Genet. 26, 1942–1951 (2017).
- 67.Kiyota T, Kato A & Kato Y Ets-1 regulates radial glia formation during vertebrate embryogenesis. Organogenesis (2007). doi: 10.4161/org.3.2.5171
- 68.Casoni F et al. Zfp423/ZNF423 regulates cell cycle progression, the mode of cell division and the DNA-damage response in purkinje neuron progenitors. Dev. (2017). doi: 10.1242/dev.155077
- 69.Braccioli L et al. FOXP1 Promotes Embryonic Neural Stem Cell Differentiation by Repressing Jagged1 Expression. Stem Cell Reports (2017). doi: 10.1016/j.stemcr.2017.10.012
- 70.Britanova O et al. Satb2 Is a Postmitotic Determinant for Upper-Layer Neuron Specification in the Neocortex. Neuron (2008). doi: 10.1016/j.neuron.2007.12.028
- 71.Cañete-Soler R, Reddy KS, Tolan DR & Zhai J Aldolases A and C are ribonucleolytic components of a neuronal complex that regulates the stability of the light-neurofilament mRNA. J. Neurosci. (2005). doi: 10.1523/JNEUROSCI.0885-05.2005
- 72.Imagawa E et al. Homozygous p.V116* mutation in C12orf65 results in Leigh syndrome. J. Neurol. Neurosurg. Psychiatry (2016). doi: 10.1136/jnnp-2014-310084
- 73.Jiang L et al. Neural deletion of Sh2b1 results in brain growth retardation and reactive aggression. FASEB J. (2018). doi: 10.1096/fj.201700831R
- 74.Chen C et al. The transcription factor POU3F2 regulates a gene coexpression network in brain tissue from patients with psychiatric disorders. Sci. Transl. Med. (2018). doi: 10.1126/scitranslmed.aat8178
- 75.Broekema RV, Bakker OB & Jonkers IH A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol. 10, (2020).
- 76.Cairns J, Orchard WR, Malysheva V & Spivakov M Chicdiff: A computational pipeline for detecting differential chromosomal interactions in Capture Hi-C data. Bioinformatics (2019). doi: 10.1093/bioinformatics/btz450
- 77.Dobin A et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics (2013). doi: 10.1093/bioinformatics/bts635
- 78.Anders S, Pyl PT & Huber W HTSeq-A Python framework to work with high-throughput sequencing data. Bioinformatics (2015). doi: 10.1093/bioinformatics/btu638
- 79.Robinson MD, McCarthy DJ & Smyth GK edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (2009). doi: 10.1093/bioinformatics/btp616
- 80.Quinlan AR & Hall IM BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics (2010). doi: 10.1093/bioinformatics/btq033
- 81.Lun ATL & Smyth GK Csaw: A Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows. Nucleic Acids Res. (2015). doi: 10.1093/nar/gkv1191
- 82.Langmead B & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
- 83.Cairns J et al. CHiCAGO: Robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 1–17 (2016).
- 84.Ziller MJ et al. Dissecting neural differentiation regulatory networks through epigenetic footprinting. Nature 518, 355–359 (2015).
- 85.Finucane HK et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
- 86.Ramirez F et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. (2018). doi: 10.1038/s41467-017-02525-w
- 87.Dixon JR et al. Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions. Nature 485, 376–380 (2012).
Supplementary Materials
Supplemental Figure 1: Cell validation using cell markers via PCR and immunofluorescence microscopy
A. Validation of NPCs using PCR for NEST, SOX1 and PAX6. The validations were performed using 35 PCR cycles on DNA extracted from both donors. Primer sequences are listed in Supplemental Table 3. B. Validation of NPCs using immunofluorescence staining for Nestin, PAX6 and SOX1. Images were taken at 63× magnification. C. Validation of neurons using PCR for βIII-Tub, NeuN, MAP2 and TUJ1. The validations were performed using 35 PCR cycles on DNA extracted from both donors. Primer sequences are listed in Supplemental Table 3. D. Validation of neurons using immunofluorescence staining for βIII-Tub (left; green), MAP2 (right; green) and CTIP2 (right; magenta). Images were taken at 63× magnification.
Supplemental Figure 2: Gene expression correlation between iPSC-derived cells and primary tissues from GTEx
Spearman correlation of gene expression in iPSC-derived neurons (A) or NPCs (B) with primary tissues from GTEx. Spearman correlations were calculated between the mean TPM of each gene in the iPSC-derived samples and the gene TPMs in each individual sample of the full GTEx tissue RNA-seq dataset (v8, 2017-06-05). The line within each box indicates the median correlation. The lower and upper hinges correspond to the 25th (Q1) and 75th (Q3) percentiles. The upper whisker extends to Q3 + 1.5 × interquartile range (IQR) and the lower whisker to Q1 – 1.5 × IQR. Brain and non-brain tissues are labeled in red and black, respectively.
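The correlation and box-plot summary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the iPSC-derived and GTEx TPM matrices share the same gene order, uses NumPy only, and all function names are hypothetical. Spearman correlation is computed here as the Pearson correlation of tie-averaged ranks.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks within each group of tied values.
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def tissue_correlations(ipsc_tpm, gtex_tpm, gtex_tissues):
    """Correlate mean iPSC-derived TPM with every GTEx sample, grouped by tissue.

    Hypothetical inputs: ipsc_tpm is genes x replicates, gtex_tpm is
    genes x GTEx samples (same gene order, an assumption), and gtex_tissues
    holds one tissue label per GTEx sample column.
    """
    mean_tpm = np.asarray(ipsc_tpm, dtype=float).mean(axis=1)
    gtex_tpm = np.asarray(gtex_tpm, dtype=float)
    per_tissue = {}
    for j, tissue in enumerate(gtex_tissues):
        per_tissue.setdefault(tissue, []).append(spearman(mean_tpm, gtex_tpm[:, j]))
    return per_tissue


def box_summary(correlations):
    """Median, hinges (Q1, Q3) and the 1.5 x IQR whisker limits of one tissue's box."""
    q1, median, q3 = np.percentile(correlations, [25, 50, 75])
    iqr = q3 - q1
    return {"median": median, "q1": q1, "q3": q3,
            "lower_limit": q1 - 1.5 * iqr, "upper_limit": q3 + 1.5 * iqr}
```

Each tissue's list of per-sample correlations would then be summarized by `box_summary` to draw one box per tissue; plotting libraries usually stop each whisker at the most extreme observation inside these limits.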
Supplemental Figure 3: Quality control of ATAC-seq and differential accessibility between NPCs and neurons
A. Pairwise correlation among ATAC-seq samples from NPCs and neurons. Pearson correlations among the 20 ATAC-seq samples (5 replicates × 2 donors × 2 cell types) were calculated from OCR log2(FPKM) values and plotted as a heatmap, with darker blue representing higher correlation. Blue and red bars label samples from NPCs and neurons, respectively. B. More differentially accessible OCRs show higher accessibility in NPCs. OCRs that were differentially accessible between NPCs and neurons were grouped into those with higher accessibility in NPCs (blue) and those with higher accessibility in neurons (red). The numbers of OCRs, stratified by log2 fold change, are plotted as a bar graph.
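The sample-to-sample correlation in panel A can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes a counts matrix with OCRs as rows and the 20 libraries as columns, takes library size as the column sum unless given (an assumption), and adds a pseudocount of 1 before the log2 transform (also an assumption; the legend only states log2FPKM). Function names are hypothetical.

```python
import numpy as np


def fpkm(counts, lengths_bp, library_sizes=None):
    """Fragments per kilobase of OCR per million mapped fragments.

    counts: OCRs x samples fragment counts; lengths_bp: OCR widths in bp.
    library_sizes defaults to the column sums (an assumption; the total
    mapped fragments per library could be used instead).
    """
    counts = np.asarray(counts, dtype=float)
    if library_sizes is None:
        library_sizes = counts.sum(axis=0)
    lengths = np.asarray(lengths_bp, dtype=float)[:, None]
    libs = np.asarray(library_sizes, dtype=float)[None, :]
    return counts * 1e9 / (lengths * libs)


def sample_correlation(counts, lengths_bp, pseudocount=1.0):
    """Pairwise Pearson correlation between samples on log2(FPKM + pseudocount)."""
    log_fpkm = np.log2(fpkm(counts, lengths_bp) + pseudocount)
    # rowvar=False treats each column (sample) as a variable.
    return np.corrcoef(log_fpkm, rowvar=False)
```

The resulting samples × samples matrix is what a heatmap such as panel A would display.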
Supplemental Figure 4. Quality control of promoter-focused Capture-C
A. Pairwise correlation among promoter-focused Capture-C libraries. Raw read-count HiC matrices at 1 Mbp resolution from the 18 sequenced Capture-C libraries were normalized by adjusting read counts and compared with each other using HiCRep (v1.8.0). The stratum-adjusted correlation coefficient (SCC) was calculated with smoothing parameter h = 1. Hierarchical clustering was performed on 1 – SCC using complete linkage. In the heatmap, blue indicates high correlation. B. Recall and positive predictive value for detecting active regulatory elements and enhancer features across different CHiCAGO scores. C. Numbers of cis and trans interactions in NPCs and neurons. Intra-chromosomal (cis, red) and inter-chromosomal (trans, green) interactions were called by CHiCAGO (score > 5) at both 1-fragment and 4-fragment resolution. Intra-chromosomal interactions account for more than 90% of the total. D. Enrichment of regulatory features at promoter-interacting regions (PIRs) involved in significant interactions. PIR enrichment for genomic features was compared with distance-matched random regions using the CHiCAGO peakEnrichment4Features function. Mean ± 95% CI is shown across 100 draws of non-significant interactions. E. Proportions of intra-chromosomal interactions within and across TADs in NPCs and neurons. Intra-chromosomal interactions are classified as "within TADs" (green) when both ends fall within the same TAD boundaries defined in human ESCs87, and as "across TADs" (red) otherwise. F. Distance between intra-chromosomal interacting chromatin regions in NPCs and neurons. Interaction distance is the linear distance between the midpoints of the two ends of a significant interaction. Interaction distances are significantly longer in NPCs (blue) than in neurons (red; Wilcoxon rank-sum test, P < 2e-16) at both resolutions, and the 4-fragment resolution detects longer interactions than the 1-fragment resolution.
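The cis/trans split (panel C), the within/across-TAD classification (panel E) and the interaction distance (panel F) can be sketched with plain interval logic. This is a hypothetical sketch, not the authors' pipeline: interaction ends are assumed to be `(chrom, start, end)` tuples, TADs a dictionary of `(start, end)` intervals per chromosome, and "both ends within the same TAD" is interpreted as both ends lying entirely inside one TAD interval (an assumption).

```python
def midpoint(start, end):
    return (start + end) / 2


def interaction_distance(bait, other_end):
    """Linear distance (bp) between the midpoints of the two interacting ends."""
    return abs(midpoint(*bait[1:]) - midpoint(*other_end[1:]))


def classify_interaction(bait, other_end, tads):
    """Label an interaction as 'trans', 'within TADs' or 'across TADs'.

    bait / other_end: (chrom, start, end); tads: {chrom: [(start, end), ...]}.
    An interaction counts as within a TAD only if both ends lie entirely inside
    the same TAD interval (an assumption about how boundaries were applied).
    """
    chrom, a_start, a_end = bait
    other_chrom, b_start, b_end = other_end
    if chrom != other_chrom:
        return "trans"
    for tad_start, tad_end in tads.get(chrom, []):
        # Both ends inside one TAD <=> the TAD spans the leftmost start and rightmost end.
        if tad_start <= min(a_start, b_start) and max(a_end, b_end) <= tad_end:
            return "within TADs"
    return "across TADs"
```

Distances collected per cell type could then be compared with a Wilcoxon rank-sum test, for example SciPy's `scipy.stats.mannwhitneyu`, which implements the equivalent Mann-Whitney U test.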
Supplemental Figure 5. Summary of gene-cRE interactions
A. Composition of putative cis-regulatory elements (cREs). cREs (red) are defined as OCRs that overlap promoter-interacting regions (PIRs) by a minimum fraction of 50% in either NPCs or neurons. They partially overlap the previously defined promoter OCRs (prOCRs) but also assign non-promoter OCRs (nprOCRs) to their distal target genes. Non-cREs (green) are the OCRs that do not meet this criterion. B. Distribution of the number of interacting cREs per gene in NPCs and neurons. C. Distribution of the number of interacting genes per cRE in NPCs and neurons. D, E. Positive correlation between the number of cREs and gene expression in NPCs (D) and neurons (E). Genes are grouped by the number of interacting cREs. Boxplots indicate the median, IQR, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR. Linear regression was performed on the mean log2TPM values for n = 15 bins (F test; NPCs: P = 4.72e-5, R2 = 0.74; neurons: P = 9.64e-4, R2 = 0.58). F. Venn diagram of the numbers of gene-cRE interactions in NPCs and neurons. G. Changes in gene expression are positively correlated with changes in the number of active enhancer-like cREs during neuronal differentiation. Blue numbers indicate the number of genes per group. Linear regression was performed between log2 expression fold change and the per-gene difference in cRE number (P = 5.16x10−25; beta = 0.04; R2 = 9.22x10−3). Boxplots indicate the median, Q1, Q3, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.
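The cRE definition in panel A can be sketched as an interval-overlap filter. This is a minimal sketch under assumptions, not the authors' implementation: the 50% overlap is measured relative to the OCR's length (analogous to `bedtools intersect -f 0.5` with the OCRs as the `-a` file), intervals are half-open, each PIR carries the gene whose promoter it contacts, and all names are hypothetical.

```python
from collections import defaultdict


def overlap_bp(a_start, a_end, b_start, b_end):
    """Overlap length of two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def call_cres(ocrs, pirs, min_fraction=0.5):
    """Assign OCRs to genes via promoter-interacting regions (PIRs).

    ocrs: [(chrom, start, end)]; pirs: [(chrom, start, end, gene)].
    An OCR becomes a cRE for a gene when at least `min_fraction` of the OCR's
    length overlaps a PIR linked to that gene (fraction measured on the OCR,
    an assumption).
    """
    gene_to_cres = defaultdict(set)
    for chrom, start, end in ocrs:
        for pir_chrom, pir_start, pir_end, gene in pirs:
            if pir_chrom != chrom:
                continue
            if overlap_bp(start, end, pir_start, pir_end) >= min_fraction * (end - start):
                gene_to_cres[gene].add((chrom, start, end))
    return dict(gene_to_cres)


def cre_counts(gene_to_cres):
    """Number of interacting cREs per gene, the quantity used to bin genes in panels D and E."""
    return {gene: len(cres) for gene, cres in gene_to_cres.items()}
```

The nested loop keeps the logic explicit; for genome-scale inputs an interval tree or a sorted sweep (as BEDTools uses) would be the practical choice.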
Supplemental Figure 6. Spatial interactions between the NEFL promoter and cREs
A. cREs interact with the NEFL promoter in a cell-type-specific manner. Significant interactions (CHiCAGO score > 5) between cREs and the NEFL promoter are indicated by black arcs in NPCs (blue peaks) and neurons (red peaks). B. Differential accessibility, between NPCs and neurons, of a cRE that interacts with the NEFL promoter. FPKM values of the cRE located 50 kbp downstream of the NEFL promoter are plotted. Boxplots indicate the median, IQR, Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.
Supplemental Figure 7. Chromatin state ratios of cREs that contact gene promoters
A. Schematic of the chromatin state ratio calculation across cREs that contact a gene promoter. All cREs interacting with a given gene, together with their chromatin states, are aggregated. The chromatin state ratio is calculated per gene and per state as the aggregated length of that state divided by the total cRE length. B. Heatmap of chromatin state ratios for differentially expressed genes in NPCs and neurons. The heatmap scale represents the ratio (high: blue; low: white).
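The per-gene chromatin state ratio described in panel A can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes all cREs of a gene lie on one chromosome, that chromatin-state calls are non-overlapping segments (as produced by segmentation tools such as ChromHMM), and the state labels in any input are hypothetical.

```python
from collections import defaultdict


def chromatin_state_ratio(cres, state_segments):
    """Fraction of a gene's total cRE length covered by each chromatin state.

    cres: [(start, end)] for all cREs interacting with one gene (same chromosome
    and half-open intervals, both assumptions); state_segments: non-overlapping
    [(start, end, state)] chromatin-state calls on that chromosome.
    """
    total_length = sum(end - start for start, end in cres)
    state_bp = defaultdict(int)
    for cre_start, cre_end in cres:
        for seg_start, seg_end, state in state_segments:
            overlap = min(cre_end, seg_end) - max(cre_start, seg_start)
            if overlap > 0:
                state_bp[state] += overlap
    # Aggregated length of each state divided by the total cRE length.
    return {state: bp / total_length for state, bp in state_bp.items()}
```

For example, two 100 bp cREs where one is fully covered by an active-enhancer state and the other is split evenly between two other states give ratios of 0.5, 0.25 and 0.25; one such vector per gene forms a row of the heatmap in panel B.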
Supplemental Figure 8. The FMNL1 promoter interacts with a cRE containing an intelligence deficiency GWAS SNP specifically in neurons.
A. Interaction map between the FMNL1 promoter and cREs in NPCs and neurons. Significant interactions (CHiCAGO score > 5) between cREs and FMNL1 are indicated by black arcs in NPCs (top) and neurons (bottom, red). Chromatin state colors are as indicated in Figure 3B. Light blue bars indicate cREs. The cRE ~900 kbp upstream of the FMNL1 promoter contains the causal variant rs76324150, a proxy (LD r2 = 1) of the intelligence deficiency GWAS SNP rs17563986. B. Raw read counts supporting the interaction between FMNL1 and the cRE containing rs76324150. The geometric mean of the raw read counts supporting fragment interactions across all replicates is plotted as dots, with significant interactions labeled in blue. The significant interaction between the FMNL1 bait and the rs76324150-containing fragment is highlighted with a red triangle. The expected level of Brownian collision background (solid line) and the upper limit of its 95% confidence interval (dashed line) are plotted. C. Difference in FMNL1 expression between NPCs and neurons. Expression levels are represented as TPM from RNA-seq across replicates. D. Accessibility of the cRE containing the causal variant rs76324150, represented as FPKM from ATAC-seq across replicates. E. Disruption of ZFX and KLF12 binding sites by rs76324150 at the cRE contacting the FMNL1 promoter. Information content matrices (ICMs) of the ZFX and KLF12 binding-site nucleotide sequences are plotted. The square highlights the disrupted position within the ZFX and KLF12 binding sites, with the reference nucleotide sequence at top and the alternative nucleotide sequence at bottom.
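The information content matrix behind a motif logo like panel E can be sketched from a position frequency matrix (the form in which JASPAR distributes its motifs). This is a hedged sketch, not the authors' procedure: it assumes a uniform nucleotide background (maximum 2 bits per position), applies no small-sample correction, and the counts in any input are hypothetical toy values, not the ZFX or KLF12 matrices.

```python
import math

BASES = "ACGT"


def information_content_matrix(pfm, pseudocount=0.0):
    """Per-position letter heights of a DNA motif logo.

    pfm: {base: [count at position 0, 1, ...]} for A, C, G, T.
    Information content uses a uniform background (2 bits maximum) with no
    small-sample correction, which is a simplifying assumption.
    """
    n_positions = len(pfm["A"])
    icm = []
    for i in range(n_positions):
        counts = [pfm[b][i] + pseudocount for b in BASES]
        total = sum(counts)
        probs = [c / total for c in counts]
        # IC_i = 2 + sum_b p_b * log2(p_b); letter height = p_b * IC_i.
        ic = 2.0 + sum(p * math.log2(p) for p in probs if p > 0)
        icm.append({b: p * ic for b, p in zip(BASES, probs)})
    return icm


def allele_heights(icm, position, ref, alt):
    """Letter heights of the reference and alternative alleles at the variant position."""
    return icm[position][ref], icm[position][alt]
```

A variant is most likely to disrupt binding when it falls at a high-information position and swaps a tall reference letter for a short alternative one, which is the contrast the reference (top) and alternative (bottom) sequences in panel E are meant to show.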
Supplemental Table 1: Differentially expressed genes between NPCs and neurons.
Supplemental Table 2: Ingenuity Pathway Enrichment for differentially expressed protein-coding genes.
Supplemental Table 3: PCR primers for cell validation
Supplemental Table 4: The accessibility of promoter OCR and expression level of their corresponding genes
Supplemental Table 5: Summary of promoter-focused Capture-C libraries pre-processed by HiCUP
Supplemental Table 6: Summary statistics of significant interactions called by CHiCAGO
Supplemental Table 7: Differential interactions between NPCs and neurons detected by Chicdiff
Supplemental Table 8: Genome-wide interactions between gene and cRE in NPCs
Supplemental Table 9: Genome-wide interactions between gene and cRE in neurons
Supplemental Table 10: Roadmap epigenomics project resource for hESC-derived neurons and NPCs
Supplemental Table 11: Experimentally validated enhancers (VISTA) overlapping cREs across different tissues.
Supplemental Table 12: BiFET transcription factor enrichment in NPCs and neurons
Supplemental Table 13: GWAS of neurodevelopmental disorders.
Supplemental Table 14: Variant to gene mapping for neurodevelopmental disorders.
Supplemental Table 15: Reactome pathway enrichment for genes implicated by variants associated with neurodevelopmental disorders
Supplemental Table 16: Variants associated with neurodevelopmental disorders that disrupt TF binding at cREs.
Data Availability Statement
Our data are available from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) under accession numbers E-MTAB-9159 (promoter-focused Capture-C), E-MTAB-9087 (ATAC-seq) and E-MTAB-9085 (RNA-seq).
