Abstract
The three-dimensional (3D) structure of chromatin is intrinsically associated with gene regulation and cell function1–3. Methods based on chromatin conformation capture have mapped chromatin structures in neuronal systems such as in vitro differentiated neurons, neurons isolated through fluorescence-activated cell sorting from cortical tissues pooled from different animals and from dissociated whole hippocampi4–6. However, changes in chromatin organization captured by imaging, such as the relocation of Bdnf away from the nuclear periphery after activation7, are invisible with such approaches8. Here we developed immunoGAM, an extension of genome architecture mapping (GAM)2,9, to map 3D chromatin topology genome-wide in specific brain cell types, without tissue disruption, from single animals. GAM is a ligation-free technology that maps genome topology by sequencing the DNA content from thin (about 220 nm) nuclear cryosections. Chromatin interactions are identified from the increased probability of co-segregation of contacting loci across a collection of nuclear slices. ImmunoGAM expands the scope of GAM to enable the selection of specific cell types using low cell numbers (approximately 1,000 cells) within a complex tissue and avoids tissue dissociation2,10. We report cell-type-specialized 3D chromatin structures at multiple genomic scales that relate to patterns of gene expression. We discover extensive ‘melting’ of long genes when they are highly expressed and/or have high chromatin accessibility. The contacts most specific to neuron subtypes contain genes associated with specialized processes, such as addiction and synaptic plasticity, and harbour putative binding sites for neuronal transcription factors within accessible chromatin regions. Moreover, sensory receptor genes are preferentially found in heterochromatic compartments in brain cells, which establish strong contacts across tens of megabases.
Our results demonstrate that highly specific chromatin conformations in brain cells are tightly related to gene regulation mechanisms and specialized functions.
Subject terms: Data integration, Gene regulation, Nuclear organization, Genetics of the nervous system, Regulatory networks
A new technique called immunoGAM, which combines genome architecture mapping (GAM) with immunoselection, enabled the discovery of specialized chromatin conformations linked to gene expression in specific cell populations from mouse brain tissues.
Main
To explore how genome folding is related to cell specialization, we applied immunoGAM to mouse brain tissue slices and analysed three cell types with diverse functions (Fig. 1a): oligodendroglia (oligodendrocytes and their precursors (OLGs)) from the somatosensory cortex; pyramidal glutamatergic neurons (PGNs) from the cornu ammonis 1 (CA1) of the dorsal hippocampus; and dopaminergic neurons (DNs) from the ventral tegmental area (VTA) of the midbrain. OLGs are important for neuronal myelination and circuit formation11, whereas PGNs are important for temporal and spatial memory formation and consolidation12, and DNs are activated during cue-guided reward-based learning13. Publicly available GAM data from mouse embryonic stem (mES) cells9 were used for comparison (Supplementary Table 1).
We selected cell types from brain tissue slices by immunofluorescence with cell marker antibodies before genomic extraction (Fig. 1b). A detailed flowchart of immunoGAM quality control (QC) measures and normalization is shown in Extended Data Fig. 1a–d and Supplementary Table 2. GAM contact matrices, each from about 850 cells, had low biases in GC content and mappability (Extended Data Fig. 2a–c). We calculated local contact densities and topological domains using the insulation square method14, and calculated compartments associated with open chromatin (compartment A) and closed chromatin (compartment B) using principal component analysis (PCA)2 (Supplementary Tables 3–5).
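The insulation-square idea can be illustrated with a short sketch. For each genomic bin, it averages the contacts in a fixed-size square linking the bins just upstream of it with the bins just downstream. Bins at domain borders, where few contacts cross, therefore receive low scores. This is a generic, hypothetical implementation with several assumptions: a small, dense, non-negative matrix given as a list of lists; log2 normalization to the mean over all scored bins; and an illustrative function name. It is not the exact normalization applied to the GAM matrices here.

```python
import math


def insulation_scores(matrix, window):
    """Insulation-square sketch for a symmetric contact matrix (list of lists).

    For each bin i, average the contacts in the window x window square that
    links the `window` bins upstream of i with the `window` bins downstream
    of i, then log2-normalize to the mean over all scored bins.
    Bins too close to the matrix edge get None.
    """
    n = len(matrix)
    raw = [None] * n
    for i in range(window, n - window):
        square = [matrix[r][c]
                  for r in range(i - window, i)
                  for c in range(i + 1, i + window + 1)]
        raw[i] = sum(square) / len(square)
    scored = [v for v in raw if v is not None and v > 0]
    mean = sum(scored) / len(scored)
    return [math.log2(v / mean) if v is not None and v > 0 else None
            for v in raw]


# Toy matrix with two domains (bins 0-9 and 10-19): strong contacts within
# a domain, weak contacts between domains.
toy = [[1.0 if (r < 10) == (c < 10) else 0.1 for c in range(20)]
       for r in range(20)]
scores = insulation_scores(toy, window=3)
boundary = min((s, i) for i, s in enumerate(scores) if s is not None)[1]
print(boundary)  # prints 9: the lowest-scoring bin sits at the domain border
```

Repeating the calculation for several window sizes (the 100–1,000 kb range, converted to bins) produces multiscale insulation profiles. Borders can then be called at local minima of these profiles.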
As an example of cell-type-specific organization, we considered the Pcdh locus, which contains three clusters of cell adhesion genes (Pcdha, Pcdhb and Pcdhg) and occupies two topologically associating domains (TADs) in mES cells, as previously described15 (Fig. 1c, see Extended Data Fig. 3a for replicates). Mapping contact densities using 100–1,000 kb insulation squares showed that the locus is generally open above 500 kb. Higher expression of Pcdha and Pcdhb coincides with increased long-range contacts between the three clusters in neurons16 and OLGs17 and with additional long-range contacts with the highly expressed Fgf1 gene in OLGs. We also discovered contacts spanning tens of megabases in brain cells. For example, strong contacts connected two regions approximately 3- and 5-Mb wide, separated by 35 Mb, which contained clusters of vomeronasal (Vmn) and olfactory (Olfr) receptor genes (Fig. 1d, see Extended Data Fig. 3b for replicates). Thus, the application of immunoGAM in specific brain cell types reveals large rearrangements in 3D chromatin architecture at short-range and long-range genomic lengths.
To further investigate how cell-type-specific 3D genome topologies relate to gene expression and chromatin accessibility, we produced or collected published single-cell RNA sequencing (scRNA-seq) data and single-cell assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data from mES cells, the cortex, the hippocampus and the midbrain (Methods, Extended Data Fig. 4, Supplementary Table 6). After selecting cell populations equivalent to those captured by immunoGAM, we compiled cell-type-specific pseudobulk RNA-seq and ATAC-seq datasets.
TADs extensively rearrange between cell types
Complex and extensive cell-type-specific changes in TAD-level contacts were frequent, for example, at a 4-Mb region that contains Scn genes that encode sodium voltage-gated channel subunits (Fig. 2a, see Extended Data Fig. 5a for replicates). We obtained a total of approximately 2,300 TADs across cell types, with a median length of about 1 Mb, which is in line with previous reports6 (Extended Data Fig. 5b). Although pairwise comparisons of TAD border positions were consistent with previously reported levels of conservation4,6 (78–89%; Extended Data Fig. 5c), multiway comparisons showed high cell-type specificity (Fig. 2b, see Extended Data Fig. 5d for sparser combinations). One-third of the borders were unique and significantly more insulated than in other cell types (Extended Data Fig. 5e), with some variability noted between biological replicates (59–65%) (Extended Data Fig. 5f). By contrast, only 8% of the total set of borders was shared by brain cells and 14% by all cell types. Shared borders showed significantly stronger insulation in brain cells than in mES cells (Extended Data Fig. 5g), which suggests structural stabilization after terminal differentiation. Unique boundaries often contained expressed genes (52–55% in brain cells, 38% in mES cells) (Extended Data Fig. 5h) and genes with enriched Gene Ontology (GO) terms relevant to the specialized cell type (Fig. 2c, Supplementary Table 7), such as ‘membrane depolarization’ and ‘cognition’ in PGNs or genes important for dopaminergic differentiation and dopamine synthesis in DNs.
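Pairwise and multiway border comparisons can give quite different impressions. A border counted as conserved between two cell types may still be absent from a third. The sketch below illustrates the bookkeeping on made-up border positions (bin indices). The one-bin matching tolerance and the separate counting of each cell type's borders are assumptions for illustration, not the criteria used in this study. The function names are hypothetical.

```python
def matched(border, others, tol):
    """True if `border` (a bin index) lies within `tol` bins of any border in `others`."""
    return any(abs(border - o) <= tol for o in others)


def pairwise_conservation(a, b, tol=1):
    """Fraction of borders in `a` with a counterpart in `b`."""
    return sum(matched(x, b, tol) for x in a) / len(a)


def multiway_classes(borders_by_type, tol=1):
    """Map each (cell type, border) to the set of cell types that share it."""
    return {
        (cell_type, x): frozenset(t for t, bs in borders_by_type.items()
                                  if matched(x, bs, tol))
        for cell_type, borders in borders_by_type.items()
        for x in borders
    }


# Hypothetical border positions (bin indices), not data from this study.
borders = {
    "mES": [10, 40, 75, 120],
    "OLG": [10, 41, 90, 120],
    "PGN": [11, 60, 120, 150],
    "DN": [10, 40, 120, 170],
}
print(pairwise_conservation(borders["mES"], borders["OLG"]))  # 0.75
classes = multiway_classes(borders)
unique = sum(len(s) == 1 for s in classes.values())
shared_by_all = sum(len(s) == len(borders) for s in classes.values())
print(unique, shared_by_all, len(classes))  # 5 8 16
```

In this toy case, mES and OLG borders look 75% conserved pairwise. Yet only half of all border entries are shared by all four cell types, and almost a third are unique to one cell type.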
Long neuronal genes melt in brain cells
Many neuronal genes involved in specialized functions are long (>300 kb) and produce many isoforms owing to complex RNA processing18. Chromatin reorganization was most apparent at long genes in both PGNs and DNs (Fig. 2d, e). For example, Grik2 loses contact density in PGNs compared to mES cells, especially around the transcription start site (TSS) and transcription end site (TES) (Fig. 2d). By contrast, Dscam decondenses across its entire gene body in DNs (Fig. 2e). To assess whether decondensation relates to the expression of long genes, we compared the insulation of the most and least expressed long genes (Extended Data Fig. 5i). Highly expressed genes were significantly less insulated at TSSs and TESs and throughout gene bodies in both DNs and PGNs, but not in OLGs or mES cells. The general contact loss at highly expressed long neuronal genes is reminiscent of the decondensation, or ‘melting’, observed by microscopy at polytene chromosome puffs19 or tandem gene arrays20.
To detect melting genome-wide in an unbiased manner, we devised the MELTRON pipeline. MELTRON calculates a ‘melting score’ from the significance of the difference between the cumulative distributions of insulation scores, taken across a range of genomic scales (100–1,000 kb), of two cell types within regions of interest, here defined as all 479 long genes (Fig. 2f). We found 120–180 melting genes with melting scores of >5 (Kolmogorov–Smirnov test, P < 1 × 10⁻⁵) between brain cells and mES cells (Fig. 2g, Supplementary Table 8). Grik2 had melting scores of 12 and 26 in PGNs (replicates 1 and 2, respectively), whereas Dscam had scores of 38 and 50 in DNs (replicates 1 and 2, respectively) and Magi2 had a score of 73 in OLGs (Extended Data Fig. 6a, b). Melting scores in the PGN and DN replicates correlated well (Extended Data Fig. 6c).
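A minimal sketch of a MELTRON-like score for a single gene follows. It pools the insulation scores of the bins inside the gene across window sizes for two cell types. It then compares their empirical cumulative distributions with a two-sample Kolmogorov–Smirnov test and reports −log10 P when the query cell type is less insulated than the reference. Defining the score as −log10 P is an assumption, chosen because it matches the stated correspondence between a score >5 and P < 1 × 10⁻⁵. The median-based direction check, the asymptotic P-value formula, the function names and the toy values are all illustrative and do not reproduce the MELTRON implementation. In practice, a library routine such as `scipy.stats.ks_2samp` could replace the hand-written test.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def ks_pvalue(d, n1, n2):
    """Asymptotic two-sided P value from the Kolmogorov distribution."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:  # P is effectively 1 here and the series converges slowly
        return 1.0
    p = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


def melting_score(reference, query):
    """Hypothetical MELTRON-like score for one gene.

    `reference` and `query` map window size (kb) -> insulation scores of the
    bins inside the gene; scores from all window sizes are pooled. Returns
    -log10(P) if the query is less insulated (lower median), otherwise 0.
    """
    ref = sorted(s for scores in reference.values() for s in scores)
    qry = sorted(s for scores in query.values() for s in scores)
    if qry[len(qry) // 2] >= ref[len(ref) // 2]:
        return 0.0  # no decondensation relative to the reference
    p = ks_pvalue(ks_statistic(ref, qry), len(ref), len(qry))
    return -math.log10(max(p, 1e-300))  # cap avoids log10(0)


# Hypothetical insulation scores (window size in kb -> scores of bins in one gene).
mes_ins = {100: [0.5 + 0.01 * i for i in range(30)],
           500: [0.6 + 0.01 * i for i in range(30)]}
brain_ins = {100: [-0.5 + 0.01 * i for i in range(30)],
             500: [-0.4 + 0.01 * i for i in range(30)]}
print(melting_score(mes_ins, brain_ins) > 5)  # True: brain profile shifted lower
print(melting_score(brain_ins, mes_ins))      # 0.0: query is not less insulated
```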
Melting genes were significantly more transcribed and showed higher chromatin accessibility than non-melting long genes, especially in PGNs and DNs (Fig. 2g, Extended Data Fig. 6d–f). Of interest, more than half of the top 3% of melting genes (24 out of 44) are sensitive to topoisomerase I inhibition in ex vivo neuronal cultures21, in contrast to 16% (42 out of 261) of genes with intermediate melting scores and 16% of non-melting genes (Extended Data Fig. 6g). This result suggests that extensive melting of long genes is associated with the resolution of topological constraints21. Melting genes often belonged to compartment A in both mES cells and the corresponding brain cell (43–58%), especially when highly transcribed in both cell types (Extended Data Fig. 6h). Genes melting in OLGs and DNs were less likely to be lamina-associated or nucleolus-associated in mES cells, whereas genes melting in PGNs did not show any preferred association (Extended Data Fig. 6i, j). Therefore, melting of long genes is not trivially associated with a transition from a heterochromatic state in mES cells to open chromatin in brain cells, although such events can occur (for example, Magi2 in OLGs or Dscam in DNs) (Supplementary Table 8).
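The text does not state which test underlies the comparison of topoisomerase-sensitive fractions. Purely as an illustration, the counts quoted above can be compared with a one-sided Fisher's exact test: 24 of 44 top melting genes versus 42 of 261 genes with intermediate scores. The test can be computed from the hypergeometric distribution using only the standard library. The choice of test and the function name are assumptions. `scipy.stats.fisher_exact` offers a library equivalent.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the hypergeometric
    null with all row and column totals held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    tail = sum(comb(col1, x) * comb(total - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(total, row1)


# Rows: top melting genes, intermediate melting genes.
# Columns: topoisomerase I-sensitive, not sensitive.
p = fisher_greater(24, 44 - 24, 42, 261 - 42)
print(f"{24 / 44:.0%} vs {42 / 261:.0%}, one-sided P = {p:.1e}")
```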
We next examined melting in more detail in the neurexin 3 (Nrxn3) and RNA binding Fox 1 homologue 1 (Rbfox1) genes, both of which are highly sensitive to topoisomerase I inhibition21. Nrxn3 encodes a membrane protein involved in synaptic connections and plasticity. In mES cells, Nrxn3 spans two TADs with high contact density, localizes in compartment B and associates with the nuclear lamina and the nucleolus. In DNs, Nrxn3 extensively melts (replicate scores of 48 and 49), is highly transcribed and accessible and belongs to compartment A (Fig. 3a, see Extended Data Fig. 7a for all cell types and replicates). Rbfox1 encodes an RNA-binding protein that regulates alternative splicing. In mES cells, Rbfox1 lies within a dense contact domain in compartment A and has very low expression and low chromatin accessibility. It also overlaps a nucleolus-associated domain and, partially, a lamina-associated domain. Rbfox1 extensively melts in PGNs (scores of 65 and 39), which coincides with its highest expression and high accessibility in these cells (Fig. 3b, Extended Data Fig. 7b).
To further understand the melting process in the Nrxn3 region, we used a polymer-physics-based approach22 to generate ensembles of 3D models in mES cells and DNs from GAM matrices (Fig. 3c, Supplementary Tables 9 and 10). 3D models were validated by reconstructing in silico GAM matrices (Extended Data Fig. 7c). mES cell models showed intermingled globular domains, including the green and red domains that contain Nrxn3 (Supplementary Video 1, see Extended Data Fig. 7d for additional examples). In DNs, the melted green domain becomes highly extended, with large radii of gyration (Fig. 3c, d, Supplementary Video 2), whereas the upstream (grey) and downstream (blue) domains condense (Fig. 3a, Extended Data Fig. 7e).
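The radius of gyration used to compare domain extension is the root-mean-square distance of a domain's beads from their centroid. A minimal sketch follows, assuming bead coordinates are available as (x, y, z) tuples in arbitrary units. The toy coordinates are made up and do not come from the Nrxn3 models. The function name is illustrative.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of 3D bead coordinates from their centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    msd = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords) / n
    return math.sqrt(msd)


# Toy domains (arbitrary units): a compact square and an extended line of beads.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]
print(radius_of_gyration(compact))   # sqrt(0.5), about 0.707
print(radius_of_gyration(extended))  # sqrt(5), about 2.236
```

For an ensemble of models, one would compute this value per model and per domain, then compare the resulting distributions between cell types.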
Next, we applied fluorescence in situ hybridization on cryosections (cryo-FISH)2,23 to visualize Rbfox1 in mES cells and PGNs (Fig. 3e, Supplementary Table 11). In mES cells, a fluorescence-labelled probe across Rbfox1 revealed circular foci (average area of 0.44 ± 0.17 μm², mean ± s.d.) often localized at the nucleolar surface (59%) or the nuclear periphery (27%; Fig. 3f, g, Extended Data Fig. 7f). In PGNs, Rbfox1 foci were decondensed and elongated, with significantly larger areas (0.59 ± 0.31 μm²; Mann–Whitney test, P < 0.01), and localized to the nuclear interior (77%). Probes specific for the TSS, the middle and the TES of Rbfox1 revealed increased separation between the TSS and the TES in PGNs compared to mES cells (Fig. 3h, i; 0.65 ± 0.41 μm and 0.37 ± 0.22 μm, respectively; Mann–Whitney test P < 0.01; Extended Data Fig. 7g).
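Comparisons such as the foci areas or TSS–TES distances above can be made with a Mann–Whitney U test. The sketch below uses average ranks for ties and a normal approximation for the two-sided P value, without tie or continuity correction. These simplifications are assumptions. For small samples, an exact test (for example, via `scipy.stats.mannwhitneyu`) would be preferable. The input values and the function name are hypothetical.

```python
import math


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Tied values get average ranks; no tie or continuity correction is applied.
    Returns (U statistic for sample x, approximate two-sided P value).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical foci areas (um^2), not measurements from this study.
mes_areas = [0.30, 0.35, 0.38, 0.41, 0.44, 0.47, 0.52, 0.58]
pgn_areas = [0.42, 0.51, 0.57, 0.63, 0.70, 0.78, 0.86, 0.95]
u, p = mann_whitney(mes_areas, pgn_areas)
print(u, round(p, 4))
```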
The extensive changes in Rbfox1 localization and condensation led us to ask whether melting is generally related to changes in intrachromosomal and interchromosomal contacts. We assessed this by comparing trans–cis contact ratios (Methods). Melted genes had significantly lower trans–cis values (that is, relatively more intrachromosomal contacts) in DNs and PGNs than in mES cells (Extended Data Fig. 8a–c), but this was not the case in OLGs or for non-melting long genes (Extended Data Fig. 8a, d). Of note, Rbfox1 had a higher trans–cis ratio in PGNs, whereas Nrxn3 had a lower trans–cis ratio in DNs (Extended Data Fig. 8e, f). Decreased trans–cis ratios of melting genes in DNs or PGNs were independent of NAD association in mES cells (Extended Data Fig. 8g), whereas non-melting genes with low trans–cis values were generally associated with NADs in mES cells (Extended Data Fig. 8h).
Together, polymer modelling from GAM data and single-cell imaging highlight that domain melting is a previously unappreciated topological feature of very long genes. Domain melting occurs when genes are highly expressed, or highly accessible, in brain cell types, and the process is robustly captured by immunoGAM (Fig. 3j). The decondensation of long genes in brain cells relative to mES cells often coincides with extensive reorganization of their chromosomal contacts, preferentially alongside increased intrachromosomal contacts.
Differential hubs of expressed genes
To explore how extensive chromatin rearrangements relate to changes in cis-regulatory elements and expressed genes, we extracted the top (5%) most differential contacts between PGNs and DNs within 5 Mb (ref. 9) (Fig. 4a, a detailed pipeline is provided in Extended Data Fig. 9a). We searched for binding motifs in accessible regions, which typically cover about 1.3 kb of the 50-kb contacting windows (Extended Data Fig. 9b), from differentially expressed transcription factors (TFs) that covered >5% of differential contacts (16 DN-specific and 32 PGN-specific TFs; Extended Data Fig. 9c, d, Supplementary Table 12). Out of 1,275 possible combinations of TF motif pairs, we prioritized 19 pairs (combinations of 14 TF motifs) that were most enriched in contacts of a given cell type or with a high ability to distinguish cell types (information gain; a full pipeline and criteria are provided in Extended Data Fig. 9e, f, and see Supplementary Table 13 for all TF pairs).
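Two calculations in this paragraph can be made concrete. First, 1,275 matches the number of unordered motif pairs, homodimers included, that can be formed from 50 motifs: 50 homodimers plus C(50, 2) = 1,225 heterodimers. Second, information gain measures how much knowing whether a contact carries a given motif pair reduces the uncertainty (Shannon entropy) about which cell type the contact belongs to. The sketch below assumes a binary presence/absence feature and one cell-type label per contact. The toy inputs and function names are hypothetical, and this is the generic definition rather than the exact criterion of the pipeline.

```python
import math
from math import comb


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(has_pair, labels):
    """Entropy reduction in cell-type labels from splitting contacts by a
    binary feature (motif pair present or absent)."""
    classes = sorted(set(labels))

    def class_counts(idx):
        return [sum(labels[i] == c for i in idx) for c in classes]

    n = len(labels)
    with_pair = [i for i in range(n) if has_pair[i]]
    without = [i for i in range(n) if not has_pair[i]]
    conditional = sum(len(part) / n * entropy(class_counts(part))
                      for part in (with_pair, without) if part)
    return entropy(class_counts(range(n))) - conditional


n_pairs = 50 + comb(50, 2)  # homodimers + heterodimers
print(n_pairs)  # 1275

labels = ["PGN", "PGN", "PGN", "DN", "DN", "DN"]
print(information_gain([True, True, True, False, False, False], labels))  # 1.0
print(information_gain([True, False, True, False, True, False], labels))
```

A pair that perfectly separates the two cell types yields 1 bit. A pair spread evenly across both yields a value near zero.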
We searched for differential contacts containing the most common TF-pair combinations (Fig. 4b, a full list is shown in Extended Data Fig. 9g). In PGNs, homodimers and heterodimers for Neurod1 and/or Neurod2 putative binding sites characterized the most abundant contacts, together with Egr1, Etv5, Lhx2, Maz, Nr3c1, Pou3f2 and Ubp1 (Neurod group; 5,572 contacts). In DNs, contacts containing Neurod1 and Neurod2 appeared as heterodimers (660 contacts). The most frequent TF-motif pair in DNs, and the second most frequent in PGNs, was a Ctcf homodimer (892 and 781 contacts, respectively). The next most abundant DN-specific contacts contained Foxa1 combined with Ctcf, Nr2f1 or Nr4a1 (Foxa1–TF group; 1,612 contacts). All groups spanned 0.05–5 Mb and captured strong contacts (Extended Data Fig. 10a, b). The selected differential contacts rarely coincided with two TAD borders (Extended Data Fig. 10c) and often involved compartment A windows (Extended Data Fig. 10d). Networks of differential contacts, built on the basis of motif co-occurrence using all 50 differentially expressed TFs, confirmed connectivity between multiple TF motifs in PGNs, and between Foxa1 or Neurod and specific TFs in DNs (Extended Data Fig. 10e, f, Supplementary Table 14).
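A motif co-occurrence network of this kind can be sketched by treating each motif as a node. Each edge is weighted by the number of differential contacts that carry one motif in one window and the other motif in the partner window. A motif found in both windows gives a homodimer self-edge. The contacts below are hypothetical. Counting each pair at most once per contact is an assumption about how such a network might be built, not a description of the authors' procedure. The function name is illustrative.

```python
from collections import Counter


def motif_network(contacts):
    """Build weighted motif-pair edges from contacts.

    `contacts` is a list of (motifs_in_window_1, motifs_in_window_2) set pairs.
    Each unordered motif pair is counted at most once per contact.
    """
    edges = Counter()
    for left, right in contacts:
        edges.update({tuple(sorted((a, b))) for a in left for b in right})
    return edges


# Hypothetical differential contacts annotated with accessible-site motifs.
contacts = [
    ({"Neurod1", "Egr1"}, {"Neurod2"}),
    ({"Ctcf"}, {"Ctcf", "Foxa1"}),
    ({"Neurod2"}, {"Neurod1"}),
]
for pair, weight in sorted(motif_network(contacts).items()):
    print(pair, weight)
```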
Many contacts in each TF-motif group contained expressed genes in both contacting windows (30–45% in DNs, 40–50% in PGNs), a significantly higher fraction than among genome-wide contacts or the top 5% of contacts (10–16%; Fig. 4c, Extended Data Fig. 10g). Many of these genes were differentially expressed between PGNs and DNs (1,490 and 975, respectively, out of 3,537 differentially expressed genes; Extended Data Fig. 10h). In PGN-specific contacts, both the Neurod and Ctcf–Ctcf groups contained PGN upregulated genes with GO terms related to synaptic plasticity (Fig. 4d). Two PGN upregulated genes, Dlg4 (which is important for long-term potentiation24) and Shisa6 (which prevents desensitization of AMPA receptors during plasticity25), were present within a hub of Neurod contacts that contained other activity-related genes, including Map2k4 and Dnah9 (see Extended Data Fig. 10i for the differential contact matrix). DN upregulated genes found with the Foxa1–TF (139 out of 1,844), the Neurod–TF (87) or the Ctcf–Ctcf (80) pair are involved in synaptic organization and addiction pathways (Fig. 4e). For example, Dnm3 has altered protein expression in an alcohol-dependence paradigm26 and makes contacts containing the Foxa1–TF pair with Mrps14 (downregulated after nicotine exposure27), Cacybp (upregulated following alcohol exposure28) and Pou2f1 (a co-factor associated with alcohol dependence29) (see Extended Data Fig. 10j for the differential contact matrix). Of note, Egr1, an immediate early gene upregulated in activated neurons30, establishes PGN-specific contacts containing accessible regions covered by Egr1 and Neurod motifs (Fig. 4f, see Extended Data Fig. 10k for replicate data). Egr1 was highly upregulated in PGNs (log2(fold-change) = 3, PGNs compared to DNs) and gained contacts with its adjacent TAD. It also contained accessible chromatin peaks rich in TF motifs belonging to the Neurod group that are not seen in DNs.
Binding of EGR1 protein to its own promoter is confirmed in published chromatin immunoprecipitation with sequencing (ChIP-seq) data from the cortex31.
Together, our strategy identifies hubs of chromatin contacts specific for different neuron types that contain putative binding sites for differentially expressed TFs (Fig. 4g). These interconnected hubs bring together distal genes with specialized neuronal functions, such as synaptic plasticity in PGNs or drug addiction in DNs.
Extensive A/B compartment reorganization
Last, we found broad changes in A/B compartmentalization between all cell types (Extended Data Fig. 11a, b), with the lowest Pearson’s correlations of compartment eigenvector values between brain cells and mES cells and the highest correlations between neuronal replicates (Extended Data Fig. 11c). Only 12% of genomic windows changed from compartment B in mES cells to compartment A in brain cells, and 7% changed from compartment A in mES cells to compartment B in brain cells (see Extended Data Fig. 11d, e for per-chromosome transitions). All cell types had similar mean and total genomic lengths occupied contiguously by A or B compartments (Extended Data Fig. 11f). B-to-A transitions from mES cells to brain cells contained 335 genes more strongly expressed in brain cells than in mES cells (Extended Data Fig. 12a). Their enriched GO terms included ‘behaviour’ and ‘gated ion channel activity’ (Fig. 5a). A-to-B transitions from mES cells to brain cells contained mostly silent genes in all cell types (572 out of 715 genes), except 50 transcriptional regulation genes highly expressed in mES cells (Fig. 5a, Extended Data Fig. 12b).
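The eigenvector comparisons described above reduce to two simple operations. The first is a Pearson correlation between the per-window compartment eigenvector values of two cell types. The second is a tally of A/B calls per window. The sketch assumes the eigenvectors are already computed for the same windows and oriented so that positive values mark compartment A (for example, by correlating them with gene density). Treating zero as B, the function names and the toy values are illustrative assumptions.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compartment(value):
    """Assumes oriented eigenvectors: positive = A, zero or negative = B."""
    return "A" if value > 0 else "B"


def transitions(eig_from, eig_to):
    """Count (from, to) compartment calls per window, e.g. mES -> brain cell."""
    return Counter((compartment(a), compartment(b)) for a, b in zip(eig_from, eig_to))


# Hypothetical eigenvector values for five windows, not data from this study.
mes_eig = [0.5, -0.2, 0.3, -0.4, -0.1]
brain_eig = [0.4, 0.1, 0.2, -0.5, -0.3]
counts = transitions(mes_eig, brain_eig)
frac_b_to_a = counts[("B", "A")] / len(mes_eig)
print(dict(counts), frac_b_to_a, round(pearson(mes_eig, brain_eig), 3))
```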
We found that A-to-B transitions were enriched for sensory receptor genes such as Vmn (149 genes out of 572 silent genes in the group) and Olfr (179 genes), and these were often found in clusters32,33 (Fig. 5b). Although silent, only 35% of Vmn and 66% of Olfr genes belonged to compartment B in mES cells compared with 82–96% and 72–85%, respectively, in brain cells (Extended Data Fig. 12c). Vmn and Olfr genes were often involved in strong clusters of contacts in brain cells that spanned up to 50 Mb (Fig. 5c, additional examples in Fig. 1d, Extended Data Fig. 12d, e). Long-range contacts in brain cells were significantly stronger when B compartments contained Vmn or, to a lesser extent, Olfr genes (at distances >3 Mb) (Extended Data Fig. 12f). This result suggests that sensory genes are not only more likely to belong to heterochromatic B compartments but also to more strongly contact other B compartments in brain cells.
Discussion
Here we introduced immunoGAM to capture genome-wide chromatin conformation states of specialized cell populations in the mouse brain. We discovered extensive reorganization of chromatin topology across genomic scales, including cell-type-specific TAD reorganization that involves genes relevant to brain cell specialization (Extended Data Fig. 12g).
We reported melting of long genes (>300 kb) with highest expression levels and/or accessible chromatin in brain cells. Single-cell imaging of Rbfox1 in PGNs showed that the most prominent decondensation occurred between TSSs and TESs. Many long genes have specialized regulation in brain cells, for example, by topoisomerase activity21 or DNA methylation34, by long stretches of H3K27ac or H3K4me1 acting as enhancer-like domains35 or by large transcription loops36. Their regulation is further complicated by intricate RNA processing dynamics18, which are required for adaptive responses based on activation state. Many of the highlighted genes, including Nrxn3, Rbfox1, Grik2 and Dscam, have genetic variants associated with or directly causal of neuronal diseases37–40. Thus, understanding how gene melting relates to regulation will become important to understanding the mechanisms of neurological disease.
Cell-type-specific networks of contacts were enriched for putative binding sites of differentially expressed TFs and connected hubs of differentially expressed genes with specialized functions24,25,30, which is reminiscent of transcription factories41. DN-specific loops contained genes related to drug-exposure response and addiction paradigms. Midbrain VTA DNs are the first brain cells that respond to addictive substances, including amphetamines, nicotine and cocaine42,43. Future studies can explore the relationship between DN-specific chromatin landscapes and the regulation of these critical genes, with potential implications for the onset of addiction. PGN-specific contacts connected hubs of synaptic plasticity genes. Of note, PGN-specific contacts at the Egr1 gene, which is involved in the activation of long-term potentiation, contained Egr1 binding motifs, which suggests that there may be self-activation mechanisms. Together with reports that de novo chromatin looping can accompany transcriptional activation5, our work suggests that coordinated TF binding at distant locations in the linear genome, but in close contact due to the 3D chromatin landscape, may be critical for the induction of long-term potentiation.
Our results also highlighted the specialization of repressive long-range contacts in brain cells. Repressed Olfr genes form a large interchromosomal hub in mature olfactory sensory neurons to regulate specificity of single Olfr gene activation44. We showed that sensory genes also form strong cis-contacts in brain cells not directly involved in sensory processes, a result confirmed in adult cortical neurons45. Tight 3D compartmentalization of Vmn and Olfr genes may be important for their repression in brain cells, as Olfr genes can be stochastically activated and mis-expressed in neurodegenerative diseases46.
Finally, we showed that immunoGAM requires low cell numbers (approximately 1,000 cells) from single individuals while retaining the spatial organization of cells within brain tissues. This highlights its potential to provide insights into the aetiology and progression of neurological disease. Collectively, our work showed that cell specialization in the brain and chromatin structure are intimately linked at multiple genomic scales.
Methods
Randomization, blinding, and sample size
Randomization and blinding were not relevant for the current study. The experiments and the subsequent analyses were performed on wild-type animals or cell lines, for which no clinical trial, treatment or disease comparison was performed. Samples were processed in different laboratories by different people, and there were no selection criteria for the wild-type mice used in the study. The appropriate number of samples for a GAM dataset varies and depends on multiple parameters such as nuclear volume, level of chromatin compaction and quality of DNA extraction. Because most of these parameters can be assessed only after the data have been collected and processed, we recommend that the optimal resolution be defined during the collection of each GAM dataset, rather than trying to estimate the optimal sample size before data collection. GAM data can be collected in multiple batches from the same starting material; therefore, the sample size can be increased until the desired resolution is achieved. For scRNA-seq experiments in mES cells, no statistical method was used to predetermine sample size. Libraries were generated twice, from mES cells from different biological replicates, to account for experimental variability. For scATAC-seq experiments, no statistical method was used to predetermine sample size.
Animal maintenance
Collection of GAM data from DNs was performed using one C57Bl/6NCrl (RRID: IMSR_CR:027; WT) mouse, which was purchased from Charles River, and one tyrosine hydroxylase–green fluorescent protein (TH–GFP; B6.Cg-Tg(TH-GFP)21-31/C57B6) mouse, obtained as previously described50,51. All procedures involving WT and TH–GFP animals were approved by the Imperial College London’s Animal Welfare and Ethics Review Body. Adult male mice aged 2–3 months were used. All mice had access to food and water ad libitum and were kept on a 12-h light/12-h dark cycle at 20–23 °C and 45 ± 5% humidity. WT and TH–GFP mice received an intraperitoneal injection of saline 14 days or 24 h, respectively, before tissue collection, and they were part of a larger experiment for a different study. Collection of single-nucleus ATAC-seq (snATAC-seq) data from the midbrain VTA was performed using male C57Bl/6NCrl (RRID: IMSR_CR:027; WT) mice, aged 7 and 9 weeks, which were a gift from M. Gotthardt. Mice for snATAC-seq were housed in a temperature-controlled room at 22 ± 2 °C with humidity of 55 ± 10% in individually ventilated cages with 12-h light/12-h dark cycles and with access to food and water ad libitum. All experiments involving snATAC-seq animals were carried out following institutional guidelines as approved by LaGeSo Berlin and following the Directive 2010/63/EU of the European Parliament on the protection of animals used for scientific purposes. Organ preparation was done under license X9014/11.
Collection of GAM data from somatosensory oligodendrocyte cells was performed using Sox10::cre-RCE::loxP-EGFP animals52, which were obtained by crossing Sox10::cre animals53 on a C57BL/6j genetic background with RCE::loxP-EGFP animals54 on a C57BL/6×CD1 mixed genetic background, both available from The Jackson Laboratory. The cre allele was maintained in hemizygosity, whereas the reporter allele was maintained in hemizygosity or homozygosity. Experimental procedures for Sox10::cre-RCE::loxP-EGFP animals were performed following the European directive 2010/63/EU, local Swedish directive L150/SJVFS/2019:9, Saknr L150 and Karolinska Institutet complementary guidelines for the procurement and use of laboratory animals, Dnr 1937/03-640. The procedures described were approved by the local committee for ethical experiments on laboratory animals in Sweden (Stockholms Norra Djurförsöksetiska nämnd), licence number 130/15. One male mouse was killed at post-natal day 21 (P21). Mice were housed to a maximum number of 5 per cage in individually ventilated cages with the following light/dark cycle: dawn 6:00–7:00, daylight 7:00–18:00, dusk 18:00–19:00, night 19:00–6:00. All mice had access to food and water ad libitum and were housed at 22 °C and 50% humidity.
Collection of GAM data from hippocampal CA1 PGNs was performed using two 19-week-old male Satb2flox/flox mice. C57Bl/6NCrl (RRID: IMSR_CR:027; WT) mice were purchased from Charles River. Satb2flox/flox mice, which carry a loxP-flanked exon 4, have been previously described55. The experimental procedures were done according to the Austrian Animal Experimentation Ethics Board (Bundesministerium für Wissenschaft und Verkehr, Kommission für Tierversuchsangelegenheiten). All mice had access to food and water ad libitum and were kept on a 12-h light/12-h dark cycle at 22.5 °C and 55 ± 10% humidity.
Tissue fixation and preparation
WT, TH–GFP and Satb2flox/flox mice were anaesthetised under isoflurane (4%), given a lethal intraperitoneal injection of pentobarbital (0.08 μl, 100 mg ml–1 Euthatal) and transcardially perfused with 50 ml ice-cold PBS followed by 50–100 ml 4% depolymerized paraformaldehyde (PFA; electron microscopy grade, methanol-free) in 250 mM HEPES–NaOH (pH 7.4–7.6). Sox10::cre-RCE::loxP-EGFP animals were killed using an intraperitoneal injection of ketaminol and xylazine followed by transcardial perfusion with 20 ml PBS and 20 ml 4% PFA in 250 mM HEPES (pH 7.4–7.6). Brains from WT or TH–GFP mice were removed, and the tissue containing the VTA was dissected from each hemisphere at room temperature and rapidly transferred to fixative. For Satb2flox/flox mice, the CA1 field of the hippocampus was dissected from each hemisphere at room temperature. For Sox10cre/RCE mice, brain tissue containing the somatosensory cortex was dissected at room temperature. Following dissection, tissue blocks were placed in 4% PFA in 250 mM HEPES–NaOH (pH 7.4–7.6) for post-fixation at 4 °C for 1 h. Brains were then placed in 8% PFA in 250 mM HEPES and incubated at 4 °C for 2–3 h. Tissue blocks were then placed in 1% PFA in 250 mM HEPES and kept at 4 °C until tissue was prepared for cryopreservation (up to 5 days, with daily solution changes).
Cryoblock preparation and cryosectioning
Fixed tissue samples from different brain regions were further dissected to produce about 1.5 × 3 mm tissue samples suitable for Tokuyasu cryosectioning2 (Extended Data Fig. 1a) at room temperature in 1% PFA in 250 mM HEPES. For the hippocampus, the dorsal CA1 region was further isolated. Approximately 1–3 × 1–3 mm blocks were dissected from all brain regions and were further incubated in 4% PFA in 250 mM HEPES at 4 °C for 1 h. The fixed tissue was transferred to 2.1 M sucrose in PBS and embedded for 16–24 h at 4 °C, before being positioned at the top of copper stub holders suitable for ultracryomicrotomy and frozen in liquid nitrogen. Cryopreserved tissue samples are kept indefinitely immersed under liquid nitrogen.
Frozen tissue blocks were cryosectioned with an Ultracryomicrotome (Leica Biosystems, EM UC7), with an approximate 220–230 nm thickness2. Cryosections were captured in drops of 2.1 M sucrose in PBS solution suspended in a copper wire loop and transferred to 10-mm glass coverslips for confocal imaging or onto a 4.0-µm polyethylene naphthalate (PEN; Leica Microsystems, 11600289) membrane on metal framed slides for laser microdissection.
Immunofluorescence detection of GAM samples for confocal microscopy
For confocal imaging, cryosections were incubated with sheep anti-TH (1:500; Pel Freez Arkansas, P60101-0), mouse anti-pan-histone H11-4 (1:500; Merck, MAB3422) or chicken anti-GFP (1:500; Abcam, ab13970), followed by donkey anti-sheep or goat anti-chicken IgG conjugated with Alexa Fluor-488 (for TH and GFP; Abcam) or donkey anti-mouse IgG conjugated with Alexa Fluor-555 or Alexa Fluor-488 (for pan-histone; Invitrogen).
For PGNs, cryosections were washed (3 times, 30 min in total) in PBS, permeabilized (5 min) in 0.3% Triton X-100 in PBS (v/v) and incubated (2 h, room temperature) in blocking solution (1% BSA (w/v), 5% fetal bovine serum (FBS; w/v; Gibco, 10270) and 0.05% Triton X-100 (v/v) in PBS). After incubation (overnight, 4 °C) with primary antibody in blocking solution, the cryosections were washed (3–5 times, 30 min) in 0.025% Triton X-100 in PBS (v/v) and immunolabelled (1 h, room temperature) with secondary antibodies in blocking solution, followed by 3 washes (15 min) in PBS. Cryosections were then counterstained (5 min) with 0.5 µg ml–1 4′,6-diamidino-2-phenylindole (DAPI; Sigma-Aldrich, D9542) in PBS and rinsed in PBS and water. Coverslips were mounted in Mowiol 4-88 solution in 5% glycerol, 0.1 M Tris-HCl (pH 8.5).
The number of SATB2-positive cells in the hippocampal CA1 area of Satb2flox/flox control mice was determined by counting nuclei positive for SATB2 immunostaining (1:100; Abcam, ab10563678). To avoid counting the same nuclei twice, only every 30th ultrathin section cut through the tissue was collected, and the remaining sections were discarded. In each image, 25 nuclei in the pyramidal neuron layer were identified in the DAPI channel, and the SATB2-positive nuclei among them were counted. We confirmed that most cells (96%) within the CA1 layer were PGNs (data not shown).
For DNs and OLGs, cryosections were washed (3 times, 30 min in total) in PBS, quenched (20 min) in PBS containing 20 mM glycine, then permeabilized (15 min) in 0.1% Triton X-100 in PBS (v/v). Cryosections were then incubated (1 h, room temperature) in blocking solution (1% BSA (w/v), 0.2% fish-skin gelatin (w/v), 0.05% casein (w/v) and 0.05% Tween-20 (v/v) in PBS). After incubation (overnight, 4 °C) with the antibody in blocking solution, the cryosections were washed (3–5 times, 1 h) in blocking solution and immunolabelled (1 h, room temperature) with secondary antibodies in blocking solution, followed by 3 washes (15 min) in 0.5% Tween-20 in PBS (v/v). Cryosections were then counterstained with 0.5 µg ml–1 DAPI in PBS, then rinsed in PBS. Coverslips were mounted in Mowiol 4-88.
Digital images were acquired with a Leica TCS SP8-STED confocal microscope (Leica Microsystems) using a ×63 oil-immersion objective (numerical aperture 1.4) or a ×2 oil-immersion objective, with the pinhole set to 1 Airy unit. Images were acquired at 1,024 × 1,024 pixel resolution using 405-nm excitation and 420–480-nm emission for DAPI, 488-nm excitation and 505–530-nm emission for TH or GFP, and 555-nm excitation with a 560-nm long-pass emission filter for Alexa Fluor-555. Images were processed using Fiji (v.2.0.0-rc-69/1.52p); adjustments included optimization of the dynamic signal range by contrast stretching.
Immunofluorescence detection of GAM samples for laser microdissection
For laser microdissection, cryosections on PEN membranes were washed, permeabilized and blocked as for confocal microscopy, and incubated with primary and secondary antibodies as indicated above, except that primary antibodies were used at higher concentrations: anti-TH (1:50), anti-pan-histone (1:50) or anti-GFP (1:50). Secondary antibodies were used at the same concentration. Staining was visualized with a Leica laser microdissection microscope (Leica Microsystems, LMD7000) using a ×3 dry objective. After identifying cellular sections from the cell type of interest that contained nuclear slices (nuclear profiles (NPs)), individual NPs were laser microdissected from the PEN membrane and collected into PCR adhesive caps (AdhesiveStrip 8C opaque, Carl Zeiss, 415190-9161-000). We used multiplex-GAM9, in which three NPs are collected into each adhesive cap; the presence of NPs in each lid was confirmed with a ×5 objective using a 420–480-nm emission filter. Control lids without NPs (water controls) were included in each dataset collection to monitor contamination and noise amplification during whole-genome amplification (WGA) and library preparation; they are listed in Supplementary Table 2.
WGA of NPs
WGA was performed using an in-house protocol. In brief, NPs were lysed directly in the PCR adhesive caps for 4 h (or 24 h for 160 of the 585 GAM samples from DN replicate 1) at 60 °C in 1.2× lysis buffer (30 mM Tris-HCl pH 8.0, 2 mM EDTA pH 8.0, 800 mM guanidinium-HCl, 5% (v/v) Tween 20, 0.5% (v/v) Triton X-100) containing 2.116 units ml–1 Qiagen protease (Qiagen, 19155). After protease inactivation at 75 °C for 30 min, the extracted DNA was amplified using random hexamer primers with an adaptor sequence. The pre-amplification step was done in a programmable thermal cycler for 11 cycles using 2× DeepVent mix (2× Thermo polymerase buffer (10×), 400 µM dNTPs, 4 mM MgSO4 in ultrapure water), 0.5 µM GAT-7N primers (5′-GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNN N) and 2 units µl–1 DeepVent (exo-) DNA polymerase (New England Biolabs, M0259L). Primers that anneal to the common adaptor sequence were then used in a second, exponential amplification reaction to increase the amount of product. The exponential amplification was done in a programmable thermal cycler for 26 cycles using 2× DeepVent mix, 10 mM dNTPs, 100 µM GAM-COM primers (5′-GTGAGTGATGGTTGAGGTAGTGTGGAG) and 2 units µl–1 DeepVent (exo-) DNA polymerase. For a small number of NPs from DNs (Supplementary Table 2), WGA was performed using a WGA4 kit (Sigma-Aldrich) according to the manufacturer’s instructions; the current formulation of this kit is no longer suitable for GAM data production from subcellular nuclear slices.
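The relationship between the two amplification primers can be checked with a few lines of Python (an illustration, not part of the protocol): GAT-7N consists of the GAM-COM adaptor sequence followed by seven degenerate bases (N), so every product primed in the first step carries the adaptor and can then be amplified with GAM-COM alone.

```python
# Primer sequences as given in the text (spaces removed).
GAT_7N = "GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNNN"
GAM_COM = "GTGAGTGATGGTTGAGGTAGTGTGGAG"


def split_primer(primer: str, adaptor: str):
    """Split a random-priming primer into its adaptor and its degenerate 3' tail."""
    if not primer.startswith(adaptor):
        raise ValueError("primer does not start with the adaptor sequence")
    return adaptor, primer[len(adaptor):]


adaptor, tail = split_primer(GAT_7N, GAM_COM)
n_random = tail.count("N")
# Each N can be any of the four bases, so the tail covers 4**7 possible 7-mers.
n_random_kmers = 4 ** n_random
```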
GAM library preparation and high-throughput sequencing
Following WGA, samples were purified using SPRI beads (0.725 or 1.7 ratio of beads to sample volume). The DNA concentration of each purified sample was measured using a Quant-iT PicoGreen dsDNA assay kit (Invitrogen, P7589) according to the manufacturer’s instructions. GAM libraries were prepared using an Illumina Nextera XT library preparation kit (Illumina, FC-131-1096) following the manufacturer’s instructions, with reagent volumes reduced by 80%. Following library preparation, DNA was purified using SPRI beads (1.7 ratio of beads to sample volume) and the concentration of each sample was measured using a Quant-iT PicoGreen dsDNA assay. Equal amounts of DNA from each sample were pooled (up to 196 samples per pool), and the final pool was purified three more times using SPRI beads (1.7 ratio of beads to sample volume). The final pool of libraries was analysed by DNA High Sensitivity on-chip electrophoresis on an Agilent 2100 Bioanalyzer to confirm the removal of primer dimers and to estimate the average size and size distribution of DNA fragments in the pool. NGS libraries were sequenced on an Illumina NextSeq 500 according to the manufacturer’s instructions using single-end 75-bp reads. The number of sequenced reads for each sample can be found in Supplementary Table 2.
Tn5-based libraries are preferred for GAM sequencing because they increase sequence diversity among library fragments, which avoids the need for dark cycles on current Illumina machines. This choice greatly reduces sequencing costs and decreases the frequency of noise reads from absent windows seen with the previous protocol3.
GAM data sequence alignment
Sequenced reads from each GAM library were mapped to the mouse genome assembly GRCm38 (December 2011, mm10) with Bowtie2 (v.2.3.4.3) using default settings56. All non-uniquely mapped reads, reads with mapping quality <20 and PCR duplicates were excluded from further analyses.
GAM data window calling and sample QC
Positive genomic windows present within ultrathin nuclear slices were identified for each GAM library. In brief, the genome was split into equal-sized windows (50 kb), and the number of nucleotides sequenced in each bin was calculated for each GAM sample with bedtools57. Next, we determined the percentage of orphan windows (that is, positive windows that were flanked by two adjacent negative windows) for every percentile of the nucleotide coverage distribution and we identified the percentile with the lowest percentage of orphan windows for each GAM sample in the dataset. The number of nucleotides that corresponds to the percentile with the lowest percentage of orphan windows in each sample was used as an optimal coverage threshold for window identification in each sample. Windows were called positive if the number of nucleotides sequenced in each bin was greater than the determined optimal threshold.
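The threshold search described above can be sketched in Python with NumPy. This is an illustrative reimplementation under stated assumptions, not the published pipeline: coverage is given per 50-kb window for a single chromosome, percentiles are taken over windows with non-zero coverage, windows at the chromosome ends are treated as having a negative outer neighbour, and ties are resolved by keeping the lowest percentile.

```python
import numpy as np


def orphan_fraction(positive: np.ndarray) -> float:
    """Fraction of positive windows whose two neighbouring windows are both negative."""
    # Assumption: each chromosome end is padded with a negative neighbour.
    padded = np.concatenate(([False], positive, [False]))
    left, centre, right = padded[:-2], padded[1:-1], padded[2:]
    n_pos = int(centre.sum())
    if n_pos == 0:
        return float("inf")  # no positive windows: worst possible score
    orphans = centre & ~left & ~right
    return float(orphans.sum()) / n_pos


def call_positive_windows(coverage):
    """Return (positive-window mask, threshold) for one sample on one chromosome.

    coverage: number of nucleotides sequenced in each 50-kb window.
    """
    coverage = np.asarray(coverage, dtype=float)
    nonzero = coverage[coverage > 0]
    best_thr, best_orphan = None, float("inf")
    if nonzero.size:
        for p in range(100):  # every percentile of the coverage distribution
            thr = float(np.percentile(nonzero, p))
            frac = orphan_fraction(coverage > thr)
            if frac < best_orphan:  # strict '<' keeps the lowest tied percentile
                best_thr, best_orphan = thr, frac
    if best_thr is None:
        return np.zeros(coverage.shape, dtype=bool), None
    # Windows are positive if their coverage exceeds the optimal threshold.
    return coverage > best_thr, best_thr
```

In this toy setting, isolated low-coverage windows (the kind that produce orphans) are discarded once the threshold rises above them, while contiguous high-coverage runs survive.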
Each dataset was assessed for quality by determining, for every sample, the percentage of orphan windows, the number of reads uniquely mapped to the mouse genome and the correlations arising from cross-well contamination (Supplementary Table 2). A sample was considered to be of good quality if it had <70% orphan windows, >50,000 uniquely mapped reads and a cross-well contamination score (Jaccard index, determined per collection plate) of <0.4. Most GAM libraries passed QC (86–96% in each dataset; Extended Data Fig. 1b, c), and the number of samples in each cell type that passed QC is summarized in Extended Data Fig. 2a. To assess the quality of sampling in each GAM dataset, we measured the frequency with which all possible intrachromosomal pairs of genomic windows are found in the same GAM sample; 98.8–99.9% of all mappable pairs of windows were sampled at least once at 50-kb resolution at all genomic distances. Following QC analysis, we noted that the 160 (of 585) DN replicate 1 samples incubated in lysis buffer for 24 h had fewer orphan windows (median 26% versus 36% for 24 h and 4 h, respectively) and higher total genome coverage (median 9% versus 6% for 24 h and 4 h, respectively). Although these differences were minor, we recommend 24-h lysis for future work.
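The QC criteria above translate into simple filters. In the sketch below, the field names, the per-plate contamination score (taken here, as an assumption, to be each sample's highest Jaccard index with any other sample on the same plate) and the single-chromosome detection matrix are illustrative; only the numerical thresholds come from the text.

```python
from dataclasses import dataclass
from itertools import combinations

import numpy as np


@dataclass
class GamSampleQC:
    name: str
    orphan_pct: float     # % of positive windows that are orphans
    unique_reads: int     # uniquely mapped reads
    contamination: float  # cross-well contamination score (Jaccard index)


def passes_qc(s: GamSampleQC) -> bool:
    """Apply the thresholds stated in the text."""
    return s.orphan_pct < 70 and s.unique_reads > 50_000 and s.contamination < 0.4


def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard index between the positive windows of two samples."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def max_plate_jaccard(seg: np.ndarray) -> np.ndarray:
    """Hypothetical contamination score: each sample's highest Jaccard index
    with any other sample on the same plate (seg: samples x windows, bool)."""
    scores = np.zeros(seg.shape[0])
    for i, j in combinations(range(seg.shape[0]), 2):
        jac = jaccard(seg[i], seg[j])
        scores[i] = max(scores[i], jac)
        scores[j] = max(scores[j], jac)
    return scores


def pair_sampling_fraction(seg: np.ndarray) -> float:
    """Fraction of window pairs (i < j) detected together in at least one sample."""
    seg_int = seg.astype(int)
    co = seg_int.T @ seg_int  # co-detection counts for every window pair
    iu = np.triu_indices(seg.shape[1], k=1)
    return float((co[iu] > 0).mean())
```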
Publicly available GAM datasets from mES cells
For mES cells, GAM datasets were downloaded from the 4D Nucleome portal (https://data.4dnucleome.org/). We used 249 GAM samples, each containing three NPs (249 × 3 NP), from mES cells (clone 46C), which were grown at 37 °C in a 5% CO2 incubator in Glasgow modified Eagle’s medium (MEM) supplemented with 10% FBS, 2 ng ml–1 leukaemia inhibitory factor (LIF) and 1 mM 2-mercaptoethanol, on 0.1% gelatin-coated dishes. Cells were passaged every other day. After the last passage, 24 h before collection, mES cells were re-plated in serum-free ESGRO Complete Clonal Grade medium (Merck, SF001-B). The list of 4DN sample identity numbers is provided in Supplementary Table 1.
Visualization of pairwise chromatin contact matrices
To visualize GAM data, contact matrices were calculated using pointwise mutual information (PMI) for all pairs of windows genome-wide. PMI compares the probability that a pair of genomic windows is found in the same NP (their joint probability) with the probability expected if the two windows segregated independently (the product of their individual probabilities across all NPs). PMI was calculated using the following formula, where p(x) and p(y) are the individual probabilities of detecting genomic windows x and y, respectively, and p(x,y) is their joint probability:
$$\mathrm{PMI}(x;y) = \log \frac{p(x,y)}{p(x)\,p(y)} \qquad (1)$$
PMI can be normalized to lie between −1 and 1, giving the normalized PMI (NPMI):
$$\mathrm{NPMI}(x;y) = \frac{\mathrm{PMI}(x;y)}{-\log p(x,y)} \qquad (2)$$
For visualization of the contact matrices, the colour scale for each displayed genomic region was set to range from 0 to the 99th percentile of NPMI values for each cell type.
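PMI, log[p(x,y)/(p(x)p(y))], and its normalized form NPMI, PMI/(−log p(x,y)), can be computed for all window pairs at once from a binary NP × window detection matrix. The NumPy sketch below is an illustration, not the authors' code; the handling of edge cases (pairs never or always co-detected set to −1 or 1, undetected windows set to NaN) is an assumption. The log base cancels in NPMI.

```python
import numpy as np


def npmi_matrix(seg) -> np.ndarray:
    """NPMI between all pairs of windows from an NP x window detection matrix (1 = detected)."""
    seg = np.asarray(seg, dtype=float)
    n_np = seg.shape[0]
    p_x = seg.mean(axis=0)        # p(x): fraction of NPs containing window x
    p_xy = (seg.T @ seg) / n_np   # p(x, y): fraction of NPs containing both windows
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_xy / np.outer(p_x, p_x))
        npmi = pmi / -np.log(p_xy)
    npmi[p_xy == 0] = -1.0        # never co-detected: lower bound
    npmi[p_xy == 1] = 1.0         # co-detected in every NP: upper bound (assumption)
    undetected = p_x == 0         # windows never detected carry no information
    npmi[undetected, :] = np.nan
    npmi[:, undetected] = np.nan
    return npmi


def colour_limits(npmi: np.ndarray):
    """Colour scale from 0 to the 99th percentile of NPMI values."""
    return 0.0, float(np.nanpercentile(npmi, 99))
```

Perfectly co-segregating windows reach NPMI = 1, independent windows give 0 and windows never found together give −1.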
Insulation score and topological domain boundary calling
TAD calling was performed by calculating insulation scores in NPMI GAM contact matrices at 50-kb resolution, as previously described2,9. The insulation square method was chosen because the domain borders it detects in GAM data were previously shown to also be found in Hi-C data, where they are the most robust (most insulated) borders2,9. The insulation score was computed individually for each cell type and biological replicate, with insulation square sizes ranging from 100 to 1,000 kb. TAD boundaries were called as local minima of the insulation score computed with a 500-kb insulation square. This approach does not detect meta-TADs or sub-TADs, and it yields numbers and lengths of domains similar to those in previous reports6,58. Future work with higher-resolution GAM datasets will enable further analyses of domain reorganization at finer genomic scales, including changes in sub-TADs, which have been shown to occur following cell commitment to neuronal lineages59.
Within each dataset, boundaries that touched or overlapped by at least one nucleotide were merged. Boundaries were further refined to include only the window with the minimum insulation score within the boundary and one window on each side, producing a 3-bin ‘minimum insulation score’ boundary. When comparing boundaries between datasets, 150-kb boundaries were considered different when separated by at least one 50-kb genomic bin, that is, when their centres were at least 200 kb apart; chromosome Y was excluded from this analysis. In Fig. 2b, we considered the boundary coordinate to be the genomic window within a boundary with the lowest insulation score. TAD border coordinates for all cell types can be found in Supplementary Table 3, and the full range of insulation scores (100–1,000 kb) for all cell types can be found in Supplementary Table 4. UpSet plots for TAD border overlaps, compartments and TF motif analyses were generated using custom Python or R scripts or the UpSetR package (v.1.4.0)60.
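A minimal NumPy sketch of insulation-based boundary calling, assuming a square contact matrix for one chromosome at 50-kb resolution (a 500-kb insulation square then corresponds to w = 10 bins). The square placement (bins i−w+1..i against bins i+1..i+w), the use of strict local minima and the bin-based merging rule are assumptions for illustration; published insulation implementations differ in these details and often also normalize the score, which does not move the minima when the transform is monotone.

```python
import numpy as np


def insulation_scores(mat: np.ndarray, w: int) -> np.ndarray:
    """Mean contact value in a w x w square sliding along the diagonal.

    Bin i is scored with rows i-w+1..i and columns i+1..i+w; bins too close
    to the matrix edges are left as NaN.
    """
    n = mat.shape[0]
    scores = np.full(n, np.nan)
    for i in range(w - 1, n - w):
        scores[i] = np.nanmean(mat[i - w + 1:i + 1, i + 1:i + w + 1])
    return scores


def boundary_minima(scores):
    """3-bin boundaries (inclusive bin indices) centred on strict local minima."""
    bounds = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if np.isnan([left, mid, right]).any():
            continue
        if mid < left and mid < right:
            bounds.append((i - 1, i + 1))
    return bounds


def merge_boundaries(bounds):
    """Merge boundaries that overlap or touch (adjacent bins)."""
    merged = []
    for start, end in sorted(bounds):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With two idealized 10-bin domains, the insulation score drops to its minimum at the last bin of the first domain, so the 3-bin boundary spans the domain junction.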
Identification of compartments A and B
For compartment analysis, matrices of co-segregation frequency were determined using the ratio of independent occurrence of a single positive window in each sample over the pairwise co-occurrence of pairs of positive windows in a given pair of genomic windows2. GAM co-segregation matrices at 250-kb resolution were assigned to either A or B compartments, as previously described2. In brief, each chromosome was represented as a matrix of observed interactions O(i,j) between locus i and locus j (co-segregation), and as a matrix of expected interactions E(i,j), in which each entry is the mean number of contacts between all pairs of loci separated by the same genomic distance as i and j. A matrix of observed over expected values O/E(i,j) was produced by dividing O by E. A correlation matrix C(i,j) was then computed between column i and column j of the O/E matrix. PCA was performed for the first three components on matrix C, and the component with the best correlation to GC content was extracted. Loci whose eigenvector values had the sign that correlated best with GC content were called A compartments, whereas regions with the opposite sign were called B compartments. For visualizations and Pearson’s correlations between datasets, eigenvector values in compartment A were normalized per chromosome to range from 0 to 1, whereas values in compartment B were normalized per chromosome to range from −1 to 0. Compartments were considered common if they had the same compartment assignment in the same genomic bin. Compartment changes between cell types were computed using compartments common to both biological replicates, unless otherwise indicated.
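The compartment-calling steps above can be sketched for a single chromosome as follows. Two simplifications are assumptions of this sketch rather than details from the text: the expected value at each distance is the mean of the matrix diagonal at that offset, and the PCA is taken as the eigendecomposition of the symmetric correlation matrix C (a common shortcut). Of the three leading eigenvectors, the one best correlated with GC content is kept and oriented so that positive values mark compartment A.

```python
import numpy as np


def observed_over_expected(obs: np.ndarray) -> np.ndarray:
    """Divide each entry by the mean of all entries at the same genomic distance."""
    n = obs.shape[0]
    oe = np.zeros((n, n))
    for d in range(n):
        i = np.arange(n - d)
        diag = obs[i, i + d]
        expected = diag.mean()
        if expected > 0:
            oe[i, i + d] = diag / expected
            oe[i + d, i] = diag / expected
    return oe


def call_compartments(obs: np.ndarray, gc: np.ndarray, n_components: int = 3):
    """Return A/B labels and the GC-oriented eigenvector for one chromosome."""
    corr = np.nan_to_num(np.corrcoef(observed_over_expected(obs)))
    eigvals, eigvecs = np.linalg.eigh(corr)          # corr is symmetric
    top = np.argsort(eigvals)[::-1][:n_components]   # leading components
    best, best_r = None, 0.0
    for k in top:
        r = np.corrcoef(eigvecs[:, k], gc)[0, 1]
        if abs(r) > abs(best_r):
            best, best_r = eigvecs[:, k], r
    pc = best if best_r > 0 else -best               # orient: positive tracks GC
    return np.where(pc > 0, "A", "B"), pc
```

For the per-chromosome rescaling used in visualizations, positive and negative eigenvector values would then be divided by their respective maximum absolute values.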
To identify and visualize gene expression differences among genes in changing compartments, k-means clustering was performed on triplicate pseudo-replicates of each cell type using a custom Python script (Extended Data Fig. 12a, b). The number of clusters was determined using the elbow method, giving k = 6 for genes in compartment B in mES cells and compartment A in brain cells, and k = 5 for genes in compartment A in mES cells and compartment B in brain cells.
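The elbow method can be illustrated with a small NumPy k-means (Lloyd's algorithm with k-means++ seeding). The authors' custom script is not described further, so the seeding, the number of restarts and the rule used here to locate the elbow (the point of the normalized inertia curve farthest from the chord joining its ends) are all assumptions; in practice, scikit-learn's KMeans would typically replace the hand-written clustering.

```python
import numpy as np


def _sq_dists(x, centres):
    """Squared Euclidean distance from every point to every centre."""
    return ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)


def kmeans_inertia(x, k, n_init=10, n_iter=100, seed=0):
    """Lowest within-cluster sum of squares over n_init k-means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: new centres are drawn with probability ~ squared distance.
        centres = x[[rng.integers(len(x))]]
        while len(centres) < k:
            d2 = _sq_dists(x, centres).min(axis=1)
            centres = np.vstack([centres, x[rng.choice(len(x), p=d2 / d2.sum())]])
        for _ in range(n_iter):  # Lloyd iterations
            labels = _sq_dists(x, centres).argmin(axis=1)
            new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                            for c in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        best = min(best, _sq_dists(x, centres).min(axis=1).sum())
    return best


def elbow_k(x, k_max=6):
    """k at the point of the normalized inertia curve farthest from its end-to-end chord."""
    ks = np.arange(1, k_max + 1)
    inertia = np.array([kmeans_inertia(x, k) for k in ks])
    xs = (ks - ks[0]) / (ks[-1] - ks[0])
    ys = (inertia - inertia[-1]) / (inertia[0] - inertia[-1])
    # The chord runs from (0, 1) to (1, 0); distance to it is proportional to |x + y - 1|.
    return int(ks[np.argmax(np.abs(xs + ys - 1))])
```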
mES cell culture for scRNA-seq and scATAC-seq
mES cells from the 46C clone, derived from E14tg2a and expressing GFP under the Sox1 promoter61, were a gift from D. Henrique (Instituto de Medicina Molecular, Faculdade Medicina Lisboa, Lisbon, Portugal). mES cells were cultured as previously described62. In brief, cells were routinely grown at 37 °C, 5% (v/v) CO2, on gelatine-coated (0.1% v/v) Nunc T25 flasks in Gibco Glasgow’s MEM (Invitrogen, 21710082), supplemented with 10% (v/v) fetal calf serum (BioScience LifeSciences, 7.01, batch number 110006) for scRNA-seq or Gibco FBS (Invitrogen, 10270-106, batch number 41F8126K) for ATAC-seq, 2,000 units ml–1 LIF (Millipore, ESG1107), 0.1 mM β-mercaptoethanol (Invitrogen, 31350-010), 2 mM l-glutamine (Invitrogen, 25030-024), 1 mM sodium pyruvate (Invitrogen, 11360070), 1% penicillin–streptomycin (Invitrogen, 15140122) and 1% MEM non-essential amino acids (Invitrogen, 11140035). The medium was changed every day and cells were split every other day. mES cell batches tested negative for Mycoplasma infection; testing was performed according to the manufacturer’s instructions (AppliChem, A3744,0020). Before collecting material for scRNA-seq or ATAC-seq, cells were grown for 48 h in serum-free ESGRO Complete Clonal Grade medium (Merck, SF001-B) supplemented with 1,000 units ml–1 LIF, on gelatine-coated (Sigma, G1393-100 ml, 0.1% v/v) Nunc 10-cm dishes, with a change of medium after 24 h.
46C E14tg2 mES cells are not listed in the ICLAC Register of Misidentified Cell Lines. The 46C E14tg2 mES cell line was generated by inserting an eGFP cassette under the control of the Sox1 promoter in E14tg2 cells, and reads aligned to the GFP sequence were identified in the GAM sequencing data from mES cells. In addition, genome sequencing data from GAM mES cell samples were mined for single nucleotide polymorphisms (SNPs). Although GAM sequencing reads are sparsely distributed across the genome, there was a 64% overlap between GAM mES cell SNPs and SNPs identified from the parental E14tg2 genome sequencing data (https://www.ncbi.nlm.nih.gov/sra?term=SRX389523; data not shown).
Single-cell mRNA library preparation
Two batches (denoted A and B) of single-cell mRNA-seq libraries were prepared according to the Fluidigm manual “Using the C1 Single-Cell Auto Prep System to Generate mRNA from Single Cells and Libraries for Sequencing”. The cell suspension was loaded on 10–17 μm C1 Single-Cell Auto Prep IFCs (Fluidigm, 100-5760, kit 100-6201). After loading, the chip was inspected under the microscope to score cells as singlets, doublets, multiplets, debris or other. The chip was then loaded again on the Fluidigm C1 system, and cDNA was synthesized and pre-amplified on the chip using a Clontech SMARTer kit (Takara Clontech, 634833). In batch B, we included Spike-In Mix 1 (1:1,000; Life Technologies, 4456740) as per the Fluidigm manual. Illumina sequencing libraries were prepared using a Nextera XT kit (Illumina, FC-131-1096) and a Nextera Index kit (Illumina, FC-131-1002), as previously described63. Libraries from each microfluidic chip (96 cells) were pooled and sequenced on 4 lanes of an Illumina HiSeq 2000 with 2 × 100-bp paired-end reads (batch A) or on 1 lane of an Illumina HiSeq 2000 with 2 × 125-bp paired-end reads (batch B) at the Wellcome Trust Sanger Institute Sequencing Facility (Supplementary Table 15).
scRNA-seq data processing, mapping and expression estimates
To calculate expression estimates, mRNA-seq reads were mapped with STAR (spliced transcripts alignment to a reference, v.2.4.2a)64 and processed with RSEM using the ‘single-cell-prior’ option (RNA-seq by expectation-maximization, v.1.2.25)65. The references provided to STAR and RSEM were the GTF annotation from UCSC Known Genes (mm10, v.6) and the associated isoform–gene relationship information from the Known Isoforms table (UCSC), adding information for ERCC sequences in samples from batch B. Tables were downloaded from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables) and for ERCCs, from the ThermoFisher website (http://www.thermofisher.com/order/catalog/product/4456739). Gene-level expression estimates in ‘Expected Counts’ from RSEM were used for the analysis.
scRNA-seq data processing QC
Cells scored as doublets, multiplets or debris during visual inspection of the C1 chip were excluded from the analysis. Datasets were also excluded if any of the following conditions were met: <500,000 reads (calculated using sam-stats from ea-utils.1.1.2-537)66; <60% of reads mapped (sam-stats); <50% of reads mapped to mRNA (picard-tools-2.5.0, http://broadinstitute.github.io/picard/); >15% of reads mapped to chrM (sam-stats); or, where spike-ins were present, >20% of reads mapped to ERCCs (sam-stats). After processing, 98 single cells passed quality thresholds in the final dataset. To assess the quality of the single-cell data, the scRNA-seq mES cell transcriptomes were correlated with previously published bulk mRNA-seq data from mES cells (clone 46C)62, as previously described67. Average single-cell expression was highly correlated with bulk RNA-seq data (Extended Data Fig. 4c).
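The exclusion criteria above can be expressed as a single filter. The dictionary keys are hypothetical names for the per-cell metrics produced by the visual chip scoring, sam-stats and Picard; the thresholds are those stated in the text, with percentages written as fractions between 0 and 1.

```python
def passes_scrna_qc(cell: dict) -> bool:
    """Return True if a C1 cell passes the scRNA-seq QC thresholds.

    Keys (hypothetical): chip_call, total_reads, frac_mapped, frac_mrna,
    frac_chrM and frac_ercc (None when no ERCC spike-ins were added).
    """
    if cell["chip_call"] in {"doublet", "multiplet", "debris"}:
        return False
    checks = [
        cell["total_reads"] >= 500_000,  # exclude <500,000 reads
        cell["frac_mapped"] >= 0.60,     # exclude <60% of reads mapped
        cell["frac_mrna"] >= 0.50,       # exclude <50% of reads mapped to mRNA
        cell["frac_chrM"] <= 0.15,       # exclude >15% of reads on chrM
    ]
    if cell.get("frac_ercc") is not None:  # only batch B carries spike-ins
        checks.append(cell["frac_ercc"] <= 0.20)
    return all(checks)
```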
scRNA-seq analysis
To use published single-cell transcriptomes from the brain cell types of interest, we selected P21–22 OLGs68, P22–32 CA1 PGNs69 and P21–26 VTA DNs70 on the basis of the cell type and subtype definitions provided in the respective publications. The count matrices provided in each publication were combined with the mES cell single-cell transcriptomes that passed QC, without prior batch correction because equivalent cell types were not present across all single-cell datasets. The combined count matrix was normalized using the LogNormalize method and scaled using Seurat (v.3.1.4)71. The scaled data were used for PCA, followed by dimensionality reduction using uniform manifold approximation and projection (UMAP)72 for visualization, using the Seurat R package71 with default parameters. Visualization of known cell-type-specific marker genes confirmed that the transcriptomes grouped into cell-type-specific clusters (Extended Data Fig. 4e). Single mES cell transcriptomes from batches A and B clustered together and were pooled for further analyses. Genes that could not be mapped to the chosen reference GTF were removed (UCSC; accessed from iGenomes July 17, 2015; https://support.illumina.com/sequencing/sequencing_software/igenome.html).
To generate bigwig tracks for visualization, raw fastq files from all single cells of the same cell type were pooled into one fastq file. Reads were mapped to the mouse genome (mm10) using STAR with default parameters except --outFilterMultimapNmax 10. BAM files were sorted and indexed using Samtools (v.1.3.1)73, and normalized (reads per kilobase of transcript per million mapped reads (RPKM)) bigwigs were generated using the bamCoverage tool from Deeptools (v.3.1.3)74. To account for differences in the number of technical replicates among OLG samples, cells were divided into groups by number of sequencing runs (1, 2 and 6). The median number of reads in the group with the lowest sequencing depth was used as a threshold to normalize the other groups; that is, the fastq files of the other groups were randomly downsampled to that number of reads. The three groups of raw reads were then pooled and processed using the same method as for the other cell types. Pseudobulk expression was determined using the regularized log (R-log) value of each gene (Extended Data Fig. 4f, g). In each cell type, only genes with R-log values ≥2.5 in all pseudobulk replicates were considered expressed.
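The OLG depth normalization and the expression call can be sketched in plain Python. Assumptions of this sketch: each cell's reads are held in memory as a list, the group with the lowest sequencing depth is the one with the lowest median read count, and cells already below the target depth are kept unchanged. In practice, a tool such as seqtk would subsample the fastq files directly.

```python
import random
from statistics import median


def downsample_to_lowest_group(groups, seed=0):
    """Downsample cells of every other group to the median depth of the shallowest group.

    groups: dict mapping a group name (e.g. number of sequencing runs) to a list
    of cells, each cell being a list of reads. Returns (target depth, new groups).
    """
    rng = random.Random(seed)
    medians = {name: median(len(reads) for reads in cells) for name, cells in groups.items()}
    lowest = min(medians, key=medians.get)
    target = int(medians[lowest])
    out = {}
    for name, cells in groups.items():
        if name == lowest:
            out[name] = [list(reads) for reads in cells]  # reference group left untouched
        else:
            out[name] = [rng.sample(reads, target) if len(reads) > target else list(reads)
                         for reads in cells]
    return target, out


def expressed_genes(rlog, threshold=2.5):
    """Genes whose R-log value is >= threshold in every pseudobulk replicate."""
    return {gene for gene, values in rlog.items() if all(v >= threshold for v in values)}
```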
Differential gene expression analysis
For differential expression analysis, pseudobulk replicates were obtained for each cell type by randomly partitioning its single cells into three groups and summing all unique molecular identifiers (UMIs) per gene across the cells in each group. To identify differentially expressed genes, all six possible pairwise comparisons between the four cell types were performed using DESeq2 (v.1.24.0) with default parameters75. Shrunken log2 fold-changes were additionally computed with the lfcShrink function using default parameters. Genes classified as differentially expressed (adjusted P value < 0.05; Benjamini–Hochberg multiple testing correction) in at least one comparison were considered for further analysis. A summary table of the differential expression analysis for all cell types can be found in Supplementary Table 12. For the TF motif analysis, only the differentially expressed genes from the comparison between DNs and PGNs were considered (Extended Data Fig. 9c, d).
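Pseudobulk construction can be sketched with NumPy as below; the function name is hypothetical, and the differential testing itself was done in R with DESeq2 and is not reproduced here. With four cell types, itertools.combinations yields the six pairwise comparisons.

```python
from itertools import combinations

import numpy as np


def pseudobulk_replicates(counts: np.ndarray, n_rep: int = 3, seed: int = 0) -> np.ndarray:
    """Randomly split cells (rows) into n_rep groups and sum UMI counts per gene.

    counts: cells x genes UMI matrix. Returns an n_rep x genes matrix.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(counts.shape[0])  # random assignment of cells
    return np.stack([counts[idx].sum(axis=0) for idx in np.array_split(order, n_rep)])


cell_types = ["mES", "OLG", "PGN", "DN"]
comparisons = list(combinations(cell_types, 2))  # the six pairwise comparisons
```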
Tn5 purification
The pTXB1 plasmid carrying the Tn5-intein-CBD fusion construct with the hyperactive Tn5 protein containing the E54K and L372P mutations was obtained from Addgene (plasmid 60240). Tn5 expression and purification was performed as previously described76, except that the final storage buffer was 50 mM HEPES-KOH pH 7.2, 0.8 M NaCl, 0.1 mM EDTA, 1 mM dithiothreitol and 55% glycerol.
Tn5 adapter mix preparation
To generate 100 μM adapter mix, 200 μM Tn5MErev (5′-[phos]CTGTCTCTTATACACATC) was mixed with 1 volume of 200 μM Tn5ME-A (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG; Adapter_mixA, 1:1 ratio). Separately, 200 μM Tn5MErev was mixed with 1 volume of 200 μM Tn5ME-B (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG; Adapter_mixB, 1:1 ratio). The two mixtures were incubated for 5 min at 95 °C and gradually cooled to 25 °C at a ramp rate of 0.1 °C s–1. Finally, Adapter_mixA was mixed with Adapter_mixB at a 1:1 ratio to give a final 100 μM adapter mix.
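A short Python check of the adapter design and the dilution arithmetic (illustrative; it assumes complete annealing). The reverse complement of Tn5MErev matches the last 18 nt of the mosaic-end sequence shared by Tn5ME-A and Tn5ME-B. Mixing equal volumes halves each concentration, so each annealed mix contains 100 μM duplex, and combining the A and B mixes 1:1 gives 50 μM of each duplex, or 100 μM adapter in total.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


TN5_ME_REV = "CTGTCTCTTATACACATC"  # 5'-phosphorylated in the protocol
TN5_ME_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
TN5_ME_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# The strand that Tn5MErev anneals to: present at the 3' end of both ME-A and ME-B.
me = revcomp(TN5_ME_REV)


def after_mixing(concs_1: dict, concs_2: dict, v1: float = 1.0, v2: float = 1.0) -> dict:
    """Concentration (µM) of each oligo after mixing two solutions by volume."""
    total = v1 + v2
    names = set(concs_1) | set(concs_2)
    return {n: (concs_1.get(n, 0.0) * v1 + concs_2.get(n, 0.0) * v2) / total for n in names}


mix_a = after_mixing({"ME_rev": 200.0}, {"ME_A": 200.0})  # 100 µM of each strand
mix_b = after_mixing({"ME_rev": 200.0}, {"ME_B": 200.0})
final = after_mixing(mix_a, mix_b)  # 50 µM A duplex + 50 µM B duplex
```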
mES cell ATAC-seq library preparation
ATAC-seq libraries were generated from approximately 75,000 mES cell nuclei following the Omni ATAC protocol77 with a modified transposition reaction: TAPS-DMF buffer (50 mM TAPS-NaOH, pH 8.5, 25 mM MgCl2, 50% DMF), 0.1% Tween-20 and 0.1% digitonin in 0.25× PBS. A total of 3 μl of the Tn5 mix (5.6 μg Tn5 and 0.143 volume of 100 μM adapter mix) was added to the transposition reaction mix. Libraries were prepared as described in the Omni ATAC protocol. The final library was sequenced on an Illumina NextSeq 500 according to the manufacturer’s instructions, using paired-end 75-bp reads (150 cycles).
Isolation of the VTA for snATAC-seq
Male C57Bl/6Nl (RRID: IMSR_CR:027; WT) mice, aged 7 and 9 weeks, were killed by cervical dislocation. Brains were removed and the tissue containing the midbrain VTA was dissected from each hemisphere at room temperature and rapidly frozen on dry ice. Frozen midbrain samples were kept at −80 °C until further processing.
DN snATAC-seq library preparation
Two 10X Genomics scATAC-seq libraries from the midbrain VTA, VTA-1 and VTA-2 (from mice aged 7 and 9 weeks, respectively), were generated according to the 10X Genomics manual “Nuclei Isolation from Mouse Brain Tissue for Single Cell ATAC Sequencing Rev B” for flash-frozen tissue, with minor adjustments. In brief, 500 μl 0.1× lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.01% Tween-20, 0.01% Nonidet P40 substitute, 0.001% digitonin and 1× cOmplete Mini, EDTA-free protease inhibitor cocktail, Millipore-Sigma, 11836170001) was added to the frozen samples, which were immediately homogenized with a pellet pestle (15 strokes) and incubated for 5 min on ice. The lysate was mixed by pipetting 10 times and then incubated for 10 min on ice. Finally, 500 μl chilled wash buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 2 mM MgCl2, 1% BSA, 0.1% Tween-20) was added to the lysed cells, and the suspension was passed through a 30-μm CellTrics strainer (Th Geyer, 7648779). The resulting approximately 500 μl nuclei suspension was stained with DAPI (final concentration 0.03 μg ml–1) for about 5 min.
Around 200,000 DAPI-positive events were sorted using a BD FACSAria III flow cytometer with a 70-µm nozzle, with sample and sort collection device cooling set to 4 °C, into 300 μl Diluted Nuclei buffer (commercial buffer from 10X Genomics) in a 1.5-ml Eppendorf tube. A first gate excluded debris in a forward scatter/side scatter plot (see examples in Extended Data Fig. 4h, i). A second, consecutive gate in a DAPI-A/DAPI-H plot was used to exclude doublets and nuclei with incomplete DNA content (BD FACSDiva software, v.8.0.2). The collected nuclei were centrifuged at 500g for 5 min at 4 °C and resuspended in 20 µl Diluted Nuclei buffer. The nucleus concentration was determined using a Countess II FL Automated Cell Counter in DAPI fluorescence mode. snATAC-seq libraries were prepared according to the Chromium Next GEM Single Cell ATAC Reagent Kits v.1.1 User Guide. In brief, nuclei were incubated with a transposition mix containing the transposase enzyme and then loaded on a microfluidic chip together with barcoded gel beads and oil to create an emulsion, in which the transposed DNA fragments are barcoded. The barcoded DNA is recovered from the emulsion, cleaned up by bead-based purification and used for library construction by sample index PCR. Libraries were sequenced on either an Illumina NextSeq 500 using paired-end 75-bp reads (VTA-1, 150 cycles) or a NovaSeq 6000 using paired-end 75-bp reads (VTA-2, 100 cycles).
ATAC-seq data processing, mapping, processing and QC
For bulk mES cell ATAC-seq, paired-end reads were mapped to the mouse genome (mm10) using Bowtie2 with the following parameters: --minins 25 --maxins 2000 --no-discordant --dovetail --soft-clipped-unmapped-tlen. Low-quality mapped reads (mapping quality < 30) and mitochondrial reads were removed. Duplicate reads were removed with Sambamba78 (v.0.6.8). Reads passing quality checks were converted to BAM format for further analyses.
For VTA snATAC-seq, paired-end reads were demultiplexed and mapped to the mouse genome (mm10) using the 10X Genomics Cell Ranger ATAC software (cellranger-atac-1.2.0). The two VTA snATAC-seq libraries were analysed using ArchR (v.0.9.1)79. Doublets were removed using default parameters in ArchR. Low-quality cells (TSS enrichment score < 4 or < 2,500 unique fragments per cell) were then removed before further analyses.
Next, dimensionality reduction was performed using the latent semantic indexing (LSI) method in ArchR with default parameters (except iterations = 10, resolution = 0.2, varFeatures = 60,000). The ArchR addHarmony function was used to run the Harmony algorithm for batch correction with default parameters, followed by cluster calling. Gene scores were determined as specified by ArchR79. DNs were identified as the cluster with the highest gene scores for Th, a well-known DN marker, and confirmed using additional DN markers (for example, Lmx1b, Foxa2, Foxa1 and Slc6a3). The DN cluster comprises 216 cells in total (113 from VTA-1 and 103 from VTA-2). UMI duplicates were collapsed to one fragment. To visualize an approximation of gene expression, gene scores were calculated using the createArrowFiles function (addGeneScoreMat = TRUE) in ArchR.
Processing of published OLG and PGN scATAC-seq
scATAC-seq BAM files for OLGs were downloaded from the sciATAC-seq in vivo atlas of the mouse brain80. Next, reads were extracted from the BAM file that corresponded to cells from the cluster identified as oligodendrocytes from the prefrontal cortex (458 cells), to produce a pseudobulk ATAC BAM file. The original data, mapped to the mm9 genome, were converted to mm10 using the liftOver tool from UCSC utilities (https://genome.ucsc.edu/cgi-bin/hgLiftOver).
scATAC-seq datasets were obtained from hippocampal PGNs81. A BAM file containing all cell types was supplied by A. Adey (Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA). Reads were extracted from the BAM file that corresponded to the NR1 PGN population (270 cells) to produce a pseudobulk ATAC BAM file.
Generation of normalized ATAC-seq bigwig tracks
A size factor normalization was applied to generate ATAC-seq bigwig tracks that are comparable between mES cells, OLGs, PGNs and DNs. First, a count matrix was generated for all TSS regions (±250 bp) that contained reads in at least two of the four cell types. The TSS list was extracted from the genes.gtf file included in the Cell Ranger reference data (refdata-cellranger-atac-mm10-1.2.0; https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/advanced/references). To calculate size factors, the TSS count matrix was processed with the DESeqDataSetFromMatrix and estimateSizeFactors functions from the DESeq2 package75. For each cell type, the scale factor (SF) was then defined as the reciprocal of its size factor: SF = 1/(cell type size factor).
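The size factor step follows DESeq2's median-of-ratios estimator. Below is a minimal pure-Python sketch of that estimator and of the scale factor, reading SF as the reciprocal of each size factor. The region names and counts are hypothetical, and the sketch is not the DESeq2 code itself.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (the estimator behind DESeq2's estimateSizeFactors).

    counts: dict mapping region -> list of counts, one per sample (same sample order).
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per region; regions with a zero in any sample are skipped,
    # because their geometric mean is zero.
    log_geo_means = {
        region: sum(math.log(c) for c in row) / n_samples
        for region, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[r][j]) - lg for r, lg in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

def scale_factors(counts):
    """Value passed to -scale: the reciprocal of each sample's size factor."""
    return [1.0 / sf for sf in size_factors(counts)]

# Hypothetical TSS counts for two samples, the second sequenced twice as deeply.
tss_counts = {"tss_a": [10, 20], "tss_b": [5, 10], "tss_c": [8, 16], "tss_d": [0, 3]}
print(scale_factors(tss_counts))
```

Because the deeper sample has the larger size factor, it receives the smaller scale factor, which brings both tracks to a comparable depth.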
Each pseudobulk ATAC-seq BAM file from mES cells, PGNs and OLGs was converted to the bedGraph format using the genomeCoverageBed function from bedtools57 with the parameters -pc -bg -scale SF. For DNs, ATAC-seq fragment files were converted to the bedGraph format using genomeCoverageBed with the parameters -g chrom.sizes -bg -scale SF. The mm10 chrom.sizes file was downloaded from UCSC using fetchChromSizes from the UCSC utilities (http://hgdownload.soe.ucsc.edu/admin/exe/). The bedGraph files were then converted to bigwig format using the bedGraphToBigWig utility from UCSC.
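Conceptually, genomeCoverageBed with -bg and -scale reports runs of constant coverage multiplied by the scale factor and omits zero-coverage positions. The pure-Python sketch below reproduces that idea for fragment intervals (0-based, half-open, as in BED and fragment files). It is an illustration only, not a substitute for bedtools, and the input fragments are hypothetical.

```python
from collections import defaultdict

def fragments_to_bedgraph(fragments, scale):
    """Scaled bedGraph rows (chrom, start, end, value) from (chrom, start, end) fragments."""
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in fragments:
        deltas[chrom][start] += 1   # coverage rises at the fragment start
        deltas[chrom][end] -= 1     # and falls at the (exclusive) end
    rows = []
    for chrom in sorted(deltas):
        depth, prev = 0, None
        for pos in sorted(deltas[chrom]):
            if prev is not None and depth > 0 and pos > prev:
                value = depth * scale
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == value:
                    rows[-1] = (chrom, last[1], pos, value)  # merge equal-valued neighbours
                else:
                    rows.append((chrom, prev, pos, value))
            depth += deltas[chrom][pos]
            prev = pos
    return rows

frags = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 15, 20)]
for row in fragments_to_bedgraph(frags, scale=0.5):
    print(*row, sep="\t")
```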
DN and PGN ATAC-seq peak calling
ATAC-seq peaks were called in DNs following the iterative overlap peak merging procedure described in the ArchR package79. First, two pseudobulk replicates were generated by running the addGroupCoverages function and then reproducible peaks were called using the addReproduciblePeakSet function. For PGNs, peaks for the NR1 cluster were obtained from Sinnamon et al.81. For further analyses, peaks were considered positive if they were found in at least 10% of single nuclei (>10 nuclei in DNs; >13 cells in PGNs).
Length-scaled RNA and ATAC reads per million
To calculate length-scaled RNA reads per million (lsRRPM) for 479 long genes (>300 kb), the mES cell BAM file (paired-end) was read using the readGAlignmentPairs function from the GenomicAlignments package in R (v.1.20.1; https://bioconductor.org/packages/release/bioc/html/GenomicAlignments.html). For published single-cell datasets (OLGs, PGNs and DNs; single-end libraries), BAM files were loaded using the readGAlignments function from the same package. Because some reads are very long, all BAM fragments were resized to their 5′ end base pair to avoid overlapping multiple features. Next, the following formula was used to compute lsRRPM values for each cell type and per gene:
To calculate length-scaled ATAC reads per million (lsARPM) for 479 long genes (>300 kb), concordant paired-end fragments were extracted for all cell types using the readGAlignmentPairs function from the GenomicAlignments package in R, with the following total numbers of fragments: 37,261,746 (mES cells), 2,121,258 (OLGs), 4,594,229 (PGNs) and 8,939,526 (DNs). Next, the following formula was used to compute lsARPM values for each cell type and per gene:
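The exact lsRRPM and lsARPM formulas are not given in this text. The sketch below is offered only as a hedged illustration of the ingredients described above. It resizes each read to its 5′ base before counting it against gene intervals, then applies a per-length, per-million scaling. The scaling assumes an RPKM-style form (reads per kilobase of gene per million total reads), which may differ from the published formulas. For lsARPM, whole concordant fragments rather than 5′ ends were counted. The gene interval and reads are hypothetical.

```python
def five_prime_position(start, end, strand):
    """0-based position of a read's 5' base; the read spans [start, end)."""
    return start if strand == "+" else end - 1

def count_reads_in_genes(reads, genes):
    """reads: (chrom, start, end, strand); genes: name -> (chrom, start, end)."""
    counts = {name: 0 for name in genes}
    for chrom, start, end, strand in reads:
        pos = five_prime_position(start, end, strand)  # one base per read, so no multi-gene overlaps
        for name, (g_chrom, g_start, g_end) in genes.items():
            if chrom == g_chrom and g_start <= pos < g_end:
                counts[name] += 1
    return counts

def length_scaled_per_million(counts, genes, total_reads):
    """ASSUMED RPKM-style scaling: reads per kb of gene per million total reads."""
    per_million = total_reads / 1e6
    return {
        name: counts[name] / ((g_end - g_start) / 1e3) / per_million
        for name, (_, g_start, g_end) in genes.items()
    }

toy_genes = {"toy_gene": ("chr1", 100, 1100)}
toy_reads = [
    ("chr1", 90, 150, "+"),    # 5' end at 90, outside the gene
    ("chr1", 50, 120, "-"),    # 5' end at 119, inside
    ("chr1", 1000, 1200, "+"), # 5' end at 1000, inside
]
counts = count_reads_in_genes(toy_reads, toy_genes)
print(counts, length_scaled_per_million(counts, toy_genes, total_reads=2_000_000))
```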
GO analysis
GO term enrichment analysis was performed using GO-Elite (v.1.2.4)82. In Extended Data Fig. 4n, DN snATAC-seq marker genes were extracted with the getMarkerFeatures function from ArchR with default parameters. Marker genes were selected as genes with a log2 fold change of >1 and a false discovery rate of <0.01 in the DN cluster compared with all clusters from the VTA (973 genes in total). All unique genes were used as the background GO dataset. In Fig. 2c, all genes expressed in at least one cell type and annotated to mm10 were used as the background dataset. In Fig. 4d, e, all genes expressed in PGNs or DNs were used as the background dataset, and in Fig. 5a, b, all unique genes were used. Default parameters were used for the GO enrichment: GO terms enriched above the background (permuted P < 0.05, 2,000 permutations) were pruned to select the terms with the largest Z-score (>1.96) relative to all corresponding child or parent paths in a network of related terms (genes changed >2). GO terms that had a permuted P value of ≥0.01, contained fewer than 6 genes or belonged to the ‘cellular_component’ ontology were not reported in the main figures. A full list of unfiltered GO terms can be found in Supplementary Table 7.
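The reporting filters for the main figures can be expressed as a small post-processing step over an enrichment table. This sketch assumes a hypothetical record type with made-up field names and example values. It does not reimplement GO-Elite's permutation test or its pruning against parent and child terms; it only applies the post-hoc filters stated above and orders the surviving terms by Z-score.

```python
from dataclasses import dataclass

@dataclass
class GoTerm:
    """One row of an enrichment table (field names are hypothetical)."""
    name: str
    ontology: str
    n_genes: int
    permuted_p: float
    z_score: float

def terms_for_main_figures(terms, max_p=0.01, min_genes=6, excluded_ontology="cellular_component"):
    """Keep terms with permuted P < max_p, at least min_genes genes and an allowed ontology."""
    kept = [
        t for t in terms
        if t.permuted_p < max_p and t.n_genes >= min_genes and t.ontology != excluded_ontology
    ]
    return sorted(kept, key=lambda t: t.z_score, reverse=True)

# Hypothetical values for illustration only.
example_terms = [
    GoTerm("synapse organization", "biological_process", 12, 0.001, 4.2),
    GoTerm("postsynaptic density", "cellular_component", 15, 0.0005, 5.0),
    GoTerm("ion transport", "biological_process", 4, 0.002, 3.1),
    GoTerm("axon guidance", "biological_process", 9, 0.02, 2.5),
    GoTerm("channel activity", "molecular_function", 7, 0.005, 3.5),
]
print([t.name for t in terms_for_main_figures(example_terms)])
```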
MELTRON pipeline
To assess differences in gene insulation, insulation square values at 10 length scales (100–1,000 kb) were calculated for genes >300 kb in length (n = 479; calculated for a minimum of 8 × 50-kb bins, that is, a minimum length of 400 kb). Cumulative probability distributions of insulation square values were calculated for each dataset. For each gene, the distribution in each brain cell type was compared with the mES cell distribution by computing the maximum distance between the two cumulative distributions and applying a two-sample Kolmogorov–Smirnov test. P values were corrected for multiple testing using the Bonferroni method and –log10-transformed to obtain a domain melting score. Domain melting scores for each gene in each comparison can be found in Supplementary Table 8. For visualization, empirical cumulative probabilities and insulation score values were smoothed using a Gaussian kernel density estimate (adjust = 0.3).
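The melting score combines a two-sample Kolmogorov–Smirnov (KS) statistic, a Bonferroni correction and a -log10 transform. The minimal sketch below rests on three stated assumptions. First, it uses the standard asymptotic approximation of the KS P value (Kolmogorov series with a common small-sample correction) rather than an exact test. Second, it caps P values at 1e-300 to avoid log(0). Third, it uses the number of tested genes as the Bonferroni factor. The insulation values are hypothetical. The authors' implementation is in the MELTRON repository listed under Code availability.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Maximum distance between the empirical cumulative distributions of a and b."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)

def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sided KS P value for statistic d and sample sizes n1, n2."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges slowly here and the P value is ~1
        return 1.0
    q = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)

def melting_score(brain_values, mesc_values, n_tests):
    """-log10 of the Bonferroni-corrected KS P value for one gene."""
    d = ks_statistic(brain_values, mesc_values)
    p = ks_pvalue(d, len(brain_values), len(mesc_values))
    p_adj = min(1.0, p * n_tests)
    return -math.log10(max(p_adj, 1e-300))

low = list(range(50))
high = [x + 100 for x in range(50)]
print(melting_score(low, high, n_tests=479))
```

Identical distributions give a score of 0, and completely separated distributions give a large score, matching the intended reading of "melting" as a shift in the insulation distribution.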
Calculation of the trans–cis contact ratio
To determine the strength of each bin's contacts with all other (trans) autosomes relative to its contacts within its own (cis) chromosome, cis and trans NPMI-normalized matrices were calculated at 250-kb resolution. Bins detected in less than 3%, or more than 75%, of 3 NP samples were removed from the analysis. To remain sensitive to outliers, the NPMI values of the cis (NPMI_C) and trans (NPMI_T) contacts of every bin were summarized with the arithmetic mean. The trans–cis contact ratio was then obtained using the following formula:
Trans–cis values of bins spanning long genes were summarized with the median.
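The trans–cis formula itself is not reproduced in this text, so the sketch below should be read as a hedged illustration of the surrounding steps only. It applies the 3–75% detection filter to bins, takes the arithmetic mean of cis and trans NPMI per bin, and summarizes the bins spanning a gene with the median. The per-bin combination is ASSUMED to be mean trans NPMI divided by mean cis NPMI, which may not match the published formula. The authors' code is in the GAM_trans_cis_ratio repository listed under Code availability. Bin identifiers and values are hypothetical.

```python
import math
from statistics import mean, median

def bin_trans_cis(cis_npmi, trans_npmi):
    """ASSUMED form: mean trans NPMI divided by mean cis NPMI for one bin."""
    mean_cis = mean(cis_npmi)
    return mean(trans_npmi) / mean_cis if mean_cis != 0 else math.nan

def gene_trans_cis(bin_values, detection, gene_bins, min_det=0.03, max_det=0.75):
    """Median trans-cis value over the 250-kb bins spanning one long gene.

    bin_values: bin -> (cis NPMI values, trans NPMI values)
    detection:  bin -> fraction of 3NP samples in which the bin was detected
    """
    ratios = [
        bin_trans_cis(*bin_values[b])
        for b in gene_bins
        if min_det <= detection[b] <= max_det   # drop rarely or over-detected bins
    ]
    ratios = [r for r in ratios if not math.isnan(r)]
    return median(ratios) if ratios else math.nan

values = {1: ([0.2, 0.4], [0.03, 0.03]), 2: ([0.1, 0.1], [0.02, 0.02]), 3: ([0.5], [0.5])}
det = {1: 0.10, 2: 0.40, 3: 0.90}
print(gene_trans_cis(values, det, [1, 2, 3]))
```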
Modelling and in silico GAM
To reconstruct 3D conformations of the Nrxn3 locus, we employed the Strings & Binders Switch (SBS) polymer model of chromatin83,84. In the SBS model, a chromatin region is modelled as a self-avoiding chain of beads that includes binding sites of different types for diffusing cognate molecular binders. Binding sites of the same type can be bridged by their cognate binders, which drives polymer folding. The optimal SBS polymers for the Nrxn3 locus in mES cells and DNs were inferred using PRISMR, a machine-learning-based procedure that finds the minimal arrangement of polymer binding sites that best describes input pairwise contact data, such as Hi-C22 or GAM85. Here, PRISMR was applied to NPMI-normalized GAM data from a 4.8-Mb region around the Nrxn3 gene (chromosome 12: 87,600,000–92,400,000; mm10) at 50-kb resolution in mES cells and DNs. In both cell types, the procedure returned optimal SBS polymer chains of 1,440 beads, including 7 different types of binding sites. Full lists of x, y and z coordinates for the mES cell and DN polymer model structures can be found in Supplementary Tables 9 and 10, respectively.
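A quick coarse-graining check helps relate the polymer to the GAM data. A 4.8-Mb region at 50-kb resolution contains 96 bins, so a 1,440-bead chain gives 15 beads per bin (about 3.3 kb per bead). The sketch below maps beads to 50-kb bins, which is the step needed to aggregate bead contacts to the resolution of the experimental matrices. It assumes that beads are evenly spaced along the region, and the function name is hypothetical.

```python
from collections import Counter

REGION_START = 87_600_000   # chr12, mm10
REGION_END = 92_400_000
N_BEADS = 1_440
BIN_SIZE = 50_000

BP_PER_BEAD = (REGION_END - REGION_START) / N_BEADS   # ~3,333 bp per bead
N_BINS = (REGION_END - REGION_START) // BIN_SIZE       # 96 bins

def bead_to_bin(bead):
    """50-kb bin index of a bead, assuming evenly spaced beads (bead centre used)."""
    offset = (bead + 0.5) * BP_PER_BEAD
    return int(offset // BIN_SIZE)

beads_per_bin = Counter(bead_to_bin(i) for i in range(N_BEADS))
print(N_BINS, round(BP_PER_BEAD), set(beads_per_bin.values()))
```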
Next, to generate thermodynamic ensembles of 3D conformations of the locus, molecular dynamics simulations of the optimal polymers were run using the freely available LAMMPS software (v.5june2019)86. In these simulations, the system evolves according to the Langevin equation, with dynamics parameters derived from classical polymer physics studies87. Polymers are first initialized in self-avoiding conformations and then left to evolve until they reach their equilibrium globular phase83. Beads and binders have the same diameter σ = 1, expressed in dimensionless units, and experience a hard-core repulsion modelled by a truncated Lennard–Jones potential. Analogously, attractive interactions are modelled with short-ranged Lennard–Jones potentials83. A range of affinities between beads and cognate binders was sampled in the weak biochemical range, from 3.0 kBT to 8.0 kBT (where kB is the Boltzmann constant and T the system temperature). In addition, binders interact nonspecifically with the polymer with a lower affinity, sampled from 0 kBT to 2.7 kBT. For simplicity, the same affinity strengths were used for all binding site types. The total binder concentration was set above the polymer coil–globule transition threshold83. For each case considered, ensembles of up to 450 distinct equilibrium configurations were derived. Full details of the model and simulations are given in Barbieri et al.83 and Chiariello et al.84.
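The interaction potentials described above can be sketched as truncated and shifted Lennard–Jones (LJ) functions in reduced units (σ = 1, energies in kBT). A purely repulsive hard core corresponds to cutting the LJ potential at its minimum, 2^(1/6)σ, whereas attractive bead–binder interactions use a longer cutoff. Several choices are ASSUMPTIONS for illustration only: the attractive cutoff (1.3σ), the grid steps of the affinity scan, and the use of the quoted affinities directly as the LJ energy parameter ε. The published simulations were run in LAMMPS with the parameters described in Barbieri et al.83 and Chiariello et al.84.

```python
from itertools import product

SIGMA = 1.0
REPULSIVE_CUTOFF = 2 ** (1 / 6) * SIGMA   # LJ minimum: cutting here leaves only repulsion
ATTRACTIVE_CUTOFF = 1.3 * SIGMA            # ASSUMED cutoff for short-ranged attraction

def lennard_jones(r, epsilon, sigma=SIGMA):
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)

def truncated_shifted_lj(r, epsilon, r_cut, sigma=SIGMA):
    """LJ potential set to zero beyond r_cut and shifted so it is continuous at r_cut."""
    if r >= r_cut:
        return 0.0
    return lennard_jones(r, epsilon, sigma) - lennard_jones(r_cut, epsilon, sigma)

def hard_core(r, epsilon=1.0):
    """Excluded-volume repulsion between any two particles (epsilon = 1 kBT ASSUMED)."""
    return truncated_shifted_lj(r, epsilon, REPULSIVE_CUTOFF)

def binder_attraction(r, affinity_kbt):
    """Short-ranged attraction between a binder and a bead, with affinity in kBT."""
    return truncated_shifted_lj(r, affinity_kbt, ATTRACTIVE_CUTOFF)

# Affinity scan: specific 3.0-8.0 kBT and nonspecific 0-2.7 kBT (grid steps ASSUMED).
specific = [round(3.0 + 0.5 * k, 2) for k in range(11)]
nonspecific = [round(0.3 * k, 2) for k in range(10)]
scan = list(product(specific, nonspecific))
print(len(scan), binder_attraction(REPULSIVE_CUTOFF, specific[0]))
```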
In silico GAM NPMI matrices were obtained from the ensemble of 3D structures by applying the in silico GAM algorithm10, here generalized to simulate the GAM protocol with 3 NPs per GAM sample and to perform NPMI normalization. Specifically, the content of three in silico slices was aggregated into one tube, and the NPMI normalization formula was then applied (see the section ‘Visualization of pairwise chromatin contact matrices’ therein10). The same numbers of samples were simulated as in the GAM experiments: 249 × 3 NPs for mES cells and 585 × 3 NPs for DNs. Pearson’s correlation coefficients were used to compare the in silico and experimental NPMI GAM matrices.
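A minimal sketch of the in silico GAM step follows. It assumes that each simulated slice is already represented as the set of bins it intersects; slicing the 3D structures is not shown. The sketch pools three slices into one tube, converts co-segregation into normalized pointwise mutual information (NPMI), and compares matrices by Pearson correlation. Function names are hypothetical, and the published algorithm is described in ref. 10.

```python
import math

def pool_slices(slices, per_tube=3):
    """OR-combine consecutive in silico slices into tubes (3 NPs per GAM sample)."""
    return [set().union(*slices[i:i + per_tube])
            for i in range(0, len(slices) - per_tube + 1, per_tube)]

def npmi(tubes, a, b):
    """Normalized pointwise mutual information of bins a and b across tubes."""
    n = len(tubes)
    p_a = sum(a in t for t in tubes) / n
    p_b = sum(b in t for t in tubes) / n
    p_ab = sum(a in t and b in t for t in tubes) / n
    if p_ab == 0:
        return -1.0   # never co-detected
    if p_ab == 1:
        return 1.0    # always co-detected
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)

def npmi_matrix(tubes, n_bins):
    return [[npmi(tubes, a, b) for b in range(n_bins)] for a in range(n_bins)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. flattened matrices)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)
```

To compare simulated and experimental matrices, both would be flattened over the same bin pairs (for example, the upper triangle) before calling pearson.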
Examples of single 3D conformations were rendered as third-order splines through the polymer bead positions, with regions of interest highlighted in different colours. To quantify the size and variability of the 3D structures in mES cells and DNs, the average gyration radius (Rg) of the selected domains encompassing and surrounding the Nrxn3 gene was measured and expressed in dimensionless units σ (Fig. 3d and Extended Data Fig. 7e). Analyses and plots were produced with the Anaconda package v.4.7.12, and 3D structure visualizations were produced with POV-Ray v.3.7 (http://www.povray.org/download/).
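The gyration radius is the root-mean-square distance of the beads from their centre of mass, averaged here over an ensemble of conformations. The sketch below assumes equal-mass beads and that a domain is selected as a range of bead indices; both the function names and the toy coordinates are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Rg of one conformation: RMS distance of (x, y, z) beads from their centre of mass."""
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in coords) / n)

def mean_rg(ensemble, bead_range):
    """Average Rg of a selected domain (bead_range, e.g. range(i, j)) over an ensemble."""
    values = [radius_of_gyration([conf[i] for i in bead_range]) for conf in ensemble]
    return sum(values) / len(values)

conf_small = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
conf_large = [(2 * x, 2 * y, 2 * z) for x, y, z in conf_small]
print(mean_rg([conf_small, conf_large], range(4)))
```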
Cryosections for FISH experiments
Fixed and cryopreserved hippocampal CA1 tissue and mES cells were cryosectioned as described above (see ‘Cryoblock preparation and cryosectioning’) at an approximate thickness of 400 nm. For the three-colour FISH experiment (TSS, middle and TES), sections were transferred to glass coverslips (thickness number 1.5, diameter 10 mm) coated with poly-L-lysine (Sigma-Aldrich, P8920) according to the manufacturer’s instructions; for the immunofluorescence whole-gene FISH experiment (nucleolus, Rbfox1), they were transferred to coverslips that had been washed in 100% ethanol and autoclaved.
BAC probes labelling and precipitation
BACs targeting the Rbfox1 locus (Supplementary Table 11) were obtained from the BACPAC Resources Center (https://bacpacresources.org) and amplified from glycerol stocks using a MIDIprep kit (NucleoBond Xtra BAC purification kit, Macherey-Nagel, 740436). Purified BACs were labelled using a nick translation kit (Abbott Molecular, 7J0001) according to the manufacturer’s instructions and the following fluorophores (all Invitrogen, Thermo Fisher Scientific): ChromaTide Alexa Fluor 488-5-dUTP (C11397), ChromaTide Alexa Fluor 568-5-dUTP (C11399) and Alexa Fluor 647-aha-dUTP (A32763). Labelled BAC probes were co-precipitated with yeast tRNA (20 μg μl–1 final concentration; Invitrogen, AM7119) and mouse Cot-1 DNA (3 μg μl–1 final concentration; Invitrogen, 18440-016) overnight at −20 °C. After clean-up in 70% ethanol, the probes were dissolved in 100% deionized formamide (1 h; Sigma, F9037) before adding (1:1) a 2× hybridization mix (20% dextran sulfate, 0.1 M phosphate buffer in 4× saline-sodium citrate (SSC); mixing for 1 h), and were then denatured (10 min, 80 °C) and reannealed (30 min, 37 °C) before hybridization.
Immunolabelling before FISH
Immunofluorescence labelling of the nucleolus was performed as described above (‘Immunofluorescence detection for confocal microscopy’) by incubating the cryosections overnight (at 4 °C) with a mouse monoclonal antibody anti-nucleophosmin B23 (a gift from H. Busch49), followed by incubation (1 h) with donkey antibodies raised against mouse IgG conjugated with Alexa Fluor-555 (Invitrogen). Before cryo-FISH, the bound antibodies were fixed (1 h, 4 °C) in 8% depolymerized PFA (EM-grade) in 250 mM HEPES–NaOH (pH 7.6) and rinsed in PBS.
Cryo-FISH
Cryo-FISH was performed as previously described2,23 with a few modifications. In brief, cryosections were washed (30 min) in 1× PBS, rinsed with 2× SSC (Sigma, S6639) and incubated (2 h, 37 °C) in 250 μg ml–1 RNase A (Sigma, R4642) in 2× SSC. After washing in 2× SSC, cryosections were treated (10 min) with 0.1 M HCl, dehydrated in ethanol (30%, 50%, 70%, 90%, 100% series, 3 min each on ice) and denatured (10 min) at 80 °C in 70% formamide, 2× SSC, 0.05 M phosphate buffer (pH 7.4). Cryosections were dehydrated as described above, and overlaid on hybridization mixture on HybriSlip (Invitrogen, H18202). After sealing with rubber cement and incubation (48 h, 37 °C) in a moist chamber, cryosections were washed (25 min, 42 °C) in 50% formamide in 2× SSC, (30 min, 60 °C) in 0.1× SSC and (10 min, 42 °C) in 0.1% Triton X-100 in 4× SSC. After rinsing with 1× PBS, coverslips were mounted in Vectashield mounting medium (anti-Fading) with DAPI (Vector Laboratories, H-1200).
Cryo-FISH microscopy
Cryo-FISH images were collected sequentially with a Leica TCS SP8-STED confocal microscope (Leica Microsystems DMI6000B-CS) using Leica Application Suite X v.3.5.5.19976 and a HC PL APO CS2 ×63/1.40 oil objective (numerical aperture of 1.4, Plan Apochromat) (see ‘Immunofluorescence detection for confocal microscopy’) using the following settings: 405-nm excitation and 420–500-nm emission (for DAPI), 488-nm excitation and 510–535-nm emission (for probes labelled with ChromaTide Alexa Fluor-488 and for nucleophosmin), 568-nm excitation and 586–620-nm emission (for probes labelled with ChromaTide Alexa Fluor-568), 647-nm excitation and 657–700-nm emission (for probes labelled with Alexa Fluor-647), and 555-nm excitation and 586–640-nm emission (for immunofluorescence labelling of nucleophosmin with Alexa Fluor-555). All images were collected with a ×4 zoom at 1,024 × 1,024 pixel resolution (pixel size of 0.0451 μm, resolution of 22.1760 pixels μm–1).
Cryo-FISH image analysis
Images were analysed using Fiji software (v.2.0.0-rc-69/1.52p)88. All images were pre-processed as previously described23. Genomic foci were visually identified, and areas of the manually defined objects were measured using the Fiji-Area tool. For the cryo-FISH experiment combined with immunofluorescence, the location of genomic loci in relation to the nuclear lamina or nucleolus was assessed on the basis of the overlap of foci with the nucleolus (identified by nucleophosmin immunolabelling) or the nuclear lamina (as defined by the periphery of the DAPI staining) by at least three pixels. To determine the distance between the TSS, middle and TES genomic foci, we took the centre of mass of the selected objects, as defined by Fiji-Center of mass function (the brightness-weighted average of the x and y coordinates of all pixels within the selected areas). Distances between the objects were measured using the Fiji-Line tool between the centres of mass defined for each object. Images for visualization in figure panels were processed using Fiji or Adobe Photoshop CS6, for which adjustments included the optimization of the dynamic signal range with contrast stretching.
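The distance measurement can be sketched as a brightness-weighted centroid per focus followed by a Euclidean distance converted from pixels to micrometres using the pixel size from the acquisition settings (0.0451 μm). The ROI representation as (x, y, intensity) tuples and the toy values are hypothetical; the measurements in the study were made in Fiji.

```python
import math

PIXEL_SIZE_UM = 0.0451   # x4 zoom, 1,024 x 1,024 pixels (see acquisition settings)

def centre_of_mass(roi_pixels):
    """Brightness-weighted centroid of (x, y, intensity) pixels inside a manually drawn ROI."""
    total = sum(i for _, _, i in roi_pixels)
    x = sum(px * i for px, _, i in roi_pixels) / total
    y = sum(py * i for _, py, i in roi_pixels) / total
    return x, y

def focus_distance_um(roi_a, roi_b, pixel_size_um=PIXEL_SIZE_UM):
    """Distance between the centres of mass of two foci, in micrometres."""
    return math.dist(centre_of_mass(roi_a), centre_of_mass(roi_b)) * pixel_size_um

tss_focus = [(0, 0, 1)]
tes_focus = [(3, 4, 1)]
print(focus_distance_um(tss_focus, tes_focus))   # 5 pixels apart
```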
Determination of differential contacts between GAM datasets
Significant differences in pairwise contacts between a pair of GAM datasets were determined as previously described with modifications9. In brief, genomic windows with low detection, defined as less than 2% of the distribution of all detected genomic windows for each chromosome, were removed from both datasets to be compared. Contacts were filtered to be within 0.5–5 Mb distance and above 0.15 NPMI, and NPMI contact frequencies at each genomic distance of each chromosome were normalized by computing a Z-score transformation, and a differential matrix (D) was derived by subtracting the two Z-score normalized matrices9.
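The core of the differential contact step is a distance-stratified Z-score followed by a subtraction. The sketch below makes three assumptions. It takes one square NPMI matrix per chromosome at 50-kb resolution, since the bin size is not stated in this paragraph. It Z-scores each diagonal within the 0.5–5 Mb range. It keeps a contact when its NPMI exceeds 0.15 in either dataset, which is one possible reading of the threshold. The low-detection window filter is not shown, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def distance_zscores(npmi, min_diag, max_diag):
    """Z-score NPMI values separately at each genomic distance (diagonal) of one chromosome."""
    n = len(npmi)
    z = {}
    for d in range(min_diag, min(max_diag, n - 1) + 1):
        values = [npmi[i][i + d] for i in range(n - d)]
        mu, sd = mean(values), pstdev(values)
        for i in range(n - d):
            z[(i, i + d)] = (npmi[i][i + d] - mu) / sd if sd > 0 else 0.0
    return z

def differential_matrix(npmi_a, npmi_b, bin_size=50_000, min_dist=500_000,
                        max_dist=5_000_000, min_npmi=0.15):
    """D = Z_a - Z_b for contacts within the distance range and above the NPMI threshold."""
    lo, hi = min_dist // bin_size, max_dist // bin_size
    z_a = distance_zscores(npmi_a, lo, hi)
    z_b = distance_zscores(npmi_b, lo, hi)
    return {
        (i, j): z_a[(i, j)] - z_b[(i, j)]
        for (i, j) in z_a
        if max(npmi_a[i][j], npmi_b[i][j]) > min_npmi
    }

# Toy 4-bin chromosome with 1-bp "bins" so that the distance range covers diagonals 1-2.
mat = [[1, 0.5, 0.1, 0.2], [0.5, 1, 0.3, 0.4], [0.1, 0.3, 1, 0.6], [0.2, 0.4, 0.6, 1]]
print(differential_matrix(mat, mat, bin_size=1, min_dist=1, max_dist=2))
```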
TF-binding site analysis
To find TF-binding motifs present within specific contacts, significant differential contacts were determined for DNs and PGNs. Accessible regions within the differential contacts were determined using scATAC-seq for PGNs81 and DNs. To account for methodological differences, including lower sequencing depth in PGN scATAC-seq data (Extended Data Fig. 4l), we considered only the peaks that occurred in >10% of cells (>10 cells in DNs; >13 in PGNs). Motif finding within accessible regions in significant contacts was performed using the Regulatory Genomics Toolbox (v.0.12.3; https://www.regulatory-genomics.org/motif-analysis/introduction/) with TF motifs (from the HOCOMOCO database, v.11)89 obtained for TFs expressed in either DNs or PGNs (R-log ≥ 2.5) to determine the percentage of windows containing each TF motif. Next, TF motifs were filtered based on (1) the percentage of windows containing the motif (>5%) and (2) the differential expression in either PGNs or DNs (–log10(adjusted P value) > 3, see ‘Differential gene expression analysis’ above), which resulted in 50 TF motifs for feature pair analysis (33 TF motifs from PGNs and 17 from DNs; Extended Data Fig. 9c, d).
Feature pairs associated with specific contacts were determined as previously described9 by testing the 1,275 combinations of motif pairs (1,225 heterotypic and 50 homotypic motif pairs). The number of contacts containing each pair of selected TF motifs (PGN_TF and DN_TF), together with the percentage of total significant differential contacts in PGNs and DNs (PGN and DN), was used to determine the enrichment score for all TF feature pair interactions (that is, the ratio between the frequencies of contacts in PGNs and DNs, (PGN_TF/PGN)/(DN_TF/DN)). The effectiveness of a TF pair for discriminating between contacts from PGNs and DNs was assessed using the information gain measure90. Enrichment and information gain for all TF feature pair interactions, as well as differential expression values for TFs (DNs compared with PGNs), can be found in Supplementary Table 13. The top feature pairs were extracted on the basis of the highest information gain (ten feature pairs), PGN enrichment (five feature pairs) and DN enrichment (five feature pairs) scores. Contact overlaps for the top feature pairs were visualized using UpSet plots.
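The pair count, enrichment ratio and information gain can be sketched as follows. Pairs are generated with replacement, so 50 motifs give 1,225 heterotypic plus 50 homotypic pairs. Enrichment follows the ratio given above, and information gain is computed as the reduction in Shannon entropy (bits) of contact labels (PGN or DN) once it is known whether a contact contains the motif pair. Treating each differential contact as one labelled example is an assumption about how ref. 90's measure was applied. The motif names and toy labels are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def all_motif_pairs(motifs):
    """Heterotypic and homotypic motif pairs; 50 motifs give 1,225 + 50 = 1,275 pairs."""
    return list(combinations_with_replacement(sorted(motifs), 2))

def enrichment(pgn_pair, pgn_total, dn_pair, dn_total):
    """(PGN_TF / PGN) / (DN_TF / DN); undefined when dn_pair is zero."""
    return (pgn_pair / pgn_total) / (dn_pair / dn_total)

def entropy(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c)

def information_gain(has_pair, labels):
    """Entropy reduction in cell-type labels from knowing whether a contact contains the pair."""
    classes = sorted(set(labels))

    def counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    everything = range(len(labels))
    with_pair = [i for i in everything if has_pair[i]]
    without_pair = [i for i in everything if not has_pair[i]]
    conditional = sum(len(s) / len(labels) * entropy(counts(s))
                      for s in (with_pair, without_pair) if s)
    return entropy(counts(everything)) - conditional

labels = ["PGN", "PGN", "DN", "DN"]
print(information_gain([True, True, False, False], labels))   # perfectly discriminating pair
```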
Network and community detection analysis of TF-binding sites in significant differential contacts
To determine the interconnectivity between different TF motifs found in accessible regions of significant differential contacts, the number of contacts for each pair of TF motifs (1,275 pairs) was determined. After filtering pairs of TF motifs involved in less than 20% of the total contacts (15,833 and 5,400 contacts minimum in PGNs and DNs, respectively), a network was built for each cell type with TF motifs as nodes and number of contacts as weighted edges. The Leiden algorithm was used to detect communities of strongly interconnected nodes, using the leiden package in R91,92, with a resolution of 1.01 for both PGNs and DNs (Extended Data Fig. 10f, Supplementary Table 14).
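Building the weighted motif network is straightforward once pair counts are available. The sketch below keeps pairs present in at least 20% of the total contacts, stores them as an undirected weighted adjacency (homotypic pairs become self-loops) and reports node strength as a quick summary. Community detection itself is not reimplemented here; in the study, the Leiden algorithm was run with the R leiden package at resolution 1.01 on such a network. The motif labels and counts are hypothetical.

```python
from collections import defaultdict

def build_motif_network(pair_counts, total_contacts, min_fraction=0.20):
    """Weighted, undirected motif network keeping pairs in >= min_fraction of contacts.

    pair_counts: (motif_a, motif_b) -> number of differential contacts containing both motifs.
    """
    threshold = min_fraction * total_contacts
    network = defaultdict(dict)
    for (a, b), n_contacts in pair_counts.items():
        if n_contacts >= threshold:
            network[a][b] = n_contacts
            network[b][a] = n_contacts
    return dict(network)

def node_strength(network):
    """Sum of edge weights per motif (self-loops counted once)."""
    return {motif: sum(neighbours.values()) for motif, neighbours in network.items()}

pairs = {("A", "B"): 30, ("A", "C"): 10, ("B", "C"): 25, ("C", "C"): 40}
net = build_motif_network(pairs, total_contacts=100)
print(node_strength(net))
```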
GAM aggregated contact plots
To visualize the average contact intensity for a set of genomic contacts, NPMI contact frequencies at each genomic distance of each chromosome were first normalized by a Z-score transformation. Z-score values were then retrieved for each contact and for all surrounding contacts within a 4-bin radius (50-kb bins). For each chromosome, the Z-score values of each set of contacts and of the surrounding bins were summarized by the arithmetic mean. The per-chromosome means were then summed and divided by the number of chromosomes.
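The aggregation can be sketched as extracting a 9 × 9 window (radius of 4 bins) of distance-normalized Z-scores around each contact, averaging windows within each chromosome and then averaging the per-chromosome means. As an assumption for simplicity, contacts whose window would extend beyond the matrix edge are skipped. The matrices and contacts below are hypothetical.

```python
def aggregate_contacts(z_by_chrom, contacts_by_chrom, radius=4):
    """Average (2r+1) x (2r+1) Z-score window around contacts, per chromosome then across them.

    z_by_chrom: chrom -> square matrix (list of lists) of distance-normalized Z-scores.
    contacts_by_chrom: chrom -> list of (i, j) bin pairs.
    """
    size = 2 * radius + 1
    chrom_means = []
    for chrom, contacts in contacts_by_chrom.items():
        z = z_by_chrom[chrom]
        n = len(z)
        usable = [(i, j) for i, j in contacts  # skip windows that would leave the matrix (ASSUMPTION)
                  if radius <= i < n - radius and radius <= j < n - radius]
        if not usable:
            continue
        chrom_means.append([
            [sum(z[i + di][j + dj] for i, j in usable) / len(usable)
             for dj in range(-radius, radius + 1)]
            for di in range(-radius, radius + 1)
        ])
    if not chrom_means:
        raise ValueError("no contacts with a complete window")
    k = len(chrom_means)
    return [[sum(m[r][c] for m in chrom_means) / k for c in range(size)] for r in range(size)]

z_mats = {"chr1": [[1.0] * 20 for _ in range(20)], "chr2": [[3.0] * 20 for _ in range(20)]}
contact_sets = {"chr1": [(10, 15), (1, 15)], "chr2": [(8, 12)]}
agg = aggregate_contacts(z_mats, contact_sets)
print(agg[4][4])
```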
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-021-04081-2.
Acknowledgements
The authors thank S. Q. Xie and A. Ashraf for help processing midbrain samples; R. A. Beagrie for sharing access to the mES cell GAM data before publication and for discussions; M. Bartosovic for help with the manuscript rebuttal; E. Espel for help optimizing the whole-genome amplification protocol; M. Gotthardt for providing animals and R. Jüttner for help with midbrain dissections for snATAC-seq; C. Baugher for help developing the TF motif analysis pipeline; members of the Pombo laboratory for discussions; C. Braeuning of the Scientific Genomics Platform for scientific and technical support, and A. Schütz and her team of the Protein Production and Characterization Platform, both at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin; A. Adey for providing a BAM file of the published PGN scATAC-seq analysis; and the David Garfield group for sharing an optimized protocol for the bulk ATAC-seq tagmentation reaction. A.P. and A. Akalin acknowledge support from the Helmholtz Association (Germany). A.P. and M.N. acknowledge support from the National Institutes of Health Common Fund 4D Nucleome Program grants U54DK107977 and 1UM1HG011585, and the Berlin Institute of Health (BIH). A.P. and L.Z.-R. acknowledge support by the Deutsche Forschungsgemeinschaft (DFG; German Research Foundation) International Research Training Group (IRTG2403). A.P. acknowledges support from the DFG under Germany’s Excellence Strategy–EXC-2049–390688087. G.C.-B. acknowledges support from the European Union Horizon 2020/European Research Council Consolidator Grant (EPIScOPE no. 681893), Swedish Research Council (no. 2015-03558; 2019-01360), Swedish Brain Foundation (no. FO2017-0075), Knut and Alice Wallenberg Foundation (grant 2019-0107), The Swedish Society for Medical Research (SSMF, grant JUB2019), Ming Wai Lau Centre for Reparative Medicine and Karolinska Institutet. G.D. and G.A. acknowledge support from the Austrian FWF through DK W1206 ‘Signal Processing in Neurons’ and SFB F44 ‘Cell Signaling in Chronic CNS Disorders’, P25014-B24. M.A.U. acknowledges funding by the Medical Research Council (UK) (U120085816) and a Royal Society University Research Fellowship. M.N. acknowledges support from CINECA ISCRA grants HP10CYFPS5 and HP10CRTY8P and from computer resources at INFN and Scope at the University of Naples. L.W. acknowledges the support of Ohio University’s GERB program. I.H. was supported by a Boehringer Ingelheim Fonds PhD fellowship, E.T.T. by an EMBO short-term fellowship (ASTF 336-2015), and I.I.-A. by a Long-Term Fellowship from the Federation of European Biochemical Societies (FEBS). S.C. is a GABBA PhD fellow supported by the FCT (Fundação para a Ciência e Tecnologia; PD/BD/135453/2017).
Author contributions
The authors consider the joint first authors to have extensively contributed to this work and consider that A.K. and I.H. contributed equally. A.P. designed the concept for this work. W.W.-N., M.M., A. Abentung, E.J.P. and A.P. collected animal tissue samples. I.H., W.W.-N., L.S., A.K., M.M. and R.K. produced the GAM datasets. I.H., L.S., A.K., W.W.-N. and M.M. optimized the experimental immunoGAM protocol. W.W.-N., A.K., C.J.T., I.I.-A., E.I. and T.M.S. developed the computational pipelines for bioinformatics and QC analyses of the GAM data. A.K. and I.I.-A. performed the QC analyses of the GAM data. T.M.S. and C.J.T. developed the NPMI normalization of the GAM data. T.M.S. performed the bias analysis of the GAM data. W.W.-N., A.K. and D.S. performed the bioinformatics analyses of the GAM data. D.S. developed the domain melting analysis, with consultation from C.J.T. D.S. devised the MELTRON pipeline and performed the domain melting analysis of the GAM data. D.S. developed the trans–cis ratio analysis. C.J.T. initially developed the differential contact approach9, and W.W.-N. and C.J.T. further developed and adapted this analysis for the current study. W.W.-N. performed the differential contact analysis. Y.Z. performed the TF motif finding enrichment, network and community analyses in differential contacts. W.W.-N. performed the post-hoc analyses of TF motif enrichment in differential contacts. E.T.T. performed the mES cell culture experiments. E.T.T. and A.A.K. produced the scRNA-seq data. L.Z.-R. and D.S. performed the RNA-seq analysis. L.Z.-R. performed the differential gene expression analysis. L.Z.-R. optimized the snATAC-seq protocol, produced and analysed the bulk and scATAC-seq data. S.B. optimized the polymer modelling method and performed PRISMR analysis. S.B. and A.M.C. produced models for polymer modelling. A.M.C. performed the statistical analyses of polymer models. L.F. and F.M. performed the in silico GAM experiments. I.H. and S.C. performed and analysed the FISH experiments. W.W.-N., D.S., I.H., L.Z.-R., Y.Z., A.K., T.M.S., S.B., A.M.C. and M.M. produced the final data plots. S.A.T. supervised the scRNA-seq experiments. M.M., G.A., G.D., M.A.U. and G.C.-B. supervised the animal tissue collection and provided animal samples. A.P. and S.C. supervised the cryo-FISH experiments. A.P., A.K. and W.W.-N. supervised the GAM experiments. A.P. and W.W.-N. supervised the ATAC-seq. V.F. and A. Akalin supervised the RNA-seq. M.N. supervised the polymer modelling and in silico GAM. L.W. supervised the TF motif and network analyses. W.W.-N., A.P., A.K., I.H., D.S., L.Z.-R., L.W., Y.Z., G.C.-B., M.M., S.B., A.M.C. and M.N. contributed to the interpretation of the results. W.W.-N. wrote the first draft of the manuscript. W.W.-N. designed the figures and illustrations, with contributions from A.P., D.S. and I.H. W.W.-N., A.P., A.K., I.H., D.S., L.Z.-R., Y.Z., L.W., G.C.-B., S.C., S.B. and A.M.C. wrote the manuscript. All authors provided critical feedback and helped revise the manuscript.
Funding
Open access funding provided by Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft (MDC).
Data availability
Raw fastq sequencing files for all samples from the DN, PGN and OLG GAM datasets, together with non-normalized co-segregation matrices, normalized pairwise chromatin contact maps and raw GAM segregation tables, are available from the GEO repository under accession number GSE148792. Raw fastq sequencing files for mES cell GAM datasets are available from the 4DN data portal (https://data.4dnucleome.org/). The 4DN sample IDs for all samples used in the study are available in Supplementary Table 1. All polymer model 3D structures produced for the analyses in this work are available in Supplementary Tables 9 and 10. Raw confocal and laser microdissection images, as well as images and ROIs for cryo-FISH experiments, are available at: https://github.com/pombo-lab/WinickNg_Kukalev_Harabula_Nature_2021/tree/main/microscopy_images/.
Raw single-cell transcriptome data for mES cells are available from the ENA data portal (https://www.ebi.ac.uk/ena/browser/home). The ENA sample IDs for all samples used in the study are available in Supplementary Table 15. Position-sorted BAM files for ATAC-seq data from mES cells and DNs are available from the GEO repository under accession number GSE174024, together with processed bigwig files. A public UCSC session with all data produced, as well as all published data used in this study, is available at http://genome-euro.ucsc.edu/s/Kjmorris/Winick_Ng_2021_GAMbrainpublicsession. Source data are provided with this paper.
Code availability
Processing and plotting scripts for MELTRON and insulation scores are available at: https://github.com/pombo-lab/Meltron/. Processing and plotting scripts for the trans–cis contact ratios are available at https://github.com/pombo-lab/GAM_trans_cis_ratio/. Custom python and R scripts for GAM window calling, GAM quality control, GAM genome sampling quality and resolution, production of NPMI matrices, aggregated maps, k-means clustering, calculation of insulation scores and compartment calling were deposited in https://github.com/pombo-lab/WinickNg_Kukalev_Harabula_Nature_2021/tree/main/code/.
Competing interests
In the past 3 years, S.A.T. has acted as a consultant for Genentech and Roche, and is a remunerated member of Scientific Advisory Boards of Biogen, GlaxoSmithKline and Foresite Labs. A.P. and M.N. hold a patent on GAM93.
Footnotes
Peer review information: Nature thanks Chongyuan Luo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Warren Winick-Ng, Alexander Kukalev, Izabela Harabula, Luna Zea-Redondo and Dominik Szabó
Change history
3/16/2022
An amendment to the underlying code was made to enable an author name to appear correctly in PubMed.
Contributor Information
Warren Winick-Ng, Email: warren.winick-ng@mdc-berlin.de.
Ana Pombo, Email: ana.pombo@mdc-berlin.de.
Extended data is available for this paper at 10.1038/s41586-021-04081-2.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-021-04081-2.
References
- 1.Jung I, et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 2019;51:1442–1449. doi: 10.1038/s41588-019-0494-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Beagrie RA, et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature. 2017;543:519–524. doi: 10.1038/nature21411.
- 3. Quinodoz SA, et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell. 2018;174:744–757. doi: 10.1016/j.cell.2018.05.024.
- 4. Fraser J, et al. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol. Syst. Biol. 2015;11:852. doi: 10.15252/msb.20156492.
- 5. Beagan JA, et al. Three-dimensional genome restructuring across timescales of activity-induced neuronal gene expression. Nat. Neurosci. 2020;23:707–717. doi: 10.1038/s41593-020-0634-6.
- 6. Bonev B, et al. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171:557–572. doi: 10.1016/j.cell.2017.09.043.
- 7. Walczak A, et al. Novel higher-order epigenetic regulation of the Bdnf gene upon seizures. J. Neurosci. 2013;33:2507–2511. doi: 10.1523/JNEUROSCI.1085-12.2013.
- 8. Harabula A, Pombo A. The dynamics of chromatin architecture in brain development and function. Curr. Opin. Genet. Dev. 2021;67:84–93. doi: 10.1016/j.gde.2020.12.008.
- 9. Beagrie RA, et al. Multiplex-GAM: genome-wide identification of chromatin contacts yields insights not captured by Hi-C. bioRxiv (preprint). 2021. doi: 10.1101/2020.07.31.230284.
- 10. Fiorillo L, et al. Comparison of the Hi-C, GAM and SPRITE methods by use of polymer models of chromatin. Nat. Methods. 2021;18:482–490. doi: 10.1038/s41592-021-01135-1.
- 11. Hughes EG, Orthmann-Murphy JL, Langseth AJ, Bergles DE. Myelin remodeling through experience-dependent oligodendrogenesis in the adult somatosensory cortex. Nat. Neurosci. 2018;21:696–706. doi: 10.1038/s41593-018-0121-5.
- 12. Stackman RW, Jr, Cohen SJ, Lora JC, Rios LM. Temporal inactivation reveals that the CA1 region of the mouse dorsal hippocampus plays an equivalent role in the retrieval of long-term object memory and spatial memory. Neurobiol. Learn. Mem. 2016;133:118–128. doi: 10.1016/j.nlm.2016.06.016.
- 13. Keiflin R, Janak PH. Dopamine prediction errors in reward learning and addiction: from theory to neural circuitry. Neuron. 2015;88:247–263. doi: 10.1016/j.neuron.2015.08.037.
- 14. Crane E, et al. Condensin-driven remodeling of X-chromosome topology during dosage compensation. Nature. 2015;523:240–244. doi: 10.1038/nature14450.
- 15. Monahan K, et al. Role of CCCTC binding factor (CTCF) and cohesin in the generation of single-cell diversity of protocadherin-α gene expression. Proc. Natl Acad. Sci. USA. 2012;109:9125–9130. doi: 10.1073/pnas.1205074109.
- 16. Hirayama T, Tarusawa E, Yoshimura Y, Galjart N, Yagi T. CTCF is required for neural development and stochastic expression of clustered Pcdh genes in neurons. Cell Rep. 2012;2:345–357. doi: 10.1016/j.celrep.2012.06.014.
- 17. Yu Y, Suo L, Wu Q. Protocadherin α gene cluster is required for myelination and oligodendrocyte development. Zool. Res. 2012;33:362–366. doi: 10.3724/SP.J.1141.2012.04362.
- 18. Ray TA, et al. Comprehensive identification of mRNA isoforms reveals the diversity of neural cell-surface molecules with roles in retinal development and disease. Nat. Commun. 2020;11:3328. doi: 10.1038/s41467-020-17009-7.
- 19. Zuckerkandl E. Gene control in eukaryotes and the c-value paradox "excess" DNA as an impediment to transcription of coding sequences. J. Mol. Evol. 1976;9:73–104. doi: 10.1007/BF01796124.
- 20. Müller WG, Walker D, Hager GL, McNally JG. Large-scale chromatin decondensation and recondensation regulated by transcription from a natural promoter. J. Cell Biol. 2001;154:33–48. doi: 10.1083/jcb.200011069.
- 21. King IF, et al. Topoisomerases facilitate transcription of long genes linked to autism. Nature. 2013;501:58–62. doi: 10.1038/nature12504.
- 22. Bianco S, et al. Polymer physics predicts the effects of structural variants on chromatin architecture. Nat. Genet. 2018;50:662–667. doi: 10.1038/s41588-018-0098-8.
- 23. Branco MR, Pombo A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 2006;4:e138. doi: 10.1371/journal.pbio.0040138.
- 24. Bustos FJ, et al. Epigenetic editing of the Dlg4/PSD95 gene improves cognition in aged and Alzheimer’s disease mice. Brain. 2017;140:3252–3268. doi: 10.1093/brain/awx272.
- 25. Klaassen RV, et al. Shisa6 traps AMPA receptors at postsynaptic sites and prevents their desensitization during synaptic activity. Nat. Commun. 2016;7:10682. doi: 10.1038/ncomms10682.
- 26. Gorini G, Roberts AJ, Mayfield RD. Neurobiological signatures of alcohol dependence revealed by protein profiling. PLoS ONE. 2013;8:e82656. doi: 10.1371/journal.pone.0082656.
- 27. Wang J, et al. Genome-wide expression analysis reveals diverse effects of acute nicotine exposure on neuronal function-related genes and pathways. Front. Psychiatry. 2011;2:5. doi: 10.3389/fpsyt.2011.00005.
- 28. Bell RL, et al. Gene expression changes in the nucleus accumbens of alcohol-preferring rats following chronic ethanol consumption. Pharmacol. Biochem. Behav. 2010;94:131–147. doi: 10.1016/j.pbb.2009.07.019.
- 29. Repunte-Canonigo V, et al. Identifying candidate drivers of alcohol dependence-induced excessive drinking by assembly and interrogation of brain-specific regulatory networks. Genome Biol. 2015;16:68. doi: 10.1186/s13059-015-0593-5.
- 30. Duclot F, Kabbaj M. The role of early growth response 1 (EGR1) in brain plasticity and neuropsychiatric disorders. Front. Behav. Neurosci. 2017;11:35. doi: 10.3389/fnbeh.2017.00035.
- 31. Sun Z, et al. EGR1 recruits TET1 to shape the brain methylome during development and upon neuronal activity. Nat. Commun. 2019;10:3892. doi: 10.1038/s41467-019-11905-3.
- 32. Magklara A, et al. An epigenetic signature for monoallelic olfactory receptor expression. Cell. 2011;145:555–570. doi: 10.1016/j.cell.2011.03.040.
- 33. Kambere MB, Lane RP. Co-regulation of a large and rapidly evolving repertoire of odorant receptor genes. BMC Neurosci. 2007;8:S2. doi: 10.1186/1471-2202-8-S3-S2.
- 34. Gabel HW, et al. Disruption of DNA-methylation-dependent long gene repression in Rett syndrome. Nature. 2015;522:89–93. doi: 10.1038/nature14319.
- 35. Zhao YT, et al. Long genes linked to autism spectrum disorders harbor broad enhancer-like chromatin domains. Genome Res. 2018;28:933–942. doi: 10.1101/gr.233775.117.
- 36. Leidescher S, et al. Spatial organisation of transcribed eukaryotic genes. bioRxiv (preprint). 2021. doi: 10.1101/2020.05.20.106591.
- 37. Vaags AK, et al. Rare deletions at the neurexin 3 locus in autism spectrum disorder. Am. J. Hum. Genet. 2012;90:133–141. doi: 10.1016/j.ajhg.2011.11.025.
- 38. Lee J-A, et al. Cytoplasmic Rbfox1 regulates the expression of synaptic and autism-related genes. Neuron. 2016;89:113–128. doi: 10.1016/j.neuron.2015.11.025.
- 39. Guzmán YF, et al. A gain-of-function mutation in the GRIK2 gene causes neurodevelopmental deficits. Neurol. Genet. 2017;3:e129. doi: 10.1212/NXG.0000000000000129.
- 40. Guidi S, et al. Neurogenesis impairment and increased cell death reduce total neuron number in the hippocampal region of fetuses with Down syndrome. Brain Pathol. 2008;18:180–197. doi: 10.1111/j.1750-3639.2007.00113.x.
- 41. Brackley CA, et al. Simulated binding of transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and topological domains. Nucleic Acids Res. 2016;44:3503–3512. doi: 10.1093/nar/gkw135.
- 42. Lüscher C, Malenka RC. Drug-evoked synaptic plasticity in addiction: from molecular changes to circuit remodeling. Neuron. 2011;69:650–663. doi: 10.1016/j.neuron.2011.01.017.
- 43. Lüscher C, Ungless M. The mechanistic classification of addictive drugs. PLoS Med. 2006;3:e437. doi: 10.1371/journal.pmed.0030437.
- 44. Monahan K, Horta A, Lomvardas S. LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice. Nature. 2019;565:448–453. doi: 10.1038/s41586-018-0845-0.
- 45. Tan L, et al. Changes in genome architecture and transcriptional dynamics progress independently of sensory experience during post-natal brain development. Cell. 2021;184:741–758. doi: 10.1016/j.cell.2020.12.032.
- 46. Ansoleaga B, et al. Dysregulation of brain olfactory and taste receptors in AD, PSP and CJD, and AD-related model. Neuroscience. 2013;248:369–382. doi: 10.1016/j.neuroscience.2013.06.034.
- 47. Peric-Hupkes D, et al. Molecular maps of the reorganization of genome–nuclear lamina interactions during differentiation. Mol. Cell. 2010;38:603–613. doi: 10.1016/j.molcel.2010.03.016.
- 48. Bizhanova A, Yan A, Yu J, Zhu LJ, Kaufman PD. Distinct features of nucleolus-associated domains in mouse embryonic stem cells. Chromosoma. 2020;129:121–139. doi: 10.1007/s00412-020-00734-9.
- 49. Valdez BC, et al. Identification of the nuclear and nucleolar localization signals of the protein p120. J. Biol. Chem. 1994;269:23776–23783.
- 50. Sawamoto K, et al. Generation of dopaminergic neurons in the adult brain from mesencephalic precursor cells labeled with a nestin–GFP transgene. J. Neurosci. 2001;21:3895–3903. doi: 10.1523/JNEUROSCI.21-11-03895.2001.
- 51. Matsushita N, et al. Dynamics of tyrosine hydroxylase promoter activity during midbrain dopaminergic neuron development. J. Neurochem. 2002;82:295–304. doi: 10.1046/j.1471-4159.2002.00972.x.
- 52. Falcão AM, et al. Disease-specific oligodendrocyte lineage cells arise in multiple sclerosis. Nat. Med. 2018;24:1837–1844. doi: 10.1038/s41591-018-0236-y.
- 53. Matsuoka T, et al. Neural crest origins of the neck and shoulder. Nature. 2005;436:347–355. doi: 10.1038/nature03837.
- 54. Sousa VH, et al. Characterization of Nkx6-2-derived neocortical interneuron lineages. Cereb. Cortex. 2009;19:i1–i10. doi: 10.1093/cercor/bhp038.
- 55. Jaitner C, et al. Satb2 determines miRNA expression and long-term memory in the adult central nervous system. eLife. 2016;5:e17361. doi: 10.7554/eLife.17361.
- 56. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 57. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 58. Dixon JR, et al. Chromatin architecture reorganization during stem cell differentiation. Nature. 2015;518:331–336. doi: 10.1038/nature14222.
- 59. Phillips-Cremins JE, et al. Architectural protein subclasses shape 3-D organization of genomes during lineage commitment. Cell. 2013;153:1281–1295. doi: 10.1016/j.cell.2013.04.053.
- 60. Lex A, et al. UpSet: visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 2014;20:1983–1992. doi: 10.1109/TVCG.2014.2346248.
- 61. Ying QL, Stavridis M, Griffiths D, Li M, Smith A. Conversion of embryonic stem cells into neuroectodermal precursors in adherent monoculture. Nat. Biotechnol. 2003;21:183–186. doi: 10.1038/nbt780.
- 62. Ferrai C, et al. RNA polymerase II primes Polycomb-repressed developmental genes throughout terminal neuronal differentiation. Mol. Syst. Biol. 2017;13:946. doi: 10.15252/msb.20177754.
- 63. Kolodziejczyk AA, et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell. 2015;17:471–485. doi: 10.1016/j.stem.2015.09.011.
- 64. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 65. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 66. Aronesty E. ea-utils: command-line tools for processing biological sequencing data. 2011. https://expressionanalysis.github.io/ea-utils/.
- 67. Kar G, et al. Flipping between Polycomb repressed and active transcriptional states introduces noise in gene expression. Nat. Commun. 2017;8:36. doi: 10.1038/s41467-017-00052-2.
- 68. Marques S, et al. Oligodendrocyte heterogeneity in the mouse juvenile and adult central nervous system. Science. 2016;352:1326–1329. doi: 10.1126/science.aaf6463.
- 69. Zeisel A, et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015;347:1138–1142. doi: 10.1126/science.aaa1934.
- 70. La Manno G, et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell. 2016;167:566–580. doi: 10.1016/j.cell.2016.09.027.
- 71. Macosko EZ, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–1214. doi: 10.1016/j.cell.2015.05.002.
- 72. McInnes L, Healy J, Saul N, Großberger L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 2018. doi: 10.21105/joss.00861.
- 73. Li H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 74. Ramírez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 75. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 76. Picelli S, et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 2014;24:2033–2040. doi: 10.1101/gr.177881.114.
- 77. Corces MR, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods. 2017;14:959–962. doi: 10.1038/nmeth.4396.
- 78. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098.
- 79. Granja JM, et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 2021;53:403–411. doi: 10.1038/s41588-021-00790-6.
- 80. Cusanovich DA, et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell. 2018;174:1309–1324. doi: 10.1016/j.cell.2018.06.052.
- 81. Sinnamon JR, et al. The accessible chromatin landscape of the murine hippocampus at single-cell resolution. Genome Res. 2019;29:857–869. doi: 10.1101/gr.243725.118.
- 82. Zambon AC, et al. GO-Elite: a flexible solution for pathway and ontology over-representation. Bioinformatics. 2012;28:2209–2210. doi: 10.1093/bioinformatics/bts366.
- 83. Barbieri M, et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl Acad. Sci. USA. 2012;109:16173–16178. doi: 10.1073/pnas.1204799109.
- 84. Chiariello AM, Annunziatella C, Bianco S, Esposito A, Nicodemi M. Polymer physics of chromosome large-scale 3D organisation. Sci. Rep. 2016;6:29775. doi: 10.1038/srep29775.
- 85. Fiorillo L, et al. Inference of chromosome 3D structures from GAM data by a physics computational approach. Methods. 2020;181–182:70–79. doi: 10.1016/j.ymeth.2019.09.018.
- 86. Plimpton S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 1995;117:1–19.
- 87. Kremer K, Grest GS. Dynamics of entangled linear polymer melts: a molecular-dynamics simulation. J. Chem. Phys. 1990;92:5057–5086.
- 88. Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods. 2012;9:676–682. doi: 10.1038/nmeth.2019.
- 89. Kulakovskiy IV, et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 2018;46:D252–D259. doi: 10.1093/nar/gkx1106.
- 90. Larose DT, Larose CD. Discovering Knowledge in Data: An Introduction to Data Mining. 2nd edn. Wiley; 2014.
- 91. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 2019;9:5233. doi: 10.1038/s41598-019-41695-z.
- 92. Kelly ST. leiden: R implementation of the Leiden algorithm. R package version 0.3.3. 2019. https://github.com/TomKellyGenetics/leiden.
- 93. Pombo A, Edwards PAW, Nicodemi M, Scialdone A, Beagrie RA. Genome architecture mapping. International patent PCT/EP2015/079413. 2015.
Data Availability Statement
Raw fastq sequencing files for all samples from the DN, PGN and OLG GAM datasets, together with non-normalized co-segregation matrices, normalized pairwise chromatin contact maps and raw GAM segregation tables, are available from the GEO repository under accession number GSE148792. Raw fastq sequencing files for the mES cell GAM datasets are available from the 4DN data portal (https://data.4dnucleome.org/). The 4DN sample IDs for all samples used in the study are listed in Supplementary Table 1. All polymer model 3D structures produced for the analyses in this work are available in Supplementary Tables 9 and 10. Raw confocal and laser microdissection images, as well as images and ROIs for cryo-FISH experiments, are available at https://github.com/pombo-lab/WinickNg_Kukalev_Harabula_Nature_2021/tree/main/microscopy_images/.
Raw single-cell mES cell transcriptome data are available from the ENA data portal (https://www.ebi.ac.uk/ena/browser/home). The ENA sample IDs for all samples used in the study are listed in Supplementary Table 15. Position-sorted BAM files for ATAC-seq data from mES cells and DNs, together with processed bigwig files, are available from the GEO repository under accession number GSE174024. A public UCSC session with all data produced in this study, as well as all published data used in it, is available at http://genome-euro.ucsc.edu/s/Kjmorris/Winick_Ng_2021_GAMbrainpublicsession. Source data are provided with this paper.
Processing and plotting scripts for MELTRON and insulation scores are available at https://github.com/pombo-lab/Meltron/. Processing and plotting scripts for the trans–cis contact ratios are available at https://github.com/pombo-lab/GAM_trans_cis_ratio/. Custom Python and R scripts for GAM window calling, GAM quality control, GAM genome sampling quality and resolution, production of NPMI matrices, aggregated maps, k-means clustering, calculation of insulation scores and compartment calling are deposited at https://github.com/pombo-lab/WinickNg_Kukalev_Harabula_Nature_2021/tree/main/code/.