Abstract
Cell fate decision involves rewiring of the genome, but remains poorly understood at the chromatin level. Here, we report that the chromatin remodeling complex NuRD participates in closing open chromatin in the early phase of somatic reprogramming. Sall4, Jdp2, Glis1 and Esrrb can reprogram MEFs to iPSCs efficiently, but only Sall4 is indispensable and capable of recruiting endogenous components of NuRD. Yet knocking down NuRD components only reduces reprogramming modestly, in contrast to disrupting the known Sall4-NuRD interaction by mutating or deleting the NuRD-interacting motif at the N-terminus of Sall4, which renders Sall4 unable to reprogram. Remarkably, these defects can be partially rescued by grafting the NuRD-interacting motif onto Jdp2. Further analysis of chromatin accessibility dynamics demonstrates that the Sall4-NuRD axis plays a critical role in closing the open chromatin in the early phase of reprogramming. The chromatin loci closed by Sall4-NuRD include those encoding genes resistant to reprogramming. These results identify a previously unrecognized role of NuRD in reprogramming, and may further illuminate chromatin closing as a critical step in cell fate control.
Subject terms: Reprogramming, Pluripotent stem cells, Chromatin remodelling
Somatic reprogramming involves both transcriptional and epigenetic resetting, but this process is not yet fully understood. Here, the authors show that Jdp2, Glis1, Esrrb, and Sall4 can mediate reprogramming by recruiting the NuRD complex to close chromatin, highlighting a potential role in cell fate control.
Introduction
Pluripotent stem cells (PSCs) can be derived from the inner cell mass of blastocyst-stage embryos1 or induced from somatic cells using a defined cocktail of factors such as Oct4, Sox2, Klf4, and c-Myc (OSKM)2. Induced pluripotent stem cells (iPSCs) are functionally indistinguishable from embryonic stem cells (ESCs) and exhibit remarkable developmental plasticity, capable of giving rise to all cell types of an organism except for extraembryonic tissues. The establishment and maintenance of PSCs is attributed to their unique chromatin structure and the transcriptional regulatory network governed by core transcription factors such as Oct4, Sox2, Nanog, Esrrb, and Sall43.
Reprogramming somatic cells to pluripotent stem cells or a particular cell type of interest represents a new paradigm for both basic biological sciences and translational research2,4–8. Classic Yamanaka reprogramming (OSKM) has provided conceptual understanding of the reprogramming process through various experimental conditions9,10. The fact that reprogramming can be achieved through non-Yamanaka methods such as combinations of factors or chemicals suggests that reprogramming can start at divergent points or pathways11–16, but eventually converge to a common path towards pluripotency, thus, unifying the classic Yamanaka system with all these alternatives. This unified theory is consistent with earlier reports on mesenchymal-epithelial transition (MET), barriers, and epigenetic regulations17–19. Extensively studied chromatin-based barriers to transcription factor mediated reprogramming include repressive chromatin factors and factors associated with active transcription20. Repressive factors are associated with actively transcribed loci in somatic cells, such as H3K4/36/79 methylation21–23, SUMOylation24, Facilitates Chromatin Transcription (FACT)25, and Poly (ADP-Ribose) Polymerase 1 (PARP1)26. Genetic screening identified numerous barriers with repressive functions that act on pluripotent-associated genes, including DNA Methyltransferases (DNMT)27, Nucleosome Remodeling and histone Deacetylation (NuRD)28, histone deacetylases (HDACs)29, H3K9 methylation30, and chromatin assembly factor 1 (CAF1)31. Among the barriers identified, the NuRD complex appears to be quite controversial32–34. The NuRD complex consists of seven core components, each with multiple paralogues, including HDAC1/2, CHD3/4, RbAp46/48, MBD2/3, GATAD2a/b, and MTA1/2/3. These components perform various functions such as histone deacetylation, ATP-dependent chromatin remodeling, histone chaperoning, CpG-binding, DNA-binding, and transcriptional regulation35. 
Nearly 100% reprogramming efficiency has been reported upon deletion of MBD3, a component of NuRD28. This claim was disputed by others, who showed that the role of NuRD is context dependent, e.g., acting as a positive regulator in one reprogramming system but as a negative regulator in others32,36,37.
The generation of iPSCs is known to involve direct interactions between reprogramming factors and chromatin regulators in order to overcome epigenetic barriers. Oct4 can interact with components of the SWI/SNF complex, which enhances reprogramming by facilitating OCT4 binding to target promoters38. Additionally, the interaction between Nanog and Sin3a allows for a synergistic transcriptional program that activates pluripotency genes and overcomes reprogramming barriers39. Of note, Sall4 is crucial for early embryonic development, and mice without Sall4 cannot survive beyond E6.540–42. Sall4 has been shown to have a positive role in generating iPSCs from somatic cells and can replace OSKM when overexpressed with other factors during reprogramming43,44. The establishment and maintenance of pluripotency depend on the interaction of Sall4 with various proteins. Sall4 has a conserved N-terminal 12-amino-acid motif (N12-aa) that interacts with the NuRD complex and is also found in Sall1 and Sall345. Additionally, Sall4 plays a crucial role in maintaining the undifferentiated state of PSCs by binding to Oct4, Sox2, and Nanog. It acts as a transcriptional repressor by interacting with LSD1 and DNMTs. Furthermore, a collaborative role between Sall4 and TET proteins was discovered in the stepwise oxidation of DNA methylation in ESCs46. Which of these protein partners define the molecular function of Sall4 remains unknown. While an inhibitory role of NuRD in reprogramming has been reported in many studies, Sall4 has been found to facilitate reprogramming. However, how Sall4 integrates with the transcriptional repressor NuRD complex to coordinately regulate the establishment of pluripotency remains to be investigated.
Here, we present evidence that the NuRD complex plays a critical role in efficient reprogramming driven by Jdp2, Glis1, Esrrb and Sall4 (JGES). Sall4 recruits the NuRD complex to open chromatin in MEFs to ensure the closure of somatic loci. This recruitment is dependent on the N-terminal motif of Sall4 and can be transferred to an unrelated factor such as Jdp2. Our results suggest that the NuRD complex plays a positive role in the early phase of reprogramming by closing chromatin in MEFs.
Results
JGES reprogramming through a Sall4-NuRD axis
The 7F (Jdp2, Esrrb, Sall4, Nanog, Glis1, Kdm2b, Mkk6) system we described previously achieved higher efficiency than the classic OSKM system, but remains cumbersome for mechanistic studies47. We therefore embarked on an optimization process that eventually gave rise to a 4F system: Jdp2, Glis1, Esrrb and Sall4 (JGES). We first optimized the culture medium, starting with iCD1 (iPS Chemically Defined medium 1) and testing each component through dropout experiments (Supplementary Fig. 1a), which identified LiCl as detrimental and confirmed vitamin C and LIF as critical (Supplementary Fig. 1b). Then, we performed a screen for chemicals that can boost 7F reprogramming (Supplementary Fig. 1c) and identified two additional chemicals, GSK-LSD1-2HCL and SGC0946, capable of further improving 7F reprogramming (Supplementary data 1). Together with a ROCK inhibitor previously shown to enhance 7F reprogramming, we formulated iCD3 (iPS Chemically Defined medium 3) (Supplementary Fig. 1d) and show that it improves 7F reprogramming by about 100%, generating 12–13 colonies per 150 MEFs (~8%) in 7 days (Supplementary Fig. 1e, f). To reduce the number of factors in 7F, we performed a dropout experiment for each factor and show that each factor appears to contribute significantly to 7F reprogramming in iCD3, although to varying degrees (Supplementary Fig. 1g). We then show that JGES reprogram MEFs to iPSCs at an efficiency comparable to 7F under iCD1 (Supplementary Fig. 1h, i). We then picked colonies from JGES to establish iPSC clones and tested their transgene integration. As shown in Supplementary Fig. 1j, iPSC clones from JGES contain those 4 genes, but neither the classic Yamanaka factors OSKM nor Nanog, Kdm2b and Mkk6 from 7F (Supplementary data 2, 3). The JGES clones can be passaged stably (Supplementary Fig. 1k), possess normal karyotypes (Supplementary Fig. 1l) and can generate chimeras upon blastocyst injection that undergo germline transmission (Supplementary Fig. 1m). We also demonstrate that Oct4-GFP positive clones picked at day 7 can give rise to chimeras with germline transmission without further passaging (Supplementary Fig. 1n). Thus, JGES, like the classic OSKM, may be suitable for mechanistic studies.
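The efficiency figure quoted for iCD3 follows from simple arithmetic. The sketch below is only an illustration and assumes that efficiency is defined as the number of colonies divided by the number of MEFs seeded; the function name is hypothetical and does not come from the original methods.

```python
def reprogramming_efficiency(colonies: float, cells_seeded: int) -> float:
    """Return colonies per seeded cell, expressed as a percentage.

    Assumption: efficiency = colonies / cells seeded (illustrative definition).
    """
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return 100.0 * colonies / cells_seeded


# Colony counts and cell number as quoted in the text for 7F in iCD3.
for colonies in (12, 13):
    efficiency = reprogramming_efficiency(colonies, 150)
    print(f"{colonies} colonies / 150 MEFs = {efficiency:.1f}%")
```

Both ends of the reported colony range are consistent with the rounded value of ~8% given in the text.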
We asked whether each of the JGES factors contributes to reprogramming equally, as in OSKM. We assessed this by performing drop-out experiments and show that, surprisingly, while dropping J, G, or E individually weakens reprogramming significantly, Sall4 appears to be more important than the other three, as its removal reduces reprogramming to nearly zero (Fig. 1a). Consistently, in the bulk RNA-seq datasets from the drop-out experiments in Fig. 1a, JGE is the only condition that diverges towards the left, whereas all others move to the right towards ESCs along the PC1 axis (Fig. 1b). Based on these results, we decided to focus on Sall4 for further mechanistic analysis.
Because SALL4 has been shown to interact with many proteins, we hypothesized that Sall4 must engage a critical cellular component to overcome a major barrier during reprogramming. To identify such partner(s), we performed IP-Mass with an anti-SALL4 antibody on MEFs infected with JGES and show by pairwise comparison that SALL4 co-purifies with canonical subunits of the NuRD complex (Fig. 1c, Supplementary data 4), suggesting that collaborative interactions exist between SALL4 and the NuRD complex.
NuRD has been implicated in reprogramming in previous studies32–34,48,49, but with mixed results and divergent mechanistic explanations. To investigate its role in JGES reprogramming, we performed knock-down experiments on NuRD subunits and show that, among all 13 canonical subunits, knocking down Gatad2b/2a and Chd4 significantly reduces Oct4-GFP+ colonies (Fig. 1d, Supplementary Fig. 2a, Supplementary data 5, 6). Principal component analysis (PCA) shows a delay in the transition from the somatic state to the pluripotent state with shGatad2b (yellow) or shChd4 (blue) compared to control (green). Although one of the shChd4 samples on day 7 resembles shLuciferase, another shChd4 sample and two shGatad2b samples on day 7 lie farther from ESCs than shLuciferase at day 5. Compared to shChd4 samples at day 7, shGatad2b samples at day 7 are much farther away from ESCs, suggesting that depleting Gatad2b in MEFs restrains the conversion of MEFs to the pluripotent state (Fig. 1e). Intriguingly, comparative transcriptome analysis revealed that genes with altered expression in shGatad2b or shChd4 are associated with response to interferon-beta, striated muscle contraction, and extracellular matrix organization (Fig. 1f). Additionally, we found that many genes, such as Rasa3, Bicc1, and Tmem98, failed to be downregulated when Gatad2b was knocked down (Supplementary Fig. 2b). Together, these results suggest that Sall4 needs assistance from subunits of the NuRD complex during JGES reprogramming.
NuRD mediates chromatin closing of somatic loci
Given the mixed roles of NuRD reported in earlier reprogramming studies, we sought to resolve its role in JGES reprogramming by using ATAC-seq to analyze chromatin accessibility dynamics (CAD). Consistent with the reduction in reprogramming efficiency, we observed that shGatad2b and shChd4 impact CAD quite dramatically, altering CO (close to open) and OC (open to close) peaks (Fig. 1g). For example, the numbers of CO1, CO2, and CO3 peaks are much higher in shGatad2b and shChd4 than in shLuciferase, suggesting that many loci are opened improperly in shGatad2b and shChd4 at reprogramming day 0, day 1 and day 3. The numbers of OC4 and OC5 peaks are also much higher in shGatad2b and shChd4 than in shLuciferase, indicating that more loci fail to be closed in shGatad2b and shChd4 at reprogramming day 3 and day 5 (Supplementary Fig. 2c). Specifically, only 45.45% and 24.96% of loci in CO2 of shGatad2b and shChd4, respectively, overlap with those of shLuciferase, while 59.06% and 68.12% of loci in OC1 change as expected (Supplementary Fig. 2d, e). We counted the genes whose gene body or promoter was located within ATAC-seq peak regions (Supplementary Fig. 2f, g). Consistent with the chromatin accessibility results, hundreds of additional genes were enriched upon Gatad2b or Chd4 knockdown compared to the control. Interestingly, compared to the common and shLuciferase parts, fewer genes were enriched from OC1 and OC2 but more genes were found in OC3-OC6 upon Gatad2b knockdown. Similar results were obtained for Chd4 knockdown reprogramming (Supplementary Fig. 2f, g). These results indicate that Gatad2b/Chd4 knockdown delays many gene regions from becoming inaccessible during reprogramming. Moreover, when analyzing Sall4 and Gatad2b occupancy and H3K27ac modification at OC regions, Sall4 and Gatad2b displayed a higher binding density at shLuciferase than at shGatad2b- and shChd4-specific regions. However, an increased H3K27ac signal was found upon Gatad2b or Chd4 knockdown (Supplementary Fig. 2h, i). These data support that the NuRD complex is involved in the inactivation of the somatic program early in JGES reprogramming. To further analyze the dynamics of progressive chromatin closing, we defined gradually closed regions (GCRs) by subtracting ATAC-seq signals between adjacent stages, with differences below a threshold that was set at 0.05 multiplied by the range of the normalized ATAC-seq signal. Knocking down either Chd4 or Gatad2b appears to slow the closing of somatic loci (Supplementary Fig. 2j), as measured by the normalized signal intensity of the GCRs. The GCRs are dominated by transcription factor motifs for AP-1, ETS, and TEAD family genes such as JUNB, ETV1, and TEAD1 (Supplementary Fig. 2k).
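The GCR definition can be illustrated with a short sketch. This is not the pipeline used in the study, and the text leaves the exact rule open. The sketch assumes one possible reading: the threshold is 0.05 times the global range of the normalized signal, a locus must never re-open between adjacent stages by as much as the threshold, and its net loss from the first to the last stage must exceed the threshold. The function name, the data layout and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def gradually_closed_regions(
    signal: Dict[str, Sequence[float]], factor: float = 0.05
) -> List[str]:
    """Call loci whose normalized ATAC-seq signal declines across ordered stages.

    Assumed rule (one reading of the text, not the authors' code):
      threshold = factor * (max - min) of all normalized signal values;
      every adjacent-stage change (later - earlier) stays below the threshold;
      the net loss from first to last stage exceeds the threshold.
    """
    values = [v for track in signal.values() for v in track]
    threshold = factor * (max(values) - min(values))
    called = []
    for locus, track in signal.items():
        steps = [later - earlier for earlier, later in zip(track, track[1:])]
        no_reopening = all(step < threshold for step in steps)
        net_closing = (track[0] - track[-1]) > threshold
        if no_reopening and net_closing:
            called.append(locus)
    return called


# Toy normalized signals over five ordered stages (hypothetical values).
toy = {
    "locus_a": [1.0, 0.8, 0.5, 0.2, 0.0],  # closes step by step
    "locus_b": [0.2, 0.9, 0.3, 0.8, 0.1],  # fluctuates, re-opens twice
    "locus_c": [0.5, 0.5, 0.5, 0.5, 0.5],  # unchanged
}
print(gradually_closed_regions(toy))
```

Under this reading only loci with a monotone-like decline are retained. A stricter or looser rule would change which loci are called, so the thresholding should be checked against the original methods before reuse.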
We then probed the Sall4-NuRD axis via their known interaction through the N-terminal 12 residues50 by mutating individual residues (Fig. 2a) and show that the Sall4P9A, Sall4R3A, Sall4R4A, and Sall4K5A mutations render Sall4 unable to reprogram (Fig. 2b). We then focused on Sall4K5A and show by IP-Mass and co-IP that this mutant fails to interact with NuRD (Fig. 2c, d, processed data provided in Supplementary data 1). Consistently, this mutant also fails to mediate the requisite transcriptomic and chromatin reprogramming as measured by RNA- and ATAC-seq, respectively (Supplementary Fig. 3a, b).
Unbiased analysis of the proteomics data also shows that, apart from NuRD, Sall4K5A and Sall4WT share many protein partners (Fig. 2e, details provided in Supplementary data 1). Gene ontology analysis shows that the common proteins are associated with DNA repair (Supplementary Fig. 3c). To validate the function of those proteins, we knocked down Hmbox1, Tfam, Parp1, Lig3, and Kpna4 by shRNA and observed a decrease in reprogramming efficiency (Supplementary Fig. 3d, e), suggesting that Sall4 engages many partners besides NuRD to facilitate reprogramming. Yet since Sall4K5A is almost totally ineffective, we conclude that the NuRD-Sall4 axis may play a dominant role in JGES reprogramming.
The multiple zinc fingers of Sall4, organized into zinc finger clusters (ZFCs), have unique roles in regulating downstream target gene expression46,51,52. To determine their contribution to JGES reprogramming, we generated mutations or deletions as detailed in Supplementary Fig. 3f. Consistent with the point mutation data described above, deleting N12 abolishes Sall4-dependent reprogramming, while other mutations have either no effect (ZF1) or limited impact (C420A, ZFC2-4) (Supplementary Fig. 3g, h), confirming that Sall4 mediates reprogramming primarily through its N12 domain, which engages NuRD, and secondarily through ZFC2-4, which likely engages the above-mentioned factors involved in DNA repair. As such, we continued to focus on the Sall4-NuRD complex in this study.
Whereas shChd4 and shGatad2b cause a ~50% reduction in iPSC colonies (Fig. 1d), Sall4K5A, like Sall4P9A, Sall4R3A, and Sall4R4A, causes a ~100% reduction (Fig. 2b), offering the opportunity to assess the role of the Sall4-NuRD axis in mediating chromatin closing. We therefore performed ATAC-seq on MEFs undergoing reprogramming with Sall4WT vs Sall4K5A and compared the patterns of open and closed chromatin. A total of 9344 ATAC-seq peaks can be classified into 6 clusters according to chromatin accessibility (Fig. 2f). Among the 6 clusters, more than 2/3 of regions are open in Sall4K5A but closed in Sall4WT (C4, n = 4573 and C6, n = 1904). In addition, two interesting clusters exhibit a loss of accessibility in Sall4K5A but progressively become accessible (C5) or inaccessible (C3) in Sall4WT. Examining the expression of genes whose promoters are located within ATAC-seq peaks in each cluster indicates that transcription patterns match those of chromatin accessibility (Fig. 2g). For regions that are more accessible in Sall4WT, we observed higher gene expression in Sall4WT than in Sall4K5A, for genes such as Mas1, Peg10, and Pkd1 (Supplementary Fig. 3i, j). In contrast, for regions in which accessibility is established in Sall4K5A but which remain inaccessible in Sall4WT, there was a more significant increase in gene expression in Sall4K5A than in Sall4WT, for genes such as Bicc1, Fmo1, and Sox5 (Supplementary Fig. 3i, j). Gene ontology analysis of genes associated with distinct clusters showed that the C4 and C6 loci correspond to those related to somatic cell maintenance and differentiation (e.g., regulation of mesenchymal stem cell differentiation), while the C5 loci are associated with cell cycle phase transition (Supplementary Fig. 3k).
Motif enrichment for each cluster shows that enriched motifs are quite different between Sall4WT and Sall4K5A. For example, motifs from ETS (ETS1) and HOMEBOX family (Lhx3, Lhx1, Dlx1, Dlx3) members are specifically enriched in C6 and C4, respectively. Motifs for ETS and FOX family (FoxK2, FoxO3) members are found in both C6 and C4. Moreover, motifs for TFs from the AP-1 family such as Fosl2, Fra1/2, c-Jun, and JunB are also present in C6 (Fig. 2h). These results are entirely consistent with our earlier findings that somatic gene loci enriched with somatic state specific TFs from AP-1 and ETS family members are barriers for reprogramming10,53,54. Several TFs, such as FoxK2 and FoxO3, have already been shown to significantly inhibit iPSC induction55. Lhx3 and Dlx1/2 selectively drive fibroblasts to distinct subtypes of neurons6,56–58. On the other hand, motif enrichment for C5 shows that motifs for pluripotent TFs such as OCT4/6, KLF4, and SOX17/21 are only found in Sall4WT but not in Sall4K5A. These results suggest that the interaction between Sall4 and the NuRD complex is required to reconfigure the chromatin architecture for reprogramming.
Consistent with the failure to close somatic loci, we show by RNA-seq that genes in G3 fail to be downregulated and are related to the MEF somatic state (extracellular matrix organization) (Fig. 2i). Furthermore, Sall4K5A appears to divert cell fate towards innate immunity, such as interferon-beta and complement activation, in G6 (Fig. 2i). Coincidentally, the GO terms observed in the Sall4K5A groups are also found after depletion of Gatad2b and Chd4 during JGES reprogramming (Fig. 1f). Consistently, by comparing the RNA-seq data in each group (Fig. 1f) with the gene sets upregulated or downregulated in Sall4K5A, we show statistically significant concordance between shGatad2b/Chd4 and Sall4K5A (Supplementary Fig. 3l). Together, these results suggest that the Sall4-NuRD axis is important for closing somatic chromatin loci during reprogramming.
Sall4 K5A mutation results in less occupancy of Gatad2b and increased H3K27ac at somatic loci
To test whether the failure to close somatic loci associated with Sall4K5A is dependent on histone modification, we performed CUT&Tag with anti-SALL4 and anti-H3K27ac antibodies on MEFs infected with Sall4WT or Sall4K5A at day 1 of reprogramming. Quantification of the SALL4 CUT&Tag signal showed that over 75% (67598) of Sall4K5A peaks overlap with those of Sall4WT, indicating that Sall4WT and Sall4K5A occupy similar loci (Fig. 3a). When mapping the global occupancy of H3K27ac from the CUT&Tag dataset, we identified a total of 48913 peaks, of which 88% (43894) are present in both Sall4WT and Sall4K5A (Fig. 3b). We therefore focused on the 67598 common target peaks from the Sall4WT and Sall4K5A CUT&Tag data to evaluate the levels of H3K27ac. Compared to the Sall4WT control, the H3K27ac density in Sall4K5A could be divided into three basic groups: decreased, increased, and no change (Fig. 3c). In 72% (48654) of the loci, the level of H3K27ac remains similar regardless of WT or K5A. However, 14.5% (9828) of the loci have higher H3K27ac density in Sall4K5A reprogramming. For regions with elevated H3K27ac in Sall4K5A, we observed a gradual loss of chromatin accessibility from day 1 to the end of reprogramming (Fig. 3d). Of note, compared to Sall4WT, both the average level of H3K27ac and chromatin accessibility are much higher in Sall4K5A samples (Fig. 3d, e). Consistent with this observation, many somatic gene loci engaged by Sall4 (Zeb1, Tgfbr3, Htra1 and Nrp2) are associated with higher levels of H3K27ac in Sall4K5A (Fig. 3f). On the other hand, reduced levels of H3K27ac and chromatin accessibility were found in Sall4K5A at several pluripotent gene loci (Sox2, Tead4, and Wnt6) (Supplementary Fig. 4a–c), suggesting that certain chromatin regions need to be opened and activated during this process. Using the same motif enrichment method as above, we found that motifs for somatic TFs, such as JunB, Fra1/2, Fosl2, and BATF, are enriched in the no change group. Interestingly, much more significant enrichment of those motifs was observed in the H3K27ac elevated group (Supplementary Fig. 4d), suggesting that without NuRD, Sall4K5A occupies loci with high H3K27ac and consequently fails to close them properly.
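The group proportions quoted for H3K27ac at the common SALL4 peaks can be checked with simple arithmetic. The sketch below uses only the counts stated in the text. The size of the decreased group is not reported explicitly and is derived here under the assumption that the three groups partition the 67598 common peaks; the variable and function names are illustrative.

```python
# Counts taken from the text.
common_peaks = 67598   # SALL4 peaks shared by Sall4WT and Sall4K5A
unchanged = 48654      # H3K27ac similar in WT and K5A ("no change")
increased = 9828       # H3K27ac higher in Sall4K5A

# Assumption: the three groups partition the common peaks exactly,
# so the decreased group is the remainder (not stated in the text).
decreased = common_peaks - unchanged - increased


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


for label, count in (("no change", unchanged),
                     ("increased", increased),
                     ("decreased", decreased)):
    print(f"{label}: {count} loci ({percent(count, common_peaks):.1f}%)")
```

The computed shares for the no change and increased groups agree with the 72% and 14.5% given in the text.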
To explore transcriptional changes regulated by Sall4, genes located within an increased or decreased region were considered putative annotated nearest genes (Fig. 3g). Integrating this analysis with RNA-seq, we show that annotated nearest genes from increased and decreased regions correspond predominantly to RNA-seq group 3 and group 2, respectively (Fig. 3h). Gene ontology analysis of G3 and G2 shows that the increased-G3 genes are related to the somatic state (e.g., angiogenesis, wound healing, and extracellular matrix organization), while the decreased-G2 genes are largely associated with the pluripotent state (Fig. 3i, j). A similar analysis of genes located closest to Sall4K5A-specific CUT&Tag peaks showed that Sall4K5A-bound genes largely overlap with groups 3 and 6 (Supplementary Fig. 4e). Gene ontology analysis of the overlapping genes revealed that Sall4K5A-G3 and Sall4K5A-G6 are associated with somatic features such as extracellular matrix organization and cell adhesion (Supplementary Fig. 4f). Together, these results suggest that Sall4 plays a crucial role in the transcriptional regulation of various important biological processes during reprogramming.
The NuRD complex cooperates with transcription factors to regulate gene expression at the chromatin level through ATP-dependent chromatin remodeling and histone deacetylase activities. The modification of histone tails, such as H3K27ac, is tightly coupled to chromatin accessibility and gene expression. Thus, dissociation of the NuRD complex from Sall4K5A can result in failure of NuRD-mediated deacetylation of H3K27 at somatic gene loci. We then asked whether the high levels of H3K27ac in Sall4K5A are due to the lack of NuRD subunits. To this end, we performed CUT&Tag experiments with an anti-Gatad2b antibody during Sall4WT and Sall4K5A reprogramming (Supplementary Fig. 4g). First, we analyzed the genome-wide occupancy of the Sall4 CUT&Tag dataset in Sall4WT and Sall4K5A, generating three groups: regions bound by both WT and K5A (common), regions bound specifically by K5A (SK5A speci) and regions bound specifically by WT (SWT speci) (Fig. 3k). Each group was then subdivided into three subgroups according to the GATAD2B CUT&Tag dataset in the WT or K5A condition (Fig. 3k). Among genomic regions occupied by both Sall4WT and Sall4K5A, a large number of Gatad2b-bound peaks (7199) were identified only in Sall4WT reprogramming and were not bound by Gatad2b in Sall4K5A. A similar Gatad2b binding pattern was also found among Sall4WT-specific regions (463) (Fig. 3l). The weak signals at sites occupied by Gatad2b in the Sall4K5A condition reveal that the genomic binding ability of Gatad2b is partly compromised when Sall4 dissociates from the NuRD complex. Again, those loci are highly enriched in motifs for FRA1, ATF3, JUNB, BATF, and JUN, all of which are AP-1 TFs. These results suggest that disruption of the Sall4-NuRD axis leads to mislocalization of NuRD and failure to close somatic loci enriched with AP-1 TFs (Supplementary Fig. 4h).
To further identify candidates, we compared annotated nearest genes within each cluster and found overlaps between annotated nearest genes and groups 3 and 6 (Supplementary Fig. 4i, j). Gene ontology analysis shows that the overlapping genes in Gatad2b lost-G3 and Gatad2b lost-G6 are associated with angiogenesis and transmembrane transport, respectively (Supplementary Fig. 4k). To further validate our predictive analysis and narrow down the candidates, we identified 16 overlapping genes by selecting the genes common to group1 of the shGatad2b RNA-seq and Gatad2b lost-G3 (Supplementary Fig. 4l). Those genes are rapidly downregulated after JGES induction but remain more highly expressed upon Gatad2b knockdown or with Sall4K5A. When we overexpressed 15 of the candidates during JGES reprogramming, 11 of them significantly decreased reprogramming efficiency (Supplementary Fig. 4m), highlighting the requirement of the NuRD complex for inactivation of the somatic program during JGES reprogramming.
Next, to ask whether the above sites exist in an open chromatin conformation with active histone marks and ATAC-seq signal, we compared loci with elevated H3K27ac (9828) with those that lost Gatad2b binding (7199 + 463), identified a total of 610 common loci, and found a reciprocal relationship between Gatad2b and H3K27ac (Fig. 3m, green vs red). Motif enrichment analysis demonstrates that those loci are mostly bound by TFs such as AP-1, Atf3, JunB and BATF (Supplementary Fig. 4n). Further analysis of chromatin accessibility dynamics shows that the 610 loci were accessible in MEFs, became more accessible at day 0 and day 1, then progressively lost accessibility and became inaccessible in ESCs. Of note, an increase in ATAC-seq signal was observed in Sall4K5A on day 1 (Fig. 3n). These results suggest that the NuRD complex is involved in the closing of somatic chromatin. To determine gene expression in the 610 regions, we performed an integrative analysis of CUT&Tag and RNA-seq data. We then focused on genes that fail to be downregulated in G3 and are abnormally activated in G6 during Sall4K5A reprogramming (Fig. 2i). There were 39 genes in G3 and 13 genes in G6 accompanied by elevated H3K27ac and loss of Gatad2b (Supplementary Fig. 4o). Interestingly, Tgfb3, Runx1, Chd3 and Rasa3 were identified with a slower inactivation pattern in Sall4K5A than in Sall4WT (Supplementary Fig. 4p). Based on the slow inactivation of somatic genes such as Rasa3, Htra1, and Fzd2 in either shGatad2b or Sall4K5A, we propose that these may represent a new class of barrier genes for cellular reprogramming. Indeed, we show that knocking down Rasa3 facilitates, and its overexpression inhibits, JGES reprogramming (Supplementary Fig. 4q, r). More importantly, knocking down Rasa3 appears to rescue, to some extent, the defect caused by Gatad2b knockdown or mutated Sall4 (Supplementary Fig. 4s, t). Together, these results suggest that the NuRD complex is required to close somatic chromatin loci that encode barriers during JGES reprogramming.
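Intersecting the loci with elevated H3K27ac and the loci that lost Gatad2b binding is, computationally, an interval-overlap problem. The sketch below illustrates the idea on toy coordinates only. The overlap criterion (at least one shared base between half-open intervals), the tuple layout and the function name are assumptions, since the criterion used in the study is not stated in this section. In practice, dedicated tools such as bedtools are commonly used for this step.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed layout.
Interval = Tuple[str, int, int]


def overlapping(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return intervals from `a` that share at least one base with any interval in `b`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end in a:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits.append((chrom, start, end))
    return hits


# Toy loci (hypothetical coordinates, not data from the study).
h3k27ac_elevated = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
gatad2b_lost = [("chr1", 250, 400), ("chr2", 150, 300), ("chr3", 10, 20)]
print(overlapping(h3k27ac_elevated, gatad2b_lost))
```

The quadratic scan per chromosome keeps the sketch short; for genome-scale peak sets a sorted sweep or an interval tree would be the more usual choice.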
Rescue of K5A Sall4 mutant by grafting N12 onto Jdp2
JGES may convert MEFs to iPSCs by closing somatic chromatin and opening pluripotent chromatin, similar to OSKM and chemical reprogramming53,59. Given the essential role demonstrated here for the collaborative interactions between Sall4 and the NuRD complex in directing somatic program inactivation, we reasoned that Sall4 mutant(s) might be rescued by reconstituting an alternative chromatin closing pathway. We first analyzed the genome-wide occupancy of Jdp2, Glis1, Esrrb, Sall4, and Gatad2b by CUT&Tag and show that they bind to diverse loci (Supplementary Fig. 5a), suggesting that there may be a division of labor among chromatin binding proteins, with Jdp2 binding to somatic chromatin and Esrrb binding to pluripotent loci. We then designed and constructed synthetic factors by fusing N12 (the NuRD interaction motif) to the N terminus of Jdp2, Esrrb and Glis1, and introduced these factors together with Sall4K5A during reprogramming, showing that Jdp2N12 (synthetic Jdp2, as opposed to native Jdp2), but neither Glis1N12 nor EsrrbN12, can rescue the Sall4K5A (Fig. 4a, b) and Sall4delN12 defects (Supplementary Fig. 5b, c).
We then performed RNA-seq on JGES, JGESK5A, JN12GESK5A and JGE. PCA indicated that JGE and JGESK5A differ significantly from JGES at the gene expression level. In contrast, JN12GESK5A shares similar dynamics with JGES reprogramming despite a delay (Supplementary Fig. 5d). To gain mechanistic insight into why synthetic Jdp2 has this unique ability, we performed IP-MS with a JDP2 antibody and show that it can recruit subunits of the NuRD complex (Fig. 4c, processed data provided in Supplementary data 1). Next, to explore the function of N12 during reprogramming, MEFs were infected with Esrrb, Glis1, Sall4K5A, and synthetic or native Jdp2. We then performed JDP2 CUT&Tag to map their genomic occupancy. A total of 110402 peaks were identified across both samples, including 57104 common peaks, 23452 JDP2N12-specific peaks and 29846 JDP2WT-specific peaks (Fig. 4d). We show using de novo motif calling that both synthetic and native JDP2 bind to genomic regions enriched for AP-1 family motifs such as Fra1/2, Atf3, JunB and Fosl2 (Fig. 4e). We further performed ATAC-seq to compare the chromatin accessibility of reprogramming mediated by synthetic and native Jdp2 (Fig. 4f). We classified ATAC-seq peaks into native-specific (41264), synthetic-specific (6611) and common loci (49890) (Fig. 4g), and show that common loci are enriched in motifs for somatic state TFs such as Fra1/2, Atf3, JunB, and Fosl2 (Fig. 4h). Importantly, we identified peaks that are closed with synthetic Jdp2 but remain open with native Jdp2 during reprogramming, and show that transcript levels correlate well with chromatin accessibility for genes such as Tgfbr2, Htra1, Bmp1, Nrp2, Wnt5a, Col6a3, Igfbp7, and Bmp4 (Fig. 4h, Supplementary Fig. 5e), all shown to inhibit JGES reprogramming (Supplementary Fig. 4m). Together, these data indicate that N12 is required for Jdp2 to close somatic state related chromatin.
To validate that Jdp2N12 may recruit the NuRD complex to orchestrate chromatin remodeling and trigger somatic program inactivation, we performed GATAD2B CUT&Tag during Jdp2N12 and Jdp2WT reprogramming in combination with Esrrb, Glis1, and Sall4K5A on day 1. First, we categorized the JDP2N12 and JDP2WT CUT&Tag peaks into three simple classes, JDP2-common, JDP2N12-specific, and JDP2WT-specific (Fig. 4d), and then analyzed GATAD2B binding density in these three classes of regions during Jdp2WT and Jdp2N12 reprogramming (Fig. 4i, Supplementary Fig. 5f, g). For JDP2N12-specific regions, we observed higher GATAD2B binding density in Jdp2N12 reprogramming. Conversely, the JDP2WT-specific regions exhibited increased GATAD2B occupancy in Jdp2WT reprogramming. Collectively, these results suggest that grafting the NuRD interacting motif onto Jdp2 to engage somatic specific regions can functionally rescue the disrupted Sall4-NuRD axis (Fig. 4j).
Discussion
We show here that JGES reprogramming differs markedly from reprogramming with the classic Yamanaka factors OSKM, in that JGES relies on Sall4 to engage endogenous NuRD to close open chromatin in MEFs, while OSKM relies on pioneering factors to open chromatin in MEFs10,60. This Sall4-NuRD axis may play similar roles in normal development and disease processes.
While sharing the same starting cells, i.e., MEFs, and the same final outcome, i.e., iPSCs capable of chimera formation and germline transmission, JGES may orchestrate reprogramming quite differently from OSKM, thus offering a rare opportunity to compare and contrast their strategies in mediating cell fate decisions. The pioneering model of reprogramming has been proposed for OSKM based on their ability to bind to chromatin not normally accessible to transcription factors60–62. This pioneering function has been thought to be a critical feature of OSKM reprogramming: these factors are not normally expressed in MEFs, and thus their binding motifs are mostly buried in closed chromatin. However, the pioneering model does not explain how the open chromatin in MEFs is closed. Our earlier work suggests that OSK activates endogenous factors such as Sap30, which engage Sin3A to close open chromatin in MEFs during the early phase of reprogramming53. By identifying the Sall4-NuRD axis here, we propose that a similar chromatin-closing event should be the initiating event in cell fate transition.
Our findings appear to contradict multiple earlier studies that implicated NuRD subunits as a negative rheostat in reprogramming, including one that found that Gatad2a and Chd4 depletion resulted in up to 100% iPSC derivation efficiency28. Our study differs from these studies in a number of ways, including the reprogramming cocktail and the reprogramming conditions. Mor and colleagues used siRNA knockdown or knockout to inhibit Gatad2a or Chd4 during reprogramming, in contrast to our retroviral delivery34. Whereas in our system MEFs were infected with JGES retrovirus for reprogramming, they opted for transgenic “secondary reprogramming” mouse embryonic fibroblasts (MEFs) that carry TetO-inducible OKSM for iPSC induction. Furthermore, Mor and colleagues discovered that repressing Mbd3 and Chd4 with targeted siRNA prior to OKSM induction hampered the reprogramming process. Meanwhile, Santos and colleagues reported a positive role for MBD3/NuRD in transcription factor-mediated reprogramming of neural stem cells and epiblast stem cells to naive stem cells, implying a context-dependent role for the NuRD complex in pluripotency induction32. In addition, we have reported that MEFs induced with OKSM and with 7F (Jdp2, Esrrb, Sall4, Nanog, Kdm2b, Mkk6, Glis1) follow distinct molecular trajectories during the 7-day course to arrive at the final naïve state14. Mor and colleagues proposed a model in which Gatad2a/Mbd3 represses the same genes that OKSM try to reactivate. In contrast, our JGES reprogramming system shows a preference for collaborating with the NuRD complex to effectively deactivate somatic cell-specific genes during the early stages of reprogramming.
Sall4-NuRD interaction has been reported in cancer and in developmental processes such as hematopoiesis and neurogenesis63–66. However, the precise mechanism of its role in carcinogenesis is not fully understood. In light of our findings here, one may argue that Sall4 silences critical cell fate regulators through NuRD and thereby pushes cell fate in a cancerous direction (Fig. 4j). If so, our work may lead to better models for therapeutic development.
Methods
Mice
OG2 transgenic mice (CBA/CaJ × C57BL/6J) were purchased from The Jackson Laboratory (mouse strain datasheet: 004654). Animals were individually housed under a 12 h light/dark cycle and provided with food and water ad libitum. Our studies followed the guidelines for the Care and Use of Laboratory Animals of the National Institutes of Health, and the protocols were approved by the Committee on the Ethics of Animal Experiments at the Guangzhou Institutes of Biomedicine and Health.
DNA constructs, cell lines, and cell culture
All constructs for in vitro expression were cloned into pMXs plasmids, and shRNAs were cloned into pSuper plasmids. MEFs were isolated from E13.5 mouse embryos, regardless of sex, obtained by crossing male mice carrying the Oct4-GFP transgenic allele (CBA/CaJ × C57BL/6J) with 129S4/SvJaeJ female mice around 6–8 weeks old. Briefly, the internal organs, tail, limbs and head were removed. The remaining tissues were cut into small pieces and then dissociated with digestive solution (0.25% trypsin: 0.05% trypsin = 1:1; GIBCO) for 15 min at 37 °C to obtain a single-cell suspension. The isolated MEFs were seeded onto 0.1% gelatin-coated culture dishes in fibroblast medium, defined as high-glucose DMEM (Hyclone) containing 10% FBS (GIBCO), 1% GlutaMAX (GIBCO), 1% sodium pyruvate (GIBCO) and 1% NEAA (GIBCO). Plat-E cells were maintained in high-glucose DMEM (Hyclone) supplemented with 10% FBS (NTC, SFBE, HK-026). Mouse ESCs derived in-house from embryos of Oct4-GFP transgenic mice, and iPSCs (male or female) derived in this study, were cultured feeder-free in N2B27-2i medium (50% (v/v) high-glucose DMEM (Hyclone), 50% (v/v) knockout DMEM (GIBCO), N2 (GIBCO), B27 (GIBCO), 1% sodium pyruvate (GIBCO), 1% non-essential amino acids (GIBCO), 1% GlutaMAX (GIBCO), 0.1 mM 2-mercaptoethanol (GIBCO), 1000 U/ml leukemia inhibitory factor (LIF) (Millipore), and the 2i inhibitors, 3 µM CHIR99021 (Sigma) and 1 µM PD0325901 (Sigma)) or, for karyotype analysis, in mES medium (high-glucose DMEM (Hyclone), 15% (v/v) FBS (GIBCO), 1% sodium pyruvate (GIBCO), 1% non-essential amino acids (GIBCO), 1% GlutaMAX (GIBCO), 0.1 mM 2-mercaptoethanol (GIBCO), 1000 U/ml leukemia inhibitory factor (LIF) (Millipore)). All cell lines were confirmed to be free of mycoplasma contamination using a kit from Lonza (LT07-318).
iPSCs generation
The protocol started with production of the retrovirus. Plat-E cells were seeded uniformly at 7.5 × 10^6–8.5 × 10^6 cells per 10 cm dish and cultured in high-glucose DMEM (HyClone, SH30022.01) supplemented with 10% FBS (NTC, SFBE, HK-026) (hereafter, 10% FBS medium) for 12–16 h to reach 70–80% confluence. The next step was plasmid transfection. For each 10 cm dish, the Plat-E medium was first replaced with 7.5 mL fresh 10% FBS medium. A modified calcium phosphate transfection was then carried out as follows: each plasmid was prepared in an individual tube by adding, in order, 1068 µL ddH2O, 25 µg plasmid, 156.25 µL 2 M CaCl2 and 1.25 ml 2×HBS to a total volume of 2.5 ml; the liquid was mixed immediately after adding the 2×HBS, incubated for 5 min at room temperature, and then gently transferred onto the Plat-E cells. The medium was replaced with 10 ml 10% FBS medium within 10–16 h after transfection. The retrovirus was collected twice, at 48 h and 72 h after transfection: each time, the virus-containing supernatant was drawn up with a syringe and passed through a 0.45 µm filter, and 10 ml fresh 10% FBS medium was added to the Plat-E dish after the first collection. The virus could be stored at room temperature for at most 48 h. While the transfection was under way, frozen passage 1 OG2 MEFs (mouse embryonic fibroblasts) were thawed into a 6 cm dish with 10% FBS medium and cultured in a 5% CO2 incubator. Upon reaching 100% confluence, the MEFs were split into 24-well (P24) plates at 1.5 × 10^4 cells per well before infection with virus. MEFs were also infected twice. The virus stocks were mixed at the appropriate volumes with one volume of fresh 10% FBS medium, and polybrene was added to a final concentration of 4 µg/ml before infection. The second infection was carried out 24 h later. On post-infection day 0, the virus-containing medium was replaced with fresh reprogramming medium iCD3 or chemical screening medium.
The medium was changed every 24 h and morphological changes were monitored. GFP+ colonies appeared at day 2 to day 3; they were imaged with a live-cell imaging station (NIKON, BioStation CT) and counted in ImageJ using particle analysis.
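The volumes in the calcium phosphate mix can be checked with a short calculation. The sketch below assumes 25 µg of plasmid per dish and a hypothetical plasmid stock concentration (1 µg/µL by default, which is not stated in the protocol); the function and parameter names are illustrative only.

```python
def calcium_phosphate_mix(plasmid_ug=25.0, plasmid_stock_ug_per_ul=1.0,
                          total_ul=2500.0, cacl2_2m_ul=156.25, hbs_2x_ul=1250.0):
    """Return component volumes (µL) for one transfection tube.

    plasmid_stock_ug_per_ul is an assumed value; water makes up the
    remainder of the total volume after the fixed components.
    """
    plasmid_ul = plasmid_ug / plasmid_stock_ug_per_ul
    water_ul = total_ul - cacl2_2m_ul - hbs_2x_ul - plasmid_ul
    if water_ul < 0:
        raise ValueError("plasmid stock too dilute for the stated total volume")
    return {
        "ddH2O_ul": water_ul,
        "plasmid_ul": plasmid_ul,
        "CaCl2_2M_ul": cacl2_2m_ul,
        "HBS_2x_ul": hbs_2x_ul,
        # 2 M stock = 2000 mM, diluted into the total volume.
        "final_CaCl2_mM": cacl2_2m_ul * 2000.0 / total_ul,
    }

mix = calcium_phosphate_mix()
```

Under these assumptions the water volume comes out close to the 1068 µL given in the protocol, the 2×HBS is diluted to 1×, and the final CaCl2 concentration is 125 mM.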
iCD3 establishment
The chemical screening library comprises 630 chemicals covering 10 signaling pathway categories, namely Tyrosine Kinase/Adaptors (n = 142), PI3K/Akt/mTOR Signaling (n = 96), Chromatin/Epigenetic (n = 86), Immunology/Inflammation (n = 59), JAK/STAT Signaling (n = 51), MAPK Signaling (n = 54), Angiogenesis (n = 37), Stem Cell (n = 27), Metabolism (n = 16) and Neuroscience (n = 14), plus others (n = 48). Before the chemical screen, we developed iCD2 by removing LiCl, which represses 7F reprogramming efficiency, from iCD1. We then added the chemicals one by one to iCD2 at 1 µM and 5 µM; after JGES virus infection, cells were treated with iCD2 plus chemical for 7 days, and GFP+ clones were then counted. The top 5 chemicals were combined with each other for the next round of reprogramming, and at the end of the screen we established the iCD3 reprogramming medium.
Immunofluorescence
Cells growing on glass slides (NEST, 801007) were washed 3 times with PBS, fixed with 4% PFA for 0.5 h, washed 3 times in PBS, and then permeabilized and blocked with 0.1% Triton X-100 and 3% BSA for 0.5 h at room temperature. The cells were then washed 3 times and incubated with primary antibody diluted in 3% BSA for 2 h at room temperature or overnight at 4 °C. After 3 washes in PBS, the cells were incubated for 1 h with secondary antibodies diluted in 3% BSA. After 3 further washes in PBS, the cells were incubated for 2 min in DAPI diluted in PBS. The glass slide was then mounted for observation on a confocal microscope (Zeiss 710 NLO). The following antibody was used: anti-Flag (Sigma Aldrich, F1804, 1:200).
Co-immunoprecipitation and western blot
For co-immunoprecipitation, cells were digested with 0.25% trypsin and washed 3 times in PBS. Whole-cell extracts were prepared using lysis buffer (50 mM Tris pH 7.4, 200 mM NaCl, 10% glycerol, 1% NP40, 1 mM EDTA) with freshly added 1× Complete Protease inhibitors (Sigma, 1187358001) and 1% PMSF, incubated for 15 min on ice and then for 1 h at 4 °C on a rotation wheel. Soluble cell lysates were collected after centrifugation at maximum speed at 4 °C for 15 min, and the supernatant was incubated with anti-FLAG (DYKDDDDK) beads (Thermo Fisher, A36797) overnight at 4 °C on a rotation wheel. Beads were then washed three times with cell wash buffer (50 mM Tris pH 7.4, 200 mM NaCl, 10% glycerol, 0.01% NP40, 1 mM EDTA) for 5 times. After complete removal of the cell wash buffer, proteins immunoprecipitated on the FLAG beads were boiled in 100 °C water in loading buffer (4% SDS, 10% 2-mercaptoethanol, 20% glycerol, 0.004% bromophenol blue, 0.125 M Tris pH 6.8) for 10 min. Whole protein extracts were stored at −80 °C, avoiding freeze-thaw cycles. For western blot, total proteins or IP extracts were resolved by SDS-PAGE and then transferred to PVDF membrane (Millipore). After incubation with the indicated antibodies, the membrane was exposed to X-ray film. The following antibodies were used: NuRD Complex Antibody Sampler Kit (CST, 8349T, 1:1000), anti-GATAD2B (Abcam, ab224391, 1:1000), anti-RBBP4 (Novusbio, NB500-123, 1:1000), anti-FLAG (Sigma Aldrich, F1804, 1:1000), anti-SALL4 (Abcam, ab29112, 1:1000) and anti-H3K27ac (Abcam, ab4729, 1:1000).
Mass spectrometry analysis
Digested peptides were separated on an Acclaim PepMap 100 C18 column (Thermo, 164941) over 140 min of total data collection (100 min of 2–22%, 20 min of 22–28% and 12 min of 28–36% gradient of buffer B (80% acetonitrile and 0.1% formic acid in H2O) for peptide separation, followed by two wash steps: 2 min of 36–100% and 6 min of 100% buffer B) using an Easy-nLC 1200 connected online to a Fusion Lumos mass spectrometer (Thermo). Scans were collected in data-dependent top-speed mode with dynamic exclusion at 90 s. Raw data were analyzed using MaxQuant version 1.6.0.1, searching against the mouse FASTA database with label-free quantification and match-between-runs enabled. The DEP package was used to analyze and visualize the output protein groups.
RNA-seq and data analysis
Total RNA was isolated with TRIzol (Invitrogen). Libraries were prepared using the VAHTS mRNA-seq v2 Library Prep Kit for Illumina (Vazyme, NR601-01/02) with 1 µg RNA per sample following the manufacturer’s instructions. Sequencing was performed on an Illumina NovaSeq instrument at Guangzhou IGE Biotechnology Ltd (Guangzhou, China). To analyze gene expression, reads were aligned to the reference transcriptome using RSEM67 (v. 1.2.28) with an index built by RSEM from the mouse genome (mm10), and annotated to genes with annotables (v. 0.1.90) in R. DESeq268 (v. 1.26.0) was then used for data normalization and differential expression analysis. Differentially expressed genes were defined by the Wald test (Benjamini-Hochberg-corrected P-value < 0.05 and absolute fold change ≥ 1.5), or by the likelihood ratio test (Benjamini-Hochberg-corrected P-value < 0.05) for time-course experiments. Gene ontology analysis was performed using clusterProfiler69 (v. 3.14.3).
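The thresholding step for differentially expressed genes can be sketched as follows. This is only an illustration of Benjamini-Hochberg correction combined with a fold-change cutoff, assuming raw P-values and log2 fold changes are already available; it does not reproduce DESeq2 itself (which also performs normalization, dispersion estimation and independent filtering), and the function names and example values are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg-adjusted P-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def call_degs(genes, log2_fold_changes, pvalues, alpha=0.05, min_fold=1.5):
    """Keep genes with adjusted P < alpha and |fold change| >= min_fold."""
    padj = benjamini_hochberg(pvalues)
    cutoff = math.log2(min_fold)
    return [gene for gene, lfc, q in zip(genes, log2_fold_changes, padj)
            if q < alpha and abs(lfc) >= cutoff]

# Hypothetical toy input, not data from this study.
toy_genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_lfc = [2.0, -1.2, 1.0, 3.0]
toy_p = [0.001, 0.02, 0.04, 0.5]
toy_degs = call_degs(toy_genes, toy_lfc, toy_p)
```

In the toy input, gene_c has a raw P-value below 0.05 but is dropped after correction, and gene_d has a large fold change but no statistical support; only the first two genes pass both criteria.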
ATAC-seq and data analysis
ATAC-seq library construction was performed using the TruePrep DNA Library Prep Kit V2 for Illumina (Vazyme, TD501-01) and TruePrep Index Kit V2 for Illumina (Vazyme, TD202). Around 50,000 live cells were collected for each sample, and the ATAC library was sequenced on an Illumina NovaSeq 6000 by Berry Genomics Corporation (Beijing, China). All sequencing data were aligned to the mouse genome assembly (mm10) using bowtie270 (v. 2.3.5.1) with the following options: -p 20 --very-sensitive -k 10. Then, sambamba71 (v. 0.6.6) was used to sort and remove duplicate reads with the following options: [XS] == null and not unmapped and not duplicate. Alignment BAM files were transformed into read coverage files (bigWig format) using deepTools72 (v. 3.5.1) with the RPKM normalization method. Peaks were called using Genrich (v. 0.6) with options: -j -y -r -m 30 -e MT -v -q 0.01. Peaks from different samples were then merged into a peak set by DiffBind73 (v. 2.14.0) using RPKM or read count. Differentially bound regions were defined from the read-count peak set with DESeq2, using the Wald test (Benjamini-Hochberg-corrected P-value < 0.05 and absolute fold change ≥ 1.5), or the likelihood ratio test (Benjamini-Hochberg-corrected P-value < 0.05) for time-course experiments. Open-close state changes were called from the RPKM peak set with a boundary of 2. Motif analysis was performed using HOMER74 (v. 4.11). Peaks were annotated to gene loci by ChIPseeker69 (v. 3.20.1).
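The RPKM-based open/closed call can be illustrated with a small sketch. It assumes one possible reading of the "boundary of 2", namely that a peak is treated as open when its RPKM is at least 2 and closed otherwise; that interpretation, the function names and the state labels are assumptions made for illustration and may differ from the actual analysis.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)

def state_change(rpkm_before, rpkm_after, boundary=2.0):
    """Label a peak by comparing its RPKM at two time points to a boundary.

    Assumption: RPKM >= boundary means "open"; below it means "closed".
    """
    open_before = rpkm_before >= boundary
    open_after = rpkm_after >= boundary
    if open_before and not open_after:
        return "open-to-closed"
    if not open_before and open_after:
        return "closed-to-open"
    return "stays open" if open_before else "stays closed"

# Hypothetical peak: 500 bp wide, 20 million mapped reads per sample.
before = rpkm(100, 500, 20_000_000)
after = rpkm(5, 500, 20_000_000)
label = state_change(before, after)
```

A fixed boundary is simple but sensitive to sequencing depth and background, which is presumably why it is applied here to depth-normalized RPKM values rather than raw counts.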
CUT&Tag and data analysis
CUT&Tag library construction was performed using the Hyperactive In-Situ ChIP Library Prep Kit for Illumina (pG-Tn5) (Vazyme, TD901) and TruePrep Index Kit V2 for Illumina (Vazyme, TD202). Around 100,000 live cells were collected for each sample, and the CUT&Tag library was sequenced on an Illumina NovaSeq 6000 by Berry Genomics Corporation (Beijing, China).
Sequenced reads were aligned to the mouse reference genome (mm10) using Bowtie2 with the parameters: --end-to-end --very-sensitive --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. Then, sambamba was used to sort and remove duplicate reads with the following options: [XS] == null and not unmapped and not duplicate. Peaks were called using MACS275 (v. 2.2.6) with the default parameters. Reads that mapped to mitochondrial DNA or unassigned sequences were discarded. For paired-end sequencing data, only concordantly aligned pairs were retained. Alignment BAM files were transformed into read coverage files (bigWig format) using deepTools with the RPKM normalization method. For samples without replicates, differentially bound regions were analyzed with manorm76 (v. 1.3.0) at P-value ≤ 0.01. For samples with replicates, differentially bound regions were analyzed with DESeq2 based on the peak set produced by DiffBind, as for the ATAC-seq analysis. Motif analysis was performed using HOMER. Peaks were annotated to gene loci by ChIPseeker.
Karyotype Analysis
Karyotype analysis was performed according to previously published protocols8,77–79. Briefly, 5 × 10^5 cells were seeded on 10 cm cell culture dishes and incubated for 48 h to reach 90% confluence. The cells were treated with fresh medium containing 0.2 μg/mL colchicine for 2 h. The treated cells were collected and resuspended in 7 mL of 37 °C KCl hypotonic solution. After hypotonic treatment, nuclei were collected by centrifugation. The nuclei were pre-fixed for 3 min in freshly prepared Carnoy's fixative, and the sample was then collected and fixed again at 37 °C for 40 min. After fixation, the nuclei were collected and resuspended. Clean slides were soaked in cold water before use. A few drops of resuspended cells were dropped onto the chilled, clean slides, spread, and baked in a 75 °C oven for 3 h. Trypsin was added to the staining vat and preheated to 37 °C. The dry slides were digested with trypsin for 8–12 s, and the digestion was terminated with saline. The slides were then stained with filtered Giemsa for 4 min and washed with PBS and ddH2O. Chromosomes were examined under the microscope and at least 20 cells were counted. A karyotype was considered significantly abnormal if more than 4 cells had more or fewer than 40 chromosomes (mouse).
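The counting rule at the end of this protocol can be written down as a short check. The sketch below assumes the input is a list of chromosome counts, one per scored metaphase spread; the function name and the returned fields are hypothetical.

```python
def karyotype_check(chromosome_counts, expected=40, min_cells=20, max_abnormal=4):
    """Apply the counting rule: score at least `min_cells` spreads and flag
    the line if more than `max_abnormal` deviate from `expected` (40 for mouse).
    """
    n_cells = len(chromosome_counts)
    if n_cells < min_cells:
        raise ValueError(f"need at least {min_cells} cells, got {n_cells}")
    abnormal = sum(1 for count in chromosome_counts if count != expected)
    return {"cells": n_cells, "abnormal": abnormal,
            "flagged": abnormal > max_abnormal}

# Hypothetical counts: 18 normal spreads and two aneuploid ones.
normal_line = karyotype_check([40] * 18 + [39, 41])
# Hypothetical counts: five of twenty spreads deviate from 40.
abnormal_line = karyotype_check([40] * 15 + [39] * 5)
```

Exactly four abnormal cells still passes under this rule, since the criterion is "more than 4".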
Statistics and reproducibility
Data are presented as mean ± s.d. as indicated in the figure legends. P-values were calculated by unpaired two-tailed Student's t-test using Prism 6 software. P < 0.05 was considered statistically significant; *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. No statistical method was used to predetermine sample size. All experiments were replicated at least three times, and data are shown as means with SEM. No specific randomization or blinding protocols were used.
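The summary statistics and significance notation described here can be sketched with the standard library alone. The example computes mean ± s.d., the pooled-variance (Student's) t statistic for two unpaired groups, and the star label for a given P-value; converting the t statistic into a P-value requires the t distribution and is left to a statistics package. The "ns" label for non-significant results and the example values are assumptions, not taken from this study.

```python
import math
import statistics

def mean_sd(values):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)

def student_t(group_a, group_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, df

def significance_stars(p_value):
    """Map a P-value to the star notation used in the figures."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"),
                             (0.01, "**"), (0.05, "*")):
        if p_value < threshold:
            return stars
    return "ns"  # assumed label for P >= 0.05

# Hypothetical colony counts from three replicates per condition.
control = [10, 12, 14]
treated = [20, 22, 24]
t_stat, dof = student_t(control, treated)
```

With three replicates per group the test has only 4 degrees of freedom, so the choice between s.d. and SEM for error bars noticeably changes how the data look and should be stated per figure.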
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (31830060, 92068201, 32000502, 81974019), the Major State Basic Research Development Program (2017YFA0504100), the Guangdong Science and Technology Project (2020B1212060052), the Research Team Project of the Natural Science Foundation of Guangdong Province of China (2017A030312007), the Natural Science Foundation of Guangdong Province of China (2019A1515012032), the National Key Research and Development Program of China (2018YFA0108700, 2017YFA0105602), the NSFC Projects of International Cooperation and Exchanges (81720108004), the Guangdong Provincial Special Support Program for Prominent Talents (2021JC06Y656), the Science and Technology Planning Project of Guangdong Province (2022B1212010010), the Guangzhou Science and Technology Plan Project (202201000006), the Special Project of Dengfeng Program of Guangdong Provincial People’s Hospital (DFJH201812; KJ012019119; KJ012019423), and the High-level Hospital Construction Project (DFJHBF202110).
Source data
Author contributions
B.W. and J.M. performed the main experiments; C.L., L.L., H.F., C.Z., and X.H. performed the bioinformatic analysis; J.M., S.F., and Y.H. performed the cell culture experiments; L.W. performed the chemical screening experiment; J.K., H.L., and X.Y. performed the RNA-seq experiment; J.G. constructed the plasmids; L.G. performed and X.Z. analyzed the IP-MS experiment; P.Z., J.C., and J.L. supervised the bioinformatics analysis; D.P. supervised and conceived the whole study, wrote the manuscript, and approved the final version.
Peer review
Peer review information
Nature Communications thanks Brian Hendrich and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
The data supporting the conclusions of this study, including CUT&Tag for H3K27ac, Sall4, Jdp2, Gatad2b, Esrrb and Glis1 are available at GEO under accession GSE199612. The ATAC-seq and RNA-seq data were from GSE199609 and GSE199613. The RNA-seq data of MEF and ES cells was obtained from GSE127927. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the iProX partner repository with the dataset identifier PXD041704. Source Data for Figs. 1a, d, 2b, 4c, and Supplementary Figs 1b, e, g, h, 2a, 3d, e, g, 4m, q, r, s, t, 5c are provided with the manuscript. The authors declare that all data supporting the findings of this study are available within the article and its supplementary information files or from the corresponding author upon reasonable request. Source data are provided with this paper.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Bo Wang, Chen Li, Jin Ming.
Contributor Information
Ping Zhu, Email: tanganqier@163.com.
Duanqing Pei, Email: peiduanqing@westlake.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-38543-0.
References
- 1.Evans MJ, Kaufman MH. Establishment in culture of pluripotential cells from mouse embryos. Nature. 1981;292:154–156. doi: 10.1038/292154a0. [DOI] [PubMed] [Google Scholar]
- 2.Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–676. doi: 10.1016/j.cell.2006.07.024. [DOI] [PubMed] [Google Scholar]
- 3.Papp B, Plath K. Epigenetics of reprogramming to induced pluripotency. Cell. 2013;152:1324–1343. doi: 10.1016/j.cell.2013.02.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Qian L, et al. In vivo reprogramming of murine cardiac fibroblasts into induced cardiomyocytes. Nature. 2012;485:593–598. doi: 10.1038/nature11044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Takahashi K, Yamanaka S. A decade of transcription factor-mediated reprogramming to pluripotency. Nat. Rev. Mol. Cell Biol. 2016;17:183–193. doi: 10.1038/nrm.2016.8. [DOI] [PubMed] [Google Scholar]
- 6.Lentini C, et al. Reprogramming reactive glia into interneurons reduces chronic seizure activity in a mouse model of mesial temporal lobe epilepsy. Cell Stem Cell. 2021;28:2104–2121.e2110. doi: 10.1016/j.stem.2021.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Yu B, et al. Reprogramming fibroblasts into bipotential hepatic stem cells by defined factors. Cell Stem Cell. 2013;13:328–340. doi: 10.1016/j.stem.2013.06.017. [DOI] [PubMed] [Google Scholar]
- 8.Yu J, et al. Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007;318:1917–1920. doi: 10.1126/science.1151526. [DOI] [PubMed] [Google Scholar]
- 9.Xu Y, et al. Transcriptional control of somatic cell reprogramming. Trends Cell Biol. 2016;26:272–288. doi: 10.1016/j.tcb.2015.12.003. [DOI] [PubMed] [Google Scholar]
- 10.Chronis C, et al. Cooperative binding of transcription factors orchestrates reprogramming. Cell. 2017;168:442–459 e420. doi: 10.1016/j.cell.2016.12.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Shu J, et al. Induction of pluripotency in mouse somatic cells with lineage specifiers. Cell. 2013;153:963–975. doi: 10.1016/j.cell.2013.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Xiao X, et al. Generation of induced pluripotent stem cells with substitutes for Yamanaka’s four transcription factors. Cell Reprogram. 2016;18:281–297. doi: 10.1089/cell.2016.0020. [DOI] [PubMed] [Google Scholar]
- 13.Liu J, et al. The oncogene c-Jun impedes somatic cell reprogramming. Nat. Cell Biol. 2015;17:856–867. doi: 10.1038/ncb3193. [DOI] [PubMed] [Google Scholar]
- 14.Wang B, et al. Induction of Pluripotent stem cells from mouse embryonic fibroblasts by Jdp2-Jhdm1b-Mkk6-Glis1-Nanog-Essrb-Sall4. Cell Rep. 2019;27:3473–3485.e3475. doi: 10.1016/j.celrep.2019.05.068. [DOI] [PubMed] [Google Scholar]
- 15.Guan J, et al. Chemical reprogramming of human somatic cells to pluripotent stem cells. Nature. 2022;605:325–331. doi: 10.1038/s41586-022-04593-5. [DOI] [PubMed] [Google Scholar]
- 16.Hou P, et al. Pluripotent stem cells induced from mouse somatic cells by small-molecule compounds. Science. 2013;341:651–654. doi: 10.1126/science.1239278. [DOI] [PubMed] [Google Scholar]
- 17.Li Q, et al. A sequential EMT-MET mechanism drives the differentiation of human embryonic stem cells towards hepatocytes. Nat. Commun. 2017;8:15166. doi: 10.1038/ncomms15166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Pei D, Shu X, Gassama-Diagne A, Thiery JP. Mesenchymal-epithelial transition in development and reprogramming. Nat. Cell Biol. 2019;21:44–53. doi: 10.1038/s41556-018-0195-z. [DOI] [PubMed] [Google Scholar]
- 19.Smith ZD, Sindhu C, Meissner A. Molecular features of cellular reprogramming and development. Nat. Rev. Mol. Cell Biol. 2016;17:139–154. doi: 10.1038/nrm.2016.6. [DOI] [PubMed] [Google Scholar]
- 20.Arabaci DH, Terzioglu G, Bayirbasi B, Onder TT. Going up the hill: Chromatin-based barriers to epigenetic reprogramming. FEBS J. 2021;288:4798–4811. doi: 10.1111/febs.15628. [DOI] [PubMed] [Google Scholar]
- 21.Koche RP, et al. Reprogramming factor expression initiates widespread targeted chromatin remodeling. Cell Stem Cell. 2011;8:96–105. doi: 10.1016/j.stem.2010.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Onder TT, et al. Chromatin-modifying enzymes as modulators of reprogramming. Nature. 2012;483:598–602. doi: 10.1038/nature10953. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Wagner EJ, Carpenter PB. Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 2012;13:115–126. doi: 10.1038/nrm3274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Cossec JC, et al. SUMO Safeguards somatic and pluripotent cell identities by enforcing distinct chromatin states. Cell Stem Cell. 2018;23:742–757.e748. doi: 10.1016/j.stem.2018.10.001. [DOI] [PubMed] [Google Scholar]
- 25.Kolundzic E, et al. FACT Sets a barrier for cell fate reprogramming in caenorhabditis elegans and human cells. Dev. Cell. 2018;46:611–626 e612. doi: 10.1016/j.devcel.2018.07.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Lynch CJ, et al. The RNA Polymerase II Factor RPAP1 is critical for mediator-driven transcription and cell identity. Cell Rep. 2018;22:396–410. doi: 10.1016/j.celrep.2017.12.062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Mikkelsen TS, et al. Dissecting direct reprogramming through integrative genomic analysis. Nature. 2008;454:49–55. doi: 10.1038/nature07056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Rais Y, et al. Deterministic direct reprogramming of somatic cells to pluripotency. Nature. 2013;502:65–70. doi: 10.1038/nature12587. [DOI] [PubMed] [Google Scholar]
- 29.Huangfu D, et al. Induction of pluripotent stem cells by defined factors is greatly improved by small-molecule compounds. Nat. Biotechnol. 2008;26:795–797. doi: 10.1038/nbt1418. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Chen J, et al. H3K9 methylation is a barrier during somatic cell reprogramming into iPSCs. Nat. Genet. 2013;45:34–42. doi: 10.1038/ng.2491. [DOI] [PubMed] [Google Scholar]
- 31.Cheloufi S, et al. The histone chaperone CAF-1 safeguards somatic cell identity. Nature. 2015;528:218–224. doi: 10.1038/nature15749. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Dos Santos RL, et al. MBD3/NuRD facilitates induction of pluripotency in a context-dependent manner. Cell Stem Cell. 2014;15:392. doi: 10.1016/j.stem.2014.08.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Luo M, et al. NuRD blocks reprogramming of mouse somatic cells into pluripotent stem cells. Stem Cells. 2013;31:1278–1286. doi: 10.1002/stem.1374. [DOI] [PubMed] [Google Scholar]
- 34.Mor N, et al. Neutralizing Gatad2a-Chd4-Mbd3/NuRD complex facilitates deterministic induction of naive pluripotency. Cell Stem Cell. 2018;23:412–425.e410. doi: 10.1016/j.stem.2018.07.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Millard CJ, Fairall L, Ragan TJ, Savva CG, Schwabe JWR. The topology of chromatin-binding domains in the NuRD deacetylase complex. Nucleic Acids Res. 2020;48:12972–12982. doi: 10.1093/nar/gkaa1121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Jaffer S, Goh P, Abbasian M, Nathwani AC. Mbd3 promotes reprogramming of primary human fibroblasts. Int J. Stem Cells. 2018;11:235–241. doi: 10.15283/ijsc18036. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Wang S, et al. Transient activation of autophagy via Sox2-mediated suppression of mTOR is an important early step in reprogramming to pluripotency. Cell Stem Cell. 2013;13:617–625. doi: 10.1016/j.stem.2013.10.005. [DOI] [PubMed] [Google Scholar]
- 38.Singhal N, et al. Chromatin-remodeling components of the BAF complex facilitate reprogramming. Cell. 2010;141:943–955. doi: 10.1016/j.cell.2010.04.037. [DOI] [PubMed] [Google Scholar]
- 39.Saunders A, et al. The SIN3A/HDAC Corepressor complex functionally cooperates with NANOG to promote pluripotency. Cell Rep. 2017;18:1713–1726. doi: 10.1016/j.celrep.2017.01.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Tatetsu H, et al. SALL4, the missing link between stem cells, development and cancer. Gene. 2016;584:111–119. doi: 10.1016/j.gene.2016.02.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Elling U, Klasen C, Eisenberger T, Anlag K, Treier M. Murine inner cell mass-derived lineages depend on Sall4 function. Proc. Natl Acad. Sci. USA. 2006;103:16319–16324. doi: 10.1073/pnas.0607884103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Sakaki-Yumoto M, et al. The murine homolog of SALL4, a causative gene in Okihiro syndrome, is essential for embryonic stem cell proliferation, and cooperates with Sall1 in anorectal, heart, brain and kidney. Dev. Dev. 2006;133:3005–3013. doi: 10.1242/dev.02457. [DOI] [PubMed] [Google Scholar]
- 43.Tsubooka N, et al. Roles of Sall4 in the generation of pluripotent stem cells from blastocysts and fibroblasts. Genes Cells. 2009;14:683–694. doi: 10.1111/j.1365-2443.2009.01301.x. [DOI] [PubMed] [Google Scholar]
- 44.Buganim Y, et al. The developmental potential of iPSCs is greatly influenced by reprogramming factor selection. Cell Stem Cell. 2014;15:295–309. doi: 10.1016/j.stem.2014.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Yamashita K, Sato A, Asashima M, Wang PC, Nishinakamura R. Mouse homolog of SALL1, a causative gene for Townes-Brocks syndrome, binds to A/T-rich sequences in pericentric heterochromatin via its C-terminal zinc finger domains. Genes Cells. 2007;12:171–182. doi: 10.1111/j.1365-2443.2007.01042.x. [DOI] [PubMed] [Google Scholar]
- 46.Xiong J, et al. Cooperative action between SALL4A and TET proteins in stepwise oxidation of 5-Methylcytosine. Mol. Cell. 2016;64:913–925. doi: 10.1016/j.molcel.2016.10.013. [DOI] [PubMed] [Google Scholar]
- 47.Wang B, et al. Establishment of a CRISPR/Cas9-mediated GATAD2B homozygous knockout human embryonic stem cell line. Stem Cell Res. 2021;57:102590. doi: 10.1016/j.scr.2021.102590. [DOI] [PubMed] [Google Scholar]
- 48.Hu G, Wade PA. NuRD and pluripotency: A complex balancing act. Cell Stem Cell. 2012;10:497–503. doi: 10.1016/j.stem.2012.04.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Xing, G. et al. MAP2K6 remodels chromatin and facilitates reprogramming by activating Gatad2b-phosphorylation dependent heterochromatin loosening. Cell Death Differ.29, 1042–1054 (2021). [DOI] [PMC free article] [PubMed]
- 50. Lauberth SM, Rauchman M. A conserved 12-amino acid motif in Sall1 recruits the nucleosome remodeling and deacetylase corepressor complex. J. Biol. Chem. 2006;281:23922–23931. doi: 10.1074/jbc.M513461200.
- 51. Kong NR, et al. Zinc finger protein SALL4 functions through an AT-rich motif to regulate gene expression. Cell Rep. 2021;34:108574. doi: 10.1016/j.celrep.2020.108574.
- 52. Pantier R, et al. SALL4 controls cell fate in response to DNA base composition. Mol. Cell. 2021;81:845–858.e848. doi: 10.1016/j.molcel.2020.11.046.
- 53. Li D, et al. Chromatin accessibility dynamics during iPSC reprogramming. Cell Stem Cell. 2017;21:819–833.e816. doi: 10.1016/j.stem.2017.10.012.
- 54. Liu Y, et al. AP-1 activity is a major barrier of human somatic cell reprogramming. Cell Mol. Life Sci. 2021;78:5847–5863. doi: 10.1007/s00018-021-03883-x.
- 55. Fu M, et al. Forkhead box family transcription factors as versatile regulators for cellular reprogramming to pluripotency. Cell Regen. 2021;10:17. doi: 10.1186/s13619-021-00078-4.
- 56. Abernathy DG, et al. MicroRNAs induce a permissive chromatin environment that enables neuronal subtype-specific reprogramming of adult human fibroblasts. Cell Stem Cell. 2017;21:332–348.e339. doi: 10.1016/j.stem.2017.08.002.
- 57. Velasco S, et al. A multi-step transcriptional and chromatin state cascade underlies motor neuron programming from embryonic stem cells. Cell Stem Cell. 2017;20:205–217.e208. doi: 10.1016/j.stem.2016.11.006.
- 58. Victor MB, et al. Generation of human striatal neurons by microRNA-dependent direct conversion of fibroblasts. Neuron. 2014;84:311–323. doi: 10.1016/j.neuron.2014.10.016.
- 59. Cao S, et al. Chromatin accessibility dynamics during chemical induction of pluripotency. Cell Stem Cell. 2018;22:529–542.e525. doi: 10.1016/j.stem.2018.03.005.
- 60. Soufi A, et al. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell. 2015;161:555–568. doi: 10.1016/j.cell.2015.03.017.
- 61. Li S, Zheng EB, Zhao L, Liu S. Nonreciprocal and conditional cooperativity directs the pioneer activity of pluripotency transcription factors. Cell Rep. 2019;28:2689–2703.e2684. doi: 10.1016/j.celrep.2019.07.103.
- 62. Balsalobre A, Drouin J. Pioneer factors as master regulators of the epigenome and cell fate. Nat. Rev. Mol. Cell Biol. 2022;23:449–464. doi: 10.1038/s41580-022-00464-z.
- 63. Liu BH, et al. Targeting cancer addiction for SALL4 by shifting its transcriptome with a pharmacologic peptide. Proc. Natl Acad. Sci. USA. 2018;115:E7119–E7128. doi: 10.1073/pnas.1801253115.
- 64. Tahara N, et al. Sall4 regulates neuromesodermal progenitors and their descendants during body elongation in mouse embryos. Development. 2019;146:dev177659. doi: 10.1242/dev.177659.
- 65. Yang J, Liao W, Ma Y. Role of SALL4 in hematopoiesis. Curr. Opin. Hematol. 2012;19:287–291. doi: 10.1097/MOH.0b013e328353c684.
- 66. Zeng SS, et al. The transcription factor SALL4 regulates stemness of EpCAM-positive hepatocellular carcinoma. J. Hepatol. 2014;60:127–134. doi: 10.1016/j.jhep.2013.08.024.
- 67. Li B, Dewey CN. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 68. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 69. Yu G, Wang LG, He QY. ChIPseeker: An R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics. 2015;31:2382–2383. doi: 10.1093/bioinformatics/btv145.
- 70. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 71. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: Fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098.
- 72. Ramirez F, et al. deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 73. Stark R, Brown G. DiffBind: differential binding analysis of ChIP-Seq peak data. Bioconductor. 2012. http://bioconductor.org/packages/release/bioc/html/DiffBind.html
- 74. Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
- 75. Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 76. Shao Z, Zhang Y, Yuan GC, Orkin SH, Waxman DJ. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets. Genome Biol. 2012;13:R16. doi: 10.1186/gb-2012-13-3-r16.
- 77. Ferguson-Smith MA, Trifonov V. Mammalian karyotype evolution. Nat. Rev. Genet. 2007;8:950–962. doi: 10.1038/nrg2199.
- 78. Kranz AR. Karyotype analysis in meiosis: Giemsa banding in the genus Secale L. Theor. Appl. Genet. 1976;47:101–107. doi: 10.1007/BF00274937.
- 79. Moralli D, et al. An improved technique for chromosomal analysis of human ES and iPS cells. Stem Cell Rev. Rep. 2011;7:471–477. doi: 10.1007/s12015-010-9224-4.
Data Availability Statement
The data supporting the conclusions of this study, including CUT&Tag data for H3K27ac, Sall4, Jdp2, Gatad2b, Esrrb and Glis1, are available at GEO under accession GSE199612. The ATAC-seq and RNA-seq data are from accessions GSE199609 and GSE199613. The RNA-seq data for MEFs and ES cells were obtained from GSE127927. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the iProX partner repository with the dataset identifier PXD041704. Source data for Figs. 1a, d, 2b, 4c and Supplementary Figs. 1b, e, g, h, 2a, 3d, e, g, 4m, q, r, s, t, 5c are provided with this paper. The authors declare that all data supporting the findings of this study are available within the article and its supplementary information files or from the corresponding author upon reasonable request.