Skip to main content
eLife logoLink to eLife
. 2018 Jan 30;7:e29518. doi: 10.7554/eLife.29518

Unique molecular events during reprogramming of human somatic cells to induced pluripotent stem cells (iPSCs) at naïve state

Yixuan Wang 1,†,, Chengchen Zhao 1,, Zhenzhen Hou 1,, Yuanyuan Yang 1,, Yan Bi 1, Hong Wang 1, Yong Zhang 1, Shaorong Gao 1,
Editor: Martin Pera2
PMCID: PMC5807049  PMID: 29381138

Abstract

Derivation of human naïve cells in the ground state of pluripotency provides promising avenues for developmental biology studies and therapeutic manipulations. However, the molecular mechanisms involved in the establishment and maintenance of human naïve pluripotency remain poorly understood. Using the human inducible reprogramming system together with the 5iLAF naïve induction strategy, integrative analysis of transcriptional and epigenetic dynamics across the transition from human fibroblasts to naïve iPSCs revealed ordered waves of gene network activation sharing signatures with those found during embryonic development from late embryogenesis to pre-implantation stages. More importantly, Transcriptional analysis showed a significant transient reactivation of transcripts with 8-cell-stage-like characteristics in the late stage of reprogramming, suggesting transient activation of gene network with human zygotic genome activation (ZGA)-like signatures during the establishment of naïve pluripotency. Together, Dissecting the naïve reprogramming dynamics by integrative analysis improves the understanding of the molecular features involved in the generation of naïve pluripotency directly from somatic cells.

Research organism: Human

Introduction

The pluripotent state, emerging during the development of embryos from the totipotent zygote stage into the blastocyst stage in vivo, can be captured indefinitely in vitro as embryonic stem cells (ESCs). Moreover, the generation of induced pluripotent stem cells (iPSCs) further demonstrates that pluripotency can be re-captured directly from somatic cells. Previous studies have demonstrated distinct properties in human ESCs/iPSCs compared with mouse ESCs/iPSCs with regard to cell morphology, transcriptional profiles, signaling requirements and epigenetic modifications(Hackett and Surani, 2014; Nichols and Smith, 2009). Human ESCs/iPSCs represent a primed pluripotent state corresponding to the post-implantation epiblast; while mouse ESCs/iPSCs represent a naïve state of pluripotency found in the pre-implantation blastocyst stage (Brons et al., 2007; Huang et al., 2014; Tesar et al., 2007; Ware, 2017). Recent advances have allowed the development of several strategies to achieve human naïve pluripotency through genetic or chemical manipulations (Chan et al., 2013; Gafni et al., 2013; Hanna et al., 2010; Takashima et al., 2014; Theunissen et al., 2014; Ware et al., 2014). Rigorous molecular criteria for the evaluation of human naïve pluripotency have also been defined through systematic comparisons among naïve pluripotent cell lines derived via different strategies, primed pluripotent cell lines and human pre-implantation embryos(Davidson et al., 2015; Dodsworth et al., 2015; Huang et al., 2014; Theunissen et al., 2016).

The widely used 5iLAF culture system allows the establishment of naïve pluripotency in different types of human cells, including cells from pre-implantation embryos, primed pluripotent stem cells, and even somatic cells (Pastor et al., 2016; Theunissen et al., 2014; Yang et al., 2016). However, in-depth mechanistic studies exploring naïve pluripotency establishment during these reprogramming processes are still lacking, owing to the low efficiency of reprogramming and the high heterogeneity of the reprogramming cells. The human inducible reprogramming system recently developed by Cacchiarelli and colleagues (Cacchiarelli et al., 2015) enables high-resolution analysis throughout the reprogramming process, thus providing a powerful tool for dissecting the molecular roadmap of the gain of naïve pluripotency directly from somatic cells.

In this study, we utilized the human secondary reprogramming system together with the 5iLAF naïve induction system to systematically characterize the transcriptional and epigenetic dynamics involved in the transition from fibroblast cells to naïve pluripotent cells at base resolution. Integrative analysis revealed ordered gene network activation that shared signatures with embryogenesis from the post-implantation to pre-implantation stages, with a transient wave of 8-cell-specific transcripts expression in the late stage of naïve reprogramming, suggesting the activation of gene networks with human zygotic genome activation (ZGA)-like characteristics during naïve pluripotency establishment. Altogether, dissecting the dynamics during naïve induction process provides a comprehensive analysis of the naïve pluripotency reprogramming roadmap, which improves the understating of molecular networks in human naïve pluripotency.

Results

Establishment of inducible naïve reprogramming system in human

To establish the secondary naïve iPSCs induction system in human, we first reprogrammed primary human embryonic fibroblasts (1° hEF) to primary primed iPSCs (1° piPSCs) by using the doxycycline (dox)-inducible polycistronic human OSKM cassette, as previously reported (Park et al., 2008). After the differentiation of clonal 1° piPSCs, the resultant secondary human inducible fibroblast-like cells (2° hiF) were reprogrammed into iPSCs at naïve pluripotent state by 5iLAF with dox treatment (Figure 1—figure supplement 1A). Consistent with previous observations (Cacchiarelli et al., 2015), the 2° hiFs showed significantly decreased proliferation ability and naïve reprogramming efficiency, as well as increased senescence-associated beta-galactosidase activity with long-term culturing and passaging (Figure 1—figure supplement 1B–D); these effects were rescued by immortalizing the 2° hiF cell lines (2° hiF-T) with constitutive human TERT expression (Figure 1—figure supplement 1B–D).

Upon dox treatment, morphological changes occurred in hiF-T cells at approximately day 2, and small cell aggregates were observed as early as day 6 (Figure 1A). Dome-shaped colonies emerged and expanded after the culture conditions were changed from conventional hESC medium (hESM) to 5iLAF medium on day 6 (Figure 1A). After 20 days of induction, the cells were further cultured under dox-withdrawal conditions for 4 days before clonal expansion, to establish the secondary naïve iPSC lines (niPSC-Ts) (Figure 1A; Figure 1—figure supplement 1A). We also used the OCT4-ΔPE-GFP reporter system to monitor the activation of the OCT4 distal enhancer (DE), the molecular signature of ground state pluripotency, during human naïve reprogramming. GFP+ colonies were observed at day 20 of induction (Figure 1B) and were increased during the reprogramming process (Figure 1B), reaching ~91.4% in clonally derived niPSC-Ts, as detected by FACS analysis (Figure 1C). Immunostaining results of the derived niPSC-Ts exhibited robust expression of core pluripotency markers (OCT4, SOX2, NANOG and TRA-1–60) and naïve pluripotency markers (DPPA3 and UTF1) (Figure 1D). In agreement with recent reports (Pastor et al., 2016), almost all niPSC-Ts were negative for SSEA-3 and SSEA-4 expression (Figure 1D). Thus, using the dox-inducible system and 5iLAF naïve reprogramming strategy, we established a stable and reliable system for the integrative study of the transcriptional and epigenetic roadmap to human naïve pluripotency.

Figure 1. Establishment of the secondary naïve iPSC induction system.

(A) Representative bright field images of hiF-Ts, niPSC-Ts and reprogramming cells at the indicated time points during reprogramming. Scale bar, 100 μm. (B) Phase and OCT4-ΔPE-GFP images of niPSC-Ts and reprogramming cells at the indicated time points during reprogramming. Scale bar, 100 μm. (C) Flow cytometry analysis of the proportion of GFP+ cells in OCT4-ΔPE-GFP niPSC-Ts. (D) Immunostaining images of pluripotency-related marker expression in niPSC-Ts. Scale bar, 50 μm.

Figure 1.

Figure 1—figure supplement 1. Optimization of secondary human naïve iPSCs reprogramming system.

Figure 1—figure supplement 1.

(A) Schematic overview of secondary naïve reprogramming strategy to generate human naïve iPSCs. 1° HEF, primary human embryonic fibroblasts; 1° piPSCs, primary primed iPSCs; 2° hiF, inducible fibroblasts; 2° hiF-T, immortalized inducible fibroblasts; 2° niPSCs, secondary naïve iPSCs. (B) Growth curves of hiF and hiF-T cells at different passages. (C) Alkaline phosphatase (AP) staining representing the naïve reprogramming efficiencies of HEFs, hiFs and hiF-Ts at different passages. (D) Representative images showing the senescent cells in hiFs and hiF-Ts at different passages by senescence-associated-beta-galactosidase (SA-β-GAL) assay. Scale bar, 50 μm. Senescent cells are stained in blue and indicated with red arrowheads.

Transcriptional profiling of naïve reprogramming cells

Next, we collected the mRNAs of cells at different time points throughout the naïve reprogramming process and performed RNA-seq analysis (Figure 2A). The Pearson correlation distance analysis of mRNAs segregated the cell samples into three distinct categories including hiF-T/0d/2d/6d, 8d/12d and 14d/20d/24d+dox/24d-dox/niPSC-T (Figure 2—figure supplement 1A). On the basis of the dynamics of the differentially expressed (DE) genes during naïve reprogramming (Figure 2—figure supplement 1B), multi-dimensional scaling (MDS) analysis exhibited a continuous trajectory of transcriptional dynamics from hiF-Ts to established niPSC-Ts (Figure 2B). However, distinct from the results for the primed reprogramming system (Cacchiarelli et al., 2015), the cellular states at days 20–24 during naïve reprogramming were similar, and dox withdrawal did not result in dramatic transcriptional changes in the cells on day 24 (Figure 2B), thus further suggesting the intrinsic differences between naïve and primed pluripotency. Compared with those in the primed reprogramming system, the epiblast-specific markers representing naïve pluripotency were gradually up-regulated (one-tailed t-test p-value=5.99e-22), whereas the primed-specific genes were gradually down-regulated (one-tailed t-test p-value=1.67e-3) during the naïve pluripotency induction process (Figure 2—figure supplement 1C). We also observed transient up-regulation followed by marked down-regulation of transcriptional factors OTX2 and ZIC2 during naïve reprogramming, which were known to direct OCT4 to primed state-specific enhancer sites (Buecker et al., 2014) and exhibited robust activation during primed reprogramming (Figure 2—figure supplement 1D).

Figure 2. Transcriptional profiling of cells during naïve reprogramming.

(A) Schematic representation of reprogramming intermediate collection at different time points, as indicated. hiF-T cells were first cultured in conventional hESM with dox for 6 days and then switched to 5iLAF culture medium supplemented with dox until day 20. Cells with or without dox treatment for four additional days were collected. (B) MDS analysis of RNA-seq data during the naïve reprogramming process. (C) Line plots showing transcriptional dynamics of differentially expressed genes during the naïve reprograming process. Genes were grouped by k-means clustering. Gray shades represent a 95% bootstrap confidence interval around the mean value. (D) Heatmaps showing the expression patterns of genes with pre-implantation signatures in both the naïve and primed reprogramming process. (E) Correlation analysis of transcriptional profiles between naïve reprogramming and the embryonic development process, with the Pearson correlation coefficient of each pair shown on each cell of the heatmap. (F) Immunostaining images of pluripotency-related marker expression in the reprogramming cells at indicated time points during naïve and primed reprogramming. Scale bar, 50 μm.

Figure 2.

Figure 2—figure supplement 1. Transcriptional profiling of naïve pluripotency reprogramming cells.

Figure 2—figure supplement 1.

(A) Correlation analysis (left panel) ad cell clustering analysis (right panel) of transcriptional profiles at different time points during naïve reprograming. (B) Bar plot showing dynamic changes in differentially expressed (DE) gene numbers of two adjacent time points during naïve reprogramming. (C) Expression dynamics of epiblast-specific genes and primed-specific genes during naïve reprogramming relative to those in the primed reprogramming process. (D) Bar plot showing the absolute expression values of OTX2 and ZIC2 genes in the naïve and primed reprogramming processes. Error bars represent a 95% confidence interval around the mean value.
Figure 2—figure supplement 2. Expression dynamics of gene clusters in Figure 2C.

Figure 2—figure supplement 2.

Detailed description of Figure 2D gene clusters with specific expression patterns. Line plot and heatmap (left panels) showing expression patterns of each cluster during naïve reprograming. Grey shades represent a 95% confidence interval around the mean value; Bar plot (middle panels) showing enrichment of p-value of representative GO term of each cluster; Bar plot (right panels) showing expression dynamics of selected genes across naïve reprogramming time course.

Next, we characterized the transcriptome dynamics during naïve reprogramming in detail. On the basis of the DE genes between two adjacent time points of RNA-seq data (Figure 2—figure supplement 1B), we identified several dynamic expression clusters and focused on seven major patterns in three categories (down-regulated, up-regulated and transiently up-regulated), which were defined according to gene ontology (GO) enrichment analysis, developmental cell identity, and expression pattern of marker genes (Figure 2C; Figure 2—figure supplement 2) (Edgar et al., 2013). The activation of OSKM down-regulated genes involved in cell junction and extracellular matrix (ECM) organization occurred in two waves, at day 12 and day 20 of naïve reprogramming (Figure 2C; Figure 2—figure supplement 2). More importantly, pluripotency-associated gene networks were also activated in two waves; the earlier one consisted of genes with an early embryogenesis signature such as gradual up-regulation of CDH1 and NANOG from day 2, whereas the later one comprised genes with pre-implantation characteristics, such as DPPA3 and TFCP2L1 (Figure 2C; Figure 2—figure supplement 2). For the transiently up-regulated genes during naïve reprogramming, the first transient wave peaked at day eight and was enriched in genes characteristic of late embryogenesis and pattern specification, such as LHX9 and the HOX cluster genes (Figure 2C; Figure 2—figure supplement 2). The second wave peaked around day 12–14 and included metabolism-associated genes, such as IGF2 and AFP (Figure 2C; Figure 2—figure supplement 2). The third wave, which peaked during the late reprogramming process at approximately day 24, was enriched in genes important for gamete generation (Fisher’s exact test p-value=1.934e-3) and sexual reproduction (Fisher’s exact test p-value=3.517e-3), such as OVOL1 and CGB5 (Figure 2C; Figure 2—figure supplement 2). Notably, the gene expression program with pre-implantation-like signatures exhibited significantly different dynamics in naïve compared with primed reprogramming systems, which exhibited robust up-regulation along the naïve reprogramming process, peaking in niPSC-Ts (Figure 2C, D). However, such expression was lost upon dox withdrawal and iPSC-T derivation during primed reprogramming (Figure 2D) (Cacchiarelli et al., 2015), results consistent with the MDS analysis results in both reprogramming systems, respectively (Figure 2B) (Cacchiarelli et al., 2015). In addition, in comparing the transcriptional profiles between the naïve reprogramming process of hiF-Ts to niPSC-Ts and human early embryo development (Yan et al., 2013), we found that the reprogramming cells at day 20 and 24 and niPSC-Ts most closely resembled human embryos at the late blastocyst stage (Figure 2E). Immunostaining of the pre-implantation marker DPPA3 and early embryogenesis marker UTF1 in the reprogramming cells at day 20 and day 24 in both induction systems also confirmed our observations in the transcriptional profiles (Figure 2F). Therefore, in contrast to the primed reprogramming system, in which the transcriptional profile finally dropped to the post-implantation-like stage (Cacchiarelli et al., 2015), the derivation of naïve iPSCs was accompanied by ordered waves of gene network activation, with the gene program finally stabilizing at the pre-implantation-like stage.

Gene network activation with 8-cell-stage-like characteristics during naïve reprogramming

During human embryo development, the major wave of embryonic genome activation (EGA) occurs at approximately day three at ~8 cell (8C) stage (Niakan et al., 2012; Vassena et al., 2011), times corresponding to the major wave of zygotic genome activation (ZGA) at the 2 cell stage in mice, which is a key transcriptional feature of totipotency (Latham and Schultz, 2001; Macfarlan et al., 2012; Schultz, 2002). By analyzing single cell RNA-seq datasets of human early embryos (Yan et al., 2013), we identified 538 genes with expression levels peaking at the 8C stage during human embryonic development (Figure 3A), including the previously reported EGA regulators with PRD-like homeodomains such as DUXA, OTX2 and LEUTX (Töhönen et al., 2015). Further investigation revealed that these 8C-genes were significantly enriched in the cluster including genes important for gamete generation (Fisher’s exact test p-value=3.784322e-08) (Figure 3A; Figure 3—figure supplement 1A, Figure 2—figure supplement 2) and exhibited similar transiently up-regulated expression patterns during the late stages of naïve reprogramming (Figure 3B). In addition to ZSCAN4 (Figure 3C; Figure 3—figure supplement 1B), whose mouse homologs are 2C-stage restricted and are important for telomere stability in mouse ESCs and iPSCs (Ko, 2016; Zalzman et al., 2010); we also identified KLF17, TBX20 and some PRAMEF family genes including PRAMEF15, PRAMEF5 etc. in the 8C-stage category with transiently enhanced transcription activity during late stages of naïve reprogramming but not in the primed reprogramming system (Figure 3—figure supplement 1C). More importantly, we identified MBD3L2/3/4/5 genes as 8C-genes during human embryo development (Figure 3—figure supplement 1B); these genes are the homologs of the mouse Mbd3l2 gene, which is specifically expressed at the 2C stage during mouse development (Jiang et al., 2002). Similar to the genes listed above, the transcriptional dynamics of MBD3L2/3/4/5 also showed dramatically increased expression during the late stages of naïve reprogramming from day 20 to 24 even after dox withdrawal, then decreased sharply after niPSC-T derivation (Figure 3C, D; Figure 3—figure supplement 1C, D). However, during the primed reprogramming, we observed only a small transient wave of ZSCAN4 expression during the late stages from day 20 to day 24 (Figure 3C), and only marginal MBD3L2/3/4/5 expression throughout the induction process (Figure 3D). Interestingly, the MBD3L2/3/4/5 genes are loci-clustered in the human genome and are far from MBD3L1. Although these genes exhibited differential mRNA expression levels, the amino acid sequences of each gene in the cluster are identical, thus suggesting that the gene cluster might have evolved from mouse Mbd3l2 via the copy-paste mode, similar to the situation observed in mouse Zscan4 retrotransposons and ERV repeats. Using specific human ZSCAN4 and MBD3L2-5 antibodies, we performed immunostaining in cells during reprogramming. While only weak expression of ZSCAN4 as well as no expression of MBD3L2-5 could be detected around day 14, we could observe robust expression of ZSCAN4 and MBD3L2-5 around day 24 during naïve reprogramming (Figure 3E). During primed reprogramming, we could observe weak expression of ZSCAN4 and DPPA3 at 24d + dox, which diminished after dox withdrawal (Figure 3—figure supplement 1E). However, the expression of MBD3L2-5 could not be detected during primed reprogramming process (Figure 3—figure supplement 1E). Western blot analysis of MBD3L2-5 also confirmed our observations in both immunostaining and transcriptional profiles (Figure 3F). Although the transcriptional profiles of MBD3L cluster genes exhibited sharply decreased mRNA expression in the derived niPSC-T lines, we could still observe MBD3L2-5+ cells in naïve iPSC clones compared to their absence in primed iPSCs by immunostaining (Figure 3—figure supplement 1F), suggesting the expression of MBD3L2/3/4/5 as an indicator for naïve pluripotency. In summary, the transient expression of 8C-genes, which occurs specifically during the late stages of naïve reprogramming, suggests the emergence of gene network activation with 8C-stage-like characteristics. Compared with the primed reprogramming that finally stabilized at the post-implantation-like state (Cacchiarelli et al., 2015), naïve reprogramming not only reached the pre-implantation-like status but more importantly, underwent reactivation of gene programs with human EGA-like characteristics.

Figure 3. Transient activation of transcripts with 8C-stage-like signatures during naïve reprogramming.

(A) Line plot showing expression dynamics of 8C-stage-specific genes during human embryonic development. Gray shades represent a 95% confidence interval around the mean value. (B) Line plot showing transcriptional dynamics of 8C-specific genes across naïve and primed reprogramming. (C–D) Bar plot showing the absolute expression values of ZSCAN4 (C) and MBD3L2 genes (D) in the naïve and primed reprogramming processes. Error bars represent a 95% confidence interval around the mean value. (E) Immunostaining images of ZSCAN4, MBD3L2-5 and DPPA3 expression in cells during naïve reprogramming. Scale bar, 50 μm. (F) Western blot results of MBD3L2-5 expression in naïve and primed reprogramming cells, niPSC-Ts and piPSC-Ts. β-ACTIN was used as endogenous control. (G) Line plot showing the expression patterns of 8C-specific TEs during human embryonic development. Gray shades represent a 95% confidence interval around the mean value. (H) Line plot showing expression dynamics of 8C-specific TEs during naïve reprogramming. (I–J) Heatmap of expression patterns of 8C-specific HERVK integrants across naïve reprogramming (I) and pre-implantation development (J). (K) Heatmaps showing different expression patterns of KRAB-ZNF genes in the naïve reprogramming process. K-means clustering was performed on KRAB-ZNF genes with k = 8 using R library ‘amap’. Distance between genes was measured based on their correlation.

Figure 3.

Figure 3—figure supplement 1. Dynamics of 8C-genes during naïve reprogramming.

Figure 3—figure supplement 1.

(A) Bar plot showing the p-value of 8C-gene enrichments in each cluster measuring with Fisher’ exact test. (B) Line plot showing expression patterns of selected genes during human pre-implantation development. (C) Bar plot showing the absolute expression values of several 8C-genes in the naïve and primed reprogramming processes. Error bars represent a 95% confidence interval around the mean value. (D) Bar plot showing the absolute expression values of MBD3L3/4/5 genes in the naïve and primed reprogramming processes. Error bars represent a 95% confidence interval around the mean value. (E) Immunostaining images of ZSCAN4, MBD3L2-5 and DPPA3 expression in cells during primed reprogramming. Scale bar, 50 μm. (F) Immunostaining for OCT4, ZSCAN4 and MBD3L2-5 in niPSC-Ts and piPSCTs respectively. Scale bar, 20 μm.
Figure 3—figure supplement 2. Dynamics of TEs during naïve reprogramming.

Figure 3—figure supplement 2.

(A) Pie charts of genomic distribution of top 100 8C-TEs distribution (upper panel) and all TEs (lower panel), showing significant enrichment (Fisher’s exact test p-value=0.00408) in LTRs. (B) Heatmaps of expression patterns of SVA integrants during naïve reprogramming (upper panel) and embryonic development (lower panel). (C) Expression dynamics of LTR7-HERVH during embryonic development. (D) Bar plot showing expression dynamics of LTR7-HERVH during naïve and primed reprogramming. (E) Transcriptional dynamics of ZNF534 in naïve and primed reprogramming. Error bars represent a 95% confidence interval around the mean value.

Several lines of evidence in mouse pre-implantation development revealed activation of many retrotransposons, including ERVs, LINE-1 elements and SINE elements during ZGA at the 2C stage (Peaston et al., 2004). According to our observation of the 8C-gene network wave during naïve reprogramming, we next assessed the dynamics of transposon elements (TEs) during this process. We identified TEs that were specifically and highly expressed at the 8C stage (8C-TEs) during human embryo development (Figure 3G); these TEs also showed transient reactivation during the late stages of naïve reprogramming (Figure 3H). Further investigation of these 8C-TEs showed a significant enrichment (Fisher’s exact test p-value=0.00408) in LTR (Figure 3—figure supplement 2A). Notably, integrants of the HERVK-family that were activated upon EGA during development (Grow et al., 2015), especially MER9a3-HERVK9, were significantly up-regulated at day 12 of naïve reprogramming and down-regulated upon niPSC-T derivation (Figure 3I; J). These results, together with the recent observations of the transient activation of 2C-specific MERVL/ZSCAN4 transcriptional network in the intermediate-late stages of mouse iPSC reprogramming (Eckersley-Maslin et al., 2016), suggest that the transient wave of 8C-transcripts is an important feature of human naïve reprogramming and that the reactivation of a gene network with the human EGA-like signature occurs during this process.

Recent studies in naïve pluripotency evaluation have indicated that naïve human ESCs display a unique transposon signature of cleavage-stage embryos with significant overexpression of the SINE-VNTR-Alu (SVA) family of transposon elements (Theunissen et al., 2016). In our system, reprogramming to naïve pluripotency induced significant up-regulation of several subgroups of the SVA family highly expressed during the morula-stage of human early embryo development, with the highest expression levels in niPSC-Ts (Figure 3—figure supplement 2B). Similar to previous observations (Theunissen et al., 2016), LTR7 and HERVH-int in the LTR7-HERVH family that were highly expressed in hESC0 and hESC10 that simulated the post-implantation stages, with no enrichment in 8C or morula stages during development (Figure 3—figure supplement 2C), exhibited higher expression levels across primed reprogramming than across naïve induction (Figure 3—figure supplement 2D). Hence, the transcriptional dynamics of TEs also revealed an ordered reactivation of transcripts with 8C- and morula-stage signatures in the intermediate-late stages of naïve reprogramming.

We also assessed the transcriptional profiles of KRAB-ZNF genes during reprogramming, which have been reported to play central roles in repressing TEs during early embryogenesis (Huntley et al., 2006; Quenneville et al., 2011). Transcriptional dynamics analyses divided the KRAB-ZNFs in to eight distinct clusters, among which the expression pattern of genes in cluster v was remarkably up-regulated in the naïve reprogramming system with robust expression in niPSC-Ts (Figure 3K). These genes included ZNF534, the repressor of LTR7-HERVH (Figure 3—figure supplement 2E). These results suggested that the naïve-specific KRAB-ZNF genes were activated to repress the TE network with a post-implantation-like signature, accompanied by the reactivation of TEs with characteristics reminiscent of morula-stage embryos during reprogramming to the naïve pluripotent state.

Dynamic changes in epigenetic modifications in the naïve reprogramming system

Transcriptional profiling across the naïve reprogramming process revealed an ordered reactivation of diverse developmental pathways from late embryogenesis to the pre-implantation stages. Next we examined the dynamics of epigenetic modifications during this process. In contrast to the continuous hypermethylation during primed reprogramming, a significant decrease in global DNA methylation was observed throughout the naïve pluripotency induction process (Figure 4A; Figure 4—figure supplement 1A), including an average methylation level resembling that in ICM by the end of reprogramming in niPSC-Ts, as previously demonstrated (Figure 4A) (Leitch et al., 2013; Okae et al., 2014; Pastor et al., 2016; Smith et al., 2014; Theunissen et al., 2016). Quantitative analysis via HPLC-MS also revealed a dramatic decrease in 5mC levels throughout naïve reprogramming but not the primed reprogramming process (Figure 4—figure supplement 1B). Dynamic changes in differentially methylated C sites (DMCs) throughout naïve reprogramming also showed an increasing trend of hypomethylated C site ratios (Figure 4—figure supplement 1C), a result consistent with the global demethylation trend (Figure 4A; Figure 4—figure supplement 1A, B). For detailed analysis, we characterized naïve-specific differentially methylated Regions (DMRs) by comparing the DMRs of niPSC-Ts with those of hiF-Ts and piPSC-Ts (Figure 4—figure supplement 1D); the results showed a dramatic trend of down-regulation in the average methylation ratios during naïve reprogramming, but no significant changes during primed reprogramming (Figure 4B). We also found that the identified naïve-specific DMRs were enriched in genes related to naïve pluripotency, which exhibited significant DNA de-methylation at the promoter regions on approximately day 24 during reprogramming (Figure 4C).

Figure 4. Changes in DNA methylation during naïve reprogramming.

(A) Bar plot showing changes in average DNA methylation ratios of all covered C sites during naïve and primed reprogramming. Error bars represent a 95% confidence interval around the mean value. (B) Box plot showing DNA methylation ratio dynamics of naïve specific DMRs during naïve and primed reprogramming. The middle lines of the boxes indicate the median, the outer edges represent the first and the third quartiles, and the whiskers indicate the 1.5 × interquartile range below the lower quartile and above the upper quartile. (C) Dynamics in the DNA methylation levels of naïve-specific DMR-related genes during naïve reprogramming. (D) DNA methylation over stable primary imprints during naïve and primed reprogramming. The middle lines of the boxes indicate the median, the outer edges represent the first and the third quartiles, and the whiskers indicate the 1.5 × interquartile range below the lower quartile and above the upper quartile.

Figure 4.

Figure 4—figure supplement 1. Dynamics of DNA methylation in naïve reprogramming.

Figure 4—figure supplement 1.

(A) Histograms showing the distribution of methylation levels (%) across all CpGs at indicated time points during naïve reprogramming. a, average methylation level. (B) 5mC dynamics across naïve and primed reprogramming by HPLC-MS analysis. (C) Bar plot showing DMC dynamics during naïve reprograming. The ratio of hyper-DMCs divided by the number of all C sites (red bars) and the ratio of hypo-DMCs divided by the number of all C sites (blue bars) were plotted separately. (D) Heatmap showing dynamics of DNA methylation on hiF-T with niPSC-T and piPSC-T. Each line is a DMR called using mcomp from MOABS. Naïve specific DMRs were defined as those only detected in niPSC-Ts compared with hiF-Ts and piPSC-Ts. (E) Line plot showing the expression dynamics of selected genes related to DNA methylation (upper panel) and TET family genes (lower panel) in naïve reprogramming (solid line) and primed reprogramming (dash line) process.

Growing evidence indicates that naïve PSCs lose DNA methylation at primary imprints, which are retained throughout pre-implantation development (Pastor et al., 2016). We examined DNA methylation dynamics in 31 stable primary imprints in both the naïve and primed reprogramming processes. The DNA methylation ratios markedly decreased during naïve reprogramming, and methylation was nearly completely lost in niPSC-Ts (Figure 4D), results similar to previous observations (Pastor et al., 2016). However, in primed reprogramming, the decrease was more moderate, and the DNA methylation ratio finally stabilized at ~50% in piPSC-Ts, as previously reported (Figure 4D) (Pastor et al., 2016). The failure to reactivate DNMT3A and DNMT3B, as well as the over-activation of TET2 (Figure 4—figure supplement 1E), might be causative of the genome-wide hypomethylation and loss of imprinting observed in the naïve pluripotency induction process.

To assess the genome-wide landscape of histone modifications, we mapped two active modifications (H3K4me2 and H3K4me3) and two repressive modifications (H3K27me3 and H3K9me3) during naïve reprogramming. We first classified genes to ‘H3K4me2-only’, ‘H3K4me3-only’ and ‘both H3K4me2/H3K4me3’ catalogs based on the H3K4me3 and H3K4me2 modification signals detected on the promoter region of each gene. A transiently decreased pattern of ‘both H3K4me2/H3K4me3’ gene numbers was observed during this process, with an increasing pattern of ‘H3K4me3-only’ gene numbers as well as a decreasing pattern of ‘H3K4me2-only’ signals, thus suggesting that there was a transient transition from H3K4me2 to H3K4me3 on the promoter regions during the naïve reprogramming process (Figure 5—figure supplement 1A).

Next, we focused on H3K4me3 and H3K27me3 dynamics during reprogramming. We observed increasing numbers of bivalent genes during the reprogramming process (Figure 5A,B), however, the up-regulation of bivalent signals was much slower with less gene numbers in niPSC-Ts during naïve reprogramming compared to primed reprogramming (Figure 5A,B). For further comparison, we clustered the genes with similar patterns of bivalency on their promoters in both naïve and primed reprogramming process, and found that there were two clusters of genes highly related to embryonic development showing increasing bivalent signals on their promoter regions during primed reprogramming, while exhibiting almost no H3K4me3/H3K27me3 signals during the whole reprogramming process to naïve pluripotency (Figure 5C; Figure 5—figure supplement 1B), results consistent with the previous report (Pastor et al., 2016; Theunissen et al., 2014; Yang et al., 2016).

Figure 5. Histone modification profiles during naïve reprogramming.

(A) Alluvial plots showing the global dynamics of genes covered by different chromatin states during naïve reprogramming. Each line represents a gene. Red bar represents genes with promoter that covered only by H3K4me3. Yellow bar represents genes with promoter that covered only by H3K27me3. Blue bar represents genes with promoter that covered by both H3K4me3 and H3K27me3. Grey bar represents genes with promoter that covered by neither H3K4me3 nor H3K27me3. (B) Line plot showing dynamics of different histone modification signals across naïve and primed reprogramming. (C) Heatmap showing clusters of genes with different bivalency patterns across naïve and primed reprogramming. (D) Average profiles of H3K4me3 and H3K27me3 signals surrounding the TSS of genes characteristic for early embryogenesis and pre-implantation during naïve reprogramming. (E) Heatmap showing six clusters of H3K9me3 peaks with different patterns (left panel) and average H3K9me3 profiles around integrants of SVA family in cluster vi of the heatmap (right panel) during naïve reprogramming. (F) Average H3K9me3 profiling around MER9a3-HERVK-9 TE during naïve reprogramming.

Figure 5.

Figure 5—figure supplement 1. Dynamics of histone modifications in naïve reprogramming.

Figure 5—figure supplement 1.

(A) Alluvial plots showing the global dynamics of genes covered by different chromatin states during naïve reprogramming. Each line represents a gene. Red bar represents genes with promoter that covered only by H3K4me2. Yellow bar represents genes with promoter that covered only by H3K4me3. Blue bar represents genes with promoter that covered by both H3K4me2 and H3K4me3. Grey bar represents genes with promoter that covered by neither H3K4me2 nor H3K4me3. (B) Enrichment of p-value of representative GO term of the selected clusters in Figure 5C. (C) Average H3K9me3 profiles around integrants of SVA family during naïve reprogramming.

To address the effect of histone modification on transcriptional activity, we also examined the H3K4me3 and H3K27me3 modification dynamics around the transcription start sites (TSSs) of gene clusters with different expression patterns. The average H3K4me3 signals were highly correlated with the transcriptional dynamics and increased around the TSSs of genes with pre-implantation and early embryogenesis signatures, while the average H3K27me3 signals decreased around the TSSs of these genes during reprogramming (Figure 5D).

We also checked the dynamics of H3K9me3 modification and found a decreasing pattern of H3K9me3 signals during naïve reprogramming, including integrants of SVA family, the naïve specific TEs (Figure 5E; Figure 5—figure supplement 1C). Moreover, the H3K9me3 signals around the 8C-TEs, especially MER9a3-HERVK9, exhibited a transiently decreased pattern at day 14, strongly correlated with the transiently increasing transcriptional activity of these 8C-TEs during naïve reprogramming (Figure 5F).

Integrative analysis of the naïve iPSC reprogramming system

Next, we focused on the down- and up-regulated patterns of genes with early/late somatic, early embryogenesis- and pre-implantation-like signatures (Figure 2C; Figure 2—figure supplement 2) and analyzed the relationships among the gene expression patterns, DNA methylation status and histone modifications. Despite their distinct expression dynamics, the transcriptional levels of these genes were closely correlated with the changes in epigenetic modifications at the promoter regions during naïve reprogramming, regulated by DNA methylation, histone modifications, or both (Figure 6A). Deep investigations at the base level also confirmed our observations above (Figure 6—figure supplement 1A). For detailed analysis, we divided the up-regulated genes during reprogramming into three groups: high-CpG-density promoters (HCPs), intermediate-CpG-density promoters (ICPs) and low-CpG-density promoters (LCPs) on the basis of the CpG ratios and the GC contents of their promoters (Figure 6—figure supplement 1B). We observed that, the two clusters with up-regulation trend showed different CpG-density patterns on these promoters (Figure 6A). Furthermore, compared with the genes with lower CpG-densities in their promoters, genes with higher CpG-densities tended to undergo more rapid DNA demethylation and re-establishment of active histone modifications (H3K4me2/3), as well as earlier up-regulation in transcription during naïve reprogramming (Figure 6B; Figure 6—figure supplement 1B). We examined the genes classified by different CpG-densities at promoter regions and found that the genes associated with core naïve pluripotency were enriched primarily in the LCP and ICP groups (Figure 6—figure supplement 1C). Together, these results indicated that the transcriptional dynamics strongly correlated with epigenetic changes during naïve reprogramming, which may be affected by the CpG density at gene promoter regions.

Figure 6. Integrative analysis of transcriptional and epigenetic dynamics during naïve reprogramming.

(A) Transcriptional dynamics of genes with different patterns are closely correlated with epigenetic modifications at promoter regions during naïve reprogramming. Black lines in the heatmaps separate genes regulated by histone modifications (upper panel, left part), DNA methylation (upper panel, right part) or both (upper panel, middle part). Line plots (lower panels) show the representative genes in each category that are regulated by H3K4me3 modification or DNA methylation. (B) Up-regulated genes with different CG ratios in their promoters exhibit different kinetics with regard to transcription, DNA methylation and H3K4me2/H3K4me3/H3K27me3 coverage during naïve reprogramming.

Figure 6.

Figure 6—figure supplement 1. Epigenetic changes of representative genes during naïve reprogramming path.

Figure 6—figure supplement 1.

(A) Base-level visualization of DNA methylation and histone modifications in the promoter regions of representative genes. (B) Classification of genes by the CG ratio in their promoter. (C) Enrichment of p-value of representative GO term of clusters based on different CG ratios in promoters as shown in Figure 6B.

Discussion

The newly discovered ‘naive’ state of human pluripotency holds great promise for early embryo development studies and therapeutic manipulations, overcoming the application bottlenecks of pluripotent stem cells at primed state. However, recent studies have primarily focused on naïve pluripotency derivation and identification (Dodsworth et al., 2015; Gafni et al., 2013; Hanna et al., 2010; Huang et al., 2014; Takashima et al., 2014; Theunissen et al., 2016; Theunissen et al., 2014), in lack of in-depth mechanical studies of naïve pluripotency establishment during reprogramming. Here, by using the secondary human naïve reprogramming system, we monitored the dynamics of transcriptome and epigenome during naïve pluripotency induction process. We observed ordered reactivations of transcriptional networks sharing signatures reminiscent of reversed embryonic development, including a significant transient wave of 8C-stage-specific transcripts expression in the late reprogramming stages followed by the stabilization of the transcriptome with pre-implantation-like characteristics, thus suggesting that the activation of a network with human EGA-like characteristics occurs during naïve pluripotency induction.

Recent advances have demonstrated that during the primed iPSCs induction, the pre-implantation-like state is reached quickly but is lost upon dox withdrawal and iPSC line derivation (Cacchiarelli et al., 2015). Distinctly from that phenomenon, the reprogramming cells in the naïve pluripotency induction process undergo a gradual reactivation of networks with pre-implantation signatures (Figure 2). Interestingly, the expression wave of 8C transcripts observed specifically in naïve reprogramming (Figure 3) suggests that the 8C-stage-like state is transiently established during the naïve pluripotency induction process. We also observed a significant genome-wide DNA demethylation as early as 12 days after naïve induction, which eventually led to hypomethylation in naïve iPSCs with similar methylation ratios of ICM in vivo, as previously reported (Hanna et al., 2010; Pastor et al., 2016; Theunissen et al., 2016) (Figure 4A; Figure 4—figure supplement 1A). The failure to maintain DNA methylation might be correlated with the lower expression levels of de novo DNA methyltransferase DNMT3A/3B, as well as the higher expression level of 5mC oxidase TET2, across naïve reprogramming compared with the primed system (Figure 4—figure supplement 1E).

Taken together, dissecting and analyzing the dynamics of naïve induction process provide the first molecular roadmap of the reprogramming of human somatic cells into naïve pluripotent state, which improve the understating of the molecular networks in the establishment and maintenance of naïve pluripotency and provide a theoretical basis for further applications.

Materials and methods

Key resources table.

Reagent type (species)
or resource
Designation Source or reference Identifiers Additional
information
gene () NA NA
strain, strain background () NA NA
genetic reagent () NA NA
cell line () Human embryonic fibroblasts (HEFs);
Primary primed iPSC lines;
hiF-T cell lines; Secondary primed iPSC
lines; Secondary naïve iPSC lines
This paper;
Cacchiarelli, D., Trapnell, C.,
Ziller, M.J., Soumillon, M.,
Cesana, M., Karnik, R.,
Donaghey, J., Smith, Z.D.,
Ratanasirintrawoot, S.,
Zhang, X., et al. (2015).
Cell. 2015 Jul
16;162(2):412–424. doi:
10.1016/j.cell.2015.06.016.
Yan, L., Yang, M., Guo, H.,
Yang, L., Wu, J., Li, R., Liu, P.,
Lian, Y., Zheng, X., Yan, J.,
et al. (2013). Nat Struct
Mol Biol. 2013
Sep;20(9):1131–9. doi:
10.1038/nsmb.2660.
Epub 2013 Aug 11.
transfected construct () dox-inducible, polycistronic
OKMS lentiviral vector
Addgene 51543.
Cacchiarelli, D., Trapnell, C.,
Ziller, M.J., Soumillon, M.,
Cesana, M., Karnik, R.,
Donaghey, J., Smith, Z.D.,
Ratanasirintrawoot, S.,
Zhang, X., et al.
(2015).Cell. 2015 Jul
16;162(2):412–424. doi:
10.1016/j.cell.2015.06.016.
biological sample () hiF-T/0d/2d/6d/8d/12d/14d/20d/
24d+dox/24d-dox/niPSC-T;
Oocyte/Zygote/2 cell/4 cell/8 cell/
Morula/MTE/PTE/EPI/PE/hESC0/hESC10;
hiF-T/2d/6d/8d/14d/20d/24d+
dox/24d-dox/piPSC-T;
this paper;
antibody anti-SSEA3, SSEA4, TRA-1–60, UTF1,
DPPA3, ZSCNA4, MBD3L2
Millipore MAB4304,
Millipore MAB4360, Abcam
ab24273, Santa Cruz sc-
67249, Millipore AB4340,
Abcam ab174802
recombinant DNA reagent NA NA
sequence-based reagent KAPA Stranded mRNA-Seq Kit;
KAPA DNA Library Preparation Kits
KAPA KK8401;
KAPA KK8234
peptide, recombinant protein human LIF recombinant protein;
bFGF recombinant protein
Peprotech 300–05;
Peprotech 450–33
commercial assay or kit bowtie; TopHat; Cufflinks;
edgeR; MACS2;
KAPA KK8401;
KAPA KK8234
chemical compound, drug Activin A; PD0325901; IM-12;
SB590885; WH-4–023; Y-27632;
Peprotech 120–14;
Stemgent 04–0012; Enzo
BML-WN102; R and D
systems 2650; A Chemtek
0104–002013; Stemgent
04–0012;
software, algorithm bowtie; TopHat; Cufflinks;
edgeR; MACS2;
PMID: 22388286;
PMID: 19289445;
PMID: 22383036;
PMID: 24743990;
PMID: 18798982;
RRID:SCR_005476;
RRID:SCR_013035;
RRID:SCR_014597;
RRID:SCR_012802;
RRID:SCR_013291
NA
other

Human skin tissue acquisition, cell culture and reprogramming

Human skin specimens from abortive fetus were obtained from the Clinical and Translational Research Center of Shanghai First Maternity and Infant Hospital, Tongji University to make human embryonic fibroblasts (HEFs). The identity of HEFs has been authenticated by STR profiling and the cells are cultured with no mycoplasma contamination.

HEFs were cultured in DMEM (Invitrogen) supplemented with 10% FBS (Invitrogen). Primed iPSCs were cultured in hESM containing DMEM/F12 with 20% knockout serum replacement (KSR) (Invitrogen) and 4 ng/ml bFGF (Peprotech). Naïve iPSCs were cultured in 5iLAF medium containing DMEM/F12: Neurobasal (1:1) (Invitrogen), 1% N2 supplement (Invitrogen), 2% B27 supplement (Invitrogen), 0.5% KSR (Invitrogen), 20 ng/ml human LIF (Peprotech), 8 ng/ml bFGF (Peprotech), 50 μg/ml BSA (Sigma) and the following cytokines and small molecules: PD0325901 (Stemgent, 1 μM), IM-12 (Enzo, 1 μM), SB590885 (R and D systems, 0.5 μM), WH-4–023 (A Chemtek, 1 μM), Y-27632 (Stemgent, 10 μM), and Activin A (Peprotech, 20 ng/ml), and passaged by Accutase (Sigma) every 4–5 days as previously reported (Theunissen et al., 2014; Yang et al., 2016).

For reprogramming, hEFs were infected with the dox-inducible, polycistronic OKMS lentiviral vector (addgene) and cultured in conventional human embryonic stem cell medium(hESM) to generate primary primed iPSCs, which were further differentiated into hiFs and then infected with pBabe-TERT retroviral vector and screened by 1.6 ng/ml puromycin to generate hiF-Ts. Secondary naïve iPSC-Ts were derived by culturing hiF-Ts in hESM with dox for 6 days followed by changing to 5iLAF medium with dox for 14 days, which were then maintained in 5iLAF medium without dox for additional 4 days. After 24 days of infection, mESC-like colonies were picked and expanded wihout dox on irradiated feeder cells in 5iLAF medium.

For differentiation from naïve to primed state, niPSC-T cells were first digested into single cells and plated onto irradiated feeder cells, which were then cultured in conventional hESM supplemented with Y-27632 (Stemgent, 10 μM) for 8–10 days.

Detection of cells growth rate

5 × 104 cells (early hiF, late hiF, early hiF-T and late hiF-T) were passaged onto 12-well plate in three replicates. Calculation of cell growth rate was performed by counting of cell numbers at 24 hr, 48 hr, 72 hr and 96 hr respectively.

Antibodies

For immunostaining, the primary antibodies used included those against OCT3/4 (1:500, Santa Cruz), SOX2 (1:500, Santa Cruz), NANOG (1:500, Abcam), SSEA3 (1:50, Millipore), SSEA4 (1:50, Millipore), TRA-1–60 (1:50, Millipore), UTF1 (1: 200, Abcam), DPPA3 (1:50, Santa Cruz), ZSCAN4 (1:100, Millipore) and MBD3L2 (1:100, Abcam). The following secondary antibodies were used: Alexa Fluor 594-conjugated donkey anti-mouse IgG (1:500; Invitrogen), fluorescein isothiocyanate (FITC) 488-conjugated donkey anti-rabbit IgG (1:500; Invitrogen), and FITC 488-conjugated donkey anti-goat IgG (1:500; Invitrogen). Anti-H3K4me2 (Millipore), Anti-H3K4me3 (Millipore), Anti-H3K27me3 (Abcam) and Anti-H3K9me3 (Millipore) antibodies were used for ChIP experiments.

Flow cytometry and immunofluroresence staining

For flow cytometry, OCT4-ΔPE-niPSC-Ts were collected, washed and re-suspended in FACS buffer containing PBS (Invitrogen) supplemented with 2% FBS (Invitrogen). All analyses were performed on a MoFloXDP cell sorter (Beckman Coulter). Flow cytometry data were processed using Flow Jo software.

Immunostaining was performed according to standard protocols. In brief, cells were fixed with PBS containing 4% paraformaldehyde (Sigma-Aldrich) overnight at 4°C and permeabilized for 15 min in PBS containing 0.5% Triton X-100. After incubation with blocking buffer (PBS containing 4% BSA) for 30 min at room temperature, cells were incubated with primary antibodies followed by secondary antibodies. Nuclei were stained with 4’,6-diamidino-2-phenylindole (1:10,000; Sigma-Aldrich). Images were taken using A1 Nikon confocal microscope.

RNA-seq library generation and sequencing

Total RNAs were isolated from naïve iPSCs and reprogramming cells using TRizol (Invitrogen). To generate RNA sequencing libraries, KAPA Stranded mRNA-Seq Kit (KAPA) was used following the manufacturer’s instructions. Adapters were offered by TruSeq Library Prep Pooling kit (Illumina). Single-end 50 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation.

RNA-seq data processing

Gene expression Analysis

All RNA-seq reads were aligned to the human genome (hg19) using TopHat (v2.0.12) with default parameters (Trapnell et al., 2009). Gene expression level was measured as FPKM using Cufflinks(v2.2.1) to eliminate the effects of sequencing depth and transcript length (Trapnell et al., 2010). MDS clustering analysis was based on expression profile of all genes using R function ‘cmdscale’. For each comparison, differential expressed (DE) genes were founded using GFOLD(v1.1.3) with the GFOLD value >0.58 (fold change >1.5)(Feng et al., 2012). For following analysis, FPKM were log2 transformed after adding a pseudo-count of 1. K-means clustering was performed on combined DE genes of each nearby time points with k = 14 using R library ‘amap’. Distance between genes was measured based on their correlation. Batch effects of samples from different systems are removed using removeBatchEffect function of the R library ‘edgeR’. The negative values in the normalized data are considered as zero. Pearson correlation coefficient was calculated between each two samples on common genes using R function cor().

Retrotransposon expression analysis

All RNA-seq reads were aligned to the human genome (hg19) using bowtie2(v 2.2.3) with default parameters. Then FPKM of repeat classes were calculated as the sum of the number of reads that align to each class divided by the genome coverage of the class in kilobases. FPKM of each repeat elements were calculated same as repeat classes. Repeats that expressed in specific stages were identified by comparing the repeat expression in the specific stage with the average expression. The repeats with average expression lower than 0.0001 were discarded.

A time-point specific expressed score was calculated as the expression of given time point divided by the average expression for each candidate gene, which should have a max expressed value in the given stage. The top 100 8C-TEs are the top 100 TE ranked by the time-point specific expressed score.

Gene ontology analysis

Functional annotation was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resource (Huang et al., 2009). Gene ontology terms for each function cluster were summarized to a representative term and P-values were plotted to show the significance. The genes are classified according to expression pattern of marker genes and developmental cell identity using LifeMap Discovery (Edgar et al., 2013).

ChIP-seq library generation and sequencing

Reprogramming cells and iPSCs were cross-linked and lysed to release chromatin, which were then sonicated by SonicsVibraCell Sonicator (Covaris) and immunoprecipitated with pretreated antibody-coupled ProteinG Dynabeads (Invitrogen) at 4°C for 8–12 hr. The ChIPed DNA was reverse-crosslinked, eluted, purified and quantified by Qubit dsDNA HS assay kit (Life Technologies). For ChIP sequencing, ChIP-seq libraries were prepared according to the protocols described in the KAPA DNA Library Preparation Kits (KAPA). Paired-end 125 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation.

ChIP-seq data processing

All ChIP-seq reads were aligned to the human genome (hg19) using bowtie2(v 2.2.3) with default parameters. Then reads signal for each sample were generated using the MACS2 (v2.1.0.20140616) pileup function and were normalized to 1 million reads for visualization. Chromatin states were identified and characterized using ChromHMM (v1.11) (Ernst and Kellis, 2012). The reads alignment files of H3K4me2, H3K4me3 and H3K27me3 modifications across 4-time stages during reprogramming path were binned into 200 bp bins using the Binarize Bam command. Next, a model was trained using the Learn Model command with 200 bp resolution and default parameters. Finally, for active marker, the whole genome was classified into four states: H3K4me2-only (without H3K4me3) region, H3K4me3-only (without H3K4me2) region, H3K4me2 and H3K4me3 region and non-marked region at each stage. For bivalency analysis, the whole genome was then classified into four states: H3K4me3-only (without H3K27me3) region, H3K27me3-only (without H3K4me3) region, H3K4me3 and H3K27me3 region and non-marked region at each stage. Genes were classified based on their promoters overlapping with ChromHMM segments in each stage. Promoters were defined as ±2 kb around the TSS. Genes overlapped with multiply segments in one stage were discarded in following analysis.

Alluvial diagrams of reprogramming lineages were plotted using the alluvial function in R to show the transitions of different histone mark covered gene number. The gene number was the counter of genes in each state and the percentages of the specified intervals in each stage were plotted to show the global trend of that specific chromatin state. The alluvial diagrams showed the percentage changes of different chromatin states covered gene during each transition; the lines from the present stage to next stage cannot be traced, as they represent different genes.

RRBS and data processing

For Reduced Representation Bisulfite Sequencing (RRBS), genomic DNAs were extracted from reprogramming cells and iPSCs using DNeasy Blood and Tissue Kits (QIAGEN), digested by (NEB) and inactivated by heating. The sequencing libraries were constructed as previously described (Gu et al., 2011). Paired-end 125 bp sequencing was further performed on a HiSeq 2500 or 2000 (Illumina) at Berry Genomics Corporation. All Reduced Representation Bisulfite Sequencing (RRBS) reads were aligned to human genome (hg19).

The correlation between DNA methylation ratio and gene expression was calculated using function cor() in R on each gene.

CpG ratio calculation and promoter classification

Local CpG ratio was calculated for 500 bp bins with 50 bp steps, as previously defined (Weber et al., 2007). The CpG ratio for each transcript was calculated as the max local CpG ratio around ±2 kb of the TSS. The transcripts were then separated into high-CpG-density promoters (HCPs), intermediate-CpG-density promoters (ICPs) and low-CpG-density promoters (LCPs) based on the CpG ratio and the GC content cut-off previously defined (Weber et al., 2007).

Accession number

The accession number for all the sequencing-derived data in this paper is GEO: GSE89072.

The accession number for RNA-seq data of embryo development used in this paper is GEO: GSE36552.

The accession number for RNA-seq data of primed cell reprogramming used in this paper is GEO: GSE62777.

The accession number for DNA methylation data of human early embryos used in this paper is GEO: GSE49828

Acknowledgements

We thank our colleagues in the laboratory for their assistance with the experiments and comments on the manuscript. This work was supported by the Ministry of Science and Technology of China (2016YFA0100400, and 2014CB964601), the National Natural Science Foundation of China (NSFC) (31671530, 31325019 and 31471392), and the Science and Technology Commission of Shanghai Municipality (Grant 14YF1403900, 15XD1503500).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Yixuan Wang, Email: wangyixuan@tongji.edu.cn.

Shaorong Gao, Email: gaoshaorong@tongji.edu.cn.

Martin Pera, University of Melbourne, Australia.

Funding Information

This paper was supported by the following grants:

  • National Natural Science Foundation of China 31671530 to Yixuan Wang.

  • National Natural Science Foundation of China 31471392 to Yixuan Wang.

  • Ministry of Science and Technology of the People's Republic of China 2014CB964601 to Yixuan Wang.

  • Science and Technology Commission of Shanghai Municipality Grant 14YF1403900 to Yixuan Wang.

  • Science and Technology Commission of Shanghai Municipality Grant 15XD1503500 to Yixuan Wang.

  • Ministry of Science and Technology of the People's Republic of China 2016YFA0100400 to Shaorong Gao.

  • National Natural Science Foundation of China 31325019 to Shaorong Gao.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Validation, Investigation, Methodology, Writing—original draft, Writing—review and editing, Designed the experimental plan, derived all the cell lines, collected and analyzed data, conceived the project and provided mentoring.

Formal analysis, Investigation, Methodology, Performed sequencing data processing, analysis and interpretation and wrote the manuscript.

Formal analysis, Validation, Investigation, Visualization, Methodology, Derived all the cell lines, collected and analyzed data, and wrote the manuscript.

Validation, Investigation, Methodology, Sesigned the experimental plan, derived all the cell lines, collected and analyzed data, and wrote the manuscript.

Validation, Investigation, Methodology, Assisted with cell culture and plasmid construction.

Resources, Supervision, Visualization, Project administration, Provided all experimental materials.

Software, Formal analysis, Methodology, Writing—original draft, Assisted with data analysis and interpretation.

Conceptualization, Resources, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing, Conceived the project and provided mentoring.

Ethics

Human subjects: Human subjects: Human skin specimens from abortive fetus were obtained from the Clinical and Translational Research Center of Shanghai First Maternity and Infant Hospital, Tongji University to make human embryonic fibroblasts (HEFs). The patients provided informed consent for tissue donations, and the Biological Research Ethics Committee of Tongji University approved the study.

Additional files

Source data 1. Source Data for Figures 26 and Figure Supplements.

The Source Data files include the source R code for Figures 26. The R scripts use the data in the ‘Expression Data’ folder with the relative path. The ‘Expression Data’ folder contains the non-redundant related gene expression, methylation and histone modification tables for the main figures and figure supplements.

elife-29518-data1.zip (50.4MB, zip)
DOI: 10.7554/eLife.29518.016
Transparent reporting form
DOI: 10.7554/eLife.29518.017

Major datasets

The following dataset was generated:

Wang Y, author; Zhao C, author; Hou Z, author; Yang Y, author; Bi Y, author; Wang H, author; Zhang Y, author; Gao S, author. Integrative analysis of reprogramming human fibroblast cells to naïve pluripotency. 2017 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89072 Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE89072)

The following previously published datasets were used:

Cacchiarelli D, author; Trapnell C, author; Ziller MJ, author; Soumillon M, author; Cesana M, author; Karnik R, author; Donaghey J, author; Smith ZD, author; Ratanasirintrawoot S, author; Zhang X, author; Ho Sui SJ, author; Wu Z, author; Akopian V, author; Gifford CA, author; Doench J, author; Rinn JL, author; Daley GQ, author; Meissner A, author; Lander ES, author; Mikkelsen TS, author. Integrative analyses of human reprogramming reveal dynamic nature of induced pluripotency. 2015 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62777 Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE62777)

Tang F, author; Qiao J, author; Li R, author. Tracing pluripotency of human early embryos and embryonic stem cells by single cell RNA-seq. 2013 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36552 Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE36552)

Guo H, author; Zhu P, author; Yan L, author; Qiao J, author; Tang F, author. The DNA methylation landscape of human early embryos. 2014 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49828 Publicly available at the NCBI Gene Expression Omnibus (accession no: GSE49828)

References

  1. Brons IG, Smithers LE, Trotter MW, Rugg-Gunn P, Sun B, Chuva de Sousa Lopes SM, Howlett SK, Clarkson A, Ahrlund-Richter L, Pedersen RA, Vallier L. Derivation of pluripotent epiblast stem cells from mammalian embryos. Nature. 2007;448:191–195. doi: 10.1038/nature05950. [DOI] [PubMed] [Google Scholar]
  2. Buecker C, Srinivasan R, Wu Z, Calo E, Acampora D, Faial T, Simeone A, Tan M, Swigut T, Wysocka J. Reorganization of enhancer patterns in transition from naive to primed pluripotency. Cell Stem Cell. 2014;14:838–853. doi: 10.1016/j.stem.2014.04.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Cacchiarelli D, Trapnell C, Ziller MJ, Soumillon M, Cesana M, Karnik R, Donaghey J, Smith ZD, Ratanasirintrawoot S, Zhang X, Ho Sui SJ, Wu Z, Akopian V, Gifford CA, Doench J, Rinn JL, Daley GQ, Meissner A, Lander ES, Mikkelsen TS. Integrative analyses of human reprogramming reveal dynamic nature of induced pluripotency. Cell. 2015;162:412–424. doi: 10.1016/j.cell.2015.06.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chan YS, Göke J, Ng JH, Lu X, Gonzales KA, Tan CP, Tng WQ, Hong ZZ, Lim YS, Ng HH. Induction of a human pluripotent state with distinct regulatory circuitry that resembles preimplantation epiblast. Cell Stem Cell. 2013;13:663–675. doi: 10.1016/j.stem.2013.11.015. [DOI] [PubMed] [Google Scholar]
  5. Davidson KC, Mason EA, Pera MF. The pluripotent state in mouse and human. Development. 2015;142:3090–3099. doi: 10.1242/dev.116061. [DOI] [PubMed] [Google Scholar]
  6. Dodsworth BT, Flynn R, Cowley SA. The current state of Naïve human pluripotency. Stem Cells. 2015;33:3181–3186. doi: 10.1002/stem.2085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Eckersley-Maslin MA, Svensson V, Krueger C, Stubbs TM, Giehr P, Krueger F, Miragaia RJ, Kyriakopoulos C, Berrens RV, Milagre I, Walter J, Teichmann SA, Reik W. MERVL/Zscan4 network activation results in transient genome-wide DNA demethylation of mESCs. Cell Reports. 2016;17:179–192. doi: 10.1016/j.celrep.2016.08.087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Edgar R, Mazor Y, Rinon A, Blumenthal J, Golan Y, Buzhor E, Livnat I, Ben-Ari S, Lieder I, Shitrit A, Gilboa Y, Ben-Yehudah A, Edri O, Shraga N, Bogoch Y, Leshansky L, Aharoni S, West MD, Warshawsky D, Shtrichman R. LifeMap Discovery™: the embryonic development, stem cells, and regenerative medicine research portal. PLoS One. 2013;8:e66629. doi: 10.1371/journal.pone.0066629. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Feng J, Meyer CA, Wang Q, Liu JS, Shirley Liu X, Zhang Y. GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics. 2012;28:2782–2788. doi: 10.1093/bioinformatics/bts515. [DOI] [PubMed] [Google Scholar]
  11. Gafni O, Weinberger L, Mansour AA, Manor YS, Chomsky E, Ben-Yosef D, Kalma Y, Viukov S, Maza I, Zviran A, Rais Y, Shipony Z, Mukamel Z, Krupalnik V, Zerbib M, Geula S, Caspi I, Schneir D, Shwartz T, Gilad S, Amann-Zalcenstein D, Benjamin S, Amit I, Tanay A, Massarwa R, Novershtern N, Hanna JH. Derivation of novel human ground state naive pluripotent stem cells. Nature. 2013;504:282–286. doi: 10.1038/nature12745. [DOI] [PubMed] [Google Scholar]
  12. Grow EJ, Flynn RA, Chavez SL, Bayless NL, Wossidlo M, Wesche DJ, Martin L, Ware CB, Blish CA, Chang HY, Pera RA, Wysocka J. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature. 2015;522:221–225. doi: 10.1038/nature14308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Gu H, Smith ZD, Bock C, Boyle P, Gnirke A, Meissner A. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nature Protocols. 2011;6:468–481. doi: 10.1038/nprot.2010.190. [DOI] [PubMed] [Google Scholar]
  14. Hackett JA, Surani MA. Regulatory principles of pluripotency: from the ground state up. Cell Stem Cell. 2014;15:416–430. doi: 10.1016/j.stem.2014.09.015. [DOI] [PubMed] [Google Scholar]
  15. Hanna J, Cheng AW, Saha K, Kim J, Lengner CJ, Soldner F, Cassady JP, Muffat J, Carey BW, Jaenisch R. Human embryonic stem cells with biological and epigenetic characteristics similar to those of mouse ESCs. PNAS. 2010;107:9222–9227. doi: 10.1073/pnas.1004584107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Huang daW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 2009;4:44–57. doi: 10.1038/nprot.2008.211. [DOI] [PubMed] [Google Scholar]
  17. Huang K, Maruyama T, Fan G. The naive state of human pluripotent stem cells: a synthesis of stem cell and preimplantation embryo transcriptome analyses. Cell Stem Cell. 2014;15:410–415. doi: 10.1016/j.stem.2014.09.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, Gordon L, Branscomb E, Stubbs L. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Research. 2006;16:669–677. doi: 10.1101/gr.4842106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Jiang CL, Jin SG, Lee DH, Lan ZJ, Xu X, O'Connor TR, Szabó PE, Mann JR, Cooney AJ, Pfeifer GP. MBD3L1 and MBD3L2, two new proteins homologous to the methyl-CpG-binding proteins MBD2 and MBD3: characterization of MBD3L1 as a testis-specific transcriptional repressor. Genomics. 2002;80:621–629. doi: 10.1006/geno.2002.7001. [DOI] [PubMed] [Google Scholar]
  20. Ko MS. Zygotic genome activation revisited: Looking through the expression and function of Zscan4. Current Topics in Developmental Biology. 2016;120:103–124. doi: 10.1016/bs.ctdb.2016.04.004. [DOI] [PubMed] [Google Scholar]
  21. Latham KE, Schultz RM. Embryonic genome activation. Frontiers in Bioscience. 2001;6:d748–759. doi: 10.2741/A639. [DOI] [PubMed] [Google Scholar]
  22. Leitch HG, McEwen KR, Turp A, Encheva V, Carroll T, Grabole N, Mansfield W, Nashun B, Knezovich JG, Smith A, Surani MA, Hajkova P. Naive pluripotency is associated with global DNA hypomethylation. Nature Structural & Molecular Biology. 2013;20:311–316. doi: 10.1038/nsmb.2510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Macfarlan TS, Gifford WD, Driscoll S, Lettieri K, Rowe HM, Bonanomi D, Firth A, Singer O, Trono D, Pfaff SL. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature. 2012;487:57–63. doi: 10.1038/nature11244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Niakan KK, Han J, Pedersen RA, Simon C, Pera RA. Human pre-implantation embryo development. Development. 2012;139:829–841. doi: 10.1242/dev.060426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Nichols J, Smith A. Naive and primed pluripotent states. Cell Stem Cell. 2009;4:487–492. doi: 10.1016/j.stem.2009.05.015. [DOI] [PubMed] [Google Scholar]
  26. Okae H, Chiba H, Hiura H, Hamada H, Sato A, Utsunomiya T, Kikuchi H, Yoshida H, Tanaka A, Suyama M, Arima T. Genome-wide analysis of DNA methylation dynamics during early human development. PLoS Genetics. 2014;10:e1004868. doi: 10.1371/journal.pgen.1004868. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Park IH, Zhao R, West JA, Yabuuchi A, Huo H, Ince TA, Lerou PH, Lensch MW, Daley GQ. Reprogramming of human somatic cells to pluripotency with defined factors. Nature. 2008;451:141–146. doi: 10.1038/nature06534. [DOI] [PubMed] [Google Scholar]
  28. Pastor WA, Chen D, Liu W, Kim R, Sahakyan A, Lukianchikov A, Plath K, Jacobsen SE, Clark AT. Naive human pluripotent cells feature a methylation landscape devoid of blastocyst or germline memory. Cell Stem Cell. 2016;18:323–329. doi: 10.1016/j.stem.2016.01.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, Knowles BB. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Developmental Cell. 2004;7:597–606. doi: 10.1016/j.devcel.2004.09.004. [DOI] [PubMed] [Google Scholar]
  30. Quenneville S, Verde G, Corsinotti A, Kapopoulou A, Jakobsson J, Offner S, Baglivo I, Pedone PV, Grimaldi G, Riccio A, Trono D. In embryonic stem cells, ZFP57/KAP1 recognize a methylated hexanucleotide to affect chromatin and DNA methylation of imprinting control regions. Molecular Cell. 2011;44:361–372. doi: 10.1016/j.molcel.2011.08.032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Schultz RM. The molecular foundations of the maternal to zygotic transition in the preimplantation embryo. Human Reproduction Update. 2002;8:323–331. doi: 10.1093/humupd/8.4.323. [DOI] [PubMed] [Google Scholar]
  32. Smith ZD, Chan MM, Humm KC, Karnik R, Mekhoubad S, Regev A, Eggan K, Meissner A. DNA methylation dynamics of the human preimplantation embryo. Nature. 2014;511:611–615. doi: 10.1038/nature13581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Takashima Y, Guo G, Loos R, Nichols J, Ficz G, Krueger F, Oxley D, Santos F, Clarke J, Mansfield W, Reik W, Bertone P, Smith A. Resetting transcription factor control circuitry toward ground-state pluripotency in human. Cell. 2014;158:1254–1269. doi: 10.1016/j.cell.2014.08.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Tesar PJ, Chenoweth JG, Brook FA, Davies TJ, Evans EP, Mack DL, Gardner RL, McKay RD. New cell lines from mouse epiblast share defining features with human embryonic stem cells. Nature. 2007;448:196–199. doi: 10.1038/nature05972. [DOI] [PubMed] [Google Scholar]
  35. Theunissen TW, Powell BE, Wang H, Mitalipova M, Faddah DA, Reddy J, Fan ZP, Maetzel D, Ganz K, Shi L, Lungjangwa T, Imsoonthornruksa S, Stelzer Y, Rangarajan S, D'Alessio A, Zhang J, Gao Q, Dawlaty MM, Young RA, Gray NS, Jaenisch R. Systematic identification of culture conditions for induction and maintenance of naive human pluripotency. Cell Stem Cell. 2014;15:471–487. doi: 10.1016/j.stem.2014.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Theunissen TW, Friedli M, He Y, Planet E, O'Neil RC, Markoulaki S, Pontis J, Wang H, Iouranova A, Imbeault M, Duc J, Cohen MA, Wert KJ, Castanon R, Zhang Z, Huang Y, Nery JR, Drotar J, Lungjangwa T, Trono D, Ecker JR, Jaenisch R. Molecular criteria for defining the naive human pluripotent state. Cell Stem Cell. 2016;19:502–515. doi: 10.1016/j.stem.2016.06.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. 2010;28:511–515. doi: 10.1038/nbt.1621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Töhönen V, Katayama S, Vesterlund L, Jouhilahti EM, Sheikhi M, Madissoon E, Filippini-Cattaneo G, Jaconi M, Johnsson A, Bürglin TR, Linnarsson S, Hovatta O, Kere J. Novel PRD-like homeodomain transcription factors and retrotransposon elements in early human development. Nature Communications. 2015;6:8207. doi: 10.1038/ncomms9207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Vassena R, Boué S, González-Roca E, Aran B, Auer H, Veiga A, Izpisua Belmonte JC. Waves of early transcriptional activation and pluripotency program initiation during human preimplantation development. Development. 2011;138:3699–3709. doi: 10.1242/dev.064741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Ware CB, Nelson AM, Mecham B, Hesson J, Zhou W, Jonlin EC, Jimenez-Caliani AJ, Deng X, Cavanaugh C, Cook S, Tesar PJ, Okada J, Margaretha L, Sperber H, Choi M, Blau CA, Treuting PM, Hawkins RD, Cirulli V, Ruohola-Baker H. Derivation of naive human embryonic stem cells. PNAS. 2014;111:4484–4489. doi: 10.1073/pnas.1319738111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Ware CB. Concise review: lessons from Naïve human pluripotent cells. Stem Cells. 2017;35:35–41. doi: 10.1002/stem.2507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, Schübeler D. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nature Genetics. 2007;39:457–466. doi: 10.1038/ng1990. [DOI] [PubMed] [Google Scholar]
  44. Yan L, Yang M, Guo H, Yang L, Wu J, Li R, Liu P, Lian Y, Zheng X, Yan J, Huang J, Li M, Wu X, Wen L, Lao K, Li R, Qiao J, Tang F. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nature Structural & Molecular Biology. 2013;20:1131–1139. doi: 10.1038/nsmb.2660. [DOI] [PubMed] [Google Scholar]
  45. Yang Y, Zhang X, Yi L, Hou Z, Chen J, Kou X, Zhao Y, Wang H, Sun XF, Jiang C, Wang Y, Gao S. Naïve induced pluripotent stem cells generated from β-thalassemia fibroblasts allow efficient gene correction with CRISPR/Cas9. Stem Cells Translational Medicine. 2016;5:267. doi: 10.5966/sctm.2015-0157erratum. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Zalzman M, Falco G, Sharova LV, Nishiyama A, Thomas M, Lee SL, Stagg CA, Hoang HG, Yang HT, Indig FE, Wersto RP, Ko MS. Zscan4 regulates telomere elongation and genomic stability in ES cells. Nature. 2010;464:858–863. doi: 10.1038/nature08882. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Martin Pera1

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Unique molecular events during reprogramming of human somatic cells to induced pluripotent stem cells at naïve state" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Fiona Watt as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Qi-Long Ying (Reviewer #2); Christine Wells (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study provides novel information on the time course of reprogramming of human fibroblasts to the naive pluripotent state.

Essential revisions:

While two reviewers agree that your study will be of interest to the field, there are some serious concerns about the comparative analysis of the gene expression data and the bioinformatics approaches used in these analyses. The reviewers also note a number of instances in which the presentation of results is unclear. You should address these major criticisms concerning data analysis and clear up the areas where the presentation lacks clarity.

Reviewer #1:

A key question relates to the analysis of the data: the authors need to clarify precisely what embryo and primed cell reprogramming datasets they are comparing their data to.

1) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2C - how were genes classified as early embryogenesis, pre-implantation, and late embryogenesis? These do not turn up in the GO analysis in Figure 2—figure supplement 2.

2) Subsection “Transcriptional profiling of naïve reprogramming cells” - what is the significance of the activation of genes related to placenta development and gamete generation?

3) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2E - what is the source of the human embryo data? Is there a distinction in these data for blastocyst between inner cell mass and trophectoderm? What about the epiblast proper, which the naïve cells should resemble?

4) Subsection “Transcriptional profiling of naïve reprogramming cells” – where do the authors' data show that primed cell reprogramming drops to a post-implantation stage? What post-implantation data in what species?

5) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming” and elsewhere-the statement that an eight-cell stage was transiently established probably claims too much from the data. Unless I misunderstand Figure 3A-B, the induction of a selected set of 8C genes is considerably more modest in naïve reprogrammed cells compared to that in the embryo, and the reprogramming cells may be expressing many genes that are not typical of this stage. Some other form of analysis would be required to prove that the transient cells are really equivalent to an 8C stage.

6) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming” - again the significance of enrichment for genes involved in gamete generation is unclear here.

7) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and Figure 5C - what genes are chosen for study here and why?

8) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and following-the logic of comparison of the H3K4me3 with those of the mouse blastocysts is not clear. What cells of the mouse blastocyst? How about epiblast?

9) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and Figure 5E,F - what about global patterns of H3K9me3?

10) Discussion section – what is the basis for equating these patterns of gene expression with the epiblast? What human embryo epiblast data are used for comparison?

Reviewer #2:

Figure 2 primed heatmap is the first place that we see a comparison between the 2i reprogrammed cells and cells reprogrammed in 5iLAF. This is repeated in Figure 3B. I can't find details of the primed timecourse. Was this generated in parallel with the 5iLAF 2i reprogramming? Is it a 2i timecourse or derived from the first reprogramming stage? If so, this should be included in the over-view cartoons describing the experimental design and detailed in the Materials and methods section.

Or has the data been taken from an external publication (e.g. Cacchiarelli et al., 2015 which is referenced heavily in the Results section describing the transcriptome time series)? If so, I have some concerns about the integration of these two data sets. The GEO accessions for any external data series used in comparisons should be provided in the methods. Details on normalization used to address technical batch, as would be expected from integrating disparate data sources, should be provided. These integration issues are exacerbated in comparisons of single cell RNA-seq and population RNA-seq for the correlation table with the early embryo series.

The authors benchmark the 2i 5LAF transcriptome series against an external early human embryogenesis series using a Pearson correlation. Why this choice of embryo time points? If the 2i iPSC equivalent is late epiblast, is correlation with zygote-blastocyst sufficient or informative? There are relevant later stage human and monkey embryogenesis datasets that would provide a more appropriate developmental window.

How was the correlation analysis performed? There are no substantive details of any of the bioinformatics analyses beyond the RNA-seq processing.

The comparison of 'primed'-iPSC and 5iLAF-2i-iPSC in Figure 3 to the '8-cell' signature that the authors claim is reminiscent of the reactivation of the zygote transcriptome rests heavily on comparative higher expression of this signature in the 5iLAF series and not in the 'primed' iPSC series. As far as I can tell from the methods described, only 1 replicate is provided per time point in the 5iLAF-2i-iPSC timecourse. This spike could represent an oversampling issue or some other technical issue with a small number of libraries – Panel B in figure 6 particularly concerns us with regard to potential over-sampling. I have insufficient information to evaluate the quality of the primed and 5iLAF series without knowing the source of the former. The transposon mining comes from the same RNA-seq libraries, and so the same question about library quality, source of comparison and suspicion in a 'spike' seen in a small number of data points in a time series applies to the transposon analysis.

The authors have not evaluated whether the 8-cell signature is expressed later in human embryonic development. Indeed, many of the individual genes are more broadly expressed. This needs further exploration before concluding that the 5iLAF-iPSC have an equivalent zygote-reactivation wave of chromatin remodeling and gene expression. Of the individual genes highlighted in the RNA-seq, and then confirmed by IF, Figure 3 lacks comparison of the 5iLAF-iPSC with the primed iPSC series or with other pluripotent cell lines, so the claim that these genes are hall-marks of the naïve state remains somewhat anecdotal.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your article "Unique molecular events during reprogramming of human somatic cells to induced pluripotent stem cells (iPSCs) at naïve state" for consideration by eLife. Your article has been re-reviewed by one peer reviewer, and the evaluation has been overseen by a Reviewing Editor and Fiona Watt as the Senior Editor. The following individual involved in review of your submission has agreed to reveal her identity: Christine Wells (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The revised manuscript and rebuttal address many concerns raised in the first round of review. However, one reviewer who is highly experienced in this field continues to have several important concerns about the analysis and interpretation of the data. These are summarized below and relate to your analysis of the repeat RNA-seq time course, the specificity of expression of key markers used to define cell state and the reliance on one embryo dataset only, and the degree to which the final cell state of the naïve and primed protocols really differ.

We are therefore offering you the opportunity to address these issues. As a matter of policy, we rarely offer more than one round of revision so please take the following concerns carefully as we may not be able to consider any further revisions.

Essential revisions:

1) The reanalysis, including repeat RNA-seq of the time course, have been superficially addressed. In fact, it is not clear that this repeat data has been incorporated into any of the analyses or figures. The libraries appear genuinely different, sufficiently so that the numbers of DE genes, and scales on plots should be different between the manuscript versions. They are not. One would assume that this would address potential library artefacts association with the earlier version of the manuscript, however no changes in relevant figures have arisen from inclusion of additional data, suggesting that no additional data has been included in the analysis.

Please clarify whether or not the repeat time course has been incorporated into the revised figures.

2) Figure 2D: I'm concerned that genes I observe to be highly variable in any pluripotent cell culture, including 2i-cultures, are being used as hallmarks of naïve vs primed states, especially given the shallow replication of the data generated here, and the single comparison to one primed dataset generated in another lab. In particular, the claims that dox-withdrawal in primed iPSC (Cacchiarelli) lead to loss of naïve markers such as DPP2,3,5, DNMT3L, NODAL etc. exemplifies the superficial nature of the comparison being made in this manuscript. These are genes that are highly variable in hESC and iPSC cultures, regardless of reprogramming method or 2i-culture condition (see for example https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7253&gene=ENSG00000156574&db_id=56 as well as https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7045&gene=ENSG00000187569&db_id=56). There is not sufficient depth of characterization in this manuscript to demonstrate that gain or loss of these markers, in either reprogramming study (present or Cacchiarelli), are indicative of a higher-order difference in primed or naïve state. The authors can verify the variable expression of markers in the HipSc catalogue, and in expression atlases such as GEO, or stemformatics, which include a large number of iPSC reprogramming datasets, including datasets linked above.

The authors have not adequately addressed the question of comparison to late epiblast stages in their correlation analyses – the Yan dataset is small, and Gao et al., seem to be using hESC as their equivalent to a post-implantation stage. So my original request for more appropriate embryo comparisons have not been addressed. The Yan dataset used post-hoc clustering to predict which cells were PE vs EPI vs Polar TE. The correlations are interesting observations but overstated because they are looking at a limited range of embryonic stages.

Please address the issue of variable expression of markers used to define cell states and refer to other studies of human embryo gene expression to validate markers for particular cell states.

3) The behaviour of primed and naive time courses are very similar up to the derivation of the final iPSC line. I'm not convinced that differences in the final iPSC state are not the result of normal line variation.

Please highlight more clearly the basis for claiming that the end stages of these protocols are genuinely different rather than a function of normal variation between cell lines. Is it possible for instance that the differences in the final cell lines reflect long term adaptation to particular culture conditions rather than inherent differences in cell state? If 2 above is answered adequately, you should be able to address this point. We apologise if the significance of point 2 above was not clear enough in the previous review.

It is important to address these concerns by demonstrating (1) that your analysis of the repeat data is adequately reflected in the revision, (2) by showing that the markers used clearly define what they are claimed to (by reference to databases or more recent human embryo data), and (3) to define clearly the differences in the reprogramming endpoints. These revisions do not entail new experimentation and should not consume too much time.

eLife. 2018 Jan 30;7:e29518. doi: 10.7554/eLife.29518.036

Author response


Reviewer #1:

A key question relates to the analysis of the data: the authors need to clarify precisely what embryo and primed cell reprogramming datasets they are comparing their data to.

Thank the reviewer for the question. Both the embryo development data and primed cell reprogramming data are published datasets. The embryo development dataset is GSE36552 (Yan et al., 2013). The primed cell reprogramming dataset is GSE62777 (Cacchiarelli et al., 2015). We have added these statements to the revised manuscript.

1) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2C - how were genes classified as early embryogenesis, pre-implantation, and late embryogenesis? These do not turn up in the GO analysis in Figure 2—figure supplement 2.

Thank the reviewer for the question. The genes in Figure 2C were classified according to the classification criteria established in Cacchiarelli et al., 2015, in which they applied gene ontology enrichment analysis for both biological processes and developmental cell identity (Edgar et al., (2013)). We could also observe signature genes (cited in Figure 2—figure supplement 2) representative for each cluster in our system as well as in Cacchiarelli et al., 2015. We also added these statements to the Materials and methods section.

2) Subsection “Transcriptional profiling of naïve reprogramming cells” - what is the significance of the activation of genes related to placenta development and gamete generation?

Thank the reviewer for the question. According to reviewer #2’s suggestion, we performed RNA-seq of another replicate during the naïve reprogramming time-course and re-analyzed the datasets after incorporating some samples of the new replicate. The GO enrichment analysis was also re-performed (Figure 2—figure supplement 2), and there were some small changes in the GO enrichment terms. The p-value of these genes related to gamete generation is 0.001934. And we have added that in the revised manuscript.

3) Subsection “Transcriptional profiling of naïve reprogramming cells” Figure 2E-what is the source of the human embryo data? Is there a distinction in these data for blastocyst between inner cell mass and trophectoderm? What about the epiblast proper, which the naïve cells should resemble?

Thank the reviewer for the question. The source of the human embryo dataset is GSE36552 (Yan et al., 2013). In their paper, they distinguished the cells at the late blastocyst stage into Mural trophectoderm (TE), Polar TE, epiblast (EPI), and primitive endoderm (PE) by unsupervised clustering of the expression profiles in each cell. They also derived hESCs from the blastocyst and performed RNA-seq analysis of hESC0 (passage 0) and hESC10 (passage 10) to simulate the in vivo post-implantation stages during development. This dataset includes pre-implantation, implantation and post-implantation stages during development. We re-compared the correlation of gene expression between cells during naïve reprogramming process and embryonic cells during development, and found that the reprogramming cells at day 20 and 24 and niPSC-Ts most closely resembled human embryos at the late blastocyst stage, which represents the time window of naïve ground state in vivo (Hackett and Surani, 2014; Nichols and Smith, 2009) We have updated the Figure 2E and the revised manuscript.

4) Subsection “Transcriptional profiling of naïve reprogramming cells” – where do the authors' data show that primed cell reprogramming drops to a post-implantation stage? What post-implantation data in what species?

Thank the reviewer for the question. The human primed cell reprogramming dataset is GSE62777 (Cacchiarelli et al., 2015). In this paper, the authors claimed that the primed reprogramming cells show transient re-activation of developmental genes, eventually reaching a pre-implantation-like state that is lost upon derivation of primed iPSC lines.

By comparing the correlation of gene expression between primed cell reprogramming dataset (GSE62777) and embryo development dataset (GSE36552), we could also find that the cells at 24d+dox of primed reprogramming most closely resembled to the epiblast (EPI) stage during embryonic development. After dox withdrawal, the cells most closely resembled to the hESC0 and hESC10 cells, which represented the post-implantation stage (Author response image 1).

Author response image 1.

Author response image 1.

5) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming” and elsewhere-the statement that an eight-cell stage was transiently established probably claims too much from the data. Unless I misunderstand Figure 3A-B, the induction of a selected set of 8C genes is considerably more modest in naïve reprogrammed cells compared to that in the embryo, and the reprogramming cells may be expressing many genes that are not typical of this stage. Some other form of analysis would be required to prove that the transient cells are really equivalent to an 8C stage.

Thank the reviewer for this important question. The statement that transcriptional network with eight-cell-stage-like characteristics was transiently activated during naïve reprograming was concluded based on the following observations:

1) 8-cell specific genes were significantly enriched in the cluster exhibiting transiently enhanced transcription activity during the late stages of naïve reprogramming, which could not be observed during primed reprogramming process.

2) Dissecting the dynamics of transposon elements (TEs) by similar strategy also revealed that 8C-TEs were transiently re-activated during naïve reprograming.

3) The H3K9me3 signals around the 8C-TEs exhibited a transiently decreased pattern at day 14, strongly correlated with the transiently increasing transcriptional activity of the 8C-TEs at that time point.

Our results cannot reach the conclusion that the transient cells are equivalent to an 8C stage, and we apologize for the misleading. We would like to conclude that the specific pattern of 8C-transcripts wave that we observed suggests the emergence of gene network activation with the 8C-stage-like characteristics during naïve reprograming.

6) Subsection “8-cell-stage-like state was transiently established during naïve reprogramming”-again the significance of enrichment for genes involved in gamete generation is unclear here.

The p-value of these genes related to gamete generation is 3.784322e-08. We have added that in the revised manuscript.

7) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and Figure 5C - what genes are chosen for study here and why?

Thank the reviewer for the question. The genes with both H3K4me3 and H3K27me3 modifications covered at least one time point in the promoter regions (+/- 2k of TSS) during the reprogramming process were chosen for our study. The genes with no H3K4me3 and H3K27me3 modifications in the promoter regions are excluded since they are not relevant for the bivalency analysis.

8) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and following-the logic of comparison of the H3K4me3 with those of the mouse blastocysts is not clear. What cells of the mouse blastocyst? How about epiblast?

The reviewer raised a very good point. We hoped to compare the epigenetic similarity between the reprograming cells and the embryonic cells during development. Since there is no H3K4me3 ChIP-seq data in human pre-implantation embryos, we used the mouse pre-implantation embryo H3K4me3 ChIP-seq dataset (GSE73952) and calculated the correlation of H3K4me3 signal in the above two systems at the promoter regions (+/- 2k of TSS) of homolog genes.

Thank the reviewer for pointing this. We now realize that this analysis may be inaccurate and inappropriate. We have removed the figure panel and the related statements in our revised manuscript.

9) Subsection “Dynamic changes in epigenetic modifications in the naïve reprogramming system” and figure 5EF-what about global patterns of H3K9me3?

The global pattern of H3K9me3 is down-regulated, as shown in Author response image 2:

Author response image 2.

Author response image 2.

10) Discussion section – what is the basis for equating these patterns of gene expression with the epiblast? What human embryo epiblast data are used for comparison?

Thank the reviewer for the question. We apologize for the misleading caused by the inaccurate statements. Our analysis showed that derived niPSC-Ts exhibit reactivation of network with late blastocyst-stage signatures. We removed such statements for more accuracy in the revised manuscript.

Reviewer #2:

Figure 2 primed heatmap is the first place that we see a comparison between the 2i reprogrammed cells and cells reprogrammed in 5iLAF. This is repeated in Figure 3B. I can't find details of the primed timecourse. Was this generated in parallel with the 5iLAF 2i reprogramming? Is it a 2i timecourse or derived from the first reprogramming stage? If so, this should be included in the over-view cartoons describing the experimental design and detailed in the Materials and methods section.

Or has the data been taken from an external publication (e.g. Cacchiarelli et al., 2015 which is referenced heavily in the Results section describing the transcriptome time series)? If so, I have some concerns about the integration of these two data sets. The GEO accessions for any external data series used in comparisons should be provided in the methods. Details on normalization used to address technical batch, as would be expected from integrating disparate data sources, should be provided. These integration issues are excerbated in comparisons of single cell RNA-seq and population RNA-seq for the correlation table with the early embryo series.

Thank the reviewer for this important question. The primed reprogramming dataset is taken from the publication “Cacchiarelli et al., 2015”, and the GEO accession number is GSE62777. We apologize for our omissions in the description. We have added the GEO accessions for both primed reprogramming and early embryo development data series in the Materials and methods section. Batch effects of samples from different systems are removed using removeBatchEffect function of the R library “edgeR”. The negative values in the normalized data are considered as zero. We have also provided the details on normalization used to address technical batch between different systems in the revised Materials and methods section.

The authors benchmark the 2i 5LAF transcriptome series against an external early human embryogenesis series using a Pearson correlation. Why this choice of embryo time points? If the 2i iPSC equivalent is late epiblast, is correlation with zygote-blastocyst sufficient or informative? There are relevant later stage human and monkey embryogenesis datasets that would provide a more appropriate developmental window.

Thank the reviewer for the important question. The source of the human embryo dataset is dataset is GSE36552 (Yan et al., 2013). In this paper, they distinguished the cells at the late blastocyst stage into Mural trophectoderm (TE), Polar TE, epiblast (EPI), and primitive endoderm (PE) by unsupervised clustering of the expression profiles in each cell. They also derived hESCs from the blastocyst and performed RNA-seq analysis of hESC0 (passage 0) and hESC10 (passage 10) to simulate the in vivo post-implantation stages during development. This dataset includes pre-implantation, implantation and post-implantation stages during development.

We re-compared the correlation of gene expression between cells during naïve reprogramming process and embryonic cells during development (Figure 2E) and found that the reprogramming cells at day 20 and 24 and niPSC-Ts most closely resembled human embryos at the late blastocyst stage, which represents the time window of naïve ground state in vivo (Hackett, and Surani, 2014; Nichols and Smith, 2009)

How was the correlation analysis performed? There are no substantive details of any of the bioinformatics analyses beyond the RNA-seq processing.

The Pearson correlation coefficient was calculated between each two samples on common genes using R function cor(). We apologize for our omissions in the method description. The substantive details of bioinformatics analyses are displayed below:

Gene expression Analysis

Batch effects of samples from different systems are removed using removeBatchEffect function of the R library “edgeR”. The negative values in the normalized data are considered as zero.

Retrotransposon expression Analysis

A time-point specific expressed score was calculated as the expression of given time point divided by the average expression for each candidate gene, which should have a max expressed value in the given stage. The top 100 8C-TEs are the top 100 TE ranked by the time-point specific expressed score.

Gene ontology analysis

Functional annotation was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resource (Huang et al., 2009). Gene ontology terms for each function cluster were summarized to a representative term and P-values were plotted to show the significance. The genes are classified according to developmental cell identity using LifeMap Discovery (Edgar et al., 2013).

And we replenished these bioinformatics analysis methods to “Materials and methods” section.

The comparison of 'primed'-iPSC and 5iLAF-2i-iPSC in Figure 3 to the '8-cell' signature that the authors claim is reminiscent of the reactivation of the zygote transcriptome rests heavily on comparative higher expression of this signature in the 5iLAF series and not in the 'primed' iPSC series. As far as I can tell from the methods described, only 1 replicate is provided per time point in the 5iLAF-2i-iPSC timecourse. This spike could represent an oversampling issue or some other technical issue with a small number of libraries – Panel B in figure 6 particularly concerns us with regard to potential over-sampling. I have insufficient information to evaluate the quality of the primed and 5iLAF series without knowing the source of the former. The transposon mining comes from the same RNA-seq libraries, and so the same question about library quality, source of comparison and suspicion in a 'spike' seen in a small number of data points in a time series applies to the transposon analysis.

Thank the reviewer for this important question. The primed reprogramming dataset is taken from the publication “Cacchiarelli et al., 2015”, and the GEO accession number is GSE62777.

According to the reviewer’s suggestion, we performed another replicate per time point in the naïve reprogramming timecourse, and chose the qualified samples at 8d, 14d and 20d during naïve reprogramming, which were also important time points for naïve reprogramming. Then we re-analyzed the datasets and got the similar conclusions with small variations (Figure 2, Figure 2—figure supplement 1 and Figure 2—figure supplement 2, Figure 3, Figure 3—figure supplement 1 and Figure 3—figure supplement 2, Figure 6, Figure 6—figure supplement 1).

The authors have not evaluated whether the 8-cell signature is expressed later in human embryonic development. Indeed, many of the individual genes are more broadly expressed. This needs further exploration before concluding that the 5iLAF-iPSC have an equivalent zygote-reactivation wave of chromatin remodeling and gene expression. Of the individual genes highlighted in the RNA-seq, and then confirmed by IF, Figure 3 lacks comparison of the 5iLAF-iPSC with the primed iPSC series or with other pluripotent cell lines, so the claim that these genes are hall-marks of the naïve state remains somewhat anecdotal.

Thank the reviewer for this important question. We have now evaluated the expression of 8C-genes in later stages including the PGC cells and somatic cells at 4 weeks, 7weeks and 8weeks (GSE63818) during human embryonic development (Guo et al., 2015), which showed no significant expression compared to the 8-cell stage (Author response image 3).

Author response image 3.

Author response image 3.

For the individual genes highlighted in Figure 3 such as MBD3L2-5 and ZSCAN4, we also checked their expression dynamics during primed reprogramming. Both IF and western blot analysis exhibited no significant activation of these genes during the primed reprogramming process. We have also added the immunostaining and western blotting results of primed series to the revised Figure 3F and Figure 3—figure supplement 1E.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Essential revisions:

1) The reanalysis, including repeat RNA-seq of the time course, have been superficially addressed. In fact, it is not clear that this repeat data has been incorporated into any of the analyses or figures. The libraries appear genuinely different, sufficiently so that the numbers of DE genes, and scales on plots should be different between the manuscript versions. They are not. One would assume that this would address potential library artefacts association with the earlier version of the manuscript, however no changes in relevant figures have arisen from inclusion of additional data, suggesting that no additional data has been included in the analysis.

Please clarify whether or not the repeat time course has been incorporated into the revised figures.

Thank the reviewer for this question. The third replicate of RNA-seq datasets of the naïve reprogramming time course has been analyzed and incorporated into the figures (Figure 2B-E, Figure 2—figure supplement 1A-D; Figure 2—figure supplement 2; Figure 3I-K; Figure 3—figure supplement 1A-D; Figure 3—figure supplement 2A, B, E; Figure 6A-B). And the numbers of DE genes in each cluster (Figure 2—figure supplement 2) were different from the first submission version in our last revision. We apologize for our omission in the legend of Figure 2D and in the update of Figure 2—figure supplement 1A, Figure 3B, Figure 3H, Figure 3—figure supplement 1C, Figure 3—figure supplement 2C and 2D. We have updated these figures in this revised version.

2) Figure 2D: I'm concerned that genes I observe to be highly variable in any pluripotent cell culture, including 2i-cultures, are being used as hallmarks of naïve vs primed states, especially given the shallow replication of the data generated here, and the single comparison to one primed dataset generated in another lab. In particular, the claims that dox-withdrawal in primed iPSC (Cacchiarelli) lead to loss of naïve markers such as DPP2,3,5, DNMT3L, NODAL etc. exemplifies the superficial nature of the comparison being made in this manuscript. These are genes that are highly variable in hESC and iPSC cultures, regardless of reprogramming method or 2i-culture condition (see for example https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7253&gene=ENSG00000156574&db_id=56 as well as https://www.stemformatics.org/expressions/result?graphType=box&datasetID=7045&gene=ENSG00000187569&db_id=56. There is not sufficient depth of characterization in this manuscript to demonstrate that gain or loss of these markers, in either reprogramming study (present or Cacchiarelli), are indicative of a higher-order difference in primed or naïve state. The authors can verify the variable expression of markers in the HipSc catalogue, and in expression atlases such as GEO, or stemformatics, which include a large number of iPSC reprogramming datasets, including datasets linked above.

Thank the reviewer for this important question. We have now re-selected the marker genes that indicate naïve pluripotency according to several important publications in the naïve pluripotency research field including Theunissen et al., (2014), Takashima et al., (2014), Pastor et al., (2016), and Guo et al., (2016). We also collected the published RNA-seq datasets containing 5 sets of naïve and primed ESCs and 3 sets of naïve and primed iPSCs to validate the selected marker genes expression according to the reviewer’s suggestion (GSE59435 of Theunissen et al., paper; E-MTAB-2857 of Takashima et al., paper; GSE76970 of Pastor et al., paper; GSE69319 of Yang et al., paper). We compared the expression of these marker genes in these datasets. Although some markers exhibit variable expression levels in different cell lines, most of them showed significant expression difference between the naïve cells and primed cells (Author response image 4), t-test are performed on each available pair of datasets, p-value are shown in the right of the plots (NA stands for not enough data for t-test)).

Author response image 4.

Author response image 4.

Moreover, we also checked the expression dynamics of the re-selected markers in the iPSC reprogramming datasets (GSE50206 from Takahashi K et al., (2014)), most of which also showed transiently upregulation during the reprogramming process, and finally down-regulated in the iPSC stage (Author response image 5).

Author response image 5.

Author response image 5.

The authors have not adequately addressed the question of comparison to late epiblast stages in their correlation analyses – the Yan dataset is small, and Gao et al., seem to be using hESC as their equivalent to a post-implantation stage. So my original request for more appropriate embryo comparisons have not been addressed. The Yan dataset used post-hoc clustering to predict which cells were PE vs EPI vs Polar TE. The correlations are interesting observations but overstated because they are looking at a limited range of embryonic stages.

Please address the issue of variable expression of markers used to define cell states and refer to other studies of human embryo gene expression to validate markers for particular cell states.

Thank the reviewer for this question. According to the reviewer’s suggestion, we have added two more human embryonic development datasets, including RNA-seq dataset of single embryonic cells from E3 to E7 (ArrayExpress: E-MTAB-3929) that used PAM clustering to classified which cells were PE vs EPI vs PTE, and RNA-seq datasets of somatic cells at 4 weeks, 7 weeks and 8 weeks (GSE63818) during human embryonic development (Guo et al., 2015), which is the earliest embryonic datasets after implantation to our knowledge. The correlation analyses between naïve reprogramming dataset and human E3-E7 datasets also showed that the reprogramming cells at day 20 and 24 and niPSC-Ts most closely resembled epiblast cells at E5 to E7, especially at E6, similar to our previous observations in the correlation analyses between the naïve reprogramming dataset and Yan et al.’s dataset (GSE36552) (Author response image 6). However, this strong correlation was not observed when comparing our naïve reprogramming dataset with Guo et al., dataset (GSE63818) (Author response image 7).

Author response image 6.

Author response image 6.

Author response image 7.

Author response image 7.

We are sorry that we have not found the public embryonic data at the stages just in the narrow time window of post-implantation. Therefore, in the last revised response we used hESC0 and hESC10 to simulate the post-implantation time-point during development, which were reported to originate from a post-ICM intermediate (PICMI), a transient epiblast-like structure that has undergone X-inactivation in female cells and is both necessary and sufficient for ESC derivation (O’Leary et al., 2012).

3) The behaviour of primed and naive time courses are very similar up to the derivation of the final iPSC line. I'm not convinced that differences in the final iPSC state are not the result of normal line variation.

Please highlight more clearly the basis for claiming that the end stages of these protocols are genuinely different rather than a function of normal variation between cell lines. Is it possible for instance that the differences in the final cell lines reflect long term adaptation to particular culture conditions rather than inherent differences in cell state? If 2 above is answered adequately, you should be able to address this point. We apologise if the significance of point 2 above was not clear enough in the previous review.

Thank the reviewer for this important question. We have performed cell clustering based on cell-to-cell correlation of the reprogramming endpoints in our study and naïve/primed ESCs/iPSCs from public datasets mentioned above. The correlation analyses showed that the niPSC-Ts which most resembled other naïve ESCs/iPSCs in the public datasets, separate clearly from the piPSC-Ts, indicating that the differences in the final iPSC state are not the result of normal line variation (Author response image 8).

Author response image 8.

Author response image 8.

We do not think that the differences between niPSC-T and piPSC-Ts reflect long term adaptation to the 5iLAF culture conditions. The activation of OCT4 distal enhancer and the signatures in epigenome in naïve cells strongly suggest their inherent differences from primed cells.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Source data 1. Source Data for Figures 26 and Figure Supplements.

    The Source Data files include the source R code for Figures 26. The R scripts use the data in the ‘Expression Data’ folder with the relative path. The ‘Expression Data’ folder contains the non-redundant related gene expression, methylation and histone modification tables for the main figures and figure supplements.

    elife-29518-data1.zip (50.4MB, zip)
    DOI: 10.7554/eLife.29518.016
    Transparent reporting form
    DOI: 10.7554/eLife.29518.017

    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES