Author manuscript; available in PMC: 2025 Feb 21.
Published in final edited form as: Cell Syst. 2024 Feb 8;15(2):109–133.e10. doi: 10.1016/j.cels.2024.01.001

Retrospective identification of cell-intrinsic factors that mark pluripotency potential in rare somatic cells

Naveen Jain 1, Yogesh Goyal 2,3,4,5, Margaret C Dunagin 5, Christopher J Cote 5, Ian A Mellis 6, Benjamin Emert 6, Connie L Jiang 1, Ian P Dardani 5, Sam Reffsin 5, Miles Arnett 5, Wenli Yang 7,8,9, Arjun Raj 5,10,11
PMCID: PMC10940218  NIHMSID: NIHMS1962028  PMID: 38335955

Summary

Pluripotency can be induced in somatic cells by the expression of OCT4, KLF4, SOX2, and MYC. Usually only a rare subset of cells reprogram, and the molecular characteristics of this subset remain unknown. We apply retrospective clone tracing to identify and characterize the rare human fibroblasts primed for reprogramming. These fibroblasts showed markers of increased cell cycle speed and decreased fibroblast activation. Knockdown of a fibroblast activation factor identified by our analysis increased the reprogramming efficiency. We provide evidence for a unified model in which cells can move into and out of the primed state over time, explaining how reprogramming appears deterministic at short time scales and stochastic at long time scales. Furthermore, inhibiting the activity of LSD1 enlarged the pool of cells that were primed for reprogramming. Thus, even homogeneous cell populations can exhibit heritable molecular variability that can dictate whether individual rare cells will reprogram or not.

Electronic Table of Contents Blurb

Only rare somatic cells can successfully reprogram into iPSCs. We identified a molecular state marking this rare subset in human fibroblasts. This state persists across several divisions, but can eventually be gained or lost in individual cells. We propose that these state dynamics reconcile stochastic and deterministic models of reprogramming.

Introduction

The demonstration that pluripotent stem cells can be induced from differentiated somatic cells via ectopic expression of the four “Yamanaka” factors (OCT4, KLF4, SOX2, and MYC; OKSM) was a watershed discovery holding promise for disease modeling and regenerative medicine.1,2 Yet, induction of pluripotency is a highly inefficient process, with only a small percentage of originating cells properly undergoing reprogramming into induced pluripotent stem cells (iPSCs). Early on, reprogramming efficiencies were usually below 0.1%, and while technical improvements to ensure more homogeneous delivery and stoichiometry of the OKSM factors have helped increase the efficiency of reprogramming to around 1–5% for human fibroblasts (and even further by modulating various chemical and physical factors), it remains far from 100%.3,4 One possibility is that intrinsic differences between individual cells before the addition of the OKSM factors lead them to either reprogram or not. Comparing the profiles of the rare, highly reprogrammable cells against their more common, less reprogrammable counterparts could reveal barriers to reprogramming only active in the latter group. However, the identification of highly reprogrammable cells prior to reprogramming itself has remained challenging.

Even the question of whether the cells that are able to reprogram have a fixed identity remains heavily debated. One set of results argues that there is no intrinsically defined “primed” subpopulation per se, and that all cells are equally capable of undergoing reprogramming.5–8 The primary evidence for this model is the observation that all single-cell clones derived from the parental population have a reprogrammable subpopulation.6 In such a view, only pseudo-random effects (i.e., unknown environmental factors) can dictate which cells reprogram. Other experiments, however, have provided evidence for the existence of intrinsic differences dictating reprogramming outcomes. Most directly, mouse embryonic fibroblasts (MEFs) recently derived from a shared single progenitor cell (i.e., twins) share reprogramming outcomes even when divorced from their original context and separated onto different plates.9–11 These results collectively suggest that reprogramming potential is at least partially encoded by pre-existing differences before OKSM induction, and these differences are stable enough to exist across cell division in twins. However, while reprogramming can be accomplished by the same set of reprogramming factors in both mouse and human cells, mouse cells reprogram much more quickly, form iPSCs with different morphologies and pluripotent states, and have a somewhat different sequence of reprogramming events (i.e., MET occurs very early in mouse but very late in human).12,13 Thus, it remains to be seen if this memory exists in human fibroblasts undergoing reprogramming.

What, then, are the factors that are associated with this primed state before the induction of OKSM, and how do they differ from those that operate after induction? Much work over the years has focused on the latter, elucidating the molecular sequence of events between the induction of OKSM in somatic cells and becoming fully reprogrammed as an iPSC. This set of events has been elucidated in great detail via comprehensive analyses of cells undergoing reprogramming. For instance, after induction, cells that reprogram have been shown to exhibit: accelerated cell cycle progression,6,14–17 expression of factors facilitating or marking successful mesenchymal-to-epithelial transition (MET),18–23 enhanced chromatin accessibility at pluripotency gene loci,18,24–26 and expression of factors facilitating the action of the OKSM factors or establishing pluripotency directly.18,26–28 However, there is no guarantee that factors responsible for driving the path to pluripotency are the same as those that mark cells primed for reprogramming before induction. A few studies in mouse cells have attempted to show that some features that appear after induction may be present beforehand, such as fast cycling23,29 and expression of markers found in developmental progenitors,11,30 or even stem cells.31 However, the lack of means to directly and precisely identify primed cells in an unbiased way has limited our knowledge of the factors most critical for priming.

The primary hurdle in identifying these factors is the technical challenge of retrospectively identifying and profiling the initial state of cells based on whether or not they ultimately reprogram into an iPSC. This challenge is compounded by the fact that cells bound to become iPSCs are very rare within the population. Several recently developed clonal barcoding and retrospective characterization methods have now made it possible to connect initial cell state to phenotypic fate with high resolution.32–36

Here, we use one such method, Rewind,32 which relies on a DNA/RNA barcoding strategy to pick out “needle-in-a-haystack” fibroblasts primed to become iPSCs from thousands of nonprimed fibroblasts. We applied it in a unique human fibroblast line that is clonally derived and has inducible expression of the Yamanaka factors, minimizing technical variability between cells. Using Rewind, we demonstrate the existence of a subset of clonally-derived human fibroblasts primed to become iPSCs upon exposure to OKSM. These primed fibroblasts exhibit an elevated rate of cell cycle progression and have low levels of factors associated with fibroblast activation even before the induction of OKSM. Furthermore, cells can seemingly move into and out of the primed state, such that different subsets are primed at any given time. Our results suggest that intrinsic cellular variability can define a reprogrammable state that can persist for multiple cell divisions but not indefinitely.

Results

Rewind in the hiF-T system enables retrospective identification and characterization of cells primed for reprogramming into iPSCs

Our goal was to identify markers for cells that are primed for reprogramming. The central challenge was the retrospective identification of the cells that undergo reprogramming. We used a clonal barcoding method called Rewind to explicitly connect pre-existing differences in somatic cells (defining the primed state) with iPSC reprogramming outcomes.32 Rewind uses a lentiviral library of barcodes incorporated into the 3’ UTR of GFP, enabling barcodes to stably exist as DNA and mRNA for detection by both single-cell RNA-sequencing and single-molecule RNA FISH imaging. It is particularly well-suited for identifying very rare cells in the population, as is the case with cells primed for reprogramming. In Rewind, after performing barcoding, cells undergo a few divisions resulting in what we refer to as twins, or cells in the same clone with a recent common ancestor. In the case of reprogramming, after separating twins, we immediately transcriptionally profile one split of twins to capture the initial state (i.e., a molecular “carbon copy”), and we reprogram the other split of twins into iPSCs via induction of OKSM. We then sequence the clone barcodes in the resultant iPSCs, use them to identify primed twins in the “carbon copy”, and use their transcriptome profiles to determine which genes’ expression patterns are associated with reprogramming success.

To minimize the potential impact of confounding variables that also may vary between cells, such as the degree of induction of the OKSM factors, we used a clonally-derived, secondary human fibroblast-like cell line (hiF-T) with doxycycline-inducible expression of the OKSM factors (Figure 1A).37 The clonal derivation of the line minimized the contribution of genetic differences to reprogramming, and the inducible expression of human telomerase ensured more consistent reprogramming efficiency even after months in culture; these cells displayed less variability in proliferation, senescence and reprogramming efficiency than primary fibroblasts. Furthermore, the OKSM factors are combined in a single cassette, facilitating consistent dosage of OKSM across cells. We observed rapid and relatively homogeneous OKSM induction with low variability—over 99% of cells expressed high levels of OKSM after 48 hours in doxycycline, as determined by measuring SOX2 and KLF4 mRNA levels by RNA fluorescence in situ hybridization (FISH) with and without doxycycline (Figures S1A–B, see Table S1). Despite the minimization of variability in induction, reprogramming still only occurred in a small fraction of cells (0.01–0.1%). For all of our experiments reported here in hiF-Ts, we used a single knockout serum-replacement induction medium (see Methods).

Figure 1: Rewind enables retrospective identification of and gene expression profiling of cells primed for reprogramming into iPSCs before OKSM induction.

A. Schematic of reprogramming human inducible fibroblast-like (hiF-T) cells into induced pluripotent stem cells (iPSCs). Addition of doxycycline induces expression of a polycistronic cassette driving stable and stoichiometric expression of the Yamanaka factors (OCT4, KLF4, SOX2, and MYC). We stained for alkaline phosphatase activity to identify pluripotent iPSCs by imaging after the reprogramming period of 3–4 weeks.

B. Schematic of Rewind for following fates of hiF-T cells. Here, we transduced hiF-T cells at an MOI of ~1 with our barcode library. After 3–5 cell divisions, we divided the culture into splits (A and B), induced OKSM in each separately, and performed barcode DNA-sequencing in the resulting iPSC colonies. If extrinsic factors alone dictated reprogramming outcomes or if priming did not have memory across cell divisions (i.e., “no memory”), we would expect essentially no overlap in clones forming iPSCs in each split. If intrinsic factors dictated some degree of reprogramming outcomes and if priming had memory across cell divisions (i.e., “memory”), we would expect some degree of overlap in clones forming iPSCs in each split.

C. We performed computer simulations to determine the degree of overlap in clones forming iPSCs in each split expected if extrinsic factors alone dictated reprogramming outcomes (i.e., “no memory”). The degree of overlap was simulated for 1000 replicates for reprogramming frequencies of 0.01% and 0.1%, where reprogramming frequency = (number of alkaline phosphatase-positive iPSC colonies formed per well) / (number of cells seeded before OKSM induction per well).

D. The observed degree of overlap in clones forming iPSCs in each split for n = 3 independent biological replicates.

E. Schematic of single-cell Rewind for retrospectively identifying hiF-T cells primed to reprogram into iPSCs. Here, we transduced hiF-T cells at an MOI of ~0.15 with our barcode library. After 3 cell divisions, we sorted the successfully barcoded population (GFP positive) and divided the culture into two splits (A and B): with split A we reprogrammed the cells into iPSCs via induction of OKSM and performed barcode DNA-sequencing to identify primed clones and with split B (i.e., “carbon copy”) we immediately performed single-cell RNA-sequencing and barcode DNA-sequencing to label single-cell expression profiles with clone barcodes.

F. The datasets are GSM7092515 and GSM7092516. We applied the Uniform Manifold Approximation and Projection (UMAP) algorithm within Seurat to the first 50 principal components to spatially visualize differences in gene expression in hiF-Ts before OKSM induction in split B (i.e., “carbon copy”). Left UMAP: Cells are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.45 (i.e., “Seurat clusters, res = 0.45”). Right UMAP: Cells are colored as primed (red) or nonprimed (light gray), determined by which clone twins formed iPSC colonies in reprogramming split A. Bar plot: We asked whether the primed hiF-T cells were more transcriptionally similar to each other than to the average of 1000 random samples of an equal number of nonprimed cells (i.e., “P(same distribution)”) and plotted the corresponding probability distributions across Seurat clusters (see Methods).

G. We plotted the log2(fold change) between primed cells versus nonprimed cells from our data versus the log2(fold change) between hiF-T cells and hiF-T-derived iPSCs from 37 for individual genes. Genes near the x-axis were differentially expressed between primed versus nonprimed cells but not between fibroblasts and iPSCs, while genes near the y-axis were vice versa. We labeled selected positive priming markers (in red) and negative priming markers (in blue). We chose 10 markers associated with iPSCs, fibroblasts, activated fibroblasts, and myofibroblasts identified in previous studies and asked if any broad category fit our observed priming markers. We plotted the average log2(fold change) between primed cells versus nonprimed cells for markers in each category and asked if the observed distribution was different from random (i.e., “P(median not random)”).

H. Here, we transduced hiF-T cells at an MOI of ~1 with our barcode library. After 3–4 cell divisions, we divided the culture into two splits (A and B): with split A we reprogrammed the cells into iPSCs via induction of OKSM and performed barcode DNA-sequencing to identify primed clones and with split B fixed cells in situ on glass slides before OKSM induction. We designed RNA clone barcode probes to label and identify primed clone barcodes by RNA FISH in the fixed cells (see Methods). We measured OCT4 and SOX2 expression in individual primed cells (in red) and nonprimed cells (in light gray) and plotted the population distribution of counts per cell for each gene. P-values comparing sample medians were calculated using the Wilcoxon rank sum test.

Human fibroblasts within clones share reprogramming outcomes, indicating iPSC reprogramming is not purely stochastic

The application of Rewind requires that reprogrammability is largely dictated by heritable, intrinsic (meaning innate) differences in cellular state as opposed to extrinsic factors such as cell-cell interactions with neighbors or other microenvironmental factors. That way, the “carbon copy” cells would reflect the state of the cells that successfully reprogrammed. To demonstrate the primed state was intrinsic and heritable over at least a few cell divisions, we barcoded a population of 300,000–400,000 fibroblasts, let them proliferate for 3–5 divisions, and split each set of twins into different plates, hence randomizing their microenvironmental context. We then induced OKSM in both plates until the emergence of iPSCs (Figure 1B). If extrinsic factors were responsible for determining which cells reprogrammed following induction, we would expect largely distinct sets of barcodes in each population (maximum overlap of 0.385% across 10,000 simulations in silico) (Figure 1C). Instead, we observed a much higher degree of overlap (7.56%–37.1%) in barcodes even 14 days post-transduction, consistent with previous barcoding experiments in MEFs9,11 (Figures 1D, S2A–C). Thus, intrinsic differences that persist for at least several cell divisions are major determinants of cellular reprogramming, enabling us to apply Rewind in this system.
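The “no memory” expectation can be illustrated with a small simulation. The Python sketch below is illustrative only and is not the authors' simulation code. It assumes that every cell reprograms independently with the same probability, that each clone contributes the same number of twins to each split, and that overlap is measured as shared clones divided by all clones forming iPSCs in either split. The function name, its parameters, and this overlap definition are assumptions introduced here.

```python
import random


def simulate_no_memory_overlap(n_clones, twins_per_split, reprog_freq,
                               n_reps, seed=0):
    """Simulate barcode overlap between two splits when priming has no memory.

    Assumption (not from the paper): every cell reprograms independently with
    probability `reprog_freq`, so a clone yields at least one iPSC colony in a
    split with probability 1 - (1 - reprog_freq) ** twins_per_split.
    Returns one overlap value (shared / union of reprogrammed clones) per
    simulated replicate.
    """
    rng = random.Random(seed)
    p_clone = 1.0 - (1.0 - reprog_freq) ** twins_per_split

    def reprogrammed_clones():
        # Each split is drawn independently: twins share no information.
        return {clone for clone in range(n_clones) if rng.random() < p_clone}

    overlaps = []
    for _ in range(n_reps):
        split_a = reprogrammed_clones()
        split_b = reprogrammed_clones()
        union = split_a | split_b
        overlaps.append(len(split_a & split_b) / len(union) if union else 0.0)
    return overlaps


# Hypothetical, scaled-down parameters chosen so the sketch runs quickly.
overlaps = simulate_no_memory_overlap(
    n_clones=20_000, twins_per_split=4, reprog_freq=0.001, n_reps=20)
print(f"max simulated overlap: {100 * max(overlaps):.3f}%")
```

Under these assumptions the simulated overlap stays small, because two independent rare events seldom land in the same clone. That is the intuition behind comparing the simulated values with the observed 7.56%–37.1% overlap.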

We wondered how much the lack of clone barcode overlap between splits was explained by subsampling in our experiments. Therefore, we again barcoded a population of cells and enabled them to divide for 2–3 or 4–5 divisions before separating them into 2 or 3 splits (mimicking the conditions used for our barcoding experiments) and sequencing the clone barcodes for each split without inducing reprogramming (Figure S2D). Because each split was sequenced without reprogramming (i.e., no selection was used), we would need an extremely high sequencing depth in order to detect all of the clone barcodes per split, which was not feasible for this specific question. Consequently, we were unsure where to set a read threshold for calling bona fide clone barcodes: (1) if we set a low threshold, spurious or insufficiently sequenced barcodes would be present in each split, resulting in an underestimate of the real overlap versus (2) if we set a high threshold, clones with high rates of division would be overrepresented in each split resulting in an overestimate of the real overlap. We used a higher cutoff (>100 reads per million per barcode) to estimate absolute maximums for barcode overlap. We estimated the baseline overlap between splits as 20%–40% for 2–3 divisions and 50%–80% for 4–5 divisions. While it is difficult to quantify definitively, we believe technical limitations still represent a significant source of variability and reason for imperfect overlap between splits. The remaining lack of overlap is likely due to twins not sharing reprogramming outcomes, perhaps due to imperfect memory of priming within individual clones over time.
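The threshold trade-off described above can be made concrete with a toy calculation. This is a minimal sketch rather than the authors' pipeline: the barcode sequences and read counts are invented, the helper names are hypothetical, and overlap is assumed to mean shared barcodes divided by the union of called barcodes.

```python
def reads_per_million(counts):
    """Convert raw read counts per barcode into reads per million (RPM)."""
    total = sum(counts.values())
    return {barcode: 1e6 * n / total for barcode, n in counts.items()}


def called_barcodes(counts, min_rpm=100):
    """Keep barcodes whose RPM exceeds the cutoff (>100 RPM as in the text)."""
    return {bc for bc, rpm in reads_per_million(counts).items() if rpm > min_rpm}


def percent_overlap(split_a, split_b, min_rpm=100):
    """Percent of called barcodes shared by both splits (shared / union)."""
    a = called_barcodes(split_a, min_rpm)
    b = called_barcodes(split_b, min_rpm)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Invented counts: two real clones plus low-count spurious barcodes per split.
split_a = {"AAAC": 60000, "GGTT": 39990, "CCAT": 5, "TTGA": 5}
split_b = {"AAAC": 50000, "GGTT": 49990, "ACGT": 5, "GTCA": 5}

print(percent_overlap(split_a, split_b, min_rpm=100))  # spurious barcodes excluded
print(percent_overlap(split_a, split_b, min_rpm=0))    # spurious barcodes included
```

In this toy case the low cutoff admits split-specific spurious barcodes and lowers the apparent overlap. That matches the underestimate described for a low threshold; the sketch does not model the overestimate that a high threshold can cause.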

Pluripotent cells form colonies with large numbers of cells in them (>100 cells), hence we assumed that a large number of reads corresponding to a particular clone barcode was an indicator that those cells had successfully reprogrammed. To validate this assumption, we performed a similar barcode overlap experiment as before but with a split for generating embryoid bodies, which are cell aggregates mimicking the early embryo.38 We found that barcodes with a larger number of reads make up a majority of the resulting embryoid bodies, validating the use of number of reads as a proxy for pluripotency (Figure S3A). Furthermore, we found that primed twins even when separated form iPSC colonies of a similar size, as has been previously reported (Figure S2C).11

Primed cells in the initial population have measurable gene expression differences before OKSM exposure

Having validated our ability to use Rewind in the hiF-T system, we applied it to measure gene expression differences between primed and nonprimed fibroblasts. We barcoded a population of fibroblasts as before in our barcode overlap experiments, let them divide for 3 divisions, and then separated the twins into different experimental splits. One split was reprogrammed into iPSCs via induction of the OKSM factors and the other split (i.e., “carbon copy”) was run through the single-cell RNA-sequencing pipeline, yielding gene expression profiles for individual cells. Because our clone barcodes are both integrated into the DNA and expressed as mRNA, clone barcodes can be detected both by targeted barcode DNA-sequencing and in the single-cell transcriptomes generated using the 10x Chromium platform.36 By connecting both the clone barcode and the 10x cell ID (see Methods), we are able to assign the clone barcode to single-cell expression profiles in our dataset (Figure 1E). We filtered out all cells with spurious barcodes with unusual sequences or few reads and also removed all cells with multiple barcodes.36 After those filtering steps, we were able to confidently label 28.9% of all cells as containing a single clone barcode. After sequencing the clone barcodes in the resulting iPSCs corresponding to primed cells, we additionally labeled cells as primed or nonprimed in the single-cell RNA-sequencing dataset before comparing gene expression differences. We identified 42 cells as primed in our whole population of 13,589 cells, consistent with what we would roughly expect based on our observed reprogramming efficiencies. To avoid biases resulting from barcoded and nonbarcoded cells (i.e., possible gene expression differences facilitating integration of lentiviral DNA or not), all comparisons between primed and nonprimed cells were done only with barcoded cells (Figure S4A).
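The labeling logic described above can be sketched as follows. This is an illustrative simplification, not the authors' pipeline (see Methods for that): the function names, the `min_reads` cutoff, and the toy cell and barcode identifiers are all hypothetical, and the filter for barcodes with unusual sequences is omitted.

```python
def assign_clone_barcodes(cell_barcode_reads, min_reads=2):
    """Assign one clone barcode per cell, dropping ambiguous cells.

    `cell_barcode_reads` maps a cell ID to {barcode: read count}.
    Barcodes with fewer than `min_reads` reads (a hypothetical cutoff) are
    ignored, and cells left with zero or multiple barcodes are discarded.
    """
    assigned = {}
    for cell_id, reads in cell_barcode_reads.items():
        passing = [bc for bc, n in reads.items() if n >= min_reads]
        if len(passing) == 1:
            assigned[cell_id] = passing[0]
    return assigned


def label_primed(assigned, ipsc_barcodes):
    """Label a barcoded cell as primed if its clone barcode was found in iPSCs."""
    return {cell_id: ("primed" if bc in ipsc_barcodes else "nonprimed")
            for cell_id, bc in assigned.items()}


# Toy data: identifiers and counts are invented for illustration.
cells = {
    "cell1": {"BC_A": 12},             # single confident barcode
    "cell2": {"BC_A": 9, "BC_B": 7},   # multiple barcodes -> removed
    "cell3": {"BC_C": 1},              # too few reads -> removed
    "cell4": {"BC_D": 15},             # single confident barcode
}
labels = label_primed(assign_clone_barcodes(cells), ipsc_barcodes={"BC_A"})
print(labels)
```

Only cells that keep exactly one barcode carry a label, which mirrors the restriction of primed versus nonprimed comparisons to barcoded cells.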

Having identified the rare primed cells within our single-cell RNA-sequencing dataset, we could then compute the expression differences between primed and nonprimed cells to find transcriptome markers of the primed state (Figure 1F). Most of the markers we identified that distinguished primed and nonprimed cells were consistently differentially expressed across three biologically independent Rewind experiments (Figure S4B, see Table S4). Upon categorizing the genes that were differentially expressed between primed and nonprimed fibroblasts, we found two general groupings. One consisted of genes involved in cell cycle regulation, pointing to an overall speedup of cell cycle progression in primed cells. Examples include increased expression of MKI67, TOP2A, and CENPF, all of which are expressed during the G2/M phase of the cell cycle. Increased expression of genes specific to the G2/M phase of the cell cycle may potentially indicate increased proliferation rate, because increases in proliferation rate are usually the result of a shortening of the G1 phase,39 making expression of G2/M phase genes relatively higher within the population as a whole. As predicted, primed cells had a higher fraction of cells in G2/M (65.9% versus 32.3% in nonprimed) and a lower fraction of cells in G1 (9.8% versus 34.9% in nonprimed) (Figure S4C). Primed fibroblasts also had lower expression of cyclin dependent kinase inhibitors CDKN2A and CDKN1A, which are known reprogramming barriers.29,40,41 Genes associated with M-phase regulation (CENPF, SMC4, CDC20) or microtubules (ASPM, TUBA1A) were also upregulated in primed fibroblasts.

The other primary gene expression signature of primed fibroblasts we observed was lower expression of several genes associated with activated fibroblasts (SPP1, SERPINE2, THBS1, TAGLN),42–45 differentiation into myofibroblasts (MYL9, ACTA2, TPM2, POSTN),43,44,46,47 and pathological fibrosis (GDF15, IGFBP7, GAS5).48–50 This signature suggests that unactivated fibroblasts within the population are more likely to reprogram. We wondered if these priming markers might be regulated by a core set of common transcription factors. We used a database that aggregates lists of regulatory relationships between transcription factors and target genes to identify potential common transcription factors for the positive and negative priming markers separately.51 Interestingly, we found that our negative priming markers were positively associated with several transcription factors known to either switch on during EMT or off during MET, including TWIST2, SNAI2, OSR1, and PRRX2 (Figures S5A–B).52 We manually identified a number of binding motifs for these factors upstream of SPP1 and FTH1 (Figure S5C).

We wondered if the gene expression profiles of primed cells showed ectopic expression of factors specific to the target cell type. For example, elite cells identified in a human mesenchymal cell reprogramming system were marked by expression of stem cell markers such as NANOG, OCT4, and SOX2.31 We looked at the expression of sets of genes specific to iPSCs, fibroblasts, activated fibroblasts, myofibroblasts and cell cycle53 in primed versus nonprimed cells (Figures 1G, S4E). The absolute gene expression differences between primed and nonprimed cells were subtle but appropriately reflected the relative expression differences we observed (Figure S4D). The expression of canonical iPSC genes (e.g., NANOG, LIN28A, DNMT3B) was virtually the same in primed vs. nonprimed cells, while many activated fibroblast and myofibroblast genes were downregulated in primed cells. Next, we wondered whether the gene expression profile of primed cells reflected some part of the expression changes associated with the reprogramming process itself. We compared the gene expression profile of primed cells from our Rewind data to existing bulk RNA-sequencing data across multiple time points during iPSC reprogramming (after induction of reprogramming) in our cell line.37 Broadly, positive priming markers increased in expression through the reprogramming process while negative priming markers decreased in expression early on in reprogramming (Figures S5A–B). The positive priming markers, however, did not show as much differential expression in iPSCs versus fibroblasts as compared to the negative priming markers. Downregulation of the negative priming markers (primarily fibroblast-specific genes) was also sustained during the whole reprogramming process (Figure 1G). These results argue against primed cells showing any indication of being in an iPSC-like state, instead suggesting an association with a generic lack of fibroblast activation and the early stages of reprogramming in our fibroblast system.

We wanted to confirm differential expression of the positive and negative priming markers by direct visualization of gene expression by single-molecule RNA FISH. We performed Rewind experiments similar to those in Figure 1E, but instead of subjecting one split of the experiment to single-cell RNA-sequencing, we instead immediately fixed the “carbon copy” fibroblasts after splitting. After identifying barcodes corresponding to iPSCs in the split in which OKSM was induced, we designed RNA FISH probes targeting the clone barcodes of primed cells (Figure 2A). In our fixed samples, we identified primed cells using these clone barcode RNA FISH probes (Figure S6A). We further measured expression of positive priming markers TOP2A and CENPF as well as negative priming markers SPP1 and SQSTM1, confirming that they had higher expression and lower expression respectively in primed cells as compared to nonprimed cells (Figure 2B, see Table S1). Single-molecule RNA FISH also confirmed that there was no difference in OCT4 or SOX2 expression between primed and nonprimed cells, in line with our single-cell RNA-sequencing results and eliminating the possibility that leaky expression of OKSM before induction could be responsible for priming individual cells for reprogramming (Figure 1H).

Figure 2: Selecting for positive and negative priming markers in the initial population predictably changes reprogramming outcome.

A. Schematic of in situ Rewind for retrospectively identifying hiF-T cells primed to reprogram into iPSCs fixed on slides. Here, we transduced hiF-T cells at an MOI of ~1 with our barcode library. After 3–4 cell divisions, we divided the culture into two splits (A and B). With split A we reprogrammed the cells into iPSCs via induction of OKSM and performed barcode DNA-sequencing to identify primed clones. With split B (i.e., “carbon copy”), we immediately fixed cells on slides after splitting but before OKSM induction. We marked primed cells in split B with RNA FISH probes to primed clone barcodes identified in the reprogrammed iPSCs in split A. We imaged DAPI (blue) and RNA FISH signal from primed clonal barcodes (white) at different magnifications on a fluorescence microscope. Shown is a representative of 44 primed cells in ~25K individual cells.

B. We measured gene expression in individual primed and nonprimed cells in split B by performing single-molecule RNA FISH for SPP1 and SQSTM1 (i.e., negative priming markers) as well as TOP2A and CENPF (i.e., positive priming markers). For each marker, we plotted FISH spots per cell for many primed (in red) and nonprimed (in gray) cells. P-values comparing sample medians were calculated via Wilcoxon rank sum test.

C. To sort cells based on cycling speed, we stained cells with a fluorescent dye that becomes diluted with successive cell divisions (i.e., “accumulation dye”). Here, lighter shades of brown indicate higher dilution and more divisions (i.e., “fast cyclers”) while darker shades of brown indicate lower dilution and fewer divisions (i.e., “slow cyclers”). We sorted out slow and fast cells, reprogrammed each into iPSCs via induction of OKSM, and quantified the number of alkaline phosphatase-positive iPSC colonies (in white) formed.

D. We sorted cells based on cycling speed into four bins: slow, mid slow, mid fast, and fast. We plated cells, let them divide for 2–3 days, and measured the proliferation rate, where proliferation rate = (number of cells at the end) / (number of cells seeded). All measured proliferation rates were normalized to a control of ungated hiF-T cells (i.e., “normalized prolif. rate”). Metric shown is mean +/− standard error for n = 4 independent biological replicates for slow and fast and n = 2 for mid slow and mid fast.

E. We sorted cells based on cycling speed into four bins: slow, mid slow, mid fast, and fast. We reprogrammed each bin into iPSCs via induction of OKSM and quantified the number of iPSC colonies formed. The number of iPSC colonies formed for each bin was normalized to a control of ungated hiF-T cells (i.e., “normalized reprog. rate”). Metric shown is mean +/− standard error for n = 4 independent biological replicates for slow and fast and n = 3 for mid slow and mid fast.

F. We designed CRISPR guides to knock down mRNA expression of (1) positive controls LSD1, DOT1L, and CDKN1A, (2) negative control MDM2, (3) negative priming markers FTH1, GDF15, SPP1, and SQSTM1, and (4) upstream priming regulators NFE2L2 and MYBL2 identified in Figure S5A. We quantified the number of iPSC colonies formed for each guide (labeled numerically) for each gene and reported each measurement as a fold change value in comparison to a no guide control (i.e., backbone vector lacking a targeting guide RNA). Each black dot shown per gene and guide is an individual biological replicate.

G. We performed bulk RNA-sequencing of cell lines infected with guides knocking down mRNA expression of LSD1, CDKN1A, and SPP1 as well as a no guide control for n = 2 biological replicates. We plotted expression profiles for each gene, guide, and replicate (labeled as gene name_guide number_replicate number) in principal component space where each color represents a different gene. We performed gene ontology enrichment analysis by using gprofiler2’s gost command for genes downregulated (log2(fold change) < −0.5) by an aggregate of each CRISPR guide and replicates for each gene and plotted the results for significant hits.

Pre-existing, faster cycling cells have an intrinsically higher efficiency of iPSC reprogramming

While the transcriptional signatures of primed cells suggested the importance of cell cycle speed and fibroblast activation (here, meaning expression of markers of fibroblast activation or myofibroblast differentiation) in reprogrammability, we wanted to confirm the associations between cell cycle and lack of fibroblast activation with priming via alternative methods. In the case of cell cycle, several studies have shown that cells already undergoing the reprogramming process can show increased rates of division,14,15 but less is known29 about how natural variability in proliferation rate before the reprogramming process is associated with reprogrammability.

To measure how cell cycling speed in uninduced fibroblasts affected the ability of cells to reprogram, we separated cells by cycling speed by staining them with a fluorescent dye that becomes diluted with successive cell divisions (Figure 2C). After a sufficient number of divisions, fast cycling cells were identified by low fluorescent signal and slow cycling cells by high fluorescent signal, owing to differential dye dilution. Upon sorting, we found that fast cycling cells proliferated at a 1.10-fold faster rate compared to the ungated control and a 1.34-fold faster rate compared to slow cycling cells (Figure 2D). Upon induction of OKSM in these different subpopulations, we found that fast cycling cells generated 2.64-fold more colonies than ungated cells and 5.32-fold more colonies than slow cycling cells (Figure 2E). There has been a report of an ultra-fast cycling (8-fold faster) population with higher reprogramming potential14; we did not observe such a subpopulation in our cell line.
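The normalized rates reported here follow the definitions given in the legends for Figures 2D and 2E. The arithmetic can be sketched as below; this is a minimal illustration only, in which the function names, the per-replicate pairing of each sorted sample with an ungated control, and all cell counts are assumptions rather than details taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def proliferation_rate(cells_end, cells_seeded):
    """Proliferation rate as defined in the Figure 2D legend: cells at end / cells seeded."""
    return cells_end / cells_seeded


def normalize_to_control(sample_rates, control_rates):
    """Divide each replicate's rate by its matched ungated-control rate (assumed pairing)."""
    return [s / c for s, c in zip(sample_rates, control_rates)]


def mean_and_sem(values):
    """Mean and standard error of the mean across biological replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical (seeded, end) cell counts for two replicates; not data from the study.
ungated = [proliferation_rate(e, s) for s, e in [(1000, 4000), (1000, 5000)]]
fast = [proliferation_rate(e, s) for s, e in [(1000, 4400), (1000, 5500)]]

fast_norm = normalize_to_control(fast, ungated)  # "normalized prolif. rate" per replicate
fast_mean, fast_sem = mean_and_sem(fast_norm)
```

The same normalization would apply to colony counts when computing a normalized reprogramming rate, with the number of iPSC colonies taking the place of the proliferation rate.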

Why do faster cycling cells correlate with more reprogramming into iPSCs? The difference could merely be the result of increased numbers of cells prior to induction owing to the increased number of divisions in fast cycling cells, or it could be that these cells have an intrinsically higher propensity to reprogram. To directly measure if primed clones reprogrammed more efficiently solely due to entering the reprogramming process with more cells, we measured the number of cells per clone for primed and nonprimed cells at the time of OKSM induction in our Rewind single-cell RNA-sequencing dataset from Figure 1E (Figure S7A). We observed a minimal difference in the distribution of the number of starting cells for primed versus nonprimed cells (mean(primed) = 1.20 versus mean(nonprimed) = 1.25, p-value = 0.57), indicating that reprogramming success is not merely a function of the number of progenitors before OKSM induction.

Given that the cells were only kept in culture for 2–3 divisions after barcoding but before splitting for single-cell RNA-sequencing, we wondered how sensitive this approach was in detecting expected differences in number of starting cells. We performed an additional Rewind experiment in which we sorted our “carbon copy” split into different groups based on cycling speed before performing single-cell RNA-sequencing. With those data, we measured the starting number of cells per clone across cycling speeds as an indicator of proliferation rate. While we were able to detect differences in the number of starting cells between fast and ungated cells, we did not detect a difference between slow and ungated cells. We cannot distinguish whether the minimal difference in number of starting cells between slow and ungated cells here is because of smaller differences in cycling speed between these groups in this experiment or because of the inherent noisiness in measuring number of starting cells per clone in our dataset. Despite this limitation, we saw minimal differences in distribution of number of starting cells between primed and nonprimed cells both in bulk and when separating cells into cycling speed sort groupings. We also demonstrated that a higher fraction of fast clones reprogram into iPSCs compared to ungated and slow clones (Figure S7BD). Additionally, we estimated that fast clones and slow clones would need to maintain their respective pre-OKSM cycling speeds for nearly a week following OKSM induction, by which point reprogramming is well underway, to generate a sufficiently different number of progenitors to fully explain the different observed reprogramming rates (Figure S7E). These results collectively argue that naturally-occurring fast cycling cells within an otherwise homogenous population have an intrinsically higher rate of reprogramming than slow cycling cells.

We further wondered whether this difference in intrinsic reprogramming efficiency between cycling speed subpopulations could be explained by differences in OKSM induction. Therefore, we sorted hiF-Ts based on cycling speed (fast, slow, ungated) and measured OCT4 mRNA expression levels following induction over a time course of 2 days for each cycling speed subpopulation (Figure S8A). We found a comparable increase in OCT4 mRNA expression upon induction across cycling speeds at 12 hours. After 24 hours, however, slow cycling cells demonstrated somewhat lower median levels of OCT4 mRNA compared with fast cycling cells and ungated cells. A previous study in human fibroblasts54 has demonstrated that even order-of-magnitude differences in the induced levels of the OKSM factors did not significantly affect reprogramming efficiency. Furthermore, fast cycling cells and ungated cells demonstrated similar levels of induced OCT4 mRNA yet differed in reprogramming efficiency, implying that a different source of variation is more important in driving reprogramming outcomes than the relatively small differences in the level of OKSM induction between fast and slow cycling cells.

Knockdown of the negative priming marker SPP1 increases iPSC reprogramming efficiency without affecting cycling speed

While the above results show that cycling speed is associated with higher rates of reprogramming, not every fast cycling cell underwent reprogramming. Our molecular profiling results suggested that lower levels of fibroblast activation markers (Figure 1G) may be another important feature of priming. To demonstrate that a lack of fibroblast activation markers per se could increase reprogramming efficiency, we knocked down the expression of selected negative priming markers, including some with potential roles in fibroblast activation, to see if their loss could lead to an increased number of primed cells. We used CRISPR/Cas9 to knock down the negative priming markers SPP1, FTH1, and CDKN1A, as well as putative upstream regulators MYBL2 and NFE2L2 identified by motif analysis (Figure S5A, see Table S2). We included LSD1 and DOT1L knockdowns as positive controls and MDM2 knockdown as a negative control: LSD1 and DOT1L knockdowns are known to increase reprogramming rate, and MDM2 knockdown is known to block reprogramming via upregulating p53 activity.37,55,56 Knockdown of SPP1 mRNA expression resulted in a higher reprogramming efficiency compared to cells infected with the same CRISPR lentivirus without a guide RNA (Figure 2F). Different guides knocked down RNA to different extents (including variability across replicates); we found that the greater the level of mRNA knockdown, the higher the number of iPSC colonies formed (Figure S6C, see Table S1). We did not observe as clear an association between knockdown of SPP1 at the protein level and iPSC colony formation rate; however, this lack of association could be explained by nonspecific SPP1 antibody binding in our assay or by SPP1 being a secreted protein (Figure S6B).57 These results show that some negative priming markers such as SPP1 may also functionally modulate iPSC reprogramming efficiency; notably, knocking down this factor did not affect cell cycle speed (Figure S6D).

We wondered how knockdown of SPP1 mRNA resulted in increased iPSC reprogramming efficiency. To measure gene expression changes specifically associated with knockdown of SPP1 mRNA, we performed bulk RNA-sequencing of the no guide control, KDM1A, CDKN1A, and SPP1 CRISPR knockdown lines (each of which resulted in increased reprogramming efficiency) and compared gene expression profiles. We verified the knockdown of each target gene compared to control for each CRISPR knockdown guide (Figure S6E). The gene expression profiles resulting from CRISPR knockdown for each guide clustered separately by target gene when plotted in principal component space, indicating that each target regulated the expression of distinct sets of genes (Figure 2G). Genes downregulated by SPP1 mRNA knockdown were specifically enriched for TGF-B (adjusted p-value = 3.27 × 10^−2) and BMP signaling (adjusted p-value = 3.31 × 10^−2). These downregulated genes included inducers of TGF-B (ITGB3),58 negative feedback regulators of TGF-B induced by TGF-B (GREM2, NOG, SMAD7),59,60 and downstream genes induced by TGF-B (SERPINE1),61 all consistent with lower basal levels of TGF-B signaling. Knockdown of SPP1 mRNA did not result in differential expression of many of the positive or negative priming markers. These results collectively suggest that knockdown of SPP1 mRNA increases reprogramming efficiency not by pushing cells into the fast-cycling primed state but rather by decreasing TGF-B signaling, in line with SPP1’s known role in mediating TGF-B signaling as a barrier to iPSC reprogramming.62,63

Pre-existing differences in cycling speed and knockdown of negative primed state markers affect reprogramming efficiency in mouse embryonic fibroblasts

We wondered if the effects of increased cycling speed and knockdown of SPP1 mRNA on reprogramming efficiency were specific to our hiF-T cell line. To determine if these priming features were a function of semi-immortalization or were species specific, we performed similar reprogramming experiments in primary MEFs with a doxycycline-inducible OKSM cassette.

Regarding cycling speed, previous studies have focused on fast cycling cells after but not before the induction of reprogramming.14,15 Indeed, Guo et al. 201414 posited that there was likely not a sufficient difference in cycling speed in uninduced populations of MEFs to drive differences in reprogramming outcomes. To evaluate if pre-existing differences in cycling speed might affect reprogramming outcomes in MEFs, we sorted MEFs based on cycling speed and reprogrammed each sorted population separately. We found that fast cycling MEFs existing even before the induction of OKSM can reprogram at a higher efficiency, as in our hiF-T cell line (Figure S8B).

Regarding the negative priming markers, we performed CRISPR knockdowns of mRNA levels of SPP1, FTH1, and SQSTM1 in MEFs and evaluated the effects on iPSC reprogramming efficiency (see Table S2). As in the hiF-T system, we used knockdown of LSD1 mRNA as a positive control for increased reprogramming rate.64 We reprogrammed the MEFs into iPSCs under normoxic conditions despite the known lower reprogramming efficiencies to better match the conditions we used for our hiF-T cells.65 Knockdown of SPP1 mRNA for at least one guide resulted in an increase in reprogramming efficiency, corroborating our results in human fibroblasts (Figure S8C). As with the hiF-T system, knockdown of mRNA of the negative priming markers FTH1 and SQSTM1 did not affect reprogramming efficiency compared to the no guide control.

A key advantage of our hiF-T cell line is that it is clonally derived and ensures homogenous dosage of the reprogramming factors. This advantage enabled us to remove confounding effects and identify pre-existing differences within a seemingly homogenous population of human fibroblasts, some of which can affect reprogramming outcomes. By contrast, MEF populations are derived from whole embryos and made up of clones from different developmental lineages, and are expected to have a much higher baseline heterogeneity between cells. Even so, differences in cycling speed and SPP1 expression levels similarly affected reprogramming outcomes in MEFs, indicating some degree of cross-species generalizability. However, we have only evaluated reprogramming systems making use of integrated, inducible cassettes containing the reprogramming factors. Performing Rewind in different reprogramming systems under different reprogramming conditions (e.g., different induction media) may reveal different types of primed states.

Cells may move into or out of the primed state across time

Here, we have identified pre-existing features of a rare subset of cells poised to become iPSCs upon OKSM exposure. This primed subset consists of distinct clones identified by barcoding within the timescales of our Rewind experiments (<1 week). However, evidence from mouse B cells indicates that upon continued exposure to the reprogramming factors, nearly every clone is capable of becoming an iPSC albeit with different latencies,6 suggesting that virtually every cell can ultimately enter the primed state. If priming were transient, it might explain the apparent discrepancies seen between deterministic and stochastic models of reprogramming: at any given moment in time a predictable subset of clones is primed (i.e., “elite”) but the clones within that subset change over time. Therefore, at short timescales reprogramming might seem deterministic while at long timescales reprogramming might seem stochastic. For these reasons, we wondered if cells could move into and out of the primed state over longer timescales, as observed in other contexts.32,35,36,66

To evaluate whether cells could move into and out of the primed state, we sorted a population of cells into separate groups based on cycling speed and tracked the cycling speed and reprogramming rate over a month in culture. If cell states did not change over time, we would expect that the fast cycling cells would maintain a higher cycling speed and reprogramming rate while the slow cycling cells would maintain a lower cycling speed and reprogramming rate across time (Figure 3AB). Instead, we saw that both fast and slow cycling cells reverted to the population mean for both cycling speed and reprogramming rate after 10–20 days. This reversion indicates that slow cycling cells can eventually switch to become fast cycling cells and vice versa, resulting in corresponding changes in iPSC reprogramming efficiency.
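The logic of this experiment can be illustrated with a toy two-state switching model. This is a sketch for intuition only and not the analysis performed in the study: the function name, the switching probabilities `k_on` and `k_off`, and the starting fractions are all hypothetical, and the model deliberately ignores the difference in proliferation rate between states.

```python
def primed_fraction_over_time(p0, k_on, k_off, n_steps):
    """Fraction of primed cells at each step of a two-state switching model.

    Update rule: p[t+1] = p[t] * (1 - k_off) + (1 - p[t]) * k_on
    where k_on / k_off are per-step probabilities of entering / leaving
    the primed state (hypothetical parameters).
    """
    fractions = [p0]
    for _ in range(n_steps):
        p = fractions[-1]
        fractions.append(p * (1 - k_off) + (1 - p) * k_on)
    return fractions


k_on, k_off = 0.02, 0.18               # hypothetical switching probabilities
steady_state = k_on / (k_on + k_off)   # long-run primed fraction of the population

# A sorted population enriched for primed cells and one depleted of them
fast_sorted = primed_fraction_over_time(0.5, k_on, k_off, 30)
slow_sorted = primed_fraction_over_time(0.0, k_on, k_off, 30)
```

With nonzero switching, both sorted populations relax toward the same steady-state fraction, which is the qualitative signature of reversion to the population mean. With `k_on = k_off = 0`, each population would keep its starting fraction indefinitely, corresponding to the fixed-state expectation.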

Figure 3: Cells may move into or out of the primed state across time.

A. We performed similar experiments as in Figure 2C to sort a population of hiF-T cells into separate subpopulations based on cycling speed (i.e., slow cyclers, fast cyclers, ungated cells). We monitored each separate subpopulation over a month in culture; every few days we removed a subsample to measure the proliferation rate (i.e., change in number of cells with time (see Figure S6D)) and reprogramming rate (i.e., number of iPSC colonies formed with OKSM induction). Different possibilities for results and interpretations of each result are described.

B. We performed the experiment described in Figure 3A for n = 3 biological replicates. Each column shows results from a different individual biological replicate. Shown are proliferation rate (upper row) and reprogramming rate (lower row) for fast sorted cells (in light brown) and slow sorted cells (in dark brown) normalized to ungated cells (in gray) across time.

C. To control for population dynamics in our experiments in Figures 3AB, we performed a similar experiment by monitoring a population of barcoded hiF-T cells for a month in culture. Here, we transduced hiF-T cells at an MOI of ~0.15 with our barcode library and allowed 3–4 divisions before initially splitting the population. Every week (2–3 divisions), we removed a subsample to measure the distribution of clone barcodes in the cultured fibroblasts and to reprogram another subsample followed by sequencing the gDNA barcodes in the resulting iPSCs to identify primed clones at that point in time. Primed and nonprimed cells are indicated by red and white cytoplasm respectively, and nucleus color denotes clone barcode.

D. To quantify primed state fluctuations across time, we measured pairwise clone barcode overlaps between the subset of primed clones at each time point. Different possibilities for results and interpretations of each result are described. The experimental data are shown for n = 2 biological replicates, each in a different color on the bar plot. To correct for subsampling, we divided each pairwise overlap by the baseline overlap between all clones present before reprogramming at each time point (see Figure S9A). The time points being compared are indicated with a red highlight in the schematic under each x-axis.

E. For clones primed at only a single time point (i.e., not primed across multiple time points), we plotted the normalized barcode abundance versus time, with each biological replicate shown in a different color. The expected distribution is based on the assumption that cells move into and out of the primed state over time, which would cause the relative abundance to peak at the time point at which each clone is primed, as the primed state is associated with faster cycling speed. The relative abundance is normalized for sequencing depth as well as the number of barcodes present at each time point.

F. To demonstrate clones moving into and out of the primed subset, we plotted Sankey diagrams for all clones forming iPSCs at some point during the month in culture for each biological replicate. For each time point, boxes/nodes represent primed in red and nonprimed in gray while ribbons depict the flow of cells between clusters (red indicates moving into primed versus gray indicates moving into nonprimed regardless of origin).

However, differing population dynamics within each sorted population are a potential confounder. Therefore, we wondered if we could likewise observe clones fluctuating into and out of the primed state in an unsorted population with our barcoding approach. We performed similar experiments as above with barcoded cells, keeping them in continuous culture for a month and splitting the population at regular intervals. At each interval, we sequenced barcodes in one split to measure the relative abundance of each clone in the OKSM-naive fibroblast population and reprogrammed cells in another split into iPSCs to identify primed clones (e.g., cells sharing a barcode sequenced in large reprogrammed iPSC colonies) over time (Figure 3C).

To quantify primed state fluctuations across time, we measured pairwise overlaps between the subset of primed clones at each time point. To correct for subsampling, we divided each pairwise overlap by the baseline overlap in all clones present before reprogramming at each time point. If cell states were fixed over time, we expected that the amount of overlap would be more or less the same across time (Figure 3D). Instead, we generally saw that the overlap was higher for points nearer together in time and lower for points further apart in time, indicating that the composition of the primed subset changes over time. This pattern held whether we chose primed cells with the experimental reprogramming frequencies or when assuming a fixed reprogramming frequency (Figure S9A). For biological replicate 2 compared with biological replicate 1, we saw more barcode overlap between week 1 and week 4. This difference was likely driven by the fact that biological replicate 2 had a higher fraction of clones remaining in the primed state across the whole time period, resulting in higher barcode overlap across all pairwise comparisons.
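The subsampling correction described above can be sketched as follows. The text does not specify the overlap metric, so the Jaccard index used here is an assumption, as are the function names and the toy barcode sets.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index between two barcode sets (assumed overlap metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalized_overlaps(primed_by_time, all_by_time):
    """Primed-clone overlap divided by baseline overlap for each pair of time points."""
    result = {}
    for t1, t2 in combinations(sorted(primed_by_time), 2):
        baseline = jaccard(all_by_time[t1], all_by_time[t2])
        primed = jaccard(primed_by_time[t1], primed_by_time[t2])
        result[(t1, t2)] = primed / baseline if baseline else float("nan")
    return result


# Hypothetical barcodes keyed by week; not data from the study.
all_by_time = {
    1: {"A", "B", "C", "D", "E", "F"},
    2: {"A", "B", "C", "D", "E"},
    4: {"A", "B", "C", "D"},
}
primed_by_time = {1: {"A", "B"}, 2: {"A", "C"}, 4: {"C", "D"}}

overlaps = normalized_overlaps(primed_by_time, all_by_time)
```

In this toy input, the normalized overlap between weeks 1 and 2 exceeds that between weeks 1 and 4, mirroring the pattern expected when the composition of the primed subset drifts over time. If priming were fixed, the normalized values would be roughly equal for every pair of time points.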

We observed that clones that are primed in the first week do not all remain primed indefinitely and also do not come to dominate the culture over time as might be expected if priming were immutable due to the faster cycling speed of cells in the primed state (Figure 3E). In fact, clones exiting the primed state on average decreased in relative abundance in the culture over time, perhaps indicating these clones have exited out of a cell state marked by a relatively faster cycling speed. Furthermore, clones that were not initially primed became primed at later time points (Figure 3F). These clones that became primed at later times increased in average relative abundance in the population over time, indicating that these clones may have become primed by entering into a cell state marked by a relatively faster cycling speed. Finally, while primed cells on average had more cells per clone compared to nonprimed cells in the initial population, very high cells per clone was not a prerequisite for being primed and colony size in reprogrammed iPSCs did not strongly correlate with pre-OKSM abundance (Figure S9B).

Taken together, these results suggest that cells are not immutably primed or not primed, and that cells may move into or out of the primed state over time. Furthermore, while the specific cells that comprised the primed subset changed over time, we observed a more or less stable reprogramming rate and degree of memory on the population level over a month (>15 cell divisions) in culture (Figure S9C). The dynamic nature of priming helps reconcile the observation that given enough time, seemingly every cell is capable of forming an iPSC with the identification by us and others of a discrete subset of cells poised to become iPSCs at a specific point in time.9–11,14,31 Future work must be done to directly visualize primed state fluctuations in individual cells.

Cycling speed and fibroblast activation may be aspects of a single axis of biological variability marking primed cells

Given that we identified two modes of priming (faster cycling and lack of fibroblast activation), we wondered to what extent these two modes were distinct, or whether they were just different aspects of a single underlying axis of biological variability. To distinguish these possibilities, we performed a Rewind experiment as before in Figure 1E; however, with our “carbon copy” split we sorted out cells based on cycling speed and then ran each group separately through the single-cell RNA-sequencing pipeline (Figures 4AB). Simultaneously quantifying cycling speed and priming marker expression in individual cells enabled us to directly measure the relative contribution of each mode of priming to overall priming status.

Figure 4: Cycling speed and fibroblast activation are aspects of a single axis of biological variability marking primed cells.

A. The datasets are GSM7092519, GSM7092520, and GSM7092521. Here, we simultaneously stained hiF-T cells with an accumulation dye and transduced the same hiF-T cells at an MOI of ~0.15 with our barcode library. After 2–3 cell divisions, we divided the culture into two splits. With one split, we reprogrammed the cells into iPSCs via induction of OKSM and performed barcode DNA-sequencing to identify primed clones. With the other split, we sorted out successfully barcoded cells (GFP positive) and sorted out equal numbers of slow, ungated, and fast cells based on the accumulation dye signal. We immediately performed single-cell RNA-sequencing on each separate cycling speed population and barcode DNA-sequencing to label single-cell expression profiles with clone barcodes. We applied the Uniform Manifold Approximation and Projection (UMAP) algorithm to the first 50 principal components to visualize differences in gene expression in hiF-Ts before OKSM induction. Shown are cells for which we could confidently assign a single clone barcode across all cycling speeds on the left UMAPs and in fast cyclers versus slow cyclers on the right UMAPs. Cells are colored by clusters determined using Seurat’s FindClusters command at a resolution of 0.3 or by priming status (primed in red, nonprimed in gray).

B. The datasets are GSM7092519, GSM7092520, and GSM7092521. On the UMAP, we recolored each cell for its expression for a select subset of positive and negative priming markers using Seurat’s FeaturePlot command. Positive priming markers CENPF, MKI67, TOP2A, and SOX21 are primarily expressed in clusters 0 and 1. Negative priming markers SPP1, GDF15, CDKN1A, and FTH1 are primarily expressed in clusters 3 and 4. Color scales shown have arbitrary units based on relative gene expression of scaled, log-normalized RNA counts.

C. To measure the relative explanatory power of cycling speed and a subset of our identified priming markers (as well as the housekeeping genes GAPDH and UBC) in the context of priming, we calculated and plotted odds ratios (see Methods). The odds ratios for each gene were calculated with corresponding standard errors separately in n = 3 biologically independent single-cell RNA-sequencing datasets and aggregated via a random-effects model using the metafor package in R. P-values comparing sample means were calculated using the Student’s t-test.

D. To visualize different axes of biological variability in our single-cell RNA-sequencing dataset, we aggregated expression profiles in each of our cycling speed-priming categories using Seurat’s AggregateExpression command and plotted the aggregates in principal component space. We extracted loadings for each principal component and plotted them for principal component 1 and principal component 2, highlighting the positive priming markers (in red) and negative priming markers (in blue).

E. To evaluate the correlation between SPP1 levels and cycling speed, we plotted normalized SPP1 counts from our single-cell RNA-sequencing dataset described in Figure 4A for all cells versus primed cells within each cycling speed. Each dot represents the normalized SPP1 counts from an individual cell. P-values comparing sample medians were calculated using the Wilcoxon rank sum test.

F. To determine if low levels of SPP1 may drive cycling speed, we measured proliferation rates for cells transduced with each CRISPR guide for MDM2 and SPP1 in Figure 2F. Here, we plotted aggregated proliferation rates for all guides to a given gene target. Metric shown is mean +/− standard error for all aggregated guides. The proliferation rates for individual guides across n = 2 independent biological replicates and details on how we calculated proliferation rate can be found in Figure S6D. P-values comparing sample means were calculated using the Student’s t-test.

G. We subsetted our single-cell RNA-sequencing dataset described in Figure 4A to include only primed cells and applied the Uniform Manifold Approximation and Projection (UMAP) algorithm to the first 50 principal components to visualize differences in gene expression between slow primed, ungated primed, and fast primed cells. On the UMAP, we recolored each cell for its expression for a subset of markers that were differentially expressed between fast primed cells and slow/ungated primed cells. Color scales shown have arbitrary units based on relative gene expression of scaled, log-normalized RNA counts.

H. The datasets are a subset of GSM7092519, GSM7092520, and GSM7092521. To determine if the iPSC colonies arising from fast versus slow cycling primed cells had any phenotypic differences, we plotted the distribution of iPSC colony size for iPSCs derived from slow, ungated, and fast clones. The size of each iPSC colony was determined by normalizing read counts after performing DNA-sequencing on the reprogrammed iPSCs using spike-ins of known cell number and of known barcodes (see Table S3).

We first set about determining their relative contributions by calculating odds ratios for priming as a function of either proliferation speed or transcriptional profile. For example, we asked: what are the odds that a fast cycling cell is also primed? We compared the odds ratio for cycling speed to those for several positive (CENPF, MKI67, TOP2A, SOX21) and negative (SPP1, GDF15, CDKN1A, FTH1) priming markers. The positive priming markers, which were predominantly associated with cell cycle progression, did not have a stronger positive association with priming compared with fast cycling. However, several of the negative priming markers, in particular SPP1, had a stronger negative association with priming compared with slow cycling (Figure 4C). To recover information lost by dichotomizing continuous gene expression values to calculate odds ratios, we generated logistic regression models in which we determined the contributions of cycling speed, expression of each gene, and the interaction between these terms in predicting priming. Again, we saw that the negative priming markers explained more of the variation in priming status compared to cycling speed as a variable (Figure S10A).
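For a single dataset, the odds ratio and its standard error can be computed from a 2×2 table of counts, as sketched below. This is an illustration under stated assumptions rather than the study's pipeline: the function name, the continuity correction for empty cells, the Wald 95% interval, and the example counts are all hypothetical, and the random-effects aggregation across datasets (done with metafor in R per the Figure 4C legend) is not reproduced here.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d):
    """Odds ratio, standard error of log(OR), and Wald 95% interval from a 2x2 table.

    a: feature present and primed       b: feature present and nonprimed
    c: feature absent and primed        d: feature absent and nonprimed
    "Feature" could be fast cycling, or high expression of a marker after
    dichotomizing (the threshold is a choice left to the analyst).
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction to avoid division by zero (an assumption here)
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    interval = (exp(log(ratio) - 1.96 * se_log), exp(log(ratio) + 1.96 * se_log))
    return ratio, se_log, interval


# Hypothetical counts: 30 of 100 fast cells primed versus 10 of 100 non-fast cells primed.
ratio, se_log, (ci_low, ci_high) = odds_ratio(30, 70, 10, 90)
```

An odds ratio above 1 indicates a positive association between the feature and priming, and a value below 1 a negative association. The log odds ratio and its standard error are the per-dataset quantities that a random-effects meta-analysis would take as input.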

We wondered if this difference in explanatory power was because the negative priming markers, associated with fibroblast activation, represented a distinct mode of priming. To answer this question, we first aggregated molecular profiles from cells in each of our cycling speed and priming categories and visualized the similarity in principal component space (Figure 4D, see Table S6). The aggregates separated by cycling speed along principal component 1 (PC1), which was determined primarily by our identified priming markers (i.e., positive priming markers in the positive direction and negative priming markers in the negative direction) and argued for a single shared axis of variability. However, slow primed cells and ungated primed cells were less far along PC1 compared with fast nonprimed cells, indicating that cycling speed is the predominant signature in PC1. Slow primed and ungated primed cells separated from nonprimed cells along principal component 2, which explained a similar amount of variation in our aggregate samples yet did not correlate with our identified priming markers (Figure 4D). This principal component hints at additional, unidentified axes of biological (or perhaps technical) variability that may drive priming in ways distinct from those we have described here. These factors may enable slow primed cells to reprogram in spite of lacking fast cycling.

To evaluate the relationship between cycling speed and fibroblast activation more closely, we focused on SPP1 because it was the negative marker with the highest explanatory power and earlier we had shown how expression levels of SPP1 can directly affect reprogramming efficiency. When we measured SPP1 mRNA levels between primed and nonprimed cells across cycling speeds (via both our single-cell RNA-sequencing dataset and bulk RNA-sequencing of each cycling speed group), SPP1 expression was strongly anti-correlated with cycling speed broadly but was generally lower among primed cells compared to nonprimed cells within each cycling speed group (Figures 4E, S10BC).

This pattern of SPP1 expression could indicate that slow cycling primed cells may successfully reprogram in spite of having a slow cycling speed (in part by having low SPP1 levels) or that slow cycling primed cells are more simply marked by a relatively faster cycling speed compared to their nonprimed peers. To distinguish these possibilities, we measured the fraction of cells in G1 as a proxy for cycling speed (i.e., a higher fraction of cells in G1 implies a slower cycling speed)39 across cycling speed categories and between primed and nonprimed cells. We found that across all cycling speed categories, primed cells had a lower fraction of cells in G1; slow primed cells had an even lower fraction of cells in G1 compared with fast cyclers in bulk. Additionally, cells with the lowest 2.5% and 5% of SPP1 expression had a lower fraction of cells in G1 compared to the remaining population (Figure S7F). However, slow primed cells demonstrated fewer (0.28 cells fewer on average, p-value = 0.16) mean starting cells per clone compared to fast primed cells. Possible explanations for this discrepancy include: (1) slow primed cells are misclassified as slow by our accumulation dye approach, (2) slow primed cells spend relatively less time in G1 but spend a longer time overall to complete the entire cell cycle, or (3) slow primed cells mark cells newly entering the primed state and acquiring a fast cycling speed.

To determine if slow primed cells represented possible contamination of fast primed cells in the slow subpopulation during sorting, we measured barcode overlap from our single-cell RNA-sequencing dataset across each sorted cycling speed subpopulation (Figure S10D); 10.4% of fast primed clone barcodes were also present in the slow subpopulation. We wondered what portion of this overlap was due to sorting contamination versus possible loss of priming memory within a given clone. We had 4 clones in our single-cell RNA-sequencing dataset with twins in the slow subpopulation and twins in the fast subpopulation. When comparing the fraction of cells not in G1 for each set of twins, we found 3 clones had a similar fraction (implying contamination) while 1 clone had a higher fraction for the fast twins compared to the slow twins (implying loss of memory). This analysis, however, is based on a very small number of clones and cells per clone. Additionally, slow primed cells cluster separately from fast primed cells in principal component space (Figure 4D), implying that they represent distinct cell types. Therefore, we cannot definitively conclude whether slow primed cells simply reflect contamination during our sorting process.
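The barcode overlap reported above can be illustrated with a short sketch. It assumes overlap is the fraction of clone barcodes in one group that are also detected in another; the function name and the toy barcodes are hypothetical and are not taken from the study's data or code.

```python
def barcode_overlap_fraction(reference, other):
    """Fraction of barcodes in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference barcode set is empty")
    return len(reference & other) / len(reference)


# Hypothetical toy barcodes, not data from the study.
fast_primed = {"bc01", "bc02", "bc03", "bc04", "bc05"}
slow_subpopulation = {"bc03", "bc10", "bc11"}

overlap = barcode_overlap_fraction(fast_primed, slow_subpopulation)
print(f"{overlap:.1%} of fast primed barcodes were also found in the slow subpopulation")
```

Note that the measure is asymmetric: swapping the two arguments changes the denominator, so the direction of the comparison should be stated alongside the number.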

Given the correlation between SPP1 mRNA levels and cycling speed, we wondered if knockdown of SPP1 may increase reprogramming efficiency by increasing cycling speed. When we knocked down SPP1 mRNA levels by CRISPR in Figure 2F, we found that cycling speed was unchanged in cells with guides to SPP1 compared to control (Figure 4F). In contrast, knockdown of MDM2 mRNA levels reduced cycling speed and reprogramming efficiency, presumably via upregulation of p53 activity. That SPP1 knockdown can seemingly increase reprogramming rate independently of cycling speed indicates that fast cycling speed, per se, is not required for successful iPSC reprogramming.

These results collectively show that fast cycling and lack of fibroblast activation genes may mark a single underlying axis of variability, meaning that fast cycling cells show low levels of fibroblast activation markers and vice versa. While reducing SPP1 levels can increase reprogramming efficiency without changing cycling speed, we do not know if the opposite is also true; indeed, cells with fast cycling speed may only affect reprogramming by virtue of their association with lack of fibroblast activation. Of these modes of priming, however, lack of fibroblast activation markers was a stronger predictor of whether an individual cell was likely to reprogram. These results further demonstrate SPP1’s perhaps dual roles as a predictive marker of cycling speed as a proxy for priming and a determinant of reprogramming via its interaction with TGF-B signaling, and we cannot differentiate which of these roles contributes most to the increased reprogramming potential of primed cells upon knockdown of SPP1 mRNA.

Slow cycling primed cells and fast cycling primed cells are transcriptionally distinct, implying the existence of multiple types of primed cells

We wondered how the distinction between primed and nonprimed cells differed within the fast and slow cycling subgroups. We found that the differences were stronger (in terms of fold change of priming marker expression) for the primed vs. nonprimed cells within the slow cycling cells as compared to the fast cycling cells (Figure S10F, see Table S5). Gene expression profiles between fast primed and fast nonprimed cells were considerably less distinct, but fast primed cells did have subtly lower expression of negative priming markers (ACTA2, SERPINE2) and higher expression of positive priming markers (CDC20, CCNB1) as compared to fast nonprimed cells (Figure S10F). Slow primed cells and ungated primed cells did not cluster near fast primed cells in principal component space (Figure 4D), indicating that despite sharing a core primed gene expression profile, additional sources of perhaps technical or biological variation may be more dominant when classifying these cells. When we clustered solely the primed cells (as opposed to the entire population) and projected into UMAP space, we again found that fast cycling primed cells clustered separately from the ungated and slow primed cells (Figure 4G). Thus, fast cycling primed cells may represent a cellular state distinct from that of slow cycling primed cells, showing that there may be multiple types of primed cells.

We also wondered whether the iPSC colonies arising from fast versus slow cycling primed cells had any phenotypic differences; we found that colonies arising from both types of cells did not have any appreciable differences in the number of cells per colony (Figure 4H) (although when only large colonies are considered, fast primed cells did lead to larger iPSC colonies; Figure S10G). These results suggest that different initial states may adopt similar ultimate fates as part of the reprogramming process. However, we have here measured only a single phenotypic feature (colony size); comparing, for example, the time required to reprogram successfully into iPSCs or the capacity of the reprogrammed iPSCs to differentiate may reveal other important differences in phenotypic fate.

The primed state is defined by extrinsic perturbations in addition to intrinsic cell state

Here, we have defined primed cells as cells that are able to undergo reprogramming when induced. We have demonstrated a mapping between the intrinsic state of the cell and reprogramming outcome, seemingly enabling a purely intrinsic, state-based definition of priming. There are perturbations, however, that can change the apparent efficiency of reprogramming, challenging that assertion.

One such perturbation is inhibition of LSD1 (histone lysine demethylase 1), which was recently identified as a reprogramming booster in our specific cell line37 and for which a chemical inhibitor is readily available. If priming were purely intrinsic, then the apparent boost in reprogramming would have to come from an increase in the proliferation of those same existing intrinsically primed cells. If, on the other hand, LSD1 inhibition allowed cells to reprogram that otherwise would not have reprogrammed, then one could say that the perturbation acts by a “reclassification of state” for priming, consequently meaning that priming cannot be defined purely intrinsically (Figure 5A). To discriminate between these possibilities, we barcoded fibroblasts and separated twins into two splits, one with pure OKSM induction (i.e., “control”) and the other with both LSD1 inhibition and OKSM induction (i.e., “+LSD1i”). Upon sequencing the resultant iPSC colonies for clone barcodes, we found that a number of clone barcodes showed up in both the pure OKSM and the LSD1 inhibition conditions, but an even larger proportion showed up only when LSD1 was inhibited (Figures 5B, S11AB). Cells primed for reprogramming only with LSD1 inhibition still exhibited memory for the primed state to roughly the same extent as conventional priming (Figure 5C). Thus, LSD1 inhibition concurrent with OKSM induction led to a reclassification of initial cell states as primed.
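The logic of the twin-split comparison can be sketched as a set operation on the clone barcodes recovered from each split. This is a minimal illustration under the assumption that a clone counts as reprogrammed in a condition whenever its barcode is detected among that condition's iPSCs; the function name, group labels, and barcodes are hypothetical, and the normalization steps described in Figure S11 are not modeled.

```python
def classify_clones(control_barcodes, lsd1i_barcodes):
    """Group clone barcodes by the split(s) in which they formed iPSCs."""
    control, lsd1i = set(control_barcodes), set(lsd1i_barcodes)
    return {
        "both_conditions": control & lsd1i,  # primed with or without LSD1i
        "lsd1i_only": lsd1i - control,       # reclassified as primed by LSD1i
        "control_only": control - lsd1i,     # seen only with OKSM alone
    }


# Hypothetical toy barcodes, not data from the study.
control_ipscs = {"bc1", "bc2", "bc3"}
lsd1i_ipscs = {"bc2", "bc3", "bc4", "bc5", "bc6"}

groups = classify_clones(control_ipscs, lsd1i_ipscs)
for name, barcodes in groups.items():
    print(name, sorted(barcodes))
```

Under a purely intrinsic definition of priming, one would expect the "lsd1i_only" group to be small relative to "both_conditions"; a large "lsd1i_only" group is what the reclassification model predicts.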

Figure 5: The primed state is defined by extrinsic perturbations in addition to intrinsic cell state.

A. Schematic of Rewind for following fates of hiF-T cells reprogrammed in OKSM alone (i.e., “control”) and in OKSM with LSD1 inhibition (i.e., “+LSD1 inhibition”) with different possibilities for the source of additional colonies described. We transduced hiF-T cells at an MOI of ~1 with our barcode library. After 3–4 cell divisions, we divided the cultures into splits (A and B). In split A we reprogrammed with OKSM alone while in split B we reprogrammed with OKSM and LSD1 inhibition. After reprogramming, we stained for alkaline phosphatase activity (in white) and imaged using fluorescence microscopy. We measured the number of colonies formed in each reprogramming condition and plotted the fold change between OKSM with LSD1 inhibition over OKSM alone. Metric shown is mean +/− standard error for n = 5 independent biological replicates.

B. For some experiments described in Figure 5A, we performed barcode DNA-sequencing on iPSCs formed in OKSM alone (i.e., “control”) versus in OKSM with LSD1 inhibition (i.e., “+LSD1i”). We compared barcodes across each reprogramming condition after performing normalizations described in Figure S11. We measured the number of clones in the reprogrammed iPSCs in each reprogramming condition and plotted the fold change between OKSM with LSD1 inhibition over OKSM alone. Metric shown is mean +/− standard error for n = 3 independent biological replicates.

C. We performed a similar experiment as described in Figure 5A, but after 5–6 cell divisions we divided the cultures into five splits: two splits were reprogrammed with OKSM alone, two splits were reprogrammed with OKSM and LSD1 inhibition (i.e., “+LSD1i”), and one split was reprogrammed with OKSM and DOT1L inhibition (i.e., “+DOT1Li”). After reprogramming, we performed barcode DNA-sequencing on iPSCs formed in each reprogramming condition and compared barcode overlap within and across conditions as indicated. We plotted barcode overlap across each indicated comparison. Metric shown is mean +/− standard error for n = 2 independent biological replicates. P-values comparing sample means were calculated using the Student’s t-test.

D. To determine when LSD1 inhibition acts to increase the number of iPSC colonies, we reprogrammed hiF-T cells via OKSM induction and added LSD1 inhibitor during the different time frames indicated. Shown are representative images of 24-wells after reprogramming and staining for alkaline phosphatase activity (in white) in each condition. We quantified and plotted the number of colonies in each condition after normalizing to the number of colonies formed in baseline reprogramming (i.e., “only OKSM”). Metric shown is mean +/− standard error for n = 2 independent biological replicates. P-values comparing sample means were calculated using the Student’s t-test.

E. We performed bulk RNA-sequencing on hiF-T cells after one week in normal culture conditions versus culturing with LSD1 inhibition (i.e., “+LSD1i”) and performed differential expression analysis in DESeq2. OKSM was not induced in either condition. We plotted log2(fold change) values for different categories of genes in LSD1 inhibition over control. Shown from left to right are genes with the top 50 log2(fold change) values in the differential expression analysis (i.e., “upregulated genes in +LSD1i”), our top 25 negative priming markers, four housekeeping genes (UBC, GAPDH, PGK1, ACTB), and our top 25 positive priming markers. P-values comparing sample medians were calculated using the Wilcoxon rank sum test.

F. To determine if LSD1 inhibition had any effect on proliferation rate, we measured proliferation rate per day as described in Figure S6D in hiF-T cells after one week in normal culture conditions (i.e., “control”) versus culturing with LSD1 inhibition (i.e., “LSD1i”). Metric shown is mean +/− standard error for n = 2 independent biological replicates. P-values comparing sample means were calculated using the Student’s t-test.

G. We used the accumulation dye approach in Figure 2C to sort out hiF-Ts by cycling speed (slow, unsorted, and fast) and then reprogrammed each population with OKSM alone versus OKSM and LSD1 inhibition. After reprogramming, we stained the iPSCs with alkaline phosphatase and counted the number of iPSC colonies formed in each condition. We calculated the normalized reprogramming rate by dividing the number of colonies formed in each condition by the number of colonies formed in the unsorted population reprogrammed with OKSM alone. Metric shown is mean +/− standard error for n = 2 independent biological replicates. P-values comparing means were calculated using the Student’s t-test.

LSD1 inhibition before OKSM induction might also function to increase iPSC reprogramming efficiency by pushing nonprimed cells from the population into the same intrinsic state that was classified as primed without LSD1 inhibition. To test this possibility, we used inhibition of LSD1 at various points before and during reprogramming. We found that a 7 day pre-treatment with LSD1 inhibitor prior to OKSM induction did not lead to any appreciable difference in the number of iPSC colonies (Figure 5D), suggesting that LSD1 inhibition primarily changes the probability of a cell in a given state to reprogram rather than changing the state to resemble a specific primed state itself.

Supporting this conclusion, we found that LSD1 inhibition did not change the expression of our previously identified priming markers when comparing bulk RNA-sequencing profiles for hiF-Ts after a week in culture with or without LSD1 inhibition (Figure 5E). We did, however, observe upregulation of epithelial markers (KRT19, EPCAM) as well as factors whose expression early on following OKSM induction is associated with reprogramming success (BMP4, ITGB4),23,67 consistent with LSD1’s proposed role in facilitating MET (Figure 5E).37 Additionally, knockdown of LSD1 by chemical inhibition or CRISPR did not increase and perhaps decreased cell cycling (Figure 5F),37,64 and LSD1 inhibition increased iPSC generation efficiency regardless of cycling status in uninduced fibroblasts (Figure 5G). Thus, LSD1 inhibition does not increase priming by pushing cells into a state of faster cycling. Together, these results are consistent with a model in which LSD1 inhibition allows cells to reprogram that would not have reprogrammed otherwise, and point to the fact that the primed cell state cannot be solely intrinsically defined as a single discrete state, but must be defined based on both the intrinsic state of the cell and the context of the reprogramming process itself.

We further wondered if cells that required LSD1 inhibition to reprogram also were able to reprogram when treated with other reprogramming boosters. Inhibition of DOT1L, an H3K79 methyltransferase, is known to also facilitate iPSC reprogramming.56,68 We found that some clones dependent on LSD1 inhibition could also form iPSCs with DOT1L inhibition, but the amount of memory across perturbation conditions was lower, implying a combination of booster specific and booster general mechanisms for expanding the subset of primed cells (Figure 5C).

Clone of origin has a minimal effect in dictating the final molecular states of cells subjected to iPSC reprogramming

Above, we showed that LSD1 inhibition allows some cells to reprogram that would normally not reprogram. This result raises the question of whether the different initial state of these cells propagates to differences in the molecular profile of the ultimate iPSCs formed. Indeed, more generally, we wondered to what extent either the cell of origin or reprogramming conditions affected the ultimate iPSC state.

Given the heterogeneity in reprogrammed iPSCs,69–71 we needed to have single-cell resolution of these outcomes as well as information about the clonal origin of iPSC colonies. To trace reprogramming outcomes from the originating cell through to the final state with single-cell resolution, we used a method called FateMap.36 In FateMap, we again barcode cells before inducing OKSM but we collect the entire pool of cells after reprogramming is finished instead of beforehand and perform single-cell RNA-sequencing on them. Thus, we can measure the transcriptional heterogeneity within and between individual iPSC clones. As reported, there was significant heterogeneity within the reprogrammed population as a whole (Figure 6A). Upon clustering, we found several distinct iPSC clusters, identifiable by broad average expression of pluripotency markers (clusters 0, 1, 2, 3). Other clusters included surviving fibroblasts (cluster 4) and a more indeterminate cluster potentially representing incomplete reprogramming or differentiating iPSCs (cluster 5) (Figures 6AB).

Figure 6: Clone of origin has less influence than reprogramming conditions in dictating the final molecular states of cells subjected to iPSC reprogramming.

A. The datasets are GSM7092522, GSM7092523, and GSM7092524, and GSM7092525. We transduced hiF-T cells at an MOI of ~0.15 with our barcode library. After 3 cell divisions, we sorted the successfully barcoded population (GFP positive) and divided the culture into four splits for reprogramming: two splits reprogrammed with OKSM alone and two splits reprogrammed with OKSM and LSD1 inhibition. After reprogramming each split into iPSCs, we performed single-cell RNA-sequencing and barcode DNA-sequencing to label single-cell expression profiles with clone barcodes. We applied the Uniform Manifold Approximation and Projection (UMAP) to the first 50 principal components to spatially visualize differences in gene expression in the resulting iPSCs within and across each reprogramming condition. Cells are colored by cluster determined by using Seurat’s FindClusters command at a resolution of 0.3. For each reprogramming condition, we calculated the fraction of cells across iPSCs (clusters 0, 1, 2, 3), an indeterminate subset somewhere between iPSCs and fibroblasts (cluster 5, “indeterminate”), and fibroblasts seemingly surviving reprogramming but not becoming iPSCs (cluster 4).

B. We used Seurat’s AddModuleScore command to average expression of 8–10 previously described markers of pluripotency as well as epithelial, mesenchymal, and fibroblast cell identity in each cell. On the UMAP, we recolored each cell for its score for each module. Color scales shown have arbitrary units based on relative gene expression of scaled, log-normalized RNA counts.

C. Schematic demonstrating how to interpret different values for the mixing coefficient, previously described in 36 (see Methods). Higher values of the mixing coefficient indicate a higher similarity in the expression profiles of the sets of barcoded cells analyzed. Representative UMAPs for different mixing coefficient values are shown. We calculated mixing coefficients for twins from the same clone on separate plates within the same reprogramming condition (in pink for iPSCs formed from OKSM alone, in blue for iPSCs formed from OKSM with LSD1 inhibition) and across reprogramming conditions (in gray). P-values comparing sample medians were calculated using the Wilcoxon rank sum test.

D. On the UMAP, we recolored matched twins forming iPSCs with OKSM alone in pink and those forming iPSC with OKSM and LSD1 inhibition in light blue. For all clones with twins forming iPSCs in both reprogramming conditions, we measured the cluster containing the largest fraction of twins (i.e., dominant cluster) and calculated the fraction of all clones having that dominant cluster in each reprogramming condition. For each clone, we compared the dominant cluster in iPSCs formed with OKSM alone to the dominant cluster in iPSCs formed with OKSM and LSD1 inhibition, indicating a “switch” whenever the dominant cluster changed across reprogramming conditions. We measured the fraction of pairwise switches for each clone and marked them on the UMAP. The width of each arrow indicates the relative fraction of clones making that switch from OKSM alone to OKSM with LSD1 inhibition.

E. For each matched pair twin across reprogramming conditions identified in Figure 6D, we plotted the average assigned pluripotency module scores for the twins reprogrammed in OKSM alone on the x-axis and for the twins reprogrammed in OKSM with LSD1 inhibition on the y-axis. Each dot represents an individual clone barcode. The p-value comparing paired means was calculated using a paired t-test. To measure differential expression of individual pluripotency markers, we calculated log2(fold change) values for each clone barcode and a subset of genes. We selected the pluripotency markers used in Figure 1 and also additional pluripotency markers associated with clusters 0, 1, and 3 found by running Seurat’s FindMarkers command. Metric shown is mean +/− 95% confidence interval for n = 26 clone barcodes.

F. To identify clones forming iPSCs only when reprogrammed with OKSM and LSD1 inhibition (i.e., “LSD1i-dependent clones” in light blue) versus clones forming iPSCs in both reprogramming conditions (i.e., “LSD1i-independent clones” in dark blue), we performed barcode DNA-sequencing separately for each reprogramming condition on the leftover iPSCs (see Figure S13C). We calculated the fraction of clone barcodes in each condition and plotted the barcode overlap as a Venn diagram with the percentage of all clone barcodes shown for each subset. On the UMAP, we plotted only cells for which we could confidently assign a single clone barcode and reprogrammed in OKSM with LSD1 inhibition.

G. To visualize different axes of biological variability in our single-cell RNA-sequencing dataset, we aggregated expression profiles using Seurat’s AggregateExpression command for fibroblasts (cluster 4), indeterminate cells (cluster 5), iPSCs formed in OKSM alone (i.e., “OKSM alone iPSCs” in red), and iPSCs formed in OKSM with LSD1 inhibition from LSD1i-dependent clones (in light blue) versus from LSD1i-independent clones (in dark blue). We extracted loadings for each principal component and identified that the pluripotency markers used in Figure 6B had a significant contribution to principal component 1 in the leftward direction.

Having mapped the heterogeneous outcomes of reprogramming for normal induction with OKSM, we then set about measuring the differences in these outcomes that arise due to inhibition of LSD1 during induction. In order to track the outcomes of cells across these two conditions, after barcoding we let cells divide into multiple twins, which were then separated into four separate splits: two were subjected to standard OKSM induction and two were subjected to OKSM induction together with LSD1 inhibition. By comparing outcomes between twins across the same condition, one can determine the extent to which fates are predetermined, and comparing across different conditions reveals the differences in the composition and character of outcome states between those conditions. (Overall, we found that there were broadly minimal differences in the distribution of iPSCs, fibroblasts, and indeterminate cells between OKSM induction versus OKSM induction with LSD1 inhibition (Figure 6A).)

To measure the similarity of final molecular states both within and across clones, we used a previously formulated “mixing coefficient” metric. Briefly, the mixing coefficient measures how evenly interspersed distributions across different splits are in principal component space36; a mixing coefficient near 1 indicates high mixing or similarity while near 0 indicates low mixing or similarity (Figures 6C, S12A). Recent studies have shown that twins subjected to the same conditions can have either similar36,72 or distinct73 outcomes, suggesting a high degree of intrinsic or extrinsic fate specification, respectively. Here, within each reprogramming condition, we saw a high degree of mixing when comparing twins on separate plates (Figure 6C). However, we saw a similarly high degree of mixing between non-twins, indicating that within a set of reprogramming conditions clones are relatively mixed with no clear preference for distinct final states. To account for the relatively low number of cells per clone, we performed a similar analysis with a more sensitive metric of similarity of single-cell molecular profiles.73 While we found some amount of dissimilarity when comparing individual clones and equal numbers of randomly sampled cells, several clones continued to show no difference from random (Figures S12BC).
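To make the idea of a mixing score concrete, the sketch below computes a simple nearest-neighbor stand-in. It is not the published mixing coefficient (whose definition is given in reference 36 and the Methods); it is an assumed, simplified analogue in which each cell's k nearest neighbors in a reduced-dimension space are checked for membership in the other split, and the observed fraction is compared with the fraction expected if split labels were random. The function name, the choice of k, and the toy coordinates are all hypothetical.

```python
import math


def knn_mixing_score(group_a, group_b, k=5):
    """Stand-in mixing score: near 1 when the two groups are interspersed,
    near 0 when they occupy separate regions. Points are coordinate tuples."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one cell")
    cells = [(tuple(p), 0) for p in group_a] + [(tuple(p), 1) for p in group_b]
    n = len(cells)
    if k >= n:
        raise ValueError("k must be smaller than the total number of cells")
    sizes = (len(group_a), len(group_b))
    ratios = []
    for i, (point, label) in enumerate(cells):
        # Distances to every other cell, nearest first.
        dists = sorted(
            (math.dist(point, other_point), other_label)
            for j, (other_point, other_label) in enumerate(cells)
            if j != i
        )
        observed = sum(1 for _, lab in dists[:k] if lab != label) / k
        # Other-group fraction expected if labels were assigned at random.
        expected = sizes[1 - label] / (n - 1)
        ratios.append(observed / expected)
    return min(1.0, sum(ratios) / n)


# Hypothetical toy coordinates, not data from the study.
separated = knn_mixing_score(
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(100, 0), (101, 0), (102, 0), (103, 0)],
    k=2,
)
interleaved = knn_mixing_score(
    [(0, 0), (2, 0), (4, 0), (6, 0)],
    [(1, 0), (3, 0), (5, 0), (7, 0)],
    k=2,
)
print(separated, interleaved)
```

The score is capped at 1 so that it shares the 0-to-1 reading used in the text; with few cells per group, as is common for individual clones, any such score will be noisy.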

Extrinsic reprogramming conditions contribute more than intrinsic clonal differences in determining gene expression fates in reprogramming iPSCs

We next wondered if twins across instead of within reprogramming conditions were transcriptionally similar. We found that matched twins in OKSM versus OKSM with LSD1 inhibition had a low degree of mixing, indicating distinct molecular profiles dictated by each respective reprogramming condition itself (Figure 6C). These results collectively suggest that what final state a cell ends up in after becoming an iPSC is dictated more by the extrinsic reprogramming conditions and less by pre-existing intrinsic differences before OKSM induction in our system.

Given that the same clones seemed to have distinct final states in iPSCs formed from OKSM alone versus OKSM with LSD1 inhibition, we wondered what was different about those final states. We compared matched twins from the same clone across each condition. A significant fraction of clones primarily in cluster 0 in OKSM induction alone “switched” to primarily be in cluster 3 (60%) or cluster 1 (25%) in OKSM induction with LSD1 inhibition (Figure 6D). To broadly assess the pluripotency status of these cells, we assigned a pluripotency module score to each cell based on averaged expression of 10 commonly used pluripotency markers and found no significant difference between matched twins across conditions (Figure 6E).
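The dominant-cluster comparison can be sketched as follows. This is a minimal illustration assuming each clone's twins have already been assigned cluster labels in each condition; the function names, the tie-breaking rule, and the toy labels are hypothetical. Fractions here are taken over all clones found in both conditions, which may differ from the denominators used for the percentages reported above.

```python
from collections import Counter


def dominant_cluster(cluster_labels):
    """Most common cluster among a clone's cells; ties go to the smallest label."""
    counts = Counter(cluster_labels)
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


def switch_fractions(control, lsd1i):
    """Map (control cluster, +LSD1i cluster) -> fraction of shared clones.

    `control` and `lsd1i` map clone barcode -> list of per-cell cluster labels.
    """
    shared = sorted(set(control) & set(lsd1i))
    pairs = Counter(
        (dominant_cluster(control[b]), dominant_cluster(lsd1i[b])) for b in shared
    )
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


# Hypothetical toy cluster assignments, not data from the study.
control_clusters = {"bcA": [0, 0, 1], "bcB": [0, 0], "bcC": [1, 1, 0], "bcD": [0]}
lsd1i_clusters = {"bcA": [3, 3, 0], "bcB": [1, 1, 3], "bcC": [1], "bcE": [3]}

fractions = switch_fractions(control_clusters, lsd1i_clusters)
for (before, after), frac in sorted(fractions.items()):
    status = "switch" if before != after else "no switch"
    print(f"cluster {before} -> cluster {after}: {frac:.2f} ({status})")
```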

Looking at specific pluripotency markers individually, however, iPSCs demonstrated heterogeneity within and across conditions, as previously seen in iPSC culture (Figures 6E, S13AB).74,75 We saw minimal differences in expression of the core pluripotency factors NANOG and OCT4. Among markers selected for the pluripotency module, DNA methyltransferase DNMT3B and RNA-binding protein LIN28A, both with known roles in pluripotency maintenance, were modestly elevated in twins in OKSM induction alone. Meanwhile, PODXL, a surface marker used for isolating “functional” iPSCs, was modestly elevated in twins in OKSM induction with LSD1 inhibition.76 We further identified a number of genes less commonly associated with pluripotency but strongly correlated with our gene expression clusters; twins forming iPSCs in OKSM induction with LSD1 inhibition had higher levels of NLRP7, known to block BMP4-mediated downregulation of pluripotency factors and differentiation, and lower levels of TERF1, known to promote telomere elongation and pluripotency factor expression, when compared to twins forming iPSCs in OKSM induction alone.77,78 These results indicate that LSD1 inhibition may push reprogramming cells towards certain final iPSC states marked by differential expression of several pluripotency markers not reported in previous bulk RNA-sequencing analyses.37

LSD1 inhibition (LSD1i) during iPSC reprogramming increased the number of primed clones and consequently the number of iPSC colonies formed by “reclassifying” previously nonprimed clones as primed (Figure 5). We wondered whether clones reclassified from nonprimed to primed with LSD1 inhibition (i.e., LSD1i-dependent) could be distinguished from clones classified as primed without or with LSD1 inhibition (i.e., LSD1i-independent) after becoming iPSCs. To identify LSD1i-independent and -dependent clone barcodes while avoiding significant subsampling, we performed barcode DNA-sequencing and measured barcode overlap as in Figure 5B on the leftover reprogrammed iPSCs in each condition after removing a small fraction for single-cell RNA-sequencing (Figures 6F, S13C). The LSD1i-dependent clones were, however, essentially indistinguishable from the LSD1i-independent clones when comparing gene expression profiles. When we visualized aggregates of each category in principal component space, we found iPSCs formed via reprogramming with OKSM and LSD1 inhibition derived from LSD1i-independent and from LSD1i-dependent clones to be virtually overlapping (Figure 6G). The positioning of different iPSC subpopulations in principal component space further reiterated that LSD1 inhibition during reprogramming creates iPSCs that are subtly distinct from those formed by OKSM alone.

In sum, while pre-existing molecular differences strongly influence whether a cell becomes an iPSC or not, such differences seemingly have much less influence on the final transcriptional state of the resultant iPSCs, reflecting either a lack of memory of those initial differences or intrinsic homogeneity of the iPSC fate itself.

Discussion

While much work has gone into identifying and characterizing the molecular characteristics and mechanisms of cells en route to becoming iPSCs following induction of OKSM, the demonstration that “twins” share the same reprogramming outcome9–11 suggested the existence of intrinsic differences in otherwise homogeneous-seeming cells that drive reprogramming outcomes. Rewind, one of a number of tools that enable one to retrospectively connect fates to initial primed states, allowed us to characterize these primed states directly.32,34,35,79 Here, we revealed that primed fibroblasts are marked by naturally-arising fast cycling speed and lack of fibroblast activation, both of which likely represent a single underlying axis of biological variability and both of which directly affect iPSC reprogramming efficiency.

Although Rewind was able to reveal new information about cells primed for reprogramming, it is worth noting that there are likely additional important features to be discovered. For instance, while cycling speed and fibroblast activation emerged as the strongest measurable priming features in our analyses, in logistic regression models each variable alone only explained 0.5%–1% of variation in priming and when combined as a principal component only explained 9.3% of variation in gene expression, leaving open the question of what explains the remaining variability; indeed, there were already hints of another principal component that may additionally mark the primed state (Figure 4D). One source of variability is undoubtedly technical: given that the primed cells are a very rare subset, we often had only 10s–100s of primed cells for our analyses, leaving us with a high degree of noisiness in our measurements. Another source of variability that could affect the Rewind methodology is a loss of memory over the cell divisions required for the technique to work. More advanced methods for distinguishing cells within primed clones that have exited the primed state (and cells within nonprimed clones that have entered the primed state) may enable a more complete characterization of the primed gene expression profile.

A major question raised by our work is how to define the primed state. It is tempting to use a purely cell-intrinsic definition of priming consisting of a discrete set of molecular markers. However, as our results using the LSD1 inhibitor during reprogramming show, cells that are not primed for reprogramming in one condition can be primed for reprogramming in another. Hence, any definition must incorporate both the molecular state of the cell as well as the stimulus applied. We have observed similar reclassifications in other systems when different perturbation conditions are applied,32,36,80,81 suggesting that such a definition for priming may be required in other contexts as well. Interestingly, reclassification of nonprimed cells to primed cells via LSD1 inhibition was not random but applied only to specific clones. This specificity implies that responsiveness to LSD1 inhibition is a function of specific cell-intrinsic features and has some degree of memory. Additional Rewind experiments designed to measure cellular states and fates at multiple points during the reprogramming process would reveal these cell-intrinsic features and may help elucidate the mechanisms underlying how LSD1 inhibition reclassifies cells to be primed.

The fact that cells can be primed in one condition and not primed in another also eliminates the possibility that there is a single, distinct primed molecular state. Rather, there must be multiple primed states, given that different cells can have different outcomes in different conditions. Our prior work has demonstrated the existence of multiple primed states in the context of cancer therapy resistance.32,36,80,82 Are these primed states organized along a single axis of variability, or multiple? In the case of cancer therapy resistance, we found evidence for multiple axes of variability.32,82 Another question is whether primed cell states form a continuum, or if they consist of discrete metastable states that cells can fluctuate between.83–85 It is difficult to answer this question with current approaches. New conceptual approaches (experimental and theoretical) for the identification of metastable states will be required to provide answers. Whether the primed cells we found are primed for other cell fate transitions is another important future direction.

Unlike in a previously described human mesenchymal cell system, priming in our human fibroblast system was not marked by ectopic expression of factors specific to the target cell type.31 In a mouse system, cells with higher reprogramming potential expressed markers of embryonic morphogenesis and myofibroblast progenitors,30 such as expression of Myl11 and Tpm2 in MEFs. These differences may represent the heterogeneity inherent to the isolation of those cell types.

In contrast, we observed that fibroblast markers were unchanged in primed cells (versus nonprimed cells), but primed cells did express comparatively lower levels of markers of fibroblast activation and myofibroblast differentiation. The hiF-T cell line is a secondary fibroblast line (meaning that it was formed from reprogramming fibroblasts into iPSCs and then differentiating a clone back into fibroblast-like cells), so it is unclear how to interpret the expression of these markers in this line, which may arise from the culturing conditions used.86–92 The literature is also mixed on the effects of fibroblast activation on iPSC reprogramming, with some studies reporting that activated fibroblasts and myofibroblasts taken from fibrotic or damaged tissues in human and mouse are more resistant to iPSC reprogramming93–95 while others report that rodent fibroblasts with an enhanced propensity to form myofibroblasts reprogram into iPSCs with higher efficiency.30,96,97 Overall, it is difficult to say whether primed cells are genuinely non-activated fibroblasts or just in some different fibroblast state, but it is clear that the differences between primed and nonprimed cells reflect differences along this axis.

Cell fate is typically thought to be determined by some combination of cell-intrinsic and cell-extrinsic factors. Intrinsic factors can be defined as those within the cell that, if the cell were placed in a different environment, would still dictate the same outcome, whereas extrinsic factors are those environmental factors that would dictate the outcome regardless of the internal state of the cell. An important concept to introduce here is that of memory, namely how long the intrinsic determination persists over time. Schematically, the following classification may hold: (1) Cell fate is determined by extrinsic factors33,73; (2) Cell fate is determined by very short-lived intrinsic factors, thereby seeming “stochastic”98,99; (3) Cell fate is determined by intermediate-lifetime intrinsic factors, in which case close progeny may adopt the same fate as the ancestor32,35,36,66; (4) Cell fate is determined by very long-lived intrinsic factors (e.g., mutations), in which case all progeny will adopt the same fate as the ancestor. With the advent of barcoding techniques, it has become possible to rigorously distinguish between these possibilities. In the case of reprogramming, much ink has been spilled on the distinction between extreme cases 2 and 4, i.e., completely stochastic induction versus completely deterministic induction, with both cases seemingly supported and contradicted by experimental evidence. We propose that case 3, an intermediate level of memory, may be a way to reconcile the various data. In that scenario, cells have a limited memory of the primed state, so once a cell enters the primed state, its progeny will eventually “forget”. On a long time scale, such as expanding individual cells into large clonal populations6, this loss of cellular memory would make priming for reprogramming seem stochastic, but on shorter time scales, such as the few divisions used in our and others’ barcoding experiments, priming will be inherited and seem deterministic. Such intermediate memory time scales have been found in many other systems.32,66,100,101 Performing similar clone-splitting experiments at multiple time points before and after the induction of reprogramming would help identify short-term and long-term factors affecting reprogramming efficiency.
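The intermediate-memory scenario (case 3) can be illustrated with a minimal two-state sketch. This is not a model fitted in this study: the per-division loss and gain probabilities (`p_loss`, `p_gain`) and the function name are hypothetical, chosen only to show how the same lineage can look heritable over a few divisions and nearly random over many.

```python
def primed_fraction_after(n_divisions, p_loss, p_gain):
    """Probability that a descendant is primed n divisions after a primed ancestor.

    Assumes a two-state Markov chain along each lineage (an illustrative
    assumption): primed -> nonprimed with probability p_loss per division,
    nonprimed -> primed with probability p_gain per division.
    """
    # Long-run fraction of primed cells in the population.
    steady_state = p_gain / (p_gain + p_loss)
    # Memory of the ancestor's state decays geometrically with each division.
    memory = (1.0 - p_loss - p_gain) ** n_divisions
    return steady_state + (1.0 - steady_state) * memory


# Hypothetical rates: priming is rare and lost slowly.
P_LOSS, P_GAIN = 0.1, 0.001
baseline = P_GAIN / (P_GAIN + P_LOSS)

for n in (0, 4, 40):
    fraction = primed_fraction_after(n, P_LOSS, P_GAIN)
    print(f"{n:>2} divisions: P(primed | primed ancestor) = {fraction:.3f} "
          f"(population baseline {baseline:.3f})")
```

With these illustrative rates, descendants a few divisions away from a primed ancestor remain far more likely to be primed than a random cell, whereas after many divisions the probability approaches the population baseline.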

Given that there is a molecular distinction between cells that were primed to reprogram versus those that were not, it was conceivable that smaller differences even between individual primed cells could propagate to differences between the resultant iPSCs. With the advent of barcoding technologies, this form of clonal memory has revealed itself in a number of systems.34,36,72,79,101 In the reprogramming system we analyzed here, we found little evidence for clonal memory in the final state, with iPSCs seeming largely the same even when we knew there were differences in the initial state (e.g., LSD1i-dependent and -independent primed cells formed indistinguishable iPSCs). It could be that measurement techniques based solely on RNA cannot reveal the differences driven by clonal memory. Alternatively, it could be that iPSCs represent an attractor state that erases the molecular history of the cell.

STAR Methods

RESOURCE AVAILABILITY

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Arjun Raj (arjunrajlab@gmail.com).

Materials availability

Materials generated in this study are available from the lead contact upon request. There are no restrictions to the availability of these materials.

Data and code availability

All raw and processed data have been deposited on Dropbox and are publicly available as of the date of publication. URLs are listed in the Key Resources Table. All bulk RNA-sequencing data and single-cell RNA-sequencing data have been deposited at GEO and are publicly available as of the date of publication. Accession numbers are listed in the Key Resources Table. All genomic DNA barcode sequencing data have been deposited on Figshare and are publicly available as of the date of publication. URLs are listed in the Key Resources Table.

Key resources table.
REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
Mouse Anti-Human TRA-1-60 Alexa Fluor 488 Antibody Invitrogen CAT# A25618; RRID: AB_2885001
Rabbit Anti-Human SPP1 Antibody Proteintech CAT# 22952-1-AP, RRID: AB_2783651
Rabbit Anti-Human LSD1 Antibody Cell Signaling Technology CAT# 2184S, RRID: AB_2070132
Rabbit Anti-Human Beta Actin Antibody Cell Signaling Technology CAT# 4970S, RRID: AB_2223172
Rabbit Anti-Human Histone H3 Antibody Abcam CAT# AB1791, RRID: AB_302613
Goat Anti-Rabbit HRP Antibody Bio-Rad CAT# 1706515, RRID: AB_11125142
Chemicals, peptides, and recombinant proteins
rhFGF-basic Promega CAT# G5071
Polybrene Millipore-Sigma CAT# TR-1003-G
Cell Trace Yellow Invitrogen CAT# C34567
ROCK inhibitor Y26632 Calbiochem CAT# 688001
LSD1 inhibitor RN-1 Millipore-Sigma CAT# 489479
DOT1L inhibitor Selleck Chemicals CAT# S7062
Critical commercial assays
QIAamp DNA Mini Kit Qiagen CAT# 51304
NEBNext Poly(A) mRNA Magnetic Isolation Module NEB CAT# E7490L
NEBNext Ultra II RNA Library Prep Kit for Illumina NEB CAT# E7770L
NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) oligos NEB CAT# E7600S
Chromium Next GEM Single Cell 3’ HT Kit v3.1 10x Genomics CAT# 1000370
Illumina NextSeq 500/550 75 Cycle High-Output Kit Illumina CAT# 20024906
Illumina NextSeq 500/550 150 Cycle Mid-Output Kit Illumina CAT# 20024904
Illumina NextSeq 1000/2000 P3 100 Cycle Kit Illumina CAT# 20040559
Illumina NovaSeq 6000 S1 100 Cycle Kit Illumina CAT# 20028319
Vector Red Substrate kit Vector Labs CAT# SK-5100
Deposited data
All Bulk RNA-Sequencing Data This Paper GSE226987, GSE243933
All Single-Cell RNA-Sequencing Data This Paper GSE227151
All Genomic DNA Barcode Sequencing Data This Paper https://figshare.com/projects/Retrospective_identification_of_cell-intrinsic_factors_that_mark_pluripotency_potential_in_rare_somatic_cells/161662
Experimental models: Cell lines
Human: HEK293FT Fisher CAT# R70007, RRID: CVCL_6911
Human: OKSM-inducible, iPSC-derived hiF-T fibroblasts Cacchiarelli et al., 201537 N/A
Mouse: OKSM-inducible, ESC-derived embryonic fibroblasts Stadtfeld et al., 2010102 N/A
Mouse: CF-1 irradiated feeder embryonic fibroblasts Fisher CAT# A34181
Oligonucleotides
Single-Molecule RNA FISH Probe Sets See Table S1 N/A
Primers for Amplification of gDNA Barcodes Emert et al., 202132 N/A
Primers for Amplification of 10X Barcodes Goyal et al., 202336 N/A
Hybridization Chain Reaction B1 Alexa Fluor 647 Amplifier Hairpins Molecular Instruments N/A
Hybridization Chain Reaction Custom Probe Sets for RNA Barcodes Molecular Instruments N/A
Recombinant DNA
Cell Clone Barcode Library Plasmids Emert et al., 202132 N/A
CRISPR Knockdown Constructs See Table S2 N/A
pPAX2 Trono Lab (unpublished) Addgene Plasmid #12260
pVSV.G Reya et al., 2003104 Addgene Plasmid #14888
pLentiCRISPRv2-blast Babu Lab (unpublished) Addgene Plasmid #83480
pLentiCRISPRv2-GFP Walter et al., 2017108 Addgene Plasmid #82416
pMD2.G Trono Lab (unpublished) Addgene Plasmid #12259
pMDLg Dull et al., 1998109 Addgene Plasmid #12251
pRSV-Rev Dull et al., 1998109 Addgene Plasmid #12253
Software and algorithms
STAR Dobin et al., 2013115 N/A
HTSeq Anders et al., 2015116 N/A
kallisto Bray et al., 2016117 N/A
ChEA3 Keenan et al., 201951 N/A
HOMER Heinz et al., 2010120 N/A
JASPAR Castro-Mondragon et al., 2022119 N/A
IGV Robinson et al., 2011121 N/A
DESeq2 Love et al., 2014118 N/A
Image Analysis Pipeline via rajlabimagetools Repository Raj et al., 2008105 N/A
Barcode Analysis Pipeline via timemachine Repository Emert et al., 202132 N/A
Barcode Analysis Pipeline via FateMap_Goyal2023 Repository Goyal et al., 202336 N/A
All Raw Data Used to Produce Figures This Paper https://www.dropbox.com/sh/ulu6728tcp49dv2/AAAPwLYQiVLloH_JL38lvTj6a?dl=0, https://www.dropbox.com/sh/zz958910t4fkj9w/AAAgTVwO5yAKZ1TpSQVfV6Qga?dl=0
All Code Used to Produce Figures This Paper https://doi.org/10.5281/zenodo.7707418

All original code has been deposited at Zenodo and is publicly available as of the date of publication. DOIs are listed in the key resources table.

Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Cell lines and culture conditions

Unless otherwise noted, all cell culture incubations were performed at 37°C, 5% CO2. We tested intermittently for mycoplasma contamination. We cultured hiF-T cells as previously described prior to hiF-T reprogramming experiments.37 Briefly, we expanded hiF-T cells in growth medium on TC plastic dishes coated with Attachment Factor (Fisher #S006100), and split cells 1:3 upon reaching 60%–70% confluency. hiF-T growth medium (GM) is DMEM/F-12 with Glutamax (Life Tech #10585018) + 10% ES-FBS (Life Tech #16141079) + 1X 2-mercaptoethanol (Life Tech #21985023) + 1X NEAA (Invitrogen #11140050) + P/S + 0.5ug/mL puromycin + 16 ng/mL rhFGF-basic (Promega #G5071). When passaging hiF-T cells, we performed dissociation with accutase (Sigma #A6964–100ML), and followed the manufacturer’s instructions.

Primary MEFs (kindly provided by Chris Lengner) were derived from wild-type KH2-OKSM; rosa26:M2rtTA (OKSM; rtTA) mESCs as previously described.102 Briefly, mESCs were injected into mouse E3.5 blastocysts to generate chimeric embryos. MEFs were prepared from E12.5 chimeric embryos and grown in the presence of 2 mg/mL puromycin for 48 hours before being frozen down. We cultured MEF cells as previously described prior to MEF reprogramming experiments.103 Briefly, we expanded MEF cells in growth medium on TC plastic dishes coated with Attachment Factor (Fisher #S006100), and split cells 1:3 upon reaching 60%–70% confluency. MEF growth medium (GM) is DMEM with Glutamax (LifeTech #10566016) + 10% FBS (Gibco #16000044) + P/S. When passaging MEF cells, we performed dissociation with accutase (Sigma #A6964–100ML), and followed the manufacturer’s instructions.

METHOD DETAILS

Reprogramming to pluripotency

We performed hiF-T reprogramming experiments as previously described.37,52 Briefly, we expanded hiF-T cells in hiF-T GM without puromycin for one week. On day −1, we seeded CF-1 irradiated MEFs (Fisher #A34181) on 24-well plates (Corning #353047) coated with Attachment Factor (Fisher #S006100). On day 0, we seeded varying numbers (1–3 × 10^5) of hiF-T cells per 24-well plate. On day 1, we began Yamanaka factor induction by switching media to hiF-T GM with 2 ug/mL doxycycline and without puromycin. On day 3, we switched media to KSR medium (KSRM): DMEM/F-12 with Glutamax (Life Tech #10585018) + 20% Knockout Serum Replacement (Life Tech #10828010) + 1X 2-mercaptoethanol (Life Tech #21985023) + 1X NEAA (Invitrogen #11140050) + P/S + 8 ng/mL rhFGF-basic (Promega #G5071) + 2 ug/mL doxycycline. We changed KSRM daily, and analyzed cells on day 21. We performed ≥2 biological replicates (i.e., different vials of hiF-T cells expanded and reprogrammed on different days with different batches of media) unless otherwise specified for all experiments. For reprogramming experiments with perturbations, we used the LSD1 inhibitor RN-1 (Millipore #489479) at a final concentration of 1 uM and the DOT1L inhibitor pinometostat (Selleck Chemicals #S7062) at a final concentration of 4 uM.

We performed MEF reprogramming experiments as previously described,103 but at 37°C with atmospheric O2 instead of 37°C with 5% O2. Briefly, we expanded MEF cells in MEF GM for one week. On day −1, we seeded CF-1 irradiated MEFs (Fisher #A34181) on 24-well plates (Corning #353047) coated with Attachment Factor (Fisher #S006100). On day 0, we seeded varying numbers (0.5–2 × 10^5) of MEF cells per 24-well plate. On day 1, we began Yamanaka factor induction by switching media to ESGRO-2i media (Sigma-Aldrich #SF016–200) with 2 ug/mL doxycycline. We changed ESGRO-2i media daily, and analyzed cells on day 14. We performed ≥2 biological replicates (i.e., different vials of MEF cells expanded and reprogrammed on different days with different batches of media) unless otherwise specified for all experiments.

Alkaline phosphatase staining with colorimetry

We used the Vector Red Substrate kit (Vector Labs #SK-5100) to stain hiF-T-iPSC colonies after fixation on day 21 of reprogramming experiments. We fixed wells in 24-well format using 3.7% formaldehyde for 3 min, and followed the manufacturer’s instructions.

Embryoid body formation from hiF-T-iPSC colonies

After forming hiF-T-iPSC colonies following 21 days of OKSM induction, we evaluated differentiation of hiF-T-iPSCs into embryoid bodies. We dissociated hiF-T-iPSCs from MEF feeder cells with accutase (Sigma #A6964–100ML) for 2–10 min at 37°C before adding embryoid body (EB) media: DMEM/F-12 with Glutamax (Life Tech #10585018) + 20% ES-FBS (Life Tech #16141079) + 1X 2-mercaptoethanol (Life Tech #21985023) + 1X NEAA (Invitrogen #11140050) + P/S. We dislodged colonies and mechanically broke up colonies by pipetting up and down to form a cell suspension. We performed two washes with centrifugation at 1200 rpm for 2 min and resuspension in 5 mL of EB media. Because MEF feeders and hiF-T-iPSCs both dissociated with accutase, we briefly placed the cell suspension on 10 cm dishes coated with Attachment Factor (Fisher #S006100) for 30 min at 37°C to enable preferential adhering of the MEF feeders but not hiF-T-iPSCs. The hiF-T-iPSCs largely remained in suspension, which we collected. We plated 1 mL of final cell suspension into each well of an ultra-low attachment surface 6-well plate (Corning #3471) and allowed the hiF-T-iPSCs to aggregate and differentiate into embryoid bodies over 14 days. To help with cell survival, we incubated cells in 10uM of ROCK inhibitor Y26632 (Calbiochem #688001) for the first 2 days. We replaced the media every 2 days by collecting aggregates into a 15 mL conical, incubating for 15–30 min at 37°C to allow the aggregates to settle at the bottom, and carefully replacing the media. On day 14, we collected barcoded embryoid bodies for DNA-sequencing of clone barcodes and plated a small amount on normal 6-well plates to enable adhering and differentiation of embryoid bodies into different cell types, which we confirmed by light microscopy imaging.

Clone barcode library lentivirus generation

Barcode libraries were constructed as previously described.32,36,73,80 Full protocol available at https://www.protocols.io/view/barcode-plasmid-library-cloning-4hggt3w. Briefly, we modified the LRG2.1T plasmid (kindly provided by Junwei Shi) by removing the U6 promoter and single guide RNA scaffold and inserting a spacer sequence flanked by EcoRV restriction sites just after the stop codon of GFP. We digested this vector backbone with EcoRV (NEB #R3195S) and gel purified the resulting linearized vector. We ordered PAGE-purified ultramer oligonucleotides (IDT) containing 30 nucleotides homologous to the vector insertion site surrounding 100 nucleotides with a repeating “WSN” pattern (W = A or T, S = G or C, N = any) and used Gibson assembly followed by column purification to combine the linearized vector and barcode oligo insert. We performed electroporations of the column-purified plasmid into Endura electrocompetent E. coli cells (Lucigen #60242–1) using a Gene Pulser Xcell (Bio-Rad #1652662), allowing for recovery before plating serial dilutions and seeding cultures (200 mL each) for maxipreparation. We incubated these cultures on a shaker at 225 rpm and 32°C for 12–14 h, after which we pelleted cultures by centrifugation and used the EndoFree Plasmid Maxi Kit (Qiagen #12362) to isolate plasmid according to the manufacturer’s protocol, sometimes freezing pellets at −20°C for several days before isolating plasmid. Barcode insertion was verified by polymerase chain reaction (PCR) from colonies from plated serial dilutions. We pooled the plasmids from the separate cultures in equal amounts by weight before packaging into lentivirus. We estimated our library complexity as described elsewhere.36 Briefly, we sequenced three independent transductions in WM989 A6-G3 melanoma cells and took note of the total and pairwise overlapping extracted barcodes. Using the mark-recapture analysis formula, we estimate our barcode diversity from these three transductions to be between 48.9 and 63.3 million barcodes.
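As an illustration of how pairwise barcode overlap translates into a diversity estimate, the following is a minimal sketch that assumes the simple Lincoln-Petersen form of the mark-recapture estimator (the text above does not specify which form was used). The function names and the toy barcode sets are hypothetical and are not data from this study.

```python
from itertools import combinations


def lincoln_petersen(n_first, n_second, n_shared):
    """Estimate total population size from two samples and their overlap.

    Assumed form: N = (n_first * n_second) / n_shared.
    """
    if n_shared == 0:
        raise ValueError("No shared barcodes: the estimate is undefined.")
    return n_first * n_second / n_shared


def pairwise_diversity_estimates(barcode_sets):
    """Apply the estimator to every pair of independently sampled barcode sets."""
    estimates = {}
    for (name_a, set_a), (name_b, set_b) in combinations(barcode_sets.items(), 2):
        shared = len(set_a & set_b)
        estimates[(name_a, name_b)] = lincoln_petersen(len(set_a), len(set_b), shared)
    return estimates


# Toy example: three "transductions" drawn from integer-labelled barcodes.
toy_sets = {
    "transduction_1": set(range(0, 100)),
    "transduction_2": set(range(50, 150)),
    "transduction_3": set(range(75, 175)),
}
for pair, estimate in pairwise_diversity_estimates(toy_sets).items():
    print(pair, round(estimate))
```

Three transductions give three pairwise estimates, which is one plausible way a range such as the one reported above could arise; fewer shared barcodes between two samples imply a larger underlying library.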

Lentivirus packaging of clone barcode library

We adapted previously described protocols to package lentivirus.32,36,73,80 We first grew HEK293FT to near confluency (80–95%) in 10 cm plates in DMEM + 10% FBS + P/S. On day −1, we changed the media in HEK293FT cells to DMEM + 10% FBS without antibiotics. For each 10 cm plate, we added 80 μL of polyethylenimine (Polysciences #23966) to 500 μL of Opti-MEM (Fisher #31985062), separately combining 5 μg of pVSV.G104 and 7.5 μg of pPAX2 and 7.35 μg of the barcode plasmid library in 500 μL of Opti-MEM. We incubated both solutions separately at room temperature for 5 min. Then, we mixed both solutions together by vortexing and incubated the combined plasmid-polyethylenimine solution at room temperature for 15 min. We added 1.09 mL of the combined plasmid-polyethylenimine solution dropwise to each 10 cm dish. After 6–7 hours, we aspirated the media from the cells, washed with 1X DPBS, and added fresh hiF-T GM. The next morning, we aspirated the media, and added fresh hiF-T GM. Approximately 9–11 hours later, we transferred the virus-laden media to an empty, sterile 50 mL conical tube and stored it at 4°C, and added fresh hiF-T GM to each plate. We continued to collect the virus-laden media every 9–11 hours for the next 30 hours in the same 50 mL conical tube, and stored the collected media at 4°C. Upon final collection, we filtered the virus-laden media through a 0.45 μm PES filter (Millipore-Sigma #SE1M003M00) and stored 1.5 mL aliquots in cryovials at −80°C.

Transduction with lentiviral clone barcode library

To transduce hiF-T cells, we freshly thawed virus-laden media on ice, added it to dissociated cells with 4ug/mL of polybrene (Millipore-Sigma #TR-1003-G), and plated 50,000 cells/well in a 6-well plate coated with Attachment Factor (Fisher #S006100). The volume of virus-laden media used was decided by measuring the multiplicity of infection (MOI) with different viral titers. For single-cell RNA-sequencing experiments, we aimed for a low MOI with 10%–25% GFP-positive cells to minimize the fraction of cells with multiple unique barcodes. We found it relatively computationally challenging to differentiate multiple-barcoded cells from doublets introduced by gel beads-in-emulsions. This was not an issue for bulk DNA clone barcode overlap experiments for which we aimed for a high MOI with 60–70% GFP-positive cells. We used 35 uL/well of virus-laden media for low MOI and 80 uL/well for high MOI. After plating hiF-T cells with virus, we performed a 30 min incubation at room temperature before centrifuging the 6-well plate at 930g for 30 min at room temperature. After 24 hours, we passaged the cells from 2 wells onto 10 cm plates or from 6 wells onto 15cm plates. The barcoded cells (GFP-positive) were sorted for all single-cell RNA-sequencing experiments but not for bulk DNA clone barcode overlap experiments.
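The rationale for targeting a low GFP-positive fraction can be made concrete with a short calculation. This sketch assumes that lentiviral integrations per cell follow a Poisson distribution and that the GFP-positive fraction equals the transduced fraction; both are simplifying assumptions introduced here, and the function names are hypothetical.

```python
import math


def moi_from_positive_fraction(positive_fraction):
    """Infer the mean integrations per cell from the GFP-positive fraction.

    Under a Poisson assumption, P(at least one integration) = 1 - exp(-MOI).
    """
    if not 0.0 <= positive_fraction < 1.0:
        raise ValueError("positive_fraction must be in [0, 1)")
    return -math.log(1.0 - positive_fraction)


def multi_barcode_fraction(positive_fraction):
    """Among transduced cells, the expected fraction with two or more integrations."""
    moi = moi_from_positive_fraction(positive_fraction)
    if moi == 0.0:
        return 0.0
    p_zero = math.exp(-moi)
    p_one = moi * p_zero
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


# GFP-positive fractions spanning the low- and high-MOI targets described above.
for fraction in (0.10, 0.25, 0.60, 0.70):
    print(f"GFP+ {fraction:.0%}: MOI ~ {moi_from_positive_fraction(fraction):.2f}, "
          f"multi-barcoded among GFP+ ~ {multi_barcode_fraction(fraction):.1%}")
```

Under these assumptions, the expected share of multiply barcoded cells rises steeply between the low-MOI and high-MOI regimes, which is consistent with reserving high MOI for bulk overlap experiments where multiple barcodes per cell are not a concern.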

DNA-sequencing of clone barcodes from genomic DNA

We prepared clone barcode sequencing libraries from genomic DNA (gDNA) as previously described.32,36,73 Briefly, we isolated gDNA from barcoded cells using the QIAamp DNA Mini Kit (Qiagen #51304) per the manufacturer’s protocol. Extracted gDNA was stored as a pellet at −20°C for days to weeks before the next step. We then performed targeted amplification of the barcode using custom primers containing Illumina adaptor sequences, unique sample indices, variable-length staggered bases, and a “UMI” consisting of 6 random nucleotides (NHNNNN). As reported previously,32 the “UMI” does not uniquely tag barcode DNA molecules, but nevertheless appeared to increase reproducibility and normalize raw read counts. We determined the number of amplification cycles (N) by initially performing a separate quantitative PCR (qPCR) and selecting the number of cycles needed to achieve one-third of the maximum fluorescence intensity for serial dilutions of genomic DNA. The thermocycler (Veriti #4375786) was set to the following settings: 98°C for 30 sec, followed by N cycles of 98°C for 10 sec and then 65°C for 40 sec and, finally, 65°C for 5 min. Upon completion of the PCR reaction, we immediately performed a 0.7X bead purification (Beckman Coulter SPRISelect #B23319), followed by final elution in nuclease-free water. Purified libraries were quantified with a High Sensitivity dsDNA kit (Thermo Fisher #Q33230) on a Qubit Fluorometer (Thermo Fisher #Q33238), pooled, and sequenced on a NextSeq 500 machine (Illumina) using 150 cycles for read 1 and 8 cycles for each index (i5 and i7). The primers used are previously described.32,36
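The cycle-number rule described above (one-third of maximum qPCR fluorescence) can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: it assumes baseline-subtracted fluorescence values with one reading per cycle, and the function name and example curve are hypothetical.

```python
def cycles_for_fraction_of_max(fluorescence, fraction=1.0 / 3.0):
    """Return the first 1-based cycle at which the signal reaches `fraction` of its maximum.

    `fluorescence` is a list of baseline-subtracted readings, one per cycle.
    """
    if not fluorescence:
        raise ValueError("fluorescence must contain at least one reading")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    peak = max(fluorescence)
    if peak <= 0:
        raise ValueError("no amplification signal above baseline")
    threshold = fraction * peak
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle


# Hypothetical amplification curve that plateaus at 30 arbitrary units.
example_curve = [0, 0, 1, 2, 4, 8, 16, 30, 30]
print(cycles_for_fraction_of_max(example_curve))
```

Stopping amplification well below the plateau is a common way to limit over-amplification; how readings from different gDNA dilutions were combined is not specified above and is left out of this sketch.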

Flow sorting of barcoded cells

We used accutase (Sigma #A6964–100ML) to detach the barcoded cells from the plate and subsequently neutralized the accutase with the corresponding media depending on the cell type (hiF-T GM for fibroblasts, KSRM for iPSCs). We then pelleted the cells, performed a wash with 1X DPBS (Invitrogen #14190–136), and resuspended again in 1X DPBS. Cells were sorted on a BD FACSJazz machine (BD Biosciences) or MoFlo Astrios machine (Beckman Coulter), gated for positive GFP signal and singlets. Sorted cells were then centrifuged to remove the supernatant media containing PBS, and either replated with the appropriate cell culture media or prepared for DNA-sequencing or RNA-sequencing.

Single-cell RNA-sequencing

We used the 10x Genomics single-cell RNA-seq kit v3 to sequence barcoded cells. We resuspended the cells (aiming for up to 10,000 cells for recovery per sample) in PBS and followed the protocol for the Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 as per manufacturer directions (10x Genomics). Briefly, we generated gel beads-in-emulsion (GEMs) using the 10x Chromium system, and subsequently extracted and amplified barcoded cDNA as per post-GEM RT-cleanup instructions. We then used a fraction of this amplified cDNA (25%) and proceeded with fragmentation, end-repair, poly A-tailing, adapter ligation, and 10x sample indexing per the manufacturer’s protocol. We quantified libraries using the High Sensitivity dsDNA kit (Thermo Fisher #Q32854) on a Qubit 2.0 Fluorometer (Thermo Fisher #Q32866) and performed Bioanalyzer 2100 (Agilent #G2939BA) analysis prior to sequencing on a NextSeq 500 machine (Illumina) using 28 cycles for read 1, 55 cycles for read 2, and 8 cycles for i7 index.

Clone barcode recovery from single-cell RNA-sequencing libraries

As the clone barcodes are transcribed, we extracted the barcode information from the amplified cDNA from the 10x Genomics V3 chemistry protocol. We ran a PCR side reaction with one primer that targets the 3’ UTR of GFP and one primer that targets a region introduced by the amplification step within the V3 chemistry protocol (“Read 1”). The two primers amplify both the 10x cell-identifying sequence as well as the 100 base pair barcode that we introduced lentivirally. The number of cycles, usually between 12–15, was decided by calculating the cycle threshold value from a qPCR reaction using the NEBNext Q5 Hot Start HiFi PCR Master Mix (NEB #M0543L) for the specified cDNA concentration. The thermocycler (Veriti #4375305) was set to the following settings: 98°C for 30 sec, followed by N cycles of 98°C for 10 sec and then 65°C for 2 min and, finally, 65°C for 5 min. Upon completion of the PCR reaction, we immediately performed a 0.7X bead purification (Beckman Coulter SPRISelect #B23319) followed by final elution in nuclease-free water. Purified libraries were quantified with High Sensitivity dsDNA kit (Thermo Fisher #Q33230) on Qubit Fluorometer (Thermo Fisher #Q33238), pooled, and sequenced on a NextSeq 500 machine (Illumina). We sequenced 26 cycles on Read 1 which gives 10x cell-identifying sequence and UMI, 124 cycles for read 2 which gives the barcode sequence, and 8 cycles for index i7 to demultiplex pooled samples. The primers used are previously described.36

Single-molecule RNA FISH on cells in plates

We performed single-molecule RNA FISH as previously described.105 For the genes used here, we designed complementary oligonucleotide probe sets using custom probe design software in MATLAB and ordered them with a primary amine group on the 3′ end from Biosearch Technologies (see Table S1 for probe sequences). We then pooled each gene’s complementary oligos and coupled the set to Cy3 (GE Healthcare), Alexa Fluor 594 (Life Technologies) or Atto 647N (ATTO-TEC) N-hydroxysuccinimide ester dyes. The cells were fixed as follows: we aspirated media from the plates containing cells, washed the cells once with 1X DPBS, and then incubated the cells in the fixation buffer (3.7% formaldehyde in 1X DPBS) for 10 min at room temperature. We then aspirated the fixation buffer, washed samples twice with 1X DPBS, and added 70% ethanol before storing samples at 4°C. For hybridization of RNA FISH probes, we rinsed samples with wash buffer (10% formamide in 2X SSC) before adding hybridization buffer (10% formamide and 10% dextran sulfate in 2X SSC) with standard concentrations of RNA FISH probes and incubating samples overnight with coverslips, in humidified containers at 37°C. The next morning, we performed two 30 min washes at 37°C with the wash buffer, after which we added 2X SSC with 50 ng/mL of DAPI. We mounted the samples for imaging in 2X SSC. To strip RNA FISH probes to re-hybridize and re-image for additional genes, we incubated samples in stripping buffer (60% formamide in 2X SSC) for 20 min on a hot plate at 37°C, washed samples 3 × 15 min with 1X PBS on a hot plate at 37°C, then returned samples to 2X SSC. After stripping RNA FISH probes, we re-imaged all previous positions and excluded dyes with residual signal from subsequent hybridization.

Detecting clone barcodes in carbon copies in situ

We adapted the Hybridization Chain Reaction (HCR V3.0)106 for barcode RNA FISH as follows. We used 1.2 pmol each of up to 240 barcode RNA FISH probes per 0.3 mL hybridization buffer. Our primary hybridization buffer consisted of 30% formamide, 10% dextran sulfate, 9 mM citric acid pH 6.0, 50 μg/mL heparin, 1X Denhardt’s solution (Life Technologies #750018) and 0.1% Tween-20 in 5X SSC. For primary hybridization, we used 100 μL hybridization buffer per well of a 2 well plate, covered the well with a glass coverslip, and incubated the samples in humidified containers at 37°C for 8 hours. Following the primary probe hybridization, we washed samples 4 × 5 min at 37°C with washing buffer containing 30% formamide, 9 mM citric acid pH 6.0, 50 μg/mL heparin, and 0.1% tween-20 in 5X SSC. We then washed the samples at room temperature 2 × 5 min with 5X SSCT (5X SSC + 0.1% Tween-20) and incubated the samples at room temperature for 30 min in amplification buffer containing 10% dextran sulfate and 0.1% Tween-20 in 5X SSC. During this incubation, we snap-cooled individual HCR hairpins (Molecular Instruments) conjugated to Alexa Fluor 647 (Alexa647) by heating to 95°C for 90 sec then immediately transferring to room temperature to cool for 30 min while concealed from light. After these 30 min, we resuspended and pooled the hairpin in amplification buffer to a final concentration of 6–15 nM each. We added the hairpin solution to samples along with a coverslip and incubated samples at room temperature overnight (12–16 hours) concealed from light. The following morning, we washed samples 5 × 5 min with 5X SSCT containing 50 ng/mL DAPI, added SlowFade antifade solution (Life Technologies #S36940) and a coverslip before proceeding with imaging. To remove fluorescent signal for subsequent rounds of RNA FISH or immunofluorescence, we photobleached samples on the microscope or stripped HCR hairpins as described above for RNA FISH probes.

Imaging RNA FISH and colorimetric dye signal

To image single-molecule RNA FISH, nuclei, and colorimetric dye signal, we used a Nikon TI-E inverted fluorescence microscope equipped with a SOLA SE U-nIR light engine (Lumencor), a Hamamatsu ORCA-Flash 4.0 V3 sCMOS camera, and 4X Plan-Fluor DL 4XF (Nikon #MRH20041/MRH20045), 10X Plan-Fluor 10X/0.30 (Nikon #MRH10101) and 60X Plan-Apo λ (#MRD01605) objectives. We used the following filter sets to acquire signal from different fluorescence channels: 31000v2 (Chroma) for DAPI, 41028 (Chroma) for Atto 488, SP102v1 (Chroma) for Cy3, SP104v2 (Chroma) for Atto 647N, and a custom filter set for Alexa Fluor 594. We tuned the exposure times depending on the dyes used (Cy3, Atto 647N, Alexa Fluor 594, and DAPI). For large scans, we used a Nikon Perfect Focus system to maintain focus across the imaging area. For imaging RNA FISH signal at high magnification (≥60X), we acquired z-stacks of multiple Z-planes and used the maximum intensity projection to visualize the signal.

Flow sorting of cells based on cycling speed

To sort hiF-T cells by cycling speed, we labeled hiF-Ts with CellTrace Yellow (Invitrogen #C34567) at a final concentration of 10 uM. Briefly, we dissociated the cells, centrifuged and resuspended in 1X DPBS for two washes, added CellTrace Yellow to the cells in suspension, and incubated at 37°C for 30 minutes. After incubation, we resuspended in five volumes of media to remove free dye, centrifuged and resuspended in fresh media, and replated onto 10 cm plates. For experiments with barcoded hiF-Ts, we stained hiF-T cells with CellTrace Yellow immediately before performing lentiviral transduction of clone barcodes. Generally, cells were harvested for FACS 3–4 days following labeling. Fast and slow cycling cells were determined by examining the distribution of CellTrace Yellow signal during FACS, corresponding to the dimmest 15% of cells and brightest 15% of cells, respectively.
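The gating logic can be expressed as a short sketch: because the dye is diluted with each division, the dimmest cells are taken as fast cycling and the brightest as slow cycling. In practice the gates were drawn on the sorter; the function name, the rounding of the tail size, and the toy intensities below are assumptions made for illustration.

```python
def gate_by_dye_retention(intensities, tail_fraction=0.15):
    """Split cells into fast- and slow-cycling gates by dye intensity.

    Returns (fast_indices, slow_indices): the dimmest and brightest
    `tail_fraction` of cells, respectively.
    """
    if not 0.0 <= tail_fraction <= 0.5:
        raise ValueError("tail_fraction must be between 0 and 0.5")
    n_cells = len(intensities)
    n_tail = int(round(n_cells * tail_fraction))
    # Rank cell indices from dimmest to brightest.
    ranked = sorted(range(n_cells), key=lambda i: intensities[i])
    fast_indices = ranked[:n_tail]            # most dye dilution, most divisions
    slow_indices = ranked[n_cells - n_tail:]  # least dye dilution, fewest divisions
    return fast_indices, slow_indices


# Toy intensities for 100 cells (arbitrary units).
toy_intensities = [float(i) for i in range(100)]
fast, slow = gate_by_dye_retention(toy_intensities)
print(len(fast), len(slow))
```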

CRISPR knockdown construct lentivirus generation

To knock down expression of several priming markers identified by Rewind, we generated lentiviral CRISPR constructs. The hiF-T cells are sensitive to seeding density and often do not survive at low densities, making clonal bottlenecking difficult. Hence, because we were unable to confirm sample-wide genetic indels via precise selection of knockout clones, we refer to the effect here as knockdown instead of knockout despite using CRISPR/Cas9. To design each CRISPR knockdown construct, we selected 2–4 guides per gene from a genome-wide database designed using previously described optimized metrics,107 generally prioritizing guides with a higher Rule Set 2 score when possible. We ordered pairs of complementary forward and reverse single-stranded oligonucleotides (IDT) for each guide containing compatible overhang sites for insertion into a lentiCRISPRv2-blast (Addgene #83480) or lentiCRISPR-v2-GFP (Addgene #82416) backbone,108 which simultaneously encodes Cas9 and an insertable target guide DNA (gDNA). We resuspended each oligo to 25 μM in NF-H2O and performed phosphorylation and annealing of each oligo pair by combining the oligos with 1X T4 ligase buffer (NEB #B0202S) and polynucleotide kinase (NEB #M0201S) and incubating in a thermocycler (Veriti #4375305) with the following settings: 37°C for 30 min, 95°C for 5 min, and ramping down to 25°C at 5°C/min. Then, we inserted our annealed oligos into the lentiCRISPRv2-blast backbone via Golden Gate assembly; we combined 50 ng of backbone with 25 ng of annealed oligo along with T4 ligase buffer (NEB #B0202S), T4 ligase (NEB #M0202S), and BsmBI restriction enzyme (NEB #R0739S) and incubated in a thermocycler (Veriti #4375305) with the following settings: 50 cycles of 42°C for 5 min and 16°C for 5 min followed by 65°C for 10 min. To grow out plasmid before packaging, we performed heat-shock transformation at 42°C in Stbl3 E. coli cells for each guide before plating on LB plates with ampicillin and incubating at 225 rpm and 37°C for 8–12 hours. Individual colonies were picked for each guide and grown out in 5 mL liquid cultures of LB with ampicillin. Then, we pelleted each liquid culture by centrifugation and used the Monarch Plasmid Miniprep Kit (NEB #T1010L) to isolate plasmid according to the manufacturer’s protocol. Guides were verified by Sanger sequencing of the isolated plasmid.

Lentivirus packaging of CRISPR knockdown constructs

To package sequence-verified plasmids for each guide into lentivirus, we first grew HEK293T to 65%–80% confluency in 6-well plates in DMEM + 10% FBS without antibiotics and on 0.1% gelatin (#ES-006-B). For each individual plasmid, we combined 0.5 μg of pMD2.G, 0.883 μg of pMDLg, 0.333 μg of pRSV-Rev,109 and 1.333 μg of the plasmid in 200 μL of Opti-MEM. After vortexing, we added 9.09 μL of polyethylenimine (Polysciences #23966) and incubated for 15 min at room temperature before adding the final mixture for each plasmid to a HEK293T-containing well of the 6-well plates. After 4–6 hours, we aspirated the media and added fresh hiF-T GM. After 48 hours, we filtered the virus-laden media through a 0.45 μm PES filter (Millipore-Sigma #SE1M003M00) and stored 1.5 mL aliquots in cryovials at −80°C.

Transduction with lentiviral CRISPR knockdown constructs

To transduce hiF-T cells, we freshly thawed virus-laden media on ice, added it to dissociated cells with 4 μg/mL of polybrene (Millipore-Sigma #TR-1003-G), and plated 50,000 cells/well in a 6-well plate coated with Attachment Factor (Fisher #S006100). We used 400 μL/well of virus-laden media, aiming for a high MOI. After plating hiF-T cells with virus, we performed a 30 min incubation at room temperature before centrifuging the 6-well plate at 930g for 30 min at room temperature. After 24 hours, we passaged the cells from 2 wells onto 10 cm plates and began selection in 2.5 μg/mL of blasticidin. This selection was removed after 7 days, which corresponded with when cell death was near complete in control hiF-Ts we cultured in parallel containing no lentiviral constructs. We wanted to minimize the effect of blasticidin on reprogramming, so we cultured the cells for 7 days without blasticidin and without puromycin before reprogramming via OKSM induction.

Immunoblotting analyses of whole-cell lysates

For immunoblotting analysis, we prepared whole-cell lysates without sonication as previously described.110 Briefly, cells were lysed in buffer containing 20 mM Tris, pH 7.5, 137 mM NaCl, 1 mM MgCl2, 1 mM CaCl2, and 1% NP-40 supplemented with 1:100 Halt protease and phosphatase inhibitor cocktail (Thermo Fisher #78430) and benzonase (Novagen #707463) at 12.5 U/mL. The lysates were rotated at 4°C for 30 min and boiled at 95°C in the presence of 1% SDS. The resulting supernatants after centrifugation were quantified by using the Pierce Rapid Gold BCA Protein Assay Kit (Thermo #A53225) and equal amounts were subjected to electrophoresis using NuPAGE 4%–12% Bis-Tris precast gels (Thermo #NP0335BOX). Afterwards, we cut the nitrocellulose membrane into separate strips for each protein being probed, and we used 5% milk in TBS supplemented with 0.1% Tween-20 (TBST) to block the membrane at room temperature for 30 min. Primary antibodies were diluted in 5% milk in TBST and incubated at 4°C overnight. The membrane was washed 3 times with TBST, each for 10 min, followed by incubation of secondary antibodies at room temperature for 1 hour in 5% milk in TBST. The membrane was washed again 3 times and imaged on an Amersham Imager 600 (GE Healthcare). The primary antibodies used were 1:1000 rabbit anti-SPP1 (Proteintech #22952-1-AP), 1:1000 rabbit anti-LSD1 (Cell Signaling Technology #2184S), 1:1000 rabbit anti-beta actin (Cell Signaling Technology #4970S), and 1:2000 rabbit anti-histone H3 (Abcam #AB1791). The secondary antibody used was 1:5000 HRP goat anti-rabbit (Bio-Rad #1706515).

Bulk RNA-sequencing library preparation

We conducted standard bulk, paired-end (37:8:8:38) RNA-sequencing using an RNeasy Micro kit (Qiagen #74004) for RNA extraction, NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB #E7490L), NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB #E7770L), NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) (NEB #E7600S), and an Illumina NextSeq 550 75-cycle high-output kit (Illumina #20024906), as previously described.52

QUANTIFICATION AND STATISTICAL ANALYSIS

Analyses of sequenced clone barcodes from genomic DNA

The barcode libraries from genomic DNA-sequencing data were analyzed as previously described,32 with the custom barcode analysis pipeline available at https://github.com/arjunrajlaboratory/timemachine. Briefly, this pipeline searches for barcode sequences that satisfy a minimum phred score and a minimum length. We also use STARCODE,111 available at https://github.com/gui11aume/starcode, to merge sequences with Levenshtein distance ≤ 8 and add the counts across collapsed (merged) barcode sequences. To normalize reads for differently indexed samples and correct for sequencing depth, we calculated reads per million (RPM) for each barcode. This normalization was insufficient when comparing iPSCs reprogrammed in OKSM alone versus in OKSM with perturbations to increase overall efficiency (i.e., inhibition of LSD1 or DOT1L). When reprogramming in OKSM with perturbations, a higher fraction of clones form iPSCs, so each individual clone has lower representation, and hence fewer reads, even when correcting for sequencing depth. For our early clone barcode sequencing experiments (Figures 1C, 1E, 5BC, S2AC, S3, S11), we performed an additional correction by assuming extremely large iPSC colonies appearing in both conditions had similar representation in each. We determined the average fold change for the largest 5–10 shared iPSC colonies between each condition and multiplied all RPM values for each barcode in the OKSM with perturbation conditions by this fold change value. Additionally, we made use of two subclones (D8 and F8) of WM989 A6-G3 generated previously36 as spike-in standards (see Table S3) to convert sequencing counts into relative cell numbers (Figures 3A, 4C, 5A, 5F, S8, S10C). We spiked in a known number of cells for each barcoded subclone to each cell pellet before gDNA extraction and sequencing. Then, we used linear regression on the points (0,0), (count_F8, cells_F8), and (count_D8, cells_D8) to get the conversion factor from read counts of all barcodes to cell numbers. We used a minimum cell count (determined either by large colony or spike-in normalization) and log2-fold change between pairs of conditions to annotate clones as condition-dependent or condition-independent. We obtained similar results when using each normalization method (see Figure S11).
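The RPM and spike-in normalizations described above can be sketched as follows. This is an illustrative sketch rather than the published pipeline: the function names are hypothetical, and we assume an ordinary least-squares fit with an intercept through the three points, with the slope taken as the conversion factor.

```python
def reads_per_million(counts):
    """Scale raw barcode read counts so that they sum to one million."""
    total = sum(counts.values())
    return {barcode: 1e6 * n / total for barcode, n in counts.items()}


def fit_spike_in_line(spike_counts, spike_cells):
    """Least-squares line through (0, 0) and the spike-in standards.

    spike_counts: read counts of the spike-in barcodes (e.g. F8 and D8).
    spike_cells: known numbers of spiked-in cells, in the same order.
    Returns (slope, intercept); the slope is in cells per read.
    """
    xs = [0.0] + [float(x) for x in spike_counts]
    ys = [0.0] + [float(y) for y in spike_cells]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def counts_to_cells(counts, cells_per_read):
    """Convert read counts to relative cell numbers with the fitted slope."""
    return {barcode: cells_per_read * n for barcode, n in counts.items()}
```

For example, spike-in standards with 1000 and 2000 reads from 50 and 100 cells give a slope of 0.05 cells per read, so a barcode with 400 reads corresponds to roughly 20 cells.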

Simulation for clone barcode overlap

We adapted a previously described computational model that simulates all steps of our experiments designed to compare barcode overlap in resistant colonies.9 The model simulates cell seeding and infection. Each cell is represented as an independent object. The number of barcoded cells was calculated as number of barcoded cells = number of seeded cells × (1 − e^(−MOI)), where the MOI was estimated for our barcode lentivirus. Barcodes were represented by integer numbers from among 20 million variants of unique barcodes as a conservative estimate from our lentiviral library diversity.32,36 The subset of barcoded cells was assigned barcodes randomly with replacement from this library. The model simulates expanding cells prior to induction of OKSM. Each cell, regardless of barcode status, undergoes a cell division procedure with 2–4 rounds depending on the experimental condition. In each round, a given cell will give rise to a number of progeny sharing the same barcode based on an estimated distribution of cell division in hiF-T cells. The model plates cells onto separate dishes/splits (with number of dishes/splits dependent on the experiment) by randomly assigning each cell an integer. The model simulates the formation of resistant colonies assuming a purely stochastic model of iPSC reprogramming. A defined fraction of cells on each plate form iPSC colonies based on a reprogramming efficiency that was calculated as reprogramming efficiency = number of barcoded iPSC colonies after reprogramming / number of seeded barcoded cells based on experimental observations. Additionally, each cell forming an iPSC colony is subject to probabilistic material loss at different stages of the in silico experiment, including cell culture (5%) and genomic DNA extraction (5%). The output of the model was the number of barcodes shared between different plates, or barcode overlap. This was not corrected for cells having more than one lentiviral barcode due to multiple integrations for a given MOI. We performed 1000 independent simulations to obtain a distribution of barcode overlap values to determine the probability of obtaining our observed barcode overlap from our experiments by random chance. This model was written and executed in R.
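The structure of this simulation can be sketched in Python (the original model was written in R). This is a simplified illustration under stated assumptions, not the authors' code: every cell doubles in each round instead of drawing from the estimated hiF-T division distribution, unbarcoded cells are omitted because they cannot contribute to overlap, and the p-value is taken as the fraction of simulations with overlap at least as large as observed. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_barcode_overlap(n_seeded, moi, n_rounds, n_plates, efficiency,
                             library_size=20_000_000, losses=(0.05, 0.05),
                             rng=None):
    """Count barcodes recovered from iPSC colonies on two or more plates."""
    if rng is None:
        rng = random.Random()
    # Number of barcoded cells = seeded cells x (1 - e^(-MOI)).
    n_barcoded = round(n_seeded * (1.0 - math.exp(-moi)))
    # Barcodes are integers drawn with replacement from the library.
    founders = [rng.randrange(library_size) for _ in range(n_barcoded)]
    # Probability that a colony survives every material-loss step.
    p_retained = 1.0
    for loss in losses:
        p_retained *= 1.0 - loss
    plates_per_barcode = {}
    for barcode in founders:
        # Simplifying assumption: each cell doubles in every round.
        for _ in range(2 ** n_rounds):
            plate = rng.randrange(n_plates)  # random split assignment
            # Purely stochastic reprogramming followed by material loss.
            if rng.random() < efficiency and rng.random() < p_retained:
                plates_per_barcode.setdefault(barcode, set()).add(plate)
    return sum(1 for plates in plates_per_barcode.values() if len(plates) >= 2)


def empirical_p_value(observed_overlap, simulated_overlaps):
    """Fraction of simulations with overlap at least as large as observed."""
    hits = sum(1 for s in simulated_overlaps if s >= observed_overlap)
    return hits / len(simulated_overlaps)
```

Running `simulate_barcode_overlap` 1000 times with independent random states and passing the results to `empirical_p_value` mirrors the comparison against the experimentally observed overlap.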

Analyses of expression data from single-cell RNA-sequencing

We adapted cellranger v3.0.2 (10x Genomics) into our custom pipeline (https://github.com/arjunrajlaboratory/10xCellranger) to map and align the reads from the NextSeq sequencing run. Briefly, we downloaded the BCL counts and used cellranger mkfastq to demultiplex raw base call files into library-specific FASTQ files. We aligned the FASTQ files to the hg38 human reference genome and extracted gene expression count matrices using cellranger count, while also filtering and correcting cell identifiers and unique molecular identifiers (UMI) with default settings. We then performed the downstream single-cell expression analysis in Seurat v3. Within each experimental sample, we removed genes that were present in fewer than three cells, as well as cells with 200 or fewer genes. We also filtered cells by mitochondrial gene fraction, using a cutoff that depended on the cell type. For non-identically treated samples, we integrated them using scanorama112, which may work better to integrate dissimilar datasets and avoid over-clustering. For samples that were exposed to identical treatment, we normalized using SCTransform113 and integrated the samples according to the Satija lab’s integration workflow (https://satijalab.org/seurat/articles/integration_introduction.html). For each experiment, we used these integrated datasets to generate data dimensionality reductions by principal component analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP), using 50 principal components for UMAP generation. For a majority of analyses, we worked with the principal component space and normalized expression counts. To determine what resolution for clustering to use for each single-cell RNA-sequencing dataset, we used the clustree package in R.114 Briefly, we visualized the relationships between clusters at multiple resolutions (0–1 in steps of 0.1) and chose the highest resolution before individual cluster nodes began to have multiple incoming edges, which indicates overclustering. We used a resolution of 0.45 for our dataset in Figure 1 (also S4, S7), a resolution of 0.3 for our dataset in Figure 4 (also S7), and a resolution of 0.3 for our dataset in Figure 6 (also S12, S13). In several of our single-cell RNA-sequencing datasets, we noticed several outlier clusters with lower numbers of counts and numbers of features usually combined with higher percentages of mitochondrial or ribosomal counts in comparison to the majority of clusters. We decided to remove said clusters from our datasets before performing further analyses. These outlier clusters corresponded to clusters 4, 9, and 10 for our dataset in Figure 1 (also S4, S7) and to clusters 6, 7, and 8 for our dataset in Figure 6 (also S12, S13).

Analyses of clone barcode data from single-cell RNA-sequencing

The barcodes from the side reaction of single-cell cDNA libraries were recovered by developing custom shell, R, and Python scripts, which are all available at this link: https://github.com/arjunrajlaboratory/10XBarcodeMatching. Briefly, we scan through each read searching for sequences complementary to the side reaction library preparation primers, filtering out reads that lack the GFP barcode sequence, have too many repeated nucleotides, or do not meet a phred score cutoff. Since small differences in otherwise identical barcodes can be introduced due to sequencing and/or PCR errors, we merged highly similar barcode sequences using STARCODE,111 available at https://github.com/gui11aume/starcode. For varying lengths of barcodes (30, 40 or 50) depending on the initial distribution of Levenshtein distance of non-merged barcodes, we merged sequences with Levenshtein distance ≤ 8, summed the counts, and kept only the most abundant barcode sequence. For downstream analysis, we filtered out all barcodes that were associated with fewer unique molecular identifiers (UMI) than a conservative minimum cutoff (dependent on sequencing depth). For cases where one 10x cell-identifying sequence was associated with more than one unique barcode, we calculated the fraction of UMIs per barcode for each cell and filtered out all barcodes that did not make up at least 40% of UMIs. Finally, we filtered out all cells that were still associated with more than one unique barcode after these filtering steps. This could result either from multiplets introduced within gel beads-in-emulsions or from the same cell receiving multiple barcodes during lentiviral transduction. We were able to confidently recover barcodes associated with 30–40% of single cells, which were then used in the downstream clone-resolved analysis.
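The final filtering steps can be illustrated with a short sketch. It assumes barcodes have already been merged with STARCODE and tallied as UMI counts per cell; the function name, the input layout, and the default `min_umi` value are hypothetical (the actual cutoff depended on sequencing depth), and the UMI fraction is computed over the barcodes that remain after the minimum-UMI filter.

```python
def assign_barcodes(umi_counts, min_umi=2, min_fraction=0.4):
    """Assign at most one clone barcode to each cell.

    umi_counts: dict mapping cell identifier -> {barcode: UMI count}.
    Returns a dict mapping cell identifier -> barcode for cells that end
    up with exactly one barcode after filtering.
    """
    assigned = {}
    for cell, counts in umi_counts.items():
        # Drop barcodes supported by too few UMIs.
        kept = {bc: n for bc, n in counts.items() if n >= min_umi}
        if len(kept) > 1:
            # Keep only barcodes making up at least min_fraction of UMIs.
            total = sum(kept.values())
            kept = {bc: n for bc, n in kept.items()
                    if n / total >= min_fraction}
        # Cells with zero or several remaining barcodes are discarded.
        if len(kept) == 1:
            assigned[cell] = next(iter(kept))
    return assigned
```

A cell with 9 UMIs for one barcode and 3 for another keeps the first barcode (75% of UMIs), whereas a cell split 5 and 5 retains two barcodes and is discarded.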

Computationally processing fluorescence microscopy images

For colony counting, Nikon-generated ND2 files were stitched and converted from ND2 format to TIFF format within the Nikon NIS-Elements software. The number of colonies within each sample was counted using custom MATLAB code, available at https://github.com/arjunrajlaboratory/blobCounter. To identify cells containing primed clone barcodes by RNA FISH, we used custom MATLAB scripts to stitch, contrast, and compress scan images (scripts available at https://github.com/arjunrajlaboratory/timemachineimageanalysis) then manually reviewed these stitched images. This review yielded positions containing candidate barcode RNA FISH positive cells which we then re-imaged for verification at 60X magnification in multiple Z-planes. If we were uncertain about the fluorescence signal in a candidate cell (e.g., abnormal localization pattern, non-specific signal in multiple channels), we excluded the cell from imaging during subsequent rounds of RNA FISH or immunofluorescence. For quantification of RNA FISH images we used custom MATLAB software available at: https://github.com/arjunrajlaboratory/rajlabimagetools. Briefly, the image analysis pipeline includes manual segmentation of cell boundaries, thresholding of each fluorescence channel in each cell to identify individual RNA FISH spots, and then extraction of spot counts for all channels and cells. Notably, for some genes, we were not able to quantify expression in a few cells because of grossly abnormal or non-specific fluorescence signal (i.e., schmutz) or because we lost a cell during sequential hybridizations. We excluded data from these cells from analyses and as a result, some plots may contain slightly different numbers of points for different genes. For quantification of cell numbers for determining proliferation rate, we used custom MATLAB software available at https://github.com/arjunrajlaboratory/colonycounting_v2. Briefly, the image analysis pipeline involves stitching the tiled DAPI images, identifying individual cells based on DAPI signal, and then extracting cell counts from the entire well.

Bulk RNA-sequencing analyses

As previously described, we aligned RNA-seq reads to the human genome (hg38) with STAR,115 counted uniquely mapping reads with HTSeq,116 and output a counts matrix. The counts matrix was used to obtain transcripts per million (TPM) and other normalized values for each gene using scripts provided at: https://github.com/arjunrajlaboratory/RajLabSeqTools/tree/master/LocalComputerScripts. For our bulk RNA-sequencing data for our CRISPR knockdown lines, we aligned reads to the human genome (assembly hg38) using kallisto117 and generated count tables with uniquely mapped reads using scripts provided at: https://github.com/arjunrajlaboratory/KallistoSleuth. We performed differential expression analysis in R using DESeq2118 with data from at least 2 biological replicates for each sample and condition.

Transcription factor overrepresentation and binding motif analysis

We used ChEA351 to identify possible upstream regulators of a subset of the positive and negative priming markers. ChEA3 integrates data about associations between transcription factors and target genes from multiple assays and genomic analyses. We used the jsonlite package in R to access the ChEA3 application programming interface (API) to submit queries to perform transcription factor target overrepresentation analysis. We submitted separate queries for the best 50 negative priming markers (i.e., lowest log2(fold change) for primed cells / nonprimed cells) and for the best 50 positive priming markers (i.e., highest log2(fold change) for primed cells / nonprimed cells). We sorted the output from each query by integrated mean rank across each of ChEA3’s databases. For several putative upstream regulators for the negative priming markers (TWIST2, PRRX2, OSR1, and SNAI2), we extracted binding motifs from the JASPAR database119 using the monaLisa package in R. Then, we used the scanMotifGenomeWide.pl command within HOMER120 to look for instances of each motif across the genome and output a text file with the results. To simultaneously visualize binding motifs for each of these transcription factors upstream of SPP1 and FTH1, we combined the text files for each motif into a single BED file in R and uploaded it as a custom track in the Integrative Genomics Viewer.121

Calculating odds ratios for priming given expression of priming markers

To measure the relative explanatory power of cycling speed and a subset of our identified priming markers, we calculated log(odds ratios). For cycling speed (dichotomous variable), we asked: if a cell is in a given cycling speed group, what are its odds of being primed? The odds ratio for each cycling speed was calculated with the corresponding standard error using the single-cell RNA-sequencing dataset generated in Figure 3A. For each priming marker (continuous variable), we asked: if a cell has high expression (in the top 10% of expression values per cell) of a given marker, what are its odds of being primed? Briefly, we built 2 × 2 contingency tables in each case and calculated the odds ratio as (odds of being primed with high expression of gene x) / (odds of being primed without high expression of gene x), then took its logarithm to obtain the log(odds ratio). The standard error of the log(odds ratio) was approximated by the square root of the sum of the reciprocals of the four frequencies.122 The odds ratios for each marker were calculated with corresponding standard error separately in n = 3 biologically independent single-cell RNA-sequencing datasets identifying primed and nonprimed cells in the initial population and aggregated using a random-effects model using the metafor package in R. Additionally, we generated logistic regression models to evaluate the contributions of cycling speed, expression of our identified priming markers, and the interactions between these terms in predicting priming. For each priming marker, we generated a separate logistic regression model using the glm command in R and generated a p-value for each term coefficient determined by the likelihood ratio test. For cycling speed as well as each priming marker individually, we generated single-term logistic regression models (i.e., priming = B0 + B1*cycling speed or priming = B0 + B1*gene expression) and calculated R2 values to estimate the percent variation in priming explained by each priming marker or by cycling speed alone.
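The contingency-table calculation can be written out as a minimal sketch. The function name and argument order are hypothetical, the natural logarithm is assumed, and the random-effects aggregation across datasets (performed with metafor in R) is not reproduced here.

```python
import math


def log_odds_ratio(primed_high, nonprimed_high, primed_low, nonprimed_low):
    """Log(odds ratio) and its standard error from a 2 x 2 table.

    The four arguments are cell counts: primed or nonprimed cells with
    ("high") or without ("low") high expression of the marker.
    """
    table = (primed_high, nonprimed_high, primed_low, nonprimed_low)
    if any(n <= 0 for n in table):
        raise ValueError("all four cell counts must be positive")
    odds_high = primed_high / nonprimed_high
    odds_low = primed_low / nonprimed_low
    log_or = math.log(odds_high / odds_low)
    # Standard error: square root of the sum of reciprocal frequencies.
    standard_error = math.sqrt(sum(1.0 / n for n in table))
    return log_or, standard_error
```

For a table with 20 primed and 80 nonprimed high-expressing cells, and 10 primed and 90 nonprimed remaining cells, the odds ratio is 2.25, giving a log(odds ratio) of about 0.81 with a standard error of about 0.42.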

Mixing coefficient calculation and nearest neighbor analyses

We used a quantitative approach to measure the gene expression relatedness of different barcoded clones, as previously described.36 For each pair of barcoded clones, we calculated the nearest neighbors for each cell in the 50-dimensional principal component space. We then classified the neighbors as “self” if the neighbors were from the same barcode clone or “non-self” if they belonged to the other barcode clone. We calculated the mixing coefficient as follows: mixing coefficient = (number of non-self neighbors) / (number of self neighbors). A mixing coefficient of 1 would indicate perfect mixing such that each cell has the same number of self and non-self neighbors. A mixing coefficient of 0 would indicate that there is no mixing and that each cell within a barcoded clone lies far away from the other barcoded clone in the principal component space. All cases in which the calculated mixing coefficient was ≥1 were considered perfect mixing. The higher the mixing coefficient, the higher the transcriptional relatedness of the barcoded clones analyzed. As the number of nearest neighbors depends on the size (number of cells) of a clone, we performed this analysis between cells of similar clone size. We performed this analysis only on clones with at least 4 cells in each split.
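A minimal sketch of the mixing coefficient for one pair of clones follows. The number of neighbors `k`, the use of Euclidean distance, and the exclusion of each cell from its own neighbor list are assumptions made for illustration; cells are given as coordinate tuples in principal component space.

```python
import math


def mixing_coefficient(clone_a, clone_b, k=3):
    """Ratio of non-self to self nearest neighbors, capped at 1.

    clone_a, clone_b: lists of coordinate tuples, one per cell.
    """
    cells = [(p, 0) for p in clone_a] + [(p, 1) for p in clone_b]
    n_self = 0
    n_non_self = 0
    for i, (point, label) in enumerate(cells):
        # Distances to every other cell in the pair of clones.
        others = [(math.dist(point, q), other_label)
                  for j, (q, other_label) in enumerate(cells) if j != i]
        others.sort(key=lambda item: item[0])
        for _, other_label in others[:k]:
            if other_label == label:
                n_self += 1
            else:
                n_non_self += 1
    if n_self == 0:
        return 1.0  # only non-self neighbors: treated as perfect mixing
    return min(1.0, n_non_self / n_self)
```

Two well-separated clones give a coefficient of 0, while interleaved clones give values at or near 1.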

Cluster probability distribution and Jensen-Shannon distance analyses

To more sensitively measure the gene expression relatedness of different barcoded clones, we calculated Seurat cluster probability distributions and Jensen-Shannon distance as previously described.73 For each clone barcode within a dataset, we found how its associated cells partitioned across Seurat clusters. We then divided the raw number of cells per cluster by the total number of cells found in that cluster within that dataset, then normalized all cluster proportions to sum to 1 to get “probability distributions” for each barcoded clone. To generate the probability distributions we might expect from random chance, we averaged the results from the normalization as above for 1000 random samples of a matched number of barcoded cells, noting that for all barcodes this distribution was approximately uniform. We calculated the Jensen-Shannon distance between our observed barcode probability distribution and the averaged random probability distribution largely as previously described,123 one exception being that we chose to use log base 2 to calculate the Kullback-Leibler divergence so that maximally different samples would have a Jensen-Shannon distance of 1. To determine the significance of our calculated Jensen-Shannon distance, we took another 1000 random samples of a matched number of cells, and for the probability distribution associated with each random sample we calculated the Jensen-Shannon distance from the averaged random probability distribution. We performed this analysis only on clones with at least 4 cells in each split.
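The normalization and distance calculation above can be sketched as follows, using log base 2 so that the distance is bounded by 1. Function names are hypothetical, and the random-sampling step used to build the null distribution is omitted.

```python
import math


def cluster_probability_distribution(clone_counts, cluster_sizes):
    """Normalize a clone's per-cluster cell counts by cluster size.

    clone_counts: cells of one barcoded clone found in each cluster.
    cluster_sizes: total cells in each cluster (same order, all > 0).
    """
    scaled = [c / s for c, s in zip(clone_counts, cluster_sizes)]
    total = sum(scaled)
    return [x / total for x in scaled]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 add zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

Identical distributions give a distance of 0, and distributions with no shared clusters give a distance of 1.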

Supplementary Material

Table S1: Sequences of smFISH probes used here

Table S2: Sequences of CRISPR knockdown guides used here

Table S3: Single-cell cloned barcoded WM989 A6-G3 melanoma lines used as spike-in standard for gDNA sequencing

Table S4: Genes differentially expressed for each Rewind replicate individually and in aggregate

Table S5: Genes differentially expressed between primed and nonprimed cells across cycling speeds

Table S6: Loadings for each principal component for visualizing samples aggregated by cycling speed and priming status

Highlights

  • Reprogramming is successful in a rare, transient subset of intrinsically “primed” cells

  • Primed cells are marked by increased cell cycle speed and decreased fibroblast activation

  • Priming is dynamic as individual cells may move into or out of the primed state over time

  • Certain drugs increase reprogramming efficiency by enlarging the subset of primed clones

Acknowledgements

We thank members of the Arjun Raj Lab, particularly Lauren Beck, Phil Burnham, Lee Richman, Yael Heyman, Karun Kiani, Eric Sanford, Allison Cote, and Cat Triandafillou for insightful discussions related to this work. We thank Blake Caldwell of the Marisa Bartolomei lab, Jingchao Zhang of the Ken Zaret lab, Chris Lengner, as well as Wenli Yang and Rachel Truitt of the Penn iPSC core for advice on iPSC handling and reprogramming. We thank Daniel Xu of the Shelley Berger lab for assistance with immunoblotting protocols and Felicia Peng of the John Murray lab for assistance with copy editing. We thank the Genomics Facility at the Wistar Institute, especially Sonali Majumdar and Sandy Widura for assistance with single-cell partitioning and addition of 10x cell identifiers. We thank the Flow Cytometry Core Laboratory at the Children’s Hospital of Philadelphia Research Institute for assistance with flow cytometry and fluorescence-activated sorting. Finally, NJ thanks his beloved cat Brioche for keeping him company on her favorite chair or in her favorite box during hours of writing and proofreading (and for some unanticipated edits) as well as Felicia Peng for her unwavering and invaluable support.

NJ acknowledges support from the Michael Brown Fellowship, NIH T32 GM007170, and NIH F30 HD103378. YG acknowledges support from the Burroughs Wellcome Fund Career Awards at the Scientific Interface, a grant from Research Catalyst Program from the McCormick School of Engineering at Northwestern University, and Northwestern University’s startup funds. IAM acknowledges support from NIH F30 NS100595. BE acknowledges support from NIH F30 CA236129, NIH T32 GM007170, and NIH T32 HG000046. CLJ acknowledges support from NIH F30 HG010822, NIH T32 DK007780, and NIH T32 GM007170. AR acknowledges support from TR01 GM137425, NIH R01 CA238237, NIH R01 CA232256, NIH 4DN U01 DK127405, and NSF EFRI EFMA19–33400.

Footnotes

Declaration of Interests

AR receives royalties related to Stellaris RNA FISH probes. All remaining authors declare no competing interests.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Takahashi K, and Yamanaka S (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676.
  • 2. Yamanaka S (2009). A fresh look at iPS cells. Cell 137, 13–17.
  • 3. Malik N, and Rao MS (2013). A review of the methods for human iPSC derivation. Methods Mol. Biol. 997, 23–33.
  • 4. Brouwer M, Zhou H, and Nadif Kasri N (2016). Choices for Induction of Pluripotency: Recent Developments in Human Induced Pluripotent Stem Cell Reprogramming Strategies. Stem Cell Rev Rep 12, 54–72.
  • 5. Hochedlinger K, and Jaenisch R (2015). Induced Pluripotency and Epigenetic Reprogramming. Cold Spring Harb. Perspect. Biol. 7. 10.1101/cshperspect.a019448.
  • 6. Hanna J, Saha K, Pando B, van Zon J, Lengner CJ, Creyghton MP, van Oudenaarden A, and Jaenisch R (2009). Direct cell reprogramming is a stochastic process amenable to acceleration. Nature 462, 595–601.
  • 7. Buganim Y, Faddah DA, Cheng AW, Itskovich E, Markoulaki S, Ganz K, Klemm SL, van Oudenaarden A, and Jaenisch R (2012). Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222.
  • 8. Hanna JH, Saha K, and Jaenisch R (2010). Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues. Cell 143, 508–525.
  • 9. Yunusova AM, Fishman VS, Vasiliev GV, and Battulin NR (2017). Deterministic versus stochastic model of reprogramming: new evidence from cellular barcoding technique. Open Biol. 7. 10.1098/rsob.160311.
  • 10. Pour M, Pilzer I, Rosner R, Smith ZD, Meissner A, and Nachman I (2015). Epigenetic predisposition to reprogramming fates in somatic cells. EMBO Rep. 16, 370–378.
  • 11. Shakiba N, Fahmy A, Jayakumaran G, McGibbon S, David L, Trcka D, Elbaz J, Puri MC, Nagy A, van der Kooy D, et al. (2019). Cell competition during reprogramming gives rise to dominant clones. Science, eaan0925.
  • 12. Teshigawara R, Hirano K, Nagata S, Ainscough J, and Tada T (2015). OCT4 Activity during Conversion of Human Intermediately Reprogrammed Stem Cells to iPS Cells through MET. Development. 10.1242/dev.130344.
  • 13. Fu K, Chronis C, Soufi A, Bonora G, Edwards M, Smale ST, Zaret KS, Plath K, and Pellegrini M (2018). Comparison of reprogramming factor targets reveals both species-specific and conserved mechanisms in early iPSC reprogramming. BMC Genomics 19, 956.
  • 14. Guo S, Zi X, Schulz VP, Cheng J, Zhong M, Koochaki SHJ, Megyola CM, Pan X, Heydari K, Weissman SM, et al. (2014). Nonstochastic reprogramming from a privileged somatic cell state. Cell 156, 649–662.
  • 15. Smith ZD, Nachman I, Regev A, and Meissner A (2010). Dynamic single-cell imaging of direct reprogramming reveals an early specifying event. Nat. Biotechnol. 28, 521–526.
  • 16. Babos KN, Galloway KE, Kisler K, Zitting M, Li Y, Shi Y, Quintino B, Chow RH, Zlokovic BV, and Ichida JK (2019). Mitigating Antagonism between Transcription and Proliferation Allows Near-Deterministic Cellular Reprogramming. Cell Stem Cell. 10.1016/j.stem.2019.08.005.
  • 17. Hu X, Eastman AE, and Guo S (2019). Cell cycle dynamics in the reprogramming of cellular identity. FEBS Lett. 593, 2840–2852.
  • 18. Schwarz BA, Cetinbas M, Clement K, Walsh RM, Cheloufi S, Gu H, Langkabel J, Kamiya A, Schorle H, Meissner A, et al. (2018). Prospective Isolation of Poised iPSC Intermediates Reveals Principles of Cellular Reprogramming. Cell Stem Cell 23, 289–305.e5.
  • 19. Di Stefano B, Collombet S, Jakobsen JS, Wierer M, Sardina JL, Lackner A, Stadhouders R, Segura-Morales C, Francesconi M, Limone F, et al. (2016). C/EBPα creates elite cells for iPSC reprogramming by upregulating Klf4 and increasing the levels of Lsd1 and Brd4. Nat. Cell Biol. 18, 371–381.
  • 20. Li R, Liang J, Ni S, Zhou T, Qing X, Li H, He W, Chen J, Li F, Zhuang Q, et al. (2010). A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell 7, 51–63.
  • 21. Liu X, Sun H, Qi J, Wang L, He S, Liu J, Feng C, Chen C, Li W, Guo Y, et al. (2013). Sequential introduction of reprogramming factors reveals a time-sensitive requirement for individual factors and a sequential EMT-MET mechanism for optimal reprogramming. Nat. Cell Biol. 15, 829–838.
  • 22. Samavarchi-Tehrani P, Golipour A, David L, Sung H-K, Beyer TA, Datti A, Woltjen K, Nagy A, and Wrana JL (2010). Functional genomics reveals a BMP-driven mesenchymal-to-epithelial transition in the initiation of somatic cell reprogramming. Cell Stem Cell 7, 64–77.
  • 23.Guo L, Lin L, Wang X, Gao M, Cao S, Mai Y, Wu F, Kuang J, Liu H, Yang J, et al. (2019). Resolving Cell Fate Decisions during Somatic Cell Reprogramming by Single-Cell RNA-Seq. Mol. Cell 73, 815–829.e7. [DOI] [PubMed] [Google Scholar]
  • 24.Hussein SMI, Puri MC, Tonge PD, Benevento M, Corso AJ, Clancy JL, Mosbergen R, Li M, Lee D-S, Cloonan N, et al. (2014). Genome-wide characterization of the routes to pluripotency. Nature 516, 198–206. [DOI] [PubMed] [Google Scholar]
  • 25.Becker JS, McCarthy RL, Sidoli S, Donahue G, Kaeding KE, He Z, Lin S, Garcia BA, and Zaret KS (2017). Genomic and Proteomic Resolution of Heterochromatin and Its Restriction of Alternate Fate Genes. Mol. Cell 68, 1023–1037.e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Zviran A, Mor N, Rais Y, Gingold H, Peles S, Chomsky E, Viukov S, Buenrostro JD, Scognamiglio R, Weinberger L, et al. (2018). Deterministic Somatic Cell Reprogramming Involves Continuous Transcriptional Changes Governed by Myc and Epigenetic-Driven Modules. Cell Stem Cell. 10.1016/j.stem.2018.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Polo JM, Anderssen E, Walsh RM, Schwarz BA, Nefzger CM, Lim SM, Borkent M, Apostolou E, Alaei S, Cloutier J, et al. (2012). A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Chronis C, Fiziev P, Papp B, Butz S, Bonora G, Sabri S, Ernst J, and Plath K (2017). Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442–459.e20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Utikal J, Polo JM, Stadtfeld M, Maherali N, Kulalert W, Walsh RM, Khalil A, Rheinwald JG, and Hochedlinger K (2009). Immortalization eliminates a roadblock during cellular reprogramming into iPS cells. Nature 460, 1145–1148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Nemajerova A, Kim SY, Petrenko O, and Moll UM (2012). Two-factor reprogramming of somatic cells to pluripotent stem cells reveals partial functional redundancy of Sox2 and Klf4. Cell Death Differ. 19, 1268–1276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Wakao S, Kitada M, Kuroda Y, Shigemoto T, Matsuse D, Akashi H, Tanimura Y, Tsuchiyama K, Kikuchi T, Goda M, et al. (2011). Multilineage-differentiating stress-enduring (Muse) cells are a primary source of induced pluripotent stem cells in human fibroblasts. Proc. Natl. Acad. Sci. U. S. A. 108, 9875–9880. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Emert BL, Cote CJ, Torre EA, Dardani IP, Jiang CL, Jain N, Shaffer SM, and Raj A (2021). Variability within rare cell states enables multiple paths toward drug resistance. Nat. Biotechnol. 39, 865–876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Biddy BA, Kong W, Kamimoto K, Guo C, Waye SE, Sun T, and Morris SA (2018). Single-cell mapping of lineage and identity in direct reprogramming. Nature 564, 219–224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Umkehrer C, Holstein F, Formenti L, Jude J, Froussios K, Neumann T, Cronin SM, Haas L, Lipp JJ, Burkard TR, et al. (2021). Isolating live cell clones from barcoded populations using CRISPRa-inducible reporters. Nat. Biotechnol. 39, 174–178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Weinreb C, Rodriguez-Fraticelli A, Camargo FD, and Klein AM (2020). Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367. 10.1126/science.aaw3381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Goyal Y, Busch GT, Pillai M, Li J, Boe RH, Grody EI, Chelvanambi M, Dardani IP, Emert B, Bodkin N, et al. (2023). Diverse clonal fates emerge upon drug treatment of homogeneous cancer cells. Nature 620, 651–659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Cacchiarelli D, Trapnell C, Ziller MJ, Soumillon M, Cesana M, Karnik R, Donaghey J, Smith ZD, Ratanasirintrawoot S, Zhang X, et al. (2015). Integrative Analyses of Human Reprogramming Reveal Dynamic Nature of Induced Pluripotency. Cell 162, 412–424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Sheridan SD, Surampudi V, and Rao RR (2012). Analysis of embryoid bodies derived from human induced pluripotent stem cells as a means to assess pluripotency. Stem Cells Int. 2012, 738910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Eastman AE, and Guo S (2020). The palette of techniques for cell cycle analysis. FEBS Lett. 10.1002/1873-3468.13842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Zhan Z, Song L, Zhang W, Gu H, Cheng H, Zhang Y, Yang Y, Ji G, Feng H, Cheng T, et al. (2019). Absence of cyclin-dependent kinase inhibitor p27 or p18 increases efficiency of iPSC generation without induction of iPSC genomic instability. Cell Death Dis. 10, 271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Li H, Collado M, Villasante A, Strati K, Ortega S, Cañamero M, Blasco MA, and Serrano M (2009). The Ink4/Arf locus is a barrier for iPS cell reprogramming. Nature 460, 1136–1139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Peyser R, MacDonnell S, Gao Y, Cheng L, Kim Y, Kaplan T, Ruan Q, Wei Y, Ni M, Adler C, et al. (2019). Defining the Activated Fibroblast Population in Lung Fibrosis Using Single-Cell Sequencing. Am. J. Respir. Cell Mol. Biol. 61, 74–85. [DOI] [PubMed] [Google Scholar]
  • 43.Layton TB, Williams L, McCann F, Zhang M, Fritzsche M, Colin-York H, Cabrita M, Ng MTH, Feldmann M, Sansom SN, et al. (2020). Cellular census of human fibrosis defines functionally distinct stromal cell types and states. Nat. Commun. 11, 2768. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Hsia L-T, Ashley N, Ouaret D, Wang LM, Wilding J, and Bodmer WF (2016). Myofibroblasts are distinguished from activated skin fibroblasts by the expression of AOC3 and other associated markers. Proc. Natl. Acad. Sci. U. S. A. 113, E2162–E2171. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Sandberg TP, Stuart MPME, Oosting J, Tollenaar RAEM, Sier CFM, and Mesker WE (2019). Increased expression of cancer-associated fibroblast markers at the invasive front and its association with tumor-stroma ratio in colorectal cancer. BMC Cancer 19, 284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Guerrero-Juarez CF, Dedhia PH, Jin S, Ruiz-Vega R, Ma D, Liu Y, Yamaga K, Shestova O, Gay DL, Yang Z, et al. (2019). Single-cell analysis reveals fibroblast heterogeneity and myeloid-derived adipocyte progenitors in murine skin wounds. Nat. Commun. 10, 650. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Walker EJ, Heydet D, Veldre T, and Ghildyal R (2019). Transcriptomic changes during TGF-β-mediated differentiation of airway fibroblasts to myofibroblasts. Sci. Rep. 9, 20377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Tang R, Wang Y-C, Mei X, Shi N, Sun C, Ran R, Zhang G, Li W, Staveley-O’Carroll KF, Li G, et al. (2020). LncRNA GAS5 attenuates fibroblast activation through inhibiting Smad3 signaling. Am. J. Physiol. Cell Physiol. 319, C105–C115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Radwanska A, Cottage CT, Piras A, Overed-Sayer C, Sihlbom C, Budida R, Wrench C, Connor J, Monkley S, Hazon P, et al. (2022). Increased expression and accumulation of GDF15 in IPF extracellular matrix contribute to fibrosis. JCI Insight 7. 10.1172/jci.insight.153058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Liu L-X, Huang S, Zhang Q-Q, Liu Y, Zhang D-M, Guo X-H, and Han D-W (2009). Insulin-like growth factor binding protein-7 induces activation and transdifferentiation of hepatic stellate cells in vitro. World J. Gastroenterol. 15, 3246–3253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz ML, Utti V, Jagodnik KM, Kropiwnicki E, Wang Z, and Ma’ayan A (2019). ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 47, W212–W224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Mellis IA, Edelstein HI, Truitt R, Goyal Y, Beck LE, Symmons O, Dunagin MC, Linares Saldana RA, Shah PP, Pérez-Bermejo JA, et al. (2021). Responsiveness to perturbations is a hallmark of transcription factors that maintain cell identity in vitro. Cell Syst 12, 885–899.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Tirosh I, Izar B, Prakadan SM, Wadsworth MH 2, Treacy D, Trombetta JJ, Rotem A, Rodman C, Lian C, Murphy G, et al. (2016). Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Schmitt CE, Morales BM, Schmitz EMH, Hawkins JS, Lizama CO, Zape JP, Hsiao EC, and Zovein AC (2017). Fluorescent tagged episomals for stoichiometric induced pluripotent stem cell reprogramming. Stem Cell Res. Ther. 8, 132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Wienken M, Dickmanns A, Nemajerova A, Kramer D, Najafova Z, Weiss M, Karpiuk O, Kassem M, Zhang Y, Lozano G, et al. (2016). MDM2 Associates with Polycomb Repressor Complex 2 and Enhances Stemness-Promoting Chromatin Modifications Independent of p53. Mol. Cell 61, 68–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Wille CK, and Sridharan R (2022). DOT1L inhibition enhances pluripotency beyond acquisition of epithelial identity and without immediate suppression of the somatic transcriptome. Stem Cell Reports 17, 384–396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Rittling SR, and Feng F (1998). Detection of mouse osteopontin by western blotting. Biochem. Biophys. Res. Commun. 250, 287–292. [DOI] [PubMed] [Google Scholar]
  • 58.Rapisarda V, Borghesan M, Miguela V, Encheva V, Snijders AP, Lujambio A, and O’Loghlen A (2017). Integrin Beta 3 Regulates Cellular Senescence by Activating the TGF-β Pathway. Cell Rep. 18, 2480–2493. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Wang Y-W, Lin W-Y, Wu F-J, and Luo C-W (2022). Unveiling the transcriptomic landscape and the potential antagonist feedback mechanisms of TGF-β superfamily signaling module in bone and osteoporosis. Cell Commun. Signal. 20, 190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Yan X, Liu Z, and Chen Y (2009). Regulation of TGF-beta signaling by Smad7. Acta Biochim. Biophys. Sin. 41, 263–272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Samarakoon R, Higgins CE, Higgins SP, and Higgins PJ (2009). TGF-beta1-Induced Expression of the Poor Prognosis SERPINE1/PAI-1 Gene Requires EGFR Signaling: A New Target for Anti-EGFR Therapy. J. Oncol. 2009, 342391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Kramerova I, Kumagai-Cresse C, Ermolova N, Mokhonova E, Marinov M, Capote J, Becerra D, Quattrocelli M, Crosbie RH, Welch E, et al. (2019). Spp1 (osteopontin) promotes TGFβ processing in fibroblasts of dystrophin-deficient muscles through matrix metalloproteinases. Hum. Mol. Genet. 28, 3431–3442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Lenga Y, Koh A, Perera AS, McCulloch CA, Sodek J, and Zohar R (2008). Osteopontin expression is required for myofibroblast differentiation. Circ. Res. 102, 319–327. [DOI] [PubMed] [Google Scholar]
  • 64.Sun H, Liang L, Li Y, Feng C, Li L, Zhang Y, He S, Pei D, Guo Y, and Zheng H (2016). Lysine-specific histone demethylase 1 inhibition promotes reprogramming by facilitating the expression of exogenous transcriptional factors and metabolic switch. Sci. Rep. 6, 30903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Yoshida Y, Takahashi K, Okita K, Ichisaka T, and Yamanaka S (2009). Hypoxia Enhances the Generation of Induced Pluripotent Stem Cells. Cell Stem Cell 5, 237–241. 10.1016/j.stem.2009.08.001. [DOI] [PubMed] [Google Scholar]
  • 66.Shaffer SM, Emert BL, Reyes Hueros RA, Cote C, Harmange G, Schaff DL, Sizemore AE, Gupte R, Torre E, Singh A, et al. (2020). Memory Sequencing Reveals Heritable Single-Cell Gene Expression Programs Associated with Distinct Cellular Behaviors. Cell 182, 947–959.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Hayashi Y, Hsiao EC, Sami S, Lancero M, Schlieve CR, Nguyen T, Yano K, Nagahashi A, Ikeya M, Matsumoto Y, et al. (2016). BMP-SMAD-ID promotes reprogramming to pluripotency by inhibiting p16/INK4A-dependent senescence. Proc. Natl. Acad. Sci. U. S. A. 113, 13057–13062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Onder TT, Kara N, Cherry A, Sinha AU, Zhu N, Bernt KM, Cahan P, Marcarci BO, Unternaehrer J, Gupta PB, et al. (2012). Chromatin-modifying enzymes as modulators of reprogramming. Nature 483, 598–602. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Xing QR, El Farran CA, Gautam P, Chuah YS, Warrier T, Toh CXD, Kang NY, Sugii S, Chang YT, Xu J, et al. (2020). Diversification of reprogramming trajectories revealed by parallel single-cell transcriptome and chromatin accessibility sequencing. Science Advances 6, eaba1190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Nguyen QH, Lukowski SW, Chiu HS, Senabouth A, Bruxner TJC, Christ AN, Palpant NJ, and Powell JE (2018). Single-cell RNA-seq of human induced pluripotent stem cells reveals cellular heterogeneity and cell state transitions between subpopulations. Genome Res. 28, 1053–1066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Yang S, Cho Y, and Jang J (2021). Single cell heterogeneity in human pluripotent stem cells. BMB Rep. 54, 505–515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Richman LP, Goyal Y, Jiang CL, and Raj A (2023). ClonoCluster: A method for using clonal origin to inform transcriptome clustering. Cell Genomics, 100247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Jiang CL, Goyal Y, Jain N, Wang Q, Truitt RE, Coté AJ, Emert B, Mellis IA, Kiani K, Yang W, et al. (2022). Cell type determination for cardiac differentiation occurs soon after seeding of human-induced pluripotent stem cells. Genome Biol. 23, 90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Masaki H, Ishikawa T, Takahashi S, Okumura M, Sakai N, Haga M, Kominami K, Migita H, McDonald F, Shimada F, et al. (2007). Heterogeneity of pluripotent marker gene expression in colonies generated in human iPS cell induction culture. Stem Cell Res. 1, 105–115. [DOI] [PubMed] [Google Scholar]
  • 75.Narsinh KH, Sun N, Sanchez-Freire V, Lee AS, Almeida P, Hu S, Jan T, Wilson KD, Leong D, Rosenberg J, et al. (2011). Single cell transcriptional profiling reveals heterogeneity of human induced pluripotent stem cells. J. Clin. Invest. 121, 1217–1221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Cai J, Chen J, Liu Y, Miura T, Luo Y, Loring JF, Freed WJ, Rao MS, and Zeng X (2006). Assessing self-renewal and differentiation in human embryonic stem cell lines. Stem Cells 24, 516–530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Liu Q, Wang G, Lyu Y, Bai M, Jiapaer Z, Jia W, Han T, Weng R, Yang Y, Yu Y, et al. (2018). The miR-590/Acvr2a/Terf1 Axis Regulates Telomere Elongation and Pluripotency of Mouse iPSCs. Stem Cell Reports 11, 88–101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Alici-Garipcan A, Özçimen B, Süder I, Ülker V, Önder TT, and Özören N (2020). NLRP7 plays a functional role in regulating BMP4 signaling during differentiation of patient-derived trophoblasts. Cell Death Dis. 11, 658. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Tian L, Schreuder J, Zalcenstein D, Tran J, Kocovski N, Su S, Diakumis P, Bahlo M, Sargeant T, Hodgkin PD, et al. (2018). SIS-seq, a molecular “time machine”, connects single cell fate with gene programs. bioRxiv, 2018.08.29.403113. 10.1101/403113. [DOI] [Google Scholar]
  • 80.Torre EA, Arai E, Bayatpour S, Jiang CL, Beck LE, Emert BL, Shaffer SM, Mellis IA, Fane ME, Alicea GM, et al. (2021). Genetic screening for single-cell variability modulators driving therapy resistance. Nat. Genet. 53, 76–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Pillai M, Hojel E, Jolly MK, and Goyal Y (2023). Unraveling non-genetic heterogeneity in cancer with dynamical models and computational tools. Nat Comput Sci 3, 301–313. [DOI] [PubMed] [Google Scholar]
  • 82.Dardani I, Emert BL, Goyal Y, Jiang CL, Kaur A, Lee J, Rouhanifard SH, Alicea GM, Fane ME, Xiao M, et al. (2022). ClampFISH 2.0 enables rapid, scalable amplified RNA detection in situ. Nat. Methods 19, 1403–1410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Schuh L, Saint-Antoine M, Sanford EM, Emert BL, Singh A, Marr C, Raj A, and Goyal Y (2020). Gene Networks with Transcriptional Bursting Recapitulate Rare Transient Coordinated High Expression States in Cancer. Cell Syst 10, 363–378.e12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Chang HH, Hemberg M, Barahona M, Ingber DE, and Huang S (2008). Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453, 544–547. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Mojtahedi M, Skupin A, Zhou J, Castaño IG, Leong-Quong RYY, Chang H, Trachana K, Giuliani A, and Huang S (2016). Cell Fate Decision as High-Dimensional Critical State Transition. PLoS Biol. 14, e2000640. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Baum J, and Duffy HS (2011). Fibroblasts and myofibroblasts: what are we talking about? J. Cardiovasc. Pharmacol. 57, 376–379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Doolin MT, Smith IM, and Stroka KM (2021). Fibroblast to myofibroblast transition is enhanced by increased cell density. Mol. Biol. Cell 32, ar41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Masur SK, Dewal HS, Dinh TT, Erenburg I, and Petridou S (1996). Myofibroblasts differentiate from fibroblasts when plated at low density. Proc. Natl. Acad. Sci. U. S. A. 93, 4219–4223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.López-Antona I, Contreras-Jurado C, Luque-Martín L, Carpintero-Leyva A, González-Méndez P, and Palmero I (2022). Dynamic regulation of myofibroblast phenotype in cellular senescence. Aging Cell 21, e13580. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Hinz B, Phan SH, Thannickal VJ, Prunotto M, Desmoulière A, Varga J, De Wever O, Mareel M, and Gabbiani G (2012). Recent developments in myofibroblast biology: paradigms for connective tissue remodeling. Am. J. Pathol. 180, 1340–1355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Baranyi U, Winter B, Gugerell A, Hegedus B, Brostjan C, Laufer G, and Messner B (2019). Primary Human Fibroblasts in Culture Switch to a Myofibroblast-Like Phenotype Independently of TGF Beta. Cells 8. 10.3390/cells8070721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Pakshir P, Noskovicova N, Lodyga M, Son DO, Schuster R, Goodwin A, Karvonen H, and Hinz B (2020). The myofibroblast at a glance. J. Cell Sci. 133. 10.1242/jcs.227900. [DOI] [PubMed] [Google Scholar]
  • 93.Tanaka N, Kato H, Tsuda H, Sato Y, Muramatsu T, Iguchi A, Nakajima H, Yoshitake A, and Senbonmatsu T (2020). Development of a High-Efficacy Reprogramming Method for Generating Human Induced Pluripotent Stem (iPS) Cells from Pathologic and Senescent Somatic Cells. Int. J. Mol. Sci. 21. 10.3390/ijms21186764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Song G, Pacher M, Balakrishnan A, Yuan Q, Tsay H-C, Yang D, Reetz J, Brandes S, Dai Z, Pützer BM, et al. (2016). Direct Reprogramming of Hepatic Myofibroblasts into Hepatocytes In Vivo Attenuates Liver Fibrosis. Cell Stem Cell 18, 797–808. [DOI] [PubMed] [Google Scholar]
  • 95.Mahmoudi S, Mancini E, Xu L, Moore A, Jahanbani F, Hebestreit K, Srinivasan R, Li X, Devarajan K, Prélot L, et al. (2019). Heterogeneity in old fibroblasts is linked to variability in reprogramming and wound healing. Nature 574, 553–558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Koumas L, Smith TJ, Feldon S, Blumberg N, and Phipps RP (2003). Thy-1 expression in human fibroblast subsets defines myofibroblastic or lipofibroblastic phenotypes. Am. J. Pathol. 163, 1291–1300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Sanders YY, Kumbla P, and Hagood JS (2007). Enhanced myofibroblastic differentiation and survival in Thy-1(−) lung fibroblasts. Am. J. Respir. Cell Mol. Biol. 36, 226–235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Symmons O, and Raj A (2016). What’s Luck Got to Do with It: Single Cells, Multiple Fates, and Biological Nondeterminism. Mol. Cell 62, 788–802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Luria SE, and Delbrück M (1943). Mutations of Bacteria from Virus Sensitivity to Virus Resistance. Genetics 28, 491–511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Shaffer SM, Dunagin MC, Torborg SR, Torre EA, Emert B, Krepler C, Beqiri M, Sproesser K, Brafford PA, Xiao M, et al. (2017). Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546, 431–435. 10.1038/nature22794. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Mold JE, Weissman MH, Ratz M, Hagemann-Jensen M, Hård J, Eriksson C-J, Toosi H, Berghenstråhle J, von Berlin L, Martin M, et al. (2022). Clonally heritable gene expression imparts a layer of diversity within cell types. bioRxiv, 2022.02.14.480352. 10.1101/2022.02.14.480352. [DOI] [PubMed] [Google Scholar]
  • 102.Stadtfeld M, Maherali N, Borkent M, and Hochedlinger K (2010). A reprogrammable mouse strain from gene-targeted embryonic stem cells. Nat. Methods 7, 53–55. 10.1038/nmeth.1409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Caldwell BA, Liu MY, Prasasya RD, Wang T, DeNizio JE, Leu NA, Amoh NYA, Krapp C, Lan Y, Shields EJ, et al. (2021). Functionally distinct roles for TET-oxidized 5-methylcytosine bases in somatic reprogramming to pluripotency. Mol. Cell 81, 859–869.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Reya T, Duncan AW, Ailles L, Domen J, Scherer DC, Willert K, Hintz L, Nusse R, and Weissman IL (2003). A role for Wnt signalling in self-renewal of haematopoietic stem cells. Nature 423, 409–414. [DOI] [PubMed] [Google Scholar]
  • 105.Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, and Tyagi S (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Choi HMT, Schwarzkopf M, Fornace ME, Acharya A, Artavanis G, Stegmaier J, Cunha A, and Pierce NA (2018). Third-generation in situ hybridization chain reaction: multiplexed, quantitative, sensitive, versatile, robust. Development 145. 10.1242/dev.165753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Walter DM, Venancio OS, Buza EL, Tobias JW, Deshpande C, Gudiel AA, Kim-Kiselak C, Cicchini M, Yates TJ, and Feldser DM (2017). Systematic In Vivo Inactivation of Chromatin-Regulating Enzymes Identifies Setd2 as a Potent Tumor Suppressor in Lung Adenocarcinoma. Cancer Res. 77, 1719–1729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Dull T, Zufferey R, Kelly M, Mandel RJ, Nguyen M, Trono D, and Naldini L (1998). A third-generation lentivirus vector with a conditional packaging system. J. Virol. 72, 8463–8471. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Dou Z, Ghosh K, Vizioli MG, Zhu J, Sen P, Wangensteen KJ, Simithy J, Lan Y, Lin Y, Zhou Z, et al. (2017). Cytoplasmic chromatin triggers inflammation in senescence and cancer. Nature 550, 402–406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Zorita E, Cuscó P, and Filion GJ (2015). Starcode: sequence clustering based on all-pairs search. Bioinformatics 31, 1913–1919. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Hie B, Bryson B, and Berger B (2019). Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Hafemeister C, and Satija R (2019). Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Zappia L, and Oshlack A (2018). Clustering trees: a visualization for evaluating clusterings at multiple resolutions. Gigascience 7. 10.1093/gigascience/giy083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Anders S, Pyl PT, and Huber W (2015). HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Bray NL, Pimentel H, Melsted P, and Pachter L (2016). Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527. [DOI] [PubMed] [Google Scholar]
  • 118.Love MI, Huber W, and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Lemma RB, Turchi L, Blanc-Mathieu R, Lucas J, Boddie P, Khan A, Manosalva Pérez N, et al. (2022). JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, and Glass CK (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, and Mesirov JP (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Bland JM, and Altman DG (2000). Statistics notes. The odds ratio. BMJ 320, 1468. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto J-M, et al. (2011). Enterotypes of the human gut microbiome. Nature 473, 174–180. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

1
2

Table S1: Sequences of smFISH probes used here

3

Table S2: Sequences of CRISPR knockdown guides used here

4

Table S3: Single-cell cloned barcoded WM989 A6-G3 melanoma lines used as spike-in standard for gDNA sequencing

5

Table S4: Genes differentially expressed for each Rewind replicate individually and in aggregate

6

Table S5: Genes differentially expressed between primed and nonprimed cells across cycling speeds

7

Table S6: Loadings for each principal component for visualizing samples aggregated by cycling speed and priming status

Data Availability Statement

All raw and processed data have been deposited on Dropbox and are publicly available as of the date of publication. URLs are listed in the Key Resources Table. All bulk RNA-sequencing data and single-cell RNA-sequencing data have been deposited at GEO and are publicly available as of the date of publication. Accession numbers are listed in the Key Resources Table. All genomic DNA barcode sequencing data have been deposited on Figshare and are publicly available as of the date of publication. URLs are listed in the Key Resources Table.

Key resources table.

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
Mouse Anti-Human TRA-1-60 Alexa Fluor 488 Antibody Invitrogen CAT# A25618; RRID: AB_2885001
Rabbit Anti-Human SPP1 Antibody Proteintech CAT# 22952-1-AP, RRID: AB_2783651
Rabbit Anti-Human LSD1 Antibody Cell Signaling Technology CAT# 2184S, RRID: AB_2070132
Rabbit Anti-Human Beta Actin Antibody Cell Signaling Technology CAT# 4970S, RRID: AB_2223172
Rabbit Anti-Human Histone H3 Antibody Abcam CAT# AB1791, RRID: AB_302613
Goat Anti-Rabbit HRP Antibody Bio-Rad CAT# 1706515, RRID: AB_11125142
Chemicals, peptides, and recombinant proteins
rhFGF-basic Promega CAT# G5071
Polybrene Millipore-Sigma CAT# TR-1003-G
Cell Trace Yellow Invitrogen CAT# C34567
ROCK inhibitor Y-27632 Calbiochem CAT# 688001
LSD1 inhibitor RN-1 Millipore-Sigma CAT# 489479
DOT1L inhibitor Selleck Chemicals CAT# S7062
Critical commercial assays
QIAamp DNA Mini Kit Qiagen CAT# 51304
NEBNext Poly(A) mRNA Magnetic Isolation Module NEB CAT# E7490L
NEBNext Ultra II RNA Library Prep Kit for Illumina NEB CAT# E7770L
NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) NEB CAT# E7600S
Chromium Next GEM Single Cell 3’ HT Kit v3.1 10x Genomics CAT# 1000370
Illumina NextSeq 500/550 75 Cycle High-Output Kit Illumina CAT# 20024906
Illumina NextSeq 500/550 150 Cycle Mid-Output Kit Illumina CAT# 20024904
Illumina NextSeq 1000/2000 P3 100 Cycle Kit Illumina CAT# 20040559
Illumina NovaSeq 6000 S1 100 Cycle Kit Illumina CAT# 20028319
Vector Red Substrate kit Vector Labs CAT# SK-5100
Deposited data
All Bulk RNA-Sequencing Data This Paper GSE226987, GSE243933
All Single-Cell RNA-Sequencing Data This Paper GSE227151
All Genomic DNA Barcode Sequencing Data This Paper https://figshare.com/projects/Retrospective_identification_of_cell-intrinsic_factors_that_mark_pluripotency_potential_in_rare_somatic_cells/161662
Experimental models: Cell lines
Human: HEK293FT Fisher CAT# R70007, RRID: CVCL_6911
Human: OKSM-inducible, iPSC-derived hiF-T fibroblasts Cacchiarelli et al., 201537 N/A
Mouse: OKSM-inducible, ESC-derived embryonic fibroblasts Stadtfeld et al., 2010102 N/A
Mouse: CF-1 irradiated feeder embryonic fibroblasts Fisher CAT# A34181
Oligonucleotides
Single-Molecule RNA FISH Probe Sets See Table S1 N/A
Primers for Amplification of gDNA Barcodes Emert et al., 202132 N/A
Primers for Amplification of 10X Barcodes Goyal et al., 202336 N/A
Hybridization Chain Reaction B1 Alexa Fluor 647 Amplifier Hairpins Molecular Instruments N/A
Hybridization Chain Reaction Custom Probe Sets for RNA Barcodes Molecular Instruments N/A
Recombinant DNA
Cell Clone Barcode Library Plasmids Emert et al., 202132 N/A
CRISPR Knockdown Constructs See Table S2 N/A
psPAX2 Trono Lab (unpublished) Addgene Plasmid #12260
pVSV.G Reya et al., 2003104 Addgene Plasmid #14888
pLentiCRISPRv2-blast Babu Lab (unpublished) Addgene Plasmid #83480
pLentiCRISPRv2-GFP Walter et al., 2017108 Addgene Plasmid #82416
pMD2.G Trono Lab (unpublished) Addgene Plasmid #12259
pMDLg Dull et al., 1998109 Addgene Plasmid #12251
pRSV-Rev Dull et al., 1998109 Addgene Plasmid #12253
Software and algorithms
STAR Dobin et al., 2013115 N/A
HTSeq Anders et al., 2015116 N/A
kallisto Bray et al., 2016117 N/A
ChEA3 Keenan et al., 201951 N/A
HOMER Heinz et al., 2010120 N/A
JASPAR Castro-Mondragon et al., 2022119 N/A
IGV Robinson et al., 2011121 N/A
DESeq2 Love et al., 2014118 N/A
Image Analysis Pipeline via rajlabimagetools Repository Raj et al., 2008105 N/A
Barcode Analysis Pipeline via timemachine Repository Emert et al., 202132 N/A
Barcode Analysis Pipeline via FateMap_Goyal2023 Repository Goyal et al., 202336 N/A
All Raw Data Used to Produce Figures This Paper https://www.dropbox.com/sh/ulu6728tcp49dv2/AAAPwLYQiVLloH_JL38lvTj6a?dl=0, https://www.dropbox.com/sh/zz958910t4fkj9w/AAAgTVwO5yAKZ1TpSQVfV6Qga?dl=0
All Code Used to Produce Figures This Paper https://doi.org/10.5281/zenodo.7707418

All original code has been deposited at Zenodo and is publicly available as of the date of publication. DOIs are listed in the Key Resources Table.

Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

RESOURCES