Summary
Direct lineage conversion offers a new strategy for tissue regeneration and disease modeling. Despite recent success in directly reprogramming fibroblasts into various cell types, the precise changes that occur as fibroblasts progressively convert to target cell fates remain unclear. The inherent heterogeneity and asynchronous nature of the reprogramming process renders it difficult to study using bulk genomic techniques. Here, to overcome this limitation, we applied single-cell RNA-seq to analyze global transcriptome changes at early stages of induced cardiomyocyte (iCM) reprogramming1–4. Using unsupervised dimensionality reduction and clustering algorithms, we identified molecularly distinct subpopulations of cells along reprogramming. We also constructed routes of iCM formation, and delineated the relationship between cell proliferation and iCM induction. Further analysis of global gene expression changes during reprogramming revealed an unexpected down-regulation of factors involved in mRNA processing and splicing. Detailed functional analysis of the top candidate splicing factor Ptbp1 revealed that it is a critical barrier to the acquisition of CM-specific splicing patterns in fibroblasts. Concomitantly, Ptbp1 depletion promoted cardiac transcriptome acquisition and increased iCM reprogramming efficiency. Additional quantitative analysis of our dataset revealed a strong correlation between the expression of each reprogramming factor and the progress of individual cells through the reprogramming process, and led to the discovery of novel surface markers for enrichment of iCMs. In summary, our single cell transcriptomics approaches enabled us to reconstruct the reprogramming trajectory and to uncover heretofore unrecognized intermediate cell populations, gene pathways and regulators involved in iCM induction.
Direct cardiac reprogramming that converts scar-forming fibroblasts to iCMs holds promise as a novel approach to replenish lost CMs in diseased hearts1–4. Considerable efforts have been made to improve the efficiency and unravel the underlying mechanism5–15. However, it still remains unknown how conversion of fibroblast to myocyte is achieved without following the conventional CM specification and differentiation. This is partly due to the fact that the starting fibroblasts exhibit largely uncharacterized molecular heterogeneity, and the reprogramming population contains fully-, partially- and unconverted cells. Traditional population-based genome-wide approaches are incapable of resolving such unsynchronized cell-fate-switching process. Therefore, we leveraged the power of single cell transcriptomics to better investigate the Mef2c (M), Gata4 (G) and Tbx5 (T)-mediated iCM reprogramming.
Previous studies indicate that a snapshot of an unsynchronized biological process can capture cells at different stages of the process16. Because emergence of iCMs occurs as early as day 31,11–15, we reasoned that day 3 reprogramming fibroblasts contain a wide spectrum of cells transitioning from fibroblast to iCM fate. We therefore performed single-cell RNA-seq on day 3 M+G+T-infected cardiac fibroblasts (CFs) from 7 independent experiments (design see Extended Data Fig. 1) followed by a series of quality control steps (Methods, Extended Data Fig. 1, Supplementary Table 1-2). Extensive data normalization was performed to correct for technical variations and batch effects (Methods, Extended Data Fig. 1–2). After comparing the entire set of single-cell RNA-seq data to bulk RNA-seq data of endogenous CFs and CMs obtained from parallel experiments, we detected a group of resident or circulating immune or immune-like cells (Extended Data Fig. 3) that were not included in following analyses.
Unsupervised Hierarchical Clustering (HC) and Principle Component Analysis (PCA) on the remaining 454 non-immune cells revealed three gene clusters that account for most variability in the data: CM-, fibroblast-, and cell cycle-related genes (Fig. 1a-b, Extended Data Fig. 4a-c). Based on the expression of cell cycle-related genes, the cells were grouped into cell cycle-active (CCA) and cell cycle-inactive (CCI) populations (Fig. 1a), which was confirmed by the cells’ molecular signature in their proliferation states (Extended Data Fig. 4d-g, Pro/NP, proliferating/non-proliferating). Within CCA and CCI, HC further identified 4 subpopulations based on differential expression of fibroblast vs myocyte genes: Fib, intermediate Fib (iFib), pre-iCM (piCM) and iCM (Fig. 1a). When plotted by PCA or t-distributed stochastic neighbor embedding (tSNE), a stepwise transcriptome shift from Fib to iFib to piCM to iCM was evident (Fig. 1c, Extended Data Fig. 4h-i). We also analyzed the reprogramming process as a continuous transition using SLICER17, an algorithm for inferring nonlinear cellular trajectories (Fig. 1d-e). The trajectory built by SLICER suggested that Fib, iFib, piCM, and iCM form a continuum on the bottom CCI path, representing an iCM reprogramming route. We further calculated pseudotime for each cell on the trajectory by defining a starting Fib cell and measuring the distance of each cell to the starting cell along reprogramming (Fig. 1e). We then examined the distribution of cells along pseudotime by plotting the “free energy” (Max[density] - density) of the trajectory and discovered a peak (lowest density) in piCM (Fig. 1f). These data suggest that piCM is an unstable cell state seeking to settle into a more stable state such as iCM, consistent with the PCA and HC analyses showing that piCMs express both CM and fibroblast markers as an intermediate cell type and our other experimental evidence (Fig. 1a-c, Extended Data Fig. 4j-o). To experimentally test the iCM route, we performed population-based gene expression profiling at reprogramming days 0, 3, 5, 7, 10 and 14 (Fig. 1g-h, Extended Data Fig. 4p-v). PCA constructed a pattern showing an oriented path along reprogramming (Fig. 1g, Extended Data Fig. 4p-s). Expression of the three main gene clusters selected from single cell data showed consistent changes in population data (Fig 1h, Extended Data Fig. 4t-v), supporting the SLICER trajectory.
By analyzing CCA and CCI populations, we found that even though proliferative iCM (CCA iCM) was observed (Fig. 1a-d), iCM and piCM were predominantly CCI (Fig. 1i). We thus designed 4 sets of experiments to address the relationship between cell proliferation and iCM reprogramming by: 1) manipulating the expression of cell cycle-related genes on either lentiviral M/G/T-transduced fibroblasts (Extended Data Fig. 5a-p) or doxycycline (Dox)-inducible MGT-infected fibroblasts (Extended Data Fig. 5q-s); 2) synchronizing cell cycle of starting CFs (Fig. 1j-l); 3) transiently overexpressing large T antigen to accelerate CF proliferation (Extended Data Fig. 5t-z); 4) establishing an immortalized cardiac fibroblast line CF-T prior to initiation of iCM reprogramming (Fig. 1m-p). All four sets of experiments yielded congruent results that decreased proliferation or cell cycle synchronization enhanced iCM reprogramming while increased proliferation suppressed iCM generation.
We next examined the cellular composition of our isolated starting CFs (see methods) and identified five subpopulations (Fig. 2a-b, Extended Data Fig. 6a-i, Supplementary Discussion 1). To delineate how these subpopulations were reprogrammed, we applied HC calculated from the starting CFs to those transduced with M+G+T and determined the correlation of non-myocyte lineage markers expression to reprogramming (Fig. 2c). Expression of both endothelial and epicardial genes was significantly decreased in all cells transduced with M+G+T, irrespective of the reprogramming status. However, fibroblast and myofibroblast/smooth muscle genes were suppressed only in iCM but not in iFib or piCM (Fig. 2c, Extended Data Fig. 6j-k), supported by experimental data tracking protein expression of representative markers along reprogramming (Fig. 2d-f, Extended Data Fig. 6l-m). Thus we conclude that endothelial and epicardial genes can be readily suppressed, while fibroblast and myofibroblast/smooth muscle genes were gradually suppressed along reprogramming. This differential suppression is consistent with the distance among different cardiac cell lineages during development and suggests that the recent (epigenetic) memories might be easier to be erased than the ones gained further back. The progressive suppression of fibroblast markers also indicates a difference between iCM and iPSC reprogramming where early down-regulation of fibroblast markers such as Thy1 is one of the hallmarks and prerequisites for iPSC reprogramming to proceed18.
To understand the molecular cascades underlying iCM induction, we performed nonparametric regression and k-medoid clustering, and identified three major clusters of genes that are significantly related to and show similar trends over reprogramming (Extended Data Fig. 7a-d). Further analysis identified six smaller gene clusters with narrower variation over the trend and GO analyses were performed for each cluster (Fig. 3a-g, Supplementary Table 3, Supplementary Discussion 2). The largest Cluster 1 that shows a trend of immediate and continuous down-regulation of gene expression is enriched in GO terms related to protein translation/biosynthesis, modification and transportation (Fig. 3b). Such changes are probably to balance for increased energy requirements during the cell fate switch and/or to transit from a protein production and secretion factory (a fibroblast) to an energy-consuming power station (a CM). Cluster 2 downregulated genes are enriched in GO terms suggesting a late suppression of fibroblast genes and growth factors, while Cluster 4&5 upregulated genes are enriched in GO terms indicating engagement in metabolic shift and structural changes towards a CM fate (Fig. 3e-f).
Unexpectedly, we observed that Cluster 1 is also enriched in GO terms “mRNA splicing”, “mRNA processing” and “RNA recognition motif”. This finding prompted us to interrogate the role of splicing factor(s) in iCM induction. Thus we used icMEF line19 to screen a shRNA library that targeted 26 splicing factors representing the most common splicing factor families20 and identified Ptbp1 as the top candidate that also showed a differential expression in CFs versus CMs (Extended Data Fig. 7e-h). Importantly, knockdown of Ptbp1 in various primary fibroblasts consistently resulted in a significant increase in reprogramming efficiency (Fig. 3h-i and Extended Data Fig. 8a-p), demonstrating Ptbp1 as a general barrier to iCM induction. Overexpression of Ptbp1 yet has minimal effects (Extended Data Fig. 8q-u). To understand how Ptbp1 silencing led to improved iCM reprogramming, we performed high-depth RNA-seq to analyze alternative splicing (AS) events of day 3 reprogramming cells with or without Ptbp1. A total of 1494 AS events were detected upon Ptbp1 knockdown, 97% of which were not induced by MGT alone (Extended Data Fig. 9a, Supplementary Table 4-5). Importantly, calculation of the difference in percent spliced-in (dPSI) suggested that AS events between reprogramming vs control fibroblasts and endogenous CMs vs CFs were towards opposite direction (negative association, p=0.008). Knockdown of Ptbp1 in reprogramming fibroblasts, however, induced a strong positive association (p=2.2e-16), suggesting that Ptbp1 silencing together with MGT but not MGT alone shifted the splicing pattern from CF towards CM type (Fig. 3j-k). Furthermore, a much higher percentage of exon skipping (ES) events (63%) among the five known AS types was observed in MGT-infected cells upon Ptbp1 silencing (Fig. 3l). Motif analysis using rMAPS showed that a CT-rich Ptbp1 binding motif was significantly enriched in ES exons compared to background exons (Fib. 3m). Interestingly, for exons more included upon Ptbp1 knockdown, the motif was strongly enriched within 100 bp of the upstream intron (red arrows, p<1e-30); for exons more skipped upon Ptbp1 knockdown, the motif was less strongly enriched but showed a broad peak at 50-200 bp in the downstream intron (blue arrows, p<0.05). These data are consistent with the higher % of inclusion (69%) than skipping (31%) among ES events observed in Ptbp1 knockdown samples (Extended Data Fig. 9b), suggesting Ptbp1’s role as a repressor for exon inclusion when bound to upstream intron, and likely as a weaker activator for exon skipping when bound to downstream intron. Next we assessed the GO terms of genes alternatively spliced upon Ptbp1 silencing (Fig. 3n-o, Extended Data Fig. 9c-i). In addition to altering splicing patterns of genes related to CM lineage and function (Fig. 3n), Ptbp1 silencing resulted in splicing pattern changes of 21 other splicing factors, suggesting that Ptbp1 knockdown might trigger a second wave of splicing changes by regulating isoform switching of other splicing factors. To move one step further, we explored the potential downstream effects of Ptbp1-mediated splicing re-patterning (Supplementary Table 6, Supplementary Discussion 3). DESeq2 analyses of differentially expressed genes (DEG) revealed that Ptbp1 knockdown enhanced the MGT-induced CF to CM transcriptome shift by augmenting MGT-mediated changes (Fig. 3p, Extended Data Fig. 9j-n) and altering the expression of an additional set of cardiac and fibroblast lineage genes (Fig. 3q, Extended Data Fig. 9o).
To determine if cardiac reprogramming is a rare and random event or an M, G, T-determined process, we plotted the expression of M, G, and T and M+G+T in each cell against the reprogramming pseudotime of that cell calculated by SLICER (Fig. 4a, Extended Data Fig. 9p). We found that the expression levels of M, G, and T and M+G+T are highly correlated with the reprogramming progress, despite the fact that the expression of MGT was not used in building the trajectory. We also determined the average expression levels of M, G, and T and the average ratio of expression (M/G, M/T, and G/T) in the Fib, iFib, piCM, and iCM cell populations along the reprogramming trajectory (Extended Data Fig. 9q-s). Consistent with our previous studies6,14,21, we observed higher level of M than G and T in iCM, further underscoring the importance of high Mef2c expression in iCM induction.
To unravel the gene networks regulated by reprogramming factors, we navigated the relationship between the expression of a reprogramming factor and its downstream targets in each single cell. Using Tbx5 as an example, we calculated the Spearman correlation between Tbx5 expression and the expression of its downstream targets22,23 within each reprogramming cell (Fig. 4b left). Then we generated a correlation matrix for selected Tbx5 targets to determine their co-expression patterns (Fig. 4b right). The correlation patterns suggest Tbx5 acts through promoting cardiac function-related genes and suppressing protein biosynthesis and non-myocyte lineages (Fig 4b, Extended Data Fig. 9t-u).
Finally, we aimed to discover novel markers for targeting or enriching cell populations during iCM induction. To identify specific markers for each cell population along reprogramming trajectory, we selected genes that were expressed significantly higher (for positive selection marker) or lower (for negative selection marker) in the cell population of interest than the other three populations (Tukey-adjusted p-value < 0.05 in pairwise comparisons post ANOVA, Extended Data Fig. 10a-f, Supplementary Table 7). Negative selection markers for iCM appeared most attractive to supplement for cardiac positive selection markers. Among top 20 negative markers for iCM, we focused on four surface markers Cd200, Clca1, Tm4sf1, and Vcam1 (Fig. 4c). Linear regression suggests that their expression was highly anti-correlated with the reprogramming process, and was barely detectable in iCM (Fig. 4d). Further experimental validation confirmed Cd200 as a negative selection marker (Fig. 4e-f), and knocking down Cd200 didn’t affect reprogramming efficiency (Extended Data Fig. 10g-n).
We have used single cell transcriptomics analysis to gain insights into heterogeneity within an unsynchronized cardiac reprogramming system. The findings hold promise for improving the efficiency and detection of iCM formation. We also anticipate that the experimental and analytical methods presented here, when applied in additional cell programming or reprogramming contexts, will yield crucial insights about cell fate determination and the nature of cell type identity.
METHODS
Mouse strains and plasmids
Transgenic mice of CD1 background that contain αMHC promoter driven-GFP were described previously1. All animal experiments conformed to the NIH guidelines (Guide for the care and use of laboratory animals) and UNC Qian Lab animal protocol #15.277.0. This protocol was approved by the University of North Carolina at Chapel Hill Institutional Animal Care and Use Committee (IACUC) that oversees the university’s animal care and use (NIH/PHS Animal Welfare Assurance Number: A3410-0; USDA Animal Research Facility Registration Number: 55-R-0004; AAALAC Institutional Number: #329). pMXs retroviral vectors containing mouse Gata4, Mef2c, or Tbx5 were described previously1. The empty pMXs and pMXs-puro retroviral vectors were purchased from Cell Biolabs and they harbor a partial lacZ suffer sequence and were thereby referred to as lacZ in this manuscript. pMXs-DsRed and the polycistronic pMXs-puro-MGT were described previously14. pLenti-M/G/T were cloned by replacing the GFP insert in pLenti-GFP-Puro (Addgene #17448) with M, G, or T using BamHI and SalI. pTripZ-rTtA was cloned by removing the tet-on promoter and RFP in the pTripZ vector24 using XbaI and MluI followed by blunt-end ligation. pTripZ-iMGT was constructed by four steps. First, an intermediate plasmid pTripZ-iRFP was cloned to remove the Ubc promoter, rTtA and Puro sequences from the original pTripZ vector, which was achieved by replacing the sequences between MluI and Acc65I with PCR-amplified WPRE. Second, to introduce an AgeI site before Mef2c, the first ~600bp of Mef2c sequence before the BsrGI restriction site was PCR-amplified and cloned into pGEMT-easy (Promega), resulting in pGEMT-AgeI-Mef2c-BsrGI. Third, MGT was excised from pGEMT-MGT14 and inserted into pGEMT-AgeI-Mef2c-BsrGI with BsrGI and SalI, resulting in pGEMT-AgeI-MGT-MluI (there is a MluI site located in the pGEMT-easy vector after SalI). Last, pTripZ-iMGT was cloned by replacing RFP in pTripZ-iRFP with polycistronic MGT excised from the pGEMT vector using AgeI and MluI. For gene overexpression, Ptbp1, Cd200, and cell cycle-related genes (p15, p16, Ccnd1, Ccnd2, and Ccne1) were PCR amplified from neonatal mouse CF cDNA and cloned into the pLenti vector using BamHI and SalI (XbaI and SalI for Ptbp1 and Ccnd2, and BamHI and XhoI for Ccnd1). The control pLenti-lacZ vector was cloned by replacing the GFP insert in pLenti-GFP with the partial lacZ sequence from pMXs-puro using BamHI and SalI. Cloning primers were listed in the Supplementary Table 8. pBabe-Zeo-LargeT was purchased from Addgene (#1779). The non-targeting shNT pLKO.1-scramble plasmid was described previously24 and all other shRNAs (pLKO.1 vector based, MISSION shRNA glycerol stock) were purchased from Sigma and their TRC numbers were listed in the Supplementary Table 8.
Isolation of neonatal CF and CM, and generation of iCM
We chose to reprogram mouse neonatal CFs, which were used in the first1 and many of the follow-up cardiac reprogramming studies2–6,8,12,14–15,21,25. CFs were isolated using standard protocol as described previously21,25. Specifically, neonatal (day 1.5) hearts were isolated from αMHC-GFP+ pups and rinsed thoroughly with chilled phosphate buffer saline (PBS). The hearts were then minced by a razor blade, transferred to 8 ml warm 0.05% Trypsin-EDTA (Gibco), and incubated at 37 °C for 10 min. After 5 rounds of collagenase digestion (5 ml of warm 0.2% collagenase type II in HBSS for 3 min at 37 °C followed by vortexing for 1 min), a single cell suspension was obtained by passing through 40 μm cell strainers. The cells were then suspended in 1 ml of red cell lysis buffer (150 ml NH4Cl, 10 mM KHCO3, and 0.1 mM EDTA) for 1 min on ice and resuspended in magnetic-activated cell sorting buffer (MACS buffer: DPBS/0.5% BSA/2 mM EDTA). To sort Thy1+ cells, approximately 1 × 107 cells were suspended in 90 μl MACS buffer with 10 μl Thy1.2 micro-beads (Miltenyi Biotec) at 4 °C for 30 min. The cells were then washed, suspended in MACS buffer and applied to equilibrated LS column (Miltenyi Biotec). Cells bound to beads were flushed out after two washes and seeded onto 0.1% gelatin-coated plates at 2.5 × 104/cm2 in fibroblast media (IMDM/20% FBS/1 × Pen/Strep). After overnight culturing, media was replaced to remove unattached cells. We refer to the MACS-isolated Thy1+ adherent non-myocytes as neonatal CFs. For bulk RNA-seq experiments, neonatal CFs were similarly isolated except that MACS-isolated Thy1+ cells were directly lysed in TRIzol (Life Technology) without culturing. Neonatal CMs were isolated using the neonatal cardiomyocytes isolation system (Worthington Biochemical Corporation) except that all enzymes were used at a ¼ of the recommended concentration to increase cell viability. After a 1.5 hour pre-plating on uncoated surface to remove attached nonmyocytes, the unattached CMs were collected in TRIzol (> 80% viability by Trypan blue staining).
For iCM generation, pMXs retroviruses were packaged by transfecting platE cells (Cell Biolabs) with lipofectamine 2000 (Life Technology) as previously described14. Viruses collected from one 10 cm dish were resuspended in 100 μl iCM media (10% FBS in DMEM/M199 (4:1)) and added to cells at 5 μl of each virus (if co-transduction)/cm2 of surface area. All transduction was performed in iCM media containing 4 μg/ml of polybrene. For single-cell RNA-seq, CFs were untransduced, transduced with 1: 2 ratio of pMXs-DsRed: pMXs (lacZ), or transduced with equal amount of M, G, T viruses. For microarray experiments, CFs were transduced with the control pMXs-puro (lacZ) or the pMXs-puro-MGT viruses and collected in TRIzol. Day 5, 7, 10 and 14 samples were selected with 2 μg/ml puromycin from day 3 and maintained in 1 μg/ml puromycin from day 6. Day 0 samples were overnight cultured CFs that were collected immediately before viral transduction. For bulk RNA-seq, CFs were transduced with pMXs-puro (lacZ), pMXs-puro-MGT, pMXs-puro-MGT+shNT, or pMXs-puro-MGT+shPtbp1-271 for three days and then collected in TRIzol. All microarray and bulk RNA-seq samples were duplicated.
Capture of single cells, RNA spike-ins, and preparation of cDNA
Single cells were captured using the Fluidigm C1 system (up to 96 single cells/plate). A total of seven individual experiments (E1-E7) were performed starting from mouse breeding, CF isolation, iCM reprogramming, to single cell capture and cDNA preparation (see Extended Data Fig. 1 for experimental design and workflow). Three out of the seven experiments (expt) (E1, E2, and E4) contained only M+G+T-transduced cells. Four out of the seven experiments (E3 and E5-E7) contained cells treated with two different conditions in order to estimate the relative abundance of mouse mRNA between treatments. Specifically, for expt E1, E2, and E4, CFs transduced with M+G+T for 3 days were collected by trypsinization, stained with 7AAD or NearIR live/dead dye (Thermo Fisher Scientific), and FACS-sorted for live cells (negative for the live/dead dye). Pilot experiments showed an average diameter of 12.6 μm and a buoyancy of 7.5 (cells): 2.5 (buoyancy buffer) of CFs. Therefore, the sorted single cell suspension (~2000 cells μl−1) was loaded on a medium-sized (10-17 μm) microfluidic RNA-seq chip (C1 Single-Cell mRNA Seq IFC, cat # 100-6041, initially designed chips used in E1-E3 and redesigned chips used in E4-E7) and single cells were captured with the C1 system. Bright field images were taken for each capture site. For expt E3 and E5-E7, day 3 M+G+T-transduced (E5&E6) or untransduced (E3&E7) CFs were stained with the NearIR live/dead dye and 0.25 - 1 μM Carboxyfluorescein succinimidyl ester (CFSE, Thermo Fisher Scientific), and the DsRed-transduced CF were stained with the NearIR dye only. Then 12,000 CFSE+NearIR− green fluorescent cells and 12,000 DsRed+NearIR− red fluorescent cells were sequentially FACS-sorted into the same tube and mixed. For experiment E3, cell sorting was slightly different that 700 of each of CFSE single positive cells (untransduced), DsRed single positive cells (DsRed-transduced), and double negative cells (from the DsRed-transduced wells but no DsRed protein expression) were sorted into a single cell suspension. After cell capture, fluorescent images of GFP and RFP channels as well as bright field pictures were taken.
Next, control RNA spike-ins were added into lysis Mix A (see Fluidigm’s protocol), which were then loaded onto the IFC plate before cell lysis. Expt E1 and E2 used the Ambion Array Control spike-ins (AM1780) that were included in the SMARTer kit. E1 used only spike 1, 4, and 7 according to Fluidigm’s protocol but at a concentration that is 100 fold higher than suggested, based on recommendations from UNC Advanced Analytics Core (AAC) that provided the Fluidigm service. E2 used all 8 spike-ins contained in the kit at the following working concentrations (before addition to lysis Mix A): 10 pg/μl of spike 1, 1 pg/μl of spike 2, and 10 fold lower for the next spike and so on. For E3, we used the Ambion spike-ins at a half concentration of those used in E2 and another spike-in, the External RNA Controls Consortium (ERCC) RNA spike-in Mix 1 (Ambion, Life Technologies) after an 80,000-fold dilution. For E4-E7, only the ERCC spike-in was used after a 40,000-fold dilution and 1 μl of the diluted working spike-in was mixed with 19 μl of other components to make lysis Mix A. Then cell lysis, reverse transcription, and cDNA pre-amplification were performed on the chip according to Fluidigm’s standard protocol and the control RNA spike-ins were processed in parallel with cellular RNA. Differences in spike-ins added to each experiment reflected how the technology evolved over time during the progress of this project. To address the spike-ins issue, among others, we developed a pipeline described below in the “Processing and normalization of single-cell RNA-seq data” section to normalize and analyze all acquired useful data.
Illumina library preparation and sequencing
After in situ cDNA library preparation, the bright field and fluorescent images of each capture site (“nest”) on the chip were carefully examined. Forty-six empty nests, 30 nests with two or more cells, and 22 nests containing morphologically unhealthy cells out of 672 capture sites on seven chips were excluded for further analysis, resulting in 574 single-cell cDNA libraries. The size distribution and quality of cDNA libraries from each single cell were ensured by bioanalyzer. For E3 only, cDNA library concentrations were measured with picogreen (Thermo Fisher Scientific) and four single-cell cDNA libraries below 1 ng/μl were excluded from further analysis. E1-E4 each contained a negative control from an empty nest that was processed in parallel with other healthy single cells. Therefore a total of 574 high-quality cDNA libraries were submitted to the UNC High Throughput Sequencing Facility (HTSF). Illumina libraries were prepared using the Nextera XT DNA Sample Preparation kit according to Fluidigm’s standard protocol except with 13 cycles of amplification. The barcoded single-cell Illumina libraries of each experiment were pooled and sequenced for 50 base pairs (bp) single-end reads on Illumina HiSeq 2500. Illumina library preparation and sequencing were carried out in three batches: E1 by itself on two lanes, E2 and E3 processed together on one lane each, and E4-E7 processed together on one lane each. Previous studies showed that 0.5-1 million reads per cell were sufficient to detect most genes expressed by single cells26,27. In this study, we sequenced the cells at about 1-5 × 106 reads per cell. Raw reads were re-assigned to each single cell by their unique Nextera barcode and sequencing reads removed of barcodes were received from HTSF in .fastq format.
For microarray and bulk RNA-seq samples, cellular RNA was extracted from TRIzol (microarray samples were further purified with the RNAeasy kit from Qiagen), and only samples with RNA integrity number (RIN) above 8 determined by bioanalyzer were further processed. Microarray samples were submitted to HTSF for one color Cy-dye labeling and long oligo (60 mer) Agilent high density microarrays. Bulk RNA-seq samples were prepared with the TruSeq Stranded mRNA Library Prep Kit (Illumina). The barcoded Illumina libraries were pooled and submitted to HTSF for sequencing. About 6 × 107 100 bp paired-end reads/sample were obtained and sequencing reads removed of Illumina indexes were received from HTSF in .fastq format.
Processing and normalization of single-cell RNA-seq data
The quality of sequencing results was first checked by FASTQC. Reads are in high-quality and no trimming was required. The raw reads were then mapped to the merged genome of mm10, ERCC, and E.coli K12 with Tophat2 using default settings. Information about the number of total reads and the percentages of reads mapped to spike-in or mouse genome for each single cell were detailed in the Supplementary Table 1. Outliers showing high ratios of % reads mapped to spike-in to % reads mapped to mouse genome were removed (Extended Data Fig. 1d). This step removed 61 outliers from the 574 sequenced single cells, resulting in 513 high-quality single cells for analysis (Supplementary Table 2). Gene expression was counted with Htseq-count using the union mode28 (http://www-huber.embl.de/users/anders/HTSeq).
Limit of detection of our single-cell RNA-seq was determined as previously described26. Briefly, the concentration of each ERCC spike-in in the lysis chamber was first calculated. For expt E3, seven of the spike-ins were presented at 1.24 molecules per chamber and are as follows: ERCC-00014, ERCC-00028, ERCC-00039, ERCC-00067, ERCC-00077, ERCC-00143, and ERCC-00150. For expt E4-E7, five of the spike-ins were present at 1.24 molecules per chamber and are as follows: ERCC-00031, ERCC-00033, ERCC-00058, ERCC-00069, and ERCC-00134. The number of non-zero measurements of each spike-in was then counted. This number divided by the total number of high quality cells from that plate is the probability of detection for each spike-in at this concentration. Average probability of detection of all 12 spike-ins is 0.30, consistent with previous findings26 and suggesting single-molecule sensitivity of our experiments.
We developed a three-step normalization strategy in order to extract biological meaningful information from all the single-cell RNA-seq data (Extended Data Figure 1c). Firstly, we normalized mouse gene raw counts to each cell’s technical and biological size factors within each experiment using a previously described method29. These two size factors account for technical variations within each experiment such as amplification efficiency and differences in the amount of biological starting material in each cell. Based on the normalized DsRed counts, cells in experiments that involved two treatments were classified as DsRed-transduced (E3R, E5R, E6R, and E7R, expressing high levels of DsRed), or M+G+T-transduced (E5M, E6M)/untransduced cells (E3U, E7U, Extended Data Fig. 1g).
Secondly, we corrected for “batch effects” that accounts for technical contributions to expt-to-expt variations due to different cell-capture efficiency, types/amounts of spike-ins and Fluidigm chips (Extended Data Fig. 1b), while preserving biological information such as total mRNA abundance. By comparing among biological replicate experiments, we found different mean total mRNA counts/cell (Extended Data Fig. 1h) that likely resulted from varying cell-capture efficiency/plate (68 sequenced cells in E5, 33% more compared to 51 cells in E6, Supplementary Table 2), various amounts of spike-ins used (100 fold more concentrated spike-ins in E1 than E2) and different types of spike-ins and Fluidigm chips used (Fluidigm spike-in and previous chip in E2 and ERCC spike-in and redesigned chip in E4), suggesting the existence of “batch effects”. To determine whether different treatments affected mouse mRNA abundance in the cell, we also examined mean total mRNA reads from different treatments in the same experiment (Extended Data Fig. 1h). We found no difference in average total mRNA counts between uninfected and DsRed-transduced cells (E3U vs E3R and E7U vs E7R) but 40% less counts in cells undergoing reprogramming (M+G+T-transduced, E5M, E6M) than DsRed-transduced cells (E5R, E6R), suggesting biological variations caused by treatment. Thus, to retain mRNA abundance information while correcting for batch effects, we normalized each treatment in each experiment to an expt size factor so that the median mRNA counts equals 1,000,000 for uninfected and DsRed-transduced cells and 616136 (deduced from the average M:R ratio of E5 and E6) for M+G+T-transduced cells (Extended Data Fig. 1h). This normalization successfully removed batch effects discussed above. An example was shown in Extended Data Fig. 1i for a comparison of cells from E5 and E6 on a PCA plot.
Lastly, we focused on nonimmune cells (462 cells in total, see Extended Data Fig. 3 for details) and removed residual batch effects using ComBat, a method that was designed for normalizing gene expression data30 and performed well in previous studies31. By examination of all experiments in each treatment condition with PCA plots (Extended Data Fig. 2a-c, the “Before” columns), we found no batch effects in PC1/PC2 plot but started to see incomplete overlap of different experiments in PC3 (for uninfected) or PC4 (for M+G+T- and DsRed-transduced), which represented < 5% variance of the data. Because batch effects were observed between different chips but not the same chips, we postulated that the use of two different versions of the Fluidigm medium-size chips might be the cause. The ComBat normalization was run separately for each treatment to remove only technical variations between batches while preserve biological variations between treatments. ComBat requires all input genes to be expressed in all batches, i.e. at least one cell in each batch. Therefore, genes that have non-zero counts in all batches were selected and normalized for each treatment. After the normalization, results from different treatments were merged. For those genes that were selected in one treatment but not others, expression levels were set to 0 in other treatment(s). After this procedure, there were a total of 14414 genes left. PCA analyses with ComBat-normalized counts showed that no batch effects we detected in the top 20 PCs (the “After” columns in Extended Data Fig. 2a-c for PC1-4, and data not shown), suggesting successful removal of all residual batch effects in our data.
Analysis of single-cell RNA-seq data
Outlier detection, PCA, HC, and the generation of violin plots were performed with the “SINGuLAR Analysis Toolset” package (Fluidigm) in R. First, normalized expression was log2 transformed before analysis. Outliers were detected and removed based on average gene expression and PCA using the SINGuLAR package (Extended Data Fig. 2d-e), resulting in 454 high-quality nonimmune cells for downstream analysis. Expression of the reprogramming factors M, G, and T was excluded before PCA, HC, or SLICER (see the next session) analysis. Next, top 400 PCA genes were selected by largest weight (loading) contribution to principle component 1, 2, or 3. Then HC was performed with these 400 genes and cells were grouped (Fig. 1a). The group information was used to generate PCA plots (Fig. 1b-c), violin plots (Extended Data Fig. 4b, l) and to perform analysis of variance (ANOVA, Fig. 4c, d, Extended Data Fig. 10a-e). ANOVA and Tukey post-hoc test were performed with custom scripts in R in order to identify positive or negative selection markers for iCM and piCM. For ANOVA, CCI but not CCA cells were used. For violin plots in Fig. 2 and 4, box plots were overlaid on top of them. The center dot represents median gene expression and the central rectangle spans the first quartile to the third quartile of the data distribution. The “whiskers” above or below the box show the locations of 1.5 interquartile range above the third quartile or below the first quartile. tSNE analysis was performed with the “Rtsne” package in R. GO analysis was performed using the DAVID functional annotation tool version 6.7 (https://david.ncifcrf.gov/). All GO terms shown in this study have p value or corrected p value (FDR) < 0.05. Abbreviation of GO terms: Fig. 1a, pos. reg. of SMC prolif. (positive regulation of smooth muscle cell proliferation); Fig. 3n, Cell. Resp. to DNA damage (cellular response to DNA damage), Mitochondrion inner mem. (mitochondrion inner membrane); Fig. 4b, pos. reg. of n.t. metabolism (positive regulation of nucleotide metabolism), pos. reg. of transcription (positive regulation of transcription). For the comparison of distributions of number of detected genes in different cell groups, we conducted the one-sided two-sample Kolmogorov-Smirnov test (Extended Data Fig. 4n-o). Since we are comparing the distributions of two samples, the conclusion is more general than a mean test such as t-test and does not rely on restrictive statistical assumptions such as normal distributions.
In Fig. 2a-c, cells from expt E1-E3 were presented. Analysis of data from expt E4-E7 showed consistent results (Extended Data Fig. 6e, f, and k). For Fig. 2c, first, CCI and CCA cells in Fb/Epi (Fig. 2a-b) were merged into one Fb/Epi group. Then a new HC for control cells in E3 was calculated using the four cell-lineage-related gene clusters but not the cell cycle genes identified in Fig. 2a. The calculated HC was very similar to that in Fig. 2a and was applied to reprogramming cells from E1 and E2 to generate Fig. 2c. For all correlation analyses, gene expression was always log transformed before analysis. In Fig. 4a, d, and Extended Data Fig. 9p, CCI cells were used. Linear regression was performed to obtain the regression coefficient (R value) and its corresponding p value (two-sided, α = 0.05). For correlation analysis of Tbx5 and its target genes in Fig. 4b, M+G+T-transduced CCI cells were included. The list of Tbx5 ChIP-seq peaks in HL-1 and the list of genes differentially expressed in wild type versus Tbx5 null mutant mice hearts were obtained from previous studies22–23. Genes present in both lists (2109 genes) were selected as Tbx5 downstream targets and used to calculate Fig. 4b. A total of 170 genes with spearman correlation coefficient > 0.3 or < −0.3 were selected and their correlation coefficient with Tbx5 was plotted in Fig. 4b left panel. Then inter-correlation between these genes was calculated and the correlation matrix ordered by HC was shown as a heatmap in Fig. 4b right panel. Three sets of genes A, B, C were found to be co-expressed (p<2.6e-6 by Spearman correlation). Representative genes of them were listed in the right panel and their corresponding GO terms were labeled on the left (Fig. 4b). For correlation analysis of M, G, T expression and expression of transcription factors or splicing factors, M+G+T-transduced CCI cells were presented in Extended Data Fig. 9t-u. The list of mouse transcription factors was obtained from public databases as previously described12 and the list of splicing factors was obtained from a previous study32.
Trajectory construction and identification of genes related to iCM reprogramming
We used SLICER (Selective Locally linear Inference of Cellular Expression Relationships)17, an algorithm that we previously developed, to construct cellular trajectories of iCM reprogramming. SLICER is implemented as an R package, which is freely available on the Comprehensive R Archive Network (CRAN) and on GitHub (https://github.com/jw156605/slicer). Briefly, SLICER discovers a nonlinear, low-dimensional manifold embedded in gene expression space that indicates how cellular gene expression profiles change during a sequential process. Additionally, SLICER automatically detects the presence, location, and number of branches in a trajectory, corresponding to multiple cell fates or multiple cellular processes occurring simultaneously. In this paper, the manifold corresponds to the reprogramming process by which fibroblasts turn into iCMs. To ensure consistency between the clustering and trajectory analyses, we ran SLICER on the control and reprogramming CFs using the top 400 PCA genes, rather than using SLICER’s gene selection approach. We performed nonlinear dimensionality reduction using a technique called local linear embedding (LLE), which is analogous to a nonlinear version of PCA. For this paper, we used a three-dimensional LLE projection for trajectory construction. We then build a k-nearest neighbor graph in the low-dimensional manifold space produced by LLE. Shortest paths through the neighbor graph correspond to geodesics along the manifold, and we use the lengths of these shortest paths to order cells according to their distances from a user-defined starting cell. The steps of the reprogramming process can then be traced by examining the cells one-by-one in the specified ordering. We also investigated the distribution of cells along pseudotime, reasoning that local differences in density could indicate the relative speed of changes and stability of intermediate states. We estimated the density of cells in pseudotime using a Gaussian kernel density estimator, then calculated “free energy” as max(density-density). NP M+G+T-transduced cells were used for free energy calculation (Fig. 1f).
Using a method similar to that described in Trapnell et al.33, we used nonlinear regression to identify genes significantly related to the reprogramming process. Only NP cells were included for this analysis. For each gene with mean expression above 1, we fit a generalized additive model (GAM) of the Tobit family (VGAM R package). The GAM approach uses cubic splines to fit a smooth nonlinear model, and the Tobit likelihood accounts for zero inflation by modeling gene expression dropout as data censoring. To avoid overfitting the data, which would result in a curve that is too “wiggly”, we constrained the GAM fits to use three degrees of freedom. We then identified genes that are significantly related to the reprogramming process using a likelihood ratio test, with a constant GAM as the null model. Using k-medoid clustering (pam algorithm from the cluster R package), we identified clusters of significantly related genes that showed similar trends over the reprogramming process (Fig. 3a and Extended Data Fig. 7a).
Analysis of bulk RNA-seq and microarray data
Bulk RNA-seq data were analyzed similarly as the single cell data except that they were only normalized for sequencing depth. Specifically, raw counts from HTseq-count were divided by the total number of mm10 mRNA reads from that sample and then multiplied by 1×106 to give counts per million (CPM). For differential expression analysis of lacZ vs MGT samples and MGT+shNT vs MGT+shPtbp1 samples, raw counts were inputted into DESeq234 in R and lists of DEG were obtained (FDR < 0.05, fold change > 1.25). Heatmaps were generated using the heatmap.2 function in the “gplots” package in R. The microarray data were processed using the limma package from Bioconductor35. Raw data were first background corrected and normalized using “normexp” and “quantile” methods, respectively. Next, control probes and low intensity probes were filtered out using the 1.1 multiplier of 95% quantile for negative controls as a cutoff. Lastly, probe intensity data were log2 transformed and replicated probes for each gene were averaged for subsequent analyses. PCA were performed in R with the “prcomp” function using all of the 34378 detected genes and the 3d plot was generated with the “scatterplot3d” package in R (Extended Data Fig. 4p-q).
Analysis of splicing
We aligned the bulk RNA-seq data (100-bp, paired-end reads) to mm10 using Mapsplice 2.1.4. To detect alternative splicing, we used rMATS36 version 3.2.5 with Ensembl GRCm38.82 gene annotations and the novelSS (novel splicing site) flag to identify unannotated splicing events. All other rMATS parameters were set to the default values. In Fig. 3j-l, n-o, and Extended Data Fig. 9a-i, we used FDR < 0.05 and dPSI > 15 as cutoffs, resulting in 1494 AS events for MGT+shPtbp1 vs MGT+shNT and 879 AS events for MGT vs lacZ (see Supplementary Table 4-5 for the lists). In Fig. 3j-k, to determine whether the direction (sign) of PSI change is consistent between two group pairs, we identified the overlapping AS events between the samples (69 overlapping events between MGT vs lacZ and CM vs CF, and 155 events between MGT+shPtbp1 vs MGT+shNT and CM vs CF). We then conducted a binomial test by first transforming the paired dPSI data into either +1 or −1 based on if their signs agree or not, and then calculating the proportion of +1s and compare it with 50% using one-sided binomial test. The results for dPSI (CM-CF) vs (MGT-lacZ) showed that only 34.78% of times the signs in these two groups agree with each other (p=0.0077). Therefore we conclude that there’s enough statistical evidence to support that the directions (signs) in CM-CF and MGT-lacZ are different. The result for CM-CF vs shPtbp1-shNT shows that 83.22% of times the signs in two groups are the same (p=2.2e-16). Therefore we conclude that there is enough statistical evidence to support that the directions (signs) in CM-CF and shPtbp1-shNT are the same. To plot the positional distribution of Ptbp1 binding motifs, we used rMAPS37 version 1.0.5. The rMAPS tool can only be used for ES events and has a database of known binding motifs for RNA binding proteins, including Ptbp1. It considers ES events with FDR < 0.05 and dPSI > 5 as statistically significant and all others insignificant. Then it takes all ES events (both significant and insignificant) and uses the events that are not significant to create a background profile. We basically extracted the exon skipping events from the rMATS comparison of MGT+shPtbp1 vs MGT+shNT and ran rMAPS on this list to generate Fig. 3m.
Proliferation assays
Lentiviruses were packaged by transfecting 293T cells with lipofectamine 2000 as previously described12. For packaging shRNA viruses, a total of 10 μg plasmids consisting of equivalent concentrations of each of the four or five shRNA targeting different regions of the gene were used. Lentiviral M, G, T, iMGT and large T were added to cells at 5 μl of each virus (if co-transduction)/cm2 of surface area and all other lentiviruses (rTtA, lacZ, cell-cycle related genes, and shRNA) were used at 2.5 μl each/cm2. For Extended Data Fig. 5b-m, pMXs-puro-MGT were used for iCM induction. In the EdU incorporation assay, cells were pulsed with 10 μM of EdU for three days before staining with Alexa Fluor 647 labelled EdU in the Click-iT Plus Edu Alexa Fluor™ 647 Flow Cytometry Assay Kit (ThermoFisher Scientific, #C10634). Propidium iodide (PI, Life Technologies, #P3566) staining was performed as previously described19. For iMGT induction, doxycycline (Dox) was added at 1 μg/ml and changed every 2-4 days. Puromycin (Puro) selection was performed at 1 μg/ml and media were changed every 2-4 days. For cell cycle synchronization assay, CFs were treated with 400 ng/ml of Nocodazole (G2/M) or cultured in low serum condition (0.5% FBS, G0/G1) for 4 days before iCM induction; normal serum condition was 10% FBS. To generate CF-T cells, neonatal CFs were transduced with pBabe-largeT and selected by 600 ng/ml Zeocin in fibroblast medium (IMDM/20% FBS/1 × Pen/Strep) from day 2 for 2 weeks. The resulting CF-Ts were a relatively homogenous pool of cells after antibiotic selection. All data are representative of multiple repeated experiments.
shRNA library screen, immunofluorescence staining, flow cytometry and qRT-PCR
For screening of shRNA library targeting splicing factors, MGT expression was induced by 1 μg/ml doxycycline in icMEF cells19 and lentiviruses containing mixed clones of shRNAs targeting each gene were added to the cells at 5 μl/cm2. For Ptbp1 protein expression, western blotting was performed as previously described14 (anti-Ptbp1, Cell Signaling 8776, 1: 500 dilution). Adult CF (AdCF) and tail-tip fibroblast (AdTTF) were isolated by the explant method as previously described15. The clone 271 was used for all shPtbp1-related experiments except for the initial screen while mixed clones of shCd200 viruses were used for shCd200-related experiments. Information of shRNAs were listed in the Supplementary Table 8.
Immunofluorescence staining and flow cytometry were performed as previously described12. Primary antibodies were used at the following dilutions: rabbit anti-GFP (Invitrogen, A11122, 1:500), chicken anti-GFP (Abcam, ab13970, 1:1500), anti-αSMA (Sigma, A2547, 1:200), anti-SM22α (Abcam, ab14106, 1:200), anti-αActinin (Sigma, A7811, 1:500), anti-Cx43 (Sigma, C6219, 1:200), APC-Thy1.2 (ebioscience, 17-0902-81, 1:100) and APC-Cd200 (Biolegend, 123809, 1:200). Images were captured using EVOS® FL Auto Cell Imaging System (Life Technologies). All images shown in this study were overlaid with hochest nuclear staining except for live images. For quantification, 10-30 images from multiple repeated experiments were randomly taken under 10×, 20 × or 40 × magnifications at the same exposure setting and then counted in a blinded way. For Extended Data Fig. 6g-i, neonatal hearts were minced into small pieces and plated with fibroblast medium (the explant method14). After 7 days of migration, the adherent cells were either immunostained in situ (cy3-αSMA, Sigma, C6198, 1:500 dilution; APC-Thy1.1, ebioscience, 17-0900-82, 1:100 dilution; PE-CD31, Biolegend, 102408, 1:200 dilution), or trypsinized, filtered through 40 μm cell strainers (Thermo Scientific), immunostained for Thy1.2 and αSMA/CD31, and then analyzed by flow cytometry. All flow data were collected on a Beckman Coulter Cyan ADP flow cytometer (UNC Flow Cytometry Core Facility) and analyzed with the FlowJo software (Tree Star). qRT-PCR was performed as previously described12 (see Supplementary Table 8 for primer sequences). All data are representative of multiple repeated experiments.
Statistics
Unless otherwise stated, values are expressed as average ± standard deviation (SD) or standard mean of error (SEM) of multiple biologically independent samples. Statistical tests performed include student’s t test, one way ANOVA followed by post hoc correction, linear regression, Spearman correlation, Kolmogorov-Smirnov test, binomial test, likelihood ratio test, and Chi-square test. Application and results of these tests were described in details in METHODS and figure legends. Generally, a p value of < 0.05 was considered statistically significant (*), a p value of < 0.01 was considered highly significant (**), and a p value of < 0.001 was considered strongly significant (***). All data are representative of multiple repeated experiments.
Data availability
The RNA-sequencing data that support the findings of this study are available in GEO under the accession number GSE98571. Source data for all figures are available in the supplementary tables.
Extended Data
Supplementary Material
Acknowledgments
We thank UNC AAC Core, HTSF Core, Flow Core for technical support. This study was supported by NIH HG06272 to J.F.P., NIH BD2K Fellowship (T32 CA201159) and NIH F31 Fellowship (HG008912) to J.D.W., NIH/NHLBI R00 HL109079 and American Heart Association (AHA) 15GRNT25530005 to J.L., AHA 13SDG17060010, Ellison Medical Foundation (EMF) AG-NS-1064-13, and NIH/NHLBI R01HL128331 to L.Q., and generous gifts from Dr. Hugh “Chip” McAllister and Cecil Sewell.
Footnotes
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions L.Q., Z.L., and L.W. conceived and designed the study. Z.L. and L.W. designed and performed single-cell RNA-seq. Z.L., L.W., Y.Z. and C.Y. prepared samples for microarray and bulk RNA-seq. L.W., Y.Z., H.M., H.V., C.Y., S.Y., B.W., S.A., and M.Z. performed other experiments. Z.L., J.W., J.P. performed data analysis and modeling. W.S. helped with statistical analysis. Z.L, J.W., J.L. and L.Q. wrote the manuscript, with extensive input from all authors. J.L. and L.Q. provided funding and overall supervision.
Author Information
The authors declare no competing financial interest. Readers are welcome to comment on the online version of the paper.
References
- 1.Ieda M, et al. Direct reprogramming of fibroblasts into functional cardiomyocytes by defined factors. Cell. 2010;142:375–386. doi: 10.1016/j.cell.2010.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Jayawardena TM, et al. MicroRNA-mediated in vitro and in vivo direct reprogramming of cardiac fibroblasts to cardiomyocytes. Circulation research. 2012;110:1465–1473. doi: 10.1161/CIRCRESAHA.112.269035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Qian L, et al. In vivo reprogramming of murine cardiac fibroblasts into induced cardiomyocytes. Nature. 2012;485:593–598. doi: 10.1038/nature11044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Song K, et al. Heart repair by reprogramming non-myocytes with cardiac transcription factors. Nature. 2012;485:599–604. doi: 10.1038/nature11139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Dal-Pra S, Hodgkinson CP, Mirotsou M, Kirste I, Dzau VJ. Demethylation of H3K27 Is Essential for the Induction of Direct Cardiac Reprogramming by miR Combo. Circulation research. 2017;120:1403–1413. doi: 10.1161/CIRCRESAHA.116.308741. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Ma H, Wang L, Yin C, Liu J, Qian L. In vivo cardiac reprogramming using an optimal single polycistronic construct. Cardiovascular research. 2015;108:217–219. doi: 10.1093/cvr/cvv223. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Mathison M, et al. In vivo cardiac cellular reprogramming efficacy is enhanced by angiogenic preconditioning of the infarcted myocardium with vascular endothelial growth factor. Journal of the American Heart Association. 2012;1:e005652. doi: 10.1161/JAHA.112.005652. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Mohamed TM, et al. Chemical Enhancement of In Vitro and In Vivo Direct Cardiac Reprogramming. Circulation. 2017;135:978–995. doi: 10.1161/CIRCULATIONAHA.116.024692. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Zhao Y, et al. High-efficiency reprogramming of fibroblasts into cardiomyocytes requires suppression of pro-fibrotic signalling. Nature communications. 2015;6:8243. doi: 10.1038/ncomms9243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Zhou H, Dickson ME, Kim MS, Bassel-Duby R, Olson EN. Akt1/protein kinase B enhances transcriptional reprogramming of fibroblasts to functional cardiomyocytes. Proceedings of the National Academy of Sciences of the United States of America. 2015;112:11864–11869. doi: 10.1073/pnas.1516237112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Ifkovits JL, Addis RC, Epstein JA, Gearhart JD. Inhibition of TGFbeta signaling increases direct conversion of fibroblasts to induced cardiomyocytes. PloS one. 2014;9:e89678. doi: 10.1371/journal.pone.0089678. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Liu Z, et al. Re-patterning of H3K27me3, H3K4me3 and DNA methylation during fibroblast conversion into induced cardiomyocytes. Stem cell research. 2016;16:507–518. doi: 10.1016/j.scr.2016.02.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Muraoka N, et al. MiR-133 promotes cardiac reprogramming by directly repressing Snai1 and silencing fibroblast signatures. The EMBO journal. 2014;33:1565–1581. doi: 10.15252/embj.201387605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Wang L, et al. Stoichiometry of Gata4, Mef2c, and Tbx5 influences the efficiency and quality of induced cardiac myocyte reprogramming. Circulation research. 2015;116:237–244. doi: 10.1161/CIRCRESAHA.116.305547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Zhou Y, et al. Bmi1 Is a Key Epigenetic Barrier to Direct Cardiac Reprogramming. Cell stem cell. 2016;18:382–395. doi: 10.1016/j.stem.2016.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Shin J, et al. Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis. Cell stem cell. 2015;17:360–372. doi: 10.1016/j.stem.2015.07.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Welch JD, Hartemink AJ, Prins JF. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome biology. 2016;17:106. doi: 10.1186/s13059-016-0975-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Polo JM, et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell. 2012;151:1617–1632. doi: 10.1016/j.cell.2012.11.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Vaseghi HR, et al. Generation of an inducible fibroblast cell line for studying direct cardiac reprogramming. Genesis. 2016;54:398–406. doi: 10.1002/dvg.22947. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Fu XD, Ares M., Jr Context-dependent control of alternative splicing by RNA-binding proteins. Nature reviews Genetics. 2014;15:689–701. doi: 10.1038/nrg3778. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Wang L, et al. Improved Generation of Induced Cardiomyocytes Using a Polycistronic Construct Expressing Optimal Ratio of Gata4, Mef2c and Tbx5. Journal of visualized experiments : JoVE. 2015 doi: 10.3791/53426. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.He A, Kong SW, Ma Q, Pu WT. Co-occupancy by multiple cardiac transcription factors identifies transcriptional enhancers active in heart. Proceedings of the National Academy of Sciences of the United States of America. 2011;108:5632–5637. doi: 10.1073/pnas.1016959108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Waldron L, et al. The Cardiac TBX5 Interactome Reveals a Chromatin Remodeling Network Essential for Cardiac Septation. Developmental cell. 2016;36:262–275. doi: 10.1016/j.devcel.2016.01.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Zheng X, et al. Prolyl hydroxylation by EglN2 destabilizes FOXO3a by blocking its interaction with the USP9x deubiquitinase. Genes & development. 2014;28:1429–1444. doi: 10.1101/gad.242131.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Qian L, Berry EC, Fu JD, Ieda M, Srivastava D. Reprogramming of mouse fibroblasts into cardiomyocyte-like cells in vitro. Nature protocols. 2013;8:1204–1215. doi: 10.1038/nprot.2013.067. [DOI] [PubMed] [Google Scholar]
- 26.Wu AR, et al. Quantitative assessment of single-cell RNA-sequencing methods. Nature methods. 2014;11:41–46. doi: 10.1038/nmeth.2694. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Treutlein B, et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature. 2014;509:371–375. doi: 10.1038/nature13173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Brennecke P, et al. Accounting for technical noise in single-cell RNA-seq experiments. Nature methods. 2013;10:1093–1095. doi: 10.1038/nmeth.2645. [DOI] [PubMed] [Google Scholar]
- 30.Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037. [DOI] [PubMed] [Google Scholar]
- 31.Chen C, et al. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PloS one. 2011;6:e17238. doi: 10.1371/journal.pone.0017238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Han H, et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature. 2013;498:241–245. doi: 10.1038/nature12270. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Trapnell C, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature biotechnology. 2014;32:381–386. doi: 10.1038/nbt.2859. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research. 2015;43:e47. doi: 10.1093/nar/gkv007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Shen S, et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proceedings of the National Academy of Sciences of the United States of America. 2014;111:E5593–5601. doi: 10.1073/pnas.1419161111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Park JW, Jung S, Rouchka EC, Tseng YT, Xing Y. rMAPS: RNA map analysis and plotting server for alternative exon regulation. Nucleic acids research. 2016;44:W333–338. doi: 10.1093/nar/gkw410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Pinto AR, et al. An abundant tissue macrophage population in the adult murine heart with a distinct alternatively-activated macrophage profile. PloS one. 2012;7:e36814. doi: 10.1371/journal.pone.0036814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Whitfield ML, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Molecular biology of the cell. 2002;13:1977–2000. doi: 10.1091/mbc.02-02-0030.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Gooding C, Smith CW. Tropomyosin exons as models for alternative splicing. Advances in experimental medicine and biology. 2008;644:27–42. doi: 10.1007/978-0-387-85766-4_3. [DOI] [PubMed] [Google Scholar]
- 41.Terenzi F, Ladd AN. Conserved developmental alternative splicing of muscleblind-like (MBNL) transcripts regulates MBNL localization and activity. RNA biology. 2010;7:43–55. doi: 10.4161/rna.7.1.10401. [DOI] [PubMed] [Google Scholar]
- 42.Dube DK, McLean MD, Dube S, Poiesz BJ. Translational control of tropomyosin expression in vertebrate hearts. Anat Rec (Hoboken) 2014;297:1585–1595. doi: 10.1002/ar.22978. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The RNA-sequencing data that support the findings of this study are available in GEO under the accession number GSE98571. Source data for all figures are available in the supplementary tables.