Summary
The inability to quantify cardiomyocyte (CM) maturation remains a significant barrier to evaluating the effects of ongoing efforts to produce adult-like CMs from pluripotent stem cells (PSCs). Here, we present a protocol to quantify stem-cell-derived CM maturity using a single-cell RNA sequencing-based metric “entropy score.” We describe steps for generating an entropy score using customized R code. This tool can be used to quantify maturation levels of PSC-CMs and potentially other cell types.
For complete details on the use and execution of this protocol, please refer to Kannan et al.1
Subject areas: Bioinformatics, Developmental biology, RNA-seq, Gene expression, Stem cells
Highlights
• Index PSC-CM maturation status using a single-cell RNA sequencing-based metric
• Filter out poor-quality cells with quality control metrics for a more reliable comparison
• Visualize gene expression trend over entropy scores
• Compare entropy scores across different studies
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
PSC-CMs show great potential in drug discovery, disease modeling, and cell therapy.2,3,4 However, PSC-CMs are immature and resemble early-stage CMs, which is a primary roadblock to their application. These in vitro-derived cells differ from endogenous CMs in their sarcomere organization, mitochondrial density, morphology, calcium handling, and force production.5 Various strategies have been reported to mature PSC-CMs, including electrical stimulation,6,7,8 mechanical stretching,9,10 metabolic maturation media,11,12 neonatal incubation,13,14 extracellular matrix coating,15 cardiac organoids,16,17 small molecules,18,19,20 and coculture with other cell types.21,22,23 However, there is no established metric to compare the maturity of PSC-CMs across studies. We previously used bulk transcriptomic data to characterize the maturation level of PSC-CMs using a package called MatStat based on CellNet.24,25 However, CMs exhibit significant diversity, necessitating analysis at the individual cell level. Drawing inspiration from the change in gene expression pattern as pluripotent stem cells differentiate into PSC-CMs, we created a quantitative maturation metric based on Shannon entropy. Using single-cell RNA-seq (scRNA-seq) data, we then validated that entropy scores robustly reflect the developmental stage of single CMs. Furthermore, we found that entropy scores remain consistent for the same developmental stage across datasets. In addition, we provide a high-quality reference dataset of primary mouse CMs isolated using large-particle fluorescence-activated cell sorting (LP-FACS).26 LP-FACS can isolate healthy, rod-shaped adult CMs, thereby ensuring the quality of the scRNA-seq data. This reference reflects the endogenous CM development trajectory and can be used to benchmark PSC-CMs through entropy score. Apart from PSC-CMs, immaturity of PSC-derived hepatocytes,27,28 pancreatic islet cells,29 neurons,30,31 and other cell types has been reported.
We found that entropy scores can be used as a maturation metric for other PSC-derived cell types. Specifically, we used publicly available datasets of pancreatic beta cells and hepatocytes as a proof of concept.1
The protocol below describes how to generate entropy scores from scRNA-seq data of CMs. It is important to note that a high-quality dataset is crucial for generating reliable entropy scores. The fraction of mitochondrial reads per cell is commonly used as a quality control metric, with a high percentage of mitochondrial reads indicating that the cell is apoptotic and of low quality.32 However, the median percentage of mitochondrial reads in healthy cells may vary greatly between samples from different time points. For example, mitochondrial number and size increase during cardiomyocyte maturation.33 Supporting this, we observed a higher percentage of mitochondrial reads in postnatal CMs than in embryonic CMs in our original study.1 Therefore, instead of using one threshold for all datasets, exploratory data analysis is necessary to determine a reasonable threshold for the percentage of mitochondrial reads in each CM scRNA-seq dataset. For other cell types, users need to establish a new range for the percentage of mitochondrial reads based on available datasets. In addition, low sequencing depth can bias the entropy score, and a minimum depth of 2,000 counts per cell is necessary. We found that Drop-seq and single-nucleus RNA-seq (Nuc-seq) data are not suitable for entropy score analysis, likely due to low depth.
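The depth and mitochondrial-read checks described above can be explored before running the protocol. The following is a minimal sketch and is not part of the published code. It assumes a genes-by-cells count matrix in which mouse mitochondrial genes carry the conventional “mt-” symbol prefix; the simulated matrix toy_counts and the helper cell_qc_summary() are hypothetical names used only for illustration.

```r
# Toy genes-by-cells count matrix (simulated values, for illustration only)
toy_counts <- matrix(
  c(1500, 300, 100, 200,   # cell_1
     400, 900,  50, 150,   # cell_2
    2500, 200, 400, 900),  # cell_3
  nrow = 4,
  dimnames = list(c("Myh6", "mt-Co1", "mt-Nd1", "Actc1"),
                  c("cell_1", "cell_2", "cell_3"))
)

# Hypothetical helper: per-cell depth and fraction of mitochondrial counts.
# Assumes mitochondrial genes are named with an "mt-" prefix (mouse symbols).
cell_qc_summary <- function(counts, mito_pattern = "^mt-", min_depth = 2000) {
  depth <- colSums(counts)
  is_mito <- grepl(mito_pattern, rownames(counts))
  mito_frac <- colSums(counts[is_mito, , drop = FALSE]) / depth
  data.frame(depth = depth,
             mito_frac = mito_frac,
             enough_depth = depth >= min_depth)
}

qc_preview <- cell_qc_summary(toy_counts)
print(qc_preview)
```

Inspecting the distribution of mito_frac separately for each time point, rather than applying one cutoff everywhere, follows the recommendation in the paragraph above.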
Software installation
Timing: 10 min
1. Download and install the most recent version of R from https://cran.r-project.org.
2. Download and install RStudio from https://www.rstudio.com/products/rstudio/download/.
Note: RStudio is not necessary for running R. However, we highly recommend using RStudio as it is helpful for tracking variables and visualizing data.
3. Create a new R script in RStudio.
4. Install R packages from CRAN and GitHub using the following commands.
> install.packages("ggplot2")
> install.packages("reshape2")
> install.packages("Matrix")
> install.packages("stringr")
> install.packages("dplyr")
> install.packages("devtools")
> devtools::install_github("pcahan1/singleCellNet")
Note: The grid and splines packages ship with R and do not need to be installed separately.
5. Load all necessary packages using the following commands.
> library(ggplot2)
> library(reshape2)
> library(Matrix)
> library(grid)
> library(stringr)
> library(dplyr)
> library(singleCellNet)
> library(splines)
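Before loading, it can help to confirm that every package is actually installed. The snippet below is a small sketch rather than part of the published workflow; the package names are taken from the library() calls above, and requireNamespace() is a base R function.

```r
# Packages this protocol loads (taken from the library() calls in step 5)
required_pkgs <- c("ggplot2", "reshape2", "Matrix", "grid",
                   "stringr", "dplyr", "singleCellNet", "splines")

# requireNamespace() returns FALSE instead of stopping when a package is absent
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```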
Download associated files
Timing: 10 min
6. Create a new folder as your workspace folder.
7. Download “entropy_functions.R” from https://github.com/skannan4/cm-entropy-score#readme into the workspace folder that you just created.
Note: This file includes all the functions required to run this workflow.
8. Download “clean_nodatasets_060720.RData” from https://www.synapse.org/#!Synapse:syn21788425/files/ into the same workspace folder.
Note: This file is necessary for running quality control (QC) on your dataset of choice.
Optional: If you wish to run the sample code and generate the graphs shown in this protocol, download “clean_060720.RData” from https://www.synapse.org/#!Synapse:syn21788425/files/ into the workspace folder instead of “clean_nodatasets_060720.RData” and load it using the command shown below. This file is not necessary for generating entropy scores for your own datasets.
Note: “clean_060720.RData” contains a full in vivo cardiomyocyte reference dataset from embryonic day 14 (e14) to postnatal day 56 (p56),1 which you can use to benchmark against entropy scores generated from your datasets. We also published an extended in vivo cardiomyocyte reference from embryonic day 14 (e14) to postnatal day 84 (p84).34
> load("clean_060720.RData")
Loading R workspace
Timing: 5 min
9. Set the working directory to the workspace folder that you created in Step 6. An example command is shown below.
> setwd("/Users/Desktop/Entropy/MyWorkSpace")
10. Load “clean_nodatasets_060720.RData” into your current workspace using the following command.
> load("clean_nodatasets_060720.RData")
Optional: If you wish to reproduce the graphs shown in this protocol, load “clean_060720.RData” into your workspace using the following command instead of “clean_nodatasets_060720.RData”. Please note that “clean_060720.RData” is 1.75 GB in size and thus memory-consuming.
> load("clean_060720.RData")
11. Load “entropy_functions.R” into your current workspace using the following command.
> source("entropy_functions.R")
CRITICAL: Make sure the datasets and functions are loaded into your RStudio workspace. Several “Data”, “Values”, and “Functions” entries should show up in the “Environment” box in the IDE. These are required for running the workflow.
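The same check can be done from the console instead of the “Environment” box. This is a minimal sketch: the three function names are the ones referenced in this protocol, while the helper missing_functions() is a hypothetical name introduced here.

```r
# Function names referenced later in this protocol
expected_functions <- c("data_qc", "master_entropy", "rename_genes")

# Hypothetical helper: report which expected functions are missing
missing_functions <- function(fn_names, envir = globalenv()) {
  found <- vapply(fn_names, exists, logical(1),
                  envir = envir, mode = "function")
  fn_names[!found]
}

not_loaded <- missing_functions(expected_functions)
if (length(not_loaded) > 0) {
  message("Not found; re-run source(\"entropy_functions.R\"): ",
          paste(not_loaded, collapse = ", "))
}
```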
Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Deposited data | ||
Raw and analyzed data | Kannan, Farid, Lin, Miyamoto, and Kwon1 | GEO: GSE147807 |
R workspace files (clean_nodatasets_060720.RData, clean_060720.RData) | Kannan, Farid, Lin, Miyamoto, and Kwon1 | https://www.synapse.org/#!Synapse:syn21788425/files/ |
Software and algorithms | ||
R v4.1.2 | R | https://www.r-project.org |
RStudio v2021.09.1 | RStudio | https://www.rstudio.com/products/rstudio/download/ |
ggplot2 v3.3.5 | Wickham35 | https://CRAN.R-project.org/package=ggplot2 |
reshape2 v1.4.4 | Wickham36 | https://CRAN.R-project.org/package=reshape2 |
Matrix v1.4-0 | Maechler and Bates37 | https://CRAN.R-project.org/package=Matrix |
grid v4.1.2 | Murrell38 | https://CRAN.R-project.org/package=grid |
stringr v1.4.0 | Wickham39 | https://CRAN.R-project.org/package=stringr |
dplyr v1.0.7 | Wickham, François, Henry, Müller, and Vaughan40 | https://CRAN.R-project.org/package=dplyr |
singleCellNet v0.1.0 | Tan and Cahan41 | https://pcahan1.github.io/singleCellNet/ |
entropy_functions.R | Kannan, Farid, Lin, Miyamoto, and Kwon1 | https://www.synapse.org/#!Synapse:syn21788425/files/ |
Other | ||
macOS | Apple | Monterey v12.0.1 |
MacBook Pro | Apple | Apple M1 Max, 10-core CPU, 32GB RAM |
Step-by-step method details
Prepare gene expression matrix and phenotype table
Timing: 5 min
In this step, we will check the format of the counts table and phenotype table and prepare them as inputs for the entropy-generating function.
Note: To compute entropy scores, you will need two objects: a counts table and a metadata or phenotype table. We use the Bioconductor format for the counts table, in which genes are in the row names and cells are in the column names.
1. Make sure the counts table has gene names as row names and cell barcodes as column names.
Note: Here we use “kannan_ref_data” as our counts table. The counts table format is shown below. Counts table can be either a dataframe or a sparse matrix. Here we use the sparse matrix format.
> counts_table = kannan_ref_data
> counts_table[1:4,1:4]
4 × 4 sparse Matrix of class "dgCMatrix"
AAGAGGCAAAAGTT AAGAGGCAATATAG AAGAGGCAATCAAA AAGAGGCAATGAAT
Gnai3 . . 1 .
Pbsn . . . .
Cdc45 . . . 2
H19 14 93 9 19
Optional: If your dataset has row names in ENSEMBL format, you may use the rename_genes() function to convert ENSEMBL IDs to gene symbols. For mouse datasets, run the following command.
> counts_table = rename_genes(counts_table, species = "mouse")
Similarly, for human datasets, simply replace “mouse” with “human”.
> counts_table = rename_genes(counts_table, species = "human")
2. Generate a vector containing all the time points or other conditions of your cells. An example of a time point vector is shown below.
> pheno_table = combined_datasets[combined_datasets$data == "kannan_ref_data",]
> pheno_table$timepoint[1:10]
[1] e18 e14 e18 p0 e14 e18 p0 e18 e14 e14
CRITICAL: Make sure the order of cells in your vector or phenotype table matches the order of cells in the counts table.
Optional: You can also create a phenotype table which encompasses other metadata you wish to include. Similarly, the order of cells in the phenotype table must match the order of cells in the counts table.
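The ordering requirement above can be enforced programmatically. The sketch below uses simulated objects (toy_counts, toy_pheno) and assumes that the phenotype table stores cell barcodes as row names, which may not hold for every dataset.

```r
# Toy counts table: 2 genes x 3 cells (simulated values)
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("Myh6", "Myh7"),
                                     c("cell_A", "cell_B", "cell_C")))

# Toy phenotype table whose rows are in a different order than the counts
toy_pheno <- data.frame(timepoint = c("p0", "e14", "e18"),
                        row.names = c("cell_C", "cell_A", "cell_B"))

# Reorder the phenotype rows to follow the counts columns
toy_pheno_aligned <- toy_pheno[colnames(toy_counts), , drop = FALSE]

# Stop early if any cell is missing or out of order
stopifnot(identical(rownames(toy_pheno_aligned), colnames(toy_counts)))

timepoint_vector <- toy_pheno_aligned$timepoint
print(timepoint_vector)
```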
Run data QC and generate entropy score
Timing: 6 min (936 cells, Run on a MacBook Pro, Apple M1 Max, 10-core CPU, 32 GB RAM)
In this step, we will perform data QC of the counts table and simultaneously generate entropy scores for each cell.
Note: Apart from entropy scores, this function also calculates and outputs a series of QC metrics, which we explain in detail in the “expected outcomes” section. Both entropy scores and QC metrics are recorded in the output data frame. No cells are excluded in this process.
Note: This step can be time-consuming depending on the size of your dataset. We provided a benchmarking table (Table 1) showing running time for datasets of three sizes on two computers with different CPU and RAM specifications.
Table 1.
data_qc() running times benchmarked for three datasets in the order of increasing sizes on two computers with different numbers of CPU cores and RAM sizes
Dataset size | MacBook Pro, Apple M1 Max, 10-core CPU, 32 GB RAM | iMac, 3.2 GHz Intel Core i5, quad-core CPU, 16 GB RAM |
---|---|---|
936 cells | 5.39 min | 14.43 min |
2224 cells | 5.09 min | 14.00 min |
13595 cells | 6.87 min | 23.29 min |
3. Run data_qc(), which outputs QC metrics and an entropy score for each cell. Each input field is explained in detail below.
> qc_output = data_qc(dataset = "counts_table",
study = "Kannan study",
timepoint_list = pheno_table$timepoint,
scn_calc = TRUE,
species = "mouse",
sample_type = "in vivo",
isolation = "LP-FACS",
sequencing = "mcSCRB-seq",
mapping = "zUMIs",
datatype = "UMIs",
doi = "doi:12345",
other_meta = NA)
a. dataset: REQUIRED. Your dataset object name as a character. Please note that your dataset name must be in quotes.
b. study: REQUIRED. A character giving the name of the study, e.g., “Kannan et al.”
c. timepoint_list: REQUIRED. Vector listing the time points of cells in the dataset as generated in Step 2.
d. scn_calc: Optional. Defaults to TRUE. Here you can choose whether to run the SingleCellNet function to compute cell type classification using a Tabula Muris reference.42 Skipping will significantly speed up this step as SingleCellNet is the most time-consuming portion of data_qc(); however, we strongly encourage you to run this step even if you are confident in the identity of your cells.
e. species: Optional. Defaults to “mouse”. Set to “human” to analyze human datasets.
f. sample_type: Optional. For example, “in vivo”, “directed differentiation”, or “direct reprogramming”.
g. isolation: Optional. You can store the method by which the sample was acquired, e.g., 10x, FACS, fixation, manual picking.
h. sequencing: Optional. You can store the sequencing protocol, e.g., SCRB-seq, TotalSeq.
i. mapping: Optional. You can store the mapping method, e.g., STAR/FeatureCounts, zUMIs, kallisto.
j. datatype: Optional. You can store the datatype, e.g., reads, UMIs.
k. doi: Optional. You can store the DOI of the study.
l. other_meta: Optional. If you have another metadata field of interest (for example, atrial vs. ventricular), you can input it here as a vector. Similar to “timepoint_list”, the order of cells in this vector must match that of cells in the counts table.
Optional: If you are very confident that your dataset has been properly filtered and includes only high-quality cardiomyocytes, you can simply run the master_entropy() function as shown below. This function returns a vector of entropy scores, one for each cell in your dataset, and does not return any QC metrics. However, we strongly recommend using the data_qc() function outlined in Step 3.
> master_entropy(counts_table)
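For intuition about what an entropy score measures, the Shannon entropy of a single cell's expression profile can be written as H = -sum(p_i * log(p_i)), where p_i is the share of the cell's counts assigned to gene i. The sketch below only illustrates this formula on simulated counts; it is not the implementation of master_entropy(), and the helper name shannon_entropy() and the choice of a base-2 logarithm are assumptions made for this example.

```r
# Hypothetical helper illustrating Shannon entropy of one cell's counts.
# The base-2 logarithm is an assumption made for this illustration.
shannon_entropy <- function(counts) {
  p <- counts / sum(counts)   # convert counts to proportions
  p <- p[p > 0]               # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Counts spread evenly over four genes: maximal entropy for four genes
even_cell <- c(GeneA = 25, GeneB = 25, GeneC = 25, GeneD = 25)

# Counts dominated by one gene: lower entropy
skewed_cell <- c(GeneA = 97, GeneB = 1, GeneC = 1, GeneD = 1)

shannon_entropy(even_cell)    # 2 bits, i.e. log2(4)
shannon_entropy(skewed_cell)  # smaller than for even_cell
```

A profile concentrated on fewer genes gives a lower value, which matches the direction reported in this protocol: entropy decreases as cardiomyocytes mature.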
Data visualization examples
Timing: 10 min
In Step 3, you have generated entropy score and QC metrics as the output of running the data_qc() function. You can now explore the dataset based on your interest.
Note: We provided some plots generated using ggplot2 in the section below as an example. For a detailed explanation of the data_qc() output table, please proceed to the “expected outcomes” section.
4. Plot the entropy scores of cells that pass QC over different time points. An example plot is shown in Figure 1.
> qc_output_goodcell = qc_output[qc_output$good_cell == "TRUE",]
> ggplot(qc_output_goodcell, aes(x = timepoint, y = entropy, fill = timepoint)) + geom_jitter(size = 0.5) + geom_violin(scale = "width") + geom_boxplot(position = position_dodge(width = 1)) + xlab("Timepoint") + ylab(expression('Shannon Entropy'~italic(S))) + theme_linedraw() + theme(legend.position = "none")
CRITICAL: If your cell type of interest is not cardiac muscle cell, please use the code shown below to select cells that pass QC as “qc_output_goodcell”. Here, we use “erythrocyte” as an example. Please replace this input with the name of your cell type of interest. Note that the cell type name needs to be spelled exactly as in the Tabula Muris reference.42 We have provided a list of cell type names in the supplementary files.
> qc_output_goodcell = qc_output[qc_output$top5_norm < 1.3 & qc_output$depth_norm > -0.5 & qc_output$max_celltype == "erythrocyte",]
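The selection above can be wrapped in a small function so that the thresholds and cell type are explicit arguments. This is a sketch on a simulated data frame (toy_qc) standing in for the data_qc() output; the helper name select_good_cells() and the cell type labels used in the toy data are illustrative assumptions.

```r
# Toy stand-in for the data_qc() output (simulated values)
toy_qc <- data.frame(
  top5_norm    = c(1.0, 1.5, 1.1, 0.9),
  depth_norm   = c(0.2, 0.1, -0.9, 0.4),
  max_celltype = c("hepatocyte", "hepatocyte", "hepatocyte", "B cell"),
  row.names    = c("cell_1", "cell_2", "cell_3", "cell_4")
)

# Hypothetical helper applying the thresholds quoted in this protocol
select_good_cells <- function(qc, celltype,
                              top5_max = 1.3, depth_min = -0.5) {
  keep <- qc$top5_norm < top5_max &
    qc$depth_norm > depth_min &
    qc$max_celltype == celltype
  # which() drops rows where any metric is NA instead of returning NA rows
  qc[which(keep), , drop = FALSE]
}

toy_good <- select_good_cells(toy_qc, celltype = "hepatocyte")
rownames(toy_good)
```

Only cell_1 is kept here: cell_2 fails the top5_norm threshold, cell_3 fails the depth_norm threshold, and cell_4 is a different cell type.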
5. Visualize gene expression trend over entropy score. An example plot is shown in Figure 2.
Note: Here we normalize the raw counts table for sequencing depth in each cell using counts per million method. You can explore the genes of your interest by simply replacing the gene names in “y = Myh6” and plot title names in “title = Myh6” in the plotting command ggplot().
Note: Plot the negative values of the entropy score so that the more mature state is aligned with the positive direction of the x-axis.
> counts_table_goodcell = counts_table[,rownames(qc_output_goodcell)]
> head(counts_table_goodcell)[1:5,1:5]
> lib_size = colSums(counts_table_goodcell)
> lib_size[1:5]
> counts_table_goodcell_norm = sweep(counts_table_goodcell, 2, lib_size, FUN = '/')
> counts_table_goodcell_norm = counts_table_goodcell_norm * 1e+06
> head(counts_table_goodcell_norm)[1:5,1:5]
> dim(counts_table_goodcell_norm)
> counts_table_goodcell_norm = t(counts_table_goodcell_norm)
> dim(counts_table_goodcell_norm)
> head(counts_table_goodcell_norm)[1:5,1:5]
> plot_entropy = -1*qc_output_goodcell$entropy
> plot_data = as.data.frame(as.matrix(counts_table_goodcell_norm))
> plot_data$plot_entropy = plot_entropy
> plot_data$timepoint = qc_output_goodcell$timepoint
> ggplot(plot_data, aes(x = plot_entropy, y = Myh6, color = timepoint)) + geom_jitter(size = 1) + geom_smooth(method = lm, formula = y ~ ns(x, df=3), se = FALSE, color = "black", size = 0.5) + theme_linedraw() + ylim(0,NA) + theme(legend.position="none") + labs(title = "Myh6") + theme(plot.title = element_text(hjust = 0.5)) + xlab("-1*Entropy") + ylab("Relative Expression")
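The counts-per-million normalization used in this step divides each cell's counts by that cell's total and multiplies by one million. The following sketch shows one way to do this on a simulated sparse matrix while keeping the result sparse; it uses only the Matrix package and is an illustration, not the code used to produce Figure 2.

```r
library(Matrix)

# Toy sparse counts: 3 genes x 2 cells (simulated values)
toy_counts <- Matrix(c(10, 0, 30,
                        5, 5,  0),
                     nrow = 3, sparse = TRUE,
                     dimnames = list(c("Myh6", "Myh7", "Actc1"),
                                     c("cell_1", "cell_2")))

# Counts per million: divide each column by its library size, then scale.
# Multiplying by a diagonal matrix keeps the result sparse.
lib_size <- colSums(toy_counts)
toy_cpm <- toy_counts %*% Diagonal(x = 1e6 / as.numeric(lib_size))
dimnames(toy_cpm) <- dimnames(toy_counts)

colSums(toy_cpm)  # every cell now sums to one million
```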
6. Plot the entropy scores generated from two datasets in one graph. An example plot is shown in Figure 3.
Note: Here we separately generated the entropy scores of the Murphy dataset (“murphy_data” included in “clean_060720.RData”), which sequenced mouse in vivo cardiomyocytes at postnatal days 0, 7, 14, 21, and 28 (p0, p7, p14, p21, p28). We plot them with the entropy scores generated previously in Step 3 from “kannan_ref_data”.
> counts_table_1 = murphy_data
> pheno_table_1 = combined_datasets[combined_datasets$data == "murphy_data",]
> qc_output_1 = data_qc(dataset = "counts_table_1",
study = "Murphy study",
timepoint_list = pheno_table_1$timepoint,
scn_calc = TRUE,
species = "mouse",
sample_type = "in vivo",
isolation = "LP-FACS",
sequencing = "mcSCRB-seq",
mapping = "zUMIs",
datatype = "UMIs",
doi = "doi:12345",
other_meta = NA)
> qc_output_goodcell_1 = qc_output_1[qc_output_1$good_cell == "TRUE",]
> qc_output_goodcell_combined = rbind(qc_output_goodcell_1, qc_output_goodcell)
> ggplot(qc_output_goodcell_combined, aes(x = timepoint, y = entropy, fill = study)) + geom_jitter(size = 0.5) + geom_violin(scale = "width") + geom_boxplot(position = position_dodge(width = 1)) + xlab("Timepoint") + ylab(expression('Shannon Entropy'~italic(S))) + theme_linedraw()
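Alongside the plot, a numeric summary can make cross-study comparison easier. The sketch below computes the median entropy for each study and time point with base R's aggregate(); the data frame toy_combined is simulated and only mimics the “study”, “timepoint”, and “entropy” columns used in the plotting command above.

```r
# Toy stand-in for the combined QC-passing cells (simulated values)
toy_combined <- data.frame(
  study     = c("Study A", "Study A", "Study A",
                "Study B", "Study B", "Study B"),
  timepoint = c("p0", "p0", "p28", "p0", "p28", "p28"),
  entropy   = c(6.0, 6.2, 5.0, 6.1, 5.1, 4.9)
)

# Median entropy for every study/timepoint combination
median_entropy <- aggregate(entropy ~ study + timepoint,
                            data = toy_combined, FUN = median)
print(median_entropy)
```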
Figure 1.
Shannon Entropy S computed for each time point from the reference dataset
Only cells that pass QC (“good_cell” == “TRUE”) are plotted here. Entropy scores for individual cells are plotted as dots.
Figure 2.
Gene expression trends over the negative value of entropy score, with relative gene expression levels normalized by counts per million
The relative expression values of a queried gene are plotted as dots. This dataset contains data acquired from primary mouse CMs from embryonic days 14 and 18 (e14, e18) and postnatal days 0, 4, 8, 11, 14, 18, 22, 28, 35, and 56 (p0, p4, p8, p11, p14, p18, p22, p28, p35, p56).
Figure 3.
Shannon Entropy S computed for murphy_data and kannan_ref_data for each time point plotted in one graph
Cells from the two datasets are plotted in different colors. Similar to Figure 1, only cells that pass QC (“good_cell” == “TRUE”) are plotted. Entropy scores for individual cells are plotted as dots.
Expected outcomes
Quality control remains an active area of discussion in scRNA-seq analysis. In addition to computing entropy scores, our R code computes multiple metrics that can enable users to make decisions about individual cell quality. Some of the metrics included are:
“top5” and “top5_norm”: Several pipelines have suggested using the number of reads attributed to the top expressed genes as a QC metric.32,43 We compute this percentage as “top5.” We further normalize within time points within datasets (top5_norm) to obtain a QC metric that can be standardized across studies.
“mito” and “mito_all”: The number of mitochondrial reads can reflect cell membrane integrity when cells are being captured for generating the scRNA-seq library.32 We compute the percentage of mitochondrial reads among genes used for entropy score calculation as “mito” and among all genes as “mito_all”. Both values are provided for users to evaluate the quality of their specific samples or datasets.
“depth_norm”: We found in our original study that cells with low depth can lead to inaccurate entropy quantification.1 We compute the normalized depth of each cell relative to the median “depth” value of the corresponding time point as “depth_norm”.
“ribo”: A high percentage of reads from ribosomal protein genes is associated with RNA degradation and is thus used as a QC metric.44 We compute this percentage as “ribo”.
“max_celltype”: We use SingleCellNet to classify cells by comparing against the Tabula Muris reference.41,42 “max_celltype” is the most likely cell type determined by the SingleCellNet pipeline.
“good_cell”: This column returns “TRUE” if a cell meets all the criteria and passes QC. Our default criteria are “top5_norm” less than 1.3, “depth_norm” greater than −0.5, and “max_celltype” equal to “cardiac muscle cell”. Since “max_celltype” equal to “cardiac muscle cell” is required for “good_cell” == “TRUE” in the data_qc() function, this QC criterion does not apply to other cell types. For how to select cells that pass QC for cell types other than cardiac muscle cell, please refer to Step 4 in the “Data visualization examples” section.
By providing these metrics, we enable users to make informed QC decisions about their input datasets. However, based on extensive benchmarking across multiple datasets, we recommend a default thresholding strategy based on “depth_norm”, “top5_norm”, and “max_celltype”. These selection criteria are already implemented by default in the output of data_qc() in the column “good_cell”, i.e., a cell is marked as TRUE if it meets those criteria and FALSE if not. Note that “mito” and “ribo” are not used as thresholds for “good_cell”. Therefore, we recommend using these values to flag low-quality datasets and rerunning data_qc() with only high-quality datasets.
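To make two of these metrics concrete, the sketch below computes them on a simulated count matrix. The definitions are assumptions for illustration: “top5” is taken as the share of a cell's counts contributed by its five most highly expressed genes, and ribosomal protein genes are matched by the mouse symbol prefixes “Rpl” and “Rps”. The exact calculations inside data_qc() may differ.

```r
# Toy genes-by-cells counts (simulated values)
toy_counts <- matrix(
  c(50, 20, 10,  8,  6,  4,  2,    # cell_1
    10, 10, 10, 10, 10, 10, 10),   # cell_2
  nrow = 7,
  dimnames = list(c("Myh6", "Actc1", "Tnnt2", "Rpl13", "Rps6", "Mb", "Nppa"),
                  c("cell_1", "cell_2"))
)

# Assumed definition: share of a cell's counts taken by its five top genes
top5_fraction <- function(counts) {
  apply(counts, 2, function(x) sum(sort(x, decreasing = TRUE)[1:5]) / sum(x))
}

# Assumed definition: share of counts from ribosomal protein genes,
# matched here by the mouse symbol prefixes "Rpl" and "Rps"
ribo_fraction <- function(counts) {
  is_ribo <- grepl("^Rp[ls]", rownames(counts))
  colSums(counts[is_ribo, , drop = FALSE]) / colSums(counts)
}

top5_fraction(toy_counts)
ribo_fraction(toy_counts)
```

In this toy example cell_1 concentrates 94% of its counts in five genes, whereas cell_2, with evenly spread counts, has a lower value.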
An example of qc output is shown below in Tables 2, 3, 4, and 5. The row names are individual cell barcodes in the same order as the counts table.
Table 2.
Columns 1–5 of data_qc() output
Cell barcode | timepoint | study | depth | genes | entropy |
---|---|---|---|---|---|
AAGAGGCAAAAGTT | e18 | Kannan Reference | 13122 | 3618 | 6.09789614267799 |
AAGAGGCAATATAG | e14 | Kannan Reference | 32392 | 6511 | 6.20830508482549 |
AAGAGGCAATCAAA | e18 | Kannan Reference | 22638 | 4752 | 5.98835728271229 |
AAGAGGCACCCTGG | p0 | Kannan Reference | 25684 | 4468 | 5.68027652156496 |
… | … | … | … | … | … |
Table 3.
Columns 6–8 of data_qc() output
Cell barcode | top5 | top5_norm | depth_norm |
---|---|---|---|
AAGAGGCAAAAGTT | 0.0823045267489712 | 0.953345229637834 | −0.266424340364696 |
AAGAGGCAATATAG | 0.0556310200049395 | 1.00143035583828 | 0.592636018258444 |
AAGAGGCAATCAAA | 0.0961215655093206 | 1.11338998671512 | 0.278915358922847 |
AAGAGGCACCCTGG | 0.123773555520947 | 1.22995318293779 | −0.867837685768919 |
… | … | … | … |
Table 4.
Columns 9–11 of data_qc() output
Cell barcode | mito | mito_all | ribo |
---|---|---|---|
AAGAGGCAAAAGTT | 0.110564304461942 | 0.116979119036732 | 0.0376528547549925 |
AAGAGGCAATATAG | 0.0658584757883266 | 0.0795258088416893 | 0.0528837622005324 |
AAGAGGCAATCAAA | 0.0946907498631637 | 0.132299673116 | 0.0338911471255873 |
AAGAGGCACCCTGG | 0.156433458397705 | 0.184433888802367 | 0.0417477617750097 |
… | … | … | … |
Table 5.
Columns 12–15 of data_qc() output
Cell barcode | cm_score | max_score | max_celltype | good_cell |
---|---|---|---|---|
AAGAGGCAAAAGTT | 0.714 | 0.714 | cardiac muscle cell | TRUE |
AAGAGGCAATATAG | 0.494 | 0.494 | cardiac muscle cell | TRUE |
AAGAGGCAATCAAA | 0.614 | 0.614 | cardiac muscle cell | TRUE |
AAGAGGCACCCTGG | 0.786 | 0.786 | cardiac muscle cell | FALSE |
… | … | … | … | … |
It is expected that only a portion of cells will pass QC (good_cell == “TRUE”). The fraction of cells that pass QC depends largely on the quality of the cells used to generate the sequencing library. In Figure 4, we compare the entropy scores of all cells (Figure 4A) and cells that pass QC (Figure 4B) from the Murphy dataset (“murphy_data”) included in “clean_060720.RData”. As shown in Figure 4C, the QC process mainly eliminated cells that have artificially low entropy scores, likely due to poor cell quality. By applying the same QC standards to all datasets, the transcriptomic entropy score can help us compare the developmental maturity of cardiomyocytes from different studies.
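The fraction of cells passing QC can be tabulated per time point to spot samples of poor quality. This is a minimal sketch on a simulated data frame (toy_qc) that mimics the “timepoint” and “good_cell” columns of the data_qc() output.

```r
# Toy stand-in for the data_qc() output (simulated values)
toy_qc <- data.frame(
  timepoint = c("p0", "p0", "p0", "p0", "p28", "p28"),
  good_cell = c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "FALSE")
)

# Fraction of cells passing QC at each time point.
# as.character() lets the comparison work whether good_cell is stored as
# text or as a logical.
passed <- as.character(toy_qc$good_cell) == "TRUE"
pass_rate <- tapply(passed, toy_qc$timepoint, mean)
print(pass_rate)
```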
Figure 4.
Comparing Shannon Entropy S of cells before and after QC
(A) Entropy scores of all cells from the Murphy dataset (combined_datasets$data == "murphy_data").
(B) Entropy scores of only cells that pass QC from the same dataset.
(C) Cells that pass QC and have a value of “TRUE” in the “good_cell” column are shown in blue. Cells that did not pass QC and have a value of “FALSE” in the “good_cell” column are shown in red.
Limitations
We have shown that in vivo maturation time points are strongly correlated with decreasing transcriptomic entropy. Therefore, the transcriptomic entropy score serves as a quantitative metric to decode the maturation state of cardiomyocytes. Through the process of developing the entropy score, we found that careful QC is crucial for computing it. We incorporated QC steps by setting thresholds for “top5_norm”, “depth_norm”, and “max_celltype” in the entropy-generating function. However, users need to assess the quality of their datasets themselves based on “mito” and the read depth distribution. As mentioned previously, entropy scores will be drastically lower if input cells have an unreasonably high percentage of mitochondrial or ribosomal reads, and will thus fail to reflect the maturation state of the cells. We set the “mito” threshold to 30% for PSC-CMs, as we observed that the majority of good-quality cells fall below this threshold. However, this threshold is subject to change if users work with more mature PSC-CMs in the future or with a cell type other than cardiomyocytes. Therefore, thorough exploratory analysis of the input datasets is necessary for determining a proper “mito” threshold.
Datasets generated by Drop-seq library preparation methods are not suitable for entropy score analysis. We found that the entropy score performs poorly on Drop-seq datasets, likely due to the low sequencing depth. Similarly, Nuc-seq data are not suitable given their low read counts. We suspect that entropy scores may still be informative within Nuc-seq datasets, but they cannot be compared with scores from studies using whole-cell RNA-seq data.
In addition, the entropy score is not recommended for bulk RNA-seq datasets. Bulk RNA-seq samples contain cardiomyocytes of varied quality. Since the transcriptomic information is pooled in bulk RNA-seq data, we cannot properly QC bulk samples with the metrics we developed. Furthermore, bulk RNA-seq samples often contain varied percentages of non-cardiomyocytes. In the past, we observed extensive batch effects due to the varied purity of cardiomyocytes in bulk RNA-seq samples, and thus no meaningful comparison could be made.
To study subpopulations of a specific cell type using the entropy score, for example, atrial and ventricular cardiomyocytes, we recommend annotating the cell subtypes using established markers or a reference single-cell RNA-seq atlas of your choice and incorporating the annotation using the “other_meta” input field in the data_qc() function. Subsequently, you can plot entropy scores of the cell subpopulations of interest.
Troubleshooting
Problem 1
In Step 3, you receive the following error message:
Error in matrix(0, nrow = length(genes), ncol = ncol(expDat)) :
non-numeric matrix extent
Potential solution
This error occurs likely because 1) the gene names in the counts table are not in the form of gene symbols or, 2) the species is not typed correctly. For scenario 1), please use rename_genes() function to convert to gene names as described in Step 1. For scenario 2), please check if “species =” argument in the data_qc() function has been correctly input.
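For scenario 1), a quick check of the row names can confirm whether conversion is needed. The sketch below relies on the fact that Ensembl gene IDs start with “ENSG” for human and “ENSMUSG” for mouse; the helper name ensembl_fraction() is hypothetical.

```r
# Hypothetical helper: fraction of row names that look like Ensembl gene IDs
# (human IDs start with "ENSG", mouse IDs with "ENSMUSG")
ensembl_fraction <- function(gene_names) {
  mean(grepl("^ENS(MUS)?G[0-9]+", gene_names))
}

symbol_names  <- c("Gnai3", "Pbsn", "Cdc45", "H19")
ensembl_names <- c("ENSMUSG00000000001", "ENSMUSG00000000003",
                   "ENSMUSG00000000028")

ensembl_fraction(symbol_names)   # 0: already gene symbols
ensembl_fraction(ensembl_names)  # 1: convert with rename_genes() first
```

On a real dataset, the function would be called as ensembl_fraction(rownames(counts_table)).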
Problem 2
In Step 3, you receive the following error message:
Error in `$<-.data.frame`(`*tmp*`, "timepoint", value = c(21L, 18L, 21L, :
replacement has 930 rows, data has 936
Potential solution
This error occurs because the length of the “timepoint” vector does not match the number of columns in the counts table; both should equal the number of cells. In the case of the error message above, the “timepoint” vector has 930 entries while the counts table has 936 columns. To resolve this problem, make sure the “timepoint” vector has an entry for every cell in the counts table and that the order of cells matches that in the counts table. Likewise, this error message can appear for “other_meta” for the same reason.
Problem 3
At the end of Step 3, you find that the entropy scores of the cardiomyocytes in one or several experimental groups are unexpectedly low, i.e., <5 for embryonic or perinatal cardiomyocytes or <4 for adult cardiomyocytes.
Potential solution
This problem is likely due to the low quality of the input cells used for sequencing, which is mainly reflected in a high percentage of mitochondrial reads. We recommend discarding cells, or sometimes the entire sample, with an unexpectedly high percentage of mitochondrial reads compared to the predetermined “mito” threshold for your dataset.
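One way to decide whether to drop individual cells or an entire sample is to look at how many cells in each group exceed the chosen threshold. The sketch below uses a simulated data frame (toy_qc) with “timepoint”, “mito”, and “entropy” columns; the group labels and the 0.30 cutoff are example values only, and the threshold should be chosen per dataset.

```r
# Toy stand-in for the data_qc() output (simulated values)
toy_qc <- data.frame(
  timepoint = c("d30", "d30", "d30", "d60", "d60", "d60"),
  mito      = c(0.10, 0.15, 0.35, 0.40, 0.45, 0.20),
  entropy   = c(6.1, 6.0, 4.2, 3.9, 3.8, 5.7)
)

mito_threshold <- 0.30  # example value; choose one per dataset

# Share of cells above the threshold in each group
high_mito_share <- tapply(toy_qc$mito > mito_threshold, toy_qc$timepoint, mean)
print(high_mito_share)

# Keep only cells at or below the threshold
toy_qc_kept <- toy_qc[toy_qc$mito <= mito_threshold, ]
```

A group in which most cells exceed the threshold, such as “d60” in this toy example, is a candidate for excluding the whole sample.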
Problem 4
In Step 5 of the “Data visualization examples” section, you receive the following error message:
Error in `geom_jitter()`:
! Problem while computing aesthetics.
Error occurred in the 1st layer.
Caused by error in `FUN()`:
! object 'myh7' not found
Potential solution
This error occurs because the gene name was not entered correctly. For mouse datasets, gene names should be typed with the first letter in uppercase, e.g., Myh7. For human datasets, gene names should be typed in all uppercase letters, e.g., MYH7.
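A case-insensitive lookup can reveal how a gene is spelled in your data before plotting. This is a small sketch with a hypothetical helper, find_gene(), and a simulated vector of gene names.

```r
# Hypothetical helper: find the stored spelling of a gene, ignoring case
find_gene <- function(query, gene_names) {
  gene_names[tolower(gene_names) == tolower(query)]
}

mouse_genes_example <- c("Myh6", "Myh7", "Actc1")

find_gene("myh7", mouse_genes_example)  # "Myh7"
find_gene("MYH6", mouse_genes_example)  # "Myh6"
find_gene("Nppa", mouse_genes_example)  # character(0): gene not present
```

With the objects from Step 5, the second argument would be colnames(plot_data).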
Problem 5
When running the code, you find that certain packages are no longer supported by the version of R you are using.
Potential solution
This may occur when using a newer version of R. The code described in this protocol was run in R version 4.1.2. We recommend reverting to this version of R. R version 4.1.2 for Windows is available at https://cran.r-project.org/bin/windows/base/old/. R version 4.1.2 for macOS is available at https://cran.r-project.org/bin/macosx/base/.
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Chulan Kwon (ckwon13@jhmi.edu).
Technical contact
Elaine Zhelan Chen (zchen119@jhu.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
All R workspace files are available on Synapse at https://www.synapse.org/#!Synapse:syn21788425/files/. R code used in this protocol is available on GitHub at https://github.com/elainezhlchen/entropy_star_protocol. R code is also available on Zenodo at https://doi.org/10.5281/zenodo.10971926. Entropy function code and code used in the original publication are available on GitHub at https://github.com/skannan4/cm-entropy-score#readme. All sequencing data produced in the original study can be found on GEO with accession number GSE147807. Additional information is available from the lead contact upon request.
Acknowledgments
We thank Kwon lab members for helpful discussions and critical feedback in preparing this manuscript. This work was supported by grants from NIH (R01HL156947 and T32HL007227), AHA (23TPA1058685), and MSCRF (2023-MSCRFD-6139). The graphical abstract was created with BioRender.com.
Author contributions
Conceptualization, S.K. and M.F.; research design and methodology, S.K. and M.F.; data analysis and interpretation, E.Z.C., S.K., and M.F.; writing – original draft, E.Z.C., S.K., S.M., and C.K.; writing – review and editing, E.Z.C., S.K., S.M., and C.K.; funding acquisition, C.K.
Declaration of interests
The authors declare no competing interests.
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xpro.2024.103083.
References
- 1. Kannan S., Farid M., Lin B.L., Miyamoto M., Kwon C. Transcriptomic entropy benchmarks stem cell-derived cardiomyocyte maturation against endogenous tissue at single cell level. PLoS Comput. Biol. 2021;17 doi: 10.1371/journal.pcbi.1009305.
- 2. Brodehl A., Ebbinghaus H., Deutsch M.A., Gummert J., Gärtner A., Ratnavadivel S., Milting H. Human Induced Pluripotent Stem-Cell-Derived Cardiomyocytes as Models for Genetic Cardiomyopathies. Int. J. Mol. Sci. 2019;20:4381. doi: 10.3390/ijms20184381.
- 3. Paik D.T., Chandy M., Wu J.C. Patient and Disease-Specific Induced Pluripotent Stem Cells for Discovery of Personalized Cardiovascular Drugs and Therapeutics. Pharmacol. Rev. 2020;72:320–342. doi: 10.1124/pr.116.013003.
- 4. Karbassi E., Fenix A., Marchiano S., Muraoka N., Nakamura K., Yang X., Murry C.E. Cardiomyocyte maturation: advances in knowledge and implications for regenerative medicine. Nat. Rev. Cardiol. 2020;17:341–359. doi: 10.1038/s41569-019-0331-x.
- 5. Murphy S.A., Chen E.Z., Tung L., Boheler K.R., Kwon C. Maturing heart muscle cells: Mechanisms and transcriptomic insights. Semin. Cell Dev. Biol. 2021;119:49–60. doi: 10.1016/j.semcdb.2021.04.019.
- 6. Tandon N., Cannizzaro C., Chao P.H.G., Maidhof R., Marsano A., Au H.T.H., Radisic M., Vunjak-Novakovic G. Electrical stimulation systems for cardiac tissue engineering. Nat. Protoc. 2009;4:155–173. doi: 10.1038/nprot.2008.183.
- 7. Hirt M.N., Boeddinghaus J., Mitchell A., Schaaf S., Börnchen C., Müller C., Schulz H., Hubner N., Stenzig J., Stoehr A., et al. Functional improvement and maturation of rat and human engineered heart tissue by chronic electrical stimulation. J. Mol. Cell. Cardiol. 2014;74:151–161. doi: 10.1016/j.yjmcc.2014.05.009.
- 8. Ma R., Liang J., Huang W., Guo L., Cai W., Wang L., Paul C., Yang H.T., Kim H.W., Wang Y. Electrical Stimulation Enhances Cardiac Differentiation of Human Induced Pluripotent Stem Cells for Myocardial Infarction Therapy. Antioxid. Redox Signal. 2018;28:371–384. doi: 10.1089/ars.2016.6766.
- 9. Ruan J.L., Tulloch N.L., Razumova M.V., Saiget M., Muskheli V., Pabon L., Reinecke H., Regnier M., Murry C.E. Mechanical Stress Conditioning and Electrical Stimulation Promote Contractility and Force Maturation of Induced Pluripotent Stem Cell-Derived Human Cardiac Tissue. Circulation. 2016;134:1557–1567. doi: 10.1161/CIRCULATIONAHA.114.014998.
- 10. Lui C., Chin A.F., Park S., Yeung E., Kwon C., Tomaselli G., Chen Y., Hibino N. Mechanical stimulation enhances development of scaffold-free, 3D-printed, engineered heart tissue grafts. J. Tissue Eng. Regen. Med. 2021;15:503–512. doi: 10.1002/term.3188.
- 11. Feyen D.A.M., McKeithan W.L., Bruyneel A.A.N., Spiering S., Hörmann L., Ulmer B., Zhang H., Briganti F., Schweizer M., Hegyi B., et al. Metabolic Maturation Media Improve Physiological Function of Human iPSC-Derived Cardiomyocytes. Cell Rep. 2020;32 doi: 10.1016/j.celrep.2020.107925.
- 12. Funakoshi S., Fernandes I., Mastikhina O., Wilkinson D., Tran T., Dhahri W., Mazine A., Yang D., Burnett B., Lee J., et al. Generation of mature compact ventricular cardiomyocytes from human pluripotent stem cells. Nat. Commun. 2021;12:3155. doi: 10.1038/s41467-021-23329-z.
- 13. Cho G.S., Lee D.I., Tampakakis E., Murphy S., Andersen P., Uosaki H., Chelko S., Chakir K., Hong I., Seo K., et al. Neonatal Transplantation Confers Maturation of PSC-Derived Cardiomyocytes Conducive to Modeling Cardiomyopathy. Cell Rep. 2017;18:571–582. doi: 10.1016/j.celrep.2016.12.040.
- 14. Cho G.S., Tampakakis E., Andersen P., Kwon C. Use of a neonatal rat system as a bioincubator to generate adult-like mature cardiomyocytes from human and mouse pluripotent stem cells. Nat. Protoc. 2017;12:2097–2109. doi: 10.1038/nprot.2017.089.
- 15. Chanthra N., Abe T., Miyamoto M., Sekiguchi K., Kwon C., Hanazono Y., Uosaki H. A Novel Fluorescent Reporter System Identifies Laminin-511/521 as Potent Regulators of Cardiomyocyte Maturation. Sci. Rep. 2020;10:4249. doi: 10.1038/s41598-020-61163-3.
- 16. Miyamoto M., Nam L., Kannan S., Kwon C. Heart organoids and tissue models for modeling development and disease. Semin. Cell Dev. Biol. 2021;118:119–128. doi: 10.1016/j.semcdb.2021.03.011.
- 17. Hofbauer P., Jahnel S.M., Papai N., Giesshammer M., Deyett A., Schmidt C., Penc M., Tavernini K., Grdseloff N., Meledeth C., et al. Cardioids reveal self-organizing principles of human cardiogenesis. Cell. 2021;184:3299–3317.e22. doi: 10.1016/j.cell.2021.04.034.
- 18. Murphy S.A., Miyamoto M., Kervadec A., Kannan S., Tampakakis E., Kambhampati S., Lin B.L., Paek S., Andersen P., Lee D.I., et al. PGC1/PPAR drive cardiomyocyte maturation at single cell level via YAP1 and SF3B2. Nat. Commun. 2021;12:1648. doi: 10.1038/s41467-021-21957-z.
- 19. Kambhampati S., Murphy S., Uosaki H., Kwon C. Cross-Organ Transcriptomic Comparison Reveals Universal Factors During Maturation. J. Comput. Biol. 2022;29:1031–1044. doi: 10.1089/cmb.2021.0349.
- 20. Parikh S.S., Blackwell D.J., Gomez-Hurtado N., Frisk M., Wang L., Kim K., Dahl C.P., Fiane A., Tønnessen T., Kryshtal D.O., et al. Thyroid and Glucocorticoid Hormones Promote Functional T-Tubule Development in Human-Induced Pluripotent Stem Cell-Derived Cardiomyocytes. Circ. Res. 2017;121:1323–1330. doi: 10.1161/CIRCRESAHA.117.311920.
- 21. Kowalski W.J., Garcia-Pak I.H., Li W., Uosaki H., Tampakakis E., Zou J., Lin Y., Patterson K., Kwon C., Mukouyama Y.S. Sympathetic Neurons Regulate Cardiomyocyte Maturation in Culture. Front. Cell Dev. Biol. 2022;10 doi: 10.3389/fcell.2022.850645.
- 22. Tampakakis E., Gangrade H., Glavaris S., Htet M., Murphy S., Lin B.L., Liu T., Saberi A., Miyamoto M., Kowalski W., et al. Heart neurons use clock genes to control myocyte proliferation. Sci. Adv. 2021;7 doi: 10.1126/sciadv.abh4181.
- 23. Dunn K.K., Reichardt I.M., Simmons A.D., Jin G., Floy M.E., Hoon K.M., Palecek S.P. Coculture of Endothelial Cells with Human Pluripotent Stem Cell-Derived Cardiac Progenitors Reveals a Differentiation Stage-Specific Enhancement of Cardiomyocyte Maturation. Biotechnol. J. 2019;14 doi: 10.1002/biot.201800725.
- 24. Uosaki H., Cahan P., Lee D.I., Wang S., Miyamoto M., Fernandez L., Kass D.A., Kwon C. Transcriptional Landscape of Cardiomyocyte Maturation. Cell Rep. 2015;13:1705–1716. doi: 10.1016/j.celrep.2015.10.032.
- 25. Cahan P., Li H., Morris S.A., Lummertz da Rocha E., Daley G.Q., Collins J.J. CellNet: network biology applied to stem cell engineering. Cell. 2014;158:903–915. doi: 10.1016/j.cell.2014.07.020.
- 26. Kannan S., Miyamoto M., Lin B.L., Zhu R., Murphy S., Kass D.A., Andersen P., Kwon C. Large Particle Fluorescence-Activated Cell Sorting Enables High-Quality Single-Cell RNA Sequencing and Functional Analysis of Adult Cardiomyocytes. Circ. Res. 2019;125:567–569. doi: 10.1161/CIRCRESAHA.119.315493.
- 27. Raju R., Chau D., Notelaers T., Myers C.L., Verfaillie C.M., Hu W.S. In Vitro Pluripotent Stem Cell Differentiation to Hepatocyte Ceases Further Maturation at an Equivalent Stage of E15 in Mouse Embryonic Liver Development. Stem Cells Dev. 2018;27:910–921. doi: 10.1089/scd.2017.0270.
- 28. Baxter M., Withey S., Harrison S., Segeritz C.P., Zhang F., Atkinson-Dell R., Rowe C., Gerrard D.T., Sison-Young R., Jenkins R., et al. Phenotypic and functional analyses show stem cell-derived hepatocyte-like cells better mimic fetal rather than adult hepatocytes. J. Hepatol. 2015;62:581–589. doi: 10.1016/j.jhep.2014.10.016.
- 29. Sun Z.Y., Yu T.Y., Jiang F.X., Wang W. Functional maturation of immature β cells: A roadblock for stem cell therapy for type 1 diabetes. World J. Stem Cells. 2021;13:193–207. doi: 10.4252/wjsc.v13.i3.193.
- 30. Autar K., Guo X., Rumsey J.W., Long C.J., Akanda N., Jackson M., Narasimhan N.S., Caneus J., Morgan D., Hickman J.J. A functional hiPSC-cortical neuron differentiation and maturation model and its application to neurological disorders. Stem Cell Rep. 2022;17:96–109. doi: 10.1016/j.stemcr.2021.11.009.
- 31. de Leeuw S.M., Davaz S., Wanner D., Milleret V., Ehrbar M., Gietl A., Tackenberg C. Increased maturation of iPSC-derived neurons in a hydrogel-based 3D culture. J. Neurosci. Methods. 2021;360 doi: 10.1016/j.jneumeth.2021.109254.
- 32. Ilicic T., Kim J.K., Kolodziejczyk A.A., Bagger F.O., McCarthy D.J., Marioni J.C., Teichmann S.A. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 2016;17:29. doi: 10.1186/s13059-016-0888-1.
- 33. Guo Y., Pu W.T. Cardiomyocyte Maturation: New Phase in Development. Circ. Res. 2020;126:1086–1106. doi: 10.1161/CIRCRESAHA.119.315862.
- 34. Kannan S., Miyamoto M., Zhu R., Lynott M., Guo J., Chen E.Z., Colas A.R., Lin B.L., Kwon C. Trajectory reconstruction identifies dysregulation of perinatal maturation programs in pluripotent stem cell-derived cardiomyocytes. Cell Rep. 2023;42 doi: 10.1016/j.celrep.2023.112330.
- 35. Wickham H. ggplot2. Springer; 2016. Data analysis; pp. 189–201. https://ggplot2.tidyverse.org
- 36. Wickham H. Reshaping Data with the reshape Package. J. Stat. Softw. 2007;21:1–20. http://www.jstatsoft.org/v21/i12/
- 37. Maechler M., Bates D. 2nd Introduction to the Matrix package. 2006. https://cran.r-project.org/web/packages/Matrix/Matrix.pdf
- 38. Murrell P. Chapman & Hall/CRC Press; 2022. R Graphics. https://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html
- 39. Wickham H. 2023. Stringr: Simple, Consistent Wrappers for Common String Operations. https://github.com/tidyverse/stringr, https://stringr.tidyverse.org
- 40. Wickham H., François R., Henry L., Müller K., Vaughan D. 2023. Dplyr: A Grammar of Data Manipulation. R Package Version 1.0.7. https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org
- 41. Tan Y., Cahan P. SingleCellNet: A Computational Tool to Classify Single Cell RNA-Seq Data Across Platforms and Across Species. Cell Syst. 2019;9:207–213.e2. doi: 10.1016/j.cels.2019.06.004.
- 42. Tabula Muris Consortium, Overall coordination, Logistical coordination, Organ collection and processing, Library preparation and sequencing, Computational data analysis, Cell type annotation, Writing group, Supplemental text writing group, Principal investigators. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–372. doi: 10.1038/s41586-018-0590-4.
- 43. Heumos L., Schaar A.C., Lance C., Litinetskaya A., Drost F., Zappia L., Lücken M.D., Strobl D.C., Henao J., Curion F., et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 2023;24:550–572. doi: 10.1038/s41576-023-00586-w.
- 44. Subramanian A., Alperovich M., Yang Y., Li B. Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics. Genome Biol. 2022;23:267. doi: 10.1186/s13059-022-02820-w.