An increment of diversity method for cell state trajectory inference of time-series scRNA-seq data

Yan Hong; Hanshuang Li; Chunshen Long; Pengfei Liang; Jian Zhou; Yongchun Zuo

doi:10.1016/j.fmre.2024.01.020

2024 Feb 9;4(4):770–776.



PMCID: PMC11330101 PMID: 39156571

Abstract

With the increasing availability of time-series single-cell RNA sequencing (scRNA-seq) data, inferring developmental trajectories by connecting transcriptionally similar cell states (i.e., cell types or clusters) has become a major challenge. Most existing computational methods are designed for individual cells and do not take the available time-series information into account. We present IDTI (Increment of Diversity for Trajectory Inference), which combines time-series information with the minimum increment of diversity method to infer cell state trajectories from time-series scRNA-seq data. We apply IDTI to a simulated dataset and three real datasets covering diverse tissue development, and compare it with six other commonly used trajectory inference methods in terms of topology similarity and branching accuracy. The results show that IDTI accurately constructs cell state trajectories without requiring starting cells to be specified. Performance tests further demonstrate that IDTI has high accuracy and strong robustness.

Keywords: Increment of diversity, Time-series scRNA-seq data, Cell state trajectory inference, Topology similarity, Branching accuracy


1. Introduction

Cell development and differentiation are dynamic processes that form the basis of ontogenesis in multicellular organisms [1]. scRNA-seq is a powerful technique for studying cell fate: transcriptome analysis at single-cell resolution can reveal the underlying developmental dynamics, cell communication, gene regulation and disease development [2]. Trajectory inference can verify known cell differentiation relationships and reveal cell developmental trajectories. In particular, reconstructing cell state trajectories between adjacent time points is key to analyzing transcriptional dynamics over time [3,4]. Accurately inferring cell state trajectories from time-series scRNA-seq data, however, remains a challenge.

In recent years, a series of trajectory inference methods based on scRNA-seq data have been developed [5], [6], [7], [8], [9]. In 2014, Trapnell et al. proposed Monocle, a pioneering trajectory inference method that infers cell trajectories by constructing a minimum spanning tree (MST) based on transcriptome similarity [10], [11], [12]. La Manno et al. proposed RNA velocity, which infers the direction and speed of cell differentiation from spliced and unspliced mRNAs [13]. Schiebinger et al. developed the landmark method Waddington-OT, which uses an optimal transport framework to model cell development in dynamic processes [14]. Setty et al. and Stassen et al. presented Palantir and VIA, respectively, both of which apply Markov chains to single-cell pseudotime analysis [15,16]. Saelens et al. developed Dyno to integrate and evaluate more than 70 trajectory inference methods available as of 2019 [17]. These computational methods have emerged to meet different needs. However, most existing trajectory inference methods are designed for individual cells: they neglect trajectory inference at the level of cell states and ignore the available time-series information. In the last several years, some approaches have inferred cell state trajectories by incorporating temporal information. For example, CSHMM uses a continuous-state hidden Markov model (HMM) to reconstruct continuous cell state trajectories [18]. Tempora combines biological pathway information to identify cell types and incorporates temporal information to infer cell state trajectories [19]. CStreet constructs k-nearest-neighbor connections between cells within each time point and between adjacent time points, and then uses force-directed graphs to estimate the connection probabilities of cell states [20]. GraphFP is a dynamic inference framework based on a nonlinear Fokker-Planck equation on a graph, which can reconstruct the potential energy landscape of cell state transitions [21].

Here, we present IDTI, the first method to apply the increment of diversity to cell state trajectory inference. Because it is designed for time-series scRNA-seq data, IDTI takes as input a gene expression matrix with time-series information. The IDTI workflow includes identifying cell states, partitioning the data by time point, calculating the increment of diversity, determining the relationships between cell states, and visualizing the inferred trajectory. Through application and comparison, we show that IDTI has high accuracy and robustness. The trajectories inferred by IDTI can therefore reflect real developmental relationships and help to understand and explain the process of cellular identity transformation.

2. Materials and methods

2.1. Data collection and preprocessing

We tested IDTI on a simulated dataset and several real time-series scRNA-seq datasets. The simulated dataset was generated with Splatter [22], an R package for the simple simulation of scRNA-seq data. The real datasets are available in the Gene Expression Omnibus (GEO) database under accession codes GSE98150 [23], GSE90047 [24] and GSE107122 [25], covering mouse early embryonic development, mouse hepatoblast differentiation and mouse cerebral cortex development, respectively. The input to IDTI, a gene expression matrix with time-series information, needs to be prepared in h5ad file format. Cell state labels were available for the datasets used in our study; otherwise, they can be obtained with the cell_clusters function of IDTI. Data preprocessing includes filtering low-count genes and cells with the functions sc.pp.filter_genes and sc.pp.filter_cells, and normalizing the data with sc.pp.normalize_total and sc.pp.log1p. We tried selecting different numbers of highly variable genes with sc.pp.highly_variable_genes for downstream analysis. The data were then rescaled with MinMaxScaler to the non-negative range [0, 1] to facilitate calculation of the logarithmic function in subsequent analyses. See the code at https://github.com/hy-1994/IDTI for specific parameters.
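To make each preprocessing step concrete, here is a minimal NumPy sketch of the same sequence on a cells × genes count matrix. It is not IDTI's code and does not call scanpy: each line only mimics what the named scanpy function or scikit-learn's MinMaxScaler does. The helper name `preprocess`, the thresholds and the fixed `target_sum` are hypothetical illustrative values (scanpy's `normalize_total` uses the median total when no target is given), and highly variable gene selection is omitted.

```python
import numpy as np


def preprocess(counts, min_cells=3, min_genes=200, target_sum=1e4):
    """Approximate the preprocessing described above on a cells x genes count matrix.

    Hypothetical helper: thresholds and target_sum are illustrative values,
    not the parameters used by IDTI. min_genes should be >= 1 so that no
    remaining cell has a zero total before normalization.
    """
    X = np.asarray(counts, dtype=float)
    # Like sc.pp.filter_genes(min_cells=...): keep genes detected in >= min_cells cells.
    X = X[:, (X > 0).sum(axis=0) >= min_cells]
    # Like sc.pp.filter_cells(min_genes=...): keep cells expressing >= min_genes genes.
    X = X[(X > 0).sum(axis=1) >= min_genes, :]
    # Like sc.pp.normalize_total(target_sum=...): give every cell the same total count.
    X = X / X.sum(axis=1, keepdims=True) * target_sum
    # Like sc.pp.log1p: natural log of (1 + x).
    X = np.log1p(X)
    # Like sklearn's MinMaxScaler: rescale each gene (column) to [0, 1];
    # constant genes map to 0 instead of dividing by zero.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


counts = np.array([[1, 0, 3],
                   [2, 2, 0],
                   [0, 0, 0],
                   [4, 1, 1]])
scaled = preprocess(counts, min_cells=2, min_genes=1)
print(scaled.shape)  # (3, 3): the all-zero cell is removed
```

Scaling per gene (column), as MinMaxScaler does by default, keeps every value non-negative, which matters because the diversity measure below applies a logarithm to each summed expression value.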

2.2. Methods

2.2.1. The measure of diversity

As early as 1978, Laxton proposed the concept of the measure of diversity [26], which was applied to the geographical distribution of biological species. Consider the high-dimensional gene expression space $S = \{X_{1}, X_{2}, \dots, X_{n}\}$ composed of n cell states. For a cell state $X \in S$, let $x_{i}$ denote the sum of the expression values in the i-th dimension (gene) of cell state X. The measure of diversity of $X: [x_{1}, x_{2}, \dots, x_{m}]$ is defined as

$$D(X) = N_{X} \log_{b} N_{X} - \sum_{i=1}^{m} x_{i} \log_{b} x_{i} \tag{1}$$

where $N_{X} = \sum_{i=1}^{m} x_{i}$ is the total expression of X; b is the base of the logarithm, here e; and if $x_{i} = 0$, then $\log_{b} x_{i}$ is taken as 0. Similarly, for another cell state $Y: [y_{1}, y_{2}, \dots, y_{m}]$, D(Y) is defined by Eq. (2), where $N_{Y} = \sum_{i=1}^{m} y_{i}$ is the total expression of Y.

$$D(Y) = N_{Y} \log_{b} N_{Y} - \sum_{i=1}^{m} y_{i} \log_{b} y_{i} \tag{2}$$

The measure of diversity closely resembles information entropy [27,28]: both describe a state space from an information-theoretic perspective, and both are built on the logarithmic function. Their meanings differ, however. Information entropy describes the uncertainty or disorder of a state, whereas the measure of diversity describes its overall uncertainty. Greater information entropy implies greater uncertainty, but not necessarily a larger measure of diversity; conversely, a larger measure of diversity does not necessarily indicate greater disorder.
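The relationship between the two quantities can be made explicit. Expanding Eq. (1) with $p_{i} = x_{i}/N_{X}$ gives $D(X) = -\sum_{i} x_{i} \ln(x_{i}/N_{X}) = N_{X} \cdot H(p)$, so the measure of diversity is the Shannon entropy of the composition scaled by the total expression. The short sketch below checks this numerically on two hypothetical two-gene states: the balanced one has the larger entropy but, because its total is small, the smaller measure of diversity. The function names are illustrative, not IDTI's API.

```python
import math


def xlogx(v):
    """v * ln(v), with the convention 0 * ln(0) = 0 used in Eq. (1)."""
    return v * math.log(v) if v > 0 else 0.0


def diversity(x):
    """Measure of diversity D(X) = N ln N - sum_i x_i ln x_i (Eq. 1, b = e)."""
    return xlogx(sum(x)) - sum(xlogx(v) for v in x)


def shannon_entropy(x):
    """Shannon entropy (in nats) of the composition x_i / N_X."""
    n = sum(x)
    return -sum((v / n) * math.log(v / n) for v in x if v > 0)


even_small = [5.0, 5.0]      # hypothetical state: balanced, total 10
skewed_large = [90.0, 10.0]  # hypothetical state: skewed, total 100

for x in (even_small, skewed_large):
    print(round(shannon_entropy(x), 3), round(diversity(x), 2))
# prints:
# 0.693 6.93
# 0.325 32.51
```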

2.2.2. The increment of diversity

Furthermore, the measure of diversity is extended to the concept of the increment of diversity (ID), which can quantitatively represent biological similarity. Subsequently, the ID has been widely used in the field of bioinformatics, especially in the study of biological classification [29], [30], [31], [32]. The ID between $X : [x_{1}, x_{2}, . . ., x_{m}]$ and $Y : [y_{1}, y_{2}, . . ., y_{m}]$ is calculated as

$$ID(X, Y) = D(X + Y) - D(X) - D(Y) \tag{3}$$

The smaller the value of $ID (X, Y)$ , the higher the similarity between X and Y. D(X + Y) can be calculated as

$$D(X + Y) = (N_{X} + N_{Y}) \log_{b} (N_{X} + N_{Y}) - \sum_{i=1}^{m} (x_{i} + y_{i}) \log_{b} (x_{i} + y_{i}) \tag{4}$$

To apply the ID to cell state trajectory inference of time-series scRNA-seq data, we first need to define standard sources. Using the available time information, we always regard the cell states at the previous time point as the standard sources for the cell states at the next time point. For example, suppose there are K standard sources $\mathrm{STD}_{k}$ (k = 1, 2, …, K) at time point $T_{i}$. Each cell state $Z \in S$ at time point $T_{i+1}$ is then linked to a standard source by the minimum increment of diversity algorithm, whose decision rule assigns Z to the source $\mathrm{STD}_{k}$ satisfying

$$ID(Z, \mathrm{STD}_{k}) = \min\{ID(Z, \mathrm{STD}_{1}), ID(Z, \mathrm{STD}_{2}), \dots, ID(Z, \mathrm{STD}_{K})\} \tag{5}$$

2.2.3. Calculation of graph edit distance and F₁ score

Here, we evaluate IDTI by comparing its inferred trajectories with known trajectories manually curated from the literature. Two metrics are used to evaluate the topology similarity and branching accuracy between trajectories: the graph edit distance (GED) score and the F₁ score.

The GED score measures the dissimilarity between two graphs G₁ and G₂ [33] and is defined as follows:

$$GED(G_{1}, G_{2}) = \min_{(e_{1}, \dots, e_{k}) \in \gamma(G_{1}, G_{2})} \sum_{j=1}^{k} c(e_{j}) \tag{6}$$

where $\gamma(G_{1}, G_{2})$ denotes the set of all complete edit paths from graph G₁ to G₂, and $c(e_{j})$ denotes the cost of the edit operation $e_{j}$. Insertions, deletions and substitutions of nodes V and edges E make up a complete edit path, and the cost of each edit operation is defined as 1. The smallest total cost over all complete edit paths is the edit distance between the graphs. The GED score is calculated with the function nx.graph_edit_distance(G₁, G₂) in NetworkX [34]. The closer the GED score between two trajectories is to 0, the more similar they are.
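As a worked illustration of Eq. (6), take the gold trajectory of the simulated dataset (Fig. 2b), which has six directed edges. Section 3.4 reports that CStreet missed only the edge Clu1A–Clu2A; assuming its graph is otherwise identical to the gold graph, a single edge insertion of cost 1 converts one into the other. No cheaper edit path exists, since the two graphs have different numbers of edges and so at least one insertion or deletion is required. This gives GED = 1, consistent with Table 1.

$$
\begin{aligned}
E_{\text{gold}} &= \{\text{Clu1A}\to\text{Clu2A},\ \text{Clu1A}\to\text{Clu2B},\ \text{Clu2A}\to\text{Clu3A},\ \text{Clu2A}\to\text{Clu3B},\ \text{Clu2B}\to\text{Clu3C},\ \text{Clu2B}\to\text{Clu3D}\} \\
E_{\text{CStreet}} &= E_{\text{gold}} \setminus \{\text{Clu1A}\to\text{Clu2A}\} \quad \text{(assumed)} \\
GED(G_{\text{CStreet}}, G_{\text{gold}}) &= c(\text{insert edge } \text{Clu1A}\to\text{Clu2A}) = 1
\end{aligned}
$$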

The F₁ score is the harmonic mean of the precision and recall of directed-edge identification in the trajectory [35,36]. A true positive (TP) or false positive (FP) edge is an edge in the inferred trajectory that does or does not exist, respectively, in the gold-standard trajectory. A false negative (FN) edge is an edge between cell states in the gold-standard trajectory that is absent from the inferred trajectory. Precision is the number of TP edges divided by the total number of predicted edges (TP + FP), and recall is the number of TP edges divided by the number of real edges (TP + FN) [37]. The F₁ score ranges over [0, 1], and a higher F₁ score between two trajectories indicates higher branching accuracy.

$$F_{1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$

3. Results and discussion

3.1. Overview of IDTI

IDTI infers cell state trajectories from the expression matrix of time-series scRNA-seq data. IDTI first identifies cell states and then partitions the data according to the available time information. The core steps are calculating the ID and inferring the developmental relationships between cell states: we calculate the ID between cell states at adjacent time points to represent their similarity, and then determine the developmental trajectory with the minimum increment of diversity algorithm. Finally, we visualize the trajectories with a Uniform Manifold Approximation and Projection (UMAP) plot and a directed graph; the UMAP plot shows the relationships between cell states at the level of individual cells, and the directed graph shows the hierarchical structure of the developmental relationships (Fig. 1).

3.2. Application of IDTI on simulated dataset

We used the function splatSimulatePaths in Splatter to generate simulated time-series scRNA-seq data with a continuous trajectory. The simulated dataset contains 600 cells and 10,000 genes at three time points (T₁, T₂, T₃). The cells fall into seven categories: the T₁ stage contains the cell type Clu1A, the T₂ stage contains two cell types, Clu2A and Clu2B, and the T₃ stage contains four cell types, Clu3A, Clu3B, Clu3C and Clu3D (Fig. 2a). The known developmental trajectories are Clu1A–Clu2A–Clu3A, Clu1A–Clu2A–Clu3B, Clu1A–Clu2B–Clu3C and Clu1A–Clu2B–Clu3D, which we drew manually (Fig. 2b). As shown in the UMAP plot and the directed graph, IDTI accurately reconstructs the four developmental trajectories of the simulated data, which are identical to the simulated developmental trajectories (Fig. 2c,d).

Fig 2 — **IDTI analysis of the simulated dataset**. (a) Scatter plot of the Principal Component Analysis (PCA) dimensionality reduction of the simulated data, with cell states shown in different colors. (b) The gold-standard trajectory of the simulated data, used to evaluate the accuracy of the inferred trajectories. (c) UMAP plot showing the cell state trajectory inferred from the simulated dataset by IDTI. Each point represents a single cell, colored by cell state. Black nodes indicate the centers of cell states, and the arrows connecting them represent the cell state trajectory. (d) Directed graph showing the hierarchy of the cell state trajectory inferred by IDTI for the simulated dataset. The timeline on the left represents developmental stages or time points. Circles represent cell states; their relative size represents cell population size, and their relative color depth represents the increment of diversity between cell states at adjacent time points.

3.3. Application of IDTI on real time-series scRNA-seq datasets

First, we applied IDTI to the time-series scRNA-seq dataset of mouse early embryonic development, which contains 40 single cells from eight embryonic stages, from MII oocyte to embryonic day 6.6 (E6.6), and we manually mapped the gold-standard developmental trajectory (Fig. 3a). IDTI successfully predicted the separate trajectories of the trophectoderm (TE) and the inner cell mass (ICM), with the ICM gradually differentiating further into extraembryonic ectoderm (Exe) and epiblast (Epi) (Fig. 3b,c).

Fig 3 — **Application of IDTI to the real time-series scRNA-seq datasets**. Mouse early embryonic development dataset: (a) the gold-standard trajectory; (b) UMAP plot and (c) directed graph showing the cell state trajectory inferred by IDTI. Mouse hepatoblast differentiation dataset: (d) the gold-standard trajectory; (e) UMAP plot and (f) directed graph showing the cell state trajectory inferred by IDTI. Mouse cerebral cortex development dataset: (g) the gold-standard trajectory; (h) UMAP plot and (i) directed graph showing the cell state trajectory inferred by IDTI.

Next, we applied IDTI to the time-series scRNA-seq dataset of mouse hepatoblast differentiation, consisting of 447 single cells collected from embryos at E10.5–E17.5. We used the strategy proposed by Yang et al. [24] to annotate the cells, labeling them as "hepatoblast" at early time points (E10.5, E11.5), "hepatoblast/hepatocyte" at intermediate time points (E12.5, E13.5, E14.5), and "hepatocyte" or "cholangiocyte" at late time points (E15.5, E17.5). We again manually mapped the gold-standard developmental trajectory (Fig. 3d). IDTI successfully inferred the developmental trajectories of hepatoblasts differentiating through intermediate cells into hepatocytes and cholangiocytes (Fig. 3e,f).

Finally, to evaluate the performance of IDTI with multiple cell states at each time point, we applied IDTI to the time-series scRNA-seq dataset of mouse cerebral cortex development, which contains 6316 cells collected at E11.5, E13.5, E15.5 and E17.5. These cells cover a wide range of neuronal development, from early precursors (apical precursors (APs) and radial precursors (RPs)) to intermediate precursors (IPs) and differentiated cortical neurons. Tran et al. [19] used GSVA with the marker genes of APs, RPs, IPs, young neurons and neurons to automatically annotate the seven clusters. We manually mapped the gold-standard developmental trajectory based on the literature [21] (Fig. 3g). IDTI inferred two trajectories rooted in APs/RPs: one branching via IPs into young neurons and neurons, and the other converging on the neuron cluster. However, IDTI did not infer the trajectories from APs/RPs to young neurons or from neurons to young neurons (Fig. 3h,i).

3.4. Comparison of IDTI with other trajectory inference methods

Here, we compared IDTI with six other trajectory inference methods (Monocle 2, TSCAN [38], Slingshot [39], PAGA [40], Tempora and CStreet) on the simulated dataset, the mouse early embryonic development dataset, the mouse hepatoblast differentiation dataset and the mouse cerebral cortex development dataset. To facilitate comparison, we converted all inferred trajectories into graphs (networks) in which nodes represent cell states and edges represent the relationship between two cell states. Apart from IDTI and CStreet, all methods require the starting cells of the trajectory to be specified manually. We evaluated the results with two metrics: the GED score, which measures the similarity between the inferred trajectory and the gold-standard trajectory, and the F₁ score, which measures the branching accuracy of the inferred trajectory against the gold-standard trajectory.

On the simulated dataset, IDTI accurately inferred all four developmental trajectories (Fig. 2c,d). We assigned Clu1A as the trajectory starting cells for the other methods. Monocle 2 inferred pseudotime trajectories based on individual cells, but cells from different states appeared intermixed (Fig. 4a). TSCAN inferred two main trajectories but could not correctly infer the developmental trajectories of the terminal cell states (Fig. 4b). Slingshot constructed only a linear trajectory rather than the true bifurcating one (Fig. 4c). PAGA constructed a coarse-grained graph with six connections, but it did not recover the main developmental trajectories (Fig. 4d). CStreet was relatively accurate, except that it did not infer the trajectory Clu1A–Clu2A (Fig. 4e). Tempora was not applied to the simulated data because it requires pathway information. The trajectory inferred by IDTI was identical to the gold-standard trajectory, with a GED score of 0 and an F₁ score of 1. In conclusion, on the simulated dataset IDTI outperformed all other methods in topology similarity and branching accuracy, followed by CStreet, then TSCAN, and finally PAGA, Monocle 2 and Slingshot (Tables 1, 2).

Fig 4 — **Comparison of IDTI with other trajectory inference methods on the simulated dataset**. (a) Monocle 2 (b) TSCAN (c) Slingshot (d) PAGA (e) CStreet.

Table 1.

Comparison between IDTI and other methods on topology similarity (GED score; lower is better).

Methods	Simulated data	Mouse Embryo data	Mouse Hepatoblast data	Mouse Cerebral Cortex data
IDTI	0	0	0	2
Monocle 2	5	8	2	4
TSCAN	2	9	0	3
Slingshot	5	10	-ᵃ	2
PAGA	5	9	2	3
Tempora	-ᵃ	-ᵃ	2	1
CStreet	1	-ᵃ	0	0


ᵃ "-" indicates that the result of the corresponding method is not available.

Table 2.

Comparison between IDTI and other methods on branching accuracy (F₁ score; higher is better).

Methods	Simulated data	Mouse Embryo data	Mouse Hepatoblast data	Mouse Cerebral Cortex data
IDTI	1.00	1.00	1.00	0.80
Monocle 2	0.00	0.17	0.67	0.50
TSCAN	0.80	0.31	1.00	0.67
Slingshot	0.00	0.00	-ᵃ	0.80
PAGA	0.22	0.15	0.67	0.67
Tempora	-ᵃ	-ᵃ	0.80	0.91
CStreet	0.91	-ᵃ	1.00	1.00


ᵃ "-" indicates that the result of the corresponding method is not available. All values are rounded to two decimal places.

We also made comparisons on the three real datasets. On the mouse early embryonic development dataset, we assigned MII oocytes as the starting cells for the other methods. CStreet and Tempora failed to complete trajectory construction, and among the remaining methods IDTI performed best (Figs. 3c, S1). On the mouse hepatoblast differentiation dataset, hepatoblasts were assigned as the starting cells; IDTI, TSCAN and CStreet accurately inferred the real trajectory, whereas Slingshot failed to infer a trajectory (Figs. 3f, S2). On the mouse cerebral cortex development dataset, we assigned APs/RPs as the starting cells. CStreet performed best, while Tempora failed to infer the trajectory from young neurons to neurons. IDTI ranked behind CStreet and Tempora (tied with Slingshot on both metrics), with a GED score of 2 and an F₁ score of 0.80; it did not infer the trajectories from IPs to neurons or from young neurons to neurons (Figs. 3i, S3). Overall, on the real datasets IDTI achieved the best or tied-best topology similarity and branching accuracy on the mouse early embryonic development and mouse hepatoblast differentiation datasets, and performed slightly worse, but still acceptably, on the mouse cerebral cortex dataset (Tables 1, 2).

In summary, IDTI is the first method to use the increment of diversity to infer trajectories from time-series scRNA-seq data, and it can reconstruct relatively accurate trajectories without the need to define starting cells.

3.5. Evaluation of IDTI performances

To further evaluate the robustness of IDTI, we randomly perturbed the simulated dataset in two ways: by varying the cell sampling rate and the gene dropout rate. Cell sampling rates were set to 90%, 80% and 70%, and sampling was repeated 5 times with different random seeds for each rate. Gene dropout rates were set to 10%, 20%, 30%, 40% and 50%, and the perturbation was repeated 3 times with different random seeds for each rate. In total, we generated 30 perturbed datasets and constructed trajectories on each with IDTI. The trajectories constructed by IDTI on all perturbed datasets were identical to those on the original dataset; IDTI still produced reliable results when the cell sampling rate was as low as 70% or the gene dropout rate was as high as 50%. Within these tested ranges, therefore, changes in cell number and gene dropout rate did not affect IDTI trajectory inference, indicating that IDTI remains reliable on datasets with fewer cells and higher gene dropout rates and is highly robust.
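The perturbation design above (3 sampling rates × 5 seeds plus 5 dropout rates × 3 seeds, 30 datasets in total) can be sketched with NumPy as follows. The section does not say how gene dropout was simulated, so `add_dropout` assumes one simple model (zeroing a random fraction of the non-zero entries); the seeds, helper names and placeholder matrix are likewise hypothetical. Each perturbed matrix would then be passed through the same IDTI workflow.

```python
import numpy as np


def subsample_cells(X, rate, seed):
    """Keep a random fraction `rate` of the cells (rows) without replacement.

    Returns the sub-matrix and the kept row indices, so cell-state and time
    labels can be subset the same way.
    """
    rng = np.random.default_rng(seed)
    n_keep = int(round(rate * X.shape[0]))
    idx = np.sort(rng.choice(X.shape[0], size=n_keep, replace=False))
    return X[idx], idx


def add_dropout(X, rate, seed):
    """Assumed dropout model: zero out a random fraction `rate` of the non-zero entries."""
    rng = np.random.default_rng(seed)
    Y = np.array(X, dtype=float, copy=True)  # leave the original matrix untouched
    rows, cols = np.nonzero(Y)
    n_drop = int(round(rate * rows.size))
    pick = rng.choice(rows.size, size=n_drop, replace=False)
    Y[rows[pick], cols[pick]] = 0.0
    return Y


def perturbation_grid(X):
    """Yield the 3 x 5 sampling and 5 x 3 dropout settings described above."""
    for rate in (0.9, 0.8, 0.7):
        for seed in range(5):
            sub, _ = subsample_cells(X, rate, seed)
            yield ("sampling", rate, seed), sub
    for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
        for seed in range(3):
            yield ("dropout", rate, seed), add_dropout(X, rate, seed)


X = np.ones((600, 20))  # placeholder with the simulated dataset's 600 cells, 20 genes
grid = list(perturbation_grid(X))
print(len(grid))  # 30
```

Returning the kept indices from `subsample_cells` is a design choice: in a real run the cell-state and time-point labels must be subset alongside the matrix.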

4. Conclusion

With the development of sequencing technology, many computational methods for trajectory inference have been proposed. Meanwhile, time-series experiments provide temporal information that can be exploited for trajectory inference. We present IDTI, which makes full use of time-series information and uses the increment of diversity for cell state trajectory inference. IDTI is an effective trajectory reconstruction method that can reproduce the process of cell state transformation.

Time-series information is essential to the performance of IDTI because it provides the direction of the trajectory. IDTI analyzes time-series scRNA-seq data at the level of cell states rather than individual cells. Compared with trajectory inference based on single cells, the advantage of cell-state-level inference is that cells of the same cell state are not assigned to different branches. IDTI also performs well against six other commonly used trajectory inference methods on simulated and real datasets, and it does not require starting cells to be assigned. Furthermore, IDTI is highly robust across datasets with different sampling rates and dropout rates. In summary, IDTI is a computational method for cell trajectory inference from time-series scRNA-seq data that provides an easy and accurate way to understand and interpret the transition process of cell identity.

Availability

IDTI is written in python and freely available at https://github.com/hy-1994/IDTI.

Declaration of competing interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

We thank Prof. Shaorong Gao (Tongji University) for sharing the scRNA-seq data of mouse early embryonic development, Prof. Chengran Xu (Peking University) for sharing the scRNA-seq data of mouse hepatoblast differentiation and Prof. Freda Miller (University of Toronto) for sharing the scRNA-seq data of mouse cerebral cortex development in the GEO database. We thank Prof. Yong Zhang (Tongji University) for sharing the Python code of the CStreet method on GitHub. We also thank Prof. Gary Bader (University of Toronto) for annotating and preprocessing the mouse cerebral cortex development data. This work was supported by the National Natural Science Foundation of China (62061034, 62171241), the key technology research program of Inner Mongolia Autonomous Region (2021GG0398) and the Science and Technology Leading Talent Team in Inner Mongolia Autonomous Region (2022LJRC0009).

Biographies


Yan Hong is a doctoral student at the State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, China. Her research interests include bioinformatics and computational biology.


Hanshuang Li is a doctoral student at the State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, China. Her research interests include bioinformatics, systems biology and developmental biology.


Yongchun Zuo (BRID: 03153.00.08361) is a PhD principal investigator in bioinformatics. He is a professor at the State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University. Professor Zuo focuses on computational biology research, including the classification of DNA/protein sequences, codon optimization of genes and their regulatory sequences, and integrative analysis of multi-omics data in cell reprogramming.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.fmre.2024.01.020.

Appendix. Supplementary materials

mmc1.docx (1.4 MB, docx)

References

1.Zheng L., Liang P., Long C., et al. EmAtlas: A comprehensive atlas for exploring spatiotemporal activation in mammalian embryogenesis. Nucleic Acids Res. 2023;51(D1):D924–d932. doi: 10.1093/nar/gkac848. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Li H., Long C., Hong Y., et al. Characterizing cellular differentiation potency and waddington landscape via energy indicator. Research (Wash D C) 2023;6:0118. doi: 10.34133/research.0118. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Ding J., Sharon N., Bar-Joseph Z. Temporal modelling using single-cell transcriptomics. Nat. Rev. Genet. 2022;23(6):355–368. doi: 10.1038/s41576-021-00444-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Cannoodt R., Saelens W., Saeys Y. Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 2016;46(11):2496–2506. doi: 10.1002/eji.201646347. [DOI] [PubMed] [Google Scholar]
5.Chen H., Albergante L., Hsu J.Y., et al. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nat. Commun. 2019;10(1):1903. doi: 10.1038/s41467-019-09670-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Wang Z., Zhong Y., Ye Z., et al. MarkovHC: Markov hierarchical clustering for the topological structure of high-dimensional single-cell omics data with transition pathway and critical point detection. Nucleic Acids Res. 2022;50(1):46–56. doi: 10.1093/nar/gkab1132. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Rashid S., Kotton D.N., Bar-Joseph Z. TASIC: Determining branching models from time series single cell data. Bioinformatics. 2017;33(16):2504–2512. doi: 10.1093/bioinformatics/btx173. [DOI] [PubMed] [Google Scholar]
8.Xie J., Yin Y., Wang J. TIPD: A probability distribution-based method for trajectory inference from single-cell RNA-Seq data. Interdiscip. Sci. 2021;13(4):652–665. doi: 10.1007/s12539-021-00445-4. [DOI] [PubMed] [Google Scholar]
9.Guo M., Bao E.L., Wagner M., et al. SLICE: Determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 2017;45(7):e54. doi: 10.1093/nar/gkw1278. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Trapnell C., Cacchiarelli D., Grimsby J., et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32(4):381–386. doi: 10.1038/nbt.2859. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Qiu X., Mao Q., Tang Y., et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods. 2017;14(10):979–982. doi: 10.1038/nmeth.4402. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Cao J., Spielmann M., Qiu X., et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566(7745):496–502. doi: 10.1038/s41586-019-0969-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Manno G.La, Soldatov R., Zeisel A., et al. RNA velocity of single cells. Nature. 2018;560(7719):494–498. doi: 10.1038/s41586-018-0414-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Schiebinger G., Shu J., Tabaka M., et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell. 2019;176(4):928–943. doi: 10.1016/j.cell.2019.01.006.
15. Setty M., Kiseliovas V., Levine J., et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 2019;37(4):451–460. doi: 10.1038/s41587-019-0068-4.
16. Stassen S.V., Yip G.G.K., Wong K.K.Y., et al. Generalized and scalable trajectory inference in single-cell omics data with VIA. Nat. Commun. 2021;12(1):5528. doi: 10.1038/s41467-021-25773-3.
17. Saelens W., Cannoodt R., Todorov H., et al. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 2019;37(5):547–554. doi: 10.1038/s41587-019-0071-9.
18. Lin C., Bar-Joseph Z. Continuous-state HMMs for modeling time-series single-cell RNA-Seq data. Bioinformatics. 2019;35(22):4707–4715. doi: 10.1093/bioinformatics/btz296.
19. Tran T.N., Bader G.D. Tempora: Cell trajectory inference using time-series single-cell RNA sequencing data. PLoS Comput. Biol. 2020;16(9). doi: 10.1371/journal.pcbi.1008205.
20. Zhao C., Xiu W., Hua Y., et al. CStreet: A computed Cell State trajectory inference method for time-series single-cell RNA sequencing data. Bioinformatics. 2021;37(21):3774–3780. doi: 10.1093/bioinformatics/btab488.
21. Jiang Q., Zhang S., Wan L. Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data. PLoS Comput. Biol. 2022;18(1). doi: 10.1371/journal.pcbi.1009821.
22. Zappia L., Phipson B., Oshlack A. Splatter: Simulation of single-cell RNA sequencing data. Genome Biol. 2017;18(1):174. doi: 10.1186/s13059-017-1305-0.
23. Wang C., Liu X., Gao Y., et al. Reprogramming of H3K9me3-dependent heterochromatin during mammalian embryo development. Nat. Cell Biol. 2018;20(5):620–631. doi: 10.1038/s41556-018-0093-4.
24. Yang L., Wang W.H., Qiu W.L., et al. A single-cell transcriptomic analysis reveals precise pathways and regulatory mechanisms underlying hepatoblast differentiation. Hepatology. 2017;66(5):1387–1401. doi: 10.1002/hep.29353.
25. Yuzwa S.A., Borrett M.J., Innes B.T., et al. Developmental emergence of adult neural stem cells as revealed by single-cell transcriptional profiling. Cell Rep. 2017;21(13):3970–3986. doi: 10.1016/j.celrep.2017.12.017.
26. Laxton R.R. The measure of diversity. J. Theor. Biol. 1978;70(1):51–67. doi: 10.1016/0022-5193(78)90302-8.
27. Shannon C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27(3):379–423.
28. Li J., He X., Gao S., et al. The Metal-binding Protein Atlas (MbPA): An integrated database for curating metalloproteins in all aspects. J. Mol. Biol. 2023;435(14). doi: 10.1016/j.jmb.2023.168117.
29. Lu M., Liu S., Sangaiah A.K., et al. Nucleosome positioning with fractal entropy increment of diversity in telemedicine. IEEE Access. 2018;6:33451–33459.
30. Zhang L., Luo L. Splice site prediction with quadratic discriminant analysis using diversity measure. Nucleic Acids Res. 2003;31(21):6214–6220. doi: 10.1093/nar/gkg805.
31. Wu C.Y., Li Q.Z., Feng Z.X. Non-coding RNA identification based on topology secondary structure and reading frame in organelle genome level. Genomics. 2016;107(1):9–15. doi: 10.1016/j.ygeno.2015.12.002.
32. Zuo Y.C., Chen W., Fan G.L., et al. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids. 2013;44(2):573–580. doi: 10.1007/s00726-012-1374-z.
33. Gao X., Xiao B., Tao D., et al. A survey of graph edit distance. Pattern Anal. Appl. 2010;13(1):113–129.
34. Hagberg A., Swart P., Chult D.S. Exploring Network Structure, Dynamics, and Function using NetworkX. Los Alamos National Lab. (LANL); Los Alamos, NM, United States: 2008.
35. Wang H., Zhang Z., Li H., et al. A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery. Cell Biosci. 2023;13(1):41. doi: 10.1186/s13578-023-00991-y.
36. Liang P., Zheng L., Long C., et al. HelPredictor models single-cell transcriptome to predict human embryo lineage allocation. Brief. Bioinform. 2021;22(6):bbab196. doi: 10.1093/bib/bbab196.
37. Wang H., Liang P., Zheng L., et al. eHSCPr discriminating the cell identity involved in endothelial to hematopoietic transition. Bioinformatics. 2021;37(15):2157–2164. doi: 10.1093/bioinformatics/btab071.
38. Ji Z., Ji H. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 2016;44(13):e117. doi: 10.1093/nar/gkw430.
39. Street K., Risso D., Fletcher R.B., et al. Slingshot: Cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19(1):477. doi: 10.1186/s12864-018-4772-0.
40. Wolf F.A., Hamey F.K., Plass M., et al. PAGA: Graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 2019;20(1):59. doi: 10.1186/s13059-019-1663-x.

Associated Data

Supplementary Materials

mmc1.docx (1.4 MB, docx)

2.2.3. Calculation of graph edit distance and F₁ score