. 2025 Oct 23;22(11):2328–2336. doi: 10.1038/s41592-025-02870-5

PHLOWER leverages single-cell multimodal data to infer complex, multi-branching cell differentiation trajectories

Mingbo Cheng 1,2,#, Jitske Jansen 3,4,#, Katharina C Reimer 3,4, Vincent P Grande 5, James S Nagai 1,2, Zhijian Li 6,7, Paul Kießling 3, Martin Grasshoff 1,2, Christoph Kuppe 3, Michael T Schaub 2,5, Rafael Kramann 2,3,4,8, Ivan G Costa 1,2
PMCID: PMC12615267  PMID: 41131366

Abstract

Computational trajectory analysis is a key task for inferring differentiation trees from single-cell data. An open challenge is the prediction of complex, multi-branching trees from multimodal data. To address this challenge, we present PHLOWER (decomposition of the Hodge Laplacian for inferring trajectories from flows of cell differentiation), which leverages the harmonic component of the Hodge decomposition on simplicial complexes to infer trajectory embeddings from single-cell multimodal data. These natural representations of cell differentiation facilitate the estimation of the underlying differentiation trees. We evaluate PHLOWER through benchmarking on multi-branching differentiation trees and on kidney organoid multimodal and spatial single-cell data. These analyses demonstrate the power of PHLOWER both in inferring complex trees and in identifying transcription factors that regulate off-target cells in kidney organoids. Thus, PHLOWER enables the inference of complex branching trajectories and the prediction of transcriptional regulators by leveraging multimodal data.

Subject terms: Gene regulatory networks, Experimental models of disease


Main

Cellular differentiation, the process by which a cell changes its chromatin and expression programs to acquire more specialized functions, is crucial not only in the development of multicellular organisms but also during the onset and progression of diseases. Transcription factors (TFs), proteins that bind to regulatory DNA regions (open chromatin regions), are key regulators of gene expression and thereby orchestrate cellular differentiation. Dissecting these regulatory events will help to develop protocols for cellular reprogramming or ex vivo differentiation, for example, in organoids, and to understand disease-related differentiation processes for potential therapeutic interventions1. In this context, multimodal single-cell sequencing2, which can measure both full expression programs and genome-wide open chromatin, is a unique resource for understanding the interplay between chromatin, regulatory signals (TF binding) and expression changes during cellular differentiation2,3. However, the associated experimental protocols dissociate cells, making it impossible to experimentally track how an individual cell differentiates over time.

To overcome this experimental limitation, computational trajectory analysis, which explores nonlinear embeddings of the cellular space and algorithms to find trees in these spaces, has become an important tool for analyzing cell differentiation in single cells4–7. However, these approaches have so far been applied to small differentiation trees (three to nine branches), and the only comprehensive benchmarking is based on small trees with four to five differentiation branches8. To date, no work has evaluated their scalability to complex, multi-branching trajectories. Moreover, the analysis of single-cell multiomics data poses further challenges, as joint embeddings across modalities need to be estimated for modality detection9,10. Altogether, there is a lack of computational approaches to infer complex, branching trajectories from multimodal sequencing data.

We propose here PHLOWER to address this gap. Simplicial complexes (SCs) are generalizations of graphs that contain not only nodes and edges but also triangles and other higher-order structures11. The discrete Hodge Laplacian (HL) on SCs generalizes the well-known graph Laplacian used in diffusion maps12 and in trajectory inference for single-cell data13,14. PHLOWER uses SCs to represent single-cell multimodal data, where a node represents a cell and an edge indicates a potential cell differentiation event. The harmonic component obtained via spectral decomposition of the HL on an SC allows the creation of edge-flow embeddings (cell differentiation events) and trajectory embeddings (cell differentiation paths)11. These embeddings represent cell differentiation processes directly and can thus enable the detection of complex differentiation trees and the precise characterization of branching events.

Results

Differentiation tree inference with PHLOWER

PHLOWER uses the discrete HL and its associated Hodge decomposition to obtain embeddings of cell differentiation trajectories. The graph Laplacian (a zero-order HL) is a matrix representation of a graph, where samples are encoded as vertices and distances as edge weights. Particular eigenvectors of the graph Laplacian are used to embed the cells of the graph, forming the basis for methods in clustering analysis15, nonlinear dimension reduction12, and trajectory inference and pseudotime estimation in single-cell data13,14. In PHLOWER, we represent single-cell data as an SC, that is, a higher-order generalization of a graph, consisting of nodes (0-simplices), edges (1-simplices) and triangles (2-simplices). The spectral decomposition of the HL can be used to decompose edge flows into gradient, curl and harmonic components11,16. Of note, while the HL and Hodge decomposition are defined on differential forms on Riemannian manifolds17, there are guarantees that the HL on an SC converges to the HL on manifolds in the limit18,19. PHLOWER focuses on the harmonic eigenvectors of the HL11,16, as these are associated with holes in the SC, which in turn can reveal cell differentiation tree branches in the gene expression SC.
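As a concrete illustration of the Hodge decomposition, the following minimal Python sketch (NumPy only) builds the signed node-edge and edge-triangle incidence matrices B1 and B2 of a toy SC, forms the Hodge 1-Laplacian L1 = B1^T B1 + B2 B2^T and splits an edge flow into gradient, curl and harmonic parts by orthogonal projection. The toy complex, the orientation convention and all function names are assumptions made for illustration; this is not PHLOWER's implementation. It shows the property PHLOWER relies on: the number of zero eigenvalues of L1 equals the number of holes, so filling a triangle removes one harmonic eigenvector.

```python
import numpy as np


def incidence_matrices(n_nodes, edges, triangles):
    """Signed incidence matrices B1 (nodes x edges) and B2 (edges x triangles).

    Assumed convention: edges are (i, j) with i < j, oriented i -> j;
    triangles are (i, j, k) with i < j < k.
    """
    edge_index = {e: k for k, e in enumerate(edges)}
    B1 = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        B1[i, k], B1[j, k] = -1.0, 1.0  # flow leaves node i and enters node j
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        # oriented boundary of [i, j, k] is [j, k] - [i, k] + [i, j]
        B2[edge_index[(j, k)], t] = 1.0
        B2[edge_index[(i, k)], t] = -1.0
        B2[edge_index[(i, j)], t] = 1.0
    return B1, B2


def hodge_laplacian_1(B1, B2):
    """Hodge 1-Laplacian acting on edge flows: L1 = B1^T B1 + B2 B2^T."""
    return B1.T @ B1 + B2 @ B2.T


def harmonic_basis(L1, tol=1e-9):
    """Orthonormal eigenvectors of L1 with (numerically) zero eigenvalue."""
    vals, vecs = np.linalg.eigh(L1)
    return vecs[:, vals < tol]


def hodge_decomposition(f, B1, B2):
    """Split an edge flow f into gradient + curl + harmonic parts by orthogonal projection."""
    grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]  # projection onto im(B1^T)
    if B2.shape[1]:
        curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]  # projection onto im(B2)
    else:
        curl = np.zeros_like(f)
    harm = f - grad - curl  # the remainder lies in ker(L1)
    return grad, curl, harm


# Toy SC: a square 0-1-2-3 with diagonal (0, 2); fill 0, 1 or 2 of its triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
for triangles in ([], [(0, 1, 2)], [(0, 1, 2), (0, 2, 3)]):
    B1, B2 = incidence_matrices(4, edges, triangles)
    H = harmonic_basis(hodge_laplacian_1(B1, B2))
    print(len(triangles), "filled triangle(s) ->", H.shape[1], "hole(s)")
# 0 filled triangle(s) -> 2 hole(s)
# 1 filled triangle(s) -> 1 hole(s)
# 2 filled triangle(s) -> 0 hole(s)

# Decompose an arbitrary (made-up) flow on the complex with one filled triangle.
B1, B2 = incidence_matrices(4, edges, [(0, 1, 2)])
f = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
grad, curl, harm = hodge_decomposition(f, B1, B2)
```

Because the harmonic part is orthogonal to both the gradient and the curl subspaces, projecting a flow onto the harmonic eigenvectors isolates the part that circulates around holes, which is the signal PHLOWER associates with differentiation branches.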

In short, PHLOWER first uses a graph representation of the single-cell data and the graph Laplacian to estimate the pseudotime of cells and to identify progenitor and terminal cells, similarly to methods in previous studies13,20. Cell differentiation processes are classically represented as an out-branching, tree-like structure. To transform such a directed branching process into an SC, we perform a Delaunay triangulation and connect terminally differentiated cells (high pseudotime) with progenitor cells (low pseudotime). Connecting cells with high and low pseudotime creates a hole for every main trajectory in the graph (Fig. 1). Next, we perform a Hodge decomposition of the edge-flow space on the SC. The harmonic components of this decomposition provide an edge-level embedding, where each point represents a cell differentiation event (equation (19)), as well as a trajectory embedding, where each point represents a cell differentiation trajectory (equation (21)). PHLOWER uses these embeddings to delineate the major differentiation trajectories in single-cell data and to reconstruct complex cell differentiation trees. As an example, we show how PHLOWER can predict a simple differentiation tree of mouse embryonic fibroblasts (MEFs)21 differentiating toward either neurons or myocytes (Fig. 1 and Extended Data Fig. 1).
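The hole-creating step can be illustrated with a small sketch based on SciPy's Delaunay triangulation. Assumptions: a single hypothetical trajectory of 20 cells on a 2 × 10 grid, pseudotime equal to the x coordinate, one root-terminal pair instead of sets of progenitor and terminal cells, and no pruning of long Delaunay edges (Extended Data Fig. 4a indicates a threshold is applied in PHLOWER, which is not reproduced here). The number of holes is computed as the first Betti number, n_edges - rank(B1) - rank(B2); the helper names are hypothetical.

```python
import itertools

import numpy as np
from scipy.spatial import Delaunay


def complex_from_delaunay(points):
    """Edges and triangles (sorted vertex tuples) of the Delaunay triangulation of a point cloud."""
    tri = Delaunay(points)
    triangles = sorted({tuple(sorted(int(v) for v in s)) for s in tri.simplices})
    edges = sorted({e for t in triangles for e in itertools.combinations(t, 2)})
    return edges, triangles


def count_holes(n_nodes, edges, triangles):
    """First Betti number: dim ker(B1) - rank(B2) = n_edges - rank(B1) - rank(B2)."""
    index = {e: k for k, e in enumerate(edges)}
    B1 = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        B1[i, k], B1[j, k] = -1.0, 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        B2[index[(j, k)], t] = 1.0
        B2[index[(i, k)], t] = -1.0
        B2[index[(i, j)], t] = 1.0
    rank_b2 = np.linalg.matrix_rank(B2) if triangles else 0
    return int(len(edges) - np.linalg.matrix_rank(B1) - rank_b2)


# Hypothetical toy data: one trajectory of 20 cells laid out as a 2 x 10 strip,
# with pseudotime given by the x coordinate.
xs = np.arange(10, dtype=float)
points = np.vstack([np.column_stack([xs, np.zeros(10)]), np.column_stack([xs, np.ones(10)])])
pseudotime = points[:, 0] / points[:, 0].max()

edges, triangles = complex_from_delaunay(points)
print(count_holes(len(points), edges, triangles))  # 0: a triangulated strip has no hole

# Connect a terminal cell (highest pseudotime) back to a progenitor (lowest pseudotime).
root, terminal = int(np.argmin(pseudotime)), int(np.argmax(pseudotime))
shortcut = (min(root, terminal), max(root, terminal))
edges_with_shortcut = sorted(set(edges) | {shortcut})
print(count_holes(len(points), edges_with_shortcut, triangles))  # 1: the shortcut opens one hole
```

A Delaunay triangulation always fills the convex hull of the cells and therefore has no holes; the single added edge closes a cycle that no triangle bounds, which is why each main trajectory contributes one hole and, in turn, one harmonic eigenvector.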

Fig. 1. Schematic overview of major steps of PHLOWER.

PHLOWER receives as input a multimodal or a unimodal single-cell dataset. a, It then creates a graph representation and a graph embedding62 of the cells. Our example is based on a differentiation system of MEFs toward neurons or myocytes21. From this, PHLOWER uses a graph Laplacian decomposition and a random walk to estimate pseudotime from progenitor cells (that is, MEFs). b, Next, cells with low pseudotime (progenitors) and high pseudotime (terminally differentiated cells) are connected, and a simplicial complex is obtained by Delaunay triangulation of the graph. c, An edge embedding is obtained from the harmonic eigenvectors of the simplicial complex HL. In the MEF data, the first two eigenvectors have zero eigenvalues, and their signals discriminate edges belonging to the two differentiation trajectories (or holes in the SC). d, PHLOWER next performs random walks to obtain trajectories and uses the HL decomposition to obtain a trajectory embedding and a cumulative trajectory (CT) embedding. Clustering analysis in the trajectory embedding reveals major trajectories in the data, while the cumulative embedding space is used to estimate trajectory backbones and branching-point events. PHLOWER outputs stream trees representing the differentiation tree over pseudotime and allows the detection of regulators and gene markers.
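The pseudotime step in panel a can be approximated by a diffusion-pseudotime-style computation, sketched below under stated assumptions: a Gaussian kernel with a hypothetical width sigma, a user-chosen root cell and a dense eigendecomposition that is only practical for small toy data. This is a generic, swapped-in technique for illustration and not necessarily the exact procedure PHLOWER uses.

```python
import numpy as np


def diffusion_pseudotime(X, root, sigma):
    """Diffusion-pseudotime-style distance of every cell from a root cell (illustrative sketch).

    X: (n_cells, n_dims) cell embedding; root: index of a progenitor cell;
    sigma: Gaussian kernel width (a hypothetical tuning parameter).
    Returns pseudotime scaled to [0, 1], with 0 at the root.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma**2))
    d = K.sum(axis=1)
    # Symmetric normalization D^-1/2 K D^-1/2 shares its spectrum with the random walk D^-1 K
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)  # ascending order; the last eigenvalue is 1
    vals, vecs = vals[:-1], vecs[:, :-1]  # drop the stationary component
    # Summing random walks of all lengths t >= 1 weights each eigenvector by lambda / (1 - lambda)
    M = vecs * (vals / (1.0 - vals))
    dpt = np.linalg.norm(M - M[root], axis=1)
    return dpt / dpt.max()


# Hypothetical toy data: 100 cells evenly spaced along a 1D trajectory, root at x = 0.
x = np.linspace(0.0, 1.0, 100)[:, None]
pseudotime = diffusion_pseudotime(x, root=0, sigma=0.05)
```

Summing over walks of all lengths makes the distance less sensitive to the choice of a single diffusion time; on this toy line, pseudotime increases from the root toward the far end.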

Extended Data Fig. 1. Detailed steps of PHLOWER on single-cell data of mouse embryonic fibroblasts differentiating toward neurons and myocytes.

a, First, PHLOWER estimates a graph representation using kernels and a stress majorization layout62. Colors correspond to cluster labels as defined in ref. 21. These labels are not used by PHLOWER algorithms. b, Next, pseudotime is estimated using the zero-order Laplacian with the MEF cluster as root. The definition of a root is the only supervision required by PHLOWER. Edges between cells are obtained via Delaunay triangulation (left). To create holes in the simplicial complex, PHLOWER adds edges from high-pseudotime to low-pseudotime vertices (middle). When the graph layout is recomputed, two holes are clearly visible. c, PHLOWER next computes the harmonic eigenvectors of the Hodge Laplacian of the simplicial complex. There are two harmonic eigenvectors, that is, eigenvectors with zero eigenvalues. Plotting the harmonic eigenvector values on the edges shows that the first harmonic eigenvector discriminates edges in the neuronal branch from the others, while the second harmonic eigenvector discriminates the myocyte-related cells from the others. d, Next, PHLOWER generates trajectories by random walks on the simplicial complex to obtain edge-flow vectors. e, Finally, PHLOWER creates a trajectory embedding (Eq. (21)) by taking the dot product of the edge flows with the harmonic eigenvectors. We observe two clear clusters in the trajectory map (left), which are detected by providing this embedding as input to DBSCAN69. We can similarly create a cumulative trajectory map (Eq. (23)) from the edge flows of trajectories restricted to their first one, two, three and so on differentiation events. PHLOWER uses this space to delineate the backbones of differentiation trajectories and to detect branching-point events (in green, right panel). f, PHLOWER outputs a differentiation tree, pseudotime values and an allocation of cells to positions in the branches. PHLOWER makes use of STREAM to visualize the final trajectories.
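The trajectory and cumulative trajectory embeddings (panels d and e) can be sketched on a toy graph: a trunk 0-1-2 that splits into branches 2-3-4 and 2-5-6, with PHLOWER-style shortcut edges from the terminal cells 4 and 6 back to the root 0. For simplicity, the sketch assumes no filled triangles (so L1 reduces to B1^T B1), uses two fixed paths instead of random walks and compares embeddings directly instead of clustering them with DBSCAN; all names are hypothetical. A trajectory is encoded as a signed edge-flow vector and embedded by its dot product with the harmonic eigenvectors; the cumulative embedding uses only its first one, two, three and so on steps, so two paths share cumulative coordinates until they diverge.

```python
import numpy as np

# Hypothetical toy complex: trunk 0-1-2, branch A 2-3-4, branch B 2-5-6,
# plus shortcut edges from the terminal cells (4, 6) back to the root (0).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6), (0, 4), (0, 6)]
N_NODES = 7


def harmonic_basis(edges, n_nodes, tol=1e-9):
    """Harmonic eigenvectors of L1; with no triangles, L1 = B1^T B1."""
    B1 = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        B1[i, k], B1[j, k] = -1.0, 1.0
    vals, vecs = np.linalg.eigh(B1.T @ B1)
    return vecs[:, vals < tol]


def cumulative_flows(path, edges):
    """Row t holds the signed edge flow of the first t + 1 steps of a node path."""
    index = {e: k for k, e in enumerate(edges)}
    steps = np.zeros((len(path) - 1, len(edges)))
    for t, (u, v) in enumerate(zip(path[:-1], path[1:])):
        if (u, v) in index:
            steps[t, index[(u, v)]] = 1.0   # step follows the edge orientation
        else:
            steps[t, index[(v, u)]] = -1.0  # step runs against the edge orientation
    return np.cumsum(steps, axis=0)


H = harmonic_basis(EDGES, N_NODES)  # two holes -> two harmonic eigenvectors
path_a = [0, 1, 2, 3, 4]
path_b = [0, 1, 2, 5, 6]
cum_a = cumulative_flows(path_a, EDGES) @ H  # cumulative trajectory embedding
cum_b = cumulative_flows(path_b, EDGES) @ H
traj_a, traj_b = cum_a[-1], cum_b[-1]  # trajectory embedding: full path flow projected on H

# The first step at which the cumulative embeddings differ marks the branching point.
diverged = np.linalg.norm(cum_a - cum_b, axis=1) > 1e-9
branch_step = int(np.argmax(diverged))
print("branching after node", path_a[branch_step])  # branching after node 2
```

Because the harmonic basis is only defined up to an orthogonal rotation, the individual coordinates are arbitrary; distances between embedded trajectories, which is what clustering relies on, are not.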

Benchmarking of trajectory inference methods

Next, we evaluated PHLOWER and competing methods on simulated datasets22 and single-cell RNA-sequencing (scRNA-seq) datasets8 regarding how well they recover the original tree structures and place cells within these differentiation trees. We used the diffusion-limited aggregation (DLA) tree22 to simulate ten complex differentiation tree datasets with 5 to 18 total branches, similarly to previous work14. We selected 33 scRNA-seq datasets from the Dynverse benchmark that contained single-rooted tree structures with at least three branches (Supplementary Table 1).

These data were provided as input to PHLOWER and competing approaches for the detection of differentiation trees (PAGA tree4, Monocle3 (ref. 23), cellTree24, pCreode25, Slice26, RaceID27,28, Slingshot6, TSCAN29, MST30, Elpigraph31 and STREAM5). This selection included the top ten approaches from a recent benchmark evaluation8. Tools were evaluated using the metrics proposed by the Dynverse framework8: tree structure similarity (Hamming–Ipsen–Mikhailov, HIM), location of cells within a branch (correlation), cell allocation to branches (F1 branches), cell allocation to branching points (F1 milestones) and an accuracy estimated as the average of these four metrics. An example of the steps of the PHLOWER algorithm in fitting a differentiation tree to simulated data is shown in Supplementary Fig. 1.
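The accuracy aggregation and a per-dataset ranking in the spirit of Extended Data Fig. 2a can be sketched as follows. This is a hedged illustration: the scores are randomly generated placeholders, not benchmark results, the helper names are hypothetical and only SciPy's Friedman test is shown; the Nemenyi post hoc test used to group methods is not part of SciPy and is omitted.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

METRICS = ("HIM", "correlation", "F1_branches", "F1_milestones")


def accuracy(scores):
    """Accuracy as the arithmetic mean of the four metrics (last axis)."""
    return np.mean(scores, axis=-1)


def mean_ranks(acc):
    """acc: (n_datasets, n_methods). Rank methods within each dataset; a higher rank is better."""
    ranks = np.vstack([rankdata(row) for row in acc])  # ties receive average ranks
    return ranks.mean(axis=0)


# Made-up scores for 5 datasets x 3 methods x 4 metrics (placeholders only).
rng = np.random.default_rng(0)
scores = rng.uniform(0.2, 0.9, size=(5, 3, len(METRICS)))
acc = accuracy(scores)  # shape (5, 3)
print(mean_ranks(acc))

# Friedman test: do the methods' accuracies differ consistently across datasets?
stat, p = friedmanchisquare(*acc.T)  # one sample of per-dataset accuracies per method
```

Ranking within each dataset before averaging prevents a few easy or hard datasets from dominating the comparison, which is why rank-based tests such as Friedman's are common in benchmarks of this kind.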

For simulated data, PHLOWER was the best-performing method with regard to tree topology recovery, followed by PAGA, RaceID and Monocle3 (Fig. 2a and Extended Data Fig. 2a). Regarding the allocation of cells to positions in a branch, PHLOWER was also the best performer, followed by TSCAN and RaceID. In allocating cells to the correct branches, PHLOWER obtained the best performance, followed by Monocle3 and PAGA. In the allocation of cells to branching points, PHLOWER was the best performer, followed by RaceID, Monocle3 and PAGA. The final accuracy values indicated PHLOWER, PAGA and RaceID as the best-performing methods on the simulated data.

Fig. 2. PHLOWER benchmarking.

a, HIM score (topology similarity between true and inferred trees), cell allocation within a branch (correlation), cell allocation to branching points (F1 milestones), cell allocation to branches (F1 branches) and accuracy (y axis) after evaluation of 12 tree inference algorithms (x axis) on the simulated data (n = 10). Methods are ranked (left to right) by their mean value (black trace). b, As in a for the Dynverse real datasets (n = 33). c,d, Differentiation trees estimated with the PAGA, Monocle3 and PHLOWER algorithms for the pancreas progenitor (n = 3,696) (c) and neurogenesis (n = 18,213) (d) datasets. Trees on the left display the reference cell differentiation tree. Red arrows indicate false-positive branches.

Source data

Extended Data Fig. 2. Ranking of benchmarked methods.

a, Mean rankings of the methods based on the HIM, correlation, F1 branches and F1 milestones for all evaluated methods on the simulated datasets (based on results from Fig. 2a) and the scRNA-seq datasets (based on results from Fig. 2b). Higher ranking values indicate better performance. Bars indicate methods with similar performance according to the Friedman–Nemenyi test. b, HIM, correlation, F1 branches, F1 milestones and accuracy scores (y axis) versus methods (x axis) for 33 real scRNA-seq datasets. Methods are sorted by increasing mean (black line), while colors indicate the distribution of scores per type of structure: bifurcation, multifurcation and trees. While some approaches perform similarly on all types of structure, others perform relatively better on more complex structures; for example, PAGA has higher HIM scores for trees, while Slingshot performs best on simpler bifurcations. c, Distributions of topology size differences between predicted and reference structures. Positive values indicate that the predicted topologies are more complex than the references, while negative values indicate that simpler trees were estimated. The dots represent the mean values, and the lines indicate the standard deviations. The best approach should have the lowest absolute value of cv = mean/std. Note that this statistic does not evaluate the topological similarity of the trees, which is indicated by the HIM statistic.

Source data

For real scRNA-seq data, PHLOWER was the best-performing approach regarding structure similarity, followed by Monocle3 and PAGA (Fig. 2b and Extended Data Fig. 2a). Regarding the location of cells, PHLOWER obtained the highest average scores, followed by Monocle3 and Slingshot. Regarding allocation to branches or milestones, PHLOWER was also the top performer, followed by Slice and Slingshot. In the aggregated accuracy ranking, PHLOWER obtained the highest average value, followed by Monocle3 and pCreode.

The ranking of competing approaches differed between simulated and scRNA-seq datasets. While RaceID, Monocle3 and PAGA were runners-up on simulated data, Monocle3, pCreode and Slingshot were runners-up on real data. One potential reason is that these methods perform differently on distinct types of tree structures, as the simulated data focus on larger, more complex tree structures. To investigate this further, we stratified the performance statistics on real scRNA-seq data by type of structure as determined by Dynverse (bifurcation, multifurcation and trees)8. This indicates that some approaches, such as Slingshot, perform relatively better on simpler trees (bifurcation), while others, such as Monocle3 and PAGA, perform relatively better on complex structures (Extended Data Fig. 2b). These results are also supported by an analysis showing that some approaches, such as Slingshot, tend to underestimate the size of the trees (Extended Data Fig. 2c).

Another important aspect is the computational requirements of the approaches. To evaluate this, we used two datasets commonly analyzed in the RNA velocity literature: the pancreas progenitor (≈3,700 cells)32 and neurogenesis (≈18,000 cells)33 datasets. PHLOWER required 0.5–12 h and 12–40 GB of memory for these datasets on a standard desktop computer (Extended Data Fig. 3a,b). We next evaluated a time-efficient variant of PHLOWER based on cell downsampling (Extended Data Fig. 3c,d). Using 30% of the cells, PHLOWER ran 8.6 times faster and used one-sixth of the memory compared with using all cells, while still inferring a bona fide neurogenesis cell differentiation tree.

Extended Data Fig. 3. Time and Memory Benchmarking.

a, Time (left) and memory (right) requirements for the pancreas progenitor (3.7k cells) and b, neurogenesis (18k cells) data. Profiling was performed with the package memory-profiler (0.61.0) on a computer with an Intel i5 10400 processor (12 threads) and 64 GB RAM, running Linux Mint 21.1. c, Time and memory requirements of PHLOWER for the neurogenesis data (18k cells) after subsampling 30% to 90% of cells. The use of half of the data provides an 8.6x speed-up and requires 1/6 of the memory when compared to using all cells. Experiments were executed on a high-performance computing node (AMD EPYC 7543, 50/128 cores, 2.345 GHz, 1,024 GB RAM, Rocky Linux 8.9). Note that this provides a speed-up of 4x compared to the experiments in a,b, which had to be executed on a normal desktop computer due to the superuser requirements of Dynverse. d, PHLOWER stream trees obtained for the distinct subsampling procedures.

Source data

The performance of PHLOWER and competing methods is further illustrated by the inferred trees for the pancreas progenitor and neurogenesis datasets. We focus here on PHLOWER, PAGA and Monocle3, as these were the only methods that recovered complex trees (Supplementary Figs. 2 and 3). This is in line with the previous analysis, which indicates that most competing approaches underperform on complex differentiation trees (Extended Data Fig. 2c).

For the pancreas progenitor data (Fig. 2c), we observed that the Monocle3 tree does not capture the epsilon branch and that its delta branch corresponds to an unconnected single-branched tree. PHLOWER and PAGA recovered all main branches of this tree: epsilon, delta, alpha and beta. In the larger and more complex neurogenesis data (Fig. 2d), Monocle3 inferred four unconnected trees delineating some of the main branching events. PHLOWER and PAGA recapitulated the main terminal branches, with the exception of the small oligodendrocyte precursor cell population, which both methods missed. Note, however, that PAGA inferred two false-positive branches (Fig. 2d), which are not related to any cell type described in this dataset34. Altogether, this analysis supports the power of PHLOWER in recovering complex cell differentiation trees.

An alternative approach to characterize topological features in a dataset is persistent homology (PH)35. To check its association with PHLOWER, we performed PH on the triangulated SC (Extended Data Fig. 4). This indicates that PH can support PHLOWER by determining thresholds to build the triangulated SC. Moreover, a PH analysis of the SC used by PHLOWER indicated that PH can characterize the number of holes in PHLOWER's SC (Extended Data Fig. 5).
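Extended Data Fig. 4 relies on 0-dimensional persistence, which records the radius at which connected components merge. The published analysis used the Python module gudhi; the NumPy sketch below is a dependency-free stand-in for the 0-dimensional part only, using the fact that component death radii in a Vietoris–Rips filtration equal the edge lengths of a minimum spanning tree (Kruskal's algorithm with union–find). It assumes that two points become connected once the radius reaches their Euclidean distance (conventions differ between tools), and the 1D point cloud and helper names are hypothetical. Persistence of holes (1-loops, as in Extended Data Fig. 5) requires boundary-matrix reduction and is better left to gudhi.

```python
import numpy as np


def zero_dim_persistence(points):
    """Death radii of connected components (0-dimensional features) in a Rips-style filtration.

    Every component is born at radius 0; deaths are the minimum-spanning-tree edge lengths.
    Assumed convention: points i and j are joined once the radius reaches their distance.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i_idx, j_idx = np.triu_indices(n, k=1)
    order = np.argsort(dist[i_idx, j_idx], kind="stable")  # process edges by increasing length
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    deaths = []
    for k in order:
        ri, rj = find(i_idx[k]), find(j_idx[k])
        if ri != rj:  # the edge merges two components: one of them dies at this radius
            parent[ri] = rj
            deaths.append(dist[i_idx[k], j_idx[k]])
            if len(deaths) == n - 1:
                break
    return np.array(deaths)


def n_components(deaths, eps):
    """Number of connected components in the graph built with radius eps."""
    return 1 + int(np.sum(deaths > eps))


cloud = np.array([[0.0], [1.0], [3.0], [7.0]])  # hypothetical 1D point cloud
deaths = zero_dim_persistence(cloud)
print(deaths)                     # [1. 2. 4.]
print(n_components(deaths, 1.5))  # 3
eps_single = deaths.max()         # smallest radius giving a single connected component
eps_margin = 1.2 * eps_single     # a larger radius, analogous to the 1.2 factor in Extended Data Fig. 4c
```

The largest death radius plays the role of the 85.4 value in Extended Data Fig. 4c (the first radius with one connected component); scaling it up increases connectivity between nodes.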

Extended Data Fig. 4. Persistent Homology Analysis.

a, The triangulated single-cell graph (threshold Q75) for the MEF data. b, Persistence diagram with 0-loops for the same data cloud as in a. c, Graphs obtained after varying the radius ϵ over several cutoff values. We report the number of barcodes (connected components) for each radius. The value 85.4 corresponds to the first graph with a single connected component. Using a radius 1.2 times larger (102.5) leads to a graph with higher connectivity between nodes. This approach provides a graph similar to that obtained with the filtering scheme used in PHLOWER (a). d–f, Same as a–c for the DLA simulated data with 10 branches. The Python module gudhi (3.10.1) was used for the persistence analysis.

Source data

Extended Data Fig. 5. 1-dimensional Persistent Homology.

a, Persistence diagram with 1-loops estimated on the SC of the MEF cells. b, Birth–death diagram with both 0-loops (connected components) and 1-loops (holes) for the SC representing MEF cells. c, SCs obtained after varying the radius ϵ over several cutoff values. Interestingly, the radius of 300, which is the lowest radius that removes all 1-loops except those with “infinity radius”, finds two holes, as expected for this dataset. d–f, Same as a–c for the simulated trees with 10 branches. In this more complex dataset, a threshold of 1,750 provides an SC with 6 holes. This is the smallest radius such that only holes with “infinite size” are considered. These correspond to holes generated by artificially connecting cells with low and high pseudotime, which are precisely the features we want to extract with PHLOWER.

Source data

Inferring cellular trajectories in kidney organoids

Induced pluripotent stem (iPS) cell-derived kidney organoids represent a solid model to validate cellular trajectories, because this model captures the differentiation of stem and progenitor cells toward various kidney cell lineages. We therefore generated single-cell multiome sequencing datasets (RNA and assay for transposase-accessible chromatin (ATAC)) of kidney organoids after 7, 12, 19 and 25 days of differentiation using our own protocol36. This recovered 13,751 cells with an average of 10,378 RNA transcripts and 19,263 DNA fragments (ATAC) per cell (Extended Data Fig. 6a). We next integrated the data for each modality independently37 and used MOJITOO to obtain a joint ATAC–RNA embedding9. We provided these data as input to PHLOWER, which recovered a trajectory with nine branches (Fig. 3a and Extended Data Fig. 6). The tree successfully sorted cells by organoid age (Fig. 3b), which matched well with the pseudotime estimates (Fig. 3c). These nine branches could be grouped into three major branches associated with epithelial cells (two sub-branches associated with podocytes and one with tubular cells), four branches of stromal cells and one major branch associated with muscle and neuronal cells (three sub-branches). The annotation of these cells was based on the expression of markers such as TBXT38 and KDR39 for mesoderm cells, PODXL40 and NPHS2 (ref. 40) for podocytes, SLC12A1 (ref. 41) and PAPPA2 (ref. 41) for kidney tubule epithelial cells, COL1A2 (ref. 40) and PDGFRB42 for stromal cells, and the neuronal and muscle markers MAP2 (ref. 36), MSX1 (refs. 38,40–42), MYL1 and MYF6 (Extended Data Fig. 7). The latter branches, that is, neuronal and muscle cells, are considered off-target cells that may hamper the maturation of kidney cells and should be avoided in kidney organoids43,44.

Extended Data Fig. 6. Quality check for kidney organoid single-cell data.

a, We show violin plots with quality check information after filtering and cell detection. These are: number of features (genes) in RNA, number of counts (transcripts) in RNA, proportion of mitochondrial genes (RNA), fragment sizes distribution (ATAC), number of fragments (ATAC) and transcription start site enrichment. All libraries had similar values across days of sampling. b-g, PHLOWER workflow for the kidney organoids data as described in Extended Data Fig. 1.

Source data

Fig. 3. Kidney organoid data.

a, Differentiation tree on the kidney organoid data as estimated by PHLOWER (n = 13,751). b,c, Day of differentiation (b) and pseudotime estimates (c) of cells in the differentiation trees (n = 13,751). d, Violin plots show the marker genes for cell types in each branch. e, Selected TFs with branch-specific expression and TF activity scores (n = 13,751).

Source data

Extended Data Fig. 7. Gene markers for cell-type identification in the PHLOWER tree.

Expression of the gene markers TBXT, KDR, PODXL, NPHS2, PAPPA2, SLC12A1, MYL1, MYF6, MSX1, MAP2, COL1A2 and PDGFRB is shown in the PHLOWER tree (n = 13,751; same genes as in Fig. 3d).

A key question to be addressed by PHLOWER is the detection of regulators (TFs) driving the differentiation of iPS cells within kidney organoids. We leveraged the tree inferred by PHLOWER and a procedure similar to scMega45 to detect TFs related to branch differentiation. In short, we estimated TF activity scores and selected TFs whose expression levels are concordant with their TF activity and that are differentially expressed between the compared branches (Extended Data Fig. 8). When comparing cells at the ends of the tubular and podocyte trajectories, this recovered bona fide regulators of these cells, such as WT1 (ref. 46) and MAFB47 for podocytes and HNF1B48 and GRHL2 (ref. 49) for tubular cells (Fig. 3e and Extended Data Fig. 9). Next, we compared major branches: stromal cells versus others and neuronal/muscle cells versus others. We detected TFs related to fibroblasts, including TWIST1 and RUNX2 (refs. 50,51), as regulators of stromal cells, and the known skeletal muscle TF MYOG for the muscle branch. Among the top three neuronal cell regulators, we observed the TFs PAX3, RFX4 and ZIC2, which have previously been related to neuronal cell differentiation and/or neuronal diseases52–54. The modulation of these TFs is of particular interest, as neuronal cells are considered undesired off-target cell types in kidney organoids.
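The branch-specific TF selection can be sketched as a two-part filter: a TF is kept if its expression is higher in the branch of interest than in the comparison cells and if its expression correlates with its activity score across cells. This is a hypothetical simplification for illustration, not the scMega implementation: the thresholds (min_corr, min_log2fc), the pseudocount, the use of a plain log2 fold change instead of a statistical differential-expression test, the correlation across individual cells rather than along the trajectory and the synthetic data are all assumptions.

```python
import numpy as np


def select_branch_tfs(expr, activity, in_branch, min_corr=0.5, min_log2fc=1.0, pseudocount=1e-3):
    """Hypothetical scMega-like filter (illustrative thresholds).

    expr, activity: (n_cells, n_tfs) TF expression and TF activity scores.
    in_branch: boolean mask of cells in the branch of interest; the rest form the comparison group.
    Returns indices of selected TFs, their log2 fold changes and expression-activity correlations.
    """
    mean_in = expr[in_branch].mean(axis=0)
    mean_out = expr[~in_branch].mean(axis=0)
    log2fc = np.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    # Pearson correlation per TF between expression and activity across cells
    ez = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    az = (activity - activity.mean(axis=0)) / activity.std(axis=0)
    corr = (ez * az).mean(axis=0)
    selected = np.flatnonzero((log2fc >= min_log2fc) & (corr >= min_corr))
    return selected, log2fc, corr


# Synthetic data: 200 branch cells and 200 comparison cells, three TFs.
rng = np.random.default_rng(1)
n = 400
in_branch = np.arange(n) < 200
signal = in_branch.astype(float)
expr = np.column_stack([
    1 + 4 * signal + rng.normal(0, 0.3, n),  # TF 0: up in branch
    1 + 4 * signal + rng.normal(0, 0.3, n),  # TF 1: up in branch
    2 + rng.normal(0, 0.3, n),               # TF 2: not differential
])
activity = np.column_stack([
    signal + rng.normal(0, 0.3, n),  # TF 0: activity tracks expression -> concordant
    rng.normal(0, 1, n),             # TF 1: activity unrelated to expression
    rng.normal(0, 1, n),             # TF 2: activity unrelated to expression
])
selected, log2fc, corr = select_branch_tfs(expr, activity, in_branch)
print(selected)  # [0]
```

Requiring both criteria discards TFs that are differentially expressed without matching chromatin-level activity (TF 1 here) as well as TFs whose expression does not distinguish the compared branches (TF 2).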

Extended Data Fig. 8. Regulators associated with cell branches.

a, Regulators associated with podocytes. We perform a differential expression analysis to find genes specific to podocytes (compared with tubular cells) (left). Among these DE genes, we select transcription factors whose activity (scATAC-seq) is highly correlated with their gene expression (middle). Heatmaps display the TF activity and gene expression profiles of these TFs over the differentiation path from mesoderm toward podocytes. b, Same as a when contrasting tubular cells with podocytes. c, Same as a when contrasting neuronal/muscle cells with all other branches. d, Same as a when contrasting stromal cells with all other branches. e, Same as a when contrasting muscle cells with neuronal cells.

Source data

Extended Data Fig. 9. TF activity and gene expression of kidney organoid relevant transcription factors.

TF activity and gene expression of selected transcription factors in the PHLOWER estimated kidney organoid differentiation tree (n=13,751).

Spatial organization of kidney organoids and perturbation of off-target lineages

We leveraged our analysis of multimodal single-cell data to perform subcellular-resolution spatial transcriptomic analysis of kidney organoids using the Xenium spatial imaging platform55. We used PHLOWER results to derive a 100-gene kidney organoid marker panel for the detection of mesenchymal cells, tubular epithelial cells, stromal cells, neuronal cells and podocytes (Extended Data Fig. 10a). This panel included data-driven marker genes from the multiome analysis, some literature-based marker genes and top candidate TFs from the scMega analysis (Supplementary Table 2).

Extended Data Fig. 10. Xenium spatial profiling.

a, Gene expression in multiome kidney organoid data, highlighting the 100 genes and transcription factors selected for the Xenium experiment. b, QC for the day 19, day 25, scrambled siRNA and siRNA-treated spatial data. c, UMAP showing the cell types of the Xenium data. d, Dot plots showing the markers of cell types in the Xenium data. e, mRNA expression of genes targeted in siRNA experiments. Statistical test using an unpaired two-tailed t-test with bars representing mean ± s.d. of at least N = 3 per condition per experiment, from 2 independent experiments.

Source data

Clustering and trajectory analyses of the 105,092 cells in one 19-day kidney organoid and two 25-day kidney organoids identified progenitor cells (mesoderm, podocyte and tubular epithelial progenitors) and all major kidney organoid branches (mesoderm, podocytes, tubular epithelial cells, stromal cells and neuronal cells; Fig. 4a,b and Extended Data Fig. 10b,c). When contrasting cells detected on day 19 versus day 25, we observed a higher proportion of mesoderm cells on day 19, while day 25 had higher proportions of tubular epithelial cells and podocytes (Fig. 4c), reflecting the greater maturation of organoids on day 25. Moreover, we observed gene and TF expression patterns similar to the predictions based on the multimodal analysis (Fig. 4d, Extended Data Fig. 10d and Supplementary Figs. 4 and 5). This included the TFs PAX3, RFX4 and ZIC2, which PHLOWER predicted to control the off-target neuronal lineage. Altogether, these results indicate that the Xenium panel can identify major differentiation branches in kidney organoids in a spatial context and will aid in understanding cell differentiation differences driven by signaling events originating from cellular neighborhoods and spatial niches.
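The single-cell permutation test behind Fig. 4c can be sketched as follows for two samples of cell-type labels: the observed log2 fold difference (log2FD) in a cell type's proportion is compared with a null distribution obtained by permuting sample labels, a 95% confidence interval is obtained by bootstrapping cells within each sample, and P values across cell types are adjusted with the Benjamini–Hochberg procedure. The pseudocount, the exact two-sided P value definition, the helper names and the label counts are assumptions chosen for illustration; the tool used in the paper may define these details differently.

```python
import numpy as np


def log2_fd(labels_a, labels_b, cell_type, pseudocount=1.0):
    """log2 of the proportion of cell_type in sample b over sample a (pseudocount avoids log(0))."""
    p_a = (np.sum(labels_a == cell_type) + pseudocount) / (len(labels_a) + pseudocount)
    p_b = (np.sum(labels_b == cell_type) + pseudocount) / (len(labels_b) + pseudocount)
    return np.log2(p_b / p_a)


def proportion_test(labels_a, labels_b, cell_type, n_perm=1000, seed=0):
    """Observed log2FD, two-sided permutation P value and 95% bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    observed = log2_fd(labels_a, labels_b, cell_type)
    pooled = np.concatenate([labels_a, labels_b])
    n_a = len(labels_a)
    null = np.empty(n_perm)
    for k in range(n_perm):  # shuffle which cells belong to which sample
        perm = rng.permutation(pooled)
        null[k] = log2_fd(perm[:n_a], perm[n_a:], cell_type)
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    boot = np.empty(n_perm)
    for k in range(n_perm):  # resample cells with replacement within each sample
        boot[k] = log2_fd(rng.choice(labels_a, len(labels_a)), rng.choice(labels_b, len(labels_b)), cell_type)
    ci = np.percentile(boot, [2.5, 97.5])
    return observed, p_value, ci


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Made-up label counts for two organoid samples (placeholders, not the paper's data).
day19 = np.array(["mesoderm"] * 500 + ["podocyte"] * 200 + ["tubular"] * 300)
day25 = np.array(["mesoderm"] * 200 + ["podocyte"] * 350 + ["tubular"] * 450)
types = ["mesoderm", "podocyte", "tubular"]
results = [proportion_test(day19, day25, t) for t in types]
padj = benjamini_hochberg([r[1] for r in results])
```

Permuting sample labels tests whether the proportion shift exceeds what random assignment of cells to samples would produce, while the bootstrap interval conveys the magnitude of the shift; both operate at the level of individual cells.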

Fig. 4. Spatial profiling of kidney organoids.

a, Xenium-based spatial profiling of kidney organoid differentiation on day 19 and day 25 (sections 1 and 2). Colors represent distinct cell populations. b, PHLOWER-estimated differentiation tree on day-19 and day-25 organoids (n = 105,092). c, Single-cell permutation test (two sided) to measure relative differences in cell proportions comparing cell abundances on day 25 versus day 19. x axis shows the log2-transformed fold difference (log2FD), error bars are based on confidence intervals (95%) for the magnitude difference returned via bootstrapping63 (n = 1,000), and P values were adjusted using a Benjamini–Hochberg false discovery rate64. d, Expression of TFs controlling the differentiation of the kidney organoid branches.

Source data

To validate PHLOWER predictions, we performed multiplex short interfering RNA (siRNA) knockdown experiments targeting the identified neuronal lineage-defining TFs PAX3, RFX4 and ZIC2 in iPS cell-derived kidney organoids during differentiation from day 19 onwards. Knockdown led to a 25–30% reduction in PAX3, RFX4 and ZIC2 mRNA expression (Extended Data Fig. 10e). Xenium spatial profiling of organoids treated with scrambled siRNA as the control condition (scrambled; sections 1 and 2) and of organoids treated with siRNA against PAX3, RFX4 and ZIC2 detected the same cell types as in non-treated day-25 organoids (Fig. 5a and Supplementary Figs. 6 and 7). When contrasting scrambled siRNA with siRNA treatment, we observed a significant increase in tubular cells and podocyte progenitors and a significant decrease in muscle, stromal and neuronal cells (Fig. 5b). Immunofluorescence staining further confirmed the reduction of stromal cells, including off-target neuronal cells, in the siRNA condition, as reflected by a significant decrease in interstitial vimentin (Fig. 5c,d). We observed a significant increase in podocyte nephrin protein expression and 50% more tubular E-cadherin protein expression, suggesting improved podocyte (progenitor) and tubular characteristics as a result of diminished off-target cells (Fig. 5e,f). Altogether, these experiments support that silencing PAX3, RFX4 and ZIC2, which are known neuronal lineage markers of a population considered off-target, led to a small increase in tubular cell proportion and enhanced podocyte development, as shown by increased nephrin and a higher proportion of podocyte progenitors.

Fig. 5. Spatial profiling of kidney organoids after siRNA knockdown of PAX3, RFX4 and ZIC2.


a, Xenium-based spatial profiling of kidney organoids at day 25 after treatment with scrambled siRNA control (scrambled) and siRNA multiplex against the TFs PAX3, RFX4 and ZIC2 (siRNA). b, Single-cell permutation test (two sided) to measure relative differences in cell proportions comparing cell abundances in siRNA and scrambled siRNA kidney organoids. x axis shows the log2FD, error bars are based on confidence intervals (95%) for the magnitude difference returned via bootstrapping63 (n = 1,000), and P values were adjusted using a Benjamini–Hochberg false discovery rate64. c, Immunofluorescence staining of control, scrambled siRNA and siRNA-treated kidney organoids showing nephrin (magenta), vimentin (green) and DAPI (blue). d, Quantification of the levels of neuronal marker vimentin+ and NPHS1 of images in c using one-way analysis of variance followed by Tukey’s post hoc test with bars representing the mean ± s.d. of at least N = 3 organoids per condition per experiment, from two independent experiments. e, Silencing of RFX4, ZIC2 and PAX3 increased nephrin (red) expression in podocytes and increased E-cadherin (green) expression in tubular cells by 50%, although the latter was not significant. f, Podocyte and tubule quantification using an unpaired, two-tailed t-test with bars representing the mean ± s.d. of at least N = 3 per condition per experiment, from two independent experiments.

Source data

Discussion

Trajectory analysis is paramount in the analysis of cells undergoing differentiation. Despite a wealth of literature on computational methods8, most methods have only been applied to or tested on small cell differentiation trees with few branches. PHLOWER exploits the harmonic component of the HL decomposition, which accounts for the interactions between the nodes, edges and triangles of an SC, to estimate embeddings at the edge and trajectory levels. Explicitly exploiting this higher-order representation of the data enables the detection of complex and large cell differentiation trees. This is supported by our comprehensive benchmarking, in which PHLOWER obtained the highest average scores in all evaluated scenarios and metrics. Of note, our evaluation focused on single-rooted tree structures with 3 to 26 branches, in contrast to a previous benchmarking study8, which also included linear and circular structures.

SCs are higher-order generalizations of graphs. Therefore, computations on SCs can be demanding owing to combinatorial explosion. Although the SCs constructed by PHLOWER are sparse, with relatively few edges and triangles, PHLOWER still required downsampling to handle large spatial transcriptomics datasets with more than 100,000 cells. This highlights a current gap in efficient methods for working with SCs on large-scale data.

Multimodal single-cell data, which allow the parallel measurement of transcriptome and open chromatin, can provide rich information on the relationship between the regulatory and transcriptional functions of cells3. We demonstrate the power of PHLOWER by analyzing a kidney organoid differentiation time course, in which PHLOWER detected the major cell lineages, as well as TFs with branch-specific gene expression and TF activity. Off-target cell populations compromise organoid development and maturation43,44,56. To address this, we used siRNA to target TFs identified by PHLOWER as regulators of neuronal lineages. This perturbation enhanced tubular cell differentiation. Altogether, these findings highlight PHLOWER’s capacity to reveal key regulatory factors within complex cell differentiation systems and its utility for improving organoid protocols.

Persistent homology (PH)35 is one approach to characterize topological features in a dataset. Applying PH to PHLOWER-generated SCs indicates that PH can support the analysis of cell differentiation trees, either by determining thresholds for building the triangulated SC or by determining the number of holes. However, unlike the harmonic components of the HL, PH does not provide unique generators that quantify the relation of all simplices (edges and nodes) to the holes57. Combining these approaches is an interesting topic for further research.

HLs have previously been explored in other molecular problems58. The Hodge decomposition of vector fields has been applied to RNA velocity estimates, allowing visualization of curl-free, divergence-free and harmonic components59. This work also suggests that the harmonic component captures the overall direction of differentiation, similarly to its use in PHLOWER. However, that work focused only on the visualization of RNA velocity fields and cannot be used to infer trees or to allocate cells along these trees, as PHLOWER does. A current limitation of PHLOWER is that it only considers the harmonic components of the Hodge decomposition. Incorporating additional components, such as the curl and gradient terms, would enable the analysis of more complex cell differentiation structures, including acyclic graphs. Future applications may also involve three-dimensional molecular data, such as protein–ligand prediction60 or three-dimensional spatial transcriptomics61. In such geometric data, higher-order structures, such as tetrahedra (3-simplices), can be analyzed, which may reveal geometric features (2-loops) such as hulls or voids. Addressing these cases will require efficient numerical methods for higher-order HL decompositions.

Methods

Rationale

PHLOWER uses the HL and the harmonic component of the associated Hodge decomposition to obtain embeddings of cell differentiation trajectories. For this, PHLOWER represents single-cell data as an SC consisting of nodes (0-simplices), edges (1-simplices) and triangles (2-simplices). The SC is then described by its first-order HL, which captures how edges relate to each other via shared nodes (lower adjacency) and shared triangular faces (upper adjacency). Importantly, the Hodge decomposition associated with the HL can be used to decompose flows and trajectories on the edges into gradient (curl-free), curl (divergence-free) and harmonic components, akin to the classical Helmholtz decomposition of vector fields known from vector calculus. While we focus here on ‘discrete’ HLs based on SCs, there are guarantees that the spectra of these HLs converge to those of the HL on manifolds in the limit, if weighted appropriately18,19.
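To make the three components concrete, here is a minimal NumPy sketch (not PHLOWER's code) that splits an edge flow on a toy SC into its gradient, curl and harmonic parts using two least-squares projections. The toy complex and the function names `boundary_matrices` and `hodge_decomposition` are hypothetical; edges and triangles are oriented toward the higher vertex ID.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Oriented B1 (vertices x edges) and B2 (edges x triangles).

    Edges (i, j) and triangles (i, j, k) are sorted tuples, i.e. oriented toward
    the vertex with the higher ID.
    """
    col = {e: c for c, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for c, (i, j) in enumerate(edges):
        B1[i, c], B1[j, c] = -1.0, 1.0          # edge leaves i and enters j
    B2 = np.zeros((len(edges), len(triangles)))
    for c, (i, j, k) in enumerate(triangles):
        # boundary of [i, j, k] = [i, j] + [j, k] - [i, k]
        B2[col[(i, j)], c], B2[col[(j, k)], c], B2[col[(i, k)], c] = 1.0, 1.0, -1.0
    return B1, B2

def hodge_decomposition(f, B1, B2):
    """Split an edge flow f into gradient, curl and harmonic parts."""
    grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]  # projection onto im(B1^T)
    curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]      # projection onto im(B2)
    harmonic = f - grad - curl   # orthogonal to both subspaces because B1 @ B2 == 0
    return grad, curl, harmonic

# Toy SC: filled triangle (0, 1, 2) and an unfilled cycle 1-2-3 (one hole).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1, B2 = boundary_matrices(4, edges, [(0, 1, 2)])
f = np.array([0.0, 0.0, 1.0, -1.0, 1.0])   # circulation 1 -> 2 -> 3 -> 1
grad, curl, harmonic = hodge_decomposition(f, B1, B2)
```

In this toy, the circulation has no divergence, so the gradient part vanishes; one third of it circulates around the filled triangle, and the remainder, which winds around the unfilled cycle, is harmonic.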

Of particular interest in our context here are the spectral embeddings associated with the so-called harmonic eigenvectors of the HL11,16, as these are associated with ‘holes’ in the underlying space. Just like the eigenvectors of the graph Laplacian can be used to define a spectral embedding of the nodes in a graph, the eigenvectors of the HL can be used to provide an embedding of the edges. As edges correspond to cell differentiation events, this enables PHLOWER to represent cell differentiation events and cell differentiation pathways (sequences of edges) in a direct way. See Supplementary Fig. 8 for a graphical contrast of the L0 and L1 Laplacian decompositions.

Overview of PHLOWER

PHLOWER receives as input a set of matrices, each representing a single-cell sequencing modality, as shown in equation (1):

$$\mathcal{X} = \{X^{(1)}, \ldots, X^{(m)}\} \tag{1}$$

where $X^{(i)} \in \mathbb{R}^{n \times s^{(i)}}$ represents the data of a particular single-cell modality, $n$ is the number of cells and $s^{(i)}$ is the number of features in modality $i$. Here, we assume that cells match across modalities.

PHLOWER has been evaluated on multimodal single-cell (ATAC and RNA) sequencing data and on unimodal scRNA-seq data (m = 1). First, PHLOWER constructs a graph representation G of the single-cell data by estimating a joint embedding9, followed by kernel representation, graph construction and pseudotime estimation as described previously20. Next, it builds an SC. For this, it first uses Delaunay triangulation between nodes (cells) to obtain edges forming triangles. It then uses pseudotime information from the graph to find terminally differentiated cells and root cells. Finally, it connects terminal (differentiated) cells to root (progenitor) cells with edges and triangles, creating holes that correspond to cell trajectories, to obtain the final SC. Thus, PHLOWER creates distinct cell differentiation paths that are measurable by topology/geometry, which are exploited in the next steps.

PHLOWER next computes the harmonic eigenvectors of the normalized first-order HL11 associated with the SC. Eigenvectors with zero eigenvalues (harmonic eigenvectors) delineate holes in the SC, which are associated with cell differentiation trajectories. PHLOWER then samples trajectories from the SC, which are represented as edge flows. Taking the dot product of individual edge flows with the harmonic eigenvectors creates a trajectory embedding, in which each point represents a particular trajectory. Clustering analysis in this space allows PHLOWER to find major trajectory groups. Moreover, PHLOWER builds a cumulative trajectory space, which is used to recover the differentiation tree and can be visualized as stream plots5. PHLOWER outputs a tree structure, the association of each cell with a branch and pseudotime estimates (Extended Data Fig. 1). It also detects branch-specific regulators by identifying TFs, similarly to ref. 45, with (1) branch-specific expression patterns and (2) similar gene expression and TF activity along the cell differentiation.

Single-cell graph representation

As a first step, PHLOWER estimates a single-cell graph to represent the data using a procedure similar to DDHodge20. This procedure takes as input a low-dimensional embedding $X^l$, which can be provided by MOJITOO9 for multimodal data or by principal component analysis (PCA) for scRNA-seq data.

Diffusion map

Given a common joint cell embedding $X^l$, we represent the data using diffusion maps12. For this, we first estimate a Gaussian kernel $W$ as shown in equation (2):

$$W_{ij} = \exp\left(-\frac{\lVert x_i^l - x_j^l \rVert^2}{\sigma_i \sigma_j}\right) \tag{2}$$

where $x_i^l$ is the representation of sample (cell) $i$ in the embedding $X^l$ and $\sigma_i$ is the local scaling parameter, which is estimated as the distance from $x_i^l$ to its n-th nearest neighbor, as in ref. 65.
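A minimal sketch of the locally scaled kernel of equation (2), assuming SciPy is available. The neighbor rank used for $\sigma_i$ is called `k` here (7 is an illustrative value, not a documented PHLOWER default), and the function name and random input are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def local_scaling_kernel(X, k=7):
    """Gaussian kernel of eq. (2) with per-cell bandwidths sigma_i.

    sigma_i is the distance from cell i to its k-th nearest neighbour; k=7 is an
    illustrative choice, not necessarily PHLOWER's default.
    """
    dist = cdist(X, X)                       # pairwise Euclidean distances
    sigma = np.sort(dist, axis=1)[:, k]      # column 0 is each cell itself (distance 0)
    return np.exp(-dist**2 / np.outer(sigma, sigma))

rng = np.random.default_rng(0)
X_l = rng.normal(size=(50, 5))               # stand-in for a joint embedding X^l
W = local_scaling_kernel(X_l)
```

Using the product $\sigma_i \sigma_j$ keeps the kernel symmetric while letting dense and sparse regions of the embedding use different bandwidths.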

The graph Laplacian $L_0$ is defined according to equation (3):

$$L_0 = D - W \tag{3}$$

where $D$ is a diagonal matrix with $i$-th entry $d_{ii} = \sum_j W_{ij}$. We now consider a random-walk process on the graph described by the above graph Laplacian, as shown in equation (4):

$$p_{t+1} = p_t D^{-1} W = p_t M \tag{4}$$

where $p_t \in \mathbb{R}^n$ is the normalized probability (row) vector at time $t$ and $M$ is the transition matrix of the random walk. Next, we perform an eigendecomposition of the symmetric form $\tilde{M}$ of $M$ according to equation (5):

$$\tilde{M} = D^{1/2} M D^{-1/2} = D^{1/2} (D^{-1} W) D^{-1/2} = D^{-1/2} W D^{-1/2} = Q \Lambda Q^\top \tag{5}$$

Taking advantage of this eigendecomposition, we can efficiently calculate $M^s$ (ref. 12), as given by equation (6):

$$M^s = \left(D^{-1/2} \tilde{M} D^{1/2}\right)^s = \left(D^{-1/2} Q \Lambda Q^\top D^{1/2}\right)^s = D^{-1/2} Q \Lambda^s Q^\top D^{1/2} \tag{6}$$

where the columns of $D^{-1/2} Q$ are the right eigenvectors of $M$ and the rows of $Q^\top D^{1/2}$ are the left eigenvectors. The diagonal matrix $\Lambda$ contains the corresponding eigenvalues. Finally, we estimate the pseudotime $u$ (or potential) at time $t = s$ according to equation (7):

$$u = \left(\tfrac{1}{m}, \ldots, \tfrac{1}{m}, 0, \ldots, 0\right) M^s \tag{7}$$

where $m$ is the number of start cells (listed first) and $s$ is the number of steps of the diffusion process.
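The eigendecomposition route of equations (5)–(7) can be sketched as follows. This is an illustration under stated assumptions, not PHLOWER's implementation: `diffusion_potential`, the toy kernel and the choice s = 5 are hypothetical, and the start cells are passed as indices rather than being listed first.

```python
import numpy as np

def diffusion_potential(W, start_cells, s=10):
    """u = p0 M^s (eq. 7), with M^s obtained from the symmetric form (eqs. 5-6)."""
    d = W.sum(axis=1)
    d_sqrt, d_isqrt = np.sqrt(d), 1.0 / np.sqrt(d)
    M_sym = d_isqrt[:, None] * W * d_isqrt[None, :]   # D^{-1/2} W D^{-1/2}
    lam, Q = np.linalg.eigh(M_sym)                     # real spectrum, orthonormal Q
    # M^s = D^{-1/2} Q Lambda^s Q^T D^{1/2}
    M_s = (d_isqrt[:, None] * Q) @ np.diag(lam**s) @ (Q.T * d_sqrt[None, :])
    p0 = np.zeros(W.shape[0])
    p0[start_cells] = 1.0 / len(start_cells)           # uniform mass on the start cells
    return p0 @ M_s

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # toy Gaussian kernel
u = diffusion_potential(W, start_cells=[0, 1], s=5)
```

Once $Q$ and $\Lambda$ are available, $M^s$ for any number of steps only requires raising the eigenvalues to the power s; an explicit matrix power is useful only as a check on small inputs.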

Graph-based pseudotime estimation

To improve the pseudotime estimates, we use a procedure similar to that of ref. 20, which smooths pseudotime estimates by considering the graph connectivity. Let us denote the fully connected graph by $G^F = (V, E^F)$. Pruning it by keeping only the edges to the k nearest neighbors yields the pruned k-nearest-neighbor graph $G^P = G^{(knn)} = (V, E^P)$, where $V$ is the set of vertices and $E^P$ is the set of edges. We represent the two graphs as incidence matrices $B_1^F$ and $B_1^P$. For $X \in \{F, P\}$, vertex $v_i \in V$ and edge $e_j \in E^X$, we have equation (8):

$$B_1^X[i, j] = \begin{cases} -1 & \text{if edge } e_j \text{ leaves vertex } v_i \\ 1 & \text{if edge } e_j \text{ enters vertex } v_i \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

We now define initial edge weights as the gradients of the pseudotime estimates of equation (7) on the fully connected graph, that is, $w_{ij}^F = u_j - u_i$. We want to estimate pseudotimes on the pruned k-nearest-neighbor graph by using the pseudotime estimates of the full graph. For this, we obtain a first estimate of the gradients on the pruned graph ($w^P$) by solving equation (9):

$$w^P = \arg\min_{w^P} \lVert B_1^F w^F - B_1^P w^P \rVert^2 + \lambda \lVert w^P \rVert^2 \tag{9}$$

where $\lambda$ is the regularization parameter. Next, we update the potential of the vertices according to equation (10):

$$u^s = \arg\min_{u} \lVert (B_1^P)^\top u - w^P \rVert^2 \tag{10}$$

This allows us to estimate an updated gradient $w^s = (B_1^P)^\top u^s$.
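The two least-squares problems in equations (9) and (10) can be sketched with NumPy as below. This is a hypothetical toy (six cells on a line, a complete graph as $G^F$ and a path as $G^P$), with λ = 10⁻³ chosen for illustration; equation (9) is solved in its closed ridge-regression form.

```python
import numpy as np
from itertools import combinations

def incidence(n, edges):
    """B1 of eq. (8): -1 where an edge leaves a vertex, +1 where it enters."""
    B = np.zeros((n, len(edges)))
    for c, (i, j) in enumerate(edges):
        B[i, c], B[j, c] = -1.0, 1.0
    return B

def smooth_potential(u, full_edges, knn_edges, lam=1e-3):
    n = len(u)
    B_F, B_P = incidence(n, full_edges), incidence(n, knn_edges)
    w_F = B_F.T @ u                                     # w_ij = u_j - u_i
    # eq. (9): ridge regression, (B_P^T B_P + lam I) w_P = B_P^T B_F w_F
    A = B_P.T @ B_P + lam * np.eye(B_P.shape[1])
    w_P = np.linalg.solve(A, B_P.T @ (B_F @ w_F))
    # eq. (10): potential whose gradient best matches w_P; it is defined up to a
    # constant, and lstsq returns the minimum-norm (zero-mean) solution here
    u_s = np.linalg.lstsq(B_P.T, w_P, rcond=None)[0]
    return u_s, B_P.T @ u_s                             # (u^s, w^s)

u = np.arange(6, dtype=float)                           # toy pseudotime on 6 cells
full = list(combinations(range(6), 2))                  # fully connected graph G^F
knn = [(i, i + 1) for i in range(5)]                    # pruned graph G^P (a path)
u_s, w_s = smooth_potential(u, full, knn)
```

Because $(B_1^P)^\top$ annihilates constant vectors, $u^s$ is determined only up to an additive shift; on this toy it preserves the ordering of u but not its scale.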

With the graph $G^{(knn)} = G^P$, edge weights $w^s$ and pseudotime estimates $u^s$, we estimate a graph embedding with the stress majorization layout algorithm62,66. Examples of the graph embeddings and pseudotime estimates are available in Extended Data Fig. 1a.

HL decomposition on a single-cell SC

Given $G^{(knn)}$ and the pseudotime estimates $u^s$, the next steps are the creation of an SC and the estimation of the Hodge decomposition.

Triangulation of the single-cell graph

A typical cell differentiation graph has a tree-like structure. The harmonic eigenvectors of the HL can characterize distinct types of topological structures (connected components, holes and voids), but not trees or branches. PHLOWER therefore resorts to a trick: it uses pseudotime estimates to find end-state cells and adds dummy (artificial) edges from end-state to start-state cells. This creates holes in the data, one for each trajectory group in the tree. For this, we obtain the m (default 5) cells with the highest and the lowest pseudotime. Next, we use a Delaunay triangulation67 on the graph embedding to create a hole-free set of triangles. We then remove edges connecting distant vertices: we estimate the distribution of the distances between connected vertices, obtain its 75% quantile (Q75) and remove all edges connecting vertices whose distance is greater than three times Q75. We denote the result as the triangulated graph. We further refine the dummy edges by performing a triangulation between each pair of terminal (high pseudotime) and root (low pseudotime) vertices. This results in $SC = (V^{(t)}, E^{(t)}, T^{(t)})$, where $V^{(t)}$ is the vertex set, $E^{(t)}$ is the edge set and $T^{(t)}$ is the set of triangles. An example of the simplicial representation of a differentiation tree is provided in Extended Data Fig. 1b.
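The triangulation and the Q75-based edge filter can be sketched with `scipy.spatial.Delaunay`. This is an assumption-laden sketch, not PHLOWER's code: it omits the dummy edges between terminal and root cells, and it keeps a triangle only if all three of its edges survive the filter, which the text does not specify. The function names and toy coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay

PAIRS = ((0, 1), (0, 2), (1, 2))   # the three edges of a triangle, by corner index

def _edge(a, b):
    return (int(min(a, b)), int(max(a, b)))

def filtered_delaunay(coords, factor=3.0, q=75):
    """Delaunay triangulation of a 2-D layout, dropping edges longer than factor * Q75."""
    tri = Delaunay(coords)
    edges = sorted({_edge(t[a], t[b]) for t in tri.simplices for a, b in PAIRS})
    lengths = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in edges])
    cutoff = factor * np.percentile(lengths, q)
    kept = [e for e, length in zip(edges, lengths) if length <= cutoff]
    kept_set = set(kept)
    # Assumption: a triangle is kept only if all three of its edges survive the filter
    triangles = [tuple(sorted(int(v) for v in t)) for t in tri.simplices
                 if all(_edge(t[a], t[b]) in kept_set for a, b in PAIRS)]
    return kept, triangles

rng = np.random.default_rng(1)
coords = np.vstack([rng.uniform(size=(100, 2)), [[10.0, 10.0]]])  # 100 cells + 1 outlier
edges, triangles = filtered_delaunay(coords)
```

The distant point at index 100 is initially connected by Delaunay but loses all of its edges to the length filter, which illustrates why the filter is needed before building the SC.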

The filtering of edges by distance to obtain the triangulated graph is related to PH analysis35. To show this, we evaluated how the proposed filtering of the triangulated graph compares to a PH analysis of connected components for the mouse embryonic data and simulated data (Extended Data Fig. 4). This comparison suggests that a filtering step that yields a single connected component plus an additional radius (1.2 times the minimum radius needed to obtain a single component) produces results similar to those of the distance-based edge filtering described above. Both approaches are available in PHLOWER.

Matrix representation of an SC

Next, we represent the two-dimensional $SC = (V^{(t)}, E^{(t)}, T^{(t)})$ as incidence matrices, or boundary operators, $B_k$, which map k-simplices to (k − 1)-simplices. For example, the first boundary operator $B_1$ captures the relation between vertices (0-simplices) and edges (1-simplices), and $B_2$ captures the relation between edges (1-simplices) and triangles (2-simplices)68.

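The boundary matrix on 1-simplices, $B_1$, is defined in equation (8). An entry of $B_2$ capturing the relationship between an oriented edge $e_i \in E^{(t)}$ and an oriented triangle $\Delta_q \in T^{(t)}$ is defined as shown in equation (11):

$$B_2[i, q] = \begin{cases} 1 & \text{if } e_i \in \Delta_q \text{ and } e_i \text{ has the same orientation as } \Delta_q \\ -1 & \text{if } e_i \in \Delta_q \text{ and } e_i \text{ has the opposite orientation to } \Delta_q \\ 0 & \text{otherwise.} \end{cases} \tag{11}$$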

See Supplementary Fig. 9 for an example of an SC and its corresponding $B_1$ and $B_2$ matrices. This SC has ten edges ([1, 2], [2, 3], [2, 4], [2, 5], [3, 4], [4, 5], [4, 6], [4, 7], [6, 7], [7, 8]) and three triangles ([2, 3, 4], [2, 4, 5] and [4, 6, 7]). Entry $B_1[1, 1]$ has the value −1, as the first edge $e_1 = (1, 2)$ leaves vertex 1, while $B_1[2, 1]$ has the value 1, as $e_1$ enters vertex 2. Regarding $B_2$, all entries related to the first edge $e_1 = (1, 2)$ are zero, as no triangle contains it. $B_2[2, 1]$ equals 1, as the direction of $e_2 = (2, 3)$ agrees with the orientation of the first triangle ($\Delta_1 = (2, 3, 4)$). $B_2[3, 1]$ equals −1, as the third edge ($e_3 = (2, 4)$) has the opposite orientation to $\Delta_1 = (2, 3, 4)$. The remaining entries of $B_2$ follow the same rationale. We refer the reader to ref. 11 for an in-depth characterization of SCs. Note that, for computational simplicity, the orientations of edges and triangles are set with a bookkeeping procedure11: vertices are given numerical IDs in order of creation, and edges and triangles are oriented toward vertices with higher IDs (Supplementary Fig. 9).
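The example of Supplementary Fig. 9 can be checked numerically. In this NumPy sketch, the 1-based IDs of the text become 0-based rows and columns, so $B_1[1, 1]$ in the text is `B1[0, 0]` in the code. The sketch also computes $L_0 = B_1 B_1^\top$ (equation (13)) and the number of holes, $|E| - \mathrm{rank}(B_1) - \mathrm{rank}(B_2)$, which is zero here because each of the three independent cycles is filled by a triangle.

```python
import numpy as np

edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5), (4, 6), (4, 7), (6, 7), (7, 8)]
triangles = [(2, 3, 4), (2, 4, 5), (4, 6, 7)]
col = {e: c for c, e in enumerate(edges)}

B1 = np.zeros((8, len(edges)), dtype=int)          # rows: vertices 1..8
for c, (i, j) in enumerate(edges):
    B1[i - 1, c], B1[j - 1, c] = -1, 1             # leaves lower ID, enters higher ID

B2 = np.zeros((len(edges), len(triangles)), dtype=int)
for c, (i, j, k) in enumerate(triangles):
    # boundary of [i, j, k] = [i, j] + [j, k] - [i, k]
    B2[col[(i, j)], c], B2[col[(j, k)], c], B2[col[(i, k)], c] = 1, 1, -1

L0 = B1 @ B1.T                                     # graph Laplacian D - A, eq. (13)
# Number of holes = dim ker(L1) = |E| - rank(B1) - rank(B2)
n_holes = len(edges) - np.linalg.matrix_rank(B1) - np.linalg.matrix_rank(B2)
```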

HL and Hodge decomposition

The HL is a higher-order generalization of the graph Laplacian. The k-th HL is defined in equation (12):

$$L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top \tag{12}$$

where $B_k$ is the k-th boundary operator.

For k = 0, $L_0$ is the same as the graph Laplacian introduced in equation (3), as shown in equation (13):

$$L_0 = \mathbf{0} + B_1 B_1^\top = D - A \tag{13}$$

where $D$ is the diagonal degree matrix of the graph and $A$ is its adjacency matrix.

Here, we are interested in the first-order HL, as shown in equation (14):

$$L_1 = B_1^\top B_1 + B_2 B_2^\top \tag{14}$$

As in diffusion maps22, it is preferable to work with the normalized form of the HL, as this provides a random-walk process on the SC11, as shown in equation (15):

$$\mathcal{L}_1 = D_2 B_1^\top D_1^{-1} B_1 + B_2 D_3 B_2^\top D_2^{-1} \tag{15}$$

where $D_2$ is the diagonal matrix of (adjusted) edge degrees, that is, $D_2 = \max(\mathrm{diag}(|B_2| \mathbf{1}), I)$; $D_1$ is the diagonal matrix of weighted vertex degrees according to the edge weights $D_2$; and $D_3 = \frac{1}{3} I$. Similar to the random-walk matrix $M$ of equation (4), $\mathcal{L}_1$ is not symmetric, which makes its eigendecomposition harder. To address this, we construct the symmetric form of $\mathcal{L}_1$, which has closely related eigenvectors and the same eigenvalues, as shown in equation (16):

$$\mathcal{L}_1^s = D_2^{-1/2} \mathcal{L}_1 D_2^{1/2} = D_2^{1/2} B_1^\top D_1^{-1} B_1 D_2^{1/2} + D_2^{-1/2} B_2 D_3 B_2^\top D_2^{-1/2} \tag{16}$$

Next, we perform an eigendecomposition of the symmetric form $\mathcal{L}_1^s$, as shown in equation (17):

$$\mathcal{L}_1^s = Q \Lambda Q^\top \tag{17}$$

where the columns of $Q$ are the eigenvectors and the diagonal elements of the diagonal matrix $\Lambda$ are the corresponding eigenvalues. Thus, the decomposition of the normalized form $\mathcal{L}_1$ becomes equation (18):

$$\mathcal{L}_1 = D_2^{1/2} \mathcal{L}_1^s D_2^{-1/2} = D_2^{1/2} Q \Lambda Q^\top D_2^{-1/2} = U \Lambda U^{-1} \tag{18}$$

where $U = (u_1, \ldots, u_{|E^{(t)}|})$ is the eigenvector matrix of $\mathcal{L}_1$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{|E^{(t)}|})$ contains the eigenvalues and $|E^{(t)}|$ is the number of edges in the SC. We assume that the eigenvectors are sorted by increasing eigenvalue, such that $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{|E^{(t)}|}$. We denote by equation (19)

$$H := (u_1, \ldots, u_h) \tag{19}$$

the matrix containing all the harmonic eigenvectors associated with $\mathcal{L}_1$, that is, all the eigenvectors with eigenvalue 0, where $h$ is the number of zero eigenvalues and $H \in \mathbb{R}^{|E^{(t)}| \times h}$.
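A compact sketch of equations (15)–(19) on a toy SC (a filled triangle plus one unfilled cycle) is shown below. The vertex normalization $D_1 = 2\,\mathrm{diag}(|B_1| D_2 \mathbf{1})$ is an assumption based on the normalized Hodge 1-Laplacian of ref. 11 as we understand it; rescaling $D_1$ changes the nonzero spectrum but not which eigenvalues are zero. The tolerance `tol` for calling an eigenvalue zero and the function name are illustrative.

```python
import numpy as np

def harmonic_eigenvectors(B1, B2, tol=1e-8):
    """Harmonic eigenvectors H of the normalized L1 (eqs. 15-19) and the spectrum."""
    d2 = np.maximum(np.abs(B2).sum(axis=1), 1.0)          # diagonal of D2
    # Assumption: D1 = 2 * diag(|B1| D2 1), following the normalization of ref. 11
    d1 = 2.0 * (np.abs(B1) @ d2)
    D3 = np.eye(B2.shape[1]) / 3.0
    s = np.sqrt(d2)
    # Symmetric form, eq. (16)
    L1s = (s[:, None] * (B1.T @ (B1 / d1[:, None])) * s[None, :]
           + (B2 @ D3 @ B2.T) / s[:, None] / s[None, :])
    lam, Q = np.linalg.eigh(L1s)                           # ascending eigenvalues
    U = s[:, None] * Q                                     # U = D2^{1/2} Q, eq. (18)
    return U[:, lam < tol], lam

# Toy SC: edges oriented low -> high ID, triangle (0, 1, 2) filled, cycle 1-2-3 open.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1 = np.zeros((4, len(edges)))
for c, (i, j) in enumerate(edges):
    B1[i, c], B1[j, c] = -1.0, 1.0
B2 = np.array([[1.0], [-1.0], [1.0], [0.0], [0.0]])        # boundary of [0, 1, 2]
H, lam = harmonic_eigenvectors(B1, B2)
```

With one unfilled cycle, exactly one eigenvalue is zero, and the corresponding column of `H` is a flow circulating around that hole.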

For example, for the embryonic mouse data, the HL spectrum has two zero eigenvalues (up to numerical inaccuracies), which define two harmonic eigenvectors. Plotting the harmonic eigenvector values on vertices shows that they highlight the two major branches of the triangulated graph (Extended Data Fig. 1c). This is related to spectral clustering, in which eigenvectors of the graph Laplacian with zero eigenvalues are associated with disconnected components (or clusters) of a graph (Supplementary Fig. 8).

Trajectory embedding and tree inference

Trajectory sampling and embedding

To generate trajectories, we sample paths (or edge flows) in the SC by following edges with positive divergence (that is, increasing pseudotime). Owing to the sparsity of the SC, we sample paths in $G^{(knn)}$ using a preferential random walk. We choose a random starting point among the vertices (cells) with the m lowest pseudotime values. We then choose the next vertex randomly by considering the divergence values ($w^s$); only positive divergences (increases in pseudotime) are considered. We stop when no further positive potential is available. Next, we map the sampled path back to the SC, replacing any sampled edge of $G^{(knn)}$ that is not present in the SC by the shortest path between its two vertices in the SC. We define the embedding $f \in \mathbb{R}^{|E^{(t)}|}$ of a path on the SC into the edge-flow space as shown in equation (20):

$$f[i, j] = \begin{cases} 1 & \text{if edge } (i, j) \text{ is traversed} \\ -1 & \text{if edge } (j, i) \text{ is traversed} \\ 0 & \text{otherwise} \end{cases} \tag{20}$$
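The preferential random walk and the edge-flow vector of equation (20) can be sketched as follows. Two details are assumptions: the next vertex is drawn with probability proportional to its pseudotime gain, and the mapping of kNN edges onto SC shortest paths is omitted (the toy graph is used directly). Function names and data are hypothetical.

```python
import numpy as np

def sample_path(neighbors, u, rng, n_start=5):
    """Preferential random walk that only moves to neighbours with higher pseudotime u."""
    v = int(rng.choice(np.argsort(u)[:n_start]))      # start among the lowest-u cells
    path = [v]
    while True:
        nbrs = np.array([w for w in neighbors[v] if u[w] > u[v]], dtype=int)
        if nbrs.size == 0:                            # no positive gradient left: stop
            return path
        gain = u[nbrs] - u[v]                         # positive edge gradients
        v = int(rng.choice(nbrs, p=gain / gain.sum()))
        path.append(v)

def edge_flow(path, col):
    """Signed edge-flow vector of eq. (20); col maps a sorted edge (i, j) to its index."""
    f = np.zeros(len(col))
    for a, b in zip(path[:-1], path[1:]):
        f[col[(min(a, b), max(a, b))]] = 1.0 if a < b else -1.0
    return f

# Toy Y-shaped kNN graph: stem 0-1-2, branches 2-3-4 and 2-5-6.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)]
col = {e: c for c, e in enumerate(edges)}
neighbors = {v: [] for v in range(7)}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)
u = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 4.0])
rng = np.random.default_rng(0)
paths = [sample_path(neighbors, u, rng, n_start=1) for _ in range(50)]
```

Because every step strictly increases pseudotime, a walk can never revisit an edge, so each entry of the edge-flow vector is −1, 0 or 1.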

The random walk is repeated n times. This provides us with a path matrix $F^{(t)} \in \mathbb{R}^{|E^{(t)}| \times n}$, where $|E^{(t)}|$ is the number of edges in the SC. See Extended Data Fig. 1d for examples of sampled trajectories or edge flows. We next project these paths $F^{(t)}$ onto the harmonic space to estimate a trajectory embedding, as shown in equation (21):

$$H^{(t)} = H^\top F^{(t)} \tag{21}$$

where $H \in \mathbb{R}^{|E^{(t)}| \times h}$ contains the harmonic eigenvectors defined in equation (19). PHLOWER next clusters the trajectory embedding $H^{(t)}$ with DBSCAN69 to group the paths into major differentiation trajectories. For the MEF data, we observe that the trajectory embedding (or trajectory map) reveals two clusters (Extended Data Fig. 1e), associated with neuronal and myocyte differentiation.
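The projection of equation (21) and the DBSCAN step can be illustrated on a toy tree with one dummy edge per terminal cell, mirroring the SC construction described above. Because the toy has no triangles, its harmonic space is simply the null space of $B_1$. This sketch assumes scikit-learn's `DBSCAN`; `eps` and `min_samples` are illustrative, not PHLOWER defaults, and the path list stands in for sampled trajectories.

```python
import numpy as np
from scipy.linalg import null_space
from sklearn.cluster import DBSCAN

# Toy tree: stem 0-1-2, branches 2-3-4 and 2-5-6, plus dummy edges (0, 4) and
# (0, 6) from the terminal cells back to the root, giving one hole per branch.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6), (0, 4), (0, 6)]
col = {e: c for c, e in enumerate(edges)}
B1 = np.zeros((7, len(edges)))
for c, (i, j) in enumerate(edges):
    B1[i, c], B1[j, c] = -1.0, 1.0

# Without triangles, B2 is empty and the harmonic space is ker(B1).
H = null_space(B1)                                   # |E| x h, here h = 2

def flow(path):
    """Signed edge-flow vector of eq. (20)."""
    f = np.zeros(len(edges))
    for a, b in zip(path[:-1], path[1:]):
        f[col[(min(a, b), max(a, b))]] = 1.0 if a < b else -1.0
    return f

paths = [[0, 1, 2, 3, 4]] * 10 + [[0, 1, 2, 5, 6]] * 10   # stand-ins for sampled paths
F = np.column_stack([flow(p) for p in paths])              # path matrix, |E| x n
traj_embedding = H.T @ F                                   # eq. (21), h x n
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(traj_embedding.T)
```

Paths into the two branches land at well-separated points of the two-dimensional harmonic space, so density-based clustering recovers one group per branch.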

Cumulative trajectory embedding and tree inference

The path representation in equation (20) does not keep the time step of an edge visit. Thus, we also define a traversed edge-flow (traversed path) matrix $\hat{f} \in \mathbb{R}^{|E^{(t)}| \times S}$ to record the edge visits for each time step individually, as shown in equation (22):

$$\hat{f}[i, j, s] = \begin{cases} 1 & \text{if edge } (i, j) \text{ is traversed in step } s \\ -1 & \text{if edge } (j, i) \text{ is traversed in step } s \\ 0 & \text{otherwise} \end{cases} \tag{22}$$

where $S$ is the length of the trajectory, $1 \le s \le S$ is the step number and $|E^{(t)}|$ is the number of edges in the SC. As we have n trajectories, we obtain n traversed edge-flow matrices $\{\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_n\}$.

We use the cumulative trajectory embedding to represent paths and to detect major trajectory groups and branching points. For a path $\hat{f}$, we obtain a point associated with every step s in this cumulative (harmonic) trajectory embedding space, as shown in equation (23):

$$v_s = H^\top \sum_{i=1}^{s} \hat{f}_{\cdot, i} \tag{23}$$

where $\hat{f}_{\cdot, i} \in \mathbb{R}^{|E^{(t)}|}$ is a signed indicator vector for the edge traversed in step i, $v_s \in \mathbb{R}^h$ is the cumulative trajectory embedding of path $\hat{f}$ at step s in harmonic coordinates, and h is the number of harmonic eigenvectors with zero eigenvalues. This is computed for all $1 \le s \le S$, which defines a sequence $v = \{v_1, \ldots, v_S\}$ for every path. Intuitively, the entries of v describe a trajectory in the harmonic edge-flow space that starts at 0 and ends at the harmonic trajectory embedding of the entire path.
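Equation (23) is a running sum of projected step vectors, which a short NumPy sketch makes explicit. `cumulative_embedding` is a hypothetical helper, and `H` here is a random orthonormal placeholder rather than true harmonic eigenvectors; the point is only that the last row equals the full-path embedding of equation (21).

```python
import numpy as np

def cumulative_embedding(H, path, col):
    """Rows are v_1, ..., v_S of eq. (23) for one sampled path."""
    f_hat = np.zeros((len(col), len(path) - 1))          # one column per step, eq. (22)
    for s, (a, b) in enumerate(zip(path[:-1], path[1:])):
        f_hat[col[(min(a, b), max(a, b))], s] = 1.0 if a < b else -1.0
    return np.cumsum(H.T @ f_hat, axis=1).T              # v_s = H^T sum_{i<=s} f_hat[:, i]

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6), (0, 4), (0, 6)]
col = {e: c for c, e in enumerate(edges)}
rng = np.random.default_rng(0)
H = np.linalg.qr(rng.normal(size=(len(edges), 2)))[0]    # placeholder orthonormal basis
V = cumulative_embedding(H, [0, 1, 2, 3, 4], col)
```

Projecting each step and then accumulating is equivalent to accumulating the steps first, because the projection is linear.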

As observed in Extended Data Fig. 1e, these vectors are low-dimensional representations of paths in the cumulative trajectory embedding. By coloring paths from distinct groups with distinct colors, we can recognize branching-point events, branches shared by trajectory groups and terminal branches. Note also that if we consider only the final entry vS for every path, we obtain the same result as in the previously described trajectory embedding.

Recall that we cluster the n paths into m groups $\{g_1, g_2, \ldots, g_m\}$ with DBSCAN on the trajectory embedding defined in equation (21). PHLOWER next uses the procedure schematized in Supplementary Fig. 10 to find the differentiation tree structure. First, it estimates a pseudotime value for every edge as the average pseudotime $u^s$ of the vertices associated with the edge. It then bins all edges by pseudotime: it selects the trajectory group with the lowest pseudotime and splits its edges into p bins (Supplementary Fig. 10b). The same pseudotime range is used to bin all trajectory groups, and bins are indexed by increasing pseudotime (Supplementary Fig. 10c). After binning the edges of each group, PHLOWER finds the branching points for all group pairs by comparing, for a given bin index i, the distances between edges within a bin with the distances between edges of the two groups.

More formally, for groups $g_i$ and $g_j$ and bin k, the coordinates of their corresponding edges in the cumulative space are given by the sets $V_{ik}$ and $V_{jk}$. We then estimate the average edge coordinate per bin to serve as a backbone for every group, as shown in equation (24):

$$\bar{b}_{ik} = \frac{1}{M} \sum_{v \in V_{ik}} v \tag{24}$$

where $M = |V_{ik}|$ is the number of edges in the bin. We also consider the average distance between edges in a bin as an estimate of the compactness of the edges of a trajectory in that bin, as shown in equation (25):

$$\bar{\sigma}_{ik} = \frac{1}{M^2} \sum_{u \in V_{ik}} \sum_{v \in V_{ik}} \lVert u - v \rVert_2 \tag{25}$$

Finally, we calculate the distance between two groups $g_i$ and $g_j$ in bin k, as shown in equation (26):

$$d(i, j, k) = \left\lVert \frac{\bar{b}_{ik}}{\bar{\sigma}_{ik}} - \frac{\bar{b}_{jk}}{\bar{\sigma}_{jk}} \right\rVert_2 \tag{26}$$

For every pair of groups, PHLOWER finds a unique branching point by traversing bins in decreasing order and selecting the first bin k such that d(i, j, k) < σ (default 1).
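The branching-point search of equations (24)–(26) can be sketched as below, under the assumption that equation (26) is the Euclidean distance between the two backbones, each divided by its compactness. The synthetic groups, the bin layout and the helper names are hypothetical; each bin needs at least two edges so that the compactness is nonzero.

```python
import numpy as np
from scipy.spatial.distance import pdist

def bin_stats(points):
    """Backbone (eq. 24) and compactness (eq. 25) of one group's edges in one bin."""
    M = len(points)
    backbone = points.mean(axis=0)
    # eq. (25) averages over all M^2 ordered pairs; pdist lists each unordered pair once
    compactness = 2.0 * pdist(points).sum() / M**2
    return backbone, compactness

def branching_bin(group_i, group_j, threshold=1.0):
    """Scan bins from late to early; return the first bin where the two groups meet.

    group_i, group_j: lists of (M x h) arrays of cumulative-embedding coordinates,
    one array per pseudotime bin, indexed by increasing pseudotime.
    """
    for k in reversed(range(min(len(group_i), len(group_j)))):
        b_i, s_i = bin_stats(group_i[k])
        b_j, s_j = bin_stats(group_j[k])
        d = np.linalg.norm(b_i / s_i - b_j / s_j)        # eq. (26), as assumed here
        if d < threshold:
            return k
    return None

rng = np.random.default_rng(0)

def make_group(sign):
    # bins 0-2 are shared (centred on y = 0); bins 3-4 diverge to y = +3 or y = -3
    return [rng.normal([k, sign * 3.0 if k >= 3 else 0.0], 0.3, size=(50, 2))
            for k in range(5)]

group_a, group_b = make_group(+1), make_group(-1)
branch_bin = branching_bin(group_a, group_b)
```

Scanning from the latest bin backward means the first bin in which the groups are indistinguishable is the last shared bin, which serves as the branching point.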

This is repeated for all pairs of groups. The tree is then built in a bottom-up manner. PHLOWER first considers the branching points with the highest bin index and builds a subtree by merging the two trajectory groups at hand. This is repeated until all branching points have been considered (Supplementary Fig. 10d,e). PHLOWER finally defines the leaves and the root by finding the most extreme points, that is, the edges with the lowest and highest pseudotime in each trajectory group. These are the so-called milestones (branching points, root and end points) needed for evaluation by Dynverse8. Trajectories are redefined as branches, that is, the parts of the trajectories between two milestones. We also allocate cells (vertices) to these branches. For this, we consider the locations of all edges associated with a vertex and use their mean in the cumulative trajectory embedding space. This is used to allocate cells to branches and to compute distances to the respective milestones. Moreover, we provide this information to STREAM5 for visualization as a STREAM tree.

Regulator and marker selection

We use the cumulative trajectory space and statistical tests to find markers and regulators associated with trajectories. To compare two terminal branches, PHLOWER selects all cells associated with particular areas of the branches, that is, the highest 50% of the bins. Note that a cell can be visited by several traversed edges. To account for this information, we multiply the expression count vector of each cell by its number of visits. We then perform a statistical test (by default, the t-test from Scanpy70). If we are interested in comparing subtrees with several branches, PHLOWER first merges each subtree into a single branch before selecting the bins.
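A sketch of the visit-weighted comparison, assuming a per-gene Welch t-test from SciPy as a stand-in for the Scanpy test named in the text. The function name, expression data, gene names and visit counts are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def branch_markers(expr_a, visits_a, expr_b, visits_b, genes):
    """Per-gene Welch t-test between two branches after weighting cells by visit counts."""
    wa = expr_a * visits_a[:, None]        # cells x genes, each cell scaled by its visits
    wb = expr_b * visits_b[:, None]
    t, p = ttest_ind(wa, wb, axis=0, equal_var=False)
    order = np.argsort(-t)                 # most up-regulated in branch A first
    return [(genes[g], t[g], p[g]) for g in order]

rng = np.random.default_rng(0)
genes = [f"G{g}" for g in range(5)]
expr_a = rng.poisson(1.0, size=(100, 5)).astype(float)
expr_a[:, 0] += 5.0                        # G0 is up in branch A
expr_b = rng.poisson(1.0, size=(80, 5)).astype(float)
ranked = branch_markers(expr_a, rng.integers(1, 4, size=100),
                        expr_b, rng.integers(1, 4, size=80), genes)
```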

For multimodal data (RNA and ATAC), PHLOWER also has a module to find regulators. First, we estimate a TF activity score per cell using chromVAR71. We then use the test described above to find branch-specific regulators. As TF activity scores cannot discriminate between TFs with similar motifs, we only consider TFs whose activity is similar to their expression pattern along the selected branch, as in ref. 45. We first calculate the average expression and TF activity of cells in the bins around the branch of interest. We next smooth the gene expression using convolution (numpy.convolve). We then estimate the correlation between gene expression and TF activity and use it to rank branch-specific regulators.
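The correlation step can be sketched as follows, assuming a simple moving-average kernel for `numpy.convolve` (the text does not state the kernel or its width) and Pearson correlation via `np.corrcoef`. Bin averages and TF names are hypothetical.

```python
import numpy as np

def rank_regulators(tfs, expr_bins, activity_bins, width=3):
    """Rank TFs by correlation between smoothed expression and TF activity along bins.

    expr_bins, activity_bins: (n_tfs x n_bins) mean expression / activity per bin.
    """
    kernel = np.ones(width) / width                        # moving-average kernel
    scores = {}
    for name, e, a in zip(tfs, expr_bins, activity_bins):
        e_smooth = np.convolve(e, kernel, mode="same")     # zero-padded at both ends
        scores[name] = np.corrcoef(e_smooth, a)[0, 1]
    return sorted(scores.items(), key=lambda item: -item[1])

bins = np.linspace(0.0, 1.0, 10)
expr = np.array([bins, bins])                   # both TFs rise in expression
activity = np.array([2.0 * bins, 1.0 - bins])   # TF_A activity rises, TF_B falls
ranked = rank_regulators(["TF_A", "TF_B"], expr, activity)
```

A TF whose expression and motif activity move together (TF_A) ranks first, whereas one whose activity diverges from its expression (TF_B) receives a negative score.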

Materials

Synthetic datasets

To test the power of PHLOWER in the detection of multi-branching trees, we generate ten simulated datasets using DLA trees22 as proposed in PHATE14. For this, we use a Gaussian noise parameter of 5 and vary the parameter n_branch, which controls the number of branches in the trees, from 3 to 12. This generates trees with 5 to 18 branches. For all data, we generate 3,000 points and 100 features. Of note, the dataset with 10 branches is the same as that evaluated in PHATE. We then run PHATE to visualize the branches, which we use to construct the ground truth for Dynverse benchmarking.

Next, we reformat these data for use within the trajectory benchmarking framework Dynverse8 by using Dynwrap. In short, Dynverse needs detailed information on the branches, the branching points and the association of cells with branches. To provide this, we need to find the branching points of the DLA tree. First, we run PHATE on the high-dimensional DLA data to obtain a two-dimensional embedding. We only consider embeddings in which the tree structure is preserved. Next, we find the branching points by finding the pair of nearest neighbors between two branches. With these branching points, we create the branch backbones needed by Dynverse. We calculate the association of each data point with a branching point (milestone percentage) from the Euclidean distances between the points related to each branch.

Real scRNA-seq datasets

To evaluate the performance of PHLOWER on real datasets, we selected 33 real datasets from the Dynverse benchmarking dataset, including 4 gold-standard and 29 silver-standard datasets with known ground-truth tree structures. Specifically, we only consider data with a single root and at least three branches. These are classified as bifurcation, multifurcation and tree structures as in Dynverse (Supplementary Table 1). For this, we inspected the code from https://github.com/dynverse/dynbenchmark/ and ran the script to download all the necessary data.

Benchmarking evaluation

Dynverse includes wrappers for several trajectory methods. We explore the following methods in our evaluation: PAGA tree4, Monocle3 (ref. 23), Slingshot6, Slice26, TSCAN29, Slicer72, MST30 and Elpigraph31. We did not include Slicer in the final benchmarking, as it obtained poor results. We extend this set by including wrappers for STREAM5 and PHLOWER. Note that some tools (Palantir and MIRA7,73) do not infer differentiation trees and therefore cannot be evaluated here. We first calculate the distance of each cell to all branches. Next, we assign each cell to the nearest branch and use the distance ratio to associate cells with milestones. Then, we use Dyneval to measure quality metrics for the generated trajectories. As in ref. 8, we mainly focus on the HIM distance to measure tree structure similarity; cordist to measure the correlation between the cell geodesic distances within predicted and true branches; and F1,branches and F1,milestones to measure the accuracy of assigning cells to the correct branches and the correct milestones (bifurcation points). To obtain a final accuracy score, we use the procedure described in Dynverse, which is based on the average of these four statistics per dataset.

Statistics and reproducibility

Benchmarking analyses were evaluated with the Friedman–Nemenyi post hoc test74. First, Friedman’s test75 was performed to compare the average ranks of the methods across all datasets. Nemenyi’s post hoc test76 was performed for pairwise multiple comparisons. For imaging experiments, the number of samples for each group was chosen on the basis of the expected levels of variation and consistency. The depicted immunofluorescence graphs are representative. All imaging experiments were performed at least three times, and all repeats were successful.
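The rank-based part of this analysis can be sketched with SciPy's `friedmanchisquare` and `rankdata`. The function name is hypothetical, the scores below are random placeholders rather than benchmark results, and the Nemenyi post hoc step is not shown.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_summary(scores):
    """scores: datasets x methods (higher is better).

    Returns the mean rank of each method (1 = best) and the Friedman test P value.
    """
    ranks = np.apply_along_axis(rankdata, 1, -scores)   # rank methods within each dataset
    _, p_value = friedmanchisquare(*scores.T)           # one sample (column) per method
    return ranks.mean(axis=0), p_value

rng = np.random.default_rng(0)
scores = rng.uniform(size=(33, 4))   # random placeholder scores, not benchmark results
scores[:, 0] += 1.0                  # make method 0 the best on every dataset
mean_ranks, p_value = friedman_summary(scores)
```

Average ranks, rather than raw scores, are what the Nemenyi post hoc test compares, which makes the analysis robust to differences in score scale between datasets.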

Kidney organoids

Ethical statement

Permission for the creation and use of iPS cells in this study was obtained from the ethical commission at RWTH Aachen University Hospital (approval number EK23-193).

Cell culture

For the generation of the human iPS cell-353 line, erythroblasts from a healthy male volunteer were reprogrammed using the CytoTune-iPS 2.0 Sendai Reprogramming Kit (Thermo Fisher) according to the manufacturer’s protocol. In short, erythroblasts were transduced using the CytoTune-iPS 2.0 Sendai Reprogramming Kit and re-seeded on plates with inactivated MEFs to support the growth and pluripotency of iPS cells. Emerging iPS cell colonies were picked, expanded and assessed for activation of stem cell markers to confirm pluripotency.

Generation of human iPS cell-derived kidney organoids

For the generation of human iPS cell-derived kidney organoids, iPS cells were seeded at a density of 20,000 cells per cm2 on Geltrex (Thermo Fisher)-coated six-well plates (Greiner Bio-One). The differentiation protocol was based on our previous work36. In brief, differentiation toward the ureteric bud-like and metanephric mesenchyme lineages was initiated using CHIR-99021 (10 μM, BioTechne) in STEMdiff APEL 2 medium (StemCell Technologies) for 3 and 5 days, respectively. Next, the medium was sequentially replaced with APEL 2 supplemented with fibroblast growth factor 9 (200 ng ml−1, R&D Systems) and heparin (1 μg ml−1, Sigma-Aldrich) up to day 7. On day 7, differentiated cells were trypsinized and mixed in a ratio of one part 3-day CHIR-differentiated cells to two parts 5-day CHIR-differentiated cells to stimulate cross-talk between both lineages and boost segmented nephrogenesis. To generate cell pellets, 300,000 cells per 1.5-ml tube were aliquoted from the cell mix and centrifuged three times at 300g for 3 min, changing the tube position by 180° per cycle. Cell pellets were plated on Costar Transwell filters (type 3450, Corning, Sigma-Aldrich), followed by a 1-h CHIR pulse (5 μM) in APEL 2 medium added to the basolateral compartment. Next, the medium was replaced with APEL 2 medium supplemented with fibroblast growth factor 9 and heparin for an additional 5 days, and the entire three-dimensional organoid culture was performed at the air–liquid interface. After 5 days of organoid culture, APEL 2 medium was supplemented with human epidermal growth factor (10 ng ml−1, Sigma-Aldrich). Medium was replaced every other day for an additional 13 days.

Silencing of ZIC2, RFX4 and PAX3 during stem cell-derived kidney organoid development

Kidney organoids (at least N = 3 per condition per experiment, two independent experiments) were transfected with ON-TARGETplus SMARTpool siRNAs against ZIC2 (L-017505-00-0005), RFX4 (L-013577-00-0005) and PAX3 (L-012399-00-0005) (working concentration 25 nM, Horizon Discovery) using DharmaFECT Transfection Reagent 1 (0.5% vol/vol) from day 7+5 (intermediate mesoderm stage) until the end of the protocol (day 7+18). The medium was refreshed every other day.

RNA extraction, cDNA synthesis and qPCR

RNA from organoids was extracted using the PureLink RNA Mini Kit (Thermo Fisher) according to the manufacturer’s protocol and stored at −80 °C until further processing. cDNA was synthesized from 200 ng RNA using the iScript cDNA Synthesis Kit (Bio-Rad) according to the manufacturer’s protocol. mRNA expression was quantified by semi-quantitative real-time PCR using PowerUp SYBR Green Master Mix (Applied Biosystems) and primers targeting the human RFX4, ZIC2 and PAX3 genes (hZIC2_For AAAGGACCCACACAGGGGAGA, hZIC2_Rev GACGTGCATGTGCTTCTTCCT, hRFX4_For TGGGAAGAGCATGCATTGTG, hRFX4_Rev TCTTTCAATCCAGCTCTCTGTGG, hPAX3_For GCAGTATGGACAAAGTGCCT, hPAX3_Rev CAGGGCCAGTTTTAGCTCCA). After normalization to the housekeeping gene GAPDH (GAPDH_For GAAGGTGAAGGTCGGAGTCA, GAPDH_Rev TGGACTCCACGACGTACTCA), gene expression levels were plotted as fold change relative to control. Data plotting and statistical analysis were performed using GraphPad Prism (version 10.0.3).
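
The Methods state that target expression was normalized to GAPDH and reported as fold change relative to control, but do not name the calculation. A common choice is the 2^(−ΔΔCt) (Livak) method; the NumPy sketch below assumes that method and near-100% primer efficiency, which may differ from the exact computation performed in GraphPad Prism. The function name is hypothetical and the Ct values in the usage line are illustrative, not measured data.

```python
import numpy as np


def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated in the source).

    ct_target, ct_ref: Ct values of the target gene and GAPDH in the samples of interest.
    ct_target_ctrl, ct_ref_ctrl: the same for control samples; their mean dCt is the calibrator.
    Assumes amplification efficiencies close to 100% for all primer pairs.
    """
    d_ct = np.asarray(ct_target, dtype=float) - np.asarray(ct_ref, dtype=float)
    d_ct_ctrl = np.mean(np.asarray(ct_target_ctrl, dtype=float)
                        - np.asarray(ct_ref_ctrl, dtype=float))
    dd_ct = d_ct - d_ct_ctrl
    return 2.0 ** (-dd_ct)


# Illustrative only: target Ct is 2 cycles later than in control at equal GAPDH Ct.
print(fold_change_ddct([27.0], [18.0], [25.0], [18.0]))  # [0.25]
```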

Harvesting kidney organoids for the 10x Genomics Chromium Next GEM Multiome pipeline

To dissect nephrogenic differentiation trajectories in kidney organoids using the multiome pipeline, samples were collected at different time points during culture. The following differentiation stages were processed: day 7 (cells harvested from the two-dimensional cell layer; primitive streak–intermediate mesoderm stage) and organoids at day 7+5 (day 12), day 7+12 (day 19) and day 7+18 (day 25). These stages represent nephrogenesis ranging from intermediate mesoderm (day 7+5) toward metanephric mesenchyme and ureteric bud-like lineages (day 7+12), which give rise to nephron-like structures embedded in a (progenitor) stromal compartment at the end of the differentiation protocol (day 7+18). Kidney organoids (N = 4 per time point) were collected at the respective time points (day 7+5, day 7+12 and day 7+18) by cutting single organoids out of the Transwell filter with a scalpel in a sterile flow hood. Organoids were washed three times with 5 ml PBS per filter at room temperature. Afterwards, single organoids were cut from the Transwell filter, with the organoids still attached to the membrane, transferred to 1.7-ml cryovials (Greiner Bio-One), snap frozen and stored at −80 °C until nuclei isolation.

Single-nuclei isolation from kidney organoids

Snap-frozen kidney organoids were thawed in PBS and crushed using a glass tube and douncer (Duran Wheaton Kimble Life Sciences). After the suspension was passed through a 40-μm cell strainer (Greiner Bio-One), it was centrifuged at 4 °C and 300g for 5 min. The supernatant was then discarded and the pellet was resuspended in Nuc101 cell lysis buffer (Thermo Fisher) supplemented with RNase and protease inhibitors (Recombinant RNase Inhibitor and SUPERase RNase Inhibitor, Thermo Fisher, and cOmplete Protease Inhibitor, Roche), incubated for 30 s and centrifuged at 4 °C and 500g for 5 min. After the supernatant was discarded, the nuclei were carefully resuspended in PBS containing 1% (vol/vol) UltraPure bovine serum albumin (BSA; Invitrogen Ambion, Thermo Fisher) and Protector RNase inhibitor (Sigma-Aldrich). Single nuclei were counted using Trypan blue (Thermo Fisher) and prepared for the 10x Genomics Chromium Next GEM Multiome pipeline v1 according to the manufacturer’s guidelines (https://cdn.10xgenomics.com/image/upload/v1666737555/support-documents/CG000338_ChromiumNextGEM_Multiome_ATAC_GEX_User_Guide_RevF.pdf).

Formalin-fixed paraffin-embedded tissue of human iPS cell-derived kidney organoids

iPS cell-derived kidney organoids were cut from the Transwell filter and fixed in 4% (vol/vol) formalin on ice for 20 min. The fixed organoids were stripped of the filter membrane using a scalpel, and each organoid was embedded in 2.25% (wt/vol) agarose gel (Thermo Fisher). After embedding for 5 min at 4 °C, the organoids were transferred to embedding cassettes and embedded in paraffin. The paraffin-embedded organoids were cut into 4-μm sections using a rotary microtome (Microm HM355 S, GMI) and mounted on FLEX IHC microscope slides (DAKO, Agilent Technologies) for immunofluorescence staining or on Xenium slides for spatial analysis.

Immunofluorescence staining

Slides were deparaffinized in xylene (2×) and 100% (vol/vol) ethanol (3×), followed by antigen retrieval by boiling the slides in Tris-buffered EDTA (VWR Chemicals) for 20 min. Primary (1:100 dilution) and secondary (1:200 dilution) antibodies were diluted in PBS containing 1% (vol/vol) BSA (Sigma-Aldrich). Primary antibodies (NPHS1, AF 4269-SP, R&D Systems; vimentin, ab92547, Abcam; E-cadherin, 610405, BD Biosciences) were incubated overnight at 4 °C. Secondary antibodies (donkey anti-sheep IgG (H+L) Alexa Fluor 647 (Thermo Fisher), donkey anti-rabbit IgG (H+L) Alexa Fluor 488 (Thermo Fisher) and donkey anti-mouse IgG (H+L) Alexa Fluor 488) and DAPI (300 nM, 4′,6-diamidino-2-phenylindole dihydrochloride, Merck) were incubated at room temperature for 2 h. After each antibody incubation, slides were washed three times in PBS for 5 min. Slides were mounted using Fluoromount-G (Southern Biotech, SanBio). Images were captured using a Zeiss LSM 980 confocal microscope. Image analysis was performed using Fiji v2.14, and values were normalized to control. Data plotting and statistical analysis were performed using GraphPad Prism (version 10.0.3).

Computational analysis of kidney organoids

After read mapping with cellranger-arc (version 1.0.1), we filter out low-quality cells using information from both scRNA and scATAC reads. We first import the scRNA data into Seurat77 to obtain scRNA quality metrics, and then import the fragments into ArchR78 to obtain scATAC quality metrics. We retain only cells whose barcodes are present in both the scRNA and scATAC count matrices. Next, we keep cells with nFeature_RNA > 400, nCount_RNA < 400,000 and percent.mito < 5 and, for scATAC, apply the ArchR thresholds minTSS = 6, minFrags = 2,500 and maxFrags = 1 × 10⁵. We then preprocess the scRNA data with Seurat. We first normalize the data by calling NormalizeData with default parameters. Next, we find the top variable features with selection.method = ‘vst’. We then scale the data while regressing out cell-cycle and mitochondrial effects, and run PCA with 50 principal components (PCs). To remove batch effects, we run Harmony37 to integrate the four samples. For the scATAC data, we use ArchR for preprocessing. We first create Arrow files with the aforementioned filter thresholds by calling createArrowFiles; the tile matrices are created directly from the fragment files. Next, we compute doublet scores for each sample with addDoubletScores and remove doublets with filterDoublets using default parameters. We then run addIterativeLSI on the tile matrices to obtain a dimensional reduction, followed by batch correction with Harmony37 via the addHarmony function. Finally, to obtain a single joint dimension reduction for downstream analysis, we run MOJITOO9 with default parameters to integrate the scRNA and scATAC data.
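
The scRNA preprocessing chain above (log-normalization, regression of unwanted covariates, scaling and PCA) can be illustrated in NumPy. This is a simplified re-implementation for exposition, not Seurat's code: it mirrors the documented default of NormalizeData (LogNormalize with a scale factor of 10,000) and the per-gene linear-model residuals that ScaleData computes for vars.to.regress, but it omits ‘vst’ variable-feature selection and Harmony integration. Function names and toy data are hypothetical.

```python
import numpy as np


def log_normalize(counts, scale_factor=1e4):
    """LogNormalize-style transform of a cells x genes count matrix:
    divide by each cell's total counts, multiply by scale_factor, then log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against empty cells
    return np.log1p(counts / totals * scale_factor)


def regress_and_scale(expr, covariates, clip=10.0):
    """Regress covariates (e.g. cell-cycle scores, percent.mito) out of each gene with
    ordinary least squares, then z-score the residuals per gene and clip extremes."""
    design = np.column_stack([np.ones(expr.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ beta
    sd = resid.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant genes stay at zero
    z = (resid - resid.mean(axis=0)) / sd
    return np.clip(z, -clip, clip)


def pca_embedding(scaled, n_pcs=50):
    """Cell coordinates on the top principal components, computed by SVD."""
    n_pcs = min(n_pcs, *scaled.shape)
    centered = scaled - scaled.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(200, 80))  # toy data: 200 cells x 80 genes
covariates = rng.normal(size=(200, 2))     # e.g. two cell-cycle scores
embedding = pca_embedding(regress_and_scale(log_normalize(counts), covariates), n_pcs=50)
print(embedding.shape)  # (200, 50)
```

Because the residuals are orthogonal to the covariates, the scaled matrix carries no linear signal from them, which is the intent of regressing out cell-cycle and mitochondrial effects before PCA.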

The MOJITOO embedding was used as input for PHLOWER. An important parameter is the choice of root cells. To define them, we performed a clustering analysis using Seurat77 with resolution = 2.5 on the MOJITOO embedding space. Root cells were defined as clusters predominantly present at day 7 that expressed mesoderm markers (TBXT, MESP1, KDR5). PHLOWER found 28 dimensions with zero eigenvalues, and clustering analysis of the trajectory space detected 16 groups. We removed four outlier trajectory clusters (each containing less than 0.5% of the samples) and one trajectory cluster in which cells had low pseudotime values (that is, they did not differentiate). Finally, some clusters had the same end time points; for these, we kept the cluster with the largest number of trajectories. This resulted in the nine final trajectories found by PHLOWER.
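
The "28 dimensions with zero eigenvalues" refers to the harmonic space of a Hodge 1-Laplacian, whose dimension equals the number of independent cycles (holes) in the simplicial complex that are not filled by triangles. The toy NumPy sketch below builds the unnormalized combinatorial Hodge 1-Laplacian L1 = B1ᵀB1 + B2B2ᵀ from node-to-edge (B1) and edge-to-triangle (B2) incidence matrices and counts its near-zero eigenvalues. The exact Laplacian PHLOWER uses, and how it builds the complex from cells, may differ (for example, a normalized variant such as that in ref. 11); function names here are hypothetical.

```python
import numpy as np


def hodge_1_laplacian(n_nodes, edges, triangles):
    """Unnormalized Hodge 1-Laplacian L1 = B1^T B1 + B2 B2^T of a 2D simplicial complex.

    Edges are oriented from the lower to the higher node index; all three edges of
    every triangle must be present in `edges`.
    """
    edges = [tuple(sorted(e)) for e in edges]
    edge_index = {e: k for k, e in enumerate(edges)}

    # B1: node-to-edge incidence; the boundary of edge (tail, head) is head - tail.
    b1 = np.zeros((n_nodes, len(edges)))
    for k, (tail, head) in enumerate(edges):
        b1[tail, k] = -1.0
        b1[head, k] = 1.0

    # B2: edge-to-triangle incidence; boundary of [a, b, c] is [a, b] + [b, c] - [a, c].
    b2 = np.zeros((len(edges), len(triangles)))
    for t, tri in enumerate(triangles):
        a, b, c = sorted(tri)
        b2[edge_index[(a, b)], t] = 1.0
        b2[edge_index[(b, c)], t] = 1.0
        b2[edge_index[(a, c)], t] = -1.0

    # Sanity check: the boundary of a boundary must vanish.
    if not np.allclose(b1 @ b2, 0.0):
        raise ValueError("inconsistent edge/triangle orientation")
    return b1.T @ b1 + b2 @ b2.T


def harmonic_dimension(l1, tol=1e-9):
    """Dimension of the harmonic space = number of (near-)zero eigenvalues of L1."""
    return int(np.sum(np.linalg.eigvalsh(l1) < tol))


square = [(0, 1), (1, 2), (2, 3), (0, 3)]
with_diagonal = square + [(0, 2)]
print(harmonic_dimension(hodge_1_laplacian(4, square, [])))                             # 1
print(harmonic_dimension(hodge_1_laplacian(4, with_diagonal, [(0, 1, 2)])))             # 1
print(harmonic_dimension(hodge_1_laplacian(4, with_diagonal, [(0, 1, 2), (0, 2, 3)])))  # 0
```

Filling a triangle removes one cycle from the harmonic space, so only the loops that remain open contribute zero eigenvalues.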

Computational analysis of Xenium experiments of kidney organoids

To select markers, we used the phlower.tl.tree_mbranch_markers function to identify markers for each main branch (stromal, neuronal, tubular and podocyte), focusing on the cells with the highest PHLOWER pseudotime (top 50%). Next, we further selected markers whose expression was specific to sub-branches (tubular, podocytes, stromal 1, stromal 2–4, neuron 1 and neuron 2–3). For this, we retained only genes with an adjusted P value < 0.05 and log fold change (logFC) > 1.5 for the main-branch markers, and considered only protein-coding genes according to Ensembl (version GRCh38.104). We then scaled the logFC values of both main branches and sub-branches to a 0–1 range, calculated the mean normalized logFC for each branch and its sub-branches, and selected the top 12 genes for each sub-branch based on this mean. For TF selection, we used TF–gene correlation along each sub-branch, computed with scMEGA, to identify five TFs for each main branch (Extended Data Fig. 9). Additionally, we included literature-based cell-type markers reported in Extended Data Fig. 7, which supported the annotation of the multiome single-cell data. In total, we selected 100 genes for the Xenium experiment (Supplementary Table 2). The panel was designed with the Xenium Panel Designer tool provided by 10x Genomics.
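
The marker-ranking steps above (significance and logFC filters, protein-coding restriction, 0–1 scaling of main-branch and sub-branch logFC, averaging and top-12 selection) can be sketched with pandas. The column names (gene, logfc_main, logfc_sub, padj_main, protein_coding) and the function name are hypothetical, with one row per candidate gene for a given sub-branch and its parent main branch; this illustrates min–max scaling followed by averaging and is not the code behind phlower.tl.tree_mbranch_markers.

```python
import pandas as pd


def rank_sub_branch_markers(df, top_n=12, padj_cutoff=0.05, logfc_cutoff=1.5):
    """Return the top_n genes for one sub-branch by mean min-max-scaled logFC."""
    kept = df[
        (df["padj_main"] < padj_cutoff)
        & (df["logfc_main"] > logfc_cutoff)
        & df["protein_coding"]
    ].copy()
    scaled_cols = []
    for col in ("logfc_main", "logfc_sub"):
        lo, hi = kept[col].min(), kept[col].max()
        scaled = col + "_scaled"
        # Min-max scaling to [0, 1]; a constant column maps to 0.
        kept[scaled] = (kept[col] - lo) / (hi - lo) if hi > lo else 0.0
        scaled_cols.append(scaled)
    kept["mean_scaled"] = kept[scaled_cols].mean(axis=1)
    return kept.nlargest(top_n, "mean_scaled")["gene"].tolist()


genes = pd.DataFrame({
    "gene": ["A", "B", "C", "D", "E", "F", "G"],
    "logfc_main": [3.0, 2.0, 5.0, 1.0, 2.5, 4.0, 3.5],
    "logfc_sub": [2.0, 4.0, 0.5, 3.0, 1.0, 1.0, 3.5],
    "padj_main": [0.01, 0.01, 0.2, 0.01, 0.01, 0.001, 0.01],
    "protein_coding": [True, True, True, True, False, True, True],
})
print(rank_sub_branch_markers(genes, top_n=1))  # ['G']
```

Here C fails the adjusted P value filter, D the logFC filter and E the protein-coding filter; G ranks first because it is high on both scaled axes.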

For the data analysis, we used Xenium Explorer 3 to inspect each region, removing regions that did not show differentiation as well as stromal cell patches around organoid edges. This left two scrambled siRNA regions, one siRNA region, one day-19 region and two day-25 regions. Note that multiple regions correspond to distinct sections of the same organoid. Next, we used Space Ranger with default parameters, loaded each Xenium sample into Seurat (version 5.0.2) with the LoadXenium function and filtered cells based on the organoid masks. We merged all cells using Seurat’s merge function and filtered out cells with zero nCount_Xenium. We then applied SCTransform for normalization, PCA for dimensionality reduction and Louvain clustering (50 PCs, resolution = 0.3), and computed a uniform manifold approximation and projection (UMAP) embedding for visualization. Clusters were annotated by examining the expression of known cell-type markers. We excluded five clusters (33,396 cells) owing to low nFeature_Xenium (<20) and nCount_Xenium (<60); these potentially represent cells whose markers are not present in our panel.

For the day-19 and day-25 regions, we performed trajectory inference with PHLOWER on a subset of cells (2,000 per cluster). We used mesoderm cells as the root and identified three end branches. Finally, we integrated PHLOWER with STREAM for visualization.
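
Subsampling a fixed number of cells per cluster bounds the size of the trajectory-inference problem while keeping large clusters from dominating it. A minimal NumPy sketch follows; the function name and toy labels are hypothetical, sampling is without replacement, and clusters smaller than the target are assumed to be kept whole, since the Methods do not state how they were handled.

```python
import numpy as np


def subsample_per_cluster(labels, n_per_cluster=2000, seed=0):
    """Return sorted indices of at most n_per_cluster randomly chosen cells per cluster."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    chosen = []
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        take = min(n_per_cluster, idx.size)  # small clusters are kept whole
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(chosen))


labels = np.array(["mesoderm"] * 3000 + ["stromal"] * 500)
keep = subsample_per_cluster(labels)
print(keep.size)  # 2500
```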

Pancreas and neurogenesis scRNA-seq data

We applied PHLOWER and competing approaches to pancreatic endocrinogenesis data (3,696 cells)32 and to data on the developing mouse hippocampus (neurogenesis, 18,213 cells)33, both of which were used in the scVelo study34. Specifically, we obtained the AnnData objects from the scVelo package via scvelo.datasets.pancreas and scvelo.datasets.dentategyrus_lamanno. For both datasets, we applied the standard Scanpy pipeline (filtering, feature selection, normalization and PCA) with default parameters. See the PHLOWER tutorials for details on how PHLOWER was run on these datasets.

Competing approaches were executed with Dynverse. Briefly, we load the h5ad file in R with anndata::read_h5ad and create a SeuratObject from the count matrix. We then normalize the data and identify the top 2,000 variable features to obtain the log-normalized expression matrix. Next, we use dynwrap::wrap_expression to build a Dynverse dataset object containing the count and expression data. The start_id is set to the first ‘Ductal’ cell ID for pancreas and the first ‘nIPC’ cell ID for neurogenesis, and group_id is assigned based on cell types. The dataset object is then saved as an R object, enabling trajectory inference with dynwrap::infer_trajectory. Finally, we visualize the inferred trajectories with dynplot::plot_dimred.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41592-025-02870-5.

Supplementary information

Supplementary Information (84.2MB, pdf)

Supplementary Figs. 1–10 and Supplementary Tables 1 and 2.

Reporting Summary (1.5MB, pdf)
Peer Review File (2.1MB, pdf)

Source data

Source Data Fig. 2 (102.1KB, xlsx)

Benchmarking results on simulated and real datasets.

Source Data Fig. 3 (222.9KB, xlsx)

Gene marker count matrix of the kidney organoid multiome data.

Source Data Fig. 4 (10.2KB, xlsx)

Permutation test to measure relative differences in cell proportions comparing cell abundances in day 25 versus day 19.

Source Data Fig. 5 (12.5KB, xlsx)

Permutation test for siRNA and scrambled siRNA kidney organoids and quantification of the immunofluorescence staining markers.

Source Data Extended Data Fig. 2 (2.1MB, xlsx)

Friedman–Nemenyi test benchmarking methods across simulated and real datasets; different trajectory type contribution of the benchmarking metrics; distributions of topology size differences between predicted and reference structures.

Source Data Extended Data Fig. 3 (11.7KB, xlsx)

Time and memory benchmarking.

Source Data Extended Data Fig. 4 (49.9KB, xlsx)

PH analysis of example datasets.

Source Data Extended Data Fig. 5 (50.2KB, xlsx)

One-dimensional PH analysis of example datasets.

Source Data Extended Data Fig. 6 (839.9KB, xlsx)

Quality check of kidney organoid multiome data.

Source Data Extended Data Fig. 8 (338.1KB, xlsx)

Differential expression analysis and transcription factor–gene expression correlation to find regulators.

Source Data Extended Data Fig. 10 (26.8KB, xlsx)

Gene expression in kidney organoid multiome data for Xenium experiments; Xenium quality check, UMAP and gene marker expressions; statistical test of mRNA expression of genes targeted by siRNA experiments.

Acknowledgements

We thank the IZKF Aachen Genomics Core facility for sequencing experiments. This project has been funded by the German Research Foundation (DFG; 3888802535, CRU344-417911533, CRU344-4288578857858, CRU5011-445703531, CRU5011-445703531, SFBTRR219 322900939, Emmy Noether EN-459969915 and Research Training Group 2236 UnRAVeL), by the Consortia E:MED Fibromap, CureFib, Graphs4Patients and AgedHeart funded by the German Ministry of Science, Technology and Space (BMFTR). This work was further supported by grants from the European Research Council (ERC-COG 101043403, ERC-PoC 101138549, ERC-StG-101040726, ERC-StG-101039827 HIGH-HOPeS), the Dutch Kidney Foundation (DKF), TASKFORCE EP1805, the NWO VIDI 09150172010072, Else Kroener Fresenius Foundation (EKFS) and the Aventis Foundation. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.


Author contributions

M.C., I.C., V.G. and M.S. conceived the computational methods, while J.J., K.R., C.K. and R.K. conceived the organoid, wet-lab and sequencing experiments. M.C. wrote the code and performed the computational analysis, except where otherwise noted. V.G., J.N. and M.G. supported the implementation of PHLOWER. Z.L. performed the analysis of the single-cell multimodal data, and P.K. supported the analysis of the Xenium data. J.J. and K.R. performed the organoid experiments, and J.J. performed the knockdown validations. All authors edited, reviewed and approved the final manuscript.

Peer review

Peer review information

Nature Methods thanks Kelin Xia, Markus List, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Madhura Mukhopadhyay, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

Data objects with the kidney organoid (multiome and Xenium) and the benchmarking data have been deposited in Zenodo via 10.5281/zenodo.13860460 (ref. 79). Single-cell multimodal data and spatial transcriptomics have been deposited in the Gene Expression Omnibus (GEO) under the accessions GSE302266 and GSE302264. Source data are provided with this paper.

Code availability

Code, documentation and examples for running analysis of this paper are available on GitHub via https://github.com/CostaLab/phlower/ and readthedocs via https://phlower.readthedocs.io/en/latest/.

Competing interests

R.K. is founder and board member of Sequantrix, is a member of the scientific advisory board of Hybridize Therapeutics, received honoraria from Bayer, Chugai, Pfizer, Roche, Genentech, Lilly and GSK, and received research funding from Travere Therapeutics, Galapagos, Novo Nordisk and Ask Bio. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Mingbo Cheng, Jitske Jansen.

These authors jointly supervised this work: Christoph Kuppe, Michael T. Schaub, Rafael Kramann, Ivan G. Costa.

Contributor Information

Rafael Kramann, Email: rkramann@ukaachen.de.

Ivan G. Costa, Email: ivan.costa@rwth-aachen.de

Extended data is available for this paper at 10.1038/s41592-025-02870-5.

Supplementary information

The online version contains supplementary material available at 10.1038/s41592-025-02870-5.

References

  • 1.Kim, J., Koo, B. K. & Knoblich, J. A. Human organoids: model systems for human biology and medicine. Nat. Rev. Mol. Cell Biol.21, 571–584 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell183, 1103–1116 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Costa, I. G. Dissecting gene regulation with multimodal sequencing. Nat. Methods20, 1282–1284 (2023). [DOI] [PubMed] [Google Scholar]
  • 4.Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol.20, 59 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Chen, H. et al. Single-cell trajectories reconstruction, exploration and mapping of omics data with stream. Nature Commun.10, 1903 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics19, 477 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lynch, A. W. et al. MIRA: joint regulatory modeling of multimodal expression and chromatin accessibility in single cells. Nat. Methods19, 1097–1108 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol.37, 547–554 (2019). [DOI] [PubMed] [Google Scholar]
  • 9.Cheng, M., Li, Z. & Costa, I. G. MOJITOO: a fast and universal method for integration of multimodal single-cell data. Bioinformatics38, i282–i289 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Argelaguet, R. et al. Mofa+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol.21, 111 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Schaub, M. T., Benson, A. R., Horn, P., Lippner, G. & Jadbabaie, A. Random walks on simplicial complexes and the normalized Hodge 1-Laplacian. SIAM Review62, 353–391 (2020). [Google Scholar]
  • 12.Coifman, R. R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl Acad. Sci USA102, 7426–7431 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics31, 2989–2998 (2015). [DOI] [PubMed] [Google Scholar]
  • 14.Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol.37, 1482–1492 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput.17, 395–416 (2007). [Google Scholar]
  • 16.Frantzen, F., Seby, J. -B. & Schaub, M. T. Outlier detection for trajectories via flow-embeddings. In 55th Asilomar Conference on Signals, Systems, and Computers 1568–1572 (IEEE, 2021).
  • 17.Bertin, J. Introduction to Hodge Theory Vol. 8 (American Mathematical Society, 2002).
  • 18.Dodziuk, J. Finite-difference approach to the Hodge theory of harmonic forms. Am. J. Math.98, 79–104 (1976). [Google Scholar]
  • 19.Chen, Y. -C., Wu, W., Meilă, M. & Kevrekidis, I. G. Helmholtzian eigenmap: topological feature discovery and edge flow learning from point cloud data. Preprint at 10.48550/arXiv.2103.00762 (2021).
  • 20.Maehara, K. & Ohkawa, Y. Modeling latent flows on single-cell data using the hodge decomposition. Preprint at bioRxiv10.1101/592089 (2019).
  • 21.Treutlein, B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature534, 391–395 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Witten Jr, T. A. & Sander, L. M. Diffusion-limited aggregation, a kinetic critical phenomenon. Phys. Rev. Lett.47, 1400 (1981). [Google Scholar]
  • 23.Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature566, 496–502 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H. & Tsuda, K. CellTree: an R/Tioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics17, 363 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Herring, C. A. et al. Unsupervised trajectory analysis of single-cell RNA-seq and imaging data reveals alternative tuft cell origins in the gut. Cell Syst.6, 37–51 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Guo, M., Bao, E. L., Wagner, M., Whitsett, J. A. & Xu, Y. Slice: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res.45, e54–e54 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature525, 251–255 (2015). [DOI] [PubMed] [Google Scholar]
  • 28.Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell19, 266–277 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res.44, e117–e117 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Graham, R. L. & Hell, P. On the history of the minimum spanning tree problem. Ann. Hist. Comput.7, 43–57 (1985). [Google Scholar]
  • 31.Albergante, L. et al. Robust and scalable learning of complex intrinsic dataset geometry via elpigraph. Entropy22, 296 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Bastidas-Ponce, A. et al. Comprehensive single cell mrna profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development146, dev173849 (2019). [DOI] [PubMed] [Google Scholar]
  • 33.La Manno, G. et al. RNA velocity of single cells. Nature560, 494–498 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol.38, 1408–1414 (2020). [DOI] [PubMed] [Google Scholar]
  • 35.Carlsson, G. Topology and data. Bull. Am. Math. Soc.46, 255–308 (2009). [Google Scholar]
  • 36.Jansen, J. et al. SARS-CoV-2 infects the human kidney and drives fibrosis in kidney organoids. Cell Stem Cell29, 217–231 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods16, 1289–1296 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Schmidt-Ott, K. M. How to grow a kidney: patient-specific kidney organoids come of age. Nephrol. Dial. Transplant.32, 17–23 (2016). [Google Scholar]
  • 39.Evseenko, D. et al. Mapping the first stages of mesoderm commitment during differentiation of human embryonic stem cells. Proc. Natl Acad. Sci. USA107, 13742–13747 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Menon, R. et al. Single-cell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney. Development145, dev164038 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hochane, M. et al. Single-cell transcriptomics reveals gene expression dynamics of human fetal kidney development. PLoS Biol.17, e3000152 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Combes, A. N. et al. Single cell analysis of the developing mouse kidney provides deeper insight into marker gene expression and ligand-receptor crosstalk. Development146, dev178673 (2019). [DOI] [PubMed] [Google Scholar]
  • 43.Wu, H. et al. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell23, 869–881 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Subramanian, A. et al. Single cell census of human kidney organoids shows reproducibility and diminished off-target cells after transplantation. Nat. Commun.10, 5462 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Li, Z., Nagai, J. S., Kuppe, C., Kramann, R. & Costa, I. G. scMEGA: single-cell multi-omic enhancer-based gene regulatory network inference. Bioinform. Adv.3, vbad003 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Kann, M. et al. Genome-wide analysis of Wilms’ tumor 1-controlled gene expression in podocytes reveals key regulatory mechanisms. J. Am. Soc. Nephrol.26, 2097 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Schreibing, F. & Kramann, R. Mapping the human kidney using single-cell genomics. Nat. Rev. Nephrol.18, 347–360 (2022). [DOI] [PubMed] [Google Scholar]
  • 48.Kompatscher, A. et al. Loss of transcriptional activation of the potassium channel kir5. 1 by HNF1β drives autosomal dominant tubulointerstitial kidney disease. Kidney Int.92, 1145–1156 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Aue, A. et al. A grainyhead-like 2/ovo-like 2 pathway regulates renal epithelial barrier function and lumen expansion. J. Am. Soc. Nephrol.26, 2704 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.García-Palmero, I. et al. Twist1-induced activation of human fibroblasts promotes matrix stiffness by upregulating palladin and collagen a1(VI). Oncogene35, 5224–5236 (2016). [DOI] [PubMed] [Google Scholar]
  • 51.Kawane, T. et al. Runx2 is required for the proliferation of osteoblast progenitors and induces proliferation by regulating Fgfr2 and Fgfr3. Sci. Rep.8, 13551 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Terzić, J. & Saraga-Babić, M. Expression pattern of PAX3 and PAX6 genes during human embryogenesis. Int. J. Dev. Biol.43, 501–508 (1999). [PubMed] [Google Scholar]
  • 53.Nagai, T. et al. Zic2 regulates the kinetics of neurulation. Proc. Natl. Acad. Sci. USA.97, 1618–1623 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Zhang, D., Zeldin, D. C. & Blackshear, P. J. Regulatory factor X4 variant 3: a transcription factor involved in brain development and disease. J. Neurosci. Res.85, 3515–3522 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun.14, 8353 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Safi, W. et al. Assessing kidney development and disease using kidney organoids and CRISPR engineering. Front. Cell Dev. Biol.10, 948395 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Obayashi, I. Volume-optimal cycle: tightest representative cycle of a generator in persistent homology. SIAM J. Appl. Algebr. Geom.2, 508–534 (2018). [Google Scholar]
  • 58.Wei, R. K. J., Wee, J., Laurent, V. E. & Xia, K. Hodge theory-based biomolecular data analysis. Sci. Rep.12, 9699 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Su, Z., Tong, Y. & Wei, G. -W. Hodge decomposition of single-cell RNA velocity. J. Chem. Inf. Model.64, 3558–3568 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Meng, Z. & Xia, K. Persistent spectral-based machine learning (PerSpect ML) for protein-ligand binding affinity prediction. Sci. Adv.7, eabc5329 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Schott, M. et al. Open-ST: high-resolution spatial transcriptomics in 3D. Cell187, 3953–3972 (2024). [DOI] [PubMed] [Google Scholar]
  • 62.Gansner, E. R., Koren, Y. & North, S. Graph drawing by stress majorization. In International Symposium on Graph Drawing 239–250 (Springer, 2004).
  • 63.Sriramkumar, S. et al. Single-cell analysis of a high-grade serous ovarian cancer cell line reveals transcriptomic changes and cell subpopulations sensitive to epigenetic combination treatment. PLoS ONE17, e0271584 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc. Ser. B Methodol.57, 289–300 (1995). [Google Scholar]
  • 65.Zelnik-Manor, L. & Perona, P. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems17 (2004).
  • 66.Ortmann, M., Klimenta, M. & Brandes, U. A sparse stress model. In International Symposium on Graph Drawing and Network Visualization 18–32 (Springer, 2016).
  • 67.Delaunay, B., Vide, S., Lamémoire, A. & De Georges, V. Bulletin de l’academie des sciences de l’urss. Classe des Sciences Mathématiques et naturelles6, 793–800 (1934). [Google Scholar]
  • 68.Schaub, M. T., Zhu, Y., Seby, J. -B., Roddenberry, T. M. & Segarra, S. Signal processing on higher-order networks: livin’ on the edge… and beyond. Signal Process.187, 108149 (2021). [Google Scholar]
  • 69.Ester, M., Kriegel, H. -P., Sander, J. & Xu, X. et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd International Conference on Knowledge Discovery and Data Mining 226–231 (ACM, 1996).
  • 70. Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
  • 71. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
  • 72. Welch, J. D., Hartemink, A. J. & Prins, J. F. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 17, 106 (2016).
  • 73. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
  • 74. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006).
  • 75. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32, 675–701 (1937).
  • 76. Nemenyi, P. B. Distribution-free Multiple Comparisons (Princeton University, 1963).
  • 77. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
  • 78. Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
  • 79. Cheng, M. & Costa, I. PHLOWER - single cell trajectory analysis using decomposition of the Hodge Laplacian. Zenodo 10.5281/zenodo.13860460 (2024).

Supplementary Materials

Supplementary Information (84.2MB, pdf)

Supplementary Figs. 1–10 and Supplementary Tables 1 and 2.

Reporting Summary (1.5MB, pdf)
Peer Review File (2.1MB, pdf)
Source Data Fig. 2 (102.1KB, xlsx)

Benchmarking results on simulated and real datasets.

Source Data Fig. 3 (222.9KB, xlsx)

Gene marker count matrix of the kidney organoid multiome data.

Source Data Fig. 4 (10.2KB, xlsx)

Permutation test of relative differences in cell proportions between day 25 and day 19.

Source Data Fig. 5 (12.5KB, xlsx)

Permutation test comparing siRNA-treated and scrambled-siRNA kidney organoids, and quantification of immunofluorescence staining markers.

Source Data Extended Data Fig. 2 (2.1MB, xlsx)

Friedman–Nemenyi tests comparing the benchmarked methods across simulated and real datasets; contributions of the different trajectory types to the benchmarking metrics; distributions of topology size differences between predicted and reference structures.

Source Data Extended Data Fig. 3 (11.7KB, xlsx)

Time and memory benchmarking.

Source Data Extended Data Fig. 4 (49.9KB, xlsx)

PH analysis of example datasets.

Source Data Extended Data Fig. 5 (50.2KB, xlsx)

One-dimensional PH analysis of example datasets.

Source Data Extended Data Fig. 6 (839.9KB, xlsx)

Quality check of kidney organoid multiome data.

Source Data Extended Data Fig. 8 (338.1KB, xlsx)

Differential expression analysis and transcription factor–gene expression correlation analysis to identify regulators.

Source Data Extended Data Fig. 10 (26.8KB, xlsx)

Gene expression in kidney organoid multiome data for the Xenium experiments; Xenium quality check, UMAP and marker gene expression; statistical tests of mRNA expression of genes targeted in the siRNA experiments.

Data Availability Statement

Data objects for the kidney organoid (multiome and Xenium) and benchmarking data have been deposited in Zenodo (10.5281/zenodo.13860460; ref. 79). Single-cell multimodal data and spatial transcriptomics data have been deposited in the Gene Expression Omnibus (GEO) under accession numbers GSE302266 and GSE302264. Source data are provided with this paper.

Code, documentation and examples for running the analyses in this paper are available on GitHub (https://github.com/CostaLab/phlower/) and Read the Docs (https://phlower.readthedocs.io/en/latest/).


Articles from Nature Methods are provided here courtesy of Nature Publishing Group