This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2023 Sep 26:arXiv:2211.00805v2. [Version 2]

GEODESIC SINKHORN FOR FAST AND ACCURATE OPTIMAL TRANSPORT ON MANIFOLDS

Guillaume Huguet 1,*, Alexander Tong 1,*, María Ramos Zapatero 2, Christopher J Tape 2, Guy Wolf 1, Smita Krishnaswamy 3
PMCID: PMC10557786  PMID: 37808090

Abstract

Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods are currently the state-of-the-art for such computations, but require $O(n^2)$ computations. In addition, Sinkhorn-based methods commonly use a Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, it is often desirable to consider a geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn, based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only $O(n \log n)$ computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.

1. INTRODUCTION

Optimal Transport (OT) distances, or Wasserstein distances, lift a ground distance between points to a distance between measures. Because they are computed relative to a ground distance on the support of the distributions, they are more informative than distances based only on a pointwise comparison of the densities. However, to compute the Wasserstein distance, one needs to find the optimal transport plan from the source distribution to a target distribution; this is a linear programming problem requiring $O(n^3 \log n)$ operations for discrete distributions of size $n$ [1].

An efficient modification of the optimal transport problem is to consider entropy-regularized transportation. This formulation is solved with the Sinkhorn algorithm [2] by iteratively rescaling a Gaussian kernel based on the distance matrix. It is equivalent to the Schrödinger Bridge problem, for which similar algorithms were developed [3]–[5]. In the discrete case, it requires $O(n^2)$ operations for distributions of size $n$, since it relies on matrix-vector products. Furthermore, this formulation allows for fast computation of the discrete barycenter with fixed support (the average distribution w.r.t. the Sinkhorn distance). An important drawback of the Sinkhorn algorithm is the need to store the dense pairwise distance matrix and to multiply it with a vector.

Additionally, the ground distance is commonly chosen as the Euclidean distance. The Euclidean distance is often sub-optimal for high-dimensional datasets over larger distances according to the manifold hypothesis, which says observations lie near a low dimensional (curved) manifold in high dimensional space [6]. For higher dimensional datasets assumed to be sampled from a lower dimensional manifold, using a distance closer to the manifold for OT has shown promising results [7]–[10].

In this work, we present Geodesic Sinkhorn, a Sinkhorn-based method for fast optimal transport with a heat-geodesic ground distance. Our method is based on the geometry of the dataset constructed from a common graph and uses the heat kernel on the graph to define a heat-geodesic distance. Key to this approach, we never need to construct or operate on an $n \times n$ distance matrix; we only use the sparse Laplacian matrix and sparse matrix-vector products. For sparse graphs, this allows $O(n \log n)$ computation of the Sinkhorn distance with a manifold ground distance, improving on the $O(n^2)$ implementations based on dense matrices.

Improving on the state-of-the-art efficiency of Sinkhorn computation makes it practical to perform complex operations on large groups of datasets. In particular, we consider interpolating between datasets and show that using our heat-geodesic distance improves the interpolation accuracy compared to OT with Euclidean distance. The barycenter corresponds to the average distribution of a set of distributions. Our method allows for finer-grained barycenters on a data manifold, which motivates us to define a novel notion of dissimilarity between families of distributions called barycentric distance.

We apply the barycentric distance to single cell data from patient-derived cancer organoids (PDOs) to assess the effect of treatments (such as drugs and chemotherapy). Here we have one set of PDOs from control conditions, and another set that is treated. The treatment effect is then measured as the distance between the barycenters of these two sets. In addition, we use Geodesic Sinkhorn’s barycenter to compare the effect from one family of distributions to another.

Our main contributions include: (1) A new method for computing optimal transport distances on a manifold called Geodesic Sinkhorn, which is highly efficient in time and memory. (2) The definition of the barycentric distance, a novel distance between families of distributions, and a demonstration of its utility in deriving treatment effect from control and treated patient samples.

2. RELATED WORK

Geodesic Sinkhorn is related to prior work linking the entropy-regularized optimal transport problem on triangular meshes with the heat operator [8], [11], but uses different graph filtering techniques. These approaches approximate the application of the heat kernel to a vector by discretizing the heat equation and solving systems of linear equations. This technique was used in different contexts, either with the cotangent Laplacian [8] or to learn a ground metric [12]. Solving these systems for each Sinkhorn iteration can be done efficiently with a sparse Cholesky decomposition. However, the efficiency of this method depends mainly on that of the Cholesky decomposition, which can be slow depending on the sparsity pattern and is $O(n^3)$ for a dense $n \times n$ matrix. It also requires solving $2K$ systems of linear equations per Sinkhorn iteration, where $K$ is the number of sub-steps in the backward Euler discretization.

3. PRELIMINARIES

In this section, we start by reviewing the basics of OT and the Wasserstein distance, as well as the Sinkhorn distance. Then we review two notions fundamental to our method: the heat equation on a graph and the Chebyshev approximation of the heat kernel.

3.1. Wasserstein Distance

In the following, we assume that all distributions admit a density or a probability mass function, and we use the same notation for both. Let $\mu, \nu$ be two probability distributions on a measurable space $\mathcal{X} \subseteq \mathbb{R}^d$ with metric $d(\cdot,\cdot)$, and let $\Pi(\mu,\nu)$ be the set of joint probability distributions $\pi$ on the space $\mathcal{X} \times \mathcal{X}$ where, for any measurable subset $\omega \subseteq \mathcal{X}$, $\pi(\omega \times \mathcal{X}) = \mu(\omega)$ and $\pi(\mathcal{X} \times \omega) = \nu(\omega)$. The $p$-Wasserstein distance is defined as:

$$W_p(\mu,\nu) := \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathcal{X}^2} d(x,y)^p \, d\pi(x,y) \right)^{1/p}. \qquad (1)$$

In the following, we consider $p=2$. An exact algorithm based on linear programming can solve this problem in $O(n^3 \log n)$ time for discrete distributions of size $n$.
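For small discrete problems, equation 1 can be written directly as a linear program over the entries of the transport plan. The following is a minimal sketch of that idea, assuming a Euclidean ground distance and using SciPy's general-purpose `linprog` solver; the function name `exact_wasserstein` and the toy inputs are illustrative and not part of the paper, and a dedicated OT solver would be preferable in practice.

```python
import numpy as np
from scipy.optimize import linprog

def exact_wasserstein(x, y, mu, nu, p=2):
    """p-Wasserstein distance between two discrete measures via a linear program."""
    n, m = len(mu), len(nu)
    # Ground cost d(x_i, y_j)^p, with d assumed Euclidean here
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** p
    # Marginal constraints on the row-major flattened plan pi
    A_rows = np.kron(np.eye(n), np.ones((1, m)))   # sum_j pi_ij = mu_i
    A_cols = np.kron(np.ones((1, n)), np.eye(m))   # sum_i pi_ij = nu_j
    res = linprog(cost.ravel(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p), res.x.reshape(n, m)

# Toy example: two uniform measures on the real line
x = np.array([[0.0], [1.0]])
y = np.array([[1.0], [2.0]])
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
w2, plan = exact_wasserstein(x, y, mu, nu)
```

In this toy case the optimal plan moves each point one unit to the right, so the 2-Wasserstein distance is 1. The number of variables grows as $n^2$, which is why the exact approach does not scale to large datasets.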

3.2. Sinkhorn Distances

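The Kullback-Leibler (KL) divergence between $\pi$ and some strictly positive $K$ on $\mathcal{X} \times \mathcal{X}$ is defined as

$$D_{KL}(\pi \,\|\, K) := \int_{\mathcal{X}^2} \left( \ln \frac{\pi(x,y)}{K(x,y)} - 1 \right) d\pi(x,y). \qquad (2)$$

The Sinkhorn distance (see footnote 2) is a relaxation of equation 1 where the infimum is taken over all couplings in $\{\pi \in \Pi(\mu,\nu) : D_{KL}(\pi \,\|\, \mu \times \nu) \le \xi\}$ for $\xi > 1$. Introduced in [13], the optimization of this distance can be solved by considering the entropy-regularized transport

$$W_{d,\lambda}^{2}(\mu,\nu) := \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathcal{X}^2} d(x,y)^2 \, d\pi(x,y) - \lambda H(\pi) \right)^{1/2}, \qquad (3)$$

where we define the entropy of a coupling $\pi$ as $H(\pi) := -\int_{\mathcal{X}^2} \ln \pi(x,y) \, d\pi(x,y)$, and $\lambda > 0$. This formulation converges to the Wasserstein distance as $\lambda \to 0$, and can be solved with the Sinkhorn algorithm with complexity of the order $O(n^2/\epsilon)$ for discrete distributions of size $n$ [13]. In the discrete case, the transport matrix $\pi$ admits the form $\mathrm{diag}(v) K_\lambda \mathrm{diag}(w)$, where $v, w$ are vectors of size $n$. The Sinkhorn algorithm iteratively updates the vectors as $(v, w) \leftarrow (\mu \,./\, K_\lambda w, \ \nu \,./\, K_\lambda^{\top} v)$, where $./$ denotes elementwise division and $K_\lambda := e^{-d(x,y)^2/\lambda}$.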
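The alternating updates of $v$ and $w$ can be sketched in a few lines. This is a minimal dense illustration, assuming a precomputed matrix of squared distances and a fixed number of iterations; the function name `sinkhorn`, the grid, and the value of `lam` are chosen for illustration only, and a practical implementation would add a stopping criterion and work in the log domain for small regularization.

```python
import numpy as np

def sinkhorn(mu, nu, cost_sq, lam, n_iter=500):
    """Entropy-regularized OT with a dense Gibbs kernel K = exp(-d^2 / lam)."""
    K = np.exp(-cost_sq / lam)
    v = np.ones_like(mu)
    w = np.ones_like(nu)
    for _ in range(n_iter):
        v = mu / (K @ w)      # enforce the first marginal
        w = nu / (K.T @ v)    # enforce the second marginal
    plan = v[:, None] * K * w[None, :]   # diag(v) K diag(w)
    return plan, v, w

# Two histograms on a regular grid of [0, 1]
x = np.linspace(0.0, 1.0, 20)
cost_sq = (x[:, None] - x[None, :]) ** 2
mu = np.exp(-(x - 0.2) ** 2 / 0.02)
mu /= mu.sum()
nu = np.exp(-(x - 0.7) ** 2 / 0.02)
nu /= nu.sum()
plan, v, w = sinkhorn(mu, nu, cost_sq, lam=0.5)
```

Each iteration costs two dense matrix-vector products, which is the $O(n^2)$ cost discussed above; the kernel matrix itself also has to be stored.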

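Following [8], using the kernel $K_\lambda$ gives an alternative interpretation of the Sinkhorn distance as

$$W_{d,\lambda}^{2}(\mu,\nu) = \lambda^{1/2} \left( 1 + \min_{\pi \in \Pi(\mu,\nu)} D_{KL}(\pi \,\|\, K_\lambda) \right)^{1/2}. \qquad (4)$$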

The problem in equation 3 is strictly convex and continuous yielding a unique minimizer. In the discrete case, this leads to an algorithm for the entropy-regularized Wasserstein distance based on the Sinkhorn algorithm enforcing the marginal constraints on the kernel Kλ while minimizing the distance as quantified by DKL.

The underlying metric $d(\cdot,\cdot)$ is generally unknown, thus the kernel $K_\lambda$ cannot be evaluated. The authors of [8] proposed to approximate $K_\lambda$ with the heat kernel $\mathcal{H}_t(x,y)$ on $\mathcal{X}$. According to Varadhan’s formula [14], the geodesic distance on a manifold can be recovered from the heat transfer at small timescales as

$$d(x,y)^2 = \lim_{t \to 0^+} -4t \ln \mathcal{H}_t(x,y). \qquad (5)$$

This motivates the use of the heat-geodesic distance $d^2(x,y) := -4t \ln \mathcal{H}_t(x,y)$, with associated kernel $K_\lambda(x,y) = \mathcal{H}_{\lambda/4}(x,y)$. Interestingly, Sinkhorn-based methods admit an efficient algorithm to solve the barycenter problem, which we present next.
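As a sanity check of Varadhan's formula, consider the flat case $\mathbb{R}^D$, where the heat kernel is known in closed form. This worked example is a standard textbook computation rather than a result of the paper; $\mathcal{H}_t$ denotes the heat kernel and $D$ the ambient dimension (written $D$ to avoid a clash with the metric $d$).

```latex
\mathcal{H}_t(x,y) = (4\pi t)^{-D/2} \exp\!\left(-\frac{\|x-y\|^2}{4t}\right)
\quad\Longrightarrow\quad
-4t \ln \mathcal{H}_t(x,y) = \|x-y\|^2 + 2Dt \ln(4\pi t)
\;\xrightarrow[t \to 0^+]{}\; \|x-y\|^2 .
```

The correction term $2Dt \ln(4\pi t)$ vanishes as $t \to 0^+$, so the heat-geodesic quantity recovers the squared Euclidean distance; for a finite $t$ it differs from it by an offset that does not depend on $x$ and $y$.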

3.3. Interpolation with discrete support

By constraining the support to a set $\mathcal{X}$ (or a graph), we can efficiently interpolate between more than two distributions. The barycenter problem [1], [8], [15] generalizes the notion of average between points to an average between distributions. For a set of $m$ distributions $\mu_1, \dots, \mu_m$ supported on $\mathcal{X}$, the objective is to find a distribution minimizing the average distance

$$\mu^* := \arg\min_{\mu \in \mathcal{P}(\mathcal{X})} \sum_{i=1}^{m} \alpha_i \, W_d^p(\mu, \mu_i)^p,$$

where $\mathcal{P}(\mathcal{X})$ denotes the space of probability distributions supported on $\mathcal{X}$, and $\alpha_1, \dots, \alpha_m$ are non-negative weights. Finding the barycenter is a challenging optimization problem; however, the barycenter for Sinkhorn-based methods admits an efficient computation. It involves updating $m$ pairs of vectors $v_i, w_i$, which define a transport plan from $\mu_i$ to the barycenter $\mu^*$. The support of the barycenter is constrained to $\mathcal{X}$, and for most Sinkhorn-based methods the size of $\mathcal{X}$ needs to be small for computational reasons. Our method does not suffer from such a limitation. Hence, we can consider barycenters with greater expressivity, and interpolate between large sets of distributions.
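One common way to compute such a fixed-support barycenter is by iterative Bregman projections, where each distribution keeps its own pair of scaling vectors and the barycenter is the weighted geometric mean of the second marginals. The sketch below is an illustration under that assumption, with a dense symmetric kernel for readability; the function name `sinkhorn_barycenter` and the toy data are hypothetical, and the paper's own implementation may differ in its details.

```python
import numpy as np

def sinkhorn_barycenter(dists, K, weights, n_iter=500):
    """Fixed-support entropic barycenter via iterative Bregman projections.

    dists:   (m, n) array, each row a distribution on the shared support
    K:       (n, n) symmetric positive kernel, e.g. exp(-d^2 / lam)
    weights: (m,) non-negative weights summing to one
    """
    m, n = dists.shape
    w = np.ones((m, n))
    for _ in range(n_iter):
        # Match the first marginal of each plan diag(v_i) K diag(w_i) to mu_i
        v = dists / (w @ K.T)
        # Second marginals before projection: w_i * (K^T v_i)
        Ktv = v @ K
        marg = w * Ktv
        # Barycenter: weighted geometric mean of these marginals
        bary = np.exp(weights @ np.log(marg))
        # Rescale so that every plan has the barycenter as second marginal
        w = bary[None, :] / Ktv
    return bary

# Two mirrored histograms on a grid of [0, 1], equal weights
x = np.linspace(0.0, 1.0, 21)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.5)
mu1 = np.exp(-(x - 0.2) ** 2 / 0.01)
mu1 /= mu1.sum()
mu2 = mu1[::-1].copy()
bary = sinkhorn_barycenter(np.stack([mu1, mu2]), K, np.array([0.5, 0.5]))
```

Because the two inputs are mirror images of each other and the weights are equal, the resulting barycenter is symmetric about the midpoint of the grid. The kernel only enters through products with vectors, which is what makes a sparse, matrix-free kernel application attractive for large supports.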

3.4. Heat Diffusion on a Graph

Consider an undirected graph $\mathcal{G} = (V, E)$ with a set $V$ of $n$ vertices and a set of edges $E$, its weighted adjacency matrix $A$ with non-negative edge weights, and the diagonal degree matrix $D$, where $D_{ii} := \sum_k A_{ik}$. We define the combinatorial Laplacian as $L := D - A$; for any function $f: V \to \mathbb{R}$ we have $(Lf)(v) = \sum_{u} a_{u,v} (f(v) - f(u))$. The combinatorial Laplacian is a symmetric positive semi-definite matrix, and has an eigendecomposition $L = \Psi \Lambda \Psi^T$ with orthonormal eigenvectors $\Psi$ and diagonal eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$, such that $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. The combinatorial Laplacian is a natural extension of the negative of the Laplacian operator to a graph. For a signal $f_0 \in \mathbb{R}^n$ on $\mathcal{G}$, the diffusion of $f_0$ on the graph evolves according to the heat equation

$$\frac{d}{dt} f(t) + L f(t) = 0, \quad \text{s.t. } f(0) = f_0, \ t \in \mathbb{R}^+.$$

The heat kernel solves this ODE; it is defined by the matrix exponential $H_t := e^{-tL}$. By orthogonality of the eigenvectors of $L$, we can write $H_t = \Psi e^{-t\Lambda} \Psi^T$ and $f(t) = H_t f_0$. Computing $H_t$ by eigendecomposition would require $O(n^3)$ operations. Recall that, for the Sinkhorn algorithm, we are only concerned with the application of the heat operator $H_t$ to a signal $f \in \mathbb{R}^n$. For larger diffusion times, the heat kernel is dominated by the eigenvectors associated with the lowest eigenvalues of the Laplacian; hence, intuitively, the heat kernel acts as a low-pass filter. In Geodesic Sinkhorn, we use Chebyshev polynomials [16], [17] to approximate the application of the heat operator to a signal. For a short timescale $t$, using the heat kernel amounts to using the geodesic distance as ground distance in the entropy-regularized OT formulation of equation 3.
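To make the low-pass behaviour concrete, the heat kernel of a small graph can be formed explicitly from the eigendecomposition of the Laplacian. The sketch below uses an unweighted path graph as an assumed toy example (the helper names `path_laplacian` and `heat_kernel` are illustrative); it is the expensive $O(n^3)$ route, shown only as a reference point.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def heat_kernel(L, t):
    """H_t = Psi exp(-t Lambda) Psi^T, computed by dense eigendecomposition."""
    lam, psi = np.linalg.eigh(L)
    return (psi * np.exp(-t * lam)) @ psi.T

L = path_laplacian(50)
f0 = np.zeros(50)
f0[25] = 1.0                               # unit mass on one vertex
f_small = heat_kernel(L, 0.5) @ f0         # short time: mass stays near vertex 25
f_large = heat_kernel(L, 5000.0) @ f0      # long time: close to the constant 1/n
```

For a short diffusion time the signal remains concentrated around its starting vertex, while for a long diffusion time only the constant eigenvector survives and the signal flattens to $1/n$; in both cases the total mass is preserved.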

3.5. Chebyshev Polynomials

Polynomial sequences are often used to approximate functions or operators. With Chebyshev polynomials, we can approximate the application of the matrix exponential $H_t = e^{-tL}$ to a signal $f$ on the graph. An attractive property of Chebyshev polynomials is that the approximation error decays exponentially with the maximum degree $K$. The sequence $\{T_k\}_{k \in \mathbb{N}}$ is defined by the recurrence $T_0(y) = 1$, $T_1(y) = y$ and $T_k(y) = 2yT_{k-1}(y) - T_{k-2}(y)$ for $k \ge 2$. On $[-1, 1]$ these polynomials are orthogonal w.r.t. the weight $(1-y^2)^{-1/2}$, and can be used to express the operator $H_t$. Assuming the largest eigenvalue satisfies $\lambda_n \le 2$, we can write

$$H_t = \frac{b_0}{2} \mathrm{Id} + \sum_{k=1}^{\infty} b_k T_k(L - \mathrm{Id}),$$

where the scalar coefficients $b_k$ depend on time and can be evaluated with Bessel functions. The approximation of $H_t$ truncates the series at degree $K$, keeping the $K+1$ coefficients $b_0, \dots, b_K$; we denote it $p_K(L,t)$. It results in $K$ matrix-vector products, which can be efficient since, in general, $L$ is a sparse matrix. On an $m$-nearest neighbor graph, this can be $O(Kmn/\lambda)$, where $\lambda$ is a regularization parameter. Chebyshev polynomials admit interesting theoretical properties and are known to converge faster than other polynomials [17], [18]. The choice of the parameter $K$ is related to the number of neighbors or the connectivity of the graph. For a small diffusion time, hence only diffusing in a local neighborhood, the approximation is accurate even with a small $K$. As the diffusion time increases, $K$ has to increase in order to consider a larger neighborhood around a node. For OT, we consider small diffusion times, and we found that our results were stable for all $K$ greater than 10.
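The truncated expansion can be applied to a signal using only sparse matrix-vector products and the three-term recurrence. The sketch below is one way to do this, assuming an upper bound `lam_max` on the spectrum of $L$ so that the shifted operator $2L/\lambda_{\max} - \mathrm{Id}$ has eigenvalues in $[-1, 1]$ (with `lam_max = 2` this is $L - \mathrm{Id}$). The coefficients use the standard expansion of the exponential in modified Bessel functions of the first kind, $b_k = 2(-1)^k e^{-s} I_k(s)$ with $s = t\lambda_{\max}/2$; the function name `chebyshev_heat` and the test graph are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import expm
from scipy.special import ive

def chebyshev_heat(L, f, t, K=20, lam_max=2.0):
    """Approximate exp(-t L) @ f with a degree-K Chebyshev expansion (K >= 1)."""
    s = 0.5 * t * lam_max
    k = np.arange(K + 1)
    # ive(k, s) = exp(-s) * I_k(s), so b_k = 2 (-1)^k exp(-s) I_k(s)
    b = 2.0 * (-1.0) ** k * ive(k, s)

    def shifted(x):
        # Applies 2 L / lam_max - Id, whose spectrum lies in [-1, 1]
        return (2.0 / lam_max) * (L @ x) - x

    T_prev, T_curr = f, shifted(f)
    out = 0.5 * b[0] * T_prev + b[1] * T_curr
    for j in range(2, K + 1):
        T_prev, T_curr = T_curr, 2.0 * shifted(T_curr) - T_prev
        out = out + b[j] * T_curr
    return out

# Compare with the dense matrix exponential on a small path graph
n = 30
off = np.ones(n - 1)
A = sp.diags([off, off], [-1, 1], format="csr")
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
f = np.random.default_rng(0).random(n)
approx = chebyshev_heat(L, f, t=0.5, K=30, lam_max=4.0)  # degree <= 2, so lam_max = 4
exact = expm(-0.5 * L.toarray()) @ f
```

The dense matrix exponential is computed here only to check the approximation on a toy graph; the point of the expansion is that it never forms $H_t$ and touches $L$ only through `L @ x`.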

4. GEODESIC SINKHORN DISTANCES

We define the Geodesic Sinkhorn distance between any signals or distributions on a graph $\mathcal{G}$ by the entropy-regularized OT with the heat kernel $H_t$ on the graph. This construction is also valid between any point cloud datasets. In that case, for $m$ datasets $X_1, \dots, X_m$ sampled from a set of distributions $\mu_1, \dots, \mu_m$, we construct a common graph using an affinity kernel on the $m$ datasets and compare two samples by taking the distance between two indicator functions on the graph. We approximate the heat kernel $H_t$ with Chebyshev polynomials $p_K(L,t)$ of order $K$. In Algorithm 1, we present the main steps to evaluate the Geodesic Sinkhorn distance. It is based upon Sinkhorn iterations [2], [13], where ⊘ and ⊙ denote respectively the elementwise division and multiplication. Note that, as opposed to the usual Sinkhorn algorithm, we never have to store a dense $n \times n$ distance matrix, but only the usually sparse graph Laplacian.

[Algorithm 1: Geodesic Sinkhorn]
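Since the algorithm listing is an image, the following sketch illustrates the general idea only: Sinkhorn scaling in which the kernel is accessed exclusively through the map $x \mapsto H_t x$. It is not a transcription of Algorithm 1. As a stand-in for the Chebyshev approximation $p_K(L,t)$ it uses SciPy's `expm_multiply`, which also avoids forming a dense kernel; the function name, the path graph, and the value of $t$ are assumptions made for the example. The returned value is $4t\,(1 + D_{KL}(\pi \,\|\, H_t))$ for the computed plan $\pi = \mathrm{diag}(v) H_t \mathrm{diag}(w)$, which simplifies to $4t\,(\langle \mu, \ln v \rangle + \langle \nu, \ln w \rangle)$ once the marginals are matched.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

def geodesic_sinkhorn_sketch(L, mu, nu, t, n_iter=300):
    """Sinkhorn scaling where the kernel is only accessed through x -> H_t x."""
    def heat(x):
        # Stand-in for p_K(L, t) @ x: action of exp(-t L) on a vector
        return expm_multiply(-t * L, x)

    v = np.ones_like(mu)
    w = np.ones_like(nu)
    for _ in range(n_iter):
        v = mu / heat(w)      # v <- mu ⊘ (H_t w)
        w = nu / heat(v)      # w <- nu ⊘ (H_t v), using the symmetry of H_t
    # 4t * (1 + KL(pi || H_t)) for pi = diag(v) H_t diag(w) with matched marginals
    value = 4.0 * t * (mu @ np.log(v) + nu @ np.log(w))
    return value, v, w

# Two strictly positive histograms on a path graph with 16 vertices
n, t = 16, 30.0
off = np.ones(n - 1)
A = sp.diags([off, off], [-1, 1], format="csr")
L = (sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A).tocsc()
idx = np.arange(n)
mu = np.exp(-(idx - 3.0) ** 2 / 4.0)
mu /= mu.sum()
nu = np.exp(-(idx - 12.0) ** 2 / 4.0)
nu /= nu.sum()
value, v, w = geodesic_sinkhorn_sketch(L, mu, nu, t)
```

The inputs are kept strictly positive so that the logarithms are finite; distributions with exact zeros would need the corresponding entries masked. Only the sparse Laplacian is stored, and every iteration costs two kernel applications.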

Definition 4.1. The Geodesic Sinkhorn distance between two distributions $\mu, \nu$ on a graph $\mathcal{G}$ is

$$W_{H_t}(\mu,\nu) := (4t)^{1/2} \left( 1 + \min_{\pi \in \Pi(\mu,\nu)} D_{KL}(\pi \,\|\, H_t) \right)^{1/2}.$$

In the following proposition, we find the ground distance implicitly used in the optimal transport defined by Geodesic Sinkhorn. We use ≃ for the equivalence relation between distances.

Proposition 4.2. There exists a maximum Chebyshev polynomial degree $K$ such that the ground distance in Geodesic Sinkhorn is equivalent to the one based on $H_t$:

$$-4t \log \left( p_K(L,t) \right)_{ij} \simeq -4t \log (H_t)_{ij}.$$

In particular, the Wasserstein distances with these ground distances are equivalent.

Proof. Because the approximation error decreases exponentially in $K$ [17], we have that for any $\epsilon > 0$ sufficiently small there exists $K_i$ such that $(H_t)_{ij} - \epsilon < (p_{K_i}(L,t))_{ij} < (H_t)_{ij} + \epsilon$. Choose $K$ such that this is true for all vertices, $K := \max\{K_1, \dots, K_n\}$. We define

$$c := \min_{ij} \frac{(H_t)_{ij} - \epsilon}{(H_t)_{ij}} \quad \text{and} \quad C := \max_{ij} \frac{(H_t)_{ij} + \epsilon}{(H_t)_{ij}},$$

and we have the equivalence between the distances since

$$c \, (H_t)_{ij} \le \left( p_K(L,t) \right)_{ij} \le C \, (H_t)_{ij} \quad \text{for all } i, j \in [n],$$

and since the logarithm is a monotonic function. □

In [8], [12], using the Euler implicit discretization results in a ground cost of the form $-\epsilon \ln \left( \mathrm{Id} + \frac{\epsilon}{4K} L \right)^{-K}$, where $\mathrm{Id}$ is the identity matrix; this can be seen as another approximation of the matrix exponential.

The efficiency of Geodesic Sinkhorn improves the notion of barycenter, as it is possible to consider a much larger graph $\mathcal{G}$ and thus a finer-grained support of the barycenter. This leads us to define a novel distance between families of distributions.

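Definition 4.3. For two finite families of distributions $\mathcal{T}$ and $\mathcal{C}$ supported on $\mathcal{G}$, we define the barycentric distance between the families $\mathcal{T}, \mathcal{C}$ as

$$\gamma(\mathcal{T}, \mathcal{C}) := W_{H_t}\left( \mu_{\mathcal{T}}^*, \mu_{\mathcal{C}}^* \right),$$

where $\mu_{\mathcal{T}}^*, \mu_{\mathcal{C}}^*$ are respectively the barycenters of $\mathcal{T}$ and $\mathcal{C}$.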

The previous definition is valid for any distances between distributions or barycenters. However, OT barycenters are known to be more informative than others [15]. We will further explore this comparison in our experiments. We use it to distinguish between two groups in a medical setting where a set of patients received a treatment (defining the family 𝒯), and another set acts as a control family 𝒞. Following this idea, we define a notion of effect between two families.

Definition 4.4. For two families of distributions $\mathcal{T}$ and $\mathcal{C}$ supported on $\mathcal{G}$, define the Expected Barycenter Effect of $\mathcal{T}$ as

$$\tau(\mathcal{T}) := \mathbb{E}_{\mu_{\mathcal{T}}^*}[Y_t] - \mathbb{E}_{\mu_{\mathcal{C}}^*}[Y_c],$$

where $\mu_{\mathcal{T}}^*, \mu_{\mathcal{C}}^*$ are respectively the barycenters of $\mathcal{T}$ and $\mathcal{C}$, and the features $Y_c \sim \mu_{\mathcal{C}}^*$ and $Y_t \sim \mu_{\mathcal{T}}^*$.

Note that we compute the average on the family of distributions instead of the average on their support, hence we evaluate their expectations in a closed form. This definition also extends to a conditional equivalent where families of distributions can be subdivided with discrete covariate variables. When the barycenters are computed with the total variation, this definition is equivalent to the naive Average Treatment Effect (ATE) [19], i.e. the difference of empirical means.
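When both barycenters are probability vectors over the same graph nodes, the expectations in Definition 4.4 reduce to weighted sums of the node features. A minimal sketch, assuming the barycenters have already been computed and that `features` holds one feature row per node (the function name and the three-node example are illustrative):

```python
import numpy as np

def expected_barycenter_effect(bary_t, bary_c, features):
    """tau = E_{mu_T*}[Y] - E_{mu_C*}[Y] for barycenters on a shared support.

    bary_t, bary_c: (n,) probability vectors over the n graph nodes
    features:       (n, d) feature values attached to the nodes
    """
    return bary_t @ features - bary_c @ features

# Three nodes with a single feature taking values 0, 1 and 2
features = np.array([[0.0], [1.0], [2.0]])
bary_c = np.array([0.5, 0.5, 0.0])   # control barycenter, mean feature 0.5
bary_t = np.array([0.0, 0.5, 0.5])   # treated barycenter, mean feature 1.5
tau = expected_barycenter_effect(bary_t, bary_c, features)
```

Here the treated barycenter shifts mass one node to the right, so the effect on the single feature is $1.5 - 0.5 = 1$.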

5. RESULTS

We demonstrate the accuracy and efficiency of the Geodesic Sinkhorn distance on two tasks: (1) Nearest-Wasserstein-neighbor calculation on simulated data with manifold structure similar to the setup of [10]; (2) A newly defined Barycentric distance between families of distributions computed to quantify the effect of a treatment on patient-derived organoids. In Appendix A.1, we present additional results on time series interpolation.

5.1. Nearest-Wasserstein-neighbor distributions

In this experiment, we compare our method with Sinkhorn [13] and LR Sinkhorn [20], both with Euclidean and squared Euclidean ground distances, with DiffusionEMD [10], and with Sinkhorn using the Euler approximation of the heat filter. We created 15 Gaussian distributions sampled randomly on a swiss roll dataset, and sampled 10k observations from each distribution. We then rotated the observations into 10 dimensions. We consider a k-nearest neighbors task on these distributions. Since we know the exact geodesic distance on the manifold, we evaluate the methods against this ground truth. In Tab. 1, we report the average and standard deviation over 10 seeds of the Spearman and Pearson correlations to the ground truth, and the runtime in seconds with and without the computation of the graph. Our method is the most accurate while being much faster than other Sinkhorn-based methods.

Table 1.

KNN task for 15 distributions. Geodesic Sinkhorn is the most accurate, while being faster than other Sinkhorn-based methods.

| Method | SpearmanR | PearsonR | P@5 | Time (s), no graph | Time (s) |
| --- | --- | --- | --- | --- | --- |
| Diffusion EMD | 0.62±0.097 | 0.736±0.023 | 0.66±0.072 | 2.845±0.135 | 7.877±0.531 |
| Sinkhorn W1 | 0.387±0.044 | 0.523±0.036 | 0.471±0.028 | 112.406±0.206 | 112.406±0.206 |
| Sinkhorn W2 | 0.411±0.036 | 0.485±0.027 | 0.492±0.053 | 133.686±5.234 | 133.686±5.234 |
| LR Sinkhorn W1 | −0.31±0.07 | −0.131±0.086 | 0.237±0.037 | 578.631±107.82 | 578.631±107.82 |
| LR Sinkhorn W2 | 0.366±0.048 | 0.379±0.051 | 0.447±0.023 | 204.191±3.656 | 204.191±3.656 |
| Euler Sinkhorn | 0.776±0.061 | 0.718±0.009 | 0.728±0.072 | 449.752±42.985 | 455.059±43.083 |
| Geodesic Sinkhorn | 0.847±0.023 | 0.754±0.016 | 0.833±0.034 | 10.176±1.249 | 16.682±1.705 |

5.2. Barycentric distance

We test if we can identify a linear treatment effect with the Expected Barycenter Effect (EBE). In this experiment, we create a control family of distributions 𝒞 of ten standard Gaussian distributions. The treatment group consists of nine Gaussian distributions 𝒩(5,1), and one outlier centered at different means. For each distribution, we sample 500 observations, and reproduce the experiment over ten seeds. In Tab. 2, we report the EBE and its standard deviation with the Geodesic Sinkhorn, the Total Variation (TV) distance, and Sinkhorn. Since the TV only compares the mean, it is sensitive to the outlier, whereas our method can identify the true treatment effect.
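The sensitivity of a mean-based estimate to a single outlier can be checked with simple arithmetic. The sketch below is a population-level idealization of the setup (ten treated distributions, nine with mean 5 and one outlier, control mean 0) and is not the paper's code; the function name is illustrative, and the sampled values in the TV column of Tab. 2 deviate slightly from these idealized numbers.

```python
import numpy as np

def mean_based_effect(outlier_mean, n_treated=10, treated_mean=5.0, control_mean=0.0):
    """Difference of means when one treated distribution is replaced by an outlier."""
    means = np.full(n_treated, treated_mean)
    means[-1] = outlier_mean
    return float(means.mean() - control_mean)

effects = {o: mean_based_effect(o) for o in (-60.0, -30.0, 0.0, 5.0)}
# Outlier at -60 -> -1.5, at -30 -> 1.5, at 0 -> 4.5, no outlier (5) -> 5.0
```

A single distribution centered at −60 is enough to flip the sign of the mean-based estimate, even though nine of the ten treated distributions have shifted by exactly 5.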

Table 2.

Expected Barycenter Effect (EBE) with one outlier distribution centered at −60, −30, 0, or 5. Comparison using the barycenter from Sinkhorn, total variation, or Geodesic Sinkhorn. Values closer to the real treatment effect of 5 are better.

| Outlier | EBE Geo Sinkhorn | EBE Sinkhorn | EBE TV |
| --- | --- | --- | --- |
| −60 | 5.016±0.226 | −0.103±0.005 | −1.429±0.144 |
| −30 | 5.053±0.196 | 0.355±0.049 | 1.571±0.144 |
| 0 | 4.917±0.315 | 4.954±0.157 | 4.571±0.144 |
| No outlier | 5.059±0.159 | 5.054±0.16 | 5.071±0.144 |

5.3. Single-cell signaling data

We use single-cell signaling data produced by mass cytometry (MC) for a screening study to compare the treatment effect of different chemotherapies on 10 colorectal cancer (CRC) patient-derived organoids (PDOs) [22]. These PDOs can be grouped into chemoresistant PDOs, which show little-to-no effect when treated with chemotherapies, and chemosensitive PDOs, which present strong shifts in their phenotypes upon treatment. The observations include single-cell information on the cell cycle and signaling changes upon treatment of PDOs with different CRC treatments at a range of concentrations. In Fig. 1, we present the barycentric distance matrices between treatments (a) and between four concentrations of treatment SN-38 (S) (b). In both cases, the control groups correspond to AH and DMSO, the two rightmost columns. We compare the distance matrices between Sinkhorn (left) and our method (right). Our method provides a finer distinction between treatments (Fig. 1 top) and concentrations (Fig. 1 bottom), especially for the chemosensitive group. As observed in [22], chemosensitive PDOs show little-to-no response to lower concentrations of SN-38 (S1), but their phenotype shifts very strongly upon treatment with higher concentrations (S2, S3, and S4) (Fig. 1 b). When comparing combinations of different treatments (Fig. 1 a), Geodesic Sinkhorn better resolves the difference between SN-38 (S) alone and in combination with Cetuximab (C), showing that S is the main agent creating the treatment effect and that the combination with C does not result in a synergistic effect [22]. Note that we only consider the relative magnitude of the distances, since the two algorithms use different ground distances.

Fig. 1.

a) Barycentric distance matrices for the Sinkhorn algorithm (left) and our method Geodesic Sinkhorn (right). b) Barycentric distance matrices between doses of treatment SN-38, for four concentrations S1, S2, S3, S4. Control groups correspond to AH and DMSO. Geodesic Sinkhorn provides a clearer distinction between treatments and doses.

6. CONCLUSION

In this work, we considered the use of OT for graphs and large datasets in high dimensions potentially sampled from a lower dimensional manifold. We proposed Geodesic Sinkhorn, a fast implementation of the Sinkhorn algorithm using the graph Laplacian and Chebyshev polynomials. Our method is well adapted for large and high dimensional datasets as it is defined with a geodesic ground distance, which takes into account the underlying geometry of the data, and requires less computation time and less memory. On a synthetic dataset, we showed that Geodesic Sinkhorn is much faster than other Sinkhorn-based methods while being more accurate. With the Wasserstein barycenter, we defined the barycentric distance to compare entire families of distributions, and the expected barycenter effect, then applied both methods to a large PDO drug screen dataset.

Acknowledgments

This research was partially funded by ESP Mérite [G.H.], NSERC Discovery grant 03267 [G.W.], Canada CIFAR AI Chair [G.W.], NIH grant R01GM135929 [G.W., S.K.], Cancer Research UK (C60693 / A23783) [C.J.T.], the Rosetrees Trust (M872 / A2292) [C.J.T.], and the Yale-UCL Collaborative Student Exchange Programme.

A. APPENDIX

A.1. Time series Interpolation

To evaluate Geodesic Sinkhorn’s performance on inferring dynamics, we test it on a time series interpolation task. In this setting the most used datasets are Embryoid Body [1] and WOT [2]. We curated ten scRNA-seq datasets to test our method: WOT [2], Clark [3], Embryoid Body [1], and seven more from the 2022 multimodal single-cell integration challenge. The observations are the gene expression of single cells from a distribution evolving through time. The Waddington-OT dataset (WOT) has 38 timepoints of a developing stem cell population over 18 days collected roughly every 6–12 hours. This is the most densely sampled dataset in time. The Embryoid Body dataset is a single-cell RNA-seq dataset of developing embryoid bodies from 0–30 days with 5 datasets collected over time. The Clark dataset contains 12 samples over 9 unique timepoints of a developing mouse retina. Finally, the NeurIPS 2022 data contains four donors with single-cell transcriptomic data collected over 4 timepoints for 3 donors with 10X-multiome and 4 combined cite-seq (in the publicly released training data), leading to 7 additional time series with 4 timepoints.

The goal is to interpolate the distribution between two timepoints. The number of timepoints in each dataset ranges from 4 to 40. Here, for a dataset with $T$ single-cell distributions $\mu_1, \mu_2, \dots, \mu_T$ over time, for $t \in \{2, \dots, T-1\}$ we compute the exact Euclidean 2-Wasserstein distance $W_2(\hat{\mu}_t, \mu_t)$ between the interpolated distribution $\hat{\mu}_t$ at time $t$ and the ground truth distribution $\mu_t$. Since we interpolate between two distributions, we used the McCann interpolant as its support is $\mathbb{R}^d$. In Tab. 3, we compare our Geodesic Sinkhorn interpolation with the Sinkhorn McCann interpolant with Euclidean ground distance (L2 Sinkhorn) and with the Sinkhorn McCann interpolant using the Euler heat approximation (Sinkhorn Euler). Sinkhorn with Euler approximation ran out of memory on the Clark and WOT datasets. We see that across all 10 datasets the Geodesic Sinkhorn interpolation with the McCann interpolant outperforms all other methods, showcasing the importance of the heat-geodesic distance and our kernel approximation. We also compare Sinkhorn Euler and Geodesic Sinkhorn for different nearest neighbors graphs in Tab. 4 and Tab. 5, where Geodesic Sinkhorn outperforms Sinkhorn Euler on most datasets.
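Given a transport plan between two point clouds, the McCann (displacement) interpolant at time $s \in [0, 1]$ moves each transported pair of points along the straight segment joining them. The following is a minimal sketch under the assumption that the plan is available as a dense array; the function name and the two-point example are illustrative. With an entropic plan every entry is positive, so in practice one would likely threshold or subsample the pairs.

```python
import numpy as np

def mccann_interpolant(x, y, plan, s):
    """Displacement interpolation at time s in [0, 1] from a coupling.

    x: (n, d) source points, y: (m, d) target points,
    plan: (n, m) transport plan. Returns support points and their weights.
    """
    i, j = np.nonzero(plan > 0)
    points = (1.0 - s) * x[i] + s * y[j]
    return points, plan[i, j]

# Two source points, each transported to its own target
x = np.array([[0.0], [1.0]])
y = np.array([[2.0], [3.0]])
plan = np.diag([0.5, 0.5])
points, weights = mccann_interpolant(x, y, plan, s=0.5)
```

At $s = 0.5$ the two source points sit halfway to their targets, at 1 and 2, each carrying half of the mass. The interpolated measure is supported on new locations in $\mathbb{R}^d$ rather than on the original samples.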

Table 3.

Time series interpolation task comparing the mean and standard deviation ($\mu \pm \sigma$) across 5 seeds of the 2-Wasserstein metric averaged across time, $\frac{1}{T-2} \sum_{t \in \{2, \dots, T-1\}} W_2(\hat{\mu}_t, \mu_t)$, for 10 single-cell time series datasets. Sinkhorn Euler ran out of memory on two datasets. Lower is better.

| Dataset | L2 Sinkhorn | Sinkhorn Euler | Geo Sinkhorn (ours) |
| --- | --- | --- | --- |
| Cite Donor0 | 48.545 ± 0.057 | 46.254 ± 3.192 | 44.440 ± 0.108 |
| Cite Donor1 | 48.220 ± 0.055 | 45.897 ± 3.254 | 44.165 ± 0.103 |
| Cite Donor2 | 50.281 ± 0.016 | 47.773 ± 3.958 | 45.673 ± 0.092 |
| Cite Donor3 | 49.339 ± 0.081 | 46.565 ± 3.553 | 45.022 ± 0.146 |
| Clark | 13.500 ± 0.003 | out of memory | 13.288 ± 0.008 |
| EB | 12.415 ± 0.008 | 12.298 ± 0.140 | 12.133 ± 0.011 |
| Multiome Donor0 | 56.648 ± 0.048 | 55.373 ± 7.234 | 53.431 ± 0.077 |
| Multiome Donor1 | 54.028 ± 0.126 | 52.396 ± 4.394 | 50.238 ± 0.022 |
| Multiome Donor2 | 58.798 ± 0.155 | 57.182 ± 5.511 | 55.041 ± 0.058 |
| WOT | 8.096 ± 0.003 | out of memory | 7.397 ± 0.106 |

Table 4.

Time series interpolation task comparing the mean of the 2-Wasserstein metric for a KNN graph with 50 neighbors. Lower is better.

| Dataset | Sinkhorn Euler | Geodesic Sinkhorn |
| --- | --- | --- |
| Cite Donor0 | 48.507 | 46.850 |
| Cite Donor1 | 48.207 | 46.883 |
| Cite Donor2 | 50.383 | 49.176 |
| Cite Donor3 | 49.210 | 47.646 |
| Clark | 13.506 | 13.378 |
| EB | 12.409 | 12.394 |
| Multiome Donor0 | 56.676 | 55.095 |
| Multiome Donor1 | 54.028 | 53.952 |
| Multiome Donor2 | 58.821 | 57.187 |
| WOT | 8.070 | 8.279 |

Table 5.

Time series interpolation task comparing the mean of the 2-Wasserstein metric for a KNN graph with 100 neighbors. Lower is better.

| Dataset | Sinkhorn Euler | Geodesic Sinkhorn |
| --- | --- | --- |
| Cite Donor0 | 48.573 | 47.102 |
| Cite Donor1 | 48.197 | 47.373 |
| Cite Donor2 | 50.298 | 49.517 |
| Cite Donor3 | 49.290 | 48.433 |
| Clark | 13.500 | 13.396 |
| EB | 12.415 | 12.416 |
| Multiome Donor0 | 56.708 | 55.562 |
| Multiome Donor1 | 54.063 | 54.016 |
| Multiome Donor2 | 58.802 | 57.732 |
| WOT | 8.096 | 8.253 |

Footnotes

2. With a slight abuse of language, we use the term distance, although the entropy-regularized formulation does not respect the identity of indiscernibles.

References

  • [1] Peyré G. and Cuturi M., “Computational Optimal Transport,” arXiv, 2020.
  • [2] Sinkhorn R. and Knopp P., “Concerning nonnegative matrices and doubly stochastic matrices,” Pacific Journal of Mathematics, 1967.
  • [3] Fortet R., “Résolution d’un système d’équations de M. Schrödinger,” J. Math. Pure Appl. IX, 1940.
  • [4] Kullback S., “Probability densities with given marginals,” The Annals of Mathematical Statistics, 1968.
  • [5] Knight P. A. and Ruiz D., “A fast algorithm for matrix balancing,” IMA Journal of Numerical Analysis, 2013.
  • [6] Moon K. R., Stanley J. S., Burkhardt D., van Dijk D., Wolf G., and Krishnaswamy S., “Manifold learning-based methods for analyzing single-cell RNA-sequencing data,” Current Opinion in Systems Biology, 2018.
  • [7] Huguet G., Magruder D. S., Tong A., et al., “Manifold interpolating optimal-transport flows for trajectory inference,” NeurIPS, 2022.
  • [8] Solomon J., De Goes F., Peyré G., et al., “Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains,” ACM Transactions on Graphics (ToG), 2015.
  • [9] Tong A., Huguet G., Shung D., et al., “Embedding signals on graphs with unbalanced diffusion earth mover’s distance,” in ICASSP, IEEE, 2022.
  • [10] Tong A., Huguet G., Natik A., et al., “Diffusion earth mover’s distance and distribution embeddings,” in ICML, 2021.
  • [11] Crane K., Weischedel C., and Wardetzky M., “Geodesics in heat: A new approach to computing distance based on heat flow,” ACM Transactions on Graphics, 2013.
  • [12] Heitz M., Bonneel N., Coeurjolly D., Cuturi M., and Peyré G., “Ground metric learning on graphs,” J. of Math. Imaging and Vision, 2021.
  • [13] Cuturi M., “Sinkhorn distances: Lightspeed computation of optimal transport,” NeurIPS, 2013.
  • [14] Varadhan S. R. S., “On the behavior of the fundamental solution of the heat equation with variable coefficients,” Communications on Pure and Applied Mathematics, 1967.
  • [15] Cuturi M. and Doucet A., “Fast computation of Wasserstein barycenters,” in ICML, 2014.
  • [16] Shuman D. I., Vandergheynst P., and Frossard P., “Chebyshev polynomial approximation for distributed signal processing,” in DCOSS, IEEE, 2011.
  • [17] Marcotte S., Barbe A., Gribonval R., et al., “Fast multi-scale diffusion on graphs,” in ICASSP, IEEE, 2022.
  • [18] Huang S.-G., Lyu I., Qiu A., and Chung M. K., “Fast polynomial approximation of heat kernel convolution on manifolds and its application to brain sulcal and gyral graph pattern analysis,” IEEE Transactions on Medical Imaging, 2020.
  • [19] Imbens G. W. and Rubin D. B., Causal inference in statistics, social, and biomedical sciences. Cambridge University Press, 2015.
  • [20] Scetbon M., Cuturi M., and Peyré G., “Low-rank Sinkhorn factorization,” in ICML, 2021.
  • [21] Tong A., Huang J., Wolf G., Dijk D. V., and Krishnaswamy S., “TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics,” in ICML, 2020.
  • [22] Zapatero M. R., Tong A., Sufi J., et al., “Trellis single-cell screening reveals stromal regulation of patient-derived organoid drug responses,” bioRxiv, 2023.

References

  • [1] Moon K. R., van Dijk D., Wang Z., et al., “Visualizing structure and transitions in high-dimensional biological data,” Nat Biotechnol, vol. 37, no. 12, pp. 1482–1492, 2019.
  • [2] Schiebinger G., Shu J., Tabaka M., et al., “Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming,” Cell, vol. 176, no. 4, 928–943.e22, 2019.
  • [3] Clark B. S., Stein-O’Brien G. L., Shiau F., et al., “Single-Cell RNA-Seq Analysis of Retinal Development Identifies NFI Factors as Regulating Mitotic Exit and Late-Born Cell Specification,” Neuron, vol. 102, no. 6, 1111–1126.e5, Jun. 2019, ISSN: 08966273.
