Entropy. 2023 Oct 10;25(10):1435. doi: 10.3390/e25101435

Soft Quantization Using Entropic Regularization

Rajmadan Lakshmanan 1,*, Alois Pichler 1
Editor: Jean-Pierre Gazeau
PMCID: PMC10606929  PMID: 37895556

Abstract

The quantization problem aims to find the best possible approximation of probability measures on R^d using finite, discrete measures. The Wasserstein distance is a typical choice to measure the quality of the approximation. This contribution investigates the properties and robustness of the entropy-regularized quantization problem, which relaxes the standard quantization problem. The proposed approximation technique naturally adopts the softmin function, which is well known for its robustness from both theoretical and practical standpoints. Moreover, we use the entropy-regularized Wasserstein distance to evaluate the quality of the soft quantization problem’s approximation, and we implement a stochastic gradient approach to obtain the optimal solutions. The control parameter in our proposed method allows for the adjustment of the optimization problem’s difficulty level, providing significant advantages when dealing with exceptionally challenging problems of interest. In addition, this contribution empirically illustrates the performance of the method in various settings.

Keywords: quantization, approximation of measures, entropic regularization

JEL Classification: 94A17, 81S20, 40A25

1. Introduction

Over the past few decades, extensive research has been conducted on optimal quantization techniques in order to tackle numerical problems that are related to various fields such as data science, applied disciplines, and economic models. These problems are typically centered around uncertainties or probabilities which demand robust and efficient solutions (cf. Graf and Mauldin [1], Luschgy and Pagès [2], El Nmeir et al. [3]). In general, these problems are difficult to handle, as the random components in the problem allow uncountably many outcomes. As a consequence, in order to address this difficulty the probability measures are replaced by simpler or finite measures, which can facilitate numerical computations. However, the probability measures should be ‘close’ in order to ensure that the result of the computations with approximate (discrete) measures resembles the original problem. In a nutshell, the goal is to find the best approximation of a diffuse measure using a discrete measure, which is called an optimal quantization problem. For a comprehensive discussion of the optimal quantization problem from a mathematical standpoint, refer to Graf and Luschgy [4].

On the other hand, entropy (sometimes known as information entropy) is an essential concept when dealing with uncertainties and probabilities. In mathematics, entropy is often used as a measure of information and uncertainty. It provides a quantitative measure of the randomness or disorder in a system or a random variable. Its applications span information theory, statistical analysis, probability theory, and the study of complex dynamical systems (cf. Breuer and Csiszár [5,6], Pichler and Schlotter [7]).

In order to assess the closeness of probability measures, distances are often considered; one notable instance is the Wasserstein distance. Intuitively, the Wasserstein distance measures the minimal average transport cost required to transform one probability measure into another. Unlike other distances and divergences, which simply compare the probabilities assigned by the distribution functions (e.g., the total variation distance and the Kullback–Leibler divergence), the Wasserstein distance incorporates the geometry of the underlying space. This supports an understanding of the relationships between different probability measures in a geometrically faithful manner.

In our research work, we focus on entropy-regularized quantization methods. More precisely, we consider an entropy-regularized version of the Wasserstein problem to quantify the quality of the approximation, and adapt the stochastic gradient approach to obtain the optimal quantizers.

The key features of our methodology include the following:

  • (i)

    Our regularization approach stabilizes and simplifies the standard quantization problem by introducing penalty terms or constraints that discourage overly complex or overfitted models, promoting better generalizations and robustness in the solutions.

  • (ii)

    The influence of entropy is controlled using a parameter, λ, which enables us to reach the genuine optimal quantizers.

  • (iii)

    Generally, parameter tuning comes with certain limitations. However, our method builds upon the framework of the well-established softmin function, which allows us to exercise parameter control without encountering any restrictions.

  • (iv)

    For larger values of the regularization parameter λ, the optimal measure accumulates all its mass at the center of the measure.

  • Applications in the Context of Quantization.

Quantization techniques have undergone significant developments in recent years, particularly within the domain of deep learning and model optimization. State-of-the-art research has introduced advanced methodologies such as non-uniform quantization and quantization-aware training, enabling the efficient deployment of neural networks while preserving performance (cf. Jacob et al. [8], Zhuang et al. [9], Hubara et al. [10]). Furthermore, quantization principles have found applications beyond machine learning, such as in digital image processing, computer vision (cf. Polino et al. [11]), and electric charge quantization (cf. Bhattacharya [12]).

  • Related Works and Contributions.

As mentioned above, optimal quantization is a well-researched topic in the field of information theory and signal processing. Several methods have been developed for the optimal quantization problem, notably including the following:

  • Lloyd-Max Algorithm: this algorithm, also known as Lloyd’s algorithm or the k-means algorithm, is a popular iterative algorithm for computing optimal vector quantizers. It iteratively adjusts the centroids of the quantization levels to minimize the quantization error (cf. Scheunders [13]).

  • Tree-Structured Vector Quantization (TSVQ): TSVQ is a hierarchical quantization method that uses a tree structure to partition the input space into regions. It recursively applies vector quantization at each level of the tree until the desired number of quantization levels is achieved (cf. Wei and Levoy [14]).

  • Expectation-maximization (EM) algorithm: the EM algorithm is a general-purpose optimization algorithm that can be used for optimal quantization. It is an iterative algorithm that estimates the parameters of a statistical model to maximize the likelihood of the observed data (cf. Heskes [15]).

  • Stochastic Optimization Methods: stochastic optimization methods such as simulated annealing, genetic algorithms, and particle swarm optimization can be used to find optimal quantization strategies by exploring the search space and iteratively improving the quantization performance (cf. Pagès et al. [16]).

  • Greedy vector quantization (GVQ): the greedy algorithm tries to solve this problem iteratively by adding one code word at every step until the desired number of code words is reached, each time selecting the code word that minimizes the error. GVQ is known to provide suboptimal quantization compared to other non-greedy methods such as the Lloyd-Max and Linde–Buzo–Gray algorithms. However, it has been shown to perform well when the data have a strong correlation structure. Notably, it utilizes the Wasserstein distance to measure the error of approximation (cf. Luschgy and Pagès [2]).

These methods provide efficient and practical solutions for finding optimal quantization schemes, and have different trade-offs between complexity and performance. The choice of method depends on the problem of interest and the requirements of the application. However, most of these methods depend on strict constraints, which makes the solutions overly complex or results in model overfitting. Our method mitigates this issue by promoting better generalizations and robustness in the solutions.

In the optimal transport community, the entropy-regularized version of the optimal transport problem (known as the entropy-regularized Wasserstein problem) was initially proposed by Cuturi [17]. This entropic version of the Wasserstein problem permits fast computations using Sinkhorn’s algorithm. As an avenue for constructive research, a series of subsequent studies presented a multitude of results aimed at gaining a comprehensive understanding of the subtleties involved in enhancing the computational performance of entropy-regularized optimal transport (cf. Ramdas et al. [18], Neumayer and Steidl [19], Altschuler et al. [20], Lakshmanan et al. [21], Ba and Quellmalz [22], Lakshmanan and Pichler [23]). These findings have served as a valuable foundation for further exploration in the field of optimal transport, providing insights into both the intricacies of the topic and potential avenues for improvement.

In contrast, we present a new and innovative approach that concentrates on the optimal quantization problem based on entropy and on its robust properties, which represents a distinct contribution with regard to standard entropy-regularized optimal transport problems.

One of the principal consequences of our research establishes the convergence of the quantizers to the center of the measure. The relationship between the center of the measure and the entropy-regularized quantization problem has not been exposed before. The following plain solution is obtained by intensifying the entropy term in the regularization of the quantization problem.

Theorem 1.

There exists a real-valued $\lambda_0 > 0$ such that the approximation of the entropy-regularized optimal quantization problem is provided by the Dirac measure

$$\tilde P = \delta_a$$

for every $\lambda > \lambda_0$, where $a$ is the center of the measure $P$ with respect to the distance $d$.

This enthralling interpretation (Theorem 1) of our master problem facilitates an understanding of the transition from a complex and difficult optimization solution to a simple solution. Moreover, along with a theoretical discussion, we provide an algorithm and numerical exemplification which empirically demonstrate the robustness of our method. The forthcoming sections elucidate the robustness and asymptotic properties of the proposed method in detail.

  • Outline of the Paper.

Section 2 establishes the essential notation, definitions, and properties. Moreover, we comprehensively expound upon the significance of the smooth minimum, a pivotal component in our research. In Section 3, we introduce the entropy-regularized optimal quantization problem and delve into its inherent properties. Section 4 presents a discussion of the soft tessellation, optimal weights, and theoretical properties of parameter tuning. Furthermore, we systematically illustrate the computational process along with a pseudo-algorithm. Section 5 provides numerical examples and empirically substantiates the theoretical proofs. Finally, Section 6 summarizes the study.

2. Preliminaries

In what follows, (X,d) is a Polish space. The σ-algebra generated by the Borel sets induced by the distance d is F, while the set of all probability measures on X is P(X).

2.1. Distances and Divergences of Measures

The standard quantization problem employs the Wasserstein distance to measure the quality of the approximation, which was initially studied by Monge and Kantorovich (cf. Monge [24], Kantorovich [25]). One of the remarkable properties of this distance is that it metrizes the weak* topology of measures.

Definition 1

(Wasserstein distance). Let P and P̃ be probability measures on (X,d). The Wasserstein distance of order r ≥ 1 between P and P̃ ∈ P(X) is

$$d_r(P,\tilde P) := \Big(\inf_{\pi} \int_{X\times X} d(\xi,\tilde\xi)^r\; \pi(\mathrm{d}\xi,\mathrm{d}\tilde\xi)\Big)^{1/r}, \tag{1}$$

where the infimum is among all measures πP(X2) with marginals P and P˜, that is,

$$\pi(A\times X) = P(A) \quad\text{and} \tag{2}$$
$$\pi(X\times B) = \tilde P(B) \tag{3}$$

for all sets A and B ∈ F. The measures

$$\pi_1(\cdot) := \pi(\cdot\times X) \qquad\text{and}\qquad \pi_2(\cdot) := \pi(X\times\cdot)$$

on X are called the marginal measures of the bivariate measure π.

Readers may refer to the excellent monographs in [26,27] for a comprehensive discussion of the Wasserstein distance.

Remark 1

(Flexibility). In the subsequent discussion, our problem of interest is to approximate the measure P, which may be a continuous, discrete, or mixed measure on X = R^d. The measure P̃ used to approximate P is a discrete measure. The definition of the Wasserstein distance flexibly comprises all these cases, namely, continuous, semi-discrete, and discrete measures.

In contrast to the standard methodology, we investigate the quantization problem by utilizing an entropy version of the Wasserstein distance. The standard Wasserstein problem is regularized by adding the Kullback–Leibler divergence, which is known as the relative entropy.

Definition 2

(Kullback–Leibler divergence). Let P and Q ∈ P(X) be probability measures. Denote by Z ∈ L¹(P) the Radon–Nikodým derivative, dQ = Z dP, if Q is absolutely continuous with respect to P (Q ≪ P). The Kullback–Leibler divergence is

$$D(Q\,\|\,P) := \begin{cases} \mathbb{E}_P\, Z\log Z = \mathbb{E}_Q \log Z & \text{if } Q\ll P \text{ and } \mathrm{d}Q = Z\,\mathrm{d}P,\\ +\infty & \text{else,}\end{cases} \tag{4}$$

where EP (EQ, resp.) is the expectation with respect to the measure P (Q, resp.).

Per Gibbs’ inequality, the Kullback–Leibler divergence satisfies D(Q ‖ P) ≥ 0 (non-negativity). However, D is not a distance metric, as it does not satisfy the symmetry and triangle inequality properties.

We would like to emphasize the following distinctness with respect to the Wasserstein distance (cf. Remark 1): in order for the Kullback–Leibler divergence to be finite (D(Q ‖ P) < ∞), we must have

$$\operatorname{supp} Q \subseteq \operatorname{supp} P,$$

where the support of the measure (cf. Rüschendorf [28]) is

$$\operatorname{supp} P := \bigcap\,\{A\in\mathcal F\colon A\ \text{is closed and}\ P(A)=1\}.$$

If P is a continuous measure on X=Rd, then Q is as well. If P is a finite measure, then the support points of P contain the support points of Q.

2.2. The Smooth Minimum

In what follows, we present the smooth minimum in its general form, which includes discrete and continuous measures. The numerical computations in the following section rely on results for its discrete version. Therefore, we address the special properties of its discrete version in detail.

Definition 3

(Smooth minimum). Let λ > 0 and let Y be a random variable. The smooth minimum, or smooth minimum with respect to P̃, is

$$\min{}_{\tilde P;\,\lambda}(Y) := -\lambda\,\log \mathbb{E}_{\tilde P}\, e^{-Y/\lambda} \tag{5}$$
$$= -\lambda\,\log \int_X e^{-Y(\eta)/\lambda}\; \tilde P(\mathrm d\eta), \tag{6}$$

provided that the expectation (integral) of e^{−Y/λ} is finite; if it is not finite, we set min_{P̃; λ}(Y) := −∞. For λ = 0, we set

$$\min{}_{\tilde P;\,\lambda=0}(Y) := \operatorname*{ess\,inf} Y. \tag{7}$$

For a σ-algebra G ⊂ F and λ > 0 measurable with respect to G, the conditional smooth minimum is

$$\min{}_{\tilde P;\,\lambda}(Y\,|\,\mathcal G) := -\lambda\,\log \mathbb{E}_{\tilde P}\!\left(e^{-Y/\lambda}\,\middle|\,\mathcal G\right).$$
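For a discrete measure P̃ = Σⱼ p̃ⱼ δ_{yⱼ}, the smooth minimum (6) becomes a weighted log-sum-exp expression, which overflows numerically if evaluated naively for small λ. The following Julia sketch (our own helper, not taken from the authors’ repository) evaluates it with the standard shift by the minimum:

```julia
# Smooth minimum  min_{P̃;λ}(y) = -λ·log( Σ_j p̃_j·exp(-y_j/λ) )  for a discrete measure,
# evaluated with a shift by minimum(y) so that all exponents are ≤ 0 (no overflow for small λ).
# Illustrative sketch only; names are our own.
function smooth_min(y::AbstractVector, p::AbstractVector, λ::Real)
    λ == 0 && return minimum(y)                       # limiting case (7)
    m = minimum(y)
    return m - λ * log(sum(p .* exp.((m .- y) ./ λ)))
end

smooth_min([3.0, 1.0, 2.0], fill(1/3, 3), 0.01)       # ≈ 1.01, close to the minimum (cf. Lemma 1)
smooth_min([3.0, 1.0, 2.0], fill(1/3, 3), 100.0)      # ≈ 2.0, close to the weighted mean
```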

The following lemma relates the smooth minimum with the essential infimum (cf. (7)), that is, colloquially, the ‘minimum’ of a random variable. As well, the result justifies the term smooth minimum.

Lemma 1.

For λ>0, it holds that

$$\min{}_{\tilde P;\lambda}(Y) \le \mathbb{E}_{\tilde P}\, Y \tag{8}$$

and

$$\operatorname*{ess\,inf} Y \;\le\; \min{}_{\tilde P;\lambda}(Y) \;\xrightarrow[\;\lambda\downarrow 0\;]{}\; \operatorname*{ess\,inf} Y. \tag{9}$$

Proof. 

Inequality (8) follows from Jensen’s inequality as applied to the convex function x ↦ exp(−x/λ).

Next, the first inequality in the second display (9) follows from ess inf Y ≤ Y and the fact that all operations in (6) are monotonic. Finally, let a > ess inf Y. Per Markov’s inequality, we have

$$\mathbb{E}_{\tilde P}\, e^{-Y/\lambda} \;\ge\; e^{-a/\lambda}\; \tilde P\big(e^{-Y/\lambda} \ge e^{-a/\lambda}\big) \;=\; e^{-a/\lambda}\; \tilde P(Y \le a), \tag{10}$$

which is a variant of the Chernoff bound. From Inequality (10), it follows that

$$\min{}_{\tilde P;\lambda}(Y) = -\lambda\,\log \mathbb{E}_{\tilde P}\, e^{-Y/\lambda} \;\le\; -\lambda\,\log\!\big(e^{-a/\lambda}\, \tilde P(Y\le a)\big) \;=\; a + \lambda\,\log\frac{1}{\tilde P(Y\le a)}. \tag{11}$$

Note that P̃(Y ≤ a) > 0, as a > ess inf Y. Letting λ ↓ 0 with λ > 0 in (11), we obtain

$$\limsup_{\lambda\downarrow 0}\; \min{}_{\tilde P;\lambda}(Y) \;\le\; a,$$

where a is an arbitrary number with a > ess inf Y. This completes the proof.    □

Remark 2

(Nesting property). The main properties of the smooth minimum include translation equivariance

$$\min{}_{\tilde P;\lambda}(Y + c) = \min{}_{\tilde P;\lambda}(Y) + c, \qquad c\in\mathbb{R},$$

and positive homogeneity

$$\min{}_{\tilde P;\,\gamma\cdot\lambda}(\gamma\cdot Y) = \gamma\cdot \min{}_{\tilde P;\lambda}(Y), \qquad \gamma > 0.$$

As a consequence of the tower property of the expectation, we have the nesting property

$$\min{}_{\tilde P;\lambda}\big(\min{}_{\tilde P;\lambda}(Y\,|\,\mathcal G)\big) = \min{}_{\tilde P;\lambda}(Y),$$

provided that G is a sub-σ-algebra of F.

2.3. Softmin Function

The smooth minimum is related to the softmin function via its derivatives. In what follows, we express variants of its derivatives, which are involved later.

Definition 4

(Softmin function). For λ>0 and a random variable Y with a finite smooth minimum, the softmin function is the random variable

$$\sigma_\lambda(Y) := \exp\!\left(-\,\frac{Y - \min{}_{\tilde P;\lambda}(Y)}{\lambda}\right) = \frac{e^{-Y/\lambda}}{\mathbb{E}_{\tilde P}\, e^{-Y/\lambda}}, \tag{12}$$

where the latter equality is obvious based on the definition of the smooth minimum in (6). The function σλ(Y) is called the Gibbs density.
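In the discrete case, the Gibbs density (12) can be computed with the same stabilizing shift as the smooth minimum above; a short Julia sketch (function name and example values are our own):

```julia
# Softmin  σ_λ(y)_j = exp(-y_j/λ) / Σ_k p̃_k·exp(-y_k/λ),  cf. (12); note that Σ_j p̃_j·σ_j = 1,
# so σ_λ is a density with respect to the discrete measure P̃.  Illustrative sketch only.
function softmin(y::AbstractVector, p::AbstractVector, λ::Real)
    m = minimum(y)
    w = exp.((m .- y) ./ λ)       # proportional to exp(-y_j/λ)
    return w ./ sum(p .* w)
end

σ = softmin([3.0, 1.0, 2.0], fill(1/3, 3), 0.5)
sum(fill(1/3, 3) .* σ)            # = 1.0 (up to rounding)
```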

  • The Derivative with respect to the Probability Measure

The definition of the smooth minimum in (6) does not require the measure P̃ to be a probability measure. Based on $\frac{\partial}{\partial t}\log(a + t\cdot h)\big|_{t=0} = \frac{h}{a}$ for the natural logarithm, the directional derivative of the smooth minimum in the direction of the measure Q is

$$\frac{1}{t}\Big(\min{}_{\tilde P + t\cdot Q;\,\lambda}(Y) - \min{}_{\tilde P;\,\lambda}(Y)\Big) \tag{13}$$
$$= \frac{-\lambda}{t}\left(\log\int_X e^{-Y/\lambda}\; \mathrm d\big(\tilde P + t\cdot Q\big) - \log\int_X e^{-Y/\lambda}\,\mathrm d\tilde P\right) \tag{14}$$
$$\xrightarrow[\;t\to 0\;]{}\; -\lambda\cdot\frac{\int_X e^{-Y/\lambda}\;\mathrm dQ}{\int_X e^{-Y/\lambda}\,\mathrm d\tilde P} \tag{15}$$
$$= -\lambda\cdot\int_X \sigma_\lambda(Y)\;\mathrm dQ. \tag{16}$$

Note that σ_λ is (up to the constant −λ) the Radon–Nikodým density appearing in (16). Thus, the Gibbs density σ_λ(Y) is proportional to the directional derivative of the smooth minimum with respect to the underlying measure P̃.

  • The Derivative with respect to the Random Variable

In what follows, we additionally require the derivative of the smooth minimum with respect to its argument. Following similar reasoning as above, this is accomplished by

$$\frac{1}{t}\Big(\min{}_{\tilde P;\lambda}(Y + t\cdot Z) - \min{}_{\tilde P;\lambda}(Y)\Big) \tag{17}$$
$$= \frac{-\lambda}{t}\left(\log\int_X e^{-(Y + t\cdot Z)/\lambda}\;\mathrm d\tilde P - \log\int_X e^{-Y/\lambda}\,\mathrm d\tilde P\right) \tag{18}$$
$$= \frac{-\lambda}{t}\left(\log\int_X e^{-Y/\lambda}\Big(1 - \tfrac{t}{\lambda}Z + \mathcal O(t^2)\Big)\;\mathrm d\tilde P - \log\int_X e^{-Y/\lambda}\,\mathrm d\tilde P\right) \tag{19}$$
$$\xrightarrow[\;t\to 0\;]{}\; \frac{\int_X Z\cdot e^{-Y/\lambda}\;\mathrm d\tilde P}{\int_X e^{-Y/\lambda}\,\mathrm d\tilde P} \tag{20}$$
$$= \int_X Z\cdot \sigma_\lambda(Y)\;\mathrm d\tilde P, \tag{21}$$

which involves the softmin function σλ(·) as well.
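The identity (21) is easy to verify numerically in the discrete case by differencing the smooth minimum in a direction Z; the sketch below is self-contained and uses our own helper names.

```julia
# Numerical check of (21): the derivative of t ↦ min_{P̃;λ}(Y + t·Z) at t = 0 equals ∫ Z·σ_λ(Y) dP̃.
sm(y, p, λ) = (m = minimum(y); m - λ * log(sum(p .* exp.((m .- y) ./ λ))))  # smooth minimum (6)
σλ(y, p, λ) = (m = minimum(y); w = exp.((m .- y) ./ λ); w ./ sum(p .* w))   # softmin (12)

y, z, p, λ, t = [3.0, 1.0, 2.0], [0.5, -1.0, 2.0], fill(1/3, 3), 0.7, 1e-6
fd    = (sm(y .+ t .* z, p, λ) - sm(y, p, λ)) / t    # finite difference of (17)
exact = sum(p .* z .* σλ(y, p, λ))                   # right-hand side of (21)
isapprox(fd, exact; rtol = 1e-4)                     # true
```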

3. Regularized Quantization

This section introduces the entropy-regularized optimal quantization problem along with its properties; we first recall the standard optimal quantization problem.

The standard quantization measures the quality of the approximation using the Wasserstein distance and considers the following problem (cf. Graf and Luschgy [4]):

$$\inf_{\pi\colon\ \pi_1 = P,\ \pi_2 \in \mathcal P_m(X)}\ \int_{X\times X} d(\xi,\tilde\xi)\; \pi(\mathrm d\xi,\mathrm d\tilde\xi), \tag{22}$$

where

$$\mathcal P_m(X) := \Big\{\tilde P_m\in\mathcal P(X)\colon\ \tilde P_m = \sum_{j=1}^m \tilde p_j\, \delta_{y_j}\Big\} \tag{23}$$

is the set of measures on X supported by not more than m (mN) points.

Soft quantization (or quantization regularized with the Kullback–Leibler divergence) involves the regularized Wasserstein distance instead of (22); that is, the standard problem is regularized with the Kullback–Leibler divergence:

$$\inf\Big\{\mathbb{E}_\pi\, d^r + \lambda\cdot D(\pi\,\|\,P\times\tilde P_m)\colon\ \pi_1 = P\ \text{and}\ \pi_2 = \tilde P_m \in \mathcal P_m(X)\Big\}, \tag{24}$$

where λ > 0 and $\mathbb{E}_\pi\, d^r = \int_{X^2} d(\xi,\tilde\xi)^r\; \pi(\mathrm d\xi,\mathrm d\tilde\xi)$. The optimal measure P̃_m ∈ P_m(X) for solving (24) depends on the regularization parameter λ.

In the following discussion, we initially investigate the regularized approximation, which again demonstrates the existence of an optimal approximation.

3.1. Approximation with Inflexible Marginal Measures

The following proposition addresses the optimal approximation problem after being regularized with the Kullback–Leibler divergence and fixed marginals. To this end, we dissect the infimum in the soft quantization problem (24) as follows:

$$\inf_{\tilde P_m\in\mathcal P_m(X)}\ \inf_{\pi\colon\ \pi_1 = P,\ \pi_2 = \tilde P_m}\ \mathbb{E}_\pi\, d^r + \lambda\cdot D(\pi\,\|\,P\times\tilde P_m), \tag{25}$$

where the marginals P and P˜m are fixed in the inner infimum.

The following Proposition 1 addresses this problem with a fixed bivariate distribution, which is the inner infimum in (25). Then, Proposition 2 reveals that the optimal marginals coincide in this case.

Proposition 1.

Let P be a probability measure and let λ>0. The inner optimization problem in (25) relative to the fixed bivariate distribution P×P˜ is provided by the explicit formula

$$\inf_{\pi\colon\ \pi_1 = P}\; \mathbb{E}_\pi\, d^r + \lambda\cdot D(\pi\,\|\,P\times\tilde P) = -\lambda \int_X \log\int_X e^{-d(\xi,\tilde\xi)^r/\lambda}\; \tilde P(\mathrm d\tilde\xi)\; P(\mathrm d\xi) \tag{26}$$
$$= \mathbb{E}_{\xi\sim P}\; \min{}_{\tilde\xi\sim\tilde P;\,\lambda}\; d(\xi,\tilde\xi)^r, \tag{27}$$

where D(π ‖ P×P̃) is the Kullback–Leibler divergence. Further, the infimum in (26) is attained.

Remark 3.

The notation in (27) ((29) below, resp.) is chosen to reflect the explicit expression in (26): the smooth minimum min_{P̃;λ} is taken with respect to the measure P̃, which is associated with the variable ξ̃, while the expectation E_P is taken with respect to P, with associated variable ξ (that is, the variable ξ in (27) is associated with P and the variable ξ̃ with P̃).

Remark 4

(Standard quantization). The result from (27) extends

$$\inf_{\pi\colon\ \pi_1 = P\ \text{and}\ \operatorname{supp}\pi_2 = \operatorname{supp}\tilde P}\; \mathbb{E}_\pi\, d^r = \int_X \min_{\tilde\xi\in\operatorname{supp}\tilde P} d(\xi,\tilde\xi)^r\; P(\mathrm d\xi) \tag{28}$$
$$= \mathbb{E}_P\, \min_{\tilde\xi\in\operatorname{supp}\tilde P} d(\xi,\tilde\xi)^r, \tag{29}$$

which is the formula without regularization and with restriction to the marginals P and P̃ (i.e., λ = 0, cf. Pflug and Pichler [29]). Note that the preceding display thereby explicitly involves the support supp P̃, while (26) only involves the expectation (via the smooth minimum) with respect to the measure P̃. In other words, (27) quantifies the quality of entropy-regularized quantization, while (29) quantifies standard quantization.
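For a sample from P and fixed atoms y₁,…,y_m, both objectives can be estimated by Monte Carlo; the regularized objective (27) simply replaces the hard minimum in (29) by the smooth minimum. A Julia sketch with our own choices (P = N(0,1), uniform weights, d(ξ,y) = |ξ−y|, r = 2):

```julia
# Monte Carlo comparison of the soft objective (27), E_P min_{P̃;λ} d(ξ,·)^r, with the
# standard quantization objective (29), E_P min_j d(ξ,y_j)^r.  Illustrative sketch only.
sm(y, p, λ) = (m = minimum(y); m - λ * log(sum(p .* exp.((m .- y) ./ λ))))

ξ  = randn(10_000)                        # sample from P = N(0,1)
ys = [-1.5, -0.5, 0.5, 1.5]               # fixed quantizers y_1,…,y_m
p  = fill(1/length(ys), length(ys))       # uniform weights p̃_j

soft = sum(sm((ξi .- ys) .^ 2, p, 0.1) for ξi in ξ) / length(ξ)    # objective (27), λ = 0.1
hard = sum(minimum((ξi .- ys) .^ 2)    for ξi in ξ) / length(ξ)    # objective (29), λ = 0
soft ≥ hard                               # true: the smooth minimum dominates the minimum
```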

Proof of Proposition 1.

It follows from the definition of the Kullback–Leibler divergence in (4) that it is enough to consider measures π which are absolutely continuous with respect to the product measure, π ≪ P×P̃; otherwise, the objective is not finite. Hence, there is a Radon–Nikodým density Z̃ such that, with Fubini’s theorem,

$$\pi(A\times B) = \int_A\int_B \tilde Z(\xi,\eta)\; \tilde P(\mathrm d\eta)\, P(\mathrm d\xi).$$

In order for the marginal constraint π(A×X) = P(A) to be satisfied (cf. (2)), we have

$$\int_A\int_X \tilde Z(\xi,\eta)\,\tilde P(\mathrm d\eta)\; P(\mathrm d\xi) = \pi(A\times X) = P(A) = \int_A 1\; P(\mathrm d\xi)$$

for every measurable set A. It follows that

$$\int_X \tilde Z(\xi,\eta)\; \tilde P(\mathrm d\eta) = 1 \qquad P(\mathrm d\xi)\text{-almost everywhere.}$$

We can conclude that every density of the form

$$\tilde Z(\xi,\eta) = \frac{Z(\xi,\eta)}{\int_X Z(\xi,\eta)\; \tilde P(\mathrm d\eta)} \tag{30}$$

satisfies the constraints in (2), irrespective of Z, and conversely that, via Z̃ in (30), every Z defines a bivariate measure π satisfying the constraints in (2). We set Φ(ξ,η) := log Z(ξ,η) (with the convention that log 0 = −∞ and exp(−∞) = 0, resp.) and consider

$$\tilde Z(\xi,\eta) = \frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\; \tilde P(\mathrm d\eta)}.$$

With these, the divergence is

$$D(\pi\,\|\,P\times\tilde P) = \int_X\int_X \frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)}\,\log\frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)}\; \tilde P(\mathrm d\eta)\,P(\mathrm d\xi)$$
$$= \int_X\int_X \frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)}\,\Big(\Phi(\xi,\eta) - \log\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)\Big)\; \tilde P(\mathrm d\eta)\,P(\mathrm d\xi).$$

For the other term in Objective (24), we have

$$\mathbb{E}_\pi\, d^r = \int_X\int_X \frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)}\; d(\xi,\eta)^r\; \tilde P(\mathrm d\eta)\,P(\mathrm d\xi).$$

Combining the last expressions obtained, the objective in (26) is

$$\mathbb{E}_\pi\, d^r + \lambda\, D(\pi\,\|\,P\times\tilde P) = \int_X\int_X \frac{e^{\Phi(\xi,\eta)}}{\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)}\,\big(d(\xi,\eta)^r + \lambda\,\Phi(\xi,\eta)\big)\; \tilde P(\mathrm d\eta)\,P(\mathrm d\xi) \tag{31}$$
$$\qquad -\; \lambda\int_X \log\int_X e^{\Phi(\xi,\eta)}\,\tilde P(\mathrm d\eta)\; P(\mathrm d\xi). \tag{32}$$

For fixed ξ (ξ is suppressed in the following two displays to abbreviate the notation), consider the function

$$f(\Phi) := \int_X \frac{e^{\Phi(\eta)}}{\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)}\,\big(d(\eta)^r + \lambda\,\Phi(\eta)\big)\; \tilde P(\mathrm d\eta) \;-\; \lambda\,\log\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta).$$

The directional derivative in direction h of this function is

$$\lim_{t\to 0}\frac{1}{t}\big(f(\Phi + t\,h) - f(\Phi)\big) \tag{33}$$
$$= \int_X \frac{e^{\Phi(\eta)}}{\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)}\,\big(d(\eta)^r + \lambda\,\Phi(\eta) - \lambda\big)\, h(\eta)\; \tilde P(\mathrm d\eta) \tag{34}$$
$$\quad - \int_X \frac{e^{\Phi(\eta)}\,\int_X e^{\Phi(\eta)}\, h(\eta)\,\tilde P(\mathrm d\eta)}{\big(\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)\big)^{2}}\,\big(d(\eta)^r + \lambda\,\Phi(\eta)\big)\; \tilde P(\mathrm d\eta) \tag{35}$$
$$\quad + \lambda\int_X \frac{e^{\Phi(\eta)}\,h(\eta)}{\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)}\;\tilde P(\mathrm d\eta) \tag{36}$$
$$= \int_X \frac{e^{\Phi(\eta)}}{\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)}\,\big(d(\eta)^r + \lambda\,\Phi(\eta)\big)\, h(\eta)\; \tilde P(\mathrm d\eta) \tag{37}$$
$$\quad - \int_X \frac{e^{\Phi(\eta)}\,\int_X e^{\Phi(\eta)}\, h(\eta)\,\tilde P(\mathrm d\eta)}{\big(\int_X e^{\Phi(\eta)}\,\tilde P(\mathrm d\eta)\big)^{2}}\,\big(d(\eta)^r + \lambda\,\Phi(\eta)\big)\; \tilde P(\mathrm d\eta). \tag{38}$$

Per (37) and (38), the derivative vanishes for every function h if d(η)^r + λΦ(η) = 0. As ξ is arbitrary, the general minimum is attained for Φ(ξ,η) = −d(ξ,η)^r/λ. With this, the first expression in (31) vanishes, and we can conclude that

$$\inf_\pi\; \mathbb{E}_\pi\, d^r + \lambda\, D(\pi\,\|\,P\times\tilde P) = -\lambda\int_X \log\int_X e^{-d(\xi,\eta)^r/\lambda}\,\tilde P(\mathrm d\eta)\; P(\mathrm d\xi) = \mathbb{E}_P\; \min{}_{\tilde P;\lambda}\, d(\xi,\tilde\xi)^r.$$

Finally, notice that the variable Z(ξ,η) = e^{Φ(ξ,η)} is completely arbitrary for the problem in (26) involving the Wasserstein distance and the Kullback–Leibler divergence. As outlined above, for every measure π with finite divergence D(π ‖ P×P̃), there is a density Z, as considered above. From this, the assertion in Proposition 1 follows.    □

Remark 5.

The preceding proposition considers probability measures π with marginal π₁ = P. Its first marginal distribution (trivially) is absolutely continuous with respect to P, π₁ ≪ P, as π₁ = P.

The second marginal π2, however, is not specified. In order for π to be feasible in (26), its Kullback–Leibler divergence with respect to P×P˜ must be finite. Hence, there is a (non-negative) Radon–Nikodým density Z such that

$$\pi_2(B) = \pi(X\times B) = \int_{X\times B} Z(\xi,\eta)\; P(\mathrm d\xi)\,\tilde P(\mathrm d\eta).$$

It follows from Fubini’s theorem that

$$\pi_2(B) = \int_B\int_X Z(\xi,\eta)\; P(\mathrm d\xi)\,\tilde P(\mathrm d\eta) = \int_B Z(\eta)\; \tilde P(\mathrm d\eta),$$

where $Z(\eta) := \int_X Z(\xi,\eta)\; P(\mathrm d\xi)$. Thus, the second marginal is absolutely continuous with respect to P̃, i.e., π₂ ≪ P̃.

Proposition 1 characterizes the objective of the quantization problem. In addition, its proof implicitly reveals the marginal of the best approximation. The following lemma explicitly spells out the density of the marginal of the optimal measure with respect to P˜.

Lemma 2

(Characterization of the best approximating measure). The best approximating marginal probability measure minimizing (26) has a density

$$Z(\tilde\xi) = \mathbb{E}_P\, \sigma_\lambda\big(d(\xi,\tilde\xi)^r\big) = \int_X \sigma_\lambda\big(d(\xi,\tilde\xi)^r\big)\; P(\mathrm d\xi),$$

where σλ(·) is the softmin function (cf. Definition 4).

Proof. 

Recall from the proof of Proposition 1 that we have the density

$$\tilde Z(\xi,\tilde\xi) = \frac{e^{-d(\xi,\tilde\xi)^r/\lambda}}{\mathbb{E}_{\tilde P}\, e^{-d(\xi,\tilde\xi)^r/\lambda}}$$

of the optimal measure π relative to P×P˜. From this, we can derive

$$\pi_2(B) = \pi(X\times B) = \int_B\int_X \frac{e^{-d(\xi,\tilde\xi)^r/\lambda}}{\mathbb{E}_{\tilde P}\, e^{-d(\xi,\tilde\xi)^r/\lambda}}\; P(\mathrm d\xi)\,\tilde P(\mathrm d\tilde\xi)$$

such that

$$Z(\tilde\xi) = \int_X \frac{e^{-d(\xi,\tilde\xi)^r/\lambda}}{\mathbb{E}_{\tilde P}\, e^{-d(\xi,\tilde\xi)^r/\lambda}}\; P(\mathrm d\xi) = \mathbb{E}_P\, \sigma_\lambda\big(d(\xi,\tilde\xi)^r\big)$$

is the density with respect to P˜, that is, dπ2=Z dP˜ (i.e., π2(dξ˜)=Z(ξ˜) P˜(dξ˜)).    □

3.2. Approximation with Flexible Marginal Measure

The following proposition reveals that the best approximation of a bivariate measure in terms of a product of independent measures is provided by the product of its marginals. With this, it follows that the objectives in (25) and (26) coincide for P˜=π2.

Proposition 2.

Let P be a measure and let π be a bivariate measure with marginal π1=P and π2. Then, it holds that

$$D(\pi\,\|\,P\times\pi_2) \;\le\; D(\pi\,\|\,P\times\tilde P), \tag{39}$$

where P˜ is an arbitrary measure.

Proof. 

Define the Radon–Nikodým density $Z(\eta) := \frac{\pi_2(\mathrm d\eta)}{\tilde P(\mathrm d\eta)}$ and observe that the extension $Z(\xi,\eta) := Z(\eta)$ to X×X is the density $Z = \frac{\mathrm d\,(P\times\pi_2)}{\mathrm d\,(P\times\tilde P)}$. It follows with (4) that

$$0 \le D(\pi_2\,\|\,\tilde P) = \mathbb{E}_{\pi_2}\, \log\frac{\mathrm d\pi_2}{\mathrm d\tilde P} \tag{40}$$
$$= \mathbb{E}_{\pi}\, \log\frac{\mathrm d\,(P\times\pi_2)}{\mathrm d\,(P\times\tilde P)} \tag{41}$$
$$= \mathbb{E}_{\pi}\left(\log\frac{\mathrm d\pi}{\mathrm d\,(P\times\tilde P)} - \log\frac{\mathrm d\pi}{\mathrm d\,(P\times\pi_2)}\right) \tag{42}$$
$$= D(\pi\,\|\,P\times\tilde P) - D(\pi\,\|\,P\times\pi_2), \tag{43}$$

which is the assertion. In case the measures are not absolutely continuous, the assertion in (40) is trivial.    □

Suppose now that π is a solution of the master problem (26) with some P˜. It follows from the preceding proposition that the objective (26) improves when replacing the initial P˜ with the marginal of the optimal solution, that is, P˜=π2.

3.3. The Relation of Soft Quantization and Entropy

The soft quantization problem (26) involves the Kullback–Leibler divergence and not the entropy. The major advantage of the formulation presented above is that it works for discrete, continuous, or mixed measures, while entropy usually needs to be defined separately for discrete and continuous measures.

For a discrete measure with P(x):=P({x}) and P˜(y):=P˜({y}), the Kullback–Leibler divergence (4) is

$$D(\tilde P\,\|\,P) = H(\tilde P, P) - H(\tilde P) \tag{44}$$
$$= \sum_{x\in X} \tilde P(x)\,\log\frac{\tilde P(x)}{P(x)}, \tag{45}$$

where

$$H(\tilde P, P) := -\sum_{x\in X} \tilde P(x)\cdot\log P(x)$$

is the cross-entropy of the measures P̃ and P, while

$$H(\tilde P) := H(\tilde P, \tilde P) = -\sum_{x\in X} \tilde P(x)\,\log\tilde P(x) \tag{46}$$

is the entropy of P˜.
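Identity (44) with (45) and (46) is straightforward to check numerically for two discrete distributions on a common finite support; a small Julia sketch with made-up probability vectors:

```julia
# Verify (44)-(45): D(P̃ ‖ P) = H(P̃, P) − H(P̃) for discrete measures on a common support.
P = [0.5, 0.3, 0.2]                     # reference measure P
Q = [0.4, 0.4, 0.2]                     # approximating measure P̃

crossH(q, p) = -sum(q .* log.(p))       # cross-entropy H(P̃, P)
H(q)         = -sum(q .* log.(q))       # entropy H(P̃), cf. (46)
D(q, p)      =  sum(q .* log.(q ./ p))  # Kullback–Leibler divergence, cf. (45)

D(Q, P) ≈ crossH(Q, P) - H(Q)           # true
```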

For a measure π with marginals P and P˜, the cross-entropy is

$$H(\pi, P\times\tilde P) = -\sum_{x,y}\pi(x,y)\,\log\big(P(x)\cdot\tilde P(y)\big) \tag{47}$$
$$= -\sum_{x,y}\pi(x,y)\,\log P(x) - \sum_{x,y}\pi(x,y)\,\log\tilde P(y) \tag{48}$$
$$= -\sum_{x}P(x)\,\log P(x) - \sum_{y}\tilde P(y)\,\log\tilde P(y), \tag{49}$$

where we have used the marginals from (2). Note that (49) does not depend on π; hence, H(π,P×P˜) does not depend on π.

With (44), the quantization problem (26) can be rewritten equivalently as

$$\min_{\pi\colon\ \pi_2 \ll \tilde P}\; \int_{X\times X} d^r\; \mathrm d\pi \;-\; \lambda\cdot H(\pi) \tag{50}$$

by involving the entropy only. For this reason, we call the master problem in (26) the entropy-regularized problem.

4. Soft Tessellation

The quantization problem (25) consists of finding a good (in the best case, the optimal) approximation of a general probability measure P on X using a simple, discrete measure $\tilde P_m = \sum_{j=1}^m \tilde p_j\,\delta_{y_j}$. Thus, the problem consists of finding good weights p̃₁,…,p̃_m as well as good locations y₁,…,y_m. Quantization employs the Wasserstein distance to measure the quality of the approximation; instead, soft quantization involves the regularized Wasserstein distance, as in (26):

$$\inf_{\tilde P_m\in\mathcal P_m(X)}\ \inf_{\pi\colon\ \pi_1 = P,\ \pi_2 = \tilde P_m}\ \mathbb{E}_\pi\, d^r + \lambda\cdot D(\pi\,\|\,P\times\tilde P_m),$$

where the measures on X supported by not more than m points (cf. (23)) are as follows:

$$\mathcal P_m(X) = \Big\{\tilde P_m\in\mathcal P(X)\colon\ \tilde P_m = \sum_{j=1}^m \tilde p_j\, \delta_{y_j}\Big\}.$$

We separate the problems of finding the best weights and locations. The following Section 4.1 addresses the problem of finding the optimal weights p˜; the subsequent Section 4.2 then addresses the problem of finding the optimal locations y1,,ym. As well, we elaborate the numerical advantages of soft quantization below.

4.1. Optimal Weights

Proposition 1 above is formulated for general probability measures P and P̃. The desired measure in quantization is a simple, discrete measure. To this end, recall that, per Remark 5, measures which are feasible for (26) have marginals π₂ with π₂ ≪ P̃. It follows that the support of the marginal is smaller than the support of P̃, that is,

$$\operatorname{supp}\pi_2 \subseteq \operatorname{supp}\tilde P.$$

For a simple measure $\tilde P = \sum_{j=1}^m \tilde p_j\,\delta_{y_j}$ with p̃_j > 0, it follows in particular that supp π₂ ⊆ {y₁,…,y_m}. In this subsection, we consider the measure P̃ and the support {y₁,…,y_m} to be fixed.

To unfold the result of Proposition 1 for discrete measures, recall the smooth minimum and the softmin function for the discrete (empirical or uniform) measure $\tilde P = \sum_{j=1}^m \tilde p_j\,\delta_{y_j}$. For this measure, the smooth minimum (6) explicitly is

$$\min{}_{\lambda;\,\tilde P}(y_1,\dots,y_m) = -\lambda\,\log\!\big(\tilde p_1\, e^{-y_1/\lambda} + \dots + \tilde p_m\, e^{-y_m/\lambda}\big).$$

This function is occasionally referred to as the LogSumExp function. The softmin function (or Gibbs density (12)) is

$$\sigma_\lambda(y_1,\dots,y_m) = \left(\frac{e^{-y_j/\lambda}}{\tilde p_1\, e^{-y_1/\lambda} + \dots + \tilde p_m\, e^{-y_m/\lambda}}\right)_{j=1}^{m}.$$

It follows from Lemma 2 that the best approximating measure is $Q = \sum_{j=1}^m q_j\, \tilde p_j\, \delta_{y_j}$, where the vector q of the optimal weights relative to P̃ is provided explicitly by

$$q = \int_X \sigma_\lambda\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big)\; P(\mathrm d\xi) = \mathbb{E}_P\, \sigma_\lambda\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big), \tag{51}$$

which involves computing expectations.
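Since (51) is an expectation of the softmin vector under P, it can be estimated by sampling from P. The following Julia sketch does so for a one-dimensional example with d(ξ,y) = |ξ−y| and r = 2; the function name and the choice of P are our own.

```julia
# Monte Carlo estimate of the optimal weights (51): q = E_P σ_λ(d(ξ,y_1)^r, …, d(ξ,y_m)^r),
# so that the best approximating measure places mass q_j·p̃_j on the atom y_j (cf. Lemma 2).
# Illustrative sketch only, not the authors' implementation.
function optimal_weights(ξ_sample, ys, p, λ; r = 2)
    q = zeros(length(ys))
    for ξ in ξ_sample
        c = abs.(ξ .- ys) .^ r              # costs d(ξ, y_j)^r
        w = exp.((minimum(c) .- c) ./ λ)    # stabilized exp(-c_j/λ)
        q .+= w ./ sum(p .* w)              # softmin (12) evaluated at this sample
    end
    return q ./ length(ξ_sample)
end

ys = [-1.0, 0.0, 1.0]
p  = fill(1/3, 3)
q  = optimal_weights(randn(50_000), ys, p, 0.2)
sum(p .* q)                                 # ≈ 1: the masses q_j·p̃_j form a probability vector
```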

  • Soft Tessellation

For λ = 0, the softmin function σ_λ is

$$\tilde p_j\cdot\sigma_{\lambda=0}\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big)_j = \begin{cases} 1 & \text{if } d(\xi,y_j)^r = \min\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big),\\ 0 & \text{else.}\end{cases}$$

That is, the mapping $j\mapsto \tilde p_j\cdot\sigma_\lambda(\cdot)_j$ can serve for classification, i.e., tessellation; the point ξ is associated with y_j if $\sigma_\lambda(\cdot)_j \neq 0$, and the corresponding region is known as a Voronoi diagram.

For λ > 0, the softmin $\tilde p_j\cdot\sigma_\lambda(\cdot)_j$ is not a strict indicator, and can instead be interpreted as a probability; that is,

$$\tilde p_j\cdot\sigma_\lambda\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big)_j$$

is the probability of allocating ξ ∈ X to the quantizer y_j.

Remark 6

(K-means and quantization). K-means clustering is a widely used unsupervised machine learning algorithm that groups datapoints into clusters based on their similarity. Voronoi tessellation, on the other hand, is a geometrical concept used to partition a space into regions, each of which are associated with a specific point or seed. Notably, this is an unavoidable concept in the investigation of optimal quantizers. In the K-means context, Voronoi tessellation helps to define cluster boundaries. Each cluster’s boundary is constructed as the region within which the datapoints are closer to its cluster center than to any other (cf. Graf and Luschgy [4] Chapter I).

4.2. Optimal Locations

As a result of Proposition 1, the objective in (27) is an expectation. To identify the optimal support points y₁,…,y_m, it is essential to first minimize

$$\min_{\tilde P = \sum_{j=1}^m \tilde p_j\,\delta_{y_j}}\; \mathbb{E}_{\xi\sim P}\; \min{}_{\lambda;\, y\sim\tilde P}\; d(\xi,y)^r. \tag{52}$$

This is a stochastic, nonlinear, and non-convex optimization problem:

$$f(y_1,\dots,y_m) := \mathbb{E}\, f(y_1,\dots,y_m;\xi) = \mathbb{E}\; \min{}_{\lambda;\,\tilde P}\big(d(\xi,y_j)^r\colon j=1,\dots,m\big), \tag{53}$$

where the function $f(y_1,\dots,y_m;\xi) := \min{}_{\lambda;\,\tilde P}\{d(\xi,y_j)^r\colon j=1,\dots,m\}$ is nonlinear and non-convex. The optimal quantization problem thus constitutes an unconstrained, stochastic, non-convex, and nonlinear optimization problem. According to the chain rule and the gradients in Section 2.3, the gradient of the objective is constructed from the components

$$\nabla_{y_j} f(y_1,\dots,y_m;\xi) = \frac{\tilde p_j\cdot\exp\!\big(-d(\xi,y_j)^r/\lambda\big)}{\sum_{\ell=1}^m \tilde p_\ell\cdot\exp\!\big(-d(\xi,y_\ell)^r/\lambda\big)}\cdot \nabla_y\, d(\xi,y)^r\Big|_{y=y_j}, \tag{54}$$

that is,

$$\nabla f = \tilde p\cdot\sigma_\lambda\big(d(\xi,y_1)^r,\dots,d(\xi,y_m)^r\big)\cdot r\, d(\xi,y)^{r-1}\cdot\nabla_y\, d(\xi,y), \tag{55}$$

where ‘·’ denotes the Hadamard (element-wise) product and $\tilde p$ and $d(\xi,y)^{r-1}$ are the vectors with entries $\tilde p_j$ and $d(\xi,y_j)^{r-1}$, $j=1,\dots,m$. In other words, the gradient of the LogSumExp function (53) is the softmin function, which Section 2.3 explicitly illustrates.

Algorithm 1 is a stochastic gradient algorithm used to minimize (52); it collects the elements of the optimal weights (51) and the optimal locations provided here and in the preceding section.

Algorithm 1: Stochastic gradient algorithm to find the optimal quantizers and optimal masses
(The pseudo-code of Algorithm 1 is provided as an image in the published article.)
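The following Julia sketch reproduces the procedure described in this section for the one-dimensional Euclidean case with r = 2 and uniform weights p̃: a stochastic gradient step on the locations using (54) and (55), followed by the weight estimate (51). It is an illustration under these assumptions, not the authors’ reference implementation (that implementation is linked in Section 5); step sizes and iteration counts are arbitrary.

```julia
# Sketch of the stochastic gradient procedure: sample ξ ~ P, move every location y_j along the
# softmin-weighted gradient (54), then estimate the weights (51) on a fresh sample.
# Illustrative sketch only; all names and parameters are our own.
function soft_quantize(sample_P, m, λ; iterations = 50_000, step = 0.05)
    y = [sample_P() for _ in 1:m]                       # initial locations drawn from P
    p = fill(1/m, m)                                    # fixed uniform weights p̃_j = 1/m
    for k in 1:iterations
        ξ = sample_P()
        c = (ξ .- y) .^ 2                               # costs d(ξ, y_j)^r with r = 2
        w = exp.((minimum(c) .- c) ./ λ)
        σ = w ./ sum(p .* w)                            # softmin (12)
        y .+= (step / k^0.6) .* p .* σ .* 2 .* (ξ .- y) # descent step along (54)-(55)
    end
    q = zeros(m)                                        # weights via (51), on a fresh sample
    for _ in 1:iterations
        ξ = sample_P()
        c = (ξ .- y) .^ 2
        w = exp.((minimum(c) .- c) ./ λ)
        q .+= w ./ sum(p .* w)
    end
    return y, p .* (q ./ iterations)                    # locations y_j and masses p̃_j·q_j
end

y, mass = soft_quantize(randn, 8, 0.1)                  # soft quantization of N(0,1), m = 8
```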

Example 1.

To provide an example of the gradient of the distance function in (54) ((55), resp.), the derivative of the weighted norm

$$d(\xi,y) = \|y-\xi\|_p := \Big(\sum_{\ell=1}^{d} w_\ell\cdot|y_\ell - \xi_\ell|^p\Big)^{1/p}$$

is

$$\frac{\partial}{\partial y_j}\, \|y-\xi\|_p^r = r\, w_j\, \|\xi - y\|_p^{\,r-p}\cdot |y_j - \xi_j|^{p-1}\cdot\operatorname{sign}(y_j - \xi_j).$$
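The derivative of the weighted norm in Example 1 translates directly into code; a small Julia sketch (names and default parameters are ours):

```julia
# ∂/∂y_j ‖y − ξ‖_p^r = r·w_j·‖y − ξ‖_p^(r−p)·|y_j − ξ_j|^(p−1)·sign(y_j − ξ_j), cf. Example 1.
# Illustrative sketch only.
function grad_weighted_norm(y, ξ, w; p = 2, r = 2)
    nrm = sum(w .* abs.(y .- ξ) .^ p)^(1 / p)           # ‖y − ξ‖_p with weights w
    return r .* w .* nrm^(r - p) .* abs.(y .- ξ) .^ (p - 1) .* sign.(y .- ξ)
end

grad_weighted_norm([1.0, 2.0], [0.0, 0.5], [1.0, 1.0])  # p = r = 2 recovers the gradient 2·(y − ξ)
```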

4.3. Quantization with Large Regularization Parameters

The entropy in (46) is minimal for the Dirac measure δ_x (where x is any point in X); in this case, H(δ_x) = −1·log 1 = 0, while H(P̃) > 0 for any other measure. For larger values of λ, the objective in (50), and as such the objective of the master problem (23), will supposedly prefer a measure with fewer points. This is indeed the case, as stated by Theorem 1 above. We provide its proof below after formally defining the center of the measure.

Definition 5

(Center of the measure). Let P be a probability measure on X and let d be a distance on X. The point a ∈ X is a center of the measure P with respect to the distance d if

$$a \in \operatorname*{arg\,min}_{x\in X}\; \mathbb{E}\, d(x,\xi)^r,$$

provided that $\mathbb{E}\, d(x_0,\xi)^r < \infty$ for some (i.e., any) x₀ ∈ X and r ≥ 1.

In what follows, we demonstrate that the regularized quantization problem (50) links the optimal quantization problem and the center of the measure.

Proof of Theorem 1.

According to Proposition 1, Problems (50) and (26) are equivalent. Now, assume that y_i = y_j for all i, j ≤ m; then, d(y_i,ξ) = d(y_j,ξ) for ξ ∈ Ξ, and it follows that

$$\min{}_{\lambda}\big(d(y_1,\xi)^r,\dots,d(y_m,\xi)^r\big) = d(y_i,\xi)^r, \qquad i=1,\dots,m.$$

Thus, the minimum of the optimization problem is attained at y_i = a for each i = 1,…,m, where a is the center of the measure P with respect to the distance d. It follows that y₁ = … = y_m = a is a local minimum and a stationary point satisfying the first-order conditions

$$\nabla f(y_1,\dots,y_m) = 0$$

for the function f provided in (53). Note as well that

$$\sigma_\lambda\big(d(\xi,y_1)^r,\dots,d(\xi,y_n)^r\big)_i = \frac{\exp\!\big(-d(\xi,y_i)^r/\lambda\big)}{\sum_{j=1}^{n} \tilde p_j\,\exp\!\big(-d(\xi,y_j)^r/\lambda\big)} = 1,$$

and as such the softmin function does not depend on λ at the stationary point y₁ = … = y_m = a.

Recall from (54) that

$$\nabla_{y_i}\, \mathbb{E}\,\min{}_{\lambda;\,\tilde P}\big(d(\xi,y_j)^r\colon j=1,\dots,n\big) = \mathbb{E}\,\Big(\sigma_\lambda\big(d(y_1,\xi)^r,\dots,d(y_n,\xi)^r\big)_i\cdot \nabla\, d(\xi,y_i)^r\Big).$$

According to the product rule, the Hessian matrix is

$$\nabla^2\, \mathbb{E}\,\min{}_{\lambda;\,\tilde P}\big(d(\xi,y_j)^r\colon j=1,\dots,n\big) = \mathbb{E}\,\Big(\nabla\sigma_\lambda\big(d(y_1,\xi)^r,\dots,d(y_n,\xi)^r\big)\cdot \big(\nabla\, d(\xi,y_i)^r\big)^{\otimes 2} + \sigma_\lambda\big(d(y_1,\xi)^r,\dots,d(y_n,\xi)^r\big)\cdot \nabla^2 d(\xi,y_i)^r\Big). \tag{56}$$

Note that the second expression is positive definite, as the Hessian ∇²d(ξ,y_i)^r of the convex function is positive definite and $\nabla \min{}_{\lambda;\,\tilde P}(x_1,\dots,x_n) = \sigma_\lambda(x_1,\dots,x_n) \ge 0$. Further, the Hessian of the smooth minimum (see Appendix A) is

$$\nabla\sigma_\lambda = \nabla^2 \min{}_{\lambda}(x_1,\dots,x_n) = -\frac{1}{\lambda}\,\Sigma,$$

where the matrix Σ is

$$\Sigma := \operatorname{diag}(\sigma_1,\dots,\sigma_n) - \sigma\,\sigma^\top.$$

This matrix Σ is positive definite (as $\sum_{i=1}^n \sigma_i = 1$) and 0 ⪯ Σ ⪯ 1 in Loewner order; indeed, Σ is the covariance matrix of the multinomial distribution. It follows that the first term in (56) is O(1/λ), while the second is O(1), so that (56) is positive definite for sufficiently large λ. Thus, the extremal point y_i = a is a minimum for all such λ. In particular, there exists λ₀ > 0 such that (56) is positive definite for every λ > λ₀; hence, the result. □

5. Numerical Illustration

This section presents numerical findings for the approaches and methods discussed earlier. The Julia implementations for these methods are available online (cf. https://github.com/rajmadan96/SoftQuantization.git, accessed on 8 September 2023).

In the following experiments, we approximate the measure P with a finite discrete measure P˜ using the stochastic gradient algorithm presented in Algorithm 1.

5.1. One Dimension

First, we perform the analysis in one dimension. In this experiment, our problem of interest is to find entropy-regularized optimal quantizers for

$$P \sim N(0,1) \qquad\text{and}\qquad P \sim \operatorname{Exp}(1)$$

(i.e., the normal and exponential distributions with standard parameters). To make the effect of the regularization clearly visible, we consider only m = 8 quantizers.

Figure 1 illustrates the results of soft quantization of the standard normal distribution and exponential distribution. It is apparent that when λ is increased beyond a certain threshold (cf. Theorem 1), the quantizers converge towards the center of the measure (i.e., the mean), while for smaller values of λ the quantizers are able to identify the actual optimal locations with greater accuracy. Furthermore, we emphasize that our proposed method is capable of identifying the mean location regardless of the shape of the distribution, which this experiment empirically substantiates.

Figure 1.

Figure 1

Soft quantization of measures on R with a varying regularization parameter λ with eight quantization points. (a) Normal distribution: for λ=10, the best approximation resides at the center of the measure; for λ=1, the approximation is reduced to only six points, as two of the remaining points have probability 0; for λ=0, we obtain the standard quantization. (b) Exponential distribution: the measures concentrate on one (λ large), three (λ=1), five (λ=0.5), and eight quantization points.

For better understanding of the dissemination of the weights (probabilities) and their respective positions, the following examination involves the calculation of the cumulative distribution function. Additionally, we consider

$$P \sim \Gamma(2,2) \qquad\text{(Gamma distribution)}$$

as a problem of interest, which is a notably distinct scenario in terms of shape compared to the measures examined previously.

Figure 2 provides the results. It is evident that the number of quantizers m decreases as λ increases. When λ reaches a specific threshold, such as λ=20 in our case, all quantizers converge towards the center of the measures, represented by the mean (i.e., 4).

Figure 2.

Figure 2

Soft quantization of the Gamma distribution on R with varying regularization parameter λ; the approximating measure simplifies with increasing λ. (a) λ=0: approximate solution to the standard quantization problem with eight quantizers. (b) λ=1: the eight quantization points collapse to seven quantization points. (c) λ=10: the eight quantization points collapse to three quantization points. (d) λ=20: the quantization points converge to a single point representing the center of the measure.

5.2. Two Dimensions

Next, we demonstrate the behavior of entropy-regularized optimal quantization for a range of λ in two dimensions. In the following experiment, we consider

$$P \sim U\big((0,1)\times(0,1)\big) \qquad\text{(uniform distribution on the square)}$$

as a problem of interest. Initially, we perform the experiment with m=4 quantizers.

Figure 3 illustrates the findings. Figure 3a reveals a quantization pattern similar to that observed in the one-dimensional experiment. However, in Figure 3b we gain more detailed insight into the behavior of the quantizers at λ=1, where they align diagonally before eventually colliding. Furthermore, the size of the point indicates the respective probability of the quantization point, which is notably uniformly distributed for a varying regularization parameter λ.

Figure 3.

Figure 3

Two-dimensional soft quantization of the uniform distribution on R2 with a varying regularization parameter λ with 4 quantizers. (a) Uniform distribution in R2. (b) Enlargement of (a): for larger values of λ (here, λ=1), the quantizers align while converging to the center of the measure.

Again, we consider a uniform distribution as the problem of interest in the subsequent experiment, this time employing m=16 quantizers for enhanced comprehension. Figure 4 encapsulates the essence of the experiment, offering an extensive visual representation. In contrast to the previous experiment, it can be observed that for regularization values of λ=0.037 and λ=0.1 the quantizers assemble at the nearest strong points (in terms of high probability) rather than converging towards the center of the measure (see Figure 4b,c). Subsequently, for larger λ, they move from these strong points towards the center, where they are in a diagonal alignment before colliding (see Figure 4d). More concisely, when λ=0 we achieve the genuine quantization solution (see Figure 4a). As λ increases, the quantizers with lower probabilities converge towards those with the nearest higher probabilities. Subsequently, all quantizers converge towards the center of the measure, represented by the mean of the respective measure.

Figure 4.

Figure 4

Soft quantization of the uniform distribution on R² with varying regularization parameter λ; the approximating measure simplifies with increasing λ. (a) λ=0.0: approximate solution to the standard quantization problem with sixteen quantizers. (b) λ=0.037: the sixteen quantization points collapse to eight quantization points. (c) λ=0.1: the sixteen quantization points collapse to four quantization points. (d) λ=1.0: the quantization points converge to a single point, representing the center of the measure, in an aligned way.

Thus far, we have conducted two-dimensional experiments employing various quantizers (m=4 and m=16) with the uniform distribution. These experiments can be categorized under the k-means approach (see Remark 6). Next, we delve into the complexity of a multivariate normal distribution, with the aim of enhancing comprehension. More precisely, our problem of interest is to find a soft quantization for

$$P \sim N(\mu, \Sigma),$$

where

$$\mu = \begin{pmatrix}0\\ 0\end{pmatrix}, \qquad \Sigma = \begin{pmatrix}3 & 1\\ 1 & 3\end{pmatrix}.$$
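Samples from this distribution, as required for the experiment, can be generated with a Cholesky factor of Σ; a minimal Julia sketch using only the standard library (our own, for illustration):

```julia
using LinearAlgebra

# Draw samples ξ ~ N(μ, Σ) for the experiment of Figure 5 via the Cholesky factor of Σ.
# Illustrative sketch only.
μ = [0.0, 0.0]
Σ = [3.0 1.0; 1.0 3.0]
L = cholesky(Σ).L
samples = [μ .+ L * randn(2) for _ in 1:10_000]   # each entry is one draw ξ ∈ R²
```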

In this endeavor, we employ more quantizers, specifically, m=100. Figure 5 captures the core essence of the experiment, delivering a comprehensive and visually illustrative representation. From the experiment, it is evident that the initial diagonal alignment precedes convergence toward the center of the measure as λ increases. Additionally, a noticeable shift can be observed on the part of the points with lower probabilities towards those with higher probabilities. This experiment highlights that the threshold of λ for achieving convergence or diagonal alignment in the center of the measure depends on the number of quantizers employed.

Figure 5.

Figure 5

Two-dimensional soft quantization of the normal distribution on R² with varying regularization parameter λ and parameters r=2, p=2, and m=100. (a) λ=0.0, the solution to the standard quantization problem; (b) λ=5.0; (c) λ=10.0.

6. Summary

In this study, we have enhanced the stability and simplicity of the standard quantization problem by introducing a novel method of quantization using entropy. Propositions 1 and 2 thoroughly elucidate the intricacies of the master problem (25). Our substantiation of the convergence of quantizers to the center of the measure explains the transition from a complex hard optimization problem to a simplified configuration (see Theorem 1). More concisely, this transition underscores the fundamental shift towards a more tractable and straightforward computational framework, marking a significant advancement in terms of the overall approach. Moreover, in Section 5, we provide numerical illustrations of our method that confirm its robustness, stability, and properties, as discussed in our theoretical results. These numerical demonstrations serve as empirical evidence reinforcing the efficacy of our proposed approach.

Appendix A. Hessian of the Softmin

The empirical measure $\frac{1}{n}\sum_{i=1}^n \delta_{x_i}$ is a probability measure. From Jensen’s inequality, it follows that $\min{}_\lambda(x_1,\dots,x_n) \le \frac{1}{n}\sum_{i=1}^n x_i = \bar x_n$. Thus, the smooth minimum involves a cumulant-generating function, for which we derive

$$\min{}_\lambda(x_1,\dots,x_n) = \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{\lambda^{j-1}\cdot j!}\,\kappa_j \tag{A1}$$
$$= \bar x_n - \frac{1}{2\lambda}\,s_n^2 + \frac{1}{6\lambda^2}\,\kappa_3 + \mathcal O(\lambda^{-3}), \tag{A2}$$

where κ_j is the j-th cumulant with respect to the empirical measure. Specifically,

$$\kappa_1 = \bar x_n = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \kappa_2 = s_n^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x_n)^2, \qquad \kappa_3 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x_n)^3,$$

where x̄_n is the ‘sample mean’ and s_n² the ‘sample variance’. The following cumulants (κ₄, etc.) are more involved. Using the Taylor series expansion log(1+x) = x − ½x² + O(x³), we obtain

$$
\begin{aligned}
-\lambda\,\log\frac{1}{n}\sum_{i=1}^n e^{-x_i/\lambda}
&= \bar x_n - \lambda\,\log\frac{1}{n}\sum_{i=1}^n e^{-(x_i-\bar x_n)/\lambda}\\
&= \bar x_n - \lambda\,\log\sum_{i=1}^n \frac{1}{n}\left(1 - \frac{x_i-\bar x_n}{\lambda} + \frac{1}{2}\Big(\frac{x_i-\bar x_n}{\lambda}\Big)^{2} - \frac{1}{6}\Big(\frac{x_i-\bar x_n}{\lambda}\Big)^{3} + \mathcal O\Big(\frac{1}{\lambda^{4}}\Big)\right)\\
&= \bar x_n - \lambda\,\log\left(1 + \frac{1}{2\lambda^{2}}\,s_n^2 - \frac{1}{6\lambda^{3}}\,\kappa_3 + \mathcal O(\lambda^{-4})\right)\\
&= \bar x_n - \lambda\left(\frac{1}{2\lambda^{2}}\,s_n^2 - \frac{1}{6\lambda^{3}}\,\kappa_3 + \mathcal O(\lambda^{-4})\right)\\
&= \bar x_n - \frac{1}{2\lambda}\,s_n^2 + \frac{1}{6\lambda^{2}}\,\kappa_3 + \mathcal O(\lambda^{-3}).
\end{aligned}
$$
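The expansion (A2) can be checked numerically for a small sample and a moderately large λ; a short Julia sketch (sample values are our own):

```julia
# Compare the exact smooth minimum of an empirical measure with the first terms of (A2).
# Illustrative sketch only.
x    = [1.0, 2.0, 4.0, 7.0]
n    = length(x)
xbar = sum(x) / n
s2   = sum((x .- xbar) .^ 2) / n                     # second cumulant s_n²
k3   = sum((x .- xbar) .^ 3) / n                     # third cumulant κ₃

smooth_min(x, λ) = (m = minimum(x); m - λ * log(sum(exp.((m .- x) ./ λ)) / n))

λ = 50.0
smooth_min(x, λ)                                     # exact value
xbar - s2 / (2λ) + k3 / (6λ^2)                       # expansion (A2); agreement up to O(λ⁻³)
```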

Note as well that the softmin function is the gradient of the smooth minimum:

$$\sigma_\lambda(x_1,\dots,x_n)_i = \frac{\partial}{\partial x_i}\,\min{}_\lambda(x_1,\dots,x_n).$$

The softmin function is frequently used in classification in a maximum likelihood framework. It holds that

$$\frac{\partial^2}{\partial x_i\,\partial x_j}\,\min{}_\lambda(x_1,\dots,x_n) = \frac{\partial}{\partial x_j}\,\frac{\exp(-x_i/\lambda)}{\sum_{k=1}^{n}\exp(-x_k/\lambda)} = \frac{1}{\lambda}\,\frac{\exp\!\big(-x_i/\lambda - x_j/\lambda\big)}{\big(\sum_{k=1}^{n}\exp(-x_k/\lambda)\big)^{2}} = \frac{1}{\lambda}\,\sigma_i\,\sigma_j$$

for i ≠ j and

$$\frac{\partial^2}{\partial x_i^2}\,\min{}_\lambda(x_1,\dots,x_n) = \frac{\partial}{\partial x_i}\,\frac{\exp(-x_i/\lambda)}{\sum_{j=1}^{n}\exp(-x_j/\lambda)} = -\frac{1}{\lambda}\,\frac{\exp(-x_i/\lambda)}{\sum_{j=1}^{n}\exp(-x_j/\lambda)} + \frac{1}{\lambda}\,\frac{\exp(-2x_i/\lambda)}{\big(\sum_{j=1}^{n}\exp(-x_j/\lambda)\big)^{2}} = -\frac{1}{\lambda}\,\sigma_i + \frac{1}{\lambda}\,\sigma_i\,\sigma_i,$$

that is,

$$\nabla^2 \min{}_\lambda(x_1,\dots,x_n) = \frac{1}{\lambda}\big(\sigma\,\sigma^\top - \operatorname{diag}\sigma\big) = -\frac{1}{\lambda}\cdot\left(\begin{pmatrix}\sigma_1 & & 0\\ & \ddots & \\ 0 & & \sigma_n\end{pmatrix} - \sigma\cdot\sigma^\top\right).$$

Author Contributions

The authors have contributed equally to this article. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data is available at https://github.com/rajmadan96/SoftQuantization.git, accessed on 8 September 2023.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

DFG, German Research Foundation—Project-ID 416228727—SFB 1410.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Graf S., Mauldin R.D. A Classification of Disintegrations of Measures. Contemp. Math. 1989;94:147–158. [Google Scholar]
  • 2.Luschgy H., Pagès G. Greedy vector quantization. J. Approx. Theory. 2015;198:111–131. doi: 10.1016/j.jat.2015.05.005. [DOI] [Google Scholar]
  • 3.El Nmeir R., Luschgy H., Pagès G. New approach to greedy vector quantization. Bernoulli. 2022;28:424–452. doi: 10.3150/21-BEJ1350. [DOI] [Google Scholar]
  • 4.Graf S., Luschgy H. Foundations of Quantization for Probability Distributions. Volume 1730. Springer; Berlin, Germany: 2000. Lecture Notes in Mathematics. [DOI] [Google Scholar]
  • 5.Breuer T., Csiszár I. Measuring distribution model risk. Math. Financ. 2013;26:395–411. doi: 10.1111/mafi.12050. [DOI] [Google Scholar]
  • 6.Breuer T., Csiszár I. Systematic stress tests with entropic plausibility constraints. J. Bank. Financ. 2013;37:1552–1559. doi: 10.1016/j.jbankfin.2012.04.013. [DOI] [Google Scholar]
  • 7.Pichler A., Schlotter R. Entropy based risk measures. Eur. J. Oper. Res. 2020;285:223–236. doi: 10.1016/j.ejor.2019.01.016. [DOI] [Google Scholar]
  • 8.Jacob B., Kligys S., Chen B., Zhu M., Tang M., Howard A., Adam H., Kalenichenko D. Quantization and training of neural networks for efficient integer-arithmetic-only inference; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Salt Lake City, UT, USA. 18–23 June 2018. [Google Scholar]
  • 9.Zhuang B., Liu L., Tan M., Shen C., Reid I. Training quantized neural networks with a full-precision auxiliary module; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. [(accessed on 6 October 2023)]. pp. 1488–1497. Available online: https://openaccess.thecvf.com/content_CVPR_2020/html/Zhuang_Training_Quantized_Neural_Networks_With_a_Full-Precision_Auxiliary_Module_CVPR_2020_paper.html. [Google Scholar]
  • 10.Hubara I., Courbariaux M., Soudry D., El-Yaniv R., Bengio Y. Binarized neural networks. [(accessed on 6 October 2023)];Adv. Neural Inf. Process. Syst. 2016 29 Available online: https://proceedings.neurips.cc/paper_files/paper/2016/hash/d8330f857a17c53d217014ee776bfd50-Abstract.html. [Google Scholar]
  • 11.Polino A., Pascanu R., Alistarh D.-A. Model compression via distillation and quantization; Proceedings of the 6th International Conference on Learning Representations; Vancouver, BC, Canada. 30 April–3 May 2018; [(accessed on 6 October 2023)]. Available online: https://research-explorer.ista.ac.at/record/7812. [Google Scholar]
  • 12.Bhattacharya K. Semi-classical description of electrostatics and quantization of electric charge. Phys. Scr. 2023;98:8. doi: 10.1088/1402-4896/ace1b0. [DOI] [Google Scholar]
  • 13.Scheunders P. A genetic Lloyd-Max image quantization algorithm. Pattern Recognit. Lett. 1996;17:547–556. doi: 10.1016/0167-8655(96)00011-6. [DOI] [Google Scholar]
  • 14.Wei L.Y., Levoy M. Fast texture synthesis using tree-structured vector quantization; Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques; 2000. [(accessed on 6 October 2023)]. pp. 479–488. Available online: https://dl.acm.org/doi/abs/10.1145/344779.345009. [Google Scholar]
  • 15.Heskes T. Self-organizing maps, vector quantization, and mixture modeling. IEEE Trans. Neural Netw. 2001;12:1299–1305. doi: 10.1109/72.963766. [DOI] [PubMed] [Google Scholar]
  • 16.Pagès G., Pham H., Printems J. Handbook of Computational and Numerical Methods in Finance. Springer Science & Business Media; Berlin/Heidelberg, Germany: 2004. Optimal Quantization Methods and Applications to Numerical Problems in Finance; pp. 253–297. [DOI] [Google Scholar]
  • 17.Cuturi M. Sinkhorn distances: Lightspeed computation of optimal transport; Proceedings of the 26th International Conference on Neural Information Processing Systems; Lake Tahoe, NV, USA. 5–10 December 2013; [Google Scholar]
  • 18.Ramdas A., García Trillos N., Cuturi M. On Wasserstein two-sample testing and related families of nonparametric tests. Entropy. 2017;19:47. doi: 10.3390/e19020047. [DOI] [Google Scholar]
  • 19.Neumayer S., Steidl G. Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision. Springer; Berlin/Heidelberg, Germany: 2021. From optimal transport to discrepancy; pp. 1–36. [DOI] [Google Scholar]
  • 20.Altschuler J., Bach F., Rudi A., Niles-Weed J. Massively scalable Sinkhorn distances via the Nyström method. In: Wallach H., Larochelle H., Beygelzimer A., d’ Alché-Buc F., Fox E., Garnett R., editors. Advances in Neural Information Processing Systems. Volume 32 Curran Associates, Inc.; Red Hook, NY, USA: 2019. [Google Scholar]
  • 21.Lakshmanan R., Pichler A., Potts D. Nonequispaced Fast Fourier Transform Boost for the Sinkhorn Algorithm. Etna—Electron. Trans. Numer. Anal. 2023;58:289–315. doi: 10.1553/etna_vol58s289. [DOI] [Google Scholar]
  • 22.Ba F.A., Quellmalz M. Accelerating the Sinkhorn algorithm for sparse multi-marginal optimal transport via fast Fourier transforms. Algorithms. 2022;15:311. doi: 10.3390/a15090311. [DOI] [Google Scholar]
  • 23.Lakshmanan R., Pichler A. Fast approximation of unbalanced optimal transport and maximum mean discrepancies. arXiv. 2023 doi: 10.48550/arXiv.2306.13618.2306.13618 [DOI] [Google Scholar]
  • 24.Monge G. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris, Avec les Mémoires de Mathématique et de Physique Pour la Même Année. 1781. [(accessed on 6 October 2023)]. pp. 666–704. Available online: https://cir.nii.ac.jp/crid/1572261550791499008.
  • 25.Kantorovich L. On the translocation of masses. J. Math. Sci. 2006;133:1381–1382. doi: 10.1007/s10958-006-0049-2. [DOI] [Google Scholar]
  • 26.Villani C. Topics in Optimal Transportation. Volume 58. American Mathematical Society; Providence, RI, USA: 2003. Graduate Studies in Mathematics. [DOI] [Google Scholar]
  • 27.Rachev S.T., Rüschendorf L. Mass Transportation Problems Volume I: Theory, Volume II: Applications. Volume XXV. Springer; New York, NY, USA: 1998. Probability and Its Applications. [DOI] [Google Scholar]
  • 28.Rüschendorf L. Mathematische Statistik. Springer; Berlin/Heidelberg, Germany: 2014. [DOI] [Google Scholar]
  • 29.Ch Pflug G., Pichler A. Multistage Stochastic Optimization. Springer; Berlin/Heidelberg, Germany: 2014. (Springer Series in Operations Research and Financial Engineering). [DOI] [Google Scholar]


