Abstract
The quantization problem aims to find the best possible approximation of probability measures using finite, discrete measures. The Wasserstein distance is a typical choice to measure the quality of the approximation. This contribution investigates the properties and robustness of the entropy-regularized quantization problem, which relaxes the standard quantization problem. The proposed approximation technique naturally adopts the softmin function, which is well known for its robustness from both theoretical and practical standpoints. Moreover, we use the entropy-regularized Wasserstein distance to evaluate the quality of the soft quantization problem’s approximation, and we implement a stochastic gradient approach to achieve the optimal solutions. The control parameter in our proposed method allows for adjusting the difficulty level of the optimization problem, providing significant advantages when dealing with exceptionally challenging problems of interest. In addition, this contribution empirically illustrates the performance of the method in various settings.
Keywords: quantization, approximation of measures, entropic regularization
MSC Classification: 94A17, 81S20, 40A25
1. Introduction
Over the past few decades, extensive research has been conducted on optimal quantization techniques in order to tackle numerical problems that are related to various fields such as data science, applied disciplines, and economic models. These problems are typically centered around uncertainties or probabilities which demand robust and efficient solutions (cf. Graf and Mauldin [1], Luschgy and Pagès [2], El Nmeir et al. [3]). In general, these problems are difficult to handle, as the random components in the problem allow uncountably many outcomes. As a consequence, in order to address this difficulty the probability measures are replaced by simpler or finite measures, which can facilitate numerical computations. However, the probability measures should be ‘close’ in order to ensure that the result of the computations with approximate (discrete) measures resembles the original problem. In a nutshell, the goal is to find the best approximation of a diffuse measure using a discrete measure, which is called an optimal quantization problem. For a comprehensive discussion of the optimal quantization problem from a mathematical standpoint, refer to Graf and Luschgy [4].
On the other hand, entropy (sometimes known as information entropy) is an essential concept when dealing with uncertainties and probabilities. In mathematics, entropy is often used as a measure of information and uncertainty. It provides a quantitative measure of the randomness or disorder in a system or a random variable. Its applications span information theory, statistical analysis, probability theory, and the study of complex dynamical systems (cf. Breuer and Csiszár [5,6], Pichler and Schlotter [7]).
In order to assess the closeness of probability measures, distances are often considered; one notable instance is the Wasserstein distance. Intuitively, the Wasserstein distance measures the minimum average transportation cost required to transfer one probability measure into another. Unlike other distances and divergences, which simply compare the probabilities assigned by the distribution functions (e.g., the total variation distance and the Kullback–Leibler divergence), the Wasserstein distance incorporates the geometry of the underlying space. This yields a geometrically faithful understanding of the relationships between different probability measures.
In our research work, we focus on entropy-regularized quantization methods. More precisely, we consider an entropy-regularized version of the Wasserstein problem to quantify the quality of the approximation, and adapt the stochastic gradient approach to obtain the optimal quantizers.
The key features of our methodology include the following:
- (i) Our regularization approach stabilizes and simplifies the standard quantization problem by introducing penalty terms or constraints that discourage overly complex or overfitted models, promoting better generalization and robustness in the solutions.
- (ii) The influence of entropy is controlled by a regularization parameter, which enables us to reach the genuine optimal quantizers.
- (iii) Generally, parameter tuning comes with certain limitations. However, our method builds upon the framework of the well-established softmin function, which allows us to exercise parameter control without encountering any restrictions.
- (iv) For larger values of the regularization parameter, the optimal measure accumulates all its mass at the center of the measure.
Applications in the Context of Quantization.
Quantization techniques have undergone significant developments in recent years, particularly within the domain of deep learning and model optimization. State-of-the-art research has introduced advanced methodologies such as non-uniform quantization and quantization-aware training, enabling the efficient deployment of neural networks while preserving performance (cf. Jacob et al. [8], Zhuang et al. [9], Hubara et al. [10]). Furthermore, quantization principles have found applications beyond machine learning, such as in digital image processing, computer vision (cf. Polino et al. [11]), and electric charge quantization (cf. Bhattacharya [12]).
Related Works and Contributions.
As mentioned above, optimal quantization is a well-researched topic in the field of information theory and signal processing. Several methods have been developed for the optimal quantization problem, notably including the following:
- Lloyd-Max Algorithm: this algorithm, also known as Lloyd’s algorithm or the k-means algorithm, is a popular iterative algorithm for computing optimal vector quantizers. It iteratively adjusts the centroids of the quantization levels to minimize the quantization error (cf. Scheunders [13]).
- Tree-Structured Vector Quantization (TSVQ): TSVQ is a hierarchical quantization method that uses a tree structure to partition the input space into regions. It recursively applies vector quantization at each level of the tree until the desired number of quantization levels is achieved (cf. Wei and Levoy [14]).
- Expectation-maximization (EM) algorithm: the EM algorithm is a general-purpose optimization algorithm that can be used for optimal quantization. It is an iterative algorithm that estimates the parameters of a statistical model to maximize the likelihood of the observed data (cf. Heskes [15]).
- Stochastic Optimization Methods: stochastic optimization methods such as simulated annealing, genetic algorithms, and particle swarm optimization can be used to find optimal quantization strategies by exploring the search space and iteratively improving the quantization performance (cf. Pagès et al. [16]).
- Greedy vector quantization (GVQ): the greedy algorithm tries to solve the problem iteratively by adding one code word at every step until the desired number of code words is reached, each time selecting the code word that minimizes the error. GVQ is known to provide suboptimal quantization compared to other non-greedy methods such as the Lloyd-Max and Linde–Buzo–Gray algorithms. However, it has been shown to perform well when the data have a strong correlation structure. Notably, it utilizes the Wasserstein distance to measure the error of approximation (cf. Luschgy and Pagès [2]).
These methods provide efficient and practical solutions for finding optimal quantization schemes, and have different trade-offs between complexity and performance. The choice of method depends on the problem of interest and the requirements of the application. However, most of these methods depend on strict constraints, which makes the solutions overly complex or results in model overfitting. Our method mitigates this issue by promoting better generalizations and robustness in the solutions.
In the optimal transport community, the entropy-regularized version of the optimal transport problem (known as the entropy-regularized Wasserstein problem) was initially proposed by Cuturi [17]. This entropic version of the Wasserstein problem enables fast computations using Sinkhorn’s algorithm. As an avenue for constructive research, that study inspired a multitude of results aimed at gaining a comprehensive understanding of the subtleties involved in enhancing the computational performance of entropy-regularized optimal transport (cf. Ramdas et al. [18], Neumayer and Steidl [19], Altschuler et al. [20], Lakshmanan et al. [21], Ba and Quellmalz [22], Lakshmanan and Pichler [23]). These findings have served as a valuable foundation for further exploration in the field of optimal transport, providing insights into both the intricacies of the topic and potential avenues for improvement.
In contrast, we present a new and innovative approach that concentrates on the optimal quantization problem based on entropy and on its robust properties, which represents a distinct contribution with regard to standard entropy-regularized optimal transport problems.
One of the principal consequences of our research substantiates the convergence behavior of the quantizers towards the center of the measure. The relationship between the center of the measure and the entropy-regularized quantization problem has not been made explicit before. The following plain solution is obtained by intensifying the entropy term in the regularization of the quantization problem.
Theorem 1.
There exists a real-valued threshold for the regularization parameter such that the approximation of the entropy-regularized optimal quantization problem is provided by the Dirac measure located at the point a, for every regularization parameter exceeding this threshold, where a is the center of the measure P with respect to the distance d.
This appealing interpretation (Theorem 1) of our master problem facilitates an understanding of the transition from a complex and difficult optimization problem to a simple solution. Moreover, along with the theoretical discussion, we provide an algorithm and numerical examples which empirically demonstrate the robustness of our method. The forthcoming sections elucidate the robustness and asymptotic properties of the proposed method in detail.
Outline of the Paper.
Section 2 establishes the essential notation, definitions, and properties. Moreover, we comprehensively expound upon the significance of the smooth minimum, a pivotal component in our research. In Section 3, we introduce the entropy-regularized optimal quantization problem and delve into its inherent properties. Section 4 presents a discussion of the soft tessellation, optimal weights, and theoretical properties of parameter tuning. Furthermore, we systematically illustrate the computational process along with a pseudo-algorithm. Section 5 provides numerical examples and empirically substantiates the theoretical proofs. Finally, Section 6 summarizes the study.
2. Preliminaries
In what follows, the underlying space, equipped with the distance d, is a Polish space. We consider the σ-algebra generated by the Borel sets induced by the distance d, as well as the set of all probability measures on this space.
2.1. Distances and Divergences of Measures
The standard quantization problem employs the Wasserstein distance to measure the quality of the approximation, which was initially studied by Monge and Kantorovich (cf. Monge [24], Kantorovich [25]). One of the remarkable properties of this distance is that it metrizes the weak* topology of measures.
Definition 1
(Wasserstein distance). Let P and be probability measures on . The Wasserstein distance of order between P and is
(1) where the infimum is among all measures with marginals P and , that is,
(2)
(3) for all sets A and . The measures
on are called the marginal measures of the bivariate measure π.
Readers may refer to the excellent monographs in [26,27] for a comprehensive discussion of the Wasserstein distance.
Remark 1
(Flexibility). In the subsequent discussion, our problem of interest is to approximate the measure P, which may be a continuous, discrete, or mixed measure. The approximating measure, in contrast, is a discrete measure. The definition of the Wasserstein distance flexibly comprises all the cases, namely, continuous, semi-discrete, and discrete measures.
In contrast to the standard methodology, we investigate the quantization problem by utilizing an entropy version of the Wasserstein distance. The standard Wasserstein problem is regularized by adding the Kullback–Leibler divergence, which is known as the relative entropy.
Definition 2
(Kullback–Leibler divergence). Let P and Q be probability measures. Denote the Radon–Nikodým derivative by dQ/dP if Q is absolutely continuous with respect to P. The Kullback–Leibler divergence is
(4) where E_P (E_Q, resp.) denotes the expectation with respect to the measure P (Q, resp.).
Per Gibbs’ inequality, the Kullback–Leibler divergence satisfies D ≥ 0 (non-negativity). However, D is not a distance metric, as it does not satisfy the symmetry and triangle inequality properties.
We would like to emphasize the following distinction with respect to the Wasserstein distance (cf. Remark 1): in order for the Kullback–Leibler divergence to be finite, the support of Q must be contained in the support of P (cf. Rüschendorf [28] for the notion of the support of a measure).
If P is a continuous measure, then Q is as well. If P is a finite (discrete) measure, then the support points of P contain the support points of Q.
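To make the absolute continuity requirement concrete, the following minimal sketch evaluates the Kullback–Leibler divergence for discrete probability vectors; the function name and the convention 0 · log 0 = 0 are illustrative choices, not notation from the preceding definition.

```julia
# Discrete Kullback–Leibler divergence D(Q ‖ P); it is finite only if Q is
# absolutely continuous with respect to P, i.e., Q places no mass where P has none.
function kl_divergence(q::AbstractVector{<:Real}, p::AbstractVector{<:Real})
    s = 0.0
    for (qk, pk) in zip(q, p)
        qk == 0 && continue          # convention: 0 · log 0 = 0
        pk == 0 && return Inf        # Q is not absolutely continuous w.r.t. P
        s += qk * log(qk / pk)
    end
    return s
end

println(kl_divergence([0.5, 0.5, 0.0], [0.4, 0.4, 0.2]))   # finite (≈ 0.2231)
println(kl_divergence([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))   # Inf: mass where P has none
```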
2.2. The Smooth Minimum
In what follows, we present the smooth minimum in its general form, which includes discrete and continuous measures. The numerical computations in the following section rely on results for its discrete version. Therefore, we address the special properties of its discrete version in detail.
Definition 3
(Smooth minimum). Let and let Y be a random variable. The smooth minimum, or smooth minimum with respect to , is
(5)
(6) provided that the expectation (integral) of is finite, or if it is not finite, that . For , we set
(7) For a σ-algebra and measurable with respect to , the conditional smooth minimum is
The following lemma relates the smooth minimum with the essential infimum (cf. (7)), that is, colloquially, the ‘minimum’ of a random variable. As well, the result justifies the term smooth minimum.
Lemma 1.
For , it holds that
(8) and
(9)
Proof.
Inequality (8) follows from Jensen’s inequality as applied to the convex function .
Next, the first inequality in the second display (9) follows from and the fact that all operations in (6) are monotonic. Finally, let . Per Markov’s inequality, we have
(10) which is a variant of the Chernoff bound. From Inequality (10), it follows that
(11) When and , we have
where a is an arbitrary number with . This completes the proof. □
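The limiting behavior described in Lemma 1 can be observed numerically. The following sketch evaluates the smooth minimum of an empirical sample in its LogSumExp form (cf. Section 4.1), shifted by the sample minimum for numerical stability; the function name and the uniform empirical measure are assumptions made for this illustration.

```julia
# Smooth minimum of a sample under the empirical (uniform) measure,
# evaluated stably by shifting with the sample minimum.
using Statistics

function smoothmin(y::AbstractVector{<:Real}, λ::Real)
    m = minimum(y)
    return m - λ * log(mean(exp.(-(y .- m) ./ λ)))
end

y = randn(10_000)
for λ in (10.0, 1.0, 0.1, 0.01)
    println("λ = $λ:  smooth minimum ≈ $(round(smoothmin(y, λ), digits = 4))")
end
println("minimum ≈ $(round(minimum(y), digits = 4)),  mean ≈ $(round(mean(y), digits = 4))")
# The smooth minimum never exceeds the mean and approaches the (essential)
# minimum as λ decreases, consistent with Lemma 1.
```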
Remark 2
(Nesting property). The main properties of the smooth minimum include translation equivariance
and positive homogeneity
As a consequence of the tower property of the expectation, we have the nesting property
provided that is a sub-σ-algebra of .
2.3. Softmin Function
The smooth minimum is related to the softmin function via its derivatives. In what follows, we derive variants of these derivatives, which will be needed later.
Definition 4
(Softmin function). For and a random variable Y with a finite smooth minimum, the softmin function is the random variable
(12) where the latter equality is obvious based on the definition of the smooth minimum in (6). The function is called the Gibbs density.
The Derivative with respect to the Probability Measure
The definition of the smooth minimum in (6) does not require the measure to be a probability measure. Based on the first-order expansion log(1 + x) ≈ x (at x = 0) of the natural logarithm, the directional derivative of the smooth minimum in the direction of the measure Q is
(13)
(14)
(15)
(16)
Note that is (up to the constant ) a Radon–Nikodým density in (16). Thus, the Gibbs density is proportional to the directional derivative of the smooth minimum with respect to the underlying measure .
The Derivative with respect to the Random Variable
In what follows, we additionally require the derivative of the smooth minimum with respect to its argument. Following similar reasoning as above, this is accomplished by
(17)
(18)
(19)
(20)
(21)
which involves the softmin function as well.
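The role of the softmin function as the derivative of the smooth minimum can be verified numerically. The sketch below is restricted to the empirical (uniform) measure, with the Gibbs weights normalized to sum to one and with function names chosen for this illustration; it compares the softmin weight of one component with a central finite-difference derivative of the smooth minimum.

```julia
# Softmin (Gibbs) weights of a sample and a finite-difference check that they
# form the gradient of the smooth minimum with respect to its argument.
using Statistics

smoothmin(y, λ) = (m = minimum(y); m - λ * log(mean(exp.(-(y .- m) ./ λ))))

function softmin_weights(y::AbstractVector{<:Real}, λ::Real)
    w = exp.(-(y .- minimum(y)) ./ λ)    # stable: the largest weight equals one
    return w ./ sum(w)                   # normalized to sum to one
end

y, λ, h = randn(5), 0.3, 1e-6
σ = softmin_weights(y, λ)
e1 = [1.0; zeros(length(y) - 1)]         # perturbation of the first component
fd = (smoothmin(y .+ h .* e1, λ) - smoothmin(y .- h .* e1, λ)) / (2h)
println("softmin weight of component 1:  ", σ[1])
println("finite-difference derivative:   ", fd)   # the two values agree
```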
3. Regularized Quantization
This section introduces the entropy-regularized optimal quantization problem along with its properties; we first recall the standard optimal quantization problem.
The standard quantization measures the quality of the approximation using the Wasserstein distance and considers the following problem (cf. Graf and Luschgy [4]):
(22)
where
(23)
is the set of measures supported by not more than m points.
Soft quantization, or quantization regularized with the Kullback–Leibler divergence, involves the regularized Wasserstein distance instead of (22). The soft quantization problem reads
(24)
where the regularization parameter is strictly positive. The optimal measure solving (24) depends on this regularization parameter.
In the following discussion, we initially investigate the regularized approximation, which again demonstrates the existence of an optimal approximation.
3.1. Approximation with Inflexible Marginal Measures
The following proposition addresses the optimal approximation problem after being regularized with the Kullback–Leibler divergence and fixed marginals. To this end, we dissect the infimum in the soft quantization problem (24) as follows:
(25)
where the marginals P and are fixed in the inner infimum.
The following Proposition 1 addresses this problem with a fixed bivariate distribution, which is the inner infimum in (25). Then, Proposition 2 reveals that the optimal marginals coincide in this case.
Proposition 1.
Let P be a probability measure and let . The inner optimization problem in (25) relative to the fixed bivariate distribution is provided by the explicit formula
(26)
(27) where is the Kullback–Leibler divergence. Further, the infimum in (26) is attained.
Remark 3.
The notation in (27) ((29) below, resp.) is chosen to reflect the explicit expression in (26), while the soft minimum is with respect to the measure , which is associated with the variable , and the expectation is with respect to P and has an associated variable ξ (that is, the variable ξ in (27) is associated with P and the variable with ).
Remark 4
(Standard quantization). The result from (27) extends
(28)
(29) which is the formula without regularization and with restriction to the marginals P and (i.e., , cf. Pflug and Pichler [29]). Note that the preceding display thereby explicitly involves the support , while (26) only involves the expectation (via the smooth minimum) with respect to the measure . In other words, (27) quantifies the quality of entropy-regularized quantization, while (29) quantifies standard quantization.
Proof of Proposition 1.
It follows from the definition of the Kullback–Leibler divergence in (4) that it is enough to consider measures which are absolutely continuous with respect to the product measure ; otherwise, the objective is not finite. Hence, there is a Radon–Nikodým density such that, with Fubini’s theorem,
In order for the marginal constraint to be satisfied (cf. (2)), we have
for every measurable set A. It follows that
We can conclude that every density of the form
(30) satisfies constraints in (2), irrespective of Z, and conversely that, via in (30), every Z defines a bivariate measure satisfying the constraints in (2). We set (with the convention that and , resp.) and consider
With these, the divergence is
For the other term in Objective (24), we have
Combining the last expressions obtained, the objective in (26) is
(31)
(32) For fixed ( is suppressed in the following two displays to abbreviate the notation), consider the function
The directional derivative in direction h of this function is
(33)
(34)
(35)
(36)
(37)
(38) Per (37) and (38), the derivative vanishes for every function h if . As is arbitrary, the general minimum is attained for . With this, the first expression in (31) vanishes, and we can conclude that
Finally, notice that the variable is completely arbitrary for the problem in (26) involving the Wasserstein distance and the Kullback–Leibler divergence. As outlined above, for every measure with finite divergence , there is a density Z, as considered above. From this, the assertion in Proposition 1 follows. □
Remark 5.
The preceding proposition considers probability measures π with marginal . Its first marginal distribution (trivially) is absolutely continuous with respect to P, , as .
The second marginal , however, is not specified. In order for π to be feasible in (26), its Kullback–Leibler divergence with respect to must be finite. Hence, there is a (non-negative) Radon–Nikodým density Z such that
It follows from Fubini’s theorem that
where . Thus, the second marginal is absolutely continuous with respect to , .
Proposition 1 characterizes the objective of the quantization problem. In addition, its proof implicitly reveals the marginal of the best approximation. The following lemma explicitly spells out the density of the marginal of the optimal measure with respect to .
Lemma 2
(Characterization of the best approximating measure). The best approximating marginal probability measure minimizing (26) has a density
where is the softmin function (cf. Definition 4).
Proof.
Recall from the proof of Proposition 1 that we have the density
of the optimal measure relative to . From this, we can derive
such that
is the density with respect to , that is, (i.e., ). □
3.2. Approximation with Flexible Marginal Measure
The following proposition reveals that the best approximation of a bivariate measure in terms of a product of independent measures is provided by the product of its marginals. With this, it follows that the objectives in (25) and (26) coincide for .
Proposition 2.
Let P be a measure and let π be a bivariate measure with marginal and . Then, it holds that
(39) where is an arbitrary measure.
Proof.
Define the Radon–Nikodým density and observe that the extension to is the density . It follows with (4) that
(40)
(41)
(42)
(43) which is the assertion. In case the measures are not absolutely continuous, the assertion in (40) is trivial. □
Suppose now that is a solution of the master problem (26) with some . It follows from the preceding proposition that the objective (26) improves when replacing the initial with the marginal of the optimal solution, that is, .
3.3. The Relation of Soft Quantization and Entropy
The soft quantization problem (26) involves the Kullback–Leibler divergence and not the entropy. The major advantage of the formulation presented above is that it works for discrete, continuous, or mixed measures, while entropy usually needs to be defined separately for discrete and continuous measures.
For a discrete measure with and , the Kullback–Leibler divergence (4) is
(44)
(45)
where
is the cross-entropy of the discrete measure and P, while
(46)
is the entropy of the discrete measure.
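For discrete probability vectors, the decomposition of the Kullback–Leibler divergence into cross-entropy minus entropy can be checked directly; a tiny sketch with the natural logarithm and strictly positive entries assumed (names are illustrative):

```julia
# Numerical check of D(q ‖ p) = H(q, p) − H(q) for strictly positive
# discrete probability vectors (natural logarithm throughout).
p = [0.2, 0.3, 0.5]
q = [0.1, 0.6, 0.3]

kl(q, p)    = sum(q .* log.(q ./ p))    # Kullback–Leibler divergence
cross(q, p) = -sum(q .* log.(p))        # cross-entropy H(q, p)
ent(q)      = -sum(q .* log.(q))        # entropy H(q)

println(kl(q, p))                       # ≈ 0.1933
println(cross(q, p) - ent(q))           # the same value
```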
For a measure with marginals P and , the cross-entropy is
(47)
(48)
(49)
where we have used the marginals from (2). Note that (49) does not depend on ; hence, does not depend on .
With (44), the quantization problem (26) can be rewritten equivalently as
(50)
by involving the entropy only. For this reason, we call the master problem in (26) the entropy-regularized problem.
4. Soft Tessellation
The quantization problem (25) consists of finding a good (in the best case, the optimal) approximation of a general probability measure P using a simple, discrete measure. Thus, the problem consists of finding good weights as well as good locations. Quantization employs the Wasserstein distance to measure the quality of the approximation; instead, soft quantization involves the regularized Wasserstein distance, as in (26):
where the measures on supported by not more than m points (cf. (23)) are as follows:
We separate the problems of finding the best weights and locations. The following Section 4.1 addresses the problem of finding the optimal weights ; the subsequent Section 4.2 then addresses the problem of finding the optimal locations . As well, we elaborate the numerical advantages of soft quantization below.
4.1. Optimal Weights
Proposition 1 above is formulated for the general probability measures P and . The desired measure in quantization is a simple and discrete measure. To this end, recall that, per Remark 5, measures which are feasible for (26) have marginals with . It follows that the support of the marginal is smaller than the support of , that is,
For a simple measure with , it follows in particular that . In this subsection, we consider the measure and the support to be fixed.
To unfold the result of Proposition 1 for discrete measures, recall the smooth minimum and the softmin function for the discrete (empirical or uniform) measure . For this measure, the smooth minimum (6) explicitly is
This function is occasionally referred to as the LogSumExp function. The softmin function (or Gibbs density (12)) is
It follows from Lemma 2 that the best approximating measure is , where the vector q of the optimal weights relative to is provided explicitly by
(51)
which involves computing expectations.
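Once the quantizer locations are fixed, the expectations in (51) can be estimated by plain Monte Carlo. The following sketch does so for the standard normal distribution; the squared Euclidean cost, the normalization of the softmin weights to sum to one, and all names are assumptions made for this illustration.

```julia
# Monte Carlo sketch of the optimal weights: the weight of a quantizer is the
# expected softmin weight it receives over samples ξ ~ P (cf. (51)).
using Statistics, Random

softmin_weights(c, λ) = (w = exp.(-(c .- minimum(c)) ./ λ); w ./ sum(w))

Random.seed!(1)
x = [-1.5, 0.0, 1.5]          # fixed quantizer locations
λ = 0.5
ξ = randn(100_000)            # samples from P = N(0, 1)

q = mean(softmin_weights((s .- x) .^ 2, λ) for s in ξ)
println(round.(q, digits = 3))   # the weights sum to one; the middle quantizer receives the most mass
```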
Soft Tessellation
For , the softmin function is
That is, the mapping can serve for classification, i.e., tessellation; the point is associated with if , and the corresponding region is known as a Voronoi diagram.
For , the softmin is not a strict indicator, and can instead be interpreted as probability; that is,
is the probability of allocating to the quantizer .
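The soft allocation thus interpolates between a genuinely probabilistic assignment and the hard Voronoi assignment. A small two-dimensional sketch, with centers and values made up for this illustration:

```julia
# Soft tessellation: the softmin weights give the probability of allocating a point
# to each quantizer; as the regularization decreases, the allocation hardens into
# the usual (nearest-center) Voronoi assignment.
using LinearAlgebra

centers = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]
point   = [1.2, 0.4]

cost     = [norm(point - c)^2 for c in centers]                       # squared Euclidean cost
alloc(λ) = (w = exp.(-(cost .- minimum(cost)) ./ λ); w ./ sum(w))

println(round.(alloc(5.0),  digits = 3))   # diffuse allocation over the three centers
println(round.(alloc(0.05), digits = 3))   # nearly one-hot
println(argmin(cost))                      # the hard Voronoi cell: center 2
```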
Remark 6
(K-means and quantization). K-means clustering is a widely used unsupervised machine learning algorithm that groups datapoints into clusters based on their similarity. Voronoi tessellation, on the other hand, is a geometrical concept used to partition a space into regions, each of which is associated with a specific point or seed. Notably, this is an unavoidable concept in the investigation of optimal quantizers. In the K-means context, Voronoi tessellation helps to define cluster boundaries. Each cluster’s boundary is constructed as the region within which the datapoints are closer to its cluster center than to any other (cf. Graf and Luschgy [4], Chapter I).
4.2. Optimal Locations
As a result of Proposition 1, the objective in (27) is an expectation. To identify the optimal support points , it is essential to first minimize
(52)
This is a stochastic, nonlinear, and non-convex optimization problem:
(53)
where the objective function is nonlinear and non-convex; the optimal quantization problem thus constitutes an unconstrained, stochastic, non-convex, and nonlinear optimization problem. According to the chain rule and the gradients derived in Section 2.3, the gradient of the objective is constructed from the components
(54)
that is,
(55)
where ‘·’ denotes the Hadamard (element-wise) product of the vectors built from the corresponding componentwise entries. In other words, the gradient of the LogSumExp function (53) is the softmin function, as Section 2.3 explicitly illustrates.
Algorithm 1 is a stochastic gradient algorithm used to minimize (51), which collects the elements of the optimal weights and the optimal locations provided here and in the preceding section.
Algorithm 1: Stochastic gradient algorithm to find the optimal quantizers and optimal masses.
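To make the procedure concrete, the following one-dimensional sketch loosely follows the structure of Algorithm 1: a mini-batch is drawn from P, each sample is softly allocated to the quantizers via the softmin weights, the locations are moved along the stochastic gradient with a diminishing step size, and the weights are finally estimated as in (51). The function names, the squared Euclidean cost, the initialization, and the step-size rule are choices made for this illustration and do not reproduce the authors' implementation.

```julia
# One-dimensional stochastic gradient sketch of soft quantization.
using Statistics, Random

softmin_weights(c, λ) = (w = exp.(-(c .- minimum(c)) ./ λ); w ./ sum(w))

function soft_quantize(sampler, m; λ = 0.05, iters = 20_000, batch = 32, η = 0.05)
    x = [sampler() for _ in 1:m]                       # initialize locations with samples from P
    for t in 1:iters
        g = zeros(m)
        for _ in 1:batch
            ξ = sampler()
            σ = softmin_weights((ξ .- x) .^ 2, λ)      # soft allocation of the sample ξ
            g .+= σ .* 2 .* (x .- ξ)                   # chain rule: softmin times cost gradient
        end
        x .-= (η / (batch * sqrt(t))) .* g             # diminishing step size
    end
    q = mean(softmin_weights((sampler() .- x) .^ 2, λ) for _ in 1:50_000)  # weights, cf. (51)
    perm = sortperm(x)
    return x[perm], q[perm]
end

Random.seed!(3)
x, q = soft_quantize(randn, 8)        # quantize the standard normal distribution
println(round.(x, digits = 2))
println(round.(q, digits = 3))
# Rerunning with a large regularization parameter (e.g. λ = 50) lets all
# locations collapse towards the mean, in line with Theorem 1.
```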
Example 1.
To provide an example of the gradient of the distance function in (54) ((55), resp.), the derivative of the weighted norm
is
4.3. Quantization with Large Regularization Parameters
The entropy in (46) is minimal for a Dirac measure concentrated at a single point x (where x is any point of the space); in this case the entropy vanishes, while it is strictly positive for any other measure. For larger values of the regularization parameter, the objective in (50), and as such the objective of the master problem (23), will presumably prefer a measure supported on fewer points. This is indeed the case, as stated by Theorem 1 above. We provide its proof below, after formally defining the center of the measure.
Definition 5
(Center of the measure). Let P be a probability measure on and let d be a distance on . The point is a center of the measure P with respect to the distance d if
provided that for some (i.e., any) and .
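For the squared Euclidean distance, the center of the measure coincides with the mean; a brute-force sketch that restricts the candidate centers to the sample itself (the choice of distance and all names are illustrative):

```julia
# Brute-force sketch of a center of an empirical measure: the candidate point
# minimizing the expected distance; with the squared Euclidean distance this is
# (up to sampling error) the mean.
using Statistics, Random

Random.seed!(2)
ξ = randn(2_000) .+ 3.0                 # sample from P = N(3, 1)
cost(x) = mean((ξ .- x) .^ 2)           # expected squared distance to the point x
center = ξ[argmin(cost.(ξ))]            # restrict candidate centers to the sample
println((center, mean(ξ)))              # both values are close to 3
```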
In what follows, we demonstrate that the regularized quantization problem (50) links the optimal quantization problem and the center of the measure.
Proof of Theorem 1.
According to Proposition 1, Problems (50) and (26) are equivalent. Now, assume that for all i, ; then, for , and it follows that
Thus, the minimum of the optimization problem is attained at for each , where a is the center of the measure P with respect to the distance d. It follows that is a local minimum and a stationary point satisfying the first order conditions
for the function f provided in (53). Note as well that
and as such the softmin function does not depend on at the stationary point .
Recall from (54) that
According to the product rule, the Hessian matrix is
(56) Note that the second expression is positive definite, as the Hessian of the convex function is positive definite and . Further, the Hessian of the smooth minimum (see Appendix A) is
where the matrix is
This matrix is positive definite (as ) and in Loewner order; indeed, is the covariance matrix of the multinomial distribution. It follows that the first term in (56) is , while the second is , such that (56) is positive definite for sufficiently small . Thus, the extremal point is a minimum for all . In particular, there exists such that (56) is positive definite for every , hence, the result. □
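A matrix of the form diag(σ) − σσᵀ, with σ a probability vector, is the covariance matrix of a single multinomial draw, as referred to in the proof, and is therefore positive semidefinite; a quick numerical check (notation and values are illustrative):

```julia
# diag(σ) − σσᵀ is the covariance matrix of a single multinomial draw with
# probabilities σ, hence positive semidefinite; one eigenvalue is zero, since
# its rows sum to zero.
using LinearAlgebra

σ = [0.1, 0.2, 0.3, 0.4]                  # any probability vector, e.g. softmin weights
M = Diagonal(σ) - σ * σ'
println(round.(eigvals(Symmetric(Matrix(M))), digits = 4))   # all eigenvalues are ≥ 0
```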
5. Numerical Illustration
This section presents numerical findings for the approaches and methods discussed earlier. The Julia implementations for these methods are available online (cf. https://github.com/rajmadan96/SoftQuantization.git, accessed on 8 September 2023).
In the following experiments, we approximate the measure P with a finite discrete measure using the stochastic gradient algorithm presented in Algorithm 1.
5.1. One Dimension
First, we perform the analysis in one dimension. In this experiment, our problem of interest is to find entropy-regularized optimal quantizers for
(i.e., the normal and exponential distributions with standard parameters). To highlight the particular features of the method, we consider only a small number of quantizers.
Figure 1 illustrates the results of soft quantization of the standard normal distribution and the exponential distribution. It is apparent that when the regularization parameter is increased beyond a certain threshold (cf. Theorem 1), the quantizers converge towards the center of the measure (i.e., the mean), while for smaller values the quantizers identify the actual optimal locations with greater accuracy. Furthermore, we emphasize that our proposed method is capable of identifying the mean location regardless of the shape of the distribution, which this experiment empirically substantiates.
Figure 1.
Soft quantization of measures on the real line with a varying regularization parameter and eight quantization points. (a) Normal distribution: for large regularization, the best approximation resides at the center of the measure; for intermediate regularization, the approximation is reduced to only six points, as two of the remaining points have probability 0; for small regularization, we obtain the standard quantization. (b) Exponential distribution: the measures concentrate on one (large regularization), three, five, and eight quantization points as the regularization decreases.
For a better understanding of the distribution of the weights (probabilities) and their respective positions, the following examination involves the calculation of the cumulative distribution function. Additionally, we consider
as a problem of interest, which is a notably distinct scenario in terms of shape compared to the measures examined previously.
Figure 2 provides the results. It is evident that the number of distinct quantizers decreases as the regularization parameter increases. When the parameter reaches a specific threshold, all quantizers converge towards the center of the measure, represented by the mean (i.e., 4).
Figure 2.
Soft quantization of the Gamma distribution with a varying regularization parameter; the approximating measure simplifies as the parameter increases. (a) Approximate solution to the standard quantization problem with eight quantizers. (b) The eight quantization points collapse to seven quantization points. (c) The eight quantization points collapse to three quantization points. (d) The quantization points converge to a single point representing the center of the measure.
5.2. Two Dimensions
Next, we demonstrate the behavior of entropy-regularized optimal quantization for a range of regularization parameters in two dimensions. In the following experiment, we consider
as a problem of interest. Initially, we perform the experiment with four quantizers.
Figure 3 illustrates the findings. Figure 3a reveals a quantization pattern similar to that observed in the one-dimensional experiment. However, in Figure 3b we gain more detailed insight into the behavior of the quantizers for large regularization parameters, where they align diagonally before eventually colliding. Furthermore, the size of each point indicates the respective probability of the quantization point, which remains notably uniform across the varying regularization parameter.
Figure 3.
Two-dimensional soft quantization of the uniform distribution with a varying regularization parameter and four quantizers. (a) Uniform distribution. (b) Enlargement of (a): for larger values of the regularization parameter, the quantizers align while converging to the center of the measure.
Again, we consider a uniform distribution as the problem of interest in the subsequent experiment, this time employing sixteen quantizers for enhanced comprehension. Figure 4 encapsulates the essence of the experiment, offering an extensive visual representation. In contrast to the previous experiment, it can be observed that for intermediate regularization values the quantizers assemble at the nearest strong points (in terms of high probability) rather than converging towards the center of the measure (see Figure 4b,c). Subsequently, for larger regularization values, they move from these strong points towards the center, where they align diagonally before colliding (see Figure 4d). More concisely, for small regularization values we achieve the genuine quantization solution (see Figure 4a). As the regularization increases, the quantizers with lower probabilities converge towards those with the nearest higher probabilities. Subsequently, all quantizers converge towards the center of the measure, represented by the mean of the respective measure.
Figure 4.
Soft quantization of the uniform distribution with a varying regularization parameter; the approximating measure simplifies as the parameter increases. (a) Approximate solution to the standard quantization problem with sixteen quantizers. (b) The sixteen quantization points collapse to eight quantization points. (c) The sixteen quantization points collapse to four quantization points. (d) The quantization points converge, in an aligned way, to a single point representing the center of the measure.
Thus far, we have conducted two-dimensional experiments employing different numbers of quantizers (four and sixteen) with the uniform distribution. These experiments can be categorized under the k-means approach (see Remark 6). Next, we delve into the complexity of a multivariate normal distribution, with the aim of enhancing comprehension. More precisely, our problem of interest is to find a soft quantization for
where
In this endeavor, we employ a larger number of quantizers. Figure 5 captures the core essence of the experiment, delivering a comprehensive and visually illustrative representation. From the experiment, it is evident that the initial diagonal alignment precedes convergence toward the center of the measure as the regularization parameter increases. Additionally, a noticeable shift can be observed on the part of the points with lower probabilities towards those with higher probabilities. This experiment highlights that the threshold of the regularization parameter for achieving convergence or diagonal alignment at the center of the measure depends on the number of quantizers employed.
Figure 5.
Two-dimensional soft quantization of the normal distribution with a varying regularization parameter. (a) The solution to the standard quantization problem; (b) and (c) show increasing values of the regularization parameter.
6. Summary
In this study, we have enhanced the stability and simplicity of the standard quantization problem by introducing a novel method of quantization using entropy. Propositions 1 and 2 thoroughly elucidate the intricacies of the master problem (25). Our substantiation of the convergence of quantizers to the center of the measure explains the transition from a complex hard optimization problem to a simplified configuration (see Theorem 1). More concisely, this transition underscores the fundamental shift towards a more tractable and straightforward computational framework, marking a significant advancement in terms of the overall approach. Moreover, in Section 5, we provide numerical illustrations of our method that confirm its robustness, stability, and properties, as discussed in our theoretical results. These numerical demonstrations serve as empirical evidence reinforcing the efficacy of our proposed approach.
Appendix A. Hessian of the Softmin
The empirical measure is a probability measure. From Jensen’s inequality, it follows that . Thus, the smooth minimum involves a cumulant generating function, for which we derive
(A1)
(A2)
where is the j-th cumulant with respect to the empirical measure. Specifically,
where is the ‘sample mean’ and the ‘sample variance’. The following cumulants (, etc.) are more involved. The Taylor series expansion and
Note as well that the softmin function is the gradient of the smooth minimum:
The softmin function is frequently used in classification in a maximum likelihood framework. It holds that
for and
that is,
Author Contributions
The authors have contributed equally to this article. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement
Data is available at https://github.com/rajmadan96/SoftQuantization.git, accessed on 8 September 2023.
Conflicts of Interest
The authors declare no conflict of interest.
Funding Statement
DFG, German Research Foundation—Project-ID 416228727—SFB 1410.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
1. Graf, S.; Mauldin, R.D. A Classification of Disintegrations of Measures. Contemp. Math. 1989, 94, 147–158.
2. Luschgy, H.; Pagès, G. Greedy vector quantization. J. Approx. Theory 2015, 198, 111–131. doi:10.1016/j.jat.2015.05.005.
3. El Nmeir, R.; Luschgy, H.; Pagès, G. New approach to greedy vector quantization. Bernoulli 2022, 28, 424–452. doi:10.3150/21-BEJ1350.
4. Graf, S.; Luschgy, H. Foundations of Quantization for Probability Distributions; Lecture Notes in Mathematics, Volume 1730; Springer: Berlin, Germany, 2000.
5. Breuer, T.; Csiszár, I. Measuring distribution model risk. Math. Financ. 2013, 26, 395–411. doi:10.1111/mafi.12050.
6. Breuer, T.; Csiszár, I. Systematic stress tests with entropic plausibility constraints. J. Bank. Financ. 2013, 37, 1552–1559. doi:10.1016/j.jbankfin.2012.04.013.
7. Pichler, A.; Schlotter, R. Entropy based risk measures. Eur. J. Oper. Res. 2020, 285, 223–236. doi:10.1016/j.ejor.2019.01.016.
8. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
9. Zhuang, B.; Liu, L.; Tan, M.; Shen, C.; Reid, I. Training quantized neural networks with a full-precision auxiliary module. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 1488–1497. Available online: https://openaccess.thecvf.com/content_CVPR_2020/html/Zhuang_Training_Quantized_Neural_Networks_With_a_Full-Precision_Auxiliary_Module_CVPR_2020_paper.html (accessed on 6 October 2023).
10. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://proceedings.neurips.cc/paper_files/paper/2016/hash/d8330f857a17c53d217014ee776bfd50-Abstract.html (accessed on 6 October 2023).
11. Polino, A.; Pascanu, R.; Alistarh, D.-A. Model compression via distillation and quantization. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. Available online: https://research-explorer.ista.ac.at/record/7812 (accessed on 6 October 2023).
12. Bhattacharya, K. Semi-classical description of electrostatics and quantization of electric charge. Phys. Scr. 2023, 98, 8. doi:10.1088/1402-4896/ace1b0.
13. Scheunders, P. A genetic Lloyd-Max image quantization algorithm. Pattern Recognit. Lett. 1996, 17, 547–556. doi:10.1016/0167-8655(96)00011-6.
14. Wei, L.Y.; Levoy, M. Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 2000; pp. 479–488. Available online: https://dl.acm.org/doi/abs/10.1145/344779.345009 (accessed on 6 October 2023).
15. Heskes, T. Self-organizing maps, vector quantization, and mixture modeling. IEEE Trans. Neural Netw. 2001, 12, 1299–1305. doi:10.1109/72.963766.
16. Pagès, G.; Pham, H.; Printems, J. Optimal Quantization Methods and Applications to Numerical Problems in Finance. In Handbook of Computational and Numerical Methods in Finance; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004; pp. 253–297.
17. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013.
18. Ramdas, A.; García Trillos, N.; Cuturi, M. On Wasserstein two-sample testing and related families of nonparametric tests. Entropy 2017, 19, 47. doi:10.3390/e19020047.
19. Neumayer, S.; Steidl, G. From optimal transport to discrepancy. In Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–36.
20. Altschuler, J.; Bach, F.; Rudi, A.; Niles-Weed, J. Massively scalable Sinkhorn distances via the Nyström method. In Advances in Neural Information Processing Systems, Volume 32; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019.
21. Lakshmanan, R.; Pichler, A.; Potts, D. Nonequispaced Fast Fourier Transform Boost for the Sinkhorn Algorithm. ETNA—Electron. Trans. Numer. Anal. 2023, 58, 289–315. doi:10.1553/etna_vol58s289.
22. Ba, F.A.; Quellmalz, M. Accelerating the Sinkhorn algorithm for sparse multi-marginal optimal transport via fast Fourier transforms. Algorithms 2022, 15, 311. doi:10.3390/a15090311.
23. Lakshmanan, R.; Pichler, A. Fast approximation of unbalanced optimal transport and maximum mean discrepancies. arXiv 2023, arXiv:2306.13618. doi:10.48550/arXiv.2306.13618.
24. Monge, G. Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, 1781; pp. 666–704. Available online: https://cir.nii.ac.jp/crid/1572261550791499008 (accessed on 6 October 2023).
25. Kantorovich, L. On the translocation of masses. J. Math. Sci. 2006, 133, 1381–1382. doi:10.1007/s10958-006-0049-2.
26. Villani, C. Topics in Optimal Transportation; Graduate Studies in Mathematics, Volume 58; American Mathematical Society: Providence, RI, USA, 2003.
27. Rachev, S.T.; Rüschendorf, L. Mass Transportation Problems. Volume I: Theory, Volume II: Applications; Probability and Its Applications, Volume XXV; Springer: New York, NY, USA, 1998.
28. Rüschendorf, L. Mathematische Statistik; Springer: Berlin/Heidelberg, Germany, 2014.
29. Pflug, G.Ch.; Pichler, A. Multistage Stochastic Optimization; Springer Series in Operations Research and Financial Engineering; Springer: Berlin/Heidelberg, Germany, 2014.