Abstract
There has been a recent surge of interest in incorporating fairness aspects into classical clustering problems. Two recently introduced variants of the k-Center problem in this spirit are Colorful k-Center, introduced by Bandyapadhyay, Inamdar, Pai, and Varadarajan, and lottery models, such as the Fair Robust k-Center problem introduced by Harris, Pensyl, Srinivasan, and Trinh. To address fairness aspects, these models, compared to traditional k-Center, include additional covering constraints. Prior approximation results for these models require relaxing some of the normally hard constraints, like the number of centers to be opened or the involved covering constraints, and therefore only obtain constant-factor pseudo-approximations. In this paper, we introduce a new approach to deal with such covering constraints that leads to (true) approximations, including a 4-approximation for Colorful k-Center with constantly many colors—settling an open question raised by Bandyapadhyay, Inamdar, Pai, and Varadarajan—and a 4-approximation for Fair Robust k-Center, for which the existence of a (true) constant-factor approximation was also open. We complement our results by showing that if one allows an unbounded number of colors, then Colorful k-Center admits no approximation algorithm with finite approximation guarantee, assuming that P ≠ NP. Moreover, under the Exponential Time Hypothesis, the problem is inapproximable if the number of colors grows faster than logarithmically in the size of the ground set.
Keywords: Approximation algorithms, k-Center, Clustering, Polyhedral techniques
Introduction
Along with k-Median and k-Means, k-Center is one of the most fundamental and heavily studied clustering problems. In k-Center, we are given a finite metric space (X, d) and an integer , and the task is to find a set with minimizing the maximum distance of any point in X to its closest point in C. Equivalently, the problem can be phrased as covering X with k balls of radius as small as possible, i.e., finding the smallest radius together with a set with such that , where is the ball of radius r around c.
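For intuition, the classical farthest-point heuristic of Gonzalez, referenced in the next paragraph, already yields a 2-approximation for this basic version of the problem. The following is a minimal sketch; the function and variable names are ours.

```python
def gonzalez_k_center(points, k, d):
    """Farthest-point heuristic: repeatedly open the point farthest from the
    centers opened so far. Returns k centers whose covering radius is at most
    twice the optimum (Gonzalez [15])."""
    centers = [points[0]]                       # arbitrary first center
    dist = {p: d(p, centers[0]) for p in points}
    while len(centers) < k:
        farthest = max(points, key=lambda p: dist[p])
        centers.append(farthest)
        for p in points:                        # update distance to closest center
            dist[p] = min(dist[p], d(p, farthest))
    return centers, max(dist.values())          # centers and achieved radius

# Example on the real line, with d(a, b) = |a - b|
pts = [0.0, 1.0, 2.0, 8.0, 9.0, 15.0]
centers, radius = gonzalez_k_center(pts, 3, lambda a, b: abs(a - b))
```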
k-Center, like most clustering problems, is computationally hard; actually it is -hard to approximate to within any constant below 2 [21]. On the positive side, various 2-approximations [15, 19] have been found, and thus, its approximability is settled. Many variations of k-Center have been studied, most of which are based on generalizations along one of the following two main axes:
-
(i)
which sets of centers can be selected, and
-
(ii)
which sets of points of X need to be covered.
The most prominent variations along (i) are those where the set of centers is required to lie in some down-closed family. For example, if centers have non-negative opening costs and there is a global budget for opening centers, Knapsack Center is obtained. If the family consists of the independent sets of a matroid, the problem is known as Matroid Center. The best-known problem type linked to (ii) is Robust k-Center. Here, an integer m is given, and one only needs to cover at least m points of X with k balls of radius as small as possible. Research on k-Center variants along one or both of these axes has been very active and fruitful, see, e.g., [8, 10, 11, 20]. In particular, recent work of Chakrabarty and Negahbani [9] presents an elegant and unifying framework for designing best possible approximation algorithms for all above-mentioned variants.
All the above variants have in common that there is a single covering requirement; either all of X needs to be covered or a subset of it. Moreover, they come with different kinds of packing constraints on the centers to be opened as in Knapsack or Matroid Center. However, the desire to address fairness in clustering, which has received significant attention recently, naturally leads to multiple covering constraints. Here, existing techniques only lead to constant-factor pseudo-approximations that violate at least one constraint, like the number of centers to be opened. In this work, we present techniques for obtaining (true) approximations for two recent fairness-inspired generalizations of k-Center along axis (ii), namely
-
(i)
γ-Colorful k-Center, as introduced by Bandyapadhyay et al. [3], and
-
(ii)
Fair Robust k-Center, a lottery model introduced by Harris et al. [18].
γ-Colorful k-Center (γCkC) is a fairness-inspired k-Center model imposing covering constraints on subgroups. It is formally defined as follows.
Definition 1
(γ-Colorful k-Center (γCkC) [3]) Let γ, k ∈ Z_{≥1}, let (X, d) be a finite metric space, and let X_ℓ ⊆ X and m_ℓ ∈ Z_{≥0} for ℓ ∈ [γ]. The γ-Colorful k-Center problem (γCkC) asks to find the smallest radius r ≥ 0 together with centers C ⊆ X, |C| ≤ k, such that |B(C, r) ∩ X_ℓ| ≥ m_ℓ for all ℓ ∈ [γ].
Such a set of centers C is called a solution of radius r.1
We clarify that, unless explicitly stated otherwise, the number γ of colors in the above definition is assumed to be part of the input.
The choice of name for the problem stems from interpreting each set X_ℓ for ℓ ∈ [γ] as a color assigned to the elements of X_ℓ. In particular, an element can have multiple colors or no color. In words, the task is to open k balls of smallest possible radius such that, for each color ℓ ∈ [γ], at least m_ℓ points of color ℓ are covered. Hence, for γ = 1, we recover the Robust k-Center problem.
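To make the covering requirements concrete, the following small check (a sketch using the notation of Definition 1; names are ours) verifies whether a candidate set of centers C is a solution of radius r: each color class X_ℓ must have at least m_ℓ points within distance r of C.

```python
def is_solution(C, r, X, d, color_classes, requirements):
    """Check whether the centers C form a gamma-CkC solution of radius r.
    The bound |C| <= k is assumed checked by the caller; here we verify that,
    for each color ell, at least m_ell points of X_ell lie in B(C, r)."""
    covered = {u for u in X if any(d(u, c) <= r for c in C)}
    return all(len(covered & X_ell) >= m_ell
               for X_ell, m_ell in zip(color_classes, requirements))
```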
We briefly contrast γCkC with related fairness models. A related class of models that has received significant attention also assumes that the ground set is colored, but requires that the ratio between colors within each cluster is approximately the same as the global ratio between colors. Such variants have been considered for k-Median, k-Means, and k-Center, e.g., see [2, 4, 5, 12, 28] and references therein. γCkC differentiates itself from the above notion of fairness by not requiring a per-cluster guarantee, but a global fairness guarantee. More precisely, each color can be thought of as representing a certain group of people (demographic), and a global covering requirement is given per demographic. Also notice the difference with the well-known Robust k-Center problem, where a feasible solution might, potentially, completely ignore a certain subgroup, resulting in a heavily unfair treatment. γCkC addresses this issue.
The presence of multiple covering constraints in , imposed by the colors, hinders the use of classical k-Center clustering techniques, which, as mentioned above, have mostly been developed for packing constraints on the centers to be opened. An elegant first step was done by Bandyapadhyay et al. [3]. They exploit sparsity of a well-chosen LP (in a similar spirit as in [18]) to obtain the following pseudo-approximation for : they efficiently compute a solution of twice the optimal radius by opening at most centers. Hence, up to more centers than allowed may have to be opened. Moreover, [3] shows that in the Euclidean plane, a significantly more involved extension of this technique allows for obtaining a true -approximation for . Unfortunately, this approach is heavily problem-tailored and does not even extend to 3-dimensional Euclidean spaces. This naturally leads to the main open question raised in [3]:
Does γCkC with γ = O(1) admit an O(1)-approximation for any finite metric?
Here, we introduce a new approach that answers this question affirmatively.
Together with additional ingredients, our approach also applies to Fair Robust k-Center, which is a natural lottery model introduced by Harris et al. [18]. We introduce the following generalization thereof that can be handled with our techniques, which we name the Fair γ-Colorful k-Center problem (Fair γCkC). (The Fair Robust k-Center problem, as introduced in [18], corresponds to γ = 1.)
Definition 2
(Fair γ-Colorful k-Center problem (Fair γCkC)) Given is a γCkC instance on a finite metric space (X, d) together with a vector p ∈ [0, 1]^X. The goal is to find the smallest radius r for which there exists a distribution D over feasible solutions of radius r such that Pr_{C∼D}[u ∈ B(C, r)] ≥ p(u) for all u ∈ X.
An algorithm for this problem should return a radius r along with an efficient procedure for sampling a random feasible solution of radius r.
We note that if there exists a distribution with the desired properties for some radius r, then there exists a distribution of polynomial support with the desired properties (due to sparsity of the natural LP corresponding to the distribution, described in Sect. 3). This, in particular, implies that the corresponding decision problem is in NP.
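As an illustration of the natural LP mentioned above, the following sketch checks, for a given finite support of feasible radius-r solutions, whether some distribution over them covers every point u with probability at least p(u); the helper name and the use of scipy are our choices.

```python
import numpy as np
from scipy.optimize import linprog

def distribution_over_support(support, p, r, d):
    """support: list of center sets (each a feasible radius-r solution).
    Returns a probability vector over the support under which every point u
    is covered with probability at least p[u], or None if no such
    distribution over this particular support exists."""
    X = list(p.keys())
    n = len(support)
    # Coverage constraints: sum_{j: u in B(C_j, r)} lambda_j >= p(u),
    # written as -A lambda <= -p for linprog's A_ub form.
    A_ub = np.zeros((len(X), n))
    for j, C in enumerate(support):
        for i, u in enumerate(X):
            if min(d(u, c) for c in C) <= r:
                A_ub[i, j] = -1.0
    b_ub = -np.array([p[u] for u in X])
    A_eq, b_eq = np.ones((1, n)), [1.0]          # the lambda's sum to 1
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * n)
    return res.x if res.success else None
```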
Fair γCkC is a generalization of γCkC, where each element u ∈ X needs to be covered with a prescribed probability p(u). The Fair Robust k-Center problem, i.e., Fair γCkC with γ = 1, is indeed a fairness-inspired generalization of Robust k-Center, since Robust k-Center is obtained by setting p(u) = 0 for all u ∈ X. One example setting that nicely illustrates the additional fairness aspect of Fair γCkC compared to γCkC is when k-Center problems have to be solved repeatedly on the same metric space. The introduction of the probability requirements p allows for obtaining a distribution to draw from that needs to consider all elements of X (as prescribed by p), whereas classical Robust k-Center may well ignore a group of badly-placed elements. We refer to Harris et al. [18] for further motivation of the problem setting. They also discuss the Knapsack and Matroid Center problems under the same notion of fairness.
For Fair Robust k-Center, [18] presents a 2-pseudo-approximation that slightly violates both the number of points to be covered and the probability of covering each point. More precisely, for any constant ε > 0, only a (1 − ε)-fraction of the required number of elements is covered, and each element u is covered only with probability (1 − ε)p(u) instead of p(u). It was left open in [18] whether a true approximation may exist for Fair Robust k-Center.
Our results
Our main contribution is a method to obtain 4-approximations for variants of k-Center with unary encoded covering constraints on the points to be covered. We illustrate our technique in the context of γCkC, affirmatively resolving the open question of Bandyapadhyay et al. [3] about the existence of an O(1)-approximation for constantly many colors (without restrictions on the underlying metric space).
Theorem 1
There is a 4-approximation algorithm for γCkC running in time |X|^{O(γ)}.
In a second step, we extend and generalize our technique to Fair γCkC, which, as mentioned, is a generalization of γCkC. We show that Fair γCkC admits an O(1)-approximation that violates neither the covering nor the probabilistic constraints.
Theorem 2
There is a 4-approximation algorithm for Fair γCkC running in time |X|^{O(γ)} · poly(L), where L is the encoding length of the input.
We recall that in our definition of γCkC, the number γ of colors is part of the input. In the following, we complete our results above—which lead to efficient algorithms only for constant γ—by showing inapproximability of γCkC when γ is not bounded. This holds even on the real line (1-dimensional Euclidean space).
Theorem 3
It is NP-hard to decide whether γCkC on the real line admits a solution of radius 0. Moreover, unless the Exponential Time Hypothesis fails, for any function γ(n) with γ(n) = ω(log n), no polynomial-time algorithm can decide whether γCkC on the real line with γ = γ(|X|) colors admits a solution of radius 0.
Hence, assuming the Exponential Time Hypothesis, there is no polynomial-time approximation algorithm for γCkC if the number of colors grows faster than logarithmically in the size of the ground set. Notice that, for a logarithmic number of colors, our procedures run in quasi-polynomial time.
Finally, we extend the hardness implied by Theorem 3 to bi-criteria algorithms that are allowed to open more than k centers. An (α, β) bi-criteria algorithm for γCkC, for α, β ≥ 1, is an algorithm that returns a solution that picks at most β·k centers and whose radius is at most α·r, where r is the radius of an optimal solution with k centers. More precisely, we prove the following theorem.
Theorem 4
There exists a constant c > 0 such that it is NP-hard to decide whether γCkC on the real line admits a solution of radius 0, even if we are allowed to violate the number of open centers by a factor of c · ln|X|.
Notice that, unless P = NP, the above theorem rules out the existence of an (α, c · ln|X|) bi-criteria algorithm for γCkC for any value of α.
Note: In an independent work, Jia, Sheth, and Svensson [23] also made advances on γCkC. We briefly highlight some main differences. In particular, they gave a 3-approximation algorithm for γCkC with constantly many colors. Hence, their algorithm provides a better approximation guarantee than our 4-approximation, though with a slower running time. Moreover, contrary to [23], we also show that our techniques extend to Fair γCkC (Theorem 2) and obtain the hardness results highlighted in Theorems 3 and 4.
Outline of main technical contributions and paper organization
We introduce two main technical ingredients. The first is a method to deal with additional covering constraints in k-Center problems. We showcase this method in the context of , which leads to Theorem 1. For this, we combine polyhedral sparsity-based arguments as used by Bandyapadhyay et al. [3], which by themselves only lead to pseudo-approximations, with dynamic programming to design a round-or-cut approach. Round-or-cut approaches, first used by Carr et al. [7], leverage the ellipsoid method in a clever way. In each ellipsoid iteration they either separate the current point from a well-defined polyhedron P, or round the current point to a good solution. The rounding step may happen even if the current point is not in P. Round-or-cut methods have found applications in numerous problem settings (see, e.g., [1, 9, 16, 24–27]). The way we employ round-or-cut is inspired by a powerful round-or-cut approach of Chakrabarty and Negahbani [9] also developed in the context of k-Center. However, their approach is not applicable to k-Center problems as soon as multiple covering constraints exist, like in ; see Appendix B for more details.
Our second technical contribution first employs LP duality to transform lottery-type models, like Fair , into an auxiliary problem that corresponds to a weighted version of k-Center with covering constraints. We then show how a certain type of approximate separation over the dual is possible, by leveraging the techniques we introduced in the context of , leading to a 4-approximation.
Even though Theorem 2 is a strictly stronger statement than Theorem 1, we first prove Theorem 1 in Sect. 2, because it allows us to give a significantly cleaner presentation of some of our main technical contributions. In Sect. 3, we then focus on the additional techniques needed to deal with Fair , by reducing it to a problem that can be tackled with the techniques introduced in Sect. 2. Finally, in Sect. 4, we discuss the hardness results stated in Theorems 3 and 4.
A 4-approximation for γCkC with running time |X|^{O(γ)}
In this section, we prove Theorem 1, which implies a polynomial-time 4-approximation algorithm for γCkC with constantly many colors. We assume γ ≥ 2; notice that γ = 1 corresponds to Robust k-Center, for which a (tight) polynomial-time 2-approximation is known [8, 18]. Moreover, we assume that k > γ, since otherwise, we can simply enumerate over all subsets of X of size k, which leads to an exact algorithm with running time |X|^{O(γ)}. Thus, from now on, we have that k > γ ≥ 2.
We present a procedure that, for any r ≥ 0, returns a solution of radius 4r if a solution of radius r exists, and runs in time |X|^{O(γ)}. This implies Theorem 1 because the optimal radius is a distance between two points. Hence, we can run the procedure for all pairwise distances r between points in X (or, alternatively, perform binary search over the set of pairwise distances to speed up the algorithm) and return the best solution found. Thus, we fix r ≥ 0 in what follows. We denote by P(r) the following canonical relaxation of γCkC with radius r:
$$P(r) \;:=\; \Big\{ (x,y)\in[0,1]^X\times[0,1]^X \;:\; y(B(u,r)) \ge x_u \ \ \forall u\in X,\quad y(X)\le k,\quad x(X_\ell)\ge m_\ell \ \ \forall \ell\in[\gamma] \Big\} \tag{1}$$
Integral points of P(r) correspond to solutions of radius r, where x and y are the characteristic vectors indicating the points that are covered and the centers that are opened, respectively. We denote the integer hull of P(r) by P_I(r).
Our algorithm is based on the round-or-cut framework, first used in [7]. The main building block is a procedure that rounds a point (x, y) ∈ P(r) to a radius 4r solution under certain conditions. It will turn out that these conditions are always satisfied if (x, y) ∈ P_I(r). If they are not satisfied, then we can prove that (x, y) ∉ P_I(r) and generate in time |X|^{O(γ)} a hyperplane separating (x, y) from P_I(r). This separation step then becomes an iteration of the ellipsoid method, employed to find a point in P_I(r), and we continue with a new candidate point (x, y). Schematically, the whole process is described in Fig. 1.
Fig. 1.
An iteration of the ellipsoid method
On a high level, we realize our round-or-cut procedure as follows. First, we check whether (x, y) ∈ P(r) and return a violated constraint if this is not the case. If (x, y) ∈ P(r), we partition the metric space, based on a natural greedy heuristic introduced by Harris et al. [18]. This gives a set of centers S ⊆ X with corresponding clusters. We now exploit a technique by Bandyapadhyay et al. [3], which implies that if y(B(S, r)) ≤ k − γ, then one can leverage sparsity arguments in a simplified LP to obtain a radius 4r solution that picks centers only within S. (For brevity, for a vector y ∈ R^X and a set W ⊆ X, we write y(W) := Σ_{w∈W} y_w; moreover, B(S, r) := ∪_{s∈S} B(s, r).) We then turn to the case where y(B(S, r)) > k − γ. At this point, we show that one can efficiently check whether there exists a solution of radius 2r that opens at most γ centers outside of S. This is achieved by guessing the centers outside of S (of which there are at most γ many, as noted) and using dynamic programming to find the remaining centers in S. If no such radius 2r solution exists, we argue that any solution of radius r has at most k − γ − 1 centers in B(S, r), proving that y(B(S, r)) ≤ k − γ − 1 is an inequality separating (x, y) from P_I(r).
We now give a formal treatment of each step of this algorithm, which is schematically described in Fig. 1. Given a point (x, y) ∈ [0, 1]^X × [0, 1]^X, we first check whether (x, y) ∈ P(r), and, if not, return a violated constraint of P(r). Such a constraint separates (x, y) from P_I(r) because P_I(r) ⊆ P(r). Hence, we may assume that (x, y) ∈ P(r).
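The membership check for P(r), as reconstructed in (1), is a straightforward scan over its constraints; the sketch below returns a violated constraint (as a human-readable tag) if there is one. Function and variable names are ours.

```python
def separate_from_P(x, y, r, X, d, color_classes, requirements, k, eps=1e-9):
    """Return None if (x, y) lies in P(r) (up to eps); otherwise return a
    violated constraint of P(r), which also separates (x, y) from P_I(r)."""
    ball = lambda u: [v for v in X if d(u, v) <= r]
    for u in X:                                     # y(B(u, r)) >= x_u
        if sum(y[v] for v in ball(u)) < x[u] - eps:
            return ("coverage", u)
    if sum(y[v] for v in X) > k + eps:              # y(X) <= k
        return ("cardinality",)
    for ell, (X_ell, m_ell) in enumerate(zip(color_classes, requirements)):
        if sum(x[u] for u in X_ell) < m_ell - eps:  # x(X_ell) >= m_ell
            return ("color", ell)
    # the box constraints 0 <= x, y <= 1 are assumed checked by the caller
    return None
```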
We now use a partitioning technique by Harris et al. [18] that, given (x, y) ∈ P(r), allows for obtaining what we call an (x, y)-good partition, defined as follows.
Definition 3
((x, y)-good partition) Let . A tuple , where the family partitions X and with for , is an (x, y)-good partition if:
-
(i)
for all ,
-
(ii)
for all , and
-
(iii)
for all and for all .
The partitioning procedure of [18] was originally introduced for Robust k-Center and naturally extends to (see [3]). For completeness, we describe it in Algorithm 1. Contrary to prior procedures, we compute an (x, y)-good partition whose centers have pairwise distances of strictly more than 4r (instead of 2r as in prior work). This large separation avoids overlap of radius 2r balls around centers in S, and allows us to use dynamic programming (DP) to build a radius 2r solution with centers in S under certain conditions. However, it is also the reason why we get a 4-approximation if the DP approach cannot be applied.
Lemma 1
([3, 18]) For , Algorithm 1 computes an (x, y)-good partition in polynomial time.
For completeness, we present the proof of the above lemma.
Proof of Lemma 1
By construction, the first two properties of the definition of an (x, y)-good partition are trivially satisfied by the generated partition . We now turn to the third property. For each point , by the greedy criterion we have . Since , we also have , implying the statement.
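The following is a sketch of the greedy partitioning, consistent with the description above and the proof of Lemma 1; we assume, as our reading of the elided details, that the next center is an unclustered point of maximum x-value and that its cluster collects all still-unclustered points within distance 4r.

```python
def good_partition(x, X, d, r):
    """Greedy partitioning in the spirit of Harris et al. [18]: repeatedly
    pick the unclustered point s with largest x_s and let its cluster be all
    still-unclustered points within distance 4r of s. The chosen centers are
    pairwise more than 4r apart by construction."""
    unclustered = set(X)
    S, clusters = [], {}
    while unclustered:
        s = max(unclustered, key=lambda u: x[u])
        C_s = {u for u in unclustered if d(u, s) <= 4 * r}
        S.append(s)
        clusters[s] = C_s
        unclustered -= C_s
    return S, clusters
```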
The following theorem follows from the results in [3].
Theorem 5
([3]) Let (x, y) ∈ P(r) and let an (x, y)-good partition with center set S be given. Then, if y(B(S, r)) ≤ k − γ, a solution of radius 4r can be found in polynomial time.
For completeness, we provide in Appendix A a proof of a slightly stronger version of Theorem 5, namely Theorem 8, which we reuse later in a more general context. Theorem 8 easily follows by the same sparsity argument used in [3].
We are left with the case y(B(S, r)) > k − γ. In this case, we present a procedure that either returns a solution of radius 2r or, if it fails to do so, shows that every point (x', y') ∈ P_I(r) must fulfill y'(B(S, r)) ≤ k − γ − 1; hence, this is an inequality separating (x, y) from P_I(r).
To show the above, we assume that (x, y) ∈ P_I(r) holds and provide a procedure obtaining a solution of radius 2r. (Notice that we cannot check whether (x, y) ∈ P_I(r), and even if we knew that (x, y) ∈ P_I(r), we would still need a procedure transforming the possibly fractional point (x, y) to an actual (integral) solution.) Note that if (x, y) ∈ P_I(r), then, because y(B(S, r)) > k − γ, there must exist a solution C* of radius r with |C* ∩ B(S, r)| ≥ k − γ. In particular, we must have |C* \ B(S, r)| ≤ γ. We observe that if such a solution exists, then there must be a solution of radius 2r which has at most γ centers outside of S. This is formalized in the following lemma.
Lemma 2
Let with for all with , and . If there is a radius r solution with , then there is a radius 2r solution with .
Proof
Assume there is a solution of radius r with . Let . For each , let be the unique point in S such that ; is well defined because for every . Thus, , where .
Let . We have . Moreover, as for every , we have that . Thus, is a feasible solution of radius 2r. Finally, by construction, .
So, we have now proved that if (x, y) ∈ P_I(r) and y(B(S, r)) > k − γ, then there is a solution of radius 2r with at most γ centers outside of S. The motivation for considering solutions of radius 2r with all centers in S except for constantly many (if γ = O(1)) is that such solutions can be found efficiently via dynamic programming. This is possible because the centers in S are separated by distances strictly larger than 4r, which implies that radius 2r balls centered at points in S do not overlap. Hence, there are no interactions between such balls. This is formalized below.
Lemma 3
Let with for all with , and . If a radius 2r solution with exists, then we can find such a solution in time .
Proof
Suppose there is a solution C* of radius 2r with |C* \ S| ≤ γ. The algorithm has two components. We first guess the set Q = C* \ S. Because |Q| ≤ γ, there are at most (|X| + 1)^γ choices. Given Q, it remains to select at most k − |Q| centers within S to fulfill the color requirements. Note that, for any W ⊆ S, the number of points of color ℓ that B(W, 2r) covers on top of those already covered by B(Q, 2r) is
$$\big| (B(W,2r)\cap X_\ell) \setminus B(Q,2r) \big| \;=\; \sum_{w\in W} \big| (B(w,2r)\cap X_\ell) \setminus B(Q,2r) \big|,$$
where equality holds because centers in W are separated by distances strictly larger than 4r, and thus B(W, 2r) is the disjoint union of the sets B(w, 2r) for w ∈ W. Hence, the task of finding a set W ⊆ S with |W| ≤ k − |Q| such that Q ∪ W is a solution of radius 2r can be phrased as finding a feasible solution to the following binary program:
$$\sum_{s\in S} \big| (B(s,2r)\cap X_\ell) \setminus B(Q,2r) \big| \cdot z_s \;\ge\; m_\ell - \big| B(Q,2r)\cap X_\ell \big| \quad \forall \ell\in[\gamma], \qquad \sum_{s\in S} z_s \;\le\; k - |Q|, \qquad z\in\{0,1\}^S. \tag{2}$$
The above binary program can be easily solved through standard dynamic programming techniques in time, because the coefficients are small. For completeness, we show in Appendix A how this can be done for a slightly more general problem (see Theorem 9), which we will reuse later on.2 As the dynamic program is run for many guesses of Q, we obtain an overall running time of , as claimed.
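The following sketch puts the two components of the proof of Lemma 3 together: it enumerates the guesses Q of at most γ centers outside S and, for each guess, builds the coefficients of the binary program (2); solving (2) itself is delegated to a dynamic program as in Theorem 9, represented here by a hypothetical helper solve_covering_bp.

```python
from itertools import combinations

def radius_2r_solution_with_few_outside(X, S, d, r, k, color_classes,
                                         requirements, gamma, solve_covering_bp):
    """Try to find a radius-2r solution Q ∪ W with Q ⊆ X \\ S, |Q| <= gamma,
    and W ⊆ S, |W| <= k - |Q|, following the proof of Lemma 3.
    solve_covering_bp(cover, residual, budget) is assumed to return a set of
    centers s in S with sum_s cover[s][ell] >= residual[ell] for every color
    ell, using at most `budget` centers, or None (the DP of Theorem 9)."""
    ball = lambda c, rad: {u for u in X if d(u, c) <= rad}
    outside = [u for u in X if u not in S]
    for q in range(min(gamma, k) + 1):
        for Q in combinations(outside, q):
            covered_by_Q = set().union(*(ball(c, 2 * r) for c in Q)) if Q else set()
            residual = [max(0, m - len(covered_by_Q & X_ell))
                        for X_ell, m in zip(color_classes, requirements)]
            # coefficient of s for color ell: new points of X_ell covered by B(s, 2r)
            cover = {s: [len((ball(s, 2 * r) - covered_by_Q) & X_ell)
                         for X_ell in color_classes] for s in S}
            W = solve_covering_bp(cover, residual, k - q)
            if W is not None:
                return set(Q) | set(W)
    return None
```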
This completes the last ingredient for an iteration of our round-or-cut approach as shown in Fig. 1. In summary, assuming y(B(S, r)) > k − γ (for otherwise Theorem 5 leads to a solution of radius 4r), we use Lemma 3 to check whether there is a radius 2r solution with at most γ centers outside of S. This requires |X|^{O(γ)} time. If this is the case, we are done. If not, the contrapositive of Lemma 2 implies that every radius r solution has at most k − γ − 1 centers in B(S, r). Hence, every point (x', y') ∈ P_I(r) satisfies y'(B(S, r)) ≤ k − γ − 1. However, this constraint is violated by (x, y), and so it separates (x, y) from P_I(r). Thus, we proved that the process described in Fig. 1 is a valid round-or-cut procedure that runs in time |X|^{O(γ)}.
Corollary 1
There is an algorithm that, given a point (x, y) ∈ [0, 1]^X × [0, 1]^X, either returns a solution of radius 4r or an inequality separating (x, y) from P_I(r). The running time of the algorithm is |X|^{O(γ)}.
We can now prove the main theorem.
Proof of Theorem 1
We run the ellipsoid method on P_I(r) for each of the candidate radii r. For each r, the number of ellipsoid iterations is polynomially bounded, as the separating hyperplanes produced by the algorithm have encoding length at most O(|X|) (see Theorem 6.4.9 of [17]). To see this, note that all generated hyperplanes are either inequalities defining P(r) or inequalities of the form y(B(S, r)) ≤ k − γ − 1. For the correct guess of r, P_I(r) is non-empty and the algorithm terminates by returning a radius 4r solution. Hence, if we return the best solution among those computed for all guesses of r, we obtain a 4-approximation, and the total running time is |X|^{O(γ)}.
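The outer loop over candidate radii from the proof above can be organized as a simple scan (or a binary search) over the pairwise distances; the round-or-cut routine of Corollary 1 is represented by a hypothetical helper round_or_cut_for_radius that returns a radius-4r solution or None.

```python
def four_approximation(X, d, round_or_cut_for_radius):
    """Try candidate radii in increasing order and return the first radius r
    for which the round-or-cut procedure produces a solution of radius 4r."""
    candidate_radii = sorted({d(u, v) for u in X for v in X})
    for r in candidate_radii:
        solution = round_or_cut_for_radius(r)   # ellipsoid-based procedure
        if solution is not None:
            return r, solution
    return None
```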
The lottery model of Harris et al. [18]
Our main tool to solve the lottery model of Harris et al. [18] is a reduction to a certain type of weighted k-center problem. A key step of this reduction is to transform the problem through the use of linear duality. In Subsect. 3.1, we first present this reduction before proving in Subsect. 3.2 our algorithmic result for the above-referred version of a weighted k-center problem.
Reduction to weighted version of k-center
Let (X, d) be a Fair γCkC instance, and let F_r be the family of sets of centers satisfying the covering requirements with radius r, i.e., F_r := {C ⊆ X : |C| ≤ k and |B(C, r) ∩ X_ℓ| ≥ m_ℓ for all ℓ ∈ [γ]}.
Note that a radius r solution for Fair γCkC defines a distribution over the sets in F_r. Given r, such a distribution exists if and only if a certain exponential-size linear program, which we denote by PLP(r), is feasible; we denote its dual by DLP(r).
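A natural way to write these programs, which we use for the discussion below (the variable names λ, α, and μ are our choice), is the following: PLP(r) asks for a distribution λ over F_r that covers each point u with probability at least p(u), and DLP(r) is its LP dual.

$$\mathrm{PLP}(r):\qquad \text{find } \lambda\in\mathbb{R}_{\ge 0}^{F_r}\ \text{ such that }\ \sum_{C\in F_r}\lambda_C = 1,\qquad \sum_{\substack{C\in F_r:\\ u\in B(C,r)}}\lambda_C \ \ge\ p(u)\quad \forall u\in X.$$

$$\mathrm{DLP}(r):\qquad \max\Big\{ \sum_{u\in X} p(u)\,\alpha(u) \;-\; \mu \ :\ \alpha(B(C,r)) \ \le\ \mu \ \ \forall C\in F_r,\ \ \alpha\in\mathbb{R}_{\ge 0}^{X},\ \mu\in\mathbb{R} \Big\}.$$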
The dual problem DLP(r) can naturally be interpreted as a packing problem with packing constraints imposed by the sets in F_r. However, we will mostly be interested in approximately separating over these constraints, which will turn out to reduce to a weighted version of γCkC, as we highlight later.
Clearly, if PLP(r) is feasible, then its optimal value is 0. As mentioned in the introduction, it is also easy to see that if PLP(r) is feasible, then it has a feasible solution with polynomial support (since the number of non-trivial constraints is |X| + 1).
We will again assume that k > γ. If k ≤ γ, then for each fixed radius r, the family F_r has at most (|X| + 1)^γ members, so we can solve PLP(r) explicitly in time polynomial in |X|^γ and L, where L is the encoding length of the input. If PLP(r) is infeasible, then the radius r is too small. Otherwise, we compute a feasible extreme point solution to PLP(r), which corresponds to a distribution with support size at most |X| + 1. Hence, by applying binary search over all candidate radii, which are the pairwise distances between points in X, we can compute an optimal distribution for the smallest possible radius in time polynomial in |X|^γ and L. Thus, from now on, we assume that k > γ.
Observe that, for any r ≥ 0, DLP(r) always has a feasible solution (the zero vector) of value 0. Thus, by strong duality, PLP(r) is feasible if and only if the optimal value of DLP(r) is 0. Note that DLP(r) is scale-invariant, meaning that if (α, μ) is feasible for DLP(r), then so is (λα, λμ) for any λ ≥ 0. This implies that DLP(r) has a solution of strictly positive objective value if and only if DLP(r) is unbounded. We thus define the following polyhedron D(r), which contains all solutions of DLP(r) of objective value at least 1:
$$D(r) \;:=\; \Big\{ (\alpha,\mu)\in\mathbb{R}_{\ge 0}^X\times\mathbb{R} \;:\; \alpha(B(C,r)) \le \mu \ \ \forall C\in F_r,\quad \sum_{u\in X} p(u)\,\alpha(u) - \mu \ \ge\ 1 \Big\}.$$
As discussed, the following statement is a direct consequence of strong duality of linear programming.
Lemma 4
D(r) is empty if and only if PLP(r) is feasible.
The main lemma that allows us to obtain our result is the following. It guarantees the existence of an algorithm approximately solving a certain weighted k-center problem, where clients are weighted by α. Before proving the lemma in Subsect. 3.2, we show that it implies Theorem 2.
Lemma 5
There is an algorithm that, given a point (α, μ) ∈ R_{≥0}^X × R satisfying Σ_{u∈X} p(u)α(u) − μ ≥ 1 and a radius r, either certifies that (α, μ) ∈ D(r), or outputs a set C ∈ F_{4r} with α(B(C, 4r)) > μ. The running time of the algorithm is |X|^{O(γ)} · poly(L), where L is the encoding length of the input.
In words, Lemma 5 either certifies that (α, μ) ∈ D(r), or returns a hyperplane separating (α, μ) from D(4r). Its proof leverages techniques introduced in Sect. 2, and we present it in Subsect. 3.2. Using Lemma 5, we can now prove Theorem 2.
Proof of Theorem 2
As noted, there are polynomially many choices for the radius r, for each of which we run the ellipsoid method to check emptiness of as follows. Whenever there is a call to the separation oracle for a point , we first check whether and . If one of these constraints is violated, we return it as separating hyperplane. Otherwise, we invoke the algorithm of Lemma 5. The algorithm either returns a constraint in the inequality description of violated by , which solves the separation problem, or certifies . If, at any iteration of the ellipsoid method, the separation oracle is called for a point for which Lemma 5 certifies , then Lemma 4 implies is infeasible. Thus, there is no solution to the considered Fair instance of radius r. Hence, consider from now on that the separation oracle always returns a separating hyperplane, in which case the ellipsoid method certifies that as follows. Let be the family of all sets returned by Lemma 5 through calls to the separation oracle. Then, the following polyhedron: which clearly contains , is empty. As the encoding length of any constraint in the inequality description of is polynomially bounded in the input, the ellipsoid method runs in polynomial time (see Theorem 6.4.9 of [17]). In particular, the number of calls to the separation oracle, and thus , is polynomially bounded.
As , Lemma 4 implies that PLP(4r) is feasible. More precisely, because , the linear program obtained from DLP(4r) by replacing , which parameterizes the constraints in DLP(4r), by , has optimal value equal to 0. Hence, its dual, which corresponds to PLP(4r) where we replace by , is feasible. As this feasible linear program has polynomial size, because is polynomially bounded, we can solve it efficiently to obtain a distribution with the desired properties. Moreover, the total running time is , where L is the encoding length of the input.
Proof of Lemma 5
The desired separation algorithm requires us to find a solution for a γCkC instance with an extra covering constraint; the procedure of Sect. 2 generalizes to handle this extra constraint. We follow similar steps as in Fig. 1.
Let be a point satisfying , let , and, moreover, let
Hence, to prove Lemma 5, we need to find a procedure that either certifies or returns a set . To avoid technical complications later on due to the strict inequality in the definition of , we observe, using standard techniques, that one can efficiently compute a polynomially encoded to replace the inequality by .
Lemma 6
Let . Then one can efficiently compute an with encoding length O(L), where L is the encoding length of , such that the following holds: For any , we have if and only if .
Proof
The tuple consists of rationals
, with and . Let . Note that if , then . Thus, we set
. Moreover , and so the encoding length of is O(L).
Let be the following modified relaxation of , defined for given , and a corresponding as per Lemma 6, where the polytope is defined for a fixed radius r, as in Sect. 2 (see (1)):
Let be the integer hull of . We now state the following straightforward observation, whose proof is an immediate consequence of the definitions of the corresponding polytopes and Lemma 6.
Observation 1
Let be such that and . Then .
The following lemma is a slightly modified version of Theorem 5, which is also a direct consequence of Theorem 8 given in Appendix A.
Lemma 7
Let , let , and let be an (x, y)-good partition. If , a set can be found in polynomial time.
If , then Lemma 7 leads to a set that satisfies ; this gives a constraint separating from .
It remains to consider the case . As in Sect. 2, we can either find a set or certify that every satisfies .
Lemma 8
Let , with for all with , and . If there is a set with , then there is a set with .
The proof of the above lemma is identical to the proof of Lemma 2, and thus is omitted.
Lemma 9
Let , with for all with , and . If there exists a set with , then we can find such a set in time .
Proof
As in the proof of Lemma 3, we first guess up to centers . For each of those guesses, we consider the binary program (2) with objective function to be maximized. Again, this is a special case of the binary program presented in Theorem 9, given in Appendix A, and thus can be solved in time . For the guess , the characteristic vector is feasible for this binary program, implying that the optimal centers chosen by the binary program fulfill .
Corollary 2
Let . There is an algorithm that, given , either returns a set or returns a hyperplane separating (x, y) from . The running time of the algorithm is , where L is the encoding length of the input.
Proof
If , we return a violated constraint separating (x, y) from . Hence we assume . Since , we can use Theorem 1 to get an (x, y)-good partition . If , Lemma 7 gives a set . So, assuming , we use Lemma 9 (with ) to check whether there is with . If this is the case, we are done because . If not, the contrapositive of Lemma 8 (with ) implies that every fulfills . Hence, every point satisfies . However, this constraint is violated by (x, y), and it thus separates (x, y) from .
Proof of Lemma 5
We use the ellipsoid method to check emptiness of . Whenever the separation oracle gets called for a point , we invoke the algorithm of Corollary 2. If the algorithm returns at any point a set , then C corresponds to a constraint in the inequality description of violated by . Otherwise, the ellipsoid method certifies that , which implies by Observation 1. Note that the number of iterations of the ellipsoid method is polynomial as the separating hyperplanes used by the procedure above have encoding length , where L is the encoding length of the input (see Theorem 6.4.9 of [17]). Thus, the total running time is .
Hardness results for Colorful k-Center
We now prove our hardness results. We start in Subsect. 4.1 by showing Theorem 3, i.e., that becomes hard to approximate when the number of colors is unbounded. Then, in Subsect. 4.2, we prove Theorem 4, which shows our bi-criteria inapproximability result, i.e., there is an approximation hardness even when one is allowed to exceed the number of centers to be opened by up to a factor for some constant c.
We note that all of our hardness results apply even to real-line metrics. These are instances where the underlying metric is given by a finite set of real numbers X ⊆ R, and the distance function d is defined as d(a, b) = |a − b| for every a, b ∈ X. The task that we prove to be hard is distinguishing whether such an instance admits a solution of radius 0 or not.
We start by discussing a reduction from the well-known Set Cover problem to γCkC on the real line. More precisely, we will show that deciding whether a given Set Cover instance has a solution of size at most k is equivalent to deciding whether a certain real-line γCkC instance admits a solution of radius 0. We note that the reduction is a straightforward adaptation of the reduction appearing in [22] in the context of the Partial Set Cover problem in geometric settings. For completeness, we first define the (decision version of the) Set Cover problem.
Definition 4
Let U be a finite set, let S be a family of subsets of U, and let k ∈ Z_{≥0}. The (decision) Set Cover problem asks to decide whether there exists a subfamily of at most k sets of S whose union is U.
The following lemma, mimicking the ideas in [22], shows a simple yet very useful reduction from Set Cover to γCkC.
Lemma 10
Let (U, S, k) be a Set Cover instance. Then, in time polynomial in |U| and |S|, we can construct a real-line γCkC instance with γ = |U| colors such that (U, S, k) is a "yes" instance if and only if the γCkC instance admits a solution of radius 0. Moreover, any γCkC solution of radius 0 can be mapped efficiently to a Set Cover solution.
This reduction is independent of the parameter k, in the sense that for different values of k, the same instance is obtained with the only difference that the number k of centers one can open is different.
Proof
We construct a instance as follows. Let and . Let and . We set . Each element corresponds to a distinct color . We also set the covering requirement for each color to be . Note that none of depend on k. Clearly, the construction can be done in time polynomial in |U| and .
We now observe that the given is a “yes” instance if and only if the constructed instance admits a solution of radius 0. Indeed, if is a solution of radius 0, then the set is a feasible solution of the Set Cover instance of size . Conversely, if is a Set Cover solution of size , then is a solution of radius 0 with many centers.
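One concrete way to realize the reduction of Lemma 10, under our reading of the elided construction details, is the following sketch: one point per set, each element defines a color with covering requirement 1.

```python
def set_cover_to_colorful_k_center(universe, sets, k):
    """Build a real-line gamma-CkC instance from a Set Cover instance:
    place one point per set at its own integer coordinate; element u defines
    the color class {i : u in sets[i]} with covering requirement 1. Opening k
    centers at radius 0 amounts to choosing k sets, and all color requirements
    are met iff the chosen sets cover the universe."""
    X = list(range(len(sets)))                    # point i <-> set sets[i]
    d = lambda a, b: abs(a - b)                   # real-line metric
    color_classes = [{i for i in X if u in sets[i]} for u in universe]
    requirements = [1] * len(universe)
    return X, d, color_classes, requirements, k

# A radius-0 solution C (a set of coordinates) maps back to the set cover
# {sets[i] for i in C}.
```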
Hardness of approximation for γCkC
In this section, we prove our main hardness result, Theorem 3. For that, we reduce from the well-known Vertex Cover problem on graphs of maximum degree 3 and cast it as a γCkC problem. We first formally define the problem.
Definition 5
Let G = (V, E) be a graph of maximum degree 3 and let k ∈ Z_{≥0}. The Vertex Cover problem on such a graph asks to decide whether there exists a set C ⊆ V of size at most k such that e ∩ C ≠ ∅ for every e ∈ E.
Notice that Vertex Cover is a special case of Set Cover; hence, we can employ the reduction highlighted in Lemma 10 to obtain a γCkC problem. Reducing from a Vertex Cover problem of bounded degree, instead of starting from a general Set Cover problem, has the advantage that the cardinality of a minimum Vertex Cover in bounded-degree graphs is, up to constant factors, the same as the size of the underlying ground set, which is the edge set in the case of Vertex Cover. This relation is relevant in our reduction to derive a contradiction with the Exponential Time Hypothesis.
In order to prove Theorem 3, we will use the following hardness results for Vertex Cover on graphs of maximum degree 3.
Theorem 6
-
(i)
There is no polynomial-time algorithm for Vertex Cover on graphs of maximum degree 3, assuming that P ≠ NP.
-
(ii)
There is no algorithm for that runs in time , assuming the Exponential Time Hypothesis.
In our proof of Theorem 3, we reduce to using Lemma 10 and then derive hardness of by the hardness given by Theorem 6. Whereas this approach proves the first part of Theorem 3 in a straightforward way, it faces a technical hurdle for the second part. More precisely, note that the second part can be rephrased as follows. The existence of a function with together with a polynomial-time algorithm for on the real line with violates the Exponential Time Hypothesis. However, by reducing a general instance to through Lemma 10, we may obtain a instance that does not fulfill , which is required to apply algorithm , as algorithm only needs to work on instances in this regime. Indeed, the reduction of Lemma 10 would only allow us to use algorithm to obtain a polynomial-time algorithm for Set Cover instances with ; in particular, we would only be able to solve instances whose underlying graph satisfies . However, can easily be transformed into an algorithm working for instance by artificially inflating the vertex set V to make sure that |E| is small compared to |V|. The following lemma formalizes this quite straightforward, though slightly technical, step.
Lemma 11
Let be a function satisfying . Suppose that there exists an algorithm that solves in polynomial time any instances with . Then there is an algorithm that solves any instance in time .
Proof
Let be a Vertex Cover instance on a graph of maximum degree 3. To be able to apply to we would need . If this is satisfied, we simply apply . Hence, assume from now on . In this case we create a modified instance obtained by inflating through the addition of singleton vertices as discussed in the following. Because , there is a constant and a non-decreasing function with
-
(i)
, and
-
(ii)
.
Without loss of generality, we assume that ; for otherwise, the instance has constant size and can therefore be solved in constant time. We add
new singleton vertices to the instance to obtain a new blown-up instance that is equivalent to because the introduced singleton vertices are not incident with any edges.
Hence, the new Vertex Cover instance fulfills
| 3 |
Notice that
where the above inequalities follow by the properties of the function h, including that h is non-decreasing, and (3). Hence, algorithm is applicable to and, because and are equivalent instances, solves the original instance . Finally, the running time to construct and solve through is upper bounded by
where we used the fact that .
We highlight that the function h(n) does not need to be known or computed explicitly to perform the reduction. By our choice of N, the number of vertices in the blown-up instance is either |V(G)| or a power of two between |V(G)| and . Hence, one can simply run in parallel for each of the polynomially many options of the size of the blown-up instance and terminate as soon as the first one of these parallel computations terminates.
We are now ready to prove Theorem 3.
Proof of Theorem 3
The first part of the theorem is an immediate consequence of part (i) of Theorem 6 and Lemma 10.
For the second part, let be a function that satisfies and assume for the sake of contradiction that there is a polynomial-time algorithm for on the real line with . Then, by Lemma 10, there exists a polynomial-time algorithm for Vertex Cover instances satisfying . By Lemma 11, this implies the existence of an algorithm for solving (arbitrary) instances in time .
To obtain a contradiction with Theorem 6 (assuming the Exponential Time Hypothesis), it remains to show that this implies the existence of a subexponential-time algorithm for Vertex Cover on graphs of maximum degree 3. Given such an instance (G, k), we proceed as follows. Because G has no vertex of degree larger than 3, any vertex cover in G must have cardinality at least |E(G)|/3. Hence, if k < |E(G)|/3, we know that (G, k) is a "no" instance. Otherwise, if k ≥ |E(G)|/3, the running time of the algorithm obtained from Lemma 11 is subexponential, thus leading to the desired contradiction under the Exponential Time Hypothesis.
Hardness for bi-criteria algorithms
In this section, we extend the hardness result stated in Theorem 3 to bi-criteria algorithms. For this, we reduce from the optimization version of the Set Cover problem, which we refer to as the Minimum Cardinality Set Cover problem to distinguish it from the decision version used earlier. For completeness, we define it formally below.
Definition 6
(Minimum Cardinality Set Cover ()) Let U be a finite set and be a family of subsets of U. The Minimum Cardinality Set Cover problem asks to compute the smallest subset such that .
Minimum Cardinality Set Cover is a well-understood NP-hard problem. We are interested in its approximation hardness, which, after a long series of works, was settled by Dinur and Steurer [13]; we state their result as Theorem 7. We note that since we are not interested in optimizing the constant that appears in the main theorem of this section, any known Ω(ln n)-hardness result for the problem suffices to derive Theorem 4, proved below.
Theorem 7
([13]) For every ε > 0, it is NP-hard to approximate Minimum Cardinality Set Cover on instances with universe size n and poly(n) many sets to within a factor of (1 − ε) · ln n.
Combining Theorem 7 with Lemma 10 leads to the desired result.
Proof of Theorem 4
Suppose that, for some constant to be determined later, there exists an algorithm for on the real line that, if there exists a solution of radius 0, it finds a solution of radius 0 by opening at most many centers, where X are the points on which is defined. We now translate this algorithm to using Lemma 10. To this end, consider an instance with , where the polynomial is the one from Theorem 7. Let be the optimal value of .
For every , we use the reduction of Lemma 10 to get a real-line instance and run on it. For , the resulting instance, by Lemma 10, has a feasible solution of size at most , and thus, for this instance our algorithm will return a solution of size at most . Because , this means that the returned Set Cover has size at most , for some constant that depends on c and the hidden universal constants in the assumption. Thus, by considering all constructed instances—which only differ by their value of k—for which a solution was returned and picking the smallest such solution, we obtain a set cover of size at most . By setting the constant c appropriately (it is easy to see that this can always be done for sufficiently small c), this now contradicts Theorem 7. We conclude that it is -hard to decide whether a instance has a solution of radius 0, even if we allow solutions that open up to centers.
Conclusion
In this work, we presented a technique for obtaining true constant-factor approximation algorithms for k-Center problems with multiple covering constraints on the points to be covered. This leads to a polynomial-time 4-approximation algorithm for γ-Colorful k-Center, where γ, the number of colors, is assumed to be constant, as well as a polynomial-time 4-approximation algorithm for the more general Fair γ-Colorful k-Center problem.
We note here that our results extend to the supplier setting, where there are distinct sets of facilities and clients, and one is allowed to open k facilities in order to cover clients. For such settings, we obtain a polynomial-time 5-approximation algorithm for the Fair -Colorful k-Supplier problem. The extension of our arguments to this setting is done by using a standard technique: we first find clients C that constitute a 4-approximate solution to the corresponding Center problem and then pick a facility for each . Using the notation introduced in the description of Algorithm 1, we note that terminating Algorithm 1 once does not affect the remaining steps in our approximation algorithms. Hence we may assume that for all , which guarantees the existence of a facility in B(s, r). We also clarify that the “guessing a few centers” part of our algorithm performed in Lemma 9 can be applied directly to facilities with no issues arising.
On the negative side, we show that Colorful k-Center is inapproximable when the number of colors is assumed to be part of the input.
There are still some open questions remaining; we highlight two of them, which we find particularly natural and interesting:
-
(i)
The currently known inapproximability factor for γ-Colorful k-Center is 2 − ε, inherited from the standard k-Center problem, while (for constant γ) we give a polynomial-time 4-approximation, and, as already mentioned, in an independent work, Jia, Sheth, and Svensson [23] give a polynomial-time 3-approximation with a worse running time. It would be interesting to close this gap.
-
(ii)
-Colorful k-Center naturally generalizes to the knapsack and matroid versions of it, where the set of centers that are opened must satisfy a knapsack or a matroid constraint. Currently, our technique does not easily generalize to such settings, so new ideas might be needed to handle these problems.
Technical theorems
Theorem 8
([3]) Let (X, d) be a finite metric space, and suppose that the following polytope
$$\Big\{ (x,y)\in[0,1]^X\times[0,1]^X \;:\; y(B(u,r)) \ge x_u \ \ \forall u\in X,\quad y(X)\le k,\quad \sum_{u\in X} a_{\ell u}\, x_u \ \ge\ b_\ell \ \ \forall \ell\in[t] \Big\}$$
is not empty, where t ∈ Z_{≥1}, r ≥ 0, and a_{ℓu} ∈ R_{≥0} and b_ℓ ∈ R_{≥0} for every ℓ ∈ [t] and u ∈ X. Let (x, y) be a point in this polytope, and let (S, (C_s)_{s∈S}) be the partition obtained by running Algorithm 1 with input (x, y). Then, if y(B(S, r)) ≤ k − t, we can find in polynomial time a set S' ⊆ S with |S'| ≤ k satisfying Σ_{s∈S'} Σ_{u∈C_s} a_{ℓu} ≥ b_ℓ for all ℓ ∈ [t].
Proof
Let and be the partition obtained by running Algorithm 1 with input (x, y). It is easy to see that satisfies all three properties of an (x, y)-good partition, and so, by slightly abusing terminology, we will call it an (x, y)-good partition.3 We now assume that , since otherwise, the set of centers S is already a feasible solution, as . We claim that the simplified LP given below is feasible and has optimal value at most y(B(S, r)).
$$\min\Big\{ \sum_{s\in S} z_s \;:\; \sum_{s\in S} \Big(\sum_{u\in C_s} a_{\ell u}\Big) z_s \ \ge\ b_\ell \ \ \forall \ell\in[t],\ \ z\in[0,1]^S \Big\} \tag{4}$$
This is indeed the case because we can construct a feasible point to the above LP with objective value at most y(B(S, r)) as follows. Let for all . Because is a (x, y)-good partition, property 3 of Definition 3 implies that
(here we also use the fact that for all , as ), i.e., z is a feasible solution of the above LP, and its objective value is .
Suppose now that the hypothesis holds, i.e., . In particular, this means that . Note that if , then , which, by the greediness of Algorithm 1, further implies that for every . Such a case is trivial, as we can simply set . Thus, from now on, we assume that . By the above discussion, the optimal value of the above simplified LP is at most . We consider an optimal extreme point solution of LP (4). A standard sparsity argument implies that has at most t fractional variables. Indeed, is defined by q linearly independent and tight constraints of (4), among which at most t many are not of type or . Hence, this implies that there are at least -tight constraints of (4) of type or . This in turn implies that has at most t fractional components.
Furthermore, the number of strictly positive components of is at most k. To see this, note that if components of are equal to 1, all other entries must be 0 because is an optimal solution to (4), which has objective value no more than . Otherwise, there are at most variables that are equal to 1 and, together with at most t fractional variables, there are at most k strictly positive entries. Therefore, the set of centers has size at most k and satisfies for all , because , as for all .
For completeness, we now discuss how the dynamic programming problems appearing in our approaches can be solved in the claimed running time.
Theorem 9
Consider the following binary program:
$$\max\Big\{ c^\top z \;:\; \sum_{j=1}^{n} a_{\ell j}\, z_j \ \ge\ b_\ell \ \ \forall \ell\in[t],\quad \sum_{j=1}^{n} z_j \ \le\ k,\quad z\in\{0,1\}^n \Big\}$$
where c ∈ R_{≥0}^n, k ∈ Z_{≥0}, b_ℓ ∈ Z_{≥0}, and a_{ℓj} ∈ {0, 1, …, M} for all ℓ ∈ [t] and j ∈ [n], where M is some positive integer. Then, the above program can be solved in time polynomial in n, k, and (nM)^t.
Proof
The above binary program can be solved using standard dynamic programming techniques. More precisely, we define the following DP table. For every , for every , and , let be the maximum objective value of any vector that satisfies
-
(i)
,
-
(ii)
, and
-
(iii)
for every .
Initialization is easy to define, and each remaining table entry can be computed via a standard recurrence that distinguishes whether the next variable is set to 0 or to 1.
By observing the range of each parameter of the above table, we get that there are table entries in total. Moreover, for each entry we need to compute auxiliary quantities , and so we conclude that each entry can be computed in time . Thus, the DP can be solved in time , where we used the fact that .
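The following is a sketch of the dynamic program described above, for the binary program as reconstructed before this proof; the table indexing and the explicit recurrence are our rendering. The state tracks how many variables were set to 1 and the residual covering requirements, capped at 0.

```python
def binary_program_dp(a, b, c, k):
    """Maximize c^T z over z in {0,1}^n subject to sum_j a[ell][j]*z_j >= b[ell]
    for all ell and sum_j z_j <= k.  Table key: (number of ones used so far,
    residual requirement vector); value: best objective so far.  The number of
    states is at most (k+1) * prod_ell (b[ell] + 1)."""
    n, t = len(c), len(b)
    NEG = float("-inf")
    table = {(0, tuple(b)): 0.0}                 # before deciding any variable
    for j in range(n):
        new_table = {}
        for (ones, residual), val in table.items():
            # option 1: set z_j = 0
            key = (ones, residual)
            new_table[key] = max(new_table.get(key, NEG), val)
            # option 2: set z_j = 1 (if the cardinality budget allows)
            if ones < k:
                reduced = tuple(max(0, residual[ell] - a[ell][j]) for ell in range(t))
                key = (ones + 1, reduced)
                new_table[key] = max(new_table.get(key, NEG), val + c[j])
        table = new_table
    feasible = [val for (ones, residual), val in table.items()
                if all(x == 0 for x in residual)]
    return max(feasible) if feasible else None
```

Recovering an optimal vector z, rather than only its objective value, is standard via backpointers and does not change the asymptotic running time.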
We remark that the update time per table entry in the above proof can be reduced to O(1) amortized update time per table entry through a more careful analysis. However, the resulting slight reduction in running time from to is irrelevant for our purposes.
A limiting example for the framework of Chakrabarty and Negahbani [9]
A natural way to extend the approach of [9] is the following procedure. Given a point , we first run Algorithm 1 (with balls of radius 2 at each step) to get a partition of X, and then we use dynamic programming to decide whether it is possible to select at most k clusters of this partition so that the covering requirements for all colors are satisfied. Such a selection, if it exists, gives a 2-approximation. If there is no such selection, we want to return a hyperplane separating (x, y) from , as in [9].
However, there is an instance and a point (x, y), given below, such that neither the partition will lead to a solution nor is it possible to separate (x, y) from . Thus any such procedure needs to deal with this limitation.
In Fig. 2, we present an instance of -Colorful k-Center with in the one-dimensional Euclidean space; hence . There are two colors, red and blue; the red points are represented as red circles and the blue points as blue squares. The color covering requirements are . It is easy to see that there are no integral solutions of radius 0, hence any solution with radius 1 is optimal. We consider two different optimal solutions:
with corresponding clustering ,
with corresponding clustering .
We clarify that in the above, we slightly abuse notation; if there are multiple points in a location, we only pick one of them as a center, while in the corresponding clustering, all points in a covered location participate in the clustering. It is easy to verify that the above clusterings are indeed feasible solutions of radius 1, and thus, they are optimal solutions.
Fig. 2.
A limiting example for the Chakrabarty-Negahbani framework [9]
We now define the fractional solution , where and . Observe that we have for all .
In the above example, given the defined point (x, y) as input, Algorithm 1 may return the indicated partitioning. We stress here that there are ties, and in order to get this partitioning we resolve them adversarially. Note that there is no specified way to resolve such ties in [9], and it seems highly unclear how to design a procedure that always breaks ties in a good way, even if a good way to break them exists. Observe now that no combination of two of these resulting clusters satisfies the covering requirements, so the partitioning does not lead to a solution. However, we cannot possibly find an appropriate separating hyperplane, because (x, y) lies in the integer hull by construction (it is a convex combination of the two integral solutions above).
Funding
Open Access funding provided by ETH Zurich.
Footnotes
The version introduced in [3] requires the color classes to partition X. However, this additional condition on the input does not simplify the problem. Indeed, γCkC readily reduces to the model in [3] by introducing a new color with covering requirement 0 that comprises all uncolored elements, and by replacing each element that has q ≥ 2 colors by q elements at the same location, each having a single color.
Program (2) reduces to the one of Theorem 9 by removing any redundant constraint of the first type that has negative right-hand side.
Note that the only reason why this is a slight abuse of terminology is that we defined (x, y)-good partitions only for points in P(r). Moreover, the covering constraints, which are the only constraints in which the polytope of Theorem 8 differs from P(r), did not play any role in showing that Algorithm 1 returns an (x, y)-good partition (see the proof of Lemma 1).
This project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 817750) and the Swiss National Science Foundation grants 200021_184622 and PZ00P2_174117. This research was conducted while the second author was at ETH Zurich. A preliminary version of this work was presented at the 21st Conference on Integer Programming and Combinatorial Optimization (IPCO 2020). An independent work of Jia, Sheth, and Svensson [23], presented at the same venue, gave a 3-approximation for Colorful k-Center with constantly many colors using different techniques.
Contributor Information
Georg Anegg, Email: ganegg@ethz.ch.
Haris Angelidakis, Email: c.angelidakis@tue.nl.
Adam Kurpisz, Email: kurpisza@ethz.ch.
Rico Zenklusen, Email: ricoz@ethz.ch.
References
- 1.An HC, Singh M, Svensson O. LP-based algorithms for capacitated facility location. SIAM J. Comput. 2017;46(1):272–306. doi: 10.1137/151002320.
- 2.Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 405–413 (2019)
- 3.Bandyapadhyay, S., Inamdar, T., Pai, S., Varadarajan, K.R.: A constant approximation for Colorful k-Center. In: Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pp. 12:1–12:14 (2019)
- 4.Bera, S.K., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), pp. 4955–4966 (2019)
- 5.Bercea, I.O., Groß, M., Khuller, S., Kumar, A., Rösner, C., Schmidt, D.R., Schmidt, M.: On the cost of essentially fair clusterings. In: Proceedings of the 22nd International Conference on Approximation Algorithms for Combinatorial Optimization Problems (APPROX/RANDOM), pp. 18:1–18:22 (2019)
- 6.Cai L, Juedes DW. On the existence of subexponential parameterized algorithms. J. Comput. Syst. Sci. 2003;67(4):789–807. doi: 10.1016/S0022-0000(03)00074-6.
- 7.Carr, R.D., Fleischer, L.K., Leung, V.J., Phillips, C.A.: Strengthening integrality gaps for capacitated network design and covering problems. In: Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 106–115 (2000)
- 8.Chakrabarty, D., Goyal, P., Krishnaswamy, R.: The non-uniform k-center problem. In: Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP), pp. 67:1–67:15 (2016)
- 9.Chakrabarty D, Negahbani M. Generalized center problems with outliers. ACM Trans. Algorithms. 2019;15(3):41:1–41:14. doi: 10.1145/3338513.
- 10.Charikar, M., Khuller, S., Mount, D.M., Narasimhan, G.: Algorithms for facility location problems with outliers. In: Proceedings of the 12th Annual Symposium on Discrete Algorithms (SODA), pp. 642–651 (2001)
- 11.Chen DZ, Li J, Liang H, Wang H. Matroid and knapsack center problems. Algorithmica. 2016;75(1):27–52. doi: 10.1007/s00453-015-0010-1.
- 12.Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 5029–5037 (2017)
- 13.Dinur, I., Steurer, D.: Analytical approach to parallel repetition. In: Proceedings of the 46th Annual Symposium on the Theory of Computing (STOC), pp. 624–633 (2014)
- 14.Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: Proceedings of the 6th Annual ACM Symposium on Theory of Computing (STOC), pp. 47–63 (1974)
- 15.Gonzalez TF. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 1985;38:293–306. doi: 10.1016/0304-3975(85)90224-5.
- 16.Grandoni, F., Kalaitzis, C., Zenklusen, R.: Improved approximation for tree augmentation: Saving by rewiring. In: Proceedings of the 50th ACM Symposium on Theory of Computing (STOC), pp. 632–645 (2018)
- 17.Grötschel M, Lovász L, Schrijver A. Geometric Algorithms and Combinatorial Optimization, 2nd edn. Springer; 1993.
- 18.Harris, D.G., Pensyl, T., Srinivasan, A., Trinh, K.: A lottery model for center-type problems with outliers. ACM Transactions on Algorithms 15(3), 36:1–36:25 (2019)
- 19.Hochbaum DS, Shmoys DB. A best possible heuristic for the k-center problem. Math. Op. Res. 1985;10(2):180–184. doi: 10.1287/moor.10.2.180.
- 20.Hochbaum DS, Shmoys DB. A unified approach to approximation algorithms for bottleneck problems. J. ACM. 1986;33(3):533–550. doi: 10.1145/5925.5933.
- 21.Hsu W, Nemhauser GL. Easy and hard bottleneck location problems. Discret. Appl. Math. 1979;1(3):209–215. doi: 10.1016/0166-218X(79)90044-1.
- 22.Inamdar, T., Varadarajan, K.R.: On the partition set cover problem (2018). https://arxiv.org/abs/1809.06506
- 23.Jia, X., Sheth, K., Svensson, O.: Fair colorful k-center clustering. In: Proceedings of the 21st International Conference on Integer Programming and Combinatorial Optimization (IPCO), pp. 209–222 (2020)
- 24.Levi R, Lodi A, Sviridenko M. Approximation algorithms for the capacitated multi-item lot-sizing problem via flow-cover inequalities. Math. Op. Res. 2008;33(2):461–474. doi: 10.1287/moor.1070.0305.
- 25.Li, S.: Approximating capacitated k-median with open facilities. In: Proceedings of the 27th Annual ACM Symposium on Discrete Algorithms (SODA), pp. 786–796 (2016)
- 26.Li S. On uniform capacitated k-median beyond the natural LP relaxation. ACM Trans. Algorithms. 2017;13(2):22:1–22:18. doi: 10.1145/2983633.
- 27.Nutov, Z.: On the tree augmentation problem. In: Proceedings of the 25th Annual Symposium on Algorithms (ESA), pp. 61:1–61:14 (2017)
- 28.Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), pp. 96:1–96:14 (2018)