Abstract
We consider the approximate minimum selection problem in the presence of independent random comparison faults. This problem asks to select one of the smallest k elements in a linearly-ordered collection of n elements by only performing unreliable pairwise comparisons: whenever two elements are compared, there is a small probability that the wrong comparison outcome is observed. We design a randomized algorithm that solves this problem with a success probability of at least for and any using comparisons in expectation (if or the problem becomes trivial). Then, we prove that the expected number of comparisons needed by any algorithm that succeeds with probability at least must be whenever q is bounded away from , thus implying that the expected number of comparisons performed by our algorithm is asymptotically optimal in this range. Moreover, we show that the approximate minimum selection problem can be solved using comparisons in the worst case, which is optimal when q is bounded away from and .
Keywords: Approximate minimum selection, Unreliable comparisons, Independent errors
Introduction
In an ideal world, computational tasks are always carried out reliably, i.e., every operation performed by an algorithm behaves exactly as intended. Practical architectures, however, are error-prone and even basic operations can sometimes return the wrong results, especially when large-scale systems are involved. When dealing with these spurious results the first instinct is to try to detect and correct the errors as they manifest, so that the problems of interest can then be solved using classical (non-fault-tolerant) algorithms. An alternative approach deliberately allows errors to interfere with the execution of an algorithm, in the hope that the computed solution will still be good, at least in an approximate sense. This raises the question: is it possible to devise algorithms that cope with faults by design and return solutions that are demonstrably good?
We investigate this question by considering a generalization of the fundamental problem of finding the minimum element in a totally-ordered set: in the fault-tolerant approximate minimum selection problem ( for short) we wish to return one of the smallest k elements in a collection of size using only unreliable pairwise comparisons, i.e., comparisons in which the result can sometimes be incorrect due to errors. This allows one, for example, to find a representative in the top percentile of the input set, or to obtain a good estimate of the minimum from a set of noisy observations.
In this paper we provide both upper and lower bounds on the number of comparisons needed by any (possibly randomized) algorithm that solves with a success probability of at least . Since for we can solve by simply returning a random input element, we will focus on , where . We prove that can be solved using comparisons in expectation,1 and that this number of comparisons is asymptotically optimal when q is bounded away from . Moreover, we show that whenever we can use the same asymptotic number of comparisons also in the worst case.
Our results have applications in any setting that is subject to random comparison errors (e.g., due to communication interferences, alpha particles, charge collection, cosmic rays [4, 8], or energy-efficient architectures where the energy consumed by the computation can be substantially reduced if a small fraction of faulty results is allowed [2, 9, 10, 29]), or in which performing accurate comparisons is too resource-consuming (think, e.g., of the elements as references to remotely stored records) while approximate comparisons can be carried out much quicker. One concrete application might be selecting one of the top-tier products from a collection of items built using an imprecise manufacturing process (e.g., a high-quality cask from a distillery or a fast semiconductor from a fabrication facility). In these settings, products can be compared to one another (either by human experts or by automated tests), yet the results of the comparisons are not necessarily accurate.2
Before presenting our results in more detail, we briefly discuss the considered error model.
The Error Model
We consider independent random comparison faults, a simple and natural error model, in which there exists a true strict ordering relation among the set S of n input elements, yet algorithms are only allowed to gather information on this relation via unreliable comparisons between pairs of elements. The outcome of a comparison involving two distinct elements x and y can be either “<” or “>” to signify that x is reported as “smaller than” or “larger than” y, respectively. Most of the time the outcome of a comparison will correspond to the true relative order of the compared elements, but there is a probability upper bounded by a constant that the wrong result will be observed instead. An algorithm can compare the same pair of elements more than once and, when this happens, the outcome of each comparison is chosen independently of the previous results. In a similar way, comparisons involving different pairs of elements are also assumed to be independent.
The above error model was first considered in the 80s and 90s, when the related problems of finding the minimum, selecting the k-th smallest element, and sorting a sequence were studied [14, 30, 31]. The best solutions to these problems are due to Feige et al. [14], who provided Monte Carlo algorithms having a success probability of and requiring , , and comparisons in the worst case, respectively. Moreover, the authors also provided matching lower bounds, thus showing that the above algorithms use the asymptotically optimal number of comparisons in the worst case. In the sequel we will invoke the minimum finding algorithm of [14], which we name FindMin, as a subroutine. We therefore find it convenient to restate the following theorem from [14] using our notation:
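The error model can be simulated directly. The following is a minimal sketch, assuming that every comparison fails with exactly the same probability p (the model only requires an upper bound) and that elements are plain numbers; the name noisy_less and the value p = 0.2 are illustrative choices, not taken from the paper.

```python
import random

def noisy_less(x, y, p, rng):
    """Return the observed outcome of the comparison "x < y".

    The true outcome is flipped with probability p, independently of every
    other call (p is an assumed fixed error rate, used for illustration).
    """
    truth = x < y
    flipped = rng.random() < p
    return (not truth) if flipped else truth

rng = random.Random(0)
trials = 10_000
# Comparing the same pair repeatedly draws an independent outcome each time.
wrong = sum(1 for _ in range(trials) if not noisy_less(1, 2, 0.2, rng))
error_rate = wrong / trials  # empirical fraction of wrong outcomes
```

Repeating the comparison of the same pair yields fresh, independent outcomes, which is the property the majority strategy discussed later relies on.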
Theorem 1
([14, Theorem 3.5]) Given a set S of n elements and a parameter , algorithm FindMin performs comparisons in the worst case and returns the minimum of S with a success probability of at least .
Our Contributions
We design a randomized algorithm that solves with a success probability of at least using comparisons in expectation, where . Moreover, we show that the expected number of comparisons performed by our algorithm is asymptotically optimal when q is bounded away from by proving that any algorithm that succeeds with probability at least requires comparisons in expectation.
We also show how to additionally guarantee that the worst-case number of comparisons required by our algorithm will be . This implies that, as soon as , we can solve with a success probability of at least using comparisons, which is asymptotically optimal when q is bounded away from .
A possible way to evaluate different algorithms for is that of comparing the range of values of k that they are able to handle if we impose an (asymptotic) upper limit of T on the (possibly expected) number of comparisons that they are allowed to perform. For example, if we require the algorithms to succeed with high probability (w.h.p., i.e., for ) and pick , the natural algorithm that executes FindMin with on a randomly chosen subset of elements only works for . For the same choice of T and q our algorithm works for any , thus exhibiting a quadratic difference w.r.t. the smallest achievable values of k. When , the natural algorithm cannot provide any (non-trivial) guarantee on the rank of the returned element w.h.p., while our algorithm works for any . To summarize, our algorithm is able to handle the asymptotically optimal range of k if (i) T refers to an expected number of comparisons, or (ii) T refers to a worst-case number of comparisons and .
Our Techniques
To obtain our positive results we start by designing a reduction that transforms an instance of into an instance of with elements. This reduction shows that if it is possible to solve with a success probability of at least using T(n) comparisons, then can be solved with a success probability of at least using comparisons. This allows us to focus on solving with a success probability of at least using comparisons.
We do so using a noisy oracle that is able to detect, using a constant number of comparisons, whether an element is among the smallest elements in S with an error probability upper bounded by a (small) constant. We employ this noisy oracle in an iterative algorithm that considers one random input element at a time: whenever is considered, x is tested by querying the oracle multiple times, and it is returned only if most of the query answers report x as one of the smallest elements in S. The number of queries performed during each test increases with the iteration number and is chosen to simultaneously ensure that (i) the overall expected number of comparisons is , and (ii) the probability of (wrongly) returning an element that is too large is at most q. Using our reduction, the above algorithm can be immediately transformed into an algorithm that solves using comparisons in expectation and comparisons in the worst case.
To reduce the number of comparisons needed in the worst case, we design an algorithm for that is reminiscent of knockout-style tournaments and always performs comparisons. Thanks to our reduction, this latter algorithm improves the worst-case number of comparisons required to solve to , which is optimal for and q bounded away from .
Regarding our negative results, we obtain our lower bound of using three different strategies depending on the values of k, n, and q (where q is bounded away from ). For and , which we deem the most interesting case, we reduce to so that (an extension of) the lower bound of [14] applies. For and , we construct a set of roughly instances of n elements each such that the generic i-th input position contains one of the largest elements in at least one bad instance in . In order to return the i-th input element, an algorithm for needs to detect whether the input instance is bad w.r.t. i with a sufficiently high confidence, which requires comparisons in expectation. Finally, for , we show how to improve the success probability of any algorithm that solves from to at least with only a small blow-up in the number of comparisons, allowing us to employ one of the two lower bound strategies described above.
Other Related Works
The problem of finding the exact minimum of a collection of elements using unreliable comparisons had already received attention back in 1987 when Ravikumar et al. [32] considered the variant in which only up to f comparisons can fail and proved that comparisons are needed in the worst case. Notice that, in our setting with q bounded away from , in expectation since comparisons are necessary (as we show in Sect. 6) and each comparison fails with constant probability. In [1], Aigner considered a prefix-bounded probability of error : at any point during the execution of an algorithm, at most a p-fraction of the past comparisons could have failed. Here, the situation significantly worsens as up to comparisons might be necessary to find the minimum (and this is tight). Moreover, if the fraction of erroneous comparisons is globally bounded by , and , then Aigner also proved that no algorithm can succeed with certainty [1]. The landscape improves when we assume that errors occur independently at random: in addition to the already-cited algorithm by Feige et al. [14] (see Sect. 1.1), a recent paper by Braverman et al. [6] also considered the round complexity and the number of comparisons needed by partition and selection algorithms. The results in [6] imply that, for constant error probabilities, comparisons are needed by any algorithm that selects the minimum w.h.p.
Recently, Chen et al. [11] focused on computing the smallest k elements given r independent noisy comparisons between each pair of elements. For this problem, in a more general error model, they provide a tight algorithm that requires at most times as many samples as the best possible algorithm that achieves the same success probability.
If we turn our attention to the related problem of sorting with faults, then comparisons are needed to correctly sort n elements when up to f comparisons can return the wrong answer, and this is tight [3, 23, 26]. In the prefix-bounded model, the result in [1] on minimum selection also implies that comparisons are sufficient for sorting, while a lower bound of holds even for the easier problem of checking whether the input elements are already sorted [5]. The problem of sorting when faults are permanent (or, equivalently, when a pair of elements can only be compared once) has also been extensively studied and it exhibits connections to both the rank aggregation problem and to the minimum feedback arc set [6, 7, 16–18, 20–22, 24, 27].
Another related family of problems falls in the framework of Rényi–Ulam games: in these two-player games a responder secretly selects an object x from a known universe U and a questioner needs to identify x by asking up to n questions to the responder. The responder can lie up to f times and wins if the questioner fails to uniquely identify x. As an example, if and the questions are comparisons of the form “Is ?”, where is chosen by the questioner, then the questioner can always win by asking at most questions, which is tight [33].3 More broadly, Rényi–Ulam games and other search games have been extensively studied for a wide variety of search spaces and question types, as discussed in [31].
Other error-prone models have also been considered in the context of optimization algorithms [19] and in the design of resilient data structures [15, 25]. For more related problems on the aforementioned and other fault models, we refer the interested reader to [31] for a survey and to [12] for a monograph.
Finally, we point out that, in the fault-free case, a simple sampling strategy allows one to find one of the smallest k elements with probability at least using comparisons.
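One way to realize such a fault-free sampling strategy is sketched below. It assumes that the target success probability is 1 - q and uses a sample of ceil((n/k) ln(1/q)) elements drawn with replacement; both the function names and this concrete sample size are assumptions made for illustration. The sample misses all of the k smallest elements with probability (1 - k/n)^s <= exp(-s k / n) <= q, and its minimum is found with s - 1 reliable comparisons.

```python
import math
import random

def sample_size(n, k, q):
    """Smallest integer s with exp(-s * k / n) <= q (assumed sample size)."""
    return math.ceil((n / k) * math.log(1 / q))

def approx_min_fault_free(items, k, q, rng):
    """Return the minimum of a random sample drawn with replacement.

    Comparisons are assumed reliable here, so min() uses len(sample) - 1
    exact comparisons.
    """
    s = sample_size(len(items), k, q)
    sample = [rng.choice(items) for _ in range(s)]
    return min(sample)

rng = random.Random(0)
items = list(range(1000))  # the value of an element equals its rank
runs = 200
# Count the runs that return an element outside the k = 10 smallest ones.
misses = sum(1 for _ in range(runs)
             if approx_min_fault_free(items, 10, 0.01, rng) >= 10)
```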
Paper Organization
In Sect. 2 we give some preliminary remarks and we outline a simple strategy to reduce the error probability. Section 3 describes our reduction from to . In Sects. 4 and 5 we design two algorithms that solve using comparisons in expectation and comparisons in the worst case, respectively. Finally, Sect. 6 is devoted to proving our lower bounds.
Preliminaries
We will often draw elements from the input set into one or more (multi)sets using sampling with replacement, i.e., we allow multiple copies of an element to appear in the same multiset. We will then perform comparisons among the elements of these multisets as if they were all distinct: when two copies of the same element are compared, we break the tie using any arbitrary (but consistent) ordering among the copies.
According to our error model, each comparison fault happens independently at random with probability at most . This error probability can be reduced by repeating a comparison multiple times using a simple majority strategy. The same strategy also works in the related setting in which pairwise comparisons are no longer allowed but we have access to a noisy oracle that can be queried with an element and returns either true or false. In this setting each is associated with a correct binary answer and, when is queried with x, it returns the correct answer with probability at least (and the wrong answer with the complementary probability). The errors in ’s answers are independent. The next lemma provides a lower bound on the probability of correctness of the majority strategy:
Lemma 1
Let x and y be two distinct elements. For any error probability upper bounded by a constant there exists a constant such that the strategy that compares x and y (resp. queries with x) times and returns the majority result is correct with probability at least .
Proof
Suppose, w.l.o.g., that . Let be an indicator random variable that is 1 iff the i-th comparison (resp. query result) is correct. Since the s are independent Bernoulli random variables with parameter at least , stochastically dominates [28, Definition 17.1] a binomial random variable X with parameters and ,4 and hence . Moreover, since , we know that and hence we can use the Chernoff bound to upper bound the probability of failure of the majority strategy (see [28, Theorem 4.5 (2)]). Indeed:
which satisfies the claim once we choose .
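The effect of the majority strategy can be observed empirically. The sketch below assumes a fixed error probability p = 0.3 and compares the failure rate of a single comparison with that of a majority vote over 15 repetitions; the names majority_less and failure_rate, and both numeric choices, are illustrative and not taken from the lemma.

```python
import random

def noisy_less(x, y, p, rng):
    """Observed outcome of "x < y", flipped independently with probability p."""
    truth = x < y
    return (not truth) if rng.random() < p else truth

def majority_less(x, y, p, reps, rng):
    """Repeat the comparison reps times and return the majority outcome."""
    votes = sum(noisy_less(x, y, p, rng) for _ in range(reps))
    return 2 * votes > reps

def failure_rate(reps, p, trials, rng):
    """Empirical probability that the majority outcome for 1 < 2 is wrong."""
    failures = sum(1 for _ in range(trials)
                   if not majority_less(1, 2, p, reps, rng))
    return failures / trials

rng = random.Random(0)
single = failure_rate(1, 0.3, 5_000, rng)     # one comparison only
repeated = failure_rate(15, 0.3, 5_000, rng)  # majority of 15 comparisons
```

An odd number of repetitions avoids ties; with an even number, this sketch resolves a tie in favour of "not smaller".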
Our Reduction
In this section we reduce the problem of solving to the problem of solving .5 We will say that an element x is small if it is one of the smallest k elements of S, otherwise we say that x is large. The reduction constructs a set of size m that contains at least small elements, where the value of m will be determined later. The set is selected as follows:
Create m sets by independently sampling, with replacement, elements per set from S.
Run FindMin (Theorem 1) with failure probability on each of the sets. Let be the collection containing the returned elements, where is the element returned by the execution of FindMin on the i-th set.
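The two steps above can be sketched as follows. FindMin from [14] is not reproduced here: the sketch substitutes a simple linear scan with majority-vote comparisons, which is a stand-in and uses more comparisons than FindMin. The number of sets m, the size of each set, and the number of repetitions per comparison are passed in as parameters, since their exact values are assumptions of this sketch rather than the choices made in the reduction.

```python
import random

def noisy_less(x, y, p, rng):
    """Observed outcome of "x < y", flipped independently with probability p."""
    truth = x < y
    return (not truth) if rng.random() < p else truth

def find_min_stand_in(elems, p, reps, rng):
    """Linear scan with majority-vote comparisons (NOT the FindMin of [14])."""
    best = elems[0]
    for e in elems[1:]:
        votes = sum(noisy_less(e, best, p, rng) for _ in range(reps))
        if 2 * votes > reps:
            best = e
    return best

def reduce_instance(S, m, set_size, p, reps, rng):
    """Step 1: sample m multisets from S; step 2: keep one winner per set."""
    reduced = []
    for _ in range(m):
        sampled = [rng.choice(S) for _ in range(set_size)]  # with replacement
        reduced.append(find_min_stand_in(sampled, p, reps, rng))
    return reduced

rng = random.Random(0)
S = list(range(1000))  # the value of an element equals its rank
k = 50
reduced = reduce_instance(S, m=40, set_size=60, p=0.1, reps=15, rng=rng)
small_count = sum(1 for x in reduced if x < k)  # winners among the k smallest
```

With these illustrative parameters, most of the returned elements are among the k smallest elements of S, which is the property the reduction needs from its output set.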
Using Theorem 1, Lemma 1 and the Chernoff bound, we are able to prove the following lemma:
Lemma 2
The probability that fewer than elements in are small is at most .
Proof
Since the i-th set contains at least elements and each of them is independently small with a probability of , the probability that no element in the i-th set is small is upper bounded by:
where we used the inequality for . In other words, for every i, the event “the i-th set contains a small element” has probability at least . Moreover, since we chose , the probability that FindMin returns the correct minimum of the i-th set is at least (see Theorem 1). Clearly, if both of the previous events happen, must be a small element, and by the union bound, the complementary probability is at most .
Let be an indicator random variable that is 1 iff is a small element so that is the number of small elements in . Since the s are independently small with a probability of at least , the variable X stochastically dominates a Binomial random variable with parameters m and . As a consequence and, by using the Chernoff bound [28, Theorem 4.5 (2)], we obtain:
We are now ready to show the consequence of the above reduction:
Lemma 3
Let A be an algorithm that solves with a success probability of at least using at most comparisons in the worst case (resp. in expectation), for any choice of . For any k and any , there exists an algorithm that solves with a success probability of at least using comparisons in the worst case (resp. in expectation).
Proof
We first choose and we compute the set according to our reduction. Then we run A on with a failure probability of , and we answer with the element it returns. Notice that the first step of the reduction requires no comparisons. Moreover, since each of the executions of FindMin requires comparisons (see Theorem 1 and recall that ), the worst-case number of comparisons performed during the second step is . Overall, the total number of comparisons is as claimed. This upper bound holds in the worst case if refers to a worst-case number of comparisons and in expectation if refers to an expected number of comparisons.
We now consider the probability of success. By Lemma 2, the probability that fewer than elements in are small is at most . Since the probability that A fails to return one of the smallest elements in is at most , the claim follows by using the union bound.
It is not hard to see that, if we choose algorithm A in Lemma 3 to be FindMin, we have which, thanks to our reduction, allows us to solve with a success probability of at least using comparisons. This number of comparisons matches our lower bound of (see Theorem 5 in Sect. 6) when and q is bounded away from . Nevertheless, the major difficulty in solving lies in the case .
Solving Using the Asymptotically Optimal Expected Number of Comparisons
In this section, we will solve with a success probability of at least using comparisons in expectation. By Lemma 3, it is sufficient to devise an algorithm that solves with a success probability of at least using comparisons in expectation. We assume that since otherwise we can simply return an element selected uniformly at random from S.
In designing such an algorithm we will use a noisy oracle that can be queried with an element and provides a guess on whether x is small, i.e., among the smallest elements of S, or large (i.e., among the largest elements of S). More precisely, for , let denote the set containing the smallest elements of S, and let denote . Then, satisfies the following conditions:
reports an element x to be small with probability at least if and with probability at most if . In other words, identifies whether an element in is small or large with a failure probability of at most ;
Queries to can be repeated and errors in the answers are independent;
Each query to is implemented using a constant number of comparisons between elements in S.
Notice that provides no guarantees on the accuracy of its answers when . We will show how to build such an oracle in Sect. 4.1.
Let be the constant of Lemma 1 for . Our algorithm works in phases: in the generic i-th phase we select one element uniformly at random from S, and we perform a test on . This test consists of queries to , and it succeeds if x is reported as small by the majority of the queries (otherwise it fails). If the test on succeeds we return . Otherwise we move to the next phase. We name the above algorithm GeometricTest since the probability that a test succeeds when a large element is considered decreases geometrically w.r.t. the phase number, as we will show in Sect. 4.2.
Implementing
To describe our implementation of we can assume, without loss of generality, that (if , we can simulate each comparison by returning the majority result of comparisons, as shown by Lemma 1). The oracle answers a query for an element by comparing x with a randomly sampled element y from . If x compares smaller than y, then x is reported as small, otherwise it is reported as large. Suppose that . If x is (incorrectly) reported as large, at least one of the following two conditions must be true: (i) or (ii) the comparison between x and y returned the wrong result. The first condition is true with probability at most while the second condition is true with probability at most . Therefore the probability that x is reported as large is at most . If then, in order for x to be (incorrectly) reported as small, we must have that (i) or (ii) the comparison between x and y returned the wrong result. The first condition is true with probability at most while the second condition is true with probability at most . Overall, x is reported as small with probability at most .
Analysis of the Expected Number of Comparisons and of the Success Probability of GeometricTest
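The oracle and the phase structure of GeometricTest can be sketched together. The sketch assumes that the oracle samples y uniformly from S, that the error probability is a fixed value p, and that the test in phase i uses base_queries * i oracle queries; this linear schedule and all function names are hypothetical placeholders, not the schedule or constants used in the analysis.

```python
import random

def noisy_less(x, y, p, rng):
    """Observed outcome of "x < y", flipped independently with probability p."""
    truth = x < y
    return (not truth) if rng.random() < p else truth

def oracle_says_small(x, S, p, rng):
    """One oracle query: compare x against a random element y of S."""
    y = rng.choice(S)
    return noisy_less(x, y, p, rng)

def geometric_test(S, p, base_queries, rng):
    """Test one random element per phase until a test succeeds.

    The number of queries per test grows with the phase number
    (base_queries * phase is an assumed schedule).
    """
    phase = 1
    while True:
        x = rng.choice(S)
        queries = base_queries * phase
        votes = sum(oracle_says_small(x, S, p, rng) for _ in range(queries))
        if 2 * votes > queries:  # majority of the answers report "small"
            return x, phase
        phase += 1

rng = random.Random(0)
S = list(range(1000))  # the value of an element equals its rank
results = [geometric_test(S, 0.1, 15, rng) for _ in range(200)]
# Count the runs that return one of the largest quarter of the elements.
large_returns = sum(1 for x, _ in results if x >= 750)
```

Because later phases use more queries, an element of large rank becomes less and less likely to pass a test as the phase number grows, while elements of small rank keep passing with constant probability.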
The following lemmas respectively provide an upper bound on the expected number of comparisons and a lower bound on the success probability of GeometricTest.
Lemma 4
GeometricTest performs comparisons in expectation.
Proof
Consider a generic phase i. Assuming that the algorithm did not stop during phases , the probability that it stops during phase i is at least:
where we used the fact that a test on an element from performed during the i-th phase succeeds with probability at least , as shown by Lemma 1. Then, the number of phases executed by the algorithm is stochastically dominated by a geometric random variable X with parameter (see [28, Definition 2.8]).
Since, for some constant , at most comparisons are performed during phase i,6 we have that the overall number of comparisons performed during phases is upper bounded by . Then, the expected number of comparisons performed by GeometricTest is at most:
where we used the equality which follows from the more general identity for [13].
Lemma 5
GeometricTest solves with a success probability of at least .
Proof
If GeometricTest fails, then either it does not terminate or it returns an element in . Since it is easy to see that the algorithm terminates almost surely,7 we can focus on upper bounding the probability that the algorithm terminates at the end of a generic phase i by returning an element in . In order for this to happen we must have that (i) is large and (ii) was reported as small by at least of the queries to . The probability of (i) is at most , and by Lemma 1, the probability of (ii) given (i) is at most , implying that . We can now use the union bound over the different phases to upper bound the overall failure probability with .
Combining Lemmas 4 and 5 we can conclude that GeometricTest solves with a success probability of at least using comparisons in expectation. Lemma 3 immediately implies the following theorem:
Theorem 2
can be solved with a success probability of at least using comparisons in expectation.
We conclude this section by pointing out that, since each phase of GeometricTest has a probability of at least of returning a small element (as shown in the proof of Lemma 4), we can consider the variant obtained by running GeometricTest for up to phases (if the number of phases is exceeded we return a random element). We name this variant TruncatedGeometricTest. We can upper bound the worst-case number of comparisons performed by TruncatedGeometricTest with and lower bound the success probability with . Combining TruncatedGeometricTest with Lemma 3, we obtain an algorithm that solves with a success probability of at least using comparisons in expectation and comparisons in the worst case. This latter algorithm uses the same worst-case number of comparisons as the one that can be obtained by combining FindMin with Lemma 3, but it uses fewer comparisons in expectation (see the discussion at the end of Sect. 3 for details). In the next section we will show how to reduce the asymptotic number of comparisons needed in the worst case.
Solving Using an Almost-Optimal Number of Comparisons in the Worst Case
In this section, we solve with a success probability of at least using comparisons in the worst case. For the sake of simplicity, we assume that n is a power of two.8 We let be a parameter that will be chosen later, and we design an algorithm that requires comparisons to solve with a success probability of at least .
Our algorithm simulates a knockout tournament and works in rounds. In the beginning we construct a set containing n elements from S, where each is obtained by running FindMin with a failure probability of on a (multi)set of elements randomly sampled with replacement from S. Then, in the generic i-th round we match together pairs of elements from the set , and we add the match winners to a new set . Specifically, for each we run a match between and consisting of comparisons, where is the constant of Lemma 1. The winner of the match is the element that is reported to be smaller by the majority of the comparisons.
After the -th round we are left with a set containing a single element: this element is the winner of the tournament, i.e., it is the element returned by our algorithm. The above algorithm can be visualized as a complete binary tree of height in which the leaves are the elements in , the root is the unique element in , and each internal vertex at depth represents some element having and as its two children. See Fig. 1 for an example.
Fig. 1. An example of the complete binary tree representing an execution of our algorithm when the input sequence S contains elements. Each internal vertex is the winner of a match (consisting of comparisons) between its two children and
As in the previous section, we will say that an element is small if it is among the smallest elements of S, and large otherwise. The following lemma provides a lower bound on the success probability of our algorithm:
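The round structure of the tournament can be sketched as follows. The construction of the leaves (FindMin on sampled multisets) is omitted and the leaves are passed in directly; the number of comparisons per match in round i is assumed to be base_reps * i, a hypothetical schedule standing in for the one used by the algorithm, and the number of leaves is assumed to be a power of two.

```python
import random

def noisy_less(x, y, p, rng):
    """Observed outcome of "x < y", flipped independently with probability p."""
    truth = x < y
    return (not truth) if rng.random() < p else truth

def play_match(a, b, p, reps, rng):
    """Return the element reported smaller by the majority of reps comparisons."""
    votes = sum(noisy_less(a, b, p, rng) for _ in range(reps))
    return a if 2 * votes > reps else b

def tournament(leaves, p, base_reps, rng):
    """Knockout tournament; len(leaves) is assumed to be a power of two.

    Returns the winner and the total number of comparisons performed.
    """
    current = list(leaves)
    comparisons = 0
    round_no = 1
    while len(current) > 1:
        reps = base_reps * round_no  # assumed schedule: longer matches later
        winners = []
        for j in range(0, len(current), 2):
            winners.append(play_match(current[j], current[j + 1], p, reps, rng))
            comparisons += reps
        current = winners
        round_no += 1
    return current[0], comparisons

# Error-free run on 8 leaves: rounds cost 4*1 + 2*2 + 1*3 comparisons.
winner, used = tournament([5, 3, 8, 1, 9, 2, 7, 6], 0.0, 1, random.Random(0))
```

The number of matches halves in every round, so the total cost stays linear in the number of leaves even though individual matches get longer.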
Lemma 6
Consider a tournament among n elements, where n is a power of 2. The probability that the winner of the tournament is a small element is at least .
Proof
We prove by induction on that , .
We start by considering the base case . Each element is obtained by running FindMin on a (multi)set of elements sampled, with replacement, from S. In order for to be large, at least one of the following conditions must be true: (i) the execution of FindMin on fails, which happens with probability at most , or (ii) all elements in are large. Since each element in is independently large with probability at most , the probability of (ii) is at most . Using the union bound and , we have that is large with probability at most .
We now consider and we show that if the induction claim holds for then it must also hold for i. Since , we know that each element is the winner of a match between the elements and in , each of which is large with probability at most by the induction hypothesis. Moreover, each is chosen as a function of a collection of elements from along with all the outcomes of their pairwise comparisons performed during phases . Since and are disjoint subsets of , and the elements in are chosen using independent executions of FindMin (on independently chosen subsets of S), we have that the events “ is large” and “ is large” are also independent. For to be large either (i) and are both large, which happens with probability at most , or (ii) exactly one of and is large and it wins the match in phase i. The probability that exactly one of and is large can be upper bounded by the probability that at least one of and is large, which is at most by the induction hypothesis. We hence focus on the probability that, in a match between a large and a small element, the large element wins. Since and are compared times during the match, Lemma 1 ensures that this probability is at most . Putting it all together, we have:
This completes the proof by induction and shows that the winner of the tournament (i.e., the sole element in ) is large with probability at most (and small with probability at least ).
We now analyze the number of comparisons performed by our algorithm.
Lemma 7
Simulating the tournament requires comparisons in the worst case.
Proof
The initial selection of the elements in requires comparisons (recall that we choose ). The tournament itself consists of rounds. The number of matches that take place in round i is and, for each match, comparisons are needed. It follows that the total number of comparisons performed in each round is and, since there are rounds, the overall number of comparisons in rounds 1 to is .
If we now select , we obtain an algorithm for that performs comparisons in the worst case and has a success probability of at least . We can now use this algorithm in our reduction of Lemma 3 to immediately obtain an algorithm for which is optimal for and q bounded away from (see Theorem 5 in Sect. 6).
Theorem 3
can be solved with a success probability of at least using comparisons in the worst case.
We can combine this algorithm with GeometricTest (described in Sect. 4) to solve with a success probability of at least using both comparisons in expectation and comparisons in the worst case. In order to do so, we simply run the two algorithms in parallel until one of them terminates. Clearly, the expected number of comparisons is asymptotically unaffected, while the probability that this combined algorithm fails can be upper bounded by the sum of the respective failure probabilities, i.e., by at most (recall that GeometricTest fails with probability at most , as shown by Lemma 5).
Lower Bound
The rest of the paper is devoted to proving our lower bound of on the expected number of comparisons required to solve with a success probability of at least . We prove our lower bound using three different strategies depending on the values of k and q. Figure 2 shows a qualitative representation of the considered regions.
Fig. 2. A qualitative representation of the different ranges of the parameters q and k (as a fraction of n) handled by Lemmas 8, 10, and 11. The figure is not to scale
We start by considering what we deem to be the most interesting case, namely the one in which and q can be upper bounded by a small enough constant. For simplicity we pick this constant to be , although the same proof strategy actually works for any , where is a constant of choice.9 We will show that any algorithm that is able to solve with a success probability of at least can also be used to solve with the same success probability. Then, our lower bound for follows from the fact that any algorithm that solves with a success probability of at least must perform comparisons in expectation. This is formalized in the following theorem, whose proof is given in Appendix A and is similar to the one used in [14, Theorem 2.1] to establish a lower bound on the worst-case number of comparisons needed to solve .10
Theorem 4
Let A be an algorithm that solves with a probability of success of at least . For any , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is at least , where is a constant that depends only on p.
We are now ready to prove our lower bound for when and .
Lemma 8
Let and . For every algorithm A that solves with a success probability of at least , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is larger than , where is a constant that depends only on p.
Proof
Let be the constant from Theorem 4, choose , and suppose towards a contradiction that there is an algorithm A that is able to solve with a success probability of at least using an expected number of comparisons of at most on every instance of elements. We will show that the existence of A implies the existence of an algorithm that is able to solve with a success probability of at least on any instance of elements using fewer than comparisons in expectation, thus contradicting Theorem 4.
Algorithm works as follows: given an instance of with elements, constructs an instance S of that consists of k copies of each element in and of copies of an arbitrary element from . Then, runs A on S (which contains exactly n elements) and outputs the element x returned by the execution of A. With probability at least , x is among the k smallest elements of S, implying that it is (a copy of) the smallest element of .
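The construction of S can be illustrated with a short sketch. The function names and the choice of the first element as the padding element are assumptions made for illustration; the padding count is simply whatever is needed for S to contain exactly n elements.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_padded_instance(small: Sequence[T], n: int, k: int) -> List[T]:
    """Build an n-element instance S from a smaller minimum-finding instance.

    S holds k copies of every element of `small`, plus enough copies of one
    arbitrary element of `small` (here: the first one) to reach n elements.
    """
    if not small or k * len(small) > n:
        raise ValueError("need a non-empty instance with k * len(small) <= n")
    padded = [x for x in small for _ in range(k)]
    padded.extend([small[0]] * (n - len(padded)))
    return padded


def solve_min_via_selection(
    small: Sequence[T],
    n: int,
    k: int,
    select_among_k_smallest: Callable[[List[T], int], T],
) -> T:
    """Run a (hypothetical) approximate-selection routine on the padded instance."""
    return select_among_k_smallest(build_padded_instance(small, n, k), k)


S = build_padded_instance([5, 2, 9], n=14, k=4)
```

In this toy instance the four smallest entries of S are all copies of 2, so any element among the k smallest of S is a copy of the minimum of the original instance, exactly as the reduction requires.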
To conclude the proof, it suffices to notice that the expected number of comparisons performed by on is upper bounded by the expected number of comparisons performed by A on S, i.e., it is at most
We now turn our attention to the ranges of k and q that are not covered by Lemma 8. Recall that for , no lower bound exists since can be solved without performing any comparison by simply returning an element chosen uniformly at random from S. In the rest of this section we will consider values of q that are bounded away from , namely we assume the existence of some constant for which . As a consequence, we can prove a preliminary lower bound of on the expected number of comparisons needed to solve with a success probability of at least . This lower bound will be useful to handle some corner cases in the sequel.
Lemma 9
Let and . For every algorithm A that solves with a success probability of at least , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is at least .
Proof
Suppose towards a contradiction that there is an algorithm A that is able to solve with a success probability of at least using an expected number of comparisons smaller than on every instance of elements. Then, an execution of A performs no comparisons with a probability larger than . When the input of A is a random permutation of , the failure probability must be larger than . This implies the existence of at least one instance of n elements for which A fails with a probability larger than q, yielding the sought contradiction.
We now handle the case and . We will consider a suitable set of instances ensuring that any algorithm having a success probability of at least must perform at least comparisons on at least one instance in the set, for some constant . Notice that, in this case, .
Lemma 10
Let and . For every algorithm A that solves with a success probability of at least , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is at least , where is a constant that depends only on and p.
Proof
Let A be an algorithm that solves with a success probability of at least using at most comparisons in expectation on every instance of n elements, where .
Let , notice that is a constant as it only depends on , and define . We only need to consider since, when , Lemma 9 already ensures that .
In the rest of the proof we will consider sequences having n elements each, where . Then, we will lower bound by considering the expected number of comparisons needed by A to solve each instance with a success probability of at least .
We start by defining and, for , we let be the sequence obtained by performing consecutive right rotations on .11 See Fig. 3 for an example. We denote by the element that appears in the j-th position of sequence . We will use to refer to an execution of A with input , and to denote the event “ returns element ”. Moreover, we let be the event “ performs at most comparisons”. Since the expected number of comparisons of is at most , the Markov inequality implies that performs more than comparisons with a probability of at most , i.e., .
Fig. 3. An example of the input sequences used in the proof of Lemma 10 for and (in this case ). The positions in have a white background, while those containing the k smallest elements of each sequence have a gray background (since , the gray intervals of any two sequences must overlap). Each position contains one of the smallest k elements in at least one of the sequences. In particular, our choice of ensures that is not among the k smallest elements of , i.e., . The element for is highlighted in bold (in this case ).
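The rotated sequences used in the proof can be generated with a small helper. This is a minimal sketch, assuming the usual convention that a single right rotation moves the last element to the front; the parameters `shift` and `count` are hypothetical placeholders for the number of rotations per step and the number of sequences, which are fixed in the proof.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rotate_right(seq: Sequence[T], times: int = 1) -> List[T]:
    """Apply `times` consecutive right rotations to `seq`."""
    n = len(seq)
    if n == 0:
        return []
    r = times % n
    # The last r elements wrap around to the front.
    return list(seq[n - r:]) + list(seq[:n - r])


def rotated_family(base: Sequence[T], shift: int, count: int) -> List[List[T]]:
    """Return `count` sequences, the i-th being `base` rotated right i * shift times."""
    return [rotate_right(base, i * shift) for i in range(count)]


family = rotated_family([1, 2, 3, 4, 5, 6], shift=2, count=3)
```

Each rotation moves the block of positions holding the k smallest elements to the right, which is what lets the family of instances cover every position.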
Given any input sequence of n elements and , we can encode an execution of A that returns the j-th element of the input sequence with a pair (C, R) where C is the list of the observed comparison outcomes, and R is the list of all random choices made by A. Let be the probability that the execution (C, R) of A is realized when the input sequence is . Moreover, let denote the probability that the random choices of an execution of A are exactly those in R, given that the observed comparison results match those in C. Consider the case in which the probability of each comparison error is exactly p. Then, the probability of observing the comparison outcomes in C when A’s input is (resp. ) is at most (resp. at least ), allowing us to relate and as follows:
By summing the above inequality over all the choices of (C, R) for which A returns the j-th input element and such that , we obtain . Let be the set of all indices such that is not among the k smallest elements of . Moreover, for , define and notice that (see Fig. 3). Then:
| 1 |
| 2 |
where the second inequality follows from the fact that the generic j-th summand in (1) appears in the double sum of (2) when and . Since we know that, for all , fails with probability at most , we have:
The above inequality yields , which can be combined with to obtain the sought lower bound on :
Finally, we consider the remaining case . We show that the success probability of any algorithm that solves can be boosted from to at least by running it multiple times and selecting the smallest returned element using . Then, the lower bound of either Lemma 8 or Lemma 10 applies.
Lemma 11
Let and . For every algorithm A that solves with a probability of success of at least , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is at least , where is a constant that depends only on and p.
Proof
Let be a constant such that with a failure probability of requires at most cm comparisons on any instance of m elements (Theorem 1 ensures that such a constant exists). Let A be an algorithm that solves with a probability of error of at most q using an expected number of comparisons of at most on every instance of n elements.
Let and be the constants of Lemmas 8 and 10, respectively. Define , , and . We can restrict ourselves to the case since otherwise we can use Lemma 9 and the inequality to write .
We now describe an algorithm that uses A to solve with a success probability of at least . Given an instance S of n elements, works as follows: first, it performs independent executions of A with input S and collects the returned elements into a set ; then, it runs FindMin on with a probability of error of , and answers with the returned element.
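The boosting step can be sketched as follows. This is not the FindMin algorithm referred to above: `find_min_stand_in` is a deliberately simple replacement that repeats each noisy comparison and takes a majority vote, and all names and parameters (`runs`, `reps`, `select`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def noisy_less(a: int, b: int, p: float, rng: random.Random) -> bool:
    """Compare a < b, reporting the wrong outcome with probability p."""
    truth = a < b
    return truth if rng.random() >= p else not truth


def majority_less(a: int, b: int, p: float, reps: int, rng: random.Random) -> bool:
    """Repeat a noisy comparison `reps` times and return the majority outcome."""
    votes = sum(noisy_less(a, b, p, rng) for _ in range(reps))
    return 2 * votes > reps


def find_min_stand_in(items: Sequence[int], p: float, reps: int,
                      rng: random.Random) -> int:
    """Simple linear scan with majority-voted comparisons (stand-in for FindMin)."""
    best = items[0]
    for x in items[1:]:
        if majority_less(x, best, p, reps, rng):
            best = x
    return best


def boost(select: Callable[[List[int], random.Random], int], instance: List[int],
          runs: int, p: float, reps: int, rng: random.Random) -> int:
    """Run `select` independently `runs` times and return the smallest candidate."""
    candidates = [select(instance, rng) for _ in range(runs)]
    distinct = list(dict.fromkeys(candidates))  # collect the answers into a set
    return find_min_stand_in(distinct, p, reps, rng)
```

The boosted algorithm fails only if every run of the weak selector misses the k smallest elements, or if the final minimum-finding step errs, which mirrors the union bound used in the proof.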
The expected number of comparisons of is at most , and the probability that no execution of A returns one of the k smallest elements is at most . By the union bound, the overall failure probability of is at most . By invoking either Lemma 8 or Lemma 10 (depending on the values of k and n), we know that the expected number of comparisons of must be at least . In formulas:
| 3 |
Since , we have . Moreover, we know that . We can then solve (3) for and combine the result with the above inequalities:
| 4 |
Using once more, together with , we have:
which can be combined with (4) to show that .
Combining Lemmas 8, 10, and 11, we obtain the main result of this section:
Theorem 5
Let and . For every algorithm A that solves with a success probability of at least , there exists a sequence S of n elements such that the expected number of comparisons performed by A on S is at least , where is a constant that depends only on and p.
Acknowledgements
The authors wish to thank Tomáš Gavenčiak, Barbara Geissmann, Paolo Penna, Daniel Rutschmann, and Peter Widmayer for insightful discussions. We also wish to thank Claire Nicole Barbasch, the anonymous reviewers, and the coordinating editor for their careful reading of our manuscript and their many comments and suggestions.
Proof of Theorem 4
Consider the following Noisy-OR problem, first introduced in [14]: we are given a finite input sequence where and we need to output “true” if at least one equals 1, and “false” otherwise. The actual value of an element cannot be read directly but we can perform queries to a noisy oracle . Whenever is queried with an element , it returns with probability and with the complementary probability. We will denote such a query by .
Given an algorithm A that solves on every instance of n elements with a success probability of at least , we can design an algorithm that solves the Noisy-OR problem on every instance of elements with the same success probability. Intuitively, algorithm on input simulates an execution of A on input where and, for , . As a consequence, with probability at least , returns if all s are 0, and where i is the smallest index such that otherwise. Formally executes A with the following modifications:
- Whenever A compares two elements and with (the case is symmetric) simulates the comparison as follows:
- If , then queries with . If returns “1”, is treated as smaller than . Otherwise, is treated as smaller than .
  - If , then performs queries (resp. ), and computes their majority result (resp. ). If exactly one is 1, treats the corresponding element as the smaller between and . Otherwise (if ) is treated as smaller than . Notice that, by Lemma 1, (resp. ) with probability at least , showing that the element that is reported as smaller is actually the smaller between and with a probability of at least .
Whenever A terminates and returns an element , returns “true” if and “false” otherwise.
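The majority-based simulation of a comparison between two bit-valued elements can be sketched as follows. This sketch covers only the second case above and rests on several assumptions that are labeled in the code: the oracle is taken to flip its answer with probability p, the number of repetitions `reps` is a hypothetical parameter, and ties are resolved in favor of the lower index.

```python
import random
from typing import Sequence


def noisy_oracle(bit: int, p: float, rng: random.Random) -> int:
    """Assumed oracle: return `bit` with probability 1 - p, its complement otherwise."""
    return bit if rng.random() >= p else 1 - bit


def majority_query(bit: int, p: float, reps: int, rng: random.Random) -> int:
    """Query the oracle `reps` times (reps odd) and return the majority answer."""
    ones = sum(noisy_oracle(bit, p, rng) for _ in range(reps))
    return 1 if 2 * ones > reps else 0


def simulated_less(bits: Sequence[int], i: int, j: int, p: float, reps: int,
                   rng: random.Random) -> bool:
    """Decide whether element i is treated as smaller than element j, for i < j."""
    if i >= j:
        raise ValueError("expected i < j; the other case is symmetric")
    b_i = majority_query(bits[i], p, reps, rng)
    b_j = majority_query(bits[j], p, reps, rng)
    if b_i != b_j:
        # Exactly one majority is 1: that element is treated as the smaller one.
        return b_i == 1
    # Assumption: equal majorities are resolved in favor of the lower index.
    return True
```

Repeating each query and taking the majority is what drives the error of a simulated comparison below the error probability of a single noisy comparison.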
Each comparison of A is implemented by using at most queries to ; therefore, if the average number of comparisons required by A on any instance of n elements is at most , performs at most queries in expectation on every instance of Noisy-OR consisting of elements. In the rest of the proof we will show that no algorithm for Noisy-OR having a success probability of at least can perform fewer than queries on all instances of elements, where is a suitable constant that depends on p. This will immediately prove that , where .
Consider any algorithm that solves Noisy-OR with a success probability of at least and let be the maximum among the expected number of queries performed by on any input sequence of elements. We will derive a lower bound on . Assume that is finite and let be a parameter that will be chosen later. The existence of implies the existence of an algorithm for Noisy-OR that has a success probability of at least , performs at most queries in expectation (on every instance of elements), and at most queries in the worst case. Such can be obtained from by stopping immediately before the -th query to (if is stopped in this way, returns any arbitrary input element). By the Markov inequality is stopped with probability at most , showing that the probability of failure of is at most .
For , let denote the sequence , where if and otherwise (so that consists of zeros). Similarly to the proof of Theorem 2.1 in [14], we consider the rooted noisy Boolean decision tree T associated with : each internal vertex v of T has a label and represents a query , while each leaf is labeled with either “true” or “false”. A specific execution of on an input sequence of elements traces a path between the root r of T and a leaf of T, and the output of the execution matches the label of . Let L be the set of leaves of T.
Given we will denote by the probability that an execution of on input traces the (unique) path from r to in T. For , define as the number of internal vertices labeled j on the unique path between the root of T and . It follows that the execution of described by performs queries to , where is the depth of in T.
For , the only input element that differs between and is . Since an execution of that traces performs queries , we have that must be at least . Then:
| 5 |
Let . Since is a convex function we can use Jensen’s inequality to write:
| 6 |
Let Z be the set of leaves of T labeled “false”. Combining (5) and (6), summing over all leaves , and noticing that we must have , we obtain:
| 7 |
Define , , and . Since succeeds with probability at least on every input sequence of elements, we know that . Moreover, is a monotonically decreasing convex function, allowing us to combine the above inequality with Jensen’s inequality to write:
| 8 |
which implies
We can lower bound with the average number of queries performed by on input , indeed:
We now choose and . Notice that this is a valid choice for since . Therefore we know that and we can write:
where we used the identity for . This provides the sought lower bound on and completes the proof.
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich
Footnotes
In order to ease notation, we omit ceilings in our asymptotic upper bounds. E.g., we will write in place of . This is only relevant for values of q approaching 1.
Interestingly, some semiconductor companies consider the more general problem of approximately classifying all their products in a process known as product binning. E.g., identically-built processors can be assigned different commercial names depending on their performances.
Throughout the rest of the paper we use to denote binary logarithms and to refer to natural logarithms.
The first parameter of a binomial random variable represents the number of (independent) trials while the second parameter is the success probability of each trial.
In order to ease the notation, here and throughout the rest of the paper we will use as a shorthand for .
Recall from Sect. 4.1 that each query to is implemented with one comparison when , and with comparisons when . Since the number of queries in phase i is , we can pick .
The proof of Lemma 4 shows that the probability that more than i phases are needed is at most .
If this is not the case, it will suffice to pad S with randomly selected copies of elements from S.
The case and will be covered by the more general Lemma 11.
The lower bound of [14, Theorem 2.1] concerns the related problem of computing the logical OR of an input sequence of bits using noisy queries. A lower bound for this problem immediately translates into a lower bound for . For details see Lemma 3.4 and Theorem 3.5 in [14] or the self-contained proof of Theorem 4 in Appendix A.
The result of a single right rotation on a sequence is the sequence where and, for , .
Research supported by SNF (Project Number 200021_165524).
Contributor Information
Stefano Leucci, Email: stefano.leucci@univaq.it.
Chih-Hung Liu, Email: chih-hung.liu@inf.ethz.ch.
References
- 1.Aigner M. Finding the maximum and minimum. Discret. Appl. Math. 1997;74(1):1–12. doi: 10.1016/S0166-218X(96)00012-1.
- 2.Anthes G. Inexact design: beyond fault-tolerance. Commun. ACM. 2013;56(4):18–20. doi: 10.1145/2436256.2436262.
- 3.Bagchi A. On sorting in the presence of erroneous information. Inf. Process. Lett. 1992;43(4):213–215. doi: 10.1016/0020-0190(92)90203-8.
- 4.Baumann RC. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans. Device Mater. Reliab. 2005;5(3):305–316. doi: 10.1109/TDMR.2005.853449.
- 5.Borgstrom, R.S., Kosaraju, S.R.: Comparison-based search in the presence of errors. In: Proceedings of the Twenty-fifth Symposium on Theory of Computing (STOC93), pp. 130–136 (1993)
- 6.Braverman, M., Mao, J., Weinberg, S.M.: Parallel algorithms for select and partition with noisy comparisons. In: Proceedings of the Forty-Eighth Symposium on Theory of Computing (STOC16), pp. 851–862 (2016)
- 7.Braverman, M., Mossel, E.: Noisy sorting without resampling. In: Proceedings of the Nineteenth Symposium on Discrete Algorithms (SODA08), pp. 268–276 (2008)
- 8.Catania, J.A.: Soft errors in electronic memory—a white paper (2004)
- 9.Cheemavalagu, S., Korkmaz, P., Palem, K.: Ultra low-energy computing via probabilistic algorithms and devices: CMOS device primitives and the energy-probability relationship. In: Proceedings of the 2004 International Conference on Solid State Devices and Materials, pp. 402–403 (2004)
- 10.Cheemavalagu, S., Korkmaz, P., Palem, K., Akgul, B.E.S., Chakrapani, L.N.: A probabilistic CMOS switch and its realization by exploiting noise. In: Proceedings of the 2005 IFIP/IEEE International Conference on Very Large Scale Integration - System on a Chip (VLSI-SoC05), pp. 535–541 (2005)
- 11.Chen, X., Gopi, S., Mao, J., Schneider, J.: Competitive analysis of the top-K ranking problem. In: Proceedings of the Twenty-Eighth Symposium on Discrete Algorithms (SODA17), pp. 1245–1264 (2017)
- 12.Cicalese, F.: Fault-Tolerant Search Algorithms—Reliable Computation with Unreliable Information. Monographs in Theoretical Computer Science. Springer (2013)
- 13.Edgar T. Staircase series. Math. Mag. 2018;91(2):92–95. doi: 10.1080/0025570X.2017.1415584.
- 14.Feige U, Raghavan P, Peleg D, Upfal E. Computing with noisy information. SIAM J. Comput. 1994;23(5):1001–1018. doi: 10.1137/S0097539791195877.
- 15.Finocchi I, Grandoni F, Italiano GF. Optimal resilient sorting and searching in the presence of memory faults. Theor. Comput. Sci. 2009;410(44):4457–4470. doi: 10.1016/j.tcs.2009.07.026.
- 16.Geissmann, B., Leucci, S., Liu, C., Penna, P.: Sorting with recurrent comparison errors. In: Proceedings of the Twenty-Eighth International Symposium on Algorithms and Computation (ISAAC17), pp. 38:1–38:12 (2017)
- 17.Geissmann, B., Leucci, S., Liu, C., Penna, P.: Optimal sorting with persistent comparison errors. In: Proceedings of the Twenty-seventh European Symposium on Algorithms (ESA19), pp. 49:1–49:14 (2019)
- 18.Geissmann B, Leucci S, Liu C, Penna P. Optimal dislocation with persistent errors in subquadratic time. Theory Comput. Syst. 2020;64(3):508–521. doi: 10.1007/s00224-019-09957-5.
- 19.Geissmann, B., Leucci, S., Liu, C., Penna, P., Proietti, G.: Dual-mode greedy algorithms can save energy. In: Proceedings of the 30th International Symposium on Algorithms and Computation (ISAAC19), LIPIcs, vol. 149, pp. 64:1–64:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). 10.4230/LIPIcs.ISAAC.2019.64
- 20.Geissmann, B., Mihalák, M., Widmayer, P.: Recurring comparison faults: sorting and finding the minimum. In: Proceedings of the Twentieth International Symposium on Fundamentals of Computation Theory (FCT15), pp. 227–239 (2015)
- 21.Kenyon-Mathieu, C., Schudy, W.: How to rank with few errors. In: Proceedings of the Thirty-Ninth Symposium on Theory of Computing (STOC07), pp. 95–103 (2007)
- 22.Klein, R., Penninger, R., Sohler, C., Woodruff, D.P.: Tolerant algorithms. In: Proceedings of the Nineteenth European Symposium on Algorithms (ESA11), pp. 736–747 (2011)
- 23.Lakshmanan KB, Ravikumar B, Ganesan K. Coping with erroneous information while sorting. IEEE Trans. Comput. 1991;40(9):1081–1084. doi: 10.1109/12.83656.
- 24.Leighton T, Ma Y. Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults. SIAM J. Comput. 1999;29(1):258–273. doi: 10.1137/S0097539796305298.
- 25.Leucci, S., Liu, C., Meierhans, S.: Resilient dictionaries for randomly unreliable memory. In: Proceedings of the 27th Annual European Symposium on Algorithms (ESA19), LIPIcs, vol. 144, pp. 70:1–70:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). 10.4230/LIPIcs.ESA.2019.70
- 26.Long, P.M.: Sorting and searching with a faulty comparison oracle. University of California at Santa Cruz, Tech. rep. (1992)
- 27.Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Sorting noisy data with partial information. In: Proceedings of the Fourth Conference on Innovations in Theoretical Computer Science (ITCS13), pp. 515–528 (2013)
- 28.Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis, 2nd edn. Cambridge University Press (2017)
- 29.Palem, K., Lingamneni, A.: Ten years of building broken chips: The physics and engineering of inexact computing. ACM Trans. Embed. Comput. Syst. 12(2s), 87:1–87:23 (2013)
- 30.Pelc A. Searching with known error probability. Theor. Comput. Sci. 1989;63(2):185–202. doi: 10.1016/0304-3975(89)90077-7.
- 31.Pelc A. Searching games with errors - fifty years of coping with liars. Theor. Comput. Sci. 2002;270(1–2):71–109. doi: 10.1016/S0304-3975(01)00303-6.
- 32.Ravikumar, B., Ganesan, K., Lakshmanan, K.B.: On selecting the largest element in spite of erroneous information. In: Proceedings of the Fourth Symposium on Theoretical Aspects of Computer Science (STACS87), pp. 88–99 (1987)
- 33.Rivest RL, Meyer AR, Kleitman DJ, Winklmann K, Spencer J. Coping with errors in binary search procedures. J. Comput. Syst. Sci. 1980;20(3):396–404. doi: 10.1016/0022-0000(80)90014-8.
