2021 Nov 1;84(1):60–84. doi: 10.1007/s00453-021-00880-1

Approximate Minimum Selection with Unreliable Comparisons

Stefano Leucci 1, Chih-Hung Liu 2
PMCID: PMC8786813  PMID: 35125579

Abstract

We consider the approximate minimum selection problem in the presence of independent random comparison faults. This problem asks to select one of the smallest $k$ elements in a linearly-ordered collection of $n$ elements by only performing unreliable pairwise comparisons: whenever two elements are compared, there is a small probability that the wrong comparison outcome is observed. We design a randomized algorithm that solves this problem with a success probability of at least $1-q$ for $q \in (0, \frac{n-k}{n})$ and any $k \in [1, n-1]$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation (if $k \ge n$ or $q \ge \frac{n-k}{n}$ the problem becomes trivial). Then, we prove that the expected number of comparisons needed by any algorithm that succeeds with probability at least $1-q$ must be $\Omega(\frac{n}{k}\log\frac{1}{q})$ whenever $q$ is bounded away from $\frac{n-k}{n}$, thus implying that the expected number of comparisons performed by our algorithm is asymptotically optimal in this range. Moreover, we show that the approximate minimum selection problem can be solved using $O((\frac{n}{k}+\log\log\frac{1}{q})\log\frac{1}{q})$ comparisons in the worst case, which is optimal when $q$ is bounded away from $\frac{n-k}{n}$ and $k=O(\frac{n}{\log\log\frac{1}{q}})$.

Keywords: Approximate minimum selection, Unreliable comparisons, Independent errors

Introduction

In an ideal world, computational tasks are always carried out reliably, i.e., every operation performed by an algorithm behaves exactly as intended. Practical architectures, however, are error-prone and even basic operations can sometimes return the wrong results, especially when large-scale systems are involved. When dealing with these spurious results the first instinct is to try to detect and correct the errors as they manifest, so that the problems of interest can then be solved using classical (non fault-tolerant) algorithms. An alternative approach deliberately allows errors to interfere with the execution of an algorithm, in the hope that the computed solution will still be good, at least in an approximate sense. This begs the question: is it possible to devise algorithms that cope with faults by design and return solutions that are demonstrably good?

We investigate this question by considering a generalization of the fundamental problem of finding the minimum element in a totally-ordered set: in the fault-tolerant approximate minimum selection problem (FT-MIN(k) for short) we wish to return one of the smallest $k$ elements in a collection of size $n>k$ using only unreliable pairwise comparisons, i.e., comparisons in which the result can sometimes be incorrect due to errors. This makes it possible, for example, to find a representative in the top percentile of the input set, or to obtain a good estimate of the minimum from a set of noisy observations.

In this paper we provide both upper and lower bounds on the number of comparisons needed by any (possibly randomized) algorithm that solves FT-MIN(k) with a success probability of at least $1-q$. Since for $q \ge \frac{n-k}{n}$ we can solve FT-MIN(k) by simply returning a random input element, we will focus on $q \in (0, q_{crit})$, where $q_{crit}=\frac{n-k}{n}$. We prove that FT-MIN(k) can be solved using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation,$^1$ and that this number of comparisons is asymptotically optimal when $q$ is bounded away from $q_{crit}$. Moreover, we show that whenever $k=O(\frac{n}{\log\log\frac{1}{q}})$ we can use the same asymptotic number of comparisons also in the worst case.

Our results have applications in any setting that is subject to random comparison errors (e.g., due to communication interferences, alpha particles, charge collection, cosmic rays [4, 8], or energy-efficient architectures where the energy consumed by the computation can be substantially reduced if a small fraction of faulty results is allowed [2, 9, 10, 29]), or in which performing accurate comparisons is too resource-consuming (think, e.g., of the elements as references to remotely stored records) while approximate comparisons can be carried out much quicker. One concrete application might be selecting one of the top-tier products from a collection of items built using an imprecise manufacturing process (e.g., a high-quality cask from a distillery or a fast semiconductor from a fabrication facility). In these settings, products can be compared to one another (either by human experts or by automated tests), yet the results of the comparisons are not necessarily accurate.$^2$

Before presenting our results in more detail, we briefly discuss the considered error model.

The Error Model

We consider independent random comparison faults, a simple and natural error model, in which there exists a true strict ordering relation among the set $S$ of $n$ input elements, yet algorithms are only allowed to gather information on this relation via unreliable comparisons between pairs of elements. The outcome of a comparison involving two distinct elements $x$ and $y$ can be either “<” or “>” to signify that $x$ is reported as “smaller than” or “larger than” $y$, respectively. Most of the time the outcome of a comparison will correspond to the true relative order of the compared elements, but there is a probability upper bounded by a constant $p<\frac{1}{2}$ that the wrong result will be observed instead. An algorithm can compare the same pair of elements more than once and, when this happens, the outcome of each comparison is chosen independently of the previous results. In a similar way, comparisons involving different pairs of elements are also assumed to be independent.
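The error model can be simulated in a few lines. The sketch below is illustrative only: the name `noisy_less` is hypothetical, and it assumes that every comparison is wrong with probability exactly `p` (the model only requires the error probability to be at most $p$).

```python
import random

def noisy_less(x, y, p, rng=random):
    """Return the (possibly wrong) outcome of comparing x with y.

    True means "x < y" was reported. The true outcome is flipped with
    probability exactly p, independently of every other call.
    """
    truth = x < y
    flipped = rng.random() < p
    return truth != flipped
```

Repeated calls on the same pair draw fresh randomness, which mirrors the assumption that repeated comparisons fail independently.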

The above error model was first considered in the 80s and 90s, when the related problems of finding the minimum, selecting the $k$-th smallest element, and sorting a sequence were studied [14, 30, 31]. The best solutions to these problems are due to Feige et al. [14], who provided Monte Carlo algorithms having a success probability of $1-q$ and requiring $O(n\log\frac{1}{q})$, $O(n\log\frac{\min\{k,n-k\}}{q})$, and $O(n\log\frac{n}{q})$ comparisons in the worst case, respectively. Moreover, the authors also provide matching lower bounds, thus showing that the above algorithms use the asymptotically optimal number of comparisons in the worst case. In the sequel we will invoke the minimum finding algorithm of [14] (which we name FindMin) as a subroutine. We therefore find it convenient to restate the following theorem from [14] using our notation:

Theorem 1

([14, Theorem 3.5]) Given a set $S$ of $n$ elements and a parameter $\varrho \in (0,\frac{1}{2})$, algorithm FindMin performs $O(n\log\frac{1}{\varrho})$ comparisons in the worst case and returns the minimum of $S$ with a success probability of at least $1-\varrho$.

Our Contributions

We design a randomized algorithm that solves FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation, where $q \in (0,q_{crit})$. Moreover, we show that the expected number of comparisons performed by our algorithm is asymptotically optimal when $q$ is bounded away from $q_{crit}$ by proving that any algorithm that succeeds with probability at least $1-q$ requires $\Omega(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation.

We also show how to additionally guarantee that the worst-case number of comparisons required by our algorithm will be $O((\frac{n}{k}+\log\log\frac{1}{q})\cdot\log\frac{1}{q})$. This implies that, as soon as $k=O(\frac{n}{\log\log\frac{1}{q}})$, we can solve FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons, which is asymptotically optimal when $q$ is bounded away from $q_{crit}$.

A possible way to evaluate different algorithms for FT-MIN(k) is that of comparing the range of values of $k$ that they are able to handle if we impose an (asymptotic) upper limit of $T$ on the (possibly expected) number of comparisons that they are allowed to perform. For example, if we require the algorithms to succeed with high probability (w.h.p., i.e., for $q=\frac{1}{n}$) and pick $T=\Theta(n)$, the natural algorithm that executes FindMin with $\varrho=O(\frac{1}{n})$ on a randomly chosen subset of $O(\frac{n}{k}\log n)$ elements only works for $k=\Omega(\log^2 n)$. For the same choice of $T$ and $q$ our algorithm works for any $k=\Omega(\log n)$, thus exhibiting a quadratic difference w.r.t. the smallest achievable values of $k$. When $T=\omega(\log n)\cap o(\log^2 n)$, the natural algorithm cannot provide any (non-trivial) guarantee on the rank of the returned element w.h.p., while our algorithm works for any $k=\Omega(\frac{n}{T}\log n)$. To summarize, our algorithm is able to handle the asymptotically optimal range of $k$ if (i) $T$ refers to an expected number of comparisons, or (ii) $T$ refers to a worst-case number of comparisons and $T=\omega(\log n\cdot\log\log n)$.

Our Techniques

To obtain our positive results we start by designing a reduction that transforms an instance of FT-MIN(k) into an instance of FT-MIN($\frac{3}{4}n'$) with $n'=\Theta(\log\frac{1}{q})$ elements. This reduction shows that if it is possible to solve FT-MIN($\frac{3}{4}n'$) with a success probability of at least $1-\frac{q}{2}$ using $T(n')$ comparisons, then FT-MIN(k) can be solved with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})+T(\Theta(\log\frac{1}{q}))$ comparisons. This allows us to focus on solving FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-q$ using $O(\log\frac{1}{q})$ comparisons.

We do so using a noisy oracle that is able to detect, using a constant number of comparisons, whether an element is among the smallest elements in $S$ with an error probability upper bounded by a (small) constant. We employ this noisy oracle in an iterative algorithm that considers one random input element at a time: whenever $x \in S$ is considered, $x$ is tested by querying the oracle multiple times, and it is returned only if most of the query answers report $x$ as one of the smallest elements in $S$. The number of queries performed during each test increases with the iteration number and is chosen to simultaneously ensure that (i) the overall expected number of comparisons is $O(\log\frac{1}{q})$, and (ii) the probability of (wrongly) returning an element that is too large is at most $q$. Using our reduction, the above algorithm can be immediately transformed into an algorithm that solves FT-MIN(k) using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation and $O(\frac{n}{k}\log\frac{1}{q}+\log^2\frac{1}{q})$ comparisons in the worst case.

To reduce the number of comparisons needed in the worst case, we design an algorithm for FT-MIN($\frac{3}{4}n$) that is reminiscent of knockout-style tournaments and always performs $O(n\log n+n\log\frac{1}{q}+n^{-1}\log^2\frac{1}{q})$ comparisons. Thanks to our reduction, this latter algorithm improves the worst-case number of comparisons required to solve FT-MIN(k) to $O(\frac{n}{k}\log\frac{1}{q}+(\log\frac{1}{q})\log\log\frac{1}{q})$, which is optimal for $k=O(\frac{n}{\log\log\frac{1}{q}})$ and $q$ bounded away from $q_{crit}$.

Regarding our negative results, we obtain our lower bound of $\Omega(\frac{n}{k}\log\frac{1}{q})$ using three different strategies depending on the values of $k$, $n$, and $q$ (where $q$ is bounded away from $q_{crit}$). For $n\ge 2k$ and $q\le\frac{1}{4}$, which we deem the most interesting case, we reduce FT-MIN(k) to FT-MIN(1) so that (an extension of) the lower bound of [14] applies. For $k<n<2k$ and $q\le\frac{1}{4}$, we construct a set $I$ of roughly $\frac{n}{n-k}$ instances of $n$ elements each such that the generic $i$-th input position contains one of the largest $n-k$ elements in at least one bad instance in $I$. In order to return the $i$-th input element, an algorithm for FT-MIN(k) needs to detect whether the input instance is bad w.r.t. $i$ with a sufficiently high confidence, which requires $\Omega(\log\frac{1}{q})$ comparisons in expectation. Finally, for $q>\frac{1}{4}$, we show how to improve the success probability of any algorithm that solves FT-MIN(k) from $1-q<\frac{3}{4}$ to at least $\frac{3}{4}$ with only a small blow-up in the number of comparisons, allowing us to employ one of the two lower bound strategies described above.

Other Related Works

The problem of finding the exact minimum of a collection of elements using unreliable comparisons had already received attention back in 1987 when Ravikumar et al. [32] considered the variant in which only up to $f$ comparisons can fail and proved that $\Theta(fn)$ comparisons are needed in the worst case. Notice that, in our setting with $q$ bounded away from $q_{crit}$, $f=\Omega(\frac{n}{k}\log\frac{1}{q})$ in expectation since $\Omega(\frac{n}{k}\log\frac{1}{q})$ comparisons are necessary (as we show in Sect. 6) and each comparison fails with constant probability. In [1], Aigner considered a prefix-bounded probability of error $p<\frac{1}{2}$: at any point during the execution of an algorithm, at most a $p$-fraction of the past comparisons could have failed. Here, the situation significantly worsens as up to $\Theta(\frac{1}{1-p})^n$ comparisons might be necessary to find the minimum (and this is tight). Moreover, if the fraction of erroneous comparisons is globally bounded by $\rho$, and $\rho=\Omega(\frac{1}{n})$, then Aigner also proved that no algorithm can succeed with certainty [1]. The landscape improves when we assume that errors occur independently at random: in addition to the already-cited algorithm by Feige et al. [14] (see Sect. 1.1), a recent paper by Braverman et al. [6] also considered the round complexity and the number of comparisons needed by partition and selection algorithms. The results in [6] imply that, for constant error probabilities, $\Theta(n\log n)$ comparisons are needed by any algorithm that selects the minimum w.h.p.

Recently, Chen et al. [11] focused on computing the smallest k elements given r independent noisy comparisons between each pair of elements. For this problem, in a more general error model, they provide a tight algorithm that requires at most O(npolylogn) times as many samples as the best possible algorithm that achieves the same success probability.

If we turn our attention to the related problem of sorting with faults, then $\Omega(n\log n+fn)$ comparisons are needed to correctly sort $n$ elements when up to $f$ comparisons can return the wrong answer, and this is tight [3, 23, 26]. In the prefix-bounded model, the result in [1] on minimum selection also implies that $(\frac{1}{1-p})^{O(n\log n)}$ comparisons are sufficient for sorting, while a lower bound of $\Omega((\frac{1}{1-p})^n)$ holds even for the easier problem of checking whether the input elements are already sorted [5]. The problem of sorting when faults are permanent (or, equivalently, when a pair of elements can only be compared once) has also been extensively studied and it exhibits connections to both the rank aggregation problem and to the minimum feedback arc set [6, 7, 16–18, 20–22, 24, 27].

Another related family of problems falls in the framework of Rényi–Ulam games: in these two-player games a responder secretly selects an object $x$ from a known universe $U$ and a questioner needs to identify $x$ by asking up to $n$ questions to the responder. The responder can lie up to $f$ times and wins if the questioner fails to uniquely identify $x$. As an example, if $U=\{1,\dots,m\}$ and the questions are comparisons of the form “Is $x\le c$?”, where $c\in U$ is chosen by the questioner, then the questioner can always win by asking at most $\log m+f\log\log n+O(f\log f)$ questions, which is tight [33].$^3$ More broadly, Rényi–Ulam games and other search games have been extensively studied for a wide variety of search spaces and question types, as discussed in [31].

Other error-prone models have also been considered in the context of optimization algorithms [19] and in the design of resilient data structures [15, 25]. For more related problems on the aforementioned and other fault models, we refer the interested reader to [31] for a survey and to [12] for a monograph.

Finally, we point out that, in the fault-free case, a simple sampling strategy allows one to find one of the smallest $k$ elements with probability at least $1-q$ using $O(\min\{n,\frac{n}{k}\log\frac{1}{q}\})$ comparisons.
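One plausible reading of this fault-free strategy is sketched below. It is an assumption of this sketch, not a statement from the text: it samples $\lceil\frac{n}{k}\ln\frac{1}{q}\rceil$ elements with replacement and returns their minimum, falling back to a full scan when the sample would be at least as large as the input. All samples miss the smallest $k$ elements with probability at most $(1-\frac{k}{n})^{\frac{n}{k}\ln\frac{1}{q}}\le q$. The name `sample_min` is hypothetical.

```python
import math
import random

def sample_min(items, k, q, rng=random):
    """Return an element that is among the k smallest with probability >= 1 - q.

    Comparisons are assumed to be fault-free here, so the built-in min
    is used on the sampled elements.
    """
    n = len(items)
    sample_size = math.ceil(n / k * math.log(1 / q))
    if sample_size >= n:
        # Scanning the whole input costs fewer comparisons and never fails.
        return min(items)
    return min(rng.choice(items) for _ in range(sample_size))
```

The number of comparisons is the sample size minus one, or $n-1$ for the full scan, which matches the $O(\min\{n,\frac{n}{k}\log\frac{1}{q}\})$ bound up to constants.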

Paper Organization

In Sect. 2 we give some preliminary remarks and we outline a simple strategy to reduce the error probability. Section 3 describes our reduction from FT-MIN(k) to FT-MIN($\frac{3}{4}n$). In Sects. 4 and 5 we design two algorithms that solve FT-MIN(k) using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation and $O(\frac{n}{k}\log\frac{1}{q}+(\log\frac{1}{q})\log\log\frac{1}{q})$ comparisons in the worst case, respectively. Finally, Sect. 6 is devoted to proving our lower bounds.

Preliminaries

We will often draw elements from the input set into one or more (multi)sets using sampling with replacement, i.e., we allow multiple copies of an element to appear in the same multiset. We will then perform comparisons among the elements of these multisets as if they were all distinct: when two copies of the same element are compared, we break the tie using any arbitrary (but consistent) ordering among the copies.

According to our error model, each comparison fault happens independently at random with probability at most $p\in(0,\frac{1}{2})$. This error probability can be reduced by repeating a comparison multiple times using a simple majority strategy. The same strategy also works in the related setting in which pairwise comparisons are no longer allowed but we have access to a noisy oracle $O$ that can be queried with an element $x\in S$ and returns either true or false. In this setting each $x\in S$ is associated with a correct binary answer and, when $O$ is queried with $x$, it returns the correct answer with probability at least $1-p>\frac{1}{2}$ (and the wrong answer with the complementary probability). The errors in $O$’s answers are independent. The next lemma provides a lower bound on the probability of correctness of the majority strategy:

Lemma 1

Let $x$ and $y$ be two distinct elements. For any error probability upper bounded by a constant $p\in[0,\frac{1}{2})$ there exists a constant $c_p\in\mathbb{N}^+$ such that the strategy that compares $x$ and $y$ (resp. queries $O$ with $x$) $2c_p\cdot\lceil t\rceil+1$ times and returns the majority result is correct with probability at least $1-e^{-t}$.

Proof

Suppose, w.l.o.g., that $x<y$. Let $X_i\in\{0,1\}$ be an indicator random variable that is 1 iff the $i$-th comparison (resp. query result) is correct. Since the $X_i$s are independent Bernoulli random variables with parameter at least $1-p$, $\sum_{i=1}^{2c_p\cdot\lceil t\rceil+1}X_i$ stochastically dominates [28, Definition 17.1] a binomial random variable $X$ with parameters $2\eta=2c_p\lceil t\rceil+1$ and $1-p$,$^4$ and hence $E[X]=2\eta(1-p)=(1-p)(2c_p\cdot\lceil t\rceil+1)$. Moreover, since $p<1/2$, we know that $2(1-p)>1$ and hence we can use the Chernoff bound $\Pr(X\le(1-\delta)E[X])\le\exp(-\frac{\delta^2E[X]}{2})$, $\forall\delta\in(0,1)$, to upper bound the probability of failure of the majority strategy (see [28, Theorem 4.5 (2)]). Indeed:

$$\Pr(X\le\eta)=\Pr\Big(X\le\frac{1}{2(1-p)}E[X]\Big)\le\exp\Big(-\frac{(2(1-p)-1)^2}{8(1-p)^2}\,2\eta(1-p)\Big)=\exp\Big(-\frac{(1-2p)^2}{4(1-p)}\eta\Big)<\exp\Big(-c_p t\,\frac{(1-2p)^2}{4(1-p)}\Big),$$

which satisfies the claim once we choose $c_p=\lceil\frac{4(1-p)}{(1-2p)^2}\rceil$.
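The majority strategy of Lemma 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the names `majority_constant` and `majority_less` are hypothetical, the constant and the repetition count are rounded up to integers, and each simulated comparison is wrong with probability exactly $p$. The probability is passed as a `Fraction` or a string such as `"2/5"` so that $c_p$ is computed exactly, since floating-point rounding can push the ceiling one unit too high.

```python
import math
import random
from fractions import Fraction

def majority_constant(p):
    """c_p = ceil(4(1-p) / (1-2p)^2), computed with exact rational arithmetic."""
    p = Fraction(p)
    return math.ceil(4 * (1 - p) / (1 - 2 * p) ** 2)

def majority_less(x, y, p, t, rng=random):
    """Compare x and y 2*c_p*ceil(t)+1 times and return the majority report."""
    repetitions = 2 * majority_constant(p) * math.ceil(t) + 1
    flip_probability = float(Fraction(p))
    reports_smaller = 0
    for _ in range(repetitions):
        truth = x < y
        flipped = rng.random() < flip_probability  # simulated comparison fault
        reports_smaller += (truth != flipped)
    # The repetition count is odd, so there is never a tie.
    return 2 * reports_smaller > repetitions
```

For instance, with $p=\frac{1}{4}$ the constant is 12, so a single unit of $t$ costs 25 comparisons.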

Our Reduction

In this section we reduce the problem of solving FT-MIN(k) to the problem of solving FT-MIN($\frac{3}{4}n$).$^5$ We will say that an element $x$ is small if it is one of the smallest $k$ elements of $S$, otherwise we say that $x$ is large. The reduction constructs a set $S'$ of size $m$ that contains at least $\frac{3}{4}m$ small elements, where the value of $m$ will be determined later. The set $S'$ is selected as follows:

  • Create $m$ sets by independently sampling, with replacement, $\lceil 3\frac{n}{k}\rceil$ elements per set from $S$.

  • Run FindMin (Theorem 1) with failure probability $\varrho=\frac{1}{10}$ on each of the sets. Let $S'=\{x_1,\dots,x_m\}$ be the collection containing the returned elements, where $x_i$ is the element returned by the execution of FindMin on the $i$-th set.

Using Theorem 1, Lemma 1 and the Chernoff bound, we are able to prove the following lemma:

Lemma 2

The probability that fewer than $\frac{3}{4}m$ elements in the reduced set $S'$ are small is at most $e^{-\frac{m}{240}}$.

Proof

Since the $i$-th set contains at least $3\frac{n}{k}$ elements and each of them is independently small with a probability of $\frac{k}{n}$, the probability that no element in the $i$-th set is small is upper bounded by:

$$\Big(1-\frac{k}{n}\Big)^{3\frac{n}{k}}\le e^{-\frac{k}{n}\cdot 3\frac{n}{k}}=e^{-3}<\frac{1}{20},$$

where we used the inequality $1+r\le e^r$ for $r\in\mathbb{R}$. In other words, for every $i$, the event “the $i$-th set contains a small element” has probability at least $1-\frac{1}{20}$. Moreover, since we chose $\varrho=\frac{1}{10}$, the probability that FindMin returns the correct minimum of the $i$-th set is at least $1-\frac{1}{10}$ (see Theorem 1). Clearly, if both of the previous events happen, $x_i$ must be a small element, and by the union bound, the complementary probability is at most $\frac{1}{20}+\frac{1}{10}<\frac{1}{6}$.

Let $X_i$ be an indicator random variable that is 1 iff $x_i$ is a small element so that $X=\sum_{i=1}^{m}X_i$ is the number of small elements in the reduced set $S'$. Since the $x_i$s are independently small with a probability of at least $\frac{5}{6}$, the variable $X$ stochastically dominates a Binomial random variable with parameters $m$ and $\frac{5}{6}$. As a consequence $E[X]\ge\frac{5}{6}m$ and, by using the Chernoff bound [28, Theorem 4.5 (2)], we obtain:

$$\Pr\Big(X\le\frac{3}{4}m\Big)\le\Pr\Big(X\le\frac{9}{10}E[X]\Big)\le e^{-\frac{1}{2}(\frac{1}{10})^2\cdot\frac{5}{6}m}=e^{-\frac{m}{240}}.$$

We are now ready to show the consequence of the above reduction:

Lemma 3

Let $A$ be an algorithm that solves FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-q_A\in(\frac{1}{2},1)$ using at most $T(n,q_A)$ comparisons in the worst case (resp. in expectation), for any choice of $q_A\in(0,\frac{1}{2})$. For any $k$ and any $q\in(0,1)$, there exists an algorithm that solves FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})+T(\Theta(\log\frac{1}{q}),\frac{q}{2})$ comparisons in the worst case (resp. in expectation).

Proof

We first choose $m=\lceil 240\ln\frac{2}{q}\rceil$ and we compute the reduced set $S'$ according to our reduction. Then we run $A$ on $S'$ with a failure probability of $q_A=\frac{q}{2}$, and we answer with the element it returns. Notice that the first step of the reduction requires no comparisons. Moreover, since each of the $m=O(\log\frac{1}{q})$ executions of FindMin requires $O(\frac{n}{k}\log\frac{1}{\varrho})=O(\frac{n}{k})$ comparisons (see Theorem 1 and recall that $\varrho=1/10$), the worst-case number of comparisons performed during the second step is $O(\frac{n}{k}\log\frac{1}{q})$. Overall, the total number of comparisons is $O(\frac{n}{k}\log\frac{1}{q})+T(\Theta(\log\frac{1}{q}),\frac{q}{2})$ as claimed. This upper bound holds in the worst case if $T(n,q_A)$ refers to a worst-case number of comparisons and in expectation if $T(n,q_A)$ refers to an expected number of comparisons.

We now consider the probability of success. By Lemma 2, the probability that fewer than $\frac{3}{4}m$ elements in $S'$ are small is at most $e^{-\frac{m}{240}}\le e^{-\ln\frac{2}{q}}=\frac{q}{2}$. Since the probability that $A$ fails to return one of the smallest $\frac{3}{4}m$ elements in $S'$ is at most $q_A=\frac{q}{2}$, the claim follows by using the union bound.
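The construction used in this proof can be sketched in code. The sketch is hedged in several ways: `reduced_set_size` and `build_reduced_set` are hypothetical names, the sample size and $m$ are rounded up to integers, and `find_min` is a caller-supplied stand-in for FindMin with failure probability $\frac{1}{10}$ (the algorithm of [14] is not reproduced here).

```python
import math
import random

def reduced_set_size(q):
    """m = ceil(240 * ln(2/q)), the size chosen in the proof of Lemma 3."""
    return math.ceil(240 * math.log(2 / q))

def build_reduced_set(S, k, q, find_min, rng=random):
    """Return m elements, each the reported minimum of a random sample of S.

    find_min is a stand-in for FindMin: it receives a list and returns
    one of its elements, ideally the minimum with probability >= 9/10.
    """
    n = len(S)
    sample_size = math.ceil(3 * n / k)
    reduced = []
    for _ in range(reduced_set_size(q)):
        # Sampling with replacement requires no comparisons.
        sample = [rng.choice(S) for _ in range(sample_size)]
        reduced.append(find_min(sample))
    return reduced
```

Passing Python's built-in `min` as `find_min` gives the fault-free special case, which is convenient for checking that at least three quarters of the returned elements are small.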

It is not hard to see that, if we choose algorithm $A$ in Lemma 3 to be FindMin, we have $T(n,q_A)=O(n\log\frac{1}{q_A})$ which, thanks to our reduction, allows us to solve FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q}+\log^2\frac{1}{q})$ comparisons. This number of comparisons matches our lower bound of $\Omega(\frac{n}{k}\log\frac{1}{q})$ (see Theorem 5 in Sect. 6) when $k=O(\frac{n}{\log\frac{1}{q}})$ and $q$ is bounded away from $q_{crit}$. Nevertheless, the major difficulty in solving FT-MIN(k) lies in the case $k=\omega(\frac{n}{\log\frac{1}{q}})$.

Solving FT-MIN(k) Using the Asymptotically Optimal Expected Number of Comparisons

In this section, we will solve FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation. By Lemma 3, it is sufficient to devise an algorithm that solves FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-q$ using $O(\log\frac{1}{q})$ comparisons in expectation. We assume that $n\ge 4$ since otherwise we can simply return an element selected uniformly at random from $S$.

In designing such an algorithm we will use a noisy oracle $O$ that can be queried with an element $x\in S$ and provides a guess on whether $x$ is small, i.e., among the smallest $\frac{3}{4}n$ elements of $S$, or large (i.e., among the largest $\frac{1}{4}n$ elements of $S$). More precisely, for $\delta\in[0,1]$, let $S^-_\delta$ denote the set containing the smallest $\delta n$ elements of $S$, and let $S^+_\delta$ denote $S\setminus S^-_\delta$. Then, $O$ satisfies the following conditions:

  • $O$ reports an element $x$ to be small with probability at least $1-\frac{2}{5}$ if $x\in S^-_{1/3}$ and with probability at most $\frac{2}{5}$ if $x\in S^+_{3/4}$. In other words, $O$ identifies whether an element in $S^-_{1/3}\cup S^+_{3/4}$ is small or large with a failure probability of at most $\frac{2}{5}$;

  • Queries to O can be repeated and errors in the answers are independent;

  • Each query to O is implemented using a constant number of comparisons between elements in S.

Notice that $O$ provides no guarantees on the accuracy of its answers when $x\in S^-_{3/4}\setminus S^-_{1/3}$. We will show how to build such an oracle in Sect. 4.1.

Let $c_O$ be the constant of Lemma 1 for $p=\frac{2}{5}$. Our algorithm works in phases: in the generic $i$-th phase we select one element $x_i$ uniformly at random from $S$, and we perform a test on $x_i$. This test consists of $2c_O\lceil\ln\frac{2^i}{q}\rceil+1$ queries to $O$, and it succeeds if $x_i$ is reported as small by the majority of the queries (otherwise it fails). If the test on $x_i$ succeeds we return $x_i$. Otherwise we move to the next phase. We name the above algorithm GeometricTest since the probability that a test succeeds when a large element is considered decreases geometrically w.r.t. the phase number, as we will show in Sect. 4.2.
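The phases described above can be sketched as a short loop. This is an illustration under assumptions rather than a definitive implementation: the name `geometric_test` is hypothetical, `oracle` is any callable returning `True` when it reports its argument as small, the number of queries per phase is rounded up to an integer, and the default `c_o=60` is the value of $\lceil\frac{4(1-p)}{(1-2p)^2}\rceil$ at $p=\frac{2}{5}$.

```python
import math
import random

def geometric_test(S, q, oracle, c_o=60, rng=random):
    """Return the first randomly drawn element whose test succeeds.

    In phase i the test issues 2*c_o*ceil(ln(2^i / q)) + 1 oracle
    queries and succeeds if a strict majority reports "small".
    """
    i = 1
    while True:
        x = rng.choice(S)
        queries = 2 * c_o * math.ceil(math.log(2 ** i / q)) + 1
        small_votes = sum(1 for _ in range(queries) if oracle(x))
        if 2 * small_votes > queries:
            return x
        i += 1  # later phases use longer, more reliable tests
```

With a noise-free oracle such as `lambda x: x < 75` on `list(range(100))`, the loop returns the first sampled element below 75, which is a quick way to sanity-check the control flow.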

Implementing O

To describe our implementation of $O$ we can assume, without loss of generality, that $p\le\frac{1}{16}$ (if $p>\frac{1}{16}$, we can simulate each comparison by returning the majority result of $6c_p+1$ comparisons, as shown by Lemma 1). The oracle $O$ answers a query for an element $x\in S$ by comparing $x$ with a randomly sampled element $y$ from $S\setminus\{x\}$. If $x$ compares smaller than $y$, then $x$ is reported as small, otherwise it is reported as large. Suppose that $x\in S^-_{1/3}$. If $x$ is (incorrectly) reported as large, at least one of the following two conditions must be true: (i) $y\in S^-_{1/3}$ or (ii) the comparison between $x$ and $y$ returned the wrong result. The first condition is true with probability at most $\frac{n/3-1}{n-1}\le\frac{1}{3}$ while the second condition is true with probability at most $p\le\frac{1}{16}$. Therefore the probability that $x$ is reported as large is at most $\frac{1}{3}+\frac{1}{16}<\frac{2}{5}$. If $x\in S^+_{3/4}$ then, in order for $x$ to be (incorrectly) reported as small, we must have that (i) $y\in S^+_{3/4}$ or (ii) the comparison between $x$ and $y$ returned the wrong result. The first condition is true with probability at most $\frac{n-3n/4-1}{n-1}\le\frac{1}{4}$ while the second condition is true with probability at most $p\le\frac{1}{16}$. Overall, $x$ is reported as small with probability at most $\frac{1}{4}+\frac{1}{16}<\frac{2}{5}$.

Analysis of the Expected Number of Comparisons and of the Success Probability of GeometricTest
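A single-comparison oracle of this kind can be sketched as follows. The helper name `make_oracle` is hypothetical, the elements of `S` are assumed to be distinct, and each simulated comparison is assumed to be wrong with probability exactly `p`.

```python
import random

def make_oracle(S, p, rng=random):
    """Build an oracle that answers each query with one noisy comparison."""
    if len(S) < 2:
        raise ValueError("need at least two elements")

    def oracle(x):
        y = rng.choice(S)
        while y == x:              # resample until y lies in S \ {x}
            y = rng.choice(S)
        truth = x < y
        flipped = rng.random() < p  # simulated comparison fault
        return truth != flipped     # True means "x is reported as small"

    return oracle
```

With `p = 1/16` and `S = list(range(100))`, an element such as 10 is reported as small far more often than 3 times out of 5, while an element such as 90 is reported as small far less often than 2 times out of 5, in line with the two bounds derived above.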

The following lemmas respectively provide an upper bound on the expected number of comparisons and a lower bound on the success probability of GeometricTest.

Lemma 4

GeometricTest performs O(log1q) comparisons in expectation.

Proof

Consider a generic phase $i$. Assuming that the algorithm did not stop during phases $1,2,\dots,i-1$, the probability that it stops during phase $i$ is at least:

$$\Pr(x_i\in S^-_{1/3})\cdot\Pr(\text{the test on }x_i\text{ succeeds}\mid x_i\in S^-_{1/3})\ge\frac{1}{3}\cdot\Big(1-\frac{q}{2^i}\Big)\ge\frac{1}{6},$$

where we used the fact that a test on an element from $S^-_{1/3}$ performed during the $i$-th phase succeeds with probability at least $1-e^{-\lceil\ln\frac{2^i}{q}\rceil}\ge 1-\frac{q}{2^i}\ge\frac{1}{2}$, as shown by Lemma 1. Then, the number of phases executed by the algorithm is stochastically dominated by a geometric random variable $X$ with parameter $\frac{1}{6}$ (see [28, Definition 2.8]).

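Since, for some constant $\kappa>0$, at most $\kappa\ln\frac{2^i}{q}$ comparisons are performed during phase $i$,$^6$ we have that the overall number $C_i$ of comparisons performed during phases $1,\dots,i$ is upper bounded by $\sum_{j=1}^{i}\kappa\ln\frac{2^j}{q}\le\kappa\log\frac{2}{q}\cdot\sum_{j=1}^{i}j\le i^2\kappa\log\frac{2}{q}$. Then, the expected number of comparisons performed by GeometricTest is at most:

$$\sum_{i=1}^{\infty}\Pr(X=i)\,C_i\le\sum_{i=1}^{\infty}\frac{1}{6}\Big(\frac{5}{6}\Big)^{i-1}i^2\kappa\log\frac{2}{q}=\frac{\kappa}{5}\log\frac{2}{q}\cdot\sum_{i=1}^{\infty}i^2\Big(\frac{5}{6}\Big)^{i}=66\kappa\log\frac{2}{q}=O\Big(\log\frac{1}{q}\Big),$$

where we used the equality $\sum_{i=1}^{\infty}i^2(5/6)^i=330$ which follows from the more general identity $\sum_{i=1}^{\infty}i^2r^i=\frac{r^2+r}{(1-r)^3}$ for $r\in(0,1)$ [13].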
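The arithmetic behind the constants 330 and 66 can be checked numerically. The helper names `closed_form` and `partial_sum` below are illustrative, and the truncation of the infinite series at a finite number of terms is an assumption of this sketch.

```python
from fractions import Fraction

def closed_form(r):
    """Value of (r^2 + r) / (1 - r)^3, the claimed sum of i^2 * r^i over i >= 1."""
    return (r * r + r) / (1 - r) ** 3

def partial_sum(r, terms):
    """Sum of i^2 * r^i for i = 1, ..., terms."""
    return sum(i * i * r ** i for i in range(1, terms + 1))
```

Evaluating `closed_form(Fraction(5, 6))` gives exactly 330, and multiplying by $\frac{1}{5}$ gives the factor 66 that appears in the bound; a floating-point partial sum with a couple of thousand terms agrees with the closed form to many decimal places.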

Lemma 5

GeometricTest solves FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-\frac{q}{4}$.

Proof

If GeometricTest fails, then either it does not terminate or it returns an element in $S^+_{3/4}$. Since it is easy to see that the algorithm terminates almost surely,$^7$ we can focus on upper bounding the probability $\rho_i$ that the algorithm terminates at the end of a generic phase $i$ by returning an element in $S^+_{3/4}$. In order for this to happen we must have that (i) $x_i$ is large and (ii) $x_i$ was reported as small by at least $c_O\lceil\ln\frac{2^i}{q}\rceil+1$ of the $2c_O\lceil\ln\frac{2^i}{q}\rceil+1$ queries to $O$. The probability of (i) is at most $\frac{1}{4}$, and by Lemma 1, the probability of (ii) given (i) is at most $e^{-\lceil\ln\frac{2^i}{q}\rceil}\le\frac{q}{2^i}$, implying that $\rho_i\le\frac{1}{4}\cdot\frac{q}{2^i}=\frac{q}{2^{i+2}}$. We can now use the union bound over the different phases to upper bound the overall failure probability with $\sum_{i=1}^{\infty}\rho_i\le\sum_{i=1}^{\infty}\frac{q}{2^{i+2}}=\frac{q}{4}$.

Combining Lemmas 4 and 5 we can conclude that GeometricTest solves FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-q$ using $O(\log\frac{1}{q})$ comparisons in expectation. Lemma 3 immediately implies the following theorem:

Theorem 2

FT-MIN(k) can be solved with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation.

We conclude this section by pointing out that, since each phase of GeometricTest has a probability of at least $\frac{1}{6}$ of returning a small element (as shown in the proof of Lemma 4), we can consider the variant obtained by running GeometricTest for up to $8\log\frac{1}{q}$ phases (if the number of phases is exceeded we return a random element). We name this variant TruncatedGeometricTest. We can upper bound the worst-case number of comparisons performed by TruncatedGeometricTest with $O(\sum_{i=1}^{8\log(1/q)}\log\frac{2^i}{q})=O(\log^2\frac{1}{q})$ and lower bound the success probability with $1-\frac{q}{4}-(\frac{5}{6})^{8\log\frac{1}{q}}\ge 1-\frac{q}{4}-(\frac{5}{6})^{4}\cdot(\frac{5}{6})^{4\log\frac{1}{q}}>1-\frac{q}{4}-\frac{q}{2}>1-q$. Combining TruncatedGeometricTest with Lemma 3, we obtain an algorithm that solves FT-MIN(k) with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation and $O(\frac{n}{k}\log\frac{1}{q}+\log^2\frac{1}{q})$ comparisons in the worst case. This latter algorithm uses the same worst-case number of comparisons as the one that can be obtained by combining FindMin with Lemma 3, but it uses fewer comparisons in expectation (see the discussion at the end of Sect. 3 for details). In the next section we will show how to reduce the asymptotic number of comparisons needed in the worst case.

Solving FT-MIN(k) Using an Almost-Optimal Number of Comparisons in the Worst Case

In this section, we solve FT-MIN(k) with a success probability of at least $1-q$ using $O((\frac{n}{k}+\log\log\frac{1}{q})\log\frac{1}{q})$ comparisons in the worst case. For the sake of simplicity, we assume that $n$ is a power of two.$^8$ We let $\rho\in(0,\frac{1}{2}]$ be a parameter that will be chosen later, and we design an algorithm that requires $O(n\cdot\log\frac{1}{\rho}\cdot(\log n+\log\frac{1}{\rho}))$ comparisons to solve FT-MIN($\frac{3}{4}n$) with a success probability of at least $1-\rho^n$.

Our algorithm simulates a knockout tournament and works in $\log n$ rounds. In the beginning we construct a set $S_0=\{x_1^{(0)},x_2^{(0)},\dots,x_n^{(0)}\}$ containing $n$ elements from $S$, where each $x_j^{(0)}$ is obtained by running FindMin with a failure probability of $\varrho=\frac{\rho^2}{2}$ on a (multi)set $X_j$ of $2\log\frac{1}{\rho}$ elements randomly sampled with replacement from $S$. Then, in the generic $i$-th round we match together $\frac{|S_{i-1}|}{2}=\frac{n}{2^i}$ pairs of elements from the set $S_{i-1}$, and we add the match winners to a new set $S_i=\{x_1^{(i)},x_2^{(i)},\dots,x_{n/2^i}^{(i)}\}\subset S_{i-1}$. Specifically, for each $j=1,\dots,\frac{n}{2^i}$ we run a match between $x_{2j-1}^{(i-1)}$ and $x_{2j}^{(i-1)}$ consisting of $2c_p\lceil 2^i\ln\frac{1}{\rho}\rceil+3$ comparisons, where $c_p$ is the constant of Lemma 1. The winner $x_j^{(i)}$ of the match is the element that is reported to be smaller by the majority of the comparisons.

After the $(\log n)$-th round we are left with a set $S_{\log n}$ containing a single element: this element is the winner of the tournament, i.e., it is the element returned by our algorithm. The above algorithm can be visualized as a complete binary tree of height $\log n$ in which the leaves are the elements in $S_0$, the root is the unique element in $S_{\log n}$, and each internal vertex at depth $\log n-i$ represents some element $x_j^{(i)}\in S_i$ having $x_{2j-1}^{(i-1)}$ and $x_{2j}^{(i-1)}$ as its two children. See Fig. 1 for an example.

Fig. 1. An example of the complete binary tree representing an execution of our algorithm when the input sequence $S$ contains $n=8$ elements. Each internal vertex $x_j^{(i)}$ is the winner of a match (consisting of $2c_p\lceil 2^i\ln\frac{1}{\rho}\rceil+3$ comparisons) between its two children $x_{2j-1}^{(i-1)}$ and $x_{2j}^{(i-1)}$
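The tournament can be sketched as below. Several choices here are assumptions of this sketch and not confirmed details of the algorithm: the name `tournament` is hypothetical, `find_min` is a caller-supplied stand-in for FindMin, `less(a, b)` is a caller-supplied (possibly noisy) comparison, the logarithm in the leaf sample size is taken in base 2, and both the sample size and the per-match repetition count are rounded up to integers, with the repetition count taken as it reads in the text.

```python
import math
import random

def tournament(S, rho, less, find_min, c_p, rng=random):
    """Run a knockout tournament on n = len(S) leaves (n a power of two)."""
    n = len(S)
    sample_size = math.ceil(2 * math.log2(1 / rho))
    # Leaves: each is the reported minimum of a small random sample of S.
    current = [find_min([rng.choice(S) for _ in range(sample_size)])
               for _ in range(n)]
    i = 1
    while len(current) > 1:
        # The count is odd, so every match has a strict majority.
        reps = 2 * c_p * math.ceil(2 ** i * math.log(1 / rho)) + 3
        winners = []
        for j in range(0, len(current), 2):
            a, b = current[j], current[j + 1]
            a_wins = sum(1 for _ in range(reps) if less(a, b))
            winners.append(a if 2 * a_wins > reps else b)
        current = winners
        i += 1
    return current[0]
```

Calling it with `less=lambda a, b: a < b` and `find_min=min` gives the fault-free special case, where the winner is simply the smallest leaf.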

As in the previous section, we will say that an element is small if it is among the $\frac{3}{4}n$ smallest elements of $S$, and large otherwise. The following lemma provides a lower bound on the success probability of our algorithm:

Lemma 6

Consider a tournament among $n$ elements, where $n$ is a power of 2. The probability that the winner of the tournament is a small element is at least $1-\rho^{n+1}$.

Proof

We prove by induction on $i=0,\dots,\log n$ that, $\forall x_j^{(i)}\in S_i$, $\Pr(x_j^{(i)}\text{ is large})\le\rho^{2^i+1}$.

We start by considering the base case $i=0$. Each element $x_j^{(0)}\in S_0$ is obtained by running FindMin on a (multi)set $X_j$ of $2\log\frac{1}{\rho}$ elements sampled, with replacement, from $S$. In order for $x_j^{(0)}$ to be large, at least one of the following conditions must be true: (i) the execution of FindMin on $X_j$ fails, which happens with probability at most $\varrho=\frac{\rho^2}{2}$, or (ii) all elements in $X_j$ are large. Since each element in $X_j$ is independently large with probability at most $\frac{1}{4}$, the probability of (ii) is at most $(\frac{1}{4})^{2\log\frac{1}{\rho}}=\rho^4$. Using the union bound and $\rho\le\frac{1}{2}$, we have that $x_j^{(0)}$ is large with probability at most $\rho^4+\frac{\rho^2}{2}\le\frac{\rho^2}{4}+\frac{\rho^2}{2}<\rho^2=\rho^{2^i+1}$.

We now consider i1 and we show that if the induction claim holds for i-1 then it must also hold for i. Since i1, we know that each element xj(i)Si is the winner of a match between the elements x2j-1(i-1) and x2j(i-1) in Si-1, each of which is large with probability at most ρ2i-1+1 by the induction hypothesis. Moreover, each xj(i-1) is chosen as function of a collection C(i-1,j)={xh(0)j2i-1h<(j+1)2i-1} of elements from S0 along with all the outcomes of their pairwise comparisons performed during phases 1,,i-1. Since C(i-1,2j-1) and C(i-1,2j) are disjoint subsets of S0, and the elements xh(0) in S0 are chosen using independent executions of FindMin (on independently chosen subsets Xh of S), we have that the events “x2j-1(i-1) is large” and “x2j(i-1) is large” are also independent. For xj(i) to be large either (i) x2j-1(i-1) and x2j(i-1) are both large, which happens with probability at most (ρ2i-1+1)2=ρ2i+2, or (ii) exactly one of x2j-1(i-1) and x2j(i-1) is large and it wins the match in phase i. The probability that exactly one of x2j-1(i-1) and x2j(i-1) is large can be upper bounded by the probability that at least one of x2j-1(i-1) and x2j(i-1) is large, which is at most 2·ρ2i-1+12ρ2 by induction hypothesis. We hence focus on the probability that, in a match between a large and a small element, the large element wins. Since x2j-1(i-1) and x2j(i-1) are compared 2cp2iln1ρ+3 times during the match, Lemma 1 ensures that this probability is at most e-2iln1ρ-1=ρ2i+1. Putting it all together, we have:

Pr(xj(i)islarge)ρ2i+2+2ρ2·ρ2i+1=(ρ+2ρ2)ρ2i+112+24ρ2i+1=ρ2i+1.

This completes the proof by induction and shows that the winner of the tournament (i.e., the sole element in Slogn) is large with probability at most ρ2logn+1=ρn+1 (and small with probability at least 1-ρn+1).
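The two arithmetic steps of the induction (the base case $\rho^4 + \frac{\rho^2}{2} < \rho^2$ and the inductive bound $\rho^{2^i+2} + 2\rho^2 \cdot \rho^{2^i+1} \le \rho^{2^i+1}$) can be spot-checked numerically for $\rho \le \frac{1}{2}$. The sketch below only evaluates the expressions that appear in the proof; the function names are hypothetical, and a finite grid of values is of course not a substitute for the proof.

```python
def base_case_bound(rho):
    """Union bound used for i = 0: all sampled elements large, or FindMin fails."""
    return rho**4 + rho**2 / 2

def induction_step_bound(rho, i):
    """Right-hand side of the inductive step for round i >= 1."""
    both_large = rho ** (2**i + 2)   # both children are large
    one_large = 2 * rho**2           # at least one child is large (upper bound)
    upset = rho ** (2**i + 1)        # bound used in the proof for the large element winning
    return both_large + one_large * upset

def check_bounds(rhos=(0.5, 0.3, 0.1, 0.01), max_round=5):
    """Return True if both bounds hold on the given grid of rho values."""
    for rho in rhos:
        if not base_case_bound(rho) < rho**2:
            return False
        for i in range(1, max_round + 1):
            if not induction_step_bound(rho, i) <= rho ** (2**i + 1):
                return False
    return True
```

For $\rho = \frac{1}{2}$ the inductive bound holds with equality, which matches the factor $\frac{1}{2} + \frac{2}{4} = 1$ in the displayed chain.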

We now analyze the number of comparisons performed by our algorithm.

Lemma 7

Simulating the tournament requires $O(n \cdot \log n \cdot \log\frac{1}{\rho} + n \cdot \log^2\frac{1}{\rho})$ comparisons in the worst case.

Proof

The initial selection of the elements in $S_0$ requires $O(n \cdot \log\frac{1}{\rho} \cdot \log\frac{1}{\varrho}) = O(n \cdot \log^2\frac{1}{\rho})$ comparisons (recall that we choose $\varrho = \frac{\rho^2}{2}$). The tournament itself consists of $\log n$ rounds. The number of matches that take place in round $i$ is $\frac{n}{2^i}$ and, for each match, $O(2^i \log\frac{1}{\rho})$ comparisons are needed. It follows that the total number of comparisons performed in each round is $O(n \log\frac{1}{\rho})$ and, since there are $\log n$ rounds, the overall number of comparisons in rounds $1$ to $\log n$ is $O(n \cdot \log n \cdot \log\frac{1}{\rho})$.
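The per-round accounting can be made concrete with a short computation. The sketch below counts only the comparisons of rounds $1$ to $\log n$ (the cost of building $S_0$ with FindMin is not included), using a placeholder value for $c_p$ and an integer rounding of the match length; both are assumptions, and the function name is hypothetical.

```python
import math

def tournament_comparisons(n, rho, c_p=1.0):
    """Total comparisons in rounds 1..log n for n leaves (n a power of 2).

    c_p is a placeholder for the constant of Lemma 1 (assumption).
    """
    rounds = int(math.log2(n))
    total = 0
    for i in range(1, rounds + 1):
        matches = n // 2**i
        per_match = 2 * math.ceil(c_p * 2**i * math.log(1 / rho)) + 3
        total += matches * per_match
    return total
```

Since the number of matches halves while the match length roughly doubles, each round costs about the same, which is where the $\log n$ factor in the bound comes from.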

If we now select $\rho = \min\{\frac{1}{2}, q^{\frac{1}{n}}\}$, we obtain an algorithm for FT-MIN($\frac{3}{4}n$) that performs $O(n \log n + \log n \cdot \log\frac{1}{q} + \frac{\log^2\frac{1}{q}}{n})$ comparisons in the worst case and has a success probability of at least $1 - \rho^{n+1} \ge 1 - \rho \cdot (q^{\frac{1}{n}})^n = 1 - \rho \cdot q \ge 1 - \frac{q}{2}$. We can now use this algorithm in our reduction of Lemma 3 to immediately obtain an algorithm for FT-MIN(k) which is optimal for $k = O(\frac{n}{\log\log\frac{1}{q}})$ and $q$ bounded away from $q_{\mathrm{crit}}$ (see Theorem 5 in Sect. 6).
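As a quick sanity check of this parameter choice, the following sketch evaluates $\rho = \min\{\frac{1}{2}, q^{1/n}\}$ and the resulting failure bound $\rho^{n+1}$ for a few values of $n$ and $q$. The function names are hypothetical and the check is purely numerical.

```python
def choose_rho(n, q):
    """Parameter choice rho = min{1/2, q^(1/n)}."""
    return min(0.5, q ** (1.0 / n))

def failure_bound(n, q):
    """Upper bound rho^(n+1) on the probability that the tournament winner is large."""
    rho = choose_rho(n, q)
    return rho ** (n + 1)

def check_failure_bound(ns=(2, 16, 1024), qs=(1e-6, 0.01, 0.3, 0.9)):
    """Return True if rho^(n+1) <= q/2 (up to rounding) on the given grid."""
    return all(failure_bound(n, q) <= q / 2 + 1e-12 for n in ns for q in qs)
```

Both branches of the minimum give the same conclusion: one factor of $\rho$ is at most $\frac{1}{2}$ and the remaining $\rho^n$ is at most $q$.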

Theorem 3

FT-MIN(k) can be solved with a success probability of at least $1-q$ using $O(\frac{n}{k}\log\frac{1}{q} + (\log\frac{1}{q})\log\log\frac{1}{q})$ comparisons in the worst case.

We can combine this algorithm with GeometricTest (described in Sect. 4) to solve FT-MIN(k) with a success probability of at least $1-q$ using both $O(\frac{n}{k}\log\frac{1}{q})$ comparisons in expectation and $O(\frac{n}{k}\log\frac{1}{q} + (\log\frac{1}{q})\log\log\frac{1}{q})$ comparisons in the worst case. In order to do so, we simply run the two algorithms in parallel until one of them terminates. Clearly, the expected number of comparisons is asymptotically unaffected, while the probability that this combined algorithm fails can be upper bounded by the sum of the respective failure probabilities, i.e., by at most $\frac{q}{4} + \frac{q}{2} < q$ (recall that GeometricTest fails with probability at most $\frac{q}{4}$, as shown by Lemma 5).
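Running two algorithms "in parallel" on a sequential machine can be done by alternating their steps. The sketch below shows one way to do this with Python generators, assuming each algorithm is written as a generator that yields once per comparison and returns its answer; `run_in_parallel` and `fake_algorithm` are hypothetical names, and the stand-in algorithms perform no real comparisons.

```python
def run_in_parallel(alg_a, alg_b):
    """Alternate single steps of two generator-based algorithms.

    Each generator is assumed to yield once per comparison and to return its
    answer when done. Returns (answer, total number of steps executed).
    """
    steps = 0
    while True:
        for alg in (alg_a, alg_b):
            try:
                next(alg)
                steps += 1
            except StopIteration as done:
                return done.value, steps

def fake_algorithm(comparisons, answer):
    """Stand-in algorithm that 'performs' a fixed number of comparisons."""
    for _ in range(comparisons):
        yield
    return answer
```

If the faster algorithm stops after $m$ comparisons, the interleaving executes at most $2m + 1$ steps overall, so the combined cost is within a constant factor of the smaller of the two.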

Lower Bound

The rest of the paper is devoted to proving our lower bound of $\Omega(\frac{n}{k}\log\frac{1}{q})$ on the expected number of comparisons required to solve FT-MIN(k) with a success probability of at least $1-q$. We prove our lower bound using three different strategies depending on the values of $k$ and $q$. Figure 2 shows a qualitative representation of the considered regions.

Fig. 2. A qualitative representation of the different ranges of the parameters $q$ and $k$ (as a fraction of $n$) handled by Lemmas 8, 10, and 11. The figure is not to scale.

We start by considering what we deem to be the most interesting case, namely the one in which $n \ge 2k$ and $q$ can be upper bounded by a small enough constant. For simplicity we pick this constant to be $\frac{1}{4}$, although the same proof strategy actually works for any $q \le \frac{1}{2} - \varepsilon$, where $\varepsilon > 0$ is a constant of choice (footnote 9). We will show that any algorithm that is able to solve FT-MIN(k) with a success probability of at least $1-q$ can also be used to solve FT-MIN(1) with the same success probability. Then, our lower bound for FT-MIN(k) follows from the fact that any algorithm that solves FT-MIN(1) with a success probability of at least $1-q \in [\frac{3}{4}, 1]$ must perform $\Omega(\log\frac{1}{q})$ comparisons in expectation. This is formalized in the following theorem, whose proof is given in Appendix A and is similar to the one used in [14, Theorem 2.1] to establish a lower bound on the worst-case number of comparisons needed to solve FT-MIN(1) (footnote 10).

Theorem 4

Let $A$ be an algorithm that solves FT-MIN(1) with a probability of success of at least $1-q \in (\frac{1}{2}, 1]$. For any $n \ge 2$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is at least $\gamma n \log\frac{1}{2q}$, where $\gamma > 0$ is a constant that depends only on $p$.

We are now ready to prove our lower bound for FT-MIN(k) when $n \ge 2k$ and $q \le \frac{1}{4}$.

Lemma 8

Let $n \ge 2k$ and $q \le \frac{1}{4}$. For every algorithm $A$ that solves FT-MIN(k) with a success probability of at least $1-q$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is larger than $\psi_1 \frac{n}{k}\log\frac{1}{q}$, where $\psi_1 > 0$ is a constant that depends only on $p$.

Proof

Let $\gamma$ be the constant from Theorem 4, choose $\psi_1 = \frac{\gamma}{4}$, and suppose towards a contradiction that there is an algorithm $A$ that is able to solve FT-MIN(k) with a success probability of at least $1-q$ using an expected number of comparisons of at most $\psi_1 \frac{n}{k}\log\frac{1}{q}$ on every instance of $n \ge 2k$ elements. We will show that the existence of $A$ implies the existence of an algorithm $A'$ that is able to solve FT-MIN(1) with a success probability of at least $1-q$ on any instance of $n' = \lfloor\frac{n}{k}\rfloor \ge 2$ elements using fewer than $\gamma n' \log\frac{1}{2q}$ comparisons in expectation, thus contradicting Theorem 4.

Algorithm $A'$ works as follows: given an instance $S'$ of FT-MIN(1) with $n'$ elements, $A'$ constructs an instance $S$ of FT-MIN(k) that consists of $k$ copies of each element in $S'$ and of $n - k\lfloor\frac{n}{k}\rfloor < k$ copies of an arbitrary element from $S'$. Then, $A'$ runs $A$ on $S$ (which contains exactly $n$ elements) and outputs the element $x$ returned by the execution of $A$. With probability at least $1-q$, $x$ is among the $k$ smallest elements of $S$, implying that it is (a copy of) the smallest element of $S'$.

To conclude the proof, it suffices to notice that the expected number of comparisons performed by $A'$ on $S'$ is upper bounded by the expected number of comparisons performed by $A$ on $S$, i.e., it is at most

$$\psi_1 \frac{n}{k}\log\frac{1}{q} = \frac{\gamma}{2} \cdot \frac{n}{2k}\log\frac{1}{q} \le \gamma \frac{n}{2k}\log\frac{1}{2q} \le \gamma\left(\frac{n}{k} - 1\right)\log\frac{1}{2q} < \gamma n' \log\frac{1}{2q}.$$
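The instance built by the reduction in this proof is easy to write down explicitly. The following sketch constructs $S$ from $S'$ by replicating each element $k$ times and padding with copies of one element; choosing the first element of $S'$ as the "arbitrary" padding element is our assumption, and `build_instance` is a hypothetical name.

```python
def build_instance(s_prime, n, k):
    """Build the FT-MIN(k) instance S of size n from the FT-MIN(1) instance S'.

    Each element of S' is copied k times; the remaining n - k*n' < k slots are
    filled with copies of s_prime[0] (an arbitrary choice, assumption).
    """
    n_prime = len(s_prime)
    assert n >= 2 * k and n_prime == n // k
    padding = n - k * n_prime
    s = [x for x in s_prime for _ in range(k)]
    s += [s_prime[0]] * padding
    return s
```

Whatever element is used for padding, the $k$ smallest elements of $S$ are all copies of the minimum of $S'$, which is what lets an answer for FT-MIN(k) be read as an answer for FT-MIN(1).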

We now turn our attention to the ranges of $k$ and $q$ that are not covered by Lemma 8. Recall that for $q \ge q_{\mathrm{crit}} = \frac{n-k}{n}$, no lower bound exists since FT-MIN(k) can be solved without performing any comparison by simply returning an element chosen uniformly at random from $S$. In the rest of this section we will consider values of $q$ that are bounded away from $q_{\mathrm{crit}}$, namely we assume the existence of some constant $\alpha > 0$ for which $0 < q \le \frac{n-k}{n} - \alpha$. As a consequence, we can prove a preliminary lower bound of $\alpha$ on the expected number of comparisons needed to solve FT-MIN(k) with a success probability of at least $1-q$. This lower bound will be useful to handle some corner cases in the sequel.

Lemma 9

Let $n > k$ and $q \le \frac{n-k}{n} - \alpha$. For every algorithm $A$ that solves FT-MIN(k) with a success probability of at least $1-q$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is at least $\alpha$.

Proof

Suppose towards a contradiction that there is an algorithm $A$ that is able to solve FT-MIN(k) with a success probability of at least $1-q$ using an expected number of comparisons smaller than $\alpha$ on every instance of $n > k$ elements. Then, an execution of $A$ performs no comparisons with a probability larger than $1-\alpha$. When the input of $A$ is a random permutation of $\langle 1, 2, \dots, n \rangle$, the failure probability must be larger than $(1-\alpha)\frac{n-k}{n} = \frac{n-k}{n} - \frac{\alpha(n-k)}{n} > \frac{n-k}{n} - \alpha \ge q$. This implies the existence of at least one instance of $n$ elements for which $A$ fails with a probability larger than $q$, yielding the sought contradiction.

We now handle the case $n < 2k$ and $q \le \frac{1}{4}$. We will consider a suitable set of instances ensuring that any algorithm having a success probability of at least $1-q$ must perform at least $\psi_2 \log\frac{1}{q}$ comparisons on at least one instance in the set, for some constant $\psi_2 > 0$. Notice that, in this case, $\psi_2 \log\frac{1}{q} > \frac{\psi_2}{2} \cdot \frac{n}{k}\log\frac{1}{q}$.

Lemma 10

Let $k < n < 2k$ and $q \le \min\{\frac{1}{4}, \frac{n-k}{n} - \alpha\}$. For every algorithm $A$ that solves FT-MIN(k) with a success probability of at least $1-q$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is at least $\psi_2 \log\frac{1}{q}$, where $\psi_2 > 0$ is a constant that depends only on $\alpha$ and $p$.

Proof

Let $A$ be an algorithm that solves FT-MIN(k) with a success probability of at least $1-q \ge \frac{3}{4}$ using at most $\mu$ comparisons in expectation on every instance of $n$ elements, where $k < n < 2k$.

Let $\beta = \frac{3\alpha}{8(\alpha+1)}$, notice that $\beta \in (0,1)$ is a constant as it only depends on $\alpha$, and define $\psi_2 = \min\left\{\frac{\alpha}{2\log\frac{1}{\beta}}, \frac{1}{8\log\frac{1-p}{p}}\right\}$. We only need to consider $q < \beta^2$ since, when $q \ge \beta^2$, Lemma 9 already ensures that $\mu \ge \alpha \ge \frac{\alpha}{2\log\frac{1}{\beta}}\log\frac{1}{q} \ge \psi_2 \log\frac{1}{q}$.

In the rest of the proof we will consider $\eta + 1$ sequences $I_0, I_1, \dots, I_\eta$ having $n$ elements each, where $\eta = \lceil\frac{k}{n-k}\rceil$. Then, we will lower bound $\mu$ by considering the expected number of comparisons needed by $A$ to solve each instance $I_i$ with a success probability of at least $1-q$.

We start by defining $I_0 = \langle 1, 2, \dots, n \rangle$ and, for $i = 1, \dots, \eta$, we let $I_i$ be the sequence obtained by performing $(n-k) \cdot i$ consecutive right rotations on $I_0$ (footnote 11). See Fig. 3 for an example. We denote by $x_j^{(i)}$ the element that appears in the $j$-th position of sequence $I_i$. We will use $A(I_i)$ to refer to an execution of $A$ with input $I_i$, and $A(I_i) = x_j^{(i)}$ to denote the event "$A(I_i)$ returns element $x_j^{(i)}$". Moreover, we let $F_i$ be the event "$A(I_i)$ performs at most $4\mu$ comparisons". Since the expected number of comparisons of $A(I_i)$ is at most $\mu$, the Markov inequality implies that $A(I_i)$ performs more than $4\mu$ comparisons with a probability of at most $\frac{1}{4}$, i.e., $\Pr(F_i) \ge \frac{3}{4}$.

Fig. 3. An example of the input sequences $I_0, \dots, I_\eta$ used in the proof of Lemma 10 for $n = 14$ and $k = 10$ (in this case $\eta = 3$). The positions in $L_1, \dots, L_\eta$ have a white background, while those containing the $k$ smallest elements of each sequence have a gray background (since $k > \frac{n}{2}$, the gray intervals of any two sequences must overlap). Each position $j \in \{1, \dots, k\}$ contains one of the smallest $k$ elements in at least one of the sequences. In particular, our choice of $j^*$ ensures that $x_j^{(j^*)}$ is not among the $k$ smallest elements of $I_{j^*}$, i.e., $j \in L_{j^*}$. The element $x_j^{(j^*)}$ for $j = 6$ is highlighted in bold (in this case $j^* = \lceil 6/(14-10) \rceil = 2$).
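The rotated instances of Fig. 3 can be reproduced with a few lines of Python. The sketch below builds $I_0, \dots, I_\eta$, computes for each sequence the set of positions holding elements that are not among the $k$ smallest (written $L_i$ in the proof), and lets one check that position $j$ falls in $L_{j^*}$ for $j^* = \lceil j/(n-k) \rceil$. All function names are hypothetical.

```python
def rotate_right(seq, r):
    """Apply r right rotations: each one moves the last element to the front."""
    r %= len(seq)
    return seq[-r:] + seq[:-r] if r else list(seq)

def instances(n, k):
    """Sequences I_0, ..., I_eta, where I_i is I_0 after (n-k)*i right rotations."""
    eta = -(-k // (n - k))  # integer ceiling of k / (n - k)
    base = list(range(1, n + 1))
    return [rotate_right(base, (n - k) * i) for i in range(eta + 1)]

def large_positions(seq, k):
    """1-based positions whose element is NOT among the k smallest (elements are 1..n)."""
    return {pos for pos, x in enumerate(seq, start=1) if x > k}

def j_star(j, n, k):
    """Index of the rotated instance in which position j holds a large element."""
    return -(-j // (n - k))
```

For $n = 14$ and $k = 10$ this yields four sequences, and the large positions of $I_1$, $I_2$, $I_3$ are the consecutive blocks $\{1,\dots,4\}$, $\{5,\dots,8\}$, $\{9,\dots,12\}$.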

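Given any input sequence of $n$ elements and $j \in \{1, \dots, n\}$, we can encode an execution of $A$ that returns the $j$-th element of the input sequence with a pair $(C, R)$ where $C$ is the list of the observed comparison outcomes, and $R$ is the list of all random choices taken by $A$. Let $\Pr((C,R) \mid I_i)$ be the probability that the execution $(C, R)$ of $A$ is realized when the input sequence is $I_i$. Moreover, let $\Pr(R \mid C)$ denote the probability that the random choices of an execution of $A$ are exactly those in $R$, given that the observed comparison results match those in $C$. Consider the case in which the probability of each comparison error is exactly $p$. Then, the probability of observing the comparison outcomes in $C$ when $A$'s input is $I_0$ (resp. $I_i$) is at most $(1-p)^{|C|}$ (resp. at least $p^{|C|}$), allowing us to relate $\Pr((C,R) \mid I_0)$ and $\Pr((C,R) \mid I_i)$ as follows:

$$\Pr((C,R) \mid I_0) \le (1-p)^{|C|} \Pr(R \mid C) = \left(\frac{1-p}{p}\right)^{|C|} p^{|C|} \Pr(R \mid C) \le \left(\frac{1-p}{p}\right)^{|C|} \Pr((C,R) \mid I_i).$$

By summing the above inequality over all the choices of $(C, R)$ for which $A$ returns the $j$-th input element and such that $|C| \le 4\mu$, we obtain $\Pr(A(I_0) = x_j^{(0)} \mid F_0) \le \left(\frac{1-p}{p}\right)^{4\mu} \Pr(A(I_i) = x_j^{(i)} \mid F_i)$. Let $L_i$ be the set of all indices $h \in \{1, \dots, n\}$ such that $x_h^{(i)}$ is not among the $k$ smallest elements of $I_i$. Moreover, for $j \in \{1, \dots, k\}$, define $j^* = \lceil\frac{j}{n-k}\rceil$ and notice that $j \in L_{j^*}$ (see Fig. 3). Then:

$$\Pr(A(I_0) \text{ succeeds} \mid F_0) = \sum_{j=1}^{k} \Pr(A(I_0) = x_j^{(0)} \mid F_0) \le \left(\frac{1-p}{p}\right)^{4\mu} \sum_{j=1}^{k} \Pr(A(I_{j^*}) = x_j^{(j^*)} \mid F_{j^*}) \tag{1}$$

$$\le \left(\frac{1-p}{p}\right)^{4\mu} \sum_{i=1}^{\eta} \sum_{h \in L_i} \Pr(A(I_i) = x_h^{(i)} \mid F_i) = \left(\frac{1-p}{p}\right)^{4\mu} \sum_{i=1}^{\eta} \Pr(A(I_i) \text{ fails} \mid F_i), \tag{2}$$

where the second inequality follows from the fact that the generic $j$-th summand in (1) appears in the double sum of (2) when $i = j^*$ and $h = j \in L_{j^*}$. Since we know that, for all $i = 0, \dots, \eta$, $A(I_i)$ fails with probability at most $q \le \frac{1}{4}$, we have:

$$\eta q \ge \sum_{i=1}^{\eta} \Pr(A(I_i) \text{ fails}) \ge \sum_{i=1}^{\eta} \Pr(A(I_i) \text{ fails} \wedge F_i) = \sum_{i=1}^{\eta} \Pr(A(I_i) \text{ fails} \mid F_i)\Pr(F_i) \ge \frac{3}{4}\sum_{i=1}^{\eta} \Pr(A(I_i) \text{ fails} \mid F_i) \ge \frac{3}{4}\left(\frac{p}{1-p}\right)^{4\mu} \Pr(A(I_0) \text{ succeeds} \mid F_0) \ge \frac{3}{4}\left(\frac{p}{1-p}\right)^{4\mu} \Pr(A(I_0) \text{ succeeds} \wedge F_0) \ge \frac{3}{8}\left(\frac{p}{1-p}\right)^{4\mu}.$$

The above inequality yields $\mu \ge \frac{\log\frac{3}{8\eta q}}{4\log\frac{1-p}{p}}$, which can be combined with $\eta \le 1 + \frac{n}{n-k} \le 1 + \frac{1}{q+\alpha} \le 1 + \frac{1}{\alpha} = \frac{\alpha+1}{\alpha}$ to obtain the sought lower bound on $\mu$:

$$\mu \ge \frac{\log\frac{3}{8\eta q}}{4\log\frac{1-p}{p}} \ge \frac{\log\frac{3\alpha}{8(\alpha+1)q}}{4\log\frac{1-p}{p}} = \frac{\log\frac{\beta}{q}}{4\log\frac{1-p}{p}} = \frac{\frac{1}{2}\log\beta^2 + \log\frac{1}{q}}{4\log\frac{1-p}{p}} > \frac{\frac{1}{2}\log q + \log\frac{1}{q}}{4\log\frac{1-p}{p}} = \frac{\log\frac{1}{q}}{8\log\frac{1-p}{p}} \ge \psi_2 \log\frac{1}{q}.$$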

Finally, we consider the remaining case $q > \frac{1}{4}$. We show that the success probability of any algorithm that solves FT-MIN(k) can be boosted from $1-q$ to at least $\frac{3}{4}$ by running it multiple times and selecting the smallest returned element using FindMin. Then, the lower bound of either Lemma 8 or Lemma 10 applies.

Lemma 11

Let $n > k$ and $\frac{1}{4} < q \le \frac{n-k}{n} - \alpha$. For every algorithm $A$ that solves FT-MIN(k) with a probability of success of at least $1-q$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is at least $\psi_3 \frac{n}{k}\log\frac{1}{q}$, where $\psi_3 > 0$ is a constant that depends only on $\alpha$ and $p$.

Proof

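Let $c \ge 2$ be a constant such that FindMin with a failure probability of $\varrho = \frac{1}{8}$ requires at most $cm$ comparisons on any instance of $m$ elements (Theorem 1 ensures that such a constant exists). Let $A$ be an algorithm that solves FT-MIN(k) with a probability of error of at most $q$ using an expected number of comparisons of at most $\mu$ on every instance of $n$ elements.

Let $\psi_1$ and $\psi_2$ be the constants of Lemmas 8 and 10, respectively. Define $\psi = \min\{\psi_1, \frac{\psi_2}{2}\}$, $\beta = \frac{10}{\psi}\log^{-2}\frac{1}{1-\alpha}$, and $\psi_3 = \min\left\{\frac{\alpha}{2\beta c}, \frac{\psi}{10}\log\frac{1}{1-\alpha}\right\}$. We can restrict ourselves to the case $\frac{n}{k} > \beta c$ since otherwise we can use Lemma 9 and the inequality $\log\frac{1}{q} \le 2$ to write $\mu \ge \alpha \ge \frac{\alpha}{\beta c} \cdot \frac{n}{k} \ge \frac{\alpha}{2\beta c} \cdot \frac{n}{k}\log\frac{1}{q} \ge \psi_3 \frac{n}{k}\log\frac{1}{q}$.

We now describe an algorithm $A'$ that uses $A$ to solve FT-MIN(k) with a success probability of at least $\frac{3}{4}$. Given an instance $S$ of $n$ elements, $A'$ works as follows: first, it performs $t = \left\lceil\frac{3}{\log\frac{1}{q}}\right\rceil$ independent executions of $A$ with input $S$ and collects the returned elements into a set $S'$; then, it runs FindMin on $S'$ with a probability of error of $\varrho = \frac{1}{8}$, and answers with the returned element.

The expected number of comparisons of $A'$ is at most $ct + t\mu$, and the probability that no execution of $A$ returns one of the $k$ smallest elements is at most $q^t \le q^{\frac{3}{\log\frac{1}{q}}} = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$. By the union bound, the overall failure probability of $A'$ is at most $\frac{1}{8} + \frac{1}{8} = \frac{1}{4}$. By invoking either Lemma 8 or Lemma 10 (depending on the values of $k$ and $n$), we know that the expected number of comparisons of $A'$ must be at least $\min\{\psi_1, \frac{\psi_2}{2}\}\frac{n}{k}\log\frac{1}{q} = \psi\frac{n}{k}\log\frac{1}{q}$. In formulas:

$$tc + t\mu \ge \psi\frac{n}{k}\log\frac{1}{q}. \tag{3}$$

Since $\frac{3}{\log\frac{1}{q}} > \frac{3}{2}$, we have $t = \left\lceil\frac{3}{\log\frac{1}{q}}\right\rceil \le \frac{3}{\log\frac{1}{q}} + 1 < \left(1 + \frac{2}{3}\right)\frac{3}{\log\frac{1}{q}} = \frac{5}{\log\frac{1}{q}}$. Moreover, we know that $q \le \frac{n-k}{n} - \alpha < 1 - \alpha$. We can then solve (3) for $\mu$ and combine the result with the above inequalities:

$$\mu \ge \frac{\psi n}{t k}\log\frac{1}{q} - c > \frac{\psi n}{5k}\log^2\frac{1}{q} - c > \frac{\psi}{5}\log\frac{1}{1-\alpha} \cdot \frac{n}{k}\log\frac{1}{q} - c. \tag{4}$$

Using $q < 1-\alpha$ once more, together with $\frac{n}{k} > \beta c$, we have:

$$\frac{\psi}{5}\log\frac{1}{1-\alpha} \cdot \frac{n}{k}\log\frac{1}{q} > \frac{\psi n}{5k}\log^2\frac{1}{1-\alpha} = 2\beta^{-1}\frac{n}{k} > 2c,$$

which can be combined with (4) to show that $\mu > \frac{\psi}{10}\log\frac{1}{1-\alpha} \cdot \frac{n}{k}\log\frac{1}{q} \ge \psi_3 \frac{n}{k}\log\frac{1}{q}$.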
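The choice of the number of repetitions $t$ in this proof can be checked numerically. The sketch below computes $t = \lceil 3/\log\frac{1}{q} \rceil$ (binary logarithm, as in the rest of the paper) and the union-bound failure probability $q^t + \frac{1}{8}$ of the boosted algorithm; the function names are hypothetical and the check covers only a handful of values of $q > \frac{1}{4}$.

```python
import math

def repetitions(q):
    """Number t of independent runs used by the boosted algorithm."""
    return math.ceil(3 / math.log2(1 / q))

def boosted_failure_bound(q, findmin_failure=1 / 8):
    """Union bound: all t runs fail, or the final FindMin call fails."""
    t = repetitions(q)
    return q**t + findmin_failure

def check_boosting(qs=(0.26, 0.5, 0.75, 0.9, 0.99)):
    """Return True if q^t <= 1/8 and t < 5 / log(1/q) for every q in the grid."""
    for q in qs:
        t = repetitions(q)
        if q**t > 1 / 8 + 1e-12:
            return False
        if not t < 5 / math.log2(1 / q):
            return False
    return True
```

For example, $q = \frac{1}{2}$ gives $t = 3$ and $q^t = \frac{1}{8}$ exactly, so the boosted failure probability is $\frac{1}{4}$.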

Combining Lemmas 8, 10, and 11, we obtain the main result of this section:

Theorem 5

Let $n > k$ and $q \le \frac{n-k}{n} - \alpha$. For every algorithm $A$ that solves FT-MIN(k) with a success probability of at least $1-q$, there exists a sequence $S$ of $n$ elements such that the expected number of comparisons performed by $A$ on $S$ is at least $\psi\frac{n}{k}\log\frac{1}{q}$, where $\psi > 0$ is a constant that depends only on $\alpha$ and $p$.

Acknowledgements

The authors wish to thank Tomáš Gavenčiak, Barbara Geissmann, Paolo Penna, Daniel Rutschmann, and Peter Widmayer for insightful discussions. We also wish to thank Claire Nicole Barbasch, the anonymous reviewers, and the coordinating editor for their careful reading of our manuscript and their many comments and suggestions.

Proof of Theorem 4

Consider the following Noisy-OR problem, first introduced in [14]: we are given a finite input sequence $\langle y_1, y_2, \dots \rangle$ where $y_i \in \{0, 1\}$ and we need to output "true" if at least one $y_i$ equals 1, and "false" otherwise. The actual value of an element $y_i$ cannot be read directly but we can perform queries to a noisy oracle $O$. Whenever $O$ is queried with an element $y_i$, it returns $y_i$ with probability $1-p$ and $1-y_i$ with the complementary probability. We will denote such a query by $O(y_i)$.

Given an algorithm $A$ that solves FT-MIN(1) on every instance of $n$ elements with a success probability of at least $1-q$, we can design an algorithm $A'$ that solves the Noisy-OR problem on every instance of $n-1$ elements with the same success probability. Intuitively, algorithm $A'$ on input $\langle y_1, y_2, \dots, y_{n-1} \rangle$ simulates an execution of $A$ on input $\langle x_1, x_2, \dots, x_{n-1}, x_n \rangle$ where $x_n = n$ and, for $i = 1, \dots, n-1$, $x_i = (1-y_i)n + i$. As a consequence, with probability at least $1-q$, $A$ returns $x_n$ if all $y_i$s are 0, and $x_i \ne x_n$ where $i$ is the smallest index such that $y_i = 1$ otherwise. Formally $A'$ executes $A$ with the following modifications:

  • Whenever $A$ compares two elements $x_i$ and $x_j$ with $i < j$ (the case $i > j$ is symmetric) $A'$ simulates the comparison as follows:
    • If $j = n$, then $A'$ queries $O$ with $y_i$. If $O(y_i)$ returns "1", $x_i$ is treated as smaller than $x_n$. Otherwise, $x_n$ is treated as smaller than $x_i$.
    • If $j \ne n$, then $A'$ performs $2 c_p \ln\frac{2}{p} + 1$ queries $O(y_i)$ (resp. $O(y_j)$), and computes their majority result $r_i$ (resp. $r_j$). If exactly one $r_h \in \{r_i, r_j\}$ is 1, $A'$ treats the corresponding element $x_h$ as the smaller between $x_i$ and $x_j$. Otherwise (if $r_i = r_j$) $x_i$ is treated as smaller than $x_j$. Notice that, by Lemma 1, $r_i = y_i$ (resp. $r_j = y_j$) with probability at least $1 - \frac{p}{2}$, showing that the element that is reported as smaller is actually the smaller between $x_i$ and $x_j$ with a probability of at least $1 - 2 \cdot \frac{p}{2} = 1 - p$.
  • Whenever $A$ terminates and returns an element $x_i$, $A'$ returns "true" if $x_i \ne x_n$ and "false" otherwise.
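The simulated comparison in the list above can be sketched as follows. This is an illustration under stated assumptions: the oracle is modelled as a closure over the hidden bits, `c_p=1.0` is a placeholder for the constant of Lemma 1, the number of repetitions is rounded up to an odd integer, and the names `make_oracle` and `simulated_less` are hypothetical.

```python
import math
import random

def make_oracle(y, noise, rng):
    """Noisy oracle O: returns y_i, flipped independently with probability `noise`."""
    def oracle(i):
        bit = y[i - 1]  # indices are 1-based, as in the text
        return bit if rng.random() >= noise else 1 - bit
    return oracle

def simulated_less(i, j, n, oracle, p, c_p=1.0):
    """Simulate the comparison 'x_i < x_j' using only queries to the oracle.

    Returns True if x_i is treated as the smaller element. c_p is a placeholder
    for the constant of Lemma 1 (assumption).
    """
    if i > j:  # symmetric case
        return not simulated_less(j, i, n, oracle, p, c_p)
    if j == n:  # x_n = n is smaller than x_i exactly when y_i = 0
        return oracle(i) == 1
    reps = 2 * math.ceil(c_p * math.log(2 / p)) + 1
    r_i = int(2 * sum(oracle(i) for _ in range(reps)) > reps)  # majority for y_i
    r_j = int(2 * sum(oracle(j) for _ in range(reps)) > reps)  # majority for y_j
    if r_i != r_j:
        return r_i == 1  # the element whose majority bit is 1 is the smaller one
    return True  # equal majorities: x_i is treated as smaller than x_j
```

With a noiseless oracle and $y = \langle 0, 1, 0 \rangle$ (so $n = 4$ and $x = \langle 5, 2, 7, 4 \rangle$), the simulated outcomes agree with the true order of the $x_i$.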

Each comparison of $A$ is implemented by $A'$ using at most $4 c_p \ln\frac{2}{p} + 2$ queries to $O$, therefore if the average number of comparisons required by $A$ on any instance of $n$ elements is at most $\mu$, $A'$ performs at most $(4 c_p \ln\frac{2}{p} + 2)\mu$ queries in expectation on every instance of Noisy-OR consisting of $n-1$ elements. In the rest of the proof we will show that no algorithm for Noisy-OR having a success probability of at least $1-q$ can perform fewer than $\gamma' n \log\frac{1}{2q}$ queries on all instances of $n-1$ elements, where $\gamma' > 0$ is a suitable constant that depends on $p$. This will immediately prove that $\mu \ge \frac{\gamma'}{4 c_p \ln\frac{2}{p} + 2} n \log\frac{1}{2q} = \gamma n \log\frac{1}{2q}$, where $\gamma = \frac{\gamma'}{4 c_p \ln\frac{2}{p} + 2}$.

Consider any algorithm $\bar{A}$ that solves Noisy-OR with a success probability of at least $1-q$ and let $\nu$ be the maximum among the expected number of queries performed by $\bar{A}$ on any input sequence of $n-1$ elements. We will derive a lower bound on $\nu$. Assume that $\nu$ is finite and let $\varepsilon \in (0, \frac{1}{2q} - 1)$ be a parameter that will be chosen later. The existence of $\bar{A}$ implies the existence of an algorithm $A'$ for Noisy-OR that has a success probability of at least $1 - (1+\varepsilon)q$, performs at most $\nu$ queries in expectation (on every instance of $n-1$ elements), and at most $\frac{\nu}{q\varepsilon}$ queries in the worst case. Such $A'$ can be obtained from $\bar{A}$ by stopping $\bar{A}$ immediately before the $(\frac{\nu}{q\varepsilon} + 1)$-th query to $O$ (if $\bar{A}$ is stopped in this way, $A'$ returns an arbitrary input element). By the Markov inequality $\bar{A}$ is stopped with probability at most $\frac{\nu}{\nu/(q\varepsilon)} = q\varepsilon$, showing that the probability of failure $q'$ of $A'$ is at most $q + q\varepsilon = (1+\varepsilon)q < \frac{1}{2}$.

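For $i = 0, \dots, n-1$, let $I_i$ denote the sequence $\langle y_1, \dots, y_{n-1} \rangle$, where $y_j = 1$ if $j = i$ and $y_j = 0$ otherwise (so that $I_0$ consists of $n-1$ zeros). Similarly to the proof of Theorem 2.1 in [14], we consider the rooted noisy Boolean decision tree $T$ associated with $A'$: each internal vertex $v$ of $T$ has a label $\lambda_v \in \{1, \dots, n-1\}$ and represents a query $O(y_{\lambda_v})$, while each leaf $\ell$ is labeled with either "true" or "false". A specific execution of $A'$ on an input sequence of $n-1$ elements traces a path $P_\ell$ between the root $r$ of $T$ and a leaf $\ell$ of $T$, and the output of the execution matches the label of $\ell$. Let $L$ be the set of leaves of $T$.

Given $\ell \in L$ we will denote by $\Pr(\ell \mid I_i)$ the probability that an execution of $A'$ on input $I_i$ traces the (unique) path $P_\ell$ from $r$ to $\ell$ in $T$. For $j = 1, \dots, n-1$, define $\eta(\ell, j)$ as the number of internal vertices labeled $j$ on the unique path $P_\ell$ between the root of $T$ and $\ell$. It follows that the execution of $A'$ described by $P_\ell$ performs $\sum_{j=1}^{n-1} \eta(\ell, j) = d(\ell)$ queries to $O$, where $d(\ell)$ is the depth of $\ell$ in $T$.

For $i = 1, \dots, n-1$, the only input element that differs between $I_i$ and $I_0$ is $y_i$. Since an execution of $A'$ that traces $P_\ell$ performs $\eta(\ell, i)$ queries $O(y_i)$, we have that $\Pr(\ell \mid I_i)$ must be at least $\left(\frac{p}{1-p}\right)^{\eta(\ell, i)} \Pr(\ell \mid I_0)$. Then:

$$\sum_{i=1}^{n-1} \Pr(\ell \mid I_i) \ge \sum_{i=1}^{n-1} \left(\frac{p}{1-p}\right)^{\eta(\ell, i)} \Pr(\ell \mid I_0) = (n-1)\Pr(\ell \mid I_0) \cdot \frac{1}{n-1}\sum_{i=1}^{n-1} \left(\frac{p}{1-p}\right)^{\eta(\ell, i)}. \tag{5}$$

Let $\phi(x) = \left(\frac{p}{1-p}\right)^x$. Since $\phi(x)$ is a convex function we can use Jensen's inequality to write:

$$\frac{1}{n-1}\sum_{i=1}^{n-1} \left(\frac{p}{1-p}\right)^{\eta(\ell, i)} = \frac{1}{n-1}\sum_{i=1}^{n-1} \phi(\eta(\ell, i)) \ge \phi\left(\frac{1}{n-1}\sum_{i=1}^{n-1} \eta(\ell, i)\right) = \phi\left(\frac{d(\ell)}{n-1}\right) = \left(\frac{p}{1-p}\right)^{\frac{d(\ell)}{n-1}}. \tag{6}$$

Let $Z$ be the set of leaves of $T$ labeled "false". Combining (5) and (6), summing over all leaves $\ell \in Z$, and noticing that we must have $\sum_{\ell \in Z} \Pr(\ell \mid I_i) \le q'$ $\forall i = 1, \dots, n-1$, we obtain:

$$(n-1)q' = \sum_{i=1}^{n-1} q' \ge \sum_{i=1}^{n-1} \sum_{\ell \in Z} \Pr(\ell \mid I_i) = \sum_{\ell \in Z} \sum_{i=1}^{n-1} \Pr(\ell \mid I_i) \ge (n-1)\sum_{\ell \in Z} \left(\frac{p}{1-p}\right)^{\frac{d(\ell)}{n-1}} \Pr(\ell \mid I_0). \tag{7}$$

Define $\varphi(x) = \left(\frac{p}{1-p}\right)^{\frac{x}{n-1}}$, $\alpha_\ell = \Pr(\ell \mid I_0)$, and $\bar{d} = \sum_{\ell \in Z} \alpha_\ell d(\ell)$. Since $A'$ succeeds with probability at least $1-q'$ on every input sequence of $n-1$ elements, we know that $\sum_{\ell \in Z} \alpha_\ell \ge 1-q'$. Moreover, $\varphi(x)$ is a monotonically decreasing convex function, allowing us to combine the above inequality with Jensen's inequality to write:

$$\sum_{\ell \in Z} \left(\frac{p}{1-p}\right)^{\frac{d(\ell)}{n-1}} \Pr(\ell \mid I_0) = \sum_{\ell \in Z} \alpha_\ell \varphi(d(\ell)) \ge (1-q')\frac{\sum_{\ell \in Z} \alpha_\ell \varphi(d(\ell))}{\sum_{\ell \in Z} \alpha_\ell} \ge (1-q')\,\varphi\left(\frac{\sum_{\ell \in Z} \alpha_\ell d(\ell)}{\sum_{\ell \in Z} \alpha_\ell}\right) \ge (1-q')\,\varphi\left(\frac{\bar{d}}{1-q'}\right) = (1-q')\left(\frac{p}{1-p}\right)^{\frac{\bar{d}}{(1-q')(n-1)}}. \tag{8}$$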

From (7) and (8) we obtain:

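$$(n-1)q' \ge (n-1)(1-q')\left(\frac{p}{1-p}\right)^{\frac{\bar{d}}{(1-q')(n-1)}},$$

which implies

$$\bar{d} \ge (n-1)(1-q')\frac{\log\frac{q'}{1-q'}}{\log\frac{p}{1-p}} = (n-1)(1-q')\frac{\log\frac{1-q'}{q'}}{\log\frac{1-p}{p}}.$$

We can lower bound $\nu$ with the average number $\bar{d}$ of queries performed by $A'$ on input $I_0$, indeed:

$$\nu \ge \sum_{\ell \in L} d(\ell)\Pr(\ell \mid I_0) \ge \sum_{\ell \in Z} d(\ell)\Pr(\ell \mid I_0) = \bar{d}.$$

We now choose $\gamma' = \frac{1}{8\log\frac{1-p}{p}}$ and $\varepsilon = \min\left\{\frac{\log\frac{1}{2q}}{4(n-1)}, \frac{1+2q}{4q} - 1\right\}$. Notice that this is a valid choice for $\varepsilon$ since $\frac{1+2q}{4q} - 1 < \frac{1}{2q} - 1$. Therefore we know that $q' \le (1+\varepsilon)q < \frac{1}{2}$ and we can write:

$$\nu \ge \bar{d} \ge (n-1)(1-q')\frac{\log\frac{1-q'}{q'}}{\log\frac{1-p}{p}} > \frac{(n-1)\log\frac{1}{2q'}}{2\log\frac{1-p}{p}} \ge \frac{(n-1)\log\frac{1}{2(1+\varepsilon)q}}{2\log\frac{1-p}{p}} = \frac{(n-1)\log\frac{1}{2q} - (n-1)\log(1+\varepsilon)}{2\log\frac{1-p}{p}} \ge \frac{(n-1)\log\frac{1}{2q} - \frac{1}{2}\log\frac{1}{2q}}{2\log\frac{1-p}{p}} = \frac{(n-\frac{3}{2})\log\frac{1}{2q}}{2\log\frac{1-p}{p}} \ge \frac{n\log\frac{1}{2q}}{8\log\frac{1-p}{p}} = \gamma' n \log\frac{1}{2q},$$

where we used the inequality $\log(1+x) \le 2x$ for $x \ge 0$. This provides the sought lower bound on $\nu$ and completes the proof.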

Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich

Footnotes

1

In order to ease notation, we omit ceilings in our asymptotic upper bounds. E.g., we will write $O(\frac{n}{k}\log\frac{1}{q})$ in place of $O(\lceil\frac{n}{k}\log\frac{1}{q}\rceil)$. This is only relevant for values of $q$ approaching 1.

2

Interestingly, some semiconductor companies consider the more general problem of approximately classifying all their products in a process known as product binning. E.g., identically-built processors can be assigned different commercial names depending on their performances.

3

Throughout the rest of the paper we use log to denote binary logarithms and ln to refer to natural logarithms.

4

The first parameter of a binomial random variable represents the number of (independent) trials while the second parameter is the success probability of each trial.

5

In order to ease the notation, here and throughout the rest of the paper we will use FT-MIN($\frac{3}{4}n$) as a shorthand for FT-MIN($\lceil\frac{3}{4}n\rceil$).

6

Recall from Sect. 4.1 that each query to $O$ is implemented with one comparison when $p \le \frac{1}{16}$, and with $6c_p + 1$ comparisons when $p > \frac{1}{16}$. Since the number of queries in phase $i$ is $2 c_O \ln\frac{2^i}{q} + 1 \le 4 c_O \ln\frac{2^i}{q}$, we can pick $\kappa = 4 c_O (6 c_p + 1)$.

7

The proof of Lemma 4 shows that the probability that more than $i$ phases are needed is at most $5/6^i$.

8

If this is not the case, it will suffice to pad $S$ with $2^{\lceil\log n\rceil} - n$ randomly selected copies of elements from $S$.

9

The case $k \le \frac{n}{2}$ and $q \in (\frac{1}{4}, \frac{1}{2} - \varepsilon]$ will be covered by the more general Lemma 11.

10

The lower bound of [14, Theorem 2.1] concerns the related problem of computing the logical OR of an input sequence of bits using noisy queries. A lower bound for this problem immediately translates into a lower bound for FT-MIN(1). For details see Lemma 3.4 and Theorem 3.5 in [14] or the self-contained proof of Theorem 4 in Appendix A.

11

The result of a single right rotation on a sequence $\langle x_1, x_2, \dots, x_\ell \rangle$ is the sequence $\langle y_1, y_2, \dots, y_\ell \rangle$ where $y_1 = x_\ell$ and, for $i = 2, \dots, \ell$, $y_i = x_{i-1}$.

Research supported by SNF (Project Number 200021_165524).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Stefano Leucci, Email: stefano.leucci@univaq.it.

Chih-Hung Liu, Email: chih-hung.liu@inf.ethz.ch.

References

  • 1.Aigner M. Finding the maximum and minimum. Discret. Appl. Math. 1997;74(1):1–12. doi: 10.1016/S0166-218X(96)00012-1.
  • 2.Anthes G. Inexact design: beyond fault-tolerance. Commun. ACM. 2013;56(4):18–20. doi: 10.1145/2436256.2436262.
  • 3.Bagchi A. On sorting in the presence of erroneous information. Inf. Process. Lett. 1992;43(4):213–215. doi: 10.1016/0020-0190(92)90203-8.
  • 4.Baumann RC. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Trans. Device Mater. Reliab. 2005;5(3):305–316. doi: 10.1109/TDMR.2005.853449.
  • 5.Borgstrom, R.S., Kosaraju, S.R.: Comparison-based search in the presence of errors. In: Proceedings of the Twenty-fifth Symposium on Theory of Computing (STOC93), pp. 130–136 (1993)
  • 6.Braverman, M., Mao, J., Weinberg, S.M.: Parallel algorithms for select and partition with noisy comparisons. In: Proceedings of the Forty-Eighth 48th Symposium on Theory of Computing (STOC16), pp. 851–862 (2016)
  • 7.Braverman, M., Mossel, E.: Noisy sorting without resampling. In: Proceedings of the Nineteenth Symposium on Discrete Algorithms (SODA08), pp. 268–276 (2008)
  • 8.Catania, J.A.: Soft errors in electronic memory—a white paper (2004)
  • 9.Cheemavalagu, S., Korkmaz, P., Palem, K.: Ultra low-energy computing via probabilistic algorithms and devices: CMOS device primitives and the energy-probability relationship. In: Proceedings of the 2004 International Conference on Solid State Devices and Materials, pp. 402–403 (2004)
  • 10.Cheemavalagu, S., Korkmaz, P., Palem, K., Akgul, B.E.S., Chakrapani, L.N.: A probabilistic CMOS switch and its realization by exploiting noise. In: Proceedings of the 2005 IFIP/IEEE International Conference on Very Large Scale Integration—System on a Chip (VLSI-SoC05, pp. 535–541 (2005)
  • 11.Chen, X., Gopi, S., Mao, J., Schneider, J.: Competitive analysis of the top-k ranking problem. In: Proceedings of the Twenty-Eighth Symposium on Discrete Algorithms (SODA17), pp. 1245–1264 (2017)
  • 12.Cicalese, F.: Fault-Tolerant Search Algorithms—Reliable Computation with Unreliable Information. Monographs in Theoretical Computer Science. Springer (2013)
  • 13.Edgar T. Staircase series. Math. Mag. 2018;91(2):92–95. doi: 10.1080/0025570X.2017.1415584.
  • 14.Feige U, Raghavan P, Peleg D, Upfal E. Computing with noisy information. SIAM J. Comput. 1994;23(5):1001–1018. doi: 10.1137/S0097539791195877.
  • 15.Finocchi I, Grandoni F, Italiano GF. Optimal resilient sorting and searching in the presence of memory faults. Theor. Comput. Sci. 2009;410(44):4457–4470. doi: 10.1016/j.tcs.2009.07.026.
  • 16.Geissmann, B., Leucci, S., Liu, C., Penna, P.: Sorting with recurrent comparison errors. In: Proceedings of the Twenty-Eighth International Symposium on Algorithms and Computation (ISAAC17), pp. 38:1–38:12 (2017)
  • 17.Geissmann, B., Leucci, S., Liu, C., Penna, P.: Optimal sorting with persistent comparison errors. In: Proceedings of the Twenty-seventh European Symposium on Algorithms (ESA19), pp. 49:1–49:14 (2019)
  • 18.Geissmann B, Leucci S, Liu C, Penna P. Optimal dislocation with persistent errors in subquadratic time. Theory Comput. Syst. 2020;64(3):508–521. doi: 10.1007/s00224-019-09957-5.
  • 19.Geissmann, B., Leucci, S., Liu, C., Penna, P., Proietti, G.: Dual-mode greedy algorithms can save energy. In: Proceedings of the 30th International Symposium on Algorithms and Computation (ISAAC19), LIPIcs, vol. 149, pp. 64:1–64:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). 10.4230/LIPIcs.ISAAC.2019.64
  • 20.Geissmann, B., Mihalák, M., Widmayer, P.: Recurring comparison faults: sorting and finding the minimum. In: Proceedings of the Twentieth International Symposium on Fundamentals of Computation Theory (FCT15), pp. 227–239 (2015)
  • 21.Kenyon-Mathieu, C., Schudy, W.: How to rank with few errors. In: Proceedings of the Thirty-Nineth Symposium on Theory of Computing (STOC07), pp. 95–103 (2007)
  • 22.Klein, R., Penninger, R., Sohler, C., Woodruff, D.P.: Tolerant algorithms. In: Proceedings of the Nineteenth European Symposium on Algorithms (ESA11), pp. 736—747 (2011)
  • 23.Lakshmanan KB, Ravikumar B, Ganesan K. Coping with erroneous information while sorting. IEEE Trans. Comput. 1991;40(9):1081–1084. doi: 10.1109/12.83656.
  • 24.Leighton T, Ma Y. Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults. SIAM J. Comput. 1999;29(1):258–273. doi: 10.1137/S0097539796305298.
  • 25.Leucci, S., Liu, C., Meierhans, S.: Resilient dictionaries for randomly unreliable memory. In: Proceedings of the 27th Annual European Symposium on Algorithms, (ESA19), LIPIcs, vol. 144, pp. 70:1–70:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). 10.4230/LIPIcs.ESA.2019.70
  • 26.Long, P.M.: Sorting and searching with a faulty comparison oracle. University of California at Santa Cruz, Tech. rep. (1992)
  • 27.Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Sorting noisy data with partial information. In: Proceedings of the Fourth Conference on Innovations in Theoretical Computer Science (ITCS13), pp. 515–528 (2013)
  • 28.Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis, 2nd edn. Cambridge University Press (2017)
  • 29.Palem, K., Lingamneni, A.: Ten years of building broken chips: The physics and engineering of inexact computing. ACM Trans. Embed. Comput. Syst. 12(2s), 87:1–87:23 (2013)
  • 30.Pelc A. Searching with known error probability. Theor. Comput. Sci. 1989;63(2):185–202. doi: 10.1016/0304-3975(89)90077-7.
  • 31.Pelc A. Searching games with errors - fifty years of coping with liars. Theor. Comput. Sci. 2002;270(1–2):71–109. doi: 10.1016/S0304-3975(01)00303-6.
  • 32.Ravikumar, B., Ganesan, K., Lakshmanan, K.B.: On selecting the largest element in spite of erroneous information. In: Proceedings of the Fourth Symposium on Theoretical Aspects of Computer Science (STACs87), pp. 88–99 (1987)
  • 33.Rivest RL, Meyer AR, Kleitman DJ, Winklmann K, Spencer J. Coping with errors in binary search procedures. J. Comput. Syst. Sci. 1980;20(3):396–404. doi: 10.1016/0022-0000(80)90014-8.

Articles from Algorithmica are provided here courtesy of Springer
