2022 Apr 23;36(3):1197–1218. doi: 10.1007/s10618-022-00833-4

Ranking with submodular functions on a budget

Guangyi Zhang 1, Nikolaj Tatti 2, Aristides Gionis 1
PMCID: PMC9110513  PMID: 35601821

Abstract

Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.

Keywords: Ranking, Submodular maximization, Dynamic programming, Approximation algorithms

Introduction

Combinatorial optimization plays a central role in many machine-learning problems. One prevalent approach to solve such problems is via submodular-optimization techniques. The popularity of submodular-optimization methods results from the fact that in many real-world settings the objective function exhibits the “diminishing returns” property, as well as from the ever-growing rich toolkit that has been developed in the past decades. One fundamental primitive in this toolkit is submodular maximization (Krause and Golovin 2014), which has been the backbone of a number of important problems, such as sensor placement (Krause et al. 2008), viral marketing in social networks (Kempe et al. 2015), document summarization (Lin and Bilmes 2011), and more.

Submodular optimization has mainly been studied in the context of subset-selection problems. However, in many real-world applications the goal is to find a ranking over a set of items. Finding a ranking is a significantly more challenging task than subset selection, as the search space is factorially larger. One successful attempt of applying ideas from submodular optimization to ranking is the submodular-ranking problem (SR) (Azar and Gamzu 2011). In this problem, given a set of items and a set of submodular functions, the goal is to find a (partial) ranking of the items so as to minimize the average “cover time” of all functions.

An exemplary application of SR is in the multiple intents re-ranking problem (Azar et al. 2009), which has applications in web searching. In this problem setting, a user query may correspond to multiple user intents. For example, a query of “java” may mean a programming language, an island, or a type of coffee. Even for a seemingly unambiguous query, such as “New York,” there exist many possible intents, for example, attractions, cuisine, travel, cultural events, etc. In the absence of an explicit user intent, we need to consider all possibilities. The SR formulation proposes to model each intent as a submodular function, whose value improves when a non-redundant web page of the right intent is encountered, and reaches a maximum when the user is satisfied, i.e., having gathered sufficient information. The goal is to produce a ranking of web pages that minimizes the expected number of pages a user has to browse before they satisfy their information needs. The expectation here is over the distribution of different user intents, which for this particular application can be assumed to be known.

While the SR formulation can be useful in some cases, it fails to realistically model a number of other applications. Critically, it assumes that a demand can wait indefinitely before it gets satisfied. In the previous example, for instance, it is assumed that users will keep reading down a ranked list of web pages until they gather enough information. In reality, a budget can be set for the amount of service that a user receives. The budget can be the number of web pages to browse, or the time to spend on the web-search task. A user stops receiving service once the budget is exceeded. Moreover, the budget can vary across different demands. For example, a user intent can be classified into one of three types, informational, navigational, and transactional (Jansen et al. 2008), and each may come with a different budget, translating to the amount of “patience” that a user exhibits to obtain results for each type. User intents and budgets can be readily extracted from past search logs.

To accommodate budgeted versions of the submodular ranking problem, we propose a new formulation, which we call max-submodular ranking (MSR). In the MSR problem, we are given a set of non-decreasing submodular functions, each associated with a budget. We aim to find a ranking that, instead of minimizing the total coverage time of the functions, maximizes the sum of function values (coverage) under individual budget constraints. In other words, every item in the ranking incurs a cost, and each function is evaluated at the maximal prefix of the ranked sequence that does not exceed its budget. A precise formulation of the MSR problem is provided in Sect. 3.

In this paper, we propose practical algorithms with approximation guarantees for MSR, when the budget constraints are either cardinality or knapsack constraints. We also note that the well-known constrained submodular maximization and minimum submodular cover problems are special cases of MSR and SR, respectively, when there is a single submodular function. In this sense, the MSR problem we define is a dual problem of SR, in the same way that max k-cover is a dual problem of minimum set cover.

MSR has great potential to be applied in other scenarios, such as in the case where the submodular functions are 0–1 activation functions. We call this special case the max-activation ranking (MAR) problem. The idea is to activate as many demands as possible with a common ranking of items, or services, under individual budget constraints. As an example, some subscription-based streaming media services, such as Netflix, produce content in a data-driven fashion. One possibility is to arrange the plot structure in a TV series such that as many viewers as possible become interested before their individual cut-off points for a new show. The goal for the TV series producer is to encourage the largest possible audience to continue watching. A plot structure can be characterized as a sequence of scenes, each described by a set of tags, such as romantic, adventurous, funny, etc., which may interest particular audience segments. Similar applications can also be found in ranking commercial ads, ranking customer reviews, creating playlists for music streaming services, and more.

Concretely, our contributions in this paper are summarized as follows.

  • We introduce the novel problem of max-submodular ranking (MSR), where the goal is to find a ranking of a set of items so as to maximize the total value of a set of submodular functions under budget constraints.

  • We prove that a simple greedy algorithm achieves a factor-2 approximation for the MSR problem under cardinality constraints, which is tight for this particular greedy algorithm.

  • We show that a weighted greedy algorithm that pays more attention to functions with small budget achieves a factor-3 approximation for the MSR problem under cardinality constraints. While its worst-case bound is worse, there are natural problem instances for which the weighted greedy finds better solutions than its unweighted counterpart.

  • We devise a new algorithm that returns the best solution among the solutions found by a cost-efficient greedy algorithm and a ranking of “large” items produced by dynamic programming. Our algorithm achieves an approximation factor arbitrarily close to 4 for the MSR problem under knapsack constraints.

  • We empirically evaluate and compare different algorithms on real-life datasets, and find that the proposed algorithms achieve superior performance when compared with strong baselines.

The rest of the paper is organized as follows. We start by discussing the related work in Sect. 2, and we formally introduce the MSR problem in Sect. 3. The unweighted and weighted greedy algorithms for MSR under cardinality constraints are presented and analyzed in Sects. 4.1 and 4.2, respectively. The novel algorithm for the MSR problem under knapsack constraints is introduced and analyzed in Sect. 5. We present our empirical evaluation in Sect. 6, and we offer our concluding remarks in Sect. 7.

Related work

Submodular maximization

Submodular maximization is a special case of our formulation when given only a single function. Coupled with a non-decreasing property and with a cardinality constraint, it is well known that a simple greedy algorithm achieves an e/(e-1)-approximation (Nemhauser et al. 1978), which is also shown to be tight (Nemhauser and Wolsey 1978). For a more general budget constraint, a natural algorithm is to return the best solution among the solutions found by a cost-efficient greedy method and by selecting the best singleton item. Recently, the approximation factor of this “best-of-two” algorithm was shown to be within [1/0.462, 1/0.427] (Feldman et al. 2020). A better 2-approximation is achieved by another greedy variant that returns the best solution among the solutions found by a cost-efficient greedy algorithm and all its intermediate solutions, each augmented with the best single additional item (Yaroslavtsev et al. 2020).

Submodularity for a sequence function

A sequential utility function is defined as $f: \mathcal{S} \to \mathbb{R}$, where $\mathcal{S}$ is the set of all possible sequences of subsets of a ground set of items $V$. Note that a set function can be seen as a special sequence function, in which the diminishing-returns effect holds for any subsequence relation. Streeter and Golovin (2008) and Zhang et al. (2012) introduce a notion of string submodularity, which restricts the diminishing returns to only the prefix subsequence relation. That is to say, a function $f$ is string submodular if appending an item to a sequence results in no larger marginal gain than appending the item to a prefix of the sequence. The goal is to find a sequence of a given length that maximizes the value of the function $f$. In our formulation, the sum of multiple submodular functions remains submodular, and thus, string submodular. However, the analysis in the prior work does not apply in our case as we assume that each submodular function is associated with a different budget constraint.

Submodular ranking

Azar and Gamzu (2011) propose the submodular ranking (SR) problem, which aims to find a permutation to minimize the average “cover time” of a set of submodular functions, where we say that an input sequence “covers” a function if it evaluates to the maximum value of the function, and the “cover time” of a sequence of items is the shortest prefix of the sequence for which the function is covered. The problem we study in this paper can be seen as a dual problem of the SR problem. The SR problem originates from the classic min-sum set cover (MSSC) problem (Feige et al. 2004) and its generalizations (Azar et al. 2009; Gamzu 2010).

Diversified web search

In web search, in the absence of the explicit user intent, it is desirable to provide a sequence of high-quality and diverse documents that account for the interests of the overall user population. Typically, the diversity is evaluated by the coverage at the topical level of some existing taxonomy (Zhai et al. 2015). Carbonell and Goldstein (1998) propose a greedy algorithm with respect to maximal marginal relevance (MMR) to reduce the redundancy among returned documents. Bansal et al. (2010) define the problem of finding an ordering of search results that maximizes the discounted cumulative gain (DCG), i.e., the sum of discounted gains of different user types, where the discount factor increases if a user type is satisfied later on. They show that, in some special cases, the DCG metric can be rewritten as a weighted sum of submodular functions. Our framework contributes to this theme by, for example, casting each user type or topic as a submodular function.

Problem definition

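We are given a universe set $V$ with $|V| = n$ items, a set of $m$ non-decreasing submodular functions $F = \{f_1, \ldots, f_m\}$, and a cost function $c: V \to \mathbb{R}_+$. Recall that a set function $f: 2^V \to \mathbb{R}_+$ is non-decreasing if $f(T) \le f(S)$ for every $T \subseteq S \subseteq V$, and it is submodular if $f(T \cup \{v\}) - f(T) \ge f(S \cup \{v\}) - f(S)$ for every $T \subseteq S \subseteq V$ and $v \in V \setminus S$. Furthermore, each function $f_i$ is associated with a budget $b_i \in \mathbb{R}_+$. We will often write $f(v \mid S)$ to mean $f(\{v\} \cup S) - f(S)$.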
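To make the two definitions concrete, the following Python sketch checks them by brute force on a toy coverage function. The instance `SETS` and all helper names are hypothetical and serve only as an illustration; the check is exponential in the number of items and is meant for very small ground sets.

```python
from itertools import combinations

# Hypothetical toy instance: each item covers a set of elements.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(selected):
    """f(S) = number of elements covered by the items in S."""
    covered = set()
    for item in selected:
        covered |= SETS[item]
    return len(covered)

def marginal(f, v, S):
    """Marginal gain f(v | S) = f({v} ∪ S) - f(S)."""
    return f(set(S) | {v}) - f(set(S))

def subsets(items):
    """Yield every subset of the given items."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)

def is_monotone_submodular(f, ground):
    """Check both properties over all pairs T ⊆ S ⊆ V."""
    for S in subsets(ground):
        for T in subsets(S):
            if f(T) > f(S):
                return False  # violates the non-decreasing property
            for v in set(ground) - S:
                if marginal(f, v, T) < marginal(f, v, S):
                    return False  # violates diminishing returns
    return True

print(is_monotone_submodular(coverage, SETS.keys()))
```

A function such as $f(S) = |S|^2$ fails the same check, since its marginal gains grow with the size of $S$.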

Let $\sigma(V)$ denote the set of permutations of $V$, that is, $\sigma(V) = \{\pi: V \to V \mid \pi \text{ is a permutation}\}$. Our goal is to find a permutation $\pi \in \sigma(V)$ to maximize the sum of function values $f_i(\pi_{\ell_i})$, where the input set $\pi_{\ell_i}$ is a prefix of the sought permutation $\pi$ with feasibility constraints. In particular, we consider that each function $f_i$ receives as input the maximal prefix of $\pi$ that fits within its corresponding budget $b_i$. In other words, the permutation $\pi$ can be seen as a sequence of nested sets, one for each function. Formally, the max-submodular ranking (MSR) problem that we study in this paper is defined as follows.

Problem 1

(Max-submodular ranking (MSR)) Given a set of items $V$, a set of non-decreasing and submodular functions $F = \{f_1, \ldots, f_m\}$, a cost function $c: V \to \mathbb{R}_+$, and non-negative budgets $b_i$ for each function $f_i$, the MSR problem aims to find a permutation $\pi \in \sigma(V)$ that maximizes the sum

$$\sum_{f_i \in F} f_i(\pi_{\ell_i}), \quad \text{such that } \ell_i = \max\{j \in [n] : c(\pi_j) \le b_i\}, \tag{1}$$

where $\pi_j$ is the prefix of the permutation $\pi$ of length $j$ and $c(\pi_j) = \sum_{v \in \pi_j} c(v)$.
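The objective in Problem 1 can be evaluated directly from its definition. The sketch below is a minimal illustration, not the authors' implementation; the function names are hypothetical, and the toy instance reuses the numbers of Example 1 in Sect. 5 (two modular functions, three items).

```python
def msr_objective(perm, cost, budgets, functions):
    """Sum of f_i evaluated on the maximal prefix of perm whose cost fits b_i."""
    total = 0.0
    for f, b in zip(functions, budgets):
        prefix, spent = [], 0.0
        for v in perm:
            if spent + cost[v] > b:
                break  # the next item would exceed budget b_i
            spent += cost[v]
            prefix.append(v)
        total += f(frozenset(prefix))
    return total

def modular(weights):
    """Modular (additive) set function defined by per-item weights."""
    return lambda S: sum(weights.get(v, 0.0) for v in S)

# Numbers taken from Example 1; variable names are illustrative.
cost = {"v1": 2.5, "v2": 3.0, "v3": 6.5}
budgets = [3.0, 9.0]
functions = [modular({"v1": 1.0, "v2": 1.5}), modular({"v3": 1.0})]

for perm in [("v1", "v3", "v2"), ("v2", "v1", "v3"), ("v3", "v1", "v2")]:
    print(perm, msr_objective(perm, cost, budgets, functions))
```

Because costs are non-negative, prefix costs are non-decreasing, so stopping at the first item that does not fit yields the maximal feasible prefix.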

We make a number of observations for Problem 1.

Without loss of generality, we can assume that $f_i(\emptyset) = 0$; otherwise we can translate the objective function by $\sum_{f_i \in F} f_i(\emptyset)$.

Also note that not all items in the permutation solution $\pi$ will necessarily be used as an input to some function $f_i \in F$. Instead, only the items in $\pi_{\ell_i}$ for the largest $\ell_i$ will be used. For this reason, we can think of the output of the MSR problem as a partial permutation; after all functions deplete their budget, the remaining items of the permutation do not matter.

Finally, note that when the cost function $c$ is uniform, i.e., $c(\cdot) = 1$, we can consider only integral budgets $b_i$ and assume $\ell_i = b_i$.

With respect to the hardness of approximation of the MSR problem, we observe that MSR is equivalent to the standard submodularity-maximization problem when $m = 1$, that is, when there is only one function in $F$. A second reduction from the standard submodularity-maximization problem can be obtained by letting $b_i = b$, for all $i = 1, \ldots, m$, i.e., when the same budget is used for all functions. The reason is that in this case the sum of submodular functions remains submodular, and we ask to maximize a submodular function under a cardinality constraint. We conclude the following hardness result.

Remark 1

(Nemhauser and Wolsey 1978). For solving the max-submodular ranking (MSR) problem, no algorithm requiring a polynomial number of function evaluations can achieve a better approximation guarantee than e/(e-1).

It is also well-known that maximum k-cover, a special case of submodular maximization, is a dual problem to the minimum set cover problem, where the constraint in one problem is treated as the objective function in the other (Feige 1998). More generally, the MSR problem can be considered as the dual problem to the submodular-ranking problem (SR) (Azar and Gamzu 2011), whose goal is to find a (partial) ranking of the items so as to minimize the average “cover time” of all functions.

We conclude the section by introducing some additional notation that will be used in our analysis. The optimal permutation is denoted by $\pi^*$. We use the operator $\oplus$ to denote sequence concatenation and overload the operator $\subseteq$ for the subsequence relation.

Cardinality constraints

We start our analysis of the MSR problem for the case of cardinality constraints, that is, when the item costs are uniform (c(·)=1). For this particular case we present two algorithms, called Greedy-U and Greedy-W, both having provable guarantees. Both algorithms generate a permutation by greedily selecting one item before the next. Pseudocode for both algorithms is shown in a unified manner in Algorithm 1. The difference in the two algorithms lies in adopting different coefficients αi, associated with the submodular functions fi, in their selection criteria. The first algorithm, Greedy-U, is an unweighted greedy (αi=1) with respect to the submodular functions fi. The second algorithm, Greedy-W, is a weighted greedy (αi=1/bi) that puts more weight on functions with smaller budget.

The worst-case running time of both algorithms is $O(n^2 m)$. In practice, they run much faster and their actual running time grows almost linearly in $n$, thanks to applying a standard lazy evaluation technique (Leskovec et al. 2007). More details on scalability are discussed in Sect. 6.4.

[Algorithm 1: unified pseudocode for Greedy-U and Greedy-W, given as a figure in the original]
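Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch gives one plausible reading of the description above for uniform item costs. The function names, the tie-breaking rule, and the two-item instance are assumptions made for illustration; budgets are assumed to be positive integers, and no lazy evaluation is used.

```python
def greedy_rank(items, functions, budgets, weighted=False):
    """Greedy ranking for unit item costs.

    weighted=False uses alpha_i = 1 (Greedy-U);
    weighted=True uses alpha_i = 1 / b_i (Greedy-W).
    """
    alphas = [1.0 / b if weighted else 1.0 for b in budgets]
    ranking, chosen = [], frozenset()
    remaining = list(items)
    while remaining:
        # R_j: functions whose budget is not yet used up by the current prefix.
        active = [i for i, b in enumerate(budgets) if len(ranking) < b]

        def score(v):
            return sum(
                alphas[i] * (functions[i](chosen | {v}) - functions[i](chosen))
                for i in active
            )

        best = max(remaining, key=score)  # ties go to the earliest item
        ranking.append(best)
        remaining.remove(best)
        chosen = chosen | {best}
    return ranking

def total_value(ranking, functions, budgets):
    """MSR objective for unit costs: f_i sees the first b_i items."""
    return sum(f(frozenset(ranking[:b])) for f, b in zip(functions, budgets))

# Hypothetical two-item instance on which the two variants differ.
f1 = lambda S: 1.0 if "a" in S else 0.0  # budget 1
f2 = lambda S: 1.5 if "b" in S else 0.0  # budget 2
functions, budgets = [f1, f2], [1, 2]

for weighted in (False, True):
    ranking = greedy_rank(["a", "b"], functions, budgets, weighted)
    print(ranking, total_value(ranking, functions, budgets))
```

On this instance the unweighted variant serves the large-budget function first and leaves the budget-1 function empty-handed, whereas the weighted variant places item `a` first and both functions collect their value.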

Unweighted greedy

We show that the unweighted greedy algorithm (αi=1) achieves a 2-approximation guarantee for the MSR problem with uniform cost. In addition, we show that the approximation ratio is tight for this particular algorithm.

Theorem 1

Greedy-U (Algorithm 1 with coefficients αi=1) is a 2-approximation algorithm for the MSR problem with uniform item costs (c(·)=1).

Proof

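Write $R_j = \{f_i \in F : c(\pi_{j-1}) < b_i\}$. By the greedy selection criteria, we get that for an arbitrary item $v \in V$ in the $j$-th iteration it holds that

$$\sum_{f_i \in R_j} \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) \ge \sum_{f_i \in R_j} f_i(v \mid \pi_{j-1}). \tag{2}$$

The main idea of the proof is to choose an appropriate item $v$ for the above inequality at each iteration of the greedy, and sum over all iterations. We denote the $j$-th item of the optimal permutation $\pi^*$ by $v^*_j$. We write $\mathrm{ALG}$ to denote the value achieved by the Greedy-U algorithm. Then

$$\begin{aligned}
\mathrm{ALG} &= \sum_{f_i \in F} f_i(\pi_{b_i}) = \sum_{f_i \in F} \sum_{j=1}^{b_i} \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) && \text{(telescoping series)} \\
&= \sum_{j=1}^{n} \sum_{f_i \in R_j} \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) \\
&\ge \sum_{j=1}^{n} \sum_{f_i \in R_j} f_i(v^*_j \mid \pi_{j-1}) && \text{(Equation (2))} \\
&= \sum_{f_i \in F} \sum_{j=1}^{b_i} f_i(v^*_j \mid \pi_{j-1}) \\
&\ge \sum_{f_i \in F} \sum_{j=1}^{b_i} f_i(v^*_j \mid \pi_{b_i}) && \text{(submodularity)} \\
&\ge \sum_{f_i \in F} \big(f_i(\pi^*_{b_i} \cup \pi_{b_i}) - f_i(\pi_{b_i})\big) && \text{(submodularity)} \\
&\ge \sum_{f_i \in F} \big(f_i(\pi^*_{b_i}) - f_i(\pi_{b_i})\big) && \text{(monotonicity)} \\
&= \mathrm{OPT} - \mathrm{ALG}.
\end{aligned}$$

Consequently, $2\,\mathrm{ALG} \ge \mathrm{OPT}$, proving the claim.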

We complete the analysis of the Greedy-U algorithm for the MSR variant with cardinality constraints, by showing that the approximation ratio 2 is tight.

Remark 2

Greedy-U (Algorithm 1 with coefficients αi=1) cannot do better than 2-approximation for the MSR problem with uniform item costs (c(·)=1).

Proof

We construct an instance where the algorithm returns $\mathrm{ALG}$ arbitrarily close to $\frac{1}{2}\mathrm{OPT}$. The main idea is to force the algorithm to pick up items that are only beneficial to functions with large budget and “starve” those with small budget in the early iterations. Consider functions $f_i$ with budget $b_i = i$, for all $i \in [m]$. Let $m = n$ be even, that is $m = n = 2k$ for some $k$. Select $\epsilon > 0$. For $i \le k$, we define $f_i(\pi) = \min\{1, I[v_i \in \pi] + \epsilon I[v_{i+k} \in \pi]\}$, where $I[\cdot]$ is the indicator function. For $i > k$, we define $f_i(\pi) = I[v_i \in \pi]$.

Clearly every $f_i$ is non-decreasing and submodular. One possible optimal permutation is $\pi^* = (v_1, \ldots, v_n)$, which leads to $\mathrm{OPT} = m$. Algorithm 1 with coefficient $\alpha_i = 1$ returns a permutation (out of many equivalent possible permutations) $\pi = (v_n, \ldots, v_1)$ with $\mathrm{ALG} \le (1+\epsilon)m/2$. By letting $\epsilon$ be arbitrarily small, we see that the bound in Theorem 1 is tight.
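The construction in this proof can be replayed numerically. The sketch below builds the instance for a small $k$ and runs an unweighted greedy whose ties are broken towards the larger item index; this tie-breaking rule, the helper names, and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction

def tight_instance(k, eps):
    """Functions f_1..f_2k of the proof; items are the integers 1..2k, b_i = i."""
    n = 2 * k

    def make(i):
        if i <= k:
            return lambda S: min(1, (i in S) + eps * ((i + k) in S))
        return lambda S: int(i in S)

    return [make(i) for i in range(1, n + 1)], list(range(1, n + 1))

def value(perm, functions, budgets):
    """MSR objective for unit costs: f_i sees the first b_i items."""
    return sum(f(set(perm[:b])) for f, b in zip(functions, budgets))

def greedy_u(n, functions, budgets, tie_break):
    """Unweighted greedy over items 1..n with an explicit tie-breaking key."""
    ranking, chosen = [], set()
    while len(ranking) < n:
        active = [i for i, b in enumerate(budgets) if len(ranking) < b]

        def gain(v):
            return sum(
                functions[i](chosen | {v}) - functions[i](chosen) for i in active
            )

        candidates = [v for v in range(1, n + 1) if v not in chosen]
        best = max(candidates, key=lambda v: (gain(v), tie_break(v)))
        ranking.append(best)
        chosen.add(best)
    return ranking

k, eps = 4, Fraction(1, 100)
functions, budgets = tight_instance(k, eps)
greedy = greedy_u(2 * k, functions, budgets, tie_break=lambda v: v)
optimal = list(range(1, 2 * k + 1))
print(greedy, value(greedy, functions, budgets), value(optimal, functions, budgets))
```

With this tie-breaking the greedy reproduces the reversed ranking from the proof, and its value stays close to half of the optimum for small $\epsilon$.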

Weighted greedy

Inspired by the instance that yields the tight bound in Remark 2, it is reasonable to let the algorithm favor functions with small budget at the early iterations. Such a strategy is desirable as it in some sense suggests fairness in resource allocation, i.e., more functions can afford at least one item from the returned ranking. It also turns out to have better performance in experiments. We show that such a strategy is indeed reliable by proving a constant-factor approximation guarantee.

Theorem 2

Greedy-W (Algorithm 1 with coefficients αi=1/bi) is a 3-approximation algorithm for the MSR problem with uniform item costs (c(·)=1).

Proof

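Write $R_j = \{f_i \in F : c(\pi_{j-1}) < b_i\}$. By the greedy selection criteria, we know that for an arbitrary item $v \in V$ it holds that

$$\sum_{f_i \in R_j} \alpha_i \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) \ge \sum_{f_i \in R_j} \alpha_i f_i(v \mid \pi_{j-1}). \tag{3}$$

We denote by $v^*_j$ the $j$-th item of the optimal permutation $\pi^*$. The idea is to replace the arbitrary item $v$ with $v^*_k \in \pi^*$ and compute a weighted sum. In order to define the weights, given $k < j$, we write $d_{jk} = 1/2$, and $d_{jj} = (j+1)/2$. Immediately, $\sum_{k \in [j]} d_{jk} = (j-1)/2 + (j+1)/2 = j$.

Now Equation (3) implies

$$\begin{aligned}
\sum_{f_i \in F} \sum_{j \in [b_i]} j \alpha_i \big(f_i(\pi_j) - f_i(\pi_{j-1})\big)
&= \sum_{j \in [n]} j \sum_{f_i \in R_j} \alpha_i \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) \\
&= \sum_{j \in [n]} \sum_{k \in [j]} d_{jk} \sum_{f_i \in R_j} \alpha_i \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) \\
&\ge \sum_{j \in [n]} \sum_{k \in [j]} d_{jk} \sum_{f_i \in R_j} \alpha_i f_i(v^*_k \mid \pi_{j-1}).
\end{aligned}$$

We will denote the left hand side of the above equation by LHS, and the right hand side by RHS. We will first bound the RHS. In order to do so, we need an additional bound on the weights $d_{jk}$, namely, for any fixed $k$,

$$\sum_{j=k}^{b} \frac{d_{jk}}{b} = \frac{k+1}{2b} + \frac{b-k}{2b} = \frac{b+1}{2b} > \frac{1}{2}. \tag{4}$$

We can now bound the right hand side with

$$\begin{aligned}
\mathrm{RHS} &= \sum_{f_i \in F} \sum_{j \in [b_i]} \sum_{k \in [j]} d_{jk} \alpha_i f_i(v^*_k \mid \pi_{j-1}) \\
&= \sum_{f_i \in F} \sum_{k \in [b_i]} \sum_{j=k}^{b_i} d_{jk} \alpha_i f_i(v^*_k \mid \pi_{j-1}) \\
&\ge \sum_{f_i \in F} \sum_{k \in [b_i]} \sum_{j=k}^{b_i} d_{jk} \alpha_i f_i(v^*_k \mid \pi_{b_i}) && \text{(submodularity)} \\
&\ge \sum_{f_i \in F} \sum_{k \in [b_i]} f_i(v^*_k \mid \pi_{b_i})/2 && \text{(Equation (4))} \\
&\ge \sum_{f_i \in F} \big(f_i(\pi^*_{b_i} \cup \pi_{b_i}) - f_i(\pi_{b_i})\big)/2 && \text{(submodularity)} \\
&\ge \sum_{f_i \in F} \big(f_i(\pi^*_{b_i}) - f_i(\pi_{b_i})\big)/2 && \text{(monotonicity)} \\
&= (\mathrm{OPT} - \mathrm{ALG})/2.
\end{aligned}$$

Now we consider the left hand side,

$$\mathrm{LHS} = \sum_{f_i \in F} \sum_{j \in [b_i]} \frac{j}{b_i} \big(f_i(\pi_j) - f_i(\pi_{j-1})\big) = \sum_{f_i \in F} \Big(\frac{b_i}{b_i} f_i(\pi_{b_i}) - \sum_{j < b_i} \frac{j+1-j}{b_i} f_i(\pi_j)\Big) \le \sum_{f_i \in F} f_i(\pi_{b_i}) = \mathrm{ALG}.$$

Putting everything together, $\mathrm{ALG} \ge \mathrm{LHS} \ge \mathrm{RHS} \ge (\mathrm{OPT} - \mathrm{ALG})/2$, and we obtain $3\,\mathrm{ALG} \ge \mathrm{OPT}$.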

Knapsack constraints

The traditional way of handling knapsack constraints is to adopt a cost-efficient variant of the greedy algorithm where in each iteration we select the item with the largest ratio between utility and cost. Furthermore, we compute a second solution by selecting the maximum-utility singleton item that is feasible. The idea is to use the second solution to rescue the situation in which the greedy algorithm starts with some cost-efficient small items and then is “starved” (i.e., the remaining budget is not enough to admit another valuable large item). This idea however falls short when it comes to the MSR problem. The reason is that there are multiple knapsacks and each one of them may be “starved” by different big items. A more sophisticated way is needed to compute an alternative second solution.

We now discuss our proposed method in more detail. First, an item $v \in V$ is called large with respect to a function $f_i \in F$ if its cost is more than half of the budget $b_i$, that is, $2c(v) > b_i$. It is obvious that a function $f_i$ can afford at most one large item. The following variant of the MSR problem targets a similar objective to that of MSR, but exclusive to only large items.

Problem 2

(Max-submodular ranking of large items (MSRL)) Given a set of items $V$, a set of non-decreasing and submodular functions $F = \{f_1, \ldots, f_m\}$, a cost function $c: V \to \mathbb{R}_+$, and non-negative budgets $b_i$ for each function $f_i$, the MSRL problem aims to find a permutation $\pi \in \sigma(V)$ that maximizes

$$z(\pi) = \sum_{v_j \in \pi} z(v_j; c(\pi_{j-1})) = \sum_{v_j \in \pi} \sum_{f_i \in F(v_j; \pi)} f_i(v_j), \tag{5}$$

where $F(v_j; \pi)$ is the set of functions that take the $j$-th item $v_j \in \pi$ as a large item, i.e., $F(v_j; \pi) = \{f_i \in F : 2c(v_j) > b_i, \; c(\pi_j) \le b_i\}$, and $z(v_j; c)$ is defined to be the contribution of item $v_j$ by appending it to a prefix with cost $c$.
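The score $z(\pi)$ only needs singleton values, item costs, and budgets. The sketch below computes it by a single pass over the ranking; the function and argument names are hypothetical, and the toy numbers are those of Example 1 later in this section.

```python
def large_item_score(perm, cost, budgets, singleton_value):
    """z(perm): function f_i counts item v if v is large for it (2 c(v) > b_i)
    and the prefix ending at v still fits its budget (c(pi_j) <= b_i)."""
    total, prefix_cost = 0.0, 0.0
    for v in perm:
        prefix_cost += cost[v]
        for i, b in enumerate(budgets):
            if 2 * cost[v] > b and prefix_cost <= b:
                total += singleton_value(i, v)
    return total

# Numbers taken from Example 1; variable names are illustrative.
cost = {"v1": 2.5, "v2": 3.0, "v3": 6.5}
budgets = [3.0, 9.0]
values = [{"v1": 1.0, "v2": 1.5}, {"v3": 1.0}]
singleton_value = lambda i, v: values[i].get(v, 0.0)

for perm in [("v1", "v3"), ("v2", "v3"), ("v3", "v1")]:
    print(perm, large_item_score(perm, cost, budgets, singleton_value))
```

No explicit bookkeeping is needed to enforce "at most one large item per function": two items that are both large for $f_i$ have total cost above $b_i$, so the second one never satisfies the prefix condition.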

We start by proving that the cost-efficient greedy algorithm yields a 3-approximation when there is no large item in $\pi^*$. Next, we devise a dynamic programming (DP) algorithm in Algorithm 2 to approximately solve MSRL. Finally, we prove that the best solution among the greedy solution and the DP solution can achieve an approximation guarantee that is arbitrarily close to 4.

Step 1: bounding small items in $\pi^*$. We first discuss the case in the absence of large items in $\pi^*$. Let us introduce some notation. We denote the $j$-th item selected by our algorithm by $u_j$. We denote the $k$-th item of the optimal permutation $\pi^*$ by $v^*_k$. We denote the greedy solution of Algorithm 1 with coefficient $\alpha_i = 1$ by $\mathrm{ALG}_1$ and the DP solution of Algorithm 2 by $\mathrm{ALG}_2$.

The next theorem shows that, if no function $f_i$ includes a large item in $\pi^*$, $\mathrm{ALG}_1$ ensures a constant-factor guarantee. Otherwise, we have an additional term $z(\pi^*)$, which we will bound later.

Theorem 3

The greedy algorithm yields $3\,\mathrm{ALG}_1 + z(\pi^*) \ge \mathrm{OPT}$.

The proof relies on the next technical observation.

Observation 1

For any $k$, if item $v^*_k \in \pi^*$ is feasible and not large for function $f_i$, i.e., $c(\pi^*_k) \le b_i$ and $2c(v^*_k) \le b_i$, then at the $j$-th greedy iteration such that $c(\pi_{j-1}) \le c(\pi^*_k)/2$, we have $c(\pi_{j-1}) + c(v^*_k) \le b_i$.

Proof

The proof is straightforward by combining $c(\pi_{j-1}) \le c(\pi^*_k)/2 \le b_i/2$ and $c(v^*_k) \le b_i/2$.

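Proof of Theorem 3

Write $R_j = \{f_i \in F : c(\pi_{j-1}) < b_i\}$. By the greedy selection criteria, we know that for an arbitrary item $v \in V$ in the $j$-th iteration it holds that

$$\frac{1}{c(u_j)} \sum_{f_i \in R_j : c(\pi_j) \le b_i} f_i(u_j \mid \pi_{j-1}) \ge \frac{1}{c(v)} \sum_{f_i \in R_j : c(\pi_{j-1}) + c(v) \le b_i} f_i(v \mid \pi_{j-1}). \tag{6}$$

To simplify the notation used in the above inequality, let us define $X_j = \{i \in [m] \mid c(\pi_j) \le b_i\}$ to be the valid function indices for $\pi_j$, and similarly $Y_{jk} = \{i \in [m] \mid c(\pi_{j-1}) + c(v^*_k) \le b_i\}$.

For function $f_i$, we define $\ell_i = \max\{j \in [n] : c(\pi_j) \le b_i\}$ and, analogously for the optimal permutation, $\ell^*_i = \max\{j \in [n] : c(\pi^*_j) \le b_i\}$.

Let us define a sequence of weights $d_j = \mathrm{len}(A_j)$, where the interval $A_j = (c(\pi_{j-1}), c(\pi_j)] \cap (0, c(\pi^*)/2]$.

We will start by lower bounding $\mathrm{ALG}_1$ with

$$\mathrm{ALG}_1 = \sum_{f_i \in F} \sum_{j \in [\ell_i]} f_i(u_j \mid \pi_{j-1}) = \sum_{j \in [n]} \sum_{i \in X_j} f_i(u_j \mid \pi_{j-1}) \ge \sum_{j \in [n]} \frac{d_j}{c(u_j)} \sum_{i \in X_j} f_i(u_j \mid \pi_{j-1}), \quad \text{since } d_j \le c(u_j).$$

Let us denote the right hand side with $C$. We will prove the theorem by showing that $C \ge (\mathrm{OPT} - \mathrm{ALG}_1 - z(\pi^*))/2$.

We define $d_{jk} = \mathrm{len}(A_j \cap B_k)$, where the interval $B_k = (c(\pi^*_{k-1})/2, c(\pi^*_k)/2]$. We see immediately that $d_j = \mathrm{len}(A_j) = \sum_{k \in [n]} d_{jk}$ as the intervals $B_k$ partition $A_j$. Similarly, $\sum_{j \in [n]} d_{jk} = \mathrm{len}(B_k) = c(v^*_k)/2$ as the intervals $A_j$ partition $B_k$.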

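We first claim that for any $i$,

$$\text{if } j > \ell_i \text{ and } k \le \ell^*_i, \text{ then } d_{jk} = 0. \tag{7}$$

To prove Equation (7) note that $j - 1 \ge \ell_i$ implies that $c(\pi_{j-1}) \ge b_i$ while $k \le \ell^*_i$ implies that $c(\pi^*_k) \le b_i$. Consequently, $A_j \cap B_k = \emptyset$ and $d_{jk} = 0$.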

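Let us now define $S_i = \{k \in [\ell^*_i] : 2c(v^*_k) \le b_i\}$ to be the set of small items for the $i$-th function. We claim that

$$\text{if } k \in S_i \text{ and } d_{jk} > 0, \text{ then } c(\pi_{j-1}) + c(v^*_k) \le b_i. \tag{8}$$

To prove Equation (8) note that since $k \le \ell^*_i$, we have $c(\pi^*_k) \le b_i$. Moreover, since $k \in S_i$, we have $2c(v^*_k) \le b_i$. If $c(\pi_{j-1}) > c(\pi^*_k)/2$, then $A_j \cap B_k = \emptyset$ and so $d_{jk} = 0$. Thus, $c(\pi_{j-1}) \le c(\pi^*_k)/2$. Observation 1 now proves Equation (8).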

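We can now lower bound $C$ with

$$\begin{aligned}
C &= \sum_{j \in [n]} \sum_{k \in [n]} \frac{d_{jk}}{c(u_j)} \sum_{i \in X_j} f_i(u_j \mid \pi_{j-1}) && \text{(since } d_j = \textstyle\sum_{k \in [n]} d_{jk}\text{)} \\
&\ge \sum_{j \in [n]} \sum_{k \in [n]} \frac{d_{jk}}{c(v^*_k)} \sum_{i \in Y_{jk}} f_i(v^*_k \mid \pi_{j-1}) && \text{(Equation (6))} \\
&= \sum_{i \in [m]} \sum_{k \in [n]} \sum_{j \in [\ell_i] : i \in Y_{jk}} \frac{d_{jk}}{c(v^*_k)} f_i(v^*_k \mid \pi_{j-1}) \\
&\ge \sum_{i \in [m]} \sum_{k \in S_i} \sum_{j \in [\ell_i] : i \in Y_{jk},\, d_{jk} > 0} \frac{d_{jk}}{c(v^*_k)} f_i(v^*_k \mid \pi_{j-1}) \\
&= \sum_{i \in [m]} \sum_{k \in S_i} \sum_{j \in [\ell_i]} \frac{d_{jk}}{c(v^*_k)} f_i(v^*_k \mid \pi_{j-1}) && \text{(Equation (8))} \\
&\ge \sum_{i \in [m]} \sum_{k \in S_i} \sum_{j \in [\ell_i]} \frac{d_{jk}}{c(v^*_k)} f_i(v^*_k \mid \pi_{\ell_i}) && \text{(submodularity)} \\
&= \sum_{i \in [m]} \sum_{k \in S_i} \sum_{j \in [n]} \frac{d_{jk}}{c(v^*_k)} f_i(v^*_k \mid \pi_{\ell_i}) && \text{(Equation (7))} \\
&= \sum_{i \in [m]} \sum_{k \in S_i} f_i(v^*_k \mid \pi_{\ell_i})/2 && \text{(since } \textstyle\sum_{j \in [n]} d_{jk} = c(v^*_k)/2\text{)} \\
&\ge -z(\pi^*)/2 + \sum_{i \in [m]} \sum_{k \in [\ell^*_i]} f_i(v^*_k \mid \pi_{\ell_i})/2 \\
&\ge -z(\pi^*)/2 + \sum_{i \in [m]} \big(f_i(\pi^*_{\ell^*_i} \cup \pi_{\ell_i}) - f_i(\pi_{\ell_i})\big)/2 && \text{(submodularity)} \\
&\ge -z(\pi^*)/2 + \sum_{i \in [m]} \big(f_i(\pi^*_{\ell^*_i}) - f_i(\pi_{\ell_i})\big)/2 && \text{(monotonicity)} \\
&= (\mathrm{OPT} - \mathrm{ALG}_1 - z(\pi^*))/2.
\end{aligned}$$

Putting everything together, we obtain $\mathrm{ALG}_1 \ge (\mathrm{OPT} - \mathrm{ALG}_1 - z(\pi^*))/2$, that is, $3\,\mathrm{ALG}_1 + z(\pi^*) \ge \mathrm{OPT}$.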
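The selection rule behind Equation (6) can be sketched as follows: at each step the greedy picks the item with the largest marginal gain per unit cost, counting only the functions that can still afford the item. This is an illustrative reading of the rule rather than the authors' code; the names are hypothetical, item costs are assumed to be strictly positive, and the toy numbers are those of Example 1 later in this section.

```python
def cost_efficient_greedy(items, cost, budgets, functions):
    """Rank items by marginal gain per unit cost over affordable functions."""
    ranking, chosen, spent = [], frozenset(), 0.0
    remaining = list(items)
    while remaining:
        def density(v):
            gain = sum(
                f(chosen | {v}) - f(chosen)
                for f, b in zip(functions, budgets)
                if spent + cost[v] <= b  # f_i can still afford v
            )
            return gain / cost[v]

        best = max(remaining, key=density)  # ties go to the earliest item
        ranking.append(best)
        remaining.remove(best)
        chosen = chosen | {best}
        spent += cost[best]
    return ranking

def modular(weights):
    """Modular (additive) set function defined by per-item weights."""
    return lambda S: sum(weights.get(v, 0.0) for v in S)

# Numbers taken from Example 1; variable names are illustrative.
cost = {"v1": 2.5, "v2": 3.0, "v3": 6.5}
budgets = [3.0, 9.0]
functions = [modular({"v1": 1.0, "v2": 1.5}), modular({"v3": 1.0})]

print(cost_efficient_greedy(["v1", "v2", "v3"], cost, budgets, functions))
```

On these numbers the greedy opens with item `v2`, after which neither function can profit from the remaining items; this is the kind of "starved" situation that the large-item analysis is meant to handle.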

Step 2: bounding large items in $\pi^*$. When some functions do take large items in OPT, the quantity $z(\pi^*)$ is positive, and we need to bound it. We will do this by solving approximately the MSRL problem.

Our first result allows us to order items based on their cost when solving MSRL.

Theorem 4

Assume a permutation $\pi$ with some item $v_i$ for which there is an index $j < i$ such that $c(v_j) \ge c(v_i)$. Define a sub-permutation $\pi'$ by removing $v_i$. Then $z(\pi') \ge z(\pi)$.

The proof relies on the following technical observation.

Observation 2

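Given an item $v$ and two sequences $\pi, \pi'$ with costs $c(\pi) \le c(\pi')$, we have $F(v; \pi \oplus v) \supseteq F(v; \pi' \oplus v)$ and $z(v; c(\pi)) \ge z(v; c(\pi'))$.

Proof

Note that

$$F(v; \pi \oplus v) = \{f_i \in F : 2c(v) > b_i, \; c(\pi) + c(v) \le b_i\} \supseteq \{f_i \in F : 2c(v) > b_i, \; c(\pi') + c(v) \le b_i\} = F(v; \pi' \oplus v).$$

Consequently, we have

$$z(v; c(\pi)) = \sum_{f_i \in F(v; \pi \oplus v)} f_i(v) \ge \sum_{f_i \in F(v; \pi' \oplus v)} f_i(v) = z(v; c(\pi')),$$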

proving the claim.

Proof of Theorem 4

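Let $v_i$ be the item that is in $\pi$ but not in $\pi'$. Assume that $2c(v_i) > b$ for an arbitrary function budget $b$. Then $c(\pi_{i-1}) + c(v_i) \ge 2c(v_i) > b$, following the assumptions of the theorem. Consequently, $F(v_i; \pi) = \emptyset$ and $z(v_i; c(\pi_{i-1})) = 0$. Let $u_j$ be the $j$-th item in $\pi'$. Observation 2 now implies that

$$z(\pi) = \sum_{v_i \in \pi} z(v_i; c(\pi_{i-1})) = \sum_{v_i \in \pi'} z(v_i; c(\pi_{i-1})) \le \sum_{u_j \in \pi'} z(u_j; c(\pi'_{j-1})) = z(\pi'),$$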

proving the claim.

The above theorem enables a way to limit ourselves to sequences of large items with non-decreasing costs when solving MSRL.

Let us assume for simplicity that z(·) is an integer-value in [k]. We will discuss how to relax this assumption shortly.

We can solve MSRL by constructing a table $T$ with entry $T(a, j)$ for each value $a \in [k]$ and each item with index $j \in [n]$. We define the entry $T(a, j)$ to be the lowest possible cost of a permutation using only the first $j$ items with at least value $a$,

$$T(a, j) = \min\{c(\pi) \mid z(\pi) \ge a, \; \pi \subseteq (v_1, \ldots, v_j)\}.$$

Note that it is also possible to solve MSRL by defining a different dual DP, where each entry T(b,j) contains the highest value realizable by a permutation using only the first j items with at most cost b. However, this dual DP is not amenable to the standard rounding trick we will introduce shortly.

Theorem 5

The table T satisfies the following relation:

$$T(a, j) = \min\Big\{T(a, j-1), \; \min_{a' \,:\, a' + z(v_j; T(a', j-1)) \ge a} T(a', j-1) + c(v_j)\Big\}, \tag{9}$$

when $j > 1$. Moreover, $T(0, 1) = 0$, $T(a, 1) = c(v_1)$ if $0 < a \le z(v_1)$, and $\infty$ otherwise.

Proof

We will prove by induction. The result holds trivially for T(a,1).

Next, we assume the theorem holds for all $T(a, j-1)$. Now we examine $T(a, j)$. Let $\pi$ be a sequence responsible for $T(a, j)$. Let $X$ be the value of the right hand side of Equation (9). Clearly, we have $X \ge c(\pi)$, and we now prove the claim by showing that $X \le c(\pi)$.

If $v_j$ is not in $\pi$, then $X \le T(a, j-1) \le c(\pi)$, and we are done. If $v_j$ is in $\pi$, then let $\pi'$ be the permutation without $v_j$. Let $a' = z(\pi')$, and by the inductive hypothesis, we know that $T(a', j-1) \le c(\pi')$. Then

$$a \le z(\pi) = a' + z(v_j; c(\pi')) \le a' + z(v_j; T(a', j-1)),$$

where the last inequality is by Observation 2. Therefore, according to the DP updating rule, we have

$$X \le T(a', j-1) + c(v_j) \le c(\pi') + c(v_j) = c(\pi),$$

completing the proof.

We can use Theorem 5 to construct $T$ using a dynamic program, which is described in Algorithm 2. Next, we will show that the DP solves the MSRL problem.

[Algorithm 2: pseudocode of the dynamic program for MSRL, given as a figure in the original]

Theorem 6

Assume that $z(\pi)$ is an integer in $[k]$ for every $\pi$. The permutation $\pi$ responsible for $T(a^*, n)$, where $a^* = \max\{a \mid T(a, n) < \infty\}$, returned by Algorithm 2 has the largest $z(\cdot)$ value. Besides, Algorithm 2 runs in $O(n(k+m) + m \log m)$ time.

Proof

The correctness of the algorithm follows directly from Theorem 5. There are in total $k \times n$ table entries. Note that we can avoid directly invoking $z(v_j; \cdot)$, which alone needs time $O(m)$, by sorting the functions $f_i$ by their budget $b_i$ and gradually including more $f_i$ as $T(a', j-1)$ and $a'$ decrease. This leads to an additional $O(m)$ time per index $j$.

We provide a numerical example to illustrate the DP algorithm.

Example 1

Consider two modular functions f1,f2 with budget b1=3,b2=9, and three items v1,v2,v3 with costs 2.5, 3, 6.5, respectively. We define f1(v1)=1, f1(v2)=1.5, f2(v3)=1, and 0 otherwise.

It is easy to see that both the cost-efficient greedy algorithm and the best singleton will pick item v2, which leads to a sub-optimal ranking, while the DP algorithm can help us find the optimal ranking.

The DP algorithm first initializes T(a,j) for all a and j. We then process items v1,v2,v3 in non-decreasing order by their costs.

  • Item $v_1$: we set $T(a, 1) = c(v_1)$ for all $0 < a \le f_1(v_1)$ and $T(0, 1) = 0$.

  • Item $v_2$: we set $T(a, 2) = T(a, 1)$ for all $a \le f_1(v_1)$, and $T(a, 2) = c(v_2)$ for all $f_1(v_1) < a \le f_1(v_2)$.

  • Item $v_3$: we set $T(a, 3) = T(a, 2)$ for all $a \le f_1(v_2)$, and $T(a, 3) = c(v_1) + c(v_3)$ for all $f_1(v_2) < a \le f_1(v_1) + f_2(v_3)$.

Finally, we return the permutation π=(v1,v3) responsible for T(a,3), where a=f1(v1)+f2(v3).
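The recurrence of Theorem 5 can be turned into a short program. The sketch below is a direct, unoptimized reading of Equation (9) and is not Algorithm 2 itself: it recomputes $z(v_j; \cdot)$ from scratch, so it does not attain the running time of Theorem 6. All names are hypothetical, and the values of Example 1 are doubled so that they become integers.

```python
import math

def msrl_dp(cost, budgets, value):
    """DP for MSRL with integer singleton values value[i][v] = f_i({v}).

    T[a] is the lowest cost of a sequence, over the items processed so far,
    whose large-item score is at least a; seq[a] is one such sequence.
    """
    n, m = len(cost), len(budgets)

    def z(v, prefix_cost):
        # Contribution of v when appended to a prefix of the given cost.
        return sum(
            value[i][v]
            for i in range(m)
            if 2 * cost[v] > budgets[i] and prefix_cost + cost[v] <= budgets[i]
        )

    # Each function takes at most one large item, so this bounds any score.
    k = sum(max(row) for row in value)
    T = [0.0] + [math.inf] * k
    seq = [()] + [None] * k

    # Theorem 4 lets us process items in non-decreasing order of cost.
    for v in sorted(range(n), key=lambda u: cost[u]):
        new_T, new_seq = T[:], seq[:]
        for a_prev in range(k + 1):
            if math.isinf(T[a_prev]):
                continue
            reach = min(k, a_prev + z(v, T[a_prev]))
            cand = T[a_prev] + cost[v]
            for a in range(1, reach + 1):
                if cand < new_T[a]:
                    new_T[a], new_seq[a] = cand, seq[a_prev] + (v,)
        T, seq = new_T, new_seq

    best = max(a for a in range(k + 1) if not math.isinf(T[a]))
    return best, seq[best], T[best]

# Example 1 with values doubled; items are indexed 0, 1, 2 for v1, v2, v3.
cost = [2.5, 3.0, 6.5]
budgets = [3.0, 9.0]
value = [[2, 3, 0], [0, 0, 2]]
print(msrl_dp(cost, budgets, value))
```

On the doubled instance the program returns the sequence of items `0` and `2`, that is $(v_1, v_3)$, which matches the permutation reported in Example 1.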

So far we have assumed that $z$ is an integer. Next, we show that with a standard rounding technique, the DP method in Algorithm 2 gives an FPTAS for MSRL. The idea is to apply the DP to a rounded instance, which is obtained by scaling every function $f_i$ by $1/K$, for a suitable $K$, and rounding down.

Theorem 7

Let $P = \max_{i,v} f_i(v)$, where $v$ is a large item for $f_i$. Let $K = \frac{P\epsilon}{m}$ for any constant $\epsilon > 0$. Define $f'_i = \lfloor f_i/K \rfloor$ and let $z'(\pi)$ be the score of a permutation using $f'_i$ instead of $f_i$. Let $\pi^*$ be the permutation with the largest $z(\pi^*)$. Then $K z'(\pi^*) \ge (1-\epsilon) z(\pi^*)$.

Proof

Due to scaling and rounding down we have $f_i(v) - K f'_i(v) \le K$. Since there can be at most one large item per function, and the score $z$ contains at most $m$ functions, we have $z(\pi^*) - K z'(\pi^*) \le mK = P\epsilon \le \epsilon z(\pi^*)$.

Corollary 1

Algorithm 2 with rounding yields 1/(1-ϵ) approximation guarantee in O(nm2/ϵ) time.

Proof

Let π be the permutation with the largest z and let π be the permutation with the largest z. Then z(π)Kz(π)Kz(π)(1-ϵ)z(π), proving the approximation guarantee.

To prove the running time note that z(·)mP and z(·)mP/K=m2/ϵ. Theorem 6 proves the claim.
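The scaling-and-rounding step can be illustrated on a small instance. The following is a minimal sketch, assuming for simplicity that every listed item is a large item for its function and reusing the values of Example 1; the helper `round_instance` is a hypothetical name, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction
from math import floor


def round_instance(values, eps):
    """Scale by K = P*eps/m and round down.

    values maps (function, item) -> f_i(item). Assumption of this sketch:
    every listed item is a large item for its function.
    """
    m = len({fn for fn, _ in values})  # number of functions
    P = max(values.values())           # largest single value
    K = P * eps / m
    rounded = {key: floor(val / K) for key, val in values.items()}
    return K, rounded


values = {
    ("f1", "v1"): Fraction(1),
    ("f1", "v2"): Fraction(3, 2),
    ("f2", "v3"): Fraction(1),
}
K, rounded = round_instance(values, Fraction(1, 2))
# Per-value loss caused by rounding down; each loss is at most K.
losses = {key: values[key] - K * rounded[key] for key in values}
```

With ϵ = 1/2 and m = 2, the scaling factor is K = 3/8, the rounded values are 2, 4 and 2, and no single value loses more than K.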

We are finally ready to state our main result for MSR with non-uniform cost.

Theorem 8

The best among Algorithm 1 with coefficient αi=1 and Algorithm 2 is a (3+1/(1-ϵ))-approximation for the MSR problem with non-uniform cost.

Proof

Theorem 3 and Corollary 1 imply that

(3 + (1-ϵ)^{-1}) ALG ≥ 3 ALG1 + (1-ϵ)^{-1} ALG2 ≥ 3 ALG1 + z(π) ≥ OPT,

where ALG=max{ALG1,ALG2}, proving the claim.
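The guarantee trades accuracy for table size: a smaller ϵ brings the ratio closer to 4 but enlarges the bound m²/ϵ on the rounded score that the DP table must cover. The sketch below merely evaluates the two expressions; `tradeoff` is a hypothetical helper and the value m = 100 is chosen only for illustration.

```python
def tradeoff(m, eps):
    """Return the approximation ratio 3 + 1/(1 - eps) and the bound m^2/eps."""
    ratio = 3 + 1 / (1 - eps)
    max_rounded_score = m * m / eps
    return ratio, max_rounded_score


for eps in (0.5, 0.1, 0.01):
    ratio, bound = tradeoff(100, eps)
    print(f"eps={eps}: ratio={ratio:.3f}, rounded-score bound={bound:.0f}")
```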

Experimental evaluation

In this section, we evaluate the performance of the proposed algorithms on real-world datasets. We first discuss our experimental evaluation for a playlist-making use-case. We model this use-case using the max-activation ranking (MAR) problem, which is the special case of the MSR problem in which the submodular functions fi are 0–1 functions. We then conduct two experiments for the MSR problem: (i) multiple intents re-ranking and (ii) sequential active learning. Finally, we evaluate the running time of our methods. Statistics of the datasets used in the experiments are summarized in Table 1. Our implementation and pre-processing scripts can be found in a GitHub repository.

Table 1. Datasets statistics

Dataset               n=|V|    m=|F|
Songs                 1872     100
Movies                3669     100
Books                 3753     1000
20 Newsgroups         172      5
Handwritten Digits    1347     3

Proposed methods and baselines

The proposed greedy algorithms are denoted by Greedy-U and Greedy-W, as discussed in Sect. 4. The proposed dynamic program is denoted by DP. As baselines we use the following algorithms.

  • The greedy algorithm for the SR problem (Azar and Gamzu 2011), which favors functions near completion. We refer to this baseline as AG.

  • When only the minimum budget among all functions is considered, the objective is a submodular function as a whole. We then consider the well-known “best-of-two” algorithm that returns the best solution among the solutions found by a cost-efficient greedy method and by selecting the best singleton item. We refer to this baseline as Subm.

  • A simple ranking method (Quality) that orders individual items in non-increasing quality.

  • A random ranking algorithm (Random).
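The "best-of-two" idea behind the Subm baseline can be sketched as follows. This is an illustrative version, not the implementation used in the experiments: it assumes a set function `f` that takes a list of items, positive item costs, and a greedy variant that only considers items that still fit in the budget. The toy instance borrows the costs and values of Example 1 with the minimum budget b = 3.

```python
def cost_efficient_greedy(items, cost, f, budget):
    """Repeatedly add the affordable item with the best marginal gain per unit cost."""
    chosen, spent = [], 0.0
    remaining = list(items)
    while True:
        base = f(chosen)
        affordable = [v for v in remaining if spent + cost[v] <= budget]
        if not affordable:
            break
        best = max(affordable, key=lambda v: (f(chosen + [v]) - base) / cost[v])
        if f(chosen + [best]) - base <= 0:
            break  # no remaining item improves the objective
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)
    return chosen


def best_of_two(items, cost, f, budget):
    """Return the better of the greedy solution and the best affordable singleton."""
    greedy = cost_efficient_greedy(items, cost, f, budget)
    singles = [[v] for v in items if cost[v] <= budget]
    best_single = max(singles, key=f, default=[])
    return max([greedy, best_single], key=f)


# Toy instance (hypothetical): a modular objective under the minimum budget.
cost = {"v1": 2.5, "v2": 3.0, "v3": 6.5}
gain = {"v1": 1.0, "v2": 1.5, "v3": 1.0}


def f(subset):
    return sum(gain[v] for v in subset)


solution = best_of_two(list(cost), cost, f, budget=3.0)
print(solution)  # prints ['v2']
```

On this instance both candidates consist of v2 alone, since v2 has the highest value per unit cost and exhausts the budget.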

Note that in general, computing the optimal solution requires enumerating all sequences of length equal to the maximum budget, which is computationally intractable even for a modest scenario with universe set |V|=100 and budget b=10.
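To get a sense of the scale, the number of ordered sequences of 10 distinct items drawn from a universe of 100 can be computed directly. The snippet assumes Python 3.8 or later, where `math.perm` is available.

```python
from math import perm

# Number of length-10 rankings over 100 items: 100 * 99 * ... * 91.
num_rankings = perm(100, 10)
print(f"{num_rankings:.3e}")
```

The count is roughly 6.3 × 10^19, far beyond what exhaustive enumeration can handle.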

Experiments with the max-activation ranking (MAR) problem

We evaluate our methods on three datasets: the Million Song dataset (Bertin-Mahieux et al. 2011), the MovieLens dataset (Harper and Konstan 2015), and the books category of the Amazon Review dataset (Ni et al. 2019). The three datasets have a similar format, where each record can be seen as a triple of user, item and rating. We describe our experimental evaluation for the first dataset; the other two datasets are processed in the same way and give very similar results, as can be verified in Fig. 1.

Fig. 1.

Results of using the MAR problem formulation for making a playlist of items. The goal is to maximize the number of activated users. The universe V includes songs, movies or books. A user (a 0–1 activation function fi) is activated if they like at least one item among all items they consume within their budget. Markers are jittered horizontally to avoid overlap

In the Million Song dataset, each record is a triple representing a user, song and play count. We assume that a user likes a song if they play the song more than once. We investigate an instance of the MAR problem for the application scenario of creating a playlist. In particular, we want to find a ranking of songs that maximizes the number of users who like at least one song among songs they listen to. In this case, each user is modeled as a 0–1 activation function. We generate a random budget for each user, i.e., the maximum number of songs a user will listen to, from 1 to a given maximum budget. We also generate a random cost from 1 to 10 for each song in order to experiment with an additional non-uniform cost scenario.
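For the unit-cost case, the objective of this playlist scenario can be written in a few lines. The sketch below is illustrative only: the data structures (`likes` as a mapping from user to a set of liked songs, `budgets` as a mapping from user to the number of songs listened to) and the function name `activated_users` are assumptions, not the authors' code.

```python
def activated_users(ranking, likes, budgets):
    """Count users who like at least one of the first budgets[user] songs."""
    count = 0
    for user, liked in likes.items():
        prefix = ranking[: budgets[user]]  # songs the user actually listens to
        if any(song in liked for song in prefix):
            count += 1
    return count


# Toy data (hypothetical).
likes = {"u1": {"s2"}, "u2": {"s3"}, "u3": {"s1", "s3"}}
budgets = {"u1": 1, "u2": 3, "u3": 2}

print(activated_users(["s1", "s2", "s3"], likes, budgets))  # prints 2
print(activated_users(["s2", "s3", "s1"], likes, budgets))  # prints 3
```

Moving s2 to the front activates the user with budget 1 without losing the others, which is why the order of the playlist matters and not only its content.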

The results of our evaluation are shown in Fig. 1. The error bars are over random user budgets and item costs. In the unit-cost scenario, the proposed Greedy-W algorithm performs best, closely followed by the proposed Greedy-U algorithm. The baselines perform worse, one reason being that they fail to take the user budgets into account. In the non-uniform cost scenario, the proposed Greedy-U algorithm obtains the best performance. The poor performance of DP is expected, as it is meant to help in extreme cases. DP also does not scale to the book-list dataset; more details on scalability are discussed in Sect. 6.4. Interestingly, Greedy-W performs worse than AG, which indicates that a more sophisticated weighting scheme is needed to combine non-uniform budget and cost.

Experiments with the max-submodular ranking (MSR) problem

Multiple intents re-ranking

We simulate a web-page ranking application for documents in the 20 Newsgroups dataset (Dua and Graff 2017). For each newsgroup, we treat its title as a query, and collect the documents that contain the query. We extract 5 topics from the collected documents by means of an LDA model (Blei et al. 2003). Subsequently, each topic (i.e., its top 20 keywords) is considered as a potential user intent, and the submodular utility for a particular topic when given a set of documents is the coverage rate of its top keywords. We aim to find a ranking of documents that maximizes the total utility of all user intents. As in the previous experiment, we generate a random budget for each user intent, i.e., the maximum number of documents the potential user will read, from 1 to a given maximum budget. For an additional non-uniform cost scenario, we use the document length as the cost for reading a document, and accordingly multiply the budget by the average document length.
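The coverage-rate utility of a user intent can be sketched as below. This is a simplified illustration that assumes lowercase keywords and whitespace tokenization; the function name `coverage_rate` and the toy documents are hypothetical.

```python
def coverage_rate(documents, keywords):
    """Fraction of the topic's keywords that appear in at least one document."""
    keywords = set(keywords)
    covered = set()
    for doc in documents:
        covered |= keywords & set(doc.lower().split())
    return len(covered) / len(keywords)


# Toy intent and documents (hypothetical).
keywords = ["space", "nasa", "orbit", "launch"]
docs = ["NASA plans a launch", "the orbit of the launch vehicle"]

print(coverage_rate(docs[:1], keywords))  # prints 0.5
print(coverage_rate(docs, keywords))      # prints 0.75
```

The second document adds only one new keyword because "launch" is already covered, which is the diminishing-returns behavior that makes the utility submodular.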

The results of our experiment are shown in Fig. 2, where we report the average performance across all newsgroups. In the unit-cost scenario, the top-contender algorithms have close performance. This is due to the overwhelming advantage of lengthy documents that contain more words and produce higher utility. In the more realistic non-uniform cost scenario, our algorithms, Greedy-U and Greedy-W, achieve the best performance. The Quality algorithm performs worst, as it fails to consider the cost of items, and its first-ranked lengthy document exceeds the user budget most of the time.

Fig. 2.

MSR for multiple intents re-ranking in web page ranking. The goal is to maximize the total utility of all user intents within their individual reading budget. The universe V includes documents. The utility of a user intent (a coverage function fi) is represented by the coverage rate of its top keywords. Markers are jittered horizontally to avoid overlap

Sequential active learning

Active learning seeks to make label queries on only a small number of informative data points in order to maximize model performance. In particular, for the k-nearest neighbors (kNN) model, an intuitive measure for informativeness of a set of labeled data points is the average distance from an unlabeled data point to its closest labeled point, i.e., the facility-location function (Wei et al. 2015). We refer to this average distance as the radius. Thus, the active-learning task can be naturally formulated as labeling a small subset of data to maximize the radius reduction. Note that the reduction of the radius by labeling a subset of data points is clearly non-decreasing and submodular.
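The radius and its reduction can be computed as in the following sketch. It assumes Euclidean distance, a non-empty set of initially labeled points, and an average taken over all points (labeled points contribute distance 0); the function name `radius` and the toy points are illustrative.

```python
from math import dist


def radius(points, labeled):
    """Average distance from each point to its closest labeled point."""
    return sum(min(dist(p, q) for q in labeled) for p in points) / len(points)


# Toy data on a line (hypothetical).
points = [(0, 0), (1, 0), (4, 0), (5, 0)]
before = radius(points, [(0, 0)])
after = radius(points, [(0, 0), (4, 0)])
print(before, after, before - after)  # prints 2.5 0.5 2.0
```

Labeling the point (4, 0) reduces the radius by 2.0; labeling further points near an already labeled one would reduce it by less.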

In our setting, we assume that we have access to multiple models that are trained on the same labeled data, and we aim to label data sequentially to maximize the total reduction in the radii among all models. This happens, for example, when each model runs on a different subset of features. Interestingly, in this case each model can be seen as a student with different learning capacity, and a teacher tries to optimize the classroom teaching by feeding them labeled data (Zhu et al. 2017). We evaluate the performance of active-learning kNNs (k=1) with Euclidean distance in the Handwritten Digits dataset (Dua and Graff 2017). Each kNN model adopts a different strategy in unsupervised feature selection, such as variance thresholding, PCA, and feature agglomeration. Again, we generate a random query budget for each model and a random cost (from 1 to 10) for labeling each data point.

As we can see in Fig. 3, all greedy algorithms are very effective in reducing the radii. Radius reduction correlates clearly with model accuracy on the test data. Note that the Random algorithm is a standard strong baseline in data subset selection, yet it is outperformed by the greedy algorithms by a large margin. The gap becomes more evident in the non-uniform cost scenario, as the Random algorithm fails to take into account the item costs.

Fig. 3.

MSR for sequential data subset selection for kNN models. The goal is to boost the average predictive accuracy of kNN models. The universe V includes all data points. The sum of the surrogate objective function fi (reduction of radii) for each model is optimized. Markers are jittered horizontally to avoid overlap

Running time

We examine the scalability of all methods by fixing either the number of users (i.e., functions) or the maximum budget (equal to the number of items), while varying the other. In Fig. 4 we demonstrate the running time of all algorithms for the task of making a synthetic playlist. In this case, we generate a dataset by assuming that each user likes a small random subset of items. We generate a random budget for each user, from 1 to the given maximum budget, and a random cost from 1 to 10 for each item.

Fig. 4.

Running time of all methods for the task of making a synthetic playlist

When comparing the running time, the Quality algorithm is a meaningful baseline, as it produces a ranking after a single evaluation of each item over all functions, i.e., in O(max{n log n, mn}) time. Its running time varies almost linearly as a function of the budget, in contrast to the behavior of the naïve greedy algorithms. Thanks to the lazy evaluation technique (Leskovec et al. 2007), the running time of all greedy algorithms actually grows nearly linearly in the budget. The AG algorithm is slower as it is subject to frequent function evaluations, because its greedy criterion depends on the current function values. The running time of the DP algorithm grows quadratically in the number of functions, so it has difficulty scaling to a very large number of functions. On the other hand, it scales well in the number of items and, in particular, when the budget is big it finishes quickly as there is no large item. The running time of all algorithms except Random grows linearly in the number of functions, which is inevitable if the utility of items is considered.
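The lazy evaluation technique can be sketched for the plain cardinality-constrained setting as follows. This is a generic illustration of the idea, not the paper's Greedy-U or Greedy-W: it assumes a non-decreasing submodular set function `f` that takes a list of items, and it relies on the fact that a marginal gain computed for an earlier selection is an upper bound on the current gain.

```python
import heapq


def lazy_greedy(items, f, k):
    """Select up to k items, re-evaluating a marginal gain only when it is stale."""
    selected = []
    base = f(selected)
    # Heap entries: (-gain, tie-break index, item, selection size when the gain was computed).
    heap = [(-(f([v]) - base), idx, v, 0) for idx, v in enumerate(items)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, idx, v, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date; by submodularity no stale entry can beat it.
            selected.append(v)
            base = f(selected)
        else:
            # Stale upper bound: recompute and push back with the current stamp.
            gain = f(selected + [v]) - base
            heapq.heappush(heap, (-gain, idx, v, len(selected)))
    return selected


# Toy coverage function (hypothetical data).
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}


def coverage(subset):
    return len(set().union(*(covers[v] for v in subset)))


print(lazy_greedy(list(covers), coverage, 2))  # prints ['a', 'c']
```

In this run only the gain of c is recomputed after a is selected; the stale bounds of b and d are already too small to compete, so they are never re-evaluated.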

Conclusions

In this paper, we introduce a novel problem in the active area of submodular optimization. Our problem, max-submodular ranking (MSR), asks to find a ranking of items such that the sum of multiple budgeted submodular utilities is maximized. The MSR problem has wide application in the ranking of web pages, ads, and other types of items. We propose several practical algorithms with approximation guarantees for the MSR problem, with either cardinality or knapsack budget constraints. We empirically demonstrate the superior performance of the proposed algorithms on real-life datasets, compared with a state-of-the-art baseline and other meaningful heuristics.

One direction for future work is to narrow the gap between the approximation ratio and the lower bound. Another direction is to study the online version of the MSR problem, to allow for the arrival of new submodular functions. Other potential directions include imposing a more general constraint for each submodular function and experimenting with new applications.

Acknowledgements

This research is supported by the Academy of Finland projects MALSOME (343045), AIDA (317085) and MLDB (325117), the ERC Advanced Grant REBOUND (834862), the EC H2020 RIA project SoBigData++ (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Funding

Open access funding provided by Royal Institute of Technology.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Guangyi Zhang, Email: guaz@kth.se.

Nikolaj Tatti, Email: nikolaj.tatti@helsinki.fi.

Aristides Gionis, Email: argioni@kth.se.

References

  1. Azar Y, Gamzu I (2011) Ranking with submodular valuations. In: Proceedings of the twenty-second annual ACM-SIAM symposium on discrete algorithms. SIAM, pp 1070–1079
  2. Azar Y, Gamzu I, Yin X (2009) Multiple intents re-ranking. In: Proceedings of the forty-first annual ACM symposium on theory of computing, pp 669–678
  3. Bansal N, Jain K, Kazeykina A, Naor JS (2010) Approximation algorithms for diversified search ranking. In: International colloquium on automata, languages, and programming. Springer, pp 273–284
  4. Bertin-Mahieux T, Ellis DP, Whitman B, Lamere P (2011) The million song dataset. In: Proceedings of the 12th international conference on music information retrieval (ISMIR 2011)
  5. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
  6. Carbonell JG, Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: SIGIR
  7. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
  8. Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652. https://doi.org/10.1145/285055.285059
  9. Feige U, Lovász L, Tetali P (2004) Approximating min sum set cover. Algorithmica 40(4):219–234. https://doi.org/10.1007/s00453-004-1110-5
  10. Feldman M, Nutov Z, Shoham E (2020) Practical budgeted submodular maximization. arXiv preprint arXiv:2007.04937
  11. Gamzu I (2010) Web search ranking and allocation mechanisms. PhD thesis, Tel Aviv University
  12. Harper FM, Konstan JA (2015) The movielens datasets: history and context. ACM Trans Interact Intell Syst 5(4):1–19. https://doi.org/10.1145/2827872
  13. Jansen BJ, Booth DL, Spink A (2008) Determining the informational, navigational, and transactional intent of web queries. Inf Process Manage 44(3):1251–1266. https://doi.org/10.1016/j.ipm.2007.07.015
  14. Kempe D, Kleinberg J, Tardos É (2015) Maximizing the spread of influence through a social network. Theory Comput 11(4):105–147. https://doi.org/10.4086/toc.2015.v011a004
  15. Krause A, Golovin D (2014) Submodular function maximization. Tractability 3:71–104
  16. Krause A, Singh A, Guestrin C (2008) Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. J Mach Learn Res 9(2)
  17. Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 420–429
  18. Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 510–520
  19. Nemhauser GL, Wolsey LA (1978) Best algorithms for approximating the maximum of a submodular set function. Math Oper Res 3(3):177–188. https://doi.org/10.1287/moor.3.3.177
  20. Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265–294. https://doi.org/10.1007/BF01588971
  21. Ni J, Li J, McAuley J (2019) Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 188–197
  22. Streeter M, Golovin D (2008) An online algorithm for maximizing submodular functions. In: Proceedings of the 21st international conference on neural information processing systems, pp 1577–1584
  23. Wei K, Iyer R, Bilmes J (2015) Submodularity in data subset selection and active learning. In: International conference on machine learning. PMLR, pp 1954–1963
  24. Yaroslavtsev G, Zhou S, Avdiukhin D (2020) “bring your own greedy”+ max: Near-optimal 1/2-approximations for submodular knapsack. In: International conference on artificial intelligence and statistics. PMLR, pp 3263–3274
  25. Zhai C, Cohen WW, Lafferty J (2015) Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: SIGIR
  26. Zhang Z, Chong EK, Pezeshki A, Moran W, Howard SD (2012) Submodularity and optimality of fusion rules in balanced binary relay trees. In: 2012 IEEE 51st IEEE conference on decision and control (CDC). IEEE, pp 3802–3807
  27. Zhu X, Liu J, Lopes M (2017) No learner left behind: on the complexity of teaching multiple learners simultaneously. In: IJCAI, pp 3588–3594

Articles from Data Mining and Knowledge Discovery are provided here courtesy of Springer
