Published in final edited form as: Proc Annu Symp Found Comput Sci. 2020;1:517–528.

Near Optimal Linear Algebra in the Online and Sliding Window Models

Vladimir Braverman 1, Petros Drineas 2, Cameron Musco 3, Christopher Musco 4, Jalaj Upadhyay 5, David P Woodruff 6, Samson Zhou 6

Abstract

We initiate the study of numerical linear algebra in the sliding window model, where only the most recent W updates in a stream form the underlying data set. Although many existing algorithms in the sliding window model use or borrow elements from the smooth histogram framework (Braverman and Ostrovsky, FOCS 2007), we show that many interesting linear-algebraic problems, including spectral and vector induced matrix norms, generalized regression, and low-rank approximation, are not amenable to this approach in the row-arrival model. To overcome this challenge, we first introduce a unified row-sampling based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and ℓ1-subspace embeddings in the sliding window model, which often use nearly optimal space and achieve nearly input sparsity runtime. Our algorithms are based on "reverse online" versions of offline sampling distributions such as (ridge) leverage scores, ℓ1 sensitivities, and Lewis weights to quantify both the importance and the recency of a row; our structural results on these distributions may be of independent interest for future algorithmic design.

Although our techniques initially address numerical linear algebra in the sliding window model, our row-sampling framework rather surprisingly implies connections to the well-studied online model; our structural results also give the first sample optimal (up to lower order terms) online algorithm for low-rank approximation/projection-cost preservation. Using this powerful primitive, we give online algorithms for column/row subset selection and principal component analysis that resolve the main open question of Bhaskara et al. (FOCS 2019). We also give the first online algorithm for ℓ1-subspace embeddings. We further formalize the connection between the online model and the sliding window model by introducing an additional unified framework for deterministic algorithms using a merge and reduce paradigm and the concept of online coresets, which we define as a weighted subset of rows of the input matrix that can be used to compute a good approximation to some given function on all of its prefixes. Our sampling based algorithms in the row-arrival online model yield online coresets, giving deterministic algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and ℓ1-subspace embeddings in the sliding window model that use nearly optimal space.

Keywords: streaming algorithms, online algorithms, sliding window model, numerical linear algebra

I. Introduction

The advent of big data has reinforced efforts to design and analyze algorithms in the streaming model, where data arrives sequentially, can be observed in a small number of passes (ideally once), and the proposed algorithms are allowed to use space that is sublinear in the size of the input. For example in a typical e-commerce setup, the entries of a row represent the number of each item purchased by a customer in a transaction. As the transaction is completed, the advertiser receives an entire row of information as an update, which corresponds to the row-arrival model. Then the underlying covariance matrix summarizes information about which items tend to be purchased together, while low-rank approximation identifies a representative subset of transactions.

However, the streaming model does not fully address settings where the data is time-sensitive; the advertiser is not interested in the outdated behavior of customers. Thus one scenario that is not well-represented by the streaming model is when recent data is considered more accurate and important than data that arrived prior to a certain time window, as in applications such as network monitoring [8], [9], [18]–[20], event detection in social media [32], and data summarization [11], [25]. To model such settings, Datar et al. [21] introduced the sliding window model, which is parametrized by the size W of the window that represents the size of the active data that we want to analyze, in contrast to the so-called “expired data”. The objective is to compute or approximate statistics only on the active data using memory that is sublinear in the window size W.

The sliding window model is more appropriate than the unbounded streaming model in a number of applications [2], [30], [34], [37]. For example, in large scale social media analysis, each row in a matrix can correspond to some online document, such as the content of a Twitter post, along with some corresponding time information. Although a streaming algorithm can analyze the data starting from a certain time, analysis within a recent time frame, e.g., the most recent week or month, could provide much more attractive information to advertisers or content providers. Similarly, in the task of data summarization, the underlying data set is a matrix whose rows correspond to a number of subjects, while the columns correspond to a number of features. Information on each subject arrives sequentially and the task is to select a small number of representative subjects, which is usually done through some kind of PCA [33], [35]. However, if the behavior of the subjects has recently and indefinitely changed, we would like the summary to only be representative of the updated behavior, rather than the outdated information.

Another time-sensitive scenario that is not well-represented by the streaming model is when irreversible decisions must be made upon the arrival of each update in the stream, which enables further actions downstream, such as in scheduling, facility location, and data structures. The goal of the online model is to address such settings by requiring immediate and permanent actions on each element of the stream as it arrives, while still remaining competitive with an optimal offline solution that has full knowledge of the entire input. We specifically study the case where the online model must also use space sublinear in the size of the input, though this restriction is not always enforced across algorithms in the online model for other problems. In the context of online PCA, an algorithm receives a stream of input vectors and must immediately project each input vector into a lower dimension space of its choice. The projected vector can then be used as input to some downstream rotationally invariant algorithm, such as classification, clustering, or regression, which would run more efficiently due to the lower dimensional input. Moreover, PCA serves as a popular preprocessing step because it often actually improves the quality of the solution by removing isotropic noise [6] from the data. Thus in applications such as clustering, the denoised projection can perform better than the original input. The online model has also been extensively used in many other applications, such as learning [3], [5], [28], (prophet) secretary problems [24], [26], [36], ad allocation [31], and a variety of graph applications [12], [13], [27].

Generally, the sliding window model and the online model do not seem related, resulting in a different set of techniques being developed for each problem and each setting. Surprisingly, our results exhibit a seemingly unexplored and interesting connection between the sliding window model and the online model. Our observation is that an online algorithm should be correct on all prefixes of the input, in case the stream terminates at that prefix; on the other hand, a sliding window algorithm should be correct on all suffixes of the input, in case the previous elements expire, leaving only the suffix (and perhaps a number of "dummy" elements). Then can we gain something by viewing each update to a sliding window algorithm as an update to an online algorithm in reverse? At first glance, the answer might seem to be no; we cannot simulate an online algorithm with the stream in reverse order because it would have access to the entire stream, whereas a sliding window algorithm only maintains a sketch of the previous elements upon each update. However, it turns out that in the row-arrival model, a sketch of the previous elements often suffices to approximately simulate the entire stream input to the online algorithm. Indeed, we show that any row-sampling based online algorithm for the problems of spectral approximation, low-rank approximation/projection-cost preservation, and ℓ1-subspace embedding automatically implies a corresponding deterministic sliding window algorithm for the problem!

A. Our Contributions

We initiate and perform a comprehensive study of both randomized and deterministic algorithms in the sliding window model. We first present a randomized row sampling framework for spectral approximation, low-rank approximation/projection-cost preservation, and ℓ1-subspace embeddings in the sliding window model. Most of our results are space or time optimal, up to lower order terms. Our sliding window structural results imply structural results for the online setting, which we use to give algorithms for row/column subset selection, PCA, projection-cost preservation, and subspace embeddings in the online model. Our online algorithms are simple and intuitive, yet they are either novel for the particular problem or improve upon the state-of-the-art, e.g., Bhaskara et al. (FOCS 2019) [4]. Finally, we formalize a surprising connection between online algorithms and sliding window algorithms by describing a unified framework for deterministic algorithms in the sliding window model based on the merge-and-reduce paradigm and the concept of online coresets, which are provably generated by online algorithms.

Row Sampling Framework for the Sliding Window Model:

One may ask whether existing algorithms in the sliding window model can be generalized to problems for numerical linear algebra. In the full version of the paper [7], we give counterexamples showing that various linear-algebraic functions, including the spectral norm, vector induced matrix norms, generalized regression, and low-rank approximation, are not smooth according to the definitions of [10] and therefore cannot be used in the smooth histogram framework. This motivates the need for new frameworks for problems of linear algebra in the sliding window model. We first give a row sampling based framework for space and runtime efficient randomized algorithms for numerical linear algebra in the sliding window model.

Framework I.1 (Row Sampling Framework for the Sliding Window Model).

There exists a row sampling based framework in the sliding window model that, upon the arrival of each new row of a stream with condition number κ, chooses whether to keep or discard each previously stored row according to some predefined probability distribution for each problem. Using the appropriate probability distribution, we obtain for any approximation parameter ε > 0:

  1. A randomized algorithm for spectral approximation in the sliding window model that with high probability, outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that $(1-\varepsilon) A^\top A \preceq M^\top M \preceq (1+\varepsilon) A^\top A$, while storing $O(d\varepsilon^{-2} \log n \log \kappa)$ rows at any time and using nearly input sparsity time. (See Theorem II.4.)

  2. A randomized algorithm for low-rank approximation/projection-cost preservation in the sliding window model that with high probability, outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that for all rank k orthogonal projection matrices $P \in \mathbb{R}^{d \times d}$,
    $\|M - MP\|_F^2 \in (1 \pm \varepsilon)\|A - AP\|_F^2,$
    while storing $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ rows at any time and using nearly input sparsity time. (See Theorem II.10.)
  3. A randomized algorithm for ℓ1-subspace embeddings in the sliding window model that with high probability, outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that $(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1$ for all $x \in \mathbb{R}^d$, while storing $O(d^2\varepsilon^{-2} \log^2 n \log \kappa)$ rows at any time. (See Theorem IV.5.)

Here we say the stream has condition number κ if the ratio of the largest to smallest nonzero singular values of any matrix formed by consecutive rows of the stream is at most κ.

For low-rank approximation/projection-cost preservation, we can further improve the polylogarithmic factors from $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ rows stored at any time to $O(k\varepsilon^{-2} \log^2 n)$, under the assumption that the entries of the underlying matrix are integers with magnitude at most poly(n), even though log κ can be as large as $O(d \log n)$ under these assumptions. To the best of our knowledge, not only are our contributions in Framework I.1 the first such algorithms for these problems in the sliding window model, but Theorem II.4 and Theorem II.10 are also both space and runtime optimal up to lower order terms, even compared to row sampling algorithms in the offline setting, for most reasonable regimes of parameters [1], [23].

Numerical Linear Algebra in the Online Model:

An important step in the analysis of our row sampling framework for numerical linear algebra in the sliding window model is bounding the sum of the sampling probabilities for each row. In particular, we provide a tight bound on the sum of the online ridge leverage scores that was previously unexplored. We show that our bounds along with the paradigm of row sampling with respect to online ridge leverage scores offer simple online algorithms that improve upon the state-of-the-art across broad applications.

Theorem I.2 (Online Rank k Projection-Cost Preservation).

Given parameters ε > 0, k > 0, and a matrix $A \in \mathbb{R}^{n \times d}$, whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that with high probability, outputs a matrix M that has $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ (rescaled) rows of A and for all rank k orthogonal projection matrices $P \in \mathbb{R}^{d \times d}$,

$(1-\varepsilon)\|A - AP\|_F^2 \le \|M - MP\|_F^2 \le (1+\varepsilon)\|A - AP\|_F^2.$

(See Theorem III.1.)

Theorem III.1 immediately yields improvements on the two online algorithms recently developed by Bhaskara et al. (FOCS 2019) for online row subset selection and online PCA [4].

Theorem I.3 (Online Row Subset Selection).

Given parameters ε > 0, k > 0, and a matrix $A \in \mathbb{R}^{n \times d}$, whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that with high probability, outputs a matrix M with $O(k\varepsilon^{-1} \log n \log^2 \kappa)$ rows that contains a matrix T of k rows such that

$\|A - A T^{\dagger} T\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2.$

(See Theorem III.2.)

By comparison, the online row subset selection algorithm of [4] stores $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ rows to succeed with high probability. Moreover, our algorithm guarantees the existence of a subset T of k rows that provides a (1 + ε)-approximation to the best rank k solution, whereas [4] promises the bicriteria result that their matrix of rank $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ is a (1 + ε)-approximation to the best rank k solution.

The online PCA algorithm of [4] also offers this bicriteria guarantee; for an input matrix $A \in \mathbb{R}^{n \times d}$, they give an algorithm that outputs a matrix $M \in \mathbb{R}^{n \times m}$, where $m = O(k\varepsilon^{-2}(\log n + \log \kappa)^4)$, and a matrix X of rank m, so that $\|A - MX\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2$. Our online row subset selection algorithm can also be combined with the online PCA algorithm of [4] to guarantee the existence of a submatrix $Y \in \mathbb{R}^{k \times d}$ within X such that there exists a matrix B with $\|A - BY\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2$.

Theorem I.4 (Online Principal Component Analysis).

Given parameters n, d, k, ε > 0 and a matrix $A \in \mathbb{R}^{n \times d}$ whose rows arrive sequentially in a stream with condition number κ, let $m = O(k\varepsilon^{-2}(\log n + \log \kappa)^4)$. There exists an algorithm for online PCA that immediately outputs a row $m_i \in \mathbb{R}^m$ after seeing row $a_i \in \mathbb{R}^d$ and with high probability, outputs a matrix $X \in \mathbb{R}^{m \times d}$ at the end of the stream such that

$\|A - MX\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2,$

where $A_{(k)}$ is the best rank k approximation to A. Moreover, X contains a submatrix $Y \in \mathbb{R}^{k \times d}$ such that there exists a matrix B such that

$\|A - BY\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2.$

(See Theorem III.3.)

Our sliding window algorithm for ℓ1-subspace embeddings also uses an online ℓ1-subspace embedding algorithm that we develop.

Theorem I.5 (Online ℓ1-Subspace Embedding).

Given $\varepsilon > 1/n$ and a matrix $A \in \mathbb{R}^{n \times d}$, whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that outputs a matrix M with $O(d^2\varepsilon^{-2} \log^2 n \log \kappa)$ (rescaled) rows of A such that

$(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1,$

for all $x \in \mathbb{R}^d$ with high probability. (See Theorem IV.4.)

A Coreset Framework for Deterministic Sliding Window Algorithms:

To formalize a connection between online algorithms and sliding window algorithms, we give a framework for deterministic sliding window algorithms based on the merge-and-reduce paradigm and the concept of an online coreset, which we define as a weighted subset of rows of A that can be used to compute a good approximation to some given function on all prefixes of A. On the other hand, observe that a row-sampling based online algorithm does not know when the input might terminate, so it must output a good approximation to any prefix of the input, which is exactly the requirement of an online coreset! Moreover, an online algorithm cannot revoke any of its decisions, so the history of its decisions is fully observable. Indeed, each of our online algorithms implies the existence of an online coreset for the corresponding problem.

Framework I.6 (Coreset Framework for Deterministic Sliding Window Algorithms).

There exists a merge-and-reduce framework for numerical linear algebra in the sliding window model using online coresets. If the input stream has condition number κ, then for approximation parameter $\varepsilon > 1/n$, the framework gives:

  1. A deterministic algorithm for spectral approximation in the sliding window model that outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that $(1-\varepsilon) A^\top A \preceq M^\top M \preceq (1+\varepsilon) A^\top A$, while storing $O(d\varepsilon^{-2} \log^4 n \log \kappa)$ rows at any time. (See Theorem V.5.)

  2. A deterministic algorithm for low-rank approximation/projection-cost preservation in the sliding window model that outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that for all rank k orthogonal projection matrices $P \in \mathbb{R}^{d \times d}$,
    $\|M - MP\|_F^2 \in (1 \pm \varepsilon)\|A - AP\|_F^2,$
    while storing $O(k\varepsilon^{-2} \log^4 n \log^2 \kappa)$ rows at any time. (See Theorem V.7.)
  3. A deterministic algorithm for ℓ1-subspace embeddings in the sliding window model that outputs a matrix M that is a subset of (rescaled) rows of an input matrix $A \in \mathbb{R}^{W \times d}$ such that $(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1$ for all $x \in \mathbb{R}^d$, while storing $O(d\varepsilon^{-2} \log^5 n \log \kappa)$ rows at any time. (See Theorem V.11.)

All of the results presented using Framework I.6 are space optimal, up to lower order terms [1], [17], [23]. Again, for low-rank approximation/projection-cost preservation, the number of sampled rows can be improved from $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ to $O(k\varepsilon^{-2} \log^2 n)$, under the assumption that the entries of the underlying matrix are integers at most poly(n) in magnitude.

We remark that neither our randomized framework Framework I.1 nor our deterministic framework Framework I.6 requires the sliding window parameter W as input during the processing of the stream. Instead, they create oblivious data structures from which approximations for any window can be computed after processing the stream. We outline our methods in this extended abstract and defer all proofs to [7].

II. Row Sampling Framework for the Sliding Window Model

In this section, we give space and time efficient algorithms for matrix functions in the sliding window model. Our general approach will be to use the following framework. As the stream $r_1, \ldots, r_n \in \mathbb{R}^d$ arrives, we maintain a weighted subset of these rows at each time. Suppose at some time t, we have a matrix $M_t = r_{t,1} \circ \cdots \circ r_{t,m_t}$ of weighted rows of the stream that can be used to give a good approximation to the function applied to any suffix of the stream. Upon the arrival of row t + 1, we first set $M_{t+1} = r_{t+1}$. Then starting with $i = m_t$ and moving backwards toward i = 1, we repeatedly prepend a weighted version of $r_{t,i}$ to $M_{t+1}$ with some probability that depends on $r_{t,i}$, $M_{t+1}$, and the matrix function to be approximated. Once the rows of $M_t$ have each been either added to $M_{t+1}$ or discarded, we proceed to row t + 2, i.e., the next update in the stream.

Note that the matrices $M_t$ serve no real purpose other than for presentation; the framework is just storing a subset of weighted rows at each time and repeatedly performing online row sampling, starting with the most recent row. Since an online algorithm must be correct on all prefixes of the input, our framework must be correct on all suffixes of the input and in particular, on the sliding window. This observation demonstrates a connection between online algorithms and sliding window algorithms that we explore in greater detail in later sections. We give our framework in Algorithm 1.
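To make the control flow concrete, the following is a minimal numpy sketch of the framework; the function and parameter names are illustrative rather than taken from the paper, and the paper's Algorithm 1 additionally specifies how the kept rows are rescaled and how the keep probabilities are coupled across time steps, which this sketch glosses over.

```python
import numpy as np

def sliding_window_row_sample(stream, score, rng=None):
    """Schematic version of the framework: on each arrival, walk over the
    stored rows from most recent to oldest and keep each one with a
    probability returned by score(row, M), where M stacks the rows that
    arrived after `row`."""
    rng = rng or np.random.default_rng()
    stored = []                              # list of (timestamp, row), oldest first
    for t, r in enumerate(stream):
        kept = [(t, r)]                      # M_{t+1} starts as the new row
        M = r.reshape(1, -1)
        for ts, row in reversed(stored):     # most recent stored row first
            if rng.random() < score(row, M):
                kept.insert(0, (ts, row))
                M = np.vstack([row.reshape(1, -1), M])
            # otherwise the row is discarded and never revisited
        stored = kept
        yield stored   # rows with timestamp >= i approximate the suffix starting at i
```

Plugging in different score functions (Algorithms 2, 3, and 5 in the paper) yields the spectral approximation, projection-cost preservation, and ℓ1-subspace embedding instantiations described below.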

[Algorithm 1: the sliding window row sampling framework (pseudocode figure omitted).]

A. ℓ2-Subspace Embedding

We first give a randomized algorithm for spectral approximation in the sliding window model that is both space and time efficient. [16] defined the concept of online (ridge) leverage scores and showed that by sampling each row of a matrix A with probability proportional to its online leverage score, the weighted sample at the end of the stream provides a (1 + ε) spectral approximation to A. We recall the definition of online ridge leverage scores of a matrix from [16], and introduce reverse online ridge leverage scores.

Definition II.1 (Online/Reverse Online (Ridge) Leverage Scores).

For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, let $A_i = a_1 \circ \cdots \circ a_i$ and $Z_i = a_n \circ \cdots \circ a_i$. Let λ ≥ 0. The online λ-ridge leverage score of row $a_i$ is defined to be $\min\!\left(1, a_i^\top (A_{i-1}^\top A_{i-1} + \lambda I)^{-1} a_i\right)$, while the reverse online λ-ridge leverage score of row $a_i$ is defined to be $\min\!\left(1, a_i^\top (Z_{i+1}^\top Z_{i+1} + \lambda I)^{-1} a_i\right)$. The (reverse) online leverage scores are defined by setting λ = 0, with the convention that the (reverse) online leverage score of $a_i$ is 1 if rank($A_i$) > rank($A_{i-1}$) (respectively, if rank($Z_i$) > rank($Z_{i+1}$)).

From the definition, it is evident that the reverse online (ridge) leverage scores are monotonic; whenever a new row is added to A, the scores of existing rows cannot increase.

Intuitively, the online leverage score quantifies how important row $a_i$ is with respect to the previous rows, while the reverse online leverage score quantifies how important row $a_i$ is with respect to the following rows; the ridge leverage scores are regularized versions of these quantities. As the name suggests, online (ridge) leverage scores seem appropriate for online algorithms, while reverse online (ridge) leverage scores seem appropriate for sliding window algorithms, where recency is emphasized. Hence, we use reverse online leverage scores to compute the sampling probability of each row in Algorithm 2, which serves as our customized Score function in Algorithm 1 for spectral approximation.
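For concreteness, the reverse online λ-ridge leverage scores of Definition II.1 can be computed exactly (offline, in quadratic time) by sweeping over the rows from newest to oldest; the numpy sketch below is ours and assumes λ > 0, omitting the rank-increase convention that the definition uses for λ = 0.

```python
import numpy as np

def reverse_online_ridge_leverage_scores(A, lam):
    """Reverse online lambda-ridge leverage scores (Definition II.1), lam > 0:
    tau_i = min(1, a_i^T (Z_{i+1}^T Z_{i+1} + lam*I)^{-1} a_i), where Z_{i+1}
    stacks the rows a_n, ..., a_{i+1} that arrive *after* a_i."""
    assert lam > 0
    n, d = A.shape
    scores = np.empty(n)
    G = lam * np.eye(d)              # Z_{n+1}^T Z_{n+1} + lam*I  (Z_{n+1} is empty)
    for i in range(n - 1, -1, -1):   # walk from the most recent row backwards
        a = A[i]
        scores[i] = min(1.0, float(a @ np.linalg.solve(G, a)))
        G += np.outer(a, a)          # G becomes Z_i^T Z_i + lam*I for the next step
    return scores
```

The online scores are computed symmetrically by sweeping from oldest to newest, replacing $Z_{i+1}$ with $A_{i-1}$.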


However, these quantities are related; [16] provide an asymptotic bound on the sum of the online ridge leverage scores of any matrix, which also implies a bound on the sum of the reverse online ridge leverage scores, by reversing the order of the rows in a matrix. We present these results for the case where the input matrix has full rank, noting that similar bounds can be provided if the input matrix is not full rank by replacing the smallest singular value with the smallest nonzero singular value.

Lemma II.2 (Bound on Sum of Online Ridge Leverage Scores).

[16] Let the rows of $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$ arrive in a stream with condition number κ and let $\ell_i$ be the online (ridge) leverage score of $a_i$ with regularization λ. Then $\sum_{i=1}^n \ell_i = O\!\left(d \log \frac{\|A\|_2^2}{\lambda}\right)$ for $\lambda > \sigma_{\min}(A)$ and $\sum_{i=1}^n \ell_i = O(d \log \kappa)$ for $\lambda \le \sigma_{\min}(A)$. It follows that if $\tau_i$ is the reverse online ridge leverage score of $a_i$, then $\sum_{i=1}^n \tau_i = O\!\left(d \log \frac{\|A\|_2^2}{\lambda}\right)$ for $\lambda > \sigma_{\min}(A)$ and $\sum_{i=1}^n \tau_i = O(d \log \kappa)$ for $\lambda \le \sigma_{\min}(A)$.
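For orientation, the standard argument behind bounds of this form (as in [16]) couples each unclipped online ridge leverage score $\tilde{\ell}_i = a_i^\top(A_{i-1}^\top A_{i-1} + \lambda I)^{-1} a_i$ to a determinant ratio and telescopes; the following sketch is a paraphrase of that argument, not the paper's proof.

```latex
% Telescoping sketch for the sum of online lambda-ridge leverage scores (cf. [16]).
\begin{align*}
\det(A_i^\top A_i + \lambda I)
  &= \det(A_{i-1}^\top A_{i-1} + \lambda I)\,\bigl(1 + \tilde{\ell}_i\bigr)
     && \text{(matrix determinant lemma)} \\
\ell_i = \min(1, \tilde{\ell}_i)
  &\le 2\ln(1 + \tilde{\ell}_i)
   = 2\ln\frac{\det(A_i^\top A_i + \lambda I)}{\det(A_{i-1}^\top A_{i-1} + \lambda I)}
     && \text{(since } \min(1,x) \le 2\ln(1+x) \text{ for } x \ge 0\text{)} \\
\sum_{i=1}^{n} \ell_i
  &\le 2\ln\frac{\det(A^\top A + \lambda I)}{\det(\lambda I)}
   \le 2d\,\ln\frac{\|A\|_2^2 + \lambda}{\lambda}
     && \text{(telescoping over } i\text{)}
\end{align*}
```

The reverse online scores obey the same bound by applying the argument to the rows in reverse order.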

We show that at any time t, Algorithm 1 using the Score function of Algorithm 2 stores a matrix $M_t$ whose rows with timestamp after a time i ∈ [t] provide a (1 + ε) spectral approximation to any matrix $Z_i = r_t \circ \cdots \circ r_i$. This statement gives a good approximation to any suffix of the stream at all times; in particular, for t = n and i = n − W + 1, it shows that Algorithm 1 using the Score function of Algorithm 2 outputs a spectral approximation for the matrix induced by the sliding window model.

Lemma II.3 (Spectral Approximation Guarantee, Bounds on Sampling Probabilities).

Let t ∈ [n], λ ≥ 0, and ε > 0. For i ∈ [t], let $q_i = \min\!\left(1, \alpha\, r_i^\top (Z_{i+1}^\top Z_{i+1})^{-1} r_i\right)$, where $Z_{i+1} = r_t \circ \cdots \circ r_{i+1}$. Then with high probability, after the arrival of row $r_t$, Algorithm 1 using the Score function of Algorithm 2 will have sampled each row $r_i$ with probability at least $q_i$ and at most $4q_i$. Moreover, if Y is the suffix of $M_t$ consisting of the (scaled) rows whose timestamps are at least i, then

$(1-\varepsilon)(Z_i^\top Z_i + \lambda I) \preceq Y^\top Y + \lambda I \preceq (1+\varepsilon)(Z_i^\top Z_i + \lambda I).$

Theorem II.4 (Randomized Spectral Approximation Sliding Window Algorithm).

Let $r_1, \ldots, r_n \in \mathbb{R}^d$ be a stream of rows and κ be the condition number of the stream. Let W > 0 be a window size parameter and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. Given a parameter ε > 0, there exists an algorithm that outputs a matrix M with a subset of (rescaled) rows of A such that $(1-\varepsilon) A^\top A \preceq M^\top M \preceq (1+\varepsilon) A^\top A$ and stores $O(d\varepsilon^{-2} \log n \log \kappa)$ rows at any time, with high probability.

B. Low-Rank Approximation

In this section, we give a randomized algorithm for low-rank approximation in the sliding window model that is both space and time optimal, up to lower order terms. Throughout this section, we use $A_{(k)}$ to denote the best rank k approximation to a matrix $A \in \mathbb{R}^{n \times d}$, so that $A_{(k)} := \operatorname{argmin}_{\operatorname{rank}(X) \le k} \|A - X\|_F^2$. Recall the following definition of a projection-cost preservation, from which it follows that obtaining a projection-cost preservation of A suffices to produce a low-rank approximation of A.

Definition II.5 (Rank k Projection-Cost Preservation [15]).

For m < n, a matrix $M \in \mathbb{R}^{m \times d}$ of rescaled rows of $A \in \mathbb{R}^{n \times d}$ is a (1 + ε) projection-cost preservation if, for all rank k orthogonal projection matrices $P \in \mathbb{R}^{d \times d}$,

$(1-\varepsilon)\|A - AP\|_F^2 \le \|M - MP\|_F^2 \le (1+\varepsilon)\|A - AP\|_F^2.$

[15] showed that an additive-multiplicative spectral approximation of a matrix A along with an additional moderate condition that holds for ridge leverage score sampling gives a projection-cost preservation of A.

Lemma II.6.

[15] Let $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$ and $\lambda \le \frac{\|A - A_{(k)}\|_F^2}{k}$. Let p be the largest integer such that $\sigma_p(A)^2 \ge \lambda$ and let $X = A - A_{(p)}$. Let $S \in \mathbb{R}^{m \times n}$ be a sampling matrix so that M = SA is a subset of scaled rows of A. If $(1-\varepsilon)(A^\top A + \lambda I) \preceq M^\top M + \lambda I \preceq (1+\varepsilon)(A^\top A + \lambda I)$ and $\left|\, \|SX\|_F^2 - \|X\|_F^2 \,\right| \le \varepsilon \|A - A_{(k)}\|_F^2$, then M is a rank k projection-cost preservation of A with approximation parameter 24ε.

We focus our discussion on the additive-multiplicative spectral approximation, since the same argument of [15] with Freedman's inequality rather than Chernoff bounds shows that sampling matrices generated from ridge leverage scores satisfy the condition $\left|\, \|SX\|_F^2 - \|X\|_F^2 \,\right| \le \varepsilon \|A - A_{(k)}\|_F^2$ with high probability, even when the entries of S are not independent:

Lemma II.7.

[15] Let $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$, and $\tau_i$ be the ridge leverage score of $a_i$ with regularization λ. Let p be the largest integer such that $\sigma_p(A)^2 \ge \lambda$ and let $X = A - A_{(p)}$. Let $S \in \mathbb{R}^{m \times n}$ be a sampling matrix so that row $a_i$ is sampled by S, not necessarily independently, with probability at least $\min(1, C \tau_i \varepsilon^{-2} \log n)$ for a sufficiently large constant C. Then $\left|\, \|SX\|_F^2 - \|X\|_F^2 \,\right| \le \varepsilon \|A - A_{(k)}\|_F^2$ with high probability.

On the other hand, our space analysis in Section II-A relied on bounding the sum of the online leverage scores by $O(d \log \kappa)$ through Lemma II.2; a better bound is not known if we set $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$. This gap provides a barrier for algorithmic design not only in the sliding window model but also in the online model. We give a tighter analysis showing that the sum of the online ridge leverage scores for $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$ is $O(k \log \kappa)$.

Now if we knew the value of $\frac{\|A - A_{(k)}\|_F^2}{k}$ a priori, we could set $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$ and immediately apply Lemma II.3 to show that Algorithm 1, with a Score function that uses the λ regularization, outputs a matrix M that is a rank k projection-cost preservation of A.

Initially, even a constant factor approximation to $\frac{\|A - A_{(k)}\|_F^2}{k}$ seems challenging because the quantity is not smooth. This issue can be circumvented using additional procedures, such as spectral approximation on rows with reduced dimension [14]. Even simpler, observe that sampling with any regularization factor $\lambda < \frac{\|A - A_{(k)}\|_F^2}{k}$ would still provide the guarantees of Lemma II.6.

We could set λ = 0 and still obtain a rank k projection-cost preservation of A, but smaller values of λ correspond to a larger number of sampled rows, and the total number of sampled rows for λ = 0 would be proportional to d, as opposed to our goal of k. Instead, observe that if B is any prefix or suffix of rows of A, then $\|B - B_{(k)}\|_F^2 \le \|A - A_{(k)}\|_F^2$. In other words, we can use the rows that have already been sampled to give a constant factor approximation to $\|A - A_{(k)}\|_F^2$ as it evolves, i.e., as more rows of A arrive. We again pay for the underestimate of $\|A - A_{(k)}\|_F^2$ by sampling an additional number of rows, but we show that we cannot sample too many rows before our approximation to $\|A - A_{(k)}\|_F^2$ doubles, which only incurs an additional $O(\log \kappa)$ factor in the number of sampled rows. We give the Score function for low-rank approximation in Algorithm 3.
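As an illustration of this idea, the following is a minimal sketch (ours, not the paper's Algorithm 3, whose constants and probability bookkeeping we do not reproduce) of a score function in this spirit: the regularization λ is estimated from the rows already stored, which form a suffix of A and hence underestimate $\|A - A_{(k)}\|_F^2 / k$, and the returned probability is a reverse online ridge leverage score against those rows, oversampled by the usual $\varepsilon^{-2}\log n$ factor.

```python
import numpy as np

def lowrank_score(r, M, k, eps, n, C=10.0):
    """Hedged sketch of a Score function for projection-cost preservation:
    `r` is the stored row being re-examined and `M` stacks the (rescaled)
    rows that arrived after `r`; the constant C is a placeholder."""
    d = M.shape[1]
    # lam ~ ||M - M_(k)||_F^2 / k, a lower bound on ||A - A_(k)||_F^2 / k
    sq_sv = np.linalg.svd(M, compute_uv=False) ** 2
    lam = float(sq_sv[k:].sum()) / k
    if lam <= 0:
        return 1.0                      # suffix has rank at most k: keep the row
    score = min(1.0, float(r @ np.linalg.solve(M.T @ M + lam * np.eye(d), r)))
    return min(1.0, C * np.log(n) / eps**2 * score)
```

This score can be plugged into the generic framework sketched at the beginning of this section in place of the spectral approximation score.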

We first bound the probability that each row is sampled, analogous to Lemma II.3.

Lemma II.8 (Projection-Cost Preservation Guarantee, Bounds on Sampling Probabilities).

Let t ∈ [n] be fixed and for each i ∈ [t], let $Z_i = r_t \circ \cdots \circ r_i$. Let $\lambda = \frac{\|Z_{i+1} - (Z_{i+1})_{(k)}\|_F^2}{k}$ and ε > 0, and let $q_i = \min\!\left(1, \alpha\, r_i^\top (Z_{i+1}^\top Z_{i+1} + \lambda I)^{-1} r_i\right)$. Then with high probability, after the arrival of row $r_t$, Algorithm 1 using the Score function of Algorithm 3 will have sampled row $r_i$ with probability at least $q_i$ and at most $2q_i$. Moreover, if Y is the suffix of $M_t$ consisting of the (scaled) rows whose timestamps are at least i, then

$(1-\varepsilon)(Z_i^\top Z_i + \lambda I) \preceq Y^\top Y + \lambda I \preceq (1+\varepsilon)(Z_i^\top Z_i + \lambda I).$


We now give a tighter bound on the sum of the online λ-ridge leverage scores $\ell_i$ for $\lambda = \frac{\|A - A_{(k)}\|_2^2}{k}$.

Lemma II.9 (Bound on Sum of Online Ridge Leverage Scores).

Let $A \in \mathbb{R}^{n \times d}$ have condition number κ. Let β ≥ 1, k ≥ 1 be constants and $\lambda = \frac{\|A - A_{(k)}\|_2^2}{\beta k}$. Then $\sum_{i=1}^n \ell_i = O(k \log \kappa)$.

The sum of the reverse online λ-ridge leverage scores is bounded by the same quantity, since the rows of the input matrix can simply be considered in reverse order. We now show that Algorithm 1 using the Score function of Algorithm 3 gives a relative error low-rank approximation with efficient space usage.

Theorem II.10 (Randomized Low-Rank Approximation Sliding Window Algorithm).

Let $r_1, \ldots, r_n \in \mathbb{R}^d$ be a stream of rows and κ be the condition number of the matrix $r_1 \circ \cdots \circ r_n$. Let W > 0 be a window size parameter and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. Given a parameter ε > 0, there exists an algorithm that with high probability, outputs a matrix M that is a (1 + ε) rank k projection-cost preservation of A and stores $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ rows at any time.

We describe how to achieve nearly input sparsity runtime and elaborate on bounded precision and condition number in [7].

III. Simple Rank Constrained Algorithms in the Online Model

In this section, we show that the paradigm of row sampling with respect to online ridge leverage scores offers simple analysis for a number of online algorithms that improve upon the state-of-the-art.

A. Online Projection-Cost Preservation

As a warm-up, we first demonstrate how our previous analysis bounding the sum of the online ridge leverage scores can be applied to analyze a natural online algorithm for producing a projection-cost preservation of a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$. Our algorithm samples each row with probability equal to its online ridge leverage score, where the regularization parameter $\lambda_i$ is computed at each step. Note that if $A_i = a_1 \circ \cdots \circ a_i$, then $\|A_i - (A_i)_{(k)}\|_F^2 \le \|A_j - (A_j)_{(k)}\|_F^2$ for any i < j. Thus if $\lambda_i = \frac{\|A_{i-1} - (A_{i-1})_{(k)}\|_F^2}{k}$, then sampling row $a_i$ with online ridge leverage score regularized by $\lambda_i$ uses a higher probability than with ridge leverage score regularized by $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$. Although [16] was only interested in spectral approximation and therefore set $\lambda = \varepsilon\,\sigma_{\min}(A)$, they nevertheless show that online row sampling with any regularization λ gives an additive-multiplicative spectral approximation to A. Thus by setting $\lambda = \frac{\|A - A_{(k)}\|_F^2}{k}$, our algorithm outputs a rank k projection-cost preservation of A by Lemma II.6 and Lemma II.7. Moreover, our bounds for the sum of the online ridge leverage scores in Lemma II.9 show that our algorithm only samples a small number of rows, optimal up to lower order factors.
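The following is a hedged numpy sketch of such a sampler (ours, with a placeholder constant C rather than the paper's parameters): it maintains the exact d × d Gram matrix of the prefix, re-estimates the regularization $\lambda_i$ from that prefix, and keeps each arriving row with probability proportional to its online ridge leverage score.

```python
import numpy as np

def online_pcp_sample(rows, k, eps, C=10.0, rng=None):
    """Sample rows online with probability proportional to their online
    ridge leverage scores, with regularization re-estimated from the
    prefix seen so far; kept rows are rescaled by 1/sqrt(p)."""
    rng = rng or np.random.default_rng()
    rows = np.asarray(rows, dtype=float)
    n, d = rows.shape
    G = np.zeros((d, d))                     # Gram matrix of the prefix A_{i-1}
    out = []                                 # rescaled sampled rows
    for a in rows:
        eig = np.linalg.eigvalsh(G)[::-1]    # squared singular values of A_{i-1}
        lam = max(float(eig[k:].sum()), 0.0) / k   # ~ ||A_{i-1} - (A_{i-1})_(k)||_F^2 / k
        if lam == 0.0:
            p = 1.0                          # prefix has rank at most k: keep the row
        else:
            score = min(1.0, float(a @ np.linalg.solve(G + lam * np.eye(d), a)))
            p = min(1.0, C * np.log(n) / eps**2 * score)
        if rng.random() < p:
            out.append(a / np.sqrt(p))
        G += np.outer(a, a)                  # update the prefix Gram matrix
    return np.vstack(out)
```

Storing the exact d × d Gram matrix is only for simplicity of the sketch; the paper's analysis is stated in terms of the number of sampled rows.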

Theorem III.1 (Online Rank k Projection-Cost Preservation).

Given parameters ε > 0, k > 0, and a matrix $A \in \mathbb{R}^{n \times d}$ whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that outputs a matrix M with $O(k\varepsilon^{-2} \log n \log^2 \kappa)$ (rescaled) rows of A such that

$\|M - M_{(k)}\|_F^2 \in (1 \pm \varepsilon)\|A - A_{(k)}\|_F^2,$

and thus M is a rank k projection-cost preservation of A, with high probability.

B. Online Row Subset Selection

We next describe how to perform row subset selection in the online model. Our starting point is an offline algorithm by [15] and the paradigm of adaptive sampling [22], [23], [29], which is the procedure of repeatedly sampling rows of A with probability proportional to their squared distances to the subspace spanned by the rows of Z. [15] first obtained a matrix Z that is a constant factor low-rank approximation to the underlying matrix A through ridge leverage score sampling. They observed that a theorem of [22] shows that adaptively sampling $O(k/\varepsilon)$ additional rows S of A against the rows of Z suffices for Z ∘ S to contain a (1 + ε) factor approximation to the online row subset selection problem.

[15] then adapted this approach to the streaming model by maintaining a reservoir of $O(k/\varepsilon)$ rows and replacing rows appropriately as new rows arrive and more information about Z is obtained. Alternatively, we modify the proof of [22], when Z is given, to show that adaptive sampling can also be performed on data streams to obtain a different but valid S by oversampling each row of A by an $O(k/\varepsilon)$ factor. Moreover, by running a low-rank approximation algorithm in parallel, downsampling can be performed as rows of Z arrive, so that the above approach can be performed in a single pass over the stream.

In the online model, we cannot downsample rows of S once they are selected, since Z evolves as the stream arrives. Fortunately, [15] showed that the adaptive sampling probabilities can be upper bounded by the λ-ridge leverage scores, where $\lambda = \frac{1}{k}\|A - A_{(k)}\|_F^2$. Since the λ-ridge leverage scores are at most the online λ-ridge leverage scores, we can again sample rows proportional to their online λ-ridge leverage scores. It then suffices to again use Lemma II.9 to bound the number of rows sampled in this manner.

Theorem III.2 (Online Row Subset Selection).

Given parameters $0 < \varepsilon < \frac{1}{2}$, k > 0, and a matrix $A \in \mathbb{R}^{n \times d}$ whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that with high probability, outputs a matrix M with $O(k\varepsilon^{-1} \log n \log^2 \kappa)$ rows that contains a matrix T of k rows such that

$\|A - A T^{\dagger} T\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2.$

C. Online Principal Component Analysis

Recall that in the online PCA problem, the rows of the matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$ arrive sequentially in a data stream, and after each row $a_i$ arrives, the goal is to immediately output a row $m_i \in \mathbb{R}^{1 \times m}$ such that at the end of the stream, there exists a low-rank matrix $X \in \mathbb{R}^{m \times d}$ such that

$\|A - MX\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2,$

where $M = m_1 \circ \cdots \circ m_n$ and $A_{(k)}$ is again the best rank k approximation to A.

[4] gave an algorithm for the online PCA problem that embeds into a matrix X that has rank $O(k\varepsilon^{-2}(\log n + \log \kappa)^4)$ with high probability, where κ is the condition number of A. Their algorithm maintains and updates the matrix X throughout the data stream. After the arrival of row $a_i$, the matrix X is updated using a combination of residual based sampling and a black-box theorem of [6]. Row $m_i$ is then output as the embedding of $a_i$ into X by $m_i = a_i X_{(i)}^{\dagger}$, where $X_{(i)}$ is the matrix X after row i has been processed. X has the property that no rows of X are ever removed across the duration of the algorithm, so $m_i$ only yields an upper bound on the best embedding of $a_i$. However, this matrix X does not provably contain a good rank k approximation to A. That is, X does not contain a rank k submatrix $Y \in \mathbb{R}^{k \times d}$ such that there exists a matrix B such that

$\|A - BY\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2.$

Recall that our online row subset selection algorithm returns a matrix Z with $O(k\varepsilon^{-1} \log n \log^2 \kappa)$ rows of A that contains a submatrix Y of k rows that is a good rank k approximation of A. Thus, our online row subset selection algorithm can be combined with the algorithm of [4] to output matrices M and W such that MW is a good approximation for the online PCA problem and, moreover, W contains a submatrix Y of k rows that is a good rank k approximation of A. Namely, the algorithm of [4] can be run to produce a matrix $X_{(i)}$ after the arrival of each row $a_i$, and our online row subset selection algorithm can be run to produce a matrix $Z_{(i)}$ after the arrival of each row $a_i$. Let $W_{(0)} \in \mathbb{R}^{0 \times d}$ and let $W_{(i)}$ append the newly added rows of $X_{(i)}$ and $Z_{(i)}$ to $W_{(i-1)}$. We then immediately output the embedding $m_i = a_i W_{(i)}^{\dagger}$.

Theorem III.3 (Online Principal Component Analysis).

Given parameters n, d, k, ε > 0 and a matrix $A \in \mathbb{R}^{n \times d}$ whose rows arrive sequentially in a stream with condition number κ, let $m = O(k\varepsilon^{-2}(\log n + \log \kappa)^4)$. There exists an algorithm for online PCA that immediately outputs a row $m_i \in \mathbb{R}^m$ after seeing row $a_i \in \mathbb{R}^d$ and outputs a matrix $W \in \mathbb{R}^{m \times d}$ at the end of the stream such that

$\|A - MW\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2,$

where $A_{(k)}$ is the best rank k approximation to A. Moreover, W contains a submatrix $Y \in \mathbb{R}^{k \times d}$ such that there exists a matrix B such that

$\|A - BY\|_F^2 \le (1+\varepsilon)\|A - A_{(k)}\|_F^2.$

IV. ℓ1-Subspace Embeddings

In this section, we consider ℓ1-subspace embeddings in both the online model and the sliding window model.

Definition IV.1 ((Online) ℓ1 Sensitivity).

For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, we define the ℓ1 sensitivity of $a_i$ by $\max_{x \in \mathbb{R}^d} \frac{|a_i^\top x|}{\|Ax\|_1}$ and the online ℓ1 sensitivity of $a_i$ by $\max_{x \in \mathbb{R}^d} \frac{|a_i^\top x|}{\|A_i x\|_1}$, where $A_i = a_1 \circ \cdots \circ a_i$. We again use the convention that the online ℓ1 sensitivity of $a_i$ is 1 if rank($A_i$) > rank($A_{i-1}$).

Note that from the definition, the online ℓ1 sensitivity of a row is at least as large as the ℓ1 sensitivity of the row. Similarly, the (online) ℓ1 sensitivity of each of the previous rows in A cannot increase when a new row r is added to A. Thus we can use the online ℓ1 sensitivities to define an online algorithm for ℓ1-subspace embedding.
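To make the definition concrete, the ℓ1 sensitivity of a row can be evaluated by a linear program: maximize $a_i^\top x$ subject to $\|Ax\|_1 \le 1$, written with auxiliary variables $t \ge |Ax|$. The sketch below is ours (scipy is used only for illustration); the paper's algorithms need only constant-factor approximations of these scores and compute them more efficiently.

```python
import numpy as np
from scipy.optimize import linprog

def l1_sensitivity(a, A):
    """l1 sensitivity of row `a` with respect to `A` (Definition IV.1):
    max_x |a^T x| / ||A x||_1, via the LP  max a^T x  s.t.  ||A x||_1 <= 1."""
    n, d = A.shape
    c = np.concatenate([-a, np.zeros(n)])                    # maximize a^T x
    A_ub = np.block([[A, -np.eye(n)],                        #  A x - t <= 0
                     [-A, -np.eye(n)],                       # -A x - t <= 0
                     [np.zeros((1, d)), np.ones((1, n))]])   # sum(t) <= 1
    b_ub = np.concatenate([np.zeros(2 * n), [1.0]])
    bounds = [(None, None)] * d + [(0, None)] * n            # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return -res.fun

def online_l1_sensitivities(A):
    """Online l1 sensitivities: the score of a_i is measured against the
    prefix A_i = a_1 o ... o a_i; when a_i adds a new direction to the row
    span, the LP value is automatically 1, matching the convention above."""
    return np.array([l1_sensitivity(A[i], A[: i + 1]) for i in range(A.shape[0])])
```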

We first show ℓ1 sensitivity sampling gives an ℓ1-subspace embedding.

Lemma IV.2.

For $\varepsilon > 1/n$, Algorithm 4 returns a matrix M such that with high probability, we have for all $x \in \mathbb{R}^d$,

$\left|\, \|Mx\|_1 - \|Ax\|_1 \,\right| \le \varepsilon \|Ax\|_1.$

To analyze the space complexity, it remains to bound the sum of the online ℓ1 sensitivities. Much like the proof of Theorem II.4, we first bound the sum of regularized online ℓ1 sensitivities and then relate the two quantities. We also require a few structural results relating online leverage scores to regularized online ℓ1 sensitivities.


Lemma IV.3 (Bound on Sum of Online ℓ1 Sensitivities).

Let $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$. Let $\zeta_i^{OL}(A)$ be the online ℓ1 sensitivity of $a_i$ with respect to A, for each i ∈ [n]. Then $\sum_{i=1}^n \zeta_i^{OL}(A) = O(d \log n \log \kappa)$.

We now give the full guarantees of Algorithm 4.

Theorem IV.4 (Online ℓ1-Subspace Embedding).

Given ε > 0 and a matrix $A \in \mathbb{R}^{n \times d}$ whose rows $a_1, \ldots, a_n$ arrive sequentially in a stream with condition number κ, there exists an online algorithm that outputs a matrix M with $O(d^2\varepsilon^{-2} \log^2 n \log \kappa)$ (rescaled) rows of A such that

$(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1,$

for all $x \in \mathbb{R}^d$ with high probability.

Finally, note that we can use the reverse online ℓ1 sensitivities in the framework of Algorithm 1 to obtain an ℓ1-subspace embedding in the sliding window model.

Since we are considering a sliding window algorithm, we use the reverse online ℓ1 sensitivities rather than the online ℓ1 sensitivities used by the online ℓ1-subspace embedding algorithm in Algorithm 4. For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, the reverse online ℓ1 sensitivity of row $a_i$ is defined in the natural way, by $\max_{x \in \mathbb{R}^d} \frac{|a_i^\top x|}{\|Z_{i-1} x\|_1}$ if rank($Z_{i-1}$) = rank($Z_i$) and by 1 otherwise, where $Z_i = a_n \circ \cdots \circ a_i$. Note that Algorithm 1 with the Score function in Algorithm 5 evaluates the importance of each row compared to the following rows, so we are approximately sampling by reverse online ℓ1 sensitivities, as desired. The proof follows along the same lines as Theorem II.4 and Theorem II.10, using a martingale argument to show that the approximations of the reverse online ℓ1 sensitivities induce sufficiently high sampling probabilities, while still using the sum of the online ℓ1 sensitivities to bound the total number of sampled rows. We sketch the proof below, as Section V presents an improved algorithm for ℓ1-subspace embedding in the sliding window model that is nearly space optimal, up to lower order factors.

Theorem IV.5 (Randomized ℓ1-Subspace Embedding Sliding Window Algorithm).

Let $r_1, \ldots, r_n \in \mathbb{R}^d$ be a stream of rows and κ be the condition number of the matrix $r_1 \circ \cdots \circ r_n$. Let W > 0 be a window size parameter and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. Given a parameter ε > 0, there exists an algorithm that outputs a matrix M with a subset of (rescaled) rows of A such that $(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1$ for all $x \in \mathbb{R}^d$ and stores $O(d^2\varepsilon^{-2} \log^2 n \log \kappa)$ rows at any time, with high probability.

We detail how to efficiently provide constant factor approximations to the sensitivities in [7].

V. A Coreset Framework for Deterministic Sliding Window Algorithms

We give a framework for deterministic sliding window algorithms based on the merge and reduce paradigm and the concept of online coresets. We define an online coreset for a matrix A as a weighted subset of rows of A that also provides a good approximation to prefixes of A:

Definition V.1 (Online Coreset).

An online coreset for a function f, an approximation parameter ε > 0, and a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$ is a subset of weighted rows of A such that for any $A_i = a_1 \circ \cdots \circ a_i$ with i ∈ [n], we have that $f(M_i)$ is a (1 + ε)-approximation of $f(A_i)$, where $M_i$ is the matrix that consists of the weighted rows of A in the coreset that appear at time i or before.

We can use deterministic online coresets to obtain deterministic sliding window algorithms through a merge and reduce framework. The idea is to store the $m_{\mathrm{space}}$ most recent rows in a block $B_0$, for some parameter $m_{\mathrm{space}}$ related to the coreset size. Once $B_0$ is full, we reduce $B_0$ to a smaller number of rows by setting $B_1$ to be a $\left(1+\frac{\varepsilon}{\log n}\right)$ coreset of $B_0$, starting with the most recent row, and then empty $B_0$ so that it can again store the most recent rows. Subsequently, whenever $B_0$ is full, we merge successive non-empty blocks $B_0, \ldots, B_i$ and reduce them to a $\left(1+\frac{\varepsilon}{\log n}\right)^i$ coreset, indexed as $B_{i+1}$. Since the entire stream has length n, then by using $O(\log n)$ blocks $B_i$, we will have a $\left(1+\frac{\varepsilon}{\log n}\right)^{\log n}$ coreset, starting with the most recent row. Rescaling ε, this gives a merge and reduce based framework for (1 + ε) deterministic sliding window algorithms based on online coresets. We give the framework in full in Algorithm 6.
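Schematically, the level structure can be maintained as follows; this is a hedged sketch of the bookkeeping rather than the paper's Algorithm 6, and `coreset` and `m_space` are placeholder names for the online coreset routine and the block size.

```python
def merge_and_reduce(rows, coreset, m_space):
    """Maintain blocks B_0, B_1, ... over a stream of rows.  `coreset(rows)`
    is any deterministic online-coreset routine returning a weighted subset
    of its input rows; in the paper it is applied starting with the most
    recent row, which is what makes the result valid on every suffix."""
    B0 = []              # buffer of the most recent raw rows
    levels = []          # levels[i] holds block B_{i+1}
    for r in rows:
        B0.append(r)
        if len(B0) == m_space:
            merged, i = B0, 0
            while i < len(levels) and levels[i]:   # merge the run of non-empty blocks
                merged = levels[i] + merged        # older rows first
                levels[i] = []
                i += 1
            if i == len(levels):
                levels.append([])
            levels[i] = coreset(merged)            # reduce the merged blocks
            B0 = []
    # the W most recent rows are covered by B0 together with a prefix of `levels`
    return B0, levels
```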


Lemma V.2.

$B_i$ in Algorithm 6 is a $\left(1+\frac{\varepsilon}{\log n}\right)^i$ online coreset for $2^{i-1} m_{\mathrm{space}}$ rows.

Theorem V.3.

Let $r_1, \ldots, r_n \in \mathbb{R}^{1 \times d}$ be a stream of rows, ε > 0, and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. If there exists a deterministic online coreset algorithm for a matrix function f that stores $S(n, d, \varepsilon)$ rows, then there exists a deterministic sliding window algorithm that stores $O\!\left(S\!\left(n, d, \frac{\varepsilon}{\log n}\right) \log n\right)$ rows and outputs a matrix M such that f(M) is a (1 + ε)-approximation of f(A).

The online row sampling algorithm of [16] shows the existence of an online coreset for spectral approximation. This coreset can be inefficiently but explicitly computed by computing the online leverage scores, enumerating over sufficiently small subsets of scaled rows, and checking whether a subset is an online coreset for spectral approximation.

Theorem V.4 (Online Coreset for Spectral Approximation).

[16] For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, there exists a constant C > 0 and a deterministic algorithm Coreset(A, ε) that outputs an online coreset of $C d \varepsilon^{-2} \log n \log \kappa$ weighted rows of A. For any i ∈ [n], let $M_i$ be the weighted rows of A in the coreset that appear at time i or before. Then $(1-\varepsilon)\|A_i x\|_2 \le \|M_i x\|_2 \le (1+\varepsilon)\|A_i x\|_2$ for all $x \in \mathbb{R}^d$, where $A_i = a_1 \circ \cdots \circ a_i$.

Then Theorem V.3 and Theorem V.4 imply:

Theorem V.5 (Deterministic Sliding Window Algorithm for Spectral Approximation).

Let $r_1, \ldots, r_n \in \mathbb{R}^{1 \times d}$ be a stream of rows and κ be the condition number of the stream. Let ε > 0 and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. There exists a deterministic algorithm that stores $O(d\varepsilon^{-2} \log^4 n \log \kappa)$ rows and outputs a matrix M such that $(1-\varepsilon)\|Ax\|_2 \le \|Mx\|_2 \le (1+\varepsilon)\|Ax\|_2$ for all $x \in \mathbb{R}^d$.

Theorem III.1 shows the existence of an online coreset for computing a rank k projection-cost preservation.

Theorem V.6 (Online Coreset for Rank k Projection-Cost Preservation).

For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, there exists a constant C > 0 and a deterministic algorithm Coreset(A, ε) that outputs an online coreset of $C k \varepsilon^{-2} \log n \log \kappa$ weighted rows of A. For any i ∈ [n], let $M_i$ be the weighted rows of A in the coreset that appear at time i or before. Then $M_i$ is a (1 + ε) projection-cost preservation for $A_i := a_1 \circ \cdots \circ a_i$.

Thus Theorem V.3 and Theorem V.6 give:

Theorem V.7 (Deterministic Sliding Window Algorithm for Rank k Projection-Cost Preservation).

Let $r_1, \ldots, r_n \in \mathbb{R}^{1 \times d}$ be a stream of rows and κ be the condition number of the stream. Let ε > 0 and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. There exists a deterministic algorithm that stores $O(d\varepsilon^{-2} \log^4 n \log \kappa)$ rows and outputs a matrix M such that $(1-\varepsilon)\|A - AP\|_F \le \|M - MP\|_F \le (1+\varepsilon)\|A - AP\|_F$ for all rank k orthogonal projection matrices $P \in \mathbb{R}^{d \times d}$.

For ℓ1-subspace embeddings, we can use our online coreset from Theorem IV.4, but in fact [17] showed the existence of an offline coreset for ℓ1-subspace embeddings that stores a smaller number of rows. The offline coreset of [17] is based on sampling rows proportional to their Lewis weights. We define a corresponding online version of Lewis weights:

Definition V.8 ((Online) Lewis Weights).

For a matrix $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$, let $A_i = a_1 \circ \cdots \circ a_i$, and let $w_i(A)$ denote the Lewis weight of row $a_i$. The Lewis weights of A are the unique weights such that $w_i(A) = \left(a_i^\top (A^\top W^{-1} A)^{-1} a_i\right)^{1/2}$, where W is the diagonal matrix with $W_{i,i} = w_i(A)$. Equivalently, $w_i(A) = \tau_i(W^{-1/2} A)$, where $\tau_i(W^{-1/2} A)$ denotes the leverage score of row i of $W^{-1/2} A$. We define the online Lewis weight of $a_i$ to be the Lewis weight of row $a_i$ with respect to the matrix $A_{i-1}$ and use the convention that the Lewis weight of $a_i$ is 1 if rank($A_i$) > rank($A_{i-1}$).
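For reference, the (offline) ℓ1 Lewis weights in Definition V.8 can be computed by the simple fixed-point iteration of Cohen and Peng [17]; the dense numpy sketch below is ours, with an illustrative iteration count rather than a bound taken from the paper.

```python
import numpy as np

def l1_lewis_weights(A, iters=30):
    """Fixed-point computation of the l1 Lewis weights:
        w_i <- ( a_i^T (A^T W^{-1} A)^{-1} a_i )^{1/2},  W = diag(w),
    iterated from w = 1."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        M = A.T @ (A / w[:, None])                 # A^T W^{-1} A
        Minv = np.linalg.pinv(M)
        q = np.einsum('ij,jk,ik->i', A, Minv, A)   # a_i^T M^{-1} a_i for each row
        w = np.maximum(np.sqrt(np.maximum(q, 0.0)), 1e-12)  # guard against zero rows
    return w
```

The online Lewis weights restrict this computation to the prefix $A_{i-1}$, analogously to online leverage scores.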

Theorem V.9 (Online Coreset for ℓ1-Subspace Embedding).

[17] Let $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$. If there exists an upper bound C > 0 on the sum of the online Lewis weights of A, then there exists a deterministic algorithm Coreset(A, ε) that outputs an online coreset of $O(C\varepsilon^{-2} \log n)$ weighted rows of A. For any i ∈ [n], let $M_i$ be the weighted rows of A in the coreset that appear at time i or before. Then $M_i$ is an ℓ1-subspace embedding for $A_i := a_1 \circ \cdots \circ a_i$ with approximation factor (1 + ε).

We bound the sum of the online Lewis weights by first considering a regularization of the input matrix, which only slightly alters each score.

Lemma V.10 (Bound on Sum of Online Lewis Weights).

Let $A = a_1 \circ \cdots \circ a_n \in \mathbb{R}^{n \times d}$. Let $w_i^{OL}(A)$ be the online Lewis weight of $a_i$ with respect to A, for each i ∈ [n]. Then $\sum_{i=1}^n w_i^{OL}(A) = O(d \log n \log \kappa)$.

Then from Theorem V.3, Theorem V.9, and Lemma V.10:

Theorem V.11 (Deterministic Sliding Window Algorithm for ℓ1-Subspace Embedding).

Let $r_1, \ldots, r_n \in \mathbb{R}^{1 \times d}$ be a stream of rows and κ be the condition number of the stream. Let ε > 0 and $A = r_{n-W+1} \circ \cdots \circ r_n$ be the matrix consisting of the W most recent rows. There exists a deterministic algorithm that stores $O(d\varepsilon^{-2} \log^5 n \log \kappa)$ rows and outputs a matrix M such that $(1-\varepsilon)\|Ax\|_1 \le \|Mx\|_1 \le (1+\varepsilon)\|Ax\|_1$ for all $x \in \mathbb{R}^d$.

Note that Theorem V.3 also provides an approach for a randomized ℓ1-subspace embedding sliding window algorithm that improves upon the space requirements of Theorem IV.5, by using online coresets randomly generated by sampling rows with respect to their online Lewis weights. Moreover, recall that in some settings, the online model does not require algorithms to use space sublinear in the size of the input. In these settings, Lemma V.10 could also potentially be useful in a row-sampling based algorithm for online ℓ1-subspace embedding that improves upon the sample complexity of Theorem IV.4. We provide further details on efficient online coreset construction, by derandomizing the OnlineBSS algorithm of [16], in [7].

Acknowledgements

Vladimir Braverman is supported in part by NSF CAREER grant 1652257, ONR Award N00014-18-1-2364 and the Lifelong Learning Machines program from DARPA/MTO. Petros Drineas is supported in part by NSF FRG-1760353 and NSF AF-1814041. This work was done before Jalaj Upadhyay joined Apple. David P. Woodruff is supported by the National Institute of Health (NIH) grant 5R01 HG 10798-2, subaward through Indiana University Bloomington, as well as Office of Naval Research (ONR) grant N00014-18-1-2562. Samson Zhou is supported in part by NSF CCF-1649515 and by a Simons Investigator Award of D. Woodruff.

Contributor Information

Vladimir Braverman, Johns Hopkins University.

Petros Drineas, Purdue University.

Cameron Musco, University of Massachusetts Amherst.

Christopher Musco, New York University.

Jalaj Upadhyay, Apple.

References

  • [1].Andoni A, Chen J, Krauthgamer R, Qin B, Woodruff DP, and Zhang Q. On sketching quadratic forms. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 311–319, 2016. [Google Scholar]
  • [2].Babcock B, Babu S, Datar M, Motwani R, and Widom J. Models and issues in data stream systems. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 1–16, 2002. [Google Scholar]
  • [3].Balcan M, Dick T, and Vitercik E. Dispersion for datadriven algorithm design, online learning, and private optimization. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 603–614, 2018. [Google Scholar]
  • [4].Bhaskara A, Lattanzi S, Vassilvitskii S, and Zadimoghaddam M. Residual based sampling for online low rank approximation. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 1596–1614, 2019. [Google Scholar]
  • [5].Blum A, Kumar V, Rudra A, and Wu F. Online learning in online auctions. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 202–204, 2003. [Google Scholar]
  • [6].Boutsidis C, Garber D, Karnin ZS, and Liberty E. Online principal components analysis. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 887–901, 2015. [Google Scholar]
  • [7].Braverman V, Drineas P, Musco C, Musco C, Upadhyay J, Woodruff DP, and Zhou S. Near optimal linear algebra in the online and sliding window models. CoRR, abs/1805.03765, 2018. [PMC free article] [PubMed] [Google Scholar]
  • [8].Braverman V, Grigorescu E, Lang H, Woodruff DP, and Zhou S. Nearly optimal distinct elements and heavy hitters on sliding windows. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 7:1–7:22, 2018. [Google Scholar]
  • [9].Braverman V, Lang H, Ullah E, and Zhou S. Improved algorithms for time decay streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 27:1–27:17, 2019. [Google Scholar]
  • [10].Braverman V and Ostrovsky R. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS) Proceedings, pages 283–293, 2007. [Google Scholar]
  • [11].Chen J, Nguyen HL, and Zhang Q. Submodular maximization over sliding windows. CoRR, abs/1611.00129, 2016. [Google Scholar]
  • [12].Cohen IR, Peng B, and Wajc D. Tight bounds for online edge coloring. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 1–25. IEEE Computer Society, 2019. [Google Scholar]
  • [13].Cohen IR and Wajc D. Randomized online matching in regular graphs. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 960–979, 2018. [Google Scholar]
  • [14].Cohen MB, Elder S, Musco C, Musco C, and Persu M. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC, pages 163–172, 2015. [Google Scholar]
  • [15].Cohen MB, Musco C, and Musco C. Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1758–1777, 2017. [Google Scholar]
  • [16].Cohen MB, Musco C, and Pachocki JW. Online row sampling. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 7:1–7:18, 2016. [Google Scholar]
  • [17].Cohen MB and Peng R. ℓp row sampling by Lewis weights. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC, pages 183–192, 2015. [Google Scholar]
  • [18].Cormode G. The continuous distributed monitoring model. SIGMOD Record, 42(1):5–14, 2013. [Google Scholar]
  • [19].Cormode G and Garofalakis MN. Streaming in a connected world: querying and tracking distributed data streams. In EDBT 2008, 11th International Conference on Extending Database Technology, Proceedings, page 745, 2008. [Google Scholar]
  • [20].Cormode G and Muthukrishnan S. What’s new: finding significant differences in network data streams. IEEE/ACM Transactions on Networking, 13(6):1219–1232, 2005. [Google Scholar]
  • [21].Datar M, Gionis A, Indyk P, and Motwani R. Maintaining stream statistics over sliding windows. SIAM J. Comput, 31(6):1794–1813, 2002. [Google Scholar]
  • [22].Deshpande A, Rademacher L, Vempala S, and Wang G. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2(12):225–247, 2006. [Google Scholar]
  • [23].Deshpande A and Vempala SS. Adaptive sampling and fast low-rank matrix approximation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 292–303, 2006. [Google Scholar]
  • [24].Ehsani S, Hajiaghayi M, Kesselheim T, and Singla S. Prophet secretary for combinatorial auctions and matroids. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 700–714. SIAM, 2018. [Google Scholar]
  • [25].Epasto A, Lattanzi S, Vassilvitskii S, and Zadimoghaddam M. Submodular optimization over sliding windows. In Proceedings of the 26th International Conference on World Wide Web, WWW, pages 421–430, 2017. [Google Scholar]
  • [26].Esfandiari H, Hajiaghayi M, Liaghat V, and Monemizadeh M. Prophet secretary. SIAM J. Discrete Math, 31(3):1685–1701, 2017. [Google Scholar]
  • [27].Gamlath B, Kapralov M, Maggiori A, Svensson O, and Wajc D. Online matching with general arrivals. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 26–37. IEEE Computer Society, 2019. [Google Scholar]
  • [28].Hazan E and Koren T. The computational power of optimization in online learning. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 128–141, 2016. [Google Scholar]
  • [29].Mahabadi S, Razenshteyn I, Woodruff DP, and Zhou S. Non-adaptive adaptive sampling on turnstile streams. In Symposium on Theory of Computing Conference, STOC, 2020. [Google Scholar]
  • [30].Manku GS and Motwani R. Approximate frequency counts over data streams. PVLDB, 5(12):1699, 2012. [Google Scholar]
  • [31].Naor JS and Wajc D. Near-optimum online ad allocation for targeted advertising. ACM Trans. Economics and Comput, 6(3–4):16:1–16:20, 2018. [Google Scholar]
  • [32].Osborne M, Moran S, McCreadie R, Lunen AV, Sykora M, Cano E, Ireson N, MacDonald C, Ounis I, He Y, Jackson T, Ciravegna F, and O’Brien A. Real-time detection, tracking and monitoring of automatically discovered events in social media. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014. [Google Scholar]
  • [33].Papadimitriou S and Yu PS. Optimal multi-scale patterns in time series streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 647–658, 2006. [Google Scholar]
  • [34].Papapetrou O, Garofalakis MN, and Deligiannakis A. Sketching distributed sliding-window data streams. VLDB J, 24(3):345–368, 2015. [Google Scholar]
  • [35].Qahtan AA, Alharbi B, Wang S, and Zhang X. A pcabased change detection framework for multidimensional data streams: Change detection in multidimensional data streams. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 935–944, 2015. [Google Scholar]
  • [36].Rubinstein A. Beyond matroids: secretary problem and prophet inequality with general constraints. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 324–332, 2016. [Google Scholar]
  • [37].Wei Z, Liu X, Li F, Shang S, Du X, and Wen J. Matrix sketching over sliding windows. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference, pages 1465–1480, 2016. [Google Scholar]
