Near-optimal lower bounds on the multi-party communication complexity of set disjointness
- In IEEE Conference on Computational Complexity, 2003
- Cited by 62 (5 self)
We study the communication complexity of the set disjointness problem in the general multi-party model. For t players, each holding a subset of a universe of size n, we establish a near-optimal lower bound of Ω(n/(t log t)) on the communication complexity of the problem of determining whether their sets are disjoint. In the more restrictive one-way communication model, in which the players are required to speak in a predetermined order, we improve our bound to an optimal Ω(n/t). These results improve upon the earlier bounds of Ω(n/t^2) in the general model, and Ω(ε^2 n/t^(1+ε)) in the one-way model, due to Bar-Yossef, Jayram, Kumar, and Sivakumar [5]. As in the case of earlier results, our bounds apply to the unique intersection promise problem. This communication problem is known to have connections with the space complexity of approximating frequency moments in the data stream model. Our results lead to an improved space complexity lower bound of Ω(n^(1−2/k) / log n) for approximating the kth frequency moment with a constant number of passes over the input, and a technical improvement to Ω(n^(1−2/k)) if only one pass over the input is permitted. Our proofs rely on the information theoretic direct sum decomposition paradigm of Bar-Yossef et al. [5]. Our improvements stem from novel analytical tech-
Optimal space lower bounds for all frequency moments
- In SODA, 2004
- Cited by 42 (10 self)
We prove that any one-pass streaming algorithm which (ε, δ)-approximates the kth frequency moment Fk, for any real k ≠ 1 and any ε = Ω(1/√m), must use Ω(1/ε^2) bits of space, where m is the size of the universe. This is optimal in terms of ε, resolves the open questions of Bar-Yossef et al. in [3, 4], and extends the Ω(1/ε^2) lower bound for F0 in [11] to much smaller ε by applying novel techniques. Along the way we lower bound the one-way communication complexity of approximating the Hamming distance and the number of bipartite graphs with minimum/maximum degree constraints. 1 Introduction Computing statistics on massive data sets is increasingly important these days. Advances in communication and storage technology enable large bodies of raw data to be generated daily, and consequently, there is a rising demand to process this data efficiently. Since it is impractical for an algorithm to store even a small fraction of the data stream, its performance is typically measured by the amount of space it uses. In many scenarios, such as internet routing, once a stream element is examined it is lost forever unless explicitly saved by the processing algorithm. This, along with the sheer size of the data, makes multiple passes over the data infeasible. In this paper we restrict our attention to one-pass streaming algorithms and we investigate their space complexity. Let a =
Streaming and sublinear approximation of entropy and information distances
- In ACM-SIAM Symposium on Discrete Algorithms, 2006
- Cited by 33 (9 self)
In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time property testing algorithms for entropy and various information theoretic distances. Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant factor approximation scheme. An integral part of the algorithm is an interesting use of an F0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation tradeoff curve. Our results have interesting structural implications that connect sublinear time and space constrained algorithms. The mediating model is the random order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for calculating permutation-invariant functions can be simulated in the random order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and stream algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one-pass random order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
Simpler algorithm for estimating frequency moments of data streams
- In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2006
- Cited by 28 (2 self)
The problem of estimating the kth frequency moment Fk over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms has been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodruff [11] have presented the first algorithm for estimating Fk, for k > 2, using space Õ(n^(1−2/k)), matching the space lower bound (up to poly-logarithmic factors) for this problem [1, 2, 3, 4, 13] (n is the number of distinct items occurring in the stream). In this paper, we present a simpler 1-pass algorithm for estimating Fk.
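To make the quantity being estimated concrete, here is a minimal Python sketch that computes Fk exactly, under the standard definition Fk = sum over distinct items i of (f_i)^k, where f_i is the number of occurrences of item i (so F0 is the number of distinct items and F1 is the stream length). The function name `frequency_moment` is illustrative only. The sketch stores every distinct item, so it uses space linear in n; it is a reference point for the definition, not the small-space algorithm of any paper listed here.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment of a stream (illustrative helper).

    Keeps one counter per distinct item, so space is linear in the
    number of distinct items, unlike the streaming algorithms above.
    """
    counts = Counter(stream)
    if k == 0:
        # F0 is the number of distinct items (using the convention 0^0 = 0).
        return len(counts)
    return sum(c ** k for c in counts.values())


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b"]  # frequencies: a=3, b=2, c=1
    for k in range(4):
        print(k, frequency_moment(stream, k))
```

For the sample stream the frequencies are 3, 2, and 1, so F2 = 9 + 4 + 1 = 14.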
An optimal randomised cell probe lower bound for approximate nearest neighbor searching
- In Proceedings of the Symposium on Foundations of Computer Science
- Cited by 21 (2 self)
We consider the approximate nearest neighbour search problem on the Hamming cube {0, 1}^d. We show that a randomised cell probe algorithm that uses polynomial storage and word size d^O(1) requires a worst case query time of Ω(log log d / log log log d). The approximation factor may be as loose as 2^(log^(1−η) d) for any fixed η > 0. This generalises an earlier result [6] on the deterministic complexity of the same problem and, more importantly, fills a major gap in the study of this problem since all earlier lower bounds either did not allow randomisation [6, 19] or did not allow approximation [5, 2, 16]. We also give a cell probe algorithm which proves that our lower bound is optimal. Our proof uses a lower bound on the round complexity of the related communication problem. We show, additionally, that considerations of bit complexity alone cannot prove any nontrivial cell probe lower bound for the problem. This shows that the Richness Technique [20] used in a lot of recent research around this problem would not have helped here.
Robust lower bounds for communication and stream computation
- in Proceedings of the 40th Annual ACM Symposium on Theory of Computing (British
, 2008
- Cited by 16 (5 self)
We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models [32, 29]. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions. A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are “robust” to the ordering of the data. That is, we prove lower bounds for when the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permutations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems. Our results include the first random-partition communication lower bounds for problems including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results [19, 7] for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation.
Estimating the sortedness of a data stream
- In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2007
- Cited by 15 (2 self)
The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one-pass data stream model and present the first sub-linear space algorithms for both problems. We first present O(√n)-space deterministic algorithms that approximate the distance to monotonicity and the LIS to within a factor that is arbitrarily close to 1. We also show a lower bound of Ω(n) on the space required by any randomized algorithm to compute the LIS (or alternatively the distance from monotonicity) exactly, demonstrating that approximation is necessary for sub-linear space computation; this bound improves upon the existing lower bound of Ω(√n) [LNVZ06]. Our main result is a randomized algorithm that uses only O(log^2 n) space and approximates the distance to monotonicity to within a factor that is arbitrarily close to 4. In contrast, we believe that any significant reduction in the space complexity for approximating the length of the LIS is considerably hard. We conjecture that any deterministic (1 + ε) approximation algorithm for LIS requires Ω(√n) space, and as a step towards this conjecture, prove a space lower bound of Ω(√n) for a restricted yet natural class of deterministic algorithms.
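The complementarity between the two quantities can be illustrated with a short exact computation. The Python sketch below assumes that the edit operations are deletions and that "increasing" means strictly increasing, so the distance to monotonicity of a length-n sequence equals n minus the LIS length. It uses the textbook patience-sorting method with binary search, which needs linear space in the worst case; it is not one of the sub-linear streaming algorithms from the abstract, and the function names are illustrative only.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence.

    tails[i] holds the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far (patience sorting).
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)


def distance_to_monotonicity(seq):
    """Minimum number of deletions that leave a strictly increasing sequence."""
    return len(seq) - lis_length(seq)


if __name__ == "__main__":
    seq = [1, 5, 2, 3, 9, 4]
    print(lis_length(seq), distance_to_monotonicity(seq))
```

For the sample sequence, one longest increasing subsequence is 1, 2, 3, 4, so the LIS length is 4 and two deletions suffice.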
Rectangle Size Bounds and Threshold Covers in Communication Complexity
- In Proceedings Eighteenth Annual IEEE Conference on Computational Complexity, 2003
- Cited by 15 (2 self)
We investigate the power of the most important lower bound technique in randomized communication complexity, which is based on an evaluation of the maximal size of approximately monochromatic rectangles, minimized over all distributions on the inputs. While it is known that the 0-error version of this bound is polynomially tight for deterministic communication, nothing in this direction is known for constant error and randomized communication complexity. We first study a one-sided version of this bound and obtain that its value lies between the MA- and AM-complexities of the considered function. Hence the lower bound actually works for a (communication complexity) class between MA ∩ co-MA and AM ∩ co-AM. We also show that the MA-complexity of the disjointness problem is Ω(√n). Following this we consider the conjecture that the lower bound method is polynomially tight for randomized communication complexity. First we disprove a distributional version of this conjecture. Then we give a combinatorial characterization of the value of the lower bound method, in which the optimization over all distributions is absent. This characterization is done by what we call a uniform threshold cover. We also study relaxations of this notion, namely approximate majority covers and majority covers, and compare these three notions in power, exhibiting exponential separations. Each of these covers captures a lower bound method previously used for randomized communication complexity.
The computational hardness of estimating edit distance
- In Proceedings of the Symposium on Foundations of Computer Science, 2007
- Cited by 14 (7 self)
We prove the first non-trivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like L1 and squared-L2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over alphabet Σ = {0, 1}, but also for strings that are permutations (aka the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely that of embedding permutations into L1. The proof uses a new technique that relies on Fourier analysis in a rather elementary way.
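As a reminder of how the two distances in this abstract differ, the following Python sketch computes both exactly. It assumes the usual definitions: Hamming distance counts positions where two equal-length strings differ, and edit (Levenshtein) distance counts the minimum number of single-character insertions, deletions, and substitutions. The dynamic program is the standard quadratic-time one and has nothing to do with the communication protocols or sketches the paper studies; the function names are illustrative only.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]                   # deleting the first i characters of a
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute x by y, or match
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a, b = "0101", "1010"
    print(hamming_distance(a, b), edit_distance(a, b))
```

The pair "0101" and "1010" differs in every position, so the Hamming distance is 4, while deleting the leading 0 and appending a 0 turns one string into the other, so the edit distance is 2. A cyclic shift is cheap in edit distance but expensive in Hamming distance.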

