## The space complexity of approximating the frequency moments (1996)


Venue: Journal of Computer and System Sciences

Citations: 702 (12 self)

### BibTeX

@INPROCEEDINGS{Alon96thespace,
    author = {Noga Alon and Yossi Matias and Mario Szegedy},
    title = {The space complexity of approximating the frequency moments},
    booktitle = {JOURNAL OF COMPUTER AND SYSTEM SCIENCES},
    year = {1996},
    pages = {20--29}
}


### Abstract

The frequency moments of a sequence containing $m_i$ elements of type $i$, for $1 \le i \le n$, are the numbers $F_k = \sum_{i=1}^{n} m_i^k$. We consider the space complexity of randomized algorithms that approximate the numbers $F_k$, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers $F_0$, $F_1$ and $F_2$ can be approximated in logarithmic space, whereas the approximation of $F_k$ for $k \ge 6$ requires $n^{\Omega(1)}$ space. Applications to data bases are mentioned as well.
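To make the definition concrete, the following minimal sketch computes the frequency moments exactly by keeping one counter per distinct element. This is the linear-space baseline that the paper's small-space algorithms are meant to avoid, not one of its approximation algorithms. The function name `frequency_moment` is a hypothetical choice made for this illustration.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum over element types i of m_i ** k.

    Uses a full histogram, so memory grows with the number of
    distinct elements (hypothetical helper, for illustration only).
    """
    counts = Counter(stream)  # m_i for every type i that occurs
    # Types that never occur are absent from `counts`, so for k = 0
    # this counts the distinct elements.
    return sum(m ** k for m in counts.values())


stream = [1, 2, 1, 3, 1, 2]          # m_1 = 3, m_2 = 2, m_3 = 1
print(frequency_moment(stream, 0))   # number of distinct elements
print(frequency_moment(stream, 1))   # length of the sequence
print(frequency_moment(stream, 2))   # 3**2 + 2**2 + 1**2
```

For the sample stream the three calls give 3, 6 and 14 respectively.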

### Citations

8567 | Elements of Information Theory - Cover, Thomas - 1991 |
Citation Context: ..., $x_{2t}$}. Let $p_i$ denote the fraction of members of $F$ that contain $x_i$, and let $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ be the binary entropy function. By a standard entropy inequality (cf., e.g., [4]), $|F| \le 2^{\sum_{i=1}^{2t} H(p_i)}$. In order to determine the partition $P = I_1 \cup I_2 \cup \cdots \cup I_s \cup \{x\}$ we have to choose one of the elements $x_i$ as $x$. The crucial observation is that if the choice of $x_i$ as $x$ re...

1688 | The Probabilistic Method - Alon, Spencer - 2002 |
Citation Context: ... $\frac{k n^{1-1/k} F_k^2}{s_1 \lambda^2 F_k^2} \le \frac{1}{8}$. It follows that the probability that a single $Y_i$ deviates from $F_k$ by more than $\lambda F_k$ is at most 1/8, and hence, by the standard estimate of Chernoff (cf., for example, [2], Appendix A), the probability that more than $s_2/2$ of the variables $Y_i$ deviate by more than $\lambda F_k$ from $F_k$ is at most $\epsilon$. In case this does not happen, the median $Y_i$ supplies a good estimate to the requi...
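The amplification step in this excerpt (average enough copies so that each average is off with probability at most 1/8, then take the median of several averages) can be sketched generically. The function name `median_of_means` and the parameter names `s1` and `s2` (copies per average, number of averages) are chosen here for illustration; how the basic samples are produced is left to the caller.

```python
import statistics


def median_of_means(samples, s1, s2):
    """Median of s2 group averages, each over s1 basic estimates.

    `samples` must hold exactly s1 * s2 independent basic estimates
    (hypothetical helper; grouping is by consecutive blocks).
    """
    if len(samples) != s1 * s2:
        raise ValueError("expected exactly s1 * s2 samples")
    means = [
        sum(samples[g * s1:(g + 1) * s1]) / s1  # average of group g
        for g in range(s2)
    ]
    # The median is wrong only if more than half of the group averages
    # are wrong, which is what the Chernoff estimate bounds.
    return statistics.median(means)


# One wild basic estimate spoils a single group average but not the median.
print(median_of_means([10, 10, 10, 10, 10, 1000, 10, 10, 10], s1=3, s2=3))
```

Averaging reduces the variance of each group; the median then protects against the few groups that still land far from the target.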

386 | Some complexity questions related to distributive computing - Yao |
Citation Context: ...his simple proof, let us recall some basic definitions and facts concerning the $\epsilon$-error probabilistic communication complexity $C_\epsilon(f)$ of a function $f : \{0, 1\}^n \times \{0, 1\}^n \mapsto \{0, 1\}$, introduced by Yao [18]. Consider two parties with unlimited computing power, that wish to compute the value of a Boolean function $f(x, y)$, where $x$ and $y$ are binary vectors of length $n$, the first party possesses $x$ and the s...

276 | Maintenance of materialized views: problems, techniques and applications - Gupta, Mumick - 1995 |

217 | A fast and simple randomized parallel algorithm for the maximal independent set problem - Alon, Babai, et al. - 1986 |
Citation Context: ...t coordinates $1 \le i_1 \le \ldots \le i_4 \le n$ and every choice of $\epsilon_1, \ldots, \epsilon_4 \in \{-1, 1\}$ exactly a $(1/16)$-fraction of the vectors have $\epsilon_j$ in their coordinate number $i_j$ for $j = 1, \ldots, 4$. As described in [1] such sets (also known as orthogonal arrays of strength 4) can be constructed using the parity check matrices of BCH codes. To implement this construction we need an irreducible polynomial of degree d...
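The excerpt above concerns four-wise independent sign vectors, which the paper uses to estimate $F_2$: with a counter $Z = \sum_i \epsilon_i m_i$ maintained over the stream, $Z^2$ has expectation $F_2$. The sketch below illustrates that idea under a loudly stated substitution: instead of the BCH-code construction from [1], it draws signs from a random cubic polynomial modulo a prime, a common stand-in whose parity bit is only approximately unbiased. The names `P`, `make_sign`, `estimate_f2` and `copies` are hypothetical, and only plain averaging is shown.

```python
import random

P = 2**31 - 1  # Mersenne prime; item ids are assumed to lie in [0, P)


def make_sign(rng):
    """Map item ids to -1/+1 via a random cubic polynomial mod P.

    Stand-in for the BCH-based orthogonal arrays mentioned above;
    the hash values are four-wise independent, their parity nearly so.
    """
    a, b, c, d = (rng.randrange(P) for _ in range(4))

    def sign(i):
        h = (((a * i + b) * i + c) * i + d) % P
        return 1 if h & 1 else -1

    return sign


def estimate_f2(stream, copies=64, seed=0):
    """Average of `copies` basic estimates Z**2, Z = sum_i sign(i) * m_i."""
    rng = random.Random(seed)
    signs = [make_sign(rng) for _ in range(copies)]
    z = [0] * copies                     # one small counter per copy
    for item in stream:                  # single pass, items not stored
        for j, sign in enumerate(signs):
            z[j] += sign(item)
    return sum(v * v for v in z) / copies


# A stream with a single repeated item has F_2 = 10**2 in every copy.
print(estimate_f2([7] * 10))
```

Each copy stores only one counter and four polynomial coefficients, which is the point of using limited independence rather than a fully random sign per element.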

157 | Fast incremental maintenance of approximate histograms - Gibbons, Matias, et al. - 1997 |
Citation Context: ...lation should preferably be done and updated as the records of the relation are inserted to the database. A more concrete discussion about the practical implications of such framework can be found in [8]. Note that it is rather straightforward to maintain the (exact) frequency moments by maintaining a full histogram on the data, i.e., maintaining a counter $m_i$ for each data value $i \in \{1, 2, \ldots, n\}$...

152 | The probabilistic communication complexity of set intersection - Kalyanasundaram, Schnitger - 1992 |
Citation Context: ..., 2, ..., n} whose characteristic vectors are $x$ and $y$ intersect. Several researchers studied the communication complexity of this function. Improving a result in [3], Kalyanasundaram and Schnitger [13] proved that for any fixed $\epsilon < 1/2$, $C_\epsilon(DIS_n) \ge \Omega(n)$. Razborov [16] exhibited a simple measure $\mu$ on the inputs of this function and showed that for this measure $D_\epsilon(DIS_n|\mu) \ge \Omega(n)$. Our lower bound for t...

134 | Balancing histogram optimality and practicality for query result size estimation - Ioannidis, Poosala - 1995 |

114 | Sampling-based estimation of the number of distinct values of an attribute - Haas, Naughton, et al. - 1995 |
Citation Context: ...hus, for example, the degree of the skew may determine the selection of algorithms for data partitioning, as discussed by DeWitt et al. [5] (see also references therein). The recent work by Haas et al. [12] considers sampling based algorithms for estimating $F_0$, and proposes a hybrid approach in which the algorithm is selected based on the degree of skew of the data, measured essentially by the function ...

111 | Complexity classes in communication complexity theory - Babai, Frankl, et al. - 1986 |
Citation Context: ...the worst possible $x$ and $y$). The complexity $C_\epsilon(f)$ is the expected number of bits communicated in the worst case (under the best protocol). As shown by Yao [19] and extended by Babai, Frankl and Simon [3], $C_\epsilon(f)$ can be estimated by considering the related notion of the $\epsilon$-error distributional communication complexity $D_\epsilon(f|\mu)$ under a probability measure on the possible inputs $(x, y)$. Here the two partie...

104 | On the distributional complexity of disjointness - Razborov - 1992 |
Citation Context: ... Several researchers studied the communication complexity of this function. Improving a result in [3], Kalyanasundaram and Schnitger [13] proved that for any fixed $\epsilon < 1/2$, $C_\epsilon(DIS_n) \ge \Omega(n)$. Razborov [16] exhibited a simple measure $\mu$ on the inputs of this function and showed that for this measure $D_\epsilon(DIS_n|\mu) \ge \Omega(n)$. Our lower bound for the space complexity of estimating $F_\infty^*$ follows easily from the re...

100 | Practical Skew Handling in Parallel Joins - DeWitt - 1992 |
Citation Context: ... of major consideration in many parallel database applications. Thus, for example, the degree of the skew may determine the selection of algorithms for data partitioning, as discussed by DeWitt et al. [5] (see also references therein). The recent work by Haas et al. [12] considers sampling based algorithms for estimating $F_0$, and proposes a hybrid approach in which the algorithm is selected based on the...

92 | A linear-time probabilistic counting algorithm for database applications - Whang, Vander-Zanden, et al. - 1990 |
Citation Context: ...ximating $F_0$ using $O(\log n)$ bits of memory. (Their analysis, however, is based on the assumption that explicit families of hash functions with very strong random properties are available.) Whang et al. [17] considered the problem of approximating $F_0$ in the context of databases. Here we obtain rather tight bounds for the minimum possible memory required to approximate the numbers $F_k$. We prove that for ev...

91 | Lower bounds by probabilistic arguments - Yao - 1983 |
Citation Context: ...$f(x, y)$ with probability at least $1 - \epsilon$ (for the worst possible $x$ and $y$). The complexity $C_\epsilon(f)$ is the expected number of bits communicated in the worst case (under the best protocol). As shown by Yao [19] and extended by Babai, Frankl and Simon [3], $C_\epsilon(f)$ can be estimated by considering the related notion of the $\epsilon$-error distributional communication complexity $D_\epsilon(f|\mu)$ under a probability measure on the...

57 | Counting large numbers of events in small registers - Morris - 1978 |
Citation Context: ...y, 3/4, given that $m \le n^{O(1)}$. (In the following sections we consider the general case, that is, the space complexity as a function of $n$, $m$, the relative error $\lambda$ and the error-probability $\epsilon$.) Morris [15] (see also [6], [11]) showed how to approximate $F_1$ (that is, how to design an approximate counter) using only $O(\log \log m)$ ($= O(\log \log n)$) bits of memory. Flajolet and Martin [7] designed an algorit...
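The approximate counter mentioned in this excerpt can be illustrated with the textbook base-2 variant of Morris's idea: store only an exponent and increment it with probability $2^{-\text{exponent}}$, so the stored value needs about $\log \log m$ bits. This is a minimal sketch under that assumption and is not claimed to be the exact scheme analyzed in [15]; the class and attribute names are hypothetical.

```python
import random


class MorrisCounter:
    """Approximate counter that keeps only a small exponent."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2 ** (-exponent).
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2 ** exponent - 1 is an unbiased estimate of the true count.
        return 2 ** self.exponent - 1


counter = MorrisCounter(random.Random(0))
for _ in range(1000):
    counter.increment()
print(counter.exponent, counter.estimate())
```

A single counter has high variance; in practice several independent counters are averaged, in the same spirit as the averaging used for the frequency-moment estimators.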

41 | Approximate counting: a detailed analysis - Flajolet - 1985 |
Citation Context: ...hat $m \le n^{O(1)}$. (In the following sections we consider the general case, that is, the space complexity as a function of $n$, $m$, the relative error $\lambda$ and the error-probability $\epsilon$.) Morris [15] (see also [6], [11]) showed how to approximate $F_1$ (that is, how to design an approximate counter) using only $O(\log \log m)$ ($= O(\log \log n)$) bits of memory. Flajolet and Martin [7] designed an algorithm for approxi...

23 | Maintenance of Materialized Views: Problems - Gupta, Mumick - 1995 |

12 | A supplement to sampling-based methods for query size estimation in a database system - Ling, Sun - 1992 |
Citation Context: ...o maintain the (exact) frequency moments by maintaining a full histogram on the data, i.e., maintaining a counter $m_i$ for each data value $i \in \{1, 2, \ldots, n\}$, which requires memory of size $\Omega(n)$ (cf. [14]). However, it is important that the memory used for computing and maintaining the estimates be limited. Large memory requirements would require storing the data structures in external memory, which w...

8 | Probabilistic counting of a large number of events, manuscript - Hofri, Kechris - 1995 |
Citation Context: ... $\le n^{O(1)}$. (In the following sections we consider the general case, that is, the space complexity as a function of $n$, $m$, the relative error $\lambda$ and the error-probability $\epsilon$.) Morris [15] (see also [6], [11]) showed how to approximate $F_1$ (that is, how to design an approximate counter) using only $O(\log \log m)$ ($= O(\log \log n)$) bits of memory. Flajolet and Martin [7] designed an algorithm for approximating...

6 | Dynamic probabilistic maintenance of self-join sizes in limited storage, manuscript - Alon, Gibbons, et al. - 1996 |

5 | Surprise indexes and p-values - Good - 1989 |

4 | Practical maintenance algorithms for high-biased histograms using probabilistic filtering - Gibbons, Matias, et al. - 1995 |
Citation Context: ...remark that in practice, one may be able to obtain estimation algorithms which for typical data sets would be more efficient than the worst case performance implied by the lower bounds. Gibbons et al. [9] recently presented an algorithm for maintaining an approximate list of the $k$ most popular items and their approximate counts (and hence also approximating $F_\infty^*$) using small memory, which works well ...

3 | Surprise indexes and P-values - Good - 1989 |
Citation Context: ... appearing in the sequence, $F_1$ ($= m$) is the length of the sequence, and $F_2$ is the repeat rate or Gini’s index of homogeneity needed in order to compute the surprise index of the sequence (see, e.g., [10]). We also define $F_\infty^* = \max_{1 \le i \le n} m_i$. ∗ A preliminary version of this paper appeared in Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), May, 1996. † Department of Mathem...