## Zero-One Frequency Laws

### BibTeX

@MISC{Braverman_zero-onefrequency,
    author = {Vladimir Braverman and Rafail Ostrovsky},
    title = {Zero-One Frequency Laws},
    year = {}
}

### Abstract

Data streams have emerged as a critical model for applications that handle vast amounts of data. One of the most influential and celebrated papers in streaming is the “AMS” paper on computing frequency moments by Alon, Matias and Szegedy. The main question left open (and explicitly asked) by AMS in 1996 is to characterize precisely the functions $G$ on frequency vectors $m_i$ ($1 \le i \le n$) for which $\sum_{i \in [n]} G(m_i)$ can be approximated efficiently, where “efficiently” means with a single pass over the data stream and poly-logarithmic memory. No such characterization was known despite a tremendous amount of research on frequency-based functions in the streaming literature. In this paper we finally resolve the AMS main question and give a precise characterization (in fact, a zero-one law) for all monotonically increasing functions on frequencies that are zero at the origin. That is, we consider all monotonic functions $G : \mathbb{R} \to \mathbb{R}$ such that $G(0) = 0$ and $G$ can be computed in poly-logarithmic time and space, and ask: for which $G$ in this class is there a $(1 \pm \epsilon)$-approximation algorithm for computing $\sum_{i \in [n]} G(m_i)$ for any polylogarithmic $\epsilon$? We give an algebraic characterization for all such $G$ so that:

- For all functions $G$ in our class that satisfy our algebraic condition, we provide a very general and constructive way to derive an efficient $(1 \pm \epsilon)$-approximation algorithm for computing $\sum_{i \in [n]} G(m_i)$ with polylogarithmic memory and a single pass over the data stream; while
- For all functions $G$ in our class that do not satisfy our algebraic characterization, we show a lower bound
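To make the quantity discussed in the abstract concrete, here is a minimal Python sketch that computes $\sum_{i \in [n]} G(m_i)$ exactly from a stream. It is only a baseline for illustration and not the paper's algorithm: it stores a counter for every distinct item, so its memory grows linearly with the number of distinct elements rather than poly-logarithmically. The function name `exact_g_sum` and the sample stream are assumptions introduced for this example.

```python
from collections import Counter
from typing import Callable, Iterable


def exact_g_sum(stream: Iterable[int], g: Callable[[int], float]) -> float:
    """Return the exact sum of g(m_i) over all items i.

    m_i is the number of times item i occurs in the stream. Because the
    abstract assumes g(0) == 0, items that never occur contribute nothing,
    so summing over the observed items is enough.
    """
    frequencies = Counter(stream)  # one pass, but linear memory (not a streaming sketch)
    return sum(g(m) for m in frequencies.values())


# Hypothetical sample stream over the universe {1, 2, 3}:
# item 1 occurs 3 times, item 2 once, item 3 twice.
sample_stream = [1, 3, 1, 2, 1, 3]

# G(x) = x^2 gives the second frequency moment: 3^2 + 1^2 + 2^2.
f2 = exact_g_sum(sample_stream, lambda m: m ** 2)

# G(x) = x gives the stream length: 3 + 1 + 2.
f1 = exact_g_sum(sample_stream, lambda m: m)
```

A streaming algorithm in the sense of the abstract would have to return a value within a $(1 \pm \epsilon)$ factor of this exact sum while keeping only poly-logarithmic state.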

### Citations

701 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context ...many applications with vast amounts of data. The importance of the streaming model is discussed, e.g., by Aggarwal (ed.) [1] and Muthukrishnan [41]. In the seminal AMS paper, Alon, Matias and Szegedy [2] studied the following basic model: Definition 1.1. Let m, n be positive integers. A stream D = D(n, m) is a sequence of size m of integers $p_1, \dots, p_m$, where $p_i \in \{1, \dots, n\}$. A frequency vecto... |

618 | Privacy and communication complexity
- Kushilevitz
- 1992
Citation Context ... to note that heavy hitters were solved for many specific metrics [18, 22]. 3. THE LOWER BOUND To establish lower bounds, we will use the SET DISJOINTNESS and INDEX problems from communication complexity [37]. Recall that SET DISJOINTNESS is the following promise problem: each of t ≥ 2 players is given a set from the universe [N]; either all sets have exactly one common element or they are disjoint. The lower bound on th... |

379 | Data streams: Algorithms and applications
- Muthukrishnan
Citation Context ...NTRODUCTION Data streams emerged as a critical model for many applications with vast amounts of data. The importance of the streaming model is discussed, e.g., by Aggarwal (ed.) [1] and Muthukrishnan [41]. In the seminal AMS paper, Alon, Matias and Szegedy [2] studied the following basic model: Definition 1.1. Let m, n be positive integers. A stream D = D(n, m) is a sequence of size m of integers $p_1$, ... |

340 | Probabilistic Counting Algorithms for Data Base Applications
- Flajolet, Martin
- 1985
Citation Context ...k) for one-pass algorithms. Indyk and Woodruff [32] and Woodruff [43] gave optimal lower bounds in terms of the error parameter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Co... |

294 | An improved data stream summary: The count-min sketch and its applications
- Cormode, Muthukrishnan
- 2004
Citation Context ...ky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has been studied by Charikar, Chen and Farach-Colton [18], Cormode and Hadjieleftheriou [21], Cormode and Muthukrishnan [22]. The frequency-based functions were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the randomized model (Chakrabarti, Cormode and McGregor [16], Jayram, Mc... |

261 | Finding frequent items in data streams
- Charikar, Chen, et al.
- 2002
Citation Context ... Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has been studied by Charikar, Chen and Farach-Colton [18], Cormode and Hadjieleftheriou [21], Cormode and Muthukrishnan [22]. The frequency-based functions were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the r... |

260 | Stable distributions, pseudorandom generators, embeddings and data stream computation
- Indyk
- 2000
Citation Context ...wed¹ that for k = 0, 1, 2 it is possible to approximate $F_k$ with polylogarithmic space; for k > 2 they gave an $O^*(n^{1-1/k})$ upper bound. Also, they gave an $O^*(n^{1-5/k})$ lower bound for any k > 5. Indyk [30] presented a celebrated method of stable distributions for approximating $L_p$ norms, $p \in (0, 2]$, in a general model where deletions are allowed and updates can be larger than 1. Indyk and Woodruff [33] ga... |

188 | Pseudorandom generators for space-bounded computation
- Nisan
- 1992
Citation Context ...general proof for any function that satisfies (1). Many of the existing methods first assume that totally random vectors are available, and later employ the celebrated pseudorandom generator of Nisan [42] to reduce the space for randomness. This brings another natural question: Are pseudorandom generators necessary or can we directly work with k-wise independent distributions? Surprisingly, we show th... |

154 | An information statistics approach to data stream and communication complexity - Bar-Yossef, Jayram, et al. |

144 | Counting distinct elements in a data stream
- Bar-Yossef, Jayram, et al.
- 2002
Citation Context ... [32] and Woodruff [43] gave optimal lower bounds in terms of the error parameter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Wood... |

87 | An approximate L1-difference algorithm for massive data streams
- Feigenbaum, Kannan, et al.
- 2000
Citation Context ...de, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in d... |

75 | Optimal approximations of frequency moments of data streams
- Indyk, Woodruff
- 2005
Citation Context ...yk [30] presented a celebrated method of stable distributions for approximating $L_p$ norms, $p \in (0, 2]$, in a general model where deletions are allowed and updates can be larger than 1. Indyk and Woodruff [33] gave the first optimal algorithm for $F_k$, k > 2, proving an $O^*(n^{1-2/k})$ upper bound. This result was later improved by polylog factors by Bhuvanagiri, Ganguly, Kesh and Saha [7]. Bar-Yossef, Jayram, Ku... |

72 | Near-optimal lower bounds on the multiparty communication complexity of set disjointness
- Chakrabarti, Khot, et al.
- 2003
Citation Context ...giri, Ganguly, Kesh and Saha [7]. Bar-Yossef, Jayram, Kumar and Sivakumar [3] used information theory to prove the first nearly matching lower bound of $\Omega(n^{1-(2+\epsilon)/k})$. Later Chakrabarti, Khot and Sun [17] improved the lower bound to $\Omega(n^{1-2/k})$ for one-pass algorithms. Indyk and Woodruff [32] and Woodruff [43] gave optimal lower bounds in terms of the error parameter. Many other results on frequency moment... |

71 | Comparing data streams using hamming norms (how to zero in
- Cormode, Datar, et al.
- 2002
Citation Context ...ter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important ... |

60 | Optimal space lower bounds for all frequency moments
- Woodruff
- 2004
Citation Context ...ve the first nearly matching lower bound of $\Omega(n^{1-(2+\epsilon)/k})$. Later Chakrabarti, Khot and Sun [17] improved the lower bound to $\Omega(n^{1-2/k})$ for one-pass algorithms. Indyk and Woodruff [32] and Woodruff [43] gave optimal lower bounds in terms of the error parameter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Ku... |

54 | Streaming and sublinear approximation of entropy and information distances
- Guha, McGregor, et al.
- 2006
Citation Context ...ropy norm and distributions included the works of Bhuvanagiri and Ganguly [6], Chakrabarti, Do Ba and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogihara, Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related quest... |

53 | A near-optimal algorithm for computing the entropy of a stream
- Chakrabarti, Cormode, et al.
- 2007
Citation Context ...n different models. Research on entropy, entropy norm and distributions included the works of Bhuvanagiri and Ganguly [6], Chakrabarti, Do Ba and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogihara, Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Bra... |

47 | Data streaming algorithms for estimating entropy of network traffic
- Lall, Sekar, et al.
- 2006
Citation Context ...a and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogihara, Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has been studied by Charikar, Chen and Farach-Colton [18], Cormode and ... |

43 | Tight lower bounds for the distinct elements problem
- Indyk, Woodruff
- 2003
Citation Context ...tion theory to prove the first nearly matching lower bound of $\Omega(n^{1-(2+\epsilon)/k})$. Later Chakrabarti, Khot and Sun [17] improved the lower bound to $\Omega(n^{1-2/k})$ for one-pass algorithms. Indyk and Woodruff [32] and Woodruff [43] gave optimal lower bounds in terms of the error parameter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], ... |

40 | Simpler algorithms for estimating frequency moments of data streams
- Bhuvanagiri, Ganguly, et al.
- 2006
Citation Context ...1. Indyk and Woodruff [33] gave the first optimal algorithm for $F_k$, k > 2, proving an $O^*(n^{1-2/k})$ upper bound. This result was later improved by polylog factors by Bhuvanagiri, Ganguly, Kesh and Saha [7]. Bar-Yossef, Jayram, Kumar and Sivakumar [3] used information theory to prove the first nearly matching lower bound of $\Omega(n^{1-(2+\epsilon)/k})$. Later Chakrabarti, Khot and Sun [17] improved the lower bound to... |

38 | Finding frequent items in data streams
- Cormode, Hadjieleftheriou
Citation Context ...g, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has been studied by Charikar, Chen and Farach-Colton [18], Cormode and Hadjieleftheriou [21], Cormode and Muthukrishnan [22]. The frequency-based functions were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the randomized model (Chakrabarti, Cormo... |

34 | Estimating entropy and entropy norm on data streams - Chakrabarti, Ba, et al. - 2006 |

27 | An improved data stream algorithm for frequency moments
- Coppersmith, Kumar
- 2004
Citation Context ...e optimal lower bounds in terms of the error parameter. Many other results on frequency moments include, e.g., Flajolet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and... |

27 | An optimal algorithm for the distinct elements problem
- Kane, Nelson, et al.
- 2010
Citation Context ...persmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy norm and distributions included the works o... |

26 | Estimating statistical aggregates on probabilistic data streams
- Jayram, McGregor, et al.
- 2007
Citation Context ...s were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the randomized model (Chakrabarti, Cormode and McGregor [16], Jayram, McGregor, Muthukrishnan and Vee [34]). The main question left open (and explicitly asked) by Alon, Matias and Szegedy [2] is: AMS (informal): What other frequency-based functions can be approximated on streams? [Footnote 1] This is a very informal... |

25 | Estimating entropy over data streams
- Bhuvanagiri, Ganguly
- 2006
Citation Context ...[9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy norm and distributions included the works of Bhuvanagiri and Ganguly [6], Chakrabarti, Do Ba and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogi... |

22 | Robust lower bounds for communication and stream computation
- Chakrabarti, Cormode, et al.
- 2008
Citation Context ...thukrishnan [22]. The frequency-based functions were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the randomized model (Chakrabarti, Cormode and McGregor [16], Jayram, McGregor, Muthukrishnan and Vee [34]). The main question left open (and explicitly asked) by Alon, Matias and Szegedy [2] is: AMS (informal): What other frequency-based functions can be appr... |

21 | Sketching and streaming entropy via approximation theory
- Harvey, Nelson, et al.
- 2008
Citation Context ...cluded the works of Bhuvanagiri and Ganguly [6], Chakrabarti, Do Ba and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogihara, Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has b... |

21 | Declaring independence via the sketching of sketches
- Indyk, McGregor
- 2008
Citation Context ...nagiri and Ganguly [6], Chakrabarti, Do Ba and Muthukrishnan [14], Chakrabarti, Cormode and McGregor [15], Guha, McGregor and Venkatasubramanian [28], Harvey, Nelson and Onak [29], Indyk and McGregor [31], Lall, Sekar, Ogihara, Xu and Zhang [38], Braverman, Chung, Liu, Mitzenmacher and Ostrovsky [8], Braverman and Ostrovsky [13]. The related question of frequent elements has been studied by Charikar, ... |

18 | On the exact space complexity of sketching and streaming small norms
- Kane, Nelson, et al.
- 2010
Citation Context ...persmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy norm and distributions included the works o... |

15 | Lower bounds for randomized read/write stream algorithms
- Beame, Jayram, et al.
- 2007
Citation Context ...ach-Colton [18], Cormode and Hadjieleftheriou [21], Cormode and Muthukrishnan [22]. The frequency-based functions were studied in extended models such as the read/write model (Beame, Jayram and Rudra [5]), and the randomized model (Chakrabarti, Cormode and McGregor [16], Jayram, McGregor, Muthukrishnan and Vee [34]). The main question left open (and explicitly asked) by Alon, Matias and Szegedy [2] i... |

13 | Estimating frequency moments of data streams using random linear combinations
- Ganguly
- 2004
Citation Context ...olet and Martin [24], Bar-Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different model... |

13 | On estimating frequency moments of data streams
- Ganguly, Cormode
- 2007
Citation Context ...Yossef, Jayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, en... |

12 | Sketching information divergences
- Guha, Indyk, et al.
- 2008
Citation Context ...ift Invariant Theorem is not applicable. The questions of Alon, Matias and Szegedy [2] and Guha To the best of our knowledge, the only work in this direction is the result of Guha, Indyk and McGregor [27]. They proved the Shift Invariant Theorem, a general result that gives a necessary condition for approximating a wide class of two-variable functions. The Shift Invariant Theorem says, very informally... |

11 | Compressed Counting
- Li
- 2009
Citation Context ...ayram, Kumar, and Sivakumar [4], Coppersmith and Kumar [19], Cormode, Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy nor... |

10 | Optimal sampling from sliding windows - Braverman, Ostrovsky, et al. - 2009 |

7 | Smooth histograms for sliding windows
- Braverman, Ostrovsky
- 2007
Citation Context ...Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy norm and distributions included the works of Bhuvanagiri and Ganguly [6], Ch... |

7 | Open Problems in Data Streams and Related Topics
- McGregor
Citation Context ... and Szegedy [2] is: AMS (informal): What other frequency-based functions can be approximated on streams? [Footnote 1] This is a very informal explanation. For precise statements see [2]. In 2006 Guha and Indyk [40] (Question 5) asked a related question: Guha, Indyk (informal): What distances can be computed between two distribution vectors V and U defined by two streams? Consider a function φ(x, y) such that φ(... |

3 | Data Streams: Models and Algorithms (Advances in Database Systems
- Aggarwal
- 2006
Citation Context ...y of Computation. 1. INTRODUCTION Data streams emerged as a critical model for many applications with vast amounts of data. The importance of the streaming model is discussed, e.g., by Aggarwal (ed.) [1] and Muthukrishnan [41]. In the seminal AMS paper, Alon, Matias and Szegedy [2] studied the following basic model: Definition 1.1. Let m, n be positive integers. A stream D = D(n, m) is a sequence of ... |

2 | Effective Computations on Sliding Windows
- Braverman, Ostrovsky
- 2010
Citation Context ...Datar, Indyk and Muthukrishnan [20], Feigenbaum, Kannan, Strauss and Viswanathan [23], Ganguly [25], Ganguly and Cormode [26], Li [39], and Kane, Nelson and Woodruff [35, 36], Braverman and Ostrovsky [9, 11]. Currently, many important frequency-based functions are well-understood in different models. Research on entropy, entropy norm and distributions included the works of Bhuvanagiri and Ganguly [6], Ch... |