## Sampling Algorithms: Lower Bounds and Applications (Extended Abstract) (2001)

Citations: 50 (2 self)

### BibTeX

@MISC{Bar-yossef01samplingalgorithms,

author = {Ziv Bar-Yossef and Ravi Kumar and D. Sivakumar},

title = {Sampling Algorithms: Lower Bounds and Applications (Extended Abstract)},

year = {2001}

}

### Abstract

Ziv Bar-Yossef (Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, zivi@cs.berkeley.edu), Ravi Kumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, ravi@almaden.ibm.com), D. Sivakumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, siva@almaden.ibm.com)

We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form f : A^n → B, where A and B are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query in order to approximate f(x_1, …, x_n). We define two quantitative properties of functions, the block sensitivity and the minimum Hellinger distance, that give us techniques to prove lower bounds on the query complexity. These techniques are quite general and easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds. We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
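As a concrete instance of the query-complexity setting (our own illustrative sketch, not an algorithm from the paper): the mean of an input x with entries in [0, 1] can be additively ε-approximated by querying only ℓ = O(log(1/δ)/ε²) uniformly random coordinates, the sample bound given by Hoeffding's inequality. The function name and parameters below are ours.

```python
import math
import random

def sampled_mean(x, eps, delta, seed=0):
    """Additively approximate mean(x) for x with entries in [0, 1] by
    querying only ell uniformly random coordinates.  By Hoeffding's
    inequality, ell = ceil(ln(2/delta) / (2 * eps**2)) queries suffice
    for |estimate - mean(x)| <= eps with probability >= 1 - delta."""
    rng = random.Random(seed)
    ell = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    return sum(x[rng.randrange(len(x))] for _ in range(ell)) / ell

x = [i % 2 for i in range(10_000)]           # true mean is 0.5
est = sampled_mean(x, eps=0.05, delta=0.01)  # ~1060 queries instead of 10000
```

The paper's lower bounds concern how far such query counts can be pushed down; for the mean, bounds of this Θ(log(1/δ)/ε²) shape turn out to be essentially tight.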

### Citations

9539 | Statistical Learning Theory - Vapnik - 1998 |

1734 | A theory of the learnable - Valiant - 1984 |

1533 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

1320 | Statistical Decision Theory and Bayesian Analysis, Second Edition - Berger - 1985 |
Citation Context: ...ior work and in particular the relationship of our model to the areas of Boolean decision tree complexity [Bd99], PAC and statistical learning theory [Val84, KV94, Vap98], statistical decision theory [Ber85], statistical estimation theory [Van68], and statistical sequential analysis [Sie85]. Section 9 concludes with some open problems. 2 Preliminaries In this section we introduce a notion of approximatio... |

745 | A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations - Chernoff - 1952 |

710 | The space complexity of approximating the frequency moments - Alon, Matias, et al. - 1996 |
Citation Context: ...gorithm that uses s samples can be simulated by a streaming algorithm that uses space (roughly) s, we show that the converse is not true for F_2: the streaming algorithm of Alon, Matias, and Szegedy [AMS99] uses O(log m) space for approximating F_2, while we prove an Ω(√m) lower bound for sampling algorithms. (This style of "separation" of the two models is also demonstrated in [FKSV00].) We then show t... |
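The streaming side of this separation can be sketched in a few lines. This is a simplified illustration of the AMS F_2 idea, assuming fully independent random signs rather than the 4-wise independent hash families AMS use to achieve O(log m) space; the function name and parameters are ours.

```python
import random

def ams_f2_estimate(stream, m, reps=300, seed=1):
    """Illustrative AMS-style F_2 estimator: each repetition keeps one
    counter Z = sum over stream items a of sign(a), where sign maps each
    of the m item types to a random value in {-1, +1}.  Then E[Z^2] = F_2,
    and averaging over repetitions reduces the variance.  (AMS use 4-wise
    independent signs so a repetition needs only O(log m) bits; fully
    random signs are used here purely for simplicity.)"""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        sign = [rng.choice((-1, 1)) for _ in range(m)]
        z = sum(sign[a] for a in stream)   # one pass over the stream
        total += z * z
    return total / reps

stream = [0] * 30 + [1] * 40 + [2] * 50    # frequencies f = (30, 40, 50)
est = ams_f2_estimate(stream, m=3)         # true F_2 = 30^2+40^2+50^2 = 5000
```

The contrast drawn in the context above is that this one-pass counter uses logarithmic space, while any sampling algorithm for F_2 must query Ω(√m) positions.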

611 | An introduction to computational learning theory - Kearns, Vazirani - 1994 |

436 | von zur Gathen, et al. - 1999 |

365 | Probabilistic computations: Towards a unified measure of complexity - Yao - 1977 |

297 | Group representations in probability and statistics - Diaconis - 1988 |

264 | Stable distributions, pseudorandom generators, embeddings, and data stream computation - Indyk - 2000 |
Citation Context: ...n operator, which is not a distance function. In an earlier version of this paper, we had claimed that this problem could be circumvented, but this claim appears to be erroneous. Independently, Indyk [Ind00] has shown how to handle some of the complications that arise due to the median; he applies the streaming algorithms of [AMS99, FKSV99] for distance functions to obtain embeddings of metric spaces int... |

202 | Min-wise independent permutations - Broder, Charikar, et al. - 1998 |

199 | Detection, Estimation, and Modulation Theory - Van Trees - 1968 |
Citation Context: ...ship of our model to the areas of Boolean decision tree complexity [Bd99], PAC and statistical learning theory [Val84, KV94, Vap98], statistical decision theory [Ber85], statistical estimation theory [Van68], and statistical sequential analysis [Sie85]. Section 9 concludes with some open problems. 2 Preliminaries In this section we introduce a notion of approximation for functions f : A^n → B, where A an... |

162 | Computing on data streams - Henzinger, Raghavan, et al. - 1998 |
Citation Context: ...t in novel computational paradigms. These include, but are not restricted to, algorithms that probe only small (random) portions of the data; algorithms that work by making a few passes over the data [HRR99]; algorithms that operate on a stream of data with limited space and stringent constraints on time per data item [HRR99, AMS99, FKSV99]. Algorithms of this nature are typically intended to compute suc... |

133 | Sequential analysis tests and confidence intervals - Siegmund - 1985 |

128 | Complexity measures and decision tree complexity: a survey - Buhrman, de Wolf |
Citation Context: ...y precise sense). These results are described in Section 7. In Section 8 we discuss related prior work and in particular the relationship of our model to the areas of Boolean decision tree complexity [Bd99], PAC and statistical learning theory [Val84, KV94, Vap98], statistical decision theory [Ber85], statistical estimation theory [Van68], and statistical sequential analysis [Sie85]. Section 9 concludes... |

121 | Property testing in bounded degree graphs - Goldreich, Ron |
Citation Context: ...t ε-far from being bipartite, we need to make at least εdn modifications to its edge list, implying Ω(εdn)-sized blocks, which implies only O(1/ε) block sensitivity. On the other hand, Goldreich and Ron [GR97] prove that S_{w,ε}(f) = Ω(√n). 7 Lossy Compression In the context of computing with massive data sets, the following question is fairly natural and extremely important: what computations can be perf... |

114 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

97 | Let sleeping files lie: Pattern matching in Z-compressed files - Amir, Benson, et al. - 1996 |

84 | CREW PRAMs and decision trees - Nisan - 1991 |
Citation Context: ...lower bounds on the query complexity of randomized sampling algorithms that approximately compute functions. Our first lower bound technique is based on an adaptation of the notion of block sensitivity [Nis91]. The advantage of this technique is that it applies to any function, even though the lower bounds may be somewhat weak. Moreover, it applies to the expected query complexity, not just the worst case... |
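Block sensitivity bs(f, x) is the largest number of pairwise-disjoint blocks of coordinates, each of which changes f(x) when flipped on its own. A brute-force computation for tiny Boolean inputs (our own exponential-time illustration; the helper names are ours, not from [Nis91] or the paper):

```python
from itertools import combinations

def block_sensitivity_at(f, x):
    """bs(f, x): the largest t such that there exist t pairwise-disjoint
    blocks of coordinates, each of which changes f(x) when flipped.
    Brute force over all blocks and all families; illustration only."""
    n = len(x)
    def flip(block):
        return tuple(b ^ 1 if i in block else b for i, b in enumerate(x))
    sensitive = [s for r in range(1, n + 1)
                 for s in combinations(range(n), r)
                 if f(flip(s)) != f(x)]
    for t in range(len(sensitive), 0, -1):       # try large families first
        for family in combinations(sensitive, t):
            covered = set().union(*map(set, family))
            if len(covered) == sum(len(b) for b in family):  # disjoint?
                return t
    return 0

f_or = lambda x: int(any(x))
bs = block_sensitivity_at(f_or, (0, 0, 0))  # every singleton block is sensitive
```

For OR at the all-zeros input every singleton is a sensitive block, so bs equals n; the paper's adaptation turns such counts into sampling lower bounds.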

80 | Testing that distributions are close - Batu, Fortnow, et al. - 2000 |

75 | Towards Estimation Error Guarantees for Distinct Values - Charikar, Chaudhuri, et al. - 2000 |
Citation Context: ...be approximated using O(m^{1-1/k}) samples, which immediately implies the space upper bound of [AMS99] for all k > 2. Finally, we provide a simple proof for the sampling lower bound of Charikar et al. [CCMN00] for F_0. These results are described in Section 5. We also investigate the question of how tight our lower bound methodologies are. We obtain a general theorem, which shows that at least in some spe... |

73 | On the degree of polynomials that approximate symmetric Boolean functions (preliminary version) - Paturi - 1992 |

66 | The quantum query complexity of approximating the median and related statistics - Nayak, Wu - 1999 |
Citation Context: ... [SV99]) give lower bounds for relative approximation of the mean on any input x. Charikar et al. [CCMN00] prove a lower bound for ratio approximation of the frequency moment of order 0. Nayak and Wu [NW99] give a lower bound on the quantum query complexity of the median and some other statistics. Sampling algorithms can be viewed as a special case of the general framework studied in statistical deci... |

62 | Bounds for dispersers, extractors, and depth-two superconcentrators - Radhakrishnan, Ta-Shma |
Citation Context: ... Even, and Goldreich [CEG95], and (modulo the machinery) is substantially simpler. This lower bound has another powerful consequence: it implies the main technical result of Radhakrishnan and Ta-Shma [RTS00], which they use to obtain lower bounds on extractor, disperser, and superconcentrator parameters. Our lower and upper bounds for the frequency moments have some interesting implications. (Recall that... |

56 | A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations - Chernoff - 1952 |

54 | An optimal algorithm for Monte Carlo estimation - Dagum, Karp, et al. - 2000 |
Citation Context: ...query complexity of non-Boolean function approximations; all of these are tailored to specific functions. Canetti et al. [CEG95] show a lower bound for additive approximation of the mean; Dagum et al. [DKLR95] (and also, implicitly, Schulman and Vazirani [SV99]) give lower bounds for relative approximation of the mean on any input x. Charikar et al. [CCMN00] prove a lower bound for ratio approximation of t... |

51 | Randomness-optimal Oblivious Sampling, Random Structures and Algorithms - Zuckerman - 1997 |
Citation Context: ...|M| = 2^m, is a (k, ε)-extractor if for all distributions X on N with min-entropy at least k, d(E(X, U_T), U_M) ≤ ε, where U_T and U_M are the uniform distributions on T and M respectively. Zuckerman [Zuc97] showed that any extractor implies an (oblivious) sampling algorithm for the mean: Lemma 5.8 (Zuckerman) If there exists a (k, ε)-extractor E : N × T → M, then there exists a sampling algorithm T of... |

45 | On testing expansion in bounded-degree graphs - Goldreich, Ron - 2000 |
Citation Context: ...ks ℓ samples x_1, …, x_ℓ using U_x and counts the k-wise collisions C_k, our basic estimator. It is clear that E(C_k) = C(ℓ, k) · ‖U_x‖_k^k. The following lemma, whose proof generalizes that of [GR00] for F_2, helps us to bound the variance of this estimator. Lemma 5.3 Var(C_k) ≤ O(E(C_k)^{1+1/k}). Now, noting that ‖U_x‖_k^k ≥ 1/r^{k-1}, by Chebyshev's inequality, Pr(|C_k − E(C_k)| ≥ εE(C_k)) ≤ O(... |
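The collision-counting estimator described in this context can be sketched as follows. This is a minimal illustration under our reading of the snippet: U_x is the value at a uniformly random position of the input x, and C_k counts the k-subsets of samples that all agree; the rescaling into an F_k estimate and the function name are our additions.

```python
from itertools import combinations
from math import comb
import random

def collision_fk_estimate(x, k, ell, seed=0):
    """Draw ell i.i.d. samples from U_x (the value at a uniformly random
    position of x) and count the k-wise collisions C_k, i.e. k-subsets of
    samples that all carry the same value.  Since
        E[C_k] = C(ell, k) * sum_a (f_a / m)^k = C(ell, k) * F_k / m^k,
    the quantity C_k * m^k / C(ell, k) is an unbiased estimate of F_k."""
    rng = random.Random(seed)
    m = len(x)
    samples = [x[rng.randrange(m)] for _ in range(ell)]
    c_k = sum(1 for tup in combinations(samples, k) if len(set(tup)) == 1)
    return c_k * m ** k / comb(ell, k)

x = [0] * 50 + [1] * 50                       # frequencies (50, 50): F_2 = 5000
est = collision_fk_estimate(x, k=2, ell=400)
```

The variance bound of Lemma 5.3 together with Chebyshev's inequality is what makes an ℓ of roughly O(m^{1-1/k}) samples suffice.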

39 | Compressed-domain techniques for image/video indexing and manipulation - Chang - 1995 |
Citation Context: ...ompressing the compressed data. Applications include pattern matching on compressed text files [ABF96], operations such as scene change detection and abrupt lighting change detection on video sequences [Cha95], and nearest neighbor computations. This question becomes more interesting if, to gain larger compression factors, we are willing to accept some loss in the precision of the computation. The idea of... |

29 | Testing and spot-checking of data streams - Feigenbaum, Kannan, et al. |
Citation Context: ...tias, and Szegedy [AMS99] uses O(log m) space for approximating F_2, while we prove an Ω(√m) lower bound for sampling algorithms. (This style of "separation" of the two models is also demonstrated in [FKSV00].) We then show that for k ≥ 2, the k-th frequency moment can be approximated using O(m^{1-1/k}) samples, which immediately implies the space upper bound of [AMS99] for all k > 2. Finally, we provide a... |

24 | Tight Bounds for Depth-two Super-concentrators - Radhakrishnan, Ta-Shma - 2000 |

21 | Probabilistic computation: towards a unified measure of complexity - Yao - 1977 |
Citation Context: ...ely, over all inputs x ∈ A^n. Notice that this model can simulate, with the same efficiency, various models of decision trees, including Boolean, comparison, and algebraic decision trees. Yao's Theorem [Yao77] gives an equivalent characterization of a randomized decision tree as a distribution over deterministic decision trees. The expected query complexity of the tree on input x is the expected length (... |

20 | Lower Bounds for Sampling Algorithms for Estimating the Average - Canetti, Even, et al. - 1995 |
Citation Context: ...selection functions, the mean and higher statistical moments, and frequency moments F_k for k ≠ 1. For the case of the mean, our lower bound matches the lower bound of Canetti, Even, and Goldreich [CEG95], and (modulo the machinery) is substantially simpler. This lower bound has another powerful consequence: it implies the main technical result of Radhakrishnan and Ta-Shma [RTS00], which they use to o... |

14 | Let sleeping files lie: pattern matching in Z-compressed files - Amir, Benson, et al. - 1996 |
Citation Context: ... is of much interest to find algorithms that are able to compute interesting functions without completely decompressing the compressed data. Applications include pattern matching on compressed text files [ABF96], operations such as scene change detection and abrupt lighting change detection on video sequences [Cha95], and nearest neighbor computations. This question becomes more interesting if, to gain large... |

14 | An approximate L1-difference algorithm for massive data streams (extended abstract) - Feigenbaum, Kannan, et al. - 1999 |

10 | An approximate L1-difference algorithm for massive data streams - Feigenbaum, Kannan, et al. - 1999 |
Citation Context: ...ng y in a stream (in arbitrary order), and outputs the output of T. A partial converse to the above was proved by making additional assumptions about the encoding function of the lossy compression [FKSV99]. We omit all the parameters in the following statement. Proposition 7.6 [FKSV99] If f has a lossy compression scheme whose compression function has a streaming algorithm, then f has a streaming algor... |

4 | Sequential analysis: tests and confidence intervals - Siegmund - 1985 |
Citation Context: ...ision tree complexity [Bd99], PAC and statistical learning theory [Val84, KV94, Vap98], statistical decision theory [Ber85], statistical estimation theory [Van68], and statistical sequential analysis [Sie85]. Section 9 concludes with some open problems. 2 Preliminaries In this section we introduce a notion of approximation for functions f : A^n → B, where A and B are arbitrary sets. We then generalize Bo... |

2 | Majorizing estimators and the approximation of #P-complete problems - Schulman, Vazirani - 1999 |
Citation Context: ...s; all of these are tailored to specific functions. Canetti et al. [CEG95] show a lower bound for additive approximation of the mean; Dagum et al. [DKLR95] (and also, implicitly, Schulman and Vazirani [SV99]) give lower bounds for relative approximation of the mean on any input x. Charikar et al. [CCMN00] prove a lower bound for ratio approximation of the frequency moment of order 0. Nayak and Wu [NW99]... |