@misc{Flajolet90onadaptive,
  author = {Philippe Flajolet},
  title  = {On Adaptive Sampling},
  year   = {1990}
}


Abstract

We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that makes it possible to estimate probabilistically the number of distinct elements in a large file stored on disk.

1 Introduction

A problem that naturally arises in query optimization of database systems [1] is to estimate the number of distinct elements (also called the cardinality) of a large collection of data with unpredictable replications. The trivial solution, which consists in building a list of the distinct elements, is usually too resource-consuming in both storage and processing time. In [4] the authors presented a solution called Probabilistic Counting that estimates the cardinality of a large file typically stored on disk; using m words of in-core memory, the algorithm achieves an expected relative accuracy close to 0.78/√m, and it performs only a constant number of operations per element of the file. Wegman [11] has proposed an interesting alter...
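To make the idea concrete, here is a minimal sketch of adaptive sampling in the spirit described above: hash each element, keep only hashes in a shrinking sub-range, and halve the sampling rate whenever the m-word buffer overflows. The function name, the choice of SHA-256 as the hash, and the parameter names are illustrative assumptions, not taken from the paper.

```python
import hashlib

def adaptive_sample_cardinality(stream, m=64):
    """Probabilistically estimate the number of distinct elements in
    `stream` using a buffer of at most m hashed values.

    A sketch of the adaptive-sampling idea attributed to Wegman;
    details here are illustrative assumptions, not the paper's exact
    algorithm.
    """
    depth = 0       # sampling level: keep hashes divisible by 2**depth
    sample = set()  # in-core buffer of at most m retained hash values
    for item in stream:
        # 64-bit hash of the element (duplicates hash identically,
        # so replications in the file do not inflate the sample)
        h = int.from_bytes(
            hashlib.sha256(str(item).encode()).digest()[:8], "big")
        if h % (1 << depth) == 0:
            sample.add(h)
            while len(sample) > m:
                # buffer overflow: halve the sampling rate and drop
                # hashes that fall outside the new, smaller sub-range
                depth += 1
                sample = {x for x in sample if x % (1 << depth) == 0}
    # each retained hash stands for 2**depth distinct elements
    return len(sample) * (1 << depth)
```

When the number of distinct elements never exceeds m, the depth stays at 0 and the estimate is exact; otherwise the buffer size stays bounded by m while the accuracy degrades gracefully, which is the storage/accuracy trade-off the paper analyzes.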