## Processing data-stream join aggregates using skimmed sketches (2004)

Venue: In Proc. Int. Conf. on Extending Database Technology (EDBT)

Citations: 23 (4 self)

### Citations

844 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation context: ... aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for queries involving one or more joins [14, 4, 15]. Alon et al. [4, 3] propose algorithms that employ small pseudo-random sketch summaries to estimate the size of self-joins and binary joins over data streams. Their algorithms rely on a single-pass method for computing ...

418 | Approximate frequency counts over data streams
- Manku, Motwani
- 2002

340 | Finding frequent items in data streams
- Charikar, Chen, et al.
- 2002
Citation context: ... algorithm relies on randomized sketches; however, unlike basic sketching, our join estimation algorithm arranges the random sketches in a hash structure (similar to the COUNTSKETCH data structure of [8]). As a result, processing a stream element requires only a single sketch per hash table to be updated (i.e., the sketch for the hash bucket that the element maps to), rather than updating all the ske...

335 | Random sampling with a reservoir
- Vitter
- 1985
Citation context: ...ng samples and simple statistics over sliding windows [12]. A particularly challenging problem is that of answering aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for queries involving one or more joins [14, 4, 15]. Alon et al. [4, 3] propose algorithms that employ small pseudo-random sketch summaries to estimate th...

269 | Maintaining stream statistics over sliding windows
- Datar, Gionis, et al.
Citation context: ...in sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and simple statistics over sliding windows [12]. A particularly challenging problem is that of answering aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for quer...

216 | Approximate query processing using wavelets.
- Chakrabarti, Garofalakis, et al.
- 2000
Citation context: ...blem is that of answering aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for queries involving one or more joins [14, 4, 15]. Alon et al. [4, 3] propose algorithms that employ small pseudo-random sketch summaries to estimate the size of self-joins and binary joins over data streams. Their algorithms rely on a single-pass m...

215 | Surfing wavelets on streams: One-pass summaries for approximate aggregate queries.
- Gilbert, Kotidis, et al.
- 2001
Citation context: ... order-statistics computation [1, 2], estimating frequency moments and join sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and simple statistics over sliding windows [12]. A particularly challenging problem is that of answering aggregate SQL queries over data streams. Techniques based on random s...

206 | Space-efficient online computation of quantile summaries
- Greenwald, Khanna
- 2001
Citation context: ...Recently, single-pass algorithms for processing streams in the presence of limited memory have been proposed for several different problems; examples include quantile and order-statistics computation [1, 2], estimating frequency moments and join sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and si...

199 | What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically
- Cormode, Muthukrishnan
- 2003
Citation context: ...n sizes are large (e.g., 64-bit IP addresses). Fortunately, it is possible to reduce the execution time of procedure SKIMDENSE using the concept of dyadic intervals as suggested in [9]. Consider a hierarchical organization of domain values into levels ...

186 | Processing complex aggregate queries over data streams.
- Dobra, Gehrke, et al.
- 2002
Citation context: ...join COUNT queries), our skimmed-sketch method can readily be extended to handle complex, multi-join queries containing general aggregate operators (e.g., SUM), in a manner similar to that described in [5]. More concretely, our key contributions can be summarized as follows. SKIMMED-SKETCH ALGORITHM FOR JOIN SIZE ESTIMATION. Our skimmed-sketch algorithm is similar in spirit to the bifocal sampling tech...

169 | Join synopses for approximate query answering.
- Acharya, Gibbons, et al.
- 1999
Citation context: ...blem is that of answering aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for queries involving one or more joins [14, 4, 15]. Alon et al. [4, 3] propose algorithms that employ small pseudo-random sketch summaries to estimate the size of self-joins and binary joins over data streams. Their algorithms rely on a single-pass m...

123 | Tracking join and self-join sizes in limited storage.
- Alon, Gibbons, et al.
- 1999
Citation context: ...blem is that of answering aggregate SQL queries over data streams. Techniques based on random stream sampling [13] are known to give very poor result estimates for queries involving one or more joins [14, 4, 15]. Alon et al. [4, 3] propose algorithms that employ small pseudo-random sketch summaries to estimate the size of self-joins and binary joins over data streams. Their algorithms rely on a single-pass m...

119 | Distinct sampling for highly-accurate answers to distinct values queries and event reports
- Gibbons
- 2001
Citation context: ...limited memory have been proposed for several different problems; examples include quantile and order-statistics computation [1, 2], estimating frequency moments and join sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and simple statistics over sliding windows [12]. A particularly challenging probl...

112 | How to summarize the universe: Dynamic maintenance of quantiles
- Gilbert, Kotidis, et al.
- 2002
Citation context: ...Recently, single-pass algorithms for processing streams in the presence of limited memory have been proposed for several different problems; examples include quantile and order-statistics computation [1, 2], estimating frequency moments and join sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and si...

40 | Bifocal sampling for skew-resistant join size estimation.
- Ganguly
- 1996
Citation context: ...ncretely, our key contributions can be summarized as follows. SKIMMED-SKETCH ALGORITHM FOR JOIN SIZE ESTIMATION. Our skimmed-sketch algorithm is similar in spirit to the bifocal sampling technique of [16], but tailored to a data-stream setting. Instead of samples, our skimmed-sketch method employs randomized hash sketch summaries of the two streams to approximate the size of their join in two steps. It f...

20 | Comparing data streams using Hamming norms
- Cormode, Datar, et al.
- 2002
Citation context: ...limited memory have been proposed for several different problems; examples include quantile and order-statistics computation [1, 2], estimating frequency moments and join sizes [3–5], distinct values [6, 7], frequent stream elements [8–10], computing one-dimensional Haar wavelet decompositions [11], and maintaining samples and simple statistics over sliding windows [12]. A particularly challenging probl...