What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically (2003)

by Graham Cormode, S Muthukrishnan
Venue: Proc. of ACM PODS
Citations: 199 (13 self)

BibTeX

@INPROCEEDINGS{Cormode03what’shot,
    author = {Graham Cormode and S Muthukrishnan},
    title = {What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically},
    booktitle = {Proc. of ACM PODS},
    year = {2003}
}

Abstract

Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the "hot items" in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in networking applications. We present a new algorithm for dynamically determining the hot items at any time in a relation that is undergoing deletions as well as insertions. Our algorithm maintains a small-space data structure that monitors the transactions on the relation and, when required, quickly outputs all hot items without rescanning the relation in the database. With user-specified probability, it is able to report all hot items. Our algorithm relies on the idea of "group testing", is simple to implement, and has provable quality, space, and time guarantees. Previously known algorithms for this problem that make similar quality and performance guarantees cannot handle deletions, and those that handle deletions cannot make similar guarantees without rescanning the database. Our experiments with real and synthetic data show that our algorithm is remarkably accurate in dynamically tracking the hot items, independent of the rate of insertions and deletions.
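
To make the "group testing" idea concrete, here is a minimal Python sketch of one way such a structure could look. It is an illustration under stated assumptions, not the authors' implementation: the class name `HotItemsSketch`, the parameters `k`, `bits`, `reps`, and `groups`, the hash family, and the choice of "hot" as a count above n/(k+1) are all hypothetical. Items are assumed to be non-negative integers below 2**bits. Each item is hashed into a group, each group keeps a total count plus one counter per bit position, and a group dominated by a single item lets that item's identity be read off bit by bit. Deletions are the same update with a negative sign.

```python
import random


class HotItemsSketch:
    """Illustrative group-testing counters; all names and defaults are assumptions."""

    PRIME = (1 << 61) - 1  # modulus for the assumed linear hash family

    def __init__(self, k, bits=32, reps=4, groups=None, seed=0):
        self.k = k                    # "hot" is taken to mean count > n / (k + 1)
        self.bits = bits              # items are integers in [0, 2**bits)
        self.groups = groups if groups is not None else 2 * k
        rng = random.Random(seed)
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(reps)]
        # row[0] is the group's total count; row[1 + j] counts items with bit j set.
        self.counters = [[[0] * (bits + 1) for _ in range(self.groups)]
                         for _ in range(reps)]
        self.n = 0                    # current number of items in the relation

    def _group(self, r, item):
        a, b = self.hashes[r]
        return ((a * item + b) % self.PRIME) % self.groups

    def update(self, item, delta):
        """Apply an insertion (delta=+1) or a deletion (delta=-1)."""
        self.n += delta
        for r in range(len(self.hashes)):
            row = self.counters[r][self._group(r, item)]
            row[0] += delta
            for j in range(self.bits):
                if (item >> j) & 1:
                    row[1 + j] += delta

    def insert(self, item):
        self.update(item, +1)

    def delete(self, item):
        self.update(item, -1)

    def _decode(self, row, threshold):
        """Read one item out of a group, or return None if the group is ambiguous."""
        item = 0
        for j in range(self.bits):
            ones = row[1 + j]
            zeros = row[0] - ones
            if (ones > threshold) == (zeros > threshold):
                return None           # both halves heavy, or neither: cannot decide
            if ones > threshold:
                item |= 1 << j
        return item

    def hot_items(self):
        """Return candidate hot items using only the counters, with no rescan."""
        threshold = self.n / (self.k + 1)
        found = set()
        for r in range(len(self.hashes)):
            for g, row in enumerate(self.counters[r]):
                if row[0] <= threshold:
                    continue          # a group this light cannot contain a hot item
                item = self._decode(row, threshold)
                if item is None or self._group(r, item) != g:
                    continue
                # Keep the candidate only if its group is heavy in every repetition.
                if all(self.counters[q][self._group(q, item)][0] > threshold
                       for q in range(len(self.hashes))):
                    found.add(item)
        return found


if __name__ == "__main__":
    sketch = HotItemsSketch(k=2)
    for _ in range(100):
        sketch.insert(7)
    for x in range(1000, 1010):
        sketch.insert(x)
    print(sketch.hot_items())
```

In this sketch, two hot items that hash to the same group make that group undecodable, which is why several independent repetitions are kept; the number of groups and repetitions would govern the probability of reporting every hot item. Memory use is `reps * groups * (bits + 1)` counters, regardless of how many tuples the relation holds.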
