Efficient discovery of error-tolerant frequent itemsets in high dimensions (2001)
| Venue: | In SIGKDD 2001 |
| Citations: | 44 - 0 self |
BibTeX
@INPROCEEDINGS{Yang01efficientdiscovery,
author = {Cheng Yang},
title = {Efficient discovery of error-tolerant frequent itemsets in high dimensions},
booktitle = {SIGKDD 2001},
year = {2001},
pages = {194--203},
publisher = {ACM Press}
}
Abstract
We present a generalization of frequent itemsets allowing for the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation and collaborative filtering. Results show that the algorithm consistently uncovers structure in large sparse databases that other traditional clustering algorithms fail to find.
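The abstract does not spell out what "allowing for errors" means formally, so the following is only an illustrative sketch under one assumed reading: a transaction counts toward an itemset's support if it is missing at most a fraction `max_error` of the itemset's items. The function name `tolerant_support`, the parameter `max_error`, and the toy baskets are hypothetical and are not taken from the paper, whose actual definitions and algorithm may differ. With `max_error = 0` the sketch reduces to ordinary frequent-itemset support.

```python
from typing import FrozenSet, Iterable, List


def tolerant_support(
    transactions: List[FrozenSet[str]],
    itemset: Iterable[str],
    max_error: float,
) -> float:
    """Fraction of transactions missing at most `max_error` of the itemset.

    Assumed, illustrative definition only (not the paper's formal one).
    """
    items = frozenset(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    if not transactions:
        return 0.0
    # Largest number of absent items a transaction may have and still count.
    allowed_missing = max_error * len(items)
    hits = sum(1 for t in transactions if len(items - t) <= allowed_missing)
    return hits / len(transactions)


# Hypothetical toy data: five sparse "market basket" transactions.
baskets = [
    frozenset({"milk", "bread", "eggs", "butter"}),  # missing 0 items
    frozenset({"milk", "bread", "eggs"}),            # missing 1 item
    frozenset({"milk", "bread", "butter", "jam"}),   # missing 1 item
    frozenset({"milk", "jam"}),                      # missing 3 items
    frozenset({"tea"}),                              # missing 4 items
]
candidate = {"milk", "bread", "eggs", "butter"}

exact = tolerant_support(baskets, candidate, max_error=0.0)
tolerant = tolerant_support(baskets, candidate, max_error=0.25)
print(exact, tolerant)  # prints: 0.2 0.6
```

In this toy data, only one basket contains all four items, but three baskets contain at least three of them. Under the assumed definition, tolerating one missing item out of four therefore raises the support from 0.2 to 0.6, which is the kind of structure an exact-match definition would overlook in sparse data.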







