A Framework for Measuring Changes in Data Characteristics (1999)
| Venue: | IN PODS |
| Citations: | 44 - 1 self |
BibTeX
@INPROCEEDINGS{Ganti99aframework,
author = {Venkatesh Ganti and Johannes Gehrke and Raghu Ramakrishnan and Wei-Yin Loh},
title = {A Framework for Measuring Changes in Data Characteristics},
booktitle = {IN PODS},
year = {1999},
pages = {126--137},
publisher = {ACM Press}
}
Abstract
A data mining algorithm builds a model that captures interesting aspects of the underlying data. We develop a framework for quantifying the difference, called the deviation, between two datasets in terms of the models they induce. Our framework covers a wide variety of models including frequent itemsets, decision tree classifiers, and clusters, and captures standard measures of deviation such as the misclassification rate and the chi-squared metric as special cases. We also show how statistical techniques can be applied to the deviation measure to assess whether the difference between two models is meaningful (i.e., whether the underlying datasets have statistically significant differences in their characteristics), and discuss several practical applications.
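To make the idea of a model-based deviation concrete, the following is a minimal sketch for the frequent-itemset case. It is not the paper's definition: the abstract does not give the formula, so the sketch assumes that the deviation is the sum of absolute differences in support, taken over the union of the itemsets that are frequent in either dataset. The function names (`support`, `frequent_itemsets`, `deviation`) and the parameters `min_support` and `max_size` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(dataset, itemset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for transaction in dataset if itemset <= transaction) / len(dataset)


def frequent_itemsets(dataset, min_support, max_size=2):
    """Brute-force enumeration of itemsets (up to `max_size` items) whose
    support is at least `min_support`. Fine for toy data only."""
    items = sorted({item for transaction in dataset for item in transaction})
    frequent = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(dataset, candidate) >= min_support:
                frequent.add(candidate)
    return frequent


def deviation(d1, d2, min_support=0.5, max_size=2):
    """Assumed deviation measure (illustrative, not taken from the paper):
    sum of absolute support differences over the itemsets that are
    frequent in at least one of the two datasets."""
    regions = (frequent_itemsets(d1, min_support, max_size)
               | frequent_itemsets(d2, min_support, max_size))
    return sum(abs(support(d1, r) - support(d2, r)) for r in regions)


# Toy transaction datasets; each transaction is a set of items.
d1 = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}]
d2 = [{"a"}, {"a"}, {"b"}, {"c"}]
```

Under these assumptions, `deviation(d1, d1)` is zero and the measure is symmetric in its two arguments. Assessing whether a nonzero value is statistically significant, as the abstract describes, would require an additional step (for example, a resampling test) that this sketch does not attempt.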







