Utilizing association rules for identification of possible errors in data sets (2000)
Citations: 3 (0 self)
BibTeX
@TECHREPORT{Marcus00utilizingassociation,
author = {Andrian Marcus and Jonathan I. Maletic},
title = {Utilizing association rules for identification of possible errors in data sets},
institution = {},
year = {2000}
}
Abstract
The paper analyzes the application of association rules to data cleansing, specifically the automatic identification of potential errors in data sets. Association rules are a fundamental class of patterns that exist in data. These patterns have been widely used (e.g., in market basket analysis), and extensive studies exist on efficient association rule mining algorithms. The literature gives special attention to extensions of binary association rules (e.g., ratio, quantitative, generalized, multiple-level, constraint-based, distance-based, and composite association rules). A new extension of Boolean association rules, ordinal association rules, which incorporates ordinal relationships among data items, is introduced. An algorithm that finds these rules and identifies potential errors in data is proposed. A prototype tool is described, and the results of applying it to a real-world data set are given. The tool is designed to be domain independent and constitutes the first part of a proposed framework for automated data cleansing. Other approaches to data cleansing are described and compared.
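The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of ordinal association rules: for each ordered pair of attributes, count how often the relation `a <= b` holds across records, keep the pairs where it holds in at least a chosen fraction of records, and flag the records that violate a kept rule as potential errors. The function names, the `min_confidence` parameter, the restriction to the `<=` relation, and the sample data are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import permutations


def find_ordinal_rules(records, min_confidence=0.9):
    """Return {(a, b): confidence} for attribute pairs where a <= b
    holds in at least min_confidence of the records.

    Hypothetical helper: the paper's actual rule definition may differ.
    """
    if not records:
        return {}
    attributes = list(records[0].keys())
    rules = {}
    for a, b in permutations(attributes, 2):
        holds = sum(1 for r in records if r[a] <= r[b])
        confidence = holds / len(records)
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def flag_potential_errors(records, rules):
    """Return (record_index, (a, b)) for each record violating a rule a <= b."""
    flagged = []
    for i, r in enumerate(records):
        for a, b in rules:
            if not r[a] <= r[b]:
                flagged.append((i, (a, b)))
    return flagged


# Illustrative data (made up): nine consistent records and one suspicious one.
records = [{"start": i, "end": i + 5} for i in range(9)]
records.append({"start": 20, "end": 3})

rules = find_ordinal_rules(records, min_confidence=0.9)
flagged = flag_potential_errors(records, rules)
print(rules)
print(flagged)
```

A flagged record is only a candidate error: a low-frequency violation may also be a legitimate exception, which is consistent with the abstract's wording of "potential errors".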







