IntelliClean: A Knowledge-Based Intelligent Data Cleaner (2000)

by Mong Li Lee, Tok Wang Ling, Wai Lup Low
In Knowledge Discovery and Data Mining

Abstract:

Existing data cleaning methods work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall is achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. High precision is achieved analogously at the cost of lower recall. This is the recall-precision dilemma. In this paper, we propose a generic knowledge-based framework for effective data cleaning that implements existing cleaning strategies and more. We develop a new method to compute transitive closure under uncertainty which handles the merging of groups of inexact duplicate records. Experimental results show that this framework can identify duplicates and anomalies with high recall and precision.
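
The baseline the abstract describes, comparing nearby records in a sorted database against a similarity threshold, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's framework: the function names, the window size, the thresholds, and the use of Python's `difflib` as the similarity measure are all hypothetical choices, and the grouping step is an ordinary union-find transitive closure rather than the uncertainty-aware closure the authors propose.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Degree of similarity in [0, 1]; difflib is an assumed stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicate_groups(records, window=3, threshold=0.8):
    """Group likely duplicates by comparing neighbours in sorted order.

    Each record is compared only with the next (window - 1) records in the
    sorted sequence. Pairs scoring at or above the threshold are merged with
    union-find, which yields a plain (certain) transitive closure.
    """
    order = sorted(range(len(records)), key=lambda i: records[i].lower())
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(records[i])
    # Only groups with more than one member count as duplicates.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical sample data, not taken from the paper's experiments.
records = ["John Smith", "Jon Smith", "Mary Jones", "Marie Jones", "Zed Quux"]
loose = find_duplicate_groups(records, threshold=0.8)
strict = find_duplicate_groups(records, threshold=0.9)
```

Lowering `threshold` accepts more pairs as duplicates, which tends to raise recall and lower precision; raising it does the opposite. This is the recall-precision dilemma in miniature.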
