Data Cleaning Methods (2003)
by
William Winkler
Citations: 13 (1 self)
BibTeX
@MISC{Winkler03datacleaning,
author = {William Winkler},
title = {Data Cleaning Methods},
year = {2003}
}
Abstract
Data cleaning methods are used to find duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage, which provides a theoretically optimal classification rule. Fellegi and Sunter also introduced methods for automatically estimating optimal parameters without training data, which we extend to many real-world situations.
Keywords: EM algorithm, string comparator, unsupervised learning.
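The ideas named in the abstract (a Fellegi-Sunter classification rule whose parameters are estimated without training data) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes binary agree/disagree comparisons on each field and conditional independence of fields within the match and non-match classes, and it uses a plain two-class EM loop. The function names, the synthetic pattern counts, the starting values, and the thresholds are all hypothetical choices made for illustration.

```python
import math

EPS = 1e-6  # keeps probabilities away from 0 and 1 so logarithms stay finite

# Synthetic agreement patterns over three fields (1 = agree, 0 = disagree)
# and how many record pairs show each pattern. Illustrative data only.
PATTERNS = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
            (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
COUNTS = [90, 8, 6, 5, 60, 50, 40, 741]


def pattern_likelihood(pattern, probs):
    """P(pattern | class), assuming fields agree independently within a class."""
    out = 1.0
    for agrees, prob in zip(pattern, probs):
        out *= prob if agrees else 1.0 - prob
    return out


def clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def em_estimate(patterns, counts, n_iter=200, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate p = P(match), m[i] = P(agree on i | match) and
    u[i] = P(agree on i | non-match) from unlabeled pattern counts."""
    k = len(patterns[0])
    m, u = [m_init] * k, [u_init] * k
    total = float(sum(counts))
    for _ in range(n_iter):
        # E-step: posterior probability that each pattern comes from a match.
        g = []
        for pat in patterns:
            pm = p * pattern_likelihood(pat, m)
            pu = (1.0 - p) * pattern_likelihood(pat, u)
            g.append(pm / (pm + pu))
        # M-step: re-estimate the parameters from the expected class counts.
        match_mass = sum(c * gi for c, gi in zip(counts, g))
        nonmatch_mass = total - match_mass
        for i in range(k):
            agree_m = sum(c * gi * pat[i]
                          for pat, c, gi in zip(patterns, counts, g))
            agree_u = sum(c * (1.0 - gi) * pat[i]
                          for pat, c, gi in zip(patterns, counts, g))
            m[i] = clamp(agree_m / match_mass)
            u[i] = clamp(agree_u / nonmatch_mass)
        p = clamp(match_mass / total)
    return p, m, u


def match_weight(pattern, m, u):
    """Log2 likelihood ratio of the pattern under match versus non-match."""
    w = 0.0
    for agrees, mi, ui in zip(pattern, m, u):
        w += math.log2(mi / ui) if agrees else math.log2((1.0 - mi) / (1.0 - ui))
    return w


def classify(weight, lower, upper):
    """Three-way decision: link, non-link, or possible link (clerical review)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    p, m, u = em_estimate(PATTERNS, COUNTS)
    for pat in PATTERNS:
        w = match_weight(pat, m, u)
        # The thresholds -2.0 and 3.0 are arbitrary placeholders.
        print(pat, round(w, 2), classify(w, lower=-2.0, upper=3.0))
```

In this sketch the thresholds are fixed by hand; in the Fellegi-Sunter framework they would be chosen to meet bounds on the false-link and false-non-link error rates. The conditional-independence assumption is also a simplification that may not hold on real files.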







