Matching Algorithms within a Duplicate Detection System (2000)
| Venue: | Bulletin of the Technical Committee on Data Engineering |
| Citations: | 16 - 0 self |
BibTeX
@ARTICLE{Monge00matchingalgorithms,
author = {Alvaro E. Monge},
title = {Matching Algorithms within a Duplicate Detection System},
journal = {Bulletin of the Technical Committee on Data Engineering},
year = {2000},
volume = {23},
pages = {2000}
}
Abstract
Detecting database records that are approximate duplicates, but not exact duplicates, is an important task. Databases may contain duplicate records concerning the same real-world entity because of data entry errors, unstandardized abbreviations, or differences in the detailed schemas of records from multiple databases – such as what happens in data warehousing where records from multiple data sources are integrated into a single source of information – among other reasons. In this paper we review a system to detect approximate duplicate records in a database and provide properties that a pair-wise record matching algorithm must have in order to have a successful duplicate detection system.







