Duplicate Record Detection: A Survey (2007)

by Ahmed K. Elmagarmid , Panagiotis G. Ipeirotis , Vassilios S. Verykios
Citations: 448 (11 self)

BibTeX

@MISC{Elmagarmid07duplicaterecord,
    author = {Ahmed K. Elmagarmid and Panagiotis G. Ipeirotis and Vassilios S. Verykios},
    title = {Duplicate Record Detection: A Survey},
    year = {2007}
}

Abstract

Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area.

Keyphrases

duplicate record detection, duplicate record, duplicate detection, fuzzy duplicate detection, approximate duplicate detection algorithm, data deduplication, record linkage, entity resolution, entity matching, instance identification, identity uncertainty, name matching, database hardening, data cleaning, data integration, similarity metric, similar field entry, common key, transcription error, incomplete information, standard format
