Results 1 - 10 of 5,799

Data Cleaning: Problems and Current Approaches

by Erhard Rahm, Hong Hai Do - IEEE Data Engineering Bulletin, 2000
"... We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouse ..."
Cited by 279 (8 self)

An Extensible Framework for Data Cleaning

by Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon - In ICDE, 2000
"... Data integration solutions dealing with large amounts of data have been strongly required in the last few years. Besides the traditional data integration problems (e.g., schema integration, local-to-global schema mappings), three additional data problems have to be dealt with: (1) the absence of universal keys across different databases, which is known as the object identity problem, (2) the existence of keyboard errors in the data, and (3) the presence of inconsistencies in data coming from multiple sources. Dealing with these problems is collectively called the data cleaning process. In this work, we ..."
Cited by 74 (0 self)
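The abstract names two concrete problems: sources that share no universal key (object identity) and keyboard errors in the data. As a minimal sketch of how the two combine into approximate duplicate detection, the Python below pairs records from two sources by string similarity alone. It assumes the standard-library `difflib.SequenceMatcher` as the similarity measure and an arbitrary 0.85 threshold; the record IDs, names, and the `match_without_keys` helper are hypothetical illustrations, not the extensible framework the paper proposes.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after lowercasing and trimming whitespace.

    SequenceMatcher.ratio() stays high when a few characters are
    transposed or mistyped, which is the "keyboard error" case.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_without_keys(source_a: dict, source_b: dict, threshold: float = 0.85) -> list:
    """Hypothetical helper: pair records from two sources lacking a shared key.

    source_a and source_b map each source's local record ID to a name.
    Every pair is compared (an O(n*m) scan); pairs whose similarity meets
    the threshold (0.85 is an arbitrary assumption) are returned as
    (id_a, id_b, score) tuples with the score rounded to two decimals.
    """
    matches = []
    for id_a, name_a in source_a.items():
        for id_b, name_b in source_b.items():
            score = similarity(name_a, name_b)
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 2)))
    return matches


if __name__ == "__main__":
    # Hypothetical records: "Shsaha" simulates a transposition typo.
    source_a = {"a1": "Dennis Shasha", "a2": "Eric Simon"}
    source_b = {"b7": "Dennis Shsaha", "b9": "Helena Galhardas"}
    print(match_without_keys(source_a, source_b))
```

With these inputs the transposed letters in `"Shsaha"` still score about 0.92, so the call returns `[("a1", "b7", 0.92)]`. The all-pairs scan is quadratic; practical systems usually narrow the candidate pairs first, for example by blocking on a shared attribute.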

Special Issue on Data Cleaning

by K. Bharat, A. Broder, J. Dean, M. R. Henzinger, V. Borkar, K. Deshmukh, S. Sarawagi, C. Knoblock, K. Lerman, S. Minton, I. Muslea, P. Vassiliadis, Z. Vagena, S. Skiadopoulos, N. Karayannidis, T. Sellis, David B. Lomet, Luis Gravano, Alon Levy, Sunita Sarawagi, Gerhard Weikum, 2000
"... We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehous ..."

Data Cleaning Methods

by William Winkler, 2003
"... Data Cleaning methods are used for finding duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage. The Fellegi-Sunter model provides an optimal theoretical classification rule. Fellegi and Sunter introduced methods for au ..."
Cited by 20 (1 self)
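To make the Fellegi-Sunter decision rule concrete, here is a rough Python sketch of the textbook formulation, not code from Winkler's overview. Each compared field has an m-probability (it agrees given a true match) and a u-probability (it agrees given a non-match); agreement adds log2(m/u) to the pair's weight and disagreement adds log2((1 - m)/(1 - u)). Summing per-field weights assumes the fields agree independently. The pair is then linked, held for clerical review, or rejected depending on two thresholds. All field names, probabilities, and thresholds below are invented for illustration.

```python
import math

# Invented per-field parameters (not from the source): (m, u) where
# m = P(field agrees | true match), u = P(field agrees | non-match).
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.85, 0.10),
}


def match_weight(agreements: dict) -> float:
    """Sum of base-2 log-likelihood-ratio weights over the compared fields.

    agreements maps a field name to True (values agree) or False (they differ).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 6.0, lower: float = 0.0) -> str:
    """Three-way rule; the two thresholds here are arbitrary assumptions."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # held for clerical review


if __name__ == "__main__":
    pair = {"last_name": True, "first_name": False, "birth_year": False}
    w = match_weight(pair)
    print(round(w, 2), classify(w))
```

A pair agreeing on all three fields scores about 13.8 and is linked; one agreeing only on `last_name` scores about 0.7 and falls in the review band; one agreeing on nothing scores about -10.1 and is rejected. In the original model the two thresholds follow from the tolerated false-match and false-non-match rates rather than being fixed by hand.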

An Interactive Framework for Data Cleaning

by Vijayshankar Raman, Joseph M. Hellerstein, 2000
"... Cleaning organizational data of discrepancies in structure and content is important for data warehousing and Enterprise Data Integration (EDI). Current commercial solutions for data cleaning involve many iterations of time-consuming "data quality" analysis to find errors, and long-runnin ..."
Cited by 4 (0 self)

Email Data Cleaning

by Jie Tang, Hang Li, Yunbo Cao, Zhaohui Tang - In 5th International Conference on Knowledge and Data Discovery KDD’05, 2005
"... Addressed in this paper is the issue of ‘email data cleaning’ for text mining. Many text mining applications need to take emails as input. Email data is usually noisy, and thus it is necessary to clean it before mining. Several products offer email cleaning features; however, the types of noises that c ..."
Cited by 22 (4 self)

The Impact of Data Cleaning on Internal Validity

by Aidan Wilcox
"... Any number you want: the impact of data cleaning on internal validity ..."

Declarative Data Cleaning: Language, Model, and Algorithms

by Helena Galhardas, Daniela Florescu, Dennis Shasha - In VLDB, 2001
"... The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. This holds regardless of the application: relational database joining, web-related, or scientific. In all cases, ex ..."
Cited by 125 (6 self)

Continuous Data Cleaning

by Maksims Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller
"... In declarative data cleaning, data semantics are encoded as constraints, and errors arise when the data violates the constraints. Various forms of statistical and logical inference can be used to reason about and repair inconsistencies (errors) in data. Recently, unified approaches that repa ..."
Cited by 7 (2 self)
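Under the constraint view this abstract describes, an error is a set of tuples that jointly violate a rule. A minimal Python sketch, assuming a functional dependency (FD) as the constraint class and a hypothetical `zip -> city` rule, detects violations by grouping rows on the left-hand side and flagging groups that carry more than one right-hand-side value. It only detects errors; it does not implement the paper's continuous or unified repair approach, and `fd_violations` is a hypothetical name.

```python
from collections import defaultdict


def fd_violations(rows: list, lhs: list, rhs: str) -> dict:
    """Hypothetical helper: find groups of rows violating the FD lhs -> rhs.

    Rows that agree on every attribute in lhs must share one rhs value.
    Returns {lhs_values_tuple: [row indices]} for each violating group.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    violations = {}
    for key, idxs in groups.items():
        if len({rows[i][rhs] for i in idxs}) > 1:
            violations[key] = idxs
    return violations


if __name__ == "__main__":
    rows = [
        {"zip": "10001", "city": "New York"},
        {"zip": "10001", "city": "Nwe York"},  # hypothetical typo
        {"zip": "60601", "city": "Chicago"},
    ]
    print(fd_violations(rows, ["zip"], "city"))
```

Here `fd_violations(rows, ["zip"], "city")` returns `{("10001",): [0, 1]}`: the constraint does not say which of the two cities is wrong, only that they cannot both be right, which is why repair needs further reasoning.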

The LLUNATIC Data-Cleaning Framework

by Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro
"... Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists of making a database consistent with respect to a set of given constraints. In recent years, repairing methods have been proposed for several classes of constraints. However, these methods r ..."
Cited by 10 (3 self)
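Repair, as the LLUNATIC abstract frames it, means changing the database until it satisfies the given constraints. The Python below is a deliberately naive illustration for a single functional dependency, not the LLUNATIC algorithm: within each group of rows that agree on the left-hand side, it overwrites the right-hand side with the group's most frequent value. The `repair_fd` name and the majority-vote policy are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def repair_fd(rows: list, lhs: list, rhs: str) -> list:
    """Hypothetical helper: return a copy of rows that satisfies lhs -> rhs.

    Within each lhs group, every rhs value is replaced by the group's most
    frequent rhs value. Counter.most_common orders equal counts by first
    occurrence, so ties go to the value seen first. The input is not modified.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    repaired = [dict(row) for row in rows]
    for idxs in groups.values():
        winner, _count = Counter(rows[i][rhs] for i in idxs).most_common(1)[0]
        for i in idxs:
            repaired[i][rhs] = winner
    return repaired


if __name__ == "__main__":
    rows = [
        {"zip": "10001", "city": "New York"},
        {"zip": "10001", "city": "Nwe York"},  # hypothetical typo
        {"zip": "10001", "city": "New York"},
        {"zip": "60601", "city": "Chicago"},
    ]
    for row in repair_fd(rows, ["zip"], "city"):
        print(row)
```

If only right-hand-side cells may change, majority vote changes the fewest cells for one FD; with several interacting constraints, fixing one rule this way can introduce violations of another.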

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University