Results 1–10 of 53
Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 2006
Cited by 83 (9 self)
Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages of general probabilistic database models and we introduce a new such model, probabilistic c-tables, that is shown to be complete and closed under the relational algebra.
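The c-table model mentioned above can be sketched in a few lines: each tuple may contain variables and carries a local condition, and each valuation of the variables yields one possible world. This is a loose illustration over a finite toy domain (c-tables also handle infinite domains); the relation contents and variable names are invented for the example.

```python
from itertools import product

DOMAIN = [1, 2]          # toy finite domain; real c-tables allow infinite ones
VARS = ["x", "y"]        # the variables appearing in the c-table

# A c-table: (tuple possibly containing variables, condition over a valuation).
# For this sketch, any field matching a name in VARS is read as a variable.
ctable = [
    (("a", "x"), lambda v: True),                # always present, x free
    (("b", 1),   lambda v: v["x"] == v["y"]),    # present only when x = y
]

def instantiate(t, v):
    """Replace variables in tuple t by their values under valuation v."""
    return tuple(v.get(field, field) for field in t)

def possible_worlds(ctable):
    """Enumerate the set of relations (worlds) the c-table represents."""
    worlds = set()
    for values in product(DOMAIN, repeat=len(VARS)):
        v = dict(zip(VARS, values))
        world = frozenset(instantiate(t, v) for t, cond in ctable if cond(v))
        worlds.add(world)
    return worlds

worlds = possible_worlds(ctable)
```

With the two variables ranging over {1, 2}, the four valuations collapse to four distinct worlds; the tuple ("b", 1) appears only in the worlds where x = y.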
Dependencies revisited for improving data quality, 2008
Cited by 69 (4 self)
Abstract. Dependency theory is almost as old as relational databases themselves, and has traditionally been used to improve the quality of schema, among other things. Recently there has been renewed interest in dependencies for improving the quality of data. The increasing demand for data quality technology has also motivated revisions of classical dependencies, to capture more inconsistencies in real-life data, and to match, repair and query the inconsistent data. This paper aims to provide an overview of recent advances in revising classical dependencies for improving data quality.
World-set decompositions: Expressiveness and efficient algorithms. In Proc. ICDT, 2007
Cited by 38 (12 self)
Abstract. Uncertain information is commonplace in real-world data management scenarios. The ability to represent large sets of possible instances (worlds) while supporting efficient storage and processing is an important challenge in this context. The recent formalism of world-set decompositions (WSDs) provides a space-efficient representation for uncertain data that also supports scalable processing. WSDs are complete for finite world-sets in that they can represent any finite set of possible worlds. For possibly infinite world-sets, we show that a natural generalization of WSDs precisely captures the expressive power of c-tables. We then show that several important problems are efficiently solvable on WSDs while they are NP-hard on c-tables. Finally, we give a polynomial-time algorithm for factorizing WSDs, i.e. an efficient algorithm for minimizing such representations.
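The space efficiency claimed above comes from factorization: a world-set decomposition stores the world set as a product of independent components, so n components with k alternatives each encode k^n worlds using only n·k local tuples. A minimal sketch of that idea, with an invented two-tuple schema (the component layout is illustrative, not the paper's exact formalism):

```python
from itertools import product

# Each component lists alternative field assignments for a subset of fields;
# choices across components are independent, so the encoded world set is
# the Cartesian product of the components.
components = [
    [{"t1.A": 1}, {"t1.A": 2}],                      # alternatives for t1's A
    [{"t2.B": "x"}, {"t2.B": "y"}, {"t2.B": "z"}],   # alternatives for t2's B
]

def worlds(components):
    """Enumerate the worlds encoded by the WSD as merged field assignments."""
    for choice in product(*components):
        world = {}
        for local in choice:
            world.update(local)   # components touch disjoint fields
        yield world

all_worlds = list(worlds(components))
```

Here 2 + 3 stored alternatives encode 2 × 3 = 6 worlds; the minimization algorithm mentioned in the abstract pushes in the opposite direction, splitting components apart wherever the world set factorizes.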
From Complete to Incomplete Information and Back. In Proc. SIGMOD
Cited by 37 (11 self)
Incomplete information arises naturally in numerous data management applications. Recently, several researchers have studied query processing in the context of incomplete information. Most work has combined the syntax of a traditional query language like relational algebra with a non-standard semantics such as certain or ranked possible answers. There are now also languages with special features to deal with uncertainty. However, to the standards of the data management community, to date no language proposal has been made that can be considered a natural analog to SQL or relational algebra for the case of incomplete information. In this paper we propose such a language, World-set Algebra, which satisfies the robustness criteria and analogies to relational algebra that we expect. The language supports the contemplation of alternatives and can thus map from a complete database to an incomplete one comprising several possible worlds. We show that World-set Algebra is conservative over relational algebra in the sense that any query that maps from a complete database to a complete database (a complete-to-complete query) is equivalent to a relational algebra query. Moreover, we give an efficient algorithm for effecting this translation. We then study algebraic query optimization of such queries. We argue that query languages with explicit constructs for handling uncertainty allow for the more natural and simple expression of many real-world decision support queries. The results of this paper not only suggest a language for specifying queries in this way, but also allow for their efficient evaluation in any relational database management system.
On approximating optimum repairs for functional dependency violations
Cited by 27 (0 self)
We study the problem of repairing an inconsistent database that violates a set of functional dependencies by making the smallest possible value modifications. For an inconsistent database, we define an optimum repair as a database that satisfies the functional dependencies, and minimizes, among all repairs, a distance measure that depends on the number of corrections made in the database and the weights of the tuples modified. We show that, like other versions of the repair problem, checking the existence of a repair within a certain distance of a database is NP-complete. We also show that finding a constant-factor approximation for the optimum repair for any set of functional dependencies is NP-hard. Furthermore, there is a small constant and a set of functional dependencies for which approximating the optimum repair within that constant factor is also NP-hard. We then present an approximation algorithm that, for a fixed set of functional dependencies and an arbitrary inconsistent input database, produces a repair whose distance to the database is within a constant factor of the optimum repair distance. We finally show how the approximation algorithm can be used in data cleaning with a recent extension of functional dependencies, called conditional functional dependencies.
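To make the setting concrete: for a single unweighted functional dependency A → B, a value-modification repair of minimum distance can be built per A-group by keeping the majority B value and rewriting the rest. This toy sketch (invented data; it is not the paper's approximation algorithm, which must handle arbitrary FD sets where greedy per-group fixes can conflict) shows what "repair" and "distance" mean here:

```python
from collections import Counter

def repair_single_fd(rows):
    """rows: list of (a, b) pairs under the FD a -> b.
    Returns (repaired rows, number of cells changed)."""
    groups = {}
    for a, b in rows:
        groups.setdefault(a, []).append(b)
    repaired, changes = [], 0
    for a, bs in groups.items():
        target, _ = Counter(bs).most_common(1)[0]   # keep the majority B value
        for b in bs:
            repaired.append((a, target))
            changes += (b != target)                # one changed cell per rewrite
    return repaired, changes

# FD: area code -> city; the third tuple violates it.
rows = [("212", "NYC"), ("212", "NYC"), ("212", "Boston")]
fixed, dist = repair_single_fd(rows)
```

The repaired relation satisfies the FD and differs from the input in exactly one cell, which is the optimum here; the hardness results above say that achieving any fixed approximation of this optimum for general FD sets is already NP-hard.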
Coherent Integration of Databases by Abductive Logic Programming. Journal of Artificial Intelligence Research, 2004
Cited by 17 (5 self)
We introduce an abductive method for a coherent integration of independent data sources.
Repair Localization for Query Answering from Inconsistent Databases. ACM Transactions on Database Systems (TODS), 2008
Cited by 16 (2 self)
Query answering from inconsistent databases amounts to finding “meaningful” answers to queries posed over database instances that do not satisfy the integrity constraints specified over their schema. A declarative approach to this problem relies on the notion of repair, that is, a database that satisfies the integrity constraints and is obtained from the original inconsistent database by “minimally” adding and/or deleting tuples. Consistent answers to a user query are those answers that are in the evaluation of the query over each repair. Motivated by the fact that computing consistent answers from inconsistent databases is in general intractable, the present paper investigates techniques that localize the difficult part of the computation on a small fragment of the database at hand, called the “affected” part. Based on a number of localization results, an approach to query answering from inconsistent data is presented, in which the query is evaluated over each of the repairs of the affected part only, augmented with the part that is not affected. The single query results are then suitably recombined. For some relevant settings, techniques are also discussed to factorize repairs into components that can be processed independently of one another, thereby guaranteeing an exponential gain w.r.t. the basic approach, which is not based on localization. The effectiveness of the results is demonstrated for consistent query …
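The repair/consistent-answer semantics described above can be shown on a toy key constraint (invented data; this brute force enumerates every repair, which is exactly the exponential cost that the paper's localization technique confines to the small affected part):

```python
from itertools import product

def repairs(rows):
    """Under a key on the first attribute, each repair keeps exactly one
    tuple per key group; minimality here means deleting as little as possible."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    for choice in product(*groups.values()):
        yield set(choice)

def consistent_answers(rows, query):
    """A consistent answer must appear in the query result on EVERY repair."""
    answers = None
    for rep in repairs(rows):
        result = query(rep)
        answers = result if answers is None else answers & result
    return answers

# Key violation: p1 has two conflicting colors; p2 is unaffected.
rows = [("p1", "red"), ("p1", "blue"), ("p2", "red")]
colors = consistent_answers(rows, lambda rep: {c for _, c in rep})
```

Only "red" survives the intersection: it is reported in both repairs (thanks to the clean tuple for p2), while "blue" appears in just one of them.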
Querying and repairing inconsistent numerical databases. In Proc., 2010
Cited by 14 (0 self)
The problem of extracting consistent information from relational databases violating integrity constraints on numerical data is addressed. In particular, aggregate constraints defined as linear inequalities on aggregate-sum queries over the input data are considered. The notion of repair as a consistent set of updates at the attribute-value level is exploited, and a characterization of several data-complexity issues related to repairing data and computing consistent query answers is provided. Moreover, a method for computing “reasonable” repairs of inconsistent numerical databases is provided for a restricted but expressive class of aggregate constraints. Several experiments are presented, which assess the effectiveness of the proposed approach in real-life application scenarios.
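A minimal sketch of the kind of constraint and repair involved, assuming a balance-sheet-style example (invented data and constraint; the repair shown is just one consistent set of attribute-value updates, not necessarily the "reasonable" repair the paper's method would choose):

```python
def violates(rows):
    """rows: list of (kind, amount). Aggregate constraint, as a linear
    (in)equality over sum queries: SUM(detail) - SUM(total) = 0."""
    detail = sum(a for k, a in rows if k == "detail")
    total = sum(a for k, a in rows if k == "total")
    return detail - total != 0

def repair_total(rows):
    """Attribute-value repair: update the total cell rather than
    inserting or deleting whole tuples."""
    detail = sum(a for k, a in rows if k == "detail")
    return [(k, detail if k == "total" else a) for k, a in rows]

rows = [("detail", 40), ("detail", 50), ("total", 100)]   # 40 + 50 != 100
fixed = repair_total(rows)
```

Updating the single total cell from 100 to 90 restores consistency with one changed value; alternative repairs could instead adjust the detail amounts, which is where a preference for "reasonable" repairs comes in.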
Representing uncertain data: models, properties, and algorithms. The VLDB Journal, 2009
Cited by 11 (0 self)
In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple and intuitive models, which tend to be incomplete, and complete models, which tend to be non-intuitive and more complex than necessary for many applications. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints. We explore a number of properties and results for these models. We study completeness of the models, as well as closure under relational operations, and we give results relating closure and completeness. We then examine whether different models guarantee unique representations of uncertain data, and for those models that do not, we provide complexity results and algorithms for testing equivalence of representations. The next problem we consider is that of minimizing the size of representation of models, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem …
World-set Decompositions: Expressiveness and Efficient Algorithms, 2008
Cited by 11 (6 self)
Uncertain information is commonplace in real-world data management scenarios. The ability to represent large sets of possible instances (worlds) while supporting efficient storage and processing is an important challenge in this context. The recent formalism of world-set decompositions (WSDs) provides a space-efficient representation for uncertain data that also supports scalable processing. WSDs are complete for finite world-sets in that they can represent any finite set of possible worlds. For possibly infinite world-sets, we show that a natural generalization of WSDs precisely captures the expressive power of c-tables. We then show that several important problems are efficiently solvable on WSDs while they are NP-hard on c-tables. Finally, we give a polynomial-time algorithm for factorizing WSDs, i.e. an efficient algorithm for minimizing such representations.