Results 1  10
of
142
Efficient Query Evaluation on Probabilistic Databases
, 2004
"... We describe a system that supports arbitrarily complex SQL queries with ”uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attentio ..."
Abstract

Cited by 429 (44 self)
 Add to MetaCart
(Show Context)
We describe a system that supports arbitrarily complex SQL queries with ”uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #Pcomplete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a MonteCarlo simulation algorithm.
ULDBs: Databases with uncertainty and lineage
 IN VLDB
, 2006
"... This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, howev ..."
Abstract

Cited by 289 (35 self)
 Add to MetaCart
This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, however many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality—dataminimal and lineageminimal—and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer selfcontained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. Finally, we show how ULDBs enable a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system under development at Stanford.
A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
 ACM Transactions on Information Systems
, 1994
"... We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression ..."
Abstract

Cited by 209 (33 self)
 Add to MetaCart
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always confirm to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates which yields a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
ProbView: A Flexible Probabilistic Database System
 ACM TRANSACTIONS ON DATABASE SYSTEMS
, 1997
"... ... In this article, we characterize, using postulates, whole classes of strategies for conjunction, disjunction, and negation, meaningful from the viewpoint of probability theory. (1) We propose a probabilistic relational data model and a generic probabilistic relational algebra that neatly capture ..."
Abstract

Cited by 198 (16 self)
 Add to MetaCart
... In this article, we characterize, using postulates, whole classes of strategies for conjunction, disjunction, and negation, meaningful from the viewpoint of probability theory. (1) We propose a probabilistic relational data model and a generic probabilistic relational algebra that neatly captures various strategies satisfying the postulates, within a single unified framework. (2) We show that as long as the chosen strategies can be computed in polynomial time, queries in the positive fragment of the probabilistic relational algebra have essentially the same data complexity as classical relational algebra. (3) We establish various containments and equivalences between algebraic expressions, similar in spirit to those in classical algebra. (4) We develop algorithms for maintaining materialized probabilistic views. (5) Based on these ideas, we have developed
Representing and querying correlated tuples in probabilistic databases
 In ICDE
, 2007
"... Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions abo ..."
Abstract

Cited by 136 (11 self)
 Add to MetaCart
(Show Context)
Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible world semantics thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets. 1
Supporting ValidTime Indeterminacy
 ACM Transactions on Database Systems
, 1998
"... In validtime indeterminacy it is known that an event stored in a database did in fact occur, but it is not known exactly when. In this paper we extend the SQL data model and query language to support validtime indeterminacy. We represent the occurrence time of an event with a set of possible insta ..."
Abstract

Cited by 97 (17 self)
 Add to MetaCart
In validtime indeterminacy it is known that an event stored in a database did in fact occur, but it is not known exactly when. In this paper we extend the SQL data model and query language to support validtime indeterminacy. We represent the occurrence time of an event with a set of possible instants, delimiting when the event might have occurred, and a probability distribution over that set. We also describe query language constructs to retrieve information in the presence of indeterminacy. These constructs enable users to specify their credibility in the underlying data and their plausibility in the relationships among that data. A denotational semantics for SQL’s select statement with optional credibility and plausibility constructs is given. We show that this semantics is reliable, in that it never produces incorrect information, is maximal, in that if it were extended to be more informative, the results may not be reliable, and reduces to the previous semantics when there is no indeterminacy. Although the extended data model and query language provide needed modeling capabilities, these extensions appear initially to carry a significant execution cost. A contribution of this paper is to demonstrate that our approach is useful and practical. An efficient representation of validtime indeterminacy and efficient query processing algorithms are provided. The cost of
Models for Incomplete and Probabilistic Information
 IEEE Data Engineering Bulletin
, 2006
"... Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of ctables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probab ..."
Abstract

Cited by 77 (10 self)
 Add to MetaCart
(Show Context)
Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of ctables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages of general probabilistic database models and we introduce a new such model, probabilistic ctables, that is shown to be complete and closed under the relational algebra. 1
Clean answers over dirty databases: A probabilistic approach
 In Proc. ICDE
, 2006
"... The detection of duplicate tuples, corresponding to the same realworld entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on adhoc and often manual solution ..."
Abstract

Cited by 65 (2 self)
 Add to MetaCart
(Show Context)
The detection of duplicate tuples, corresponding to the same realworld entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on adhoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases. 1
Current Approaches to Handling Imperfect Information in Data and Knowledge Bases
, 1996
"... This paper surveys methods for representing and reasoning with imperfect information. It opens with an attempt to classify the different types of imperfection that may pervade data, and a discussion of the sources of such imperfections. The classification is then used as a framework for considering ..."
Abstract

Cited by 62 (1 self)
 Add to MetaCart
(Show Context)
This paper surveys methods for representing and reasoning with imperfect information. It opens with an attempt to classify the different types of imperfection that may pervade data, and a discussion of the sources of such imperfections. The classification is then used as a framework for considering work that explicitly concerns the representation of imperfect information, and related work on how imperfect information may be used as a basis for reasoning. The work that is surveyed is drawn from both the field of databases and the field of artificial intelligence. Both of these areas have long been concerned with the problems caused by imperfect information, and this paper stresses the relationships between the approaches developed in each.
An InformationTheoretic Approach to Normal Forms for Relational and XML Data
 In PODS
, 2003
"... Normalization as a way of producing good database designs is a wellunderstood topic. ..."
Abstract

Cited by 61 (8 self)
 Add to MetaCart
(Show Context)
Normalization as a way of producing good database designs is a wellunderstood topic.