Conditioning Probabilistic Databases
Abstract

Cited by 42 (13 self)
Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases, however, often involve conditioning a database on additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database, which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values. Exact confidence computation is known to be NP-hard, which has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics based on the most successful search techniques from constraint satisfaction, such as the variable elimination rule of the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the proposed algorithms. Our experiments show that our exact algorithms scale well to realistic database sizes and, in some scenarios, can compete with the most efficient previous approximation algorithms.
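To make the conditioning problem concrete, the following minimal Python sketch conditions on evidence by brute-force enumeration of possible worlds. It assumes a tuple-independent prior in which each tuple is an independent Boolean variable, with the result tuple's lineage and the evidence given as Python predicates. The variable names, probabilities, and functions are hypothetical, and this exponential enumeration is not the decomposition-based method of the paper; it only shows the quantity an exact algorithm has to compute.

```python
from itertools import product

# Hypothetical priors: independent Boolean variables, name -> P(name = True).
priors = {"x": 0.6, "y": 0.5, "z": 0.3}

def worlds(priors):
    """Enumerate every truth assignment with its prior probability (2^n worlds)."""
    names = sorted(priors)
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        weight = 1.0
        for name, value in assignment.items():
            weight *= priors[name] if value else 1.0 - priors[name]
        yield assignment, weight

def posterior_confidence(lineage, evidence, priors):
    """P(lineage | evidence) = P(lineage and evidence) / P(evidence)."""
    p_evidence = p_both = 0.0
    for assignment, weight in worlds(priors):
        if evidence(assignment):
            p_evidence += weight
            if lineage(assignment):
                p_both += weight
    if p_evidence == 0.0:
        raise ValueError("evidence has probability zero")
    return p_both / p_evidence

# Hypothetical lineage of a result tuple: (x and y) or z.
def lineage(a):
    return (a["x"] and a["y"]) or a["z"]

# Hypothetical evidence: x or z is known to hold.
def evidence(a):
    return a["x"] or a["z"]

print(posterior_confidence(lineage, evidence, priors))  # about 0.708 (= 0.51 / 0.72)
```

Here the evidence raises the tuple's confidence from its prior of 0.51 to about 0.708. Conditioning on "x or z" also makes x and z dependent, so in general the posterior cannot be represented by simply updating each tuple's probability on its own.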
Probabilistic Data Exchange
 In Proc. ICDT, 2010
Abstract

Cited by 28 (5 self)
The work reported here lays the foundations of data exchange in the presence of probabilistic data. This requires rethinking the very basic concepts of traditional data exchange, such as solution, universal solution, and the certain answers of target queries. We develop a framework for data exchange over probabilistic databases and make a case for its coherence and robustness. This framework applies to arbitrary schema mappings and to finite or countably infinite probability spaces on the source and target instances. After establishing this framework and formulating the key concepts, we study the application of the framework to a concrete and practical setting where probabilistic databases are compactly encoded by means of annotations formulated over random Boolean variables. In this setting, we study the problems of testing for the existence of solutions and universal solutions, materializing such solutions, and evaluating target queries (for unions of conjunctive queries) in both the exact sense and the approximate sense. For each of these problems, we carry out a complexity analysis based on properties of the annotation, in various classes of dependencies. Finally, we show that the framework and results easily and completely generalize to allow not only the data, but also the schema mapping itself, to be probabilistic.
MayBMS: A System for Managing Large Uncertain and Probabilistic Databases
 Managing and Mining Uncertain Data, chapter 6, 2008
Abstract

Cited by 22 (4 self)
MayBMS is a state-of-the-art probabilistic database management system built as an extension of Postgres, an open-source relational database management system. MayBMS follows a principled approach to leveraging the strengths of previous database research to achieve scalability. This article describes the main goals of the project, the design of its query and update language, efficient exact and approximate query processing, and algorithmic and systems aspects.
Approximate Confidence Computation in Probabilistic Databases
Abstract

Cited by 19 (5 self)
This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be used effectively to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without self-joins on tuple-independent probabilistic databases: in this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-the-art approximation techniques by several orders of magnitude.
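The interplay of decompositions and bounds can be illustrated with a small Python sketch. It assumes the lineage is a monotone DNF over independent Boolean variables (as arises for positive queries on tuple-independent databases), uses only two of the three decompositions named above (Shannon expansion and independence partitioning; product factorization is omitted), and bounds an unexpanded subformula from below by its most likely clause and from above by the union bound. Deepening a fixed expansion budget is a crude stand-in for the paper's incremental compilation into decision diagrams, and all function names are hypothetical.

```python
from collections import Counter
from math import prod

# Assumption: a lineage formula is a monotone DNF given as a list of clauses,
# each clause a frozenset of independent Boolean variables; probs[v] = P(v).

def clause_prob(clause, probs):
    return prod(probs[v] for v in clause)

def cheap_bounds(dnf, probs):
    """Most likely clause as lower bound, union bound as upper bound."""
    ps = [clause_prob(c, probs) for c in dnf]
    return max(ps), min(1.0, sum(ps))

def components(dnf):
    """Independence partitioning: group clauses that (transitively) share variables."""
    groups = []  # pairwise variable-disjoint (variables, clauses) pairs
    for clause in dnf:
        merged_vars, merged_clauses, rest = set(clause), [clause], []
        for group_vars, group_clauses in groups:
            if group_vars & merged_vars:
                merged_vars |= group_vars
                merged_clauses += group_clauses
            else:
                rest.append((group_vars, group_clauses))
        groups = rest + [(merged_vars, merged_clauses)]
    return [clauses for _, clauses in groups]

def condition(dnf, var, value):
    """Cofactor of the DNF with var fixed to True or False."""
    if value:
        return [c - {var} for c in dnf]
    return [c for c in dnf if var not in c]

def prob_bounds(dnf, probs, budget):
    """Lower/upper bounds on P(dnf) using at most `budget` nested Shannon expansions."""
    if not dnf:
        return 0.0, 0.0            # empty disjunction is false
    if any(not c for c in dnf):
        return 1.0, 1.0            # an empty clause is true
    parts = components(dnf)
    if len(parts) > 1:
        # Independent parts: P(A or B) = 1 - (1 - P(A)) * (1 - P(B)).
        miss_lo, miss_hi = 1.0, 1.0
        for part in parts:
            lo, hi = prob_bounds(part, probs, budget)
            miss_lo *= 1.0 - lo
            miss_hi *= 1.0 - hi
        return 1.0 - miss_lo, 1.0 - miss_hi
    if len(dnf) == 1:
        p = clause_prob(dnf[0], probs)
        return p, p                # one clause: product of independent variables
    if budget == 0:
        return cheap_bounds(dnf, probs)
    # Shannon expansion on the most frequent variable:
    # P(F) = p * P(F | v = True) + (1 - p) * P(F | v = False).
    var = Counter(v for c in dnf for v in c).most_common(1)[0][0]
    p = probs[var]
    lo1, hi1 = prob_bounds(condition(dnf, var, True), probs, budget - 1)
    lo0, hi0 = prob_bounds(condition(dnf, var, False), probs, budget - 1)
    return p * lo1 + (1 - p) * lo0, p * hi1 + (1 - p) * hi0

def approx_prob(dnf, probs, eps):
    """Deepen the expansion until hi - lo <= 2 * eps; the midpoint is then within eps."""
    budget = 0
    while True:
        lo, hi = prob_bounds(dnf, probs, budget)
        if hi - lo <= 2 * eps:
            return (lo + hi) / 2
        budget += 1

probs = {"x1": 0.5, "x2": 0.5, "y1": 0.5, "y2": 0.5}
lineage = [frozenset({"x1", "y1"}), frozenset({"x1", "y2"}), frozenset({"x2", "y1"})]
print(prob_bounds(lineage, probs, 0), approx_prob(lineage, probs, 0.05))
```

With a budget of 0 the bounds for this formula are (0.25, 0.75); a single level of Shannon expansion already narrows them enough to return an estimate within 0.05 of the exact value 0.5.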
k-Nearest Neighbors in Uncertain Graphs
Abstract

Cited by 19 (3 self)
Complex networks, such as biological, social, and communication networks, often entail uncertainty and can thus be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor (k-NN) queries, that is, to compute the k closest nodes to a given node. In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. To compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques. Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale to graphs with tens of millions of edges.
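The abstract does not define its distance functions, so the following Python sketch is an assumption-laden illustration of the sampling idea only. It treats edges as undirected, unweighted, and independently present with their stated probability, samples possible worlds, and ranks nodes by the median hop distance to the query node across samples, counting an unreachable node as infinitely far. The median is one natural extension of shortest-path distance to probabilistic graphs; the paper's own functions and its pruning techniques may differ, and every name here is hypothetical.

```python
import random
from collections import deque
from statistics import median_low

INF = float("inf")

def sample_world(edges, rng):
    """Keep each undirected edge (u, v, p) independently with probability p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    return adj

def bfs_hops(adj, source):
    """Hop distances from source in one sampled world; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def knn_median(edges, source, k, samples=1000, seed=0):
    """Rank nodes by median sampled hop distance (INF if unreachable); no pruning."""
    rng = random.Random(seed)
    nodes = sorted({n for u, v, _ in edges for n in (u, v)} - {source})
    observed = {n: [] for n in nodes}
    for _ in range(samples):
        dist = bfs_hops(sample_world(edges, rng), source)
        for n in nodes:
            observed[n].append(dist.get(n, INF))
    medians = {n: median_low(d) for n, d in observed.items()}
    return sorted(nodes, key=lambda n: (medians[n], n))[:k], medians

edges = [("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.1), ("a", "d", 0.05)]
neighbors, medians = knn_median(edges, "a", k=2)
print(neighbors, medians)
```

Sampling each world once and running one breadth-first search per world keeps the cost roughly proportional to samples × (V + E); the search-space pruning described in the abstract is deliberately left out.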
The Trichotomy of HAVING Queries on a Probabilistic Database
 VLDBJ
Abstract

Cited by 16 (2 self)
We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING in SQL) on probabilistic databases. More precisely, we study conjunctive queries with predicate aggregates on probabilistic databases where the aggregation function is one of MIN, MAX, EXISTS, COUNT, SUM, AVG, or COUNT(DISTINCT) and the comparison function is one of =, ≠, ≥, >, ≤, or <. The complexity of evaluating a HAVING query depends on the aggregation function, α, and the comparison function, θ. In this paper, we establish a set of trichotomy results for conjunctive queries with HAVING predicates parametrized by (α, θ). For such queries (without self-joins), one of the following three statements is true: (1) The exact evaluation problem has PTIME data complexity. In this case, we call the query safe. (2) The exact evaluation problem is #P-hard, but the approximate evaluation problem has (randomized) PTIME data complexity. More precisely, there exists an FPTRAS for the query. In this case, we call the query apx-safe. (3) The exact evaluation problem is #P-hard, and the approximate evaluation problem is also hard. We call these queries hazardous. The precise definition of each class depends on the aggregate considered and the comparison function. Thus, we have queries that are (MAX, ≥)-safe, (COUNT, ≤)-apx-safe, (SUM, =)-hazardous, etc. Our trichotomy result is a signifi...
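As an illustration of why some (α, θ) pairs admit exact evaluation, consider the simplest hypothetical case: a single tuple-independent relation, no joins, and the test COUNT(*) θ k. The distribution of COUNT(*) is then a Poisson binomial distribution, which a quadratic-time dynamic program computes exactly; any θ then reduces to a sum over that distribution. This Python sketch is not the paper's classification, which concerns conjunctive queries with joins; the function names and the treatment of an empty world as COUNT(*) = 0 are assumptions.

```python
import operator

# Comparison functions theta from the abstract, keyed by ASCII spellings.
THETA = {"=": operator.eq, "!=": operator.ne, ">=": operator.ge,
         ">": operator.gt, "<=": operator.le, "<": operator.lt}

def count_distribution(tuple_probs):
    """dist[c] = P(exactly c of the independent tuples are present), in O(n^2) time."""
    dist = [1.0]
    for p in tuple_probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            nxt[c] += q * (1 - p)   # this tuple is absent
            nxt[c + 1] += q * p     # this tuple is present
        dist = nxt
    return dist

def having_count_prob(tuple_probs, theta, k):
    """P(COUNT(*) theta k); an empty world counts as COUNT(*) = 0 (an assumption)."""
    test = THETA[theta]
    return sum(q for c, q in enumerate(count_distribution(tuple_probs)) if test(c, k))

print(having_count_prob([0.5, 0.5, 0.5], ">=", 2))  # 0.5
```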
Read-Once Functions and Query Evaluation in Probabilistic Databases
Abstract

Cited by 13 (2 self)
Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe plans, hierarchical queries), in that they consider only characteristics of the query and not the data in the database. It is easy to construct examples where a supposedly hard query, run on an appropriate database, gives rise to a tractable query evaluation problem. In this paper, we develop efficient query evaluation techniques that leverage characteristics of both the query and the data in the database. We focus on tuple-independent databases, where the query evaluation problem is equivalent to computing marginal probabilities of Boolean formulas associated with the result tuples. Query evaluation is easy if the Boolean formulas can be factorized into a form in which every variable appears at most once (called read-once); this suggests a naive approach that incorporates previously developed Boolean formula factorization algorithms into query evaluation. We then develop novel, more efficient factorization algorithms that work for a large subclass of queries (specifically, conjunctive queries without self-joins) by exploiting the unique structure of the result tuples' Boolean formulas. We empirically demonstrate that our proposed techniques are (1) orders of magnitude faster than generic inference algorithms when used to evaluate general read-once functions, and (2) for the special case of hierarchical queries, they rival the efficiency of prior techniques specifically designed to handle such queries.
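Why read-once formulas are easy can be shown in a few lines of Python. For the hypothetical lineage x1 y1 ∨ x1 y2 (the kind of formula produced when one tuple x1 joins with two matching tuples y1 and y2), the read-once form x1 ∧ (y1 ∨ y2) has variable-disjoint, hence independent, subformulas, so its probability follows from products and complements in linear time; brute-force enumeration is included as a check. The factorization here is written by hand (finding it automatically is what the paper's algorithms do), and all names and probabilities are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical tuple probabilities in a tuple-independent database.
probs = {"x1": 0.5, "y1": 0.4, "y2": 0.3}

# Lineage as a DNF: x1 y1 or x1 y2 (x1 occurs twice, so this form is not read-once).
dnf = [("x1", "y1"), ("x1", "y2")]

def brute_force(dnf, probs):
    """Sum the weights of all satisfying assignments (exponential in #variables)."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(all(world[v] for v in clause) for clause in dnf):
            total += prod(probs[n] if world[n] else 1 - probs[n] for n in names)
    return total

# Hand-written read-once factorization x1 and (y1 or y2): every variable occurs once.
factorized = ("and", "x1", ("or", "y1", "y2"))

def read_once_prob(node, probs):
    """Linear-time evaluation; valid only because subformulas share no variables."""
    if isinstance(node, str):
        return probs[node]
    op, *children = node
    ps = [read_once_prob(c, probs) for c in children]
    if op == "and":
        return prod(ps)                     # independent conjunction
    return 1 - prod(1 - p for p in ps)      # independent disjunction

print(brute_force(dnf, probs), read_once_prob(factorized, probs))  # both about 0.29
```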
10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information
 VLDBJ
Abstract

Cited by 8 (0 self)
We present a decomposition-based approach to managing probabilistic information. We introduce world-set decompositions (WSDs), a space-efficient and complete representation system for finite sets of worlds. We study the problem of efficiently evaluating relational algebra queries on world-sets represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
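The paper's WSDs decompose at the level of tuple fields; the Python sketch below is a simplified, hypothetical variant meant only to show why a product decomposition is compact. Each component chooses among alternative sets of whole tuples, components are assumed independent, and a world is the union of one chosen alternative per component. The data and function names are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical simplified WSD: independent components, each a list of
# (alternative, probability) pairs whose probabilities sum to 1; an
# alternative is a frozenset of whole tuples.
wsd = [
    [(frozenset({("Smith", "married")}), 0.7), (frozenset({("Smith", "single")}), 0.3)],
    [(frozenset({("Brown", 35)}), 0.4), (frozenset({("Brown", 36)}), 0.6)],
    [(frozenset({("Jones", "NY")}), 1.0)],
]

def world_count(wsd):
    """Number of represented worlds: the product of the component sizes."""
    return prod(len(component) for component in wsd)

def worlds(wsd):
    """Expand the decomposition by choosing one alternative per component."""
    for choice in product(*wsd):
        tuples = frozenset().union(*(alt for alt, _ in choice))
        yield tuples, prod(p for _, p in choice)

def tuple_confidence(wsd, t):
    """P(t appears in the world), computed per component without expanding worlds."""
    return 1 - prod(1 - sum(p for alt, p in comp if t in alt) for comp in wsd)

print(world_count(wsd))  # 4
```

Space grows with the sum of the component sizes while the number of worlds grows with their product: for instance, 10^6 components of 10 alternatives each store 10^7 alternatives but represent 10^(10^6) worlds.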
A Compositional Query Algebra for Second-Order Logic and Uncertain Databases
 In Proc. ICDT, 2009
Abstract

Cited by 7 (4 self)
World-set algebra is a variable-free query language for uncertain databases. It constitutes the core of the query language implemented in MayBMS, an uncertain database system. This paper shows that world-set algebra captures exactly second-order logic over finite structures or, equivalently, the polynomial hierarchy. The proofs also imply that world-set algebra is closed under composition, a previously open problem.
On Probabilistic Fixpoint and Markov Chain Query Languages
Abstract

Cited by 6 (3 self)
We study highly expressive query languages such as datalog, fixpoint, and while-languages on probabilistic databases. We generalize these languages such that computation steps (e.g., datalog rules) can fire probabilistically. We define two possible semantics for such query languages: inflationary semantics, where the results of each computation step are added to the current database, and non-inflationary semantics, where queries induce a random walk between database instances. We then study the complexity of exact and approximate query evaluation under these semantics.