Results 1 - 8 of 8
Read-Once Functions and Query Evaluation in Probabilistic Databases
"... Probabilistic databases hold promise of being a viable means for large-scale uncertainty management, increasingly needed in a number of real world applications domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluat ..."
Abstract
Cited by 8 (1 self)
Probabilistic databases hold promise as a viable means for large-scale uncertainty management, which is increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe plans, hierarchical queries), in that they consider only characteristics of the query and not the data in the database. It is easy to construct examples where a supposedly hard query, run on an appropriate database, gives rise to a tractable query evaluation problem. In this paper, we develop efficient query evaluation techniques that leverage characteristics of both the query and the data in the database. We focus on tuple-independent databases, where the query evaluation problem is equivalent to computing marginal probabilities of Boolean formulas associated with the result tuples. Query evaluation is easy if the Boolean formulas can be factorized into a form in which every variable appears at most once (called read-once); this suggests a naive approach that incorporates previously developed Boolean formula factorization algorithms into query evaluation. We then develop novel, more efficient factorization algorithms that work for a large subclass of queries (specifically, conjunctive queries without self-joins) by exploiting the unique structure of the result-tuple Boolean formulas. We empirically demonstrate that our proposed techniques (1) are orders of magnitude faster than generic inference algorithms when used to evaluate general read-once functions, and (2) rival, for the special case of hierarchical queries, the efficiency of prior techniques specifically designed to handle such queries.
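Why read-once formulas are easy can be sketched in a few lines. Once a lineage formula is read-once, the children of every AND/OR node mention disjoint sets of independent tuple variables, so probabilities combine by products. This is a minimal sketch, assuming the formula has already been factorized (the factorization algorithms from the paper are not shown); the names `Var`, `And`, `Or`, and `prob` are hypothetical. For example, the lineage x1∧y1 ∨ x1∧y2 is not read-once as written (x1 occurs twice) but factorizes into x1∧(y1∨y2).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Var:
    name: str  # a tuple variable of a tuple-independent database


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Var, And, Or]


def prob(node: Node, p: Dict[str, float]) -> float:
    """Marginal probability of a read-once formula over independent variables.

    Each variable occurs at most once, so sibling subformulas share no
    variables and are independent; AND multiplies, OR uses the complement.
    """
    if isinstance(node, Var):
        return p[node.name]
    if isinstance(node, And):
        result = 1.0
        for child in node.children:
            result *= prob(child, p)
        return result
    # Or node: 1 minus the probability that every child is false.
    none_true = 1.0
    for child in node.children:
        none_true *= 1.0 - prob(child, p)
    return 1.0 - none_true


# Factorized lineage x1 AND (y1 OR y2) of a hypothetical result tuple.
phi = And([Var("x1"), Or([Var("y1"), Var("y2")])])
```

With p = {"x1": 0.5, "y1": 0.5, "y2": 0.5}, `prob(phi, p)` evaluates 0.5 · (1 − 0.5 · 0.5) = 0.375 in time linear in the formula size, whereas generic inference over the unfactorized lineage would not exploit this structure.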
Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations
"... Uncertain data streams are increasingly common in real-world deployments and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group by’s) and aggregation operations on uncertain ..."
Abstract
Cited by 2 (0 self)
Uncertain data streams are increasingly common in real-world deployments, and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group-bys) and aggregation operations on uncertain data streams. To characterize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of the result of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework, we design fast data-stream algorithms, both deterministic and randomized, that return approximate distributions with bounded errors as answers to these complex queries. Our experimental results demonstrate the accuracy and efficiency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms.
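To make "conditioning followed by aggregation" concrete, the sketch below handles the simplest case exactly: a selection turns each tuple's uncertain value into a probability of passing the predicate, and COUNT over independent tuples then has a Poisson-binomial distribution computable by dynamic programming. This is an illustration under stated assumptions, not the paper's algorithm: it assumes independent tuples that certainly exist, discrete value distributions, and a `>` predicate; `selection_prob` and `count_distribution` are hypothetical names. The paper instead targets approximate distributions with bounded error for harder aggregates in a streaming setting, where this O(n²) exact table would not scale.

```python
from typing import Dict, List


def selection_prob(value_dist: Dict[int, float], threshold: int) -> float:
    """Conditioning step: P(value > threshold) for one tuple.

    value_dist maps each possible value to its probability.
    """
    return sum(pr for v, pr in value_dist.items() if v > threshold)


def count_distribution(pass_probs: List[float]) -> List[float]:
    """Exact distribution of COUNT(*) over independent tuples.

    pass_probs[i] is the probability that tuple i satisfies the selection.
    Returns dist with dist[k] = P(COUNT = k).
    """
    dist = [1.0]  # before any tuple: COUNT = 0 with probability 1
    for p in pass_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # tuple filtered out
            new[k + 1] += mass * p      # tuple passes and is counted
        dist = new
    return dist


# Three hypothetical tuples with uncertain values; predicate: value > 20.
tuples = [{10: 0.5, 30: 0.5}, {40: 1.0}, {5: 0.8, 50: 0.2}]
pass_probs = [selection_prob(t, 20) for t in tuples]
dist = count_distribution(pass_probs)
```

Here `pass_probs` is [0.5, 1.0, 0.2] and `dist` is approximately [0.0, 0.4, 0.5, 0.1], i.e. a full distribution rather than only the expected count of 1.7.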
Aggregation in Probabilistic Databases via Knowledge Compilation
"... This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of our evaluation technique is a procedure that compiles semimodule and se ..."
Abstract
This paper presents a query evaluation technique for positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The core of our evaluation technique is a procedure that compiles semimodule and semiring expressions into so-called decomposition trees, for which the probability distribution can be computed in time polynomial in the size of the tree and of the distributions represented by its nodes. We give syntactic characterisations of tractable queries with aggregates by exploiting the connection between query tractability and polynomial-time decomposition trees. A prototype of the technique is incorporated in the probabilistic database engine SPROUT. We report on performance experiments with custom datasets and TPC-H data.
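One node type of such a decomposition tree can be sketched as follows: when a SUM semimodule expression splits into terms over independent variables, its distribution is the convolution of the terms' distributions, and each convolution costs time polynomial in the sizes of the two input distributions. This is a minimal sketch, assuming integer-valued aggregates and independent Boolean variables x_i, with each term x_i ⊗ v_i contributing v_i when x_i is true and 0 otherwise; `term_dist`, `convolve`, and `sum_distribution` are hypothetical names and do not describe SPROUT's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Dist = Dict[int, float]  # aggregate value -> probability


def term_dist(p: float, value: int) -> Dist:
    """Distribution of x (x) value: `value` if Boolean x is true (prob p), else 0."""
    d: Dict[int, float] = defaultdict(float)
    d[value] += p
    d[0] += 1.0 - p
    return dict(d)


def convolve(a: Dist, b: Dist) -> Dist:
    """Distribution of A + B for independent A, B (the SUM monoid)."""
    out: Dict[int, float] = defaultdict(float)
    for va, pa in a.items():
        for vb, pb in b.items():
            out[va + vb] += pa * pb
    return dict(out)


def sum_distribution(terms: List[Tuple[float, int]]) -> Dist:
    """SUM over independent terms (p_i, v_i), folded pairwise by convolution."""
    result: Dist = {0: 1.0}
    for p, v in terms:
        result = convolve(result, term_dist(p, v))
    return result
```

For two hypothetical tuples, `sum_distribution([(0.5, 10), (0.4, 20)])` yields approximately {0: 0.3, 10: 0.3, 20: 0.2, 30: 0.2}. The support of the result can grow with every convolution, which is why the bound depends on the sizes of the distributions at the nodes as well as on the tree.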
SUM Query Processing over Probabilistic Data
2011
"... Thème: Représentation et traitement des données et des connaissances apport d e r e c h e r c h e ..."
Abstract
Theme: Representation and processing of data and knowledge. Research report.
CLARO: Modeling and Processing Uncertain Data Streams (VLDB manuscript)
"... the date of receipt and acceptance should be inserted later Abstract Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to mo ..."
Abstract
Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the Claro system, which supports stream processing for uncertain data naturally captured using continuous random variables. Claro employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for relational operators by drawing on statistical theory and approximation. We also consider query planning for complex queries given an accuracy requirement. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements, and that they outperform state-of-the-art sampling methods.
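A glimpse of why continuous random variables can allow closed-form operator evaluation instead of sampling: for a Gaussian attribute, the probability that a selection X > c keeps a tuple is a complementary error function, and the SUM of independent Gaussians is again Gaussian. This sketch assumes each attribute is a single independent Gaussian, a simplification chosen here for illustration; Claro's actual data model may be richer, and the names `Gaussian`, `prob_greater`, and `sum_gaussians` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float  # variance; assumed > 0


def prob_greater(x: Gaussian, c: float) -> float:
    """P(X > c) for X ~ N(mean, var), i.e. the chance a selection X > c keeps the tuple.

    Uses 1 - Phi((c - mean) / sigma) = 0.5 * erfc((c - mean) / (sigma * sqrt(2))).
    """
    z = (c - x.mean) / math.sqrt(2.0 * x.var)
    return 0.5 * math.erfc(z)


def sum_gaussians(xs: Iterable[Gaussian]) -> Gaussian:
    """SUM of independent Gaussians is Gaussian: means add and variances add."""
    mean, var = 0.0, 0.0
    for x in xs:
        mean += x.mean
        var += x.var
    return Gaussian(mean, var)
```

For instance, `prob_greater(Gaussian(10.0, 4.0), 12.0)` is 1 − Φ(1) ≈ 0.1587, and `sum_gaussians([Gaussian(1.0, 2.0), Gaussian(3.0, 4.0)])` is `Gaussian(4.0, 6.0)`. Note that this selection probability only says whether the tuple survives; the surviving value follows a truncated distribution, which is where approximation enters.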
Queries with Difference on Probabilistic Databases
"... We study the feasibility of the exact and approximate computation of the probability of relational queries with difference on tuple-independent databases. We show that even the difference between two “safe ” conjunctive queries without self-joins is “unsafe ” for exact computation. We turn to approx ..."
Abstract
We study the feasibility of the exact and approximate computation of the probability of relational queries with difference on tuple-independent databases. We show that even the difference between two “safe” conjunctive queries without self-joins is “unsafe” for exact computation. We turn to approximation and design an FPRAS (fully polynomial-time randomized approximation scheme) for a large class of relational queries with difference, limited by how difference is nested and by the nature of the subtracted subqueries. We give examples of inapproximable queries outside this class.
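To pin down the semantics being approximated, the probability of a Boolean difference Q1 − Q2 is P(Q1 ∧ ¬Q2) over possible worlds. The sketch below computes it by brute-force enumeration of all 2^n worlds of a tiny tuple-independent database; it is exponential and only illustrative, and it is not the paper's FPRAS. The tuple names, probabilities, and lineages (Q1 = (r1 ∧ s1) ∨ (r2 ∧ s2), Q2 = r1) are hypothetical, and `prob_difference` is a hypothetical helper.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet

World = FrozenSet[str]  # the set of tuples present in one possible world


def world_probability(world: World, probs: Dict[str, float]) -> float:
    """Probability of one world: tuples are independent."""
    p = 1.0
    for t, pt in probs.items():
        p *= pt if t in world else 1.0 - pt
    return p


def prob_difference(q1: Callable[[World], bool],
                    q2: Callable[[World], bool],
                    probs: Dict[str, float]) -> float:
    """P(Q1 AND NOT Q2) by enumerating all 2^n possible worlds (illustration only)."""
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = frozenset(t for t, present in zip(names, bits) if present)
        if q1(world) and not q2(world):
            total += world_probability(world, probs)
    return total


def q1(w: World) -> bool:
    # Hypothetical lineage of Q1: (r1 AND s1) OR (r2 AND s2)
    return ("r1" in w and "s1" in w) or ("r2" in w and "s2" in w)


def q2(w: World) -> bool:
    # Hypothetical lineage of the subtracted query Q2: r1
    return "r1" in w


probs = {"r1": 0.3, "r2": 0.6, "s1": 0.9, "s2": 0.5}
```

Here `prob_difference(q1, q2, probs)` is (1 − 0.3) · 0.6 · 0.5 = 0.21, while P(Q1) · (1 − P(Q2)) = 0.489 · 0.7 ≈ 0.342: because the subtracted query shares tuple r1 with Q1, the two cannot be treated as independent.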
CLARO: Modeling and Processing Uncertain Data Streams (The VLDB Journal, regular paper, DOI 10.1007/s00778-011-0261-7)
"... Abstract Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the Claro s ..."
Abstract
Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the Claro system, which supports stream processing for uncertain data naturally captured using continuous random variables. Claro employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for relational operators by drawing on statistical theory and approximation. We also consider query planning for complex queries given an accuracy requirement. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements, and that they outperform state-of-the-art sampling methods.

