Efficient top-k query evaluation on probabilistic data (extended version) (2006)

by C Ré, N Dalvi, D Suciu

Results 1 - 10 of 182

A Survey of Top-k Query Processing Techniques in Relational Database Systems

by Ihab F. Ilyas, George Beskales, Mohamed A. Soliman
"... Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search and distributed systems has shown a great impact on performance. In this surve ..."
Abstract - Cited by 167 (6 self) - Add to MetaCart
Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in the XML domain, and show their connections to relational approaches.
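One family of techniques such a survey covers is sorted-access processing under a monotone scoring function, as in Fagin-style threshold algorithms. A minimal sketch, assuming one descending list per attribute, a SUM scoring function, and that every object appears in every list; the function and variable names are illustrative and not taken from the survey:

```python
import heapq

def threshold_algorithm(lists, k):
    """Top-k under a monotone SUM scoring function (minimal TA sketch).

    lists: one list per attribute of (object_id, score) pairs, each sorted
    by score descending; every object is assumed to appear in every list.
    """
    index = [dict(lst) for lst in lists]   # random access by object id
    totals = {}                            # completely scored objects
    for depth in range(len(lists[0])):
        threshold = 0.0
        for lst in lists:
            obj, score = lst[depth]        # sorted access
            threshold += score
            if obj not in totals:          # random access to the other lists
                totals[obj] = sum(idx[obj] for idx in index)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # the threshold upper-bounds the score of any object not yet seen,
        # so we can stop once k seen objects already reach it
        if len(top) >= k and top[k - 1][1] >= threshold:
            return top
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
```

The stopping test compares the k-th best complete score against the sum of the last scores seen under sorted access, which bounds the score of any object not yet encountered.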

Top-k query processing in uncertain databases

by Mohamed A. Soliman, Ihab F. Ilyas - In ICDE, 2007
"... Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on ..."
Abstract - Cited by 125 (9 self) - Add to MetaCart
Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on “marriage” of traditional top-k semantics and possible worlds semantics. In the light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number of accessed tuples and materialized search states. Our experiments show the efficiency of our techniques under different data distributions with orders of magnitude improvement over naïve materialization of possible worlds.
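The naive baseline the authors report orders-of-magnitude improvements over can be written down directly for the common case of independent tuples: enumerate every possible world, rank each world by score, and aggregate. A minimal sketch of U-Topk-style semantics under that independence assumption; this is exponential, purely illustrative, and not the authors' algorithm:

```python
from itertools import product
from collections import defaultdict

def utopk_naive(tuples, k):
    """Naive possible-worlds evaluation of a U-Topk-style top-k query.

    tuples: list of (tuple_id, score, probability), tuples independent.
    Returns the answer vector (up to k tuple ids, highest score first)
    with the largest total probability of being exactly the top-k of a
    randomly drawn world. Enumerates all 2^n worlds.
    """
    answer_prob = defaultdict(float)
    for world in product([False, True], repeat=len(tuples)):
        prob = 1.0
        present = []
        for (tid, score, p), included in zip(tuples, world):
            prob *= p if included else 1.0 - p
            if included:
                present.append((score, tid))
        answer = tuple(tid for _, tid in sorted(present, reverse=True)[:k])
        answer_prob[answer] += prob
    return max(answer_prob.items(), key=lambda kv: kv[1])
```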

Citation Context

... infeasibility of applying traditional top–k techniques in this setting. In parallel to this work, the problem of finding the k most probable query answers in probabilistic databases was addressed in [19]. An approach was presented to generate the top–k probable query answers using Monte-Carlo simulation, where computing the exact probability of an answer is relaxed in favor of computing the correct a...
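A simplified version of the simulation idea described in this context: estimate each candidate answer's probability by sampling possible worlds and return the k highest estimates. The sketch below is illustrative only; the actual algorithm in [19] interleaves simulations and prunes with confidence intervals, which is omitted here, and all names and parameters are assumptions:

```python
import random

def monte_carlo_topk(candidates, lineage, var_prob, k,
                     samples=10_000, rng=random):
    """Estimate each candidate answer's probability by sampling worlds,
    then return the k answers with the highest estimates (illustrative;
    not the interval-based multisimulation of [19]).

    candidates: answer ids. lineage[a] is a predicate that, given a world
    (dict variable -> bool), says whether answer a holds in it. var_prob
    maps each independent Boolean variable to its marginal probability.
    """
    hits = {a: 0 for a in candidates}
    for _ in range(samples):
        world = {v: rng.random() < p for v, p in var_prob.items()}
        for a in candidates:
            if lineage[a](world):
                hits[a] += 1
    estimate = {a: hits[a] / samples for a in candidates}
    return sorted(candidates, key=estimate.get, reverse=True)[:k]
```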

MCDB: a Monte Carlo approach to managing uncertain data

by Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher Jermaine, Peter J. Haas, 2008
"... To deal with data uncertainty, existing probabilistic database sys-tems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system’s ability to gracefully handle complex or unforese ..."
Abstract - Cited by 110 (3 self) - Add to MetaCart
To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system’s ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via “VG functions,” which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over “parameter tables” that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over “tuple bundles” instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
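Outside the engine, the Monte Carlo recipe described above can be caricatured in a few lines: realize uncertain attributes with a VG-style generator parameterized from stored values, run the deterministic query on each realized instance, and summarize the distribution of the results. All names below are illustrative assumptions, and the tuple-bundle execution that makes MCDB efficient is not attempted:

```python
import random
import statistics

def vg_sales(params, rng):
    """Illustrative VG-style function: pseudorandomly realize uncertain
    sales figures from per-customer (mean, sd) parameters, standing in
    for values read from a parameter table."""
    return {cust: rng.gauss(mean, sd) for cust, (mean, sd) in params.items()}

def run_query(realized):
    """The deterministic query evaluated on one realized instance:
    here simply the total sales over all customers."""
    return sum(realized.values())

def monte_carlo_query(params, n_worlds=1000, seed=0):
    """Estimate the distribution of the query result over possible worlds."""
    rng = random.Random(seed)
    results = sorted(run_query(vg_sales(params, rng)) for _ in range(n_worlds))
    # any functional of the query-result distribution can now be estimated
    return {"mean": statistics.mean(results),
            "stdev": statistics.stdev(results),
            "p90": results[int(0.9 * (n_worlds - 1))]}
```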

Fast and Simple Relational Processing of Uncertain Data

by Lyublena Antova, Thomas Jansen, Christoph Koch, Dan Olteanu
"... Abstract — This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answer ..."
Abstract - Cited by 91 (4 self) - Add to MetaCart
This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relational representation. The translation scheme essentially preserves the size of the query in terms of number of operations and, in particular, number of joins. Standard techniques employed in off-the-shelf relational database management systems are effective for optimizing and processing queries on U-relations. In our experiments we show that query evaluation on U-relations scales to large amounts of data with high degrees of uncertainty.
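A much-reduced sketch of the representation idea, assuming tuple-level descriptors rather than the paper's per-attribute vertical partitioning: each tuple carries a world-set descriptor (a partial assignment of world variables), and a positive-algebra join simply adds a descriptor-consistency check. The names and encoding are illustrative assumptions, not the paper's schema:

```python
def consistent(d1, d2):
    """Two world-set descriptors (dicts variable -> value) are consistent
    iff they agree on every variable they share."""
    return all(d2.get(var, val) == val for var, val in d1.items())

def possible_join(u_r, u_s, on):
    """Possible answers of R joined with S over descriptor-annotated tuples.

    u_r, u_s: lists of (descriptor, tuple_dict); a tuple belongs to exactly
    the worlds whose variable assignments extend its descriptor. The join
    keeps a pair only when the two descriptors can hold simultaneously.
    """
    out = []
    for d1, r in u_r:
        for d2, s in u_s:
            if r[on] == s[on] and consistent(d1, d2):
                out.append(({**d1, **d2}, {**r, **s}))
    return out
```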

Citation Context

...mula ⋁_{1≤i≤k} (x^i_1 = v^i_1 ∧ ... ∧ x^i_{m_i} = v^i_{m_i}). Recent work considered efficient solutions for restricted classes of queries and probabilistic databases [10] or by applying approximation techniques [14]. Scalable confidence computation is out of the scope of this paper. Our current approach for exact confidence computation exploits the independence and variable sharing among ws-descriptors and is by...

Conditioning Probabilistic Databases

by Christoph Koch, Dan Olteanu
"... Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases however often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus ..."
Abstract - Cited by 65 (13 self) - Add to MetaCart
Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases however often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values. It is known that exact confidence computation is an NP-hard problem. This has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction, such as the variable elimination rule of the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the algorithms proposed. Our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms.
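Exact confidence computation of the kind discussed here amounts to computing the probability of a Boolean lineage formula over independent variables. A minimal sketch using plain Shannon expansion (Davis-Putnam-style splitting on a variable); the decomposition methods and heuristics that make the paper's algorithms scale are omitted, so the worst case stays exponential, and the interface is an assumption:

```python
def dnf_probability(dnf, prob):
    """Exact probability of a DNF over independent Boolean variables,
    by Shannon expansion (Davis-Putnam-style splitting on a variable).

    dnf: iterable of clauses; each clause is a set of (var, value)
    literals with value in {True, False} and no contradictory literals.
    prob: var -> P(var = True).
    """
    def assign(clauses, var, value):
        """Residual DNF after setting var := value; None means 'true'."""
        out = set()
        for clause in clauses:
            if (var, value) in clause:
                rest = frozenset(l for l in clause if l[0] != var)
                if not rest:
                    return None          # a clause became fully satisfied
                out.add(rest)
            elif (var, not value) in clause:
                continue                 # clause falsified, drop it
            else:
                out.add(clause)
        return frozenset(out)

    def solve(clauses):
        if clauses is None:
            return 1.0                   # formula already true
        if not clauses:
            return 0.0                   # no clause can still be satisfied
        first_clause = next(iter(clauses))
        var = next(iter(first_clause))[0]
        p = prob[var]
        return (p * solve(assign(clauses, var, True))
                + (1.0 - p) * solve(assign(clauses, var, False)))

    return solve(frozenset(frozenset(c) for c in dnf))
```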

Citation Context

...SSN) = 1; we should get three tuples in the result. Monte Carlo simulation-based approximation algorithms will do very badly on such queries. Confidence approximation using a Karp-Luby-style algorithm [17, 9, 21] will independently underestimate each tuple’s confidence with probability ≈.5. Thus the probability that at least one tuple is missing from the result of such a query is very high (see also [19]). In ...
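For reference, the Karp-Luby scheme mentioned in this context estimates the probability of a DNF over independent variables by sampling a clause proportionally to its weight, sampling a world conditioned on that clause holding, and counting the sample only when the chosen clause is the first one the world satisfies. A minimal sketch; the names and the sampling budget are illustrative, and this is not the exact variant analyzed in [17, 9, 21]:

```python
import random

def clause_prob(clause, prob):
    """Probability that one conjunctive clause of independent literals holds."""
    p = 1.0
    for var, val in clause:
        p *= prob[var] if val else 1.0 - prob[var]
    return p

def karp_luby(dnf, prob, samples=100_000, rng=random):
    """Karp-Luby-style estimator for the probability of a DNF formula
    over independent Boolean variables (illustrative sketch).

    dnf: list of clauses, each a list of (var, value) literals.
    prob: var -> P(var = True).
    """
    weights = [clause_prob(c, prob) for c in dnf]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    hits = 0
    for _ in range(samples):
        # pick a clause proportionally to its probability ...
        i = rng.choices(range(len(dnf)), weights=weights)[0]
        # ... then a world conditioned on that clause holding
        world = {v: rng.random() < p for v, p in prob.items()}
        world.update(dict(dnf[i]))
        # count the sample only if i is the first clause the world satisfies
        first = next(j for j, c in enumerate(dnf)
                     if all(world[v] == val for v, val in c))
        hits += (first == i)
    return total * hits / samples
```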

10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information

by Lyublena Antova, Christoph Koch, Dan Olteanu , 2006
"... Current systems and formalisms for representing incomplete information generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, or the handling and processing of the data, e.g. for query evaluation, is intractable. In this pa ..."
Abstract - Cited by 64 (8 self) - Add to MetaCart
Current systems and formalisms for representing incomplete information generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, or the handling and processing of the data, e.g. for query evaluation, is intractable. In this paper, we present a decomposition-based approach to addressing this problem. We introduce world-set decompositions (WSDs), a space-efficient formalism for representing any finite set of possible worlds over relational databases. WSDs are therefore a strong representation system for any relational query language. We study the problem of efficiently evaluating relational algebra queries on sets of worlds represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
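The compactness argument behind world-set decompositions can be illustrated directly: a decomposition stores a few independent components, and the represented world set is their Cartesian product, so the number of worlds grows multiplicatively while storage grows only additively. A minimal sketch with illustrative data structures, not the paper's relational encoding:

```python
from itertools import product

def worlds_of_wsd(components):
    """Enumerate the worlds represented by a world-set decomposition.

    components: list of components; each component is a list of local
    alternatives, and each alternative is a dict assigning values to that
    component's fields. The represented world set is the Cartesian product
    of the components, so its size is the product of the component sizes
    even though the decomposition only stores their sum.
    """
    for choice in product(*components):
        world = {}
        for alternative in choice:
            world.update(alternative)
        yield world

# Example: two independent components with 3 and 2 alternatives store
# 5 alternatives but represent 3 * 2 = 6 worlds.
wsd = [[{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}],
       [{"age": 30}, {"age": 31}]]
assert len(list(worlds_of_wsd(wsd))) == 6
```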

Semantics of ranking queries for probabilistic data and expected ranks

by Graham Cormode, Feifei Li, Ke Yi - In Proc. of ICDE’09, 2009
"... Abstract — When dealing with massive quantities of data, topk queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditi ..."
Abstract - Cited by 63 (1 self) - Add to MetaCart
When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
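The expected-rank definition can be checked against a brute-force computation for small inputs: enumerate the possible worlds of an independent-tuple relation and average each tuple's rank. The sketch below assumes, as one convention roughly following the paper, that a tuple absent from a world is ranked after all present tuples; the O(N log N) algorithm and the pruning techniques are not reproduced:

```python
from itertools import product

def expected_ranks(tuples):
    """Expected rank of each tuple over all possible worlds, computed by
    brute-force enumeration (the paper gives an O(N log N) algorithm).

    tuples: list of (tuple_id, score, probability), tuples independent.
    Convention assumed here: in a world that does not contain a tuple,
    that tuple is ranked after every present tuple (rank = world size).
    """
    exp = {tid: 0.0 for tid, _, _ in tuples}
    for world in product([False, True], repeat=len(tuples)):
        p_world = 1.0
        for (_, _, p), included in zip(tuples, world):
            p_world *= p if included else 1.0 - p
        present = [(s, tid) for (tid, s, _), inc in zip(tuples, world) if inc]
        for tid, score, _ in tuples:
            if any(t == tid for _, t in present):
                rank = sum(1 for s, _ in present if s > score)
            else:
                rank = len(present)
            exp[tid] += p_world * rank
    return exp
```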

Citation Context

...s of the top-k query to this setting, and how to answer such queries efficiently. To this end, there have been several recent works outlining possible definitions, and associated algorithms. Ré et al. [28] base their ranking on the confidence associated with each query result. Soliman et al. [33] extend the semantics of ranking queries from certain data and study the problem of ranking tuples when ther...

A Unified Approach to Ranking in Probabilistic Databases

by Jian Li, Barna Saha, Amol Deshpande
"... The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we present a unified approach to ranking and top-k query processing in ..."
Abstract - Cited by 62 (3 self) - Add to MetaCart
The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criteria optimization problem, and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF^e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking.
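For independent tuples the generating-function idea can be made concrete: sort by score, and for a tuple t the probability that exactly j higher-scored tuples appear is the coefficient of x^j in the product of (1 - p_i + p_i x) over those tuples; a parameterized ranking function then weights these positional probabilities. The sketch below uses an exponential weight roughly in the spirit of PRF^e; it is quadratic, ignores ties and correlations, and is an illustration rather than the authors' algorithm, with all names assumed:

```python
def positional_poly(higher_probs):
    """Coefficients c[j] = Pr(exactly j of the higher-scored tuples appear),
    read off the generating function  prod_i (1 - p_i + p_i * x)."""
    coeffs = [1.0]
    for p in higher_probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)
            nxt[j + 1] += c * p
        coeffs = nxt
    return coeffs

def prf_e_ranking(tuples, alpha=0.9):
    """Rank independent tuples by sum_j alpha**j * Pr(t appears at rank j),
    an exponentially weighted ranking function in the spirit of PRF^e
    (illustrative, quadratic, not the authors' algorithm).

    tuples: list of (tuple_id, score, probability)."""
    ordered = sorted(tuples, key=lambda t: t[1], reverse=True)
    scores = {}
    for i, (tid, _, p) in enumerate(ordered):
        coeffs = positional_poly([q for _, _, q in ordered[:i]])
        scores[tid] = p * sum(alpha ** j * c for j, c in enumerate(coeffs))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```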

Probabilistic Databases: Diamonds in the Dirt

by Nilesh Dalvi, Christopher Ré, Dan Suciu - In Communications of the ACM, 2009
"... ..."
Abstract - Cited by 57 (0 self) - Add to MetaCart
Abstract not found

Citation Context

...l form of probabilistic inference that occurs during query evaluation. A number of such results have emerged recently: lineage-based representations [4], safe plans [11], algorithms for top-k queries [31, 37], and representations of views over probabilistic data [33]. What is common to all these results is that they apply and extend well known concepts that are fundamental to data management, such as the ...

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases

by Xiang Lian, Lei Chen , 2008
"... Reverse skyline queries over uncertain databases have many important applications such as sensor data monitoring and business planning. Due to the existence of uncertainty in many real-world data, answering reverse skyline queries accurately and efficiently over uncertain data has become increasingl ..."
Abstract - Cited by 56 (2 self) - Add to MetaCart
Reverse skyline queries over uncertain databases have many important applications such as sensor data monitoring and business planning. Due to the existence of uncertainty in many real-world data, answering reverse skyline queries accurately and efficiently over uncertain data has become increasingly important. In this paper, we model the probabilistic reverse skyline query on uncertain data, in both monochromatic and bichromatic cases, and propose effective pruning methods to reduce the search space of query processing. Moreover, efficient query procedures are presented that seamlessly integrate the proposed pruning methods. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approach with various experimental settings.