Results 1  10
of
41
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach
 Computer Science Department, Florida State University
, 2008
"... Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Topk queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering pro ..."
Abstract

Cited by 82 (16 self)
 Add to MetaCart
(Show Context)
Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Topk queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold topk queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the topk list where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based algorithm. An empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold topk queries and the efficiency of our methods.
Semantics of ranking queries for probabilistic data and expected ranks
 In Proc. of ICDE’09
, 2009
"... Abstract — When dealing with massive quantities of data, topk queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditi ..."
Abstract

Cited by 63 (1 self)
 Add to MetaCart
(Show Context)
Abstract — When dealing with massive quantities of data, topk queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the topk is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a topk over deterministic data. Specifically, we define a number of fundamental properties, including exactk, containment, uniquerank, valueinvariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the wellfounded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attributelevel and tuplelevel uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N)—no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the topk has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach. I.
A Unified Approach to Ranking in Probabilistic Databases
"... The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decisionmaking over such data. In this paper, we present a unified approach to ranking and topk query processing in ..."
Abstract

Cited by 62 (3 self)
 Add to MetaCart
(Show Context)
The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decisionmaking over such data. In this paper, we present a unified approach to ranking and topk query processing in probabilistic databases by viewing it as a multicriteria optimization problem, and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called P RF ω and P RF e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functionsbased algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially P RF e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking. 1.
kNearest Neighbors in Uncertain Graphs
"... Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer knearest neig ..."
Abstract

Cited by 33 (4 self)
 Add to MetaCart
Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer knearest neighbor queries (kNN), which is the problem of computing the k closest nodes to some specific node. In this paper we introduce a framework for processing kNN queries in probabilistic graphs. We propose novel distance functions that extend wellknown graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During kNN query processing we efficiently prune the search space using novel techniques. Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in realworld biological data. We also demonstrate that our algorithms scale for graphs with tens of millions of edges. 1.
A Survey on Representation, Composition and Application of Preferences in Database Systems
 ACM TODS
, 2011
"... Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision making problems. Recently, they have attracted the attention of researchers in other fields, such as databases where they capture soft criteria for queries. Databases bring a whole fresh pers ..."
Abstract

Cited by 30 (6 self)
 Add to MetaCart
Preferences have been traditionally studied in philosophy, psychology, and economics and applied to decision making problems. Recently, they have attracted the attention of researchers in other fields, such as databases where they capture soft criteria for queries. Databases bring a whole fresh perspective to the study of preferences, both computational and representational. From a representational perspective, the central question is how we can effectively represent preferences and incorporate them in database querying. From a computational perspective, we can look at how we can efficiently process preferences in the context of database queries. Several approaches have been proposed but a systematic study of these works is missing. The purpose of this survey is to provide a framework for placing existing works in perspective and highlight critical open challenges to serve as a springboard for researchers in database systems. We organize our study around three axes: preference representation, preference composition, and preference query processing.
Ranking with uncertain scores
 In ICDE
, 2009
"... Abstract — Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes needs to handle new problems that are fundamentally different from conve ..."
Abstract

Cited by 24 (2 self)
 Add to MetaCart
(Show Context)
Abstract — Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes needs to handle new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records ’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders. In addition, we design novel sampling techniques to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques in different settings.
Consensus answers for queries over probabilistic databases
 in PODS
, 2009
"... We address the problem of finding a “best ” deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (a ..."
Abstract

Cited by 19 (3 self)
 Add to MetaCart
(Show Context)
We address the problem of finding a “best ” deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the wellstudied inconsistent information aggregation problems (e.g. rank aggregation) to probabilistic databases. We consider this problem for various types of queries including SPJ queries, Topk ranking queries, groupby aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NPhardness). Most of our results are for a general probabilistic database model, called and/xor tree model, which significantly generalizes previous probabilistic database models like xtuples and blockindependent disjoint models, and is of independent interest.
Query answering techniques on uncertain and probabilistic data
 In SIGMOD 2008
"... Uncertain data are inherent in some important applications, such as environmental surveillance, market analysis, and quantitative economics research. Due to the importance of those applications and the rapidly increasing amount of uncertain data collected and accumulated, analyzing large collections ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
(Show Context)
Uncertain data are inherent in some important applications, such as environmental surveillance, market analysis, and quantitative economics research. Due to the importance of those applications and the rapidly increasing amount of uncertain data collected and accumulated, analyzing large collections of uncertain data has become an important task and has attracted more and more interest from the database community. Recently, uncertain data management has become an emerging hot area in database research and development. In this tutorial, we systematically review some representative studies on answering various queries on uncertain and probabilistic data.
Topk queries on uncertain data: On score distribution and typical answers
 In SIGMOD 2009
"... Uncertain data arises in a number of domains, including data integration and sensor networks. Topk queries that rank results according to some userdefined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of topk queries on ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
Uncertain data arises in a number of domains, including data integration and sensor networks. Topk queries that rank results according to some userdefined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of topk queries on uncertain data can be ambiguous due to tradeoffs between reporting highscoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of topk vectors to allow the user to choose between results along this scoreprobability dimensions. One option would be to display the complete distribution of all potential topk tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in the previous work in the area. Our work includes a systematic empirical study on both real dataset and synthetic datasets.
Scalable Probabilistic Similarity Ranking in Uncertain Databases
"... This paper introduces a scalable approach for probabilistic topk similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutuallyexclusive. The objective is to rank the uncertain data according to their distance to a ref ..."
Abstract

Cited by 8 (5 self)
 Add to MetaCart
(Show Context)
This paper introduces a scalable approach for probabilistic topk similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutuallyexclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several stateoftheart probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a lineartime complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic topk ranking for the objects, according to different stateoftheart definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.