Results 11–20 of 310
Fast and Simple Relational Processing of Uncertain Data
"... Abstract — This paper introduces Urelations, a succinct and purely relational representation system for uncertain databases. Urelations support attributelevel uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answer ..."
Abstract

Cited by 68 (5 self)
 Add to MetaCart
Abstract — This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relational representation. The translation scheme essentially preserves the size of the query in terms of number of operations and, in particular, number of joins. Standard techniques employed in off-the-shelf relational database management systems are effective for optimizing and processing queries on U-relations. In our experiments we show that query evaluation on U-relations scales to large amounts of data with high degrees of uncertainty.
Models for Incomplete and Probabilistic Information
IEEE Data Engineering Bulletin, 2006
"... Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of ctables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probab ..."
Abstract

Cited by 63 (9 self)
 Add to MetaCart
Abstract. We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages of general probabilistic database models and we introduce a new such model, probabilistic c-tables, that is shown to be complete and closed under the relational algebra.
Probabilistic skylines on uncertain data
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
"... Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this pap ..."
Abstract

Cited by 63 (16 self)
 Add to MetaCart
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance.
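The probabilistic skyline model described above can be computed directly by brute force. The sketch below (ours, not the paper's bottom-up/top-down pruning algorithms) assumes each uncertain object is a set of equally likely instances and that smaller is better in every dimension:

```python
def dominates(a, b):
    """True if point a dominates point b (<= in every dimension, < in some)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(obj, others):
    """Probability that uncertain object `obj` (a list of equally likely
    instances) is in the skyline, given the other uncertain objects."""
    prob = 0.0
    for u in obj:
        # Pr(u is in the skyline) = product over the other objects of the
        # probability that the other object's instance does not dominate u.
        p_u = 1.0
        for other in others:
            dominated = sum(1 for v in other if dominates(v, u))
            p_u *= 1.0 - dominated / len(other)
        prob += p_u / len(obj)
    return prob

def p_skyline(objects, p):
    """Indices of all objects whose skyline probability is at least p."""
    return [i for i, obj in enumerate(objects)
            if skyline_probability(obj, objects[:i] + objects[i + 1:]) >= p]
```

This is quadratic in the total number of instances per object pair; the paper's contribution is precisely the pruning that avoids computing most of these probabilities.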
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach
Computer Science Department, Florida State University, 2008
"... Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Topk queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering pro ..."
Abstract

Cited by 62 (14 self)
 Add to MetaCart
Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the top-k list, where p is a user-specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based algorithm. An empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold top-k queries and the efficiency of our methods.
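The sampling idea mentioned in the abstract can be illustrated under a simple tuple-independent model (our assumption for illustration, not the paper's exact algorithm): each record has a fixed score and appears independently with its membership probability, and we estimate each record's chance of ranking in the top-k by Monte Carlo:

```python
import random

def pt_topk(records, k, threshold, trials=20000, seed=0):
    """records: list of (score, membership_probability) pairs.
    Returns indices of records whose estimated probability of appearing
    in the top-k (by score) is at least `threshold`."""
    rng = random.Random(seed)
    hits = [0] * len(records)
    for _ in range(trials):
        # Sample one possible world: each record present independently.
        world = [i for i, (_, p) in enumerate(records) if rng.random() < p]
        world.sort(key=lambda i: -records[i][0])  # rank present records by score
        for i in world[:k]:
            hits[i] += 1
    return [i for i, h in enumerate(hits) if h / trials >= threshold]
```

The estimate's error shrinks as 1/sqrt(trials); the paper's exact and Poisson-approximation algorithms avoid sampling altogether.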
Clean answers over dirty databases: A probabilistic approach
In Proc. ICDE, 2006
"... The detection of duplicate tuples, corresponding to the same realworld entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on adhoc and often manual solution ..."
Abstract

Cited by 61 (2 self)
 Add to MetaCart
The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad hoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases.
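A toy sketch of the clean-answer probability idea (an illustrative model of ours, not the paper's query rewriting): duplicates of one real-world entity form a cluster, exactly one tuple per cluster is in the clean database with the given probability, and clusters are independent. An answer's probability is then the chance that, in every cluster it draws on, one of the tuples supporting it is the clean one:

```python
def answer_probability(supporting):
    """supporting: for each cluster the answer depends on, the list of
    probabilities of that cluster's tuples that yield the answer.
    Tuples within a cluster are mutually exclusive alternatives;
    clusters are assumed independent."""
    prob = 1.0
    for cluster_probs in supporting:
        prob *= sum(cluster_probs)  # chance some supporting tuple is clean
    return prob
```

For example, an answer supported by two tuples of one cluster with probabilities 0.6 and 0.1 has probability 0.7 of being in the clean database.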
10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information
2006
"... Current systems and formalisms for representing incomplete information generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, or the handling and processing of the data, e.g. for query evaluation, is intractable. In this pa ..."
Abstract

Cited by 61 (8 self)
 Add to MetaCart
Current systems and formalisms for representing incomplete information generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, or the handling and processing of the data, e.g. for query evaluation, is intractable. In this paper, we present a decomposition-based approach to addressing this problem. We introduce world-set decompositions (WSDs), a space-efficient formalism for representing any finite set of possible worlds over relational databases. WSDs are therefore a strong representation system for any relational query language. We study the problem of efficiently evaluating relational algebra queries on sets of worlds represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
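The space saving behind decompositions of this kind can be illustrated with a toy product decomposition (a simplified illustration of ours, not the paper's exact formalism): independent components each store alternatives for their own fields, and the represented world set is the Cartesian product of the components, so exponentially many worlds fit in a small table:

```python
from itertools import product

def worlds(components):
    """components: list of component relations, each a list of dicts
    giving alternative values for that component's fields.
    Yields every represented world as one merged dict."""
    for combo in product(*components):
        world = {}
        for part in combo:
            world.update(part)
        yield world

components = [
    [{"name": "Smith"}, {"name": "Smyth"}],     # 2 alternatives for name
    [{"age": 35}, {"age": 36}, {"age": 37}],    # 3 alternatives for age
]
# 2 * 3 = 6 worlds represented by only 5 stored alternatives.
all_worlds = list(worlds(components))
```

With n independent components of m alternatives each, m * n stored rows represent m^n worlds, which is the kind of succinctness the census-scale experiments rely on.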
Online filtering, smoothing and probabilistic modeling of streaming data
In ICDE, 2008
"... In this paper, we address the problem of extending a relational database system to facilitate efficient realtime application of dynamic probabilistic models to streaming data. We use the recently proposed abstraction of modelbased views for this purpose, by allowing users to declaratively specify ..."
Abstract

Cited by 50 (3 self)
 Add to MetaCart
In this paper, we address the problem of extending a relational database system to facilitate efficient real-time application of dynamic probabilistic models to streaming data. We use the recently proposed abstraction of model-based views for this purpose, by allowing users to declaratively specify the model to be applied, and by presenting the output of the models to the user as a probabilistic database view. We support declarative querying over such views using an extended version of SQL that allows for querying probabilistic data. Underneath we use particle filters, a class of sequential Monte Carlo algorithms commonly used to implement dynamic probabilistic models, to represent the present and historical states of the model as sets of weighted samples (particles) that are kept up to date as new readings arrive. We develop novel techniques to convert the queries on the model-based view directly into queries over particle tables, enabling highly efficient query processing. Finally, we present experimental evaluation of our prototype implementation over sensor data from the Intel Lab dataset that demonstrates the feasibility of online modeling of streaming data using our system and establishes the advantages of such tight integration between dynamic probabilistic models and database systems.
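A minimal bootstrap particle filter shows the propagate/weight/resample loop that keeps such a particle table up to date (a generic sketch of the algorithm class the paper builds on; the model here, a 1-D random walk observed with Gaussian noise, is our illustrative assumption):

```python
import math
import random

def particle_filter(readings, n=1000, proc_std=1.0, obs_std=1.0, seed=0):
    """Track a 1-D random-walk state from noisy readings.
    Returns the posterior mean estimate after each reading."""
    rng = random.Random(seed)
    particles = [0.0] * n          # assume the state starts near 0
    estimates = []
    for z in readings:
        # 1. Propagate each particle through the process model.
        particles = [x + rng.gauss(0.0, proc_std) for x in particles]
        # 2. Weight particles by the likelihood of the new reading.
        weights = [math.exp(-((z - x) ** 2) / (2 * obs_std ** 2))
                   for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Resample so the weighted sample set stays representative.
        particles = rng.choices(particles, weights=weights, k=n)
        # The "particle table" at this point is the set of particles;
        # a query for the state estimate reduces to an aggregate over it.
        estimates.append(sum(particles) / n)
    return estimates
```

Queries over the model-based view then become aggregates (mean, quantiles, threshold probabilities) over the particle table at each timestep.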
The dichotomy of conjunctive queries on probabilistic structures
In PODS, 2007
"... We show that for every conjunctive query, the complexity of evaluating it on a probabilistic database is either PTIME or #Pcomplete, and we give an algorithm for deciding whether a given conjunctive query is PTIME or #Pcomplete. The dichotomy property is a fundamental result on query evaluation on ..."
Abstract

Cited by 47 (13 self)
 Add to MetaCart
We show that for every conjunctive query, the complexity of evaluating it on a probabilistic database is either PTIME or #P-complete, and we give an algorithm for deciding whether a given conjunctive query is PTIME or #P-complete. The dichotomy property is a fundamental result on query evaluation on probabilistic databases and it gives a complete classification of the complexity of conjunctive queries.

1. PROBLEM STATEMENT

Fix a relational vocabulary R_1, ..., R_k, denoted R. A tuple-independent probabilistic structure is a pair (A, p) where A = (A, R_1^A, ..., R_k^A) is a first-order structure and p is a function that associates to each tuple t in A a rational number p(t) ∈ [0, 1]. A probabilistic structure (A, p) induces a probability distribution on the set of substructures B of A by:

    p(B) = ∏_{i=1..k} ( ∏_{t ∈ R_i^B} p(t) × ∏_{t ∈ R_i^A − R_i^B} (1 − p(t)) )    (1)

where B ⊆ A; more precisely, B = (A, R_1^B, ..., R_k^B) is such that R_i^B ⊆ R_i^A for i = 1, ..., k. A conjunctive query q is a sentence of the form ∃x̄.(ϕ_1 ∧ ... ∧ ϕ_m), where each ϕ_i is a positive atomic predicate R(t), called a subgoal, and the tuple t consists of variables and/or constants. As usual, we drop the existential quantifiers and the ∧, writing q = ϕ_1, ϕ_2, ..., ϕ_m. A conjunctive property is a property on structures defined by a conjunctive query q, and its probability on a probabilistic structure (A, p) is defined as:

    p(q) = Σ_{B: B ⊨ q} p(B)    (2)
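Equations (1) and (2) can be instantiated directly by enumerating substructures, which makes the exponential baseline concrete (a brute-force sketch over a single relation for simplicity; the dichotomy is about when a PTIME plan exists instead):

```python
from itertools import chain, combinations

def powerset(tuples):
    """All subsets of a collection of tuples."""
    s = list(tuples)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def query_probability(prob_rel, q):
    """prob_rel: dict mapping each tuple to its probability (one relation,
    tuple-independent). q: boolean function on a set of present tuples.
    Returns p(q) per equation (2)."""
    total = 0.0
    for world in powerset(prob_rel):
        present = set(world)
        # Equation (1): probability of this substructure.
        p = 1.0
        for t, pt in prob_rel.items():
            p *= pt if t in present else (1.0 - pt)
        if q(present):  # equation (2): sum over worlds satisfying q
            total += p
    return total
```

For instance, with two independent tuples each of probability 0.5, the query "some tuple is present" has probability 1 − 0.5 × 0.5 = 0.75.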
Community information management
2006
"... We introduce Cimple, a joint project between the University of Illinois and the University of Wisconsin. Cimple aims to develop a software platform that can be rapidly deployed and customized to manage datarich online communities. We first describe the envisioned working of such a software platform ..."
Abstract

Cited by 46 (10 self)
 Add to MetaCart
We introduce Cimple, a joint project between the University of Illinois and the University of Wisconsin. Cimple aims to develop a software platform that can be rapidly deployed and customized to manage data-rich online communities. We first describe the envisioned working of such a software platform and our prototype, DBLife, which is a community portal being developed for the database research community. We then describe the technical challenges in Cimple and our solution approach. Finally, we discuss managing uncertainty and provenance, a crucial task in making our software platform practical.
Anonymizing Social Networks
VLDB 2008
"... Advances in technology have made it possible to collect data about individuals and the connections between them, such as email correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data. ..."
Abstract

Cited by 45 (3 self)
 Add to MetaCart
Advances in technology have made it possible to collect data about individuals and the connections between them, such as email correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data. However, in many cases the data describes relationships that are private (e.g., email correspondence) and sharing the data in full can result in unacceptable disclosures. In this paper, we present a framework for assessing the privacy risk of sharing anonymized network data. This includes a model of adversary knowledge, for which we consider several variants and make connections to known graph theoretical results. On several real-world social networks, we show that simple anonymization techniques are inadequate, resulting in substantial breaches of privacy for even modestly informed adversaries. We propose a novel anonymization technique based on perturbing the network and demonstrate empirically that it leads to substantial reduction of the privacy threat. We also analyze the effect that anonymizing the network has on the utility of the data for social network analysis.
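Why simple anonymization is inadequate can be seen with a tiny degree-based adversary (an example of ours in the spirit of the adversary models discussed above, not the paper's construction): relabeling nodes hides names, but a node with a unique degree remains uniquely re-identifiable to anyone who knows that degree:

```python
import random

def naive_anonymize(edges, seed=0):
    """'Anonymize' a graph by randomly relabeling its nodes."""
    nodes = sorted({u for e in edges for u in e})
    labels = random.Random(seed).sample(range(len(nodes)), len(nodes))
    relabel = dict(zip(nodes, labels))
    return [(relabel[u], relabel[v]) for u, v in edges]

def degree(edges, node):
    return sum(1 for e in edges if node in e)

def reidentify_by_degree(edges, known_degree):
    """Candidate nodes matching an adversary's degree knowledge."""
    nodes = {u for e in edges for u in e}
    return [n for n in nodes if degree(edges, n) == known_degree]

edges = [("alice", "bob"), ("alice", "carol"),
         ("alice", "dave"), ("bob", "carol")]
anon = naive_anonymize(edges)
# Alice is the only node of degree 3, so despite the relabeling an
# adversary who knows her degree narrows her down to one candidate.
candidates = reidentify_by_degree(anon, 3)
```

Perturbation-based schemes like the one proposed in the paper work by adding and deleting edges so that such structural fingerprints no longer single out individuals.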