On the Optimal Approximation of Queries Using Tractable Propositional Languages
Cited by 18 (5 self)
This paper investigates the problem of approximating conjunctive queries without self-joins on probabilistic databases by lower and upper bounds that can be computed more efficiently. We study this problem via an indirection: Given a propositional formula Φ, find formulas in a more restricted language that are greatest lower bound and least upper bound, respectively, of Φ. We study bounds in the languages of read-once formulas, where every variable occurs at most once, and of read-once formulas in disjunctive normal form. We show equivalences of syntactic and model-theoretic characterisations of optimal bounds for unate formulas, and present algorithms that can enumerate them with polynomial delay. Such bounds can be computed by queries expressed using first-order queries extended with transitive closure and a special choice construct. Besides probabilistic databases, these results can also benefit the problem of approximate query evaluation in relational databases, since the bounds expressed by queries can be computed in polynomial combined complexity.
Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query Processing
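To make the idea of read-once bounds concrete, here is a minimal Python sketch. It evaluates a read-once formula in a single bottom-up pass and checks by brute force that one read-once lower bound and one read-once upper bound bracket a small formula that is not read-once. The example formula, the variable names, the probabilities, and the helper names (`holds`, `exact_probability`, `read_once_probability`) are all illustrative assumptions, not taken from the paper, and the two bounds are not claimed to be the optimal ones the paper characterises.

```python
from itertools import product

# Formulas are nested tuples: ("var", name), ("and", f, g), ("or", f, g).

def holds(formula, world):
    """Evaluate a formula in a world (dict: variable name -> bool)."""
    kind = formula[0]
    if kind == "var":
        return world[formula[1]]
    left, right = holds(formula[1], world), holds(formula[2], world)
    return (left and right) if kind == "and" else (left or right)

def exact_probability(formula, probs):
    """Brute force: sum the weights of all satisfying worlds (exponential)."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if holds(formula, world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total

def read_once_probability(formula, probs):
    """Linear-time evaluation; valid only if no variable occurs twice,
    because then the two children of every node are independent."""
    kind = formula[0]
    if kind == "var":
        return probs[formula[1]]
    left = read_once_probability(formula[1], probs)
    right = read_once_probability(formula[2], probs)
    if kind == "and":
        return left * right
    return 1.0 - (1.0 - left) * (1.0 - right)

x, y, a, b = (("var", name) for name in "xyab")

# Hypothetical formula (x AND a) OR (x AND b) OR (y AND b): x and b occur twice.
phi = ("or", ("or", ("and", x, a), ("and", x, b)), ("and", y, b))
# Read-once formula that implies phi (a lower bound).
lower_formula = ("and", x, ("or", a, b))
# Read-once formula implied by phi (an upper bound).
upper_formula = ("and", ("or", x, y), ("or", a, b))

probs = {"x": 0.5, "y": 0.5, "a": 0.5, "b": 0.5}
exact = exact_probability(phi, probs)
lower = read_once_probability(lower_formula, probs)
upper = read_once_probability(upper_formula, probs)
print(lower, exact, upper)
```

The brute-force routine is only there to validate the bounds on a tiny input; the point of a read-once bound is that `read_once_probability` never needs to enumerate worlds.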
Probabilistic Databases with MarkoViews
Cited by 12 (5 self)
Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where all tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decomposition, and a variety of approximation algorithms. However, complex data analytics tasks often require complex correlations between tuples, and here query evaluation is significantly more expensive, or more restrictive. In this paper, we propose MVDB as a framework both for representing complex correlations and for efficient query evaluation. An MVDB specifies correlations by views, called MarkoViews, on the probabilistic relations and by declaring the weights of the view’s outputs. An MVDB is a (very large) Markov Logic Network. We make two sets of contributions. First, we show that query evaluation on an MVDB is equivalent to evaluating a Union of Conjunctive Queries (UCQ) over a tuple-independent database. The translation is exact (thus allowing the techniques developed for tuple-independent databases to be carried over to MVDB), yet it is novel and quite non-obvious (some resulting probabilities may be negative!). This translation in itself, though, may not lead to much gain, since the translated query gets complicated as we try to capture more correlations. Our second contribution is to propose a new query evaluation strategy that exploits offline compilation to speed up online query evaluation. Here we utilize and extend our prior work on compilation of UCQ. We experimentally validate our techniques on a large probabilistic database with MarkoViews inferred from the DBLP data.
Approximate Lifted Inference with Probabilistic Databases
Cited by 2 (1 self)
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.
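The "evaluate several plans, take the minimum" idea can be sketched in Python on the classic hard query q :- R(x), S(x, y), T(y). Everything below is an assumption made for illustration: the relation contents, the tuple probabilities, and the function names. The two plans are simply the two ways of projecting this particular query with independence assumptions; the sketch does not implement the paper's plan enumeration algorithm or its in-database evaluation.

```python
from itertools import product

# Hypothetical tuple-independent database: tuple -> marginal probability.
R = {1: 0.5, 2: 0.6}
T = {1: 0.7, 2: 0.4}
S = {(1, 1): 0.9, (1, 2): 0.8, (2, 2): 0.5}

def independent_or(ps):
    """Probability that at least one of several independent events occurs."""
    none = 1.0
    for p in ps:
        none *= 1.0 - p
    return 1.0 - none

def plan_project_y_first():
    # For each x, combine S(x, y) and T(y) over y, then join with R(x).
    # T(y) is reused for several x as if the uses were independent.
    per_x = []
    for x, r in R.items():
        inner = independent_or(S[(u, y)] * T[y] for (u, y) in S if u == x)
        per_x.append(r * inner)
    return independent_or(per_x)

def plan_project_x_first():
    # Symmetric plan: R(x) is reused for several y as if independent.
    per_y = []
    for y, t in T.items():
        inner = independent_or(S[(x, v)] * R[x] for (x, v) in S if v == y)
        per_y.append(t * inner)
    return independent_or(per_y)

def exact_probability():
    """Brute force over all possible worlds (exponential, validation only)."""
    facts = ([("R", k, p) for k, p in R.items()]
             + [("S", k, p) for k, p in S.items()]
             + [("T", k, p) for k, p in T.items()])
    total = 0.0
    for bits in product([False, True], repeat=len(facts)):
        present = {(rel, key) for (rel, key, _), bit in zip(facts, bits) if bit}
        weight = 1.0
        for (_, _, p), bit in zip(facts, bits):
            weight *= p if bit else 1.0 - p
        if any(("R", x) in present and ("S", (x, y)) in present
               and ("T", y) in present for (x, y) in S):
            total += weight
    return total

plan_bounds = [plan_project_y_first(), plan_project_x_first()]
best_upper = min(plan_bounds)   # tightest of the available upper bounds
exact = exact_probability()
print(exact, plan_bounds, best_upper)
```

Each plan computes the exact probability of a modified lineage in which repeated tuples are treated as independent copies, which can only raise the probability of a monotone disjunction of joins; this is why the minimum over plans remains an upper bound.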
Oblivious bounds on the probability of Boolean functions
ACM Trans. Database Syst. (TODS)
Cited by 2 (2 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries in guaranteed polynomial time.
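A small Python sketch can illustrate dissociation on the formula (x AND a) OR (x AND b), where x occurs twice. The two occurrences are replaced by fresh variables x1 and x2, and the result is compared with the exact probability by brute force. The formula, the probabilities, and the helper name `probability` are assumptions for illustration. The two probability choices used here (keeping p for the upper bound, and 1 - sqrt(1 - p) for the lower bound of this two-way disjunctive dissociation) are assumptions of the sketch that can be checked directly for this formula; they are not quoted from the abstract.

```python
from itertools import product
from math import sqrt

def probability(formula, probs):
    """Exact probability of a Boolean function of independent variables,
    computed by enumerating all assignments (validation only)."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if formula(world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total

# Original formula: x occurs in both clauses.
def phi(w):
    return (w["x"] and w["a"]) or (w["x"] and w["b"])

# Dissociated formula: the two occurrences of x become x1 and x2.
def phi_dissociated(w):
    return (w["x1"] and w["a"]) or (w["x2"] and w["b"])

p_x, p_a, p_b = 0.5, 0.6, 0.7   # hypothetical probabilities
exact = probability(phi, {"x": p_x, "a": p_a, "b": p_b})

# Upper bound: give each copy the original probability of x.
upper = probability(phi_dissociated,
                    {"x1": p_x, "x2": p_x, "a": p_a, "b": p_b})

# Lower bound: choose p' with (1 - p')**2 == 1 - p_x for both copies.
p_low = 1.0 - sqrt(1.0 - p_x)
lower = probability(phi_dissociated,
                    {"x1": p_low, "x2": p_low, "a": p_a, "b": p_b})

print(lower, exact, upper)
```

After dissociation every variable occurs once, so in practice the bound would be computed with the product rules for independent events rather than by enumeration.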
Anytime approximation in probabilistic databases
2013
Cited by 2 (2 self)
This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
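The refine-until-guarantee loop can be sketched in Python for the special case of a monotone DNF over independent Boolean variables. This is a generic illustration using Shannon expansion with simple clause-based bounds, not the algorithm implemented in SPROUT; the function names, the choice of bounds, the variable-selection rule, and the example formula are all assumptions.

```python
from math import prod

def condition(dnf, var, value):
    """Restrict a monotone DNF (list of frozensets) to var = value."""
    if value:
        return [clause - {var} for clause in dnf]       # var is satisfied
    return [clause for clause in dnf if var not in clause]  # clauses with var die

def leaf_bounds(dnf, probs):
    """Cheap bounds: best single clause (lower), union bound (upper)."""
    if not dnf:
        return 0.0, 0.0                  # no clauses: constant false
    if any(len(clause) == 0 for clause in dnf):
        return 1.0, 1.0                  # empty clause: constant true
    clause_probs = [prod(probs[v] for v in clause) for clause in dnf]
    return max(clause_probs), min(1.0, sum(clause_probs))

def approximate(dnf, probs, epsilon):
    """Refine [lower, upper] until upper - lower <= 2 * epsilon."""
    leaves = [(1.0, list(dnf))]          # (weight of the branch, residual DNF)
    while True:
        scored = [(w, f) + leaf_bounds(f, probs) for w, f in leaves]
        lower = sum(w * lo for w, _, lo, _ in scored)
        upper = sum(w * hi for w, _, _, hi in scored)
        if upper - lower <= 2 * epsilon:
            return lower, upper
        # Expand the leaf that contributes most to the remaining gap.
        i = max(range(len(scored)),
                key=lambda k: scored[k][0] * (scored[k][3] - scored[k][2]))
        w, f = leaves.pop(i)
        var = next(iter(f[0]))           # any variable of the first clause
        p = probs[var]
        leaves.append((w * p, condition(f, var, True)))
        leaves.append((w * (1.0 - p), condition(f, var, False)))

# Hypothetical formula (x AND a) OR (x AND b) OR (y AND b).
dnf = [frozenset("xa"), frozenset("xb"), frozenset("yb")]
probs = {"x": 0.5, "y": 0.5, "a": 0.5, "b": 0.5}

coarse = approximate(dnf, probs, 0.25)   # loose guarantee, few refinements
fine = approximate(dnf, probs, 0.01)     # tighter guarantee, more refinements
print(coarse, fine)
```

Reporting the midpoint of the returned interval gives an estimate with absolute error at most `epsilon`; stopping earlier still yields valid, if wider, bounds, which is what makes the procedure anytime.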
Learning a compositional semantics for Freebase with an open predicate vocabulary
Transactions of the Association for Computational Linguistics, 2015
Cited by 1 (0 self)
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as “Republican front-runner from Texas” whose semantics cannot be represented using the Freebase schema. Our approach directly converts a sentence’s syntactic CCG parse into a logical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate. A training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic matrix factorization with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also compare our approach against manually annotated Freebase queries, finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.
Oblivious Bounds on the Probability of Boolean Functions
2013
Cited by 1 (1 self)
This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound, respectively, on the probability of the original formula. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations in the literature and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system (DBMS) to both upper and lower bound hard probabilistic queries.
Tractability in Probabilistic Databases
2011
Managing Structured Collections of Community Data
Data management is becoming increasingly social. We observe a new form of information in such collaborative scenarios, where users contribute and reuse information, which resides neither in the base data nor in the schema information. This “superimposed structure” derives partly from interaction within the community, and partly from the recombination of existing data. We argue that this triad of data, schema, and higher-order structure requires new data abstractions that – at the same time – must efficiently scale to very large community databases. In addition, data generated by the community exposes four characteristics that make scalability especially difficult: (i) inconsistency, as different users or applications have or require partially overlapping and contradicting views; (ii) non-monotonicity, as new information may be able to revoke previous information already built upon; (iii) uncertainty, as both user intent and rankings are generally uncertain; and (iv) provenance, as content contributors want to track their data, and “content re-users” evaluate their trust. We show promising scalable solutions to two of these problems, and illustrate the general data management challenges with a seemingly simple example from community e-learning (“ce-learning”).
Approximate Lifted Inference in Probabilistic Databases
This paper proposes a new approach for approximate evaluation of #P-hard queries over probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.