Optimal Aggregation Algorithms for Middleware
- In PODS, 2001
Cited by 431 (4 self)
Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine the top k objects (that have the best overall grades), the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm (“Fagin’s Algorithm”, or FA) that is much more efficient. For some monotone aggregation functions, FA is optimal with high probability in the worst case. We analyze an elegant and remarkably simple algorithm (“the threshold algorithm”, or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. TA allows early stopping, which yields, in a precise sense, an approximate version of the top k answers.
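To make the idea of the threshold algorithm concrete, here is a minimal sketch of how TA is usually described: read the m sorted lists in parallel, look up the remaining grades of each newly seen object, keep the best k overall grades, and stop once the k-th best grade is at least the aggregate of the grades last seen in each list. The function name `threshold_algorithm`, the in-memory `grades` dictionary, and the way sorted and random access are simulated are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def threshold_algorithm(
    grades: Dict[str, Sequence[float]],
    k: int,
    agg: Callable[[Sequence[float]], float] = min,
) -> List[Tuple[str, float]]:
    """Return up to k (object, overall grade) pairs, best first.

    `grades` maps each object to its m attribute grades (hypothetical
    in-memory stand-in for the sorted lists of a real middleware system).
    `agg` must be monotone, e.g. min or average.
    """
    m = len(next(iter(grades.values())))
    # One list per attribute, sorted by that attribute's grade, highest first.
    sorted_lists = [
        sorted(grades, key=lambda obj, i=i: grades[obj][i], reverse=True)
        for i in range(m)
    ]
    top: Dict[str, float] = {}  # bounded buffer holding at most k objects

    for depth in range(len(grades)):
        last_seen = []
        for i in range(m):
            obj = sorted_lists[i][depth]          # sorted access
            last_seen.append(grades[obj][i])
            if obj in top:
                continue
            overall = agg(grades[obj])            # random access to all grades
            if len(top) < k:
                top[obj] = overall
            else:
                worst = min(top, key=top.get)
                if overall > top[worst]:
                    del top[worst]
                    top[obj] = overall
        # No unseen object can score above the aggregate of the last-seen grades.
        threshold = agg(last_seen)
        if len(top) == k and min(top.values()) >= threshold:
            break

    return sorted(top.items(), key=lambda kv: kv[1], reverse=True)
```

With four objects and `min` as the combining rule, this sketch halts after reading two rows of the sorted lists instead of scanning all four objects. The buffer `top` never holds more than k entries, which mirrors the constant-size buffer claim in the abstract.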
Sampling Algorithms: Lower Bounds and Applications (Extended Abstract)
- 2001
Cited by 43 (2 self)
Ziv Bar-Yossef (Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, zivi@cs.berkeley.edu); Ravi Kumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, ravi@almaden.ibm.com); D. Sivakumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, siva@almaden.ibm.com)
Abstract: We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form f : A^n → B, where A and B are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1, ..., x_n). We define two quantitative properties of functions --- the block sensitivity and the minimum Hellinger distance --- that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds. We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
An Optimal Approximation Algorithm For Bayesian Inference
- Artificial Intelligence, 1997
Cited by 40 (2 self)
Approximating the inference probability Pr[X = x | E = e] in any sense, even for a single evidence node E, is NP-hard. This result holds for belief networks that are allowed to contain extreme conditional probabilities---that is, conditional probabilities arbitrarily close to 0. Nevertheless, all previous approximation algorithms have failed to approximate efficiently many inferences, even for belief networks without extreme conditional probabilities. We prove that we can approximate efficiently probabilistic inference in belief networks without extreme conditional probabilities. We construct a randomized approximation algorithm---the bounded-variance algorithm---that is a variant of the known likelihood-weighting algorithm. The bounded-variance algorithm is the first algorithm with provably fast inference approximation on all belief networks without extreme conditional probabilities. From the bounded-variance algorithm, we construct a deterministic approximation algorithm u...
Wave and Equation Based Rate Control Using Multicast Round Trip Time
- In Proceedings ACM SIGCOMM 2002, 2002
Cited by 39 (3 self)
This paper introduces Wave and Equation Based Rate Control (WEBRC), the first multiple rate multicast congestion control protocol to be equation based. The equation-based approach enforces fairness to TCP with the benefit that fluctuations in the flow rate are small in comparison to TCP.
Monte Carlo Model Checking
- In Proc. of Tools and Algorithms for Construction and Analysis of Systems (TACAS 2005), volume 3440 of LNCS, 2005
Cited by 29 (4 self)
Abstract. We present MC², what we believe to be the first randomized, Monte Carlo algorithm for temporal-logic model checking, the classical problem of deciding whether or not a property specified in temporal logic holds of a system specification. Given a specification S of a finite-state system, an LTL (Linear Temporal Logic) formula ϕ, and parameters ɛ and δ, MC² takes N = ln(δ) / ln(1 − ɛ) random samples (random walks ending in a cycle, i.e., lassos) from the Büchi automaton B = B_S × B_¬ϕ to decide if L(B) = ∅. Should a sample reveal an accepting lasso l, MC² returns false with l as a witness. Otherwise, it returns true and reports that with probability less than δ, p_Z < ɛ, where p_Z is the expectation of an accepting lasso in B. It does so in time O(N · D) and space O(D), where D is B’s recurrence diameter, using a number of samples N that is optimal to within a constant factor. Our experimental results demonstrate that MC² is fast, memory-efficient, and scales very well.
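The sample count N = ln(δ) / ln(1 − ɛ) quoted in this abstract is easy to evaluate numerically. The sketch below only computes that formula; the function name `lasso_sample_count` and the rounding up to a whole number of samples are assumptions added here, not something the abstract states.

```python
import math


def lasso_sample_count(epsilon: float, delta: float) -> int:
    """Evaluate N = ln(delta) / ln(1 - epsilon), rounded up (rounding is an assumption)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


if __name__ == "__main__":
    n = lasso_sample_count(epsilon=0.1, delta=0.01)
    # After n independent samples that each fail with probability at most
    # 1 - epsilon, the chance that all of them fail is at most delta.
    print(n, (1.0 - 0.1) ** n)
```

The count depends only on ɛ and δ, so it does not grow with the size of the automaton being sampled.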
Empirical Bernstein Stopping
Cited by 19 (5 self)
Sampling is a popular way of scaling up machine learning algorithms to large datasets. The question often is how many samples are needed. Adaptive stopping algorithms monitor the performance in an online fashion and they can stop early, saving valuable resources. We consider problems where probabilistic guarantees are desired and demonstrate how recently-introduced empirical Bernstein bounds can be used to design stopping rules that are efficient. We provide upper bounds on the sample complexity of the new rules, as well as empirical results on model selection and boosting in the filtering setting.
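As a rough illustration of what an adaptive stopping rule built on an empirical Bernstein bound can look like, the sketch below keeps sampling until a confidence half-width falls under a target absolute error. It is a simplified absolute-error rule and not the paper's algorithm. It assumes the commonly cited form of the bound, half-width = std · sqrt(2 ln(3/δ_t) / t) + 3 R ln(3/δ_t) / t for samples with range R, and an assumed union-bound schedule δ_t = δ / (t (t + 1)). The name `eb_stop` and all parameters are hypothetical.

```python
import math
from typing import Callable, Tuple


def eb_stop(
    sample: Callable[[], float],
    value_range: float,
    epsilon: float,
    delta: float,
    max_samples: int = 100_000,
) -> Tuple[float, int]:
    """Sample until the empirical Bernstein half-width is at most epsilon.

    Returns (running mean, number of samples used). Stops at max_samples
    even if the target width was not reached.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's update)
    for t in range(1, max_samples + 1):
        x = sample()
        d = x - mean
        mean += d / t
        m2 += d * (x - mean)
        std = math.sqrt(max(m2, 0.0) / t)  # empirical standard deviation
        # Assumed schedule: delta_t = delta / (t (t + 1)), which sums to delta.
        log_term = math.log(3.0 * t * (t + 1) / delta)
        half_width = std * math.sqrt(2.0 * log_term / t) + 3.0 * value_range * log_term / t
        if half_width <= epsilon:
            return mean, t
    return mean, max_samples
```

Because the half-width shrinks with the observed variance, low-variance inputs stop much earlier than a range-only (Hoeffding-style) rule would allow, which is the saving the abstract refers to.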
MayBMS: a probabilistic database management system
- In SIGMOD Conference, 2009
Cited by 16 (4 self)
MayBMS is a state-of-the-art probabilistic database management system which leverages the strengths of previous database research for achieving scalability. As a proof of concept for its ease of use, we have built on top of MayBMS a Web-based application that offers NBA-related information based on what-if analysis of team dynamics using data available at www.nba.com.
MayBMS: A System for Managing Large Uncertain and Probabilistic Databases
- Managing and Mining Uncertain Data, chapter 6, 2008
Cited by 12 (4 self)
MayBMS is a state-of-the-art probabilistic database management system that has been built as an extension of Postgres, an open-source relational database management system. MayBMS follows a principled approach to leveraging the strengths of previous database research for achieving scalability. This article describes the main goals of this project, the design of query and update language, efficient exact and approximate query processing, and algorithmic and systems aspects.
Approximate Confidence Computation in Probabilistic Databases
Cited by 12 (3 self)
Abstract: This paper introduces a deterministic approximation algorithm with error guarantees for computing the probability of propositional formulas over discrete random variables. The algorithm is based on an incremental compilation of formulas into decision diagrams using three types of decompositions: Shannon expansion, independence partitioning, and product factorization. With each decomposition step, lower and upper bounds on the probability of the partially compiled formula can be quickly computed and checked against the allowed error. This algorithm can be effectively used to compute approximate confidence values of answer tuples to positive relational algebra queries on general probabilistic databases (c-tables with discrete probability distributions). We further tune our algorithm so as to capture all known tractable conjunctive queries without self-joins on tuple-independent probabilistic databases: In this case, the algorithm requires time polynomial in the input size even for exact computation. We implemented the algorithm as an extension of the SPROUT query engine. An extensive experimental effort shows that it consistently outperforms state-of-art approximation techniques by several orders of magnitude.
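Of the three decompositions named in this abstract, Shannon expansion is the simplest to illustrate: pick a variable x and use P(φ) = P(x) · P(φ with x true) + (1 − P(x)) · P(φ with x false). The sketch below applies only that rule, exactly, to a positive DNF formula over independent Boolean variables. It does not implement the paper's bounds, independence partitioning, product factorization, or the SPROUT integration, and the names `dnf_probability`, `clauses`, and `prob` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def dnf_probability(clauses: List[FrozenSet[str]], prob: Dict[str, float]) -> float:
    """Exact probability of a positive DNF via repeated Shannon expansion.

    Each clause is a conjunction of variable names; the formula is the
    disjunction of its clauses. Variables are assumed independent, with
    prob[x] the probability that x is true.
    """
    if not clauses:
        return 0.0                      # empty disjunction is false
    if any(len(c) == 0 for c in clauses):
        return 1.0                      # an empty conjunction is true
    x = min(clauses[0])                 # deterministic choice of branching variable
    # Branch x = true: x drops out of every clause that mentions it.
    when_true = [c - {x} for c in clauses]
    # Branch x = false: every clause that mentions x becomes false.
    when_false = [c for c in clauses if x not in c]
    return (prob[x] * dnf_probability(when_true, prob)
            + (1.0 - prob[x]) * dnf_probability(when_false, prob))


if __name__ == "__main__":
    formula = [frozenset({"a", "b"}), frozenset({"c"})]   # (a AND b) OR c
    print(dnf_probability(formula, {"a": 0.5, "b": 0.5, "c": 0.5}))
```

This plain recursion can take exponential time in the worst case; the abstract's point is that tracking lower and upper bounds at each step lets a real implementation stop early once the allowed error is met.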
Secondary-Storage Confidence Computation for Conjunctive Queries with Inequalities
- In Proc. SIGMOD, 2009
Cited by 11 (3 self)
This paper investigates the problem of efficiently computing the confidences of distinct tuples in the answers to conjunctive queries with inequalities (<) on tuple-independent probabilistic databases. This problem is fundamental to probabilistic databases and was recently stated open. Our contributions are of both theoretical and practical importance. We define a class of tractable queries with inequalities, and generalize existing results on #P-hardness of query evaluation, now in the presence of inequalities. For the tractable queries, we introduce a confidence computation technique based on efficient compilation of the lineage of the query answer into Ordered Binary Decision Diagrams (OBDDs), whose sizes are linear in the number of variables of the lineage. We implemented a secondary-storage variant of our technique in PostgreSQL. This variant does not need to materialize the OBDD, but computes, in one scan over the lineage, the probabilities of OBDD fragments and combines them on the fly. Experiments with probabilistic TPC-H data show up to two orders of magnitude improvements when compared with state-of-the-art approaches.

