Results 1  10
of
567
Efficient Query Evaluation on Probabilistic Databases
, 2004
"... We describe a system that supports arbitrarily complex SQL queries with ”uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attentio ..."
Abstract

Cited by 455 (48 self)
 Add to MetaCart
(Show Context)
We describe a system that supports arbitrarily complex SQL queries with ”uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #Pcomplete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a MonteCarlo simulation algorithm.
Approximate counting, uniform generation and rapidly mixing markov chains
 Inf. Comput
, 1989
"... The paper studies effective approximate solutions to combinatorial counting and uniform generation problems. Using a technique based on the simulation of ergodic Markov chains, it is shown that, for selfreducible structures, almost uniform generation is possible in polynomial time provided only tha ..."
Abstract

Cited by 318 (11 self)
 Add to MetaCart
The paper studies effective approximate solutions to combinatorial counting and uniform generation problems. Using a technique based on the simulation of ergodic Markov chains, it is shown that, for selfreducible structures, almost uniform generation is possible in polynomial time provided only that randomised approximate counting to within some arbitrary polynomial factor is possible in polynomial time. It follows that, for selfreducible structures, polynomial time randomised algorithms for counting to within factors of the form (1
On the Hardness of Approximate Reasoning
, 1996
"... Many AI problems, when formalized, reduce to evaluating the probability that a propositional expression is true. In this paper we show that this problem is computationally intractable even in surprisingly restricted cases and even if we settle for an approximation to this probability. We consider va ..."
Abstract

Cited by 289 (13 self)
 Add to MetaCart
Many AI problems, when formalized, reduce to evaluating the probability that a propositional expression is true. In this paper we show that this problem is computationally intractable even in surprisingly restricted cases and even if we settle for an approximation to this probability. We consider various methods used in approximate reasoning such as computing degree of belief and Bayesian belief networks, as well as reasoning techniques such as constraint satisfaction and knowledge compilation, that use approximation to avoid computational difficulties, and reduce them to modelcounting problems over a propositional domain. We prove that counting satisfying assignments of propositional languages is intractable even for Horn and monotone formulae, and even when the size of clauses and number of occurrences of the variables are extremely limited. This should be contrasted with the case of deductive reasoning, where Horn theories and theories with binary clauses are distinguished by the e...
The Markov Chain Monte Carlo method: an approach to approximate counting and integration
, 1996
"... In the area of statistical physics, Monte Carlo algorithms based on Markov chain simulation have been in use for many years. The validity of these algorithms depends crucially on the rate of convergence to equilibrium of the Markov chain being simulated. Unfortunately, the classical theory of stocha ..."
Abstract

Cited by 286 (12 self)
 Add to MetaCart
(Show Context)
In the area of statistical physics, Monte Carlo algorithms based on Markov chain simulation have been in use for many years. The validity of these algorithms depends crucially on the rate of convergence to equilibrium of the Markov chain being simulated. Unfortunately, the classical theory of stochastic processes hardly touches on the sort of nonasymptotic analysis required in this application. As a consequence, it had previously not been possible to make useful, mathematically rigorous statements about the quality of the estimates obtained. Within the last ten years, analytical tools have been devised with the aim of correcting this deficiency. As well as permitting the analysis of Monte Carlo algorithms for classical problems in statistical physics, the introduction of these tools has spurred the development of new approximation algorithms for a wider class of problems in combinatorial enumeration and optimization. The “Markov chain Monte Carlo ” method has been applied to a variety of such problems, and often provides the only known efficient (i.e., polynomial time) solution technique.
When trees collide: An approximation algorithm for the generalized Steiner problem on networks
, 1994
"... We give the first approximation algorithm for the generalized network Steiner problem, a problem in network design. An instance consists of a network with linkcosts and, for each pair fi; jg of nodes, an edgeconnectivity requirement r ij . The goal is to find a minimumcost network using the a ..."
Abstract

Cited by 256 (39 self)
 Add to MetaCart
We give the first approximation algorithm for the generalized network Steiner problem, a problem in network design. An instance consists of a network with linkcosts and, for each pair fi; jg of nodes, an edgeconnectivity requirement r ij . The goal is to find a minimumcost network using the available links and satisfying the requirements. Our algorithm outputs a solution whose cost is within 2dlog 2 (r + 1)e of optimal, where r is the highest requirement value. In the course of proving the performance guarantee, we prove a combinatorial minmax approximate equality relating minimumcost networks to maximum packings of certain kinds of cuts. As a consequence of the proof of this theorem, we obtain an approximation algorithm for optimally packing these cuts; we show that this algorithm has application to estimating the reliability of a probabilistic network.
The NPcompleteness column: an ongoing guide
 JOURNAL OF ALGORITHMS
, 1987
"... This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NPcompleteness. The presentation is modeled on that used by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NPCompleteness," W. H. Freem ..."
Abstract

Cited by 242 (0 self)
 Add to MetaCart
(Show Context)
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NPcompleteness. The presentation is modeled on that used by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NPCompleteness," W. H. Freeman & Co., New York, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, crossreferences will be given to that book and the list of problems (NPcomplete and harder) presented there. Readers who have results they would like mentioned (NPhardness, PSPACEhardness, polynomialtimesolvability, etc.) or open problems they would like publicized, should
Efficient topk query evaluation on probabilistic data
 in ICDE
, 2007
"... Modern enterprise applications are forced to deal with unreliable, inconsistent and imprecise information. Probabilistic databases can model such data naturally, but SQL query evaluation on probabilistic databases is difficult: previous approaches have either restricted the SQL queries, or computed ..."
Abstract

Cited by 180 (33 self)
 Add to MetaCart
(Show Context)
Modern enterprise applications are forced to deal with unreliable, inconsistent and imprecise information. Probabilistic databases can model such data naturally, but SQL query evaluation on probabilistic databases is difficult: previous approaches have either restricted the SQL queries, or computed approximate probabilities, or did not scale, and it was shown recently that precise query evaluation is theoretically hard. In this paper we describe a novel approach, which computes and ranks efficiently the topk answers to a SQL query on a probabilistic database. The restriction to topk answers is natural, since imprecisions in the data often lead to a large number of answers of low quality, and users are interested only in the answers with the highest probabilities. The idea in our algorithm is to run in parallel several MonteCarlo simulations, one for each candidate answer, and approximate each probability only to the extent needed to compute correctly the topk answers. The algorithms is in a certain sense provably optimal and scales to large databases: we have measured running times of 5 to 50 seconds for complex SQL queries over a large database (10M tuples of which 6M probabilistic). Additional contributions of the paper include several optimization techniques, and a simple data model for probabilistic data that achieves completeness by using SQL views. 1
Scalable Influence Maximization for Prevalent Viral Marketing in LargeScale Social Networks
"... Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling preval ..."
Abstract

Cited by 173 (13 self)
 Add to MetaCart
(Show Context)
Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in largescale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several realworld and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond millionsized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread — it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics to as much as 100%–260 % increase in influence spread.
ProbLog: a probabilistic Prolog and its application in link discovery
 In Proceedings of 20th International Joint Conference on Artificial Intelligence
, 2007
"... We introduce ProbLog, a probabilistic extension of Prolog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defi ..."
Abstract

Cited by 146 (27 self)
 Add to MetaCart
We introduce ProbLog, a probabilistic extension of Prolog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defined by the success probability of a query, which corresponds to the probability that the query succeeds in a randomly sampled program. The key contribution of this paper is the introduction of an effective solver for computing success probabilities. It essentially combines SLDresolution with methods for computing the probability of Boolean formulae. Our implementation further employs an approximation algorithm that combines iterative deepening with binary decision diagrams. We report on experiments in the context of discovering links in real biological networks, a demonstration of the practical usefulness of the approach. 1