Results 1–10 of 13
Sum-Product Networks: A New Deep Architecture
"... The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sumproduct networks ..."
Abstract

Cited by 21 (4 self)
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. We show that if an SPN is complete and consistent it represents the partition function and all marginals of some graphical model, and give semantics to its nodes. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. We then propose learning algorithms for SPNs, based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform image completion better than state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex.
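The structure described above — indicator leaves, weighted sum nodes, and product nodes in a DAG — can be illustrated with a minimal sketch. The classes, weights, and variable names below are invented for illustration and are not from the paper; the SPN shown is a two-component mixture over two binary variables, and evaluating it with all indicators set to 1 yields the partition function, while fixing evidence yields a marginal.

```python
# Minimal SPN sketch (hypothetical, illustrative only).

class Leaf:
    """Indicator leaf for one assignment of one variable."""
    def __init__(self, var, val):
        self.var, self.val = var, val
    def eval(self, evidence):
        # Indicator is 1 if the variable is unobserved or matches its value.
        x = evidence.get(self.var)
        return 1.0 if x is None or x == self.val else 0.0

class Product:
    """Product node: multiplies its children (disjoint scopes)."""
    def __init__(self, *children):
        self.children = children
    def eval(self, evidence):
        p = 1.0
        for c in self.children:
            p *= c.eval(evidence)
        return p

class Sum:
    """Sum node: weighted sum of children (identical scopes)."""
    def __init__(self, weighted_children):
        self.weighted = weighted_children  # list of (weight, node)
    def eval(self, evidence):
        return sum(w * c.eval(evidence) for w, c in self.weighted)

# A complete and consistent SPN over two binary variables:
# a mixture of two fully factored components.
spn = Sum([
    (0.6, Product(Leaf("X1", 1), Leaf("X2", 1))),
    (0.4, Product(Leaf("X1", 0), Leaf("X2", 0))),
])

z = spn.eval({})            # all indicators 1 -> partition function (1.0 here)
p_x1 = spn.eval({"X1": 1})  # marginal P(X1 = 1) = 0.6
```

Note that both queries take one bottom-up pass over the network, which is the tractability property the abstract refers to.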
Learning Thin Junction Trees via Graph Cuts
"... Structure learning algorithms usually focus on the compactness of the learned model. However, for general compact models, both exact and approximate inference are still NPhard. Therefore, the focus only on compactness leads to learning models that require approximate inference techniques, thus redu ..."
Abstract

Cited by 8 (4 self)
Structure learning algorithms usually focus on the compactness of the learned model. However, for general compact models, both exact and approximate inference are still NP-hard. Therefore, the focus only on compactness leads to learning models that require approximate inference techniques, thus reducing their prediction quality. In this paper, we propose a method for learning an attractive class of models: bounded-treewidth junction trees, which permit both compact representation of probability distributions and efficient exact inference. Using the Bethe approximation of the likelihood, we transform the problem of finding a good junction tree separator into a minimum cut problem on a weighted graph. Using the graph cut intuition, we present an efficient algorithm with theoretical guarantees for finding good separators, which we recursively apply to obtain a thin junction tree. Our extensive empirical evaluation demonstrates the benefit of applying exact inference using our models to answer queries. We also extend our technique to learning low-treewidth conditional random fields, and demonstrate significant improvements over state-of-the-art block-L1 regularization techniques.
Learning Efficient Markov Networks
"... We present an algorithm for learning hightreewidth Markov networks where inference is still tractable. This is made possible by exploiting contextspecific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction tree ..."
Abstract

Cited by 6 (2 self)
We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners.
Approximate Inference by Compilation to Arithmetic Circuits
"... Arithmetic circuits (ACs) exploit contextspecific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and eva ..."
Abstract

Cited by 4 (3 self)
Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC 2F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC 2G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines.
Learning multilinear representations of distributions for efficient inference
"... Abstract We examine the class of multilinear representations (MLR) for expressing probability distributions over discrete variables. Recently, MLR have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that MLR is an ..."
Abstract

Cited by 2 (0 self)
We examine the class of multilinear representations (MLRs) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that the MLR is an expressive representation of discrete distributions and can be used to concisely represent classes of distributions which have exponential size in other commonly used representations, while supporting probabilistic inference in time linear in the size of the representation. Our key contribution is presenting techniques for learning bounded-size distributions represented using MLRs, which support efficient probabilistic inference. We demonstrate experimentally that the MLR representations we learn support accurate and very efficient inference. Keywords: Learning probability distributions · Multilinear polynomials · Probabilistic inference · Graphical models
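The "inference linear in the size of the representation" claim can be made concrete with a small sketch. The distribution and function names below are invented for illustration: a joint distribution over two binary variables is written as a multilinear polynomial in per-value indicator variables, and a marginal is computed by substituting 1 for every indicator of the summed-out variable.

```python
# Multilinear-representation sketch (hypothetical, illustrative only).
# The distribution is the multilinear polynomial
#   f(l_X, l_Y) = sum_{x,y} P(x, y) * l_X[x] * l_Y[y].

# Joint distribution over two binary variables (made-up numbers).
P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def network_poly(lx, ly):
    """Evaluate the multilinear polynomial for given indicator values."""
    return sum(p * lx[x] * ly[y] for (x, y), p in P.items())

# Marginal P(X = 1): set both Y indicators to 1 (sums Y out),
# and the X indicators to select X = 1.
p_x1 = network_poly(lx=[0, 1], ly=[1, 1])   # 0.3 + 0.4 = 0.7

# Sanity check: all indicators 1 evaluates the total probability mass.
total = network_poly(lx=[1, 1], ly=[1, 1])
```

Each query is a single evaluation of the polynomial, i.e., time linear in the number of its terms.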
Local Structure and Determinism in Probabilistic Databases
"... While extensive work has been done on evaluating queries over tupleindependent probabilistic databases, query evaluation over correlated data has received much less attention even though the support for correlations is essential for many natural applications of probabilistic databases, e.g., inform ..."
Abstract

Cited by 1 (0 self)
While extensive work has been done on evaluating queries over tuple-independent probabilistic databases, query evaluation over correlated data has received much less attention even though support for correlations is essential for many natural applications of probabilistic databases, e.g., information extraction, data integration, computer vision, etc. In this paper, we develop a novel approach for efficiently evaluating probabilistic queries over correlated databases where correlations are represented using a factor graph, a class of graphical models widely used for capturing correlations and performing statistical inference. Our approach exploits the specific values of the factor parameters and the determinism in the correlations, collectively called local structure, to reduce the complexity of query evaluation. Our framework is based on arithmetic circuits, factorized representations of probability distributions that can exploit such local structure. Traditionally, arithmetic circuits are generated following a compilation process and cannot be updated directly. We introduce a generalization of arithmetic circuits, called annotated arithmetic circuits, and a novel algorithm for updating them, which enables us to answer probabilistic queries efficiently. We present a comprehensive experimental analysis and show speedups of at least one order of magnitude in many cases.
Tractable Learning and Inference in High-Treewidth Graphical Models
, 2009
"... Probabilistic graphical models, by making conditional independence assumptions, can represent complex joint distributions in a factorized form. However, in large problems graphical models often run into two issues. First, in nontreelike graphs, computational issues frustrate exact inference. There ..."
Abstract
Probabilistic graphical models, by making conditional independence assumptions, can represent complex joint distributions in a factorized form. However, in large problems graphical models often run into two issues. First, in non-tree-like graphs, computational issues frustrate exact inference. There are several approximate inference algorithms that, while often working well, do not obey approximation bounds. Second, traditional learning methods are non-robust with respect to model errors – if the conditional independence assumptions of the model are violated, poor predictions can result. This thesis proposes two new methods for learning parameters of graphical models: implicit and procedural fitting. The goal of these methods is to improve the results of running a particular inference algorithm. Implicit fitting views inference as a large nonlinear energy function over predicted marginals. During learning, the parameters are adjusted to place the minima of this function close to the true marginals. Inspired by algorithms like loopy belief propagation, procedural fitting considers inference as a message-passing procedure. Parameters are adjusted while learning so that this message-passing process gives the best results. These methods are robust to both model errors and approximate inference.
Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition
"... To the best of our current knowledge, the cortex uses essentially the same learning and inference algorithms throughout. If this hypothesis is correct, discovering these algorithms should be the holy grail of machine learning. One way to pursue it is to focus on the neuroscience, attempting to under ..."
Abstract
To the best of our current knowledge, the cortex uses essentially the same learning and inference algorithms throughout. If this hypothesis is correct, discovering these algorithms should be the holy grail of machine learning. One way to pursue it is to focus on the neuroscience, attempting to understand and formalize what the “wetware” is doing. However, progress in this direction is hindered by our very limited current understanding of neurophysiology and (especially) neuroanatomy. Another approach is to focus on a simple task (e.g., digit recognition), develop learning algorithms that work on it, and hope they generalize to everything else the brain does. While this allows for a rapid experimental cycle, it seems unlikely that a single simple task will exhibit all the attributes of intelligence we need to capture, and we will need a lot of luck to find the “fundamental algorithms” in this way. A third, and perhaps the most promising, approach is to consider all the different things the brain does, and try to understand what they have in common. Is there a single set of capabilities that underlies vision, motion, language, commonsense reasoning, etc.? Viewed in this light, the entire AI and cognitive science literature becomes a rich source of potential clues. In particular, two themes appear repeatedly: the ability to handle noise, uncertainty, and incomplete information, which is well captured by probability and graphical models; and the ability to deal with complex situations, …
Advisor:
"... We examine the class of multilinear representations (MLR) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that MLR is an expressi ..."
Abstract
We examine the class of multilinear representations (MLRs) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that the MLR is an expressive representation of discrete distributions and can be used to concisely represent classes of distributions which have exponential size in other commonly used representations, while supporting probabilistic inference in time that is linear in the size of the representation. Our key contribution is presenting techniques for learning bounded-size distributions represented using MLRs, which support efficient probabilistic inference. We then demonstrate experimentally that the MLR representations we learn support accurate and very efficient inference.