Results 1–10 of 46
Graphical models, exponential families, and variational inference
2008
"... The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building largescale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fiel ..."
Cited by 819 (28 self)

Abstract:
The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
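For reference, the duality this abstract names is the standard variational representation of the log-partition (cumulant) function; this is the textbook statement, not a quote from the paper:

    \[
      A(\theta) \;=\; \sup_{\mu \in \mathcal{M}} \bigl\{ \langle \theta, \mu \rangle - A^{*}(\mu) \bigr\},
    \]

where \mathcal{M} is the set of realizable mean parameters and the conjugate dual A^{*} coincides with the negative entropy on \mathcal{M}. Approximate inference schemes then arise by relaxing \mathcal{M}, approximating A^{*}, or both.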
Residual splash for optimally parallelizing belief propagation
In Artificial Intelligence and Statistics (AISTATS), 2009
"... As computer architectures move towards multicore we must build a theoretical understanding of parallelism in machine learning. In this paper we focus on parallel inference in graphical models. We demonstrate that the natural, fully synchronous parallelization of belief propagation is highly ineffici ..."
Cited by 68 (8 self)

Abstract:
As computer architectures move towards multicore, we must build a theoretical understanding of parallelism in machine learning. In this paper we focus on parallel inference in graphical models. We demonstrate that the natural, fully synchronous parallelization of belief propagation is highly inefficient. By bounding the achievable parallel performance in chain graphical models we develop a theoretical understanding of the parallel limitations of belief propagation. We then provide a new parallel belief propagation algorithm which achieves optimal performance. Using two challenging real-world tasks, we empirically evaluate the performance of our algorithm on large cyclic graphical models, where we achieve near-linear parallel scaling and outperform alternative algorithms.
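A minimal serial sketch of the residual-scheduling idea that splash-style scheduling builds on (this is generic residual BP, not the authors' parallel algorithm; the `unary`/`pairwise` representation and the binary-label restriction are assumptions made for brevity): messages sit in a priority queue keyed by how much they would change if recomputed, and the largest-residual message is updated first.

    import heapq
    import itertools
    import numpy as np

    def residual_bp(unary, pairwise, edges, max_updates=10000, tol=1e-6):
        """Serial residual-scheduled sum-product BP on a pairwise binary MRF.

        unary:    dict node -> length-2 numpy potential
        pairwise: dict (u, v) -> 2x2 numpy potential, indexed [x_u, x_v]
        edges:    list of undirected edges (u, v)
        """
        msgs, nbrs = {}, {}
        for (u, v) in edges:
            msgs[(u, v)] = np.ones(2) / 2  # message u -> v, uniform start
            msgs[(v, u)] = np.ones(2) / 2
            nbrs.setdefault(u, []).append(v)
            nbrs.setdefault(v, []).append(u)

        def fresh(s, t):
            # Recompute the message s -> t from current incoming messages.
            pre = unary[s].copy()
            for w in nbrs[s]:
                if w != t:
                    pre = pre * msgs[(w, s)]
            pot = pairwise[(s, t)] if (s, t) in pairwise else pairwise[(t, s)].T
            m = pre @ pot
            return m / m.sum()

        # Priority queue ordered by negated residual; every message starts
        # maximally stale so each gets computed at least once.
        tie = itertools.count()
        heap = [(-np.inf, next(tie), e) for e in msgs]
        heapq.heapify(heap)

        for _ in range(max_updates):
            if not heap:
                break
            _, _, (s, t) = heapq.heappop(heap)
            m = fresh(s, t)
            if np.abs(m - msgs[(s, t)]).max() < tol:
                continue  # stale entry; nothing changed enough to commit
            msgs[(s, t)] = m
            # Updating s -> t perturbs t's other outgoing messages:
            # re-queue them keyed by how much they would now change.
            for w in nbrs[t]:
                if w != s:
                    r = np.abs(fresh(t, w) - msgs[(t, w)]).max()
                    heapq.heappush(heap, (-r, next(tie), (t, w)))
        return msgs

As we understand it, the paper's splash variant generalizes this by updating whole subtrees rooted at high-residual vertices and parallelizing across those subtrees.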
A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems
"... Seven years ago, Szeliski et al. published an influential study on energy minimization methods for Markov random fields (MRF). This study provided valuable insights in choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the ph ..."
Cited by 48 (13 self)

Abstract:
Seven years ago, Szeliski et al. published an influential study on energy minimization methods for Markov random fields (MRF). This study provided valuable insights into choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the phenomenal success of random field models means that the kinds of inference problems we solve have changed significantly. Specifically, the models today often include higher-order interactions, flexible connectivity structures, large label spaces of different cardinalities, or learned energy tables. To reflect these changes, we provide a modernized and enlarged study. We present an empirical comparison of 24 state-of-the-art techniques on a corpus of 2,300 energy minimization instances from 20 diverse computer vision applications. To ensure reproducibility, we evaluate all methods in the OpenGM2 framework and report extensive results regarding runtime and solution quality. Key insights from our study agree with the results of Szeliski et al. for the types of models they studied. However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.
Approximate Inference in Graphical Models using LP Relaxations
2010
"... Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing, to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the vari ..."
Cited by 27 (1 self)

Abstract:
Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show to be closely related to the first lifting of the Sherali-Adams hierarchy, and is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion.
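For background on the objects this abstract manipulates (standard definitions, not quoted from the thesis): MAP inference in a pairwise model can be posed as a linear program over the marginal polytope \mathcal{M}, and relaxed to the local polytope \mathcal{L}:

    \[
      \max_{x} \sum_{i} \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)
      \;=\; \max_{\mu \in \mathcal{M}} \langle \theta, \mu \rangle
      \;\le\; \max_{\mu \in \mathcal{L}} \langle \theta, \mu \rangle,
    \]

where \mathcal{L} enforces only that edge pseudo-marginals are consistent with node pseudo-marginals. The cycle relaxation referred to above sits between the two, adding consistency constraints along cycles of the graph.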
Alphabet SOUP: A Framework for Approximate Energy Minimization
"... Many problems in computer vision can be modeled using conditional Markov random fields (CRF). Since finding the maximum a posteriori (MAP) solution in such models is NPhard, much attention in recent years has been placed on finding good approximate solutions. In particular, graphcut based algorith ..."
Cited by 14 (1 self)

Abstract:
Many problems in computer vision can be modeled using conditional Markov random fields (CRF). Since finding the maximum a posteriori (MAP) solution in such models is NP-hard, much attention in recent years has been placed on finding good approximate solutions. In particular, graph-cut based algorithms, such as α-expansion, are tremendously successful at solving problems with regular potentials. However, for arbitrary energy functions, message passing algorithms, such as max-product belief propagation, are still the only resort. In this paper we describe a general framework for finding approximate MAP solutions of arbitrary energy functions. Our algorithm (called Alphabet SOUP, for Sequential Optimization for Unrestricted Potentials) performs a search over variable assignments by iteratively solving subproblems over a reduced state-space. We provide a theoretical guarantee on the quality of the solution when the inner loop of our algorithm is solved exactly. We show that this approach greatly improves the efficiency of inference and achieves lower-energy solutions for a broad range of vision problems.
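A schematic of the outer loop the abstract describes, as we read it; every name here (`energy`, `domains`, `proposals`) is a hypothetical placeholder, and the brute-force inner solver stands in for whatever exact or approximate subproblem solver a real instance would use:

    import itertools

    def reduced_alphabet_search(energy, domains, proposals, rounds=10):
        """Outer loop: restrict variables to small alphabets, solve the
        restricted subproblem, keep the answer if it lowers the energy.

        energy(x):    energy of a full assignment tuple x
        domains:      list of full label sets, one per variable
        proposals(x): yields lists of reduced alphabets, each alphabet a
                      small label set containing x's current label
        """
        x = tuple(d[0] for d in domains)  # arbitrary initial assignment
        best = energy(x)
        for _ in range(rounds):
            improved = False
            for reduced in proposals(x):
                # Brute force stands in for the real inner solver (e.g.
                # BP or graph cuts on the reduced problem).
                cand = min(itertools.product(*reduced), key=energy)
                if energy(cand) < best:
                    x, best = cand, energy(cand)
                    improved = True
            if not improved:
                break
        return x, best

In practice the reduced alphabets would be small (e.g., each variable's current label plus a few proposal labels), so each restricted subproblem is far cheaper than the full MAP problem.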
More data means less inference: A pseudo-max approach to structured learning
"... The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structure both learning and inference are intractable. Here we show that it is possible to circumvent this difficulty when the distribution of training examples is rich enough, ..."
Cited by 12 (2 self)

Abstract:
The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structure both learning and inference are intractable. Here we show that it is possible to circumvent this difficulty when the distribution of training examples is rich enough, via a method similar in spirit to pseudo-likelihood. We show that our new method achieves consistency, and illustrate empirically that it indeed approaches the performance of exact methods when sufficiently large training sets are used. Many prediction problems in machine learning applications are structured prediction tasks. For example, in protein folding we are given a protein sequence and the goal is to predict the protein's native structure [14]. In parsing for natural language processing, we are given a sentence and the goal is to predict the most likely parse tree [2]. In these and many other applications, we can formalize the structured prediction problem as taking an input x (e.g., primary sequence, sentence) and predicting y (e.g., structure, parse) according to y = argmax_{ŷ ∈ Y} θ · φ(x, ŷ), where φ(x, ŷ) is a function that maps an input and a candidate assignment to a feature vector and Y denotes the space of all possible assignments.
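To make the contrast concrete (our schematic reading of the abstract, not necessarily the paper's exact formulation): exact structured learning asks the parameters to satisfy, for each training pair (x^{(m)}, y^{(m)}),

    \[
      \theta \cdot \phi\bigl(x^{(m)}, y^{(m)}\bigr) \;\ge\; \theta \cdot \phi\bigl(x^{(m)}, y\bigr)
      \quad \text{for all } y \in \mathcal{Y},
    \]

which requires global inference to check. A pseudo-likelihood-style surrogate instead enforces the inequality only against assignments that differ from y^{(m)} in a single coordinate, so each constraint is checkable without inference; the consistency claim says that with a sufficiently rich training distribution this cheaper objective still recovers the right predictor in the limit.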
Anytime AND/OR Depth-First Search for Combinatorial Optimization
"... One popular and efficient scheme for solving exactly combinatorial optimization problems over graphical models is depthfirst Branch and Bound. However, when the algorithm exploits problem decomposition using AND/OR search spaces, its anytime behavior breaks down. This paper 1) analyzes and demonstr ..."
Cited by 12 (5 self)

Abstract:
One popular and efficient scheme for exactly solving combinatorial optimization problems over graphical models is depth-first branch and bound. However, when the algorithm exploits problem decomposition using AND/OR search spaces, its anytime behavior breaks down. This paper 1) analyzes and demonstrates this inherent conflict between effective exploitation of problem decomposition (through AND/OR search spaces) and the anytime behavior of depth-first search (DFS), 2) presents a first scheme to address this issue while maintaining desirable DFS memory properties, and 3) analyzes and demonstrates its effectiveness. Our work is applicable to any problem that can be cast as search over an AND/OR search space.
Multiple choice learning: Learning to produce multiple structured outputs
In NIPS, 2012
"... We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximatel ..."
Cited by 9 (4 self)

Abstract:
We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M-Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper bound on this loss function. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy.
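A natural instance of the loss the abstract alludes to (our illustration; the paper's exact definition may differ) is an oracle set loss: with M hypotheses ŷ_1, ..., ŷ_M and a task loss ℓ,

    \[
      \ell_{\text{set}}\bigl(y, \{\hat{y}_1, \dots, \hat{y}_M\}\bigr)
      \;=\; \min_{1 \le m \le M} \ell\bigl(y, \hat{y}_m\bigr),
    \]

so a set of predictions is penalized only through its best member. This rewards diversity: the M outputs need not all be good, as long as one of them is.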
Message-Passing for Approximate MAP Inference with Latent Variables
"... We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid messagepassing algorithm to accomplish this. The hybr ..."
Cited by 9 (0 self)

Abstract:
We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm.
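The task described here is commonly called marginal MAP. Writing y for the max nodes and z for the sum nodes of a distribution p(y, z), it reads

    \[
      y^{*} \;=\; \arg\max_{y} \sum_{z} p(y, z),
    \]

which is generally harder than either pure MAP or pure marginalization, because the max and sum operators do not commute and neither a max-product nor a sum-product pass alone solves it.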