Graphical models, exponential families, and variational inference
, 2008
Cited by 453 (25 self)
The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances, including the key problems of computing marginals and modes of probability distributions, are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms (among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations) can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
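The conjugate duality mentioned in this abstract can be written out as a short sketch. The notation is an assumption chosen here for illustration and does not appear in the listing: $\phi$ is the vector of sufficient statistics, $\theta$ the canonical parameters, $A$ the cumulant (log partition) function, $\mathcal{M}$ the set of realizable mean parameters, and $A^{*}$ the conjugate dual of $A$, which equals the negative entropy on the interior of $\mathcal{M}$. A discrete state space is assumed.

```latex
\begin{align*}
p_\theta(x) &= \exp\{\langle \theta, \phi(x) \rangle - A(\theta)\},
  & A(\theta) &= \log \sum_{x} \exp\langle \theta, \phi(x) \rangle, \\
A(\theta) &= \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu \rangle - A^{*}(\mu)\},
  & \max_{x} \langle \theta, \phi(x) \rangle &= \max_{\mu \in \mathcal{M}} \langle \theta, \mu \rangle .
\end{align*}
```

Read this way, computing a likelihood or marginals corresponds to the first variational problem and computing a most probable configuration to the second; approximate algorithms can then be viewed as replacing $\mathcal{M}$ or $A^{*}$ with tractable surrogates.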
A New Class of Upper Bounds on the Log Partition Function
 In Uncertainty in Artificial Intelligence
, 2002
Cited by 159 (27 self)
Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distributions in the exponential domain, that is applicable to an arbitrary undirected graphical model. In the special case of convex combinations of tree-structured distributions, we obtain a family of variational problems, similar to the Bethe free energy, but distinguished by the following desirable properties: (i) they are convex, and have a unique global minimum; and (ii) the global minimum gives an upper bound on the log partition function. The global minimum is defined by stationary conditions very similar to those defining fixed points of belief propagation (BP) or tree-based reparameterization [see 13, 14]. As with BP fixed points, the elements of the minimizing argument can be used as approximations to the marginals of the original model. The analysis described here can be extended to structures of higher treewidth (e.g., hypertrees), thereby making connections with more advanced approximations (e.g., Kikuchi and variants [15, 10]).
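The basic inequality behind such bounds can be checked numerically on a toy model. The sketch below is an illustration only, not the paper's optimized bound: it assumes a hypothetical 3-node cycle with spins in {-1, +1}, made-up parameter values, and a uniform distribution over the three spanning trees. Because the log partition function is convex in the parameters, any set of tree parameters whose weighted average equals the original parameters gives an upper bound by Jensen's inequality.

```python
import itertools
import math


def log_partition(node_theta, edge_theta):
    """Exact log partition function by brute-force enumeration.

    node_theta: list of per-node parameters.
    edge_theta: dict mapping an edge (i, j) to its coupling weight.
    Each variable takes values in {-1, +1}.
    """
    n = len(node_theta)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        score = sum(node_theta[i] * x[i] for i in range(n))
        score += sum(w * x[i] * x[j] for (i, j), w in edge_theta.items())
        total += math.exp(score)
    return math.log(total)


# Hypothetical model: a 3-node cycle with arbitrary illustrative parameters.
node_theta = [0.1, -0.2, 0.3]
edge_theta = {(0, 1): 0.8, (1, 2): -0.5, (0, 2): 0.6}

edges = list(edge_theta)
rho = 1.0 / len(edges)   # uniform weight on each of the 3 spanning trees
edge_prob = 2.0 / 3.0    # every edge appears in 2 of the 3 spanning trees

exact = log_partition(node_theta, edge_theta)

# Each spanning tree drops one edge. Rescaling the remaining couplings by
# 1 / edge_prob makes the rho-weighted average of the tree parameters equal
# to the original parameters, which is what Jensen's inequality requires.
bound = 0.0
for dropped in edges:
    tree_edges = {e: edge_theta[e] / edge_prob for e in edges if e != dropped}
    bound += rho * log_partition(node_theta, tree_edges)

print(f"exact log Z = {exact:.4f}, convex-combination bound = {bound:.4f}")
```

The abstract describes going further by optimizing over the tree parameters (and tree weights) to tighten the bound; this sketch fixes one feasible choice and only verifies that the bound holds.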
Learning to combine bottom-up and top-down segmentation
 in: European Conference on Computer Vision
Cited by 105 (0 self)
Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high-quality segmentations.
Collective classification in network data
, 2008
Cited by 104 (28 self)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
Cited by 80 (10 self)
We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but unlike max-product it always converges, and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster-based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches. Graphical models are an effective approach for modeling complex objects via local interactions. In such models, a distribution over a set of variables is assumed to factor according to cliques of a graph with potentials assigned to each clique. Finding the assignment with highest probability in these models is key to using them in practice, and is often referred to as the MAP (maximum a posteriori) assignment problem. In the general case the problem is NP-hard, with complexity exponential in the treewidth of the underlying graph.
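For context on the baseline this abstract refers to, here is a minimal sketch of standard max-product (in its log-domain, max-sum form) on a chain, where it is exact. It is not the convergent algorithm the paper proposes. The function names, the 4-variable binary chain, and the potential values are all hypothetical choices made for illustration.

```python
import itertools


def chain_map(unary, pairwise):
    """MAP assignment on a chain via max-product in the log domain.

    unary[t][s]: log-potential of state s at position t.
    pairwise[t][a][b]: log-potential linking state a at t to state b at t + 1.
    Returns (assignment, score of that assignment).
    """
    n, k = len(unary), len(unary[0])
    msg = list(unary[0])  # best score of any prefix ending in each state
    back = []             # back-pointers for recovering the argmax
    for t in range(1, n):
        new_msg, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: msg[a] + pairwise[t - 1][a][b])
            ptr.append(best_a)
            new_msg.append(msg[best_a] + pairwise[t - 1][best_a][b] + unary[t][b])
        msg = new_msg
        back.append(ptr)
    state = max(range(k), key=lambda s: msg[s])
    best_score = msg[state]
    assignment = [state]
    for ptr in reversed(back):  # trace the back-pointers from the last node
        state = ptr[state]
        assignment.append(state)
    assignment.reverse()
    return assignment, best_score


def score(x, unary, pairwise):
    """Total log-potential of a full assignment x."""
    total = sum(unary[t][x[t]] for t in range(len(x)))
    total += sum(pairwise[t][x[t]][x[t + 1]] for t in range(len(x) - 1))
    return total


# Hypothetical 4-variable binary chain with couplings that favor agreement.
unary = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.3], [0.4, 0.0]]
pairwise = [[[1.0, -1.0], [-1.0, 1.0]] for _ in range(3)]

assignment, best_score = chain_map(unary, pairwise)

# Brute-force check over all 2^4 assignments.
brute_score = max(
    score(x, unary, pairwise) for x in itertools.product(range(2), repeat=4)
)
print(assignment, best_score, brute_score)
```

On graphs with cycles the same message updates are no longer guaranteed to converge or to return the MAP assignment, which is the gap the abstract says its algorithm addresses.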
MRF optimization via dual decomposition: Message-passing revisited
 In ICCV
, 2007
Cited by 78 (6 self)
A new message-passing scheme for MRF optimization is proposed in this paper. This scheme inherits better theoretical properties than all other state-of-the-art message-passing methods and in practice performs as well as or better than them. It is based on the very powerful technique of Dual Decomposition [1] and leads to an elegant and general framework for understanding and designing message-passing algorithms that can provide new insights into existing techniques. Promising experimental results and comparisons with the state of the art demonstrate the extreme theoretical and practical potential of our approach.
Message passing algorithms for compressed sensing: I. motivation and construction
 Proc. ITW
, 2010
Cited by 70 (9 self)
In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements [1]. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the second of two conference papers describing the derivation of these algorithms, connection with related literature, extensions of original framework, and new empirical evidence. This paper describes the state evolution formalism for analyzing these algorithms, and some of the conclusions that can be drawn from this formalism. We carried out extensive numerical simulations to confirm these predictions. We present here a few representative results. I. GENERAL AMP AND STATE EVOLUTION We consider the model
Tightening LP Relaxations for MAP using Message Passing
Cited by 68 (12 self)
Linear Programming (LP) relaxations have become powerful tools for finding the most probable (MAP) configuration in graphical models. These relaxations can be solved efficiently using message-passing algorithms such as belief propagation and, when the relaxation is tight, provably find the MAP configuration. The standard LP relaxation is not tight enough in many real-world problems, however, and this has led to the use of higher-order cluster-based LP relaxations. The computational cost increases exponentially with the size of the clusters and limits the number and type of clusters we can use. We propose to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution. Our dual message-passing algorithm finds the MAP configuration in protein side-chain placement, protein design, and stereo problems, in cases where the standard LP relaxation fails.
Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes
 IEEE TRANS. INFORM. THEORY
, 2005
Cited by 68 (12 self)
The goal of the present paper is the derivation of a framework for the finite-length analysis of message-passing iterative decoding of low-density parity-check codes. To this end we introduce the concept of graph-cover decoding. Whereas in maximum-likelihood decoding all codewords in a code are competing to be the best explanation of the received vector, under graph-cover decoding all codewords in all finite covers of a Tanner graph representation of the code are competing to be the best explanation. We are interested in graph-cover decoding because it is a theoretical tool that can be used to show connections between linear programming decoding and message-passing iterative decoding. Namely, on the one hand it turns out that graph-cover decoding is essentially equivalent to linear programming decoding. On the other hand, because iterative, locally operating decoding algorithms like message-passing iterative decoding cannot distinguish the underlying Tanner graph from any covering graph, graph-cover decoding can serve as a model to explain the behavior of message-passing iterative decoding. Understanding the behavior of graph-cover decoding is tantamount to understanding