Graphical models, exponential families, and variational inference. Foundations and Trends
 Ihler (ihler@ics.uci.edu), University of California, Irvine. Michael
Cited by 428 (27 self)
Abstract
The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables and for building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances, including the key problems of computing marginals and modes of probability distributions, are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms (among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations) can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
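The conjugate duality between the cumulant function and the entropy can be checked numerically in the simplest exponential family. As an illustration we chose a Bernoulli variable with sufficient statistic x in {0, 1}: its cumulant function is A(theta) = log(1 + e^theta), and the variational representation reads A(theta) = sup over mu in [0, 1] of { theta * mu + H(mu) }, with the supremum attained at the mean parameter mu = sigmoid(theta). The sketch below approximates the supremum by grid search; the function names are hypothetical and the grid search is only for demonstration.

```python
import math

def log_partition(theta: float) -> float:
    """Cumulant (log-partition) function A(theta) = log(1 + e^theta) of a Bernoulli family."""
    # Numerically stable softplus.
    return max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))

def entropy(mu: float) -> float:
    """Entropy H(mu), in nats, of a Bernoulli variable with mean mu."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))

def variational_log_partition(theta: float, grid: int = 100_000) -> tuple[float, float]:
    """Approximate sup_{mu in [0, 1]} { theta * mu + H(mu) } on a uniform grid.

    Returns (approximate supremum, maximizing mu)."""
    best_val, best_mu = -math.inf, 0.0
    for k in range(grid + 1):
        mu = k / grid
        val = theta * mu + entropy(mu)
        if val > best_val:
            best_val, best_mu = val, mu
    return best_val, best_mu

theta = 1.3
value, mu_star = variational_log_partition(theta)
print(f"A(theta)        = {log_partition(theta):.6f}")
print(f"variational sup = {value:.6f}")
print(f"argmax mu       = {mu_star:.4f} vs sigmoid(theta) = {1 / (1 + math.exp(-theta)):.4f}")
```

The same identity, with the entropy replaced by an approximation and the mean-parameter set replaced by an outer or inner bound, is what the abstract describes as the source of sum-product, mean field and related approximations.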
Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
Cited by 75 (10 self)
Abstract
We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but, unlike max-product, it always converges and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster-based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches. Graphical models are an effective approach for modeling complex objects via local interactions. In such models, a distribution over a set of variables is assumed to factor according to the cliques of a graph, with potentials assigned to each clique. Finding the assignment with highest probability in these models is key to using them in practice, and is often referred to as the MAP (maximum a posteriori) assignment problem. In the general case the problem is NP-hard, with complexity exponential in the treewidth of the underlying graph.
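Block coordinate descent in the dual of the pairwise LP relaxation can be sketched in a few lines. The sketch below assumes the dual in the form used by later tutorial treatments of dual decomposition, L(delta) = sum_i max_{x_i} [theta_i(x_i) + sum_e delta_{e->i}(x_i)] + sum_{ij} max_{x_i,x_j} [theta_ij(x_i,x_j) - delta_{ij->i}(x_i) - delta_{ij->j}(x_j)], which upper-bounds the MAP value for any messages. Each edge update minimizes L exactly over that edge's two messages, so the bound can only decrease and no step size is needed. Function names and the random test model are hypothetical; this is a minimal illustration, not the authors' code.

```python
import itertools
import random

def mplp(K, unary, pairwise, sweeps=30):
    """MPLP-style edge updates on the pairwise LP dual (minimal sketch).

    unary[i][a]            : theta_i(a)
    pairwise[(i, j)][a][b] : theta_ij(a, b)
    Returns the dual objective before the first sweep and after every sweep."""
    n = len(unary)
    # delta[e][v][a]: message from edge e to its endpoint v, evaluated at x_v = a.
    delta = {e: {e[0]: [0.0] * K, e[1]: [0.0] * K} for e in pairwise}

    def node_belief(v, skip=None):
        # theta_v(a) plus messages into v from every incident edge except `skip`.
        b = list(unary[v])
        for e, msgs in delta.items():
            if e != skip and v in e:
                b = [x + m for x, m in zip(b, msgs[v])]
        return b

    def dual_value():
        total = sum(max(node_belief(v)) for v in range(n))
        for (i, j), msgs in delta.items():
            t = pairwise[(i, j)]
            total += max(t[a][b] - msgs[i][a] - msgs[j][b] for a in range(K) for b in range(K))
        return total

    history = [dual_value()]
    for _ in range(sweeps):
        for (i, j) in pairwise:
            t = pairwise[(i, j)]
            bi = node_belief(i, skip=(i, j))
            bj = node_belief(j, skip=(i, j))
            # Closed-form minimizer of the dual over this edge's two messages.
            delta[(i, j)][i] = [-0.5 * bi[a] + 0.5 * max(t[a][b] + bj[b] for b in range(K)) for a in range(K)]
            delta[(i, j)][j] = [-0.5 * bj[b] + 0.5 * max(t[a][b] + bi[a] for a in range(K)) for b in range(K)]
        history.append(dual_value())
    return history

def brute_force_map(K, unary, pairwise):
    """Exact MAP value by enumeration (only feasible for tiny models)."""
    n = len(unary)
    return max(
        sum(unary[v][x[v]] for v in range(n)) + sum(t[x[i]][x[j]] for (i, j), t in pairwise.items())
        for x in itertools.product(range(K), repeat=n)
    )

rng = random.Random(0)
K, n = 3, 4
unary = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
pairwise = {e: [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(K)] for e in edges}

history = mplp(K, unary, pairwise)
map_value = brute_force_map(K, unary, pairwise)
print(f"dual bound {history[-1]:.4f} >= MAP value {map_value:.4f}")
```

When the final bound equals the value of some assignment, that assignment is a certified MAP solution; otherwise the relaxation is not tight on this instance.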
Tightening LP Relaxations for MAP using Message Passing
Cited by 65 (10 self)
Abstract
Linear Programming (LP) relaxations have become powerful tools for finding the most probable (MAP) configuration in graphical models. These relaxations can be solved efficiently using message-passing algorithms such as belief propagation and, when the relaxation is tight, provably find the MAP configuration. The standard LP relaxation is not tight enough in many real-world problems, however, and this has led to the use of higher-order cluster-based LP relaxations. The computational cost increases exponentially with the size of the clusters, which limits the number and type of clusters we can use. We propose to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution. Our dual message-passing algorithm finds the MAP configuration in protein side-chain placement, protein design, and stereo problems, in cases where the standard LP relaxation fails.
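Why the standard (pairwise) LP relaxation can fail is easiest to see on a frustrated cycle. In the toy model below, which is our own illustrative choice and not taken from the paper, three binary variables form a triangle and every edge rewards disagreement. No assignment can make all three edges disagree, so the MAP value is 2, yet a fractional point with uniform node marginals satisfies all local marginalization constraints and scores 3. A triplet cluster over (x_0, x_1, x_2) would force the edge marginals to come from a single joint distribution, and the relaxation value would drop back to 2. All names are hypothetical.

```python
import itertools

# Three binary variables on a triangle; each edge scores 1 when its endpoints disagree.
edges = [(0, 1), (1, 2), (0, 2)]

def theta(a, b):
    return 1.0 if a != b else 0.0

# Exact MAP value by enumeration: an odd cycle cannot have all three edges disagreeing.
map_value = max(
    sum(theta(x[i], x[j]) for i, j in edges) for x in itertools.product((0, 1), repeat=3)
)

# A fractional point of the local (pairwise) polytope.
mu_node = {i: [0.5, 0.5] for i in range(3)}
mu_edge = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}

def locally_consistent(mu_node, mu_edge, tol=1e-12):
    """Check nonnegativity, normalization and marginalization (the local polytope constraints)."""
    for p in mu_node.values():
        if any(v < -tol for v in p) or abs(sum(p) - 1.0) > tol:
            return False
    for (i, j), m in mu_edge.items():
        if any(v < -tol for row in m for v in row):
            return False
        for a in (0, 1):
            if abs(m[a][0] + m[a][1] - mu_node[i][a]) > tol:  # sum over x_j
                return False
            if abs(m[0][a] + m[1][a] - mu_node[j][a]) > tol:  # sum over x_i
                return False
    return True

lp_value = sum(
    mu_edge[e][a][b] * theta(a, b) for e in edges for a in (0, 1) for b in (0, 1)
)
print(map_value, lp_value, locally_consistent(mu_node, mu_edge))
```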
MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies
, 2007
Cited by 45 (4 self)
Abstract
Finding the most probable assignment (MAP) in a general graphical model is known to be NP-hard, but good approximations have been attained with max-product belief propagation (BP) and its variants. In particular, it is known that using BP on a single-cycle graph, or tree-reweighted BP on an arbitrary graph, will give the MAP solution if the beliefs have no ties. In this paper we extend the setting under which BP can be used to provably extract the MAP. We define Convex BP as BP algorithms based on a convex free energy approximation and show that this class includes ordinary BP on single-cycle graphs, tree-reweighted BP and many other BP variants. We show that when there are no ties, fixed points of convex max-product BP will provably give the MAP solution. We also show that convex sum-product BP at sufficiently small temperatures can be used to solve linear programs that arise from relaxing the MAP problem. Finally, we derive a novel condition that allows us to recover the MAP solution even if some of the convex BP beliefs have ties. In experiments, we show that our theorems allow us to find the MAP in many real-world instances of graphical models where exact inference using the junction tree algorithm is impossible.
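The simplest setting in which "read off the MAP from the max-product beliefs when there are no ties" is exact is a tree. The sketch below, our own illustration on a chain with random real-valued potentials (so ties occur with probability zero), computes max-marginal beliefs with forward and backward max-product messages in the log domain, decodes each variable independently by its belief argmax, and compares the result against brute-force enumeration. It does not implement the convex free-energy variants discussed in the paper; function names are hypothetical.

```python
import itertools
import random

def chain_max_marginals(unary, pairwise):
    """Max-product on a chain x_0 - x_1 - ... - x_{n-1}, in the log domain.

    unary[i][a]: theta_i(a); pairwise[i][a][b]: theta_{i,i+1}(a, b).
    Returns beliefs b_i(a) = max over full assignments with x_i = a of the total score."""
    n, K = len(unary), len(unary[0])
    fwd = [[0.0] * K for _ in range(n)]  # message into x_i from the left
    bwd = [[0.0] * K for _ in range(n)]  # message into x_i from the right
    for i in range(1, n):
        for b in range(K):
            fwd[i][b] = max(fwd[i - 1][a] + unary[i - 1][a] + pairwise[i - 1][a][b] for a in range(K))
    for i in range(n - 2, -1, -1):
        for a in range(K):
            bwd[i][a] = max(bwd[i + 1][b] + unary[i + 1][b] + pairwise[i][a][b] for b in range(K))
    return [[fwd[i][a] + unary[i][a] + bwd[i][a] for a in range(K)] for i in range(n)]

rng = random.Random(1)
n, K = 5, 3
unary = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
pairwise = [[[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(K)] for _ in range(n - 1)]

beliefs = chain_max_marginals(unary, pairwise)
# Local decoding: pick each variable's belief argmax independently.
decoded = tuple(max(range(K), key=b.__getitem__) for b in beliefs)

def score(x):
    return sum(unary[i][x[i]] for i in range(n)) + sum(pairwise[i][x[i]][x[i + 1]] for i in range(n - 1))

brute = max(itertools.product(range(K), repeat=n), key=score)
print(decoded, brute, decoded == brute)
```

With ties, independent per-node argmax can mix pieces of different optimal assignments, which is why the no-ties condition (and the paper's extra condition for the tied case) matters.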
Message-passing for graph-structured linear programs: Proximal methods and rounding schemes
, 2008
Cited by 30 (1 self)
Abstract
The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov random fields. A line of work has focused on "tree-based" linear programming (LP) relaxations for the MAP problem. This paper develops a family of superlinearly convergent algorithms for solving these LPs, based on proximal minimization schemes using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman projections used to compute each proximal update. Different choices of the Bregman divergence lead to conceptually related but distinct LP-solving algorithms. We establish convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes of graph-structured rounding schemes, randomized and deterministic, for obtaining integral configurations from the LP solutions. Our deterministic rounding schemes use a "reparameterization" property of our algorithms so that when the LP solution is integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose a graph-structured randomized rounding scheme that applies to iterative LP-solving algorithms in general. We analyze the performance of our rounding schemes, giving bounds on the number of iterations required, when the LP is integral, for the rounding schemes to obtain the MAP solution. These bounds are expressed in terms of the strength of the potential functions and the energy gap, which measures how well the integral MAP solution is separated from other integral configurations. We also report simulations comparing these rounding schemes.
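A proximal scheme with a Bregman divergence is easiest to see on the smallest possible LP: maximizing <theta, mu> over a single probability simplex. With the KL divergence as the Bregman proximal term, each step mu_{t+1} = argmax_mu { <theta, mu> - (1/omega) KL(mu || mu_t) } has the closed form mu_{t+1}(x) proportional to mu_t(x) * exp(omega * theta(x)), so no inner loop is needed. This toy is our own illustration, not the paper's graph-structured algorithm (there the constraint set is the tree-based local polytope and each proximal step is computed by cyclic Bregman projections); the final argmax is only a trivial stand-in for a rounding step. Names and the value of omega are hypothetical.

```python
import math

def entropic_proximal_lp(theta, omega=0.5, iters=200):
    """Maximize <theta, mu> over the probability simplex with KL-proximal steps.

    Each step: mu_{t+1}(x) is proportional to mu_t(x) * exp(omega * theta(x)).
    The iterate is stored in log space to avoid underflow."""
    k = len(theta)
    log_mu = [-math.log(k)] * k  # start from the uniform distribution
    for _ in range(iters):
        scores = [lm + omega * t for lm, t in zip(log_mu, theta)]
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))  # log-sum-exp normalizer
        log_mu = [s - log_z for s in scores]
    return [math.exp(lm) for lm in log_mu]

theta = [0.2, 1.0, 0.7, -0.3]
mu = entropic_proximal_lp(theta)
# Trivial rounding: take the coordinate carrying the most mass.
rounded = max(range(len(mu)), key=mu.__getitem__)
print([round(m, 6) for m in mu], "rounded to vertex", rounded)
```

The iterates stay strictly inside the simplex and concentrate on the optimal vertex; the gap between the best and second-best entries of theta plays a role loosely analogous to the energy gap in the abstract's iteration bounds.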
Minimizing and learning energy functions for side-chain prediction
 In RECOMB 2007
, 2007
Cited by 23 (1 self)
Abstract
Side-chain prediction is an important subproblem of the general protein folding problem. Despite much progress in side-chain prediction, performance is far from satisfactory. As an example, the ROSETTA protocol, which uses simulated annealing to select the minimum-energy conformations, correctly predicts the first two side-chain angles for approximately 72% of the buried residues in a standard data set. Is further improvement more likely to come from better search methods, or from better energy functions? Given that exact minimization of the energy is NP-hard, it is difficult to get a systematic answer to this question. In this paper, we present a novel search method and a novel method for learning energy functions from training data that are both based on Tree-Reweighted Belief Propagation (TRBP). We find that TRBP can find the global optimum of the ROSETTA energy function in a few minutes of computation for approximately 85% of the proteins in a standard benchmark set. TRBP can also effectively bound the partition function, which enables using the Conditional Random Fields (CRF) framework for learning. Interestingly, finding the global minimum does not significantly improve side-chain prediction for ...
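The partition-function bound that makes tree-reweighted methods useful for CRF learning follows from convexity of log Z. Write the model parameters theta as a convex combination sum_T rho_T theta_T of tree-structured parameters; then log Z(theta) <= sum_T rho_T log Z(theta_T). The sketch below is our own toy instance, a binary triangle with three spanning trees of weight 1/3 each, so every edge appears with probability rho_e = 2/3 and its weight is divided by rho_e inside each tree. It evaluates the bound for this fixed decomposition only; TRBP additionally optimizes over decompositions, which this sketch does not do. Names and parameter values are hypothetical.

```python
import itertools
import math

def log_partition(unary, pairwise, n):
    """Exact log Z by enumeration for binary x_i in {0, 1}.

    unary[i]: weight on x_i; pairwise[(i, j)]: weight on x_i * x_j."""
    energies = []
    for x in itertools.product((0, 1), repeat=n):
        e = sum(unary[i] * x[i] for i in range(n))
        e += sum(w * x[i] * x[j] for (i, j), w in pairwise.items())
        energies.append(e)
    top = max(energies)
    return top + math.log(sum(math.exp(e - top) for e in energies))

n = 3
unary = [0.3, -0.5, 0.8]
pairwise = {(0, 1): 1.2, (1, 2): -0.7, (0, 2): 0.9}
exact = log_partition(unary, pairwise, n)

# Each spanning tree of the triangle drops one edge; uniform tree weights rho_T = 1/3
# give every edge appearance probability rho_e = 2/3, so sum_T rho_T * theta_T = theta.
rho_tree, rho_edge = 1.0 / 3.0, 2.0 / 3.0
bound = 0.0
for dropped in pairwise:
    tree_pairwise = {e: w / rho_edge for e, w in pairwise.items() if e != dropped}
    bound += rho_tree * log_partition(unary, tree_pairwise, n)

print(f"exact log Z = {exact:.4f}, tree-reweighted upper bound = {bound:.4f}")
```

On a real model each log Z(theta_T) would be computed by sum-product on the tree rather than by enumeration, which is what keeps the bound tractable.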
Lagrangian relaxation for MAP estimation in graphical models
 In: 45th Annual Allerton Conference on Communication, Control and Computing
, 2007
Cited by 21 (1 self)
Abstract
We develop a general framework for MAP estimation in discrete and Gaussian graphical models using Lagrangian relaxation techniques. The key idea is to reformulate an intractable estimation problem as one defined on a more tractable graph, but subject to additional constraints. Relaxing these constraints gives a tractable dual problem, one defined by a thin graph, which is then optimized by an iterative procedure. When this iterative optimization leads to a consistent estimate, one which also satisfies the constraints, it corresponds to an optimal MAP estimate of the original model. Otherwise there is a "duality gap", and we obtain a bound on the optimal solution. Thus, our approach combines convex optimization with dynamic programming techniques applicable to thin graphs. The popular tree-reweighted max-product (TRMP) method may be seen as solving a particular class of such relaxations, where the intractable graph is relaxed to a set of spanning trees. We also consider relaxations to a set of small induced subgraphs, thin subgraphs (e.g., loops), and a connected tree obtained by "unwinding" cycles. In addition, we propose a new class of multiscale relaxations that introduce "summary" variables. The potential benefits of such generalizations include reducing or eliminating the "duality gap" in hard problems, reducing the number of Lagrange multipliers in the dual problem, and accelerating convergence of the iterative optimization procedure.
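The "reformulate, relax the constraints, optimize the dual, check consistency" loop can be sketched on the smallest possible instance: maximize f1(x) + f2(x) over x in {0, ..., K-1} by giving each term its own copy of x and relaxing x1 == x2 with multipliers lam[x] on the indicator vectors. The dual L(lam) = max_{x1} [f1(x1) + lam[x1]] + max_{x2} [f2(x2) - lam[x2]] upper-bounds the optimum, and when the two maximizers agree the bound is attained, so the estimate is optimal. The paper relaxes onto spanning trees, thin subgraphs and multiscale structures; the two single-variable copies and the plain subgradient method used here are our own simplifications, and all names and data are hypothetical.

```python
def dual_decomposition(f1, f2, iters=200, step0=1.0):
    """Lagrangian relaxation sketch for max_x f1(x) + f2(x), x in {0, ..., K-1}.

    Copies x1, x2 are coupled by x1 == x2; the constraint is relaxed with
    multipliers lam, and the dual L(lam) is minimized by subgradient steps.
    Returns a list of (dual value, x1, x2) per iteration."""
    K = len(f1)
    lam = [0.0] * K
    history = []
    for t in range(iters):
        x1 = max(range(K), key=lambda x: f1[x] + lam[x])
        x2 = max(range(K), key=lambda x: f2[x] - lam[x])
        value = f1[x1] + lam[x1] + f2[x2] - lam[x2]
        history.append((value, x1, x2))
        if x1 == x2:
            # Consistent copies: the dual bound equals a feasible primal value, so it is optimal.
            break
        step = step0 / (t + 1)  # diminishing step size
        # A subgradient of L at lam is e_{x1} - e_{x2}; move against it.
        lam[x1] -= step
        lam[x2] += step
    return history

f1 = [1.0, 3.0, 0.0, 2.0]
f2 = [2.0, 0.0, 1.0, 2.5]
history = dual_decomposition(f1, f2)
opt = max(a + b for a, b in zip(f1, f2))
for value, x1, x2 in history:
    print(f"dual bound {value:.2f}  (x1={x1}, x2={x2})")
print("optimum:", opt)
```

If the copies never agree, the smallest dual value seen still bounds the optimum, and the remaining disagreement corresponds to the duality gap the abstract refers to.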