## Graphical models, exponential families, and variational inference

Venue: | Foundations and Trends in Machine Learning |

Citations: | 438 - 25 self |

### BibTeX

@INPROCEEDINGS{Wainwright_graphicalmodels,

author = {Martin J. Wainwright and Michael I. Jordan},

title = {Graphical models, exponential families, and variational inference},

booktitle = {Foundations and Trends in Machine Learning},

year = {}

}

### Abstract

The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances, including the key problems of computing marginals and modes of probability distributions, are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms (among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations) can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
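
As a rough illustration of the conjugate duality the abstract refers to, the central variational representations can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear on this page: $\phi$ denotes the sufficient statistics, $\theta$ the canonical parameters, $A$ the cumulant (log partition) function, $A^*$ its conjugate dual, and $\mathcal{M}$ the set of realizable mean parameters $\mu = \mathbb{E}[\phi(X)]$.

$$
p_\theta(x) = \exp\{\langle \theta, \phi(x) \rangle - A(\theta)\},
\qquad
A(\theta) = \log \int \exp\langle \theta, \phi(x) \rangle \, \nu(dx)
$$

$$
A(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - A^*(\mu) \},
\qquad
A^*(\mu) = -H(p_{\theta(\mu)}) \ \text{for } \mu \text{ in the interior of } \mathcal{M}
$$

$$
\sup_{x} \, \langle \theta, \phi(x) \rangle = \sup_{\mu \in \mathcal{M}} \langle \theta, \mu \rangle
$$

Read this way, the first supremum covers likelihoods and marginals (it is attained at $\mu = \mathbb{E}_\theta[\phi(X)]$), and the last identity covers most probable configurations. Approximate methods can then be viewed as replacing $\mathcal{M}$, the entropy term, or both with tractable surrogates; for example, restricting $\mu$ to a tractable subset of $\mathcal{M}$, as mean field methods do, yields a lower bound on $A(\theta)$.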

### Citations

413 | Constructing free energy approximations and generalized belief propagation algorithms. Technical Report TR-2002-35, Mitsubishi Electric Research Laboratories, 2002. Available at http://www.merl.com
- Yedidia, Freeman, et al.
Citation Context ...nary graph, namely the 3 × 3 grid shown in Figure D.1(a). As illustrated in panel (b), we cluster the nodes into groups of four, which is known as Kikuchi 4-plaque clustering in statistical physics [268, 269]. We then form the augmented hypergraph G shown in panel (c), with hyperedge set E := V ∪ E′ ∪ {{1,2,4,5}, {2,3,5,6}, {4,5,7,8}, {5,6,8,9}}. The darkness of the boxes in this diagram reflects the dep... |

398 | Generalized belief propagation
- Yedidia, Freeman, et al.
- 2000
Citation Context ...updates in terms of the Bethe approximation. Although the approximation itself dates back to the work of Bethe [29], the connection to the sum-product algorithm was first elucidated by Yedidia et al. [268, 269]. We then describe various natural generalizations of the Bethe approximation, including Kikuchi clustering and other hypergraph-based methods [133, 269]. Finally, we describe expectation-propagation ... |

361 | Lectures on polytopes
- Ziegler
- 1995
Citation Context ...τu, τst, τsu, τtu) | ∃ τstu such that inequalities (8.41) hold}, corresponding to the projection of L2(G) down to R^L. A classical technique for computing such projections is Fourier–Motzkin elimination [24, 271]. It is based on the following two steps: (a) first express all the inequalities so that the variable τstu to be eliminated appears on the left-hand side; and (b) then combine the (≤) constraints with... |

265 | Codes and Decoding on General Graphs
- Wiberg
- 1996
Citation Context ...onnections, the fractional extreme points of the first-order LP relaxation have a very specific interpretation as pseudocodewords of the underlying code, studied in earlier work on iterative decoding [83, 86, 257]. Figure 8.3(b) provides a concrete illustration of a pseudocodeword that arises when the relaxation is applied to the toy code shown in Figure 8.3(b). Consider the ... |

230 | Correctness of belief propagation in Gaussian graphical models of arbitrary topology
- Weiss, Freeman
- 2001
Citation Context ...max-product message-passing updates (8.6) can be efficiently implemented with one recursion for the mean term (number a), and a second recursion for the variance component (see the papers [250, 262] for further details). For Gaussian max-product applied to a tree-structured problem, the updates are guaranteed to converge, and compute both the correct means µs = E[Xs] and variances σs^2 = E[X^2 ... |

177 | Correctness of local probability propagation in graphical models with loops - Weiss - 2000 |

156 | A new class of upper bounds on the log partition function - Wainwright, Jaakkola, et al. - 2005 |

121 | Multiresolution Markov models for signal and image processing
- Willsky
- 2002
Citation Context ...l in Example 3.5 of Section 3.3. Fig. 2.8 (a) The 4-nearest neighbor lattice model in 2D is often used for image modeling. (b) A multiscale quad tree approximation to a 2D lattice model [262]. Nodes in the original lattice (drawn in white) lie at the finest scale of the tree. The middle and top scales of the tree consist of auxiliary nodes (drawn in gray), introduced to model the fine sca... |

109 | CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation
- Yuille
- 2002
Citation Context ...entials, Dobrushin-type conditions and related contraction arguments guarantee both convergence of the updates, and as a consequence, uniqueness of the associated fixed point. A parallel line of work [111, 254, 270] has explored alternatives to sum-product that are guaranteed to converge, albeit at the price of increased computational cost. However, with the exception of trees and other special cases [110, 167, ... |

98 | A linear programming approach to max-sum problem: a review
- Werner
Citation Context ...ut to be a number of interesting connections between TRW max-product and a line of research, due to Schlesinger and collaborators, previously published in the Russian literature [140, 210]. The survey [256] provides a detailed overview of this line of work, and some connections to reweighted max-product and LP relaxation. In addition to the basic TRW algorithm [245] and the TRW-S scheduling studied by K... |

96 | Codes and iterative decoding on general graphs
- Wiberg, Loeliger, et al.
- 1995
Citation Context ...posing the “hard-core” constraint that xs = xt for all (s,t) ∈ E.) A classical way of studying the ordinary sum- and max-product algorithms, dating back to the work of Gallager [87] and Wiberg et al. [258], is via the computation tree associated with the message-passing updates. As illustrated in Figure 8.2(b), the computation tree is rooted at a particular vertex (1 in this case), and it tracks the pat... |

57 | Linear programming relaxations and belief propagation – an empirical study
- Yanover, Meltzer, et al.
Citation Context ...scheduling of the TRW max-product updates (8.21). Various forms of these reweighted max-product algorithms have been applied in problems such as segmentation and disparity problems in computer vision [137, 139, 168, 227, 251, 265], error-control coding [76], side-chain prediction [251, 266], and sensor fusion [46, 49]. There also turn out to be a number of interesting connections between TRW max-product and a line of research, ... |

47 | Variational approximations between mean field theory and the junction tree algorithm
- Wiegerinck
- 2000
Citation Context ...classes of tractable distributions that incorporate additional structure. This structured mean field approach was first proposed by Saul and Jordan [209], and further developed by various researchers [10, 259, 120]. Here, we capture the structured mean field idea by discussing a general form of the updates for an approximation based on an arbitrary subgraph F of the original graph G. We do not claim that these ... |

45 | MAP estimation, linear programming and belief propagation with convex free energies
- Weiss, Yanover, et al.
- 2007
Citation Context ...owever, this basic procedure for “convexification” is quite broadly applicable; as we describe, it yields convex analogs of other known variational methods, including Kikuchi and region-graph methods [246, 251, 260], as well as expectation-propagation approximation. It has also suggested novel variational methods, also based on the notion of convex combinations, including those based on planar graph decompositio... |

30 | Fractional belief propagation
- Wiegerinck, Heskes
- 2002
Citation Context ... observe that edge b appears in any spanning tree (i.e., it is a bridge), so that it must have edge appearance probability ρb = 1. In their work on fractional belief propagation, Wiegerinck and Heskes [261] examined the class of reweighted Bethe problems of the form (7.11), but without the requirement that the weights ρst belong to the spanning tree polytope. Although loosening this requirement does yie... |

27 | Log-determinant relaxation for approximate inference in discrete Markov random fields
- Wainwright, Jordan
- 2006
Citation Context ...based on convex combinations of tractable distributions, including the method of conditional entropy decompositions [95], and methods based on semidefinite constraints and log-determinant programming [248]. Apart from these known algorithms, there are a large number of novel methods based on convex variational relaxations that await discovery. We conclude the section by discussing some benefits of conv... |

27 | An idiosyncratic journey beyond mean field theory
- Yedidia
- 2000
Citation Context ...sequences at greater depth in Section 5.4. Despite these issues, the naive mean field approximation becomes asymptotically exact for certain types of models as the number of nodes m grows to infinity [12, 267]. An example is the ferromagnetic Ising model defined on the complete graph Km with suitably rescaled parameters θst > 0 for all (s,t) ∈ E; see Baxter [12] for further discussion of such exact cases. ... |

22 | Minimizing and learning energy functions for side-chain prediction
- Yanover, Schueler-Furman, et al.
- 2008
Citation Context ...ed max-product algorithms have been applied in problems such as segmentation and disparity problems in computer vision [137, 139, 168, 227, 251, 265], error-control coding [76], side-chain prediction [251, 266], and sensor fusion [46, 49]. There also turn out to be a number of interesting connections between TRW max-product and a line of research, due to Schlesinger and collaborators, previously published in... |

18 | Structured region graphs: Morphing EP into GBP - Welling, Minka, et al. |

15 | Approximations with reweighted generalized belief propagation
- Wiegerinck
- 2005
Citation Context ...owever, this basic procedure for “convexification” is quite broadly applicable; as we describe, it yields convex analogs of other known variational methods, including Kikuchi and region-graph methods [246, 251, 260], as well as expectation-propagation approximation. It has also suggested novel variational methods, also based on the notion of convex combinations, including those based on planar graph decompositio... |

13 | Markov image modeling - Woods - 1978 |

13 | The Maximum Entropy Method
- Wu
- 1997
Citation Context ...nonical parameters. 3.1 Exponential Representations via Maximum Entropy One way in which to motivate exponential family representations of graphical models is through the principle of maximum entropy [123, 264]. Here, so as to provide helpful intuition for our subsequent development, we describe a particularly simple version for a scalar random variable X. Suppose that given n independent and identically di... |

12 | Bayesian random fields: The Bethe-Laplace approximation
- Welling, Parise
- 2006
Citation Context ...e method, one could imagine using approximate sampling methods or variational methods. The use of such approximate methods and their impact on parameter estimation is still an active area of research [225, 231, 241, 243, 253]. 6.2 Partially Observed Models and Expectation–Maximization A more challenging version of parameter estimation arises in the partially observed setting, in which the random vector X ∼ pθ is not obser... |

10 | Belief optimization: A stable alternative to loopy belief propagation
- Welling, Teh
- 2001
Citation Context ...entials, Dobrushin-type conditions and related contraction arguments guarantee both convergence of the updates, and as a consequence, uniqueness of the associated fixed point. A parallel line of work [111, 254, 270] has explored alternatives to sum-product that are guaranteed to converge, albeit at the price of increased computational cost. However, with the exception of trees and other special cases [110, 167, ... |

9 | Exact MAP estimates via agreement on (hyper)trees: Linear programming and message-passing approaches
- Wainwright, Jaakkola, et al.
- 2003
Citation Context ...rative algorithm for exactly solving the Bethe variational problem on trees [268, 269]. In this section, we discuss the parallel link between the ordinary max-product algorithm and linear programming [245]. For a discrete MRF on the tree, the set M is given by the marginal polytope M(T), whose elements consist of a marginal probability vector µs(·) for each node, and joint probability matrix µst(·, ·)... |

9 | Treewidth-based conditions for exactness of the Sherali-Adams and Lasserre relaxations. Univ
- Wainwright, Jordan
- 2004
Citation Context ...positive semidefiniteness on covariance and other moment matrices. The moment matrix perspective allows for a unified treatment of various relaxations of marginal polytopes. See Wainwright and Jordan [247] for additional material on the ideas presented here. 9.1 Moment Matrices and Their Properties Given a random vector Y ∈ R^d, consider the collection of its second-order moments: λst = E[YsYt], for s,t... |

2 | Linear response for approximate inference - Welling, Teh - 2003 |