## Tree-Based Reparameterization Framework for Analysis of Belief Propagation and Related Algorithms (2001)

Citations: 102 (21 self)

### BibTeX

@MISC{Wainwright01tree-basedreparameterization,
  author = {Martin Wainwright and Tommi Jaakkola and Alan Willsky},
  title = {Tree-Based Reparameterization Framework for Analysis of Belief Propagation and Related Algorithms},
  year = {2001}
}

### Abstract

We present a tree-based reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation (BP) or sum-product algorithm [39, 36], as well as a rich set of variations and extensions of belief propagation. Algorithms in this class can be formulated as a sequence of reparameterization updates, each of which entails re-factorizing a portion of the distribution corresponding to an acyclic subgraph (i.e., a tree). The ultimate goal is to obtain an alternative but equivalent factorization using functions that represent (exact or approximate) marginal distributions on cliques of the graph. Our framework highlights an important property of BP and the entire class of reparameterization algorithms: the distribution on the full graph is not changed. The perspective of tree-based updates gives rise to a simple and intuitive characterization of the fixed points in terms of tree consistency. We develop interpretations of these results in terms of information geometry. The invariance of the distribution, in conjunction with the fixed point characterization, enables us to derive an exact relation between the exact marginals on an arbitrary graph with cycles and the approximations provided by belief propagation and, more broadly, by any algorithm that minimizes the Bethe free energy. We also develop bounds on this approximation error, which illuminate the conditions that govern the accuracy of these approximations. Finally, we show how the reparameterization perspective extends naturally to more structured approximations (e.g., Kikuchi and variants [52, 37]) that operate over higher order cliques.
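The idea of an "alternative but equivalent factorization" in terms of marginals can be illustrated numerically. The following is a minimal sketch, not the authors' TRP algorithm: it assumes a small pairwise Markov random field with binary variables and random positive potentials, computes exact marginals by brute force, and checks whether the product of single-node marginals and pairwise ratios reproduces the joint. All names (`joint_from_potentials`, `marginal`, `reparameterized_joint`) and the example graphs are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def joint_from_potentials(node_pot, edge_pot):
    """Brute-force normalized joint of a pairwise Markov random field."""
    n = len(node_pot)
    shape = tuple(len(psi) for psi in node_pot)
    p = np.zeros(shape)
    for x in itertools.product(*(range(k) for k in shape)):
        val = 1.0
        for s in range(n):
            val *= node_pot[s][x[s]]
        for (s, t), psi in edge_pot.items():
            val *= psi[x[s], x[t]]
        p[x] = val
    return p / p.sum()


def marginal(p, keep):
    """Marginal of the joint array p over the axes in keep (ascending order)."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    return p.sum(axis=drop)


def reparameterized_joint(p, edges):
    """Product of node marginals T_s and ratios T_st / (T_s T_t) over edges."""
    n = p.ndim
    T = [marginal(p, (s,)) for s in range(n)]
    T_pair = {(s, t): marginal(p, (s, t)) for (s, t) in edges}
    q = np.zeros_like(p)
    for x in itertools.product(*(range(k) for k in p.shape)):
        val = 1.0
        for s in range(n):
            val *= T[s][x[s]]
        for (s, t) in edges:
            val *= T_pair[(s, t)][x[s], x[t]] / (T[s][x[s]] * T[t][x[t]])
        q[x] = val
    return q


# Assumed toy setup: 4 binary nodes, random positive potentials.
rng = np.random.default_rng(0)
n_nodes, n_states = 4, 2
node_pot = [rng.uniform(0.5, 2.0, size=n_states) for _ in range(n_nodes)]

# A tree (star centered at node 1); every edge is written with s < t.
tree_edges = [(0, 1), (1, 2), (1, 3)]
tree_pot = {e: rng.uniform(0.5, 2.0, size=(n_states, n_states)) for e in tree_edges}
p_tree = joint_from_potentials(node_pot, tree_pot)
q_tree = reparameterized_joint(p_tree, tree_edges)
tree_gap = np.abs(p_tree - q_tree).max()

# The same graph with one extra edge, which creates the cycle 0-1-2.
loop_edges = tree_edges + [(0, 2)]
loop_pot = dict(tree_pot)
loop_pot[(0, 2)] = rng.uniform(0.5, 2.0, size=(n_states, n_states))
p_loop = joint_from_potentials(node_pot, loop_pot)
q_loop = reparameterized_joint(p_loop, loop_edges)
loop_gap = np.abs(p_loop - q_loop).max()

print(f"tree: max |p - q| = {tree_gap:.2e}")
print(f"loop: max |p - q| = {loop_gap:.2e}")
```

On the tree, the gap should sit at floating-point level, since a tree-structured distribution factorizes exactly in terms of its node and edge marginals. On the graph with a cycle, a nonzero gap is expected in general, which is why the algorithms in this class work with approximate marginals while leaving the distribution on the full graph unchanged.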

### Citations

8622 | Elements of Information Theory - Cover, Thomas - 1991
Citation Context ...formation matrix . Therefore, the Hessian is positive semidefinite, so that is a convex function of . In addition, the exponential parameterization of (6) induces a certain form for the KL divergence [43] that will be useful in the sequel. Given two parameter vectors and , we denote by the KL divergence between the distributions and . This divergence can be written in the following form: Note that thi... |

7092 | Probabilistic reasoning in intelligent systems: networks of plausible inference - Pearl - 1988
Citation Context ...810642 0018-9448/03$17.00 © 2003 IEEE I. INTRODUCTION Probability distributions defined by graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing marginal distributions of variables at each node... |

3291 | Convex Analysis - Rockafellar - 1970
Citation Context ...ve set of parameters for a graphical distribution. More precisely, the quantities and are a dual set of parameters, related via the Legendre transform applied to the log partition function (see [41], [44], [45]). We will frequently need to consider mappings between these two parameterizations. In particular, the computation of the marginals can be expressed compactly as a map acting on the parameter v... |

1477 | Exactly Solved Models in Statistical Mechanics - Baxter - 1982
Citation Context ...Probability distributions defined by graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing marginal distributions of variables at each node of the graph. For acyclic gra... |

1293 | Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems - Lauritzen, Spiegelhalter - 1988
Citation Context ...e), then there exists a unique reparameterization specified by exact marginal distributions over cliques. Indeed, such a parameterization is the cornerstone of the junction tree representation (e.g., [27], [28]). For a graph with cycles, on the other hand, exact factorizations exposing these marginals do not generally exist. Nevertheless, it is always possible to reparameterize certain portions of any... |

1147 | Spatial interaction and the statistical analysis of lattice systems - Besag - 1974 |

1144 | Modern Graph Theory - Bollobás - 1998
Citation Context ... II. BACKGROUND This section provides background necessary for subsequent developments. We begin with the basics of graphical models, including the necessary preliminaries on graph theory. (See [34], [35] for more background on graph theory.) As for graphical models, there are a variety of different formalisms, including directed Bayesian networks [1], factor graphs [6], and Markov random fields [36].... |

1103 | Graphical Models - Lauritzen - 1996
Citation Context ...which are also known as hypertrees. In order to define these objects, we require the notions of tree decomposition and running intersection, which are well known in the context of junction trees (see [57], [37]). Given a hypergraph , a tree decomposition is an acyclic graph in which the nodes are formed by the maximal hyperedges of . Any intersection of two maximal hyperedges that are adjacent in the tr... |

912 | An Introduction to Bayesian Networks - Jensen - 1996
Citation Context ...en there exists a unique reparameterization specified by exact marginal distributions over cliques. Indeed, such a parameterization is the cornerstone of the junction tree representation (e.g., [27], [28]). For a graph with cycles, on the other hand, exact factorizations exposing these marginals do not generally exist. Nevertheless, it is always possible to reparameterize certain portions of any facto... |

835 | An Introduction to Variational Methods for Graphical Models - Jordan, Ghahramani, et al. - 1998 |

834 | Enumerative Combinatorics - Stanley - 1997
Citation Context ...set of ). The set of hyperedges can be viewed as a partially ordered set, where the partial ordering is specified by inclusion. More details on hypergraphs can be found in Berge [55], whereas Stanley [56] provides more information on partially ordered sets (also known as posets). Given two hyperedges and , one of three possibilities can hold: i) the hyperedge is contained within , in which case we wri... |

748 | Nonlinear Programming - Bertsekas - 1999
Citation Context ... constraint set , this allows us to establish equivalence of TRP fixed points with those of BP. C. Tree Reparameterization Updates as Projections Given a linear subspace and a vector , it is well known [47] that the projection under the Euclidean norm (i.e., ) is characterized by an orthogonality condition, or equivalently a Pythagorean relation. The main result of this subsection is to show that a simi... |

617 | Introduction to Graphical Models - Jordan, Bishop - 2004
Citation Context ...Probability distributions defined by graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing marginal distributions of variables at each node of t... |

584 | The computational complexity of probabilistic inference using bayesian belief networks - Cooper - 1990
Citation Context ... For acyclic graphs (i.e., trees), standard and highly efficient algorithms exist for this task. In contrast, exact solutions are prohibitively complex for more general graphs of any substantial size [10]. As a result, there has been considerable interest and effort aimed at developing approximate inference algorithms for large graphs with cycles. The belief propagation (BP) algorithm [11], [3], [1], ... |

529 | Algebraic Graph Theory - Biggs - 1974 |

470 | Learning low-level vision - Freeman, Pasztor, et al. - 2000
Citation Context ... graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing marginal distributions of variables at each node of the graph. For acyclic graphs (i.e., trees), standard and highly efficient algorithms ... |

429 | Generalized iterative scaling for log-linear models - Darroch, Ratcliff - 1972
Citation Context ...gman distance over a set formed by an intersection of linear constraints [32]. An example that involves a Bregman distance other than the Euclidean norm is the generalized iterative scaling algorithm [48], used to compute projections involving the KL divergence. A Pythagorean relation analogous to (32) is instrumental in establishing the convergence of such techniques [46], [32]. The problem of intere... |

398 | Generalized Belief Propagation - Yedidia, Freeman, et al. - 2000
Citation Context ... size [10]. As a result, there has been considerable interest and effort aimed at developing approximate inference algorithms for large graphs with cycles. The belief propagation (BP) algorithm [11], [3], [1], also known as the sum-product algorithm, e.g., [12], [13], [2], [6], is one important method for computing approximate marginals. The interest in this algorithm has been fueled in part by its u... |

316 | Turbo decoding as an instance of Pearl’s belief propagation algorithm - McEliece, MacKay, et al. - 1998
Citation Context ... observed; given instead are noisy observations of at some (or all) of the nodes, on which basis one would like to draw inferences about . For example, in the context of error-correcting codes (e.g., [2]), the collection represents the bits received from the noisy channel, whereas the vector represents the transmitted codeword. Similarly, in image processing or computer vision [8], the vector represe... |

307 | Understanding belief propagation and its generalizations - Yedidia, Freeman, et al. - 2001 |

296 | The generalized distributive law - Aji, McEliece - 2000
Citation Context ...rest and effort aimed at developing approximate inference algorithms for large graphs with cycles. The belief propagation (BP) algorithm [11], [3], [1], also known as the sum-product algorithm, e.g., [12], [13], [2], [6], is one important method for computing approximate marginals. The interest in this algorithm has been fueled in part by its use in fields such as artificial intelligence and computer ... |

265 | A family of algorithms for approximate Bayesian inference - Minka - 2001
Citation Context ...en problem. Second, our error analysis can be applied to the problem of assessing the relative accuracy of different approximations. As we discuss in Section VI, various extensions to BP (e.g., [33], [29], [25], [4]) can be analyzed from a reparameterization perspective, and a similar error analysis is applicable. Since the (intractable) partition function of the original model is the same regardless ... |

259 | I-divergence geometry of probability distributions and minimization problems, The Ann. Probab. 3 - Csiszár - 1975 |

257 | Parallel optimization: Theory, algorithms, and applications - Censor, Zenios - 1997
Citation Context ...roximation to the Kullback–Leibler (KL) divergence. This result establishes interesting links between TRP and successive projection algorithms for constrained minimization of Bregman distances (e.g., [32]). The Pythagorean result enables us to show that fixed points of the TRP algorithm satisfy the necessary conditions to be a constrained local minimum of , thereby enabling us to make contact with the... |

232 | Correctness of belief propagation in gaussian graphical models of arbitrary topology - Weiss, Freeman - 2001
Citation Context ...veral researchers [15]–[17], [11] have analyzed the single-cycle case, where belief propagation can be reformulated as a matrix powering method. For Gaussian processes on arbitrary graphs, two groups [18], [19], using independent methods, have shown that when BP converges, then the conditional means are exact but the error covariances are generally incorrect. For the special case of graphs correspondi... |

210 | A tourist guide through treewidth - Bodlaender - 1993 |

178 | Correctness of Local Probability Propagation in Graphical Models with Loops - Weiss - 1998
Citation Context ...antial size [10]. As a result, there has been considerable interest and effort aimed at developing approximate inference algorithms for large graphs with cycles. The belief propagation (BP) algorithm [11], [3], [1], also known as the sum-product algorithm, e.g., [12], [13], [2], [6], is one important method for computing approximate marginals. The interest in this algorithm has been fueled in part by ... |

157 | A new class of upper bounds on the log partition function - Wainwright, Jaakkola, et al.
Citation Context ...ph. What is required, in order to compute the expressions in Theorem 4, are upper bounds on the log-partition function. A class of upper bounds are available for the Ising model [52]; in related work [53], [45], we have developed a technique for upper bounding the log partition function of an arbitrary undirected graphical model. Such methods allow upper bounds on the expressions in Theorem 4 to be co... |

120 | Low-Density Parity Check Codes - Gallager - 1963
Citation Context ...gital Object Identifier 10.1109/TIT.2003.810642 0018-9448/03$17.00 © 2003 IEEE I. INTRODUCTION Probability distributions defined by graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing margina... |

109 | CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation - Yuille - 2002
Citation Context ...rgy associated with the graphical distribution, which inspired other researchers (e.g., [22], [23]) to develop more sophisticated algorithms for minimizing the Bethe free energy. Yedidia et al. also proposed extensions to BP based on cluster variational methods [24]; in subsequent work, various re... |

108 | Iterative decoding of compound codes by probability propagation in graphical models - Kschischang, Frey - 1998
Citation Context ...Probability distributions defined by graphs arise in a variety of fields, including coding theory, e.g., [5], [6], artificial intelligence, e.g., [1], [7], statistical physics [8], as well as image processing and computer vision, e.g., [9]. Given a graphical model, one important problem is computing marginal dis... |

94 | Modeling and estimation of multiresolution stochastic processes - Basseville, Benveniste, et al. - 1992
Citation Context ...n Trees as Reparameterization: Algorithms for optimal inference on trees have appeared in the literature of various fields, including coding theory [6], artificial intelligence [1], and system theory [39]. In broad overview, such algorithms consist of a recursive series of updates, in which "messages" are passed from node to node. 4 In Section VI, we shall consider more general Markov random fields th... |

75 | Loopy belief propagation and Gibbs measures - Tatikonda, Jordan |

61 | A theory of cooperative phenomena - Kikuchi - 1951
Citation Context ... other researchers (e.g., [22], [23]) to develop more sophisticated algorithms for minimizing the Bethe free energy. Yedidia et al. also proposed extensions to BP based on cluster variational methods [24]; in subsequent work, various researchers, e.g., [25], [4] have studied and explored such extensions. Tatikonda and Jordan [26] derived conditions for convergence of BP based on the unwrapped computat... |

54 | Differential geometry of curved exponential families: curvatures and information loss - Amari - 1982
Citation Context ...torization. Geometrically, this invariance means that successive iterates are confined to an affine subspace of exponential parameters (i.e., an -flat manifold in terms of information geometry (e.g., [30], [31]). We then show how each TRP update can be viewed as a projection onto an -flat manifold formed by the constraints associated with each tree. We prove that a Pythagorean-type result holds for su... |

52 | Graphs, Networks and Algorithms - Jungnickel - 1999 |

51 | The geometry of turbo-decoding dynamics - Richardson - 2000
Citation Context ...nd effort aimed at developing approximate inference algorithms for large graphs with cycles. The belief propagation (BP) algorithm [11], [3], [1], also known as the sum-product algorithm, e.g., [12], [13], [2], [6], is one important method for computing approximate marginals. The interest in this algorithm has been fueled in part by its use in fields such as artificial intelligence and computer vision... |

43 | Tailbiting map decoders - Anderson, Hladik - 1998
Citation Context ...s converge, the quality of the resulting approximations varies substantially. Recent work has yielded some insight into the dynamics and convergence properties of BP. For example, several researchers [15]–[17], [11] have analyzed the single-cycle case, where belief propagation can be reformulated as a matrix powering method. For Gaussian processes on arbitrary graphs, two groups [18], [19], using inde... |

43 | Iterative decoding on graphs with a single cycle - Aji, Horn, et al. - 1998 |

41 | Stochastic processes on graphs with cycles: geometric and variational approaches - Wainwright - 2002
Citation Context ... of parameters for a graphical distribution. More precisely, the quantities and are a dual set of parameters, related via the Legendre transform applied to the log partition function (see [41], [44], [45]). We will frequently need to consider mappings between these two parameterizations. In particular, the computation of the marginals can be expressed compactly as a map acting on the parameter vector ... |

38 | Nonlinear Programming. Athena Scientific - Bertsekas - 1995 |

37 | Embedded trees: estimation of Gaussian processes on graphs with cycles - Sudderth, Wainwright, et al. - 2004
Citation Context ...(if they converge), there is little reason to apply them in practice. There remains, however, the interesting problem of computing correct error covariances at each node: we refer the reader to [49], [50] for description of an embedded spanning tree method that efficiently computes both means and error covariances for a linear Gaussian problem on a graph with cycles. V. ANALYSIS OF THE APPROXIMATION E... |

36 | Information geometry of Boltzmann machines - Amari, Kurata, et al. - 1992
Citation Context ...milies of Distributions Central to our work are exponential representations of distributions, which have been studied extensively in statistics and applied probability theory (e.g., [31], [40], [30], [41], [42]). Given an index set , we consider a collection of potential functions associated with the graph . We let denote a vector of parameters, and then consider following distribution: (6a) 7 Here we... |

32 | Loopy-belief propagation for approximate inference: An empirical study - Murphy, Weiss, et al. - 2000
Citation Context ...6], is one important method for computing approximate marginals. The interest in this algorithm has been fueled in part by its use in fields such as artificial intelligence and computer vision, e.g., [14], [9], and also by the success of turbo codes and other graphical codes, for which the decoding algorithm is a particular instantiation of belief propagation, e.g., [5], [2], [6]. While there are vari... |

29 | Recursive algorithms for approximating probabilities in graphical models - Jaakkola, Jordan - 1997
Citation Context ...ce on the original graph. What is required, in order to compute the expressions in Theorem 4, are upper bounds on the log-partition function. A class of upper bounds are available for the Ising model [52]; in related work [53], [45], we have developed a technique for upper bounding the log partition function of an arbitrary undirected graphical model. Such methods allow upper bounds on the expressions... |

27 | Understanding Belief Propagation and its Generalizations, Mitsubishi - Yedidia, Freeman, et al. - 2000
Citation Context ...ing open problem. Second, our error analysis can be applied to the problem of assessing the relative accuracy of different approximations. As we discuss in Section VI, various extensions to BP (e.g., [33], [29], [25], [4]) can be analyzed from a reparameterization perspective, and a similar error analysis is applicable. Since the (intractable) partition function of the original model is the same regar... |

25 | A theorem about random fields - Grimmett - 1973
Citation Context ... [35] for more background on graph theory.) As for graphical models, there are a variety of different formalisms, including directed Bayesian networks [1], factor graphs [6], and Markov random fields [36]. With some caveats, these different representations are essentially equivalent. In this paper, we will make use of the formalism of Markov random fields, which are defined by undirected graphs. Mor... |

24 | Belief propagation and statistical physics - Pakzad, Anantharam - 2002
Citation Context ... sophisticated algorithms for minimizing the Bethe free energy. Yedidia et al. also proposed extensions to BP based on cluster variational methods [24]; in subsequent work, various researchers, e.g., [25], [4] have studied and explored such extensions. Tatikonda and Jordan [26] derived conditions for convergence of BP based on the unwrapped computation tree and links to Gibbs measures in statistical p... |

23 | Belief propagation on partially ordered sets - McEliece, Yildirim - 2002
Citation Context ...Second, our error analysis can be applied to the problem of assessing the relative accuracy of different approximations. As we discuss in Section VI, various extensions to BP (e.g., [33], [29], [25], [4]) can be analyzed from a reparameterization perspective, and a similar error analysis is applicable. Since the (intractable) partition function of the original model is the same regardless of the appr... |

23 | Tree-based modeling and estimation of Gaussian processes on graphs with cycles - Wainwright, Sudderth, et al. - 2001
Citation Context ...means (if they converge), there is little reason to apply them in practice. There remains, however, the interesting problem of computing correct error covariances at each node: we refer the reader to [49], [50] for description of an embedded spanning tree method that efficiently computes both means and error covariances for a linear Gaussian problem on a graph with cycles. V. ANALYSIS OF THE APPROXIMA... |