## Convexifying the Bethe free energy (2009)

Venue: | Conference on Uncertainty in Artificial Intelligence (UAI) |

Citations: | 11 - 2 self |

### BibTeX

@INPROCEEDINGS{Meshi09convexifyingthe,
    author = {Ofer Meshi and Ariel Jaimovich and Amir Globerson and Nir Friedman},
    title = {Convexifying the {B}ethe free energy},
    booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
    year = {2009}
}

### Abstract

The introduction of loopy belief propagation (LBP) revitalized the application of graphical models in many domains. Many recent works present improvements on the basic LBP algorithm in an attempt to overcome convergence and local optima problems. Notable among these are convexified free energy approximations that lead to inference procedures with provable convergence and quality properties. However, empirically LBP still outperforms most of its convex variants in a variety of settings, as we also demonstrate here. Motivated by this fact, we seek convexified free energies that directly approximate the Bethe free energy. We show that the proposed approximations compare favorably with state-of-the-art convex free energy approximations.

### Citations

817 | Inference from Iterative Simulation Using Multiple Sequences
- Gelman, Rubin
- 1992
Citation Context ...e random walk is close to the stationary distribution, we apply a heuristic convergence test by running in parallel several chains from different random starting points and comparing their statistics [1]. Once we determine convergence, we then use samples from the different runs to estimate A. Finally, we solve the optimization problem in Eq. (15) with and without enforcing the variable-valid constra... |

495 | Loopy belief propagation for approximate inference: An empirical study
- Murphy, Weiss, et al.
- 1999
Citation Context ...the free energy functional. In the last decade, loopy belief propagation (LBP), a simple local message passing procedure, proved to be empirically successful and was used in a variety of applications [10]. The seminal work of Yedidia et al. [20] merged these lines of work by formulating loopy belief propagation in terms of optimizing the Bethe free energy, an approximate free energy functional. LBP su... |

471 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2003
Citation Context ...terative optimization algorithm and requires less calls to the inference procedure. More importantly, while finding the optimal counting numbers in TRW is computationally hard for non-pairwise models [14], our method is naturally applicable in the more general setting. To evaluate the above adaptive strategy, we compare it with the convexBethe-c approximation and with the Bethe approximation in the sa... |

442 | Constructing free-energy approximations and generalized belief propagation algorithms
- Yedidia, Freeman, et al.
Citation Context ... long history in statistical physics is based on a weighted sum of local entropies $H_c(\mu) = \sum_r c_r H_r(\mu_r)$, where $r$ are subsets of variables (regions) and the coefficients $c_r$ are called counting numbers [21]. The approximate optimization problem then takes the form: $\log \tilde{Z}(\theta) = \max_{\mu \in L(G)} \{ \theta^T \mu + H_c(\mu) \}$. The entropy approximation is defined both by the choice of regions and by the choice of counting num... |

410 | Generalized belief propagation
- Yedidia, Freeman, et al.
- 2000
Citation Context ...ecade, loopy belief propagation (LBP), a simple local message passing procedure, proved to be empirically successful and was used in a variety of applications [10]. The seminal work of Yedidia et al. [20] merged these lines of work by formulating loopy belief propagation in terms of optimizing the Bethe free energy, an approximate free energy functional. LBP suffers from two inherent problems: it fail... |

163 | A new class of upper bounds on the log partition function
- Wainwright, Jaakkola, et al.
- 2002
Citation Context ...of the Bethe free energy [17, 22]. However, this still leaves the problem of local optima, and therefore the dependence of the solution on initial conditions. To alleviate this problem, several works [2, 6, 13, 15] construct convex free energy approximations, for which there is a single global optimum. Convexity also paved the way for the introduction of provably convergent message-passing algorithms for calcul... |

110 | CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation - Yuille |

52 | Approximate inference and constrained optimization
- Heskes, Albers, et al.
- 2003
Citation Context ...of the Bethe free energy [17, 22]. However, this still leaves the problem of local optima, and therefore the dependence of the solution on initial conditions. To alleviate this problem, several works [2, 6, 13, 15] construct convex free energy approximations, for which there is a single global optimum. Convexity also paved the way for the introduction of provably convergent message-passing algorithms for calcul... |

49 | Divergence measures and message passing
- Minka
- 2005
Citation Context ...e free energies can be written as: $H_c(\mu) - H_b(\mu) = \sum_\alpha (1 - c_\alpha) I_\alpha(\mu_\alpha)$, where $C_{vv}$ is the variable-valid subspace and $I_\alpha(\mu_\alpha) = \sum_{i \in \alpha} H_i(\mu_i) - H_\alpha(\mu_\alpha)$ is the multi-information of the distribution $\mu_\alpha$ (see also [9]). Since $I_\alpha(\mu_\alpha) \geq 0$ always holds, we get that if $c_\alpha \leq 1$ for all $\alpha$, then $H_c(\mu)$ is an upper bound on the Bethe entropy. This property is not only sufficient but also necessary if we want to find countin... |

47 | Belief optimization for binary networks: a stable alternative to loopy belief propagation
- Welling, Teh
- 2001
Citation Context ...due to the non-convexity of the Bethe free energy. Several approaches have been introduced to fix the non-convergence issue, so that LBP provably converges to a local optimum of the Bethe free energy [17, 22]. However, this still leaves the problem of local optima, and therefore the dependence of the solution on initial conditions. To alleviate this problem, several works [2, 6, 13, 15] construct convex f... |

45 | MAP estimation, linear programming and belief propagation with convex free energies
- Weiss, Yanover, et al.
- 2007
Citation Context ...of the Bethe free energy [17, 22]. However, this still leaves the problem of local optima, and therefore the dependence of the solution on initial conditions. To alleviate this problem, several works [2, 6, 13, 15] construct convex free energy approximations, for which there is a single global optimum. Convexity also paved the way for the introduction of provably convergent message-passing algorithms for calcul... |

38 | Estimating the wrong graphical model: Benefits in the computation-limited setting
- Wainwright
- 2006
Citation Context ...r optimization, it still inherited the local-optima problem of the Bethe optimization. More recently, convex free energy variants were shown to be particularly useful in the context of model selection [12]. Despite these merits, in terms of quality of the approximation, convex free energies are still often not competitive with Bethe and in fact result in poorer performance over a wide range of paramete... |

30 | Fractional belief propagation
- Wiegerinck, Heskes
Citation Context ...stand out as the main region of relatively low error. In fact, we note that to the best of our knowledge all free energy approximations suggested in the literature obey this variable-valid constraint [2, 4, 13, 19, 21]. The rightmost column of Figure 2 shows performance of variable-valid approximations. We notice that for almost all models tested the approximation improves as the counting numbers get closer to the ... |

28 | Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies
- Heskes
Citation Context ...on is a convex combination of tree entropies and is concave. Furthermore, it is an upper bound on the true $H(\mu)$ so that the optimum of Eq. (5) yields an upper bound on $\log Z(\theta)$. More recently, Heskes [5] derived a set of sufficient conditions for $c_\alpha, c_i$ to yield a concave function. He showed that an entropy approximation $H_c(\mu)$ is provably concave for $\mu \in L(G)$ if there exist auxiliary counting numbers ... |

21 | On the choice of regions for generalized belief propagation
- Welling
- 2004
Citation Context ...s. For simplicity, we limit ourselves to a common choice of regions, over variables and factors, although the results to follow can be generalized to more elaborate region choices (e.g., [16, 18]). In this case the approximate entropy takes the form: $H_c(\mu) = \sum_i c_i H_i(\mu_i) + \sum_\alpha c_\alpha H_\alpha(\mu_\alpha)$ (6), where $c_i$ and $c_\alpha$ are the counting numbers for variables and factors, respectively. Each set of counting num... |

16 | Convergent message-passing algorithms for inference over general graphs with convex free energies
- Hazan, Shashua
- 2008
Citation Context ..., for which there is a single global optimum. Convexity also paved the way for the introduction of provably convergent message-passing algorithms for calculating likelihood and marginal probabilities [3, 4]. Moreover, some of these approximations provide upper bounds on the partition function [2, 13]. Despite their algorithmic elegance and convergence properties, convex variants often do not provide bet... |

15 | Approximations with reweighted generalized belief propagation
- Wiegerinck
- 2005
Citation Context ...s. For simplicity, we limit ourselves to a common choice of regions, over variables and factors, although the results to follow can be generalized to more elaborate region choices (e.g., [16, 18]). In this case the approximate entropy takes the form: $H_c(\mu) = \sum_i c_i H_i(\mu_i) + \sum_\alpha c_\alpha H_\alpha(\mu_\alpha)$ (6), where $c_i$ and $c_\alpha$ are the counting numbers for variables and factors, respectively. Each set of counting num... |

13 | Convergent propagation algorithms via oriented trees
- Globerson, Jaakkola
- 2007
Citation Context ..., for which there is a single global optimum. Convexity also paved the way for the introduction of provably convergent message-passing algorithms for calculating likelihood and marginal probabilities [3, 4]. Moreover, some of these approximations provide upper bounds on the partition function [2, 13]. Despite their algorithmic elegance and convergence properties, convex variants often do not provide bet... |

13 | Estimation and marginalization using Kikuchi approximation methods
- Pakzad, Anantharam
- 2005
Citation Context ...er- or under-counting variables and factors in the approximate entropy $H_c(\mu)$. Furthermore, for tree structured distributions it has been shown that only valid counting numbers can yield exact results [11]. Figure 1 illustrates the structure of the above constraints in the space of counting numbers. Note that the Bethe approximation is the single choice of counting numbers that is both factor- and vari... |

9 | Approximate inference using conditional entropy decompositions
- Globerson, Jaakkola
- 2008

3 | The volume of relaxed boolean-quadric and cut polytopes
- Ko, Lee, et al.
- 1997
Citation Context ... by performing a random walk inside L(G). Starting with a random point inside L(G) (i.e., a set of consistent marginals µ) we sample a legal direction, find the two boundaries along this direction, and then sample uniformly a new point from within the bounded interval. A stra... [Footnote 1: Volume computations over such polytopes are generally difficult, but in some special cases may be solved in closed form [8].] |
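
The citation context for Minka (2005) above quotes an identity: for variable-valid counting numbers, the gap between a weighted entropy approximation and the Bethe entropy is a weighted sum of multi-informations, $H_c(\mu) - H_b(\mu) = \sum_\alpha (1 - c_\alpha) I_\alpha(\mu_\alpha)$. The following is a minimal numerical check of that identity, not code from the paper. It assumes a hypothetical toy model (three binary variables with pairwise factors on a triangle), takes "variable-valid" to mean $c_i = 1 - \sum_{\alpha \ni i} c_\alpha$, and uses the standard Bethe counting numbers $c_\alpha = 1$, $c_i = 1 - d_i$ with $d_i$ the number of factors containing variable $i$. All function and variable names are illustrative.

```python
import math
import random
from itertools import product


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(joint, keep):
    """Marginalize a table {assignment tuple: prob} onto the variables in `keep`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


random.seed(0)
n_vars = 3
factors = [(0, 1), (1, 2), (0, 2)]  # assumed toy structure: a triangle

# Marginals of any joint distribution are locally consistent, so they lie in L(G).
weights = {x: random.random() for x in product((0, 1), repeat=n_vars)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}

H_var = [entropy(marginal(joint, (i,)).values()) for i in range(n_vars)]
H_fac = [entropy(marginal(joint, f).values()) for f in factors]


def weighted_entropy(c_var, c_fac):
    """H_c = sum_i c_i H_i + sum_alpha c_alpha H_alpha."""
    return (sum(c * h for c, h in zip(c_var, H_var))
            + sum(c * h for c, h in zip(c_fac, H_fac)))


# Bethe counting numbers: c_alpha = 1, c_i = 1 - d_i.
degree = [sum(i in f for f in factors) for i in range(n_vars)]
H_bethe = weighted_entropy([1 - d for d in degree], [1.0] * len(factors))

# Arbitrary factor counting numbers, with variable counting numbers
# chosen to satisfy the assumed variable-valid constraint.
c_fac = [random.uniform(-0.5, 1.0) for _ in factors]
c_var = [1.0 - sum(c for c, f in zip(c_fac, factors) if i in f)
         for i in range(n_vars)]
H_c = weighted_entropy(c_var, c_fac)

# Multi-information I_alpha = sum_{i in alpha} H_i - H_alpha (non-negative).
multi_info = [sum(H_var[i] for i in f) - h for f, h in zip(factors, H_fac)]
rhs = sum((1 - c) * info for c, info in zip(c_fac, multi_info))

print(H_c - H_bethe, rhs)  # the two values should agree up to rounding
```

Because every $c_\alpha$ drawn here is at most 1 and each $I_\alpha$ is non-negative, the printed gap is also non-negative, which matches the upper-bound statement in the quoted context.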