## Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference (2008)

Citations: 21 (7 self)

### BibTeX

@MISC{Hazan08norm-productbelief,

author = {Tamir Hazan and Amnon Shashua},

title = {Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference},

year = {2008}

}

### Abstract

Inference problems in graphical models can be represented as a constrained optimization of a free energy function. In this paper we treat both forms of probabilistic inference, estimating marginal probabilities of the joint distribution and finding the most probable assignment, through a unified message-passing algorithm architecture. In particular we generalize the Belief Propagation (BP) algorithms of sum-product and max-product and the tree-reweighted (TRW) sum-product and max-product algorithms (TRBP), and introduce a new set of convergent algorithms based on "convex-free-energy" and Linear-Programming (LP) relaxation as a zero-temperature of a convex-free-energy. The main idea of this work arises from taking a general perspective on the existing BP and TRBP algorithms while observing that they all are reductions from the basic optimization formula of $f + \sum_i h_i$
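To make the two inference tasks named in the abstract concrete, here is a minimal sketch of classical sum-product and max-product message passing on a small chain (a tree, where BP is exact). It is not the norm-product algorithm of the paper. The function names `chain_bp` and `brute_force` and the toy potentials `phi` and `psi` are illustrative assumptions, and the results are checked against exhaustive enumeration.

```python
from itertools import product


def chain_bp(phi, psi, combine):
    """Message passing on a chain x0 - x1 - ... - x_{n-1}.

    phi[i][a]    : local potential of variable i in state a
    psi[i][a][b] : pairwise potential between variable i (state a)
                   and variable i+1 (state b)
    combine      : `sum` gives sum-product, `max` gives max-product
    Returns unnormalized beliefs b[i][a].
    """
    n = len(phi)
    sizes = [len(p) for p in phi]
    # fwd[i][b]: message arriving at variable i from its left neighbour
    fwd = [[1.0] * k for k in sizes]
    for i in range(1, n):
        fwd[i] = [
            combine(fwd[i - 1][a] * phi[i - 1][a] * psi[i - 1][a][b]
                    for a in range(sizes[i - 1]))
            for b in range(sizes[i])
        ]
    # bwd[i][a]: message arriving at variable i from its right neighbour
    bwd = [[1.0] * k for k in sizes]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            combine(bwd[i + 1][b] * phi[i + 1][b] * psi[i][a][b]
                    for b in range(sizes[i + 1]))
            for a in range(sizes[i])
        ]
    return [[fwd[i][a] * phi[i][a] * bwd[i][a] for a in range(sizes[i])]
            for i in range(n)]


def normalize(v):
    z = sum(v)
    return [x / z for x in v]


def brute_force(phi, psi):
    """Exact marginals and MAP assignment by enumerating all states."""
    states = list(product(*[range(len(p)) for p in phi]))
    weights = []
    for x in states:
        w = 1.0
        for i, a in enumerate(x):
            w *= phi[i][a]
        for i in range(len(x) - 1):
            w *= psi[i][x[i]][x[i + 1]]
        weights.append(w)
    z = sum(weights)
    marg = [[0.0] * len(p) for p in phi]
    for x, w in zip(states, weights):
        for i, a in enumerate(x):
            marg[i][a] += w / z
    return marg, states[weights.index(max(weights))]


# Hypothetical toy model: three binary variables on a chain.
phi = [[1.0, 3.0], [1.0, 2.0], [3.0, 1.0]]
psi = [[[2.0, 1.0], [1.0, 2.0]], [[2.0, 1.0], [1.0, 2.0]]]

# Sum-product beliefs, normalized, are the marginal probabilities.
marginals = [normalize(b) for b in chain_bp(phi, psi, sum)]
# Max-product beliefs are max-marginals; their argmax decodes the MAP state
# (valid here because the maximizer is unique).
map_state = tuple(b.index(max(b)) for b in chain_bp(phi, psi, max))

exact_marginals, exact_map = brute_force(phi, psi)
```

The only difference between the two runs is the `combine` argument, which mirrors the abstract's point that marginal estimation and MAP assignment can share one message-passing architecture. On graphs with cycles this simple scheme loses its exactness guarantee, which is the setting the paper addresses.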

### Citations

7052 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ... methods for inference over (factor) graphs are message-passing algorithms which pass messages along the edges of the factor graph until convergence is reached. The belief-propagation (BP) algorithms [33] come in two varieties: the sum-product algorithm for computing marginal probabilities and the max-product algorithm for computing the MAP assignment. Citing [50], the centrality of inference using gr...

3267 | Convex Analysis
- Rockafellar
- 1970
Citation Context ... $= \emptyset$ when $x \notin C$. If $f(x)$ is also strictly convex on $C$ it is called a convex function of Legendre type, and its gradient mapping $\nabla f : C \to \mathbb{R}^n$ is continuous and one-to-one, and $\nabla f^* = (\nabla f)^{-1}$. Proof: [35], Theorem 26.1 and Theorem 26.5. The sets $\{x \mid a^\top x \ge b\}$ and $\{x \mid a^\top x \le b\}$ are called the closed half-spaces associated with the hyperplane. We say that two sets $C_1, C_2$ are separated by a hyperplan...

2112 | A New Approach to Linear Filtering and Prediction Problems
- Kalman
- 1960
Citation Context ... Those include the Viterbi algorithm [42], Gallager’s sum-product algorithm for decoding low-density parity check codes [15], the turbo-decoding algorithm [3], the Kalman filter for signal processing [25], and the transfer-matrix approach in statistical mechanics [1]. The BP algorithms are exact, i.e., the resulting marginal probabilities and the MAP assignments are the correct ones, when the factor g...

1476 | Exactly Solved Models in Statistical Mechanics
- Baxter
- 1982
Citation Context ...t algorithm for decoding low-density parity check codes [15], the turbo-decoding algorithm [3], the Kalman filter for signal processing [25], and the transfer-matrix approach in statistical mechanics [1]. The BP algorithms are exact, i.e., the resulting marginal probabilities and the MAP assignments are the correct ones, when the factor graph is free of cycles, a state of affairs the considerably li...

1304 | Near Shannon limit error-correcting coding and decoding: Turbo-Codes
- Berrou, Glavieux, et al.
- 1993
Citation Context ...endently derived under different disciplines. Those include the Viterbi algorithm [42], Gallager’s sum-product algorithm for decoding low-density parity check codes [15], the turbo-decoding algorithm [3], the Kalman filter for signal processing [25], and the transfer-matrix approach in statistical mechanics [1]. The BP algorithms are exact, i.e., the resulting marginal probabilities and the MAP assig...

1167 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
- 2001
Citation Context ...ding to the original nodes of the hypergraph and the other set corresponds to its hyperedges. In the context of graphical models such a bipartite graph representation is referred to as a factor graph [28] with variable nodes representing $\phi_i(x_i)$ and a factor node for each function $\psi_\alpha(x_\alpha)$. An edge connects a variable node $i$ with factor node $\alpha$ if and only if $x_i \in x_\alpha$, i.e., $x_i$ is an argument PRESENTED IN ...

1164 | Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
- Viterbi
- 1967
Citation Context ...Y IN ARTIFICIAL INTELLIGENCE (UAI), JULY 2008. 2 equivalent or very similar message-passing algorithms have been independently derived under different disciplines. Those include the Viterbi algorithm [42], Gallager’s sum-product algorithm for decoding low-density parity check codes [15], the turbo-decoding algorithm [3], the Kalman filter for signal processing [25], and the transfer-matrix approach in...

611 | Learning in Graphical Models
- Jordan, editor
- 1999
Citation Context ...sis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [24]. Probabilistic inference comes in two distinct forms and typically involves two slightly different algorithmic thrusts. One form of inference task is to obtain one global state of the joint distributi...

467 | Learning low-level vision
- Freeman, Pasztor
Citation Context ...plications covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [38], medical diagnosis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [24]. Probabilistic inference comes in two distinct fo...

413 | Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms
- Yedidia, Freeman, et al.
Citation Context ...he belief-propagation (BP) algorithms [33] come in two varieties: the sum-product algorithm for computing marginal probabilities and the max-product algorithm for computing the MAP assignment. Citing [50], the centrality of inference using graphical models and the utility of the BP algorithms for solving them is reflected in the fact that PRESENTED IN PART AT THE CONFERENCE ON UNCERTAINTY IN ARTIFICIA...

259 | The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming
- Bregman
- 1967
Citation Context ...ine sets $C_1 \cap \dots \cap C_n$ and the sub-problem in Eqn. 22 corresponds to the projection of $\mu_i$ onto the affine set $C_i$. Hildreth [22] extended the problem with open half spaces $C_i = \{x \mid a_i^\top x \le b_i\}$. Bregman [5] extended Hildreth’s problem setup by including any strictly convex function $f$. The special case of Entropy projections was introduced later by Csiszar [9], as I-projections. Dykstra [11], [10] was th...

258 | I-divergence geometry of probability distributions and minimization problems. The Annals of Probability
- Csiszár
- 1975
Citation Context ... half spaces $C_i = \{x \mid a_i^\top x \le b_i\}$. Bregman [5] extended Hildreth’s problem setup by including any strictly convex function $f$. The special case of Entropy projections was introduced later by Csiszar [9], as I-projections. Dykstra [11], [10] was the first to introduce general convex sets $C_i$ (i.e., going beyond affine sets or half-spaces) but limited the treatment to $f$ representing the Euclidean norm ...

256 | Graphical Models for Machine Learning and Digital Communication
- Frey
- 1998
Citation Context ...diagnosis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [24]. Probabilistic inference comes in two distinct forms and typically involves two slightly different algorithmic thrusts. One form of inference task is to obtain one global state of the joint dist...

176 | Correctness of local probability propagation in graphical models with loops
- Weiss
Citation Context ... such a case, the MAP assignment of $x_i$ corresponds to the index of highest entry of $b_i(x_i)$. In general convergence is not guaranteed, and the MAP assignment can be recovered only for specific graphs, [46], [2]. A. Inference using a Variational Principle The BP algorithms apply to tree-structured factor graphs yet are well defined for general factor graphs but without convergence or accuracy guarantees...

159 | A ring in graph theory
- Tutte
Citation Context ...he directed edge $j \to i$. It is possible to find such edge probabilities for the uniform distribution over all spanning trees by employing a variant of the matrix tree theorem for directed trees, [16], [41] p.141. In this formulation $c_i, c_{i\alpha} > 0$ but $c_\alpha = 0$. The problem with $c_\alpha = 0$ is that the function $f$ is no longer strictly convex. In the claim below we show how to convert a TRW setting according to th...

154 | A new class of upper bounds on the log partition function
- Wainwright, Jaakkola, et al.
- 2005
Citation Context ...ever, a convergent message passing algorithm for the general class of convex free energies is still lacking. The existing algorithms either employ damping heuristics to ensure convergence in practice [44] or focus on a sub-class of free energies where the entropy term is a positive combination of joint entropies [21]. The MAP assignment problem has been shown to be approximated by a Linear-Programming...

153 | Convex Analysis and Optimization; Athena Scientific
- Bertsekas, Nedić, et al.
- 2003
Citation Context ...results were reported in [16]. Figure 6 compares the running time of the convex-sum-product algorithm with a general convex solver performing conditional gradient descent on the primal energy function [4] which uses linear programming to find feasible search directions. We ran the algorithms on n × n grids where n = 2, 3, ..., 10. The stopping criterion for all algorithms was the same and based on a pr...

133 | Comparison of Graph Cuts with Belief Propagation for Stereo, using Identical MRF Parameters
- Tappen, Freeman
- 2003
Citation Context ...olving inference using graphical models come up in a wide range of applications covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [38], medical diagnosis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistica...

131 | MAP estimation via agreement on trees: message-passing and linear programming
- Wainwright, Jaakkola, et al.
- 2005
Citation Context ... of free energies where the entropy term is a positive combination of joint entropies [22]. The MAP assignment problem has been shown to be approximated by a Linear-Programming (LP) relaxation scheme [43] with message-passing algorithmic attempts as a solution [45], [18]. All attempts so far guarantee convergence only under special cases (such as binary variables). A double-loop of message passing usi...

119 | Low-density parity check codes
- Gallager
- 1963
Citation Context ...-passing algorithms have been independently derived under different disciplines. Those include the Viterbi algorithm [42], Gallager’s sum-product algorithm for decoding low-density parity check codes [15], the turbo-decoding algorithm [3], the Kalman filter for signal processing [25], and the transfer-matrix approach in statistical mechanics [1]. The BP algorithms are exact, i.e., the resulting margin...

108 | CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation
- Yuille
- 2002
Citation Context ... written on top of the inference package by Talya Meltzer available at http://www.cs.huji.ac.il/~talyam/. generated by direct optimization of the non-convex Bethe free energy using the CCCP algorithm [49]. The input graph for those experiments was the Ising model on a two dimensional 8 × 8 grid. The distribution has the form $p(x) \propto \exp\big(\sum_{ij \in E} \theta_{ij} x_i x_j + \sum_i \theta_i x_i\big)$, where $\theta_{ij}, \theta_i$ are parameters, $x_i \in \{\pm 1\}$, and E...

91 | Finding MAPs for belief networks is NP-hard
- Shimony
- 1994
Citation Context ...to obtain each of its possible $n_i$ values. In this paper, we will focus on both inference problems with the objective of introducing a unifying algorithmic thrust. Exact inference is, however, NP-hard [37] thus introducing the need to derive algorithms for approximate inference. One of the most popular class of methods for inference over (factor) graphs are message-passing algorithms which pass message...

75 | Loopy belief propagation and Gibbs measures
- Tatikonda, Jordan
- 2002
Citation Context ...t is well-defined and often gives surprisingly good approximate results for graphical models with cycles. However, in this context there are no convergence guarantees (except under some special cases [39], [19], [31]) and the algorithms fail to converge in many cases of interest. During the past decade there has been much progress in putting forward a framework for approximate inference using variatio...

74 | Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations
- Globerson, Jaakkola
- 2007
Citation Context ...of joint entropies [22]. The MAP assignment problem has been shown to be approximated by a Linear-Programming (LP) relaxation scheme [43] with message-passing algorithmic attempts as a solution [45], [18]. All attempts so far guarantee convergence only under special cases (such as binary variables). A double-loop of message passing using a proximal minimization technique proposed recently by [32] is c...

62 | An algorithm for restricted least squares regression
- Dykstra
- 1983
Citation Context ...e vector $\mu_i$ onto the convex set $C_i$. In that case, following some algebraic manipulations (such as eliminating $\mu_i$ among other manipulations) the scheme (with $A = 0$) reduces to the well known Dykstra [11] (also goes under different names such as Hildreth, Bregman, Csiszar, Han) successive projection algorithm which has its origins in the work of Von-Neumann [30]. Further historical details can be foun...

58 | A theory of cooperative phenomena
- Kikuchi
- 1951
Citation Context ... the marginal consistency constraints to better approximate the full probability simplex constraints. This effort includes Kikuchi free energy, region graphs and other hyper-graph based methods [50], [26]. The second thrust looks for convergence guaranteed message-passing algorithms by extending the Bethe free energy to form a wider class of functions, known as convex free energies, which are convex i...

57 | On the uniqueness of loopy belief propagation fixed points
- Heskes
Citation Context ...ell-defined and often gives surprisingly good approximate results for graphical models with cycles. However, in this context there are no convergence guarantees (except under some special cases [37], [20], [29]) and the algorithms fail to converge in many cases of interest. During the past decade there has been much progress in putting forward a framework for approximate inference using variational pr...

57 | Approximate Inference and Protein-Folding
- Yanover, Weiss
Citation Context ...up in a wide range of applications covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [36], medical diagnosis [24], protein folding [47], computer graphics [13], [8], clustering [34], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [25]. Probabilistic inference ...

51 | On the Optimality of Tree-reweighted Max-product Message-passing
- Kolmogorov, Wainwright
- 2007
Citation Context ... $\sum_{\alpha \in N(i)} c_\alpha$. These max-product type algorithms are not guaranteed to converge, but whenever they converge one can extract an optimal solution for a pairwise linear program with binary variables, cf. [27] theorem 4 and [29] corollary 2. The third family corresponds to concave entropy approximation with $c_\alpha, c_i, c_{i\alpha} \ge 0$. These convex-max-product algorithms are guaranteed to converge to the global optimum...

49 | Tree-based reparameterization for approximate estimation on graphs with cycles
- Wainwright, Jaakkola, et al.
- 2001
Citation Context ...s for any factor graph. An important member of this class is the tree-reweighted (TRW) free energy which consists of a linear combination of free energies defined on spanning trees of the factor graph [43]. It is notable that for this specific member of convex free energies a convergent message-passing algorithm, applicable to pairwise factors only, has been recently introduced [16]. However, a converg...

46 | Maximum Weight Matching via Max-Product Belief Propagation
- Bayati, Shah, et al.
- 2005
Citation Context ...a case, the MAP assignment of $x_i$ corresponds to the index of highest entry of $b_i(x_i)$. In general convergence is not guaranteed, and the MAP assignment can be recovered only for specific graphs, [46], [2]. A. Inference using a Variational Principle The BP algorithms apply to tree-structured factor graphs yet are well defined for general factor graphs but without convergence or accuracy guarantees. The...

45 | MAP estimation, linear programming and belief propagation with convex free energies
- Weiss, Yanover, et al.
- 2007
Citation Context ...ation of joint entropies [21]. The MAP assignment problem has been shown to be approximated by a Linear-Programming (LP) relaxation scheme [45] with message-passing algorithmic attempts as a solution [47], [17]. All attempts so far guarantee convergence only under special cases (such as binary variables) with the exception of [17] which we will show arises as a special case of our algorithm. A double-...

35 | The patch transform and its applications to image editing
- Cho, Butman, et al.
- 2008
Citation Context ...ions covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [38], medical diagnosis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [24]. Probabilistic inference comes in two distinct forms a...

33 | Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation
- Meltzer, Yanover, et al.
- 2005
Citation Context ... max-product type algorithms are not guaranteed to converge, but whenever they converge one can extract an optimal solution for a pairwise linear program with binary variables, cf. [27] theorem 4 and [29] corollary 2. The third family corresponds to concave entropy approximation with $c_\alpha, c_i, c_{i\alpha} \ge 0$. These convex-max-product algorithms are guaranteed to converge to the global optimum for a pairwise lin...

30 | Message-passing for graph-structured linear programs: Proximal methods and rounding schemes
- Ravikumar, Agarwal, et al.
Citation Context ...inary variables) with the exception of [17] which we will show arises as a special case of our algorithm. A double-loop of message passing using a proximal minimization technique proposed recently by [34] is convergent but at a considerable computational expense. In this paper, we derive a class of approximate inference message-passing algorithms, which we call norm-product algorithms, using the notio...

30 | Fractional belief propagation
- Wiegerinck, Heskes
- 2002
Citation Context ...e in this paper but for the sake of clarity we leave it outside the current scope. The Bethe approximation $H_{\text{bethe}}(b)$ of the entropy can be written in a more general form, known as fractional entropy [48], $\sum_\alpha \bar c_\alpha H(b_\alpha) + \sum_i \bar c_i H(b_i)$ (2), where $\bar c_i = 1 - \sum_{\alpha \in N(i)} \bar c_\alpha$. Thus when the coefficients $\bar c_\alpha = 1$ for all factor nodes we obtain the Bethe approximation. A convex free energy is based on a result ...

26 | Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies
- Heskes
- 2006
Citation Context ...nting validity of marginals). When the factor graph has cycles the Bethe energy is non-convex and although it is possible to derive convergent algorithms to a local minima of the Bethe function [49], [22] the computational cost is large and thus has not gained popularity. To overcome the difficulty with the non-convexity of the Bethe approximation, several authors have introduced a class of approximat...

23 | A quadratic programming procedure
- Hildreth
- 1957
Citation Context ...primal solution is to find the projection of $b_0$ onto the intersection of the affine sets $C_1 \cap \dots \cap C_n$ and the sub-problem in Eqn. 22 corresponds to the projection of $\mu_i$ onto the affine set $C_i$. Hildreth [23] extended the problem with open half spaces $C_i = \{x \mid a_i^\top x \le b_i\}$. Bregman [5] extended Hildreth’s problem setup by including any strictly convex function $f$. The special case of Entropy projections w...

21 | Dual coordinate ascent methods for non-strictly convex minimization
- Tseng
- 1993
Citation Context ..., $\lambda_n^*$. Also, consider the primal sequence generated by $\nabla f^*(-\sum_i \lambda_i)$ computed from the dual sequence, then this primal sequence is bounded and its limit point is the optimal solution $x^*$. Proof: [40]. APPENDIX B THE PRIMAL-DUAL BLOCK ASCENT ALGORITHM We describe an algorithm for solving programs of the form $f(b) + \sum_i h_i(b)$ while solving sub-problems which consist of $f(b)$ and a single function $h_i$(...

21 | LP decoding
- Feldman, Karger, et al.
- 2003
Citation Context ...less task. Problems involving inference using graphical models come up in a wide range of applications covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [36], medical diagnosis [24], protein folding [47], computer graphics [13], [8], clustering [34], as well as other broad disciplines which include signal processing, artificial inte...

20 | Sufficient conditions for convergence of Loopy Belief Propagation
- Mooij, Kappen
- 2005
Citation Context ...fined and often gives surprisingly good approximate results for graphical models with cycles. However, in this context there are no convergence guarantees (except under some special cases [39], [19], [31]) and the algorithms fail to converge in many cases of interest. During the past decade there has been much progress in putting forward a framework for approximate inference using variational principl...

16 | Variational probabilistic inference and the QMR-DT database
- Jaakkola, Jordan
- 1999
Citation Context ...raphical models come up in a wide range of applications covering a variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [38], medical diagnosis [23], protein folding [49], computer graphics [13], [8], clustering [36], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [24]. Pr...

15 | Convergent message-passing algorithms for inference over general graphs with convex free energies
- Hazan, Shashua
- 2008
Citation Context ...sible $\sum_\alpha \big(c_\alpha + \sum_{i \in N(\alpha)} c_{i\alpha} - 1\big)^2$ (38), which is a least-squares criterion for uniformity of $\bar c_\alpha$. We refer to the two least-squares schemes as L2 convex free energy approximation. In an earlier work [18], we also used the maximum entropy approach where the criterion function minimizes $\sum_\alpha \bar c_\alpha \ln \bar c_\alpha$. Further investigation for constructing good convex free energy approximations can be found in [30]. T...

14 | Dykstra’s algorithm as the nonlinear extension of Bregman’s optimization method
- Bregman, Censor, et al.
- 1999
Citation Context ...s, in the particular case when $f(b) = \hat f(b)$, i.e., is essentially smooth, and $h_i(b) = \delta_{C_i}(b)$ (the indicator function of convex set $C_i$), the update step (b) for Algorithm 1 is a "Bregman" projection [6] of the vector $\mu_i$ onto the convex set $C_i$. In that case, following some algebraic manipulations (such as eliminating $\mu_i$ among other manipulations) the scheme (with $A = 0$) reduces to the well known Dy...

14 | Learning and inferring image segmentations using the GBP typical cut algorithm
- Shental, Zomet, et al.
- 2003
Citation Context ...variety of disciplines. Those include digital communications (error correcting codes [12]), computer vision [36], medical diagnosis [24], protein folding [47], computer graphics [13], [8], clustering [34], as well as other broad disciplines which include signal processing, artificial intelligence and statistical physics [14], [25]. Probabilistic inference comes in two distinct forms and typically invo...

12 | Convergent propagation algorithms via oriented trees
- Globerson, Jaakkola
- 2007
Citation Context ...of the factor graph [43]. It is notable that for this specific member of convex free energies a convergent message-passing algorithm, applicable to pairwise factors only, has been recently introduced [16]. However, a convergent message passing algorithm for the general class of convex free energies is still lacking. The existing algorithms either employ damping heuristics to ensure convergence in prac...

10 | An iterative procedure for obtaining I-projections onto the intersection of convex sets. The Annals of Probability, 13:975–984
- Dykstra
- 1985
Citation Context ...Bregman [5] extended Hildreth’s problem setup by including any strictly convex function $f$. The special case of Entropy projections was introduced later by Csiszar [9], as I-projections. Dykstra [11], [10] was the first to introduce general convex sets $C_i$ (i.e., going beyond affine sets or half-spaces) but limited the treatment to $f$ representing the Euclidean norm and the KL divergence. The view of the...

9 | The Dykstra algorithm with Bregman projections
- Censor, Reich
- 1998
Citation Context ...cial case of Algorithm 8 when $h_i = \delta_{C_i}$, where $C_i$ is a convex set, and $f$ is essentially smooth, i.e., $A = 0$, can be mapped (by eliminating step 2(a)) to a successive Bregman projection algorithm [6], [7] which is also known under the names of Dykstra, Hildreth, Han and Csiszar. This class of iterative projection schemes has a long history starting from Von-Neumann in the 50s [32] who introduced the c...

9 | Convexifying the Bethe free energy
- Meshi, Jaimovich, et al.
- 2009
Citation Context ...rk [18], we also used the maximum entropy approach where the criterion function minimizes $\sum_\alpha \bar c_\alpha \ln \bar c_\alpha$. Further investigation for constructing good convex free energy approximations can be found in [30]. The desire towards uniformity, besides being used extensively in probabilistic settings, is motivated by the success of the Bethe free energy where $\bar c_\alpha = 1$. The Bethe free energy is non-convex for f...

8 | Functional Operators Vol
- von Neumann
- 1950
Citation Context ... = 0) reduces to the well known Dykstra [11] (also goes under different names such as Hildreth, Bregman, Csiszar, Han) successive projection algorithm which has its origins in the work of Von-Neumann [30]. Further historical details can be found in Appendix B. Another useful property of the algorithm is that it is well defined for non-convex primal energies. Specifically, we can establish the following r...