Results 1  10
of
10
A New Class of Upper Bounds on the Log Partition Function
 In Uncertainty in Artificial Intelligence
, 2002
"... Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distribution ..."
Abstract

Cited by 211 (32 self)
 Add to MetaCart
Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distributions in the exponential domain, that is applicable to an arbitrary undirected graphical model. In the special case of convex combinations of treestructured distributions, we obtain a family of variational problems, similar to the Bethe free energy, but distinguished by the following desirable properties: (i) they are convex, and have a unique global minimum; and (ii) the global minimum gives an upper bound on the log partition function. The global minimum is defined by stationary conditions very similar to those defining xed points of belief propagation (BP) or treebased reparameterization [see 13, 14]. As with BP fixed points, the elements of the minimizing argument can be used as approximations to the marginals of the original model. The analysis described here can be extended to structures of higher treewidth (e.g., hypertrees), thereby making connections with more advanced approximations (e.g., Kikuchi and variants [15, 10]).
MAP estimation via agreement on trees: Messagepassing and linear programming
, 2002
"... We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound ..."
Abstract

Cited by 190 (9 self)
 Add to MetaCart
(Show Context)
We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is tight if and only if all the tree distributions share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches to attempting to obtain tight upper bounds: (a) a treerelaxed linear program (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a treereweighted maxproduct messagepassing algorithm that is related to but distinct from the maxproduct algorithm. In this way, we establish a connection between a certain LP relaxation of the modefinding problem, and a reweighted form of the maxproduct (minsum) messagepassing algorithm.
MAP estimation via agreement on (hyper)trees: Messagepassing and linear programming approaches
 IEEE Transactions on Information Theory
, 2002
"... We develop an approach for computing provably exact maximum a posteriori (MAP) configurations for a subclass of problems on graphs with cycles. By decomposing the original problem into a convex combination of treestructured problems, we obtain an upper bound on the optimal value of the original ..."
Abstract

Cited by 142 (11 self)
 Add to MetaCart
(Show Context)
We develop an approach for computing provably exact maximum a posteriori (MAP) configurations for a subclass of problems on graphs with cycles. By decomposing the original problem into a convex combination of treestructured problems, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original problem. Next we present and analyze two methods for attempting to obtain tight upper bounds: (a) a treereweighted messagepassing algorithm that is related to but distinct from the maxproduct (minsum) algorithm; and (b) a treerelaxed linear program (LP), which is derived from the Lagrangian dual of the upper bounds. Finally, we discuss the conditions that govern when the relaxation is tight, in which case the MAP configuration can be obtained. The analysis described here generalizes naturally to convex combinations of hypertreestructured distributions.
TreeBased Reparameterization Framework for Analysis of Belief Propagation and Related Algorithms
, 2001
"... We present a treebased reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation or sumproduct algorithm [39, 36], as well as a rich set of variations and ext ..."
Abstract

Cited by 121 (23 self)
 Add to MetaCart
(Show Context)
We present a treebased reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation or sumproduct algorithm [39, 36], as well as a rich set of variations and extensions of belief propagation. Algorithms in this class can be formulated as a sequence of reparameterization updates, each of which entails refactorizing a portion of the distribution corresponding to an acyclic subgraph (i.e., a tree). The ultimate goal is to obtain an alternative but equivalent factorization using functions that represent (exact or approximate) marginal distributions on cliques of the graph. Our framework highlights an important property of BP and the entire class of reparameterization algorithms: the distribution on the full graph is not changed. The perspective of treebased updates gives rise to a simple and intuitive characterization of the fixed points in terms of tree consistency. We develop interpretations of these results in terms of information geometry. The invariance of the distribution, in conjunction with the fixed point characterization, enables us to derive an exact relation between the exact marginals on an arbitrary graph with cycles, and the approximations provided by belief propagation, and more broadly, any algorithm that minimizes the Bethe free energy. We also develop bounds on this approximation error, which illuminate the conditions that govern their accuracy. Finally, we show how the reparameterization perspective extends naturally to more structured approximations (e.g., Kikuchi and variants [52, 37]) that operate over higher order cliques.
A comparison of algorithms for inference and learning in probabilistic graphical models
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
"... Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwr ..."
Abstract

Cited by 66 (4 self)
 Add to MetaCart
(Show Context)
Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing image input into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graphbased probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization (EM) algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sumproduct algorithm and “loopy ” belief propagation. We describe how each technique can be applied in a model of multiple, occluding objects, and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses”. In: arXiv eprints
, 2012
"... We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects th ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
(Show Context)
We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a nonGaussian distribution. The proof exploits acombination of ideas from thegeometryof exponentialfamilies, junctiontreetheory,andconvexanalysis. Thesepopulationlevel results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation for missing or corrupted observations. We provide nonasymptotic guarantees for such methods, and illustrate the sharpness of these predictions via simulations. 1. Introduction. Graphical
Messagepassing and linear programming approaches
, 2003
"... We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound ..."
Abstract
 Add to MetaCart
(Show Context)
We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of treestructured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree distributions share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches to attempting to obtain tight upper bounds: (a) a treerelaxed linear program (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a treereweighted messagepassing algorithm that is related to but distinct from the maxproduct (minsum) algorithm. Finally, we discuss the conditions that govern when the relaxation is tight, in which case the MAP configuration can be obtained. The analysis described here generalizes naturally to convex combinations of hypertreestructured distributions.
Covariance and entropy in Markov random fields
"... Abstract—We consider families of Markov random fields (MRFs) on an undirected graph using the exponential family representation. In earlier work [13] we proved that if the statistic that defines a family of MRFs is positively correlated, then the entropy is monotone decreasing in the exponential par ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract—We consider families of Markov random fields (MRFs) on an undirected graph using the exponential family representation. In earlier work [13] we proved that if the statistic that defines a family of MRFs is positively correlated, then the entropy is monotone decreasing in the exponential parameters. In this paper we address the converse, specifically within the context of the Ising model. The statistic for an edge is viewed as positive or negative as it favors similar or dissimilar values at the endpoints of the edge. We show that for an acyclic Ising model with no self statistics, the statistic is positively correlated regardless of the polarity of the edges. We further show that for a cyclic Ising model, the statistic is positively correlated if and only if the statistic is not frustrated; and that the entropy is monotone decreasing in the exponential parameters, if and only if the statistic is not frustrated. I. PREAMBLE In this paper we pick up the discussion started in [13] and continued in [14]; namely, examining the relationship between the statistic defining a family of Markov random fields (MRFs) and the behavior of informationtheoretic quantities within that family of MRFs. In particular, we address the question of whether monotonicity of entropy over the family of MRFs implies that the statistic is positively correlated. The converse was shown in [13]. The interest in such questions arises from both engineering [15] and social science [16], [7] concerns. Let G = (V, E) be a graph. A family of exponential distributions is specified by a vector statistic t = (tij) defined on the endpoints of the edges E of the graph. 1 That is, for a given image x = {xi: i ∈ V} and each edge {i, j} ∈ E, the function tij: Xi×Xj − → R determines the contribution of the pair (xi, xj) to the probability of x. We say that X is Markov with respect to G, in that conditioning on a cutset renders separated subsets independent of one another [17]. The entire family of MRFs based on t is generated by introducing an exponential parameter θ = (θij) where for each edge {i, j}, θij scales the sensitivity of the distribution p(x) = p(x; θ) to the function tij. Specifically, if X is an MRF based on t with exponential parameter θ, the probability of an image x is p(x; θ) = exp { ∑ θijtij(xi, xj) − Φ(θ)}