## A comparison of algorithms for inference and learning in probabilistic graphical models (2005)

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 53 (4 self)

### BibTeX

@ARTICLE{Frey05acomparison,
  author  = {Brendan J. Frey and Nebojsa Jojic},
  title   = {A comparison of algorithms for inference and learning in probabilistic graphical models},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2005},
  volume  = {27},
  number  = {9},
  pages   = {1392--1416}
}

### Abstract

Computer vision is currently one of the most exciting areas of artificial intelligence research, largely because it has recently become possible to record, store and process large amounts of visual data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition and face detection, it is even more exciting that researchers may be on the verge of introducing computer vision systems that perform scene analysis, decomposing image input into its constituent objects, lighting conditions, motion patterns, and so on. Two of the main challenges in computer vision are finding efficient models of the physics of visual scenes and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms for computer vision and scene analysis. We review exact techniques and various approximate, computationally efficient techniques, including iterative conditional modes, the expectation maximization (EM) algorithm, the mean field method, variational techniques, structured variational techniques, Gibbs sampling, the sum-product algorithm and “loopy” belief propagation. We describe how each technique can be applied in a model of multiple, occluding objects, and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
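The abstract's unifying cost function, the free energy, can be illustrated on a toy problem. The sketch below is not code from the paper: the two-variable joint table and variable names are invented. It computes F(Q) = Σ_h Q(h)[ln Q(h) − ln P(h, v)] for one binary hidden RV and checks the standard property that F is a bound on −ln P(v), tight exactly at the true posterior.

```python
import math

# Toy joint over a hidden h and a visible v (both binary), P(h, v).
P = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def free_energy(Q, v):
    """F(Q) = sum_h Q(h) [ln Q(h) - ln P(h, v)]; always F >= -ln P(v)."""
    return sum(q * (math.log(q) - math.log(P[(h, v)]))
               for h, q in Q.items() if q > 0)

v = 1
p_v = sum(P[(h, v)] for h in (0, 1))               # P(v) = 0.5
posterior = {h: P[(h, v)] / p_v for h in (0, 1)}   # exact posterior P(h | v)

# At the exact posterior the bound is tight: F = -ln P(v).
assert abs(free_energy(posterior, v) + math.log(p_v)) < 1e-12
# Any other Q gives a strictly larger free energy.
assert free_energy({0: 0.5, 1: 0.5}, v) > -math.log(p_v)
```

Minimizing F over a restricted family of Q distributions is the common thread behind the approximate techniques the paper compares.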

### Citations

9193 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...ore tightly by the data. This observation becomes relevant when we study approximate inference techniques that obtain point estimates of the parameters, such as the expectation maximization algorithm [6]. We now turn to the general problem of inferring the values of unobserved (hidden) RVs, given the values of the observed (visible) RVs. Denote the hidden RVs by h and the visible RVs by v, and partition t…
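The snippet above mentions EM as a technique that obtains point estimates of the parameters. A minimal sketch, on an invented two-component coin-mixture problem (the data, component count, and initial guesses are all made up for illustration):

```python
# EM point estimates for a two-component coin mixture: each observation
# is a count of heads out of n flips of a coin drawn from one of two
# components with unknown biases (the assignment is the hidden RV).
data = [9, 8, 9, 1, 2, 1, 8, 2]    # heads out of n = 10 flips each
n = 10
pi, p0, p1 = 0.5, 0.6, 0.4          # initial guesses

for _ in range(50):
    # E-step: posterior responsibility of component 0 for each count.
    r = []
    for k in data:
        l0 = pi * p0**k * (1 - p0)**(n - k)
        l1 = (1 - pi) * p1**k * (1 - p1)**(n - k)
        r.append(l0 / (l0 + l1))
    # M-step: re-estimate mixing weight and the two coin biases.
    pi = sum(r) / len(data)
    p0 = sum(ri * k for ri, k in zip(r, data)) / (n * sum(r))
    p1 = sum((1 - ri) * k for ri, k in zip(r, data)) / (n * sum(1 - ri for ri in r))

# The recovered biases should straddle the two clusters (about 0.85 and 0.15).
assert min(p0, p1) < 0.3 and max(p0, p1) > 0.7
```

The E-step is exactly an inference step over the hidden assignments; the M-step is the point estimation of parameters referred to in the context.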

7556 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...er commenting on each of these issues, we briefly review 3 kinds of graphical model: Bayesian networks (BNs), Markov random fields (MRFs), and factor graphs (FGs). For a more extensive treatment, see [4, 9, 25, 26, 35]. Prior knowledge usually includes strong beliefs about the existence of hidden random variables (RVs) and the relationships between RVs in the system. This notion of “modularity” is a central aspect …

4100 |
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ... Repeat for a fixed number of iterations or until convergence. Algorithmically, this is a minor modification of ICM, but in many applications it is able to escape poor local minima (cf. [15, 19]). Also, the stochastically chosen values of the hidden RVs can be monitored to estimate the uncertainty in them under the posterior. If n counts the number of sampling steps, then the nth configuration of the hidden R…
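The context describes Gibbs sampling as a minor, stochastic modification of ICM: each hidden RV is sampled from its conditional rather than set to its mode. A hedged sketch on an invented two-variable model with two modes (the joint table is made up; this is not the paper's occlusion model):

```python
import random

random.seed(0)

# Unnormalized joint over two binary hidden RVs, with two strong modes.
def joint(h1, h2):
    return {(0, 0): 10.0, (1, 1): 8.0, (0, 1): 1.0, (1, 0): 1.0}[(h1, h2)]

def conditional(i, other):
    """Normalized conditional of h_i given the other RV, as a 2-entry table."""
    w = [joint(x, other) if i == 0 else joint(other, x) for x in (0, 1)]
    z = sum(w)
    return [wi / z for wi in w]

# ICM would set each RV to the mode of its conditional and can get stuck
# in one mode forever; Gibbs *samples* each RV, so it can escape.
h = [1, 1]                        # start in the weaker mode
counts = {(0, 0): 0, (1, 1): 0, (0, 1): 0, (1, 0): 0}
for _ in range(5000):
    for i in (0, 1):
        p = conditional(i, h[1 - i])
        h[i] = 0 if random.random() < p[0] else 1
    counts[tuple(h)] += 1

# Gibbs visits both modes, roughly in proportion to their probability.
assert counts[(0, 0)] > 100 and counts[(1, 1)] > 100
```

The `counts` table plays the role of the monitored samples mentioned in the context: it estimates posterior uncertainty from the visited configurations.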

1309 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
- 2001
Citation Context ...er commenting on each of these issues, we briefly review 3 kinds of graphical model: Bayesian networks (BNs), Markov random fields (MRFs), and factor graphs (FGs). For a more extensive treatment, see [4, 9, 25, 26, 35]. Prior knowledge usually includes strong beliefs about the existence of hidden random variables (RVs) and the relationships between RVs in the system. This notion of “modularity” is a central aspect …

1174 | Graphical Models
- Lauritzen
- 1996
Citation Context ...is the degree of the variable, i.e., the number of terms that it appears in. The denominator is meant to account for the overlap between the clique marginals. For trees, the Bethe approximation is exact (cf. [26]). Substituting the Bethe approximation for the term ln Q, we obtain the Bethe free energy, which approximates the true free ener…

999 | On the statistical analysis of dirty pictures
- Besag
- 1986
Citation Context ...nce techniques can be viewed as minimizing a cost function called “free energy” [33], which measures the accuracy of an approximate probability distribution. These include iterative conditional modes [3], the expectation maximization (EM) algorithm [6, 33], variational techniques [24, 33], structured variational techniques [24], Gibbs sampling [32] and the sum-product algorithm (a.k.a. loopy belief pro…

912 | A tutorial on learning with Bayesian networks, learning in graphical models
- Heckerman
- 1998
Citation Context ...t the structure of the BN in Fig. 4b, but not the number of classes. Unknown structure can be represented as a hidden RV, so that inference of this hidden RV corresponds to Bayesian model selection [17, 27]. The BN in Fig. 4b can be modified to include an RV whose children are all of the class variables and which limits the range of the class indices. Given a training set, the posterior over this RV reveals …

872 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1998
Citation Context ...33], which measures the accuracy of an approximate probability distribution. These include iterative conditional modes [3], the expectation maximization (EM) algorithm [6, 33], variational techniques [24, 33], structured variational techniques [24], Gibbs sampling [32] and the sum-product algorithm (a.k.a. loopy belief propagation) [25, 35]. The idea is to approximate the true posterior distribution P(h|v) by …

811 | A view of the EM algorithm that justifies incremental, sparse, and other variants, ser
- Neal, Hinton
- 1998
Citation Context ...stribution cannot be computed in a tractable manner. So, we must turn to various approximations. Many approximate inference techniques can be viewed as minimizing a cost function called “free energy” [33], which measures the accuracy of an approximate probability distribution. These include iterative conditional modes [3], the expectation maximization (EM) algorithm [6, 33], variational techniques [24…

673 |
Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks
- Cowell, Dawid, et al.
- 2007
Citation Context ...er commenting on each of these issues, we briefly review 3 kinds of graphical model: Bayesian networks (BNs), Markov random fields (MRFs), and factor graphs (FGs). For a more extensive treatment, see [4, 9, 25, 26, 35]. Prior knowledge usually includes strong beliefs about the existence of hidden random variables (RVs) and the relationships between RVs in the system. This notion of “modularity” is a central aspect …

597 | Probabilistic inference using markov chain monte carlo methods
- Neal
- 1993
Citation Context ...distribution. These include iterative conditional modes [3], the expectation maximization (EM) algorithm [6, 33], variational techniques [24, 33], structured variational techniques [24], Gibbs sampling [32] and the sum-product algorithm (a.k.a. loopy belief propagation) [25, 35]. The idea is to approximate the true posterior distribution P(h|v) by a simpler distribution Q(h), which is then used for making deci…

502 | Loopy belief propagation for approximate inference: An empirical study
- Murphy, Weiss, et al.
- 1999
Citation Context ...used to jump between modes of the posterior. Also, it has produced state-of-the-art results on several difficult problems, including error-correcting decoding [14], medical diagnosis [30], random satisfiability [28], and phase-unwrapping in 2-dimensions [13]. To see how the sum-product algorithm works, consider computing a single marginal in a factorized model. One approach is to compute the full joint for all v…

481 | Learning low-level vision
- Freeman, Pasztor, et al.
Citation Context ...aph is a tree. When the graph contains cycles, the sum-product algorithm (a.k.a. “loopy belief propagation”) is not exact and can diverge and oscillate. However, it has been used in vision algorithms [8]. Surprisingly, we have also found that its oscillatory behavior can be used to jump between modes of the posterior. Also, it has produced state-of-the-art results on several difficult problems, includ…

332 | Expectation propagation for approximate Bayesian inference - Minka - 2001

322 |
Understanding belief propagation and its generalizations
- Yedidia
- 2001
Citation Context ...– a consequence of the fact that FGs are more explicit about factorization. Another way to interconvert between representations is to expand the graph to include extra edges and extra variables (cf. [38]). 3 Building Complex Models Using Modularity. Graphical models provide a way to link simpler models together in a principled fashion that respects the rules of probability theory. Fig. 3 shows how the…

306 |
Learning and relearning in Boltzmann machines
- Hinton, Sejnowski
- 1986
Citation Context ...unction is the complete marginalization of g_k(x_{C_k}). So, many of the techniques discussed in this paper can be used to approximately determine the effect of the partition function (e.g., Gibbs sampling [19]). There are also learning techniques that are specifically aimed at undirected graphical models, such as iterative proportional fitting [4]. For directed models, the partition function factorizes int…

233 | The wake-sleep algorithm for unsupervised neural networks
- Hinton, Dayan, et al.
- 1995
Citation Context ...is that the approximate Q distribution used for the hidden RVs may not be well-suited to the model, causing the free energy to be a poor bound on the negative log-likelihood. However, as pointed out in [18], since the free energy is D(Q‖P) − ln P(v) (see (6)), if two models fit the data equally well (ln P(v) is the same), minimizing the free energy will select the model that makes the approximate Q-distribution…

194 | On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs
- Weiss, Freeman
- 2001
Citation Context ...til convergence is detected, or until divergence is detected. Also, various schedules for updating the messages can be used and the quality of the results will depend on the schedule. It is proven in [37] that when the “max-product” algorithm converges, all configurations that differ by perturbing the RVs in subgraphs that contain at most one cycle will have lower posterior probabilities. If the g…

147 |
Analytic and algorithmic solution of random satisfiability problems. Science 297:812–815
- Mézard, Parisi, et al.
- 2002
Citation Context ...f the posterior. Also, it has produced state-of-the-art results on several difficult problems, including error-correcting decoding [14], medical diagnosis [30], random satisfiability [28], and phase-unwrapping in 2-dimensions [13]. To see how the sum-product algorithm works, consider computing a single marginal in a factorized model. One approach is to compute the full joint for all values of the variables and then co…

140 | Learning flexible sprites in video layers
- Jojic, Frey
- 2001
Citation Context ... locations [11]; changes in appearances of moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in object appearance and shape [21]. An inference and learning algorithm in a combined model, like the one shown above, can…

118 | A revolution: belief propagation in graphs with cycles
- Frey, MacKay
- 1998
Citation Context ...ound that its oscillatory behavior can be used to jump between modes of the posterior. Also, it has produced state-of-the-art results on several difficult problems, including error-correcting decoding [14], medical diagnosis [30], random satisfiability [28], and phase-unwrapping in 2-dimensions [13]. To see how the sum-product algorithm works, consider computing a single marginal in a factorized model. …

110 | Propagation algorithms for variational Bayesian learning
- Ghahramani, Beal
- 2000
Citation Context ...or RVs and parameters alike make use of the conditional independencies in the graphical model. It is possible to describe graph-based propagation algorithms for updating distributions over parameters [16]. It is often important to treat parameters and RVs differently during inference. Whereas each RV plays a role in a single training case, the parameters are shared across many training cases. So, the …

109 | EM algorithms for ML factor analysis - Rubin, Thayer - 1982

92 | Epitomic analysis of appearance and shape
- Jojic, Frey
- 2003
Citation Context ...2]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in object appearance and shape [21]. An inference and learning algorithm in a combined model, like the one shown above, can be obtained by linking together the modules and associated algorithms. 4 Parameterized Models and the Exponenti…

60 | Transformation-invariant clustering using the EM algorithm
- Frey, Jojic
- 2003
Citation Context ...rame is automatically decomposed into the parts shown in the BN. In previous papers, we describe efficient techniques for inference and learning in models that account for changes in object locations [11]; changes in appearances of moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace …

58 | Estimating mixture models of images and inferring spatial transformations using the em algorithm
- Frey, Jojic
- 1999
Citation Context ...rame is automatically decomposed into the parts shown in the BN. In previous papers, we describe efficient techniques for inference and learning in models that account for changes in object locations [10, 13]; changes in appearances of moving objects using a subspace model [11]; common motion patterns [12, 25]; spatial deformations in object appearance [26]; layered models of moving, occluding objects in …

50 |
Transformed component analysis: Joint estimation of spatial transformations and image components
- Frey, Jojic
- 1999
Citation Context ...evious papers, we describe efficient techniques for inference and learning in models that account for changes in object locations [11]; changes in appearances of moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in o…

43 | Bayesian neural networks and density networks
- MacKay
- 1995
Citation Context ...t the structure of the BN in Fig. 4b, but not the number of classes. Unknown structure can be represented as a hidden RV, so that inference of this hidden RV corresponds to Bayesian model selection [17, 27]. The BN in Fig. 4b can be modified to include an RV whose children are all of the class variables and which limits the range of the class indices. Given a training set, the posterior over this RV reveals …

42 | Transformed hidden Markov models: Estimating mixture models and inferring spatial transformations in video sequences
- Jojic
- 2000
Citation Context ...ficient techniques for inference and learning in models that account for changes in object locations [11]; changes in appearances of moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in object appearance and shape [2…

35 | Ordinal characteristics of transparency
- Adelson, Anandan
- 1990
Citation Context ...rrors made by noise in the background will be avoided. The occlusion model explains an input image, with pixel intensities z_1, …, z_K, as a composition of a foreground image and a background image (cf. [1]), and each of these images is selected from a library of possible appearance and mask images (a mixture model). Although separate libraries can be used for the foreground and background, for notation…

34 | Bayesian mixture modeling by Monte Carlo simulation (Tech
- Neal
Citation Context ... model parameters, so that the probability density of the extra parameters needed in more complex models is properly accounted for. For an example of Bayesian learning of infinite mixture models, see [31]. 5.3 Numerical Issues. Many inference algorithms rely on the computation of expressions involving sums of many terms, where the number of terms can be quite large. To avoid underflow, it is common to work i…
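The numerical issue mentioned here, underflow in large sums of small probabilities avoided by working in the log domain, is commonly handled with the log-sum-exp trick. A small illustration (not code from the paper):

```python
import math

def log_sum_exp(log_terms):
    """log(sum_i exp(l_i)) computed stably by factoring out the largest term."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(l - m) for l in log_terms))

# Summing the probabilities directly underflows: exp(-800) is 0.0 in doubles.
logs = [-800.0, -801.0]
assert sum(math.exp(l) for l in logs) == 0.0

# The log-domain version recovers the correct answer, -800 + log(1 + e^-1).
assert abs(log_sum_exp(logs) - (-800.0 + math.log(1 + math.exp(-1.0)))) < 1e-12
```

The same trick normalizes posteriors and responsibilities in EM-style algorithms when log-likelihoods are very negative.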

27 | Very loopy belief propagation for unwrapping phase images
- Frey, Koetter, et al.
- 2001
Citation Context ...-of-the-art results on several difficult problems, including error-correcting decoding [14], medical diagnosis [30], random satisfiability [28], and phase-unwrapping in 2-dimensions [13]. To see how the sum-product algorithm works, consider computing a single marginal in a factorized model. One approach is to compute the full joint for all values of the variables and then sum out all but one; for binary RVs, this …
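The contrast described here, naive marginalization over the full joint versus the sum-product idea of distributing sums over the factorization, can be checked on a small chain. The distributions below are invented for illustration; they are not from the paper:

```python
import itertools

# A three-variable chain P(a, b, c) = P(a) P(b|a) P(c|b), all binary.
Pa = [0.6, 0.4]
Pb_a = [[0.7, 0.3], [0.2, 0.8]]   # Pb_a[a][b]
Pc_b = [[0.9, 0.1], [0.5, 0.5]]   # Pc_b[b][c]

# Naive marginal P(c): enumerate the full joint (cost grows as 2^n).
naive = [0.0, 0.0]
for a, b, c in itertools.product((0, 1), repeat=3):
    naive[c] += Pa[a] * Pb_a[a][b] * Pc_b[b][c]

# Sum-product: push each sum inside the factorization, passing a
# "message" along the chain (cost grows linearly in the chain length).
msg_b = [sum(Pa[a] * Pb_a[a][b] for a in (0, 1)) for b in (0, 1)]
sp = [sum(msg_b[b] * Pc_b[b][c] for b in (0, 1)) for c in (0, 1)]

assert all(abs(x - y) < 1e-12 for x, y in zip(naive, sp))
```

On a tree, this rearrangement is exact; on graphs with cycles the same message updates give the "loopy" variant discussed in the surrounding contexts.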

26 | Learning appearance and transparency manifolds of occluded objects in layers
- FREY, JOJIC, et al.
- 2003
Citation Context ...moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in object appearance and shape [21]. An inference and learning algorithm in a combined model, like the one shown above, can be obtained by linking together the module…

22 |
Learning graphical models of images, videos and their spatial transformations
- Frey, Jojic
- 2000
Citation Context ...ent techniques for inference and learning in models that account for changes in object locations [10, 13]; changes in appearances of moving objects using a subspace model [11]; common motion patterns [12, 25]; spatial deformations in object appearance [26]; layered models of moving, occluding objects in 3-D scenes [22]; subspace models of moving, occluding objects in 3-D scenes [14]; and the “epitome” of …

21 |
Extending factor graphs so as to unify directed and undirected graphical models, UAI
- Frey


20 |
A comparison of sequential learning methods for incomplete data
- Cowell, Dawid, et al.
- 1996
Citation Context ...the means, the inverse variances are Gamma-distributed. If the training data is processed sequentially, where one training case is absorbed at a time, the mixture posterior can be updated as shown in [5]. The exact posterior is intractable, because the number of posterior mixture components is exponential in the number of training cases, and the posterior distribution over the pixel means and vari…

12 |
Filling in Scenes by Propagating Probabilities through layers into Appearance Models
- Frey
- 2000
Citation Context ...ph contains cycles, the sum-product algorithm (a.k.a. “loopy belief propagation”) is not generally exact and can even diverge. However, it has been used to obtain good results on some vision problems [7, 8], and has been shown to give the best known algorithms for solving difficult instances of NP-hard problems, including decoding error-correcting codes [16], random satisfiability problems [31], and pha…

11 | Separating appearance from deformation
- Jojic, Simard, et al.
- 2001
Citation Context ... models that account for changes in object locations [11]; changes in appearances of moving objects using a subspace model [10]; common motion patterns [22]; spatial deformations in object appearance [23]; layered models of occluding objects [20]; subspace models of occluding objects [12]; and the “epitome” of components in object appearance and shape [21]. An inference and learning algorithm in a com…

8 |
Information and exponential families
- Barndorff-Nielson
- 1978
Citation Context ...asis for on-line learning algorithms. For simplicity, in this paper, we assume the model parameters are fixed for the entire training set. 4.3 The Exponential Family. Members of the exponential family [2] have the following parameterization: P(x|θ) ∝ exp(Σ_i θ_i φ_i(x)), where θ = (θ_1, …, θ_n) is a parameter vector and φ_i(x) is the ith sufficient statistic. The sufficient statistics of x contain all information that is needed…
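The exponential-family parameterization quoted above can be made concrete with the Bernoulli distribution, whose single sufficient statistic is φ(x) = x. This is a standard textbook example, not one taken from the paper:

```python
import math

# Bernoulli as a one-parameter exponential family:
# P(x | theta) = exp(theta * x - A(theta)), with sufficient statistic
# phi(x) = x and log-partition function A(theta) = log(1 + e^theta).
def bernoulli(x, theta):
    A = math.log(1 + math.exp(theta))
    return math.exp(theta * x - A)

mu = 0.8                          # mean parameter (probability of heads)
theta = math.log(mu / (1 - mu))   # natural parameter (the logit of mu)

# The natural parameterization reproduces the usual probabilities.
assert abs(bernoulli(1, theta) - mu) < 1e-12
assert abs(bernoulli(0, theta) - (1 - mu)) < 1e-12
```

Working in the natural parameters is what makes the sufficient statistics additive across i.i.d. training cases, the property the context alludes to.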

6 | Learning about multiple objects in images: Factorial learning without factorial search
- Williams, Titsias
- 2003
Citation Context ...tained by linking together the modules and associated algorithms. Many other interesting avenues within this framework are being explored or have yet to be explored. For example, Williams and Titsias [36] describe a fast, greedy way to learn layered models of occluding objects. 4 Parameterized Models and the Exponential Family. So far, we have studied graphical models as representations of structured p…

3 |
Graphical models, variational inference and exponential families
- Wainwright, Jordan
- 2003
Citation Context ...to find distributions that are close to the correct posterior distribution. This is accomplished by minimizing pseudo-distances on distributions, called “free energies”. (For an alternative view, see [38].) It is interesting that in the 1800s, Helmholtz was one of the first researchers to propose that vision is inference in a generative model, and that nature seeks correct probability distributions i…

1 |
A comparison of logistic regression and naive Bayes
- Ng, Jordan
- 2002
Citation Context ... marginalization and Bayes rule. In the case of factor analysis, it turns out that the output is a linear function of a low-dimensional representation of the input, plus Gaussian noise. Ng and Jordan [34] show that, within the context of logistic regression, for a given problem complexity, generative approaches work better than discriminative approaches when the training data is limited. Discriminative…

1 |
A modular generative model for layered vision
- Jojic, Frey
- 2003
Citation Context ... in a principled fashion, an inference and learning algorithm in a combined model, like the one shown above, can be obtained by linking together the modules and associated algorithms, as described in [23]. [Figure: learned means of appearance and mask images for the front and background layers, with hidden appearance, mask, brightness, deformation, and position variables.]
