## Learning in graphical models (1998)

### Download Links

- [www.cs.drexel.edu]
- [www.cis.upenn.edu]
- [www.cs.ubc.ca]
- [www.cs.utah.edu]
- [www.stat.columbia.edu]
- [www.ics.uci.edu]
- [www.cs.princeton.edu]
- [www.seas.harvard.edu]
- [www.ai.mit.edu]
- [www.stat.lsa.umich.edu]
- [www.cs.cmu.edu]
- [www.cs.dartmouth.edu]
- [www.cs.berkeley.edu]

Citations: 650 (10 self)

### BibTeX

@ARTICLE{Jordan98learningin,

author = {Michael I. Jordan},

title = {Learning in graphical models},

journal = {},

year = {1998},

volume = {19}

}

### Abstract

Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing.

### Citations

7445 | Probabilistic Reasoning in Intelligent Systems
- Pearl
- 1988
Citation Context ...aphical model. Probability distributions associated with this graph can be factorized as $p(x_V) = \frac{1}{Z}\,\psi(x_1, x_2)\,\psi(x_1, x_3)\,\psi(x_2, x_4)\,\psi(x_3, x_5)\,\psi(x_2, x_5, x_6)$. ... time reachability algorithm based on the graph (Pearl, 1988). Second, as we discuss in the following section, the graphical structure can be exploited by algorithms for probabilistic inference. Let us now consider the undirected case. Given an undirected grap... |

3694 | Convex Analysis
- Rockafellar
- 1970
Citation Context ... (1) the cumulant generating function $A(\theta)$ is a convex function on a convex domain $\Theta$ (Brown, 1986), and (2) any convex function can be expressed variationally in terms of its conjugate dual function (Rockafellar, 1970). This allows us to express the cumulant generating function as follows: $A(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle\theta, \mu\rangle - A^*(\mu) \}$, (10) where $\mathcal{M}$ is the set of realizable mean parameters: $\mathcal{M} = \{ \mu \in \mathbb{R}^d \mid \exists\, p(\cdot) \text{ s.t. } \int \ldots \}$ ... |

1279 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
- 2001
Citation Context ...e a joint probability by taking the product across these factors: $p(x_V) = \frac{1}{Z} \prod_{i \in I} f_i(x_{C_i})$. (3) As shown in Figure 4, this definition is associated with a graphical representation: the factor graph (Kschischang et al., 2001). A factor graph is a bipartite graph in which the random variables are round nodes and the factors appear as square nodes. There is an edge between the factor node $f_i$ and the variable node $X_v$ if and... |

1155 | Graphical Models
- Lauritzen
- 1996
Citation Context ...rameters among the nodes of an undirected graph to yield an alternative general tool for Bayesian modeling. It is also possible to work with hybrids that include both directed and undirected edges (Lauritzen, 1996). In general, directed graphs and undirected graphs make different assertions of conditional independence. Thus, there are families of probability distributions that are captured by a directed graph ... |

1119 | Evolutionary trees from DNA sequences: a maximum likelihood approach - Felsenstein - 1981 |

1009 | Monte Carlo Statistical Methods
- Robert, Casella
- 2004
Citation Context ... of graphical models. 3.2 Sampling algorithms Sampling algorithms such as importance sampling and Markov chain Monte Carlo (MCMC) provide a general methodology for probabilistic inference (Liu, 2001, Robert and Casella, 2004). The graphical model setting provides an opportunity for graph-theoretic structure to be exploited in the design, analysis and implementation of sampling algorithms. Note in particular that the clas... |

983 | Low-density parity-check codes - Gallager - 1962 |

867 | An Introduction to Variational Methods for Graphical Models
- Jordan, Ghahramani, et al.
Citation Context ...bing the optimization problem in Eq. (10) in various ways. One approach is to restrict the optimization to a class of simplified or “tractable” distributions; this is known as the mean field approach (Jordan et al., 1999). Thus we consider a subset $\mathcal{M}_{\mathrm{Tract}} \subseteq \mathcal{M}$ corresponding to distributions that are tractable vis-a-vis an algorithm such as the junction tree algorithm, and restrict the optimization to this set: $\sup \{ \cdots \}$... |

666 | Probabilistic Networks and Expert Systems
- Cowell, Dawid, et al.
- 1999
Citation Context ...ls as a systematic application of graph-theoretic algorithms to probability theory, it should not be surprising that many authors have viewed graphical models as a general Bayesian “inference engine” (Cowell et al., 1999). What is perhaps most distinctive about the graphical model approach is its naturalness in formulating probabilistic models of complex phenomena in applied fields, while maintaining control over the... |

640 | Biological Sequence Analysis - Durbin, Eddy, et al. - 1998 |

594 | Dynamic Bayesian networks: representation, inference and learning (thesis)
- Murphy
- 2002
Citation Context ...roposed update can be neglected. Finally, a variety of hybrid algorithms can be defined in which exact inference algorithms are used locally within an overall sampling framework (Jensen et al., 1995, Murphy, 2002). 3.3 Variational algorithms The basic idea of variational inference is to characterize a probability distribution as the solution to an optimization problem, to perturb this optimization problem, an... |

511 | Factorial hidden Markov models
- Ghahramani, Jordan
- 1995
Citation Context ...ntained in the ovals in Figure 8; they have been suppressed to simplify the diagram). The model in Figure 8 is an instance of a graphical model family known as a factorial hidden Markov model (fHMM) (Ghahramani and Jordan, 1996); see Figure 10(c) for a generic example. Classical algorithms for inference on multilocus pedigrees are variants of the elimination algorithm on this fHMM, and correspond to different choices of eli... |

470 | Graphical models, exponential families and variational inference - Wainwright, Jordan - 2003 |

459 | Design of capacity-approaching irregular codes - Richardson, Shokrollahi, et al. - 2001 |

442 | Monte Carlo strategies in scientific computing
- Liu
- 2001
Citation Context ... properties of graphical models. 3.2 Sampling algorithms Sampling algorithms such as importance sampling and Markov chain Monte Carlo (MCMC) provide a general methodology for probabilistic inference (Liu, 2001, Robert and Casella, 2004). The graphical model setting provides an opportunity for graph-theoretic structure to be exploited in the design, analysis and implementation of sampling algorithms. Note i... |

338 | Complexity of finding embeddings in a k-tree
- Arnborg, Corneil, et al.
- 1987
Citation Context ...h. This is a graph-theoretic problem: it is independent of the numerical values of the potentials. The problem of finding an elimination ordering that achieves the treewidth turns out to be NP-hard (Arnborg et al., 1987). It is often possible in practice, however, to find good or even optimal orderings for specific graphs, and a variety of inference algorithms in specific fields (e.g., the algorithms for inference o... |

308 | The generalized distributive law
- McEliece, Aji
Citation Context ...tions that underlie the sum-product algorithm are justified by the fact that sums and products form a commutative semiring, and the algorithm generalizes immediately to any other commutative semiring (Aji and McEliece, 2000, Shenoy and Shafer, 1990). In particular, “maximization” and “product” form a commutative semiring, and a “max-product” variant of the sum-product algorithm can be used for computing modes of posteri... |

280 | A family of algorithms for approximate Bayesian inference. Doctoral dissertation - Minka - 2001 |

198 | A variational Bayesian framework for graphical models - Attias - 2000 |

182 | The power of amnesia: learning probabilistic automata with variable memory length
- Ron, Singer, et al.
- 1996
Citation Context ...o approximate high-order Markov models with a small number of parameters. Further examples of variations of Markovian models include hierarchical HMMs (Murphy and Paskin, 2002), variable length HMMs (Ron et al., 1996), and buried HMMs (Bilmes, 2003). For a ... Figure 11 (node labels: $\beta$, $\alpha$, $\theta$, $Z_n$, $W_n$; plate labels: $N$, $M$): The latent Dirichlet allocation model for document collections. The outer plate represents a corpus containing M documents, whil... |

161 | Fundamentals of Statistical Exponential Families
- Brown
- 1986
Citation Context ...gral: $A(\theta) = \log \int \exp\langle\theta, \phi(x_V)\rangle\,\nu(dx_V)$, (9) where $\langle\cdot,\cdot\rangle$ denotes an inner product. We now use two important facts: (1) the cumulant generating function $A(\theta)$ is a convex function on a convex domain $\Theta$ (Brown, 1986), and (2) any convex function can be expressed variationally in terms of its conjugate dual function (Rockafellar, 1970). This allows us to express the cumulant generating function as follows: $A(\theta) =$... |

155 | Construction of multilocus genetic linkage maps in humans - Lander, Green - 1987 |

139 | Axioms for probability and belief-function propagation
- Shenoy, Shafer
- 1990
Citation Context ...um-product algorithm are justified by the fact that sums and products form a commutative semiring, and the algorithm generalizes immediately to any other commutative semiring (Aji and McEliece, 2000, Shenoy and Shafer, 1990). In particular, “maximization” and “product” form a commutative semiring, and a “max-product” variant of the sum-product algorithm can be used for computing modes of posterior distributions. Finally... |

129 | A language and program for complex Bayesian modelling. The Statistician 43
- Gilks, Thomas, et al.
- 1994
Citation Context ...node. Using these definitions, Gibbs samplers can be set up automatically from the graphical model specification, a fact that is exploited in the BUGS software for Gibbs sampling in graphical models (Gilks et al., 1994). The Markov blanket is also useful in the design of Metropolis-based algorithms: factors that do not appear in the Markov blanket of a set of variables being considered in a proposed update can be ne... |

121 | Low-density parity check codes - Gallager - 1962 |

115 | A general model for the genetic analysis of pedigree data. Hum Hered 21:523–42
- Elston, Stewart
- 1971
Citation Context ...e. Classical algorithms for inference on multilocus pedigrees are variants of the elimination algorithm on this fHMM, and correspond to different choices of elimination order (Lander and Green, 1987, Elston and Stewart, 1971). While these algorithms are viable for small problems, exact inference is intractable for general multilocus pedigrees. Indeed, focusing only on the haplotype variables, it can be verified that the ... |

111 | Propagation algorithms for variational Bayesian learning - Ghahramani, Beal - 2000 |

96 | Linear time inference in hierarchical HMMs
- Murphy, Paskin
- 2002
Citation Context ...nd Jordan, 1999). This model makes it possible to approximate high-order Markov models with a small number of parameters. Further examples of variations of Markovian models include hierarchical HMMs (Murphy and Paskin, 2002), variable length HMMs (Ron et al., 1996), and buried HMMs (Bilmes, 2003). For a ... Figure 11 (node labels: $\beta$, $\alpha$, $\theta$, $Z_n$, $W_n$; plate labels: $N$, $M$): The latent Dirichlet allocation model for document collections. The outer plate represe... |

83 | Loopy belief propagation and Gibbs measures - Tatikonda, Jordan - 2002 |

69 | Graphical models and automatic speech recognition
- Bilmes
- 2001
Citation Context ...ls with a small number of parameters. Further examples of variations of Markovian models include hierarchical HMMs (Murphy and Paskin, 2002), variable length HMMs (Ron et al., 1996), and buried HMMs (Bilmes, 2003). For a ... Figure 11 (node labels: $\beta$, $\alpha$, $\theta$, $Z_n$, $W_n$; plate labels: $N$, $M$): The latent Dirichlet allocation model for document collections. The outer plate represents a corpus containing M documents, while the inner plate represents an ... |

66 | Mixed memory markov models: Decomposing complex stochastic processes as mixture of simpler ones - Saul, Jordan - 1999 |

53 | Boltzmann chains and hidden Markov models
- Saul, Jordan
- 1995
Citation Context ...uced to encode the allocations of the mixture components. The model remains eminently tractable for exact inference. A more serious departure is the coupled hidden Markov model shown in Figure 10(b) (Saul and Jordan, 1995). This model involves two chains of state variables which are coupled via links between the chains. Triangulating this graph yields cliques of size three, and the model remains tractable for exact ... |

47 | Empirical and hierarchical Bayesian estimation of ancestral states
- Huelsenbeck, Bollback
- 2001
Citation Context ...hich the E step is computed via the sum-product algorithm), and search across tree topologies to find the maximum likelihood tree, it is also possible to use MCMC methods within a Bayesian framework (Huelsenbeck and Bollback, 2001). The assumptions leading to the classical phylogenetic tree model are wanting in several respects. First, the assumption of site independence is generally incorrect. There are biochemical interactio... |

29 | Biological Sequence Analysis. Cambridge Univ - Durbin, Eddy, et al. - 1998 |

28 | Design of provably good low-density parity check codes
- Richardson, Shokrollahi, et al.
- 1999
Citation Context ...ithm. The algorithm is quite successful in practice for large block lengths (large values of N). Moreover, theoretical convergence results are available for the sum-product algorithm in this setting (Richardson et al., 2001). Averaging over the ensemble of graphs, it can be shown that the average error probability goes to zero over the iterations of the sum-product algorithm, given conditions on the channel, the degree ... |

25 | Construction of multilocus genetic maps in humans
- Lander, Green
- 1987
Citation Context ...(c) for a generic example. Classical algorithms for inference on multilocus pedigrees are variants of the elimination algorithm on this fHMM, and correspond to different choices of elimination order (Lander and Green, 1987, Elston and Stewart, 1971). While these algorithms are viable for small problems, exact inference is intractable for general multilocus pedigrees. Indeed, focusing only on the haplotype variables, it... |

19 | Blocking-Gibbs sampling in very large probabilistic expert systems
- Jensen, Kong, et al.
- 1995
Citation Context ...ing considered in a proposed update can be neglected. Finally, a variety of hybrid algorithms can be defined in which exact inference algorithms are used locally within an overall sampling framework (Jensen et al., 1995, Murphy, 2002). 3.3 Variational algorithms The basic idea of variational inference is to characterize a probability distribution as the solution to an optimization problem, to perturb this optimizati... |

14 | Semidefinite relaxations for approximate inference on graphs with cycles - Wainwright, Jordan - 2003 |

13 | Hierarchical bayesian models for applications in information retrieval - Blei, Jordan, et al. |

12 | Bayesian methods for neural networks and related models - Titterington |

10 | Mixture models in measurement error problems, with reference to epidemiological studies - Richardson, Leblond, et al. - 2002 |

9 | Multilocus linkage analysis by blocked Gibbs sampling - Thomas, Gutin, et al. |

5 | Evolutionary trees from DNA sequences: A maximum likelihood approach - Felsenstein - 1981 |

3 | General lower bounds based on computer generated higher order expansions - Leisink, Kappen - 2002 |

1 | General lower bounds based on computer generated higher order expansions - Leisink, Kappen - 2002 |

1 | Mixed memory Markov models: Decomposing complex stochastic processes as mixture of simpler ones - Saul, Jordan - 1999 |