Results 1  10
of
214
Learning in graphical models
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 608 (10 self)
 Add to MetaCart
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing. Key words and phrases: Probabilistic graphical models, junction tree algorithm, sumproduct algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, errorcontrol coding.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 563 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms
 IEEE Transactions on Information Theory
, 2005
"... Important inference problems in statistical physics, computer vision, errorcorrecting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems t ..."
Abstract

Cited by 412 (12 self)
 Add to MetaCart
Important inference problems in statistical physics, computer vision, errorcorrecting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain regionbased free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a “valid ” or “maxentnormal ” approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the “Bethe method, ” the “junction graph method, ” the “cluster variation method, ” and the “region graph method.” Finally, we explain how to tell whether a regionbased approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.
A New Class of Upper Bounds on the Log Partition Function
 In Uncertainty in Artificial Intelligence
, 2002
"... Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distribution ..."
Abstract

Cited by 154 (27 self)
 Add to MetaCart
Bounds on the log partition function are important in a variety of contexts, including approximate inference, model fitting, decision theory, and large deviations analysis [11, 5, 4]. We introduce a new class of upper bounds on the log partition function, based on convex combinations of distributions in the exponential domain, that is applicable to an arbitrary undirected graphical model. In the special case of convex combinations of treestructured distributions, we obtain a family of variational problems, similar to the Bethe free energy, but distinguished by the following desirable properties: (i) they are convex, and have a unique global minimum; and (ii) the global minimum gives an upper bound on the log partition function. The global minimum is defined by stationary conditions very similar to those defining xed points of belief propagation (BP) or treebased reparameterization [see 13, 14]. As with BP fixed points, the elements of the minimizing argument can be used as approximations to the marginals of the original model. The analysis described here can be extended to structures of higher treewidth (e.g., hypertrees), thereby making connections with more advanced approximations (e.g., Kikuchi and variants [15, 10]).
Probabilistic nonlinear principal component analysis with Gaussian process latent variable models
 Journal of Machine Learning Research
, 2005
"... Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be nonlinearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). Through analysis of the GPLVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GPLVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of realworld and artificially generated data sets.
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
 Advances in Neural Information Processing Systems 15
, 2003
"... We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d ..."
Abstract

Cited by 122 (27 self)
 Add to MetaCart
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn dsparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n ), and in large realworld classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (`error bars'), allows for Bayesian model selection and is less complex in implementation.
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
 IN ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain cond ..."
Abstract

Cited by 121 (11 self)
 Add to MetaCart
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain conditional random fields (CRFs) in which each time slice contains a set of state variables and edgesa distributed state representation as in dynamic Bayesian networks (DBNs)and parameters are tied across slices. Since exact
TreeBased Reparameterization Framework for Analysis of Belief Propagation and Related Algorithms
, 2001
"... We present a treebased reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation or sumproduct algorithm [39, 36], as well as a rich set of variations and ext ..."
Abstract

Cited by 101 (21 self)
 Add to MetaCart
We present a treebased reparameterization framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation or sumproduct algorithm [39, 36], as well as a rich set of variations and extensions of belief propagation. Algorithms in this class can be formulated as a sequence of reparameterization updates, each of which entails refactorizing a portion of the distribution corresponding to an acyclic subgraph (i.e., a tree). The ultimate goal is to obtain an alternative but equivalent factorization using functions that represent (exact or approximate) marginal distributions on cliques of the graph. Our framework highlights an important property of BP and the entire class of reparameterization algorithms: the distribution on the full graph is not changed. The perspective of treebased updates gives rise to a simple and intuitive characterization of the fixed points in terms of tree consistency. We develop interpretations of these results in terms of information geometry. The invariance of the distribution, in conjunction with the fixed point characterization, enables us to derive an exact relation between the exact marginals on an arbitrary graph with cycles, and the approximations provided by belief propagation, and more broadly, any algorithm that minimizes the Bethe free energy. We also develop bounds on this approximation error, which illuminate the conditions that govern their accuracy. Finally, we show how the reparameterization perspective extends naturally to more structured approximations (e.g., Kikuchi and variants [52, 37]) that operate over higher order cliques.
PAMPAS: RealValued Graphical Models for Computer Vision
, 2003
"... Probabilistic models have been adopted for many computer vision applications, however inference in highdimensional spaces remains problematic. As the statespace of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many comm ..."
Abstract

Cited by 91 (3 self)
 Add to MetaCart
Probabilistic models have been adopted for many computer vision applications, however inference in highdimensional spaces remains problematic. As the statespace of a model grows, the dependencies between the dimensions lead to an exponential growth in computation when performing inference. Many common computer vision problems naturally map onto the graphical model framework; the representation is a graph where each node contains a portion of the statespace and there is an edge between two nodes only if they are not independent conditional on the other nodes in the graph. When this graph is sparsely connected, belief propagation algorithms can turn an exponential inference computation into one which is linear in the size of the graph. However belief propagation is only applicable when the variables in the nodes are discretevalued or jointly represented by a single multivariate Gaussian distribution, and this rules out many computer vision applications.
Gaussian processes for ordinal regression
 Journal of Machine Learning Research
, 2004
"... We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation ..."
Abstract

Cited by 68 (1 self)
 Add to MetaCart
We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and realworld data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.