Sample Propagation
- Advances in Neural Information Processing Systems, 2003
- Cited by 7 (0 self)
Rao-Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao-Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster's variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems.
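The general idea of Rao-Blackwellization can be illustrated on a toy conditional Gaussian model. The sketch below is not Sample Propagation itself (there is no junction tree or message passing); the model, its constants, and the function names are assumptions made for illustration. It samples only the discrete variable and integrates the Gaussian variable out exactly, then compares the result with plain Monte Carlo.

```python
import math
import random
from statistics import fmean, pvariance

# Hypothetical toy model (not taken from the paper):
#   Z ~ Bernoulli(0.3),  X | Z=z ~ Normal(MEANS[z], 1)
# Quantity of interest: P(X > THRESHOLD).
P_Z1 = 0.3
MEANS = (0.0, 3.0)
THRESHOLD = 1.0


def normal_tail(threshold: float, mean: float) -> float:
    """P(X > threshold) for X ~ Normal(mean, 1), via the error function."""
    return 0.5 * (1.0 - math.erf((threshold - mean) / math.sqrt(2.0)))


def draw_terms(n: int, seed: int = 0):
    """Return the per-sample terms of the naive and Rao-Blackwellized estimators."""
    rng = random.Random(seed)
    naive, rao_blackwell = [], []
    for _ in range(n):
        z = 1 if rng.random() < P_Z1 else 0
        # Naive Monte Carlo: sample X too and record the indicator 1[X > threshold].
        x = rng.gauss(MEANS[z], 1.0)
        naive.append(1.0 if x > THRESHOLD else 0.0)
        # Rao-Blackwellized: keep the sampled Z, integrate X out in closed form.
        rao_blackwell.append(normal_tail(THRESHOLD, MEANS[z]))
    return naive, rao_blackwell


exact = ((1 - P_Z1) * normal_tail(THRESHOLD, MEANS[0])
         + P_Z1 * normal_tail(THRESHOLD, MEANS[1]))
naive_terms, rb_terms = draw_terms(20_000)

print(f"exact         = {exact:.4f}")
print(f"naive         = {fmean(naive_terms):.4f}, per-sample variance {pvariance(naive_terms):.4f}")
print(f"rao-blackwell = {fmean(rb_terms):.4f}, per-sample variance {pvariance(rb_terms):.4f}")
```

Both estimators target the same probability, but the Rao-Blackwellized terms have lower per-sample variance because the randomness contributed by X has been removed analytically.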
Constrained Approximate Maximum Entropy Learning of Markov Random Fields
- Cited by 7 (0 self)
Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can lead to poor convergence. In this paper, we provide a different approach for combining MRF learning and the Bethe approximation. We consider the dual of maximum likelihood Markov network learning (maximizing entropy subject to moment-matching constraints) and then approximate both the objective and the constraints in the resulting optimization problem. Unlike previous work along these lines (Teh & Welling, 2003), our formulation allows parameter sharing between features in a general log-linear model, parameter regularization, and conditional training. We show that piecewise training (Sutton & McCallum, 2005) is a very restricted special case of this formulation. We study two optimization strategies: one based on a single convex approximation and one that uses repeated convex approximations. We show results on several real-world networks demonstrating that these algorithms can significantly outperform learning with loopy belief propagation and piecewise training. Our results also provide a framework for analyzing the trade-offs of different relaxations of the entropy objective and of the constraints.
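For orientation, the exact (unapproximated) duality that the abstract builds on can be written as below. The notation is an assumption introduced here, not taken from the paper: $f_i$ are the features of a log-linear model, $\theta_i$ their weights, and $\hat P$ the empirical distribution of the training data.

```latex
\begin{align*}
\text{(maximum likelihood)}\quad
  & \max_{\theta}\; \sum_i \theta_i\, \mathbb{E}_{\hat P}[f_i(x)] - \log Z(\theta),
  \qquad P_\theta(x) = \frac{1}{Z(\theta)} \exp\Big(\sum_i \theta_i f_i(x)\Big) \\
\text{(maximum entropy)}\quad
  & \max_{Q}\; H(Q)
  \quad \text{s.t.} \quad \mathbb{E}_{Q}[f_i(x)] = \mathbb{E}_{\hat P}[f_i(x)] \;\; \forall i
\end{align*}
```

The entropy maximizer has the log-linear form $P_\theta$, with the weights $\theta_i$ appearing as Lagrange multipliers of the moment constraints. The approach described in the abstract approximates both the entropy objective and the constraint set of the second problem.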
Estimating the “wrong” Markov random field: Benefits in the computation-limited setting
- In Advances in Neural Information Processing Systems, 2005
- Cited by 2 (0 self)
Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used both to construct an M-estimator for fitting parameters and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of variational methods. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product.
Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models
- 2003
- Cited by 1 (0 self)
- Yee Whye Teh, Doctorate of Philosophy, Graduate Department of Computer Science, University of Toronto, 2003
As the machine learning community tackles more complex and harder problems, the graphical models needed to solve such problems become larger and more complicated. As a result, performing inference and learning exactly for such graphical models becomes ever more expensive, and approximate inference and learning techniques become ever more prominent.
Distributed Covariance Estimation in Gaussian Graphical Models
- Cited by 1 (0 self)
We consider distributed estimation of the inverse covariance matrix in Gaussian graphical models. These models factorize the multivariate distribution and allow for efficient distributed signal processing methods such as belief propagation (BP). The classical maximum likelihood approach to this covariance estimation problem, or potential function estimation in BP terminology, requires centralized computing and is computationally intensive. This motivates suboptimal distributed alternatives that trade off accuracy for communication cost. A natural solution is for each node to perform estimation of its local covariance with respect to its neighbors. The local maximum likelihood estimator is asymptotically consistent but suboptimal, i.e., it does not minimize the mean squared error (MSE) of the estimate. We propose to improve the MSE performance by introducing additional symmetry constraints using averaging and pseudolikelihood estimation approaches. We compute the proposed estimates using message passing protocols, which can be efficiently implemented in large-scale graphical models with many nodes. We illustrate the advantages of our proposed methods using numerical experiments with synthetic data as well as real-world data from a wireless sensor network.
Index Terms: Covariance estimation, distributed signal processing, graphical models.
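One possible reading of the local estimator followed by averaging can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the 3-node chain, its coefficients, the sample size, and all function names are hypothetical, and the pseudolikelihood variant and the message passing protocol are not shown. Each node inverts the sample covariance of itself and its neighbors and keeps its own row; the two estimates of each edge weight are then averaged to restore symmetry.

```python
import random


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sample_chain(n, seed=0):
    """Draw n samples from a hypothetical zero-mean Gaussian chain x0 - x1 - x2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = 0.5 * x0 + rng.gauss(0.0, 1.0)
        x2 = 0.5 * x1 + rng.gauss(0.0, 1.0)
        data.append((x0, x1, x2))
    return data


def sample_cov(data):
    """Sample covariance assuming zero mean (true for the chain above)."""
    n, d = len(data), len(data[0])
    return [[sum(r[i] * r[j] for r in data) / n for j in range(d)] for i in range(d)]


def local_rows(S, neighbors):
    """Each node inverts the covariance of its neighborhood and keeps its own row."""
    d = len(S)
    K = [[0.0] * d for _ in range(d)]
    for i, nbrs in neighbors.items():
        idx = [i] + sorted(nbrs)                       # node i first, then its neighbors
        local_prec = inverse([[S[a][b] for b in idx] for a in idx])
        for pos, j in enumerate(idx):
            K[i][j] = local_prec[0][pos]               # row 0 belongs to node i
    return K


def symmetrize(K):
    """Average the two estimates of each edge weight held by its endpoints."""
    d = len(K)
    return [[0.5 * (K[i][j] + K[j][i]) for j in range(d)] for i in range(d)]


neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
S = sample_cov(sample_chain(10_000))
K_local = local_rows(S, neighbors)      # generally not symmetric
K_avg = symmetrize(K_local)             # symmetric by construction
# Precision matrix implied by the chain's generating equations.
K_true = [[1.25, -0.5, 0.0], [-0.5, 1.25, -0.5], [0.0, -0.5, 1.0]]
```

Each node only needs data from its own neighborhood, and the averaging step only requires neighbors to exchange one number per edge.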
Linear-Time Inverse Covariance Matrix Estimation in Gaussian Processes
The computational cost of Gaussian process regression grows cubically with respect to the number of variables due to the inversion of the covariance matrix, which is impractical for data sets with more than a few thousand nodes. Furthermore, Gaussian processes lack the ability to represent conditional independence assertions between variables. We describe iterative proportional scaling (IPS) for directly estimating the precision matrix without inverting the covariance matrix, given an undirected graph and a covariance function or data. We introduce a variant of the Shafer-Shenoy algorithm combined with IPS that runs in O(nC^3) time, where C is the largest clique size in the induced junction tree. We present results on synthetic data and temperature prediction in a real sensor network.
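The textbook form of iterative proportional scaling for a Gaussian graphical model can be sketched as below. This is a naive version for illustration only: it inverts the full precision matrix at every step, which is exactly the cost the Shafer-Shenoy variant in the abstract is designed to avoid. The 3-node chain, the target covariance values, and the function names are assumptions.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block(m, idx):
    """Extract the square sub-matrix of m indexed by idx."""
    return [[m[i][j] for j in idx] for i in idx]


def ips(target_cov, cliques, sweeps=5):
    """Fit a precision matrix K whose clique marginals match target_cov."""
    n = len(target_cov)
    K = [[float(i == j) for j in range(n)] for i in range(n)]   # start from identity
    for _ in range(sweeps):
        for c in cliques:
            target_prec = inverse(block(target_cov, c))         # (S_CC)^-1
            current_prec = inverse(block(inverse(K), c))        # ((K^-1)_CC)^-1
            # Update only the clique block; entries off the graph stay zero.
            for a, i in enumerate(c):
                for b, j in enumerate(c):
                    K[i][j] += target_prec[a][b] - current_prec[a][b]
    return K


# Hypothetical 3-node chain 0 - 1 - 2 with cliques {0, 1} and {1, 2}.
S = [[2.0, 0.8, 0.3],
     [0.8, 1.5, 0.6],
     [0.3, 0.6, 1.0]]
cliques = [(0, 1), (1, 2)]
K = ips(S, cliques)
Sigma = inverse(K)   # covariance implied by the fitted precision matrix
```

After fitting, `Sigma` agrees with `S` on every clique block, while the entry for the non-adjacent pair (0, 2) is determined by the conditional independence structure rather than copied from `S`.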
Distributed covariance estimation in Gaussian graphical models
- 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop
We consider distributed covariance estimation in Gaussian graphical models. A typical motivation is learning the potential functions for inference via belief propagation in large-scale networks. The classical approach based on a centralized maximum likelihood principle is infeasible, and suboptimal distributed alternatives that trade off performance against communication costs are required. We begin with a natural solution where each node performs independent estimation of its local covariance with its neighbors. We show that these local solutions are consistent, and can be interpreted as a pseudo-likelihood method. Based on this interpretation, we propose to enhance the performance by introducing additional symmetry constraints. We enforce these using the Alternating Direction Method of Multipliers. This results in a flexible message passing protocol between neighboring nodes which can be implemented in large-scale networks.

