## Evaluating probabilities under high-dimensional latent variable models

Citations: 13 (4 self)

### BibTeX

@MISC{Murray_evaluatingprobabilities,
  author = {Iain Murray and Ruslan Salakhutdinov},
  title = {Evaluating probabilities under high-dimensional latent variable models},
  year = {}
}


### Citations

617 | Fast and Robust Fixed-Point Algorithms for Independent Component Analysis
- Hyvarinen
- 1999
Citation Context: ... over 20 nats shows that, for model comparison purposes, the variational lower bound is quite loose. For comparison, we also trained square ICA and a mixture of factor analyzers (MFA) using code from [16, 17]. Square ICA achieves a test log probability of −551.14, and MFA with 50 mixture components and a 30-dimensional latent space achieves −502.30, clearly outperforming DBNs. 6 Discussion Our new Monte C...

414 | Marginal likelihood from the Gibbs output
- Chib
- 1995
Citation Context: ...lose. 2.4 Chib-style estimators Bayes rule implies that for any special hidden state h*, P(v) = P(h*, v)/P(h*|v). (8) This trivial identity suggests a family of estimators introduced by Chib [9]. First, we choose a particular hidden state h*, usually one with high posterior probability, and then estimate P(h*|v). We would like to obtain an estimator that is based on a sequence of states...
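The identity in the excerpt, P(v) = P(h*, v)/P(h*|v), can be demonstrated end-to-end on a toy model. The sketch below is my construction, not the paper's model: a single binary latent variable, a chosen high-posterior state h*, and P(h*|v) estimated by the raw frequency of h* among exact posterior samples.

```python
import random

random.seed(0)

# Hypothetical toy model (illustration only): binary latent h, observed v = 1.
p_h1 = 0.4                       # prior P(h = 1)
p_v_given_h = {0: 0.1, 1: 0.8}   # likelihood P(v = 1 | h)

# Exact quantities, available only because the toy model is tiny.
p_v = (1 - p_h1) * p_v_given_h[0] + p_h1 * p_v_given_h[1]   # 0.38
post_h1 = p_h1 * p_v_given_h[1] / p_v                       # P(h = 1 | v)

# Chib-style estimate: pick h* = 1, estimate P(h*|v) from posterior samples,
# then recover P(v) = P(h*, v) / P(h*|v).
S = 100_000
count = sum(random.random() < post_h1 for _ in range(S))
p_hstar_given_v = count / S
p_v_est = (p_h1 * p_v_given_h[1]) / p_hstar_given_v
print(p_v_est)   # ≈ 0.38
```

In the methods the excerpt describes, the posterior samples come from a Markov chain and P(h*|v) is estimated via transition probabilities rather than raw state frequencies, but the identity being exploited is the same.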

237 | The EM algorithm for mixtures of factor analyzers
- Ghahramani, Hinton
- 1997
Citation Context: ... over 20 nats shows that, for model comparison purposes, the variational lower bound is quite loose. For comparison, we also trained square ICA and a mixture of factor analyzers (MFA) using code from [16, 17]. Square ICA achieves a test log probability of −551.14, and MFA with 50 mixture components and a 30-dimensional latent space achieves −502.30, clearly outperforming DBNs. 6 Discussion Our new Monte C...

173 | Sequential Monte Carlo samplers
- Del Moral, Doucet, et al.
- 2006
Citation Context: ...atches the target distribution. As an example we give a partial review of Annealed Importance Sampling (AIS) [7], a special case of a larger family of Sequential Monte Carlo (SMC) methods (see, e.g., [8]). Some of this theory will be needed in the new method we present in section 3. Annealing algorithms start with a sample from some tractable distribution P1. Steps are taken with a series of operator...

170 | Approximate Bayesian Inference with the Weighted Likelihood Bootstrap
- Newton, Raftery
- 1994
Citation Context: ...e correlated samples from MCMC are used; then the estimator is asymptotically unbiased. It was clear from the original paper and its discussion that the harmonic mean estimator can behave very poorly [4]. Samples in the tails of the posterior have large weights, which makes it easy to construct distributions where the estimator has infinite variance. A finite set of samples will rarely include any ex...
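The harmonic mean identity behind this excerpt: with h⁽ˢ⁾ ~ P(h|v), P(v) ≈ [ (1/S) Σₛ 1/P(v|h⁽ˢ⁾) ]⁻¹. A minimal sketch on a hypothetical three-state toy model (my construction, not the paper's). On a model this small the estimator behaves; the pathology the excerpt describes appears when P(v|h) can be tiny under the posterior, giving the weights 1/P(v|h) heavy tails and potentially infinite variance.

```python
import random

random.seed(1)

# Hypothetical toy model: latent h uniform on {0, 1, 2}, observed v = 1.
p_v_given_h = [0.1, 0.5, 0.8]
p_v = sum(p_v_given_h) / 3          # exact marginal: 1.4 / 3 ≈ 0.4667

# Exact posterior P(h|v) ∝ P(v|h), since the prior is uniform.
post = [p / sum(p_v_given_h) for p in p_v_given_h]

# Harmonic mean estimator from (here, exact) posterior samples.
S = 100_000
recip_sum = 0.0
for _ in range(S):
    h = random.choices(range(3), weights=post)[0]
    recip_sum += 1.0 / p_v_given_h[h]
p_v_est = S / recip_sum
print(p_v_est)   # ≈ 0.467
```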

170 | Annealed importance sampling
- Neal
- 1998
Citation Context: ...imensional than it was before, can help find an approximating Q distribution that closely matches the target distribution. As an example we give a partial review of Annealed Importance Sampling (AIS) [7], a special case of a larger family of Sequential Monte Carlo (SMC) methods (see, e.g., [8]). Some of this theory will be needed in the new method we present in section 3. Annealing algorithms start w...
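The annealing scheme the excerpt reviews can be illustrated on a toy discrete target. Everything below is my construction: a 10-state unnormalized target, a uniform base distribution P1, the geometric bridge f_β(x) = f(x)^β (one standard choice of intermediate distributions, not necessarily the paper's), and a single Metropolis step per temperature.

```python
import math
import random

random.seed(0)

# Hypothetical unnormalized target over states {0, ..., 9}.
def f(x, beta=1.0):
    return math.exp(-0.5 * beta * (x - 6.0) ** 2)

Z_true = sum(f(x) for x in range(10))   # tractable only because the toy is tiny

def ais_run(n_temps=20):
    # Start from the tractable base P1: uniform over 10 states
    # (unnormalized g(x) = 1, so Z_1 = 10).
    x = random.randrange(10)
    log_w = 0.0
    betas = [k / n_temps for k in range(n_temps + 1)]
    for k in range(n_temps):
        # Importance-weight increment between adjacent intermediate distributions.
        log_w += math.log(f(x, betas[k + 1])) - math.log(f(x, betas[k]))
        # One Metropolis step leaving the next intermediate distribution invariant.
        prop = random.randrange(10)
        if random.random() < f(prop, betas[k + 1]) / f(x, betas[k + 1]):
            x = prop
    return log_w

Z1 = 10.0
runs = 2000
Z_est = Z1 * sum(math.exp(ais_run()) for _ in range(runs)) / runs
print(Z_est, Z_true)   # the two should agree to within a few percent
```

Averaging the weights exp(log_w) gives an unbiased estimate of Z_target/Z_1, which is the AIS property the paper's new estimator builds on.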

142 | Integrating topics and syntax
- Griffiths, Blei, et al.
Citation Context: ...e an overestimate. Despite these problems the estimator has received significant attention in statistics, and has been used for evaluating latent variable models in recent machine learning literature [5, 6]. This is understandable: all of the existing, more accurate methods are harder to implement and take considerably longer to run. In this paper we propose a method that is nearly as easy to use as the...

141 | Marginal likelihood from the Metropolis-Hastings output
- Chib, Jeliazkov
- 2001
Citation Context: ... some Markov chains, there are technical problems with the above construction, which require an extension explained in the appendix. Moreover the approach above is not what Chib recommended. In fact, [11] explicitly favors a more elaborate procedure involving sampling from a sequence of distributions. This opens up the possibility of many sophisticated developments, e.g. [12, 13]. However, our focus i...

101 | Topic modeling: Beyond bag-of-words
- Wallach
- 2006
Citation Context: ...e an overestimate. Despite these problems the estimator has received significant attention in statistics, and has been used for evaluating latent variable models in recent machine learning literature [5, 6]. This is understandable: all of the existing, more accurate methods are harder to implement and take considerably longer to run. In this paper we propose a method that is nearly as easy to use as the...

74 | Facilitating the Gibbs Sampler: the Gibbs stopper and the Griddy-Gibbs sampler
- Ritter, Tanner
- 1992
Citation Context: ...der from 1 to M, the move has probability: T(h* ← h) = ∏_{j=1}^{M} P(h*_j | h*_{1:(j−1)}, h_{(j+1):M}). (11) Equations (9, 11) have been used in schemes for monitoring the convergence of Gibbs samplers [10]. It is worth emphasizing that we have only outlined the simplest possible scheme inspired by Chib's general approach. For some Markov chains, there are technical problems with the above construction,...
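The sequential-sweep transition probability in the excerpt, T(h* ← h) = ∏_{j=1}^{M} P(h*_j | h*_{1:(j−1)}, h_{(j+1):M}), is the probability of a full Gibbs sweep landing exactly on h*, and a Gibbs sweep leaves the posterior invariant. Both facts can be checked numerically on a tiny model. The sketch below uses a hypothetical two-variable binary distribution of my own construction:

```python
from itertools import product

# Hypothetical 2-variable binary posterior P(h|v); weights chosen for illustration.
w = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
Z = sum(w.values())
pi = {h: w[h] / Z for h in w}

def cond(j, val, other):
    """Full conditional P(h_j = val | h_{-j} = other) under pi."""
    if j == 0:
        return pi[(val, other)] / (pi[(0, other)] + pi[(1, other)])
    return pi[(other, val)] / (pi[(other, 0)] + pi[(other, 1)])

def T(h_star, h):
    # Equation (11) with sweep order j = 1, 2 (M = 2):
    # T(h* <- h) = P(h*_1 | h_2) * P(h*_2 | h*_1)
    return cond(0, h_star[0], h[1]) * cond(1, h_star[1], h_star[0])

# Stationarity check: sum_h pi(h) T(h* <- h) == pi(h*) for every h*.
for h_star in product((0, 1), repeat=2):
    total = sum(pi[h] * T(h_star, h) for h in pi)
    assert abs(total - pi[h_star]) < 1e-9
```

Stationarity is what makes estimating P(h*|v) by averaging T(h* ← h) over posterior samples consistent, which is the use the Chib-style construction puts it to.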

49 | Divergence measures and message passing
- Minka
- 2005
Citation Context: ...Importance sampling relies on the sampling distribution Q(h) being similar to the target distribution P(h|v). Specifically, the variance of the estimator is an α-divergence between the distributions [3]. Finding a tractable Q(h) with small divergence is difficult in high-dimensional problems. 2.2 The Harmonic mean method Using Q(h)=P(h|v) in (1) gives an "estimator" that requires knowing P(v). As ...
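The basic estimator under discussion: draw h ~ Q and average P(h, v)/Q(h), which is unbiased for P(v) = Σ_h P(h, v). A minimal sketch on a hypothetical binary toy model (my numbers, not the paper's):

```python
import random

random.seed(0)

# Hypothetical toy model: binary latent h, observed v = 1.
p_h = {0: 0.7, 1: 0.3}        # prior P(h)
p_v_h = {0: 0.2, 1: 0.9}      # likelihood P(v = 1 | h)
q = {0: 0.5, 1: 0.5}          # tractable proposal Q(h)

# Importance sampling: P(v) = E_Q[ P(h, v) / Q(h) ].
S = 100_000
total = 0.0
for _ in range(S):
    h = 0 if random.random() < q[0] else 1
    total += p_h[h] * p_v_h[h] / q[h]
print(total / S)   # ≈ 0.41 = 0.7 * 0.2 + 0.3 * 0.9
```

The estimator's variance depends on how well Q matches the posterior P(h|v); when they are far apart, as is typical in high dimensions, the weights degenerate, which is the α-divergence point the excerpt makes.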

40 | On the quantitative analysis of deep belief networks
- Salakhutdinov, Murray
- 2008
Citation Context: ...r dependencies amongst latent variables, even if they are independent a priori. Our current work is motivated by recent work on evaluating RBMs and their generalization to Deep Belief Networks (DBNs) [1]. For both types of models, a single constant was accurately approximated so that P(v, h) could be evaluated point-wise. For RBMs, the remaining sum over hidden variables was performed analytically. ...

39 | Modeling image patches with a directed hierarchy of Markov random fields. Advances in Neural Information Processing Systems
- Osindero, Hinton
Citation Context: ...handwritten digits (0 to 9), with 28×28 pixels. The image dataset consisted of 130,000 training and 20,000 test 20×20 patches. The raw image intensities were preprocessed and whitened as described in [15]. Gibbs sampling was used as a Markov chain transition operator throughout. All log probabilities quoted use natural logarithms, giving values in nats. 5.1 MNIST digits In our first experiment we used...

22 | Efficient Bayes factor estimation from the reversible jump output
- Bartolucci, Scaccia, et al.
- 2006
Citation Context: ...hib recommended. In fact, [11] explicitly favors a more elaborate procedure involving sampling from a sequence of distributions. This opens up the possibility of many sophisticated developments, e.g. [12, 13]. However, our focus in this work is on obtaining more useful results from simple cheap methods. There are also well-known problems with the Chib approach [14], to which we will return. 3 A new estima...

17 | Erroneous results in "Marginal Likelihood from the Gibbs Output". Available from http://www.cs.utoronto.ca/~radford/
- Neal
- 1998
Citation Context: ...any sophisticated developments, e.g. [12, 13]. However, our focus in this work is on obtaining more useful results from simple cheap methods. There are also well-known problems with the Chib approach [14], to which we will return. 3 A new estimator for evaluating latent-variable models We start with the simplest Chib-inspired estimator based on equations (8,9,11). Like many Markov chain Monte Carlo al...

16 | A fast learning algorithm for deep belief nets
- Hinton, Osindero, Teh
- 2006

8 | Studies in lower bounding probabilities of evidence using the Markov inequality
- Gogate, Bidyuk, et al.
Citation Context: ...ctice it is likely to underestimate the (log-)probability of a test set. Although the algorithm involves Markov chains, importance sampling underlies the estimator. Therefore the methods discussed in [18] could be used to bound the probability of accidentally over-estimating a test set probability. In principle our procedure is a general technique for estimating normalizing constants. It would not alw...

7 | Bridge Estimation of the Probability Density at a Point
- Mira, Nicholls
- 2004
Citation Context: ...hib recommended. In fact, [11] explicitly favors a more elaborate procedure involving sampling from a sequence of distributions. This opens up the possibility of many sophisticated developments, e.g. [12, 13]. However, our focus in this work is on obtaining more useful results from simple cheap methods. There are also well-known problems with the Chib approach [14], to which we will return. 3 A new estima...