Results 1 - 10
of
36
Probabilistic Inference Using Markov Chain Monte Carlo Methods
, 1993
"... Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difculties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over high-dimensional spaces. Rel ..."
Abstract
-
Cited by 449 (15 self)
- Add to MetaCart
Probabilistic inference is an attractive approach to uncertain reasoning and empirical learning in artificial intelligence. Computational difculties arise, however, because probabilistic models with the necessary realism and flexibility lead to complex distributions over high-dimensional spaces. Related problems in other fields have been tackled using Monte Carlo methods based on sampling using Markov chains, providing a rich array of techniques that can be applied to problems in artificial intelligence. The "Metropolis algorithm" has been used to solve difficult problems in statistical physics for over forty years, and, in the last few years, the related method of "Gibbs sampling" has been applied to problems of statistical inference. Concurrently, an alternative method for solving problems in statistical physics by means of dynamical simulation has been developed as well, and has recently been unified with the Metropolis algorithm to produce the "hybrid Monte Carlo" method. In computer science, Markov chain sampling is the basis of the heuristic optimization technique of "simulated annealing", and has recently been used in randomized algorithms for approximate counting of large sets. In this review, I outline the role of probabilistic inference in artificial intelligence, and present the theory of Markov chains, and describe various Markov chain Monte Carlo algorithms, along with a number of supporting techniques. I try to present a comprehensive picture of the range of methods that have been developed, including techniques from the varied literature that have not yet seen wide application in artificial intelligence, but which appear relevant. As illustrative examples, I use the problems of probabilitic inference in expert systems, discovery of latent classes from data, and Bayesian learning for neural networks.
Experimental Queueing Analysis with Long-Range Dependent Packet Traffic
- IEEE/ACM Transactions on Networking
, 1996
"... Recent traffic measurement studies from a wide range of working packet networks have convincingly established the presence of significant statistical features that are characteristic of fractal traffic processes, in the sense that these features span many time scales. Of particular interest in packe ..."
Abstract
-
Cited by 275 (13 self)
- Add to MetaCart
Recent traffic measurement studies from a wide range of working packet networks have convincingly established the presence of significant statistical features that are characteristic of fractal traffic processes, in the sense that these features span many time scales. Of particular interest in packet traffic modeling is a property called long-range dependence, which is marked by the presence of correlations that can extend over many time scales. In this paper, we demonstrate empirically that, beyond its statistical significance in traffic measurements, long-range dependence has considerable impact on queueing performance, and is a dominant characteristic for a number of packet traffic engineering problems. In addition, we give conditions under which the use of compact and simple traffic models that incorporate long-range dependence in a parsimonious manner (e.g., fractional Brownian motion) is justified and can lead to new insights into the traffic management of high-speed networks. 1...
The Lack of A Priori Distinctions Between Learning Algorithms
, 1996
"... This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in ..."
Abstract
-
Cited by 103 (5 self)
- Add to MetaCart
This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice-versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one can not say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
Bayes factors and model uncertainty
- DEPARTMENT OF STATISTICS, UNIVERSITY OFWASHINGTON
, 1993
"... In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null ..."
Abstract
-
Cited by 70 (6 self)
- Add to MetaCart
In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications. The points we emphasize are:- from Jeffreys's Bayesian point of view, the purpose of hypothesis testing is to evaluate the evidence in favor of a scientific theory;- Bayes factors offer a way of evaluating evidence in favor ofa null hypothesis;- Bayes factors provide a way of incorporating external information into the evaluation of evidence about a hypothesis;- Bayes factors are very general, and do not require alternative models to be nested;- several techniques are available for computing Bayes factors, including asymptotic approximations which are easy to compute using the output from standard packages that maximize likelihoods;- in "non-standard " statistical models that do not satisfy common regularity conditions, it can be technically simpler to calculate Bayes factors than to derive non-Bayesian significance
Graphical Models and Variational Methods
, 2001
"... We review the use of variational methods of approximating inference and learning in probabilistic graphical models. In particular, we focus on variational approximations to the integrals required for Bayesian learning. For models in the conjugate-exponential family, a generalisation of the EM algori ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
We review the use of variational methods of approximating inference and learning in probabilistic graphical models. In particular, we focus on variational approximations to the integrals required for Bayesian learning. For models in the conjugate-exponential family, a generalisation of the EM algorithm is derived that iterates between optimising hyperparameters of the distribution over parameters, and inferring the hidden variable distributions. These approximations make use of available propagation algorithms for probabilistic graphical models. We give two case studies of how the variational Bayesian approach can be used to learn model structure: inferring the number of clusters and dimensionalities in a mixture of factor analysers, and inferring the dimension of the state space of a linear dynamical system. Finally, importance sampling corrections to the variational approximations are discussed, along with their limitations.
Occam's Razor
- In Advances in Neural Information Processing Systems 13
, 2001
"... The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work.
Off-Training Set Error And a Priori Distinctions Between . . .
, 1995
"... This paper uses off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice-ver ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
This paper uses off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice-versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the generalizer with largest cross-validation error). On the other hand, for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However even for such loss functions, any algorithm is equivalent on average to its "randomized" version, and in this still has no first principles justification in terms of average error. Nonetheless, it may be that (for example) cross-validation has better minimax properties than anti-cross-validation, even for zero-one loss. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors. Accordingly they prove, as a particular example, that cross-validation can not be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-crossvalidation rather than cross-validation (!). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one can not say: if empirical misclassification rate is low; the VC dimension of your generalizer is small; and the training set is large, then with high probability your OTS error is small. Other implications for "membership queries " algorithms and "punting" algorithms are also discussed.
A Bayesian Framework for Concept Learning
- DEPARTMENT OF ARTIFICIAL INTELLIGENCE, EDINBURGH UNIVERSITY
, 1999
"... Human concept learning presents a version of the classic problem of induction, which is made particularly difficult by the combination of two requirements: the need to learn from a rich (i.e. nested and overlapping) vocabulary of possible concepts and the need to be able to generalize concepts reaso ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Human concept learning presents a version of the classic problem of induction, which is made particularly difficult by the combination of two requirements: the need to learn from a rich (i.e. nested and overlapping) vocabulary of possible concepts and the need to be able to generalize concepts reasonably from only a few positive examples. I begin this thesis by considering a simple number concept game as a concrete illustration of this ability. On this task, human learners can with reasonable confidence lock in on one out of a billion billion billion logically possible concepts, after seeing only four positive examples of the concept, and can generalize informatively after seeing just a single example. Neither of the two classic approaches to inductive inference -- hypothesis testing in a constrained space of possible rules and computing similarity to the observed examples -- can provide a complete picture of how people generalize concepts in even this simple setting. This thesis prop...
Unsupervised learning
- Advanced Lectures on Machine Learning
, 2004
"... We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mix ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed. Contents 1
Evaluation of 3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers
- Proc. EHuM workshop, NIPS
, 2006
"... ..."

