An Introduction to MCMC for Machine Learning
2003
Cited by 141 (2 self)
Abstract
The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses interesting new research horizons.
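The Metropolis algorithm at the core of the MCMC building blocks reviewed above can be sketched in a few lines. The following is a minimal random-walk Metropolis sampler in Python, not code from the paper; the target (a standard normal known only up to its normalizing constant), the step size 2.4 and the burn-in length are illustrative assumptions.

```python
import math
import random


def metropolis(log_target, x0, step, n_samples, seed=0):
    """Random-walk Metropolis for a 1-D unnormalized log-density.

    Returns the chain of samples and the empirical acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    logp = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        y = x + rng.gauss(0.0, step)
        logp_y = log_target(y)
        # Accept with probability min(1, p(y) / p(x)), computed in log space.
        if rng.random() < math.exp(min(0.0, logp_y - logp)):
            x, logp = y, logp_y
            accepted += 1
        samples.append(x)  # on rejection the current state is repeated
    return samples, accepted / n_samples


def log_std_normal(x):
    """Standard normal log-density without its normalizing constant."""
    return -0.5 * x * x


samples, rate = metropolis(log_std_normal, x0=0.0, step=2.4, n_samples=50_000)
burned = samples[5_000:]  # discard an illustrative burn-in period
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"mean={mean:.2f} var={var:.2f} acceptance={rate:.2f}")
```

Because the proposal is symmetric, only the ratio of unnormalized target densities is needed, which is how MCMC avoids computing the normalizing constant.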
Dependency networks for inference, collaborative filtering, and data visualization
Journal of Machine Learning Research
Cited by 122 (9 self)
Abstract
We describe a graphical model for probabilistic relationships, an alternative to the Bayesian network, called a dependency network. The graph of a dependency network, unlike that of a Bayesian network, is potentially cyclic. The probability component of a dependency network, like that of a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
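To make the cyclic-graph idea concrete, here is a hypothetical two-node dependency network in which A depends on B and B depends on A, queried with a Gibbs sampler that sweeps the nodes in a fixed order and redraws each from its conditional. This is a sketch under stated assumptions, not the paper's learning or inference code: the conditional tables are invented and were derived from one joint distribution so that they are consistent. Conditionals learned separately from data need not be consistent with any single joint, in which case the sampler's stationary distribution need not reproduce them exactly.

```python
import random

# Hypothetical toy dependency network over binary variables A and B.
# The graph is cyclic (A <- B and B <- A). Each entry maps a node to
# (parents, function giving P(node = 1 | parent values)). The tables were
# derived from the joint P(A, B): (0,0)=0.4, (0,1)=0.1, (1,0)=0.2, (1,1)=0.3.
NETWORK = {
    "A": (("B",), lambda pa: 0.75 if pa[0] == 1 else 1 / 3),
    "B": (("A",), lambda pa: 0.6 if pa[0] == 1 else 0.2),
}


def ordered_gibbs(network, n_sweeps, burn_in=1_000, seed=0):
    """Sweep the nodes in a fixed order, redrawing each one from its
    conditional given its parents' current values, and tally the joint
    states visited after burn-in."""
    rng = random.Random(seed)
    order = list(network)
    state = {name: 0 for name in order}
    counts = {}
    for sweep in range(burn_in + n_sweeps):
        for name in order:
            parents, p_one = network[name]
            parent_values = tuple(state[p] for p in parents)
            state[name] = 1 if rng.random() < p_one(parent_values) else 0
        if sweep >= burn_in:
            key = tuple(state[name] for name in order)
            counts[key] = counts.get(key, 0) + 1
    return {key: c / n_sweeps for key, c in counts.items()}


joint = ordered_gibbs(NETWORK, n_sweeps=100_000)
print(sorted(joint.items()))  # estimates of P(A, B), keyed by (a, b)
```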
Likelihood Inference for Discretely Observed Non-Linear Diffusions
Econometrica, 1998
Cited by 97 (13 self)
Abstract
This paper is concerned with the Bayesian estimation of non-linear stochastic differential equations when only discrete observations are available. The estimation is carried out using a tuned MCMC method, in particular a blocked Metropolis-Hastings algorithm, by introducing auxiliary points and using the Euler-Maruyama discretisation scheme. Techniques for computing the likelihood function, the marginal likelihood and diagnostic measures (all based on the MCMC output) are presented. Examples using simulated and real data are presented and discussed in detail.
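The Euler-Maruyama scheme mentioned in the abstract replaces each small time step of the diffusion dX = μ(X) dt + σ(X) dW by a Gaussian increment, which gives a tractable likelihood once auxiliary points fill the gaps between observations. The sketch below is an illustration under assumptions, not the paper's estimator: it simulates a hypothetical double-well model and evaluates the Euler-Maruyama log-likelihood of a fully observed fine-grid path for a few drift parameters. In the paper's setting the intermediate points are unobserved and would be updated by the blocked Metropolis-Hastings sampler, which is not reproduced here.

```python
import math
import random


def euler_maruyama_path(x0, drift, diffusion, dt, n_steps, rng):
    """Simulate X_{k+1} = X_k + mu(X_k) dt + sigma(X_k) sqrt(dt) Z_k."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        noise = diffusion(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x + drift(x) * dt + noise)
    return path


def euler_log_likelihood(path, drift, diffusion, dt):
    """Log-likelihood of a finely discretised path under the Euler-Maruyama
    approximation, where each step is Gaussian:
    X_{k+1} | X_k ~ N(X_k + mu(X_k) dt, sigma(X_k)^2 dt)."""
    total = 0.0
    for x, x_next in zip(path[:-1], path[1:]):
        mean = x + drift(x) * dt
        var = diffusion(x) ** 2 * dt
        total -= 0.5 * (math.log(2.0 * math.pi * var) + (x_next - mean) ** 2 / var)
    return total


# Hypothetical non-linear model: double-well drift theta * (x - x^3),
# constant diffusion 0.5, true theta = 1.
def make_drift(theta):
    return lambda x: theta * (x - x ** 3)


def sigma(_x):
    return 0.5


rng = random.Random(1)
dt = 0.01
path = euler_maruyama_path(1.0, make_drift(1.0), sigma, dt, 20_000, rng)
loglik = {th: euler_log_likelihood(path, make_drift(th), sigma, dt) for th in (0.3, 1.0, 2.0)}
print(loglik)  # the true theta = 1.0 should give the largest value
```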
Strictly Proper Scoring Rules, Prediction, and Estimation
2007
Cited by 86 (13 self)
Abstract
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile ...
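The continuous ranked probability score and its links to the absolute error and the energy score can be checked numerically. This Python sketch is not from the article: it uses the standard closed form of the CRPS for a Gaussian forecast, a sample-based version of the form E|X - y| - 0.5 E|X - X'|, and the logarithmic score, with all data simulated. Sign conventions differ between sources; here the CRPS is written as a loss (lower is better) while the log score is written so that higher is better.

```python
import math
import random


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at outcome y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def crps_ensemble(members, y):
    """Plug-in CRPS from forecast samples: E|X - y| - 0.5 E|X - X'|
    (slightly biased for small ensembles)."""
    m = len(members)
    term1 = sum(abs(x - y) for x in members) / m
    term2 = sum(abs(a - b) for a in members for b in members) / (m * m)
    return term1 - 0.5 * term2


def log_score_gaussian(mu, sigma, y):
    """Logarithmic score log f(y) of a N(mu, sigma^2) forecast (higher is better)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


# A point forecast (a single member) reduces the CRPS to the absolute error.
print(crps_ensemble([2.0], 3.5))

# Propriety in action: outcomes are drawn from N(0, 1), so the honest forecast
# N(0, 1) should beat an overdispersed N(0, 2^2) forecast on average.
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
honest = sum(crps_gaussian(0.0, 1.0, y) for y in ys) / len(ys)
wide = sum(crps_gaussian(0.0, 2.0, y) for y in ys) / len(ys)
honest_log = sum(log_score_gaussian(0.0, 1.0, y) for y in ys) / len(ys)
wide_log = sum(log_score_gaussian(0.0, 2.0, y) for y in ys) / len(ys)
print(honest < wide, honest_log > wide_log)
```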
An Interruptible Algorithm for Perfect Sampling via Markov Chains
Annals of Applied Probability, 1998
Cited by 75 (7 self)
Abstract
For a large class of examples arising in statistical physics known as attractive spin systems (e.g., the Ising model), one seeks to sample from a probability distribution π on an enormously large state space, but elementary sampling is ruled out by the infeasibility of calculating an appropriate normalizing constant. The same difficulty arises in computer science problems where one seeks to sample randomly from a large finite distributive lattice whose precise size cannot be ascertained in any reasonable amount of time. The Markov chain Monte Carlo (MCMC) approximate sampling approach to such a problem is to construct and run "for a long time" a Markov chain with long-run distribution π. But determining how long is long enough to get a good approximation can be both analytically and empirically difficult. Recently, Jim Propp and David Wilson have devised an ingenious and efficient algorithm to use the same Markov chains to produce perfect (i.e., exact) samples from π. However, the running t...
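The Propp-Wilson idea that the abstract builds on, coupling from the past, can be shown on a small monotone chain. The sketch below is Propp and Wilson's coupling from the past, not the interruptible algorithm this paper proposes, and the birth-death chain on {0, ..., n - 1} is a hypothetical example. Because the update is monotone, it suffices to track the chains started from the bottom and top states; if they meet by time 0 when driven by the same random numbers, every chain would have, and the common value is an exact draw from the stationary distribution. With p_up = 0.5 that distribution is uniform.

```python
import random


def cftp_monotone(n_states, p_up, seed=0):
    """Coupling from the past for a monotone birth-death chain on
    {0, ..., n_states - 1}; returns one exact stationary draw."""
    rng = random.Random(seed)
    top = n_states - 1

    def update(x, u):
        # Monotone update: the same u moves every state in the same direction.
        return min(x + 1, top) if u < p_up else max(x - 1, 0)

    # us[k] drives the step from time -(k + 1) to time -k. The same numbers
    # are reused each time the horizon is pushed further into the past.
    us = []
    horizon = 1
    while True:
        while len(us) < horizon:
            us.append(rng.random())
        lo, hi = 0, top
        for k in range(horizon - 1, -1, -1):  # run from time -horizon up to 0
            lo, hi = update(lo, us[k]), update(hi, us[k])
        if lo == hi:  # every starting state has coalesced by time 0
            return lo
        horizon *= 2


draws = [cftp_monotone(5, 0.5, seed=s) for s in range(5_000)]
freqs = [draws.count(i) / len(draws) for i in range(5)]
print(freqs)  # each entry should be close to 0.2
```

Restarting from a fixed time 0 with fresh randomness for earlier steps, rather than running forward until coalescence, is what makes the output exact; stopping forward at the first meeting time would bias the draw.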
Spatial Econometrics
Palgrave Handbook of Econometrics: Volume 1, Econometric Theory, 2001
Cited by 36 (5 self)
Abstract
Spatial econometric methods deal with the incorporation of spatial interaction and spatial structure into regression analysis. The field has seen recent and rapid growth, spurred both by theoretical concerns and by the need to apply econometric models to emerging large geocoded databases. The review presented in this chapter outlines the basic terminology and discusses in some detail the specification of spatial effects, estimation of spatial regression models, and specification tests for spatial effects.
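As one concrete example of a specification test for spatial effects, Moran's I measures spatial autocorrelation in a variable (in practice, usually regression residuals) relative to a spatial weights matrix W: I = (n / S0) Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z are deviations from the mean and S0 is the sum of all weights. The sketch below is illustrative only; the chapter's own tests and notation may differ, and the five regions on a line are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values` under an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def line_contiguity(n):
    """Binary weights for n hypothetical regions on a line: neighbours share a border."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


W = line_contiguity(5)
print(morans_i([1, 2, 3, 4, 5], W))    # positive: neighbouring values are similar
print(morans_i([1, -1, 1, -1, 1], W))  # negative: neighbouring values alternate
```

Under no spatial autocorrelation the expected value of I is -1/(n - 1), so in practice the statistic is compared with that baseline and a variance under the null rather than with zero.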
Bayesian model averaging
Statistical Science, 1999
Cited by 29 (0 self)
Abstract
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are riskier than one thinks. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of ...
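At its simplest, the BMA mechanism weights each model's prediction by its posterior probability p(M_k | D) ∝ p(D | M_k) p(M_k). The following sketch assumes the log marginal likelihoods are already available (computing them is the hard part that implementation methods address) and uses invented numbers for three models with equal prior probabilities. It averages point predictions only; full BMA averages the predictive distributions with the same weights.

```python
import math


def bma_weights(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities p(M_k | D) proportional to p(D | M_k) p(M_k),
    normalised with the log-sum-exp trick to avoid underflow."""
    k = len(log_marginal_likelihoods)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k
    logs = [lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, prior_probs)]
    top = max(logs)
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_predict(predictions, weights):
    """Posterior-probability-weighted average of the models' point predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical log marginal likelihoods and point predictions for three models.
log_ml = [-120.3, -119.1, -125.0]
preds = [4.2, 3.8, 5.1]
w = bma_weights(log_ml)
print([round(x, 3) for x in w])
print(round(bma_predict(preds, w), 3))
```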
Methods for Approximating Integrals in Statistics with Special Emphasis on Bayesian Integration Problems
Statistical Science
Cited by 26 (3 self)
Abstract
This paper is a survey of the major techniques and approaches available for the numerical approximation of integrals in statistics. We classify these into five broad categories, namely asymptotic methods, importance sampling, adaptive importance sampling, multiple quadrature and Markov chain methods. Each method is discussed, with an outline of the basic supporting theory and particular features of the technique. Conclusions are drawn concerning the relative merits of the methods based on the discussion and their application to three examples. The following broad recommendations are made. Asymptotic methods should only be considered in contexts where the integrand has a dominant peak with approximate ellipsoidal symmetry. Importance sampling, and preferably adaptive importance sampling, based on a multivariate Student t distribution should be used instead of asymptotic methods in such a context. Multiple quadrature, and in particular subregion adaptive integration, are the algorithms of choice for...
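The recommendation to base importance sampling on a Student t proposal can be illustrated in one dimension. This is a minimal self-normalized importance sampler, not the paper's adaptive, multivariate procedure; the Gamma-shaped target, the proposal location and scale, and the degrees of freedom are assumptions chosen so that the answer (a posterior mean of 3) is known. The heavy tails of the t proposal keep the importance weights bounded here, and the effective sample size gives a quick check on weight degeneracy.

```python
import math
import random


def student_t_logpdf(x, nu, loc, scale):
    """Log-density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def sample_student_t(rng, nu, loc, scale):
    # T = Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi-square(nu) = Gamma(nu / 2, scale 2).
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2, 2.0)
    return loc + scale * z / math.sqrt(v / nu)


def importance_mean(log_target, n, nu=4.0, loc=0.0, scale=1.0, seed=0):
    """Self-normalised importance-sampling estimate of E[theta] under an
    unnormalised target, plus the effective sample size of the weights."""
    rng = random.Random(seed)
    xs = [sample_student_t(rng, nu, loc, scale) for _ in range(n)]
    logw = [log_target(x) - student_t_logpdf(x, nu, loc, scale) for x in xs]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]  # points outside the support get weight 0
    total = sum(w)
    estimate = sum(wi * xi for wi, xi in zip(w, xs)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess


def log_gamma_kernel(theta):
    """Hypothetical unnormalised posterior: Gamma(shape 3, rate 1), mean 3."""
    return 2.0 * math.log(theta) - theta if theta > 0 else -math.inf


est, ess = importance_mean(log_gamma_kernel, n=50_000, loc=3.0, scale=2.0)
print(f"estimate={est:.3f}  ESS={ess:.0f}")
```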
Joint Bayesian Model Selection and Estimation of Noisy Sinusoids via Reversible Jump MCMC
1999
Cited by 23 (3 self)
Abstract
In this paper, the problem of joint Bayesian model selection and parameter estimation for sinusoids in white Gaussian noise is addressed. An original Bayesian model is proposed that allows us to define a posterior distribution on the parameter space. All Bayesian inference is then based on this distribution. Unfortunately, a direct evaluation of this distribution and of its features, including posterior model probabilities, requires evaluation of some complicated high-dimensional integrals. We develop an efficient stochastic algorithm based on reversible jump Markov chain Monte Carlo methods to perform the Bayesian computation. A convergence result for this algorithm is established. In simulations, detection based on posterior model probabilities appears to outperform conventional detection schemes.
Weak convergence of Metropolis algorithms for non-iid target distributions
2007
Cited by 18 (6 self)
Abstract
In this paper, we shall optimize the efficiency of Metropolis algorithms for multidimensional target distributions with scaling terms possibly depending on the dimension. We propose a method to determine the appropriate form for the scaling of the proposal distribution as a function of the dimension, which leads to the proof of an asymptotic diffusion theorem. We show that when there does not exist any component with a scaling term significantly smaller than the others, the asymptotically optimal acceptance rate is the well-known 0.234.
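The 0.234 figure can be reproduced in the classic setting of independent and identically distributed components, which this paper extends to non-iid targets. The sketch below runs random-walk Metropolis on a d-dimensional standard normal with proposal variance ℓ²/d; in that iid case the limiting acceptance rate is 2Φ(-ℓ/2), and ℓ ≈ 2.38 is the efficiency-optimal choice, giving about 0.234. The dimension, iteration count and ℓ values are illustrative assumptions; the non-iid scalings analysed in the paper are not implemented.

```python
import math
import random


def rwm_acceptance(dim, ell, n_iter, seed=0):
    """Empirical acceptance rate of random-walk Metropolis on a standard
    normal target in `dim` dimensions, with proposal N(x, (ell^2 / dim) I)."""
    rng = random.Random(seed)
    sigma = ell / math.sqrt(dim)  # proposal sd shrinks like 1 / sqrt(dim)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start at a draw from the target
    logp = -0.5 * sum(v * v for v in x)
    accepted = 0
    for _ in range(n_iter):
        y = [v + rng.gauss(0.0, sigma) for v in x]
        logp_y = -0.5 * sum(v * v for v in y)
        if rng.random() < math.exp(min(0.0, logp_y - logp)):
            x, logp = y, logp_y
            accepted += 1
    return accepted / n_iter


rates = {ell: rwm_acceptance(dim=50, ell=ell, n_iter=10_000) for ell in (0.5, 2.38, 5.0)}
for ell, rate in rates.items():
    print(f"ell={ell}: acceptance={rate:.3f}")
```

Small proposals are accepted often but move little, and large ones move far but are rarely accepted; the optimal scaling balances the two, which is why a target acceptance rate, rather than a fixed step size, transfers across dimensions.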

