Results 1–10 of 30
The design and analysis of benchmark experiments
J Comp Graph Stat, 2005
Abstract

Cited by 21 (10 self)
The assessment of the performance of learners by means of benchmark experiments is an established exercise. In practice, benchmark studies are a tool to compare the performance of several competing algorithms for a certain learning problem. Cross-validation or resampling techniques are commonly used to derive point estimates of the performances, which are compared to identify algorithms with good properties. For several benchmarking problems, test procedures taking the variability of those point estimates into account have been suggested. Most of the recently proposed inference procedures are based on special variance estimators for the cross-validated performance. We introduce a theoretical framework for inference problems in benchmark experiments and show that standard statistical test procedures can be used to test for differences in the performances. The theory is based on well-defined distributions of performance measures which can be compared with established tests. To demonstrate the usefulness in practice, the theoretical results are applied to regression and classification benchmark studies based on artificial and real-world data.
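The inference step this abstract describes, comparing fold-wise performance point estimates of two algorithms with a standard statistical test, can be sketched minimally as follows. The per-fold accuracies and the choice of a paired t statistic are illustrative assumptions, not figures or procedures taken from the paper:

```python
import numpy as np

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-fold performance estimates.

    scores_a, scores_b: equal-length arrays, one entry per
    cross-validation fold, for two competing algorithms.
    """
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Hypothetical per-fold accuracies from a 10-fold CV run (illustrative data).
a = np.array([0.91, 0.89, 0.92, 0.90, 0.93, 0.88, 0.91, 0.90, 0.92, 0.89])
b = np.array([0.87, 0.86, 0.88, 0.85, 0.89, 0.84, 0.88, 0.86, 0.87, 0.85])
t = paired_t_statistic(a, b)  # large positive t favors algorithm a
```

Comparing `t` against a t distribution with `n - 1` degrees of freedom gives the usual two-sided test; the paper's contribution is a framework justifying when such standard tests apply.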
Penalized loss functions for Bayesian model comparison
Abstract

Cited by 13 (1 self)
The deviance information criterion (DIC) is widely used for Bayesian model comparison, despite the lack of a clear theoretical foundation. DIC is shown to be an approximation to a penalized loss function based on the deviance, with a penalty derived from a cross-validation argument. This approximation is valid only when the effective number of parameters in the model is much smaller than the number of independent observations. In disease mapping, a typical application of DIC, this assumption does not hold and DIC under-penalizes more complex models. Another deviance-based loss function, derived from the same decision-theoretic framework, is applied to mixture models, which have previously been considered an unsuitable application for DIC.
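The penalized-loss view can be made concrete with the standard DIC formula, DIC = D(θ̄) + 2·p_D with p_D = D̄ − D(θ̄). The function below is a generic sketch operating on MCMC deviance samples; the interface and the example numbers are illustrative, not from the paper:

```python
import numpy as np

def dic(deviance_samples, deviance_at_mean):
    """Deviance information criterion from MCMC output.

    deviance_samples: D(theta_s) evaluated at each posterior draw.
    deviance_at_mean: D(theta_bar), the deviance at the posterior mean.
    Returns (DIC, p_D), where p_D = mean deviance - deviance at mean
    is the effective number of parameters.
    """
    d_bar = np.mean(deviance_samples)
    p_d = d_bar - deviance_at_mean
    return deviance_at_mean + 2.0 * p_d, p_d

# Toy posterior deviance draws (illustrative): p_D = 12 - 11 = 1, DIC = 13.
dic_value, p_d = dic(np.array([10.0, 12.0, 14.0]), 11.0)
```

The abstract's point is precisely that the penalty 2·p_D is only a valid cross-validation-style correction when p_D is much smaller than the number of independent observations.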
Gaussian process regression with Student-t likelihood
Abstract

Cited by 7 (1 self)
In Gaussian process regression the observation model is commonly assumed to be Gaussian, which is convenient from a computational perspective. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observations and improves the predictions. The problem, however, is the analytically intractable inference. In this work, we discuss the properties of a Gaussian process regression model with the Student-t likelihood and utilize the Laplace approximation for approximate inference. We compare our approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution.
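The robustness argument can be illustrated with the score (influence) functions of the two observation models: the Gaussian influence grows linearly in the residual, while the Student-t influence is bounded, so a single outlier cannot pull the fit arbitrarily far. A minimal numeric sketch, with illustrative ν and σ values not taken from the paper:

```python
def gauss_nll_grad(r, sigma=1.0):
    # Derivative of the Gaussian negative log-likelihood w.r.t. the
    # residual r: grows linearly, so outliers dominate the fit.
    return r / sigma**2

def student_t_nll_grad(r, nu=4.0, sigma=1.0):
    # Derivative of the Student-t negative log-likelihood w.r.t. r:
    # (nu + 1) r / (nu sigma^2 + r^2), which is bounded and even decays
    # for large |r|, limiting the influence of outlying observations.
    return (nu + 1.0) * r / (nu * sigma**2 + r**2)
```

For a residual of 100, the Gaussian score is 100 while the Student-t score is below 0.1: this is the mechanism behind "reduces the influence of outlying observations".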
Bayesian Input Variable Selection Using Posterior Probabilities and Expected Utilities
, 2002
Abstract

Cited by 6 (1 self)
We consider input variable selection in complex Bayesian hierarchical models. Our goal is to find a model with the smallest number of input variables having statistically or practically at least the same expected utility as the full model with all the available inputs. A good estimate for the expected utility can be computed using cross-validation predictive densities. In the case of input selection and a large number of input combinations, the computation of the cross-validation predictive densities for each model easily becomes computationally prohibitive. We propose to use the posterior probabilities obtained via variable-dimension MCMC methods to identify potentially useful input combinations, for which the final model choice and assessment is done using the expected utilities.
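The expected-utility estimate the abstract relies on is, in its simplest form, the mean log predictive density over held-out folds. The sketch below shows that computation for one candidate model; the `fit`/`logpred` interface is a hypothetical stand-in for whatever Bayesian model is under consideration, not an API from the paper:

```python
import numpy as np

def cv_log_predictive(x, y, fit, logpred, k=10, rng=None):
    """Estimate expected utility (mean log predictive density) of a model
    by k-fold cross-validation. `fit(x, y)` returns a fitted model and
    `logpred(model, x, y)` returns pointwise log predictive densities;
    both are user-supplied (hypothetical interface, for illustration).
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    util = []
    for f in folds:
        train = np.setdiff1d(idx, f)          # all indices outside the fold
        model = fit(x[train], y[train])
        util.extend(logpred(model, x[f], y[f]))
    return float(np.mean(util))

# Toy model: fit a Gaussian to y, ignoring x (illustrative only).
def fit(xt, yt):
    return yt.mean(), yt.std() + 1e-6

def logpred(model, xv, yv):
    mu, s = model
    return -0.5 * np.log(2 * np.pi * s**2) - (yv - mu) ** 2 / (2 * s**2)

x = np.linspace(-1.0, 1.0, 40)
y = x.copy()
u = cv_log_predictive(x, y, fit, logpred, k=5, rng=0)
```

Repeating this for every input combination is exactly the cost the paper avoids by first screening combinations with variable-dimension MCMC posterior probabilities.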
Neural Network Methods In Analysing And Modelling Time Varying Processes
, 2003
Abstract

Cited by 5 (0 self)
Helsinki University of Technology, Department of Electrical and Communications Engineering, Laboratory of Computational Engineering. Distribution:
Model Selection via Predictive Explanatory Power
Helsinki University of Technology, Laboratory of Computational Engineering, 1998
Abstract

Cited by 3 (0 self)
We consider model selection as a decision problem from a predictive perspective. The optimal Bayesian way of handling model uncertainty is to integrate over the model space. Model selection can then be seen as point estimation in the model space. We propose a model selection method based on the Kullback-Leibler divergence from the predictive distribution of the full model to the predictive distributions of the submodels. The loss of predictive explanatory power is defined as the expectation of this predictive discrepancy. The goal is to find the simplest submodel whose predictive distribution is similar to that of the full model, that is, the simplest submodel whose loss of explanatory power is acceptable. To compute the expected predictive discrepancy between complex models, for which analytical solutions do not exist, we propose to use predictive distributions obtained via k-fold cross-validation. We compare the performance of the method to posterior probabilities (Bayes factors), the deviance information criterion (DIC) and direct maximization of the expected utility via cross-validation.
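The predictive discrepancy has a closed form in simple cases, which makes the idea easy to see. The function below gives KL(full ‖ sub) for univariate Gaussian predictive distributions, a toy stand-in for the general predictive distributions the paper works with:

```python
import numpy as np

def kl_gauss(mu_full, s_full, mu_sub, s_sub):
    """KL divergence from a full-model Gaussian predictive N(mu_full,
    s_full^2) to a submodel predictive N(mu_sub, s_sub^2) — a
    closed-form illustration of the predictive discrepancy; the paper
    estimates such discrepancies via k-fold cross-validation instead.
    """
    return (np.log(s_sub / s_full)
            + (s_full**2 + (mu_full - mu_sub) ** 2) / (2 * s_sub**2)
            - 0.5)
```

A submodel whose predictive mean and spread match the full model's gives zero discrepancy; the selection rule accepts the simplest submodel whose discrepancy stays below a tolerated loss.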
MCMC Methods for MLP-network and Gaussian Process and Stuff – A documentation for Matlab
, 2006
Abstract

Cited by 2 (0 self)
Version 2.1. The MCMCstuff toolbox is a collection of Matlab functions for Bayesian inference with Markov chain Monte Carlo (MCMC) methods. This documentation introduces some of the features available in the toolbox. The introduction includes demonstrations of using Bayesian Multilayer Perceptron (MLP) networks and Gaussian processes in simple regression and classification problems with a hierarchical automatic relevance determination (ARD) prior for covariate-related parameters. The regression problems demonstrate the use of Gaussian and Student's t-distribution residual models and classification …
CATS benchmark time series prediction by Kalman smoother with cross-validated noise density
, 2005
Abstract

Cited by 2 (1 self)
This article presents the winning solution to the CATS time series prediction competition. The solution is based on classical optimal linear estimation theory. The proposed method models the long- and short-term dynamics of the time series as stochastic linear models. The computation is based on a Kalman smoother, in which the noise densities are estimated by cross-validation. In time series prediction the Kalman smoother is applied three times in different stages of the method.
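The core machinery, a Kalman filter followed by a Rauch-Tung-Striebel smoothing pass, can be sketched for the simplest scalar local-level model. In the paper's setting the noise variances q and r would be chosen by cross-validation and the state model is richer; here they are passed in directly as an illustrative assumption:

```python
import numpy as np

def kalman_smooth(y, q, r, m0=0.0, p0=1.0):
    """Kalman filter + RTS smoother for a scalar local-level model:
    x_t = x_{t-1} + w_t (var q),  y_t = x_t + v_t (var r).
    Returns smoothed state means and variances.
    """
    n = len(y)
    m_f = np.zeros(n); p_f = np.zeros(n)    # filtered mean / variance
    m_p = np.zeros(n); p_p = np.zeros(n)    # predicted mean / variance
    m, p = m0, p0
    for t in range(n):
        m_pred, p_pred = m, p + q           # predict step
        k = p_pred / (p_pred + r)           # Kalman gain
        m = m_pred + k * (y[t] - m_pred)    # measurement update
        p = (1 - k) * p_pred
        m_p[t], p_p[t], m_f[t], p_f[t] = m_pred, p_pred, m, p
    m_s = m_f.copy(); p_s = p_f.copy()      # RTS backward pass
    for t in range(n - 2, -1, -1):
        g = p_f[t] / p_p[t + 1]             # smoother gain
        m_s[t] = m_f[t] + g * (m_s[t + 1] - m_p[t + 1])
        p_s[t] = p_f[t] + g**2 * (p_s[t + 1] - p_p[t + 1])
    return m_s, p_s

# On a constant signal, the smoothed estimate converges to the signal level.
m_s, p_s = kalman_smooth(np.ones(50), q=0.01, r=1.0)
```

Because the smoother conditions each state estimate on the whole series, the backward pass also corrects early estimates that the forward filter saw with little data.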
Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory
Journal of Machine Learning Research, 2010
Abstract

Cited by 2 (0 self)
In regular statistical models, leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable. Therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to 2λ/n, where λ is the real log canonical threshold and n is the number of training samples. Therefore, the relation between the cross-validation error and the generalization error is determined by the algebraic geometrical structure of a learning machine. We also clarify that the deviance information criteria are different from the Bayes cross-validation and the widely applicable information criterion.
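Both quantities being compared can be computed from a matrix of pointwise posterior log-likelihoods. The sketch below uses the standard sample-based WAIC formula and the harmonic-mean identity for the Bayes LOO predictive density; it is a numerically naive illustration of the two losses, not an implementation from the paper:

```python
import numpy as np

def waic_and_loo(loglik):
    """WAIC loss and Bayes LOO-CV loss from a (S draws x n points)
    matrix of pointwise log-likelihoods log p(y_i | theta_s).

    LOO uses the identity p(y_i | y_{-i}) = 1 / E_post[1 / p(y_i | theta)],
    estimated naively by a posterior-sample average (illustrative only;
    this estimator can be numerically unstable in practice).
    """
    lppd_i = np.log(np.mean(np.exp(loglik), axis=0))         # pointwise lppd
    p_waic_i = np.var(loglik, axis=0, ddof=1)                # functional variance
    waic = -np.mean(lppd_i - p_waic_i)                       # WAIC loss per point
    loo = np.mean(np.log(np.mean(np.exp(-loglik), axis=0)))  # LOO loss per point
    return waic, loo

# When the log-likelihoods are constant across draws, the two losses coincide.
w, l = waic_and_loo(np.full((100, 5), -1.3))
```

The paper's first theorem says these two random quantities agree asymptotically in general singular models, which is why they rank models and hyperparameters equivalently for large n.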