Accumulative prediction error and the selection of time series models
2006
Cited by 25 (5 self)
This article reviews the rationale for using accumulative one-step-ahead prediction error (APE) as a data-driven method for model selection. Theoretically, APE is closely related to Bayesian model selection and the method of minimum description length (MDL). The sole requirement for using APE is that the models under consideration are capable of generating a prediction for the next, unseen data point. This means that APE may be readily applied to selection problems involving very complex models. APE automatically takes the functional form of parameters into account, and the ‘plug-in’ version of APE does not require the specification of priors. APE is particularly easy to compute for data that have a natural ordering, such as time series. Here, we explore the possibility of using APE to discriminate the short-range ARMA(1,1) model from the long-range ARFIMA(0, d, 0) model. We also illustrate how APE may be used for model meta-selection, allowing one to choose between different model selection methods.
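The plug-in idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the two candidate predictors (a running mean and a least-squares AR(1) without intercept), the squared-error loss, the `start` index, and the simulated series are all assumptions chosen to keep the example small. The ARMA(1,1) and ARFIMA models from the abstract are not implemented here.

```python
import random

def predict_mean(history):
    """Hypothetical baseline: predict the mean of all points seen so far."""
    return sum(history) / len(history)

def predict_ar1(history):
    """Hypothetical AR(1) plug-in predictor: refit phi by least squares, then forecast."""
    num = sum(a * b for a, b in zip(history[:-1], history[1:]))
    den = sum(a * a for a in history[:-1])
    phi = num / den if den > 0 else 0.0
    return phi * history[-1]

def ape(series, predictor, start=5):
    """Accumulate squared one-step-ahead errors, refitting on each growing prefix."""
    total = 0.0
    for t in range(start, len(series)):
        forecast = predictor(series[:t])  # only past data is used
        total += (series[t] - forecast) ** 2
    return total

def simulate_ar1(n, phi, seed=0):
    """Generate an AR(1) series with unit-variance Gaussian noise (illustrative data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

series = simulate_ar1(500, 0.8)
ape_mean = ape(series, predict_mean)
ape_ar1 = ape(series, predict_ar1)
print("APE (mean model):", ape_mean)
print("APE (AR(1) model):", ape_ar1)
```

The model with the smaller accumulated error is preferred. Any model that can produce a forecast from a prefix of the data can be passed in as `predictor`, which is the only requirement the abstract names.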
Bayesian Network Structure Learning using Factorized NML Universal Models
2008
Cited by 9 (4 self)
Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state-of-the-art, even for small sample sizes.
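A rough Python sketch of what local normalization could look like for a single variable follows. It rests on assumptions not stated in the abstract: that the local score is the maximized multinomial log-likelihood minus a multinomial NML normalizer, computed separately for each parent configuration, and that the normalizer is obtained with the known recurrence C(n, K+2) = C(n, K+1) + (n/K) C(n, K). The function names and data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict

def multinomial_regret(n, arity):
    """Log of the multinomial NML normalizer for `arity` values and n samples."""
    if n == 0 or arity == 1:
        return 0.0
    c_prev = 1.0  # normalizer for a single-valued variable
    # Normalizer for a binary variable, summed over the count h of one value.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence on the number of values (assumed, see lead-in).
    for k in range(1, arity - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return math.log(c_curr)

def fnml_local_score(child, parent_configs, arity):
    """Sum, over parent configurations, of max log-likelihood minus local regret."""
    groups = defaultdict(Counter)
    for config, value in zip(parent_configs, child):
        groups[config][value] += 1
    score = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        loglik = sum(c * math.log(c / n) for c in counts.values())
        score += loglik - multinomial_regret(n, arity)
    return score

# Illustrative data: the child copies its single binary parent.
child = [0, 0, 0, 0, 1, 1, 1, 1]
with_parent = fnml_local_score(child, [(v,) for v in child], arity=2)
without_parent = fnml_local_score(child, [()] * len(child), arity=2)
print(with_parent, without_parent)
```

In a structure search, local scores of this kind would be summed over variables and compared across candidate parent sets; here the parent set that explains the child receives the higher score.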
Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection
2008
Cited by 3 (1 self)
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is a logarithm of a sum of maximized likelihoods over all possible datasets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes, and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value with increasing number of iterations. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy that is sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
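To make the complexity term concrete, the following Python sketch computes it exactly for the Bernoulli model and compares it with a simple importance-sampling estimate. The estimator is an assumption for illustration and is not claimed to be one of the paper's estimators: sequences are drawn from a uniform-prior (Laplace) mixture, under which a specific sequence with k ones out of n has probability 1 / ((n + 1) * C(n, k)).

```python
import math
import random

def max_likelihood(k, n):
    """Maximized Bernoulli likelihood of one sequence with k ones out of n."""
    p = k / n
    # Python evaluates 0.0 ** 0 as 1.0, the convention needed at k = 0 and k = n.
    return (p ** k) * ((1 - p) ** (n - k))

def exact_complexity(n):
    """Log of the sum of maximized likelihoods, grouped by the count of ones."""
    total = sum(math.comb(n, k) * max_likelihood(k, n) for k in range(n + 1))
    return math.log(total)

def mc_complexity(n, samples, seed=0):
    """Importance-sampling estimate of the same quantity (illustrative proposal)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        theta = rng.random()  # draw a parameter from the uniform prior
        k = sum(rng.random() < theta for _ in range(n))  # count of ones in a sampled sequence
        q = 1.0 / ((n + 1) * math.comb(n, k))  # mixture probability of that sequence
        acc += max_likelihood(k, n) / q
    return math.log(acc / samples)

n = 20
print("exact:      ", exact_complexity(n))
print("Monte Carlo:", mc_complexity(n, samples=20000))
```

For the Bernoulli model the exact sum is cheap because sequences can be grouped by their count of ones. The sampling route becomes relevant for model classes where no such grouping is available.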
Luckiness and Regret in Minimum Description Length Inference
2009
Cited by 1 (1 self)
Minimum Description Length (MDL) inference is based on the intuition that understanding the available data can be defined in terms of the ability to compress the data, i.e. to describe it in full using a shorter representation. This brief introduction discusses the design of the various codes used to implement MDL, focusing on the philosophically intriguing concepts of luckiness and regret: a good MDL code exhibits good performance in the worst case over all possible data sets, but achieves even better performance when the data turn out to be simple (although we suggest making no a priori assumptions to that effect). We then discuss how data compression relates to performance in various learning tasks, including parameter estimation, parametric and nonparametric model selection and sequential prediction of outcomes from an unknown source. Last, we briefly outline the history of MDL and its technical and philosophical relationship to other approaches to learning such as Bayesian, frequentist and prequential statistics.
MMLD Inference of the Poisson and Geometric Models
Cited by 1 (0 self)
This paper examines MMLD-based approximations for the inference of two univariate probability densities: the geometric distribution, parameterised in terms of a mean parameter, and the Poisson distribution. The focus is on both parameter estimation and hypothesis testing properties of the approximation. The new parameter estimators are compared to the MML87 estimators in terms of bias, squared error risk and KL divergence risk. Empirical experiments demonstrate that the MMLD parameter estimates are more biased, and feature higher squared error risk than the corresponding MML87 estimators. In contrast, the two criteria are virtually indistinguishable in the hypothesis testing experiment.
1. NORMALIZED MAXIMUM LIKELIHOOD Let
Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an information-theoretic solution to this problem. Unfortunately, computing it for arbitrary Bayesian network models appears to be computationally infeasible, but recent results have shown that it can be computed efficiently for certain restricted types of Bayesian network models. In this review paper we summarize the main results.
Efficient Computation of NML . . .
Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an information-theoretic solution to this problem. Unfortunately, computing it for arbitrary Bayesian network models appears to be computationally infeasible, but we show how it can be computed efficiently for certain restricted types of Bayesian network models.