An empirical study of minimum description length model selection with infinite parametric complexity (2006)

by S. de Rooij, P. Grünwald
Venue: Journal of Mathematical Psychology
Citing documents (results 1 - 10 of 12):

Accumulative prediction error and the selection of time series models

by Peter Grünwald, et al., 2006
"... This article reviews the rationale for using accumulative one-step-ahead prediction error (APE) as a data-driven method for model selection. Theoretically, APE is closely related to Bayesian model selection and the method of minimum description length (MDL). The sole requirement for using APE is tha ..."
Abstract - Cited by 25 (5 self) - Add to MetaCart
This article reviews the rationale for using accumulative one-step-ahead prediction error (APE) as a data-driven method for model selection. Theoretically, APE is closely related to Bayesian model selection and the method of minimum description length (MDL). The sole requirement for using APE is that the models under consideration are capable of generating a prediction for the next, unseen data point. This means that APE may be readily applied to selection problems involving very complex models. APE automatically takes the functional form of parameters into account, and the ‘plug-in’ version of APE does not require the specification of priors. APE is particularly easy to compute for data that have a natural ordering, such as time series. Here, we explore the possibility of using APE to discriminate the short-range ARMA(1,1) model from the long-range ARFIMA(0, d, 0) model. We also illustrate how APE may be used for model meta-selection, allowing one to choose between different model selection methods.
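
The APE computation itself is short to sketch. The following minimal Python example is an illustration only (the AR(1) plug-in model, the competing "mean" model, squared one-step-ahead error and the simulated data are assumptions, not taken from the article): each model is re-fitted on the past at every time step, predicts the next point, and the model with the smallest accumulated error is preferred.

    import numpy as np

    def ape_ar1(x, warmup=10):
        """Accumulative one-step-ahead squared prediction error of an AR(1) model.

        At each time t the AR(1) coefficient is re-estimated ('plug-in' style)
        from x[0..t-1] and used to predict x[t]; the squared errors are summed.
        """
        x = np.asarray(x, dtype=float)
        total = 0.0
        for t in range(warmup, len(x)):
            past = x[:t]
            # Least-squares / plug-in ML estimate of the AR(1) coefficient.
            phi = np.dot(past[1:], past[:-1]) / np.dot(past[:-1], past[:-1])
            total += (x[t] - phi * past[-1]) ** 2
        return total

    def ape_mean(x, warmup=10):
        """APE of a trivial 'constant mean plus noise' model, for comparison."""
        x = np.asarray(x, dtype=float)
        return sum((x[t] - x[:t].mean()) ** 2 for t in range(warmup, len(x)))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = np.zeros(500)
        for t in range(1, 500):          # simulate an AR(1) process
            x[t] = 0.7 * x[t - 1] + rng.normal()
        print("APE, AR(1) model:", ape_ar1(x))   # should be the smaller of the two
        print("APE, mean model :", ape_mean(x))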

Bayesian Network Structure Learning using Factorized NML Universal Models

by Teemu Roos, Tomi Silander, Petri Kontkanen, Petri Myllymäki, 2008
"... Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very deman ..."
Abstract - Cited by 9 (4 self) - Add to MetaCart
Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state-of-the-art, even for small sample sizes.
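
As a rough illustration of the "local normalization" idea, the Python sketch below computes a factorized-NML style local score for a single categorical child variable given one categorical parent. The brute-force normalizing constant is exponential in the sample size and is used here only because the sample is tiny; the variable names and data are assumptions for illustration, not the algorithms or experiments of the paper.

    import numpy as np
    from math import exp, log
    from itertools import product

    def log_multinomial_ml(counts):
        """Log of the maximized multinomial likelihood for observed counts."""
        n = counts.sum()
        return sum(c * log(c / n) for c in counts if c > 0)

    def multinomial_nml_constant(K, n):
        """Brute-force normalizing constant C(K, n): the sum of the maximized
        likelihood over all K-ary sequences of length n (exponential in n)."""
        total = 0.0
        for seq in product(range(K), repeat=n):
            total += exp(log_multinomial_ml(np.bincount(seq, minlength=K)))
        return total

    def fnml_local_score(child, parent, K):
        """Factorized-NML style local score: the child's data are split by
        parent value and each slice is NML-normalized locally."""
        score = 0.0
        for j in set(parent):
            counts = np.bincount([c for c, p in zip(child, parent) if p == j],
                                 minlength=K)
            n_j = int(counts.sum())
            score += log_multinomial_ml(counts) - log(multinomial_nml_constant(K, n_j))
        return score

    if __name__ == "__main__":
        parent = [0, 0, 0, 1, 1, 1, 1, 0]        # tiny illustrative data set
        child  = [0, 0, 1, 1, 1, 1, 0, 0]
        print("fNML local score, child | parent:", fnml_local_score(child, parent, 2))
        print("fNML local score, child alone   :",
              fnml_local_score(child, [0] * len(child), 2))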

Agent-Based Model Selection Framework for Complex Adaptive Systems

by Tei Laine, 2006
"... ..."
Abstract - Cited by 4 (2 self) - Add to MetaCart
Abstract not found

Monte Carlo Estimation of Minimax Regret with an Application to MDL Model Selection

by Teemu Roos, 2008
"... Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is a logarithm of a sum of maximized likelihoods over all possible data-sets. Becaus ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Minimum description length (MDL) model selection, in its modern NML formulation, involves a model complexity term which is equivalent to minimax/maximin regret. When the data are discrete-valued, the complexity term is a logarithm of a sum of maximized likelihoods over all possible data sets. Because the sum has an exponential number of terms, its evaluation is in many cases intractable. In the continuous case, the sum is replaced by an integral for which a closed form is available in only a few cases. We present an approach based on Monte Carlo sampling, which works for all model classes, and gives strongly consistent estimators of the minimax regret. The estimates converge almost surely to the correct value with an increasing number of iterations. For the important class of Markov models, one of the presented estimators is particularly efficient: in empirical experiments, accuracy that is sufficient for model selection is usually achieved already on the first iteration, even for long sequences.
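
The Monte Carlo idea can be illustrated in a few lines for the simplest possible case. In the Python sketch below (an assumption-laden illustration for the Bernoulli model only, not one of the estimators analysed in the paper), binary sequences x^n are drawn uniformly and used as importance samples for the sum that defines the NML normalizing constant; the estimate of the minimax regret log C(n) is then compared with the exact value obtained by brute force.

    import numpy as np
    from math import comb, log

    def bernoulli_max_lik(x):
        """Maximized Bernoulli likelihood p(x; theta_hat) of a 0/1 sequence x."""
        n, k = len(x), int(x.sum())
        if k == 0 or k == n:
            return 1.0
        th = k / n
        return th ** k * (1 - th) ** (n - k)

    def mc_log_regret(n, num_samples=100_000, seed=0):
        """Monte Carlo estimate of log C(n) for the Bernoulli model, where
        C(n) = sum over all x^n of p(x^n; theta_hat(x^n)).  Sequences are
        sampled uniformly (probability 2^-n each), so C(n) is estimated by
        2^n times the sample mean of the maximized likelihood."""
        rng = np.random.default_rng(seed)
        xs = rng.integers(0, 2, size=(num_samples, n))
        vals = np.array([bernoulli_max_lik(x) for x in xs])
        return n * np.log(2) + np.log(vals.mean())

    def exact_log_regret(n):
        """Exact log C(n): sum over counts k with binomial multiplicities."""
        total = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                    if 0 < k < n else 1.0
                    for k in range(n + 1))
        return log(total)

    if __name__ == "__main__":
        n = 20
        print("Monte Carlo estimate of log C(n):", mc_log_regret(n))
        print("Exact log C(n)                  :", exact_log_regret(n))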

Luckiness and Regret in Minimum Description Length Inference

by Steven de Rooij, Peter D. Grünwald, 2009
"... Minimum Description Length (MDL) inference is based on the intuition that understanding the available data can be defined in terms of the ability to compress the data, i.e. to describe it in full using a shorter representation. This brief introduction discusses the design of the various codes used t ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
Minimum Description Length (MDL) inference is based on the intuition that understanding the available data can be defined in terms of the ability to compress the data, i.e. to describe it in full using a shorter representation. This brief introduction discusses the design of the various codes used to implement MDL, focusing on the philosophically intriguing concepts of luckiness and regret: a good MDL code exhibits good performance in the worst case over all possible data sets, but achieves even better performance when the data turn out to be simple (although we suggest making no a priori assumptions to that effect). We then discuss how data compression relates to performance in various learning tasks, including parameter estimation, parametric and nonparametric model selection and sequential prediction of outcomes from an unknown source. Last, we briefly outline the history of MDL and its technical and philosophical relationship to other approaches to learning such as Bayesian, frequentist and prequential statistics.
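
For orientation, the regret referred to here is the standard MDL quantity: the excess code length of a code with length function L on data x^n over the best code in the model in hindsight,

    \mathrm{REG}(L, x^n) \;=\; L(x^n) \;-\; \min_{\theta}\bigl[-\log p(x^n \mid \theta)\bigr]
                         \;=\; L(x^n) \;+\; \log p\bigl(x^n \mid \hat{\theta}(x^n)\bigr),

where \hat{\theta}(x^n) is the maximum likelihood estimator. NML minimizes the worst case of this quantity over all data sets; "lucky" codes sacrifice a little of that worst-case optimality to do better on favoured data.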

Citation Context

...all possible data sequences, like NML. Instead it only requires calculation of the ML estimator. While the prequential ML code has been used successfully in practical inference problems, it is shown in [16] that (in expectation) it does not necessarily achieve the desired regret (11) of (k/2) log n + O(1) unless the data are actually sampled from a distribution in the model. Application of the ML plug-in c...

MMLD Inference of the Poisson and Geometric Models

by Daniel F. Schmidt, Enes Makalic
"... Abstract. This paper examines MMLD-based approximations for the inference of two univariate probability densities: the geometric distribution, parameterised in terms of a mean parameter, and the Poisson distribution. The focus is on both parameter estimation and hypothesis testing properties of the ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
This paper examines MMLD-based approximations for the inference of two univariate probability densities: the geometric distribution, parameterised in terms of a mean parameter, and the Poisson distribution. The focus is on both parameter estimation and hypothesis testing properties of the approximation. The new parameter estimators are compared to the MML87 estimators in terms of bias, squared error risk and KL divergence risk. Empirical experiments demonstrate that the MMLD parameter estimates are more biased, and feature higher squared error risk than the corresponding MML87 estimators. In contrast, the two criteria are virtually indistinguishable in the hypothesis testing experiment.
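
The evaluation criteria named in the abstract (bias, squared error risk, KL divergence risk) are straightforward to estimate by simulation. The Python sketch below shows how such a comparison is set up for the Poisson case; the maximum likelihood estimator and a simple shifted variant stand in as placeholders, since the MMLD and MML87 estimators from the paper are not reproduced here.

    import numpy as np

    def kl_poisson(lam_true, lam_est):
        """KL divergence KL( Poisson(lam_true) || Poisson(lam_est) )."""
        return lam_true * np.log(lam_true / lam_est) + lam_est - lam_true

    def estimator_risks(estimator, lam_true=3.0, n=10, reps=20_000, seed=0):
        """Monte Carlo estimates of bias, squared error risk and KL risk of
        `estimator` (a function of a sample) under Poisson(lam_true) data."""
        rng = np.random.default_rng(seed)
        ests = np.array([estimator(rng.poisson(lam_true, n)) for _ in range(reps)])
        bias = ests.mean() - lam_true
        mse = np.mean((ests - lam_true) ** 2)
        kl = np.mean(kl_poisson(lam_true, ests))
        return bias, mse, kl

    if __name__ == "__main__":
        ml = lambda y: y.mean()                              # maximum likelihood
        shifted = lambda y: (y.sum() + 0.5) / (len(y) + 1)   # illustrative alternative
        for name, est in [("ML", ml), ("shifted", shifted)]:
            b, m, k = estimator_risks(est)
            print(f"{name:8s} bias={b:+.4f}  sq. error risk={m:.4f}  KL risk={k:.4f}")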

Citation Context

...istribution under consideration is the geometric distribution. Previous MML87 formulations of the geometric distribution have been in terms of the binomial proportion parameter. Following the lead of [14], we instead parameterise the geometric distribution in terms of a mean parameter \mu > 0: p(y^n \mid \mu) = \prod_{i=1}^{n} \mu^{y_i} / (\mu + 1)^{y_i + 1} (14), where y_i \in \mathbb{Z}^{+}. The geometric and Poisson distributions share the same suf...

The Momentum Problem in . . .

by Tim van Erven, 2006
"... ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Abstract not found

1. NORMALIZED MAXIMUM LIKELIHOOD Let

by Petri Myllymäki
"... Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant ..."
Abstract - Add to MetaCart
Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an information-theoretic solution to this problem. Unfortunately, computing it for arbitrary Bayesian network models appears to be computationally infeasible, but recent results have shown that it can be computed efficiently for certain restricted types of Bayesian network models. In this review paper we summarize the main results.

Citation Context

...ant depending only on the sample size n: \ln \frac{p(x^n; \hat{\theta}(x^n))}{p_{\mathrm{NML}}(x^n)} = \ln C_M(n). For some model classes, the normalizing factor is finite only if the range \mathcal{X}^n of the data is restricted, see e.g. [1, 3, 4]. For discrete models, the normalizing constant C_M(n) is given by a sum over all data matrices of size m \times n: C_M(n) = \sum_{x^n \in \mathcal{X}^n} p(x^n; \hat{\theta}(x^n)). 2. BAYESIAN NETWORKS Let us associate with the c...

Efficient Computation of NML . . .

by Petri Myllymäki
"... Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant ..."
Abstract - Add to MetaCart
Bayesian networks are parametric models for multidimensional domains exhibiting complex dependencies between the dimensions (domain variables). A central problem in learning such models is how to regularize the number of parameters; in other words, how to determine which dependencies are significant and which are not. The normalized maximum likelihood (NML) distribution or code offers an information-theoretic solution to this problem. Unfortunately, computing it for arbitrary Bayesian network models appears to be computationally infeasible, but we show how it can be computed efficiently for certain restricted types of Bayesian network models.

Acknowledgements

by Harri Laine, Kongens Lyngby
"... www.compute.dtu.dk ..."
Abstract - Add to MetaCart
www.compute.dtu.dk