A Comparison of Scientific and Engineering Criteria for Bayesian Model Selection
- Statistics and Computing
, 1996
- Cited by 17 (0 self)
this paper, we assume that there are a finite number of possible true models. For each possible model $m$, we define the random (vector) variable $\Theta_m$ whose values correspond to the possible values of the parameters for $m$. We encode our uncertainty about $\Theta_m$ using the probability distribution $p(\Theta_m \mid m)$. In this paper, we assume that $p(\Theta_m \mid m)$ is a probability density function. Given random sample $D$, we compute the posterior distributions for $M$ and each $\Theta_m$
Conjoint Probabilistic Subband Modeling
- Massachusetts Institute of Technology
, 1997
- Cited by 13 (0 self)
A new approach to high-order-conditional probability density estimation is developed, based on a partitioning of conditioning space via decision trees. The technique is applied to image compression, image restoration, and texture synthesis, and the results compared with those obtained by standard mixture density and linear regression models. By applying the technique to subband-domain processing, some evidence is provided to support the following statement: the appropriate tradeoff between spatial and spectral localization in linear preprocessing shifts towards greater spatial localization when subbands are processed in a way that exploits interdependence.
Language Acquisition in the MDL Framework
- In Eric Sven Ristad, Language Computation. American Mathematical Society, Philadelphia
, 1994
- Cited by 11 (0 self)
The Minimum Description Length (MDL) principle provides guidance to the fundamental question of determining what a given set of observed data tells us about the underlying data generating machinery. Hence, in the broadest sense the MDL principle relates to the central question of all science, although its most useful applications have been to the more practical problem of fitting statistical models to data. In this article, we review the MDL principle and demonstrate how it may be profitably applied to the logical problem of language acquisition.
Empirical Limits for Time Series Econometrics Models
- unpublished
, 1998
- Cited by 10 (7 self)
This paper characterizes empirically achievable limits for time series econometric modeling and forecasting. The approach involves the concept of minimal information loss in time series regression and the paper shows how to derive bounds that delimit the proximity of empirical measures to the true probability measure (the DGP) in models that are of econometric interest. The approach utilizes joint probability measures over the combined space of parameters and observables and the results apply for models with stationary, integrated, and cointegrated data. A theorem due to Rissanen is extended so that it applies directly to probabilities about the relative likelihood (rather than averages), a new way of proving results of the Rissanen type is demonstrated, and the Rissanen theory is extended to nonstationary time series with unit roots, near unit roots, and cointegration of unknown order. The corresponding bound for the minimal information loss in empirical work is shown not to be a constant, in general, but to be proportional to the logarithm of the determinant of the (possibly stochastic) Fisher-information matrix. In fact, the bound that determines proximity to the DGP is generally path dependent, and it depends specifically on the type as well as the number of regressors. For practical purposes, the
Bayesian inference procedures derived via the concept of relative surprise
- Communications in Statistics
, 1997
- Cited by 10 (4 self)
of least relative surprise; model checking; change of variable problem; crossvalidation. We consider the problem of deriving Bayesian inference procedures via the concept of relative surprise. The mathematical concept of surprise has been developed by I.J. Good in a long sequence of papers. We make a modification to this development that permits the avoidance of a serious defect; namely, the change of variable problem. We apply relative surprise to the development of estimation, hypothesis testing and model checking procedures. Important advantages of the relative surprise approach to inference include the lack of dependence on a particular loss function and complete freedom to the statistician in the choice of prior for hypothesis testing problems. Links are established with common Bayesian inference procedures such as highest posterior density regions, modal estimates and Bayes factors. From a practical perspective new inference
Learning hybrid Bayesian networks from data
, 1998
- Cited by 9 (1 self)
We illustrate two different methodologies for learning Hybrid Bayesian networks, that is, Bayesian networks containing both continuous and discrete variables, from data. The two methodologies differ in the way of handling continuous data when learning the Bayesian network structure. The first methodology uses discretized data to learn the Bayesian network structure, and the original non-discretized data for the parameterization of the learned structure. The second methodology uses non-discretized data both to learn the Bayesian network structure and its parameterization. For the direct handling of continuous data, we propose the use of artificial neural networks as probability estimators, to be used as an integral part of the scoring metric defined to search the space of Bayesian network structures. With both methodologies, we assume the availability of a complete dataset, with no missing values or hidden variables. We report experimental results aimed at comparing the two methodologies. These results provide evidence that learning with discretized data presents advantages both in terms of efficiency and in terms of accuracy of the learned models over the alternative approach of using non-discretized data.
Combining forecasting procedures: some theoretical results
- Econometric Theory
, 2004
- Cited by 9 (1 self)
We study some methods of combining procedures for forecasting a continuous random variable. Statistical risk bounds under the square error loss are obtained under mild distributional assumptions on the future given the current outside information and the past observations. The risk bounds show that the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term. In terms of the rate of convergence, the combined forecast performs as well as if one knew which candidate forecasting procedure is the best in advance. Empirical studies suggest combining procedures can sometimes improve forecasting accuracy compared to the original procedures. Risk bounds are derived to theoretically quantify the potential gain and price for linearly combining forecasts for improvement. The result supports the empirical finding that it is not automatically a good idea to combine forecasts. Blind combining can degrade performance dramatically due to the undesirably large variability in estimating the best combining weights. An automated combining method is shown in theory to achieve a balance between the potential gain and the complexity penalty (the price for combining); to take advantage (if any) of sparse combining; and to maintain the best performance (in rate) among the candidate forecasting procedures if linear or sparse combining does not help.
Bayesian Modeling of Uncertainty in Ensembles of Climate Models
, 2008
- Cited by 8 (3 self)
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.
An empirical study of minimum description length model selection with infinite parametric complexity
- Journal of Mathematical Psychology
, 2006
- Cited by 7 (0 self)
Parametric complexity is a central concept in Minimum Description Length (MDL) model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and Geometric families. In such cases, MDL model selection as based on NML and Bayesian inference based on Jeffreys’ prior cannot be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and evaluate their behaviour on small sample sizes. We find interestingly poor behaviour for the plug-in predictive code; a restricted NML model performs quite well but it is questionable if the results validate its theoretical motivation. A Bayesian marginal distribution with Jeffreys’ prior can still be used if one sacrifices the first observation to make a proper posterior; this approach turns out to be most dependable.
Model Selection by Normalized Maximum Likelihood
, 2005
- Cited by 6 (1 self)
The Minimum Description Length (MDL) principle is an information theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.

