Model Selection and the Principle of Minimum Description Length
Journal of the American Statistical Association, 1998
Cited by 114 (4 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
Benchmark Priors for Bayesian Model Averaging
Forthcoming in the Journal of Econometrics, 2001
Cited by 61 (3 self)
In contrast to a posterior analysis given a particular sampling model, posterior model probabilities in the context of model uncertainty are typically rather sensitive to the specification of the prior. In particular, “diffuse” priors on model-specific parameters can lead to quite unexpected consequences. Here we focus on the practically relevant situation where we need to entertain a (large) number of sampling models and we have (or wish to use) little or no subjective prior information. We aim at providing an “automatic” or “benchmark” prior structure that can be used in such cases. We focus on the Normal linear regression model with uncertainty in the choice of regressors. We propose a partly noninformative prior structure related to a Natural Conjugate g-prior specification, where the amount of subjective information requested from the user is limited to the choice of a single scalar hyperparameter g_{0j}. The consequences of different choices for g_{0j} are examined. We investigate theoretical properties, such as consistency of the implied Bayesian procedure. Links with classical information criteria are provided. More importantly, we examine the finite sample implications of several choices of g_{0j} in a simulation study. The use of the MC³ algorithm of Madigan and York (1995), combined with efficient coding in Fortran, makes it feasible to conduct large simulations. In addition to posterior criteria, we shall also compare the predictive performance of different priors. A classic example concerning the economics of crime will also be provided and contrasted with results in the literature. The main findings of the paper will lead us to propose a “benchmark” prior specification in a linear regression context with model uncertainty.
Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties
1996
Cited by 44 (0 self)
This paper was partially presented at the 9th conference on Uncertainty in Artificial Intelligence, July 1993.
Regression And Time Series Model Selection Using Variants Of The Schwarz Information Criterion
1997
Cited by 14 (1 self)
The Schwarz (1978) information criterion, SIC, is a widely used tool in model selection, largely due to its computational simplicity and effective performance in many modeling frameworks. The derivation of SIC (Schwarz, 1978) establishes the criterion as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. In this paper, we investigate the derivation for the identification of terms which are discarded as being asymptotically negligible, but which may be significant in small to moderate sample-size applications. We suggest several SIC variants based on the inclusion of these terms. The results of a simulation study show that the variants improve upon the performance of SIC in two important areas of application: multiple linear regression and time series analysis.
1. Introduction
One of the most important problems confronting an investigator in statistical modeling is the choice of an appropriate model to characterize the underlyin...
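As a rough illustration of the baseline criterion this abstract refers to, the sketch below computes the standard SIC, -2 log L + k log n, for a Gaussian linear regression. It does not implement the paper's proposed variants, which the abstract does not specify. The function names, the convention that k counts the regression coefficients plus the error variance, and the example numbers are all assumptions made for illustration.

```python
import math


def neg2_log_likelihood(rss: float, n: int) -> float:
    """-2 x maximized Gaussian log-likelihood, using the ML variance estimate rss / n."""
    sigma2_hat = rss / n
    return n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def sic(rss: float, n: int, k: int) -> float:
    """Standard Schwarz criterion: fit term plus a k * log(n) complexity penalty."""
    return neg2_log_likelihood(rss, n) + k * math.log(n)


# Hypothetical comparison (made-up numbers): the larger model lowers the
# residual sum of squares slightly but pays a larger penalty.
candidates = {"small (k=3)": (12.0, 3), "large (k=6)": (11.5, 6)}
n_obs = 30
scores = {name: sic(rss, n_obs, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)  # the model with the smallest SIC is preferred
```

Because the penalty grows with log n, each extra parameter must reduce the fit term by more than log n to be selected.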
Bayesian Statistics
in WWW', Computing Science and Statistics, 1989
Cited by 13 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection and the second one is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion, minimizing the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), minimum description length (MDL), and bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model
Hierarchical Universal Coding
IEEE Trans. Inform. Theory, 1998
Cited by 12 (2 self)
In an earlier paper, we proved a strong version of the redundancy-capacity converse theorem of universal coding, stating that for 'most' sources in a given class, the universal coding redundancy is essentially lower bounded by the capacity of the channel induced by this class. Since this result holds for general classes of sources, it extends Rissanen's strong converse theorem for parametric families. While our earlier result has established strong optimality only for mixture codes weighted by the capacity-achieving prior, our first result herein extends this finding to a general prior. For some cases our technique also leads to a simplified proof of the above-mentioned strong converse theorem. The major interest in this paper, however, is in extending the theory of universal coding to hierarchical structures of classes, where each class may have a different capacity. In this setting, one wishes to incur redundancy essentially as small as that corresponding to the active class, and not ...
General-to-specific reductions of Vector Autoregressive Processes
Econometric Studies - A Festschrift in Honour of Joachim Frohn, 2001
Cited by 12 (9 self)
Unrestricted reduced form vector autoregressive (VAR) models have become a dominant research strategy in empirical macroeconomics since Sims' (1980) critique of traditional macroeconometric modeling. They are, however, subject to the curse of dimensionality. In this paper we propose general-to-specific reductions of VAR models and consider computer-automated model selection algorithms embodied in PcGets (see Krolzig and Hendry, 2000) for doing so. Starting from the unrestricted VAR, standard testing procedures eliminate statistically insignificant variables, with diagnostic tests checking the validity of reductions, ensuring a congruent final selection. Since jointly selecting and diagnostic testing eludes theoretical analysis, we evaluate the proposed strategy by simulation. The Monte Carlo experiments show that PcGets recovers the DGP specification from a large unrestricted VAR model with size and power close to commencing from the DGP itself. The application of the proposed reduction strategy to a US monetary system demonstrates the feasibility of PcGets for the analysis of large macroeconomic data sets.
Unifying the Derivations for the Akaike and Corrected Akaike Information Criteria
1997
Cited by 12 (3 self)
The Akaike (1973, 1974) information criterion, AIC, and the corrected Akaike information criterion (Hurvich and Tsai, 1989), AICc, were both designed as estimators of the expected Kullback-Leibler discrepancy between the model generating the data and a fitted candidate model. AIC is justified in a very general framework, and as a result, offers a crude estimator of the expected discrepancy: one which exhibits a potentially high degree of negative bias in small-sample applications (Hurvich and Tsai, 1989). AICc corrects for this bias, but is less broadly applicable than AIC since its justification depends upon the form of the candidate model (Hurvich and Tsai, 1989, 1993; Hurvich, Shumway, and Tsai, 1990; Bedrick and Tsai, 1994). Although AIC and AICc share the same objective, the derivations of the criteria proceed along very different lines, making it difficult to reconcile how AICc improves upon the approximations leading to AIC. To address this issue, we present a derivation which unifies the justifications of AIC and AICc in the linear regression framework. Keywords: AIC, AICc, information theory, Kullback-Leibler information, model selection.
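To make the relationship between the two criteria concrete, the following sketch computes AIC and the commonly quoted form of AICc for a Gaussian linear regression, AICc = AIC + 2k(k+1)/(n-k-1). It is not the unified derivation the abstract describes. The function names and the convention that k counts the regression coefficients plus the error variance are assumptions made for illustration.

```python
import math


def neg2_log_likelihood(rss: float, n: int) -> float:
    """-2 x maximized Gaussian log-likelihood, using the ML variance estimate rss / n."""
    sigma2_hat = rss / n
    return n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def aic(rss: float, n: int, k: int) -> float:
    """Akaike information criterion: fit term plus a penalty of 2 per parameter."""
    return neg2_log_likelihood(rss, n) + 2.0 * k


def aicc(rss: float, n: int, k: int) -> float:
    """Corrected AIC: AIC plus a small-sample term that vanishes as n grows."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(rss, n, k) + 2.0 * k * (k + 1) / (n - k - 1)
```

The correction term is large when k is a sizeable fraction of n and shrinks toward zero as n grows, which is consistent with the abstract's remark that AIC can be strongly negatively biased in small samples.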

