Results 1–10 of 146
Model Selection and the Principle of Minimum Description Length
Journal of the American Statistical Association, 1998
Abstract

Cited by 145 (5 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
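As a concrete illustration of the two-part coding idea behind MDL (my own toy example, not taken from the paper), the sketch below scores two candidate regression models by an approximate description length, using the standard asymptotic code lengths (k/2) log2 n bits for a k-parameter model and (n/2) log2(RSS/n) bits for the data given the model. The data and the two candidate models are hypothetical.

```python
import math

def rss_constant(ys):
    """Residual sum of squares for the intercept-only model."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares for a least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def description_length(rss, k, n):
    """Two-part code length in bits: parameter cost plus data cost."""
    return 0.5 * k * math.log2(n) + 0.5 * n * math.log2(max(rss / n, 1e-12))

xs = list(range(10))
ys = [2.0 * x + 1.0 + (0.1 if x % 2 else -0.1) for x in xs]  # near-linear toy data
n = len(xs)
dl_const = description_length(rss_constant(ys), k=1, n=n)
dl_line = description_length(rss_line(xs, ys), k=2, n=n)
print(dl_line < dl_const)  # the line's better fit pays for its extra parameter
```

MDL thus prefers the more complex model only when its improved fit saves more bits than its extra parameters cost.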
Subspace information criterion for model selection
Neural Computation, 2001
Abstract

Cited by 41 (28 self)
The problem of model selection is of considerable importance for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space, and the generalization error is defined as the Hilbert-space squared norm of the difference between the learning result function and the target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least-mean-squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
The variable selection problem
Journal of the American Statistical Association, 2000
Abstract

Cited by 39 (2 self)
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
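The combinatorial nature of subset selection can be sketched in a few lines: enumerate every subset of candidate predictors, fit each by least squares, and keep the subset with the best penalized score. The sketch below is a generic illustration and not the vignette's method; the toy data, the naive normal-equations solver, and the choice of BIC as the score are all my own assumptions.

```python
import itertools
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss_for_subset(X, y, subset):
    """RSS of the least-squares fit of y on an intercept plus the chosen columns."""
    cols = [[1.0] * len(y)] + [[row[j] for row in X] for j in subset]
    k = len(cols)
    G = [[sum(u * v for u, v in zip(cols[a], cols[b])) for b in range(k)]
         for a in range(k)]
    rhs = [sum(c * yi for c, yi in zip(cols[a], y)) for a in range(k)]
    beta = solve(G, rhs)
    fit = [sum(beta[a] * cols[a][i] for a in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

def bic(rss, k, n):
    """Bayesian information criterion as the subset score."""
    return n * math.log(max(rss / n, 1e-12)) + k * math.log(n)

# toy data: y depends on column 0 only; column 1 is irrelevant (hypothetical)
X = [[float(i), float((i * 7) % 5)] for i in range(12)]
y = [3.0 * row[0] + 0.5 + (0.2 if i % 2 else -0.2) for i, row in enumerate(X)]
n, p = len(y), 2
best = min(
    (s for r in range(p + 1) for s in itertools.combinations(range(p), r)),
    key=lambda s: bic(rss_for_subset(X, y, s), len(s) + 1, n),
)
print(best)  # with this toy data, BIC keeps only the informative column
```

Exhaustive enumeration visits all 2^p subsets, which is why much of the literature reviewed here concerns heuristic and Bayesian alternatives for larger p.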
Bayesian Statistics
in WWW', Computing Science and Statistics, 1989
Abstract

Cited by 20 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion, minimizing the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model ...
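The jackknife bias-reduction step mentioned above can be illustrated on a classical small example (mine, not the dissertation's): the plug-in variance estimator is biased downward by the factor (n-1)/n, and the jackknife correction recovers exactly the usual unbiased sample variance.

```python
def var_mle(xs):
    """Plug-in (maximum-likelihood) variance estimate, biased by (n-1)/n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def jackknife_bias(xs, stat):
    """Jackknife bias estimate: (n-1) * (mean of leave-one-out values - full value)."""
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    return (n - 1) * (sum(loo) / n - stat(xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
corrected = var_mle(xs) - jackknife_bias(xs, var_mle)
print(corrected)  # equals the unbiased sample variance 32/7, exactly for this statistic
```

For quadratic statistics such as the variance the jackknife removes the bias exactly; in the model selection setting the same subtraction is applied to an estimated Kullback-Leibler distance, where the correction is only asymptotic.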
Regression And Time Series Model Selection Using Variants Of The Schwarz Information Criterion
1997
Abstract

Cited by 16 (1 self)
The Schwarz (1978) information criterion, SIC, is a widely used tool in model selection, largely due to its computational simplicity and effective performance in many modeling frameworks. The derivation of SIC (Schwarz, 1978) establishes the criterion as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. In this paper, we investigate the derivation to identify terms which are discarded as being asymptotically negligible, but which may be significant in small to moderate sample-size applications. We suggest several SIC variants based on the inclusion of these terms. The results of a simulation study show that the variants improve upon the performance of SIC in two important areas of application: multiple linear regression and time series analysis.
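To make the "asymptotically negligible terms" point concrete, the sketch below compares an exact log marginal likelihood against the Schwarz approximation log L(theta-hat) - (k/2) log n for a conjugate Gaussian-mean model. The model, prior, and simulation are my own choices, not the paper's; what they show is that the discarded remainder is O(1): it does not vanish, it merely stops growing with n.

```python
import math
import random

def log_marginal(ys, sigma2=1.0, tau2=1.0):
    """Exact log marginal likelihood for y_i ~ N(theta, sigma2), theta ~ N(0, tau2)."""
    n = len(ys)
    S = sum(y * y for y in ys)
    ybar = sum(ys) / n
    a = n / sigma2 + 1.0 / tau2          # posterior precision of theta
    b = n * ybar / sigma2
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - S / (2 * sigma2) + b * b / (2 * a)
            - 0.5 * math.log(tau2 * a))

def sic_approx(ys, sigma2=1.0):
    """Schwarz approximation: max log-likelihood minus (k/2) log n, with k = 1."""
    n = len(ys)
    ybar = sum(ys) / n
    rss = sum((y - ybar) ** 2 for y in ys)
    loglik = -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)
    return loglik - 0.5 * math.log(n)

random.seed(0)
for n in (20, 200, 2000):
    ys = [random.gauss(0.3, 1.0) for _ in range(n)]
    gap = log_marginal(ys) - sic_approx(ys)
    print(n, round(gap, 3))  # the gap settles near a constant as n grows
```

Those bounded leftover terms are exactly the kind of quantity the paper's variants put back in for small to moderate samples.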
Bootstrap estimate of Kullback-Leibler information for model selection
Statistica Sinica, 1997
Abstract

Cited by 13 (0 self)
Estimation of the Kullback-Leibler amount of information is a crucial part of deriving a statistical model selection procedure based on the likelihood principle, like AIC. To discriminate nested models, we have to estimate it up to the order of a constant, while the Kullback-Leibler information itself is of the order of the number of observations. The correction term employed in AIC is one way to fulfill this requirement, but it is a simple-minded bias correction to the log maximum likelihood. Therefore there is no assurance that such a bias correction yields a good estimate of the Kullback-Leibler information. In this paper, as an alternative, bootstrap-type estimation is considered. We first show that the bootstrap estimates proposed by Efron (1983, 1986, 1993) and by Cavanaugh and Shumway (1994) are at least asymptotically equivalent, and that there exist many other equivalent bootstrap estimates. We also show that all such methods are asymptotically equivalent to a non-bootstrap method, known as TIC (Takeuchi's Information Criterion), which is a generalization of AIC.
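The flavor of these bootstrap estimates can be conveyed with a toy Gaussian-mean model. This is a generic optimism-style sketch in the spirit of Efron's estimator, not the exact constructions compared in the paper: resample the data, refit, and average how much the refitted log-likelihood flatters the bootstrap sample relative to the original data. That average estimates the same bias that AIC corrects with the constant k.

```python
import math
import random

def loglik(data, theta):
    """Gaussian log-likelihood with known unit variance; only the mean is fitted."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)

random.seed(1)
n, B = 100, 1000
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

optimism = 0.0
for _ in range(B):
    boot = [random.choice(ys) for _ in range(n)]   # resample with replacement
    theta_b = sum(boot) / n                        # MLE on the bootstrap sample
    optimism += (loglik(boot, theta_b) - loglik(ys, theta_b)) / B
print(round(optimism, 2))  # hovers around k = 1, the AIC correction term
```

Unlike AIC's fixed penalty, the bootstrap average adapts to the data at hand, which is the motivation for studying when the two agree.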
Unifying the Derivations for the Akaike and Corrected Akaike Information Criteria
1997
Abstract

Cited by 13 (3 self)
The Akaike (1973, 1974) information criterion, AIC, and the corrected Akaike information criterion (Hurvich and Tsai, 1989), AICc, were both designed as estimators of the expected Kullback-Leibler discrepancy between the model generating the data and a fitted candidate model. AIC is justified in a very general framework, and as a result offers a crude estimator of the expected discrepancy: one which exhibits a potentially high degree of negative bias in small-sample applications (Hurvich and Tsai, 1989). AICc corrects for this bias, but is less broadly applicable than AIC since its justification depends upon the form of the candidate model (Hurvich and Tsai, 1989, 1993; Hurvich, Shumway, and Tsai, 1990; Bedrick and Tsai, 1994). Although AIC and AICc share the same objective, the derivations of the criteria proceed along very different lines, making it difficult to reconcile how AICc improves upon the approximations leading to AIC. To address this issue, we present a derivation which unifies the justifications of AIC and AICc in the linear regression framework. Keywords: AIC, AICc, information theory, Kullback-Leibler information, model selection.
Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors
Prediction of Human Behavior, IEEE Intelligent Vehicles, 1995
Abstract

Cited by 12 (5 self)
This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors by means of Machine Perception and Machine Learning techniques. In the thesis I develop the statistical machine learning algorithms (dynamic graphical models) necessary for detecting and recognizing individual and interactive behaviors. In the case of interactions, two Hidden Markov Models (HMMs) are coupled in a novel architecture called Coupled Hidden Markov Models (CHMMs) that explicitly captures the interactions between them. The algorithms for learning the parameters from data, as well as for doing inference with those models, are developed and described. Four systems that experimentally evaluate the proposed paradigm are presented: (1) LAFTER, an automatic face detection and tracking system with facial expression recognition; (2) a Tai Chi gesture recognition system; (3) a pedestrian surveillance system that recognizes typical human-to-human interactions; and (4) a SmartCar for driver maneuver recognition. These systems capture human behaviors of different nature and increasing complexity: first, isolated, single-user facial expressions; then, two-hand gestures and human-to-human interactions,...
A Large-Sample Model Selection Criterion Based on Kullback's Symmetric Divergence
Statistics &amp; Probability Letters, 1999
Abstract

Cited by 11 (1 self)
The Akaike information criterion, AIC, is a widely known and extensively used tool for statistical model selection. AIC serves as an asymptotically unbiased estimator of a variant of Kullback's directed divergence between the true model and a fitted approximating model. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternate directed divergence may be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence. Since the symmetric divergence combines the information in two related though distinct measures, it functions as a gauge of model disparity which is arguably more sensitive than either of its individual components. With this motivation, we propose a model selection criterion which serves as an asymptotically unbiased estimator of a variant of the symmetric divergence between the true model and a fitted approximating model. We examine the performance of the criterion relative to other well-known criteria in a simulation study. Keywords: AIC, Akaike information criterion, I-divergence, J-divergence, Kullback-Leibler information, relative entropy.
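The relationship between the directed and symmetric divergences is easy to state in code for discrete distributions (a generic illustration with made-up distributions; the criterion itself applies these ideas to parametric model pairs):

```python
import math

def kl(p, q):
    """Directed Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def j_divergence(p, q):
    """Kullback's symmetric J-divergence: the sum of both directed divergences."""
    return kl(p, q) + kl(q, p)

p = [0.7, 0.2, 0.1]
q = [0.3, 0.4, 0.3]
print(kl(p, q), kl(q, p))                        # asymmetric: the two directions differ
print(j_divergence(p, q) == j_divergence(q, p))  # symmetric by construction
```

Because J sums both directions, a model that looks close under one directed divergence but far under the other is penalized, which is the sensitivity argument made above.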