Results 1–10 of 62
Bayesian measures of model complexity and fit
Journal of the Royal Statistical Society, Series B, 2002
"... [Read before The Royal Statistical Society at a meeting organized by the Research ..."
Abstract

Cited by 132 (2 self)
 Add to MetaCart
[Read before The Royal Statistical Society at a meeting organized by the Research ...
Adapting to unknown sparsity by controlling the false discovery rate, 2000
"... We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the order ..."
Abstract

Cited by 108 (15 self)
 Add to MetaCart
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = qn → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping q > 1/2 fixed prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
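The data-adaptive FDR thresholding this abstract describes can be sketched in Python (an illustrative sketch, not the authors' procedure; `fdr_threshold` and all parameter choices are ours): a Benjamini–Hochberg step-up pass over the two-sided p-values of the standardized observations picks a threshold, and every coordinate below it is set to zero.

```python
import math
import numpy as np

def fdr_threshold(y, q=0.1, sigma=1.0):
    """Hard-threshold y at the level chosen by a Benjamini-Hochberg
    step-up rule on the two-sided p-values of |y_i| / sigma."""
    n = len(y)
    z = np.abs(y) / sigma
    # two-sided Gaussian p-value: P(|N(0,1)| >= z) = erfc(z / sqrt(2))
    p = np.array([math.erfc(zi / math.sqrt(2)) for zi in z])
    order = np.argsort(p)                    # ascending p-values
    k = 0
    for i, idx in enumerate(order, start=1):
        if p[idx] <= q * i / n:              # BH step-up condition
            k = i                            # remember the largest passing i
    if k == 0:
        return np.zeros_like(y)              # nothing rejected: estimate 0
    t = z[order[k - 1]]                      # threshold = k-th largest |z|
    return np.where(z >= t, y, 0.0)          # keep only coordinates above t

rng = np.random.default_rng(0)
n = 1000
mu = np.zeros(n)
mu[:20] = 6.0                                # sparse mean: 20 strong signals
y = mu + rng.standard_normal(n)
est = fdr_threshold(y, q=0.1)
```

Because the threshold is the k-th largest |z| rather than a fixed level, it adapts to the unknown sparsity: denser signals push k up and the threshold down.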
The estimation of prediction error: Covariance penalties and crossvalidation
Journal of the American Statistical Association, 2004
"... Having constructed a databased estimation rule, perhaps a logistic regression or a classification tree, the statistician would like to know its performance as a predictor of future cases. There are two main theories concerning prediction error: (1) penalty methods such as Cp, AIC, and SURE that dep ..."
Abstract

Cited by 46 (4 self)
 Add to MetaCart
Having constructed a data-based estimation rule, perhaps a logistic regression or a classification tree, the statistician would like to know its performance as a predictor of future cases. There are two main theories concerning prediction error: (1) penalty methods such as Cp, AIC, and SURE that depend on the covariance between data points and their corresponding predictions; (2) cross-validation and related nonparametric bootstrap techniques. This paper concerns the connection between the two theories. A Rao–Blackwell type of relation is derived, in which nonparametric methods like cross-validation are seen to be randomized versions of their covariance penalty counterparts. The model-based penalty methods offer substantially better accuracy, assuming that the model is believable.
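The covariance-penalty side of this comparison can be illustrated with a small sketch (our own toy example under a known-noise Gaussian assumption, not the paper's code): the per-point covariances cov(y_i, ŷ_i) are estimated by a parametric bootstrap around the fitted values and plugged into a Cp-type error estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 100, 3, 1.0
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + sigma * rng.standard_normal(n)

def fit_predict(X, y):
    """The estimation rule under study: plain least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Covariance penalty (Cp-style): prediction error ~= training error plus
# (2/n) * sum_i cov(y_i, yhat_i).  The covariances are estimated by a
# parametric bootstrap: simulate fresh noise around the fitted values and
# record how each prediction co-varies with its own data point.
yhat = fit_predict(X, y)
B = 400
ystar = yhat[None, :] + sigma * rng.standard_normal((B, n))
preds = np.stack([fit_predict(X, yb) for yb in ystar])
cov = ((ystar - ystar.mean(0)) * (preds - preds.mean(0))).mean(0)
df_hat = cov.sum() / sigma**2        # effective df; ~= trace of hat matrix = 3
err_cp = np.mean((y - yhat) ** 2) + 2 * sigma**2 * df_hat / n
```

For this linear rule the covariance sum recovers the usual degrees of freedom (≈ p = 3), but the same bootstrap recipe applies to adaptive rules, such as trees, where no closed form exists.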
Subspace information criterion for model selection
Neural Computation, 2001
"... The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows ’ C L. It is a ..."
Abstract

Cited by 41 (28 self)
 Add to MetaCart
The problem of model selection is of considerable importance for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space, and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and the target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least mean squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV
Ann. Statist.
"... (ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition we propose a class of approximate numerical methods for solving the penalized ..."
Abstract

Cited by 41 (19 self)
 Add to MetaCart
We propose the randomized GACV (ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition, we propose a class of approximate numerical methods for solving the penalized likelihood variational problem which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible. These methods are based on choosing an approximating subset of the natural (representer) basis functions for the variational problem. Simulation studies with synthetic data, including synthetic data mimicking demographic risk factor data sets, are used to examine the properties of the method and to compare the approach with the GRKPACK code of Wang (1997c). Bayesian "confidence intervals" are obtained for the fits and are shown in the simulation studies to have the "across the function" property usually claimed for these confidence intervals. Finally, the method is applied ...
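The "randomized" ingredient in criteria of this kind is, at heart, a randomized estimate of the trace of the influence ("hat") matrix, which GACV-type scores need but which is too expensive to form for large data sets. A minimal sketch of that idea, a Hutchinson-type probe with ±1 perturbations (our illustration, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.standard_normal((n, 5))
H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix of a linear smoother

def apply_H(v):
    """In a real large-scale fit H is never formed; only H @ v (a re-fit
    on perturbed data) is available.  Here we apply it explicitly."""
    return H @ v

# Randomized trace estimate: for eps with i.i.d. +-1 entries,
# E[eps^T H eps] = trace(H), so averaging a few probes suffices.
R = 50
probes = rng.choice([-1.0, 1.0], size=(R, n))
tr_est = np.mean([e @ apply_H(e) for e in probes])   # near trace(H) = 5
```

Each probe costs one extra fit on perturbed responses, so the trace of the influence matrix is obtained without ever materializing an n-by-n object.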
The Covariance Inflation Criterion for Adaptive Model Selection
J. Roy. Statist. Soc. B, 1999
"... We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when the prediction rule is applied to permuted versions of the dataset. This criterion can be applied to g ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses when the prediction rule is applied to permuted versions of the data set. This criterion can be applied to general prediction problems (for example, regression or classification) and to general prediction rules (for example, stepwise regression, tree-based models, and neural nets). As a by-product we obtain a measure of the effective number of parameters used by an adaptive procedure. We relate the covariance inflation criterion to other model selection procedures and illustrate its use in some regression and classification problems. We also revisit the conditional bootstrap approach to model selection. Keywords: model selection, adaptive, permutation, bootstrap, cross-validation
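The permutation recipe can be sketched as follows (an illustrative simplification with our own function names, using a polynomial-regression rule to stand in for a general prediction rule): refit on permuted responses, average the per-point covariance between responses and predictions, and inflate the training error by it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 0.3 * rng.standard_normal(n)

def fit_poly(x, y, deg):
    """The prediction rule under study: a degree-`deg` polynomial fit."""
    return np.polyval(np.polyfit(x, y, deg), x)

def cic(x, y, deg, n_perm=200):
    """Covariance-inflation-style adjusted error: training error plus
    2/n times the summed covariance, over permuted data sets, between
    each response and its prediction."""
    train_err = np.mean((y - fit_poly(x, y, deg)) ** 2)
    ys, preds = [], []
    for _ in range(n_perm):
        yp = rng.permutation(y)              # destroy the x-y relationship
        ys.append(yp)
        preds.append(fit_poly(x, yp, deg))
    ys, preds = np.array(ys), np.array(preds)
    cov = ((ys - ys.mean(0)) * (preds - preds.mean(0))).mean(0).sum()
    return train_err + 2 * cov / n

scores = {d: cic(x, y, d) for d in (1, 3, 9)}   # pick the smallest score
```

The covariance term grows with the flexibility of the rule, so an overfitted high-degree polynomial is penalized even though its training error is smaller; the same summed covariance, suitably scaled, is the abstract's effective number of parameters.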
Crossover improvement for the genetic algorithm in information retrieval
Information Processing and Management, 1998
"... Abstract Genetic algorithms (GAs) search for good solutions to a problem by operations inspired from the natural selection of living beings. Among their many uses, we can count information retrieval (IR). In this field, the aim of the GA is to help an IR system to find, in a huge documents text col ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
Genetic algorithms (GAs) search for good solutions to a problem by operations inspired by the natural selection of living beings. Among their many uses, we can count information retrieval (IR). In this field, the aim of the GA is to help an IR system find, in a huge collection of text documents, a good reply to a query expressed by the user. The analysis of phenomena seen during the implementation of a GA for IR has led us to a new crossover operation. This article introduces this new operation and compares it with other learning methods.
Bootstrap estimate of KullbackLeibler information for model selection
Statistica Sinica, 1997
"... Estimation of KullbackLeibler amount of information is a crucial part of deriving a statistical model selection procedure which is based on likelihood principle like AIC. To discriminate nested models, we have to estimate it up to the order of constant while the KullbackLeibler information itself ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
Estimation of the Kullback–Leibler amount of information is a crucial part of deriving a statistical model selection procedure based on the likelihood principle, such as AIC. To discriminate nested models, we have to estimate it up to the order of a constant, while the Kullback–Leibler information itself is of the order of the number of observations. The correction term employed in AIC is one way to fulfill this requirement, but it is a simple-minded bias correction to the log maximum likelihood. Therefore there is no assurance that such a bias correction yields a good estimate of Kullback–Leibler information. In this paper, as an alternative, bootstrap-type estimation is considered. We first show that the bootstrap estimates proposed by Efron (1983, 1986, 1993) and by Cavanaugh and Shumway (1994) are at least asymptotically equivalent, and that there exist many other equivalent bootstrap estimates. We also show that all such methods are asymptotically equivalent to a non-bootstrap method, known as TIC (Takeuchi's Information Criterion), which is a generalization of AIC.
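This kind of bootstrap bias estimate can be illustrated on a toy Gaussian model (our sketch of one of the asymptotically equivalent variants, not any specific published estimator): the bootstrap average of the optimism of the maximized log-likelihood should land near AIC's correction term, the number of parameters k.

```python
import numpy as np

def loglik_normal(x, mu, s2):
    """Gaussian log-likelihood of a sample x at mean mu, variance s2."""
    return -0.5 * len(x) * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

rng = np.random.default_rng(4)
n = 200
x = rng.standard_normal(n)
mu_hat, s2_hat = x.mean(), x.var()       # MLEs; k = 2 free parameters

# Bootstrap estimate of the optimism of the log maximum likelihood:
# average, over resamples x*, of logL(x*; theta*(x*)) - logL(x; theta*(x*)).
# For a well-specified model this should be close to AIC's k = 2.
B = 2000
bias = []
for _ in range(B):
    xb = rng.choice(x, size=n, replace=True)     # nonparametric bootstrap
    mb, sb = xb.mean(), xb.var()                 # refit on the resample
    bias.append(loglik_normal(xb, mb, sb) - loglik_normal(x, mb, sb))
b_hat = float(np.mean(bias))
```

Unlike AIC's fixed correction, the bootstrap average adapts to the actual sampling distribution, which is the point of the comparison with TIC made in the abstract.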