Results 1–10 of 132
Model Selection and the Principle of Minimum Description Length
 Journal of the American Statistical Association
, 1998
Abstract

Cited by 156 (5 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
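The two-part flavor of MDL can be sketched in a few lines: each candidate model is charged a code length for its parameters plus a code length for the data given the model, and the model with the shortest total description wins. The (k/2) ln n parameter cost, the Gaussian residual code, and the toy data below are illustrative assumptions for this listing, not the paper's own examples.

```python
import math

def two_part_mdl(n, rss, k):
    """Crude two-part code length (in nats): parameter cost + data cost.
    (k/2) ln n for k real parameters, (n/2) ln(RSS/n) for Gaussian residuals."""
    return 0.5 * k * math.log(n) + 0.5 * n * math.log(rss / n)

def fit_constant(xs, ys):
    """Mean-only model: one parameter."""
    mean = sum(ys) / len(ys)
    rss = sum((y - mean) ** 2 for y in ys)
    return rss, 1

def fit_linear(xs, ys):
    """Simple linear regression: two parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return rss, 2

# Toy data: a clearly linear trend with small fixed perturbations.
xs = list(range(10))
ys = [2 * x + 1 + r for x, r in zip(xs, [0.1, -0.1] * 5)]

scores = {}
for name, fit in [("constant", fit_constant), ("linear", fit_linear)]:
    rss, k = fit(xs, ys)
    scores[name] = two_part_mdl(len(xs), rss, k)

best = min(scores, key=scores.get)
```

On this data the linear model wins: its extra parameter cost (half a log n) is dwarfed by the much shorter code for its residuals.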
Large Sample Sieve Estimation of Semi-Nonparametric Models
 Handbook of Econometrics
, 2007
Abstract

Cited by 113 (14 self)
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite-dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite-dimensional parameters. Examples are used to illustrate the general results.
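As a toy illustration of the sieve idea, one can replace the infinite-dimensional space of regression functions with polynomial spaces whose degree grows slowly with the sample size, and fit by least squares within the current sieve. The growth rule floor(n^(1/3)), the target function, and the data below are illustrative assumptions, not the chapter's examples.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (the systems here are tiny, so this is adequate)."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for c in range(m - 1, -1, -1):
        x[c] = (M[c][m] - sum(M[c][k] * x[k] for k in range(c + 1, m))) / M[c][c]
    return x

def sieve_fit(xs, ys):
    """Least squares over the polynomial sieve of degree floor(n^(1/3))."""
    n = len(xs)
    deg = max(1, int(n ** (1 / 3)))          # sieve grows with sample size
    m = deg + 1
    basis = [[x ** j for j in range(m)] for x in xs]
    # normal equations B'B c = B'y
    A = [[sum(basis[i][a] * basis[i][b] for i in range(n)) for b in range(m)]
         for a in range(m)]
    rhs = [sum(basis[i][a] * ys[i] for i in range(n)) for a in range(m)]
    c = solve(A, rhs)
    return lambda x: sum(cj * x ** j for j, cj in enumerate(c))

# Toy data: a smooth target plus small alternating perturbations.
n = 50
xs = [i / (n - 1) for i in range(n)]
ys = [math.exp(x) + 0.05 * (-1) ** i for i, x in enumerate(xs)]
mhat = sieve_fit(xs, ys)
```

With n = 50 the sieve is the space of cubics, which already tracks the smooth target closely while keeping only four parameters.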
Boosting algorithms: Regularization, prediction and model fitting
 Statistical Science
, 2007
Abstract

Cited by 48 (9 self)
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
Subspace information criterion for model selection
 Neural Computation
, 2001
Abstract

Cited by 47 (30 self)
The problem of model selection is of considerable importance for acquiring higher levels of generalization capability in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and the target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least mean squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
Boosting for high-dimensional linear models
 The Annals of Statistics
, 2006
Abstract

Cited by 46 (4 self)
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for denoising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L2Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
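A minimal sketch of componentwise L2Boosting: at each step, every predictor is fit alone to the current residuals by least squares, and a shrunken step is taken along the best one. The paper's AIC stopping rule uses the trace of the boosting operator as degrees of freedom; the crude proxy below (number of distinct selected predictors), the step size, and the toy data are illustrative assumptions.

```python
import math

def l2boost(X, y, nu=0.1, m_max=200):
    """Componentwise L2Boosting with a crude AIC stopping rule.
    Returns the coefficient vector at the AIC-minimizing iteration."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)
    path = []
    for _ in range(m_max):
        # univariate least-squares fit of each predictor to the residuals
        best_j, best_b, best_rss = 0, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            b = sum(v * r for v, r in zip(xj, resid)) / sum(v * v for v in xj)
            rss = sum((r - b * v) ** 2 for v, r in zip(xj, resid))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # shrunken update along the best predictor
        coef[best_j] += nu * best_b
        xj = [row[best_j] for row in X]
        resid = [r - nu * best_b * v for v, r in zip(xj, resid)]
        rss = sum(r * r for r in resid)
        df = len([c for c in coef if c != 0.0])   # crude df proxy (assumption)
        aic = n * math.log(rss / n) + 2 * df
        path.append((aic, list(coef)))
    return min(path)[1]

# Toy data: only the first of three predictors matters.
n = 20
X = [[i / 10, ((7 * i) % 5) / 5, ((3 * i) % 4) / 4] for i in range(n)]
y = [3 * X[i][0] + 0.05 * (-1) ** i for i in range(n)]
coef = l2boost(X, y)
```

The selected coefficient vector puts essentially all of its weight on the relevant first predictor.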
A New Method for Varying Adaptive Bandwidth Selection
 IEEE Transactions on Signal Processing
, 1999
Abstract

Cited by 40 (24 self)
A novel approach is developed to solve the problem of varying bandwidth selection for filtering a signal observed in additive noise. The approach is based on the intersection of confidence intervals (ICI) rule and yields an algorithm that is simple to implement and adaptive to the unknown smoothness of the signal.
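The ICI rule itself is easy to sketch: compute estimates over a nested family of windows, form a confidence interval for each, and keep the largest window whose interval still intersects all of the smaller ones. The left-sided moving-average estimator, the threshold Γ = 2, and the assumed-known noise level below are illustrative choices, not the paper's exact setup.

```python
import math

def ici_window(y, t, sigma, gamma=2.0):
    """Largest left-sided window at index t whose confidence interval still
    intersects the intervals of all smaller windows (the ICI rule)."""
    lower_max, upper_min = -math.inf, math.inf
    selected = 0
    for h in range(1, t + 2):                    # window y[t-h+1 .. t]
        window = y[t - h + 1: t + 1]
        est = sum(window) / h
        half = gamma * sigma / math.sqrt(h)      # CI half-width shrinks with h
        lower_max = max(lower_max, est - half)
        upper_min = min(upper_min, est + half)
        if lower_max > upper_min:                # intervals no longer intersect
            break
        selected = h
    return selected

# Piecewise-constant signal with a jump at i = 10, plus small alternating noise.
y = [(0 if i < 10 else 5) + 0.1 * (-1) ** i for i in range(20)]
h = ici_window(y, t=19, sigma=0.1)
```

At t = 19 the rule grows the window until it would cross the jump at i = 10 and stops there, which is exactly the adaptivity to unknown smoothness the abstract describes.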
Estimating mixtures of regressions
Abstract

Cited by 37 (2 self)
In this paper, we show how Bayesian inference for switching regression models and their generalisations can be achieved by the specification of loss functions which overcome the label switching problem common to all mixture models. We also derive an extension to models where the number of components in the mixture is unknown, based on the birth-and-death technique developed in Stephens (2000a). The methods are illustrated on various real datasets.
Cross-validated local linear nonparametric regression
 Statistica Sinica
, 2004
Abstract

Cited by 27 (5 self)
Local linear kernel methods have been shown to dominate local constant methods for the nonparametric estimation of regression functions. In this paper we study the theoretical properties of cross-validated smoothing parameter selection for the local linear kernel estimator. We derive the rate of convergence of the cross-validated smoothing parameters to their optimal benchmark values, and we establish the asymptotic normality of the resulting nonparametric estimator. We then generalize our result to the mixed categorical and continuous regressor case which is frequently encountered in applied settings. Monte Carlo simulation results are reported to examine the finite sample performance of the local-linear based cross-validation smoothing parameter selector. We relate the theoretical and simulation results to a corrected AIC method (termed AICc) proposed by Hurvich, Simonoff and Tsai (1998) and find that AICc has impressive finite-sample properties. Key words and phrases: Asymptotic normality, data-driven bandwidth selection, discrete and continuous data, local polynomial regression.
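Leave-one-out cross-validation for the local linear estimator's bandwidth can be sketched directly from the definitions: each observation is predicted from the others and the bandwidth minimizing the squared prediction error is kept. The Gaussian kernel, the bandwidth grid, and the toy data below are illustrative assumptions, not the paper's setup.

```python
import math

def local_linear(xs, ys, x, h):
    """Local linear (kernel-weighted least squares) estimate at x,
    Gaussian kernel with bandwidth h, via the standard closed form."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x
        w = math.exp(-0.5 * (d / h) ** 2)
        s0 += w; s1 += w * d; s2 += w * d * d
        t0 += w * yi; t1 += w * d * yi
    return (s2 * t0 - s1 * t1) / (s2 * s0 - s1 * s1)

def loo_cv(xs, ys, h):
    """Leave-one-out mean squared prediction error for bandwidth h."""
    err = 0.0
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        err += (ys[i] - local_linear(xs_i, ys_i, xs[i], h)) ** 2
    return err / len(xs)

# Toy data: one period of a sine wave plus small alternating noise.
n = 30
xs = [i / (n - 1) for i in range(n)]
ys = [math.sin(2 * math.pi * x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]

grid = [0.05, 0.1, 0.2, 0.4]
best_h = min(grid, key=lambda h: loo_cv(xs, ys, h))
```

The data-driven bandwidth then yields a fit close to the true regression function, e.g. near zero at x = 0.5 where sin(2πx) crosses zero.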
Capital Requirements, Business Loans and Business Cycles: An Empirical Analysis of the Standardized Approach in the New Basel Capital Accord
 Board of Governors of the Federal Reserve System Working Paper
, November 13, 2001
Abstract

Cited by 18 (0 self)
In the current regulatory framework, capital requirements are based on risk-weighted assets, but all business loans carry a uniform risk weight, irrespective of variations in credit risk. The proposed new Capital Accord of the Bank for International Settlements provides for a greater sensitivity of capital requirements to credit risk, raising the question of whether, and to what extent, the new capital standards will intensify business cycles. In this paper, we evaluate the potential cyclical effects of the “standardized approach” to risk evaluation in the new Accord, which involves the ratings of external agencies. We combine Moody’s data on changes in U.S. borrowers’ credit ratings since 1970 with estimates of the risk profile of business loans at commercial banks from the Survey of Terms of Business Lending, and also a risk profile estimated by Treacy and Carey (1998). We find that the level of required capital against business loans would be noticeably lower under the new Accord compared with the current regime. We do not find evidence of any substantial additional cyclicality in required capital levels under the standardized approach of the new Accord relative to the current regime.
A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE)
, 2008
Abstract

Cited by 11 (0 self)
Given n observations of a p-dimensional random vector, the covariance matrix and its inverse (precision matrix) are needed in a wide range of applications. The sample covariance (e.g. its eigenstructure) can misbehave when p is comparable to the sample size n. Regularization is often used to mitigate the problem. In this paper, we propose an ℓ1-penalized pseudo-likelihood estimate for the inverse covariance matrix. This estimate is sparse due to the ℓ1 penalty, and we term this method SPLICE. Its regularization path can be computed via an algorithm based on the homotopy/LARS-Lasso algorithm. Simulation studies are carried out for various inverse covariance structures for p = 15 and n = 20, 1000. We compare SPLICE with the ℓ1-penalized likelihood estimate and an ℓ1-penalized Cholesky decomposition based method. SPLICE gives the best overall performance in terms of three metrics on the precision matrix and the ROC curve for model selection. Moreover, our simulation results demonstrate that the SPLICE estimates are positive-definite for most of the regularization path even though this restriction is not enforced.
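SPLICE itself couples a joint pseudo-likelihood with a LARS/homotopy path algorithm, which is too involved to reproduce here. As a much smaller illustration of the key ingredient, the sketch below solves a single ℓ1-penalized "nodewise" regression of one coordinate on the others by cyclic coordinate descent with soft-thresholding: the penalty zeroes out coefficients, i.e. candidate conditional independences in the precision matrix. The penalty level and the toy data are illustrative assumptions.

```python
def soft(z, lam):
    """Soft-thresholding operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, sweeps=50):
    """Cyclic coordinate descent for min_b 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            xj = [row[j] for row in X]
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(v * ri for v, ri in zip(xj, r))
            b[j] = soft(z, lam) / sum(v * v for v in xj)
    return b

# Toy data: the response depends on x0 but not on x1 (all columns centered).
x0 = [(i - 9.5) / 10 for i in range(20)]
x1 = [(-1) ** i for i in range(20)]
y = [a + (0.05 if i % 4 < 2 else -0.05) for i, a in enumerate(x0)]
beta = lasso_cd([list(row) for row in zip(x0, x1)], y, lam=0.5)
```

The ℓ1 penalty sets the coefficient of the irrelevant x1 exactly to zero while the relevant coefficient survives, shrunk slightly toward zero.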