Results 1  10
of
42
A note on the LASSO and related procedures in model selection
 STATISTICA SINICA
, 2004
"... The Lasso, the Forward Stagewise regression and the Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tu ..."
Abstract

Cited by 47 (6 self)
 Add to MetaCart
The Lasso, the Forward Stagewise regression and the Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when the prediction accuracy is used as the criterion to choose the tuning parameter, in general these procedures are not consistent in terms of variable selection. That is, the sets of variables selected are not consistent at finding the true set of important variables. In particular, we show that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability of the procedures correctly identifying the true set of important variables is less than a constant (smaller than one) not depending on n. This result is also shown to hold for two dimensional problems with general correlated design matrices. The results indicate that in problems where
Spike and slab variable selection: frequentist and bayesian strategies
 The Annals of Statistics
"... Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw con ..."
Abstract

Cited by 40 (7 self)
 Add to MetaCart
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty. 1. Introduction. We
Generalized functional linear models
 Ann. Statist
, 2005
"... We propose a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expe ..."
Abstract

Cited by 40 (5 self)
 Add to MetaCart
We propose a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function. If in addition a variance function is specified, this leads to a functional estimating equation which corresponds to maximizing a functional quasilikelihood. This general approach includes the special cases of the functional linear model, as well as functional Poisson regression and functional binomial regression. The latter leads to procedures for classification and discrimination of stochastic processes and functional data. We also consider the situation where the link and variance functions are unknown and are estimated nonparametrically from the data, using a semiparametric quasilikelihood procedure. An essential step in our proposal is dimension reduction by approximating the predictor processes with a truncated KarhunenLoève expansion. We develop asymptotic inference for the proposed class of generalized regression models. In the proposed asymptotic approach, the truncation parameter increases with sample size, and a martingale central limit theorem is applied to establish the resulting increasing dimension asymptotics. We establish asymptotic normality for a properly scaled distance
The variable selection problem
 Journal of the American Statistical Association
, 2000
"... The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables ..."
Abstract

Cited by 39 (2 self)
 Add to MetaCart
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem. 1
Can the Strengths of AIC and BIC Be Shared?
 BIOMETRICA
, 2003
"... It is well known that AIC and BIC have different properties in model selection. BIC is consistent in the sense that if the true model is among the candidates, the probability of selecting the true model approaches 1. On the other hand, AIC is minimaxrate optimal for both parametric and nonparame ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
It is well known that AIC and BIC have different properties in model selection. BIC is consistent in the sense that if the true model is among the candidates, the probability of selecting the true model approaches 1. On the other hand, AIC is minimaxrate optimal for both parametric and nonparametric cases for estimating the regression function. There are several successful results on constructing new model selection criteria to share some strengths of AIC and BIC. However, we show that in a rigorous sense, even in the setting that the true model is included in the candidates, the above mentioned main strengths of AIC and BIC cannot be shared. That is, for any model selection criterion to be consistent, it must behave supoptimally compared to AIC in terms of mean average squared error.
Consistency of cross validation for comparing regression procedures. Annals of Statistics, Accepted paper
"... Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finitedimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistenc ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finitedimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
BICROSSVALIDATION OF THE SVD AND THE NONNEGATIVE MATRIX FACTORIZATION 1
"... This article presents a form of bicrossvalidation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
This article presents a form of bicrossvalidation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank operations on the retained data. We prove a selfconsistency result expressing the prediction error as a residual from a low rank approximation. Random matrix theory and some empirical results suggest that smaller holdout sets lead to more overfitting, while larger ones are more prone to underfitting. In simulated examples we find that a method leaving out half the rows and half the columns performs well. 1. Introduction. Many
Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors
 Prediction of Human Behavior, IEEE Intelligent Vehicles
, 1995
"... This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors, by means of Machine Perception ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors, by means of Machine Perception and Machine Learning techniques. In the thesis I develop the statistical machine learning algorithms (dynamic graphical models) necessary for detecting and recognizing individual and interactive behaviors. In the case of the interactions two Hidden Markov Models (HMMs) are coupled in a novel architecture called Coupled Hidden Markov Models (CHMMs) that explicitly captures the interactions between them. The algorithms for learning the parameters from data as well as for doing inference with those models are developed and described. Four systems that experimentally evaluate the proposed paradigm are presented: (1) LAFTER, an automatic face detection and tracking system with facial expression recognition; (2) a TaiChi gesture recognition system; (3) a pedestrian surveillance system that recognizes typical human to human interactions; (4) and a SmartCar for driver maneuver recognition. These systems capture human behaviors of different nature and increasing complexity: first, isolated, singleuser facial expressions, then, twohand gestures and humantohuman interactions,...
Performance Prediction for Exponential Language Models
"... We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set crossentropy for ngram language models. We build models over varying domains, data set sizes, and ngram orders, an ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set crossentropy for ngram language models. We build models over varying domains, data set sizes, and ngram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including classbased models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance. 1