Results 1 - 7 of 7
The variable selection problem
- Journal of the American Statistical Association, 2000
Cited by 25 (1 self)
Abstract
The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments which have led to the wide variety of approaches for this problem.
Feature Selection with Neural Networks
- Behaviormetrika, 1998
Cited by 11 (0 self)
Abstract
Features gathered from the observation of a phenomenon are not all equally informative: some of them may be noisy, correlated or irrelevant. Feature selection aims at selecting a feature set that is relevant for a given task. This problem is complex and remains an important issue in many domains. In the field of neural networks, feature selection has been studied for the last ten years and classical as well as original methods have been employed. This paper is a review of neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods which have been developed specifically for neural networks. Representative methods are then compared on different test problems. Keywords: Feature Selection, Subset Selection, Variable Sensitivity, Sequential Search. French title: Sélection de Variables et Réseaux de Neurones (Variable Selection and Neural Networks), by Philippe LERAY and Patrick GALLINARI. French abstract (Résumé): The data collected during the obse...
Nonparametric Selection of Input Variables for Connectionist Learning
- 1996
Cited by 5 (0 self)
Abstract
... re. However, for a range of explored problems, the relative ordering of mutual information estimates remains correct, despite inaccuracies in individual estimates. Analysis of forward selection explores the amount of data required to select a certain number of relevant input variables. The amount of required data is shown to increase roughly exponentially with the number of relevant input variables to be selected. It is also shown that the chances of forward selection ending up in a local minimum are reduced by bootstrapping the data. Finally, the method is compared to two connectionist methods for input variable selection: Sensitivity Based Pruning and Automatic Relevance Determination. The new method outperforms these two when the number of independent candidate input variables is large. However, the method requires the number of relevant input variables to be relatively small. These results are confirmed o...
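The ranking idea mentioned in this abstract, ordering candidate inputs by their estimated mutual information with the target, can be sketched as follows. This is a minimal illustration and not the paper's estimator: it assumes discrete-valued inputs and uses a simple count-based (plug-in) estimate, and the names `mutual_information` and `rank_inputs` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    # sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_inputs(inputs, target):
    """Return input names sorted by decreasing estimated MI with the target.

    `inputs` maps a (hypothetical) variable name to its list of samples.
    """
    scores = {name: mutual_information(col, target) for name, col in inputs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    inputs = {
        "a": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to the target
        "b": [0, 1, 0, 1, 0, 1, 0, 1],  # empirically independent of the target
        "c": [0, 0, 1, 1, 0, 1, 1, 0],  # partly informative
    }
    print(rank_inputs(inputs, target))
```

Only the ordering of the scores is used here, which matches the abstract's observation that the relative ordering can remain correct even when individual estimates are inaccurate.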
A New Approach to Variable Selection Using the TLS Approach
Cited by 3 (0 self)
Abstract
The problem of variable selection is one of the most important model selection problems in statistical applications. It is also known as the subset selection problem and arises when one wants to explain the observations or data adequately by a subset of possible explanatory variables. The objective is to identify factors of importance and to include only variables that contribute significantly to the reduction of the prediction error. Numerous selection procedures have been proposed in the classical multiple linear regression model. We extend one of the most popular methods developed in this context, the backward selection procedure, to a more general class of models. In the basic linear regression model, errors are present on the observations only. If errors are present on the regressors as well, one gets the errors-in-variables model, which for Gaussian noise becomes the total-least-squares (TLS) model; this is the context considered here. Index Terms: Least squares (LS) problem, matrix perturbation, stepwise regression, Student test, subset selection, total least squares (TLS) problem.
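To illustrate the distinction this abstract draws between errors on the observations only (LS) and errors on the regressors as well (TLS), here is a small sketch for the single-regressor case. It is not the paper's selection procedure: it assumes one regressor, an intercept handled by centring, and equal error variance on x and y, in which case the TLS line is the orthogonal-regression line. The function names are hypothetical.

```python
from math import sqrt


def centred_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squares and cross-products about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Ordinary least squares: minimises vertical distances (errors on y only)."""
    sxx, _, sxy = centred_sums(xs, ys)
    return sxy / sxx


def tls_slope(xs, ys):
    """Total least squares: minimises orthogonal distances (errors on x and y)."""
    sxx, syy, sxy = centred_sums(xs, ys)
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    d = syy - sxx
    return (d + sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [0, 2, 1, 3]
    print("OLS slope:", ols_slope(xs, ys))
    print("TLS slope:", tls_slope(xs, ys))
```

On noise-free collinear data the two slopes coincide; on noisy data the OLS slope is smaller in magnitude than the TLS slope, which is one reason a selection procedure designed for LS may need adapting before it is applied in the TLS setting.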
Pace Regression
- 1999
Cited by 1 (0 self)
Abstract
This paper articulates a new method of linear regression, "pace regression," that addresses many drawbacks of standard regression reported in the literature, particularly the subset selection problem. Pace regression improves on classical ordinary least squares (OLS) regression by evaluating the effect of each variable and using a clustering analysis to improve the statistical basis for estimating their contribution to the overall regression. As well as outperforming OLS, it also outperforms, in a remarkably general sense, other linear modeling techniques in the literature, including subset selection procedures, which seek a reduction in dimensionality that falls out as a natural byproduct of pace regression. The paper defines six procedures that share the fundamental idea of pace regression, all of which are theoretically justified in terms of asymptotic performance. Experiments confirm the performance improvement over other techniques. Keywords: Linear regression; subset model sele...
Environment and Climate DG XII
Abstract
... this document, we try to present all these methods using unified notations and definitions. The reader may refer to the next page for a global view of our notations.
Basic ingredients of feature selection methods. A feature selection technique typically requires the following ingredients:
- a feature evaluation criterion to compare subsets of variables, which is used to choose among the variables,
- a search procedure, to explore the set of possible variable combinations,
- a stop criterion, which could be a significance threshold on the evaluation criterion or the final feature space dimension.
Depending on the task (e.g. prediction or classification) and on the model (linear, logistic, neural networks...), several evaluation criteria, based either on sound statistical grounds or on heuristics, have been proposed for measuring the importance of a variable subset. For classification, classical criteria use probabilistic distances or entropy measures, often replaced in practice by simple interclass distance measures or even simple distances. For approximation or prediction, classical candidates are distance measures. Some methods consider only the data for computing the relevant variables; others take into account the model which will be used for the modeling task. In the latter case, the evaluation criterion may be based on the performance of the model for the candidate subset of variables. Several measures of performance exist; this point is nontrivial but will not be discussed further here. In general, evaluation criteria are non-monotonic, and exact comparison of feature subsets amounts to a combinatorial problem, which rapidly becomes computationally infeasible, even for moderate input sizes. Due to these limitations, most algorithms are based upon heuristic ...
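The three ingredients listed in this abstract (an evaluation criterion, a search procedure, a stop criterion) can be made concrete with a small sketch. The choices below are illustrative assumptions and are not taken from the document: the criterion is the reduction in residual sum of squares of a linear fit, the search is greedy forward selection, and the stop rule is a minimum-gain threshold or a maximum subset size. Columns are assumed to be centred, and the name `forward_select` is hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def forward_select(columns, y, min_gain=1e-9, max_features=None):
    """Greedy forward selection for a linear model.

    columns: dict mapping variable name -> list of (centred) values
    y:       list of (centred) target values
    Returns the selected names in the order they were added.
    """
    residual = list(y)
    remaining = {name: list(col) for name, col in columns.items()}
    selected = []
    limit = len(columns) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Evaluation criterion: drop in residual sum of squares if added.
        best_name, best_gain = None, 0.0
        for name, col in remaining.items():
            norm2 = dot(col, col)
            if norm2 < 1e-12:      # column already explained by the selected ones
                continue
            gain = dot(residual, col) ** 2 / norm2
            if gain > best_gain:
                best_name, best_gain = name, gain

        # Stop criterion: no candidate improves the fit enough.
        if best_name is None or best_gain < min_gain:
            break

        # Search step: add the best candidate, update the residual, and
        # orthogonalise the other candidates against it (Gram-Schmidt).
        q = remaining.pop(best_name)
        qq = dot(q, q)
        coef = dot(residual, q) / qq
        residual = [r - coef * v for r, v in zip(residual, q)]
        remaining = {
            name: [v - (dot(col, q) / qq) * w for v, w in zip(col, q)]
            for name, col in remaining.items()
        }
        selected.append(best_name)

    return selected


if __name__ == "__main__":
    cols = {
        "x1": [-2.0, -1.0, 0.0, 1.0, 2.0],
        "x2": [1.0, -2.0, 0.0, 2.0, -1.0],
        "x3": [2.0, -1.0, -2.0, -1.0, 2.0],
    }
    y = [3 * a + 0.5 * b for a, b in zip(cols["x1"], cols["x2"])]
    print(forward_select(cols, y))
```

Because the search is greedy, it evaluates only a small part of the combinatorial space the abstract describes, and it can miss the best subset; that trade-off is the usual reason heuristic searches are used.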
complexity of
- 2007
Abstract
Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the

