Wrappers for feature subset selection
 ARTIFICIAL INTELLIGENCE
, 1997
Cited by 1023
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Irrelevant Features and the Subset Selection Problem
 MACHINE LEARNING: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL
, 1994
Cited by 594
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using cross-validation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
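The idea of scoring candidate feature subsets by cross-validating the induction algorithm itself can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: a 1-nearest-neighbour classifier stands in for ID3 or C4.5, the search is a simple greedy forward selection, and the toy dataset and the names `one_nn_predict`, `cv_accuracy`, and `forward_select` are hypothetical.

```python
def one_nn_predict(train_X, train_y, x, features):
    """Predict the label of x with 1-nearest-neighbour, using only `features`."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_X, train_y):
        dist = sum((row[f] - x[f]) ** 2 for f in features)
        if dist < best_dist:  # strict '<' keeps the earliest of tied neighbours
            best_dist, best_label = dist, label
    return best_label


def cv_accuracy(X, y, features, k=5):
    """Estimate accuracy of the learner on `features` by k-fold cross-validation."""
    n = len(X)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in range(fold, n, k):  # held-out indices of this fold
            if one_nn_predict(train_X, train_y, X[i], features) == y[i]:
                correct += 1
    return correct / n


def forward_select(X, y, k=5):
    """Greedy forward search: keep adding the feature that most improves CV accuracy."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(cv_accuracy(X, y, selected + [f], k), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best_score:  # stop when no candidate helps
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Toy data (an assumption): feature 0 determines the label, features 1 and 2 do not.
X = [[i, (7 * i) % 11, (3 * i) % 5] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
selected, score = forward_select(X, y)
print(selected, score)
```

Because the learner is only called through `cv_accuracy`, any other induction algorithm could be substituted for the nearest-neighbour stand-in without changing the search loop.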
Locally weighted learning
 ARTIFICIAL INTELLIGENCE REVIEW
, 1997
Cited by 448
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
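A rough sketch of locally weighted linear regression for a single input variable may help make the idea concrete. It assumes a Gaussian weighting function and a fixed smoothing parameter `bandwidth`; both are illustrative choices, and the function name `lwr_predict` is hypothetical rather than taken from the survey.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Fit a weighted least-squares line around `query` and evaluate it there."""
    # Gaussian weighting function: points near the query count more.
    w = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means of inputs and outputs.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Weighted variance and covariance give the local slope.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (yi - my) for wi, x, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)


# On exactly linear data the local fit reproduces the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
pred = lwr_predict(xs, ys, 2.5)
print(pred)
```

No model is built ahead of time: each query triggers its own small fit, which is what makes the method "lazy". A smaller `bandwidth` makes the fit more local and more sensitive to noise.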
Analysis of variance for gene expression microarray data
 Journal of Computational Biology
, 2000
Cited by 210
Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for large-scale analysis of gene expression. Microarrays can be used to measure the relative quantities of specific mRNAs in two or more tissue samples for thousands of genes simultaneously. While the power of this technology has been recognized, many open questions remain about appropriate analysis of microarray data. One question is how to make valid estimates of the relative expression for genes that are not biased by ancillary sources of variation. Recognizing that there is inherent “noise” in microarray data, how does one estimate the error variation associated with an estimated change in expression, i.e., how does one construct the error bars? We demonstrate that ANOVA methods can be used to normalize microarray data and provide estimates of changes in gene expression that are corrected for potential confounding effects. This approach establishes a framework for the general analysis and interpretation of microarray data. Key words: Gene expression microarray, differential expression, analysis of variance, bootstrap.
Bayesian Model Averaging for Linear Regression Models
 Journal of the American Statistical Association
, 1997
Cited by 184
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of
A New Approach to Variable Selection in Least Squares Problems
, 1999
Cited by 164
The title Lasso has been suggested by Tibshirani [7] as a colourful name for a technique of variable selection which requires the minimization of a sum of squares subject to an l1 bound κ on the solution. This forces zero components in the minimizing solution for small values of κ. Thus this bound can function as a selection parameter. This paper makes two contributions to computational problems associated with implementing the Lasso: (1) a compact descent method for solving the constrained problem for a particular value of κ is formulated, and (2) a homotopy method, in which the constraint bound κ becomes the homotopy parameter, is developed to completely describe the possible selection regimes. Both algorithms have a finite termination property.
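As a rough statement of the problem this abstract describes, the constrained least-squares form can be written as below. The symbols y (response vector), X (design matrix), β (coefficient vector), p (number of variables), and the factor 1/2 are notation assumed here for illustration; κ denotes the l1 bound on the solution.

```latex
\min_{\beta \in \mathbb{R}^{p}} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^{2}
\quad \text{subject to} \quad
\lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert \le \kappa
```

When κ is at least the l1 norm of an unconstrained least-squares solution, the constraint is inactive; shrinking κ toward zero forces more components of β to be exactly zero, which is why the bound acts as a selection parameter.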
Performance persistence
 Journal of Finance
, 1995
Cited by 156
Most optimization-based decision support systems are used repeatedly with only modest changes to input data from scenario to scenario. Unfortunately, optimization (mathematical programming) has a well-deserved reputation for amplifying small input changes into drastically different solutions. A previously optimal solution, or a slight variation of one, may still be nearly optimal in a new scenario and managerially preferable to a dramatically different solution that is mathematically optimal. Mathematical programming models can be stated and solved so that they exhibit varying degrees of persistence with respect to previous values of variables, constraints, or even exogenous considerations. We use case studies to highlight how modeling with persistence has improved managerial acceptance and describe how to incorporate persistence as an intrinsic feature of any optimization model. "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself; therefore, all progress depends on the unreasonable man." Optimization-based decision support systems, that is, decision support systems built around one or more mathematical programming models, are predominantly employed as follows: A model is used to produce a plan, the plan is pub...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
Cited by 146
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction. 1. Introduction. Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Nonlinear Black-Box Modeling in System Identification: a Unified Overview
 Automatica
, 1995
Cited by 136
A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, as well as wavelet transform based methods and models based on fuzzy sets and fuzzy rules. This paper describes all these approaches in a common framework, from a user's perspective. It focuses on what are the common features in the different approaches, the choices that have to be made and what considerations are relevant for a successful system identification application of these techniques. It is pointed out that the nonlinear structures can be seen as a concatenation of a mapping from observed data to a regression vector and a nonlinear mapping from the regressor space to the output space. These mappings are discussed separately. The latter mapping is usually formed as a basis function e...
Preliminary Guidelines for Empirical Research in Software Engineering
 IEEE Transactions on Software Engineering
, 2002
Cited by 129
We propose a preliminary set of research guidelines aimed at stimulating discussion among software researchers. They are based on a review of research guidelines developed for medical researchers and on our own experience in doing and reviewing software engineering research. The guidelines are intended to assist researchers, reviewers, and meta-analysts in designing, conducting, and evaluating empirical studies. Editorial boards of software engineering journals may wish to use our recommendations as a basis for developing guidelines for reviewers and for framing policies for dealing with the design, data collection, and analysis and reporting of empirical studies. Index Terms: Empirical software research, research guidelines, statistical mistakes.