Results 1–10 of 32
Wrappers for feature subset selection
ARTIFICIAL INTELLIGENCE, 1997
Abstract

Cited by 1023 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
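The wrapper idea the abstract describes (score each candidate feature subset by running the learning algorithm itself under cross-validation, then search over subsets) can be sketched as a greedy forward search. This is a minimal illustration, not the paper's implementation: the nearest-centroid classifier stands in for the induction algorithm, and all function names are hypothetical.

```python
import numpy as np

def cv_accuracy(X, y, feature_idx, k=5, seed=0):
    """k-fold CV accuracy of a nearest-class-centroid classifier
    (a stand-in for the real induction algorithm) restricted to
    the given feature subset."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr, Xte = X[train][:, feature_idx], X[test][:, feature_idx]
        centroids = {c: Xtr[y[train] == c].mean(axis=0) for c in np.unique(y)}
        preds = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                 for x in Xte]
        accs.append(np.mean(np.asarray(preds) == y[test]))
    return float(np.mean(accs))

def wrapper_forward_select(X, y):
    """Greedy forward wrapper search: repeatedly add the feature that
    most improves cross-validated accuracy; stop when nothing helps."""
    selected, best = [], -1.0
    while len(selected) < X.shape[1]:
        gains = {f: cv_accuracy(X, y, selected + [f])
                 for f in range(X.shape[1]) if f not in selected}
        f, score = max(gains.items(), key=lambda kv: kv[1])
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best
```

Because the learner itself is the evaluation function, the selected subset is tailored to that learner, which is the point of the wrapper approach over filters.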
Irrelevant Features and the Subset Selection Problem
MACHINE LEARNING: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL, 1994
Abstract

Cited by 594 (23 self)
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using cross-validation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
Correlation-based feature selection for machine learning
1998
Abstract

Cited by 139 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
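The "highly correlated with the class, yet uncorrelated with each other" hypothesis is usually operationalized as the merit score Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean pairwise feature-feature correlation. A minimal sketch, assuming plain absolute Pearson correlation on numeric data (Hall's thesis actually uses symmetrical uncertainty on discretized features):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS-style merit of a feature subset:
        k * r_cf / sqrt(k + k*(k-1) * r_ff)
    with |Pearson correlation| as the correlation measure
    (an assumption; the thesis uses symmetrical uncertainty)."""
    k = len(subset)
    # Mean feature-class correlation
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return float(r_cf)
    # Mean pairwise feature-feature correlation
    r_ff = np.mean([abs(np.corrcoef(X[:, f], X[:, g])[0, 1])
                    for i, f in enumerate(subset) for g in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))
```

The denominator grows with redundancy (large r_ff), so adding an irrelevant or redundant feature lowers the merit, which is exactly the screening behavior the abstract reports.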
Feature selection for unsupervised learning
Journal of Machine Learning Research, 2004
Abstract

Cited by 95 (4 self)
In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs on the dimensionality biases of these feature criteria, and present a cross-projection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need for addressing these two issues, and the effectiveness of our proposed solutions.
Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures
In Proc. Int’l Conf. Computer Vision & Pattern Recognition, 2004
Abstract

Cited by 57 (8 self)
In this chapter, we describe a system for inferring complex mental states from a video stream of facial expressions and head gestures in real time. The system abstracts video input into three levels, each representing head and facial events at different granularities of spatial and temporal abstraction. We use Dynamic Bayesian Networks to model the unfolding of head and facial displays, and corresponding mental states over time. We evaluate the system's recognition accuracy and real-time performance for 6 classes of complex mental states: agreeing, concentrating, disagreeing, interested, thinking and unsure. Real-time performance, unobtrusiveness and lack of preprocessing make our system suitable for user-independent human-computer interaction.
A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition
IEEE Trans. Acoust., Speech, Signal Processing, 1976
Abstract

Cited by 42 (0 self)
In speech analysis, the voiced-unvoiced decision is usually performed in conjunction with pitch analysis. The linking of the voiced-unvoiced (VUV) decision to pitch analysis not only results in unnecessary complexity, but makes it difficult to classify short speech segments which are less than a few pitch periods in duration. In this paper, we describe a pattern recognition approach for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. In this method, five different measurements are made on the speech segment to be classified. The measured parameters are the zero-crossing rate, the speech energy, the correlation between adjacent speech samples, the first predictor coefficient from a 12-pole linear predictive coding (LPC) analysis, and the energy in the prediction error. The speech segment is assigned to a particular class based on a minimum-distance rule obtained under the assumption that the measured parameters are distributed according to the multidimensional Gaussian probability density function. The means and covariances for the Gaussian distribution are determined from manually classified speech data included in a training set. The method has been found to provide reliable classification with speech segments as short as 10 ms and has been used for both speech analysis-synthesis and recognition applications. A simple nonlinear smoothing algorithm is described to provide a smooth 3-level contour of an utterance for use in speech recognition applications. Quantitative results and several examples illustrating the performance of the method are included in the paper.
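The minimum-distance rule over class-conditional Gaussians can be sketched as follows. The specific distance used here (Mahalanobis distance plus a log-determinant term, the standard Gaussian-likelihood form) is an assumption rather than the paper's exact formulation, all names are illustrative, and the toy 2-D data stands in for the paper's five speech measurements per segment.

```python
import numpy as np

class GaussianMinimumDistanceClassifier:
    """Per class c, fit a multivariate Gaussian (mean m_c, covariance W_c)
    to the training measurement vectors, then assign a segment x to the
    class minimizing
        d_c(x) = (x - m_c)^T W_c^{-1} (x - m_c) + ln|W_c|."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            self.params_[c] = (mean, np.linalg.inv(cov),
                               np.log(np.linalg.det(cov)))
        return self

    def predict(self, X):
        def dist(x, c):
            mean, inv_cov, log_det = self.params_[c]
            d = x - mean
            return d @ inv_cov @ d + log_det
        return np.array([min(self.classes_, key=lambda c: dist(x, c))
                         for x in X])
```

Training amounts to estimating the means and covariances from manually labeled segments, matching the paper's use of a hand-classified training set.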
Object Detection Using Feature Subset Selection
Pattern Recognition, 2004
Abstract

Cited by 29 (5 self)
Past work on object detection has emphasized the issues of feature extraction and classification; however, relatively less attention has been given to the critical issue of feature selection. The main trend in feature extraction has been representing the data in a lower-dimensional space, for example, using Principal Component Analysis (PCA). Without using an effective scheme to select an appropriate set of features in this space, however, these methods rely mostly on powerful classification algorithms to deal with redundant and irrelevant features. In this paper, we argue that feature selection is an important problem in object detection and demonstrate that Genetic Algorithms (GAs) provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates. As a case study, we have considered PCA for feature extraction and Support Vector Machines (SVMs) for classification. The goal is searching the PCA space using GAs to select a subset of eigenvectors encoding important information about the target concept of interest. This is in contrast to traditional methods selecting some percentage of the top eigenvectors to represent the target concept, independently of the classification task. We have tested the proposed framework on two challenging applications: vehicle detection and face detection. Our experimental results illustrate significant performance improvements in both cases.
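The GA search over feature (eigenvector) subsets amounts to evolving bitmasks whose fitness is the downstream classifier's detection accuracy. Below is a toy sketch with a pluggable fitness function; the population size, operators, and all names are illustrative assumptions, not the paper's configuration (which used SVM accuracy as fitness).

```python
import numpy as np

def ga_select(n_features, fitness, pop_size=30, generations=40,
              p_mut=0.05, seed=0):
    """Evolve 0/1 feature masks of length n_features to maximize
    `fitness(mask)`. Tournament selection, one-point crossover,
    bit-flip mutation. pop_size must be even."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Binary tournament selection
        parents = []
        for _ in range(pop_size):
            i, j = rng.integers(0, pop_size, 2)
            parents.append(pop[i] if scores[i] >= scores[j] else pop[j])
        # One-point crossover followed by bit-flip mutation
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.integers(1, n_features)
            for child in (np.concatenate([a[:cut], b[cut:]]),
                          np.concatenate([b[:cut], a[cut:]])):
                flips = rng.random(n_features) < p_mut
                children.append(np.where(flips, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()]
```

In the paper's setting, `fitness` would train and evaluate the detector on the eigenvectors selected by the mask, so the search is driven by the classification task rather than by eigenvalue rank.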
A Comparison of Induction Algorithms for Selective and Non-Selective Bayesian Classifiers
Proceedings of the Twelfth International Conference on Machine Learning, 1995
Abstract

Cited by 24 (5 self)
In this paper we present a novel induction algorithm for Bayesian networks. This selective Bayesian network classifier selects a subset of attributes that maximizes predictive accuracy prior to the network learning phase, thereby learning Bayesian networks with a bias for small, high-predictive-accuracy networks. We compare the performance of this classifier with selective and non-selective naive Bayesian classifiers. We show that the selective Bayesian network classifier performs significantly better than both versions of the naive Bayesian classifier on almost all databases analyzed, and hence is an enhancement of the naive Bayesian classifier. Relative to the non-selective Bayesian network classifier, our selective Bayesian network classifier generates networks that are computationally simpler to evaluate and that display predictive accuracy comparable to that of Bayesian networks which model all features.
Learning Bayesian Networks Using Feature Selection
In D. Fisher & H. Lenz, eds, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, 1995
Abstract

Cited by 19 (2 self)
This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features which maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks which are computationally simpler to evaluate and which display predictive accuracy comparable to that of Bayesian networks which model all attributes.
An Inductive Learning Approach to Prognostic Prediction
Proceedings of the Twelfth International Conference on Machine Learning, 1995
Abstract

Cited by 16 (4 self)
This paper introduces the Recurrence Surface Approximation, an inductive learning method based on linear programming that predicts recurrence times using censored training examples, that is, examples in which the available training output may be only a lower bound on the "right answer." This approach is augmented with a feature selection method that chooses an appropriate feature set within the context of the linear programming generalizer. Computational results in the field of breast cancer prognosis are shown. A straightforward translation of the prediction method to an artificial neural network model is also proposed.