Results 1-10 of 42
Feature selection: Evaluation, application, and small sample performance
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 238 (9 self)
Abstract: A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al., dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in the classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations. Index Terms: Feature selection, curse of dimensionality, genetic algorithm, node pruning, texture models, SAR image classification.
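SFFS alternates inclusion steps with conditional exclusion steps. The following is a minimal sketch of that idea, written under our own assumptions rather than reproducing Pudil et al.'s code. The function name `sffs`, the callable `criterion` (any subset score to maximize), and the hand-made score table are all hypothetical. The table is built so that the best pair {1, 2} does not contain the best single feature 0. This is exactly the case where backtracking helps.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(
    n_features: int,
    criterion: Callable[[Subset], float],
    target_size: int,
) -> Dict[int, Tuple[Subset, float]]:
    """Sequential forward floating selection (sketch).

    Returns the best subset seen for every size from 1 to target_size.
    Assumes 1 <= target_size <= n_features.
    """
    best: Dict[int, Tuple[Subset, float]] = {}
    current: Subset = frozenset()
    while len(current) < target_size:
        # Inclusion: add the feature that gives the highest criterion value.
        current = max(
            (current | {f} for f in range(n_features) if f not in current),
            key=criterion,
        )
        score = criterion(current)
        if len(current) not in best or score > best[len(current)][1]:
            best[len(current)] = (current, score)
        # Conditional exclusion: keep dropping the least useful feature while
        # the smaller subset beats the best one recorded for that size.
        while len(current) > 2:
            reduced = max((current - {f} for f in sorted(current)), key=criterion)
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][1]:
                current = reduced
                best[len(current)] = (current, reduced_score)
            else:
                break
    return best


# Hypothetical, hand-made criterion values: {1, 2} is the best pair even
# though feature 0 is the best single feature.
SCORES = {
    frozenset({0}): 3.0, frozenset({1}): 2.0, frozenset({2}): 2.0,
    frozenset({0, 1}): 4.0, frozenset({0, 2}): 4.0, frozenset({1, 2}): 10.0,
    frozenset({0, 1, 2}): 5.0,
}
result = sffs(3, SCORES.__getitem__, target_size=3)
print(result[2])  # (frozenset({1, 2}), 10.0)
```

Plain forward selection would stop at {0, 1} for size 2 on this table. The exclusion step recovers {1, 2} once the three-feature set has been visited.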
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 121 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
Keywords: classification, tree-structured classifiers, data compaction
1. Introduction
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms
Artificial Intelligence Review, 1997
Cited by 94 (0 self)
Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have been neither categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
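Feature weights enter a k-NN classifier through its distance function. The following is a minimal sketch using a weighted Euclidean distance. The weights here are set by hand; they are not learned by any of the reviewed methods. The function names and the toy data are hypothetical. A weight of 0 removes a feature entirely, so feature selection corresponds to the special case of binary weights.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance in which feature i is scaled by a weight w[i] >= 0."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    weights: Sequence[float],
    k: int = 1,
) -> Hashable:
    """Majority vote among the k stored instances closest to the query."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: weighted_distance(train_x[i], query, weights),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy data (hypothetical): feature 0 carries the class, feature 1 is noise
# on a much larger scale.
X = [(0.0, 10.0), (1.0, 0.0)]
y = ["a", "b"]
query = (0.1, 0.0)
print(knn_predict(X, y, query, weights=(1.0, 1.0)))    # b: the noisy feature dominates
print(knn_predict(X, y, query, weights=(1.0, 0.0)))    # a: binary weights act as feature selection
print(knn_predict(X, y, query, weights=(1.0, 0.001)))  # a: a small continuous weight also works
```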
A Comparative Evaluation of Sequential Feature Selection Algorithms
1994
Cited by 93 (4 self)
Several recent machine learning publications demonstrate the utility of using feature selection algorithms in supervised learning tasks. Among these, sequential feature selection algorithms are receiving attention. The most frequently studied variants of these algorithms are forward and backward sequential selection. Many studies on supervised learning with sequential feature selection report applications of these algorithms, but do not consider variants of them that might be more appropriate for some performance tasks. This paper reports positive empirical results on such variants, and argues for their serious consideration in similar learning tasks.
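Forward and backward sequential selection are both greedy searches over feature subsets. The following is a minimal sketch of the two basic variants, assuming a generic subset criterion to maximize. The function names and the toy criterion are hypothetical. The criterion rewards features 1 and 2 only when they appear together, so the two searches end at different size-2 subsets.

```python
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]
Path = List[Tuple[Subset, float]]


def sequential_forward_selection(
    n_features: int, criterion: Callable[[Subset], float], target_size: int
) -> Path:
    """SFS: start empty, greedily add the feature that most improves the criterion."""
    current: Subset = frozenset()
    path: Path = []
    while len(current) < target_size:
        current = max(
            (current | {f} for f in range(n_features) if f not in current),
            key=criterion,
        )
        path.append((current, criterion(current)))
    return path


def sequential_backward_selection(
    n_features: int, criterion: Callable[[Subset], float], target_size: int
) -> Path:
    """SBS: start with all features, greedily drop the one whose removal hurts least."""
    current: Subset = frozenset(range(n_features))
    path: Path = [(current, criterion(current))]
    while len(current) > target_size:
        current = max((current - {f} for f in sorted(current)), key=criterion)
        path.append((current, criterion(current)))
    return path


RELEVANCE = [3.0, 1.0, 1.0]


def toy_criterion(subset: Subset) -> float:
    # Hypothetical criterion: per-feature relevance plus a bonus when the
    # interacting features 1 and 2 are selected together.
    return sum(RELEVANCE[i] for i in subset) + (5.0 if {1, 2} <= subset else 0.0)


print(sequential_forward_selection(3, toy_criterion, 2)[-1])   # (frozenset({0, 1}), 4.0)
print(sequential_backward_selection(3, toy_criterion, 2)[-1])  # (frozenset({1, 2}), 7.0)
```

Backward selection sees the interaction because it starts from the full set. Forward selection commits early to feature 0 and never evaluates {1, 2}.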
Feature Selection for Case-Based Classification of Cloud Types: An Empirical Comparison
In Proceedings of the AAAI-94 Workshop on Case-Based Reasoning, 1994
Cited by 66 (3 self)
Accurate weather prediction is crucial for many activities, including Naval operations. Researchers within the meteorological division of the Naval Research Laboratory have developed and fielded several expert systems for problems such as fog and turbulence forecasting, and tropical storm movement. They are currently developing an automated system for satellite image interpretation, part of which involves cloud classification. Their cloud classification database contains 204 high-level features, but contains only a few thousand instances. The predictive accuracy of classifiers can be improved on this task by employing a feature selection algorithm. We explain why non-parametric case-based classifiers are excellent choices for use in feature selection algorithms. We then describe a set of such algorithms that use case-based classifiers, empirically compare them, and introduce novel extensions of backward sequential selection that allow it to scale to this task. Several of the approaches...
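The abstract pairs case-based classifiers with feature selection. One plausible reason, sketched here under our own assumptions, is that a 1-nearest-neighbor classifier needs no training. Its leave-one-out accuracy on a candidate subset is therefore a cheap wrapper criterion. The function name and the four-instance dataset are hypothetical. A function like this could serve as the subset criterion inside a backward sequential selection loop.

```python
from typing import AbstractSet, Hashable, List, Sequence


def loo_1nn_accuracy(
    X: List[Sequence[float]],
    y: List[Hashable],
    subset: AbstractSet[int],
) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that only
    looks at the features in `subset` (squared Euclidean distance)."""
    assert len(X) >= 2, "leave-one-out needs at least two instances"
    feats = sorted(subset)
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue  # hold out the instance being classified
            d = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


# Hypothetical data: feature 0 separates the classes, feature 1 misleads.
X = [(0.0, 0.0), (0.1, 1.0), (1.0, 0.1), (1.1, 1.1)]
y = ["a", "a", "b", "b"]
print(loo_1nn_accuracy(X, y, {0}))  # 1.0
print(loo_1nn_accuracy(X, y, {1}))  # 0.0
```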
Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions
In Proceedings of the Twelfth International Conference on Pattern Recognition, IAPR, 1994
Cited by 42 (1 self)
In this paper we return to the sequential selection procedures with backtracking and show that a family of suboptimal search algorithms, which we call the Floating Search methods, are very efficient and effective even on problems of high dimensionality involving nonmonotonic feature selection criterion functions. The Floating Search methods are related to the plus-l-take-away-r algorithm, but in contrast to the latter, the number of forward and backtracking steps is dynamically controlled instead of being fixed beforehand. The purpose of this paper is to present the Floating Search procedures and show that they can cope with nonmonotonic feature set criterion functions. We shall demonstrate that the Floating Search procedures construct in parallel the feature sets of all dimensionalities up to a specified threshold. By means of sequential forward and backward selection these sets are updated whenever the modification results in a better performance. In consequence, the resulting feature sets, as in the case of the (l, r) sequential algorithms, are not necessarily nested. By the same token, the selection process can correct for any effects caused by nonmonotonicity of the FS criterion...
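The abstract contrasts floating search with plus-l-take-away-r, in which the numbers of forward steps (l) and backward steps (r) are fixed in advance. The following is a minimal sketch of that fixed-step baseline. The function name, the defaults l = 2 and r = 1, and the toy criterion are hypothetical. The criterion is deliberately nonmonotonic: adding a feature can lower the score.

```python
from typing import Callable, FrozenSet

Subset = FrozenSet[int]


def plus_l_take_away_r(
    n_features: int,
    criterion: Callable[[Subset], float],
    target_size: int,
    l: int = 2,
    r: int = 1,
) -> Subset:
    """Plus-l-take-away-r search with l and r fixed in advance (l > r)."""
    # target_size <= n_features - r guarantees each round makes net progress.
    assert l > r >= 1 and target_size <= n_features - r
    current: Subset = frozenset()
    while len(current) < target_size:
        for _ in range(l):  # l forward (inclusion) steps
            if len(current) == n_features:
                break
            current = max(
                (current | {f} for f in range(n_features) if f not in current),
                key=criterion,
            )
        for _ in range(r):  # r backward (exclusion) steps
            current = max((current - {f} for f in sorted(current)), key=criterion)
    # With l - r > 1 the last round can overshoot; trim back with exclusion steps.
    while len(current) > target_size:
        current = max((current - {f} for f in sorted(current)), key=criterion)
    return current


RELEVANCE = [3.0, 1.0, 1.0]


def toy_criterion(subset: Subset) -> float:
    # Hypothetical nonmonotonic criterion: relevance, a bonus for the
    # interacting pair {1, 2}, and a size penalty, so J({0}) = 2.5 > J({0, 1}) = 2.0.
    return (sum(RELEVANCE[i] for i in subset)
            + (5.0 if {1, 2} <= subset else 0.0)
            - 0.5 * len(subset) ** 2)


print(plus_l_take_away_r(3, toy_criterion, target_size=2))  # frozenset({1, 2})
```

Here the fixed schedule happens to backtrack at the right moment. Floating search instead decides after each step whether to keep backtracking.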
Decision-level Fusion in Fingerprint Verification
Pattern Recognition, 2001
Cited by 35 (8 self)
A scheme is proposed for classifier combination at decision level which stresses the importance of classifier selection during combination. The proposed scheme is optimal (in the Neyman-Pearson sense) when sufficient data are available to obtain reasonable estimates of the joint densities of classifier outputs. Four different fingerprint matching algorithms are combined using the proposed scheme to improve the accuracy of a fingerprint verification system. Experiments conducted on a large fingerprint database (~2,700 fingerprints) confirm the effectiveness of the proposed integration scheme. An overall matching performance increase of 3% is achieved. We further show that a combination of multiple impressions or multiple fingers improves the verification performance by more than 4% and 5%, respectively. Analysis of the results provides some insight into the various decision-level classifier combination strategies.
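At decision level, a Neyman-Pearson rule accepts joint decision vectors in order of their likelihood ratio until a false-accept budget is spent. The following is a minimal sketch of that idea, not the paper's scheme. It assumes binary accept/reject outputs from each matcher and joint probabilities estimated by relative frequencies. It also ignores both classifier selection and the randomized boundary case. All names and counts are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Decisions = Tuple[int, ...]  # one accept (1) / reject (0) bit per matcher


def np_fusion_rule(
    genuine: Sequence[Decisions],
    impostor: Sequence[Decisions],
    alpha: float,
) -> Set[Decisions]:
    """Choose which joint decision vectors to accept.

    Vectors are accepted in decreasing order of the estimated likelihood
    ratio P(v | genuine) / P(v | impostor) while the estimated false-accept
    rate stays <= alpha.
    """
    gen_counts, imp_counts = Counter(genuine), Counter(impostor)
    n_gen, n_imp = len(genuine), len(impostor)

    def likelihood_ratio(v: Decisions) -> float:
        p_gen = gen_counts[v] / n_gen
        p_imp = imp_counts[v] / n_imp
        return float("inf") if p_imp == 0 else p_gen / p_imp

    accepted: Set[Decisions] = set()
    false_accept = 0.0
    candidates = set(gen_counts) | set(imp_counts)
    for v in sorted(candidates, key=likelihood_ratio, reverse=True):
        p_imp = imp_counts[v] / n_imp
        if false_accept + p_imp > alpha:
            break
        accepted.add(v)
        false_accept += p_imp
    return accepted


# Hypothetical training decisions from two matchers.
genuine = [(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 1)] + [(0, 0)]
impostor = [(0, 0)] * 7 + [(1, 0)] + [(0, 1)] * 2
print(np_fusion_rule(genuine, impostor, alpha=0.0))  # {(1, 1)}, i.e. an AND rule
print(np_fusion_rule(genuine, impostor, alpha=0.1))  # accepts (1, 1) and (1, 0)
```

With these counts, tightening or loosening alpha moves the fused rule between "both matchers agree" and "trust matcher 1". This is one way a data-driven rule can differ from fixed AND/OR combination.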
Machine Recognition of Timbre Using Steady-State Tone of Acoustic Musical Instruments
1998
Cited by 25 (2 self)
Introduction
A timbre recognition experiment to classify 39 different orchestral instrument timbres was conducted using an exemplar-based learning system. The data consisted of the steady-state spectrum of each of the instruments played at different pitches (Sandell 1994). It has been shown that the attack portion of a musical instrument is important for identification tasks. Yet other studies show that the steady-state portion is also significant (Grey 1978; Kendall and Carterette 1986). In addition to the spectral data, the moments of the spectrum, including the centroid, were considered as potential features for the identification process. The implementation of the identification task is based on a combination of a k-nearest neighbor (k-NN) classifier and a genetic algorithm, which is used for feature selection and feature weighting. This paradigm, also known as the exemplar-based learning model (Aha 1997), is attractive because training is not necessary, learning is extremely...
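The spectral moments mentioned above are simple to compute from a steady-state spectrum. The following is a minimal sketch. It assumes amplitudes (not power) as the weights and a purely harmonic spectrum; both are our assumptions. The names and the example tone are hypothetical. The centroid is the amplitude-weighted mean frequency. The spread is the square root of the second central moment.

```python
import math
from typing import Sequence, Tuple


def spectral_moments(freqs: Sequence[float], amps: Sequence[float]) -> Tuple[float, float]:
    """Return (centroid, spread) of a magnitude spectrum.

    centroid = sum(f * a) / sum(a)            (first moment)
    spread   = sqrt(sum(a * (f - c)^2) / sum(a))  (second central moment)
    """
    total = sum(amps)
    if total <= 0:
        raise ValueError("spectrum must have positive total amplitude")
    centroid = sum(f * a for f, a in zip(freqs, amps)) / total
    variance = sum(a * (f - centroid) ** 2 for f, a in zip(freqs, amps)) / total
    return centroid, math.sqrt(variance)


# Hypothetical steady-state tone: fundamental 100 Hz, three harmonics whose
# amplitudes halve at each step.
f0 = 100.0
harmonic_freqs = [f0 * k for k in (1, 2, 3)]
harmonic_amps = [1.0, 0.5, 0.25]
centroid, spread = spectral_moments(harmonic_freqs, harmonic_amps)
print(round(centroid, 2), round(spread, 2))
```

Values like these would be appended to the spectral data as extra candidate features for the k-NN classifier.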
Parcel: Feature Subset Selection in Variable Cost Domains
1998
Cited by 20 (1 self)
The vast majority of classification systems are designed with a single set of features, and optimised to a single specified cost. However, in examples such as medical and financial risk modelling, costs are known to vary subsequent to system design. In this paper, we present a design method for feature selection in the presence of varying costs. Starting from the Wilcoxon nonparametric statistic for the performance of a classification system, we introduce a concept called the maximum realisable receiver operating characteristic (MRROC), and prove a related theorem. A novel criterion for feature selection, based on the area under the MRROC curve, is then introduced. This leads to a framework which we call Parcel. This has the flexibility to use different combinations of features at different operating points on the resulting MRROC curve. Empirical support for each stage in our approach is provided by experiments on real world problems, with Parcel achieving superior results.
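The abstract builds on the Wilcoxon statistic and then scores feature sets by the area under the MRROC curve. The following is a minimal sketch of both quantities. It assumes, as our reading rather than the paper's code, that the MRROC of a collection of operating points is their upper convex hull. Points on a hull segment are realisable by randomly switching between the two classifiers at its ends. The function names and the example scores and points are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def wilcoxon_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hull_auc(points: Iterable[Point]) -> float:
    """Area under the upper convex hull of ROC operating points,
    anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex while it lies on or below the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    # Trapezoid rule over consecutive hull vertices.
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(hull, hull[1:]))


print(wilcoxon_auc([0.9, 0.8, 0.4], [0.5, 0.4, 0.1]))  # 7.5 of 9 pairs, about 0.833
# Operating points from two hypothetical feature sets; (0.3, 0.6) is dominated.
print(hull_auc([(0.1, 0.6), (0.3, 0.6), (0.4, 0.9)]))  # about 0.825
```

Under this reading, a feature set only needs to contribute a hull vertex somewhere to be useful. This matches the abstract's point that different feature combinations can serve different operating points.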
Minimum Probability of Error Image Retrieval
IEEE Transactions on Signal Processing
Cited by 19 (13 self)
Abstract: We address the design of optimal architectures for image retrieval from large databases. Minimum probability of error (MPE) is adopted as the optimality criterion and retrieval is formulated as a problem of statistical classification. The probability of retrieval error is lower- and upper-bounded by functions of the Bayes and density estimation errors, and the impact of the components of the retrieval architecture (namely, the feature transformation and density estimation) on these bounds is characterized. This characterization suggests interpreting the search for the MPE feature set as the search for the minimum of the convex hull of a collection of curves of probability of error versus feature space dimension. A new algorithm for MPE feature design, based on a dictionary of empirical feature sets and the wrapper model for feature selection, is proposed. It is shown that, unlike traditional feature selection techniques, this algorithm scales to problems containing large numbers of classes. Experimental evaluation reveals that the MPE architecture is at least as good as popular empirical solutions on the narrow domains where these perform best but significantly outperforms them outside these domains. Index Terms: Bayesian methods, color and texture, expectation–maximization, feature selection, image retrieval, image similarity, minimum probability of error, mixture models, multiresolution, optimal retrieval systems, wrapper methods.
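Treating retrieval as classification means ranking database image classes by the posterior probability of the query's feature vectors. Taking the top-ranked class is then the Bayes (minimum probability of error) decision. The following is a minimal sketch under loudly simplified assumptions. The index terms point to mixture models fitted by expectation-maximization; this sketch instead uses a single diagonal Gaussian per class and treats query vectors as independent. The class names and parameters are hypothetical.

```python
import math
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (mean, variance) per dimension


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def mpe_rank(
    query_vectors: Sequence[Sequence[float]],
    classes: Mapping[str, Gaussian],
    priors: Optional[Mapping[str, float]] = None,
) -> List[str]:
    """Rank classes by log posterior (up to a constant) of the query vectors."""
    scores: Dict[str, float] = {}
    for name, (mean, var) in classes.items():
        prior = priors[name] if priors else 1.0 / len(classes)
        log_lik = sum(diag_gaussian_logpdf(x, mean, var) for x in query_vectors)
        scores[name] = math.log(prior) + log_lik
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-dimensional color features for three image classes.
classes = {
    "sky": ((0.2, 0.8), (0.01, 0.01)),
    "grass": ((0.3, 0.2), (0.02, 0.02)),
    "sand": ((0.7, 0.6), (0.01, 0.01)),
}
query = [(0.25, 0.75), (0.2, 0.85)]
print(mpe_rank(query, classes))  # ['sky', 'grass', 'sand']
```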

