Wrappers for feature subset selection
- ARTIFICIAL INTELLIGENCE, 1997
Cited by 775 (3 self)
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Irrelevant Features and the Subset Selection Problem
- MACHINE LEARNING: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL, 1994
Cited by 515 (22 self)
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using cross-validation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
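The cross-validation-based selection described in this abstract can be illustrated with a minimal sketch. It assumes a greedy forward search (the abstract does not specify the search strategy) and uses a hypothetical 1-nearest-neighbour learner, `one_nn_predict`, as a stand-in for ID3 or C4.5; all function names are illustrative and not taken from the paper.

```python
def one_nn_predict(train, x, features):
    """Hypothetical stand-in learner: 1-nearest-neighbour restricted to `features`."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)
    nearest = min(train, key=lambda inst: dist(inst[0], x))
    return nearest[1]


def cv_accuracy(data, features, predict, k=5):
    """Estimate accuracy by k-fold cross-validation using only `features`."""
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [inst for i, inst in enumerate(data) if i % k != fold]
        correct += sum(predict(train, x, features) == y for x, y in test)
    return correct / len(data)


def forward_select(data, n_features, predict=one_nn_predict, k=5):
    """Greedy forward search (an assumption): repeatedly add the single
    feature that most improves the cross-validated accuracy estimate."""
    selected = []
    best = cv_accuracy(data, selected, predict, k)
    improved = True
    while improved:
        improved = False
        candidate = None
        for f in range(n_features):
            if f in selected:
                continue
            score = cv_accuracy(data, selected + [f], predict, k)
            if score > best:
                best, candidate, improved = score, f, True
        if improved:
            selected.append(candidate)
    return selected, best
```

Because the learner is passed in as a function, the same search can wrap any induction algorithm, which is the property the abstract emphasizes. The selected subset therefore depends on the learner as well as on the data.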
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery, 1997
Cited by 122 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
Keywords: classification, tree-structured classifiers, data compaction
1. Introduction
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Wrappers For Performance Enhancement And Oblivious Decision Graphs
- 1995
Cited by 94 (6 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Useful Feature Subsets and Rough Set Reducts
- 1994
Cited by 23 (4 self)
In supervised classification learning, one attempts to induce a classifier that correctly predicts the label of novel instances. We demonstrate that by choosing a useful subset of features for the indiscernibility relation, an induction algorithm based on a simple decision table can have high prediction accuracy on artificial and real-world datasets. We show that useful feature subsets are not necessarily maximal independent sets (relative reducts) with respect to the label, and that, in practical situations, using a subset of the relative core features may lead to superior performance.
1 Introduction
In supervised classification learning, one is given a training set containing labelled instances (examples). Each labelled instance contains a list of feature values (attribute values) and a discrete label value. The induction task is to build a classifier that will correctly predict the label of novel instances. Common classifiers are decision trees, neural networks, and nearest-neighbor...
New techniques for extracting features from protein sequences
- IBM Systems Journal, 2001
Cited by 12 (2 self)
In this paper we propose new techniques to extract features from protein sequences. We then use the features as inputs for a Bayesian neural network (BNN) and apply the BNN to classifying protein sequences obtained from the PIR protein database maintained at the National Biomedical Research Foundation. To evaluate the performance of the proposed approach, we compare it with other protein classifiers built based on sequence alignment and machine learning methods. Experimental results show the high precision of the proposed classifier and the complementarity of the bioinformatics tools studied in the paper.
Cryptographic Distinguishability Measures for Quantum-Mechanical States
- IEEE Trans. Inform. Theory 45. No, 1999
Cited by 9 (0 self)
This paper, mostly expository in nature, surveys four measures of distinguishability for quantum-mechanical states. This is done from the point of view of the cryptographer with a particular eye on applications in quantum cryptography. Each of the measures considered is rooted in an analogous classical measure of distinguishability for probability distributions: namely, the probability of an identification error, the Kolmogorov distance, the Bhattacharyya coefficient, and the Shannon distinguishability (as defined through mutual information). These measures have a long history of use in statistical pattern recognition and classical cryptography. We obtain several inequalities that relate the quantum distinguishability measures to each other, one of which may be crucial for proving the security of quantum cryptographic key distribution. In another vein, these measures and their connecting inequalities are used to define a single notion of cryptographic exponential indistinguishability for two families of quantum states. This is a tool that may prove useful in the analysis of various quantum cryptographic protocols.
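The four classical measures named in this abstract can be sketched for two discrete probability distributions as follows. This is a minimal illustration, not the paper's own formulation: it assumes equal prior probabilities for the two distributions and a factor-of-one-half normalization for the Kolmogorov distance (conventions vary), and all function names are hypothetical.

```python
import math


def kolmogorov_distance(p, q):
    """Half the L1 distance between p and q (normalization is an assumption)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def bhattacharyya_coefficient(p, q):
    """Overlap measure: 1 for identical distributions, 0 for disjoint supports."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def error_probability(p, q):
    """Minimum probability of misidentifying the source, assuming equal priors."""
    return 0.5 * sum(min(a, b) for a, b in zip(p, q))


def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(a * math.log2(a) for a in p if a > 0)


def shannon_distinguishability(p, q):
    """Mutual information between the source label and the observed outcome,
    assuming each source is chosen with probability 1/2."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mix) - 0.5 * entropy(p) - 0.5 * entropy(q)
```

Under these assumptions the error probability and the Kolmogorov distance are related by `error_probability(p, q) == 0.5 - 0.5 * kolmogorov_distance(p, q)`, up to floating-point rounding.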
Genetic algorithms for classification and feature extraction
- Annual Meeting, Classification Society of North America, 1995
Cited by 6 (3 self)
In this paper we summarize our research on classification and feature extraction for high-dimensionality patterns using genetic algorithms. We have developed two GA-based approaches, both utilizing a feedback linkage between feature evaluation and classification. That is, we carry out feature extraction (with dimensionality reduction) and classifier design simultaneously, through “genetic learning and evolution.” These approaches combine a GA with two different approaches: the K-Nearest-Neighbor decision rule and a production decision rule. We show the effectiveness of these approaches on a series of artificial test data and on real-world biological data.
Optimized feature extraction for learning-based image steganalysis
- IEEE Trans. Inform. Forensics and Security, 2007
Cited by 4 (1 self)
The purpose of image steganalysis is to detect the presence of hidden messages in cover photographic images. Supervised learning is an effective and universal approach to cope with the twin difficulties of unknown image statistics and unknown steganographic codes. A crucial part of the learning process is the selection of low-dimensional informative features. We investigate this problem from three angles and propose a three-level optimization of the classifier. First, we select a subband image representation that provides better discrimination ability than a conventional wavelet transform. Second, we analyze two types of features, empirical moments of probability density functions (PDFs) and empirical moments of characteristic functions of the PDFs, and compare their merits. Third, we address the problem of feature dimensionality reduction, which strongly impacts classification accuracy. Experiments show that our method outperforms previous steganalysis methods. For instance, when the probability of false alarm is fixed at 1%, the stegoimage detection probability of our algorithm exceeds that of its closest competitor by at least 15% and up to 50%.

