Results 1–10 of 138
Toward integrating feature selection algorithms for classification and clustering
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 127 (15 self)
This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta-algorithm that can take advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing details of each algorithm. Some real-world applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges of feature selection research and development.
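The survey's framework separates filter-style evaluation criteria (score each feature independently of any learner) from wrapper-style ones. As a minimal, hypothetical sketch of the filter side, not any algorithm from the survey itself, one could rank features by a toy class-separation score (the scoring function and data below are invented for illustration):

```python
def filter_score(col, labels):
    """Toy filter criterion: gap between the class-conditional means."""
    m0 = sum(v for v, y in zip(col, labels) if y == 0) / labels.count(0)
    m1 = sum(v for v, y in zip(col, labels) if y == 1) / labels.count(1)
    return abs(m1 - m0)

def select_top_k(X, y, k):
    """Filter-style selection: score each feature on its own, keep the k best."""
    scores = [(filter_score([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

X = [[0.1, 0.9], [0.2, 0.1], [0.9, 0.5], [1.0, 0.4]]  # feature 1 is noise
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))  # → [0]
```

A wrapper criterion would instead train a classifier on each candidate subset, which is costlier but tied to the actual learner.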
A survey of evolutionary algorithms for data mining and knowledge discovery
In: A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation, 2002
Cited by 84 (3 self)
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators, and fitness functions have to be adapted for extracting high-level knowledge from data.
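A genetic algorithm for attribute selection of the kind this chapter surveys can be sketched in a few lines: individuals are feature bit-masks, fitness is a wrapped classifier's accuracy minus a size penalty. The GA below (elitist selection, one-point crossover, bit-flip mutation) and its 1-NN fitness are an illustrative toy, not the chapter's specific algorithms:

```python
import random

random.seed(0)

def fitness(mask, X, y):
    """Toy fitness: leave-one-out 1-NN accuracy on the selected features,
    minus a small penalty per feature to favour compact subsets."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        _, nearest = min((sum((row[j] - other[j]) ** 2 for j in idx), k)
                         for k, other in enumerate(X) if k != i)
        correct += (y[nearest] == y[i])
    return correct / len(X) - 0.01 * len(idx)

def ga_select(X, y, pop_size=10, generations=20):
    """Elitist GA over bit-mask individuals (1 = feature kept)."""
    n = len(X[0])
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                 # bit-flip mutation
                flip = random.randrange(n)
                child[flip] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, X, y))

# Feature 0 separates the classes; features 1-3 are noise.
X = [[0.0, 0.7, 0.3, 0.9], [0.1, 0.2, 0.8, 0.1], [0.2, 0.9, 0.1, 0.5],
     [0.8, 0.1, 0.9, 0.2], [0.9, 0.8, 0.2, 0.8], [1.0, 0.3, 0.7, 0.4]]
y = [0, 0, 0, 1, 1, 1]
best = ga_select(X, y)
```

Keeping the top half of each generation untouched (elitism) guarantees the best mask found so far is never lost.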
Variable Selection Using SVM-based Criteria
2003
Cited by 78 (3 self)
We propose new methods to evaluate variable subset relevance with a view to variable selection.
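One family of SVM-based relevance criteria ranks variables by the magnitude of a trained linear classifier's weights. The sketch below illustrates that idea only; it substitutes a simple perceptron for an SVM solver to stay dependency-free, so it is not the paper's actual criteria:

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Train a linear separator; stands in for an SVM solver in this sketch."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            target = 1 if label == 1 else -1
            pred = 1 if sum(wj * xj for wj, xj in zip(w, row)) + b > 0 else -1
            if pred != target:  # mistake-driven update
                w = [wj + lr * target * xj for wj, xj in zip(w, row)]
                b += lr * target
    return w

def rank_variables(X, y):
    """Rank variables by |w_j|: larger weight magnitude = more relevant."""
    w = train_perceptron(X, y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

X = [[0.0, 0.5], [0.2, 0.9], [0.8, 0.6], [1.0, 0.1]]  # feature 1 is noise
y = [0, 0, 1, 1]
print(rank_variables(X, y))  # → [0, 1]
```

Dropping the lowest-ranked variable and retraining gives the recursive-elimination variant of this scheme.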
Hierarchical Text Categorization Using Neural Networks
Information Retrieval, 2002
Cited by 73 (0 self)
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure, and the OHSUMED test set of MEDLINE records. Comparisons with an optimized version of the traditional Rocchio's algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves a performance comparable with that of the hierarchical neural networks.
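The divide-and-conquer routing described above can be sketched with a two-level toy: a top-level classifier picks a branch of the hierarchy, then a per-branch expert picks the leaf category. The keyword tables and categories below are invented for illustration and stand in for the paper's neural networks:

```python
def make_keyword_expert(table):
    """Build a trivial classifier that scores labels by keyword overlap."""
    def expert(text):
        words = set(text.lower().split())
        scores = {label: len(words & kws) for label, kws in table.items()}
        return max(scores, key=scores.get)
    return expert

# Level 1: route a document to a coarse branch of the hierarchy.
router = make_keyword_expert({
    "medicine":  {"patient", "dose", "therapy"},
    "computing": {"algorithm", "network", "data"},
})

# Level 2: one specialized expert per branch decides the final category.
experts = {
    "medicine":  make_keyword_expert({"cardiology": {"heart"},
                                      "oncology":   {"tumor"}}),
    "computing": make_keyword_expert({"ml": {"learning"},
                                      "db": {"query"}}),
}

def classify(text):
    branch = router(text)
    return branch, experts[branch](text)

print(classify("patient heart therapy"))  # → ('medicine', 'cardiology')
```

Each expert only ever sees documents routed to its branch, which is what shrinks the per-classifier problem.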
Simultaneous feature selection and clustering using mixture models
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 73 (1 self)
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
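The intuition behind feature saliency, that a feature is relevant to the extent the cluster structure explains its variance, can be illustrated without the paper's EM machinery. The sketch below (invented for illustration, not the paper's estimator) clusters with plain 2-means, then scores each feature by the fraction of its variance that lies between clusters; a noise feature scores near zero:

```python
def sqdist(a, b):
    return sum((x - m) ** 2 for x, m in zip(a, b))

def kmeans2(X, iters=10):
    """Plain 2-means with deterministic init (first and last point)."""
    centroids = [list(X[0]), list(X[-1])]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [0 if sqdist(row, centroids[0]) <= sqdist(row, centroids[1])
                  else 1 for row in X]
        for c in (0, 1):
            members = [row for row, a in zip(X, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

def saliency(X, assign):
    """Per-feature score: 1 - within-cluster variance / total variance."""
    n = len(X)
    out = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for c in set(assign):
            vals = [v for v, a in zip(col, assign) if a == c]
            m = sum(vals) / len(vals)
            within += sum((v - m) ** 2 for v in vals)
        out.append(1 - within / total if total else 0.0)
    return out

# Feature 0 is bimodal (two real clusters); feature 1 is uniform noise.
X = [[0.0, 0.5], [0.1, 0.1], [0.2, 0.9],
     [0.9, 0.8], [1.0, 0.2], [1.1, 0.6]]
assign = kmeans2(X)
print(saliency(X, assign))  # feature 0 near 1, feature 1 near 0
```

The paper's EM algorithm estimates such saliencies jointly with the mixture parameters rather than after the fact as done here.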
Lightweight Agents for Intrusion Detection
Journal of Systems and Software, 2003
Cited by 62 (7 self)
This paper focuses on intrusion detection and countermeasures with respect to widely used operating systems and networks. The design and architecture of an intrusion detection system built from distributed agents is proposed to implement an intelligent system on which data mining can be performed to provide global, temporal views of an entire networked system. A starting point for agent intelligence in our system is the research into the use of machine learning over system call traces from the privileged sendmail program on UNIX. We use a rule learning algorithm to classify the system call traces for intrusion detection purposes and show the results.
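Classifying system-call traces is commonly done over short subsequences (n-grams) of calls. The sketch below is a simple n-gram anomaly profile in that spirit, not the paper's rule learner: collect every n-gram seen in normal traces, then flag traces whose n-grams fall outside the profile (all trace data is invented):

```python
def ngrams(trace, n=3):
    """All length-n call subsequences in a trace, as a set."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def train_normal_profile(normal_traces, n=3):
    """Collect every n-gram observed in normal system-call traces."""
    profile = set()
    for t in normal_traces:
        profile |= ngrams(t, n)
    return profile

def anomaly_score(trace, profile, n=3):
    """Fraction of the trace's n-grams absent from the normal profile."""
    grams = ngrams(trace, n)
    return len(grams - profile) / len(grams) if grams else 0.0

normal = [["open", "read", "write", "close"],
          ["open", "read", "read", "close"]]
profile = train_normal_profile(normal, n=2)
print(anomaly_score(["open", "mmap", "exec", "close"], profile, n=2))
```

A rule learner as used in the paper would instead induce explicit if-then rules over such trace features.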
Feature Selection in Unsupervised Learning via Evolutionary Search
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 54 (3 self)
Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and, possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.
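The Pareto-front bookkeeping that multi-objective approaches like ELSA rely on is compact enough to sketch directly. Below, each candidate solution is summarized by two objectives to maximize, say (cluster quality, -number of features); the objective values are invented for illustration:

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (all objectives are maximized here)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only the non-dominated solutions."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(o, s)
                       for k, o in enumerate(solutions) if k != i)]

# (cluster_quality, -n_features): quality up, feature count down.
candidates = [(0.9, -5), (0.8, -2), (0.85, -6), (0.7, -2), (0.9, -4)]
print(pareto_front(candidates))
```

The surviving set shows the trade-off curve a user picks from: here a cheap 2-feature solution and a stronger 4-feature one both survive, while dominated candidates drop out.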
Constructive Neural Network Learning Algorithms for Pattern Classification
2000
Cited by 45 (14 self)
Constructive learning algorithms offer an attractive approach for the incremental construction of near-minimal neural-network architectures for pattern classification. They help overcome the need for ad hoc and often inappropriate choices of network topology in algorithms that search for suitable weights in a priori fixed network architectures. Several such algorithms are proposed in the literature and shown to converge to zero classification errors (under certain assumptions) on tasks that involve learning a binary-to-binary mapping (i.e., classification problems involving binary-valued input attributes and two output categories). We present two constructive learning algorithms, MPyramid-real and MTiling-real, that extend the pyramid and tiling algorithms, respectively, for learning real-to-M-ary mappings (i.e., classification problems involving real-valued input attributes and multiple output classes). We prove the convergence of these algorithms and empirically demonstrate their applicability to practical pattern classification problems. Additionally, we show how the incorporation of a local pruning step can eliminate several redundant neurons from MTiling-real networks.
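The constructive idea, grow the network one unit at a time until the training set is handled, can be caricatured without implementing pyramid or tiling networks. In the toy below (entirely illustrative, not MPyramid-real or MTiling-real), each hidden "unit" is an open interval on one feature that contains no negative example, and the output neuron is an OR over the units; units are added until every positive example is covered:

```python
def unit_fires(unit, x):
    j, lo, hi = unit
    return lo < x[j] < hi

def grow_network(X, y):
    """Constructive loop: add one interval unit at a time until every
    positive example is covered; each unit excludes all negatives."""
    units = []
    uncovered = [i for i, t in enumerate(y) if t == 1]
    while uncovered:
        seed = X[uncovered[0]]
        best_unit, best_cover = None, []
        for j in range(len(seed)):
            v = seed[j]
            neg = [X[k][j] for k, t in enumerate(y) if t == 0]
            if v in neg:                     # cannot separate on this feature
                continue
            lo = max([n for n in neg if n < v], default=float("-inf"))
            hi = min([n for n in neg if n > v], default=float("inf"))
            unit = (j, lo, hi)
            cover = [i for i in uncovered if unit_fires(unit, X[i])]
            if len(cover) > len(best_cover):
                best_unit, best_cover = unit, cover
        if best_unit is None:                # seed is not separable; give up on it
            uncovered.pop(0)
            continue
        units.append(best_unit)
        uncovered = [i for i in uncovered if i not in best_cover]
    return units

def predict(units, x):
    """Output neuron: fire if any hidden unit fires (an OR gate)."""
    return 1 if any(unit_fires(u, x) for u in units) else 0

# Two positive regions split by negatives: one threshold cannot do this,
# so the constructive loop must add two units.
X = [[0.1], [0.2], [0.5], [0.6], [0.9], [1.0]]
y = [1, 1, 0, 0, 1, 1]
units = grow_network(X, y)
```

As in the paper's algorithms, capacity is added only when the current network cannot yet fit the training data.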
Feature Subset Selection by Bayesian networks: a comparison with genetic and sequential algorithms
Cited by 42 (15 self)
In this paper we perform a comparison among FSS-EBNA, a randomized, population-based evolutionary algorithm, two genetic approaches, and two sequential search approaches on the well-known Feature Subset Selection (FSS) problem. In FSS-EBNA, the FSS problem, stated as a search problem, uses the EBNA (Estimation of Bayesian Network Algorithm) search engine, an algorithm within the EDA (Estimation of Distribution Algorithm) approach. The EDA paradigm was born from the roots of the GA community, with the aim of explicitly discovering the relationships among the features of the problem rather than disrupting them with genetic recombination operators. The EDA paradigm avoids the use of recombination operators and guarantees the evolution of the population of solutions and the discovery of these relationships through the factorization of the probability distribution of the best individuals in each generation of the search. In EBNA, this factorization is carried out by a Bayesian network induced by a chea...
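The EDA loop that replaces recombination operators is easy to show in its simplest form, UMDA, which fits independent per-bit marginals to the elite of each generation and samples the next population from them (EBNA would instead fit a Bayesian network, capturing dependencies between bits). The objective function below is a toy invented for illustration:

```python
import random

random.seed(1)

def score(mask):
    """Toy objective: features 0 and 2 are relevant; every kept bit costs."""
    return mask[0] + mask[2] - 0.1 * sum(mask)

def umda_select(score, n_bits, pop_size=20, elite=10, generations=15):
    """UMDA, the simplest EDA: estimate per-bit probabilities from the
    elite individuals, then sample the next generation from them."""
    probs = [0.5] * n_bits
    best, best_val = None, float("-inf")
    for _ in range(generations):
        pop = [[1 if random.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        pop.sort(key=score, reverse=True)
        if score(pop[0]) > best_val:
            best, best_val = pop[0], score(pop[0])
        top = pop[:elite]
        probs = [sum(ind[j] for ind in top) / elite for j in range(n_bits)]
        probs = [min(0.95, max(0.05, p)) for p in probs]  # keep exploring
    return best

best = umda_select(score, n_bits=6)
print(best)  # bits 0 and 2 should end up selected
```

Clamping the marginals away from 0 and 1 prevents the distribution from freezing before the relevant bits are found.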
On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples
Proceedings of the Fifteenth International Conference on Machine Learning, 1998
Cited by 37 (4 self)
We consider feature selection in the "wrapper" model of feature selection. This typically involves an NP-hard optimization problem that is approximated by heuristic search for a "good" feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorous bound for generalization error under feature selection. The search heuristics typically used are then immediately seen as trying to achieve the error given in our bounds, and succeeding to the extent that they succeed in solving the optimization. The bound suggests that, in the presence of many "irrelevant" features, the main source of error in wrapper model feature selection is from "overfitting" holdout or cross-validation data. This motivates a new algorithm that, again under the idealization of performing search exactly, has sample complexity (and error) that grows logarithmically in the number of "irrelevant" features, which means it can tolerate having a number of "irrelevant" f...
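The overfitting effect the bound points at can be demonstrated numerically: when a wrapper search maximizes holdout accuracy over many purely irrelevant features, the winning feature's score is far above its true 50% accuracy. The experiment below is a toy illustration of that phenomenon, not the paper's algorithm:

```python
import random

random.seed(2)

n_train, n_features = 30, 200
# Labels and all features are independent coin flips, so no feature is
# truly predictive: the honest accuracy of any single-feature rule is 50%.
y = [random.randint(0, 1) for _ in range(n_train)]
X = [[random.randint(0, 1) for _ in range(n_features)]
     for _ in range(n_train)]

def holdout_accuracy(j):
    """Score the rule 'predict feature j's bit' (with the better polarity)
    on the same data the wrapper search is allowed to look at."""
    hits = sum(X[i][j] == y[i] for i in range(n_train))
    return max(hits, n_train - hits) / n_train

# The wrapper step: pick the feature with the best holdout score.
best_feature = max(range(n_features), key=holdout_accuracy)
print(holdout_accuracy(best_feature))  # well above the true 0.5
```

The gap between the selected feature's holdout score and its true 50% accuracy is exactly the selection-induced overfitting the bound quantifies, and it grows with the number of irrelevant candidates.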