Using mutual information for selecting features in supervised neural net learning (1994)

by R Battiti
Venue: IEEE Transactions on Neural Networks

Results 11 - 20 of 99

Feature Selection Based on Joint Mutual Information

by Howard Hua Yang, John Moody - In Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis, 1999
Cited by 11 (1 self)
A feature/input selection method is proposed based on joint mutual information. The new method is better than the existing methods based on mutual information in eliminating redundancy in the inputs. It is applied in a real world application to find 2-D viewing coordinates for data visualization and to select inputs for a neural network classifier. The result shows that the new method can find many good 2-D projections which cannot be found by the existing methods.
Keywords: feature selection, joint mutual information, visualization, classification.
1 INTRODUCTION
The goal of statistical modeling is to find a functional relationship between a target variable and a set of feature/input variables or explanatory variables. The statistical modeling is a process of construction and verification of hypotheses. It consists of three basic steps: model specification, model estimation and model selection. These three steps form a common framework for regression, classification and prediction. ...

Relevance of Time-Frequency Features for Phonetic and Speaker-Channel Classification

by Howard Hua Yang, Sarel Van Vuuren, Sangita Sharma, Hynek Hermansky - In ICASSP99, 2000
Cited by 11 (1 self)
The mutual information concept is used to study the distribution of speech information in frequency and in time. The main focus is on the information that is relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information (MI) between a phonetic classification variable and one spectral feature variable in the time-frequency plane and (b) compute the joint mutual information (JMI) between the phonetic classification variable and two feature variables in the time-frequency plane. The MI and the JMI of the feature variables are used as relevance measures to select inputs for phonetic classifiers. Multi-layer perceptron classifiers with one or two inputs are trained to recognize phonemes to examine the effectiveness of the input selection method based on the MI and the JMI. To analyze the nonlinguistic sources of variability, we use speaker-channel labels to represent different speakers and different telephone cha...
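
The two relevance measures named in this abstract, MI between the class variable and one feature, and JMI between the class variable and a pair of features, can be illustrated with a minimal sketch. It assumes discrete (already quantized) feature values and a simple plug-in estimate from counts; the paper works with spectral features and may estimate the quantities differently. The function names and the toy XOR data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def joint_mutual_information(x1s, x2s, ys):
    """I(X1, X2; Y): treat the pair (X1, X2) as one composite variable."""
    return mutual_information(list(zip(x1s, x2s)), ys)


# Toy data (assumption): the class label is the XOR of two binary features.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

mi_x1 = mutual_information(x1, y)             # 0.0 bits: x1 alone is uninformative
jmi = joint_mutual_information(x1, x2, y)     # 1.0 bit: the pair determines y
print(mi_x1, jmi)
```

In the XOR case each feature has zero MI with the label while the pair has one full bit, which is the kind of feature pair a JMI ranking can surface and a single-feature MI ranking cannot.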

A Versatile Framework for Labelling Imagery With a Large Number of Classes

by Shailesh Kumar, Melba Crawford, Joydeep Ghosh
Cited by 10 (6 self)
Conventional methods for feature selection use some kind of separability criteria or classification accuracy for computing the relevance of a feature subset to the classification task. In two-class problems, this approach may be suitable, but for problems such as character recognition with 26 classes, these feature selection algorithms are often faced with complex tradeoffs among efficacy of features for separating different subsets of classes. We propose a class-pair based feature selection algorithm which, in conjunction with mixture modeling technique, provides significantly superior results for differentiating a large number of classes, even when the class priors vary considerably. This technique is applied to multi-sensor NASA/JPL remote sensing AIRSAR data for characterizing 11 types of land cover. The proposed polychotomous approach not only gives improved test accuracy, but also reduces the number of features used. Important domain information can be derived from the...

Feature Selection for Ranking

by Xiubo Geng, Tie-yan Liu, Tao Qin, Hang Li - Proceedings of the 30th Annual International ACM SIGIR Conference, 2007
Cited by 10 (2 self)
Ranking is a very important topic in information retrieval. While algorithms for learning ranking models have been intensively studied, this is not the case for feature selection, despite its importance. The reality is that many feature selection methods used in classification are directly applied to ranking. We argue that because of the striking differences between ranking and classification, it is better to develop different feature selection methods for ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we use its value to rank the training instances, and define the ranking accuracy in terms of a performance measure or a loss function as the importance of the feature. We also define the correlation between the ranking results of two features as the similarity between them. Based on these definitions, we formulate feature selection as an optimization problem whose goal is to find the features with maximum total importance scores and minimum total similarity scores. We also demonstrate how to solve the optimization problem in an efficient way. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method can outperform traditional feature selection methods for the ranking task.
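
The importance and similarity definitions in this abstract can be sketched as follows. The abstract does not say which performance measure, which correlation, or which solver the authors use, so everything concrete here is an assumption: pairwise ranking accuracy stands in for the performance measure, Kendall's tau for the correlation between rankings, and a simple greedy loop with a hypothetical `penalty` weight for the optimization. Feature names and data are made up.

```python
from itertools import combinations


def pairwise_accuracy(scores, relevance):
    """Fraction of instance pairs with different relevance that the scores order correctly."""
    correct = total = 0
    for i, j in combinations(range(len(scores)), 2):
        if relevance[i] == relevance[j]:
            continue
        total += 1
        if (scores[i] - scores[j]) * (relevance[i] - relevance[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


def kendall_tau(a, b):
    """Kendall's tau-a between the rankings induced by two score lists."""
    pairs = list(combinations(range(len(a)), 2))
    balance = 0
    for i, j in pairs:
        s = (a[i] - a[j]) * (b[i] - b[j])
        balance += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return balance / len(pairs) if pairs else 0.0


def greedy_select(features, relevance, k, penalty=1.0):
    """Pick k features, trading importance against similarity to those already chosen."""
    importance = {f: pairwise_accuracy(v, relevance) for f, v in features.items()}
    selected, remaining = [], sorted(features)
    while remaining and len(selected) < k:
        def score(f):
            similarity = sum(abs(kendall_tau(features[f], features[s])) for s in selected)
            return importance[f] - penalty * similarity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: four training instances with graded relevance labels.
relevance = [3, 2, 1, 0]
features = {
    "bm25": [0.9, 0.7, 0.4, 0.1],
    "bm25_copy": [9.0, 7.0, 4.0, 1.0],  # ranks instances exactly like "bm25"
    "length": [0.2, 0.9, 0.5, 0.1],
}
print(greedy_select(features, relevance, k=2))  # ['bm25', 'length']
```

The duplicate feature is as important as the original but is skipped on the second pick because its ranking is perfectly correlated with one already selected.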

Nonlinear Feature Transforms Using Maximum Mutual Information

by Kari Torkkola - In Proc. IJCNN, 2001
Cited by 9 (4 self)
Finding the right features is an essential part of a pattern recognition system. This can be accomplished either by selection or by a transform from a larger number of "raw" features. In this work we learn non-linear dimension reducing discriminative transforms that are implemented as neural networks, either as radial basis function networks or as multilayer perceptrons. As the criterion, we use the joint mutual information (MI) between the class labels of training data and transformed features. Our measure of MI makes use of Renyi entropy as formulated by Principe et al. Resulting low-dimensional features enable a classifier to operate with less computational resources and memory without compromising the accuracy.
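
One ingredient mentioned in this abstract, Renyi entropy estimated from samples, can be sketched in a few lines. The sketch assumes the quadratic (order-2) Renyi entropy with a Gaussian Parzen window of width `sigma`, for which the entropy reduces to the negative log of an average of pairwise kernel values. It covers only this entropy estimate for 1-D samples, not the paper's mutual information measure or its transform learning; the function name, `sigma` default, and data are hypothetical.

```python
from math import exp, log, pi, sqrt


def renyi_quadratic_entropy(samples, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy for 1-D samples (in nats)."""
    n = len(samples)
    # Convolving two Gaussian kernels of variance sigma^2 gives variance 2 * sigma^2.
    var = 2.0 * sigma ** 2
    norm = 1.0 / sqrt(2.0 * pi * var)
    # Average pairwise kernel value; estimates the integral of the squared density.
    potential = sum(
        norm * exp(-((a - b) ** 2) / (2.0 * var))
        for a in samples
        for b in samples
    ) / (n * n)
    return -log(potential)


tight = [0.0, 0.1, -0.1, 0.05]   # samples clustered together
spread = [-3.0, -1.0, 1.0, 3.0]  # samples far apart

h_tight = renyi_quadratic_entropy(tight)
h_spread = renyi_quadratic_entropy(spread)
print(h_tight < h_spread)  # True: spread-out samples have higher estimated entropy
```

Because the estimate is a smooth function of pairwise sample differences, it can be differentiated with respect to the samples, which is what makes this family of estimators usable as a training criterion.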

Feature Selection using Improved Mutual Information for Text Classification

by Jana Novovičová, Antonín Malík, Pavel Pudil - Lecture Notes in Computer Science, 2004
Cited by 8 (0 self)
A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper we present two algorithms for feature (word) selection for the purpose of text classification. We used sequential forward selection methods based on improved mutual information introduced by Battiti [1] and Kwak and Choi [6] for non-textual data. These feature evaluation functions take into consideration how features work together. The performance of these evaluation functions is compared with that of information gain, which evaluates features individually. We present experimental results using a naive Bayes classifier based on the multinomial model on the Reuters data set. Finally, we analyze the experimental results from various perspectives, including F1-measure, precision and recall. Preliminary experimental results indicate the effectiveness of the proposed feature selection algorithms in a text classification problem.
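
Sequential forward selection with a mutual-information criterion that accounts for already selected features can be sketched as below. This is not the paper's algorithm: it assumes a Battiti-style score, relevance to the class minus `beta` times the summed MI with the features chosen so far, with a plug-in MI estimate on binary word-presence values. The `beta` value, the word names, and the documents are hypothetical.

```python
from collections import Counter
from math import log2


def mi(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def forward_select(columns, labels, k, beta=0.5):
    """Greedily add the word maximizing relevance minus beta * redundancy."""
    relevance = {name: mi(col, labels) for name, col in columns.items()}
    selected, candidates = [], sorted(columns)
    while candidates and len(selected) < k:
        def criterion(name):
            redundancy = sum(mi(columns[name], columns[s]) for s in selected)
            return relevance[name] - beta * redundancy
        best = max(candidates, key=criterion)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy corpus: 8 documents, 1 = word present, label 1 = "sports".
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0],
    "score": [1, 1, 1, 0, 0, 0, 0, 0],  # occurs in exactly the same documents as "goal"
    "vote":  [0, 1, 0, 1, 0, 0, 0, 0],
}

print(forward_select(columns, labels, k=2))            # ['goal', 'vote']
print(forward_select(columns, labels, k=2, beta=0.0))  # ['goal', 'score']
```

With `beta=0.0` the criterion scores each word individually, as information gain does, and picks the redundant duplicate; with the penalty it prefers the less relevant but complementary word.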

Discriminative Feature Selection via Multiclass Variable Memory Markov Model

by Noam Slonim, Gill Bejerano, Shai Fine, Naftali Tishby - EURASIP Journal on Applied Signal Processing (JASP), Special issue on Unstructured Information Management from Multimedia Data Sources, 2002
Cited by 8 (1 self)
We propose a novel feature selection method based on a Variable Memory Markov model (VMM). The VMM was originally proposed as a generative model trying to preserve the original source statistics from training data.

Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions

by Kari Torkkola - In Advances in neural information processing systems 14, 2001
Cited by 7 (2 self)
The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms.

Rapidly Estimating the Quality of Input Representations for Neural Networks

by Kevin Cherkauer, Jude W. Shavlik, 1995
Cited by 7 (0 self)
The choice of an input representation for machine learning can have a profound impact on the accuracy of the learned model in classifying novel instances. A reliable method of rapidly estimating the value of a representation, independent of the learner, would be a powerful tool in the search for better representations. We introduce a fast representation-quality measure that is more accurate than Rendell and Ragavan's blurring metric in rank ordering input representations for neural networks on two difficult, real-world datasets. This work constitutes a step forward both in representation quality measures and in our understanding of the characteristics that engender good representations.
1 Introduction
A major area of machine learning research is inductive learning from examples, where a system uses a set of classified training examples to induce a model, or "concept," that will accurately predict the classes of future examples. A main component of this approach is selecting an input ...

Genetic Feature Selection in a Fuzzy Rule-Based Classification System Learning Process for High Dimensional Problems

by J. Casillas, O. Cordón, M. J. Del Jesus, F. Herrera, 2000
Cited by 7 (3 self)
The inductive learning of a Fuzzy Rule-Based Classification System (FRBCS) is made difficult by a large number of features, which increases the dimensionality of the problem being solved. The difficulty comes from the exponential growth of the fuzzy rule search space as the number of features considered in the learning process increases. In this work, we present a genetic feature selection process that can be integrated into a multistage genetic learning method to obtain, more efficiently, FRBCSs composed of a set of comprehensible fuzzy rules with high classification ability. The proposed process fixes, a priori, the number of selected features and, therefore, the size of the search space of candidate fuzzy rules. Experiments on the Sonar example base show a significant improvement in simplicity, precision and efficiency when the proposed feature selection process is added to the multistage genetic learning method or to other learning methods.