Fast Binary Feature Selection with Conditional Mutual Information
Journal of Machine Learning Research, 2004
Cited by 65 (1 self)
We propose in this paper a very fast feature selection technique based on conditional mutual information.
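The abstract does not spell out the selection rule, so the following is only a minimal sketch of one common way to use conditional mutual information for binary features: greedily pick the feature whose information about the class, conditioned on each feature already picked, is largest in the worst case. The function names (`cond_mutual_info`, `select_features`) are hypothetical, and the naive double loop makes no attempt to reproduce the speed claimed in the paper.

```python
import numpy as np


def entropy_from_counts(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def joint_entropy(*columns):
    """Entropy of the joint empirical distribution of binary (0/1) columns."""
    code = np.zeros(len(columns[0]), dtype=np.int64)
    for col in columns:
        # Pack the binary columns into one integer code per sample.
        code = 2 * code + np.asarray(col, dtype=np.int64)
    return entropy_from_counts(np.bincount(code, minlength=2 ** len(columns)))


def cond_mutual_info(y, x, z=None):
    """I(Y; X | Z) for binary arrays; with z=None this is I(Y; X)."""
    if z is None:
        return joint_entropy(y) + joint_entropy(x) - joint_entropy(y, x)
    return (joint_entropy(y, z) + joint_entropy(x, z)
            - joint_entropy(y, x, z) - joint_entropy(z))


def select_features(X, y, k):
    """Greedy max-min selection of k columns of the binary matrix X."""
    n_features = X.shape[1]
    # score[n] holds the minimum over already-selected j of I(Y; X_n | X_j);
    # before anything is selected it is simply I(Y; X_n).
    score = np.array([cond_mutual_info(y, X[:, n]) for n in range(n_features)])
    selected = []
    for _ in range(k):
        best = int(np.argmax(score))
        selected.append(best)
        score[best] = -np.inf  # never pick the same feature twice
        for n in range(n_features):
            if score[n] > -np.inf:
                score[n] = min(score[n],
                               cond_mutual_info(y, X[:, n], X[:, best]))
    return selected
```

With a feature matrix in which one column duplicates another, the duplicate's score drops to about zero once its twin is selected, so a weaker but complementary feature is preferred over it.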
Mutual Information in Learning Feature Transformations
In Proceedings of the 17th International Conference on Machine Learning, 2000
Cited by 42 (7 self)
We present feature transformations useful for exploratory data analysis or for pattern recognition. Transformations are learned from example data sets by maximizing the mutual information between transformed data and their class labels. We make use of Rényi's quadratic entropy, and we extend the work of Principe et al. to mutual information between continuous multidimensional variables and discrete-valued class labels.
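As background for why Rényi's quadratic entropy is convenient here, the following is a sketch of the standard Parzen-window estimate, not the paper's full mutual information criterion. The symbols are assumptions introduced for illustration: $y_1, \dots, y_N$ are the transformed samples and $G(\cdot, \Sigma)$ is a zero-mean Gaussian kernel with covariance $\Sigma$.

```latex
H_2(Y) = -\log \int p(y)^2 \, dy,
\qquad
\hat{p}(y) = \frac{1}{N} \sum_{i=1}^{N} G\!\left(y - y_i,\ \sigma^2 I\right)

\int \hat{p}(y)^2 \, dy
  = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    G\!\left(y_i - y_j,\ 2\sigma^2 I\right)
```

The integral reduces to a sum over sample pairs because the convolution of two Gaussians is again a Gaussian, so the estimate and its gradient with respect to the samples can be computed without numerical integration.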
Feature Selection by Maximum Marginal Diversity: Optimality and Implications for Visual Recognition
In submitted, 2002
Cited by 22 (6 self)
We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computational simplicity. The relationships between infomax and the maximization of marginal diversity are identified, uncovering the existence of a family of classification procedures for which near-optimal (in the Bayes error sense) feature selection does not require combinatorial search. Examination of this family in light of recent studies on the statistics of natural images suggests that visual recognition problems are a subset of it.
A Statistic to Estimate the Variance of the Histogram-Based Mutual Information Estimator Based on . . .
1999
Cited by 14 (3 self)
In the case of two signals with independent pairs of observations (x, y), a statistic to estimate the variance of the histogram-based mutual information estimator has been derived earlier. We present such a statistic for dependent pairs. Deriving it requires a reliable statistic to estimate the variance of the sample mean when the observations are dependent. We derive and discuss this statistic and a statistic to estimate the variance of the mutual information estimator. These statistics are validated by simulations. © 1999 Elsevier Science B.V. All rights reserved.
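For readers unfamiliar with the estimator whose variance is being studied, here is a minimal sketch of a plug-in histogram-based mutual information estimate. It assumes equal-width bins chosen by `numpy.histogram2d`; the function name and the `bins` default are hypothetical, and the variance statistics described in the abstract are not reproduced.

```python
import numpy as np


def histogram_mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = p_xy > 0                        # empty cells contribute 0 * log 0 = 0
    ratio = p_xy[mask] / (p_x @ p_y)[mask]
    return float((p_xy[mask] * np.log(ratio)).sum())
```

The estimate depends on the number of bins and on the sample size, which is one reason a variance statistic for it is useful in practice.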
Automatic feature selection in neuroevolution
In Genetic and Evolutionary Computation Conference, 2005
Cited by 12 (4 self)
Feature selection is the process of finding the set of inputs to a machine learning algorithm that will yield the best performance. Developing a way to solve this problem automatically would make current machine learning methods much more useful. Previous efforts to automate feature selection rely on expensive meta-learning or are applicable only when labeled training data is available. This paper presents a novel method called FS-NEAT which extends the NEAT neuroevolution method to automatically determine the right set of inputs for the networks it evolves. By learning the network's inputs, topology, and weights simultaneously, FS-NEAT addresses the feature selection problem without relying on meta-learning or labeled data. Initial experiments in a line orientation task demonstrate that FS-NEAT can learn networks with fewer inputs and better performance than traditional NEAT. Furthermore, it outperforms traditional NEAT even when the feature set does not contain extraneous features because it searches for networks in a lower-dimensional space.
Feature Selection with Neural Networks
Behaviormetrika, 1998
Cited by 11 (0 self)
Features gathered from the observation of a phenomenon are not all equally informative: some of them may be noisy, correlated or irrelevant. Feature selection aims at selecting a feature set that is relevant for a given task. This problem is complex and remains an important issue in many domains. In the field of neural networks, feature selection has been studied for the last ten years and classical as well as original methods have been employed. This paper is a review of neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods which have been developed specifically for neural networks. Representative methods are then compared on different test problems.
Keywords: Feature Selection, Subset Selection, Variable Sensitivity, Sequential Search
French title: Sélection de Variables et Réseaux de Neurones, by Philippe Leray and Patrick Gallinari
Automatic Determination of Optimal Network Topologies Based on Information Theory and Evolution
IEEE Proceedings of the 23rd EUROMICRO Conference, 1997
Cited by 7 (2 self)
We present a new approach to determine the optimal topology of multilayer perceptrons for a given learning task based on information theory and evolution. Our method exploits the mutual information of the input-output relation to sort the units into a list with respect to their information content. Embedded in an evolutionary algorithm, a mutation operator is proposed which removes or adds input units from given networks based on their ranking. The power of the approach is demonstrated on several benchmarks. We conclude that using an evolutionary algorithm as a framework in conjunction with intelligent mutation operators is concurrently the most efficient optimization technique with regard to network size and performance as well as scalability.
1 Introduction
A non-trivial task in the application of neural networks is the determination of the appropriate level of complexity of the model fitted to the given data. Following the general principle of Occam's Razor, we should choose the sim...
Spectrophotometric Variable Selection By Mutual Information
2004
Cited by 6 (3 self)
Spectrophotometric data often comprise a great number of numerical components or variables that can be used in calibration models. When a large number of such variables are incorporated into a particular model, many difficulties arise, and it is often necessary to reduce the number of spectral variables. This paper proposes an incremental (forward-backward) procedure, initiated using an entropy-based criterion (mutual information), to choose the first variable. The advantages of the method are discussed; results in quantitative chemical analysis by spectrophotometry show the improvements obtained with respect to traditional and nonlinear calibration models.
Mutual Information Methods for Evaluating Dependence Among Outputs in Learning Machines
2001
Cited by 5 (4 self)
The evaluation of dependence among output errors of multi-input multi-output learning machines can help us in designing well-behaved systems, highlighting hidden interactions among their internal components that can add noise to the learning process. By estimating the relations between performances and dependence among output errors, we can compare different models of learning machines in order to select the ones best suited to a particular problem. We distinguish between dependence among outputs and dependence among output errors, and we propose measures based on mutual information for evaluating both types of dependence. Global measures of dependence between outputs and output errors are presented, together with mutual information error matrices for evaluating specific dependences between each pair of outputs. We propose a statistical test of hypothesis for evaluating the difference of the dependence among outputs and output errors between different learning machine...
Input variable selection: Mutual information and linear mixing measures
IEEE Trans. on Knowledge and Data Engineering
Cited by 4 (0 self)
Determining the most appropriate inputs to a model has a significant impact on the performance of the model and associated algorithms for classification, prediction and data analysis. Previously we proposed an algorithm, ICAIVS, which utilizes independent component analysis (ICA) as a preprocessing stage to overcome issues of dependencies between inputs before the data are passed through to an input variable selection (IVS) stage. While we demonstrated previously with artificial data that ICA can prevent an overestimation of necessary input variables, we show here that mixing between input variables is common in real-world datasets, so that ICA preprocessing is useful in practice. This experimental test is based on new measures introduced in this paper. Furthermore, we extend the implementation of our variable selection scheme to a statistical dependency test based on mutual information and test several algorithms on Gaussian and sub-Gaussian signals. Specifically, we propose a novel method of quantifying linear dependencies using ICA estimates of mixing matrices with a new Linear Mixing Measure (LMM).
Index Terms: Input variable selection, modeling, data preprocessing, independent component analysis, mutual information estimation.

