Results 1–10 of 98
Using mutual information for selecting features in supervised neural net learning
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1994
Abstract

Cited by 358 (1 self)
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the “information content” of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a “greedy” selection of the features and that takes into account both the mutual information with respect to the output class and with respect to the already-selected features. Finally, the results of a series of experiments are discussed.
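The greedy selection described in this abstract can be sketched in a few lines. This is a minimal illustration for discrete features, not the paper's exact algorithm; the MIFS-style score with a redundancy weight `beta`, and the function names, are assumptions:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def greedy_mi_select(features, labels, k, beta=0.5):
    """Greedily pick k feature columns, rewarding mutual information with the
    output class and penalizing mutual information with the already-selected
    features (an MIFS-style trade-off)."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected)
            return relevance - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here `features` maps a feature name to a list of discrete values; continuous features would first need discretization, which is where the approximation issues mentioned in the abstract arise.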
A complete enumeration and classification of two-locus disease models. Hum Hered 50
, 2000
Unsupervised induction of stochastic context-free grammars using distributional clustering
Abstract

Cited by 60 (2 self)
An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented when the algorithm has been trained on 12 million words of the British National Corpus.
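One common reading of a mutual information criterion for constituents is the statistical dependence between the tags flanking a candidate sequence. This is a simplified illustration, not necessarily the paper's exact criterion, and the function names are assumptions:

```python
import math
from collections import Counter

def _mi(xs, ys):
    """Empirical mutual information I(X;Y) in bits from paired samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def context_mi(tags, candidate):
    """Score a candidate constituent (a tuple of tags) by the mutual
    information between the tag immediately before and the tag immediately
    after each of its occurrences in a tagged corpus."""
    k = len(candidate)
    lefts, rights = [], []
    for i in range(1, len(tags) - k):
        if tuple(tags[i:i + k]) == candidate:
            lefts.append(tags[i - 1])   # left context tag
            rights.append(tags[i + k])  # right context tag
    return _mi(lefts, rights) if lefts else 0.0
```

Sequences whose left and right contexts are strongly correlated score high, which is the intuition behind preferring them as constituents.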
The study of correlation structures of DNA sequences: A critical review
 Computers Chem.
, 1997
ESTIMATING FUNCTIONS OF PROBABILITY DISTRIBUTIONS FROM A FINITE SET OF SAMPLES Part II: Bayes Estimators for Mutual Information, Chi-Squared, Covariance, and other Statistics.
Abstract

Cited by 53 (4 self)
This paper is the second in a series of two on the problem of estimating a function of a probability distribution from a finite set of samples of that distribution. In the first paper, the Bayes estimator for a function of a probability distribution was introduced, the optimal properties of the Bayes estimator were discussed, and the Bayes and frequency-counts estimators for the Shannon entropy were derived and graphically contrasted. In the current paper the analysis of the first paper is extended by the derivation of Bayes estimators for several other functions of interest in statistics and information theory. These functions are (powers of) the mutual information, chi-squared for tests of independence, variance, covariance, and average. Finding Bayes estimators for several of these functions requires extensions to the analytical techniques developed in the first paper, and these extensions form the main body of this paper. This paper extends the analysis in other ways as well, for example by enlarging the class of potential priors beyond the uniform prior assumed in the first paper. In particular, the use of the entropic and Dirichlet priors is considered.
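The paper derives closed-form Bayes estimators; as a numerical stand-in, one can sample the posterior over the joint distribution, which under the uniform prior is Dirichlet with parameters counts + 1, and average the mutual information of the draws. A Monte Carlo sketch under those assumptions, not the paper's analytical result:

```python
import math
import random

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet distribution."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def mi_of_joint(p, nrows, ncols):
    """I(X;Y) in bits for a joint distribution given as a flat row-major list."""
    px = [sum(p[i * ncols + j] for j in range(ncols)) for i in range(nrows)]
    py = [sum(p[i * ncols + j] for i in range(nrows)) for j in range(ncols)]
    mi = 0.0
    for i in range(nrows):
        for j in range(ncols):
            pij = p[i * ncols + j]
            if pij > 0:
                mi += pij * math.log2(pij / (px[i] * py[j]))
    return mi

def bayes_mi_estimate(counts, nrows, ncols, n_samples=2000, seed=0):
    """Posterior-mean estimate of I(X;Y) under a uniform prior on the joint
    distribution: posterior is Dirichlet(counts + 1); average MI over draws."""
    rng = random.Random(seed)
    alphas = [c + 1.0 for c in counts]
    return sum(mi_of_joint(sample_dirichlet(alphas, rng), nrows, ncols)
               for _ in range(n_samples)) / n_samples
```

Unlike the plug-in frequency-counts estimate, this averages over the posterior, which matters precisely in the small-sample regime the papers address.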
Transition Phenomena in Cellular Automata Rule Space
 Physica D
, 1990
Abstract

Cited by 41 (8 self)
We define several qualitative classes of cellular automata (CA) behavior, based on various statistical measures, and describe how the space of all cellular automata is organized. As a cellular automaton...
Long-range correlation and partial 1/f^α spectrum in a non-coding DNA sequence
 Europhys. Lett
, 1992
Abstract

Cited by 37 (9 self)
SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant. © NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder. www.santafe.edu
On the relationship between complexity and entropy for Markov chains and regular languages
 Complex Systems
, 1991
Abstract

Cited by 32 (2 self)
Using the past-future mutual information as a measure of complexity, the relation between the complexity and the Shannon entropy is determined analytically for sequences generated by Markov chains and regular languages. It is emphasized that, given an entropy value, there are many possible complexity values, and vice versa; that is, the relationship between complexity and entropy is not one-to-one, but rather many-to-one or one-to-many. It is also emphasized that there are structures in the complexity-versus-entropy plots, and these structures depend on the details of a Markov chain or a regular language grammar.
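For a first-order Markov chain the past-future mutual information (excess entropy) reduces to E = H(X_1) − h, the single-symbol entropy minus the entropy rate, because block entropies grow exactly linearly after the first symbol. A small sketch under that assumption (function names are illustrative):

```python
import math

def stationary(P):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(10000):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def excess_entropy(P):
    """Past-future mutual information of a first-order Markov chain:
    E = H(X_1) - h, with h the entropy rate sum_i pi_i * H(row_i)."""
    pi = stationary(P)
    h = sum(pi[i] * entropy(P[i]) for i in range(len(P)))
    return entropy(pi) - h
```

An IID source (all rows equal) gives E = 0, while a strongly persistent chain has large E at moderate entropy, illustrating the many-to-one relationship the abstract describes.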
Understanding Long-Range Correlations in DNA Sequences
 PHYSICA D
, 1994
Abstract

Cited by 29 (7 self)
In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understand long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-range ...
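The expansion-modification system mentioned in this abstract can be sketched as a simple rewrite process on a binary sequence: each symbol either duplicates or mutates at each generation. The exact update rule used here (flip with probability p, duplicate otherwise) is an assumption based only on the abstract's description of point duplication and point mutation:

```python
import random

def expansion_modification(generations, p, seed=0):
    """Grow a binary sequence: each symbol is duplicated (x -> xx) with
    probability 1 - p, or flipped (x -> 1 - x) with probability p."""
    rng = random.Random(seed)
    seq = [0]
    for _ in range(generations):
        nxt = []
        for x in seq:
            if rng.random() < p:
                nxt.append(1 - x)   # modification: point mutation
            else:
                nxt.extend((x, x))  # expansion: point duplication
        seq = nxt
    return seq
```

The interplay of copying (which propagates structure across scales) and rare mutation is what such models offer as a mechanism for long-range correlation.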
Application of the Mutual Information Criterion for Feature Selection in Computer-Aided Diagnosis.
 Medical Physics,
, 2001
Abstract

Cited by 27 (0 self)
The purpose of this study was to investigate an information theoretic approach to feature selection for computer-aided diagnosis (CAD). The approach is based on the mutual information (MI) concept. MI measures the general dependence of random variables without making any assumptions about the nature of their underlying relationships. Consequently, MI can potentially offer some advantages over feature selection techniques that focus only on the linear relationships of variables. This study was based on a database of statistical texture features extracted from perfusion lung scans. The ultimate goal was to select the optimal subset of features for the computer-aided diagnosis of acute pulmonary embolism (PE). Initially, the study addressed issues regarding the approximation of MI in a limited dataset as it is often the case in CAD applications. The MI selected features were compared to those features selected using stepwise linear discriminant analysis and genetic algorithms for the same PE database. Linear and nonlinear decision models were implemented to merge the selected features into a final diagnosis. Results showed that the MI is an effective feature selection criterion for nonlinear CAD models overcoming some of the well-known limitations and computational complexities of other popular feature selection techniques in the field.