Results 1–10 of 77
Using mutual information for selecting features in supervised neural net learning
 IEEE Transactions on Neural Networks
, 1994
Abstract

Cited by 221 (1 self)
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a "greedy" selection of the features and that takes into account both the mutual information with respect to the output class and with respect to the already-selected features. Finally, the results of a series of experiments are discussed. Index Terms: Feature extraction, neural network pruning, dimensionality reduction, mutual information, supervised learning.
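The greedy selection described in this abstract balances each candidate feature's mutual information with the class against its mutual information with the features already selected. A minimal sketch of that idea follows; the function names, the discrete plug-in MI estimate, and the redundancy weight `beta` are illustrative assumptions, not the paper's exact formulation:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * log2(pj / ((px[x] / n) * (py[y] / n)))
    return mi

def greedy_mifs(features, labels, k, beta=0.5):
    """Greedy MI-based selection: at each step pick the feature maximizing
    I(f; class) - beta * sum of I(f; s) over already-selected features s."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            col = features[name]
            return (mutual_information(col, labels)
                    - beta * sum(mutual_information(col, features[s])
                                 for s in selected))
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For instance, with a feature identical to the labels, a constant feature, and an independent one, the first feature selected is the informative copy, since its relevance term dominates.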
Unsupervised induction of stochastic context-free grammars using distributional clustering
Abstract

Cited by 49 (2 self)
An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented when the algorithm has been trained on 12 million words of the British National Corpus.
The study of correlation structure of DNA sequences: a critical review
 Computers Chem
, 1997
ESTIMATING FUNCTIONS OF PROBABILITY DISTRIBUTIONS FROM A FINITE SET OF SAMPLES Part II: Bayes Estimators for Mutual Information, Chi-Squared, Covariance, and other Statistics.
Abstract

Cited by 39 (3 self)
This paper is the second in a series of two on the problem of estimating a function of a probability distribution from a finite set of samples of that distribution. In the first paper, the Bayes estimator for a function of a probability distribution was introduced, the optimal properties of the Bayes estimator were discussed, and the Bayes and frequency-counts estimators for the Shannon entropy were derived and graphically contrasted. In the current paper the analysis of the first paper is extended by the derivation of Bayes estimators for several other functions of interest in statistics and information theory. These functions are (powers of) the mutual information, chi-squared for tests of independence, variance, covariance, and average. Finding Bayes estimators for several of these functions requires extensions to the analytical techniques developed in the first paper, and these extensions form the main body of this paper. This paper extends the analysis in other ways as well, for example by enlarging the class of potential priors beyond the uniform prior assumed in the first paper. In particular, the use of the entropic and Dirichlet priors is considered.
Transition Phenomena in Cellular Automata Rule Space
 Physica D
, 1990
Abstract

Cited by 33 (7 self)
We define several qualitative classes of cellular automata (CA) behavior, based on various statistical measures, and describe how the space of all cellular automata is organized. As a cellular automaton...
Linear Feature Extractors Based on Mutual Information
 In Proceedings of the 13th International Conference on Pattern Recognition
, 1996
Abstract

Cited by 23 (2 self)
This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
1. Introduction
The capabilities of a classifier are ultimately limited by the quality of the features in each input vector. In particular, when the measurement space is high-dimensional but the number of samples is limited, one is faced with the "curse of dimensionality" problem during training [3]. Feature extraction is often used to alleviate this problem. Although linear feature extractors are ultimately less flexible than the more general nonlinear ...
On the relationship between complexity and entropy for Markov chains and regular languages
 Complex Systems
, 1991
Abstract

Cited by 23 (2 self)
Using the past-future mutual information as a measure of complexity, the relation between the complexity and the Shannon entropy is determined analytically for sequences generated by Markov chains and regular languages. It is emphasized that, given an entropy value, there are many possible complexity values, and vice versa; that is, the relationship between complexity and entropy is not one-to-one, but rather many-to-one or one-to-many. It is also emphasized that there are structures in the complexity-versus-entropy plots, and these structures depend on the details of a Markov chain or a regular language grammar.
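One of the quantities this abstract relates to complexity, the Shannon entropy of a Markov chain, is concretely computable as the entropy rate h = -Σ_i π_i Σ_j P_ij log2 P_ij, where π is the stationary distribution. A minimal sketch (function names and the power-iteration choice are assumptions for illustration):

```python
from math import log2

def stationary(P, iters=200):
    """Approximate the stationary distribution of a row-stochastic
    transition matrix P by repeated application of P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rate(P):
    """Shannon entropy rate h = -sum_i pi_i sum_j P_ij log2 P_ij,
    in bits per symbol; terms with P_ij = 0 contribute nothing."""
    pi = stationary(P)
    n = len(P)
    return -sum(pi[i] * P[i][j] * log2(P[i][j])
                for i in range(n) for j in range(n)
                if P[i][j] > 0)
```

For a symmetric binary chain with flip probability 0.1, this yields the binary entropy H(0.1) ≈ 0.469 bits per symbol, while a deterministic chain has entropy rate zero.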
Understanding Long-Range Correlations in DNA Sequences
 PHYSICA D
, 1994
Abstract

Cited by 21 (6 self)
In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understanding long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called the "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-range...
Generating Non-Trivial Long-Range Correlations and 1/f Spectra by Replication and Mutation
 INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS
, 1992
Abstract

Cited by 19 (9 self)
This paper aims at understanding the statistical features of nucleic acid sequences from the knowledge of the dynamical process that produces them. Two studies are carried out: first, mutual information functions of the limiting sequences generated by simple sequence-manipulation dynamics with replications and mutations are calculated numerically (sometimes analytically). It is shown that elongation and replication can easily produce long-range correlations. These long-range correlations can be destroyed to various degrees by mutation in different sequence-manipulation models. Second, mutual information functions for several human nucleic acid sequences are determined. It is observed that intron sequences (non-coding sequences) tend to have longer correlation lengths than exon sequences (protein-coding sequences).
The KOJAK Group Finder: Connecting the Dots via Integrated Knowledge-Based and Statistical Reasoning
 Proc. 16th Innovative Applications of Artificial Intelligence Conf. (IAAI '04)
, 2004
Abstract

Cited by 18 (6 self)
Link discovery is a new challenge in data mining whose primary concerns are to identify strong links and discover hidden relationships among entities and organizations based on low-level, incomplete, and noisy evidence data. To address this challenge, we are developing a hybrid link discovery system called KOJAK that combines state-of-the-art knowledge representation and reasoning (KR&R) technology with statistical clustering and analysis techniques from the area of data mining. In this paper we report on the architecture and technology of its first fully completed module, the KOJAK Group Finder. The Group Finder is capable of finding hidden groups and group members in large evidence databases. Our group-finding approach addresses a variety of important link discovery challenges, such as exploiting heterogeneous and structurally rich evidence, handling the connectivity curse, noise, and corruption, and scaling up to very large, realistic data sets. The first version of the KOJAK Group Finder has been successfully tested and evaluated on a variety of synthetic datasets.