Mutual information functions versus correlation functions (1990)

by W Li
Venue: J Stat Phys
Results 1 - 10 of 35

Using mutual information for selecting features in supervised neural net learning

by Roberto Battiti - IEEE Transactions on Neural Networks, 1994
Cited by 141 (0 self)
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the “information content” of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a “greedy” selection of the features and that takes both the mutual information with respect to the output class and with respect to the already-selected features into account. Finally, the results of a series of experiments are discussed. Index Terms: Feature extraction, neural network pruning, dimensionality reduction, mutual information, supervised learning, ...
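The greedy selection described in this abstract can be illustrated with a minimal Python sketch. It assumes discrete-valued features, a plug-in (frequency-count) estimate of mutual information, and a score of the form "relevance to the class minus beta times the summed redundancy with already-selected features". The weight beta, the function names, and the dict-of-lists data layout are illustrative assumptions, not details given in the abstract.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x,y) * log2(p(x,y) / (p(x) * p(y))) over the observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def greedy_select(features, labels, k, beta=0.5):
    """Greedily pick up to k feature names.

    features: dict mapping a feature name to a list of discrete values
              (a hypothetical layout chosen for this sketch).
    beta:     assumed weight on redundancy with already-selected features.
    """
    remaining = dict(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(remaining[name], labels)
            redundancy = sum(
                mutual_information(remaining[name], features[s])
                for s in selected
            )
            return relevance - beta * redundancy

        best = max(remaining, key=score)  # ties go to the earliest name
        selected.append(best)
        del remaining[best]
    return selected


if __name__ == "__main__":
    # Toy data: the class is determined jointly by a and b; a_dup copies a.
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    y = [0, 2, 1, 3, 0, 2, 1, 3]
    feats = {"a": a, "a_dup": list(a), "b": b}
    print(greedy_select(feats, y, k=2, beta=0.5))  # ['a', 'b']
```

In this toy case the redundancy penalty makes the second pick skip the duplicate feature. With beta set to 0 the score reduces to relevance alone and the duplicate would be chosen instead.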

Unsupervised induction of stochastic context-free grammars using distributional clustering

by Alexander Clark
Cited by 39 (2 self)
An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented when the algorithm has been trained on 12 million words of the British National Corpus.

Estimating Functions of Probability Distributions from a Finite Set of Samples, Part II: Bayes Estimators for Mutual Information, Chi-Squared, Covariance, and Other Statistics

by David R. Wolf, David H. Wolpert
Cited by 33 (3 self)
This paper is the second in a series of two on the problem of estimating a function of a probability distribution from a finite set of samples of that distribution. In the first paper, the Bayes estimator for a function of a probability distribution was introduced, the optimal properties of the Bayes estimator were discussed, and the Bayes and frequency-counts estimators for the Shannon entropy were derived and graphically contrasted. In the current paper the analysis of the first paper is extended by the derivation of Bayes estimators for several other functions of interest in statistics and information theory. These functions are (powers of) the mutual information, chi-squared for tests of independence, variance, covariance, and average. Finding Bayes estimators for several of these functions requires extensions to the analytical techniques developed in the first paper, and these extensions form the main body of this paper. This paper extends the analysis in other ways as well, for example by enlarging the class of potential priors beyond the uniform prior assumed in the first paper. In particular, the use of the entropic and Dirichlet priors is considered.

The study of correlation structures of DNA sequences: a critical review

by Wentian Li - Computers Chem, 1997
Cited by 32 (7 self)
to be published in the special issue of Computer & Chemistry

Transition Phenomena in Cellular Automata Rule Space

by Wentian Li, Norman H. Packard, Chris Langton - Physica D, 1990
Cited by 25 (5 self)
We define several qualitative classes of cellular automata (CA) behavior, based on various statistical measures, and describe how the space of all cellular automata is organized. As a cellular automaton...

Linear Feature Extractors Based on Mutual Information

by Kurt D. Bollacker, Joydeep Ghosh - In Proceedings of the 13th International Conference on Pattern Recognition, 1996
Cited by 19 (2 self)
This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
1. Introduction. The capabilities of a classifier are ultimately limited by the quality of the features in each input vector. In particular, when the measurement space is high-dimensional but the number of samples is limited, one is faced with the "curse of dimensionality" problem during training [3]. Feature extraction is often used to alleviate this problem. Although linear feature extractors are ultimately less flexible than the more general non-linear ...

On the relationship between complexity and entropy for Markov chains and regular languages

by Wentian Li - Complex Systems, 1991
Cited by 19 (2 self)
Using the past-future mutual information as a measure of complexity, the relation between the complexity and the Shannon entropy is determined analytically for sequences generated by Markov chains and regular languages. It is emphasized that, given an entropy value, there are many possible complexity values, and vice versa; that is, the relationship between complexity and entropy is not one-to-one, but rather many-to-one or one-to-many. It is also emphasized that there are structures in the complexity-versus-entropy plots, and these structures depend on the details of a Markov chain or a regular language grammar.

Understanding Long-Range Correlations in DNA Sequences

by Wentian Li, Thomas G. Marr, Kunihiko Kaneko - Physica D, 1994
Cited by 14 (6 self)
In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understanding long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called the "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-rang...

An Information Theory Framework for the Analysis of Scene Complexity

by Miquel Feixas, Esteve Del Acebo, Philippe Bekaert, Mateu Sbert, 1999
Cited by 12 (8 self)
In this paper we present a new framework for the analysis of scene visibility and radiosity complexity. We introduce a number of complexity measures from information theory quantifying how difficult it is to compute with accuracy the visibility and radiosity in a scene. We define the continuous mutual information as a complexity measure of a scene, independent of any discretisation, and discrete mutual information as the complexity of a discretised scene. Mutual information can be understood as the degree of correlation or dependence between all the points or patches of a scene. Thus, low complexity corresponds to low correlation and vice versa. Experiments illustrating that the best mesh of a given scene among a number of alternatives corresponds to the one with the highest discrete mutual information indicate the feasibility of the approach. Unlike continuous mutual information, which is very cheap to compute, the computation of discrete mutual information can however b...

The KOJAK group finder: Connecting the dots via integrated knowledge-based and statistical reasoning

by Jafar Adibi, Hans Chalupsky, Eric Melz, Andre Valente - In Proceedings of the Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-04), 2004
Cited by 12 (5 self)
Link discovery is a new challenge in data mining whose primary concerns are to identify strong links and discover hidden relationships among entities and organizations based on low-level, incomplete and noisy evidence data. To address this challenge, we are developing a hybrid link discovery system called KOJAK that combines state-of-the-art knowledge representation and reasoning (KR&R) technology with statistical clustering and analysis techniques from the area of data mining. In this paper we report on the architecture and technology of its first fully completed module, called the KOJAK Group Finder. The Group Finder is capable of finding hidden groups and group members in large evidence databases. Our group finding approach addresses a variety of important LD challenges, such as being able to exploit heterogeneous and structurally rich evidence, handling the connectivity curse, noise and corruption, as well as the capability to scale up to very large, realistic data sets. The first version of the KOJAK Group Finder has been successfully tested and evaluated on a variety of synthetic datasets.