Results 1 - 10 of 14
Distribution of mutual information
 Advances in Neural Information Processing Systems 14: Proceedings of the 2002 Conference
, 2002
Abstract

Cited by 43 (12 self)
expectation and variance of mutual information. The mutual information of two random variables i and j with joint probabilities {πij} is commonly used in learning Bayesian nets as well as in many other fields. The chances πij are usually estimated by the empirical sampling frequency nij/n, leading to a point estimate I(nij/n) for the mutual information. To answer questions like “is I(nij/n) consistent with zero?” or “what is the probability that the true mutual information is much larger than the point estimate?” one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second-order) prior distribution p(π) comprising prior information about π. From the prior p(π) one can compute the posterior p(π|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and noninformative priors. For the mean we also
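The contrast between the point estimate I(nij/n) and a Bayesian summary of p(I|n) can be sketched in a few lines of Python. The sketch assumes a uniform Dirichlet prior that adds the same pseudo-count (the hypothetical parameter `prior`) to every cell, and uses the exact posterior mean E[I] = (1/n) Σ nij [ψ(nij+1) − ψ(ni+ + 1) − ψ(n+j + 1) + ψ(n+1)], where the counts already include the pseudo-counts and ψ is the digamma function. This is our own illustration of the Dirichlet-posterior mean, not code from the paper; the `digamma` helper is a hand-rolled approximation so that the block needs no third-party libraries.

```python
import math


def digamma(x):
    """Approximate the digamma function psi(x) for x > 0."""
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def mi_point_estimate(counts):
    """Plug-in estimate I(nij/n), in nats, from a two-way table of counts."""
    n = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return sum(
        (nij / n) * math.log(nij * n / (rows[i] * cols[j]))
        for i, row in enumerate(counts)
        for j, nij in enumerate(row)
        if nij > 0
    )


def mi_posterior_mean(counts, prior=1.0):
    """Exact posterior mean of I under a symmetric Dirichlet prior (assumed)."""
    # Add the (hypothetical) prior pseudo-count to every cell of the table.
    a = [[c + prior for c in row] for row in counts]
    n = sum(map(sum, a))
    rows = [sum(r) for r in a]
    cols = [sum(c) for c in zip(*a)]
    total = 0.0
    for i, row in enumerate(a):
        for j, aij in enumerate(row):
            total += aij * (digamma(aij + 1) - digamma(rows[i] + 1)
                            - digamma(cols[j] + 1) + digamma(n + 1))
    return total / n


print(mi_point_estimate([[10, 10], [10, 10]]))  # 0.0
print(mi_posterior_mean([[10, 10], [10, 10]]))  # small positive value
```

For a 2x2 table with equal counts the point estimate is exactly zero, while the posterior mean is slightly positive, roughly (r−1)(s−1)/(2n) for large n with r and s the numbers of categories; this systematic offset is one reason to look beyond I(nij/n).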
Robust Feature Selection by Mutual Information Distributions
 Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence (UAI-2002)
, 2002
Abstract

Cited by 27 (6 self)
Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.
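A leading-order (delta-method) variance of the mutual information, combined with a credibility-style feature filter, might look like the sketch below. The filter rule (keep a feature when I − z·sd exceeds a threshold, with `eps` and `z` as hypothetical parameters) is a simple normal-approximation illustration and not necessarily the exact criterion used in the paper; for brevity it also centers on the empirical estimate rather than the posterior mean.

```python
import math


def mi_and_variance(counts):
    """Empirical mutual information (nats) and its leading-order variance."""
    n = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    first = 0.0   # sum of p_ij * L_ij, i.e. the point estimate I
    second = 0.0  # sum of p_ij * L_ij ** 2
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij == 0:
                continue
            p = nij / n
            log_ratio = math.log(nij * n / (rows[i] * cols[j]))  # L_ij
            first += p * log_ratio
            second += p * log_ratio ** 2
    # Delta-method variance (sum p L^2 - (sum p L)^2) / n, clamped against rounding.
    variance = max(second - first ** 2, 0.0) / n
    return first, variance


def keep_feature(counts, eps=0.01, z=1.645):
    """Hypothetical filter: keep the feature if I - z * sd exceeds eps."""
    mi, variance = mi_and_variance(counts)
    return mi - z * math.sqrt(variance) > eps


print(keep_feature([[25, 25], [25, 25]]))  # False: no evidence of dependence
print(keep_feature([[40, 10], [10, 40]]))  # True
```

Because the lower bound shrinks toward the point estimate as n grows, such a filter is more cautious than thresholding the empirical value alone when samples are small.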
Distribution of Mutual Information from Complete And Incomplete Data
 Computational Statistics and Data Analysis
, 2004
Abstract

Cited by 14 (2 self)
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^-3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
Conquering hierarchical difficulty by explicit chunking: Substructural chromosome compression
 In Proceedings of the 2006 Genetic and Evolutionary Computation Conference (GECCO 2006) Workshops: International Workshop on Learning Classifier Systems
, 2006
Abstract

Cited by 9 (2 self)
This paper proposes a chromosome compression scheme which represents subsolutions by the most expressive schemata. The proposed chromosome compression scheme is combined with the dependency structure matrix genetic algorithm and the restricted tournament replacement to create a scalable optimization tool which optimizes problems via hierarchical decomposition. One important feature of the proposed method is that at the end of the run, the problem structure obtained from the proposed method is comprehensible to human researchers and is reusable for larger-scale problems. The empirical result shows that the proposed method scales subquadratically with the problem size on hierarchical problems and is able to capture the problem structures accurately.
A Matrix Approach for Finding Extrema: PROBLEMS WITH MODULARITY, HIERARCHY, AND OVERLAP
, 2006
Abstract

Cited by 8 (0 self)
Unlike most simple textbook examples, the real world is full of complex systems, and researchers in many different fields are often confronted by problems arising from such systems. Simple heuristics or even enumeration work quite well on small and easy problems; however, to efficiently solve large and difficult problems, proper decomposition according to the complex system is the key. In this research project, investigating and analyzing interactions between components of complex systems sheds some light on problem decomposition. By recognizing three bare-bone types of interactions (modularity, hierarchy, and overlap), theories and models are developed to dissect and inspect problem decomposition in the context of genetic algorithms. This dissertation presents a research project to develop a competent optimization method to solve boundedly difficult problems with modularity, hierarchy, and overlap by explicit problem decomposition. The proposed genetic algorithm design utilizes a matrix representation of an interaction graph to analyze and decompose the problem. The results from this thesis should benefit research both technically and scientifically. Technically, this thesis develops an automated dependency structure matrix clustering technique and utilizes it to design a competent black-box problem solver. Scientifically, the explicit interaction
Robust inference of trees
 IDSIA, Manno (Lugano), CH, 2003. Marcus Hutter is with the AI research institute IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland. Email: marcus@idsia.ch, HP: http://www.idsia.ch/∼marcus/idsia
, 2003
Abstract

Cited by 7 (7 self)
Abstract. This paper is concerned with the reliable inference of optimal tree approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley’s imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.
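For comparison with the imprecise-Dirichlet method, the traditional approach can be sketched as the Chow-Liu procedure: weight every pair of variables by its empirical mutual information and keep a maximum-weight spanning tree (here built with Kruskal's algorithm and a union-find structure). This sketches that baseline only, not the paper's O(m^4) algorithm for the common substructure; `data` is assumed to be a list of equal-length tuples of discrete values, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_mi(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def chow_liu_edges(data):
    """Maximum-weight spanning tree over pairwise empirical mutual information."""
    m = len(data[0])
    columns = list(zip(*data))
    # Candidate edges sorted from strongest to weakest dependence.
    weighted = sorted(
        ((empirical_mi(columns[a], columns[b]), a, b) for a, b in combinations(range(m), 2)),
        reverse=True,
    )
    parent = list(range(m))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for _, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it joins two components
            parent[ra] = rb
            edges.append((a, b))
    return edges


# Toy data: variables 0 and 1 are copies of each other, as are 2 and 3.
rows = [(a, a, b, b) for a in (0, 1) for b in (0, 1)]
print(chow_liu_edges(rows))
```

With scarce data the empirical weights are noisy and ties or near-ties can flip which edges are chosen, which is the fragility the abstract's interval-based approach is meant to expose.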
Bayesian treatment of incomplete discrete data applied to mutual information and feature selection
 Proceedings of the Twenty-sixth German Conference on Artificial Intelligence (KI-2003), volume 2821 of Lecture Notes in Computer Science
, 2003
Abstract

Cited by 4 (4 self)
Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
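The EM treatment of incomplete data described above can be illustrated for a two-way table in which some samples reveal only the row variable and others only the column variable. The sketch assumes ignorable missingness and a symmetric Dirichlet prior with a single hypothetical hyperparameter `alpha` ≥ 1, so that each M-step returns the posterior mode (alpha = 1 gives the maximum-likelihood estimate). The argument names `only_row` and `only_col` are illustrative, and the fixed iteration count stands in for a proper convergence check; this is not the paper's derivation of means and covariances.

```python
def em_joint(complete, only_row, only_col, alpha=1.0, iters=500):
    """EM estimate of joint chances pi[i][j] from partially observed pairs.

    complete[i][j]: number of samples with both values observed.
    only_row[i]:    number of samples where only the row value i was observed.
    only_col[j]:    number of samples where only the column value j was observed.
    alpha >= 1:     symmetric Dirichlet hyperparameter (posterior mode).
    """
    r, s = len(complete), len(complete[0])
    pi = [[1.0 / (r * s)] * s for _ in range(r)]  # uniform starting point
    for _ in range(iters):
        row_m = [sum(pi[i]) for i in range(r)]
        col_m = [sum(pi[i][j] for i in range(r)) for j in range(s)]
        # E-step: spread each partially observed sample over the missing value
        # in proportion to the current conditional chances.
        expected = [
            [
                complete[i][j]
                + (only_row[i] * pi[i][j] / row_m[i] if row_m[i] > 0 else 0.0)
                + (only_col[j] * pi[i][j] / col_m[j] if col_m[j] > 0 else 0.0)
                + (alpha - 1.0)
                for j in range(s)
            ]
            for i in range(r)
        ]
        # M-step: the Dirichlet posterior mode normalizes the expected counts.
        total = sum(map(sum, expected))
        pi = [[e / total for e in row] for row in expected]
    return pi


print(em_joint([[10, 0], [0, 10]], only_row=[10, 0], only_col=[0, 0]))
```

In the example, the ten samples that reveal only the first row value are gradually attributed to the cell (0, 0), because the complete data give no support to (0, 1).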
Tree-Based Credal Networks for Classification
 Reliable Computing
Abstract

Cited by 4 (0 self)
Bayesian networks are models for uncertain reasoning which are achieving growing importance also for the data mining task of classification. Credal networks extend Bayesian nets to sets of distributions, or credal sets. This paper extends a state-of-the-art Bayesian net for classification, called tree-augmented naive Bayes classifier, to credal sets originated from probability intervals. This extension is a basis to address the fundamental problem of prior ignorance about the distribution that generates the data, which is commonplace in data mining applications. This issue is often neglected, but addressing it properly is key to ultimately drawing reliable conclusions from the inferred models. In this paper we formalize the new model, develop an exact linear-time classification algorithm, and evaluate the credal net-based classifier on a number of real data sets. The empirical analysis shows that the new classifier is good and reliable, and raises a problem of excessive caution that is discussed in the paper. Overall, given the favorable trade-off between expressiveness and efficient computation, the newly proposed classifier appears to be a good candidate for the wide-scale application of reliable classifiers based on credal networks to real and complex tasks.
Algorithms
, 2006
Abstract
This paper presents a population-sizing model for entropy-based model building in genetic algorithms. Specifically, the population size required for building an accurate model is investigated. The effect of the selection pressure on population sizing is also incorporated. The proposed model indicates that the population size required for building an accurate model scales as Θ(m log m), where m is the number of substructures and is proportional to the problem size. Experiments are conducted to verify the derivations, and the results agree with the proposed model.
unknown title
, 2004
Abstract
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^-3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.