Results 1–10 of 18
Distribution of Mutual Information from Complete and Incomplete Data
Computational Statistics and Data Analysis, 2004
Cited by 14 (2 self)
Abstract
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^-3), where n is the sample size. Leading-order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
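As a rough illustration of the quantities involved, the sketch below computes descriptive mutual information from a contingency table and adds a (r-1)(s-1)/(2n) leading-order correction relating the empirical value to the posterior mean. This is a minimal sketch under a uniform-prior assumption, not the paper's exact or higher-order expressions.

```python
import math

def empirical_mi(counts):
    """Descriptive mutual information (in nats) from a contingency table."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij > 0:
                mi += (nij / n) * math.log(nij * n / (row_tot[i] * col_tot[j]))
    return mi

def posterior_mean_mi_approx(counts):
    """Leading-order approximation to the posterior mean of MI: empirical MI
    plus a (r-1)(s-1)/(2n) term (an assumption here; the paper derives exact
    and higher-order expressions)."""
    n = sum(sum(row) for row in counts)
    r, s = len(counts), len(counts[0])
    return empirical_mi(counts) + (r - 1) * (s - 1) / (2 * n)
```

For an independent 2x2 table the descriptive value is zero, while the approximate posterior mean is positive, reflecting the sample-to-population uncertainty the paper quantifies.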
Coevolutionary modular neural networks for automatic problem decomposition
in Proc. of the 2005 IEEE Congress on Evolutionary Computation, 2005
Cited by 8 (1 self)
Abstract
Decomposing a complex computational problem into subproblems, which are computationally simpler to solve individually and which can be combined to produce a solution to the full problem, can efficiently lead to compact and general solutions. Modular neural networks represent one of the ways in which this divide-and-conquer strategy can be implemented. Here we present a coevolutionary model which is used to design and optimize modular neural networks with task-specific modules. The model consists of two populations. The first population consists of a pool of modules, and the second population synthesizes complete systems by drawing elements from the pool of modules. Modules represent a part of the solution, which cooperates with others in the module population to form a complete solution. With the help of two artificial supervised learning tasks created by mixing two subtasks, we demonstrate that if a particular task decomposition is better in terms of performance on the overall task, it can be evolved using this coevolutionary model.
Lautum Information
Cited by 6 (1 self)
Abstract
A popular way to measure the degree of dependence between two random objects is by their mutual information, defined as the divergence between the joint and product-of-marginal distributions. We investigate an alternative measure of dependence: the lautum information, defined as the divergence between the product-of-marginal and joint distributions, i.e., swapping the arguments in the definition of mutual information. Some operational characterizations and properties are provided for this alternative measure of information. Index Terms—Divergence, hypothesis testing, information measures, Kelly gambling, mutual information.
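The two divergences can be contrasted directly for a finite joint distribution. The sketch below assumes a strictly positive joint (lautum information is infinite when the joint puts zero mass where the product of marginals does not) and is only illustrative.

```python
import math

def mutual_and_lautum(joint):
    """Mutual information D(P_XY || P_X x P_Y) and lautum information
    D(P_X x P_Y || P_XY) for a discrete joint distribution (in nats)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = sum(p * math.log(p / (px[i] * py[j]))
             for i, row in enumerate(joint)
             for j, p in enumerate(row) if p > 0)
    # Note the swapped arguments: the product of marginals now weights the log
    # ratio, so a zero joint cell with positive marginals makes this infinite.
    lautum = sum(px[i] * py[j] * math.log(px[i] * py[j] / joint[i][j])
                 for i in range(len(px)) for j in range(len(py))
                 if px[i] * py[j] > 0)
    return mi, lautum
```

Both quantities vanish exactly when the variables are independent, and both are nonnegative, being Kullback-Leibler divergences.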
Auto-supervised learning in the Bayesian programming framework
in The AAAI Spring Symposium on Developmental Robotics, 2005
Cited by 5 (0 self)
Abstract
Domestic and real-world robotics requires continuous learning of new skills and behaviors to interact with humans. Auto-supervised learning, a compromise between supervised and completely unsupervised learning, consists of relying on previous knowledge to acquire new skills. We propose here to realize auto-supervised learning by exploiting statistical regularities in the sensorimotor space of a robot. In our context, this corresponds to performing feature selection in a Bayesian programming framework. We compare several feature selection algorithms and validate them in a real robotic experiment.
Bayesian treatment of incomplete discrete data applied to mutual information and feature selection
Proceedings of the Twenty-sixth German Conference on Artificial Intelligence (KI-2003), volume 2821 of Lecture Notes in Computer Science, 2003
Cited by 4 (4 self)
Abstract
Given the joint chances of a pair of random variables, one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second-order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation-maximization (EM) algorithm. The two different methods above are well established but typically kept separate. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
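A toy version of the combination described above might look as follows: an EM loop that fills in expected cell counts for records with one of the two variables missing, with symmetric Dirichlet pseudo-counts (the symmetric alpha is an assumption here) standing in for the paper's posterior-mean expressions.

```python
def em_chances(n_xy, n_x_only, n_y_only, alpha=1.0, iters=200):
    """EM estimate of joint chances theta[i][j] from complete cell counts n_xy,
    plus counts where only X (n_x_only) or only Y (n_y_only) was observed.
    Uses Dirichlet pseudo-counts alpha in the update (a sketch, not the
    paper's exact posterior expressions)."""
    r, s = len(n_xy), len(n_xy[0])
    K = r * s
    n = sum(sum(row) for row in n_xy)
    total = n + sum(n_x_only) + sum(n_y_only)
    # initialize from the complete-data posterior mean
    theta = [[(n_xy[i][j] + alpha) / (n + alpha * K) for j in range(s)]
             for i in range(r)]
    for _ in range(iters):
        # E-step: distribute each incomplete record over the missing variable
        filled = [[n_xy[i][j] for j in range(s)] for i in range(r)]
        for i in range(r):
            row_p = sum(theta[i])
            for j in range(s):
                filled[i][j] += n_x_only[i] * theta[i][j] / row_p
        for j in range(s):
            col_p = sum(theta[i][j] for i in range(r))
            for i in range(r):
                filled[i][j] += n_y_only[j] * theta[i][j] / col_p
        # M-step: posterior-mean-style update with pseudo-counts
        theta = [[(filled[i][j] + alpha) / (total + alpha * K)
                  for j in range(s)] for i in range(r)]
    return theta
```

With no incomplete records this reduces to the usual Dirichlet posterior mean (n_ij + alpha)/(n + alpha K); the unimodality result in the abstract is what guarantees, in the paper's framework, that such an EM iteration converges to the global maximum.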
Tree-Based Credal Networks for Classification
 Reliable Computing
Cited by 4 (0 self)
Abstract
Bayesian networks are models for uncertain reasoning which are of growing importance also for the data mining task of classification. Credal networks extend Bayesian nets to sets of distributions, or credal sets. This paper extends a state-of-the-art Bayesian net for classification, called the tree-augmented naive Bayes classifier, to credal sets originated from probability intervals. This extension is a basis to address the fundamental problem of prior ignorance about the distribution that generates the data, which is commonplace in data mining applications. This issue is often neglected, but addressing it properly is key to ultimately drawing reliable conclusions from the inferred models. In this paper we formalize the new model, develop an exact linear-time classification algorithm, and evaluate the credal net-based classifier on a number of real data sets. The empirical analysis shows that the new classifier is good and reliable, and raises a problem of excessive caution that is discussed in the paper. Overall, given the favorable tradeoff between expressiveness and efficient computation, the newly proposed classifier appears to be a good candidate for the wide-scale application of reliable classifiers based on credal networks to real and complex tasks.
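A much-simplified sketch of the credal idea, for a single multinomial class variable rather than the paper's tree-augmented network: class counts induce probability intervals via the imprecise Dirichlet model, and only classes that are not interval-dominated are returned, so the classifier may cautiously output several candidates. Both the dominance criterion and the parameter s below are illustrative assumptions.

```python
def non_dominated_classes(counts, s=2.0):
    """Credal classification sketch via the imprecise Dirichlet model:
    each class probability lies in [n_c / (n + s), (n_c + s) / (n + s)].
    A class is dropped only if some other class's lower bound exceeds its
    upper bound (interval dominance); the survivors are the cautious output."""
    n = sum(counts)
    lower = [c / (n + s) for c in counts]
    upper = [(c + s) / (n + s) for c in counts]
    return [i for i in range(len(counts))
            if not any(lower[j] > upper[i]
                       for j in range(len(counts)) if j != i)]
```

When the data clearly favor one class the output is a singleton; when two classes have overlapping intervals both survive, which is exactly the "excessive caution" tradeoff the abstract mentions.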
W-operator Window Design by Maximization of Training Data Information
Abstract
for the estimation of a W-operator from training data. The idea is to choose a subset of variables that maximizes the information observed in a set of training data. The task is formalized as a combinatorial optimization problem, where the search space is the power set of the candidate variables and the measure to be minimized is the mean entropy of the estimated conditional probabilities. As a full exploration of the search space requires an enormous computational effort, some heuristics from the feature selection literature are applied. The proposed technique is mathematically sound, and experimental results show that it is adequate in practice.
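The selection criterion can be sketched as follows: the mean conditional entropy of the label given an observed pattern, weighted by pattern frequency, with a simple greedy forward search standing in for the heuristics mentioned (the function names and the greedy strategy are illustrative assumptions, not the paper's algorithm).

```python
import math
from collections import Counter, defaultdict

def mean_conditional_entropy(patterns, labels):
    """Mean entropy (bits) of the label given the observed pattern,
    weighted by how often each pattern occurs in the training data."""
    groups = defaultdict(Counter)
    for x, y in zip(patterns, labels):
        groups[x][y] += 1
    n = len(labels)
    h = 0.0
    for cnt in groups.values():
        m = sum(cnt.values())
        hx = -sum((c / m) * math.log2(c / m) for c in cnt.values())
        h += (m / n) * hx
    return h

def greedy_select(X, y, k):
    """Forward selection: repeatedly add the variable whose inclusion
    yields the lowest mean conditional entropy."""
    chosen, remaining = [], list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda v: mean_conditional_entropy(
            [tuple(row[i] for i in chosen + [v]) for row in X], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Minimizing this mean entropy is equivalent to maximizing the information the selected window carries about the output, which is the formulation in the abstract.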
Is feature selection worth the effort? Assessing the impact on Random Forest and SVM accuracy and computation time
Conditional log-likelihood MDL and Evolutionary MCMC
PhD thesis, 2006
Abstract
There is increasing interest in intelligent techniques that can automatically process, analyze, and summarize the ever-growing amount of data. Artificial intelligence is a research field that studies intelligent algorithms to support people in making decisions. Algorithms that are able to induce knowledge from examples are researched in the field of machine learning. This thesis studies improvements of particular machine learning algorithms. In the first part of this thesis we describe methods that are able to select useful attributes (or features) that can be used as inputs by a classification algorithm. We focus on Bayesian network classifiers that use Bayesian networks as knowledge representation and, in particular, on selecting relevant attributes that should be used as inputs for the Bayesian network classifier. For our goal of constructing selective Bayesian network classifiers, we propose and investigate a score function that can evaluate Bayesian network classifiers and that indicates the simplest and best-performing classifier. We theoretically and experimentally show that our proposed conditional log-likelihood minimum description length (MDL) score is well suited for constructing simple and well-performing Bayesian network classifiers. In the second part of this thesis we integrate some methods from evolutionary computation into a Markov chain Monte Carlo (MCMC) sampler. Sampling is related to optimization, but whereas in optimization we are only interested in the state with the highest fitness, in sampling we are interested in the overall probability distribution over states. To improve MCMC methods, which are often used for sampling, we investigate the Evolutionary MCMC (EMCMC) framework, where population-based MCMCs exchange information between the individual states such that they remain MCMCs at the population level. We investigate and propose various evolutionary techniques (e.g. recombination, selection) which we then integrate in the EMCMC framework. We experimentally show that our proposed EMCMCs can outperform standard MCMC algorithms.
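As one classic instance of a population-based sampler with state exchange (parallel tempering, not the thesis's EMCMC operators, so purely illustrative): two Metropolis chains run at different temperatures and occasionally swap states, with an acceptance rule that keeps the population a valid MCMC for the joint target.

```python
import math
import random

def tempered_sampler(logp, temps=(1.0, 4.0), steps=2000, step_size=1.0, seed=0):
    """Two Metropolis chains at different temperatures; an exchange move swaps
    their states with the standard replica-exchange acceptance probability."""
    rng = random.Random(seed)
    x = [0.0 for _ in temps]           # one state per chain
    cold = []                          # samples from the temperature-1 chain
    for _ in range(steps):
        for k, t in enumerate(temps):  # within-chain Metropolis update
            prop = x[k] + rng.gauss(0.0, step_size)
            if math.log(rng.random() + 1e-300) < (logp(prop) - logp(x[k])) / t:
                x[k] = prop
        # exchange move: information flows between chains, as in EMCMC,
        # while detailed balance holds at the population level
        log_a = (logp(x[1]) - logp(x[0])) * (1.0 / temps[0] - 1.0 / temps[1])
        if math.log(rng.random() + 1e-300) < log_a:
            x[0], x[1] = x[1], x[0]
        cold.append(x[0])
    return cold
```

The hot chain crosses low-probability barriers easily and passes good states to the cold chain through the swap move, which is the kind of between-state information exchange the EMCMC framework generalizes with recombination and selection.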