Distribution of Mutual Information from Complete And Incomplete Data
- Computational Statistics and Data Analysis, 2004
- Cited by 12 (2 self)
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^{-3}), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
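As a rough illustration of the two quantities this abstract contrasts, the sketch below computes the descriptive (empirical) mutual information of a contingency table next to the posterior mean under a Dirichlet prior. It uses the closed form that follows from the Dirichlet moments, E[I] = (1/n) sum_ij n_ij [psi(n_ij + 1) - psi(n_i+ + 1) - psi(n_+j + 1) + psi(n + 1)], where n_ij is the observed count plus the prior mass and psi is the digamma function. The function names, the symmetric `prior` parameter, and the hand-written `digamma` routine are assumptions made for this example and are not taken from the paper.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def empirical_mi(counts):
    """Descriptive mutual information (in nats) of a table of counts."""
    n = sum(map(sum, counts))
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    total = 0.0
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij > 0:
                total += (nij / n) * math.log(nij * n / (row_tot[i] * col_tot[j]))
    return total


def posterior_mean_mi(counts, prior=1.0):
    """Posterior mean of mutual information under a symmetric Dirichlet prior.

    `prior` is the pseudo-count added to every cell (an assumption of this sketch).
    """
    table = [[c + prior for c in row] for row in counts]
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij > 0:
                total += nij * (digamma(nij + 1) - digamma(row_tot[i] + 1)
                                - digamma(col_tot[j] + 1) + digamma(n + 1))
    return total / n


if __name__ == "__main__":
    table = [[10, 10], [10, 10]]   # hypothetical counts with no visible dependence
    print("empirical MI:     ", empirical_mi(table))
    print("posterior mean MI:", posterior_mean_mi(table))
```

On a perfectly balanced table the descriptive value is zero, while the posterior mean stays positive because mutual information is non-negative for every distribution the posterior supports. Both functions make a single pass over the table, which is consistent with the abstract's remark about equal order of complexity.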
Co-evolutionary modular neural networks for automatic problem decomposition
- in Proc. of The 2005 IEEE Congress on Evolutionary Computation, 2005
- Cited by 5 (1 self)
Decomposing a complex computational problem into sub-problems, which are computationally simpler to solve individually and which can be combined to produce a solution to the full problem, can efficiently lead to compact and general solutions. Modular neural networks represent one of the ways in which this divide-and-conquer strategy can be implemented. Here we present a co-evolutionary model which is used to design and optimize modular neural networks with task-specific modules. The model consists of two populations. The first population consists of a pool of modules and the second population synthesizes complete systems by drawing elements from the pool of modules. Modules represent a part of the solution, which co-operates with others in the module population to form a complete solution. With the help of two artificial supervised learning tasks created by mixing two sub-tasks we demonstrate that if a particular task decomposition is better in terms of performance on the overall task, it can be evolved using this co-evolutionary model.
Auto-supervised learning in the Bayesian programming framework
- in The AAAI Spring Symposium on Developmental Robotics, 2005
- Cited by 5 (0 self)
Domestic and real-world robotics requires continuous learning of new skills and behaviors to interact with humans. Auto-supervised learning, a compromise between supervised and completely unsupervised learning, consists in relying on previous knowledge to acquire new skills. We propose here to realize auto-supervised learning by exploiting statistical regularities in the sensorimotor space of a robot. In our context, this corresponds to performing feature selection in a Bayesian programming framework. We compare several feature selection algorithms and validate them on a real robotic experiment.
Bayesian treatment of incomplete discrete data applied to mutual information and feature selection
- Proceedings of the Twenty-sixth German Conference on Artificial Intelligence (KI-2003), volume 2821 of Lecture Notes in Computer Science, 2003
- Cited by 4 (4 self)
Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
Tree-Based Credal Networks for Classification
- Reliable Computing
- Cited by 2 (0 self)
Bayesian networks are models for uncertain reasoning which are achieving a growing importance also for the data mining task of classification. Credal networks extend Bayesian nets to sets of distributions, or credal sets. This paper extends a state-of-the-art Bayesian net for classification, called tree-augmented naive Bayes classifier, to credal sets originated from probability intervals. This extension is a basis to address the fundamental problem of prior ignorance about the distribution that generates the data, which is commonplace in data mining applications. This issue is often neglected, but addressing it properly is a key to ultimately draw reliable conclusions from the inferred models. In this paper we formalize the new model, develop an exact linear-time classification algorithm, and evaluate the credal net-based classifier on a number of real data sets. The empirical analysis shows that the new classifier is good and reliable, and raises a problem of excessive caution that is discussed in the paper. Overall, given the favorable trade-off between expressiveness and efficient computation, the newly proposed classifier appears to be a good candidate for the wide-scale application of reliable classifiers based on credal networks, to real and complex tasks.
Lautum Information
- Cited by 2 (0 self)
A popular way to measure the degree of dependence between two random objects is by their mutual information, defined as the divergence between the joint and product-of-marginal distributions. We investigate an alternative measure of dependence: the lautum information defined as the divergence between the product-of-marginal and joint distributions, i.e., swapping the arguments in the definition of mutual information. Some operational characterizations and properties are provided for this alternative measure of information. Index Terms: divergence, hypothesis testing, information measures, Kelly gambling, mutual information.
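The swap of arguments described in this abstract can be sketched for discrete variables as follows. This is a minimal example under assumed names (`mutual_information`, `lautum_information`) and a hypothetical 2x2 joint distribution; it takes I(X;Y) = D(P_XY || P_X P_Y) and L(X;Y) = D(P_X P_Y || P_XY) directly from the definitions quoted above and uses natural logarithms.

```python
import math


def marginals(joint):
    """Row and column marginals of a joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def mutual_information(joint):
    """D(P_XY || P_X P_Y) in nats; cells with zero joint mass contribute nothing."""
    px, py = marginals(joint)
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


def lautum_information(joint):
    """D(P_X P_Y || P_XY) in nats: the same divergence with arguments swapped."""
    px, py = marginals(joint)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            q = px[i] * py[j]
            if q == 0:
                continue            # no product mass here, term is zero
            if p == 0:
                return math.inf     # product puts mass where the joint has none
            total += q * math.log(q / p)
    return total


if __name__ == "__main__":
    joint = [[0.4, 0.1], [0.1, 0.4]]   # hypothetical dependent pair
    print("mutual information:", mutual_information(joint))
    print("lautum information:", lautum_information(joint))
```

Both quantities vanish for independent variables, but they generally differ otherwise, and the lautum information becomes infinite as soon as the joint distribution assigns zero probability to a cell that both marginals support.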
W-operator Window Design by Maximization of Training Data Information
for the estimation of a W-operator from training data. The idea is to choose a subset of variables that maximizes the information observed in a set of training data. The task is formalized as a combinatorial optimization problem, where the search space is the power set of the candidate variables and the measure to be minimized is the mean entropy of the estimated conditional probabilities. As a full exploration of the search space requires an enormous computational effort, some heuristics of the feature selection literature are applied. The proposed technique is mathematically sound and experimental results show that it is adequate in practice.
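The criterion named in this abstract, the mean entropy of the estimated conditional probabilities, can be sketched as below. This is an illustrative example only: the function names, the (x_tuple, y) sample format, and the exhaustive search over subsets of one fixed size are assumptions of this sketch, not the paper's procedure, which applies heuristics precisely because a full search is too costly.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def mean_conditional_entropy(samples, subset):
    """Empirical H(Y | X_subset) in bits; samples are (x_tuple, y) pairs."""
    by_pattern = defaultdict(Counter)
    for x, y in samples:
        # Group labels by the values observed at the selected variable positions.
        by_pattern[tuple(x[i] for i in subset)][y] += 1
    total = len(samples)
    h = 0.0
    for label_counts in by_pattern.values():
        m = sum(label_counts.values())
        h_pattern = -sum((c / m) * math.log2(c / m) for c in label_counts.values())
        h += (m / total) * h_pattern   # weight by how often the pattern occurs
    return h


def best_window(samples, n_vars, size):
    """Exhaustive search over all subsets of a given size (small n_vars only)."""
    return min(combinations(range(n_vars), size),
               key=lambda s: mean_conditional_entropy(samples, s))


if __name__ == "__main__":
    # Hypothetical training data: y = x0 XOR x2, with x1 irrelevant.
    samples = [((a, b, c), a ^ c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    print(best_window(samples, n_vars=3, size=2))
```

Fixing the subset size matters in this sketch: on a finite sample, adding variables never increases the empirical conditional entropy, so an unconstrained minimization would drift toward the largest window.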
Is feature selection worth the effort? Assessing the impact on Random Forest and SVM
"... accuracy and computation time ..."
Conditional-loglikelihood MDL and Evolutionary MCMC
- PhD thesis, 2006
In the current society there is an increasing interest in intelligent techniques that can automatically process, analyze, and summarize the ever growing amount of data. Artificial intelligence is a research field that studies intelligent algorithms to support people in making decisions. Algorithms that are able to induce knowledge from examples are researched in the field of machine learning. This thesis studies improvements of particular machine learning algorithms. In the first part of this thesis we describe methods that are able to select useful attributes (or features) that can be used as inputs by a classification algorithm. We focus on Bayesian network classifiers that use Bayesian networks as knowledge representation and, more in particular, on selecting relevant attributes that should be used as inputs for the Bayesian network classifier. For our goal to construct selective Bayesian network classifiers, we propose and investigate a score function that can evaluate Bayesian network classifiers and that indicates the simplest and the most performant classifier. We theoretically and experimentally show that our proposed conditional log-likelihood minimum description length (MDL) is well suited for constructing simple and well performing Bayesian network classifiers. In the second part of this thesis we integrate some methods from evolutionary computation into a Markov chain Monte Carlo (MCMC) sampler. Sampling is related to optimization, but whereas in optimization we are only interested in the state with the highest fitness, in sampling we are interested in the overall probability distribution over states. To improve MCMC methods that are often used for sampling, we investigate the Evolutionary MCMC (EMCMC) framework, where population-based MCMCs exchange information between the individual states such that they are still MCMCs at population level. We investigate and propose various evolutionary techniques (e.g. recombination, selection) which we then integrate in the EMCMC framework. We experimentally show that our proposed EMCMCs can outperform the standard MCMC algorithms.

