Results 1 - 10 of 26
Robust Feature Selection by Mutual Information Distributions
Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence (UAI-2002), 2002
Cited by 28 (6 self)
Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.
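The leading-order behaviour of the Bayesian mean discussed in this abstract can be sketched in a few lines. The sketch below is illustrative only (the function names are mine, not the paper's): it computes the empirical mutual information from pair counts and adds the well-known first-order term (r-1)(s-1)/(2n) of the posterior mean under a Dirichlet prior; the paper itself derives the exact mean and variance approximations.

```python
import math
from collections import Counter

def empirical_mi(pairs):
    """Plug-in (descriptive) mutual information, in nats, of two discrete
    variables observed as a list of (x, y) samples."""
    n = len(pairs)
    nxy = Counter(pairs)
    nx = Counter(x for x, _ in pairs)
    ny = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (nx[x] * ny[y]))
               for (x, y), c in nxy.items())

def bayesian_mean_mi(pairs):
    """Leading-order posterior mean under a Dirichlet prior: the empirical
    MI plus (r - 1)(s - 1) / (2 n), where r and s are the numbers of
    observed categories of the two variables."""
    n = len(pairs)
    r = len({x for x, _ in pairs})
    s = len({y for _, y in pairs})
    return empirical_mi(pairs) + (r - 1) * (s - 1) / (2 * n)
```

For independent, balanced data the empirical MI is zero while the posterior mean is shifted up by the small-sample term, which mirrors the well-known positive bias of the plug-in estimator.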
Distribution of Mutual Information from Complete And Incomplete Data
Computational Statistics and Data Analysis, 2004
Cited by 14 (2 self)
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n⁻³), where n is the sample size. Leading-order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
Statistical Measurement of Information Leakage
Cited by 9 (3 self)
Abstract. Information theory provides a range of useful methods to analyse probability distributions and these techniques have been successfully applied to measure information flow and the loss of anonymity in secure systems. However, previous work has tended to assume that the exact probabilities of every action are known, or that the system is nondeterministic. In this paper, we show that measures of information leakage based on mutual information and capacity can be calculated, automatically, from trial runs of a system alone. We find a confidence interval for this estimate based on the number of possible inputs, observations and samples. We have developed a tool to automatically perform this analysis and we demonstrate our method by analysing a Mixminion anonymous remailer node.
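The estimation step described here, measuring leakage from trial runs alone, can be illustrated with a plug-in estimator. The sketch below is a toy under stated assumptions (function and variable names are mine): it drives a black-box system with uniformly chosen inputs and computes the empirical mutual information of the resulting input/output pairs in bits. The paper's actual contribution, a confidence interval in terms of the numbers of inputs, observations and samples, is not reproduced here.

```python
import math
import random
from collections import Counter

def estimate_leakage(system, inputs, trials=2000, rng=random.Random(0)):
    """Estimate the information leakage (in bits) of a black-box `system`
    as the plug-in mutual information between uniformly chosen inputs and
    the observed outputs over repeated trial runs."""
    pairs = []
    for _ in range(trials):
        x = rng.choice(inputs)
        pairs.append((x, system(x)))
    n = len(pairs)
    nxy = Counter(pairs)
    nx = Counter(p[0] for p in pairs)
    ny = Counter(p[1] for p in pairs)
    return sum(c / n * math.log2(c * n / (nx[x] * ny[y]))
               for (x, y), c in nxy.items())

# A deterministic system that echoes its input leaks (up to sampling noise)
# the full log2(#inputs) bits:
leak = estimate_leakage(lambda x: x, [0, 1, 2, 3])
```

With enough trials the estimate for the echoing system approaches 2 bits from below; the interesting cases are noisy systems, where the estimator's bias is exactly what the paper's interval accounts for.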
Bayesian and Information-Theoretic Tools for Neuroscience
School of Psychology, University of, 2006
Cited by 9 (7 self)
The overarching purpose of the studies presented in this report is the exploration of the uses of information theory and Bayesian inference applied to neural codes. Two approaches were taken: first, starting from first principles, a coding mechanism is proposed and its results are compared to a biological neural code; second, tools from information theory are used to measure the information contained in a biological neural code. Chapter 3: The REC model proposed by Harpur and Prager [33] codes inputs into a sparse, factorial representation, maintaining reconstruction accuracy. Here I propose a modification of the REC model to determine the optimal network dimensionality. The resulting code for unfiltered natural images is accurate, highly sparse, and a large fraction of the code elements show localized features. Furthermore, I propose an activation algorithm for the network that is faster and more accurate than a gradient-descent-based activation method. Moreover, it is demonstrated that asymmetric noise promotes sparseness. Chapter 4: A fast, exact alternative to Bayesian classification is introduced. Computational time is quadratic in both the number of observed data points and the number of degrees of freedom of the underlying model. As an example application, responses of single neurons from high-level visual cortex (area STSa) to rapid sequences of complex visual stimuli are analyzed. Chapter 5: I present an exact Bayesian treatment of a simple, yet sufficiently general probability distribution model. The model complexity, exact values of the expectations of entropies and their variances can be computed with polynomial effort given the data. The expectation of the mutual information thus becomes available too, together with a strict upper bound on its variance. The resulting algorithm is first tested on artificial data. To that end, an information-theoretic similarity measure is derived. Second, the algorithm is demonstrated to be useful in neuroscience by studying the information content of the neural responses analyzed in the previous chapter. It is shown that the information throughput of STS neurons is maximized for stimulus durations of ≈ 60 ms.
Robust inference of trees
IDSIA, Manno (Lugano), CH, 2003. Marcus Hutter is with the AI research institute IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland. Email: marcus@idsia.ch, HP: http://www.idsia.ch/~marcus/idsia
Cited by 7 (7 self)
Abstract. This paper is concerned with the reliable inference of optimal tree approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley’s imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in a posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m⁴), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.
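The classical, non-robust starting point of this paper, scoring edges by point estimates of mutual information and taking a maximum-weight spanning tree (the Chow-Liu construction), can be sketched as follows. This is illustrative only (function names are mine), and it is precisely the step the paper replaces with interval-valued mutual information under the imprecise Dirichlet model.

```python
import math
from collections import Counter

def mi(data, i, j):
    """Empirical mutual information, in nats, between columns i and j of
    `data`, a list of equal-length tuples of discrete values."""
    n = len(data)
    nij = Counter((row[i], row[j]) for row in data)
    ni = Counter(row[i] for row in data)
    nj = Counter(row[j] for row in data)
    return sum(c / n * math.log(c * n / (ni[a] * nj[b]))
               for (a, b), c in nij.items())

def chow_liu_tree(data, m):
    """Maximum-weight spanning tree over pairwise MI, grown with Prim's
    algorithm: repeatedly add the heaviest edge crossing the cut between
    variables already in the tree and those outside it."""
    w = {(i, j): mi(data, i, j) for i in range(m) for j in range(i + 1, m)}
    in_tree, edges = {0}, []
    while len(in_tree) < m:
        best = max(((i, j) for (i, j) in w
                    if (i in in_tree) != (j in in_tree)), key=w.get)
        edges.append(best)
        in_tree.update(best)
    return edges
```

Computing all pairwise weights dominates the cost; the paper's robust variant additionally has to reason about which edges stay in the tree across an entire set of plausible MI values.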
Robust estimators under the Imprecise Dirichlet Model (extended version), 2003
Cited by 7 (1 self)
Walley’s Imprecise Dirichlet Model (IDM) for categorical data overcomes several fundamental problems from which other approaches to uncertainty suffer. Yet, to be useful in practice, one needs efficient ways of computing the imprecise/robust sets or intervals. The main objective of this work is to derive exact, conservative, and approximate robust and credible interval estimates under the IDM for a large class of statistical estimators, including the entropy and mutual information.
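The most elementary instance of such an interval is Walley's IDM bound for the chance of a single category, which can be written down directly; the work described here derives analogous intervals for derived quantities such as entropy and mutual information. A minimal sketch (function name mine):

```python
def idm_chance_interval(counts, j, s=2):
    """Walley's IDM lower/upper expectations for the chance of category j,
    given observed counts: [n_j / (N + s), (n_j + s) / (N + s)], where s is
    the prior strength (s = 1 or s = 2 are the common choices)."""
    n_j = counts[j]
    n = sum(counts.values())
    return n_j / (n + s), (n_j + s) / (n + s)
```

The interval width s / (N + s) shrinks with the sample size N, and the robust interval for a nonlinear functional such as entropy is obtained by optimizing that functional over all Dirichlet posteriors in the IDM set, which is what makes efficient computation nontrivial.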
An approximation to the distribution of finite sample size mutual information estimates
ICC, 2004
Cited by 6 (1 self)
Abstract — In this paper, the distribution of mutual information between two discrete random variables is approximated by means of a second-order Taylor series expansion. Approximate expressions for the distribution of mutual information (MI) between independent random variables, conditional MI between conditionally independent variables, and MI between (weakly) dependent random variables are derived. These distributions are functions only of the available sample size and the number of realisations of the random variables; knowledge of the variables’ PMF is not required. The results are verified numerically for various cases. Example applications in statistics and communications engineering are proposed.
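A standard result underlying such approximations: for independent variables, the plug-in MI scaled by 2n is asymptotically chi-square distributed with (r-1)(c-1) degrees of freedom (the G-test statistic), which depends only on the sample size and the numbers of realisations. A minimal sketch (names mine) computing that statistic from samples:

```python
import math
from collections import Counter

def g_statistic(pairs):
    """Return (G, dof), where G = 2 * n * I_hat with I_hat the empirical
    mutual information in nats. Under independence, G is asymptotically
    chi-square with (r - 1)(c - 1) degrees of freedom."""
    n = len(pairs)
    nxy = Counter(pairs)
    nx = Counter(p[0] for p in pairs)
    ny = Counter(p[1] for p in pairs)
    i_hat = sum(c / n * math.log(c * n / (nx[x] * ny[y]))
                for (x, y), c in nxy.items())
    dof = (len(nx) - 1) * (len(ny) - 1)
    return 2 * n * i_hat, dof
```

Comparing G against chi-square quantiles for the given dof yields an independence test; the paper's expansions refine this picture for conditional and weakly dependent cases.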
Fast nonparametric Bayesian inference on infinite trees
In Proc. 15th International Conference on Artificial Intelligence and Statistics (AISTATS-2005), 2005
Cited by 6 (4 self)
Given i.i.d. data from an unknown distribution, we consider the problem of predicting future items. An adaptive way to estimate the probability density is to recursively subdivide the domain to an appropriate data-dependent granularity. A Bayesian would assign a data-independent prior probability to “subdivide”, which leads to a prior over infinite(ly many) trees. We derive an exact, fast, and simple inference algorithm for such a prior, for the data evidence, the predictive distribution, the effective model dimension, and other quantities.
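The adaptive-granularity idea can be illustrated with a naive, non-Bayesian recursion that simply halves an interval while it still contains enough points. This is only a caricature (names and the threshold are mine): the paper instead places a prior over infinitely deep subdivision trees and computes the exact posterior quantities.

```python
def subdivide(xs, lo, hi, min_count=8, max_depth=12):
    """Recursively halve [lo, hi) while it holds at least `min_count`
    points, returning leaf intervals with their counts. Dense regions
    end up finely resolved, sparse regions coarsely."""
    if len(xs) < min_count or max_depth == 0:
        return [((lo, hi), len(xs))]
    mid = (lo + hi) / 2
    left = [x for x in xs if x < mid]
    right = [x for x in xs if x >= mid]
    return (subdivide(left, lo, mid, min_count, max_depth - 1)
            + subdivide(right, mid, hi, min_count, max_depth - 1))
```

Dividing each leaf count by the total sample size and the leaf width gives a crude density estimate; the hard part, which the paper solves exactly, is averaging over all possible subdivision trees rather than committing to one.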
A Comprehensive Evaluation of Mutual Information Analysis Using a Fair Evaluation Framework
Advances in Cryptology - CRYPTO 2011, LNCS, Springer Berlin, 2011
Cited by 6 (4 self)
Abstract. The resistance of cryptographic implementations to side channel analysis is a matter of considerable interest to those concerned with information security. It is particularly desirable to identify the attack methodology (e.g. differential power analysis using correlation or distance-of-means as the distinguisher) able to produce the best results. Attempts to answer this question are complicated by the many and varied factors contributing to attack success: the device power consumption characteristics, an attacker's power model, the distinguisher by which measurements and model predictions are compared, the quality of the estimations, and so on. Previous work has delivered partial answers for certain restricted scenarios. In this paper we assess the effectiveness of mutual information analysis within a generic and comprehensive evaluation framework. Complementary to existing work, we present several notions/characterisations of attack success, as well as a means of indicating the amount of data required by an attack. We are thus able to identify scenarios in which mutual information offers performance advantages over other distinguishers. Furthermore we observe an interesting feature unique to the mutual-information-based distinguisher resembling a type of stochastic resonance, which could potentially enhance the effectiveness of such attacks over other methods in certain noisy scenarios.
Bayesian treatment of incomplete discrete data applied to mutual information and feature selection
Proceedings of the Twenty-sixth German Conference on Artificial Intelligence (KI-2003), volume 2821 of Lecture Notes in Computer Science, 2003
Cited by 4 (4 self)
Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second-order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
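The EM part of this combination can be sketched for a pair of discrete variables where some samples have the second coordinate missing (assumed missing at random, i.e. ignorable). The sketch is illustrative (names mine) and uses plain maximum likelihood rather than the paper's Dirichlet-posterior quantities.

```python
def em_joint_chances(samples, xs, ys, iters=100):
    """EM for the joint chances theta(x, y) from incomplete samples, where
    a sample (x, None) has Y missing. The E-step spreads each incomplete
    count over Y according to theta's current conditional; the M-step
    renormalizes the expected counts. The unimodality result mentioned in
    the abstract is what guarantees convergence to the global maximum."""
    theta = {(x, y): 1.0 / (len(xs) * len(ys)) for x in xs for y in ys}
    for _ in range(iters):
        counts = {k: 0.0 for k in theta}
        for x, y in samples:
            if y is not None:
                counts[(x, y)] += 1.0
            else:
                z = sum(theta[(x, v)] for v in ys)
                for v in ys:
                    counts[(x, v)] += theta[(x, v)] / z
        total = sum(counts.values())
        theta = {k: c / total for k, c in counts.items()}
    return theta
```

Plugging the converged chances into the mutual information formula gives the point estimate; the paper's contribution is to attach a full posterior distribution (mean, mode, covariance) to it.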