Results 1–10 of 237
Analyzing neural responses to natural signals: Maximally informative dimensions
Advances in Neural Information Processing Systems 15, 2004
Cited by 71 (13 self)
We propose a method that allows for a rigorous statistical analysis of neural responses to natural stimuli which are non-Gaussian and exhibit strong correlations. We have in mind a model in which neurons are selective for a small number of stimulus dimensions out of a high-dimensional stimulus space, but within this subspace the responses can be arbitrarily nonlinear. Existing analysis methods are based on correlation functions between stimuli and responses, but these methods are guaranteed to work only in the case of Gaussian stimulus ensembles. As an alternative to correlation functions, we maximize the mutual information between the neural responses and projections of the stimulus onto low-dimensional subspaces. The procedure can be done iteratively by increasing the dimensionality of this subspace. Those dimensions that allow the recovery of all of the information between spikes and the full unprojected stimuli describe the relevant subspace. If the dimensionality of the relevant subspace indeed is small, it becomes feasible to map the neuron’s input–output function even under fully natural stimulus conditions. These ideas are illustrated in simulations on model visual and auditory neurons responding to natural scenes and sounds, respectively.
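A rough illustration of the core idea (not the authors' optimization procedure): simulate a model neuron whose spiking depends on a single stimulus dimension, and check that a plug-in estimate of mutual information between spikes and a projection is largest along that relevant dimension. The binning, sigmoid nonlinearity, and direction vectors below are illustrative choices, not from the paper.

```python
import math, random

random.seed(0)

def mutual_info(xs, ys, bins=8):
    """Plug-in mutual information (bits) between a real-valued projection
    (discretized into equal-width bins) and a binary spike variable."""
    lo, hi = min(xs), max(xs)
    w = (hi - lo) / bins or 1.0
    bx = [min(int((x - lo) / w), bins - 1) for x in xs]
    n = len(xs)
    pxy, px, py = {}, {}, {}
    for b, y in zip(bx, ys):
        pxy[(b, y)] = pxy.get((b, y), 0) + 1
        px[b] = px.get(b, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum(c / n * math.log2(c * n / (px[b] * py[y]))
               for (b, y), c in pxy.items())

D = 5
v_true = [1.0] + [0.0] * (D - 1)   # the neuron's one relevant dimension
v_rand = [0.0] * (D - 1) + [1.0]   # an irrelevant dimension
stim = [[random.gauss(0, 1) for _ in range(D)] for _ in range(5000)]
# spike probability is a sigmoid of the projection onto v_true only
spikes = [1 if random.random() < 1 / (1 + math.exp(-2 * s[0])) else 0
          for s in stim]

proj = lambda v, s: sum(a * b for a, b in zip(v, s))
mi_true = mutual_info([proj(v_true, s) for s in stim], spikes)
mi_rand = mutual_info([proj(v_rand, s) for s in stim], spikes)
```

Maximizing this information over directions (rather than comparing two fixed ones) is what recovers the relevant subspace.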
Correcting for the sampling bias problem in spike train information measures
J Neurophysiol
Cited by 60 (9 self)
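Plug-in information estimates from limited spike-train data are biased downward. One classical first-order fix discussed in this literature is the Miller–Madow correction, which adds (m−1)/(2N) (in nats) for m occupied bins and N samples; the sketch below demonstrates the bias and the correction on a toy histogram (the uniform test distribution is an illustrative choice).

```python
import math, random

random.seed(1)

def entropy_plugin(counts):
    """Naive plug-in entropy (bits) from a histogram of counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def entropy_miller_madow(counts):
    """Plug-in estimate plus the first-order Miller-Madow bias
    correction (m - 1) / (2N), with m the number of occupied bins."""
    n = sum(counts)
    m = sum(1 for c in counts if c > 0)
    return entropy_plugin(counts) + (m - 1) / (2 * n * math.log(2))

# uniform distribution over 16 symbols: true entropy is 4 bits
true_h, k, n, trials = 4.0, 16, 100, 200
err_plugin = err_mm = 0.0
for _ in range(trials):
    counts = [0] * k
    for _ in range(n):
        counts[random.randrange(k)] += 1
    err_plugin += entropy_plugin(counts) - true_h
    err_mm += entropy_miller_madow(counts) - true_h
err_plugin /= trials
err_mm /= trials
```

The plug-in estimator is systematically low; the corrected one is nearly unbiased in this regime.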
Kernel methods for measuring independence
Journal of Machine Learning Research, 2005
Cited by 59 (19 self)
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
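A minimal numerical illustration of RKHS covariance-based dependence measurement. This computes the biased HSIC statistic trace(KHLH)/n², a closely related functional from the same line of work, not the paper's exact constrained covariance or kernel mutual information; kernel bandwidth and data are illustrative choices. It is (in population) zero iff the variables are independent when the kernels are universal, e.g. Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 with Gaussian kernels,
    where H is the centering matrix; nonnegative, and small under
    independence."""
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

n = 300
x = rng.normal(size=n)
y_indep = rng.normal(size=n)
y_dep = x ** 2 + 0.1 * rng.normal(size=n)  # dependent but uncorrelated with x

h_indep = hsic(x, y_indep)
h_dep = hsic(x, y_dep)
```

Note that y_dep has (near) zero linear correlation with x, yet the kernel statistic detects the dependence, which is the point of moving beyond correlation.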
The complexity of approximating the entropy
SIAM Journal on Computing, 2005
Cited by 46 (6 self)
We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/γ²) log n) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(1/(2γ²))). We next consider a combined oracle model in which the algorithm has access to both the …
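In the evaluation-oracle model, entropy computation really is a single linear pass over the explicit probability array, as the abstract states. A sketch:

```python
import math

def entropy_bits(p):
    """Entropy (bits) of an explicit probability array in one linear
    pass over the domain, as in the evaluation oracle model."""
    assert abs(sum(p) - 1.0) < 1e-9, "input must be a distribution"
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

h_uniform = entropy_bits([0.25] * 4)                 # uniform on 4 symbols
h_skewed = entropy_bits([0.5, 0.25, 0.125, 0.125])   # dyadic distribution
```

The hard case the paper analyzes is the generation-oracle model, where only samples are available and sublinear-in-n guarantees require the entropy lower bound Ω(γ/η).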
Divergence estimation of continuous distributions based on data-dependent partitions
IEEE Transactions on Information Theory, 2005
Cited by 44 (3 self)
We present a universal estimator of the divergence D(P‖Q) for two arbitrary continuous distributions P and Q satisfying certain regularity conditions. This algorithm, which observes independent and identically distributed (i.i.d.) samples from both P and Q, is based on the estimation of the Radon–Nikodym derivative dP/dQ via a data-dependent partition of the observation space. Strong convergence of this estimator is proved with an empirically equivalent segmentation of the space. This basic estimator is further improved by adaptive partitioning schemes and by bias correction. The application of the algorithms to data with memory is also investigated. In the simulations, we compare our estimators with the direct plug-in estimator and estimators based on other partitioning approaches. Experimental results show that our methods achieve the best convergence performance in most of the tested cases. Index Terms—Bias correction, data-dependent partition, divergence, Radon–Nikodym derivative, stationary and ergodic data, universal estimation of information measures.
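To make the data-dependent partition idea concrete, here is a simple one-dimensional variant (a sketch in the spirit of the paper, not its exact estimator): partition the line at empirical quantiles of the P sample so each cell carries about 1/k of the P mass, then plug the cell frequencies into the discrete divergence formula. The cell count, smoothing, and test distributions are illustrative choices.

```python
import math, random

random.seed(2)

def kl_partition(p_samples, q_samples, k=20):
    """Estimate D(P||Q) in nats for 1-D samples using a data-dependent
    partition: cells are empirical-quantile intervals of the P sample,
    and D is approximated by sum_i p_i * log(p_i / q_i) over cells."""
    ps = sorted(p_samples)
    n, m = len(ps), len(q_samples)
    edges = [ps[(i * n) // k] for i in range(1, k)]
    def cell(x):
        return sum(1 for e in edges if x >= e)
    pc, qc = [0] * k, [0] * k
    for x in p_samples:
        pc[cell(x)] += 1
    for x in q_samples:
        qc[cell(x)] += 1
    d = 0.0
    for i in range(k):
        if pc[i] > 0:
            pi = pc[i] / n
            qi = max(qc[i], 1) / m   # crude smoothing for empty Q cells
            d += pi * math.log(pi / qi)
    return d

# P = N(0.5, 1), Q = N(0, 1): true divergence is 0.5^2 / 2 = 0.125 nats
p = [random.gauss(0.5, 1) for _ in range(20000)]
q = [random.gauss(0.0, 1) for _ in range(20000)]
d_hat = kl_partition(p, q)
```

The paper's contribution is making such partitions adaptive, bias-corrected, and provably strongly convergent; this fixed-quantile version only shows the mechanism.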
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
Cited by 39 (6 self)
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: “what are the implicit statistical assumptions of feature selection criteria based on mutual information?”. To answer this, we adopt a different strategy than is usual in the feature selection literature—instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature ‘relevancy’ and ‘redundancy’, our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
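The JMI criterion favoured here scores a candidate feature Xk by summing the joint mutual information I(Xk, Xj; Y) over the already-selected features Xj. A toy sketch on an XOR target (where each feature is individually uninformative, so marginal-MI criteria fail):

```python
import math, random

random.seed(3)

def mi(pairs):
    """Plug-in mutual information I(A; B) in bits from (a, b) samples."""
    n = len(pairs)
    pab, pa, pb = {}, {}, {}
    for a, b in pairs:
        pab[(a, b)] = pab.get((a, b), 0) + 1
        pa[a] = pa.get(a, 0) + 1
        pb[b] = pb.get(b, 0) + 1
    return sum(c / n * math.log2(c * n / (pa[a] * pb[b]))
               for (a, b), c in pab.items())

def jmi_score(xk, selected, y):
    """JMI criterion: sum over already-selected features Xj of the
    joint mutual information I((Xk, Xj); Y)."""
    return sum(mi(list(zip(zip(xk, xj), y))) for xj in selected)

# toy data: y is the XOR of f0 and f1; f2 is pure noise
n = 4000
f0 = [random.randint(0, 1) for _ in range(n)]
f1 = [random.randint(0, 1) for _ in range(n)]
f2 = [random.randint(0, 1) for _ in range(n)]
y = [a ^ b for a, b in zip(f0, f1)]

# with f0 already selected, JMI prefers f1 (completes the XOR) over noise
s1 = jmi_score(f1, [f0], y)
s2 = jmi_score(f2, [f0], y)
```

Because the criterion conditions on what is already selected, it captures the complementarity that marginal relevancy scores miss.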
Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains
Under review, Neural Computation, 2007
Cited by 38 (18 self)
Understanding how stimulus information is encoded in spike trains is a central problem in computational neuroscience. Decoding methods provide an important tool for addressing this problem, by allowing us to explicitly read out the information contained in spike responses. Here we introduce several decoding methods based on point-process neural encoding models (i.e. “forward” models that predict spike responses to novel stimuli). These models have concave log-likelihood functions, allowing for efficient fitting via maximum likelihood. Moreover, we may use the likelihood of the observed spike trains under the model to perform optimal decoding. We present: (1) a tractable algorithm for computing the maximum a posteriori (MAP) estimate of the stimulus — the most probable stimulus to have generated the observed single or multiple-spike train response, given some prior distribution over the stimulus; (2) a Gaussian approximation to the posterior distribution, which allows us to quantify the fidelity with which various stimulus features are encoded; (3) an efficient method for estimating the mutual information between the stimulus and the response; and (4) a framework for the detection of change-point times (e.g. the time at which the stimulus undergoes a change in mean or variance), by marginalizing over the posterior distribution of stimuli. We show several examples illustrating the performance of these estimators with simulated data.
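A minimal sketch of item (1), MAP stimulus decoding, for a scalar stimulus: independent Poisson neurons with exponential-link rates λ_i = exp(w_i·s) and a Gaussian prior give a concave log posterior, so a one-dimensional search finds the global MAP. The two-neuron setup, weights, and grid search (in place of the paper's efficient optimizers) are illustrative choices.

```python
import math

def map_decode(spike_counts, weights, prior_var=1.0):
    """MAP estimate of a scalar stimulus s under independent Poisson
    neurons with rates lambda_i = exp(w_i * s) and a zero-mean Gaussian
    prior; the log posterior is concave in s, so a dense grid search
    over [-5, 5] finds the global optimum."""
    grid = [i / 1000.0 for i in range(-5000, 5001)]
    def log_post(s):
        lp = -s * s / (2 * prior_var)                 # Gaussian log prior
        for k, w in zip(spike_counts, weights):
            lp += k * w * s - math.exp(w * s)         # Poisson log-likelihood
        return lp
    return max(grid, key=log_post)

# two neurons with opposite tuning observe a positive stimulus
weights = [1.0, -1.0]
true_s = 1.2
counts = [round(math.exp(w * true_s)) for w in weights]  # noise-free counts
s_hat = map_decode(counts, weights)
```

The estimate is pulled toward zero by the prior; with more neurons or longer spike trains, the likelihood dominates and s_hat approaches true_s.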
Sketching and Streaming Entropy via Approximation Theory
Cited by 33 (0 self)
We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Rényi and Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. The accuracy of our estimates is proven using approximation theory arguments and extremal properties of Chebyshev polynomials, a technique which may be useful for other problems. Our work also yields the best-known and near-optimal additive approximations for entropy, and hence also for conditional entropy and mutual information.
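The extrapolation idea rests on Shannon entropy being the α→1 limit of Rényi entropy H_α = log2(Σ p_i^α)/(1−α). A toy sketch (a naive linear extrapolation on an explicit distribution, not the paper's Chebyshev-based streaming construction):

```python
import math

def renyi(p, alpha):
    """Renyi entropy H_alpha (bits) of an explicit distribution."""
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)

def shannon(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# linearly extrapolate H_alpha to alpha -> 1 from two nearby evaluations
p = [0.4, 0.3, 0.2, 0.1]
a1, a2 = 1.05, 1.10
h_extrap = renyi(p, a1) + (renyi(p, a1) - renyi(p, a2)) * (a1 - 1) / (a2 - a1)
h_true = shannon(p)
```

The streaming advantage is that Rényi entropies at α ≠ 1 are frequency moments, which sketch well; the paper controls the extrapolation error with approximation theory rather than this crude two-point rule.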
Efficient Markov Chain Monte Carlo methods for decoding population spike trains
To appear, Neural Computation, 2010
Cited by 33 (14 self)
Stimulus reconstruction or decoding methods provide an important tool for understanding how sensory and motor information is represented in neural activity. We discuss Bayesian decoding methods based on an encoding generalized linear model (GLM) that accurately describes how stimuli are transformed into the spike trains of a group of neurons. The form of the GLM likelihood ensures that the posterior distribution over the stimuli that caused an observed set of spike trains is log-concave so long as the prior is. This allows the maximum a posteriori (MAP) stimulus estimate to be obtained using efficient optimization algorithms. Unfortunately, the MAP estimate can have a relatively large average error when the posterior is highly non-Gaussian. Here we compare several Markov chain Monte Carlo (MCMC) algorithms that allow for the calculation of general Bayesian estimators involving posterior expectations (conditional on model parameters). An efficient version of the hybrid Monte Carlo (HMC) algorithm was significantly superior to other MCMC methods for Gaussian priors. When the prior distribution has sharp edges and corners, on the other hand, the “hit-and-run” algorithm performed better than other MCMC methods. Using these …
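To show the hit-and-run move the abstract refers to: from the current point, draw a random direction, intersect that line with the support to get a chord, and jump to a uniform point on the chord. The sketch below targets a plain uniform distribution on a box (a stand-in for a prior with sharp edges and corners), not the paper's spike-train posterior.

```python
import math, random

random.seed(4)

def hit_and_run_box(lo, hi, x0, n_steps):
    """Hit-and-run sampler for the uniform distribution on an
    axis-aligned box: random direction, intersect with the box to get
    a chord, then move to a uniform point on that chord."""
    d = len(x0)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        theta = [random.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # chord: all t with lo <= x + t * theta <= hi, coordinate-wise
        t_min, t_max = -float("inf"), float("inf")
        for xi, ti, l, h in zip(x, theta, lo, hi):
            if abs(ti) < 1e-12:
                continue
            a, b = (l - xi) / ti, (h - xi) / ti
            t_min = max(t_min, min(a, b))
            t_max = min(t_max, max(a, b))
        t = random.uniform(t_min, t_max)
        x = [xi + t * ti for xi, ti in zip(x, theta)]
        samples.append(list(x))
    return samples

lo, hi = [0.0, 0.0], [1.0, 1.0]
chain = hit_and_run_box(lo, hi, [0.5, 0.5], 5000)
mean_x = sum(p[0] for p in chain) / len(chain)
```

Because each move can cross the box in one step, the chain does not get stuck against edges the way small-step random walks do, which is why hit-and-run wins for constrained priors.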
A Large-Deviation Analysis for the Maximum Likelihood Learning of Tree Structures
2009
Cited by 27 (17 self)
The problem of maximum-likelihood learning of the Markov tree structure of an unknown distribution from samples is considered when the distribution is Markov on a tree. Large-deviation analysis of the error in estimation of the set of edges of the tree is considered. Necessary and sufficient conditions are provided to ensure that this error probability decays exponentially. These conditions are based on the mutual information between each pair of variables being distinct from that of other pairs. The rate of error decay, which is the error exponent, is derived using the large-deviation principle. For a discrete distribution, the error exponent is approximated using Euclidean information theory, and is given by a ratio, interpreted as the signal-to-noise ratio (SNR) for learning. Extensions to the Gaussian case are also considered.
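The estimator whose error exponent this paper analyzes is the classical Chow–Liu procedure: compute empirical pairwise mutual informations and take the maximum-weight spanning tree. A sketch on a three-variable Markov chain (the noisy-copy data and the tiny Kruskal union-find are illustrative choices):

```python
import math, random

random.seed(5)

def mi_bits(xs, ys):
    """Plug-in mutual information (bits) between two discrete samples."""
    n = len(xs)
    pxy, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        pxy[(x, y)] = pxy.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_edges(data):
    """Maximum-likelihood tree = maximum-weight spanning tree under
    pairwise empirical mutual information (Chow-Liu), via Kruskal."""
    d = len(data)
    scored = sorted(((mi_bits(data[i], data[j]), i, j)
                     for i in range(d) for j in range(i + 1, d)),
                    reverse=True)
    parent = list(range(d))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    edges = []
    for w, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Markov chain X0 -> X1 -> X2: each bit copies its parent with prob 0.9
n = 5000
x0 = [random.randint(0, 1) for _ in range(n)]
flip = lambda b: b if random.random() < 0.9 else 1 - b
x1 = [flip(b) for b in x0]
x2 = [flip(b) for b in x1]
edges = chow_liu_edges([x0, x1, x2])
```

Structure recovery fails exactly when sampling noise reorders the pairwise mutual informations (here, making I(X0;X2) look larger than a true-edge MI), which is why the error exponent depends on the gaps between those quantities.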