Results 1 - 10 of 44
Independent Component Analysis
Neural Computing Surveys, 2001
"... A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research, is nding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the ..."
Abstract

Cited by 1492 (93 self)
 Add to MetaCart
A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research is finding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the original data. Well-known linear transformation methods include, for example, principal component analysis, factor analysis, and projection pursuit. A recently developed linear transformation method is independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we survey the existing theory and methods for ICA.
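The linear-transformation view described in this abstract can be sketched numerically. The sources, mixing matrix, and kurtosis-based fixed-point update below are illustrative assumptions for a minimal demonstration, not the survey's prescribed algorithm:

```python
import numpy as np

# Two independent non-Gaussian sources (assumed for this sketch)
rng = np.random.default_rng(0)
n = 5000
s = np.vstack([
    np.sign(rng.standard_normal(n)) * rng.standard_normal(n) ** 2,  # super-Gaussian
    rng.uniform(-1.0, 1.0, n),                                      # sub-Gaussian
])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown mixing matrix
x = A @ s                                # observed linear mixtures

# Whiten: center, decorrelate, and normalize variance
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

# One-unit fixed-point iteration driving the output toward an extremum
# of kurtosis (a FastICA-style update, assumed here for illustration)
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(200):
    w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w
    w /= np.linalg.norm(w)
y = w @ z   # estimate of one source, recovered up to sign and scale
```

The recovered signal y should correlate strongly (in absolute value) with one of the original sources, which is the identifiability guarantee ICA offers up to permutation and scaling.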
Filters, Random Fields and Maximum Entropy ...
International Journal of Computer Vision, 1998
"... This article presents a statistical theory for texture modeling. This theory combines filtering theory and Markov random field modeling through the maximum entropy principle, and interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of vi ..."
Abstract

Cited by 193 (17 self)
 Add to MetaCart
This article presents a statistical theory for texture modeling. This theory combines filtering theory and Markov random field modeling through the maximum entropy principle, and interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view. Our theory characterizes the ensemble of images I with the same texture appearance by a probability distribution f(I) on a random field, and the objective of texture modeling is to make inference about f(I), given a set of observed texture examples. In our theory, texture modeling consists of two steps. (1) A set of filters is selected from a general filter bank to capture features of the texture; these filters are applied to observed texture images, and the histograms of the filtered images are extracted. These histograms are estimates of the marginal distributions of f(I). This step is called feature extraction. (2) The maximum entropy principle is employed to derive a distribution p(I), which is restricted to have the same marginal distributions as those in (1). This p(I) is considered an estimate of f(I). This step is called feature fusion. A stepwise algorithm is proposed to choose filters from a general filter bank. The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than the previous MRF models used for texture modeling. A Gibbs sampler is adopted to synthesize texture images by drawing typical samples from p(I); the model is thus verified by seeing whether the synthesized texture images have visual appearances similar to the observed ones.
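Step (1) of the FRAME pipeline, filtering followed by histogram extraction, can be sketched as follows. The filter bank, the stand-in image, and the bin count are illustrative assumptions, not the paper's filter set:

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((64, 64))   # stand-in for an observed texture

# A tiny illustrative filter bank (real FRAME selects from a general bank)
filters = {
    "dx": np.array([[-1.0, 1.0]]),                                     # horizontal gradient
    "dy": np.array([[-1.0], [1.0]]),                                   # vertical gradient
    "lap": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),  # Laplacian
}

def convolve2d_valid(img, k):
    """Minimal 'valid'-mode 2-D convolution (kernel flipped), no SciPy needed."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    kf = k[::-1, ::-1]
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kf)
    return out

# Histograms of the filtered images: estimates of the marginals of f(I)
histograms = {}
for name, k in filters.items():
    resp = convolve2d_valid(image, k)
    h, _ = np.histogram(resp, bins=15, density=True)
    histograms[name] = h
```

Step (2) would then fit the maximum entropy distribution p(I) constrained to reproduce these marginals, which is where the Gibbs sampler enters.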
Minimax Entropy Principle and Its Application to Texture Modeling
1997
"... This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a ..."
Abstract

Cited by 193 (39 self)
 Add to MetaCart
This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a certain set of feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce them. The second part is the minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed; in particular, a feature pursuit procedure is proposed for approximately selecting the optimal set of features. The model complexity is restricted because of the sample variation in the observed feature statistics. The minimax entropy principle is applied to texture modeling, where a novel Markov random field (MRF) model, called FRAME (Filter, Random field, And Minimax Entropy), is derived, and encouraging results are obtained in experiments on a variety of texture images. The relationship between our theory and the mechanisms of neural computation is also discussed.
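The two nested optimizations described in this abstract can be written compactly. This is a sketch with assumed notation (φ_k are the feature statistics of a set S, μ_k their observed values), not the paper's exact formulation:

```latex
% Maximum entropy (feature binding/fusion): for a fixed feature set S,
p_S = \arg\max_{p}\Big\{\, H(p) \;:\; \mathbb{E}_p[\phi_k(I)] = \mu_k^{\mathrm{obs}},\ k \in S \,\Big\}
% Minimum entropy (feature selection): among candidate feature sets,
S^{*} = \arg\min_{S} H(p_S)
```

The outer minimization prefers feature sets that constrain the model most, which is what makes the principle "minimax."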
Non Linear Neurons in the Low Noise Limit: A Factorial Code Maximizes Information Transfer
1994
"... We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of non linear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environm ..."
Abstract

Cited by 141 (18 self)
 Add to MetaCart
We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focusing on the case of nonlinear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environment. The main result is that, for bounded and invertible transfer functions, in the case of a vanishing additive output noise and no input noise, maximization of information (Linsker's infomax principle) leads to a factorial code, hence to the same solution as required by the redundancy reduction principle of Barlow. We show also that this result is valid for linear, and more generally unbounded, transfer functions, provided optimization is performed under an additive constraint, that is, one which can be written as a sum of terms, each one specific to one output neuron. Finally we study the effect of a nonzero input noise. We find that, at first order in the input noise, assumed to be small ...
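A small numerical illustration of the noiseless case this abstract describes (the Gaussian input and the empirical-CDF transfer function are assumptions of this sketch): with an invertible transfer function and no noise, infomax reduces to maximizing output entropy, and for a single unit the entropy is maximized when the transfer function equals the input's cumulative distribution, making the output uniform.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)      # input drawn from the environment

# Adapted transfer function: the empirical CDF of the input itself
ranks = np.argsort(np.argsort(x))     # rank of each sample, 0 .. n-1
y = (ranks + 0.5) / len(x)            # output squashed into (0, 1)

# A maximum-entropy (uniform) output fills every histogram bin equally
counts, _ = np.histogram(y, bins=20)
```

The flat output histogram is the one-unit analogue of the factorial-code result: no structure in the input survives in the output distribution itself.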
Mutual information, Fisher information and population coding
Neural Computation, 1998
"... In the context of parameter estimation and model selection, it is only quite recently that a direct link between the Fisher information and information theoretic quantities has been exhibited. We give an interpretation of this link within the standard framework of information theory. We show that in ..."
Abstract

Cited by 61 (3 self)
 Add to MetaCart
In the context of parameter estimation and model selection, it is only quite recently that a direct link between the Fisher information and information-theoretic quantities has been exhibited. We give an interpretation of this link within the standard framework of information theory. We show that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information. In the light of this result we consider the optimization of the tuning-curve parameters in the case of neurons responding to a stimulus represented by an angular variable. To appear in Neural Computation, Vol. 10, Issue 7, published by the MIT Press.
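The link this abstract refers to is typically exhibited, for a large neural population, in the following asymptotic form. This is a sketch with assumed notation (θ the stimulus, r the population response, p(θ) the stimulus distribution, J(θ) the Fisher information, H(θ) the stimulus entropy):

```latex
I(\theta;\mathbf{r}) \;\simeq\; H(\theta)
  \;+\; \frac{1}{2}\int p(\theta)\,\ln\!\frac{J(\theta)}{2\pi e}\, d\theta
```

Read this as: in the large-population limit, maximizing mutual information amounts to maximizing the (log of the) Fisher information averaged over stimuli, which is what connects tuning-curve optimization to estimation accuracy.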
Learning Factorial Codes By Predictability Minimization
Neural Computation, 1991
"... I propose a novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor which tries to predict the unit from the remaining units ..."
Abstract

Cited by 54 (25 self)
 Add to MetaCart
I propose a novel general principle for unsupervised learning of distributed non-redundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter "abstract concepts" out of the environmental input such that these concepts are statistically independent of those upon which the other units focus. I discuss various simple yet potentially powerful implementations of the principle which aim at finding binary factorial codes (Barlow et al., 1989), i.e. codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, and (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods, the novel principle has a potential for removing not only linear but also nonlinear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.
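The intuition that a factorial code is minimally predictable can be illustrated with a toy measurement. This is not the paper's algorithm: the correlated data, the two candidate codes, and the r² predictability measure are all assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
C = np.array([[1.0, 0.8], [0.8, 1.0]])               # correlated 2-D environment
x = rng.multivariate_normal([0.0, 0.0], C, size=20_000).T

def predictability(y):
    """Fraction of one unit's variance the best linear predictor recovers
    from the other unit; higher value = more predictable code."""
    r = np.corrcoef(y)[0, 1]
    return r ** 2

y_copy = x                                           # redundant code: raw copy of the input
_, evecs = np.linalg.eigh(np.cov(x))
y_decor = evecs.T @ x                                # decorrelated (factorial-leaning) code
```

The decorrelated code leaves the cross-unit predictor with essentially nothing to work with, which is exactly the state the opposing-forces game drives the representation toward.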
Competition and multiple cause models
Neural Computation, 1995
"... If different causes can interact on any occasion to generate a set of patterns, then systems modelling the generation have to model the interaction too. We discuss a way of combining multiple causes that is based on the Integrated Segmentation and Recognition architecture of Keeler, Rumelhart and Le ..."
Abstract

Cited by 53 (3 self)
 Add to MetaCart
If different causes can interact on any occasion to generate a set of patterns, then systems modelling the generation have to model the interaction too. We discuss a way of combining multiple causes that is based on the Integrated Segmentation and Recognition architecture of Keeler, Rumelhart and Leow (1991). It is more cooperative than the scheme embodied in the mixture of experts architecture, which insists that just one cause generate each output, and more competitive than the noisy-or combination function which was recently suggested by Saund (1994a; b). Simulations confirm its efficacy.
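The ordering this abstract describes, more cooperative than a mixture of experts yet more competitive than noisy-or, can be seen on a toy example. The probabilities below and the odds-summing intermediate rule are assumptions of this sketch, not the paper's exact combination function:

```python
# Two causes each assign a probability to the same binary output being on
p = [0.6, 0.7]

mixture = sum(p) / len(p)                  # mixture of experts: one cause per output, averaged
noisy_or = 1 - (1 - p[0]) * (1 - p[1])     # noisy-or: causes act fully independently
odds = sum(q / (1 - q) for q in p)         # illustrative intermediate rule: summed odds
combined = odds / (1 + odds)
```

The intermediate rule lets both causes raise the output probability (cooperative), but less aggressively than noisy-or (competitive), landing between the two extremes.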
Redundancy Reduction and Independent Component Analysis: Conditions on Cumulants and Adaptive Approaches
1997
"... In the context of both sensory coding and signal processing, building factorized codes has been shown to be an efficient strategy. In a wide variety of situations, the signal to be processed is a linear mixture of statistically independent sources. Building a factorized code is then equivalent to pe ..."
Abstract

Cited by 32 (8 self)
 Add to MetaCart
In the context of both sensory coding and signal processing, building factorized codes has been shown to be an efficient strategy. In a wide variety of situations, the signal to be processed is a linear mixture of statistically independent sources. Building a factorized code is then equivalent to performing blind source separation. Thanks to the linear structure of the data, this can be done, in the language of signal processing, by finding an appropriate linear filter, or equivalently, in the language of neural modeling, by using a simple feedforward neural network. In this paper we discuss several aspects of the source separation problem. We give simple conditions on the network output which, if satisfied, guarantee that source separation has been obtained. Then we study adaptive approaches, in particular those based on redundancy reduction and maximisation of mutual information. We show how the resulting updating rules are related to the BCM theory of synaptic plasticity. Eventually...
Feature extraction through LOCOCODE
Neural Computation, 1998
"... "Lowcomplexity coding and decoding" (Lococode) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods it explicitly takes into account the informationtheoretic complexity of the code generator: it computes lococodes that (1) convey information about the input d ..."
Abstract

Cited by 21 (4 self)
 Add to MetaCart
"Lowcomplexity coding and decoding" (Lococode) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods it explicitly takes into account the informationtheoretic complexity of the code generator: it computes lococodes that (1) convey information about the input data and (2) can be computed and decoded by lowcomplexity mappings. We implement Lococode by training autoassociators with Flat Minimum Search, a recent, general method for discovering lowcomplexity neural nets. It turns out that this approach can unmix an unknown number of independent data sources by extracting a minimal number of lowcomplexity features necessary for representing the data. Experiments show: unlike codes obtained with standard autoencoders, lococodes are based on feature detectors, never unstructured, usually sparse, sometimes factorial or local (depending on statistical properties of the data). Although Lococode is not explicitly designed to enforce sparse or factorial codes, it extracts optimal codes for difficult versions of the "bars" benchmark problem, whereas ICA and PCA do not. It also produces familiar, biologically plausible feature detectors when applied to real world images. As a preprocessor for a vowel recognition benchmark problem it sets the stage for excellent classification performance. Our results reveil an interesting, previously ignored connection between two important fields: regularizer research, and ICArelated research.