Results 1  10
of
22
Local Feature Analysis: A general statistical theory for object representation
, 1996
"... . Lowdimensional representations of sensory signals are key to solving many of the computational problems encountered in highlevel vision. Principal Component Analysis has been used in the past to derive practically useful compact representations for different classes of objects. One major object ..."
Abstract

Cited by 229 (9 self)
 Add to MetaCart
. Lowdimensional representations of sensory signals are key to solving many of the computational problems encountered in highlevel vision. Principal Component Analysis has been used in the past to derive practically useful compact representations for different classes of objects. One major objection to the applicability of PCA is that it invariably leads to global, nontopographic representations that are not amenable to further processing and are not biologically plausible. In this paper we present a new mathematical constructionLocal Feature Analysis (LFA)for deriving local topographic representations for any class of objects. The LFA representations are sparsedistributed and, hence, are effectively lowdimensional and retain all the advantages of the compact representations of the PCA. But unlike the global eigenmodes, they give a description of objects in terms of statistically derived local features and their positions. We illustrate the theory by using it to extract loca...
An efficient, probabilistically sound algorithm for segmentation and word discovery
 MACHINE LEARNING
, 1999
"... This paper presents a modelbased, unsupervised algorithm for recovering word boundaries in a naturallanguage text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstract ..."
Abstract

Cited by 142 (2 self)
 Add to MetaCart
This paper presents a modelbased, unsupervised algorithm for recovering word boundaries in a naturallanguage text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, wordorder, and word frequency can be replaced in a modular fashion. The model yields a languageindependent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
Non Linear Neurons in the Low Noise Limit: A Factorial Code Maximizes Information Transfer
, 1994
"... We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of non linear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environm ..."
Abstract

Cited by 141 (18 self)
 Add to MetaCart
We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of non linear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environment. The main result is that, for bounded and invertible transfer functions, in the case of a vanishing additive output noise, and no input noise, maximization of information (Linsker'sinfomax principle) leads to a factorial code  hence to the same solution as required by the redundancy reduction principle of Barlow. We show also that this result is valid for linear, more generally unbounded, transfer functions, provided optimization is performed under an additive constraint, that is which can be written as a sum of terms, each one being specific to one output neuron. Finally we study the effect of a non zero input noise. We find that, at first order in the input noise, assumed to be small ...
Unsupervised Discovery of Morphemes
, 2002
"... We present two methods for unsupervised segmentation of words into morphemelike units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second met ..."
Abstract

Cited by 64 (15 self)
 Add to MetaCart
We present two methods for unsupervised segmentation of words into morphemelike units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second method, Maximum Likelihood (ML) optimization is used. The quality of the segmentations is measured using an evaluation method that compares the segmentations produced to an existing morphological analysis. Experiments on both Finnish and English corpora show that the presented methods perform well compared to a current stateoftheart system.
Redundancy Reduction and Independent Component Analysis: Conditions on Cumulants and Adaptive Approaches
, 1997
"... In the context of both sensory coding and signal processing, building factorized codes has been shown to be an efficient strategy. In a wide variety of situations, the signal to be processed is a linear mixture of statistically independent sources. Building a factorized code is then equivalent to pe ..."
Abstract

Cited by 32 (8 self)
 Add to MetaCart
In the context of both sensory coding and signal processing, building factorized codes has been shown to be an efficient strategy. In a wide variety of situations, the signal to be processed is a linear mixture of statistically independent sources. Building a factorized code is then equivalent to performing blind source separation. Thanks to the linear structure of the data, this can be done, in the language of signal processing, by finding an appropriate linear filter, or equivalently, in the language of neural modeling, by using a simple feedforward neural network. In this paper we discuss several aspects of the source separation problem. We give simple conditions on the network output which, if satisfied, guarantee that source separation has been obtained. Then we study adaptive approaches, in particular those based on redundancy reduction and maximisation of mutual information. We show how the resulting updating rules are related to the BCM theory of synaptic plasticity. Eventually...
Connectionist and statistical approaches to language acquisition: A distributional perspective
 Language and Cognitive Processes
, 1998
"... We propose that one important role for connectionist research in language acquisition is analysing what linguistic information is present in the child’s input. Recent connectionist and statistical work analysing the properties of real language corpora suggest that a priori objections against the uti ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
We propose that one important role for connectionist research in language acquisition is analysing what linguistic information is present in the child’s input. Recent connectionist and statistical work analysing the properties of real language corpora suggest that a priori objections against the utility of distributional information for the child are misguided. We illustrate our argument with examples of connectionist and statistical corpusbased research on phonology, segmentation, morphology, word classes, phrase structure, and lexical semantics. We discuss how this research relates to other empirical and theoretical approaches to the study of language acquisition.
Feature extraction through LOCOCODE
 NEURAL COMPUTATION
, 1998
"... "Lowcomplexity coding and decoding" (Lococode) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods it explicitly takes into account the informationtheoretic complexity of the code generator: it computes lococodes that (1) convey information about the input d ..."
Abstract

Cited by 21 (4 self)
 Add to MetaCart
"Lowcomplexity coding and decoding" (Lococode) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods it explicitly takes into account the informationtheoretic complexity of the code generator: it computes lococodes that (1) convey information about the input data and (2) can be computed and decoded by lowcomplexity mappings. We implement Lococode by training autoassociators with Flat Minimum Search, a recent, general method for discovering lowcomplexity neural nets. It turns out that this approach can unmix an unknown number of independent data sources by extracting a minimal number of lowcomplexity features necessary for representing the data. Experiments show: unlike codes obtained with standard autoencoders, lococodes are based on feature detectors, never unstructured, usually sparse, sometimes factorial or local (depending on statistical properties of the data). Although Lococode is not explicitly designed to enforce sparse or factorial codes, it extracts optimal codes for difficult versions of the "bars" benchmark problem, whereas ICA and PCA do not. It also produces familiar, biologically plausible feature detectors when applied to real world images. As a preprocessor for a vowel recognition benchmark problem it sets the stage for excellent classification performance. Our results reveil an interesting, previously ignored connection between two important fields: regularizer research, and ICArelated research.
Low Entropy Coding with Unsupervised Neural Networks
"... ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2D Gabor functions, which have previously been used to describe physiological receptive fields, and as a means of producing compact ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2D Gabor functions, which have previously been used to describe physiological receptive fields, and as a means of producing compact image representations. Keywords: neural networks, unsupervised learning, selforganisation, feature extraction, information theory, redundancy reduction, sparse coding, imaging models, occlusion, image coding, speech coding. Declaration This dissertation is the result of my own original work, except where reference is made to the work of others. No part of it has been submitted for any other university degree or diploma. Its length, including captions, footnotes, appendix and bibliography, is approximately 58000 words. Acknowledgements I would like first and foremost to thank Richard Prager, my supervisor, fo
Contextually Guided Unsupervised Learning Using Local Multivariate Binary Processors
, 1996
"... We consider the role of contextual guidance in learning and processing within multistream neural networks. Earlier work (Kay & Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective, and made precise u ..."
Abstract

Cited by 14 (2 self)
 Add to MetaCart
We consider the role of contextual guidance in learning and processing within multistream neural networks. Earlier work (Kay & Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective, and made precise using information theory, in such a way that local binary processors could extract a single feature that is coherent across streams. In this paper we consider multiunit local processors with multivariate binary outputs that enable a greater number of coherent features to be extracted. Using the Ising model, we define a class of informationtheoretic objective functions and also local approximations, and derive the learning rules in both cases. These rules have similarities to, and differences from, the celebrated BCM rule. Local and global versions of Infomax appear as byproducts of the general approach, as well as multivariate versions of Coherent Infomax. Focussing on the more biologicall...