Local Feature Analysis: A general statistical theory for object representation
, 1996
Cited by 188 (9 self)
Low-dimensional representations of sensory signals are key to solving many of the computational problems encountered in high-level vision. Principal Component Analysis (PCA) has been used in the past to derive practically useful compact representations for different classes of objects. One major objection to the applicability of PCA is that it invariably leads to global, nontopographic representations that are not amenable to further processing and are not biologically plausible. In this paper we present a new mathematical construction, Local Feature Analysis (LFA), for deriving local topographic representations for any class of objects. The LFA representations are sparse-distributed and, hence, are effectively low-dimensional and retain all the advantages of the compact representations of PCA. But unlike the global eigenmodes, they give a description of objects in terms of statistically derived local features and their positions. We illustrate the theory by using it to extract loca...
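To make the "global eigenmodes" of PCA concrete, here is a minimal sketch of the PCA baseline that the abstract contrasts with LFA. It does not implement LFA. The function name `pca_modes`, the toy data, and the choice of three components are assumptions made for illustration only.

```python
import numpy as np

def pca_modes(X, k):
    """Return the mean, the top-k principal directions, and the k-dim codes for rows of X."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data
    # Rows of Vt are orthonormal eigenvectors of the sample covariance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    modes = Vt[:k]                     # shape (k, n_inputs); a mode generally has support on every input
    codes = Xc @ modes.T               # compact representation, shape (n_samples, k)
    return mean, modes, codes

rng = np.random.default_rng(0)
# Toy data (assumed): 200 samples of a 16-dimensional signal with 3 underlying degrees of freedom.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 16))
X = latent @ mixing

mean, modes, codes = pca_modes(X, k=3)
X_hat = codes @ modes + mean           # reconstruct each sample from its 3 coefficients
print(np.allclose(X, X_hat))
```

Each row of `modes` is typically nonzero across all 16 inputs, which is the "global, nontopographic" property the abstract objects to.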
Non Linear Neurons in the Low Noise Limit: A Factorial Code Maximizes Information Transfer
, 1994
Cited by 130 (17 self)
We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of non linear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environment. The main result is that, for bounded and invertible transfer functions, in the case of a vanishing additive output noise, and no input noise, maximization of information (Linsker's infomax principle) leads to a factorial code - hence to the same solution as required by the redundancy reduction principle of Barlow. We show also that this result is valid for linear, more generally unbounded, transfer functions, provided optimization is performed under an additive constraint, that is, one which can be written as a sum of terms, each one being specific to one output neuron. Finally we study the effect of a non zero input noise. We find that, at first order in the input noise, assumed to be small ...
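The link between infomax and a factorial code can be summarised by a standard entropy identity. The notation below (output vector V with components V_i, entropy H, redundancy R) is assumed here for illustration and is not taken from the paper.

```latex
% Assumed notation: V = (V_1, ..., V_N) is the output vector, H denotes entropy.
\begin{align}
  H(V) &= \sum_{i=1}^{N} H(V_i) - R(V), \\
  R(V) &= \int p(V)\,\log\frac{p(V)}{\prod_{i=1}^{N} p_i(V_i)}\,dV \;\ge\; 0 .
\end{align}
% R(V) = 0 if and only if p(V) = \prod_i p_i(V_i), i.e. the code is factorial.
```

Under the abstract's conditions (vanishing output noise, no input noise), maximizing transmitted information amounts to maximizing the output entropy H(V). By the identity above, this pushes the redundancy R(V) towards zero, which is the factorial solution.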
An efficient, probabilistically sound algorithm for segmentation and word discovery
- MACHINE LEARNING
, 1999
Cited by 103 (2 self)
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
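The search over possible word boundaries can be illustrated with a much simpler stand-in than the model described above: a dynamic-programming segmenter over a fixed, known lexicon with unigram probabilities. This is not the paper's algorithm, which treats the lexicon as unknown. The function `segment`, the lexicon, and its probabilities are all hypothetical.

```python
import math

def segment(text, word_probs, max_len=10):
    """Most probable split of `text` into lexicon words under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)   # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_probs and best[start] > -math.inf:
                score = best[start] + math.log(word_probs[word])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None                # no segmentation using only lexicon words
    words, i = [], n
    while i > 0:                   # follow back-pointers from the end of the text
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

# Hypothetical lexicon with made-up probabilities.
lexicon = {"the": 0.4, "dog": 0.2, "do": 0.1, "g": 0.05, "ran": 0.25}
print(segment("thedogran", lexicon))
```

The sketch prefers "dog" over "do" + "g" because the product of the two shorter words' probabilities is lower. The same trade-off drives model-based segmenters, although they must also estimate the lexicon itself.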
Unsupervised Discovery of Morphemes
, 2002
Cited by 55 (15 self)
We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second method, Maximum Likelihood (ML) optimization is used. The quality of the segmentations is measured using an evaluation method that compares the segmentations produced to an existing morphological analysis. Experiments on both Finnish and English corpora show that the presented methods perform well compared to a current state-of-the-art system.
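The MDL idea can be sketched as a generic two-part code length: the bits needed to spell out the morph lexicon plus the bits needed to encode the corpus as a sequence of morphs. This is an illustrative cost, not the paper's exact formulation. The function `description_length`, the fixed 5 bits per letter, and the toy Finnish-like word list are assumptions.

```python
import math
from collections import Counter

def description_length(segmented_words, bits_per_letter=5.0):
    """Two-part code length in bits: morph lexicon cost plus corpus cost."""
    tokens = [m for word in segmented_words for m in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon cost: spell out each distinct morph once, plus one end marker per morph.
    lexicon_bits = sum((len(m) + 1) * bits_per_letter for m in counts)
    # Corpus cost: each morph token costs -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits

# Toy corpus (assumed): the same four words, unsegmented and segmented.
whole = [["talossa"], ["talossani"], ["autossa"], ["autossani"]]
split = [["talo", "ssa"], ["talo", "ssa", "ni"], ["auto", "ssa"], ["auto", "ssa", "ni"]]

print(description_length(whole), description_length(split))
```

On this toy corpus the segmented version is cheaper, because the shared morphs are stored once in the lexicon and reused. An MDL-based method would prefer it for that reason.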
Redundancy Reduction and Independent Component Analysis: Conditions on Cumulants and Adaptive Approaches
, 1997
Cited by 31 (7 self)
In the context of both sensory coding and signal processing, building factorized codes has been shown to be an efficient strategy. In a wide variety of situations, the signal to be processed is a linear mixture of statistically independent sources. Building a factorized code is then equivalent to performing blind source separation. Thanks to the linear structure of the data, this can be done, in the language of signal processing, by finding an appropriate linear filter, or equivalently, in the language of neural modeling, by using a simple feedforward neural network. In this paper we discuss several aspects of the source separation problem. We give simple conditions on the network output which, if satisfied, guarantee that source separation has been obtained. Then we study adaptive approaches, in particular those based on redundancy reduction and maximisation of mutual information. We show how the resulting updating rules are related to the BCM theory of synaptic plasticity. Eventually...
Feature extraction through LOCOCODE
- NEURAL COMPUTATION
, 1998
Cited by 20 (4 self)
"Low-complexity coding and decoding" (Lococode) is a novel approach to sensory coding and unsupervised learning. Unlike previous methods it explicitly takes into account the information-theoretic complexity of the code generator: it computes lococodes that (1) convey information about the input data and (2) can be computed and decoded by low-complexity mappings. We implement Lococode by training autoassociators with Flat Minimum Search, a recent, general method for discovering low-complexity neural nets. It turns out that this approach can unmix an unknown number of independent data sources by extracting a minimal number of low-complexity features necessary for representing the data. Experiments show: unlike codes obtained with standard autoencoders, lococodes are based on feature detectors, never unstructured, usually sparse, sometimes factorial or local (depending on statistical properties of the data). Although Lococode is not explicitly designed to enforce sparse or factorial codes, it extracts optimal codes for difficult versions of the "bars" benchmark problem, whereas ICA and PCA do not. It also produces familiar, biologically plausible feature detectors when applied to real-world images. As a preprocessor for a vowel recognition benchmark problem it sets the stage for excellent classification performance. Our results reveal an interesting, previously ignored connection between two important fields: regularizer research, and ICA-related research.
Low Entropy Coding with Unsupervised Neural Networks
Cited by 17 (0 self)
ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2-D Gabor functions, which have previously been used to describe physiological receptive fields, and as a means of producing compact image representations.
Keywords: neural networks, unsupervised learning, self-organisation, feature extraction, information theory, redundancy reduction, sparse coding, imaging models, occlusion, image coding, speech coding.
Declaration: This dissertation is the result of my own original work, except where reference is made to the work of others. No part of it has been submitted for any other university degree or diploma. Its length, including captions, footnotes, appendix and bibliography, is approximately 58000 words.
Acknowledgements: I would like first and foremost to thank Richard Prager, my supervisor, fo
Contextually Guided Unsupervised Learning Using Local Multivariate Binary Processors
, 1996
Cited by 12 (1 self)
We consider the role of contextual guidance in learning and processing within multi-stream neural networks. Earlier work (Kay & Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective, and made precise using information theory, in such a way that local binary processors could extract a single feature that is coherent across streams. In this paper we consider multi-unit local processors with multivariate binary outputs that enable a greater number of coherent features to be extracted. Using the Ising model, we define a class of information-theoretic objective functions and also local approximations, and derive the learning rules in both cases. These rules have similarities to, and differences from, the celebrated BCM rule. Local and global versions of Infomax appear as by-products of the general approach, as well as multivariate versions of Coherent Infomax. Focussing on the more biologicall...
Perceptual Learning from Cross-Modal Feedback
- Psychology of Learning and Motivation
, 1997
Cited by 5 (1 self)
Introduction: Ultimately we must understand how humans and animals are able to learn the complicated tasks they do. An important component of that learning process is learning how to form useful categories from sensory data. Thus the focus of this chapter is learning to classify: learning to recognize that particular patterns belong to the same class, which is different from the set of classes that represent other patterns. That such learning can be difficult is illustrated by a commonly used two-dimensional vowel dataset taken from Peterson and Barney (1952), shown in Figure 1. The data represent different utterances of the common vowels of English. As you can easily see, the distributions from different classes overlap, making error-free classification impossible and simple clustering non-optimal. Learning algorithms for classification have been the subject of study in the field of pattern recognition since the 1950s. Such algorithms attempt to

