Results 1–10 of 81
Hierarchical mixtures of experts and the EM algorithm
Neural Computation, 1994
Abstract

Cited by 737 (19 self)
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
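The EM fit the abstract describes can be illustrated with a much-reduced sketch: a flat (non-hierarchical) mixture of two linear-regression experts with constant mixing coefficients and a shared noise variance, rather than the paper's gated, tree-structured architecture. The data, initial values, and iteration count below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data generated by two different linear regimes.
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
z = rng.integers(0, 2, size=n)                 # hidden expert assignment
true_w = np.array([[2.0, 0.5], [-1.5, 1.0]])   # (slope, intercept) per expert
y = true_w[z, 0] * x + true_w[z, 1] + 0.1 * rng.normal(size=n)

X = np.column_stack([x, np.ones(n)])           # design matrix with bias column
w = np.array([[1.0, 0.0], [-1.0, 0.0]])        # asymmetric init breaks symmetry
pi = np.array([0.5, 0.5])                      # mixing coefficients
sigma2 = 1.0                                   # shared noise variance

for _ in range(50):
    # E-step: posterior responsibility of each expert for each data point.
    resid = y[None, :] - w @ X.T               # (2, n) residuals
    logp = np.log(pi)[:, None] - 0.5 * resid**2 / sigma2
    logp -= logp.max(axis=0)                   # stabilise before exponentiating
    r = np.exp(logp)
    r /= r.sum(axis=0)

    # M-step: weighted least squares per expert, then update pi and sigma2.
    for k in range(2):
        A = (X.T * r[k]) @ X
        b = (X.T * r[k]) @ y
        w[k] = np.linalg.solve(A, b)
    resid = y[None, :] - w @ X.T
    pi = r.mean(axis=1)
    sigma2 = (r * resid**2).sum() / n
```

With well-separated regimes the recovered slopes approach the generating values of 2.0 and -1.5; in the paper this per-expert weighted least squares step generalizes to an iteratively reweighted fit for arbitrary GLIM experts.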
Dimension Reduction by Local Principal Component Analysis
1997
Abstract

Cited by 102 (0 self)
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
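A minimal sketch of the local linear idea, assuming the simplest possible pipeline (plain k-means partitioning followed by an ordinary PCA within each cluster) rather than the article's actual algorithm; the data and cluster count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated 3-D clouds, each essentially one-dimensional.
a = np.outer(rng.normal(size=100), [1.0, 0.5, 0.0])
b = np.array([10.0, 10.0, 10.0]) + np.outer(rng.normal(size=100), [0.0, 0.5, 1.0])
X = np.vstack([a, b]) + 0.05 * rng.normal(size=(200, 3))

# Plain k-means with k=2; seed the centres at two mutually distant points.
centers = X[[0, int(np.argmax(np.linalg.norm(X - X[0], axis=1)))]]
for _ in range(20):
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)   # (200, 2) distances
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# One principal direction per cluster via SVD of the centred local data.
local_dirs = []
for k in range(2):
    Xi = X[labels == k]
    _, _, Vt = np.linalg.svd(Xi - Xi.mean(axis=0), full_matrices=False)
    local_dirs.append(Vt[0])
```

A single global PCA would smear both clouds onto one direction; the per-cluster fit recovers each cloud's own one-dimensional structure, which is the gain the abstract claims over plain PCA.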
A first application of independent component analysis to extracting structure from stock returns
International Journal on Neural Systems, 1997
Learning in Linear Neural Networks: a Survey
IEEE Transactions on Neural Networks, 1995
Abstract

Cited by 56 (4 self)
Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and self-organisation can sometimes be answered analytically. We survey most of the known results on linear networks, including: (1) backpropagation learning and the structure of the error function landscape; (2) the temporal evolution of generalization; (3) unsupervised learning algorithms and their properties. The connections to classical statistical ideas, such as principal component analysis (PCA), are emphasized, as well as several simple but challenging open questions. A few new results are also spread across the paper, including an analysis of the effect of noise on backpropagation networks and a unified view of all unsupervised algorithms.
Keywords: linear networks, supervised and unsupervised learning, Hebbian learning, principal components, generalization, local minima, self-organisation
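The link between unsupervised Hebbian learning in a linear unit and PCA that the survey emphasizes can be shown with Oja's rule, which drives a single linear neuron's weight vector to the leading principal direction of the input; the data distribution and learning rate below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Anisotropic 2-D data whose leading eigenvector is [1, 1]/sqrt(2).
C = np.array([[3.0, 2.0], [2.0, 3.0]])       # eigenvalues 5 and 1
X = rng.normal(size=(5000, 2)) @ np.linalg.cholesky(C).T

w = rng.normal(size=2)                       # random initial weight vector
eta = 0.01
for x in X:
    y = w @ x                                # linear unit's output
    w += eta * y * (x - y * w)               # Hebbian term y*x with a decay
                                             # term that keeps ||w|| near 1
```

After one pass, w points (up to sign) along the top eigenvector of the input covariance, illustrating the Hebbian-learning/PCA connection analytically established for linear networks.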
Recognizing handwritten digits using mixtures of linear models
Advances in Neural Information Processing Systems 7, 1995
Abstract

Cited by 56 (6 self)
We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
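The classify-by-model idea can be sketched in a stripped-down form: one PCA per class, with reconstruction error standing in for the per-model log-likelihood, and synthetic 5-D data standing in for digit images. Everything here is an illustrative assumption, not the paper's mixture-per-digit setup.

```python
import numpy as np

rng = np.random.default_rng(2)

def pca_model(X, d):
    """Mean and top-d principal directions of the data (rows = samples)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d]

# Two classes living near different 2-D planes in 5-D space.
B0 = rng.normal(size=(2, 5))
B1 = rng.normal(size=(2, 5))
X0 = rng.normal(size=(300, 2)) @ B0 + 0.05 * rng.normal(size=(300, 5))
X1 = rng.normal(size=(300, 2)) @ B1 + 0.05 * rng.normal(size=(300, 5))

models = [pca_model(X0, 2), pca_model(X1, 2)]

def classify(x):
    """Pick the class whose principal subspace reconstructs x best."""
    errs = [np.linalg.norm(x - (mu + (x - mu) @ V.T @ V)) for mu, V in models]
    return int(np.argmin(errs))

# Accuracy on fresh points drawn from each class's own subspace.
fresh0 = rng.normal(size=(50, 2)) @ B0
fresh1 = rng.normal(size=(50, 2)) @ B1
acc = np.mean([classify(p) == 0 for p in fresh0] +
              [classify(p) == 1 for p in fresh1])
```

Each test point reconstructs almost perfectly under its own class's subspace and poorly under the other's, so reconstruction error separates the classes; the paper's tangent-vector trick amounts to enlarging each class's covariance, and hence its subspace, along known deformation directions.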
Fast nonlinear dimension reduction
IEEE International Conference on Neural Networks, 1993
Abstract

Cited by 45 (4 self)
We present a fast algorithm for nonlinear dimension reduction. The algorithm builds a local linear model of the data by merging PCA with clustering based on a new distortion measure. Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distortion than those built by five-layer autoassociative networks. The local linear algorithm is also more than an order of magnitude faster to train.
A review of dimension reduction techniques
1997
Abstract

Cited by 32 (4 self)
The problem of dimension reduction is introduced as a way to overcome the curse of dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves, and methods based on topologically continuous maps, such as Kohonen's maps or the generative topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Image compression with neural networks: a survey
Signal Processing: Image Communication 14, 1999
Abstract

Cited by 25 (2 self)
Apart from the existing technology on image compression represented by the series of JPEG, MPEG and H.26x standards, new technologies such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting with those traditional technologies. This paper presents an extensive survey on the development of neural networks for image compression, covering three categories: direct image compression by neural networks; neural network implementation of existing techniques; and neural network based technology which provides improvements over traditional algorithms. © 1999 Elsevier Science B.V. All rights reserved.
Unsupervised Neural Network Learning Procedures . . .
1996
Abstract

Cited by 25 (1 self)
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
Low Entropy Coding with Unsupervised Neural Networks
Abstract

Cited by 23 (0 self)
...ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2D Gabor functions, which have previously been used to describe physiological receptive fields, and as a means of producing compact image representations.
Keywords: neural networks, unsupervised learning, self-organisation, feature extraction, information theory, redundancy reduction, sparse coding, imaging models, occlusion, image coding, speech coding.