Results 1  10
of
62
Hierarchical mixtures of experts and the EM algorithm
 Neural Computation
, 1994
"... We present a treestructured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood ..."
Abstract

Cited by 723 (19 self)
 Add to MetaCart
We present a treestructured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood problem; in particular, we present an ExpectationMaximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. 1
Dimension Reduction by Local Principal Component Analysis
, 1997
"... Reducing or eliminating statistical redundancy between the components of highdimensional vector data enables a lowerdimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural networ ..."
Abstract

Cited by 99 (0 self)
 Add to MetaCart
Reducing or eliminating statistical redundancy between the components of highdimensional vector data enables a lowerdimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
A first application of independent component analysis to extracting structure from stock returns
 INTERNATIONAL JOURNAL OF NEURAL SYSTEMS
, 1997
"... This paper discusses the application of a modern signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series int ..."
Abstract

Cited by 57 (1 self)
 Add to MetaCart
This paper discusses the application of a modern signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). This can be viewed as a factorization of the portfolio since joint probabilities become simple products in the coordinate system of the ICs. We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis. The results indicate that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. Independent component analysis is a potentially powerful method of analyzing and understanding driving mechanisms in financial markets. There are further
Recognizing handwritten digits using mixtures of linear models
 Advances in Neural Information Processing Systems 7
, 1995
"... We construct a mixture of locally linear generative models of a collection of pixelbased images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their loglikelihoods under each mo ..."
Abstract

Cited by 56 (6 self)
 Add to MetaCart
We construct a mixture of locally linear generative models of a collection of pixelbased images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their loglikelihoods under each model. We use an EMbased algorithm in which the Mstep is computationally straightforward principal components analysis (PCA). Incorporating tangentplane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance. 1
Learning in Linear Neural Networks: a Survey
 IEEE Transactions on neural networks
, 1995
"... Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and selforganisation can sometimes be answered analytically. We survey most of the known results on linear networks, including: (1) backpropagation learning and the structure ..."
Abstract

Cited by 56 (4 self)
 Add to MetaCart
Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and selforganisation can sometimes be answered analytically. We survey most of the known results on linear networks, including: (1) backpropagation learning and the structure of the error function landscape; (2) the temporal evolution of generalization; (3) unsupervised learning algorithms and their properties. The connections to classical statistical ideas, such as principal component analysis (PCA), are emphasized as well as several simple but challenging open questions. A few new results are also spread across the paper, including an analysis of the effect of noise on backpropagation networks and a unified view of all unsupervised algorithms. Keywords linear networks, supervised and unsupervised learning, Hebbian learning, principal components, generalization, local minima, selforganisation I. Introduction This paper addresses the problems of supervise...
Fast nonlinear dimension reduction
 In IEEE International Conference on Neural Networks
, 1993
"... We present a fast algorithm for nonlinear dimension reduction. The algorithm builds a local linear model of the data by merging PCA with clustering based on a new distortion measure. Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distor ..."
Abstract

Cited by 44 (4 self)
 Add to MetaCart
We present a fast algorithm for nonlinear dimension reduction. The algorithm builds a local linear model of the data by merging PCA with clustering based on a new distortion measure. Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distortion than those built by velayer autoassociative networks. The local linear algorithm is also more than an order of magnitude faster to train. 1
A review of dimension reduction techniques
, 1997
"... The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in highdimensional spaces and as a modelling tool for such data. It is defined as the search for a lowdimensional manifold that embeds the highdimensional data. A cl ..."
Abstract

Cited by 30 (4 self)
 Add to MetaCart
The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in highdimensional spaces and as a modelling tool for such data. It is defined as the search for a lowdimensional manifold that embeds the highdimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Unsupervised Neural Network Learning Procedures . . .
, 1996
"... In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: informationpreserving methods, densi ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: informationpreserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to realworld problems.
Justifying and generalizing contrastive divergence
, 2007
"... We study an expansion of the loglikelihood in undirected graphical models such as the Restricted Boltzmann Machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector, in RBMs). We ar ..."
Abstract

Cited by 23 (10 self)
 Add to MetaCart
We study an expansion of the loglikelihood in undirected graphical models such as the Restricted Boltzmann Machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector, in RBMs). We are particularly interested in estimators of the gradient of the loglikelihood obtained through this expansion. We show that its terms converge to zero, justifying the use of a truncation, i.e. running only a short Gibbs chain, which is the main idea behind the Contrastive Divergence approximation of the loglikelihood gradient. By truncating even more, we obtain a stochastic reconstruction error, related through a meanfield approximation to the reconstruction error often used to train autoassociators and stacked autoassociators. The derivation is not specific to the particular parametric forms used in RBMs, and only requires convergence of the Gibbs chain. 1
Image compression with neural networks  A survey
 Signal Processing: Image Communication 14
, 1999
"... Apart from the existing technology on image compression represented by series of JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
Apart from the existing technology on image compression represented by series of JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play signi"cant roles in assisting with those traditional technologies. This paper presents an extensive survey on the development of neural networks for image compression which covers three categories: direct image compression by neural networks; neural network implementation of existing techniques, and neural network based technology which provide improvement over traditional algorithms. # 1999 Elsevier Science B.V. All rights reserved.