Results 11-20 of 82
Stochastic gradient learning in neural networks
 In Proceedings of Neuro-Nîmes. EC2
, 1991
Abstract

Cited by 28 (1 self)
Many connectionist learning algorithms consist of minimizing a cost of the form C(w) = E[J(z,w)] = ∫ J(z,w) dP(z), where dP is an unknown probability distribution that characterizes the problem to learn, and J, the loss function, defines the learning system itself. This popular statistical formulation has led to many theoretical results. The minimization of such a cost may be achieved with a stochastic gradient descent algorithm, e.g.: w_{t+1} = w_t − ε_t ∇_w J(z, w_t). With some restrictions on J and C, this algorithm converges, even if J is non-differentiable on a set of measure zero. Links with simulated annealing are also depicted.
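The update rule above is easy to sketch in code. The following is a minimal illustrative instance only: the squared loss J(z, w) = (y − w·x)² on samples z = (x, y) and the step-size schedule ε_t = 1/(10 + t) are our assumptions, not the paper's.

```python
import numpy as np

# Minimal sketch of w_{t+1} = w_t - eps_t * grad_w J(z, w_t).
# Assumptions (illustrative, not the paper's): squared loss
# J(z, w) = (y - w.x)^2 with z = (x, y), and eps_t = 1/(10 + t).

def sgd(samples, w0, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for t in range(steps):
        x, y = samples[rng.integers(len(samples))]  # draw z from the data
        grad = -2.0 * (y - w @ x) * x               # grad_w J(z, w)
        w = w - grad / (10.0 + t)                   # eps_t = 1/(10 + t)
    return w

# Recover w* = [2, -1] from noiseless linear responses.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
data = [(x, float(x @ np.array([2.0, -1.0]))) for x in X]
w = sgd(data, w0=[0.0, 0.0])
```

With a noiseless linear target the iterates settle close to the true weights; the decaying step size is what lets the stochastic updates converge rather than oscillate.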
Prototype Selection for Composite Nearest Neighbor Classifiers
, 1997
Abstract

Cited by 27 (1 self)
Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, the answers to fundamental questions about classifier combination are not known, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number o...
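As a toy illustration of the combination idea (the component rules and data below are made up, not from the paper), a majority vote over three weak threshold classifiers can be right more often than two of its components taken alone:

```python
# Sketch of classifier combination by majority vote. The three component
# classifiers are hypothetical single-feature threshold rules; the true
# label is 1 iff x + y > 1 (an illustrative assumption).

def majority_vote(classifiers, x):
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

clfs = [
    lambda p: int(p[0] > 0.5),          # looks only at the first feature
    lambda p: int(p[1] > 0.5),          # looks only at the second feature
    lambda p: int(p[0] + p[1] > 1.0),   # the true boundary
]
```

For a point such as (0.9, 0.3) the votes are 1, 0, 1, so the composite answers 1, matching the true label even though one component errs.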
A Global Optimization Technique for Statistical Classifier Design
 IEEE Transactions on Signal Processing
Abstract

Cited by 25 (9 self)
A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for optimization, both to incorporate structural constraints on data assignments to classes and to minimize the probability of error as the cost objective. During the design, data are assigned to classes in probability, so as to minimize the expected classification error given a specified level of randomness, as measured by Shannon's entropy. The constrained optimization is equivalent to a free energy minimization, motivating a deterministic annealing approach in which the entropy...
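The core mechanism described above, probabilistic class assignments whose randomness is controlled by a temperature, can be sketched with a Gibbs distribution. The squared-distance cost and the two fixed class representatives below are illustrative assumptions, not the paper's classifier structures.

```python
import numpy as np

# Sketch of temperature-controlled soft assignment: p(class | x) is a Gibbs
# distribution over per-class costs; high T gives high-entropy (near-uniform)
# assignments, low T gives near-hard assignments. Costs here are squared
# distances to two illustrative class centers.

def soft_assign(x, centers, T):
    d = np.array([np.sum((x - c) ** 2) for c in centers])
    p = np.exp(-(d - d.min()) / T)   # shift by the min for numerical stability
    return p / p.sum()

x = np.array([0.4, 0.0])
centers = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
hot = soft_assign(x, centers, T=10.0)    # high entropy: nearly uniform
cold = soft_assign(x, centers, T=0.01)   # low entropy: nearest class wins
```

Annealing consists of repeating the optimization while T is lowered, so the assignments harden gradually instead of committing to a poor local minimum early.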
A neural network based hybrid system for detection, characterization and classification of short-duration oceanic signals
 IEEE Journal of Ocean Engineering
, 1992
Abstract

Cited by 25 (19 self)
Automated identification and classification of short-duration oceanic signals obtained from passive sonar is a complex problem because of the large variability in both temporal and spectral characteristics, even in signals obtained from the same source. This paper presents the design and evaluation of a comprehensive classifier system for such signals. We first highlight the importance of selecting appropriate signal descriptors or feature vectors for high-quality classification of realistic short-duration oceanic signals. Wavelet-based feature extractors are shown to be superior to the more commonly used autoregressive coefficients and power spectral coefficients for this purpose. A variety of static neural network classifiers are evaluated and compared favorably with traditional statistical techniques for signal classification. We concentrate on those networks that are able to tune out irrelevant input features and are less susceptible to noisy inputs, and introduce two new neural-network-based classifiers. Methods for combining the outputs of several classifiers to yield a more accurate labeling are proposed and evaluated based on the interpretation of network outputs as approximating posterior class probabilities. These methods lead to higher classification accuracy and also provide a mechanism for recognizing deviant signals and false alarms. Performance results are given for signals in the DARPA standard data set I.
Keywords: Neural networks, pattern classification, passive sonar, short-duration oceanic signals, feature extraction, evidence combination.
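One simple way to combine outputs interpreted as posterior class probabilities is to average them and pick the maximum, using a low maximum as a flag for deviant signals. This is a generic sketch of that interpretation; the three output vectors are invented for illustration and are not results from the paper.

```python
# Sketch of evidence combination: average the per-classifier output vectors
# (read as posterior class probabilities), label by the maximum, and report
# the averaged posterior as a confidence that can flag deviant inputs.

def combine_posteriors(outputs):
    k = len(outputs[0])
    avg = [sum(o[i] for o in outputs) / len(outputs) for i in range(k)]
    label = max(range(k), key=lambda i: avg[i])
    return label, avg[label]

outputs = [            # hypothetical posteriors from three classifiers
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
]
label, conf = combine_posteriors(outputs)
```

Here all three classifiers favor class 0, so the combined label is 0 with averaged confidence 0.6; a combined confidence near 1/k would instead suggest a false alarm or an unfamiliar signal.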
Using Vector Quantization for Image Processing
 Proc. IEEE
, 1993
Abstract

Cited by 23 (1 self)
Image compression is the process of reducing the number of bits required to represent an image. Vector quantization, the mapping of pixel intensity vectors into binary vectors indexing a limited number of possible reproductions, is a popular image compression algorithm. Compression has traditionally been done with little regard for image processing operations that may precede or follow the compression step. Recent work has used vector quantization both to simplify image processing tasks, such as enhancement, classification, halftoning, and edge detection, and to reduce the computational complexity by performing them simultaneously with the compression. After briefly reviewing the fundamental ideas of vector quantization, we present a survey of vector quantization algorithms that perform image processing.
1 Introduction
Data compression is the mapping of a data set into a bit stream to decrease the number of bits required to represent the data set. With data compression, one can st...
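The mapping the abstract describes, a vector to the index of its nearest codeword and back, can be sketched directly. The tiny 2-D, four-entry codebook below is an illustrative assumption; real image codebooks are trained on pixel blocks.

```python
import numpy as np

# Sketch of vector quantization: encode a pixel-intensity vector as the
# index of its nearest codeword (a 2-bit index here, instead of two floats),
# and decode by looking the codeword back up. The codebook is illustrative.

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

def vq_encode(x):
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def vq_decode(i):
    return codebook[i]

block = np.array([0.9, 0.8])
idx = vq_encode(block)    # transmitted index
recon = vq_decode(idx)    # decoder's reproduction
```

The survey's point is that operations such as classification or halftoning can be folded into this table lookup, so they cost nothing beyond the encoding itself.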
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
Abstract

Cited by 20 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. Training the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
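The LVQ side of the approach rests on a simple supervised update: pull the nearest codebook vector toward a sample when their labels agree, push it away when they disagree. The following is a minimal LVQ1-style sketch with a hypothetical 1-D codebook and learning rate, not the paper's configuration.

```python
import numpy as np

# Sketch of one LVQ1 update step: find the codeword nearest to sample x and
# move it toward x if its label matches y, away from x otherwise. The 1-D
# codebook, labels, and alpha = 0.1 are illustrative assumptions.

def lvq1_step(codebook, labels, x, y, alpha=0.1):
    i = int(np.argmin([abs(c - x) for c in codebook]))  # nearest codeword
    sign = 1.0 if labels[i] == y else -1.0
    codebook[i] += sign * alpha * (x - codebook[i])
    return i

codebook = [0.0, 1.0]
labels = ["a", "b"]
i = lvq1_step(codebook, labels, x=0.2, y="a")  # nearest is 0.0; labels agree
```

Iterating such steps sharpens the class boundaries of a codebook first laid out by a SOM, which is how the two paradigms complement each other in this work.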
Online learning and stochastic approximations
 In Online Learning in Neural Networks
, 1998
Abstract

Cited by 20 (0 self)
The convergence of online learning algorithms is analyzed using tools from stochastic approximation theory, and proved under very weak conditions. A general framework for online learning algorithms is first presented. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. The stochastic approximation theory then provides general results describing the convergence of all these learning algorithms at once.
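A classic instance of the kind of framework described here is the update w_{t+1} = w_t − ε_t H(z_t, w_t) with step sizes ε_t = 1/t, which satisfy the usual stochastic-approximation conditions (Σ ε_t = ∞, Σ ε_t² < ∞). Choosing H(z, w) = w − z, an assumption made purely for illustration, reduces it to the running mean:

```python
# Sketch of the general online update w_{t+1} = w_t - eps_t * H(z_t, w_t)
# with eps_t = 1/t. With H(z, w) = w - z (illustrative choice), the iterate
# w_t is exactly the mean of the first t observations.

def online_mean(stream):
    w = 0.0
    for t, z in enumerate(stream, start=1):
        w = w - (w - z) / t   # eps_t = 1/t
    return w

est = online_mean([2.0, 4.0, 6.0, 8.0])   # mean of the stream
```

The same schedule and the same convergence argument cover far richer choices of H, which is the point of analyzing all such algorithms at once.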
Self-Organizing Process Based On Lateral Inhibition And Synaptic Resource Redistribution
 In Proceedings of the International Conference on Artificial Neural Networks
, 1991
Abstract

Cited by 18 (7 self)
Self-organization can be efficiently implemented based on Euclidean distance and global supervision. It is not necessary to explicitly model the connections between the units in the network. Every unit computes the distance between its weight vector and the input vector. An external supervisor finds the unit with the smallest distance, looks up the current neighborhood radius from a training schedule, and tells the units within this radius to modify their input weights. The weight adaptations are proportional to the Euclidean difference. The weights of unit (i, j) in a 2-D map are ...
Figure 1: Abstract implementation of self-organization, shown after 0, 30, 100, and 10,000 samples. The map consists of 20 × 20 units in a 2-D array. The weight vector of each unit is shown as a point on the unit square 0 ≤ x, y ≤ 1, and each vector is connected with a line to the weight vectors of the four neighboring units. In other words, each intersection ...
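The supervised loop described above, find the best-matching unit, look up the shrinking radius, update every unit inside it, can be sketched directly. The grid size, step count, learning rate, and linear radius schedule below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Sketch of the described abstract self-organization on a 20 x 20 map:
# a global supervisor finds the unit whose weight vector is closest to the
# input, then every unit within the current (shrinking) radius moves a
# fraction of the Euclidean difference toward the input.

def som_train(inputs, grid=20, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random((grid, grid, 2))              # weights on the unit square
    ii, jj = np.indices((grid, grid))
    for t in range(steps):
        x = inputs[rng.integers(len(inputs))]
        d = np.sum((w - x) ** 2, axis=2)
        bi, bj = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
        radius = max(1.0, (grid / 2.0) * (1.0 - t / steps))  # schedule lookup
        mask = (ii - bi) ** 2 + (jj - bj) ** 2 <= radius ** 2
        w[mask] += 0.1 * (x - w[mask])           # proportional adaptation
    return w

inputs = np.random.default_rng(1).random((500, 2))
w = som_train(inputs)
```

With uniformly distributed inputs the weight lattice unfolds to cover the unit square, which is what the four panels of Figure 1 illustrate at increasing sample counts.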
Mixture of Experts Regression Modeling by Deterministic Annealing
 IEEE Transactions on Signal Processing
, 1997
Abstract

Cited by 18 (3 self)
We propose a new learning algorithm for regression modeling. The method is especially suitable for optimizing neural network structures that are amenable to a statistical description as mixture models. These include mixtures of experts, hierarchical mixtures of experts (HME), and normalized radial basis functions (NRBF). Unlike recent maximum likelihood (ML) approaches, we directly minimize the (squared) regression error. We use the probabilistic framework as a means to define an optimization method that avoids many shallow local minima on the complex cost surface. Our method is based on deterministic annealing (DA), where the entropy of the system is gradually reduced, with the expected regression cost (energy) minimized at each entropy level. The corresponding Lagrangian is the system's "free energy," and this annealing process is controlled by variation of the Lagrange multiplier, which acts as a "temperature" parameter. The new method consistently and substantially outperformed the com...
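For readers unfamiliar with the model class being optimized, a mixture-of-experts prediction is a softmax-gated blend of expert outputs. The two linear experts and linear gate scores below are illustrative assumptions; the paper's contribution concerns how such models are trained (by deterministic annealing), not this forward pass.

```python
import math

# Sketch of a mixture-of-experts forward pass: softmax gates g_i(x) weight
# the expert outputs f_i(x). Experts and gate scores here are hypothetical
# linear functions chosen for illustration.

def moe_predict(x, experts, gate_scores):
    s = [g(x) for g in gate_scores]
    m = max(s)                                  # shift for numerical stability
    w = [math.exp(v - m) for v in s]
    z = sum(w)
    return sum((wi / z) * f(x) for wi, f in zip(w, experts))

experts = [lambda x: 2 * x, lambda x: -x + 3]
gates = [lambda x: 5 * x, lambda x: -5 * x]    # first expert dominates for x > 0
y = moe_predict(2.0, experts, gates)
```

At x = 2 the first gate wins almost completely, so the prediction is essentially the first expert's output, 4; annealing-based training adjusts both the experts and the gates while keeping the assignments soft early on.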
Unsupervised Classification Learning from Cross-Modal Environmental Structure
, 1994
Abstract

Cited by 18 (2 self)
This dissertation addresses the problem of unsupervised learning for pattern classification or category learning. A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm. The advantage of supervised learning is that the final error metric is available during training. Unfortunately, when modeling human category learning, or in constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. We show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. For example, the co-occurrence of a visual image of a cow with a "moo" sound can be used to simultaneously develop appropriate visual features for distinguishing the cow image and appropriate auditory features for recognizing the moo. We mode...