Results 1–10 of 19
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
 Proceedings of the IEEE
, 1998
Abstract

Cited by 247 (11 self)
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
Latent Variable Models for Neural Data Analysis
, 1999
Abstract

Cited by 42 (5 self)
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis. It is divided
Soft Learning Vector Quantization
 NEURAL COMPUTATION
, 2002
Abstract

Cited by 34 (0 self)
Learning Vector Quantization is a popular class of adaptive nearest prototype classifiers for multiclass classification, but learning algorithms from this family have so far been proposed on heuristic grounds. Here we take a more principled approach and derive two variants of Learning Vector Quantization using a Gaussian mixture ansatz. We propose an objective function which is based on a likelihood ratio and we derive a learning rule using gradient descent. The new approach provides a way to extend the algorithms of the LVQ family to different distance measures and allows for the design of "soft" Learning Vector Quantization algorithms. Benchmark results show that the new methods lead to better classification performance than LVQ 2.1. An additional benefit of the new method is that model assumptions are made explicit, so that the method can be adapted more easily to different kinds of problems.
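The likelihood-ratio objective this abstract describes can be made concrete with a small sketch. The function below is a hypothetical illustration under a Gaussian mixture ansatz, not the paper's exact update: each prototype carries a class label, Gaussian responsibilities are computed per prototype, and one gradient step on log p(x, y) − log p(x) softly attracts correct-class prototypes toward the sample and repels the rest.

```python
import numpy as np

def soft_lvq_step(prototypes, labels, x, y, sigma=1.0, lr=0.1):
    """One gradient step of a soft LVQ-style update (illustrative sketch).

    prototypes: (K, D) float array; labels: (K,) class label per prototype.
    x: (D,) training point with true class y; sigma is the Gaussian width.
    """
    d2 = np.sum((prototypes - x) ** 2, axis=1)        # squared distances
    g = np.exp(-d2 / (2.0 * sigma ** 2))              # Gaussian responsibilities
    same = (labels == y)                              # correct-class prototypes
    p_correct = g[same].sum() / g.sum()               # posterior of class y
    # Gradient of log p(x, y) - log p(x): correct-class prototypes are
    # attracted, all prototypes are repelled in proportion to their overall
    # responsibility; the two terms cancel as assignments harden.
    r_all = g / g.sum()
    r_cor = np.where(same, g / g[same].sum(), 0.0)
    coef = (r_cor - r_all) / sigma ** 2
    prototypes += lr * coef[:, None] * (x - prototypes)
    return prototypes, p_correct
```

With one prototype per class, a sample near the correct prototype pulls it closer while pushing the wrong-class prototype away, which is the familiar LVQ 2.1 behavior recovered in the soft limit.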
Soft Nearest Prototype Classification
, 2002
Abstract

Cited by 24 (0 self)
We propose a new method for the construction of nearest prototype classifiers which is based on a Gaussian mixture ansatz and which can be interpreted as an annealed version of Learning Vector Quantization. The algorithm performs a gradient descent on a cost function minimizing the classification error on the training set. We investigate the properties of the algorithm and assess its performance for several toy data sets and for an optical letter classification task. Results show (i) that annealing in the dispersion parameter of the Gaussian kernels improves classification accuracy, (ii) that classification results are better than those obtained with standard Learning Vector Quantization (LVQ 2.1, LVQ 3) for equal numbers of prototypes and (iii) that annealing of the width parameter improves the classification capability. Additionally, the principled approach provides an explanation of a number of features of the (heuristic) LVQ methods.
A Generalized VQ Method for Combined Compression and Estimation
 Proc. IEEE Intl. Conf. Acoustics, Speech, and Sig. Proc
, 1996
Abstract

Cited by 19 (5 self)
In vector quantization, one approximates an input random vector, Y, by choosing from a finite set of values known as the codebook. We consider a more general problem where one may not have direct access to Y but only to some statistically related random vector X. We observe X and would like to generate an approximation to Y from a codebook of candidate vectors. This operation, called generalized vector quantization (GVQ), is essentially that of quantized estimation. An important special case of GVQ is the problem of noisy source coding wherein a quantized approximation of a vector, Y, is obtained from observation of its noise-corrupted version, X. The optimal GVQ encoder has high complexity. We overcome the complexity barrier by optimizing a structurally constrained encoder. This challenging optimization task is solved via a probabilistic approach, based on deterministic annealing (DA), which overcomes problems of shallow local minima that trap simpler descent methods. We demonstrate th...
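The DA idea of softening the encoder can be sketched as Gibbs assignments of a point to codewords at a temperature T; the function name and interface below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def da_assignments(x, codebook, T):
    """Soft (Gibbs) assignments of point x to codewords at temperature T.

    As T -> 0 these converge to the hard nearest-codeword assignment;
    deterministic annealing lowers T gradually so that optimization at
    each stage starts from the smoother solution of the previous one.
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)   # squared distortion per codeword
    logits = -d2 / T
    logits -= logits.max()                     # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

At high T every codeword receives nearly equal probability (the smooth, easy-to-optimize regime); as T is lowered, probability mass concentrates on the nearest codeword.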
Deterministically Annealed Design of Hidden Markov Model Speech Recognizers
, 2001
Abstract

Cited by 18 (4 self)
Many conventional speech recognition systems are based on the use of hidden Markov models (HMM) within the context of discriminant-based pattern classification. While the speech recognition objective is a low rate of misclassification, HMM design has been traditionally approached via maximum likelihood (ML) modeling which is, in general, mismatched with the minimum error objective and hence suboptimal. Direct minimization of the error rate is difficult because of the complex nature of the cost surface, and has only been addressed recently by discriminative design methods such as generalized probabilistic descent (GPD). While existing discriminative methods offer significant benefits, they commonly rely on local optimization via gradient descent whose performance suffers from the prevalence of shallow local minima. As an alternative, we propose the deterministic annealing (DA) design method that directly minimizes the error rate while avoiding many poor local minima of the cost. DA is derived from fundamental principles of statistical physics and information theory. In DA, the HMM classifier's decision is randomized and its expected error rate is minimized subject to a constraint on the level of randomness which is measured by the Shannon entropy. The entropy constraint is gradually relaxed, leading in the limit of zero entropy to the design of regular non-random HMM classifiers. An efficient forward-backward algorithm is proposed for the DA method. Experiments on synthetic data and on a simplified recognizer for isolated English letters demonstrate that the DA design method can improve recognition error rates over both ML and GPD methods.
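The randomized classifier decision with an entropy constraint can be illustrated by a Gibbs distribution over class scores. `randomized_decision` is a hypothetical helper, and T plays the role of the multiplier that trades expected error against Shannon entropy: relaxing the entropy constraint (lowering T) recovers the deterministic argmax classifier.

```python
import numpy as np

def randomized_decision(scores, T):
    """Gibbs-randomized classifier decision over per-class scores.

    Returns the decision distribution and its Shannon entropy. At high T
    the rule is nearly uniform (maximum entropy); as T -> 0 it converges
    to the non-random argmax classifier, mirroring the DA design limit.
    """
    z = np.asarray(scores, dtype=float) / T
    z -= z.max()                              # numerical stability
    p = np.exp(z)
    p /= p.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))  # Shannon entropy (nats)
    return p, entropy
```

Because the expected error under this randomized rule is a smooth function of the model parameters at every T > 0, gradient-based design avoids the piecewise-constant cost surface that traps direct error-rate minimization.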
The Deterministic Annealing Approach For Discriminative Continuous HMM Design
 In IEEE Workshop on Neural Networks and Signal Processing
, 1997
Abstract

Cited by 6 (3 self)
We propose a deterministic annealing (DA) algorithm to design classifiers based on continuous observation hidden Markov models. The algorithm belongs to the class of minimum classification error (MCE) techniques that are known to outperform maximum likelihood (ML) design. Most MCE methods smooth the piecewise constant classification error cost to facilitate the use of local descent optimization methods, but are susceptible to the numerous shallow local minimum traps that riddle the cost surface. The DA approach employs randomization of the classification rule followed by minimization of the corresponding expected misclassification rate, while controlling the level of randomness via a constraint on the Shannon entropy. The effective cost function is smooth and converges to the MCE cost in the limit of zero entropy. The proposed algorithm significantly outperforms both standard ML and standard MCE design methods on the E-set database. Keywords: Speech recognition, Discriminative trainin...
Statistical Physics of Clustering Algorithms
 DIPLOMARBEIT, TECHNISCHE UNIVERSITÄT, FB PHYSIK, INSTITUT FÜR THEORETISCHE PHYSIK
, 1998
Deterministically Annealed Mixture Of Experts Models For Statistical Regression
, 1997
Abstract

Cited by 2 (0 self)
We attack the problem of statistical regression function design for the class of mixture models which includes the hierarchical mixture of experts (HME) and the normalized radial basis functions (NRBF). There has been much interest in this area, prompted by algorithms based on the maximum likelihood (ML) approach, which emphasize a probabilistic description of the model. While these algorithms have several attractive properties, their design objective is mismatched to the original squared-error regression cost and the algorithms are easily trapped by poor local minima on the cost surface. In this paper, we propose an extension of the deterministic annealing (DA) method for the design of mixture-based regression models. We construct a probabilistic framework, but unlike the ML method, we directly optimize the squared-error regression cost, while avoiding poor local minima. Experimental results show that the DA method outperforms standard design methods for both HME and NRBF regression m...
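The NRBF regression function this abstract refers to can be sketched as a convex combination of per-unit output weights with normalized Gaussian responsibilities; the function name, the shared width, and the scalar-output simplification below are assumptions made for illustration.

```python
import numpy as np

def nrbf_predict(x, centers, weights, sigma=1.0):
    """Normalized radial basis function (NRBF) regression prediction (sketch).

    centers: (K, D) unit centers; weights: (K,) per-unit scalar outputs.
    The Gaussian responsibilities are normalized to sum to one, so the
    prediction is a convex combination of the unit weights.
    """
    d2 = np.sum((centers - x) ** 2, axis=1)
    g = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian activations
    return np.dot(g / g.sum(), weights)      # responsibility-weighted average
```

Because the output is a convex combination, the prediction always lies between the smallest and largest unit weights; it is this soft partition of the input space that the DA design anneals.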