Results 1  10
of
82
Gradientbased learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Abstract

Cited by 742 (59 self)
 Add to MetaCart
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify highdimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two dimensional (2D) shapes, are shown to outperform all other techniques. Reallife document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multimodule systems to be trained globally using gradientbased methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
PopulationBased Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning
, 1994
"... Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within th ..."
Abstract

Cited by 301 (12 self)
 Add to MetaCart
Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within the framework of competitive learning. This new perspective reveals a number of different possibilities for performance improvements. This paper explores populationbased incremental learning (PBIL), a method of combining the mechanisms of a generational genetic algorithm with simple competitive learning. The combination of these two methods reveals a tool which is far simpler than a GA, and which outperforms a GA on large set of optimization problems in terms of both speed and accuracy. This paper presents an empirical analysis of where the proposed technique will outperform genetic algorithms, and describes a class of problems in which a genetic algorithm may be able to perform better. Extensions to this algorithm are discussed and analyzed. PBIL and extensions are compared with a standard GA on twelve problems, including standard numerical optimization functions, traditional GA test suite problems, and NPComplete problems.
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
 Proceedings of the IEEE
, 1998
"... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, ph ..."
Abstract

Cited by 248 (11 self)
 Add to MetaCart
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the illdefined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms
 ARTIFICIAL INTELLIGENCE REVIEW
, 1997
"... Many lazy learning algorithms are derivatives of the knearest neighbor (kNN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that kNN's performance is highly sensitive to the definition of its distance function. Many kNN v ..."
Abstract

Cited by 111 (0 self)
 Add to MetaCart
Many lazy learning algorithms are derivatives of the knearest neighbor (kNN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that kNN's performance is highly sensitive to the definition of its distance function. Many kNN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weightsetting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less preprocessing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
Connectionist Probability Estimation in HMM Speech Recognition
 IEEE Transactions on Speech and Audio Processing
, 1992
"... This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech ..."
Abstract

Cited by 62 (16 self)
 Add to MetaCart
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state of the art HMM system. ii Part I INTRODUCTION Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
Recognizing Teleoperated Manipulations
 Proceedings of the IEEE International Conference on Robotics and Automation
, 1993
"... The many degreesoffreedom and distributed sensing capability of dextrous robot hands permits the use of control programs that rely on qualitative changes in sensor feedback rather than precise positioning and force information. One way of designing such a control program is to have the robot learn ..."
Abstract

Cited by 48 (1 self)
 Add to MetaCart
The many degreesoffreedom and distributed sensing capability of dextrous robot hands permits the use of control programs that rely on qualitative changes in sensor feedback rather than precise positioning and force information. One way of designing such a control program is to have the robot learn the qualitative control characteristics from examples. A convenient way of providing these examples is via teleoperation. To this end, this paper presents results for recognizing and segmenting manipulation primitives from a teleoperated task by analysis of features in sensor feedback. knearest quantized pattern vectors determine potential classifications. A hidden Markov model provides task context for the final segmentation. The illustrative task is picking up a plastic egg with a spatula. 1 Introduction Dextrous hands can be controlled qualitatively using strategies that servo on significant changes in sensor feedback. Using the many degrees of freedom and distributed sensing capabili...
Modular Neural Networks for Learning ContextDependent Game Strategies
 Master’s thesis, Computer Speech and Language Processing
, 1992
"... The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss decision. Recent results by Tesauro in the domain of backgammon indicate that a neural network, trained by TD methods to evaluate positions generated by selfplay, can reach an advanced level of backgammon skill. For my summer thesis project, I first implemented the TD/neural network learning algorithms and confirmed Tesauro's results, using the domains of tictactoe and backgammon. Then, motivated by Waibel's success with modular neural networks for phoneme recognition, I experimented with using two modular architectures (DDD and MetaPi) in place of the monolithic networks. I found that using the modular networks significantly enhanced the ability of the backgammon evaluator to change it...
Performance evaluation of pattern classifiers for handwritten character recognition
 International Journal on Document Analysis and Recognition
, 2002
"... Abstract. This paper describes a performance evaluation study in which some efficient classifiers are tested in handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning v ..."
Abstract

Cited by 31 (3 self)
 Add to MetaCart
Abstract. This paper describes a performance evaluation study in which some efficient classifiers are tested in handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved at moderate memory space and computation cost. The performance is measured in terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers is enhanced by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. As results, the test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest neighbor (1NN) rule and regularized discriminant analysis (RDA). It is shown that neural classifiers are more susceptible to small sample size than MQDF, although they yield higher accuracies on large sample size. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages and they should be appropriately combined to achieve higher performance.
Population Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitve Learning
, 1994
"... Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within the framework of competitive learning. This new perspective reveals a number of different possibilities for performance improvements. This paper explores population based incremental learning (PBIL), a method of combining the mechanisms of a generational genetic algorithm with simple competitive learning. The combination of these two methods reveals a tool which is far simpler than a GA, and which outperforms a GA on large set of optimization problems in terms of both speed and accuracy. This paper presents an empirical analysis of where the proposed technique will outperform genetic algorithms, and describes a class of problems in which a genetic algorithm may be able to perform b...
Bayes Risk Weighted Vector Quantization With Posterior Estimation for Image Compression and Classification
 IEEE TRANSACTIONS ON IMAGE PROCESSING
, 1996
"... Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and lowlevel classification, it is not surpri ..."
Abstract

Cited by 28 (8 self)
 Add to MetaCart
Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and lowlevel classification, it is not surprising that there are many similar methods for their design. Because some of these methods are useful for designing vector quantizers, it seems natural that vector quantization (VQ) is explored for the combined goal. We investigate several VQbased algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks. These algorithms are investigated with both full search and treestructured codes. We emphasize a nonparametric technique that minimizes both error measures simultaneously by incorporating a Bayes risk component into the distortion measure used for design and encoding. We introduce a treestructured posterior estimator to produce t...