Results 1  10
of
1,622
Gradientbased learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Abstract

Cited by 1465 (84 self)
 Add to MetaCart
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify highdimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two dimensional (2D) shapes, are shown to outperform all other techniques. Reallife document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multimodule systems to be trained globally using gradientbased methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Prediction of complete gene structures in human genomic DNA
 J. Mol. Biol
, 1997
"... The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely ..."
Abstract

Cited by 1160 (9 self)
 Add to MetaCart
(Show Context)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely
Probabilistic PartofSpeech Tagging Using Decision Trees
, 1994
"... In this paper, a new probabilistic tagging method is presented which avoids problems that Markov Model based taggers face, when they have to estimate transition probabilities from sparse data. In this tagging method, transition probabilities are estimated using a decision tree. Based on this method, ..."
Abstract

Cited by 1009 (9 self)
 Add to MetaCart
In this paper, a new probabilistic tagging method is presented which avoids problems that Markov Model based taggers face, when they have to estimate transition probabilities from sparse data. In this tagging method, transition probabilities are estimated using a decision tree. Based on this method, a partofspeech tagger (called TreeTagger) has been implemented which achieves 96.36 % accuracy on PennTreebank data which is better than that of a trigram tagger (96.06 %) on the same data.
Good ErrorCorrecting Codes based on Very Sparse Matrices
, 1999
"... We study two families of errorcorrecting codes defined in terms of very sparse matrices. "MN" (MacKayNeal) codes are recently invented, and "Gallager codes" were first investigated in 1962, but appear to have been largely forgotten, in spite of their excellent properties. The ..."
Abstract

Cited by 741 (23 self)
 Add to MetaCart
We study two families of errorcorrecting codes defined in terms of very sparse matrices. "MN" (MacKayNeal) codes are recently invented, and "Gallager codes" were first investigated in 1962, but appear to have been largely forgotten, in spite of their excellent properties. The decoding of both codes can be tackled with a practical sumproduct algorithm. We prove that these codes are "very good," in that sequences of codes exist which, when optimally decoded, achieve information rates up to the Shannon limit. This result holds not only for the binarysymmetric channel but also for any channel with symmetric stationary ergodic noise. We give experimental results for binarysymmetric channels and Gaussian channels demonstrating that practical performance substantially better than that of standard convolutional and concatenated codes can be achieved; indeed, the performance of Gallager codes is almost as close to the Shannon limit as that of turbo codes.
Object Tracking: A Survey
, 2006
"... The goal of this article is to review the stateoftheart tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns o ..."
Abstract

Cited by 690 (8 self)
 Add to MetaCart
The goal of this article is to review the stateoftheart tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, objecttoobject and objecttoscene occlusions, and camera motion. Tracking is usually performed in the context of higherlevel applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
The Infinite Hidden Markov Model
 Machine Learning
, 2002
"... We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. Th ..."
Abstract

Cited by 629 (41 self)
 Add to MetaCart
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying statetransition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infiniteconsider, for example, symbols being possible words appearing in English text.
Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms
 IEEE Transactions on Information Theory
, 2005
"... Important inference problems in statistical physics, computer vision, errorcorrecting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems t ..."
Abstract

Cited by 586 (13 self)
 Add to MetaCart
(Show Context)
Important inference problems in statistical physics, computer vision, errorcorrecting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain regionbased free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a “valid ” or “maxentnormal ” approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the “Bethe method, ” the “junction graph method, ” the “cluster variation method, ” and the “region graph method.” Finally, we explain how to tell whether a regionbased approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.
Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
, 1997
"... ..."
Turbo decoding as an instance of Pearl’s belief propagation algorithm
 IEEE Journal on Selected Areas in Communications
, 1998
"... Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pear ..."
Abstract

Cited by 420 (16 self)
 Add to MetaCart
(Show Context)
Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pearl’s belief propagation algorithm. We shall see that if Pearl’s algorithm is applied to the “belief network ” of a parallel concatenation of two or more codes, the turbo decoding algorithm immediately results. Unfortunately, however, this belief diagram has loops, and Pearl only proved that his algorithm works when there are no loops, so an explanation of the excellent experimental performance of turbo decoding is still lacking. However, we shall also show that Pearl’s algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other errorcontrol systems, including Gallager’s
A practical partofspeech tagger
 IN PROCEEDINGS OF THE THIRD CONFERENCE ON APPLIED NATURAL LANGUAGE PROCESSING
, 1992
"... We present an implementation of a partofspeech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and op ..."
Abstract

Cited by 404 (5 self)
 Add to MetaCart
We present an implementation of a partofspeech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in highspeed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.