Results 1–10 of 53
Gradient-based learning applied to document recognition
Proceedings of the IEEE, 1998
Abstract
Cited by 1377 (82 self)
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
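The abstract above centers on convolutional networks. As a hedged illustration of the single core operation behind a convolutional layer (not LeNet itself, and omitting weight sharing across feature maps, subsampling, and training), a "valid" 2D cross-correlation can be sketched as:

```python
# Minimal 2D cross-correlation (the "convolution" used in CNNs), pure Python.
# Hypothetical illustration only: real convolutional layers add multiple
# learned kernels, nonlinearities, and pooling on top of this primitive.
def conv2d_valid(image, kernel):
    """'Valid' cross-correlation of a 2D image with a 2D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge-detecting kernel applied to a tiny image.
image = [[1, 1, 0],
         [1, 1, 0],
         [1, 1, 0]]
kernel = [[1, -1],
          [1, -1]]
print(conv2d_valid(image, kernel))  # [[0.0, 2.0], [0.0, 2.0]]
```

The output responds only where the image changes horizontally, which is why stacks of such local, shift-invariant filters cope well with the variability of 2D shapes.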
The tradeoffs of large scale learning
In: Advances in Neural Information Processing Systems 20, 2008
Abstract
Cited by 248 (4 self)
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in nontrivial ways.
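A practical consequence of the large-scale tradeoff above is that cheap, approximate per-step optimization (stochastic gradient descent) can beat exact optimization under a fixed compute budget. As a hedged sketch of that kind of one-sample-at-a-time update (illustrative only; the paper's contribution is the theoretical analysis, not this loop):

```python
# Stochastic gradient descent on a one-parameter least-squares problem:
# each step uses a single example, trading per-step accuracy for cheap updates.
def sgd_least_squares(xs, ys, lr=0.05, epochs=20):
    """Fit y ~ w*x by per-sample gradient steps on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w += lr * (y - w * x) * x  # negative gradient of 0.5*(y - w*x)**2
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]   # noiseless data with true slope 2
print(round(sgd_least_squares(xs, ys), 3))  # 2.0
```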
Discovering Informative Patterns and Data Cleaning
1996
Abstract
Cited by 59 (1 self)
We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both online and offline algorithms are proposed and experimentally checked on databases of handwritten images. The generality of the framework makes it an attractive candidate for new applications in knowledge discovery. Keywords: knowledge discovery, machine learning, informative patterns, data cleaning, information gain.
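One of the abstract's keywords is information gain. As a hedged sketch of that measure in its standard form (not the paper's specific algorithm for scoring patterns), it is the entropy of a label set minus the weighted entropy of its partitions:

```python
# Information gain: how much a split of the data reduces label entropy.
# Generic textbook definition, used here only to illustrate the keyword.
import math

def entropy(labels):
    n = len(labels)
    out = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        out -= p * math.log2(p)
    return out

def information_gain(labels, partitions):
    """Entropy of the whole set minus weighted entropy of its partitions."""
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(labels) - remainder

labels = [0, 0, 1, 1]
print(information_gain(labels, [[0, 0], [1, 1]]))  # 1.0: a perfect split
```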
Rigorous learning curve bounds from statistical mechanics
Machine Learning, 1994
Abstract
Cited by 57 (10 self)
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power-law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite-cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
1 Introduction
According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by Õ(d/m) (in the case that the target function is contained in F) or Õ(√(d/m)) plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
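As a hedged numeric illustration of the distribution-independent bound the introduction cites: in its common textbook form, with probability at least 1-δ the generalization error is at most roughly sqrt((d·(ln(2m/d)+1) + ln(4/δ)) / m) for VC dimension d and sample size m. This is the classical VC bound, not this paper's tighter distribution-dependent bounds:

```python
# Classical VC-style generalization bound (textbook form); shrinks like
# roughly sqrt(d/m) up to log factors, matching the O-tilde(sqrt(d/m)) rate.
import math

def vc_bound(d, m, delta=0.05):
    return math.sqrt((d * (math.log(2 * m / d) + 1) + math.log(4 / delta)) / m)

for m in (100, 1000, 10000):
    print(m, round(vc_bound(d=10, m=m), 3))  # the bound decreases with m
```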
Multiresolution Abnormal Trace Detection Using Varied-Length n-Grams and Automata
IEEE Trans. on Systems, Man, and Cybernetics Part C: Applications and Reviews, 2007
Abstract
Cited by 25 (5 self)
Detection and diagnosis of faults in a large-scale distributed system is a formidable task. Interest in monitoring and using traces of user requests for fault detection has been on the rise recently. In this paper we propose novel fault detection methods based on abnormal trace detection. One essential problem is how to represent the large amount of training trace data compactly as an oracle. Our key contribution is the novel use of varied-length n-grams and automata to characterize normal traces. A new trace is compared against the learned automata to determine whether it is abnormal. We develop algorithms to automatically extract n-grams and construct multiresolution automata from training data. Further, both deterministic and multi-hypothesis algorithms are proposed for detection. We inspect the trace constraints of real application software and verify the existence of long n-grams. Our approach is tested in a real system with injected faults and achieves good results in experiments.
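A minimal sketch of the varied-length n-gram idea: record every n-gram (for n in some range) that occurs in normal training traces, then flag a new trace as abnormal if it contains an unseen n-gram. This is a deliberate simplification — the paper compiles the n-grams into multiresolution automata rather than querying a flat set:

```python
# Varied-length n-gram model of "normal" traces, simplified to a set lookup.
def learn_ngrams(traces, n_min=1, n_max=3):
    seen = set()
    for trace in traces:
        for n in range(n_min, n_max + 1):
            for i in range(len(trace) - n + 1):
                seen.add(tuple(trace[i:i + n]))
    return seen

def is_abnormal(trace, seen, n_min=1, n_max=3):
    for n in range(n_min, n_max + 1):
        for i in range(len(trace) - n + 1):
            if tuple(trace[i:i + n]) not in seen:
                return True
    return False

normal = [["login", "query", "logout"],
          ["login", "query", "query", "logout"]]
model = learn_ngrams(normal)
print(is_abnormal(["login", "query", "logout"], model))  # False
print(is_abnormal(["login", "logout", "login"], model))  # True: unseen bigram
```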
An Empirical Quest for Optimal Rule Learning Heuristics
2008
Abstract
Cited by 16 (8 self)
The primary goal of the research reported in this paper is to identify what criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this tradeoff by determining optimal parameter settings for five different parametrized heuristics. In order to avoid biasing our study by known functional families, we also investigate the potential of using meta-learning for obtaining alternative rule learning heuristics. The key results of this experimental study are not only practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics, but also theoretical insights into the factors responsible for good performance. For example, we observe that consistency should be weighed more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
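One widely used parametrized heuristic of the kind such studies tune is the m-estimate, which interpolates between pure precision (consistency) at m=0 and the class prior (favoring coverage) as m grows. A hedged sketch of that trade-off, not this paper's code or its fitted parameter values:

```python
# The m-estimate rule evaluation heuristic: (p + m*prior) / (p + n + m),
# where p and n are the positive/negative examples the rule covers.
def m_estimate(p, n, m, prior=0.5):
    return (p + m * prior) / (p + n + m)

# A narrow, perfectly consistent rule vs. a broad, slightly impure one:
narrow = (2, 0)    # covers 2 positives, 0 negatives
broad = (90, 10)   # covers 90 positives, 10 negatives
print(m_estimate(*narrow, m=0))   # 1.0 — pure precision prefers the narrow rule
print(m_estimate(*broad, m=0))    # 0.9
print(m_estimate(*narrow, m=20))  # with m=20 the ranking flips toward coverage
print(m_estimate(*broad, m=20))
```

Raising m penalizes rules supported by few examples, which is exactly the consistency-versus-coverage dial the abstract describes.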
Finding Optimal Neural Networks for Land Use Classification
IEEE Transactions on Geoscience and Remote Sensing, 1998
Abstract
Cited by 13 (0 self)
In this letter we present a fully automatic and computationally efficient algorithm based on the Minimum Description Length (MDL) principle for optimizing multilayer perceptron classifiers. We demonstrate our method on the problem of multispectral Landsat image classification. We compare our results with a hand-designed multilayer perceptron and a Gaussian maximum likelihood classifier; our method produces better classification accuracy with a smaller number of hidden units.
1 Introduction
The number of applications of neural networks to remote sensing problems (especially classification) has been constantly increasing in the last few years (e.g., see [1, 2, 3, 4]). It has been demonstrated that in many cases neural networks perform considerably better than classical methods, e.g. [1]. However, to achieve this superior performance, the neural networks need to be carefully designed. This includes both the design of the network topology as well as the input/output representati...
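The MDL idea the abstract relies on is that the best model minimizes the total description length: the cost of encoding the data given the model plus the cost of encoding the model's parameters. As a hedged, BIC-style sketch of that trade-off (the paper derives its own MDL criterion for multilayer perceptrons; this generic score only illustrates the size-versus-fit balance):

```python
# Generic MDL/BIC-style score: data code length (negative log-likelihood)
# plus roughly 0.5*log(N) nats per parameter. Smaller is better.
import math

def mdl_score(neg_log_likelihood, n_params, n_samples):
    return neg_log_likelihood + 0.5 * n_params * math.log(n_samples)

# Two hypothetical networks on the same 1000-sample training set: a big one
# that fits slightly better, and a small one. MDL penalizes the extra weights.
big = mdl_score(neg_log_likelihood=500.0, n_params=300, n_samples=1000)
small = mdl_score(neg_log_likelihood=520.0, n_params=50, n_samples=1000)
print(small < big)  # True: the small network wins despite the worse fit
```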
Bounds on Sample Size for Policy Evaluation in Markov Environments
In: Fourteenth Annual Conference on Computational Learning Theory, 2001
Abstract
Cited by 13 (2 self)
Reinforcement learning means finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. Stochastic optimization algorithms used in the field rely on estimates of the value of a policy. Typically, the value of a policy is estimated from results of simulating that very policy in the environment.
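A hedged sketch of the kind of sample-size bound the title refers to: if simulated returns are independent and bounded in [0, v_max], Hoeffding's inequality says their average estimates the policy's value within ε (with probability at least 1-δ) once n ≥ v_max²·ln(2/δ)/(2ε²). This is the generic concentration bound, not the paper's sharper, Markov-specific results:

```python
# Hoeffding-based sample-size requirement for Monte Carlo policy evaluation:
# number of simulated returns needed for an eps-accurate value estimate.
import math

def policy_eval_sample_size(v_max, eps, delta):
    return math.ceil(v_max ** 2 * math.log(2 / delta) / (2 * eps ** 2))

print(policy_eval_sample_size(v_max=1.0, eps=0.1, delta=0.05))   # 185
print(policy_eval_sample_size(v_max=1.0, eps=0.01, delta=0.05))  # 18445
```

Note the quadratic blow-up in 1/ε: a 10x tighter estimate costs 100x more simulation, which is what motivates sharper problem-specific bounds.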
Meta-learning rule learning heuristics
Technical Report, Knowledge Engineering Group, 2007
Abstract
Cited by 10 (7 self)
The goal of this paper is to investigate to what extent a rule learning heuristic can be learned from experience. To that end, we let a rule learner learn a large number of rules and record their performance on the test set. Subsequently, we train regression algorithms on predicting the test set performance of a rule from its training set characteristics. We investigate several variations of this basic scenario, including the question whether it is better to predict the performance of the candidate rule itself or of the resulting final rule. Our experiments on a number of independent evaluation sets show that the learned heuristics outperform standard rule learning heuristics. We also analyze their behavior in coverage space.