Results 1  10
of
184
Experiments with a New Boosting Algorithm
, 1996
"... In an earlier paper, we introduced a new “boosting” algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the relate ..."
Abstract

Cited by 1631 (20 self)
 Add to MetaCart
In an earlier paper, we introduced a new “boosting” algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a “pseudoloss ” which is a method for forcing a learning algorithm of multilabel conceptsto concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost with and without pseudoloss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman’s “bagging ” method when used to aggregate various classifiers (including decision trees and single attributevalue tests). We compared the performance of the two methods on a collection of machinelearning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearestneighbor classifier on an OCR problem.
Gradientbased learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Abstract

Cited by 732 (58 self)
 Add to MetaCart
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify highdimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two dimensional (2D) shapes, are shown to outperform all other techniques. Reallife document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multimodule systems to be trained globally using gradientbased methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract

Cited by 667 (22 self)
 Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the wellknown methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
An introduction to kernelbased learning algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 375 (48 self)
 Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Distance metric learning for large margin nearest neighbor classification
 In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract

Cited by 326 (10 self)
 Add to MetaCart
We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 267 (33 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Discriminant Adaptive Nearest Neighbor Classification
, 1994
"... Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor classification to try to ameliorate this curse of dimensionality. We use a local linear discriminant an ..."
Abstract

Cited by 243 (1 self)
 Add to MetaCart
Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor classification to try to ameliorate this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighborhoods. We determine the local decision boundaries from centroid information, and then shrink neighborhoods in directions orthogonal to these local decision boundaries, and elongate them parallel to the boundaries. Thereafter, any neighborhoodbased classifier can be employed, using the modified neighborhoods. The posterior probabilities tend to be more homogeneous in the modified neighborhoods. We also propose a method for global dimension reduction, that combines local dimension information. In a number of examples, the methods demonstrate the potential for substantial improvements over nearest neighbor classification. Keywords...
Svmknn: Discriminative nearest neighbor classification for visual category recognition
 in CVPR
, 2006
"... We consider visual category recognition in the framework of measuring similarities, or equivalently perceptual distances, to prototype examples of categories. This approach is quite flexible, and permits recognition based on color, texture, and particularly shape, in a homogeneous framework. While n ..."
Abstract

Cited by 210 (7 self)
 Add to MetaCart
We consider visual category recognition in the framework of measuring similarities, or equivalently perceptual distances, to prototype examples of categories. This approach is quite flexible, and permits recognition based on color, texture, and particularly shape, in a homogeneous framework. While nearest neighbor classifiers are natural in this setting, they suffer from the problem of high variance (in biasvariance decomposition) in the case of limited sampling. Alternatively, one could use support vector machines but they involve timeconsuming optimization and computation of pairwise distances. We propose a hybrid of these two methods which deals naturally with the multiclass setting, has reasonable computational complexity both in training and at run time, and yields excellent results in practice. The basic idea is to find close neighbors to a query sample and train a local support vector machine that preserves the distance function on the collection of neighbors. Our method can be applied to large, multiclass data sets for which it outperforms nearest neighbor and support vector machines, and remains efficient when the problem becomes intractable for support vector machines. A wide variety of distance functions can be used and our experiments show stateoftheart performance on a number of benchmark data sets for shape and texture classification (MNIST, USPS, CUReT) and object recognition (Caltech101). On Caltech101 we achieved a correct classification rate of 59.05%(±0.56%) at 15 training images per class, and 66.23%(±0.48%) at 30 training images. 1.
Solving the multipleinstance problem with axisparallel rectangles
 Artificial Intelligence
, 1997
"... ..."
Kernel principal component analysis
 ADVANCES IN KERNEL METHODS  SUPPORT VECTOR LEARNING
, 1999
"... A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all ..."
Abstract

Cited by 177 (8 self)
 Add to MetaCart
A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible dpixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.