Support-Vector Networks
- Machine Learning
, 1995
Cited by 1491 (22 self)
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data.
Gradient-based learning applied to document recognition
- Proceedings of the IEEE
, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Optimal Brain Damage
- Advances in Neural Information Processing Systems
, 1990
Cited by 375 (5 self)
We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application. 1 INTRODUCTION Most successful applications of neural network learning to real-world problems have been achieved using highly structured networks of rather large size [for example (Waibel, 1989; LeCun et al., 1990)]. As applications become more complex, the networks will presumably become even larger and more structured. Design tools and techniques for comparing different architectures and minimizing the network size will be needed. More impor...
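The trade-off described in this abstract can be illustrated with a small sketch. It assumes the saliency measure from the Optimal Brain Damage paper itself, s_k = h_kk * w_k^2 / 2, where h_kk is a diagonal second-derivative term of the training error; that formula is not stated in the abstract above. The function name `obd_prune`, the `fraction` parameter, and the numbers are hypothetical and chosen only for illustration.

```python
def obd_prune(weights, hessian_diag, fraction):
    """Zero out the given fraction of weights with the lowest saliency.

    Saliency of weight k is assumed to be 0.5 * h_kk * w_k**2, i.e. the
    estimated increase in training error if that weight were deleted.
    """
    if len(weights) != len(hessian_diag):
        raise ValueError("weights and hessian_diag must have equal length")
    saliencies = [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]
    n_prune = int(len(weights) * fraction)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned, saliencies


# Hypothetical values: the third weight is large but sits in a flat
# direction of the error surface, so its saliency is small.
weights = [0.8, -0.05, 1.2, 0.3]
hessian_diag = [2.0, 4.0, 0.01, 1.0]
pruned, saliencies = obd_prune(weights, hessian_diag, fraction=0.5)
print(pruned)  # [0.8, 0.0, 0.0, 0.3]
```

In this toy example the largest weight is removed, which is the point of using second-derivative information rather than weight magnitude alone. A practical procedure would presumably retrain the network after each pruning step.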
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
- In ICML
, 2003
Cited by 325 (13 self)
An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning
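The abstract is cut off before it describes the learning step, so the following is only a sketch of the general idea suggested by the title. It assumes the standard definition of a harmonic function on a weighted graph: labeled vertices keep their values, and each unlabeled vertex takes the weighted average of its neighbors. The function name `harmonic_labels`, the iterative solver, and the four-vertex chain graph are hypothetical choices for illustration, not the paper's method.

```python
def harmonic_labels(weights, labels, n_iter=200):
    """Fill in unlabeled vertices by repeated weighted averaging.

    weights: symmetric matrix (list of lists) of edge weights, zero diagonal.
    labels:  dict mapping vertex index -> known value in [0, 1].
    """
    n = len(weights)
    # Unlabeled vertices start at 0.5; labeled ones are clamped.
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labels:
                continue  # keep labeled vertices fixed
            total = sum(weights[i])
            if total > 0:
                f[i] = sum(weights[i][j] * f[j] for j in range(n)) / total
    return f


# Chain graph 0 - 1 - 2 - 3 with unit edge weights.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
f = harmonic_labels(chain, {0: 1.0, 3: 0.0})
```

On this chain the two unlabeled vertices settle at values that interpolate linearly between the labeled endpoints, so a vertex closer to the positively labeled end receives the higher score.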
Extracting Support Data for a Given Task
- Proceedings, First International Conference on Knowledge Discovery & Data Mining, Menlo Park
, 1995
Cited by 154 (31 self)
We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly overlapping small (≈ 4%) subsets of the data base. This finding opens up the possibility of compressing data bases significantly by disposing of the data which is not important for the solution of a given task. In addition, we show that the theory allows us to predict the classifier that will have the best generalization ability, based solely on performance on the training set and characteristics of the learning machines. This finding is important for cases where the amount of available data is limited. Introduction in: U. M. Fayyad and R. Uthurusamy (eds.): Proceedings, First International Conference on Knowledge Discovery & Data Mining. AAA...
Modeling the manifolds of images of handwritten digits
- IEEE Transactions on Neural Networks
, 1997
"... description length, density estimation. ..."
Face Recognition: A Convolutional Neural Network Approach
- IEEE Transactions on Neural Networks
, 1997
Cited by 127 (0 self)
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [43]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional netwo...
Shape quantization and recognition with randomized trees
- Neural Computation
, 1997
Cited by 126 (15 self)
We explore a new approach to shape recognition based on a virtually infinite family of binary features ("queries") of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement of several local topographic codes ("tags") which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are (i) a natural partial ordering corresponding to increasing structure and complexity; (ii) semi-invariance, meaning that most shapes of a given class will answer the same way to two queries which are successive in the ordering; and (iii) stability, since the queries are not based on distinguished points and substructures. No classifier based on the full feature set can be evaluated and it is impossible to determine a priori which arrangements are informative. Our approach is to select informative features and build tree classifiers at the same time by inductive learning. In effect, each tree provides an approximation to the full posterior where the features
Discriminant Analysis by Gaussian Mixtures
- Journal of the Royal Statistical Society, Series B
, 1996
Cited by 124 (9 self)
Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA, and our new techniques inherit this feature. We are able to control the within-class spread of the subclass centers relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA. Keywords: Classification, Pattern Recognition, Clustering, Nonparametric, Penalized. 1 Introduction In the generic classification or discrimination problem, the outcome of interest G falls into J unordered classes, which for convenience we denote by the set $\mathcal{J} = \{1, 2, 3, \cdots, J\}$. We wish to build a rule for pred...
Local Learning Algorithms
- Neural Computation
, 1992
Cited by 101 (1 self)
Very rarely are training data evenly distributed in the input space. Local learning algorithms attempt to locally adjust the capacity of the training system to the properties of the training set in each area of the input space. The family of local learning algorithms contains known methods, like the k-Nearest Neighbors method (kNN) or the Radial Basis Function networks (RBF), as well as new algorithms. A single analysis models some aspects of these algorithms. In particular, it suggests that neither kNN nor RBF, nor nonlocal classifiers, achieve the best compromise between locality and capacity. A careful control of these parameters in a simple local learning algorithm has provided a performance breakthrough for an optical character recognition problem. Both the error rate and the rejection performance have been significantly improved. 1 Introduction. Here is a simple local algorithm: For each testing pattern, (1) select the few training examples located in the vicinity of the testing...
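As a point of reference for the family of local methods mentioned in this abstract, here is a minimal sketch of plain k-Nearest Neighbors classification, which the abstract names as a known member of that family. It is not the paper's new algorithm, whose steps are cut off above. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where point is a tuple of floats.
    """
    # Select the k training examples closest to the query (Euclidean distance).
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster data set.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
near_a = knn_predict(train, (0.2, 0.1))
near_b = knn_predict(train, (5.5, 5.2))
```

Here k controls locality: a small k uses only the immediate neighborhood of the test pattern, while a k equal to the training set size reduces to a global majority vote.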

