Results 1 - 10 of 103
Training Invariant Support Vector Machines
, 2002
Cited by 136 (16 self)
Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Adaptive document image binarization
 PATTERN RECOGNITION
, 2000
Cited by 117 (0 self)
A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture. The problems caused by noise, illumination and many source type-related degradations are addressed. Two new algorithms are applied to determine a local threshold for each pixel. The performance evaluation of the algorithm utilizes test images with ground truth, evaluation metrics for binarization of textual and synthetic images, and a weight-based ranking procedure for the final result presentation. The proposed algorithms were tested with images including different types of document components and degradations. The results were compared with a number of known techniques in the literature. The benchmarking results show that the method adapts and performs well in each case qualitatively and quantitatively.
Learning over Sets using Kernel Principal Angles
 Journal of Machine Learning Research
, 2003
Cited by 78 (2 self)
We consider the problem of learning with instances defined over a space of sets of vectors. We derive a new positive definite kernel f(A, B) defined over pairs of matrices A, B based on the concept of principal angles between two linear subspaces. We show that the principal angles can be recovered using only inner products between pairs of column vectors of the input matrices, thereby allowing the original column vectors of A, B to be mapped onto arbitrarily high-dimensional feature spaces.
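As a rough illustration of the principal-angle idea in the abstract above, the following sketch computes the product of squared cosines of the principal angles between two subspaces of equal dimension. It relies on two standard facts: the singular values of Q_A^T Q_B (with Q_A, Q_B orthonormal bases) are the cosines of the principal angles, and the absolute determinant of a square matrix is the product of its singular values. Everything else is an assumption made for illustration: the function names are hypothetical, the computation is done directly in the input space rather than through inner products in a feature space, and the abstract does not state that this particular quantity is the kernel the authors derive.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(columns):
    """Return an orthonormal basis for the span of the given column vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            proj = dot(w, q)
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def cos2_product(cols_a, cols_b):
    """Product of cos^2 of the principal angles between span(A) and span(B).

    Hypothetical helper: assumes both subspaces have the same dimension.
    """
    if len(cols_a) != len(cols_b):
        raise ValueError("subspaces must have the same dimension")
    qa = gram_schmidt(cols_a)
    qb = gram_schmidt(cols_b)
    m = [[dot(u, v) for v in qb] for u in qa]  # Q_A^T Q_B
    return determinant(m) ** 2
```

For two planes in three dimensions that share one direction and are tilted by 60 degrees in the other, the principal angles are 0 and 60 degrees, so the value is cos^2(60°) = 0.25; for two different bases of the same plane it is 1.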
A Self-Correcting 100-Font Classifier
, 1994
Cited by 64 (35 self)
We have developed a practical scheme to take advantage of local typeface homogeneity to improve the accuracy of a character classifier. Given a polyfont classifier which is capable of recognizing any of 100 typefaces moderately well, our method allows it to specialize itself automatically to the single, but otherwise unknown, typeface it is reading. Essentially, the classifier retrains itself after examining some of the images, guided at first by the preset classification boundaries of the given classifier, and later by the behavior of the retrained classifier. Experimental trials on 6.4M pseudorandomly distorted images show that the method improves on 95 of the 100 typefaces. It reduces the error rate by a factor of 2.5, averaged over 100 typefaces, when applied to an alphabet of 80 ASCII characters printed at ten point and digitized at 300 pixels/inch. This self-correcting method complements, and does not hinder, other methods for improving OCR accuracy, such as linguistic con...
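The retraining loop described in the abstract above (label with the current classifier, retrain on those labels, repeat) can be sketched in a few lines. This is only a toy illustration under loud assumptions: the abstract does not say what kind of classifier the authors use, so a nearest-class-mean classifier is swapped in here, and the names `nearest` and `self_correct` and the two-dimensional feature vectors are hypothetical.

```python
def nearest(means, x):
    """Label of the class mean closest to feature vector x (squared Euclidean)."""
    return min(
        means,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(means[label], x)),
    )


def self_correct(initial_means, samples, rounds=3):
    """Adapt class means to unlabeled samples from one (unknown) typeface.

    Round 1 labels the samples with the preset polyfont means; later rounds
    label them with the means re-estimated in the previous round.
    """
    means = dict(initial_means)
    for _ in range(rounds):
        groups = {label: [] for label in means}
        for x in samples:
            groups[nearest(means, x)].append(x)
        for label, members in groups.items():
            if members:  # keep the old mean if no sample was assigned
                means[label] = tuple(sum(c) / len(members) for c in zip(*members))
    return means


# Toy usage: the document's typeface shifts both classes to the right.
polyfont = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
page = [(1.0, 0.0), (1.5, 0.0), (1.9, 0.0), (5.0, 0.0), (5.5, 0.0), (6.0, 0.0)]
adapted = self_correct(polyfont, page)
```

In this toy setting a new sample at (2.5, 0) is assigned to "b" by the preset means but to "a" after adaptation, which is the kind of specialization to a single typeface the abstract describes.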
Global and Local Document Degradation Models
 In Proceedings of the International Conference on Document Analysis and Recognition
, 1993
Cited by 52 (11 self)
Two sources of document degradation are modeled: (i) perspective distortion that occurs while photocopying or scanning thick, bound documents, and (ii) degradation due to perturbation in the optical scanning and digitization process: speckle, blur, jitter, threshold. Perspective distortion is modeled by studying the underlying perspective geometry of the optical system of photocopiers and scanners. An illumination model is described to account for the nonlinear intensity change occurring across a page in a perspective-distorted document. The optical distortion process is modeled morphologically. First a distance transform on the foreground is performed and followed by a random inversion of binary pixels where the probability of flip is a function of the distance of the pixel to the boundary of the foreground. Correlating the flipped pixels is modeled by a morphological closing operation. 1 Introduction There are many reasons for modeling document degradation. First, in order to study ...
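The local degradation steps named in the abstract above (distance transform, distance-dependent random pixel flips, morphological closing) can be sketched as follows. Several details are assumptions, since the abstract does not give them: the flip probability is taken to be exp(-alpha * d^2), the distance is city-block distance to the nearest pixel of the opposite value, and the closing uses a 3x3 square structuring element. The function names are hypothetical, and the brute-force distance computation is only suitable for tiny images.

```python
import math
import random


def distance_to_boundary(img):
    """City-block distance from each pixel to the nearest pixel of opposite value."""
    h, w = len(img), len(img[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    dist = [[math.inf] * w for _ in range(h)]
    for y, x in pixels:
        for v, u in pixels:
            if img[v][u] != img[y][x]:
                dist[y][x] = min(dist[y][x], abs(v - y) + abs(u - x))
    return dist


def flip_pixels(img, alpha, rng):
    """Invert each pixel with probability exp(-alpha * d^2) (assumed form)."""
    dist = distance_to_boundary(img)
    out = [row[:] for row in img]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if rng.random() < math.exp(-alpha * dist[y][x] ** 2):
                out[y][x] = 1 - value
    return out


def _morph(img, pick):
    """Apply pick (max = dilation, min = erosion) over a clipped 3x3 window."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[v][u]
                for v in range(max(0, y - 1), min(h, y + 2))
                for u in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def closing(img):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(img, max), min)


def degrade(img, alpha=1.0, seed=0):
    """Flip pixels near the foreground boundary, then close to correlate flips."""
    return closing(flip_pixels(img, alpha, random.Random(seed)))
```

A larger `alpha` concentrates the flips on pixels adjacent to the boundary, while the closing step fills isolated background holes that the flips leave inside the foreground.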
Survey of the state of the art in human language technology
 Studies in Natural Language Processing, XII-XIII
, 1997
"... Sponsors: ..."
Anatomy Of A Versatile Page Reader
 Proceedings of the IEEE
, 1992
Cited by 41 (5 self)
An experimental printed-page reader that is easy to adapt to various languages is described. Changing the target language may involve simultaneous changes in symbol sets, typefaces, sizes of text, page layouts, linguistic contexts, and imaging defects. Our strategy has been to isolate the effects of these sources of variation within separate, independent engineering subsystems. In this way, we have been able to construct, with a minimum of manual effort, classifiers for arbitrary combinations of symbols, typefaces, sizes, and imaging defects. We have tried to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods. For some tasks we have been able to avoid language dependency altogether: for example, for geometric page layout analysis we have found a global-to-local strategy that requires no prior knowledge of the symbol set. We can exploit linguistic context, such as provided by dictionaries, through da...
Kernel Principal Angles for Classification Machines with Applications to Image Sequence Interpretation
, 2002
Cited by 38 (6 self)
We consider the problem of learning with instances defined over a space of sets of vectors. We derive a new positive definite kernel f(A, B) defined over pairs of matrices A, B based on the concept of principal angles between two linear subspaces. We show that the principal angles can be recovered using only inner products between pairs of column vectors of the input matrices, thereby allowing the original column vectors of A, B to be mapped onto arbitrarily high-dimensional feature spaces.
Document Image Defect Models and Their Uses
 In Proceedings of the Second International Conference on Document Analysis and Recognition (ICDAR '93)
, 1993
Cited by 34 (7 self)
The accuracy of today's document recognition algorithms falls abruptly when image quality degrades even slightly. In an effort to surmount this barrier, researchers have in recent years intensified their study of explicit, quantitative, parameterized models of the image defects that occur during printing and scanning. I review the recent literature and discuss the form these models might take. I give a preview of a large public-domain database of character images, labeled with ground truth including all defect model parameters, the first of its kind. I describe the use of massive pseudorandomly generated training sets for the construction of high-performance decision trees for preclassification. Also, I report preliminary results along a more theoretical line of attack: the estimation of the intrinsic error rate of precisely specified text recognition problems (this is joint work with Tin K. Ho). Finally, I list some open problems. 1 Introduction In recent years, some researchers i...
Overview of Evaluation in Speech and Natural Language Processing
, 1997
Cited by 33 (0 self)
Introduction to Evaluation Terminology and Use: We can broadly distinguish three kinds of evaluation, appropriate to three different goals. 1. Adequacy Evaluation: This is determination of the fitness of a system for a purpose: will it do what is required, how well, at what cost, etc. Typically for a prospective user, it may be comparative or not, and may require considerable work to identify a user's needs. One model is consumer organizations which publish the results of tests on, e.g., cars or appliances, and identify best buys for certain price-performance targets. This also goes by the names evaluation and evaluation proper. 2. Diagnostic Evaluation: This is production of a system performance profile with respect to some taxonomization of the space of possible inputs. It is typically used by system developers, but sometimes offered to endus