Results 11 – 20 of 54
Detecting and reading text in natural scenes
In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2004
"... This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually ..."
Abstract

Cited by 80 (2 self)
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set, and the unread text is typically small and distant from the viewer.
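The weak-to-strong step described above can be illustrated in miniature. The sketch below boosts one-dimensional threshold stumps with a plain AdaBoost loop; the features, thresholds, and data are hypothetical stand-ins for the paper's 79 image features and cascade stages.

```python
import math

def train_adaboost(samples, labels, stumps, rounds):
    """Boost decision stumps into a strong classifier.

    samples: list of feature vectors; labels in {-1, +1};
    stumps: candidate weak classifiers as (feature_index, threshold) pairs.
    """
    n = len(samples)
    w = [1.0 / n] * n                      # uniform initial sample weights
    strong = []                            # (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        # pick the stump/polarity pair with the lowest weighted error
        best = None
        for (j, t) in stumps:
            for pol in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, samples, labels)
                          if pol * (1 if x[j] > t else -1) != y)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
        err, j, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        strong.append((alpha, j, t, pol))
        # reweight: misclassified samples gain weight, then renormalize
        for i, (x, y) in enumerate(zip(samples, labels)):
            h = pol * (1 if x[j] > t else -1)
            w[i] *= math.exp(-alpha * y * h)
        z = sum(w)
        w = [wi / z for wi in w]
    return strong

def classify(strong, x):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(a * p * (1 if x[j] > t else -1) for a, j, t, p in strong)
    return 1 if score >= 0 else -1
```

A full cascade would chain several such strong classifiers, passing a region to the next stage only if every earlier stage accepts it.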
Joint Induction of Shape Features and Tree Classifiers
IEEE Trans. PAMI, 1997
"... We introduce a very large family of binary features for twodimensional shapes. The salient ones for separating particular shapes are determined by inductive learning during the construction of classi cation trees. There is a feature for every possible geometric arrangement of local topographic code ..."
Abstract

Cited by 76 (6 self)
We introduce a very large family of binary features for two-dimensional shapes. The salient ones for separating particular shapes are determined by inductive learning during the construction of classification trees. There is a feature for every possible geometric arrangement of local topographic codes. The arrangements express coarse constraints on relative angles and distances among the code locations and are nearly invariant to substantial affine and nonlinear deformations. They are also partially ordered, which makes it possible to narrow the search for informative ones at each node of the tree. Different trees correspond to different aspects of shape. They are statistically weakly dependent due to randomization and are aggregated in a simple way. Adapting the algorithm to a shape family is then fully automatic once training samples are provided. As an illustration, we classify handwritten digits from the NIST database; the error rate is 0.7%.
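The aggregation step — many weakly dependent randomized trees combined by a simple vote — can be sketched as follows. The depth-one "trees" over generic binary feature vectors are a hypothetical stand-in for the paper's geometric arrangements of topographic codes.

```python
import random
from collections import Counter

def train_random_trees(samples, labels, n_trees, seed=0):
    """Train depth-one 'trees', each splitting on one random binary feature.

    Each tree stores, for feature value 0 and 1, the majority training label
    among samples with that value. Randomizing the feature choice keeps the
    trees only weakly dependent, which is what makes aggregation pay off.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    trees = []
    for _ in range(n_trees):
        j = rng.randrange(n_features)          # random feature per tree
        leaf = {}
        for v in (0, 1):
            votes = [y for x, y in zip(samples, labels) if x[j] == v]
            leaf[v] = (Counter(votes).most_common(1)[0][0]
                       if votes else rng.choice(labels))
        trees.append((j, leaf))
    return trees

def predict(trees, x):
    # aggregate the trees' outputs by simple majority vote
    votes = Counter(leaf[x[j]] for j, leaf in trees)
    return votes.most_common(1)[0][0]
```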
Experiments on Multistrategy Learning by Meta-Learning
In Proc. Second Intl. Conference on Info. and Knowledge Mgmt, 1993
"... In this paper, we propose metalearning as a general technique to combine the results of multiple learning algorithms, each applied to a set of training data. We detail several metalearning strategies for combining independently learned classifiers, each computed by different algorithms, to improve ..."
Abstract

Cited by 75 (13 self)
In this paper, we propose meta-learning as a general technique to combine the results of multiple learning algorithms, each applied to a set of training data. We detail several meta-learning strategies for combining independently learned classifiers, each computed by different algorithms, to improve overall prediction accuracy. The overall resulting classifier is composed of the classifiers generated by the different learning algorithms and a meta-classifier generated by a meta-learning strategy. The strategies described here are independent of the learning algorithms used. Preliminary experiments using different strategies and learning algorithms on two molecular biology sequence analysis data sets demonstrate encouraging results. Machine learning techniques are central to automated knowledge discovery systems and hence our approach can enhance the effectiveness of such systems.
1 Introduction
The information age provides us with easy access to a wide range of information; however, it ...
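A minimal sketch of one combiner-style meta-learning strategy, assuming hypothetical one-feature threshold rules as the base algorithms and a learned vote-count threshold as the meta-classifier:

```python
def threshold_learner(feature):
    """Trainer for a one-feature threshold rule (hypothetical base algorithm):
    the threshold is placed midway between the two class means."""
    def fit(X, y):
        pos = [x[feature] for x, yi in zip(X, y) if yi == 1]
        neg = [x[feature] for x, yi in zip(X, y) if yi == 0]
        t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: 1 if x[feature] > t else 0
    return fit

def vote_trainer(Z, y):
    """Meta-learner: choose the vote-count threshold with the fewest
    errors on the meta-level set of base predictions Z."""
    best = min(range(len(Z[0]) + 1),
               key=lambda t: sum((1 if sum(z) >= t else 0) != yi
                                 for z, yi in zip(Z, y)))
    return lambda z: 1 if sum(z) >= best else 0

def meta_learn(base_trainers, X_train, y_train, X_meta, y_meta):
    """Train base classifiers, then a meta-classifier on their predictions."""
    models = [fit(X_train, y_train) for fit in base_trainers]
    Z_meta = [[m(x) for m in models] for x in X_meta]   # base predictions
    combiner = vote_trainer(Z_meta, y_meta)
    return lambda x: combiner([m(x) for m in models])
```

The strategy stays independent of the base algorithms: any trainer returning a 0/1 predictor can be dropped into `base_trainers`.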
Method Combination For Document Filtering
1996
"... There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of document filtering, using queries from the Tipster reference corpus. We find that simple averaging strateg ..."
Abstract

Cited by 53 (1 self)
There is strong empirical and theoretical evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of document filtering, using queries from the Tipster reference corpus. We find that simple averaging strategies do indeed improve performance, but that direct averaging of probability estimates is not the correct approach. Instead, the probability estimates must be renormalized using logistic regression on the known relevance judgements. We examine more complex combination strategies but find them less successful due to the high correlations among our filtering methods, which are optimized over the same training data and employ similar document representations.
1 Introduction
A text filtering system monitors an incoming document stream and selects documents identified as relevant to one or more of its query profiles. If profile interactions are ignored, this reduces to a number of independent bina...
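The calibrate-then-average rule can be sketched as follows; the tiny gradient-descent logistic fit stands in for the paper's regression over relevance judgements, and the scores and labels are invented.

```python
import math

def fit_logistic(scores, labels, lr=0.5, epochs=2000):
    """Calibrate one method's raw scores into probabilities: fit
    p(relevant | s) = sigmoid(a*s + b) by gradient descent on the
    mean cross-entropy over known relevance judgements (labels in {0, 1})."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s
            gb += (p - y)
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

def combined_probability(method_scores, calibrations):
    """Average the *calibrated* probabilities, one per retrieval method,
    rather than averaging raw scores directly."""
    probs = [1.0 / (1.0 + math.exp(-(a * s + b)))
             for s, (a, b) in zip(method_scores, calibrations)]
    return sum(probs) / len(probs)
```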
Theoretical Views of Boosting and Applications
1999
"... . Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost's training error and generalization error, connections between boosting and game theo ..."
Abstract

Cited by 50 (2 self)
Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems. Some empirical work and applications are also described.
Background
Boosting is a general method which attempts to "boost" the accuracy of any given learning algorithm. Kearns and Valiant [29, 30] were the first to pose the question of whether a "weak" learning algorithm which performs just slightly better than random guessing in Valiant's PAC model [44] can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [36] came up with the first provable polynomial-time boosting algorithm in 1989. A year later, Freund [16] developed a much more effici...
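One of the training-error analyses the survey refers to is the well-known Freund–Schapire bound: AdaBoost's training error is at most the product over rounds of 2*sqrt(eps_t*(1 - eps_t)), where eps_t is the weighted error of round t's weak hypothesis. A few lines suffice to evaluate it:

```python
import math

def adaboost_error_bound(round_errors):
    """Freund-Schapire bound on AdaBoost's training error:
    err <= prod_t 2*sqrt(eps_t*(1 - eps_t)) = prod_t sqrt(1 - 4*gamma_t**2),
    where gamma_t = 1/2 - eps_t is the weak learner's edge over random."""
    bound = 1.0
    for e in round_errors:
        bound *= 2.0 * math.sqrt(e * (1.0 - e))
    return bound
```

Any weak learner with a fixed edge (e.g. eps_t = 0.4 every round) drives the bound to zero exponentially fast in the number of rounds.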
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
1996
"... Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Som ..."
Abstract

Cited by 44 (8 self)
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real-world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network, and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...
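The partition-learn-combine loop described above can be sketched as follows, with a hypothetical one-feature threshold rule as the base learner and majority voting as the combiner:

```python
from collections import Counter

def partition(data, k):
    """Split a dataset into k disjoint subsets (round-robin for simplicity);
    each subset is small enough to fit in one learner's memory."""
    return [data[i::k] for i in range(k)]

def train_threshold(Xs, ys):
    """Hypothetical base learner: threshold midway between the class means
    of feature 0; labels are 0/1."""
    pos = [x[0] for x, yi in zip(Xs, ys) if yi == 1]
    neg = [x[0] for x, yi in zip(Xs, ys) if yi == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > t else 0

def learn_and_combine(X, y, k, train):
    """Run `train` on each partition, then combine the k models by vote."""
    pairs = list(zip(X, y))
    models = [train([p[0] for p in part], [p[1] for p in part])
              for part in partition(pairs, k)]
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict
```

The same loop applies unchanged when the partitions live on different network sites: only the per-partition models, not the data, need to travel.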
Constructing Boosting Algorithms from SVMs: An Application to One-class Classification
2002
"... ..."
Boosting and hard-core sets
In Proceedings of the Fortieth Annual Symposium on Foundations of Computer Science, 1999
"... This paper connects two fundamental ideas from theoretical computer science: hardcore set construction, a type of hardness amplification from computational complexity, and boosting, a technique from computational learning theory. Using this connection we give fruitful applications of complexitythe ..."
Abstract

Cited by 38 (8 self)
This paper connects two fundamental ideas from theoretical computer science: hard-core set construction, a type of hardness amplification from computational complexity, and boosting, a technique from computational learning theory. Using this connection we give fruitful applications of complexity-theoretic techniques to learning theory and vice versa. We show that the hard-core set construction of Impagliazzo [15], which establishes the existence of distributions under which Boolean functions are highly inapproximable, may be viewed as a boosting algorithm. Using alternate boosting methods we give an improved bound for hard-core set construction which matches known lower bounds from boosting and thus is optimal within this class of techniques. We then show how to apply techniques from [15] to give a new version of Jackson’s celebrated Harmonic Sieve algorithm for learning DNF formulae under the uniform distribution using membership queries. Our new version has a significant asymptotic improvement in running time. Critical to our arguments is a careful analysis of the distributions which are employed in both boosting and hard-core set constructions.
Transformation Invariant Autoassociation with Application to Handwritten Character Recognition
1995
"... When training neural networks by the classical backpropagation algorithm the whole problem to learn must be expressed by a set of inputs and desired outputs. However, we often have highlevel knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the ..."
Abstract

Cited by 36 (8 self)
When training neural networks by the classical backpropagation algorithm, the whole problem to learn must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper case handwritten letters and compared to other approaches to the invariance problem.
1 INCORPORATION OF EXPLICIT KNOWLEDGE
The aim of supervised learning is to learn a mapping between the input and the output space from a set of example pairs (input, desired output). The classical implementation in the domain of neural networks is the backpropagation algorithm. If this learning set is ...
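The decision rule behind such a modular system — one model per class, classify by smallest reconstruction error — can be sketched with a deliberately crude stand-in: each class "autoassociator" below is just the class mean, not a trained multilayer perceptron, so "reconstruction" of any input is simply that mean vector.

```python
def fit_class_models(samples, labels):
    """One 'autoassociator' per class. Stand-in: the class mean vector,
    used as the (constant) reconstruction of every input."""
    models = {}
    for c in set(labels):
        members = [x for x, y in zip(samples, labels) if y == c]
        dim = len(members[0])
        models[c] = [sum(m[d] for m in members) / len(members)
                     for d in range(dim)]
    return models

def reconstruction_error(x, recon):
    # squared Euclidean distance between the input and its reconstruction
    return sum((a - b) ** 2 for a, b in zip(x, recon))

def classify_by_reconstruction(models, x):
    # decision rule: the class whose model reconstructs x best wins
    return min(models, key=lambda c: reconstruction_error(x, models[c]))
```

With trained autoassociative networks in place of the means, each per-class model can additionally be made tolerant to the known transformations, which is the point of the paper's approach.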