Results 1 - 10 of 425

Learning on the Fly: Font-Free Approaches to Difficult OCR Problems

by Andrew Kae, Erik Learned-Miller
"... Despite ubiquitous claims that optical character recognition (OCR) is a “solved problem,” many categories of documents continue to break modern OCR software such as documents with moderate degradation or unusual fonts. Many approaches rely on pre-computed or stored character models, but these are v ..."
Abstract - Cited by 5 (0 self)

Experiments with a New Boosting Algorithm

by Yoav Freund, Robert E. Schapire, 1996
"... In an earlier paper, we introduced a new “boosting” algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the relate ..."
Abstract - Cited by 2213 (20 self)
-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.

N-gram-based text categorization

by William B. Cavnar, John M. Trenkle - In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994
"... Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in ..."
Abstract - Cited by 445 (0 self)
in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system

Learning on the Fly: Font-Free Approaches to Difficult OCR Problems

by Andrew Kae, Erik Learned-Miller - In 2009 10th International Conference on Document Analysis and Recognition
"... Despite ubiquitous claims that optical character recognition (OCR) is a “solved problem,” many categories of documents continue to break modern OCR software such as documents with moderate degradation or unusual fonts. Many approaches rely on pre-computed or stored character models, but these are v ..."
Abstract

Video OCR for Digital News Archives

by Toshio Sato, Takeo Kanade, Ellen K. Hughes, Michael A. Smith - In Proc. Workshop on Content-Based Access of Image and Video Databases (Los Alamitos, CA), 1998
"... Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented -- the names of people and ..."
Abstract - Cited by 110 (0 self)

Evaluation of Pattern Classifiers for Fingerprint and OCR Applications

by J.L. Blue, G.T. Candela, P.J. Grother, R. Chellappa, C.L. Wilson - Pattern Recognition, 1993
"... In this paper we evaluate the classification accuracy of four statistical and three neural network classifiers for two image based pattern classification problems. These are fingerprint classification and optical character recognition (OCR) for isolated handprinted digits. The evaluation results rep ..."
Abstract - Cited by 37 (2 self)

Representing OCRed Documents in HTML

by Tao Hong, Sargur N. Srihari - In Proceedings of the IAPR 1997 International Conference on Document Analysis and Recognition (ICDAR), 1997
"... OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in reading and understanding if they do not refer to the original image representation. As demonstrated in this paper, a hybrid documen ..."
Abstract - Cited by 4 (1 self)

OCR based thresholding

by Yves Rangoni, Faisal Shafait, Thomas M. Breuel - In Proceedings of IAPR Conference on Machine Vision Applications, 2009
"... In large-scale digitization processes, several common tasks are performed to provide an electronic version of a paper document. One of the first steps is the thresholding of the image, which is necessary for the following procedures to work properly. Many binarization methods have been proposed to s ..."
Abstract - Cited by 8 (2 self)
to solve this problem, but they need to be tuned on the target document corpus to obtain best results. In this paper, we introduce a full automatic thresholding method for printed document analysis. The purpose is to obtain the most suitable binarizer for a given document image according to the quality

Evaluation of model-based retrieval effectiveness with OCR text

by Kazem Taghva, Julie Borsack, Allen Condit - ACM Transactions on Information Systems, 1996
"... We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval. More specifically, we show that average precision and recall is not affected by OCR errors across systems for several collections. The collections used in the ..."
Abstract - Cited by 35 (12 self)
with these models are generally not robust enough to deal with OCR errors. It is further shown that the OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin. We not only point out problems that can arise from applying OCR text within

Probabilistic Management of OCR Data using an RDBMS

by Arun Kumar, Christopher Ré, 2011
"... The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to s ..."
Abstract - Cited by 4 (0 self)
and then to store the resulting ASCII text in a relational database. The OCR problem is challenging, and so the output of OCR often contains errors. In turn, queries on the output of OCR may fail to retrieve relevant answers. State-of-the-art OCR programs, e.g., the OCR powering Google Books, use a probabilistic