Results 1 - 10 of 444

Probabilistic Management of OCR Data using an RDBMS

by Arun Kumar, Christopher Ré, 2011
"... The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to s ..."
Cited by 4 (0 self)

Tree-Structured Named Entity Recognition on OCR Data: Analysis, Processing and Results

by Marco Dinarelli, Sophie Rosset
"... In this paper we deal with named entity detection on data acquired via OCR process on documents dating from 1890. The resulting corpus is very noisy. We perform an analysis to find possible strategies to overcome errors introduced by the OCR process. We propose a preprocessing procedure in three ste ..."

Jigsawing: A Method to Create Virtual Examples in OCR data

by S. V. N. Vishwanathan, M. Narasimha Murty
Abstract not found

OCR Correction and Query Expansion for Retrieval on OCR Data - CLARIT TREC-5 Confusion Track Report

by Xiang Tong, Chengxiang Zhai, Natasa Milic-Frayling, David A. Evans, 1996
"... this report we first give a brief description of the OCR correction and query expansion techniques, and then discuss the results of our experiments. 2 The Automatic OCR Correction System ..."
Cited by 4 (1 self)

N-gram-based text categorization

by William B. Cavnar, John M. Trenkle - In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994
"... Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system ..."
Cited by 445 (0 self)

Results of Applying Probabilistic IR to OCR Text

by Kazem Taghva, Julie Borsack, Allen Condit - in Proc. 17th Intl. ACM/SIGIR Conf. on Research and Development in Information Retrieval, 1994
"... Character accuracy of optically recognized text is considered a basic measure for evaluating OCR devices. In the broader sense, another fundamental measure of an OCR's goodness is whether its generated text is usable for retrieving information. In this study, we evaluate retrieval effectiveness from OCR text databases using a probabilistic IR system. We compare these retrieval results to their manually corrected equivalent. We show there is no statistical difference in precision and recall using graded accuracy levels from three OCR devices. However, characteristics of the OCR data have ..."
Cited by 49 (18 self)

Evaluation of Pattern Classifiers for Fingerprint and OCR Applications

by J.L. Blue, G.T. Candela, P.J. Grother, R. Chellappa, C.L. Wilson - Pattern Recognition, 1993
"... In this paper we evaluate the classification accuracy of four statistical and three neural network classifiers for two image based pattern classification problems. These are fingerprint classification and optical character recognition (OCR) for isolated handprinted digits. The evaluation results rep ..."
Cited by 37 (2 self)
"... directions is used to generate the input feature set. The statistical classifiers used were Euclidean minimum distance, quadratic minimum distance, normal, and k-nearest neighbor. The neural network classifiers used were multilayer perceptron, radial basis function, and probabilistic. The OCR data consisted ..."

OCR with No Shape Training

by Tin Kam Ho, George Nagy - Proc. of 15th ICPR, 2000
"... We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a "clump" metric, typically yields several hundred clusters with highly skewed populations. Letter identities are ..."
Cited by 27 (6 self)
"... This research differs from earlier attempts to apply cipher decoding to OCR in (1) using real data (2) a more appropriate clustering algorithm, and (3) decoding a many-to-many instead of a one-to-one mapping between clusters and letters. ..."

Evaluating Supervised Topic Models in the Presence of OCR Errors

by Daniel Walker, Eric Ringger
"... Supervised topic models are promising tools for text analytics that simultaneously model topical patterns in document collections and relationships between those topics and document metadata, such as timestamps. We examine empirically the effect of OCR noise on the ability of supervised topic models to produce high quality output through a series of experiments in which we evaluate three supervised topic models and a naive baseline on synthetic OCR data having various levels of degradation and on real OCR data from two different decades. The evaluation includes experiments with and without ..."
Cited by 2 (1 self)

Cryptogram decoding for OCR using numerization strings

by Gary Huang, Erik Learned-Miller, Andrew McCallum - in Proceedings, IAPR 9th Int’l Conf. on Document Analysis and Recognition (ICDAR’07), 2007
"... OCR systems for printed documents typically require large numbers of font styles and character models to work well. When given an unseen font, performance degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cryptogram decoding algorithm. We present results on real and artificial OCR data. ..."
Cited by 1 (0 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University