Results 1 - 10 of 444
Probabilistic Management of OCR Data using an RDBMS
, 2011
"... The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to s ..."
Cited by 4 (0 self)
Tree-Structured Named Entity Recognition on OCR Data: Analysis, Processing and Results
"... In this paper we deal with named entity detection on data acquired via OCR process on documents dating from 1890. The resulting corpus is very noisy. We perform an analysis to find possible strategies to overcome errors introduced by the OCR process. We propose a preprocessing procedure in three ste ..."
OCR Correction and Query Expansion for Retrieval on OCR Data - CLARIT TREC-5 Confusion Track Report
, 1996
"... this report we first give a brief description of the OCR correction and query expansion techniques, and then discuss the results of our experiments. 2 The Automatic OCR Correction System ..."
Cited by 4 (1 self)
N-gram-based text categorization
- In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval
, 1994
"... Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in ..."
Cited by 445 (0 self)
in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system
Results of Applying Probabilistic IR to OCR Text
- in Proc. 17th Intl. ACM/SIGIR Conf. on Research and Development in Information Retrieval
, 1994
"... Character accuracy of optically recognized text is considered a basic measure for evaluating OCR devices. In the broader sense, another fundamental measure of an OCR's goodness is whether its generated text is usable for retrieving information. In this study, we evaluate retrieval effectiveness ..."
Cited by 49 (18 self)
effectiveness from OCR text databases using a probabilistic IR system. We compare these retrieval results to their manually corrected equivalent. We show there is no statistical difference in precision and recall using graded accuracy levels from three OCR devices. However, characteristics of the OCR data have
Evaluation of Pattern Classifiers for Fingerprint and OCR Applications
- Pattern Recognition
, 1993
"... In this paper we evaluate the classification accuracy of four statistical and three neural network classifiers for two image based pattern classification problems. These are fingerprint classification and optical character recognition (OCR) for isolated handprinted digits. The evaluation results rep ..."
Cited by 37 (2 self)
directions is used to generate the input feature set. The statistical classifiers used were Euclidean minimum distance, quadratic minimum distance, normal, and k-nearest neighbor. The neural network classifiers used were multilayer perceptron, radial basis function, and probabilistic. The OCR data consisted
OCR with No Shape Training
- Proc. of 15th ICPR
, 2000
"... We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a "clump" metric, typically yields several hundred clusters with highly skewed populations. Letter identities are ..."
Cited by 27 (6 self)
This research differs from earlier attempts to apply cipher decoding to OCR in (1) using real data, (2) a more appropriate clustering algorithm, and (3) decoding a many-to-many instead of a one-to-one mapping between clusters and letters.
Evaluating Supervised Topic Models in the Presence of OCR Errors
"... Supervised topic models are promising tools for text analytics that simultaneously model topical patterns in document collections and relationships between those topics and document metadata, such as timestamps. We examine empirically the effect of OCR noise on the ability of supervised topic models ..."
Cited by 2 (1 self)
models to produce high quality output through a series of experiments in which we evaluate three supervised topic models and a naive baseline on synthetic OCR data having various levels of degradation and on real OCR data from two different decades. The evaluation includes experiments with and without
Cryptogram decoding for OCR using numerization strings
- in Proceedings, IAPR 9th Int’l Conf. on Document Analysis and Recognition (ICDAR’07)
, 2007
"... OCR systems for printed documents typically require large numbers of font styles and character models to work well. When given an unseen font, performance degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cry ..."
Cited by 1 (0 self)
cryptogram decoding algorithm. We present results on real and artificial OCR data.