Degraded Text Recognition Using Visual And Linguistic Context
, 1995
Cited by 20 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary-lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
Document Understanding: Research Directions
, 1992
Cited by 3 (0 self)
A document image is a visual representation of a printed page such as a journal article page, a facsimile cover page, a technical document, an office letter, etc. Document understanding as a research endeavor consists of studying all processes involved in taking a document through various representations: from a scanned physical document to high-level semantic descriptions of the document. Some of the types of representation that are useful are: editable descriptions, descriptions that enable exact reproductions and high-level semantic descriptions about document content. This report is a definition of five research subdomains within document understanding as pertaining to predominantly printed documents. The topics described are: modular architectures for document understanding; decomposition and structural analysis of documents; model-based OCR; table, diagram and image understanding; and performance evaluation under distortion and noise. 1 Each of the main sections of this paper were ...
Quantifying information leakage in document redaction
- In HDP ‘04
Cited by 2 (0 self)
In this paper, we examine ways in which sensitive information might leak through the process of redaction. Such attacks apply known methods from document image analysis and natural language processing to recover text thought to have been obliterated for the purposes of public release. Systematically identifying and testing these weaknesses is a first step towards designing effective countermeasures. We describe our development of a prototype semi-automated system intended to accept as input a redacted document and provide feedback to the user as to whether the document might suffer from such leaks.

