Transforming Paper Documents into XML Format with WISDOM++
- International Journal of Document Analysis and Recognition, 2000
Cited by 35 (9 self)
Abstract
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and text transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmen...
WISDOM++: An Interactive and Adaptive Document Analysis System
- Proc. of the 5th Int. Conf. on Document Analysis and Recognition, IEEE Computer Society Press, Los Alamitos, 1999
Abstract
WISDOM++ is a document analysis system whose main design requirements are real-time user interaction and adaptivity. This paper presents the two-phased skew estimation algorithm and the adaptive document block segmentation and classification techniques. An evaluation of the performance of some of these tasks is also conducted according to a benchmarking procedure.

