Document Image Understanding: Geometric and Logical Layout
in Proc. of the Conference on Computer Vision and Pattern Recognition, 1994
Cited by 40 (6 self)
Introduction. Document image understanding encompasses the technology required to make paper documents equivalent to other computer exchange media such as floppies, tapes, and CD-ROMs. The physical reader of a paper document is the scanner, just as the physical reader of a floppy is the floppy drive, that of a tape cartridge is the tape cartridge drive, and that of a CD-ROM is the CD-ROM drive. But document image understanding can involve more than reading the character strings on a paper document and putting them into the format of our favorite word processing system. Documents carry information just as floppies do. The information on a floppy is relatively simple: its structure is typically just a set of files. Paper documents have a much more complicated structure. For example, a business letter has a sender's address, a receiver's address, a date, an opening salutation, a body, a closing, and a signatur
Geometric layout analysis techniques for document image understanding: a review
1998
Cited by 37 (0 self)
Document Image Understanding (DIU) is an interesting research area with a large variety of challenging applications. Researchers have worked for decades on this topic, as witnessed by the scientific literature. The main purpose of the present report is to describe the current status of DIU, with particular attention to two subprocesses: document skew angle estimation and page decomposition. Several algorithms proposed in the literature are concisely described and organized within a novel classification scheme. Some methods proposed for the evaluation of page decomposition algorithms are also described. The current status of the field and its open problems are critically discussed, and some considerations about logical layout analysis are reported.
A Survey of Table Recognition: Models, Observations, Transformations, and Inferences
International Journal of Document Analysis and Recognition, 2003
Cited by 32 (3 self)
Table characteristics vary widely. Consequently, a great variety of computational approaches have been applied to table recognition. In this survey, the table recognition literature is presented as an interaction of table models, observations, transformations and inferences. A table model defines the physical and logical structure of tables; the model is used to detect tables, and to analyze and decompose the detected tables. Observations perform feature measurements and data lookup, transformations alter or restructure data, and inferences generate and test hypotheses. This presentation clarifies the decisions that are made by a table recognizer, and the assumptions and inferencing techniques that underlie these decisions.
A Language for Specifying and Comparing Table Recognition Strategies
2004
Cited by 7 (3 self)
Table recognition algorithms may be described by models of table location and structure, and decisions made relative to these models. These algorithms are usually defined informally as a sequence of decisions with supporting data observations and transformations. In this investigation, we formalize these algorithms as strategies in an imitation game, where the goal of the game is to match table interpretations from a chosen procedure as closely as possible. The chosen procedure may be a person or persons producing ‘ground truth,’ or an algorithm. To describe table recognition strategies we have defined the Recognition Strategy Language (RSL). RSL is a simple functional language for describing strategies as sequences of abstract decision types whose results are determined by any suitable decision method. RSL defines and maintains interpretation trees, a simple data structure for describing recognition results. For each interpretation in an interpretation tree, we annotate hypothesis histories which capture the creation, revision, and rejection of individual hypotheses, such as the logical type and structure of regions. We present a proof-of-concept using two strategies from the literature. We demonstrate how RSL allows strategies to be specified at the level of decisions rather than algorithms, and we compare results of our strategy implementations using new techniques. In particular, we introduce historical recall and precision metrics. Conventional recall and precision characterize hypotheses accepted after a strategy has finished. Historical recall and precision provide additional information by describing all generated hypotheses, including any rejected in the final result.
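The difference between the conventional and historical metrics can be illustrated with a minimal Python sketch. It assumes that hypotheses are hashable labels, that a hypothesis counts as correct when it appears in the ground truth, and that the historical metrics are computed over the union of accepted and rejected hypotheses. The function names and sample data are hypothetical, and the exact definitions used with RSL may differ.

```python
def recall_precision(hypotheses, ground_truth):
    """Return (recall, precision) of a set of hypotheses against ground truth."""
    hypotheses = set(hypotheses)
    ground_truth = set(ground_truth)
    correct = hypotheses & ground_truth
    # Convention assumed here: an empty denominator yields 0.0.
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    precision = len(correct) / len(hypotheses) if hypotheses else 0.0
    return recall, precision


def conventional_and_historical(accepted, rejected, ground_truth):
    """Compare metrics over accepted hypotheses with metrics over all generated ones."""
    # Conventional: only hypotheses still accepted when the strategy finishes.
    conventional = recall_precision(accepted, ground_truth)
    # Historical (assumed reading): every hypothesis ever generated,
    # including those rejected before the final result.
    generated = set(accepted) | set(rejected)
    historical = recall_precision(generated, ground_truth)
    return conventional, historical


# Hypothetical sample data: labels stand in for table-cell hypotheses.
ground_truth = {"cell_a", "cell_b", "cell_c", "cell_d"}
accepted = {"cell_a", "cell_b", "cell_x"}
rejected = {"cell_c", "cell_y"}

conventional, historical = conventional_and_historical(accepted, rejected, ground_truth)
print("conventional (recall, precision):", conventional)
print("historical   (recall, precision):", historical)
```

In this sample, historical recall exceeds conventional recall because one correct hypothesis (cell_c) was generated but later rejected, which is the kind of information the conventional metrics cannot show.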
Chinese Document Layout Analysis Based on Adaptive Split-and-Merge and Qualitative Spatial Reasoning
Pattern Recognition
Cited by 3 (1 self)
The ultimate goal of automatic document processing is to understand the semantics of a document. Towards such an end, one of the primary enabling steps has been to first reason about the layout of the document by means of page segmentation and segment spatial reasoning or labeling. This, in turn, allows for the derivation of the document's logical organization. This paper describes a generic document segmentation and geometric relation labeling method with applications to Chinese document analysis. Unlike previous document segmentation methods, which rely on text spacing, border lines, and/or template matching based on a priori layout models, the present method begins with a hierarchy of partitioned image layers in which inhomogeneous higher-level regions are recursively partitioned into lower-level rectangular subregions and, at the same time, smaller homogeneous lower-level regions are merged into larger homogeneous regions. Furthermore, the derived segment data structure readi...
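The general split-and-merge idea described above can be sketched in Python as follows. This is an illustration under strong assumptions, not the paper's adaptive method: the page is a small binary grid, a region is "homogeneous" only when all of its pixels are equal, splitting is a plain four-way subdivision, and merging joins two equal-valued rectangles only when their union is again a rectangle. All function names are hypothetical.

```python
def is_homogeneous(image, top, left, height, width):
    """True when every pixel in the rectangle has the same value."""
    first = image[top][left]
    return all(image[r][c] == first
               for r in range(top, top + height)
               for c in range(left, left + width))


def split(image, top, left, height, width):
    """Recursively split an inhomogeneous rectangle into homogeneous ones.

    Returns a list of (top, left, height, width, value) tuples.
    """
    if height == 0 or width == 0:
        return []
    if is_homogeneous(image, top, left, height, width):
        return [(top, left, height, width, image[top][left])]
    h2, w2 = (height + 1) // 2, (width + 1) // 2
    quadrants = [
        (top, left, h2, w2),
        (top, left + w2, h2, width - w2),
        (top + h2, left, height - h2, w2),
        (top + h2, left + w2, height - h2, width - w2),
    ]
    regions = []
    for quadrant in quadrants:
        regions.extend(split(image, *quadrant))
    return regions


def try_merge(a, b):
    """Merge two equal-valued rectangles if b lies directly right of or below a."""
    (t1, l1, h1, w1, v1), (t2, l2, h2, w2, v2) = a, b
    if v1 != v2:
        return None
    if t1 == t2 and h1 == h2 and l1 + w1 == l2:
        return (t1, l1, h1, w1 + w2, v1)
    if l1 == l2 and w1 == w2 and t1 + h1 == t2:
        return (t1, l1, h1 + h2, w1, v1)
    return None


def merge(regions):
    """Repeatedly merge pairs of regions until no pair can be merged."""
    regions = list(regions)
    changed = True
    while changed:
        changed = False
        for i in range(len(regions)):
            for j in range(len(regions)):
                merged = try_merge(regions[i], regions[j]) if i != j else None
                if merged is not None:
                    regions = [r for k, r in enumerate(regions) if k not in (i, j)]
                    regions.append(merged)
                    changed = True
                    break
            if changed:
                break
    return regions


# Hypothetical 4x4 page: a column of 0s (background) beside a column of 1s (ink).
page = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
split_regions = split(page, 0, 0, 4, 4)
merged_regions = sorted(merge(split_regions))
print(merged_regions)
```

On this sample page the split step yields four 2x2 quadrants, and the merge step joins them into two 4x2 column regions. Running split to completion before merging is a simplification; the abstract describes the two operations as happening at the same time.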
Distributed Autonomous Agents for Chinese Document Image Segmentation
Advances in Oriental Document Analysis and Recognition Techniques, World Scientific, 1998
Cited by 2 (1 self)
In Chinese document image processing, text and/or graphical block detection serves as an essential step in document layout analysis that in turn permits the effective reasoning about the logical relationships among various text paragraphs and graphical entities for the purpose of document understanding. This paper presents a novel computational paradigm for extracting text/graphic blocks from Chinese document images, which is based on a notion of distributed autonomous agents. The primary features of the agents lie in that they are (1) adaptive to the locality of given images and hence efficient in locating the homogeneous image blocks, (2) reliable in performing image processing as the computation proceeds simultaneously from different image locations, (3) less sensitive to the noise in the given images as the computation disperses gracefully when it is moving away from the homogeneous blocks, and (4) easy to represent in their behaviors and evolvable in their performance. The paper, ...
Electronic Publishing, Vol. 8(2 & 3), 207-220 (June & September 1995)
Electronic Publishing, 1995
William S. Lovegrove and David F. Brailsford. CCC 0894-3982/95/020207-14. © 1995 by John Wiley & Sons, Ltd. Received 3 April 1996; revised 4 July 1996.
This paper presents the findings of a year's study of the document analysis and document classification of PDF material. The first of these terms covers the decomposition of a page into geometric elements (e.g. a group of text lines, or a photograph, that form a 'block' of some sort) followed by the demarcation of these blocks using paragraph breaks, inter-column gutters and so on. After a preliminary tagging of these blocks according to their apparent type (e.g. photograph, heading, paragraph etc.) it may be possible to combine this type classification with the known geometric block layout to discern which headings (for example) govern which particular text and graphic blocks. In this way one can begin document classification by inferring whether this document is a journal paper, newspaper, brochure, business letter or some other document type [4]. Previous research in this field has taken scanned bitmap images as the input to the document analysis system, and classification of the document components is often guided by a priori knowledge of the document's class [5-9]. It is noteworthy that there has been hardly any research in using PostScript as a starting point for document analysis. Certainly, if a PostScript file has been designed for maximum rasterising efficiency it can be a daunting task even to reconstruct the 'reading order' of the document. It may also be the case that previous workers have presumed that a well-structured source text will always be available to match PostScript output and, therefore, that working 'bottom up' from PostScript would seldom be necessary. However, we shall find that documents can...

