A Survey of Table Recognition: Models, Observations, Transformations, and Inferences
 International Journal of Document Analysis and Recognition
, 2003
Cited by 49 (4 self)
Table characteristics vary widely. Consequently, a great variety of computational approaches have been applied to table recognition. In this survey, the table recognition literature is presented as an interaction of table models, observations, transformations and inferences. A table model defines the physical and logical structure of tables; the model is used to detect tables, and to analyze and decompose the detected tables. Observations perform feature measurements and data lookup, transformations alter or restructure data, and inferences generate and test hypotheses. This presentation clarifies the decisions that are made by a table recognizer, and the assumptions and inferencing techniques that underlie these decisions.
Graphics Recognition - from Reengineering to Retrieval
 Proc. of 7th ICDAR
, 2003
Cited by 16 (0 self)
In this paper, we discuss how the focus in document analysis, generally speaking, and in graphics recognition more specifically, has moved from reengineering problems to indexing and information retrieval. After a review of ongoing work on these topics, we propose some challenges for the years to come.
Making Documents Work: Challenges for Document Understanding
 In: Proc. of the Seventh Int. Conf. on Document Analysis and Recognition (ICDAR ‘03), IEEE Computer
, 2003
Cited by 8 (1 self)
In this paper I will try to explain the nature of document understanding in all of its dimensions. Therefore I will first describe the characteristics of data, knowledge, and information in order to describe their synergetic interweaving. After that I will try to structure the inherent complexity of subproblems of document understanding which may not be solved serially, but rather are attributes of individual documents. Thus, this paper focuses on system engineering challenges. However, I will show some recent work done on the different topics and give some insights into the individual techniques we chose at DFKI.
A Language for Specifying and Comparing Table Recognition Strategies
, 2004
smartFIX: A requirements-driven system for document analysis and understanding
 In Proceedings of IAPR International Workshop on Document Analysis Systems (DAS’02)
, 2002
Cited by 7 (3 self)
Although the internet offers a widespread platform for information interchange, day-to-day work in large companies still means the processing of tens of thousands of printed documents every day. This paper presents the system smartFIX, which is a document analysis and understanding system developed by the DFKI spin-off INSIDERS. It permits the processing of documents ranging from fixed-format forms to unstructured letters of any format. Apart from the architecture, the main components and system characteristics, we also show some results when applying smartFIX to medical bills and prescriptions.
Table detection via probability optimization
 in Proceedings of Document Analysis Systems (DAS’02)
, 2002
Cited by 6 (0 self)
In this paper, we define the table detection problem as a probability optimization problem. We begin, as we do in our previous algorithm, by finding and validating each detected table candidate. We proceed to compute a set of probability measurements for each of the table entities. The computation of the probability measurements takes into consideration tables, table text separators and table neighboring text blocks. Then, an iterative updating method is used to optimize the page segmentation probability to obtain the final result. This new algorithm shows a great improvement over our previous algorithm. The training and testing data set for the algorithm includes 1,125 document pages having 518 table entities and a total of 10,934 cell entities. Compared with our previous work, it raised the accuracy rate to 95.67% from 90.32% and to 97.05% from 92.04%.
Information extraction by finding repeated structure
 Ninth IAPR International Workshop on Document Analysis Systems
, 2010
Cited by 1 (0 self)
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For example, in invoices, the names, unit prices, quantities and other descriptors of every line item are laid out in a consistent spatial structure. We propose a general method for extracting such repeated structure from documents. After receiving a single example of the structure to be found, the proposed method localizes additional instances of this structure in the same document and in additional documents. A wide variety of perceptually motivated cues (such as alignment and saliency) is used for this purpose. These cues are combined in a probabilistic model, and a novel algorithm for exact inference in this model is proposed and used. We demonstrate that this method can cope with complex instances of repeated structure and generalizes successfully across a wide range of structure variations.
Table structure understanding and its performance evaluation
, 2004
This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based, where the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system and a table structure understanding performance evaluation protocol. On a document data set having 518 table entities and 10,934 cell entities, it achieved an accuracy rate of 96.76% at the cell level and 98.32% at the table level.
An end-to-end administrative document analysis system
 The Eighth IAPR Workshop on Document Analysis Systems
This paper presents an end-to-end administrative document analysis system. This system uses case-based reasoning in order to process documents from known and unknown classes. For each document, the system retrieves the nearest processing experience in order to analyze and interpret the current document. When a complete analysis is done, this document needs to be added to the document database. This requires an incremental learning process in order to take into account all new information without losing what was previously learned. For this purpose, we proposed an improved version of an existing neural network called Incremental Growing Neural Gas. Applied to document learning and classification, this neural network reaches a recognition rate of 97.63%.
unknown title
With the large number of existing documents and the increasing speed in the production of new documents, finding efficient methods to process these documents for their content retrieval and storage becomes critical. Tables are a popular and efficient document element type. Therefore, table structure understanding is an important problem in the document layout analysis field. This paper presents a table structure understanding algorithm using optimization methods. It includes steps of column style labeling, large horizontal blank block equivalence subsets location, statistical refinement, iterative updating optimization and table decomposition. The column style labeling, statistical refinement and iterative updating optimization steps are probability based, where the probabilities are estimated from geometric measurements made on the various entities with which the algorithm works in a large training set. Each step of our table structure understanding algorithm has some tuning parameters. We initially set the parameters with some conjectural values. Then with a global parameter optimization scheme, we update these values using a line search optimization algorithm. We use a performance evaluation protocol employing an area overlapping measure. With this scheme, we can obtain statistically satisfactory tuning parameter values on the fly. Large data sets with ground truth are essential in assessing the performance of a computer vision algorithm. Manually generating document ground truth proved to be very costly and prone to involve subjective errors. We address this problem by using an automatic table ground truth generation system which can efficiently generate a large amount of accurate ground