Retrieval of Complex Objects Using a Four-Valued Logic
Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996
Cited by 24 (6 self)
The aggregated structure of documents plays a key role in full-text, multimedia, and network Information Retrieval (IR). Considering aggregation provides new querying facilities and improves retrieval effectiveness. We present a knowledge representation for IR purposes which pays special attention to this aggregated structure of objects. In addition, further features of objects can be described. Thus, the structure of full-text documents, the heterogeneity and the spatial and temporal relationships of objects typical for multimedia IR, and meta information for network IR are representable within one integrated framework. The model we propose allows for querying on the content of documents (objects) as well as on other features. The query result may contain objects having different types. Instead of retrieving only whole documents, the retrieval process determines the least aggregated entities that imply the query.
1 Motivation and Background
New IR applications like full-text, multime...
Forming Grammars for Structured Documents: An Application of Grammatical Inference
Proceedings of the Second International Colloquium on Grammatical Inference (ICGI-94): Grammatical Inference and Applications, volume 862 of LNAI, 1994
Cited by 15 (0 self)
We consider the problem of generating grammars for classes of structured documents (dictionaries, encyclopedias, user manuals, and so on) from examples. The examples consist of structures of individual documents, and they can be collected either by converting typographical tagging of documents prepared for printing into structural tags, or by using document recognition techniques. Our method first forms finite-state automata that describe the examples completely. These automata are modified by considering certain context conditions; the modifications correspond to generalizing the underlying language. Finally, the automata are converted into regular expressions, which are used to construct the grammar. In addition to automata, an alternative representation, characteristic k-grams, is introduced. Some interactive operations that are necessary for generating a grammar for a large and complicated document are also described.
Hypothesis Management for Structured Document Recognition
1st ICDAR, Saint-Malo, 1991
Cited by 9 (3 self)
This paper describes a new approach to identifying the specific structure of a document from a generic model using a blackboard-based system called Graphein. The system handles different structuring hypotheses, and its methodology makes it possible to take the structural context of documents into account. The model is described with an international standard formalism (ODA) which characterizes the different constituent objects and their subordinates. The system adopts different reading strategies according to the hypotheses extracted from the model: a top-down method (guided by the model) is applied when a hypothesis is "sure enough", a mixed method extracts clues from the image before applying a hypothesis, and a full bottom-up method (fusion process) is activated when the model is not directly usable. The choice of the best strategy depends on the analysis of the current hypotheses.
A Structured Document Database System
In Proceedings of the International Conference on Electronic Publishing, Document Manipulation & Typography, 1990
Cited by 8 (0 self)
We describe a database system for writing, editing, and querying structured documents. The structure of text is described using a context-free grammar. The operations are implemented using a powerful query language. The system supports the use of user-defined multiple views of the documents: one view can contain all the structure explicitly, while another can contain only part of the document and have only part of the structure visible. This makes the system flexible for different editing tasks. The system is implemented in C using a relational database system.
1 Introduction
Text with a structure is quite common: dictionaries, reference manuals, yearly reports etc. are typical examples. In recent years, research into systems for writing structured documents has flourished: see, e.g., [AFQ89b, AFQ89a, Fur89, Qui89] for recent surveys of the field. The SGML and ODA standards (see [Jol89, Bar89, Bro89]) have further increased the interest in the area. The Helsinki Structured Text Databa...
Qualitative Analysis of Low-Level Logical Structures
1993
Cited by 5 (3 self)
This paper presents a qualitative approach to logical structure recognition of library references. The system is driven by a generic model of a reference class and by an OCR flow, given in SGML format, that includes the ASCII codes of the characters and information about the typographic style and the lexical affiliation of words. The approach is based on producing and verifying hypotheses about the existence of sub-field limits in the reference area. At each step of the analysis, the generated hypotheses are sorted by their confidence scores and the most likely hypothesis is analyzed. The result is a structured flow containing, in UNIMARC format, the list of the different sub-fields recognized, each accompanied by its confidence score.
SGML Nets: Integrating Document and Workflow Modeling
In Proc. of the 31st Hawaii International Conference on System Sciences (HICSS-31), 1998
Cited by 4 (0 self)
In this paper, we introduce so-called SGML nets as a new formalism for an integrated modeling of document structures as well as document manipulation processes. SGML nets are a variant of high-level Petri nets where each place (passive element, “document store”) is typed using an SGML document type definition (DTD). Each place may be marked with a set of DTD-conforming document instances. Each transition (active element) specifies a class of operations on these document stores. Edges in SGML nets are inscribed with document templates. The incoming arcs of a transition select a set of instances to be read from the input places, while outgoing arcs define insertions into output places. The definition of the occurrence rule ensures DTD conformance of the document instances in all places of the net at every moment.
Document Understanding: Research Directions
1992
Cited by 3 (0 self)
A document image is a visual representation of a printed page such as a journal article page, a facsimile cover page, a technical document, an office letter, etc. Document understanding as a research endeavor consists of studying all processes involved in taking a document through various representations: from a scanned physical document to high-level semantic descriptions of the document. Some of the types of representation that are useful are: editable descriptions, descriptions that enable exact reproductions and high-level semantic descriptions about document content. This report is a definition of five research subdomains within document understanding as pertaining to predominantly printed documents. The topics described are: modular architectures for document understanding; decomposition and structural analysis of documents; model-based OCR; table, diagram and image understanding; and performance evaluation under distortion and noise.
1 Each of the main sections of this paper were ...
Distributed Multimedia Application Study
1992
Cited by 3 (2 self)
The development of distributed multimedia technology has the potential to both create new application areas and augment those that pre-existed. To benefit from this potential, however, it is important to ensure that application developers are provided with system services which help them achieve their objectives. Developers of distributed systems, therefore, must fully understand the nature of these objectives and the system functionality required to support them. It is essential that an understanding is achieved of the types of application which will be developed and the mechanisms necessary for these applications to operate successfully. This paper surveys multimedia applications research in order to assess its contribution to providing requirements for future processing and communications support infrastructures. The paper also derives a set of requirements for future distributed systems technology.
1. INTRODUCTION
It is often the case that research advances in Computer Science foll...
Usability of Groupware Products for Supporting Publishing Workflows
Proc. CON '94, Workflow Management: Challenges, Paradigms and Products, 1994
Cited by 2 (1 self)
This paper describes different workflow management systems from a publishing point of view. The intention is to derive the special requirements which arise when dealing with compound structured documents and to investigate to what degree these requirements are fulfilled by currently available groupware products. The requirements are identified and an investigation of three typical representatives of workflow tools is given.
Using HyTime for Modeling Publishing Workflows
1995
Cited by 1 (1 self)
FAW - Research Institute for Applied Knowledge Processing, A-4232 HAGENBERG, Austria, E-Mail: franz sre @faw.uni-linz.ac.at; University of Vienna, A-1010 VIENNA, Austria, E-Mail: gq@ifs.univie.ac.at; Technical University of Vienna, A-1010 VIENNA, Austria, E-Mail: tjoa@eimoni.tuwien.ac.at
Workflow Management is seen as the key for improving overall business effectiveness. However, today's solutions are very specific and isolated. Recently emerged international standards in the publishing sector provide the means for enabling standardized descriptions and thus allow the interchange and reuse of workflow information. This paper describes how HyTime can be used as a modeling language to describe publishing workflows. We present relevant parts of the HyTime model and reflect on pros and cons of the chosen approach.
This work was supported by the Austrian Federal Research Foundation (FFF) under grant no. 2/310. F...

