From Structured Documents to Novel Query Facilities, 1994
Cited by 222 (34 self)
Structured documents (e.g., SGML) can benefit a lot from database support and more specifically from object-oriented database (OODB) management systems. This paper describes a natural mapping from SGML documents into OODBs and a formal extension of two OODB query languages (one SQL-like and the other a calculus) in order to deal with SGML document retrieval. Although motivated by structured documents, the extensions of query languages that we present are general and useful for a variety of other OODB applications. A key element is the introduction of paths as first class citizens. The new features allow querying data (and to some extent schema) without exact knowledge of the schema in a simple and homogeneous fashion.
1 Introduction
Structured documents are central to a wide class of applications such as software engineering, libraries, technical documentation, etc. They are often stored in file systems and document access tools are somewhat limited. We believe that (object-oriented) d...
An Algebra for Structured Text Search and A Framework for its Implementation
- The Computer Journal, 1995
Cited by 104 (19 self)
A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, followed by. The ultimate result of a query is the set of intervals that satisfy it. An implementation framework is given based on four primitive access functions. Each access function finds the solution to a query nearest to a given position in the database. Recursive definitions for the seven operators are given in terms of these access functions. Search time is at worst proportional to the time required to solve the elementary terms in the query. Inverted indices yield search ...
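The seven operators named in this abstract can be illustrated with a naive set-based sketch. This is not the paper's implementation: the abstract describes a framework built on four access functions, which is not reproduced here. The representation of an interval as an inclusive `(start, end)` pair of token positions, all function names, and the decision to keep every candidate interval without pruning nested results are assumptions made for illustration only.

```python
from typing import Set, Tuple

# Assumption: an interval is an inclusive (start, end) pair of token positions.
Interval = Tuple[int, int]


def _covers(outer: Interval, inner: Interval) -> bool:
    """True if `outer` spans every position of `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Intervals of `a` that contain at least one interval of `b`."""
    return {x for x in a if any(_covers(x, y) for y in b)}


def not_containing(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Intervals of `a` that contain no interval of `b`."""
    return {x for x in a if not any(_covers(x, y) for y in b)}


def contained_in(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Intervals of `a` that lie inside at least one interval of `b`."""
    return {x for x in a if any(_covers(y, x) for y in b)}


def not_contained_in(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Intervals of `a` that lie inside no interval of `b`."""
    return {x for x in a if not any(_covers(y, x) for y in b)}


def one_of(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Intervals that satisfy either operand."""
    return a | b


def both_of(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Spans covering one interval of `a` and one of `b` (no pruning, by assumption)."""
    return {(min(x[0], y[0]), max(x[1], y[1])) for x in a for y in b}


def followed_by(a: Set[Interval], b: Set[Interval]) -> Set[Interval]:
    """Spans from an interval of `a` to a later, non-overlapping interval of `b`."""
    return {(x[0], y[1]) for x in a for y in b if x[1] < y[0]}


# Hypothetical data: two sections and the positions of two query words.
sections = {(0, 9), (10, 19)}
word_a = {(3, 3)}
word_b = {(5, 5)}
sections_with_a = containing(sections, word_a)
a_then_b = followed_by(word_a, word_b)
```

This version compares every pair of intervals, so it is quadratic in the operand sizes; it is meant only to make the operator semantics concrete, not to reflect the search-time bound claimed in the abstract.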
Algebras for Querying Text Regions: Expressive Power and Optimization
- Journal of Computer and System Sciences, 1998
Cited by 28 (0 self)
There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit through the use of mark-up conventions (as with SGML). An important aspect of managing structured and semi-structured textual data consists of supporting the efficient retrieval of text components based both on their content and structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. The region algebra studied is a set-at-a-time algebra for manipulating text regions (substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in us...
Shortest Substring Ranking (MultiText Experiments for TREC-4)
- Proceedings of TREC-4
Cited by 26 (1 self)
To address the TREC-4 topics, we used a precise query language that yields and combines arbitrary intervals of text rather than pre-defined units like words and documents. Each solution was scored in inverse proportion to the length of the shortest interval containing it. Each document was scored by the sum of the scores of solutions within it. Whenever the above strategy yielded fewer than 1000 documents, documents satisfying successively weaker queries were added with lower rank. Our results for the ad-hoc topics compare favourably with the median average precision for all groups.
1 Introduction
The central concern of the MultiText project at the University of Waterloo is the management of data in large-scale distributed text database systems [10]. A major component of this work has been the development of a query language that is suitable for expressing queries over the heterogeneous data that is present in a very large text database. The query language developed for the MultiText p...
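The scoring rule stated in this abstract (each solution scored in inverse proportion to its interval length, each document scored by the sum over its solutions) can be sketched as follows. The constant of proportionality of 1, the inclusive `(start, end)` interval representation, the tie-break by document identifier, and all names are assumptions; the fallback to successively weaker queries is not modelled.

```python
from typing import Dict, List, Tuple

# Assumption: a solution is an inclusive (start, end) pair of token positions.
Interval = Tuple[int, int]


def solution_score(interval: Interval) -> float:
    """Score a solution as 1 / length (proportionality constant assumed to be 1)."""
    start, end = interval
    return 1.0 / (end - start + 1)


def rank_documents(solutions_by_doc: Dict[str, List[Interval]]) -> List[Tuple[str, float]]:
    """Score each document by the sum of its solution scores, highest first."""
    scores = {
        doc: sum(solution_score(iv) for iv in intervals)
        for doc, intervals in solutions_by_doc.items()
    }
    # Ties are broken by document identifier so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input: d1 has solutions of length 2 and 4, d2 has one of length 1.
ranking = rank_documents({"d1": [(0, 1), (10, 13)], "d2": [(5, 5)]})
```

Under these assumptions a single very short solution can outrank several longer ones, which matches the abstract's emphasis on the shortest interval.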
Integrating a Structured-Text Retrieval System with an Object-Oriented Database System
- In Proceedings of the Twentieth International Conference on Very Large Data Bases, 1994
Cited by 22 (2 self)
We describe the integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (OpenODB). Our approach is a light-weight one, using the external function capability of the database system to encapsulate the text retrieval system as an external information source. Yet, we are able to provide a tight integration in the query language and processing; the user can access the text retrieval system using a standard database query language. The efficient and effective retrieval of structured text performed by the text retrieval system is seamlessly combined with the rich modeling and general-purpose querying capabilities of the database system, resulting in an integrated system with querying power beyond that of the underlying systems. The integrated system also provides uniform access to textual data in the text retrieval system and structured data in the database system, thereby achieving information fusion. We discuss the design and imple...
Complete Answer Aggregates for Tree-like Databases: A Novel Approach to Combine Querying and Navigation
- ACM Transactions on Information Systems, 2001
Cited by 22 (3 self)
The use of markup languages like SGML, HTML, or XML for encoding the structure of documents or linguistic data has led . . .
Schema-independent retrieval from heterogeneous structured text
- Fourth Annual Symposium on Document Analysis and Retrieval, Las Vegas, NV, 1995
Querying Structured Documents with Hypertext Links using OODBMS
- In ECHT'94, 1994
Cited by 18 (1 self)
Hierarchical logical structure and hypertext links are complementary and can be combined to build more powerful document management systems [28, 25, 24, 13]. Previous work exploits this complementarity for building better document processors, browsers and editing tools, but not for building sophisticated querying mechanisms. Querying in hypertext has been a requirement since [19] and has already been elaborated in many hypertext systems [11, 7, 4, 21], but has not yet been used for hypertext systems superimposed on an underlying hierarchical logical structure. In this paper we use the model and the SQL-like query language of [10] in order to manage structured documents with hypertext links. The model represents a structured document with typed links as a complex object, and uses paths through the document structure as first class citizens in formulating queries. Several examples of queries illustrate, from a practical point of view, the expressive power of the language to retrieve doc...

