Results 1 - 6 of 6
Searching and browsing collections of structural information
- In IEEE Advances in Digital Libraries (ADL’2000)
, 1997
Cited by 19 (0 self)
This paper proposes a new approach to querying collections of structured textual information such as SGML/XML documents. Knowledge about the structure of documents is an additional resource that should be exploited during retrieval since the semantics of the different textual objects can be used to specify an information need much more precisely. However, the traditional probabilistic retrieval model lacks the ability to handle structural information. We define a new retrieval function based on the probabilistic model which overcomes this drawback. The presented query language allows the assignment of structural roles to individual terms. The efficient evaluation of queries in this framework requires appropriate index structures. We design text and structure indexes and show how their information is combined during evaluation. The implementation supports additional functionalities such as a table of contents for browsing. First evaluation results show the feasibility of the approach on collections of unstructured documents.
Query optimization in XML structured-document databases
- THE VLDB JOURNAL
, 2006
Cited by 13 (0 self)
While the information published in the form of XML-compliant documents keeps fast mounting up, efficient and effective query processing and optimization for XML have now become more important than ever. This article reports our recent advances in XML structured-document query optimization. In this article, we elaborate on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. This article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique, in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach straightforwardly renders high optimization efficiency and simplicity in implementation. Our approach is a logical-level one, which is independent of any particular storage model. Therefore, the optimizers developed based on our approach can be easily adapted to a broad range of XML data/information servers to achieve fast query optimization. Experimental study confirms the validity and effectiveness of the proposed approach.
XPRES: a Ranking Approach to Retrieval on Structured Documents
, 1999
Cited by 7 (0 self)
This paper proposes a new approach to querying collections of structured textual information like SGML/XML documents. Knowledge about the structure of documents is an additional value that should be exploited during retrieval. The semantics of different text parts could be used to specify an information need much more precisely. The traditional probabilistic retrieval model lacks the ability to handle structural information. We define a new retrieval function based on the probabilistic model which overcomes this drawback. The presented query language allows the assignment of structural roles to individual terms. The efficient evaluation of queries in this framework requires appropriate index structures. We design text and structure indexes and show how their information is combined during evaluation. The implementation supports additional functionalities like a table of contents for browsing. First evaluation results show the feasibility of the approach on unstructured document colle...
Searching structured documents
- Information Processing and Management
, 2004
Cited by 6 (0 self)
Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author “Smith” in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for “Smith as author” in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
Beyond Information Searching and Browsing: Acquiring Knowledge from Digital Libraries
- Information Processing & Management
, 2001
Cited by 5 (0 self)
As one of the most complex and advanced forms of Internet information systems, digital libraries serve as an increasingly important channel to a vast array of information sources and services. However, from the standpoint of satisfying human information needs, the current digital library systems suffer from the following two shortcomings: (i) inadequate strategic-level cognition support; (ii) inadequate knowledge-sharing facilities. In this paper, we introduce a two-layered digital library architecture to support different levels of human cognitive acts. The model moves beyond simple information searching and browsing across multiple repositories, to inquiry of knowledge. To address users' high-order cognitive requests, we propose an information space, consisting of a knowledge subspace and a document subspace. A formal description of the knowledge subspace for knowledge sharing and dissemination, as well as mechanisms for constructing the two subspaces, are particularly dis...
A Heuristics-Based Approach to Query Optimization in Structured Document Databases
- In: Proceedings Of International Database Engineering and Application Symposium
, 1999
Cited by 3 (2 self)
The number of documents published via the WWW in the form of SGML/HTML has been rapidly growing for years. Efficient, declarative access mechanisms for this type of document -- structured documents in general -- are becoming of great importance. This paper reports our most recent advance in pursuit of effective processing and optimization of structured-document queries, which are important for large repositories of structured documents. Our methodology emphasizes applying exclusively deterministic transformations on query expressions to achieve the best possible optimization efficiency. A new approach is thus proposed that facilitates the exploitation of the DTD-knowledge, structural properties, and structure indices of structured documents for the purpose of fast query optimization.

