Results 1 - 8 of 8
Supporting Full-Text Information Retrieval with a Persistent Object Store
- In 4th Intl. Conf. on Extending Database Technology, 1994
Cited by 27 (4 self)
Full-text information retrieval systems have unusual and challenging data management requirements. Attempts have been made to satisfy these requirements using traditional (e.g., relational) database management systems. Those attempts, however, have produced rather discouraging results. Instead, information retrieval systems typically use custom data management facilities that require significant development effort and usually do not provide all of the services available from a standard database management system. Advanced data management systems, such as object-oriented database management systems and persistent object stores, offer a reasonable alternative to the two previous approaches. We have taken an existing information retrieval system (INQUERY) and substituted a persistent object store (Mneme) for the portion of the custom data management system that manages an inverted file index. The result is an improvement in performance and significant opportunities for the inform...
Execution Performance Issues in Full-Text Information Retrieval
- 1995
Cited by 18 (0 self)
The task of an information retrieval system is to identify documents that will satisfy a user's information need. Effective fulfillment of this task has long been an active area of research, leading to sophisticated retrieval models for representing information content in documents and queries and measuring similarity between the two. The maturity and proven effectiveness of these systems has resulted in demand for increased capacity, performance, scalability, and functionality, especially as information retrieval is integrated into more traditional database management environments. In this dissertation we explore a number of functionality and performance issues in information retrieval. First, we consider creation and modification of the document collection, concentrating on management of the inverted file index. An inverted file architecture based on a persistent object store is described and experimental results are presented for inverted file creation and modification. Our architecture provides performance that scales well with document collection size and the database features supported by the persistent object store provide many solutions to issues that arise during integration of information retrieval into more general database environments. We then turn to query evaluation speed and introduce a new optimization technique for statistical ranking retrieval systems that support structured queries. Experimental results from a variety of query sets show that execution time can be reduced by more than 50% wit...
An Extended Relational Document Retrieval Model
- In: Processing & Management, Vol, 1988
Cited by 17 (1 self)
Relational Data Base Management Systems offer a commercially available tool with which to build effective document retrieval systems. The full potential of the relational model for supporting the kind of ad hoc inquiry characteristic of document retrieval has only recently been explored. In addition, commercially available relational DBMS’s also provide effective tools for managing document data bases by providing facilities for, inter alia, concurrency control, data migration and reorganization routines, authorization mechanisms, enforcement of integrity constraints, dynamic data definition, etc. This article will present a relational logical model to support a sophisticated document retrieval system in which flexible forms of inferential and associative searching can be performed. Examples of ad hoc inquiry will be presented in SQL. Several problems of particular importance to document retrieval will be discussed, including the importance of Conjunctive Normal Form in query formulation, unique aspects of document retrieval storage and processing overhead, and techniques for reducing the size of storage without severely impacting retrieval effectiveness.
Introduction and Overview
- Journal of the American Society for Information Science, 1993
Cited by 6 (0 self)
This paper provides a partial overview of practice, problems, proposals, and plans relating to the handling of 'composite documents' by extended information storage and retrieval systems. It aims to describe such documents, to explore various areas of application for them, to portray a number of representative test collections under development, to survey related studies as well as previous work of this author, and to examine plans for further investigation. 1.1. Background Experimental information retrieval (IR) systems, some dating back to the sixties, have demonstrated the viability of fully automatic document storage and retrieval methodologies with small to medium size bibliographic collections [72]. Many of these experimental systems utilize the vector space model in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have given additional improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combi... This work was supported by the NSF under grant IST-8418877 and by Virginia's Center for Innovative Technology under grant INF-85-016.
Using the Relational Model and Part-of-Speech Tagging to Implement Text Relevance
- 1992
Cited by 1 (1 self)
We introduce a database design that improves prior work on document retrieval within the relational model. While previous approaches require extensions to the relational model, our approach uses an unchanged relational system. We focus on the implementation of assigning a measure of relevance between a query and a document as this is more useful for large document databases. Since run-time performance is critical to interactive document retrieval applications, we provide a detailed performance analysis of the number of disk reads required by our approach as compared to a traditional IR system. Additionally, we show how an existing stochastic heuristic which assigns the parts of speech to a document can be incorporated into the relational model. 1 Introduction The idea of using relational systems to facilitate text processing was first proposed in [9]. In this work it was shown that simplistic keyword searches can be done using unchanged SEQUEL (a precursor to SQL). Additional operations...
A Parallel RDBMS Approach to Implement Relevance Feedback
... this paper, the terms in the relevant documents were sorted using six different sort orders. The sorts were done using either ...
A Parallel Relational Database Management System Approach to Relevance Feedback in Information Retrieval
- Journal of the American Society for Information Science, 1999
A scalable, parallel, relational-database-driven information retrieval engine is described. To support portability across a wide range of execution environments, including parallel multicomputers, all algorithms strictly adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy was significantly enhanced over prior database-driven information retrieval efforts. Algorithmic modifications to our earlier prototype resulted in significantly enhanced scalability. Currently our information retrieval engine sustains near-linear speedups using a 24-node parallel database machine. Experiments using the TREC data collections are presented to validate the described approaches.
The QMUL Team with Probabilistic SQL at Enterprise Track
The enterprise track caught our attention, since the task is similar to a project we carried out for the BBC. Our motivation for participation has been twofold: On one hand, there is the usual challenge to design and test the quality of retrieval strategies. On the other hand, and for us very important, the TREC participation has been an opportunity to investigate the resource effort it requires to deliver a TREC result.

