Evaluation of an Inference Network-Based Retrieval Model
- ACM Transactions on Information Systems
, 1991
Cited by 203 (20 self)
The use of inference networks to support document retrieval is introduced. A network-based retrieval model is described and compared to conventional probabilistic and Boolean models. The performance of a retrieval system based on the inference network model is evaluated and compared to performance with conventional retrieval models,
An Evaluation of Information Retrieval Accuracy with Simulated OCR Output
, 1992
Cited by 37 (10 self)
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluations using simulated OCR output on a variety of databases. The results show that high quality OCR devices have little effect on the accuracy of retrieval, but low quality devices used with databases of short documents can result in significant degradation.
1 Introduction
Text-based information systems have become increasingly important in business, government, and academia. In many applications, the source of the text is not documents from word processors, but instead documents in their original paper form. Although imaging systems provide a simple means of storing these documents and ...
Evaluating the Performance of Distributed Architectures for Information Retrieval using a Variety of Workloads
- ACM Transactions on Information Systems
, 1997
Cited by 33 (7 self)
Information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this paper, we describe a fully functional distributed IR system based on the Inquery unified IR system. To refine this prototype, we implement a flexible simulation model which we use to present a series of experiments using a variety of workloads that measure system performance. We vary numerous system parameters such as the number of users, document collections, terms per query, query term frequency, think time, answers returned, and workload. Based on our initial results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time gracefully degrades as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.
Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems -- distributed applications; C.4 [Performance of Systems]: Performance Attributes; H.3.4 [Information Storage and Retrieval]: Systems and Software
General Terms: Experimentation, Performance
Additional Key Words and Phrases: Distributed information retrieval architectures
This material is based on work supported by ...
Evaluation of model-based retrieval effectiveness with OCR text
- ACM Transactions on Information Systems
, 1996
Cited by 30 (12 self)
We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval. More specifically, we show that average precision and recall are not affected by OCR errors across systems for several collections. The collections used in these experiments include both actual OCR-generated text and standard information retrieval collections corrupted through the simulation of OCR errors. Both the actual and simulation experiments include full-text and abstract-length documents. We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors. It is further shown that the OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin. We not only point out problems that can arise from applying OCR text within an information retrieval environment, but also suggest solutions to overcome some of these problems.
Performance Evaluation of a Distributed Architecture for Information Retrieval
- In Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1996
Cited by 28 (7 self)
Information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this paper, we describe a fully functional distributed IR system based on the Inquery unified IR system. To refine this prototype, we implement a flexible simulation model that analyzes performance issues given a wide variety of system parameters and configurations. We present a series of experiments that measure response time and system utilization and that identify bottlenecks. We vary numerous system parameters, such as the number of users, text collections, terms per query, and workload, to generalize our results for other distributed IR systems. Based on our initial results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of info...
Supporting Full-Text Information Retrieval with a Persistent Object Store
- In 4th Intl. Conf. on Extending Database Technology
, 1994
Cited by 27 (4 self)
Full-text information retrieval systems have unusual and challenging data management requirements. Attempts have been made to satisfy these requirements using traditional (e.g., relational) database management systems. Those attempts, however, have produced rather discouraging results. Instead, information retrieval systems typically use custom data management facilities that require significant development effort and usually do not provide all of the services available from a standard database management system. Advanced data management systems, such as object-oriented database management systems and persistent object stores, offer a reasonable alternative to the two previous approaches. We have taken an existing information retrieval system (INQUERY) and substituted a persistent object store (Mneme) for the portion of the custom data management system that manages an inverted file index. The result is an improvement in performance and significant opportunities for the inform...
Order preserving minimal perfect hash functions and information retrieval
- ACM Transactions on Information Systems
, 1991
Cited by 26 (2 self)
Rapid access to information is essential for a wide variety of retrieval systems and applications. Hashing has long been used when the fastest possible direct search is desired, but is generally not appropriate when sequential or range searches are also required. This paper describes a hashing method, developed for collections that are relatively static, that supports both direct and sequential access. Indeed, the algorithm described gives hash functions that are optimal in terms of time and hash table space utilization, and that preserve any a priori ordering desired. Furthermore, the resulting order preserving minimal perfect hash functions (OPMPHFs) can be
Open video: A framework for a test collection
- Journal of Network and Computer Applications
, 2000
Cited by 11 (4 self)
This paper provides a framework for such a test collection and describes the Open Video Project that has begun to develop a test collection based on this framework. The proposed test collection is meant to be used to study a wide range of problems, such as tests of algorithms for creating surrogates for video content or interfaces that display result sets from queries. An important challenge in developing such a collection is storing and distributing video objects. This paper is meant to lay out video management issues that may influence distributed storage solutions. More specifically, this paper describes the first phase for creating the test collection, sets guidelines for building the collection, and serves as a basis for discussion to inform subsequent phases and invite research community involvement.
2000 Academic Press
1. Introduction
It is inevitable that the technical limitations that impede widespread usage of video libraries will dimi
Introduction and Overview
- Journal of the American Society for Information Science
, 1993
Cited by 6 (0 self)
This paper provides a partial overview of practice, problems, proposals, and plans relating to the handling of 'composite documents' by extended information storage and retrieval systems. It aims to describe such documents, to explore various areas of application for them, to portray a number of representative test collections under development, to survey related studies as well as previous work of this author, and to examine plans for further investigation.
1.1. Background
Experimental information retrieval (IR) systems, some dating back to the sixties, have demonstrated the viability of fully automatic document storage and retrieval methodologies with small to medium size bibliographic collections [72]. Many of these experimental systems utilize the vector space model in which each important term (such as a word stem) identifies a different dimension in a space, so that matrix methods and vector operations can be defined on queries and documents. Statistical techniques have been very effective, and probabilistic enhancements have given additional improvements [84]. However, the basic vector space model is oriented towards recording the essential information in the text of a title/abstract combi-
This work was supported by the NSF under grant IST-8418877 and by Virginia's Center for Innovative Technology under grant INF-85-016.
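As a rough illustration of the vector space model mentioned in the abstract above, the following Python sketch treats each distinct term as one dimension and ranks documents by cosine similarity to the query. It is not taken from the paper: the function names (`term_vector`, `cosine_similarity`, `rank`) are hypothetical, raw term counts stand in for whatever term weighting a real system would use, and whitespace tokens stand in for word stems.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse vector: one dimension per term, value = raw count.

    Assumption: lowercase whitespace tokens stand in for word stems.
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return document indices ordered from most to least similar to the query."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), i) for i, d in enumerate(documents)]
    # Sort by descending score; ties are broken by original position.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Calling `rank("retrieval", docs)` on a small list of strings returns the positions of those strings, best match first. Cosine similarity is used here only because it is a common choice for comparing such vectors; the abstract itself does not name a specific similarity measure.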
Enhancing the set-based model using proximity information
- Proceedings of the 9th International Symposium on String Processing and Information Retrieval
, 2002

