Inverted files for text search engines
ACM Computing Surveys, 2006
Cited by 136 (2 self)
The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.
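As a rough illustration of the core structure this tutorial covers, the sketch below builds a small in-memory inverted index and answers a conjunctive (AND) query. It is a minimal sketch under stated assumptions, not the implementation described in the survey: the function names `build_inverted_index` and `conjunctive_query`, the whitespace tokenizer, and the `(doc_id, term_frequency)` posting layout are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings.

    Assumption: documents are plain strings, tokenized by lowercasing and
    splitting on whitespace. Real engines use far richer tokenization.
    """
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def conjunctive_query(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    doc_sets = [{doc_id for doc_id, _ in index.get(term, [])} for term in terms]
    if not doc_sets:
        return []
    return sorted(set.intersection(*doc_sets))


docs = ["the cat sat", "the dog sat", "cat and dog"]
index = build_inverted_index(docs)
matches = conjunctive_query(index, ["cat", "dog"])
```

Intersecting Python sets keeps the example short; production systems typically merge sorted, compressed posting lists instead.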
An Algebra for Structured Text Search and A Framework for its Implementation
The Computer Journal, 1995
Cited by 104 (19 self)
A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, followed by. The ultimate result of a query is the set of intervals that satisfy it. An implementation framework is given based on four primitive access functions. Each access function finds the solution to a query nearest to a given position in the database. Recursive definitions for the seven operators are given in terms of these access functions. Search time is at worst proportional to the time required to solve the elementary terms in the query. Inverted indices yield search ...
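To make the containment operators concrete, here is a naive sketch of four of the seven operators over lists of intervals. It assumes intervals are inclusive `(start, end)` tuples of token positions and uses quadratic-time list scans; it does not reproduce the paper's framework of four access functions, and the function names are hypothetical labels chosen for this example.

```python
def covers(outer, inner):
    """True if interval `outer` fully contains interval `inner` (inclusive ends)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(a, b):
    """Intervals of `a` that contain at least one interval of `b`."""
    return [x for x in a if any(covers(x, y) for y in b)]


def not_containing(a, b):
    """Intervals of `a` that contain no interval of `b`."""
    return [x for x in a if not any(covers(x, y) for y in b)]


def contained_in(a, b):
    """Intervals of `a` that lie inside at least one interval of `b`."""
    return [x for x in a if any(covers(y, x) for y in b)]


def not_contained_in(a, b):
    """Intervals of `a` that lie inside no interval of `b`."""
    return [x for x in a if not any(covers(y, x) for y in b)]


# Hypothetical data: three paragraphs and two single-token term matches.
paragraphs = [(0, 9), (10, 19), (20, 29)]
hits = [(3, 3), (25, 25)]
paragraphs_with_hit = containing(paragraphs, hits)
```

A query such as "paragraphs containing the term" then reduces to `containing(paragraphs, hits)`, and the result is again a list of intervals that can feed another operator.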
Fast Inverted Indexes with On-Line Update
1994
Cited by 19 (2 self)
We describe data structures and an update strategy for the practical implementation of inverted indexes. The context of our discussion is the construction of a dedicated index engine for a distributed full-text information retrieval system, but the results have wider application. Retrieval operations require a single disk access per query term. The on-line update strategy guarantees the consistency of on-disk data structures. Index compression integrates smoothly.

1 Introduction

1.1 Environment

Our general concern is the construction of a distributed full-text information retrieval system. The basic architecture consists of a group of LAN-connected processors, each managing its own separate disk and memory. Individual processors act as either text servers, storing documents and servicing requests for portions of these documents, or as index engines, identifying the portions of documents that match client-generated search criteria. To external clients, the group of machines appears to ...
Dynamic Inverted Indexes for a Distributed Full-Text Retrieval System
1995
Cited by 19 (2 self)
We describe data structures and an update strategy for the implementation of dynamic inverted indexes in the context of a dedicated index engine for a distributed full-text retrieval system. Except in rare cases, retrieval operations require a single disk access per query term. The on-line update strategy guarantees the consistency of on-disk data structures across node failures. Index compression integrates smoothly. We examine the performance of the system both experimentally and through an analytical comparison with a competing B-tree based approach.

1 Introduction

1.1 Environment

Our general concern is the construction of a distributed full-text retrieval system. The architecture consists of a group of LAN-connected processors, each managing its own separate disk and memory. Individual processors act as either text servers, storing documents and servicing requests for portions of these documents, or as index engines, identifying the portions of documents that match client-generate...
Space and Time Improvements for Indexing in Information Retrieval
In Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval, 1995
Cited by 3 (0 self)
When indexing large text collections, minimizing the indexing time and the disk storage used to create an index remains important. Indexing optimizations applied to a prototype retrieval system at NIST are discussed in this paper. These include the organization of the index, the use of virtual memory facilities to improve indexing time, an index addressing scheme to decrease index size, and the implementation of term position information extensions using compression. These improvements provided a large decrease in indexing time and moderate decrease in index size for indices without term position extensions. Indices using term position extensions had a more moderate increase in space/time efficiency.

1 Introduction

As computers grow exponentially faster, and disk drives become more compact and inexpensive, it seems that efficiency should be less important. However, this is not so, at least in the information retrieval community. If available disk space is growing, so is the amount of t...
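The abstract does not say which compression scheme the NIST prototype applied to term positions. Purely as an illustration of why position lists compress well, the sketch below uses one common approach, gap (delta) encoding followed by variable-byte coding. This choice of scheme is an assumption of this example, and `encode_positions` and `decode_positions` are hypothetical names, not functions from the paper.

```python
def encode_positions(positions):
    """Encode a strictly increasing list of positions as variable-byte gaps.

    Assumption: each gap is stored in 7-bit groups, low-order group first,
    and the high bit is set on the final byte of every gap.
    """
    out = bytearray()
    prev = 0
    for pos in positions:
        gap = pos - prev
        prev = pos
        while gap >= 128:
            out.append(gap & 0x7F)  # continuation byte, high bit clear
            gap >>= 7
        out.append(gap | 0x80)      # final byte of this gap, high bit set
    return bytes(out)


def decode_positions(data):
    """Invert encode_positions, returning the original list of positions."""
    positions = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        if byte & 0x80:
            gap |= (byte & 0x7F) << shift
            prev += gap
            positions.append(prev)
            gap = 0
            shift = 0
        else:
            gap |= byte << shift
            shift += 7
    return positions


positions = [3, 7, 300, 100000]
encoded = encode_positions(positions)
decoded = decode_positions(encoded)
```

Small gaps fit in a single byte, so a term that occurs densely in a document costs roughly one byte per occurrence instead of a fixed-width integer.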

