Abstract:
Full-text information retrieval systems have traditionally been designed for archival environments. They often provide little or no support for adding new documents to an existing document collection, requiring instead that the entire collection be re-indexed. Modern applications, such as information filtering, operate in dynamic environments that require frequent additions to document collections. We provide this ability using a traditional inverted file index built on top of a persistent object store. The data management facilities of the persistent object store are used to support efficient incremental updates of the inverted lists. We describe our system and present experimental results showing superior incremental indexing and competitive query processing performance.

Keywords: full-text document retrieval, incremental indexing, persistent object store, performance

1 Introduction

Full-text information retrieval (IR) systems are well-established tools for satisfying a user's inf...