The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the ability to search full text directly (rather than, e.g., titles and abstracts), and suggests appropriate approaches to exploiting this capability, with a focus on the role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing. This paper will appear in Communications of the ACM.

2 Introduction

Automatic text, or document, retrieval has recently become a topic of interest for those working in natural language processing (NLP). The aim of this article is to indicate the key properties of document retrieval, distinguishing it from both data retrieval and question answering; to summarize past exper...
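The abstract's claim that simple statistical techniques can be effective is easiest to see with a concrete case. The sketch below ranks documents using inverse document frequency (idf) weighting, a classic statistical scheme in which terms that occur in few documents count for more than common ones. It is an illustration only, not the method described in this paper. The function names `idf_weights` and `rank`, the lowercase whitespace tokenization, and the formulation idf(t) = log(N / n_t) (N documents in the collection, n_t of them containing term t) are all assumptions made for the example.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Assumed formulation: idf(t) = log(N / n_t), where n_t is the number
    of documents containing term t. Tokenization is naive whitespace splitting."""
    n_docs = len(docs)
    # Count each term at most once per document to get document frequency.
    doc_freq = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def rank(query, docs):
    """Score each document by summing tf * idf over the distinct query terms,
    then return document indices ordered best-first (ties keep input order)."""
    idf = idf_weights(docs)
    query_terms = set(query.lower().split())
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc.lower().split())  # missing terms count as 0
        score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]
```

For example, with `docs = ["full text retrieval of news articles", "parsing noun phrases in unrestricted text", "relevance feedback improves retrieval"]`, the call `rank("retrieval feedback", docs)` returns `[2, 0, 1]`: the third document matches the rare term "feedback" as well as "retrieval", while a term occurring in every document would receive a weight of zero.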