Indexing by latent semantic analysis
Journal of the American Society for Information Science, 1990
Cited by 2168 (30 self)
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term-by-document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
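A minimal sketch of the retrieval step described above, assuming NumPy and a raw term-frequency matrix with no term weighting. The function names `lsi_index` and `lsi_query`, the toy vocabulary, and the 0.5 cosine threshold are illustrative choices, not taken from the paper; a real collection would keep around 100 factors rather than 2.

```python
import numpy as np

def lsi_index(X, k):
    """Truncated SVD of a term-by-document matrix X (rows = terms, columns = documents)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Row j is document j in the k-dimensional factor space, scaled by the singular values.
    docs = (sk[:, None] * Vtk).T
    return Uk, sk, docs

def lsi_query(q, Uk, sk, docs, threshold=0.5):
    """Return indices of documents whose cosine with the query meets the threshold."""
    # Fold the query in as a pseudo-document: S_k^-1 U_k^T q.
    q_hat = (Uk.T @ q) / sk
    # Scale by S_k so the query is comparable with the document rows.
    q_vec = q_hat * sk
    q_norm = np.linalg.norm(q_vec)
    if q_norm == 0:
        return []
    d_norms = np.linalg.norm(docs, axis=1)
    d_norms[d_norms == 0] = 1.0  # avoid division by zero for empty documents
    cos = docs @ q_vec / (d_norms * q_norm)
    return [int(j) for j in np.argsort(-cos) if cos[j] >= threshold]

# Terms: car, automobile, engine, flower.
# Documents: d0 = "car engine", d1 = "automobile engine", d2 = "flower flower".
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 2]], dtype=float)
Uk, sk, docs = lsi_index(X, k=2)
query = np.array([1, 0, 0, 0], dtype=float)  # the query "car"
print(lsi_query(query, Uk, sk, docs))
```

Both d0 and d1 are returned even though d1 never contains "car": with only two factors kept, the car/automobile/engine documents collapse onto one factor, which is the kind of higher-order association the abstract relies on.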
An Adaptive Web Page Recommendation Service
1997
Cited by 104 (0 self)
An adaptive recommendation service seeks to adapt to its users, providing increasingly personalized recommendations over time. In this paper we introduce the "Fab" adaptive web page recommendation service. There has been much research on analyzing document content in order to improve recommendations or search results. More recently researchers have begun to explore how the similarities between users can be exploited to the same ends. The Fab system strikes a balance between these two approaches, taking advantage of the shared interests among users without losing the benefits of the representations provided by content analysis. Running since March 1996, it has been populated with a collection of agents for the collection and selection of web pages, whose interaction fosters emergent collaborative properties. In this paper we explain the design of the system architecture and report the results of our first experiment, evaluating recommendations provided to a group of test users.
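The abstract does not give Fab's scoring details. As a rough illustration of mixing content analysis with similarity between users, the sketch below scores each unseen page by a weighted sum of (a) the cosine between the page's term vector and the user's profile and (b) the average rating from the users whose profiles are most similar. The weighting `alpha`, the neighbour count, the function names, and all data are hypothetical; this is not Fab's actual architecture of collection and selection agents.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, profiles, pages, ratings, alpha=0.5, neighbours=1, top_n=3):
    """Hypothetical hybrid: content match plus ratings from the most similar users."""
    me = profiles[user]
    peers = sorted((u for u in profiles if u != user),
                   key=lambda u: cosine(me, profiles[u]), reverse=True)[:neighbours]
    seen = ratings.get(user, {})
    scores = {}
    for page, terms in pages.items():
        if page in seen:
            continue  # do not re-recommend pages the user has already rated
        peer_ratings = [ratings[u][page] for u in peers if page in ratings.get(u, {})]
        collab = sum(peer_ratings) / len(peer_ratings) if peer_ratings else 0.0
        scores[page] = alpha * cosine(me, terms) + (1 - alpha) * collab
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

profiles = {"ann": {"ir": 1.0, "web": 0.5},
            "bob": {"ir": 0.9, "web": 0.6},
            "cy": {"cooking": 1.0}}
pages = {"p1": {"ir": 1.0}, "p2": {"cooking": 1.0}, "p3": {"web": 1.0}}
ratings = {"bob": {"p3": 1.0}, "cy": {"p2": 1.0}}
print(recommend("ann", profiles, pages, ratings))
```

Here p3 outranks p1 for "ann" even though p1 matches her profile better on content alone, because her nearest neighbour "bob" rated p3 highly.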
How Reliable are the Results of Large-Scale Information Retrieval Experiments?
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998
Cited by 100 (3 self)
Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
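To make the recall overestimate concrete, the sketch below builds a standard depth-k pool (the union of the top k documents from each run) and measures one run's recall twice: against the relevant documents the pool happens to contain, and against a hypothetical complete relevant set. The runs, the relevance set, and the helper names are invented for illustration; this is the conventional pooling method, not the new strategy the paper proposes.

```python
def depth_k_pool(runs, k):
    """Union of the top-k documents of every submitted run (TREC-style pooling)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool

def recall(ranking, relevant, cutoff):
    """Fraction of the relevant set retrieved in the top `cutoff` positions."""
    if not relevant:
        return 0.0
    return len(set(ranking[:cutoff]) & relevant) / len(relevant)

runs = {"A": ["d1", "d2", "d3", "d7"],
        "B": ["d2", "d4", "d1", "d8"]}
true_relevant = {"d1", "d2", "d8"}        # hypothetical complete judgments
pool = depth_k_pool(runs, k=2)            # only these documents get judged
judged_relevant = pool & true_relevant    # relevant documents the assessors actually see
print(recall(runs["A"], judged_relevant, cutoff=4))  # measured recall
print(recall(runs["A"], true_relevant, cutoff=4))    # recall against the full set
```

Because d8 sits below the pool depth in run B it is never judged, so run A's measured recall is 1.0 while its recall against the full relevant set is 2/3.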
Overview of the Fifth Text REtrieval Conference (TREC-5)
Proceedings of the Fifth Text REtrieval Conference (TREC-5), 1997
Efficient Construction of Large Test Collections
1998
Cited by 62 (4 self)
Test collections with a million or more documents are needed for the evaluation of modern information retrieval systems. Yet their construction requires a great deal of effort. Judgements must be rendered as to whether or not documents are relevant to each of a set of queries. Exhaustive judging, in which every document is examined and a judgement rendered, is infeasible for collections of this size. Current practice is represented by the "pooling method", as used in the TREC conference series, in which only the first k documents from each of a number of sources are judged. We propose two methods, Interactive Searching and Judging and Move-to-Front Pooling, that yield effective test collections while requiring many fewer judgements. Interactive Searching and Judging selects documents to be judged using an interactive search system, and may be used by a small research team to develop an effective test collection using minimal resources. Move-to-Front Pooling directly improves on the standard pooling method by using a variable number of documents from each source depending on its retrieval performance. Move-to-Front Pooling would be an appropriate replacement for the standard pooling method in future collection development efforts involving many independent groups.
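A simplified sketch in the spirit of Move-to-Front Pooling, assuming the assessor can be modelled as a function `is_relevant`: the run at the front of the queue keeps supplying its next unjudged document while those documents are judged relevant, and a non-relevant judgement sends it to the back. The abstract does not give the paper's exact prioritisation rule, so this queue discipline, the judgement budget, and the function name are assumptions for illustration.

```python
from collections import deque

def move_to_front_pool(runs, is_relevant, budget):
    """Judge documents run by run, favouring runs that keep returning relevant ones.

    runs: dict mapping run name -> ranked list of document ids
    is_relevant: callable standing in for a human assessor
    budget: maximum number of judgements
    """
    queue = deque(runs)                  # run names; the front run supplies the next document
    cursors = {name: 0 for name in runs}
    judged = {}
    while queue and len(judged) < budget:
        name = queue[0]
        ranking = runs[name]
        # Skip documents already judged via another run.
        while cursors[name] < len(ranking) and ranking[cursors[name]] in judged:
            cursors[name] += 1
        if cursors[name] >= len(ranking):
            queue.popleft()              # this run is exhausted
            continue
        doc = ranking[cursors[name]]
        cursors[name] += 1
        judged[doc] = is_relevant(doc)
        if not judged[doc]:
            queue.rotate(-1)             # non-relevant: move this run to the back
    return judged

runs = {"A": ["d1", "d2", "d3"], "B": ["d4", "d1", "d5"]}
relevant = {"d1", "d2", "d5"}
judged = move_to_front_pool(runs, lambda d: d in relevant, budget=4)
print(judged)
```

With a budget of four judgements, run A supplies documents until it returns a non-relevant one (d3), then run B gets its turn.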
Building a Question Answering Test Collection
Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Cited by 58 (7 self)
The TREC-8 Question Answering (QA) Track was the first large-scale evaluation of domain-independent question answering systems. In addition to fostering research on the QA task, the track was used to investigate whether the evaluation methodology used for document retrieval is appropriate for a different natural language processing task. As with document relevance judging, assessors had legitimate differences of opinion as to whether a response actually answers a question, but comparative evaluation of QA systems was stable despite these differences. Creating a reusable QA test collection is fundamentally more difficult than creating a document retrieval test collection, since the QA task has no equivalent to document identifiers.
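In the TREC-8 QA track each system returned up to five ranked responses per question and was scored by mean reciprocal rank. Because there are no document identifiers to match against, re-scoring a new run without human assessors has to approximate the judgements, for example with regular-expression answer patterns. The sketch below assumes such patterns (the answer key and function names are hypothetical) and illustrates why the result is only an approximation: a pattern can accept a wrong response or reject a correct one phrased differently.

```python
import re

def reciprocal_rank(responses, patterns, depth=5):
    """1/rank of the first response (within `depth`) matching any answer pattern, else 0."""
    for rank, text in enumerate(responses[:depth], start=1):
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run, answer_key, depth=5):
    """Average reciprocal rank over every question in the answer key."""
    if not answer_key:
        return 0.0
    scores = [reciprocal_rank(run.get(qid, []), patterns, depth)
              for qid, patterns in answer_key.items()]
    return sum(scores) / len(scores)

answer_key = {"q1": [r"\bParis\b"], "q2": [r"\b1969\b"]}
run = {"q1": ["Lyon is a large city", "the capital is Paris"],
       "q2": ["it happened in 1970"]}
print(mean_reciprocal_rank(run, answer_key))  # 0.25
```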
Automating the Assignment of Submitted Manuscripts to Reviewers
In Research and Development in Information Retrieval, 1992
Cited by 47 (2 self)
The 117 manuscripts submitted for the Hypertext'91 conference were assigned to members of the review committee, using a variety of automated methods based on information retrieval principles and Latent Semantic Indexing. Fifteen reviewers provided exhaustive ratings for the submitted abstracts, indicating how well each abstract matched their interests. The automated methods do a fairly good job of assigning relevant papers for review, but they are still somewhat poorer than assignments made manually by human experts and substantially poorer than an assignment perfectly matching the reviewers' own ranking of the papers. A new automated assignment method called "n of 2n" achieves better performance than human experts by sending reviewers more papers than they actually have to review and then allowing them to choose part of their review load themselves. Keywords: Conferences, Program Committees, Reviewers, Referees, Manuscripts, Papers, Assignment, Matching, Interests, Latent Semantic Ind...
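A minimal sketch of the "n of 2n" idea, assuming the automated matcher has already produced a score for every reviewer/paper pair (for instance an LSI cosine). The helper names and data are hypothetical, and the sketch ignores constraints a real assignment must satisfy, such as every paper receiving enough reviewers and conflicts of interest.

```python
def send_candidates(similarity, n):
    """Send each reviewer the 2n papers with the highest automated match scores."""
    return {reviewer: sorted(scores, key=scores.get, reverse=True)[:2 * n]
            for reviewer, scores in similarity.items()}

def choose_load(candidates, own_rating, n):
    """The reviewer keeps the n candidates they rate highest (unrated papers count as 0)."""
    return sorted(candidates, key=lambda paper: own_rating.get(paper, 0), reverse=True)[:n]

similarity = {"r1": {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}}
sent = send_candidates(similarity, n=1)                  # {"r1": ["a", "b"]}
print(choose_load(sent["r1"], {"a": 2, "b": 5}, n=1))    # ['b']
```

The automated step only has to get a good paper somewhere into the top 2n; the reviewer's own preference then corrects its ordering, which is how the method can beat manual assignment.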
Evaluating retrieval performance using clickthrough data
2003
Cited by 44 (6 self)
This paper proposes a new method for evaluating the quality of retrieval functions. Unlike traditional methods that require relevance judgments by experts or explicit user feedback, it is based entirely on clickthrough data. This is a key advantage, since clickthrough data can be collected at very low cost and without overhead for the user. Taking an approach from experiment design, the paper proposes an experiment setup that generates unbiased feedback about the relative quality of two search results without explicit user feedback. A theoretical analysis shows that the method gives the same results as evaluation with traditional relevance judgments under mild assumptions. An empirical analysis verifies that the assumptions are indeed justified and that the new method leads to conclusive results in a WWW retrieval study.
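One way to realise such an experiment setup is to interleave the two rankings into a single result list and credit clicks to whichever ranking contributed the clicked documents. The sketch below follows balanced interleaving as it is commonly described in the literature; in a live system `a_first` would be decided by a fair coin flip per query. The function names and the exact credit rule are assumptions for illustration, not a transcription of the paper's procedure.

```python
import math

def balanced_interleave(a, b, a_first):
    """Merge rankings a and b; any prefix of the result holds the top ka of a
    and the top kb of b, with ka and kb differing by at most one."""
    merged, ka, kb = [], 0, 0
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in merged:
                merged.append(a[ka])
            ka += 1
        else:
            if b[kb] not in merged:
                merged.append(b[kb])
            kb += 1
    return merged

def preference(a, b, merged, clicks):
    """Return 1 if clicks favour a, -1 if they favour b, 0 for a tie or no clicks."""
    clicked = [i for i, doc in enumerate(merged) if doc in clicks]
    if not clicked:
        return 0
    lowest = merged[max(clicked)]  # lowest-ranked clicked document
    rank_a = a.index(lowest) + 1 if lowest in a else math.inf
    rank_b = b.index(lowest) + 1 if lowest in b else math.inf
    k = min(rank_a, rank_b)        # compare both rankings down to that depth
    hits_a = len(clicks & set(a[:k]))
    hits_b = len(clicks & set(b[:k]))
    return (hits_a > hits_b) - (hits_a < hits_b)

a = ["d1", "d2", "d3"]
b = ["d2", "d4", "d5"]
merged = balanced_interleave(a, b, a_first=True)  # ['d1', 'd2', 'd4', 'd3']
print(preference(a, b, merged, {"d1", "d3"}))     # 1: clicks favour ranking a
```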
Phonetic String Matching: Lessons from Information Retrieval
1996
Cited by 38 (2 self)
Phonetic matching is used in applications such as name retrieval, where the spelling of a name is used to identify other strings that are likely to have a similar pronunciation. In this paper we explain the parallels between information retrieval and phonetic matching, and describe our new phonetic matching techniques. Our experimental comparison with existing techniques such as Soundex and edit distances, based on recall and precision, demonstrates that the new techniques are superior. In addition, reasoning from the similarity of phonetic matching and information retrieval, we have applied combination of evidence to phonetic matching. Our experiments demonstrate that combining evidence leads to substantial improvements in effectiveness.
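For reference, here are the two baseline techniques the abstract names, American Soundex and Levenshtein edit distance, plus a deliberately naive combination of their scores. The combination rule (averaging a normalised edit similarity with a 0/1 Soundex match) and the function names are a hypothetical example of combining evidence, not one of the paper's new techniques.

```python
SOUNDEX_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(name):
    """American Soundex: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    prev = SOUNDEX_CODES.get(letters[0], "")
    digits = []
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def combined_score(query, candidate):
    """Hypothetical evidence combination: mean of edit similarity and Soundex agreement."""
    q, c = query.lower(), candidate.lower()
    edit_sim = 1 - levenshtein(q, c) / max(len(q), len(c), 1)
    same_code = 1.0 if soundex(q) == soundex(c) else 0.0
    return (edit_sim + same_code) / 2

names = ["Robin", "Albert", "Rupert"]
print(sorted(names, key=lambda n: combined_score("Robert", n), reverse=True))
# ['Rupert', 'Albert', 'Robin']
```

"Albert" and "Rupert" are equally close to "Robert" by edit distance alone; the Soundex evidence breaks the tie in favour of the name that sounds alike.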
Relevance: A review of the literature and a framework for thinking on the notion in information science
... (Eds.), Advances in Librarianship 6, 1976
Cited by 31 (1 self)
Relevance is a, if not even the, key notion in information science in general and information retrieval in particular. This two-part critical review traces and synthesizes the scholarship on relevance over the past 30 years or so and provides an updated framework within which the still widely dissonant ideas and works about relevance might be interpreted and related. It is a continuation and update of a similar review that appeared in 1975 under the same title, considered here as being Part I. The present review is organized in two parts: Part II addresses questions related to the nature and manifestations of relevance, and Part III addresses questions related to relevance behavior and effects. In Part II, the nature of relevance is discussed in terms of the meaning ascribed to relevance, the theories used or proposed, and the models that have been developed. The ...

