Improved Algorithms for Topic Distillation in a Hyperlinked Environment
, 1998
Cited by 374 (6 self)
This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query to find quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high quality pages within a topic specific graph of hyperlinked documents. The essence of our approach is to augment a previous connectivity analysis based algorithm with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. The results of a user evaluation are reported that show an improvement of precision at 10 documents by at least 45% over pure connectivity analysis.
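As a rough illustration of the idea in this abstract, the sketch below runs a hub/authority iteration over a small link graph and scales each page's contribution by a content relevance weight. It is not the paper's algorithm: the function names, the `relevance` dictionary, and the exact weighting scheme are assumptions made for the example. A helper for precision at k, the measure the abstract reports at k = 10, is included.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(edges, relevance, iterations=20):
    """Hub/authority iteration with per-page relevance weights.

    edges: list of (source, target) hyperlinks.
    relevance: hypothetical content-relevance weight per page (default 1.0).
    """
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of linking pages, scaled by relevance.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src] * relevance.get(src, 1.0)
        # Hub score: sum of authority scores of linked pages, scaled by relevance.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst] * relevance.get(dst, 1.0)
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k
```

With every weight set to 1.0 this reduces to a plain connectivity-only iteration, which makes it easy to compare the two rankings on the same graph.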
Viewing Morphology as an Inference Process
, 1993
Cited by 236 (4 self)
Morphology is the area of linguistics concerned with the internal structure of words. Information Retrieval has generally not paid much attention to word structure, other than to account for some of the variability in word forms via the use of stemmers. This paper will describe our experiments to determine the importance of morphology, and the effect that it has on performance. We will also describe the role of morphological analysis in word sense disambiguation, and in identifying lexical semantic relationships in a machine-readable dictionary. We will first provide a brief overview of morphological phenomena, and then describe the experiments themselves.
1 Introduction
Morphology is the area of linguistics concerned with the internal structure of words. It is usually broken down into two subclasses: inflectional and derivational. Inflectional morphology describes predictable changes a word undergoes as a result of syntax - the plural and possessive form for nouns, and the past tens...
Probabilistic Models in Information Retrieval
- The Computer Journal
, 1992
Cited by 87 (4 self)
In this paper, an introduction to and survey of probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR, along with the corresponding event space, clarifies the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
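The probability ranking principle mentioned in this abstract can be sketched in a few lines: documents are presented in decreasing order of their estimated probability of relevance. The second function shows the log-odds term weight from the well-known binary independence model, used here only as an example of a probabilistic parameter; the abstract does not say which models the survey picks, and all names and numbers below are illustrative assumptions.

```python
import math


def rank_by_probability(estimates):
    """Order document ids by decreasing estimated probability of relevance."""
    return sorted(estimates, key=lambda doc: estimates[doc], reverse=True)


def bim_term_weight(p, q):
    """Log-odds weight of a term under the binary independence model.

    p: estimated P(term present | document relevant)
    q: estimated P(term present | document not relevant)
    Both must lie strictly between 0 and 1.
    """
    return math.log((p * (1 - q)) / (q * (1 - p)))


# Hypothetical probability estimates for three documents.
ranking = rank_by_probability({"d1": 0.2, "d2": 0.9, "d3": 0.5})
```

A term that is equally likely in relevant and non-relevant documents gets weight zero, so it does not influence the ranking.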
Retrieving records from a gigabyte of text on a minicomputer using statistical ranking
- Journal of the American Society for Information Science
, 1990
Cited by 67 (1 self)
Statistically based ranked retrieval of records using keywords provides many advantages over traditional Boolean retrieval methods, especially for end users. This approach to retrieval, however, has not seen widespread use in large operational retrieval systems. To show the feasibility of this retrieval methodology, research was done to produce very fast search techniques using these ranking algorithms, and then to test the results against large databases with many end users. The results show not only response times on the order of 1 and 1/2 seconds for 806 megabytes of text, but also very favorable user reaction. Novice users were able to consistently obtain good search results after 5 minutes of training. Additional work was done to devise new indexing techniques to create inverted files for large databases using a minicomputer. These techniques use no sorting, require a working space of only about 20% of the size of the input text, and produce indices that are about 14% of the input text size.
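To make the combination of inverted files and statistical ranking concrete, here is a minimal in-memory sketch. It builds a term-to-postings map without sorting and scores records with a simple tf-idf accumulator. The weighting formula, the whitespace tokenizer, and the function names are assumptions for illustration; the abstract does not describe the paper's actual index layout or ranking formula.

```python
import math
from collections import defaultdict


def build_inverted_index(records):
    """Map each term to {record_id: term frequency}, built in a single pass."""
    index = defaultdict(dict)
    for rec_id, text in records.items():
        for term in text.lower().split():
            index[term][rec_id] = index[term].get(rec_id, 0) + 1
    return index


def ranked_search(index, num_records, query):
    """Score records by summed tf * idf over the query terms, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Rare terms get a higher weight; the +1 keeps very common terms useful.
        idf = math.log(num_records / len(postings)) + 1.0
        for rec_id, tf in postings.items():
            scores[rec_id] += tf * idf
    # Sort by descending score, breaking ties by record id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Unlike Boolean retrieval, a record that matches only some of the query terms still appears in the result list, just further down.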
Re-examining the potential effectiveness of interactive query expansion
- Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’03), ACM
, 2003
Cited by 45 (3 self)
Much attention has been paid to the relative effectiveness of interactive query expansion versus automatic query expansion. Although interactive query expansion has the potential to be an effective means of improving a search, in this paper we show that, on average, human searchers are less likely than systems to make good expansion decisions. To enable good expansion decisions, searchers must have adequate instructions on how to use interactive query expansion functionalities. We show that simple instructions on using interactive query expansion do not necessarily help searchers make good expansion decisions and discuss difficulties found in making query expansion decisions.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: search process, relevance feedback.
Experiments with query acquisition and use in document retrieval systems
- In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval
, 1990
Cited by 32 (8 self)
In some recent experimental document retrieval systems, emphasis has been placed on the acquisition of a detailed model of the information need through interaction with the user. It has been argued that these “enhanced” queries, in combination with relevance feedback, will improve retrieval performance. In this paper, we describe a study with the aim of evaluating how easily enhanced queries can be acquired from users and how effectively this additional knowledge can be used in retrieval. The results indicate that significant effectiveness benefits can be obtained through the acquisition of domain concepts related to query concepts, together with their level of importance to the information need.
Effects of OCR errors on ranking and feedback using the vector space model
- Inf. Proc. and Management
, 1996
Cited by 29 (12 self)
We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full text document collection when the OCR version is compared to its corresponding corrected set. We do, however, see divergence between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
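One plausible way to see why cosine normalization could matter here: spurious terms introduced by OCR lengthen a document's vector, so its normalized score drops even when the matching terms are unchanged. The sketch below illustrates that effect with sparse term-weight dictionaries. The vectors and the misrecognized tokens are made up for the example, and the abstract does not state that this is the mechanism the authors identified.

```python
import math


def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    doc_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    if query_norm == 0 or doc_norm == 0:
        return 0.0
    return dot / (query_norm * doc_norm)


query = {"vector": 1.0, "space": 1.0}
# Hypothetical corrected document and its OCR version with two garbage tokens.
corrected_doc = {"vector": 1.0, "space": 1.0}
ocr_doc = {"vector": 1.0, "space": 1.0, "vcctor": 1.0, "spacc": 1.0}

corrected_score = cosine_similarity(query, corrected_doc)
ocr_score = cosine_similarity(query, ocr_doc)
```

Both documents contain the same query terms with the same weights, yet the OCR version scores lower because its vector is longer.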
Natural Language Information Retrieval: TREC-3 Report
- In Proceedings of the Fifth Text REtrieval Conference (TREC-5
Cited by 26 (1 self)
In this paper we report on the recent developments in NYU's natural language information retrieval system, especially as related to the 3rd Text Retrieval Conference (TREC-3). The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process user's natural language requests into effective search queries. For the present TREC-3 effort, the total of 3.3 GBytes of text articles have been processed (Tipster disks 1 through 3), i...
Fast and effective query refinement
- In Proc. of the 20th Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval
, 1997
Cited by 25 (1 self)
Query Refinement is an essential information retrieval tool that interactively recommends new terms related to a particular query. This paper introduces concept recall, an experimental measure of an algorithm's ability to suggest terms humans have judged to be semantically related to an information need. This study uses precision improvement experiments to measure the ability of an algorithm to produce single term query modifications that predict a user's information need as partially encoded by the query. An oracle algorithm produces ideal query modifications, providing a meaningful context for interpreting precision improvement results. This study also introduces RMAP, a fast and practical query refinement algorithm that refines multiple term queries by dynamically combining precomputed suggestions for single term queries. RMAP achieves accuracy comparable to a much slower algorithm, although both RMAP and the slower algorithm lag behind the best possible term suggestions offered by the oracle. We believe RMAP is fast enough to be integrated into present-day Internet search engines: RMAP computes 100 term suggestions for a 160,000 document collection in 15 ms on a low-end PC.
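The general idea of refining a multi-term query from precomputed single-term suggestions can be sketched as below. This is not RMAP itself: the abstract does not say how the suggestions are combined, so summing per-term scores is an assumption, as are the function name, the data layout, and the example scores.

```python
from collections import defaultdict


def refine_query(query_terms, precomputed, top_n=5):
    """Combine precomputed single-term suggestions for a multi-term query.

    precomputed: {query term: {suggested term: score}}, built offline.
    Scores for the same suggestion are summed across query terms
    (an assumed combination rule), and the top_n suggestions are returned.
    """
    combined = defaultdict(float)
    for term in query_terms:
        for suggestion, score in precomputed.get(term, {}).items():
            # Do not suggest a term the user already typed.
            if suggestion not in query_terms:
                combined[suggestion] += score
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [suggestion for suggestion, _ in ranked[:top_n]]


# Hypothetical precomputed suggestion table.
suggestions = {
    "jaguar": {"car": 0.6, "cat": 0.5},
    "speed": {"car": 0.4, "engine": 0.3},
}
```

Because the per-term tables are built ahead of time, the work at query time is limited to a few dictionary lookups and one sort, which is what makes this style of refinement cheap.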

