Cumulated Gain-based Evaluation of IR Techniques
- ACM Transactions on Information Systems, 2002
Cited by 233 (3 self)
Modern large retrieval environments tend to overwhelm their users by their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation to the users. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, i.e., recall and precision based on binary relevance assessments, to graded relevance assessments. Alternatively, novel measures based on graded relevance assessments may be developed. This paper proposes three novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor on the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. The novel measures are defined and discussed and then their use is demonstrated in a case study using TREC data - sample system run results for 20 queries in TREC-7. As relevance base we used novel graded relevance assessments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, e.g., from the user point of ...
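The three measures described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the discount function, so the logarithmic discount (no discount before rank `base`, division by log_base(rank) afterwards) is an assumption of this sketch, as are the function names and the example gain values on a 0-3 scale.

```python
import math
from typing import List, Sequence


def cumulated_gain(gains: Sequence[float]) -> List[float]:
    """Running sum of graded relevance scores down the ranked list."""
    totals, running = [], 0.0
    for gain in gains:
        running += gain
        totals.append(running)
    return totals


def discounted_cumulated_gain(gains: Sequence[float], base: float = 2.0) -> List[float]:
    """Running sum in which the gain at rank i is divided by log_base(i)
    once i >= base, so late-retrieved documents contribute less.
    The logarithmic discount is an assumption, not taken from the abstract."""
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    totals, running = [], 0.0
    for rank, gain in enumerate(gains, start=1):
        discount = math.log(rank, base) if rank >= base else 1.0
        running += gain / discount
        totals.append(running)
    return totals


def normalized(actual: Sequence[float], ideal: Sequence[float]) -> List[float]:
    """Position-wise ratio of an actual gain vector to the ideal one."""
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]


# Hypothetical graded judgments (0 = non-relevant ... 3 = highly relevant)
# for the top six documents of one ranked result list.
gains = [3, 2, 3, 0, 1, 2]
# Simplification: the ideal ranking here only reorders the retrieved gains;
# a full evaluation would build it from all judged documents for the query.
ideal = sorted(gains, reverse=True)

cg = cumulated_gain(gains)
dcg = discounted_cumulated_gain(gains)
ncg = normalized(cg, cumulated_gain(ideal))
```

A normalized value of 1.0 at some rank means the run has accumulated as much gain by that rank as the ideal ordering would have.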
Query expansion using random walk models
- In CIKM, 2005
Cited by 36 (6 self)
It has long been recognized that capturing term relationships is an important aspect of information retrieval. Even with large amounts of data, we usually only have significant evidence for a fraction of all potential term pairs. It is therefore important to consider whether multiple sources of evidence may be combined to predict term relations more accurately. This is particularly important when trying to predict the probability of relevance of a set of terms given a query, which may involve both lexical and semantic relations between the terms. We describe a Markov chain framework that combines multiple sources of knowledge on term associations. The stationary distribution of the model is used to obtain probability estimates that a potential expansion term reflects aspects of the original query. We use this model for query expansion and evaluate the effectiveness of the model by examining the accuracy and robustness of the expansion methods, and investigate the relative effectiveness of various sources of term evidence. Statistically significant differences in accuracy were observed depending on the weighting of evidence in the random walk. For example, using co-occurrence data later in the walk was generally better than using it early, suggesting further improvements in effectiveness may be possible by learning walk behaviors.
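The idea of reading expansion-term scores off a stationary distribution can be sketched with plain power iteration. This is only an illustration of the general technique, not the paper's model: the term list and transition probabilities below are hypothetical, and the sketch assumes a row-stochastic matrix describing an irreducible, aperiodic chain so that the iteration converges.

```python
from typing import List


def stationary_distribution(
    transitions: List[List[float]], tol: float = 1e-12, max_iter: int = 10000
) -> List[float]:
    """Approximate the stationary distribution of a row-stochastic matrix
    by repeatedly applying one step of the walk to a uniform start vector."""
    n = len(transitions)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the walk: new[j] = sum_i dist[i] * P[i][j]
        nxt = [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Hypothetical term graph: row i holds the probabilities of stepping from
# term i to each other term (for example, estimated from co-occurrence data).
terms = ["bank", "river", "loan"]
transitions = [
    [0.5, 0.2, 0.3],
    [0.6, 0.3, 0.1],
    [0.7, 0.1, 0.2],
]

scores = stationary_distribution(transitions)
# Candidate expansion terms ordered by stationary probability.
ranked_terms = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
```

Combining several sources of term evidence, as the abstract describes, would change how `transitions` is built; the final step of reading off the stationary probabilities stays the same.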
Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings
- Information Retrieval, 2001
Cited by 20 (3 self)
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems, are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is presented.
The loquacious user: A document-independent source of terms for query expansion
- In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005
Cited by 19 (1 self)
In this paper we investigate the effectiveness of a document-independent technique for eliciting feedback from users about their information problems. We propose that such a technique can be used to elicit terms from users for use in query expansion and as a follow-up when ambiguous queries are initially posed by users. We design a feedback form to obtain additional information from users, administer the form to users after initial querying, and create a series of experimental runs based on the information that we obtained from the form. Results demonstrate that the form was successful at eliciting more information from users and that this additional information significantly improved retrieval performance. Our results further demonstrate a strong relationship between query length and performance.
The Effects Of Query Complexity, Expansion And Structure On Retrieval Performance In Probabilistic Text Retrieval
- University of Tampere, 1999
Cited by 18 (6 self)
... queries using all search facets identified from requests, low complexity was achieved by formulating queries with major facets only. Query expansion was based on a thesaurus, from which the expansion keys were elicited for queries. There were five expansion types: (1) the first query version was an unexpanded, original query with one search key for each search concept (original search concepts) elicited from the test thesaurus; (2) the synonyms of the original search keys were added to the original query; (3) search keys representing the narrower concepts of the original search concepts were added to the original query; (4) search keys representing the associative concepts of the original search concepts were added to the original query; (5) all previous expansion keys were cumulatively added to the original query. Query structure refers to the syntactic structure of a query expression, marked with query operators and parentheses. The structure of queries was either weak (queries with n
Structured Translation for Cross-Language Information Retrieval
- In ACM SIGIR, 2000
Cited by 9 (0 self)
The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system first determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. Structured translation achieved an average precision that was statistically indistinguishable from Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications.
Studies on Linguistic Problems and Methods in Text Retrieval: The Effects of Anaphor and Ellipsis Resolution
- Department of Information Science, University of Tampere, 1999
Cited by 8 (4 self)
First, I want to say a couple of words about the background of this dissertation. When I started
Translation-Based Indexing for Cross-Language Retrieval
- In Proceedings of the 24th BCS-IRSG European Colloquium on IR, 2002
Cited by 7 (3 self)
Structured queries have proven to be an effective technique for cross-language information retrieval when evidence about translation probability is not available. Query execution time is adversely impacted, however, because the full postings list for each translation is used in the computation. This paper describes an alternative approach, translation-based indexing, that improves query-time efficiency by integrating the translation and indexing processes. Experimental results demonstrate that similar effectiveness can be achieved at a cost in indexing time that is roughly linear in the average number of known translations for each term.
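The query-time cost mentioned in this abstract can be seen in a small sketch of structured-query evaluation. It assumes the common formulation in which all known translations of a source-language term are treated as one synonym set: term frequencies are summed per document and the document frequency is the size of the union of the postings lists. The index layout, function name, and example terms are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical inverted index layout: term -> {doc_id: term frequency}
Postings = Dict[str, Dict[str, int]]


def synonym_postings(index: Postings, translations: Iterable[str]) -> Dict[str, int]:
    """Merge the postings of every translation into one virtual term.
    Each translation's full postings list is traversed, which is the
    query-time cost the abstract refers to."""
    merged: Dict[str, int] = {}
    for term in translations:
        for doc_id, tf in index.get(term, {}).items():
            merged[doc_id] = merged.get(doc_id, 0) + tf
    return merged


# Toy English index and two hypothetical translations of one query term.
index: Postings = {
    "bank": {"d1": 2, "d2": 1},
    "shore": {"d2": 3, "d3": 1},
}
merged = synonym_postings(index, ["bank", "shore"])
document_frequency = len(merged)  # number of documents containing any translation
```

Performing the same merge once at indexing time, so that each document is indexed under the source-language term directly, is the kind of trade-off the abstract describes: more indexing work in exchange for a single postings lookup per query term.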
Document Text Characteristics Affect the Ranking of the Most Relevant Documents By Expanded Structured Queries
- 2001
Cited by 6 (6 self)
The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyze the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system.
User-thesaurus interaction in a web-based database
- In Proceedings of the Infotech Oulu International Workshop on Information Retrieval, 2001
Cited by 4 (2 self)
A major challenge faced by users during the information search and retrieval process is the selection of search terms for query formulation and expansion. Thesauri are recognised as one source of search terms which can assist users in query construction and expansion. As the number of electronic thesauri attached to information retrieval systems has grown, a range of interface facilities and features have been developed to aid users in formulating their queries. The pilot study reported here aimed to explore and evaluate how a thesaurus-enhanced search interface assisted end-users in selecting search terms. Specifically, it focused on the evaluation of users' attitudes toward both the thesaurus and its interface as tools for facilitating search term selection for query expansion. Thesaurus-based searching and browsing behaviours adopted by users while interacting with a thesaurus-enhanced search interface were also examined.

