ExpansionTool: Concept-Based Query Expansion And Construction
Cited by 9 (0 self)
We develop a deductive data model for concept-based query expansion.
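The abstract does not describe the data model itself. The sketch below only illustrates the general idea of concept-based query construction: a query is stated as concepts, and each concept is expanded deductively into its term expressions plus those of its narrower concepts. The concept names, the `NARROWER` relation and the `#and`/`#syn` operator syntax are hypothetical and are not taken from ExpansionTool.

```python
from typing import Dict, List, Optional, Set

# Toy concept model (hypothetical, not from the paper): term expressions per
# concept, plus a "narrower concept" relation used for deductive expansion.
EXPRESSIONS: Dict[str, List[str]] = {
    "vehicle": ["vehicle"],
    "car": ["car", "automobile"],
    "truck": ["truck", "lorry"],
    "accident": ["accident", "crash", "collision"],
}
NARROWER: Dict[str, List[str]] = {"vehicle": ["car", "truck"]}


def expand(concept: str, seen: Optional[Set[str]] = None) -> List[str]:
    """Collect the expressions of a concept and, recursively, of its narrower concepts."""
    seen = set() if seen is None else seen
    if concept in seen:  # guard against cycles in the concept graph
        return []
    seen.add(concept)
    terms = list(EXPRESSIONS.get(concept, []))
    for child in NARROWER.get(concept, []):
        for term in expand(child, seen):
            if term not in terms:  # keep the first occurrence, drop duplicates
                terms.append(term)
    return terms


def build_query(concepts: List[str]) -> str:
    """AND together one synonym group per query concept (illustrative syntax)."""
    groups = ["#syn(" + " ".join(expand(c)) + ")" for c in concepts]
    return "#and(" + " ".join(groups) + ")"


query = build_query(["vehicle", "accident"])
```

Here `query` becomes `#and(#syn(vehicle car automobile truck lorry) #syn(accident crash collision))`: the user names two concepts, and the structure of the query (one synonym group per concept) is derived rather than typed by hand.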
Document Text Characteristics Affect the Ranking of the Most Relevant Documents By Expanded Structured Queries
, 2001
Cited by 6 (6 self)
The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyze the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system.
Extensions to the STAIRS Study - Empirical Evidence for the Hypothesised Ineffectiveness of Boolean Queries in Large Full-Text Databases
- Information Retrieval, 2001
Cited by 3 (1 self)
The STAIRS study conducted by Blair and Maron in the mid-1980s is a milestone in the history of IR evaluation. Blair and Maron drew strong conclusions about the inadequacy of free-text searching in large databases, and their study has been widely cited in the literature to justify claims about the limited effectiveness of IR systems. However, some critics of the study have plausibly pointed out that the ineffectiveness conclusions were not solidly based on empirical data. This paper introduces a new theoretical and empirical approach to studying the problems of high-recall searching in large databases and reports the results of a case experiment. The findings verify some of the hypothetical conclusions of the STAIRS study and expand the picture of falling performance. It is shown that low precision in high-recall searching is unavoidable in exact-match Boolean searching, since even major concepts are often expressed only implicitly in relevant documents. The author suggests that the problem could be reduced by facet-based best-match searching.
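The contrast between exact-match Boolean and facet-based best-match searching can be illustrated on a toy collection. This is a minimal sketch, not the paper's experimental setup: documents are plain term sets, a query is a list of facets (alternative terms for one concept), and the best-match score is simply the number of facets a document expresses.

```python
from typing import Dict, List, Set

Facets = List[Set[str]]  # one set of alternative terms per query concept


def facet_hits(doc_terms: Set[str], facets: Facets) -> int:
    """Count the facets expressed by at least one term in the document."""
    return sum(1 for facet in facets if doc_terms & facet)


def boolean_match(docs: Dict[str, Set[str]], facets: Facets) -> List[str]:
    """Exact match: OR within each facet, AND across facets."""
    return sorted(d for d, terms in docs.items() if facet_hits(terms, facets) == len(facets))


def best_match(docs: Dict[str, Set[str]], facets: Facets) -> List[str]:
    """Best match: rank documents by the number of facets they express."""
    scored = [(facet_hits(terms, facets), d) for d, terms in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # more facets first, then by id
    return [d for score, d in scored if score > 0]


docs = {
    "d1": {"train", "accident", "passengers"},
    "d2": {"train", "derailed", "passengers"},  # the "accident" concept is only implied
    "d3": {"weather", "forecast"},
}
facets = [{"train", "railway"}, {"accident", "crash", "collision"}]
```

The Boolean query returns only `d1`, losing `d2` because one of its major concepts is implicit; the best-match ranking still returns `d2`, just below `d1`.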
Consistency of Textual Expression in Newspaper Articles: An argument for semantically based query expansion
- Information Processing & Management, 2001
Cited by 2 (0 self)
This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared in regard to their central concepts and words representing the concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was 60).
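The abstract does not state how the consistency figures were computed. Purely as an illustration, the sketch below swaps in one plausible measure: the mean pairwise Jaccard overlap between the sets of words that three articles use for the same concept. The measure and the example words are assumptions, not the article's method or data.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by all distinct words (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consistency(word_sets: List[Set[str]]) -> float:
    """Mean pairwise Jaccard overlap across the articles in one set."""
    pairs = list(combinations(word_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Words used for one concept in three hypothetical articles on the same event.
articles = [{"strike", "walkout"}, {"strike"}, {"strike", "industrial action"}]
score = consistency(articles)
```

For these three sets the pairwise overlaps are 1/2, 1/3 and 1/2, so `score` is 4/9; a low value signals that a query using only one article's wording would miss the others.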
A retrospective evaluation method for exact-match and best-match queries applying an interactive query performance analyser
- in Crestani, F. et al. (Eds), Advances in Information Retrieval: Proceedings of the 24th European Colloquium on IR Research, 2002
Cited by 2 (2 self)
A retrospective method for comparing the performance of queries based on different IR models is introduced. The method is based on the interactive optimisation of queries by a group of test searchers using a query performance analyser. The case experiment focused on comparing the maximum effectiveness of Boolean exact-match queries with that of structured and unstructured best-match queries. The experiment verified the problems in maintaining the precision of Boolean queries at high recall levels. Interesting similarities were also observed between structured and unstructured best-match queries, challenging the results of earlier studies.
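Comparing queries "at high recall levels" is commonly done with interpolated precision at fixed recall levels. The sketch below shows that standard computation for one ranked result list with binary relevance; it is offered as background, and the abstract does not say which exact measures the query performance analyser reports.

```python
from typing import List


def interpolated_precision(ranked_rel: List[bool], n_relevant: int, levels: List[float]) -> List[float]:
    """Interpolated precision: the best precision reached at any recall >= each level."""
    points = []  # (recall, precision) after each relevant document retrieved
    hits = 0
    for rank, is_relevant in enumerate(ranked_rel, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / n_relevant, hits / rank))
    # Levels that are never reached (recall too low) get precision 0.0.
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]


levels = [0.0, 0.25, 0.5, 0.75, 1.0]
# A ranking that finds only 2 of the 4 relevant documents in its top 5.
curve = interpolated_precision([True, False, True, False, False], 4, levels)
```

Here `curve` is `[1.0, 1.0, 2/3, 0.0, 0.0]`: precision collapses to zero at the recall levels the query never reaches.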
The effects of relevance feedback quality and quantity in interactive relevance feedback: A simulation based on user modelling
- In Proceedings of the 28th European Conference on Information Retrieval, 2006
Cited by 2 (0 self)
Experiments on the effectiveness of relevance feedback with real users are time-consuming and expensive, which makes simulation desirable for rapid testing. We define a user model that helps to quantify some of the interaction decisions involved in simulated relevance feedback. First, the relevance criterion defines the relevance threshold at which the user accepts documents as relevant to his/her needs. Second, the browsing effort refers to the patience of the user to browse through the initial list of retrieved documents in order to give feedback. Third, the feedback effort refers to the effort and ability of the user to collect feedback documents. We use the model to construct several simulated relevance feedback scenarios in a laboratory setting. Using TREC data with graded relevance assessments, we study the effect of the quality and quantity of the feedback documents on the effectiveness of relevance feedback and compare this with pseudo-relevance feedback. Our results indicate that large amounts of relevant but low-quality feedback can be compensated for by small amounts of highly relevant feedback.
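The three user-model parameters map naturally onto a small selection routine. This is a minimal sketch under stated assumptions: graded relevance is an integer scale where 0 means non-relevant, and the parameter names `threshold`, `browse` and `max_feedback` are hypothetical labels for the relevance criterion, browsing effort and feedback effort.

```python
from typing import Dict, List


def simulate_feedback(ranking: List[str], grades: Dict[str, int],
                      threshold: int, browse: int, max_feedback: int) -> List[str]:
    """Documents a simulated user would mark relevant in the initial ranking."""
    feedback: List[str] = []
    for doc in ranking[:browse]:             # browsing effort: only the top `browse` are examined
        if len(feedback) >= max_feedback:    # feedback effort: stop once enough are collected
            break
        if grades.get(doc, 0) >= threshold:  # relevance criterion: accept at or above threshold
            feedback.append(doc)
    return feedback


def pseudo_feedback(ranking: List[str], k: int) -> List[str]:
    """Pseudo-relevance feedback: assume the top k documents are relevant."""
    return ranking[:k]


ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
grades = {"d1": 1, "d2": 3, "d3": 0, "d4": 2, "d5": 3, "d6": 3}
```

A strict, low-effort user (`threshold=3, browse=5, max_feedback=1`) gives only `["d2"]`, while a lenient one (`threshold=1, browse=5, max_feedback=10`) gives `["d1", "d2", "d4", "d5"]`; the selected documents would then feed a query reformulation step that is not shown here.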
The polyrepresentation continuum in IR
- In: IIiX: Proceedings of the 1st international conference on Information interaction in context, 2006
Cited by 2 (0 self)
The polyrepresentation principle suggests that cognitively and functionally different representations of information objects may be used in information retrieval to enhance the quality of results. In the paper, several empirical studies that have, intentionally or unintentionally, tested the principle are introduced and discussed. The continuum proposed by Larsen (2004; Ingwersen & Larsen, 2005), which shows the structural dimension of the retrieval techniques involved in polyrepresentation, is further elaborated by adding a novel second dimension consisting of query structure and modus. The new two-dimensional continuum can be seen as a constructive framework for further investigations of polyrepresentative principles in IR. Symposium themes: document structure in contextual IIR; research design.
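One simple way to operationalise the principle is to retrieve with several representations and favour documents found by more of them (their overlap). The sketch below is an assumption-laden illustration of that idea, not a technique described in the paper; the representation names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def overlap_ranking(result_sets: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank documents by how many representations retrieved them."""
    counts: Counter = Counter()
    for docs in result_sets.values():
        counts.update(set(docs))  # count each document at most once per representation
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


results = {
    "title": ["d1", "d2"],
    "fulltext": ["d1", "d2", "d3"],
    "citations": ["d1", "d4"],
}
ranked = overlap_ranking(results)
```

`ranked` is `[("d1", 3), ("d2", 2), ("d3", 1), ("d4", 1)]`: `d1`, retrieved by all three representations, lands in the strongest overlap and is ranked first.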
Hierarchical clustering of a Finnish newspaper article collection with graded relevance assessments
- A manuscript submitted to Information Retrieval
Cited by 1 (1 self)
Search facilitated with agglomerative hierarchical clustering methods was studied in a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering was applied to a sample (N = 5,000) that was reduced with principal components analysis. The dendrograms were cut heuristically to find an optimal partition, whose clusters were compared with each of the 30 queries to retrieve the best-matching cluster. The four-level relevance assessments were collapsed into binary ones by considering as relevant (A) all relevant documents and (B) only the highly relevant documents. Single linkage (SL) was the worst method: it created many tiny clusters, and, consequently, searches enabled by it had high precision and low recall. The complete linkage (CL), average linkage (AL), and Ward's (WM) methods returned reasonably sized clusters, typically of 18-32 documents. Their recall (A: 27-52%, B: 50-82%) was higher than, and their precision (A: 83-90%, B: 18-21%) comparable to, that of the SL clusters. The AL and WM clustering had 1-8% better effectiveness than nearest neighbor (NN) searching, and SL and CL were 1-9% less efficient than NN. However, the differences were not statistically significant. When evaluated with the liberal assessment A, the results suggest that AL and WM clustering offer better retrieval ability than NN. Under assessment B, AL and WM clustering are better than NN when recall is considered more important than precision. The results imply that collections in highly inflectional and agglutinative languages, such as Finnish, may be clustered in the same way as collections in English, provided that the documents are appropriately preprocessed.
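The evaluation step (pick the cluster that best matches a query, then score it under the two binary collapses of the graded assessments) can be sketched as follows. Assumptions: documents are sparse term-weight vectors, clusters are compared to the query by cosine similarity to their centroid, and the four-level scale is 0-3 with 3 meaning highly relevant. The clustering itself (SL, CL, AL, WM) is not reimplemented here.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = Dict[str, float]  # sparse term -> weight


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    c: Vector = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(vectors)
    return c


def best_cluster(query: Vector, clusters: List[List[str]], docs: Dict[str, Vector]) -> List[str]:
    """The cluster whose centroid is most similar to the query."""
    return max(clusters, key=lambda cl: cosine(query, centroid([docs[d] for d in cl])))


def relevant_set(grades: Dict[str, int], min_grade: int) -> Set[str]:
    """Binary collapse: A uses min_grade=1, B (highly relevant only) uses min_grade=3."""
    return {d for d, g in grades.items() if g >= min_grade}


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    hits = len(set(retrieved) & relevant)
    return (hits / len(retrieved) if retrieved else 0.0,
            hits / len(relevant) if relevant else 0.0)


docs = {"d1": {"election": 1.0}, "d2": {"election": 1.0, "vote": 1.0},
        "d3": {"football": 1.0}, "d4": {"football": 1.0, "goal": 1.0}}
clusters = [["d1", "d2"], ["d3", "d4"]]
grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 0, "d5": 2}  # d5: relevant but not in any cluster
best = best_cluster({"election": 1.0}, clusters, docs)
liberal = precision_recall(best, relevant_set(grades, 1))    # assessment A
stringent = precision_recall(best, relevant_set(grades, 3))  # assessment B
```

With these toy numbers, `liberal` is (1.0, 2/3) and `stringent` is (0.5, 1.0), the same direction of change as the reported figures: moving from A to B lowers precision and raises recall.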
Is a morphologically complex language really that complex in full-text retrieval?
- In Advances in Natural Language Processing (pp. 411-422). LNCS #4139, 2006
Cited by 1 (0 self)
In this paper we show that keyword variation in a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of the keyword. Theoretically, Finnish nouns have about 2,000 different forms, but most of these forms occur only rarely. Corpus statistics showed that about 84-88 per cent of the occurrences of inflected noun forms are forms of only six of the 14 possible cases. This number of variant forms per keyword, at most 2 × 6 = 12, makes it feasible to try them all in a search. IR results of the frequent
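Choosing "the textually most frequent forms" amounts to taking cases in descending corpus frequency until a coverage target is met. The sketch below shows that selection step only; the case labels and counts are placeholders, not the paper's Finnish corpus statistics, and generating the actual inflected forms would require a morphological generator that is not shown.

```python
from typing import Dict, List


def select_cases(freqs: Dict[str, int], target: float) -> List[str]:
    """Smallest set of most frequent cases whose share of occurrences reaches `target`."""
    total = sum(freqs.values())
    if total == 0:
        return []
    chosen: List[str] = []
    covered = 0
    for case, count in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total >= target:  # coverage target already reached
            break
        chosen.append(case)
        covered += count
    return chosen


# Placeholder counts for five abstract case labels (not real corpus data).
freqs = {"case_a": 50, "case_b": 20, "case_c": 15, "case_d": 10, "case_e": 5}
cases = select_cases(freqs, 0.8)
```

With these placeholder counts, three cases already cover 85% of occurrences, so only their forms need to be generated for the query.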
A Test Collection for the Evaluation of Content-Based
, 2001
Content-based image retrieval (CBIR) algorithms have been seen as a promising access method for digital photograph collections. Unfortunately, we have very little evidence of the usefulness of these algorithms for real user needs and in real contexts. In this paper, we introduce a test collection for the evaluation of CBIR algorithms. In the test collection, performance testing is based on photograph similarity as perceived by end-users in the context of realistic illustration tasks and environments. The building process and the characteristics of the resulting test collection are outlined, including a typology of the similarity criteria expressed by the subjects judging the similarity of photographs. A small-scale study on the consistency of similarity assessments is presented. A case evaluation of two CBIR algorithms is reported. The results show a clear correlation between the subjects' similarity assessments and the functioning of the feature parameters of the tested algorithms.
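The abstract does not say which correlation statistic was used. As one plausible choice for ordinal similarity judgements, the sketch below computes Spearman's rank correlation (Pearson correlation on ranks, with ties given average ranks) between user similarity scores and an algorithm's feature distances. The scores are invented for illustration.

```python
from typing import List


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


user_similarity = [5, 4, 3, 2, 1]               # hypothetical judgements (higher = more similar)
feature_distance = [0.1, 0.2, 0.4, 0.3, 0.9]    # hypothetical algorithm output (lower = closer)
rho = spearman(user_similarity, feature_distance)
```

Because similarity and distance point in opposite directions, good agreement shows up as a strongly negative value; here `rho` is -0.9.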

