Results 1 - 7 of 7
Report on CLEF-2003 Monolingual Tracks: Fusion of Probabilistic Models for Effective Monolingual Retrieval
- 2004
Cited by 5 (4 self)
Abstract. For our third participation in the CLEF evaluation campaign, our first objective was to propose more effective and general stopword lists for the Swedish, Finnish and Russian languages along with an improved, more efficient and simpler stemming procedure for these three languages. Our second goal was to suggest a combined search approach based on a data fusion strategy that would work with various European languages. Included in this combined approach is a decompounding strategy for the German, Dutch, Swedish and Finnish languages.
Comparative Study of Monolingual and Multilingual Search Models for Use with Asian Languages
- ACM Transactions on Asian Language Information Processing, 2005
Cited by 3 (2 self)
Based on the NTCIR-4 test-collection, our first objective is to present an overview of the retrieval effectiveness of nine vector-space and two probabilistic models when performing monolingual searches in the Chinese, Japanese, Korean and English languages. Our second goal is to analyze the relative merits of using various automated and freely available tools to translate English-language topics into Chinese, Japanese or Korean, and then submit the resultant query to retrieve pertinent documents written in one of these three Asian languages. We also demonstrate how bilingual searches could be improved by applying both combined query translation strategies and data fusion approaches. Finally, we address basic problems related to multilingual searches in which queries written in English are used to search documents written in the English, Chinese, Japanese and Korean languages.
Experiments in Terabyte and Enterprise tracks with Terrier
- In Proceedings of TREC 2006, 2007
Cited by 2 (2 self)
In TREC 2006, we participate in three tasks of the Terabyte and Enterprise tracks. We continue experiments using Terrier, our modular and scalable Information Retrieval (IR) platform. Furthering our research into the Divergence From Randomness (DFR) framework of weighting models, we introduce two new effective and low-cost models, which combine evidence from document structure and capture term dependence and proximity, respectively. Additionally, in the Terabyte track, we improve on our query expansion mechanism on fields, presented in TREC 2005, with a new and more refined technique, which combines evidence in a linear, rather than uniform, way. We also introduce a novel, low-cost, syntactically based noise reduction technique, which we flexibly apply to both the queries and the index. Furthermore, in the Named Page Finding task, we present a new technique for combining query-independent evidence, in the form of prior probabilities. In the Enterprise track, we test our new voting model for expert search. Our experiments focus on the need for candidate length normalisation, and on how retrieval performance can be enhanced by applying retrieval techniques to the underlying ranking of documents.
Comparing Weighting Models for Monolingual Information Retrieval
- In the Proceedings of the Working Notes for the CLEF 2003 Workshop, 2003
Cited by 1 (0 self)
Motivated by the hypothesis that the retrieval performance of a weighting model is independent of the language in which queries and collection are expressed, we compared the retrieval performance of three weighting models, i.e., Okapi, statistical language modeling (SLM), and deviation from randomness (DFR), on three monolingual test collections, i.e., French, Italian, and Spanish. The DFR model was found to consistently achieve better results than both Okapi and SLM, whose performance was comparable. We also evaluated whether the use of retrieval feedback improved retrieval performance; retrieval feedback was beneficial for DFR and Okapi and detrimental for SLM. Besides relative performance, DFR with retrieval feedback achieved excellent absolute results: best run for Italian and Spanish, third run for French.
Performance, Experimentation
Information retrieval systems often use proximity or term dependence models to increase the effectiveness of document retrieval. Many of the existing proximity models examine document-level local statistics, such as how often pairs of query terms occur within fixed-size windows of each document, before applying standard or adapted weighting functions – for instance, Markov Random Fields. Term weighting models use Inverse Document Frequency (IDF) to control the influence of occurrences of different query terms in documents. Similarly, some proximity models also take into account the frequency of pairs of query terms in the entire corpus of documents. However, pair frequency is an expensive statistic to pre-compute at indexing time, or to compute at retrieval time before scoring documents. In this work, we examine, in a uniform setting, the importance of such global statistics for proximity weighting. We investigate two sources of global statistics, namely the target corpus and the entire Web. Experiments are conducted using the TREC GOV2 and ClueWeb09 test collections. Our results show that local statistics alone are sufficient for effective retrieval, and that global statistics usually do not bring any significant improvement in effectiveness compared to the same proximity approaches without them.
Experiments with Terrier Blog, Entity, Million Query, Relevance Feedback, and Web tracks
In TREC 2009, we extend our Voting Model for the faceted blog distillation, top stories identification, and related entity finding tasks. Moreover, we experiment with our novel xQuAD framework for search result diversification. Besides fostering our research in multiple directions, by participating in such a wide portfolio of tracks, we further develop the indexing and retrieval capabilities of our Terrier Information Retrieval platform, to effectively and efficiently cope with a new generation of large-scale test collections.
Tracks with Terrier
Feedback tracks. In all tracks, we continue the research and development of the Terrier platform centred around extending state-of-the-art weighting models based on the Divergence From

