2005 CLEF2005: Multilingual retrieval by combining multiple multilingual ranked lists
- In Proceedings of Cross-Language Evaluation Forum, 2005
Cited by 8 (3 self)
Abstract: We participated in two tasks: Multi-8 two-years-on retrieval and Multi-8 results merging. For our Multi-8 two-years-on retrieval work, simple multilingual ranked lists are first built by merging ranked lists of different languages that are generated by single types of retrieval algorithms. Then, algorithms are proposed to combine these simple multilingual ranked lists into a single ranked list. Empirical study shows that combining multilingual retrieval results can substantially improve accuracy over single multilingual ranked lists. The Multi-8 results merging task is our primary interest. This task is viewed as similar to the results merging task of federated search. Query-specific and language-specific models are proposed to calculate comparable document scores for a small number of documents and to estimate logistic models from these documents. The logistic models are then used to estimate comparable scores for all documents, so that the documents can be sorted into a final ranked list. A set of experiments demonstrated the advantage of the query-specific and language-specific models over several alternatives.
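The results-merging idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the logistic models are estimated, so the least-squares fit on the logit scale, the function names (`fit_logistic`, `comparable_score`, `merge`) and the toy data layout are all assumptions made for illustration. One model is fitted per language for a single query, mirroring the "query-specific and language-specific" description.

```python
import math

def fit_logistic(pairs):
    """Fit logit(y) = a * x + b by least squares.

    pairs: (language_specific_score, comparable_score) tuples for the small
    sample of documents, with every comparable_score strictly in (0, 1).
    Assumption: this fitting procedure is illustrative only.
    """
    xs = [x for x, _ in pairs]
    zs = [math.log(y / (1.0 - y)) for _, y in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    a = cov_xz / var_x
    b = mean_z - a * mean_x
    return a, b

def comparable_score(model, x):
    """Map a language-specific score to an estimated comparable score."""
    a, b = model
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

def merge(ranked_lists, training_pairs):
    """Merge per-language ranked lists for one query into one list.

    ranked_lists:   {language: [(doc_id, language_specific_score), ...]}
    training_pairs: {language: [(language_specific_score, comparable_score), ...]}
    """
    scored = []
    for lang, docs in ranked_lists.items():
        model = fit_logistic(training_pairs[lang])
        for doc_id, score in docs:
            scored.append((comparable_score(model, score), doc_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

In this sketch the training pairs stand in for the small set of documents whose comparable scores were computed directly; every other document only needs its original score and the fitted model.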
Monolingual Document Retrieval for European Languages
- Information Retrieval, 2003
Cited by 6 (0 self)
Recent years have witnessed considerable advances in information retrieval for European languages other than English. We give an overview of commonly used techniques and we analyze them with respect to their impact on retrieval effectiveness. The techniques considered range from linguistically motivated techniques, such as morphological normalization and compound splitting, to knowledge-free approaches, such as n-gram indexing. Evaluations are carried out against data from the CLEF campaign, covering eight European languages. Our results show that for many of these languages a modicum of linguistic techniques may lead to improvements in retrieval effectiveness, as can the use of language independent techniques.
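As an illustration of the knowledge-free end of that range, character n-gram indexing can be sketched in a few lines of Python. The function name `char_ngrams`, the default n of 4, and the choice to keep short words whole are assumptions for this sketch, not settings reported in the paper.

```python
def char_ngrams(text, n=4):
    """Turn text into overlapping character n-grams, word by word.

    Words no longer than n characters are kept whole (an assumption made
    here so that short words still produce an index term).
    """
    grams = []
    for token in text.lower().split():
        if len(token) <= n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams
```

Because the n-grams are produced without any lexicon or stemmer, the same function can be applied unchanged to each language in a collection.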
Report on CLEF-2003 Monolingual Tracks: Fusion of Probabilistic Models for Effective Monolingual Retrieval
- In
, 2004
Cited by 5 (4 self)
Abstract. For our third participation in the CLEF evaluation campaign, our first objective was to propose more effective and general stopword lists for the Swedish, Finnish and Russian languages along with an improved, more efficient and simpler stemming procedure for these three languages. Our second goal was to suggest a combined search approach based on a data fusion strategy that would work with various European languages. Included in this combined approach is a decompounding strategy for the German, Dutch, Swedish and Finnish languages.
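A decompounding step of the kind mentioned here can be sketched as a lexicon-driven splitter. The abstract does not describe the actual strategy, so the longest-prefix-first recursion, the `min_len` threshold, the function name `split_compound` and the tiny lexicon in the usage note are all hypothetical.

```python
def split_compound(word, lexicon, min_len=3):
    """Split a compound into lexicon words, trying the longest prefix first.

    Returns the list of parts, or [word] when no complete split is found.
    Linking elements (such as the German "s") are not handled in this sketch.
    """
    word = word.lower()

    def segment(rest):
        if not rest:
            return []
        # Try candidate prefixes from longest to shortest.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = segment(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None

    parts = segment(word)
    return parts if parts else [word]
```

For example, with the lexicon `{"fuss", "ball", "spiel", "fussball"}` the call `split_compound("Fussballspiel", lexicon)` yields the parts `fussball` and `spiel`, which could then be indexed alongside the original compound.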
Natural Language Processing in Information Retrieval
- In Proceedings of the 14th Meeting of Computational Linguistics in the Netherlands, 2004
Cited by 5 (0 self)
Many Natural Language Processing (NLP) techniques have been used in Information Retrieval. The results are not encouraging. Simple methods (stopwording, Porter-style stemming, etc.) usually yield significant improvements, while higher-level processing (chunking, parsing, word sense disambiguation, etc.) yields only very small improvements or even a decrease in accuracy. At the same time, higher-level methods increase the processing and storage cost dramatically. This makes them hard to use on large collections. We review NLP techniques and come to the conclusion that (a) NLP needs to be optimized for IR in order to be effective and (b) document retrieval is not an ideal application for NLP, at least given the current state-of-the-art in NLP. Other IR-related tasks, e.g., question answering and information extraction, seem to be better suited.
CONTROL: CLEF-2003 with Open, Transparent Resources Off-Line. Experiments with merging strategies
- In C. Peters (Ed.), Results of
, 2003
Cited by 3 (0 self)
Abstract: Corpus-based approaches to CLIR have been studied for many years. However, using commercial MT systems for CLEF has been considered easier and better performing. Our goal is to be one of the CLEF participants who show that the hypothetical performance drop is not large enough to justify the loss of control and transparency, especially for research systems. We participated in two bilingual runs and the small multilingual run using software and data that are free to obtain, transparent and modifiable.
Linked Relevance Feedback for the ImageCLEF Photo Task
- In Working
Cited by 2 (1 self)
In this paper we will describe Berkeley’s approach to the ImageCLEFphoto task for CLEF 2007. Once again (as in ImageCLEFphoto for CLEF 2006) we used entirely text-based methods for retrieval. For some runs this year, however, we exploited the basic similarity of the topics and database from 2006 to acquire the metadata descriptions of the “example images” in the 2007 queries, and used that metadata to expand the query content for each topic. The results speak for themselves: use of what amounts to relevance feedback based on image metadata is much more effective than use of unexpanded queries, and even provides a method of cross-language retrieval for unknown languages when parallel topics and example images can be established. We submitted 19 runs for ImageCLEFphoto this year, of which 8 were monolingual English, German and Spanish, and the remaining 11 were bilingual from various languages to English, German and Spanish.
hsql database engine home page. http://hsqldb.sourceforge.net
- In: Working Notes of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF. Sep. 2005
, 2001
Cited by 2 (0 self)
In this paper I will describe the Berkeley (group 1) approach to the GeoCLEF task for CLEF 2005. The main technique we are testing is the fusion of multiple probabilistic searches against different XML components using both Logistic Regression (LR) algorithms and a version of the Okapi BM-25 algorithm. We also combine multiple translations of queries in cross-language searching. Since this is the first time that the Cheshire system has been used for CLEF this approach can, at best, be considered a very preliminary base testing of some retrieval algorithms and approaches. The primary geographically based approaches taken for GeoCLEF were to georeference proper nouns in the text using a gazetteer derived from the World Gazetteer with both English and German names for each place, and to expand place names for regions or countries in the queries by the names of the countries or cities in those regions or countries.
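The fusion of several searches described in this abstract can be illustrated with a generic score-fusion sketch. The abstract does not state the fusion formula, so the technique shown here (min-max normalization followed by summing, often called CombSUM) is a swapped-in stand-in rather than the Cheshire system's method; the function names and the example scores are hypothetical.

```python
def min_max(scores):
    """Rescale one run's scores to [0, 1]; a constant run maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def comb_sum(*runs):
    """Fuse runs ({doc_id: score}) by summing their normalized scores.

    Returns doc ids sorted by fused score, ties broken by doc id.
    """
    fused = {}
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

Here one run could hold logistic regression scores and another Okapi BM-25 scores for the same query; normalizing first keeps the run with the larger raw score range from dominating the sum.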
University of Hagen at GeoCLEF 2007: Exploring Location Indicators for Geographic Information Retrieval
Cited by 2 (1 self)
Location indicators are text segments from which a geographic scope can be inferred, e.g. adjectives, demonyms (names for inhabitants of a place), geographic codes, orthographic variants, and abbreviations can be mapped to location names in one or more inferential steps. In this paper, the normalization of location indicators and treating morphology of location indicators for geographic information retrieval (GIR) within the system GIRSA (Geographic Information Retrieval by Semantic Annotation) are explored. Several retrieval experiments are performed on the German GeoCLEF 2007 data, including a baseline IR experiment on stemmed text (0.119 mean average precision, MAP). Results for this experiment are compared to results for experiments with normalized location indicators. Additionally, the latter approach was combined with an approach using semantic networks for retrieval (an extension of an experiment performed for GeoCLEF 2005). When using the topic title and description, the best performance was achieved by the combination of approaches (0.196 MAP); adding location names from the narrative part increased MAP to 0.258. Results indicate that 1) employing normalized location indicators improves
UC Berkeley at CLEF-2003 – Russian language experiments and domain-specific retrieval
- In Comparative Evaluation of Multilingual Information Access Systems, 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003 (edited by
, 2004
Cited by 1 (0 self)
ABSTRACT. As in the previous years, Berkeley’s group 1 experimented with the domain-specific CLEF collection GIRT as well as with Russian as query and document language. The GIRT collection was substantially extended this year and we were able to improve our retrieval results for the query languages German, English and Russian. For the GIRT retrieval experiments, we utilized our previous experiences by combining different translations, thesaurus matching, decompounding for German compounds and a blind feedback algorithm. We find that our thesaurus matching technique performs comparably to conventional machine translation for Russian and German against English retrieval and outperforms machine translation for English to German retrieval. With the introduction in CLEF 2003 of a Russian document collection, we participated in the CLEF main task with monolingual and bilingual runs for the Russian collection. For bilingual retrieval our approaches were query translation (for German or English as topic languages) and ‘fast’ document translation (for English as the topic language). Document translation significantly underperformed query translation (using the PROMPT translation system).
Monolingual Retrieval for European Languages
- Information Retrieval, 2003
Cited by 1 (1 self)
Recent years have witnessed considerable advances in information retrieval for European languages other than English. We give an overview of commonly used techniques and we analyze them with respect to their impact on retrieval effectiveness. The techniques considered range from linguistically motivated techniques, such as morphological normalization and compound splitting, to knowledge-free approaches, such as n-gram indexing. Evaluations are carried out against data from the CLEF campaign, covering eight European languages. Our results show that for many of these languages a modicum of linguistic techniques may lead to improvements in retrieval effectiveness, as can the use of language independent techniques. Exactly which combination of settings works best proved to be highly language dependent in our experiments.

