Modern information retrieval: a brief overview
- BULLETIN OF THE IEEE COMPUTER SOCIETY TECHNICAL COMMITTEE ON DATA ENGINEERING
, 2001
Cited by 101 (0 self)
For thousands of years people have realized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information; and finding useful information from such collections became a necessity. The field of Information Retrieval (IR) was born in the 1950s out of this necessity. Over the last forty years, the field has matured considerably. Several IR systems are used on an everyday basis by a wide variety of users. This article is a brief overview of the key advances in the field of Information Retrieval, and a description of where the state-of-the-art is at in the field.
The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004
- MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES: RESULTS OF THE FIFTH CLEF EVALUATION CAMPAIGN, LECTURE NOTES IN COMPUTER SCIENCE
, 2005
Cited by 40 (15 self)
In this paper we describe ImageCLEF, the cross language image retrieval track of the Cross Language Evaluation Forum (CLEF). We instigated and ran a pilot experiment in 2003 where participants submitted entries for an ad hoc bilingual image retrieval task on a collection of historic photographs from St. Andrews University Library. This was designed to simulate the situation in which users would express their search request in natural language but require visual documents in return. For 2004 we have extended the tasks to include a medical image retrieval task and a user-centred evaluation.
Embedding web-based statistical translation models in cross-language information retrieval
- Computational Linguistics
, 2003
Cited by 29 (3 self)
Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.
Cross-Language Retrieval Experiments at CLEF 2002
, 2002
Cited by 26 (0 self)
This paper describes monolingual, cross-language, and multilingual retrieval experiments using the CLEF-2002 test collection. The paper presents a technique for incorporating blind relevance feedback into a document ranking formula based on logistic regression analysis, and a procedure for decomposing German or Dutch compounds into their component words.
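The abstract does not say how the compounds are decomposed, so the following is only a minimal sketch of one common approach: a dictionary-based splitter that prefers the longest known prefix and backtracks when the remainder cannot be covered. The function name `split_compound`, the `min_len` threshold, and the toy lexicon are illustrative assumptions, not the paper's procedure.

```python
def split_compound(word, lexicon, min_len=3):
    """Split `word` into lexicon entries that concatenate to it.

    Returns [word] (lowercased) when no full decomposition exists.
    Assumption: `lexicon` is a set of lowercase base words.
    """
    word = word.lower()

    def helper(rest):
        # An empty remainder means everything so far was covered.
        if not rest:
            return []
        # Try the longest candidate prefix first, then shorter ones.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no decomposition of `rest` was found

    parts = helper(word)
    return parts if parts else [word]


# Toy lexicon, chosen only for illustration.
lexicon = {"fussball", "welt", "meisterschaft", "meister"}
parts = split_compound("Fussballweltmeisterschaft", lexicon)
```

A real splitter would also need to handle linking elements such as the German "s" or "en" between components, which this sketch ignores.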
A System for Supporting Cross-Lingual Information Retrieval
, 1999
Cited by 13 (1 self)
In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisation and automatic translation of summaries and documents. The system supports French, German and English and has been installed and tested in the online services of two European internet content and service provider companies. This paper focuses on the techniques and algorithms used in the MULINEX system, explaining how each component works and how it contributes to the overall functionality of the integrated system. The primary system functionalities are outlined from the user perspective, followed by a description of the document database used in the system. The technologies and linguistic resources used in the various system components are then described in detail.
Using EuroWordNet in a Concept-based Approach to Cross-Language Text Retrieval
- Applied Artificial Intelligence
, 1999
Cited by 13 (2 self)
We present an approach to cross-language text retrieval based on the EuroWordNet (EWN) multilingual semantic database. EuroWordNet is a multilingual, WordNet-like database with basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cross-language, and cross-part-of-speech relations, which are directly useful for multilingual information retrieval. In our approach, documents in any language covered by EuroWordNet are indexed in a space of language-independent concepts (the EuroWordNet Inter Lingual Index), thus turning term weighting and query/document matching into language-independent tasks. We report on the results of a number of experiments that measure the potential benefits of the approach and its tolerance to word sense disambiguation errors. In our monolingual experiments, the classical, vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) if WordNet synsets are chosen as the indexing space, instead of word forms. This result is obtained for a manually disambiguated test collection derived from the SEMCOR annotated corpus. The sensitivity of retrieval
Query Term Disambiguation for Web Cross-Language Information Retrieval using a Search Engine
- In Proceedings of the fifth international
, 2000
Cited by 12 (2 self)
With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) is being paid much attention. Existing CLIR approaches based on query translation require parallel corpora or comparable corpora for the disambiguation of translated query terms. However, those natural language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that is independent of the availability of such scarce language resources, while achieving adequate retrieval effectiveness by utilizing Web documents as a corpus and using co-occurrence information between terms within that corpus. In the experiments, our method achieved 97% of manual translation case in terms of the average precision.
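The idea of choosing among dictionary translations by co-occurrence can be illustrated with a small sketch. It assumes a tiny in-memory list of token sets standing in for the Web corpus, and it scores each combination of candidates by summed pairwise document co-occurrence; the scoring function, the names `cooccurrence` and `disambiguate`, and the sample data are all assumptions made for illustration, not the authors' method.

```python
from itertools import product


def cooccurrence(a, b, docs):
    """Number of documents (token sets) containing both a and b."""
    return sum(1 for d in docs if a in d and b in d)


def disambiguate(candidates, docs):
    """Pick one translation per query term.

    candidates: one list of candidate translations per source term.
    Returns the combination with the highest summed pairwise
    co-occurrence; ties go to the first combination enumerated.
    """
    best, best_score = (), -1
    for combo in product(*candidates):
        score = sum(
            cooccurrence(combo[i], combo[j], docs)
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best)


# Hypothetical stand-in for Web documents: each document is a set of tokens.
docs = [
    {"bank", "loan", "interest"},
    {"bank", "loan", "rate"},
    {"bench", "park", "tree"},
    {"credit", "card"},
]
# Candidate translations for a two-term source query (illustrative only).
chosen = disambiguate([["bank", "bench"], ["loan", "credit"]], docs)
```

Enumerating every combination is exponential in the number of query terms, so this form only suits short queries; a greedy per-term choice would be the usual compromise.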
The CLEF 2004 Cross Language Image Retrieval Track
- MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES: RESULTS OF THE FIFTH CLEF EVALUATION CAMPAIGN, LECTURE NOTES IN COMPUTER SCIENCE
, 2005
Cited by 12 (5 self)
The purpose of this paper is to outline efforts from the 2004 CLEF cross–language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text and content–based retrieval methods for cross–language image retrieval. Three tasks were offered in the ImageCLEF track: a TREC–style ad-hoc retrieval task, retrieval from a medical collection, and a user–centered (interactive) evaluation task. Eighteen research groups from a variety of backgrounds and nationalities participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main findings.
Report on CLEF-2001 Experiments
- Proceedings of the CLEF 2001 Cross-Language Text Retrieval System Evaluation Campaign. Lecture Notes in Computer Science
, 2001
Cited by 11 (0 self)
For our first participation in CLEF retrieval tasks, our first objective was to define a general stopword list for various European languages (namely, French, Italian, German and Spanish) and also to suggest simple and efficient stemming procedures for them. Our second aim was to suggest a combined approach that might be implemented in order to facilitate effective access to multilingual collections.
1. Monolingual indexing and search
Most European languages (including French, Italian, Spanish, German) share many of the same characteristics as does the language of Shakespeare (e.g., word boundaries marked in a conventional manner, variant word forms generated by adding suffixes to the end of a root, etc.). Any adaptation of indexing or search strategies thus means the elaboration of general stopword lists and fast stemming procedures. Stopword lists contain nonsignificant words that are removed from a document or a request before the indexing process is begun. Stemming procedures try to remove inflectional and derivational suffixes in order to conflate word variants into the same stem or root. This first chapter will deal with these issues and is organized as follows: Section 1.1 contains an overview of our five test collections while Section 1.2 describes our general approach to building stopword lists and stemmers for use with languages other than English. Section 1.3 depicts the Okapi probabilistic model together with the description of the runs submitted by us in the monolingual track.
1.1. Overview of the test-collections
The corpora used in our experiments included newspapers such as the Los Angeles Times, Le Monde (French), La Stampa (Italian), Der Spiegel and Frankfurter Rundschau (German) and EFE (Spanish) and various news items edited by the Swiss news a...
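The stopword removal and suffix stripping described in this entry can be sketched roughly as follows. The stopword set and suffix list are small hypothetical samples for French, made up for illustration; they are not the lists or the stemmer the authors built.

```python
# Hypothetical miniature resources; real stopword lists are far longer.
STOPWORDS = {"le", "la", "les", "de", "des", "et", "un", "une"}
SUFFIXES = ("es", "s", "e")  # checked in this order, longest first


def stem(token, suffixes=SUFFIXES, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suf in suffixes:
        if token.endswith(suf) and len(token) - len(suf) >= min_stem:
            return token[: -len(suf)]
    return token


def index_terms(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = text.lower().split()
    return [stem(t) for t in tokens if t not in STOPWORDS]


terms = index_terms("une grande ville des grandes villes")
```

Because singular and plural forms are conflated to the same stem, a request containing "villes" would match a document indexed under "ville".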
Employing the Resolution Power of Search Keys
- Journal of the American Society for Information Science and Technology
, 2001
Cited by 8 (1 self)
Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods of characterizing the resolution power of keys automatically are studied and the effects search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it often is possible to identify the best key of a query while the discrimination between the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery in a subcollection of the TREC collection, which contained some 515,000 documents.

