TREC-9 Cross-Language Information Retrieval (English - Chinese) Overview
- In: Information Technology: The Ninth Text Retrieval Conference (TREC-9). NIST SP, 2001
Sixteen groups participated in the TREC-9 cross-language information retrieval track, which focused on retrieving Chinese-language documents in response to 25 English queries. A variety of CLIR approaches were tested, and a rich set of experiments measured the utility of resources such as machine translation and parallel corpora, as well as pre- and post-translation query expansion using pseudo-relevance feedback.
KUNLP system for NTCIR-3 English–Korean cross-language information retrieval
- 2002
This paper describes the KUNLP system for the English-Korean cross-language information retrieval track in the NTCIR-3 workshop, along with some experiments conducted after the workshop. The query translation method was based on a bilingual dictionary and the document-language corpus. To automatically transliterate proper nouns such as Korean person names, place names, and company names, we constructed a bilingual biographical dictionary and collected the corresponding translations of Korean place names and company names. We submitted one monolingual run and three cross-language runs, each of which used only the description field of a topic as the query. The cross-language runs differed in whether query expansion was used and whether manual transliteration was applied. Comparisons between the cross-language runs show that query expansion is useful in English-Korean cross-language information retrieval and that transliteration also improves system performance. Additional experiments after the NTCIR-3 workshop show that a Korean query consisting of the best translation equivalent for each English query term is more effective than one consisting of two or more translation equivalents. In addition, including English acronyms and initial words in the Korean query helps retrieve Korean documents.
Keywords: English-Korean cross-language information retrieval, query translation, query expansion, query transliteration.
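The dictionary-plus-corpus query translation described in this abstract can be illustrated with a small sketch. It is not the KUNLP implementation: the abstract does not say how the document-language corpus is used to choose among translation candidates, so ranking candidates by raw corpus frequency is an assumption made here for illustration. The function name `translate_query`, the `top_k` parameter, and the toy dictionary and frequency data are all hypothetical.

```python
from collections import Counter


def translate_query(english_terms, bilingual_dict, corpus_freq, top_k=1):
    """Translate English query terms into document-language terms.

    For each term, dictionary candidates are ranked by how often they
    occur in the document-language corpus (an assumed selection rule),
    and the top_k candidates are kept. Terms missing from the dictionary
    are passed through unchanged, which keeps English acronyms in the
    translated query.
    """
    translated = []
    for term in english_terms:
        candidates = bilingual_dict.get(term)
        if not candidates:
            translated.append(term)  # e.g. an acronym with no dictionary entry
            continue
        ranked = sorted(candidates,
                        key=lambda c: corpus_freq.get(c, 0),
                        reverse=True)
        translated.extend(ranked[:top_k])
    return translated


# Toy, made-up data for illustration only.
toy_dict = {"bank": ["은행", "둑"], "loan": ["대출", "융자"]}
toy_freq = Counter({"은행": 120, "둑": 3, "대출": 80, "융자": 15})

best_only = translate_query(["IMF", "bank", "loan"], toy_dict, toy_freq, top_k=1)
two_each = translate_query(["IMF", "bank", "loan"], toy_dict, toy_freq, top_k=2)
```

Setting `top_k=1` corresponds to the "best translation equivalent" condition reported as more effective in the abstract, while `top_k=2` keeps two equivalents per term for comparison.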
Cross-Language Spoken Document Retrieval Using HMM-Based Retrieval Model with Multi-Scale Fusion
Cross-language spoken document retrieval (CL-SDR) is the technology that facilitates automatic retrieval of relevant information from a collection of spoken documents in a language different from that of the queries. With CL-SDR, information sources in different languages can be retrieved automatically, and the number of searchable information sources increases significantly. The HMM-based retrieval model is a probabilistic formulation of the retrieval problem, and its probabilistic nature makes it straightforward to extend. Specifically, we have incorporated a translation component to make cross-language information retrieval (CLIR) possible. In addition, this HMM-based CLIR model is extended for retrieval at subword scales. In this work, the extended HMM-based retrieval model is applied to an English-Mandarin CL-SDR task, which is to search a Mandarin spoken document collection with English queries at word and subword scales. Retrieval results obtained from these indexing scales are then fused for multi-scale CL-SDR. Experimental results demonstrate that fusing the word and subword scales improves CL-SDR retrieval performance.
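A rough sketch of how such a model might score a document follows. The abstract gives no formulas, so everything here is an assumption: the score uses a commonly cited two-state mixture form for HMM-based retrieval, log P(Q|D) = sum over query terms q of log[(1 - a) P(q|GE) + a * sum over document units d of P(q|d) P(d|D)], where GE stands for general English, and the fusion step is a simple weighted sum of per-scale scores. The names `hmm_log_score` and `fuse`, the weights `a` and `w`, and the toy probabilities are hypothetical.

```python
import math


def hmm_log_score(query_terms, doc_units, general_prob, translation_prob, a=0.7):
    """Log-likelihood of an English query given one document.

    doc_units: document tokens at one indexing scale (words or subwords).
    general_prob: P(q | general English), keyed by query term.
    translation_prob: P(q | d), keyed by (query_term, doc_unit).
    a: assumed mixture weight of the document state.
    """
    n = len(doc_units)
    doc_prob = {}
    for unit in doc_units:  # maximum-likelihood estimate of P(d | D)
        doc_prob[unit] = doc_prob.get(unit, 0.0) + 1.0 / n
    score = 0.0
    for q in query_terms:
        # Translation component: sum over document units of P(q|d) * P(d|D).
        p_doc = sum(translation_prob.get((q, d), 0.0) * p
                    for d, p in doc_prob.items())
        # Small floor for unseen query terms, an arbitrary choice.
        p_general = general_prob.get(q, 1e-6)
        score += math.log((1 - a) * p_general + a * p_doc)
    return score


def fuse(word_score, subword_score, w=0.5):
    """Weighted-sum fusion of scores from two indexing scales (assumed form)."""
    return w * word_score + (1 - w) * subword_score


# Toy, made-up data: one document indexed by words and by characters.
general = {"economy": 0.01}
word_score = hmm_log_score(["economy"], ["经济", "增长"], general,
                           {("economy", "经济"): 1.0})
subword_score = hmm_log_score(["economy"], ["经", "济", "增", "长"], general,
                              {("economy", "经"): 0.5, ("economy", "济"): 0.5})
fused = fuse(word_score, subword_score, w=0.5)
```

Documents would then be ranked by the fused score. In practice the scores from different scales may need normalization before they are combined, which this sketch omits.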

