Results 1-10 of 33
Should we Translate the Documents or the Queries in Cross-language Information Retrieval?
, 1999
Cited by 38 (1 self)
Previous comparisons of document and query translation suffered difficulty due to differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. We find that hybrids of document and query translation-based systems outperform query translation systems, even human-quality query translation systems.
Disambiguation strategies for cross-language information retrieval
- In Proceedings of the third European Conference on Research and Advanced Technology for Digital Libraries (ECDL
, 1999
Cited by 33 (11 self)
Keywords: Cross-Language Information Retrieval, Statistical Machine
Japanese/English Cross-Language Information Retrieval: Exploration of Query . . .
- Computers and the Humanities
, 2001
Cited by 21 (8 self)
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper
Semantic Annotation for Concept-Based Cross-Language Medical Information Retrieval
Cited by 12 (3 self)
We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech tagging, morphological analysis, phrase recognition and the identification of medical terms and semantic relations between them. The paper
Effects of Term Segmentation on Chinese/English Cross-Language Information Retrieval
, 1999
Cited by 10 (7 self)
The majority of recent Cross-Language Information Retrieval (CLIR) research has focused on European languages. CLIR problems that involve East Asian languages such as Chinese introduce additional challenges, because written Chinese texts lack boundaries between terms. This paper examines three Chinese segmentation techniques in combination with two variants of dictionary-based Chinese to English query translation. The results indicate that failure to segment terms, particularly technical terms and names, can have a cascading effect that reduces retrieval effectiveness. Task-tuned segmentation algorithms and alternative term weighting strategies are suggested as productive directions for future work.
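The cascading effect described in this abstract can be illustrated with a small sketch. The abstract does not name the three segmentation techniques it compares, so greedy forward maximum matching is used here only as a familiar baseline, and the bilingual dictionary is a hypothetical toy with approximate glosses.

```python
# Hypothetical toy Chinese-to-English dictionary (glosses are approximate).
BILINGUAL = {
    "机器翻译": ["machine translation"],
    "机器": ["machine"],
    "翻译": ["translation"],
    "机": ["machine", "opportunity"],
    "器": ["device"],
    "翻": ["turn"],
    "译": ["translate"],
}
LEXICON = set(BILINGUAL)


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching: take the longest lexicon entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Single characters always pass through so the loop makes progress.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def translate(tokens, bilingual):
    """Replace each token with all of its dictionary translations; unknown tokens are dropped."""
    return [en for tok in tokens for en in bilingual.get(tok, [])]


query = "机器翻译"
segmented = translate(segment(query, LEXICON), BILINGUAL)
unsegmented = translate(list(query), BILINGUAL)  # one token per character
```

With this toy data the segmented query yields a single precise phrase, while the character-by-character query yields several loosely related words, which is the kind of noise that would then propagate into retrieval.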
The development and use of machine translation systems and computer-based translation tools
- In International Conference on Machine Translation & Computer Language Information Processing
, 1999
Cited by 10 (3 self)
Abstract: This survey of the present demand and use of computer-based translation software concentrates on systems designed for the production of translations of publishable quality, including developments in controlled language systems, translator workstations, and localisation; but it covers also the developments of software for non-translators, in particular for use with Web pages and other Internet applications, and it looks at future needs and systems under development. The final section compares the types of translations that can be met most appropriately by human and by machine (and computer-aided) translation respectively.
Large-Scale Construction of Chinese-English Semantic Hierarchy
- in Proceedings of the Workshop on English-Chinese Cross Language Information Retrieval, International Conference on Chinese Language Computing
, 2000
Cited by 7 (2 self)
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used for multilingual lexicons in machine translation and cross-language information retrieval applications.
Cross-lingual Information Retrieval using Hidden Markov Models
- In Proceedings of the 2000 Joint SIGDAT Conference
, 2000
Cited by 7 (0 self)
This paper presents empirical results in cross-lingual information retrieval using English queries to access Chinese documents (TREC-5 and TREC-6) and Spanish documents (TREC-4). Since our interest is in languages where resources may be minimal, we use an integrated probabilistic model that requires only a bilingual dictionary as a resource. We explore how a combined probability model of term translation and retrieval can reduce the effect of translation ambiguity. In addition, we estimate an upper bound on the performance that would be achievable if translation ambiguity were a solved problem. We also measure performance as a function of bilingual dictionary size.
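One way a combined model of term translation and retrieval might look is sketched below. The abstract does not give the formula, so everything here is an assumption: the mixture weight `alpha`, the uniform translation probabilities derived from the dictionary, the background floor of 1e-6, and the toy Spanish-English data.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, dictionary, background, alpha=0.3):
    """Log-probability of an English query given a foreign-language document.

    Each query term is generated either from a background English model
    (weight alpha) or by picking a document term and translating it.
    Translation probabilities are assumed uniform over dictionary entries.
    """
    counts = Counter(doc_terms)
    total = len(doc_terms)
    log_p = 0.0
    for q in query_terms:
        p_trans = 0.0
        for term, n in counts.items():
            translations = dictionary.get(term, [])
            if q in translations:
                # P(term | doc) * P(q | term)
                p_trans += (n / total) * (1.0 / len(translations))
        log_p += math.log(alpha * background.get(q, 1e-6) + (1 - alpha) * p_trans)
    return log_p


# Hypothetical toy data: document-language term -> English translations.
DICTIONARY = {"banco": ["bank", "bench"], "dinero": ["money"], "río": ["river"]}
BACKGROUND = {"bank": 0.01, "money": 0.01}

query = ["bank", "money"]
doc_a = ["banco", "dinero", "banco"]
doc_b = ["río", "río", "banco"]
score_a = score(query, doc_a, DICTIONARY, BACKGROUND)
score_b = score(query, doc_b, DICTIONARY, BACKGROUND)
```

Because the ambiguous term "banco" contributes only part of its probability mass to "bank", a document that also supports "money" scores higher, which is how summing over translations inside the retrieval model can soften translation ambiguity.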
Machine Translation for Information Access across the . . .
, 1999
Cited by 6 (1 self)
In this paper we describe the design and implementation of MuST, a multilingual information retrieval, summarization, and translation system. MuST integrates machine translation and other text processing services to enable users to perform cross-language information retrieval using available search services such as commercial Internet search engines. To handle non-standard languages, a new Internet indexing agent can be deployed, specialized local search services can be built, and shallow MT can be added to provide useful functionality. A case study of augmenting MuST with Indonesian is included. MuST adopts ubiquitous web browsers as its primary user interface, and provides tightly integrated automated shallow translation and user biased summarization to help users quickly judge the relevance of documents.
Applying machine translation to two-stage cross-language information retrieval
- In Proceedings of the 4th Conference of the Association for Machine Translation in the Americas
, 2000
Cited by 5 (3 self)
Abstract. Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
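The two stages described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the term-overlap scorer stands in for a real ranking function, the two translator callables are hypothetical stubs, and the Japanese-English data is a toy.

```python
def overlap(query_terms, doc_terms):
    """Count document tokens that match a query term (stand-in for a real ranker)."""
    wanted = set(query_terms)
    return sum(1 for t in doc_terms if t in wanted)


def two_stage_clir(query, docs, translate_query, translate_doc, k=2):
    # Stage 1: translate the query into the document language and keep the top k documents.
    q_doc_lang = translate_query(query)
    first_pass = sorted(docs, key=lambda d: overlap(q_doc_lang, docs[d]), reverse=True)[:k]
    # Stage 2: translate only those k documents into the user language
    # and re-rank them against the original query.
    translated = {d: translate_doc(docs[d]) for d in first_pass}
    return sorted(first_pass, key=lambda d: overlap(query, translated[d]), reverse=True)


# Hypothetical toy resources. The query-side dictionary keeps every sense of an
# ambiguous word; the document-side stub plays the role of a full MT system.
Q_DICT = {"バス": ["bus", "bath"], "時刻表": ["timetable"]}
D_DICT = {"bus": "バス", "bath": "風呂", "timetable": "時刻表", "city": "市", "towel": "タオル"}

DOCS = {
    "d1": ["bus", "timetable", "city"],
    "d2": ["bath", "bath", "bath", "towel"],
    "d3": ["weather", "report"],
}


def translate_query(query):
    return [en for term in query for en in Q_DICT.get(term, [])]


def translate_doc(doc):
    return [D_DICT.get(word, word) for word in doc]


ranking = two_stage_clir(["バス", "時刻表"], DOCS, translate_query, translate_doc)
```

In this toy run the ambiguous query translation ranks d2 first in stage 1, and the document translation in stage 2 moves d1 back to the top, while d3 is never translated at all. That is the cost saving the abstract points to: only k documents pass through machine translation.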

