Results 1 - 2 of 2
Resource Selection for Domain-Specific Cross-Lingual IR
- In Proc. of the 27th Annual Int’l ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR
, 2004
Cited by 2 (0 self)
An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several competitive CLIR methods - with different training corpora - on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN), versus a domain-specific training corpus. A related unexplored question is whether we can improve CLIR performance by systematically analyzing training resources and optimally matching them to target collections. We start exploring this problem by suggesting a simple criterion for automatically matching training resources to target corpora. By using cosine similarity between training and target corpora as resource weights we obtained an average of 5.6% improvement over using all resources with no weights. The same metric yields 99.4% of the performance obtained when an oracle chooses the optimal resource every time.
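The matching criterion described in the abstract can be illustrated with a minimal sketch. It assumes each corpus is reduced to a bag-of-words term-frequency vector and that the cosine similarities are normalized to sum to 1 before being used as weights. The abstract specifies neither detail, and the function names `term_vector`, `cosine`, and `resource_weights` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumption: lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resource_weights(training_corpora, target_corpus):
    """Weight each training resource by its cosine similarity to the target corpus.

    Assumption: weights are normalized to sum to 1; if no resource shares
    any term with the target, fall back to uniform weights.
    """
    target = term_vector(target_corpus)
    sims = {name: cosine(term_vector(text), target)
            for name, text in training_corpora.items()}
    total = sum(sims.values())
    if total == 0:
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}
```

Given a medical target collection, a call such as `resource_weights({"medical": ..., "news": ...}, target_text)` would be expected to give the medical training corpus the larger weight. How the weights are then combined inside a CLIR method is not stated in the abstract and is left out of this sketch.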
Domain Adaptation of Translation Models for Multilingual Applications
, 2009
"... number W0550432. ..."

