Simultaneous multilingual search for translingual information retrieval
- In Proceedings of the ACM 17th Conference on Information and Knowledge Management (CIKM), 2008
Cited by 3 (3 self)
We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a different language than the document language(s) and the results must be returned in the language they know, the query language. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed “pseudo-parallel” corpus, where each document contains the text of the document along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as synonym sets. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation, we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search also has other important features suited to translingual search, since it can provide an indication of poor document translation when a match with the source document is found. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.
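The joint indexing and synonym-set querying described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy Spanish/English data, and the raw term-count score are hypothetical and are not the retrieval model used in the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_pseudo_parallel_doc(source_text, translated_text):
    """Index a document jointly: source-language tokens plus the tokens
    of its translation into the query language."""
    return Counter(tokenize(source_text)) + Counter(tokenize(translated_text))


def score(doc_counts, synonym_sets):
    """Toy score: for each query term, count occurrences of any member of
    its synonym set (the term itself plus its translations)."""
    return sum(doc_counts[term] for syn_set in synonym_sets for term in syn_set)


# Hypothetical query "house", with an assumed Spanish translation "casa".
query = [{"house", "casa"}]

# Document whose machine translation kept the query term.
good_doc = build_pseudo_parallel_doc("la casa roja", "the red house")
# Document whose machine translation dropped the query term.
poor_doc = build_pseudo_parallel_doc("la casa roja", "the red")

good_score = score(good_doc, query)
poor_score = score(poor_doc, query)
```

In this toy setup the second document still matches through its source text even though its translation lost the term, which mirrors the abstract's point that a source-side match can signal a poor document translation.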
MT error detection for cross-lingual question answering
- Proc. COLING 2010, 2010
Cited by 2 (0 self)
We present a novel algorithm for detecting errors in MT, specifically focusing on content words that are deleted during MT. We evaluate it in the context of cross-lingual question answering (CLQA), where we try to correct the detected errors by using a better (but slower) MT system to retranslate a limited number of sentences at query time. Using a query-dependent ranking heuristic enabled the system to direct scarce MT resources towards retranslating the sentences that were most likely to benefit CLQA. The error detection algorithm
Can One Language Bootstrap the Other: A Case Study on Event Extraction
Cited by 1 (0 self)
This paper proposes a new bootstrapping framework using cross-lingual information projection. We demonstrate that this framework is particularly effective for a challenging NLP task which is situated at the end of a pipeline and thus suffers from the errors propagated from upstream processing and has a low-performance baseline. Using Chinese event extraction as a case study and bitexts as a new source of information, we present three bootstrapping techniques. We first conclude that the standard mono-lingual bootstrapping approach is not so effective. Then we exploit a second approach that potentially benefits from the extra information captured by an English event extraction system and projected into Chinese. Such a cross-lingual scheme produces a significant performance gain. Finally, we show that the combination of mono-lingual and cross-lingual information in bootstrapping can further enhance the performance. Ultimately, this new framework obtained a 10.1% relative improvement in trigger labeling (F-measure) and a 9.5% relative improvement in argument labeling.
Name Extraction and Translation for Distillation
Name translation is important well beyond the relative frequency of names in a text: a correctly translated passage, but with the wrong name, may lose most of its value. The Nightingale team has built a name translation component which operates in tandem with a conventional phrase-based statistical MT system, identifying names in the source text and proposing translations to the MT system. Versions have been developed for both Chinese-to-English and Arabic-to-English name translation. The system has four main components: a name tagger, translation lists, a transliteration engine, and a context-based ranker. This chapter presents these components in detail and investigates the impact of name translation on cross-lingual spoken sentence retrieval.

