Results 1 - 9 of 9
Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
Cited by 4 (4 self)
We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.
Combining Statistical Translation Techniques for Cross-Language Information Retrieval
Cited by 2 (0 self)
Cross-language information retrieval today is dominated by techniques that rely principally on context-independent token-to-token mappings despite the fact that state-of-the-art statistical machine translation systems now have far richer translation models available in their internal representations. This paper explores combination-of-evidence techniques using three types of statistical translation models: context-independent token translation, token translation using phrase-dependent contexts, and token translation using sentence-dependent contexts. Context-independent translation is performed using statistically-aligned tokens in parallel text, phrase-dependent translation is performed using aligned statistical phrases, and sentence-dependent translation is performed using those same aligned phrases together with an n-gram language model. Experiments on retrieval of Arabic, Chinese, and French documents using English queries show that no one technique is optimal for all queries, but that statistically significant improvements in mean average precision over strong baselines can be achieved by combining translation evidence from all three techniques. The optimal combination is, however, found to be resource-dependent, indicating a need for future work on robust tuning to the characteristics of individual collections.
Multilingual test sets for machine translation of search queries for cross-lingual information retrieval in the medical domain
To appear in Proceedings of the Ninth International Conference on Language Resources and Evaluation, Reykjavik, 2014
Cited by 1 (1 self)
This paper presents development and test sets for machine translation of search queries in cross-lingual information retrieval in the medical domain. The data consists of a total of 1,508 real user queries in English translated to Czech, German, and French. We describe the translation and review process involving medical professionals and present a baseline experiment where our data sets are used for tuning and evaluation of a machine translation system.
Response-based learning for grounded machine translation
In Meeting of the Association for Computational Linguistics (ACL), 2014
Cited by 1 (1 self)
We propose a novel learning approach for statistical machine translation (SMT) that allows supervision signals for structured learning to be extracted from an extrinsic response to a translation input. We show how to generate responses by grounding SMT in the task of executing a semantic parse of a translated query against a database. Experiments on the GEO-QUERY database show an improvement of about 6 points in F1-score for response-based learning over learning from references only on returning the correct answer from a semantic parse of a translated query. In general, our approach alleviates the dependency on human reference translations and solves the reachability problem in structured learning for SMT.
Learning to Translate Queries for CLIR
Cited by 1 (1 self)
The statistical machine translation (SMT) component of cross-lingual information retrieval (CLIR) systems is often regarded as a black box that is optimized for translation quality independently of the retrieval task. In recent work [10], SMT has been tuned for retrieval by training a reranker on k-best translations ordered according to their retrieval performance. In this paper we propose a decomposable proxy for retrieval quality that obviates the need for costly intermediate retrieval. Furthermore, we explore the full search space of the SMT decoder by directly optimizing decoder parameters under a retrieval-based objective. Experimental results for patent retrieval show our approach to be a promising alternative to the standard pipeline approach.
SEARCHING TO TRANSLATE AND TRANSLATING TO SEARCH: WHEN INFORMATION RETRIEVAL MEETS MACHINE TRANSLATION
2013
Cited by 1 (1 self)
With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human’s reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multi-lingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language.
Response-Based Learning for Patent Translation
In response-based structured prediction, instead of a gold-standard structure, the learner is given a response to a predicted structure from which a supervision signal for structured learning is extracted. Applied to statistical machine translation (SMT), different types of environments such as a downstream application, a professional translator, or an SMT user, may respond to predicted translations with a ranking, a correction, or an acceptance/rejection decision, respectively. We present algorithms and experiments that show that learning from responses alleviates the supervision problem and allows a direct optimization of SMT for tasks such as cross-lingual patent prior art retrieval, or translation of technical patent documents.
On the Problem of Theoretical Terms in Empirical Computational Linguistics
Philosophy of science has pointed out a problem of theoretical terms in empirical sciences. This problem arises if all known measuring procedures for a quantity of a theory presuppose the validity of this very theory, because then statements containing theoretical terms are circular. We argue that a similar circularity can happen in empirical computational linguistics, especially in cases where data are manually annotated by experts. We define a criterion of T-non-theoretical grounding as guidance to avoid such circularities, and exemplify how this criterion can be met by crowdsourcing, by task-related data annotation, or by data in the wild. We argue that this criterion should be considered as a necessary condition for an empirical science, in addition to measures for reliability of data annotation.
Flat vs. Hierarchical Phrase-Based Translation Models for Cross-Language Information Retrieval
Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice of model manifests interesting tradeoffs in terms of effectiveness, efficiency, and model compactness.