An end-to-end discriminative approach to machine translation
- In Proceedings of the Joint International Conference on Computational Linguistics and Association of Computational Linguistics (COLING/ACL), 2006
Cited by 77 (2 self)
We present a perceptron-style discriminative approach to machine translation in which large feature sets can be exploited. Unlike discriminative reranking approaches, our system can take advantage of learned features in all stages of decoding. We first discuss several challenges to error-driven discriminative approaches. In particular, we explore different ways of updating parameters given a training example. We find that making frequent but smaller updates is preferable to making fewer but larger updates. Then, we discuss an array of features and show both how they quantitatively increase BLEU score and how they qualitatively interact on specific examples. One particular feature we investigate is a novel way to introduce learning into the initial phrase extraction process, which has previously been entirely heuristic.
Supertagged Phrase-based Statistical Machine Translation
- In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), 2007
Cited by 21 (7 self)
Phrase-based Statistical Machine Translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses ‘syntactified’ target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic–English test corpus.
Integrated n-best re-ranking for spoken language translation
- Proc. of the 9th European Conference on Speech Communication and Technology, Interspeech’05, 2005
Cited by 6 (0 self)
This paper describes the application of N-best lists to a spoken language translation system. Multiple hypotheses are generated both by the speech recognizer and by the statistical machine translator; they are finally re-ranked by optimally weighting recognition and translation scores, estimated in an integrated scheme. We provide experimental results for the Italian-to-English direction on the BTEC corpus, a collection of sentences in the touristic domain developed within the C-STAR project.
A comparative study on language model adaptation techniques using new evaluation metrics
- Proc. HLT/EMNLP, 2005
Cited by 5 (2 self)
This paper presents comparative experimental results on four techniques of language model adaptation, including a maximum a posteriori (MAP) method and three discriminative training methods, the boosting algorithm, the average perceptron and the minimum sample risk method, on the task of Japanese Kana-Kanji conversion. We evaluate these techniques beyond simply using the character error rate (CER): the CER results are interpreted using a metric of domain similarity between background and adaptation domains, and are further evaluated by correlating them with a novel metric for measuring the side effects of adapted models. Using these metrics, we show that the discriminative methods are superior to a MAP-based method not only in terms of achieving larger CER reduction, but also of being more robust against the similarity of background and adaptation domains, and achieve larger CER reduction with fewer side effects.
Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
Cited by 1 (0 self)
In this paper, we report our work on incorporating syntactic and morphological information for English to Hindi statistical machine translation. Two simple and computationally inexpensive ideas have proven to be surprisingly effective: (i) reordering the English source sentence as per Hindi syntax, and (ii) using the suffixes of Hindi words. The former is done by applying simple transformation rules on the English parse tree. The latter, by using a simple suffix separation program. With only a small amount of bilingual training data and limited tools for Hindi, we achieve reasonable performance and substantial improvements over the baseline phrase-based system. Our approach eschews the use of parsing or other sophisticated linguistic tools for the target language (Hindi) making it a useful framework for statistical machine translation from English to Indian languages in general, since such tools are not widely available for Indian languages currently.
Chinese-English Organization Name Translation Based on Correlative Expansion
Cited by 1 (0 self)
This paper presents an approach to translating Chinese organization names into English based on correlative expansion. Firstly, some candidate translations are generated by using statistical translation method. And several correlative named entities for the input are retrieved from a correlative named entity list. Secondly, three kinds of expansion methods are used to generate some expanded queries. Finally, these queries are submitted to a search engine, and the refined translation results are mined and re-ranked by using the returned web pages. Experimental results show that this approach outperforms the compared system in overall translation accuracy.
Feasibility of Minimum Error Rate Training with a Human-Based Automatic Evaluation Metric
Minimum error rate training (MERT) involves choosing parameter values for a machine translation (MT) system that maximize performance on a tuning set as measured by an automatic evaluation metric, such as BLEU. The method is best when the system will eventually be evaluated using the same metric, but in reality, most MT evaluations have a human-based component. Although performing MERT with a human-based metric seems like a daunting task, we describe a new metric, RYPT, which takes human judgments into account, but only requires human input to build a database that can be reused over and over again, hence eliminating the need for human input at tuning time. In this investigative study, we analyze the diversity (or lack thereof) of the candidates produced during MERT, we describe how this redundancy can be used to our advantage, and show that RYPT is a better predictor of translation quality than BLEU.
FBK @ IWSLT 2009
- Proceedings of IWSLT 2009, Tokyo, Japan
This paper reports on the participation of FBK at the IWSLT 2009 Evaluation. This year we worked on the Arabic-English and Turkish-English BTEC tasks with a special effort on linguistic preprocessing techniques involving morphological segmentation. In addition, we investigated the adaptation problem in the development of systems for the Chinese-English and English-Chinese challenge tasks; in particular, we explored different ways for clustering training data into topic or dialog-specific subsets: by producing (and combining) smaller but more focused models, we intended to make better use of the available training data, with the ultimate purpose of improving translation quality.
Using Tectogrammatical Alignment in Phrase-Based Machine Translation
In this paper, we describe an experiment whose goal is to improve the quality of machine translation. Phrase-based machine translation, which is the state-of-the-art in the field of statistical machine translation, learns its phrase tables from large parallel corpora, which have to be aligned on the word level. The most common word-alignment tool is GIZA++. It is very universal and language independent. In this text, we introduce a different approach: the tectogrammatical alignment. It works on content (autosemantic) words only, but on these words it widely outperforms GIZA++. The GIZA++ word-alignment can be therefore improved using tectogrammatical alignment and if we use this improved alignment for training phrase-based automatic translators, the translation quality also slightly increases.

