Results 1 - 10 of 25
Hierarchical phrase-based translation with weighted finite state transducers and ...
In Proceedings of HLT/NAACL, 2010
"... In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs ra ..."
Abstract
-
Cited by 48 (20 self)
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
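To make the shallow-n idea concrete, here is a minimal sketch (not HiFST's actual implementation) of how a Hiero grammar with a single nonterminal X could be rewritten into a shallow-1 grammar, so that hierarchical rules may only dominate purely lexical rules; the rule representation and the nonterminal names X0/X1 are illustrative assumptions.

```python
# Sketch: restrict a Hiero grammar to shallow-1 recursion depth.
# Hierarchical rules are relabelled so their nonterminals (X0) can only
# expand to purely lexical rules, bounding the hierarchical nesting.
def to_shallow1(rules):
    """rules: list of (lhs, src_rhs, tgt_rhs) with 'X' as the only nonterminal."""
    shallow = []
    for lhs, src, tgt in rules:
        if any(sym == "X" for sym in src):
            # Hierarchical rule: applied at the top level (X1) and only
            # allowed to dominate purely lexical rules (X0).
            src0 = ["X0" if s == "X" else s for s in src]
            tgt0 = ["X0" if s == "X" else s for s in tgt]
            shallow.append(("X1", src0, tgt0))
        else:
            # Phrase pair: usable both as a leaf (X0) and on its own (X1).
            shallow.append(("X0", src, tgt))
            shallow.append(("X1", src, tgt))
    return shallow

rules = [
    ("X", ["ne", "X", "pas"], ["not", "X"]),   # hierarchical rule
    ("X", ["maison"], ["house"]),              # phrase pair
]
for r in to_shallow1(rules):
    print(r)
```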
Kriya – An end-to-end Hierarchical Phrase-based MT System
2012
"... This paper describes Kriya — a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chiang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY- ..."
Abstract
-
Cited by 11 (8 self)
This paper describes Kriya — a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chiang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of Hiero in the machine translation community, but Kriya offers the following novel contributions: (a) Grammar extraction in Kriya supports not only extraction of the full set of Hiero-style SCFG rules but also extraction of several types of compact rule sets, which lead to faster decoding for different language pairs without compromising the BLEU scores. Kriya currently supports extraction of compact SCFGs such as grammars with one non-terminal and grammar pruning based on certain rule patterns. (b) The Kriya decoder offers some unique improvements in the implementation of cube pruning, such as increasing diversity in the target-language n-best output and novel methods for language model (LM) integration. The Kriya decoder can take advantage of parallelization using a networked cluster. Kriya supports KenLM and SRILM for language model queries and exploits n-gram history states in KenLM. This paper also provides several experimental results which demonstrate that the translation quality of Kriya compares favourably to the Moses (Koehn et al., 2007) phrase-based system in several language pairs, while showing a substantial improvement for Chinese-English similar to Chiang (2007). We also quantify the model sizes for phrase-based and Hiero-style systems, apart from presenting experiments comparing variants of Hiero models.
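The cube pruning procedure that Kriya's decoder refines can be illustrated with a small sketch of its core step, lazy k-best combination with a priority queue; the additive toy costs stand in for real model scores, and the LM rescoring that makes the grid only approximately monotone is omitted.

```python
# Sketch: lazy k-best combination as in cube pruning. Pop the best corner
# of the score grid and push its right/down neighbours until k items are out.
import heapq

def cube_k_best(a, b, k):
    """a, b: ascending lists of costs; return the k cheapest sums a[i] + b[j]."""
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append((cost, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out

print(cube_k_best([1.0, 2.5, 4.0], [0.5, 1.5, 3.0], k=4))
```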
A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation
In Conference of the Association for Machine Translation in the Americas (AMTA), 2010
"... In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-ofthe-art hierarchical and conventional phrasebased statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply ..."
Abstract
-
Cited by 7 (5 self)
In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-of-the-art hierarchical and conventional phrase-based statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply certain restrictions to discard some of the less useful information. We show how these restrictions facilitate the training of the extended lexicon models. We finally evaluate systems that incorporate both types of models with different restrictions on a large-scale translation task for the Arabic-English language pair. Our results suggest that extended lexicon models can be substantially reduced in size while still giving clear improvements in translation performance.
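As a rough illustration of the trigger-based (triplet) lexicon scoring mentioned above, the sketch below scores each target word by averaging triplet probabilities over all source word pairs; the probability table and the flooring constant are toy assumptions, not the trained models from the paper.

```python
# Sketch: sentence-level scoring with a triplet lexicon p(e | f, f').
from itertools import combinations
import math

def triplet_score(target, source, p_triplet, floor=1e-6):
    """Return the log score of a target sentence under a triplet lexicon."""
    pairs = list(combinations(source, 2))
    log_score = 0.0
    for e in target:
        # Average the triplet probability of e over all source trigger pairs.
        prob = sum(p_triplet.get((e, f1, f2), 0.0) for f1, f2 in pairs)
        log_score += math.log(max(prob / max(len(pairs), 1), floor))
    return log_score

p = {("house", "la", "maison"): 0.6, ("the", "la", "maison"): 0.3}
print(triplet_score(["the", "house"], ["la", "maison"], p))
```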
Discriminative reordering extensions for hierarchical phrase-based machine translation
In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT), 2012
"... In this paper, we propose novel extensions of hierarchical phrase-based systems with a discriminative lexicalized reordering model. We compare different feature sets for the discriminative reordering model and investigate combinations with three types of non-lexicalized reordering rules which are ad ..."
Abstract
-
Cited by 7 (5 self)
In this paper, we propose novel extensions of hierarchical phrase-based systems with a discriminative lexicalized reordering model. We compare different feature sets for the discriminative reordering model and investigate combinations with three types of non-lexicalized reordering rules which are added to the hierarchical grammar in order to allow for more reordering flexibility during decoding. All extensions are evaluated in standard hierarchical setups as well as in setups where the hierarchical recursion depth is restricted. We achieve improvements of up to +1.2% BLEU on a large-scale Chinese→English translation task.
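A discriminative lexicalized reordering model of the kind described here can be pictured as a sparse linear classifier over orientation classes; the feature templates, orientation labels, and weights in this sketch are illustrative assumptions rather than the paper's actual feature sets.

```python
# Sketch: score reordering orientations with a linear model over sparse
# features of the phrase boundary and its source context.
def orientation_features(src_phrase, tgt_phrase, prev_src_word):
    return [
        "src_first=" + src_phrase[0],
        "src_last=" + src_phrase[-1],
        "tgt_first=" + tgt_phrase[0],
        "prev_src=" + prev_src_word,
    ]

def score_orientation(features, weights, orientation):
    # weights: dict mapping (feature, orientation) -> weight
    return sum(weights.get((f, orientation), 0.0) for f in features)

weights = {("src_first=ne", "swap"): 1.2, ("prev_src=il", "monotone"): 0.4}
feats = orientation_features(["ne", "va", "pas"], ["does", "not", "go"], "il")
for o in ("monotone", "swap", "discontinuous"):
    print(o, score_orientation(feats, weights, o))
```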
Bayesian extraction of minimal SCFG rules for hierarchical phrase-based translation
In Proceedings of the Sixth Workshop on SMT, 2011
"... We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phras ..."
Abstract
-
Cited by 7 (5 self)
We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted by sampling from the space of all possible hierarchical rules; additionally, an informed prior based on the lexical alignment probabilities biases the grammar toward high-quality rules, leading to improved generalization and the automatic identification of commonly re-used rules. We show that our Bayesian model is able to extract a minimal set of hierarchical phrase rules without impacting the translation quality as measured by the BLEU score.
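The flavour of such a non-parametric model can be sketched with a Chinese-restaurant-process rule probability whose base distribution P0 is built from lexical alignment probabilities; the rule encoding, the toy P0, and the concentration parameter alpha below are assumptions for illustration only.

```python
# Sketch: CRP-style rule probability with an informed, lexically derived base
# distribution, favouring frequently re-used and well-aligned rules.
def p0_lexical(rule, p_lex):
    """Base probability of a rule from word-level translation probabilities."""
    src, tgt = rule
    prob = 1.0
    for e in tgt:
        prob *= max(p_lex.get((e, f), 1e-6) for f in src) if src else 1e-6
    return prob

def crp_prob(rule, counts, total, p_lex, alpha=1.0):
    """Chinese-restaurant-process probability of (re)generating a rule."""
    return (counts.get(rule, 0) + alpha * p0_lexical(rule, p_lex)) / (total + alpha)

p_lex = {("house", "maison"): 0.7, ("the", "la"): 0.5}
counts = {(("la", "maison"), ("the", "house")): 3}
print(crp_prob((("la", "maison"), ("the", "house")), counts, total=10, p_lex=p_lex))
```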
Lexicon Models for Hierarchical Phrase-Based Machine Translation
"... In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 l ..."
Abstract
-
Cited by 6 (6 self)
In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a discriminatively trained word lexicon model [4]. We explore source-to-target models with phrase-level as well as sentence-level scoring and target-to-source models with scoring on the phrase level only. For the first two types of lexicon models, we compare several scoring variants. All models are used during search, i.e. they are incorporated directly into the log-linear model combination of the decoder. Phrase table smoothing with triplet lexicon models and with discriminative word lexicons are novel contributions. We also propose a new regularization technique for IBM model 1 by means of the Kullback-Leibler divergence with the empirical unigram distribution as regularization term. Experiments are carried out on the large-scale NIST Chinese→English translation task and on the English→French and Arabic→English IWSLT TED tasks. For Chinese→English and English→French, we obtain the best results by using the discriminative word lexicon to smooth our phrase tables.
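For instance, the IBM model 1 lexicon applied at the phrase level (one of the model types compared) can be sketched as follows; the toy probability table and the stand-in NULL-word constant are assumptions.

```python
# Sketch: phrase-level IBM model 1 scoring. Each target word of the phrase
# sums its translation probabilities over the source words of the phrase
# (plus an empty-word term), normalised by the source phrase length.
import math

def model1_phrase_logscore(tgt_phrase, src_phrase, p_lex, p_null=1e-4):
    """log p(tgt | src) under IBM model 1 restricted to the phrase pair."""
    j = len(src_phrase)
    log_p = 0.0
    for e in tgt_phrase:
        s = p_null + sum(p_lex.get((e, f), 1e-7) for f in src_phrase)
        log_p += math.log(s / (j + 1))
    return log_p

p_lex = {("house", "maison"): 0.7, ("the", "la"): 0.5}
print(model1_phrase_logscore(["the", "house"], ["la", "maison"], p_lex))
```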
Hierarchical Phrase-based Translation Grammars Extracted from Alignment Posterior Probabilities
"... We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distrib ..."
Abstract
-
Cited by 6 (1 self)
We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.
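The link-selection step implied by extraction from posteriors can be sketched as thresholding the HMM posterior matrix and passing the surviving links to the usual extraction heuristics; the toy posterior matrix and the threshold value are assumptions.

```python
# Sketch: keep every word pair whose alignment posterior exceeds a threshold,
# instead of committing to a single Viterbi alignment.
def links_from_posteriors(posterior, threshold=0.3):
    """posterior[j][i] = p(target j aligns to source i); return selected links."""
    links = []
    for j, row in enumerate(posterior):
        for i, p in enumerate(row):
            if p >= threshold:
                links.append((j, i, p))
    return links

posterior = [
    [0.80, 0.15],   # target word 0
    [0.05, 0.60],   # target word 1
    [0.35, 0.40],   # target word 2: two plausible links survive
]
print(links_from_posteriors(posterior))
```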
Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation
"... In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora. We explore different ways of using ..."
Abstract
-
Cited by 4 (3 self)
In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora. We explore different ways of using this additional data to improve our system. Our results show that integrating a second translation model with only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach. The translation performance matches the result we achieve with a joint extraction on all training bitexts, while the system is kept smaller due to a considerably lower overall number of phrases.
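At the model level, the integration of the synthetic-bitext phrases can be pictured as a second, separately weighted phrase-table feature group in the log-linear model; the feature names and weights below are purely illustrative assumptions.

```python
# Sketch: a second translation model enters the log-linear combination as
# its own feature group instead of being merged into the baseline table.
def loglinear_score(features, weights):
    return sum(weights[name] * value for name, value in features.items())

hypothesis_features = {
    "tm_baseline_logprob": -4.2,     # phrase scores from the human bitext
    "tm_synthetic_logprob": -3.1,    # phrase scores from the automatic bitext
    "lm_logprob": -12.5,
    "word_penalty": 7.0,
}
weights = {
    "tm_baseline_logprob": 0.25,
    "tm_synthetic_logprob": 0.10,    # tuned separately from the baseline model
    "lm_logprob": 0.30,
    "word_penalty": -0.05,
}
print(loglinear_score(hypothesis_features, weights))
```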
A performance study of cube pruning for large-scale hierarchical machine translation
In Proceedings of the NAACL 7th Workshop on Syntax, Semantics and Structure in Statistical Translation, 2013
"... Abstract In this paper, we empirically investigate the impact of critical configuration parameters in the popular cube pruning algorithm for decoding in hierarchical statistical machine translation. Specifically, we study how the choice of the k-best generation size affects translation quality and ..."
Abstract
-
Cited by 3 (3 self)
In this paper, we empirically investigate the impact of critical configuration parameters in the popular cube pruning algorithm for decoding in hierarchical statistical machine translation. Specifically, we study how the choice of the k-best generation size affects translation quality and resource requirements in hierarchical search. We furthermore examine the influence of two different granularities of hypothesis recombination. Our experiments are conducted on the large-scale Chinese→English and Arabic→English NIST translation tasks. Besides standard hierarchical grammars, we also explore search with restricted recursion depth of hierarchical rules based on shallow-1 grammars.
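The two recombination granularities can be sketched as two different recombination keys over hypotheses in a chart cell: the full target string versus only the n-gram language model state; the data structures below are illustrative assumptions.

```python
# Sketch: merge hypotheses in a cell either only when their target strings are
# identical (fine) or whenever their n-gram LM states match (coarse),
# keeping the cheaper hypothesis in each equivalence class.
def recombine(hypotheses, n=3, coarse=True):
    """hypotheses: list of (target_words, cost); keep the best per key."""
    best = {}
    for words, cost in hypotheses:
        # LM state: only the last n-1 words matter for future LM scores.
        key = tuple(words[-(n - 1):]) if coarse else tuple(words)
        if key not in best or cost < best[key][1]:
            best[key] = (words, cost)
    return list(best.values())

hyps = [(["the", "red", "house"], 4.1), (["a", "red", "house"], 4.3)]
print(recombine(hyps, coarse=True))   # merged: same LM state ("red", "house")
print(recombine(hyps, coarse=False))  # kept apart: different target strings
```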
Improved features and grammar selection for syntax-based MT
In Proceedings of the Workshop on Statistical Machine Translation (WMT), 2010
"... Abstract We present the Carnegie Mellon University Stat-XFER group submission to the WMT 2010 shared translation task. Updates to our syntax-based SMT system mainly fell in the areas of new feature formulations in the translation model and improved filtering of SCFG rules. Compared to our WMT 2009 ..."
Abstract
-
Cited by 3 (2 self)
We present the Carnegie Mellon University Stat-XFER group submission to the WMT 2010 shared translation task. Updates to our syntax-based SMT system mainly fell in the areas of new feature formulations in the translation model and improved filtering of SCFG rules. Compared to our WMT 2009 submission, we report a gain of 1.73 BLEU by using the new features and decoding environment, and a gain of up to 0.52 BLEU from improved grammar selection.