Results 1 - 10 of 597
A new string-to-dependency machine translation algorithm with a target dependency language model
In Proc. of ACL, 2008
"... In this paper, we propose a novel string-todependency algorithm for statistical machine translation. With this new framework, we employ a target dependency language model during decoding to exploit long distance word relations, which are unavailable with a traditional n-gram language model. Our expe ..."
Cited by 135 (7 self)
In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation. With this new framework, we employ a target dependency language model during decoding to exploit long-distance word relations, which are unavailable with a traditional n-gram language model. Our experiments show that the string-to-dependency decoder achieves a 1.48-point improvement in BLEU and a 2.53-point improvement in TER compared to a standard hierarchical string-to-string system on the NIST 04 Chinese-English evaluation set.
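As a sketch of the general idea (not necessarily the exact parameterization used in the paper), a target dependency language model scores head-dependent relations rather than surface n-grams, for example via a factorization over heads, in LaTeX notation:

    P_{dep}(T) \;=\; \prod_{h \in T} P(\ell_1 \dots \ell_m \mid h)\; P(r_1 \dots r_n \mid h)

where \ell_i and r_j are the left and right dependents of head h; each factor can in turn be decomposed Markov-style, conditioning each dependent on the head and its previously attached siblings.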
cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models
In Proceedings of ACL System Demonstrations, 2010
"... We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translat ..."
Cited by 134 (53 self)
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than those of comparable decoders.
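As an illustration of what a unified translation-forest representation looks like, here is a minimal Python sketch of a hypergraph with a Viterbi best-derivation pass. The class and function names are hypothetical and do not reflect cdec's actual C++ interfaces.

    # Minimal sketch of a packed translation forest (hypergraph) and a Viterbi
    # pass over it. Structure and names are illustrative, not cdec's real API.
    from dataclasses import dataclass, field

    @dataclass
    class Edge:
        head: int          # index of the node this hyperedge derives
        tails: list        # indices of antecedent nodes
        score: float       # local (log) score of applying this rule

    @dataclass
    class Node:
        edges: list = field(default_factory=list)  # incoming hyperedges

    def viterbi(nodes, goal):
        """Best (log-)score derivation of `goal`; assumes `nodes` is in
        topological order, so every tail precedes its head."""
        best = [float("-inf")] * len(nodes)
        for i, node in enumerate(nodes):
            if not node.edges:              # axiom / leaf node
                best[i] = 0.0
                continue
            for e in node.edges:
                s = e.score + sum(best[t] for t in e.tails)
                best[i] = max(best[i], s)
        return best[goal]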
Online Large-Margin Training of Syntactic and Structural Translation Features
"... Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative ..."
Cited by 124 (12 self)
Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.
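For readers unfamiliar with MIRA, the core single-constraint update can be sketched as follows; the paper itself uses a more elaborate k-best, parallelized variant, and the function below only illustrates the basic rule.

    # Minimal sketch of a single-constraint MIRA (margin infused relaxed
    # algorithm) weight update. Not the paper's exact training procedure.
    import numpy as np

    def mira_update(w, feat_oracle, feat_pred, loss, C=0.01):
        """Move w toward the oracle's features by the smallest step that
        satisfies the margin constraint, with step size capped at C."""
        delta = feat_oracle - feat_pred       # feature difference vector
        margin = loss - w.dot(delta)          # amount of margin violation
        norm_sq = delta.dot(delta)
        if margin <= 0 or norm_sq == 0:
            return w                          # constraint already satisfied
        alpha = min(C, margin / norm_sq)      # clipped step size
        return w + alpha * delta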
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability.
In Proceedings of the Association for Computational Linguistics, 2011
"... Abstract In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the t ..."
Cited by 124 (15 self)
In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability, an extraneous variable that is seldom controlled for, on experimental outcomes, and make recommendations for reporting results more accurately.
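One of the paper's practical recommendations is to run the optimizer several times with different random seeds and report variation across runs rather than a single score. A toy sketch of that bookkeeping (the scores below are placeholders, not data from the paper):

    # Report mean and standard deviation of the evaluation metric across
    # several optimizer replications instead of a single run's score.
    import statistics

    baseline_bleu = [36.1, 35.8, 36.4, 36.0, 35.9]   # hypothetical replications
    system_bleu   = [36.9, 36.5, 37.1, 36.7, 36.8]

    for name, scores in [("baseline", baseline_bleu), ("system", system_bleu)]:
        print(f"{name}: mean={statistics.mean(scores):.2f} "
              f"sd={statistics.stdev(scores):.2f} over {len(scores)} optimizer runs")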
11,001 new features for statistical machine translation
In North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT), 2009
"... We use the Margin Infused Relaxed Algorithm of Crammer et al. to add a large number of new features to two machine translation systems: the Hiero hierarchical phrasebased translation system and our syntax-based translation system. On a large-scale Chinese-English translation task, we obtain statisti ..."
Cited by 117 (2 self)
We use the Margin Infused Relaxed Algorithm of Crammer et al. to add a large number of new features to two machine translation systems: the Hiero hierarchical phrase-based translation system and our syntax-based translation system. On a large-scale Chinese-English translation task, we obtain statistically significant improvements of +1.5 Bleu and +1.1 Bleu, respectively. We analyze the impact of the new features and the performance of the learning algorithm.
Forest-based translation
In Proceedings of ACL-08: HLT, 2008
"... Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction being faster and simpler than its string-based counterpart. However, current tree-based systems suffer from a major drawback: they only use the 1-best pa ..."
Cited by 105 (24 self)
Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction, being faster and simpler than its string-based counterpart. However, current tree-based systems suffer from a major drawback: they only use the 1-best parse to direct the translation, which potentially introduces translation mistakes due to parsing errors. We propose a forest-based approach that translates a packed forest of exponentially many parses, which encodes many more alternatives than standard n-best lists. Large-scale experiments show an absolute improvement of 1.7 BLEU points over the 1-best baseline. This result is also 0.8 points higher than decoding with 30-best parses, and takes even less time.
Word sense disambiguation improves statistical machine translation
In 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), 2007
"... Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-bas ..."
Cited by 99 (5 self)
Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-based MT system, Hiero. We show for the first time that integrating a WSD system improves the performance of a state-of-the-art statistical MT system on an actual translation task. Furthermore, the improvement is statistically significant.
A survey of statistical machine translation
2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Cited by 93 (6 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Dependency parsing by belief propagation
In Proceedings of EMNLP, 2008
"... We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. E ..."
Cited by 84 (9 self)
We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. Even with second-order features or latent variables, which would make exact parsing considerably slower or NP-hard, BP needs only O(n³) time with a small constant factor. Furthermore, such features significantly improve parse accuracy over exact first-order methods. Incorporating additional features would increase the runtime additively rather than multiplicatively.
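For reference, the generic sum-product message updates that loopy BP iterates on a factor graph are, in LaTeX notation:

    m_{f \to v}(x_v) \;\propto\; \sum_{x_{N(f) \setminus v}} \psi_f\bigl(x_{N(f)}\bigr) \prod_{u \in N(f) \setminus v} m_{u \to f}(x_u)
    m_{v \to f}(x_v) \;\propto\; \prod_{g \in N(v) \setminus f} m_{g \to v}(x_v)

The paper specializes these updates to parsing-specific variables and global tree-constraint factors, which this generic sketch does not show.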
Forest rescoring: Faster decoding with integrated language models
In ACL ’07, 2007
"... Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on bo ..."
Cited by 80 (0 self)
Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model, which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve significant speed improvements, often by more than a factor of ten, over the conventional beam-search method at the same levels of search error and translation accuracy.
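The lazy k-best combination at the heart of this line of work can be sketched with a small priority queue over the grid formed by two sorted candidate lists. This shows only the mechanics; real forest rescoring also folds in a non-monotonic language-model cost, which this sketch omits.

    # Pop the best unexplored cell of the score grid formed by two sorted
    # candidate lists, then push its right and lower neighbors.
    import heapq

    def k_best_combinations(a, b, k):
        """a, b: lists of scores sorted best-first (here, lower = better)."""
        heap = [(a[0] + b[0], 0, 0)]
        seen = {(0, 0)}
        results = []
        while heap and len(results) < k:
            score, i, j = heapq.heappop(heap)
            results.append((score, i, j))
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                    seen.add((ni, nj))
                    heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
        return results

    # Example: k_best_combinations([1.0, 2.5, 4.0], [0.5, 1.5], k=3)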