Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
1997
Hierarchical phrase-based translation
Computational Linguistics, 2007
Cited by 209 (4 self)
We present a statistical machine translation model that uses hierarchical phrases (phrases that contain subphrases). The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system’s training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.
Learning Dependency Translation Models as Collections of Finite State Head Transducers
Computational Linguistics, 2000
Cited by 57 (3 self)
The paper defines weighted head transducers, finite-state machines that perform middle-out string transduction. These transducers are strictly more expressive than the special case of standard left-to-right finite-state transducers. Dependency transduction models are then defined as collections of weighted head transducers that are applied hierarchically. A dynamic programming search algorithm is described for finding the optimal transduction of an input string with respect to a dependency transduction model. A method for automatically training a dependency transduction model from a set of input-output example strings is presented. The method first searches for hierarchical alignments of the training examples guided by correlation statistics, and then constructs the transitions of head transducers that are consistent with these alignments. Experimental results are given for applying the training method to translation from English to Spanish and Japanese.
Statistical syntax-directed translation with extended domain of locality
In Proc. AMTA 2006, 2006
Cited by 50 (12 self)
A syntax-directed translator first parses the source-language input into a parse tree, and then recursively converts the tree into a string in the target language. We model this conversion by an extended tree-to-string transducer that has multi-level trees on the source side, which gives our system more expressive power and flexibility. We also define a direct probability model and use a linear-time dynamic programming algorithm to search for the best derivation. The model is then extended to the general log-linear framework in order to rescore with other features like n-gram language models. We devise a simple-yet-effective algorithm to generate non-duplicate k-best translations for n-gram rescoring. Initial experimental results on English-to-Chinese translation are presented.
Context-Free Languages and Push-Down Automata
Handbook of Formal Languages, 1997
Cited by 48 (0 self)
Contents: 1. Introduction; 1.1 Grammars; 1.2 Examples; 2. Systems of equations; 2.1 Systems; 2.2 Resolution; 2.3 Linear systems; 2.4 Parikh's theorem
Word sense disambiguation improves statistical machine translation
In 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), 2007
Cited by 45 (3 self)
Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-based MT system, Hiero. We show for the first time that integrating a WSD system improves the performance of a state-of-the-art statistical MT system on an actual translation task. Furthermore, the improvement is statistically significant.
A survey of statistical machine translation
2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Forest rescoring: Faster decoding with integrated language models
In ACL ’07, 2007
Cited by 30 (0 self)
Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model, which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve significant speed improvements, often by more than a factor of ten, over the conventional beam-search method at the same levels of search error and translation accuracy.
A discriminative latent variable model for statistical machine translation
In Proc. of the 46th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), 2008
Cited by 29 (2 self)
Large-scale discriminative machine translation promises to further the state of the art, but has failed to deliver convincing gains over current heuristic frequency count systems. We argue that a principal reason for this failure is not dealing with multiple, equivalent translations. We present a translation model which models derivations as a latent variable, in both training and decoding, and is fully discriminative and globally optimised. Results show that accounting for multiple derivations does indeed improve performance. Additionally, we show that regularisation is essential for maximum conditional likelihood models in order to avoid degenerate solutions.
Lexicalized Markov grammars for sentence compression
2007
Cited by 24 (1 self)
We present a sentence compression system based on synchronous context-free grammars (SCFG), following the successful noisy-channel approach of (Knight and Marcu, 2000). We define a head-driven Markovization formulation of SCFG deletion rules, which allows us to lexicalize probabilities of constituent deletions. We also use a robust approach for tree-to-tree alignment between arbitrary document-abstract parallel corpora, which lets us train lexicalized models with much more data than previous approaches relying exclusively on scarcely available document-compression corpora. Finally, we evaluate different Markovized models, and find that our selected best model is one that exploits head-modifier bilexicalization to accurately distinguish adjuncts from complements, and that produces sentences that were judged more grammatical than those generated by previous work.

