

The Alignment Template Approach to Statistical Machine Translation (2004)

by Franz Josef Och, Hermann Ney

Results 1 - 10 of 480

Hierarchical phrase-based translation

by David Chiang - Computational Linguistics, 2007
Abstract - Cited by 597 (9 self)
We present a statistical machine translation model that uses hierarchical phrases—phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system’s training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.

Citation Context

...on accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system. 1. Introduction The alignment template translation model (Och and Ney 2004) and related phrase-based models advanced the state of the art in machine translation by expanding the basic unit of translation from words to phrases, that is, substrings of potentially unlimited siz...
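The key mechanic in the abstract above, phrases that contain gaps which may be reordered between languages, can be sketched as a toy synchronous-rule substitution. The rule below is a schematic illustration (the `X1 de X2` pattern is a commonly cited example of this rule type, not output of the actual extraction procedure):

```python
# Toy sketch of applying one hierarchical (synchronous CFG) rule.
# The rule format and lexical items are illustrative assumptions,
# not the paper's actual grammar representation.

def apply_rule(rule, subtranslations):
    """Substitute already-translated subphrases into the gaps X1, X2, ...
    on the target side of a synchronous rule."""
    _src, tgt = rule
    out = tgt
    for i, sub in enumerate(subtranslations, start=1):
        out = out.replace(f"X{i}", sub)
    return out

# Source and target sides share the gaps but may reorder them; this
# reordering inside a phrase is what flat phrase pairs cannot express.
rule = ("X1 de X2", "the X2 that X1")  # schematic Chinese-to-English rule
print(apply_rule(rule, ["I saw yesterday", "man"]))
# -> the man that I saw yesterday
```

The reordering of X1 and X2 on the target side is the whole point: a flat phrase pair would have to memorize each filled-in instance separately.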

A hierarchical phrase-based model for statistical machine translation

by David Chiang - In ACL, 2005
Abstract - Cited by 491 (12 self)
We present a statistical phrase-based translation model that uses hierarchical phrases—phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system.

Citation Context

... as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system. 1 Introduction The alignment template translation model (Och and Ney, 2004) and related phrase-based models advanced the previous state of the art by moving from words to phrases as the basic unit of translation. Phrases, which can be any substring and not necessarily phras...

Scalable inference and training of context-rich syntactic translation models

by Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, Ignacio Thayer - In ACL, 2006
Abstract - Cited by 280 (21 self)
Statistical MT has made great progress in the last few years, but current translation models are weak on re-ordering and target language fluency. Syntactic approaches seek to remedy these problems. In this paper, we take the framework for acquiring multi-level syntactic translation rules of (Galley et al., 2004) from aligned tree-string pairs, and present two main extensions of their approach: first, instead of merely computing a single derivation that minimally explains a sentence pair, we construct a large number of derivations that include contextually richer rules, and account for multiple interpretations of unaligned words. Second, we propose probability estimates and a training procedure for weighting these rules. We contrast different approaches on real examples, show that our estimates based on multiple derivations favor phrasal re-orderings that are linguistically better motivated, and establish that our larger rules provide a 3.63 BLEU point increase over minimal rules.

Citation Context

...bility models estimated from a large number of derivations favor phrasal re-orderings that are linguistically well motivated. An empirical evaluation against a state-of-the-art SMT system similar to (Och and Ney, 2004) indicates positive prospects. Finally, we show that our contextually richer rules provide a 3.63 BLEU point increase over those of (Galley et al., 2004). 2 Inferring syntactic transformations We ass...

Better k-best parsing

by Liang Huang, 2005
Abstract - Cited by 190 (16 self)
We discuss the relevance of k-best parsing to recent applications in natural language processing, and develop efficient algorithms for k-best trees in the framework of hypergraph parsing. To demonstrate the efficiency, scalability and accuracy of these algorithms, we present experiments on Bikel’s implementation of Collins’ lexicalized PCFG model, and on Chiang’s CFG-based decoder for hierarchical phrase-based translation. We show in particular how the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications.
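The central operation in k-best hypergraph parsing is combining the sorted k-best lists of an item's children without enumerating every product. A simplified sketch of that lazy frontier expansion, reduced here to two sorted cost lists (the paper's algorithms generalize this over hyperedges and whole derivations):

```python
import heapq

def k_best_sums(a, b, k):
    """Return the k smallest values of a[i] + b[j] for two sorted cost
    lists, expanding candidates lazily from a frontier heap instead of
    enumerating all len(a) * len(b) combinations."""
    if not a or not b:
        return []
    heap = [(a[0] + b[0], 0, 0)]   # start at the cheapest combination
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        # The next-best candidates can only be the two frontier neighbors.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out

print(k_best_sums([1, 3, 5], [2, 4], 3))  # -> [3, 5, 5]
```

Only O(k) heap operations are needed per combination step, which is what makes k-best extraction cheap relative to 1-best parsing.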

Minimum Bayes-risk decoding for statistical machine translation

by Shankar Kumar, William Byrne - In Proceedings of HLT-NAACL, 2004
Abstract - Cited by 179 (16 self)
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse-trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
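The decision rule in the abstract, choose the hypothesis with minimum expected loss under the model's posterior, can be sketched over an n-best list. The per-position mismatch loss below is a toy stand-in for the BLEU-, alignment-, and parse-based losses the paper actually studies:

```python
def mbr_decode(nbest):
    """Minimum Bayes-risk decoding over an n-best list of
    (hypothesis, model score) pairs: return the hypothesis whose
    expected loss against the other hypotheses, weighted by their
    (renormalized) posterior mass, is smallest. Sketch only."""
    def loss(hyp, ref):
        # Toy loss (an assumption for illustration): fraction of
        # mismatched word positions, plus a length-difference penalty.
        h, r = hyp.split(), ref.split()
        n = max(len(h), len(r))
        mismatches = sum(1 for x, y in zip(h, r) if x != y)
        return (mismatches + abs(len(h) - len(r))) / n

    total = sum(p for _, p in nbest)
    best, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(p / total * loss(hyp, ref) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best

nbest = [("the cat sat", 0.5), ("a cat sat", 0.3), ("the cat sits", 0.2)]
print(mbr_decode(nbest))  # -> the cat sat
```

Note that the winner need not be the highest-scoring hypothesis: MBR rewards hypotheses that agree with much of the posterior mass, which is how it tunes output toward a chosen loss function.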

Large language models in machine translation

by Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean - In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007
Abstract - Cited by 178 (9 self)
This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases.

Citation Context

...oblem of machine translation is to automatically produce a target-language (e.g., English) translation ê. The mathematics of the problem were formalized by (Brown et al., 1993), and re-formulated by (Och and Ney, 2004) in terms of the optimization ê = argmax_e Σ_{m=1}^{M} λ_m h_m(e, f) (1), where {h_m(e, f)} is a set of M feature functions and {λ_m} a set of weights. One or more feature functions may be of the form h(e, f) = h...
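The Stupid Backoff scheme named in the abstract scores an n-gram by relative frequency when the n-gram was observed, and otherwise backs off to a shorter context with a fixed multiplicative penalty (0.4 in the paper), yielding scores rather than normalized probabilities. A minimal in-memory sketch (the paper's version runs distributed over corpora of up to 2 trillion tokens):

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff factor used in the paper

def train(corpus, max_order=3):
    """Count all n-grams up to max_order in a tokenized corpus."""
    counts = Counter()
    for sent in corpus:
        for n in range(1, max_order + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    total = sum(len(s) for s in corpus)  # token count for unigram scores
    return counts, total

def stupid_backoff(counts, total, context, word):
    """S(word | context): relative frequency if the full n-gram was
    seen, else ALPHA times the score under a shortened context.
    These are scores, not normalized probabilities."""
    ngram = tuple(context) + (word,)
    if counts[ngram] > 0:
        denom = counts[tuple(context)] if context else total
        return counts[ngram] / denom
    if context:
        return ALPHA * stupid_backoff(counts, total, context[1:], word)
    return 0.0  # unseen unigram; left unsmoothed in this sketch

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
counts, total = train(corpus)
print(stupid_backoff(counts, total, ["the", "cat"], "sat"))  # -> 0.5
print(stupid_backoff(counts, total, ["big", "cat"], "sat"))  # -> 0.2
```

Because no normalization or discounting statistics are needed, training reduces to counting, which is what makes the method cheap at web scale.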

Tree-to-String Alignment Template for Statistical Machine Translation

by Yang Liu, et al., 2006
Abstract - Cited by 173 (32 self)
We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source-side-parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

Citation Context

...at the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models. 1 Introduction Phrase-based translation models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004), which go beyond the original IBM translation models (Brown et al., 1993) by modeling translations of phrases rather than individual words, have been suggested to be the state-of-the-art in statist...
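The transform described above, parse the source and then recursively map tree fragments to target strings with reordering, can be sketched with invented toy rules. The template format here (one child reordering per node label, plus a word dictionary) is a deliberate simplification, not the paper's actual TAT representation or extraction:

```python
# Toy tree-to-string transform. Rules and the French example are
# illustrative assumptions, not the paper's extracted templates.

def translate(tree, templates):
    """tree is (label, children) for internal nodes or a word (str) leaf.
    Leaves are translated via a toy dictionary; internal nodes reorder
    their recursively translated children."""
    if isinstance(tree, str):
        return templates.get(tree, tree)  # lexical translation, or pass through
    label, children = tree
    subs = [translate(c, templates) for c in children]
    order = templates.get(label, list(range(len(subs))))  # child permutation
    return " ".join(subs[i] for i in order)

# French puts the adjective after the noun ("chat noir"); the NP rule
# swaps the children to produce English order.
templates = {"NP": [1, 0], "chat": "cat", "noir": "black"}
print(translate(("NP", ["chat", "noir"]), templates))  # -> black cat
```

The point the sketch illustrates is that reordering is attached to syntactic structure rather than memorized per surface phrase, which is what lets such models reorder at "both low and high levels."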

A Smorgasbord of Features for Statistical Machine Translation

by Franz Josef Och, Kenji Yamada, Alex Fraser, Daniel Gildea, Viren Jain, et al., 2004
Abstract - Cited by 150 (4 self)
We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature ...

Statistical syntax-directed translation with extended domain of locality

by Liang Huang - In Proc. AMTA, 2006
Abstract - Cited by 121 (14 self)
A syntax-directed translator first parses the source-language input into a parse tree, and then recursively converts the tree into a string in the target language. We model this conversion by an extended tree-to-string transducer that has multi-level trees on the source side, which gives our system more expressive power and flexibility. We also define a direct probability model and use a linear-time dynamic programming algorithm to search for the best derivation. The model is then extended to the general log-linear framework in order to rescore with other features like n-gram language models. We devise a simple yet effective algorithm to generate non-duplicate k-best translations for n-gram rescoring. Initial experimental results on English-to-Chinese translation are presented.

Citation Context

..., respectively, and get the completed Chinese string in (e). 2 Previous Work It is helpful to compare this approach with recent efforts in statistical MT. Phrase-based models (Koehn et al., 2003; Och and Ney, 2004) are good at learning local translations that are pairs of (consecutive) sub-strings, but often insufficient in modeling the reorderings of phrases themselves, especially between language pairs with ...

Improved statistical machine translation using paraphrases

by Chris Callison-Burch, Miles Osborne - In Proceedings of HLT/NAACL, 2006
Abstract - Cited by 117 (3 self)
Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.

Citation Context

...oblem of Coverage in SMT Statistical machine translation made considerable advances in translation quality with the introduction of phrase-based translation (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004). [Figure 1, plot omitted: Test Set Items with Translations (%) vs. Training Corpus Size (num words), for unigrams, bigrams, trigrams, and 4-grams] ...
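The scheme in the abstract can be sketched in a few lines: when a source phrase is missing from the phrase table, fall back to a paraphrase of it that is covered. The toy German entries below are invented for illustration; the paper induces its paraphrases automatically from parallel corpora with other languages:

```python
def translate_with_paraphrases(sentence, phrase_table, paraphrases):
    """Translate a list of source phrases, substituting a known
    paraphrase when a phrase has no phrase-table entry.
    Toy sketch: real systems add paraphrase entries to the table with
    a feature weight rather than substituting greedily like this."""
    out = []
    for phrase in sentence:
        if phrase in phrase_table:
            out.append(phrase_table[phrase])
            continue
        for para in paraphrases.get(phrase, []):
            if para in phrase_table:
                out.append(phrase_table[para])  # translate via paraphrase
                break
        else:
            out.append(phrase)  # still unknown: pass through untranslated
    return out

phrase_table = {"besorgt": "concerned"}          # toy learned phrase pair
paraphrases = {"beunruhigt": ["besorgt"]}        # toy induced paraphrase
print(translate_with_paraphrases(["beunruhigt"], phrase_table, paraphrases))
# -> ['concerned']
```

This is why the coverage numbers in the abstract matter: every unknown phrase that has a covered paraphrase stops being passed through untranslated.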


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University