A survey of statistical machine translation
2007
Cited by 52 (4 self)

Abstract:
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Hierarchical phrase-based translation with weighted finite state transducers and . . .
 In Proceedings of HLT/NAACL
, 2010
Cited by 27 (12 self)

Abstract:
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
An efficient two-pass approach to synchronous-CFG driven statistical MT
 In Proc. of HLT-NAACL
, 2007
Cited by 21 (4 self)

Abstract:
We present an efficient, novel two-pass approach to mitigate the computational impact resulting from online intersection of an n-gram language model (LM) and a probabilistic synchronous context-free grammar (PSCFG) for statistical machine translation. In first-pass CYK-style decoding, we consider first-best chart item approximations, generating a hypergraph of sentence-spanning target language derivations. In the second stage, we instantiate specific alternative derivations from this hypergraph, using the LM to drive this search process, recovering from search errors made in the first pass. Model search errors in our approach are comparable to those made by the state-of-the-art “Cube Pruning” approach in (Chiang, 2007) under comparable pruning conditions evaluated on both hierarchical and syntax-based grammars.
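Cube pruning, the baseline this abstract compares against, combines two sorted lists of partial hypotheses by exploring a frontier of index pairs with a priority queue. A minimal sketch follows; costs are taken as additive and monotonic (which LM-integrated decoding in practice violates, hence "pruning"), and `combine` is a stand-in for the model score, not the decoder's actual scoring function:

```python
import heapq

def cube_prune(xs, ys, combine, k):
    """Return the k best combined costs of two sorted hypothesis lists,
    exploring the (i, j) grid lazily from the top-left corner."""
    seen = {(0, 0)}
    heap = [(combine(xs[0], ys[0]), 0, 0)]
    best = []
    while heap and len(best) < k:
        cost, i, j = heapq.heappop(heap)
        best.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):   # frontier neighbours
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (combine(xs[ni], ys[nj]), ni, nj))
    return best
```

With non-monotonic LM costs the popped order is only approximately best-first, which is exactly the source of the search errors both papers measure.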
Rule filtering by pattern for efficient hierarchical translation
 In Proceedings of the EACL
, 2009
Cited by 17 (3 self)

Abstract:
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.
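The grouping of rules by source-side pattern can be illustrated with a toy function that collapses each maximal run of terminals to a single placeholder `w` and keeps the nonterminal slots; the `w`/`X` notation here is illustrative, not the paper's code:

```python
def rule_pattern(source_side):
    """Collapse a hierarchical rule's source side to its pattern:
    runs of terminals become one 'w', nonterminal tokens (here,
    anything starting with 'X') are kept as class markers."""
    pat = []
    for tok in source_side:
        if tok.startswith("X"):
            pat.append(tok)
        elif not pat or pat[-1] != "w":
            pat.append("w")
    return tuple(pat)
```

All rules sharing a pattern such as `("X1", "w", "X2")` form one class, which can then be kept or filtered as a unit.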
Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
Cited by 14 (0 self)

Abstract:
Language Weaver, Inc.

This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.

1. Background
Syntactic methods have recently proven useful in statistical machine translation (SMT). In this article, we explore different ways of exploiting the structure of bilingual material for syntax-based SMT. In particular, we ask what kinds of tree structures, tree labels, and word alignments are best suited for improving end-to-end translation accuracy. We begin with structures from standard parsing and alignment tools, then use the EM algorithm to revise these structures in light of the translation task. We report an overall +1.48 BLEU improvement on a standard Chinese-to-English test.
Syntactic re-alignment models for machine translation
 In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007)
, 2007
Cited by 13 (2 self)

Abstract:
We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation model than commonly-used word alignment models. This leads to extraction of more useful linguistic patterns and improved BLEU scores on translation experiments in Chinese and Arabic.

1 Methods of statistical MT
Roughly speaking, there are two paths commonly taken in statistical machine translation (Figure 1). The idealistic path uses an unsupervised learning algorithm such as EM (Dempster et al., 1977) to learn parameters for some proposed translation model from a bitext training corpus, and then directly translates using the weighted model. Some examples of the idealistic approach are the direct IBM word model (Berger et al., 1994; Germann et al., 2001), the phrase-based approach of Marcu and Wong (2002), and the syntax approaches of Wu (1996) and Yamada and Knight (2001). Idealistic approaches are conceptually simple and thus easy to relate to observed phenomena. However, as more parameters are added to the model the idealistic approach has not scaled well, for it is increasingly difficult to incorporate large amounts of training data efficiently over an increasingly large search space. Additionally, the EM procedure has a tendency to overfit its training data when the input units have varying explanatory powers, such as variable-size phrases or variable-height trees.
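The canonical instance of the idealistic EM path is IBM Model 1 training. A self-contained toy sketch of that classic algorithm (not the syntactic alignment model of this paper, and without the NULL word):

```python
from collections import defaultdict

def ibm1(bitext, iterations=10):
    """EM for IBM Model 1 word-translation probabilities t(f|e).
    bitext: list of (foreign_words, english_words) sentence pairs."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform start
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # normalizer per e
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # alignment posterior norm
                for e in es:
                    c = t[(f, e)] / z             # E-step: soft counts
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:                        # M-step: renormalize
            t[(f, e)] = count[(f, e)] / total[e]
    return t
```

On a textbook toy corpus (das Haus / the house, das Buch / the book, ein Buch / a book), a few iterations already concentrate probability on the correct word pairs, illustrating why the approach is conceptually appealing before the scaling issues described above set in.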
Factorization of synchronous context-free grammars in linear time
 In NAACL Workshop on Syntax and Structure in Statistical Translation (SSST)
, 2007
Cited by 9 (5 self)

Abstract:
Factoring a Synchronous Context-Free Grammar into an equivalent grammar with a smaller number of nonterminals in each rule enables synchronous parsing algorithms of lower complexity. The problem can be formalized as searching for the tree-decomposition of a given permutation with the minimal branching factor. In this paper, by modifying the algorithm of Uno and Yagiura (2000) for the closely related problem of finding all common intervals of two permutations, we achieve a linear-time algorithm for the permutation factorization problem. We also use the algorithm to analyze the maximum SCFG rule length needed to cover hand-aligned data from various language pairs.
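The block structure being factored can be seen with a naive O(n^2) enumeration of a permutation's intervals, i.e. spans whose values form a contiguous range; the contribution described in the abstract is doing this, and the resulting tree decomposition, in linear time:

```python
def blocks(perm):
    """All spans perm[i:j] whose values are a contiguous range, by the
    textbook O(n^2) scan: the span is an interval iff max - min == j - i - 1."""
    n = len(perm)
    found = []
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i, n):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i:
                found.append((i, j + 1))
    return found
```

For the permutation (2, 4, 1, 3) (zero-indexed `[1, 3, 0, 2]`) the only intervals are the four singletons and the whole span, so no SCFG rule with that reordering can be factored into smaller rules; its minimal branching factor is 4.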
Efficient Parsing for Transducer Grammars
Cited by 9 (1 self)

Abstract:
The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how the number of intermediate symbols generated by this transformation can be substantially reduced through binarization choices. Finally, we describe a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar. In all, parsing time reduces by 81%. We also describe a coarse-to-fine pruning scheme for forest-based language model reranking that allows a 100-fold increase in beam size while reducing decoding time. The resulting translations improve by 1.3 BLEU.
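The binarization step mentioned above can be sketched as a left-branching transformation that introduces intermediate symbols named after the prefix they cover, so rules with a common prefix share symbols; this is an illustrative sketch, not the paper's transformation, which chooses among binarizations to maximize such sharing:

```python
def binarize(lhs, rhs):
    """Left-branching binarization of an n-ary CFG rule into binary
    rules.  Intermediate symbols are named after the prefix they
    cover, so rules sharing a prefix share intermediate symbols."""
    if len(rhs) <= 2:
        return [(lhs, list(rhs))]
    rules = []
    cur = rhs[0]
    for sym in rhs[1:-1]:
        inter = cur + "+" + sym            # symbol for the covered prefix
        rules.append((inter, [cur, sym]))
        cur = inter
    rules.append((lhs, [cur, rhs[-1]]))
    return rules
```

For example, `VP -> V NP PP ADV` and `S -> V NP X` both produce the intermediate rule `V+NP -> V NP`, so it is stored once.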
Machine Translation by Pattern Matching
2008
Cited by 3 (0 self)

Abstract:
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively.
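The fast pattern matching this line of work depends on is typically done with a suffix array over the tokenized training text, so every occurrence of a source phrase can be located on demand instead of precomputing a rule table. A minimal sketch with naive O(n^2 log n) construction and binary-search lookup (real systems use linear-time construction):

```python
def suffix_array(tokens):
    """Indices of all suffixes of `tokens`, in lexicographic order.
    Naive construction; fine for a sketch."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find(tokens, sa, pattern):
    """Start positions of every occurrence of `pattern`, found by two
    binary searches (lower and upper bound) over the suffix array."""
    m = len(pattern)

    def pref(i):                      # length-m prefix of suffix i
        return tokens[i:i + m]

    lo, hi = 0, len(sa)
    while lo < hi:                    # lower bound
        mid = (lo + hi) // 2
        if pref(sa[mid]) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                    # upper bound
        mid = (lo + hi) // 2
        if pref(sa[mid]) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[i] for i in range(start, lo))
```

Each lookup costs O(m log n) comparisons against the indexed text, which is what makes computing rules and parameters on demand practical.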
Parsing and Translation Algorithms Based on Weighted Extended Tree Transducers
Cited by 3 (3 self)

Abstract:
This paper proposes a uniform framework for the development of parsing and translation algorithms for weighted extended (top-down) tree transducers and input strings. The asymptotic time complexity of these algorithms can be improved in practice by exploiting an algorithm for rule factorization in the above transducers.