Results 1–10 of 40
Hierarchical phrase-based translation
Computational Linguistics, 2007
Abstract

Cited by 590 (9 self)
We present a statistical machine translation model that uses hierarchical phrases (phrases that contain subphrases). The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system's training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.
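The idea of phrases that contain subphrases can be illustrated with a single toy rule in the Hiero style; the rule, the token `de`, and the function below are illustrative assumptions, not the paper's implementation:

```python
# Toy hierarchical (gappy) phrase rule: X -> <X1 de X2, X2 of X1>,
# i.e. a Chinese pattern "A de B" maps to English "B of A".
# The sub-phrases X1 and X2 swap order on the target side.

def apply_rule(src_tokens):
    """Apply the rule X -> <X1 de X2, X2 of X1> once, if it matches."""
    if "de" not in src_tokens:
        return None
    i = src_tokens.index("de")
    x1, x2 = src_tokens[:i], src_tokens[i + 1:]
    if not x1 or not x2:
        return None  # both sub-phrases must be non-empty
    return x2 + ["of"] + x1

print(apply_rule("Aozhou de shoudu".split()))  # → ['shoudu', 'of', 'Aozhou']
```

A full system learns thousands of such rules from word-aligned parallel text and applies them recursively during chart decoding.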
A survey of statistical machine translation
2007
Abstract

Cited by 93 (6 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
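The mathematical-modeling subproblem the survey covers is conventionally introduced through the noisy-channel decomposition (a standard formulation in this literature, not specific to this survey): for an input sentence $f$, the decoder seeks

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} P(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e}
        \underbrace{P(f \mid e)}_{\text{translation model}}\,
        \underbrace{P(e)}_{\text{language model}}
```

Parameter estimation fits the two component models from data, and decoding performs the arg max search.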
HMM word and phrase alignment for statistical machine translation
In Proceedings of HLT-EMNLP, 2005
Abstract

Cited by 52 (10 self)
HMM-based models are developed for the alignment of words and phrases in bitext. The models are formulated so that alignment and parameter estimation can be performed efficiently. We find that Chinese-English word alignment performance is comparable to that of IBM Model 4 even over large training bitexts. Phrase pairs extracted from word alignments generated under the model can also be used for phrase-based translation, and in Chinese-to-English and Arabic-to-English translation, performance is comparable to systems based on Model 4 alignments. Direct phrase pair induction under the model is described and shown to improve translation performance.
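The kind of model described above can be sketched with the forward algorithm summed over alignment positions; the toy lexical table and uniform transition distribution below are assumptions for illustration, not the paper's parameterization:

```python
# Minimal HMM word-alignment sketch. An alignment a assigns each source
# word f_j to a target position a_j, and
#   P(f, a | e) = prod_j p_trans(a_j | a_{j-1}) * p_emit(f_j | e_{a_j}).
# The forward algorithm sums this over all alignments a.

def hmm_alignment_likelihood(src, tgt, p_trans, p_emit):
    """Forward algorithm: P(src | tgt), marginalized over alignments."""
    n = len(tgt)
    # alpha[i] = prob of emitting src so far with last word aligned to tgt[i]
    alpha = [p_emit(src[0], tgt[i]) / n for i in range(n)]  # uniform start
    for f in src[1:]:
        alpha = [
            sum(alpha[ip] * p_trans(i, ip, n) for ip in range(n))
            * p_emit(f, tgt[i])
            for i in range(n)
        ]
    return sum(alpha)

# Toy distributions (hypothetical): a tiny lexical table, uniform jumps.
lex = {("maison", "house"): 0.9, ("la", "the"): 0.8}
p_emit = lambda f, e: lex.get((f, e), 0.01)
p_trans = lambda i, ip, n: 1.0 / n

p = hmm_alignment_likelihood(["la", "maison"], ["the", "house"],
                             p_trans, p_emit)
```

In practice the transition distribution depends on jump width rather than being uniform, which is what lets the HMM prefer locally monotone alignments.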
Hierarchical phrase-based translation with weighted finite state transducers and …
In Proceedings of HLT/NAACL, 2010
Abstract

Cited by 45 (18 self)
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
Context-free reordering, finite-state translation
In Proc. of HLT-NAACL, 2010
Abstract

Cited by 11 (2 self)
We describe a class of translation model in which a set of input variants encoded as a context-free forest is translated using a finite-state translation model. The forest structure of the input is well-suited to representing word order alternatives, making it straightforward to model translation as a two-step process: (1) tree-based source reordering and (2) phrase transduction. By treating the reordering process as a latent variable in a probabilistic translation model, we can learn a long-range source reordering model without example reordered sentences, which are problematic to construct. The resulting model has state-of-the-art translation performance, uses linguistically motivated features to effectively model long-range reordering, and is significantly smaller than a comparable hierarchical phrase-based translation model.
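The two-step process (reorder the source, then transduce phrases) can be sketched as follows; the phrase table, the single-adjacent-swap variant generator, and the toy scoring function are all hypothetical simplifications of the forest-based model:

```python
# Step 1: enumerate source word-order variants (here: identity plus one
# adjacent swap per position, standing in for a context-free forest).
# Step 2: translate each variant monotonically with a phrase table, then
# keep the candidate a target-side scorer prefers.

phrase_table = {("la",): ("the",), ("maison",): ("house",),
                ("bleue",): ("blue",)}

def reorder_variants(tokens):
    yield tuple(tokens)
    for i in range(len(tokens) - 1):
        t = list(tokens)
        t[i], t[i + 1] = t[i + 1], t[i]
        yield tuple(t)

def translate_monotone(tokens):
    out = []
    for w in tokens:
        tgt = phrase_table.get((w,))
        if tgt is None:
            return None  # uncovered word: variant cannot be translated
        out.extend(tgt)
    return out

def translate(tokens, score):
    cands = [translate_monotone(v) for v in reorder_variants(tokens)]
    return max((c for c in cands if c is not None), key=score)

# Toy target-side scorer: rewards English noun-adjective order.
score = lambda e: 1.0 if e == ["the", "blue", "house"] else 0.0
best = translate("la maison bleue".split(), score)
```

In the actual model the variant set is a packed forest and the reordering choice is a latent variable learned jointly with translation, rather than an enumerated list scored after the fact.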
Learning a discriminative weighted finite-state transducer for speech recognition
IEEE Transactions on Audio, Speech, and Language Processing, 2011
Abstract

Cited by 8 (1 self)
Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically assembled or composed from several components (the language model, the pronunciation mapping, and the acoustic model) which are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, whose linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs) which can be applied as corrections to the input and the output side of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.5%–1.7% absolute. Through a series of experiments, we analyze the contributions from and interactions between acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification. Index Terms: acoustic modeling, discriminative learning, duration modeling, finite-state transducers, language modeling, learning finite-state transducers.
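The core training idea (linear weights on n-gram features of a path, learned with a discriminative criterion) can be sketched with a simple perceptron update; the feature extraction and data below are toy assumptions, not the paper's lattice-based training procedure:

```python
# Score a path as its base WFST weight plus learned weights on the path's
# n-gram features; a perceptron update pushes weights toward the reference
# path's features and away from the current best hypothesis.

def ngram_feats(seq, n=2):
    feats = {}
    for i in range(len(seq) - n + 1):
        key = tuple(seq[i:i + n])
        feats[key] = feats.get(key, 0) + 1
    return feats

def score(path, base, w):
    return base + sum(w.get(f, 0.0) * c
                      for f, c in ngram_feats(path).items())

def perceptron_update(w, gold, hyp, lr=1.0):
    """Reward gold-path n-grams, penalize hypothesis n-grams."""
    for f, c in ngram_feats(gold).items():
        w[f] = w.get(f, 0.0) + lr * c
    for f, c in ngram_feats(hyp).items():
        w[f] = w.get(f, 0.0) - lr * c

w = {}
gold = ["the", "cat", "sat"]
hyp = ["the", "cat", "sad"]
perceptron_update(w, gold, hyp)  # shared n-grams cancel out
```

After one update the shared bigram ("the", "cat") has weight zero, while ("cat", "sat") is rewarded and ("cat", "sad") penalized, so the reranker now prefers the reference.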
A pattern recognition approach to machine translation: monotone and non-monotone phrase-based statistical models
Abstract

Cited by 7 (0 self)
Large-scale statistical machine translation with weighted finite state transducers
In Post-Proceedings of the 7th International Workshop on Finite-State Methods and Natural Language Processing (FSMNLP 2008), 2009
Abstract

Cited by 6 (3 self)
The statistical machine translation system follows a generative model of translation and is implemented by the composition of component models of translation and movement realised as Weighted Finite-State Transducers. Our flexible architecture requires no special-purpose decoder and readily handles the large-scale natural language processing demands of state-of-the-art machine translation systems. In this paper we describe the CUED system's participation in the NIST 2008 Arabic-English machine translation evaluation task. Key words: statistical machine translation, weighted finite-state transducers, large-scale natural language processing, finite-state grammars.
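Composition of weighted components is the central operation such an architecture relies on. A minimal sketch over the tropical semiring (weights add along a path, the minimum-weight path wins), with transducers reduced to finite weighted string relations; all names and numbers are illustrative, not the CUED implementation:

```python
# Compose two weighted relations {(x, y): w1} and {(y, z): w2} into
# {(x, z): w1 + w2}, keeping the minimum weight per (x, z) pair.

def compose(t1, t2):
    out = {}
    for (x, y), w1 in t1.items():
        for (y2, z), w2 in t2.items():
            if y == y2:
                w = w1 + w2  # tropical semiring: weights add
                if w < out.get((x, z), float("inf")):
                    out[(x, z)] = w
    return out

# A translation component followed by a target-scoring component.
translation = {("la maison", "the house"): 1.0,
               ("la maison", "house the"): 2.5}
target_scores = {("the house", "the house"): 0.2,
                 ("house the", "house the"): 3.0}
full = compose(translation, target_scores)
```

Chaining `compose` over all component models yields the full search space with no special-purpose decoder, which is exactly the flexibility the abstract emphasizes; real toolkits operate on transducer state graphs rather than enumerated relations.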
Phrasal segmentation models for statistical machine translation
In Coling 2008: Companion volume: Posters and Demonstrations, 2008
Abstract

Cited by 5 (1 self)
Phrasal segmentation models define a mapping from the words of a sentence to sequences of translatable phrases. We discuss the estimation of these models from large quantities of monolingual training text and describe their realization as weighted finite-state transducers for incorporation into phrase-based statistical machine translation systems. Results are reported on the NIST Arabic-English translation tasks showing significant complementary gains in BLEU score with large 5-gram and 6-gram language models.
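A phrasal segmentation model of this general kind can be sketched as a Viterbi search over split points; the probability table and maximum phrase length below are toy assumptions, not the estimated models described in the abstract:

```python
# Find the segmentation of a sentence into known phrases that maximizes
# the sum of phrase log-probabilities, by dynamic programming over
# split points, then backtrace.

import math

def best_segmentation(words, phrase_logp, max_len=3):
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)  # (score, backpointer)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            p = phrase_logp.get(tuple(words[i:j]))
            if p is not None and best[i][0] + p > best[j][0]:
                best[j] = (best[i][0] + p, i)
    segs, j = [], n
    while j > 0:
        i = best[j][1]
        if i is None:
            return None  # no segmentation covers the sentence
        segs.append(tuple(words[i:j]))
        j = i
    return segs[::-1]

logp = {("machine", "translation"): math.log(0.2),
        ("machine",): math.log(0.05), ("translation",): math.log(0.05),
        ("statistical",): math.log(0.1)}
seg = best_segmentation("statistical machine translation".split(), logp)
```

Because the multi-word phrase outscores the product of its parts, the search keeps "machine translation" together; encoded as a WFST, the same model composes directly into a phrase-based decoding pipeline.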
Machine Translation by Pattern Matching
2008
Abstract

Cited by 5 (0 self)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and the current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively.
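The on-demand lookup that pattern matching depends on can be sketched with a suffix array over the source side of the training text; the toy corpus and helper names are hypothetical, and real systems use linear-time construction and compressed indexes:

```python
# A suffix array over a tokenized corpus lets the decoder find every
# occurrence of an input phrase at query time via binary search, so no
# precomputed phrase table is needed.

import bisect

def build_suffix_array(tokens):
    # O(n^2 log n) toy construction; fine for a small example.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_phrase(tokens, sa, phrase):
    """Return sorted start positions of `phrase` in the corpus."""
    # Truncating each suffix to len(phrase) preserves sorted order,
    # so the matching block is contiguous and bisect applies.
    prefixes = [tokens[i:i + len(phrase)] for i in sa]
    lo = bisect.bisect_left(prefixes, phrase)
    hi = bisect.bisect_right(prefixes, phrase)
    return sorted(sa[i] for i in range(lo, hi))

corpus = "the cat sat on the mat near the cat".split()
sa = build_suffix_array(corpus)
hits = find_phrase(corpus, sa, ["the", "cat"])  # → [0, 7]
```

Given the occurrence positions, the system extracts translation rules and estimates their parameters on demand from the word-aligned target side, which is what keeps the offline model representation minimal.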