Results 1-5 of 5
A block bigram prediction model for statistical machine translation
- ACM Transactions on Speech and Language Processing, 2007
Cited by 3 (1 self)
Abstract
In this paper, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts block neighbors to carry out a phrase-based translation that explicitly handles local phrase reordering. We use a maximum likelihood criterion to train a log-linear block bigram model that uses real-valued features (e.g., a language model score) as well as binary features based on the block identities themselves (e.g., block bigram features). Model training relies on an efficient enumeration of local block neighbors in parallel training data. A novel stochastic gradient descent (SGD) training algorithm is presented that can easily handle millions of features. Moreover, when SMT is viewed as a block generation process, it becomes quite similar to sequential natural language annotation problems such as part-of-speech tagging, phrase chunking, or shallow parsing. The novel approach is successfully tested on a standard Arabic-English translation task using two different phrase reordering models: a block orientation model and a phrase-distortion model.
Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—statistical machine translation; G.3 [Probability and Statistics]: Statistical computing—stochastic gradient descent
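The training setup in this abstract can be illustrated with a minimal sketch, assuming a log-linear model that normalizes only over a locally enumerated set of candidate neighbor blocks. The feature names, the two-feature map, the learning rate, and the toy block IDs and scores are hypothetical; this is plain per-event SGD on the conditional log-likelihood, not the paper's own training algorithm.

```python
import math
from collections import defaultdict


def features(prev_block, block, lm_score):
    # Hypothetical feature map: one binary block-bigram indicator
    # plus one real-valued feature (e.g., a language model score).
    return {f"bigram={prev_block}->{block}": 1.0, "lm_score": lm_score}


def score(weights, feats):
    # Linear score w . f; weights must be a defaultdict(float).
    return sum(weights[name] * value for name, value in feats.items())


def probabilities(weights, candidate_feats):
    # Softmax over the locally enumerated candidate neighbor blocks
    # (subtracting the max score keeps exp() numerically stable).
    scores = [score(weights, f) for f in candidate_feats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sgd_step(weights, candidate_feats, gold_index, learning_rate=0.1):
    # Gradient of log p(gold) is f(gold) - E_p[f]; take one ascent step.
    # Returns log p(gold) measured before the update.
    probs = probabilities(weights, candidate_feats)
    grad = defaultdict(float)
    for name, value in candidate_feats[gold_index].items():
        grad[name] += value
    for p, feats in zip(probs, candidate_feats):
        for name, value in feats.items():
            grad[name] -= p * value
    for name, g in grad.items():
        weights[name] += learning_rate * g
    return math.log(probs[gold_index])


# Toy training event: previous block "b0", three candidate successors
# given as (block id, made-up LM score); the gold successor is index 0.
weights = defaultdict(float)
candidates = [("b1", -2.0), ("b2", -3.5), ("b3", -1.0)]
cand_feats = [features("b0", b, lm) for b, lm in candidates]
for _ in range(50):
    sgd_step(weights, cand_feats, gold_index=0)
print(probabilities(weights, cand_feats))
```

Each update moves the weights toward the gold block's features and away from the model's expected features. Storing weights in a sparse dictionary means only the features that fire in an event are touched, which is what makes this style of update practical with very large numbers of binary features.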
Phrase reordering for statistical machine translation based on predicate-argument structure
- In Proceedings of the International Workshop on Spoken Language Translation: Evaluation Campaign on Spoken Language Translation, 2006
Cited by 1 (0 self)
Abstract
In this paper, we describe a novel phrase reordering model based on predicate-argument structure. Our phrase reordering method uses a general predicate-argument structure analyzer to reorder source-language chunks according to their predicate-argument structure. We explicitly model long-distance phrase alignments by reordering arguments and predicates. The reordering approach is applied as a preprocessing step in the training phase of a phrase-based statistical MT system. We report experimental results in the IWSLT 2006 evaluation campaign.
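As a rough illustration of chunk reordering applied as a preprocessing step, the sketch below assumes a verb-final source language and input chunks already labeled with hypothetical role tags (`ARG0`, `ARG1`, `PRED`). The single SOV-to-SVO rule and the function name `reorder_sov_to_svo` are invented for this example; they are not the analyzer or rule set used in the paper.

```python
from typing import List, Tuple

# A chunk is (surface text, role label); the labels are hypothetical.
Chunk = Tuple[str, str]


def reorder_sov_to_svo(chunks: List[Chunk]) -> List[Chunk]:
    # Only handle clauses with exactly one predicate; leave others unchanged.
    pred_positions = [i for i, (_, role) in enumerate(chunks) if role == "PRED"]
    if len(pred_positions) != 1:
        return list(chunks)
    p = pred_positions[0]
    pred = chunks[p]
    rest = chunks[:p] + chunks[p + 1:]
    # Place the predicate right after the first ARG0 chunk, or first if none.
    arg0 = [i for i, (_, role) in enumerate(rest) if role == "ARG0"]
    insert_at = arg0[0] + 1 if arg0 else 0
    return rest[:insert_at] + [pred] + rest[insert_at:]


# "watashi wa hon o yomimasu" (I read a book): SOV becomes SVO.
source = [("watashi wa", "ARG0"), ("hon o", "ARG1"), ("yomimasu", "PRED")]
print(reorder_sov_to_svo(source))
```

In a pipeline of this kind, the reordered source side would replace the original before word alignment and phrase extraction, so that long-distance movements become closer to monotone.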
Vocabulary Extension via PoS Information for SMT
Cited by 1 (0 self)
Abstract
One of the weaknesses of the so-called phrase-based translation models is that they carry out a blind extraction of the phrase translation table, i.e., they do not take into account the linguistic information inherent to every language. On the other hand, part-of-speech (PoS) tagging is a problem whose state of the art is now fairly mature, with error rates close to 2%. For this reason, it seems natural to use automatically PoS-tagged corpora in statistical machine translation (SMT) to incorporate syntactic knowledge and improve the results of state-of-the-art SMT systems. In this work, we present results obtained on the EuroParl corpus by creating an extended vocabulary composed of the regular words with their PoS tags concatenated to them.
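The extended vocabulary can be sketched as a token transformation, assuming each sentence arrives as (word, tag) pairs from some PoS tagger; the function name `extend_vocabulary`, the `_` separator, and the Penn Treebank-style tags are assumptions. The sketch shows how homographs such as "saw" (verb) and "saw" (noun) become distinct vocabulary entries.

```python
def extend_vocabulary(tagged_sentence, sep="_"):
    # Concatenate each word with its PoS tag, e.g. ("saw", "NN") -> "saw_NN".
    return [f"{word}{sep}{tag}" for word, tag in tagged_sentence]


# "I saw the saw", tagged with Penn Treebank-style tags (assumed).
tagged = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]
plain_vocab = {word for word, _ in tagged}
extended_vocab = set(extend_vocabulary(tagged))
print(sorted(plain_vocab), sorted(extended_vocab))
```

The trade-off is a larger, sparser vocabulary in exchange for separating word senses that differ in syntactic category.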
MATMT2008 workshop: Mixing Approaches to Machine Translation (printed by Fotocopias Zorroaga), 2008
"... proceedings editors: ..."
Sehda S2MT: Incorporation of Syntax into Statistical Translation System
Abstract
This paper describes Sehda’s S2MT (Syntactic Statistical Machine Translation) system, submitted to the Korean-English track of the IWSLT-05 evaluation campaign. S2MT is a phrase-based statistical system trained on linguistically processed parallel data.

