Results 11 - 20 of 30
Fast Decoding for Statistical Machine Translation
In Proc. Int. Conf. Spoken Language Processing, 1998
Cited by 10 (2 self)
We investigated an efficient decoding algorithm for statistical machine translation. Compared to the other algorithms, this new algorithm is applicable to different translation models, and it is much faster. Experiments showed that the algorithm achieved an overall performance comparable to the state-of-the-art decoding algorithms.
SMT Decoder Dissected: Word Reordering
In Int. Conf. on Natural Language Processing and Knowledge Engineering (NLP-KE), 2003
Cited by 10 (7 self)
In this paper we describe a decoder for statistical machine translation which allows controlled reordering of the words generated in the target language. After a general discussion of the structure of a decoder, a particular implementation is discussed which allows for word-to-word and phrase-to-phrase translation. Word reordering is used to improve the translation quality. We analyze the effect of the length of this reordering window on the search space and the translation quality. Results for Chinese-to-English and Arabic-to-English translation tasks are presented.
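To give a feel for how a reordering window affects the size of the search space, here is a minimal Python sketch. It does not reproduce the decoder from the paper. It assumes one common form of window constraint (the next source word to translate must lie fewer than `window` positions past the first untranslated word), and the function name `count_reorderings` is hypothetical.

```python
from functools import lru_cache


def count_reorderings(n, window):
    """Count the orders in which n source positions can be covered.

    Assumed constraint (illustrative only): at each step the next position
    must be less than `window` positions past the first uncovered position.
    """

    @lru_cache(maxsize=None)
    def rec(uncovered):
        # `uncovered` is a sorted tuple of source positions not yet translated.
        if not uncovered:
            return 1
        first = uncovered[0]
        total = 0
        for i, pos in enumerate(uncovered):
            if pos - first >= window:
                break  # positions are sorted, so the rest are out of range too
            total += rec(uncovered[:i] + uncovered[i + 1:])
        return total

    return rec(tuple(range(n)))


if __name__ == "__main__":
    for w in (1, 2, 3):
        print(w, count_reorderings(3, w))
```

With `window=1` only the monotone order is allowed, and once the window is at least the sentence length every permutation is allowed, so the count grows from 1 toward n factorial as the window widens.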
Distortion models for statistical machine translation
In ACL, 2006
Cited by 9 (0 self)
In this paper, we argue that n-gram language models are not sufficient to address word reordering required for Machine Translation. We propose a new distortion model that can be used with existing phrase-based SMT decoders to address those n-gram language model limitations. We present empirical results in Arabic to English Machine Translation that show statistically significant improvements when our proposed model is used. We also propose a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments.
Statistical machine translation based on hierarchical phrase alignment
Proc. of the 9th Intl. Conference on Theoretical and Methodological Issues in Machine Translation, 2002
Cited by 5 (0 self)
This paper describes statistical machine translation improved by applying hierarchical phrase alignment. Hierarchical phrase alignment is a method to align bilingual sentences phrase-by-phrase employing partial parse results. Based on the hierarchical phrase alignment, a translation model is trained on a chunked corpus by converting hierarchically aligned phrases into a sequence of chunks. The second method transforms the bilingual correspondence of the phrase alignments into that of the translation model. Both of our approaches improve the quality of the translation model.
Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation
Cited by 4 (1 self)
This paper describes a new method to compare reordering constraints for Statistical Machine Translation. We investigate the best possible (oracle) BLEU score achievable under different reordering constraints. Using dynamic programming, we efficiently find a reordering that approximates the highest attainable BLEU score given a reference and a set of reordering constraints. We present an empirical evaluation of popular reordering constraints: local constraints, the IBM constraints, and the Inversion Transduction Grammar (ITG) constraints. We present results for a German-English translation task and show that reordering under the ITG constraints can improve over the baseline by more than 7.5 BLEU points.
Automatic development of Spanish-Catalan corpora for machine translation
In Procs. of the Second International Workshop on Spanish Language Processing and Language Technologies, 2001
Cited by 2 (2 self)
To successfully translate a text using example-based techniques, it is necessary to have a large computerized database of parallel sentences. In this paper, we describe an automatic procedure to construct a bilingual corpus from the Internet. We also describe how we obtained two Spanish and Catalan corpora from two periodical publications (a legal bulletin and a newspaper). The corpus construction process consists of four main phases. First, the information is automatically obtained from the Internet and no significant information is eliminated. Second, the corpus is fragmented into linguistic units (tokens, sentences, paragraphs, etc.) by specific rules. Then, a procedure detects certain translation units which have a specific behavior (numbers, abbreviations, proper nouns, etc.). Finally, the sentences from the two different languages are aligned. We introduce a new iterative algorithm for aligning parallel texts, which is based on Dynamic Programming. A manual test was done to verify the output of each phase. At the end of the paper, we discuss the results.
Efficient decoding for statistical machine translation with a fully expanded WFST model
Proc. of MNLP04, 2004
Cited by 1 (0 self)
This paper proposes a novel method to compile statistical models for machine translation to achieve efficient decoding. In our method, each statistical submodel is represented by a weighted finite-state transducer (WFST), and all of the submodels are expanded into a composition model beforehand. Furthermore, the ambiguity of the composition model is reduced by the statistics of hypotheses while decoding. The experimental results show that the proposed model representation drastically improves the efficiency of decoding compared to the dynamic composition of the submodels, which corresponds to conventional approaches.
Vocabulary Extension via PoS Information for SMT
Cited by 1 (0 self)
One of the weaknesses of the so-called phrase-based translation models is that they carry out a blind extraction of the phrase translation table, i.e., they do not take into account the linguistic information which is inherent to every language. On the other hand, Part of Speech (PoS) tagging is a problem with a fairly mature state of the art, obtaining error rates of almost 2%. Because of this, it seems quite natural to use automatically PoS-tagged corpora in Statistical Machine Translation (SMT) to incorporate syntactic knowledge and enhance the results obtained by state-of-the-art SMT systems. In this work, we present results obtained on the EuroParl corpus by creating an extended vocabulary composed of the regular words with their PoS tags concatenated to them.
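The extended vocabulary described in this abstract can be illustrated with a minimal Python sketch. It only shows the idea of concatenating each word with its PoS tag. The function name `extend_with_pos`, the `_` separator, and the example tags are assumptions made for illustration, not details taken from the paper.

```python
def extend_with_pos(tokens, tags, sep="_"):
    """Join each word with its PoS tag to form extended-vocabulary tokens.

    The separator is an assumed convention; any character that cannot occur
    inside a word or a tag would work.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]


# Hypothetical pre-tagged sentence (the tags would come from a PoS tagger).
tokens = ["the", "house", "is", "small"]
tags = ["DT", "NN", "VBZ", "JJ"]
extended = extend_with_pos(tokens, tags)
print(extended)
```

Under this scheme a word that is ambiguous between, say, a noun and a verb becomes two distinct vocabulary entries, at the cost of a larger and sparser vocabulary.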
MACHINE TRANSLATION BY PATTERN MATCHING
2008
Cited by 1 (0 self)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main
STATISTICAL MACHINE TRANSLATION DECODER BASED ON PHRASE
This paper describes a decoding algorithm for statistical machine translation based on phrases. In the past, solutions to the decoding problem were inspired by those of speech recognizers, translating each input word into one or more output words generated in left-to-right order. The algorithm presented here iteratively constructs phrases or chunks of cepts until all the input words are consumed. This behavior results in higher computational complexity than decoding under left-to-right constraints, though Japanese-to-English translation experiments show better translation accuracy.

