Results 1 -
5 of
5
A survey of statistical machine translation
, 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract
-
Cited by 30 (3 self)
- Add to MetaCart
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Word Alignment for Languages with Scarce Resources
, 2005
"... This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts. The shared task included English--Inuktitut, Romanian--English, ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts. The shared task included English--Inuktitut, Romanian--English, and English--Hindi sub-tasks, and drew the participation of ten teams from around the world with a total of 50 systems.
Pivot Language Approach for Phrase-Based Statistical Machine Translation
"... This paper proposes a novel method for phrase-based statistical machine translation by using pivot language. To conduct translation between languages Lf and Le with a small bilingual corpus, we bring in a third language Lp, which is named the pivot language. For Lf-Lp and Lp-Le, there exist large bi ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
This paper proposes a novel method for phrase-based statistical machine translation by using pivot language. To conduct translation between languages Lf and Le with a small bilingual corpus, we bring in a third language Lp, which is named the pivot language. For Lf-Lp and Lp-Le, there exist large bilingual corpora. Using only Lf-Lp and Lp-Le bilingual corpora, we can build a translation model for Lf-Le. The advantage of this method lies in that we can perform translation between Lf and Le even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language method achieves an absolute improvement of 0.06 (22.13 % relative) as compared with the model directly trained with 5,000 Lf-Le sentence pairs for French-Spanish translation. Moreover, with a small Lf-Le bilingual corpus available, our method can further improve the translation quality by using the additional Lf-Lp and Lp-Le bilingual corpora.
MACHINE TRANSLATION BY PATTERN MATCHING
, 2008
"... The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amoun ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main

