Dependency treelet translation: Syntactically informed phrasal SMT
, 2005
Cited by 102 (5 self)
We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. We describe an efficient decoder and show that using these tree-based models in combination with conventional SMT models provides a promising approach that incorporates the power of phrasal SMT with the linguistic generality available in a parser.
Statistical Significance Tests for Machine Translation Evaluation
, 2004
Cited by 102 (0 self)
If two translation systems differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score. Even for small test sizes of only 300 sentences, our methods may give us assurances that test result differences are real.
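The resampling idea in this abstract can be illustrated with a minimal sketch of a paired bootstrap test. It is not the paper's implementation: the function name `paired_bootstrap`, its parameters, and the use of summed per-sentence scores are assumptions made for illustration. BLEU is a corpus-level metric, so a faithful version would recompute BLEU on each resampled test set instead of summing sentence scores.

```python
import random


def paired_bootstrap(scores_a, scores_b, num_samples=1000, seed=0):
    """Return the fraction of resampled test sets on which system A beats system B.

    scores_a and scores_b are hypothetical per-sentence quality scores for the
    same test sentences, in the same order. Summing them is a simplified
    stand-in for recomputing a corpus-level metric such as BLEU.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = 0
    for _ in range(num_samples):
        # Draw n sentence indices with replacement; both systems are
        # evaluated on the same resampled test set (paired comparison).
        sample = [rng.randrange(n) for _ in range(n)]
        total_a = sum(scores_a[i] for i in sample)
        total_b = sum(scores_b[i] for i in sample)
        if total_a > total_b:
            wins_a += 1
    return wins_a / num_samples
```

Under these assumptions, a returned fraction of 0.95 or higher would be read as system A being better than system B at roughly the 95% confidence level on this test set.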
Dependency tree translation: Syntactically informed phrasal SMT
- In ACL
, 2005
Cited by 19 (1 self)
(Author note: work done while at Microsoft Research.) We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only target language resource assumed is a word breaker. These are used to produce treelet (“phrase”) translation pairs as well as several models, including a channel model, an order model, and a target language model. Together these models and the treelet translation pairs provide a powerful and promising approach to MT that incorporates the power of phrasal SMT with the linguistic generality available in a parser. We evaluate two decoding approaches, one inspired by dynamic programming and the
Inversion Transduction Grammar for joint phrasal translation modeling
- NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation (SSST)
, 2007
Cited by 11 (0 self)
We present a phrasal inversion transduction grammar as an alternative to joint phrasal translation models. This syntactic model is similar to its flat-string phrasal predecessors, but admits polynomial-time algorithms for Viterbi alignment and EM training. We demonstrate that the consistency constraints that allow flat phrasal models to scale also help ITG algorithms, producing an 80-times faster inside-outside algorithm. We also show that the phrasal translation tables produced by the ITG are superior to those of the flat joint phrasal model, producing up to a 2.5 point improvement in BLEU score. Finally, we explore, for the first time, the utility of a joint phrasal translation model as a word alignment method.
Example-based machine translation based on syntactic transfer with statistical models
- In Proceedings of COLING
, 2004
Cited by 6 (1 self)
This paper presents example-based machine translation (MT) based on syntactic transfer, which selects the best translation by using models of statistical machine translation. Example-based MT sometimes generates invalid translations because it selects examples similar to the input sentence based only on source language similarity. The method proposed in this paper selects the best translation by using a language model and a translation model in the same manner as statistical MT, and it can improve MT quality over that of ‘pure’ example-based MT. A feature of this method is that the statistical models are applied after word re-ordering is achieved by syntactic transfer. This implies that MT quality is maintained even when we only apply a lexicon model as the translation model. In addition, translation speed is improved by bottom-up generation, which utilizes the tree structure that is output from the syntactic transfer.
An Exploration of Data-driven Machine Translation for Sign Languages
, 2008
Cited by 5 (4 self)
A dissertation submitted in fulfilment of the requirements for the award of
Do we need phrases? Challenging the conventional wisdom in statistical machine translation
- In NAACL
, 2006
Cited by 4 (0 self)
We begin by exploring theoretical and practical issues with phrasal SMT, several of which are addressed by syntax-based SMT. Next, to address problems not handled by syntax, we propose the concept of a Minimal Translation Unit (MTU) and develop MTU sequence models. Finally, we incorporate these models into a syntax-based SMT system and demonstrate that it improves on state-of-the-art translation quality within a theoretically more desirable framework.
Multilingual Openmind Bridging the Gap between Different Cultures and Languages with Openmind Common Sense
, 2005
The need for more effective communication across different countries has increased as the interactions between them have grown. Communication is still difficult because of both language differences and cultural differences. Although there have been many attempts to meet the communication need at the level of language with machine translators and dictionaries, many problems related to cultural or conceptual differences still remain. In this thesis, I propose to build and use Multilingual Openmind, a network of common sense databases from several countries and languages, to solve these problems. First, I enumerate the weaknesses of current translation assistant tools: the limited scale of bilingual corpora; the limits of word-to-word mapping that result from differences in conceptual borders; and the lack of cultural context. Next, I discuss new approaches to solving these problems and their required features: continuously updated multilingual corpora, context-based methods using common sense knowledge databases, relation-to-relation mapping, and inference algorithms. Finally, I propose the design of Multilingual Openmind, which enables the new approaches and features discussed above.
IMPROVING WORD SEGMENTATION FOR THAI SPEECH TRANSLATION
A vocabulary list and a language model are primary components of a speech translation system. Generating both from plain text is a straightforward task for English. However, it is quite challenging for Chinese, Japanese, or Thai, which are written without word segmentation, i.e., the text has no word boundary delimiters. For Thai word segmentation, Maximal Matching, a lexicon-based approach, is one of the popular methods. Nevertheless, this method relies heavily on the coverage of the lexicon. When the text contains an unknown word, this method usually produces a wrong boundary. When words are extracted from this segmented text, some words will not be retrieved because of the wrong segmentation. In this paper, we propose statistical techniques to tackle this problem. Based on different word segmentation methods, we develop various speech translation systems and show that the proposed method can significantly improve the translation accuracy by about 6.42% BLEU points compared to the baseline system.
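The lexicon-coverage problem described above can be seen in a small sketch of lexicon-based segmentation. This is an illustration only, not the paper's method: it assumes one common formulation of maximal matching (choose the segmentation with the fewest tokens), and the function name `maximal_matching`, the tie-break on unknown-token count, and the single-character fallback for unknown text are all assumptions. Unsegmented English stands in for Thai to keep the example readable.

```python
def maximal_matching(text, lexicon):
    """Segment text into the fewest tokens using a lexicon.

    Any character not covered by a lexicon word becomes a one-character
    token (an assumed fallback), which is where wrong boundaries appear.
    """
    n = len(text)
    max_len = max(1, max((len(w) for w in lexicon), default=1))
    # best[i] = (token_count, unknown_count, start_of_last_token) for text[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            known = piece in lexicon
            if not known and i - j > 1:
                continue  # unknown pieces are only allowed as single characters
            cand = (best[j][0] + 1, best[j][1] + (0 if known else 1), j)
            # Prefer fewer tokens, then fewer unknown tokens.
            if best[i] is None or cand[:2] < best[i][:2]:
                best[i] = cand
    # Follow the back-pointers from the end of the text to recover tokens.
    tokens = []
    i = n
    while i > 0:
        j = best[i][2]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]


print(maximal_matching("thecatsat", {"the", "cat", "sat"}))
# ['the', 'cat', 'sat']
print(maximal_matching("thedogsat", {"the", "sat"}))
# ['the', 'd', 'o', 'g', 'sat']
```

In the second call the out-of-lexicon word "dog" is split into three one-character tokens, so it would never be recovered as a vocabulary entry from the segmented text.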
Obtaining Word Phrases with Stochastic Inversion Transduction Grammars for Phrase-based Statistical Machine Translation
Phrase-based statistical translation systems are currently providing excellent results in real machine translation tasks. In phrase-based statistical translation systems, the basic translation units are word phrases. An important problem related to the estimation of phrase-based statistical models is obtaining word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Transduction Grammar. Preliminary experiments have been carried out on real tasks and promising results have been obtained.

