11,001 new features for statistical machine translation
- In North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT)
, 2009
Cited by 39 (1 self)
We use the Margin Infused Relaxed Algorithm of Crammer et al. to add a large number of new features to two machine translation systems: the Hiero hierarchical phrase-based translation system and our syntax-based translation system. On a large-scale Chinese-English translation task, we obtain statistically significant improvements of +1.5 Bleu and +1.1 Bleu, respectively. We analyze the impact of the new features and the performance of the learning algorithm.
A survey of statistical machine translation
, 2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Syntax augmented machine translation via chart parsing
- In Proceedings of the Workshop on Statistical Machine Translation. New York City: Association for Computational Linguistics
, 2006
Cited by 24 (6 self)
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniques to produce a syntactically motivated bilingual synchronous grammar. We describe refinements to a chart-based decoder and k-best extraction techniques to effectively parse the resulting grammar, which contains up to 4000 syntax-derived nonterminals, producing translations that achieve significant improvements over Pharaoh, a state-of-the-art phrase-based system, on the Europarl French-to-English task (Koehn and Monz, 2005).
Translating with non-contiguous phrases
- In EMNLP
, 2005
Cited by 23 (6 self)
This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. A statistical translation model that deals with such phrases is also presented, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented that demonstrate how the proposed method generalizes better from the training data.
Syntax-Based Alignment: Supervised or Unsupervised?
, 2004
Cited by 16 (2 self)
Tree-based approaches to alignment model translation as a sequence of probabilistic operations transforming the syntactic parse tree of a sentence in one language into that of the other. The trees may be learned directly from parallel corpora (Wu, 1997), or provided by a parser trained on hand-annotated treebanks (Yamada and Knight, 2001). In this paper, we compare these approaches on Chinese-English and French-English datasets, and find that automatically derived trees result in better agreement with human-annotated word-level alignments for unseen test data.
Extended Multi Bottom-Up Tree Transducers
Cited by 5 (2 self)
Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers can compute any transformation that is computed by a linear extended top-down tree transducer. Moreover, the classical composition results for bottom-up tree transducers are generalized to extended multi bottom-up tree transducers. Finally, a characterization in terms of extended top-down tree transducers is presented.
A Syntactic Skeleton for Statistical Machine Translation
- In Proceedings of the 11th Conference of the European Association for Machine Translation
, 2006
Cited by 4 (3 self)
We present a method for improving statistical machine translation performance by using linguistically motivated syntactic information. Our algorithm recursively decomposes source language sentences into syntactically simpler and shorter chunks, and recomposes their translation to form target language sentences. This improves both the word order and lexical selection of the translation. We report statistically significant relative improvements of 3.3% BLEU score in an experiment (English→Spanish) carried out on an 800-sentence test set extracted from the Europarl corpus.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages
- In Proceedings of the ESSLLI 2002 Student Session
, 2002
Cited by 2 (0 self)
Data-driven approaches to machine translation often rely heavily on large training corpora. We are developing a translation system targeted specifically at minority languages for which such large corpora are not usually available.
Symmetric Probabilistic Alignment
, 2006
Cited by 2 (0 self)
The CMU Example-Based Machine Translation (EBMT) system has been deployed successfully in many projects for years. But even though a good alignment algorithm is essential, since the CMU EBMT system uses parallel corpora, alignment has been studied relatively less than other components of EBMT. For this reason, we developed a new alignment algorithm which uses statistical information drawn from parallel corpora and heuristics based on human linguistic knowledge. Unlike most alignment approaches in Statistical Machine Translation (SMT) systems, our alignment algorithm uses only bilingual dictionaries as statistical information trained from other systems, calculates alignment scores bi-directionally, and aims at aligning source fragments up to 8 words long. In our experiments so far, it outperformed the old heuristic-based alignment algorithm in both alignment accuracy and translation accuracy in EBMT. Its performance was very close to the state of the art in SMT systems, for which we picked IBM Model 4 for comparison, and a combination of our new method and IBM Model 4 performed best.

