A Syntax-based Statistical Translation Model
, 2001
Cited by 202 (13 self)
We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.
Improvements in Phrase-Based Statistical Machine Translation
- In Proc. of the Human Language Technology Conf. (HLT-NAACL
, 2004
Cited by 44 (8 self)
In statistical machine translation, the currently best performing systems are based in some way on phrases or word groups. We describe the baseline phrase-based translation system and various refinements. We describe a highly efficient monotone search algorithm with a complexity linear in the input sentence length. We present translation results for three tasks: Verbmobil, Xerox and the Canadian Hansards. For the Xerox task, it takes less than 7 seconds to translate the whole test set consisting of more than 10K words. The translation results for the Xerox and Canadian Hansards tasks are very promising. The system even outperforms the alignment template system.
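To make the idea of a monotone search that is linear in the input length concrete, here is a minimal sketch, not the paper's algorithm. It assumes a toy phrase table with log-probabilities and omits the language model entirely; the names `monotone_decode`, `phrase_table`, and `max_phrase_len` are hypothetical. Because source phrases are consumed strictly left to right and phrase length is bounded, the work per source position is constant, so the total is linear in sentence length.

```python
import math

def monotone_decode(source, phrase_table, max_phrase_len=3):
    """Best monotone segmentation of `source` into phrases (toy sketch, no LM).

    best[j] holds the best log-score for translating the first j source words.
    """
    n = len(source)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)          # back[j] = (start index, target phrase)
    best[0] = 0.0
    for j in range(1, n + 1):
        # Only phrases of bounded length can end at position j.
        for i in range(max(0, j - max_phrase_len), j):
            if best[i] == -math.inf:
                continue
            for tgt, logp in phrase_table.get(tuple(source[i:j]), []):
                score = best[i] + logp
                if score > best[j]:
                    best[j] = score
                    back[j] = (i, tgt)
    if best[n] == -math.inf:
        return None, -math.inf       # no segmentation covers the sentence
    # Follow back-pointers from the end to recover the target phrases.
    out, j = [], n
    while j > 0:
        i, tgt = back[j]
        out.append(tgt)
        j = i
    return " ".join(reversed(out)), best[n]

# Hypothetical toy phrase table: source phrase -> [(target phrase, log-prob)].
toy_table = {
    ("das",): [("the", math.log(0.7)), ("that", math.log(0.3))],
    ("haus",): [("house", math.log(0.9))],
    ("das", "haus"): [("the house", math.log(0.8))],
    ("ist",): [("is", math.log(1.0))],
    ("klein",): [("small", math.log(0.6)), ("little", math.log(0.4))],
}

translation, score = monotone_decode("das haus ist klein".split(), toy_table)
print(translation)  # the house is small
```

A real system would add a language model and further feature scores to each hypothesis; the sketch only shows why forbidding reordering keeps the search cheap.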
A Decoder for Syntax-based Statistical MT
- Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02
, 2002
Cited by 32 (2 self)
This paper describes a decoding algorithm for a syntax-based translation model (Yamada and Knight, 2001). The model has been extended to incorporate phrasal translations as presented here. In contrast to a conventional word-to-word statistical model, a decoder for the syntax-based model builds up an English parse tree given a sentence in a foreign language.
A survey of statistical machine translation
, 2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
A Comparative Study on Reordering Constraints in Statistical Machine Translation
, 2003
Cited by 26 (0 self)
In statistical machine translation, the generation of a translation hypothesis is computationally expensive. If arbitrary word-reorderings are permitted, the search problem is NP-hard. On the other hand, if we restrict the possible word-reorderings in an appropriate way, we obtain a polynomial-time search algorithm.
Novel reordering approaches in phrase-based statistical machine translation
- Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond
, 2005
Cited by 22 (7 self)
This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that makes it easy to introduce different reordering constraints. In translation, we apply source sentence reordering on word level and use a reordering automaton as input. We show how to compute reordering automata on-demand using IBM or ITG constraints, and also introduce two new types of reordering constraints. We further add weights to the reordering automata. We present detailed experimental results and show that reordering significantly improves translation quality.
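The ITG constraints named in this abstract can be illustrated with a small sketch. It assumes the usual reading of the constraint: a sentence is split recursively into two adjacent blocks, which are either kept in order or swapped. The helpers `itg_reachable` and `count_itg` are hypothetical names introduced here, not functions from the paper, and the brute-force count is exponential and only meant for tiny inputs.

```python
from functools import lru_cache
from itertools import permutations

@lru_cache(maxsize=None)
def itg_reachable(perm):
    """Return True if `perm` (a tuple of distinct ints) can be built by
    recursively splitting into two blocks that are kept in order or swapped."""
    n = len(perm)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = perm[:k], perm[k:]
        straight = max(left) < min(right)   # blocks kept in original order
        inverted = min(left) > max(right)   # blocks swapped
        if (straight or inverted) and itg_reachable(left) and itg_reachable(right):
            return True
    return False

def count_itg(n):
    """Count permutations of n words allowed under the ITG constraint."""
    return sum(itg_reachable(p) for p in permutations(range(n)))

for n in range(1, 6):
    print(n, count_itg(n))
```

For four words, 22 of the 24 possible permutations are reachable; the two excluded orders are (1, 3, 0, 2) and (2, 0, 3, 1) in zero-based positions. This shows how such a constraint prunes the reordering space while still allowing many long-range movements.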
Greedy Decoding for Statistical Machine Translation in Almost Linear Time
, 2003
Cited by 20 (2 self)
We present improvements to a greedy decoding algorithm for statistical machine translation that reduce its time complexity from at least cubic (O(n^6) when applied naively) to practically linear time without sacrificing translation quality. We achieve this by integrating hypothesis evaluation into hypothesis creation, tiling improvements over the translation hypothesis at the end of each search iteration, and by imposing restrictions on the amount of word reordering during decoding.
Stochastic lexicalized inversion transduction grammar for alignment
- In Proc. of ACL
, 2005
Cited by 20 (0 self)
We present a version of Inversion Transduction Grammar where rule probabilities are lexicalized throughout the synchronous parse tree, along with pruning techniques for efficient training. Alignment results improve over unlexicalized ITG on short sentences for which full EM is feasible, but pruning seems to have a negative impact on longer sentences.
Syntax-Based Alignment: Supervised or Unsupervised?
, 2004
Cited by 16 (2 self)
Tree-based approaches to alignment model translation as a sequence of probabilistic operations transforming the syntactic parse tree of a sentence in one language into that of the other. The trees may be learned directly from parallel corpora (Wu, 1997), or provided by a parser trained on hand-annotated treebanks (Yamada and Knight, 2001). In this paper, we compare these approaches on Chinese-English and French-English datasets, and find that automatically derived trees result in better agreement with human-annotated word-level alignments for unseen test data.
Grammar Inference and Statistical Machine Translation
, 1998
Cited by 13 (0 self)
NLP researchers face a dilemma: on one side, it is widely accepted that languages have internal structure rather than being mere strings of words. On the other side, it is very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using statistical models to depict the translation process. Most of the translation models are word-based. While the approach has achieved surprisingly good performance comparable to the best commercial systems, many questions remain in the machine translation community. Can statistical word-based translation still perform well on language pairs with radically different linguistic structures? How would it function with less training data or with spoken languages? The thesis work investigated these questions. In summary, the word-based alignment model is a major cause of errors in German-English statistical spoken language...

