2006b. Empirical lower bounds on the complexity of translational equivalence
- In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics (ACL
Cited by 1 (1 self)
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
Cited by 1 (0 self)
We describe Joshua (Li et al., 2009a), an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for translation via synchronous context-free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We also provide a demonstration outline for illustrating the toolkit’s features to potential users, whether they be newcomers to the field or power users interested in extending the toolkit.
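The suffix-array lookup that this kind of grammar extraction relies on can be illustrated with a minimal sketch. It is not Joshua's implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the corpus is a toy token list, and the construction is a naive sort chosen for readability rather than speed.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted by the token sequence they begin.

    Naive construction (sorting whole suffixes); fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], suffix_array: List[int],
                     pattern: List[str]) -> List[int]:
    """Return all corpus positions where `pattern` occurs, in ascending order."""
    m = len(pattern)

    def prefix(rank: int) -> List[str]:
        # First m tokens of the suffix with the given rank in the array.
        start = suffix_array[rank]
        return tokens[start:start + m]

    # Binary search for the first suffix whose m-token prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose m-token prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix ranked in [first, lo) starts with the pattern.
    return sorted(suffix_array[first:lo])


corpus = "the cat sat on the mat near the cat".split()
index = build_suffix_array(corpus)
matches = find_occurrences(corpus, index, ["the", "cat"])
```

Because all suffixes that begin with a given phrase are adjacent in the sorted array, two binary searches locate every occurrence of the phrase without scanning the corpus.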
MACHINE TRANSLATION BY PATTERN MATCHING
, 2008
Cited by 1 (0 self)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and the current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but left some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main
Extracting Phrasal Alignments from Comparable Corpora by Using Joint Probability SMT Model
We propose a method of extracting phrasal alignments from comparable corpora by using an extended phrase-based joint probability model for statistical machine translation (SMT). Our method does not require preexisting dictionaries or splitting documents into sentences in advance. By checking each alignment for its reliability by using log-likelihood ratio statistics while searching for optimal alignments, our method aims to produce phrasal alignments for only parallel parts of the comparable corpora. Experimental results show that our method achieves about 0.8 in precision of phrasal alignment extraction when using 2,000 Japanese-English document pairs as training data.
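The abstract does not say how its log-likelihood ratio statistic is computed. As an assumption, the sketch below uses the common 2x2 contingency-table form (the G² statistic) over co-occurrence counts of a source phrase and a target phrase. The function name `log_likelihood_ratio` and the count layout are hypothetical and are not taken from the paper.

```python
import math


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table of co-occurrence counts (assumed layout).

    k11: documents containing both the source phrase and the target phrase
    k12: documents containing the source phrase only
    k21: documents containing the target phrase only
    k22: documents containing neither
    """
    n = k11 + k12 + k21 + k22
    if n == 0:
        return 0.0

    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22

    # Pair each observed count with its expected count under independence.
    cells = [
        (k11, row1 * col1 / n),
        (k12, row1 * col2 / n),
        (k21, row2 * col1 / n),
        (k22, row2 * col2 / n),
    ]

    # Cells with an observed count of zero contribute nothing to the sum.
    total = sum(obs * math.log(obs / exp) for obs, exp in cells if obs > 0)
    return 2.0 * total


independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(20, 0, 0, 20)
```

Under this formulation a score near zero means the two phrases co-occur about as often as chance predicts, and larger scores indicate stronger association. A threshold on the score is one way an alignment could be accepted or rejected.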
Extracting Transfer Rules for Multiword Expressions from Parallel Corpora
This paper presents a procedure for extracting transfer rules for multiword expressions from parallel corpora for use in a rule-based Japanese-English MT system. We show that adding the multiword rules improves translation quality and sketch ideas for learning more such rules.
Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
We propose an algorithm for efficiently retrieving example treelets from a database of parsed trees, enabling on-the-fly extraction of syntactic translation rules. We also propose improvements to this algorithm that allow several kinds of flexible matching.

