Results 1 - 10 of 15
Large-scale discriminative n-gram language models for statistical machine translation
In Proceedings of AMTA, 2008
"... We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systema ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
(Show Context)
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese to English translation using a hierarchical phrase-based machine translation system, and show that a discriminative language model significantly improves upon a state-of-the-art baseline. The experiments also highlight the benefits of our data selection method.
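The reranking setup described here is easy to picture in code. The sketch below is our own minimal illustration of a perceptron-trained n-gram reranker, not the authors' implementation; the feature set (raw n-gram counts up to trigrams), the oracle/1-best update, and the additive score combination are all assumptions.

# Minimal illustration of a discriminative n-gram reranker (not the paper's code).
from collections import defaultdict

def ngram_features(tokens, max_order=3):
    """Count every 1..max_order-gram in a hypothesis; each n-gram is one feature."""
    feats = defaultdict(int)
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats

def perceptron_update(weights, oracle_hyp, model_best):
    """Reward n-grams of the oracle (best-BLEU) hypothesis and penalise those
    of the model's current 1-best; weights is a defaultdict(float)."""
    for f, c in ngram_features(oracle_hyp).items():
        weights[f] += c
    for f, c in ngram_features(model_best).items():
        weights[f] -= c

def rerank(nbest, base_scores, weights):
    """Pick the hypothesis maximising baseline score plus discriminative n-gram score."""
    def total(hyp, base):
        return base + sum(weights.get(f, 0.0) * c for f, c in ngram_features(hyp).items())
    return max(zip(nbest, base_scores), key=lambda p: total(*p))[0]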
Constituent reordering and syntax models for English-to-Japanese statistical machine translation
In COLING, 2010
"... We present a constituent parsing-based reordering technique that improves the performance of the state-of-the-art English-to-Japanese phrase translation system that includes distortion models by 4.76 BLEU points. The phrase translation model with reordering applied at the pre-processing stage outper ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
We present a constituent parsing-based reordering technique that improves the performance of the state-of-the-art English-to-Japanese phrase translation system that includes distortion models by 4.76 BLEU points. The phrase translation model with reordering applied at the pre-processing stage outperforms a syntax-based translation system that incorporates a phrase translation model, a hierarchical phrase-based translation model and a tree-to-string grammar. We also show that combining constituent reordering and the syntax model improves the translation quality by an additional 0.84 BLEU points.
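As a concrete, heavily simplified picture of pre-processing-time constituent reordering for a head-final language like Japanese, the toy rule below moves a verbal head to the end of its VP; the Node class and the single rule are illustrative stand-ins for the paper's actual rule set.

# Toy constituent reordering for a head-final target language; the single
# VP rule stands in for the richer rule set a real system would use.
class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label            # constituent label or POS tag
        self.children = children or []
        self.word = word              # set only on leaf nodes

def reorder(node):
    """Recursively move a verbal head after its complements,
    approximating SOV order from an SVO parse."""
    for child in node.children:
        reorder(child)
    if node.label == "VP" and node.children and node.children[0].label.startswith("VB"):
        node.children = node.children[1:] + [node.children[0]]

def terminals(node):
    """Read the reordered word sequence off the tree."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in terminals(c)]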
Improving fluency by reordering target constituents using MST parser in English-to-Japanese phrase-based SMT
In Proceedings of MT Summit XII, 2009
"... We propose a reordering method to improve the fluency of the output of the phrase-based SMT (PBSMT) system. We parse the transla-tion results that follow the source language or-der into non-projective dependency trees, then reorder dependency trees to obtain fluent tar-get sentences. Our method ensu ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We propose a reordering method to improve the fluency of the output of the phrase-based SMT (PBSMT) system. We parse the translation results that follow the source language order into non-projective dependency trees, then reorder the dependency trees to obtain fluent target sentences. Our method ensures that the translation results are grammatically correct and achieves major improvements over PBSMT using dependency-based metrics.
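The term "non-projective" above refers to dependency arcs that cross when drawn above the sentence. A small self-contained check for projectivity, included here only as an illustration (it is not the paper's code), is:

# Check whether a dependency tree is projective, i.e. no two arcs cross;
# heads[i] holds the 0-based index of token i's head, or -1 for the root.
def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h >= 0]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # the arcs interleave, so they cross
                return False
    return True

For example, is_projective([2, 3, -1, 2]) is False because the arcs 0→2 and 1→3 cross.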
Syntactic SMT Using a Discriminative Text Generation Model
"... We study a novel architecture for syntactic SMT. In contrast to the dominant approach in the literature, the system does not rely on translation rules, but treat translation as an unconstrained target sentence gen-eration task, using soft features to cap-ture lexical and syntactic correspondences be ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We study a novel architecture for syntactic SMT. In contrast to the dominant approach in the literature, the system does not rely on translation rules, but treats translation as an unconstrained target sentence generation task, using soft features to capture lexical and syntactic correspondences between the source and target languages. Target syntax features and bilingual translation features are trained consistently in a discriminative model. Experiments using the IWSLT 2010 dataset show that the system achieves BLEU comparable to that of state-of-the-art syntactic SMT systems.
Partial-Tree Linearization: Generalized Word Ordering for Text Synthesis
In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013
"... We present partial-tree linearization, a generalized word ordering (i.e. ordering a set of input words into a grammatical and fluent sentence) task for text-to-text applications. Recent studies of word ordering can be categorized into either abstract word ordering (no input syntax except for POS) or ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We present partial-tree linearization, a generalized word ordering (i.e. ordering a set of input words into a grammatical and fluent sentence) task for text-to-text applications. Recent studies of word ordering can be categorized into either abstract word ordering (no input syntax except for POS) or tree linearization (input words are associated with a full unordered syntax tree). Partial-tree linearization covers the whole spectrum of input between these two extremes. By allowing POS and dependency relations to be associated with any subset of input words, partial-tree linearization is more practical for a dependency-based NLG pipeline, such as transfer-based MT and abstractive text summarization. In addition, a partial-tree linearizer can also perform abstract word ordering and full-tree linearization. Our system achieves the best published results on standard PTB evaluations of these tasks.
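To make the "partial" part of partial-tree linearization concrete, one plausible input representation (our assumption, not the paper's data format) lets each token carry an optional POS tag and optional dependency constraints:

# One way to represent partial-tree linearization input: every field other
# than the word itself may be left unspecified (None).
from dataclasses import dataclass
from typing import Optional

@dataclass
class InputToken:
    word: str
    pos: Optional[str] = None      # POS tag, if the application knows it
    head: Optional[int] = None     # index of the head token, if constrained
    rel: Optional[str] = None      # dependency relation label, if constrained

# Abstract word ordering:     [InputToken("dog"), InputToken("the"), ...]
# Full-tree linearization:    every token has pos, head and rel filled in
# Partial-tree linearization: any mixture of the two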
Sentence Realisation from Bag of Words with dependency constraints
"... In this paper, we present five models for sentence realisation from a bag-of-words containing minimal syntactic information. It has a large variety of applications ranging from Machine Translation to Dialogue systems. Our models employ simple and efficient techniques based on n-gram Language modelin ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In this paper, we present five models for sentence realisation from a bag-of-words containing minimal syntactic information. Sentence realisation has a large variety of applications ranging from Machine Translation to Dialogue systems. Our models employ simple and efficient techniques based on n-gram language modeling. We evaluated the models by comparing the synthesized sentences with reference sentences using the standard BLEU metric (Papineni et al., 2001). We obtained higher results (a BLEU score of 0.8156) when compared to the state-of-the-art results. In future work, we plan to incorporate our sentence realiser in Machine Translation and observe its effect on translation accuracy.
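A bare-bones version of n-gram-driven realisation from a bag of words can be sketched as a beam search; the bigram table, unseen-bigram penalty, and beam size below are placeholders, not the models or settings used in the paper.

# Beam-search realiser: order a bag of words by extending partial sequences
# and keeping the highest-scoring ones under a bigram language model.
import heapq

def realise(bag, bigram_logprob, beam_size=10, unseen=-10.0):
    beam = [(0.0, ("<s>",), frozenset())]      # (score, sequence, used positions)
    for _ in range(len(bag)):
        candidates = []
        for score, seq, used in beam:
            for i, w in enumerate(bag):
                if i in used:
                    continue
                s = score + bigram_logprob.get((seq[-1], w), unseen)
                candidates.append((s, seq + (w,), used | {i}))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return list(max(beam, key=lambda c: c[0])[1][1:])   # drop the <s> marker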
A critique of statistical machine translation
In Walter Daelemans & Véronique Hoste (eds.), Journal of Translation and Interpreting Studies: Special Issue on Evaluation of Translation Technology, Linguistica Antverpiensia, 2009
"... Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless—and this may come as some surprise to the PB-SMT community—most translators, and somewhat more surprisingly perhaps, many experienced MT protagonists, find the basic model extremely ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless—and this may come as some surprise to the PB-SMT community—most translators, and somewhat more surprisingly perhaps, many experienced MT protagonists, find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We will demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory, and inclusive. Over the next five years things changed considerably; once SMT achieved dominance particularly over the rule-based paradigm, it had established a position where it did not need to bring along the rest of the MT community with it, and in our view, this has largely pertained to this day. Having discussed these issues, we will provide three additional observations: firstly, we will discuss the role of automatic MT evaluation metrics when describing PB-SMT systems; secondly, we will comment on the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and finally, we will briefly comment on the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.
Declarative syntactic processing of natural language using concurrent constraint programming and probabilistic dependency modeling
In Proc. UCNLG, 2007
"... This paper describes a declarative approach to parsing and realization of natural lan-guage using a probabilistic dependency model of syntax within a constrained opti-mization framework. Such an approach is particularly well-suited for applications like machine translation. The paper describes a tes ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
This paper describes a declarative approach to parsing and realization of natural language using a probabilistic dependency model of syntax within a constrained optimization framework. Such an approach is particularly well-suited for applications like machine translation. The paper describes a test-of-concept implementation applied to the classic sentence “Time flies like an arrow.” and discusses the further research necessary for scaling up to general broad-coverage processing of language.
Search and Learning for the Linear Ordering Problem with an Application to Machine Translation
2009
"... This dissertation is about ordering. The problem of arranging a set of n items in a desired order is quite common, as well as fundamental to computer science. Sorting is one instance, as is the Traveling Salesman Problem. Each problem instance can be thought of as optimization of a function that app ..."
Abstract
- Add to MetaCart
This dissertation is about ordering. The problem of arranging a set of n items in a desired order is quite common, as well as fundamental to computer science. Sorting is one instance, as is the Traveling Salesman Problem. Each problem instance can be thought of as optimization of a function that applies to the set of permutations. The dissertation treats word reordering for machine translation as another instance of a combinatorial optimization problem. The approach introduced is to combine three different functions of permutations. The first function is based on finite-state automata, the second is an instance of the Linear Ordering Problem, and the third is an entirely new permutation problem related to the LOP. The Linear Ordering Problem has the most attractive computational properties of the three, all of which are NP-hard optimization problems. The dissertation expends significant effort developing neighborhoods for local search on the LOP, and uses grammars and other tools from natural language parsing to introduce several new results, including a state-of-the-art local search procedure. Combinatorial optimization problems such as the TSP or the LOP are usually …
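For concreteness, the LOP objective takes only a few lines to state; the adjacent-swap hill climber below is the most naive possible neighbourhood and is shown only to fix ideas, not to represent the search procedures developed in the dissertation.

# The Linear Ordering Problem: B[i][j] is the benefit of placing item i
# anywhere before item j; find the permutation maximising the total benefit.
def lop_score(perm, B):
    return sum(B[perm[a]][perm[b]]
               for a in range(len(perm))
               for b in range(a + 1, len(perm)))

def adjacent_swap_search(perm, B):
    """Greedy hill climbing: swap neighbouring items while it helps.
    Swapping adjacent items i, j changes the score by B[j][i] - B[i][j]."""
    perm = list(perm)
    improved = True
    while improved:
        improved = False
        for k in range(len(perm) - 1):
            i, j = perm[k], perm[k + 1]
            if B[j][i] > B[i][j]:
                perm[k], perm[k + 1] = j, i
                improved = True
    return perm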
Improving Chinese-English . . .
2009
"... Machine Translation (MT) is a task with multiple components, each of which can be very challenging. This thesis focuses on a difficult language pair – Chinese to English – and works on several language-specific aspects that make translation more difficult. The first challenge this thesis focuses on ..."
Abstract
- Add to MetaCart
Machine Translation (MT) is a task with multiple components, each of which can be very challenging. This thesis focuses on a difficult language pair – Chinese to English – and works on several language-specific aspects that make translation more difficult. The first challenge this thesis focuses on is the differences in the writing systems. In Chinese there are no explicit boundaries between words, and even the definition of a “word” is unclear. We build a general purpose Chinese word segmenter with linguistically inspired features that performs very well on the SIGHAN 2005 bakeoff data. Then we study how Chinese word segmenter performance is related to MT performance, and provide a way to tune the “word” unit in Chinese so that it can better match up with the English word granularity, and therefore improve MT performance. The second challenge we address is different word order between Chinese and English. We first perform error analysis on three state-of-the-art MT systems to see what the most prominent problems are, especially how different word orders cause translation errors. According to our findings, we propose two solutions to improve Chinese-to-English …
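As background on why segmentation matters at all, the classic forward maximum-matching baseline is sketched below; the thesis itself builds a feature-rich supervised segmenter, so this snippet (with its assumed dictionary `vocab` and window `max_len`) is purely illustrative.

# Classic forward maximum matching: at each position take the longest
# dictionary word that matches, falling back to a single character.
def max_match(text, vocab, max_len=4):
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in vocab:
                words.append(text[i:i + n])
                i += n
                break
    return words

# max_match("北京大学生", {"北京", "大学", "大学生", "北京大学"}) -> ["北京大学", "生"],
# illustrating the ambiguity ("Peking University" + "student") that motivates
# learning the word unit rather than fixing it by dictionary lookup.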