Results 11 - 20 of 21
Stochastic language generation using WIDL-expressions and its application in machine translation and summarization
- In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, 2006
Cited by 4 (0 self)
We propose WIDL-expressions as a flexible formalism that facilitates the integration of a generic sentence realization system within end-to-end language processing applications. WIDL-expressions compactly represent probability distributions over finite sets of candidate realizations, and have optimal algorithms for realization via interpolation with language model probability distributions. We show the effectiveness of a WIDL-based NLG system in two sentence realization tasks: automatic translation and headline generation.
Do we need phrases? Challenging the conventional wisdom in statistical machine translation
- In NAACL, 2006
Cited by 4 (0 self)
We begin by exploring theoretical and practical issues with phrasal SMT, several of which are addressed by syntax-based SMT. Next, to address problems not handled by syntax, we propose the concept of a Minimal Translation Unit (MTU) and develop MTU sequence models. Finally, we incorporate these models into a syntax-based SMT system and demonstrate that it improves on state-of-the-art translation quality within a theoretically more desirable framework.
Unsupervised Syntax-Based Machine Translation
- 2007
Cited by 3 (1 self)
We present a new unsupervised syntax-based MT system, termed U-DOT, which uses the unsupervised U-DOP model for learning paired trees, and which computes the most probable target sentence from the relative frequencies of paired subtrees. We test U-DOT on the German-English Europarl corpus, showing that it outperforms the state-of-the-art phrase-based Pharaoh system. We demonstrate that the inclusion of noncontiguous phrases significantly improves the translation accuracy. This paper presents the first translation results with the data-oriented translation (DOT) model on the Europarl corpus, to the best of our knowledge.
Introduction: Phrase-Based vs Syntax-Based Machine Translation
Phrase-based and syntax-based methods in MT have complementary strengths and shortcomings. While phrase-based methods have been highly successful
Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
- Proceedings of the 3rd Workshop on Building and Using Comparable Corpora. Applications of Parallel and Comparable Corpora in Natural Language Engineering and the Humanities, 2010
Cited by 2 (1 self)
Lack of sufficient linguistic resources and parallel corpora for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The solution proposed in this paper is to exploit the fact that non-parallel bi- or multilingual text resources are much more widely available than parallel translation data. This position paper presents previous research in this field and research plans of the ACCURAT project. Its goal is to find, analyze and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains.
The University of Edinburgh System Description for IWSLT 2007
Cited by 1 (0 self)
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinct domain (BTEC), and a large out-of-domain corpus (Europarl). We concentrated on the corrected text track, and present additional results of our experiments using the open-source Moses MT system with speech input.
PATRICK NGUYEN
This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. An indirect hidden Markov model (IHMM) is proposed to address the synonym matching and word ordering issues in hypothesis alignment. Unlike traditional HMMs whose parameters are trained via maximum likelihood estimation (MLE), the parameters of the IHMM are estimated indirectly from a variety of sources including word semantic similarity, word surface similarity, and a distance-based distortion penalty. The IHMM-based method significantly outperforms the state-of-the-art TER-based alignment model in our experiments on NIST benchmark datasets. Our combined SMT system using the proposed method achieved the best Chinese-to-English translation result in the constrained training track of the 2008 NIST Open MT Evaluation.
English Syntactic Reordering for English-Thai Phrase-Based Statistical Machine Translation
In language pairs with different word orders, the translation accuracy of phrase-based statistical machine translation (SMT) systems decreases. Syntactic reordering approaches can improve phrase-based SMT systems by reordering the words of source-language sentences so that their word order resembles that of the target-language sentences. This paper proposes reordering rules for an English-Thai phrase-based SMT system. Our reordering approach is the first to be tested in an English-Thai phrase-based SMT system. The reordering rules transform both training and test English sentences in a preprocessing step. After the preprocessing step, the word orders of the English sentences are more similar to those of the Thai sentences. The reordering approach improves the accuracy of English-Thai translation in the Moses phrase-based SMT system: the BLEU score increases from 40.05% to 57.45%. Key Words: reordering, English-Thai translation, phrase-based SMT
A Web-Based Interactive Computer Aided Translation Tool
We developed caitra, a novel tool that aids human translators by (a) making suggestions for sentence completion in an interactive machine translation setting, (b) providing alternative word and phrase translations, and (c) allowing them to post-edit machine translation output. The tool uses the Moses decoder, is implemented in Ruby on Rails and C++, and is delivered over the web.
Improving Mid-Range Reordering using Templates of Factors
We extend the factored translation model (Koehn and Hoang, 2007) to allow translations of longer phrases composed of factors such as POS and morphological tags to act as templates for the selection and reordering of surface phrase translations. We also reintroduce into phrase-based decoding the use of alignment information within the decoder, which forms an integral part of decoding in the Alignment Template System (Och, 2002). Results show an increase in translation performance of up to 1.0% BLEU for out-of-domain French–English translation. We also show how this method compares and relates to lexicalized reordering.
Rapid Unsupervised Topic . . .
- 2009
In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source language. For statistical machine translation, we adapt a language model of a target language, a translation lexicon and a phrase table using a source text. For monolingual adaptation, we propose latent Dirichlet-Tree allocation for Bayesian latent semantic analysis. Our model enables rapid incremental language model adaptation via caching the fractional topic counts of word hypotheses decoded from previous speech utterances. Latent Dirichlet-Tree allocation models topic correlation in a tree-based hierarchy and thus addresses the model initialization issue. To address the “bag-of-word” assumption in latent semantic analysis, we extend our approach to N-gram latent Dirichlet-Tree allocation. We investigate a fractional Kneser-Ney smoothing approach to handle

