• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Word-level confidence estimation for machine translation using phrase-based translation models (2005)

by Nicola Ueffing, Hermann Ney
Venue:Computational Linguistics
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 23
Next 10 →

A survey of statistical machine translation

by Adam Lopez , 2007
"... Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular tec ..."
Abstract - Cited by 30 (3 self) - Add to MetaCart
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.

Transductive learning for statistical machine translation

by Nicola Ueffing - In Proc. of ACL , 2007
"... Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve ..."
Abstract - Cited by 12 (3 self) - Add to MetaCart
Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST Chinese–English largedata track. We show a significant improvement in translation quality on both tasks. 1

NRC’s PORTAGE system for WMT 2007

by Nicola Ueffing, Michel Simard, Samuel Larkin, Howard Johnson - In Proc. ACL Workshop on SMT , 2007
"... We present the PORTAGE statistical machine translation system which participated in the shared task of the ACL 2007 Second Workshop on Statistical Machine Translation. The focus of this description is on improvements which were incorporated into the system over the last year. These include adapted l ..."
Abstract - Cited by 10 (3 self) - Add to MetaCart
We present the PORTAGE statistical machine translation system which participated in the shared task of the ACL 2007 Second Workshop on Statistical Machine Translation. The focus of this description is on improvements which were incorporated into the system over the last year. These include adapted language models, phrase table pruning, an IBM1-based decoder feature, and rescoring with posterior probabilities. 1

Collaborative Entity Extraction and Translation

by Heng Ji, Ralph Grishman - Proc. International Conference on Recent Advances in Natural Language Processing 2007. Borovets , 2007
"... Entity extraction is the task of identifying names and nominal phrases (‘mentions’) in a text and linking coreferring mentions. We propose the use of a new source of data for improving entity extraction: the information gleaned from large bitexts and captured by a statistical, phrase-based machine t ..."
Abstract - Cited by 8 (6 self) - Add to MetaCart
Entity extraction is the task of identifying names and nominal phrases (‘mentions’) in a text and linking coreferring mentions. We propose the use of a new source of data for improving entity extraction: the information gleaned from large bitexts and captured by a statistical, phrase-based machine translation system. We translate the individual mentions and test properties of the translated mentions, as well as comparing the translations of coreferring mentions. The results provide feedback to improve source language entity extraction. Experiments on Chinese and English show that this approach can significantly improve Chinese entity extraction (2.2%relative improvement in name tagging F-measure, representing a 15.0 % error reduction), as well as Chinese to English entity translation (9.1 % relative improvement in F-measure), over state-of-the-art entity extraction and machine translation systems.

Confidence driven unsupervised semantic parsing

by Dan Goldwasser, Roi Reichart, James Clarke, Dan Roth - In Proc. of the Meeting of Association for Computational Linguistics (ACL , 2011
"... Current approaches for semantic parsing take a supervised approach requiring a considerable amount of training data which is expensive and difficult to obtain. This supervision bottleneck is one of the major difficulties in scaling up semantic parsing. We argue that a semantic parser can be trained ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
Current approaches for semantic parsing take a supervised approach requiring a considerable amount of training data which is expensive and difficult to obtain. This supervision bottleneck is one of the major difficulties in scaling up semantic parsing. We argue that a semantic parser can be trained effectively without annotated data, and introduce an unsupervised learning algorithm. The algorithm takes a self training approach driven by confidence estimation. Evaluated over Geoquery, a standard dataset for this task, our system achieved 66 % accuracy, compared to 80 % of its fully supervised counterpart, demonstrating the promise of unsupervised approaches for this task. 1

Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser

by Roi Reichart
"... The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than man ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatic selection of high quality parses created by unsupervised parsers is an important problem. In this paper we present PUPA, a POS-based Unsupervised Parse Assessment algorithm. The algorithm assesses the quality of a parse tree using POS sequence statistics collected from a batch of parsed sentences. We evaluate the algorithm by using an unsupervised POS tagger and an unsupervised parser, selecting high quality parsed sentences from English (WSJ) and German (NEGRA) corpora. We show that PUPA outperforms the leading previous parse assessment algorithm for supervised parsers, as well as a strong unsupervised baseline. Consequently, PUPA allows obtaining high quality parses without any human involvement. 1

Semi-supervised model adaptation for statistical machine translation

by Nicola Ueffing, Gholamreza Haffari, Anoop Sarkar , 2008
"... ..."
Abstract - Cited by 4 (2 self) - Add to MetaCart
Abstract not found

Improving Word Alignment with Bridge Languages

by Shankar Kumar, Franz Och, Wolfgang Macherey, Google Inc
"... We describe an approach to improve Statistical Machine Translation (SMT) performance using multi-lingual, parallel, sentence-aligned corpora in several bridge languages. Our approach consists of a simple method for utilizing a bridge language to create a word alignment system and a procedure for com ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
We describe an approach to improve Statistical Machine Translation (SMT) performance using multi-lingual, parallel, sentence-aligned corpora in several bridge languages. Our approach consists of a simple method for utilizing a bridge language to create a word alignment system and a procedure for combining word alignment systems from multiple bridge languages. The final translation is obtained by consensus decoding that combines hypotheses obtained using all bridge language word alignments. We present experiments showing that multilingual, parallel text in Spanish, French, Russian, and Chinese can be utilized in this framework to improve translation performance on an Arabic-to-English task. 1

Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

by Graeme Blackwood, Adrià De Gispert, William Byrne
"... A novel and robust approach to improving statistical machine translation fluency is developed within a minimum Bayesrisk decoding framework. By segmenting translation lattices according to confidence measures over the maximum likelihood translation hypothesis we are able to focus on regions with pot ..."
Abstract - Cited by 3 (2 self) - Add to MetaCart
A novel and robust approach to improving statistical machine translation fluency is developed within a minimum Bayesrisk decoding framework. By segmenting translation lattices according to confidence measures over the maximum likelihood translation hypothesis we are able to focus on regions with potential translation errors. Hypothesis space constraints based on monolingual coverage are applied to the low confidence regions to improve overall translation fluency. 1

Active Learning for Statistical Phrase-based Machine Translation

by Gholamreza Haffari, Maxim Roy, Anoop Sarkar , 2009
"... Statistical machine translation (SMT) models need large bilingual corpora for training, which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Statistical machine translation (SMT) models need large bilingual corpora for training, which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and show significant improvements in translation compared to a random sentence selection baseline, when test and training data are taken from the same or different domains. Experimental results are shown in a simulated setting using three language pairs, and in a realistic situation for Bangla-English, a language pair with limited translation resources.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University