An Efficient Method for Determining Bilingual Word Classes

by Franz Josef Och

Results 1 - 10 of 32

Phrase-Based Statistical Machine Translation

by Richard Zens, Franz Josef Och, Hermann Ney, 2002
Cited by 64 (3 self)
This paper is based on the work carried out in the framework of the Verbmobil project, which is a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this ...

A survey of statistical machine translation

by Adam Lopez, 2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.

Multi-Align: Combining Linguistic and Statistical Techniques To Improve Alignments for Adaptable MT

by Necip Fazil Ayan, Bonnie Dorr, Nizar Habash - In Proceedings of AMTA’2004, 2004
Cited by 8 (2 self)
The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources.

Statistical Methods for Machine Translation

by Stephan Vogel, Franz Josef Och, Christof Tillmann, Sonja Nießen, Hassan Sawaf, Hermann Ney - Verbmobil: Foundations of Speech-to-Speech Translation, 2000
Cited by 8 (3 self)
In this article we describe the statistical approach to machine translation as implemented in the stattrans module of the Verbmobil system. The statistical translation approach uses two types of information: a translation model and a language model. The language model used is an m-gram model. The translation model comprises a stochastic lexicon and word position parameters. To capture dependencies between word groups in each of the two languages, alignment templates are used. We describe the components of the system and report results on the Verbmobil task. The experience obtained in the Verbmobil project shows that the statistical approach is very competitive with other translation approaches.
1 Introduction. In comparison with written language, speech and especially spontaneous speech poses additional difficulties for the task of automatic translation. Typically, these difficulties are caused by errors of the recognition process, which is carried out before the translation process...

Modelling lexical redundancy for machine translation

by David Talbot, Miles Osborne - In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006
Cited by 7 (1 self)
Certain distinctions made in the lexicon of one language may be redundant when translating into another language. We quantify redundancy among source types by the similarity of their distributions over target types. We propose a language-independent framework for minimising lexical redundancy that can be optimised directly from parallel text. Optimisation of the source lexicon for a given target language is viewed as model selection over a set of cluster-based translation models. Redundant distinctions between types may exhibit monolingual regularities, for example, inflexion patterns. We define a prior over model structure using a Markov random field and learn features over sets of monolingual types that are predictive of bilingual redundancy. The prior makes model selection more robust without the need for language-specific assumptions regarding redundancy. Using these models in a phrase-based SMT system, we show significant improvements in translation quality for certain language pairs.

Improved HMM Alignment Models for Languages with Scarce Resources

by Adam Lopez, Philip Resnik - In (this volume), 2005
Cited by 6 (2 self)
We introduce improvements to statistical word alignment based on the Hidden Markov Model. One improvement incorporates syntactic knowledge. Results on the workshop data show that alignment performance exceeds that of a state-of-the-art system based on more complex models, resulting in over a 5.5% absolute reduction in error on Romanian-English.

Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora

by Felipe Sánchez-Martínez, Mikel L. Forcada
Cited by 5 (2 self)
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality is improved as compared to word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the rest of the modules of the MT system in which the inferred rules are applied.

Analysis of statistical and morphological classes to generate weighted reordering hypotheses on a statistical machine translation system

by Marta R. Costa-jussà, José A. R. Fonollosa - In Proceedings of the ACL-2007 Workshop on Statistical Machine Translation (WMT-07), 2007
Cited by 3 (0 self)
One main challenge of statistical machine translation (SMT) is dealing with word order. The main idea of the statistical machine reordering (SMR) approach is to use the powerful techniques of SMT systems to generate a weighted reordering graph for SMT systems. This technique supplies reordering constraints to an SMT system, using statistical criteria. In this paper, we experiment with different graph pruning which guarantees the translation quality improvement due to reordering at a very low increase of computational cost. The SMR approach is capable of generalizing reorderings, which have been learned during training, by using word classes instead of words themselves. We experiment with statistical and morphological classes in order to choose those which capture the most probable reorderings. Satisfactory results are reported in the WMT07 Es/En task. Our system outperforms the official WMT07 baseline system in terms of BLEU.

The RWTH System For Statistical Translation Of Spoken Dialogues

by Hermann Ney, Franz Josef Och, Stephan Vogel - In Proceedings of the ARPA Workshop on Human Language Technology, 2001
Cited by 3 (1 self)
This paper gives an overview of our work on statistical machine translation of spoken dialogues, in particular in the framework of the Verbmobil project. The goal of the Verbmobil project is the translation of spoken dialogues in the domains of appointment scheduling and travel planning. Starting with the Bayes decision rule as in speech recognition, we show how the required probability distributions can be structured into three parts: the language model, the alignment model and the lexicon model. We describe the components of the system and report results on the Verbmobil task. The experience obtained in the Verbmobil project, in particular a large-scale end-to-end evaluation, showed that the statistical approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches.

Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach

by Ismael García Varea, Franz J. Och, Hermann Ney, Francisco Casacuberta - In Proc. of ACL-EACL, 2001
Cited by 3 (1 self)
Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word-sense disambiguation. One way to deal with this problem within the statistical framework is using maximum entropy methods. In this paper, we present how to use this information within a statistical machine translation system. We show that it is possible to significantly decrease training and test corpus perplexity of the translation models. In addition, we perform a rescoring of N-Best lists using our maximum entropy model and thereby yield an improvement in translation quality. Experimental results are presented with the so-called "Verbmobil Task".

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University