Re-evaluating the role of BLEU in machine translation research (2006)

by Chris Callison-Burch, Miles Osborne
Venue: EACL 2006
Citations: 53 (3 self-citations)

BibTeX

@INPROCEEDINGS{Callison-burch06re-evaluatingthe,
    author = {Chris Callison-Burch and Miles Osborne},
    title = {Re-evaluating the Role of {BLEU} in Machine Translation Research},
    booktitle = {Proceedings of EACL},
    year = {2006},
    pages = {249--256}
}


Abstract

We argue that the machine translation community is overly reliant on the Bleu evaluation metric. We show that an improved Bleu score is neither necessary nor sufficient for achieving an actual improvement in translation quality, and we give two significant counterexamples to Bleu's correlation with human judgments of quality. This opens new possibilities for research that was previously dismissed as unpromising because it failed to improve Bleu scores.
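For readers unfamiliar with the metric, here is a minimal sketch of unsmoothed sentence-level BLEU as defined by Papineni et al. (2002): the geometric mean of clipped n-gram precisions for n = 1 to 4, multiplied by a brevity penalty. This is not code from the paper; the function names `ngrams` and `sentence_bleu` and the example sentences are hypothetical, chosen only for illustration. The example shows one way BLEU can be blind to meaning: because it only counts matching n-grams, swapping the subject and object chunks around a mismatched phrase reverses who shot whom yet leaves the score exactly the same.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, references, max_n=4):
    """Unsmoothed sentence-level BLEU (illustrative sketch, not a standard scorer).

    hypothesis: list of tokens; references: list of token lists.
    Returns 0.0 if any n-gram precision is zero, since no smoothing is applied.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count in any one reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        # Uniform weights 1/max_n give the geometric mean of the precisions.
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the hypothesis length.
    c = len(hypothesis)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


# Hypothetical example: the two hypotheses have opposite meanings.
reference = "the young police officer shot the armed bank robber near the station .".split()
hyp_a = "the young police officer fired at the armed bank robber".split()
hyp_b = "the armed bank robber fired at the young police officer".split()

score_a = sentence_bleu(hyp_a, [reference])
score_b = sentence_bleu(hyp_b, [reference])
print(score_a, score_b)  # identical scores despite opposite meanings
```

Every n-gram that crosses a chunk boundary contains "fired" or "at", neither of which occurs in the reference, so both hypotheses have the same clipped counts at every order. Standard MT evaluations compute corpus-level BLEU, which pools clipped counts and lengths over all sentences before taking the geometric mean; the per-sentence version here is only for illustration.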
