## Phrase-Based Statistical Machine Translation (2002)


### Download Links

- [www-i6.informatik.rwth-aachen.de]
- [www.is.cs.cmu.edu]
- DBLP

### Other Repositories/Bibliography

Citations: 114 (16 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Zens02phrase-basedstatistical,
  author    = {Richard Zens and Franz Josef Och and Hermann Ney},
  title     = {Phrase-Based Statistical Machine Translation},
  booktitle = {},
  year      = {2002},
  pages     = {18--32},
  publisher = {Springer Verlag}
}
```



### Abstract

This paper is based on the work carried out in the framework of the Verbmobil project, which is a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this …

### Citations

1544 | BLEU: a method for automatic evaluation of machine translation
- Papineni, Roukos, et al.
- 2002
Citation Context: …s through the lattice. 7. Experimental results. The automatic evaluation criteria are computed using the IWSLT 2005 evaluation server. For all the experiments, we report the two accuracy measures BLEU [22] and NIST [23] as well as the two error rates WER and PER. For the primary submissions, we also report the two accuracy measures Meteor [24] and GTM [25]. All those criteria are computed with respect to …
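The BLEU measure mentioned in the excerpt above is a geometric mean of modified n-gram precisions times a brevity penalty. A simplified single-reference sketch (not the evaluation server's implementation; function and variable names are illustrative, and smoothing is ad hoc):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU: geometric mean of modified
    n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smoothed to avoid log(0)
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(round(bleu(hyp, ref), 3))  # → 1.0 for an exact match
```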

1308 | A systematic comparison of various statistical alignment models
- Och, Ney
- 2003
Citation Context: …s, we use a refined alignment probability $p(a_j - a_{j-1} \mid G(e_{a_j}), I)$ that conditions the jump widths of the alignment positions $a_j - a_{j-1}$ on the word class $G(e_{a_j})$. This is the so-called homogeneous HMM [19]. 4.5. Word penalties. Several word penalties are used in the rescoring step:

$$h_{\mathrm{WP}}(f_1^J, e_1^I) = \begin{cases} I & \text{(a)} \\ I/J & \text{(b)} \\ 2\,|I-J|/(I+J) & \text{(c)} \end{cases} \qquad (21)$$

The word penalties are heuristics that affect the genera…
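The three word-penalty heuristics (a)-(c) quoted above are simple functions of the source length J and target length I. A direct transcription (function and key names chosen here for illustration):

```python
def word_penalties(J, I):
    """Three heuristic word penalties over source length J and target
    length I: (a) target length, (b) length ratio,
    (c) symmetric relative length difference."""
    return {
        "a": I,
        "b": I / J,
        "c": 2 * abs(I - J) / (I + J),
    }

print(word_penalties(J=10, I=8))
```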

1212 | The mathematics of statistical machine translation: Parameter estimation - Brown, Pietra, et al. - 1993

803 | Srilm - an extensible language modeling toolkit
- Stolcke
- 2002
Citation Context: …t the average sentence and phrase lengths. The model scaling factors can be adjusted to prefer longer sentences and longer phrases. 3.5. Target language model. We use the SRI language modeling toolkit [17] to train a standard n-gram language model. The smoothing technique we apply is the modified Kneser-Ney discounting with interpolation. The order of the language model depends on the translation direc…
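The excerpt describes training an n-gram model with modified Kneser-Ney smoothing via the SRI toolkit. As a toy stand-in (plain absolute discounting on bigrams, not SRILM's modified Kneser-Ney), the interpolation idea can be sketched as:

```python
from collections import Counter

def train_bigram_lm(sentences, discount=0.75):
    """Tiny interpolated bigram LM with absolute discounting: a toy
    stand-in for the smoothed models SRILM estimates."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def prob(w, prev):
        p_uni = unigrams[w] / total
        count = unigrams[prev]
        if count == 0:
            return p_uni  # unseen history: back off to the unigram
        types = sum(1 for (a, _b) in bigrams if a == prev)
        lam = discount * types / count  # mass freed by discounting
        return max(bigrams[(prev, w)] - discount, 0) / count + lam * p_uni

    return prob

prob = train_bigram_lm([["the", "cat"], ["the", "dog"]])
print(round(prob("cat", "the"), 5))
```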

608 | A statistical approach to machine translation
- Brown, Cocke, et al.
- 1990
Citation Context: …

$$\hat{e}_1^{\hat I} = \operatorname*{argmax}_{I,\,e_1^I} \Pr(e_1^I \mid f_1^J) = \operatorname*{argmax}_{I,\,e_1^I} \left\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \right\} \qquad (1)$$

This decomposition into two knowledge sources is known as the source-channel approach to statistical machine translation [1]. It allows an independent modeling of the target language model $\Pr(e_1^I)$ and the translation model $\Pr(f_1^J \mid e_1^I)$. The target language model describes the well-formedness of the target language…
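The source-channel decision rule in the excerpt selects the target sentence maximizing Pr(e) · Pr(f|e). A toy enumeration over a fixed candidate set (all probabilities invented for illustration) shows the shape of the argmax:

```python
def source_channel_decode(candidates, lm, tm, source):
    """Pick argmax_e Pr(e) * Pr(f|e): language model times translation model."""
    return max(candidates, key=lambda e: lm[e] * tm[(source, e)])

lm = {"the house": 0.6, "house the": 0.1}          # Pr(e): well-formedness
tm = {("das Haus", "the house"): 0.5,
      ("das Haus", "house the"): 0.5}              # Pr(f|e): adequacy
print(source_channel_decode(["the house", "house the"], lm, tm, "das Haus"))
# → the house (the language model breaks the translation-model tie)
```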

479 | Minimum error rate training for statistical machine translation
- Och
- 2003
Citation Context: …are trained according to the maximum entropy principle, e.g., using the GIS algorithm. Alternatively, one can train them with respect to the final translation quality measured by an error criterion [3]. For the IWSLT evaluation campaign, we optimized the scaling factors with respect to a linear interpolation of WER, PER, BLEU and NIST using the Downhill Simplex algorithm from [4]. 1.3. Phrase-based…

471 | Improved Statistical Alignment Models - Och, Ney - 2000

384 | Discriminative training and maximum entropy models for statistical machine translation
- Och, Ney
Citation Context: …ce in the target language. 1.2. Log-linear model. An alternative to the classical source-channel approach is the direct modeling of the posterior probability $\Pr(e_1^I \mid f_1^J)$. Using a log-linear model [2], we obtain:

$$\Pr(e_1^I \mid f_1^J) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)\right)}{\sum_{e_1'^{I'}} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e_1'^{I'}, f_1^J)\right)} \qquad (2)$$

The denominator represents a normalization factor that depends only on the s…
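The log-linear model in the excerpt is a softmax over weighted feature-function sums. A minimal sketch with invented feature values:

```python
import math

def log_linear_posterior(feature_sums, lambdas):
    """Pr(e|f) for each hypothesis e: softmax over sum_m lambda_m * h_m(e, f)."""
    scores = [math.exp(sum(l * h for l, h in zip(lambdas, hs)))
              for hs in feature_sums]
    z = sum(scores)  # normalization over the competing hypotheses
    return [s / z for s in scores]

# two hypotheses, two feature functions (all values invented)
posterior = log_linear_posterior([[1.0, 0.5], [0.2, 0.1]], lambdas=[1.0, 2.0])
print(posterior)
```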

329 | Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
- Doddington
Citation Context: …lattice. 7. Experimental results. The automatic evaluation criteria are computed using the IWSLT 2005 evaluation server. For all the experiments, we report the two accuracy measures BLEU [22] and NIST [23] as well as the two error rates WER and PER. For the primary submissions, we also report the two accuracy measures Meteor [24] and GTM [25]. All those criteria are computed with respect to multiple re…

280 | Improved alignment models for statistical machine translation
- Och, Tillmann, et al.
- 1999
Citation Context: …th the word alignment. Thus, the words of the source phrase are aligned only to words in the target phrase and vice versa. This criterion is identical to the alignment template criterion described in [13]. We use relative frequencies to estimate the phrase translation probabilities:

$$p(\tilde f \mid \tilde e) = \frac{N(\tilde f, \tilde e)}{N(\tilde e)}$$

Here, the number of co-occurrences of a phrase pair $(\tilde f, \tilde e)$ that are consistent with t…
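The phrase translation probabilities in the excerpt are plain relative frequencies over extracted phrase pairs. A sketch over an invented pair list:

```python
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    """Estimate p(f~|e~) = N(f~, e~) / N(e~) by relative frequency
    over a list of extracted (source phrase, target phrase) pairs."""
    pair_counts = Counter(phrase_pairs)
    target_counts = Counter(e for _f, e in phrase_pairs)
    return {(f, e): c / target_counts[e] for (f, e), c in pair_counts.items()}

pairs = [("das Haus", "the house"), ("das Haus", "the house"),
         ("ein Haus", "the house"), ("das Auto", "the car")]
probs = phrase_translation_probs(pairs)
print(round(probs[("das Haus", "the house")], 3))  # → 0.667
```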

214 | HMM-based word alignment in statistical translation - Vogel, Ney, et al. - 1996

148 | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
- Banerjee, Lavie
- 2005
Citation Context: …l the experiments, we report the two accuracy measures BLEU [22] and NIST [23] as well as the two error rates WER and PER. For the primary submissions, we also report the two accuracy measures Meteor [24] and GTM [25]. All those criteria are computed with respect to multiple references (with the exception of English-Chinese, where only one reference is available)…

119 | Decoding complexity in word-replacement translation models - Knight - 1999

80 | Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world
- Takezawa, Sumita, et al.
- 2002
Citation Context: …n, the reported improvements might have been larger with a proper handling of the vocabularies. 6. Tasks and corpora. The experiments were carried out on the Basic Travel Expression Corpus (BTEC) task [20]. This is a multilingual speech corpus which contains tourism-related sentences similar to those that are found in phrase books. The corpus statistics are shown in Table 1. For the supplied data track…

65 | An efficient method for determining bilingual word classes - Och - 1999

62 | Improvements in phrase-based statistical machine translation
- Zens, Ney
- 2004
Citation Context: …veral models (also called feature functions). In this section, we will describe the models that are used in the first pass, i.e., during search. This is an improved version of the system described in [12]. More specifically, the models are: a phrase translation model, a word-based translation model, a deletion model, word and phrase penalty, a target language model and a reordering model. 3.1. Phrase-b…

53 | Evaluation of machine translation and its evaluation
- Turian, Shen, et al.
- 2003
Citation Context: …ents, we report the two accuracy measures BLEU [22] and NIST [23] as well as the two error rates WER and PER. For the primary submissions, we also report the two accuracy measures Meteor [24] and GTM [25]. All those criteria are computed with respect to multiple references (with the exception of English-Chinese, where only one reference is available)…

41 | An Inequality and Associated Maximization Technique - Baum - 1972

38 | Semantic-based transfer - Emele, Dorna, et al. - 1996

34 | Algorithms for statistical translation of spoken language - Ney, Nießen, et al. - 2000

34 | Generation of word graphs in statistical machine translation
- Ueffing, Och, et al.
- 2002
Citation Context: …arch algorithms generate a word graph containing the most likely translation hypotheses. Out of this word graph we extract N-best lists. For more details on word graphs and N-best list extraction, see [10, 11]. 3. Models used during search. We use a log-linear combination of several models (also called feature functions). In this section, we will describe the models that are used in the first pass, i.e., du…

33 | Novel reordering approaches in phrase-based statistical machine translation
- Kanthak, Vilar, et al.
- 2005
Citation Context: …sentence so that the overall search can generate nonmonotone translations. Using this approach, it is very simple to experiment with various reordering constraints, e.g., the constraints proposed in [6]. Alternatively, we can use ASR lattices as input and translate them without changing the search algorithm, cf. [7]. A disadvantage when translating lattices with this method is that the search is mon…

31 | Word reordering and a dynamic programming beam search algorithm for statistical machine translation
- Tillmann, Ney
- 2003
Citation Context: …following idea: while traversing the input graph, a phrase can be skipped and processed later. Source cardinality synchronous search. For single-word based models, this search strategy is described in [8]. The idea is that the search proceeds synchronously with the cardinality of the already translated source positions. Here, we use a phrase-based version of this idea. To make the search problem feasi…

21 | Quine’s Empirical Assumptions - Chomsky - 1968

16 | Reordering Constraints for Phrase-Based Statistical Machine Translation
- Zens, Ney, et al.
- 2004
Citation Context: …hronously with the cardinality of the already translated source positions. Here, we use a phrase-based version of this idea. To make the search problem feasible, the reorderings are constrained as in [9]. Word graphs and N-best lists. The two described search algorithms generate a word graph containing the most likely translation hypotheses. Out of this word graph we extract N-best lists. For more de…

16 | A speech and language database for speech translation research
- Morimoto, Uratani, et al.
- 1994
Citation Context: …'03 and IWSLT'04) were made available for each language pair. As additional training resources for the C-Star track, we used the full BTEC for Japanese-English and the Spoken Language DataBase (SLDB) [21], which consists of transcriptions of spoken dialogs in the domain of hotel reservations. (Footnote: The Japanese-English training corpora (BTEC, SLDB) that we used in the C-Star track were kindly provided…)

10 | Robust content extraction for translation and dialog processing - Reithinger, Engel

9 | Deep linguistic analysis with HPSG - Uszkoreit, Flickinger, et al. - 2000

7 | Clustered language models based on regular expressions for SMT
- Hasan, Ney
- 2005
Citation Context: …One of the first ideas in rescoring is to use additional language models that were not used in the generation procedure. In our system, we use clustered language models based on regular expressions [18]. Each hypothesis is classified by matching it to regular expressions that identify the type of the sentence. Then, a cluster-specific (or sentence-type-specific) language model is interpolated into a…
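The clustered language models in the excerpt classify each hypothesis by regular-expression match before applying a cluster-specific model. The classification step might be sketched as follows (the patterns and cluster names here are invented):

```python
import re

# Hypothetical sentence-type clusters keyed by regular expressions;
# in a real system each cluster would carry its own language model.
CLUSTERS = [
    (re.compile(r"\?\s*$"), "question"),
    (re.compile(r"^(please|could you)\b", re.I), "request"),
]

def classify(hypothesis):
    """Assign a hypothesis to the first matching cluster, else a default."""
    for pattern, name in CLUSTERS:
        if pattern.search(hypothesis):
            return name
    return "default"

print(classify("could you open the window"))  # → request
```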

6 | Phrase-based Translation of Speech Recognizer Word Lattices Using Loglinear Model Combination
- Matusov, Ney
- 2005
Citation Context: …to experiment with various reordering constraints, e.g., the constraints proposed in [6]. Alternatively, we can use ASR lattices as input and translate them without changing the search algorithm, cf. [7]. A disadvantage when translating lattices with this method is that the search is monotone. To overcome this problem, we extended the monotone search algorithm from [5, 7] so that it is possible to re…

5 | Word Graphs for Statistical Machine Translation
- Zens, Ney
- 2005
Citation Context: …arch algorithms generate a word graph containing the most likely translation hypotheses. Out of this word graph we extract N-best lists. For more details on word graphs and N-best list extraction, see [10, 11]. 3. Models used during search. We use a log-linear combination of several models (also called feature functions). In this section, we will describe the models that are used in the first pass, i.e., du…

4 | Functional validation of a machine translation system: Verbmobil - Tessiore, von Hahn - 2000