## Obtaining Word Phrases with Stochastic Inversion Transduction Grammars for Phrase-based Statistical Machine Translation

### BibTeX

@MISC{Sánchez_obtainingword,

author = {J. A. Sánchez and J. M. Benedí},

title = {Obtaining Word Phrases with Stochastic Inversion Transduction Grammars for Phrase-based Statistical Machine Translation},

year = {}

}

### Abstract

Phrase-based statistical translation systems are currently providing excellent results in real machine translation tasks. In these systems, the basic translation units are word phrases. An important problem in estimating phrase-based statistical models is how to obtain word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Transduction Grammar. Preliminary experiments have been carried out on real tasks and promising results have been obtained.
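
To make the idea in the abstract concrete, the following is a minimal sketch of how a bilingual parse can yield word phrases. It is not the authors' system: it assumes a single-nonterminal bracketing grammar with only straight, inverted, and one-to-one lexical rules (no insertions or deletions, so both sentences must have the same length), and the function name `extract_phrases`, the rule weights, and the toy lexicon are all hypothetical. Every node of the best parse tree is read off as a candidate phrase pair.

```python
from functools import lru_cache


def extract_phrases(src, tgt, lex, p_straight=0.4, p_inverted=0.4, floor=1e-9):
    """Best bracketing-ITG parse of a sentence pair; returns (score, phrase_pairs).

    All names and rule weights here are illustrative assumptions.
    """

    @lru_cache(maxsize=None)
    def best(i, j, k, l):
        # Best (score, children) for src[i:j] paired with tgt[k:l].
        if j - i != l - k:
            return 0.0, None  # no insertion/deletion rules in this sketch
        if j - i == 1:
            return lex.get((src[i], tgt[k]), floor), None  # lexical rule A -> x/y
        top = (0.0, None)
        for m in range(1, j - i):
            # Straight rule A -> [A A]: both halves keep the same order.
            kids = ((i, i + m, k, k + m), (i + m, j, k + m, l))
            score = p_straight * best(*kids[0])[0] * best(*kids[1])[0]
            if score > top[0]:
                top = (score, kids)
            # Inverted rule A -> <A A>: the target halves are swapped.
            kids = ((i, i + m, l - m, l), (i + m, j, k, l - m))
            score = p_inverted * best(*kids[0])[0] * best(*kids[1])[0]
            if score > top[0]:
                top = (score, kids)
        return top

    phrases = []

    def walk(span):
        # Each node of the best parse tree spans one aligned phrase pair.
        i, j, k, l = span
        phrases.append((" ".join(src[i:j]), " ".join(tgt[k:l])))
        for child in best(*span)[1] or ():
            walk(child)

    root = (0, len(src), 0, len(tgt))
    score = best(*root)[0]
    if score > 0.0:
        walk(root)
    return score, phrases


# Toy example with a hand-written lexicon (hypothetical probabilities).
src = "the white house".split()
tgt = "la casa blanca".split()
lex = {("the", "la"): 0.9, ("white", "blanca"): 0.8, ("house", "casa"): 0.8}
score, phrases = extract_phrases(src, tgt, lex)
for pair in phrases:
    print(pair)
```

In this toy case the inverted rule pairs "white house" with "casa blanca", so that multi-word pair appears among the extracted phrases alongside the single-word pairs and the full sentence pair. A full SITG would additionally need rules that pair a word with the empty string, and its lexical probabilities would be estimated from data rather than written by hand.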

### Citations

1427 | A systematic comparison of various statistical alignment models
- Och, Ney
- 2003
Citation Context ...oblem that is related to phrase-based statistical translation is to automatically obtain bilingual word phrases from parallel corpora. Several methods have been defined for dealing with this problem (Och & Ney, 2003). In this work, we study a method to obtain word phrases that is based on Stochastic Inversion Transduction Grammars that was proposed in (Wu, 1997). Stochastic Inversion Transduction Grammars (SITG)...

1308 | The mathematics of statistical machine translation: Parameter estimation
- Brown, Pietra, et al.
- 1993
Citation Context ...ary experiments have been carried out on real tasks and promising results have been obtained. 1 Introduction Machine Translation is a problem that can be addressed by means of statistical techniques (Brown, Pietra, Pietra, & Mercer, 1993). In this approach, the process of human language translation is modeled statistically by means of statistical translation models. In order to estimate these statistical translation models, several a...

884 | A maximum-entropy-inspired parser
- Charniak
- 2000
Citation Context ...the translation when some kind of structural information was incorporated in the parsing. Since the training data was not bracketed, we parsed the English part of the corpus with the Charniak parser (Charniak, 2000). Only the bracketing was kept in the corpus and the other information (POS tags and syntactic tags) was removed. We then obtained word phrases according to the bracketing by using the same SITG that ...

648 | A Statistical Approach to Machine Translation
- Brown, Cocke, et al.
- 1990
Citation Context ...e these statistical translation models, several approaches have been proposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of ...

517 | Improved statistical alignment models
- Och, Ney
- 2000
Citation Context ...entence pairs 3,000 Running words 35,067 35,630 3-gram test-set perp. 3.7 3.0 4.1.1 Obtaining a SITG from an aligned corpus For this experiment, a SITG was constructed as follows: the GIZA++ toolkit (Och & Ney, 2000) was used to obtain a translation table and the corresponding probability Pr(f|e). The alignment was carried out in both directions in order to have both insertions and deletions available. This tabl...

495 | Stochastic inversion transduction grammars and bilingual parsing of parallel corpora
- Wu
- 1997
Citation Context ...e & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et al., 2003; Koehn, 2004; Tomás, Lloret, & Casacuberta, 2005...

408 | The alignment template approach to statistical machine translation
- Och, Ney
- 2004
Citation Context ...e literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et al., 2003; Koehn, ...

288 | A syntax-based statistical translation model
- Yamada, Knight
- 2001
Citation Context ...di, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et al., 2003; Koehn, 2004; Tomás, Lloret, & Casacuberta, 2005). Phrase-based statist...

280 | Inside-outside reestimation from partially bracketed corpora
- Pereira, Schabes
- 1992
Citation Context ...d corpus is available, then a modified version of the parsing algorithm can be defined in order to take into account the bracketing of the strings. The modifications are similar to those proposed in (Pereira & Schabes, 1992) for the inside algorithm. Following the notation that is presented in (Pereira & Schabes, 1992), we can define a partially bracketed corpus as a set of sentence pairs that is annotated with parenthe...

202 | A phrase-based, joint probability model for statistical machine translation
- Marcu, Wong
- 2002
Citation Context ...Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et al., 2003; Koehn, 2004; Tomás, Lloret, & Casacuberta, 2005). Phrase-based statistical translation systems are currently providing excellent results in real machine transl...

122 | The RWTH phrase-based statistical machine translation system
- Zens, Bender, et al.
- 2005
Citation Context ...ation models, several approaches have been proposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 20...

68 | Machine Translation with Inferred Stochastic Finite-State Transducers
- Casacuberta, Vidal
- 2004
Citation Context ...tatistical translation models. In order to estimate these statistical translation models, several approaches have been proposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-base...

32 | The CMU statistical machine translation system
- Vogel, Zhang, et al.
- 2003
Citation Context ...proaches have been proposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 200...

29 | A finite-state approach to machine translation
- Bangalore, Riccardi
- 2001
Citation Context ... statistically by means of statistical translation models. In order to estimate these statistical translation models, several approaches have been proposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada &...

16 | Probabilistic estimation of stochastic regular syntax-directed translation schemes
- Casacuberta
- 1995
Citation Context ...(Wu, 1997). Stochastic Inversion Transduction Grammars (SITG) can be viewed as a restricted Stochastic Context-Free Syntax-Directed Transduction Scheme (Aho & Ullman, 1972; Maryanski & Thomason, 1979; Casacuberta, 1995). SITGs can be used to carry out a simultaneous parsing of both the input string and the output string. In this work, we propose to apply this idea to obtain aligned word phrases to be used in phrase...

11 | Properties of stochastic syntax-directed translation schemata
- Maryanski, Thomason
- 1979
Citation Context ...ammars that was proposed in (Wu, 1997). Stochastic Inversion Transduction Grammars (SITG) can be viewed as a restricted Stochastic Context-Free Syntax-Directed Transduction Scheme (Aho & Ullman, 1972; Maryanski & Thomason, 1979; Casacuberta, 1995). SITGs can be used to carry out a simultaneous parsing of both the input string and the output string. In this work, we propose to apply this idea to obtain aligned word phrases t...

4 | Phrase-based alignment models for statistical machine translation
- Tomás, Lloret, et al.
- 2005
Citation Context ... syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et al., 2003; Koehn, 2004; Tomás, Lloret, & Casacuberta, 2005). Phrase-based statistical translation systems are currently providing excellent results in real machine translation tasks. In phrase-based statistical translation systems, the basic translation unit...

2 | The Theory of Parsing, Translation, and Compiling. Volume I: Parsing
- Aho, Ullman
- 1972
Citation Context ...sion Transduction Grammars that was proposed in (Wu, 1997). Stochastic Inversion Transduction Grammars (SITG) can be viewed as a restricted Stochastic Context-Free Syntax-Directed Transduction Scheme (Aho & Ullman, 1972; Maryanski & Thomason, 1979; Casacuberta, 1995). SITGs can be used to carry out a simultaneous parsing of both the input string and the output string. In this work, we propose to apply this idea to o...

2 | TransType2 computer assisted translation (TT2). Technical report. Information Society Technologies (IST) program
- TT2
- 2002

1 | Pharaoh: a beam search for phrase-based statistical machine translation models
- Koehn
- 2004
Citation Context ...roposed in the literature: finite-state techniques (Bangalore & Riccardi, 2001; Casacuberta & Vidal, 2004); alignment techniques (Brown et al., 1990, 1993; Zens, Och, & Ney, 2002; Vogel et al., 2003; Koehn, 2004; Och & Ney, 2004); and syntax-based techniques (Wu, 1997; Yamada & Knight, 2001). Phrase-based techniques are based on the alignment of word phrases (Marcu & Wong, 2002; Zens et al., 2002; Vogel et a...