## An exponential translation model for target language morphology

### BibTeX

@MISC{Subotin_anexponential,

author = {Michael Subotin and Paxfire Inc},

title = {An exponential translation model for target language morphology},

year = {}

}

### Abstract

This paper presents an exponential model for translation into highly inflected languages which can be scaled to very large datasets. As in other recent proposals, it predicts target-side phrases and can be conditioned on source-side context. However, crucially for the task of modeling morphological generalizations, it estimates feature parameters from the entire training set rather than as a collection of separate classifiers. We apply it to English-Czech translation, using a variety of features capturing potential predictors for case, number, and gender, and one of the largest publicly available parallel data sets. We also describe generation and modeling of inflected forms unobserved in training data and decoding procedures for a model with non-local target-side feature dependencies.

### Citations

851 | An Empirical Study of Smoothing Techniques for Language Modeling
- Chen, Goodman
- 1998
Citation Context ...iments described below we trained an exponential model for the p(Y|X) lexical model. For greater speed we estimate the probabilities for the other two models using interpolated Kneser-Ney smoothing (Chen and Goodman, 1998), where the surface form of a rule or an aligned word pair plays the role of a trigram, the pairing of the source surface form with the lemmatized target form plays the role of a bigram, and the sourc... |

759 | SRILM – an extensible language modeling toolkit
- Stolcke
- 2002
Citation Context ...Unaligned words at the outer edges of rules or gaps were disallowed. A 5-gram language model with modified interpolated Kneser-Ney smoothing (Chen and Goodman, 1998) was trained by the SRILM toolkit (Stolcke, 2002) on a set of 208 million running words of text obtained by combining the monolingual Czech text distributed by the 2010 ACL MT workshop with the Czech portion of the training data. The decision r... |

634 | Statistical Phrase-Based Translation
- Koehn, Marcu
- 2003
Citation Context ...proportion of non-parallel sentence pairs. All conditions use word alignments produced by sequential iterations of IBM model 1, HMM, and IBM model 4 in GIZA++, followed by “diag-and” symmetrization (Koehn et al., 2003). Thresholds for phrase extraction and decoder pruning were set to values typical for the baseline system (Chiang, 2007). Unaligned words at the outer edges of rules or gaps were disallowed. A 5-gram... |

375 | Hierarchical phrase-based translation
- Chiang
- 2007
Citation Context ...borate source-side dependencies. 2 Hierarchical phrase-based translation We take as our starting point David Chiang’s Hiero system, which generalizes phrase-based translation to substrings with gaps (Chiang, 2007). Consider for instance the following set of context-free rules with a single non-terminal symbol: ⟨A, A⟩ → ⟨A₁ A₂, A₁ A₂⟩; ⟨A, A⟩ → ⟨d’ A₁ idées A₂, A₁ A₂ ideas⟩; ⟨A, A⟩ → ⟨incolores ... |

267 | A limited memory algorithm for bound constrained optimization
- Byrd, Lu, et al.
- 1995
Citation Context ...the experimental condition. 8 Parameter estimation Parameter estimation was performed using a modified version of the maximum entropy module from SciPy (Jones et al., 2001) and the L-BFGS-B algorithm (Byrd et al., 1995). The objective included an ℓ2 regularizer with the regularization trade-off set to 1. The amount of training data presented a practical challenge for parameter estimation. Several strategies were pu... |

136 | Feature selection, L1 vs. L2 regularization, and rotational invariance
- Ng
Citation Context ...valently, its logarithm: $LL(\vec{w}) = \log \prod_{m=1}^{M} p(Y_m \mid X_m) = \sum_{m=1}^{M} \log p(Y_m \mid X_m)$, where the expressions range over all training instances {m}. In this work we extend the objective using an ℓ2 regularizer (Ng, 2004; Gao et al., 2007). We obtain the counts of instances and features from the standard heuristics used to extract the grammar from a word-aligned parallel corpus. Exponential models and other classifie... |

95 | Improving statistical machine translation using word sense disambiguation - Carpuat, Wu - 2007 |

31 | Efficient large-scale distributed training of conditional maximum entropy models - Mann, McDonald, et al. - 2009 |

27 | Enriching morphologically poor languages for statistical machine translation - Avramidis, Koehn - 2008 |

10 | CzEng 0.9: Large parallel treebank with rich annotation - Bojar, Žabokrtský - 2009 |

7 | A Discriminative Lexicon Model for Complex Morphology - Jeong, Toutanova, et al. - 2010 |

4 | Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT - Ramanathan, et al. - 2009 |

3 | Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
- Yeniterzi, Oflazer
- 2010
Citation Context ...system correlates significantly with a measure of target-side, but not source-side morphological complexity. Recently, several studies (Bojar, 2007; Avramidis and Koehn, 2009; Ramanathan et al., 2009; Yeniterzi and Oflazer, 2010) proposed modeling target-side morphology in a phrase-based factored models framework (Koehn and Hoang, 2007). Under this approach linguistic annotation of source sentences is analyzed using heuristic... |

2 | How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation
- 2007
Citation Context ...s that would suffice to generate the English translation 1b for the French sentence 1a. 1a. d’ incolores idées vertes dorment furieusement 1b. colorless green ideas sleep furiously As shown by Chiang (2007), a weighted grammar of this form can be collected and scored by simple extensions of standard methods for phrase-based translation and efficiently combined with a language model in a CKY decoder to a... |

1 | The Ninth Conference of the Association for Machine Translation in the Americas (AMTA-2010 - Jones, Oliphant, et al. |

1 | SciPy: Open source scientific tools for Python. http://www.scipy.org
- Koehn, Hoang
- 2007
Citation Context ...ly, several studies (Bojar, 2007; Avramidis and Koehn, 2009; Ramanathan et al., 2009; Yeniterzi and Oflazer, 2010) proposed modeling target-side morphology in a phrase-based factored models framework (Koehn and Hoang, 2007). Under this approach linguistic annotation of source sentences is analyzed using heuristics to identify relevant structural phenomena, whose occurrences are in turn used to compute additional re... |

1 | Programátorská dokumentace k projektu Morfo. http://ufal.mff.cuni.cz/morfo - Kolovratník, Přikryl - 2008 |

1 | Czech: An Essential Grammar - 2005 |