### BibTeX

```bibtex
@misc{Foster_mixture-modeladaptation,
  author = {George Foster},
  title  = {Mixture-Model Adaptation for SMT},
  year   = {}
}
```

### Abstract

We describe a mixture-model approach to adapting a statistical machine translation system to new domains, using weights that depend on text distances to mixture components. We investigate a number of variants of this approach, including cross-domain versus dynamic adaptation; linear versus loglinear mixtures; language and translation model adaptation; different methods of assigning weights; and the granularity of the source unit being adapted to. The best methods achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
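The abstract contrasts linear and loglinear mixtures. As a rough illustration (the component probabilities and weights below are made up, not taken from the paper), a linear mixture averages component probabilities, while a loglinear mixture multiplies weighted powers of them:

```python
import math

# Linear mixture: p(w) = sum_c lambda_c * p_c(w)
def linear_mixture(probs, weights):
    return sum(lam * p for lam, p in zip(weights, probs))

# Loglinear mixture (unnormalized): p(w) proportional to prod_c p_c(w)^lambda_c
def loglinear_mixture(probs, weights):
    return math.exp(sum(lam * math.log(p) for lam, p in zip(weights, probs)))

component_probs = [0.2, 0.001]  # hypothetical p_c(w) under two domain components
weights = [0.7, 0.3]            # hypothetical mixture weights, summing to 1

lin = linear_mixture(component_probs, weights)
loglin = loglinear_mixture(component_probs, weights)
# The loglinear combination penalizes a word more heavily when any
# component assigns it low probability.
```

One design consequence visible even in this toy: the loglinear form acts like an intersection of the components, while the linear form acts like a union.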

### Citations

2968 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context: ...form: −p̃(w) log p̃_doc(w), where p̃(w) is the relative frequency of word w within the component or document, and p_doc(w) is the proportion of components it appears in. Latent Semantic Analysis (LSA) (Deerwester et al., 1990) is a technique for implicitly capturing the semantic properties of texts, based on the use of Singular Value Decomposition to produce a rank-reduced approximation of an original matrix of word and do...
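The rank-reduction step this context describes can be sketched as follows; the toy word-document matrix and rank k=2 are illustrative stand-ins for the paper's setup, which the context says reduces the rank to 100:

```python
import numpy as np

# Toy word-document count matrix A (rows: words, columns: documents).
docs = [["trade", "tariff", "trade"],
        ["gene", "protein"],
        ["trade", "export"]]
vocab = sorted({w for d in docs for w in d})
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        A[vocab.index(w), j] += 1

# Rank-k approximation via SVD: A is approximately U_k S_k V_k^T.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
# Rows of doc_vecs are document coordinates in the k-dimensional latent space.
doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 2 share "trade" and land close together in the latent
# space, while document 1 (biology terms) does not.
```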

1606 | Bleu: A method for automatic evaluation of machine translation
- Papineni, Roukos, et al.
- 2002
Citation Context: ...loglinear approach: p(t, a|s) ∝ exp[∑_i α_i f_i(s, t, a)] (1), where each f_i(s, t, a) is a feature function, and the weights α_i are set using Och's algorithm (Och, 2003) to maximize the system's BLEU score (Papineni et al., 2001) on a development corpus. The features used in this study are: the length of t; a single-parameter distortion penalty on phrase reordering in a, as described in (Koehn et al., 2003); phrase translati...

782 | Statistical Methods for Speech Recognition
- Jelinek
- 1998
Citation Context: ...rpus (as opposed to components), reduced the rank to 100, then calculated the projections of the component and document vectors described in the previous paragraph into the reduced space. Perplexity (Jelinek, 1997) is a standard way of evaluating the quality of a language model on a test text. We define a perplexity-based distance metric p_c(q)^(1/|q|), where p_c(q) is the probability assigned to q by an n-gram la...
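The perplexity-based distance in this context can be sketched with a smoothed unigram model standing in for the n-gram LM (the vocabulary, smoothing, and toy corpora are illustrative assumptions):

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model (a stand-in for the n-gram LM)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def normalized_prob(query, lm):
    """p_c(q)^(1/|q|): geometric mean of per-token probabilities."""
    logp = sum(math.log(lm[w]) for w in query)
    return math.exp(logp / len(query))

vocab = {"trade", "tariff", "gene", "protein"}
news = unigram_lm(["trade", "tariff", "trade", "trade"], vocab)
bio = unigram_lm(["gene", "protein", "gene", "gene"], vocab)

q = ["trade", "tariff"]
d_news = normalized_prob(q, news)  # higher: q is "closer" to the news component
d_bio = normalized_prob(q, bio)
```

Working in log space, as above, avoids underflow on realistic query lengths.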

505 | Lattice-based minimum error rate training for statistical machine translation
- Macherey, Och, et al.
- 2008
Citation Context: ...rase t̃_k. To model p(t, a|s), we use a standard loglinear approach: p(t, a|s) ∝ exp[∑_i α_i f_i(s, t, a)] (1), where each f_i(s, t, a) is a feature function, and the weights α_i are set using Och's algorithm (Och, 2003) to maximize the system's BLEU score (Papineni et al., 2001) on a development corpus. The features used in this study are: the length of t; a single-parameter distortion penalty on phrase reordering ...
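A minimal sketch of scoring candidates with the loglinear model in Equation (1); the feature values and weights below are invented placeholders, and normalization is over an explicit candidate list rather than the full search space:

```python
import math

def loglinear_score(features, weights):
    """Unnormalized score: exp of the weighted feature sum."""
    return math.exp(sum(a * f for a, f in zip(weights, features)))

# Two hypothetical candidate translations, each with three feature values
# (say, target length, distortion penalty, phrase log-probability).
candidates = [[5.0, -1.0, -2.3],
              [5.0, -2.0, -1.9]]
weights = [0.1, 0.5, 1.0]  # the alpha_i; the paper tunes these for BLEU

scores = [loglinear_score(f, weights) for f in candidates]
z = sum(scores)
probs = [s / z for s in scores]  # normalized over the candidate list only
```

In decoding, the normalizer z cancels when comparing candidates, so only the weighted feature sums matter.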

178 | A Cache-Based Natural Language Model for Speech Recognition - Kuhn, Mori - 1990

135 | The mathematics of Machine Translation: Parameter Estimation - Brown, Pietra, et al. - 1993

103 | Modeling long distance dependencies in language: Topic mixtures versus dynamic cache model
- Iyer, Ostendorf
- 1999
Citation Context: ...d technique in machine learning (Hastie et al., 2001). It has been widely used to adapt language models for speech recognition and other applications, for instance using cross-domain topic mixtures (Iyer and Ostendorf, 1999), dynamic topic mixtures (Kneser and Steinbiss, 1993), hierarchical mixtures (Florian and Yarowsky, 1999), and cache mixtures (Kuhn and De Mori, 1990). Most previous work on adaptive SMT focuses on th...

68 | Improvements in phrase-based statistical machine translation
- Zens, Ney
- 2004
Citation Context: ...form: log p(s|t, a) ≈ ∑_{k=1}^{K} log p(s̃_k|t̃_k). We use two different estimates for the conditional probabilities p(t̃|s̃) and p(s̃|t̃): relative frequencies and "lexical" probabilities as described in (Zens and Ney, 2004). In both cases, the "forward" phrase probabilities p(t̃|s̃) are not used as features, but only as a filter on the set of possible translations: for each source phrase s̃ that matches some n-gram in s...
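The relative-frequency estimate mentioned in this context can be sketched over a toy set of extracted phrase pairs (the pairs themselves are invented for illustration):

```python
from collections import Counter

# Hypothetical extracted phrase pairs (source phrase, target phrase).
pairs = [("la maison", "the house"),
         ("la maison", "the home"),
         ("la maison", "the house"),
         ("maison bleue", "blue house")]

joint = Counter(pairs)              # count(s, t)
src = Counter(s for s, _ in pairs)  # count(s)

def phrase_prob(s, t):
    """Relative-frequency estimate p(t|s) = count(s, t) / count(s)."""
    return joint[(s, t)] / src[s]
```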

51 | Adaptation of the Translation Model for Statistical Machine Translation based on Information Retrieval - Hildebrand, Eck, et al. - 2005

39 | On the dynamic adaptation of stochastic language models
- Kneser, Steinbiss
- 1993
Citation Context: ...1). It has been widely used to adapt language models for speech recognition and other applications, for instance using cross-domain topic mixtures (Iyer and Ostendorf, 1999), dynamic topic mixtures (Kneser and Steinbiss, 1993), hierarchical mixtures (Florian and Yarowsky, 1999), and cache mixtures (Kuhn and De Mori, 1990). Most previous work on adaptive SMT focuses on the use of IR techniques to identify a relevant subset ...

38 | Language model adaptation for statistical machine translation based on information retrieval - Eck, Vogel, et al. - 2004

37 | A discriminative global training algorithm for statistical MT
- Tillmann, Zhang
- 2006
Citation Context: ...ses, the amount of text available to train each, and therefore its reliability, decreases. This makes it suitable for discriminative SMT training, which is still a challenge for large parameter sets (Tillmann and Zhang, 2006; Liang et al., 2006). Techniques for assigning mixture weights depend on the setting. In cross-domain adaptation, knowledge of both source and target texts in the in-domain sample can be used to opti...

26 | Dynamic nonlocal language modeling via hierarchical topic-based adaptation
- Florian, Yarowsky
- 1999
Citation Context: ...s for speech recognition and other applications, for instance using cross-domain topic mixtures (Iyer and Ostendorf, 1999), dynamic topic mixtures (Kneser and Steinbiss, 1993), hierarchical mixtures (Florian and Yarowsky, 1999), and cache mixtures (Kuhn and De Mori, 1990). Most previous work on adaptive SMT focuses on the use of IR techniques to identify a relevant subset of the training corpus from which an adapted model ...

6 | Self-training for machine translation - Ueffing - 2006

4 | The NiCT-ATR statistical machine translation system for the IWSLT 2006 evaluation - Zhang, Yamamoto, et al.

1 | Phrasetable smoothing for statistical machine translation - Hastie, Friedman - 2006