Results 1 
3 of
3
MACHINE TRANSLATION BY PATTERN MATCHING
, 2008
"... The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amoun ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. CallisonBurch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main
Domain Adaptation of Translation Models for Multilingual Applications
, 2009
"... number W0550432. ..."
(Show Context)
Abstract Title of dissertation: MACHINE TRANSLATION BY PATTERN MATCHING
"... The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amoun ..."
Abstract
 Add to MetaCart
(Show Context)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. CallisonBurch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it