Results 1 -
3 of
3
Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
"... We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to ..."
Abstract
- Add to MetaCart
We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST-08 evaluations by 1.3 absolute BLEU, which is statistically significant. 1
Pruning Exponential Language Models
"... Abstract—Language model pruning is an essential technology for speech applications running on resource-constrained devices, and many pruning algorithms have been developed for conventional word n-gram models. However, while exponential language models can give superior performance, there has been li ..."
Abstract
- Add to MetaCart
Abstract—Language model pruning is an essential technology for speech applications running on resource-constrained devices, and many pruning algorithms have been developed for conventional word n-gram models. However, while exponential language models can give superior performance, there has been little work on the pruning of these models. In this paper, we propose several pruning algorithms for general exponential language models. We show that our best algorithm applied to an exponential n-gram model outperforms existing n-gram model pruning algorithms by up to 0.4 % absolute in speech recognition word-error rate on Wall Street Journal and Broadcast News data sets. In addition, we show that Model M, an exponential class-based language model, retains its performance improvement over conventional word n-gram models when pruned to equal size, with gains of up to 2.5 % absolute in word-error rate. I.
INTERSPEECH 2011 Personalizing Model M for Voice-search
"... Model M is a recently proposed class based exponential n-gram language model. In this paper, we extend it with personalization features, address the scalability issues present with large data sets, and test its effectiveness on the Bing Mobile voice-search task. We find that Model M by itself reduce ..."
Abstract
- Add to MetaCart
Model M is a recently proposed class based exponential n-gram language model. In this paper, we extend it with personalization features, address the scalability issues present with large data sets, and test its effectiveness on the Bing Mobile voice-search task. We find that Model M by itself reduces both perplexity and word error rate compared with a conventional model, and that the personalization features produce a further significant improvement. The personalization features provide a very large improvement when the history contains a relevant query; thus the overall effect is gated by the number of times a user requeries a past request. Index Terms: voice search, language modeling, speech recognition, personalization

