Strategies for Training Large Scale Neural Network Language Models
Cited by 41 (4 self)
Abstract—We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model. This leads to a significant reduction in computational complexity. We achieved a relative reduction of around 10% in word error rate on an English Broadcast News speech recognition task, compared with a large 4-gram model trained on 400M tokens.
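The hash-based maximum entropy idea can be illustrated with a minimal sketch, assuming a standalone MaxEnt model whose n-gram indicator features are hashed into one fixed-size weight array with `zlib.crc32`. The class `HashedMaxEnt`, its parameters (`table_size`, `order`, `lr`) and the choice of hash function are hypothetical. The paper trains its maximum entropy component jointly with the neural network, which this sketch does not attempt.

```python
import math
import zlib


class HashedMaxEnt:
    """Hypothetical hash-based maximum entropy LM (standalone sketch).

    n-gram indicator features (context words + predicted word) are hashed
    into one fixed-size weight array instead of a feature dictionary.
    """

    def __init__(self, vocab, table_size=2**16, order=3):
        self.vocab = list(vocab)
        self.table_size = table_size
        self.order = order  # 3 = unigram, bigram and trigram features
        self.weights = [0.0] * table_size

    def _feature_indices(self, history, word):
        indices = []
        for k in range(self.order):
            if k > len(history):
                break  # not enough history for this n-gram order
            context = tuple(history[len(history) - k:])
            key = "|".join(context + (word,)).encode("utf-8")
            indices.append(zlib.crc32(key) % self.table_size)
        return indices

    def probabilities(self, history):
        # Score of each word = sum of the weights in its hashed feature slots.
        scores = [sum(self.weights[i] for i in self._feature_indices(history, w))
                  for w in self.vocab]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(exps)
        return {w: e / z for w, e in zip(self.vocab, exps)}

    def sgd_step(self, history, target, lr=0.1):
        # d log p(target | history) / d score(w) = [w == target] - p(w);
        # every hashed slot of w receives that gradient.
        probs = self.probabilities(history)
        for w in self.vocab:
            grad = (1.0 if w == target else 0.0) - probs[w]
            for i in self._feature_indices(history, w):
                self.weights[i] += lr * grad


model = HashedMaxEnt(["the", "cat", "sat", "</s>"])
for _ in range(200):
    model.sgd_step(["the"], "cat")
print(round(model.probabilities(["the"])["cat"], 3))
```

Hashing trades exact feature storage for occasional collisions: two features that land in the same slot simply share one weight, and the gradient updates for both accumulate there.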
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Google, 2013
Cited by 8 (1 self)
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We report the performance of several well-known types of language models, with the best results achieved by a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves a perplexity of 67.6. A combination of techniques leads to a 35% reduction in perplexity, or a 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also provides log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.
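The two figures quoted for the combined model agree because perplexity equals 2 raised to the per-word cross-entropy in bits. A short check, using only the numbers stated in the abstract (baseline perplexity 67.6, 35% relative perplexity reduction):

```python
import math

baseline_ppl = 67.6                       # unpruned Kneser-Ney 5-gram baseline
combined_ppl = baseline_ppl * (1 - 0.35)  # 35% relative perplexity reduction

# Perplexity = 2 ** H, so the per-word cross-entropy in bits is log2(perplexity).
h_baseline = math.log2(baseline_ppl)
h_combined = math.log2(combined_ppl)
relative_reduction = (h_baseline - h_combined) / h_baseline

print(f"{h_baseline:.2f} bits -> {h_combined:.2f} bits "
      f"({relative_reduction:.1%} relative reduction)")
```

A 35% perplexity reduction removes log2(1/0.65), about 0.62 bits per word, from a baseline of roughly 6.08 bits, which is about a 10% relative reduction in cross-entropy.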
Backoff Inspired Features for Maximum Entropy Language Models
Cited by 1 (0 self)
Maximum Entropy (MaxEnt) language models [1, 2] are linear models that are typically regularized via well-known L1 or L2 terms in the likelihood objective, hence avoiding the need for the kinds of backoff or mixture weights used in smoothed n-gram language models using Katz backoff [3] and similar techniques. Even though backoff cost is not required to regularize the model, we investigate the use of backoff features in MaxEnt models, as well as some backoff-inspired variants. These features are shown to improve model quality substantially, as measured by perplexity and word-error rate reductions, even in very large-scale training scenarios of tens or hundreds of billions of words and hundreds of millions of features. Index Terms: maximum entropy modeling, language modeling, n-gram models, linear models
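One plausible reading of a backoff feature is an indicator tied to the context that fires whenever the full n-gram was not seen in training, standing in for the backoff cost of a Katz-style model. The sketch below assumes exactly that; the helper functions and the `ngram:` / `backoff<k>:` feature prefixes are hypothetical and are not the paper's feature definitions.

```python
from collections import Counter


def ngram_counts(sentences, order=3):
    """Count all n-grams up to `order` in tokenized training sentences."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent)):
            for k in range(order):
                if i - k < 0:
                    break
                counts[tuple(sent[i - k:i + 1])] += 1
    return counts


def maxent_features(history, word, seen, order=3):
    """Hypothetical feature extractor with backoff-style indicator features.

    For each order, emit the n-gram feature if that n-gram occurred in
    training; otherwise emit a feature tied only to the context, playing
    the role of a learned backoff cost for that context and order.
    """
    feats = []
    for k in range(order):
        if k > len(history):
            break  # not enough history for this order
        context = tuple(history[len(history) - k:])
        ngram = context + (word,)
        if ngram in seen:
            feats.append("ngram:" + " ".join(ngram))
        else:
            feats.append(f"backoff{k}:" + " ".join(context))
    return feats


seen = ngram_counts([["the", "cat", "sat"]])
print(maxent_features(["the"], "cat", seen))  # both n-grams were seen
print(maxent_features(["the"], "sat", seen))  # "the sat" unseen, so a backoff feature fires
```

In a MaxEnt model each of these strings would index a weight, so the weight on `backoff1:the` is learned from data rather than computed from discounted counts.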
Randomized Maximum Entropy Language Models
Abstract—We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1] [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELMs that would otherwise be prohibitively large to estimate or store.
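The two randomized structures can be combined as in the following minimal sketch, assuming a Bloom filter built on `hashlib.blake2b` with a different salt per hash function and a separate hashed weight array. `BloomFilter` and `RandomizedWeights` are hypothetical names, not the paper's implementation.

```python
import hashlib


class BloomFilter:
    """Bit array + k salted hashes: no false negatives, rare false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            h = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                salt=seed.to_bytes(16, "little"))
            yield int.from_bytes(h.digest(), "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class RandomizedWeights:
    """Hypothetical MELM weight store: no feature dictionary, no feature list."""

    def __init__(self, num_weights, bloom):
        self.weights = [0.0] * num_weights
        self.bloom = bloom

    def _slot(self, feature):
        # Feature hashing: the weight index is computed, not looked up.
        h = hashlib.blake2b(feature.encode("utf-8"), digest_size=8)
        return int.from_bytes(h.digest(), "little") % len(self.weights)

    def add(self, feature, delta):
        self.bloom.add(feature)
        self.weights[self._slot(feature)] += delta

    def get(self, feature):
        # Rejected by the filter -> certainly never stored -> weight 0.
        # Accepted features (including rare false positives) read their hashed slot.
        if feature not in self.bloom:
            return 0.0
        return self.weights[self._slot(feature)]


store = RandomizedWeights(num_weights=1 << 16, bloom=BloomFilter(1 << 20, num_hashes=4))
store.add("the cat", 0.5)
print(store.get("the cat"))
```

A Bloom filter never reports a stored feature as absent, so real features always reach their weights; a false positive only lets an unseen feature read whatever weight occupies its hashed slot, the kind of error the abstract reports as not degrading performance.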
Statistical Language Models Based on Neural Networks
2012
Statistical language models are a crucial part of many successful applications, such as automatic speech recognition and machine translation (for example, the well-known Google Translate application). Traditional techniques for estimating these models are based on so-called N-grams. Despite the known shortcomings of these techniques and the enormous effort of research groups across many fields (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.), N-grams have essentially remained the most successful technique. The goal of this thesis is to present several language model architectures based on neural networks. Although these models are computationally more demanding than N-gram models, the techniques developed in this thesis make their efficient use in real applications possible. The achieved reduction in speech recognition errors relative to the best N-gram models reaches 20%. The model based on a recurrent neural network achieves the best published results on a very well-known data set (Penn Treebank).
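The recurrent architecture mentioned above can be sketched as a simple (Elman-style) network: a one-hot input word and the previous hidden state feed a sigmoid hidden layer, and a softmax output layer gives the distribution over the next word. This is a minimal, untrained forward pass in pure Python; the class name `ElmanRNNLM`, the random initialization and the initial hidden state are assumptions, not the thesis's code.

```python
import math
import random


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


class ElmanRNNLM:
    """Hypothetical simple recurrent LM: forward pass only, no training."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.U = init(hidden_size, vocab_size)   # one-hot input word -> hidden
        self.W = init(hidden_size, hidden_size)  # previous hidden -> hidden (recurrence)
        self.V = init(vocab_size, hidden_size)   # hidden -> output scores
        self.hidden_size = hidden_size

    def step(self, word_id, s_prev):
        # s(t) = sigmoid(U w(t) + W s(t-1)); a one-hot w(t) selects column word_id of U.
        s = []
        for j in range(self.hidden_size):
            a = self.U[j][word_id] + sum(self.W[j][k] * s_prev[k]
                                         for k in range(self.hidden_size))
            s.append(1.0 / (1.0 + math.exp(-a)))
        # y(t) = softmax(V s(t)): distribution over the next word.
        y = softmax([sum(row[k] * s[k] for k in range(self.hidden_size)) for row in self.V])
        return y, s

    def sentence_log_prob(self, word_ids):
        s = [0.1] * self.hidden_size  # assumed initial hidden state
        total = 0.0
        for cur, nxt in zip(word_ids, word_ids[1:]):
            y, s = self.step(cur, s)
            total += math.log(y[nxt])
        return total


model = ElmanRNNLM(vocab_size=5, hidden_size=4)
print(model.sentence_log_prob([0, 1, 2, 3]))
```

Unlike an N-gram model, the hidden state carries information from the whole preceding word sequence rather than a fixed window, at the cost of the matrix-vector products in every step.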