Three New Graphical Models for Statistical Language Modelling

by Andriy Mnih, Geoffrey Hinton
Citations: 21 (3 self-citations)

BibTeX

@MISC{Mnih_threenew,
    author = {Andriy Mnih and Geoffrey Hinton},
    title = {Three New Graphical Models for Statistical Language Modelling},
    year = {}
}


Abstract

The supremacy of n-gram models in statistical language modelling has recently been challenged by parametric models that use distributed representations to counteract the difficulties caused by data sparsity. We propose three new probabilistic language models that define the distribution of the next word in a sequence, given several preceding words, by using distributed representations of those words. We show how real-valued distributed representations for words can be learned simultaneously with a large set of stochastic binary hidden features, which are used to predict the distributed representation of the next word from the distributed representations of the previous words. Adding connections from the previous states of the binary hidden features improves performance, as does adding direct connections between the real-valued distributed representations. One of our models significantly outperforms the very best n-gram models.
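The abstract's central step, predicting the distributed representation of the next word from the representations of the preceding words and then scoring every vocabulary word against that prediction, can be sketched as follows. This is a minimal, hypothetical illustration in pure Python, assuming a log-bilinear-style formulation: one position-specific matrix C_i per context word, a dot-product score plus a per-word bias, and a softmax over the vocabulary. It omits the stochastic binary hidden features and all training, and the names make_lbl and predict_next are invented for this example; it is not the authors' code.

```python
import math
import random


def make_lbl(vocab_size, dim, context_size, seed=0):
    """Randomly initialise parameters for a hypothetical log-bilinear-style predictor."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.gauss(0.0, 0.1) for _ in range(n)]

    R = [rand_vec(dim) for _ in range(vocab_size)]  # one real-valued feature vector r_w per word
    C = [[rand_vec(dim) for _ in range(dim)]        # one dim x dim matrix C_i per context position
         for _ in range(context_size)]
    b = [0.0] * vocab_size                          # per-word bias b_w
    return R, C, b


def predict_next(context, R, C, b):
    """Return P(next word = w | context) for every word index w (assumed formulation)."""
    if len(context) != len(C):
        raise ValueError("context length must equal the number of position matrices")
    dim = len(R[0])
    # Predicted representation of the next word: r_hat = sum_i C_i r_{w_i}.
    r_hat = [0.0] * dim
    for C_i, w in zip(C, context):
        r_w = R[w]
        for row in range(dim):
            r_hat[row] += sum(C_i[row][k] * r_w[k] for k in range(dim))
    # Score each candidate word by the dot product of r_hat with its vector, plus its bias.
    scores = [sum(p * q for p, q in zip(r_hat, R[w])) + b[w] for w in range(len(R))]
    # Numerically stable softmax over the whole vocabulary.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


R, C, b = make_lbl(vocab_size=10, dim=4, context_size=3)
probs = predict_next([1, 5, 7], R, C, b)
print(len(probs), round(sum(probs), 6))  # prints: 10 1.0
```

Because the normalising sum runs over every word in the vocabulary, the cost of a single prediction in this sketch grows linearly with vocabulary size.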

Citations

631 An Empirical Study of Smoothing Techniques for Language Modeling - Chen, Goodman - 1996
449 SRILM – an extensible language modeling toolkit - Stolcke - 2002
353 Training products of experts by minimizing contrastive divergence - Hinton
241 A Fast Learning Algorithm for Deep Belief Nets - Hinton, Osindero, et al.
154 Learning distributed representations of concepts - Hinton - 1986
16 Learning multilevel distributed representations for high-dimensional sequences - Sutskever, Hinton - 2007
11 Hierarchical Probabilistic Neural Network Language Model - Morin, Bengio
10 Distributed latent variable models of lexical co-occurrences - Blitzer, Globerson, et al. - 2005
7 Quick training of probabilistic neural nets by sampling - Bengio, Senécal - 2003
6 Hierarchical Distributed Representations for Statistical Language Modeling - Blitzer, Weinberger, et al. - 2005
5 Training neural network language models on very large corpora - Schwenk, Gauvain - 2005
3 Using a connectionist model in a syntactical based language model - Emami, Xu, et al. - 2003