A Neural Probabilistic Language Model (2003)

by Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin
Venue: Journal of Machine Learning Research
Citations: 405 (19 self)

BibTeX

@ARTICLE{Bengio03aneural,
    author = {Yoshua Bengio and Réjean Ducharme and Pascal Vincent and Christian Jauvin},
    title = {A Neural Probabilistic Language Model},
    journal = {Journal of Machine Learning Research},
    year = {2003},
    volume = {3},
    pages = {1137--1155}
}


Abstract

A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it allows the model to take advantage of longer contexts.
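
For readers who want to see the shape of the idea, below is a minimal NumPy sketch of the kind of architecture the abstract describes: a shared word-feature matrix that maps each context word to a learned real-valued vector, a tanh hidden layer over the concatenated vectors, and a softmax over the vocabulary giving P(w_t | w_{t-1}, ..., w_{t-n+1}). The variable names, sizes, and initialization here are illustrative assumptions, not the paper's settings, and no training loop is shown.

import numpy as np

# Illustrative sketch only: shared feature matrix C, tanh hidden layer,
# softmax output. Toy sizes, not the configurations reported in the paper.

rng = np.random.default_rng(0)

V = 1000      # vocabulary size
m = 30        # dimension of each word feature vector
n = 4         # model order (n - 1 context words)
h = 50        # hidden units

C = rng.normal(0.0, 0.01, (V, m))            # shared word feature matrix
H = rng.normal(0.0, 0.01, (h, (n - 1) * m))  # input-to-hidden weights
d = np.zeros(h)                              # hidden bias
U = rng.normal(0.0, 0.01, (V, h))            # hidden-to-output weights
W = rng.normal(0.0, 0.01, (V, (n - 1) * m))  # direct input-to-output weights
b = np.zeros(V)                              # output bias

def next_word_distribution(context):
    """Probability of every word in the vocabulary given n - 1 context word indices."""
    x = C[context].reshape(-1)               # concatenate the context feature vectors
    y = b + W @ x + U @ np.tanh(d + H @ x)   # unnormalized log-probabilities
    e = np.exp(y - y.max())                  # numerically stable softmax
    return e / e.sum()

# Example: distribution over the next word given three (hypothetical) word indices.
p = next_word_distribution([12, 7, 301])
print(p.shape, round(p.sum(), 6))            # (1000,) 1.0

Because the feature matrix C is shared across all positions, every occurrence of a word updates the same vector, which is what lets one training sentence inform the model about semantically neighboring sentences.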

Keyphrases

neural probabilistic language model, word sequence, distributed representation, probability function, training large model, text corpus, successful approach, nearby representation, training sentence, joint probability function, state-of-the-art n-gram model, reasonable time, neighboring sentence, exponential number, statistical language modeling, high probability, training set, neural network, significant challenge
