Two decades of statistical language modeling: Where do we go from here (2000)

by Ronald Rosenfeld
Venue:Proceedings of the IEEE
Citations:210 - 1 self

BibTeX

@INPROCEEDINGS{Rosenfeld00twodecades,
    author = {Ronald Rosenfeld},
    title = {Two decades of statistical language modeling: Where do we go from here},
    booktitle = {Proceedings of the IEEE},
    year = {2000},
    pages = {2000}
}


Abstract

Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.

1. OUTLINE

Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents.

Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4].

SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.
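To make the point about parameter counts concrete, here is a minimal sketch in Python. It is not taken from the paper: the n-gram framing, the function names, the toy sentence, and the 20,000-word vocabulary are all assumptions chosen for illustration. It counts the free parameters of an unsmoothed n-gram model and computes maximum-likelihood bigram estimates from a token list.

```python
from collections import Counter


def ngram_parameter_count(vocab_size: int, n: int) -> int:
    """Free parameters of an unsmoothed n-gram model (illustrative helper).

    There are vocab_size ** (n - 1) possible contexts, and each context
    has vocab_size - 1 free probabilities (the last one is fixed because
    the distribution must sum to 1).
    """
    return vocab_size ** (n - 1) * (vocab_size - 1)


def bigram_mle(tokens):
    """Maximum-likelihood estimates P(w2 | w1) = count(w1, w2) / count(w1)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Only tokens that are followed by another token act as a context.
    context_counts = Counter(tokens[:-1])
    return {
        (w1, w2): count / context_counts[w1]
        for (w1, w2), count in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy training text (an assumption, not data from the paper).
    tokens = "the cat sat on the mat".split()
    probs = bigram_mle(tokens)
    print(probs[("the", "cat")])  # "the" is followed by "cat" once out of twice

    # Hypothetical 20,000-word vocabulary, trigram model.
    print(ngram_parameter_count(20_000, 3))
```

Under these assumptions, a trigram model over 20,000 words has roughly 8 × 10^12 free parameters, far more than the number of word tokens in most training corpora, which is one way to see why such models depend so heavily on large amounts of text.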

Keyphrases

statistical language modeling    speech recognition    information retrieval    machine translation    promising direction    categorical nature    statistical estimation    probability distribution    large number    statistical approach    statistical language model    outline statistical language modeling    various natural language phenomenon    language technology application    capture regularity    bayesian approach    many attempt    statistical language    natural language    large variety    training data    document classification    rule-based approach    statistical estimation technique    language training data    statistical information theoretical approach    whole document    first significant model    linguistic theory    language technology    various linguistic unit    optical character recognition    various natural language application    statistical technique    large amount    large vocabulary people   
