Unsupervised Learning by Probabilistic Latent Semantic Analysis (2001)

by Thomas Hofmann
Venue: Machine Learning
Citations: 616 (4 self)

BibTeX

@ARTICLE{Hofmann01unsupervisedlearning,
    author = {Thomas Hofmann},
    title = {Unsupervised Learning by Probabilistic Latent Semantic Analysis},
    journal = {Machine Learning},
    volume = {42},
    number = {1--2},
    pages = {177--196},
    year = {2001}
}


Abstract

This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
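The generative latent class model and its temperature-controlled EM fitting procedure described in the abstract can be sketched as follows. This is an illustrative NumPy sketch, not Hofmann's reference implementation: the model decomposes P(d, w) as a mixture over latent classes z, and `beta` plays the role of the inverse temperature that smooths the E-step posteriors (beta = 1 recovers plain EM).

```python
import numpy as np

def plsa_tempered_em(N, K, n_iter=50, beta=1.0, seed=0):
    """Fit a pLSA mixture P(d,w) = sum_z P(z) P(d|z) P(w|z) by tempered EM.

    N is a D x W document-word count matrix; K is the number of latent
    classes; beta < 1 applies the tempering described in the paper.
    (Hypothetical sketch of the technique, not the author's code.)
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    # Random normalized initialization of the three parameter sets.
    Pz = np.full(K, 1.0 / K)                                            # P(z)
    Pd_z = rng.random((K, D)); Pd_z /= Pd_z.sum(axis=1, keepdims=True)  # P(d|z)
    Pw_z = rng.random((K, W)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)  # P(w|z)

    for _ in range(n_iter):
        # E-step: tempered posterior P(z|d,w) proportional to
        # [P(z) P(d|z) P(w|z)]^beta, normalized over z.
        joint = (Pz[:, None, None] * Pd_z[:, :, None] * Pw_z[:, None, :]) ** beta
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)       # K x D x W

        # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w).
        weighted = post * N[None, :, :]
        Pw_z = weighted.sum(axis=1)
        Pw_z /= Pw_z.sum(axis=1, keepdims=True)
        Pd_z = weighted.sum(axis=2)
        Pd_z /= Pd_z.sum(axis=1, keepdims=True)
        Pz = weighted.sum(axis=(1, 2))
        Pz /= Pz.sum()

    return Pz, Pd_z, Pw_z
```

After fitting, the mixture `sum_z P(z) P(d|z) P(w|z)` gives a proper joint distribution over document-word pairs, which is what makes perplexity evaluation (as reported in the paper) well defined, in contrast to the SVD factors of standard LSA.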

Keyphrases

probabilistic latent semantic analysis, singular value decomposition, related area, many application, information retrieval, probabilistic mixture decomposition, perplexity result, standard latent semantic analysis, count data, novel statistical method, co-occurrence table, natural language processing, probabilistic method, factor analysis, expectation maximization algorithm, excellent performance, consistent improvement, model fitting, linguistic data collection, different type, linear algebra, machine learning, latent semantic analysis, generative latent class model, principled approach, automated document indexing, statistical inference, solid foundation, latter method
