Latent Dirichlet Allocation (2003) [485 citations — 23 self]

Abstract:

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
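The generative view in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of LDA: each document draws its own topic proportions from a Dirichlet prior, and each word is produced by first picking a topic from those proportions and then picking a word from that topic's word distribution. The function names, the toy vocabulary, the parameter values, and the fixed document length are all illustrative assumptions, not taken from the paper, and the sketch covers sampling only, not the variational inference or EM estimation mentioned above.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, topics, vocab, length, rng):
    """Generate one document under an LDA-style process (hypothetical helper).

    alpha  : Dirichlet parameters, one per topic (assumed values)
    topics : topics[k][v] = probability of word v under topic k
    length : number of words, fixed here for simplicity
    """
    # Document-level topic proportions: the "explicit representation of a document".
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position from the document's proportions.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from the chosen topic's word distribution.
        w = rng.choices(range(len(vocab)), weights=topics[z])[0]
        words.append(vocab[w])
    return theta, words

rng = random.Random(0)
vocab = ["gene", "cell", "model", "data"]   # toy vocabulary (assumption)
topics = [
    [0.45, 0.45, 0.05, 0.05],               # a "biology"-like topic
    [0.05, 0.05, 0.45, 0.45],               # a "statistics"-like topic
]
theta, words = generate_document([0.5, 0.5], topics, vocab, 8, rng)
print(theta, words)
```

Because `theta` is drawn per document, two documents generated with the same `alpha` and `topics` will generally mix the topics in different proportions, which is what distinguishes this process from a mixture of unigrams where each document uses a single topic.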

Citations

1636 Indexing by latent semantic analysis – Deerwester, Dumais, et al. - 1990
1439 Modern Information Retrieval – Baeza-Yates, Ribeiro-Neto - 1999
805 Making large-scale SVM learning practical – Joachims - 1999
606 Bayesian Data Analysis – Gelman, Carlin, et al. - 1995
495 Text classification from labeled and unlabeled documents using EM – Nigam, McCallum, et al. - 2000
494 Statistical methods for speech recognition – Jelinek - 1997
464 An introduction to variational methods for graphical models – Jordan, Ghahramani, et al. - 1999
357 Learning in Graphical Models – Jordan - 1998
305 Probabilistic latent semantic indexing – Hofmann - 1999
158 Latent semantic indexing: A probabilistic analysis – Papadimitriou, Tamaki, et al. - 1998
157 Using maximum entropy for text classification – Nigam, Lafferty, et al. - 1999
149 Overview of the first text retrieval conference (TREC-1) – Harman - 1992
125 Modeling annotated data – Blei, Jordan - 2003
98 A variational Bayesian framework for graphical models – Attias - 1999
69 Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments – Popescul, Ungar, et al. - 2001
60 An experimental comparison of several clustering and initialization methods – Meila, Heckerman - 1998
55 Expectation-propagation for the generative aspect model – Minka, Lafferty - 2002
47 Improving multi-class text classification with naive Bayes – Rennie - 2001
45 Estimating a Dirichlet distribution – Minka - 2000
38 Parametric empirical Bayes inference: Theory and applications – Morris - 1983
31 A probabilistic approach to semantic representation – Griffiths, Steyvers - 2002
26 Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models) – Kass, Steffey - 1989
16 Recent progress on de Finetti’s notions of exchangeability – Diaconis - 1988
5 Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII – Aldous - 1983
4 Bayesian methods for censored categorical data – Dickey, Jiang, et al. - 1987
3 Theory of probability. Vol – de Finetti - 1990
2 Caenorhabditis genetic center bibliography – Avery - 2002