Abstract:
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
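The generative process described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a fixed document length, a given set of per-topic word distributions, and a toy vocabulary. All names (`sample_dirichlet`, `generate_document`, `topic_word_probs`, the example words) are hypothetical and chosen only for illustration. Inference and learning, which the abstract says are handled by variational algorithms, are not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, topic_word_probs, vocab, n_words, rng):
    """Generate one document as a mixture of topics.

    Assumption: n_words is fixed in advance and topic_word_probs[k] is the
    word distribution of topic k over vocab.
    """
    # Per-document topic proportions: the latent Dirichlet random variable.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)                # pick a topic for this word
        w = sample_categorical(topic_word_probs[z], rng)  # pick a word from that topic
        words.append(vocab[w])
    return theta, words


# Toy setup (illustrative values only): two topics over a four-word vocabulary.
rng = random.Random(0)
vocab = ["gene", "cell", "vote", "law"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # hypothetical "biology" topic
    [0.0, 0.0, 0.5, 0.5],  # hypothetical "politics" topic
]
alpha = [0.5, 0.5]

theta, words = generate_document(alpha, topic_word_probs, vocab, 8, rng)
print(theta)
print(words)
```

Because `theta` is drawn once per document while the topic index is drawn once per word, a single document can mix words from several topics. In a mixture of unigrams, by contrast, one topic is drawn for the whole document.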
Citations
Cited by | Title | Authors | Year
1636 | Indexing by latent semantic analysis | Deerwester, Dumais, et al. | 1990
1439 | Modern Information Retrieval | Baeza-Yates, Ribeiro-Neto | 1999
805 | Making large-scale SVM learning practical | Joachims | 1999
606 | Bayesian Data Analysis | Gelman, Carlin, et al. | 1995
495 | Text classification from labeled and unlabeled documents using EM | Nigam, McCallum, et al. | 2000
494 | Statistical methods for speech recognition | Jelinek | 1997
464 | An introduction to variational methods for graphical models | Jordan, Ghahramani, et al. | 1999
357 | Learning in Graphical Models | Jordan | 1998
305 | Probabilistic latent semantic indexing | Hofmann | 1999
158 | Latent semantic indexing: A probabilistic analysis | Papadimitriou, Tamaki, et al. | 1998
157 | Using maximum entropy for text classification | Nigam, Lafferty, et al. | 1999
149 | Overview of the first text retrieval conference (TREC-1) | Harman | 1992
125 | Modeling annotated data | Blei, Jordan | 2003
98 | A variational Bayesian framework for graphical models | Attias | 1999
69 | Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments | Popescul, et al. | 2001
60 | An experimental comparison of several clustering and initialization methods | Meila, Heckerman | 1998
55 | Expectation-propagation for the generative aspect model | Minka, Lafferty | 2002
47 | Improving multi-class text classification with naive Bayes | Rennie | 2001
45 | Estimating a Dirichlet distribution | Minka | 2000
38 | Parametric empirical Bayes inference: Theory and applications | Morris | 1983
31 | A probabilistic approach to semantic representation | Griffiths, Steyvers | 2002
26 | Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models) | Kass, Steffey | 1989
16 | Recent progress on de Finetti's notions of exchangeability | Diaconis | 1988
5 | Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII | Aldous | 1983
4 | Bayesian methods for censored categorical data | Dickey, Jiang, et al. | 1987
3 | Theory of probability. Vol | de Finetti | 1990
2 | Caenorhabditis genetic center bibliography | Avery | 2002