## Supervised topic models (2008)

### Download Links

- [www.cs.princeton.edu]
- [www.ece.duke.edu]
- [www.cs.princeton.edu:80]
- [people.ee.duke.edu]
- [books.nips.cc]
- DBLP

### Other Repositories/Bibliography

Venue: In preparation

Citations: 191 (4 self-citations)

### BibTeX

@INPROCEEDINGS{Blei08supervisedtopic,

author = {David M. Blei and Jon D. McAuliffe},

title = {Supervised topic models},

booktitle = {In preparation},

year = {2008},

publisher = {MIT Press}

}

### Abstract

### Citations

2608 | Latent Dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context ...g need to analyze large collections of electronic text. The complexity of document corpora has led to considerable interest in applying hierarchical statistical models based on what are called topics [3, 12, 8, 5]. Formally, a topic is a probability distribution over terms in a vocabulary. Informally, a topic represents an underlying semantic theme; a document consisting of a large number of words might be con...

2262 | Elements of statistical learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context ...pervised topic models is mirrored in existing dimension-reduction techniques. For example, consider regression on unsupervised principal components versus partial least squares and projection pursuit [9], which both search for covariate linear combinations most predictive of a response variable. In text analysis, McCallum et al. developed a joint topic model for words and categories [10], and Blei an...

1728 | Generalized Linear Models
- McCullagh, Nelder
- 1989
Citation Context ...a normal linear model to a suitably transformed version of such a response. When no transformation results in approximate normality, statisticians often make use of a generalized linear model, or GLM [9]. In this section, we describe sLDA in full generality, replacing the normal linear model of the earlier exposition with a GLM formulation. As we shall see, the result is a generic framework which can...

584 | A Bayesian Hierarchical Model for Learning Natural Scene Categories
- Fei-Fei, Perona
- 2005
Citation Context ...nsupervised LDA has previously been used to construct features for classification. The hope was that LDA topics would turn out to be useful for categorization, since they act to reduce data dimension [3, 6]. However, when the goal is prediction, fitting unsupervised topics may not be a good choice. Consider predicting a movie rating from the words in its review. Intuitively, good predictive topics will ...

343 | Modeling annotated data
- Blei, Jordan
- 2003
Citation Context ... of a response variable. In text analysis, McCallum et al. developed a joint topic model for words and categories [10], and Blei and Jordan developed an LDA model to predict caption words from images [2]. This paper is organized as follows. We first develop the supervised latent Dirichlet allocation model (sLDA) for document-response pairs. We derive parameter estimation and prediction algorithms for...

260 | The author-topic model for authors and documents
- Rosen-Zvi, Griffiths, et al.
- 2004
Citation Context ...g need to analyze large collections of electronic text. The complexity of document corpora has led to considerable interest in applying hierarchical statistical models based on what are called topics [3, 12, 8, 5]. Formally, a topic is a probability distribution over terms in a vocabulary. Informally, a topic represents an underlying semantic theme; a document consisting of a large number of words might be con...

189 | Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
- Pang, Lee
Citation Context ...the same issues. 3 Empirical results We evaluated sLDA on two prediction problems. First, we consider “sentiment analysis” of newspaper movie reviews. We use the publicly available data introduced in [11], which contains movie reviews paired with the number of stars given. While Pang and Lee treat this as a classification problem, we treat it as a regression problem. With a 5000-term vocabulary chosen...

135 | Integrating topics and syntax
- Griffiths, Steyvers, et al.
- 2005
Citation Context ...g need to analyze large collections of electronic text. The complexity of document corpora has led to considerable interest in applying hierarchical statistical models based on what are called topics [3, 12, 8, 5]. Formally, a topic is a probability distribution over terms in a vocabulary. Informally, a topic represents an underlying semantic theme; a document consisting of a large number of words might be con...

129 | Probabilistic topic models
- Steyvers, Griffiths
- 2007
Citation Context ...pics, but each document uses a mix of topics unique to itself. Thus, topic models are a relaxation of classical document mixture models, which associate each document with a single unknown topic. See [7] for a recent review. Here we build on latent Dirichlet allocation (LDA) [3], a topic model that serves as the basis for many others. In LDA, we treat the topic proportions for a document as a draw fr...

121 | Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
- Fukumizu, Bach, et al.
- 2004
Citation Context ... [7], which both search for covariate linear combinations most predictive of a response variable. These linear supervised methods have nonparametric analogs, such as an approach based on kernel ICA [6]. In text analysis, McCallum et al. developed a joint topic model for words and categories [8], and Blei and Jordan developed an LDA model to predict caption words from images [2]. In chemogenomic pro...

58 | Applying discrete PCA in data analysis
- Buntine, Jakulin
- 2004

35 | Mathematical Statistics
- Bickel, Doksum
- 1977
Citation Context ...ct gradient, which we omit for brevity. Other times, no exact gradient is available. In a longer paper [4], we apply both variational bounding techniques and the multivariate delta method for moments [1] to obtain approximate gradients. The GLM contribution to the gradient determines whether the φ_j coordinate update itself has a closed form, as it does in the normal case (11) and the Poisson case (o...

33 | Multi-conditional learning: Generative/discriminative training for clustering and classification
- McCallum, Pal, et al.
- 2006
Citation Context ...ction pursuit [9], which both search for covariate linear combinations most predictive of a response variable. In text analysis, McCallum et al. developed a joint topic model for words and categories [10], and Blei and Jordan developed an LDA model to predict caption words from images [2]. This paper is organized as follows. We first develop the supervised latent Dirichlet allocation model (sLDA) for ...

16 | A latent variable model for chemogenomic profiling. Bioinformatics
- Flaherty, Giaever, et al.
- 2005