
## Online Learning for Latent Dirichlet Allocation


### Download Links

- [www.cs.princeton.edu]
- [www.di.ens.fr]
- [books.nips.cc]
- [ar.newsmth.net]
- [machinelearning.wustl.edu]
- [videolectures.net]
- [people.ee.duke.edu]
- [www.cs.columbia.edu]
- [mimno.infosci.cornell.edu]
- [www.researchgate.net]
- [papers.nips.cc]

Citations: 207 (20 self)

### Citations

11956 | Maximum Likelihood from Incomplete Data via the EM Algorithm
- Dempster, Laird, et al.
- 1977

Citation context: "...t derivative of the logarithm of the gamma function). The updates in equation 5 are guaranteed to converge to a stationary point of the ELBO. By analogy to the Expectation-Maximization (EM) algorithm [14], we can partition these updates into an “E” step—iteratively updating γ and φ until convergence, holding λ fixed—and an “M” step—updating λ given φ. In practice, this algorithm converges to a better ..."
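
The E/M partition described in this context can be sketched for a single toy document. The dimensions, hyperparameters, and the finite-difference digamma below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from math import lgamma

# Sketch of the "E" step / "M" step partition of the variational updates.
# K, W, alpha, eta, and the toy document are assumptions for illustration.
rng = np.random.default_rng(0)
K, W = 3, 8                      # topics, vocabulary size
alpha, eta = 0.1, 0.01           # Dirichlet hyperparameters

def digamma(x):
    # Central-difference approximation of the digamma function
    # (a real implementation would use scipy.special.digamma).
    h = 1e-5
    lg = np.vectorize(lgamma)
    return (lg(x + h) - lg(x - h)) / (2.0 * h)

lam = rng.gamma(100.0, 0.01, (K, W))        # global parameter lambda
n_d = rng.integers(1, 5, W).astype(float)   # word counts for one document

# "E" step: iterate gamma and phi to convergence, holding lambda fixed.
gamma = np.ones(K)
Elog_beta = digamma(lam) - digamma(lam.sum(1))[:, None]
for _ in range(100):
    Elog_theta = digamma(gamma) - digamma(gamma.sum())
    log_phi = Elog_theta[:, None] + Elog_beta   # K x W, unnormalized
    phi = np.exp(log_phi - log_phi.max(0))
    phi /= phi.sum(0)                           # distribution over topics
    gamma = alpha + phi @ n_d

# "M" step: update lambda given phi (one-document corpus for simplicity).
lam = eta + phi * n_d
print(np.allclose(phi.sum(0), 1.0))
```

In batch VB the "M" step would aggregate sufficient statistics over every document; the one-document form here only shows the shape of the update.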

4359 | Latent Dirichlet Allocation
- Blei, Ng, et al.

Citation context: "...onal Bayes is a practical new method for estimating the posterior of complex hierarchical Bayesian models. 2 Online variational Bayes for latent Dirichlet allocation Latent Dirichlet Allocation (LDA) [7] is a Bayesian probabilistic model of text documents. It assumes a collection of K “topics.” Each topic defines a multinomial distribution over the vocabulary and is assumed to have been drawn from a ..."

1128 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1999

Citation context: "...ction. 2.1 Batch variational Bayes for LDA In Variational Bayesian inference (VB) the true posterior is approximated by a simpler distribution q(z, θ, β), which is indexed by a set of free parameters [12, 13]. These parameters are optimized to maximize the Evidence Lower BOund (ELBO): log p(w|α, η) ≥ L(w, φ, γ, λ) ≜ Eq[log p(w, z, θ, β|α, η)] − Eq[log q(z, θ, β)]. (1) Maximizing the ELBO is equivalent to m..."
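
The truncated sentence presumably ends with the standard equivalence: maximizing the ELBO minimizes the KL divergence from q to the true posterior. This follows from the usual variational-inference identity (standard algebra, not quoted from the paper):

```latex
\log p(w \mid \alpha, \eta)
  = \mathcal{L}(w, \phi, \gamma, \lambda)
  + \mathrm{KL}\!\left( q(z, \theta, \beta) \,\middle\|\, p(z, \theta, \beta \mid w, \alpha, \eta) \right)
```

Since the left-hand side does not depend on q, maximizing L over the free parameters φ, γ, λ is equivalent to minimizing the KL term.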

1102 | Finding scientific topics
- Griffiths, Steyvers
- 2004

Citation context: "...pling (CGS) is a popular MCMC approach that samples from the posterior over topic assignments z by repeatedly sampling the topic assignment zdi conditioned on the data and all other topic assignments [22]. One online MCMC approach adapts CGS by sampling topic assignments zdi based on the topic assignments and data for all previously analyzed words, instead of all other words in the corpus [23]. This a..."
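
The CGS update quoted here can be sketched on a toy corpus. The corpus, K, and hyperparameters are assumptions; the conditional uses the standard collapsed-Gibbs proportionality (n_dk + α)(n_kw + η)/(n_k + Wη):

```python
import random
random.seed(0)

# Toy sketch of collapsed Gibbs sampling for LDA, as described above.
# Corpus, K, alpha, and eta are illustrative assumptions.
docs = [[0, 1, 1, 2], [2, 3, 3, 0]]   # word ids per document
K, W = 2, 4
alpha, eta = 0.5, 0.5

# z[d][i]: topic assignment; count tables are kept in sync with z.
z = [[random.randrange(K) for _ in doc] for doc in docs]
n_dk = [[0] * K for _ in docs]        # topic counts per document
n_kw = [[0] * W for _ in range(K)]    # word counts per topic
n_k = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

for sweep in range(100):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]               # remove the current assignment
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            # p(z_di = k | rest) ∝ (n_dk + alpha)(n_kw + eta)/(n_k + W*eta)
            weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + eta)
                       / (n_k[j] + W * eta) for j in range(K)]
            k = random.choices(range(K), weights)[0]
            z[d][i] = k               # record the new assignment
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

print(sum(n_k) == sum(len(doc) for doc in docs))   # counts stay consistent
```

The online variant cited in the context differs only in what the counts range over (previously analyzed words rather than the whole corpus).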

993 | A view of the EM algorithm that justifies incremental sparse and other variants
- Neal, Hinton
- 1998

Citation context: "... fit a set of per-observation parameters (such as the per-document variational parameters γd and φd in LDA). The problem is addressed by online coordinate ascent algorithms such as those described in [20, 21, 16, 17, 10]. The goal of these algorithms is to set the global parameters so that the objective is as good as possible once the per-observation parameters are optimized. Most of these approaches assume the comput..."

960 | A stochastic approximation method
- Robbins, Monro
- 1951

Citation context: "...laces sampling with optimization, we can use results from stochastic optimization to analyze online LDA. Stochastic optimization algorithms optimize an objective using noisy estimates of its gradient [18]. Although there is no explicit gradient computation, algorithm 2 can be interpreted as a stochastic natural gradient algorithm [16, 15]. We begin by deriving a related first-order stochastic gradient..."
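
A minimal Robbins-Monro iteration of the kind [18] analyzes, here estimating a mean from noisy samples with decaying step sizes. The schedule, target, and tolerance are illustrative assumptions:

```python
import random
random.seed(0)

# Sketch of a Robbins-Monro iteration: minimize E[(x - mu)^2]/2 from noisy
# samples using step sizes rho_t = (tau0 + t)**(-kappa), kappa in (0.5, 1].
# mu, tau0, and kappa are illustrative assumptions.
mu, tau0, kappa = 3.0, 1.0, 0.7
x = 0.0
for t in range(20000):
    sample = mu + random.gauss(0.0, 1.0)   # noisy observation
    grad = x - sample                      # unbiased estimate of the gradient
    rho = (tau0 + t) ** (-kappa)           # decaying step size
    x -= rho * grad
print(x)   # close to mu = 3.0
```

Online LDA follows the same pattern, with λ in place of x and a noisy natural gradient of the ELBO in place of `grad`.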

326 | Online learning for matrix factorization and sparse coding
- Mairal, Bach, et al.

Citation context: "...ars in document d) into a matrix of topic weights θ and a dictionary of topics β [9]. Our work can thus be seen as an extension of online matrix factorization techniques that optimize squared error [10] to more general probabilistic formulations. We can analyze a corpus of documents with LDA by examining the posterior distribution of the topics β, topic proportions θ, and topic assignments z conditi..."
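
The factorization view quoted above can be illustrated directly: expected word counts decompose into per-document topic weights times a dictionary of topics. Dimensions and document length are toy assumptions:

```python
import numpy as np

# Sketch of LDA's matrix-factorization view: E[n_dw] = N_d * (theta @ beta)_dw.
# D, K, W, and N are illustrative assumptions.
rng = np.random.default_rng(0)
D, K, W = 4, 2, 6
theta = rng.dirichlet(np.full(K, 0.5), size=D)   # topic weights, rows sum to 1
beta = rng.dirichlet(np.full(W, 0.1), size=K)    # topics, rows sum to 1
N = 100.0                                        # words per document
n_hat = N * theta @ beta                         # expected count matrix, D x W
print(np.allclose(n_hat.sum(axis=1), N))
```

Because θ and β are row-stochastic, each row of θβ is itself a distribution over the vocabulary, so every document's expected counts sum to its length.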

270 | The tradeoffs of large scale learning
- Bottou, Bousquet

Citation context: "...h many others are based. Our algorithm is based on online stochastic optimization, which has been shown to produce good parameter estimates dramatically faster than batch algorithms on large datasets [6]. Online LDA handily analyzes massive collections of documents and, moreover, online LDA need not locally store or collect the documents—each can arrive in a stream and be discarded after one look. I..."

267 | A variational Bayesian framework for graphical models
- Attias
- 2000

Citation context: "...ction. 2.1 Batch variational Bayes for LDA In Variational Bayesian inference (VB) the true posterior is approximated by a simpler distribution q(z, θ, β), which is indexed by a set of free parameters [12, 13]. These parameters are optimized to maximize the Evidence Lower BOund (ELBO): log p(w|α, η) ≥ L(w, φ, γ, λ) ≜ Eq[log p(w, z, θ, β|α, η)] − Eq[log q(z, θ, β)]. (1) Maximizing the ELBO is equivalent to m..."

237 | Reading tea leaves: How humans interpret topic models
- Chang, Boyd-Graber, et al.
- 2009

Citation context: "...ld fixed. In all experiments α and η are fixed at 0.01 and the number of topics K = 100. There is some question as to the meaningfulness of perplexity as a metric for comparing different topic models [25]. Held-out likelihood metrics are nonetheless well suited to measuring how well an inference algorithm accomplishes the specific optimization task defined by a model. Evaluating learning parameters. O..."

119 | On smoothing and inference for topic models.
- Asuncion, Welling, et al.
- 2009

Citation context: "...choice of approximate posterior introduces bias, VB is empirically shown to be faster than and as accurate as MCMC, which makes it an attractive option when applying Bayesian models to large datasets [1, 2, 3]. Nonetheless, large scale data analysis with VB can be computationally difficult. Standard “batch” VB algorithms iterate between analyzing each observation and updating dataset-wide variational param..."

109 | Rethinking LDA: why priors matter
- Wallach, Mimno, et al.
- 2009

90 | On-line EM algorithm for the normalized Gaussian network
- Sato, Ishii
- 2000

Citation context: "... fit a set of per-observation parameters (such as the per-document variational parameters γd and φd in LDA). The problem is addressed by online coordinate ascent algorithms such as those described in [20, 21, 16, 17, 10]. The goal of these algorithms is to set the global parameters so that the objective is as good as possible once the per-observation parameters are optimized. Most of these approaches assume the comput..."

83 | Distributed inference for latent Dirichlet allocation
- Newman, Asuncion, et al.
- 2007

Citation context: "...ments. to summarize the latent structure of massive document collections that cannot be annotated by hand. A central research problem for topic modeling is to efficiently fit models to larger corpora [4, 5]. To this end, we develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based. Our algorithm is based o..."

80 | Efficient methods for topic model inference on streaming document collections
- Yao, Mimno, et al.
- 2009

Citation context: "... are effective, but both present significant computational challenges in the face of massive data sets. Developing scalable approximate inference methods for topic models is an active area of research [3, 4, 5, 11]. To this end, we develop online variational inference for LDA, an approximate posterior inference algorithm that can analyze massive collections of documents. We first review the traditional variatio..."

69 | Online model selection based on the variational Bayes
- Sato
- 2001

Citation context: "...rithm. The condition that κ ∈ (0.5, 1] is needed to guarantee convergence. We show in section 2.3 that online LDA corresponds to a stochastic natural gradient algorithm on the variational objective L [15, 16]. This algorithm closely resembles one proposed in [16] for online VB on models with hidden data—the most important difference is that we use an approximate E step to optimize γt and φt, since we can..."
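
The condition κ ∈ (0.5, 1] comes from the Robbins-Monro requirements on the step sizes. Assuming the step-size schedule has the form ρt = (τ0 + t)^(−κ), a p-series comparison gives:

```latex
\sum_{t=0}^{\infty} (\tau_0 + t)^{-\kappa} = \infty \iff \kappa \le 1,
\qquad
\sum_{t=0}^{\infty} (\tau_0 + t)^{-2\kappa} < \infty \iff \kappa > \tfrac{1}{2}
```

so the two conditions (infinite total step, summable squared step) hold simultaneously exactly when κ ∈ (0.5, 1].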

67 | Variational methods for the Dirichlet process
- Blei, Jordan
- 2004

Citation context: "...choice of approximate posterior introduces bias, VB is empirically shown to be faster than and as accurate as MCMC, which makes it an attractive option when applying Bayesian models to large datasets [1, 2, 3]. Nonetheless, large scale data analysis with VB can be computationally difficult. Standard “batch” VB algorithms iterate between analyzing each observation and updating dataset-wide variational param..."

49 | Online EM for unsupervised models
- Liang, Klein
- 2009

47 | Online inference of topics with latent Dirichlet allocation
- Canini, Shi, et al.
- 2009

Citation context: "...er words in the corpus [23]. This algorithm is fast and has constant memory requirements, but is not guaranteed to converge to the posterior. Two alternative online MCMC approaches were considered in [24]. The first, called incremental LDA, periodically resamples the topic assignments for previously analyzed words. The second approach uses particle filtering instead of CGS. In a study in [24], none of..."

40 | Modeling and predicting personal information dissemination behavior
- Song, Lin, et al.
- 2005

Citation context: "...gnments [22]. One online MCMC approach adapts CGS by sampling topic assignments zdi based on the topic assignments and data for all previously analyzed words, instead of all other words in the corpus [23]. This algorithm is fast and has constant memory requirements, but is not guaranteed to converge to the posterior. Two alternative online MCMC approaches were considered in [24]. The first, called inc..."

37 | Parallel inference for latent Dirichlet allocation on graphics processing units
- Yan, Xu, et al.
- 2009

Citation context: "...ments. to summarize the latent structure of massive document collections that cannot be annotated by hand. A central research problem for topic modeling is to efficiently fit models to larger corpora [4, 5]. To this end, we develop an online variational Bayes algorithm for latent Dirichlet allocation (LDA), one of the simplest topic models and one on which many others are based. Our algorithm is based o..."

35 | Variational inference for large-scale models of discrete choice
- Braun, McAuliffe
- 2010

Citation context: "...choice of approximate posterior introduces bias, VB is empirically shown to be faster than and as accurate as MCMC, which makes it an attractive option when applying Bayesian models to large datasets [1, 2, 3]. Nonetheless, large scale data analysis with VB can be computationally difficult. Standard “batch” VB algorithms iterate between analyzing each observation and updating dataset-wide variational param..."

35 | Online learning and stochastic approximations
- Bottou
- 1998

Citation context: "...d φt as random variables drawn at the same time as each observed document nt, then E[D ∇λ ℓ(nt, γt, φt, λ) | λ] = ∇λ ∑d ℓ(nd, γd, φd, λ). Thus, since ∑∞t=0 ρt = ∞ and ∑∞t=0 ρt² < ∞, the analysis in [19] shows both that λ converges and that the gradient ∇λ ∑d ℓ(nd, γd, φd, λ) converges to 0, and thus that λ converges to a stationary point. The update in equation 12 only makes use of first-order grad..."

3 | Stochastic approximations and efficient learning. The Handbook of Brain Theory and Neural Networks, second edition
- Bottou, Murata
- 2002

Citation context: "...rithm. The condition that κ ∈ (0.5, 1] is needed to guarantee convergence. We show in section 2.3 that online LDA corresponds to a stochastic natural gradient algorithm on the variational objective L [15, 16]. This algorithm closely resembles one proposed in [16] for online VB on models with hidden data—the most important difference is that we use an approximate E step to optimize γt and φt, since we can..."

2 | Variational extensions to EM and multinomial PCA
- Buntine
- 2002

Citation context: "...nk of LDA as a probabilistic factorization of the matrix of word counts n (where ndw is the number of times word w appears in document d) into a matrix of topic weights θ and a dictionary of topics β [9]. Our work can thus be seen as an extension of online matrix factorization techniques that optimize squared error [10] to more general probabilistic formulations. We can analyze a corpus of document..."