## Hidden topic Markov models (2007)

Venue: Proceedings of Artificial Intelligence and Statistics (AISTATS), 2007

Citations: 46 (1 self)

### BibTeX

@INPROCEEDINGS{Gruber07hiddentopic,
  author    = {Amit Gruber and Michal Rosen-Zvi and Yair Weiss},
  title     = {Hidden topic Markov models},
  booktitle = {Proceedings of Artificial Intelligence and Statistics},
  year      = {2007}
}

### Abstract

Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word–document relationships. These algorithms assume that each word in a document was generated by a hidden topic, and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in a document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and that successive sentences are more likely to have the same topic. Since the topics are hidden, this leads naturally to the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents, with only a modest increase in learning and inference complexity.
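The sentence-level Markov chain the abstract describes can be made concrete with a short sketch. The function below is an illustrative reconstruction, not the authors' code: it assumes per-topic word distributions `beta`, a document-level topic prior `theta`, and a single fixed topic-switch probability `eps` standing in for the paper's per-sentence transition variable, and runs the standard scaled forward–backward recursions over per-sentence topics.

```python
import numpy as np

def htmm_posteriors(sentences, beta, theta, eps):
    """Forward-backward over per-sentence topics (hypothetical HTMM-style sketch).

    sentences : list of lists of word ids
    beta      : (K, V) array, per-topic word distributions
    theta     : (K,) array, document-level topic prior
    eps       : probability that a sentence switches topic (redrawn from theta)
    """
    K = len(theta)
    N = len(sentences)
    # Per-sentence emission likelihoods: all words in a sentence share one topic.
    log_emit = np.array([[np.sum(np.log(beta[k, s])) for k in range(K)]
                         for s in sentences])              # shape (N, K)
    emit = np.exp(log_emit - log_emit.max(axis=1, keepdims=True))
    # Transition matrix: stay on the same topic with prob 1 - eps,
    # otherwise redraw the topic from theta.
    A = (1 - eps) * np.eye(K) + eps * np.tile(theta, (K, 1))
    # Forward pass, renormalized at each step to avoid underflow.
    alpha = np.zeros((N, K))
    alpha[0] = theta * emit[0]
    alpha[0] /= alpha[0].sum()
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * emit[n]
        alpha[n] /= alpha[n].sum()
    # Backward pass, also rescaled.
    btab = np.ones((N, K))
    for n in range(N - 2, -1, -1):
        btab[n] = A @ (emit[n + 1] * btab[n + 1])
        btab[n] /= btab[n].sum()
    post = alpha * btab
    return post / post.sum(axis=1, keepdims=True)          # P(topic | sentence)
```

Note the two limiting cases this sketch exposes: with `eps = 0` the chain never switches, so every sentence shares one topic (a mixture of unigrams), while with `eps = 1` each sentence draws its topic independently from `theta`, recovering sentence-level independence as in LDA.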

### Citations

4273 | A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Rabiner
- 1989
Citation Context: ...nditioned on β and θ the HTMM model is a special type of HMM. This allows us to use the standard parameter estimation tools of HMMs, namely Expectation-Maximization and the forward-backward algorithm [12]. Unlike fully Bayesian inference methods, the standard EM algorithm for HMMs distinguishes between latent variables and parameters. Applied to the HTMM model, the latent variables are the topics zn a...

2366 | Latent Dirichlet Allocation
- Blei, Ng, et al.
- 2003
Citation Context: ...rest in automated extraction of useful information from text. Modeling the observed text as generated from latent aspects or topics is a prominent approach in machine learning studies of texts (e.g., [8, 14, 3, 4, 6]). In such models, the “bag-of-words” assumption is often employed, an assumption that the order of words can be ignored and text corpora can be represented by a co-occurrence matrix of words and docu...

804 | Text classification from labeled and unlabeled documents using
- Nigam, McCallum, et al.
- 2000
Citation Context: ...r extreme, if we do not allow any topic transitions and set ψn = 0 between any two words, we obtain the mixture of unigrams model in which all words in the document are assumed to have the same topic [11]. Unlike the LDA and mixture of unigrams models, the HTMM model (which allows infrequent topic transitions within a document) is no longer invariant to a reshuffling of the words. Documents for which ...

692 | A stochastic parts program and noun phrase parser for unrestricted text
- Church
- 1988
Citation Context: ...l to predict previously unobserved words in trained documents. Markov models such as N-grams and HMMs that capture local dependencies between words have been employed mainly in part-of-speech tagging [5]. Models for semantic parsing tasks often use a “shallow” model with no hidden states [9]. In recent years several probabilistic models for text that infer topics and incorporate Markovian relations h...

625 | Finding scientific topics
- Griffiths, Steyvers
- 2004
Citation Context: ...rest in automated extraction of useful information from text. Modeling the observed text as generated from latent aspects or topics is a prominent approach in machine learning studies of texts (e.g., [8, 14, 3, 4, 6]). In such models, the “bag-of-words” assumption is often employed, an assumption that the order of words can be ignored and text corpora can be represented by a co-occurrence matrix of words and docu...

531 | Probabilistic latent semantic analysis
- Hofmann
- 1999
Citation Context: ...rest in automated extraction of useful information from text. Modeling the observed text as generated from latent aspects or topics is a prominent approach in machine learning studies of texts (e.g., [8, 14, 3, 4, 6]). In such models, the “bag-of-words” assumption is often employed, an assumption that the order of words can be ignored and text corpora can be represented by a co-occurrence matrix of words and docu...

392 | Correlated topic models
- Blei, Lafferty

233 | The author-topic model for authors and documents
- Rosen-Zvi, Griffiths, et al.
- 2004
Citation Context: ...ters β, θ is intractable. In recent years, several alternatives for approximate inference have been suggested: EM [8] or variational EM [3], Expectation propagation (EP) [10] and Monte-Carlo sampling [13, 7]. In this paper, we take advantage of the fact that conditioned on β and θ the HTMM model is a special type of HMM. This allows us to use the standard parameter estimation tools of HMMs, namely Expect...

123 | Integrating topics and syntax
- Griffiths, Steyvers, et al.
- 2005
Citation Context: ... parsing tasks often use a “shallow” model with no hidden states [9]. In recent years several probabilistic models for text that infer topics and incorporate Markovian relations have been studied. In [7] a model that integrates topics and syntax is introduced. It contains a latent variable per each word that stands for syntactic classes. The model posits that words are either generated from topics th...

110 | Expectation-propagation for the generative aspect model
- Minka, Lafferty
- 2002
Citation Context: ... probabilities over the parameters β, θ is intractable. In recent years, several alternatives for approximate inference have been suggested: EM [8] or variational EM [3], Expectation propagation (EP) [10] and Monte-Carlo sampling [13, 7]. In this paper, we take advantage of the fact that conditioned on β and θ the HTMM model is a special type of HMM. This allows us to use the standard parameter estima...

84 | Topic modeling: beyond bag-of-words
- Wallach
- 2006
Citation Context: ...ation conveyed in the structure of words. During the last couple of years, a few models were introduced in which consecutive words are modeled by Markovian relations. These are the Bigram topic model [15], the LDA collocation model and the Topical n-grams model [16]. All these models assume that word generation in texts depends on a latent topic assignment as well as on the n previous words in the tex...

79 | A hierarchical Dirichlet language model
- MacKay, Peto
- 1994
Citation Context: ...ams and HMMs that capture local dependencies between words have been employed mainly in part-of-speech tagging [5]. Models for semantic parsing tasks often use a “shallow” model with no hidden states [9]. In recent years several probabilistic models for text that infer topics and incorporate Markovian relations have been studied. In [7] a model that integrates topics and syntax is introduced. It cont...

68 | The power of word clusters for text classification
- Slonim, Tishby
- 2001

52 | Topic segmentation with an aspect hidden markov model
- Blei, Moreno
- 2001
Citation Context: ...LDA model and to capture relations between consecutive words. We follow the same lines, while we allow Markovian relations between the hidden aspects. A somewhat related model is the aspect HMM model [2], though it models unstructured data that contains a stream of words. The model contains latent topics that have Markovian relations. In the aspect HMM model, documents or segments are inferred using he...

18 | A note on topical n-grams
- Wang, McCallum
- 2005
Citation Context: ...le of years, a few models were introduced in which consecutive words are modeled by Markovian relations. These are the Bigram topic model [15], the LDA collocation model and the Topical n-grams model [16]. All these models assume that word generation in texts depends on a latent topic assignment as well as on the n previous words in the text. This added complexity seems to provide the models with more ...

4 | Applying discrete PCA in data analysis
- Buntine, Jakulin
- 2004