Results 1 - 4 of 4
Hidden topic Markov models
In Proceedings of Artificial Intelligence and Statistics, 2007
Abstract

Cited by 88 (2 self)
Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word-document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.
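The generative assumption in this abstract (one topic per sentence, with successive sentences tending to keep the same topic) can be illustrated with a minimal sampling sketch. This is not the authors' code. The function name, the parameter names `theta`, `beta`, and `epsilon`, and the choice to model "more likely to have the same topic" as a fixed switch probability are all assumptions made for illustration.

```python
import random

def sample_htmm_document(theta, beta, sentence_lengths, epsilon, rng):
    """Sample a document as a list of (topic, words) pairs, one per sentence.

    theta: topic probabilities for this document (hypothetical name)
    beta: per-topic word distributions, each a dict word -> probability
    epsilon: assumed probability that a sentence draws a fresh topic
    """
    topics = list(range(len(theta)))
    sentences = []
    topic = None
    for length in sentence_lengths:
        # The first sentence always draws a topic; later sentences keep the
        # previous topic unless a switch occurs with probability epsilon.
        if topic is None or rng.random() < epsilon:
            topic = rng.choices(topics, weights=theta, k=1)[0]
        vocab = list(beta[topic].keys())
        probs = list(beta[topic].values())
        # Every word in the sentence is drawn from the same topic.
        words = rng.choices(vocab, weights=probs, k=length)
        sentences.append((topic, words))
    return sentences
```

With a small `epsilon`, long runs of sentences share a topic, which is the dependency that lets an ambiguous word be resolved by its neighbours. Learning and inference with Hidden Markov Model tools, as the abstract describes, are not shown here.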
Latent Topic Models for Hypertext
Abstract

Cited by 30 (1 self)
Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words: the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
Learning Author Topic Models from Text Corpora
2005
Abstract

Cited by 6 (0 self)
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems Conference (NIPS), and 121,000 emails from a large corporation. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based
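The two-stage process described in this abstract can be sketched as a small sampler: pick an author, draw a topic from that author's topic distribution, then draw a word from that topic's word distribution. This is a hedged illustration, not the authors' implementation. The function and argument names are hypothetical, and choosing the author uniformly (equal mixture weights across co-authors) is an assumption, since the abstract only states that the document's topic distribution is a mixture.

```python
import random

def sample_author_topic_words(authors, author_topics, topic_words, n_words, rng):
    """Sample n_words tokens as (author, topic, word) triples.

    authors: the authors of the document
    author_topics: dict author -> list of topic probabilities
    topic_words: list of dicts, one per topic, mapping word -> probability
    """
    topics = list(range(len(topic_words)))
    tokens = []
    for _ in range(n_words):
        # Assumption: each co-author is equally likely to be responsible
        # for a given word (uniform mixture weights).
        author = rng.choice(authors)
        # Stage 1: the author's distribution over topics picks a topic.
        topic = rng.choices(topics, weights=author_topics[author], k=1)[0]
        # Stage 2: the topic's distribution over words picks the word.
        vocab = list(topic_words[topic])
        weights = [topic_words[topic][w] for w in vocab]
        word = rng.choices(vocab, weights=weights, k=1)[0]
        tokens.append((author, topic, word))
    return tokens
```

The sketch covers only the forward (generative) direction. Estimating the distributions from data with a Markov chain Monte Carlo algorithm, as the abstract describes, is a separate step that is not shown.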
Improving Statistical Topic Models by Using Ontological Concepts
Abstract
Over the last decade, with the popularization of the Internet and the increase in information available electronically, there is a need to find means of gaining insight into such large repositories of information. Statistical models are anticipated to provide a good understanding of a large corpus of documents. Latent Dirichlet Allocation (LDA) and Concept Based Topic Models, which use human-defined concepts from an external ontology, have been shown to provide good insight into such large collections of documents. This report details an attempt to apply LDA and Concept Based Topic Models and evaluate their performance on the NIPS dataset, using a Machine Learning Glossary as the external ontology, based on rate of convergence over Gibbs iterations, quantity of training data required, and number of topics / concepts, and also by introducing smoothing and filtering the vocabulary to make the two models comparable. Experiments on the NIPS data show a decrease in the rate of information gain at higher Gibbs iterations for both models. Having more training data and a higher number of topics is beneficial for the LDA model, but the Concept Model performs relatively well with less training data. LDA with Consistent Vocabulary provides wonderful results, but the Concept Model gives better posterior beliefs on words related to a particular concept. Therefore, LDA Model does learn and group