Results 11  20
of
89
Ensemble Learning for Hidden Markov Models
, 1997
"... The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncerta ..."
Abstract

Cited by 91 (0 self)
 Add to MetaCart
The standard method for training Hidden Markov Models optimizes a point estimate of the model parameters. This estimate, which can be viewed as the maximum of a posterior probability density over the model parameters, may be susceptible to overfitting, and contains no indication of parameter uncertainty. Also, this maximummay be unrepresentative of the posterior probability distribution. In this paper we study a method in which we optimize an ensemble which approximates the entire posterior probability distribution. The ensemble learning algorithm requires the same resources as the traditional BaumWelch algorithm. The traditional training algorithm for hidden Markov models is an expectation maximization (EM) algorithm (Dempster et al. 1977) known as the BaumWelch algorithm. It is a maximum likelihood method, or, with a simple modification, a penalized maximum likelihood method, which can be viewed as maximizing a posterior probability density over the model parameters. Recently, ...
Hidden topic Markov models
 In Proceedings of Artificial Intelligence and Statistics
, 2007
"... Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distributio ..."
Abstract

Cited by 81 (2 self)
 Add to MetaCart
(Show Context)
Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the wellknown tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity. 1
Topical ngrams: Phrase and topic discovery, with an application to information retrieval
 In Proceedings of the 7th IEEE International Conference on Data Mining
, 2007
"... Most topic models, such as latent Dirichlet allocation, rely on the bagofwords assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical ngrams, a topic model that discovers topics as well as topical phra ..."
Abstract

Cited by 62 (3 self)
 Add to MetaCart
(Show Context)
Most topic models, such as latent Dirichlet allocation, rely on the bagofwords assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical ngrams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topicspecific unigram or bigram distribution. Thus our model can model “white house ” as a special meaning phrase in the ‘politics ’ topic, but not in the ‘real estate ’ topic. Successive bigrams form longer phrases. We present experiments showing meaningful phrases and more interpretable topics from the NIPS data and improved information retrieval performance on a TREC collection. 1
A bit of progress in language modeling — extended version
, 2001
"... 1.1 Overview Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, ..."
Abstract

Cited by 57 (1 self)
 Add to MetaCart
1.1 Overview Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition,
Model evolution by runtime parameter adaptation
 International Conference on Software Engineering
, 2009
"... Models can help software engineers to reason about designtime decisions before implementing a system. This paper focuses on models that deal with nonfunctional properties, such as reliability and performance. To build such models, one must rely on numerical estimates of various parameters provid ..."
Abstract

Cited by 39 (2 self)
 Add to MetaCart
(Show Context)
Models can help software engineers to reason about designtime decisions before implementing a system. This paper focuses on models that deal with nonfunctional properties, such as reliability and performance. To build such models, one must rely on numerical estimates of various parameters provided by domain experts or extracted by other similar systems. Unfortunately, estimates are seldom correct. In addition, in dynamic environments, the value of parameters may change over time. We discuss an approach that addresses these issues by keeping models alive at run time and feeding a Bayesian estimator with data collected from the running system, which produces updated parameters. The updated model provides an increasingly better representation of the system. By analyzing the updated model at run time, it is possible to detect or predict if a desired property is, or will be, violated by the running implementation. Requirement violations may trigger automatic reconfigurations or recovery actions aimed at guaranteeing the desired goals. We illustrate a working framework supporting our methodology and apply it to an example in which a Web service orchestrated composition is modeled through a Discrete Time Markov Chain. Numerical simulations show the effectiveness of the approach. 1.
Modeling Human Performance in Statistical Word Segmentation
"... What mechanisms support the ability of human infants, adults, and other primates to identify words from fluent speech using distributional regularities? In order to better characterize this ability, we collected data from adults in an artificial language segmentation task similar to Saffran, Newport ..."
Abstract

Cited by 37 (14 self)
 Add to MetaCart
What mechanisms support the ability of human infants, adults, and other primates to identify words from fluent speech using distributional regularities? In order to better characterize this ability, we collected data from adults in an artificial language segmentation task similar to Saffran, Newport, and Aslin (1996) in which the length of sentences was systematically varied between groups of participants. We then compared the fit of a variety of computational models— including simple statistical models of transitional probability and mutual information, a clustering model based on mutual information by Swingley (2005), PARSER (Perruchet & Vintner, 1998), and a Bayesian model. We found that while all models were able to successfully complete the task, fit to the human data varied considerably, with the Bayesian model achieving the highest correlation with our results.
Efficient Bayesian Parameter Estimation in Large Discrete Domains
 Advances in Neural Information Processing Systems
, 1999
"... In this paper we examine the problem of estimating the parameters of a multinomial distribution over a large number of discrete outcomes, most of which do not appear in the training data. We analyze this problem from a Bayesian perspective and develop a hierarchical prior that incorporates the assum ..."
Abstract

Cited by 35 (1 self)
 Add to MetaCart
(Show Context)
In this paper we examine the problem of estimating the parameters of a multinomial distribution over a large number of discrete outcomes, most of which do not appear in the training data. We analyze this problem from a Bayesian perspective and develop a hierarchical prior that incorporates the assumption that the observed outcomes constitute only a small subset of the possible outcomes. We show how to efficiently perform exact inference with this form of hierarchical prior and compare our method to standard approaches and demonstrate its merits. Category: Algorithms and Architectures Presentation preference: none This paper was not submitted elsewhere nor will be submitted during NIPS review period. 1 Introduction One of the most important problems in statistical inference is multinomialestimation: Given a past history of observations independent trials with a discrete set of outcomes, predict the probability of the next trial. Such estimators are the basic building blocks in mor...