Results 1–10 of 50
A Bayesian framework for word segmentation: Exploring the effects of context
 In 46th Annual Meeting of the ACL, 2009
Abstract

Cited by 108 (29 self)
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
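The statistical-learning cue this abstract builds on, segmenting at local minima of syllable transitional probability in the style of Saffran et al., can be sketched as follows. This is a minimal illustration of that baseline cue, not the paper's Bayesian models; the syllable tuples and the local-minimum boundary rule are illustrative assumptions.

```python
from collections import Counter

def tp_segment(utterances):
    """Segment syllable sequences at local minima of transitional
    probability TP(a -> b) = count(a, b) / count(a), the Saffran-style
    statistic against which the Bayesian models are compared.
    `utterances` is a list of syllable tuples; returns a word list."""
    bigrams, unigrams = Counter(), Counter()
    for u in utterances:
        unigrams.update(u)
        bigrams.update(zip(u, u[1:]))
    words = []
    for u in utterances:
        tps = [bigrams[a, b] / unigrams[a] for a, b in zip(u, u[1:])]
        word = [u[0]]
        for i in range(len(tps)):
            left = tps[i - 1] if i > 0 else 1.0
            right = tps[i + 1] if i + 1 < len(tps) else 1.0
            if tps[i] < left and tps[i] < right:
                # a dip in TP suggests a word boundary
                words.append(tuple(word))
                word = []
            word.append(u[i + 1])
        words.append(tuple(word))
    return words
```

On a toy stream built from the recurring "words" go-la and bu-da, the within-word transitions have TP 1.0 while the cross-word transitions have TP 0.5, so boundaries fall between the words.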
Topic models over text streams: a study of batch and online unsupervised learning
 In Proc. 7th SIAM Int’l. Conf. on Data Mining
Abstract

Cited by 33 (1 self)
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregate. In this paper, we analyze and compare the performance of three recently proposed batch topic models: Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures, and von Mises-Fisher (vMF) mixture models. In cases where offline clustering on complete document collections is infeasible due to resource and response-rate constraints, online unsupervised clustering methods that process incoming data incrementally are necessary. To this end, we propose online variants of vMF, EDCM, and LDA. Experiments on large real-world document collections, in both the offline and online settings, demonstrate that though LDA is a good model for finding word-level topics, vMF finds better document-level topic clusters more efficiently, which is often important in text mining applications. Finally, we propose a practical heuristic for hybrid topic modeling, which learns online topic models on streaming text and intermittently runs batch topic models on aggregated documents offline. Such a hybrid model is useful for several applications (e.g., dynamic topic-based aggregation of user-generated content in social networks) that need a good tradeoff between the performance of batch offline algorithms and the efficiency of incremental online algorithms.
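As a rough illustration of the online setting (not the paper's EM-based algorithms), a one-pass spherical k-means update, a common hard-assignment approximation to an online vMF mixture, nudges a unit-norm mean direction toward each document as it streams in. The learning rate and the deterministic first-k initialization are assumptions for the sketch.

```python
import numpy as np

def online_spherical_kmeans(docs, k, eta=0.1):
    """One pass of online spherical k-means over L2-normalized rows:
    assign each incoming doc to the most cosine-similar mean direction,
    move that direction toward the doc, and renormalize onto the sphere."""
    centroids = docs[:k].copy()  # deterministic init: first k documents
    labels = np.empty(len(docs), dtype=int)
    for i, x in enumerate(docs):
        j = int(np.argmax(centroids @ x))         # cosine sim (unit rows)
        centroids[j] += eta * (x - centroids[j])  # pull toward the doc
        centroids[j] /= np.linalg.norm(centroids[j])
        labels[i] = j
    return centroids, labels
```

Because every document is touched exactly once, the pass is linear in the stream length, which is the efficiency argument for online variants over batch re-clustering.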
Accounting for Burstiness in Topic Models
Abstract

Cited by 19 (0 self)
Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
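The burstiness effect is visible directly in the DCM (Pólya) likelihood: with small concentration parameters, repeating a word already seen is cheaper than introducing a new one. A minimal sketch of the DCM log-probability of a count vector (the parameter values used below are illustrative):

```python
from math import lgamma

def dcm_log_prob(counts, alpha):
    """Log-probability of a bag-of-words count vector under the
    Dirichlet compound multinomial (Polya) distribution.  The terms
    lgamma(a + x) - lgamma(a) grow quickly as x rises above 1 when
    `a` is small, which is exactly the burstiness effect."""
    n, A = sum(counts), sum(alpha)
    lp = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)  # multinomial coeff
    lp += lgamma(A) - lgamma(A + n)
    lp += sum(lgamma(a + x) - lgamma(a) for a, x in zip(alpha, counts))
    return lp
```

With alpha = (0.1, 0.1), for example, the bursty count vector (2, 0) comes out far more probable than the spread-out (1, 1), whereas a fair multinomial would prefer (1, 1).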
Spherical Topic Models
Abstract

Cited by 18 (0 self)
We introduce the Spherical Admixture Model (SAM), a Bayesian topic model over arbitrary ℓ2-normalized data. SAM models documents as points on a high-dimensional spherical manifold, and is capable of representing negative word-topic correlations and word presence/absence, unlike models with multinomial document likelihood, such as LDA. In this paper, we evaluate SAM as a topic browser, focusing on its ability to model “negative” topic features, and also as a dimensionality reduction method, using topic proportions as features for difficult classification tasks in natural language processing and computer vision.
Information-based models for ad hoc IR
 In SIGIR ’10, Conference on Research and Development in Information Retrieval, 2010
Abstract

Cited by 18 (5 self)
We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely that the difference between a word’s behavior at the document level and at the collection level carries information about the word’s significance for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.
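In that spirit, a query term's contribution to a document's score can be taken as the self-information −log P(T ≥ t) of its normalized frequency under a collection-level survival model. The log-logistic form, the linear length normalization, and the choice of lambda below are illustrative assumptions for the sketch, not the paper's exact models.

```python
from math import log

def ll_score(query_terms, doc_tf, doc_len, avg_len, coll_freq, n_docs):
    """Score a document by summing, over query terms present in it, the
    self-information -log P(T >= t) of the term's length-normalized
    frequency t under a log-logistic survival model
    P(T >= t) = lam / (t + lam), with lam set to the term's mean
    collection frequency.  Rare terms (small lam) contribute most."""
    score = 0.0
    for w in query_terms:
        t = doc_tf.get(w, 0) * avg_len / doc_len  # illustrative normalization
        if t > 0:
            lam = coll_freq[w] / n_docs
            score += -log(lam / (t + lam))
    return score
```

With matched term frequencies, a rare term (low mean collection frequency) yields a larger information gain than a common one, matching the hypothesis stated in the abstract.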
PeopleLDA: Anchoring topics to people using face recognition
 In IEEE International Conference on Computer Vision, 2007
Latent Class Models for Algorithm Portfolio Methods
Abstract

Cited by 15 (4 self)
Different solvers for computationally difficult problems such as satisfiability (SAT) perform best on different instances. Algorithm portfolios exploit this phenomenon by predicting solvers’ performance on specific problem instances, then shifting computational resources to the solvers that appear best suited. This paper develops a new approach to the problem of making such performance predictions: natural generative models of solver behavior. Two are proposed, both following from an assumption that problem instances cluster into latent classes: a mixture of multinomial distributions, and a mixture of Dirichlet compound multinomial distributions. The latter model extends the former to capture burstiness, the tendency of solver outcomes to recur. These models are integrated into an algorithm portfolio architecture and used to run standard SAT solvers on competition benchmarks. This approach is found to be competitive with the most prominent existing portfolio, SATzilla, which relies on domain-specific, hand-selected problem features; the latent class models, in contrast, use minimal domain knowledge. Their success suggests that these models can lead to more powerful and more general algorithm portfolio methods.
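The prediction step such a portfolio needs can be sketched with a simple latent-class model: maintain a posterior over instance classes given the outcomes of solvers already tried, then pick the untried solver with the best posterior-predictive success probability. All parameters below (class priors `pi`, per-class success rates `theta`) are hypothetical, and the success/failure observation model is a simplification of the paper's multinomial and DCM mixtures.

```python
import numpy as np

def choose_next_solver(pi, theta, tried):
    """Latent-class portfolio sketch.  pi[c] is the prior over latent
    instance classes; theta[c, s] is the probability that solver s
    succeeds on an instance of class c; `tried` maps solver index to
    an observed 0/1 outcome.  Returns the untried solver with the
    highest posterior-predictive success probability."""
    post = np.array(pi, dtype=float)
    for s, y in tried.items():
        post *= theta[:, s] if y else (1.0 - theta[:, s])  # Bayes update
    post /= post.sum()
    untried = [s for s in range(theta.shape[1]) if s not in tried]
    scores = {s: float(post @ theta[:, s]) for s in untried}
    return max(scores, key=scores.get)
```

A single observed outcome already shifts the class posterior, and with it the recommended solver, which is how the portfolio reallocates resources as evidence accumulates.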
Bayesian surprise and landmark detection
 In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA), 2009
Abstract

Cited by 14 (0 self)
(Figure caption, truncated: the model for each location, indexed by measurement number; only every second measurement is shown; measurements where the landmark detector fires are shown in red (shaded overlay) and mark the start of subsequences that differ qualitatively from the preceding ones, e.g. measurements before 34 are much more cluttered than those following.)
Automatic detection of landmarks, usually special places in the environment such as gateways, for topological mapping has proven to be a difficult task. We present the use of Bayesian surprise, introduced in computer vision, for landmark detection. Further, we provide a novel hierarchical, graphical model for the appearance of a place and use this model to perform surprise-based landmark detection. Our scheme is agnostic to the sensor type, and we demonstrate this by implementing a simple laser model for computing surprise. We evaluate our landmark detector using appearance and laser measurements in the context of a topological mapping algorithm, thus demonstrating the practical applicability of the detector.
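The core quantity, surprise as the KL divergence from prior to posterior after a Bayesian update, has a closed form for simple models. Below is an illustrative Gaussian stand-in for the paper's hierarchical appearance model (the parameters are hypothetical): an observation far from the prior mean yields high surprise, and a landmark detector would fire when surprise crosses a threshold.

```python
from math import log

def gaussian_surprise(mu0, var0, x, noise_var):
    """Bayesian surprise for a Gaussian location model: update the
    prior N(mu0, var0) with one observation x of known noise variance,
    then return KL(posterior || prior) in nats."""
    var1 = 1.0 / (1.0 / var0 + 1.0 / noise_var)   # posterior variance
    mu1 = var1 * (mu0 / var0 + x / noise_var)     # posterior mean
    # closed-form KL between two univariate Gaussians
    return (log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0) / 2.0
```

Because the posterior variance is fixed by the noise level, surprise here grows monotonically with how far the observation moves the mean, which matches the intuition that landmarks are measurements that qualitatively break with what came before.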