A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (1998)

by J A Bilmes
Venue: International Computer Science Institute, Tech. Rep.
Results 11 - 20 of 225

Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text

by Noah Ashton Smith, 2006
Cited by 20 (7 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)

Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering

by Sugato Basu, Raymond J. Mooney - In Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003
Cited by 19 (3 self)
Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has employed one of two approaches: 1) Search-based methods that utilize supervised data to guide the search for the best clustering, and 2) Similarity-based methods that use supervised data to adapt the underlying similarity metric used by the clustering algorithm. This paper presents a unified approach based on the K-Means clustering algorithm that incorporates both of these techniques. Experimental results demonstrate that the combined approach generally produces better clusters than either of the individual approaches.

Language Evolution by Iterated Learning With Bayesian Agents

by Thomas L. Griffiths, Michael L. Kalish, 2007
Cited by 18 (6 self)
Languages are transmitted from person to person and generation to generation via a process of iterated learning: people learn a language from other people who once learned that language themselves. We analyze the consequences of iterated learning for learning algorithms based on the principles of Bayesian inference, assuming that learners compute a posterior distribution over languages by combining a prior (representing their inductive biases) with the evidence provided by linguistic data. We show that when learners sample languages from this posterior distribution, iterated learning converges to a distribution over languages that is determined entirely by the prior. Under these conditions, iterated learning is a form of Gibbs sampling, a widely-used Markov chain Monte Carlo algorithm. The consequences of iterated learning are more complicated when learners choose the language with maximum posterior probability, being affected by both the prior of the learners and the amount of information transmitted between generations. We show that in this case, iterated learning corresponds to another statistical inference algorithm, a variant of the expectation-maximization (EM) algorithm. These results clarify the role of iterated learning in explanations of linguistic universals and provide a formal connection between constraints on language acquisition and the languages that come to be spoken, suggesting that information transmitted via iterated learning will ultimately come to mirror the minds of the learners.
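The convergence claim for sampling learners can be checked on a toy case. The sketch below assumes a hypothetical setup with three languages and two possible utterances; the prior and likelihood values are made up for illustration and are not taken from the paper. In each generation a learner sees one datum produced from the previous speaker's language and samples a language from its posterior. The code builds the resulting language-to-language transition matrix and iterates it from a starting point far from the prior.

```python
# Toy check: iterated learning with posterior-sampling learners has the prior
# as its stationary distribution. All numbers below are illustrative assumptions.

prior = [0.6, 0.3, 0.1]            # P(h): learners' inductive bias over 3 languages
likelihood = [                     # P(d | h): rows are languages, columns are data
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]

def posterior(d):
    """P(h | d) by Bayes' rule for a single observed datum d."""
    joint = [prior[h] * likelihood[h][d] for h in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]

def transition_matrix():
    """T[h][h2] = sum_d P(d | h) * P(h2 | d): one generation of transmission."""
    n_h, n_d = len(prior), len(likelihood[0])
    posts = [posterior(d) for d in range(n_d)]
    return [
        [sum(likelihood[h][d] * posts[d][h2] for d in range(n_d))
         for h2 in range(n_h)]
        for h in range(n_h)
    ]

def step(dist, T):
    """Push a distribution over languages through one generation."""
    n = len(dist)
    return [sum(dist[h] * T[h][h2] for h in range(n)) for h2 in range(n)]

T = transition_matrix()
dist = [0.0, 0.0, 1.0]             # start far from the prior
for _ in range(200):
    dist = step(dist, T)
```

After the loop, `dist` agrees with `prior` to within floating-point error, and `step(prior, T)` returns `prior` unchanged. This toy case covers only the sampling learner; the maximum-posterior learner described in the abstract is not modeled here.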

RankClus: Integrating clustering with ranking for heterogeneous information network analysis

by Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng, Tianyi Wu - In EDBT’09
Cited by 17 (13 self)
As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multityped (i.e., heterogeneous) information network. A novel

Mixed IDEAs

by Peter A. N. Bosman, Dirk Thierens, 2000
Cited by 16 (2 self)
Abstract not found

Closing the learning-planning loop with predictive state representations. http://arxiv.org/abs/0912.2385

by Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, 2009
Cited by 15 (10 self)
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.

Understanding Background Mixture Models for Foreground Segmentation

by P. Wayne Power, Johann A. Schoonees, 2002
Cited by 14 (0 self)
The seminal video surveillance papers on moving object segmentation through adaptive Gaussian mixture models of the background image do not provide adequate information for easy replication of the work. They also do not explicitly base their algorithms on the underlying statistical theory and sometimes even suffer from errors of derivation. This tutorial paper describes a practical implementation of the Stauffer-Grimson algorithm and provides values for all model parameters. It also shows what approximations to the theory were made and how to improve the standard algorithm by redefining those approximations.

Advancing Continuous IDEAs with Mixture Distributions and Factorization Selection Metrics

by Peter A.N. Bosman, Dirk Thierens - Proceedings of the Optimization by Building and Using Probabilistic Models OBUPM Workshop at the Genetic and Evolutionary Computation Conference GECCO–2001, 2001
Cited by 14 (5 self)
Evolutionary optimization based on probabilistic models has so far been limited to the use of factorizations in the case of continuous representations. Furthermore, a maximum complexity parameter n was required previously to construct factorizations, to prevent unnecessary complexity from being introduced in the factorization. In this paper, we advance these techniques by using clustering and the EM algorithm to allow for mixture distributions.

Optimizing time series discretization for knowledge discovery

by Fabian Mörchen - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), 2005
Cited by 12 (3 self)
Knowledge Discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
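As a rough illustration of the quantity this abstract names, the sketch below estimates the marginal probability and the self-transition probability of each symbol in a discretized series. It then compares the two with a symmetric Kullback-Leibler divergence between the corresponding Bernoulli distributions. The function names, the clamping constant `eps`, the symmetric form, and the sign convention (positive when a symbol follows itself more often than its marginal frequency would suggest) are all assumptions made for this example; this is not the Persist algorithm itself.

```python
import math
from collections import Counter

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def signed_persistence(symbols, eps=1e-6):
    """Per-symbol signed symmetric KL between self-transition and marginal."""
    counts = Counter(symbols)
    # Number of times each symbol is immediately followed by itself.
    stays = Counter(a for a, b in zip(symbols, symbols[1:]) if a == b)
    # Number of times each symbol appears with a successor.
    starts = Counter(symbols[:-1])
    scores = {}
    for s in counts:
        marginal = counts[s] / len(symbols)
        self_trans = stays[s] / starts[s] if starts[s] else marginal
        # Clamp away from 0 and 1 so the logarithms stay finite (an assumption).
        p = min(max(self_trans, eps), 1 - eps)
        q = min(max(marginal, eps), 1 - eps)
        sign = 1.0 if p >= q else -1.0
        scores[s] = sign * (kl_bernoulli(p, q) + kl_bernoulli(q, p))
    return scores

persistent = list("aaaaabbbbbaaaaabbbbb")    # long runs: states tend to persist
alternating = list("abababababababababab")   # no runs: states never persist
```

Under these assumptions, `signed_persistence(persistent)` gives a positive score for both symbols, while `signed_persistence(alternating)` gives negative scores. The sign is needed because the unsigned divergence alone is large in both cases.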

On Transforming Statistical Models for Non-Frontal Face Verification

by Conrad Sanderson, Samy Bengio, Yongsheng Gao, 2006
Cited by 11 (1 self)
We address the pose mismatch problem which can occur in face verification systems that have only a single (frontal) face image available for training. In the framework of a Bayesian classifier based on mixtures of Gaussians, the problem is tackled through extending each frontal face model with artificially synthesized models for non-frontal views. The synthesis methods are based on several implementations of Maximum Likelihood Linear Regression (MLLR), as well as standard multi-variate linear regression (LinReg). All synthesis techniques rely on prior information and learn how face models for the frontal view are related to face models for non-frontal views. The synthesis and extension approach is evaluated by applying it to two face verification systems: a holistic system (based on PCA-derived features) and a local feature system (based on DCT-derived features). Experiments on the FERET database suggest that for the holistic system, the LinReg-based technique is more suited than the MLLR-based techniques; for the local feature system, the results show that synthesis via a new MLLR implementation obtains better performance than synthesis based on traditional MLLR. The results further suggest that extending frontal models considerably reduces errors. It is also shown that the local feature system is less affected by view changes than the holistic system; this can be attributed to the parts-based representation of the face, and, due to the classifier based on mixtures of Gaussians, the lack of constraints on spatial relations between the face parts, allowing for deformations and movements of face areas.