Results 1-10 of 37
Unsupervised learning of finite mixture models
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 418 (22 self)
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index Terms: Finite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
Hierarchical Latent Class Models for Cluster Analysis
Journal of Machine Learning Research, 2002
Cited by 61 (12 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
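Editorial illustration (not from the paper): the independence assumption described in this abstract means the probability of an observation is a class-weighted sum of per-variable probabilities. The sketch below shows that computation for a plain (non-hierarchical) latent class model. The function name, the table layout, and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod

def latent_class_prob(x, class_prior, cond_prob):
    """Marginal probability of observation x under a latent class model.

    x:                tuple of observed category indices, one per variable
    class_prior[c]:   P(C = c)
    cond_prob[c][i][v]: P(X_i = v | C = c)
    Variables are assumed mutually independent given the class.
    """
    return sum(
        class_prior[c] * prod(cond_prob[c][i][v] for i, v in enumerate(x))
        for c in range(len(class_prior))
    )

# Two hypothetical classes and two binary variables (illustrative numbers only).
class_prior = [0.5, 0.5]
cond_prob = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0: P(X_0 | C=0), P(X_1 | C=0)
    [[0.2, 0.8], [0.3, 0.7]],  # class 1: P(X_0 | C=1), P(X_1 | C=1)
]

# The probabilities of all four possible observations should sum to 1.
total = sum(
    latent_class_prob((a, b), class_prior, cond_prob)
    for a in (0, 1)
    for b in (0, 1)
)
```

Local dependence, as the abstract uses the term, is the situation where real data violate the product inside the sum.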
Learning shape-classes using a mixture of tree-unions
IEEE Trans. PAMI, 2006
Cited by 27 (8 self)
Abstract: This paper poses the problem of tree-clustering as that of fitting a mixture of tree unions to a set of sample trees. The tree-unions are structures from which the individual data samples belonging to a cluster can be obtained by edit operations. The distribution of observed tree nodes in each cluster sample is assumed to be governed by a Bernoulli distribution. The clustering method is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We adopt a minimum description length approach to the problem of fitting the mixture model to data. We make maximum-likelihood estimates of the Bernoulli parameters. The tree-unions and the mixing proportions are sought so as to minimize the description length criterion. This is the sum of the negative logarithm of the Bernoulli distribution, and a message-length criterion that encodes both the complexity of the union-trees and the number of mixture components. We locate node correspondences by minimizing the edit distance with the current tree unions, and show that the edit distance is linked to the description length criterion. The method can be applied to both unweighted and weighted trees. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation. Index Terms: Structural learning, tree clustering, mixture modeling, minimum description length, model codes, shock graphs.
Model Selection by Normalized Maximum Likelihood
2005
Cited by 24 (9 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude ‘two-part code’ MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined ‘universal code’ MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
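Editorial illustration (not from the paper): for a model class small enough to enumerate, the NML code length can be computed directly as the negative log of the maximized likelihood plus the log of a normalizer summed over all possible data sets. The sketch below assumes the simplest case, a Bernoulli model over binary sequences of length n. The function names are hypothetical.

```python
from math import comb, log2

def bernoulli_max_likelihood(k, n):
    """Maximized likelihood of one binary sequence with k ones out of n."""
    if k == 0 or k == n:
        return 1.0  # the maximum-likelihood estimate puts all mass on the data
    p = k / n  # maximum-likelihood estimate of the Bernoulli parameter
    return p ** k * (1 - p) ** (n - k)

def nml_normalizer(n):
    """Sum of maximized likelihoods over all 2**n binary sequences of length n.

    Sequences with the same number of ones share a maximized likelihood,
    so the sum is grouped by k with a binomial coefficient.
    """
    return sum(comb(n, k) * bernoulli_max_likelihood(k, n) for k in range(n + 1))

def nml_code_length(k, n):
    """NML code length in bits for a sequence with k ones out of n."""
    return -log2(bernoulli_max_likelihood(k, n)) + log2(nml_normalizer(n))
```

For n = 2 the normalizer is 1 + 2(0.25) + 1 = 2.5. Its logarithm acts as the complexity penalty that NML adds to the fit term.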
General Direction-of-Arrival Tracking with Acoustic Nodes
IEEE Trans. on Signal Processing, 2005
Cited by 12 (11 self)
Traditionally in target tracking, much emphasis is put on the motion model that realistically represents the target's movements. In this paper, we first present the classical constant velocity model and then introduce a new model that incorporates an acceleration component along the heading direction of the target. We also show that the target motion parameters can be considered part of a more general feature set for target tracking. This is exemplified by showing that target frequencies, which may be unrelated to the target motion, can also be used to improve the tracking performance. In order to include the frequency variable, a new array steering vector is presented for the direction-of-arrival (DOA) estimation problems. The independent partition particle filter (IPPF) is used to compare the performances of the two motion models by tracking multiple maneuvering targets using the acoustic sensor outputs directly.
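Editorial illustration (not from the paper): a constant velocity model advances position by velocity times the time step and leaves velocity unchanged. The sketch below assumes a Cartesian state (x, y, vx, vy) with no process noise. The paper may parameterize the state differently (for example in terms of DOA and heading), so the state layout and function name here are hypothetical.

```python
def constant_velocity_step(state, dt):
    """Advance a noise-free constant velocity state by one time step.

    state: (x, y, vx, vy), an assumed Cartesian position and velocity
    dt:    time step in the same time unit as the velocities
    """
    x, y, vx, vy = state
    # Position moves along the velocity vector; velocity itself does not change.
    return (x + vx * dt, y + vy * dt, vx, vy)

# Propagate a hypothetical target for three steps of 0.5 time units each.
track = [(0.0, 0.0, 1.0, 2.0)]
for _ in range(3):
    track.append(constant_velocity_step(track[-1], 0.5))
```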
Latent Variable Discovery in Classification Models
2004
Cited by 11 (2 self)
The naive Bayes model makes the often unrealistic assumption that feature variables are mutually independent given the class variable. We interpret the violation of this assumption as an indication of the presence of latent variables and show how latent variables can be detected. Latent variable discovery is interesting, especially for medical applications, because it can lead to better understanding of application domains. It can also improve classification accuracy and boost user confidence in classification models.
Debiasing for intrinsic dimension estimation
in Proc. IEEE Statistical Signal Processing Workshop, 2007
Cited by 11 (4 self)
Many algorithms have been proposed for estimating the intrinsic dimension of high-dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension, taking manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we offer improvements to an existing algorithm for dimension estimation, based on k-nearest neighbor graphs, and offer an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation with data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation. Index Terms: Intrinsic dimension, manifold learning, Riemannian manifold, nearest neighbor graph, geodesics.
An application of minimum description length clustering to partitioning learning curves
Proceedings of the 2005 IEEE International Symposium on Information Theory, 2005
Cited by 8 (7 self)
Abstract: We apply a Minimum Description Length–based clustering technique to the problem of partitioning a set of learning curves. The goal is to partition experimental data collected from different sources into groups of sources that are statistically the same. We solve this problem by defining statistical models for the data generating processes, then partitioning them using the Normalized Maximum Likelihood criterion. Unlike many alternative model selection methods, this approach is optimal (in a minimax coding sense) for data of any sample size. We present an application of the method to the cognitive modeling problem of partitioning human learning curves for different categorization tasks.
Online Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation
IEEE Transactions on Speech and Audio Processing