Results 1-10 of 24
Unsupervised learning of finite mixture models
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 267 (20 self)
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index Terms: Finite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
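As a rough illustration of folding model selection into the estimation loop, the sketch below runs EM on a 1-D Gaussian mixture and lets the weight update drive weakly supported components to zero. It is not the paper's algorithm. The penalized weight update (subtracting half the number of per-component parameters from each expected count), the function name fit_mixture_with_pruning, the batch update order, and the small variance floor are all assumptions made for this example.

```python
import math
import random


def fit_mixture_with_pruning(x, k_init=6, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture whose weight update can remove components.

    Hypothetical sketch: start with more components than needed and drop any
    component whose penalized expected count falls to zero.
    """
    rng = random.Random(seed)
    params_per_component = 2  # mean and variance in 1-D (assumption)
    n = len(x)
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n

    # Deliberately crude initialization: random data points as means.
    means = rng.sample(list(x), k_init)
    variances = [var_x] * k_init
    weights = [1.0 / k_init] * k_init

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for xi in x:
            logs = [
                math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logs)
            unnorm = [math.exp(value - top) for value in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step for the weights: penalize each expected count, clip at zero.
        counts = [sum(r[j] for r in resp) for j in range(len(weights))]
        shrunk = [max(0.0, c - params_per_component / 2.0) for c in counts]
        keep = [j for j, s in enumerate(shrunk) if s > 0.0]
        if not keep:
            break  # every component was annihilated; keep the last estimate

        total_shrunk = sum(shrunk[j] for j in keep)
        weights = [shrunk[j] / total_shrunk for j in keep]

        # M-step for means and variances of the surviving components.
        means = [sum(r[j] * xi for r, xi in zip(resp, x)) / counts[j] for j in keep]
        variances = [
            sum(r[j] * (xi - m) ** 2 for r, xi in zip(resp, x)) / counts[j] + 1e-6
            for j, m in zip(keep, means)  # 1e-6 floor is a numerical safeguard
        ]

    return weights, means, variances
```

Calling fit_mixture_with_pruning(data, k_init=6) returns the surviving weights, means, and variances; how many components survive depends on the data and on the penalty, so the sketch makes no claim about recovering the true number.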
Hierarchical Latent Class Models for Cluster Analysis
Journal of Machine Learning Research, 2002
Cited by 46 (12 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
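The conditional independence assumption mentioned in this abstract can be written out as follows. This is the standard latent class factorization, not a formula taken from the paper; the symbols (class variable c, observed variables x_1, ..., x_n) are notation chosen here.

```latex
P(x_1, \dots, x_n) \;=\; \sum_{c} P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Local dependence means that, for some pair of observed variables, the product form inside the sum does not hold given c alone.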
Learning shape-classes using a mixture of tree-unions
IEEE Trans. PAMI, 2006
Cited by 20 (6 self)
Abstract: This paper poses the problem of tree-clustering as that of fitting a mixture of tree unions to a set of sample trees. The tree-unions are structures from which the individual data samples belonging to a cluster can be obtained by edit operations. The distribution of observed tree nodes in each cluster sample is assumed to be governed by a Bernoulli distribution. The clustering method is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We adopt a minimum description length approach to the problem of fitting the mixture model to data. We make maximum-likelihood estimates of the Bernoulli parameters. The tree-unions and the mixing proportions are sought so as to minimize the description length criterion. This is the sum of the negative logarithm of the Bernoulli distribution, and a message-length criterion that encodes both the complexity of the union-trees and the number of mixture components. We locate node correspondences by minimizing the edit distance with the current tree unions, and show that the edit distance is linked to the description length criterion. The method can be applied to both unweighted and weighted trees. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation. Index Terms: Structural learning, tree clustering, mixture modeling, minimum description length, model codes, shock graphs.
Model Selection by Normalized Maximum Likelihood
2005
Cited by 12 (3 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude 'two-part code' MDL method in the 1970s, many significant strides have been made, especially in the 1990s, with the culmination of the development of the refined 'universal code' MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review on these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
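To make the NML idea concrete, here is a minimal sketch for the simplest case, a Bernoulli model over binary sequences of length n. It is an illustration under stated assumptions rather than anything taken from the tutorial: the function names are hypothetical, and the code length is the negative log of the maximized likelihood plus the log of a normalizer summed over all possible sequences.

```python
import math


def bernoulli_max_likelihood(k, n):
    """Maximized likelihood of one binary sequence of length n with k ones."""
    p = k / n  # maximum-likelihood estimate of the success probability
    # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
    return (p ** k) * ((1 - p) ** (n - k))


def nml_normalizer(n):
    """Sum of maximized likelihoods over all 2**n binary sequences of length n.

    Sequences with the same number of ones share a likelihood, so the sum is
    grouped by k with a binomial coefficient.
    """
    return sum(math.comb(n, k) * bernoulli_max_likelihood(k, n) for k in range(n + 1))


def nml_code_length_bits(k, n):
    """NML code length, in bits, of a sequence of length n with k ones."""
    return -math.log2(bernoulli_max_likelihood(k, n)) + math.log2(nml_normalizer(n))
```

The second term, log2 of the normalizer, does not depend on the observed sequence; it acts as the complexity penalty for the model class at sample size n.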
General Direction-of-Arrival Tracking with Acoustic Nodes
IEEE Trans. on Signal Processing, 2005
Cited by 12 (11 self)
Traditionally in target tracking, much emphasis is put on the motion model that realistically represents the target's movements. In this paper, we first present the classical constant velocity model and then introduce a new model that incorporates an acceleration component along the heading direction of the target. We also show that the target motion parameters can be considered part of a more general feature set for target tracking. This is exemplified by showing that target frequencies, which may be unrelated to the target motion, can also be used to improve the tracking performance. In order to include the frequency variable, a new array steering vector is presented for the direction-of-arrival (DOA) estimation problems. The independent partition particle filter (IPPF) is used to compare the performances of the two motion models by tracking multiple maneuvering targets using the acoustic sensor outputs directly.
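The difference between the two kinds of motion model can be sketched in a few lines. The example below uses a textbook Cartesian state (x, y, vx, vy) and a constant acceleration magnitude over one time step; both are assumptions for illustration and are not the paper's DOA-based state parameterization. The function names are hypothetical.

```python
import math


def constant_velocity_step(state, dt):
    """Advance (x, y, vx, vy) by dt with the velocity held fixed."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def heading_acceleration_step(state, dt, accel):
    """Advance (x, y, vx, vy) by dt with acceleration along the current heading.

    The heading is the direction of the velocity vector, so the target speeds
    up or slows down without turning during the step.
    """
    x, y, vx, vy = state
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        # Heading is undefined for a stationary target; fall back to the
        # constant velocity model (a choice made for this sketch).
        return constant_velocity_step(state, dt)
    ax = accel * vx / speed
    ay = accel * vy / speed
    return (
        x + vx * dt + 0.5 * ax * dt * dt,
        y + vy * dt + 0.5 * ay * dt * dt,
        vx + ax * dt,
        vy + ay * dt,
    )
```

In a particle filter, a step function like these would typically be applied to each particle with added process noise; the noise model is omitted here.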
Latent Variable Discovery in Classification Models
2004
Cited by 10 (2 self)
The naive Bayes model makes the often unrealistic assumption that feature variables are mutually independent given the class variable. We interpret the violation of this assumption as an indication of the presence of latent variables and show how latent variables can be detected. Latent variable discovery is interesting, especially for medical applications, because it can lead to better understanding of application domains. It can also improve classification accuracy and boost user confidence in classification models.
Debiasing for intrinsic dimension estimation
In Proc. IEEE Statistical Signal Processing Workshop, 2007
Cited by 8 (4 self)
Many algorithms have been proposed for estimating the intrinsic dimension of high-dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension, taking manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we offer improvements to an existing algorithm for dimension estimation, based on k-nearest neighbor graphs, and offer an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation with data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation. Index Terms: Intrinsic dimension, manifold learning, Riemannian manifold, nearest neighbor graph, geodesics.
Online Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation
IEEE Trans. Speech and Audio Proc., 2001
Cited by 7 (2 self)
Abstract: This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and gain large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear regression (LR) or affine transformation parameters for HMM Gaussian mixture components are dynamically searched. An online Bayesian learning technique is proposed for recursive maximum a posteriori (MAP) estimation of LR and affine transformation parameters. This technique has the advantages of being able to accommodate flexible forms of transformation functions as well as a priori probability density functions (pdfs). To balance between model complexity and goodness of fit to adaptation data, a dynamic programming algorithm is developed for selecting models using a Bayesian variant of the "minimum description length" (MDL) principle. Speaker adaptation experiments with a 26-letter English alphabet vocabulary were conducted, and the results confirmed effectiveness of the online learning framework. Index Terms: Affine transformation, Bayesian model selection, hidden Markov models (HMMs), linear regression (LR), model
Topology Discovery on Unicast Networks: A Hierarchical Approach Based on End-to-End Measurements
2005
Cited by 2 (1 self)
In this paper we address the problem of topology discovery in unicast logical tree networks using end-to-end measurements. Without any cooperation from the internal routers, topology estimation can be formulated as hierarchical clustering of the leaf nodes based on pairwise correlations as similarity metrics. We investigate three types of similarity metrics: queueing delay measured by sandwich probes, delay variance measured by packet pairs, and loss rate measured also by packet pairs. Unlike previous work which first assumes the network topology is a binary tree and then tries to generalize to a non-binary tree, we provide a framework which directly deals with general logical tree topologies. Based on our proposed finite mixture model for the set of similarity measurements we develop a penalized hierarchical topology likelihood that leads to a natural clustering of the leaf nodes level by level. A hierarchical algorithm to estimate the topology is developed in a similar manner by finding the best partitions of the leaf nodes. Our simulations show that the algorithm is more robust than binary-tree based methods. The three types of similarity metrics are also evaluated under various network load conditions using ns-2.