Identification of MCMC Samples for Clustering
Abstract. For clustering problems, many studies report only MAP assignments as clustering results instead of using all samples from an MCMC sampler, because it is not straightforward to recognize clusters across the full set of samples. We therefore propose an identification algorithm that constructs groups of relevant clusters. The identification uses spectral clustering to group clusters. Although a naive spectral clustering algorithm is intractable in both memory and computation time, we develop a memory- and time-efficient spectral clustering algorithm for the samples of an MCMC sampler. In experiments, we show that our algorithm is tractable for real data while the naive algorithm is not. For search query log data, we also show representative vocabularies of clusters, which cannot be chosen from MAP assignments alone.
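To make the idea of grouping clusters across samples concrete, the following is a minimal sketch of the naive first step only: building a cluster-to-cluster affinity matrix over every cluster found in every sample. The abstract does not state which similarity the authors use, so the Jaccard overlap here is an assumption, and the function names and toy samples are hypothetical. The matrix has one row per cluster per sample, which hints at why the naive approach becomes expensive in memory as the number of samples grows.

```python
from itertools import combinations


def clusters_from_sample(assignments):
    """Turn one sample (a cluster label per data point) into a list of
    clusters, each represented as a frozenset of data-point indices."""
    groups = {}
    for idx, label in enumerate(assignments):
        groups.setdefault(label, set()).add(idx)
    return [frozenset(members) for members in groups.values()]


def jaccard(a, b):
    """Jaccard overlap between two clusters (an assumed similarity,
    not necessarily the one used in the paper)."""
    return len(a & b) / len(a | b)


def cluster_affinity(samples):
    """Pool the clusters of all samples and compute their pairwise affinity.
    The result is quadratic in the total number of clusters."""
    clusters = [c for s in samples for c in clusters_from_sample(s)]
    n = len(clusters)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        affinity[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        affinity[i][j] = affinity[j][i] = jaccard(clusters[i], clusters[j])
    return clusters, affinity


# Three hypothetical samples over five data points. The labels are permuted
# between the first two samples (label switching), yet the clusters match.
samples = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
clusters, affinity = cluster_affinity(samples)
```

A spectral clustering step would then operate on the leading eigenvectors of a normalized version of `affinity` to group clusters that refer to the same underlying structure, regardless of the labels each sample happened to assign.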
The Hierarchical Local Partition Process
We consider the problem in which K different types of data are collected to characterize an associated inference task, with this performed for M distinct tasks. It is assumed that the parameters associated with the model for data type (modality) k may be represented in the form of a mixture model, with the M tasks representing M draws from the mixture. We wish to simultaneously infer mixture models across all K modality types, using data from all M tasks. Considering tasks m1 and m2, we wish to impose the belief that if the data associated with modality k are drawn from the same mixture component (implying a similarity between tasks m1 and m2), then it is more probable that the associated data from modality j ≠ k will also be drawn from the same component. On the other hand, it is anticipated that there may be “random effects” that manifest idiosyncratic behavior for a subset of the modalities, even when similarity exists between the other modalities. The model employed utilizes a hierarchical Bayesian formalism, based on the local partition process. Inference is examined using both Markov chain Monte Carlo (MCMC) sampling and variational Bayesian (VB) analysis. The method is illustrated first with simulated data and then with data from two real applications. Concerning the latter, we consider analysis of gene-expression data and the sorting of annotated images.

