Discovering latent patterns with hierarchical Bayesian mixed-membership models and the issue of model choice (2006)
Venue: In Data Mining Patterns: New Methods and Applications (P. Poncelet, F. Masseglia and M. Teisseire, eds.) 240–275. Idea Group Inc.
Citations: 6 (3 self-citations)
BibTeX
@INCOLLECTION{Airoldi06discoveringlatent,
author = {Edoardo M. Airoldi and Stephen E. Fienberg and Cyrille Joutard and Tanzy M. Love},
title = {Discovering latent patterns with hierarchical {B}ayesian mixed-membership models and the issue of model choice},
booktitle = {Data Mining Patterns: New Methods and Applications},
editor = {P. Poncelet and F. Masseglia and M. Teisseire},
pages = {240--275},
publisher = {Idea Group Inc.},
year = {2006}
}
Abstract
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. Although these models have related objectives, they use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva [15] and Erosheva, Fienberg, and Lafferty [19] and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups and associated patterns for clustering units. Model choice is an issue within specifications and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of two data sets: (1) a corpus of scientific publications from the Proceedings of the National Academy of Sciences (PNAS) examined earlier by Erosheva, Fienberg, and Lafferty [19] and Griffiths and Steyvers [22]; (2) data on functionally disabled American seniors from the National
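To make the mixed-membership idea concrete, here is a minimal sketch of a generative process in the spirit of the general formulation the abstract cites. It assumes categorical responses, a Dirichlet prior on each unit's membership scores, and a fixed number of latent groups K; the function names, parameter values, and the two-group, three-attribute example are hypothetical illustrations, not the authors' code or data. Each unit gets its own membership vector theta, and each attribute independently picks the group whose response distribution it follows, so the per-attribute likelihood is a convex combination of group distributions.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability given by probs."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def simulate_unit(alpha, beta, rng):
    """Generate one unit's J categorical responses (hypothetical sketch).

    alpha: Dirichlet hyperparameters, length K (number of latent groups).
    beta:  beta[k][j] is the response distribution of attribute j in group k.
    """
    theta = sample_dirichlet(alpha, rng)      # unit-level membership scores
    responses = []
    for j in range(len(beta[0])):
        k = sample_categorical(theta, rng)    # latent group used for attribute j
        responses.append(sample_categorical(beta[k][j], rng))
    return theta, responses

def unit_log_likelihood(responses, theta, beta):
    """log p(x | theta, beta) = sum_j log sum_k theta_k * beta[k][j][x_j]."""
    return sum(
        math.log(sum(theta[k] * beta[k][j][x] for k in range(len(theta))))
        for j, x in enumerate(responses)
    )

# Hypothetical example: K = 2 groups, J = 3 binary attributes.
beta = [
    [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]],  # group 0: mostly response 0
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],    # group 1: mostly response 1
]
alpha = [0.5, 0.5]
rng = random.Random(0)
theta, x = simulate_unit(alpha, beta, rng)
print(theta, x, unit_log_likelihood(x, theta, beta))
```

Choosing the number of latent groups, as the abstract discusses, would mean fitting such a model for several candidate values of K (for example by MCMC or variational inference) and comparing the fits; that step is outside the scope of this sketch.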