Discovering latent patterns with hierarchical Bayesian mixed-membership models and the issue of model choice
In Data Mining Patterns: New Methods and Applications (P. Poncelet, F. Masseglia and M. Teisseire, eds.) 240–275. Idea Group Inc., 2006
Cited by 13 (5 self)
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While having related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva [15] and Erosheva, Fienberg, and Lafferty [19] and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups and associated patterns for clustering units. Model choice is an issue within specifications, and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of two data sets: (1) a corpus of scientific publications from the Proceedings of the National Academy of Sciences (PNAS) examined earlier by Erosheva, Fienberg, and Lafferty [19] and Griffiths and Steyvers [22]; (2) data on functionally disabled American seniors from the National
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
2006
Cited by 6 (4 self)
Data mining has experienced an explosive growth of models with latent structure for clustering and classification. While having related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a big methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva, Fienberg, and Lafferty (2004) and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups, and associated patterns, for clustering units. Model choice is an issue within specifications, and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of the following two data sets: (1) a corpus of scientific publications from the
Polytope samplers for inference in ill-posed inverse problems
Cited by 5 (3 self)
We consider linear ill-posed inverse problems y = Ax, in which we want to infer many count parameters x from few count observations y, where the matrix A is binary and has some unimodularity property. Such problems are typical in applications such as contingency table analysis and network tomography (on which we present testing results). These properties of A have a geometrical implication for the solution space: it is a convex integer polytope. We develop a novel approach to characterize this polytope in terms of its vertices by taking advantage of the geometrical intuitions behind the Hermite normal form decomposition of the matrix A, and of a newly defined pivoting operation to travel across vertices. Next, we use this characterization to develop three (exact) polytope samplers for x with emphasis on uniform distributions. We showcase one of these samplers on simulated and real data.
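To make the solution space concrete, the following minimal Python sketch lists every non-negative integer vector x satisfying y = Ax for a tiny 2x2 contingency table with fixed row and column sums. It is a brute-force illustration only, not one of the paper's polytope samplers; the function name `enumerate_solutions` and the example matrix and margins are assumptions introduced here.

```python
from itertools import product


def enumerate_solutions(A, y):
    """Return all non-negative integer vectors x with A x = y (brute force).

    A is a binary matrix given as a list of rows; y is a list of counts.
    Assumes every column of A has at least one 1, so each x_j is bounded.
    """
    n_rows, n_cols = len(A), len(A[0])
    upper = []
    for j in range(n_cols):
        # With binary A and x >= 0, x_j cannot exceed any y_i whose row uses column j.
        bounds = [y[i] for i in range(n_rows) if A[i][j] == 1]
        if not bounds:
            raise ValueError(f"column {j} is unconstrained; x_{j} is unbounded")
        upper.append(min(bounds))

    solutions = []
    for x in product(*(range(u + 1) for u in upper)):
        if all(sum(a * xj for a, xj in zip(row, x)) == yi for row, yi in zip(A, y)):
            solutions.append(x)
    return solutions


# Hypothetical example: cells (x11, x12, x21, x22) of a 2x2 table,
# observed through two row sums and two column sums.
A = [
    [1, 1, 0, 0],  # row 1 sum
    [0, 0, 1, 1],  # row 2 sum
    [1, 0, 1, 0],  # column 1 sum
    [0, 1, 0, 1],  # column 2 sum
]
y = [2, 1, 2, 1]

for x in enumerate_solutions(A, y):
    print(x)
```

Four cell counts are observed only through four margins, one of which is redundant, so more than one table is consistent with y. Exhaustive enumeration like this is feasible only for very small problems, which is the motivation for dedicated samplers.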
Deconvolution of mixing time series on a graph
Cited by 2 (0 self)
In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series.
Erratum
Below we describe an error in the presentation of the experimental results of our paper. Note that this error, which appears in the published version of the paper, disfavours our algorithm, SSM. In reality, all the matches SSM produces are true matches, that is, SSM always has a precision of 1. This observation is relevant to Figure 8 (left) and Table 1. All the precision numbers for SSM should be 1. This implies that there is a discrepancy between the matches reported by SSM and those reported by SPRING [1]. Indeed, SSM reports more (correct) matches, some of which SPRING does not report. (Hence the error in presenting the precision numbers for SSM.) The error in the presentation of our results was caused by a problematic situation we identified in the implementation of the SPRING [1] algorithm: in some cases, where we need to expand one of the partial results produced so far, SPRING may pick a partial result that becomes invalid later on. This may lead the algorithm to miss some correct matches down the road (those deriving from the path not followed). In our implementation of SSM we recognized this problem, and were able to identify more matches. Below we give some more details on this problem.
Problem in DTW/LCSS dynamic programming solution
In the following discussion, we consider SPRING-LCSS, but the same applies to SPRING-DTW. Assume we want to compute the LCSS distance between pattern A = {a1, …, a3} and stream B = {b1, …, b10}. Each cell of the LCSS distance matrix is referred to as LCSS(row,col).
[LCSS distance matrix: rows a5 to a1, columns b1 to b10]
The LCSS update equations are:
Discovering Latent Patterns with Hierarchical Bayesian Mixed-Membership Models
2006
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While having related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva [15] and Erosheva, Fienberg, and Lafferty [19] and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups and associated patterns for clustering units. Model choice is an issue within specifications, and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of two data sets: (1) a corpus of scientific publications from the Proceedings of the National Academy of Sciences (PNAS) examined earlier by Erosheva, Fienberg, and Lafferty [19] and Griffiths and Steyvers [22]; (2) data on functionally disabled American seniors from the National