Results 1 - 6 of 6
Discovering latent patterns with hierarchical Bayesian mixed-membership models and the issue of model choice
In Data Mining Patterns: New Methods and Applications (P. Poncelet, F. Masseglia and M. Teisseire, eds.) 240–275. Idea Group Inc., 2006
Abstract

Cited by 8 (4 self)
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While having related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva [15] and Erosheva, Fienberg, and Lafferty [19] and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups and associated patterns for clustering units. Model choice is an issue within specifications, and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of two data sets: (1) a corpus of scientific publications from the Proceedings of the National Academy of Sciences (PNAS) examined earlier by Erosheva, Fienberg, and Lafferty [19] and Griffiths and Steyvers [22]; (2) data on functionally disabled American seniors from the National
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
2006
Abstract

Cited by 3 (3 self)
Data mining has experienced an explosive growth of models with latent structure for clustering and classification. While having related objectives, these models use different parameterizations and often very different specifications and constraints. Model choice is thus a big methodological issue and a crucial practical one for applications. In this paper, we work from a general formulation of hierarchical Bayesian mixed-membership models in Erosheva, Fienberg, and Lafferty (2004) and present several model specifications and variations, both parametric and nonparametric, in the context of learning the number of latent groups, and associated patterns, for clustering units. Model choice is an issue within specifications, and becomes a component of the larger issue of model comparison. We elucidate strategies for comparing models and specifications by producing novel analyses of the following two data sets: (1) a corpus of scientific publications from the
Polytope samplers for inference in ill-posed inverse problems
Abstract

Cited by 2 (2 self)
We consider linear ill-posed inverse problems y = Ax, in which we want to infer many count parameters x from few count observations y, where the matrix A is binary and has some unimodularity property. Such problems are typical in applications such as contingency table analysis and network tomography (on which we present testing results). These properties of A have a geometrical implication for the solution space: it is a convex integer polytope. We develop a novel approach to characterize this polytope in terms of its vertices, by taking advantage of the geometrical intuitions behind the Hermite normal form decomposition of the matrix A, and of a newly defined pivoting operation to travel across vertices. Next, we use this characterization to develop three (exact) polytope samplers for x with emphasis on uniform distributions. We showcase one of these samplers on simulated and real data.
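To make the setting in this abstract concrete, the following is a minimal sketch of an ill-posed count problem y = Ax with a binary A. It is not the paper's polytope sampler: it simply brute-forces the integer points of the solution set for a tiny, hypothetical 2x2 contingency table observed only through its margins. The matrix `A`, the observations `y`, and the helper names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy problem: a 2x2 contingency table with cells
# x = (x11, x12, x21, x22), observed only through its four margins.
A = [
    [1, 1, 0, 0],  # row 1 total
    [0, 0, 1, 1],  # row 2 total
    [1, 0, 1, 0],  # column 1 total
    [0, 1, 0, 1],  # column 2 total
]
y = [3, 2, 4, 1]


def matvec(matrix, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def integer_solutions(matrix, obs):
    """Brute-force every non-negative integer x with matrix @ x == obs.

    Every column of this A contains a 1, so each x_j is at most max(obs);
    that bound keeps the search finite. Feasible only for tiny problems.
    """
    bound = max(obs)
    n = len(matrix[0])
    return [
        x
        for x in product(range(bound + 1), repeat=n)
        if matvec(matrix, x) == list(obs)
    ]


solutions = integer_solutions(A, y)
for x in solutions:
    print(x)
```

Four observations leave more than one table consistent with the data, which is the sense in which the problem is ill-posed. Exhaustive search like this grows exponentially with the number of cells, which is one reason a structured characterization of the solution set is of interest.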
Deconvolution of mixing time series on a graph
Abstract
In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series.
Recovering Latent Time-Series from their Observed Sums: Network Tomography with Particle Filters
Abstract
Hidden variables, evolving over time, appear in multiple settings, where it is valuable to recover them, typically from observed sums. Our driving application is "network tomography", where we need to estimate the origin-destination (OD) traffic flows to determine, e.g., who is communicating with whom in a local area network. This information allows network engineers and managers to solve problems in design, routing, configuration debugging, monitoring and pricing. Unfortunately, the direct measurement of the OD traffic is usually difficult, or even impossible; instead, we can easily measure the loads on every link, that is, sums of the desired OD flows. In this paper we propose iFILTER, a method to solve this problem, which improves on the state of the art by (a) introducing explicit time dependence, and by (b) using realistic, non-Gaussian marginals in the statistical models for the traffic flows, as never attempted before. We give experiments on real data, where iFILTER scales linearly with new observations and outperforms the best existing solutions in a wide variety of settings. Specifically, on real network traffic measured at CMU and at AT&T, iFILTER reduced the estimation errors by between 15% and 46% in all cases.
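The observation model described in this abstract, link loads as sums of OD flows, can be illustrated with a small sketch. This is not iFILTER and involves no particle filtering; it only shows, on a hypothetical three-node path network, why link loads alone do not identify the OD flows. The topology, the `ROUTES` table, and the flow volumes are invented for illustration.

```python
# Hypothetical path network a - b - c with two links, "ab" and "bc".
# Each origin-destination (OD) pair maps to the links its traffic crosses.
ROUTES = {
    ("a", "b"): ["ab"],
    ("b", "c"): ["bc"],
    ("a", "c"): ["ab", "bc"],
}


def link_loads(od_flows):
    """Sum the OD volumes that traverse each link."""
    loads = {"ab": 0, "bc": 0}
    for od_pair, volume in od_flows.items():
        for link in ROUTES[od_pair]:
            loads[link] += volume
    return loads


# Two different (made-up) sets of OD flows...
flows_1 = {("a", "b"): 5, ("b", "c"): 3, ("a", "c"): 2}
flows_2 = {("a", "b"): 6, ("b", "c"): 4, ("a", "c"): 1}

# ...that produce identical measurements on every link.
print(link_loads(flows_1))
print(link_loads(flows_2))
```

With three unknown flows and only two measured links, distinct traffic patterns are indistinguishable from a single snapshot, so any estimator has to bring in extra structure such as a statistical model of the flows.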