Results 1–10 of 20
Foundations of a Multiway Spectral Clustering Framework for Hybrid Linear Modeling
, 2009
Cited by 37 (10 self)
Abstract: The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem; however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem and provides careful analysis to justify it. The TSCC algorithm is essentially a combination of Govindu's multiway spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments the underlying clusters well. The goodness of clustering depends on the within-cluster errors, the between-cluster interactions, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001). Keywords: Hybrid linear modeling · d-flats clustering · Multiway clustering · Spectral clustering · Polar curvature · Perturbation analysis · Concentration inequalities. Communicated by Albert Cohen. This work was supported by NSF grant #0612608.
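The Ng et al. spectral step that TSCC builds on can be sketched as follows. This is a minimal illustration, assuming a precomputed symmetric affinity matrix W; it is not the full TSCC algorithm, whose affinities come from polar curvature of point tuples.

```python
import numpy as np

def spectral_embed(W, K):
    """Ng-Jordan-Weiss style spectral embedding of an affinity matrix.

    W: (n, n) symmetric nonnegative affinity matrix.
    K: number of clusters.
    Returns the row-normalized matrix of the top-K eigenvectors of the
    symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    U = vecs[:, -K:]                     # top-K eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)  # project each row to the unit sphere
```

Running k-means on the rows of the returned matrix then yields the segmentation; points from well-separated clusters map to nearly orthogonal directions on the sphere.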
Translated Poisson mixture model for stratification learning
 Int. J. Comput. Vision
, 2000
Cited by 24 (2 self)
A framework for the regularized and robust estimation of non-uniform dimensionality and density in high-dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixtures of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high-dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful for modeling a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that …
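The counting idea behind this line of work can be illustrated in a much-simplified form (this is not the Translated Poisson model itself): under a locally uniform Poisson-like model, the expected number of points in a ball of radius r around a point on a d-dimensional stratum scales as r^d, so the slope of log N(r) against log r estimates the local dimension.

```python
import numpy as np

def local_dimension(points, center, radii):
    """Estimate local intrinsic dimension at `center` from ball counts.

    Under a locally uniform (Poisson-like) model E[N(r)] ~ c * r^d,
    so d is recovered as the slope of log N(r) versus log r.
    """
    dists = np.linalg.norm(points - center, axis=1)
    counts = np.array([(dists <= r).sum() for r in radii], dtype=float)
    mask = counts > 0                     # avoid log(0) at tiny radii
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
    return slope
```

On a sample from a 2-D region the estimate concentrates near 2; the paper's maximum-likelihood treatment additionally handles noise and mixtures of strata of different dimensions.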
Spectral clustering based on local linear approximations
 ELECTRONIC JOURNAL OF STATISTICS
, 2011
Cited by 18 (5 self)
Abstract: In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface; the sample may be contaminated with outliers, which are modeled as points sampled in space away from the clusters. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. We obtain theoretical guarantees for this algorithm and show that, in terms of both separation and robustness to outliers, it outperforms the standard spectral clustering algorithm (based on pairwise distances) of Ng, Jordan and Weiss (NIPS '01). The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our theoretical purposes. We also discuss the cases of clusters of mixed dimensions and of clusters generated from smoother surfaces. In our experiments, this algorithm is shown to outperform pairwise spectral clustering on both simulated and real data.
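The local-linear-approximation residual can be sketched as the energy left over after the best affine fit to a group of nearby points; a small residual indicates the group lies near a common d-flat and should receive high affinity. This sketch shows only the residual computation, not the paper's actual affinity construction or scaling.

```python
import numpy as np

def flatness_residual(X, d):
    """Root-mean-square residual of the best d-dimensional affine fit to X.

    X: (m, D) array of m points in D dimensions.
    The singular values of the centered data beyond the first d capture
    exactly the off-subspace energy of the optimal affine approximation.
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return float(np.sqrt((s[d:] ** 2).sum() / len(X)))
```

Collinear points give a residual of essentially zero for d = 1, while a point pulled off the line raises it, which is the signal a higher-order spectral method exploits.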
Inferring local homology from sampled stratified spaces
 In Proc. 48th …
, 2007
Cited by 17 (8 self)
We study the reconstruction of a stratified space from a possibly noisy point sample. Specifically, we use the vineyard of the distance function restricted to a 1-parameter family of neighborhoods of a point to assess the local homology of the stratified space at that point. We prove the correctness of this assessment under the assumption of a sufficiently dense sample. We also give an algorithm that constructs the vineyard and makes the local assessment in time at most cubic in the size of the Delaunay triangulation of the point sample.
Local homology transfer and stratification learning
 In ACM-SIAM Sympos. Discrete Alg.
, 2012
Cited by 13 (5 self)
The objective of this paper is to show that point cloud data can, under certain circumstances, be clustered by strata in a plausible way. For our purposes, we consider a stratified space to be a collection of manifolds of different dimensions which are glued together in a locally trivial manner inside some Euclidean space. To adapt this abstract definition to the world of noise, we first define a multiscale notion of stratified spaces, providing a stratification at different scales which are indexed by a radius parameter. We then use methods derived from kernel and cokernel persistent homology to cluster the data points into different strata. We prove a correctness guarantee for this clustering method under certain topological conditions. We then provide a probabilistic guarantee for the clustering in the point sample setting: we provide bounds on the minimum number of sample points required to state with …
Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
, 909
Cited by 6 (2 self)
Abstract. In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
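The first of the three methods above, extraction of connected components in a neighborhood graph, admits a very short sketch. This is an illustrative implementation, assuming a fixed connectivity radius eps rather than the locally scaled choice the paper analyzes.

```python
import numpy as np

def epsilon_graph_clusters(X, eps):
    """Cluster by connected components of the eps-neighborhood graph.

    Two points are joined by an edge when their Euclidean distance is at
    most eps; clusters are the connected components, found via union-find.
    Returns a list of integer labels, one per point.
    """
    n = len(X)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    for i in range(n):
        for j in range(i + 1, n):
            if D[i, j] <= eps:
                parent[find(i)] = find(j)  # union the two components
    roots = [find(i) for i in range(n)]
    labels = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [labels[r] for r in roots]
```

With eps below the between-cluster separation but above the within-cluster nearest-neighbor distances, each cluster becomes exactly one component, which is the regime the asymptotic guarantees describe.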
Recent Advances in Nonlinear Dimensionality Reduction, Manifold and Topological Learning
Cited by 4 (1 self)
Abstract. The ever-growing amount of data stored in digital databases raises the question of how to organize and extract useful knowledge. This paper outlines some current developments in the domains of dimensionality reduction, manifold learning, and topological learning. Several aspects are dealt with, ranging from novel algorithmic approaches to their real-world applications. The issue of quality assessment is also considered, and progress in quantitative as well as visual criteria is reported.
Regularized mixed dimensionality and density learning in computer vision
 In Proceedings of 1st Workshop on Component Analysis Methods for Classification, Clustering, Modeling and Estimation Problems in Computer Vision, in conjunction with CVPR
, 2007
Cited by 2 (1 self)
A framework for the regularized estimation of non-uniform dimensionality and density in high-dimensional data is introduced in this work. This leads to learning stratifications, that is, mixtures of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high-dimensional sample points as a process of Poisson mixtures, with regularizing restrictions and spatial continuity constraints. Theoretical asymptotic results for the model are presented as well. The presentation of the framework is complemented with artificial and real examples showing the importance of regularized stratification learning in computer vision applications.
L.H.: Distance between subspaces of different dimensions. ArXiv e-prints
, 2014
Towards a stratified learning approach to predict future citation counts
 In Digital Libraries
, 2014
Cited by 1 (1 self)
In this paper, we study the problem of predicting the future citation count of a scientific article after a given time interval from its publication. To this end, we gather and conduct an exhaustive analysis on a dataset of more than 1.5 million scientific papers from the computer science domain. On analysis of the dataset, we notice that the citation counts of the articles over the years follow a diverse set of patterns; on closer inspection we identify six broad categories of citation patterns. This important observation motivates us to adopt a stratified learning approach in the prediction task, whereby we propose a two-stage prediction model – in the first stage, the model maps a query paper into one of the six categories, and then in the second stage a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that the categorization of this huge dataset during the training phase leads to a remarkable improvement (around 50%) in comparison to the well-known baseline system.
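The two-stage classify-then-regress idea can be sketched with synthetic data. Everything below is illustrative: nearest-centroid classification and per-category least squares stand in for the paper's actual category model and regression module, and the class name is hypothetical.

```python
import numpy as np

class StratifiedCitationPredictor:
    """Two-stage sketch: map an early citation trajectory to a category,
    then apply that category's own linear regressor to predict the target.

    Stage 1 here is nearest-centroid over trajectories; stage 2 is
    per-category least squares with an intercept term.
    """

    def fit(self, trajectories, categories, targets):
        X = np.asarray(trajectories, dtype=float)
        y = np.asarray(targets, dtype=float)
        c = np.asarray(categories)
        self.centroids = {}
        self.coefs = {}
        for cat in np.unique(c):
            m = c == cat
            self.centroids[cat] = X[m].mean(axis=0)
            A = np.hstack([X[m], np.ones((m.sum(), 1))])  # intercept column
            self.coefs[cat], *_ = np.linalg.lstsq(A, y[m], rcond=None)
        return self

    def predict(self, trajectory):
        x = np.asarray(trajectory, dtype=float)
        # Stage 1: assign the query to the nearest category centroid.
        cat = min(self.centroids,
                  key=lambda c: np.linalg.norm(x - self.centroids[c]))
        # Stage 2: evaluate that category's regressor only.
        return float(np.append(x, 1.0) @ self.coefs[cat])
```

Training separate regressors per stratum is what yields the gain the paper reports: a single global regressor must average over citation patterns that behave very differently.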