Results 1 - 10 of 24
Multi-Manifold Semi-Supervised Learning
Abstract - Cited by 147 (9 self)
We study semi-supervised learning when the data consists of multiple intersecting manifolds. We give a finite sample analysis to quantify the potential gain of using unlabeled data in this multi-manifold setting. We then propose a semi-supervised learning algorithm that separates different manifolds into decision sets, and performs supervised learning within each set. Our algorithm involves a novel application of Hellinger distance and size-constrained spectral clustering. Experiments demonstrate the benefit of our multi-manifold semi-supervised learning approach.
Foundations of a Multi-way Spectral Clustering Framework for Hybrid Linear Modeling
, 2009
Abstract - Cited by 37 (10 self)
The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem; however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem and provides careful analysis to justify it. The TSCC algorithm is practically a combination of Govindu’s multi-way spectral clustering framework (CVPR 2005) and Ng et al.’s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments the different underlying clusters well. The goodness of clustering depends on the within-cluster errors, the between-clusters interaction, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001).
Keywords: Hybrid linear modeling · d-flats clustering · Multi-way clustering · Spectral clustering · Polar curvature · Perturbation analysis · Concentration inequalities
Communicated by Albert Cohen. This work was supported by NSF grant #0612608.
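The Ng et al. (NIPS 2001) step that TSCC builds on can be sketched on a toy example. This is a minimal illustration, not the paper's algorithm: the Gaussian affinity and the two synthetic blobs below are stand-ins for TSCC's polar-curvature affinities, and with two clusters the sign pattern of the second-largest eigenvector of the normalized affinity already yields the segmentation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy clusters (stand-ins for samples near two subspaces).
a = rng.normal((0.0, 0.0), 0.1, size=(30, 2))
b = rng.normal((5.0, 5.0), 0.1, size=(30, 2))
X = np.vstack([a, b])

# Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal.
sigma = 1.3
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)

# Symmetric normalization D^(-1/2) W D^(-1/2), as in Ng et al.
deg = W.sum(axis=1)
L = W / np.sqrt(np.outer(deg, deg))

# For a near block-diagonal affinity, the second-largest eigenvector is
# roughly constant on each block with opposite signs: spectral bisection.
vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
labels = (vecs[:, -2] > 0).astype(int)
```

For k > 2 clusters, Ng et al. instead take the top k eigenvectors, normalize the rows, and run k-means on them; the two-cluster sign trick above is the simplest special case.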
Sparse Manifold Clustering and Embedding
Abstract - Cited by 31 (1 self)
We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other and manifolds with non-uniform sampling and holes, and can estimate the intrinsic dimensions of the manifolds.
Mathematical Methods for Diffusion MRI Processing
, 2008
Abstract - Cited by 14 (2 self)
In this article, we review recent mathematical models and computational methods for the processing of diffusion Magnetic Resonance Images, including state-of-the-art reconstruction of diffusion models, cerebral white matter connectivity analysis, and segmentation techniques. We focus on Diffusion Tensor Images (DTI) and Q-Ball Images (QBI).
Estimation of intrinsic dimensionality of samples from noisy low-dimensional manifolds in high dimensions with multiscale SVD
, 2009
Abstract - Cited by 13 (4 self)
The problem of estimating the intrinsic dimensionality of certain point clouds is of interest in many applications in statistics and analysis of high-dimensional data sets. Our setting is the following: the points are sampled from a manifold M of dimension k, embedded in R^D, with k ≪ D, and corrupted by D-dimensional noise. When M is a linear manifold (hyperplane), one may analyse this situation by SVD, hoping the noise only mildly perturbs the rank-k covariance matrix. When M is a nonlinear manifold, SVD performed globally may dramatically overestimate the intrinsic dimensionality. We discuss a multiscale version of SVD that is useful in estimating the intrinsic dimensionality of nonlinear manifolds.
Index Terms — Multiscale analysis, intrinsic dimensionality, high dimensional data, manifolds, point clouds, sample
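The multiscale idea can be sketched in a few lines. A hypothetical toy setup, not the authors' estimator: noisy samples from a circle (intrinsic dimension 1) in R^3, with the local dimension read off from how many singular values of the data in a ball of radius r are a non-negligible fraction of the largest one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Noisy samples from a 1-D manifold (the unit circle) embedded in R^3.
n = 2000
t = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.c_[np.cos(t), np.sin(t), np.zeros(n)] + 0.01 * rng.normal(size=(n, 3))

def local_dim(X, center, r, thresh=0.2):
    """Estimate dimension from the SVD of the points in the ball B(center, r):
    count singular values above a fixed fraction of the largest one."""
    nb = X[np.linalg.norm(X - center, axis=1) < r]
    s = np.linalg.svd(nb - nb.mean(0), compute_uv=False)
    return int((s > thresh * s[0]).sum())

center = X[0]
# Tiny radius: noise dominates and inflates the estimate; moderate radius:
# the correct dimension 1; global radius: curvature makes the circle look 2-D.
dims = {r: local_dim(X, center, r) for r in (0.03, 0.3, 2.5)}
print(dims)
```

The informative range of scales is the middle regime: r large enough that the tangential singular value dominates the noise, but small enough that curvature has not yet inflated the count, which is exactly the global-SVD overestimation the abstract mentions.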
Kernelized spectral curvature clustering (KSCC)
- In ICCV Workshop on Dynamical Vision
, 2009
Abstract - Cited by 9 (4 self)
Multi-manifold modeling is increasingly used in segmentation and data representation tasks in computer vision and related fields. While the general problem, modeling data by mixtures of manifolds, is very challenging, several approaches exist for modeling data by mixtures of affine subspaces (which is often referred to as hybrid linear modeling). We translate some important instances of multi-manifold modeling to hybrid linear modeling in embedded spaces, without explicitly performing the embedding but applying the kernel trick. The resulting algorithm, Kernel Spectral Curvature Clustering, uses kernels at two levels: both as an implicit embedding method to linearize non-flat manifolds and as a principled method to convert a multi-way affinity problem into a spectral clustering one. We demonstrate the effectiveness of the method by comparing it with other state-of-the-art methods on both synthetic data and a real-world problem of segmenting multiple motions from two perspective camera views.
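The "kernel as implicit embedding" idea can be illustrated with a toy quadratic kernel (an illustrative stand-in; the paper's actual kernels and curvature affinities differ). Under the monomial embedding phi(x) = (x1^2, x2^2, sqrt(2) x1 x2), a circle x1^2 + x2^2 = r^2 becomes the affine plane u + v = r^2, so a non-flat manifold is linearized and hybrid linear modeling applies; the polynomial kernel k(x, y) = (x . y)^2 returns the embedded inner products without ever forming phi.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two concentric circles: non-flat manifolds that defeat affine-subspace models.
t = rng.uniform(0.0, 2.0 * np.pi, (2, 100))
X1 = 1.0 * np.c_[np.cos(t[0]), np.sin(t[0])]
X2 = 2.0 * np.c_[np.cos(t[1]), np.sin(t[1])]

def phi(X):
    """Explicit quadratic-monomial embedding phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)."""
    return np.c_[X[:, 0] ** 2, X[:, 1] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1]]

# Each circle x1^2 + x2^2 = r^2 lands on the affine plane u + v = r^2: a flat.
for X, r in ((X1, 1.0), (X2, 2.0)):
    U = phi(X)
    assert np.allclose(U[:, 0] + U[:, 1], r ** 2)

# The kernel trick: (x . y)^2 equals phi(x) . phi(y), with no explicit embedding.
K = (X1 @ X1.T) ** 2
assert np.allclose(K, phi(X1) @ phi(X1).T)
```

Working only with K is what makes the approach scale: the affinity computations of the spectral clustering stage never need the embedded coordinates themselves.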
Approximation of Points on low-dimensional manifolds via random linear projections. arXiv:1204.3337
Data Skeletonization via Reeb Graphs
Abstract - Cited by 8 (0 self)
Recovering hidden structure from complex and noisy non-linear data is one of the most fundamental problems in machine learning and statistical inference. While such data is often high-dimensional, it is of interest to approximate it with a low-dimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional. Furthermore, there are many scenarios where the underlying structure is graph-like, e.g., river/road networks or various trajectories. In this paper, we develop a framework to extract, as well as to simplify, a one-dimensional “skeleton” from unorganized data using the Reeb graph. Our algorithm is very simple, does not require complex optimizations, and can be easily applied to unorganized high-dimensional data such as point clouds or proximity graphs. It can also represent arbitrary graph structures in the data. We also give theoretical results to justify our method. We provide a number of experiments to demonstrate the effectiveness and generality of our algorithm, including comparisons to existing methods, such as principal curves. We believe that the simplicity and practicality of our algorithm will help to promote skeleton graphs as a data analysis tool for a broad range of applications.
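The Reeb-graph construction can be sketched on a toy proximity graph. This is a hypothetical minimal version, not the authors' algorithm: bin the vertices by a scalar function, collapse each within-bin connected component to a skeleton node, and connect components that are joined by a graph edge. On a ring of 8 points with the x-coordinate as the function, the loop survives in the skeleton.

```python
def reeb_skeleton(points, edges, f, n_bins):
    """Toy Reeb-graph skeleton of a proximity graph.
    points: vertex ids; edges: set of (u, v) pairs; f: dict id -> scalar.
    Vertices are binned by f-value; connected components inside each bin
    become skeleton nodes; components joined by a graph edge become
    skeleton edges."""
    lo, hi = min(f.values()), max(f.values())
    width = (hi - lo) / n_bins or 1.0
    bin_of = {p: min(int((f[p] - lo) / width), n_bins - 1) for p in points}

    # Union-find over vertices, merging only edges that stay inside one bin.
    parent = {p: p for p in points}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        if bin_of[u] == bin_of[v]:
            parent[find(u)] = find(v)

    comp_of = {p: find(p) for p in points}
    nodes = set(comp_of.values())
    skel_edges = {tuple(sorted((comp_of[u], comp_of[v])))
                  for u, v in edges if comp_of[u] != comp_of[v]}
    return nodes, skel_edges

# Ring of 8 points; f is the x-coordinate of each point (values precomputed).
pts = list(range(8))
xs = [1.0, 0.7, 0.0, -0.7, -1.0, -0.7, 0.0, 0.7]
edges = {(k, (k + 1) % 8) for k in pts}        # proximity graph: the ring
f = {k: xs[k] for k in pts}
nodes, sedges = reeb_skeleton(pts, edges, f, n_bins=4)
print(len(nodes), len(sedges))                 # the circle's loop is preserved
```

The skeleton here is a 4-cycle: one node each at the left and right extremes of f, and two nodes (upper and lower arcs) in between, which is exactly the Reeb graph of a circle under a height function.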
Multiscale geometric methods for data sets I: Multiscale covariances, noise and curvature, submitted
, 2012
Abstract - Cited by 7 (1 self)
Large data sets are often modeled as being noisy samples from probability distributions µ in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k ≪ D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from µ are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.
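The scaling behaviour of these multiscale covariances can be checked numerically in a toy case (an illustrative sketch under simple assumptions, not the paper's analysis): for samples from a segment in R^2 with Gaussian noise of size sigma, the covariance of the points in B(0, r) has one eigenvalue growing like r^2/3 (the variance of a uniform sample on [-r, r]) while the other stays near sigma^2, so the eigenvalue gap reveals both the intrinsic dimension and the informative range of scales.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.01
n = 5000
# Noisy samples from a 1-D manifold (a segment on the x-axis) in R^2.
x = rng.uniform(-1.0, 1.0, n)
X = np.c_[x, np.zeros(n)] + sigma * rng.normal(size=(n, 2))

def cov_eigs(X, r):
    """Eigenvalues (ascending) of the covariance of the points in B(0, r)."""
    nb = X[np.linalg.norm(X, axis=1) < r]
    return np.linalg.eigvalsh(np.cov(nb.T))

# Tangential eigenvalue tracks r^2/3; the noise eigenvalue stays near sigma^2.
for r in (0.1, 0.3):
    lam_noise, lam_tan = cov_eigs(X, r)
    print(r, lam_tan, r ** 2 / 3, lam_noise, sigma ** 2)
```

The informative scales are those where r^2/3 clearly exceeds sigma^2; below that the noise floor swamps the tangential direction, matching the abstract's claim that the empirical covariances are reliable precisely in the most informative range.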
ON THE NON-UNIFORM COMPLEXITY OF BRAIN CONNECTIVITY
, 2007
Abstract - Cited by 5 (3 self)
A stratification and manifold learning approach for analyzing High Angular Resolution Diffusion Imaging (HARDI) data is introduced in this paper. HARDI data provides high-dimensional signals measuring the complex microstructure of biological tissues, such as the cerebral white matter. We show that these high-dimensional spaces may be understood as unions of manifolds of varying dimensions/complexity and densities. With such analysis, we use clustering to characterize the structural complexity of the white matter. We briefly present the underlying framework and numerical experiments illustrating this original and promising approach.
Key words: Stratification and manifold learning, DTI, HARDI, complexity, white matter connectivity.