Results 1–10 of 57
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Abstract

Cited by 87 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
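The “black box” use of the distance function can be sketched with a vantage-point tree, a related metric-tree structure; this is an illustrative stand-in, not the paper’s Voronoi-region construction, and the `dist` callable and point representation are assumptions:

```python
import math

def build_vptree(points, dist):
    """Recursively split points by distance to a vantage point (median radius)."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return (vp, 0.0, None, None)
    dists = [dist(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]       # median splitting radius
    inner = [p for p, d in zip(rest, dists) if d <= mu]
    outer = [p for p, d in zip(rest, dists) if d > mu]
    return (vp, mu, build_vptree(inner, dist), build_vptree(outer, dist))

def nn_search(node, q, dist, best=None):
    """Nearest site to q; prunes subtrees with the triangle inequality."""
    if node is None:
        return best
    vp, mu, inner, outer = node
    d = dist(vp, q)
    if best is None or d < best[1]:
        best = (vp, d)
    near, far = (inner, outer) if d <= mu else (outer, inner)
    best = nn_search(near, q, dist, best)
    if abs(d - mu) < best[1]:                 # far side can only help if the
        best = nn_search(far, q, dist, best)  # ball around q crosses the split
    return best

# any metric works as the black box, e.g. Euclidean distance:
euclid = lambda a, b: math.dist(a, b)
```

Because the structure only ever calls `dist`, the same code works unchanged for Hamming or edit distance on strings.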
Geodesic entropic graphs for dimension and entropy estimation in manifold learning
 IEEE Trans. on Signal Processing
, 2004
Abstract

Cited by 67 (4 self)
In the manifold learning problem, one seeks to discover a smooth low-dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold’s intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. We introduce a novel geometric approach based on entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesic minimal spanning tree (GMST) to obtain asymptotically consistent estimates of the manifold dimension and the Rényi entropy of the sample density on the manifold. The GMST approach is striking in its simplicity and does not require reconstruction of the manifold or estimation of the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree (MST) sequence using a geodesic edge matrix and uses the overall lengths of the MSTs to simultaneously estimate manifold dimension and entropy. We illustrate the GMST approach on standard synthetic manifolds as well as on real data sets consisting of images of faces. Index Terms—Conformal embedding, intrinsic dimension, intrinsic entropy, manifold learning, minimal spanning tree, nonlinear dimensionality reduction.
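The length-to-dimension step can be sketched in a few lines: MST length grows like L(n) ≈ c·n^((d-1)/d), so the log-log growth rate of L against n determines d. The sketch below uses a plain Euclidean MST rather than the paper’s geodesic edge matrix, and the two-sample-size scheme and function names are illustrative, not the paper’s procedure:

```python
import math

def mst_length(points):
    """Total length of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def mst_dimension(points, n1=None):
    """Estimate d from L(n) ~ c * n^((d-1)/d): the slope of log L vs log n
    is (d-1)/d, so d = 1 / (1 - slope)."""
    n2 = len(points)
    n1 = n1 or n2 // 4
    l1, l2 = mst_length(points[:n1]), mst_length(points)
    slope = (math.log(l2) - math.log(l1)) / (math.log(n2) - math.log(n1))
    return 1.0 / (1.0 - slope)
```

Replacing `math.dist` with geodesic graph distances is what turns this into the GMST flavour.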
Convergence of Laplacian eigenmaps
 In NIPS
, 2006
Abstract

Cited by 26 (2 self)
Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the Laplace-Beltrami operator on the underlying manifold, thus establishing the first convergence results for a spectral dimensionality reduction algorithm in the manifold setting.
Data Dimensionality Estimation Methods: A Survey
 Pattern Recognition
, 2003
Abstract

Cited by 21 (1 self)
In this paper, data dimensionality estimation methods are reviewed. The estimation of the dimensionality of a data set is a classical problem of pattern recognition. There are some good reviews [1] in the literature, but they do not include more recent developments based on fractal techniques and neural auto-associators. The aim of this paper is to provide an up-to-date survey of the dimensionality estimation methods of a data set, paying special attention to the fractal-based methods.
Translated Poisson mixture model for stratification learning
 Int. J. Comput. Vision
, 2000
Abstract

Cited by 16 (2 self)
A framework for the regularized and robust estimation of non-uniform dimensionality and density in high-dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixtures of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high-dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that ...
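The ball-counting idea behind this family of methods can be sketched directly: if points near x follow a Poisson process on a d-dimensional piece, the expected count in a ball of radius r grows like ρ·r^d, so counts at two radii give a crude local estimate. The paper fits a full regularized Translated Poisson mixture instead; the function name and radii below are illustrative:

```python
import math

def ball_count_dimension(points, x, r1, r2):
    """Local dimension at x from counts in two concentric balls:
    E[N(r)] ~ rho * r^d  =>  d ~ log(N(r2)/N(r1)) / log(r2/r1)."""
    n1 = sum(1 for p in points if math.dist(p, x) <= r1)
    n2 = sum(1 for p in points if math.dist(p, x) <= r2)
    if n1 == 0 or n2 <= n1:
        return float("nan")   # too few samples locally to estimate
    return math.log(n2 / n1) / math.log(r2 / r1)
```

Unlike this moment-matched version, the likelihood formulation also yields a density estimate and absorbs observation noise through the translation term.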
Manifold-adaptive dimension estimation
 In ICML ’07: Proceedings of the 24th international conference on Machine learning
, 2007
Abstract

Cited by 14 (0 self)
Intuitively, learning should be easier when the data points lie on a low-dimensional submanifold of the input space. Recently there has been a growing interest in algorithms that aim to exploit such geometrical properties of the data. Oftentimes these algorithms require estimating the dimension of the manifold first. In this paper we propose an algorithm for dimension estimation and study its finite-sample behaviour. The algorithm estimates the dimension locally around the data points using nearest neighbor techniques and then combines these local estimates. We show that the rate of convergence of the resulting estimate is independent of the dimension of the input space and hence the algorithm is “manifold-adaptive”. Thus, when the manifold supporting the data is low-dimensional, the algorithm can be exponentially more efficient than its counterparts that do not exploit this property. Our computer experiments confirm the obtained theoretical results.
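A local nearest-neighbor estimator of this kind can be sketched as a distance-ratio rule: since the k-NN radius scales like (k/n)^(1/d), halving k divides the radius by about 2^(1/d), giving d̂(x) = ln 2 / ln(r_k(x)/r_{k/2}(x)). The sketch below combines the local estimates with a median; it is a simplified illustration, not the paper’s exact estimator or combination scheme:

```python
import math

def knn_dimension(points, k=10):
    """Median of per-point estimates d_hat(x) = ln 2 / ln(r_k / r_{k/2}),
    where r_j is the distance from x to its j-th nearest neighbor."""
    ests = []
    for i, x in enumerate(points):
        ds = sorted(math.dist(x, p) for j, p in enumerate(points) if j != i)
        rk, rk2 = ds[k - 1], ds[k // 2 - 1]
        if rk2 > 0 and rk > rk2:
            ests.append(math.log(2) / math.log(rk / rk2))
    ests.sort()
    return ests[len(ests) // 2]
```

The estimate depends on the data only through interpoint distances, which is what lets convergence rates depend on the manifold dimension rather than the ambient dimension.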
Learning Nonlinear Image Manifolds by Global Alignment of Local Linear Models
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2006
Exploratory analysis and visualization of speech and music by locally linear embedding
, 2004
Stratification learning: Detecting mixed density and dimensionality in high dimensional point clouds
 In Advances in NIPS 19
, 2006
Abstract

Cited by 10 (2 self)
The study of point cloud data sampled from a stratification, a collection of manifolds with possibly different dimensions, is pursued in this paper. We present a technique for simultaneous soft clustering and estimation of the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning.
A.O.: Estimating local intrinsic dimension with k-nearest neighbor graphs
 In: IEEE Workshop on Statistical Signal Processing (SSP)
, 2005
Abstract

Cited by 9 (0 self)
Many high-dimensional data sets of practical interest exhibit a varying complexity in different parts of the data space. This is the case, for example, of databases of images containing many samples of a few textures of different complexity. Such phenomena can be modeled by assuming that the data lies on a collection of manifolds with different intrinsic dimensionalities. In this extended abstract, we introduce a method to estimate the local dimensionality associated with each point in a data set, without any prior information about the manifolds, their quantity and their sampling distributions. The proposed method uses a global dimensionality estimator based on k-nearest neighbor (kNN) graphs, together with an algorithm for computing neighborhoods in the data with similar topological properties.
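A per-point version of the kNN distance-ratio idea makes the “no prior information” aspect concrete: every point gets its own estimate, and a mixed cloud can then be grouped by similar local dimensionality. This is a simplified sketch; the paper additionally matches neighborhoods by topological similarity, which is not reproduced here, and the function name is illustrative:

```python
import math

def local_dims(points, k=20):
    """Per-point dimension estimates d_hat(x) = ln 2 / ln(r_k / r_{k/2}),
    where r_j is the distance from x to its j-th nearest neighbor."""
    out = []
    for i, x in enumerate(points):
        ds = sorted(math.dist(x, p) for j, p in enumerate(points) if j != i)
        rk, rk2 = ds[k - 1], ds[k // 2 - 1]
        out.append(math.log(2) / math.log(rk / rk2) if rk > rk2 > 0
                   else float("nan"))
    return out
```

On a cloud mixing, say, a segment and a planar patch, thresholding the local estimates separates the two strata.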