Intrinsic dimension estimation using packing numbers (2002)

by B Kegl

Results 1 - 10 of 26

Nearest-neighbor searching and metric space dimensions

by Kenneth L. Clarkson - In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006
Cited by 63 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
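To make the "black box" distance idea concrete, here is a minimal sketch of metric-space search that touches the data only through a distance function. It is not the data structure from the paper: it is a much simpler single-pivot scheme that prunes candidates with the triangle inequality. The names `PivotIndex`, `hamming`, and `nearest` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Single-pivot index (illustrative only, not the paper's structure).

    The distance function is used as a black box. It must satisfy the
    triangle inequality for the pruning rule below to be valid.
    """

    def __init__(self, sites: Sequence[str], dist: Callable[[str, str], float]):
        if not sites:
            raise ValueError("need at least one site")
        self.dist = dist
        self.pivot = sites[0]  # assumption: the first site serves as the pivot
        # Precompute the distance from the pivot to every site.
        self.entries: List[Tuple[float, str]] = [
            (dist(self.pivot, s), s) for s in sites
        ]

    def nearest(self, q: str) -> Tuple[str, float]:
        dq = self.dist(q, self.pivot)
        best, best_d = self.pivot, float("inf")
        # |d(pivot, s) - d(q, pivot)| is a lower bound on d(q, s),
        # so sites are visited in order of increasing lower bound.
        for d_ps, s in sorted(self.entries, key=lambda e: abs(e[0] - dq)):
            if abs(d_ps - dq) >= best_d:
                break  # no remaining site can be strictly closer
            d = self.dist(q, s)
            if d < best_d:
                best, best_d = s, d
        return best, best_d
```

The saving here is in the number of distance evaluations, which is the relevant cost when the distance (for example edit distance) is expensive. The per-query sort is kept only for brevity.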

Geodesic entropic graphs for dimension and entropy estimation in manifold learning

by Jose A. Costa, Alfred O. Hero - IEEE Trans. on Signal Processing, 2004
Cited by 52 (4 self)
In the manifold learning problem, one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper, we consider the closely related problem of estimating the manifold’s intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. We introduce a novel geometric approach based on entropic graph methods. Although the theory presented applies to this general class of graphs, we focus on the geodesic-minimal-spanning-tree (GMST) to obtain asymptotically consistent estimates of the manifold dimension and the Rényi entropy of the sample density on the manifold. The GMST approach is striking in its simplicity and does not require reconstruction of the manifold or estimation of the multivariate density of the samples. The GMST method simply constructs a minimal spanning tree (MST) sequence using a geodesic edge matrix and uses the overall lengths of the MSTs to simultaneously estimate manifold dimension and entropy. We illustrate the GMST approach on standard synthetic manifolds as well as on real data sets consisting of images of faces. Index Terms: conformal embedding, intrinsic dimension, intrinsic entropy, manifold learning, minimal spanning tree, nonlinear dimensionality reduction.
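The idea of reading the dimension off the growth of MST lengths can be sketched as follows. This is a simplified illustration, not the authors' GMST procedure: it uses plain Euclidean edge lengths instead of a geodesic edge matrix (the two agree only when the data lie on a flat subspace), it does not estimate entropy, and it assumes the total MST length grows roughly like n^((d-1)/d) for n samples of a d-dimensional set. The function names `mst_length` and `mst_dimension` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mst_length(points: Sequence[Sequence[float]]) -> float:
    """Total edge length of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def mst_dimension(points: List[Sequence[float]], sizes: Sequence[int],
                  trials: int = 3, seed: int = 0) -> float:
    """Fit log(MST length) against log(n) and convert the slope to a dimension.

    Assumes length ~ n^a with a = (d - 1) / d, hence d = 1 / (1 - a).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        mean_len = sum(mst_length(rng.sample(points, n))
                       for _ in range(trials)) / trials
        xs.append(math.log(n))
        ys.append(math.log(mean_len))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    if slope >= 1.0:
        raise ValueError("fitted slope is not consistent with a finite dimension")
    return 1.0 / (1.0 - slope)
```

For example, calling `mst_dimension(points, sizes=[100, 200, 400])` on samples from a plane embedded in a higher-dimensional space should return a value near 2, although the fit is noisy for small samples.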

Data Dimensionality Estimation Methods: A Survey

by Francesco Camastra - Pattern Recognition, 2003
Cited by 14 (0 self)
In this paper, data dimensionality estimation methods are reviewed. The estimation of the dimensionality of a data set is a classical problem of pattern recognition. There are some good reviews [1] in literature but they do not include more recent developments based on fractal techniques and neural autoassociators. The aim of this paper is to provide an up-to-date survey of the dimensionality estimation methods of a data set, paying special attention to the fractal-based methods.

Convergence of Laplacian eigenmaps

by Mikhail Belkin, Partha Niyogi - In NIPS, 2006
Cited by 14 (1 self)
Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the Laplace-Beltrami operator on the underlying manifold, thus establishing the first convergence results for a spectral dimensionality reduction algorithm in the manifold setting.

Translated Poisson mixture model for stratification learning

by Gloria Haro, Gregory Randall, Guillermo Sapiro - Int. J. Comput. Vision, 2000
Cited by 11 (2 self)
A framework for the regularized and robust estimation of non-uniform dimensionality and density in high dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that

Stratification learning: Detecting mixed density and dimensionality in high dimensional point clouds

by Gregory Randall, Gloria Haro, Guillermo Sapiro - In Advances in NIPS 19, 2006
Cited by 9 (1 self)
The study of point cloud data sampled from a stratification, a collection of manifolds with possibly different dimensions, is pursued in this paper. We present a technique for simultaneously soft clustering and estimating the mixed dimensionality and density of such structures. The framework is based on a maximum likelihood estimation of a Poisson mixture model. The presentation of the approach is completed with artificial and real examples demonstrating the importance of extending manifold learning to stratification learning.
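The abstract does not give the estimator itself. As a rough illustration of maximum-likelihood dimension estimation from a Poisson model of local point counts, the sketch below implements the well-known single-manifold estimator of Levina and Bickel, which is swapped in here as a simpler stand-in for the mixture model described above. It handles neither mixtures nor soft clustering, and the function names and the choice of averaging the local estimates are assumptions.

```python
import math
from typing import Sequence


def mle_local_dimension(points: Sequence[Sequence[float]], i: int, k: int) -> float:
    """Local maximum-likelihood dimension estimate at points[i].

    With T_j the distance to the j-th nearest neighbor, the estimate is
    (k - 1) / sum_{j=1}^{k-1} log(T_k / T_j).
    """
    if k < 2 or k >= len(points):
        raise ValueError("need 2 <= k < number of points")
    p = points[i]
    t = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
    s = sum(math.log(t[k - 1] / t[j]) for j in range(k - 1))
    if s <= 0.0:
        raise ValueError("degenerate neighborhood (duplicate distances)")
    return (k - 1) / s


def mle_dimension(points: Sequence[Sequence[float]], k: int = 12) -> float:
    """Average of the local estimates (an illustrative way to combine them)."""
    local = [mle_local_dimension(points, i, k) for i in range(len(points))]
    return sum(local) / len(local)
```

With this normalization the estimate tends to sit slightly above the true dimension for small k, so it is best read as a rough indicator rather than an exact count.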

Manifold-adaptive dimension estimation

by Amir Massoud Farahmand, Csaba Szepesvári - In ICML ’07: Proceedings of the 24th International Conference on Machine Learning, 2007
Cited by 9 (0 self)
Intuitively, learning should be easier when the data points lie on a low-dimensional submanifold of the input space. Recently there has been a growing interest in algorithms that aim to exploit such geometrical properties of the data. Oftentimes these algorithms require estimating the dimension of the manifold first. In this paper we propose an algorithm for dimension estimation and study its finite-sample behaviour. The algorithm estimates the dimension locally around the data points using nearest neighbor techniques and then combines these local estimates. We show that the rate of convergence of the resulting estimate is independent of the dimension of the input space and hence the algorithm is “manifold-adaptive”. Thus, when the manifold supporting the data is low dimensional, the algorithm can be exponentially more efficient than its counterparts that are not exploiting this property. Our computer experiments confirm the obtained theoretical results.
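The two-stage scheme described above (local nearest-neighbor estimates, then a combination step) can be sketched as follows. The abstract does not spell out the estimator, so this sketch assumes a simple local rule: if the number of points within radius r grows like r^d, then doubling the neighbor count from ⌈k/2⌉ to k multiplies the radius by about 2^(1/d), giving d ≈ ln 2 / ln(r_k / r_⌈k/2⌉). Combining the local values with a median is likewise an illustrative choice, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def knn_radii(points: Sequence[Sequence[float]], i: int, k: int) -> List[float]:
    """Sorted distances from points[i] to its k nearest other points."""
    p = points[i]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return dists[:k]


def local_dimension(points: Sequence[Sequence[float]], i: int, k: int) -> float:
    """Local estimate ln 2 / ln(r_k / r_ceil(k/2)) at points[i]."""
    radii = knn_radii(points, i, k)
    r_k = radii[k - 1]
    r_half = radii[math.ceil(k / 2) - 1]
    if r_half <= 0.0 or r_k <= r_half:
        return math.nan  # degenerate neighborhood; skipped by the caller
    return math.log(2) / math.log(r_k / r_half)


def estimate_dimension(points: Sequence[Sequence[float]], k: int = 16) -> float:
    """Combine the local estimates with a median (illustrative choice)."""
    if k < 2 or k >= len(points):
        raise ValueError("need 2 <= k < number of points")
    local = [local_dimension(points, i, k) for i in range(len(points))]
    valid = [d for d in local if not math.isnan(d)]
    if not valid:
        raise ValueError("no valid local estimates")
    return statistics.median(valid)
```

Because only distances between data points enter the computation, the cost of the estimate depends on the ambient dimension only through the distance evaluations, which matches the intuition behind the "manifold-adaptive" label.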

Orthogonal locality preserving indexing

by Deng Cai, Xiaofei He - In Proc. International Conference on Research and Development in Information Retrieval (SIGIR’05), 2005
Cited by 7 (2 self)
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic Indexing which is optimal in the sense of global Euclidean structure, LPI is optimal in the sense of local manifold structure. However, LPI is extremely sensitive to the number of dimensions. This makes it difficult to estimate the intrinsic dimensionality, while inaccurately estimated dimensionality would drastically degrade its performance. One reason leading to this problem is that LPI is non-orthogonal. Non-orthogonality distorts the metric structure of the document space. In this paper, we propose a new algorithm called Orthogonal LPI. Orthogonal LPI iteratively computes the mutually orthogonal basis functions which respect the local geometrical structure. Moreover, our empirical study shows that OLPI can have more locality preserving power than LPI. We compare the new algorithm to LSI and LPI. Extensive experimental results show that Orthogonal LPI obtains better performance than both LSI and LPI. More crucially, it is insensitive to the number of dimensions, which makes it an efficient data preprocessing method for text clustering, classification, retrieval, etc.

Estimating local intrinsic dimension with k-nearest neighbor graphs

by Jose A. Costa, Abhishek Girotra, Alfred O. Hero III - In: IEEE Workshop on Statistical Signal Processing (SSP), 2005
Cited by 6 (0 self)
Many high-dimensional data sets of practical interest exhibit a varying complexity in different parts of the data space. This is the case, for example, of databases of images containing many samples of a few textures of different complexity. Such phenomena can be modeled by assuming that the data lies on a collection of manifolds with different intrinsic dimensionalities. In this extended abstract, we introduce a method to estimate the local dimensionality associated with each point in a data set, without any prior information about the manifolds, their quantity and their sampling distributions. The proposed method uses a global dimensionality estimator based on k-nearest neighbor (k-NN) graphs, together with an algorithm for computing neighborhoods in the data with similar topological properties.

De-biasing for intrinsic dimension estimation

by Kevin M. Carter, Alfred O. Hero III, Raviv Raich - In Proc. IEEE Statistical Signal Processing Workshop, 2007
Cited by 6 (4 self)
Many algorithms have been proposed for estimating the intrinsic dimension of high dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension, taking manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we offer improvements to an existing algorithm for dimension estimation, based on k-nearest neighbor graphs, and offer an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation with data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation. Index Terms — Intrinsic dimension, manifold learning, Riemannian manifold, nearest neighbor graph, geodesics