Results 1–10 of 63
Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2006
"... We provide evidence that nonlinear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to ..."
Abstract

Cited by 96 (5 self)
 Add to MetaCart
We provide evidence that nonlinear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
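As a rough numerical illustration of this construction (not the authors' code), the sketch below builds the Markov random walk on a toy two-cluster data set, embeds the points with the leading non-trivial eigenvector, and partitions in diffusion space. The kernel width, data sizes, and the median split standing in for k-means are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two noisy clusters in the plane (toy data; sizes are arbitrary choices).
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

# Gaussian affinities and the Markov (random-walk) matrix P = D^{-1} W.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 1.0)
P = W / W.sum(axis=1, keepdims=True)

# Right eigenvectors of P give the diffusion coordinates; the top
# non-trivial eigenvector already separates the two clusters.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
psi = vecs[:, order[1]].real  # first non-trivial diffusion coordinate

# Partition in diffusion space (a median split stands in for k-means).
labels = (psi > np.median(psi)).astype(int)
```

Because the clusters are weakly coupled through the Gaussian kernel, the second eigenvector is nearly piecewise constant on them, so the one-dimensional split recovers the two groups.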
Random projections of smooth manifolds
 Foundations of Computational Mathematics
, 2006
"... We propose a new approach for nonadaptive dimensionality reduction of manifoldmodeled data, demonstrating that a small number of random linear projections can preserve key information about a manifoldmodeled signal. We center our analysis on the effect of a random linear projection operator Φ: R N ..."
Abstract

Cited by 83 (23 self)
 Add to MetaCart
We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the effect of a random linear projection operator Φ: R^N → R^M, M < N, on a smooth, well-conditioned K-dimensional submanifold M ⊂ R^N. As our main theoretical contribution, we establish a sufficient number M of random projections to guarantee that, with high probability, all pairwise Euclidean and geodesic distances between points on M are well-preserved under the mapping Φ. Our results bear strong resemblance to the emerging theory of Compressed Sensing (CS), in which sparse signals can be recovered from small numbers of random linear measurements. As in CS, the random measurements we propose can be used to recover the original data in R^N. Moreover, like the fundamental bound in CS, our requisite M is linear in the “information level” K and logarithmic in the ambient dimension N; we also identify a logarithmic dependence on the volume and conditioning of the manifold. In addition to recovering faithful approximations to manifold-modeled signals, however, the random projections we propose can also be used to discern key properties about the manifold. We discuss connections and contrasts with existing techniques in manifold learning, a setting where dimensionality-reducing mappings are typically nonlinear and constructed adaptively from a set of sampled training data.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
 Journal of Machine Learning Research
, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract

Cited by 66 (10 self)
 Add to MetaCart
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
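Component (ii) can be sketched in a few lines: on a toy chain MDP's state graph, the smoothest eigenvectors of the graph Laplacian serve as the proto-value basis functions (the graph, its size, and the number of basis functions kept are illustrative choices, not from the paper):

```python
import numpy as np

# Adjacency matrix of the state graph of a 10-state chain MDP
# (undirected edges between successive states).
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Combinatorial graph Laplacian L = D - A; its low-order eigenvectors
# are global, smooth basis functions over the state space.
L = np.diag(A.sum(1)) - A
evals, evecs = np.linalg.eigh(L)   # ascending eigenvalues
pvfs = evecs[:, :4]                # the four smoothest basis functions
```

The smallest eigenvalue of a connected graph's Laplacian is 0 with a constant eigenvector, so the first proto-value function is flat and the subsequent ones capture progressively finer structure of the state space.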
Regularization on graphs with function-adapted diffusion processes
, 2006
"... Harmonic analysis and diffusion on discrete data has been shown to lead to stateoftheart algorithms for machine learning tasks, especially in the context of semisupervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learn ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
Harmonic analysis and diffusion on discrete data have been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so that the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion-based methods.
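The idea can be sketched on a 1-D denoising toy problem: edge weights on the chain graph are adapted to the (noisy) function values, in the spirit of a bilateral, data-dependent diffusion kernel, and a few diffusion steps then smooth the signal. All parameters (kernel widths, step count, signal) are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * x)
noisy = clean + rng.normal(0, 0.1, n)

# Function-adapted weights on the chain graph: neighbors are coupled
# more strongly when their (noisy) function values agree.
W = np.zeros((n, n))
for i in range(n - 1):
    w = np.exp(-(noisy[i] - noisy[i + 1]) ** 2 / (2 * 0.2 ** 2))
    W[i, i + 1] = W[i + 1, i] = w
W += np.eye(n)  # lazy walk: keep some mass in place
P = W / W.sum(1, keepdims=True)

# Regularize by running the diffusion a few steps.
smoothed = np.linalg.matrix_power(P, 10) @ noisy
err_before = np.sqrt(np.mean((noisy - clean) ** 2))
err_after = np.sqrt(np.mean((smoothed - clean) ** 2))
```

Averaging over the adapted neighborhoods suppresses the noise while the data-dependent weights limit smoothing across sharp changes in the function.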
Representation Policy Iteration
, 2005
"... This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates g ..."
Abstract

Cited by 21 (8 self)
 Add to MetaCart
This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation. A novel, theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver such as least-squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace–Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the large-scale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) is presented that automatically learns both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two hand-coded basis functions (RBF and polynomial state encodings).
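The core step, approximating a value function by least-squares in an automatically generated Laplacian eigenbasis, can be sketched on a toy chain MDP (the MDP, discount factor, and subspace sizes are illustrative choices, not from the paper):

```python
import numpy as np

# 10-state chain MDP: move right w.p. 0.9, stay w.p. 0.1,
# reward 1 at the final (absorbing) state. Toy setup for illustration.
n, gamma = 10, 0.9
P = np.zeros((n, n))
for i in range(n - 1):
    P[i, i + 1] = 0.9
    P[i, i] = 0.1
P[n - 1, n - 1] = 1.0
r = np.zeros(n); r[n - 1] = 1.0
v = np.linalg.solve(np.eye(n) - gamma * P, r)  # exact value function

# Automatically generated basis: Laplacian eigenvectors of the
# undirected state graph.
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(1)) - A
_, Q = np.linalg.eigh(L)

def ls_fit(k):
    # Least-squares approximation of v in the k smoothest basis functions.
    B = Q[:, :k]
    w, *_ = np.linalg.lstsq(B, v, rcond=None)
    return np.linalg.norm(B @ w - v)

errors = [ls_fit(k) for k in (2, 4, 8)]
```

Because the subspaces are nested, the approximation error is non-increasing as more eigenfunctions are added, mirroring the global-Fourier-analysis view of value function approximation.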
Wiener’s lemma for infinite matrices
 Trans. Amer. Math. Soc
, 2006
"... Abstract. The classical Wiener lemma and its various generalizations are important and have numerous applications in numerical analysis, wavelet theory, frame theory, and sampling theory. There are many different equivalent formulations for the classical Wiener lemma, with an equivalent formulation ..."
Abstract

Cited by 18 (11 self)
 Add to MetaCart
The classical Wiener lemma and its various generalizations are important and have numerous applications in numerical analysis, wavelet theory, frame theory, and sampling theory. There are many equivalent formulations of the classical Wiener lemma; the one suited to our generalization involves the commutative algebra of infinite matrices W := {(a(j − j′))_{j,j′ ∈ Z^d} : Σ_{j ∈ Z^d} |a(j)| < ∞}. In the study of spline approximation, (diffusion) wavelets and affine frames, Gabor frames on non-uniform grids, and non-uniform sampling and reconstruction, the associated algebras of infinite matrices are highly noncommutative, but we expect those noncommutative algebras to have a property similar to Wiener’s lemma for the commutative algebra W. In this paper, we consider two noncommutative algebras of infinite matrices, the Schur class and the Sjöstrand class, and establish Wiener’s lemmas for those matrix algebras.
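A finite-dimensional numerical illustration of the phenomenon behind such Wiener-type lemmas, that an invertible matrix with off-diagonal decay has an inverse with comparable decay, can be given with a diagonally dominant matrix (the matrix, decay rate, and thresholds are illustrative; this is only in the spirit of the infinite-matrix results, not a statement of them):

```python
import numpy as np

# A symmetric matrix with exponential off-diagonal decay, made
# invertible by strengthening the diagonal.
n = 60
i = np.arange(n)
A = 0.5 ** np.abs(i[:, None] - i[None, :])
A += 2 * np.eye(n)

# Wiener-type lemmas predict the inverse inherits off-diagonal decay.
Ainv = np.linalg.inv(A)
offdiag = np.abs(Ainv[0, :])  # first row of the inverse
```

Entries of the inverse far from the diagonal are many orders of magnitude smaller than those near it, matching the decay-inheritance the lemma formalizes for infinite matrices.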
Fast direct policy evaluation using multiscale analysis of Markov diffusion processes
 In Proceedings of the 23rd International Conference on Machine Learning, 601–608
, 2005
"... Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(S  3) to directly solve the Bellman system of S  linear equations (where S  is the state space size in the discrete case, and the sample size in the continuous case ..."
Abstract

Cited by 17 (10 self)
 Add to MetaCart
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(|S|^3) time to directly solve the Bellman system of |S| linear equations (where |S| is the state space size in the discrete case, and the sample size in the continuous case). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy π, this framework efficiently constructs a multiscale decomposition of the random walk P^π associated with the policy π. This enables efficient computation of medium- and long-term state distributions, approximation of value functions, and direct computation of the potential operator (I − γP^π)^{−1} needed to solve Bellman’s equation. We show that even a preliminary non-optimized version of the solver competes with highly optimized iterative techniques, requiring in many cases a complexity of O(|S|).
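For context, the two baselines the multiscale solver is compared against can be sketched on a random toy MDP: the O(|S|^3) direct solve of (I − γP^π) v = r, and standard iterative evaluation (the MDP and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, gamma = 50, 0.9

# Random policy-induced transition matrix P^pi and reward vector (toy MDP).
P = rng.random((n, n)); P /= P.sum(1, keepdims=True)
r = rng.random(n)

# Direct policy evaluation: v = (I - gamma * P)^{-1} r,
# i.e. solving the Bellman system by Gaussian elimination, O(n^3).
v_direct = np.linalg.solve(np.eye(n) - gamma * P, r)

# Iterative evaluation: repeated Bellman backups v <- r + gamma * P v,
# converging at rate gamma per sweep.
v = np.zeros(n)
for _ in range(500):
    v = r + gamma * P @ v
```

The multiscale approach of the paper replaces both with a decomposition of P^π that applies the potential operator directly, in many cases in O(|S|) time.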
Diffusion polynomial frames on metric measure spaces. submitted
, 2006
"... We construct a multiscale tight frame based on an arbitrary orthonormal basis for the L 2 space of an arbitrary sigma finite measure space. The approximation properties of the resulting multiscale are studied in the context of Besov approximation spaces, which are characterized both in terms of suit ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
We construct a multiscale tight frame based on an arbitrary orthonormal basis for the L^2 space of an arbitrary sigma-finite measure space. The approximation properties of the resulting multiscale frame are studied in the context of Besov approximation spaces, which are characterized both in terms of suitable K-functionals and the frame transforms. The only major condition required is the uniform boundedness of a summability operator. We give sufficient conditions for this to hold in the context of a very general class of metric measure spaces. The theory is illustrated using the approximation of characteristic functions of caps on a dumbbell manifold, and applied to the problem of recognition of hand-written digits. Our method outperforms comparable methods for semi-supervised learning.
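A finite-dimensional sketch of building a multiscale tight frame from an orthonormal basis: take the eigenbasis of a graph Laplacian and apply spectral window functions whose squares sum to one (a Littlewood–Paley-type construction; the specific windows and graph below are illustrative, not the paper's):

```python
import numpy as np

# Graph Laplacian of a path graph and its orthonormal eigenbasis.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(1)) - A
lam, Q = np.linalg.eigh(L)

# Spectral partition of unity: h0(lambda)^2 + h1(lambda)^2 = 1.
h0 = np.cos(np.pi / 2 * lam / lam.max())
h1 = np.sin(np.pi / 2 * lam / lam.max())

G0 = Q @ np.diag(h0) @ Q.T   # "coarse-scale" operator h0(L)
G1 = Q @ np.diag(h1) @ Q.T   # "fine-scale" operator h1(L)

# Columns of [G0 | G1] form a tight frame: the frame operator
# G0^2 + G1^2 = Q diag(h0^2 + h1^2) Q^T is the identity.
frame = np.hstack([G0, G1])
frame_op = frame @ frame.T
```

Tightness here is exact by construction, since cos² + sin² = 1 on the spectrum; the paper develops the analogous construction on general metric measure spaces.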
Geodesic Gaussian kernels for value function approximation
, 2007
"... The leastsquares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for discontinuity which typically arises in ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
The least-squares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the nonlinear manifold structure induced by Markov decision processes. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.
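The key difference between an ordinary and a geodesic Gaussian kernel can be sketched on a toy state graph shaped like a U-turn corridor, where two states are close in Euclidean distance but far along the graph (the layout, kernel width, and shortest-path computation via Floyd–Warshall are all illustrative choices):

```python
import numpy as np

# Ten states on a U-shaped corridor: states 0..4 run right along y = 0,
# states 5..9 run back left along y = 1; edges only between consecutive
# states, so the graph distance ignores the Euclidean shortcut.
pos = np.array([(i, 0) for i in range(5)] +
               [(9 - i, 1) for i in range(5, 10)], float)
n = len(pos)

# Geodesic (shortest-path) distances on the state graph, Floyd-Warshall.
INF = 1e9
D = np.full((n, n), INF)
np.fill_diagonal(D, 0.0)
for i in range(n - 1):
    w = np.linalg.norm(pos[i] - pos[i + 1])
    D[i, i + 1] = D[i + 1, i] = w
for k in range(n):
    D = np.minimum(D, D[:, [k]] + D[[k], :])

sigma = 1.0
k_geo = np.exp(-D[0, 9] ** 2 / (2 * sigma ** 2))  # geodesic Gaussian kernel
k_euc = np.exp(-np.linalg.norm(pos[0] - pos[9]) ** 2 / (2 * sigma ** 2))
```

The ordinary Gaussian kernel couples the two corridor ends strongly (they are Euclidean distance 1 apart), while the geodesic kernel keeps them nearly independent, which is what lets the basis respect a discontinuity in the value function across the wall.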
Stability of Localized Operators
, 2008
"... Let ℓ p,1 ≤ p ≤ ∞, be the space of all psummable sequences and Ca be the convolution operator associated with a summable sequence a. It is known that the ℓ p stability of the convolution operator Ca for different 1 ≤ p ≤ ∞ are equivalent to each other, i.e., if Ca has ℓ pstability for some 1 ≤ p ..."
Abstract

Cited by 9 (6 self)
 Add to MetaCart
Let ℓ^p, 1 ≤ p ≤ ∞, be the space of all p-summable sequences and C_a be the convolution operator associated with a summable sequence a. It is known that the ℓ^p-stability of the convolution operator C_a is equivalent for different 1 ≤ p ≤ ∞, i.e., if C_a has ℓ^p-stability for some 1 ≤ p ≤ ∞ then C_a has ℓ^q-stability for all 1 ≤ q ≤ ∞. In the study of spline approximation, wavelet analysis, time-frequency analysis, and sampling, there are many localized operators of non-convolution type whose stability is one of the basic assumptions. In this paper, we consider the stability of such localized operators, including infinite matrices in the Sjöstrand class, synthesis operators with generating functions enveloped by shifts of a function in the Wiener amalgam space, and integral operators with kernels having certain regularity and decay at infinity. We show that the ℓ^p-stability (or L^p-stability) of those three classes of localized operators is equivalent for all 1 ≤ p ≤ ∞, and we also prove that the left inverses of those localized operators are well localized.
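In the convolution case, ℓ²-stability is governed by the Fourier symbol of the filter staying away from zero. A finite (circulant) model makes this concrete: the singular values of the circulant matrix of a are exactly the magnitudes of the DFT of a (the filter below is an illustrative choice):

```python
import numpy as np

# Finite circulant model of the convolution operator C_a.
n = 32
a = np.zeros(n)
a[0], a[1], a[-1] = 2.0, 0.5, 0.5   # a summable (here: finite) filter

Ca = np.empty((n, n))
for i in range(n):
    for j in range(n):
        Ca[i, j] = a[(i - j) % n]    # circulant matrix generated by a

# Circulant matrices are diagonalized by the DFT, so the singular
# values of C_a equal |a_hat(omega)|; the operator is l^2-stable
# iff the symbol is bounded away from zero.
sing = np.linalg.svd(Ca, compute_uv=False)
symbol = np.abs(np.fft.fft(a))
```

Here the symbol is 2 + cos(2πm/32) ≥ 1, so this C_a is stable; the paper's results extend such stability statements, and their p-independence, to localized non-convolution operators.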