Results 11-20 of 668
Global Versus Local Methods in Nonlinear Dimensionality Reduction
2003
Cited by 208 (6 self)
Recently proposed algorithms for nonlinear dimensionality reduction fall broadly into two categories which have different advantages and disadvantages: global (Isomap [1]), and local (Locally Linear Embedding [2], Laplacian Eigenmaps [3]). We present two variants of Isomap which combine the advantages of the global approach with what have previously been exclusive advantages of local methods: computational sparsity and the ability to invert conformal maps.
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
IEEE Transactions on Knowledge and Data Engineering
Cited by 194 (19 self)
This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel (it contains inner products closely related to commute times). A procedure for computing the subspace projection of the node vectors of the graph that preserves as much variance as possible in terms of the commute-time distance (a principal components analysis (PCA) of the graph) is also introduced. This graph PCA provides a nice interpretation to the "Fiedler vector", widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a database. * François Fouss, Alain Pirotte and Marco Saerens are with the
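The quantities named in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes the standard identity that the average commute time between nodes i and j equals the graph volume times (l+_ii + l+_jj - 2 l+_ij), where l+ denotes entries of the Laplacian pseudoinverse. The function name `commute_times` and the three-node path graph are hypothetical choices made for illustration.

```python
import numpy as np

def commute_times(A):
    """Average commute times and Laplacian pseudoinverse for a weighted,
    undirected graph given by a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A            # graph Laplacian L = D - A
    L_plus = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse (the kernel)
    volume = degrees.sum()              # sum of all node degrees
    diag = np.diag(L_plus)
    # Assumed identity: n(i, j) = volume * (l+_ii + l+_jj - 2 * l+_ij)
    C = volume * (diag[:, None] + diag[None, :] - 2.0 * L_plus)
    return C, L_plus

# Hypothetical example: path graph 0 - 1 - 2 with unit edge weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
C, L_plus = commute_times(A)
commute_distance = np.sqrt(np.maximum(C, 0.0))  # the Euclidean commute-time distance
```

On this path graph the two end nodes are twice as far apart in commute time as adjacent nodes, and `L_plus` is positive semidefinite, which is what allows it to be used as a kernel.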
Why does unsupervised pretraining help deep learning?
2010
Cited by 155 (20 self)
Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of autoencoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pretraining phase. The main question investigated here is the following: why does unsupervised pretraining work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al., 2009b) and as an aid to optimization (Bengio et al., 2007). Our results build on the work of Erhan et al. (2009b), showing that unsupervised pretraining appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pretraining effect.
Super-resolution through neighbor embedding
IEEE Conference on Computer Vision and Pattern Recognition 01, 2004
Cited by 151 (5 self)
In this paper, we propose a novel method for solving single-image super-resolution problems. Given a low-resolution image as input, we recover its high-resolution counterpart using a set of training examples. While this formulation resembles other learning-based methods for super-resolution, our method has been inspired by recent manifold learning methods, particularly locally linear embedding (LLE). Specifically, small image patches in the low- and high-resolution images form manifolds with similar local geometry in two distinct feature spaces. As in LLE, local geometry is characterized by how a feature vector corresponding to a patch can be reconstructed by its neighbors in the feature space. Besides using the training image pairs to estimate the high-resolution embedding, we also enforce local compatibility and smoothness constraints between patches in the target high-resolution image through overlapping. Experiments show that our method is very flexible and gives good empirical results.
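The reconstruction step this abstract borrows from LLE can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: it uses the usual LLE closed form (solve the local Gram system, then normalize the weights to sum to one), and the regularization term, the function name `reconstruction_weights`, and the toy patch vectors are all hypothetical.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights w (summing to 1) that minimize ||x - sum_j w[j] * neighbors[j]||^2."""
    Z = np.asarray(neighbors, dtype=float) - np.asarray(x, dtype=float)
    k = Z.shape[0]
    G = Z @ Z.T                                   # local Gram matrix, k x k
    # Assumed regularization: keeps G invertible when neighbors are collinear.
    G = G + (reg * np.trace(G) / k + 1e-12) * np.eye(k)
    w = np.linalg.solve(G, np.ones(k))
    return w / w.sum()                            # enforce the sum-to-one constraint

# Hypothetical toy data: two low-resolution neighbor features and their
# high-resolution counterparts from a training set.
low_neighbors = np.array([[0.0, 0.0], [2.0, 0.0]])
high_neighbors = np.array([[0.0, 0.0, 0.0, 0.0], [4.0, 4.0, 4.0, 4.0]])
low_patch = np.array([1.0, 0.0])

w = reconstruction_weights(low_patch, low_neighbors)
high_patch = w @ high_neighbors    # reuse the low-resolution weights in the high-resolution space
```

The key design point is that the weights are computed only in the low-resolution feature space and then applied unchanged to the high-resolution neighbors, which relies on the abstract's assumption that the two manifolds share similar local geometry.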
Maximum likelihood estimation of intrinsic dimension
In Advances in Neural Information Processing Systems, 2005
Cited by 143 (7 self)
We propose a new method for estimating intrinsic dimension of a dataset derived by applying the principle of maximum likelihood to the distances between close neighbors. We derive the estimator by a Poisson process approximation, assess its bias and variance theoretically and by simulations, and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators.
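A nearest-neighbor maximum-likelihood estimator of this kind can be sketched as below. The abstract does not state the formula, so the sketch assumes the commonly cited form: for each point, take the inverse of the mean of log(T_k / T_j) over j = 1..k-1, where T_j is the distance to the j-th nearest neighbor, then average over points. The normalization by k - 1, the function name, and the test data (a 2-D plane embedded in 5-D) are assumptions for illustration.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=10):
    """Average of local maximum-likelihood dimension estimates from k-NN distances."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=2))        # pairwise Euclidean distances
    # Sort each row; column 0 is the zero self-distance, so skip it.
    T = np.sort(D, axis=1)[:, 1:k + 1]            # distances to the 1st..k-th neighbors
    log_ratios = np.log(T[:, -1][:, None] / T[:, :-1])
    local = (k - 1) / log_ratios.sum(axis=1)      # assumed 1/(k-1) normalization
    return float(local.mean())

# Hypothetical data: 500 points on a 2-D plane embedded in 5-D space.
rng = np.random.default_rng(0)
plane = rng.uniform(size=(500, 2))
X = np.hstack([plane, np.zeros((500, 3))])
estimate = mle_intrinsic_dimension(X, k=10)
```

The estimate should land near 2 for this data, although the exact value depends on k, on the sample size, and on the normalization chosen. The sketch also assumes there are no duplicate points, since a zero neighbor distance would make the log ratio undefined.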
Neighborhood preserving embedding
In Proceedings of the Tenth IEEE International Conference on Computer Vision, 2005
Cited by 123 (13 self)
Recently there has been a lot of interest in geometrically motivated approaches to data analysis in high dimensional spaces. We consider the case where data is drawn from sampling a probability distribution that has support on or near a submanifold of Euclidean space. In this paper, we propose a novel subspace learning algorithm called Neighborhood Preserving Embedding (NPE). Unlike Principal Component Analysis (PCA), which aims at preserving the global Euclidean structure, NPE aims at preserving the local neighborhood structure on the data manifold. Therefore, NPE is less sensitive to outliers than PCA. Also, compared to recently proposed manifold learning algorithms such as Isomap and Locally Linear Embedding, NPE is defined everywhere, rather than only on the training data points. Furthermore, NPE may be conducted in the original space or in the reproducing kernel Hilbert space into which data points are mapped. This gives rise to kernel NPE. Several experiments on a face database demonstrate the effectiveness of our algorithm.
Channel compensation for SVM speaker recognition
in Proceedings of Odyssey04, The Speaker and Language Recognition Workshop
Cited by 113 (16 self)
One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully nonlinear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.
Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators
in Advances in Neural Information Processing Systems 18, 2005
Cited by 110 (14 self)
This paper presents a diffusion-based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = e^{-U(x)}, we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.
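The low-dimensional representation by the first few eigenvectors of the Markov matrix can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian kernel exp(-||x - y||^2 / eps), the bandwidth value, the function name `diffusion_map`, and the two-cluster toy data are assumptions. The Markov matrix M = D^{-1} K is diagonalized through its symmetric conjugate D^{-1/2} K D^{-1/2}, which has the same eigenvalues.

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Embed points using the leading non-trivial eigenvectors of M = D^{-1} K."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-np.sum(diff ** 2, axis=2) / eps)  # assumed Gaussian adjacency
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))               # symmetric conjugate of M
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]                # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]              # right eigenvectors of M
    # Skip the trivial constant eigenvector (eigenvalue 1).
    embedding = (vals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
    return embedding, vals

# Hypothetical data: two well-separated clusters on a line.
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
embedding, eigenvalues = diffusion_map(X, eps=20.0, n_components=1)
```

In this toy case the first non-trivial coordinate takes one sign on the left cluster and the opposite sign on the right cluster, which is the behavior spectral clustering exploits. Euclidean distances between rows of `embedding` approximate diffusion distances at time t when enough components are kept.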
Label propagation through linear neighborhoods
ICML06, 23rd International Conference on Machine Learning, 2006
Cited by 108 (13 self)
A novel semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named Linear Neighborhood Propagation (LNP), can propagate the labels from the labeled points to the whole dataset using these linear neighborhoods with sufficient smoothness. We also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit and text classification tasks.
Laplacian score for feature selection
In Advances in Neural Information Processing Systems (NIPS), 2006
Cited by 103 (4 self)
In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. Moreover, almost all previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its locality-preserving power, or Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm.
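A locality-preserving score of this kind can be sketched as below. The abstract does not give the formula, so this assumes the commonly cited definition: each feature is centered with a degree-weighted mean, and its score is the ratio f^T L f / f^T D f, with lower scores indicating features that vary little between nearby points. The fully connected heat-kernel graph (instead of a k-nearest-neighbor graph), the kernel width t, the function name, and the toy data are simplifying assumptions.

```python
import numpy as np

def laplacian_score(X, t=1.0):
    """One score per feature (column of X); lower means better locality preservation."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = np.exp(-np.sum(diff ** 2, axis=2) / t)    # assumed heat-kernel weights
    np.fill_diagonal(S, 0.0)                      # no self-loops
    d = S.sum(axis=1)                             # node degrees
    L = np.diag(d) - S                            # graph Laplacian
    scores = []
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()           # remove the degree-weighted mean
        # Note: a constant feature would give a zero denominator here.
        scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ (d * f_tilde)))
    return np.array(scores)

# Hypothetical data: feature 0 separates two clusters, feature 1 alternates
# in sign regardless of cluster membership.
X = np.array([[0.0,  0.3], [0.1, -0.3], [0.2,  0.3],
              [5.0, -0.3], [5.1,  0.3], [5.2, -0.3]])
scores = laplacian_score(X, t=1.0)
```

On this data the cluster-aligned feature receives a much lower score than the alternating one, so ranking features by ascending score would select it first without consulting any learning algorithm, which is what makes this a filter method.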