Results 1 – 10 of 11
The Elastic Embedding Algorithm for Dimensionality Reduction
"... We propose a new dimensionality reduction method, the elastic embedding (EE), that optimises an intuitive, nonlinear objective function of the lowdimensional coordinates of the data. The method reveals a fundamental relation betwen a spectral method, Laplacian eigenmaps, and a nonlinear method, sto ..."
Abstract

Cited by 13 (4 self)
We propose a new dimensionality reduction method, the elastic embedding (EE), that optimises an intuitive, nonlinear objective function of the low-dimensional coordinates of the data. The method reveals a fundamental relation between a spectral method, Laplacian eigenmaps, and a nonlinear method, stochastic neighbour embedding; and shows that EE can be seen as learning both the coordinates and the affinities between data points. We give a homotopy method to train EE, characterise the critical value of the homotopy parameter, and study the method’s behaviour. For a fixed homotopy parameter, we give a globally convergent iterative algorithm that is very effective and requires no user parameters. Finally, we give an extension to out-of-sample points. On standard datasets, EE obtains results as good as or better than those of SNE, but more efficiently and robustly.
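As an illustration of the objective described in this abstract, the commonly cited EE form combines attractive quadratic terms with a repulsive Gaussian term weighted by the homotopy parameter. The sketch below is an assumption based on that description, not the authors' reference code; the weight matrices `W_plus`, `W_minus` and the parameter `lam` are assumed inputs:

```python
import numpy as np

def elastic_embedding_objective(X, W_plus, W_minus, lam):
    """Sketch of an EE-style objective: attractive squared-distance terms
    weighted by W_plus, plus lam times repulsive Gaussian terms weighted
    by W_minus. X is an (n, d) array of low-dimensional coordinates."""
    # Pairwise squared Euclidean distances between embedding points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    attract = np.sum(W_plus * sq_dists)           # pulls neighbors together
    repel = np.sum(W_minus * np.exp(-sq_dists))   # pushes all points apart
    return attract + lam * repel
```

Setting `lam = 0` recovers a purely attractive, Laplacian-eigenmaps-like objective, which matches the relation to spectral methods mentioned in the abstract.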
Visualizing non-metric similarities in multiple maps
, 2011
"... Techniques for multidimensional scaling visualize objects as points in a lowdimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing nonmetric similarity data ..."
Abstract

Cited by 3 (1 self)
Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize “central” objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
Training Recurrent Neural Networks
, 2013
"... Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging probl ..."
Abstract

Cited by 3 (0 self)
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
Heavy-Tailed Symmetric Stochastic Neighbor Embedding
"... Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, tSNE, is restricted to a particular Student tdistribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to ..."
Abstract

Cited by 2 (1 self)
Stochastic Neighbor Embedding (SNE) has been shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of t-SNE to accommodate various heavy-tailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions, and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions; based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast as, and performs as well as, the best known t-SNE implementation, and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as a qualitative advantage in cluster separation over t-SNE.
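One way to picture the "parameterized tail-heaviness" idea in this abstract is a single-parameter family of embedding similarities that interpolates between the Gaussian kernel of SNE and the Student-t kernel of t-SNE. The exact parameterization below is an illustrative assumption, not necessarily the paper's definition:

```python
import numpy as np

def heavy_tailed_similarity(sq_dist, alpha):
    """Sketch of a parameterized heavy-tailed embedding similarity:
    alpha controls tail heaviness. As alpha -> 0 this approaches the
    Gaussian exp(-d^2) used by SNE; alpha = 1 gives the Student-t
    (Cauchy) kernel 1 / (1 + d^2) used by t-SNE."""
    if alpha == 0.0:
        return np.exp(-sq_dist)               # Gaussian limit
    return (alpha * sq_dist + 1.0) ** (-1.0 / alpha)
```

Larger `alpha` gives heavier tails, letting moderately dissimilar points sit farther apart in the embedding, which is the mechanism t-SNE uses to fight the crowding problem.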
Partial-Hessian Strategies for Fast Learning of Nonlinear Embeddings
"... Stochastic neighbor embedding (SNE) and related nonlinear manifold learning algorithms achieve highquality lowdimensional representations of similarity data, but are notoriously slow to train. We propose a generic formulation of embedding algorithms that includes SNE and other existing algorithms, ..."
Abstract

Cited by 2 (1 self)
Stochastic neighbor embedding (SNE) and related nonlinear manifold learning algorithms achieve high-quality low-dimensional representations of similarity data, but are notoriously slow to train. We propose a generic formulation of embedding algorithms that includes SNE and other existing algorithms, and study their relation with spectral methods and graph Laplacians. This allows us to define several partial-Hessian optimization strategies, characterize their global and local convergence, and evaluate them empirically. We achieve up to two orders of magnitude speedup over existing training methods with a strategy (which we call the spectral direction) ...
Tree Preserving Embedding
"... Visualization techniques for complex data are a workhorse of modern scientific pursuits. The goal of visualization is to embed highdimensional data in a lowdimensional space while preserving structure in the data relevant to exploratory data analysis such as clusters. However, existing visualizati ..."
Abstract

Cited by 1 (0 self)
Visualization techniques for complex data are a workhorse of modern scientific pursuits. The goal of visualization is to embed high-dimensional data in a low-dimensional space while preserving structure in the data relevant to exploratory data analysis, such as clusters. However, existing visualization methods often either fail to separate clusters due to the crowding problem or can only separate clusters at a single resolution. Here, we develop a new approach to visualization, tree preserving embedding (TPE). Our approach uses the topological notion of connectedness to separate clusters at all resolutions. We provide a formal guarantee of cluster separation for our approach that holds for finite samples. Our approach requires no parameters and can handle general types of data, making it easy to use in practice.
Visualizing Data using t-SNE (Laurens van der Maaten)
"... We present a new technique called “tSNE ” that visualizes highdimensional data by giving each datapoint a location in a two or threedimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly ..."
Abstract
We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
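The core of the optimization this abstract describes is a gradient descent on the KL divergence between high-dimensional affinities P and Student-t low-dimensional affinities Q. A minimal NumPy sketch of the well-known t-SNE gradient (not the authors' implementation; `P` is assumed precomputed, symmetric, and normalized):

```python
import numpy as np

def tsne_gradient(P, Y):
    """Sketch of the t-SNE gradient for embedding coordinates Y (n, d),
    given symmetric high-dimensional affinities P (n, n).
    Low-dim affinities use the Student-t kernel q_ij ∝ 1/(1 + ||y_i - y_j||^2)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)          # unnormalized Student-t similarities
    np.fill_diagonal(inv, 0.0)            # q_ii is defined to be zero
    Q = inv / np.sum(inv)                 # normalized low-dim affinities
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)
    PQ = (P - Q) * inv
    return 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y
```

When `P` exactly matches the `Q` induced by `Y`, the gradient vanishes, as it should at an optimum.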
Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning
"... Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a knearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model welljustified only for kNN wit ..."
Abstract
Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a k-nearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model well-justified only for kNN with k = 1. However, kNN classifiers with k > 1 are more robust and usually preferred in practice. Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k. The main technical contribution is showing how to efficiently compute and optimize the expected accuracy of a kNN classifier. We apply similar ideas in an unsupervised setting to yield kSNE and ktSNE, generalizations of Stochastic Neighbor Embedding (SNE, t-SNE) that operate on neighborhoods of size k, and that provide an axis of control over embeddings allowing for more homogeneous and interpretable regions. Empirically, we show that kNCA often improves classification accuracy over state-of-the-art methods, produces qualitative differences in the embeddings as k is varied, and is more robust with respect to label noise.
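For the k = 1 base case this abstract starts from, NCA's objective is the expected leave-one-out accuracy of a stochastic 1-NN classifier under a learned linear transform. A minimal sketch of that baseline objective (the kNCA generalization to arbitrary k is more involved and not shown):

```python
import numpy as np

def nca_expected_accuracy(X, labels, A):
    """Sketch of the NCA (k = 1) objective: expected leave-one-out accuracy
    of a stochastic 1-NN classifier under linear transform A.
    Neighbor probabilities are a softmax over -||A x_i - A x_j||^2, j != i."""
    Z = X @ A.T                                   # transformed points
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)                  # a point never picks itself
    P = np.exp(-sq)
    P /= P.sum(axis=1, keepdims=True)             # stochastic neighbor probs
    same = labels[:, None] == labels[None, :]     # same-class indicator
    return np.sum(P * same) / len(X)              # mean expected 1-NN accuracy
```

Maximizing this quantity over `A` (by gradient ascent) yields the learned metric; well-separated classes drive it toward 1.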
Modeling Semantic Similarities in Multiple Maps
, 2009
"... Models that represent words as points in a semantic space are subject to fundamental limitations of metric spaces. These limitations prevent semantic space models from faithfully representing, for example, the pairwise similarities between word meanings as revealed by word association data. In parti ..."
Abstract
Models that represent words as points in a semantic space are subject to fundamental limitations of metric spaces. These limitations prevent semantic space models from faithfully representing, for example, the pairwise similarities between word meanings as revealed by word association data. In particular, semantic space models cannot faithfully represent intransitive pairwise similarities or the similarities of words that have multiple meanings. In this paper, we present a model that alleviates the limitations of semantic space models by constructing a collection of maps that represent complementary structure in the similarity data. Our model is a variant of a similarity choice model known as Stochastic Neighbor Embedding that constructs multiple maps and allows each object to occur as a point in several different maps. We apply the model to a set of word association data, demonstrating that it can successfully represent intransitive semantic relations as well as words with multiple meanings, and that it outperforms traditional semantic space models in the prediction of word associations. We compare the model to alternative representations of semantic structure, such as topic models and semantic networks.