Results 1–10 of 42
Inductive Hashing on Manifolds
Abstract

Cited by 15 (6 self)
Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes that preserve the Euclidean distance in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexity of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this work, we consider how to learn compact binary embeddings on their intrinsic manifolds. In order to address the above-mentioned difficulties, we describe an efficient, inductive solution to the out-of-sample data problem, and a process by which non-parametric manifold learning may be used as the basis of a hashing method. Our proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. We particularly show that hashing on the basis of t-SNE [29] outperforms state-of-the-art hashing methods on large-scale benchmark datasets, and is very effective for image classification with very short code lengths.
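A minimal sketch (not the paper's inductive manifold method): random-projection hashing, the kind of Euclidean-distance-preserving binary coding that this abstract contrasts with manifold-based approaches. All names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_hash(dim, n_bits=16):
    """Draw random hyperplanes; the sign of each projection gives one bit."""
    return rng.standard_normal((dim, n_bits))

def encode(X, W):
    """Binary codes: 1 where the projection is positive, else 0."""
    return (X @ W > 0).astype(np.uint8)

X = rng.standard_normal((100, 32))
W = fit_hash(32)
codes = encode(X, W)
# Hamming distance between codes approximates angular similarity in the
# original Euclidean space.
d01 = int((codes[0] != codes[1]).sum())
```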
A Maxent-Stress Model for Graph Layout
Abstract

Cited by 10 (3 self)
In some applications of graph visualization, input edges have associated target lengths. Dealing with these lengths is a challenge, especially for large graphs. Stress models are often employed in this situation. However, the traditional full stress model is not scalable due to its reliance on an initial all-pairs shortest path calculation. A number of fast approximation algorithms have been proposed. While they work well for some graphs, the results are less satisfactory on graphs of intrinsically high dimension, because nodes overlap unnecessarily. We propose a solution, called the maxent-stress model, which applies the principle of maximum entropy to cope with the extra degrees of freedom. We describe a force-augmented stress majorization algorithm that solves the maxent-stress model. Numerical results show that the algorithm scales well, and provides acceptable layouts for large, non-rigid graphs. This also has potential applications to scalable algorithms for statistical multidimensional scaling (MDS) with variable distances.
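A sketch of the baseline "full stress" pipeline this abstract says does not scale: an all-pairs shortest-path computation supplies target lengths, then stress-majorizing MDS (SMACOF) lays out the nodes. The maxent-stress model itself is not implemented here; the graph is a toy example.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # a path graph on 4 nodes
D = shortest_path(A, method="D")            # all-pairs graph distances
layout = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
```

The O(N^2) distance matrix is exactly the bottleneck the paper's force-augmented majorization is designed to avoid.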
Visualizing non-metric similarities in multiple maps
, 2011
Abstract

Cited by 8 (1 self)
Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
Accelerating t-SNE using tree-based algorithms
, 2014
Abstract

Cited by 7 (1 self)
The paper investigates the acceleration of t-SNE—an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots—using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
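The Barnes-Hut variant described above is what scikit-learn exposes as `method="barnes_hut"`; it approximates the t-SNE gradient in O(N log N), whereas `method="exact"` costs O(N^2). A small usage sketch on random data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
# barnes_hut is restricted to n_components <= 3; perplexity must be
# smaller than the number of samples.
emb = TSNE(n_components=2, method="barnes_hut", perplexity=30.0,
           random_state=0).fit_transform(X)
```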
Information-geometric dimensionality reduction
IEEE Signal Processing Magazine
, 2011
Abstract

Cited by 6 (0 self)
We consider the problem of dimensionality reduction and manifold learning when the domain of interest is a set of probability distributions instead of a set of Euclidean data vectors. In this problem, one seeks to discover a low-dimensional representation, called an embedding, that preserves certain properties such as distance between measured distributions or separation between classes of distributions. Such representations are useful for data visualization and clustering. While a standard Euclidean dimension reduction method like PCA, ISOMAP, or Laplacian Eigenmaps can easily be applied to distributional data – e.g. by quantization and vectorization of the distributions – this may not provide the best low-dimensional embedding. This is because the most natural measure of dissimilarity between probability distributions is the information divergence and not the standard Euclidean distance. If the information divergence is adopted then the space of probability distributions becomes a non-Euclidean space called an information geometry. This article presents methods that are specifically designed for the low-dimensional embedding of information-geometric data, and we illustrate these methods for visualization in flow cytometry and demography analysis.
Index Terms: Information geometry, dimensionality reduction, statistical manifold, classification
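An illustrative sketch of the core idea (not the article's specific algorithms): embed a set of discrete probability distributions using a symmetrized KL divergence as the dissimilarity, rather than Euclidean distance on the raw vectors. The data here are synthetic Dirichlet samples.

```python
import numpy as np
from sklearn.manifold import MDS

def sym_kl(p, q, eps=1e-12):
    """Symmetrized Kullback-Leibler divergence between two histograms."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

rng = np.random.default_rng(0)
dists = rng.dirichlet(np.ones(5), size=8)    # 8 distributions over 5 bins
D = np.array([[sym_kl(p, q) for q in dists] for p in dists])
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)
```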
Unsupervised Dimensionality Reduction: Overview and Recent Advances
, 2010
Abstract

Cited by 5 (0 self)
Unsupervised dimensionality reduction aims at representing high-dimensional data in lower-dimensional spaces in a faithful way. Dimensionality reduction can be used for compression or denoising purposes, but data visualization remains one of its most prominent applications. This paper attempts to give a broad overview of the domain. Past developments are briefly introduced and placed on a timeline spanning the last eleven decades. Next, the principles and techniques involved in the major methods are described. A taxonomy of the methods is suggested, taking into account various properties. Finally, the issue of quality assessment is briefly dealt with.
A Behavioral Investigation of Dimensionality Reduction
Abstract

Cited by 4 (0 self)
A cornucopia of dimensionality reduction techniques have emerged over the past decade, leaving data analysts with a wide variety of choices for reducing their data. Means of evaluating and comparing low-dimensional embeddings useful for visualization, however, are very limited. When proposing a new technique it is common to simply show rival embeddings side-by-side and let human judgment determine which embedding is superior. This study investigates whether such human embedding evaluations are reliable, i.e., whether humans tend to agree on the quality of an embedding. We also investigate what types of embedding structures humans appreciate a priori. Our results reveal that, although experts are reasonably consistent in their evaluation of embeddings, novices generally disagree on the quality of an embedding. We discuss the impact of this result on the way dimensionality reduction researchers should present their results, and on the applicability of dimensionality reduction outside of machine learning.
Recent Advances in Nonlinear Dimensionality Reduction, Manifold and Topological Learning
Abstract

Cited by 4 (1 self)
The ever-growing amount of data stored in digital databases raises the question of how to organize and extract useful knowledge. This paper outlines some current developments in the domains of dimensionality reduction, manifold learning, and topological learning. Several aspects are dealt with, ranging from novel algorithmic approaches to their real-world applications. The issue of quality assessment is also considered and progress in quantitative as well as visual criteria is reported.
Visualizing the quality of dimensionality reduction
Abstract

Cited by 3 (0 self)
Many different evaluation measures for dimensionality reduction can be summarized based on the co-ranking framework [6]. Here, we extend this framework in two ways: (i) we show that the current parameterization of the quality shows unpredictable behavior, even in simple settings, and we propose a different parameterization which yields more intuitive results; (ii) we propose how to link the quality to pointwise quality measures which can directly be integrated into the visualization.
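A minimal sketch of the co-ranking framework [6] this abstract builds on: pairwise-distance ranks in the high- and low-dimensional spaces are compared, and the quality Q_NX(K) counts the fraction of points that stay inside each other's K-nearest-neighbourhoods. The function names are illustrative, not from the paper.

```python
import numpy as np

def rank_matrix(X):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return D.argsort(axis=1).argsort(axis=1)   # per-row rank of each point

def q_nx(X_high, X_low, K):
    Rh, Rl = rank_matrix(X_high), rank_matrix(X_low)
    n = len(X_high)
    # count pairs whose rank lies in 1..K in both spaces (rank 0 is the
    # point itself); this sums the K-by-K block of the co-ranking matrix
    kept = ((Rh >= 1) & (Rh <= K) & (Rl >= 1) & (Rl <= K)).sum()
    return kept / (K * n)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
q = q_nx(X, X[:, :2], K=10)   # quality of a naive 2-D projection of X
```

Embedding a space into itself yields a perfect score of 1.0, which makes the measure easy to sanity-check.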
Tree Preserving Embedding
Abstract

Cited by 3 (0 self)
Visualization techniques for complex data are a workhorse of modern scientific pursuits. The goal of visualization is to embed high-dimensional data in a low-dimensional space while preserving structure in the data relevant to exploratory data analysis such as clusters. However, existing visualization methods often either fail to separate clusters due to the crowding problem or can only separate clusters at a single resolution. Here, we develop a new approach to visualization, tree preserving embedding (TPE). Our approach uses the topological notion of connectedness to separate clusters at all resolutions. We provide a formal guarantee of cluster separation for our approach that holds for finite samples. Our approach requires no parameters and can handle general types of data, making it easy to use in practice.
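A sketch of the topological notion TPE builds on (not TPE itself): single-linkage clustering encodes connectedness, so cutting its merge tree at any distance resolution separates the clusters present at that scale. The two well-separated Gaussian blobs here are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),    # tight cluster near the origin
               rng.normal(5.0, 0.1, (20, 2))])   # tight cluster near (5, 5)
Z = linkage(X, method="single")                  # single-linkage merge tree
labels = fcluster(Z, t=1.0, criterion="distance")  # cut the tree at scale 1.0
```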