Results 1  10
of
257
FastMap: A Fast Algorithm for Indexing, DataMining and Visualization of Traditional and Multimedia Datasets
, 1995
"... A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in kd space, using k featureextraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly finetuned spatial access methods (SAMs), to answer several types ..."
Abstract

Cited by 434 (22 self)
 Add to MetaCart
(Show Context)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in kd space, using k featureextraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly finetuned spatial access methods (SAMs), to answer several types of queries, including the `Query By Example' type (which translates to a range query); the `all pairs' query (which translates to a spatial join [8]); the nearestneighbor or bestmatch query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some kdimensional space (k is userdefined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient ret...
Self Organization of a Massive Document Collection
 IEEE Transactions on Neural Networks
"... This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the SelfOrganizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The m ..."
Abstract

Cited by 234 (14 self)
 Add to MetaCart
(Show Context)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the SelfOrganizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of highdimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240node SOM. As the feature vectors we used 500dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, SelfOrganizing Map (SOM), textual documents. I. Introduction A. From simple searches to browsing of selforganized data collections Locating documents on the basis of keywords and simple search expressions is a c...
Virtual Landmarks for the Internet
, 2003
"... Internet coordinate schemes have been proposed as a method for estimating minimum round trip time between hosts without direct measurement. In such a scheme, each host is assigned a set of coordinates, and Euclidean distance is used to form the desired estimate. Two key questions are: How accurate a ..."
Abstract

Cited by 166 (3 self)
 Add to MetaCart
Internet coordinate schemes have been proposed as a method for estimating minimum round trip time between hosts without direct measurement. In such a scheme, each host is assigned a set of coordinates, and Euclidean distance is used to form the desired estimate. Two key questions are: How accurate are coordinate schemes across the Internet as a whole? And: are coordinate assignment schemes fast enough, and scalable enough, for large scale use? In this paper we make contributions toward answering both those questions. Whereas the coordinate assignment problem has in the past been approached by nonlinear optimization, we develop a faster method based on dimensionality reduction of the Lipschitz embedding. We show that this method is reasonably accurate, even when applied to measurements spanning the Internet, and that it naturally leads to a scalable measurement strategy based on the notion of virtual landmarks.
Probabilistic nonlinear principal component analysis with Gaussian process latent variable models
 Journal of Machine Learning Research
, 2005
"... Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component ..."
Abstract

Cited by 150 (15 self)
 Add to MetaCart
(Show Context)
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be nonlinearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). Through analysis of the GPLVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GPLVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of realworld and artificially generated data sets.
ThreeDimensional Face Recognition
, 2005
"... An expressioninvariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows to construct expressioninvariant representations of faces using the bendinginvariant canonical forms approach. The re ..."
Abstract

Cited by 113 (22 self)
 Add to MetaCart
An expressioninvariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows to construct expressioninvariant representations of faces using the bendinginvariant canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradients field, or the surface metric, are sufficient for constructing the expressioninvariant representation of any given face. It allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
Data Exploration Using SelfOrganizing Maps
 ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
"... Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the ..."
Abstract

Cited by 107 (4 self)
 Add to MetaCart
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and timeconsuming. Interesting, novel relations between the data items may be hidden in the data. The selforganizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing highdimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing fulltext document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
Visualizing data using tSNE
 Costsensitive Machine Learning for Information Retrieval 33
"... We present a new technique called “tSNE ” that visualizes highdimensional data by giving each datapoint a location in a two or threedimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly ..."
Abstract

Cited by 98 (9 self)
 Add to MetaCart
(Show Context)
We present a new technique called “tSNE ” that visualizes highdimensional data by giving each datapoint a location in a two or threedimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. tSNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for highdimensional data that lie on several different, but related, lowdimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how tSNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of tSNE on a wide variety of data sets and compare it with many other nonparametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by tSNE are significantly better than those produced by the other techniques on almost all of the data sets.
Solving Euclidean Distance Matrix Completion Problems Via Semidefinite Programming
, 1997
"... Given a partial symmetric matrix A with only certain elements specified, the Euclidean distance matrix completion problem (IgDMCP) is to find the unspecified elements of A that make A a Euclidean distance matrix (IgDM). In this paper, we follow the successful approach in [20] and solve the IgDMCP by ..."
Abstract

Cited by 69 (14 self)
 Add to MetaCart
Given a partial symmetric matrix A with only certain elements specified, the Euclidean distance matrix completion problem (IgDMCP) is to find the unspecified elements of A that make A a Euclidean distance matrix (IgDM). In this paper, we follow the successful approach in [20] and solve the IgDMCP by generalizing the completion problem to allow for approximate completions. In particular, we introduce a primaldual interiorpoint algorithm that solves an equivalent (quadratic objective function) semidefinite programming problem (SDP). Numerical results are included which illustrate the efficiency and robustness of our approach. Our randomly generated problems consistently resulted in low dimensional solutions when no completion existed.
The design and implementation of a selfcalibrating distributed acoustic sensing platform
 In SenSys
, 2006
"... We present the design, implementation, and evaluation of the Acoustic Embedded Networked Sensing Box (ENSBox), a platform for prototyping rapiddeployable distributed acoustic sensing systems, particularly distributed source localization. Each ENSBox integrates an ARM processor running Linux and sup ..."
Abstract

Cited by 49 (15 self)
 Add to MetaCart
(Show Context)
We present the design, implementation, and evaluation of the Acoustic Embedded Networked Sensing Box (ENSBox), a platform for prototyping rapiddeployable distributed acoustic sensing systems, particularly distributed source localization. Each ENSBox integrates an ARM processor running Linux and supports key facilities required for source localization: a sensor array, wireless network services, time synchronization, and precise selfcalibration of array position and orientation. The ENSBox’s integrated, high precision selfcalibration facility sets it apart from other platforms. This selfcalibration is precise enough to support acoustic source localization applications in complex, realistic environments: e.g., 5 cm average 2D position error and 1.5 degree average orientation error over a partially obstructed 80x50 m outdoor area. Further, our integration of array orientation into the position estimation algorithm is a novel extension of traditional multilateration techniques. We present the result of several different test deployments, measuring the performance of the system in urban settings, as well as forested, hilly environments with obstructing foliage and 20–30 m distances between neighboring nodes. Categories and Subject Descriptors C.3 [Computer Systems Organization]: SpecialPurpose and ApplicationBased Systems—Signal processing
BValidating the independent components of neuroimaging time series via clustering and visualization
 NeuroImage
, 2004
"... and visualization ..."
(Show Context)