Results 1-10 of 84
Self Organization of a Massive Document Collection
IEEE Transactions on Neural Networks
Cited by 204 (14 self)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents. I. Introduction A. From simple searches to browsing of self-organized data collections Locating documents on the basis of keywords and simple search expressions is a c...
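The random-projection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a dense Gaussian projection matrix with unit-length columns, which is one common choice and not necessarily the construction used in the article; the function name `random_projection`, the toy vocabulary size, and the random "histograms" are all hypothetical.

```python
import numpy as np

def random_projection(histograms, out_dim=500, seed=0):
    """Map weighted word histograms (n_docs x vocab_size) to out_dim dimensions.

    Assumption: a Gaussian random matrix with unit-norm columns; the article
    itself may construct its projection matrix differently.
    """
    rng = np.random.default_rng(seed)
    vocab_size = histograms.shape[1]
    R = rng.standard_normal((vocab_size, out_dim))
    R /= np.linalg.norm(R, axis=0, keepdims=True)  # normalise each column
    return histograms @ R

# Toy data standing in for weighted word histograms (hypothetical values).
docs = np.random.default_rng(1).random((20, 5000))
features = random_projection(docs)  # one 500-dimensional vector per document
```

The projected vectors approximately preserve similarities between the original histograms, which is why they can be substituted as inputs to a method such as the SOM at far lower cost.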
Three-Dimensional Face Recognition
2005
Cited by 103 (22 self)
An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the bending-invariant canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradients field, or the surface metric, are sufficient for constructing the expression-invariant representation of any given face. It allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
Data Exploration Using Self-Organizing Maps
ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82, 1997
Cited by 96 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
Expression-invariant 3D face recognition
2003
Cited by 79 (17 self)
We present a novel 3D face recognition approach based on geometric invariants introduced by Elad and Kimmel. The key idea of the proposed algorithm is a representation of the facial surface, invariant to isometric deformations, such as those resulting from different expressions and postures of the face. The obtained geometric invariants allow mapping 2D facial texture images into special images that incorporate the 3D geometry of the face. These signature images are then decomposed into their principal components. The result is an efficient and accurate face recognition algorithm that is robust to facial expressions. We demonstrate the results of our method and compare it to existing 2D and 3D face recognition algorithms.
Finding Approximate POMDP Solutions Through Belief Compression
2003
Cited by 62 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Optimal cluster preserving embedding of nonmetric proximity data
IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 43 (4 self)
For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, in this paper, a new embedding method for pairwise data into Euclidean vector spaces is introduced. We show that all clustering methods, which are invariant under additive shifts of the pairwise proximities, can be reformulated as grouping problems in Euclidean spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction. Index Terms: Clustering, pairwise proximity data, cost function, embedding, MDS.
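The idea of embedding shifted proximities can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the dissimilarity matrix is taken to be symmetric with a zero diagonal, it is centered as in classical MDS, and the smallest eigenvalue of the centered matrix is used as the constant shift. The function name `constant_shift_embedding` and the toy matrix are hypothetical.

```python
import numpy as np

def constant_shift_embedding(D):
    """Embed a symmetric, zero-diagonal dissimilarity matrix D in Euclidean space.

    Returns coordinates X and the shift, such that the squared Euclidean
    distance between rows i != j of X equals D[i, j] + 2 * shift.
    Assumption: the shift is minus the smallest eigenvalue of the centered matrix.
    """
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n       # centering projection
    S_c = -0.5 * Q @ D @ Q                    # centered similarity matrix
    eigvals, eigvecs = np.linalg.eigh(S_c)    # eigenvalues in ascending order
    shift = max(-eigvals[0], 0.0)             # makes the spectrum non-negative
    shifted = np.clip(eigvals + shift, 0.0, None)
    X = eigvecs * np.sqrt(shifted)            # one embedded point per row
    return X, shift

# Toy dissimilarities that cannot be realised as squared Euclidean distances.
D_demo = np.array([[0.0, 1.0, 9.0],
                   [1.0, 0.0, 1.0],
                   [9.0, 1.0, 0.0]])
X_demo, shift_demo = constant_shift_embedding(D_demo)
```

Because every off-diagonal dissimilarity is changed by the same additive constant, any clustering cost that is invariant under such shifts ranks groupings of the embedded points exactly as it ranks groupings of the original data.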
Learning as Extraction of Low-Dimensional Representations
Mechanisms of Perceptual Learning, 1996
Cited by 26 (7 self)
Psychophysical findings accumulated over the past several decades indicate that perceptual tasks such as similarity judgment tend to be performed on a low-dimensional representation of the sensory data. Low dimensionality is especially important for learning, as the number of examples required for attaining a given level of performance grows exponentially with the dimensionality of the underlying representation space. In this chapter, we argue that, whereas many perceptual problems are tractable precisely because their intrinsic dimensionality is low, the raw dimensionality of the sensory data is normally high, and must be reduced by a nontrivial computational process, which, in itself, may involve learning. Following a survey of computational techniques for dimensionality reduction, we show that it is possible to learn a low-dimensional representation that captures the intrinsic low-dimensional nature of certain classes of visual objects, thereby facilitating further learning of tasks...
The Solution of the Metric STRESS and SSTRESS Problems in Multidimensional Scaling Using Newton's Method
1995
Cited by 24 (3 self)
This paper considers numerical algorithms for finding local minimizers of metric multidimensional scaling problems. Both the STRESS and SSTRESS criteria are considered, and the leading algorithms for each are carefully explicated. A new algorithm, based on Newton's method, is proposed. Translational and rotational indeterminacy is removed by a parametrization that has not previously been used in multidimensional scaling algorithms. In contrast to previous algorithms, a very pleasant feature of the new algorithm is that it can be used with either the STRESS or the SSTRESS criterion. Numerical results are presented. Key words: Metric multidimensional scaling, STRESS criterion, SSTRESS criterion, unconstrained optimization, Newton's method. Department of Computational and Applied Mathematics, Rice University, Houston, TX 77251-1892. This author was generously supported by a Patricia R. Harris Fellowship. Department of Computational and Applied Mathematics and Center for Research in...
Clustering with the connectivity kernel
In NIPS, 2004
Cited by 24 (1 self)
Clustering aims at extracting hidden structure in a dataset. While the problem of finding compact clusters has been widely studied in the literature, extracting arbitrarily formed elongated structures is considered a much harder problem. In this paper we present a novel clustering algorithm which tackles the problem by a two-step procedure: first the data are transformed in such a way that elongated structures become compact ones. In a second step, these new objects are clustered by optimizing a compactness-based criterion. The advantages of the method over related approaches are threefold: (i) robustness properties of compactness-based criteria naturally transfer to the problem of extracting elongated structures, leading to a model which is highly robust against outlier objects; (ii) the transformed distances induce a Mercer kernel which allows us to formulate a polynomial approximation scheme to the generally NP-hard clustering problem; (iii) the new method does not contain free kernel parameters in contrast to methods like spectral clustering or mean-shift clustering.
Applications of Multidimensional Scaling to Molecular Conformation
1997
Cited by 23 (5 self)
Multidimensional scaling (MDS) is a collection of data analytic techniques for constructing configurations of points from information about interpoint distances. Such constructions arise in computational chemistry when one endeavors to infer the conformation (3-dimensional structure) of a molecule from information about its interatomic distances. For a number of reasons, this application of MDS poses computational challenges not encountered in more traditional applications. In this report we sketch the mathematical formulation of MDS for molecular conformation problems and describe two approaches that can be employed for their solution. 1 Molecular Conformation. Consider a molecule with n atoms. We can represent its conformation, or 3-dimensional structure, by specifying the coordinates of each atom with respect to a Euclidean coordinate system for $\mathbb{R}^3$. We store these coordinates in an $n \times 3$ configuration matrix X. Given X, we can easily compute the matrix of interatomic distan...
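The forward computation described at the end of this excerpt, going from a configuration matrix X to the matrix of interatomic distances, can be sketched in a few lines. The function name `interatomic_distances` and the three-atom coordinates are hypothetical and serve only as an illustration.

```python
import numpy as np

def interatomic_distances(X):
    """Return the n x n matrix of Euclidean distances between rows of X (n x 3)."""
    diff = X[:, None, :] - X[None, :, :]      # pairwise coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical configuration of three atoms.
X_atoms = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0]])
dist = interatomic_distances(X_atoms)
```

MDS addresses the inverse problem: recovering a configuration such as `X_atoms` when only (possibly incomplete or noisy) entries of `dist` are known.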