Self Organization of a Massive Document Collection
IEEE Transactions on Neural Networks
Cited by 264 (15 self)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents. I. Introduction. A. From simple searches to browsing of self-organized data collections. Locating documents on the basis of keywords and simple search expressions is a c...
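The random-projection step named in this abstract can be illustrated with a minimal sketch. It assumes a dense Gaussian projection matrix and a toy vocabulary; the function names, the seed, the dimensions, and the example weights are all hypothetical, and the paper's own projection and weighting scheme may differ.

```python
import random

def make_projection(n_words, n_dims, seed=0):
    """Build an n_dims x n_words matrix of Gaussian random entries.

    The Gaussian choice is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_words)] for _ in range(n_dims)]

def project(histogram, projection):
    """Map a weighted word histogram to a lower-dimensional feature vector."""
    return [sum(r * h for r, h in zip(row, histogram)) for row in projection]

# Toy example: a 1000-word vocabulary reduced to 50 dimensions.
R = make_projection(n_words=1000, n_dims=50)
hist = [0.0] * 1000
hist[3], hist[42], hist[999] = 2.0, 1.5, 0.5  # hypothetical weighted counts
features = project(hist, R)
```

The projection is linear, so scaling a histogram scales its feature vector by the same factor, and only the columns of the matrix that correspond to words present in the document contribute.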
Data Exploration Using Self-Organizing Maps
 ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
Cited by 115 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
Solving Euclidean Distance Matrix Completion Problems Via Semidefinite Programming
, 1997
Cited by 82 (15 self)
Given a partial symmetric matrix A with only certain elements specified, the Euclidean distance matrix completion problem (EDMCP) is to find the unspecified elements of A that make A a Euclidean distance matrix (EDM). In this paper, we follow the successful approach in [20] and solve the EDMCP by generalizing the completion problem to allow for approximate completions. In particular, we introduce a primal-dual interior-point algorithm that solves an equivalent (quadratic objective function) semidefinite programming problem (SDP). Numerical results are included which illustrate the efficiency and robustness of our approach. Our randomly generated problems consistently resulted in low-dimensional solutions when no completion existed.
Spectral Partitioning with Indefinite Kernels Using the Nyström Extension
 European Conference on Computer Vision 2002
, 2002
Optimal cluster preserving embedding of nonmetric proximity data
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2003
Cited by 54 (4 self)
For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, in this paper, a new embedding method for pairwise data into Euclidean vector spaces is introduced. We show that all clustering methods which are invariant under additive shifts of the pairwise proximities can be reformulated as grouping problems in Euclidean spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction. Index Terms: Clustering, pairwise proximity data, cost function, embedding, MDS.
On certain linear mappings between inner-product and squared-distance matrices
 Linear Algebra Appl
, 1988
Cited by 26 (1 self)
We obtain the spectral decomposition of four linear mappings. The first, $\kappa$, is a mapping of the linear hull of all centered inner-product matrices onto the linear hull of all the induced squared-distance matrices. It is based on the natural generalization of the cosine law of elementary Euclidean geometry. The other three mappings studied are $\kappa^{-1}$, the adjoint $\kappa^{*}$, and $(\kappa^{*})^{-1}$. Extensions and applications, particularly to multidimensional scaling, are discussed in some detail.
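The cosine-law mapping from centered inner products to squared distances, and its inverse, can be sketched as follows. This is an illustrative sketch only: the function names and the three-point example are hypothetical, and the inverse is written as the standard double-centering formula $B = -\tfrac{1}{2} J D J$ with $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{T}$, which is assumed here to correspond to the inverse mapping discussed in the paper.

```python
def to_squared_distances(B):
    """Cosine law: d_ij = b_ii + b_jj - 2 * b_ij for an inner-product matrix B."""
    n = len(B)
    return [[B[i][i] + B[j][j] - 2.0 * B[i][j] for j in range(n)] for i in range(n)]

def to_centered_inner_products(D):
    """Double centering: B = -1/2 * J D J, with J = I - (1/n) * ones * ones^T."""
    n = len(D)
    row_mean = [sum(D[i]) / n for i in range(n)]
    col_mean = [sum(D[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (D[i][j] - row_mean[i] - col_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def centered_gram(points):
    """Inner-product matrix of the points after subtracting their centroid."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - mean[k] for k in range(dim)] for p in points]
    return [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]

# Hypothetical example: the vertices of a 3-4-5 right triangle in the plane.
points = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
B = centered_gram(points)
D = to_squared_distances(B)           # squared side lengths 9, 16, 25
B_back = to_centered_inner_products(D)  # recovers B up to rounding
```

The round trip works because B is centered; for an uncentered inner-product matrix the double-centering step would return the centered version instead.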
Applications of Multidimensional Scaling to Molecular Conformation
, 1997
Cited by 24 (6 self)
Multidimensional scaling (MDS) is a collection of data analytic techniques for constructing configurations of points from information about interpoint distances. Such constructions arise in computational chemistry when one endeavors to infer the conformation (3-dimensional structure) of a molecule from information about its interatomic distances. For a number of reasons, this application of MDS poses computational challenges not encountered in more traditional applications. In this report we sketch the mathematical formulation of MDS for molecular conformation problems and describe two approaches that can be employed for their solution. 1. Molecular Conformation. Consider a molecule with n atoms. We can represent its conformation, or 3-dimensional structure, by specifying the coordinates of each atom with respect to a Euclidean coordinate system for $\mathbb{R}^3$. We store these coordinates in an $n \times 3$ configuration matrix X. Given X, we can easily compute the matrix of interatomic distan...
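Computing the interatomic distances from a configuration matrix, as the excerpt describes, can be sketched in a few lines. The function name and the three-atom coordinates below are hypothetical and serve only as an illustration; the report's own notation and procedure are not reproduced here.

```python
import math

def interatomic_distances(X):
    """Return the n x n matrix of Euclidean distances between the rows of X."""
    return [[math.dist(a, b) for b in X] for a in X]

# Hypothetical configuration matrix: one row of 3-D coordinates per atom.
X = [
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 12.0],
]
distances = interatomic_distances(X)
```

The forward direction shown here is the easy one; the conformation problem the abstract refers to is the reverse task of recovering X from distance information.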
The Solution of the Metric STRESS and SSTRESS Problems in Multidimensional Scaling Using Newton's Method
, 1995
Cited by 23 (3 self)
This paper considers numerical algorithms for finding local minimizers of metric multidimensional scaling problems. Both the STRESS and SSTRESS criteria are considered, and the leading algorithms for each are carefully explicated. A new algorithm, based on Newton's method, is proposed. Translational and rotational indeterminacy is removed by a parametrization that has not previously been used in multidimensional scaling algorithms. In contrast to previous algorithms, a very pleasant feature of the new algorithm is that it can be used with either the STRESS or the SSTRESS criterion. Numerical results are presented. Key words: Metric multidimensional scaling, STRESS criterion, SSTRESS criterion, unconstrained optimization, Newton's method. Department of Computational and Applied Mathematics, Rice University, Houston, TX 77251-1892. This author was generously supported by a Patricia R. Harris Fellowship. Department of Computational and Applied Mathematics and Center for Research in...
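The two criteria named in this abstract can be illustrated with a minimal sketch, assuming the common raw, unweighted forms: STRESS sums squared differences between fitted and target distances, and SSTRESS sums squared differences between their squares. The function names and the two-point example are hypothetical, and the paper's exact weighting or normalization may differ.

```python
import math

def raw_stress(X, delta):
    """Sum over i < j of (||x_i - x_j|| - delta_ij)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def raw_sstress(X, delta):
    """Sum over i < j of (||x_i - x_j||^2 - delta_ij^2)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) ** 2 - delta[i][j] ** 2) ** 2
               for i in range(n) for j in range(i + 1, n))

# Hypothetical example: two points at distance 5, target dissimilarity 4.
X = [[0.0, 0.0], [3.0, 4.0]]
delta = [[0.0, 4.0], [4.0, 0.0]]
stress_value = raw_stress(X, delta)    # (5 - 4)^2
sstress_value = raw_sstress(X, delta)  # (25 - 16)^2
```

SSTRESS avoids the square root and is therefore smooth everywhere, while STRESS is not differentiable where two points coincide; both depend on X only through its interpoint distances, which is the source of the translational and rotational indeterminacy mentioned above.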
Polynomial Instances Of The Positive Semidefinite And Euclidean Distance Matrix Completion Problems
 SIAM J. Matrix Anal. Appl
, 1998
Cited by 17 (6 self)
Given an undirected graph $G = (V, E)$ with node set $V = [1, n]$, a subset $S \subseteq V$ and a rational vector $a \in \mathbb{Q}^{S \cup E}$, the positive semidefinite matrix completion problem consists of determining whether there exists a real symmetric $n \times n$ positive semidefinite matrix $X = (x_{ij})$ satisfying $x_{ii} = a_i$ ($i \in S$) and $x_{ij} = a_{ij}$ ($ij \in E$). Similarly, the Euclidean distance matrix completion problem asks for the existence of a Euclidean distance matrix completing a partially defined given matrix. It is not known whether these problems belong to NP. We show here that they can be solved in polynomial time when restricted to the graphs having a fixed minimum fill-in, the minimum fill-in of a graph G being the minimum number of edges that must be added to G in order to obtain a chordal graph. A simple combinatorial algorithm makes it possible to construct a completion in polynomial time in the chordal case. We also show that the completion problem is polynomially solvable for a class of graphs including wheels of fixed length (assuming all diagonal entries are specified). The running time of our algorithms is polynomially bounded in terms of n and the bit length of the input a. We also observe that the matrix completion problem can be solved in polynomial time in the real number model for the class of graphs containing no homeomorph of $K_4$.