Self Organization of a Massive Document Collection [147 citations, 14 self]

by Teuvo Kohonen, Samuel Kaski, Krista Lagus, Jarkko Salojarvi, Vesa Paatero, Antti Saarela
IEEE Transactions on Neural Networks

Abstract:

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.

Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents.

I. Introduction

A. From simple searches to browsing of self-organized data collections

Locating documents on the basis of keywords and simple search expressions is a c...
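
The pipeline summarized in the abstract (weighted word histogram, random projection to a shorter vector, then lookup of the closest SOM node) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy vocabulary and weights, and the dense Gaussian projection matrix are hypothetical and are not taken from the paper, which works with 500-dimensional projections and a map of over a million nodes.

```python
import math
import random
from collections import Counter


def weighted_histogram(text, vocabulary, weights):
    """Weighted word-count vector over a fixed vocabulary.

    `weights` maps a word to a multiplier; words without an entry get 1.0.
    (The weighting scheme here is a placeholder, not the paper's.)
    """
    counts = Counter(text.lower().split())
    return [counts[w] * weights.get(w, 1.0) for w in vocabulary]


def random_projection_matrix(out_dim, in_dim, seed=0):
    """Dense Gaussian random matrix of shape out_dim x in_dim (an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vector):
    """Multiply the projection matrix by a histogram vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def best_matching_unit(nodes, x):
    """Index of the SOM node whose model vector is nearest to x (Euclidean)."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
```

A document would be mapped by calling `weighted_histogram`, then `project`, then `best_matching_unit` against the model vectors of a trained map. Training the map itself is outside the scope of this sketch.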

Citations

2329 Introduction to modern information retrieval – Salton - 1983
2274 Self-Organizing Maps – Kohonen - 1995
1636 Indexing by latent semantic analysis – Deerwester, Dumais, et al. - 1990
805 Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern 43:59–69 – Kohonen - 1982
424 Dithered Quantizers – Gray, Stockham - 1993
312 Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production – Koskenniemi - 1983
259 Toward optimal feature selection – Koller, Sahami - 1996
158 Latent semantic indexing: A probabilistic analysis – Papadimitriou, Tamaki, et al. - 1998
150 Asymptotically optimal block quantization – Gersho - 1979
149 A self-organizing semantic map for information retrieval – Lin, Soergel, et al. - 1991
140 Multidimensional Scaling – Kruskal, Wish - 1978
117 Self-organizing semantic maps – Ritter, Kohonen - 1989
116 Multidimensional scaling: I. theory and method – Torgerson - 1952
104 SOM_PAK: The Self-Organizing Map Program Package. Version 3.1 – Kohonen, Hynninen, et al. - 1995
99 WEBSOM: self-organizing maps of document collections – Honkela, Kaski, et al. - 1997
93 Vector quantization in speech coding – Makhoul, Roucos - 1985
77 Data exploration using self-organizing maps – Kaski - 1997
75 Dimensionality reduction by random mapping: Fast similarity computation for clustering – Kaski - 1998
67 Clustering in large graphs and matrices – Drineas, Frieze, et al. - 1999
62 Map Displays of Information Retrieval – Lin - 1997
59 Internet Categorization and Search: A Self-organizing Approach – Chen, Schuffels, et al. - 1996
56 Newsgroup Exploration with the WEBSOM Method and Browsing Interface – Honkela, Kaski, et al. - 1996
53 Self-Organizing Maps of Document Collections: A New Approach to Interactive Exploration – Lagus, Honkela, et al. - 1996
49 Self-organization of very large document collections: State of the art – Kohonen - 1998
47 Discussion of a set of points in terms of their mutual distances. Psychometrika – Young, Householder - 1938
42 Keyword selection method for characterizing text document maps – Lagus, Kaski - 1999
41 Very large two-level SOM for the browsing of newsgroups – Kohonen, Kaski, et al. - 1996
35 Exploration of Very Large Databases by Self Organising Maps – Kohonen - 1997
34 Progress with the tree-structured self-organizing map – Koikkalainen - 1994
32 Creating an order in digital libraries with self-organizing maps – Kaski, Honkela, et al.
27 Theory of multidimensional scaling – Leeuw, Heiser - 1982
24 Self-Organization of a Massive Text Document Collection – Kohonen, Kaski, et al. - 1999
20 Exploratory Data Analysis – Tukey - 1977
20 Clustering, taxonomy and topological maps of patterns – Kohonen - 1982
20 Text classification with self-organizing maps: Some lessons learned – Merkl - 1998
20 A scalable self-organizing map algorithm for textual classification: A neural network approach to thesaurus generation – Roussinov, Chen - 1998
20 Fast deterministic self-organizing maps – Koikkalainen - 1995
19 Things You Haven't Heard about the Self-Organizing Map – Kohonen
16 Unsupervised learning and the information retrieval problem – Scholtes - 1991
16 WEBSOM for textual data mining – Lagus, Honkela, et al. - 1999
12 Comparison of SOM point densities based on different criteria – Kohonen - 1999
12 Information visualization for collaborative computing – Chen, Nunamaker, et al. - 1998
11 New developments of Learning Vector Quantization and the Self-Organizing Map – Kohonen - 1992
8 Multidimensional scaling and its applications – Wish, Carroll - 1982
7 Improving the learning speed in topological maps of patterns – Rodrigues, Almeida - 1990
5 The representation of semantic similarity between documents by using maps: Application of an artificial neural network to organize software libraries – Merkl, Tjoa - 1994
5 Convergence and ordering of Kohonen's batch map – Cheng - 1997
5 Neural networks and information extraction in astronomical information retrieval – Lesteven, Ponçot, et al. - 1996
3 A nonlinear mapping for data structure analysis – Sammon - 1969
1 Multidimensional scaling. In Encyclopedia of Statistical Sciences – Young - 1985