Self Organization of a Massive Document Collection
IEEE Transactions on Neural Networks
Cited by 204 (14 self)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms. Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents. I. Introduction. A. From simple searches to browsing of self-organized data collections. Locating documents on the basis of keywords and simple search expressions is a c...
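The random projection of a weighted word histogram mentioned in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not say how the projection matrix is built, so the Gaussian, unit-length word codes, the function names `make_word_codes` and `project_histogram`, and the small sizes (1000 words, 50 dimensions instead of 500) are all assumptions.

```python
import random

def make_word_codes(n_words, n_dims, seed=0):
    """Assign every vocabulary word a random unit-length code vector.

    Assumption: Gaussian entries normalised to length 1; the paper's
    actual projection matrix may be constructed differently.
    """
    rng = random.Random(seed)
    codes = []
    for _ in range(n_words):
        code = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        norm = sum(v * v for v in code) ** 0.5
        codes.append([v / norm for v in code])
    return codes

def project_histogram(histogram, codes):
    """Project a sparse weighted word histogram to a dense low-dimensional vector.

    histogram maps word index -> weight; only the words that occur are visited.
    """
    n_dims = len(codes[0])
    projected = [0.0] * n_dims
    for word, weight in histogram.items():
        code = codes[word]
        for d in range(n_dims):
            projected[d] += weight * code[d]
    return projected

# Hypothetical document with three weighted words out of a 1000-word vocabulary.
codes = make_word_codes(n_words=1000, n_dims=50)
doc = {3: 2.0, 17: 0.5, 998: 1.25}
vec = project_histogram(doc, codes)
```

The cost of projecting one document depends on the number of distinct words it contains rather than on the vocabulary size, which is one reason such a projection is attractive for short texts like abstracts.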
Clustering of the Self-Organizing Map
2000
Cited by 159 (1 self)
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects input space on prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure, first using SOM to produce the prototypes that are then clustered in the second stage, is found to perform well when compared with direct clustering of the data and to reduce the computation time.
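A minimal sketch of the two-stage idea in this abstract, under simplifying assumptions: a small one-dimensional SOM chain is trained online, its prototypes are grouped with plain k-means, and each data point inherits the cluster of its best-matching prototype. The training schedule, the farthest-point initialisation, and all function names are illustrative choices, not the authors' procedure.

```python
import math
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(protos, x):
    """Index of the prototype closest to x (the best-matching unit)."""
    return min(range(len(protos)), key=lambda i: sqdist(protos[i], x))

def train_som_1d(data, n_units, n_steps=2000, seed=0):
    """Stage 1: online SOM on a 1-D chain of units; returns the prototypes."""
    rng = random.Random(seed)
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(n_steps):
        frac = t / n_steps
        lr = 0.5 * (1.0 - frac) + 0.01                 # decaying learning rate
        radius = (n_units / 2.0) * (1.0 - frac) + 0.3  # shrinking neighbourhood
        x = rng.choice(data)
        win = bmu(protos, x)
        for i, p in enumerate(protos):
            h = math.exp(-((i - win) ** 2) / (2.0 * radius ** 2))
            for d in range(len(p)):
                p[d] += lr * h * (x[d] - p[d])
    return protos

def kmeans(points, k, n_iter=20):
    """Stage 2: Lloyd's k-means with a deterministic farthest-point start."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def two_stage_cluster(data, n_units=8, k=2):
    """Cluster the SOM prototypes, then label data via their best-matching unit."""
    protos = train_som_1d(data, n_units)
    proto_labels = kmeans(protos, k)
    return [proto_labels[bmu(protos, x)] for x in data]

# Two well-separated synthetic groups of 25 points each.
blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]
labels = two_stage_cluster(blob_a + blob_b)
```

The k-means step here runs over 8 prototypes instead of 50 data points; with a large data set and a map of moderate size, that difference is where the reduction in computation time would come from.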
Data Exploration Using Self-Organizing Maps
ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82, 1997
Cited by 96 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data
IEEE Transactions on Neural Networks, 2002
Cited by 34 (3 self)
The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are related, on the one hand, to the static architecture of this model, as well as, on the other hand, to the limited capabilities for the representation of hierarchical relations of the data. With our novel Growing Hierarchical Self-Organizing Map presented in this paper we address both limitations. The Growing Hierarchical Self-Organizing Map is an artificial neural network model with hierarchical architecture composed of independent growing self-organizing maps. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are first, a problem-dependent architecture, and second, the intuitive representation of hierarchical relations in the data. This is especially appealing in explorative data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation
Communication Cognition and Artificial Intelligence, Spring, 1998
Cited by 29 (5 self)
The rapid proliferation of textual and multimedia online databases, digital libraries, Internet servers, and intranet services has turned researchers' and practitioners' dream of creating an information-rich society into a nightmare of infogluts. Many researchers believe that turning an infoglut into a useful digital library requires automated techniques for organizing and categorizing large-scale information. This paper presents research in which we sought to develop a scaleable textual classification and categorization system based on Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation. Our proposed data structure and algorithm took advantage of the sparsity of coordinates in the document input vectors and reduced the SOM computational complexity by several orders of magnitude. The proposed Scaleable SOM (SSOM) algorithm makes large-scale textual categorization tasks a possibility. A...
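One common way to exploit sparse input vectors in distance computations is sketched below; it is not necessarily the data structure used by the SSOM algorithm, which this abstract does not detail. The sketch expands ||x - m||^2 as ||m||^2 - 2 x.m + ||x||^2 so that only the nonzero coordinates of the document vector x are visited, assuming the squared norm of each model vector m is cached. The function names are hypothetical.

```python
def sqnorm(vector):
    """Squared Euclidean norm of a dense vector."""
    return sum(v * v for v in vector)

def sparse_sqdist(sparse_doc, model, model_sqnorm):
    """Squared distance between a sparse document and a dense model vector.

    sparse_doc maps coordinate index -> nonzero value. The loop touches only
    those coordinates, using ||x - m||^2 = ||m||^2 - 2 x.m + ||x||^2.
    """
    dot = 0.0
    doc_sqnorm = 0.0
    for i, value in sparse_doc.items():
        dot += value * model[i]
        doc_sqnorm += value * value
    return model_sqnorm - 2.0 * dot + doc_sqnorm

def dense_sqdist(dense_doc, model):
    """Reference computation that visits every coordinate."""
    return sum((x - m) ** 2 for x, m in zip(dense_doc, model))

# Hypothetical 6-dimensional example with two nonzero document coordinates.
model = [0.2, 0.0, 0.5, 0.1, 0.0, 0.9]
model_sqnorm = sqnorm(model)  # would be cached per map unit
sparse_doc = {2: 1.0, 5: 3.0}
dense_doc = [0.0, 0.0, 1.0, 0.0, 0.0, 3.0]

fast = sparse_sqdist(sparse_doc, model, model_sqnorm)
slow = dense_sqdist(dense_doc, model)
```

In an actual training loop the cached norms would have to be refreshed whenever a model vector is updated, so the saving applies mainly to the winner search.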
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
1997
Cited by 20 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
Fast Winner Search for SOM-Based Monitoring and Retrieval of High-Dimensional Data
In Proceedings of ICANN99, Ninth International Conference on Artificial Neural Networks, 1999
Cited by 20 (0 self)
Self-Organizing Maps (SOMs) are widely used in engineering and data-analysis tasks, but so far rarely in very large-scale problems. The reason is the amount of computation: while small SOMs can be computed starting from the basic principles, rapid computation of large maps of high-dimensional data requires special methods. Winner search, finding the position of a data sample on the map, is the computational bottleneck: comparison between the data vector and all of the model vectors of the map is required. In this paper a method is proposed for reducing the amount of computation by restricting the search to certain small-dimensional subspaces of the original space. The method is suitable for applications in which the map can be computed offline, for instance in data monitoring, classification, and information retrieval. In a case study with the WEBSOM system that organizes text document collections on a SOM, the amount of computation was reduced to about 14% of the original, and even to ...
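To make the bottleneck concrete, the sketch below contrasts exhaustive winner search with a generic pruning scheme that first measures distance in a few coordinates only. This is a standard partial-distance bound, shown as an illustration of searching in a low-dimensional subspace; it is not the method proposed in the paper, whose details this abstract does not give. The choice of the first four coordinates as the subspace and all names are assumptions.

```python
import random

def sqdist(a, b, dims=None):
    """Squared Euclidean distance, optionally restricted to some coordinates."""
    idx = range(len(a)) if dims is None else dims
    return sum((a[i] - b[i]) ** 2 for i in idx)

def winner_full(models, x):
    """Exhaustive winner search: compare x against every model vector."""
    return min(range(len(models)), key=lambda i: sqdist(models[i], x))

def winner_pruned(models, x, sub_dims):
    """Winner search that ranks candidates by distance in a subspace first.

    The squared distance over a subset of coordinates never exceeds the full
    squared distance, so once the subspace distance of the next candidate
    reaches the best full distance found so far, the search can stop.
    Returns (winner index, number of full distance evaluations).
    """
    ranked = sorted((sqdist(m, x, sub_dims), i) for i, m in enumerate(models))
    best_index, best_dist, n_full = -1, float("inf"), 0
    for lower_bound, i in ranked:
        if lower_bound >= best_dist:
            break
        dist = sqdist(models[i], x)
        n_full += 1
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, n_full

# Hypothetical map: 200 model vectors in 16 dimensions, one query vector.
rng = random.Random(1)
models = [[rng.random() for _ in range(16)] for _ in range(200)]
query = [rng.random() for _ in range(16)]
exact = winner_full(models, query)
pruned, n_full = winner_pruned(models, query, sub_dims=range(4))
```

How much work the pruning saves depends on how informative the chosen coordinates are; on unstructured random data like this the saving may be small.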
Recent Advances with the Growing Hierarchical Self-Organizing Map
2001
Cited by 12 (5 self)
We present our recent work on the Growing Hierarchical Self-Organizing Map, a dynamically growing neural network model which evolves into a hierarchical structure according to the necessities of the input data during an unsupervised training process. The benefits of this novel architecture are shown by organizing a real-world document collection according to semantic similarities.
A New Approach to Hierarchical Clustering and Structuring of Data with Self-Organizing Maps
Journal of Intelligent Data Analysis, 2003
Cited by 7 (2 self)
The Self-Organizing Map (SOM) is a powerful tool for exploratory data analysis which has been employed in a wide range of data mining applications. We present a novel approach...
Organizing And Exploring High-Dimensional Data With The Growing Hierarchical Self-Organizing Map
Proceedings of the 1st International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2002), 2002
Cited by 4 (0 self)
The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are caused, on the one hand, by the static architecture of this model, as well as, on the other hand, by the limited capabilities for the representation of hierarchical relations of the data. With our Growing Hierarchical Self-Organizing Map we present an artificial neural network model with hierarchical architecture composed of independent growing self-organizing maps to address both limitations. The motivation is to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. The benefits of this neural network are first, a problem-dependent architecture, and second, the intuitive representation of hierarchical relations in the data. This is especially appealing in exploratory data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.