Results 1 - 10 of 15
Self Organization of a Massive Document Collection
- IEEE Transactions on Neural Networks
Cited by 183 (14 self)
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents, we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents.
I. Introduction
A. From simple searches to browsing of self-organized data collections
Locating documents on the basis of keywords and simple search expressions is a c...
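The feature-vector construction named in this abstract (random projection of weighted word histograms) can be sketched in Python. This is a minimal illustration under stated assumptions, not the system's implementation: word weights are supplied by the caller, each word is mapped to a fixed set of five random output dimensions (a sparse random projection; the count of five is an assumption), and `project_document` and `word_code` are hypothetical names. Only the 500-dimensional output size comes from the abstract.

```python
import math
import random
from collections import Counter


def word_code(word, dim, ones, seed):
    """Hypothetical helper: give each word a fixed random set of `ones` output
    dimensions, i.e. one sparse column of a random projection matrix."""
    rng = random.Random(f"{seed}:{word}")  # str seeds are deterministic across runs
    return rng.sample(range(dim), ones)


def project_document(tokens, weights, dim=500, ones=5, seed=0):
    """Map a document to a `dim`-dimensional vector by randomly projecting its
    weighted word histogram, then normalize the result to unit length."""
    histogram = Counter(tokens)
    vec = [0.0] * dim
    for word, count in histogram.items():
        value = count * weights.get(word, 1.0)  # weighted word count (assumed weighting)
        for j in word_code(word, dim, ones, seed):
            vec[j] += value
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

Because each word always lands on the same output dimensions, documents sharing vocabulary get overlapping vectors, while the output size stays fixed no matter how large the vocabulary grows.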
Clustering of the Self-Organizing Map
, 2000
Cited by 103 (0 self)
Abstract
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and of partitive clustering using k-means is investigated. The two-stage procedure (first using the SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data, and to reduce the computation time.
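A sketch of the second stage of the two-stage procedure, assuming the SOM prototypes are already trained: k-means (Lloyd's algorithm) groups the prototypes, and each data vector then inherits the cluster of its best-matching unit. The function names and the random initialization are illustrative choices, not taken from the paper.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with centers initialized from k random points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def two_stage_cluster(data, prototypes, k, seed=0):
    """Cluster the SOM prototypes, then label each data vector with the
    cluster of its best-matching unit (BMU)."""
    _, proto_labels = kmeans(prototypes, k, seed=seed)
    bmus = [min(range(len(prototypes)), key=lambda i: sq_dist(x, prototypes[i])) for x in data]
    return [proto_labels[b] for b in bmus]
```

The saving comes from running k-means over the (much smaller) set of prototypes instead of over every data vector; only the final BMU lookup touches the full data.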
Data Exploration Using Self-Organizing Maps
- ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82
, 1997
Cited by 93 (4 self)
Abstract
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
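To make the map-quality discussion concrete, here is a small online SOM together with two widely used quality measures, average quantization error and topographic error. These are common measures and are not necessarily the ones proposed in this work; the rectangular grid, Gaussian neighborhood, linear decay schedules, and all function names are assumptions.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(x, weights):
    """Index of the best-matching unit (closest model vector) for x."""
    return min(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM on a rows x cols rectangular grid with a Gaussian
    neighborhood; learning rate and radius shrink linearly during training."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]  # initialize from random samples
    sigma0 = sigma0 or max(rows, cols) / 2
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            wr, wc = grid[bmu(x, weights)]
            for i, (r, c) in enumerate(grid):
                h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma * sigma))
                weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(weights[i], x)]
            t += 1
    return grid, weights


def quantization_error(data, weights):
    """Average Euclidean distance from each sample to its best-matching unit."""
    return sum(math.sqrt(sq_dist(x, weights[bmu(x, weights)])) for x in data) / len(data)


def topographic_error(data, grid, weights):
    """Fraction of samples whose two closest units are not adjacent on the grid
    (4-neighborhood); requires at least two units."""
    errors = 0
    for x in data:
        first, second = sorted(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))[:2]
        (r1, c1), (r2, c2) = grid[first], grid[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(data)
```

Quantization error rewards fitting the data closely, while topographic error penalizes maps that fold; reporting both guards against optimizing one at the expense of the other.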
The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data
- IEEE Transactions on Neural Networks
, 2002
Cited by 27 (3 self)
Abstract
The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data, as in data mining applications. However, at least two limitations have to be noted: the static architecture of this model on the one hand, and its limited capabilities for representing hierarchical relations in the data on the other. With our novel Growing Hierarchical Self-Organizing Map presented in this paper, we address both limitations. The Growing Hierarchical Self-Organizing Map is an artificial neural network model with a hierarchical architecture composed of independent growing self-organizing maps. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are, first, a problem-dependent architecture and, second, the intuitive representation of hierarchical relations in the data. This is especially appealing in exploratory data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.
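The growth behavior described above can be sketched as two threshold tests, using a breadth parameter `tau1` and a depth parameter `tau2` as the Growing Hierarchical SOM is commonly described. This is a hedged illustration: the error definitions here (mean Euclidean distance to a unit or to the data mean) and the function names are assumptions, and the paper's exact criteria, and its rule for where to insert new rows or columns, may differ.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mqe(vectors, center):
    """Mean quantization error: average distance of the vectors to a center."""
    return sum(dist(v, center) for v in vectors) / len(vectors)


def map_should_grow(unit_errors, parent_error, tau1):
    """Breadth control: keep inserting rows/columns while the map's mean unit
    error is still at least tau1 times the error of its parent unit."""
    map_mqe = sum(unit_errors) / len(unit_errors)
    return map_mqe >= tau1 * parent_error


def units_to_expand(unit_errors, qe0, tau2):
    """Depth control: units whose error exceeds tau2 * qe0 receive a child map
    on the next layer; qe0 is the error of the whole data set around its mean."""
    return [i for i, e in enumerate(unit_errors) if e > tau2 * qe0]
```

A smaller `tau1` yields wider maps before growth stops, and a smaller `tau2` yields deeper hierarchies, so the two parameters trade breadth against depth.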
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation
- Communication Cognition and Artificial Intelligence, Spring
, 1998
Cited by 23 (5 self)
Abstract
The rapid proliferation of textual and multimedia online databases, digital libraries, Internet servers, and intranet services has turned researchers' and practitioners' dream of creating an information-rich society into a nightmare of info-gluts. Many researchers believe that turning an info-glut into a useful digital library requires automated techniques for organizing and categorizing large-scale information. This paper presents research in which we sought to develop a scalable textual classification and categorization system based on Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation. Our proposed data structure and algorithm took advantage of the sparsity of coordinates in the document input vectors and reduced the SOM computational complexity by several orders of magnitude. The proposed Scalable SOM (SSOM) algorithm makes large-scale textual categorization tasks a possibility. A...
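One way to exploit input sparsity in a SOM is sketched below: each weight vector is stored as a scalar times a dense array, so moving it toward a sparse input touches only that input's nonzero coordinates, and its squared norm is kept up to date so distances need only a sparse dot product. This illustrates the general idea under that assumption; it is not claimed to be the SSOM paper's own data structure, and the class name is hypothetical.

```python
class SparseFriendlyUnit:
    """One SOM weight vector stored as w = scale * v, so the update
    w <- (1 - a) * w + a * x costs O(nnz(x)) instead of O(dim)."""

    def __init__(self, initial):
        self.v = list(initial)
        self.scale = 1.0
        self.norm2 = sum(t * t for t in initial)  # ||w||^2, maintained incrementally

    def dot(self, x):
        """w . x for a sparse input x given as {index: value}."""
        return self.scale * sum(self.v[i] * xi for i, xi in x.items())

    def sq_distance(self, x):
        """||x - w||^2 = ||x||^2 - 2 w.x + ||w||^2, using only nonzeros of x."""
        x_norm2 = sum(xi * xi for xi in x.values())
        return x_norm2 - 2.0 * self.dot(x) + self.norm2

    def update(self, x, alpha):
        """Move w toward x by the fraction alpha (requires 0 <= alpha < 1)."""
        wx = self.dot(x)
        x_norm2 = sum(xi * xi for xi in x.values())
        self.norm2 = ((1 - alpha) ** 2 * self.norm2
                      + 2 * alpha * (1 - alpha) * wx
                      + alpha ** 2 * x_norm2)
        self.scale *= 1 - alpha          # shrinks every coordinate at once in O(1)
        for i, xi in x.items():          # adds alpha * x on the nonzeros only
            self.v[i] += alpha * xi / self.scale
        if self.scale < 1e-6:            # rarely fold the scale back into v, O(dim)
            self.v = [self.scale * t for t in self.v]
            self.scale = 1.0

    def dense(self):
        return [self.scale * t for t in self.v]
```

For document vectors with a few hundred nonzero terms out of tens of thousands of dimensions, both the distance and the update then scale with the number of nonzeros rather than with the vocabulary size.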
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
Cited by 19 (8 self)
Abstract
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. Training the phoneme HMMs on the collected speech samples is difficult because of the natural variation in speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
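For the mixture output densities, the log-likelihood of a feature vector under one HMM state can be computed as below. This is a minimal sketch assuming diagonal covariances and caller-supplied means, variances, and mixture weights; in the setting described above the means might be initialized from the SOM units trained on that state's samples, but that wiring and the function names are assumptions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_mixture_density(x, means, variances, weights):
    """log sum_k c_k N(x; mu_k, diag(var_k)), computed with log-sum-exp so
    that very small component densities do not underflow to zero."""
    logs = [math.log(c) + log_gaussian_diag(x, m, v)
            for c, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(lg - top) for lg in logs))
```

Working in the log domain matters for speech features: products of many per-frame densities along an HMM path would otherwise underflow quickly.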
Fast Winner Search for SOM-Based Monitoring and Retrieval of High-Dimensional Data
- In Proceedings of ICANN99, Ninth International Conference on Artificial Neural Networks
, 1999
Cited by 17 (0 self)
Abstract
Self-Organizing Maps (SOMs) are widely used in engineering and data-analysis tasks, but so far rarely in very large-scale problems. The reason is the amount of computation: while small SOMs can be computed starting from the basic principles, rapid computation of large maps of high-dimensional data requires special methods. Winner search, finding the position of a data sample on the map, is the computational bottleneck: comparison between the data vector and all of the model vectors of the map is required. In this paper a method is proposed for reducing the amount of computation by restricting the search to certain small-dimensional subspaces of the original space. The method is suitable for applications in which the map can be computed offline, for instance in data monitoring, classification, and information retrieval. In a case study with the WEBSOM system that organizes text document collections on a SOM, the amount of computation was reduced to about 14% of the original, and even to ...
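A generic two-stage winner search in the spirit of restricting the search to a small subspace: rank all units by distance over a few coordinates, then compute full distances only for a shortlist. This is an approximate illustration, not the paper's method; taking the subspace as a plain subset of coordinates, the shortlist size, and the function names are assumptions.

```python
def sq_dist(a, b, dims=None):
    """Squared distance over all coordinates, or only over the indices in dims."""
    idx = range(len(a)) if dims is None else dims
    return sum((a[i] - b[i]) ** 2 for i in idx)


def exhaustive_winner(x, models):
    """Reference winner search: full distance to every model vector."""
    return min(range(len(models)), key=lambda i: sq_dist(x, models[i]))


def shortlist_winner(x, models, subspace, n_candidates):
    """Rank all units by distance in a small subspace, then compute full
    distances only for the n_candidates best-ranked units (approximate)."""
    ranked = sorted(range(len(models)), key=lambda i: sq_dist(x, models[i], subspace))
    candidates = ranked[:n_candidates]
    return min(candidates, key=lambda i: sq_dist(x, models[i]))
```

With N units, dimension d, and a subspace of size m, the cost drops from about N*d to N*m + n_candidates*d, at the risk of missing the true winner when the shortlist is too short.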
Recent Advances with the Growing Hierarchical Self-Organizing Map
, 2001
Cited by 11 (5 self)
Abstract
We present our recent work on the Growing Hierarchical Self-Organizing Map, a dynamically growing neural network model which evolves into a hierarchical structure according to the necessities of the input data during an unsupervised training process. The benefits of this novel architecture are shown by organizing a real-world document collection according to semantic similarities.
A New Approach to Hierarchical Clustering and Structuring of Data with Self-Organizing Maps
- Journal of Intelligent Data Analysis
, 2003
Cited by 7 (2 self)
Abstract
The Self-Organizing Map (SOM) is a powerful tool for exploratory data analysis which has been employed in a wide range of data mining applications. We present a novel approach...
Organizing And Exploring High-Dimensional Data With The Growing Hierarchical Self-Organizing Map
- Proceedings of the 1st International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2002)
, 2002
Cited by 3 (0 self)
Abstract
The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are caused, on the one hand, by the static architecture of this model, as well as, on the other hand, by the limited capabilities for the representation of hierarchical relations of the data. With our Growing Hierarchical Self-Organizing Map we present an artificial neural network model with hierarchical architecture composed of independent growing self-organizing maps to address both limitations. The motivation is to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. The benefits of this neural network are first, a problem-dependent architecture, and second, the intuitive representation of hierarchical relations in the data. This is especially appealing in exploratory data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.

