## Clustering of the Self-Organizing Map (2000)

Citations: 159 (1 self)

### BibTeX

@MISC{Vesanto00clusteringof,

author = {Juha Vesanto and Esa Alhoniemi},

title = {Clustering of the Self-Organizing Map},

year = {2000}

}

### Abstract

The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space on prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure---first using the SOM to produce the prototypes that are then clustered in the second stage---is found to perform well when compared with direct clustering of the data and to reduce the computation time.
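The two-stage procedure from the abstract can be sketched in miniature: train a small SOM so that its prototype vectors summarize the data, then run k-means on the prototypes instead of on the raw samples. The following is an illustrative toy (a 1-D map with a Gaussian neighborhood and plain Lloyd iterations), not the authors' implementation:

```python
import math
import random

def train_som_1d(data, n_units, n_epochs=30, seed=0):
    """Toy batch SOM on a 1-D grid: each prototype becomes a
    neighborhood-weighted average of the data whose BMU lies nearby."""
    rng = random.Random(seed)
    dim = len(data[0])
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(n_epochs):
        sigma = max(0.5, (n_units / 2.0) * (1.0 - epoch / n_epochs))  # shrinking radius
        num = [[0.0] * dim for _ in range(n_units)]
        den = [0.0] * n_units
        for x in data:
            # best-matching unit (BMU) = nearest prototype in input space
            b = min(range(n_units),
                    key=lambda j: sum((x[k] - protos[j][k]) ** 2 for k in range(dim)))
            for j in range(n_units):
                h = math.exp(-((j - b) ** 2) / (2.0 * sigma ** 2))  # grid neighborhood
                den[j] += h
                for k in range(dim):
                    num[j][k] += h * x[k]
        for j in range(n_units):
            if den[j] > 0:
                protos[j] = [num[j][k] / den[j] for k in range(dim)]
    return protos

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a (small) set of points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((p[d] - centers[c][d]) ** 2 for d in range(len(p))))
            groups[i].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if its group emptied out
                centers[i] = [sum(p[d] for p in g) / len(g) for d in range(len(g[0]))]
    return centers

# Stage 1: many prototypes summarize the data; Stage 2: cluster the prototypes.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]
protos = train_som_1d(blob_a + blob_b, n_units=10)
centers = kmeans(protos, k=2)
```

The point of the second stage working on only 10 prototypes rather than 200 samples is exactly the computational saving the abstract claims.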

### Citations

3238 |
Self-Organizing Maps
- Kohonen
- 2001
(Show Context)
Citation Context ...onsidered. For this reason, efficient visualizations and summaries are essential. In this article, we focus on clusters, since they are important characterizations of data. The Self-Organizing Map (SOM) [2] is especially suitable for data survey because it has prominent visualization properties. It creates a set of prototype vectors representing the data set and carries out a topology preserving project... |

476 |
Fast learning in networks of locally-tuned processing units
- Moody, Darken
- 1989
(Show Context)
Citation Context ...e BMU of vector x_i, and evaluated for unit j. If the neighborhood kernel value is one for the BMU and zero elsewhere, this leads to minimization of Eq. 1: the SOM reduces to the adaptive k-means algorithm [18]. If this is not the case, from Eq. 6 it follows that the prototype vectors are not in the centroids of their Voronoi sets, but are local averages of all vectors in the data set weighted by neighborho... |

466 |
Mixture Models: Inference and Applications to Clustering
- McLachlan, Basford
- 1988
(Show Context)
Citation Context ... exactly one cluster. Fuzzy clustering [4] is a generalization of crisp clustering where each sample has a varying degree of membership in all clusters. Clustering can also be based on mixture models [5]. In this approach, the data are assumed to be generated by several parametrized distributions, typically Gaussians. Distribution parameters are estimated using, for example, the Expectation-Maximization algo... |

392 |
A nonlinear mapping for data structure analysis
- Sammon
- 1969
(Show Context)
Citation Context ... can be visualized while still preserving its essential topological properties. Examples of such nonlinear projection methods include multi-dimensional scaling techniques [39], [40], Sammon's mapping [41] and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based ... |

328 |
Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis
- KRUSKAL
- 1964
(Show Context)
Citation Context ...the high-dimensional data set can be visualized while still preserving its essential topological properties. Examples of such nonlinear projection methods include multi-dimensional scaling techniques [39], [40], Sammon's mapping [41] and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors ... |

298 |
A cluster separation measure
- Davies, Bouldin
- 1979
(Show Context)
Citation Context ...best one among different partitionings, each of these can be evaluated using some kind of validity index. Several indices have been proposed [6], [12]. In our simulations, we used the Davies-Bouldin index [13], which uses S_c for within-cluster distance and d_ce for between-clusters distance. According to the Davies-Bouldin validity index, the best... |
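The Davies-Bouldin index referred to here combines within-cluster scatter S_c and between-cluster centroid distance d_ce: it averages, over clusters, the worst-case ratio (S_i + S_j)/d_ij, so lower values indicate better partitionings. A small self-contained sketch of the standard definition (not the authors' exact code):

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index for a crisp partitioning.
    clusters: list of clusters, each a non-empty list of points (tuples).
    S_i = mean distance of a cluster's members to its centroid;
    d_ij = distance between centroids. Lower is better."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    cents = [tuple(sum(p[d] for p in c) / len(c) for d in range(len(c[0])))
             for c in clusters]
    S = [sum(dist(p, cents[i]) for p in c) / len(c)
         for i, c in enumerate(clusters)]
    C = len(clusters)
    return sum(max((S[i] + S[j]) / dist(cents[i], cents[j])
                   for j in range(C) if j != i)
               for i in range(C)) / C
```

In a model-selection loop one computes the index for each candidate number of clusters and keeps the partitioning with the smallest value.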

285 |
An Examination of Procedures for Determining the Number of Clusters in a Data Set
- Milligan, Cooper
- 1985
(Show Context)
Citation Context ... k-means tries to find spherical clusters. To select the best one among different partitionings, each of these can be evaluated using some kind of validity index. Several indices have been proposed [6], [12]. In our simulations, we used the Davies-Bouldin index [13], which uses S_c for within-cluster distance and d_ce for between-clusters distanc... |

209 |
Data Preparation for Data Mining
- Pyle
- 1999
(Show Context)
Citation Context ... the other hand, data modeling without good understanding and careful preparation of the data leads to problems. Finally, the whole mining process is meaningless if the new knowledge will not be used [1]. The purpose of the survey is to gain insight into the data (possibilities and problems) to determine whether the data are sufficient and to select the proper preprocessing and modeling tools. Typically, ... |

207 | Self organization of a massive document collection
- Kohonen, Kaski, et al.
- 2000
(Show Context)
Citation Context ...er hand, the complexity scales quadratically with the number of map units. Thus, training huge maps is time consuming, although the process can be speeded up with special techniques; see, for example, [33] and [34]. For example, in [33], a SOM with a million units was trained with 6.8 million 500-dimensional data vectors. If desired, some vector quantization algorithm, e.g., k-means, can be used instead o... |

201 |
Nonmetric multidimensional scaling: a numerical method
- Kruskal
- 1964
(Show Context)
Citation Context ...gh-dimensional data set can be visualized while still preserving its essential topological properties. Examples of such nonlinear projection methods include multi-dimensional scaling techniques [39], [40], Sammon's mapping [41] and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], ... |

199 |
Asymptotically optimal block quantization
- Gersho
(Show Context)
Citation Context ...n shown that the density of the prototype vectors is proportional to const * p(x)^(d/(d+r)), where p(x) is the probability density function (pdf) of the input data, d is the dimension and r is the distance norm [28], [29]. For the SOM, the connection between the prototypes and the pdf of the input data has not been derived in the general case. However, a similar power law has been derived in the one-dimensional case [30]. E... |

150 | Curvilinear component analysis: a self-orgaaiziog neural network for nonlinear mapping of &ta sets
- Demartines, Hérault
- 1997
(Show Context)
Citation Context ...g its essential topological properties. Examples of such nonlinear projection methods include multi-dimensional scaling techniques [39], [40], Sammon's mapping [41] and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based on the visualization, one can select clu... |

141 |
Chameleon: Hierarchical clustering using dynamic modeling
- Karypis, Han, et al.
- 1999
(Show Context)
Citation Context ...o a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8], or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms: The two main ways to cluster data, i.e., make the partitioning, are hierarc... |

122 |
A “neural-gas” network learns topologies
- Martinetz, Schulten
- 1991
(Show Context)
Citation Context ...f desired, some vector quantization algorithm, e.g. k-means, can be used instead of SOM in creating the first abstraction level. Other possibilities include: minimum spanning tree SOM [19], neural gas [20], growing cell structures [21] and competing SOMs [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. Their... |

121 |
Asymptotic quantization error of continuous signals and the quantization dimension
- Zador
- 1982
(Show Context)
Citation Context ...n that the density of the prototype vectors is proportional to const * p(x)^(d/(d+r)), where p(x) is the probability density function (pdf) of the input data, d is the dimension and r is the distance norm [28], [29]. For the SOM, the connection between the prototypes and the pdf of the input data has not been derived in the general case. However, a similar power law has been derived in the one-dimensional case [30]. Even th... |

118 |
Kohonen’s self organizing feature map for exploratory data analysis
- Ultsch, Simeon
- 1990
(Show Context)
Citation Context ...s well as their spatial relationships is usually acquired by visual inspection of the map. The most widely used methods for visualizing the cluster structure of the SOM are distance matrix techniques [35], [36], especially the unified distance matrix (U-matrix). The U-matrix shows distances between prototype vectors of neighboring map units. Because they typically have similar prototype vectors, U-matr... |

104 |
Some new indexes of cluster validity
- Bezdek, Pal
- 1998
(Show Context)
Citation Context ...oid linkage d_ce = ||c_k - c_l|| to nearest neighbor. However, the problem is that they are sensitive to noise and outliers. Addition of a single sample to a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8], or a weighted sum of all distances. It has be... |

80 | SOM-based data visualization method
- Vesanto
- 1999
(Show Context)
Citation Context ...ace onto a low-dimensional grid. This ordered grid can be used as a convenient visualization surface for showing different features of the SOM (and thus of the data), for example the cluster structure [3]. However, the visualizations can only be used to obtain qualitative information. To produce summaries (quantitative descriptions of data properties), interesting groups of map units must be selected... |

71 |
A nonlinear projection method based on Kohonen’s topology preserving maps
- Kraaijveld, Mao, et al.
- 1995
(Show Context)
Citation Context ... as their spatial relationships is usually acquired by visual inspection of the map. The most widely used methods for visualizing the cluster structure of the SOM are distance matrix techniques [35], [36], especially the unified distance matrix (U-matrix). The U-matrix shows distances between prototype vectors of neighboring map units. Because they typically have similar prototype vectors, U-matrix is ... |

63 | Growing grid – a self-organizing network with constant neighborhood range and adaptation strength, Neural Processing Letters 2(5
- Fritzke
- 1995
(Show Context)
Citation Context ...M. Also several such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus the visualization would not be very problematic [23], [24], [25], [26], [27]. The SOM variants were not used in this study, because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could nat... |

50 |
Fuzzy Models for Pattern Recognition: Methods That Search for Structure in Data
- Bezdek, Pal
- 1992
(Show Context)
Citation Context ...tering A. Definitions: A clustering Q means partitioning a data set into a set of clusters Q_i, i = 1, ..., C. In crisp clustering, each data sample belongs to exactly one cluster. Fuzzy clustering [4] is a generalization of crisp clustering where each sample has a varying degree of membership in all clusters. Clustering can also be based on mixture models [5]. In this approach, the data are assume... |

46 | Clustering properties of hierarchical selforganizing maps
- Lampinen, Oja
- 1992
(Show Context)
Citation Context ...ctive features of a local data set [7]. Solutions include using more than one neighbor [8], or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms: The two main ways to cluster data, i.e., make the partitioning, are hierarchical and partitive approaches. The hierarchical methods can be further divided into agglomerative and divisive alg... |

43 |
Asymptotic level density for a class of vector quantization processes
- Ritter
- 1991
(Show Context)
Citation Context ...rm [28], [29]. For the SOM, the connection between the prototypes and the pdf of the input data has not been derived in the general case. However, a similar power law has been derived in the one-dimensional case [30]. Even though the law holds only when the number of prototypes approaches infinity and the neighborhood width is very large, numerical experiments have shown that the computational results are relatively a... |

38 |
Self-Organizing Maps: Optimization approaches
- Kohonen
- 1991
(Show Context)
Citation Context ...re is also a batch version of the algorithm where the adaptation coefficient is not used [2]. In the case of a discrete data set and fixed neighborhood kernel, the error function of the SOM can be shown to be [17] E = sum_{i=1}^{N} sum_{j=1}^{M} h_bj ||x_i - m_j||^2, (6) where N is the number of training samples and M is the number of map units. The neighborhood kernel h_bj is centered at unit b, the BMU of vector x_i, and evalu... |
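The error function in Eq. 6 can be evaluated directly: find the BMU b of each sample, then accumulate the neighborhood-weighted squared distances to every prototype. A minimal sketch assuming a 1-D grid index and a Gaussian kernel (the kernel form is an assumption; the text only requires h_bj to be a neighborhood kernel):

```python
import math

def som_error(data, protos, sigma):
    """Evaluate E = sum_i sum_j h_bj * ||x_i - m_j||^2 for a trained map.
    Prototypes are assumed to sit on a 1-D grid (unit j has grid index j),
    with an assumed Gaussian kernel h_bj = exp(-(j - b)^2 / (2 sigma^2))."""
    def sq(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    E = 0.0
    for x in data:
        b = min(range(len(protos)), key=lambda j: sq(x, protos[j]))  # BMU
        for j, m in enumerate(protos):
            E += math.exp(-((j - b) ** 2) / (2.0 * sigma ** 2)) * sq(x, m)
    return E
```

With a kernel that is one at the BMU and zero elsewhere, this collapses to the ordinary quantization (k-means) error.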

32 | Let it grow - self organizing feature maps with problem dependent cell structure
- Fritzke
- 1991
(Show Context)
Citation Context ...ation algorithm, e.g. k-means, can be used instead of SOM in creating the first abstraction level. Other possibilities include: minimum spanning tree SOM [19], neural gas [20], growing cell structures [21] and competing SOMs [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. Their visualization is much less st... |

29 |
A comparison of SOM Neural Network and Hierarchical Clustering Methods
- Mangiameli, Chen, et al.
- 1996
(Show Context)
Citation Context ... especially in the agglomerative methods. Another benefit is noise reduction. The prototypes are local averages of the data and therefore less sensitive to random variations than the original data. In [16] partitive methods (i.e., a small SOM) greatly outperformed hierarchical methods in clustering imperfect data. Outliers are less of a problem since, by definition, there are very few outlier points a... |

29 |
A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems
- Lawrence, Almasi, et al.
- 1999
(Show Context)
Citation Context ...es not require huge amounts of memory (basically just the prototype vectors and the current training vector) and can be implemented both in a neural, on-line learning manner as well as parallelized [32]. On the other hand, the complexity scales quadratically with the number of map units. Thus, training huge maps is time-consuming, although the process can be speeded up with special techniques, see fo... |

27 | Visualizing high-dimensional structure with the incremental grid growing neural network
- Blackmore, Miikkulainen
- 1995
(Show Context)
Citation Context ...lso several such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus the visualization would not be very problematic [23], [24], [25], [26], [27]. The SOM variants were not used in this study, because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally... |

25 |
Domany E: Superparamagnetic clustering of data
- Blatt, Wiseman
- 1998
(Show Context)
Citation Context ...to noise and outliers. Addition of a single sample to a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8], or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms: The two main ways t... |

24 |
Complexity optimized data clustering by competitive neural networks
- Buhmann, Kuehnel
- 1993
(Show Context)
Citation Context ...vide a data set into a number of clusters, typically by trying to minimize some criterion or error function. The number of clusters is usually predefined, but it can also be part of the error function [11]. The algorithm consists of the following steps: 1. determine the number of clusters; 2. initialize the cluster centers; 3. compute the partitioning for the data; 4. compute (update) the cluster centers; 5. if the pa... |

23 |
Phase transition in stochastic self organizing maps
- Graepel, Burger, et al.
- 1997
(Show Context)
Citation Context ...each number of clusters was selected using the error criterion in Eq. 1. Another possibility would have been to use some annealing technique to better avoid local minima of the error function [47], [48], [49]. Fig. 4. Illustration of data sets I (a), II (b), and III (c). Two-dimensional data set I is directly plotted in two dimensions, whereas data sets II and III are projected to the two-dimens... |

22 |
Visualising the clusters on the Self-Organizing Map
- Iivarinen, Kohonen, et al.
- 1994
(Show Context)
Citation Context ...ors of neighboring map units. Because they typically have similar prototype vectors, the U-matrix is actually closely related to the single linkage measure. It can be efficiently visualized using gray shade [37], see for example Figs. 7(a), 11(a), and 12(a). Another visualization method is to display the number of hits in each map unit. Training of the SOM positions interpolating map units between clusters a... |
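A simplified U-matrix of the kind discussed here can be computed as each unit's mean distance to its grid neighbors (the full U-matrix also shows the individual inter-unit distances on an enlarged grid). A sketch assuming a rectangular map stored row-major:

```python
import math

def u_matrix(protos, rows, cols):
    """Simplified U-matrix: for each map unit, the mean Euclidean distance
    between its prototype and the prototypes of its 4-connected grid
    neighbors. protos is a row-major list of length rows*cols."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    U = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            U[r][c] = sum(dist(protos[r * cols + c], protos[nr * cols + nc])
                          for nr, nc in nbrs) / len(nbrs)
    return U
```

High U-matrix values mark cluster borders (neighboring prototypes far apart); low values mark cluster interiors, which is why gray-shade rendering reveals the cluster structure.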

20 |
Fast deterministic self-organizing maps
- Koikkalainen
- 1995
(Show Context)
Citation Context ...and, the complexity scales quadratically with the number of map units. Thus, training huge maps is time-consuming, although the process can be speeded up with special techniques, see for example [33], [34]. For example, in [33], a SOM with a million units was trained with 6.8 million 500-dimensional data vectors. B. Visual inspection: An initial idea of the number of clusters in the SOM as well as their s... |

17 |
Simulated annealing and codebook design
- Vaisey, Gersho
- 1988
(Show Context)
Citation Context ...tioning for each number of clusters was selected using the error criterion in Eq. 1. Another possibility would have been to use some annealing technique to better avoid local minima of the error function [47], [48], [49]. Fig. 4. Illustration of data sets I (a), II (b), and III (c). Two-dimensional data set I is directly plotted in two dimensions, whereas data sets II and III are projected to th... |

16 |
Comparison of SOM point densities based on different criteria
- Kohonen
- 1999
(Show Context)
Citation Context ...er of prototypes approaches infinity and the neighborhood width is very large, numerical experiments have shown that the computational results are relatively accurate even for a small number of prototypes [31]. Based on the close relation between the SOM and k-means, it can be assumed that the SOM roughly follows the density of the training data when not only the number of map units but also the final neighborhood width... |

12 |
Superparamagnetic clustering of data,” Phys
- Blatt, Wiseman, et al.
- 1996
(Show Context)
Citation Context ...to noise and outliers. Addition of a single sample to a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8] or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms: The two main ways to... |

12 |
Improving the learning speed in topological maps of patterns
- Rodrigues, Almeida
- 1990 |

10 |
Clustering of socio-economic data with Kohonen maps
- Varfis, Versino
- 1992
(Show Context)
Citation Context ... number of samples many clustering algorithms, especially hierarchical ones, become intractably heavy. For this reason, it is convenient to cluster a set of prototypes rather than directly the data [15]. Consider clustering N samples using k-means. This involves making several clustering trials with different values for k. The computational complexity is proportional to sum_{k=2}^{Cmax} N*k, where Cmax is p... |
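The complexity argument in this snippet can be made concrete: trying every k from 2 to Cmax on N samples costs on the order of the sum of N*k, whereas the two-stage route pays roughly N*M once to build M prototypes and then runs the same trials on only M points. A back-of-the-envelope sketch (constant factors and iteration counts deliberately omitted):

```python
def direct_cost(n_samples, c_max):
    """Distance computations for running k-means directly on the data
    for every k = 2..c_max (proportionality constants omitted)."""
    return sum(n_samples * k for k in range(2, c_max + 1))

def two_stage_cost(n_samples, n_protos, c_max):
    """Same trials on n_protos prototypes, plus roughly
    n_samples * n_protos for building the prototypes first."""
    return n_samples * n_protos + sum(n_protos * k for k in range(2, c_max + 1))
```

For N = 10000 samples, M = 100 prototypes and Cmax = 20, the direct route needs about 2.09e6 units of work versus about 1.02e6 for the two-stage route, and the gap widens as Cmax grows.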

10 |
Self-organizing map as a new method for clustering and data analysis
- Zhang, Li
- 1993
(Show Context)
Citation Context ...ers. The Voronoi sets of such map units have very few samples ("hits") or may even be empty. This information can be utilized in clustering the SOM by using zero-hit units to indicate cluster borders [38]. Generic vector projection methods can also be used. As opposed to the methods above, these are generally applicable to any set of vectors, for example the original data set. The high-dimensional vec... |

10 |
Interpreting the Kohonen self-organizing map using contiguity-constrained clustering
- Murtagh
- 1995
(Show Context)
Citation Context ...C. SOM clustering: In agglomerative clustering, the SOM neighborhood relation can be used to constrain the possible merges in the construction of the dendrogram [45]. Also, knowledge of interpolating units can be utilized both in agglomerative and partitive clustering by excluding them from the analysis. If this is used together with the neighborhood constraint i... |

10 |
Growing grid—a self-organizing network with constant neighborhood range and adaptation strength
- Fritzke
- 1995
(Show Context)
Citation Context ...ddition, several such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus, the visualization would not be very problematic [23]–[27]. The SOM variants were not used in this study because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally be applied to... |

8 |
Hierarchical self-organizing networks
- Luttrell
- 1989
(Show Context)
Citation Context ...e-level approaches to clustering have been proposed earlier, e.g. in [9]. While extra abstraction levels yield higher distortion, they also effectively reduce the complexity of the reconstruction task [14]. Fig. 2. The first abstraction level is obtained by creating a set of prototype vectors using, e.g., the SOM. Clustering of the S... |

8 |
Self Organization of a Massive Document Collection
- Kohonen, Kaski, Lagus, Salojärvi, Honkela, Paatero, Saarela
- 2000
(Show Context)
Citation Context ...ther hand, the complexity scales quadratically with the number of map units. Thus, training huge maps is time-consuming, although the process can be speeded up with special techniques, see for example [33], [34]. For example, in [33], a SOM with a million units was trained with 6.8 million 500-dimensional data vectors. B. Visual inspection: An initial idea of the number of clusters in the SOM as well as t... |

8 | Coloring that Reveals High-Dimensional Structures in Data
- Kaski, Venna, et al.
- 1999
(Show Context)
Citation Context ... Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based on the visualization, one can select clusters manually. However, this is a tedious process and nothing guarantees that the manual selection is done consistently. Instead, automated ... |

7 |
Improving the learning speed in topological maps of patterns
- Rodriques, Almeida
- 1990
(Show Context)
Citation Context ...such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus the visualization would not be very problematic [23], [24], [25], [26], [27]. The SOM variants were not used in this study, because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally be applied ... |

7 |
Interactive interpretation of hierarchical clustering
- Boudaillier, Hebrail
- 1998
(Show Context)
Citation Context ...r different clusters. In fact, some clusters may be composed of several subclusters; to obtain sensible partitioning of the data, the dendrogram may have to be cut at different levels for each branch [10]. For example, two alternative ways to get three clusters are shown in Fig. 1. Partitive clustering algorithms divide a data set into a number of clusters, typically by trying to minimize some criteri... |
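The hierarchical side discussed in this snippet can be illustrated with a naive agglomerative pass: start from singletons and repeatedly merge the two clusters with the smallest linkage, stopping once the desired number of clusters remains (single linkage shown; centroid or complete linkage would swap the `linkage` function):

```python
def single_linkage(points, k):
    """Naive agglomerative clustering with single linkage:
    merge the two closest clusters until only k clusters remain."""
    def d2(a, b):  # squared Euclidean distance between two points
        return sum((u - v) ** 2 for u, v in zip(a, b))
    def linkage(ci, cj):  # single linkage: closest pair across the clusters
        return min(d2(a, b) for a in ci for b in cj)
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Recording the sequence of merges instead of stopping at k would give the full dendrogram, which can then be cut at different levels per branch as described.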

6 |
Vector quantization codebook generation using simulated annealing
- Flanagan, Morrell, et al.
- 1989
(Show Context)
Citation Context ...g for each number of clusters was selected using the error criterion in Eq. 1. Another possibility would have been to use some annealing technique to better avoid local minima of the error function [47], [48], [49]. Fig. 4. Illustration of data sets I (a), II (b), and III (c). Two-dimensional data set I is directly plotted in two dimensions, whereas data sets II and III are projected to the two-... |

6 |
Self-organizing maps: optimization approaches,” Artificial Neural Networks
- Kohonen
- 1991
(Show Context)
Citation Context ... also a batch version of the algorithm where the adaptation coefficient is not used [2]. In the case of a discrete data set and fixed neighborhood kernel, the error function of the SOM can be shown to be [17] E = sum_{i=1}^{N} sum_{j=1}^{M} h_bj ||x_i - m_j||^2, (6) where N is the number of training samples and M is the number of map units. The neighborhood kernel h_bj is centered at unit b, which is the BMU of vector x_i, and evaluated for unit j. If the neighborhood ke... |

5 |
Clustering with competing self-organizing maps
- Cheng
- 1992
(Show Context)
Citation Context ...eans, can be used instead of SOM in creating the first abstraction level. Other possibilities include: minimum spanning tree SOM [19], neural gas [20], growing cell structures [21] and competing SOMs [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. Their visualization is much less straightforward than that ... |

5 |
Knowledge discovery with supervised and unsupervised self evolving neural networks
- Alahakoon, Halgamuge
- 1998
(Show Context)
Citation Context ...rowing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus the visualization would not be very problematic [23], [24], [25], [26], [27]. The SOM variants were not used in this study, because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally be applied to the... |