### Citations

4591 | Self-Organizing Maps - Kohonen - 1995
Citation Context: ...If desired, some vector quantization algorithm, e.g., k-means, can be used instead of SOM in creating the first abstraction level. Other possibilities include the following. • Minimum spanning tree SOM [19], neural gas [20], growing cell structures [21], and competing SOM’s [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has ...

624 | Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis - Kruskal - 1964
Citation Context: ...the high-dimensional data set can be visualized while still preserving its essential topological properties. Examples of such nonlinear projection methods include multidimensional scaling techniques [39], [40], Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors...

617 | Mixture Models: Inference and Applications to Clustering - McLachlan, Basford - 1988
Citation Context: ...exactly one cluster. Fuzzy clustering [4] is a generalization of crisp clustering where each sample has a varying degree of membership in all clusters. Clustering can also be based on mixture models [5]. In this approach, the data are assumed to be generated by several parametrized distributions (typically Gaussians). Distribution parameters are estimated using, for example, the expectation-maximati...

600 | Fast learning in networks of locally-tuned processing units - Moody, Darken - 1989
Citation Context: ...centered at the unit which is the BMU of the vector and evaluated for each unit. If the neighborhood kernel value is one for the BMU and zero elsewhere, this leads to minimization of (1); the SOM reduces to the adaptive k-means algorithm [18]. If this is not the case, it follows from (6) that the prototype vectors are not the centroids of their Voronoi sets but are local averages of all vectors in the data set weighted by the neighborhood...
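The weighted-average property described in this snippet (the batch form where prototypes are neighborhood-weighted local averages rather than Voronoi centroids) can be sketched as follows. This is my own minimal illustration: the function name, the Gaussian kernel choice, and the grid representation are assumptions, not the cited paper's exact formulation.

```python
import numpy as np

def batch_som_step(data, prototypes, grid, sigma):
    """One batch-SOM update: each prototype becomes a neighborhood-weighted
    average of ALL data vectors. The weight of sample x_j for unit i is
    h(c(x_j), i), where c(x_j) is x_j's best-matching unit (BMU) and h is
    a Gaussian kernel over grid distances. As sigma -> 0, h is ~1 for the
    BMU and ~0 elsewhere, and the update reduces to a k-means centroid step."""
    # BMU index for every data vector
    d = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=2)
    bmu = d.argmin(axis=1)
    # kernel value between each sample's BMU location and every unit location
    g = np.linalg.norm(grid[bmu][:, None, :] - grid[None, :, :], axis=2)
    h = np.exp(-g ** 2 / (2.0 * sigma ** 2))  # shape: (n_samples, n_units)
    # weighted local averages of the whole data set
    return (h.T @ data) / h.sum(axis=0)[:, None]
```

With a very narrow kernel the returned prototypes coincide with the Voronoi-set centroids, which is exactly the k-means limit the snippet describes.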

545 | A nonlinear mapping for data structure analysis - Sammon - 1969
Citation Context: ...the data set can be visualized while still preserving its essential topological properties. Examples of such nonlinear projection methods include multidimensional scaling techniques [39], [40], Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based...

530 | An examination of procedures for determining the number of clusters in a data set - Milligan, Cooper - 1985
Citation Context: ...k-means tries to find spherical clusters. To select the best one among different partitionings, each of these can be evaluated using some kind of validity index. Several indices have been proposed [6], [12]. In our simulations, we used the Davies–Bouldin index [13], which uses a within-cluster distance and a between-clusters distance. According to the Davies–Bouldin validity index, the best clustering m...

511 | A cluster separation measure - Davies, Bouldin - 1979
Citation Context: ...one among different partitionings, each of these can be evaluated using some kind of validity index. Several indices have been proposed [6], [12]. In our simulations, we used the Davies–Bouldin index [13], which uses a within-cluster distance and a between-clusters distance. According to the Davies–Bouldin validity index, the best clustering minimizes the index value, which is averaged over the number of clusters. The Davies–Bouldi...
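As a concrete sketch of the Davies–Bouldin computation described in this snippet: a minimal implementation of my own, assuming Euclidean distances for both the within-cluster scatter and the between-centroid distance (the index admits other choices).

```python
import numpy as np

def davies_bouldin(data, labels, centroids):
    """Davies-Bouldin index (lower is better).

    S[k]   : mean distance of cluster k's samples to its centroid
             (within-cluster distance)
    M[j,k] : distance between centroids j and k (between-clusters distance)
    DB     = (1/C) * sum_k max_{j != k} (S[j] + S[k]) / M[j,k]
    """
    C = len(centroids)
    # within-cluster scatter for each cluster
    S = np.array([
        np.mean(np.linalg.norm(data[labels == k] - centroids[k], axis=1))
        for k in range(C)
    ])
    db = 0.0
    for k in range(C):
        # worst-case (largest) similarity ratio against any other cluster
        ratios = [
            (S[j] + S[k]) / np.linalg.norm(centroids[j] - centroids[k])
            for j in range(C) if j != k
        ]
        db += max(ratios)
    return db / C
```

Evaluating this for each candidate partitioning and picking the minimum is the selection procedure the snippet describes.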

307 | Data preparation for data mining - Pyle - 1999
Citation Context: ...the other hand, data modeling without good understanding and careful preparation of the data leads to problems. Finally, the whole mining process is meaningless if the new knowledge will not be used [1]. The purpose of the survey is to gain insight into the data—possibilities and problems—to determine whether the data are sufficient and to select the proper preprocessing and modeling tools. Typically, s...

256 | Self organization of a massive document collection - Kohonen, Kaski, et al.
Citation Context: ...er hand, the complexity scales quadratically with the number of map units. Thus, training huge maps is time consuming, although the process can be sped up with special techniques; see, for example, [33] and [34]. For example, in [33], a SOM with a million units was trained with 6.8 million 500-dimensional data vectors. If desired, some vector quantization algorithm, e.g., k-means, can be used instead o...

250 | Asymptotically optimal block quantization - Gersho - 1979
Citation Context: ...zation, it has been shown that the density of the prototype vectors is proportional to const · p(x)^(d/(d+r)), where p(x) is the probability density function (p.d.f.) of the input data, d is the dimension, and r is the distance norm [28], [29]. For the SOM, the connection between the prototypes and the p.d.f. of the input data has not been derived in the general case. However, a similar power law has been derived in the 1-D case [30]. Even t...
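For reference, the asymptotic vector quantization result this snippet alludes to (Gersho/Zador) is commonly written as follows; the notation here is the standard one, not taken from the truncated excerpt:

```latex
p_{\text{proto}}(x) \;\propto\; p(x)^{\frac{d}{d+r}}
```

where \(p(x)\) is the input p.d.f., \(d\) the input dimension, and \(r\) the exponent of the distance norm. For squared-error distortion (\(r=2\)) in high dimensions the exponent approaches one, i.e., the prototype density approaches the data density.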

207 | Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets - Demartines, Hérault - 1997
Citation Context: ...g its essential topological properties. Examples of such nonlinear projection methods include multidimensional scaling techniques [39], [40], Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based on the visualization, one can select clu...

199 | Chameleon: Hierarchical clustering using dynamic modeling - Karypis, Han, et al. - 1999
Citation Context: ...o a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8] or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms The two main ways to cluster data—make the partitioning—are hierarchical...

175 | Kohonen’s self organizing feature maps for exploratory data analysis - Ultsch, Siemon - 1990
Citation Context: ...well as their spatial relationships, is usually acquired by visual inspection of the map. The most widely used methods for visualizing the cluster structure of the SOM are distance matrix techniques [35], [36], especially the unified distance matrix (U-matrix). The U-matrix shows distances between prototype vectors of neighboring map units. Because they typically have similar prototype vectors, U-mat...

169 | A “neural-gas” network learns topologies - Martinetz, Schulten - 1991
Citation Context: ...vector quantization algorithm, e.g., k-means, can be used instead of SOM in creating the first abstraction level. Other possibilities include the following. • Minimum spanning tree SOM [19], neural gas [20], growing cell structures [21], and competing SOM’s [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. T...

161 | Some new indexes of cluster validity - Bezdek, Pal - 1998
Citation Context: ...n Table I are based on distance to nearest neighbor. However, the problem is that they are sensitive to noise and outliers. Addition of a single sample to a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8] or a weighted sum of all distances. It has bee...

159 | Asymptotic quantization error of continuous signals and their quantization dimension - Zador - 1982
Citation Context: ...it has been shown that the density of the prototype vectors is proportional to const · p(x)^(d/(d+r)), where p(x) is the probability density function (p.d.f.) of the input data, d is the dimension, and r is the distance norm [28], [29]. For the SOM, the connection between the prototypes and the p.d.f. of the input data has not been derived in the general case. However, a similar power law has been derived in the 1-D case [30]. Even though ...

97 | A Non-linear Projection Method Based on Kohonen’s Topology Preserving Maps - Kraaijveld, Mao, et al. - 1992
Citation Context: ...as their spatial relationships, is usually acquired by visual inspection of the map. The most widely used methods for visualizing the cluster structure of the SOM are distance matrix techniques [35], [36], especially the unified distance matrix (U-matrix). The U-matrix shows distances between prototype vectors of neighboring map units. Because they typically have similar prototype vectors, U-matrix is...

72 | Fuzzy Models for Pattern Recognition: Methods that Search for Structures in Data - Bezdek, Pal, et al. - 1992
Citation Context: ...acceptable. II. CLUSTERING A. Definitions Clustering means partitioning a data set into a set of clusters. In crisp clustering, each data sample belongs to exactly one cluster. Fuzzy clustering [4] is a generalization of crisp clustering where each sample has a varying degree of membership in all clusters. Clustering can also be based on mixture models [5]. In this approach, the data are assume...
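The crisp-versus-fuzzy distinction in this snippet can be made concrete with a membership computation. This is a sketch in the fuzzy c-means style with fuzzifier m; the formula choice is mine for illustration, since the cited book covers a whole family of such methods.

```python
import numpy as np

def fuzzy_memberships(data, centers, m=2.0):
    """Fuzzy-c-means-style membership degrees: sample j's membership in
    cluster k is 1 / sum_l (d_jk / d_jl)^(2/(m-1)). Each row sums to one,
    so every sample belongs to every cluster to a varying degree; crisp
    clustering is the limit where one entry per row is 1 and the rest 0."""
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # avoid division by zero at a center
    inv = d ** (-2.0 / (m - 1.0))     # shape (n_samples, n_clusters)
    return inv / inv.sum(axis=1, keepdims=True)
```

A sample sitting on a cluster center gets membership close to one there; a sample midway between two centers gets 0.5 in each, which is exactly the "varying degree of membership" the snippet contrasts with crisp assignment.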

62 | Clustering properties of hierarchical self-organizing maps - Lampinen, Oja - 1992
Citation Context: ...ective features of a local data set [7]. Solutions include using more than one neighbor [8] or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms The two main ways to cluster data—make the partitioning—are hierarchical and partitive approaches. The hierarchical methods can be further divided into agglomerative and divisive algorit...

49 | A comparison of SOM neural network and hierarchical clustering methods - Mangiameli, SK, et al. - 1996
Citation Context: ...pecially in the agglomerative methods. Another benefit is noise reduction. The prototypes are local averages of the data and, therefore, less sensitive to random variations than the original data. In [16], partitive methods (i.e., a small SOM) greatly outperformed hierarchical methods in clustering imperfect data. Outliers are less of a problem since—by definition—there are very few outlier points, an...

49 | Asymptotic level density for a class of vector quantization processes - Ritter - 1991
Citation Context: ...ce norm [28], [29]. For the SOM, the connection between the prototypes and the p.d.f. of the input data has not been derived in the general case. However, a similar power law has been derived in the 1-D case [30]. Even though the law holds only when the number of prototypes approaches infinity and the neighborhood width is very large, numerical experiments have shown that the computational results are relatively...

38 | A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems - Lawrence, Almasi, et al. - 1999
Citation Context: ...t does not require huge amounts of memory—basically just the prototype vectors and the current training vector—and can be implemented both in a neural, on-line learning manner as well as parallelized [32]. On the other hand, the complexity scales quadratically with the number of map units. Thus, training huge maps is time consuming, although the process can be sped up with special techniques; see, f...

37 | Let it Grow: Self-organizing Feature Maps with Problem Dependent Cell Structure (Proc. ICANN-91, Helsinki, Elsevier Science Publ.) - Fritzke - 1991
Citation Context: ...e.g., k-means, can be used instead of SOM in creating the first abstraction level. Other possibilities include the following. • Minimum spanning tree SOM [19], neural gas [20], growing cell structures [21], and competing SOM’s [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. Their visualization is much les...

32 | Phase transitions in stochastic self-organizing maps - Graepel, Burger, et al. - 1997
Citation Context: ...ning for each number of clusters was selected using the error criterion in (1). Another possibility would have been to use some annealing technique to better avoid local minima of the error function [47]–[49]. To select the best clustering among the partitionings with different numbers of clusters, the Davies–Bouldin validity index (2) was used. In practice, though, it is better to use the index values as ...

30 | Visualizing high-dimensional structure with the incremental grid growing neural network - Blackmore, Miikkulainen - 1995

28 | Complexity optimized data clustering by competitive neural networks - Buhmann, Kühnel - 1993
Citation Context: ...ide a data set into a number of clusters, typically by trying to minimize some criterion or error function. The number of clusters is usually predefined, but it can also be part of the error function [11]. The algorithm consists of the following steps. 1) Determine the number of clusters. 2) Initialize the cluster centers. 3) Compute the partitioning for the data. 4) Compute (update) the cluster centers. 5) If th...
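The numbered steps in this snippet are the generic partitive clustering loop, of which k-means is the standard instance. A minimal sketch follows; the default initialization scheme and the stopping rule are my assumptions, not the cited paper's.

```python
import numpy as np

def kmeans(data, k, init=None, n_iter=100, seed=0):
    """Minimal k-means following the listed steps: 1) the number of
    clusters k is given; 2) initialize the cluster centers; then alternate
    3) partition the data by nearest center and 4) recompute each center
    as the mean of its partition, 5) until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    if init is None:  # 2) default init: k distinct random samples
        init = data[rng.choice(len(data), size=k, replace=False)]
    centers = np.array(init, dtype=float)
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        # 3) assign every sample to its nearest center
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # 5) converged
            break
        labels = new_labels
        # 4) move each center to the mean of its partition (skip empty ones)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels
```

Note that the result depends on the initialization, which is why the surrounding text describes running several clustering trials and picking the best by an error criterion or validity index.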

26 | Visualizing the clusters on the Self-Organizing Map - Iivarinen, Kohonen, et al. - 1994
Citation Context: ...of neighboring map units. Because they typically have similar prototype vectors, the U-matrix is actually closely related to the single linkage measure. It can be efficiently visualized using gray shade [37]; see, for example, Figs. 7(a), 11(a), and 12(a). Another visualization method is to display the number of hits in each map unit. Train...
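The U-matrix described in this snippet can be sketched as follows. This simplified version (my own, for illustration) keeps only one averaged value per map unit over its 4-connected grid neighbors; the full U-matrix also stores each pairwise distance on an enlarged grid.

```python
import numpy as np

def u_matrix(prototypes, n_rows, n_cols):
    """Per-unit U-matrix values: for each map unit, the average distance
    between its prototype vector and the prototypes of its 4-connected
    grid neighbors. High values mark cluster borders; low values mark
    cluster interiors (shown as dark/light gray shades when plotted)."""
    protos = prototypes.reshape(n_rows, n_cols, -1)
    u = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(np.linalg.norm(protos[r, c] - protos[rr, cc]))
            u[r, c] = np.mean(dists)
    return u
```

On a map whose prototypes form two tight groups, the units between the groups get the largest values, which is the border-detection behavior the snippet describes.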

23 | Fast Deterministic Self-Organizing Maps - Koikkalainen - 1995
Citation Context: ...the complexity scales quadratically with the number of map units. Thus, training huge maps is time consuming, although the process can be sped up with special techniques; see, for example, [33] and [34]. For example, in [33], a SOM with a million units was trained with 6.8 million 500-dimensional data vectors. If desired, some vector quantization algorithm, e.g., k-means, can be used instead of SOM in ...

19 | Simulated annealing and codebook design - Vaisey, Gersho - 1988
Citation Context: ...titioning for each number of clusters was selected using the error criterion in (1). Another possibility would have been to use some annealing technique to better avoid local minima of the error function [47]–[49]. To select the best clustering among the partitionings with different numbers of clusters, the Davies–Bouldin validity index (2) was used. In practice, though, it is better to use the index value...

18 | Superparamagnetic clustering of data, Phys. Rev. Lett. - Blatt, Wiseman, et al. - 1996
Citation Context: ...to noise and outliers. Addition of a single sample to a cluster can radically change the distances [6]. To be more robust, the local criterion should depend on collective features of a local data set [7]. Solutions include using more than one neighbor [8] or a weighted sum of all distances. It has been shown that the SOM algorithm implicitly uses such a measure [9]. B. Algorithms The two main ways to...

18 | Comparison of SOM point densities based on different criteria - Kohonen - 1999
Citation Context: ...ned by the two eigenvectors with the largest eigenvalues. The gray dots are data points, and the black crosses are prototype vectors of the SOM trained with the corresponding data. ...small number of prototypes [31]. Based on the close relation between the SOM and k-means, it can be assumed that the SOM roughly follows the density of the training data when not only the number of map units but also the final neighborhood ...

17 | Self-organizing map as a new method for clustering and data analysis - Zhang, Li - 1993
Citation Context: ...ers. The Voronoi sets of such map units have very few samples (“hits”) or may even be empty. This information can be utilized in clustering the SOM by using zero-hit units to indicate cluster borders [38]. Generic vector projection methods can also be used. As opposed to the methods above, these are generally applicable to any set of vectors, for example the original data set. The high-dimensional vec...

14 | Improving the learning speed in topological maps of patterns - Rodrigues, Almeida - 1990

13 | Interactive interpretation of hierarchical clustering - Boudaillier, Hebrail - 1998
Citation Context: ...r different clusters. In fact, some clusters may be composed of several subclusters; to obtain a sensible partitioning of the data, the dendrogram may have to be cut at different levels for each branch [10]. For example, two alternative ways to get three clusters are shown in Fig. 1. Partitive clustering algorithms divide a data set into a number of clusters, typically by trying to minimize some criteri...

13 | Clustering with Competing Self-Organizing Maps - Cheng - 1992
Citation Context: ...instead of SOM in creating the first abstraction level. Other possibilities include the following. • Minimum spanning tree SOM [19], neural gas [20], growing cell structures [21], and competing SOM’s [22] are examples of algorithms where the neighborhood relations are much more flexible and/or the low-dimensional output grid has been discarded. Their visualization is much less straightforward than tha...

12 | Coloring that Reveals High-dimensional Structures - Kaski, Venna, et al. - 1999
Citation Context: ...Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based on the visualization, one can select clusters manually. However, this is a tedious process and nothing guarantees that the manual selection is done consistently. Instead, automated ...

12 | Interpreting the Kohonen self-organizing map using continuity-constrained clustering - Murtagh - 1995
Citation Context: ...Instead, automated methods are needed. C. SOM Clustering In agglomerative clustering, the SOM neighborhood relation can be used to constrain the possible merges in the construction of the dendrogram [45]. In addition, knowledge of interpolating units can be utilized both in agglomerative and partitive clustering by excluding them from the analysis. If this is used together with the neighborhood const...

11 | Clustering of socio-economic data with Kohonen Maps - Varfis, Versino - 1992
Citation Context: ...many clustering algorithms, especially hierarchical ones, become intractably heavy. For this reason, it is convenient to cluster a set of prototypes rather than the data directly [15]. Consider clustering samples using k-means. This involves making several clustering trials with different values for k. The computational complexity is proportional to the number of trials, with a preselected maximum nu...

11 | Growing grid—a self-organizing network with constant neighborhood range and adaptation strength - Fritzke - 1995
Citation Context: ...ddition, several such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus, the visualization would not be very problematic [23]–[27]. The SOM variants were not used in this study because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally be applied to...

10 | Hierarchical self-organizing networks - Luttrell - 1989
Citation Context: ...level approaches to clustering have been proposed earlier, e.g., in [9]. While extra abstraction levels yield higher distortion, they also effectively reduce the complexity of the reconstruction task [14]. The primary benefit of the two-level approach is the reduction of the computational cost. Even with a relatively small number of samples, many clustering algorithms—especially hierarchical...

7 | Self-Organizing Maps: Optimization Approaches, in Artificial Neural Networks - Kohonen - 1991
Citation Context: ...also a batch version of the algorithm where the adaptation coefficient is not used [2]. In the case of a discrete data set and a fixed neighborhood kernel, the error function of the SOM can be shown to be [17], where the sums run over the number of training samples and the number of map units. The neighborhood kernel is centered at the unit which is the BMU of the vector and evaluated for each unit. If the neighborhood ke...

7 | Knowledge discovery with supervised and unsupervised self evolving neural networks - Alahakoon, Halgamuge - 1998
Citation Context: ...on, several such growing variants of the SOM have been proposed where the new nodes do have a well-defined place on the low-dimensional grid, and thus, the visualization would not be very problematic [23]–[27]. The SOM variants were not used in this study because we wanted to select the most commonly used version of the SOM. However, the principles presented in this paper could naturally be applied to the ...

7 | Vector quantization codebook generation using simulated annealing - Flanagan, Morrell, et al. - 1989

5 | SOM-based data visualization methods, Intell. Data Anal. - Vesanto - 1999
Citation Context: ...e onto a low-dimensional grid. This ordered grid can be used as a convenient visualization surface for showing different features of the SOM (and thus of the data), for example, the cluster structure [3]. Manuscript received June 15, 1999; revised November 13, 1999. The authors are with the Neural Networks Research Centre, Helsinki University of Technology, Helsinki, Finland (e-mail: Juha.Vesanto@hut.fi;...

5 | On the use of two traditional statistical techniques to improve the readability - Varfis - 1993
Citation Context: ...[40], Sammon’s mapping [41], and curvilinear component analysis [42]. A special technique is to project the prototype vectors into a color space so that similar map units are assigned similar colors [43], [44]. Of course, based on the visualization, one can select clusters manually. However, this is a tedious process and nothing guarantees that the manual selection is done consistently. Instead, auto...

4 | Self-Organizing Maps, Berlin-Heidelberg - Kohonen - 2001
Citation Context: ...onsidered. For this reason, efficient visualizations and summaries are essential. In this paper, we focus on clusters since they are important characterizations of data. The self-organizing map (SOM) [2] is especially suitable for data survey because it has prominent visualization properties. It creates a set of prototype vectors representing the data set and carries out a topology preserving project...

4 | A neural network which adapts its structure to a given set of patterns - Jockusch - 1990

2 | Industrial Applications of Neural Networks - Simula, Vesanto, et al. - 1999
Citation Context: ...marked by a solid line and results for the prototype vectors of the SOM by a dashed line, respectively. • Data set III consisted of 4205 75-D samples of the technology of pulp and paper mills of the world [46]. Especially in data set III, the original variables had very different scales. Therefore, as part of preprocessing, all variables in each data set were linearly scaled to have zero mean and unit vari...