Data Clustering: A Review
ACM Computing Surveys, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
SOM PAK: The Self-Organizing Map Program Package
1996
Cited by 117 (8 self)
The Self-Organizing Map (SOM) represents the result of a vector quantization algorithm that places a number of reference or codebook vectors into a high-dimensional input data space to approximate to its data sets in an ordered fashion. The SOM PAK program package contains all programs necessary for the correct application of the Self-Organizing Map algorithm in the visualization of complex experimental data. The first version, 1.0, of this program package was published in 1992, and since then the package has been updated regularly to include the latest improvements in the SOM implementations. This report, which contains the last documentation, was prepared for bibliographical purposes. Contents: 1 General; 2 The principle of the SOM; 3 Practical advices for the construction of good maps; 4 Installation of the program package; 4.1 Getting the program code; 4.2 Installation in UNIX; 4.3...
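The vector quantization idea in this abstract can be illustrated with a minimal sketch of SOM training on a one-dimensional chain of units. This is not the SOM PAK code: the function name `train_som`, the linear decay schedules, the Gaussian neighbourhood, and all default parameters are assumptions chosen for brevity.

```python
import math
import random


def train_som(data, n_units, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM (a chain of units) and return its codebook vectors.

    Hypothetical helper for illustration; schedules and defaults are assumptions.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each codebook vector with a randomly chosen data sample.
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    if radius0 is None:
        radius0 = n_units / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks, floor at 0.5
            # Best-matching unit: the codebook vector closest to x.
            bmu = min(
                range(n_units),
                key=lambda u: sum((a - b) ** 2 for a, b in zip(codebook[u], x)),
            )
            # Pull the BMU and its neighbours on the chain towards x.
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    codebook[u][d] += lr * h * (x[d] - codebook[u][d])
            step += 1
    return codebook
```

Because every update moves a codebook vector part of the way towards a data sample, the trained vectors stay inside the range spanned by the data. Real maps are usually two-dimensional grids; the chain is used here only to keep the sketch short.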
Clustering of the Self-Organizing Map
2000
Cited by 104 (0 self)
The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects input space on prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means is investigated. The two-stage procedure (first using the SOM to produce the prototypes, which are then clustered in the second stage) is found to perform well when compared with direct clustering of the data and to reduce the computation time.
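The second stage of the two-stage procedure can be sketched as follows, assuming the first stage has already produced a list of prototype vectors. The prototypes are treated as given (in practice they would be a trained SOM codebook), and the names `kmeans` and `cluster_via_prototypes` are hypothetical helpers, not the authors' implementation.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster_via_prototypes(data, prototypes, k, seed=0):
    """Cluster the prototypes, then give each data item its nearest prototype's label."""
    proto_labels, _ = kmeans(prototypes, k, seed=seed)
    return [
        proto_labels[min(range(len(prototypes)), key=lambda i: sqdist(x, prototypes[i]))]
        for x in data
    ]
```

The computational saving comes from running k-means on the (few) prototypes rather than on the (many) data items; each data item then needs only one nearest-prototype lookup.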
Data Exploration Using Self-Organizing Maps
Acta Polytechnica Scandinavica: Mathematics, Computing and Management in Engineering Series No. 82, 1997
Cited by 93 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
SOM-Based Data Visualization Methods
Intelligent Data Analysis, 1999
Cited by 55 (3 self)
The Self-Organizing Map (SOM) is an efficient tool for visualization of multidimensional numerical data. In this paper, an overview and categorization of both old and new methods for the visualization of the SOM is presented. The purpose is to give an idea of what kind of information can be acquired from different presentations and how the SOM can best be utilized in exploratory data visualization. Most of the presented methods can also be applied in the more general case of first making a vector quantization (e.g. k-means) and then a vector projection (e.g. Sammon's mapping).
Exploration of Text Collections with Hierarchical Feature Maps
1997
Cited by 37 (14 self)
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight into the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and gets specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians providing the necessary familiarity...
Novelty detection using Self-Organizing Maps
In Proc. of ICONIP'97, 1997
Cited by 23 (2 self)
Failure detection in process monitoring involves a classification mainly on the basis of data from normal operation. When a Self-Organizing Map is used for the description of normal system behaviour, a compatibility measure is needed for declaring a map and a dataset as matching. We propose a novel variant of one such measure and investigate the usefulness of existing and novel measures both with synthetic data and in two real world applications. 1 Introduction A typical aspect of fault diagnosis in a process or system is the limited availability of measurement data concerning faulty situations: often it is hard to acquire a dataset representative of the whole "failure space", whereas the normal operation space can be characterized very accurately. It may be valuable to make an accurate representation of the normal (admissible or healthy) behaviour, and detect faults as significant deviations w.r.t. this admissible domain. When there are some measurements from the abnormal situation, g...
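One simple way to detect "significant deviations" from a map of normal behaviour is to threshold the quantization error, as in the sketch below. This is a common baseline and not the compatibility measure proposed in the paper; the codebook is assumed to be already trained on normal data, and the names `fit_threshold`, `is_novel`, and the 0.95 quantile are illustrative assumptions.

```python
import math


def quantization_error(x, codebook):
    """Distance from x to its nearest codebook vector."""
    return min(math.dist(x, w) for w in codebook)


def fit_threshold(normal_data, codebook, quantile=0.95):
    """Pick a threshold as an empirical quantile of errors on normal data.

    The quantile value is an assumption; it trades false alarms against misses.
    """
    errors = sorted(quantization_error(x, codebook) for x in normal_data)
    idx = min(int(quantile * len(errors)), len(errors) - 1)
    return errors[idx]


def is_novel(x, codebook, threshold):
    """Flag x as novel when it lies farther from the map than the threshold."""
    return quantization_error(x, codebook) > threshold
```

Only data from normal operation is needed to fit the threshold, which matches the setting described in the abstract where faulty measurements are scarce.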
Emergence and Categorization of Coordinated Visual Behavior Through Embodied Interaction
1998
Cited by 23 (8 self)
This paper discusses the emergence of sensorimotor coordination for ESCHeR, a 4DOF redundant foveated robot-head, by interaction with its environment. A feedback-error-learning (FEL)-based distributed control provides the system with explorative abilities with reflexes constraining the learning space. A Kohonen network, trained at run-time, categorizes the sensorimotor patterns obtained over ESCHeR's interaction with its environment and enables the reinforcement of frequently executed actions, thus stabilizing the learning activity over time. We explain how the development of ESCHeR's visual abilities (namely gaze fixation and saccadic motion), from a context-free reflex-based control process to a context-dependent, pattern-based sensorimotor coordination can be related to the Piagetian 'stage theory'. Keywords: Foveated active vision, Oculomotor control, Feedback-error-learning, Emergent coordination, Sensorimotor memory. 1. Introduction Human babies are born with a rich set of innate ...
Neural Maps and Topographic Vector Quantization
1999
Cited by 19 (4 self)
Neural maps combine the representation of data by codebook vectors, like a vector quantizer, with the property of topography, like a continuous function. While the quantization error is simple to compute and to compare between different maps, topography of a map is difficult to define and to quantify. Yet, topography of a neural map is an advantageous property, e.g. in the presence of noise in a transmission channel, in data visualization, and in numerous other applications. In this paper we review some conceptual aspects of definitions of topography, and some recently proposed measures to quantify topography. We apply the measures first to neural maps trained on synthetic data sets, and check the measures for properties like reproducibility, scalability, systematic dependence of the value of the measure on the topology of the map, etc. We then test the measures on maps generated for four real-world data sets, a chaotic time series, speech data, and two sets of image data. The measures ...
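The contrast between the two kinds of quality measure can be made concrete with a small sketch. It computes the mean quantization error and, as one simple stand-in for a topography measure, the fraction of data items whose two nearest units are not neighbours on the map (often called the topographic error). That measure is an assumption chosen for brevity and is not necessarily among those the paper reviews; the map is assumed to be a 1-D chain in which unit `i` neighbours units `i - 1` and `i + 1`.

```python
import math


def mean_quantization_error(data, codebook):
    """Average distance from each data item to its nearest codebook vector."""
    return sum(min(math.dist(x, w) for w in codebook) for x in data) / len(data)


def two_best_units(x, codebook):
    """Indices of the nearest and second-nearest codebook vectors to x."""
    order = sorted(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))
    return order[0], order[1]


def topographic_error(data, codebook):
    """Fraction of items whose two best units are not adjacent on a 1-D chain."""
    violations = 0
    for x in data:
        first, second = two_best_units(x, codebook)
        if abs(first - second) != 1:
            violations += 1
    return violations / len(data)
```

Two codebooks containing the same vectors in a different unit order have identical quantization error but can differ in topographic error, which is why quantization error alone cannot assess how well a map preserves neighbourhood structure.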

