## A Survey of Fuzzy Clustering Algorithms for Pattern Recognition (1998)

Citations: 61 (2 self)

### BibTeX

@MISC{Baraldi98asurvey,
  author = {A. Baraldi and P. Blonda},
  title = {A Survey of Fuzzy Clustering Algorithms for Pattern Recognition},
  year = {1998}
}

### Abstract

Clustering algorithms aim at modelling fuzzy (i.e., ambiguous) unlabeled patterns efficiently. Our goal is to propose a theoretical framework where clustering systems can be compared on the basis of their learning strategies. In the first part of this work, the following issues are reviewed: relative (probabilistic) and absolute (possibilistic) fuzzy membership functions and their relationships to the Bayes rule, batch and on-line learning, growing and pruning networks, modular network architectures, topologically perfect mapping, ecological nets and neuro-fuzziness. From this discussion an equivalence between the concepts of fuzzy clustering and soft competitive learning in clustering algorithms is proposed as a unifying framework in the comparison of clustering systems. Moreover, a set of functional attributes is selected for use as dictionary entries in our comparison. In the second part of this paper, five clustering algorithms taken from the literature are reviewed and compared on...
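The abstract's distinction between relative (probabilistic) and absolute (possibilistic) membership functions can be sketched numerically. The code below is an illustrative sketch, not the survey's own; the function names, the fuzzifier `m`, and the scale `eta` are assumptions in the style of FCM and of possibilistic clustering.

```python
import numpy as np

def relative_memberships(x, prototypes, m=2.0):
    """Probabilistic (FCM-style) memberships: constrained to sum to 1
    across clusters, like Bayesian posteriors."""
    d2 = np.array([np.sum((x - p) ** 2) for p in prototypes])
    inv = d2 ** (-1.0 / (m - 1))
    return inv / inv.sum()

def absolute_typicalities(x, prototypes, eta=1.0, m=2.0):
    """Possibilistic typicalities: each cluster is judged independently,
    so the values need not sum to 1."""
    d2 = np.array([np.sum((x - p) ** 2) for p in prototypes])
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1)))

protos = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
x_far = np.array([50.0, 50.0])            # an outlier, far from both clusters
u = relative_memberships(x_far, protos)   # still sums to 1: outlier looks "typical"
t = absolute_typicalities(x_far, protos)  # both values near 0: outlier is flagged
print(u.sum(), t)
```

The contrast is the point made throughout the survey: the relative values always sum to one, so even a distant outlier receives substantial membership, while the absolute values decay with distance.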

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...of detecting the global minimum of a cost function for classification problems, are becoming increasingly popular in finding solutions to both classification and function regression tasks [30], [39], [42]. B. Growing Networks and Pruning Human designers typically have the opportunity to embed task-specific prior knowledge in an inductive learning algorithm, e.g., by setting the topology and the comple... |

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ..., and relationships between SOM and other optimization techniques such as “maximum-entropy” clustering [25], deterministic annealing [26], and the expectation–maximization (EM) optimization algorithm [27], are discussed in [15], [18], [28]–[31]. From a general perspective, it should be remembered that, compared to hard competitive learning, soft competitive learning not only decreases dependency on in... |

5369 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...ates (winner as well as nonwinner) must decrease toward zero in line with the Robbins–Monro theorem (see [1, Sect. III]). Important properties of this cooling schedule have been analyzed in [9], [17]–[19], [21]–[23]. The second heuristic rule applies to the output lattice of processing units and requires the size of the update (resonance) neighborhood centered on the winner node to decrease monotonica... |

3732 | Self-organizing maps
- Kohonen
- 1997
Citation Context ...rature are reviewed, assessed and compared on the basis of the selected properties of interest. These clustering models are 1) on-line learning, static-sizing, static-linking self-organizing map (SOM) [2], [3]; 2) off-line learning, static-sizing, no-linking fuzzy learning vector quantization, FLVQ [4] (which was first called fuzzy Kohonen clustering network (FKCN) [5]); 3) on-line learning, dynamic-s... |

3021 | Learning internal representations by error propagation
- Rumelhart, Hinton, et al.
- 1986
Citation Context ...he learning system is chosen on the basis of a stochastic process [31]. ...the adoption of “small” learning rates [34], [36]. According to the view that on-line procedures are approximations of iterative batch algorithms, learning rate constraints capable of guaranteeing convergence of the iterative batch mode may be appli... |

2497 | A tutorial on support vector machines for pattern recognition
- Burges
- 1998
Citation Context ...ely [38]. In clustering-by-selection the learning algorithm selects prototype vectors , as a subset of the input data set . Input patterns selected as prototype vectors are also called support vectors [39]. Typical application fields of clustering-by-selection algorithms are perceptual grouping, hidden data-structure detection and pattern classification (when the clustering algorithm is integrated in a... |

2334 | Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context ...date neighborhood) centered on the winner PE, such that where is computed by means of (1) such that if then . It is important to observe that (2) is related to on-line (MacQueen’s) k-means [19], [29], [33], whose batch (Lloyd’s or Forgy’s) version [33] is a special case of the EM optimizati... |
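The relation noted in this context can be sketched in code: the on-line (MacQueen-style) update moves only the winner prototype toward each incoming pattern, while the batch (Lloyd/Forgy) version recomputes every prototype as the mean of its Voronoi cell. This is an illustrative sketch, not the paper's code; the data and names are assumptions.

```python
import numpy as np

def online_step(prototypes, x, lr):
    """On-line (MacQueen-style) k-means: hard (WTA) update of the winner PE."""
    w = np.argmin(((prototypes - x) ** 2).sum(axis=1))  # winner prototype
    prototypes[w] += lr * (x - prototypes[w])           # move winner toward x
    return prototypes

def batch_step(prototypes, X):
    """Batch (Lloyd/Forgy) k-means: each prototype becomes the mean of its cell."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)                          # Voronoi assignment
    return np.array([X[labels == k].mean(axis=0) for k in range(len(prototypes))])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),           # two well-separated clusters
               rng.normal(3.0, 0.1, (50, 2))])
P = X[[0, -1]].astype(float).copy()                     # one seed per cluster
for _ in range(5):                                      # a few batch iterations
    P = batch_step(P, X)
print(P.round(2))                                       # near the two cluster means
```

With a Robbins-Monro style learning rate (e.g., `lr = 1/t`), repeated calls to `online_step` approximate the same fixed point the batch iteration reaches directly.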

1780 | Neural networks and physical systems with emergent collective computational abilities
- Hopfield
- 1982
Citation Context ...terizes all natural systems featuring cognitive capabilities [33]. Some artificial neural systems feature none of the biological properties listed above. For example, SOM [5] and the Hopfield network [53] are homogeneous systems; they feature no structured architecture and no supervision or reinforcement by, or feedback from, an external environment (also termed supervisor). In reinforcement learning,... |

764 | Hierarchical mixtures of experts and the em algorithm
- Jordan, Jacobs
- 1994
Citation Context ...pplied mathematics, the principle of tackling a problem by dividing it into simpler subproblems whose solutions can be combined to yield a solution to the complex problem is termed divide and conquer [45]. An application of this strategy can be found in [46] and [47]. In supervised learning, an interesting modular proposal that addresses the major problem of providing effective integration of the syst... |

655 | Graphical models
- Jordan
- 2004
Citation Context ...tional (theoretically computed over an infinite data set) by adapting system parameters on the basis of the finite training set [30], i.e., the learning problem is turned into an optimization problem [31]. When system parameters are learned from training data, there are two classes of learning situations, depending on how data are presented to the learner: the “batch” setting in which data are availab... |

646 | Neural networks and the bias/variance dilemma
- Geman, Bienenstock, et al.
- 1992
Citation Context ...ures (see [1, Sect. V-C]), i.e., an important property of the model that must be “hard-wired or built-in, perhaps to be tuned later by experience, but not learned in any statistically meaningful way” [56]. • In several tests regarding satellite image clustering that are in progress, FOSART features mean square error values larger than those obtained with SOM [57]. D. Advantages • Owing to its soft com... |

557 | A massively parallel architecture for a self-organizing neural pattern recognition machine
- Carpenter, Grossberg
- 1987
Citation Context ...n Table II. IV. FUZZY ART In recent years, several ART-based models have been presented. ART 1 categorizes binary patterns but features sensitivity to the order of presentation of the random sequence [47]. This finding led to the development of the Improved ART 1 system (IART 1), which is less dependent than ART 1 on the order of presentation of the input sequence [48]. The adaptive Hamming net (AHN),... |

393 | Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps
- Carpenter, Grossberg, et al.
- 1992
Citation Context ...ng vector quantization, FLVQ [4] (which was first called fuzzy Kohonen clustering network (FKCN) [5]); 3) on-line learning, dynamic-sizing, no-linking fuzzy adaptive resonance theory (fuzzy ART) [6], [7]; 4) on-line learning, dynamic-sizing, dynamic-linking growing neural gas (GNG) [8], [9]; 5) on-line learning, dynamic-sizing, dynamic-linking fully self-organizing simplified adaptive resonance theor... |

322 | A growing neural gas network learns topologies
- Fritzke
- 1995
Citation Context ...etwork (FKCN) [5]); 3) on-line learning, dynamic-sizing, no-linking fuzzy adaptive resonance theory (fuzzy ART) [6], [7]; 4) on-line learning, dynamic-sizing, dynamic-linking growing neural gas (GNG) [8], [9]; 5) on-line learning, dynamic-sizing, dynamic-linking fully self-organizing simplified adaptive resonance theory (FOSART) [10], [11], based on the fuzzy simplified adaptive resonance theory (fuz... |

309 | Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System
- Carpenter, Grossberg, et al.
- 1991
Citation Context ...ides perceptual grouping and hidden data-structure detection [40], clustering-by-replacement algorithms can also be applied to data requantization tasks, i.e., to detect compact data coding [5], [7], [10], [14], [16], [26], [41]. V. BEYOND ERROR GRADIENT DESCENT: ADVANCED TECHNIQUES FOR LEARNING FROM DATA In recent years, the neural network community has made a considerable effort in the search for le... |

287 | ART2: self-organization of stable category recognition codes for analog input patterns
- Carpenter, Grossberg
- 1987
Citation Context ...me and storage requirement [49]. ART 2, designed to detect regularities in analog random sequences, employs a computationally expensive architecture which presents difficulties in parameter selection [50]. To overcome these difficulties, the fuzzy ART system was developed as a generalization of ART 1 [6], [7]. This means, however, that ART 1-based structural problems may also affect fuzzy ART. The str... |

286 | Neural-Gas Network for Vector Quantization and its Application to Time-Series Prediction
- Martinetz, Berkovich, et al.
- 1993
Citation Context ...ween SOM and other optimization techniques such as “maximum-entropy” clustering [25], deterministic annealing [26], and the expectation–maximization (EM) optimization algorithm [27], are discussed in [15], [18], [28]–[31]. From a general perspective, it should be remembered that, compared to hard competitive learning, soft competitive learning not only decreases dependency on initialization, but also ... |

275 | Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning - Fritzke - 1994 |

262 | Learning from Data: Concepts, Theory, and Methods
- Cherkassky, Mulier
- 1998
Citation Context ...OM and other optimization techniques such as “maximum-entropy” clustering [25], deterministic annealing [26], and the expectation–maximization (EM) optimization algorithm [27], are discussed in [15], [18], [28]–[31]. From a general perspective, it should be remembered that, compared to hard competitive learning, soft competitive learning not only decreases dependency on initialization, but also reduce... |

242 | A possibilistic approach to clustering
- Krishnapuram, Keller
- 1993
Citation Context ...are a fuzzy c-partition of the input space (see [1, Sect. II-A]). • The FLVQ learning rule is given in (5)–(7). It can be observed that if , then . This causes “the relative membership problem of FCM” [45]. It means that since (4) provides membership values that are relative numbers, then noise points and outliers may have significantly high membership values and may severely affect the prototype param... |
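The "relative membership problem of FCM" described in this context can be demonstrated numerically: because memberships are relative numbers (each row sums to 1 over the clusters), a distant outlier still receives roughly 1/c membership in every cluster and drags the prototype estimates. This sketch is illustrative, not the paper's code; the data, the fuzzifier `m`, and the function names are assumptions.

```python
import numpy as np

def fcm_memberships(X, V, m=2.0):
    """FCM relative memberships: each row of U sums to 1 over clusters."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12  # avoid /0
    inv = d2 ** (-1.0 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_prototypes(X, U, m=2.0):
    """FCM prototype update: membership-weighted means of the data."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.1, 0.0],
              [1.05, 40.0]])                 # last row: a far-away outlier
V = np.array([[0.0, 0.0], [2.0, 0.0]])       # two prototypes on the x-axis
U = fcm_memberships(X, V)
print(U[-1])        # outlier: ~[0.5, 0.5], despite being far from both clusters
V_new = fcm_prototypes(X, U)
print(V_new[:, 1])  # both prototypes pulled off the x-axis by the outlier
```

A possibilistic (absolute) typicality would instead assign the outlier near-zero weight in both clusters, which is exactly the motivation for the possibilistic approach cited here.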

232 | Adaptive Pattern Recognition and Neural Networks
- Pao
- 1989
Citation Context ...ges over patterns and index over concepts. Absolute and relative membership types are related by the following equation: Relative typicality values, , must satisfy the following three conditions [8], [18]: 1) ; 2) ; 3) . Constraint 2) is an inherently probabilistic constraint [19], relating values to posterior probability estimates in a Bayesian framework. Because of condition 2), values are relative ... |

189 | Topology representing networks
- Martinetz, Schulten
- 1994
Citation Context ...he input space is larger than three. In fact, SOM tries to form a neighborhood-preserving inverse mapping from lattice to input manifold , but not necessarily a neighborhood-preserving mapping from to [37] (see [1, Sect. VI]). To obtain a topologically correct map by running the SOM algorithm, the topological (adjacency) structure of the preset graph has to match the topological structure of the unknow... |

141 | Statistical mechanics and phase transitions in clustering
- Rose, Gurewitz, et al.
- 1990
Citation Context ...Voronoi polyhedra [9]. Interpretations of this second heuristic rule, and relationships between SOM and other optimization techniques such as “maximum-entropy” clustering [25], deterministic annealing [26], and the expectation–maximization (EM) optimization algorithm [27], are discussed in [15], [18], [28]–[31]. From a general perspective, it should be remembered that, com... |

104 | Self-organizing maps: Ordering, convergence properties and energy functions - Erwin, Obermayer, et al. - 1992 |

91 | Robust Clustering Methods: A Unified View
- Dave, Krishnapuram
- 1997
Citation Context ...Owing to its soft competitive implementation, FOSART is expected to be less prone to being trapped in local minima and less likely to generate dead units than hard competitive alternatives [9], [15], [38]. In the tests provided, the system is stable with respect to small changes in input parameters and in the order of the presentation sequence [10], [11]. • Due to its neuron removal strategy, it is ro... |

84 | Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps
- Williamson
- 1996
Citation Context ...ima. Different expressions in the existing literature and consistent with the definition provided above were found to be useful. These include the following: (2) from [19]; (3) a Gaussian form (Gaussian mixtures; [23], [24]); (4) from [25]; (5) a Gaussian form from [17]; where is assumed to be the Euclidean distance between input pattern and prototype (receptive field center) of the -th category. Variables and are all scale param... |
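The Gaussian-type membership expressions surveyed in this context share one shape: an absolute activation computed from the Euclidean distance between an input pattern and a receptive-field center, scaled by a width parameter. The sketch below is illustrative; the name `gaussian_membership` and the `sigma` default are assumptions, not the paper's notation.

```python
import numpy as np

def gaussian_membership(x, v, sigma=1.0):
    """Absolute (possibilistic-style) membership of input x in the category
    with receptive-field center v and scale parameter sigma."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(v, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

print(gaussian_membership([0, 0], [0, 0]))   # 1.0 at the center
print(gaussian_membership([3, 4], [0, 0]))   # decays smoothly with distance
```

Because the value depends only on the distance to its own center, it behaves as an absolute membership: it is not normalized against the other categories.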

80 | The generalized Gabor scheme of image representation in biological and machine vision
- Porat, Zeevi
- 1988
Citation Context ...onotonically with time, such that a soft competitive learning strategy changes into a hard competitive (WTA) one. This model transition is equivalent to stating that the initial overlap (oversampling [24]) between nodes’ receptive fields must decrease monotonically with time until it is reduced to zero, as hard competitive learning renders receptive fields equivalent to... |

75 | Neural Computation and Self-Organizing Maps - Ritter, Martinetz, et al. - 1992 |

74 | Generalised Clustering Networks and Kohonen's Self-Organising Scheme
- Pal, Bezdek, et al.
- 1993
Citation Context ...ing rates (winner as well as nonwinner) must decrease toward zero in line with the Robbins–Monro theorem (see [1, Sect. III]). Important properties of this cooling schedule have been analyzed in [9], [17]–[19], [21]–[23]. The second heuristic rule applies to the output lattice of processing units and requires the size of the update (resonance) neighborhood centered on the winner node to decrease monot... |

73 | Self-Organizing Maps, 2nd Edition - Kohonen - 1997 |

70 | Perceptrons: expanded edition - Minsky, Papert - 1988 |

61 | Growing Cell Structures—A Self-Organizing Network for Unsupervised and Supervised Learning
- Fritzke
- 1993
Citation Context ...ation system. E. Architectural Features The main features of fuzzy ART are summarized in Table II. V. GNG GNG combines the growth mechanism inherited from the earlier proposed Growing Cell Structures [53] with the synapse generation rule CHR [37]. GNG is capable of generating and removing both lateral connections and PE’s, i.e., GNG belongs to the class of FSONN models (see Part I, Section VI). In par... |

59 | Fuzzy Kohonen clustering networks
- Bezdek, Tsao, et al.
- 1992
Citation Context ... and [46], it is clarified that FLVQ, like SOM, does not optimize any known objective function, and that it is expected to reach termination when the FCM objective function is approximately minimized [5]. In [41]–[43], EFLVQ-F learning schemes are formally derived to minimize a given functional when is constant. It is... |

54 | Design and evolution of modular neural network architectures, Neural Networks
- Happel, Murre
- 1994
Citation Context ...in an entire domain by generalizing its learned behavior to instances not previously encountered [44]. In line with biological learning systems, a classical engineering paradigm consists of partitioning the solution to a problem between several modules specialized in learning a single task, i.e., mod... |

47 | GTM: A principled alternative to the self-organizing map
- Bishop, Svensen, et al.
- 1997
Citation Context ... soft learning strategy is not equivalent to a WTA strategy), one cannot specify a cost function that is minimized by (2), i.e., there exists no cost function yielding (2) as its gradient [15], [16], [32]. SOM instead features a set of potential functions, one for each node, to be independently minimized following a stochastic (on-line) gradient descent [16]. In [31], a cost function that leads to an ... |

41 | Fuzzy min-max neural networks - Part 2: clustering
- Simpson
- 1993
Citation Context ...(12) and (13), fuzzy ART substitutes the operators employed in the ART 1-based activation function and match function with fuzzy-like operations (intersection and cardinality). As observed by Simpson [52], to be correctly interpreted as fuzzy operations, these operations would have to be applied to fuzzy set membership values, rather than to the parameters (pattern and template vectors) of absolute fu... |
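The fuzzy-like operations discussed in this context can be made concrete: fuzzy ART replaces the crisp AND and bit-counting of ART 1 with the componentwise minimum (fuzzy intersection) and the L1 norm (fuzzy cardinality). The sketch below follows the usual fuzzy ART formulation; the particular input, template, `alpha`, and `rho` values are illustrative assumptions.

```python
import numpy as np

def choice(I, w, alpha=0.001):
    """Category activation T_j = |I ^ w_j| / (alpha + |w_j|), where ^ is the
    componentwise min and |.| the L1 norm (fuzzy cardinality)."""
    return np.minimum(I, w).sum() / (alpha + w.sum())

def match(I, w):
    """Match function |I ^ w_j| / |I|, compared against the vigilance rho."""
    return np.minimum(I, w).sum() / I.sum()

I = np.array([0.2, 0.8, 0.8, 0.2])   # analog input (complement-coded form assumed)
w = np.array([0.2, 0.7, 0.8, 0.3])   # stored category template
rho = 0.9                            # vigilance threshold
print(choice(I, w))                  # activation used to pick the winner
print(match(I, w) >= rho)            # resonance test against vigilance
```

Simpson's remark quoted above is that these min/L1 operations act on the pattern and template vectors themselves, i.e., on parameters of membership functions rather than on membership values, so calling them "fuzzy" is a matter of interpretation.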

35 | Signal and Image Processing with Neural Networks
- Masters
- 1994
Citation Context ...om, an external environment (also termed supervisor). In reinforcement learning, the neural system is allowed to react to each training case. It is then told whether its reaction was effective or not [54]. To increase their biological plausibility, artificial neural models should employ differentiated structures provided with dishomogeneous layers, specialized subnets, hierarchies of maps, etc. In par... |

34 | Derivation of a class of training algorithms - Luttrell - 1990 |

34 | A Bayesian analysis of self-organizing maps
- Luttrell
- 1994
Citation Context ...r optimization techniques such as “maximum-entropy” clustering [25], deterministic annealing [26], and the expectation–maximization (EM) optimization algorithm [27], are discussed in [15], [18], [28]–[31]. From a general perspective, it should be remembered that, compared to hard competitive learning, soft competitive learning not only decreases dependency on initialization, but also reduces the prese... |

29 | Fuzzy algorithms for learning vector quantization
- Karayiannis, Pai
- 1996
Citation Context ...antization system. For some authors, “SOM was not intended for pattern classification. Rather, SOM attempts to find topological structures in the input data and display them in one or two dimensions” [39], i.e., SOM can be employed in data visualization tasks because “SOM simply attempts to achieve a consistent spatial mapping of the training vectors to (usually) two dimensions” [40]. More precisely, ... |

26 | Learning without Local Minima in Radial Basis Function Networks
- Bianchini, Frasconi, et al.
- 1995
Citation Context ... addresses the major problem of providing effective integration of the system modules is presented in [45]. Analytically, the importance of developing modular architectures has been stressed in [35], [48], where sufficient (but not necessary) conditions capable of guaranteeing local minima free cost functions are detected, such that a simple gradient descent algorithm can always reach the absolute min... |

25 | Multiple-prototype classifier design
- Bezdek
- 1998
Citation Context ...to [14]. IV. PROTOTYPE VECTOR EDITING SCHEME Basically, clustering algorithms employ two reference vector generation schemes, termed clustering-by-selection and clustering-by-replacement respectively [38]. In clustering-by-selection the learning algorithm selects prototype vectors , as a subset of the input data set . Input patterns selected as prototype vectors are also called support vectors [39]. Ty... |

23 | Comments on “A possibilistic approach to clustering”
- Barni, Cappellini, et al.
- 1996
Citation Context ...ffect the prototype parameter estimate (e.g., [20]). On the other hand, in possibilistic fuzzy clustering, learning rates computed from absolute typicalities tend to produce coincident clusters [20], [22]. This poor behavior can be explained by the fact that cluster prototypes are uncoupled in possibilistic clustering, i.e., possibilistic clustering algorithms try to minimize an objective function by ... |

22 | Self-Organizing Neural Network as a Fuzzy Classifier
- Mitra, Pal
- 1994
Citation Context ...]. However, other authors believe that since “fuzziness can be incorporated at various levels to generate a fuzzy neural network, i.e., it can be at the input, output, learning or neural levels” (see [4], where each input feature is expressed in terms of fuzzy membership values indicating a degree of belonging to each of the linguistic properties low, medium, and high), our “claim of calling certain ... |

22 | Two Soft Relatives of Learning Vector Quantization
- Bezdek, Pal
- 1995
Citation Context ...se clustering models are 1) on-line learning, static-sizing, static-linking self-organizing map (SOM) [2], [3]; 2) off-line learning, static-sizing, no-linking fuzzy learning vector quantization, FLVQ [4] (which was first called fuzzy Kohonen clustering network (FKCN) [5]); 3) on-line learning, dynami... |

22 | An integrated approach to fuzzy learning vector quantization and fuzzy c-means clustering
- Karayiannis, Bezdek
- 1997
Citation Context ... of batch FLVQ algorithms formally defined as a class of cost function minimization schemes. Hereafter, this class of batch vector quantizers will be referred to as the extended FLVQ family (EFLVQ-F) [41]–[43]. FLVQ updating can be seen as a special case of EFLVQ-F learning schemes for a restricted range of the weighting exponent. FLVQ is also related to several on-line fuzzy clustering algorithms suc... |

22 | Repairs to GLVQ: a new family of competitive learning schemes
- Karayiannis, Bezdek, et al.
- 1996
Citation Context ...r as well as nonwinner) must decrease toward zero in line with the Robbins–Monro theorem (see [1, Sect. III]). Important properties of this cooling schedule have been analyzed in [9], [17]–[19], [21]–[23]. The second heuristic rule applies to the output lattice of processing units and requires the size of the update (resonance) neighborhood centered on the winner node to decrease monotonically with ti... |

22 | The LBG-U method for vector quantization - an improvement over LBG inspired from neural networks - Fritzke - 1997 |

21 | Competitive Learning Algorithms for Robust Vector Quantization
- Hofmann, Buhmann
- 1998
Citation Context ...the same as, (1) and (2) is discussed. This cost function was introduced in a nonneural context to design an optimal vector quantizer codebook for encoding data for transmission along a noisy channel [34], [35]. C. Limitations Despite its many successes in practical applications, SOM contains some major deficiencies (many of which are acknowledged in [3]). • Since SOM does not minimize any known objec... |

20 | Online learning and stochastic approximations
- Bottou
- 1998
Citation Context ...winner as well as nonwinner) must decrease toward zero in line with the Robbins–Monro theorem (see [1, Sect. III]). Important properties of this cooling schedule have been analyzed in [9], [17]–[19], [21]–[23]. The second heuristic rule applies to the output lattice of processing units and requires the size of the update (resonance) neighborhood centered on the winner node to decrease monotonically wi... |