Results 1-10 of 426
Survey of clustering algorithms
IEEE Transactions on Neural Networks, 2005
Cited by 270 (3 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
Proceedings of the IEEE, 1998
Cited by 259 (11 self)
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
Variable Neighborhood Search
1997
Cited by 242 (24 self)
Variable neighborhood search (VNS) is a recent metaheuristic for solving combinatorial and global optimization problems whose basic idea is systematic change of neighborhood within a local search. In this survey paper we present basic rules of VNS and some of its extensions. Moreover, applications are briefly summarized. They comprise heuristic solution of a variety of optimization problems, ways to accelerate exact algorithms and to analyze heuristic solution processes, as well as computer-assisted discovery of conjectures in graph theory.
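The "systematic change of neighborhood within a local search" described in this abstract can be illustrated with a minimal sketch of a basic VNS loop. Everything specific here is an assumption made for illustration and is not taken from the paper: the integer search space, the toy objective, the neighborhood definition (an interval of radius 3k around the incumbent), and the function names.

```python
import random


def toy_objective(x):
    # Hypothetical multimodal objective on the integers: local minima at
    # x = 2, 7, 17, ... and a single global minimum at x = 12.
    return abs(x - 12) + 4 * (x % 5 != 2)


def local_search(f, x, step=1):
    """Greedy descent over the two integer neighbors x - step and x + step."""
    improved = True
    while improved:
        improved = False
        for candidate in (x - step, x + step):
            if f(candidate) < f(x):
                x = candidate
                improved = True
                break
    return x


def basic_vns(f, x0, k_max=5, max_iters=200, seed=0):
    """Basic VNS: shake in neighborhood k, descend, then recenter or widen."""
    rng = random.Random(seed)
    best = local_search(f, x0)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            # Shaking: draw a random point from the k-th neighborhood
            # (assumed here to be an interval of radius 3 * k).
            shaken = best + rng.randint(-3 * k, 3 * k)
            candidate = local_search(f, shaken)
            if f(candidate) < f(best):
                best = candidate  # accept and return to the first neighborhood
                k = 1
            else:
                k += 1            # no improvement: try a larger neighborhood
    return best


if __name__ == "__main__":
    print("local search alone:", local_search(toy_objective, 0))
    print("basic VNS:", basic_vns(toy_objective, 0))
```

On this toy objective, plain descent from 0 stalls at the local minimum x = 2, while the shaking step lets the search escape to better basins.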
Adaptive fuzzy segmentation of magnetic resonance images
IEEE Trans. Med. Imag., 1999
Cited by 101 (9 self)
An algorithm is presented for the fuzzy segmentation of two-dimensional (2D) and three-dimensional (3D) multispectral magnetic resonance (MR) images that have been corrupted by intensity inhomogeneities, also known as shading artifacts. The algorithm is an extension of the 2D adaptive fuzzy C-means algorithm (2D AFCM) presented in previous work by the authors. This algorithm models the intensity inhomogeneities as a gain field that causes image intensities to smoothly and slowly vary through the image space. It iteratively adapts to the intensity inhomogeneities and is completely automated. In this paper, we fully generalize 2D AFCM to three-dimensional (3D) multispectral images. Because of the potential size of 3D image data, we also describe a new faster multigrid-based algorithm for its implementation. We show, using simulated MR data, that 3D AFCM yields lower error rates than both the standard fuzzy C-means (FCM) algorithm and two other competing methods, when segmenting corrupted images. Its efficacy is further demonstrated using real 3D scalar and multispectral MR brain images.
Data Clustering: 50 Years Beyond K-Means
2008
Cited by 83 (3 self)
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
Resampling method for unsupervised estimation of cluster validity
Neural Computation, 2001
Cited by 72 (3 self)
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters which are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for which an analytic approximation for the figure of merit is derived and compared with numerical measurements. Next, the applicability of the method is demonstrated for higher dimensional data, including gene microarray expression data.
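The idea of scoring a clustering by its stability against resampling can be sketched as a pairwise agreement measure. This is only one plausible stability score, assumed for illustration; the abstract does not define the paper's figure of merit, and the function name and inputs below are hypothetical. Given labels for the full data set and labels obtained by re-clustering a subsample, the score is the fraction of subsample pairs whose same-cluster or different-cluster status matches in both solutions.

```python
from itertools import combinations


def pair_agreement(full_labels, sub_indices, sub_labels):
    """Fraction of resampled pairs with matching co-membership status.

    full_labels: cluster label of every point in the full data set.
    sub_indices: indices (into the full data) of the resampled points.
    sub_labels:  labels from re-clustering only the resampled points.
    Label names need not match between the two solutions; only the
    induced partitions are compared. Needs at least two resampled points.
    """
    sub = dict(zip(sub_indices, sub_labels))
    agree = 0
    total = 0
    for i, j in combinations(sub_indices, 2):
        same_in_full = full_labels[i] == full_labels[j]
        same_in_sub = sub[i] == sub[j]
        agree += same_in_full == same_in_sub
        total += 1
    return agree / total


if __name__ == "__main__":
    full = [0, 0, 0, 1, 1, 1]
    print(pair_agreement(full, [0, 1, 3, 4], [5, 5, 9, 9]))  # same partition
    print(pair_agreement(full, [0, 1, 3, 4], [5, 9, 5, 9]))  # scrambled partition
```

In practice such a score would be averaged over many resamples and tracked as a clustering parameter (for example, the number of clusters) varies, which is the setting the abstract describes.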
Performance Evaluation of Some Clustering Algorithms and Validity Indices
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 61 (1 self)
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn’s index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn’s index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
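As an illustration of how one of the validity indices named above is computed, here is a minimal sketch of the Davies-Bouldin index in its common textbook form (Euclidean distances, within-cluster scatter taken as the mean distance to the centroid). The exact variant used in the article is not stated in the abstract, so this form and the function name are assumptions. Lower values indicate more compact, better separated clusters.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a hard partition (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    keys = sorted(clusters)

    # Centroid of each cluster, computed coordinate by coordinate.
    centroids = {
        k: tuple(sum(col) / len(clusters[k]) for col in zip(*clusters[k]))
        for k in keys
    }
    # Scatter: mean Euclidean distance from cluster members to the centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # compact, well separated
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # elongated, overlapping
```

Evaluating the index over partitions with different numbers of clusters and picking the minimum is one simple way to select the cluster count, in the spirit of the experiments the abstract describes.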
Designing fuzzy inference systems from data: an interpretability-oriented review
IEEE Trans. Fuzzy Systems
Cited by 59 (12 self)
Fuzzy inference systems (FIS) are widely used for process simulation or control. They can be designed either from expert knowledge or from data. For complex systems, FIS based on expert knowledge only may suffer from a loss of accuracy. This is the main incentive for using fuzzy rules inferred from data. Designing a FIS from data can be decomposed into two main phases: automatic rule generation and system optimization. Rule generation leads to a basic system with a given space partitioning and the corresponding set of rules. System optimization can be done at various levels. Variable selection can be an overall selection or it can be managed rule by rule. Rule base optimization aims to select the most useful rules and to optimize rule conclusions. Space partitioning can be improved by adding or removing fuzzy sets and by tuning membership function parameters. Structure optimization is of a major importance: selecting variables, reducing the rule base and optimizing the number of fuzzy sets. Over the years, many methods have become available for designing FIS from data. Their efficiency is usually characterized by a numerical performance index. However, for human-computer cooperation another criterion is needed: the rule interpretability. An implicit assumption states that fuzzy rules are by nature easy to be interpreted. This could be wrong when dealing with complex multivariable systems or when the generated partitioning is meaningless for experts. This paper analyzes the main methods for automatic rule generation and structure optimization. They are grouped into several families and compared according to the rule interpretability criterion. For this purpose, three conditions for a set of rules to be interpretable are defined. Index Terms: fuzzy inference systems, fuzzy partitioning, interpretability, rule induction, system optimization.
Vector Quantization with Complexity Costs
1993
Cited by 58 (19 self)
Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby, determining the size of the codebook. A maximum entropy estimation of the cost function yields an optimal number of reference vectors, their positions and their assignment probabilities. The dependence of the codebook density on the data density for different complexity functions is investigated in the limit of asymptotic quantization levels. How different complexity measures influence the efficiency of vector quantizers is studied for the task of image compression, i.e., we quantize the wavelet coefficients of gray level images and measure the reconstruction error. Our approach establishes a unifying framework for different quantization methods like K-means clustering and its fuzzy version, entropy constrained vector quantizati...
A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data
IEEE Trans. on Medical Imaging, 2002
Cited by 57 (1 self)
In this paper, we present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that can produce errors with conventional intensity-based classification. Our algorithm is formulated by modifying the objective function of the standard fuzzy c-means (FCM) algorithm to compensate for such inhomogeneities and to allow the labeling of a pixel (voxel) to be influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by salt and pepper noise. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and efficiency of the proposed algorithm. Index Terms: bias field, fuzzy logic, image segmentation, MR imaging.
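For orientation, the standard FCM algorithm that this abstract takes as its starting point alternates two updates: soft memberships from distances to the cluster centers, then centers as membership-weighted means. The sketch below implements only that standard baseline on scalar intensities; it does not include the bias-field term or the neighborhood regularizer the paper adds. The function name, the fuzzifier default m = 2, the fixed iteration count, and the sample intensities are all assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on scalar data (no bias field, no spatial term).

    Returns the final centers and the memberships computed in the last
    iteration (one row per value, one column per center).
    """
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in values:
            # Squared distances to each center; eps avoids division by zero.
            d2 = [(x - c) ** 2 + eps for c in centers]
            # u_k is proportional to d_k^(-2 / (m - 1)); d2 is already squared.
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: mean of the data weighted by u^m.
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            centers[k] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships


if __name__ == "__main__":
    intensities = [10, 11, 12, 50, 51, 52]  # hypothetical pixel intensities
    centers, memberships = fuzzy_c_means_1d(intensities, centers=[0.0, 100.0])
    print(centers)
```

Because each pixel is treated independently here, isolated noisy pixels can receive the wrong label, which is the weakness the neighborhood term in the abstract is meant to address.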