Results 1-10 of 230
Survey of clustering algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2005
Cited by 248 (3 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
A Robust Competitive Clustering Algorithm with Applications in Computer Vision
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 86 (3 self)
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed Robust Competitive Agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights for each data point: the first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find a...
A Survey of Fuzzy Clustering Algorithms for Pattern Recognition, Part II
Cited by 53 (2 self)
the concepts of fuzzy clustering and soft competitive learning in clustering algorithms is proposed on the basis of the existing literature. Moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. In this paper, five clustering algorithms taken from the literature are reviewed, assessed and compared on the basis of the selected properties of interest. These clustering models are 1) self-organizing map (SOM); 2) fuzzy learning vector quantization (FLVQ); 3) fuzzy adaptive resonance theory (fuzzy ART); 4) growing neural gas (GNG); 5) fully self-organizing simplified adaptive resonance theory (FOSART). Although our theoretical comparison is fairly simple, it yields observations that may appear paradoxical. First, only FLVQ, fuzzy ART, and FOSART exploit concepts derived from fuzzy set theory (e.g., relative and/or absolute fuzzy membership functions). Secondly, only SOM, FLVQ, GNG, and FOSART employ soft competitive learning mechanisms, which are affected by asymptotic misbehaviors in the case of FLVQ, i.e., only SOM, GNG, and FOSART are considered effective fuzzy clustering algorithms. Index Terms: Ecological net, fuzzy clustering, modular architecture, relative and absolute membership function, soft and hard competitive learning, topologically correct mapping.
Designing fuzzy inference systems from data: an interpretability-oriented review
 IEEE Trans. Fuzzy Systems
Cited by 50 (9 self)
Fuzzy inference systems (FIS) are widely used for process simulation or control. They can be designed either from expert knowledge or from data. For complex systems, FIS based on expert knowledge only may suffer from a loss of accuracy. This is the main incentive for using fuzzy rules inferred from data. Designing a FIS from data can be decomposed into two main phases: automatic rule generation and system optimization. Rule generation leads to a basic system with a given space partitioning and the corresponding set of rules. System optimization can be done at various levels. Variable selection can be an overall selection or it can be managed rule by rule. Rule base optimization aims to select the most useful rules and to optimize rule conclusions. Space partitioning can be improved by adding or removing fuzzy sets and by tuning membership function parameters. Structure optimization is of major importance: selecting variables, reducing the rule base and optimizing the number of fuzzy sets. Over the years, many methods have become available for designing FIS from data. Their efficiency is usually characterized by a numerical performance index. However, for human-computer cooperation another criterion is needed: the rule interpretability. An implicit assumption states that fuzzy rules are by nature easy to interpret. This could be wrong when dealing with complex multivariable systems or when the generated partitioning is meaningless for experts. This paper analyzes the main methods for automatic rule generation and structure optimization. They are grouped into several families and compared according to the rule interpretability criterion. For this purpose, three conditions for a set of rules to be interpretable are defined. Index Terms: Fuzzy inference systems, fuzzy partitioning, interpretability, rule induction, system optimization.
A survey of kernel and spectral methods for clustering
, 2008
Cited by 48 (5 self)
Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel version of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof of the fact that these two paradigms have the same objective is reported since it has been proven that these two seemingly different approaches have the same mathematical foundation. Besides, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
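To make the idea of a kernel version of K-means concrete, here is a minimal sketch. It is not taken from the surveyed paper. It assumes the usual textbook formulation, in which the squared feature-space distance from point i to the centroid of cluster C is K[i][i] - 2 * mean_j K[i][j] + mean_{j,l} K[j][l], with j and l ranging over C. The function names `rbf_kernel_matrix` and `kernel_kmeans`, the RBF kernel choice, and the `gamma` parameter are illustrative assumptions.

```python
import math


def rbf_kernel_matrix(points, gamma=1.0):
    """Gram matrix of the RBF kernel exp(-gamma * ||p - q||^2) (illustrative choice)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def kernel_kmeans(K, labels, n_clusters, max_iter=50):
    """Reassign points to the nearest feature-space centroid until labels stop changing.

    K is an n x n kernel matrix; labels is the initial assignment (assumed given).
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(n_clusters)]
        # Squared norm of each centroid in feature space (same for every point).
        within = [
            sum(K[j][l] for j in members for l in members) / len(members) ** 2
            if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # empty clusters are skipped in this sketch
                cross = sum(K[i][j] for j in members) / len(members)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


pts = [(0.0,), (0.1,), (5.0,), (5.1,)]
print(kernel_kmeans(rbf_kernel_matrix(pts), [0, 0, 0, 1], 2))  # prints [0, 0, 1, 1]
```

Like ordinary K-means, this procedure only reaches a local optimum, so the result depends on the initial labels.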
Low-complexity fuzzy relational clustering algorithms for web mining
 IEEE TRANSACTIONS ON FUZZY SYSTEMS
, 2001
Cited by 34 (2 self)
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
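The medoid-selection idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the common fuzzy c-means style membership update with fuzzifier `m`, and it alternates that update with picking, for each cluster, the object that minimizes the membership-weighted dissimilarity. The function name `fuzzy_c_medoids` and its `init` and `seed` parameters are hypothetical.

```python
import random


def fuzzy_c_medoids(dissim, c, m=2.0, init=None, max_iter=100, seed=0):
    """Alternate membership and medoid updates on an n x n dissimilarity matrix.

    Returns (medoid indices, membership rows); each membership row sums to 1.
    """
    n = len(dissim)
    # Assumption: random initial medoids unless the caller supplies them.
    medoids = list(init) if init is not None else random.Random(seed).sample(range(n), c)

    def memberships(meds):
        u = []
        for i in range(n):
            d = [dissim[i][v] for v in meds]
            zeros = [k for k, x in enumerate(d) if x == 0]
            if zeros:
                # The object coincides with a medoid: full membership there.
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d]
                total = sum(inv)
                row = [w / total for w in inv]
            u.append(row)
        return u

    for _ in range(max_iter):
        u = memberships(medoids)
        # For each cluster, choose the object with the smallest weighted dissimilarity.
        new_medoids = [
            min(range(n), key=lambda j, k=k: sum((u[i][k] ** m) * dissim[j][i] for i in range(n)))
            for k in range(c)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(medoids)


xs = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
medoids, u = fuzzy_c_medoids(D, 2, init=[0, 5])
print(medoids)  # prints [1, 4]
```

Because only the dissimilarity matrix is used, the same sketch applies to objects that have no vector representation, such as Web sessions.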
On mining Web Access Logs
 In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
, 2000
Cited by 33 (0 self)
The proliferation of information on the world wide web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semistructured log data of user accesses. In this paper, we define the notion of a “user session”, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract a user access profile, we cluster the user sessions based on the pairwise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that this leads to extraction of interesting user profiles. We also show that it outperforms association rule based approaches for this task.
Possibility theory and statistical reasoning
 Computational Statistics & Data Analysis Vol
, 2006
Cited by 30 (2 self)
Numerical possibility distributions can encode special convex families of probability measures. The connection between possibility theory and probability theory is potentially fruitful in the scope of statistical reasoning when uncertainty due to variability of observations should be distinguished from uncertainty due to incomplete information. This paper proposes an overview of numerical possibility theory. Its aim is to show that some notions in statistics are naturally interpreted in the language of this theory. First, probabilistic inequalities (like Chebychev’s) offer a natural setting for devising possibility distributions from poor probabilistic information. Moreover, likelihood functions obey the laws of possibility theory when no prior probability is available. Possibility distributions also generalize the notion of confidence or prediction intervals, shedding some light on the role of the mode of asymmetric probability densities in the derivation of maximally informative interval substitutes of probabilistic information. Finally, the simulation of fuzzy sets comes down to selecting a probabilistic representation of a possibility distribution, which coincides with the Shapley value of the corresponding consonant capacity. This selection process is in agreement with the Laplace indifference principle and is closely connected with the mean interval of a fuzzy interval. It sheds light on the “defuzzification” process in fuzzy set theory and provides a natural definition of a subjective possibility distribution that sticks to the Bayesian framework of exchangeable bets. Potential applications to risk assessment are pointed out.
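As an illustration of how a probabilistic inequality can yield a possibility distribution, the sketch below uses the two-sided Chebyshev bound P(|X - mean| >= k * std) <= 1 / k^2. Assuming only the mean and standard deviation are known, one natural reading is to assign the point at k standard deviations from the mean the possibility degree min(1, 1 / k^2). The function name `chebyshev_possibility` is hypothetical, and this is one reading of the abstract rather than the paper's stated construction.

```python
def chebyshev_possibility(x, mean, std):
    """Possibility degree of x derived from Chebyshev's inequality (illustrative sketch)."""
    if std <= 0:
        raise ValueError("std must be positive")
    k = abs(x - mean) / std
    # Within one standard deviation the bound 1 / k^2 exceeds 1, so it is capped.
    return 1.0 if k <= 1.0 else 1.0 / (k * k)


print(chebyshev_possibility(2.0, 0.0, 1.0))  # prints 0.25
```

Every interval of the form [mean - k * std, mean + k * std] then has a necessity degree of at least 1 - 1 / k^2, which matches the Chebyshev lower bound on its probability.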
Partially Supervised Clustering for Image Segmentation
 Pattern Recognition
, 1996
Cited by 29 (2 self)
All clustering algorithms process unlabeled data and, consequently, suffer from two problems: (P1) choosing and validating the correct number of clusters and (P2) ensuring that algorithmic labels correspond to meaningful physical labels. Clustering algorithms such as hard and fuzzy c-means, based on optimizing sums of squared errors objective functions, suffer from a third problem: (P3) a tendency to recommend solutions that equalize cluster populations. The semisupervised c-means algorithms introduced in this paper attempt to overcome these three problems for problem domains where a few data from each class can be labeled. Segmentation of magnetic resonance images is a problem of this type and we use it to illustrate the new algorithm. Our examples show that the semisupervised approach provides MRI segmentations that are superior to ordinary fuzzy c-means and to the crisp k-nearest neighbor rule and further, that the new method ameliorates (P1)-(P3). Keywords: Cluster analysis, Fuzzy c-means, Partial supervision, Image segmentation, Magnetic resonance images
Multiple-prototype classifier design
 IEEE Trans. Syst., Man, Cybern. B
, 1998
Cited by 25 (0 self)
Five methods that generate multiple prototypes from labeled data are reviewed. Then we introduce a new sixth approach, which is a modification of Chang’s method. We compare the six methods with two standard classifier designs: the 1-nearest prototype (1-np) and 1-nearest neighbor (1-nn) rules. The standard of comparison is the resubstitution error rate; the data used are the Iris data. Our modified Chang’s method produces the best consistent (zero errors) design. One of the competitive learning models produces the best minimal prototypes design (five prototypes that yield three resubstitution errors). Index Terms: Competitive learning, Iris data, modified Chang’s method (MCA), multiple prototypes, nearest neighbor
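The two baseline rules and the resubstitution error can be illustrated with a short sketch. It is not the paper's code and uses a made-up toy data set rather than the Iris data. It assumes squared Euclidean distance and, for the 1-np rule, one prototype per class taken as the class mean. The names `nearest_label`, `class_means`, and `resubstitution_errors` are hypothetical.

```python
def nearest_label(x, refs):
    """Label of the (vector, label) reference closest to x in squared Euclidean distance."""
    return min(refs, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))[1]


def class_means(data):
    """One prototype per class: the mean vector of that class (assumed prototype choice)."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return [(tuple(sum(col) / len(xs) for col in zip(*xs)), y) for y, xs in groups.items()]


def resubstitution_errors(data, refs):
    """Number of training points misclassified when the training data is reused for testing."""
    return sum(1 for x, y in data if nearest_label(x, refs) != y)


# Toy labeled data (made up for illustration).
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((4, 4), "b"), ((5, 5), "b"), ((1, 1), "b")]

print(resubstitution_errors(data, class_means(data)))  # 1-np rule: prints 1
print(resubstitution_errors(data, data))               # 1-nn rule: prints 0
```

The 1-nn rule always reaches zero resubstitution errors when no two identical points carry different labels, which is why a small prototype set with few errors is the more informative result.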