Survey of clustering data mining techniques
2002
Cited by 177 (0 self)
Accrue Software, Inc.
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Performance Evaluation of Some Clustering Algorithms and Validity Indices
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 40 (0 self)
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn’s index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and Dunn’s index, a lower bound of the value of the former is theoretically estimated in order to obtain a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
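As a rough illustration of one of the validity indices named in this abstract, the sketch below computes the Davies-Bouldin index for a hard partition in plain Python. It follows the commonly used textbook definition (mean distance to the centroid as scatter, Euclidean distance between centroids); the function names and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard partition; lower values indicate
    more compact, better separated clusters.
    Assumes no two clusters share the same centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    keys = list(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter of a cluster: mean distance of its points to its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)

# Hypothetical toy data: two visibly separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = davies_bouldin(points, [0, 0, 0, 1, 1, 1])  # labels follow the groups
bad = davies_bouldin(points, [0, 1, 0, 1, 0, 1])   # labels mix the groups
```

On this toy data the partition that follows the two groups scores lower than the mixed one, which is the behaviour one would rely on when comparing candidate numbers of clusters.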
GA-fuzzy modeling and classification: complexity and performance
1999
Cited by 27 (3 self)
The use of Genetic Algorithms (GAs) and other evolutionary optimization methods to design fuzzy rules for system modeling and data classification have received much attention in recent literature. Authors have focused on various aspects of these randomized techniques, and a whole scale of algorithms have been proposed. We comment on some recent work and describe a new and efficient two-step approach that leads to good results for function approximation, dynamic system modeling and data classification problems. First fuzzy clustering is applied to obtain a compact initial rule-based model. Then this model is optimized by a real-coded GA subject to constraints that maintain the semantic properties of the rules. We consider four examples from the literature: a synthetic nonlinear dynamic system model, the Iris data classification problem, the Wine data classification problem and the dynamic modeling of a Diesel engine turbocharger. The obtained results are compared to other recently proposed methods ...
GenIc: A Single Pass Generalized Incremental Algorithm for Clustering
SIAM Int. Conf. on Data Mining, 2004
Cited by 17 (2 self)
In this paper we introduce a new single pass clustering algorithm called GenIc designed with the objective of having low overall cost. We examine some of the properties of GenIc and compare it to windowed k-means. We also study its performance using experimental data sets obtained from network monitoring.
Automatic Segmentation of Non-enhancing Brain Tumors in Magnetic Resonance Images
Artificial Intelligence in Medicine, 2001
Cited by 16 (3 self)
Tumor segmentation from magnetic resonance (MR) images may aid in tumor treatment by tracking the progress of tumor growth and/or shrinkage. In this paper we present the first automatic segmentation method which separates non-enhancing brain tumors from healthy tissues in MR images to aid in the task of tracking tumor size over time. The MR feature images used for the segmentation consist of three weighted images (T1, T2 and proton density) for each axial slice through the head. An initial segmentation is computed using an unsupervised fuzzy clustering algorithm. Then, integrated domain knowledge and image processing techniques contribute to the final tumor segmentation. They are applied under the control of a knowledge-based system. The system knowledge was acquired by training on two patient volumes (14 images). Testing has shown successful tumor segmentations on four patient volumes (31 images). Our results show that we detected all six non-enhancing brain tumors, located tumor tissue in 35 of the 36 ground truth (radiologist labeled) slices containing tumor and successfully separated tumor regions from physically connected CSF regions in nine of nine slices. Quantitative measurements are promising as correspondence ratios between ground truth and segmented tumor regions ranged between 0.368 and 0.871 per volume, with percent match ranging between 0.530 and 0.909 per volume.
Keywords: MRI, non-enhancing brain tumors, image processing, automatic tissue classification, fuzzy, clustering
i-Miner: A Web Usage Mining Framework Using Hierarchical Intelligent Systems
The IEEE Int. Conf. on Fuzzy Systems, 2003
Cited by 10 (6 self)
Recently Web mining has become a hot research topic, which combines two prominent research areas: data mining and the World Wide Web (WWW). Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of the users with the Web. Web usage mining has become critical for effective Web site management, business and support services, personalization, network traffic flow analysis and so on. A previous study on Web usage mining [8][9] using a concurrent neuro-fuzzy approach [2] has shown that the usage trend analysis depends heavily on the performance of the clustering of the number of requests. In this paper, a novel approach, 'intelligent-miner' (i-Miner), is introduced to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover data clusters) and a fuzzy inference system to analyze the trends. In the concurrent neuro-fuzzy approach [9], self-organizing maps were used to cluster the web user requests. A hybrid evolutionary FCM approach is proposed in this paper to optimally segregate similar user interests. The clustered data is then used to analyze the trends using a Takagi-Sugeno fuzzy inference system learned using a combination of evolutionary algorithm and neural network learning. Empirical results clearly show that the proposed technique is efficient, with fewer if-then rules and improved accuracy, at the expense of complicated algorithms and extra computational cost.
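For readers unfamiliar with FCM (fuzzy c-means), the following minimal sketch shows the standard membership computation for a single point given fixed cluster centers. It is the plain textbook update, not the hybrid evolutionary variant proposed in the paper; the function name, the fuzzifier default m = 2 and the sample values are assumptions made for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]

# Hypothetical example: a point twice as far from the second center.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships of a point always sum to one, so a point close to one center still keeps a small degree of membership in the others.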
A Genetic Rule-Based Data Clustering Toolkit
Proceedings of the 2002 Congress on Evolutionary Computation (CEC2002), 2002
Cited by 9 (0 self)
Clustering is a hard combinatorial problem and is defined as the unsupervised classification of patterns. The formation of clusters is based on the principle of maximizing the similarity between objects of the same cluster while simultaneously minimizing the similarity between objects belonging to distinct clusters. This paper presents a tool for database clustering using a rule-based genetic algorithm (RBCGA). RBCGA evolves individuals consisting of a fixed set of clustering rules, where each rule includes d non-binary intervals, one for each feature. The investigations attempt to alleviate certain drawbacks related to the classical minimization of the square-error criterion by suggesting a flexible fitness function which takes into consideration cluster asymmetry, density, coverage and homogeny.
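The rule representation mentioned in this abstract (d intervals per rule, one for each feature) can be pictured with the short sketch below. It only shows how such a rule might cover a data point; the encoding details, the genetic operators and the fitness function of RBCGA are not given in the abstract, so every name and value here is a hypothetical illustration.

```python
def rule_covers(rule, point):
    """True if every feature of the point lies inside the rule's interval
    for that feature. A rule is a list of (low, high) pairs, one per feature."""
    return all(low <= x <= high for (low, high), x in zip(rule, point))

def assign_cluster(rules, point):
    """Index of the first rule covering the point, or None if no rule does."""
    for index, rule in enumerate(rules):
        if rule_covers(rule, point):
            return index
    return None

# Hypothetical individual: two rules over two features.
rules = [
    [(0.0, 1.0), (0.0, 1.0)],   # rule 0: unit square near the origin
    [(5.0, 6.0), (5.0, 6.0)],   # rule 1: unit square near (5, 5)
]
labels = [assign_cluster(rules, p) for p in [(0.5, 0.5), (5.5, 5.2), (3.0, 3.0)]]
```

A point covered by no rule stays unassigned, which is one way a coverage term in a fitness function could be measured.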
A Maximum Variance Cluster Algorithm
IEEE Trans. Pattern Anal. Mach. Intell., 2002
Cited by 8 (1 self)
We present a partitional cluster algorithm that minimizes the sum-of-squared-error criterion while imposing a hard constraint on the cluster variance. Conceptually, hypothesized clusters act in parallel and cooperate with their neighboring clusters in order to minimize the criterion and to satisfy the variance constraint. In order to enable the demarcation of the cluster neighborhood without crucial parameters, we introduce the notion of foreign cluster samples. Finally, we demonstrate a new method for cluster tendency assessment based on varying the variance constraint parameter.
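The two quantities this abstract refers to, the sum-of-squared-error criterion and a per-cluster variance constraint, can be sketched as follows. The sketch only evaluates a given partition; it does not reproduce the paper's cooperative search. Cluster variance is assumed here to be the mean squared distance to the centroid, and the names and threshold values are made up for the example.

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def squared_error(cluster):
    """Sum of squared Euclidean distances from the points to their centroid."""
    c = centroid(cluster)
    return sum(sum((x - cx) ** 2 for x, cx in zip(p, c)) for p in cluster)

def sum_of_squared_error(clusters):
    """Sum-of-squared-error criterion of a whole partition."""
    return sum(squared_error(cluster) for cluster in clusters)

def satisfies_variance_constraint(clusters, max_variance):
    """True if every cluster's variance (assumed: mean squared distance
    to the centroid) stays at or below max_variance."""
    return all(squared_error(c) / len(c) <= max_variance for c in clusters)

# Hypothetical partition: two clusters of two points each.
partition = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (10.0, 2.0)]]
sse = sum_of_squared_error(partition)
ok_loose = satisfies_variance_constraint(partition, 1.5)
ok_tight = satisfies_variance_constraint(partition, 0.5)
```

Tightening the threshold makes the same partition infeasible, which hints at how varying the constraint parameter can expose cluster structure at different scales.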
Randomized Metric Induction and Evolutionary Conceptual Clustering for Semantic Knowledge Bases
ACM-CIKM, 2007
Cited by 6 (4 self)
We present an evolutionary clustering method which can be applied to multi-relational knowledge bases storing resource annotations expressed in the standard languages for the Semantic Web. The method exploits an effective and language-independent semi-distance measure defined for the space of individual resources, that is based on a finite number of dimensions corresponding to a committee of discriminating features (represented by concept descriptions). A maximally discriminating group of features can be obtained with the randomized optimization methods described in the paper. The clustering algorithm represents the possible clusterings as strings of central elements (medoids, w.r.t. the given metric) of variable length. Hence, the number of clusters is not required as a parameter since the method is able to find an optimal choice by means of the evolutionary operators and of a proper fitness function. We also show how to assign each cluster with a newly constructed intensional definition in the employed concept language. An experimentation with some ontologies proves the feasibility of our method and its effectiveness in terms of clustering validity indices.

