## Improvements to the scalability of multiobjective clustering (2005)

Venue: Proceedings of the 2005 IEEE Congress on Evolutionary Computation, IEEE

Citations: 8 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Handl05improvementsto,
  author    = {Julia Handl},
  title     = {Improvements to the scalability of multiobjective clustering},
  booktitle = {Proceedings of the 2005 IEEE Congress on Evolutionary Computation},
  year      = {2005},
  pages     = {2372--2379},
  publisher = {IEEE Press}
}
```

### Abstract

In previous work, we have introduced a novel and highly effective approach to data clustering, based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new initialization and mutation schemes that enable a more efficient exploration of the search space, and modify the null data model that is used as a basis for selecting the most significant solution from the Pareto front. The high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite.

### Citations

519 | Comparing Partitions
- Hubert, Arabie
- 1985

Citation Context ...a representation based on the contingency table defined by two partitionings $U$ and $V$ (one being the known correct classification and one being the partition under evaluation), the Adjusted Rand Index [7] is given as

$$R(U, V) = \frac{\sum_{i,j} \binom{n_{ij}}{2} - \left[ \sum_i \binom{n_{i\cdot}}{2} \sum_j \binom{n_{\cdot j}}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2} \left[ \sum_i \binom{n_{i\cdot}}{2} + \sum_j \binom{n_{\cdot j}}{2} \right] - \left[ \sum_i \binom{n_{i\cdot}}{2} \sum_j \binom{n_{\cdot j}}{2} \right] \Big/ \binom{n}{2}}$$

where $n_{ij}$ is the number of items placed in cluster $i$ of $U$ and cluster $j$ of $V$, and $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column sums of the contingency table...
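The Adjusted Rand Index can be computed directly from the contingency table of the two partitionings; a minimal pure-Python sketch follows (the function name and label-list interface are my own, not the paper's):

```python
from math import comb

def adjusted_rand_index(labels_u, labels_v):
    """Hubert & Arabie's Adjusted Rand Index, computed from the
    contingency table of two partitionings (chance-corrected; max 1.0)."""
    # Contingency table n_ij: co-occurrence counts of cluster pairs.
    table = {}
    for u, v in zip(labels_u, labels_v):
        table[(u, v)] = table.get((u, v), 0) + 1
    n = len(labels_u)
    # Marginals: row sums n_i. and column sums n_.j.
    a, b = {}, {}
    for (u, v), cnt in table.items():
        a[u] = a.get(u, 0) + cnt
        b[v] = b.get(v, 0) + cnt
    sum_ij = sum(comb(cnt, 2) for cnt in table.values())
    sum_a = sum(comb(cnt, 2) for cnt in a.values())
    sum_b = sum(comb(cnt, 2) for cnt in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions (up to relabelling) score 1.0, and the chance correction drives the expected score for random labellings towards 0.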

395 | Cluster ensembles – a knowledge reuse framework for combining multiple partitions
- Strehl, Ghosh
Citation Context ...on and description of this methodology (including pseudo-code) is provided in [5]. In previous work, MOCK has been compared to three single-objective clustering algorithms, an advanced ensemble method [13] and the Gap statistic [14], and results indicated a clear advantage to the multiobjective approach. 1.2 Limitations of MOCK and scope of this work The first implementation of MOCK described above was...

260 | Estimating the number of clusters in a data set via the gap statistic
- Tibshirani, Walther, et al.
- 2001
Citation Context ...methodology (including pseudo-code) is provided in [5]. In previous work, MOCK has been compared to three single-objective clustering algorithms, an advanced ensemble method [13] and the Gap statistic [14], and results indicated a clear advantage to the multiobjective approach. 1.2 Limitations of MOCK and scope of this work The first implementation of MOCK described above was targeted at scenarios in w...

223 | Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis
- Rousseeuw
- 1987
Citation Context ... in a cluster analysis. Given clustering solutions for a range of different numbers of clusters, the Silhouette Width can be employed to determine the most appropriate solutions. The Silhouette Width [11] for a partitioning is computed as the average Silhouette value over all data items. The Silhouette value for an individual data item $i$, which reflects the confidence in this particular cluster assig...
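The Silhouette computation described here is straightforward to sketch. For item $i$, $s(i) = (b_i - a_i)/\max(a_i, b_i)$, where $a_i$ is the mean distance to the other items in its own cluster and $b_i$ the smallest mean distance to any other cluster. A minimal sketch, assuming a pairwise distance callable (names are illustrative, not from the paper):

```python
def silhouette_width(data, labels, dist):
    """Average Silhouette value over all data items [Rousseeuw 1987]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:        # singleton cluster: define s(i) = 0
            continue
        # a_i: mean distance to items in the same cluster.
        a_i = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b_i: smallest mean distance to any other cluster.
        b_i = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for lab2, members in clusters.items() if lab2 != lab
        )
        total += (b_i - a_i) / max(a_i, b_i)
    return total / len(data)
```

Values near 1 indicate a confident assignment; values near 0 or below suggest the item sits between clusters.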

98 | An empirical comparison of four initialization methods for the K-Means algorithm, Pattern recognition letters 20
- Pena, Lozano, et al.
- 1999
Citation Context ...puted after the reassignment of all data items. To reduce the risk of suboptimal solutions, k-means is run repeatedly (10 times) using random initialization (which is known to be an effective initialization method [10]) and only the best result in terms of intra-cluster variance is returned. 4.1.2 Hierarchical clustering In general, agglomerative clustering algorithms start with the finest partitioning possible (th...
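The repeated-restart scheme described (10 random initializations, keeping the lowest intra-cluster variance) can be sketched as follows; this is a hypothetical 1-D Lloyd's implementation for illustration, not the paper's code:

```python
import random

def kmeans(points, k, iters=50):
    """One Lloyd's run with random initialization: returns (labels, variance)."""
    centers = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared distance (1-D points).
        labels = [min(range(k), key=lambda c: (p - centers[c]) ** 2)
                  for p in points]
        # Update step: recompute each center as its cluster mean.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    var = sum((p - centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, var

def best_of_restarts(points, k, restarts=10):
    """Run k-means `restarts` times; keep the lowest-variance result."""
    return min((kmeans(points, k) for _ in range(restarts)),
               key=lambda r: r[1])
```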

93 | An impossibility theorem for clustering
- Kleinberg
- 2002
Citation Context ...high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite. 1 Introduction The inherently multiobjective nature of data clustering, as identified in, e.g., [3, 8], has motivated us in our recent work [4, 5, 6] to devise an explicit multiobjective optimization approach to this problem. In [4], we began to investigate this idea with an evolutionary algorithm, VIE...

41 | PESA-II: Region-based selection in evolutionary multiobjective optimization
- Corne, Jerram, et al.
- 2001
Citation Context ...sting multiobjective clustering algorithm MOCK (MultiObjective Clustering with automatic K-determination) is based on the elitist multiobjective evolutionary algorithm, PESA-II, described in detail in [2]. MOCK optimizes two clustering objectives, overall deviation and connectivity, which reflect two fundamentally different aspects of a good clustering solution: the global concept of compactness of cl...
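The two objectives can be sketched from their published definitions: overall deviation sums each item's distance to its cluster centroid (compactness), while connectivity penalizes an item's $j$-th nearest neighbour falling in a different cluster by $1/j$, summed over the $L$ nearest neighbours (connectedness); both are minimized. The 1-D data, function names, and default $L$ below are illustrative assumptions:

```python
def overall_deviation(points, labels, dist):
    """Sum of distances from each item to its cluster centroid (minimize)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    dev = 0.0
    for members in clusters.values():
        centroid = sum(members) / len(members)
        dev += sum(dist(p, centroid) for p in members)
    return dev

def connectivity(points, labels, dist, L=3):
    """Penalty 1/j whenever an item's j-th nearest neighbour lies in a
    different cluster, summed over the L nearest neighbours (minimize)."""
    n = len(points)
    conn = 0.0
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist(points[i], points[j]))
        for rank, j in enumerate(neighbours[:L], start=1):
            if labels[j] != labels[i]:
                conn += 1.0 / rank
    return conn
```

The two criteria pull in opposite directions (many small clusters minimize deviation; one big cluster minimizes connectivity), which is what makes the trade-off front informative.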

33 | Practical Nonparametric Statistics, second edition
- Conover
- 1980
Citation Context ...e over the ten different problem instances only. The statistically best performer is highlighted in bold face. To test whether differences were statistically significant, a paired Wilcoxon signed-rank test [1], which takes into account the dependence on problem instance, was applied to all pairs of algorithms. Notice that the null hypothesis of no difference could not be rejected for MOCK and average link 10d-2...
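A paired Wilcoxon signed-rank test of this kind can be sketched in pure Python; the normal approximation to the null distribution below is an assumption for brevity (exact tables are preferable for very small samples), and the function name is my own:

```python
from math import sqrt, erf

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation).
    Returns (W, two-sided p); pairs with zero difference are dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 1) / 2          # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Normal approximation to the null distribution of W (so z <= 0).
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = 1 + erf(z / sqrt(2))           # 2 * Phi(z), two-sided
    return w, min(p, 1.0)
```

Being a paired test on per-instance scores, it accounts for the dependence on problem instance that the snippet mentions.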

30 | Why so many clustering algorithms: A position paper
- ESTIVILL-CASTRO
Citation Context ...high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite. 1 Introduction The inherently multiobjective nature of data clustering, as identified in, e.g., [3, 8], has motivated us in our recent work [4, 5, 6] to devise an explicit multiobjective optimization approach to this problem. In [4], we began to investigate this idea with an evolutionary algorithm, VIE...

27 | Cubic Clustering Criterion
- Sarle
- 1983
Citation Context ...ensional manifold within the high-dimensional space — a property that cannot be captured by the current null model. Here, we suggest the use of a refined null model, based on the description by Sarle [12]. Again, a Poisson model is used, but now it is set up within the space of principal components. Specifically, a principal component analysis is applied to the correlation matrix of the original data...
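One possible reading of this refined null model, sketched with NumPy: diagonalize the correlation matrix, generate random points inside the bounding box of the data's projection onto the leading principal components (a homogeneous Poisson model over a box, conditioned on the sample size, amounts to uniform sampling), and map them back. The variance threshold and all names are assumptions, not the paper's exact procedure:

```python
import numpy as np

def pca_null_sample(data, n_points, var_threshold=0.9, seed=0):
    """Hypothetical Sarle-style null model: uniform points in the
    bounding box of the leading principal components, mapped back."""
    rng = np.random.default_rng(seed)
    # Standardize so the covariance of z equals the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only components explaining `var_threshold` of total variance.
    keep = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(),
                           var_threshold) + 1
    proj = z @ eigvecs[:, :keep]                     # data in PC space
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    sample = rng.uniform(lo, hi, size=(n_points, keep))
    # Map the null sample back into the original feature space.
    back = sample @ eigvecs[:, :keep].T
    return back * data.std(axis=0) + data.mean(axis=0)
```

Restricting the null model to the leading components is what lets it respect data lying on a low-dimensional manifold, which the snippet identifies as the failure mode of the original box-shaped null model.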

11 | A genetic algorithm for clustering problems
- Park, Song
- 1998
Citation Context ...ess of clusters, and the more local one of connectedness of data points. The definitions of these objectives can be found in [6]. The encoding employed is the locus-based adjacency scheme proposed in [9]. In this graph-based representation, each individual $g$ consists of $N$ genes $g_1, \ldots, g_N$, where $N$ is the size of the clustered data set, and each gene $g_i$ can take allele values $j$ in the range...
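Under this encoding, a genotype decodes into a partitioning by taking the connected components of the graph whose edges link each item $i$ to item $g_i$. A minimal union-find sketch (names are my own):

```python
def decode_locus_based(genotype):
    """Decode a locus-based adjacency genotype: gene i links item i to
    item genotype[i]; clusters are the connected components of the
    resulting undirected graph."""
    n = len(genotype)
    # Union-find over the n items.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i, j in enumerate(genotype):
        parent[find(i)] = find(j)
    # Relabel components as consecutive cluster ids.
    labels, ids = [], {}
    for i in range(n):
        root = find(i)
        labels.append(ids.setdefault(root, len(ids)))
    return labels
```

A convenient property of the encoding is that the number of clusters is determined implicitly by the decoding, which is what enables MOCK's automatic K-determination.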

10 | Exploiting the trade-off - the benefits of multiple objectives in data clustering
- Handl, Knowles
- 2005
Citation Context ...eveloped. The new initialization operator is based on the observation that different clustering algorithms tend to perform better (find better approximations) in different regions of the Pareto front [6]. In particular, MST type or single link solutions tend to be close to optimal in those regions of the Pareto front where connectivity is low, whereas k-means performs well in the regions where overall...
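MST-type initial solutions of the kind mentioned can be sketched in the locus-based encoding: build a minimum spanning tree (which decodes to a single cluster) and cut its longest links to split it into several. The cut-the-longest-links rule and all names below are illustrative assumptions, not necessarily the paper's exact operator:

```python
def mst_genotype(points, dist):
    """Prim's algorithm: gene i links to its MST parent (the root links
    to itself), which decodes to one big cluster."""
    n = len(points)
    in_tree = {0}
    parent = list(range(n))
    best = {i: (dist(points[i], points[0]), 0) for i in range(1, n)}
    while len(in_tree) < n:
        i = min(best, key=lambda k: best[k][0])   # cheapest frontier item
        d, j = best.pop(i)
        parent[i] = j
        in_tree.add(i)
        for k in best:                            # relax remaining items
            dk = dist(points[k], points[i])
            if dk < best[k][0]:
                best[k] = (dk, i)
    return parent

def remove_longest_links(points, parent, dist, k):
    """Cut the k-1 longest MST links (each cut item links to itself),
    splitting the genotype into k clusters when decoded."""
    links = sorted((i for i in range(len(parent)) if parent[i] != i),
                   key=lambda i: dist(points[i], points[parent[i]]),
                   reverse=True)
    geno = parent[:]
    for i in links[:k - 1]:
        geno[i] = i
    return geno
```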

8 | Evolutionary multiobjective clustering
- Handl, Knowles
- 2004
Citation Context ...revious work, we have introduced a novel and highly effective approach to data clustering, based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new ini...

5 | Multiobjective clustering with automatic determination of the number of clusters
- Handl, Knowles
- 2004
Citation Context ...revious work, we have introduced a novel and highly effective approach to data clustering, based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new ini...