## Human Cluster Evaluation and Formal Quality Measures: A Comparative Study

### BibTeX

@MISC{Lewis_humancluster,
    author = {Joshua M. Lewis and Margareta Ackerman},
    title = {Human Cluster Evaluation and Formal Quality Measures: A Comparative Study},
    year = {}
}

### Abstract

Clustering quality evaluation is an essential component of cluster analysis. Given the plethora of clustering techniques and their possible parameter settings, data analysts require sound means of comparing alternate partitions of the same data. When proposing a novel technique, researchers commonly apply two means of clustering quality evaluation. First, they apply formal Clustering Quality Measures (CQMs) to compare the results of the novel technique with those of previous algorithms. Second, they visually present the resultant partitions of the novel method and invite readers to see for themselves that it uncovers the correct partition. These two approaches are viewed as disjoint and complementary. Our study compares formal CQMs with human evaluations using a diverse set of measures based on a novel theoretical taxonomy. We find that some highly natural CQMs are in sharp contrast with human evaluations while others correlate well. Through a comparison of clustering experts and novices, as well as a consistency analysis, we support the hypothesis that clustering evaluation skill is present in the general population.

### Citations

1767 |
The measurement of observer agreement for categorical data
- Landis, Koch
- 1977
Citation Context ...ing complete agreement and 0 representing the amount of agreement expected by chance. While there is no standard significance test for differences in κ, the rating scale suggested by Landis and Koch (Landis & Koch, 1977) would classify the CQM and novice rater groups each as in slight agreement, and the expert raters as in fair agreement. To test whether any one measure was significantly harming CQM consistency we l... |

1200 | On spectral clustering: Analysis and an algorithm - Ng, Jordan, et al. |

287 |
Measuring nominal scale agreement among many raters
- Fleiss
- 1971
Citation Context ...d not exclude any subjects due to inconsistency and we did not analyze internal consistency for experts as they were only tested on one block. To analyze consistency across subjects we use Fleiss’ κ (Fleiss, 1971) and include neutral responses. Fleiss’ κ measures the deviation between observed agreement and the agreement attributable to chance given the relative frequency of ratings and normalized for the num... |
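
As a rough illustration of the statistic described in the context above, here is a minimal sketch of Fleiss' κ in its standard textbook form. The function name `fleiss_kappa` and the count-table input layout are assumptions made for this example; this is not the authors' analysis code.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Assumption for this sketch: every item is rated by the same number of
    raters n (n >= 2), and ratings are not all in a single category.
    """
    num_items = len(counts)
    n = sum(counts[0])            # raters per item
    num_cats = len(counts[0])

    # Relative frequency of each category over all ratings.
    p = [sum(row[j] for row in counts) / (num_items * n) for j in range(num_cats)]

    # Observed agreement per item: fraction of rater pairs that agree.
    per_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    observed = sum(per_item) / num_items     # mean observed agreement
    expected = sum(pj * pj for pj in p)      # agreement attributable to chance
    return (observed - expected) / (1 - expected)


# Two items, three raters, two categories, complete agreement on each item.
print(fleiss_kappa([[3, 0], [0, 3]]))
```

With complete agreement the observed term equals 1, so κ is 1; when observed agreement equals the chance term, κ is 0.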

271 |
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
- Rousseeuw
- 1987
Citation Context ...al mathematical formalizations disagree with human evaluations. On the other hand, we identify CQMs whose evaluations are well correlated with those of humans. In particular, we find that Silhouette (Rousseeuw, 1987) and Dunn’s Index (Dunn, 1974) are highly correlated with human evaluations. Our findings also indicate that there is sufficient similarity between the evaluations of novices and experts to suggest t... |

232 |
A dendrite method for cluster analysis
- Caliński, Harabasz
- 1974
Citation Context ...luster separation and within-cluster homogeneity, respectively. The average between of C is $\mathrm{avg}_{x \not\sim_C y}\, d(x,y)$. The average within of C is $\mathrm{avg}_{x \sim_C y}\, d(x,y)$. Calinski-Harabasz: The Calinski-Harabasz measure (Caliński & Harabasz, 1974) makes use of cluster centers. Let $c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ denote the center-of-mass of cluster $C_i$, and $\bar{x}$ the center-of-mass of $X$. Let $B(C) = \sum_{C_i} |C_i| \, |c_i - \bar{x}|^2$ and $W(C) = \sum_{C_i} \sum_{x \in C_i} |x - c_i|^2$. The ... |
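
The two quantities defined in the context above, B(C) and W(C), can be sketched as follows. The function name `between_within` and the list-of-tuples representation are assumptions for this example, and Euclidean distance is assumed. The snippet is truncated before it states how the measure combines the two terms, so the sketch stops at B(C) and W(C).

```python
def between_within(clusters):
    """Return (B, W) for a clustering given as a list of clusters.

    Each cluster is a non-empty list of points; each point is a tuple of
    floats. Euclidean distance is an assumption of this sketch.
    """
    def center_of_mass(points):
        dim = len(points[0])
        return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    all_points = [p for cluster in clusters for p in cluster]
    x_bar = center_of_mass(all_points)                # center-of-mass of X
    centers = [center_of_mass(c) for c in clusters]   # c_i for each cluster C_i

    # B(C): size-weighted squared distance of each cluster center from x_bar.
    b = sum(len(c) * squared_distance(ci, x_bar) for c, ci in zip(clusters, centers))
    # W(C): squared distance of each point from its own cluster center.
    w = sum(squared_distance(x, ci) for c, ci in zip(clusters, centers) for x in c)
    return b, w


example = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
print(between_within(example))
```

For the example, the cluster centers sit at distance 5 from the overall center, giving B = 100, while every point is at distance 1 from its own center, giving W = 4.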

136 |
Well separated clusters and optimal fuzzy partitions
- Dunn
- 1974
Citation Context ...gree with human evaluations. On the other hand, we identify CQMs whose evaluations are well correlated with those of humans. In particular, we find that Silhouette (Rousseeuw, 1987) and Dunn’s Index (Dunn, 1974) are highly correlated with human evaluations. Our findings also indicate that there is sufficient similarity between the evaluations of novices and experts to suggest that clustering evaluation is a... |

97 | An impossibility theorem for clustering
- Kleinberg
- 2002
Citation Context ...properties are descriptive only, and not necessarily desirable. Our taxonomy of CQMs follows a line of work on theoretical foundations of clustering beginning with the famous impossibility result by (Kleinberg, 2003), which showed that no clustering function can simultaneously satisfy three specific properties. (Ackerman & Ben-David, 2008) reformulate these properties in the setting of CQMs, and show that these ... |

91 |
Mathematical Taxonomy
- Jardine, Sibson
- 1971
Citation Context ...ogeneity invariance can also be viewed as consistency properties. An additional consistency property, order consistency, is an adaptation of an analogous property of clustering functions presented in (Jardine & Sibson, 1971). Order consistency describes CQMs that depend only on the order of pairwise distances. Definition 3. A CQM m is order consistent if for all d and d′ over X such that for all p, q, r, s ∈ X, d(p,q) < d... |

23 | Measures of clustering quality: A working set of axioms for clustering
- Ackerman, Ben-David
- 2008
Citation Context ...oretical foundations of clustering beginning with the famous impossibility result by (Kleinberg, 2003), which showed that no clustering function can simultaneously satisfy three specific properties. (Ackerman & Ben-David, 2008) reformulate these properties in the setting of CQMs, and show that these properties are consistent and satisfied by many CQMs. We follow up on (Ackerman & Ben-David, 2008) by studying natural proper... |

19 | A uniqueness theorem for clustering - Zadeh, Ben-David - 2009 |

12 | Clusterability: A theoretical study
- Ackerman, Ben-David
- 2009
Citation Context ...s clusterings consistent with what a human would have chosen.” (Ng, Jordan, & Weiss, 2002) Up until now, clustering quality measures and human judgment were considered complementary approaches to clustering evaluation. Most papers that pres... [Footnote 1: If no good clusterings have been found the underlying dataset may have no good clustering (the data is not “clusterable”, see (Ackerman & Ben-David, 2009) for more on clusterability).] |

7 | Characterization of Linkage-based Clustering
- Ackerman, Ben-David, et al.
- 2010
Citation Context ... CQM selection, in particular for selecting a versatile set of CQMs, we develop a property-based framework for distinguishing CQMs based on such a framework for clustering algorithms discussed in (Ackerman, Ben-David, & Loker, 2010b) (also see (BosaghZadeh & Ben-David, 2009) and (Ackerman, Ben-David, & Loker, 2010a)). The framework consists of identifying natural properties of CQMs and classifying measures based on the properti... |

4 |
Measuring the power of hierarchical cluster analysis
- Baker, Hubert
- 1975
Citation Context ...X, we write $x \sim_C y$ if x and y belong to the same cluster in C and $x \not\sim_C y$, otherwise. Finally, a CQM is a function that maps clusterings to real numbers. Gamma: This measure was proposed as a CQM by (Baker & Hubert, 1975) and it is the best performing measure in (Milligan, 1981). Let $d_+$ denote the number of times that a pair of points that was clustered together has distance smaller than two points that belong to different clusters, whereas $d_-$ denotes the opposite result. F... |

4 |
A monte-carlo study of 30 internal criterion measures for cluster-analysis
- Milligan
- 1981
Citation Context ...d $x \not\sim_C y$, otherwise. Finally, a CQM is a function that maps clusterings to real numbers. Gamma: This measure was proposed as a CQM by (Baker & Hubert, 1975) and it is the best performing measure in (Milligan, 1981). Let $d_+$ denote the number of times that a pair of points that was clustered together has distance smaller than two points that belong to different clusters, whereas $d_-$ denotes the opposite result. F... |

1 |
Differentiating clustering paradigms: a property-based approach
- Ackerman, Ben-David, et al.
- 2010
Citation Context ... CQM selection, in particular for selecting a versatile set of CQMs, we develop a property-based framework for distinguishing CQMs based on such a framework for clustering algorithms discussed in (Ackerman, Ben-David, & Loker, 2010b) (also see (BosaghZadeh & Ben-David, 2009) and (Ackerman, Ben-David, & Loker, 2010a)). The framework consists of identifying natural properties of CQMs and classifying measures based on the properti... |

1 | Finding a better k: A psychophysical investigation of clustering
- Lewis
- 2009
Citation Context ...ical formulations. The CQMs selected for the study are diverse in that they each satisfy a distinct set of these properties. Previous studies have investigated how humans choose the number of groups (Lewis, 2009) and partition data (Santos & Sá, 2005) in a clustering setting, but these approaches only show what humans think are the optimal partitions rather than how they judge partition quality in general. O... |

1 |
Human clustering on bi-dimensional data: An assessment (Tech
- Santos, Sá
- 2005
Citation Context ...ted for the study are diverse in that they each satisfy a distinct set of these properties. Previous studies have investigated how humans choose the number of groups (Lewis, 2009) and partition data (Santos & Sá, 2005) in a clustering setting, but these approaches only show what humans think are the optimal partitions rather than how they judge partition quality in general. Our study uses a set of non-optimal part... |

1 | Available from http://www.di.ubi.pt/˜lfbaa/ entnetsPubs/JMS - Biomédica, Porto - 2002 |