## Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering (2009)

Citations: 1 (1 self)

### BibTeX

@MISC{Hahsler09dissimilarityplots:,

author = {Michael Hahsler and Kurt Hornik},

title = {Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering},

year = {2009}

}

### Abstract

For hierarchical clustering, dendrograms provide a convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data, and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters), which leads to a visualization technique that is independent of the dimensionality of the data. A major advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows cluster quality to be judged but also makes mis-specification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
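The stepwise reordering described in the abstract (between clusters, then within each cluster) can be illustrated with a small sketch. This is not the authors' implementation, which is provided by the R package seriation. It assumes a greedy nearest-neighbor path heuristic as a stand-in for a real seriation method and uses average pairwise dissimilarity as the between-cluster dissimilarity. All function names below are hypothetical.

```python
def path_length(order, d):
    """Length of the path that visits the items in the given order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def greedy_path(items, d):
    """Stand-in seriation heuristic (an assumption, not the paper's method):
    build a nearest-neighbor path from every start and keep the shortest."""
    best = None
    for start in items:
        path, rest = [start], set(items) - {start}
        while rest:
            # Ties are broken by the item itself to keep the result deterministic.
            nxt = min(rest, key=lambda j: (d[path[-1]][j], j))
            path.append(nxt)
            rest.remove(nxt)
        if best is None or path_length(path, d) < path_length(best, d):
            best = path
    return best


def average_link(members_a, members_b, d):
    """Average of all pairwise dissimilarities between two clusters."""
    pairs = [(i, j) for i in members_a for j in members_b]
    return sum(d[i][j] for i, j in pairs) / len(pairs)


def dissimilarity_plot_order(d, labels):
    """Order clusters first, then order the objects inside each cluster."""
    clusters = {}
    for obj, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(obj)
    between = {
        a: {b: average_link(clusters[a], clusters[b], d) for b in clusters if b != a}
        for a in clusters
    }
    cluster_order = greedy_path(sorted(clusters), between)
    order = []
    for lab in cluster_order:
        order.extend(greedy_path(clusters[lab], d))
    return order


def reorder(d, order):
    """Permute rows and columns of d; shading this matrix gives the plot."""
    return [[d[i][j] for j in order] for i in order]


# Toy data: six points on a line, two interleaved clusters.
points = [0, 10, 1, 11, 2, 12]
labels = [0, 1, 0, 1, 0, 1]
d = [[abs(p - q) for q in points] for p in points]
order = dissimilarity_plot_order(d, labels)
print(order)
print(reorder(d, order)[0])
```

Because the sketch only ever reads the dissimilarity matrix, the number of dimensions of the original data does not matter, which is the property the abstract emphasizes. A real implementation would replace `greedy_path` with a proper seriation algorithm.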

### Citations

1821 |
Cluster analysis and display of genome-wide expression patterns
- Eisen
- 1998
Citation Context ...euristics. For example matrix shading is often used in connection with hierarchical clustering, where the order of the dendrogram leaf nodes is used to arrange the matrix yielding a cluster heat map (Eisen, Spellman, Brown, and Botstein, 1998; Wilkinson and Friendly, 2009). Since the order of leaf nodes in a dendrogram is not unique (each subtree can be rotated) and to further improve the presentation, the leaf nodes can be reordered usin... |

640 |
UCI machine learning repository
- Asuncion, Newman
- 2007
Citation Context ...list(main = "")) 4.4 High-dimensional data To show how dissimilarity plots work with higher-dimensional data, we use the Votes data set available via the UCI Repository of Machine Learning Databases (Asuncion and Newman, 2007). This data set includes votes (voted for, voted against and unknown) for each of the 435 congressmen of the U.S. House of Representatives on the 16 key votes during the second session of 1984. Hence th... |

224 |
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
- Rousseeuw
- 1987
Citation Context ...pically difficult for higher dimensional data. Another approach is to visualize metrics calculated from inter and intra-cluster similarities to judge cluster quality. For example, silhouette width (Rousseeuw, 1987; Kaufman and Rousseeuw, 1990) is a measure for how much an object belongs to its cluster (intra cluster similarity) compared to how close it is to objects in its nearest neighboring clusters. Silhoue... |

97 | Traveling Salesman Problem And Its Variations
- Gutin, Punnen, et al.
- 2002
Citation Context ...=1 The length of the Hamiltonian path is equal to the value of the minimal span loss function (as used by Chen, 2002), and both notions are closely related to the traveling salesperson problem (TSP) (Gutin and Punnen, 2002). For the TSP exist specialized solvers (e.g., Concorde by Applegate, Bixby, Chvátal, and Cook (2006)) and good heuristics (e.g., Lin and Kernighan, 1973) which are more efficient than general seriat... |

55 |
Mosaic Displays for Multi-Way Contingency Tables
- Friendly
- 1994
Citation Context ...hic Information Processing” (Bertin, 1981, which was first published in French in 1967) to this topic. More recently matrix reordering was applied to mosaic displays for multi-way contingency tables (Friendly, 1994), distance matrices (Wishart, 1999), correlation matrices (Friendly, 2002), and scatter plot matrices (Hurley, 2004). For these applications reordering is typically done using heuristics. For example... |

54 |
Numerical methods for fuzzy clustering
- Ruspini
- 1970
Citation Context ...ithms is provided by Michael Brusco and is available in the R extension package seriation (Hahsler, Buchta, and Hornik, 2009). 4.1 Easily distinguishable groups First we look at the Ruspini data set (Ruspini, 1970) which is popular for illustrating clustering techniques. It consists of 75 points in two-dimensional space with four clearly distinguishable groups. We calculated a dissimilarity matrix using the eu... |

45 | Fast optimal leaf ordering for hierarchical clustering - Bar-Joseph, Gifford, et al. - 2001 |

45 |
Graphics and Graphic Information Processing. Walter de Gruyter
- Bertin
- 1981
Citation Context ...proved by reordering the rows and columns. Reordering matrices is a long known technique. For example Jacques Bertin devotes a whole chapter of his book “Graphics and Graphic Information Processing” (Bertin, 1981, which was first published in French in 1967) to this topic. More recently matrix reordering was applied to mosaic displays for multi-way contingency tables (Friendly, 1994), distance matrices (Wisha... |

40 | Relationship-based clustering and visualization for high-dimensional data mining - Strehl, Ghosh - 2003 |

29 |
An overview of combinatorial data analysis
- Arabie, Hubert
- 1996
Citation Context ...on 4 presents several examples to show how dissimilarity plots compare to other methods. We conclude the paper with Section 5. 2 Seriation Seriation is a basic problem in combinatorial data analysis (Arabie and Hubert, 1996) with the aim to arrange all objects in a set in a linear order given available data and some loss function, in order to reveal structural information. Solving problems in combinatorial data analys... |

28 |
PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order
- Caraux, Pinloche
- 2005
Citation Context ... the vertices and each edge e_ij ∈ E between the objects O_i, O_j ∈ Ω has a weight w_ij associated which represents the dissimilarity d_ij. Such a graph can be used for seriation (see, e.g., Hubert, 1974; Caraux and Pinloche, 2005). An order Ψ of the objects can be seen as a path through the graph where each node is visited exactly once, i.e., a Hamiltonian path. Minimizing the Hamiltonian path length results in a seriation op... |

24 |
Representation of similarity matrices by trees
- Hartigan
- 1967
Citation Context ...cate a good clustering. To present these similarities visualizations are helpful for judging the quality of a clustering and to explore the cluster structure. For hierarchical clustering dendrograms (Hartigan, 1967) are available which show the hierarchical structure of the clustering as a binary tree. Similarities between clusters and between objects are represented in the plot by the height of internal nodes ... |

23 | Corrgrams: Exploratory displays for correlation matrices
- Friendly
- 2002
Citation Context ...ench in 1967) to this topic. More recently matrix reordering was applied to mosaic displays for multi-way contingency tables (Friendly, 1994), distance matrices (Wishart, 1999), correlation matrices (Friendly, 2002), and scatter plot matrices (Hurley, 2004). For these applications reordering is typically done using heuristics. For example matrix shading is often used in connection with hierarchical clustering, ... |

21 |
Set Theory
- Hausdorff
- 1957
Citation Context ...possible dissimilarity between any two objects, one of each set. Average-link computes the average of all pairwise dissimilarities between objects of the two sets. In set theory the Hausdorff metric (Hausdorff, 2001) is used to calculate the dissimilarity between two sets defined from pairwise dissimilarities between the elements in the two sets. The metric is defined as: d_H(X, Y) = max{sup_{x∈X} inf_{y∈Y} d(x, y), s... |

19 | Generalized association plots: Information visualization via iteratively generated correlation matrices - Chen - 2002 |

16 |
Some applications of graph theory and related nonmetric techniques to problems of approximate seriation: the case of symmetric proximity measures
- Hubert
- 1974
Citation Context ...s Ω constitute the vertices and each edge e_ij ∈ E between the objects O_i, O_j ∈ Ω has a weight w_ij associated which represents the dissimilarity d_ij. Such a graph can be used for seriation (see, e.g., Hubert, 1974; Caraux and Pinloche, 2005). An order Ψ of the objects can be seen as a path through the graph where each node is visited exactly once, i.e., a Hamiltonian path. Minimizing the Hamiltonian path lengt... |

16 | The history of the cluster heat map
- Wilkinson
- 2009
Citation Context ...d in connection with hierarchical clustering, where the order of the dendrogram leaf nodes is used to arrange the matrix yielding a cluster heat map (Eisen, Spellman, Brown, and Botstein, 1998; Wilkinson and Friendly, 2009). Since the order of leaf nodes in a dendrogram is not unique (each subtree can be rotated) and to further improve the presentation, the leaf nodes can be reordered using heuristics (e.g., Gruvaeus a... |

14 | A method for chronologically ordering archaeological deposits - Robinson - 1951 |

13 | Unclassed Matrix Shading and Optimal Ordering in Hierarchical Cluster Analysis - Gale, Halperin, et al. - 1984 |

13 | Getting things in order: An introduction to the R package seriation - Hahsler, Hornik, et al. - 2008 |

12 |
Combinatorial data analysis: optimization by dynamic programming
- Hubert, Arabie, et al.
- 2001
Citation Context ...ed a perfect anti-Robinson matrix after the statistician Robinson (1951). Formally, an n × n dissimilarity matrix D is in anti-Robinson form if and only if the following two gradient conditions hold (Hubert et al., 1987): within rows: d_ik ≤ d_ij for 1 ≤ i < k < j ≤ n (2); within columns: d_kj ≤ d_ij for 1 ≤ i < k < j ≤ n (3). In an anti-Robinson matrix the smallest dissimilarity values appear close to the main diagonal... |

11 |
Two Additions to Hierarchical Cluster Analysis
- Gruvaeus, Wainer
- 1972
Citation Context ...ndly, 2009). Since the order of leaf nodes in a dendrogram is not unique (each subtree can be rotated) and to further improve the presentation, the leaf nodes can be reordered using heuristics (e.g., Gruvaeus and Wainer, 1972). Only more recently Bar-Joseph, Demaine, Gifford, and Jaakkola (2001) developed an O(n^4) algorithm that finds the optimal order of leaf nodes which minimizes the sum of distances between the nodes... |

10 |
A Computer Generated Aid for Cluster Analysis
- Ling
- 1973
Citation Context ...and neighborhood graphs) are reviewed in Leisch (2008). The visualization technique presented in this paper is based on a different technique called matrix shading (see, e.g., Sneath and Sokal, 1973; Ling, 1973; Gale, Halperin, and Costanzo, 1984). For matrix shading, each value in the matrix is represented by a square with the intensity of the color depending on the value. The presentation is improved by r... |

9 |
Branch-and-Bound Applications in Combinatorial Data Analysis
- Brusco, Stahl
- 2005
Citation Context ...on is a combinatorial problem and thus in general very difficult to solve for all but extremely small problems. Recently a very efficient algorithm for the seriation problem based on branch-and-bound (Brusco and Stahl, 2005) and a heuristic that combines dynamic programming with simulated annealing (Brusco, Köhn, and Stahl, 2008) have been developed. This algorithmic progress allows us to use seriation for visu... |

6 |
Displaying a clustering with CLUSPLOT
- Pison, Struyf, et al.
- 1999
Citation Context ...nality reduction methods (e.g., principal component analysis or multi-dimensional scaling). Objects belonging to the same cluster can be marked and separation between clusters can be judged visually (Pison, Struyf, and Rousseeuw, 1999). This type of visualization works very well if the dimensionality reduction is able to preserve a large portion of the variability in the original data which is typically difficult for higher dimens... |

3 |
Heuristic implementation of dynamic programming for matrix permutation problems in combinatorial data analysis
- Brusco, Köhn, et al.
- 2008
Citation Context ...s. Recently a very efficient algorithm for the seriation problem based on branch-and-bound (Brusco and Stahl, 2005) and a heuristic that combines dynamic programming with simulated annealing (Brusco, Köhn, and Stahl, 2008) have been developed. This algorithmic progress allows us to use seriation for visualization of clusterings of larger data sets. The rest of the paper is organized as follows. In Section 2 we introdu... |

3 |
Clustering visualization of multidimensional data
- Hurley
- 2004
Citation Context ...atrix reordering was applied to mosaic displays for multi-way contingency tables (Friendly, 1994), distance matrices (Wishart, 1999), correlation matrices (Friendly, 2002), and scatter plot matrices (Hurley, 2004). For these applications reordering is typically done using heuristics. For example matrix shading is often used in connection with hierarchical clustering, where the order of the dendrogram leaf nod... |

3 |
Clustangraphics3: Interactive graphics for cluster analysis
- Wishart
- 1999
Citation Context ... 1981, which was first published in French in 1967) to this topic. More recently matrix reordering was applied to mosaic displays for multi-way contingency tables (Friendly, 1994), distance matrices (Wishart, 1999), correlation matrices (Friendly, 2002), and scatter plot matrices (Hurley, 2004). For these applications reordering is typically done using heuristics. For example matrix shading is often used in co... |

2 | Visualizing cluster analysis and finite mixture models, in Handbook of Data Visualization - Leisch - 2008 |

1 | seriation: Infrastructure for seriation, 2009. URL http://CRAN.R-project.org/package=seriation. R package version - Hahsler, Buchta, et al. |

1 | colorspace: Color Space Manipulation, 2008. URL http://CRAN.R-project.org/package=colorspace. R package version - Ihaka, Murrell, et al. |