K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data
Bioinformatics, 2003
Abstract

Cited by 32 (2 self)
A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships at scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, usually lacks a method to actually identify distinct clusters, and produces a large number of possible leaf orderings of the hierarchical clustering tree.
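The leaf-ordering problem named in this abstract can be illustrated with a small sketch. It assumes a binary tree (the paper itself works with k-ary trees, and its algorithm is not given in the abstract): the two children of every internal node can be flipped without changing the tree, and an optimal ordering minimizes the summed distance between neighboring leaves. The exhaustive search below, and the names `leaf_orders`, `path_cost` and `optimal_leaf_order`, are hypothetical illustrations rather than the paper's method; it tries all 2^(n-1) flips, so it is only usable for a handful of leaves.

```python
from typing import List, Sequence, Union

# A binary tree: a leaf is an int (row index); an internal node is a (left, right) tuple.
Tree = Union[int, tuple]


def leaf_orders(tree: Tree):
    """Yield every leaf order obtainable by flipping the children of internal nodes."""
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro  # children in their original order
            yield ro + lo  # children flipped


def path_cost(order: Sequence[int], dist) -> float:
    """Sum of distances between neighboring leaves in the displayed order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def optimal_leaf_order(tree: Tree, dist) -> List[int]:
    """Exhaustive search over all 2**(n-1) flip combinations; tiny trees only."""
    return min(leaf_orders(tree), key=lambda o: path_cost(o, dist))


# Usage: four hypothetical genes at positions 0, 1, 10, 11.
pos = [0, 1, 10, 11]
dist = [[abs(a - b) for b in pos] for a in pos]
tree = ((1, 0), (3, 2))  # as drawn, the leaf order is [1, 0, 3, 2] (cost 13)
best = optimal_leaf_order(tree, dist)
print(best, path_cost(best, dist))  # [3, 2, 1, 0] 11
```

The tree, and therefore the clustering, is unchanged; only the left-to-right placement of subtrees differs, which is why the ordering can be optimized independently of how the tree was built.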
The history of the cluster heat map
The American Statistician, 2009
Abstract

Cited by 16 (0 self)
The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. Hierarchical cluster trees are drawn on the vertical and horizontal margins of the tiling. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th-century publications, and we trace a diverse 20th-century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.
Overcoming the Curse of Dimensionality in Clustering by means of the Wavelet Transform
The Computer Journal, 2000
Abstract

Cited by 11 (3 self)
We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's "curse of dimensionality" in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modeling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.
1 Introduction
Bellman's (1961) [1] "curse of dimensionality" refers to the exponential growth of hypervolume as a function of dimensionality. All problems become tougher as the dimensionality increases. Nowhere is this more evident than in problems related to ...
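Steps (ii) to (iv) can be sketched on a single, already canonically ordered 1-D profile. This is a minimal illustration under stated assumptions: it uses the B3-spline "à trous" redundant wavelet transform with periodic boundaries and a per-scale k-sigma threshold based on the median absolute deviation. The paper's actual noise model and its handling of 2-D document/term data are not described in the abstract, and the names `a_trous` and `significance_mask` are hypothetical.

```python
import statistics

# B3-spline smoothing kernel, a common choice for the "a trous" (with holes) algorithm.
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def a_trous(signal, levels):
    """Redundant (undecimated) wavelet transform of a 1-D signal.

    Returns (details, smooth): one detail scale per level, each as long as the
    input, plus the final smooth array. Boundaries wrap around (periodic),
    an assumption made here for brevity.
    """
    c = [float(x) for x in signal]
    n = len(c)
    details = []
    for j in range(levels):
        step = 2 ** j  # kernel taps are spread 2**j samples apart (the "holes")
        smooth = [
            sum(w * c[(i + (k - 2) * step) % n] for k, w in enumerate(KERNEL))
            for i in range(n)
        ]
        details.append([a - b for a, b in zip(c, smooth)])
        c = smooth
    return details, c


def significance_mask(details, k=3.0):
    """Flag coefficients whose magnitude exceeds k * sigma at each scale.

    sigma is a per-scale robust estimate (median absolute deviation / 0.6745),
    a simplification of the wavelet-space noise modeling in step (iii).
    """
    mask = []
    for scale in details:
        med = statistics.median(scale)
        sigma = statistics.median([abs(w - med) for w in scale]) / 0.6745
        mask.append([abs(w) > k * sigma for w in scale])
    return mask


# Usage on a hypothetical ordered profile with a single jump.
profile = [0.0] * 16 + [5.0] * 16
details, smooth = a_trous(profile, levels=3)
mask = significance_mask(details)
```

Because each detail scale is the difference between two successive smoothings, adding the final smooth array to all detail scales returns the original profile (up to rounding); the transform is redundant but loses nothing.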
A Two-Way Visualization Method for Clustered Data
2003
Abstract

Cited by 10 (3 self)
We describe a novel approach to the visualization of hierarchical clustering that superimposes the classical dendrogram over a fully synchronized low-dimensional embedding, thereby gaining the benefits of both approaches. In a single image one can view all the clusters, examine the relations between them and study many of their properties. The method is based on an algorithm for low-dimensional embedding of clustered data, with the property that separation between all clusters is guaranteed, regardless of their nature. In particular, the algorithm was designed to produce embeddings that strictly adhere to a given hierarchical clustering of the data, so that every two disjoint clusters in the hierarchy are drawn separately.
Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
2009
Abstract

Cited by 1 (1 self)
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters), which leads to a visualization technique that is independent of the dimensionality of the data. A big advantage is that it presents the structure between clusters and the microstructure within clusters in one concise plot. This not only allows for judging cluster quality but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
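The stepwise reordering described here (order the clusters, then order the objects within each cluster) can be sketched as follows. A greedy nearest-neighbor path stands in for seriation; this heuristic is an assumption for illustration, since the abstract does not say which seriation algorithms the paper uses. The names `greedy_seriation` and `dissimilarity_plot_order` are hypothetical.

```python
from collections import defaultdict


def greedy_seriation(items, dist):
    """Order items along a greedy nearest-neighbor path (a simple stand-in for seriation)."""
    items = sorted(items)
    if len(items) <= 2:
        return items
    # Start from the item with the largest total dissimilarity, a likely end of the path.
    start = max(items, key=lambda a: sum(dist(a, b) for b in items))
    order, rest = [start], [x for x in items if x != start]
    while rest:
        nxt = min(rest, key=lambda b: dist(order[-1], b))
        order.append(nxt)
        rest.remove(nxt)
    return order


def dissimilarity_plot_order(d, labels):
    """Seriate the clusters first, then seriate the objects inside each cluster."""
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        members[lab].append(i)

    def between(a, b):
        # Mean dissimilarity between the objects of clusters a and b.
        total = sum(d[i][j] for i in members[a] for j in members[b])
        return total / (len(members[a]) * len(members[b]))

    order = []
    for cluster in greedy_seriation(members, between):
        order.extend(greedy_seriation(members[cluster], lambda i, j: d[i][j]))
    return order


# Usage: hypothetical 1-D points with labels from some partitional method.
pos = [0, 10, 1, 11, 2, 12]
labels = [0, 1, 0, 1, 0, 1]
d = [[abs(a - b) for b in pos] for a in pos]
order = dissimilarity_plot_order(d, labels)  # [0, 2, 4, 1, 3, 5]
shaded = [[d[i][j] for j in order] for i in order]  # reordered matrix to shade
```

Shading `shaded` cell by cell (darker for smaller dissimilarity) would show each compact cluster as a dark block on the diagonal, with related clusters placed next to each other.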
Clutter-based dimension reordering in multidimensional data visualization (Master's thesis, 2005)
Abstract

Cited by 1 (0 self)
Visual clutter denotes a disordered collection of graphical entities in information visualization. It can obscure the structure present in the data. Even in a small dataset, visual clutter makes it hard for the viewer to find patterns, relationships and structure. In this thesis, I study visual clutter with four distinct visualization techniques, and present the concept and framework of Clutter-Based Dimension Reordering (CBDR). Dimension order is an attribute that can significantly affect a visualization’s expressiveness. By varying the dimension order in a display, it is possible to reduce clutter without reducing data content or modifying the data in any way. Clutter reduction is a display-dependent task. In this thesis, I apply the CBDR framework to four different visualization techniques. For each display technique, I determine what constitutes clutter in terms of display properties, then design a metric to measure visual clutter in this display. Finally, I search for an order that minimizes the clutter in a display. Different algorithms for the searching process are
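For parallel coordinates, the search for a clutter-minimizing dimension order can be sketched with a hypothetical clutter metric: the number of polyline crossings between adjacent axes. The thesis designs its own metric for each display, so treat this metric, the exhaustive search, and the names `crossings`, `clutter` and `best_dimension_order` as assumptions for illustration only.

```python
from itertools import combinations, permutations


def crossings(rows, a, b):
    """Number of polyline pairs that cross between adjacent axes a and b."""
    return sum(
        1 for p, q in combinations(rows, 2) if (p[a] - q[a]) * (p[b] - q[b]) < 0
    )


def clutter(rows, order):
    """Hypothetical clutter metric: total crossings over all adjacent axis pairs."""
    return sum(crossings(rows, a, b) for a, b in zip(order, order[1:]))


def best_dimension_order(rows):
    """Exhaustive search over axis orders (factorial cost; few dimensions only)."""
    return min(permutations(range(len(rows[0]))), key=lambda o: clutter(rows, o))


# Usage: dimensions 0 and 2 rise together while dimension 1 falls (hypothetical data).
rows = [(1, 3, 1), (2, 2, 2), (3, 1, 3)]
best = best_dimension_order(rows)
print(best, clutter(rows, best), clutter(rows, (0, 1, 2)))  # (0, 2, 1) 3 6
```

Only the order of the axes changes; the data values are untouched, matching the abstract's point that clutter can be reduced without modifying the data. Because a reversed order has the same crossing count, a practical search could halve the candidates, and beyond a few dimensions the factorial growth makes a heuristic search necessary.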
Computational and Interactive Visualization with a Focus on Topological Analysis, Dual Contouring and Water-Resource Data Representation
2007
Abstract
An increase in computing power has led to a substantial increase in the size of scientific and engineering data sets. Often, research in high-dimensional spaces requires analysis of terabytes of data. This in turn has increased the demand for simplified representations of these large datasets for effective analysis. Scientific visualization facilitates visual interpretation of massive data sets. Thus, visualization is driven by the needs of a broad spectrum of research areas. The first part of this dissertation describes the use of topology to segment two-dimensional tensor fields. The second part describes a ray intersection method to generate dual isosurfaces for trivariate, volumetric data. The third part describes an interactive visualization system for visualizing water-resource data. In the first two parts, we describe how topology can serve as a foundation for two different areas: (a) in tensor field interpretation, to extract the topology of the field based on different interpolation schemes to reduce complexity; and (b) in volume visualization, to ensure topological correctness of isosurfaces, using that as the underlying principle of the dual-isosurfacing algorithm. In the third part, we present the visualization systems developed as part of application-driven research concerned with the management and planning of water resources. We describe two systems, which support (a) a global analysis of multi-parameter time-series data of different components of a large hydrological system, and (b) a localized statistical analysis of time-series data of a single parameter in a specific part of a hydrological system.
History Corner: The History of the Cluster Heat Map
Abstract
The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling, with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. On the vertical and horizontal margins of the tiling are hierarchical cluster trees. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th-century publications, and trace a diverse 20th-century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.
KEY WORDS: Cluster analysis; Heatmap; Microarray; Visualization.
Classification Visualization with Shaded Similarity Matrix
Abstract
The shaded similarity matrix has long been used in visual cluster analysis. This paper investigates how it can be used for classification visualization. We focus on two popular classification methods: nearest neighbor and decision tree. Visualization of ensemble classifiers is also presented for handling large data sets.
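For the nearest-neighbor case, the basic display can be sketched in text form: compute pairwise similarities, order rows and columns by class label, and shade each cell. The similarity function (1 / (1 + Euclidean distance)), the ten-level character ramp, and the names `similarity`, `shaded_matrix` and `nearest_neighbor_label` are assumptions for illustration; the decision-tree and ensemble visualizations are not covered. `math.dist` requires Python 3.8 or later.

```python
import math

SHADES = " .:-=+*#%@"  # ten levels, from dissimilar (blank) to identical (@)


def similarity(p, q):
    """Assumed similarity: 1 / (1 + Euclidean distance), so identical points give 1."""
    return 1.0 / (1.0 + math.dist(p, q))


def shaded_matrix(points, labels):
    """Return the class-sorted index order and one text row per point."""
    order = sorted(range(len(points)), key=lambda i: labels[i])
    rows = []
    for i in order:
        cells = []
        for j in order:
            level = int(similarity(points[i], points[j]) * len(SHADES))
            cells.append(SHADES[min(level, len(SHADES) - 1)])
        rows.append("".join(cells))
    return order, rows


def nearest_neighbor_label(points, labels, query):
    """1-NN: predict the label of the most similar training point."""
    best = max(range(len(points)), key=lambda i: similarity(points[i], query))
    return labels[best]


# Usage with hypothetical two-class data.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = ["a", "a", "b", "b"]
order, rows = shaded_matrix(points, labels)
for line in rows:
    print(line)
# @+..
# +@..
# ..@+
# ..+@
print(nearest_neighbor_label(points, labels, (4.5, 5)))  # b
```

With rows grouped by class, well-separated classes show up as dark diagonal blocks, and a query's row of similarities indicates which block its nearest neighbor falls in.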