Results 1 - 8 of 8
K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data
- Bioinformatics
, 2003
Cited by 25 (2 self)
A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships in scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, it usually lacks a method to actually identify distinct clusters, and it produces a large number of possible leaf orderings of the hierarchical clustering tree.
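To make the "large number of possible leaf orderings" concrete, here is a minimal sketch that assumes a binary tree (the common case for hierarchical clustering; the paper itself considers k-ary trees). The tuple encoding and the gene names are hypothetical, chosen only for illustration. Flipping the two children at any internal node leaves the tree unchanged but changes the left-to-right leaf order, so a binary tree with n leaves admits 2^(n-1) orderings.

```python
from itertools import product

def leaf_orderings(tree):
    """Return every leaf ordering obtainable by flipping children at internal nodes.

    A tree is either a leaf label (str) or a 2-tuple (left, right).
    """
    if not isinstance(tree, tuple):
        return [[tree]]
    left, right = tree
    orderings = []
    for a, b in product(leaf_orderings(left), leaf_orderings(right)):
        orderings.append(a + b)  # children in their original order
        orderings.append(b + a)  # children flipped
    return orderings

# Hypothetical four-gene dendrogram: ((g1, g2), (g3, g4))
tree = (("g1", "g2"), ("g3", "g4"))
orders = leaf_orderings(tree)
print(len(orders))  # 8, i.e. 2 ** (4 - 1)
```

Exhaustive enumeration is only feasible for tiny trees, which is why choosing a good ordering calls for a dedicated algorithm rather than brute force.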
A Two-Way Visualization Method for Clustered Data
, 2003
Cited by 8 (3 self)
We describe a novel approach to the visualization of hierarchical clustering that superimposes the classical dendrogram over a fully synchronized low-dimensional embedding, thereby gaining the benefits of both approaches. In a single image one can view all the clusters, examine the relations between them and study many of their properties. The method is based on an algorithm for low-dimensional embedding of clustered data, with the property that separation between all clusters is guaranteed, regardless of their nature. In particular, the algorithm was designed to produce embeddings that strictly adhere to a given hierarchical clustering of the data, so that every two disjoint clusters in the hierarchy are drawn separately.
Overcoming the Curse of Dimensionality in Clustering by means of the Wavelet Transform
- The Computer Journal
, 2000
Cited by 7 (2 self)
We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's "curse of dimensionality" in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modeling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.
1 Introduction Bellman's (1961) [1] "curse of dimensionality" refers to the exponential growth of hypervolume as a function of dimensionality. All problems become tougher as the dimensionality increases. Nowhere is this more evident than in problems related to ...
The history of the cluster heat map
- The American Statistician
, 2009
Cited by 4 (0 self)
The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. On the vertical and horizontal margins of the tiling there are hierarchical cluster trees. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th century publications. And we trace a diverse 20th century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.
A Two-Way Visualization Method for Clustered Data (Extended Abstract)
Yehuda Koren and David Harel, Dept. of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel, {yehuda,dharel}@wisdom.weizmann.ac.il
ABSTRACT We describe a novel approach to the visualization of hierarchical clustering that superimposes the classical dendrogram over a fully synchronized low-dimensional embedding, thereby gaining the benefits of both approaches. In a single image one can view all the clusters, examine the relations between them and study many of their properties. The method is based on an algorithm for low-dimensional embedding of clustered data, with the property that separation between all clusters is guaranteed, regardless of their nature. In particular, the algorithm was designed to produce embeddings that strictly adhere to a given hierarchical clustering of the data, so that every two disjoint clusters in the hierarchy are drawn separately.
Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
, 2009
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is independent of the dimensionality of the data. A big advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows for judging cluster quality but also makes mis-specification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
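The stepwise reordering (between clusters, then within each cluster) can be illustrated with a small sketch. This is not the paper's seriation method: as a simplifying assumption, clusters are chained greedily by average between-cluster dissimilarity, and objects inside a cluster are sorted by their total dissimilarity to the other members. The function names and the toy data are hypothetical.

```python
def reorder_by_cluster(dissim, labels):
    """Return an object order that places each cluster in one contiguous block.

    dissim: symmetric list of lists of pairwise dissimilarities.
    labels: cluster label for each object.
    """
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)

    def avg_between(a, b):
        pairs = [(i, j) for i in members[a] for j in members[b]]
        return sum(dissim[i][j] for i, j in pairs) / len(pairs)

    # Between-cluster step (stand-in heuristic): greedy chain that always
    # appends the unplaced cluster closest to the last placed one.
    remaining = sorted(members)
    chain = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda c: avg_between(chain[-1], c))
        remaining.remove(nxt)
        chain.append(nxt)

    # Within-cluster step (stand-in heuristic): most central objects first.
    order = []
    for lab in chain:
        idx = members[lab]
        order.extend(sorted(idx, key=lambda i: sum(dissim[i][j] for j in idx)))
    return order

def permute(dissim, order):
    """Apply the same permutation to rows and columns."""
    return [[dissim[i][j] for j in order] for i in order]

# Toy data: five points on a line, three clusters.
points = [0, 10, 1, 11, 5]
labels = ["a", "b", "a", "b", "c"]
dissim = [[abs(p - q) for q in points] for p in points]

order = reorder_by_cluster(dissim, labels)
print(order)  # [0, 2, 4, 1, 3]
reordered = permute(dissim, order)
```

Shading `reordered` as an image would show dark diagonal blocks for compact clusters, and because only pairwise dissimilarities are used, the picture does not depend on the dimensionality of the original data.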
Computational and Interactive Visualization with a Focus on Topological Analysis, Dual Contouring and Water-resource Data Representation
, 2007
The increase in computing power has led to a substantial increase in the size of scientific and engineering data sets. Often, research in high-dimensional spaces requires analysis of terabytes of data. This in turn has led to an increase in the demand for simplified representations of these large datasets for effective analysis. Scientific visualization facilitates visual interpretation of massive data sets. Thus, visualization is driven by the needs of a broad spectrum of research areas. The first part of this dissertation describes the use of topology to segment two-dimensional tensor fields. The second part describes a ray intersection method to generate dual isosurfaces for trivariate, volumetric data. The third part describes an interactive visualization system for visualizing water-resource data. In the first two parts, we describe how topology can serve as a foundation for two different areas: (a) in tensor field interpretation, to extract the topology of the field based on different interpolation schemes to reduce complexity; (b) in volume visualization, to ensure topological correctness of isosurfaces, using that as the underlying principle of the dual-isosurfacing algorithm. In the third part, we present the visualization systems developed as a part of application-driven research concerned with management and planning of water resources. We describe two systems which support: (a) a global analysis of multiparameter time-series data of different components of a large hydrological system, and (b) a localized statistical analysis of time-series data of a single parameter in a specific part of a hydrological system.

