Results 1-10 of 20
Quality metrics in high-dimensional data visualization: an overview and systematization
IEEE TRANS. ON VISUALIZATION AND COMPUTER GRAPHICS, 2011
Cited by 29 (3 self)
In this paper, we present a systematization of techniques that use quality metrics to help in the visual exploration of meaningful patterns in high-dimensional data. In a number of recent papers, different quality metrics are proposed to automate the demanding search through large spaces of alternative visualizations (e.g., alternative projections or ordering), allowing the user to concentrate on the most promising visualizations suggested by the quality metrics. Over the last decade, this approach has witnessed a remarkable development but few reflections exist on how these methods are related to each other and how the approach can be developed further. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on implications of our model for future research.
Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
2009
Cited by 3 (2 self)
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well-known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is independent of the dimensionality of the data. A big advantage is that it presents the structure between clusters and the microstructure within clusters in one concise plot. This not only allows for judging cluster quality but also makes misspecification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
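The stepwise ordering described in this abstract (first between clusters, then within each cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a greedy nearest-neighbor chain stands in for a proper seriation method, average pairwise dissimilarity is assumed as the between-cluster measure, and the names `greedy_path` and `dissimilarity_plot_order` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def greedy_path(items: Sequence, dist: Callable) -> List:
    """Chain items greedily: start at the first item, then always move
    to the closest remaining item. A simple stand-in for seriation."""
    remaining = list(items[1:])
    path = [items[0]]
    while remaining:
        nxt = min(remaining, key=lambda k: dist(path[-1], k))
        remaining.remove(nxt)
        path.append(nxt)
    return path


def dissimilarity_plot_order(d: Sequence[Sequence[float]],
                             labels: Sequence[Hashable]) -> List[int]:
    """Return an object order: clusters are ordered first, then the
    objects inside each cluster (assumed two-step scheme)."""
    clusters: Dict[Hashable, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def between(a, b) -> float:
        # Average dissimilarity between members of clusters a and b.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(d[i][j] for i, j in pairs) / len(pairs)

    cluster_order = greedy_path(sorted(clusters), between)
    order: List[int] = []
    for lab in cluster_order:
        order.extend(greedy_path(clusters[lab], lambda i, j: d[i][j]))
    return order


# Toy data: six points on a line, three clusters of two points each.
positions = [0, 20, 1, 21, 10, 11]
labels = ["a", "b", "a", "b", "c", "c"]
d = [[abs(p - q) for q in positions] for p in positions]
order = dissimilarity_plot_order(d, labels)
reordered = [[d[i][j] for j in order] for i in order]
```

Shading `reordered` as a grayscale image would then place each cluster in a contiguous diagonal block, with neighboring blocks belonging to similar clusters.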
Efficient Visualization of Large-scale Data Tables through Reordering and Entropy Minimization
Cited by 1 (0 self)
Visualization of data tables with n examples and m columns using heatmaps provides a holistic view of the original data. As there are n! ways to order rows and m! ways to order columns, and data tables are typically ordered without regard to visual inspection, heatmaps of the original data tables often appear as noisy images. However, if rows and columns of a data table are ordered such that similar rows and similar columns are grouped together, a heatmap may provide deep insight into the underlying data distribution. We propose an information-theoretic approach to produce a well-ordered data table. In particular, we search for an ordering that minimizes the entropy of residuals of predictive coding applied to the ordered data table. This formalization leads to a novel ordering procedure, EM-ordering, that can be applied separately to rows and columns. For ordering of rows, EM-ordering repeats until convergence the steps of (1) rescaling columns and (2) solving a Traveling Salesman Problem (TSP) where rows are treated as cities. To allow fast ordering of large data tables, we propose an efficient TSP heuristic with modest O(n log n) time complexity. When compared to the existing state-of-the-art reordering approaches, we show that the method often provides heatmaps of higher visual quality, while being significantly more scalable. Moreover, analysis of real-world traffic and financial data sets using the proposed method, which allowed us to readily gain deeper insights about the data, further confirmed that EM-ordering can be a valuable tool for visual exploration of large-scale data sets.
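To make step (2) concrete, the sketch below orders the rows of a small table by treating them as cities in a TSP. It is only an illustration under stated assumptions: it uses the textbook nearest-neighbor heuristic with Euclidean distance, which runs in O(n^2) time and is not the O(n log n) heuristic proposed in the paper, and it omits the column-rescaling step entirely. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def row_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two rows (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_order(rows: Sequence[Sequence[float]],
                           start: int = 0) -> List[int]:
    """Visit every row once, always moving to the closest unvisited row.
    Ties are broken by the smaller row index."""
    unvisited = set(range(len(rows)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = rows[order[-1]]
        nxt = min(unvisited, key=lambda i: (row_distance(last, rows[i]), i))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def path_length(rows: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Total distance between consecutive rows in the given order."""
    return sum(row_distance(rows[a], rows[b])
               for a, b in zip(order, order[1:]))


# Toy table whose similar rows are interleaved in the original order.
table = [[0, 0], [10, 10], [1, 1], [9, 9]]
tour = nearest_neighbor_order(table)
ordered_table = [table[i] for i in tour]
```

A shorter path through the rows means that adjacent rows in the heatmap are more alike, which is the visual effect the reordering aims for.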
Kingston-upon-Thames, Surrey
Consistency is important to the success of any software project and this includes the organization of its data structure definitions. Thus there is a need to better understand the decisions programmers make when organizing items into data structures. Results from an experiment that asked programmers to design data structures based on various specifications are presented. The results show that although there is significant variation among subjects, patterns exist across designs. Participants included 26 professional and 12 student programmers. Comparisons of the two groups indicate that students are more conformist to the specification, but also show greater diversity in their answers, when compared to professionals. These results help to characterize the learning that experience brings and suggest that when selecting programmers for a task involving data structure design or modification, each programmer’s layout style be taken into account.
Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for l∞-Fitting Robinson Structures to Distances
ALGORITHMICA, 2007
The classical seriation problem consists in finding a permutation of the rows and the columns of the distance (or, more generally, dissimilarity) matrix $d$ on a finite set $X$ so that small values are concentrated as close as possible to the main diagonal, whereas large values fall as far from it as possible. This goal is best achieved by considering the Robinson property: a distance $d_R$ on $X$ is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. If the distance $d$ fails to satisfy the Robinson property, then we are led to the problem of finding a reordering of $d$ which is as close as possible to a Robinsonian distance. In this paper, we present a factor 16 approximation algorithm for the following NP-hard fitting problem: given a finite set $X$ and a dissimilarity $d$ on $X$, we wish to find a Robinsonian dissimilarity $d_R$ on $X$ minimizing the $l_\infty$-error $\|d - d_R\|_\infty = \max_{x,y \in X} \{|d(x,y) - d_R(x,y)|\}$ between $d$ and $d_R$.
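The two definitions in this abstract, the Robinson property and the l∞-error, are easy to check on small matrices. The sketch below does only that; it does not implement the factor 16 approximation algorithm. It assumes symmetric matrices stored as lists of lists, and the helper names (`is_robinson`, `permute`, `linf_error`) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def is_robinson(d: Sequence[Sequence[float]]) -> bool:
    """True if, in the current order, entries never decrease when moving
    away from the main diagonal along a row. For a symmetric matrix the
    column condition follows from the row condition."""
    n = len(d)
    for i in range(n):
        for j in range(i, n - 1):      # moving right from the diagonal
            if d[i][j + 1] < d[i][j]:
                return False
        for j in range(i, 0, -1):      # moving left from the diagonal
            if d[i][j - 1] < d[i][j]:
                return False
    return True


def permute(d: Sequence[Sequence[float]], order: Sequence[int]) -> Matrix:
    """Apply the same permutation to rows and columns."""
    return [[d[i][j] for j in order] for i in order]


def linf_error(d: Sequence[Sequence[float]],
               d_r: Sequence[Sequence[float]]) -> float:
    """Largest absolute entrywise difference between d and d_r."""
    return max(abs(a - b)
               for row_a, row_b in zip(d, d_r)
               for a, b in zip(row_a, row_b))


# Distances between points at positions 1, 3, 0 on a line (shuffled order).
shuffled = [[0, 2, 1],
            [2, 0, 3],
            [1, 3, 0]]
sorted_d = permute(shuffled, [2, 0, 1])   # positions 0, 1, 3

# A perturbed copy of sorted_d that violates the Robinson property.
noisy = [[0, 1, 3],
         [1, 0, 3.5],
         [3, 3.5, 0]]
error = linf_error(noisy, sorted_d)
```

Here `shuffled` is Robinsonian in the sense of the abstract because a symmetric permutation (`sorted_d`) satisfies the monotonicity condition, even though `shuffled` itself does not in its given order.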
First published in CVu vol. 22 no.?
"... Developer categorization of data structure fields (part 2 of 2) ..."