Interactive Visual Clustering of Large Collections of Trajectories
VAST, 2009
Cited by 37 (8 self)
One of the most common operations in exploration and analysis of various kinds of data is clustering, i.e. discovery and interpretation of groups of objects having similar properties and/or behaviors. In clustering, objects are often treated as points in multidimensional space of properties. However, structurally complex objects, such as trajectories of moving entities and other kinds of spatiotemporal data, cannot be adequately represented in this manner. Such data require sophisticated and computationally intensive clustering algorithms, which are very hard to scale effectively to large datasets not fitting in the computer main memory. We propose an approach to extracting meaningful clusters from large databases by combining clustering and classification, which are driven by a human analyst through an interactive visual interface.
Subspace search and visualization to make sense of alternative clusterings in high-dimensional data
In Proc. IEEE Symp. on Visual Analytics Science and Technology (VAST), 2012
Cited by 16 (2 self)
In explorative data analysis, the data under consideration often resides in a high-dimensional (HD) data space. Currently many methods are available to analyze this type of data. So far, proposed automatic approaches include dimensionality reduction and cluster analysis, whereby visual-interactive methods aim to provide effective visual mappings to show, relate, and navigate HD data. Furthermore, almost all of these methods conduct the analysis from a singular perspective, meaning that they consider the data in either the original HD data space, or a reduced version thereof. Additionally, HD data spaces often consist of combined features that measure different properties, in which case the particular relationships between the various properties may not be clear to the analysts a priori since it can only be revealed if appropriate feature combinations (subspaces) of the data are taken into consideration.
Finding and visualizing relevant subspaces for clustering high-dimensional astronomical data using connected morphological operators
 In IEEE Symposium on Visual Analytics Science and Technology (2010), IEEE
Cited by 6 (2 self)
Data sets in astronomy are growing to enormous sizes. Modern astronomical surveys provide not only image data but also catalogues of millions of objects (stars, galaxies), each object with hundreds of associated parameters. Exploration of this very high-dimensional data space poses a huge challenge. Subspace clustering is one among several approaches which have been proposed for this purpose in recent years. However, many clustering algorithms require the user to set a large number of parameters without any guidelines. Some methods also do not provide a concise summary of the datasets, or, if they do, they lack additional important information such as the number of clusters present or the significance of the clusters. In this paper, we propose a method for ranking subspaces for clustering which overcomes many of the above limitations. First we carry out a transformation from parametric space to discrete im
Heidi Matrix: Nearest Neighbor Driven High Dimensional Data Visualization
Cited by 4 (0 self)
Identifying patterns in large high dimensional data sets is a challenge. As the number of dimensions increases, the patterns in the data sets tend to be more prominent in the subspaces than in the original dimensional space. A system to facilitate presentation of such subspace oriented patterns in high dimensional data sets is required to understand the data. Heidi is a high dimensional data visualization system that captures and visualizes the closeness of points across various subspaces of the dimensions, thus helping to understand the data. The core concept behind Heidi is based on prominence of patterns within the nearest neighbor relations between pairs of points across the subspaces. Given a d-dimensional data set as input, the Heidi system generates a 2D matrix represented as a color image. This representation gives insight into (i) how the clusters are placed with respect to each other, (ii) characteristics of placement of points within a cluster in all the subspaces and (iii) characteristics of overlapping clusters in various subspaces. A sample of results displayed and discussed in this paper illustrates how Heidi Visualization can be interpreted.
Less is More: Non-Redundant Subspace Clustering
Cited by 2 (1 self)
Clustering is an important data mining task for grouping similar objects. In high dimensional data, however, effects attributed to the “curse of dimensionality” render clustering meaningless. Due to this, recent years have seen research on subspace clustering, which searches for clusters in relevant subspace projections of high dimensional data. As the number of possible subspace projections is exponential in the number of dimensions, the number of possible subspace clusters can be overwhelming. In this position paper, we present our work on identifying non-redundant, relevant subspace clusters which reduce the result set to a manageable size. We discuss techniques for evaluating, visualizing and exploring subspace clusterings, and propose some directions for future work.
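The exponential growth mentioned in this abstract can be illustrated with a small sketch. It assumes axis-parallel subspace projections, i.e. non-empty subsets of the original dimensions, which gives 2^d − 1 candidates for d dimensions. The function name and the dimension labels are hypothetical and not taken from the paper.

```python
from itertools import combinations

def axis_parallel_subspaces(dims):
    """Enumerate every non-empty subset of the given dimensions.

    Assumption: a "subspace projection" is an axis-parallel one,
    i.e. a non-empty subset of the original attributes.
    """
    return [subset
            for size in range(1, len(dims) + 1)
            for subset in combinations(dims, size)]

# Hypothetical 3-dimensional data set with attributes x, y, z.
subspaces = axis_parallel_subspaces(["x", "y", "z"])
print(len(subspaces))   # 2**3 - 1 = 7

# The count doubles (plus one) with each added dimension.
for d in (5, 10, 20):
    print(d, 2**d - 1)
```

Even at 20 dimensions there are over a million candidate subspaces, which is why exhaustive enumeration is only practical for very small d.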
ScagExplorer: Exploring Scatterplots by Their Scagnostics
Cited by 2 (0 self)
A scatterplot displays a relation between a pair of variables. Given a set of v variables, there are v(v − 1)/2 pairs of variables, and thus the same number of possible pairwise scatterplots. Therefore for even small sets of variables, the number of scatterplots can be large. Scatterplot matrices (SPLOMs) can easily run out of pixels when presenting high-dimensional data. We introduce a theoretical method and a testbed for assessing whether our method can be used to guide interactive exploration of high-dimensional data. The method is based on nine characterizations of the 2D distributions of orthogonal pairwise projections on a set of points in multidimensional Euclidean space. Working directly with these characterizations, we can locate anomalies for further analysis or search for similar distributions in a “large” SPLOM with more than a hundred dimensions. Our testbed, ScagExplorer, is developed in order to evaluate the feasibility of handling huge collections of scatterplots.
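As a quick check of the v(v − 1)/2 count quoted in this abstract, the following sketch enumerates the unordered variable pairs directly and compares the result with the closed form. The function name and variable labels are hypothetical illustrations, not part of ScagExplorer.

```python
from itertools import combinations

def count_scatterplots(v: int) -> int:
    """Number of distinct pairwise scatterplots for v variables."""
    return v * (v - 1) // 2

# Hypothetical variable names, used only for illustration.
variables = ["a", "b", "c", "d"]
pairs = list(combinations(variables, 2))

# Direct enumeration agrees with the closed form.
assert len(pairs) == count_scatterplots(len(variables))

print(count_scatterplots(4))     # 6
print(count_scatterplots(100))   # 4950
```

With a hundred variables there are already 4950 scatterplots, which gives a sense of why a SPLOM of that size cannot be inspected plot by plot.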
A framework for evaluation and exploration of clustering algorithms in subspaces of high dimensional databases
In Proceedings of the GI Conference on Database Systems for Business, Technology, and the Web (BTW), 2011
Cited by 1 (1 self)
In high dimensional databases, traditional full space clustering methods are known to fail due to the curse of dimensionality. Thus, in recent years, subspace clustering and projected clustering approaches were proposed for clustering in high dimensional spaces. As the area is rather young, few comparative studies on the advantages and disadvantages of the different algorithms exist. Part of the underlying problem is the lack of available open source implementations that could be used by researchers to understand, compare, and extend subspace and projected clustering algorithms. In this work, we discuss the requirements for open source evaluation software and propose the OpenSubspace framework that meets these requirements. OpenSubspace integrates state-of-the-art performance measures and visualization techniques to foster clustering research in high dimensional databases.
An Algorithm for the Removal of Redundant Dimensions to Find Clusters in N-Dimensional Data using Subspace Clustering
Data mining has emerged as a powerful tool to extract knowledge from huge databases. Researchers have introduced several machine learning algorithms to explore the databases to discover information, hidden patterns, and rules from the data which were not known at the data recording time. Due to the remarkable developments in storage capacities, processing, and powerful algorithmic tools, practitioners are developing new and improved algorithms and techniques in several areas of data mining to discover the rules and relationships among the attributes in simple and complex higher dimensional databases. Furthermore, data mining is applied in a large variety of areas, ranging from banking to marketing, engineering to bioinformatics, and from investment to risk analysis and fraud detection. Practitioners are analyzing and implementing the techniques of artificial neural networks for classification and regression problems because of their accuracy and efficiency. The aim of this short research project is to develop a way of identifying the clusters in high dimensional data as well as redundant dimensions which can create noise in identifying the clusters in high dimensional data. Techniques used in this project utilize the strength of the projections of the data points along the dimensions to identify the intensity of projection along