Multivariate Clustering of Large-Scale Scientific Simulation Data
2003
Abstract

Cited by 1 (1 self)
Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale data sets over the spatiotemporal space. Modeling such massive data sets is an essential step in helping scientists discover new information from their computer simulations. In this paper, we present a simple but effective multivariate clustering algorithm for large-scale scientific simulation data sets. Our algorithm utilizes the cosine similarity measure to cluster the field variables in a data set. Field variables include all variables except the spatial (x, y, z) and temporal (time) variables. The exclusion of the spatial dimensions is important since “similar” characteristics could be located (spatially) far from each other. To scale our multivariate clustering algorithm for large-scale data sets, we take advantage of the geometrical properties of the cosine similarity measure. This allows us to reduce the modeling time from O(n^2) to O(n × g(f(u))), where n is the number of data points, f(u) is a function of the user-defined clustering threshold, and g(f(u)) is the number of data points satisfying f(u). We show that on average g(f(u)) is much less than n. Finally, even though spatial variables do not play a role in building clusters, it is desirable to associate each cluster with its correct spatial region. To achieve this, we present a linking algorithm for connecting each cluster to the appropriate nodes of the data set’s topology tree (where the spatial information of the data set is stored). Our experimental evaluations on two large-scale simulation data sets illustrate the value of our multivariate clustering and linking algorithms.
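
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each record is laid out as (x, y, z, time, field values...), uses a simple leader-style clustering (a swapped-in technique), and shows one possible way a geometric property of cosine similarity could cut down comparisons: because the angle between vectors obeys the triangle inequality, two vectors whose angles to a fixed reference vector differ by more than arccos(threshold) cannot meet the similarity threshold. The names cluster_by_cosine, the all-ones reference vector, and the record layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def angle(a, b):
    """Angle in radians between two vectors, with clamping for round-off."""
    return math.acos(max(-1.0, min(1.0, cosine_similarity(a, b))))


def cluster_by_cosine(records, threshold, prune=True):
    """Leader-style clustering on field variables only (assumed layout:
    x, y, z, time, then field values). Returns (labels, comparisons)."""
    max_angle = math.acos(max(-1.0, min(1.0, threshold)))
    leaders = []       # one (field_vector, angle_to_reference) per cluster
    labels = []
    comparisons = 0    # number of full cosine similarity evaluations
    for record in records:
        fields = record[4:]                 # ignore spatial and temporal variables
        reference = [1.0] * len(fields)     # hypothetical fixed reference vector
        ref_angle = angle(fields, reference)
        assigned = None
        for cluster_id, (leader, leader_angle) in enumerate(leaders):
            # Triangle inequality on angles: if the two angles to the reference
            # differ by more than max_angle, the similarity must be below threshold.
            if prune and abs(ref_angle - leader_angle) > max_angle + 1e-9:
                continue
            comparisons += 1
            if cosine_similarity(fields, leader) >= threshold:
                assigned = cluster_id
                break
        if assigned is None:
            leaders.append((fields, ref_angle))
            assigned = len(leaders) - 1
        labels.append(assigned)
    return labels, comparisons


# Made-up records: spatially distant points can still share a cluster.
records = [
    (0.0, 0.0, 0.0, 0.0, 1.0, 0.0),
    (100.0, 100.0, 100.0, 0.0, 2.0, 0.1),
    (1.0, 0.0, 0.0, 0.0, 0.0, 1.0),
    (50.0, 0.0, 0.0, 1.0, 0.1, 3.0),
]
labels, comparisons = cluster_by_cosine(records, threshold=0.95)
print(labels, comparisons)
```

In this sketch the first two records fall into the same cluster even though they are far apart spatially, which mirrors the abstract's reason for excluding the spatial dimensions. The pruning step only skips comparisons that could not succeed, so it yields the same labels as the unpruned loop.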