Results 1 – 5 of 5
Parallel Algorithms for Hierarchical Clustering
Parallel Computing, 1995
"... Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n 2 ) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms f ..."
Abstract

Cited by 80 (1 self)
Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n²) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms for hierarchical clustering. Parallel algorithms to perform hierarchical clustering using several distance metrics are then described. Optimal PRAM algorithms using n log n processors are given for the average link, complete link, centroid, median, and minimum variance metrics. Optimal butterfly and tree algorithms using n log n processors are given for the centroid, median, and minimum variance metrics. Optimal asymptotic speedups are achieved for the best practical algorithm to perform clustering using the single link metric on an n log n processor PRAM, butterfly, or tree.
Keywords: hierarchical clustering, pattern analysis, parallel algorithm, butterfly network, PRAM algorithm.
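The single-link metric the abstract mentions can be illustrated with a tiny sequential sketch. This is a naive O(n³) agglomerative loop, not one of the optimal parallel algorithms the paper describes; the function name and the "stop at k clusters" rule are illustrative choices, not from the paper:

```python
import math
from itertools import combinations

def single_link_clustering(points, k):
    """Naive sequential single-link agglomerative clustering.

    Repeatedly merges the two clusters whose closest pair of points
    is nearest, until k clusters remain. Far from the optimal
    algorithms the paper discusses, but it shows the linkage rule.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: cluster distance = closest cross-pair distance.
            d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Swapping the `min` over cross-pairs for a `max` would give the complete-link metric; the centroid and median metrics instead track a representative point per cluster.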
New Techniques for Best-Match Retrieval
ACM Transactions on Information Systems, 1990
"... A scheme to answer bestmatch queries from a file containing a collection of objects is described. A bestmatch query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 331 suggests that one can reduce the number of co ..."
Abstract

Cited by 53 (5 self)
A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 33] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. Some empirical results are presented that illustrate the effectiveness of our scheme and its performance relative to previous algorithms.
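The triangle-inequality pruning idea can be sketched as follows, assuming a Euclidean metric and a single reference object ("pivot"); the paper's generalization handles arbitrary sets of precomputed intrafile distances, and all names here are illustrative:

```python
import math

def best_match(objects, target, pivot):
    """Nearest object to `target`, pruned via the triangle inequality.

    Since d(target, x) >= |d(target, pivot) - d(x, pivot)|, any object
    whose pivot distance differs from the target's by at least the best
    distance found so far cannot win and is skipped without a comparison.
    """
    pre = [(x, math.dist(x, pivot)) for x in objects]  # precomputed offline in practice
    dt = math.dist(target, pivot)
    best, best_d = None, float("inf")
    comparisons = 0
    for x, dx in sorted(pre, key=lambda xp: abs(xp[1] - dt)):
        if abs(dx - dt) >= best_d:
            break  # sorted by the lower bound, so all remaining are pruned too
        comparisons += 1
        dxt = math.dist(target, x)
        if dxt < best_d:
            best, best_d = x, dxt
    return best, best_d, comparisons
```

Returning `comparisons` makes the payoff visible: the count of actual target-to-object distance evaluations is typically far below `len(objects)`.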
Clustering in Massive Data Sets
Handbook of Massive Data Sets, 1999
"... We review the time and storage costs of search and clustering algorithms. We exemplify these, based on casestudies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Sections 2 to 6 relate to nearest neighbor searching, an elemental form of clustering, ..."
Abstract

Cited by 11 (0 self)
We review the time and storage costs of search and clustering algorithms. We exemplify these with case studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Sections 2 to 6 relate to nearest neighbor searching, an elemental form of clustering and a basis for the clustering algorithms that follow. Sections 7 to 11 review a number of families of clustering algorithms. Sections 12 to 14 relate to visual or image representations of data sets, from which a number of interesting algorithmic developments arise.
Geometric Minimum Spanning Trees via Well-Separated Pair Decompositions
ACM Journal of Experimental Algorithmics, 2001
Multivariate Two-sample Tests
1978
"... Multivariate generalizations of the WaldWolfowitz runs statistic and the Smirnov maximum deviation statistic for the twosample problem are presented. They are based on the minimal spanning tree of the pooled sample points. Some null distribution results are derived, and a simulation study of power ..."
Abstract
Multivariate generalizations of the Wald-Wolfowitz runs statistic and the Smirnov maximum deviation statistic for the two-sample problem are presented. They are based on the minimal spanning tree of the pooled sample points. Some null distribution results are derived, and a simulation study of power is reported. (To be published in Annals of Statistics. Supported by the Department of Energy.)
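The runs-statistic construction can be sketched as follows: build the minimal spanning tree of the pooled samples and count edges that join points from different samples (under the null the samples intermix and cross-edges are plentiful; well-separated samples yield few). This is a minimal illustration assuming Euclidean distance and Prim's algorithm; it computes only the raw statistic, not the null distribution the paper derives:

```python
import math

def cross_edge_count(sample_a, sample_b):
    """Count MST edges joining points from different samples.

    Builds the minimal spanning tree of the pooled points with Prim's
    algorithm on the complete Euclidean graph (O(n^2)), tracking which
    sample each endpoint came from.
    """
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    # For each point not yet in the tree: its cheapest connection so far.
    best_d = [math.dist(points[0], p) for p in points]
    best_src = [0] * n
    cross = 0
    for _ in range(n - 1):
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_d[i])
        in_tree[j] = True
        if labels[j] != labels[best_src[j]]:
            cross += 1  # this MST edge links the two samples
        for i in range(n):
            if not in_tree[i]:
                d = math.dist(points[j], points[i])
                if d < best_d[i]:
                    best_d[i], best_src[i] = d, j
    return cross
```

Two well-separated samples are bridged by a single cross-edge, while interleaved samples force many; the test statistic compares the observed count against what intermixing would produce.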