Parallel Algorithms for Hierarchical Clustering
Parallel Computing, 1995
Cited by 80 (1 self)
Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n^2) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms for hierarchical clustering. Parallel algorithms to perform hierarchical clustering using several distance metrics are then described. Optimal PRAM algorithms using n/log n processors are given for the average link, complete link, centroid, median, and minimum variance metrics. Optimal butterfly and tree algorithms using n/log n processors are given for the centroid, median, and minimum variance metrics. Optimal asymptotic speedups are achieved for the best practical algorithm to perform clustering using the single link metric on an n/log n processor PRAM, butterfly, or tree. Keywords. Hierarchical clustering, pattern analysis, parallel algorithm, butterfly network, PRAM algorithm. 1 In...
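The linkage metrics named in this abstract can be illustrated with a short sequential sketch. It is not one of the paper's algorithms: it is a naive agglomerative loop that rescans every cluster pair at each merge, so it is much slower than the O(n^2) methods the abstract cites. The function name `agglomerate` and the `linkage` parameter are hypothetical, chosen only for this example.

```python
from itertools import combinations
from math import dist


def agglomerate(points, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    linkage=min gives single link, linkage=max gives complete link.
    Illustrative only: every merge rescans all cluster pairs.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges
```

Calling `agglomerate(points)` returns one `(cluster, cluster, distance)` tuple per merge, in the order the merges happen; passing `linkage=max` switches from single link to complete link.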
Parallel Programming using Functional Languages
1991
Cited by 48 (3 self)
I am greatly indebted to Simon Peyton Jones, my supervisor, for his encouragement and technical assistance. His overwhelming enthusiasm was of great support to me. I particularly want to thank Simon and Geoff Burn for commenting on earlier drafts of this thesis. Through his excellent lecturing Colin Runciman initiated my interest in functional programming. I am grateful to Phil Trinder for his simulator, on which mine is based, and Will Partain for his help with LaTeX and graphs. I would like to thank the Science and Engineering Research Council of Great Britain for their financial support. Finally, I would like to thank Michelle, whose culinary skills supported me whilst I was writing up.
The Imagination the only nation worth defending a nation without alienation a nation whose flag is invisible and whose borders are forever beyond the horizon a nation whose motto is why have one or the other when you can have one the other and both
The Generalized Dimension Exchange Method for Load Balancing in k-ary n-cubes and Variants
1995
Cited by 44 (9 self)
The Generalized Dimension Exchange (GDE) method is a fully distributed load balancing method that operates in a relaxation fashion for multicomputers with a direct communication network. It is parameterized by an exchange parameter that governs the splitting of load between a pair of directly connected processors during load balancing. An optimal value of the exchange parameter would lead to the fastest convergence of the balancing process. Previous work has resulted in the optimal value for the binary n-cubes. In this paper, we derive the optimal values for the k-ary n-cube network and its variants: the ring, the torus, the chain, and the mesh. We establish the relationships between the optimal convergence rates of the method when applied to these structures, and conclude that the GDE method favors high-dimensional k-ary n-cubes. We also reveal the superiority of the GDE method to another relaxation-based method, the diffusion method. We further show through statistical simulations that the optimal values do speed up the GDE...
Analysis of The Generalized Dimension Exchange Method for Dynamic Load Balancing
Journal of Parallel and Distributed Computing, 1992
Cited by 42 (7 self)
The dimension exchange method is a distributed load balancing method for point-to-point networks. We add a parameter, called the exchange parameter, to the method to control the splitting of load between a pair of directly connected processors, and call this parameterized version the generalized dimension exchange (GDE) method. The rationale for introducing this parameter is that splitting the workload into equal halves does not necessarily lead to an optimal result (in terms of the convergence rate) for certain structures. We carry out an analysis of this new method, emphasizing its termination aspects and potential efficiency. Given a specific structure, one needs to determine a value for the exchange parameter that would lead to an optimal result. To this end, we first derive a necessary and sufficient condition for the termination of the method. We then show that equal splitting, proposed originally by others as a heuristic strategy, indeed yields optimal efficie...
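A minimal sketch of one dimension-exchange sweep on a hypercube may help make the exchange parameter concrete. The abstract does not spell out the update rule, so the sketch assumes a common formulation: each processor keeps a (1 - lam) share of its own load and takes a lam share of its partner's. The names `gde_sweep` and `lam` are hypothetical, and the hypercube topology is chosen only for illustration.

```python
def gde_sweep(load, dims, lam):
    """One sweep of dimension exchange over a hypercube with 2**dims nodes.

    Assumed update rule: in dimension d, node i pairs with node i ^ (1 << d);
    each keeps a (1 - lam) share of its own load and takes a lam share of
    its partner's load. Total load is conserved for any lam.
    """
    load = list(load)
    for d in range(dims):
        for i in range(len(load)):
            j = i ^ (1 << d)
            if i < j:  # handle each pair once per dimension
                wi, wj = load[i], load[j]
                load[i] = (1 - lam) * wi + lam * wj
                load[j] = (1 - lam) * wj + lam * wi
    return load
```

Under this assumed rule, `lam = 0.5` is equal splitting, and on a hypercube a single sweep over all dimensions reaches the uniform distribution, which is a well-known property of plain dimension exchange. Other values of `lam` still conserve the total load but generally need further sweeps.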
Large-scale parallel data clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
Cited by 41 (4 self)
Abstract: Algorithmic enhancements are described that enable large computational reduction in mean-square-error data clustering. These improvements are incorporated into a parallel data-clustering tool, PCLUSTER, designed to execute on a network of workstations. Experiments involving the unsupervised segmentation of standard texture images were performed. For some data sets, a 96 percent reduction in computation was achieved. Index Terms: Data clustering, mean square error, data mining, image segmentation, parallel algorithm, network of workstations.
Iterative Dynamic Load Balancing in Multicomputers
Journal of the Operational Research Society, 1994
Cited by 21 (3 self)
Dynamic load balancing in multicomputers can improve the utilization of processors and the efficiency of parallel computations through migrating workload across processors at runtime. We present a survey and critique of dynamic load balancing strategies that are iterative: workload migration is carried out through transferring processes across nearest neighbor processors. Iterative strategies have become prominent in recent years because of the increasing popularity of point-to-point interconnection networks for multicomputers.
Key words: dynamic load balancing, multicomputers, optimization, queueing theory, scheduling.
Introduction. Multicomputers are highly concurrent systems that are composed of many autonomous processors connected by a communication network [1, 2]. To improve the utilization of the processors, parallel computations in multicomputers require that processes be distributed to processors in such a way that the computational load is evenly spread among the processors...
On Runtime Parallel Scheduling for Processor Load Balancing
IEEE Trans. Parallel and Distributed Systems, 1997
Cited by 21 (0 self)
Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality.
1. Introduction. Static scheduling balances the workload before runtime and can be applied to problems with a predictable structure, which are called static problems. Dynamic scheduling performs scheduling activities concurrently at runtime and applies to problems with an unpredictable structure, which are called dynamic problems. Static scheduling utilizes the knowledge of problem characteristics to reach a well-balanced load [1, 2, 3, 4]. However, it is not able to balance the load for dynami...
Nearest Neighbor Algorithms for Load Balancing in Parallel Computers
1995
Cited by 19 (2 self)
With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on localized workload information and manages workload migrations within its neighborhood. This paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension-exchange (DE, for short) and the diffusion (DF, for short) methods, and their several variants: the average dimension-exchange (ADE), the optimally-tuned dimension-exchange (ODE), the local average diffusion (ADF) and the optimally-tuned diffusion (ODF). The measures of interest are their efficiency in driving any initial workload distribution to a uniform distribution and their ability to control the growth of the variance among the processors' workloads. The comparison is made with respect to both one-port and all-port communication architectures and in consideration of various implementation strategies including synchronous/asynchronous invocation policies and static/dynamic random workload behaviors. It t...
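For contrast with dimension exchange, the diffusion method can be sketched as a synchronous update in which every processor exchanges load with all of its neighbors at once. The sketch below assumes a ring topology and a single diffusion parameter `alpha` applied to every link; both are illustrative assumptions rather than details taken from the paper, and `diffusion_step` is a hypothetical name.

```python
def diffusion_step(load, alpha):
    """One synchronous diffusion step on a ring of len(load) processors.

    Assumed update rule: each node moves an alpha fraction of the load
    difference across each of its two links, using the loads from the
    start of the step. Total load is conserved.
    """
    n = len(load)
    return [
        w + alpha * ((load[i - 1] - w) + (load[(i + 1) % n] - w))
        for i, w in enumerate(load)
    ]
```

Repeated calls drive the loads toward their average when `alpha` is small enough for the chosen topology; for a ring of four processors, `alpha = 0.25` converges.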
Fast and Parallel Mapping Algorithms for Irregular Problems
Journal of Supercomputing, 1993
Cited by 16 (0 self)
In this paper we develop simple index-based graph partitioning techniques. We show our methods to be very fast, easily parallelizable and that they produce good quality mappings. These properties make them useful for parallelization of a number of irregular and adaptive applications.
Index Terms: Mapping, Remapping, Parallel, Merging, Sorting
1 Introduction. Parallelization of data-parallel programs on distributed-memory parallel computers requires careful attention to load balancing and reduction of communication to achieve a good performance. For most regular and synchronous problems [13], mapping can be performed at the time of compilation by giving directives to decompose the data and its corresponding computations [8]. For irregular applications, achieving a good mapping is considerably more difficult; the nature of the irregularities may not be known at the time of compilation and can be derived only at runtime [7]. These applications can be represented as computational graphs...
Parallel Remapping Algorithms for Adaptive Problems
Proc. of the Symp. on the Frontiers of Massively Parallel Computation, 1995
Cited by 15 (3 self)
In this paper we present fast parallel algorithms for remapping a class of irregular and adaptive problems on coarse-grained distributed-memory machines. We show that the remapping of these applications, using simple index-based mapping algorithms, can be reduced to sorting a nearly sorted list of integers or merging an unsorted list of integers with a sorted list of integers. By using the algorithms we have developed, the remapping of these problems can be achieved at a fraction of the cost of mapping from scratch. Results of experiments performed on the CM-5 are presented.
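The reduction described here, merging an unsorted list of integers with an already sorted one, can be sketched sequentially. This is only an illustration of the idea, not the paper's parallel algorithm: it assumes each element carries an integer index, that the mapping assigns contiguous runs of the sorted indices to processors, and that blocks should be of near-equal size. The name `remap` and its parameters are hypothetical.

```python
from heapq import merge


def remap(sorted_indices, new_indices, num_procs):
    """Merge new indices into a sorted list, then split into processor blocks.

    Only the (presumably small) new list is sorted; the existing order is
    reused through a linear merge instead of sorting everything again.
    """
    merged = list(merge(sorted_indices, sorted(new_indices)))
    # Split into num_procs contiguous blocks whose sizes differ by at most one.
    base, extra = divmod(len(merged), num_procs)
    blocks, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(merged[start:start + size])
        start += size
    return blocks
```

For example, `remap([1, 4, 6, 9], [7, 2], 3)` yields the blocks `[[1, 2], [4, 6], [7, 9]]`. The cost is dominated by sorting the new indices plus one linear pass, which is the sense in which remapping can be cheaper than mapping from scratch.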