Results 1 - 10 of 108
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
Cited by 142 (4 self)
Devices]: Modes of Computation---Parallelism and concurrency. General Terms: Algorithms, Design, Performance, Theory. Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs. This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.-K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. ACM Computing Surveys, Vol. 31, No. 4, December 1999.
The Watershed Transform: Definitions, Algorithms and Parallelization Strategies
, 2001
Cited by 90 (3 self)
The watershed transform is the method of choice for image segmentation in the field of mathematical morphology. We present a critical review of several definitions of the watershed transform and the associated sequential algorithms, and discuss various issues which often cause confusion in the literature. The need to distinguish between definition, algorithm specification and algorithm implementation is pointed out. Various examples are given which illustrate differences between watershed transforms based on different definitions and/or implementations. The second part of the paper surveys approaches for parallel implementation of sequential watershed algorithms.
Solving A Polynomial Equation: Some History And Recent Progress
, 1997
Cited by 67 (8 self)
The classical problem of solving an nth degree polynomial equation has substantially influenced the development of mathematics throughout the centuries and still has several important applications to the theory and practice of present-day computing. We briefly recall the history of the algorithmic approach to this problem and then review some successful solution algorithms. We end by outlining some algorithms of 1995 that solve this problem at a surprisingly low computational cost.
Functional Skeletons for Parallel Coordination
- EURO-PAR'95 Parallel Processing
, 1995
Cited by 53 (10 self)
In this paper we propose a methodology for structured parallel programming using functional skeletons to compose and coordinate concurrent activities written in a standard imperative language. Skeletons are higher order functional forms with built-in parallel behaviour. We show how such forms can be used uniformly to abstract all aspects of a parallel program's behaviour including data partitioning, placement and re-arrangement (communication) as well as computation. Skeletons are naturally data parallel and are capable of expressing computation and co-ordination at a higher level of abstraction than other process oriented co-ordination notations. Examples of the application of this methodology are given and an implementation technique outlined. 1 Introduction This paper proposes the use of skeletons as a coordination language for programming parallel architectures. The coordination language model, as proposed by Gelernter and Carriero, builds parallel programs out of two...
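As a rough illustration of the idea in the abstract above (a skeleton as a higher-order form with built-in parallel behaviour), the following Python sketch wraps a thread pool behind a map-like function. It is not the paper's notation or implementation; the names `par_map` and `square` and the `max_workers` parameter are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def par_map(worker: Callable[[A], B], items: Iterable[A], max_workers: int = 4) -> List[B]:
    """Hypothetical 'map' skeleton: the caller supplies only the sequential
    worker function; how the work is spread over threads is hidden here."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(worker, items))


def square(x: int) -> int:
    """Ordinary sequential code that the skeleton coordinates."""
    return x * x


if __name__ == "__main__":
    print(par_map(square, range(5)))
```

The caller writes plain sequential code (`square`) and picks a coordination pattern (`par_map`); partitioning and placement stay inside the skeleton, which is the separation the abstract describes.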
HyperCast: A Protocol for Maintaining Multicast Group Members in a Logical Hypercube Topology
- In Networked Group Communication
, 1999
Cited by 47 (2 self)
To efficiently support large-scale multicast applications with many thousand simultaneous members, it is essential that protocol mechanisms be available which support efficient exchange of control information between the members of a multicast group. Recently, we proposed the use of a control topology, which organizes multicast group members in a logical n-dimensional hypercube, and transmits all control information along the edges of the hypercube. In this paper, we present the design, verification, and implementation of a protocol, called HyperCast, which maintains members of a large multicast group in a logical hypercube. We use measurement experiments of an implementation of the protocol on a networked computer cluster to quantitatively assess the performance of the protocol for multicast group sizes up to 1024 members. * This work is supported in part by the National Science Foundation under grants ANI-9870336 and NCR9624106 (CAREER). # Corresponding Author: J. Lieb...
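To make the logical hypercube in the abstract above concrete: in an n-dimensional hypercube, nodes can be labelled with n-bit integers, and two nodes share an edge when their labels differ in exactly one bit. The sketch below only illustrates that topology. It is not the HyperCast protocol, and the function name `hypercube_neighbors` is hypothetical.

```python
from typing import List


def hypercube_neighbors(node_id: int, dimensions: int) -> List[int]:
    """Return the labels adjacent to node_id in a complete hypercube.

    Assumes a complete hypercube with 2**dimensions nodes labelled
    0 .. 2**dimensions - 1; maintaining a partially filled hypercube
    is not modelled here.
    """
    if dimensions < 0 or not 0 <= node_id < (1 << dimensions):
        raise ValueError("node_id must fit in the given number of dimensions")
    # Flipping one bit at a time yields exactly one neighbour per dimension.
    return [node_id ^ (1 << bit) for bit in range(dimensions)]


if __name__ == "__main__":
    # A 10-dimensional hypercube has 1024 nodes, each with 10 neighbours.
    print(hypercube_neighbors(5, 3))
    print(len(hypercube_neighbors(0, 10)))
```

With 1024 members, the largest group size mentioned in the abstract, a complete hypercube has 10 dimensions, so each member would exchange control information with at most 10 neighbours.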
Block Data Decomposition for Data-Parallel Programming on a Heterogeneous Workstation Network
, 1993
Cited by 36 (10 self)
We present a block data decomposition algorithm for two-dimensional grid problems. Our method includes load balancing to accommodate heterogeneous processors, and we characterize the conditions that must be met for our partitioning strategy to be of value. While we concentrate on the workstation network model of parallel processing because of its high communication costs and inherent heterogeneity, our method is applicable to other parallel architectures. 1 Introduction The concept of the hypercomputer, a virtual parallel machine formed from a network of workstations [4], has made parallel processing available in a wide range of settings. Workstation networks have become commonplace in scientific, academic, and business environments due mainly to their relatively low cost and general-purpose applicability. The current performance capabilities of workstations make them attractive alternatives to expensive specialized machines for many parallel processing applications. Parallel processi...
Skil: An Imperative Language with Algorithmic Skeletons for Efficient Distributed Programming
- In Proceedings of the Fifth International Symposium on High Performance Distributed Computing (HPDC5
, 1996
Cited by 35 (5 self)
In this paper we present Skil, an imperative language enhanced with higher-order functions and currying, as well as with a polymorphic type system. The high level of Skil allows the integration of algorithmic skeletons, i.e. of higher-order functions representing parallel computation patterns. At the same time, the language can be efficiently implemented. After describing a series of skeletons which work with distributed arrays, we give two examples of parallel programs implemented on the basis of skeletons, namely shortest paths in graphs and Gaussian elimination. Runtime measurements show that we approach the efficiency of message-passing C up to a factor between 1 and 2.5. 1. Introduction Although parallel and distributed systems gain more and more importance nowadays, the state-of-the-art in the field of parallel software is far from being satisfactory. Not only is the programming of such systems a tedious and time-consuming task, but its outcome is usually machine-dependent and hen...
On Supernode Transformation with Minimized Total Running Time
Cited by 26 (3 self)
With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order n polynomial whose real positive roots include the optimal supernode size. For two special cases: (1) two dimensional algorithm problems and (2) n-dimensional algorithm problems where the communication cost is dominated by the startup penalty and therefore, can be approximated by a constant, we give a closed form expression for the optimal supernode s...
Parallelizing Existing Applications in a Distributed Heterogeneous Environment
- 4TH HETEROGENEOUS COMPUTING WORKSHOP (HCW '95
, 1995
Cited by 26 (0 self)
Applications based upon the finite element method are well known for their demand for computational resources. An effective method for satisfying this demand is heterogeneous parallel computing. This paper presents the results obtained by applying heterogeneous computing to a large, existing finite element application code: CSTEM. A difficult problem associated with heterogeneous computing is the mapping and scheduling problem---the process of assigning the tasks of a parallel program to the individual processors. A simple assignment heuristic, Levelized Min-Time (LMT), is presented, along with simulated results from applying the LMT algorithm to heterogeneous CSTEM on a variety of different heterogeneous machine clusters.

