Deadlock-Free Multicast Wormhole Routing in 2D Mesh Multicomputers
1992
Cited by 133 (23 self)
Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new-generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The nCUBE2, a wormhole-routed hypercube multicomputer, provides hardware support for broadcast and a restricted form of multicast in which the destinations form a subcube. However, the broadcast routing algorithm adopted in the nCUBE2 is not deadlock-free. In this paper, four multicast wormhole routing strategies for two-dimensional (2D) mesh multicomputers are proposed and studied. All of the algorithms are shown to be deadlock-free. These are the first deadlock-free multicast wormhole routing algorithms ever proposed. A simulation study has been conducted that compares the performance of these multicast algorithms under dynamic network traffic conditions in a 2D mesh. The results ind...
Scalable Load Balancing Techniques for Parallel Computers
1994
Cited by 106 (16 self)
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh, and networks of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
Analyzing Scalability of Parallel Algorithms and Architectures
Journal of Parallel and Distributed Computing, 1994
Cited by 92 (19 self)
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms t...
Unstructured Tree Search on SIMD Parallel Computers
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 38 (15 self)
In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises two major components: (i) a triggering mechanism, which determines when the search space redistribution must occur to balance search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms, and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical...
A scalable parallel formulation of the backpropagation algorithm for hypercubes and related architectures
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 24 (1 self)
In this paper, we present a new technique for mapping the backpropagation algorithm on hypercubes and related architectures. A key component of this technique is a network partitioning scheme called checkerboarding. Checkerboarding allows us to replace the all-to-all broadcast operation performed by the commonly used vertical network partitioning scheme with operations that are much faster on hypercubes and related architectures. Checkerboarding can be combined with the pattern partitioning technique to form a hybrid scheme which performs better than either one of these schemes. Theoretical analysis and experimental results on nCUBE2™ and CM5™ show that our scheme performs better than the other schemes, both for uniform and nonuniform networks.
Performance Properties of Large Scale Parallel Systems
Department of Computer Science, University of Minnesota, 1993
Cited by 23 (7 self)
There are several metrics that characterize the performance of a parallel system, such as parallel execution time, speedup, and efficiency. A number of properties of these metrics have been studied. For example, it is a well-known fact that given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with an increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm, and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimalit...
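The saturation effect described in this abstract can be illustrated with a small sketch. The cost model below, T_p = W/p + 2 log2(p), is an assumption chosen for illustration only and is not taken from the paper; the function names are likewise hypothetical. Under that model, speedup rises, peaks, and then falls as processors are added to a fixed-size problem.

```python
import math


def parallel_time(work, p):
    """Assumed model: T_p = W/p + 2*log2(p) (illustrative, not from the paper)."""
    return work / p + 2.0 * math.log2(p)


def speedup(work, p):
    """Speedup S = W / T_p, taking the serial time to equal the work W."""
    return work / parallel_time(work, p)


def best_processor_count(work, max_p):
    """Processor count in 1..max_p that minimizes the modeled parallel time."""
    return min(range(1, max_p + 1), key=lambda p: parallel_time(work, p))


if __name__ == "__main__":
    work = 1024
    best = best_processor_count(work, 4096)
    # Speedup below, at, and well beyond the time-minimizing processor count.
    for p in (1, 16, best, 4096):
        print(p, round(speedup(work, p), 2))
```

With a different overhead term the peak moves, which is consistent with the abstract's point that the optimal processor count depends on the problem size, the algorithm, and the architecture.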
Scalability of Parallel Algorithms for Matrix Multiplication
In Proc. of Int. Conf. on Parallel Processing, 1991
Cited by 22 (0 self)
A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For an arbitrarily large number of processors, any of these algorithms or their variants can provide near-linear speedup for sufficiently large matrix sizes, and none of the algorithms can be clearly claimed to be superior to the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others. We present a parallel formulation for hypercube and related architectures that performs better than any of the schemes described in the literature so far for a wide range of matrix sizes and number of processors. The superior performance and the analytical scalability expressions for this algorithm are verified through experiments on the Thinking Machines Corporation's CM5™ parallel computer for up to 512 processors. We show that special har...
Scalability of Parallel Sorting on Mesh Multicomputers
1991
Cited by 19 (13 self)
This paper presents two new parallel algorithms, QSP1 and QSP2, based on sequential quicksort for sorting data on a mesh multicomputer, and analyzes their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers. The isoefficiency of QSP1 is also fairly close to optimal. Lang et al. and Schnorr et al. have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-per-processor case. Both QSP1 and QSP2 have worse performance than these algorithms for the one-element-per-processor case. But QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). As a result, our new parallel formulations are better than these scaled-down variants in terms of speedup with respect to the best sequential algorithms. We also present a dif...
Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures
1993
Cited by 15 (3 self)
This paper provides a tutorial introduction to a performance evaluation metric called the isoefficiency function. Traditional methods for evaluating serial algorithms are inadequate for analyzing the performance of parallel algorithm-architecture combinations. The isoefficiency function has proven useful for evaluating the performance of a wide variety of such combinations. On a sequential computer, the fastest algorithm for solving a given problem is the best algorithm. However, the performance of a parallel algorithm for a specific problem instance on a given number of processors provides only limited information. The time taken by a parallel algorithm to solve a problem instance depends on the problem size, the number of processors used to solve the problem, and machine characteristics such as processor speed, speed of communication channels, type of interconnection network, and routing techniques. An algorithm that yields good performance for a selected problem on a fixed number of processors on a given machine may perform poorly if any of these parameters are changed. Hence, the evaluation of a parallel algorithm on a parallel computer requires a more comprehensive analysis, and the study of scalability aids us in this analysis. The ...
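As a rough illustration of the metric named in this abstract: with efficiency written as E = W / (W + T_o), holding E fixed requires the work W to grow as W = K * T_o, where K = E / (1 - E). The sketch below assumes a total overhead of T_o(p) = 2 p log2(p), which is an illustrative choice and not a model taken from the paper; the function names are hypothetical.

```python
import math


def total_overhead(p):
    """Assumed total overhead T_o(p) = 2 * p * log2(p) (illustrative only)."""
    return 2.0 * p * math.log2(p)


def efficiency(work, p):
    """Efficiency E = W / (W + T_o)."""
    return work / (work + total_overhead(p))


def isoefficiency_work(p, target_efficiency):
    """Work W needed on p processors to hold efficiency at the target.

    Solves W = K * T_o(p) with K = E / (1 - E). Intended for p >= 2,
    since the assumed overhead is zero at p = 1.
    """
    k = target_efficiency / (1.0 - target_efficiency)
    return k * total_overhead(p)


if __name__ == "__main__":
    # Work required to keep efficiency at 0.8 as the machine grows.
    for p in (2, 16, 64, 256):
        print(p, round(isoefficiency_work(p, 0.8), 1))
```

Under this assumed overhead the required work grows as p log p, slightly faster than the processor count; a slower-growing function would indicate a more scalable algorithm-architecture combination.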