Results 1 - 10
of
21
Deadlock-Free Multicast Wormhole Routing in 2D Mesh Multicomputers
, 1992
"... Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The n ..."
Abstract
-
Cited by 121 (22 self)
- Add to MetaCart
Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The nCUBE-2, a wormhole-routed hypercube multicomputer, provides hardware support for broadcast and a restricted form of multicast in which the destinations form a subcube. However, the broadcast routing algorithm adopted in the nCUBE-2 is not deadlock-free. In this paper, four multicast wormhole routing strategies for two-dimensional (2D) mesh multicomputers are proposed and studied. All of the algorithms are shown to be deadlock-free. These are the first deadlock-free multicast wormhole routing algorithms ever proposed. A simulation study has been conducted that compares the performance of these multicast algorithms under dynamic network traffic conditions in a 2D mesh. The results ind...
Scalable Load Balancing Techniques for Parallel Computers
, 1994
"... In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics : the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not po ..."
Abstract
-
Cited by 89 (16 self)
- Add to MetaCart
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics : the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
Analyzing Scalability of Parallel Algorithms and Architectures
- Journal of Parallel and Distributed Computing
, 1994
"... The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of ..."
Abstract
-
Cited by 84 (17 self)
- Add to MetaCart
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms t...
Unstructured Tree Search on SIMD Parallel Computers
- IEEE Transactions on Parallel and Distributed Systems
, 1994
"... In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises of two major components: (i) a triggering mech ..."
Abstract
-
Cited by 32 (12 self)
- Add to MetaCart
In this paper, we present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on a SIMD machine comprises of two major components: (i) a triggering mechanism, which determines when the search space redistribution must occur to balance search space over processors; and (ii) a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms, and verify the results experimentally. The analysis and experiments show that our new load balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load balancing schemes on MIMD architectures. We verify our theoretical...
Performance Properties of Large Scale Parallel Systems
- Department of Computer Science, University of Minnesota
, 1993
"... There are several metrics that characterize the performance of a parallel system, such as, parallel execution time, speedup and efficiency. A number of properties of these metrics have been studied. For example, it is a well known fact that given a parallel architecture and a problem of a fixed size ..."
Abstract
-
Cited by 21 (7 self)
- Add to MetaCart
There are several metrics that characterize the performance of a parallel system, such as, parallel execution time, speedup and efficiency. A number of properties of these metrics have been studied. For example, it is a well known fact that given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimalit...
Scalability of Parallel Algorithms for Matrix Multiplication
- in Proc. of Int. Conf. on Parallel Processing
, 1991
"... A number of parallel formulations of dense matrix multiplication algorithm have been developed. For arbitrarily large number of processors, any of these algorithms or their variants can provide near linear speedup for sufficiently large matrix sizes and none of the algorithms can be clearly claimed ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
A number of parallel formulations of dense matrix multiplication algorithm have been developed. For arbitrarily large number of processors, any of these algorithms or their variants can provide near linear speedup for sufficiently large matrix sizes and none of the algorithms can be clearly claimed to be superior than the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others. We present a parallel formulation for hypercube and related architectures that performs better than any of the schemes described in the literature so far for a wide range of matrix sizes and number of processors. The superior performance and the analytical scalability expressions for this algorithm are verified through experiments on the Thinking Machines Corporation's CM-5 TM y parallel computer for up to 512 processors. We show that special har...
Scalability of Parallel Sorting on Mesh Multicomputers
, 1991
"... This paper presents two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyzes their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers. The is ..."
Abstract
-
Cited by 18 (12 self)
- Add to MetaCart
This paper presents two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyzes their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers. The isoefficiency of QSP1 is also fairly close to optimal. Lang et al. and Schnorr et al. have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-per-processor case. Both QSP1 and QSP2 have worse performance than these algorithms for the one-element-perprocessor case. But QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). As a result, our new parallel formulations are better than these scaled-down variants in terms of speedup w.r.t the best sequential algorithms. We also present a dif...
A scalable parallel formulation of the backpropagation algorithm for hypercubes and related architectures
- IEEE Transactions on Parallel and Distributed Systems
, 1994
"... Abstract In this paper, we present a new technique for mapping the backpropagation algorithm on hypercubes and related architectures. A key component of this technique is a network partitioning scheme which is called checkerboarding. Checkerboarding allows us to replace the all-to-all broadcast oper ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Abstract In this paper, we present a new technique for mapping the backpropagation algorithm on hypercubes and related architectures. A key component of this technique is a network partitioning scheme which is called checkerboarding. Checkerboarding allows us to replace the all-to-all broadcast operation performed by the commonly used vertical network partitioning scheme, with operations that are much faster on the hypercubes and related architectures. Checkerboarding can be combined with the pattern partitioning technique to form a hybrid scheme which performs better than either one of these schemes. Theoretical analysis and experimental results on nCUBE2TM y and CM5TM z show that our scheme performs better than the other schemes, both for uniform and non-uniform networks.
Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures
, 1993
"... This paper provides a tutorial introduction to a performance evaluation metric called the isoefficiency function. Traditional methods for evaluating serial algorithms are inadequate for analyzing the performance of parallel algorithm-architecture combinations. Isoefficiency function has proven usef ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
This paper provides a tutorial introduction to a performance evaluation metric called the isoefficiency function. Traditional methods for evaluating serial algorithms are inadequate for analyzing the performance of parallel algorithm-architecture combinations. Isoefficiency function has proven useful for evaluating the performance of a wide variety of such combinations. On a sequential computer, the fastest algorithm for solving a given problem is the best algorithm. However, the performance of a parallel algorithm for a specific problem instance on a given number of processors provides only limited information. The time taken by a parallel algorithm to solve a problem instance depends on the problem size, the number of processors used to solve the problem, and machine characteristics such as: processor speed, speed of communication channels, type of interconnection network, and routing techniques. An algorithm that yields good performance for a selected problem on a fixed number of processors on a given machine may perform poorly if any of these parameters are changed. Hence, the evaluation of a parallel algorithm on a parallel computer requires a more comprehensive analysis, and the study of scalability aids us in this analysis. The

