Results 11 - 20 of 37
Parallel Ordering Using Edge Contraction
- Parallel Computing
, 1995
- Cited by 12 (1 self)
Computing a fill-reducing ordering of a sparse matrix is a central problem in the solution of sparse linear systems using direct methods. In recent years, there has been significant research in developing a sparse direct solver suitable for message-passing multiprocessors. However, computing the ordering step in parallel remains a challenge and there are very few methods available. This paper describes a new scheme called parallel contracted ordering, which is a combination of a new parallel nested dissection heuristic and any serial ordering method. The new nested dissection heuristic, called Shrink-Split ND (SSND), is based on parallel graph contraction. For a system with N unknowns, the complexity of SSND is O((N/P) log P) using P processors in a hypercube; the overall complexity is O((N/P) log N) when the serial ordering method chosen is graph-exploration-based nested dissection. We provide extensive empirical results on the quality of the ordering. We also report on the parallel...
Efficient Broadcasting in Wormhole-Routed Multicomputers: A Network-Partitioning Approach
- IEEE Transactions on Parallel and Distributed Systems
, 1996
- Cited by 11 (2 self)
In this paper, a network-partitioning approach for one-to-all broadcasting on wormhole-routed networks is proposed. To broadcast a message, the scheme works in three phases. First, a number of data-distributing networks (DDNs), which can work independently, are constructed. Then the message is evenly divided into sub-messages, each being sent to a representative node in one DDN. Second, the sub-messages are broadcast on the DDNs concurrently. Finally, a number of data-collecting networks (DCNs), which can work independently too, are constructed. Then concurrently on each DCN the sub-messages are collected and combined into the original message. Our approach, especially designed for wormhole-routed networks, is conceptually similar to but fundamentally very different from the traditional approach (e.g., [4, 12, 17, 29]) of using multiple edge-disjoint spanning trees in parallel for broadcasting in store-and-forward networks. One interesting issue is on the definition of independent, in the s...
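The three phases described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' construction: the function names are hypothetical, the DDNs are assumed to partition the nodes, each DCN is assumed to contain exactly one node from every DDN (for example, rows and columns of a 2D mesh), and wormhole routing and link contention are not modeled at all.

```python
def split_message(message: bytes, k: int) -> list[bytes]:
    """Divide the message into k nearly equal sub-messages."""
    base, extra = divmod(len(message), k)
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(message[start:start + size])
        start += size
    return parts


def three_phase_broadcast(message, ddns, dcns):
    """Simulate the three phases; returns {node: reassembled message}.

    ddns: list of node lists (assumed to partition all nodes);
          ddns[i][0] plays the role of the representative of DDN i.
    dcns: list of node lists (assumed: one node from every DDN in each).
    """
    k = len(ddns)
    parts = split_message(message, k)

    # Phase 1: sub-message i is sent to the representative of DDN i.
    held = {ddn[0]: {i: parts[i]} for i, ddn in enumerate(ddns)}

    # Phase 2: each DDN broadcasts its sub-message to all of its nodes.
    for i, ddn in enumerate(ddns):
        for node in ddn:
            held[node] = {i: parts[i]}

    # Phase 3: within each DCN, sub-messages are collected and combined.
    result = {}
    for dcn in dcns:
        pooled = {}
        for node in dcn:
            pooled.update(held[node])
        full = b"".join(pooled[i] for i in range(k))
        for node in dcn:
            result[node] = full
    return result


# Hypothetical 3x3 mesh: rows act as DDNs, columns act as DCNs.
rows = [[(r, c) for c in range(3)] for r in range(3)]
cols = [[(r, c) for r in range(3)] for c in range(3)]
delivered = three_phase_broadcast(b"hello, wormhole!", rows, cols)
```

In this toy layout every node ends up holding the full message because each column meets every row exactly once; whether a real topology admits such independent DDNs and DCNs is the question the abstract raises.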
Exchange of Messages of Different Sizes
- In IRREGULAR '98
- Cited by 11 (4 self)
In this paper, we study the exchange of messages among a set of processors linked through an interconnection network. We focus on general, non-uniform versions of all-to-all (or complete) exchange problems in asynchronous systems with a linear cost model and messages of arbitrary sizes. We extend previous complexity results to show that the general asynchronous problems are NP-complete. We present several approximation algorithms and determine which heuristics are best suited to several parallel systems. We conclude with experimental results that show that our algorithms outperform the native all-to-all exchange algorithm on an IBM SP2 when the number of processors is odd.
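To make the linear cost model concrete, the sketch below computes a simple lower bound on the completion time of a non-uniform exchange. It is an illustration, not an algorithm from the paper: it assumes that a message of size m costs alpha + beta * m, that each processor performs at most one send and one receive at a time, and the function name and parameters are hypothetical.

```python
def exchange_lower_bound(sizes, alpha, beta):
    """Lower bound on exchange time under a linear cost model.

    sizes[i][j] is the size of the message processor i sends to j
    (0 means no message). Assumes one send and one receive at a
    time per processor, so no schedule can finish before the
    busiest sender or the busiest receiver is done.
    """
    p = len(sizes)

    def cost(m):
        # Linear model: start-up cost plus per-unit transfer cost.
        return alpha + beta * m if m > 0 else 0.0

    send = [sum(cost(sizes[i][j]) for j in range(p) if j != i)
            for i in range(p)]
    recv = [sum(cost(sizes[i][j]) for i in range(p) if i != j)
            for j in range(p)]
    return max(send + recv)


# Three processors with messages of different sizes.
sizes = [[0, 4, 1],
         [2, 0, 0],
         [3, 3, 0]]
bound = exchange_lower_bound(sizes, alpha=1.0, beta=0.5)
```

Here processor 1 is the bottleneck: it receives messages of sizes 4 and 3, so the bound is (1 + 2) + (1 + 1.5) = 5.5 time units.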
Optimization of Collective Reduction Operations
- Springer-Verlag LNCS 3036
, 2004
Communication Operations on Coarse-Grained Mesh Architectures
- Parallel Computing
, 1994
- Cited by 9 (5 self)
In this paper we consider three frequently arising communication operations: one-to-all, all-to-one, and all-to-all. We describe architecture-independent solutions for each operation, as well as solutions tailored towards the mesh architecture. We show how the relationship among the parameters of a parallel machine and the relationship of these parameters to the message size determine the best solution. We discuss performance and scalability issues of our solutions on the Intel Touchstone Delta. Our results show that in order to cover a broad range of scalability for a particular operation, multiple solutions should be employed. Keywords: Parallel processing, coarse-grained machines, communication operations, scalability. Research supported in part by ARPA under contract DABT63-92-C-0022ONR. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing official policies, expressed or implied, of the U.S. government. 1 In...
Implementation and Performance of the MPI Message Passing Interface on the Fujitsu AP1000 Multicomputer
- In Proceedings of ACSC'95. Available from ftp://dcssoft.anu.edu.au/pub/www/dcs/cap/mpi/mpi.html
- Cited by 6 (2 self)
MPI is the new standard which defines a set of message passing operations for multicomputers and clustered systems. In comparison to other popular message passing systems, MPI provides a richer collection of functions, allowing efficient implementations, portability and excellent support for the development of parallel libraries. In this paper, we describe the implementation and performance of MPI on the Fujitsu AP1000 multicomputer. To produce an efficient implementation, the operating system on the AP1000 had to be modified to better support MPI. These modifications are presented, along with the hardware operations that were utilised. A selective broadcast operation was developed from the modifications which allowed very efficient group-wide broadcast. The performance of the implementation in comparison to native AP1000 calls is presented with benchmarks of the collective routines implemented using the selective broadcast operation. 1 The MPI Standard The message passing paradigm h...
A General-Purpose Model for Heterogeneous Computation
, 2000
- Cited by 5 (2 self)
Heterogeneous computing environments are becoming an increasingly popular platform for executing parallel applications. Such environments consist of a diverse set of machines and offer considerably more computational power at a lower cost than a parallel computer. Efficient heterogeneous parallel applications must account for the differences inherent in such an environment. For example, faster machines should possess more data items than their slower counterparts and communication should be minimized over slow network links. Current parallel applications are not designed with such heterogeneity in mind. Thus, a new approach is necessary for designing efficient heterogeneous parallel programs.
Scalable S-to-P Broadcasting on Message-Passing MPPs
- IEEE Transactions on Parallel and Distributed Systems
, 1998
- Cited by 5 (0 self)
Abstract—In s-to-p broadcasting, s processors in a p-processor machine contain a message to be broadcast to all the processors, 1 ≤ s ≤ p. We present a number of different broadcasting algorithms that handle all ranges of s. We show how the performance of each algorithm is influenced by the distribution of the s source processors and by the relationships between the distribution and the characteristics of the interconnection network. For the Intel Paragon we show that for each algorithm and machine dimension there exist ideal distributions and distributions on which the performance degrades. For the Cray T3D we also demonstrate dependencies between distributions and machine sizes. To reduce the dependence of the performance on the distribution of sources, we propose a repositioning approach. In this approach, the initial distribution is turned into an ideal distribution of the target broadcasting algorithm. We report experimental results for the Intel Paragon and Cray T3D and discuss scalability and performance. Index Terms—Broadcasting, communication operations, message-passing MPPs, scalability.
Efficient Single-Node Broadcast in Wormhole-Routed Multicomputers: A Network-Partitioning Approach
- In Symposium on Parallel and Distributed Processing
, 1996
- Cited by 5 (4 self)
In this paper, a network-partitioning scheme for single-node broadcasting on wormhole-routed networks is proposed. To broadcast a message, the scheme works in three phases. First, a number of data-distributing networks (DDNs), which can work independently, are constructed. Then the message is evenly divided into sub-messages, each being sent to a representative node in one DDN. Second, the sub-messages are broadcast on the DDNs concurrently. Finally, a number of data-collecting networks (DCNs), which can work independently too, are constructed. Then concurrently on each DCN the sub-messages are re-collected and combined into the original message. One interesting issue is the definition of independent, in the sense of wormhole routing, DDNs and DCNs. We show how to apply this scheme to tori, meshes, and hypercubes. Thorough analyses and experiments based on different system parameters and configurations are conducted. The results do confirm the advantage of our scheme, under various sy...

