A Framework for Exploiting Task- and Data-Parallelism on Distributed Memory Multicomputers
- IEEE Transactions on Parallel and Distributed Systems, 1997
Cited by 30 (0 self)
Distributed memory multicomputers offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this provides increased performance is that data parallelism provides diminishing returns as the number of processors used is increased. By controlling the number of processors used for each data parallel task in an application and by concurrently executing these tasks, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel scheme of execution for an application on a distributed memory multicomputer also involves data redistribution. This data redistribution causes an overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.
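The diminishing-returns argument in this abstract can be illustrated with a toy cost model. The sketch below is not the PARADIGM cost model: it assumes, purely for illustration, that a data-parallel task with `work` units on `procs` processors takes a fixed per-task `overhead` plus `work / procs`, and that mixed execution pays one flat `redistribution` cost. All function names and numbers are hypothetical.

```python
def task_time(work, procs, overhead):
    """Assumed cost model: fixed overhead plus perfectly divided work."""
    return overhead + work / procs


def data_parallel_only(works, total_procs, overhead):
    """Run the tasks one after another, each on all processors."""
    return sum(task_time(w, total_procs, overhead) for w in works)


def task_and_data_parallel(works, total_procs, overhead, redistribution):
    """Run the tasks concurrently, each on an equal share of the processors,
    and add an assumed flat cost for redistributing data between them."""
    share = total_procs // len(works)
    slowest = max(task_time(w, share, overhead) for w in works)
    return slowest + redistribution


if __name__ == "__main__":
    works = [64.0, 64.0]  # two independent tasks (hypothetical sizes)
    print(data_parallel_only(works, 16, overhead=2.0))
    print(task_and_data_parallel(works, 16, overhead=2.0, redistribution=0.5))
```

Under these assumed numbers the mixed schedule wins because the fixed overhead is paid once in parallel rather than twice in sequence; it stops winning once the redistribution cost exceeds the overhead saved.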
Load balancing HPF programs by migrating virtual processors
- In Second International Workshop on High-Level Programming Models and Supportive Environments (HIPS '97), 1997
Cited by 9 (2 self)
This paper explores the integration of load balancing features in the data parallel language HPF, targeting semi-regular applications. We show that the HPF virtual processors are good candidates to be the unit of migration. Then, we compare three possible implementations and show that threads provide a good trade-off between efficiency and ease of implementation. We finally describe a preliminary implementation. The experimental results, obtained with Gaussian elimination with partial pivoting, are promising.
An efficient uniform run-time scheme for mixed regular-irregular applications
- In Proceedings of the 1998 ACM International Conference on Supercomputing, 1998
Cited by 7 (1 self)
Almost all applications containing indirect array addressing (irregular accesses) have a substantial number of direct array accesses (regular accesses) too. A conspicuous percentage of these direct array accesses usually require inter-processor communication for the applications to run on a distributed memory multicomputer. This study highlights how lack of a uniform representation and lack of a uniform scheme to generate communication structures and parallel code for regular and irregular accesses in a mixed regular-irregular application prevent sophisticated optimizations. Furthermore, we also show that code generated for regular accesses using compile-time schemes is not always compatible with code generated for irregular accesses using run-time schemes. In our opinion, existing schemes handling mixed regular-irregular applications either incur unnecessary preprocessing costs or fail to perform the best communication optimization. This study presents a uniform scheme to handle both regular and irregular accesses in a mixed regular-irregular application. While this allows sophisticated communication optimizations such as message coalescing and message aggregation to be made across regular and irregular accesses, the preprocessing costs incurred are likely to be minimal. Experimental comparisons for various benchmarks on a 16-processor IBM SP-2 show that our scheme is feasible and better than existing schemes.
Contribution to Better Handling of Irregular Problems in HPF2
- 1998
Cited by 3 (1 self)
In this paper, we present our contribution for handling irregular applications with HPF2 and some experimental results. We propose a programming style for irregular applications that is close to the regular case, so that both compile-time and run-time techniques can be applied more easily. We use the well-known tree data structure to represent irregular data structures with hierarchical access, such as sparse matrices. This algorithmic representation avoids the indirections coming from the standard irregular programming style. We use derived data types of Fortran 90 to define trees and some approved extensions of HPF2 for their mapping. We also propose run-time support for irregular applications with loop-carried dependencies that cannot be determined at compile-time. Then, we present the TriDenT library, which supports distributed trees and provides run-time optimizations based on the inspector/executor paradigm. Finally, we validate our contribution with experimental results on IBM SP2 fo...
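The inspector/executor paradigm mentioned in this abstract can be sketched generically as follows. This is not the TriDenT API: it is a single-process simulation that assumes a block distribution of a global array, and the names `owner`, `inspector`, `executor`, and the `fetch` callback (standing in for communication) are all hypothetical.

```python
def owner(i, block):
    """Rank that owns global index i under an assumed block distribution."""
    return i // block


def inspector(indices, my_rank, block):
    """Scan the index array once and build a communication schedule:
    for each remote rank, the sorted, de-duplicated indices to fetch."""
    schedule = {}
    for i in indices:
        r = owner(i, block)
        if r != my_rank:
            schedule.setdefault(r, set()).add(i)
    return {r: sorted(s) for r, s in schedule.items()}


def executor(indices, my_rank, block, local, fetch, schedule):
    """Fetch remote values once per owner using the schedule, then
    resolve every indirect access from local or fetched data."""
    ghost = {}
    for r, wanted in schedule.items():
        for i, value in zip(wanted, fetch(r, wanted)):
            ghost[i] = value
    result = []
    for i in indices:
        if owner(i, block) == my_rank:
            result.append(local[i - my_rank * block])
        else:
            result.append(ghost[i])
    return result


if __name__ == "__main__":
    x = [10 * i for i in range(8)]       # global array, simulated in one process
    block, my_rank = 4, 0                # rank 0 holds x[0:4]
    indices = [1, 5, 5, 7, 2]            # indirect accesses x[indices[k]]
    schedule = inspector(indices, my_rank, block)
    fetch = lambda rank, wanted: [x[i] for i in wanted]  # stand-in for a message
    print(schedule)
    print(executor(indices, my_rank, block, x[0:4], fetch, schedule))
```

The point of the split is that the schedule is computed once and can be reused for as long as the index array does not change, so the inspection cost is amortized over repeated executor runs.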
Automatic Analytical Modeling for the Estimation of Cache Misses
- In International Conference on Parallel Architectures and Compilation Techniques, 1999
Cited by 1 (0 self)
Caches play a very important role in the performance of modern computer systems due to the gap between memory and processor speed. Among the methods for studying their behavior, the most widely used so far has been trace-driven simulation. Nevertheless, analytical modeling gives more information and requires shorter computation times, which allows it to be used in the compilation step to drive automatic optimizations on the code. The traditional drawbacks of analytical modeling have been its limited precision and the lack of techniques to apply it systematically without user intervention. In this work we present a methodology to build analytical models for codes with regular access patterns. These models can be applied to caches with an arbitrary size, line size and associativity. Their validation through simulations using typical scientific code fragments has shown a good degree of accuracy.
Asynchronous Progressive Irregular Prefix Operation in HPF2
In this paper, we study one kind of irregular computation on distributed arrays, the irregular prefix operation, which is currently not well supported by the standard data-parallel language HPF2. We show a parallel implementation that efficiently takes advantage of the independent computations arising in this irregular operation. Our approach is based on the use of a directive which characterizes an irregular prefix operation and on inspector/executor support, implemented in the CoLuMBO library, which optimizes the execution by using an asynchronous communication scheme and communication/computation overlap. We validate our contribution with results achieved on IBM SP2 for basic experiments and for a sparse Cholesky factorization algorithm applied to real size problems. KEY WORDS: HPF2, irregular application, prefix operation, run-time support, inspection/execution mechanism,
Evaluation of Compiler and Runtime Library Approaches for Supporting Parallel Regular Applications
- In Proceedings of the 12th International Parallel Processing Symposium, 1998
Important applications, including those in computational chemistry, computational fluid dynamics, structural analysis and sparse matrix applications, usually consist of a mixture of regular and irregular accesses. While current state-of-the-art run-time library support for such applications handles the irregular accesses reasonably well, the efficacy of the optimizations at run-time for the regular accesses is yet to be proven. This paper aims to find a better approach to handle the above applications in a unified compiler and run-time framework. Specifically, this paper considers only regular applications and evaluates the performance of two approaches, a run-time approach using PILAR and a compile-time approach using a commercial HPF compiler. This study shows that, using a particular representation of regular accesses, the performance of regular code using run-time libraries can come close to the performance of code generated by a compiler. We also determine the operations that u...

