Results 1 - 4 of 4
Asynchronous Programming in UPC: A Case Study and Potential for Improvement
Cited by 2 (0 self)
Abstract. In a traditional Partitioned Global Address Space language like UPC, an application programmer works with the model of a static set of threads performing locality-aware accesses on a global address space. On the other hand, asynchronous programming provides a simple interface for expressing the concurrency in dynamic, irregular algorithms, with the prospect of efficient portable execution from sophisticated runtime schemes handling the exposed concurrency. In this paper, we adopt the asynchronous style of programming to parallelize a nested, tree-based code in UPC. To maximize performance without losing the ease of application programming, we design Asynchronous Remote Methods as a potential extension to the UPC standard. Our prototype implementation of this construct in Berkeley UPC yields within 7% of ideal performance and 20-fold improvement over the original Standard UPC solution in some cases.
X10 as a parallel language for scientific computation: practice and experience
Abstract—X10 is an emerging Partitioned Global Address Space (PGAS) language intended to increase significantly the productivity of developing scalable HPC applications. The language has now matured to a point where it is meaningful to consider writing large scale scientific application codes in X10. This paper reports our experiences writing three codes from
unknown title
X10 as a parallel language for scientific computation

We are exploring the suitability of the X10 programming language for the development of high-performance scientific applications. We developed three different scientific codes entirely in X10 version 2.1. One performs a complete Hartree-Fock (HF) quantum chemistry computation on a molecular system. The other two implement alternative approaches for the fast calculation of long-range electrostatic forces in biochemical simulations: the Fast Multipole Method and the Smooth Particle Mesh Ewald method. Each application represents a different pattern of computation and communication. Our feedback on the applications presented here led to improvements in the X10 implementation including: complex arithmetic, fast local arrays, scaling of distributed arrays and async.

Particle Mesh Ewald method

The Smooth Particle Mesh Ewald method [2] splits particle interactions into short-range and long-range components. The short-range component is summed within some reduced domain. The long-range component is calculated by approximating a charge density field and interpolating charges on a mesh of grid points. Complexity is O(N log N). A key step is a 3D Fast Fourier Transform, which requires all-to-all communication in a distributed implementation. Our code uses active messages and global barriers to implement the all-to-all communication. A more efficient implementation would use a blocking collective operation, similar to MPI_Alltoall, to combine data transfer and synchronization.

Hartree-Fock method

The Hartree-Fock (HF) method is widely used in quantum chemistry to determine the motion of electrons around fixed nuclei. The core of the method is the construction of the Fock matrix, of which the most expensive part is the evaluation of four-centered two-electron integrals.
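The way two-electron integrals feed the Fock matrix can be illustrated with a small Python sketch. It is not the authors' X10 code. It assumes the standard closed-shell convention G_ij = sum_kl D_kl [(ij|kl) - 1/2 (ik|jl)], with the density matrix D already including the factor of two for doubly occupied orbitals. The function `eri` is a hypothetical stand-in that returns made-up values, not real integrals.

```python
def eri(i, j, k, l):
    """Hypothetical stand-in for the two-electron integral (ij|kl).

    The values are invented for illustration; a real code would evaluate
    integrals over basis functions here, which is the expensive step.
    """
    return 1.0 / (1.0 + abs(i - j) + abs(k - l) + abs(i + j - k - l))


def two_electron_scatter(density):
    """Integral-driven build: each integral updates several matrix elements."""
    n = len(density)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    value = eri(i, j, k, l)
                    # Coulomb-type contribution of (ij|kl) to element (i, j)
                    g[i][j] += density[k][l] * value
                    # Exchange-type contribution of (ij|kl) to element (i, k)
                    g[i][k] -= 0.5 * density[j][l] * value
    return g


def two_electron_gather(density):
    """Element-driven reference: evaluates the formula directly per element."""
    n = len(density)
    return [
        [
            sum(
                density[k][l] * (eri(i, j, k, l) - 0.5 * eri(i, k, j, l))
                for k in range(n)
                for l in range(n)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]
```

In the scatter form, every integral touches more than one matrix element. A parallel version that splits the integrals among workers therefore has to accumulate contributions from many workers into the same elements.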
Overall, the Hartree-Fock problem is characterized by load imbalance in the evaluation of the two-electron integrals and the need to accumulate multiple contributions to every element in the Fock matrix.

The all-to-all transpose is implemented in X10 as:

def transpose( source: DistArray[Complex](3), target:
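The X10 listing is cut off in this excerpt. As a rough illustration only, the pattern described in the text (one-sided block delivery followed by a global barrier) can be sketched in Python. The sketch makes several assumptions that are not in the source: it uses a 2D square matrix rather than a 3D DistArray, threads stand in for X10 places, and the helper `deliver` plays the role of an active message. All names are hypothetical.

```python
import threading


def distributed_transpose(source_slabs):
    """Transpose an N x N matrix stored as equal row slabs, one slab per place.

    source_slabs[p] holds the rows owned by place p. The result uses the
    same row-slab layout for the transposed matrix.
    """
    places = len(source_slabs)
    rows_per_place = len(source_slabs[0])
    n = places * rows_per_place
    target_slabs = [
        [[None] * n for _ in range(rows_per_place)] for _ in range(places)
    ]
    barrier = threading.Barrier(places)

    def deliver(dest, src, block):
        # Stand-in for an active message: stores the block, transposed,
        # into the part of the destination slab reserved for place src.
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                target_slabs[dest][c][src * rows_per_place + r] = value

    def run_place(p):
        # Each place sends one block to every place (including itself).
        for q in range(places):
            lo, hi = q * rows_per_place, (q + 1) * rows_per_place
            block = [row[lo:hi] for row in source_slabs[p]]
            deliver(q, p, block)
        # Global barrier: no place proceeds until all blocks are delivered.
        barrier.wait()

    threads = [
        threading.Thread(target=run_place, args=(p,)) for p in range(places)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target_slabs
```

Each place writes to a disjoint region of every destination slab, so the deliveries need no locking. The barrier is the only synchronization, and a collective such as MPI_Alltoall would fold that synchronization and the data transfer into one operation.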

