Fault Tolerance for OpenSHMEM
"... On today's supercomputing systems, faults are becoming the norm rather than the exception. Given the complexity required for achieving the expected scalability and performance on future systems, this situation is expected to worsen. The systems are expected to function in a nearly constant presence of faults. To be productive on these systems, programming models will require both hardware and software to be resilient to faults. With the growing importance of the PGAS programming model, and of OpenSHMEM as part of the HPC software stack, the lack of a fault tolerance model may become a ..."
Scalable MiniMD Design with Hybrid MPI and OpenSHMEM
"... The MPI programming model has been widely used for scientific applications. The emergence of Partitioned Global Address Space (PGAS) programming models presents an alternative approach to improving programmability. With its global data view and lightweight communication operations, PGAS has ... the benefits of both MPI and PGAS models. In this paper, we redesign an existing MPI-based scientific mini-application (MiniMD) with the MPI and OpenSHMEM programming models. We propose two alternative designs using MPI and OpenSHMEM, and compare the performance and scalability of those designs ..."
Contexts: A Mechanism for High Throughput Communication in OpenSHMEM
"... This paper introduces a proposed extension to the OpenSHMEM parallel programming model, called communication contexts. Contexts introduce a new construct that allows a programmer to generate independent streams of communication operations. In hybrid executions where multiple threads execute within ..."
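The contexts mechanism proposed in this paper was later adopted into the OpenSHMEM 1.4 specification. A minimal C sketch of the idea follows, using the standardized `shmem_ctx_*` names; it assumes an OpenSHMEM 1.4 implementation and launcher (e.g. `oshrun`) and will not compile or run without one:

```c
#include <shmem.h>

/* Sketch of communication contexts (OpenSHMEM 1.4): a context carries
 * its own stream of puts, which can be completed with shmem_ctx_quiet()
 * without stalling operations issued on other contexts or threads. */
int main(void) {
    static long dst = 0;   /* symmetric (remotely accessible) variable */
    long src = 42;

    shmem_init();

    shmem_ctx_t ctx;
    /* SHMEM_CTX_PRIVATE: the context is driven by one thread only,
     * letting the implementation avoid internal locking. */
    if (shmem_ctx_create(SHMEM_CTX_PRIVATE, &ctx) != 0) {
        shmem_finalize();
        return 1;
    }

    int target = (shmem_my_pe() + 1) % shmem_n_pes();
    shmem_ctx_long_put(ctx, &dst, &src, 1, target); /* put on this stream */
    shmem_ctx_quiet(ctx);  /* complete only this context's operations */

    shmem_ctx_destroy(ctx);
    shmem_finalize();
    return 0;
}
```

In a hybrid multithreaded run, each thread would create its own private context, so quiescing one thread's communication does not serialize the others.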
Profiling Non-Numeric OpenSHMEM Applications with the TAU Performance System
"... The recent development of a unified SHMEM framework, OpenSHMEM, has enabled further study in the porting and scaling of applications that can benefit from the SHMEM programming model. This paper focuses on non-numerical graph algorithms, which typically have a low FLOPS/byte ratio. ..."
An Intra-Node Implementation of OpenSHMEM Using Virtual Address Space Mapping
"... The recent OpenSHMEM effort has generated renewed interest in developing a portable, high-performance implementation of the SHMEM programming interface. One advantage of SHMEM is its simplified one-sided communication model, but the traditional UNIX shared memory model does not support the single ..."
Cited by 2 (2 self)
Performance analysis of asynchronous Jacobi’s method implemented in MPI, SHMEM and OpenMP
, 2013
"... Ever-increasing core counts create the need to develop parallel algorithms that avoid closely-coupled execution across all cores. In this paper we present a performance analysis of several parallel asynchronous implementations of Jacobi’s method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular, we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers ..."
Cited by 3 (1 self)
Article: Performance analysis of asynchronous Jacobi’s method implemented in MPI, SHMEM and OpenMP
"... Ever-increasing core counts create the need to develop parallel algorithms that avoid closely-coupled execution across all cores. We present a performance analysis of several parallel asynchronous implementations of Jacobi’s method for solving systems of linear equations, using MPI, SHMEM and OpenMP. ..."
A static binary instrumentation threading model for fast memory trace collection
- International Workshop on Data-Intensive Scalable Computing Systems
, 2012
"... In order to achieve a high level of performance, data-intensive applications such as the real-time processing of surveillance feeds from unmanned aerial vehicles will require the strategic application of multi/many-core processors and coprocessors, using a hybrid of inter-process message passing (e.g. MPI and SHMEM) and intra-process threading (e.g. pthreads and OpenMP). To facilitate program design decisions, memory traces gathered through binary instrumentation can be used to understand the low-level interactions between a data-intensive code and the memory subsystem of a multi ..."
Cited by 1 (1 self)