The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing
- Journal of Future Generation Computing Systems
, 1999
High-Performance Parallel Programming in Java: Exploiting Native Libraries
, 1998
Cited by 67 (3 self)
With most of today's fast scientific software written in Fortran and C, Java has a lot of catching up to do. In this paper we discuss how new Java programs can capitalize on high-performance libraries for other languages. With the help of a tool we have automatically created Java bindings for several standard libraries: MPI, BLAS, BLACS, PBLAS, ScaLAPACK. Performance results are presented for Java versions of two benchmarks from the NPB and PARKBENCH suites on an IBM SP2 distributed memory machine using JDK and IBM's high-performance Java compiler. The results confirm that fast parallel computing in Java is indeed possible.
PARDIS: A Parallel Approach to CORBA
- In 6th IEEE International Symposium on High Performance Distributed Computation
, 1997
Cited by 40 (10 self)
This paper describes PARDIS, a system with explicit support for the interoperability of PARallel DIStributed applications. PARDIS is closely based on the Common Object Request Broker Architecture (CORBA) [OMG95]. Like CORBA, it provides interoperability between heterogeneous components by specifying their interfaces in a meta-language, the CORBA IDL, which can be translated into the language of each interacting component and which also supports interaction in a distributed domain. To support interacting parallel applications, PARDIS extends the CORBA object model with the notion of an SPMD object. SPMD objects allow the request broker to interact directly with the distributed resources of a parallel application. To support distributed argument transfer, PARDIS introduces the notion of a distributed sequence, a generalization of a CORBA sequence that represents the distributed data structures of parallel applications. In this report we will give a brief description of basic component i...
GridDB: A Data-Centric Overlay for Scientific Grids
, 2004
Cited by 36 (0 self)
We present GridDB, a data-centric overlay for scientific grid data analysis. In contrast to currently deployed process-centric middleware, GridDB manages data entities rather than processes. GridDB provides a suite of services important to data analysis: a declarative interface, type-checking, interactive query processing, and memoization. We discuss several elements of GridDB: workflow/data model, query language, software architecture and query processing; and a prototype implementation. We validate GridDB by showing its modeling of real-world physics and astronomy analyses, and measurements on our prototype.
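As a rough illustration of the memoization service mentioned in the abstract, the sketch below caches the output of a deterministic analysis step, keyed by a hash of its inputs, so that a repeated request is answered from stored data rather than recomputed. The names `MemoTable`, `run`, and `calibrate` are hypothetical and are not GridDB's interface; the abstract does not describe how GridDB implements memoization.

```python
import hashlib
import json


class MemoTable:
    """Hypothetical input-keyed cache for deterministic analysis steps."""

    def __init__(self):
        self._results = {}
        self.computed = 0  # number of times a step actually ran

    def run(self, step, **inputs):
        # Key on the step name plus a canonical encoding of its inputs.
        payload = json.dumps([step.__name__, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in self._results:
            self._results[key] = step(**inputs)
            self.computed += 1
        return self._results[key]


def calibrate(energy, gain):
    # Stand-in for an expensive, side-effect-free computation.
    return energy * gain


table = MemoTable()
first = table.run(calibrate, energy=10.0, gain=1.5)
second = table.run(calibrate, energy=10.0, gain=1.5)  # served from the cache
third = table.run(calibrate, energy=20.0, gain=1.5)   # new inputs, recomputed
```

Keying on the inputs rather than on the process that produced them loosely mirrors the data-centric view the abstract contrasts with process-centric middleware.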
Towards Portable Message Passing in Java: Binding MPI
- In Recent Advances in PVM and MPI, number 1332 in Lecture Notes in Computer Science
, 1997
Cited by 34 (9 self)
In this paper we present a way of successfully tackling the difficulties of binding MPI to Java with a view to ensuring portability. We have created a tool for automatically binding existing native C libraries to Java, and have applied the Java-to-C Interface generating tool (JCI) to bind MPI to Java. The approach of automatic binding by JCI ensures both portability across different platforms and full compatibility with the MPI specification. To evaluate the resulting combination we have run a Java version of the NAS parallel IS benchmark on a distributed-memory IBM SP2 machine.
1 Introduction
It is generally accepted that computers based on the emerging hybrid shared/distributed-memory parallel architectures will become the fastest and most cost-effective supercomputers over the next decade. This, however, makes the search for the most appropriate programming model even more important than it has been so far. Users need a flexible yet comprehensive interface which covers both th...
Optimizing bandwidth limited problems using one-sided communication and overlap
- In 20th International Parallel and Distributed Processing Symposium (IPDPS)
, 2006
Cited by 32 (12 self)
This paper demonstrates that the one-sided communication used in languages like UPC can provide a significant performance advantage for bandwidth-limited applications. This is shown through communication microbenchmarks and a case study of UPC and MPI implementations of the NAS FT benchmark. Our optimizations rely on aggressively overlapping communication with computation, alleviating bottlenecks that typically occur when communication is isolated in a single phase. The new algorithms send more and smaller messages, yet the one-sided versions achieve a > 1.9× speedup over the base Fortran/MPI. Our one-sided versions show an average 15% improvement over the two-sided versions, due to the lower software overhead of one-sided communication, whose semantics are fundamentally lighter-weight than message passing. Our UPC results use Berkeley UPC with GASNet and demonstrate the scalability of that system, with performance approaching 0.5 TFlop/s on the FT benchmark with 512 processors.
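The benefit of overlapping communication with computation can be seen in a toy timing model. The sketch below is not taken from the paper: the function names, the single-link assumption, and the chunk timings are all illustrative, and per-message software overhead (which the abstract says is lower for one-sided communication) is ignored.

```python
def isolated_time(compute, comm):
    """All computation finishes before any communication starts."""
    return sum(compute) + sum(comm)


def overlapped_time(compute, comm):
    """Each chunk is sent as soon as it is computed and the link is free."""
    cpu_done = 0.0   # time at which the CPU finishes the current chunk
    link_free = 0.0  # time at which the (single, assumed) link is idle again
    for c, m in zip(compute, comm):
        cpu_done += c
        start = max(cpu_done, link_free)  # wait for the data and for the link
        link_free = start + m             # the send proceeds in the background
    return max(cpu_done, link_free)


# Four chunks, each needing 2 time units of compute and 1 of communication.
compute = [2.0] * 4
comm = [1.0] * 4
t_iso = isolated_time(compute, comm)
t_ovl = overlapped_time(compute, comm)
```

In this model only the last chunk's transfer remains exposed, so the total drops from 12 to 9 time units. Real gains depend on how much per-message overhead the finer-grained sends add.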
Globalized Newton-Krylov-Schwarz algorithms and software for parallel implicit CFD
- Int. J. High Performance Computing Applications
, 1998
Cited by 29 (12 self)
Key words. Newton-Krylov-Schwarz algorithms, parallel CFD, implicit methods
Abstract. Implicit solution methods are important in applications modeled by PDEs with disparate temporal and spatial scales. Because such applications require high resolution with reasonable turnaround, parallelization is essential. The pseudo-transient matrix-free Newton-Krylov-Schwarz (ΨNKS) algorithmic framework is presented as a widely applicable answer. This article shows that, for the classical problem of three-dimensional transonic Euler flow about an M6 wing, ΨNKS can simultaneously deliver
• globalized, asymptotically rapid convergence through adaptive pseudo-transient continuation and Newton's method;
• reasonable parallelizability for an implicit method through deferred synchronization and favorable communication-to-computation scaling in the Krylov linear solver; and
• high per-processor performance through attention to distributed memory and cache locality, especially through the Schwarz preconditioner.
Two discouraging features of ΨNKS methods are their sensitivity to the coding of the underlying PDE discretization and the large number of parameters that must be selected to govern convergence. We therefore distill several recommendations from our experience and from our reading of the literature on various algorithmic components of ΨNKS, and we describe a freely available, MPI-based portable parallel software implementation of the solver employed here.
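To make the "adaptive pseudo-transient continuation" ingredient concrete, here is a minimal sketch on a scalar equation. It is an illustration under stated assumptions, not the paper's solver: the time step follows the switched evolution relaxation (SER) rule, which is one common choice and is assumed here; the linear solve is a scalar division standing in for the Krylov method; and there is no Schwarz preconditioner. The name `ptc_solve` and the test equation are hypothetical.

```python
def ptc_solve(F, J, u0, dt0=0.1, tol=1e-12, max_iter=100):
    """Pseudo-transient continuation for a scalar equation F(u) = 0."""
    u = u0
    r0 = abs(F(u0))
    for k in range(max_iter):
        r = abs(F(u))
        if r <= tol:
            return u, k
        # SER (assumed rule): the pseudo-time step grows as the residual falls,
        # so the iteration approaches Newton's method near the solution.
        dt = dt0 * r0 / r
        # One linearized implicit Euler step: (1/dt + J(u)) * du = -F(u).
        du = -F(u) / (1.0 / dt + J(u))
        u += du
    raise RuntimeError("pseudo-transient continuation did not converge")


# Hypothetical test problem with root u = 1, started far from the solution.
root, iters = ptc_solve(
    F=lambda u: u**3 + u - 2.0,
    J=lambda u: 3.0 * u**2 + 1.0,
    u0=5.0,
)
```

The small initial step damps the early updates, which is the globalization role named in the first bullet, while the growing step recovers the fast local convergence of Newton's method.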
Performance and Experience with LAPI - a New High-Performance Communication Library for the IBM RS/6000 SP
- In Proceedings of the International Parallel Processing Symposium
, 1998
Cited by 27 (8 self)
LAPI is a low-level, high-performance communication interface available on the IBM RS/6000 SP system. It provides an active-message-like interface along with remote memory copy and synchronization functionality. It is designed primarily for use by experienced programmers in developing parallel subsystems, libraries and tools, but we also expect power programmers to use it in end-user applications. IBM developed LAPI as part of a project with Pacific Northwest National Laboratory (PNNL) to optimize the performance of the Global Arrays (GA) toolkit and its applications on the IBM RS/6000 SP. We provide an overview of LAPI characteristics and discuss its differences from other models such as MPI-2. We present some base performance parameters of LAPI, including latency and bandwidth, and compare them with the performance of MPI/MPL. The Global Arrays library from PNNL was ported to LAPI to exploit the performance benefits of this new interface. Experience using LAPI to implement GA and the performance of the resulting library are presented.
PVM and MPI: A comparison of features
- Calculateurs Paralleles
, 1996
Cited by 27 (0 self)
This paper compares PVM and MPI features, pointing out the situations where one may be favored over the other. Application developers can determine where their application most likely will run and if it requires particular features supplied by only one or the other of the APIs. MPI is expected to be faster within a large multiprocessor. It has many more point-to-point and collective communication options than PVM. This can be important if an algorithm is dependent on the existence of a special communication option. MPI also has the ability to specify a logical communication topology. PVM is better when applications will be run over heterogeneous networks. It has good interoperability between different hosts. PVM allows the development of fault tolerant applications that can survive host or task failures. Because the PVM model is built around the virtual machine concept (not present in the MPI model), it provides a powerful set of dynamic resource manager and process control functions. Each API has its unique strengths and this will remain so into the foreseeable future. One area of future research is to study the feasibility of creating a programming environment that allows access to the virtual machine features of PVM and the message passing features of MPI.

