JavaParty - Transparent Remote Objects in Java
- Concurrency: Practice and Experience
, 1997
Cited by 83 (3 self)
Java's threads offer appropriate means either for parallel programming of SMPs or as target constructs when compiling add-on features (e.g. forall constructs, automatic parallelization, etc.). Unfortunately, Java does not provide elegant and straightforward mechanisms for parallel programming on distributed memory machines, like clusters of workstations. JavaParty transparently adds remote objects to Java purely by declaration while avoiding disadvantages of explicit socket communication, the programming overhead of RMI, and many disadvantages of the message-passing approach in general. JavaParty is specifically targeted towards and implemented on clusters of workstations. It hence combines Java-like programming and the concepts of distributed shared memory in heterogeneous networks.
Software Engineering for Multicore Systems – An Experience Report
, 2007
Cited by 12 (7 self)
The emergence of inexpensive parallel computers powered by multicore chips combined with stagnating clock rates raises new challenges for software engineering. As future performance improvements will not come “for free” from increased clock rates, performance-critical applications will need to be parallelized. However, little is known about the engineering principles for parallel general-purpose applications. This paper presents an experience report with four diverse case studies on multicore software development for general-purpose applications. They were programmed in different languages and benchmarked on several multicore computers. Empirical findings include:
• Multicore computers deliver: Real speedups are achievable, albeit with significant programming effort and speedups that are typically lower than the number of cores employed.
• Massive refactoring of sequential programs is required, sometimes at several levels. Special tools for parallelization refactorings appear to be an important area of research.
• Autotuning is indispensable, as manually tuning thread assignment, number of pipeline stages, size of data partitions and other parameters is difficult and error-prone.
• Architectures that encompass several parallel components are poorly understood. Tuneable architectural patterns with parallelism at several levels need to be discovered.
Exploiting Object Locality in JavaParty, a Distributed Computing Environment for Workstation Clusters
- In The Ninth Workshop on Compilers for Parallel Computers (CPC2001)
, 2001
Cited by 8 (0 self)
In a distributed programming environment with location transparency, fast access to remote resources is absolutely critical for efficient program execution - but it is not sufficient. Locality optimization will try to group objects according to their communication patterns and replace remote access by local access whenever possible. Locality optimization is based on the assumption that local access always is much faster than remote access.
A Reliable Transmission Protocol for Myrinet
- In Proceedings of the 2nd Workshop on Cluster-Computing
, 1999
Cited by 4 (1 self)
This work presents a low-level communication protocol for Myrinet, which offers reliable data transmission at network interface level. The protocol is used within the ParaStation2 system, a high-performance cluster for parallel computing. Although most projects using Myrinet assume the hardware to be reliable, there is strong evidence that this assumption does not hold and reliable data transmission has to be ensured using an appropriate protocol. ParaStation2 exploits Myrinet's programmable network interface (NI) to implement link level flow control based on an ACK/NACK mechanism with timeout and retransmission. We describe the design and implementation of ParaStation2's transmission protocol and we evaluate its performance by comparing it to two similar protocols offering reliable data transmission, namely AM-II from Berkeley [CMC97] and VMMC-II from Princeton [DBL+97].
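The idea of an ACK/NACK mechanism with timeout and retransmission can be illustrated with a minimal stop-and-wait simulation in Python. This is only a sketch under assumptions: the `LossyLink` class, the `send_reliable` function, the scripted loss pattern and the retry limit are all hypothetical and are not taken from ParaStation2, whose protocol runs on the Myrinet network interface.

```python
from dataclasses import dataclass, field


@dataclass
class LossyLink:
    """Hypothetical link whose behaviour per attempt is scripted in advance."""
    outcomes: list                      # "ok", "corrupt" or "lost", one per attempt
    delivered: list = field(default_factory=list)
    attempts: int = 0

    def transmit(self, seq, payload):
        """Return "ACK", "NACK", or None (None models a sender timeout)."""
        if self.attempts < len(self.outcomes):
            outcome = self.outcomes[self.attempts]
        else:
            outcome = "ok"              # script exhausted: link behaves well
        self.attempts += 1
        if outcome == "lost":
            return None                 # nothing arrives, so no reply is sent
        if outcome == "corrupt":
            return "NACK"               # receiver rejects the damaged frame
        if len(self.delivered) == seq:  # accept only the next expected frame
            self.delivered.append(payload)
        return "ACK"


def send_reliable(link, frames, max_retries=5):
    """Send frames in order, retransmitting after a NACK or a timeout."""
    for seq, payload in enumerate(frames):
        for _ in range(max_retries + 1):
            if link.transmit(seq, payload) == "ACK":
                break                   # acknowledged, continue with next frame
        else:
            raise RuntimeError(f"frame {seq} was never acknowledged")
    return link.delivered


link = LossyLink(outcomes=["ok", "lost", "corrupt", "ok"])
result = send_reliable(link, ["a", "b", "c"])
```

In this scripted run the second frame is lost once and corrupted once before it gets through, so three frames need five transmission attempts while still arriving complete and in order.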
Improving the Communication Subsystem Performance of WARPED
, 1998
Cited by 2 (0 self)
With the advent of cheap and powerful hardware for workstations and networks, a new cluster-based architecture for Time Warp simulations has been envisioned. However, fine-grained Time Warp applications that communicate frequently are not ideal candidates for such architectures because of their high-latency communication costs. Hence, designers of fine-grained Time Warp applications on clusters are faced with the problem of reducing the high latency of the communication subsystem in such architectures. An efficient communication subsystem consumes a lower fraction of the processing cycles for communication operations and allows the majority of the processing cycles to be used by the application. This increases the performance of Time Warp applications. This thesis reduces the latency of the communication subsystem by selecting one of the following approaches: (i) reducing network latency by employing higher-performance network hardware (i.e., Fast Ethernet versus Myrine...
Design and Evaluation of ParaStation2
- In Proceedings of the International Workshop on Distributed High Performance Computing and Gigabit Wide Area Networks, Lecture Notes in Control and Information Sciences
, 1999
Cited by 1 (1 self)
This paper presents ParaStation2, an adaptation of the ParaStation system (which was built on top of our own hardware) to the Myrinet hardware. The main focus lies on the design and implementation of ParaStation2's flow control protocol to ensure reliable data transmission at network interface level, which differs from most other projects using Myrinet. One-way latency is 14.5 µs to 18 µs (depending on the hardware platform) and throughput is 50 MByte/s to 65 MByte/s, which compares well to other approaches. At application level, we were able to achieve a performance of 5.3 GFLOP running a matrix multiplication on 8 DEC Alpha machines (21164A, 500 MHz).
A Multithreaded Communication System for ATM-Based High Performance Distributed Computing Environments
Cited by 1 (1 self)
Current advances in processor technology and the rapid development of high-speed networking technology (e.g., Asynchronous Transfer Mode (ATM), Myrinet, and Fast Ethernet) have made network-based computing an attractive environment for large-scale High Performance Distributed Computing (HPDC) applications. However, due to the communication overhead between computers and the inflexible communication architectures of the parallel/distributed software tools, most HPDC applications do not fully utilize the benefits of high-speed communication networks. This can be mainly attributed to the high cost associated with system calls and context switching, redundant data copying during protocol processing, lack of support to overlap computation and communication at the application level, and tight coupling of data and control functions. In this paper, we present an architecture, implementation, and performance evaluation of a multithreaded message-passing system for an ATM-based HPDC environment ...
An Evaluation Methodology for Parallel/Distributed Software Tools
Cited by 1 (1 self)
The recent rapid growth of the network computing applications area has been accelerated by a variety of parallel and distributed computing (PDC) software tools that simplify process management, inter-process communication, and program debugging in a PDC environment. These tools vary significantly in terms of the application domain targeted and corresponding functionality provided, the computational and communication model supported, the underlying implementation philosophy, and the computing environments supported. This makes the selection of the best tool to run a given class of applications on a parallel or distributed system a non-trivial task that requires some investigation. Currently, there are no general criteria to evaluate PDC software tools, nor is it easy to lay down such criteria. In this paper we present a multilevel evaluation methodology for PDC software tools in which tools are evaluated from three different perspectives: tool performance level (TPL...
Reducing Cache Conflicts by a Parametrized Memory Mapping
Algorithms which access memory regularly are typical for scientific computing, image processing and multimedia. Cache conflicts are often responsible for performance degradation, but can be avoided by an adequate placement of data in memory. The huge search space for such compile time placements is systematically reduced until we arrive at a class of very simple mappings, well known from data distribution onto processors in parallel computing. The choice of parameters is then guided by a cost function which reflects the tradeoff between additional instruction overhead and reduced miss penalty. We show by experiment that when keeping the overhead low, a considerable speedup can be achieved.
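How data placement affects cache conflicts can be seen in a small simulation. The Python sketch below models a direct-mapped cache and compares a column walk over a row-major array with and without row padding. Padding is used here only as a simple stand-in for a placement choice; it is not the parametrized mapping proposed in the paper, and the cache geometry, the function names `count_misses` and `column_walk`, and the array sizes are all assumptions made for illustration.

```python
def count_misses(addresses, num_lines=64, line_size=8):
    """Count misses in a direct-mapped cache (all sizes in array elements)."""
    cache = [None] * num_lines          # block number currently held per line
    misses = 0
    for addr in addresses:
        block = addr // line_size       # memory block containing the element
        index = block % num_lines       # the single line this block may use
        if cache[index] != block:
            cache[index] = block        # evict whatever was there before
            misses += 1
    return misses


def column_walk(rows, row_stride, repeats=4):
    """Element addresses for repeated walks down column 0 of a 2-D array."""
    return [r * row_stride for _ in range(repeats) for r in range(rows)]


rows = 16
# Row stride equal to the cache capacity (64 lines * 8 elements = 512):
# every element of the column maps to the same cache line.
conflicting = count_misses(column_walk(rows, row_stride=512))
# Padding each row by one cache line spreads the column over 16 lines.
padded = count_misses(column_walk(rows, row_stride=512 + 8))
```

With these assumed sizes the unpadded layout misses on every one of the 64 accesses, while the padded layout misses only on the first walk (16 misses) and hits afterwards.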
PSPVM2 -- PVM for ParaStation
- In Proc. of 1st Workshop on Cluster Computing
, 1997
This document describes the concept, implementation and performance of PSPVM2, the second port of the Parallel Virtual Machine (PVM, [2]) to the ParaStation system. It is derived from the original PVM source code to retain compatibility with existing PVM applications and implementations, but uses a low level interface to the ParaStation system and direct communication within the ParaStation cluster. PSPVM is the fastest PVM implementation on a workstation cluster with a process-to-process latency as low as 11.5 µs and a sustained transfer rate of up to 9.4 MB/s. It can still be used as part of a larger virtual machine and all PVM programs can be compiled for PSPVM without any changes.
1 Introduction
ParaStation is a communications fabric for connecting off-the-shelf workstations into a supercomputer. The fabric employs technology used in massively parallel machines and scales up to 4096 nodes. It offers a latency of 2 µs and a throughput of 15.5 MB/s. ParaStation's user-...
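Latency and sustained transfer rate together give a rough estimate of message transfer time. The Python sketch below uses the common linear cost model (time = latency + size / bandwidth), which is an assumption and not something stated in the source. It reads the quoted PSPVM figures as 11.5 µs and 9.4 MB/s with 1 MB taken as 10^6 bytes; the function name `transfer_time_us` is hypothetical.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way transfer time in microseconds (linear cost model)."""
    # With 1 MB = 10**6 bytes, 1 MB/s is exactly 1 byte per microsecond,
    # so bytes divided by MB/s already yields microseconds.
    return latency_us + message_bytes / bandwidth_mb_per_s


# Figures quoted for PSPVM, interpreted as 11.5 microseconds and 9.4 MB/s.
LATENCY_US = 11.5
BANDWIDTH_MB_S = 9.4

empty_message = transfer_time_us(0, LATENCY_US, BANDWIDTH_MB_S)
one_kilobyte = transfer_time_us(1000, LATENCY_US, BANDWIDTH_MB_S)

# Message size at which startup latency and data transfer cost the same.
break_even_bytes = LATENCY_US * BANDWIDTH_MB_S
break_even_time = transfer_time_us(break_even_bytes, LATENCY_US, BANDWIDTH_MB_S)
```

Under this model, messages much smaller than the break-even size (roughly 108 bytes for these figures) are dominated by latency, while larger ones are dominated by bandwidth.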

