Wide-Area Parallel Programming Using the Remote Method Invocation Model. Concurrency: Practice and Experience

by R van Nieuwpoort, J Maassen, H Bal, T Kielmann, R Veldema

Efficient load balancing for wide-area divide-and-conquer applications

by Rob V. Van Nieuwpoort, Thilo Kielmann, Henri E. Bal - In Proc. PPoPP’01, Snowbird, UT, 2001
Cited by 46 (16 self)
Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated work load has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth. In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain less than optimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.

Ibis: A Flexible and Efficient Java-based Grid Programming Environment

by Rob V. Van Nieuwpoort, Jason Maassen, Gosia Wrzesińska, Rutger Hofman, Ceriel Jacobs, Thilo Kielmann, Henri E. Bal - Concurrency & Computation: Practice & Experience, 2005
Cited by 45 (15 self)
In computational grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing grid programming environments stems exactly from the dynamic availability of compute cycles: grid programming environments (a) need to be portable to run on as many sites as possible, (b) need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, and (c) need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java RMI), or highly efficient (MPI). No system combines all three properties that are necessary for grid computing. In this paper, we present Ibis, a new programming environment that combines Java’s “run everywhere” portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to 9 times higher throughputs with trees of objects.

Experiences with the KOALA co-allocating scheduler in multiclusters

by H. H. Mohamed, D. H. J. Epema - In Proc. of the 5th IEEE/ACM Int’l Symp. on Cluster Computing and the GRID (CCGrid 2005), 2005
Cited by 30 (9 self)
In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multiple clusters. While such jobs may have reduced runtimes because they have access to more resources, waiting for processors in multiple clusters and for the input files to become available in the right locations may introduce inefficiencies. Moreover, as single jobs now have to rely on multiple resource managers, co-allocation introduces reliability problems. In this paper, we present two additions to the original design of our KOALA co-allocating scheduler (different priority levels of jobs and incrementally claiming processors), and we report on our experiences with KOALA in our multicluster testbed while it was unstable.

Ibis: an efficient Java-based Grid programming environment

by Rob V. Van Nieuwpoort, Jason Maassen, Rutger Hofman, Thilo Kielmann, Henri E. Bal - In Joint ACM Java Grande - ISCOPE 2002 Conference, 2002
Cited by 26 (12 self)

An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters

by H. H. Mohamed, D. H. J. Epema - In Proceedings of the 2004 IEEE International Conference on Cluster Computing, 2004
Cited by 25 (7 self)
In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multiple clusters. While such jobs may have reduced runtimes because they have access to more resources, waiting for processors in multiple clusters and for the input files to become available in the right locations may introduce inefficiencies. In previous work, we studied only processor co-allocation, and only through simulations. Here, we extend this work with an analysis of the performance in a real testbed of our prototype Processor and Data Co-Allocator with the Close-to-Files (CF) job-placement algorithm. CF tries to place job components on clusters that have enough idle processors and are close to the sites where the input files reside. We present a comparison of the performance of CF and the Worst-Fit job-placement algorithm, with and without file replication, achieved with our prototype. Our most important findings are that CF with replication works best, and that the utilization in our testbed can be driven to about 80%.
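
The placement idea summarized in this abstract (prefer clusters with enough idle processors that are close to a copy of the input file) can be illustrated with a minimal greedy sketch. This is not the authors' implementation: the `Component` type, the `place_close_to_files` function, the `cost` table, and the tie-breaking by cluster name are all hypothetical, and the abstract does not say how "closeness" is measured.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    processors: int   # processors this job component needs
    input_file: str   # file the component reads


def place_close_to_files(components, idle, replicas, cost):
    """Hypothetical greedy sketch of a close-to-files style placement.

    idle:     {cluster: number of idle processors}
    replicas: {file name: list of sites holding a copy}
    cost:     {(site, cluster): assumed transfer cost}, an illustrative metric
    Returns {component name: cluster}, or None if some component does not fit.
    """
    idle = dict(idle)  # work on a copy so a failed attempt has no side effects
    placement = {}
    for comp in components:
        # Only clusters with enough idle processors are candidates.
        candidates = [c for c, free in idle.items() if free >= comp.processors]
        if not candidates:
            return None

        def transfer_cost(cluster):
            # Cost from the nearest replica of the component's input file.
            return min(cost[(site, cluster)] for site in replicas[comp.input_file])

        # Ties are broken by cluster name to keep the result deterministic.
        best = min(candidates, key=lambda c: (transfer_cost(c), c))
        placement[comp.name] = best
        idle[best] -= comp.processors
    return placement


# Illustrative data only; none of these values come from the paper.
idle = {"A": 8, "B": 16, "C": 4}
replicas = {"f1": ["A"], "f2": ["C", "B"]}
cost = {(s, c): (0 if s == c else 10) for s in "ABC" for c in "ABC"}
cost[("A", "B")] = 3
job = [Component("c1", 8, "f1"), Component("c2", 8, "f1"), Component("c3", 4, "f2")]
result = place_close_to_files(job, idle, replicas, cost)
print(result)
```

With file replication, `replicas` simply lists more than one site per file, which tends to lower the transfer cost of the best candidate.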

Parallel Application Experience with Replicated Method Invocation

by Jason Maassen, Thilo Kielmann, Henri E. Bal, 2001
Cited by 17 (11 self)
We describe and evaluate a new approach to object replication in Java, aimed at improving the performance of parallel programs. Our programming model allows the programmer to define groups of objects that can be replicated and updated as a whole, using totally-ordered broadcast to send update methods to all machines containing a copy. The model has been implemented in the Manta high-performance Java system. We evaluate system performance both with micro benchmarks and with a set of five parallel applications. For the applications, we also evaluate ease of programming, compared to RMI implementations. We present performance results for a Myrinet-based workstation cluster as well as for a wide-area distributed system consisting of four such clusters. The micro benchmarks show that updating a replicated object on 64 machines only takes about three times the RMI latency in Manta. Applications using Manta’s object replication mechanism perform at least as fast as manually optimized versions based on RMI, while keeping the application code as simple as with naive versions that use shared objects without taking locality into account. Using a replication mechanism in Manta’s runtime system enables several unmodified applications to run efficiently even on the wide-area system.

Trace-Based Simulations of Processor Co-Allocation Policies in . . .

by A. I. D. Bucur et al., 2003
Cited by 15 (8 self)
In systems consisting of multiple clusters of processors which employ space sharing for scheduling jobs, such as our Distributed ASCI Supercomputer (DAS), co-allocation, i.e., the simultaneous allocation of processors to single jobs in multiple clusters, may be required. In this paper we study the performance of several scheduling policies for co-allocating unordered requests in multiclusters with a workload derived from the DAS. We find that, besides the choice of policy, limiting the total job size significantly improves performance, and that co-allocation is a viable choice as long as the slowdown of jobs due to global communication is bounded by 1.25.

Programming Environments for High-Performance Grid Computing: the Albatross Project

by Thilo Kielmann, Henri E. Bal, Jason Maassen, Rob Van Nieuwpoort, Lionel Eyraud, Rutger Hofman, Kees Verstoep, 2002
Cited by 12 (2 self)
The aim of the Albatross project is to study applications and programming environments for computational Grids. We focus on high performance applications, running in parallel on multiple clusters or MPPs that are connected by wide-area networks (WANs). We briefly present three Grid programming environments developed in the context of the Albatross project: the MagPIe library for collective communication with MPI, the Replicated Method Invocation mechanism for Java (RepMI), and the Java-based Satin system for running divide-and-conquer programs on Grid platforms.

The Albatross Project: Parallel Application Support for Computational Grids

by Thilo Kielmann, Henri E. Bal, Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Rutger Hofman, Ceriel Jacobs, Kees Verstoep - In Proceedings of the 1st European GRID Forum Workshop, 2000
Cited by 4 (2 self)
The aim of the Albatross project is to study applications and programming environments for computational grids consisting of multiple clusters that are connected by wide-area networks. Parallel processing on such systems is useful but challenging, given the large differences in latency and bandwidth between LANs and WANs. We provide efficient algorithms and programming environments that exploit the hierarchical structure of wide-area clusters to minimize communication over the WANs. In addition, we use highly efficient local-area communication protocols. We illustrate this approach using the Manta high-performance Java system and the MagPIe MPI library, both of which are implemented on a collection of four Myrinet-based clusters connected by wide-area ATM networks. Our sample applications obtain high speedups on this wide-area system.

Object-based Collective Communication in Java

by Arnold Nelisse, Thilo Kielmann, Henri E. Bal, Jason Maassen - In Joint ACM JavaGrande-ISCOPE 2001, 2001
Cited by 2 (0 self)
CCJ is a communication library that adds MPI-like collective operations to Java. Rather than trying to adhere to the precise MPI syntax, CCJ aims at a clean integration of collective communication into Java's object-oriented framework. For example, CCJ uses thread groups to support Java's multithreading model and it allows any data structure (not just arrays) to be communicated. CCJ is implemented entirely in Java, on top of RMI, so it can be used with any Java virtual machine. The paper discusses three parallel Java applications that use collective communication. It compares the performance (on a Myrinet cluster) of CCJ, RMI and mpiJava versions of these applications, and also compares the code complexity of the CCJ and RMI versions. The results show that the CCJ versions are significantly simpler than the RMI versions and obtain good performance.