Results 1 - 6 of 6
CAPSULE: Hardware-assisted parallel execution of component-based programs
- In Proceedings of the 39th Annual International Symposium on Microarchitecture, 2006
Cited by 10 (0 self)
Abstract
Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, including those with complex control flow and data structures. Yet writing parallel programs is a notoriously difficult task. Beyond improving processor performance, the architect can help by easing the programmer's task, especially by simplifying the programming model exposed to the programmer. In this article, among the many issues associated with writing parallel programs, we focus on finding the appropriate parallelism granularity and on efficiently mapping tasks with complex control and data flow to threads. We propose to relieve the user and the compiler of both tasks by delegating the parallelization decision to the architecture at run time, through a combination of hardware and ...
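The central idea is that the program only exposes potential parallelism, and the architecture decides at run time whether to exploit it. The sketch below models that decision under loudly stated assumptions: a counting semaphore stands in for the pool of free hardware thread contexts, and a non-blocking acquire plays the role of the hardware's "divide or not" answer. `Contexts`, `tree_sum`, and `grain` are hypothetical names, not CAPSULE's interface, and Python threads will not show a speedup; the point is the control structure.

```python
import threading


class Contexts:
    """Hypothetical model of free hardware thread contexts.

    A non-blocking semaphore acquire stands in for the architecture's
    run-time decision; this is an assumption, not CAPSULE's mechanism.
    """

    def __init__(self, n: int) -> None:
        self._free = threading.Semaphore(n)

    def try_acquire(self) -> bool:
        return self._free.acquire(blocking=False)

    def release(self) -> None:
        self._free.release()


def tree_sum(values, ctx: Contexts, grain: int = 4):
    """Sum a list by recursive halving.

    Each split is only a *potential* division: it runs in parallel only if
    a context is free at that moment; otherwise it runs sequentially.
    """
    if len(values) <= grain:
        return sum(values)
    mid = len(values) // 2
    left, right = values[:mid], values[mid:]
    if ctx.try_acquire():
        # Division granted: compute the left half on a new thread.
        result = {}

        def worker():
            try:
                result["left"] = tree_sum(left, ctx, grain)
            finally:
                ctx.release()  # hand the context back when done

        t = threading.Thread(target=worker)
        t.start()
        right_sum = tree_sum(right, ctx, grain)
        t.join()
        return result["left"] + right_sum
    # Division denied: plain sequential recursion, same result.
    return tree_sum(left, ctx, grain) + tree_sum(right, ctx, grain)


if __name__ == "__main__":
    print(tree_sum(list(range(1000)), Contexts(3)))  # 499500
```

With `Contexts(0)` every division is denied and the same program runs sequentially, which is the property that lets the programmer leave the granularity decision to the run-time.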
A Comparison of Three High Speed Networks for Parallel Cluster Computing
- In Proc. 1st International Workshop on Communication and Arch. Support for Network-Based Parallel Computing, 1997
Cited by 4 (0 self)
Abstract
Many high-speed networks have been developed that may be suitable for parallel computing on clusters of workstations. This paper compares three different networks: FastEthernet, ATM, and Myrinet. We have implemented the Panda portability layer on all three networks, using the same host machines and as much of the same software as possible. We compare the latency and throughput of Panda's point-to-point and multicast communication on the three networks and analyze the performance differences.
1 Introduction
The suitability of a cluster of workstations for parallel computing depends strongly on the local area network (LAN) that interconnects the machines. Whereas massively parallel processors (e.g., the SP-2) use efficient, specially designed switching networks, workstation clusters typically use off-the-shelf LANs. With traditional LANs such as 10 Mbps Ethernet, the relative communication overhead is high and will only grow as processors become faster. Fortunately, many high-speed...
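The comparison rests on point-to-point latency and throughput. A common way to measure both is a ping-pong test: one side sends an n-byte message, the peer echoes it, half the averaged round-trip time is taken as the one-way latency, and throughput is the message size divided by that time. The sketch below assumes a local `socket.socketpair()` in place of a real network and is not Panda's benchmark; `ping_pong` and `recv_exact` are hypothetical helper names. On a socketpair it measures only host-side software overhead, so its numbers say nothing about FastEthernet, ATM, or Myrinet.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock: socket.socket, size: int, rounds: int) -> None:
    """Peer side: bounce every message straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def ping_pong(size: int, rounds: int = 100):
    """Return (one-way latency in seconds, throughput in bytes/second)."""
    a, b = socket.socketpair()  # stand-in for two hosts on a network
    peer = threading.Thread(target=echo, args=(b, size, rounds))
    peer.start()
    msg = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(msg)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    one_way = elapsed / (2 * rounds)  # half of the mean round trip
    return one_way, size / one_way


if __name__ == "__main__":
    for n in (1, 1024, 65536):
        lat, thr = ping_pong(n, rounds=50)
        print(f"{n:6d} B: {lat * 1e6:8.1f} us, {thr / 1e6:8.1f} MB/s")
```

Small messages expose per-message overhead (latency), while large ones approach the sustainable bandwidth, which is why such comparisons usually sweep the message size.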
Adding Dynamic Object Migration to the Distributing Compiler Pangaea
- 2001
Cited by 4 (0 self)
Abstract
In distributed, object-oriented programs, the placement of objects is crucial for performance, since remote method calls are far more expensive than local calls. Finding an appropriate configuration, i.e. a mapping from objects to hosts, is the most important task in distributed program design. This is true not only for human developers but also for the distributing compiler Pangaea, which automatically generates a distributed program from a centralized one.
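The cost model behind this is simple: a call between objects on different hosts is remote. The sketch below, assuming call counts between object pairs are already known (for example from profiling), counts the remote calls of a placement and applies one greedy migration pass that moves each object to the host it communicates with most. This is a hypothetical heuristic for illustration, not Pangaea's migration algorithm; `remote_calls` and `migrate_greedy` are invented names.

```python
from collections import defaultdict


def remote_calls(calls, placement):
    """Total calls whose caller and callee sit on different hosts.

    calls: {(caller, callee): count}; placement: {object: host}.
    """
    return sum(n for (a, b), n in calls.items() if placement[a] != placement[b])


def migrate_greedy(calls, placement):
    """One pass: move each object to the host it exchanges the most calls with.

    A move happens only if it strictly increases the object's local traffic,
    so with the other objects fixed, each move reduces remote calls.
    """
    placement = dict(placement)
    for obj in placement:
        traffic = defaultdict(int)  # host -> calls between obj and that host
        for (a, b), n in calls.items():
            if a == obj:
                traffic[placement[b]] += n
            elif b == obj:
                traffic[placement[a]] += n
        if traffic:
            best = max(traffic, key=traffic.get)
            if traffic[best] > traffic.get(placement[obj], 0):
                placement[obj] = best
    return placement


if __name__ == "__main__":
    calls = {("A", "B"): 10, ("B", "C"): 1}
    placement = {"A": "h1", "B": "h2", "C": "h2"}
    print(remote_calls(calls, placement))  # 10
    moved = migrate_greedy(calls, placement)
    print(moved)                           # {'A': 'h2', 'B': 'h2', 'C': 'h2'}
    print(remote_calls(calls, moved))      # 0
```

Because it ignores load, this pass can pile every object onto one host, as it does in the example; a realistic placement policy would also have to balance processor load against communication cost.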
Performance study of parallel programs on a clustered Wide-Area Network
- 1997
Cited by 1 (0 self)
Abstract
Contents
1 Introduction
2 The environment of the experiment
  2.1 The Orca language and implementation
  2.2 The Amoeba processor pool
  2.3 The simulation of a clustered wide-area network
  2.4 Analyzing Orca programs
  2.5 Compiling and running Orca programs
3 Related work
4 All-pairs Shortest Paths
  4.1 Performance of ASP
  4.2 Optimizations
  4.3 Summary
5 The traveling salesman problem
  5.1 Performance of TSP
  5.2 Optimizations
  5.3 Summary ...
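Section 4 studies All-pairs Shortest Paths (ASP). The listing does not show the report's implementation, so the following is an assumption: a common parallelization of Floyd-Warshall gives each processor a contiguous block of rows, and in iteration k the owner of row k broadcasts it so every processor can update its own rows. The sketch simulates that data distribution sequentially in Python (no real messages); `asp_row_partitioned` and `nworkers` are hypothetical names.

```python
INF = float("inf")


def asp_row_partitioned(dist, nworkers):
    """Floyd-Warshall with rows split into contiguous blocks, one per worker.

    In iteration k the owner of row k would broadcast it; here the copy
    `pivot` stands in for that message. Each worker then updates only the
    rows it owns. Workers run one after another in this simulation.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    # Worker w owns rows bounds[w] .. bounds[w + 1] - 1.
    bounds = [w * n // nworkers for w in range(nworkers + 1)]
    for k in range(n):
        pivot = d[k][:]  # the row the owner of row k would broadcast
        for w in range(nworkers):
            for i in range(bounds[w], bounds[w + 1]):
                dik = d[i][k]
                if dik == INF:
                    continue  # no path i -> k, nothing to relax
                row = d[i]
                for j in range(n):
                    if dik + pivot[j] < row[j]:
                        row[j] = dik + pivot[j]
    return d


if __name__ == "__main__":
    graph = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    print(asp_row_partitioned(graph, 2))
    # [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

In this scheme the only communication is one row broadcast per iteration, so n broadcasts in total; on a clustered wide-area network each of those broadcasts must also cross the slow inter-cluster links.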
FlexRTS: An extensible Orca run-time system
Cited by 1 (1 self)
Abstract
FlexRTS is a dynamically configurable and extensible run-time system for Orca, a high-performance parallel programming system. It gives run-time and application programmers full control over the implementation and placement of kernel and user-level modules (device drivers, protocol stacks, thread packages, etc.). This allows programmers to optimize the run-time system on a per-application basis and get the most out of the available hardware.
Keywords: operating systems, run-time systems, parallel programming, extensibility.
1. Introduction
It is hard for an application programmer to take full advantage of existing hardware. This is largely caused by a lack of control over the available abstractions. Many researchers have taken this to mean kernel abstractions and have proposed mechanisms for extending or adapting them [4, 5, 8, 11, 14]. Non-kernel abstractions, e.g. those provided by a run-time system, are in theory easy to adapt and extend, but in practice they are j...
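The key property here is that run-time modules are selectable per application. The sketch below models only that selection step, inside a single Python process: components sit in named slots that an application can rebind, so, for example, it could pick a lighter framing module when it knows its network is reliable. It assumes nothing about FlexRTS's real interface or its kernel/user-level placement mechanism; `RunTimeSystem`, `install`, `frame_plain`, and `frame_checksummed` are hypothetical names.

```python
import zlib
from typing import Callable, Dict

Module = Callable[[bytes], bytes]


class RunTimeSystem:
    """Toy run-time system whose components live in named, rebindable slots."""

    def __init__(self) -> None:
        self._slots: Dict[str, Module] = {}

    def install(self, slot: str, impl: Module) -> None:
        # Replace whatever module currently occupies the slot.
        self._slots[slot] = impl

    def call(self, slot: str, payload: bytes) -> bytes:
        return self._slots[slot](payload)


def frame_plain(payload: bytes) -> bytes:
    """Framing for a network assumed reliable: send the payload as-is."""
    return payload


def frame_checksummed(payload: bytes) -> bytes:
    """Framing for a network assumed lossy: append a 4-byte CRC-32."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


if __name__ == "__main__":
    rts = RunTimeSystem()
    rts.install("frame", frame_checksummed)
    print(len(rts.call("frame", b"hello")))  # 9
    rts.install("frame", frame_plain)        # per-application choice
    print(rts.call("frame", b"hello"))       # b'hello'
```

The application code that calls `rts.call("frame", ...)` does not change when the module is swapped, which is what makes per-application tuning cheap.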
Distributed Shared Array: An Integration of Message Passing and Multithreading on SMP Clusters
Abstract
This paper presents a Distributed Shared Array (DSA) runtime system to support Java-compliant multithreaded programming on clusters of symmetric multiprocessors (SMPs). As a hybrid of the message-passing and shared-address-space programming models, the DSA programming model allows programmers to explicitly control data distribution so as to take advantage of the deep memory hierarchy, while relieving them of the error-prone orchestration of communication and synchronization at run time. The DSA system is developed as an integral component of mobility-support middleware for grid computing, so that DSA-based virtual machines can be reconfigured to adapt to varying resource supply or demand over the course of a computation. The DSA runtime system also features a directory-based cache coherence protocol that supports replication at a user-defined sharing granularity, and a communication proxy mechanism for reducing network contention. We demonstrate the programmability of the model in a number of parallel applications and evaluate its performance on a cluster of SMP servers, in particular the impact of the coherence granularity.
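To see why coherence granularity matters, consider a simplified write-invalidate directory, assumed here for illustration rather than taken from the DSA system: it tracks which nodes hold each block of `granularity` array elements and counts the invalidations that writes cause. Coarser blocks mean fewer directory entries but more false sharing, which the example at the bottom makes visible. `Directory` and `false_sharing_demo` are hypothetical names, and no data or messages are modeled.

```python
from collections import defaultdict


class Directory:
    """Simplified write-invalidate directory over blocks of array elements."""

    def __init__(self, granularity: int) -> None:
        self.granularity = granularity
        self.sharers = defaultdict(set)  # block id -> nodes with read-only copies
        self.owner = {}                  # block id -> node with the writable copy
        self.invalidations = 0

    def block(self, index: int) -> int:
        return index // self.granularity

    def read(self, node: int, index: int) -> None:
        b = self.block(index)
        if self.owner.get(b) == node:
            return  # the writer can read its own exclusive copy
        if b in self.owner:
            # Another node holds the writable copy: downgrade it to a sharer.
            self.sharers[b].add(self.owner.pop(b))
        self.sharers[b].add(node)

    def write(self, node: int, index: int) -> None:
        b = self.block(index)
        holders = set(self.sharers[b])
        if b in self.owner:
            holders.add(self.owner[b])
        holders.discard(node)
        self.invalidations += len(holders)  # one invalidation per other copy
        self.sharers[b] = set()
        self.owner[b] = node


def false_sharing_demo(granularity: int, rounds: int = 10) -> int:
    """Two nodes alternately write adjacent elements 0 and 1."""
    d = Directory(granularity)
    for _ in range(rounds):
        d.write(0, 0)
        d.write(1, 1)
    return d.invalidations


if __name__ == "__main__":
    print(false_sharing_demo(1))  # 0: elements 0 and 1 are in separate blocks
    print(false_sharing_demo(2))  # 19: same block, every later write invalidates
```

With a granularity of one element the two writers never interfere; with two elements per block, every write after the first invalidates the other node's copy even though the nodes never touch the same element.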

