Arachne: A Portable Threads System Supporting Migrant Threads on Heterogeneous Network Farms
- IEEE Transactions on Parallel and Distributed Systems, 1998
Cited by 31 (0 self)
We present the design and implementation of Arachne, a threads system that can be interfaced with a communications library for multi-threaded distributed computations. In particular, Arachne supports thread migration between heterogeneous platforms, with dynamic stack size management and recursive thread functions. Arachne is efficient, flexible and portable --- it is based entirely on C and C++. To facilitate heterogeneous thread operations, we have added three keywords to the C++ language. The Arachne preprocessor takes as input code written in that language, and outputs C++ code, suitable for compilation with a conventional C++ compiler. The Arachne runtime system manages all threads during program execution. We present some performance measurements on the costs of basic thread operations and thread migration in Arachne, and compare these to costs in other threads systems. Keywords: heterogeneous thread migration, user-level threads, compile-time code transformations, C++ Supporte...
The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters
- Proc. of the 3rd Workshop on Personal Computer-based Networks of Workstations, 2000
Cited by 18 (4 self)
One of the new research tendencies within the well-established cluster computing area is the growing interest in the use of multiple workstation clusters as a single virtual parallel machine, in much the same way as individual workstations are nowadays connected to build a single parallel cluster. In this paper we present an analysis of several aspects concerning the integration of different workstation clusters, such as Myrinet and SCI, and propose our MultiCluster model as an alternative way to achieve such an integrated architecture. 1 Introduction Cluster computing is nowadays a common practice for many research groups around the world that seek high performance for a great variety of parallel and distributed applications, such as aerospace and molecular simulations, Web servers, data mining, and so forth. To achieve high performance, many efforts have been devoted to the design and implementation of low overhead communication libraries, specially dedicated to fast communicat...
The UDP Calculus: Rigorous Semantics for Real Networking
- 2001
Cited by 18 (14 self)
Network programming is notoriously hard to understand: one has to deal with a variety of protocols (IP, ICMP, UDP, TCP etc), concurrency, packet loss, host failure, timeouts, the complex sockets interface to the protocols, and subtle portability issues. Moreover, the behavioural properties of operating systems and the network are not well documented.
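As a rough illustration of the sockets-interface details this abstract mentions (datagrams, possible loss, timeouts), the following minimal Python sketch sends one UDP datagram over the loopback interface and waits for it with a timeout. It is not taken from the paper; the function name `udp_echo_once` and the one-second default timeout are assumptions made for the example.

```python
import socket


def udp_echo_once(payload: bytes, timeout: float = 1.0) -> bytes:
    """Send one datagram to a loopback receiver and return what arrives.

    Hypothetical helper for illustration only; not part of the cited work.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the operating system for any free port.
        receiver.bind(("127.0.0.1", 0))
        # UDP gives no delivery guarantee, so never block forever on receive.
        receiver.settimeout(timeout)
        sender.sendto(payload, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(65535)
        except socket.timeout:
            raise TimeoutError("datagram was lost or delayed past the timeout")
        return data
    finally:
        sender.close()
        receiver.close()
```

Even in this tiny case the caller has to decide how long to wait and what a missing datagram means, which is the kind of behaviour the abstract describes as poorly documented.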
Scheduling Threads for Low Space Requirement and Good Locality
- In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1999
Cited by 17 (1 self)
The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth $D$ and serial space requirement $S_1$, we show that the expected space requirement is $S_1 + O(K \cdot p \cdot D)$ on $p$ processors. Here, $K$ is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of Posix standard threads or Pthreads, and evaluated its p...
Rigour is good for you and feasible: reflections on formal treatments of C and UDP sockets
- 2002
Cited by 13 (11 self)
Introduction We summarise two projects that formalised complex real world systems: UDP and its sockets API, and the C programming language. We describe their goals and the techniques used in both. We conclude by discussing how such techniques might be applied to other system software and by describing the benefits this may bring. 2. Specifying UDP and the sockets API We recently formalised a substantial behavioural specification, that for the Internet protocol UDP, as presented to programmers through the sockets interface [12, 10, 1, 5, 11]. Our aim was to make clear the behavioural subtleties of the widely used -- but poorly documented -- sockets API. This clarification of the interface should ease the production of robust software that uses it. The specification was necessarily developed post hoc; we developed it by referring to existing documentation (RFCs and source code, for example), and by experimentally checking existing implementations, using automated tools. We produced th
SilkRoad: A Multithreaded Runtime System with Software Distributed Shared Memory for SMP Clusters
- In IEEE International Conference on Cluster Computing (Cluster2000), 2000
Cited by 9 (2 self)
A multithreaded parallel system with software Distributed Shared Memory (DSM) is an attractive direction in cluster computing. In such systems, distributing workloads and keeping shared memory operations efficient are critical issues. Distributed Cilk (Cilk 5.1) is a multithreaded runtime system for SMP clusters that supports the divide-and-conquer programming paradigm. However, it has no support for user-level shared memory. In this paper, we describe SilkRoad, an extension of distributed Cilk that implements the Lazy Release Consistency (LRC) memory model. In the SilkRoad runtime system, the system control information (such as thread management and load balancing data) is kept consistent by means of the backing store, just as in the original distributed Cilk, while the user's cluster wide
A.: MPI/RT - an emerging standard for high-performance real-time systems
- In: HICSS, 1998
Cited by 8 (1 self)
The last several years saw the emergence of standardization activities for real-time systems, including standardization of operating systems (the series of POSIX standards [1]), of communication for distributed systems (POSIX.21 [15]) and parallel systems (MPI/RT [6]), and of real-time object management (real-time CORBA [14]). This article describes the ongoing work on real-time message passing interface (MPI/RT) standardization. MPI/RT advances the Message Passing Interface Standard (MPI), emphasizing changes that enable and support real-time communication, and is targeted at embedded, fault-tolerant and other real-time systems.
A Parallel, Multithreaded Decision Tree Builder
- 1998
Cited by 8 (0 self)
Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors. This research is supported by ARPA Contract No. DABT63-96-C-0071. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyri...
The SASHA Architecture for Network-Clustered Web Servers
- HASE, 2001
Cited by 3 (0 self)
We present the Scalable, Application-Space, Highly-Available (SASHA) architecture for network-clustered web servers that demonstrates high performance and fault tolerance using application-space software and Commercial-Off-The-Shelf (COTS) hardware and operating systems. Our SASHA architecture consists of an application-space dispatcher, which performs OSI layer 4 switching using layer 2 or layer 3 address translation; application-space agents that execute on server nodes to provide the capability for any server node to operate as the dispatcher; a distributed state-reconstruction algorithm; and a token-based communications protocol that supports self-configuring, detecting and adapting to the addition or removal of servers. The SASHA architecture of clustering offers a flexible and cost-effective alternative to kernel-space or hardware-based network-clustered servers with performance comparable to kernel-space implementations.
Pthreads for Dynamic Parallelism
- 1998
Cited by 2 (1 self)
Expressing a large number of lightweight, parallel threads in a shared address space significantly eases the task of writing a parallel program. Threads can be dynamically created to execute individual parallel tasks; the implementation schedules these threads onto the processors and effectively balances the load. However, unless the threads scheduler is designed carefully, such a parallel program may suffer poor space and time performance. In this paper, we evaluate the performance of a native, lightweight POSIX threads (Pthreads) library on a shared memory machine using a set of parallel benchmarks that dynamically create a large number of threads. By studying the performance of one of the benchmarks, matrix multiply, we show how simple, yet provably good modifications to the library can result in significantly improved space and time performance. With the modified Pthreads library, each of the parallel benchmarks performs as well as its coarse-grained, hand-partitioned counterpart. ...
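The programming style this abstract describes, creating one lightweight thread per parallel task and leaving placement to the scheduler, can be sketched for the matrix multiply benchmark as follows. This is only an illustration: it uses Python's `threading` module rather than the Pthreads library studied in the paper, the function name `threaded_matmul` is an assumption, and CPython's global interpreter lock means it shows the structure of the program, not any speedup.

```python
import threading


def threaded_matmul(a, b):
    """Multiply matrices a (n x m) and b (m x p), one thread per output row.

    Hypothetical sketch of dynamic thread creation; not the paper's benchmark.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]

    def compute_row(i):
        # Each thread writes only its own row, so no locking is needed.
        for j in range(cols):
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))

    # Create the threads dynamically, one per independent task.
    threads = [threading.Thread(target=compute_row, args=(i,)) for i in range(rows)]
    for t in threads:
        t.start()
    # Wait for every task to finish before reading the result.
    for t in threads:
        t.join()
    return result
```

The program never says which processor runs which row. That decision belongs to the thread scheduler, which is why the abstract argues that scheduler design determines the space and time performance of such programs.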

