Results 1 - 10 of 58
Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors
- In PASCO '07: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, 2007
Abstract (Cited by 64, 14 self):
The high availability of multiprocessor clusters for computer science seems very attractive to the engineer because, at a first level, such computers aggregate high performance. Nevertheless, obtaining peak performance on irregular applications such as computer algebra problems remains a challenging problem. The delay to access memory is non-uniform, and the irregularity of the computations requires scheduling algorithms to balance the workload among the processors automatically. This paper focuses on the runtime support implementation needed to exploit the computation resources of a multiprocessor cluster with great efficiency. The originality of our approach relies on the implementation of an efficient work-stealing algorithm for a macro data flow computation, based on a minor extension of the POSIX thread interface.
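The work-stealing idea this abstract refers to can be sketched independently of Kaapi's actual API (which this listing does not show): each worker keeps a private deque of tasks, the owner pushes and pops at one end, and idle workers steal from the other end. The sketch below is a sequential simulation of that policy, not a concurrent implementation.

```python
from collections import deque

class Worker:
    """One worker with a private deque of tasks (here: plain callables)."""
    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)                              # owner pushes at the bottom

    def pop(self):
        return self.tasks.pop() if self.tasks else None      # owner pops LIFO

    def steal(self):
        return self.tasks.popleft() if self.tasks else None  # thief steals FIFO

def run(workers):
    """Sequential simulation of the scheduling loop: each worker runs its
    own tasks and steals from another worker when its deque is empty."""
    done = []
    progress = True
    while progress:
        progress = False
        for w in workers:
            task = w.pop()
            if task is None:                 # deque empty: try to steal
                for victim in workers:
                    if victim is not w:
                        task = victim.steal()
                        if task is not None:
                            break
            if task is not None:
                done.append(task())
                progress = True
    return done

workers = [Worker(), Worker()]
for n in range(4):
    workers[0].push(lambda n=n: n * n)       # all work starts on worker 0
print(sorted(run(workers)))                  # -> [0, 1, 4, 9]
```

In a real runtime the steal path is the contended one, which is why owner-side LIFO access is kept cheap and thieves take the oldest (usually largest) tasks from the opposite end.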
Ariadne: Architecture of a Portable Threads system supporting Mobile Processes
- Software: Practice and Experience, 1996
Abstract (Cited by 62, 22 self):
Threads exhibit a simply expressed and powerful form of concurrency, easily exploitable in applications that run on both uni- and multi-processors, shared- and distributed-memory systems. This paper presents the design and implementation of Ariadne: a layered, C-based software architecture for multi-threaded computing on a variety of platforms. Ariadne is a portable user-space threads system that runs on shared- and distributed-memory multiprocessors. It can be used for parallel and distributed applications. Thread migration is supported at the application level in homogeneous environments (e.g., networks of SPARCs and Sequent Symmetries, Intel hypercubes). Threads may migrate between processes to access remote data, preserving locality of reference for computations with a dynamic data space. Ariadne can be tuned to specific applications through a customization layer. Support is provided for scheduling via a built-in or application-specific scheduler, and for interfacing with any communicat...
Implementing lightweight threads
- In Proceedings of the 1992 USENIX Summer Conference, 1992
Abstract (Cited by 56, 0 self):
We describe an implementation of a threads library that provides extremely lightweight threads within a single UNIX process while allowing fully concurrent access to system resources. The threads are lightweight enough so that they can be created quickly, there can be thousands present, and synchronization can be accomplished rapidly. These goals are achieved by providing user threads which multiplex on a pool of kernel-supported threads of control. This pool is managed by the library and will automatically grow or shrink as required to ensure that the process will make progress while not using an excessive amount of kernel resources. The programmer can also tune the relationship between threads and kernel supported threads of control. This paper focuses on scheduling and synchronizing user threads, and their interaction with UNIX signals in a multiplexing threads library.
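The M:N multiplexing this abstract describes is hard to show portably in a few lines, but its user-level half, many cheap user contexts driven by a library scheduler, can be sketched with Python generators standing in for user thread contexts. This is an illustrative analogy, not the library's design: a real implementation multiplexes saved register contexts onto kernel-supported threads.

```python
def scheduler(user_threads):
    """Round-robin scheduler: resumes each user thread until it yields
    (a cooperative context switch) or finishes. Creating a generator is
    cheap, echoing the goal of supporting thousands of user threads."""
    trace = []
    ready = list(user_threads)
    while ready:
        still_ready = []
        for t in ready:
            try:
                trace.append(next(t))    # run the thread until it yields
                still_ready.append(t)
            except StopIteration:
                pass                     # thread finished; drop it
        ready = still_ready
    return trace

def user_thread(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"              # yield = voluntary context switch

print(scheduler([user_thread("a", 2), user_thread("b", 1)]))
# -> ['a:0', 'b:0', 'a:1']
```

The paper's pool management (growing or shrinking the set of kernel threads so the process keeps making progress) sits one layer below this: each kernel thread would run a loop like `scheduler` over a shared ready queue.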
Space-Efficient Scheduling of Nested Parallelism
- ACM Transactions on Programming Languages and Systems, 1999
Abstract (Cited by 35, 6 self):
This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K·D·p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
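To get a feel for the S1 + O(K·D·p) bound, the additive term can be evaluated for some illustrative values. The numbers below are made up for the example, and the constant hidden in the O(·) is ignored; the point is only how the memory-preemption threshold K trades space for fewer scheduler interventions.

```python
def space_bound(s1, k, d, p):
    """Upper bound S1 + K*D*p on total space (scheduler space included),
    ignoring the constant factor hidden in the O(.) notation."""
    return s1 + k * d * p

MB = 1 << 20
KB = 1 << 10
# Illustrative values: 100 MB serial footprint, depth D = 20, p = 8 processors.
for k in (4 * KB, 64 * KB):   # larger K: fewer preemptions, but more space
    print(k // KB, space_bound(100 * MB, k, 20, 8) // MB)
# -> prints "4 100" then "64 110"
```

With K = 4 KB the additive term is well under 1 MB; raising K to 64 KB adds 10 MB but lets each thread allocate 16x more memory between preemptions, which is exactly the running-time/space trade-off the abstract describes.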
A Machine Independent Interface for Lightweight Threads
- Review of the ACM Special Interest Group in Operating Systems, 1993
Abstract (Cited by 29, 8 self):
Recently, lightweight thread libraries have become a common entity to support concurrent programming on shared-memory multiprocessors. However, the disparity between primitives offered by operating systems creates a challenge for those who wish to create portable lightweight thread packages. What should be the interface between the machine-independent and machine-dependent parts of the thread library? We have implemented a portable lightweight thread library on top of Unix on a KSR-1 supercomputer, BBN Butterfly multiprocessor, SGI multiprocessor, Sequent multiprocessor, and the Sun 3/4 family of uniprocessors. This paper first compares the nature and performance of the OS primitives offered by these machines. We then present a procedure-level abstraction that is efficiently implementable on all the architectures and is a sufficient base upon which a user-level thread package can be built.
Integrating concurrency control and energy management in device drivers
- In Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), 2007
Abstract (Cited by 29, 5 self):
Energy management is a critical concern in wireless sensornets. Despite its importance, sensor network operating systems today provide minimal energy management support, requiring applications to explicitly manage system power states. To address this problem, we present ICEM, a device driver architecture that enables simple, energy-efficient wireless sensornet applications. The key insight behind ICEM is that the most valuable information an application can give the OS for energy management is its concurrency. Using ICEM, a low-rate sensing application requires only a single line of energy management code and has an efficiency within 1.6% of a hand-tuned implementation. ICEM's effectiveness questions the assumption that sensornet applications must be responsible for all power management and that sensornets cannot have a standardized OS with a simple API.
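One way to read "the most valuable information an application can give the OS is its concurrency": if the driver layer sees every outstanding request on a shared resource, it can power the resource down exactly when the last request completes, with no explicit power calls in the application. The reference-counting sketch below illustrates that idea only; the class and method names are hypothetical and are not ICEM's actual interfaces.

```python
class PowerManagedBus:
    """Shared resource that powers on at the first outstanding request and
    off when the last one completes (illustrative sketch, not ICEM's API)."""
    def __init__(self):
        self.outstanding = 0
        self.powered = False
        self.log = []

    def request(self):
        if self.outstanding == 0:        # first pending request: power up
            self.powered = True
            self.log.append("power on")
        self.outstanding += 1

    def complete(self):
        self.outstanding -= 1
        if self.outstanding == 0:        # last request done: power down
            self.powered = False
            self.log.append("power off")

bus = PowerManagedBus()
bus.request()    # driver A starts an I/O
bus.request()    # driver B overlaps: bus stays on, no extra transition
bus.complete()
bus.complete()   # last outstanding request completes: bus powers off
print(bus.log)   # -> ['power on', 'power off']
```

Note that the overlapping request causes no extra power transition: exposing concurrency to the driver layer is what lets it batch the two operations under a single power-on interval.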
Scheduling Threads for Low Space Requirement and Good Locality
- In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1999
Abstract (Cited by 26, 1 self):
The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared-memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S1, we show that the expected space requirement is S1 + O(K·p·D) on p processors. Here, K is a user-adjustable runtime parameter, which provides a trade-off between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and evaluated its p...
Lazy process switching
- In Proceedings of the 8th Workshop on Hot Topics in Operating Systems, Schloß Elmau, 2001
Abstract (Cited by 16, 0 self):
Although IPC has become really fast, it is still too slow on certain processors. Two examples motivating even faster IPC, critical sections in real-time applications and multi-threaded servers, are briefly discussed below. Critical sections in real-time applications suffer from the well-known priority-inversion problem [7]. Multiple solutions have been proposed, e.g., priority inheritance (which is generally not sufficient), priority ceiling [7], and stack-based priority ceiling [2]. All methods need to modify a thread's priority while the thread executes the critical section. In the stack-based priority-ceiling protocol, for example, a thread always has to execute the critical section with the maximum priority of all threads that might eventually execute the critical section, regardless of its original priority. A very natural solution for stack-based priority ceiling in a thread/IPC-based system is to have a dedicated thread per critical
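The priority manipulation described above can be illustrated with a small, single-threaded simulation of the ceiling rule. Real systems would use OS support such as POSIX mutexes with the PTHREAD_PRIO_PROTECT protocol; the classes below are only a sketch of the rule itself, with illustrative names.

```python
class CeilingLock:
    """Stack-based priority ceiling: while a thread holds the lock it runs
    at the ceiling (the maximum priority of any thread that may ever take
    this lock), regardless of its own priority. Illustrative sketch only."""
    def __init__(self, ceiling):
        self.ceiling = ceiling

    def lock(self, thread):
        thread.saved = thread.priority
        # Boost to the ceiling for the duration of the critical section.
        thread.priority = max(thread.priority, self.ceiling)

    def unlock(self, thread):
        thread.priority = thread.saved   # restore the original priority

class SimThread:
    def __init__(self, priority):
        self.priority = priority
        self.saved = priority

lock = CeilingLock(ceiling=10)   # max priority of any possible lock holder
low = SimThread(priority=2)
lock.lock(low)
print(low.priority)              # -> 10 (boosted inside the critical section)
lock.unlock(low)
print(low.priority)              # -> 2
```

Because every holder runs at the ceiling, no higher-priority contender can preempt the critical section and then block on the lock, which is how the protocol avoids the priority inversion cited above.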