Results 1 - 10
of
89
Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
- ACM Transactions on Computer Systems
, 1992
"... Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIX-like processes, such as address spaces and I/O descriptors. The objective of this separation is to make the expr ..."
Abstract
-
Cited by 420 (18 self)
- Add to MetaCart
Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIX-like processes, such as address spaces and I/O descriptors. The objective of this separation is to make the expression and control of parallelism sufficiently cheap that the programmer or compiler can exploit even fine-grained parallelism with acceptable overhead. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory. This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; we thus argue that managing par- allelism at the user level is essential to high-performance parallel computing. Next, we argue that the lack of system integration exhibited by user-level threads is a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; we thus argue that kernel threads or processes, as currently conceived, are the wrong abstraction on which to support user- level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromis- ing the performance and flexibility advantages of user-level management of parallelism.
Extensibility, safety and performance in the SPIN operating system
, 1995
"... This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensi ..."
Abstract
-
Cited by 392 (14 self)
- Add to MetaCart
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export ne-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach o ers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations. 1
The Apertos Reflective Operating System: The Concept and Its Implementation
, 1992
"... This paper proposes a framework for constructing an operating system in an open and mobile computing environment. The framework provides object/metaobject separation and metahierarchy. In the framework, we view object migration as a basic mechanism to accommodate object heterogeneity. The relevance ..."
Abstract
-
Cited by 182 (6 self)
- Add to MetaCart
This paper proposes a framework for constructing an operating system in an open and mobile computing environment. The framework provides object/metaobject separation and metahierarchy. In the framework, we view object migration as a basic mechanism to accommodate object heterogeneity. The relevance of the proposed framework to existing system structures is discussed. We then present a practical implementation of the Apertos operating system in this framework, where reflectors are introduced for metaobject programming and MetaCore for providing common primitives. We present some evaluation results of the Apertos operating system. We also present related work in terms of reflection mechanisms and systems.
A library implementation of POSIX threads under UNIX
- In Proceedings of the USENIX Conference
, 1993
"... Recently, there has been an effort to specify an IEEE standard for portable operating systems for open systems, called POSIX. One part of it, the POSIX 1003.4a threads extension (Pthreads for short) [12], describes the interface for light-weight threads that rely on shared memory and have a smaller ..."
Abstract
-
Cited by 118 (15 self)
- Add to MetaCart
Recently, there has been an effort to specify an IEEE standard for portable operating systems for open systems, called POSIX. One part of it, the POSIX 1003.4a threads extension (Pthreads for short) [12], describes the interface for light-weight threads that rely on shared memory and have a smaller context frame than processes. This paper describes and evaluates the design and implementation of a library of Pthreads calls that is solely based on UNIX. It shows that a library implementation is feasible and can result in good performance. This work can also be used as a comparison of the performance of other implementations, or as a prototyping, testing, and debugging system in the regular UNIX environment. Finally, some problems with the Pthreads standard are identified.
Using Continuations to Implement Thread Management and Communication in Operating Systems
, 1991
"... We have improved the performance of the Mach 3.0 operating system by redesigning its internal thread and interprocess communication facilities to use continuations as the basis for control transfer. Compared to previous versions of Mach 3.0, our new system consumes 85% less space per thread. Cross- ..."
Abstract
-
Cited by 104 (12 self)
- Add to MetaCart
We have improved the performance of the Mach 3.0 operating system by redesigning its internal thread and interprocess communication facilities to use continuations as the basis for control transfer. Compared to previous versions of Mach 3.0, our new system consumes 85% less space per thread. Cross-address space remote procedure calls execute 14% faster. Exception handling runs over 60% faster. In addition to improving system performance, we have used continuations to generalize many control transfer optimizations that are common to operating systems, and have recast those optimizations in terms of a single implementation methodology. This paper describes our experiences with using continuations in the Mach operating system. 1 Introduction We have achieved significant improvements in the performance of the Mach 3.0 operating system kernel [Accetta et al. 86] by redesigning it to use continuations as the basis for control transfers between execution contexts. In our system, a thread b...
Synthesis: An Efficient Implementation of Fundamental Operating System Services
, 1992
"... This dissertation shows that operating systems can provide fundamental services an order of magnitude more efficiently than traditional implementations. It describes the implementation of a new operating system kernel, Synthesis, that achieves this level of performance. The Synthesis kernel combines ..."
Abstract
-
Cited by 79 (1 self)
- Add to MetaCart
This dissertation shows that operating systems can provide fundamental services an order of magnitude more efficiently than traditional implementations. It describes the implementation of a new operating system kernel, Synthesis, that achieves this level of performance. The Synthesis kernel combines several new techniques to provide high performance without sacrificing the expressive power or security of the system. The new ideas include: ffl Run-time code synthesis --- a systematic way of creating executable machine code at runtime to optimize frequently-used kernel routines --- queues, buffers, context switchers, interrupt handlers, and system call dispatchers --- for specific situations, greatly reducing their execution time. ffl Fine-grain scheduling --- a new process-scheduling technique based on the idea of feedback that performs frequent scheduling actions and policy adjustments (at submillisecond intervals) resulting in an adaptive, self-tuning system that can support real-ti...
Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications
- ASPLOS X
, 2002
"... Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often times, these operations are placed suboptimally, either because of conservative assumptions about the program, or merely for code simplicity. W ..."
Abstract
-
Cited by 75 (7 self)
- Add to MetaCart
Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often times, these operations are placed suboptimally, either because of conservative assumptions about the program, or merely for code simplicity. We propose
Non-blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1998
"... Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their uti-lization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two pri ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their uti-lization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two principal strategies for concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) non-blocking (lock-free) algorithms. Preemption-safe locking requires kernel support. Non-blocking algorithms generally require a universal atomic primitive such as compare-and-swap orload-linked/store-conditional, and are widely regarded as inefficient. We evaluate the performance of preemption-safe lock-based and non-blocking implementations of important data structures—queues, stacks, heaps, and counters—including non-blocking and lock-based queue algorithms of our own, in micro-benchmarks and real applications on a 12-processor SGI Challenge multiprocessor. Our results indicate that our non-blocking queue consistently outperforms the best known alternatives, and that data-structure-specific non-blocking algorithms, which exist for queues, stacks, and counters, can work extremely well. Not only do they outperform preemption-safe lock-based algorithms on multiprogrammed machines, they also outperform ordinary locks on dedicated machines. At the same time, since general-purpose non-blocking techniques do not yet appear to be practical, preemption-safe locks remain the preferred alternative for complex data structures: they outperform
Ariadne: Architecture of a Portable Threads system supporting Mobile Processes
- Software-Practice and Experience
, 1996
"... Threads exhibit a simply expressed and powerful form of concurrency, easily exploitable in applications that run on both uni- and multi-processors, shared- and distributed-memory systems. This paper presents the design and implementation of Ariadne: a layered, C-based software architecture for multi ..."
Abstract
-
Cited by 50 (15 self)
- Add to MetaCart
Threads exhibit a simply expressed and powerful form of concurrency, easily exploitable in applications that run on both uni- and multi-processors, shared- and distributed-memory systems. This paper presents the design and implementation of Ariadne: a layered, C-based software architecture for multi-threaded computing on a variety of platforms. Ariadne is a portable user-space threads system that runs on shared- and distributed-memory multiprocessors. It can be used for parallel and distributed applications. Thread-migration is supported at the application level in homogeneous environments (e.g., networks of SPARCs and Sequent Symmetrys, Intel hypercubes). Threads may migrate between processes to access remote data, preserving locality of reference for computations with a dynamic data space. Ariadne can be tuned to specific applications through a customization layer. Support is provided for scheduling via a built-in or application-specific scheduler, and interfacing with any communicat...

