Hierarchical Clustering: A Structure for Scalable Multiprocessor Operating System Design
- JOURNAL OF SUPERCOMPUTING
, 1993
Cited by 57 (18 self)
We introduce the concept of Hierarchical Clustering as a way to structure shared memory multiprocessor operating systems for scalability. As the name implies, the concept is based on clustering and hierarchical system design. Hierarchical Clustering leads to a modular system, composed of easy-to-design and efficient building blocks. The resulting structure is scalable because it i) maximizes locality, which is key to good performance in NUMA systems, and ii) provides for concurrency that increases linearly with the number of processors. At the same time, there is tight coupling within a cluster, so the system performs well for local interactions, which are expected to constitute the common case. A clustered system can easily be adapted to different hardware configurations and architectures by changing the size of the clusters. We show how this structuring technique is applied to the design of a microkernel-based operating system called HURRICANE. This prototype system is the first complete and running implementation of its kind, and demonstrates the feasibility of a hierarchically clustered system. We present performance results based on the prototype, demonstrating the characteristics and behavior of a clustered system. In particular, we show how clustering trades off the efficiencies of tight coupling for the advantages of replication, increased locality, and decreased lock contention. We describe some of the lessons we learned from our implementation efforts and close with a discussion of our future work.
HFS: A performance-oriented flexible file system based on building-block compositions
- ACM Transactions on Computer Systems
, 1997
Cited by 49 (8 self)
The Hurricane File System (HFS) is designed for (potentially large-scale) shared-memory multiprocessors. Its architecture is based on the principle that, in order to maximize performance for applications with diverse requirements, a file system must support a wide variety of file structures, file system policies, and I/O interfaces. Files in HFS are implemented using simple building blocks composed in potentially complex ways. This approach yields great flexibility, allowing an application to customize the structure and policies of a file to exactly meet its requirements. As an extreme example, HFS allows a file’s structure to be optimized for concurrent random-access write-only operations by 10 threads, something no other file system can do. Similarly, the prefetching, locking, and file cache management policies can all be chosen to match an application’s access pattern. In contrast, most parallel file systems support a single file structure and a small set of policies. We have implemented HFS as part of the Hurricane operating system running on the Hector shared-memory multiprocessor. We demonstrate that the flexibility of HFS comes with little processing or I/O overhead. We also show that for a number of file access patterns, HFS is able to deliver to the applications the full I/O bandwidth of the disks on our system.
HFS: A flexible file system for shared-memory multiprocessors
, 1994
Cited by 21 (3 self)
The HURRICANE File System (HFS) is designed for large-scale, shared-memory multiprocessors. Its architecture is based on the principle that a file system must support a wide variety of file structures, file system policies and I/O interfaces to maximize performance for a wide variety of applications. HFS uses a novel, object-oriented building-block approach to provide the flexibility needed to support this variety of file structures, policies, and I/O interfaces. File structures can be defined in HFS that optimize for sequential or random access, read-only, write-only or read/write access, sparse or dense data, large or small file sizes, and different degrees of application concurrency. Policies that can be defined on a per-file or per-open instance basis include locking policies, prefetching policies, compression/decompression policies and file cache management policies. In contrast, most existing file systems have been designed to support a single file structure and a small set of po...
Experiences with Locking in a NUMA Multiprocessor Operating System Kernel
- IN OSDI SYMPOSIUM
, 1994
Cited by 19 (5 self)
We describe the locking architecture of a new operating system, HURRICANE, designed for large scale shared-memory multiprocessors. Many papers already describe kernel locking techniques, and some of the techniques we use have been previously described by others. However, our work is novel in the particular combination of techniques used, as well as several of the individual techniques themselves. Moreover, it is the way the techniques work together that is the source of our performance advantages and scalability. Briefly, we use:
• a hybrid coarse-grain/fine-grain locking strategy that has the low latency and space overhead of a coarse-grain locking strategy while having the high concurrency of a fine-grain locking strategy;
• replication of data structures to increase access bandwidth and improve concurrency;
• a clustered kernel that bounds the number of processors that can compete for a lock so as to reduce second order effects such as memory and interconnect contention;
• ...
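The hybrid coarse-grain/fine-grain idea mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the mechanism, so everything here is an assumption for illustration: the class `HybridLockedTable`, its methods, and the choice of a per-entry reserve flag guarded by one coarse lock are hypothetical and are not taken from the HURRICANE kernel. The point is only that a single lock word can protect many one-bit reservations, so space overhead stays coarse-grain while updates to different entries proceed concurrently.

```python
import threading


class HybridLockedTable:
    """Hypothetical sketch: one coarse lock guards per-entry reserve flags."""

    def __init__(self, size):
        # A single lock (with a wait queue) for the whole table.
        self._coarse = threading.Condition()
        # One flag per entry; this is the only per-entry locking state.
        self._reserved = [False] * size
        self._slots = [0] * size

    def _reserve(self, i):
        # Hold the coarse lock only long enough to claim entry i.
        with self._coarse:
            while self._reserved[i]:
                self._coarse.wait()
            self._reserved[i] = True

    def _unreserve(self, i):
        with self._coarse:
            self._reserved[i] = False
            self._coarse.notify_all()

    def add(self, i, amount):
        self._reserve(i)
        try:
            # The update itself runs outside the coarse lock, so threads
            # working on different entries do not serialize here.
            self._slots[i] = self._slots[i] + amount
        finally:
            self._unreserve(i)

    def snapshot(self):
        # Only meaningful once no updates are in flight.
        with self._coarse:
            return list(self._slots)


def hammer(table, rounds):
    for k in range(rounds):
        table.add(k % 4, 1)


table = HybridLockedTable(4)
workers = [threading.Thread(target=hammer, args=(table, 4000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.snapshot())
```

In this sketch the coarse lock is held for a few instructions per operation, which is what keeps contention on it low; the paper's clustering would additionally bound how many processors can compete for it.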
(De-)Clustering Objects for Multiprocessor System Software
- IN PROC. FOURTH INTL. WORKSHOP ON OBJECT ORIENTATION IN OPERATING SYSTEMS (IWOOS95)
, 1995
Cited by 9 (4 self)
Designing system software for large-scale shared-memory multiprocessors is challenging because of the level of performance demanded by the application workload and the distributed nature of the system. Adopting an object-oriented approach for our system, we have developed a framework for de-clustering objects, where each object may migrate, replicate, and distribute all or part of its data across the system memory using the policies that will best meet the locality requirements for that data. The mechanism for object invocation hides the internal structure of an object, allowing a request to be made directly to the most suitable part of the object on a per-processor basis without any knowledge of how the object is de-clustered. Method invocation is very efficient, both within and across address spaces, involving no remote memory accesses in the common case. We describe the design and implementation of this framework in Tornado, our multiprocessor operating system.
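As a rough illustration of what "a request made directly to the most suitable part of the object on a per-processor basis" could look like, here is a minimal sketch. The class `ClusteredCounter`, the dictionary used as a translation table, and the explicit `cpu` argument are all hypothetical; the abstract does not describe Tornado's actual invocation mechanism, and a real kernel would find the local part without a hash lookup.

```python
class ClusteredCounter:
    """Hypothetical sketch: one logical counter split into per-processor parts."""

    def __init__(self):
        # Assumed translation table: processor id -> that processor's local part.
        self._reps = {}

    def _local_rep(self, cpu):
        # Create the local part lazily, the first time a processor uses the object.
        rep = self._reps.get(cpu)
        if rep is None:
            rep = self._reps[cpu] = {"count": 0}
        return rep

    def increment(self, cpu):
        # Common case: touches only the caller's own part, no shared state.
        self._local_rep(cpu)["count"] += 1

    def total(self):
        # Rare case: gather across all parts.
        return sum(rep["count"] for rep in self._reps.values())


counter = ClusteredCounter()
for cpu in (0, 1, 1, 3):
    counter.increment(cpu)
print(counter.total())  # prints 4
```

Callers see a single `counter` and never learn how many parts exist, which mirrors the abstract's claim that the internal structure is hidden from the invoker.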
Efficient Sleep/Wake-up Protocols for User-Level IPC
- In: International Conference on Parallel Processing
, 1998
Cited by 4 (0 self)
We present a new facility for cross-address space IPC that exploits queues in memory shared between the client and server address space. The facility employs only widely available operating system mechanisms, and is hence easily portable to different commercial operating systems. It incorporates blocking semantics to avoid wasting processor cycles, and still achieves almost twice the throughput of the native kernel-mediated IPC facilities on SGI and IBM uniprocessors. In addition, we demonstrate significantly higher performance gains on an SGI multiprocessor. We argue that co-operating tasks will be better served if the operating system is aware of the co-operation, and propose an interface for a hand-off scheduling mechanism. Finally, we report initial performance results from a Linux implementation of our proposal.
1 Introduction
The performance of Inter-Process Communication (IPC) is crucial to many applications. For this reason, there has been a great deal of research into develop...
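The combination of a shared queue with blocking semantics can be sketched generically as follows. This is not the paper's protocol: threads stand in for processes, a `deque` stands in for the shared-memory queue, and the names `SleepWakeQueue`, `send`, `receive`, and `wakeups_sent` are invented for illustration. The sketch assumes a single consumer. It shows the usual pattern in which the consumer announces that it is about to sleep, re-checks the queue, and only then blocks, so the producer pays for a wake-up only when one is needed.

```python
import threading
from collections import deque


class SleepWakeQueue:
    """Hypothetical single-consumer queue with a sleep/wake-up handshake."""

    def __init__(self):
        self._items = deque()               # stands in for the shared-memory queue
        self._flag_lock = threading.Lock()  # makes test-and-clear of the flag atomic
        self._consumer_asleep = False
        self._wakeup = threading.Semaphore(0)
        self.wakeups_sent = 0               # instrumentation for the example only

    def send(self, msg):
        self._items.append(msg)
        with self._flag_lock:
            must_wake = self._consumer_asleep
            self._consumer_asleep = False
        if must_wake:
            # Pay for a wake-up only if the consumer announced it is sleeping.
            self.wakeups_sent += 1
            self._wakeup.release()

    def receive(self):
        while True:
            try:
                return self._items.popleft()   # fast path: no blocking call
            except IndexError:
                pass
            with self._flag_lock:
                self._consumer_asleep = True   # announce intent to sleep
            try:
                msg = self._items.popleft()    # re-check to close the race
            except IndexError:
                self._wakeup.acquire()         # really block until a send
                continue
            with self._flag_lock:
                still_set = self._consumer_asleep
                self._consumer_asleep = False
            if not still_set:
                # A producer already claimed the flag; absorb its wake-up.
                self._wakeup.acquire()
            return msg


q = SleepWakeQueue()
q.send("ready")
first = q.receive()          # message was already queued, so no wake-up is sent

received = []
consumer = threading.Thread(
    target=lambda: received.extend(q.receive() for _ in range(100)))
consumer.start()
for i in range(100):
    q.send(i)
consumer.join()
print(first, len(received))
```

The re-check after setting the flag is what prevents a lost wake-up when a message arrives between the first empty check and the blocking call.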
A Prototype for Interprocess Communication Support, in Hardware
- In Proc. 9th Euromicro Workshop on Real-Time Systems
, 1997
Cited by 3 (2 self)
In message-based systems, interprocess communication (IPC) is a central facility. If the IPC part of such a system is ineffective, performance and response time suffer. By implementing the IPC facility in hardware, the administration load on the CPU (scheduling, message handling, time-out supervision, etc.) is reduced, which leaves more time for the application and gives more deterministic timing behaviour. This paper describes a hardware implementation of asynchronous IPC in an RTU-based architecture. RTU is a hardware implementation of a real-time kernel for uniprocessor and multiprocessor systems. In addition, our implementation of IPC supports message priority, priority inheritance on message arrival, and task timeout on message send/receive. Increased performance and message flow in a message-intensive system can be realized by implementing IPC functions in an RTU architecture.
1. Introduction
Different methods have been used to improve IPC e.g. using registers [...
An Overview of the NUMAchine Multiprocessor Project
- In Proceedings of the Canadian Supercomputing Conference
, 1994
Cited by 1 (0 self)
The NUMAchine multiprocessor project is a large research effort at the University of Toronto that aims to investigate and develop novel software techniques to support efficient parallel computing. An integral part of the project is to design and build the NUMAchine multiprocessor, a large-scale, cache-coherent, non-uniform memory access (NUMA), shared memory multiprocessor. The NUMAchine has a number of hardware innovations designed to facilitate our software techniques. This integrated hardware-software approach is the major theme of the project. In this talk we will present an overview of the NUMAchine project, and will describe the NUMAchine multiprocessor architecture. In particular, we will describe the unique features of the architecture: network caches, cache-coherence protocol, support for block transfers, monitoring capabilities, and the FPGA-based flexible hardware control. Our software techniques will address fundamental issues in software support for multiprocessors in the ...
(De-)Clustering Objects for Multiprocessor System Software
- In Proc. Fourth Intl. Workshop on Object Orientation in Operating Systems (IWOOS95)
, 1995
Designing system software for large-scale shared-memory multiprocessors is challenging because of the level of performance demanded by the application workload and the distributed nature of the system. Adopting an object-oriented approach for our system, we have developed a framework for de-clustering objects, where each object may migrate, replicate, and distribute all or part of its data across the system memory using the policies that will best meet the locality requirements for that data. The mechanism for object invocation hides the internal structure of an object, allowing a request to be made directly to the most suitable part of the object on a per-processor basis without any knowledge of how the object is de-clustered. Method invocation is very efficient, both within and across address spaces, involving no remote memory accesses in the common case. We describe the design and implementation of this framework in Tornado, our multiprocessor operating system.
Tornado: Maximizing Locality and Concurrency . . .
- IN PROCEEDINGS OF THE 3RD SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI)
, 1999
We describe the design and implementation of Tornado, a new operating system designed from the ground up specifically for today's shared memory multiprocessors. The need for improved locality in the operating system is growing as multiprocessor hardware evolves, increasing the costs for cache misses and sharing, and adding complications due to NUMAness. Tornado is optimized so that locality and independence in application requests for operating system services, whether from multiple sequential applications or a single parallel application, are mapped onto locality and independence in the servicing of these requests in the kernel and system servers. By contrast, previous shared memory multiprocessor operating systems all evolved from designs constructed at a time when sharing costs were low, memory latency was low and uniform, and caches were small; for these systems, concurrency was the main performance concern and locality was not an important issue. Tornado

