Results 1 - 10
of
110
Extensibility, safety and performance in the SPIN operating system
, 1995
"... This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensi ..."
Abstract
-
Cited by 392 (14 self)
- Add to MetaCart
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export ne-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach o ers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations. 1
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract
-
Cited by 134 (0 self)
- Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks
The impact of architectural trends on operating system performance
- In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles
, 1995
"... Computer systems are rapidly changing. Over the next few years, we will see wide-scale deployment of dynamically-scheduled processors that can issue multiple instructions every clock cycle, execute instructions out of order, and overlap computation and cache misses. We also expect clock-rates to inc ..."
Abstract
-
Cited by 130 (7 self)
- Add to MetaCart
Computer systems are rapidly changing. Over the next few years, we will see wide-scale deployment of dynamically-scheduled processors that can issue multiple instructions every clock cycle, execute instructions out of order, and overlap computation and cache misses. We also expect clock-rates to increase, caches to grow, and multiprocessors to replace uniprocessors. Using SimOS, a complete machine simulation environment, this paper explores the impact of the above architectural trends on operating system performance. We present results based on the execution of large and realistic workloads (program development, transaction processing, and engineering compute-server) running on the IRIX 5.3 operating system from Silicon Graphics Inc. Looking at uniprocessor trends, we find that disk I/O is the first-order bottleneck for workloads such as program development and transaction processing. Its importance continues to grow over time. Ignoring I/O, we find that the memory system is the key bottleneck, stalling the CPU for over 50 % of the execution time. Surprisingly, however, our results show that this stall fraction is unlikely to increase on future machines due to increased cache sizes and new latency hiding techniques in processors. We also find that the benefits of these architectural trends spread broadly across a majority of the important services provided by the operating system. We find the situation to be much worse for multiprocessors. Most operating systems services consume 30-70 % more time than their uniprocessor counterparts. A large fraction of the stalls are due to coherence misses caused by communication between processors. Because larger caches do not reduce coherence misses, the performance gap between uniprocessor and multiprocessor performance will increase unless operating system developers focus on kernel restructuring to reduce unnecessary communication. The paper presents a detailed decomposition of execution time (e.g., instruction execution time, memory stall time separately for instructions and data, synchronization time) for important kernel services in the three workloads.
Scout: A Communications-Oriented Operating System
, 1994
"... This white paper describes Scout, a new operating system being designed for systems connected to the National Information Infrastructure (NII). Scout provides a communication-oriented software architecture for building operating system code that is specialized for the different systems that we expec ..."
Abstract
-
Cited by 114 (3 self)
- Add to MetaCart
This white paper describes Scout, a new operating system being designed for systems connected to the National Information Infrastructure (NII). Scout provides a communication-oriented software architecture for building operating system code that is specialized for the different systems that we expect to be available on the NII. It includes an explicit path abstraction that both facilitates effective resource management and permits optimizations of the critical path that I/O data follows. These path-enabled optimizations, along with the application of advanced compiler techniques, result in a system that has both predictable and scalable performance. June 17, 1994 Department of Computer Science The University of Arizona Tucson, AZ 1 Introduction As the National Information Infrastructure (NII) evolves, and digital computer networks become ubiquitous, communication will play an increasingly important role in computer systems. In fact, a recent report on the NII rejects the term "compu...
The performance of µ-kernel-based systems
- IN 16TH ACM SYMPOSIUM ON OPERATING SYSTEM PRINCIPLES (SOSP
, 1997
"... First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux op ..."
Abstract
-
Cited by 97 (14 self)
- Add to MetaCart
First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux operating system to run on top of the L4 µ-kernel and compared the resulting system with both Linux running native, and MkLinux, a Linux version that executes on top of a firstgeneration Mach-derived µ-kernel. For L 4 Linux, the AIM benchmarks report a maximum throughput which is only 5% lower than that of native Linux. The corresponding penalty is 5 times higher for a co-located in-kernel version of MkLinux, and 7 times higher for a userlevel version of MkLinux. These numbers demonstrate both that it is possible to implement a high-performance conventional operating system personality above a µ-kernel, and that the performance of the µ-kernel is crucial to achieve this. Further experiments...
Studies of Windows NT Performance using Dynamic Execution Traces
- IN PROCEEDINGS OF THE SECOND SYMPOSIUM ON OPERATING SYSTEM DESIGN AND IMPLEMENTATION
, 1996
"... We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic e ..."
Abstract
-
Cited by 88 (0 self)
- Add to MetaCart
We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture's PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude
Operating system benchmarking in the wake of lmbench: A case study of the performance of NetBSD on the Intel x86 architecture
- In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems
, 1997
"... The lmbench suite of operating system microbenchmarks provides a set of portable programs for use in cross-platform comparisons. We have augmented the fmbench suite to increase its flexibility and precision, and to improve its methodological and statistical opera-tion. This enables the detailed stud ..."
Abstract
-
Cited by 79 (1 self)
- Add to MetaCart
The lmbench suite of operating system microbenchmarks provides a set of portable programs for use in cross-platform comparisons. We have augmented the fmbench suite to increase its flexibility and precision, and to improve its methodological and statistical opera-tion. This enables the detailed study of interactions between the operating system and the hardware architecture. We describe mod-ifications to bnbench, and then use our new benchmark suite, hbench:OS, to examine how the performance of operating system primitives under NetBSD has scaled with the processor evolution of the Intel x86 architecture. Our analysis shows that off-chip memory system design continues to influence operating system performance in a significant way and that key design decisions (such as suboptimal choices of DRAM and cache technology, and memory-bus and cache coherency protocols) can essentially nul-lify the performance benefits of the aggressive execution core and sophisticated on-chip memory system of a modem processor such as the Intel Pentium Pro. 1
Toward Real Microkernels
, 1996
"... (IPC), and basic scheduling. All servers---even device drivers ---run in user mode and are treated exactly like any other application by the kernel. Since each server has its own address space, all these objects are protected from one another. When the microkernel idea was introduced in the COMMUNI ..."
Abstract
-
Cited by 69 (0 self)
- Add to MetaCart
(IPC), and basic scheduling. All servers---even device drivers ---run in user mode and are treated exactly like any other application by the kernel. Since each server has its own address space, all these objects are protected from one another. When the microkernel idea was introduced in the COMMUNICATIONS OF THE ACM September 1996/Vol. 39, No. 9 71 late 1980s, the software technology advantages seemed obvious: . Different application program interfaces (APIs), different file systems, and perhaps even different basic operating system strategies can coexist in one system. They are implemented as competing or cooperating servers. . The system becomes more flexible and extensible. It can be more easily and effectively adapted to new hardware or new applications. Only selected servers need to be modified or added to the system. In particular, the impact of such modifications can be restricted to a subset of the system, so all other processes ar
Instruction Fetching: Coping with Code Bloat
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmar ..."
Abstract
-
Cited by 62 (9 self)
- Add to MetaCart
Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks. To represent these trends, we have assembled a collection of applications, called the Instruction Benchmark Suite (IBS), that provides a better test of instruction-cache performance. We discuss the rationale behind the design of IBS and characterize its behavior relative to the SPEC benchmark suite. Our analysis is based on trace-driven and trap-driven simulations and takes into full account both the application and operating-system components of the workloads. This paper then reexamines a collection of previously-proposed hardware mechanisms for improving instruction-fetch performance
Optimizing Instruction Cache Performance for Operating System Intensive Workloads
- IEEE Transactions on Computers
, 1995
"... High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though ther ..."
Abstract
-
Cited by 61 (11 self)
- Add to MetaCart
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference...

