Results 1 - 10
of
49
Xen and the art of virtualization
- In SOSP (2003
"... Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or fun ..."
Abstract
-
Cited by 990 (27 self)
- Add to MetaCart
Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation or performance guarantees; most provide only best-effort provisioning, risking denial of service. This paper presents Xen, an x86 virtual machine monitor which allows multiple commodity operating systems to share conventional hardware in a safe and resource managed fashion, but without sacrificing either performance or functionality. This is achieved by providing an idealized virtual machine abstraction to which operating systems such as Linux, BSD and Windows XP, can be ported with minimal effort. Our design is targeted at hosting up to 100 virtual machine instances simultaneously on a modern server. The virtualization approach taken by Xen is extremely efficient: we allow operating systems such as Linux and Windows XP to be hosted simultaneously for a negligible performance overhead — at most a few percent compared with the unvirtualized case. We considerably outperform competing commercial and freely available solutions in a range of microbenchmarks and system-wide tests.
The Flux OSKit: A substrate for kernel and language research
- In Proceedings of the 16th ACM Symposium on Operating Systems Principles
, 1997
"... Implementing new operating systems is tedious, costly, and often impractical except for large projects. The Flux OSKit addresses this problem in a novel way by providing clean, well-documented OS components designed to be reused in a wide variety of other environments, rather than defining a new OS ..."
Abstract
-
Cited by 108 (1 self)
- Add to MetaCart
Implementing new operating systems is tedious, costly, and often impractical except for large projects. The Flux OSKit addresses this problem in a novel way by providing clean, well-documented OS components designed to be reused in a wide variety of other environments, rather than defining a new OS structure. The OSKit uses unconventional techniques to maximize its usefulness, such as intentionally exposing implementation details and platform-specific facilities. Further, the OSKit demonstrates a technique that allows unmodified code from existing mature operating systems to be incorporated quickly and updated regularly, by wrapping it with a small amount of carefully designed “glue” code to isolate its dependencies and export well-defined interfaces. The OSKit uses this technique to incorporate over 230,000 lines of stable code including device drivers, file systems, and network protocols. Our experience demonstrates that this approach to component software structure and reuse has a surprisingly large impact in the OS implementation domain. Four real-world examples show how the OSKit is catalyzing research and development in operating systems and programming languages. 1
The performance of µ-kernel-based systems
- IN 16TH ACM SYMPOSIUM ON OPERATING SYSTEM PRINCIPLES (SOSP
, 1997
"... First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux op ..."
Abstract
-
Cited by 97 (14 self)
- Add to MetaCart
First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux operating system to run on top of the L4 µ-kernel and compared the resulting system with both Linux running native, and MkLinux, a Linux version that executes on top of a firstgeneration Mach-derived µ-kernel. For L 4 Linux, the AIM benchmarks report a maximum throughput which is only 5% lower than that of native Linux. The corresponding penalty is 5 times higher for a co-located in-kernel version of MkLinux, and 7 times higher for a userlevel version of MkLinux. These numbers demonstrate both that it is possible to implement a high-performance conventional operating system personality above a µ-kernel, and that the performance of the µ-kernel is crucial to achieve this. Further experiments...
An Efficient Implementation of Java's Remote Method Invocation
- In Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’99
, 1999
"... Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymorphism, which requires the system to be able to download remote classes into a running application. ..."
Abstract
-
Cited by 80 (25 self)
- Add to MetaCart
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymorphism, which requires the system to be able to download remote classes into a running application. Sun's RMI implementation achieves this kind of flexibility by passing around object type information and processing it at run time, which causes a major run time overhead. Using Sun's JDK 1.1.4 on a Pentium Pro/Myrinet cluster, for example, the latency for a null RMI (without parameters or a return value) is 1228 µsec, which is about a factor of 40 higher than that of a user-level RPC. In this paper, we study an alternative approach for implementing RMI, based on native compilation. This approach allows for better optimization, eliminates the need for processing of type information at run time, and makes a light weight communication protocol possible. We have built a Java system based on a n...
Efficient Java RMI for parallel programming
- ACM Transactions on Programming Languages and Systems (TOPLAS
, 2001
"... Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun’s RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The ..."
Abstract
-
Cited by 44 (12 self)
- Add to MetaCart
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun’s RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols. A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application during runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol.
The Case for Application-Specific Benchmarking
- In Workshop on Hot Topics in Operating Systems
, 1999
"... Most performance analysis today uses either microbenchmarks or standard macrobenchmarks (e.g., SPEC, LADDIS, the Andrew benchmark). However, the results of such benchmarks provide little information to indicate how well a particular system will handle a particular application. Such results are, at b ..."
Abstract
-
Cited by 40 (6 self)
- Add to MetaCart
Most performance analysis today uses either microbenchmarks or standard macrobenchmarks (e.g., SPEC, LADDIS, the Andrew benchmark). However, the results of such benchmarks provide little information to indicate how well a particular system will handle a particular application. Such results are, at best, useless and, at worst, misleading. In this paper, we argue for an application-directed approach to benchmarking, using performance metrics that reflect the expected behavior of a particular application across a range of hardware or software platforms. We present three different approaches to application-specific measurement, one using vectors that characterize both the underlying system and an application, one using trace-driven techniques, and a hybrid approach. We argue that such techniques should become the new standard. 1
Taming Linux
- In Proceedings of the 5th Annual Australasian Conference on Parallel And Real-Time Systems (PART ’98
, 1998
"... This paper describes the overall design, partial implementation and brief performance evaluation of a system in which Linux and its applications run besides real-time applications. The separation of the real-time and time-sharing subsystems is not restricted to the use of the CPU but enforced as wel ..."
Abstract
-
Cited by 33 (9 self)
- Add to MetaCart
This paper describes the overall design, partial implementation and brief performance evaluation of a system in which Linux and its applications run besides real-time applications. The separation of the real-time and time-sharing subsystems is not restricted to the use of the CPU but enforced as well for other resources, namely main memory and caches. This paper details the changes needed for the original Linux to decouple it from real-time processes and analyzes the performance of the resulting system.
TBBT: Scalable and Accurate Trace Replay for File Server Evaluation
- FAST '05
, 2005
"... This paper describes the design, implementation, and evaluation of TBBT, the first comprehensive NFS trace replay tool. Given an NFS trace, TBBT automatically detects and repairs missing operations in the trace, derives a file system image required to successfully replay the trace, ages the file sys ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
This paper describes the design, implementation, and evaluation of TBBT, the first comprehensive NFS trace replay tool. Given an NFS trace, TBBT automatically detects and repairs missing operations in the trace, derives a file system image required to successfully replay the trace, ages the file system image appropriately, initializes the file server under test with that image, and finally drives the file server with a workload that is derived from replaying the trace according to user-specified parameters. TBBT can scale a trace temporally or spatially to meet the need of a simulation run without violating dependencies among file system operations in the trace.
Windows XP kernel crash analysis
- in Proceedings of the 2006 Large Installation System Administration Conference, 2006
, 2006
"... PC users have started viewing crashes as a fact of life rather than a problem. To improve operating system dependability, systems designers and programmers must analyze and understand failure data. In this paper, we analyze Windows XP kernel crash data collected from a population of volunteers who c ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
PC users have started viewing crashes as a fact of life rather than a problem. To improve operating system dependability, systems designers and programmers must analyze and understand failure data. In this paper, we analyze Windows XP kernel crash data collected from a population of volunteers who contribute to the Berkeley Open Infrastructure for Network Computing (BOINC) project. We found that OS crashes are predominantly caused by poorly-written device driver code. Users as well as product developers will benefit from understanding the crash behaviors elaborated in this paper.
Making the “Box” Transparent: System Call Performance as a First-class Result
"... For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-ofband ch ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-ofband channels. We believe that making this information first-class and exposing it to applications via in-band channels on a per-call basis presents opportunities for performance analysis and tuning not available via other mechanisms. Furthermore, our approach provides direct feedback to applications on time spent in the kernel, resource contention, and time spent blocked, allowing them to immediately observe how their actions affect kernel behavior. Not only does this approach provide greater transparency into the workings of the kernel, but it also allows applications to control how performance information is collected, filtered, and correlated with application-level events. To demonstrate the power of this approach, we show that our implementation, DeBox, obtains precise information about OS behavior at low cost, and that it can be used in debugging and tuning application performance on complex workloads. In particular, we focus on the industry-standard SpecWeb99 benchmark running on the Flash Web Server. Using DeBox, we are able to diagnose a series of problematic interactions between the server and the OS. Addressing these issues as well as other optimization opportunities generates an overall factor of four improvement in our SpecWeb99 score, throughput gains on other benchmarks, and latency reductions ranging from a factor of 4 to 47.

