Results 1 - 10
of
238
Xen and the art of virtualization
- In SOSP (2003
"... Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or fun ..."
Abstract
-
Cited by 990 (27 self)
- Add to MetaCart
Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation or performance guarantees; most provide only best-effort provisioning, risking denial of service. This paper presents Xen, an x86 virtual machine monitor which allows multiple commodity operating systems to share conventional hardware in a safe and resource managed fashion, but without sacrificing either performance or functionality. This is achieved by providing an idealized virtual machine abstraction to which operating systems such as Linux, BSD and Windows XP, can be ported with minimal effort. Our design is targeted at hosting up to 100 virtual machine instances simultaneously on a modern server. The virtualization approach taken by Xen is extremely efficient: we allow operating systems such as Linux and Windows XP to be hosted simultaneously for a negligible performance overhead — at most a few percent compared with the unvirtualized case. We considerably outperform competing commercial and freely available solutions in a range of microbenchmarks and system-wide tests.
Small forwarding tables for fast routing lookups
- in ACM Sigcomm
, 1997
"... For some time, the networking community has assumed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must �nd the routing entry with the longest matching pre�x, a task that has been thought to require hardware support at lookup freq ..."
Abstract
-
Cited by 157 (0 self)
- Add to MetaCart
For some time, the networking community has assumed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must �nd the routing entry with the longest matching pre�x, a task that has been thought to require hardware support at lookup frequencies of millions per second. We present a forwarding table data structure designed for quick routing lookups. Forwarding tables are small enough to �t in the cache of a conventional general purpose processor. With the table in cache, a 200 MHz Pentium Pro or a 333 MHz Alpha 21164 can perform a few million lookups per second. This means that it is feasible to do a full routing lookup for each IPpacket at gigabit speeds without special hardware. The forwarding tables are very small, a large routing table with 40,000 routing entries can be compacted to a forwarding table of 150�160 Kbytes. A lookup typically requires less than 100 instructions on an Alpha, using eight memory references accessing a total of 14 bytes. 1
EROS: a fast capability system
- In Symposium on Operating Systems Principles
, 1999
"... EROS is a capability-based operating system for commodity processors which uses a single level storage model. The single level store's persistence is transparent to applications. The performance consequences of support for transparent persistence and capability-based architectures are generally beli ..."
Abstract
-
Cited by 151 (21 self)
- Add to MetaCart
EROS is a capability-based operating system for commodity processors which uses a single level storage model. The single level store's persistence is transparent to applications. The performance consequences of support for transparent persistence and capability-based architectures are generally believed to be negative. Surprisingly, the basic operations of EROS (such as IPC) are generally comparable in cost to similar operations in conventional systems. This is demonstrated with a set of microbenchmark measurements of semantically similar operations in Linux. The EROS system achieves its performance by coupling well-chosen abstract objects with caching techniques for those objects. The objects (processes, nodes, and pages) are well-supported by conventional hardware, reducing the overhead of capabilities. Software-managed caching techniques for these objects reduce the cost of persistence. The resulting performance suggests that composing protected subsystems may be less costly than c...
Performance Characterization of the quad Pentium Pro SMP using OLTP workloads
- In Proc. of the Intl. Symp. on Computer Architecture
, 1998
"... Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these differences is that computers optimized for technical workloads may not provide good performance for commercial applications, ..."
Abstract
-
Cited by 123 (4 self)
- Add to MetaCart
Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these differences is that computers optimized for technical workloads may not provide good performance for commercial applications, and these applications may not fully exploit advances in processor design. To evaluate these issues, we use hardware counters to measure architectural features of a four-processor Pentium Pro-based server running a TPC-C-like workload on an Informix database. We examine the effectiveness of out-of-order execution, branch prediction, speculative execution, superscalar issue and retire, caching and multiprocessor scaling. We find that out-of-order execution, superscalar issue and retire, and branch prediction are not as effective for database workloads as they are for technical workloads, such as SPEC. We find that caches are effective at reducing processor traffic to memory; even larger caches would be helpful to satisfy more data requests. Multiprocessor scaling of this workload is good, but even modest memory system utilization degrades application memory latency, limiting database throughput.
Microkernels meet recursive virtual machines
, 1996
"... This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. We have developed a software-based virtualizable architecture called Fluke that allows recursive vi ..."
Abstract
-
Cited by 113 (25 self)
- Add to MetaCart
This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. We have developed a software-based virtualizable architecture called Fluke that allows recursive virtual machines (virtual machines running on other virtual machines) to be implemented efficiently by a microkernel running on generic hardware. A complete virtual machine interface is provided at each level; efficiency derives from needing to implement only new functionality at each level. This infrastructure allows common OS functionality, such as process management, demand paging, fault tolerance, and debugging support, to be provided by cleanly modularized, independent, stackable virtual machine monitors, implemented as user processes. It can also provide uncommon or unique OS features, including the above features specialized for particular applications ’ needs, virtual machines transparently distributed cross-node, or security monitors that allow arbitrary untrusted binaries to be executed safely. Our prototype implementation of this model indicates that it is practical to modularize operating systems this way. Some types of virtual machine layers impose almost no overhead at all, while others impose some overhead (typically 0–35%), but only on certain classes of applications.
Linux security modules: General security support for the linux kernel
- In Proceedings of the 11th USENIX Security Symposium
, 2002
"... Symposium ..."
Effective Distributed Scheduling of Parallel Workloads
, 1996
"... We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particul ..."
Abstract
-
Cited by 106 (5 self)
- Add to MetaCart
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler p...
Measuring Experimental Error in Microprocessor Simulation
, 2001
"... We measure the ex4fl840E468 error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation -based studies. We describe the methodology that we used to validate a microprocessor simulator against a Compaq DS-10L work ..."
Abstract
-
Cited by 105 (28 self)
- Add to MetaCart
We measure the ex4fl840E468 error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation -based studies. We describe the methodology that we used to validate a microprocessor simulator against a Compaq DS-10L workstation, which contains an Alpha 21264 processor. Our evaluation suite consists of a set of 21 microbenchmarks that stress different aspects of the 21264 microarchitecture. Using the microbenchmark suite as the set of workloads, we describe how we reduced our simulator error to an arithmetic mean of 2%, and include details about the specific aspects of the pipeline that required ex8 a care to reduce the error. We show how these low-level optimizations reduce average error from 40% to less than 20% on macrobenchmarks drawn from the SPEC2000 suite. Finally, we exI837 the degree to which performance optimizations are stable across different simulators, showing that researchers would draw differ...
Dynamic Voltage Scaling on a Low-Power Microprocessor
- Mobile Computing and Networking - Mobicom
, 2001
"... Power consumption is the limiting factor for the functionality of future wearable devices. Since interactive applications like wireless information access generate bursts of activities, it is important to match the performance of the wearable device accordingly. This paper describes a system with a ..."
Abstract
-
Cited by 103 (7 self)
- Add to MetaCart
Power consumption is the limiting factor for the functionality of future wearable devices. Since interactive applications like wireless information access generate bursts of activities, it is important to match the performance of the wearable device accordingly. This paper describes a system with a microprocessor whose speed can be varied (frequency scaling) as well as its input voltage. Voltage scaling is important for reducing power consumption to very low values when operating at low speeds. Measurements show that the energy per instruction at minimal speed (59 MHz) is 1/5 of the energy required at full speed (251 MHz). The frequency and voltage can be scaled dynamically from user space in only 140 ¡ s. This allows power-aware applications to quickly adjust the performance level of the processor whenever the workload changes. 1
The performance of µ-kernel-based systems
- IN 16TH ACM SYMPOSIUM ON OPERATING SYSTEM PRINCIPLES (SOSP
, 1997
"... First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux op ..."
Abstract
-
Cited by 97 (14 self)
- Add to MetaCart
First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux operating system to run on top of the L4 µ-kernel and compared the resulting system with both Linux running native, and MkLinux, a Linux version that executes on top of a firstgeneration Mach-derived µ-kernel. For L 4 Linux, the AIM benchmarks report a maximum throughput which is only 5% lower than that of native Linux. The corresponding penalty is 5 times higher for a co-located in-kernel version of MkLinux, and 7 times higher for a userlevel version of MkLinux. These numbers demonstrate both that it is possible to implement a high-performance conventional operating system personality above a µ-kernel, and that the performance of the µ-kernel is crucial to achieve this. Further experiments...

