Results 1 - 10
of
27
Informed Prefetching and Caching
- In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles
, 1995
"... The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-int ..."
Abstract
-
Cited by 321 (8 self)
- Add to MetaCart
The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to allocate dynamically file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies a cost-benefit analysis to allocate buffers where they will have the greatest impact. We implemented informed prefetching and caching in DEC’s OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications including text search, 3D scientific visualization, relational database queries, speech recognition, and computational chemistry. Informed prefetching reduces the execution time of the first four of these applications by 20 % to 87%. Informed caching reduces the execution time of the fifth application by up to 30%.
RAID: High-Performance, Reliable Secondary Storage
- ACM COMPUTING SURVEYS
, 1994
"... Disk arrays were proposed in the 1980s as a way to use parallelism between multiple disks to improve aggregate I/O performance. Today they appear in the product lines of most major computer manufacturers. This paper gives a comprehensive overview of disk arrays and provides a framework in which to o ..."
Abstract
-
Cited by 282 (6 self)
- Add to MetaCart
Disk arrays were proposed in the 1980s as a way to use parallelism between multiple disks to improve aggregate I/O performance. Today they appear in the product lines of most major computer manufacturers. This paper gives a comprehensive overview of disk arrays and provides a framework in which to organize current and future work. The paper first introduces disk technology and reviews the driving forces that have popularized disk arrays: performance and reliability. It then discusses the two architectural techniques used in disk arrays: striping across multiple disks to improve performance and redundancy to improve reliability. Next, the paper describes seven disk array architectures, called RAID (Redundant Arrays of Inexpensive Disks) levels 0-6 and compares their performance, cost, and reliability. It goes on to discuss advanced research and implementation topics such as refining the basic RAID levels to improve performance and designing algorithms to maintain data consistency. Last, the paper describes six disk array prototypes or products and discusses future opportunities for research. The paper includes an annotated bibliography of disk array-related literature.
Optimal Prefetching via Data Compression
, 1995
"... Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the pr ..."
Abstract
-
Cited by 226 (11 self)
- Add to MetaCart
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
PPFS: A High Performance Portable Parallel File System
- In Proceedings of the 9th ACM International Conference on Supercomputing
, 1995
"... Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input /output performance to overall system performance. Further, experience has shown that the performance of parallel input/output systems i ..."
Abstract
-
Cited by 122 (13 self)
- Add to MetaCart
Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input /output performance to overall system performance. Further, experience has shown that the performance of parallel input/output systems is particularly sensitive to data placement and data management policies, making good choices critical. To explore this vast design space, we have developed a user-level library, the Portable Parallel File System (PPFS), which supports rapid experimentation and exploration. The PPFS includes a rich application interface, allowing the application to advertise access patterns, control caching and prefetching, and even control data placement. PPFS is both extensible and portable, making possible a wide range of experiments on a broad variety of platforms and configurations. Our initial experiments, based on simple benchmarks and two application programs, show that tailoring policies to input/out...
Power Management Techniques for Mobile Communication
, 1998
"... In mobile computing, power is a limited resource. Like other devices, communication devices need to be properly managed to conserve energy. In this paper, we present the design and implementation of an innovative transport level protocol capable of significantly reduc- ing the power usage of the com ..."
Abstract
-
Cited by 118 (2 self)
- Add to MetaCart
In mobile computing, power is a limited resource. Like other devices, communication devices need to be properly managed to conserve energy. In this paper, we present the design and implementation of an innovative transport level protocol capable of significantly reduc- ing the power usage of the communication device. The protocol achieves power savings by selectively choosing short periods of time to suspend communications and shut down the communication device. It manages the important task of queuing data for future delivery during periods of communication suspension, and decides when to restart communication. We also address the tradeoff between reducing power consumption and reducing delay for incoming data.
Application-Driven Power Management for Mobile Communication
, 2000
"... this paper, we present the design and implementation of an innovative transport level protocol capable of significantly reducing the power usage of the communication device. The protocol achieves power savings by selectively choosing short periods of time to suspend communications and shut down th ..."
Abstract
-
Cited by 101 (3 self)
- Add to MetaCart
this paper, we present the design and implementation of an innovative transport level protocol capable of significantly reducing the power usage of the communication device. The protocol achieves power savings by selectively choosing short periods of time to suspend communications and shut down the communication device. It manages the important task of queuing data for future delivery during periods of communication suspension, and decides when to restart communication. We also address the tradeoff between reducing power consumption and reducing delay for incoming data. We present results from experiments using our implementation of the protocol. These experiments measure the energy consumption for three simulated communication patterns as well as three trace-based communication patterns and compare the effects of different suspension strategies. Our results show up to 83% savings in the energy consumed by the communication. For a high-end laptop, this can translate to 6--9% sav
PASSION: Parallel And Scalable Software for Input-Output
, 1994
"... \We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for pa ..."
Abstract
-
Cited by 72 (35 self)
- Add to MetaCart
\We are developing a software system called PASSION: Parallel And Scalable Software for Input-Output which provides software support for high performance parallel I/O. PASSION provides support at the language, compiler, runtime as well as file system level. PASSION provides runtime procedures for parallel access to files (read/write), as well as for out-of-core computations. These routines can either be used together with a compiler to translate out-of-core data parallel programs written in a language like HPF, or used directly by application programmers. A number of optimizations such as Two-Phase Access, Data Sieving, Data Prefetching and Data Reuse have been incorporated in the PASSION Runtime Library for improved performance. PASSION also provides an initial framework for runtime support for out-of-core irregular problems. The goal of the PASSION compiler is to automatically translate out- of-core data parallel programs to node programs for distributed memory machines, with calls to the PASSION Runtime Library. At the language level, PASSION suggests extensions to HPF for out-of-core programs. At the file system level, PASSION provides support for buffering and prefetching data from disks. A portable parallel file system is also being developed as part of this project, which can be used across homogeneous or heterogeneous networks of workstations. PASSION also provides support for integrating data and task parallelism using parallel I/O techniques. We have used PASSION to implement a number of out-of-core applications such as a Laplace's equation solver, 2D FFT, Matrix Multiplication, LU Decomposition, image processing applications as well as unstructured mesh kernels in molecular dynamics and computational fluid dynamics. We are currently in the process of using PASSION in applications in CFD (3D turbulent flows), molecular structure calculations, seismic computations, and earth and space science applications such as Four-Dimensional Data Assimilation. PASSION is currently available on the Intel Paragon, Touchstone Delta and iPSC/860. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System.
Mobile Awareness in a Wide Area Wireless Network of Info-Stations
- In ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom
, 1998
"... Wireless networking is becoming an increasingly important communication means, yet high wide--area wireless data connectivity is difficult to achieve due to technological and physical limitations. To alleviate these problems we experiment with an alternative by placing many high bandwidth local "isl ..."
Abstract
-
Cited by 37 (1 self)
- Add to MetaCart
Wireless networking is becoming an increasingly important communication means, yet high wide--area wireless data connectivity is difficult to achieve due to technological and physical limitations. To alleviate these problems we experiment with an alternative by placing many high bandwidth local "islands" of info--stations dispersed throughout the low bandwidth wide--area wireless network. The location and distribution of the individual stations is crucial for the network's overall effectiveness, as demonstrated by our investigations. The info--stations are deployed in a transparent manner, often not geographically visible to the user. Applications must be designed to be mobile--aware and able to account for changing network characteristics by optimally utilizing the available network resources. We simulate alternative network layouts and determine their effectiveness by experimenting with an incremental map downloading application for road travelers that uses intelligent prefetching to...
Making the “Box” Transparent: System Call Performance as a First-class Result
"... For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-ofband ch ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-ofband channels. We believe that making this information first-class and exposing it to applications via in-band channels on a per-call basis presents opportunities for performance analysis and tuning not available via other mechanisms. Furthermore, our approach provides direct feedback to applications on time spent in the kernel, resource contention, and time spent blocked, allowing them to immediately observe how their actions affect kernel behavior. Not only does this approach provide greater transparency into the workings of the kernel, but it also allows applications to control how performance information is collected, filtered, and correlated with application-level events. To demonstrate the power of this approach, we show that our implementation, DeBox, obtains precise information about OS behavior at low cost, and that it can be used in debugging and tuning application performance on complex workloads. In particular, we focus on the industry-standard SpecWeb99 benchmark running on the Flash Web Server. Using DeBox, we are able to diagnose a series of problematic interactions between the server and the OS. Addressing these issues as well as other optimization opportunities generates an overall factor of four improvement in our SpecWeb99 score, throughput gains on other benchmarks, and latency reductions ranging from a factor of 4 to 47.
Client Cache Management in a Distributed Object Database
, 1995
"... A distributed object database stores objects persistently at servers. Applications run on client machines, fetching objects into a client-side cache of objects. If fetching and cache management are done in terms of objects, rather than fixed-size units such as pages, three problems must be solved: 1 ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
A distributed object database stores objects persistently at servers. Applications run on client machines, fetching objects into a client-side cache of objects. If fetching and cache management are done in terms of objects, rather than fixed-size units such as pages, three problems must be solved: 1. which objects to prefetch, 2. how to translate, or swizzle, inter-object references when they are fetched from server to client, and 3. which objects to displace from the cache. This thesis reports the results of experiments to test various solutions to these problems. The experiments use the runtime system of the Thor distributed object database and benchmarks adapted from the Wisconsin OO7 benchmark suite. The thesis establishes the following points: 1. For plausible workloads involving some amount of object fetching, the prefetching policy is likely to have more impact on performance than swizzling policy or cache management policy. 2. A simple breadth-first prefetcher can have performa...

