Results 1 - 10 of 31
Explicit Control in a Batch-Aware Distributed File System
Cited by 44 (3 self)
We present the design, implementation, and evaluation of the Batch-Aware Distributed File System (BAD-FS), a system designed to orchestrate large, I/O-intensive batch workloads on remote computing clusters distributed across the wide area. BAD-FS consists of two novel components: a storage layer which exposes control of traditionally fixed policies such as caching, consistency, and replication; and a scheduler that exploits this control as needed for different users and workloads. By extracting these controls from the storage layer and placing them in an external scheduler, BAD-FS manages both storage and computation in a coordinated way while gracefully dealing with cache consistency, fault-tolerance, and space management issues in an application-specific manner. Using both microbenchmarks and real applications, we demonstrate the performance benefits of explicit control, delivering excellent end-to-end performance across the wide-area.
A nine year study of file system and storage benchmarking
- ACM Transactions on Storage, 2008
Cited by 20 (4 self)
Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2–5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
Making the “Box” Transparent: System Call Performance as a First-class Result
Cited by 19 (2 self)
For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-of-band channels. We believe that making this information first-class and exposing it to applications via in-band channels on a per-call basis presents opportunities for performance analysis and tuning not available via other mechanisms. Furthermore, our approach provides direct feedback to applications on time spent in the kernel, resource contention, and time spent blocked, allowing them to immediately observe how their actions affect kernel behavior. Not only does this approach provide greater transparency into the workings of the kernel, but it also allows applications to control how performance information is collected, filtered, and correlated with application-level events. To demonstrate the power of this approach, we show that our implementation, DeBox, obtains precise information about OS behavior at low cost, and that it can be used in debugging and tuning application performance on complex workloads. In particular, we focus on the industry-standard SpecWeb99 benchmark running on the Flash Web Server. Using DeBox, we are able to diagnose a series of problematic interactions between the server and the OS. Addressing these issues as well as other optimization opportunities generates an overall factor of four improvement in our SpecWeb99 score, throughput gains on other benchmarks, and latency reductions ranging from a factor of 4 to 47.
Embracing diversity in the Barrelfish manycore operating system
- In Proceedings of the Workshop on Managed Many-Core Systems, 2008
Cited by 16 (7 self)
We discuss diversity and heterogeneity in manycore computer systems, and identify three distinct types of diversity, all of which present challenges to operating system designers and application writers alike. We observe that most current research work has concentrated on a narrow form of one of these (non-uniform memory access) to the exclusion of the others, and show with measurement why this makes sense in the short term. However, we claim that this is not viable in the long term given current processor and system roadmaps, and present our approach to dealing with both heterogeneous hardware within a single system, and the increasing diversity of complete system configurations: we directly represent detailed system information in an expressive “system knowledge base” accessible to applications and OS subsystems alike, and use this to control tasks such as scheduling and resource allocation.
Unveiling the Transport
- In HotNets II, 2003
Cited by 11 (1 self)
Traditional application programming interfaces for transport protocols make a virtue of hiding most internal per-connection state. We argue that this information hiding precludes many potentially useful application features and performance optimizations. We advocate a disciplined, portable, and secure interface that gives applications both "get" and "set" access to transport connection state.
Deploying safe user-level network services with icTCP
- in Proceedings of the 6th Symposium on
, 2004
Non-interference for a practical DIFC-based operating system
- in IEEE Symposium on Security and Privacy (to appear). IEEE Computer Society, 2009
Cited by 8 (1 self)
The Flume system is an implementation of decentralized information flow control (DIFC) at the operating system level. Prior work has shown Flume can be implemented as a practical extension to the Linux operating system, allowing real Web applications to achieve useful security guarantees. However, the question remains if the Flume system is actually secure. This paper compares Flume with other recent DIFC systems like Asbestos, arguing that the latter is inherently susceptible to certain wide-bandwidth covert channels, and proving their absence in Flume by means of a noninterference proof in the Communicating Sequential Processes formalism.
Unifier: Unifying Cache Management and Communication Buffer Management for PVFS over InfiniBand
- In Proceedings of IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 04), 2004
Cited by 6 (1 self)
The advent of networking technologies and high performance transport protocols facilitates the service of storage over networks. However, they pose challenges in integration and interaction among storage server application components and system components. In this paper, we put forward a component, called Unifier, to provide more efficient integration and better interaction among these components. Unifier has three notable features. (1) Unifier integrates cache management and communication buffer management. It offers single-copy data sharing among all components in a server application safely and concurrently. (2) It reduces memory registration and deregistration costs to enable applications to take full advantage of RDMA operations. (3) It provides means to achieve adaptation, application-specific optimization, and better cooperation among different components. This paper presents the design and implementation of Unifier. This component has been deployed and evaluated in a version of PVFS1 implementation over InfiniBand. Experimental results show performance improvements between 30% and 70% over other approaches. Better scalability is also achieved by the PVFS I/O servers.
The Case Against User-level Networking
- In Third Workshop on Novel Uses of System Area Networks (SAN-3) (Held in conjunction with HPCA-10), 2004
Cited by 5 (0 self)
Extensive research on system support for enabling I/O-intensive applications to achieve performance close to the limits imposed by the hardware suggests two main approaches: low-overhead I/O protocols and the flexibility to customize I/O policies to the needs of applications. One way to achieve both is by supporting user-level access to I/O devices, enabling user-level implementations of I/O protocols. User-level networking is an example of this approach, specific to network interface controllers (NICs). In this paper, we argue that the real key to high performance in I/O-intensive applications is user-level file caching and user-level network buffering, both of which can be achieved without user-level access to NICs. Avoiding the need to support user-level networking carries two important benefits for overall system design: First, a NIC exporting a privileged kernel interface is simpler to design and implement than one exporting a user-level interface. Second, the kernel is re-instated as a global system resource controller and arbitrator. We develop an analytical model of network storage applications and use it to show that their performance is not affected by the use of a kernel-based API to NICs.
A buffer cache management scheme exploiting both temporal and spatial localities
- Trans. Storage
Cited by 3 (0 self)
On-disk sequentiality of requested blocks, or their spatial locality, is critical to real disk performance where the throughput of access to sequentially-placed disk blocks can be an order of magnitude higher than that of access to randomly-placed blocks. Unfortunately, spatial locality of cached blocks is largely ignored, and only temporal locality is considered in current system buffer cache management. Thus, disk performance for workloads without dominant sequential accesses can be seriously degraded. To address this problem, we propose a scheme called DULO (DUal LOcality) which exploits both temporal and spatial localities in the buffer cache management. Leveraging the filtering effect of the buffer cache, DULO can influence the I/O request stream by making the requests passed to the disk more sequential, thus significantly increasing the effectiveness of I/O scheduling and prefetching for disk performance improvements. We have implemented a prototype of DULO in Linux 2.6.11. The implementation shows that DULO can significantly increase disk I/O throughput for real-world applications such as a Web server, TPC benchmark, file system benchmark, and scientific programs. It reduces their execution times by as much as 53%.

