Caching for Bursts (C-Burst): Let Hard Disks Sleep Well and Work Energetically
Cited by 2 (1 self)
High energy consumption has become a critical challenge in all kinds of computer systems. Hardware-supported Dynamic Power Management (DPM) provides a mechanism to save disk energy by transitioning an idle disk to a low-power mode. However, the achievable disk energy saving depends mainly on the pattern of I/O requests received at the disk. In particular, for a given number of requests, a bursty disk access pattern serves as a foundation for energy optimization. Aggressive prefetching has been used to increase disk access burstiness and extend disk idle intervals, while caching, a critical component in buffer cache management, has not received specific attention. In the absence of cooperation from caching, the attempt to create bursty disk accesses is often disturbed by improper replacement decisions made by energy-unaware caching policies. In this paper, we present the design of a set of comprehensive energy-aware caching schemes, called C-Burst, and its implementation in Linux kernel 2.6.21. Our caching schemes leverage the ‘filtering’ effect of buffer cache to effectively reshape the disk access stream into a bursty pattern for energy saving. The experiments under various scenarios show that C-Burst schemes can achieve up to 35% disk energy saving with minimal performance loss.
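The claim that, for a fixed number of requests, a bursty pattern leaves more room for DPM can be illustrated with a small sketch. This is not the C-Burst scheme: it assumes a simple fixed-timeout spin-down policy, and the function name `disk_energy` and all power, timeout, and transition figures are hypothetical values chosen only for illustration.

```python
def disk_energy(arrivals, horizon, idle_power=8.0, standby_power=1.0,
                timeout=10.0, transition_energy=50.0):
    """Estimate disk energy (joules) under a fixed-timeout spin-down policy.

    All parameters are hypothetical. Service time is ignored, so the
    estimate covers only idle, standby, and transition energy.
    """
    energy = 0.0
    prev = 0.0
    for t in sorted(arrivals) + [horizon]:
        gap = t - prev
        if gap > timeout:
            # Disk idles until the timeout expires, then sleeps until the
            # next request and pays one spin-down/spin-up cycle.
            energy += timeout * idle_power
            energy += (gap - timeout) * standby_power
            energy += transition_energy
        else:
            # Gap too short to reach the timeout: disk stays idle throughout.
            energy += gap * idle_power
        prev = t
    return energy


# Ten requests spread evenly versus the same ten requests in one burst.
spread = [10.0 * i for i in range(1, 11)]
bursty = [50.0 + 0.1 * i for i in range(10)]
spread_energy = disk_energy(spread, horizon=100.0)
bursty_energy = disk_energy(bursty, horizon=100.0)
```

Under these assumed parameters the evenly spread stream never lets the disk reach its timeout, while the bursty stream creates two long idle intervals in which the disk can sleep.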
IMCa: A High Performance Caching Front-end for GlusterFS on InfiniBand
Cited by 1 (1 self)
With the rapid advances in computing technology, there is an explosion in media that needs to be collected, cataloged, stored and accessed. With the speed of disks not keeping pace with the improvements in processor and network speed, the ability of network file systems to provide data to demanding applications at an appropriate rate is diminishing. In this paper, we propose to enhance the performance of network file systems by providing an InterMediate bank of Cache servers (IMCa) between the client and server. Whenever possible, file system operations from the client are serviced from the cache bank. We evaluate IMCa with a number of different benchmarks. The results of these experiments demonstrate that the intermediate cache architecture can reduce the latency of certain operations by up to 82% over the native implementation and up to 86% compared with the Lustre file system. In addition, we also see an improvement in the performance of data transfer operations in most cases and for most scenarios. Finally, the caching hierarchy helps us to achieve better scalability of file system operations.
Efficiently Identifying Working Sets in Block I/O Streams
Cited by 1 (1 self)
Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access ordering information from an enterprise data set, we identified a set of groupings that consistently appear, indicating that these are working sets that are likely to be accessed together. We present several techniques to obtain groupings along with a discussion of what techniques best apply to particular types of real systems. We intend to use these preliminary results to inform our search for new types of workloads with a goal of identifying properties of easily separable workloads across different systems and dynamically moving groups in these workloads to reduce disk activity in large storage systems.
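One simple way to surface blocks that tend to be accessed together is to count how often two blocks appear within a short time window of each other. The sketch below is only an illustration of that temporal idea and is not one of the paper's techniques; the function name `co_access_pairs`, the window length, and the count threshold are all assumptions.

```python
from collections import Counter


def co_access_pairs(trace, window=1.0, min_count=2):
    """Return block pairs seen within `window` seconds at least `min_count` times.

    `trace` is a list of (timestamp, block_id) tuples sorted by timestamp.
    The window and threshold are hypothetical tuning knobs.
    """
    counts = Counter()
    start = 0
    for i, (t, block) in enumerate(trace):
        # Slide the window start forward past accesses that are too old.
        while trace[start][0] < t - window:
            start += 1
        # Pair the current access with every earlier access in the window.
        for _, other in trace[start:i]:
            if other != block:
                counts[frozenset((other, block))] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


trace = [(0.0, "a"), (0.1, "b"), (5.0, "c"),
         (10.0, "a"), (10.2, "b"), (20.0, "c")]
groups = co_access_pairs(trace)
```

Here blocks `a` and `b` recur close together in time, while `c` is always accessed in isolation, so only the pair `{a, b}` passes the threshold.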
Caching for Bursts (C-Burst): Let Hard Disks Sleep Well
High energy consumption has become a critical challenge in all kinds of computer systems. Hardware-supported Dynamic Power Management (DPM) provides a mechanism to save disk energy by transitioning an idle disk to a low-power mode. However, the achievable disk energy saving depends mainly on the pattern of I/O requests received at the disk. In particular, for a given number of requests, a bursty disk access pattern serves as a foundation for energy optimization. Aggressive prefetching has been used to increase disk access burstiness and extend disk idle intervals, while caching, a critical component in buffer cache management, has not received specific attention. In the absence of cooperation from caching, the attempt to create bursty disk accesses is often disturbed by improper replacement decisions made by energy-unaware caching policies. In this paper, we present the design of a set of comprehensive energy-aware caching schemes, called C-Burst, and its implementation in Linux kernel 2.6.21. Our caching schemes leverage the ‘filtering’ effect of buffer cache to effectively reshape the disk access stream into a bursty pattern for energy saving. The experiments under various scenarios show that C-Burst schemes can achieve up to 35% disk energy saving with minimal performance loss.
PS-BC: Power-saving Considerations in Design of Buffer Caches Serving Heterogeneous Storage Devices
Under a replacement policy, existing operating systems identify and maintain the most frequently used storage data in buffer caches located in main memory, aiming at low-latency I/O data accesses. However, replacement policies can also strongly affect the energy consumption of the connected storage devices, which has not been a consideration in the design and implementation of buffer cache management. In this paper, we present a system framework for energy-aware buffer cache replacement, called PS-BC (power-saving buffer cache). By considering several critical factors affecting system energy consumption, PS-BC can effectively improve system energy efficiency, while it is able to flexibly incorporate conventional performance-oriented buffer cache replacement policies for different performance objectives. Our experimental studies based on a trace-driven simulation show that the PS-BC framework embedded with the CLOCK replacement policy can achieve an energy saving rate of up to 32.5% with a minimal overhead for various workloads.
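For readers unfamiliar with the CLOCK policy mentioned above, a minimal sketch of the conventional (energy-unaware) algorithm follows. It shows only the baseline second-chance mechanism and none of the PS-BC energy considerations; the class name `ClockCache` and the choice to set the reference bit on insertion are assumptions of this sketch.

```python
class ClockCache:
    """Conventional CLOCK (second-chance) replacement over a fixed set of frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []   # each frame is [block_id, reference_bit]
        self.index = {}    # block_id -> frame position
        self.hand = 0      # clock hand: next frame to examine

    def access(self, block):
        """Record an access; return True on a hit, False on a miss."""
        pos = self.index.get(block)
        if pos is not None:
            self.frames[pos][1] = 1          # give the block a second chance
            return True
        if len(self.frames) < self.capacity:
            self.index[block] = len(self.frames)
            self.frames.append([block, 1])   # assumption: bit set on insertion
            return False
        # Sweep: clear reference bits until a frame with bit 0 is found.
        while self.frames[self.hand][1] == 1:
            self.frames[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand][0]
        del self.index[victim]
        self.frames[self.hand] = [block, 1]
        self.index[block] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False


cache = ClockCache(capacity=2)
results = [cache.access(b) for b in ["a", "b", "a", "c", "b", "a"]]
```

An energy-aware framework would additionally weigh which storage device a victim block belongs to; that part is deliberately left out here.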
Quantifying Temporal and Spatial Localities in Storage Workloads and Transformations by Data Path Components
Temporal and spatial localities are basic concepts in operating systems, and storage systems rely on localities to perform well. Surprisingly, it is difficult to quantify the localities present in workloads and how localities are transformed by storage data path components in metrics that can be compared under diverse settings. In this paper, we introduce stack- and block-affinity metrics to quantify temporal and spatial localities. We demonstrate that our metrics (1) behave well under extreme and normal loads, (2) can be used to validate synthetic loads at each stage of storage optimization, (3) can capture localities in ways that are resilient to generations of hardware, and (4) correlate meaningfully with performance. Our experience also unveiled hidden semantics of localities and identified future research directions.
Improving Parallel I/O Performance with Data Layout Awareness
Parallel applications can benefit greatly from massive computational capability, but their performance suffers from the large latency of I/O accesses. Poor I/O performance has been identified as a critical cause of the low sustained performance of parallel computing systems. In this study, we propose a data layout-aware optimization strategy to promote a better integration of the parallel I/O middleware and parallel file systems, two major components of the current parallel I/O systems, and to improve the data access performance. We explore the layout-aware optimization in both independent I/O and collective I/O, two primary forms of I/O in parallel applications. We illustrate that the layout-aware I/O optimization could improve the performance of current parallel I/O strategy effectively. The experimental results verify that the proposed strategy could improve parallel I/O performance by nearly 40% on average. The proposed layout-aware parallel I/O has a promising potential in improving the I/O performance of parallel systems. Keywords: parallel I/O, parallel file systems, parallel I/O middleware, collective I/O, independent I/O, data layout, I/O performance, data access optimization
SOPA: Selecting the Optimal Policy Adaptively for a cache system
With the development of storage techniques, new caching policies are continuously being introduced, which makes it increasingly important for cache systems to select the optimal caching policy dynamically under varying workloads. This paper proposes SOPA, a mechanism to adaptively select the optimal policy and perform policy switching. First, SOPA encapsulates the functions of a caching policy into a module, and enables online policy switching by policy reconstruction. Second, SOPA selects the optimal policy dynamically by collecting and analyzing access traces, and reduces the decision-making cost by asynchronous decision. The simulation evaluation showed that no single caching policy could perform well under all of the different workloads, but SOPA could select the appropriate policy for each workload. The real-system evaluation showed that SOPA could reduce the average response time by up to 20.3% compared with LRU and up to 11.9% compared with ARC.
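The idea of choosing a policy by analyzing a collected access trace can be sketched as replaying the trace through several candidate policies and keeping the one with the most hits. This is a simplified illustration, not SOPA's mechanism: the candidates here are LRU and FIFO only, and the names `lru_hits`, `fifo_hits`, and `select_policy` are hypothetical.

```python
from collections import OrderedDict, deque


def lru_hits(trace, capacity):
    """Count hits when replaying `trace` through an LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return hits


def fifo_hits(trace, capacity):
    """Count hits when replaying `trace` through a FIFO cache."""
    queue, resident, hits = deque(), set(), 0
    for block in trace:
        if block in resident:
            hits += 1
        else:
            if len(queue) >= capacity:
                resident.discard(queue.popleft())  # evict oldest insertion
            queue.append(block)
            resident.add(block)
    return hits


def select_policy(trace, capacity, policies):
    """Replay the trace under each candidate and return (best_name, scores)."""
    scores = {name: simulate(trace, capacity) for name, simulate in policies.items()}
    return max(scores, key=scores.get), scores


trace = [1, 2, 1, 3, 1, 4, 1, 5]
best, scores = select_policy(trace, capacity=2,
                             policies={"LRU": lru_hits, "FIFO": fifo_hits})
```

In a running system this replay would happen off the critical path, which is in the spirit of the asynchronous decision the abstract mentions, though the details there are not given.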
S-FTL: An Efficient Address Translation for Flash Memory by Exploiting Spatial Locality
The solid-state disk (SSD) is becoming increasingly popular, especially among users whose workloads exhibit substantial random access patterns. As SSD competes with the hard disk, whose per-GB cost keeps dramatically falling, the SSD must retain its performance advantages even with low-cost configurations, such as those with a small built-in DRAM cache for the mapping table and using MLC NAND. To this end, we need to make the limited cache space efficiently used to support fast logical-to-physical address translation in the flash translation layer (FTL) with minimal access of flash memory and minimal merge operations. Existing schemes usually require a large number of overhead accesses, either for accessing uncached entries of the mapping table or for the merge operation, and achieve suboptimal performance when the cache space is limited. In this paper we take into account spatial locality exhibited in the workloads to obtain a highly efficient FTL even with a relatively small cache, named S-FTL. Specifically, we identify three access patterns related to spatial locality, including sequential writes, clustered access, and sparse writes. Accordingly we propose designs to take advantage of these patterns to reduce mapping table size, increase hit ratio for in-cache address translation, and minimize expensive writes to flash memory. We have conducted extensive trace-driven simulations to evaluate S-FTL and compared it with other state-of-the-art FTL schemes. Our experiments show that S-FTL can reduce accesses to the flash for address translation by up to 70% and reduce response time of SSD by up to 25%, compared with the state-of-the-art FTL strategies such as FAST and DFTL.
Dissertation Committee: Approved by
The Internet age has exponentially increased the volume of digital media that is being shared and distributed. Broadband Internet has made technologies such as high quality streaming video on demand possible. Large scale supercomputers also consume and create huge quantities of data. This media and data must be stored, cataloged and retrieved with high performance. Researching high-performance storage subsystems to meet the I/O demands of applications in modern scenarios is crucial. Advances in microprocessor technology have given rise to relatively cheap off-the-shelf hardware that may be put together as personal computers as well as servers. The servers may be connected together by networking technology to create farms or clusters of workstations (COW). The evolution of COWs has significantly reduced the cost of ownership of high-performance clusters and has allowed users to build fairly large scale machines based on commodity server hardware. As COWs have evolved, networking technologies like InfiniBand and 10 Gigabit Ethernet have also evolved. These networking technologies not only give lower end-to-end latencies, but also allow for better messaging throughput between the nodes. This allows

