CLOCK-Pro: An Effective Improvement of the CLOCK Replacement
- In Proceedings of USENIX Annual Technical Conference
, 2005
With the ever-growing performance gap between memory systems and disks, and rapidly improving CPU performance, virtual memory (VM) management becomes increasingly important for overall system performance. However, one of its critical components, the page replacement policy, is still dominated by CLOCK, a replacement policy developed almost 40 years ago. While pure LRU has an unaffordable cost in VM, CLOCK simulates the LRU replacement algorithm with a low cost acceptable in VM management. Over the last three decades, the inability of LRU as well as CLOCK to handle weak locality accesses has become increasingly serious, and an effective fix becomes increasingly desirable. Inspired by our I/O buffer cache replacement algorithm, LIRS [13], we propose an improved CLOCK replacement policy, called CLOCK-Pro. By additionally keeping track of a limited number of replaced pages, CLOCK-Pro works in a similar fashion as CLOCK with a VM-affordable cost. Furthermore, it brings all the much-needed performance advantages from LIRS into CLOCK. Measurements from an implementation of CLOCK-Pro in Linux Kernel 2.4.21 show that the execution times of some commonly used programs can be reduced by up to 47%.
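For orientation, the baseline CLOCK policy that this abstract builds on can be sketched as follows. This is a minimal illustration of classic CLOCK (second chance) only, not of CLOCK-Pro, and not code from the paper. The class name `ClockCache` and the choice to set the reference bit when a page is first loaded are assumptions made for the example.

```python
class ClockCache:
    """Classic CLOCK (second-chance) replacement; illustrative sketch only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []      # page held by each frame
        self.ref_bits = []    # one reference bit per frame
        self.index = {}       # page -> frame position
        self.hand = 0         # clock hand: next frame to examine

    def access(self, page):
        """Return True on a hit, False on a miss."""
        if page in self.index:
            # A hit only sets the reference bit; no list reordering as in LRU.
            self.ref_bits[self.index[page]] = 1
            return True
        if len(self.frames) < self.capacity:
            # A free frame is still available.
            self.index[page] = len(self.frames)
            self.frames.append(page)
            self.ref_bits.append(1)   # assumption: new pages start referenced
            return False
        # Sweep the hand, clearing set bits (the "second chance"),
        # until a frame with a clear bit is found.
        while self.ref_bits[self.hand]:
            self.ref_bits[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.frames[self.hand]]
        self.frames[self.hand] = page
        self.ref_bits[self.hand] = 1
        self.index[page] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False
```

A hit costs a single bit write, which is why CLOCK is affordable where exact LRU bookkeeping is not. The sketch keeps no history of replaced pages, which is the information the abstract says CLOCK-Pro adds.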
Microsoft Corporation and
Current businesses rely heavily on efficient access to their databases. Manual tuning of these database systems by performance experts is increasingly infeasible: For small companies, hiring an expert may be too expensive; for large enterprises, even an expert may not fully understand the interaction between a large system and its multiple changing workloads. This trend has led major vendors to offer tools that automatically and dynamically tune a database system. Many database tuning knobs concern the buffer pool for caching data and disk pages. Specifically, these knobs control the buffer allocation and thus the cache miss probability, which has direct impact on performance. Previous methods for automatic buffer tuning are based on simulation, black-box control, gradient descent, and empirical equations. This article presents a new approach, using calculations with an analytically-derived equation that relates miss probability to buffer allocation; this equation fits four buffer replacement policies, as well as twelve datasets from mainframes running commercial databases in large corporations. The equation identifies a buffer-size limit that is useful for buffer tuning and powering down idle buffers. It can also replace simulation in predicting I/O costs. Experiments with PostgreSQL
BP-Wrapper: A System Framework Making Any Replacement Algorithms (Almost) Lock Contention Free
- IEEE International Conference on Data Engineering
In a high-end database system, the execution concurrency level rises continuously in a multiprocessor environment due to the increase in number of concurrent transactions and the introduction of multi-core processors. A new challenge for buffer management to address is to retain its scalability in responding to the highly concurrent data processing demands and environment. The page replacement algorithm, a major component in the buffer management, can seriously degrade the system’s performance if the algorithm is not implemented in a scalable way. A lock-protected data structure is used in most replacement algorithms, where high contention is caused by concurrent accesses. A common practice is to modify a replacement algorithm to reduce the contention, such as to approximate the LRU replacement with the clock algorithm. Unfortunately, this type of modification usually hurts hit ratios of original algorithms. This problem may not exist or can be tolerated in an environment of low concurrency, and thus has not been given enough attention for a long time. In this paper, instead of making a trade-off between the high hit ratio of a replacement algorithm and the low lock contention of its approximation, we propose a system framework, called BP-Wrapper, that (almost) eliminates lock contention for any replacement algorithm without requiring any changes to the algorithm. In BP-Wrapper, we use batching and prefetching techniques to reduce lock contention and to retain high hit ratio. The implementation of BP-Wrapper in PostgreSQL version 8.2 adds only about 300 lines of C code. It can increase the throughput by up to twofold compared with the replacement algorithms with lock contention when running TPC-C-like and TPC-W-like workloads.
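The batching idea mentioned in this abstract can be illustrated roughly as below. This is a sketch under assumptions, not the BP-Wrapper implementation: the class `BatchedLRU`, its `batch_size` parameter, and the use of LRU as the wrapped policy are all hypothetical choices for the example, and the prefetching technique is not modeled at all. Each thread queues its page accesses locally and takes the shared lock once per batch instead of once per access.

```python
import threading
from collections import OrderedDict


class BatchedLRU:
    """LRU bookkeeping updated in per-thread batches (hypothetical sketch)."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size      # assumed tuning parameter
        self.lock = threading.Lock()      # protects the shared LRU order
        self.order = OrderedDict()        # last key = most recently used
        self.local = threading.local()    # per-thread pending accesses
        self.lock_acquisitions = 0        # counter for illustration only

    def record_access(self, page):
        """Queue an access locally; take the lock only when the batch is full."""
        pending = getattr(self.local, "pending", None)
        if pending is None:
            pending = self.local.pending = []
        pending.append(page)
        if len(pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Apply this thread's queued accesses under a single lock acquisition."""
        pending = getattr(self.local, "pending", [])
        if not pending:
            return
        with self.lock:
            self.lock_acquisitions += 1
            for page in pending:
                self.order[page] = None
                self.order.move_to_end(page)
        pending.clear()
```

The wrapped policy sees the same accesses in the same per-thread order, only later, which is how such a scheme can keep the policy itself unchanged while reducing how often the lock is taken.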
Management
Traditionally, operating systems use a coarse approximation of memory accesses to implement memory management algorithms by monitoring page faults or scanning page table entries. With finer-grained memory access information, however, the operating system can manage memory much more effectively. Previous work has proposed the use of a software mechanism based on virtual page protection and soft faults to track page accesses at finer granularity. In this paper, we show that while this approach is effective for some applications, for many others it results in an unacceptably high overhead. We propose simple Page Access Tracking Hardware (PATH) to provide accurate page access information to the operating system. The suggested hardware support is generic and can be used by various memory management algorithms. In this paper, we show how the information generated by PATH can be used to implement (i) adaptive page replacement policies, (ii) smart process memory allocation to improve performance or to provide isolation and better process prioritization, and (iii) effective prefetching of virtual memory pages when applications have non-trivial memory access patterns. Our simulation results show that these algorithms can dramatically improve performance (up to 500%) with PATH-provided information, especially when the system is under memory pressure. We show that the software overhead of processing PATH information is less than 6% across the applications we examined (less than 3% in all but two applications), which is at least an order of magnitude less than the overhead of existing software approaches.
SOPA: Selecting the Optimal Policy Adaptively for a cache system
With the development of storage technique, new caching policies are continuously being introduced, which makes it increasingly important for cache systems to select the optimal caching policy dynamically under varying workloads. This paper proposes SOPA, a mechanism to adaptively select the optimal policy and perform policy switching. First, SOPA encapsulates the functions of a caching policy into a module, and enables online policy switching by policy reconstruction. Second, SOPA selects the optimal policy dynamically by collecting and analyzing access traces, and reduces the decision-making cost by asynchronous decision. The simulation evaluation showed that no single caching policy could perform well under all of the different workloads, but SOPA could select the appropriate policy for each workload. The real-system evaluation showed that SOPA could reduce the average response time by up to 20.3% compared with LRU and up to 11.9% compared with ARC.
CLIC: CLient-Informed Caching for Storage Servers
- Proc. FAST
, 2009
Traditional caching policies are known to perform poorly for storage server caches. One promising approach to solving this problem is to use hints from the storage clients to manage the storage server cache. Previous hinting approaches are ad hoc, in that a predefined reaction to specific types of hints is hard-coded into the caching policy. With ad hoc approaches, it is difficult to ensure that the best hints are being used, and it is difficult to accommodate multiple types of hints and multiple client applications. In this paper, we propose CLient-Informed Caching (CLIC), a generic hint-based policy for managing storage server caches. CLIC automatically interprets hints generated by storage clients and translates them into a server caching policy. It does this without explicit knowledge of the application-specific hint semantics. We demonstrate using trace-based simulation of database workloads that CLIC outperforms hint-oblivious and state-of-the-art hint-aware caching policies. We also demonstrate that the space required to track and interpret hints is small.
Professor:
, 2006
I hereby declare that this work was produced autonomously (except the chapter “State of the Art”, which was inspired by a whole lot of papers) and that I used only the specified resources.
Improving Cache Performance using Victim Tag Stores
, 2011
With increasing pressure on memory bandwidth, there have been a number of proposals that improve the cache replacement policy. These mechanisms monitor the cache blocks while they are in the cache and evict blocks that are deemed to have low temporal locality. However, a majority of these mechanisms are agnostic to the temporal locality of a missed block and follow a single insertion policy for all incoming blocks. There is comparatively very little work on mechanisms to distinguish between missed blocks based on their temporal reuse behavior. Prior work has shown that distinguishing missed blocks based on their temporal locality and choosing the insertion policy on a per-block basis can significantly improve performance. To this end, we propose a new, simple hardware mechanism that predicts the temporal locality of a missed block before inserting it into the cache. The key insight behind the prediction scheme is that if a block with good temporal locality gets prematurely evicted from the cache, it will be accessed soon after eviction. To implement this prediction scheme, our mechanism augments the conventional cache with a structure, victim tag store, that keeps track of addresses of blocks evicted from the cache. We provide a practical, low-complexity hardware implementation of our mechanism using Bloom filters. We qualitatively and quantitatively compare our mechanism to five different cache management mechanisms and show that it provides significant performance improvements.
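The prediction scheme described in this abstract can be approximated in software as follows. This is a rough sketch under assumptions, not the paper's hardware design: the names `BloomFilter` and `VTSCache`, the filter size, the hash construction, and the specific insertion positions (most-protected end for predicted-reuse blocks, eviction end otherwise) are all hypothetical. The sketch also never clears the filter, which a practical design would need to handle.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class VTSCache:
    """Cache whose insertion position depends on a victim tag store lookup."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # first = next victim, last = most protected
        self.victims = BloomFilter()  # addresses of evicted blocks

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return True
        # A miss on a recently evicted address suggests premature eviction.
        predicted_reuse = addr in self.victims
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            self.victims.add(victim)
        self.blocks[addr] = None      # new key lands at the protected end
        if not predicted_reuse:
            # Assumed policy: unproven blocks go to the eviction end.
            self.blocks.move_to_end(addr, last=False)
        return False
```

With this arrangement a block seen for the first time is the next eviction candidate, while a block that returns shortly after being evicted is inserted where it survives longer.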
Decoupled Dynamic Cache Segmentation
The least recently used (LRU) replacement policy performs poorly in the last-level cache (LLC) because temporal locality of memory accesses is filtered by first and second level caches. We propose a cache segmentation technique that dynamically adapts to cache access patterns by predicting the best number of not-yet-referenced and already-referenced blocks in the cache. This technique is independent from the LRU policy so it can work with less expensive replacement policies. It can automatically detect when to bypass blocks to the CPU with no extra overhead. In a 2MB LLC single-core processor with a memory-intensive subset of SPEC CPU 2006 benchmarks, it outperforms LRU replacement on average by 5.2% with not-recently-used (NRU) replacement and on average by 2.2% with random replacement. The technique also complements existing shared cache partitioning techniques. Our evaluation with 10 multi-programmed workloads shows that this technique improves performance of an 8MB LLC four-core system on average by 12%, with a random replacement policy requiring only half the space of the LRU policy.
SOPA: Selecting the Optimal Caching Policy Adaptively
With the development of storage technology and applications, new caching policies are continuously being introduced. It becomes increasingly important for storage systems to be able to select the matched caching policy dynamically under varying workloads. This article proposes SOPA, a cache framework to adaptively select the matched policy and perform policy switches in storage systems. SOPA encapsulates the functions of a caching policy into a module, and enables online policy switching by policy reconstruction. SOPA then selects the policy matched with the workload dynamically by collecting and analyzing access traces. To reduce the decision-making cost, SOPA proposes an asynchronous decision making process. The simulation experiments show that no single caching policy performed well under all of the different workloads. With SOPA, a storage system could select the appropriate policy for different workloads. The real-system evaluation results show that SOPA reduced the average response time by up to 20.3% and 11.9% compared with LRU and ARC, respectively.

