Results 1 - 10 of 16
Extending SSD lifetimes with disk-based write caches
- In Proceedings of FAST '10 (San Jose, CA, February 2010)
Cited by 43 (3 self)
We present Griffin, a hybrid storage device that uses a hard disk drive (HDD) as a write cache for a Solid State Device (SSD). Griffin is motivated by two observations: First, HDDs can match the sequential write bandwidth of mid-range SSDs. Second, both server and desktop workloads contain a significant fraction of block overwrites. By maintaining a log-structured HDD cache and migrating cached data periodically, Griffin reduces writes to the SSD while retaining its excellent performance. We evaluate Griffin using a variety of I/O traces from Windows systems and show that it extends SSD lifetime by a factor of two and reduces average I/O latency by 56%.
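The abstract's key mechanism can be illustrated with a rough sketch of overwrite coalescing (all names and numbers below are invented for illustration, not Griffin's actual implementation): a log-structured HDD cache absorbs repeated writes to the same block, so only the latest version of each block reaches the SSD at migration time.

```python
# Hypothetical sketch of overwrite coalescing in a log-structured write
# cache: repeated writes to a block are absorbed by the HDD log, and only
# one SSD write per unique block happens at migration.

class WriteCache:
    def __init__(self):
        self.log = {}          # block -> latest data (overwrites coalesce here)
        self.ssd_writes = 0    # blocks actually written to the SSD

    def write(self, block, data):
        self.log[block] = data            # absorbed by the HDD log

    def migrate(self):
        self.ssd_writes += len(self.log)  # one SSD write per unique block
        self.log.clear()

cache = WriteCache()
for block in [1, 2, 1, 3, 1, 2]:  # 6 writes, but only 3 unique blocks
    cache.write(block, b"x")
cache.migrate()
print(cache.ssd_writes)  # 3 instead of 6
```

The workloads the paper targets contain many block overwrites, which is exactly the case where this coalescing saves SSD write cycles.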
Hot Data Identification for Flash-based Storage Systems Using Multiple Bloom Filters
Cited by 8 (2 self)
Abstract—Hot data identification can be applied to a variety of fields. In flash memory in particular, it has a critical impact on both performance (due to garbage collection) and lifespan (due to wear leveling). Although hot data identification is an issue of paramount importance in flash memory, it has received little investigation. Moreover, all existing schemes focus almost exclusively on frequency. However, recency must be weighed equally with frequency for effective hot data identification. In this paper, we propose a novel hot data identification scheme that adopts multiple Bloom filters to efficiently capture finer-grained recency as well as frequency. In addition to this scheme, we propose a Window-based Direct Address Counting (WDAC) algorithm that approximates ideal hot data identification as our baseline. Unlike the existing baseline algorithm, which cannot appropriately capture recency information due to its exponential batch decay, our WDAC algorithm uses a sliding window to capture very fine-grained recency information. Our experimental evaluation with diverse realistic workloads, including real SSD traces, demonstrates that our multiple-Bloom-filter-based scheme outperforms the state-of-the-art scheme: it not only consumes 50% less memory and reduces computational overhead by up to 58%, but also improves performance by up to 65%.
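A minimal sketch of the multiple-Bloom-filter idea can make the recency mechanism concrete. Everything below is an assumption for illustration (filter count, bit-array size, hash construction, and the set-based "bit array" are invented, not the paper's implementation): each access records a block in the current filter, a periodic decay erases the oldest filter, and hotness is the number of filters that still contain the block, so it reflects both frequency and recency.

```python
import hashlib

# Illustrative parameters (assumptions, not the paper's choices).
NUM_FILTERS, BITS, HASHES = 4, 1024, 2

def positions(block):
    """Derive HASHES bit positions for a block (simplified hashing)."""
    h = hashlib.sha256(str(block).encode()).digest()
    return [int.from_bytes(h[4*i:4*i+4], "big") % BITS for i in range(HASHES)]

class HotDataIdentifier:
    def __init__(self):
        # Each "bit array" is modeled as a set of set-bit positions.
        self.filters = [set() for _ in range(NUM_FILTERS)]
        self.current = 0

    def record(self, block):
        self.filters[self.current].update(positions(block))

    def decay(self):
        # Advance to, and erase, the oldest filter: old accesses age out.
        self.current = (self.current + 1) % NUM_FILTERS
        self.filters[self.current].clear()

    def hotness(self, block):
        pos = positions(block)
        return sum(all(p in f for p in pos) for f in self.filters)

hdi = HotDataIdentifier()
hdi.record(100); hdi.record(200)   # period 1: both blocks accessed
hdi.decay()
hdi.record(100)                    # period 2: only block 100
hdi.decay()
hdi.record(100)                    # period 3: only block 100
hdi.decay()
print(hdi.hotness(100))  # 3: frequent and recent
```

Block 100 appears in three filters while block 200 survives only in the oldest one, so repeated recent accesses dominate the score, which is the finer-grained recency capture the abstract claims over frequency-only schemes.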
hatS: A Heterogeneity-Aware Tiered Storage for Hadoop
Cited by 5 (2 self)
Abstract—Hadoop has become the de facto large-scale data processing framework for modern analytics applications. A major obstacle to sustaining high performance and scalability in Hadoop is managing data growth while meeting ever-higher I/O demand. To this end, a promising trend in storage systems is to utilize hybrid and heterogeneous devices, such as Solid State Disks (SSDs), ramdisks, and Network Attached Storage (NAS), which can help achieve very high I/O rates at acceptable cost. However, the Hadoop Distributed File System (HDFS) is unable to exploit such heterogeneous storage: HDFS assumes that the underlying devices are homogeneous storage blocks, disregarding their individual I/O characteristics, which leads to performance degradation. In this paper, we present hatS, a Heterogeneity-Aware Tiered Storage, a novel redesign of HDFS into a multi-tiered storage system that seamlessly integrates heterogeneous storage technologies into the Hadoop ecosystem. hatS also proposes data placement and retrieval policies that improve the utilization of the storage devices based on characteristics such as I/O throughput and capacity. We evaluate hatS using an actual implementation on a medium-sized cluster consisting of HDDs and two types of SSDs (SATA SSD and PCIe SSD). Experiments show that hatS achieves 32.6% higher read bandwidth, on average, than HDFS for the test Hadoop jobs (such as Grep and TestDFSIO) by directing 64% of the I/O accesses to the SSD tiers. We also evaluate our approach with trace-driven simulations using synthetic Facebook workloads, and show that compared to the standard setup, hatS improves the average I/O rate by 36%, which results in a 26% improvement in job completion time. Keywords: tiered storage; Hadoop Distributed File System (HDFS); data placement and retrieval policy.
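The placement-and-retrieval idea can be sketched in a few lines. This is a hedged illustration under assumptions (the tier names, throughput figures, and replica policy are invented, not hatS's actual policies): replicas are spread across tiers ranked by measured throughput, and reads are served from the fastest tier that holds the block.

```python
# Hypothetical tier table, ordered fastest-first by assumed throughput (MB/s).
TIERS = [
    ("pcie_ssd", 1500),
    ("sata_ssd", 500),
    ("hdd", 120),
]

placement = {}  # block -> set of tier names holding a replica

def place(block, replicas=2):
    # Put replicas on the fastest tiers first (capacity checks omitted).
    placement[block] = {name for name, _ in TIERS[:replicas]}

def read_tier(block):
    # Retrieval policy: serve from the fastest tier holding the block.
    for name, _ in TIERS:
        if name in placement.get(block, ()):
            return name
    return None

place("blk-7", replicas=2)
print(read_tier("blk-7"))  # pcie_ssd
```

A tier-aware policy like this is what lets most reads land on the SSD tiers, which is the effect behind the abstract's reported read-bandwidth gains.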
Thermal modeling of hybrid storage clusters
- Journal of Signal Processing Systems
Cited by 2 (0 self)
An Active and Hybrid Storage System for Data-intensive Applications
- 2011
Cited by 1 (0 self)
Since large-scale and data-intensive applications have been widely deployed, there is a growing demand for high-performance storage systems to support them. Compared with traditional storage systems, next-generation systems will embrace dedicated processors to reduce the computational load of host machines and will combine different storage devices in hybrid configurations. We present a new active storage architecture that leverages the computational power of the dedicated processor, and show how it utilizes the multi-core processor and offloads computation from the host machine. We then address the challenge of making the active storage node cooperate with the other nodes in a cluster environment by designing a pipeline-parallel processing pattern, and report the effectiveness of this mechanism. To evaluate the design, an open-source bioinformatics application is extended based on the pipeline-parallel mechanism. We also explore the hybrid configuration of storage devices within the active storage. The advent of the flash-memory-based solid state disk has played a critical role in revolutionizing the storage world. However, instead of simply replacing the traditional magnetic hard disk with the solid state disk, researchers believe that finding a complementary approach to incorporate both of ...
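The pipeline-parallel pattern the abstract mentions can be sketched as two overlapping stages connected by a queue. The stage functions below are invented for illustration (the paper's actual stages run a bioinformatics workload): the "storage node" stage filters records while the "host" stage processes earlier results, so the two run concurrently instead of serially.

```python
import queue
import threading

work = queue.Queue()   # hand-off between the two pipeline stages
results = []

def storage_stage(records):
    # Offloaded work on the active storage node: pre-filter records.
    for r in records:
        if r % 2 == 0:
            work.put(r)
    work.put(None)     # end-of-stream marker

def host_stage():
    # Downstream computation on the host, overlapped with filtering.
    while (r := work.get()) is not None:
        results.append(r * 10)

t1 = threading.Thread(target=storage_stage, args=(range(6),))
t2 = threading.Thread(target=host_stage)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 20, 40]
```

Because the producer and consumer run in separate threads with a FIFO queue between them, filtering and computation overlap in time, which is the point of the pipeline-parallel design.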
HyCache: A Hybrid User-Level File System with SSD Caching
Abstract—For many decades, a huge performance gap has existed between volatile memory and mechanical hard disk drives. This will be a critical issue for extreme-scale computing systems. Although non-volatile memory has been around since the 1990s, mechanical hard disk drives are still dominant due to their large capacity and relatively low cost. We have designed and implemented HyCache, a user-level file system that leverages both the cost-effectiveness of mechanical hard disk drives and the performance of solid-state drives. We adopted FUSE to deliver a user-level POSIX-compliant file system that does not require any OS or application modifications. HyCache allows multiple devices to be used together to achieve better performance while keeping costs low. An extensive evaluation is performed using synthetic benchmarks as well as real-world applications, which shows that HyCache can achieve up to 7X higher throughput and 76X higher IOPS than traditional Ext4 file systems on mechanical hard disk drives.
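The hybrid-device idea admits a minimal sketch. The policy and capacity below are assumptions, not HyCache's actual algorithm: recently used files live on the SSD up to a capacity limit, and the least-recently-used file is demoted to the HDD when the SSD fills.

```python
from collections import OrderedDict

SSD_CAPACITY = 3       # files; tiny value for illustration only

ssd = OrderedDict()    # file -> True, maintained in LRU order
hdd = set()            # files demoted to the slower, larger device

def access(path):
    if path in ssd:
        ssd.move_to_end(path)   # refresh LRU position on a hit
        return "hit"
    hdd.discard(path)           # promote from HDD if it was there
    ssd[path] = True
    if len(ssd) > SSD_CAPACITY:
        victim, _ = ssd.popitem(last=False)  # evict the LRU file
        hdd.add(victim)                      # demote it to the HDD
    return "miss"

for f in ["a", "b", "c", "a", "d"]:
    access(f)
print(sorted(ssd), sorted(hdd))  # ['a', 'c', 'd'] ['b']
```

In the real system this decision layer sits behind a FUSE mount, so applications see one POSIX file system while hot files are transparently served from the faster device.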
Journal of Systems Architecture
Samsung Electronics, Korea
The multi-level cell (MLC) NAND flash memory technology enables multiple bits of information to be stored on a single cell, thus making it possible to increase the density of the memory without increasing the die size. For most MLC flash memories, each cell can be programmed as a single-level cell or a multi-level cell at runtime. Therefore, it has the potential to achieve both the high performance of SLC flash memory and the high capacity of MLC flash memory. In this paper, we present a flexible flash file system, called FlexFS, which takes advantage of the dynamic reconfiguration facility of MLC flash memory. FlexFS divides the flash memory medium into SLC and MLC regions, and dynamically changes the size of each region to meet the changing requirements of applications. We exploit patterns of storage usage to minimize the overhead of reorganizing the two regions. We also propose a novel wear management scheme that mitigates the effect of the extra writes required by FlexFS on the lifetime of flash memory. Our implementation of FlexFS in the Linux 2.6 kernel shows that it can achieve performance comparable to SLC flash memory while retaining the capacity of MLC flash memory for both simulated and real mobile workloads.
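The dynamic-resizing idea can be sketched with an invented policy (the thresholds, step size, and block counts below are assumptions, not FlexFS's actual algorithm): grow the fast SLC region when recent write traffic is high, and shrink it back toward capacity-dense MLC when traffic is low.

```python
TOTAL_BLOCKS = 100   # assumed size of the flash medium, in blocks

def resize_regions(slc_blocks, recent_write_ratio):
    """Return a new SLC block count given the fraction of recent I/O
    that was writes (0.0 - 1.0). Everything not SLC is MLC."""
    if recent_write_ratio > 0.6 and slc_blocks < TOTAL_BLOCKS // 2:
        return slc_blocks + 10   # write-heavy: favor SLC performance
    if recent_write_ratio < 0.2 and slc_blocks > 10:
        return slc_blocks - 10   # idle: reclaim capacity as MLC
    return slc_blocks            # steady state: leave regions alone

print(resize_regions(20, 0.8))  # 30: SLC region grows
print(resize_regions(20, 0.1))  # 10: SLC region shrinks
print(resize_regions(20, 0.4))  # 20: unchanged
```

The cap at half the device and the lower bound of ten blocks stand in for the capacity and wear constraints the paper manages with its usage-pattern exploitation and wear management scheme.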
PRISM: Zooming in Persistent RAM Storage Behavior
Abstract—It has been foreseen that some of the roles assumed by conventional rotating hard disk drives (HDDs) will migrate to solid-state drives (SSDs) and emerging persistent RAM storage. New persistent RAM storage has critical advantages over HDDs and even SSDs in terms of performance and power. Persistent RAM technologies are flexible enough to be used for both storage and main memory; in future platforms, this flexibility will allow tighter integration of a system's memory and storage hierarchy. On the other hand, designers face new technical issues that must be addressed to fully exploit the benefits of persistent RAM technologies and hide their downsides. In this paper, we introduce PRISM (PeRsIstent RAM Storage Monitor), our novel infrastructure for exploring various design trade-offs of persistent RAM storage. PRISM allows designers to examine a persistent RAM storage's low-level behavior and evaluate its various architectural organizations while running realistic workloads, as well as the storage activities of a contemporary off-the-shelf OS. PRISM builds on kernel source code level instrumentation and the standard Linux device driver mechanism to generate persistent RAM storage traces. Moreover, PRISM includes a storage architecture simulator to faithfully model major persistent RAM storage hardware components. To illustrate how PRISM can help the user, we present a case study involving an OLTP (on-line transaction processing) workload. PRISM provides detailed performance analysis results while incurring acceptable overheads. Based on our experience, we believe PRISM is a versatile tool for exploring persistent RAM storage design choices ranging from OS-level resource management policy down to chip-level storage organization.