Efficiently Identifying Working Sets in Block I/O Streams
Cited by 1 (1 self)
Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access ordering information from an enterprise data set, we identified a set of groupings that consistently appear, indicating that these are working sets that are likely to be accessed together. We present several techniques to obtain groupings along with a discussion of what techniques best apply to particular types of real systems. We intend to use these preliminary results to inform our search for new types of workloads with a goal of identifying properties of easily separable workloads across different systems and dynamically moving groups in these workloads to reduce disk activity in large storage systems.
Context-Aware Prefetching at the Storage Server
In many of today’s applications, access to storage constitutes the major cost of processing a user request. Data prefetching has been used to alleviate the storage access latency. Under current prefetching techniques, the storage system prefetches a batch of blocks upon detecting an access pattern. However, the high level of concurrency in today’s applications typically leads to interleaved block accesses, which makes detecting an access pattern a very challenging problem. Towards this, we propose and evaluate QuickMine, a novel, lightweight and minimally intrusive method for context-aware prefetching. Under QuickMine, we capture application contexts, such as a transaction or query, and leverage them for context-aware prediction and improved prefetching effectiveness in the storage cache. We implement a prototype of our context-aware prefetching algorithm in a storage-area network (SAN) built using Network Block Device (NBD). Our prototype shows that context-aware prefetching clearly outperforms existing context-oblivious prefetching algorithms, resulting in factors of up to 2 improvements in application latency for two e-commerce workloads with repeatable access patterns, TPC-W and RUBiS.
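To illustrate why tagging accesses with a context can help when streams interleave, here is a minimal sketch. It is not the QuickMine algorithm: the class `ContextAwarePredictor`, its methods, and the toy trace are all hypothetical, and the predictor is just a first-order successor count kept per context.

```python
from collections import Counter, defaultdict


class ContextAwarePredictor:
    """Toy successor predictor (hypothetical; not the QuickMine algorithm)."""

    def __init__(self):
        # successors[block] counts which block followed it within one context
        self.successors = defaultdict(Counter)
        # last block seen for each context id (e.g. a transaction or query)
        self.last_block = {}

    def record(self, context_id, block):
        """Record one block access tagged with its application context."""
        prev = self.last_block.get(context_id)
        if prev is not None:
            self.successors[prev][block] += 1
        self.last_block[context_id] = block

    def predict(self, block, k=1):
        """Return up to k blocks most often seen right after `block`."""
        counts = self.successors.get(block, Counter())
        return [b for b, _ in counts.most_common(k)]


# Two concurrent contexts whose block accesses arrive interleaved.
trace = [("A", 10), ("B", 50), ("A", 11), ("B", 51), ("A", 12), ("B", 52)]

aware = ContextAwarePredictor()
oblivious = ContextAwarePredictor()
for ctx, block in trace:
    aware.record(ctx, block)        # keep each context's stream separate
    oblivious.record("all", block)  # treat the trace as one global stream

aware_guess = aware.predict(10)          # follows context A: 10 -> 11
oblivious_guess = oblivious.predict(10)  # follows the interleaving: 10 -> 50
```

In this toy trace the context-oblivious variant learns the interleaving rather than the per-transaction pattern, which is the difficulty the abstract describes.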
Memory Resource Allocation for File System Prefetching From a Supply Chain Management Perspective
As an important technique to hide disk I/O latency, prefetching has been widely studied, and dynamic adaptive prefetching techniques have been deployed in diverse storage environments. However, two issues are not well addressed by previous research: (1) how to handle the prefetching resource allocation between concurrent sequential access streams with different request rates, and (2) how to coordinate prefetching at multiple levels in the data access path. Interestingly, we found that these problems bear a strong resemblance to situations long studied in the field of supply chain management (SCM), used by retailers such as Wal-Mart. In this paper, we demonstrate how to perform the problem mapping and then apply SCM principles in practice, particularly from the branch of inventory theory, to improve data prefetching performance in storage systems. More specifically, we applied (1) two SCM policies to dynamically configure the sequential prefetching parameters, and (2) an SCM solution to correct the access pattern information distortion in multi-level prefetching. We implemented these SCM-based strategies in the Linux kernel prefetching algorithm and a multi-level storage simulator, and evaluated the performance with three types of work...
Block Storage Listener for Detecting File-Level Intrusions
An intrusion detection system (IDS) is usually located and operated at the host, where it captures local suspicious events, or at an appliance that listens to the network activity. Providing an online IDS to the storage controller is essential for dealing with compromised hosts or coordinated attacks by multiple hosts. SAN block storage controllers are connected to the world via block-level protocols, such as iSCSI and Fibre Channel. Usually, block-level storage systems do not maintain information specific to the file-system using them. The range of threats that can be handled at the block level is limited. A file system view at the controller, together with the knowledge of which arriving block belongs to which file or inode, will enable the detection of file-level threats. In this paper, we present IDStor, an IDS for block-based storage. IDStor acts as a listener to storage traffic, out of the controller’s I/O path, and is therefore attractive for integration into existing SAN-based storage solutions. IDStor maintains a block-to-file mapping that is updated online. Using this mapping, IDStor infers the semantics of file-level commands from the intercepted block-level operations, thereby detecting file-level intrusions by merely observing the block read and write commands passing between the hosts and the controller.
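The idea of a block-to-file mapping can be sketched in a few lines. This is an illustration only, not IDStor's design: the names `BlockToFileMap`, `assign`, `release`, and `detect_protected_writes` are hypothetical, and the mapping updates are passed in explicitly here, whereas a real listener would have to derive them from observed file-system metadata blocks.

```python
class BlockToFileMap:
    """Toy block-to-file mapping (hypothetical; not IDStor's implementation)."""

    def __init__(self):
        self.owner = {}  # block number -> file name

    def assign(self, file_name, blocks):
        """Record that `blocks` now belong to `file_name`."""
        for block in blocks:
            self.owner[block] = file_name

    def release(self, blocks):
        """Forget blocks that were freed, e.g. after a delete or truncate."""
        for block in blocks:
            self.owner.pop(block, None)

    def files_for(self, blocks):
        """Return the set of known files that own any of `blocks`."""
        return {self.owner[b] for b in blocks if b in self.owner}


def detect_protected_writes(mapping, protected, op, blocks):
    """Return protected files touched by a block-level write, sorted by name."""
    if op != "write":
        return []
    return sorted(mapping.files_for(blocks) & protected)


mapping = BlockToFileMap()
mapping.assign("/etc/passwd", [100, 101])
mapping.assign("/var/log/app.log", [200])

protected = {"/etc/passwd"}
write_alerts = detect_protected_writes(mapping, protected, "write", [101, 200])
read_alerts = detect_protected_writes(mapping, protected, "read", [101])
```

Here a block-level write to blocks 101 and 200 is translated into a file-level event on `/etc/passwd`, which is the kind of inference the abstract attributes to the mapping.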
Autograph: Automatically Extracting Workflow File Signatures
Storage management activities, such as reporting, file placement, migration and archiving, require the ability to discover files that belong to an application workflow by relying only on information from the file server. Some classes of application workflows, such as rendering an animated sequence from its graphics models or building an application from its source files, often exhibit a high degree of repeatability. We describe a system called Autograph that exploits this repeatability to discover files that belong to an application workflow. Our approach examines traces of file accesses, finds repeated and correlated accesses, and infers which files likely belong to the same workflow. Our solution targets server workflows and uses file server traces, which contain less process and file information than the local machine traces used in prior work. We show that Autograph successfully extracts workflow file signatures, even if the workflows are concurrent or share files.
Gopalan Sivathanu et al. Dear TOS editors and reviewers,
Attached please find our submission to the ACM Transactions on Storage Systems. Our paper is titled “End-to-End Abstractions for Application-Aware Storage.” In this article, we provide an overview of the problem of “information-gap” in the storage stack, and present two novel abstractions that effectively bridge this gap, thereby enabling a range of functionality that is almost impossible to achieve with existing systems and interfaces. Most of the material in this article forms part of Gopalan Sivathanu’s Ph.D. dissertation. Our first abstraction is Type-Aware Storage that aims at communicating pointer information to the disk hardware. We have published this abstraction in OSDI 2006. This article includes a new unpublished case-study of type-aware storage, “Disk-Level Data Consistency.” This case-study proposes and evaluates how complex higher-level consistency properties can be achieved at the disk hardware-level, in a file-system–agnostic manner. Our second abstraction is Context-Aware I/O, a flexible mechanism to communicate between applications and data, across the storage stack. We present the design, implementation, and evaluation of the above abstraction, and demonstrate its usefulness through two separate case-studies. This abstraction and its case-studies have not been published in any other venue. Overall, of this 60-page article, about 60 percent is new unpublished material. This work was completely done when all authors were affiliated with the File systems and
DHIS: Discriminating Hierarchical Storage (appears in the Proceedings of the Israeli Experimental Systems Conference, ACM SYSTOR ’09)
A typical storage hierarchy comprises components with varying performance and cost characteristics, providing multiple options for data placement. We propose and evaluate a hierarchical storage system, DHIS, that uses application-level hints to discriminate between data with different access characteristics, and then customizes its placement and caching policies to each type. The data placement decisions in DHIS are made in an online fashion, during data creation. Most existing solutions that attempt to customize data layout require moving data around, based on access characteristics. DHIS uses two kinds of information to make its decisions. First, it uses knowledge about higher-level pointers between blocks (for example, file system pointers) to understand the relationship between blocks and consequently, their importance. Second, DHIS defines a set of generic attributes that the higher layers can use to annotate data, conveying various properties such as importance, access pattern, etc. Based on these attributes, DHIS dynamically decides to place the data in the hierarchy best suited for its requirements. By doing so, DHIS solves a critical problem faced by storage vendors and developers of higher level storage software, in terms of choosing the most efficient policy among many alternatives. Through several benchmarks, we show that DHIS’s data placement decisions improve performance significantly.
http://www.ssrc.ucsc.edu / HANDS: A Heuristically Arranged Non-Backup In-line
, 2012
Deduplication is rarely used on primary storage because of the disk bottleneck problem, which results from the need to keep an index mapping chunks of data to hash values in memory in order to detect duplicate blocks. This index grows with the number of unique data blocks, creating a scalability problem, and at current prices the cost of additional RAM approaches the cost of the indexed disks. Thus, previously, deduplication ratios had to be over 45% to see any cost benefit. The HANDS technique that we introduce in this paper reduces the amount of in-memory index storage required by up to 99% while still achieving between 30% and 90% of the deduplication of a full memory-resident index, making primary deduplication cost effective in workloads with a low deduplication rate. We achieve this by dynamically prefetching fingerprints from disk into memory cache according to working sets derived from access patterns. We demonstrate the effectiveness of our approach using a simple neighborhood grouping that requires only timestamp and block number, making it suitable for a wide range of storage systems without the need to modify host file systems.
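A grouping that needs only a timestamp and a block number can be sketched as follows. This is one plausible reading of "neighborhood grouping", not the partitioning used by HANDS: the function name `neighborhood_groups` and both thresholds (`max_gap_s`, `max_block_dist`) are assumptions chosen for illustration.

```python
def neighborhood_groups(accesses, max_gap_s=1.0, max_block_dist=64):
    """Split (timestamp, block) accesses into groups of nearby accesses.

    A new group starts whenever the next access is too far from the
    previous one in time or in block number. Both thresholds are
    hypothetical tuning knobs.
    """
    groups = []
    current = []
    prev = None
    for ts, block in sorted(accesses):  # order accesses by timestamp
        if prev is not None:
            close_in_time = ts - prev[0] <= max_gap_s
            close_in_space = abs(block - prev[1]) <= max_block_dist
            if not (close_in_time and close_in_space):
                groups.append(current)
                current = []
        current.append(block)
        prev = (ts, block)
    if current:
        groups.append(current)
    return groups


# (timestamp in seconds, block number)
trace = [(0.0, 100), (0.1, 101), (0.2, 103),
         (0.3, 9000), (0.4, 9001),
         (5.0, 102)]
groups = neighborhood_groups(trace)
```

With this trace the sketch yields three groups: blocks 100, 101, 103 (close in time and space), blocks 9000, 9001 (a jump in block number), and block 102 alone (a long time gap). Fingerprints for a whole group could then be prefetched together when any member is accessed.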
Efficient Web Logs Stair-Case Technique to Improve Hit Ratios of Caching
Cache prefetching techniques can improve the hit ratio and expedite users’ visiting speed. Predictive Web prefetching refers to the mechanism of deducing the forthcoming page accesses of a client based on its past accesses. Congestion in the network remains one of the main barriers to the continuing success of the Internet. For Web users, congestion manifests itself in unacceptably long response times. One possible remedy to the latency problem is to use caching at the client, at the proxy server, or within the Internet. However, Web documents are becoming increasingly dynamic, which limits the potential benefit of caching. The performance of a Web caching system can be dramatically increased by integrating document prefetching into its design. Although prefetching reduces the response time of a requested document, it also increases the network load, as some documents will be unnecessarily prefetched. In the paper, we developed a Stair-Case prune algorithm to mine popular with their conditional probabilities from the proxy log, and stored them in the rule table. Then, according to contents and the rule table, a prediction is calculated in some precondition. After the simulation, we found that our approach has much better performance than the other ones, in terms of hit ratio.
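A rule table of conditional probabilities mined from a proxy log might look like the sketch below. It does not reproduce the Stair-Case prune algorithm, whose details the abstract does not give: the function `build_rule_table`, the first-order "next page" model, the `min_prob` pruning threshold, and the sample sessions are all assumptions.

```python
from collections import Counter, defaultdict


def build_rule_table(sessions, min_prob=0.5):
    """Estimate P(next page | current page) and keep only the strong rules.

    `sessions` is a list of page sequences, one per client session.
    Rules whose probability falls below `min_prob` are pruned
    (a hypothetical threshold, not taken from the paper).
    """
    pair_counts = defaultdict(Counter)
    for pages in sessions:
        for cur, nxt in zip(pages, pages[1:]):
            pair_counts[cur][nxt] += 1

    table = {}
    for cur, nexts in pair_counts.items():
        total = sum(nexts.values())
        rules = {n: c / total for n, c in nexts.items() if c / total >= min_prob}
        if rules:
            table[cur] = rules
    return table


sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/weather"],
    ["/", "/about"],
    ["/", "/news", "/sports"],
]
table = build_rule_table(sessions)
```

For these sessions the table keeps the rule "/" to "/news" with probability 0.75 and "/news" to "/sports" with probability 2/3, and prunes the weaker alternatives. A prefetcher could consult the table on each request and fetch the predicted page only when a rule exists, which limits the unnecessary prefetches the abstract warns about.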

