Minerva: an automated resource provisioning tool for large-scale storage systems
- ACM Transactions on Computer Systems
, 2001
Cited by 103 (24 self)
Enterprise-scale storage systems, which can contain hundreds of host computers and storage devices and up to tens of thousands of disks and logical volumes, are difficult to design. The volume of choices that need to be made is massive, and many choices have unforeseen interactions. Storage system design is tedious and complicated to do by hand, usually leading to solutions that are grossly overprovisioned, substantially under-performing or, in the worst case, both. To solve the configuration nightmare, we present MINERVA: a suite of tools for designing storage systems automatically. MINERVA uses declarative specifications of application requirements and device capabilities; constraint-based formulations of the various subproblems; and optimization techniques to explore the search space of possible solutions. This paper also explores and evaluates the design decisions that went into MINERVA, using specialized micro- and macro-benchmarks. We show that MINERVA can successfully handle a workload with substantial complexity (a decision-support database benchmark). MINERVA created a 16-disk design in only a few minutes that achieved the same performance as a 30-disk system manually designed by human experts. Of equal importance, MINERVA was able to predict the resulting system's performance before it was built.
Selecting RAID levels for disk arrays
, 2002
Cited by 30 (7 self)
Disk arrays have a myriad of configuration parameters that interact in counter-intuitive ways, and those interactions can have significant impacts on cost, performance, and reliability. Even after values for these parameters have been chosen, there are exponentially many ways to map data onto the disk arrays' logical units. Meanwhile, the importance of correct choices is increasing: storage systems represent a growing fraction of total system cost, they need to respond more rapidly to changing needs, and there is less and less tolerance for mistakes. We believe that automatic design and configuration of storage systems is the only viable solution to these issues. To that end, we present a comparative study of a range of techniques for programmatically choosing the RAID levels to use in a disk array. Our simplest approaches are modeled on existing, manual rules of thumb: they "tag" data with a RAID level before determining the configuration of the array to which it is assigned. Our best approach simultaneously determines the RAID levels for the data, the array configuration, and the layout of data on that array. It operates as an optimization process with the twin goals of minimizing array cost while ensuring that storage workload performance requirements will be met. This approach produces robust solutions with an average cost/performance 14-17% better than the best results for the tagging schemes, and up to 150-200% better than their worst solutions. We believe that this is the first presentation and systematic analysis of a variety of novel, fully automatic RAID-level selection techniques.
Storage Device Performance Prediction with CART Models
, 2004
Cited by 27 (5 self)
Storage device performance prediction is a key element of self-managed storage systems and application planning tasks, such as data assignment. This work explores the application of a machine learning tool, CART models, to storage device modeling. Our approach predicts a device's performance as a function of input workloads, requiring no knowledge of the device internals. We propose two uses of CART models: one that predicts per-request response times (and then derives aggregate values) and one that predicts aggregate values directly from workload characteristics. After being trained on our experimental platforms, both provide accurate black-box models across a range of test traces from real environments. Experiments show that these models predict the average and 90th percentile response time with a relative error as low as 16% when the training workloads are similar to the testing workloads, and interpolate well across different workloads.
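To make the black-box idea concrete, the sketch below fits the smallest possible regression tree, a single split (a "stump"), that maps one workload feature to a response time. It is only an illustration of how a CART-style split is chosen by minimizing squared error. The function names, the single feature, and the sample numbers are assumptions made for this example and do not come from the paper.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature.

    Returns (threshold, left_mean, right_mean) for the split that
    minimizes the summed squared error of the two leaf means.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, x):
    """Predict with a fitted stump: the mean of the leaf that x falls in."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical training data: request size (KB) -> response time (ms).
sizes_kb = [1, 2, 3, 10, 11, 12]
times_ms = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
stump = fit_stump(sizes_kb, times_ms)
```

A full CART model applies the same split search recursively and over many workload features; this stump shows only the core step.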
Ergastulum: Quickly Finding Near-Optimal Storage System Designs
Cited by 14 (0 self)
The cost of large storage systems is dominated by management costs. Typically, skilled administrators configure storage manually using rules of thumb. However, designing a storage system for a given workload is a difficult task, because there are millions of possible configurations and mappings of data, and because storage system behavior is complex. Ergastulum is a new storage system designer that can be used both to guide administrators in their design decisions and as part of an automatic storage system management tool like Hippodrome [4]. Ergastulum generalizes the best-fit bin packing heuristic with randomization and backtracking to efficiently search through the huge number of possible design choices. Design decisions are informed by device models that estimate storage system performance. We show that Ergastulum quickly generates near-optimal storage system designs. It is faster and generates better solutions than previous tools, and it is substantially faster than an integer programming implementation that generates optimal solutions for simplified device models. We conclude that Ergastulum is a comprehensive solution to the storage system design problem.
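As a rough illustration of the best-fit bin packing heuristic with randomization mentioned above, the sketch below packs workloads onto identical devices and keeps the best result from several shuffled passes. It is not Ergastulum's algorithm: it uses random restarts rather than backtracking, it treats each workload as a single integer demand instead of consulting device models, and all names and numbers are hypothetical.

```python
import random


def pack_best_fit(sizes, capacity, tries=20, seed=0):
    """Assign items to bins using best-fit over several random orderings.

    Returns (number_of_bins, assignment), where assignment[i] is the bin
    index of item i in the best packing found.
    """
    if any(s > capacity for s in sizes):
        raise ValueError("an item is larger than the bin capacity")
    rng = random.Random(seed)
    best = None
    for _ in range(tries):
        order = list(range(len(sizes)))
        rng.shuffle(order)
        free = []                      # remaining capacity of each open bin
        assign = [None] * len(sizes)
        for i in order:
            # Best fit: among bins that can hold the item, pick the one
            # that would be left with the least spare room.
            fits = [b for b in range(len(free)) if free[b] >= sizes[i]]
            if fits:
                b = min(fits, key=lambda k: free[k] - sizes[i])
            else:
                free.append(capacity)  # open a new bin (device)
                b = len(free) - 1
            free[b] -= sizes[i]
            assign[i] = b
        if best is None or len(free) < best[0]:
            best = (len(free), assign)
    return best


# Hypothetical workload demands and a per-device capacity of 10 units.
demands = [6, 4, 5, 5, 3, 7]
n_bins, assignment = pack_best_fit(demands, capacity=10)
```

Shuffling the item order between passes is the simplest form of randomization; a designer that also backtracks would revisit earlier placements instead of restarting from scratch.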
Performance Modeling of Storage Devices using Machine Learning
, 2006
Cited by 2 (0 self)
also sponsored through generous grants from the EMC Corporation and the Intel
HPL-SSP-2001-4: Simple table-based modeling of storage devices
, 2001
Trace-driven simulations are too slow for use in solvers. Analytic models require work from a person to understand the array enough to model it. Table-based models offer the possibility of automatically measuring the performance of an array for use in a solver. We explain a simplistic way of generating the input points in the table. We then explore three different ways of interpolating from nearby points within the table, and comment on future directions the work could take.
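A table-based model can be pictured as a lookup over measured points plus an interpolation rule for queries that fall between them. The sketch below uses one-dimensional linear interpolation, clamped at the table edges. The abstract does not say which three interpolation schemes the report compares, so this choice, the function name, and the sample measurements are all assumptions.

```python
from bisect import bisect_left


def interpolate(table, x):
    """Estimate y at x from a table of (x, y) points sorted by x.

    Linear interpolation between the two neighbouring points; queries
    outside the measured range return the nearest edge value.
    """
    xs = [point[0] for point in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, x)             # first index with xs[i] >= x
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical measurements: request size (KB) -> throughput (MB/s).
measured = [(4, 10.0), (8, 20.0), (16, 30.0)]
estimate = interpolate(measured, 6)
```

A solver can call such a function many times per candidate design, which is the speed advantage over trace-driven simulation that the abstract points to.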
USENIX Association
, 1992
Modern storage environments are composed of a variety of devices with different performance characteristics. In this paper, we explore storage-aware caching algorithms, in which the file buffer replacement algorithm explicitly accounts for differences in performance across devices. We introduce a new family of storage-aware caching algorithms that partition the cache, with one partition per device. The algorithms set the partition sizes dynamically to balance work across the devices. Through simulation, we show that our storage-aware policies perform similarly to LANDLORD, a cost-aware algorithm previously shown to perform well in Web caching environments. We also demonstrate that partitions can be easily incorporated into the Clock replacement algorithm, thus increasing the likelihood of deploying cost-aware algorithms in modern operating systems.
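The partitioning idea can be sketched as a cache that keeps a separate replacement queue per device and lets a policy change each partition's size at run time. The example below uses LRU within each partition rather than Clock, and it leaves out the paper's rule for choosing partition sizes. The class name, method names, and device labels are hypothetical.

```python
from collections import OrderedDict


class PartitionedCache:
    """Block cache split into one LRU partition per device."""

    def __init__(self, sizes):
        # sizes: mapping of device name -> number of slots in its partition
        self.sizes = dict(sizes)
        self.parts = {device: OrderedDict() for device in sizes}

    def access(self, device, block):
        """Touch a block; return True on a hit, False on a miss."""
        part = self.parts[device]
        if block in part:
            part.move_to_end(block)    # mark as most recently used
            return True
        part[block] = True             # bring the block into the cache
        self._trim(device)
        return False

    def resize(self, device, slots):
        """Change a partition's size, evicting blocks if it shrinks."""
        self.sizes[device] = slots
        self._trim(device)

    def _trim(self, device):
        part = self.parts[device]
        while len(part) > self.sizes[device]:
            part.popitem(last=False)   # evict the least recently used block


cache = PartitionedCache({"fast": 2, "slow": 1})
hits = [cache.access("fast", 1), cache.access("fast", 2),
        cache.access("fast", 1), cache.access("fast", 3)]
```

A balancing policy would call `resize` periodically, for example giving a slower device more slots so that misses to it become rarer.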

