Active Disks: Programming Model, Algorithms and Evaluation
, 1998
Cited by 159 (9 self)
Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks. To program Active Disks, we propose a stream-based programming model which allows disklets to be executed efficiently and safely. Simulation results for a suite of six algorithms from three application domains (commercial data warehouses, image processing and satellite data processing) indicate that for these algorithms, Active Disks outperform conventional-disk architectures.
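The division of labor described in this abstract (per-disk processing, with the host only combining results) can be illustrated with a minimal sketch. This is not the paper's programming model or API: the names `count_disklet` and `host_combine`, and the in-memory lists standing in for disk streams, are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration only: plain Python lists stand in for the
# record streams that each disk would read locally.

def count_disklet(records, predicate):
    """Runs "on the disk": scans the local stream and returns one small
    partial result instead of shipping every record to the host."""
    return sum(1 for rec in records if predicate(rec))

def host_combine(partials):
    """Runs on the host: combines the per-disk partial results."""
    return sum(partials)

# Three simulated disks, each holding part of the data set.
disks = [[3, 18, 42], [7, 25], [50, 1, 99, 12]]

# Each disk filters its own data; only the counts reach the host.
partials = [count_disklet(d, lambda x: x > 10) for d in disks]
total = host_combine(partials)
print(partials, total)
```

The point of the sketch is the data flow: the host receives one integer per disk rather than the full record stream.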
Minerva: an automated resource provisioning tool for large-scale storage systems
- ACM Transactions on Computer Systems
, 2001
Cited by 103 (24 self)
Enterprise-scale storage systems, which can contain hundreds of host computers and storage devices and up to tens of thousands of disks and logical volumes, are difficult to design. The volume of choices that need to be made is massive, and many choices have unforeseen interactions. Storage system design is tedious and complicated to do by hand, usually leading to solutions that are grossly overprovisioned, substantially under-performing or, in the worst case, both. To solve the configuration nightmare, we present MINERVA: a suite of tools for designing storage systems automatically. MINERVA uses declarative specifications of application requirements and device capabilities; constraint-based formulations of the various subproblems; and optimization techniques to explore the search space of possible solutions. This paper also explores and evaluates the design decisions that went into MINERVA, using specialized micro and macro-benchmarks. We show that MINERVA can successfully handle a workload with substantial complexity (a decision-support database benchmark). MINERVA created a 16-disk design in only a few minutes that achieved the same performance as a 30-disk system manually designed by human experts. Of equal importance, MINERVA was able to predict the resulting system's performance before it was built.
Traveling to Rome: QoS specifications for automated storage system management
- International Workshop on Quality of Service
, 2001
Cited by 49 (6 self)
The design and operation of very large-scale storage systems is an area ripe for application of automated design and management techniques -- and at the heart of such techniques is the need to represent storage system QoS in many guises: the goals (service level requirements) for the storage system, predictions for the design that results, enforcement constraints for the runtime system to guarantee, and observations made of the system as it runs. Rome is the information model that the Storage Systems Program at HP Laboratories has developed to address these needs. We use it as an "information bus" to tie together our storage system design, configuration, and monitoring tools. In 5 years of development, Rome is now on its third iteration; this paper describes its information model, with emphasis on the QoS-related components, and presents some of the lessons we have learned over the years in using it.
Active Disks - Remote Execution for Network-Attached Storage
, 1997
Cited by 46 (1 self)
The principal trend in the design of computer systems is the expectation of much greater computational power in future generations of microprocessors. This trend applies to embedded systems as well as host processors. As a result, devices such as storage controllers have excess capacity and growing computational capabilities. Storage system designers are exploiting this trend with higher-level interfaces to storage and increased intelligence inside storage devices. One development in this direction is Network-Attached Secure Disks (NASD) which attaches storage devices directly to the network and raises the storage interface above the simple (fixed-size block) memory abstraction of SCSI. This allows devices more freedom to provide efficient operations; promises more scalable subsystems by offloading file system and storage management functionality from dedicated servers; and reduces latency by executing common case requests directly at storage devices. In this paper, we push this increa...
An experimental study of data migration algorithms
- Algorithm Engineering, the Proceedings of WAE 2001: 5th Workshop on Algorithm Engineering (BRICS, University of Aarhus)
, 2001
Cited by 40 (5 self)
The data migration problem is the problem of computing a plan for moving data objects stored on devices in a network from one configuration to another. Load balancing or changing usage patterns might necessitate such a rearrangement of data. In this paper, we consider the case where the objects are fixed-size and the network is complete. We introduce two new data migration algorithms, one of which has provably good bounds. We empirically compare the performance of these new algorithms against similar algorithms from Hall et al. [7] which have better theoretical guarantees and find that in almost all cases, the new algorithms perform better. We also find that both the new algorithms and the ones from Hall et al. perform much better in practice than the theoretical bounds suggest.
On Algorithms for Efficient Data Migration
, 2001
Cited by 32 (3 self)
The data migration problem is the problem of computing an efficient plan for moving data stored on devices in a network from one configuration to another. Load balancing or changing usage patterns could necessitate such a rearrangement of data. In this paper, we consider the case where the objects are fixed-size and the network is complete. The direct migration problem is closely related to edge-coloring. However, because there are space constraints on the devices, the problem is more complex. Our main results are polynomial time algorithms for finding a near-optimal migration plan in the presence of space constraints when a certain number of additional nodes is available as temporary storage, and a 3/2-approximation for the case where data must be migrated directly to its destination.
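The link to edge-coloring mentioned in this abstract can be made concrete with a small sketch. Treat each device as a vertex and each fixed-size transfer as an edge; a round of migration is a set of transfers in which no device appears twice, which is exactly a color class. The code below is a plain greedy edge coloring, not the paper's algorithm: it ignores the space constraints the abstract highlights and carries no 3/2 guarantee (greedy coloring may use up to 2Δ-1 rounds, where Δ is the largest number of transfers touching one device). The function name `plan_migration_rounds` and the sample data are hypothetical.

```python
from collections import defaultdict

def plan_migration_rounds(transfers):
    """Greedily place each (source, destination) transfer in the earliest
    round in which both devices are idle. Rounds play the role of colors
    in an edge coloring of the transfer multigraph.
    Assumption: devices have unlimited free space (no space constraints)."""
    busy = defaultdict(set)  # device -> rounds in which it already transfers
    rounds = []              # rounds[i] lists the transfers done in round i
    for src, dst in transfers:
        r = 0
        while r in busy[src] or r in busy[dst]:
            r += 1
        if r == len(rounds):
            rounds.append([])
        rounds[r].append((src, dst))
        busy[src].add(r)
        busy[dst].add(r)
    return rounds

# Hypothetical example: four devices, four objects to move.
transfers = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
rounds = plan_migration_rounds(transfers)
for i, batch in enumerate(rounds):
    print(i, batch)
```

In this example device C takes part in three transfers, so no plan can finish in fewer than three rounds, and the greedy plan uses exactly three.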
Active Disks
, 1998
Cited by 31 (0 self)
Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks. To program Active Disks, we propose a stream-based programming model which allows disklets to be executed efficiently and safely. Simulation results for a suite of seven algorithms from three application domains (commercial data warehouses, image processing and satellite data processing) indicate that for these algorithms, Active Disks outperform conventional-disk architectures.
Selecting RAID levels for disk arrays
, 2002
Cited by 30 (7 self)
Disk arrays have a myriad of configuration parameters that interact in counter-intuitive ways, and those interactions can have significant impacts on cost, performance, and reliability. Even after values for these parameters have been chosen, there are exponentially many ways to map data onto the disk arrays' logical units. Meanwhile, the importance of correct choices is increasing: storage systems represent a growing fraction of total system cost, they need to respond more rapidly to changing needs, and there is less and less tolerance for mistakes. We believe that automatic design and configuration of storage systems is the only viable solution to these issues. To that end, we present a comparative study of a range of techniques for programmatically choosing the RAID levels to use in a disk array. Our simplest approaches are modeled on existing, manual rules of thumb: they "tag" data with a RAID level before determining the configuration of the array to which it is assigned. Our best approach simultaneously determines the RAID levels for the data, the array configuration, and the layout of data on that array. It operates as an optimization process with the twin goals of minimizing array cost while ensuring that storage workload performance requirements will be met. This approach produces robust solutions with an average cost/performance 14-17% better than the best results for the tagging schemes, and up to 150-200% better than their worst solutions. We believe that this is the first presentation and systematic analysis of a variety of novel, fully automatic RAID-level selection techniques.
Feedback Control Real-Time Scheduling
, 2001
Cited by 27 (10 self)
We develop Feedback Control real-time Scheduling (FCS) as a unified framework to provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-business servers on the Internet). FCS includes four major components. First, novel scheduling architectures provide performance control to a new category of QoS-critical systems that cannot be addressed by traditional open-loop scheduling paradigms. Second, we derive dynamic models for computing systems for the purpose of performance control. These models provide a theoretical foundation for adaptive performance control. Third, we apply established control methodology to design scheduling algorithms with proven performance guarantees, which is in contrast with existing heuristics-based solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-based performance specifications characterizes the efficiency, accuracy, and robustness of QoS guarantees.

