The HP AutoRAID hierarchical storage system
ACM Transactions on Computer Systems, 1995
Cited by 263 (15 self)
Configuring redundant disk arrays is a black art. To configure an array properly, a system administrator must understand the details of both the array and the workload it will support. Incorrect understanding of either, or changes in the workload over time, can lead to poor performance. We present a solution to this problem: a two-level storage hierarchy implemented inside a single disk-array controller. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and excellent performance. In the lower level, RAID 5 parity protection is used to provide excellent storage cost for inactive data, at somewhat lower performance. The technology we describe in this paper, known as HP AutoRAID, automatically and transparently manages migration of data blocks between these two levels as access patterns change. The result is a fully redundant storage system that is extremely easy to use, is suitable for a wide variety of workloads, is largely insensitive to dynamic workload changes, and performs much better than disk arrays with comparable numbers of spindles and much larger amounts of front-end RAM cache. Because the implementation of the HP AutoRAID technology is almost entirely in software, the additional hardware cost for these benefits is very small. We describe the HP AutoRAID technology in detail, provide performance data for an embodiment of it in a storage array, and summarize the results of simulation studies used to choose algorithms implemented in the array.
Disk-directed I/O for MIMD Multiprocessors
1994
Cited by 262 (18 self)
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
Adaptive disk spin-down policies for mobile computers
In Proceedings 2nd USENIX Symp. on Mobile and Location-Independent Computing, 1995
Cited by 237 (7 self)
Mobile computers typically spin down their hard disk after a fixed period of inactivity. If this threshold is too long, the disk wastes energy; if it is too short, the delay due to spinning the disk up again frustrates the user. Usage patterns change over time, so a single fixed threshold may not be appropriate at all times. Also, different users may have varying priorities with respect to trading off energy conservation against performance. We describe a method for varying the spin-down threshold dynamically by adapting to the user's access patterns and priorities. Adaptive spin-down can in some circumstances reduce by up to 50% the number of disk spin-ups that are deemed by the user to be inconvenient, while only moderately increasing energy consumption.
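The abstract does not spell out the adaptation rule, so the following is only a minimal sketch of one plausible policy, not the authors' method. It assumes a spin-up counts as "inconvenient" when the spin-up delay is large relative to the idle period that preceded it, and it nudges the threshold up or down by a fixed step. The class name `AdaptiveSpinDown` and every parameter (`spin_up_delay`, `max_delay_ratio`, `step`, the bounds) are hypothetical.

```python
class AdaptiveSpinDown:
    """Hypothetical adaptive spin-down threshold (illustrative assumptions only)."""

    def __init__(self, threshold=10.0, spin_up_delay=2.0, max_delay_ratio=0.05,
                 step=1.0, min_threshold=1.0, max_threshold=60.0):
        self.threshold = threshold              # seconds of idleness before spin-down
        self.spin_up_delay = spin_up_delay      # seconds the user waits for a spin-up
        self.max_delay_ratio = max_delay_ratio  # assumed user tolerance (delay / idle time)
        self.step = step                        # assumed additive adjustment, in seconds
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold

    def on_idle_period_end(self, idle_seconds):
        """Adapt the threshold after an idle period; return True if the disk had spun down."""
        if idle_seconds < self.threshold:
            return False  # the disk was still spinning, so there is nothing to learn
        # Assumed rule: the spin-up is inconvenient if its delay is large
        # compared with the idle time that the spin-down actually saved.
        inconvenient = self.spin_up_delay / idle_seconds > self.max_delay_ratio
        if inconvenient:
            self.threshold = min(self.max_threshold, self.threshold + self.step)
        else:
            self.threshold = max(self.min_threshold, self.threshold - self.step)
        return True


policy = AdaptiveSpinDown()
policy.on_idle_period_end(12.0)   # short idle period after spin-down: threshold grows
policy.on_idle_period_end(100.0)  # long idle period: threshold shrinks again
```

A user who values responsiveness over battery life would correspond to a smaller `max_delay_ratio` in this sketch, which pushes the threshold upward more often.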
A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems
Software – Practice & Experience, 1997
Cited by 234 (37 self)
It is well-known that Reed-Solomon codes may be used to provide error correction for multiple failures in RAID-like systems. The coding technique itself, however, is not as well-known. To the coding theorist, this technique is a straightforward extension to a basic coding paradigm and needs no special mention. However, to the systems programmer with no training in coding theory, the technique may be a mystery. Currently, there are no references that describe how to perform this coding that do not assume that the reader is already well-versed in algebra and coding theory. This paper is intended for the systems programmer. It presents a complete specification of the coding algorithm plus details on how it may be implemented. This specification assumes no prior knowledge of algebra or coding theory. The goal of this paper is for a systems programmer to be able to implement Reed-Solomon coding for reliability in RAID-like systems without needing to consult any external references.
Problem Specification
Let there be $n$ storage devices, $D_1, D_2, \ldots, D_n$, each of which holds $k$ bytes. These are called the “Data Devices.” Let there be $m$ more storage devices ...
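To make the data-device and checksum-device setting concrete, here is a minimal sketch of the simplest special case only: a single checksum device holding the bytewise XOR of the data devices. This is plain RAID-style parity, not the Reed-Solomon algorithm the paper specifies, and it tolerates exactly one failed device. The function names `xor_parity` and `recover_missing` are hypothetical.

```python
from functools import reduce


def xor_parity(devices):
    """Return the bytewise XOR of a list of equally sized byte strings."""
    if not devices:
        raise ValueError("need at least one device")
    size = len(devices[0])
    if any(len(d) != size for d in devices):
        raise ValueError("all devices must hold the same number of bytes")
    # zip(*devices) walks the devices column by column, one byte offset at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*devices))


def recover_missing(surviving, parity):
    """Rebuild the one missing data device from the survivors and the parity."""
    # XOR-ing the parity with every surviving device cancels their contributions.
    return xor_parity(list(surviving) + [parity])


data = [b"abcd", b"efgh", b"ijkl"]        # three data devices of four bytes each
checksum = xor_parity(data)               # contents of the single checksum device
rebuilt = recover_missing([data[0], data[2]], checksum)  # device 1 has failed
```

Surviving more than one simultaneous failure needs several independent checksums, which is the case the Reed-Solomon construction addresses.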
Active Disks: Programming Model, Algorithms and Evaluation
1998
Cited by 202 (10 self)
Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks. To program Active Disks, we propose a stream-based programming model which allows disklets to be executed efficiently and safely. Simulation results for a suite of six algorithms from three application domains (commercial data warehouses, image processing and satellite data processing) indicate that for these algorithms, Active Disks outperform conventional-disk architectures.
Analysis of the clustering properties of the Hilbert space-filling curve
IEEE Transactions on Knowledge and Data Engineering, 2001
Cited by 192 (12 self)
Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties from such linear mappings is clustering, which means the locality between objects in the multidimensional space being preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1], [14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work [14]. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.
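The clustering measure in this abstract can be illustrated empirically with a small sketch. It maps cells of a 2-D grid to Hilbert-curve positions with the standard iterative coordinate-to-index conversion, then counts clusters in a rectangular query as maximal runs of consecutive positions. This is a brute-force count for illustration, not the paper's closed-form formulas. The names `hilbert_index` and `count_clusters` and the grid sizes are assumptions.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert-curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def count_clusters(n, x0, y0, x1, y1):
    """Count runs of consecutive Hilbert positions inside the inclusive query rectangle."""
    positions = sorted(hilbert_index(n, x, y)
                       for x in range(x0, x1 + 1)
                       for y in range(y0, y1 + 1))
    # A new cluster starts wherever two neighbouring positions are not consecutive.
    return 1 + sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


whole_grid = count_clusters(8, 0, 0, 7, 7)  # the full grid is one contiguous run
bottom_row = count_clusters(2, 0, 0, 1, 0)  # cells at positions 0 and 3 on a 2 x 2 grid
```

If the cells are stored on disk in Hilbert order, each cluster corresponds to one contiguous range of storage, which is why the cluster count serves as a proxy for the number of separate disk accesses.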
External-Memory Graph Algorithms
1995
Cited by 186 (22 self)
We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include:
- Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components.
- PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms.
- Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list rank...
Idleness is Not Sloth
1995
Cited by 172 (8 self)
Many people have observed that computer systems spend much of their time idle, and various schemes have been proposed to use this idle time productively. The commonest approach is to off-load activity from busy periods to less-busy ones in order to improve system responsiveness. In addition, speculative work can be performed in idle periods in the hope that it will be needed later at times of higher utilization, or non-renewable resources like battery power can be conserved by disabling unused resources. We found opportunities to exploit idle time in our work on storage systems, and after a few attempts to tackle specific instances of it in ad hoc ways, began to investigate general mechanisms that could be applied to this problem. Our results include a taxonomy of idle-time detection algorithms, metrics for evaluating them, and an evaluation of a number of idleness predictors that we generated from our taxonomy.
1. Introduction
Resource usage is often bursty: periods of high utilizat...
Massive arrays of idle disks for storage archives
In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002
Cited by 168 (0 self)
The declining cost of commodity disk drives is rapidly changing the economics of deploying large amounts of online or near-line storage. Conventional mass storage systems use either high-performance RAID clusters, automated tape libraries, or a combination of tape and disk. In this paper, we analyze an alternative design using massive arrays of idle disks, or MAID. We argue that this storage organization provides storage densities matching or exceeding those of tape libraries with performance similar to disk arrays. Moreover, we show that with effective power management of individual drives, this performance can be achieved using a very small power budget. In particular, we show that our power management strategy can result in performance comparable to an always-on RAID system while using a fraction of the power of such a RAID system.
The Galley parallel file system
Parallel Computing, 1996
Cited by 151 (9 self)
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.