RAID: High-Performance, Reliable Secondary Storage
ACM Computing Surveys, 1994
Cited by 281 (6 self)
Abstract
Disk arrays were proposed in the 1980s as a way to use parallelism between multiple disks to improve aggregate I/O performance. Today they appear in the product lines of most major computer manufacturers. This paper gives a comprehensive overview of disk arrays and provides a framework in which to organize current and future work. The paper first introduces disk technology and reviews the driving forces that have popularized disk arrays: performance and reliability. It then discusses the two architectural techniques used in disk arrays: striping across multiple disks to improve performance and redundancy to improve reliability. Next, the paper describes seven disk array architectures, called RAID (Redundant Arrays of Inexpensive Disks) levels 0-6 and compares their performance, cost, and reliability. It goes on to discuss advanced research and implementation topics such as refining the basic RAID levels to improve performance and designing algorithms to maintain data consistency. Last, the paper describes six disk array prototypes or products and discusses future opportunities for research. The paper includes an annotated bibliography of disk array-related literature.
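The two techniques named above, striping for performance and parity redundancy for reliability, can be illustrated with a short Python sketch. It assumes a RAID level 5 style layout in which each stripe holds one parity block and the parity position moves one disk to the left on each successive stripe. The function names `locate_block`, `xor_parity`, and `rebuild_block` are hypothetical, and real arrays use a variety of rotation schemes.

```python
from functools import reduce


def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical data block to (disk, stripe) in a hypothetical RAID 5 layout.

    Each stripe holds num_disks - 1 data blocks and one parity block; the
    parity block moves one disk to the left on every successive stripe.
    """
    data_per_stripe = num_disks - 1
    stripe, offset = divmod(logical_block, data_per_stripe)
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data blocks fill the remaining disks in order, skipping the parity disk.
    disk = offset if offset < parity_disk else offset + 1
    return disk, stripe


def xor_parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks: the parity block of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: list[bytes]) -> bytes:
    """Recover the one missing block of a stripe (data or parity).

    All blocks of a stripe XOR to zero, so the XOR of the survivors
    equals the lost block.
    """
    return xor_parity(surviving)
```

For example, with four disks `locate_block(5, 4)` returns `(3, 1)`: the sixth data block lands on the last disk of the second stripe because disk 2 holds that stripe's parity.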
Row-diagonal parity for double disk failure correction
In Proceedings of the 3rd USENIX Symposium on File and Storage Technologies (FAST ’04), 2004
Cited by 105 (0 self)
Parity Declustering for Continuous Operation in Redundant Disk Arrays
1992
Cited by 89 (12 self)
Abstract
We describe and evaluate a strategy for declustering the parity encoding in a redundant disk array. This declustered parity organization balances cost against data reliability and performance during failure recovery. It is targeted at highly available parity-based arrays for use in continuous-operation systems. It improves on standard parity organizations by reducing the additional load on surviving disks during the reconstruction of a failed disk's contents. This yields higher user throughput during recovery, shorter recovery time, or both. We first address the generalized parity layout problem, basing our solution on balanced incomplete and complete block designs. A software implementation of declustering is then evaluated using a disk array simulator under a highly concurrent workload composed of small user accesses. We show that declustered parity penalizes user response time while a disk is being repaired (before and during its recovery) less than comparable non-declustered (RAID5) ...
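The reduced reconstruction load can be checked by counting. The sketch below assumes the simplest layout mentioned above, a complete block design in which every G-disk subset of a C-disk array forms one parity stripe, and ignores how parity rotates inside the design, since that does not change which disks are read. Counting stripes gives C(C−2, G−2) / C(C−1, G−1) = (G−1)/(C−1) of each surviving disk read during reconstruction. The function names are hypothetical.

```python
from itertools import combinations


def complete_block_design(num_disks: int, group_size: int) -> list[tuple[int, ...]]:
    """Every group_size-subset of the disks becomes one parity stripe.

    Each stripe places one unit (data or parity) on each of its member disks.
    """
    return list(combinations(range(num_disks), group_size))


def reconstruction_read_fraction(num_disks: int, group_size: int, failed: int = 0) -> float:
    """Fraction of a surviving disk's units read while `failed` is rebuilt.

    Rebuilding a lost unit requires reading every other unit of its stripe,
    so a surviving disk is read only in the stripes it shares with the
    failed disk.
    """
    design = complete_block_design(num_disks, group_size)
    survivor = (failed + 1) % num_disks
    on_survivor = [stripe for stripe in design if survivor in stripe]
    shared = [stripe for stripe in on_survivor if failed in stripe]
    return len(shared) / len(on_survivor)
```

With C = 7 and G = 3, `reconstruction_read_fraction(7, 3)` gives 1/3, whereas `reconstruction_read_fraction(5, 5)`, the non-declustered RAID 5 case where one stripe spans all five disks, gives 1.0: every surviving disk is read in full.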
The TickerTAIP Parallel RAID Architecture
ACM Transactions on Computer Systems, 1993
Cited by 82 (8 self)
Abstract
This paper presents the TickerTAIP architecture and an evaluation of its behavior. We demonstrate its feasibility by an existence proof; describe a family of distributed algorithms for RAID parity calculation; discuss techniques for establishing request atomicity, sequencing, and recovery; and provide a performance evaluation of the TickerTAIP design space, both in absolute terms and in comparison with a centralized RAID implementation. We conclude that the TickerTAIP architectural approach is feasible, useful, and effective.
*Princeton University, Princeton, NJ; **University of Illinois, Urbana-Champaign, IL; ***University of Wisconsin, Madison, WI.
Also published as Operating Systems Research Department report HPL-OSR-92-6.
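Two of the parity computations a RAID controller performs can be illustrated in a few lines. This is a generic sketch, not TickerTAIP's algorithms: it shows the standard read-modify-write small-write update (new parity = old parity XOR old data XOR new data), and a parity computed from per-node partial XORs, which works because XOR is associative and commutative. The node boundaries and function names are hypothetical.

```python
def xor_blocks(blocks: list[bytes], size: int) -> bytes:
    """XOR together blocks of `size` bytes; an empty list yields all zeros."""
    result = bytearray(size)
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: new parity without reading the other data blocks."""
    return xor_blocks([old_parity, old_data, new_data], len(old_parity))


def distributed_parity(blocks_per_node: list[list[bytes]], size: int) -> bytes:
    """Each (hypothetical) node XORs the blocks it holds; partial results are combined.

    XOR is associative and commutative, so the partial parities can be merged
    in any order, for example along a tree of nodes.
    """
    partials = [xor_blocks(blocks, size) for blocks in blocks_per_node]
    return xor_blocks(partials, size)
```

The small-write form touches only two disks' worth of old data (the block being replaced and its parity), which is why it is preferred for writes smaller than a full stripe.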
Fault Tolerant Design of Multimedia Servers
In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1995
Cited by 77 (6 self)
Abstract
Recent technological advances have made multimedia on-demand servers feasible. Two challenging tasks in such systems are (a) satisfying the real-time requirement for continuous delivery of objects at specified bandwidths and (b) efficiently servicing multiple clients simultaneously. To accomplish these tasks and realize the economies of scale associated with servicing a large user population, the multimedia server can require a large disk subsystem. Although a single disk is fairly reliable, a large disk farm can have an unacceptably high probability of disk failure. Further, because of the real-time constraint, the reliability and availability requirements of multimedia systems are very stringent. In this paper we investigate techniques for providing a high degree of reliability and availability at low disk storage, bandwidth, and memory costs for on-demand multimedia servers.
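Why a fairly reliable disk still yields an unreliable disk farm can be made concrete with a short calculation. This sketch assumes independent disks with exponentially distributed lifetimes and a hypothetical per-disk mean time to failure (MTTF); under those assumptions an array of N disks fails at N times the single-disk rate, so P(at least one failure in time t) = 1 − exp(−N·t/MTTF).

```python
import math


def prob_any_disk_fails(num_disks: int, hours: float, mttf_hours: float) -> float:
    """P(at least one of num_disks fails within `hours`).

    Assumes independent disks with exponentially distributed lifetimes, so the
    array as a whole fails at num_disks times the single-disk rate.
    expm1 keeps precision when the exponent is small.
    """
    return -math.expm1(-num_disks * hours / mttf_hours)
```

With a hypothetical MTTF of 500,000 hours and a one-week window (`hours=168`), a single disk fails with probability of about 0.0003, 100 disks with about 0.033, and 1,000 disks with about 0.29.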
Coding Techniques for Handling Failures in Large Disk Arrays
Algorithmica, 1988
Cited by 73 (2 self)
Abstract
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper, we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
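The link between a code's parity check matrix and which failures it can survive can be shown with a generic erasure decoder. In this sketch each disk block is a Python int combined by XOR, H is a binary parity check matrix (every codeword XORs to zero over the positions marked 1 in each row), and a set of failed disks is recoverable exactly when the corresponding columns of H are linearly independent over GF(2). It is an illustration of the general idea, not one of the specific codes constructed in the paper; the function name is hypothetical.

```python
def recover_erasures(H: list[list[int]], word: list[int], erased: list[int]) -> dict[int, int]:
    """Recover erased symbols of a binary linear code from its parity check matrix.

    H:      parity check matrix over GF(2), one list of 0/1 per row; every
            codeword c satisfies: XOR of c[j] over all j with H[row][j] == 1 is 0.
    word:   one int per disk (a disk block as a bit string); entries at
            erased positions are ignored.
    erased: positions of the failed disks.
    Raises ValueError when the columns of H at `erased` are linearly
    dependent, i.e. the code cannot correct this failure pattern.
    """
    lost = set(erased)
    # Each row becomes [bitmask over erased positions, XOR of the known symbols].
    rows = []
    for h in H:
        mask = sum(1 << k for k, pos in enumerate(erased) if h[pos])
        rhs = 0
        for pos, bit in enumerate(h):
            if bit and pos not in lost:
                rhs ^= word[pos]
        rows.append([mask, rhs])
    # Gauss-Jordan elimination over GF(2): addition and subtraction are both XOR.
    for k in range(len(erased)):
        pivot = next((r for r in range(k, len(rows)) if rows[r][0] >> k & 1), None)
        if pivot is None:
            raise ValueError("failure pattern not correctable by this code")
        rows[k], rows[pivot] = rows[pivot], rows[k]
        for r in range(len(rows)):
            if r != k and rows[r][0] >> k & 1:
                rows[r][0] ^= rows[k][0]
                rows[r][1] ^= rows[k][1]
    # Row k now reads "symbol at erased[k] = rhs".
    return {pos: rows[k][1] for k, pos in enumerate(erased)}
```

As a hypothetical example, take five disks holding data blocks d0, d1, d2 and parity blocks p0 = d0 XOR d1 and p1 = d1 XOR d2, so H = [[1, 1, 0, 1, 0], [0, 1, 1, 0, 1]]. Losing d0 and d2 is recoverable; losing d0 and p0 is not, because both columns are (1, 0).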
Reliability mechanisms for very large storage systems
In Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003
Cited by 54 (18 self)
Abstract
Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and for fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs between system availability and storage efficiency among other schemes. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
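One reason single-parity RAID struggles at scale is that rebuilding a failed disk requires reading every surviving disk in its group, and a nonrecoverable read error during that read can lose data. This sketch assumes independent bit errors at a fixed, hypothetical rate (for example one per 10^14 bits); the function names are illustrative and the model ignores the detection and scrubbing mechanisms the paper discusses.

```python
import math


def prob_read_error(bytes_read: float, bits_per_error: float) -> float:
    """P(at least one nonrecoverable read error while reading `bytes_read` bytes).

    Assumes independent bit errors at a rate of one per `bits_per_error` bits.
    log1p/expm1 keep precision when the per-bit rate is tiny.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


def rebuild_error_prob(group_size: int, disk_bytes: float, bits_per_error: float) -> float:
    """Single-parity rebuild: all group_size - 1 surviving disks are read in full."""
    return prob_read_error((group_size - 1) * disk_bytes, bits_per_error)
```

By contrast, rebuilding a two-way mirror reads only the one surviving copy, `prob_read_error(disk_bytes, ...)`, which is G − 1 times fewer bytes than rebuilding a G-disk parity group of the same disks; the price is 50% storage efficiency instead of (G − 1)/G.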
Data Partitioning and Load Balancing in Parallel Disk Systems
1994
Cited by 54 (8 self)
Abstract
Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent file system that optimizes striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.
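A simple heuristic in the spirit of load balancing by file allocation is greedy placement by access rate. This sketch assumes each file's expected access rate (its "heat") is known in advance and places the hottest files first, each on the disk with the lowest accumulated heat; it is a generic illustration, not necessarily the heuristic used in the paper, and the function name is hypothetical.

```python
import heapq


def allocate_by_heat(file_heat: dict[str, float], num_disks: int) -> dict[str, int]:
    """Greedy file allocation: hottest files first, each onto the coolest disk.

    `file_heat` maps a file name to its expected access rate (its "heat");
    the result maps each file name to a disk index.
    """
    disks = [(0.0, d) for d in range(num_disks)]  # (accumulated heat, disk)
    heapq.heapify(disks)
    placement = {}
    for name, heat in sorted(file_heat.items(), key=lambda item: item[1], reverse=True):
        load, disk = heapq.heappop(disks)
        placement[name] = disk
        heapq.heappush(disks, (load + heat, disk))
    return placement
```

Redistribution when access patterns change, which the abstract also mentions, is not modeled here; one option would be to rerun the allocation on fresh heat estimates and migrate only the files whose placement changes.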
Designing Disk Arrays for High Data Reliability
Cited by 51 (9 self)
Abstract
Redundancy based on a parity encoding has been proposed for ensuring that disk arrays provide highly reliable data. Parity-based redundancy will tolerate many independent disk failures and dependent failures (those caused by shared support hardware) without on-line spare disks, and many more such failures with on-line spare disks. This paper explores the design of reliable, redundant disk arrays. In the context of a 70-disk strawman array, it presents and applies analytic and simulation models for the time until data is lost. It shows how to balance requirements for high data reliability against the overhead cost of redundant data, on-line spares, and on-site repair personnel in terms of an array’s architecture, its component reliabilities, and its repair policies.
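A widely used first-order analytic model for the time until data is lost gives a feel for these tradeoffs. It assumes independent, exponentially distributed disk lifetimes, single-parity groups, and a mean time to repair (MTTR) much shorter than the disk MTTF; it is not necessarily the model used in the paper, and it ignores the dependent support-hardware failures discussed above. Only the 70-disk array size comes from the abstract; the other parameters are hypothetical.

```python
def mttdl_hours(num_disks: int, group_size: int, mttf_hours: float, mttr_hours: float) -> float:
    """First-order mean time to data loss for an array of single-parity groups.

    Assumes independent, exponentially distributed disk lifetimes and
    MTTR much smaller than MTTF. Disk failures arrive at rate num_disks / MTTF;
    data is lost if one of the other group_size - 1 disks in the same group
    fails during the repair window, which happens with probability of roughly
    (group_size - 1) * MTTR / MTTF.
    """
    return mttf_hours ** 2 / (num_disks * (group_size - 1) * mttr_hours)
```

Because MTTR sits in the denominator, anything that shortens repair, such as an on-line spare that lets reconstruction begin immediately, raises MTTDL proportionally: `mttdl_hours(70, 10, 1e5, 4)` is 12 times `mttdl_hours(70, 10, 1e5, 48)`.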
Tolerating Multiple Failures in RAID Architectures with Optimal Storage and Uniform Declustering
In Proceedings of the 24th International Symposium on Computer Architecture, 1996
Cited by 47 (5 self)
Abstract
We present Datum, a novel method for tolerating multiple disk failures in disk arrays. Datum is the first known method that can mask any given number of failures, requires an optimal amount of redundant storage space, and spreads reconstruction accesses uniformly over disks in the presence of failures without needing large layout tables in controller memory. Our approach is based on information dispersal, a coding technique that admits an efficient hardware implementation. As the method does not restrict the configuration parameters of the disk array, many existing RAID organizations are particular cases of Datum. A detailed performance comparison with two other approaches shows that Datum's response times are similar to those of the best competitor when two or fewer disks fail, and that performance degrades gracefully when more than two disks fail.
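The information dispersal idea behind Datum can be sketched directly. The following is a minimal sketch of Rabin-style information dispersal over the prime field GF(257), assuming a Vandermonde dispersal matrix; the field, matrix, and function names (`disperse`, `recombine`) are illustrative choices, not Datum's actual layout, and the sketch does not model how Datum spreads pieces across disks. Each of the n pieces is about 1/m the size of the data, so any m pieces rebuild it at the storage-optimal overhead of n/m. Encoded values can equal 256, which is why pieces are lists of ints rather than bytes.

```python
P = 257  # prime field size: every byte value 0..255 is a field element


def _row(x: int, m: int) -> list[int]:
    """Row (1, x, x^2, ..., x^(m-1)) of a Vandermonde matrix mod P."""
    return [pow(x, j, P) for j in range(m)]


def disperse(data: bytes, m: int, n: int) -> list[list[int]]:
    """Split data into n pieces such that any m of them rebuild it.

    Piece i stores, for every m-byte segment s, sum_j s[j] * (i+1)^j mod P.
    Distinct evaluation points make every m x m submatrix invertible.
    """
    if not 0 < m <= n < P:
        raise ValueError("need 0 < m <= n < P")
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    segments = [padded[k:k + m] for k in range(0, len(padded), m)]
    pieces = []
    for i in range(n):
        row = _row(i + 1, m)
        pieces.append([sum(a * b for a, b in zip(row, seg)) % P for seg in segments])
    return pieces


def recombine(available: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any m pieces, keyed by piece index."""
    if len(available) < m:
        raise ValueError("need at least m pieces")
    idx = sorted(available)[:m]
    # Gauss-Jordan on [V | I] mod P turns it into [I | V^-1].
    A = [_row(i + 1, m) + [int(r == c) for c in range(m)] for r, i in enumerate(idx)]
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, P)  # modular inverse (Python 3.8+)
        A[col] = [v * inv % P for v in A[col]]
        for r in range(m):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(v - f * w) % P for v, w in zip(A[r], A[col])]
    inverse = [row[m:] for row in A]
    out = bytearray()
    for s in range(len(available[idx[0]])):
        y = [available[i][s] for i in idx]
        out.extend(sum(a * b for a, b in zip(row, y)) % P for row in inverse)
    return bytes(out[:length])
```

For instance, after `pieces = disperse(data, 3, 5)`, any three of the five pieces, such as `{0: pieces[0], 2: pieces[2], 4: pieces[4]}`, passed to `recombine` with `m=3` return the original bytes, so two piece losses are masked.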

